© 2012 Boise State University
CS-334: Algorithms of Machine Learning
Topic: Sequential Modeling
Arthur Putnam
Overview
• Sequential data
• What we can do with sequential modeling
• What techniques are out there
• Introduction to Markov Chains
What is Sequential Data?
Sequential data is any kind of data where the order "matters".
What makes sequential data different from time-series data?
Why is it important to know whether our data is sequential? When should we consider using a sequential approach?
When "critical" information is lost if the order is lost or not represented.
Does the order of this data matter? Why or why not? Hint: this may be a trick question.
Key takeaway: just because the data has a sequential element doesn't mean we should model it that way.
What can we do with sequential modeling?
• Classification
• Predict the next item in the sequence
• Determine if a given sequence is normal or abnormal
• Use it to create generative AI
Common Ways to Model Sequential Data
• Conditional Random Fields (CRFs)
• Recurrent Neural Networks (RNNs)
• Markov Chains
• Hidden Markov Models (HMMs)
Sequential State Spaces
For the rest of this lecture we are going to assume both time and states are discrete.
• Define a state space generically as:
    state space = S = {s_0, s_1, …, s_N}
• s_i represents a discrete state
• The state space contains all possible states for seen and unseen sequences
Sequence
• Let's define a sequence as:
    X = (x_t) = x_0, x_1, x_2, …, x_t
    ∀t ∈ ℕ, x_t ∈ S
• t represents a discrete timestep
• x_t represents the discrete state at time t
• X is an ordered list where each element is in the state space
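These definitions map directly onto simple Python data structures (a sketch, not from the slides; the state names are illustrative): a state space is a set, and a sequence is an ordered list whose elements must all belong to it.

```python
# State space S: the set of all possible discrete states
state_space = {"rock", "paper", "scissors"}

def is_valid_sequence(seq, states):
    """Check the constraint: for all t, x_t must be in S."""
    return all(x in states for x in seq)

# A sequence X = x_0, x_1, x_2, x_3
X = ["rock", "rock", "paper", "scissors"]
print(is_valid_sequence(X, state_space))                   # True
print(is_valid_sequence(["rock", "lizard"], state_space))  # False
```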
Sequence Example
• Let S = {rock, paper, scissors}
    X = x_0, x_1, x_2, x_3
    X = rock, rock, paper, scissors
Game 1: x_0 = rock | Game 2: x_1 = rock | Game 3: x_2 = paper | Game 4: x_3 = scissors
Sequence Example 2
• Let S = {rock, paper, scissors}
    X = x_0, x_1
    X = rock, rock
Game 1: x_0 = rock | Game 2: x_1 = rock
Let's assume we are trying to predict the next move our opponent is going to make in a game of Rock, Paper, Scissors.
Game 1: x_0 = rock | Game 2: x_1 = rock | Game 3: x_2 = paper
Modeling Sequential Data
• One way we might model sequential data is via a probabilistic approach, where we predict future states based on the present and the past:
    P(future move | present, past)
Example Assumptions
• Let's assume we already know the probability distribution
• This probability distribution is based on our opponent's previous moves
    P(future move | present, past)
P(future move | present, past)

P(rock | x_2 = paper, x_1 = rock, x_0 = rock)
P(paper | x_2 = paper, x_1 = rock, x_0 = rock)
P(scissors | x_2 = paper, x_1 = rock, x_0 = rock)

Game 1: x_0 = rock | Game 2: x_1 = rock | Game 3: x_2 = paper | Game 4: x_3 = scissors
• Let's assume:
    P(rock | x_2 = paper, x_1 = rock, x_0 = rock) = 0.3
    P(paper | x_2 = paper, x_1 = rock, x_0 = rock) = 0.3
    P(scissors | x_2 = paper, x_1 = rock, x_0 = rock) = 0.4
• What move should we go with?
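With these assumed probabilities, one reasonable answer is to find the opponent's most likely next move and play its counter; a quick sketch (the `counter` mapping just encodes the standard Rock, Paper, Scissors rules):

```python
# Assumed conditional distribution over the opponent's next move
next_move_probs = {"rock": 0.3, "paper": 0.3, "scissors": 0.4}

# Which of our moves beats each opponent move
counter = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

# The opponent's most likely move is scissors (0.4), so we play rock
likely = max(next_move_probs, key=next_move_probs.get)
our_move = counter[likely]
print(likely, "->", our_move)  # scissors -> rock
```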
First Approach Assessment
• What are some problems with this approach?
• How would we calculate the probability distribution if it wasn't given to us?
    P(rock | x_2 = paper, x_1 = rock, x_0 = rock) = 0.3
    P(paper | x_2 = paper, x_1 = rock, x_0 = rock) = 0.3
    P(scissors | x_2 = paper, x_1 = rock, x_0 = rock) = 0.4
    P(future | x_t, x_{t-1}, x_{t-2}, …, x_0)
Markov Property
• The Markov property states that the conditional probability distribution of future states of the process depends only on the present state, not on the sequence of events that preceded it.
• The Markov assumption is used to describe a model where the Markov property is assumed to hold.
Using the Markov Property
    P(future move | present, past)  →  P(future move | present)
    P(future | x_t, x_{t-1}, x_{t-2}, …, x_0)  →  P(future | x_t)
This simplifies the probability function and is more robust at handling differently ordered sequences.
Markov Chain
• A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

Conceptual model*: some state x_t → [Markov Model] → probability (or a set of probabilities for each state), P(future | x_t)

*We will define this model more formally later.
Visualization
https://setosa.io/ev/markov-chains/
Markov Chain
• Often considered to be "memory-less" thanks to the Markov property
• A Markov model can take a sequence as input and produce
    – the probability of that sequence
    – or the probability of the next states in the sequence
• Is trained from empirical data (a multiset of sequences)
• Pros: easy to train, simple to understand
Let's Build a Markov Chain from Scratch
• We need to:
    – Define the state space
    – Find the initial probabilities
    – Find the transition probabilities
• Data set
    – Sequences:
        • RRPSSRPSRP
        • PPPSPSPSRR
        • RPSSPSRPSP
    – R = Rock, P = Paper, S = Scissors
Define the state space
Empirical data:
• RRPSSRPSRP
• PPPSPSPSRR
• RPSSPSRPSP

S = {R, P, S}
Find the Initial Probabilities
Empirical data:
• RRPSSRPSRP
• PPPSPSPSRR
• RPSSPSRPSP

R starting probability: 2/3 ≈ 0.666
P starting probability: 1/3 ≈ 0.333
S starting probability: 0/3 = 0
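The same counts can be computed in a few lines of Python (a sketch, not from the slides): tally which state each training sequence starts in, then divide by the number of sequences.

```python
from collections import Counter

# The three training sequences from the slides
sequences = ["RRPSSRPSRP", "PPPSPSPSRR", "RPSSPSRPSP"]

# Tally the first state of each sequence, then normalize
starts = Counter(seq[0] for seq in sequences)
initial = {s: starts[s] / len(sequences) for s in "RPS"}
print(initial)  # R: 2/3, P: 1/3, S: 0
```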
Find the Transition Probabilities
Empirical data:
• RRPSSRPSRP
• PPPSPSPSRR
• RPSSPSRPSP

P(x_{t+1} | x_t) = (number of times the pair x_t, x_{t+1} appears in a row) / (number of x_t)

Example: P(x_{t+1} = R | x_t = R) = 2/7
Find the Transition Probabilities
Empirical data:
• RRPSSRPSRP
• PPPSPSPSRR
• RPSSPSRPSP

Transition   P(x_{t+1} | x_t)
RR           2/7 ≈ 0.285
RP           5/7 ≈ 0.714
RS           0/7 = 0
PR           0/10 = 0
PP           2/10 = 0.2
PS           8/10 = 0.8
SR           4/10 = 0.4
SP           4/10 = 0.4
SS           2/10 = 0.2

P(x_{t+1} | x_t) = (number of times the pair x_t, x_{t+1} appears in a row) / (number of x_t)
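The whole table can be reproduced by counting adjacent pairs; a sketch of the counting rule stated above (count of each pair x_t, x_{t+1} divided by the count of x_t as a predecessor):

```python
from collections import Counter

sequences = ["RRPSSRPSRP", "PPPSPSPSRR", "RPSSPSRPSP"]

# Count every adjacent pair (x_t, x_{t+1}) and every "from" state x_t
pair_counts = Counter()
from_counts = Counter()
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        pair_counts[(a, b)] += 1
        from_counts[a] += 1

# P(x_{t+1} = b | x_t = a) = count(a, b) / count(a as a predecessor)
trans = {(a, b): pair_counts[(a, b)] / from_counts[a]
         for a in "RPS" for b in "RPS"}
print(trans[("R", "P")], trans[("P", "S")])  # 5/7 and 8/10
```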
So now we have…
Initial probabilities:
R: 2/3 ≈ 0.666 | P: 1/3 ≈ 0.333 | S: 0/3 = 0

Transition probabilities:
RR 2/7 ≈ 0.285
RP 5/7 ≈ 0.714
RS 0/7 = 0
PR 0/10 = 0
PP 2/10 = 0.2
PS 8/10 = 0.8
SR 4/10 = 0.4
SP 4/10 = 0.4
SS 2/10 = 0.2

Now we should be able to calculate the probability of the sequence RPS.
Calculate the probability of the sequence RPS:
P(x_0 = R) · P(x_1 = P | x_0 = R) · P(x_2 = S | x_1 = P) = 2/3 · 5/7 · 8/10 = 80/210 = 8/21 ≈ 0.38
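The chained lookup generalizes to any sequence; a sketch using the tables built earlier as plain dictionaries:

```python
# Initial and transition probabilities from the tables above
initial = {"R": 2/3, "P": 1/3, "S": 0.0}
trans = {("R", "R"): 2/7, ("R", "P"): 5/7, ("R", "S"): 0.0,
         ("P", "R"): 0.0, ("P", "P"): 0.2, ("P", "S"): 0.8,
         ("S", "R"): 0.4, ("S", "P"): 0.4, ("S", "S"): 0.2}

def sequence_probability(seq):
    """P(x_0) times the product of P(x_{t+1} | x_t) along the sequence."""
    p = initial[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= trans[(a, b)]
    return p

print(sequence_probability("RPS"))  # 2/3 * 5/7 * 8/10 = 8/21 ≈ 0.38
```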
Looking up each probability is still tedious
• Luckily, we can redefine this process as a series of matrix multiplications
• We can define a 1×N vector (denoted by π) to represent our initial state probabilities
• We can define an N×N matrix (denoted by P) which represents the transition probabilities
• From there we can use:
    P(sequence) = π_starting_state · P_transition_1 · P_transition_2 · … · P_transition_n
Initial State Probabilities
• We can define a 1×N vector (denoted by π) to represent our initial state probabilities:
    π = (π_1, π_2, …, π_N), one entry per starting state
    π = (0.666, 0.333, 0) for S = {R, P, S}
Transition Matrix
• Let P be an N×N matrix, where N is the number of discrete states:

    P = | p_11  p_12  …  p_1N |
        | p_21  p_22  …  p_2N |
        |  ⋮     ⋮    ⋱    ⋮  |
        | p_N1  p_N2  …  p_NN |

    p_ij = P(s_j | s_i)

*Assume S is ordered: S = {R, P, S}

         R      P      S
    R | 0.285  0.714  0   |
    P | 0      0.2    0.8 |
    S | 0.4    0.4    0.2 |
Markov Chain
• We can now calculate the probability of a sequence by "chaining" together the elements of the matrix:
    P(sequence) = π_starting_state · P_transition_1 · P_transition_2 · … · P_transition_n
Let's check
Previously we calculated the probability of the sequence RPS using the "old" method. Let's try the matrix method.

Old method: P(x_0 = R) · P(x_1 = P | x_0 = R) · P(x_2 = S | x_1 = P) = 2/3 · 5/7 · 8/10 = 8/21 ≈ 0.38
Matrix method: π_1 · p_12 · p_23 = 0.666 · 0.714 · 0.8 ≈ 0.38
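The matrix method is a direct translation into NumPy (a sketch; the row/column ordering R, P, S is assumed from the transition-matrix slide):

```python
import numpy as np

states = ["R", "P", "S"]            # assumed ordering for rows/columns
pi = np.array([2/3, 1/3, 0.0])      # initial state probabilities
P = np.array([[2/7, 5/7, 0.0],      # P[i, j] = P(s_j | s_i)
              [0.0, 0.2, 0.8],
              [0.4, 0.4, 0.2]])

def sequence_probability(seq):
    """Chain pi[first state] with the matrix entries along the sequence."""
    idx = [states.index(s) for s in seq]
    p = pi[idx[0]]
    for i, j in zip(idx, idx[1:]):
        p *= P[i, j]
    return p

print(sequence_probability("RPS"))  # ~0.38, matching the "old" method
```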
Your Turn! Class Challenge
• Assume S = {P, S, R}
• Assume π = (0.8, 0.1, 0.1)
• Create the transition matrix P
• Try to find the probability for:
    – SPR
Your Turn! Class Challenge
• Assume S = {P, S, R}
• Assume π = (0.8, 0.1, 0.1)
• Try to find the probability for:
    – SPR
https://setosa.io/ev/markov-chains/

         P    S    R
    P | 0.5  0.4  0.1 |
    S | 0.4  0.5  0.1 |
    R | 0.2  0.2  0.6 |

π_1 · p_12 · p_23 = 0.8 · 0.4 · 0.1 = 0.032
(Note: under the ordering S = {P, S, R}, the lookup π_1 · p_12 · p_23 traces the sequence P, S, R; the literal sequence S, P, R would be π_2 · p_21 · p_13 = 0.1 · 0.4 · 0.1 = 0.004.)
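A sketch to check the arithmetic (the ordering S = {P, S, R} and the matrix values are taken from the answer slide; note that the lookup π_1 · p_12 · p_23 traces the state sequence P, S, R):

```python
states = ["P", "S", "R"]   # ordering assumed from the slide
pi = [0.8, 0.1, 0.1]
P = [[0.5, 0.4, 0.1],      # P[i][j] = P(states[j] | states[i])
     [0.4, 0.5, 0.1],
     [0.2, 0.2, 0.6]]

def seq_prob(seq):
    idx = [states.index(s) for s in seq]
    p = pi[idx[0]]
    for i, j in zip(idx, idx[1:]):
        p *= P[i][j]
    return p

print(round(seq_prob("PSR"), 3))  # 0.032, the slide's pi_1 * p_12 * p_23
print(round(seq_prob("SPR"), 3))  # 0.004, the literal sequence S, P, R
```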
So what can we do with our Markov chain?
• Calculate the probability of a sequence
    – This is useful for comparing one sequence to another
    – We could also use this to classify by picking a probability cutoff point
• We can calculate the probability of ending in state s after some number of transitions T
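The last point falls out of matrix powers: the distribution over states after T transitions is π · P^T. A NumPy sketch using the rock-paper-scissors chain built earlier:

```python
import numpy as np

pi = np.array([2/3, 1/3, 0.0])      # initial probabilities for R, P, S
P = np.array([[2/7, 5/7, 0.0],      # transition matrix from the slides
              [0.0, 0.2, 0.8],
              [0.4, 0.4, 0.2]])

# Distribution over states after T transitions: pi @ P^T
T = 3
dist = pi @ np.linalg.matrix_power(P, T)
print(dist)        # probability of being in R, P, S after 3 transitions
print(dist.sum())  # each row of P sums to 1, so this stays ~1.0
```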
Recommended Next Steps
• Hidden Markov Models (HMMs)
    – We are making assumptions that are likely untrue; HMMs help address this by modeling hidden states
• Smoothing & Normalization
    – Some transitions might never appear in our data set, so their probability will be zero, which is probably not what we want
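One common fix for the zero-probability problem is add-one (Laplace) smoothing: pretend every transition was seen `alpha` extra times so nothing gets probability exactly zero. A sketch on the training data from earlier (the choice `alpha = 1` is illustrative):

```python
from collections import Counter

sequences = ["RRPSSRPSRP", "PPPSPSPSRR", "RPSSPSRPSP"]
states = "RPS"
alpha = 1  # add-one (Laplace) smoothing constant

pair_counts = Counter()
from_counts = Counter()
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        pair_counts[(a, b)] += 1
        from_counts[a] += 1

# Add alpha to every pair count (and alpha * N to each denominator)
smoothed = {(a, b): (pair_counts[(a, b)] + alpha) /
                    (from_counts[a] + alpha * len(states))
            for a in states for b in states}
print(smoothed[("R", "S")])  # was 0/7 before smoothing; now 1/10 = 0.1
```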