View
2
Download
0
Category
Preview:
Citation preview
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
Hidden Markov Models (HMM)
Devise algorithms for computing the posteriors on a Markov Chain with hidden variables (HMM).
1
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
๐1 ๐2 ๐3 ๐๐
Letโs assume we have discrete random variables (e.g., taking 3 discrete
values ๐๐ก = {100,010,001})
๐(๐๐ก ๐1, . . , ๐๐กโ1 = ๐(๐๐ก|๐๐กโ1) Markov Property:
Stationary, Homogeneous or Time-Invariant if the distribution ๐ ๐๐ก ๐๐กโ1 does not depend on ๐ก
e.g. ๐(๐๐ก =100|๐๐กโ1 =
010)
Markov Chains with Discrete Random Variables
2
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
๐(๐1, . . , ๐๐) = ๐(๐1) ๐(๐๐ก|๐๐กโ1)
T
๐ก=2
What do we need in order to describe the whole procedure?
(1) A probability for the first frame/timestamp etc ๐(๐1). In order to
define the probability we need to define the vector ๐ =(๐1, ๐2, โฆ , ๐K)
(2) A transition probability ๐ ๐๐ก|๐๐กโ1 . In order to define it we need a
๐พ๐ฅ๐พ transition matrix ๐จ = ๐๐๐
๐ ๐1|๐ = ๐๐๐ฅ1๐
๐พ
๐=1
๐ ๐๐ก|๐๐กโ1, ๐จ = ๐๐๐๐ฅ๐กโ1๐๐ฅ๐ก๐
๐พ
๐=1
๐พ
๐=1
Markov Chains with Discrete Random Variables
3
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
ฮ(๐|๐ฎ1, ๐1)
๐ฎ(๐|๐ฎ2, ๐2)
ฮ(๐|๐ฎ3, ๐3)
๐๐ก =๐ง๐ก1๐ง๐ก2๐ง๐ก3
โ100,010,001
๐ ๐ฟ, ๐ ๐
= ๐ ๐1, ๐2, โฏ , ๐๐ , ๐1,๐2, โฏ , ๐๐|๐
= ๐(๐๐ก|๐๐ก, ๐๐ฅ)
๐
๐ก=1
๐(๐๐ก|๐๐ง)
๐
๐ก=1
Latent Variables
4
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
Latent Variables in a Markov Chain
๐(๐1, . . , ๐๐) = ๐(๐1) ๐(๐๐ก|๐๐กโ1)
T
๐ก=2
๐ ๐ฟ, ๐ ๐
= ๐ ๐1, ๐2, โฏ , ๐๐ , ๐1,๐2, โฏ , ๐๐|๐
= ๐(๐๐ก|๐๐ก, ๐๐ฅ)
๐
๐ก=1
๐(๐1) ๐(๐๐ก|๐๐กโ1)
T
๐ก=2
๐1
๐1
๐2
๐2
๐3
๐3
๐๐
๐๐
5
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
Each die has 6 sides.
(our latent variable is probably the dice )
Hidden Markov Models
๐ = {10,01}
A casino has two dice
One is fair and one is not
๐ = {
100000
,
010000
,
001000
,
000100
,
000010
,
000001
}
(1) Fair die ๐ ๐ฅ๐ = 1|๐ง1 = 1 =1
6 for all ๐ = {1, . . , 6}
(2) Loaded die ๐ ๐ฅ๐ = 1|๐ง2 = 1 =1
10 for ๐ = 1, . . , 5 and
๐ ๐ฅ6 = 6|๐ง2 = 1 =1
2
Emission probability
6
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
A casino player switches back & forth between fair and loaded die
once every 20 turns
Hidden Markov Models
0.95
1 2
0.05
0.05
0.95
(i.e., ๐ ๐ง๐ก 1 = 1|๐ง๐กโ1 2 = 1 = ๐ ๐ง๐ก 2 = 1|๐ง๐กโ1 1 = 1 = 0.05)
7
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
Given a string of observations and the above model:
(1) We want to find for a timestamp ๐ก the probabilities of each die given
the observations that far.
This process is called Filtering: ๐ ๐๐ก ๐1, ๐2, โฆ , ๐๐ก
(2) We want to find for a timestamp ๐ก the probabilities of each die given
the whole string.
This process is called Smoothing: ๐ ๐๐ก ๐1, ๐2, โฆ , ๐๐
Hidden Markov Models
664153216162115234653214356634261655234232315142464156663246
8
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
(3) Given the observation string find the string of hidden variables that
maximize the posterior.
argmax๐1โฆ ๐๐ก ๐ ๐1, ๐2, โฆ , ๐๐ก ๐1, ๐2, โฆ , ๐๐ก
This process is called Decoding (Viterbi).
(4) Find the probability of the model.
๐(๐1, ๐2, โฆ , ๐๐)
This process is called Evaluation
Hidden Markov Models
664153216162115234653214356634261655234232315142464156663246
222222222222221111111222222222222221111111111111111111122222222
9
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
Hidden Markov Models
Filtering Smoothing Decoding
10
Taken from Machine Learning: A Probabilistic Perspective by K. Murphy
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
(5) Prediction
Hidden Markov Models
๐ ๐๐ก+๐ฟ ๐1, ๐2, โฆ , ๐๐ก ๐ ๐๐ก+๐ฟ ๐1, ๐2, โฆ , ๐๐ก
(6) Parameter estimation (Baum-Welch algorithm)
๐จ,๐ , ๐
11
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
๐ ๐๐ก ๐1, ๐2, โฆ , ๐๐ก
๐ ๐๐ก ๐1, ๐2, โฆ , ๐๐
argmax๐1โฆ ๐๐ก ๐ ๐1, โฆ , ๐๐ก ๐๐, โฆ , ๐๐
๐ ๐๐กโ๐ ๐1, ๐2, โฆ , ๐๐
๐ ๐๐ก+๐ฟ ๐1, ๐2, โฆ , ๐๐
๐ ๐๐ก+๐ฟ ๐1, ๐2, โฆ , ๐๐
Hidden Markov Models
12
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
๐ ๐1, โฆ , ๐๐กโ1 ๐๐ก , ๐๐ก = ๐ ๐1, โฆ , ๐๐กโ1 ๐๐ก
๐ ๐1, โฆ , ๐๐กโ1 ๐๐กโ1, ๐๐ก = ๐ ๐1, โฆ , ๐๐กโ1 ๐๐กโ1
๐ ๐๐ก+1, โฆ , ๐๐ ๐๐ก , ๐๐ก+1 = ๐ ๐๐ก+1, โฆ , ๐๐ ๐๐ก+1
๐ ๐๐ก+2, โฆ , ๐๐ ๐๐ก+1, ๐๐ก+1 = ๐ ๐๐ก+2, โฆ , ๐๐ ๐๐ก+1
๐ ๐1, โฆ , ๐๐ ๐๐กโ1, ๐๐ก = ๐ ๐1, โฆ , ๐๐กโ1 ๐๐กโ1 ๐ ๐๐ก ๐๐ก
๐ ๐๐ก+1, โฆ , ๐๐ ๐๐ก
Hidden Markov Models
๐ ๐1, โฆ , ๐๐ ๐๐ก = ๐ ๐1, โฆ , ๐๐ก ๐๐ก
๐ ๐๐ก+1, โฆ , ๐๐ ๐๐ก
๐ ๐๐+1 ๐๐ , ๐1, โฆ , ๐๐ = ๐ ๐๐+1 ๐๐
13
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
Smoothing: ๐ ๐๐ก ๐1, ๐2, โฆ , ๐๐
๐ ๐๐ก ๐1, ๐2, โฆ , ๐๐ =๐ ๐1, ๐2, โฆ , ๐๐ ๐๐ก ๐(๐๐ก)
๐(๐1, ๐2, โฆ , ๐๐)
=๐ ๐1, ๐2, โฆ , ๐๐ก ๐๐ก ๐(๐๐ก+1, โฆ , ๐๐|๐๐ก)๐(๐๐ก)
๐(๐1, ๐2, โฆ , ๐๐)
Filtering: ๐ ๐๐ก ๐1, ๐2, โฆ , ๐๐ก
=๐(๐1, ๐2, โฆ , ๐๐ก , ๐๐ก)๐(๐๐ก+1, โฆ , ๐๐ป|๐๐ก)
๐(๐1, ๐2, โฆ , ๐๐)
=๐ผ ๐๐ก ๐ฝ(๐๐ก)
๐(๐1, ๐2, โฆ , ๐๐)
Filtering and smoothing
14
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
๐ผ(๐๐ก) = ๐(๐1, โฆ , ๐๐ก , ๐๐ก)
= ๐ ๐1, โฆ , ๐๐ก ๐๐ก ๐(๐๐ก)
= ๐ ๐๐ก ๐๐ก ๐ ๐1, โฆ , ๐๐กโ1 ๐๐ก ๐(๐๐ก)
= ๐ ๐๐ก ๐๐ก ๐(๐1, โฆ , ๐๐กโ1 , ๐๐ก)
= ๐ ๐๐ก ๐๐ก ๐(๐1, โฆ , ๐๐กโ1 , ๐๐ก , ๐๐กโ1)
๐๐กโ1
= ๐ ๐๐ก ๐๐ก ๐(๐1, โฆ , ๐๐กโ1 , ๐๐ก|๐๐กโ1)p(๐๐กโ1) ๐๐กโ1
= ๐ ๐๐ก ๐๐ก ๐(๐1, โฆ , ๐๐กโ1 |๐๐กโ1, ๐๐ก)p(๐๐ก|๐๐กโ1)p(๐๐กโ1) ๐๐กโ1
Forward Probabilities
15
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
= ๐ ๐๐ก ๐๐ก ๐ ๐1, โฆ , ๐๐กโ1 ๐๐กโ1 p(๐๐กโ1)p(๐๐ก|๐๐กโ1) ๐๐กโ1
= ๐ ๐๐ก ๐๐ก ๐(๐1, โฆ , ๐๐กโ1 , ๐๐กโ1)p(๐๐ก|๐๐กโ1) ๐๐กโ1
= ๐ ๐๐ก ๐๐ก ฮฑ(๐๐กโ1)p(๐๐ก|๐๐กโ1) ๐๐กโ1
Forward Probabilities
Recursive formula and initial condition ฮฑ(๐1)
๐ผ ๐1 = ๐ ๐1, ๐1 = ๐ ๐1 ๐(๐1|๐1) = ๐๐๐(๐1|๐ง1๐)๐ง1๐
2
๐=1
16
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
๐ผ ๐ง๐ก1 = ๐(๐๐ก|๐ง๐ก1 = 1) ๐11๐ ๐ง๐กโ1 1 + ๐21๐ ๐ง๐กโ1 2
๐ ๐ง๐ก 2 = ๐(๐๐ก|๐ง๐ก 2 = 1) ๐12๐ ๐ง๐กโ1 1 + ๐22๐ ๐ง๐กโ1 2
In total ๐(22) computation (in general O(๐พ2))
p(๐๐ก|๐ง๐ก1 = 1) 1 1
2
๐11
๐21
๐ผ ๐ง๐กโ1 1 ๐ผ ๐ง๐ก 1
๐ ๐ง๐กโ1 2
๐(๐๐ก|๐ง๐ก 2 = 1)
1
2 2
๐12
๐22
๐ ๐ง๐กโ1 1
๐ ๐ง๐กโ1 2
Forward Probabilities
17
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
๐ ๐1, โฆ , ๐๐ก = ๐ ๐1, โฆ , ๐๐ก , ๐๐ก๐๐ก
= ฮฑ ๐๐ก๐๐ก
= ฮฑ(๐ง๐ก1)+ฮฑ(๐ง๐ก2)!!
๐ ๐๐ก|๐1, โฆ , ๐๐ก =๐(๐1, โฆ , ๐๐ก , ๐๐ก)
๐(๐1, โฆ , ๐๐ก )=
ฮฑ(๐๐ก)
ฮฑ(๐๐ก)๐๐ก
= ๐ (๐๐ก)
Evaluation: How can we compute ๐(๐1, โฆ , ๐๐ )?
๐(๐1, โฆ , ๐๐ต ) = ๐(๐1, โฆ , ๐๐ต , ๐๐ต)
๐๐ต
= ฮฑ(๐๐ต)
๐๐ต
Filtering
Filtering
18
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
๐ฝ ๐๐ก = ๐ ๐๐ก+1, โฆ , ๐๐ ๐๐ก
= ๐ ๐๐ก+1, โฆ , ๐๐ , ๐๐ก+1 ๐๐ก
๐๐ก+1
= ๐ ๐๐ก+1, โฆ , ๐๐ ๐๐ก, ๐๐ก+1 ๐(๐๐ก+1|๐๐ก)
๐๐ก+1
= ๐ ๐๐ก+1, โฆ , ๐๐ ๐๐ก+1 ๐(๐๐ก+1|๐๐ก)
๐๐ก+1
= ๐ ๐๐ก+2, โฆ , ๐๐ ๐๐ก+1 ๐(๐๐ก+1|๐๐ก+1)๐(๐๐ก+1|๐๐ก)
๐๐ก+1
= ๐ฝ(๐๐ก+1)๐(๐๐ก+1|๐๐ก+1)๐(๐๐ก+1|๐๐ก)
๐๐ก+1
Backward probabilities
19
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
๐(๐๐ก|๐ง๐ก+1 1 = 1)
๐(๐๐ก|๐ง๐ก+1 2 = 1)
1 1
2
๐11
๐21
๐ฝ ๐ง๐ก 1 ๐ฝ ๐ง๐ก+1 1
ฮฒ ๐ง๐ก 2 2
๐ฝ ๐ง๐ก+1 2
Computing Backward probabilities
๐ฝ ๐ง๐ก 1 = ๐ฝ(๐ง๐ก+1 1)๐11๐ ๐๐ก ๐ง๐ก+1 1 = 1
+๐ฝ(๐ง๐ก+1 2)๐12๐ ๐๐ก ๐ง๐ก+1 2 = 1
20
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
p(๐๐ก|๐ง๐ก+1 2 = 1)
๐ฝ ๐ง๐ก 2 = ฮฒ(๐ง๐ก+1 1)๐21๐ ๐๐ก ๐ง๐ก+1 1 = 1
+๐ฝ ๐ง๐ก+1 2 ๐22๐ ๐๐ก ๐ง๐ก+1 2 = 1
๐ฝ ๐๐ = 1 ๐(๐๐ ๐1, โฆ , ๐๐ต =๐(๐1, โฆ , ๐๐ต , ๐๐ป)
๐(๐1, โฆ , ๐๐ต )=
๐ผ ๐๐ โ 1
๐(๐1, โฆ , ๐๐ต )
p(๐๐ก|๐ง๐ก+1 1 = 1) 1
2 2
๐21
๐22
๐ฝ ๐ง๐ก 1
ฮฒ ๐ง๐ก 2
1
๐ฝ ๐ง๐ก+1 1
๐ฝ ๐ง๐ก+1 2
Computing Backward probabilities
21
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
๐(๐๐ก+2 ๐1, โฆ , ๐๐ก = ๐(๐๐ก+2, ๐๐ก+1, ๐๐ก|๐1, โฆ , ๐๐ก)
๐๐ก๐๐ก+1
= ๐ ๐๐ก+2, ๐๐ก+1 ๐1, โฆ , ๐๐ก , ๐๐ก ๐(๐๐ก|๐1, โฆ , ๐๐ก)
๐๐ก๐๐ก+1
= ๐ ๐๐ก+2, ๐๐ก+1 ๐๐ก๐๐ก๐๐ก+1
๐(๐๐ก|๐1, โฆ , ๐๐ก)๐ ๐๐ก
= ๐(๐๐ก+2|๐๐ก+1)๐(๐๐ก+1|๐๐ก)๐ (๐๐ก)
๐๐ก๐๐ก+1
๐จ๐๐จ๐๐
Prediction
22
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
Prediction
๐(๐๐ก+2 ๐1, โฆ , ๐๐ก = ๐(๐๐ก+2|๐๐ก+2) ๐(๐๐ก+2|๐1, โฆ , ๐๐ก)
๐๐ก+2
and
23
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
ฮพ ๐๐กโ1, ๐๐ก = ๐(๐๐กโ1, ๐๐ก|๐1, โฆ , ๐๐)
=๐(๐1, โฆ , ๐๐|๐๐กโ1, ๐๐ก)๐(๐๐กโ1, ๐๐ก)
๐(๐1, โฆ , ๐๐)
=๐ ๐1, โฆ , ๐๐กโ1 ๐๐กโ1 ๐(๐๐ก|๐๐ก)๐ ๐๐ก+1, โฆ , ๐๐ ๐๐ก ๐(๐๐ก|๐๐กโ1)๐(๐๐กโ1)
๐(๐)
=ฮฑ ๐๐กโ1 ๐ ๐๐ก ๐๐ก ๐ ๐๐ก ๐๐กโ1 ๐ฝ(๐๐ก)
๐(๐1, โฆ , ๐๐ต)
Smoothed transition
24
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
We need to define an EM algorithm
Parameter Estimation (Baum-Welch algorithm)
๐(๐๐กโ1, ๐๐ก|๐1, โฆ , ๐๐)
and we have all the necessary ingredients
๐ ๐๐ก ๐1, ๐2, โฆ , ๐๐ป
25
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
Summary
We saw how to perform
(a) Filtering
(b) Smoothing
(c) Evaluation
(d) Prediction
Next we will see how to perform (e) EM and (f) Decoding
26
Recommended