Hidden Markov Model
Prof. Yan Wang
Woodruff School of Mechanical Engineering
Georgia Institute of Technology
Atlanta, GA 30332, [email protected]
Multiscale Systems Engineering Research Group
Learning Objectives
To understand the hidden Markov model (HMM) as a generalization of the Markov chain
To understand the three basic problems (evaluation, decoding, and learning) in HMM construction and application
Hidden Markov Model (HMM)
HMM is an extension of the regular Markov chain in which the state variables q's are not directly observable
All statistical inference about the Markov chain itself has to be done in terms of the observables o's
[Figure: graphical model of an HMM — a hidden state chain q_1, …, q_{t-1}, q_t, q_{t+1}, …, q_T, with each hidden state q_t emitting an observable o_t]
HMM
The observations o's are conditionally independent of one another given the state sequence {q_t}.
However, {o_t} is neither an independent sequence nor a Markov chain itself.
HMM Components
State sequence: Q = {q_1, q_2, …, q_T}, where each q_t takes one of N possible values
Observation sequence: O = {o_1, o_2, …, o_T}, where each o_t takes one of M possible symbols {v_1, v_2, …, v_M}
State transition matrix: A = (a_{ij})_{N×N}, where a_{ij} = P(q_{t+1} = j | q_t = i)
Observation matrix: B = (b_{ik})_{N×M}, where b_{ik} = b_i(v_k) = P(o_t = v_k | q_t = i)
Initial state distribution: Δ = (δ_i)_{1×N}, where δ_i = P(q_1 = i)
Model parameters: λ = {A, B, Δ}
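The components above can be written down concretely. A minimal sketch in Python/NumPy, using a hypothetical 2-state, 2-symbol model whose numbers are purely illustrative:

```python
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all numbers are illustrative only.
# States: 0 and 1; observation symbols: v_0 and v_1.
A = np.array([[0.7, 0.3],        # a_ij = P(q_{t+1} = j | q_t = i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # b_ik = P(o_t = v_k | q_t = i)
              [0.2, 0.8]])
delta = np.array([0.5, 0.5])     # delta_i = P(q_1 = i)

# lambda = {A, B, Delta}: each row of A and B, and delta itself, must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(delta.sum(), 1.0)
```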
Three Basic Problems in HMM
Evaluation: Given a model with parameters λ and an observation sequence O = {o_1, o_2, …, o_T}, what is the probability P(O|λ) that the model generates those observations?
Decoding: Given a model with parameters λ and an observation sequence O = {o_1, o_2, …, o_T}, what is the most likely state sequence Q = {q_1, q_2, …, q_T} in the model that produces the observations?
Learning: Given a set of observations O = {o_1, o_2, …, o_T}, how can we find the model parameters λ with the maximum likelihood P(O|λ)?
Evaluation Problem
Given an O={o1,o2,…,oT}, P(O|λ)=?
Naïve algorithm:
$$P(O \mid \lambda) = \sum_{Q^{(d)}} P(O, Q^{(d)} \mid \lambda) = \sum_{Q^{(d)}} P(O \mid Q^{(d)}, \lambda) \, P(Q^{(d)} \mid \lambda)$$
where Q^{(d)} is one of all possible combinations of state sequences.
Assuming conditional independence between observations,
$$P(O \mid Q^{(d)}, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda) = \prod_{t=1}^{T} b_{q_t}(o_t)$$
and, because of lemma 3.1 (the Markov property),
$$P(Q^{(d)} \mid \lambda) = \delta_{q_1} \prod_{t=1}^{T-1} P(q_{t+1} \mid q_t, \lambda) = \delta_{q_1} \prod_{t=1}^{T-1} a_{q_t q_{t+1}}$$
However, the number of possible combinations of state sequences is huge: the sum runs over N^T paths.
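The naïve sum over all N^T state sequences can be sketched directly (Python/NumPy; the toy parameters are hypothetical and illustrative only):

```python
import itertools
import numpy as np

# Hypothetical toy model (illustrative numbers only).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
delta = np.array([0.5, 0.5])

def naive_evaluate(O, A, B, delta):
    """P(O|lambda) by enumerating all N**T state sequences Q."""
    N, T = A.shape[0], len(O)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):
        # P(O|Q,lambda) P(Q|lambda)
        #   = delta_{q1} b_{q1}(o1) * prod_t a_{q_{t-1} q_t} b_{q_t}(o_t)
        p = delta[Q[0]] * B[Q[0], O[0]]
        for t in range(1, T):
            p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]
        total += p
    return total

p = naive_evaluate([0, 0, 1], A, B, delta)   # feasible only for tiny T
```

The cost is O(T·N^T), which is why the forward algorithm is needed in practice.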
Evaluation Problem –Forward Algorithm
Define a forward variable
$$\alpha_t(i) = P(o_1, o_2, \dots, o_t, q_t = i \mid \lambda)$$
which can be recursively calculated. Initialization:
$$\alpha_1(i) = P(o_1, q_1 = i \mid \lambda) = P(q_1 = i \mid \lambda) \, P(o_1 \mid q_1 = i, \lambda) = \delta_i \, b_i(o_1)$$
Recursion:
$$\begin{aligned}
\alpha_{t+1}(i) &= P(o_1, \dots, o_t, o_{t+1}, q_{t+1} = i \mid \lambda) \\
&= P(o_{t+1} \mid q_{t+1} = i, \lambda) \, P(o_1, \dots, o_t, q_{t+1} = i \mid \lambda) \\
&= P(o_{t+1} \mid q_{t+1} = i, \lambda) \sum_{q_t} P(o_1, \dots, o_t, q_t, q_{t+1} = i \mid \lambda) \\
&= P(o_{t+1} \mid q_{t+1} = i, \lambda) \sum_{q_t} P(q_{t+1} = i \mid q_t, \lambda) \, P(o_1, \dots, o_t, q_t \mid \lambda) \\
&= b_i(o_{t+1}) \sum_{q_t} a_{q_t i} \, \alpha_t(q_t)
\end{aligned}$$
Termination:
$$P(O \mid \lambda) = \sum_{i=1}^{N} P(O, q_T = i \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
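A minimal sketch of the forward recursion in Python/NumPy (the toy parameters are hypothetical and illustrative only):

```python
import numpy as np

# Hypothetical toy model (illustrative numbers only).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
delta = np.array([0.5, 0.5])

def forward(O, A, B, delta):
    """Forward variables; alpha[t, i] is alpha_{t+1}(i) in 0-based indexing."""
    T, N = len(O), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = delta * B[:, O[0]]                    # alpha_1(i) = delta_i b_i(o_1)
    for t in range(T - 1):
        # alpha_{t+1}(i) = b_i(o_{t+1}) * sum_j alpha_t(j) a_{ji}
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    return alpha

alpha = forward([0, 0, 1], A, B, delta)
p_O = float(alpha[-1].sum())                         # P(O|lambda) = sum_i alpha_T(i)
```

The cost is O(T·N²), versus O(T·N^T) for the naïve enumeration.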
Evaluation Problem –Backward Algorithm
Define a backward variable
$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \dots, o_T \mid q_t = i, \lambda)$$
which can be recursively calculated. Initialization:
$$\beta_T(i) = 1$$
Recursion:
$$\begin{aligned}
\beta_t(i) &= P(o_{t+1}, \dots, o_T \mid q_t = i, \lambda) \\
&= \sum_{q_{t+1}} P(o_{t+1}, \dots, o_T, q_{t+1} \mid q_t = i, \lambda) \\
&= \sum_{q_{t+1}} P(q_{t+1} \mid q_t = i, \lambda) \, P(o_{t+1} \mid q_{t+1}, \lambda) \, P(o_{t+2}, \dots, o_T \mid q_{t+1}, \lambda) \\
&= \sum_{q_{t+1}} a_{i q_{t+1}} \, b_{q_{t+1}}(o_{t+1}) \, \beta_{t+1}(q_{t+1})
\end{aligned}$$
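A matching sketch of the backward recursion (same hypothetical toy model as before). As a consistency check, P(O|λ) can also be recovered as Σ_i δ_i b_i(o_1) β_1(i), which should match the forward pass:

```python
import numpy as np

# Hypothetical toy model (illustrative numbers only).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
delta = np.array([0.5, 0.5])

def backward(O, A, B):
    """Backward variables; beta[t, i] is beta_{t+1}(i) in 0-based indexing."""
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                    # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

O = [0, 0, 1]
beta = backward(O, A, B)
p_O = float(np.sum(delta * B[:, O[0]] * beta[0]))     # P(O|lambda)
```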
Decoding Problem
Given an O = {o_1, o_2, …, o_T}, find the state sequence Q* = {q_1*, q_2*, …, q_T*} that maximizes P(Q | O, λ), or equivalently the joint probability P(O, Q | λ)
Viterbi Algorithm is a dynamic programming method to solve the decoding problem
Dynamic Programming
Breaking down a complex problem into subproblems in a recursive manner.
e.g., searching for the shortest route in the traveling salesman problem
Decoding Problem – Viterbi Algorithm
Define an auxiliary probability
$$\rho_t(i) := \max_{q_1, \dots, q_{t-1}} P(q_1, \dots, q_{t-1}, q_t = i, o_1, \dots, o_t \mid \lambda)$$
which is the highest probability that a single path leads to q_t = i at time t.
Recursively,
$$\begin{aligned}
\rho_{t+1}(j) &= \max_i \left\{ \rho_t(i) \, P(q_{t+1} = j \mid q_t = i, \lambda) \right\} P(o_{t+1} \mid q_{t+1} = j, \lambda) \\
&= \max_i \left\{ \rho_t(i) \, a_{ij} \right\} b_j(o_{t+1})
\end{aligned}$$
Decoding Problem – Viterbi Algorithm – cont'd
Define an auxiliary variable
$$\psi_{t+1}(j) = \arg\max_i \left\{ \rho_t(i) \, a_{ij} \, b_j(o_{t+1}) \right\} = \arg\max_i \left\{ \rho_t(i) \, a_{ij} \right\}$$
to store the optimal state at time t that leads to state j at time t+1.
The algorithm is:
1. Initialize
$$\rho_1(j) = \delta_j \, b_j(o_1), \quad \psi_1(j) = 0, \quad \forall j = 1, \dots, N$$
2. Recursion
$$\rho_{t+1}(j) = \max_i \left\{ \rho_t(i) \, a_{ij} \right\} b_j(o_{t+1}), \quad \psi_{t+1}(j) = \arg\max_i \left\{ \rho_t(i) \, a_{ij} \right\}$$
Decoding Problem – Viterbi Algorithm – cont'd
3. Terminate
• The optimal probability: $P^* = \max_j \left\{ \rho_T(j) \right\}$
• The optimal final state: $q_T^* = \arg\max_j \left\{ \rho_T(j) \right\}$
4. Backtrack the state sequence:
$$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, \dots, 1$$
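Steps 1-4 above can be sketched compactly in Python/NumPy (the toy model is hypothetical and illustrative only):

```python
import numpy as np

# Hypothetical toy model (illustrative numbers only).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
delta = np.array([0.5, 0.5])

def viterbi(O, A, B, delta):
    T, N = len(O), A.shape[0]
    rho = np.zeros((T, N))               # rho_t(j): best single-path probability ending in j
    psi = np.zeros((T, N), dtype=int)    # psi_t(j): best predecessor of j
    rho[0] = delta * B[:, O[0]]          # 1. initialize: rho_1(j) = delta_j b_j(o_1)
    for t in range(1, T):                # 2. recursion
        scores = rho[t - 1][:, None] * A          # rho_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        rho[t] = scores.max(axis=0) * B[:, O[t]]
    q = np.zeros(T, dtype=int)           # 3. terminate: q_T* = argmax_j rho_T(j)
    q[-1] = rho[-1].argmax()
    for t in range(T - 2, -1, -1):       # 4. backtrack: q_t* = psi_{t+1}(q_{t+1}*)
        q[t] = psi[t + 1, q[t + 1]]
    return q.tolist(), float(rho[-1].max())

path, p_star = viterbi([0, 0, 1], A, B, delta)
```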
Learning Problem
Given an O={o1,o2,…,oT}, find a λ*={A*, B*, Δ*} with the maximum likelihood P(O|λ)
Finding the global optimum would require searching over all possible state and observation sequences
Instead, the Baum-Welch algorithm is usually used as a heuristic search
$$\lambda^* = \arg\max_\lambda P(O \mid \lambda) = \arg\max_\lambda \log P(O \mid \lambda)$$
$$\lambda^* = \arg\max_\lambda \sum_{O^{(d)}} \log \sum_{Q^{(d)}} P(O^{(d)}, Q^{(d)} \mid \lambda)$$
Learning Problem –Baum-Welch Algorithm
The algorithm:
1. Guess some parameters λ
2. Determine some "probable paths" {Q^{(1)}, …, Q^{(d)}}
3. Estimate the number of transitions from state i to state j, $\hat{a}_{ij}$, given the current estimate of λ
4. Estimate the number of times observation v_k is emitted from state i, $\hat{b}_i(v_k)$
5. Estimate the initial distribution $\hat{\delta}_i$
6. Re-estimate λ′ from the $\hat{a}_{ij}$'s and $\hat{b}_i(v_k)$'s
7. If λ′ and λ are close enough, stop; otherwise, assign λ = λ′ and go back to step 2
Learning Problem –Baum-Welch Algorithm – cont’d
Define an auxiliary likelihood
$$\xi_t(i,j) = P(q_t = i, q_{t+1} = j \mid o_1, \dots, o_T, \lambda)$$
which is the probability that a transition from q_t = i to q_{t+1} = j occurs at time t, given the complete observations {o_1, o_2, …, o_T} and the model parameters λ.
Learning Problem –Baum-Welch Algorithm – cont’d
$$\begin{aligned}
\xi_t(i,j) &= P(q_t = i, q_{t+1} = j \mid o_1, \dots, o_T, \lambda) \\
&= \frac{P(q_t = i, q_{t+1} = j, o_1, \dots, o_T \mid \lambda)}{P(o_1, \dots, o_T \mid \lambda)} \\
&= \frac{P(o_1, \dots, o_t, q_t = i \mid \lambda) \, P(q_{t+1} = j \mid q_t = i, \lambda) \, P(o_{t+1} \mid q_{t+1} = j, \lambda) \, P(o_{t+2}, \dots, o_T \mid q_{t+1} = j, \lambda)}{P(o_1, \dots, o_T \mid \lambda)} \\
&= \frac{\alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_T(i)}
\end{aligned}$$
Learning Problem – Baum-Welch Algorithm – cont'd
Estimate the parameters:
$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} P(q_t = i, q_{t+1} = j \mid o_1, \dots, o_T, \lambda)}{\sum_{t=1}^{T-1} P(q_t = i \mid o_1, \dots, o_T, \lambda)} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_t(i,k)}$$
$$\hat{b}_i(v_k) = \frac{\sum_{t=1}^{T-1} P(o_t = v_k, q_t = i \mid o_1, \dots, o_T, \lambda)}{\sum_{t=1}^{T-1} P(q_t = i \mid o_1, \dots, o_T, \lambda)} = \frac{\sum_{t:\, o_t = v_k} \sum_{j=1}^{N} \xi_t(i,j)}{\sum_{t=1}^{T-1} \sum_{j=1}^{N} \xi_t(i,j)}$$
$$\hat{\delta}_i = P(q_1 = i \mid o_1, \dots, o_T, \lambda) = \sum_{k=1}^{N} P(q_1 = i, q_2 = k \mid o_1, \dots, o_T, \lambda) = \sum_{k=1}^{N} \xi_1(i,k)$$
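One re-estimation pass using these formulas can be sketched as follows (Python/NumPy; the toy model and observation sequence are hypothetical, and the forward/backward passes are repeated inline for self-containment):

```python
import numpy as np

# Hypothetical toy model and observations (illustrative only).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
delta = np.array([0.5, 0.5])
O = [0, 0, 1, 1, 0]
T, N = len(O), A.shape[0]

# Forward and backward passes.
alpha = np.zeros((T, N)); beta = np.zeros((T, N))
alpha[0] = delta * B[:, O[0]]
for t in range(T - 1):
    alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
p_O = alpha[-1].sum()                               # P(O|lambda)

# xi_t(i,j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1]) / p_O

gamma = xi.sum(axis=2)                              # sum_k xi_t(i,k) = P(q_t = i | O, lambda)
A_new = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]             # a_ij estimate
B_new = np.zeros_like(B)
for k in range(B.shape[1]):
    mask = np.array(O[:T - 1]) == k                 # time steps t <= T-1 with o_t = v_k
    B_new[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)   # b_i(v_k) estimate
delta_new = gamma[0]                                # delta_i estimate = sum_k xi_1(i,k)
```

Iterating this pass, with λ replaced by the new estimates until λ′ ≈ λ, gives the Baum-Welch (EM) procedure.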
Learning Problem – Baum-Welch Algorithm – cont'd
How can we measure whether two models, λ′ and λ, are close enough?
Compare the probabilities each model assigns to the observation sequences,
$$p = \sum_{O^{(d)}} P(O^{(d)} \mid \lambda)$$
using the cross entropy between the two distributions
$$\sum_{x} p_1(x) \log \frac{p_1(x)}{p_2(x)}$$
HMM Applications
HMM has been applied in many fields that are based on the analysis of discrete-valued time series, such as
speech recognition (Rabiner, 1989)
image recognition
genetic profiling and classification (Eddy, 1998)
Kalman Filter
The Kalman filter can be regarded as a special case of the HMM
It is also known as the Gaussian linear state-space model
The states depend linearly on their history, subject to white process noise:
$$\mathbf{x}_t = \mathbf{C} \mathbf{x}_{t-1} + \mathbf{w}_t$$
The observations also depend linearly on the states, subject to measurement noise:
$$\mathbf{y}_t = \mathbf{D} \mathbf{x}_t + \mathbf{v}_t$$
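The two state-space equations can be simulated directly. A minimal sketch with hypothetical constant-velocity matrices C and D and assumed noise levels (all numbers are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.array([[1.0, 1.0],        # hypothetical transition: position += velocity
              [0.0, 1.0]])
D = np.array([[1.0, 0.0]])       # hypothetical observation: position only
w_std, v_std = 0.05, 0.5         # assumed process / measurement noise std devs

x = np.zeros(2)
states, observations = [], []
for _ in range(20):
    x = C @ x + w_std * rng.standard_normal(2)   # x_t = C x_{t-1} + w_t
    y = D @ x + v_std * rng.standard_normal(1)   # y_t = D x_t + v_t
    states.append(x.copy())
    observations.append(y.copy())
```

Unlike the discrete HMM above, the hidden state here is continuous and Gaussian, which is what makes exact filtering tractable in closed form.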
Summary
The hidden Markov model is a generalization of the Markov chain
Further Readings
Rabiner L.R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2): 257-286
Eddy S.R. (1998) Profile hidden Markov models. Bioinformatics, 14(9): 755-763
Dempster A.P., Laird N.M., and Rubin D.B. (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1): 1-38
Wu C.F.J. (1983) On the convergence properties of the EM algorithm. The Annals of Statistics, 11(1): 95-103
HMM software packages: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html