Fundamentals of Hidden Markov Model


Mehmet Yunus Dönmez


    Markov Random Processes

A random sequence has the Markov property if its distribution is determined solely by its current state. Any random process having this property is called a Markov random process.

For observable state sequences (the state is known from the data), this leads to a Markov chain model.

For non-observable states, this leads to a Hidden Markov Model (HMM).


    HMM Elements

An HMM for discrete symbol observations:

- $N$: the number of states in the model; the states are labeled $\{1, 2, \dots, N\}$, and $q_t$ denotes the state at time $t$.

- $M$: the number of distinct observation symbols per state, with symbol alphabet $V = \{v_1, v_2, \dots, v_M\}$.


    HMM Elements (2)

- $A = \{a_{ij}\}$: the state-transition probability distribution, where

  $a_{ij} = p[q_{t+1} = j \mid q_t = i], \quad 1 \le i, j \le N$

- $B = \{b_j(k)\}$: the observation symbol probability distribution, where

  $b_j(k) = p[o_t = v_k \mid q_t = j], \quad 1 \le k \le M$


    HMM Elements (3)

- $\pi = \{\pi_i\}$: the initial state distribution, where

  $\pi_i = p[q_1 = i], \quad 1 \le i \le N$

Compact notation for an HMM model:

$\lambda = (A, B, \pi)$

For a state sequence $q_1, q_2, \dots, q_T$, the joint probability of the observations and the state sequence factorizes as

$p(o_1, o_2, \dots, o_T, q_1, q_2, \dots, q_T \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1)\, a_{q_1 q_2}\, b_{q_2}(o_2)\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}\, b_{q_T}(o_T)$
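To make the compact notation concrete, here is a minimal sketch in Python/NumPy of a hypothetical two-state, three-symbol HMM; the numbers are invented purely for illustration and are not from the slides.

```python
import numpy as np

# Hypothetical toy HMM with N = 2 states and M = 3 observation symbols
# (values made up for illustration only).
N, M = 2, 3

# pi[i] = P(q_1 = i): initial state distribution
pi = np.array([0.6, 0.4])

# A[i, j] = a_ij = P(q_{t+1} = j | q_t = i): state-transition probabilities
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# B[j, k] = b_j(k) = P(o_t = v_k | q_t = j): observation symbol probabilities
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

# The compact notation lambda = (A, B, pi) is just this triple of arrays.
# Each distribution must sum to 1:
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```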


    A General Case HMM


    HMM Generator

An observation sequence $o_1, o_2, \dots, o_T$ can be generated as follows (a minimal sampling sketch follows below):

1. Choose an initial state $q_1 = i$ according to the initial state distribution $\pi$.
2. Set $t = 1$.
3. Choose $o_t = v_k$ according to the symbol probability distribution in state $i$, i.e. $b_i(k)$.
4. Transit to a new state $q_{t+1} = j$ according to the state-transition probability distribution for state $i$, i.e. $a_{ij}$.
5. Set $t = t + 1$; return to step 3 if $t \le T$; otherwise, terminate the procedure.
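The generation procedure can be sketched directly with NumPy. This assumes the toy pi, A, B arrays from the earlier example; the function name generate_sequence is ours, not from the slides.

```python
import numpy as np

def generate_sequence(pi, A, B, T, rng=None):
    """Sample T hidden states and T observation symbols from an HMM (a sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    N, M = B.shape
    states, observations = [], []
    # Steps 1-2: choose the initial state from pi, set t = 1.
    state = rng.choice(N, p=pi)
    for _ in range(T):
        states.append(state)
        # Step 3: emit a symbol according to b_state(.).
        observations.append(rng.choice(M, p=B[state]))
        # Step 4: move to the next state according to row `state` of A.
        state = rng.choice(N, p=A[state])
    # Step 5 stops the loop after T observations.
    return np.array(states), np.array(observations)
```

For example, generate_sequence(pi, A, B, T=10) returns ten hidden states together with the ten symbols they emitted; only the symbols would be visible to an observer.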


    HMM Properties

The initial distribution is often simplified to $\pi(s_1) = 1$ and $\pi(s_i) = 0$ for $i \ne 1$, i.e. the model always starts in state $s_1$.

Obviously $\sum_{j} a_{ij} = 1$ for all $i$.

Discrete HMMs: $V = \{v_1, v_2, \dots, v_M\}$

Continuous HMMs: $V = \mathbb{R}^d$


    HMM Properties (2)

The term "hidden":

- we only have access to the visible symbols (observations)
- we must draw conclusions without knowing the hidden sequence of states

Causal: probabilities depend only on previous states.

Ergodic: every state can be visited in a transition sequence for any given initial state.

Final or absorbing state: a state which, once entered, is never left.


    3 Basic Problems

    The Evaluation Problem

- given an HMM $\lambda$
- given an observation sequence $o_1, o_2, \dots, o_T$
- compute the probability of the observation sequence, $p(o_1, o_2, \dots, o_T \mid \lambda)$


    3 Basic Problems (2)

    The Decoding Problem

- given an HMM $\lambda$
- given an observation sequence $o_1, o_2, \dots, o_T$
- compute the most likely state sequence $s_{q_1}, s_{q_2}, \dots, s_{q_T}$, i.e.

$\arg\max_{q_1, \dots, q_T} p(o_1, o_2, \dots, o_T, q_1, \dots, q_T \mid \lambda)$


    3 Basic Problems (3)

    The learning / optimization problem

- given an HMM $\lambda$
- given an observation sequence $o_1, o_2, \dots, o_T$
- find a new HMM $\lambda_1$ such that

$p(o_1, o_2, \dots, o_T \mid \lambda_1) \ge p(o_1, o_2, \dots, o_T \mid \lambda)$


    The Evaluation Problem

We know that, for a given state sequence $q_1, q_2, \dots, q_T$,

$p(o_1, o_2, \dots, o_T, q_1, \dots, q_T \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{k=2}^{T} a_{q_{k-1} q_k}\, b_{q_k}(o_k)$

From this, summing over all possible state sequences:

$p(o_1, o_2, \dots, o_T \mid \lambda) = \sum_{q_1 = 1}^{N} \sum_{q_2 = 1}^{N} \cdots \sum_{q_T = 1}^{N} \pi_{q_1}\, b_{q_1}(o_1) \prod_{k=2}^{T} a_{q_{k-1} q_k}\, b_{q_k}(o_k)$


The Evaluation Problem (2)

Obvious: for sufficiently large values of $T$, it is infeasible to compute the term above over all $N^T$ possible state sequences, so a more efficient solution is needed.
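To see the cost concretely, the sum over all $N^T$ state sequences can be written out literally. This brute-force sketch (our own helper, usable only for tiny $T$) assumes the toy arrays from the earlier examples and 0-based symbol indices.

```python
import itertools
import numpy as np

def evaluate_brute_force(obs, pi, A, B):
    """p(o_1..o_T | lambda) by enumerating every state sequence: O(N^T) work."""
    N, T = len(pi), len(obs)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):
        # pi_{q_1} b_{q_1}(o_1) a_{q_1 q_2} b_{q_2}(o_2) ... a_{q_{T-1} q_T} b_{q_T}(o_T)
        p = pi[q[0]] * B[q[0], obs[0]]
        for t in range(1, T):
            p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
        total += p
    return total
```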


    The Forward Algorithm

$\alpha_t(i)$: the probability of the partial observation sequence $o_1, o_2, \dots, o_t$ and state $i$ at time $t$, stored as an array $\alpha[\text{time}][\text{state}]$.

Initialization:

$\alpha_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$

Induction:

$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(o_{t+1})$


    The Forward Algorithm (2)

As a result, at the last time $T$:

$p(o_1, o_2, \dots, o_T \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$

i.e. the sum of the last row $\alpha[T][\cdot]$ of the array.
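A minimal NumPy sketch of the forward recursion, storing alpha as a [time][state] array as the slides suggest (the function name forward is ours; time indices are 0-based):

```python
import numpy as np

def forward(obs, pi, A, B):
    """alpha[t, i] = p(o_1..o_{t+1}, state i at time t+1 | lambda), 0-based t."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha[0] = pi * B[:, obs[0]]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

# Termination: p(O | lambda) = sum_i alpha_T(i)
# prob = forward(obs, pi, A, B)[-1].sum()
```

On short sequences this agrees with the brute-force enumeration above, but its cost is on the order of $N^2 T$ rather than $N^T$.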


    The Backward Algorithm

$\beta_t(i)$: the probability of the partial observation sequence $o_{t+1}, o_{t+2}, \dots, o_T$, given state $i$ at time $t$.

Initialization:

$\beta_T(i) = 1$

Induction:

$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \dots, 1$

Termination:

$p(o_1, o_2, \dots, o_T \mid \lambda) = \sum_{j=1}^{N} \pi_j\, b_j(o_1)\, \beta_1(j)$
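A matching sketch of the backward recursion (again 0-based in time); the commented termination line reproduces $p(O \mid \lambda)$ and can be checked against the forward result:

```python
import numpy as np

def backward(obs, A, B):
    """beta[t, i] = p(o_{t+2}..o_T | state i at time t+1, lambda), 0-based t."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    # Initialization: beta_T(i) = 1
    beta[T - 1] = 1.0
    # Induction: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# Termination check: p(O | lambda) = sum_j pi_j * b_j(o_1) * beta_1(j)
# prob = (pi * B[:, obs[0]] * backward(obs, A, B)[0]).sum()
```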


    The Decoding Problem

Finding the optimal state sequence associated with the given observation sequence.


    Forward-Backward

Optimality criterion: choose the states $q_t$ that are individually most likely at each time $t$.

The probability of being in state $i$ at time $t$:

$\gamma_t(i) = p(q_t = i \mid O, \lambda) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)}$

$\alpha_t(i)$ accounts for the partial observation sequence $o_1, o_2, \dots, o_t$; $\beta_t(i)$ accounts for the remainder $o_{t+1}, o_{t+2}, \dots, o_T$.
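Gamma follows from alpha and beta with one normalization; a sketch assuming the forward and backward functions from the earlier examples (the name state_posteriors is ours):

```python
import numpy as np

def state_posteriors(alpha, beta):
    """gamma[t, i] = p(state i at time t | O, lambda) from precomputed alpha, beta."""
    # Numerator alpha_t(i) * beta_t(i); each row sum equals p(O | lambda) and normalizes it.
    unnormalized = alpha * beta
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)

# Individually most likely state at each time: state_posteriors(alpha, beta).argmax(axis=1)
```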


    The Viterbi Algorithm

$\delta_t(i)$: the best score along a single path, at time $t$, which accounts for the first $t$ observations and ends in state $i$:

$\delta_{t+1}(j) = \left[\max_i \delta_t(i)\, a_{ij}\right] b_j(o_{t+1})$

Keep track of the argument that maximizes the expression above in an array $\psi_t(j)$.

The Viterbi algorithm is similar in implementation to the forward calculation; the major difference is the maximization over previous states in place of the summation.


The Complete Procedure (for finding the best state sequence)

Initialization:

$\delta_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$

$\psi_1(i) = 0$

Recursion:

$\delta_t(j) = \max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right] b_j(o_t)$

$\psi_t(j) = \arg\max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right]$

for $2 \le t \le T$, $1 \le j \le N$.


The Complete Procedure (2) (for finding the best state sequence)

Termination:

$P^* = \max_{1 \le i \le N} \left[\delta_T(i)\right]$

$q_T^* = \arg\max_{1 \le i \le N} \left[\delta_T(i)\right]$

Path (state sequence) backtracking:

$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \dots, 1$
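The whole procedure (initialization, recursion, termination, backtracking) fits in one short sketch; delta and psi follow the definitions above, the function name viterbi is ours, and time indices are 0-based:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Return (best_state_path, best_path_score) for an observation sequence."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))            # best score ending in state j at time t
    psi = np.zeros((T, N), dtype=int)   # argmax bookkeeping for backtracking
    # Initialization: delta_1(i) = pi_i * b_i(o_1), psi_1(i) = 0
    delta[0] = pi * B[:, obs[0]]
    # Recursion: delta_t(j) = max_i[delta_{t-1}(i) * a_ij] * b_j(o_t)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Termination: P* = max_i delta_T(i), q_T* = argmax_i delta_T(i)
    path = np.zeros(T, dtype=int)
    path[T - 1] = delta[T - 1].argmax()
    # Backtracking: q_t* = psi_{t+1}(q_{t+1}*)
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[T - 1].max()
```

In practice the products are usually replaced by sums of log probabilities to avoid numerical underflow on long sequences.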


The Learning / Optimization Problem

How do we adjust the model parameters $\lambda$ to maximize $P(O \mid \lambda)$?

- Parameter estimation
- Baum-Welch algorithm (EM: Expectation-Maximization)
- Iterative procedure


    Parameter Estimation

The probability of being in state $i$ at time $t$ and state $j$ at time $t+1$:

$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$
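Xi can be computed from the alpha and beta arrays returned by the forward and backward sketches above; the function name pair_posteriors is ours.

```python
import numpy as np

def pair_posteriors(obs, A, B, alpha, beta):
    """xi[t, i, j] = p(state i at time t, state j at time t+1 | O, lambda)."""
    N, T = A.shape[0], len(obs)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # Numerator: alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        num = alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]
        # Denominator: the same quantity summed over all (i, j) pairs
        xi[t] = num / num.sum()
    return xi
```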


    Parameter Estimation (2)

$\gamma_t(i)$: the probability of being in state $i$ at time $t$, given the entire observation sequence and the model.

We can relate $\gamma$ and $\xi$ by summing over $j$:

$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$


    Parameter Estimation (4)

Update $\lambda = (A, B, \pi)$ using $\xi_t(i, j)$ and $\gamma_t(i)$:

$\bar{\pi}_i = \gamma_1(i)$: the expected frequency (number of times) in state $i$ at time $t = 1$.


    Parameter Estimation (5)

New transition probability:

$\bar{a}_{ij} = \frac{\text{expected number of transitions from state } i \text{ to state } j}{\text{expected number of transitions from state } i} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$


    Parameter Estimation (6)

New observation probability:

$\bar{b}_j(k) = \frac{\text{expected number of times in state } j \text{ observing symbol } v_k}{\text{expected number of times in state } j} = \frac{\sum_{t=1,\ o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
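The three reestimation formulas translate directly into array operations. This sketch assumes gamma and xi as produced by the state_posteriors and pair_posteriors examples above (all names are ours) and 0-based symbol indices.

```python
import numpy as np

def reestimate(obs, gamma, xi, M):
    """One Baum-Welch update of (pi, A, B) from gamma[t, i] and xi[t, i, j]."""
    obs = np.asarray(obs)
    # pi_bar_i = gamma_1(i): expected frequency in state i at time t = 1
    pi_new = gamma[0]
    # a_bar_ij = sum_{t=1..T-1} xi_t(i, j) / sum_{t=1..T-1} gamma_t(i)
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    # b_bar_j(k) = sum_{t : o_t = v_k} gamma_t(j) / sum_t gamma_t(j)
    B_new = np.stack([gamma[obs == k].sum(axis=0) for k in range(M)], axis=1)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new
```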


    Parameter Estimation (7)

From $\lambda = (A, B, \pi)$, if we define a new model $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$ using the reestimation formulas above:

- The new model is more likely than the old model, in the sense that $P(O \mid \bar{\lambda}) \ge P(O \mid \lambda)$.
- The observation sequence is more likely to be produced by the new model.
- This has been proved by Baum and his colleagues.
- Iteratively using the new model in place of the old model and repeating the reestimation calculation converges to a maximum-likelihood (ML) estimate; a sketch of this iteration follows below.
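Putting the pieces together, the iterative procedure can be sketched as a loop over the helpers defined in the earlier examples (forward, backward, state_posteriors, pair_posteriors, reestimate; all of these names, and the stopping tolerance, are our own choices rather than anything specified in the slides):

```python
import numpy as np

def baum_welch(obs, pi, A, B, max_iter=100, tol=1e-6):
    """Repeat the reestimation until p(O | lambda) stops improving (a sketch)."""
    # Relies on forward, backward, state_posteriors, pair_posteriors and
    # reestimate as defined in the sketches above.
    prev_likelihood = -np.inf
    for _ in range(max_iter):
        alpha = forward(obs, pi, A, B)
        beta = backward(obs, A, B)
        gamma = state_posteriors(alpha, beta)
        xi = pair_posteriors(obs, A, B, alpha, beta)
        likelihood = alpha[-1].sum()          # p(O | current lambda)
        # Baum's result guarantees p(O | new lambda) >= p(O | old lambda).
        if likelihood - prev_likelihood < tol:
            break
        pi, A, B = reestimate(obs, gamma, xi, B.shape[1])
        prev_likelihood = likelihood
    return pi, A, B
```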


    Questions??