Hidden Markov Models (HMM) · Stefanos Zafeiriou · Adv. Statistical Machine Learning (course 495)


Goal: devise algorithms for computing posteriors in a Markov chain with hidden variables (an HMM).


๐’™1 ๐’™2 ๐’™3 ๐’™๐‘‡

Letโ€™s assume we have discrete random variables (e.g., taking 3 discrete

values ๐’™๐‘ก = {100,010,001})

๐‘(๐’™๐‘ก ๐’™1, . . , ๐’™๐‘กโˆ’1 = ๐‘(๐’™๐‘ก|๐’™๐‘กโˆ’1) Markov Property:

Stationary, Homogeneous or Time-Invariant if the distribution ๐‘ ๐’™๐‘ก ๐’™๐‘กโˆ’1 does not depend on ๐‘ก

e.g. ๐‘(๐’™๐‘ก =100|๐’™๐‘กโˆ’1 =

010)

Markov Chains with Discrete Random Variables

2

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)

The joint distribution of the chain factorizes as

$p(x_1, \ldots, x_T) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1})$

What do we need in order to describe the whole procedure?

(1) A probability for the first frame/timestamp, $p(x_1)$. To define it we need the vector $\pi = (\pi_1, \pi_2, \ldots, \pi_K)$:

$p(x_1 \mid \pi) = \prod_{c=1}^{K} \pi_c^{\,x_{1c}}$

(2) A transition probability $p(x_t \mid x_{t-1})$. To define it we need a $K \times K$ transition matrix $A = [a_{jk}]$, with $a_{jk} = p(x_{tk} = 1 \mid x_{t-1,j} = 1)$:

$p(x_t \mid x_{t-1}, A) = \prod_{k=1}^{K} \prod_{j=1}^{K} a_{jk}^{\,x_{t-1,j}\, x_{tk}}$


ฮ(๐’™|๐œฎ1, ๐1)

๐›ฎ(๐’™|๐œฎ2, ๐2)

ฮ(๐’™|๐œฎ3, ๐3)

๐’›๐‘ก =๐‘ง๐‘ก1๐‘ง๐‘ก2๐‘ง๐‘ก3

โˆˆ100,010,001

๐‘ ๐‘ฟ, ๐’ ๐œƒ

= ๐‘ ๐’™1, ๐’™2, โ‹ฏ , ๐’™๐‘‡ , ๐’›1,๐’›2, โ‹ฏ , ๐’›๐‘‡|๐œƒ

= ๐‘(๐’™๐‘ก|๐’›๐‘ก, ๐œƒ๐‘ฅ)

๐‘‡

๐‘ก=1

๐‘(๐’›๐‘ก|๐œƒ๐‘ง)

๐‘‡

๐‘ก=1

Latent Variables

4

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)

Latent Variables in a Markov Chain

Now place the latent variables on a Markov chain:

$p(z_1, \ldots, z_T) = p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1})$

so that

$p(X, Z \mid \theta) = p(x_1, x_2, \cdots, x_T, z_1, z_2, \cdots, z_T \mid \theta) = \prod_{t=1}^{T} p(x_t \mid z_t, \theta_x) \; p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1})$

[Graphical model: a chain $z_1 \to z_2 \to z_3 \to \cdots \to z_T$, with each latent $z_t$ emitting the observation $x_t$.]
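As a quick illustration of this factorization, here is a hedged sketch of ancestral sampling from such a chain-structured latent-variable model. It assumes categorical emissions via a row-stochastic matrix B (the figure above uses Gaussian emissions, but the structure is identical), and all names are illustrative:

```python
import numpy as np

# Ancestral sampling from p(Z) = p(z_1) prod_t p(z_t | z_{t-1}) and p(X | Z) = prod_t p(x_t | z_t).
# Assumes categorical emissions with a row-stochastic matrix B (illustrative choice).

def sample_hmm(pi, A, B, T, rng=np.random.default_rng(0)):
    K, M = B.shape                            # K hidden states, M observation symbols
    z = np.empty(T, dtype=int)
    x = np.empty(T, dtype=int)
    z[0] = rng.choice(K, p=pi)                # z_1 ~ p(z_1)
    x[0] = rng.choice(M, p=B[z[0]])           # x_1 ~ p(x_1 | z_1)
    for t in range(1, T):
        z[t] = rng.choice(K, p=A[z[t - 1]])   # z_t ~ p(z_t | z_{t-1})
        x[t] = rng.choice(M, p=B[z[t]])       # x_t ~ p(x_t | z_t)
    return x, z
```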

Hidden Markov Models

A casino has two dice: one is fair and one is not. Each die has 6 sides (our latent variable is which die is being used).

$z = \{10, 01\}$

$x = \{100000, 010000, 001000, 000100, 000010, 000001\}$

Emission probability:

(1) Fair die: $p(x_j = 1 \mid z_1 = 1) = \frac{1}{6}$ for all $j = 1, \ldots, 6$

(2) Loaded die: $p(x_j = 1 \mid z_2 = 1) = \frac{1}{10}$ for $j = 1, \ldots, 5$, and $p(x_6 = 1 \mid z_2 = 1) = \frac{1}{2}$

The casino player switches back and forth between the fair and the loaded die on average once every 20 turns, i.e.

$p(z_{t1} = 1 \mid z_{t-1,2} = 1) = p(z_{t2} = 1 \mid z_{t-1,1} = 1) = 0.05$

[Transition diagram: states 1 (fair) and 2 (loaded), each with self-transition probability 0.95 and switching probability 0.05.]
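Putting the casino model together as numpy arrays (the initial distribution is assumed uniform here, which the slides do not state):

```python
import numpy as np

# The casino HMM from the slides. State 1 = fair die, state 2 = loaded die.
pi = np.array([0.5, 0.5])            # initial distribution (assumed uniform; not given in the slides)
A = np.array([[0.95, 0.05],          # p(z_t | z_{t-1}): keep the same die with probability 0.95
              [0.05, 0.95]])
B = np.array([[1/6] * 6,             # fair die: uniform over the 6 faces
              [1/10] * 5 + [1/2]])   # loaded die: face 6 comes up half the time
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```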

Given a string of observations and the above model:

(1) We want to find, for a timestamp $t$, the probability of each die given the observations up to that point.
This process is called Filtering: $p(z_t \mid x_1, x_2, \ldots, x_t)$

(2) We want to find, for a timestamp $t$, the probability of each die given the whole string.
This process is called Smoothing: $p(z_t \mid x_1, x_2, \ldots, x_T)$

Observation string:
664153216162115234653214356634261655234232315142464156663246

(3) Given the observation string, find the string of hidden variables that maximizes the posterior:

$\operatorname{argmax}_{z_1, \ldots, z_t} \; p(z_1, z_2, \ldots, z_t \mid x_1, x_2, \ldots, x_t)$

This process is called Decoding (Viterbi).

(4) Find the probability of the observations under the model:

$p(x_1, x_2, \ldots, x_T)$

This process is called Evaluation.

Observation string and corresponding string of hidden states (1 = fair, 2 = loaded):
664153216162115234653214356634261655234232315142464156663246
222222222222221111111222222222222221111111111111111111122222222

[Figure: filtering, smoothing and decoding illustrated; taken from Machine Learning: A Probabilistic Perspective by K. Murphy.]

(5) Prediction:

$p(z_{t+\delta} \mid x_1, x_2, \ldots, x_t)$ and $p(x_{t+\delta} \mid x_1, x_2, \ldots, x_t)$

(6) Parameter estimation (Baum-Welch algorithm): estimate $A$, $\pi$, $\theta$.

๐‘ ๐’›๐‘ก ๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘ก

๐‘ ๐’›๐‘ก ๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘‡

argmax๐’š1โ€ฆ ๐’š๐‘ก ๐‘ ๐’š1, โ€ฆ , ๐’š๐‘ก ๐’™๐Ÿ, โ€ฆ , ๐’™๐’•

๐‘ ๐’›๐‘กโˆ’๐œ ๐’™1, ๐’™2, โ€ฆ , ๐’™๐’•

๐‘ ๐’›๐‘ก+๐›ฟ ๐’™1, ๐’™2, โ€ฆ , ๐’™๐’•

๐‘ ๐’™๐‘ก+๐›ฟ ๐’™1, ๐’™2, โ€ฆ , ๐’™๐’•

Hidden Markov Models

12

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)

Conditional independence properties of the HMM (used in the derivations that follow):

$p(x_1, \ldots, x_{t-1} \mid x_t, z_t) = p(x_1, \ldots, x_{t-1} \mid z_t)$

$p(x_1, \ldots, x_{t-1} \mid z_{t-1}, z_t) = p(x_1, \ldots, x_{t-1} \mid z_{t-1})$

$p(x_{t+1}, \ldots, x_T \mid z_t, z_{t+1}) = p(x_{t+1}, \ldots, x_T \mid z_{t+1})$

$p(x_{t+2}, \ldots, x_T \mid z_{t+1}, x_{t+1}) = p(x_{t+2}, \ldots, x_T \mid z_{t+1})$

$p(x_1, \ldots, x_T \mid z_{t-1}, z_t) = p(x_1, \ldots, x_{t-1} \mid z_{t-1}) \, p(x_t \mid z_t) \, p(x_{t+1}, \ldots, x_T \mid z_t)$

$p(x_1, \ldots, x_T \mid z_t) = p(x_1, \ldots, x_t \mid z_t) \, p(x_{t+1}, \ldots, x_T \mid z_t)$

$p(z_{T+1} \mid z_T, x_1, \ldots, x_T) = p(z_{T+1} \mid z_T)$

Smoothing: ๐‘ ๐’›๐‘ก ๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘‡

๐‘ ๐’›๐‘ก ๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘‡ =๐‘ ๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘‡ ๐’›๐‘ก ๐‘(๐’›๐‘ก)

๐‘(๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘‡)

=๐‘ ๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘ก ๐’›๐‘ก ๐‘(๐’™๐‘ก+1, โ€ฆ , ๐’™๐‘‡|๐’›๐‘ก)๐‘(๐’›๐‘ก)

๐‘(๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘‡)

Filtering: ๐‘ ๐’›๐‘ก ๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘ก

=๐‘(๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘ก , ๐’›๐‘ก)๐‘(๐’™๐‘ก+1, โ€ฆ , ๐’™๐‘ป|๐’›๐‘ก)

๐‘(๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘‡)

=๐›ผ ๐’›๐‘ก ๐›ฝ(๐’›๐‘ก)

๐‘(๐’™1, ๐’™2, โ€ฆ , ๐’™๐‘‡)

Filtering and smoothing

14

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)

๐›ผ(๐’›๐‘ก) = ๐‘(๐’™1, โ€ฆ , ๐’™๐‘ก , ๐’›๐‘ก)

= ๐‘ ๐’™1, โ€ฆ , ๐’™๐‘ก ๐’›๐‘ก ๐‘(๐’›๐‘ก)

= ๐‘ ๐’™๐‘ก ๐’›๐‘ก ๐‘ ๐’™1, โ€ฆ , ๐’™๐‘กโˆ’1 ๐’›๐‘ก ๐‘(๐’›๐‘ก)

= ๐‘ ๐’™๐‘ก ๐’›๐‘ก ๐‘(๐’™1, โ€ฆ , ๐’™๐‘กโˆ’1 , ๐’›๐‘ก)

= ๐‘ ๐’™๐‘ก ๐’›๐‘ก ๐‘(๐’™1, โ€ฆ , ๐’™๐‘กโˆ’1 , ๐’›๐‘ก , ๐’›๐‘กโˆ’1)

๐’›๐‘กโˆ’1

= ๐‘ ๐’™๐‘ก ๐’›๐‘ก ๐‘(๐’™1, โ€ฆ , ๐’™๐‘กโˆ’1 , ๐’›๐‘ก|๐’›๐‘กโˆ’1)p(๐’›๐‘กโˆ’1) ๐’›๐‘กโˆ’1

= ๐‘ ๐’™๐‘ก ๐’›๐‘ก ๐‘(๐’™1, โ€ฆ , ๐’™๐‘กโˆ’1 |๐’›๐‘กโˆ’1, ๐’›๐‘ก)p(๐’›๐‘ก|๐’›๐‘กโˆ’1)p(๐’›๐‘กโˆ’1) ๐’›๐‘กโˆ’1

Forward Probabilities

15

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)

= ๐‘ ๐’™๐‘ก ๐’›๐‘ก ๐‘ ๐’™1, โ€ฆ , ๐’™๐‘กโˆ’1 ๐’›๐‘กโˆ’1 p(๐’›๐‘กโˆ’1)p(๐’›๐‘ก|๐’›๐‘กโˆ’1) ๐’›๐‘กโˆ’1

= ๐‘ ๐’™๐‘ก ๐’›๐‘ก ๐‘(๐’™1, โ€ฆ , ๐’™๐‘กโˆ’1 , ๐’›๐‘กโˆ’1)p(๐’›๐‘ก|๐’›๐‘กโˆ’1) ๐’›๐‘กโˆ’1

= ๐‘ ๐’™๐‘ก ๐’›๐‘ก ฮฑ(๐’›๐‘กโˆ’1)p(๐’›๐‘ก|๐’›๐‘กโˆ’1) ๐’›๐‘กโˆ’1

Forward Probabilities

Recursive formula and initial condition ฮฑ(๐’›1)

๐›ผ ๐’›1 = ๐‘ ๐’™1, ๐’›1 = ๐‘ ๐’›1 ๐‘(๐’™1|๐’›1) = ๐œ‹๐‘˜๐‘(๐’™1|๐‘ง1๐‘˜)๐‘ง1๐‘˜

2

๐‘˜=1

16

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)

๐›ผ ๐‘ง๐‘ก1 = ๐‘(๐’™๐‘ก|๐‘ง๐‘ก1 = 1) ๐‘Ž11๐‘Ž ๐‘ง๐‘กโˆ’1 1 + ๐‘Ž21๐‘Ž ๐‘ง๐‘กโˆ’1 2

๐‘Ž ๐‘ง๐‘ก 2 = ๐‘(๐’™๐‘ก|๐‘ง๐‘ก 2 = 1) ๐‘Ž12๐‘Ž ๐‘ง๐‘กโˆ’1 1 + ๐‘Ž22๐‘Ž ๐‘ง๐‘กโˆ’1 2

In total ๐‘‚(22) computation (in general O(๐พ2))

p(๐’™๐‘ก|๐‘ง๐‘ก1 = 1) 1 1

2

๐‘Ž11

๐‘Ž21

๐›ผ ๐‘ง๐‘กโˆ’1 1 ๐›ผ ๐‘ง๐‘ก 1

๐‘Ž ๐‘ง๐‘กโˆ’1 2

๐‘(๐’™๐‘ก|๐‘ง๐‘ก 2 = 1)

1

2 2

๐‘Ž12

๐‘Ž22

๐‘Ž ๐‘ง๐‘กโˆ’1 1

๐‘Ž ๐‘ง๐‘กโˆ’1 2

Forward Probabilities

17

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
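A hedged numpy sketch of this forward pass for discrete emissions, reusing the illustrative pi, A, B arrays defined above. In practice the α values shrink geometrically with t, so real implementations rescale them at every step; scaling is omitted here to match the equations:

```python
import numpy as np

# Forward (alpha) recursion for discrete emissions.
# alpha[t, k] = p(x_1, ..., x_t, z_t = k). Cost: O(K^2) per time step.

def forward(x, pi, A, B):
    """x: length-T array of observed symbol indices."""
    T, K = len(x), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, x[0]]                       # alpha(z_1) = pi_k * p(x_1 | z_1 = k)
    for t in range(1, T):
        alpha[t] = B[:, x[t]] * (alpha[t - 1] @ A)   # p(x_t | z_t) * sum_j alpha_{t-1}(j) a_{jk}
    return alpha
```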

๐‘ ๐’™1, โ€ฆ , ๐’™๐‘ก = ๐‘ ๐’™1, โ€ฆ , ๐’™๐‘ก , ๐’›๐‘ก๐’›๐‘ก

= ฮฑ ๐’›๐‘ก๐’›๐‘ก

= ฮฑ(๐‘ง๐‘ก1)+ฮฑ(๐‘ง๐‘ก2)!!

๐‘ ๐’›๐‘ก|๐’™1, โ€ฆ , ๐’™๐‘ก =๐‘(๐’™1, โ€ฆ , ๐’™๐‘ก , ๐’›๐‘ก)

๐‘(๐’™1, โ€ฆ , ๐’™๐‘ก )=

ฮฑ(๐’›๐‘ก)

ฮฑ(๐’›๐‘ก)๐’›๐‘ก

= ๐‘Ž (๐’›๐‘ก)

Evaluation: How can we compute ๐‘(๐’™1, โ€ฆ , ๐’™๐‘‡ )?

๐‘(๐’™1, โ€ฆ , ๐’™๐›ต ) = ๐‘(๐’™1, โ€ฆ , ๐’™๐›ต , ๐’›๐›ต)

๐’›๐›ต

= ฮฑ(๐’›๐›ต)

๐’›๐›ต

Filtering

Filtering

18

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
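Continuing the sketch, filtering and evaluation drop straight out of the forward pass (the roll string is the one from the slides; forward, pi, A, B are the illustrative definitions above):

```python
import numpy as np

# Filtering and evaluation from the forward pass (continues the earlier sketch).
rolls = "664153216162115234653214356634261655234232315142464156663246"
x = np.array([int(c) - 1 for c in rolls])            # map die faces 1..6 to symbol indices 0..5

alpha = forward(x, pi, A, B)
likelihood = alpha[-1].sum()                         # evaluation: p(x_1, ..., x_T)
filtered = alpha / alpha.sum(axis=1, keepdims=True)  # filtering: p(z_t | x_1, ..., x_t) for every t
print(filtered[-1])                                  # probability of fair vs loaded at the last roll
```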

๐›ฝ ๐’›๐‘ก = ๐‘ ๐’™๐‘ก+1, โ€ฆ , ๐’™๐‘‡ ๐’›๐‘ก

= ๐‘ ๐’™๐‘ก+1, โ€ฆ , ๐’™๐‘‡ , ๐’›๐‘ก+1 ๐’›๐‘ก

๐’›๐‘ก+1

= ๐‘ ๐’™๐‘ก+1, โ€ฆ , ๐’™๐‘‡ ๐’›๐‘ก, ๐’›๐‘ก+1 ๐‘(๐’›๐‘ก+1|๐’›๐‘ก)

๐’›๐‘ก+1

= ๐‘ ๐’™๐‘ก+1, โ€ฆ , ๐’™๐‘‡ ๐’›๐‘ก+1 ๐‘(๐’›๐‘ก+1|๐’›๐‘ก)

๐’›๐‘ก+1

= ๐‘ ๐’™๐‘ก+2, โ€ฆ , ๐’™๐‘‡ ๐’›๐‘ก+1 ๐‘(๐’™๐‘ก+1|๐’›๐‘ก+1)๐‘(๐’›๐‘ก+1|๐’›๐‘ก)

๐’›๐‘ก+1

= ๐›ฝ(๐’›๐‘ก+1)๐‘(๐’™๐‘ก+1|๐’›๐‘ก+1)๐‘(๐’›๐‘ก+1|๐’›๐‘ก)

๐’›๐‘ก+1

Backward probabilities

19

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
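A matching numpy sketch of the backward pass (same illustrative names; again unscaled to mirror the equations):

```python
import numpy as np

# Backward (beta) recursion for discrete emissions.
# beta[t, k] = p(x_{t+1}, ..., x_T | z_t = k), with beta[T-1, :] = 1.

def backward(x, A, B):
    T, K = len(x), A.shape[0]
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        # sum over z_{t+1}: beta(z_{t+1}) * p(x_{t+1} | z_{t+1}) * p(z_{t+1} | z_t)
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])
    return beta
```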

๐‘(๐’™๐‘ก|๐‘ง๐‘ก+1 1 = 1)

๐‘(๐’™๐‘ก|๐‘ง๐‘ก+1 2 = 1)

1 1

2

๐‘Ž11

๐‘Ž21

๐›ฝ ๐‘ง๐‘ก 1 ๐›ฝ ๐‘ง๐‘ก+1 1

ฮฒ ๐‘ง๐‘ก 2 2

๐›ฝ ๐‘ง๐‘ก+1 2

Computing Backward probabilities

๐›ฝ ๐‘ง๐‘ก 1 = ๐›ฝ(๐‘ง๐‘ก+1 1)๐‘Ž11๐‘ ๐’™๐‘ก ๐‘ง๐‘ก+1 1 = 1

+๐›ฝ(๐‘ง๐‘ก+1 2)๐‘Ž12๐‘ ๐’™๐‘ก ๐‘ง๐‘ก+1 2 = 1

20

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)

p(๐’™๐‘ก|๐‘ง๐‘ก+1 2 = 1)

๐›ฝ ๐‘ง๐‘ก 2 = ฮฒ(๐‘ง๐‘ก+1 1)๐‘Ž21๐‘ ๐’™๐‘ก ๐‘ง๐‘ก+1 1 = 1

+๐›ฝ ๐‘ง๐‘ก+1 2 ๐‘Ž22๐‘ ๐’™๐‘ก ๐‘ง๐‘ก+1 2 = 1

๐›ฝ ๐’›๐‘‡ = 1 ๐‘(๐’›๐‘‡ ๐’™1, โ€ฆ , ๐’™๐›ต =๐‘(๐’™1, โ€ฆ , ๐’™๐›ต , ๐’›๐‘ป)

๐‘(๐’™1, โ€ฆ , ๐’™๐›ต )=

๐›ผ ๐’›๐‘‡ โˆ— 1

๐‘(๐’™1, โ€ฆ , ๐’™๐›ต )

p(๐’™๐‘ก|๐‘ง๐‘ก+1 1 = 1) 1

2 2

๐‘Ž21

๐‘Ž22

๐›ฝ ๐‘ง๐‘ก 1

ฮฒ ๐‘ง๐‘ก 2

1

๐›ฝ ๐‘ง๐‘ก+1 1

๐›ฝ ๐‘ง๐‘ก+1 2

Computing Backward probabilities

21

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
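With both passes in place, the smoothing posterior $\alpha(z_t)\beta(z_t)/p(x_1,\ldots,x_T)$ from the earlier slide can be computed directly (continuing the illustrative sketch, with x, pi, A, B, forward and backward as defined above):

```python
# Smoothing: gamma[t, k] = p(z_t = k | x_1, ..., x_T) = alpha(z_t) * beta(z_t) / p(x_1, ..., x_T).
alpha = forward(x, pi, A, B)
beta = backward(x, A, B)
evidence = alpha[-1].sum()              # p(x_1, ..., x_T)
gamma = alpha * beta / evidence         # each row sums to 1
print(gamma[:5].round(3))               # smoothed fair/loaded probabilities for the first rolls
```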

Prediction

$p(z_{t+2} \mid x_1, \ldots, x_t) = \sum_{z_t} \sum_{z_{t+1}} p(z_{t+2}, z_{t+1}, z_t \mid x_1, \ldots, x_t)$

$= \sum_{z_t} \sum_{z_{t+1}} p(z_{t+2}, z_{t+1} \mid x_1, \ldots, x_t, z_t)\, p(z_t \mid x_1, \ldots, x_t)$

$= \sum_{z_t} \sum_{z_{t+1}} p(z_{t+2}, z_{t+1} \mid z_t)\, \hat{\alpha}(z_t) \qquad$ where $\hat{\alpha}(z_t) = p(z_t \mid x_1, \ldots, x_t)$

$= \sum_{z_t} \sum_{z_{t+1}} p(z_{t+2} \mid z_{t+1})\, p(z_{t+1} \mid z_t)\, \hat{\alpha}(z_t)$

In matrix form: $A^{\top} A^{\top} \hat{\boldsymbol{\alpha}}_t$.

and for the observations:

$p(x_{t+2} \mid x_1, \ldots, x_t) = \sum_{z_{t+2}} p(x_{t+2} \mid z_{t+2})\, p(z_{t+2} \mid x_1, \ldots, x_t)$
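A short continuation of the sketch for two-step-ahead prediction, using the matrix form above (B is the illustrative discrete emission matrix from earlier):

```python
# Two-step-ahead prediction (continues the sketch):
# p(z_{t+2} | x_1..x_t) = A^T A^T alpha_hat_t, then p(x_{t+2} | x_1..x_t) = B^T p(z_{t+2} | x_1..x_t).
alpha = forward(x, pi, A, B)
alpha_hat_t = alpha[-1] / alpha[-1].sum()     # filtered distribution p(z_t | x_1, ..., x_t)
z_pred = A.T @ (A.T @ alpha_hat_t)            # p(z_{t+2} | x_1, ..., x_t)
x_pred = B.T @ z_pred                         # p(x_{t+2} | x_1, ..., x_t): distribution over the 6 faces
print(z_pred, x_pred)
```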

ฮพ ๐’›๐‘กโˆ’1, ๐’›๐‘ก = ๐‘(๐’›๐‘กโˆ’1, ๐’›๐‘ก|๐’™1, โ€ฆ , ๐’™๐‘‡)

=๐‘(๐’™1, โ€ฆ , ๐’™๐‘‡|๐’›๐‘กโˆ’1, ๐’›๐‘ก)๐‘(๐’›๐‘กโˆ’1, ๐’›๐‘ก)

๐‘(๐’™1, โ€ฆ , ๐’™๐‘‡)

=๐‘ ๐’™1, โ€ฆ , ๐’™๐‘กโˆ’1 ๐’›๐‘กโˆ’1 ๐‘(๐’™๐‘ก|๐’›๐‘ก)๐‘ ๐’™๐‘ก+1, โ€ฆ , ๐’™๐‘‡ ๐’›๐‘ก ๐‘(๐’›๐‘ก|๐’›๐‘กโˆ’1)๐‘(๐’›๐‘กโˆ’1)

๐‘(๐’™)

=ฮฑ ๐’›๐‘กโˆ’1 ๐‘ ๐’™๐‘ก ๐’›๐‘ก ๐‘ ๐’›๐‘ก ๐’›๐‘กโˆ’1 ๐›ฝ(๐’›๐‘ก)

๐‘(๐’™1, โ€ฆ , ๐’™๐›ต)

Smoothed transition

24

Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)
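The smoothed transition posteriors can be assembled from the same α and β arrays (continuing the sketch; note the code indexes the pair as (z_t, z_{t+1}), i.e. shifted by one relative to the slide's ξ(z_{t-1}, z_t)):

```python
import numpy as np

# Smoothed transition posteriors (continues the sketch):
# xi[t, j, k] = p(z_t = j, z_{t+1} = k | x_1, ..., x_T)
#             = alpha_t(j) * a_{jk} * p(x_{t+1} | z_{t+1} = k) * beta_{t+1}(k) / p(x_1, ..., x_T)
alpha, beta = forward(x, pi, A, B), backward(x, A, B)
evidence = alpha[-1].sum()
xi = (alpha[:-1, :, None] * A[None, :, :] *
      B[:, x[1:]].T[:, None, :] * beta[1:, None, :]) / evidence
assert np.allclose(xi.sum(axis=(1, 2)), 1)    # each time slice is a joint distribution over the pair
```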

Parameter Estimation (Baum-Welch algorithm)

We need to define an EM algorithm, and we now have all the necessary ingredients:

$p(z_{t-1}, z_t \mid x_1, \ldots, x_T)$ and $p(z_t \mid x_1, x_2, \ldots, x_T)$

Summary

We saw how to perform

(a) Filtering

(b) Smoothing

(c) Evaluation

(d) Prediction

Next we will see how to perform (e) EM and (f) Decoding
