Hidden Markov Models: Algorithms and Applications

Introduction Often we are interested in finding patterns in signals which change over space or time.

For example:
- commands used in instructing a computer
- sequences of words in sentences
- sequences of phonemes in spoken words

In other words, any area where a sequence of events occurs can produce useful patterns.

Markov models In a Markov model we model a system as a finite set of states.

The system makes transitions from one state to another with some transition probability

Markov models contd. In a Markov model we assume that the state of the system depends only on the previous k states.

e.g. if k = 2 then S(t4) depends only on S(t3) and S(t2) and not on S(t1).

An example Consider the current value of shares of a particular company.

A stock-broker would be interested in knowing whether the value of the share is going to increase, decrease or remain unchanged.

Thus, there are three possible states: I (increase), D (decrease), U (unchanged).

The example contd. Say we observe the share value for several days and note whether it increased, decreased or remained the same.

We get a sequence like: U U I I I U U I I D D D D I I U U D U D

What conclusions would we like to draw from the above sequence?

Obviously we would like to know whether the share values are going to increase / decrease / remain unchanged in the near future.

In other words, given the state today and of the immediate past, we would like to predict tomorrow’s state.

The example contd. Recall, the sequence was: U U I I I U U I I D D D D I I U U D U D. From it we can get the probability of observing each state (7/20 for U, 7/20 for I, 6/20 for D). This observation is not very informative: it does not tell us, if the state is D today, what the state is going to be tomorrow, since all states have nearly equal probabilities.

The example contd. We can also calculate the transition probabilities. For example, in the sequence U U I I I U U I I D D D D I I U U D U D, the transitions out of U occur with probabilities 3/7 for U → U, 2/7 for U → I and 2/7 for U → D, etc.

The example contd. We can create a transition probability matrix A for the sequence U U I I I U U I I D D D D I I U U D U D:

         U     I     D
    U   3/7   2/7   2/7
A = I   2/7   4/7   1/7
    D   1/5   1/5   3/5

Note the discrepancy in the last row: the final D is not counted, since there is no transition corresponding to it. For sufficiently large data the discrepancy will be small.

The example contd. We have built a Markov model with k = 1 from the given data.

We can now use it to predict. Since the latest observed state is D, the probability of getting a D on the next day is 3/5, while the probabilities of getting U or I are 1/5 each.

Advice: Don’t buy the share today.
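
The counting above is easy to automate. Below is a minimal sketch (Python, not part of the original slides) that estimates the k = 1 transition matrix from the U/I/D sequence used in this example.

    from collections import Counter

    # Observation sequence from the example (k = 1 Markov model).
    seq = "U U I I I U U I I D D D D I I U U D U D".split()
    states = ["U", "I", "D"]

    # Count transitions s -> s' over consecutive pairs; the final D has no
    # outgoing transition, hence the discrepancy noted above.
    pair_counts = Counter(zip(seq, seq[1:]))
    totals = {s: sum(c for (a, _), c in pair_counts.items() if a == s)
              for s in states}

    # Row-stochastic transition matrix A[s][t] = P(next state = t | current state = s).
    A = {s: {t: pair_counts[(s, t)] / totals[s] for t in states} for s in states}

    for s in states:
        print(s, {t: round(A[s][t], 3) for t in states})
    # Expected rows: U -> 3/7 2/7 2/7, I -> 2/7 4/7 1/7, D -> 1/5 1/5 3/5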

State transition diagrams We commonly represent the system by a state transition diagram.

The numbers on the directed edges indicate the transition probabilities.

Role of initial state Let us start with the system in the state U. Given the transition matrix

         U     I     D
    U   3/7   2/7   2/7
A = I   2/7   4/7   1/7
    D   1/5   1/5   3/5

we can represent the initial state as the vector S(t0) = (1, 0, 0).

To find the state of the system at the next time slot we have S(t1) = S(t0) * A.

In general: S(tj+1) = S(tj) * A
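
A minimal sketch of this propagation in Python with numpy (the matrix values are those estimated in the example above):

    import numpy as np

    # Transition matrix A; rows and columns are ordered U, I, D.
    A = np.array([[3/7, 2/7, 2/7],
                  [2/7, 4/7, 1/7],
                  [1/5, 1/5, 3/5]])

    # Initial state: the system starts in U with certainty.
    s = np.array([1.0, 0.0, 0.0])          # S(t0)

    # S(t_{j+1}) = S(t_j) * A; iterate a few time slots.
    for t in range(1, 4):
        s = s @ A
        print(f"S(t{t}) =", np.round(s, 3))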

Hidden Markov Models So far we have assumed that we know the system, i.e. we know its possible states.

Note: it is possible that we are not able to observe the system directly. Instead, we may be able to observe some effect of the system.

Assumptions:

- there is an underlying system
- the system follows the Markov assumption
- we cannot observe the system directly
- we can observe some effect of the system
- the underlying state of the system is responsible for the observation

Example: part-of-speech tagging (observed: words, hidden: part-of-speech tags).

A possible scenario Assume that we cannot observe the value of the share directly. Instead, we can observe what a stock-broker does with those shares: he either buys more shares, sells the shares bought earlier or does nothing.

Possible observables are: B (buy), S (sell) and N (do nothing).

We can get sequences like: B B S N B S B B N …

Each possible state of the system can generate any of the given observations with a given probability

i.e. each of the states U, I or D can generate any of the observables B, S or N with some probability (emission probabilities).
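
Putting the pieces together, such an HMM consists of the transition matrix, an emission matrix and an initial state distribution. The sketch below writes them down for the stock example; the emission probabilities and the uniform initial distribution are invented purely for illustration, only the transition matrix comes from the earlier counts.

    import numpy as np

    states = ["U", "I", "D"]           # hidden states: Unchanged, Increase, Decrease
    observables = ["B", "S", "N"]      # observed broker actions: Buy, Sell, do Nothing

    # Transition probabilities a_ij = P(state j at t+1 | state i at t), from the example.
    A = np.array([[3/7, 2/7, 2/7],
                  [2/7, 4/7, 1/7],
                  [1/5, 1/5, 3/5]])

    # Emission probabilities b_jk = P(observable k | state j); illustrative values only.
    B = np.array([[0.3, 0.3, 0.4],     # U emits B, S, N
                  [0.6, 0.1, 0.3],     # I
                  [0.1, 0.7, 0.2]])    # D

    # Initial state distribution pi; assumed uniform here.
    pi = np.array([1/3, 1/3, 1/3])

    # Every row of A and B, and pi itself, must sum to 1.
    assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
    assert np.isclose(pi.sum(), 1)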

State diagram for HMM

Inputs for an HMM
- a system that can be in some states xi
- transition probabilities between states, aij
- a set of observables, yk
- emission probabilities of observables from a state, bjk
- a start state (initial state distribution π)

An HMM is characterized by the triplet λ = ({aij}, {bjk}, π), where

- aij = P(xj(t+1) | xi(t)); aij ≥ 0; Σj=1..N aij = 1 for all i
- bjk = P(yk(t) | xj(t)); bjk ≥ 0; Σk=1..M bjk = 1 for all j

Three Basic HMM problems Problem 1 (Evaluation):

Given the observation sequence O=o1,…,oT and an HMM model, how do we compute the probability of O given the model?

Problem 2 (Decoding): Given the observation sequence O=o1,…,oT and an HMM model, how do we find the state sequence that best explains the observations?

Problem 3 (Learning): How do we adjust the model parameters λ = ({aij}, {bjk}, π) to maximize P(O|λ)?

Assumptions of HMM (1) The Markov assumption

(2) The stationarity assumption: state transition probabilities and emission probabilities are independent of the actual time at which the transitions take place.

(3) The output independence assumption: the current output (observation) is statistically independent of the previous outputs (observations).

Note: Unlike the other two, this assumption has limited validity.

e.g. in the stock-broker example, this assumption implies that the action of the stock-broker today is independent of what he did yesterday.

What happens if he had sold all the shares yesterday?

The evaluation problem Given an HMM

λ = ({aij}, {bjk}, π),

what is the probability of a sequence of observations O = {o(1), o(2), o(3), …, o(T)}?

We can find P(O|λ) using simple probabilistic arguments: sum, over every possible hidden state sequence, the joint probability of that sequence and the observations.

The evaluation problem contd… Problem: the computational complexity of this direct approach is very high, O(N^T),

where N is the number of states and T is the number of observations. Even small HMMs, e.g. T = 10 and N = 10, contain 10 billion different paths.
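
For concreteness, the brute-force evaluation enumerates every one of the N^T state sequences. The sketch below does exactly that for the illustrative parameters introduced earlier and a deliberately short observation sequence; the forward algorithm on the next slides avoids this blow-up.

    import itertools
    import numpy as np

    def evaluate_brute_force(A, B, pi, obs):
        """P(O | lambda) by summing over all N**T state sequences."""
        N, T = A.shape[0], len(obs)
        total = 0.0
        for path in itertools.product(range(N), repeat=T):
            p = pi[path[0]] * B[path[0], obs[0]]
            for t in range(1, T):
                p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
            total += p
        return total

    # Illustrative parameters (states U, I, D; observables B, S, N) as before.
    A = np.array([[3/7, 2/7, 2/7], [2/7, 4/7, 1/7], [1/5, 1/5, 3/5]])
    B = np.array([[0.3, 0.3, 0.4], [0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
    pi = np.array([1/3, 1/3, 1/3])

    obs = [0, 0, 1, 2]    # B B S N, encoded as column indices of B (B=0, S=1, N=2)
    print("P(O | lambda) =", evaluate_brute_force(A, B, pi, obs))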

Forward Algorithm To reduce the complexity we define partial forward probabilities.

Denote the partial probability of being in state j at time t as αt(j).

This partial probability is calculated as: αt(j) = P(observation at time t | hidden state at t is j) × P(all paths to state j at time t)

Forward Algorithm contd. αt(j) = P(o(1), o(2), o(3), …, o(t), x(t) = j | λ)

We get the recursion: αt+1(j) = bj,o(t+1) Σi=1..N αt(i) aij, for 1 ≤ j ≤ N and 1 ≤ t ≤ T-1

Initialization: α1(j) = πj bj,o(1)

We can now calculate αT(j) for 1 ≤ j ≤ N with complexity O(N²T), and

P(O | λ) = Σi=1..N αT(i)
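
A minimal sketch of the forward recursion (same illustrative parameters and observation encoding as before):

    import numpy as np

    def forward(A, B, pi, obs):
        """Return the N x T matrix of forward probabilities alpha_t(j)."""
        N, T = A.shape[0], len(obs)
        alpha = np.zeros((N, T))
        alpha[:, 0] = pi * B[:, obs[0]]                 # alpha_1(j) = pi_j * b_j,o(1)
        for t in range(1, T):
            # alpha_{t+1}(j) = b_j,o(t+1) * sum_i alpha_t(i) * a_ij
            alpha[:, t] = B[:, obs[t]] * (alpha[:, t - 1] @ A)
        return alpha

    A = np.array([[3/7, 2/7, 2/7], [2/7, 4/7, 1/7], [1/5, 1/5, 3/5]])
    B = np.array([[0.3, 0.3, 0.4], [0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
    pi = np.array([1/3, 1/3, 1/3])

    alpha = forward(A, B, pi, obs=[0, 0, 1, 2])         # observed B B S N
    print("P(O | lambda) =", alpha[:, -1].sum())        # sum_i alpha_T(i)

This costs O(N²T) operations instead of O(N^T), since each column of alpha is a single matrix-vector product.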

Backward partial probabilities We can similarly define

βt(i) = P(o(t+1), o(t+2), o(t+3), …, o(T) | x(t) = i, λ),

which follows the recursion: βt(i) = Σj=1..N aij bj,o(t+1) βt+1(j)

with the initial condition βT(i) = 1 for all i.

We find that αt(i) βt(i) = P(O, x(t) = i | λ), and hence

P(O | λ) = Σi=1..N αt(i) βt(i) (for any t)
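
A matching sketch of the backward recursion. Note that P(O | λ) can also be read off the backward pass alone via P(O | λ) = Σi πi bi,o(1) β1(i), which should agree with the forward result above.

    import numpy as np

    def backward(A, B, obs):
        """Return the N x T matrix of backward probabilities beta_t(i)."""
        N, T = A.shape[0], len(obs)
        beta = np.zeros((N, T))
        beta[:, T - 1] = 1.0                            # beta_T(i) = 1 for all i
        for t in range(T - 2, -1, -1):
            # beta_t(i) = sum_j a_ij * b_j,o(t+1) * beta_{t+1}(j)
            beta[:, t] = A @ (B[:, obs[t + 1]] * beta[:, t + 1])
        return beta

    A = np.array([[3/7, 2/7, 2/7], [2/7, 4/7, 1/7], [1/5, 1/5, 3/5]])
    B = np.array([[0.3, 0.3, 0.4], [0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
    pi = np.array([1/3, 1/3, 1/3])
    obs = [0, 0, 1, 2]                                  # observed B B S N

    beta = backward(A, B, obs)
    print("P(O | lambda) =", (pi * B[:, obs[0]] * beta[:, 0]).sum())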

Decoding (Viterbi) Algorithm Given an HMM:

λ = ({aij}, {bjk}, π)

and a sequence of observations O = {o(1), o(2), o(3), …, o(T)},

what is the most likely sequence of hidden states that produced the given set of observations?

Decoding (Viterbi) Algorithm contd… We want to maximize δt(i), defined as: δt(i) = max over s1, s2, …, st-1 of P(s1, s2, …, st-1, st = i, o1, o2, …, ot | λ)

We get the recursion

δt+1(j) = bj,o(t+1) · max1≤i≤N [δt(i) aij]

With initial condition δ1(j) = πj bj,o(1)

Contd.. Similar to computing the forward probabilities, but instead of summing over transitions from incoming states, compute the maximum.
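
A minimal sketch of the Viterbi recursion with back-pointers, using the same illustrative parameters; it returns the most likely hidden state sequence for the observed broker actions.

    import numpy as np

    def viterbi(A, B, pi, obs):
        """Most likely hidden state sequence for obs, as a list of state indices."""
        N, T = A.shape[0], len(obs)
        delta = np.zeros((N, T))              # delta_t(j): best path probability ending in j
        psi = np.zeros((N, T), dtype=int)     # back-pointers to the best predecessor
        delta[:, 0] = pi * B[:, obs[0]]       # delta_1(j) = pi_j * b_j,o(1)
        for t in range(1, T):
            # delta_{t+1}(j) = b_j,o(t+1) * max_i delta_t(i) * a_ij   (max instead of sum)
            scores = delta[:, t - 1, None] * A          # scores[i, j] = delta_t(i) * a_ij
            psi[:, t] = scores.argmax(axis=0)
            delta[:, t] = B[:, obs[t]] * scores.max(axis=0)
        # Backtrack from the best final state.
        path = [int(delta[:, -1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[path[-1], t]))
        return path[::-1]

    A = np.array([[3/7, 2/7, 2/7], [2/7, 4/7, 1/7], [1/5, 1/5, 3/5]])
    B = np.array([[0.3, 0.3, 0.4], [0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
    pi = np.array([1/3, 1/3, 1/3])

    states = ["U", "I", "D"]
    print([states[i] for i in viterbi(A, B, pi, obs=[0, 0, 1, 2])])   # for B B S N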

Learning We train the HMM to encode an observation sequence so that it can identify similar observation sequences in the future.

Find λ = ({aij}, {bjk}, π) maximising P(O|λ), i.e. determine the optimum model given a training set of observations: find λ such that P(O|λ) is maximal.

General algorithm:
1. Initialise λ0
2. Compute a new model λ, using λ0 and the observed sequence O
3. Set λ0 ← λ
4. Repeat steps 2 and 3 until: log P(O|λ) - log P(O|λ0) < d
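
The sketch below illustrates one way to realise this loop: a single Baum-Welch style re-estimation step built from the forward and backward probabilities, iterated until the log-likelihood gain drops below d. The slides do not spell out the re-estimation formulas, so treat the details as one standard choice rather than the exact method of the original; there is no scaling, so it is only suitable for short sequences.

    import numpy as np

    def forward(A, B, pi, obs):               # compact copy of the earlier sketch
        alpha = np.zeros((A.shape[0], len(obs)))
        alpha[:, 0] = pi * B[:, obs[0]]
        for t in range(1, len(obs)):
            alpha[:, t] = B[:, obs[t]] * (alpha[:, t - 1] @ A)
        return alpha

    def backward(A, B, obs):                  # compact copy of the earlier sketch
        beta = np.ones((A.shape[0], len(obs)))
        for t in range(len(obs) - 2, -1, -1):
            beta[:, t] = A @ (B[:, obs[t + 1]] * beta[:, t + 1])
        return beta

    def reestimate(A, B, pi, obs):
        """One Baum-Welch style update of (A, B, pi) from a single observation sequence."""
        N, M, T = A.shape[0], B.shape[1], len(obs)
        alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
        likelihood = alpha[:, -1].sum()                       # P(O | current model)
        gamma = alpha * beta / likelihood                     # gamma[i, t] = P(x(t)=i | O)
        xi = np.zeros((N, N))                                 # expected i -> j transition counts
        for t in range(T - 1):
            xi += alpha[:, t, None] * A * (B[:, obs[t + 1]] * beta[:, t + 1])[None, :] / likelihood
        new_A = xi / gamma[:, :-1].sum(axis=1, keepdims=True)
        new_B = np.array([[gamma[i, [t for t in range(T) if obs[t] == k]].sum()
                           for k in range(M)] for i in range(N)])
        new_B /= gamma.sum(axis=1, keepdims=True)
        return new_A, new_B, gamma[:, 0], likelihood

    # Convergence loop from the slide: repeat until the log-likelihood gain is below d.
    A = np.array([[3/7, 2/7, 2/7], [2/7, 4/7, 1/7], [1/5, 1/5, 3/5]])
    B = np.array([[0.3, 0.3, 0.4], [0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
    pi = np.array([1/3, 1/3, 1/3])
    obs, d, prev = [0, 0, 1, 2, 0, 1, 0, 0, 2], 1e-6, -np.inf
    while True:
        A, B, pi, like = reestimate(A, B, pi, obs)
        if np.log(like) - prev < d:
            break
        prev = np.log(like)
    print("converged, log P(O | lambda) =", np.log(like))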

Penn Treebank Tag-set

Tag (description): examples

$ (dollar): $ -$ --$ A$ C$ HK$ M$ NZ$ S$ U.S.$ US$
`` (opening quotation mark): ` ``
'' (closing quotation mark): ' ''
( (opening parenthesis): ( [ {
) (closing parenthesis): ) ] }
, (comma): ,
-- (dash): --
. (sentence terminator): . ! ?
: (colon or ellipsis): : ; ...
CC (conjunction, coordinating): & 'n and both but c either et for less minus neither nor or plus so therefore times v. versus vs. whether yet
CD (numeral, cardinal): mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one forty-seven 1987 twenty '79 zero two 78-degrees eighty-four IX '60s .025 fifteen 271,124 dozen quintillion DM2,000 ...
DT (determiner): all an another any both del each either every half la many much nary neither no some such that the them these this those
EX (existential there): there
FW (foreign word): gemeinschaft hund ich jeux habeas Haementeria Herr K'ang-si vous lutihaw alai je jour objets salutaris fille quibusdam pas trop Monte terram fiche oui corporis ...
IN (preposition or conjunction, subordinating): astride among uppon whether out inside pro despite on by throughout below within for towards near behind atop around if like until below next into if beside ...
JJ (adjective or numeral, ordinal): third ill-mannered pre-war regrettable oiled calamitous first separable ectoplasmic still-to-be-named multilingual multi-disciplinary ...
JJR (adjective, comparative): bleaker braver breezier briefer brighter cheaper choosier cleaner clearer closer colder commoner costlier cozier creamier crunchier cuter ...
JJS (adjective, superlative): calmest cheapest choicest classiest cleanest clearest closest commonest corniest costliest crassest creepiest crudest cutest darkest deadliest dearest deepest densest dinkiest ...
LS (list item marker): A A. B B. C C. D E F First G H I J K One SP-44001 SP-44002 SP-44005 SP-44007 Second Third Three Two \* a b c d first five four one six three two
MD (modal auxiliary): can cannot could couldn't dare may might must need ought shall should shouldn't will would
NN (noun, common, singular or mass): common-carrier cabbage knuckle-duster Casino afghan shed thermostat investment slide humour falloff slick wind hyena override subhumanity machinist ...
NNP (noun, proper, singular): Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA Shannon A.K.C. Meltex Liverpool ...
NNPS (noun, proper, plural): Americans Americas Amharas Amityvilles Amusements Anarcho-Syndicalists Andalusians Andes Andruses Angels Animals Anthony Antilles Antiques Apache Apaches Apocrypha ...
NNS (noun, common, plural): undergraduates scotches bric-a-brac products bodyguards facets coasts divestitures storehouses designs clubs fragrances averages subjectivists apprehensions muses factory-jobs ...
PDT (pre-determiner): all both half many quite such sure this