CS262 Lecture 5, Win07, Batzoglou: Hidden Markov Models


  • Slide 1
  • Hidden Markov Models (title slide). Figure: an HMM trellis, with hidden states 1, 2, …, K at each position emitting symbols x_1, x_2, x_3, ….
  • Slide 2
  • Example: The Dishonest Casino. A casino has two dice: a fair die with P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6, and a loaded die with P(1) = P(2) = P(3) = P(4) = P(5) = 1/10 and P(6) = 1/2. The casino player switches back and forth between the fair and loaded die once every 20 turns. Game: 1. You bet $1. 2. You roll (always with a fair die). 3. The casino player rolls (maybe with the fair die, maybe with the loaded die). 4. Highest number wins $2.
  • Slide 3
  • The dishonest casino model. Two states, FAIR and LOADED: each state stays put with probability 0.95 and switches to the other with probability 0.05. Emissions: P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6; P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2.
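The model above is small enough to write out explicitly. Here is a minimal sketch in Python (not lecture code; the variable names are invented, and the 1/2 start probabilities are the ones assumed on the later example slides):

```python
# Dishonest-casino HMM parameters (illustrative sketch, not lecture code).
STATES = ["F", "L"]                       # F = fair die, L = loaded die

START = {"F": 0.5, "L": 0.5}              # start probabilities a_0i (assumed 1/2 each)

TRANS = {                                 # transition probabilities a_ij
    "F": {"F": 0.95, "L": 0.05},
    "L": {"F": 0.05, "L": 0.95},
}

EMIT = {                                  # emission probabilities e_k(b), die faces 1..6
    "F": {face: 1 / 6 for face in range(1, 7)},
    "L": {**{face: 1 / 10 for face in range(1, 6)}, 6: 1 / 2},
}

# Sanity check: every probability distribution sums to 1.
assert abs(sum(START.values()) - 1) < 1e-12
for k in STATES:
    assert abs(sum(TRANS[k].values()) - 1) < 1e-12
    assert abs(sum(EMIT[k].values()) - 1) < 1e-12
```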
  • Slide 4
  • An HMM is memory-less. At each time step t, the only thing that affects future states is the current state π_t.
  • Slide 5
  • Definition of a hidden Markov model. Definition: a hidden Markov model (HMM) consists of: an alphabet Σ = { b_1, b_2, …, b_M }; a set of states Q = { 1, …, K }; transition probabilities between any two states, a_ij = transition probability from state i to state j, with a_i1 + … + a_iK = 1 for all states i = 1…K; start probabilities a_0i, with a_01 + … + a_0K = 1; and emission probabilities within each state, e_k(b) = P(x_i = b | π_i = k), with e_k(b_1) + … + e_k(b_M) = 1 for all states k = 1…K. (End probabilities a_i0 are used in Durbin; not needed here.)
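The definition translates almost directly into a small container type. A possible sketch in Python (names and structure are my own, not part of the lecture):

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class HMM:
    """A hidden Markov model as defined above (illustrative sketch)."""
    alphabet: List[str]                    # { b_1, ..., b_M }
    states: List[str]                      # Q = { 1, ..., K }
    start: Dict[str, float]                # a_0i
    trans: Dict[str, Dict[str, float]]     # a_ij, rows indexed by source state
    emit: Dict[str, Dict[str, float]]      # e_k(b)

    def check(self, tol: float = 1e-9) -> None:
        # Each distribution in the definition must sum to 1.
        assert abs(sum(self.start.values()) - 1) < tol
        for k in self.states:
            assert abs(sum(self.trans[k].values()) - 1) < tol
            assert abs(sum(self.emit[k].values()) - 1) < tol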
  • Slide 6
  • An HMM is memory-less. At each time step t, the only thing that affects future states is the current state π_t: P(π_{t+1} = k | whatever happened so far) = P(π_{t+1} = k | π_1, π_2, …, π_t, x_1, x_2, …, x_t) = P(π_{t+1} = k | π_t).
  • Slide 7
  • An HMM is memory-less. At each time step t, the only thing that affects x_t is the current state π_t: P(x_t = b | whatever happened so far) = P(x_t = b | π_1, π_2, …, π_t, x_1, x_2, …, x_{t-1}) = P(x_t = b | π_t).
  • Slide 8
  • A parse of a sequence. Given a sequence x = x_1 … x_N, a parse of x is a sequence of states π = π_1, …, π_N. (Figure: the trellis, with one hidden state chosen at each of the N positions.)
  • Slide 9
  • Generating a sequence by the model. Given an HMM, we can generate a sequence of length n as follows: 1. Start at state π_1 according to probability a_{0,π_1}. 2. Emit letter x_1 according to probability e_{π_1}(x_1). 3. Go to state π_2 according to probability a_{π_1,π_2}. 4. … until emitting x_n.
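Steps 1 to 4 correspond to a straightforward sampling loop. A possible sketch (Python; the function and variable names are mine), repeating the casino parameters so the snippet runs on its own:

```python
import random

def generate(start, trans, emit, n):
    """Sample a (path, sequence) pair of length n by steps 1-4 above (sketch)."""
    def draw(dist):
        # Pick a key of `dist` with probability equal to its value.
        keys = list(dist)
        return random.choices(keys, weights=[dist[k] for k in keys])[0]

    state = draw(start)                   # 1. start at pi_1 ~ a_{0,pi_1}
    path, seq = [], []
    for _ in range(n):
        path.append(state)
        seq.append(draw(emit[state]))     # 2./4. emit x_i ~ e_{pi_i}(.)
        state = draw(trans[state])        # 3. move to pi_{i+1} ~ a_{pi_i, .}
    return path, seq

# The casino model again, so this block is self-contained.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {f: 1 / 6 for f in range(1, 7)},
        "L": {**{f: 1 / 10 for f in range(1, 6)}, 6: 1 / 2}}
print(generate(START, TRANS, EMIT, 10))
```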
  • Slide 10
  • Likelihood of a parse. Given a sequence x = x_1 … x_N and a parse π = π_1, …, π_N, how likely is this scenario under our HMM? P(x, π) = P(x_1, …, x_N, π_1, …, π_N) = P(x_N | π_N) P(π_N | π_{N-1}) ⋯ P(x_2 | π_2) P(π_2 | π_1) P(x_1 | π_1) P(π_1) = a_{0,π_1} a_{π_1,π_2} ⋯ a_{π_{N-1},π_N} e_{π_1}(x_1) ⋯ e_{π_N}(x_N). A compact way to write a_{0,π_1} a_{π_1,π_2} ⋯ a_{π_{N-1},π_N} e_{π_1}(x_1) ⋯ e_{π_N}(x_N): enumerate all parameters a_ij and e_i(b) as θ_1, …, θ_n (n parameters in total). Example: a_{0,Fair} is θ_1; a_{0,Loaded} is θ_2; …; e_{Loaded}(6) is θ_18. Then count, in x and π, the number of times each parameter j = 1, …, n occurs: F(j, x, π) = # of times parameter θ_j occurs in (x, π) (call F(·,·,·) the feature counts). Then P(x, π) = ∏_{j=1…n} θ_j^{F(j,x,π)} = exp[ Σ_{j=1…n} log(θ_j) F(j, x, π) ].
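The product a_{0,π_1} e_{π_1}(x_1) a_{π_1,π_2} ⋯ e_{π_N}(x_N) is easy to evaluate directly. A small sketch (Python, with my own naming):

```python
def joint_likelihood(start, trans, emit, x, path):
    """P(x, pi) = a_{0,pi_1} e_{pi_1}(x_1) * prod_{i>1} a_{pi_{i-1},pi_i} e_{pi_i}(x_i)."""
    p = start[path[0]] * emit[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= trans[path[i - 1]][path[i]] * emit[path[i]][x[i]]
    return p
```

With the casino dictionaries from the earlier sketch, joint_likelihood(START, TRANS, EMIT, [1, 2, 1, 5, 6, 2, 1, 5, 2, 4], ["F"] * 10) should reproduce the ~5.2 × 10^-9 value computed on the next slide.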
  • Slide 11
  • Example: the dishonest casino. Let the sequence of rolls be x = 1, 2, 1, 5, 6, 2, 1, 5, 2, 4. Then, what is the likelihood of π = Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair? (Say the initial probabilities are a_{0,Fair} = 1/2, a_{0,Loaded} = 1/2.) 1/2 × P(1 | Fair) P(Fair | Fair) P(2 | Fair) P(Fair | Fair) ⋯ P(4 | Fair) = 1/2 × (1/6)^10 × (0.95)^9 = 0.00000000521158647211 ≈ 5.2 × 10^-9.
  • Slide 12
  • Example: the dishonest casino. So, the likelihood that the die is fair throughout this run is just 5.21 × 10^-9. OK, but what is the likelihood of π = Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded? 1/2 × P(1 | Loaded) P(Loaded | Loaded) ⋯ P(4 | Loaded) = 1/2 × (1/10)^9 × (1/2)^1 × (0.95)^9 = 0.00000000015756235243 ≈ 1.6 × 10^-10. Therefore, it is somewhat more likely that all the rolls are done with the fair die than that they are all done with the loaded die.
  • Slide 13
  • Example: the dishonest casino. Let the sequence of rolls be x = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6. Now, what is the likelihood of π = F, F, …, F? 1/2 × (1/6)^10 × (0.95)^9 ≈ 5.2 × 10^-9, the same as before. What is the likelihood of π = L, L, …, L? 1/2 × (1/10)^4 × (1/2)^6 × (0.95)^9 = 0.00000049238235134735 ≈ 4.9 × 10^-7. So, it is about 100 times more likely that the die is loaded.
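The numbers quoted on these example slides can be checked with a few lines of arithmetic (a sketch; it assumes the 1/2 start probabilities stated above):

```python
# Reproducing the numbers from the two examples.
p95 = 0.95 ** 9                                       # nine stay-in-state transitions

# Rolls 1,2,1,5,6,2,1,5,2,4: one 6, nine non-6s.
p_fair   = 0.5 * (1 / 6) ** 10 * p95                  # ~5.21e-9
p_loaded = 0.5 * (1 / 10) ** 9 * (1 / 2) * p95        # ~1.58e-10 (fair parse wins)

# Rolls 1,6,6,5,6,2,6,6,3,6: six 6s, four non-6s.
p_fair2   = 0.5 * (1 / 6) ** 10 * p95                 # same ~5.21e-9
p_loaded2 = 0.5 * (1 / 10) ** 4 * (1 / 2) ** 6 * p95  # ~4.92e-7 (loaded parse wins)

print(p_fair, p_loaded, p_loaded2 / p_fair2)          # ratio ~95, i.e. about 100x
```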
  • Slide 14
  • Question # 1: Evaluation. GIVEN: a sequence of rolls by the casino player: 1245526462146146136136661664661636616366163616515615115146123562344. QUESTION: How likely is this sequence, given our model of how the casino works? This is the EVALUATION problem in HMMs. Prob = 1.3 × 10^-35.
  • Slide 15
  • Question # 2: Decoding. GIVEN: a sequence of rolls by the casino player: 1245526462146146136136661664661636616366163616515615115146123562344. QUESTION: What portion of the sequence was generated with the fair die, and what portion with the loaded die? This is the DECODING question in HMMs. (On the slide, the rolls are annotated FAIR … LOADED … FAIR.)
  • Slide 16
  • Question # 3: Learning. GIVEN: a sequence of rolls by the casino player: 1245526462146146136136661664661636616366163616515615115146123562344. QUESTION: How loaded is the loaded die? How fair is the fair die? How often does the casino player change from fair to loaded, and back? This is the LEARNING question in HMMs. Prob(6) = 64%.
  • Slide 17
  • The three main questions on HMMs. 1. Evaluation: GIVEN an HMM M and a sequence x, FIND Prob[ x | M ]. 2. Decoding: GIVEN an HMM M and a sequence x, FIND the sequence π of states that maximizes P[ x, π | M ]. 3. Learning: GIVEN an HMM M with unspecified transition/emission probabilities, and a sequence x, FIND parameters θ = (e_i(·), a_ij) that maximize P[ x | θ ].
  • Slide 18
  • Let's not be confused by notation. P[ x | M ]: the probability that sequence x was generated by the model. The model is: architecture (# of states, etc.) + parameters θ = (a_ij, e_i(·)). So P[ x | M ] is the same as P[ x | θ ], and as P[ x ], when the architecture and the parameters, respectively, are implied. Similarly, P[ x, π | M ], P[ x, π | θ ], and P[ x, π ] are the same when the architecture and the parameters are implied. In the LEARNING problem we always write P[ x | θ ] to emphasize that we are seeking the θ* that maximizes P[ x | θ ].
  • Slide 19
  • Problem 1: Decoding. Find the most likely parse of a sequence.
  • Slide 20
  • Decoding. GIVEN x = x_1 x_2 … x_N, find π = π_1, …, π_N to maximize P[ x, π ]: π* = argmax_π P[ x, π ], i.e. maximize a_{0,π_1} e_{π_1}(x_1) a_{π_1,π_2} ⋯ a_{π_{N-1},π_N} e_{π_N}(x_N). Dynamic programming! Define V_k(i) = max_{π_1,…,π_{i-1}} P[ x_1 ⋯ x_{i-1}, π_1, …, π_{i-1}, x_i, π_i = k ] = probability of the most likely sequence of states ending at state π_i = k. (Idea: given that we end up in state k at step i, maximize the product to the left and to the right.)
  • Slide 21
  • Decoding, main idea. Inductive assumption: for all states k and for a fixed position i, V_k(i) = max_{π_1,…,π_{i-1}} P[ x_1 ⋯ x_{i-1}, π_1, …, π_{i-1}, x_i, π_i = k ]. What is V_l(i+1)? From the definition, V_l(i+1) = max_{π_1,…,π_i} P[ x_1 ⋯ x_i, π_1, …, π_i, x_{i+1}, π_{i+1} = l ] = max_{π_1,…,π_i} P(x_{i+1}, π_{i+1} = l | x_1 ⋯ x_i, π_1, …, π_i) P[ x_1 ⋯ x_i, π_1, …, π_i ] = max_{π_1,…,π_i} P(x_{i+1}, π_{i+1} = l | π_i) P[ x_1 ⋯ x_{i-1}, π_1, …, π_{i-1}, x_i, π_i ] = max_k [ P(x_{i+1}, π_{i+1} = l | π_i = k) max_{π_1,…,π_{i-1}} P[ x_1 ⋯ x_{i-1}, π_1, …, π_{i-1}, x_i, π_i = k ] ] = max_k [ P(x_{i+1} | π_{i+1} = l) P(π_{i+1} = l | π_i = k) V_k(i) ] = e_l(x_{i+1}) max_k a_kl V_k(i).
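As a concrete check of the recurrence, here is one step on the casino model for the rolls x = (1, 6), again assuming the 1/2 start probabilities (a sketch, not lecture code):

```python
# One application of V_l(i+1) = e_l(x_{i+1}) * max_k a_kl V_k(i) on the casino model.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {f: 1 / 6 for f in range(1, 7)},
        "L": {**{f: 1 / 10 for f in range(1, 6)}, 6: 1 / 2}}

x1, x2 = 1, 6
V1 = {k: START[k] * EMIT[k][x1] for k in "FL"}   # V_F(1) = 1/12, V_L(1) = 1/20
V2 = {l: EMIT[l][x2] * max(TRANS[k][l] * V1[k] for k in "FL") for l in "FL"}
print(V2)   # V_F(2) ~ 0.0132, V_L(2) ~ 0.0238: after one 6, ending in Loaded is ahead
```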
  • Slide 22
  • The Viterbi Algorithm. Input: x = x_1 … x_N. Initialization: V_0(0) = 1 (0 is the imaginary first position); V_k(0) = 0 for all k > 0. Iteration: V_j(i) = e_j(x_i) × max_k a_kj V_k(i-1); Ptr_j(i) = argmax_k a_kj V_k(i-1). Termination: P(x, π*) = max_k V_k(N). Traceback: π_N* = argmax_k V_k(N); π_{i-1}* = Ptr_{π_i*}(i).
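A direct transcription of the algorithm into Python might look like the sketch below (variable names are mine; the imaginary position 0 is folded into the start probabilities a_0k rather than stored as an extra column):

```python
def viterbi(start, trans, emit, x):
    """Return (most likely state path, P(x, pi*)) for observations x (sketch)."""
    states = list(start)
    # Initialization: V_k(1) = a_0k * e_k(x_1).
    V = [{k: start[k] * emit[k][x[0]] for k in states}]
    ptr = [{}]
    # Iteration: V_j(i) = e_j(x_i) * max_k a_kj V_k(i-1), with back-pointers.
    for i in range(1, len(x)):
        V.append({})
        ptr.append({})
        for j in states:
            best_k = max(states, key=lambda k: V[i - 1][k] * trans[k][j])
            ptr[i][j] = best_k
            V[i][j] = emit[j][x[i]] * V[i - 1][best_k] * trans[best_k][j]
    # Termination and traceback.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    return path[::-1], V[-1][last]
```

Run on the casino parameters and a sequence of rolls, the returned path is exactly the kind of fair/loaded segmentation asked for in Question # 2 above.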
  • Slide 23
  • The Viterbi Algorithm. Similar to aligning a set of states to a sequence. Time: O(K^2 N). Space: O(KN). (Figure: the K × N dynamic-programming table of values V_j(i), with states 1…K as rows and positions x_1 x_2 x_3 … x_N as columns.)
  • Slide 24
  • Viterbi Algorithm, a practical detail. Underflows are a significant problem: P[ x_1, …, x_i, π_1, …, π_i ] = a_{0,π_1} a_{π_1,π_2} ⋯ a_{π_{i-1},π_i} e_{π_1}(x_1) ⋯ e_{π_i}(x_i). These numbers become extremely small: underflow. Solution: take the logs of all values: V_l(i) = log e_l(x_i) + max_k [ V_k(i-1) + log a_kl ].
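In log space only the recurrence changes, along the lines of this sketch (it assumes all probabilities are nonzero, as in the casino model; zero probabilities would need a guard such as treating log 0 as minus infinity):

```python
from math import log

def viterbi_log(start, trans, emit, x):
    """Log-space recurrence V_l(i) = log e_l(x_i) + max_k [V_k(i-1) + log a_kl] (sketch)."""
    states = list(start)
    V = {k: log(start[k]) + log(emit[k][x[0]]) for k in states}
    back_pointers = []
    for i in range(1, len(x)):
        nxt, back = {}, {}
        for l in states:
            best_k = max(states, key=lambda k: V[k] + log(trans[k][l]))
            back[l] = best_k
            nxt[l] = log(emit[l][x[i]]) + V[best_k] + log(trans[best_k][l])
        back_pointers.append(back)
        V = nxt
    last = max(states, key=lambda k: V[k])
    path = [last]
    for back in reversed(back_pointers):
        path.append(back[path[-1]])
    return path[::-1], V[last]   # second value is log P(x, pi*)
```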
  • Slide 25
  • Example. Let x be a long sequence with a portion of ~1/6 6s, followed by a portion of ~1/2 6s: x = 12345612345612345 66263646561626364656. Then, it is not hard to show that the optimal parse is (exercise): FFF…F LLL…L. Six characters "123456" parsed as F contribute 0.95^6 × (1/6)^6 ≈ 1.6 × 10^-5; parsed …
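The per-window contribution quoted above is a one-line check (sketch):

```python
# Contribution of one "123456" window parsed as F: six transitions and six emissions.
contrib_fair = 0.95 ** 6 * (1 / 6) ** 6
print(f"{contrib_fair:.1e}")   # ~1.6e-05, matching the value on the slide
```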