Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability

Chapter 3.3-3.7

S. Maarschalkerweerd & A. Tjhang



Overview last lecture

Hidden Markov Models
Different algorithms:
– Viterbi
– Forward
– Backward


Overview today

Parameter estimation for HMMs
– Baum-Welch algorithm
HMM model structure
More complex Markov chains
Numerical stability of HMM algorithms


Specifying an HMM model

The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values


Parameter estimation for HMMs

Estimate the transition and emission probabilities a_kl and e_k(b)

Two ways of learning:
– Estimation when the state sequence is known
– Estimation when the paths are unknown

Assume that we have a set of example sequences (training sequences x^1, …, x^n)


Parameter estimation for HMMs

Assume that x^1, …, x^n are independent, so

P(x^1, …, x^n | θ) = ∏_{j=1}^n P(x^j | θ)

Since log(ab) = log a + log b, the score to maximize becomes a sum:

l(x^1, …, x^n | θ) = Σ_{j=1}^n log P(x^j | θ)
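As a minimal illustration (names are hypothetical; log_prob stands in for a routine, such as the forward algorithm, that returns log P(x | θ)), the independence assumption turns the overall score into a plain sum:

```python
def total_log_likelihood(sequences, log_prob):
    """Score l(x^1, ..., x^n | theta) of independent training sequences.

    `log_prob(x)` is assumed to return log P(x | theta); independence
    turns the product of the P(x^j | theta) into a sum of their logs.
    """
    return sum(log_prob(x) for x in sequences)
```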


Estimation when state sequence is known

Easier than estimation when the paths are unknown

A_kl = number of transitions k → l in the training data + r_kl
E_k(b) = number of emissions of b from k in the training data + r_k(b)

where r_kl and r_k(b) are pseudocounts; the maximum likelihood estimators are then

a_kl = A_kl / Σ_{l'} A_{kl'}    and    e_k(b) = E_k(b) / Σ_{b'} E_k(b')
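A minimal sketch of this counting scheme (function and parameter names are illustrative, not from the text; uniform pseudocounts stand in for r_kl and r_k(b)):

```python
def estimate_parameters(training_data, states, alphabet, r_trans=1.0, r_emit=1.0):
    """ML estimation of a_kl and e_k(b) when the state paths are known.

    `training_data` is a list of (sequence, path) pairs of equal length;
    r_trans / r_emit are pseudocounts guarding against zero counts.
    """
    A = {k: {l: r_trans for l in states} for k in states}
    E = {k: {b: r_emit for b in alphabet} for k in states}
    for seq, path in training_data:
        for i, (b, k) in enumerate(zip(seq, path)):
            E[k][b] += 1                       # emission of b from state k
            if i + 1 < len(path):
                A[k][path[i + 1]] += 1         # transition k -> next state
    # Normalise the counts into probabilities (the ML estimators).
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e
```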


Estimation when paths are unknown

More complex than when the paths are known
We can't use the maximum likelihood estimators directly
Instead, an iterative algorithm is used:
– Baum-Welch


The Baum-Welch algorithm

We don't know the real values of A_kl and E_k(b)

1. Estimate A_kl and E_k(b)
2. Update a_kl and e_k(b)
3. Repeat with the new model parameters a_kl and e_k(b)


Baum-Welch algorithm

The expected counts are computed from the forward values f_k(i) and the backward values b_l(i):

A_kl = Σ_j (1/P(x^j)) Σ_i f_k^j(i) · a_kl · e_l(x^j_{i+1}) · b_l^j(i+1)
E_k(b) = Σ_j (1/P(x^j)) Σ_{i: x^j_i = b} f_k^j(i) · b_k^j(i)


Baum-Welch algorithm

Now that we have estimated A_kl and E_k(b), use the maximum likelihood estimators to compute a_kl and e_k(b)

We use these values to estimate A_kl and E_k(b) in the next iteration

Continue iterating until the change is very small or the maximum number of iterations is exceeded
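The sketch below puts the pieces together, under some simplifying assumptions: an initial state distribution pi (held fixed) stands in for the book's begin state, a shared pseudocount r is added to all counts, and plain probabilities are used, so it will underflow on long sequences (see the numerical stability section later). All names are illustrative:

```python
def forward(x, states, a, e, pi):
    """Forward algorithm: f[i][l] = P(x_1..x_i, state_i = l)."""
    f = [{l: pi[l] * e[l][x[0]] for l in states}]
    for i in range(1, len(x)):
        f.append({l: e[l][x[i]] * sum(f[i-1][k] * a[k][l] for k in states)
                  for l in states})
    return f

def backward(x, states, a, e):
    """Backward algorithm: b[i][k] = P(x_{i+1}..x_n | state_i = k)."""
    n = len(x)
    b = [dict() for _ in range(n)]
    b[n-1] = {k: 1.0 for k in states}
    for i in range(n - 2, -1, -1):
        b[i] = {k: sum(a[k][l] * e[l][x[i+1]] * b[i+1][l] for l in states)
                for k in states}
    return b

def baum_welch_step(seqs, states, alphabet, a, e, pi, r=0.01):
    """One Baum-Welch iteration: accumulate expected counts, re-estimate."""
    A = {k: {l: r for l in states} for k in states}
    E = {k: {c: r for c in alphabet} for k in states}
    for x in seqs:
        f, b = forward(x, states, a, e, pi), backward(x, states, a, e)
        px = sum(f[-1][k] for k in states)      # P(x | theta)
        for i in range(len(x)):
            for k in states:
                E[k][x[i]] += f[i][k] * b[i][k] / px
                if i + 1 < len(x):
                    for l in states:
                        A[k][l] += f[i][k] * a[k][l] * e[l][x[i+1]] * b[i+1][l] / px
    # Maximum likelihood re-estimation from the expected counts.
    new_a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    new_e = {k: {c: E[k][c] / sum(E[k].values()) for c in alphabet} for k in states}
    return new_a, new_e
```

One would call baum_welch_step repeatedly, stopping when the total log-likelihood changes by less than a tolerance or a maximum iteration count is reached.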




Example


Drawbacks

ML estimators:
– Vulnerable to overfitting if there is not enough data
– Estimates can be undefined for transitions or emissions never seen in the training set (use pseudocounts)
Baum-Welch:
– A local maximum instead of the global maximum can be found, depending on the starting values of the parameters
– This problem is worse for large HMMs


Modelling of labelled sequences

Only the '--' and '++' transitions are counted
Better than using plain ML estimators when many different classes are present


Specifying an HMM model

The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values


Design of the structure

Design: how to connect states by transitions
A good HMM is based on knowledge about the problem under investigation
Local maxima are the biggest disadvantage of models that are fully connected
Baum-Welch still works after deleting a transition from the model: simply set its transition probability to zero


Example 1

Geometric distribution: a single state with a self-transition of probability p and an exit transition of probability 1-p gives a geometric length distribution:

P(length = l) = p^(l-1) · (1-p)


Example 2

Model a distribution of lengths between 2 and 10


Example 3


Silent states

States that do not emit symbols, for example the begin state B

Silent states can also occur in other places in an HMM


Example

Silent states


Silent states

Advantage:
– Fewer transition probabilities need to be estimated
Drawback:
– Limits the possibilities for defining a model


More complex Markov chains

So far, we assumed that the probability of a symbol in a sequence depends only on the previous symbol
More complex:
– High order Markov chains
– Inhomogeneous Markov chains


High order Markov chains

In an nth order Markov process, the probability of a symbol in a sequence depends on the previous n symbols

An nth order Markov chain over some alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples, because consecutive n-tuples overlap in n-1 symbols; for n = 2:

P(x_{i-1} x_i | x_{i-2} x_{i-1}) = P(x_i | x_{i-2} x_{i-1})


Example

A second order Markov chain with two different symbols {A, B}

This can be translated into a first order Markov chain over the 2-tuples {AA, AB, BA, BB}

Sometimes the framework of the high order model is more convenient
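A small sketch of this translation, assuming the second order chain is given as conditional probabilities P(x_i | x_{i-2}, x_{i-1}) (names and data layout are hypothetical):

```python
from itertools import product

def second_to_first_order(symbols, p2):
    """Translate a second order chain into a first order chain over 2-tuples.

    `p2[(u, v)][w]` is assumed to hold P(x_i = w | x_{i-2} = u, x_{i-1} = v).
    The first order chain moves between overlapping pairs: (u, v) -> (v, w).
    """
    pairs = list(product(symbols, repeat=2))
    a = {s: {t: 0.0 for t in pairs} for s in pairs}
    for (u, v) in pairs:
        for w in symbols:
            # Transitions are only possible to pairs that overlap in v.
            a[(u, v)][(v, w)] = p2[(u, v)][w]
    return a

# Example with symbols {A, B}: a uniform second order chain.
symbols = ["A", "B"]
p2 = {(u, v): {w: 0.5 for w in symbols} for u in symbols for v in symbols}
a = second_to_first_order(symbols, p2)
print(a[("A", "B")])  # {('A','A'): 0.0, ('A','B'): 0.0, ('B','A'): 0.5, ('B','B'): 0.5}
```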


Finding prokaryotic genes

Gene candidates in DNA:
– a sequence of nucleotide triplets: start codon, a number of non-stop codons, a stop codon
– such a stretch is an open reading frame (ORF)
An ORF can be either a gene or a non-coding ORF (NORF)


Finding prokaryotic genes

Experiment:
– DNA from the bacterium E. coli
– The dataset contains 1100 genes (900 used for training, 200 for testing)

Two models:
– A normal model with first order Markov chains
– Also first order Markov chains, but with codons instead of nucleotides as symbols


Finding prokaryotic genes

Outcomes: (results table not reproduced)


Inhomogeneous Markov chains

Use the position information within the codon:
– Three models, for codon positions 1, 2 and 3

Example for the sequence CAT GCA (codon positions 1 2 3 1 2 3):

Homogeneous:     P(C) · a_CA · a_AT · a_TG · a_GC · a_CA
Inhomogeneous:   P(C) · a^2_CA · a^3_AT · a^1_TG · a^2_GC · a^3_CA

where a^m_xy is the transition probability x → y in the model for codon position m
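A minimal sketch of scoring a sequence with three position-specific transition matrices (function names and data layout are illustrative, not from the text):

```python
import math

def inhomogeneous_log_prob(seq, p_start, a):
    """Log probability of `seq` under an inhomogeneous codon model.

    `a[m]` is the transition matrix used when *entering* codon position
    m (1, 2 or 3); `p_start[c]` is the initial symbol distribution.
    For "CATGCA" this reproduces P(C) a^2_CA a^3_AT a^1_TG a^2_GC a^3_CA.
    """
    logp = math.log(p_start[seq[0]])
    for i in range(1, len(seq)):
        m = i % 3 + 1                 # codon position entered: 2, 3, 1, 2, ...
        logp += math.log(a[m][seq[i - 1]][seq[i]])
    return logp
```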


Numerical stability of HMM algorithms

Multiplying many probabilities can cause numerical problems:
– Underflow errors
– Wrong numbers are calculated

Solutions:
– Log transformation
– Scaling of probabilities


The log transformation

Compute log probabilities:
– log_10(10^-100000) = -100000, which is easily representable
– The underflow problem is essentially solved

The sum operation is often faster than the product operation

In the Viterbi algorithm, the recursion becomes:

V_l(i+1) = log e_l(x_{i+1}) + max_k ( V_k(i) + log a_kl )
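A sketch of the Viterbi recursion in log space, assuming the transition, emission and initial probabilities have already been converted to logs (names illustrative; float("-inf") encodes log 0 for forbidden transitions):

```python
def viterbi_log(x, states, log_a, log_e, log_pi):
    """Viterbi in log space: sums of logs replace products of probabilities."""
    V = [{l: log_pi[l] + log_e[l][x[0]] for l in states}]
    ptr = []
    for i in range(1, len(x)):
        row, back = {}, {}
        for l in states:
            # max over predecessor states k of V_k(i-1) + log a_kl
            k_best = max(states, key=lambda k: V[i-1][k] + log_a[k][l])
            row[l] = log_e[l][x[i]] + V[i-1][k_best] + log_a[k_best][l]
            back[l] = k_best
        V.append(row)
        ptr.append(back)
    # Traceback from the best final state.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    return V[-1][last], path[::-1]
```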


Scaling of probabilities

Scale the f and b variables
Forward variable:
– For each i a scaling variable s_i is defined
– New f variables are defined: f̃_l(i) = f_l(i) / ∏_{j=1}^{i} s_j
– New forward recursion: f̃_l(i+1) = (1/s_{i+1}) · e_l(x_{i+1}) · Σ_k f̃_k(i) a_kl
A convenient choice is to pick s_{i+1} so that Σ_l f̃_l(i+1) = 1; then P(x) = ∏_i s_i
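A sketch of the scaled forward algorithm with that choice of s_i, assuming an initial distribution pi (names illustrative); log P(x) is recovered as Σ_i log s_i:

```python
import math

def forward_scaled(x, states, a, e, pi):
    """Scaled forward algorithm: each column is normalised to sum to 1.

    Returns the scaled variables, the scale factors s_i, and
    log P(x) = sum of the log scale factors, so nothing underflows.
    """
    f = [{l: pi[l] * e[l][x[0]] for l in states}]
    s = [sum(f[0].values())]
    f[0] = {l: f[0][l] / s[0] for l in states}
    for i in range(1, len(x)):
        col = {l: e[l][x[i]] * sum(f[i-1][k] * a[k][l] for k in states)
               for l in states}
        s.append(sum(col.values()))              # s_{i+1}: the column sum
        f.append({l: col[l] / s[i] for l in states})
    return f, s, sum(math.log(si) for si in s)
```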


Scaling of probabilities

Backward variable:
– Scaling has to be done with the same numbers s_i as for the forward variable
– New backward recursion: b̃_k(i) = (1/s_{i+1}) · Σ_l a_kl e_l(x_{i+1}) b̃_l(i+1)

This normally works well; however, underflow errors can still occur in models with many silent states (chapter 5)


Summary

Hidden Markov Models
Parameter estimation:
– State sequence known
– State sequence unknown
Model structure:
– Silent states
More complex Markov chains
Numerical stability