Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability

Chapter 3.3-3.7

S. Maarschalkerweerd & A. Tjhang



Overview last lecture

Hidden Markov Models
Different algorithms:
– Viterbi
– Forward
– Backward


Overview today

Parameter estimation for HMMs
– Baum-Welch algorithm
HMM model structure
More complex Markov chains
Numerical stability of HMM algorithms


Specifying an HMM model

The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values


Parameter estimation for HMMs

Estimate the transition and emission probabilities a_kl and e_k(b)

Two ways of learning:
– Estimation when the state sequence is known
– Estimation when the paths are unknown

Assume that we have a set of example sequences (training sequences x^1, …, x^n)


Parameter estimation for HMMs

Assume that x^1, …, x^n are independent, so

P(x^1, …, x^n | θ) = ∏_{j=1}^n P(x^j | θ)

Since log(ab) = log a + log b, the score to maximize becomes a sum:

l(x^1, …, x^n | θ) = Σ_{j=1}^n log P(x^j | θ)
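As a minimal illustration (names are hypothetical; log_prob stands in for a routine, such as the forward algorithm, that returns log P(x | θ)), the independence assumption turns the overall score into a plain sum:

```python
def total_log_likelihood(sequences, log_prob):
    """Score l(x^1, ..., x^n | theta) of independent training sequences.

    `log_prob(x)` is assumed to return log P(x | theta); independence
    turns the product of the P(x^j | theta) into a sum of their logs.
    """
    return sum(log_prob(x) for x in sequences)
```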


Estimation when state sequence is known

Easier than estimation when the paths are unknown

A_kl = number of transitions k → l in the training data + r_kl
E_k(b) = number of emissions of b from k in the training data + r_k(b)

where r_kl and r_k(b) are pseudocounts; the maximum likelihood estimators are then

a_kl = A_kl / Σ_{l'} A_{kl'}    and    e_k(b) = E_k(b) / Σ_{b'} E_k(b')
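A minimal sketch of this counting scheme (function and parameter names are illustrative, not from the text; uniform pseudocounts stand in for r_kl and r_k(b)):

```python
def estimate_parameters(training_data, states, alphabet, r_trans=1.0, r_emit=1.0):
    """ML estimation of a_kl and e_k(b) when the state paths are known.

    `training_data` is a list of (sequence, path) pairs of equal length;
    r_trans / r_emit are pseudocounts guarding against zero counts.
    """
    A = {k: {l: r_trans for l in states} for k in states}
    E = {k: {b: r_emit for b in alphabet} for k in states}
    for seq, path in training_data:
        for i, (b, k) in enumerate(zip(seq, path)):
            E[k][b] += 1                       # emission of b from state k
            if i + 1 < len(path):
                A[k][path[i + 1]] += 1         # transition k -> next state
    # Normalise the counts into probabilities (the ML estimators).
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e
```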


Estimation when paths are unknown

More complex than when the paths are known
We can't use the maximum likelihood estimators directly
Instead, an iterative algorithm is used:
– Baum-Welch


The Baum-Welch algorithm

We don't know the real values of A_kl and E_k(b)

1. Estimate A_kl and E_k(b)
2. Update a_kl and e_k(b)
3. Repeat with the new model parameters a_kl and e_k(b)


Baum-Welch algorithm

The expected counts are computed from the forward values f_k(i) and the backward values b_l(i):

A_kl = Σ_j (1/P(x^j)) Σ_i f_k^j(i) · a_kl · e_l(x^j_{i+1}) · b_l^j(i+1)
E_k(b) = Σ_j (1/P(x^j)) Σ_{i: x^j_i = b} f_k^j(i) · b_k^j(i)


Baum-Welch algorithm

Now that we have estimated A_kl and E_k(b), use the maximum likelihood estimators to compute a_kl and e_k(b)

We use these values to estimate A_kl and E_k(b) in the next iteration

Continue iterating until the change is very small or the maximum number of iterations is exceeded
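The sketch below puts the pieces together, under some simplifying assumptions: an initial state distribution pi (held fixed) stands in for the book's begin state, a shared pseudocount r is added to all counts, and plain probabilities are used, so it will underflow on long sequences (see the numerical stability section later). All names are illustrative:

```python
def forward(x, states, a, e, pi):
    """Forward algorithm: f[i][l] = P(x_1..x_i, state_i = l)."""
    f = [{l: pi[l] * e[l][x[0]] for l in states}]
    for i in range(1, len(x)):
        f.append({l: e[l][x[i]] * sum(f[i-1][k] * a[k][l] for k in states)
                  for l in states})
    return f

def backward(x, states, a, e):
    """Backward algorithm: b[i][k] = P(x_{i+1}..x_n | state_i = k)."""
    n = len(x)
    b = [dict() for _ in range(n)]
    b[n-1] = {k: 1.0 for k in states}
    for i in range(n - 2, -1, -1):
        b[i] = {k: sum(a[k][l] * e[l][x[i+1]] * b[i+1][l] for l in states)
                for k in states}
    return b

def baum_welch_step(seqs, states, alphabet, a, e, pi, r=0.01):
    """One Baum-Welch iteration: accumulate expected counts, re-estimate."""
    A = {k: {l: r for l in states} for k in states}
    E = {k: {c: r for c in alphabet} for k in states}
    for x in seqs:
        f, b = forward(x, states, a, e, pi), backward(x, states, a, e)
        px = sum(f[-1][k] for k in states)      # P(x | theta)
        for i in range(len(x)):
            for k in states:
                E[k][x[i]] += f[i][k] * b[i][k] / px
                if i + 1 < len(x):
                    for l in states:
                        A[k][l] += f[i][k] * a[k][l] * e[l][x[i+1]] * b[i+1][l] / px
    # Maximum likelihood re-estimation from the expected counts.
    new_a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    new_e = {k: {c: E[k][c] / sum(E[k].values()) for c in alphabet} for k in states}
    return new_a, new_e
```

One would call baum_welch_step repeatedly, stopping when the total log-likelihood changes by less than a tolerance or a maximum iteration count is reached.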




Example


Drawbacks

ML estimators:
– Vulnerable to overfitting if there is not enough data
– Estimates can be undefined for transitions or emissions never seen in the training set (use pseudocounts)
Baum-Welch:
– A local maximum instead of the global maximum can be found, depending on the starting values of the parameters
– This problem is worse for large HMMs


Modelling of labelled sequences

Only the '--' and '++' transitions are counted
Better than using plain ML estimators when many different classes are present


Specifying an HMM model

The most difficult problem in using HMMs is specifying the model:
– Design of the structure
– Assignment of parameter values


Design of the structure

Design: how to connect states by transitions
A good HMM is based on knowledge about the problem under investigation
Local maxima are the biggest disadvantage of models that are fully connected
Baum-Welch still works after deleting a transition from the model: simply set its transition probability to zero


Example 1

Geometric distribution: a single state with a self-transition of probability p and an exit transition of probability 1-p gives a geometric length distribution:

P(length = l) = p^(l-1) · (1-p)


Example 2

Model a distribution of lengths between 2 and 10


Example 3


Silent states

States that do not emit symbols, for example the begin state B

Silent states can also occur in other places in an HMM


Example

Silent states


Silent states

Advantage:
– Fewer transition probabilities need to be estimated
Drawback:
– Limits the possibilities for defining a model


More complex Markov chains

So far, we assumed that the probability of a symbol in a sequence depends only on the previous symbol
More complex:
– High order Markov chains
– Inhomogeneous Markov chains


High order Markov chains

In an nth order Markov process, the probability of a symbol in a sequence depends on the previous n symbols

An nth order Markov chain over some alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples, because consecutive n-tuples overlap in n-1 symbols; for n = 2:

P(x_{i-1} x_i | x_{i-2} x_{i-1}) = P(x_i | x_{i-2} x_{i-1})


Example

A second order Markov chain with two different symbols {A, B}

This can be translated into a first order Markov chain over the 2-tuples {AA, AB, BA, BB}

Sometimes the framework of the high order model is more convenient
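A small sketch of this translation, assuming the second order chain is given as conditional probabilities P(x_i | x_{i-2}, x_{i-1}) (names and data layout are hypothetical):

```python
from itertools import product

def second_to_first_order(symbols, p2):
    """Translate a second order chain into a first order chain over 2-tuples.

    `p2[(u, v)][w]` is assumed to hold P(x_i = w | x_{i-2} = u, x_{i-1} = v).
    The first order chain moves between overlapping pairs: (u, v) -> (v, w).
    """
    pairs = list(product(symbols, repeat=2))
    a = {s: {t: 0.0 for t in pairs} for s in pairs}
    for (u, v) in pairs:
        for w in symbols:
            # Transitions are only possible to pairs that overlap in v.
            a[(u, v)][(v, w)] = p2[(u, v)][w]
    return a

# Example with symbols {A, B}: a uniform second order chain.
symbols = ["A", "B"]
p2 = {(u, v): {w: 0.5 for w in symbols} for u in symbols for v in symbols}
a = second_to_first_order(symbols, p2)
print(a[("A", "B")])  # {('A','A'): 0.0, ('A','B'): 0.0, ('B','A'): 0.5, ('B','B'): 0.5}
```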


Finding prokaryotic genes

Gene candidates in DNA:
– a sequence of nucleotide triplets: start codon, a number of non-stop codons, a stop codon
– such a stretch is an open reading frame (ORF)
An ORF can be either a gene or a non-coding ORF (NORF)


Finding prokaryotic genes

Experiment:
– DNA from the bacterium E. coli
– The dataset contains 1100 genes (900 used for training, 200 for testing)

Two models:
– A normal model with first order Markov chains
– Also first order Markov chains, but with codons instead of nucleotides as symbols


Finding prokaryotic genes

Outcomes: (results table not reproduced)


Inhomogeneous Markov chains

Use the position information within the codon:
– Three models, for codon positions 1, 2 and 3

Example for the sequence CAT GCA (codon positions 1 2 3 1 2 3):

Homogeneous:     P(C) · a_CA · a_AT · a_TG · a_GC · a_CA
Inhomogeneous:   P(C) · a^2_CA · a^3_AT · a^1_TG · a^2_GC · a^3_CA

where a^m_xy is the transition probability x → y in the model for codon position m
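A minimal sketch of scoring a sequence with three position-specific transition matrices (function names and data layout are illustrative, not from the text):

```python
import math

def inhomogeneous_log_prob(seq, p_start, a):
    """Log probability of `seq` under an inhomogeneous codon model.

    `a[m]` is the transition matrix used when *entering* codon position
    m (1, 2 or 3); `p_start[c]` is the initial symbol distribution.
    For "CATGCA" this reproduces P(C) a^2_CA a^3_AT a^1_TG a^2_GC a^3_CA.
    """
    logp = math.log(p_start[seq[0]])
    for i in range(1, len(seq)):
        m = i % 3 + 1                 # codon position entered: 2, 3, 1, 2, ...
        logp += math.log(a[m][seq[i - 1]][seq[i]])
    return logp
```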


Numerical stability of HMM algorithms

Multiplying many probabilities can cause numerical problems:
– Underflow errors
– Wrong numbers are calculated

Solutions:
– Log transformation
– Scaling of probabilities


The log transformation

Compute log probabilities:
– log_10(10^-100000) = -100000, which is easily representable
– The underflow problem is essentially solved

The sum operation is often faster than the product operation

In the Viterbi algorithm, the recursion becomes:

V_l(i+1) = log e_l(x_{i+1}) + max_k ( V_k(i) + log a_kl )
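A sketch of the Viterbi recursion in log space, assuming the transition, emission and initial probabilities have already been converted to logs (names illustrative; float("-inf") encodes log 0 for forbidden transitions):

```python
def viterbi_log(x, states, log_a, log_e, log_pi):
    """Viterbi in log space: sums of logs replace products of probabilities."""
    V = [{l: log_pi[l] + log_e[l][x[0]] for l in states}]
    ptr = []
    for i in range(1, len(x)):
        row, back = {}, {}
        for l in states:
            # max over predecessor states k of V_k(i-1) + log a_kl
            k_best = max(states, key=lambda k: V[i-1][k] + log_a[k][l])
            row[l] = log_e[l][x[i]] + V[i-1][k_best] + log_a[k_best][l]
            back[l] = k_best
        V.append(row)
        ptr.append(back)
    # Traceback from the best final state.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    return V[-1][last], path[::-1]
```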


Scaling of probabilities

Scale the f and b variables
Forward variable:
– For each i a scaling variable s_i is defined
– New f variables are defined: f̃_l(i) = f_l(i) / ∏_{j=1}^{i} s_j
– New forward recursion: f̃_l(i+1) = (1/s_{i+1}) · e_l(x_{i+1}) · Σ_k f̃_k(i) a_kl
A convenient choice is to pick s_{i+1} so that Σ_l f̃_l(i+1) = 1; then P(x) = ∏_i s_i
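A sketch of the scaled forward algorithm with that choice of s_i, assuming an initial distribution pi (names illustrative); log P(x) is recovered as Σ_i log s_i:

```python
import math

def forward_scaled(x, states, a, e, pi):
    """Scaled forward algorithm: each column is normalised to sum to 1.

    Returns the scaled variables, the scale factors s_i, and
    log P(x) = sum of the log scale factors, so nothing underflows.
    """
    f = [{l: pi[l] * e[l][x[0]] for l in states}]
    s = [sum(f[0].values())]
    f[0] = {l: f[0][l] / s[0] for l in states}
    for i in range(1, len(x)):
        col = {l: e[l][x[i]] * sum(f[i-1][k] * a[k][l] for k in states)
               for l in states}
        s.append(sum(col.values()))              # s_{i+1}: the column sum
        f.append({l: col[l] / s[i] for l in states})
    return f, s, sum(math.log(si) for si in s)
```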


Scaling of probabilities

Backward variable:
– Scaling has to be done with the same numbers s_i as for the forward variable
– New backward recursion: b̃_k(i) = (1/s_{i+1}) · Σ_l a_kl e_l(x_{i+1}) b̃_l(i+1)

This normally works well; however, underflow errors can still occur in models with many silent states (chapter 5)


Summary

Hidden Markov Models
Parameter estimation:
– State sequence known
– State sequence unknown
Model structure:
– Silent states
More complex Markov chains
Numerical stability