Design and Implementation of Speech Recognition Systems
Fall 2014, Ming Li
Special topic: the Expectation-Maximization algorithm and GMM
Sep 30 2014
Some graphics are from Pattern Recognition and Machine Learning, Bishop, Copyright © Springer
Some equations are from the slides edited by Muneem S.
Parametric density estimation
• How to estimate those parameters?
  – ML
  – MAP
• How about multimodal cases?
Mixture Models
• If you believe that the data set is comprised of several distinct populations
• It has the following form:

p(x_i | \Theta) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)

\Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M), \quad \text{with } \sum_{j=1}^{M} \alpha_j = 1
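As a quick sanity check, the mixture density above can be evaluated directly. A minimal NumPy sketch, assuming an illustrative two-component 1-D Gaussian mixture (all parameter values below are made up):

```python
import numpy as np

# Illustrative two-component 1-D Gaussian mixture; all parameter values are made up.
alphas = np.array([0.3, 0.7])   # mixing weights alpha_j, must sum to 1
mus = np.array([-2.0, 3.0])     # component parameters theta_j = (mu_j, sigma_j)
sigmas = np.array([1.0, 0.5])

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density p_j(x | theta_j)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(x):
    """p(x | Theta) = sum_j alpha_j p_j(x | theta_j)."""
    return sum(a * gauss_pdf(x, m, s) for a, m, s in zip(alphas, mus, sigmas))

density = mixture_pdf(0.0)
```

Because the weights sum to one and each component integrates to one, the mixture is itself a valid density.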
Mixture Models

p(x_i | \Theta) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)

Let y_i \in \{1, \ldots, M\} represent the source that generates the data:

X = \{x_1, x_2, \ldots, x_N\}, \quad Y = \{y_1, y_2, \ldots, y_N\}
Mixture Models

p(x_i | \Theta) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)

Let y_i \in \{1, \ldots, M\} represent the source that generates x_i. If y_i = j, then

p(x_i | y_i = j, \Theta) = p_j(x_i | \theta_j), \quad p(y_i = j | \Theta) = \alpha_j
Mixture Models

Let z_i = (x_i, y_i) be the complete data point. Then

p(z_i | \Theta) = p(x_i, y_i | \Theta) = p(x_i | y_i, \Theta) p(y_i | \Theta)

p(x_i | \Theta) = \sum_{y_i=1}^{M} p(x_i, y_i | \Theta) = \sum_{j=1}^{M} p(y_i = j | \Theta) p(x_i | y_i = j, \Theta) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)

where p(x_i | y_i = j, \Theta) = p_j(x_i | \theta_j) and p(y_i = j | \Theta) = \alpha_j.
Mixture Models

p(z_i | \Theta) = p(x_i, y_i | \Theta) = p(y_i | x_i, \Theta) p(x_i | \Theta) = p(x_i | y_i, \Theta) p(y_i | \Theta)

where p(x_i | y_i = j, \Theta) = p_j(x_i | \theta_j) and p(y_i = j | \Theta) = \alpha_j. By Bayes' rule,

p(y_i | x_i, \Theta) = \frac{p(x_i, y_i | \Theta)}{p(x_i | \Theta)} = \frac{p(x_i | y_i, \Theta) p(y_i | \Theta)}{p(x_i | \Theta)} = \frac{\alpha_{y_i} p_{y_i}(x_i | \theta_{y_i})}{\sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)}

Given x_i and \Theta, the conditional density of y_i can be computed, using p(x_i | \Theta) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j).
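The posterior over the source label is easy to compute numerically. A minimal sketch, again with made-up two-component parameters:

```python
import numpy as np

# Made-up two-component 1-D mixture parameters (illustrative only).
alphas = np.array([0.5, 0.5])
mus = np.array([-2.0, 2.0])
sigmas = np.array([1.0, 1.0])

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x_i = 1.5
# Numerator: alpha_j p_j(x_i | theta_j); denominator: the sum over j of the same.
num = alphas * gauss_pdf(x_i, mus, sigmas)
posterior = num / num.sum()   # p(y_i = j | x_i, Theta) for j = 0, 1
```

Since x_i = 1.5 lies close to the second component's mean, the posterior mass concentrates on j = 1.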
Complete-Data Likelihood Function

X = \{x_1, \ldots, x_N\}, \quad y = \{y_1, \ldots, y_N\}, \quad z_i = (x_i, y_i), \quad Z = \{z_1, \ldots, z_N\}

p(Z | \Theta) = p(X, y | \Theta) = p(X | y, \Theta) p(y | \Theta) = \prod_{i=1}^{N} p(x_i | y_i, \Theta) p(y_i | \Theta) = \prod_{i=1}^{N} \alpha_{y_i} p_{y_i}(x_i | \theta_{y_i})

But we don't know the latent variables y, so we can only maximize the incomplete-data likelihood

p(X | \Theta) = \prod_{i=1}^{N} p(x_i | \Theta) = \prod_{i=1}^{N} \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)
Expectation-Maximization (1)
• If we want to find a local maximum of f(x):
  – Define an auxiliary function A(x, x^t) at x^t
  – Satisfy f(x) \ge A(x, x^t) \;\forall x, and f(x^t) = A(x^t, x^t)
• Iterate x^{t+1} = \arg\max_x A(x, x^t). Then

f(x^{t+1}) \ge A(x^{t+1}, x^t) \ge A(x^t, x^t) = f(x^t)

so f(x^{t+1}) \ge f(x^t), f(x^{t+2}) \ge f(x^{t+1}), \ldots — the sequence f(x^t) is non-decreasing.
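This iterate-the-surrogate idea can be demonstrated on a toy problem (not from the slides): maximize f(x) = log(x) − x on x > 0, whose maximum is at x = 1. Using the bound log(u) ≤ u − 1, the function A(x, x^t) = log(x^t) + 1 − x^t/x − x minorizes f, touches it at x^t, and its maximizer is x^{t+1} = sqrt(x^t):

```python
import numpy as np

def f(x):
    # Objective to maximize on x > 0; its maximum is at x = 1.
    return np.log(x) - x

def A(x, x_t):
    # Auxiliary function: A(x, x_t) <= f(x) for all x > 0 and A(x_t, x_t) = f(x_t).
    # Follows from log(x / x_t) >= 1 - x_t / x (i.e. log(u) <= u - 1 with u = x_t / x).
    return np.log(x_t) + 1.0 - x_t / x - x

# argmax_x A(x, x_t) solves x_t / x**2 = 1, i.e. x_{t+1} = sqrt(x_t)
x_t = 16.0
vals = [f(x_t)]
for _ in range(40):
    x_t = np.sqrt(x_t)
    vals.append(f(x_t))
```

Each update maximizes the surrogate rather than f itself, yet f(x^t) never decreases, exactly as in the inequality chain above.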
Expectation-Maximization (2)
• The goal: finding ML solutions for probabilistic models with latent variables y:

\log p(x | \Theta) = \log \sum_{y} p(x, y | \Theta)    (difficult to maximize directly)

• Introduce any distribution q(y) with \sum_{y} q(y) = 1, q(y) \ge 0. Since \log(x) is concave, Jensen's inequality gives an auxiliary (lower-bound) function:

\log p(x | \Theta) = \log \sum_{y} q(y) \frac{p(x, y | \Theta)}{q(y)} \ge \sum_{y} q(y) \log \frac{p(x, y | \Theta)}{q(y)}
Expectation-Maximization (2)
• What is the gap?

\log p(x | \Theta) - \sum_{y} q(y) \log \frac{p(x, y | \Theta)}{q(y)}
  = \sum_{y} q(y) \left[ \log p(x | \Theta) - \log \frac{p(x, y | \Theta)}{q(y)} \right]
  = \sum_{y} q(y) \log \frac{q(y)\, p(x | \Theta)}{p(x, y | \Theta)}
  = \sum_{y} q(y) \log \frac{q(y)}{p(y | x, \Theta)}
  = KL(q(y) \,\|\, p(y | x, \Theta)) \ge 0

with KL(q(y) \,\|\, p(y | x, \Theta)) = 0 iff q(y) = p(y | x, \Theta).
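The identity above can be checked numerically on a tiny discrete model. A sketch assuming a two-valued latent y and made-up joint probabilities for one fixed observation x:

```python
import numpy as np

# Tiny discrete check of the bound: y in {0, 1}, with an assumed joint
# p(x, y | Theta) for one fixed observed x (values are illustrative).
p_joint = np.array([0.12, 0.28])        # p(x, y = j | Theta)
p_x = p_joint.sum()                     # p(x | Theta) = sum_y p(x, y | Theta)
posterior = p_joint / p_x               # p(y | x, Theta)

def lower_bound(q):
    # sum_y q(y) log( p(x, y | Theta) / q(y) )
    return float(np.sum(q * np.log(p_joint / q)))

q = np.array([0.5, 0.5])                # an arbitrary q(y)
gap = np.log(p_x) - lower_bound(q)      # should equal KL(q || posterior)
kl = float(np.sum(q * np.log(q / posterior)))
```

The gap is exactly the KL divergence, and choosing q equal to the posterior makes the bound tight.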
Expectation-Maximization (2)
• We have shown that

\log p(x | \Theta) - \sum_{y} q(y) \log \frac{p(x, y | \Theta)}{q(y)} = KL(q(y) \,\|\, p(y | x, \Theta)) \ge 0

• Choose q(y) = p(y | x, \Theta^{old}) and define

A(\Theta, \Theta^{old}) = \sum_{y} p(y | x, \Theta^{old}) \log \frac{p(x, y | \Theta)}{p(y | x, \Theta^{old})}

• If we evaluate it with the old parameters, the bound is tight:

A(\Theta^{old}, \Theta^{old}) = \sum_{y} p(y | x, \Theta^{old}) \log p(x | \Theta^{old}) = \log p(x | \Theta^{old})
Expectation-Maximization (3)
• Any physical meaning of the auxiliary function?

A(\Theta, \Theta^{old}) = \sum_{y} p(y | x, \Theta^{old}) \log \frac{p(x, y | \Theta)}{p(y | x, \Theta^{old})}
  = \sum_{y} p(y | x, \Theta^{old}) \log p(x, y | \Theta) - \sum_{y} p(y | x, \Theta^{old}) \log p(y | x, \Theta^{old})
  = \sum_{y} p(y | x, \Theta^{old}) \log p(x, y | \Theta) + const    (the second term is independent of \Theta)

• The first term is the conditional expectation of the complete-data log likelihood.
Expectation-Maximization (4)
1. Choose an initial setting for the parameters \Theta^{old}
2. E step: evaluate p(y | x, \Theta^{old})
3. M step: evaluate \Theta^{new} given by

\Theta^{new} = \arg\max_{\Theta} Q(\Theta, \Theta^{old}), \quad \text{where} \quad Q(\Theta, \Theta^{old}) = \sum_{y} p(y | x, \Theta^{old}) \log p(x, y | \Theta)

4. Check for convergence; if not converged, set \Theta^{old} \leftarrow \Theta^{new} and return to step 2
Gaussian Mixture Model (GMM)

Gaussian model of a d-dimensional source, say j:

p_j(x_i | \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu_j)^T \Sigma_j^{-1} (x_i - \mu_j) \right)

GMM with M sources:

p(x_i | \alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \mu_j, \Sigma_j), \quad \theta_j = (\mu_j, \Sigma_j)

with \alpha_j \ge 0 and \sum_{j=1}^{M} \alpha_j = 1.
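The Gaussian component and the mixture can be coded directly from these formulas. A minimal sketch for d = 2, with made-up parameters:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) in d dimensions, as in the formula above."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)      # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * maha) / norm)

# Illustrative 2-D, two-component GMM (parameter values are made up).
alphas = [0.4, 0.6]
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]

def gmm_pdf(x):
    # p(x | Theta) = sum_j alpha_j p_j(x | mu_j, Sigma_j)
    return sum(a * gaussian_pdf(x, m, S) for a, m, S in zip(alphas, mus, Sigmas))
```

Using `np.linalg.solve` avoids explicitly inverting \Sigma_j, which is both cheaper and numerically safer.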
Mixture Model

p(x_i | \Theta) = \sum_{l=1}^{M} \alpha_l p_l(x_i | \theta_l), \quad \Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M), \quad \text{subject to } \sum_{l=1}^{M} \alpha_l = 1

With p(x_i, y_i | \Theta) = \alpha_{y_i} p_{y_i}(x_i | \theta_{y_i}), the general Q function

Q(\Theta, \Theta^{old}) = \sum_{y} p(y | x, \Theta^{old}) \log p(x, y | \Theta)

becomes

Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\, p(l | x_i, \Theta^g) + \sum_{l=1}^{M} \sum_{i=1}^{N} \log[p_l(x_i | \theta_l)]\, p(l | x_i, \Theta^g)

The first term is correlated with the \alpha_l only; the second term is correlated with the \theta_l only.

E step: compute p(l | x_i, \Theta^g)
M step: \Theta^{new} = \arg\max_{\Theta} Q(\Theta, \Theta^g)
Finding \alpha_l

Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\, p(l | x_i, \Theta^g) + \sum_{l=1}^{M} \sum_{i=1}^{N} \log[p_l(x_i | \theta_l)]\, p(l | x_i, \Theta^g)

Due to the constraint on the \alpha_l's, we introduce a Lagrange multiplier \lambda and solve the following equation:

\frac{\partial}{\partial \alpha_l} \left[ \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\, p(l | x_i, \Theta^g) + \lambda \left( \sum_{l=1}^{M} \alpha_l - 1 \right) \right] = 0, \quad l = 1, \ldots, M

\frac{1}{\alpha_l} \sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda = 0, \quad l = 1, \ldots, M

\sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \alpha_l = 0, \quad l = 1, \ldots, M
Finding \alpha_l

Summing \sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \alpha_l = 0 over l:

\sum_{l=1}^{M} \sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \sum_{l=1}^{M} \alpha_l = 0

\sum_{i=1}^{N} \sum_{l=1}^{M} p(l | x_i, \Theta^g) + \lambda = 0

Since \sum_{l=1}^{M} p(l | x_i, \Theta^g) = 1 for each i, this gives N + \lambda = 0, i.e. \lambda = -N.
Finding \alpha_l

Substituting \lambda = -N into \sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \alpha_l = 0 gives

\alpha_l = \frac{1}{N} \sum_{i=1}^{N} p(l | x_i, \Theta^g)

where the responsibilities are

p(l | x_i, \Theta^g) = \frac{\alpha_l^g\, p_l(x_i | \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g\, p_j(x_i | \theta_j^g)}
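The weight update is just the average responsibility of each component. A sketch on synthetic 1-D data (both the data and the guessed parameters \Theta^g are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data from two populations, and a guessed Theta^g (all illustrative).
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
alphas_g = np.array([0.5, 0.5])
mus_g = np.array([0.0, 1.0])
sigmas_g = np.array([1.0, 1.0])

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# E step: responsibilities p(l | x_i, Theta^g); rows index i, columns index l.
num = alphas_g * gauss_pdf(x[:, None], mus_g, sigmas_g)
resp = num / num.sum(axis=1, keepdims=True)

# Weight update from the Lagrange-multiplier derivation:
# alpha_l = (1/N) sum_i p(l | x_i, Theta^g)
alphas_new = resp.mean(axis=0)
```

Because each row of responsibilities sums to one, the updated weights automatically satisfy the constraint \sum_l \alpha_l = 1.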
Finding \mu_l, \Sigma_l

Only the second term of Q(\Theta, \Theta^g) involves \theta_l = (\mu_l, \Sigma_l), so we only need to maximize

\sum_{l=1}^{M} \sum_{i=1}^{N} \log[p_l(x_i | \theta_l)]\, p(l | x_i, \Theta^g)

Consider the GMM case:

p_l(x | \mu_l, \Sigma_l) = \frac{1}{(2\pi)^{d/2} |\Sigma_l|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_l)^T \Sigma_l^{-1} (x - \mu_l) \right)

\log[p_l(x | \mu_l, \Sigma_l)] = -\frac{d}{2} \log 2\pi - \frac{1}{2} \log|\Sigma_l| - \frac{1}{2} (x - \mu_l)^T \Sigma_l^{-1} (x - \mu_l)

The first term is unrelated to the parameters.
Finding \mu_l, \Sigma_l

Therefore, we want to maximize:

Q'(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \left[ -\frac{1}{2} \log|\Sigma_l| - \frac{1}{2} (x_i - \mu_l)^T \Sigma_l^{-1} (x_i - \mu_l) \right] p(l | x_i, \Theta^g)

How? Take derivatives with respect to \mu_l and \Sigma_l and set them to zero.
Finding \mu_l, \Sigma_l

Setting the derivatives of Q'(\Theta, \Theta^g) to zero gives:

\mu_l = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g)\, x_i}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

\Sigma_l = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g)\, (x_i - \mu_l)(x_i - \mu_l)^T}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

with p(l | x_i, \Theta^g) = \frac{\alpha_l^g\, p_l(x_i | \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g\, p_j(x_i | \theta_j^g)}
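These two updates are responsibility-weighted means and covariances. A sketch with illustrative 2-D data and stand-in responsibilities (in a real E step they would come from the formula above):

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative inputs: 2-D data and stand-in responsibilities p(l | x_i, Theta^g).
x = rng.normal(size=(200, 2))
resp = rng.dirichlet(np.ones(2), size=200)   # each row sums to 1

Nl = resp.sum(axis=0)                        # "soft counts" per component l

# mu_l = sum_i p(l | x_i, Theta^g) x_i / sum_i p(l | x_i, Theta^g)
mus_new = (resp.T @ x) / Nl[:, None]

# Sigma_l = sum_i p(l | x_i, Theta^g)(x_i - mu_l)(x_i - mu_l)^T / same denominator
Sigmas_new = []
for l in range(resp.shape[1]):
    diff = x - mus_new[l]
    Sigmas_new.append((resp[:, l, None] * diff).T @ diff / Nl[l])
```

Each weighted scatter matrix D^T diag(w) D is symmetric by construction, so the updated covariances stay valid.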
Summary: EM algorithm for GMM

Given an initial guess \Theta^g, find \Theta^{new} as follows:

p(l | x_i, \Theta^g) = \frac{\alpha_l^g\, p_l(x_i | \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g\, p_j(x_i | \theta_j^g)}

\alpha_l^{new} = \frac{1}{N} \sum_{i=1}^{N} p(l | x_i, \Theta^g)

\mu_l^{new} = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g)\, x_i}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

\Sigma_l^{new} = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g)\, (x_i - \mu_l^{new})(x_i - \mu_l^{new})^T}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

If not converged, set \Theta^g \leftarrow \Theta^{new} and repeat.
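Putting the summary together, here is a minimal, self-contained EM loop for a 1-D GMM (synthetic data and a deliberately rough initialization; in 1-D the covariances reduce to scalar variances):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data: two Gaussian populations (true means -3 and 2).
x = np.concatenate([rng.normal(-3.0, 1.0, 150), rng.normal(2.0, 0.5, 150)])
N, M = len(x), 2

# Initial guess Theta^g (illustrative)
alphas = np.full(M, 1.0 / M)
mus = np.array([-1.0, 1.0])
sigmas = np.array([1.0, 1.0])

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

prev_ll = -np.inf
for _ in range(200):
    # E step: responsibilities p(l | x_i, Theta^g)
    num = alphas * gauss_pdf(x[:, None], mus, sigmas)
    resp = num / num.sum(axis=1, keepdims=True)
    # M step: the closed-form updates derived above
    Nl = resp.sum(axis=0)
    alphas = Nl / N
    mus = (resp * x[:, None]).sum(axis=0) / Nl
    sigmas = np.sqrt((resp * (x[:, None] - mus) ** 2).sum(axis=0) / Nl)
    # Convergence check: the log-likelihood is non-decreasing under EM
    ll = np.log((alphas * gauss_pdf(x[:, None], mus, sigmas)).sum(axis=1)).sum()
    if ll - prev_ll < 1e-9:
        break
    prev_ll = ll
```

EM only guarantees a local maximum of the likelihood, so in practice the loop is run from several initializations and the best solution is kept.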