Design and Implementation of Speech Recognition Systems, Fall 2014, Ming Li.
Special topic: the Expectation-Maximization algorithm and GMM. Sep 30, 2014.
Some graphics are from Pattern Recognition and Machine Learning, Bishop, Copyright © Springer. Some equations are from the slides edited by Muneem S.

Page 1

Design and Implementation of Speech Recognition Systems

Fall 2014
Ming Li

Special topic: the Expectation-Maximization algorithm and GMM

Sep 30 2014

Some graphics are from Pattern Recognition and Machine learning, Bishop, Copyright © Springer

Some equations are from the slides edited by Muneem S. 

Page 2

Parametric density estimation

• How to estimate those parameters?
  – ML
  – MAP

Page 3

How about Multimodal cases?

Page 4

Mixture Models

• If you believe that the data set is comprised of several distinct populations
• It has the following form:

p(x | \Theta) = \sum_{j=1}^{M} \alpha_j \, p_j(x | \theta_j)

\Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M)

with \sum_{j=1}^{M} \alpha_j = 1

Page 5

Mixture Models

p(x | \Theta) = \sum_{j=1}^{M} \alpha_j \, p_j(x | \theta_j)

Let y_i \in \{1, \ldots, M\} represent the source that generates the data.

X = \{x_1, x_2, \ldots, x_N\}, \quad Y = \{y_1, y_2, \ldots, y_N\}

Page 6

Mixture Models

p(x_i | \Theta) = \sum_{j=1}^{M} \alpha_j \, p_j(x_i | \theta_j)

Let y_i \in \{1, \ldots, M\} represent the source that generates the data. If sample x_i was generated by component j, then y_i = j:

p(x_i | y_i = j, \Theta) = p_j(x_i | \theta_j)

p(y_i = j | \Theta) = \alpha_j

Page 7

Mixture Models

Define the complete data point z_i = (x_i, y_i):

p(z_i | \Theta) = p(x_i, y_i | \Theta) = p(x_i | y_i, \Theta) \, p(y_i | \Theta)

p(x_i | \Theta) = \sum_{y_i = 1}^{M} p(x_i, y_i | \Theta) = \sum_{j=1}^{M} p(x_i | y_i = j, \Theta) \, p(y_i = j | \Theta) = \sum_{j=1}^{M} \alpha_j \, p_j(x_i | \theta_j)

where p(x_i | y_i = j, \Theta) = p_j(x_i | \theta_j) and p(y_i = j | \Theta) = \alpha_j.

Page 8

Mixture Models

p(z_i | \Theta) = p(x_i, y_i | \Theta) = p(x_i | y_i, \Theta) \, p(y_i | \Theta)

with p(x_i | y_i = j, \Theta) = p_j(x_i | \theta_j) and p(y_i = j | \Theta) = \alpha_j. By Bayes' rule,

p(y_i | x_i, \Theta) = \frac{p(x_i, y_i | \Theta)}{p(x_i | \Theta)} = \frac{p(x_i | y_i, \Theta) \, p(y_i | \Theta)}{p(x_i | \Theta)} = \frac{\alpha_{y_i} \, p_{y_i}(x_i | \theta_{y_i})}{\sum_{j=1}^{M} \alpha_j \, p_j(x_i | \theta_j)}

since p(x_i | \Theta) = \sum_{j=1}^{M} \alpha_j \, p_j(x_i | \theta_j).

Given x and \Theta, the conditional density of y can be computed.

Page 9

Complete-Data Likelihood Function

X = \{x_1, \ldots, x_N\}, \quad y = \{y_1, \ldots, y_N\}, \quad z_i = (x_i, y_i), \quad Z = \{z_1, \ldots, z_N\}

p(Z | \Theta) = p(X, y | \Theta) = p(X | y, \Theta) \, p(y | \Theta) = \prod_{i=1}^{N} p(x_i | y_i, \Theta) \, p(y_i | \Theta) = \prod_{i=1}^{N} \alpha_{y_i} \, p_{y_i}(x_i | \theta_{y_i})

But we don't know the latent variable y. The incomplete-data likelihood is

p(X | \Theta) = \prod_{i=1}^{N} p(x_i | \Theta) = \prod_{i=1}^{N} \sum_{j=1}^{M} \alpha_j \, p_j(x_i | \theta_j)
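As a concrete check of the incomplete-data likelihood, the sketch below evaluates log p(X|\Theta) for a two-component 1-D Gaussian mixture in NumPy. Only the formula comes from the slide; the helper names and the toy data are illustrative assumptions.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density N(x; mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def incomplete_loglik(X, alphas, mus, sigmas):
    """log p(X|Theta) = sum_i log sum_j alpha_j p_j(x_i | theta_j)."""
    X = np.asarray(X, dtype=float)
    # per-component weighted densities alpha_j p_j(x_i), shape (M, N)
    per_comp = np.array([a * gauss_pdf(X, m, s)
                         for a, m, s in zip(alphas, mus, sigmas)])
    return float(np.sum(np.log(per_comp.sum(axis=0))))

# toy data clustered around -2 and +2 (made up for illustration)
X = np.array([-2.1, -1.9, -2.0, 1.8, 2.2, 2.0])
ll = incomplete_loglik(X, alphas=[0.5, 0.5], mus=[-2.0, 2.0], sigmas=[1.0, 1.0])
```

A model whose means sit near the data should score higher than one whose means are far away, which is exactly the quantity EM will climb.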

Page 10

Expectation-Maximization (1)

• If we want to find the local maxima of f(x)
  – Define an auxiliary function A(x, x^t) at x^t
  – Satisfy f(x) \ge A(x, x^t) \;\; \forall x and f(x^t) = A(x^t, x^t)

x^{t+1} = \arg\max_x A(x, x^t)

f(x^{t+1}) \ge A(x^{t+1}, x^t) \ge A(x^t, x^t) = f(x^t)

so f(x^{t+1}) \ge f(x^t), \quad f(x^{t+2}) \ge f(x^{t+1}), \ldots
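The auxiliary-function idea above can be demonstrated on a toy problem. In this sketch, f and A are my own illustrative choices, not from the slides: f(x) = -(x-3)^2 is the objective, and A(x, x^t) = f(x) - (x - x^t)^2 satisfies both conditions (A \le f everywhere, equality at x^t), so maximizing A repeatedly must drive f upward.

```python
def f(x):
    """Toy objective with its maximum at x = 3."""
    return -(x - 3.0) ** 2

def argmax_A(xt):
    """Maximize A(x, xt) = -(x-3)^2 - (x-xt)^2 in closed form:
    dA/dx = -2(x-3) - 2(x-xt) = 0  =>  x = (3 + xt) / 2."""
    return (3.0 + xt) / 2.0

x = 0.0
history = [f(x)]
for _ in range(20):
    x = argmax_A(x)        # x^{t+1} = argmax_x A(x, x^t)
    history.append(f(x))   # f never decreases along the iterates
```

Each iterate halves the distance to the optimum, and the recorded objective values are monotonically non-decreasing, exactly as the inequality chain on the slide predicts.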

Page 11

Expectation-Maximization (2)

• The goal: finding ML solutions for probabilistic models with latent variables

\log p(x | \Theta) = \log \sum_y p(x, y | \Theta) — difficult; we need an auxiliary function.

Introduce any distribution q(y) with \sum_y q(y) = 1, q(y) \ge 0:

\log p(x | \Theta) = \log \sum_y q(y) \frac{p(x, y | \Theta)}{q(y)} \ge \sum_y q(y) \log \frac{p(x, y | \Theta)}{q(y)}

since \log(x) is concave (Jensen's inequality). The right-hand side is the auxiliary function.

Page 12

Expectation-Maximization (2)

• What is the gap?

\log p(x | \Theta) - \sum_y q(y) \log \frac{p(x, y | \Theta)}{q(y)}
= \sum_y q(y) \log p(x | \Theta) - \sum_y q(y) \log \frac{p(x, y | \Theta)}{q(y)}
= \sum_y q(y) \log \frac{q(y) \, p(x | \Theta)}{p(x, y | \Theta)}
= \sum_y q(y) \log \frac{q(y)}{p(y | x, \Theta)}
= KL(q(y) \,\|\, p(y | x, \Theta)) \ge 0

KL(q(y) \,\|\, p(y | x, \Theta)) = 0 iff q(y) = p(y | x, \Theta)
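The gap identity can be verified numerically. The sketch below uses a single observation and M = 2 components with made-up joint probabilities p(x, y=j): the bound matches log p(x) exactly when q is the posterior, and falls short for any other q.

```python
import numpy as np

# made-up joint p(x, y=j) for one observation x and j = 1, 2
joint = np.array([0.12, 0.03])
log_px = np.log(joint.sum())            # log p(x) = log sum_y p(x, y)

def bound(q):
    """Lower bound sum_y q(y) log( p(x,y) / q(y) )."""
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * np.log(joint / q)))

posterior = joint / joint.sum()           # p(y|x), the optimal choice of q
gap_post = log_px - bound(posterior)      # KL(p(y|x) || p(y|x)) = 0
gap_uniform = log_px - bound([0.5, 0.5])  # KL(q || p(y|x)) > 0 for q != posterior
```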

Page 13

Expectation-Maximization (2)

• We have shown that

\log p(x | \Theta) - \sum_y q(y) \log \frac{p(x, y | \Theta)}{q(y)} = KL(q(y) \,\|\, p(y | x, \Theta)) \ge 0

Choose q(y) = p(y | x, \Theta^{old}) and define

A(\Theta, \Theta^{old}) = \sum_y p(y | x, \Theta^{old}) \log \frac{p(x, y | \Theta)}{p(y | x, \Theta^{old})}

If we evaluate it with the old parameters,

A(\Theta^{old}, \Theta^{old}) = \sum_y p(y | x, \Theta^{old}) \log \frac{p(x, y | \Theta^{old})}{p(y | x, \Theta^{old})} = \sum_y p(y | x, \Theta^{old}) \log p(x | \Theta^{old}) = \log p(x | \Theta^{old})

so the bound is tight at \Theta^{old}.

Page 14

Expectation-Maximization (3)

• Any physical meaning of the auxiliary function?

A(\Theta, \Theta^{old}) = \sum_y p(y | x, \Theta^{old}) \log \frac{p(x, y | \Theta)}{p(y | x, \Theta^{old})}
= \sum_y p(y | x, \Theta^{old}) \log p(x, y | \Theta) - \sum_y p(y | x, \Theta^{old}) \log p(y | x, \Theta^{old})
= \sum_y p(y | x, \Theta^{old}) \log p(x, y | \Theta) + const

The second term is independent of \Theta. The first term is the conditional expectation of the complete-data log-likelihood.

Page 15

Expectation-Maximization (4)

1. Choose an initial setting for the parameters \Theta^{old}
2. E step: evaluate p(y | x, \Theta^{old})
3. M step: evaluate \Theta^{new} given by

\Theta^{new} = \arg\max_{\Theta} Q(\Theta, \Theta^{old}), \quad Q(\Theta, \Theta^{old}) = \sum_y p(y | x, \Theta^{old}) \log p(x, y | \Theta)

4. Check for convergence; if not converged, set \Theta^{old} \leftarrow \Theta^{new} and return to step 2

Page 16

Expectation-Maximization (4)

Page 17

Gaussian Mixture Model (GMM)

Gaussian model of a d-dimensional source, say j:

p_j(x_i | \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu_j)^T \Sigma_j^{-1} (x_i - \mu_j) \right)

GMM with M sources:

p(x_i | \mu_1, \Sigma_1, \ldots, \mu_M, \Sigma_M) = \sum_{j=1}^{M} \alpha_j \, p_j(x_i | \mu_j, \Sigma_j)

\theta_j = (\mu_j, \Sigma_j), \quad \alpha_j \ge 0, \quad \sum_{j=1}^{M} \alpha_j = 1
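The densities above translate directly into NumPy. This is a minimal sketch of the two formulas; the function names are my own.

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """d-dimensional Gaussian density p_j(x | mu, Sigma) from the slide."""
    d = len(mu)
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * quad) / norm)

def gmm_pdf(x, alphas, mus, Sigmas):
    """p(x | Theta) = sum_j alpha_j p_j(x | mu_j, Sigma_j)."""
    return sum(a * gauss_pdf(x, m, S) for a, m, S in zip(alphas, mus, Sigmas))

# sanity check: at x = mu with Sigma = I in d = 2, the density is 1 / (2*pi)
val = gauss_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))
```

Using `np.linalg.solve` instead of forming Sigma^{-1} explicitly is the standard numerically safer way to evaluate the quadratic form.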

Page 18

For the mixture model p(x_i | \Theta) = \sum_{l=1}^{M} \alpha_l \, p_l(x_i | \theta_l), with \Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M) subject to \sum_{l=1}^{M} \alpha_l = 1, the Q function becomes

Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l) \, p(l | x_i, \Theta^g) + \sum_{l=1}^{M} \sum_{i=1}^{N} \log[p_l(x_i | \theta_l)] \, p(l | x_i, \Theta^g)

This follows from Q(\Theta, \Theta^{old}) = \sum_y p(y | x, \Theta^{old}) \log p(x, y | \Theta) together with the complete-data likelihood p(x_i, y_i | \Theta) = \alpha_{y_i} \, p_{y_i}(x_i | \theta_{y_i}).

E step: compute p(l | x_i, \Theta^g).

M step: \Theta^{new} = \arg\max_{\Theta} Q(\Theta, \Theta^g). The first term is correlated with \alpha_l only; the second term is correlated with \theta_l only.

Page 19

Finding \alpha_l

Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l) \, p(l | x_i, \Theta^g) + \sum_{l=1}^{M} \sum_{i=1}^{N} \log[p_l(x_i | \theta_l)] \, p(l | x_i, \Theta^g)

Due to the constraint on the \alpha_l's, we introduce a Lagrange multiplier \lambda and solve the following equation:

\frac{\partial}{\partial \alpha_l} \left[ \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l) \, p(l | x_i, \Theta^g) + \lambda \left( \sum_{l=1}^{M} \alpha_l - 1 \right) \right] = 0, \quad l = 1, \ldots, M

\sum_{i=1}^{N} \frac{1}{\alpha_l} \, p(l | x_i, \Theta^g) + \lambda = 0, \quad l = 1, \ldots, M

\sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \alpha_l = 0, \quad l = 1, \ldots, M

Page 20

Finding \alpha_l

Summing \sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \alpha_l = 0 over l = 1, \ldots, M:

\sum_{l=1}^{M} \sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \sum_{l=1}^{M} \alpha_l = 0

\sum_{i=1}^{N} \sum_{l=1}^{M} p(l | x_i, \Theta^g) + \lambda = 0

N + \lambda = 0 \quad \Rightarrow \quad \lambda = -N

since \sum_{l=1}^{M} p(l | x_i, \Theta^g) = 1 for each i.

Page 21

Finding \alpha_l

Substituting \lambda = -N back into \sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \alpha_l = 0:

\alpha_l = \frac{1}{N} \sum_{i=1}^{N} p(l | x_i, \Theta^g), \quad l = 1, \ldots, M

where

p(l | x_i, \Theta^g) = \frac{\alpha_l^g \, p_l(x_i | \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g \, p_j(x_i | \theta_j^g)}
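The posterior p(l | x_i, \Theta^g) and the \alpha_l update can be computed in a few vectorized lines. The sketch below assumes a 1-D GMM with toy parameters of my own choosing; the two formulas are the ones just derived.

```python
import numpy as np

def responsibilities(X, alphas, mus, sigmas):
    """p(l | x_i, Theta^g) for a 1-D GMM; returns an (N, M) array."""
    X = np.asarray(X, dtype=float)[:, None]            # shape (N, 1)
    mus = np.asarray(mus, dtype=float)[None, :]        # shape (1, M)
    sigmas = np.asarray(sigmas, dtype=float)[None, :]
    dens = np.exp(-0.5 * ((X - mus) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    num = np.asarray(alphas)[None, :] * dens           # alpha_l^g p_l(x_i | theta_l^g)
    return num / num.sum(axis=1, keepdims=True)        # normalize over l

X = np.array([-2.0, -1.5, 1.5, 2.0, 2.5])              # toy data
R = responsibilities(X, alphas=[0.4, 0.6], mus=[-2.0, 2.0], sigmas=[1.0, 1.0])
alpha_new = R.mean(axis=0)   # alpha_l = (1/N) sum_i p(l | x_i, Theta^g)
```

Each row of R sums to one (it is a posterior over components), so the updated \alpha_l automatically sum to one as well.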

Page 22

Finding \theta_l

Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l) \, p(l | x_i, \Theta^g) + \sum_{l=1}^{M} \sum_{i=1}^{N} \log[p_l(x_i | \theta_l)] \, p(l | x_i, \Theta^g)

Only the second term depends on \theta_l, so we only need to maximize that term. Consider the GMM, where \theta_l = (\mu_l, \Sigma_l):

p_l(x | \mu_l, \Sigma_l) = \frac{1}{(2\pi)^{d/2} |\Sigma_l|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_l)^T \Sigma_l^{-1} (x - \mu_l) \right)

\log[p_l(x | \mu_l, \Sigma_l)] = -\frac{d}{2} \log(2\pi) + \frac{1}{2} \log|\Sigma_l^{-1}| - \frac{1}{2} (x - \mu_l)^T \Sigma_l^{-1} (x - \mu_l)

The first term is unrelated to the parameters.

Page 23

Finding \theta_l

Therefore, we want to maximize:

\hat{Q}(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \left[ \frac{1}{2} \log|\Sigma_l^{-1}| - \frac{1}{2} (x_i - \mu_l)^T \Sigma_l^{-1} (x_i - \mu_l) \right] p(l | x_i, \Theta^g)

How?

Page 24

Finding \theta_l

Therefore, we want to maximize:

\hat{Q}(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \left[ \frac{1}{2} \log|\Sigma_l^{-1}| - \frac{1}{2} (x_i - \mu_l)^T \Sigma_l^{-1} (x_i - \mu_l) \right] p(l | x_i, \Theta^g)

Setting the derivatives with respect to \mu_l and \Sigma_l to zero gives

\mu_l = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g) \, x_i}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

\Sigma_l = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g) \, (x_i - \mu_l)(x_i - \mu_l)^T}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

where p(l | x_i, \Theta^g) = \frac{\alpha_l^g \, p_l(x_i | \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g \, p_j(x_i | \theta_j^g)}
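The \mu_l and \Sigma_l updates are just responsibility-weighted mean and covariance computations. A minimal sketch for one component, with a made-up data matrix and function name:

```python
import numpy as np

def update_component(X, r_l):
    """M-step update for one component given its responsibilities
    r_l[i] = p(l | x_i, Theta^g); X has one sample per row."""
    w = r_l / r_l.sum()                                # normalized weights
    mu = (w[:, None] * X).sum(axis=0)                  # weighted mean
    diff = X - mu
    # weighted sum of outer products (x_i - mu)(x_i - mu)^T
    Sigma = (w[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(axis=0)
    return mu, Sigma

# with equal responsibilities this reduces to the sample mean and covariance
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
mu, Sigma = update_component(X, np.ones(len(X)))
```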

Page 25

Summary

EM algorithm for GMM. Given an initial guess \Theta^g, find \Theta^{new} as follows:

p(l | x_i, \Theta^g) = \frac{\alpha_l^g \, p_l(x_i | \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g \, p_j(x_i | \theta_j^g)}

\alpha_l^{new} = \frac{1}{N} \sum_{i=1}^{N} p(l | x_i, \Theta^g)

\mu_l^{new} = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g) \, x_i}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

\Sigma_l^{new} = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g) \, (x_i - \mu_l^{new})(x_i - \mu_l^{new})^T}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

If not converged, set \Theta^g \leftarrow \Theta^{new} and repeat.
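Putting the summary together, the sketch below runs the full EM loop for a 1-D GMM. The toy data, function name, and the small variance floor are my own assumptions; the E and M steps follow the update equations above, and by the auxiliary-function argument the recorded log-likelihood should never decrease.

```python
import numpy as np

def em_gmm_1d(X, alphas, mus, sigmas, n_iter=30):
    """EM for a 1-D GMM; returns final parameters and log-likelihood history."""
    X = np.asarray(X, dtype=float)
    alphas = np.array(alphas, dtype=float)
    mus = np.array(mus, dtype=float)
    sigmas = np.array(sigmas, dtype=float)
    history = []
    for _ in range(n_iter):
        # E step: responsibilities p(l | x_i, Theta^g), shape (N, M)
        dens = (np.exp(-0.5 * ((X[:, None] - mus) / sigmas) ** 2)
                / (sigmas * np.sqrt(2 * np.pi)))
        num = alphas * dens
        px = num.sum(axis=1)                     # mixture density per sample
        history.append(np.log(px).sum())         # incomplete-data log-likelihood
        R = num / px[:, None]
        # M step: closed-form updates from the summary slide
        Nl = R.sum(axis=0)
        alphas = Nl / len(X)
        mus = (R * X[:, None]).sum(axis=0) / Nl
        var = (R * (X[:, None] - mus) ** 2).sum(axis=0) / Nl
        sigmas = np.sqrt(var + 1e-8)             # tiny floor to avoid collapse
    return alphas, mus, sigmas, history

# toy bimodal data around -2 and +2
X = np.concatenate([np.full(20, -2.0) + np.linspace(-0.5, 0.5, 20),
                    np.full(20, 2.0) + np.linspace(-0.5, 0.5, 20)])
alphas, mus, sigmas, history = em_gmm_1d(X, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

Starting from means at -1 and +1, the components migrate toward the two data clusters, and the log-likelihood history is monotonically non-decreasing, which is the guarantee EM provides.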