Design and Implementation of Speech Recognition Systems
Fall 2014, Ming Li
Special topic: the Expectation-Maximization algorithm and GMM
Sep 30 2014
Some graphics are from Pattern Recognition and Machine Learning, Bishop, Copyright © Springer
Some equations are from the slides edited by Muneem S.
Parametric density estimation
• How to estimate those parameters?
  – ML
  – MAP
• How about multimodal cases?
Mixture Models
• If you believe that the data set is comprised of several distinct populations
• It has the following form:

p(x_i | \Theta) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)

\Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M), \quad \text{with } \sum_{j=1}^{M} \alpha_j = 1
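As a quick sanity check, the mixture density above can be evaluated directly. A minimal NumPy sketch, assuming an illustrative two-component 1-D Gaussian mixture (all parameter values below are made up):

```python
import numpy as np

# Illustrative two-component 1-D Gaussian mixture; all parameter values are made up.
alphas = np.array([0.3, 0.7])   # mixing weights alpha_j, must sum to 1
mus = np.array([-2.0, 3.0])     # component parameters theta_j = (mu_j, sigma_j)
sigmas = np.array([1.0, 0.5])

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density p_j(x | theta_j)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(x):
    """p(x | Theta) = sum_j alpha_j p_j(x | theta_j)."""
    return sum(a * gauss_pdf(x, m, s) for a, m, s in zip(alphas, mus, sigmas))

density = mixture_pdf(0.0)
```

Because the weights sum to one and each component integrates to one, the mixture is itself a valid density.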
Mixture Models

p(x_i | \Theta) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)

Let y_i \in \{1, \ldots, M\} represent the source that generates the data:

X = \{x_1, x_2, \ldots, x_N\}, \quad Y = \{y_1, y_2, \ldots, y_N\}
Mixture Models

p(x_i | \Theta) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)

Let y_i \in \{1, \ldots, M\} represent the source that generates x_i. If y_i = j, then

p(x_i | y_i = j, \Theta) = p_j(x_i | \theta_j), \quad p(y_i = j | \Theta) = \alpha_j
Mixture Models

Let z_i = (x_i, y_i) be the complete data point. Then

p(z_i | \Theta) = p(x_i, y_i | \Theta) = p(x_i | y_i, \Theta) p(y_i | \Theta)

p(x_i | \Theta) = \sum_{y_i=1}^{M} p(x_i, y_i | \Theta) = \sum_{j=1}^{M} p(y_i = j | \Theta) p(x_i | y_i = j, \Theta) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)

where p(x_i | y_i = j, \Theta) = p_j(x_i | \theta_j) and p(y_i = j | \Theta) = \alpha_j.
Mixture Models

p(z_i | \Theta) = p(x_i, y_i | \Theta) = p(y_i | x_i, \Theta) p(x_i | \Theta) = p(x_i | y_i, \Theta) p(y_i | \Theta)

where p(x_i | y_i = j, \Theta) = p_j(x_i | \theta_j) and p(y_i = j | \Theta) = \alpha_j. By Bayes' rule,

p(y_i | x_i, \Theta) = \frac{p(x_i, y_i | \Theta)}{p(x_i | \Theta)} = \frac{p(x_i | y_i, \Theta) p(y_i | \Theta)}{p(x_i | \Theta)} = \frac{\alpha_{y_i} p_{y_i}(x_i | \theta_{y_i})}{\sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)}

Given x_i and \Theta, the conditional density of y_i can be computed, using p(x_i | \Theta) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j).
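The posterior over the source label is easy to compute numerically. A minimal sketch, again with made-up two-component parameters:

```python
import numpy as np

# Made-up two-component 1-D mixture parameters (illustrative only).
alphas = np.array([0.5, 0.5])
mus = np.array([-2.0, 2.0])
sigmas = np.array([1.0, 1.0])

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x_i = 1.5
# Numerator: alpha_j p_j(x_i | theta_j); denominator: the sum over j of the same.
num = alphas * gauss_pdf(x_i, mus, sigmas)
posterior = num / num.sum()   # p(y_i = j | x_i, Theta) for j = 0, 1
```

Since x_i = 1.5 lies close to the second component's mean, the posterior mass concentrates on j = 1.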
Complete-Data Likelihood Function

X = \{x_1, \ldots, x_N\}, \quad y = \{y_1, \ldots, y_N\}, \quad z_i = (x_i, y_i), \quad Z = \{z_1, \ldots, z_N\}

p(Z | \Theta) = p(X, y | \Theta) = p(X | y, \Theta) p(y | \Theta) = \prod_{i=1}^{N} p(x_i | y_i, \Theta) p(y_i | \Theta) = \prod_{i=1}^{N} \alpha_{y_i} p_{y_i}(x_i | \theta_{y_i})

But we don't know the latent variables y, so we can only maximize the incomplete-data likelihood

p(X | \Theta) = \prod_{i=1}^{N} p(x_i | \Theta) = \prod_{i=1}^{N} \sum_{j=1}^{M} \alpha_j p_j(x_i | \theta_j)
Expectation-Maximization (1)
• If we want to find a local maximum of f(x):
  – Define an auxiliary function A(x, x^t) at x^t
  – Satisfy f(x) \ge A(x, x^t) \;\forall x, and f(x^t) = A(x^t, x^t)
• Iterate x^{t+1} = \arg\max_x A(x, x^t). Then

f(x^{t+1}) \ge A(x^{t+1}, x^t) \ge A(x^t, x^t) = f(x^t)

so f(x^{t+1}) \ge f(x^t), f(x^{t+2}) \ge f(x^{t+1}), \ldots — the sequence f(x^t) is non-decreasing.
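This iterate-the-surrogate idea can be demonstrated on a toy problem (not from the slides): maximize f(x) = log(x) − x on x > 0, whose maximum is at x = 1. Using the bound log(u) ≤ u − 1, the function A(x, x^t) = log(x^t) + 1 − x^t/x − x minorizes f, touches it at x^t, and its maximizer is x^{t+1} = sqrt(x^t):

```python
import numpy as np

def f(x):
    # Objective to maximize on x > 0; its maximum is at x = 1.
    return np.log(x) - x

def A(x, x_t):
    # Auxiliary function: A(x, x_t) <= f(x) for all x > 0 and A(x_t, x_t) = f(x_t).
    # Follows from log(x / x_t) >= 1 - x_t / x (i.e. log(u) <= u - 1 with u = x_t / x).
    return np.log(x_t) + 1.0 - x_t / x - x

# argmax_x A(x, x_t) solves x_t / x**2 = 1, i.e. x_{t+1} = sqrt(x_t)
x_t = 16.0
vals = [f(x_t)]
for _ in range(40):
    x_t = np.sqrt(x_t)
    vals.append(f(x_t))
```

Each update maximizes the surrogate rather than f itself, yet f(x^t) never decreases, exactly as in the inequality chain above.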
Expectation-Maximization (2)
• The goal: finding ML solutions for probabilistic models with latent variables y:

\log p(x | \Theta) = \log \sum_{y} p(x, y | \Theta)    (difficult to maximize directly)

• Introduce any distribution q(y) with \sum_{y} q(y) = 1, q(y) \ge 0. Since \log(x) is concave, Jensen's inequality gives an auxiliary (lower-bound) function:

\log p(x | \Theta) = \log \sum_{y} q(y) \frac{p(x, y | \Theta)}{q(y)} \ge \sum_{y} q(y) \log \frac{p(x, y | \Theta)}{q(y)}
Expectation-Maximization (2)
• What is the gap?

\log p(x | \Theta) - \sum_{y} q(y) \log \frac{p(x, y | \Theta)}{q(y)}
  = \sum_{y} q(y) \left[ \log p(x | \Theta) - \log \frac{p(x, y | \Theta)}{q(y)} \right]
  = \sum_{y} q(y) \log \frac{q(y)\, p(x | \Theta)}{p(x, y | \Theta)}
  = \sum_{y} q(y) \log \frac{q(y)}{p(y | x, \Theta)}
  = KL(q(y) \,\|\, p(y | x, \Theta)) \ge 0

with KL(q(y) \,\|\, p(y | x, \Theta)) = 0 iff q(y) = p(y | x, \Theta).
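The identity above can be checked numerically on a tiny discrete model. A sketch assuming a two-valued latent y and made-up joint probabilities for one fixed observation x:

```python
import numpy as np

# Tiny discrete check of the bound: y in {0, 1}, with an assumed joint
# p(x, y | Theta) for one fixed observed x (values are illustrative).
p_joint = np.array([0.12, 0.28])        # p(x, y = j | Theta)
p_x = p_joint.sum()                     # p(x | Theta) = sum_y p(x, y | Theta)
posterior = p_joint / p_x               # p(y | x, Theta)

def lower_bound(q):
    # sum_y q(y) log( p(x, y | Theta) / q(y) )
    return float(np.sum(q * np.log(p_joint / q)))

q = np.array([0.5, 0.5])                # an arbitrary q(y)
gap = np.log(p_x) - lower_bound(q)      # should equal KL(q || posterior)
kl = float(np.sum(q * np.log(q / posterior)))
```

The gap is exactly the KL divergence, and choosing q equal to the posterior makes the bound tight.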
Expectation-Maximization (2)
• We have shown that

\log p(x | \Theta) - \sum_{y} q(y) \log \frac{p(x, y | \Theta)}{q(y)} = KL(q(y) \,\|\, p(y | x, \Theta)) \ge 0

• Choose q(y) = p(y | x, \Theta^{old}) and define

A(\Theta, \Theta^{old}) = \sum_{y} p(y | x, \Theta^{old}) \log \frac{p(x, y | \Theta)}{p(y | x, \Theta^{old})}

• If we evaluate it with the old parameters, the bound is tight:

A(\Theta^{old}, \Theta^{old}) = \sum_{y} p(y | x, \Theta^{old}) \log p(x | \Theta^{old}) = \log p(x | \Theta^{old})
Expectation-Maximization (3)
• Any physical meaning of the auxiliary function?

A(\Theta, \Theta^{old}) = \sum_{y} p(y | x, \Theta^{old}) \log \frac{p(x, y | \Theta)}{p(y | x, \Theta^{old})}
  = \sum_{y} p(y | x, \Theta^{old}) \log p(x, y | \Theta) - \sum_{y} p(y | x, \Theta^{old}) \log p(y | x, \Theta^{old})
  = \sum_{y} p(y | x, \Theta^{old}) \log p(x, y | \Theta) + const    (the second term is independent of \Theta)

• The first term is the conditional expectation of the complete-data log likelihood.
Expectation-Maximization (4)
1. Choose an initial setting for the parameters \Theta^{old}
2. E step: evaluate p(y | x, \Theta^{old})
3. M step: evaluate \Theta^{new} given by

\Theta^{new} = \arg\max_{\Theta} Q(\Theta, \Theta^{old}), \quad \text{where} \quad Q(\Theta, \Theta^{old}) = \sum_{y} p(y | x, \Theta^{old}) \log p(x, y | \Theta)

4. Check for convergence; if not converged, set \Theta^{old} \leftarrow \Theta^{new} and return to step 2
Gaussian Mixture Model (GMM)

Gaussian model of a d-dimensional source, say j:

p_j(x_i | \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu_j)^T \Sigma_j^{-1} (x_i - \mu_j) \right)

GMM with M sources:

p(x_i | \alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M) = \sum_{j=1}^{M} \alpha_j p_j(x_i | \mu_j, \Sigma_j), \quad \theta_j = (\mu_j, \Sigma_j)

with \alpha_j \ge 0 and \sum_{j=1}^{M} \alpha_j = 1.
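The Gaussian component and the mixture can be coded directly from these formulas. A minimal sketch for d = 2, with made-up parameters:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) in d dimensions, as in the formula above."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)      # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * maha) / norm)

# Illustrative 2-D, two-component GMM (parameter values are made up).
alphas = [0.4, 0.6]
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]

def gmm_pdf(x):
    # p(x | Theta) = sum_j alpha_j p_j(x | mu_j, Sigma_j)
    return sum(a * gaussian_pdf(x, m, S) for a, m, S in zip(alphas, mus, Sigmas))
```

Using `np.linalg.solve` avoids explicitly inverting \Sigma_j, which is both cheaper and numerically safer.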
Mixture Model

p(x_i | \Theta) = \sum_{l=1}^{M} \alpha_l p_l(x_i | \theta_l), \quad \Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M), \quad \text{subject to } \sum_{l=1}^{M} \alpha_l = 1

With p(x_i, y_i | \Theta) = \alpha_{y_i} p_{y_i}(x_i | \theta_{y_i}), the general Q function

Q(\Theta, \Theta^{old}) = \sum_{y} p(y | x, \Theta^{old}) \log p(x, y | \Theta)

becomes

Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\, p(l | x_i, \Theta^g) + \sum_{l=1}^{M} \sum_{i=1}^{N} \log[p_l(x_i | \theta_l)]\, p(l | x_i, \Theta^g)

The first term is correlated with the \alpha_l only; the second term is correlated with the \theta_l only.

E step: compute p(l | x_i, \Theta^g)
M step: \Theta^{new} = \arg\max_{\Theta} Q(\Theta, \Theta^g)
Finding \alpha_l

Q(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\, p(l | x_i, \Theta^g) + \sum_{l=1}^{M} \sum_{i=1}^{N} \log[p_l(x_i | \theta_l)]\, p(l | x_i, \Theta^g)

Due to the constraint on the \alpha_l's, we introduce a Lagrange multiplier \lambda and solve the following equation:

\frac{\partial}{\partial \alpha_l} \left[ \sum_{l=1}^{M} \sum_{i=1}^{N} \log(\alpha_l)\, p(l | x_i, \Theta^g) + \lambda \left( \sum_{l=1}^{M} \alpha_l - 1 \right) \right] = 0, \quad l = 1, \ldots, M

\frac{1}{\alpha_l} \sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda = 0, \quad l = 1, \ldots, M

\sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \alpha_l = 0, \quad l = 1, \ldots, M
Finding \alpha_l

Summing \sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \alpha_l = 0 over l:

\sum_{l=1}^{M} \sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \sum_{l=1}^{M} \alpha_l = 0

\sum_{i=1}^{N} \sum_{l=1}^{M} p(l | x_i, \Theta^g) + \lambda = 0

Since \sum_{l=1}^{M} p(l | x_i, \Theta^g) = 1 for each i, this gives N + \lambda = 0, i.e. \lambda = -N.
Finding \alpha_l

Substituting \lambda = -N into \sum_{i=1}^{N} p(l | x_i, \Theta^g) + \lambda \alpha_l = 0 gives

\alpha_l = \frac{1}{N} \sum_{i=1}^{N} p(l | x_i, \Theta^g)

where the responsibilities are

p(l | x_i, \Theta^g) = \frac{\alpha_l^g\, p_l(x_i | \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g\, p_j(x_i | \theta_j^g)}
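The weight update is just the average responsibility of each component. A sketch on synthetic 1-D data (both the data and the guessed parameters \Theta^g are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data from two populations, and a guessed Theta^g (all illustrative).
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
alphas_g = np.array([0.5, 0.5])
mus_g = np.array([0.0, 1.0])
sigmas_g = np.array([1.0, 1.0])

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# E step: responsibilities p(l | x_i, Theta^g); rows index i, columns index l.
num = alphas_g * gauss_pdf(x[:, None], mus_g, sigmas_g)
resp = num / num.sum(axis=1, keepdims=True)

# Weight update from the Lagrange-multiplier derivation:
# alpha_l = (1/N) sum_i p(l | x_i, Theta^g)
alphas_new = resp.mean(axis=0)
```

Because each row of responsibilities sums to one, the updated weights automatically satisfy the constraint \sum_l \alpha_l = 1.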
Finding \mu_l, \Sigma_l

Only the second term of Q(\Theta, \Theta^g) involves \theta_l = (\mu_l, \Sigma_l), so we only need to maximize

\sum_{l=1}^{M} \sum_{i=1}^{N} \log[p_l(x_i | \theta_l)]\, p(l | x_i, \Theta^g)

Consider the GMM case:

p_l(x | \mu_l, \Sigma_l) = \frac{1}{(2\pi)^{d/2} |\Sigma_l|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_l)^T \Sigma_l^{-1} (x - \mu_l) \right)

\log[p_l(x | \mu_l, \Sigma_l)] = -\frac{d}{2} \log 2\pi - \frac{1}{2} \log|\Sigma_l| - \frac{1}{2} (x - \mu_l)^T \Sigma_l^{-1} (x - \mu_l)

The first term is unrelated to the parameters.
Finding \mu_l, \Sigma_l

Therefore, we want to maximize:

Q'(\Theta, \Theta^g) = \sum_{l=1}^{M} \sum_{i=1}^{N} \left[ -\frac{1}{2} \log|\Sigma_l| - \frac{1}{2} (x_i - \mu_l)^T \Sigma_l^{-1} (x_i - \mu_l) \right] p(l | x_i, \Theta^g)

How? Take derivatives with respect to \mu_l and \Sigma_l and set them to zero.
Finding \mu_l, \Sigma_l

Setting the derivatives of Q'(\Theta, \Theta^g) to zero gives:

\mu_l = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g)\, x_i}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

\Sigma_l = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g)\, (x_i - \mu_l)(x_i - \mu_l)^T}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

with p(l | x_i, \Theta^g) = \frac{\alpha_l^g\, p_l(x_i | \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g\, p_j(x_i | \theta_j^g)}
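These two updates are responsibility-weighted means and covariances. A sketch with illustrative 2-D data and stand-in responsibilities (in a real E step they would come from the formula above):

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative inputs: 2-D data and stand-in responsibilities p(l | x_i, Theta^g).
x = rng.normal(size=(200, 2))
resp = rng.dirichlet(np.ones(2), size=200)   # each row sums to 1

Nl = resp.sum(axis=0)                        # "soft counts" per component l

# mu_l = sum_i p(l | x_i, Theta^g) x_i / sum_i p(l | x_i, Theta^g)
mus_new = (resp.T @ x) / Nl[:, None]

# Sigma_l = sum_i p(l | x_i, Theta^g)(x_i - mu_l)(x_i - mu_l)^T / same denominator
Sigmas_new = []
for l in range(resp.shape[1]):
    diff = x - mus_new[l]
    Sigmas_new.append((resp[:, l, None] * diff).T @ diff / Nl[l])
```

Each weighted scatter matrix D^T diag(w) D is symmetric by construction, so the updated covariances stay valid.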
Summary: EM algorithm for GMM

Given an initial guess \Theta^g, find \Theta^{new} as follows:

p(l | x_i, \Theta^g) = \frac{\alpha_l^g\, p_l(x_i | \theta_l^g)}{\sum_{j=1}^{M} \alpha_j^g\, p_j(x_i | \theta_j^g)}

\alpha_l^{new} = \frac{1}{N} \sum_{i=1}^{N} p(l | x_i, \Theta^g)

\mu_l^{new} = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g)\, x_i}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

\Sigma_l^{new} = \frac{\sum_{i=1}^{N} p(l | x_i, \Theta^g)\, (x_i - \mu_l^{new})(x_i - \mu_l^{new})^T}{\sum_{i=1}^{N} p(l | x_i, \Theta^g)}

If not converged, set \Theta^g \leftarrow \Theta^{new} and repeat.
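Putting the summary together, here is a minimal, self-contained EM loop for a 1-D GMM (synthetic data and a deliberately rough initialization; in 1-D the covariances reduce to scalar variances):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data: two Gaussian populations (true means -3 and 2).
x = np.concatenate([rng.normal(-3.0, 1.0, 150), rng.normal(2.0, 0.5, 150)])
N, M = len(x), 2

# Initial guess Theta^g (illustrative)
alphas = np.full(M, 1.0 / M)
mus = np.array([-1.0, 1.0])
sigmas = np.array([1.0, 1.0])

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

prev_ll = -np.inf
for _ in range(200):
    # E step: responsibilities p(l | x_i, Theta^g)
    num = alphas * gauss_pdf(x[:, None], mus, sigmas)
    resp = num / num.sum(axis=1, keepdims=True)
    # M step: the closed-form updates derived above
    Nl = resp.sum(axis=0)
    alphas = Nl / N
    mus = (resp * x[:, None]).sum(axis=0) / Nl
    sigmas = np.sqrt((resp * (x[:, None] - mus) ** 2).sum(axis=0) / Nl)
    # Convergence check: the log-likelihood is non-decreasing under EM
    ll = np.log((alphas * gauss_pdf(x[:, None], mus, sigmas)).sum(axis=1)).sum()
    if ll - prev_ll < 1e-9:
        break
    prev_ll = ll
```

EM only guarantees a local maximum of the likelihood, so in practice the loop is run from several initializations and the best solution is kept.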