Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition
Bing Zhang and Spyros Matsoukas, BBN Technologies, 50 Moulton St., Cambridge
Reporter: Chang Chih Hao
Introduction
• LDA and HLDA
  – Better classification accuracy
  – Some common limitations:
    • Neither assumes any prior knowledge of confusable hypotheses.
    • Their objective functions do not directly relate to the word error rate (WER).
• Minimum Phoneme Error (MPE)
  – Minimizes phoneme errors in lattice-based training frameworks.
  – Since this criterion is closely related to WER, MPE-HLDA tends to be more robust than other projection methods, which makes it potentially better suited for a wider variety of features.
MPE Objective Function
• MPE-HLDA model: the $p \times n$ projection matrix $A$ maps the observations and the Gaussian parameters into the projected space,
  $$\hat{o}_t = A o_t, \qquad \hat{\mu}_m = A \mu_m, \qquad \hat{C}_m = \mathrm{diag}\!\left(A C_m A^T\right)$$
  where $\mu_m$ and $C_m$ are the mean and full covariance of Gaussian $m$ in the original feature space.
• MPE-HLDA aims at minimizing the expected number of phoneme errors introduced by the MPE-HLDA model in a given hypothesis lattice, or equivalently maximizing the function
  $$F_{MPE}(O, A) = \sum_{r=1}^{R} \sum_{w_r} P(w_r \mid O_r)\, \mathrm{Acc}(w_r) \qquad (4)$$
  where
  – $R$ is the total number of training utterances,
  – $O_r$ is the sequence of $p$-dimensional observation vectors in utterance $r$,
  – $\mathrm{Acc}(w_r)$ is the "raw accuracy" score of word hypothesis $w_r$.
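As a sanity check on Eq. (4), here is a minimal sketch for a single utterance, using a hypothetical 3-entry N-best list in place of a lattice. All scores, probabilities, and accuracies below are invented for illustration; the scale factor `k` anticipates the scaled posterior defined on the next slide.

```python
import numpy as np

# Hypothetical hypotheses for one utterance (N-best list instead of a lattice).
log_likelihoods = np.array([-100.0, -102.0, -105.0])  # acoustic log P(O|w)
lm_log_probs = np.array([-10.0, -9.0, -11.0])         # language model log P(w)
raw_accuracy = np.array([5.0, 4.0, 2.0])              # raw accuracy Acc(w) per hypothesis

k = 1.0 / 12.0  # acoustic scale factor (illustrative value)
scores = k * (log_likelihoods + lm_log_probs)
posteriors = np.exp(scores - scores.max())
posteriors /= posteriors.sum()                        # P(w | O), Eq. (5)-style

# This utterance's contribution to F_MPE: expected raw accuracy under the posterior.
f_mpe = float(np.sum(posteriors * raw_accuracy))
```

The objective is bounded by the best and worst hypothesis accuracies, which is a quick invariant to check.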
MPE Objective Function
• $P(w_r \mid O_r)$ is the scaled posterior probability of hypothesis $w_r$ in the lattice:
  $$P(w_r \mid O_r) = \frac{P(O_r \mid w_r)^k\, P(w_r)^k}{\sum_{w'_r} P(O_r \mid w'_r)^k\, P(w'_r)^k}$$
• $P(w_r)$ is the language model probability of hypothesis $w_r$.
• $k$ is a scaling factor used in order to reduce the dynamic range of the acoustic scores, thereby avoiding the concentration of all posterior mass in the top-1 hypothesis of the lattice.
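The effect of the scale factor can be seen in a small sketch: with $k = 1$ nearly all posterior mass lands on the best-scoring hypothesis, while $k \ll 1$ spreads it out. The scores below are invented.

```python
import numpy as np

def scaled_posteriors(log_scores, k):
    """Posterior over hypotheses with acoustic scale k (softmax of k * log score)."""
    s = k * np.asarray(log_scores, dtype=float)
    p = np.exp(s - s.max())
    return p / p.sum()

log_scores = np.array([-100.0, -110.0, -120.0])  # combined acoustic + LM log scores

sharp = scaled_posteriors(log_scores, k=1.0)     # unscaled: mass piles onto the top-1 path
flat = scaled_posteriors(log_scores, k=0.05)     # k << 1: dynamic range reduced
```

With a flatter posterior, competing (confusable) hypotheses keep enough mass to contribute to the MPE gradient.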
MPE Objective Function
• It can be shown that the derivative of (4) with respect to $A$ is
  $$\frac{\partial F_{MPE}(O)}{\partial A} = \sum_{r=1}^{R} \sum_{q} k\, D(q, r)\, \frac{\partial \log P(O_r \mid q, r)}{\partial A} \qquad (6)$$
  where
  $$D(q, r) = P(q \mid O_r)\,\bigl(c(q, r) - c_{avg}(r)\bigr)$$
• $c_{avg}(r)$ is the MPE score of utterance $r$ (average accuracy over all hypotheses), and $c(q, r)$ is the average accuracy over all hypotheses that contain arc $q$.
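The weight $D(q, r)$ is easy to illustrate with toy numbers (the arc posteriors and accuracies below are invented, not from the paper): arcs lying on better-than-average paths get positive weight, so the gradient pushes their likelihood up, and worse-than-average arcs get negative weight.

```python
import numpy as np

# Toy illustration of D(q, r) = P(q | O_r) * (c(q, r) - c_avg(r)).
arc_posterior = np.array([0.7, 0.5, 0.3])  # P(q | O_r) for three arcs (hypothetical)
c_q = np.array([4.2, 3.1, 2.0])            # avg. accuracy of hypotheses through each arc
c_avg = 3.5                                # MPE score of the utterance

D = arc_posterior * (c_q - c_avg)
```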
MPE Objective Function
• The arc log-likelihood derivative decomposes over frames and Gaussians:
  $$\frac{\partial \log P(O_r \mid q, r)}{\partial A} = \sum_{t=S_q}^{E_q} \sum_{m} \gamma_m^{qr}(t)\, \frac{\partial \log P(o_t \mid m)}{\partial A}$$
• $S_q$ and $E_q$ are the begin and end time of arc $q$, and $\gamma_m^{qr}(t)$ denotes the posterior probability of Gaussian $m$ in arc $q$ at time $t$.
• For a single Gaussian,
  $$\frac{\partial \log P(o_t \mid m)}{\partial A} = \hat{C}_m^{-1}\Bigl[\bigl(P_{mt}\,\hat{C}_m^{-1} - I\bigr)\, A\, C_m - A\, R_{mt}\Bigr]$$
  where
  $$P_{mt} = \mathrm{diag}\!\bigl(A (o_t - \mu_m)(o_t - \mu_m)^T A^T\bigr), \qquad R_{mt} = (o_t - \mu_m)(o_t - \mu_m)^T$$
  and $\hat{C}_m = \mathrm{diag}(A C_m A^T)$ as before.
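The per-Gaussian derivative can be verified numerically. The sketch below (toy dimensions, random parameters; the paper's dimensions are far larger) compares the closed form against central finite differences of the projected-Gaussian log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3                        # toy dimensions for the check
A = rng.standard_normal((p, n))
o = rng.standard_normal(n)
mu = rng.standard_normal(n)
B = rng.standard_normal((n, n))
C = B @ B.T + n * np.eye(n)        # full covariance in the original space

def log_lik(M):
    """log N(M o; M mu, diag(M C M^T)) up to an additive constant."""
    e = M @ (o - mu)
    c_hat = np.diag(M @ C @ M.T)
    return -0.5 * (np.sum(np.log(c_hat)) + np.sum(e**2 / c_hat))

# Analytic gradient: C_hat^{-1} [ (P C_hat^{-1} - I) A C - A R ]
e = o - mu
C_hat_inv = np.diag(1.0 / np.diag(A @ C @ A.T))
P = np.diag((A @ e) ** 2)          # P_mt
R = np.outer(e, e)                 # R_mt
grad = C_hat_inv @ ((P @ C_hat_inv - np.eye(p)) @ A @ C - A @ R)

# Central finite differences, element by element.
num = np.zeros_like(A)
eps = 1e-6
for i in range(p):
    for j in range(n):
        Ap = A.copy(); Ap[i, j] += eps
        Am = A.copy(); Am[i, j] -= eps
        num[i, j] = (log_lik(Ap) - log_lik(Am)) / (2 * eps)
```

Agreement between `grad` and `num` confirms that the reconstructed formula is the gradient of the diagonal-covariance projected Gaussian.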
MPE Objective Function
• Therefore, Eq. (6) can be rewritten as
  $$\frac{\partial F_{MPE}(O)}{\partial A} = k \sum_{m} \hat{C}_m^{-1}\Bigl[\bigl(g_m\,\hat{C}_m^{-1} - k_m I\bigr)\, A\, C_m - A\, J_m\Bigr] \qquad (12)$$
  where
  $$k_m = \sum_{r} \sum_{q} D(q, r) \sum_{t=S_q}^{E_q} \gamma_m^{qr}(t)$$
  $$g_m = \sum_{r} \sum_{q} D(q, r) \sum_{t=S_q}^{E_q} \gamma_m^{qr}(t)\, P_{mt}$$
  $$J_m = \sum_{r} \sum_{q} D(q, r) \sum_{t=S_q}^{E_q} \gamma_m^{qr}(t)\, R_{mt}$$
• (The slide annotates matrix dimensions of 39×39 and 39×162.)
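A toy slice of Eq. (12) can make the per-Gaussian bookkeeping concrete. The sketch below uses one Gaussian and one arc with invented occupancies and $D(q, r)$; the statistic definitions follow the equations above, not BBN's actual accumulator layout.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, T = 4, 2, 6                  # toy dimensions and frame count
A = rng.standard_normal((p, n))
k_scale = 1.0 / 12.0               # acoustic scale k (hypothetical value)

mu = rng.standard_normal(n)
B = rng.standard_normal((n, n)); C = B @ B.T + n * np.eye(n)
obs = rng.standard_normal((T, n))  # frames within the arc's time span
gamma = rng.random(T)              # Gaussian occupancies gamma_m(t) (invented)
D_qr = 0.3                         # D(q, r) for the single arc (invented)

# Accumulate the per-Gaussian statistics k_m, g_m, J_m over the arc's frames.
k_m = D_qr * gamma.sum()
g_m = np.zeros((p, p))
J_m = np.zeros((n, n))
for t in range(T):
    e = obs[t] - mu
    g_m += D_qr * gamma[t] * np.diag((A @ e) ** 2)   # P_mt accumulation
    J_m += D_qr * gamma[t] * np.outer(e, e)          # R_mt accumulation

# Combine the statistics with the full covariance C (single-Gaussian slice of Eq. 12).
c_hat_inv = np.diag(1.0 / np.diag(A @ C @ A.T))
grad = k_scale * c_hat_inv @ ((g_m @ c_hat_inv - k_m * np.eye(p)) @ A @ C - A @ J_m)
```

By linearity, this reproduces the frame-by-frame sum of the per-Gaussian derivatives weighted by $k\, D(q,r)\, \gamma_m(t)$.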
MPE-HLDA Implementation
• In theory, the derivative of the MPE-HLDA objective function can be computed based on Eq. (12), via a single forward-backward pass over the training lattices. In practice, however, it is not possible to fit all the full covariance matrices in memory.
• Two steps:
  – First, run a forward-backward pass over the training lattices to accumulate the per-Gaussian statistics.
  – Second, use these statistics together with the full covariance matrices to synthesize the derivative.
• The paper used gradient descent to update the projection matrix.
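The update step itself is plain gradient ascent on $F_{MPE}$ (stepping along $+\partial F/\partial A$, since the objective is maximized). A minimal sketch with a simple differentiable surrogate standing in for $F_{MPE}$ (the surrogate, step size, and iteration count are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 2, 4
A = rng.standard_normal((p, n))          # projection matrix being trained
target = rng.standard_normal((p, n))     # optimum of the surrogate objective

def objective(M):
    return -np.sum((M - target) ** 2)    # surrogate for F_MPE (hypothetical)

def gradient(M):
    return -2.0 * (M - target)           # dF/dA of the surrogate

eta = 0.1                                # step size (hypothetical)
before = objective(A)
for _ in range(100):
    A = A + eta * gradient(A)            # ascend the objective
after = objective(A)
```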
Experimental Framework
(Diagram: an l×1 concatenated feature vector is first reduced by an n×l matrix, then projected by the p×n MPE-HLDA matrix A to a p×1 feature.)
• Global feature projection
  – There is more useful information in longer contexts.
  – Reduces the computational cost.
Experimentation
• Conversational Telephone Speech (CTS)
  – 2300 hours of training data
    • 800 hours: training the initial ML model
    • 1500 hours: held-out training data
      – Lattice generation
      – Discriminative training
      – MPE-HLDA: only 370 hours
  – Testing sets
    • Eval03
    • Dev04
Experimentation
• Conversational Telephone Speech (CTS)
  – Features
    • Frame-concatenated PLP cepstra: 15 frames, l = 225, n = 130, p = 60
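The stated dimensions (l = 225 from 15 frames implies 15 coefficients per frame) can be wired together as a quick shape check; the projection matrices below are random stand-ins, not trained models.

```python
import numpy as np

rng = np.random.default_rng(3)
frames = rng.standard_normal((15, 15))   # 15 consecutive frames of 15-dim PLP cepstra
x = frames.reshape(-1)                   # concatenated feature vector, l = 225

L = rng.standard_normal((130, 225))      # initial n x l reduction (stand-in)
A = rng.standard_normal((60, 130))       # p x n MPE-HLDA projection (stand-in)

y = A @ (L @ x)                          # final p-dimensional feature
```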
Experimentation
• Broadcast News (BN)
  – 600 hours: training the initial model (Hub4 and TDT)
  – 330 hours: held-out data
Thanks