62
Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hw ei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor : Hsin-Min Wang Berlin Chen

Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

Embed Size (px)

Citation preview

Page 1: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

Confidence Measures for Automatic Speech Recognition

Presented by Tzan-Hwei Chen

National Taiwan Normal UniversitySpoken Language Processing Lab

Advisor : Hsin-Min Wang Berlin Chen

Page 2: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

2

Outline

• Introduction

• The category of estimation methods of confidence measure (CM)– Featured based– Posterior probability based– Explicit model based – Incorporation of high-level information for CM*

• The application of CM to improve speech recognition

• Summary

Page 3: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

3

Introduction (1/9)

• It is extremely important to be able to make an appropriate and reliable judgement based on the error-prone ASR result.

• Researchers have proposed to compute a score (preferably 0~1), called confidence measure (CM), to indicate reliability of any recognition decision made by an ASR system.

Page 4: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

4

Introduction (2/9)

Feature extraction

Decodingspeechsignal

Acousticmodel

Languagemodel

recognizedword

sequence

featurevector

Lexicon

Confidence Measure

Verification

臺北 到 魚籃

12

1. 臺北到魚籃2. 臺北到宜蘭

Some application of CM

Page 5: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

5

Introduction (3/9)

• First of all, we can backtrack some early research on CM to rejection in word-spotting systems.

• Other early CM-related works lie in automatic detection of new words in LVCSR.

• From the past few years, the CM has been applied to more and more research areas, e.g.,– To improve speech recognition– The algorithm about look-head in LVCSR– To guide the system to perform unsupervised learning– …

Page 6: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

6

Recognizedunits

Introduction (4/9)

• The general procedure of CM for verification

Confidence estimation

judgment

Predefinedthreshold

> threshold < threshold

Confidenceof unit

acceptance rejection

Page 7: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

7

Introduction (5/9)

• Four situations when judging a hypothesis

宜蘭ref

hyp宜蘭 Accept Correct acceptance

reject Correct rejection

reject false rejection

Accept false acceptance

宜蘭ref

hyp魚籃

宜蘭ref

hyp宜蘭

宜蘭ref

hyp魚籃

Page 8: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

8

Introduction (6/9)

• The evaluation metric :– Confidence error rate :

wordsrecognized ofnumber totalthe

rejection false and acceptance false ofnumber CER

三民 候選人 通過

有 三名 候選人 通過 審查

審查 了

FA CA FR CA FA

5

12 CER

hyp

ref

Page 9: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

9

Introduction (7/9)

• The evaluation metric :– Confidence error rate :

wordsrecognized ofnumber totalthe

ins. sub.

correct) is wordrecognizedevery assumed is(it

baseline

三民 候選人 通過

有 三名 候選人 通過 審查

審查 了

FA CA CA CA FA

5

11baseline

hyp

ref

Page 10: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

10

Introduction (8/9)

• The evaluation metric (cont):– Receiver operator characteristics (ROC) curve :simply contains

a plot of the false acceptance rate over the detection rate.

raterejection false-1 ratedetection

wordsrecognizedy incorrectl ofnumber

acceptance false of num.rate acceptance false

wordsrecognizedcorrectly ofnumber

rejection false of num.rate rejection false

Page 11: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

11

Introduction (9/9)

• All methods proposed for computing CMs can be roughly classified into three major categories [7]:

– Feature based

– Posterior probability based

– Explicit model based (utterance verification, UV)

– Incorporation of high-level information for CM*

Page 12: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

12

Feature-based confidence measure

Page 13: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

13

Feature-based confidence measure (1/8)

• The feature can be collected during the decoding procedure and may include acoustic, language and syntactic information

• Any feature can be called a predictor if its p.d.f. of correctly recognized words is clearly distinct from that of misrecognized words

)( wfp

wf

misrecognized word correctly recognized word

Page 14: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

14

Feature-based confidence measure (2/8)

• Some common predictor features – Pure normalized likelihood score related : acoustic score per

frame.

– N-best related : count in the N-best list, N-best homogeneity score

– Duration related : word duration divided by its number of phones

)|],,([

)|],,([

1],,[

1],,[

1

1

Xeswp

Xeswp

Nnnn

bestNesw

Mmmm

esww

Nnnn

Mmmm

wordth- theof timeend the:

wordth- theof start time the:

is word

ofnumber that sequence a word :],,[ 1

me

ms

M

esw

m

m

Mmmm

Page 15: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

15

Feature-based confidence measure (3/8)

• Some common predictor features (cont)– Hypothesis density :

ttD at arc worddifferent theofnumber The:)'(

)(1

1]),;[:( tD

seeswaHD

a

a

e

staaaaa

靜音結果

建國

三名三名

三名

候選人

候選人

候選人

沒有

沒有

沒有

審查

審查

候選人通過

三名候選人

通過

2)( tDgraph wordin arc a word :],;[: aaa eswa

Page 16: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

16

Feature-based confidence measure (4/8)

• Some common predictor features (cont)– Acoustic stability

今天 天氣 很好Hypothesized word sequence

天氣 很好

今天 天氣

今天Hypothesized word sequence

1)()|( WPWXp

今天 天氣

今天 天氣 不佳

2)()|( WPWXp

3)()|( WPWXp

今天 天氣 很好

今天 天氣Hypothesized word sequence

不佳

Page 17: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

17

Feature-based confidence measure (6/8)

• We can combine the above features with any one of the following classifiers

– Line discriminant function

– Generalized linear model

– Neural networks

– Decision tree

– Support vector machine

– Boosting

– Naïve Bayes classifier

Page 18: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

18

Feature-based confidence measure (7/8)

• Naïve Bayes Classifier [3]

),'|()|'(

),|()|(),|(

21or'

111

wCfPwCP

wCfPwCPwfCP

wCCC

ww

),|(),|(1

wCfPwCfP iw

K

diw d

)(

),()|(

wN

wCNwCP i

i

),(

),,(),|(

wCN

wCfNwCfP

i

iwiw

d

d

vectorfeaturepredictor dimension the:

wrongis wordrecognized the:

correct is wordrecognized the:

2

1

kf

C

C

w

Page 19: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

19

Feature-based confidence measure (8/8)

• Experiments [3]

• Corpus : an Italian speech corpus of phone calls to the front desk of a hotel

feature   CER(%)relative reduction (%)

acoustic stability 16.3 22.4

language modelscore

18.8 10

hypothesis density 18.9 10.5

duration 19.3 8.1

acoustic score 19.6 6.7

baseline 21  

Page 20: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

20

Posterior probability based confidence measure

Page 21: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

21

)],,([)],,[|(

)],,([)],,[|(

)(

)],,([)],,[|()|],,([

11

],,[

11

111

1

Nnnn

Nnnn

esw

Mmmm

Mmmm

Mmmm

MmmmM

mmm

eswPeswXp

eswPeswXp

Xp

eswPeswXpXeswP

Nnnn

__

W

Posterior probability based confidence measure (1/11)

• Posterior probability of a word sequence :

• To adopt some approximation methods

Impossible to estimate in a precise manner

Page 22: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

22

靜音

建國

Posterior probability based confidence measure (2/11)

• Word graph based approximation

結果

三名三名

三名

三名

三名

候選人

候選人

候選人

沒有

沒有

沒有靜音

通過 靜音

候選人通過

三名候選人

靜音

)],,([)],,[|()( 11

],,[ 1

Nnnn

Nnnn

esw

eswPeswXpXpN

nnn

__

W

)],,([)],,[|( 11],,[ 1

Mmmm

Mmmm

esw

eswPeswXpXM

mmm

Page 23: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

23

Posterior probability based confidence measure (3/11)

• Posterior probability of a word arc :

– Some issues are addressed and the word posterior probability is generalized

• Reduced search space

• Relaxed time registration

• Optimal acoustic and language model weights

)|()|(

)|()|(

)|],;[:(

1}],;{[

1],;[,}],;{[

1

11

nnnes

N

nesw

mmmes

M

meswaeswXaaa

hwPwXp

hwPwXp

eswapn

nXN

nnnn

m

mMmmmm

XMmmmm

Page 24: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

24

Posterior probability based confidence measure (4/11)

• Posterior probability of a word arc [6] :

)|()|(

)|()|(

)|],;[:(

1}],;{[

1],;[,}],;{[

1

11

nnnes

N

nesw

mmmes

M

meswaeswXaaanormal

hwPwXp

hwPwXp

eswaCn

nXN

nnnn

m

mMmmmm

XMmmmm

靜音結果

建國

有由

三名三名

三名

三名

三名

候選人

候選人

候選人

沒有

沒有

沒有靜音

通過 靜音

候選人通過

三名候選人

靜音

Page 25: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

25

Posterior probability based confidence measure (5/11)

• Posterior probability of a word arc [6] :

)|],;[:(]),;[:(

2/)(],,;[:

Xrrrnormal

eesswweswr

aaamed eswrCeswaC

raar

arrrr

靜音結果

建國

有由

三名三名

三名

三名

三名

候選人

候選人

候選人

沒有

沒有

沒有靜音

通過 靜音

候選人通過

三名候選人

靜音

Page 26: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

26

三名

Posterior probability based confidence measure (6/11)

• Posterior probability of a word arc [6] :

)|],;[:(max]),;[:(],,;[:

},,{

Xrrrnormal

etswweswr

estaaa eswrCeswaC

rr

arrrraa

max

靜音結果

建國

有由

三名三名

三名

三名

候選人

候選人

候選人

沒有

沒有

沒有靜音

通過 靜音

候選人通過

三名候選人

靜音

Page 27: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

27

Posterior probability based confidence measure (7/11)

• Posterior probability of a word arc [6] :

)|],;[:(]),;[:(

),(),(:],,;[:

secX

rrrnormal

eseswweswr

aaa eswrCeswaC

rraa

arrrr

靜音結果

建國

有由

三名三名

三名

三名

三名

候選人

候選人

候選人

沒有

沒有

沒有靜音

通過 靜音

候選人通過

三名候選人

靜音

Page 28: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

28

Posterior probability based confidence measure (8/11)

• The drawbacks of the above methods – all need an additional pass.

• In [8], the “local word confidence measure” is proposed

)'())'|(max(

)())|((max]),,([

''

]'',',[

''

]'',,[

wpwxp

wpwxpeswC

es

Eesw

es

Eesw

今天 rate. relaxationa given sconstraint length and time

therealize which wordsalternate theofset the:E

今天

今天

今天

))|((max ''

]'',,[wxp e

sesw

Page 29: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

29

Posterior probability based confidence measure (8/11)

• local word confidence measure (cont)

)'())'|(max(

)())|((max]),,([

''

]'',',[

''

]'',,[

wpwxp

wpwxpeswC

es

Eesw

es

Eesw

)'|'())'|(max(

)|())|((max]),,([

'

''

]'',',[

'

''

]'',,[

hw

ss

Eesw

hw

es

Eesw

wwpwxp

wwpwxpeswC

h

bigram applied

)}'|'()'|'({))'|(max(

)}|()|({))|((max

]),,([''

]'',',[

''

]'',,[

wwpwwpwxp

wwpwwpwxp

wswCfh

ww

es

Eesw

fhww

es

Eesw

fh

fh

forward/backwardbigram applied

Page 30: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

30

Posterior probability based confidence measure (9/11)

• Impact of word graph density on the quality of posterior probability [9]

Baseline 27.3 15.4

wordsspoken ofnumber the

arcs wordofnumber totalWGD

Page 31: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

31

Posterior probability based confidence measure (10/11)

• Experiments [6]

corpus baseline Cnormal Csec Cmed Cmax

ARISE 13.6 11.5 8.9 8.8 8.9

Verbmobil 27.3 23.3 19.0 20.0 18.9

NAB 20k 11.3 10.3 9.2 9.2 9.2

NAB 64k 9.2 8.4 7.2 7.2 7.2

Broadcast news 27.7 23.7 20.6 20.4 20.6

Page 32: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

32

Explicit model based confidence measure (1/10)

• The CM problem is formulated as a statistical hypothesis testing problem.

• Under the framework of binary hypothesis testing, there are two complementary hypotheses

• We test against

W1

W0

model from NOT is and recognized wrongly is :Hypothese) ve(Alternati

model from comes truly and recognizedcorrectly is :Hypothese) (Null

XH

XH

0H 1H

0

1

)|(

)|( RT) testing(Lratio likelihood

1

0

H

HHXP

HXP

Page 33: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

33

Explicit model based confidence measure (3/10)

• The above LRT score can be transformed to a CM based on a monotonic 1-1 mapping function.

• The major difficulty with LRT is how to model the alternative hypothesis.

• In practice, the same HMM structure is adopted to model the alternative hypothesis.

• A discriminative training procedure plays a crucial role in improving modeling performance.

Page 34: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

34

Explicit model based confidence measure (3/10)

• Two-pass procedure :

)|(score observaion csxP

)|(score transition ci

cj ssp

今天 天氣 很好

)|(

)|(:

aes

ces

XP

XPLRT

今天

今天

of model-anti the:

of modelcorrect the:a

c

Page 35: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

35

Explicit model based confidence measure (4/10)

• One-pass procedure

)|(

)|(score observaion

a

c

sxP

sxP

)|(

)|(score transition

ai

aj

ci

cj

ssp

ssp

今天 天氣 很好

)|(

)|(:

aes

ces

XP

XPLRT

今天

今天

a

tct ss

Page 36: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

36

Explicit model based confidence measure (5/10)

• How to calculate the confidence of a recognized word?

shift.a is and function sigmoid theof slope thedefines where

)))((logexp(1

1)(

function sigmoida by

dmanipulate is measure confidence subword therange, dynamic limit the To

segment. decoded thein frames ofnumber theis where

)|(

)|(log

1),,(log

1)(

as obtained be can,X vectors,nobservatio

ofsegment a over decoded unit subworda for score likelihood levelunit unweighted The

u

uu

uu

u

au

cu

u

acu

u

uLRuU

N

XP

XP

NXLR

NuLR

u

Page 37: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

37

Explicit model based confidence measure (6/10)

• How to calculate the confidence of a recognized word (cont)?

))(log1

exp()(

)(1

)(

))(log1

exp()(

)(1

)(

:,,1, units subword of composed a wordfor defined are measures following The

compared. are U()scores ratio likelihood levelunit weightedsigmoid theand LR(),

scores, ratio likelihood levelunit unweighted the toingcorrespond measures confidence level Word

,1

4

,1

3

,1

2

,1

1

,n

in

N

inn

in

N

inn

in

N

inn

in

N

inn

nin

uUN

wW

uUN

wW

uLRN

wW

uLRN

wW

Niu

n

n

n

n

LR() theof means arithmetic

LR() theof means geometric

U() theof means arithmetic

U() theof means geometric

Page 38: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

38

Explicit model based confidence measure (7/10)

• Discriminative training [10] – The goal of the training procedure is to increase the average val

ue of for correct hypotheses and decrease the average value of for false acceptance.

),,( acXLR

),,( acXLR

},,{segment over the

unit as decodedsegment speech theof frame final and initial theare and where

)(1

1)(

as distances based frame theaveragingby obtained is distance basedsegment The

))(log())(log()(

:decoder by the

obtained sequence in then transitiostateeach for defined is distance based frameA

1-tq

ij

ufui

uu

t

uf

uiuu

ttu

fi

tq

tt

ttif

uu

taj

aijt

cj

cijt

xxX

utt

xrtt

XR

xbaxbayr

Page 39: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

39

Explicit model based confidence measure (8/10)

• Discriminative training (cont)

),(1

)},({ },,{ where

)},({

)},({cost expected theon performed is updategradient A

imposter ,1

correct ,1)(

as defined is )( functionindicator thewhere

)))()((exp(1

1),,(

function sigmoida using unit for ),,( functioncost theDefine

1

u

u1

uuu

N

iu

uuu

au

cu

uuun

un

uuu

uu

uu

au

cu

uu

au

cu

uu

XFN

XFE

XFE

XFE

u

uu

u

XRuXF

uXF

u

Page 40: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

40

Explicit model based confidence measure (9/10)

)(F

)))()(exp(1

1),,(

uu

uu

au

cu

uu XRuXF

) R(andimposter

)( andcorrect if

uu

uRu )(F

Why discriminative training works?

Page 41: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

41

Explicit model based confidence measure (10/10)

• Experiments [10] • This task, referred to as the “movie locator”,

Page 42: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

42

Incorporation of high-level information for CM

Page 43: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

43

Incorporation of high-level information for CM (1/4)

• LSA

– The key property of LSA is that words whose vectors are close to each other are semantically similar words.

– These similarities can be used to provide an estimate of the likelihood of the words co-occurring within the same utterance.

21dd nd

2

1

w

w

mw

2

1

w

w

mw

A U

21dd nd

TV

Page 44: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

44

Incorporation of high-level information for CM (2/4)

• LSA (cont)– The entry of matrix :

– The confidence of a recognized word :

)1log()1(j

ijiij n

cEa

A

ijij

N

ji ff

NE 2

12

log)(log

1

i

ij

ij t

cf

))(),((Cosine1

1ji

N

jwUwU

N

document all in termofcount the:

document of size the:

document in termofcount the:

it

jn

jic

i

j

ij

Page 45: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

45

Incorporation of high-level information for CM (3/4)

• Inter-word mutual information :

wordsrecognized remaining thewith

word thisof ninformatio mutual average theas calculated is wordrecognized each ofCM

))w()(

)w,(log(

as calculated be can and wordsany two between ninformatio Mutual

)w,(

)w,(),(

: is )w,(y probabilitjoint thedocuments, training thein

wordand wordof timesoccurrence-co thedenotes )w,( Assume

21

21

21

21w,

2121

21

2121

21

pwp

wpMI

ww

wN

wNwwP

wP

wwwN

w

Page 46: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

46

Incorporation of high-level information for CM (4/4)

• Experiments [14]

CM Switchbord Mandarin dictation

LSA 44.7 38.5

MI 41.0 33.7

Cmed 24.4 17.5

N-best count 28.3 21.1

MI+Cmed 23.9 16.2

Page 47: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

47

The application of CM to improve speech recognition

Page 48: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

48

The application of CM to improve speech recognition (1/10)

• Statistical decision theory aims at minimizing the expected of making error

)|],,([maxarg],,[ 1],,[

*

11

XeswPesw Nnnn

esw

Nnnn

Nnnn

靜音結果

建國

有由

三名三名

三名

三名

三名

候選人

候選人

候選人

沒有沒有

沒有靜音

通過 靜音

候選人通過

三名候選人

靜音

Page 49: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

49

The application of CM to improve speech recognition (2/10)

• Method 1 [16]:

)|],,([

),],,[|],,([

)|],,([)|],,([

11

11

11

111

Tnnn

N

n

Tnnnn

N

n

TNnnn

Nnnn

xtswp

xtswtswp

xeswpXeswp

)|],,([maxarg],,[ 11],,[

**

11

TNnnn

esw

Nnnn xeswPesw

Nnnn

Page 50: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

50

The application of CM to improve speech recognition (3/10)

• Method 2 [18] :

)|],,([WERminarg],,[ 1],,[

*

11

Xeswesw Nnnn

esw

Nnnn

Nnnn

)|()(1

0.1)|],,([1

1 XwPcorrectwPN

XeswWER nn

N

n

Nnnn

Page 51: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

51

The application of CM to improve speech recognition (4/10)

• Method 3 (Time Frame Error decoding) [17]: – In minimum Bayes risk decoding

– if

)|],,([

)],,[,],,([minarg],,[

1

],,[11

],,[

*

1 1

1XesvP

esveswCesw

Mmmm

esv

Mmmm

Nnnn

esw

Nnnn

Mmmm

Nnnn

M

mmmN

nnn

Mmmm

NnnnM

mmmN

nnnesvesw

esveswesveswC

11

1111

],,[],,[,0

],,[],,[,1)],,[,],,([

Page 52: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

52

The application of CM to improve speech recognition (5/10)

• Method 3 (cont)

)|],,([maxarg

)|],,([1minarg

)|],,([1minarg

)|],,([

)],,[,],,([minarg],,[

1],,[

1],,[

1],,[],,[

],,[

1

],,[11

],,[

*

1

1

1

111

1

1

XeswP

XeswP

XesvP

XesvP

esveswCesw

Nnnn

esw

Nnnn

esw

Mmmm

eswesvesw

Mmmm

esv

Mmmm

Nnnn

esw

Nnnn

Nnnn

Nnnn

Nnnn

Mmmm

Nnnn

Mmmm

Nnnn

Page 53: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

53

The application of CM to improve speech recognition (6/10)

• Method 3 (cont) – we are now faced with a conceptual mismatch between the decis

ion rule and the evaluation criterion for speech recognizers- the word error rate

– The easiest way to overcome this mismatch is to use the same cost function for evaluation – Levensthein distance

– In (Stolcke et. al 1997), the pairwise alignment is restricted to N-best list.

– Let us assume that sub. were the one type of error.• A dynamic programming alignment would thus not be necess

ary.

Page 54: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

54

The application of CM to improve speech recognition (7/10)

• Method 3 (cont)

)(1

),(1

)],,[,],,([1

11nn

tn

et

stN

n

Mmmm

Nnnn se

w

esveswC

n

n

'W

W

今天 天氣 很好

今天 天氣

1 every word ofcost maximum the1, if 2' ofcost max the

3' ofcost max the

W

W

tesv

vM

mmm

t

frame timeintersects which],,[

sequence wordin hypothesis word theofidentity word the:

1

Page 55: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

55

The application of CM to improve speech recognition (8/10)

• Method 3 (cont)

)(1

)|],,([),(1

minarg

)(1

)|],,([),()|],,([1

minarg

)|],,([)(1

),(1

minarg

)|],,([)],,[,],,([minarg],,[

1],,[

1],,[

1]'',,[

11],,[

1],,[

],,[1

1],,[

],,[111

],,[

*

1

1

1

11

1

11

11

nn

Mmmm

tn

esv

et

stN

nesw

nn

Mmmm

tn

es

et

st

TMmmm

esv

et

stN

nesw

esv

Mmmm

nn

tn

et

stN

nesw

esv

Mmmm

Mmmm

Nnnn

esw

Nnnn

se

XesvPw

se

XesvPwxesvP

XesvPse

w

XesvPesveswCesw

Mmmm

n

n

N

M

n

nM

mmm

n

n

N

Mmmm

n

n

N

Mmmm

N

Page 56: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

56

The application of CM to improve speech recognition (9/10)

• Method 3 (cont)

),|(

)|],,([),(

)|],,([),(

)|],,([),(

''

1

1

:],,,[

11 ],,[

1],,[

Xtwp

XesvPw

XesvPw

XesvPw

n

mmmmnetsesv

Mmmmmn

M

etsmesv

Mmmm

tn

esv

mmmmmm

mm

Mmmm

Mmmm

)(1

),|(1

nn

n

et

st

se

Xtwpn

n

Can be interpreted as the normalizedProbability of a word being incorrect.

nw

Page 57: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

57

The application of CM to improve speech recognition (10/10)

• Experiments (Method 3)

corpus baseline Time frame error

ARISE 15.8 15.0

Verbmobil 33.6 32.5

NAB 20k 13.2 12.9

NAB 64k 11.1 10.8

broadcast news 33.3 32.3

Page 58: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

58

Summary

• Almost all CMs rely almost entirely on a single information source :how much the underlying decision can overtake other possible competitors.

• We believe it is critical to improve performance of CMs by – taking this segmentation issue into account.– Deciding a dynamic threshold for different word.

Page 59: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

59

Reference (1/4)

• Main reference– [1] H. Jiang ,“Confidence Measures for Speech Recognition : A Survey”,

Speech communication 2005 .

• Feature based confidence measure– [2] S. Cox and R. Rose, “Confidence Measures for The Switchboard Dat

abase”, ICASSP 1996.– [3] T. Schaaf and T. Kemp, “Confidence Measures for Spontaneous Spe

ech Recognition”, ICASSP 1997.– [4] A. Sanchis , A. Juan and E. Vidal, “New Features based on Multiple

Word Graphs for Utterance Verification”, ICSLP 2003.– [5] R. Zhang and A.I. Rudnicky, “Word Level Confidence Annotation Usi

ng Combinations of Features”, EuroSpeech 2001.

• Posterior based confidence measure– [6] F. Wessel , R. Schluter, K. Macherey, and H. Ney, “Confidence Meas

ures for Large Vocabulary Continuous Speech Recognition”, IEEE SAP 2001.

– [7] F. K. Soong and W. K. Lo, “Generalized Posterior Probability for Minimum Error Verification of Recognized Sentences”, ICASSP 2005

Page 60: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

60

Reference (2/4)

• Posterior based confidence measure– [8] J. Razik, O. Mella, D. Fohr, J.-P. Haton, “Local Word Confidence Me

asure Using Word Graph and N-Best List.”

– [9] T, Fabian, R. Lieb, G. Ruske, M. Thomae, “Impact of Word Graph Density on the Quality of Posterior Probability Based Confidence Measures.

• Explicit model based confidence measure– [10] E. Lleida, R. C. Rose, “Utterance Verification in Continuous Speech

Recognition: Decoding and Training Procedures”, IEEE SAP 2000.

– [11] M. G. Rahim and C.-H Lee, “String-based Minimum Verification Error (SB-MVE) Training for Speech Recognition”, computer speech and language 1997.

– [12] H. Jiang, F. K. Soong and C.-H. Lee, “A Dynamic In-Search Data Selection Method With Its Applications to Acoustic Modeling and Utterance Verification”, IEEE SAP 2005.

Page 61: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

61

Reference (3/4)

• Incorporation of high-level information for CM– [13] R. Sarikaya, Y. Gao, M. Picheny and H. Erdogan, “Semantic Confid

ence Measurement for Spoken Dialog Systems.”, IEEE SAP 2005.

– [14] G. Guo, C. Huang, H. Jiang, R.-H. Wang, “A Comparative Study on Various Confidence Measures in Large Vocabulary Speech Recognition”, ISCSLP 2004.

• Some application for CM – [15] M. Afify, F. Liu, H. Jiang and O. Siohan, “A New Verification-based

Fast-Match for Large Vocabulary Continuous Speech Recognition” IEEE SAP 2005

– [16] F. Wessel , R. Schluter and H. Ney, “Using Posterior Word Probabilities for Improved Speech Recognition”, ICASSP 2000.

– [17] F. Wessel , R. Schluter and H. Ney, “Explicit Word Error Minimization Using Word Hypothesis Posterior Probabilities”, ICASSP 2001.

Page 62: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor

62

Reference (2/4)

• Some application for CM – [18] A. Kobayashi, K. Onoe, S. Sato and T. Imai, “Word Error Minimizati

on Using an Integrate Confidence Measure”, INTERSPEECH 2005.

– [19] Y. Qian, T. Lee and F. K. Soong, “Tone Information as a Confidence Measure for Improving Cantonese LVCSR “, ICSLP 2004.