Language Identification Oldřich Plchot, Pavel Matějka Speech@FIT, Brno University of Technology, Czech Republic [email protected] IKR Brno 2012

Page 1: Language Identification

Language Identification

Oldřich Plchot, Pavel Matějka Speech@FIT, Brno University of Technology, Czech Republic

[email protected]

IKR Brno 2012

Page 2: Language Identification

Language Identification IKR, Brno, 2012

Outline

• Why do we need LID?
• Evaluations
• Acoustic LID
• Phonotactic LID
• Fusion
• Conclusion

Page 3: Language Identification

Why do we need language identification?

1) Route phone calls to human operators.

Emergency (112, 155, 911)
Call centers
Fire brigade (150)
Police (158)

Page 4: Language Identification

Why do we need language identification?

2) Pre-select suitable recognition system.

Diagram: Language Identification selects the downstream system for the detected language: Translate SPA, KWS CHN, Speech2Text ENG, Translate CZE, Translate VIE, Connect

Page 5: Language Identification

Why do we need language identification?

3) Security applications to narrow search space.

Page 6: Language Identification

Two main approaches to LID

• Acoustic – Gaussian Mixture Model

• Phonotactic – Phoneme Recognition followed by Language Model

Page 7: Language Identification

Acoustic approach

• Gaussian Mixture Model

- good for short speech segments and dialect recognition
- relies on the sounds
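A minimal sketch of how acoustic scoring with Gaussian mixture models can work, assuming one diagonal-covariance GMM per language and a decision by the highest average per-frame log-likelihood. The function names, the `(weight, mean, variance)` model layout, and the toy numbers are hypothetical and only serve the illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # log-sum-exp keeps the mixture sum numerically stable
    return top + math.log(sum(math.exp(t - top) for t in terms))

def identify_language(frames, models):
    """Return (best_language, scores); scores are average per-frame log-likelihoods."""
    scores = {
        lang: sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)
        for lang, gmm in models.items()
    }
    return max(scores, key=scores.get), scores

# Toy 2-D models with two components each (made-up numbers, illustration only).
toy_models = {
    "lang_A": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "lang_B": [(0.5, [-3.0, 0.0], [1.0, 1.0]), (0.5, [-3.0, 3.0], [1.0, 1.0])],
}
toy_frames = [[0.1, -0.2], [1.8, 2.1], [0.3, 0.4]]
best, toy_scores = identify_language(toy_frames, toy_models)
```

Averaging over frames makes scores comparable across utterances of different length; real systems would use far more components (the results later in the deck use up to 2048) and real feature vectors.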

Page 8: Language Identification

Spectral features - MFCC

Figure: framing (20 ms, 10 ms) → Short-time FFT → Mel filter bank → Log() → Discrete Cosine Transform → one vector of cepstral coefficients per frame
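The processing chain on this slide (short-time FFT, mel filter bank, log, discrete cosine transform) can be sketched with the standard library alone. This is an illustrative sketch, not the authors' front end: the 8 kHz sampling rate, the Hamming window, the 23 filters, the reading of "20 ms, 10 ms" as window length and frame shift, and the choice to keep coefficients 0 to 6 are all assumptions.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2); fine for a sketch, slow for real use."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [e * n_fft / sample_rate for e in edges]  # fractional bin positions
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        bank.append([
            max(0.0, min((k - lo) / (mid - lo), (hi - k) / (hi - mid)))
            for k in range(n_fft // 2 + 1)
        ])
    return bank

def mfcc(signal, sample_rate=8000, win_s=0.020, shift_s=0.010, n_filters=23, n_ceps=7):
    """Return one list of n_ceps cepstral coefficients per frame."""
    win, shift = int(win_s * sample_rate), int(shift_s * sample_rate)
    bank = mel_filterbank(n_filters, win, sample_rate)
    hamming = [0.54 - 0.46 * math.cos(2.0 * math.pi * t / (win - 1)) for t in range(win)]
    features = []
    for start in range(0, len(signal) - win + 1, shift):
        frame = [s * w for s, w in zip(signal[start:start + win], hamming)]
        spec = power_spectrum(frame)
        # Small floor avoids log(0) for empty filter outputs
        log_e = [math.log(sum(f * p for f, p in zip(filt, spec)) + 1e-10) for filt in bank]
        # DCT-II decorrelates the log filter-bank energies
        features.append([
            sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters) for j, e in enumerate(log_e))
            for c in range(n_ceps)
        ])
    return features

# 0.1 s of a 440 Hz tone as a stand-in for speech
tone = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(800)]
tone_mfcc = mfcc(tone)
```

In practice an FFT routine from a numerical library would replace the naive DFT; the structure of the chain stays the same.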

Page 9: Language Identification

Shifted delta cepstra

• Shifted delta cepstra (SDC) capture how the speech evolves around the current frame (± 0.1 s)

• Size of the final feature vector: 7 MFCC + 7 × 7 SDC = 56
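A minimal sketch of how the 56-dimensional vector can be assembled, assuming the widely used 7-1-3-7 SDC configuration: 7 cepstral coefficients, delta distance `d = 1`, block shift of 3 frames, and 7 blocks. Only the 7 coefficients and the 7 blocks follow from the slide; `d`, `shift`, and the edge handling are assumptions.

```python
def shifted_delta_cepstra(cepstra, d=1, shift=3, k=7):
    """Append k shifted delta blocks to each frame of `cepstra` (a list of N-dim lists)."""
    n_frames = len(cepstra)

    def frame(t):
        # Clamp indices so frames near the edges reuse the first or last frame
        return cepstra[min(max(t, 0), n_frames - 1)]

    out = []
    for t in range(n_frames):
        vec = list(cepstra[t])  # the static coefficients come first
        for i in range(k):
            # Delta centered at frame t + i*shift, computed over a span of +/- d frames
            ahead, behind = frame(t + i * shift + d), frame(t + i * shift - d)
            vec.extend(a - b for a, b in zip(ahead, behind))
        out.append(vec)
    return out

# Toy input: 40 frames of 7 coefficients that grow linearly with the frame index
ramp = [[float(t)] * 7 for t in range(40)]
ramp_sdc = shifted_delta_cepstra(ramp)
```

Each output frame holds the 7 static coefficients followed by 7 delta blocks of 7 values each, which gives the 7 + 7 × 7 = 56 dimensions stated above.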

Page 10: Language Identification

Acoustic systems – GMM based

• Maximum likelihood (generative)
  - Objective function to maximize is the likelihood of the training data given the transcription
• Maximum Mutual Information (discriminative)
  - Objective function to maximize is the posterior probability of all training utterances being correctly recognized
• Advantages of discriminative training:
  - Lower error rates
  - Fewer parameters
• Disadvantages of discriminative training:
  - Overtraining
  - Sometimes computationally expensive
• Channel compensation – from the previous presentation
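The two training criteria can be written out in their commonly used textbook form. The notation is an assumption, not taken from the slides: O_r is the r-th of R training utterances, s_r its correct language label (the "transcription"), λ the model parameters, and P(s) a language prior; the likelihood scaling factor often used in practice is omitted.

```latex
% Maximum likelihood: fit each language model to its own data
F_{\mathrm{ML}}(\lambda) = \sum_{r=1}^{R} \log p_{\lambda}(O_r \mid s_r)

% Maximum mutual information: maximize the posterior of the correct language,
% with the denominator summing over all competing languages s
F_{\mathrm{MMI}}(\lambda) = \sum_{r=1}^{R} \log
  \frac{p_{\lambda}(O_r \mid s_r)\, P(s_r)}
       {\sum_{s} p_{\lambda}(O_r \mid s)\, P(s)}
```

The denominator is what makes MMI discriminative: increasing the objective requires lowering the likelihood of competing languages as well as raising that of the correct one.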

Page 11: Language Identification

Highly overlapped distributions

Page 12: Language Identification

Results on LRE 2007 (14 languages)

System / Equal Error Rate [%]           30 sec   10 sec   3 sec
GMM2048                                   8.03    12.89   21.77
GMM2048-eigchan                           2.76     7.38   17.14
GMM2048-chcf                              2.94     7.40   17.93
GMM2048-MMI-chcf (~3 MMI iterations)      2.41     7.02   16.90

The best acoustic system combines:
• Many Gaussians
• Eigen-channel compensation of features
• MMI

System / Equal Error Rate [%]           30 sec   10 sec   3 sec
GMM2048 ML                                8.03    12.89   21.77
GMM256 ML                                 ~16
GMM256 MMI (~15 MMI iterations)           4.15     8.61   18.43
GMM256-MMI-chcf (~3 MMI iterations)       3.73     9.81   20.98
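The tables report the equal error rate (EER), the operating point where the miss rate equals the false-alarm rate. A minimal sketch of how it can be approximated from detection scores, assuming a simple threshold sweep over the observed scores; the function name and the toy scores are hypothetical, and official evaluation tools may interpolate differently.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER in percent by sweeping the threshold over all observed scores."""
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # A trial is accepted when its score is at or above the threshold
        miss = sum(s < thr for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, 100.0 * (miss + false_alarm) / 2.0
    return eer

# Made-up scores: one target trial scores below one non-target trial
toy_eer = equal_error_rate([2.0, 1.5, 1.0, 0.2], [0.5, 0.1, -0.3, -1.0])
```

For the toy scores the sweep finds a threshold where one of four target trials is missed and one of four non-target trials is accepted, so both error rates are 25 %.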

Page 13: Language Identification

Phonotactic approach

• Phoneme Recognition followed by Language Model (PRLM)

- good for longer speech segments
- robust against dialects in one language
- eliminates speech characteristics of the speaker's native language

Page 14: Language Identification

Phone recognizer

• 3 neural networks to produce the phone posterior probability

• 310 ms long temporal trajectory around the current frame

• Investigation of different phone recognizers for LID => better phone recognizer ≈ better LID system

Page 15: Language Identification

Phone recognition output

One best phone string

Page 16: Language Identification

Phonotactic modeling - example

Trigram   German   English   Test
u n d       25        1        5
a n d        3       32        0
t h e        0       13        1
. . .

• N-gram language models – discounting, backoff
• Support Vector Machines – vectors with counts
• PCA + LDA
• Neural Networks
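A minimal sketch of the counting idea in this example, using the trigram counts from the slide. The scoring uses add-one smoothing over the three listed trigrams, which is a simplification chosen for brevity and stands in for the discounting and backoff named above; the function names are hypothetical.

```python
from collections import Counter
import math

def ngram_counts(phones, n=3):
    """Count n-grams in a phone string given as a list of phone symbols."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))

def log_score(test_counts, lang_counts, vocab, alpha=1.0):
    """Log-probability of the test n-grams under add-alpha smoothed relative frequencies."""
    total = sum(lang_counts[g] for g in vocab) + alpha * len(vocab)
    return sum(
        c * math.log((lang_counts[g] + alpha) / total)
        for g, c in test_counts.items()
    )

UND, AND, THE = ("u", "n", "d"), ("a", "n", "d"), ("t", "h", "e")
vocab = [UND, AND, THE]

# Counts copied from the slide
language_counts = {
    "German": Counter({UND: 25, AND: 3, THE: 0}),
    "English": Counter({UND: 1, AND: 32, THE: 13}),
}
test_counts = Counter({UND: 5, AND: 0, THE: 1})

lm_scores = {
    lang: log_score(test_counts, counts, vocab)
    for lang, counts in language_counts.items()
}
best_language = max(lm_scores, key=lm_scores.get)
```

With these counts the test segment scores higher under the German model, driven by the five occurrences of "u n d". In a real system `ngram_counts` would be applied to the phone recognizer output, and the vocabulary would cover all phone trigrams.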

Page 17: Language Identification

Phone recognition output

One best phone string

Phone lattice (example arc weights in the figure: 0.6, 0.3, 0.1)

Page 18: Language Identification

Results on LRE 2007 (14 languages)

System / Equal Error Rate [%]   30 sec   10 sec   3 sec
HU_LM string (4-gram)             6.35    13.86   27.12
HU_LM                             5.54    11.75   23.54
HU_SVM-3gram-counts               5.41    13.26   26.92

Conclusion:
• Build as good a phone recognizer as you can
• Gather as much data for each language as you can
• Different approaches to modeling counts do not seem to have a big influence on results

Page 19: Language Identification

Fusion - LRE 2007 (14 languages)

System / Equal Error Rate [%]                       30 sec   10 sec   3 sec
Acoustic - GMM2048-MMI-chcf (~3 MMI iterations)       2.41     7.02   16.90
Phonotactic - EN_TREE                                 3.54    10.68   22.66
Phonotactic - HU_TREE_A3E7M5S3G3_LFA                  4.52    10.35   23.66
Fusion - The best 3 systems                           1.28     4.63   13.53

Note:
• Fusion weights have to be trained on a separate set of files that is as close as possible to the target data
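A minimal sketch of score-level fusion with weights trained on a held-out set. The slides do not say which fusion method was used; linear logistic regression is assumed here as one common choice, reduced to binary target/non-target trials for brevity. All names and the toy development scores are hypothetical.

```python
import math

def train_fusion(dev_scores, dev_labels, epochs=2000, lr=0.1):
    """Fit fusion weights and a bias by gradient descent on the logistic loss.

    dev_scores: one list of per-system scores per held-out trial.
    dev_labels: 1 for target trials, 0 for non-target trials.
    """
    n_sys, n = len(dev_scores[0]), len(dev_scores)
    weights, bias = [0.0] * n_sys, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_sys, 0.0
        for scores, label in zip(dev_scores, dev_labels):
            z = bias + sum(w * s for w, s in zip(weights, scores))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted posterior minus label
            grad_b += err
            for j, s in enumerate(scores):
                grad_w[j] += err * s
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

def fuse(scores, weights, bias):
    """Weighted sum of the per-system scores for one trial."""
    return bias + sum(w * s for w, s in zip(weights, scores))

# Made-up held-out scores from three systems (for example one acoustic, two phonotactic)
dev_scores = [
    [1.2, 0.8, 1.0], [0.9, 1.1, 0.4], [0.3, 0.9, 0.8],
    [-0.8, -0.2, -1.0], [-1.1, 0.1, -0.6], [-0.2, -0.9, -0.7],
]
dev_labels = [1, 1, 1, 0, 0, 0]
fusion_weights, fusion_bias = train_fusion(dev_scores, dev_labels)
fused_target = fuse([1.0, 0.7, 0.9], fusion_weights, fusion_bias)
fused_nontarget = fuse([-0.9, -0.5, -0.8], fusion_weights, fusion_bias)
```

The weights are fitted only on the development trials and then applied unchanged to new trials, which mirrors the note above: the closer the development set is to the target data, the better the fitted weights transfer.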

Page 20: Language Identification

Thanks for your attention, and I hope you enjoyed it ;)