iVector approach to Phonotactic LRE
Mehdi Soufifar, 2nd May 2011


Page 1: iVector approach to Phonotactic LRE

iVector approach to Phonotactic LRE
Mehdi Soufifar
2nd May 2011

Page 2: iVector approach to Phonotactic LRE

Phonotactic LRE

Train: Utterance → L recognizers (HVite, BUT PR, ...), each with its acoustic model (AM) → Phoneme sequence → Extract n-gram statistics → N-gram counts → Train classifier (LR, SVM, LM, GLC, ...; language-dependent)

Test: Test utterance → L recognizers (HVite, BUT PR, ...), each with its AM → Phoneme sequence → Extract n-gram statistics → N-gram counts → Classifier → Language-dependent score
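Both branches share the same front end: a recognized phoneme string is turned into a sparse vector of n-gram counts. A minimal sketch of that counting step (the function name `ngram_counts` and the toy sequence are illustrative, not from the slides; a real sequence would come from the HVite or BUT PR decoders):

```python
from collections import Counter

def ngram_counts(phonemes, n=3):
    """Count the n-grams (trigrams by default) in a recognized phoneme sequence."""
    grams = zip(*(phonemes[i:] for i in range(n)))
    return Counter(grams)

# Toy recognizer output
seq = "a b a b a".split()
counts = ngram_counts(seq)
```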

Page 3: iVector approach to Phonotactic LRE

N-gram Counts

N^3 = 226981 (N = 61 phonemes) for the RU phoneme set

• Problem: huge vector of n-gram counts
• Solutions:
  ▫ Choose the most frequent n-grams
  ▫ Choose the top N n-grams discriminatively (LL)
  ▫ Compress the n-gram counts:
    Singular Value Decomposition (SVD): decompose the document matrix D = USVᵀ and use the transformation matrix U to reduce the n-gram vector dimensionality (PCA-based dimensionality reduction)
  ▫ iVector feature selection
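The SVD compression can be sketched as follows. The toy matrix sizes are illustrative (a real document matrix has one row per utterance and 226981 columns), and projecting the count vectors with the leading right singular vectors is one common convention for this reduction:

```python
import numpy as np

# Toy document matrix: 4 utterances x 6 n-gram counts (real D is N x 226981)
rng = np.random.default_rng(0)
D = rng.poisson(2.0, size=(4, 6)).astype(float)

# Thin SVD: D = U S V^T
U, S, Vt = np.linalg.svd(D, full_matrices=False)

# Keep the k leading components and project each count vector onto them
k = 2
D_reduced = D @ Vt[:k].T   # shape: (4 utterances, k dimensions)
```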

Page 4: iVector approach to Phonotactic LRE

Sub-space multinomial modeling

• Every vector of n-gram counts consists of E events (the n-grams)
• The log-probability of the nth utterance under the multinomial (MN) distribution is:

$$\log P(\mathbf{l}_n) = \sum_{i=1}^{E} l_{ni} \log \phi_{ni}$$

• φ_ni can be defined through the low-dimensional subspace as:

$$\phi_{ni} = \frac{\exp(\mathbf{t}_i \mathbf{w}_n)}{\sum_{j=1}^{E} \exp(\mathbf{t}_j \mathbf{w}_n)}$$

• The model parameters to be estimated by ML are t and w
• No analytical solution!
• We use Newton-Raphson updates as a numerical solution

N^3 = 226981 for the RU phoneme set

Page 5: iVector approach to Phonotactic LRE

Sub-space multinomial modeling

• 1st solution:
  ▫ Consider all 3-grams to be components of a Bernoulli trial
  ▫ Model the entire vector of 3-gram counts with one multinomial distribution
  ▫ N-gram events are not independent (not consistent with the Bernoulli-trial presumption!)
• 2nd solution:
  ▫ Cluster 3-grams based on their histories
  ▫ Model each history with a separate MN distribution
    → Data sparsity problem! → Cluster 3-grams based on a binary decision tree
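The grouping step of the 2nd solution can be sketched like this (function and variable names are illustrative): each 3-gram is assigned to the cluster of its 2-phoneme history, and each cluster then gets its own multinomial distribution:

```python
from collections import defaultdict

def cluster_by_history(trigram_counts):
    """Group 3-gram counts by their 2-phoneme history (the first two symbols)."""
    clusters = defaultdict(dict)
    for (p1, p2, p3), c in trigram_counts.items():
        clusters[(p1, p2)][p3] = c   # one multinomial per history
    return clusters

counts = {('a', 'b', 'a'): 2, ('a', 'b', 'c'): 1, ('b', 'a', 'b'): 1}
clusters = cluster_by_history(counts)
```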

Page 6: iVector approach to Phonotactic LRE

Training of iVector extractor

•Number of iterations: 5-7 (depends on sub-space dimension)

•Sub-space dimension: 600

[Figure: training results shown for the 3-second, 10-second, and 30-second conditions]

Page 7: iVector approach to Phonotactic LRE

Classifiers

•Configuration: L one-to-all linear classifiers
  ▫ L: number of targeted languages
•Classifiers:
  ▫ SVM
  ▫ LR
  ▫ Linear Generative Classifier (GLC)
  ▫ MLR (to be done!)
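The L one-to-all configuration can be sketched minimally as below; the weights here are random placeholders, whereas in the real system each row would be trained with SVM, LR, or GLC:

```python
import numpy as np

# Toy setup: L = 3 target languages, iVectors of dimension R = 4
rng = np.random.default_rng(1)
L, R = 3, 4
W = rng.normal(size=(L, R))  # one weight row per one-to-all classifier
b = np.zeros(L)              # per-language bias terms

def language_scores(ivector):
    """Score one iVector against all L one-to-all linear classifiers."""
    return W @ ivector + b

scores = language_scores(rng.normal(size=R))
best = int(np.argmax(scores))  # top-scoring language index
```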

Page 8: iVector approach to Phonotactic LRE

Results on different classifiers

•Task: NIST LRE 2009

System     Dev-30s  Dev-10s  Dev-3s  Eval-30s  Eval-10s  Eval-3s
PCA-SVM      2.83     7.05    17.77     3.62      8.82     21.00
PCA-LR       2.22     6.22    17.26     2.93      8.29     22.60
PCA-GLC      2.81     8.25    19.83     3.50      9.88     22.88
iVec-SVM     6.54    14.07    26.79     8.54     17.5      18.06
iVec-LR      2.44     6.88    18.01     3.05      8.10     21.39
iVec-GLC     2.58     7.13    18.18     2.92      8.03     21.13

Page 9: iVector approach to Phonotactic LRE

Results of different systems, LRE09

System           Dev-30s  Dev-10s  Dev-3s  Eval-30s  Eval-10s  Eval-3s
BASE-HU-SVM        2.83     7.05    17.77     3.62      8.82     21.00
PCA-HU-LR          2.22     6.22    17.26     2.93      8.29     22.60
iVect-HU-LR        2.81     8.25    19.83     3.05      8.10     21.05
iVec+PCA-HU-LR     2.05     5.74    16.71     2.79      7.63     21.05
iVec-RU-LR         2.66     6.46    17.50     2.59      7.42     19.83
iVec-LR HU+RU      1.54     4.44    13.30     2.09      5.34     16.53
iVec-LR HURU       1.90     5.10    14.69     2.06      5.80     17.79

Page 10: iVector approach to Phonotactic LRE

N-gram clustering

•Remove all 3-grams with fewer than 10 repetitions over all training utterances
•Model each history with a separate MN distribution
•1084 histories, up to 33 3-grams each

System       Dev-30s  Dev-10s  Dev-3s  Eval-30s  Eval-10s  Eval-3s
>10 3-gram     8.84    16.04   27.94    10.34     19.92     32.35

Page 11: iVector approach to Phonotactic LRE

Merging histories using BDT

• In the case of a 3-gram P_i P_j P_k
• Merge histories which do not increase the entropy by more than a certain value

[Figure: two history models, Model 1 (P_i P_22 P_k) and Model 2 (P_i P_33 P_k), are merged into P_i P_33+22 P_k; E1 = Entropy(Model 1), E2 = Entropy(Model 2), D = E1 − E2]
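The merge criterion can be sketched as follows, assuming count-weighted entropies (the helper names are illustrative): two history models are merged only when pooling their counts raises the total entropy by less than a threshold. This pooled-minus-parts difference is one plausible reading of the slide's D = E1 − E2 comparison:

```python
import math

def weighted_entropy(counts):
    """Count-weighted entropy (nats) of the ML multinomial estimated from counts."""
    total = sum(counts.values())
    return -sum(c * math.log(c / total) for c in counts.values() if c > 0)

def entropy_increase(c1, c2):
    """How much total entropy grows if the two history models are pooled."""
    merged = {k: c1.get(k, 0) + c2.get(k, 0) for k in set(c1) | set(c2)}
    return weighted_entropy(merged) - (weighted_entropy(c1) + weighted_entropy(c2))
```

Identical distributions merge for free (zero increase), while pooling two disjoint single-event histories with one count each costs 2·log 2 nats.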

Page 12: iVector approach to Phonotactic LRE

Results on DT Hist. merging

•Histories: 1089 → 60
•More iterations on training T ⇒ the T matrix moves toward the zero matrix!

System       Dev-30s  Dev-10s  Dev-3s  Eval-30s  Eval-10s  Eval-3s
DT             4.36    10.41   22.20     5.46     12.80     27.09
>10 3-gram     8.84    16.04   27.94    10.34     19.92     32.35

Page 13: iVector approach to Phonotactic LRE

Deeper insight into the iVector extractor

Newton-Raphson update for the utterance factors $\mathbf{w}_n$:

$$\mathbf{w}_n^{new} = \mathbf{w}_n^{old} + H_n^{-1}\,\mathbf{g}_n$$
$$\mathbf{g}_n = \sum_{i=1}^{E} \mathbf{t}_i^{T}\Big(l_{ni} - \phi_{ni}^{old}\sum_{j=1}^{E} l_{nj}\Big)$$
$$H_n = \sum_{i=1}^{E} \mathbf{t}_i^{T}\mathbf{t}_i \,\max\Big(l_{ni},\ \phi_{ni}^{old}\sum_{j=1}^{E} l_{nj}\Big)$$

and for the rows $\mathbf{t}_e$ of the subspace matrix $T$:

$$\mathbf{t}_e^{new} = \mathbf{t}_e^{old} + \mathbf{g}_e\,H_e^{-1}$$
$$\mathbf{g}_e = \sum_{n=1}^{N}\Big(l_{ne} - \phi_{ne}^{old}\sum_{i=1}^{E} l_{ni}\Big)\mathbf{w}_n^{T}$$
$$H_e = \sum_{n=1}^{N} \max\Big(l_{ne},\ \phi_{ne}^{old}\sum_{i=1}^{E} l_{ni}\Big)\mathbf{w}_n\mathbf{w}_n^{T}$$
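One w-update step can be sketched in a few lines. This is a toy sketch matching the slide's notation (no bias term; rows of T are the t_i), with the max() term giving the safe Hessian approximation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def update_w(T, l_n, w_n):
    """One Newton-Raphson step for the utterance factor w_n.

    T: (E, R) subspace matrix (rows t_i); l_n: (E,) n-gram counts;
    w_n: (R,) current estimate. phi is the model's multinomial at w_n."""
    phi = softmax(T @ w_n)
    total = l_n.sum()
    g = T.T @ (l_n - phi * total)        # gradient g_n
    h = np.maximum(l_n, phi * total)     # max(l_ni, phi_ni * sum_j l_nj)
    H = (T * h[:, None]).T @ T           # Hessian approximation H_n
    return w_n + np.linalg.solve(H, g)

T = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
l_n = np.array([4., 1., 1., 1.])
w_new = update_w(T, l_n, np.zeros(2))  # moves toward the over-counted event 0
```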

Page 14: iVector approach to Phonotactic LRE

Strange results

• 3-grams with no repetition throughout the whole training set should not affect system performance!
• Remove all 3-grams with no repetition throughout the whole training set
• 35973 → 35406 (567 removed)
• Even worse results if we prune more!

System   Dev-30s  Dev-10s  Dev-3s  Eval-30s  Eval-10s  Eval-3s
35973      2.44     6.88   18.01     3.05      8.10     21.39
35406      3.35     8.05   19.73     3.63      9.18     22.60

Page 15: iVector approach to Phonotactic LRE

DT clustering of n-gram histories

•The overall likelihood is an order of magnitude higher than with the 1st solution
•The change of the model likelihood is quite notable in each iteration!
•The T matrix is mostly zero after some iterations!

Page 16: iVector approach to Phonotactic LRE

1st iteration

Page 17: iVector approach to Phonotactic LRE

2nd iteration

Page 18: iVector approach to Phonotactic LRE

3rd iteration

Page 19: iVector approach to Phonotactic LRE

4th iteration

Page 20: iVector approach to Phonotactic LRE

5th iteration

Page 21: iVector approach to Phonotactic LRE

6th iteration

Page 22: iVector approach to Phonotactic LRE

Closer look at TRAIN set

          amha bosn cant creo croa dari engi engl fars fren geor haus hind kore mand pash port russ span turk ukra urdu viet
TRAIN voa  ✔    ✔    ✔    ✔    ✔    ✔    ✗    ✔    ✗    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔
TRAIN cts  ✗    ✗    ✔    ✗    ✗    ✗    ✔    ✔    ✔    ✔    ✗    ✗    ✔    ✔    ✔    ✗    ✔    ✔    ✔    ✗    ✗    ✔    ✔
DEV voa    ✔    ✔    ✗    ✔    ✗    ✔    ✗    ✗    ✗    ✔    ✔    ✔    ✗    ✗    ✗    ✔    ✔    ✗    ✗    ✔    ✔    ✗    ✗
DEV cts    ✗    ✗    ✔    ✗    ✔    ✗    ✔    ✔    ✔    ✔    ✗    ✗    ✔    ✔    ✔    ✗    ✗    ✔    ✔    ✗    ✗    ✔    ✔
EVAL voa   ✔    ✔    ✔    ✔    ✔    ✔    ✗    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔    ✔
EVAL cts   ✗    ✗    ✔    ✗    ✗    ✗    ✔    ✔    ✔    ✗    ✗    ✗    ✔    ✔    ✔    ✗    ✗    ✔    ✗    ✗    ✗    ✔    ✔

Page 23: iVector approach to Phonotactic LRE

iVector inspection

[Figure: iVector distributions for Cantonese (cant) and English (engl)]

Page 24: iVector approach to Phonotactic LRE

iVector inspection

•Multiple data sources cause bimodality

•We also see this effect in some single-source languages

[Figure: iVector distribution for Amharic (amha)]

Page 25: iVector approach to Phonotactic LRE