
iVector approach to Phonotactic LRE
Mehdi Soufifar
2nd May 2011

Phonotactic LRE
Train: recognizer (HVite, BUT PR, ...) -> phoneme sequence -> n-gram counts -> train classifier (LR, SVM, GLC, ...)
Test: recognizer -> phoneme sequence -> n-gram counts -> classifier -> language-dependent score

N-gram counts
Problem: huge vector of n-gram counts (N^3 = 226,981 3-grams for the RU phoneme set).
Solutions:
- Choose the most frequent n-grams
- Choose the top N n-grams discriminatively (LL)
- Compress the n-gram counts with Singular Value Decomposition (SVD): decompose the document matrix D and use the transformation matrix U to reduce the n-gram vector dimensionality (PCA-based dimensionality reduction)
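The SVD-based compression above can be sketched as follows. This is an illustrative example, not the authors' code: the data is random, the dimensions are scaled down (a real RU 3-gram vector would have ~226,981 entries), and the document matrix is assumed to be utterances-by-n-grams.

```python
import numpy as np

rng = np.random.default_rng(0)

n_utts, n_ngrams = 50, 1000  # toy sizes; real n_ngrams would be ~226981
# Document matrix D: one row of (sparse) n-gram counts per utterance
D = rng.poisson(0.1, size=(n_utts, n_ngrams)).astype(float)

# Thin SVD of the document matrix: D = U S Vt
U, S, Vt = np.linalg.svd(D, full_matrices=False)

k = 20  # target dimensionality
# Project each utterance's count vector onto the top-k right singular
# vectors -- a PCA-style dimensionality reduction of the n-gram counts
D_reduced = D @ Vt[:k].T
print(D_reduced.shape)  # (50, 20)
```

New (test) utterances can be projected with the same `Vt[:k]` transformation without recomputing the SVD.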

iVector feature selection

Sub-space multinomial modeling
Every vector of n-gram counts consists of E events (#n-grams).
The log probability of the nth utterance under the multinomial (MN) distribution is:

log P(c_n) = k + sum_{e=1}^{E} c_{ne} log phi_{ne}

where the event probabilities phi_{ne} can be defined through the sub-space as:

phi_{ne} = exp(b_e + t_e w_n) / sum_{i=1}^{E} exp(b_i + t_i w_n)

The model parameters to be estimated by ML are t (the sub-space matrix) and the w_n.
There is no analytical solution! We use Newton-Raphson updates as a numerical solution.
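A minimal sketch of the Newton-Raphson update for a single utterance's iVector w, assuming the standard sub-space multinomial parameterization (probabilities given by a softmax of b + T w). The parameters and counts here are random toy values, not trained ones; in the full recipe the rows of T would be re-estimated in alternation with the w_n.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def update_ivector(c, T, b, w, n_iter=10):
    """Newton-Raphson ML update of one utterance's iVector w.
    c : (E,) n-gram counts, T : (E, R) sub-space matrix, b : (E,) biases."""
    N = c.sum()
    for _ in range(n_iter):
        phi = softmax(b + T @ w)        # multinomial event probabilities
        g = T.T @ (c - N * phi)         # gradient of the log-likelihood
        # Hessian: -N times the covariance of the rows of T under phi
        m = T.T @ phi
        H = -N * ((T.T * phi) @ T - np.outer(m, m))
        w = w - np.linalg.solve(H, g)   # Newton-Raphson step
    return w

# toy example with random parameters
rng = np.random.default_rng(1)
E, R = 30, 4
T = 0.1 * rng.standard_normal((E, R))
b = rng.standard_normal(E)
c = rng.poisson(2.0, size=E).astype(float)
w = update_ivector(c, T, b, np.zeros(R))
```

Because the log-likelihood is concave in w for fixed T, the Newton iteration converges quickly, which is consistent with the small number of training iterations reported below.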


Sub-space multinomial modeling
1st solution: consider all 3-grams to be components of a Bernoulli trial and model the entire vector of 3-gram counts with one multinomial distribution.
Problem: the n-gram events are not independent (not consistent with the Bernoulli-trial assumption!)

2nd solution: cluster the 3-grams based on their histories and model each history with a separate MN distribution.
Problem: data sparsity!
Remedy: cluster the 3-grams with a binary decision tree.

Training of the iVector extractor
Number of iterations: 5-7 (depends on the sub-space dimension)
Sub-space dimension: 600

[Plots: results for the 3-second, 10-second, and 30-second test conditions]

Classifiers
Configuration: L one-to-all linear classifiers, where L is the number of targeted languages.

Classifiers: SVM, LR, Linear Generative Classifier (GLC), MLR (to be done!)
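The one-to-all setup above can be sketched as follows. This is a hedged illustration on synthetic "iVectors" (three fake languages, random class means), using scikit-learn's binary logistic regression as the linear classifier; the slide's actual SVM/LR/GLC back-ends and calibration are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
L, R, n = 3, 10, 200                 # languages, iVector dim, utterances
means = rng.standard_normal((L, R)) * 3
y = rng.integers(0, L, size=n)
X = means[y] + rng.standard_normal((n, R))  # synthetic iVectors

# One binary linear classifier per target language: language l vs. the rest
clfs = [LogisticRegression(max_iter=1000).fit(X, (y == l).astype(int))
        for l in range(L)]

# Score a trial against every language model and pick the highest score
scores = np.column_stack([clf.decision_function(X) for clf in clfs])
pred = scores.argmax(axis=1)
print("accuracy:", (pred == y).mean())
```

The per-language scores (one column per classifier) are what the evaluation metrics below are computed from.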

Results with different classifiers
Task: NIST LRE 2009

System      Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
PCA-SVM       2.83     7.05    17.77     3.62      8.82     21.00
PCA-LR        2.22     6.22    17.26     2.93      8.29     22.60
PCA-GLC       2.81     8.25    19.83     3.50      9.88     22.88
iVec-SVM      6.54    14.07    26.79     8.54     17.5      18.06
iVec-LR       2.44     6.88    18.01     3.05      8.10     21.39
iVec-GLC      2.58     7.13    18.18     2.92      8.03     21.13

Results of different systems on LRE09

System            Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
BASE-HU-SVM         2.83     7.05    17.77     3.62      8.82     21.00
PCA-HU-LR           2.22     6.22    17.26     2.93      8.29     22.60
iVect-HU-LR         2.81     8.25    19.83     3.05      8.10     21.05
iVec+PCA-HU-LR      2.05     5.74    16.71     2.79      7.63     21.05
iVec-RU-LR          2.66     6.46    17.50     2.59      7.42     19.83
iVec-LR HU+RU       1.54     4.44    13.30     2.09      5.34     16.53
iVec-LR HURU        1.90     5.10    14.69     2.06      5.80     17.79

N-gram clustering
Remove all 3-grams with fewer than 10 occurrences over all training utterances.
Model each history with a separate MN distribution: 1084 histories, up to 33 3-grams each.
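The count-pruning step can be sketched as below. The phoneme strings are invented toy data; only the mechanics (accumulate 3-gram counts over all training utterances, drop those below the threshold of 10) follow the slide.

```python
from collections import Counter

# toy "phoneme sequences" standing in for recognizer output
utterances = [
    ["a", "b", "c", "a", "b", "c", "a", "b"] * 5,
    ["a", "b", "d", "a", "b", "c"] * 3,
    ["x", "y", "z"],  # rare material that will be pruned
]

counts = Counter()
for utt in utterances:
    for i in range(len(utt) - 2):
        counts[tuple(utt[i:i + 3])] += 1  # accumulate 3-gram counts

# keep only 3-grams seen at least 10 times over all training utterances
kept = {g: n for g, n in counts.items() if n >= 10}
print(len(counts), "->", len(kept))
```

The surviving 3-grams are then grouped by their two-phoneme history, one MN distribution per history.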

System        Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
>10 3-gram      8.84    16.04    27.94    10.34     19.92     32.35

Merging histories using a BDT
For a 3-gram P_i P_j P_k, merge those histories that do not increase the entropy by more than a certain value.

E1 = Entropy(Model 1), E2 = Entropy(Model 2), D = E1 - E2
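A minimal sketch of the merge criterion. The slide does not fully specify how D is computed, so this uses a common count-weighted entropy criterion from decision-tree clustering: merge two history models only if the entropy increase caused by pooling their counts stays below a threshold. The counts and threshold are made-up toy values.

```python
import numpy as np

def entropy(counts):
    """Count-weighted entropy (nats) of the ML multinomial for `counts`."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]
    return -counts.sum() * np.sum(p * np.log(p))

def should_merge(c1, c2, threshold):
    """Merge two history models if pooling them raises entropy by < threshold."""
    before = entropy(c1) + entropy(c2)               # E1 + E2
    after = entropy(np.asarray(c1) + np.asarray(c2))  # merged model
    return bool(after - before <= threshold)

# two similar histories -> small entropy increase -> merge
print(should_merge([40, 10, 5], [38, 12, 6], threshold=5.0))
# two very different histories -> large increase -> keep separate
print(should_merge([40, 1, 1], [1, 40, 1], threshold=5.0))
```

Applied greedily in a binary decision tree over histories, this trades model resolution against the data-sparsity problem noted earlier.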

Results of DT history merging (1089-60)

More iterations of training T => the T matrix moves toward the zero matrix!

System        Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
DT              4.36    10.41    22.20     5.46     12.80     27.09
>10 3-gram      8.84    16.04    27.94    10.34     19.92     32.35

Deeper insight into the iVector extractor

Strange results
3-grams with no repetition throughout the whole training set should not affect system performance!
Removing all 3-grams with no repetition throughout the whole training set: 35973 -> 35406 (567 fewer).

Even worse results if we prune more!

N-grams   Dev-30s  Dev-10s  Dev-3s  Eval-30s  Eval-10s  Eval-3s
35973        2.44     6.88   18.01      3.05      8.10    21.39
35406        3.35     8.05   19.73      3.63      9.18    22.60

DT clustering of n-gram histories
- The overall likelihood is an order of magnitude higher than with the 1st solution.
- The change of the model likelihood is quite notable in each iteration!
- The T matrix is mainly zero after some iterations!

[Plots: evolution of the T matrix over training iterations 1 through 6]

Closer look at TRAIN set

[Plots: per-language distributions for the TRAIN, DEV, and EVAL sets, VOA vs. CTS sources, over the LRE09 target languages: amha, bosn, cant, creo, croa, dari, engi, engl, fars, fren, geor, haus, hind, kore, mand, pash, port, russ, span, turk, ukra, urdu, viet]

iVector inspection
[Plots: iVector distributions for Cantonese (cant) and English (engl)]

iVector inspection
Multiple data sources cause bimodality.
We also see this effect in some single-source languages, e.g. Amharic (amha).