
iVector approach to Phonotactic LRE
Mehdi Soufifar
2nd May 2011

Phonotactic LRE
Train: recognizer (HVite, BUT PR, ...) -> phoneme sequence -> n-gram counts -> train classifier (LR, SVM, GLC, ...)
Test: recognizer -> phoneme sequence -> n-gram counts -> classifier -> language-dependent score

N-gram counts
Problem: huge vector of n-gram counts (N^3 = 226,981 3-grams for the RU phoneme set).
Solutions:
- Choose the most frequent n-grams
- Choose the top N n-grams discriminatively (LL)
- Compress the n-gram counts with Singular Value Decomposition (SVD): decompose the document matrix D and use the transformation matrix U to reduce the n-gram vector dimensionality (PCA-based dimensionality reduction)
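The SVD-based compression above can be sketched as follows. This is an illustrative example, not the authors' code: the data is random, the dimensions are scaled down (a real RU 3-gram vector would have ~226,981 entries), and the document matrix is assumed to be utterances-by-n-grams.

```python
import numpy as np

rng = np.random.default_rng(0)

n_utts, n_ngrams = 50, 1000  # toy sizes; real n_ngrams would be ~226981
# Document matrix D: one row of (sparse) n-gram counts per utterance
D = rng.poisson(0.1, size=(n_utts, n_ngrams)).astype(float)

# Thin SVD of the document matrix: D = U S Vt
U, S, Vt = np.linalg.svd(D, full_matrices=False)

k = 20  # target dimensionality
# Project each utterance's count vector onto the top-k right singular
# vectors -- a PCA-style dimensionality reduction of the n-gram counts
D_reduced = D @ Vt[:k].T
print(D_reduced.shape)  # (50, 20)
```

New (test) utterances can be projected with the same `Vt[:k]` transformation without recomputing the SVD.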

iVector feature selection

Sub-space multinomial modeling
Every vector of n-gram counts consists of E events (#n-grams).
The log probability of the nth utterance under the multinomial (MN) distribution is:

log P(c_n) = k + sum_{e=1}^{E} c_{ne} log phi_{ne}

where the event probabilities phi_{ne} can be defined through the sub-space as:

phi_{ne} = exp(b_e + t_e w_n) / sum_{i=1}^{E} exp(b_i + t_i w_n)

The model parameters to be estimated by ML are t (the sub-space matrix) and the w_n.
There is no analytical solution! We use Newton-Raphson updates as a numerical solution.
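A minimal sketch of the Newton-Raphson update for a single utterance's iVector w, assuming the standard sub-space multinomial parameterization (probabilities given by a softmax of b + T w). The parameters and counts here are random toy values, not trained ones; in the full recipe the rows of T would be re-estimated in alternation with the w_n.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def update_ivector(c, T, b, w, n_iter=10):
    """Newton-Raphson ML update of one utterance's iVector w.
    c : (E,) n-gram counts, T : (E, R) sub-space matrix, b : (E,) biases."""
    N = c.sum()
    for _ in range(n_iter):
        phi = softmax(b + T @ w)        # multinomial event probabilities
        g = T.T @ (c - N * phi)         # gradient of the log-likelihood
        # Hessian: -N times the covariance of the rows of T under phi
        m = T.T @ phi
        H = -N * ((T.T * phi) @ T - np.outer(m, m))
        w = w - np.linalg.solve(H, g)   # Newton-Raphson step
    return w

# toy example with random parameters
rng = np.random.default_rng(1)
E, R = 30, 4
T = 0.1 * rng.standard_normal((E, R))
b = rng.standard_normal(E)
c = rng.poisson(2.0, size=E).astype(float)
w = update_ivector(c, T, b, np.zeros(R))
```

Because the log-likelihood is concave in w for fixed T, the Newton iteration converges quickly, which is consistent with the small number of training iterations reported below.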


Sub-space multinomial modeling
1st solution: consider all 3-grams to be components of a Bernoulli trial and model the entire vector of 3-gram counts with one multinomial distribution.
Problem: the n-gram events are not independent (not consistent with the Bernoulli-trial assumption!)

2nd solution: cluster the 3-grams based on their histories and model each history with a separate MN distribution.
Problem: data sparsity!
Remedy: cluster the 3-grams with a binary decision tree.

Training of the iVector extractor
Number of iterations: 5-7 (depends on the sub-space dimension)
Sub-space dimension: 600

[Plots: results for the 3-second, 10-second, and 30-second test conditions]

Classifiers
Configuration: L one-to-all linear classifiers, where L is the number of targeted languages.

Classifiers: SVM, LR, Linear Generative Classifier (GLC), MLR (to be done!)
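The one-to-all setup above can be sketched as follows. This is a hedged illustration on synthetic "iVectors" (three fake languages, random class means), using scikit-learn's binary logistic regression as the linear classifier; the slide's actual SVM/LR/GLC back-ends and calibration are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
L, R, n = 3, 10, 200                 # languages, iVector dim, utterances
means = rng.standard_normal((L, R)) * 3
y = rng.integers(0, L, size=n)
X = means[y] + rng.standard_normal((n, R))  # synthetic iVectors

# One binary linear classifier per target language: language l vs. the rest
clfs = [LogisticRegression(max_iter=1000).fit(X, (y == l).astype(int))
        for l in range(L)]

# Score a trial against every language model and pick the highest score
scores = np.column_stack([clf.decision_function(X) for clf in clfs])
pred = scores.argmax(axis=1)
print("accuracy:", (pred == y).mean())
```

The per-language scores (one column per classifier) are what the evaluation metrics below are computed from.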

Results with different classifiers
Task: NIST LRE 2009

System      Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
PCA-SVM       2.83     7.05    17.77     3.62      8.82     21.00
PCA-LR        2.22     6.22    17.26     2.93      8.29     22.60
PCA-GLC       2.81     8.25    19.83     3.50      9.88     22.88
iVec-SVM      6.54    14.07    26.79     8.54     17.5      18.06
iVec-LR       2.44     6.88    18.01     3.05      8.10     21.39
iVec-GLC      2.58     7.13    18.18     2.92      8.03     21.13

Results of different systems on LRE09

System            Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
BASE-HU-SVM         2.83     7.05    17.77     3.62      8.82     21.00
PCA-HU-LR           2.22     6.22    17.26     2.93      8.29     22.60
iVect-HU-LR         2.81     8.25    19.83     3.05      8.10     21.05
iVec+PCA-HU-LR      2.05     5.74    16.71     2.79      7.63     21.05
iVec-RU-LR          2.66     6.46    17.50     2.59      7.42     19.83
iVec-LR HU+RU       1.54     4.44    13.30     2.09      5.34     16.53
iVec-LR HURU        1.90     5.10    14.69     2.06      5.80     17.79

N-gram clustering
Remove all 3-grams with fewer than 10 occurrences over all training utterances.
Model each history with a separate MN distribution: 1084 histories, up to 33 3-grams each.
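The count-pruning step can be sketched as below. The phoneme strings are invented toy data; only the mechanics (accumulate 3-gram counts over all training utterances, drop those below the threshold of 10) follow the slide.

```python
from collections import Counter

# toy "phoneme sequences" standing in for recognizer output
utterances = [
    ["a", "b", "c", "a", "b", "c", "a", "b"] * 5,
    ["a", "b", "d", "a", "b", "c"] * 3,
    ["x", "y", "z"],  # rare material that will be pruned
]

counts = Counter()
for utt in utterances:
    for i in range(len(utt) - 2):
        counts[tuple(utt[i:i + 3])] += 1  # accumulate 3-gram counts

# keep only 3-grams seen at least 10 times over all training utterances
kept = {g: n for g, n in counts.items() if n >= 10}
print(len(counts), "->", len(kept))
```

The surviving 3-grams are then grouped by their two-phoneme history, one MN distribution per history.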

System        Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
>10 3-gram      8.84    16.04    27.94    10.34     19.92     32.35

Merging histories using a BDT
For a 3-gram P_i P_j P_k, merge those histories that do not increase the entropy by more than a certain value.

E1 = Entropy(Model 1), E2 = Entropy(Model 2), D = E1 - E2
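A minimal sketch of the merge criterion. The slide does not fully specify how D is computed, so this uses a common count-weighted entropy criterion from decision-tree clustering: merge two history models only if the entropy increase caused by pooling their counts stays below a threshold. The counts and threshold are made-up toy values.

```python
import numpy as np

def entropy(counts):
    """Count-weighted entropy (nats) of the ML multinomial for `counts`."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]
    return -counts.sum() * np.sum(p * np.log(p))

def should_merge(c1, c2, threshold):
    """Merge two history models if pooling them raises entropy by < threshold."""
    before = entropy(c1) + entropy(c2)               # E1 + E2
    after = entropy(np.asarray(c1) + np.asarray(c2))  # merged model
    return bool(after - before <= threshold)

# two similar histories -> small entropy increase -> merge
print(should_merge([40, 10, 5], [38, 12, 6], threshold=5.0))
# two very different histories -> large increase -> keep separate
print(should_merge([40, 1, 1], [1, 40, 1], threshold=5.0))
```

Applied greedily in a binary decision tree over histories, this trades model resolution against the data-sparsity problem noted earlier.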

Results of DT history merging (1089-60)

More iterations of training T => the T matrix moves toward the zero matrix!

System        Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
DT              4.36    10.41    22.20     5.46     12.80     27.09
>10 3-gram      8.84    16.04    27.94    10.34     19.92     32.35

Deeper insight into the iVector extractor

Strange results
3-grams with no repetition throughout the whole training set should not affect system performance!
Removing all 3-grams with no repetition throughout the whole training set: 35973 -> 35406 (567 fewer).

Even worse results if we prune more!

N-grams   Dev-30s  Dev-10s  Dev-3s  Eval-30s  Eval-10s  Eval-3s
35973        2.44     6.88   18.01      3.05      8.10    21.39
35406        3.35     8.05   19.73      3.63      9.18    22.60

DT clustering of n-gram histories
- The overall likelihood is an order of magnitude higher than with the 1st solution.
- The change of the model likelihood is quite notable in each iteration!
- The T matrix is mainly zero after some iterations!

[Plots: evolution of the T matrix over training iterations 1 through 6]

Closer look at TRAIN set

[Plots: per-language distributions for the TRAIN, DEV, and EVAL sets, VOA vs. CTS sources, over the LRE09 target languages: amha, bosn, cant, creo, croa, dari, engi, engl, fars, fren, geor, haus, hind, kore, mand, pash, port, russ, span, turk, ukra, urdu, viet]

iVector inspection
[Plots: iVector distributions for Cantonese (cant) and English (engl)]

iVector inspection
Multiple data sources cause bimodality.
We also see this effect in some single-source languages, e.g. Amharic (amha).