iVector approach to Phonotactic LRE


iVector approach to Phonotactic LRE. Mehdi Soufifar, 2nd May 2011. Phonotactic LRE: a recognizer (HVite, BUTPR, ...) produces a phoneme sequence, n-gram statistics are extracted from it, and the n-gram counts are used to train a classifier (LR, SVM, LM, GLC, ...) for each of the L languages.

Text of iVector approach to Phonotactic LRE

  • iVector approach to Phonotactic LRE
    Mehdi Soufifar
    2nd May 2011

  • Phonotactic LRE
    Train: phoneme sequence -> n-gram counts
    Test: phoneme sequence -> n-gram counts -> language-dependent score

  • N-gram counts
    Problem: huge vector of n-gram counts (N^3 = 226981 for the RU phoneme set)
    Solutions:
      Choose the most frequent n-grams
      Choose the top N n-grams discriminatively (LL)
      Compress the n-gram counts:
        Singular Value Decomposition (SVD): decompose the document matrix D and use the transformation matrix U to reduce the n-gram vector dimensionality
        PCA-based dimensionality reduction
      iVector feature selection
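A minimal sketch of the count-then-compress idea, assuming NumPy and a toy three-phoneme inventory. The function names (`trigram_counts`, `svd_reduce`) are hypothetical, and the convention that D holds n-grams in rows and utterances in columns (so that U spans the n-gram space) is an assumption, not something the slides state. Note that 226981 equals 61^3, which is why the full 3-gram vector is so large.

```python
from itertools import product

import numpy as np


def trigram_counts(phonemes, vocab_index):
    """Return a dense vector of 3-gram counts for one phoneme sequence."""
    counts = np.zeros(len(vocab_index))
    for trigram in zip(phonemes, phonemes[1:], phonemes[2:]):
        counts[vocab_index[trigram]] += 1
    return counts


def svd_reduce(doc_matrix, rank):
    """Reduce the columns of doc_matrix (n-grams x utterances) to `rank` dimensions."""
    u, _, _ = np.linalg.svd(doc_matrix, full_matrices=False)
    u_k = u[:, :rank]                 # leading left singular vectors
    return u_k, u_k.T @ doc_matrix    # projection basis and reduced vectors


# Toy inventory (assumption); 61 phonemes would give 61**3 = 226981 trigrams.
phoneme_set = ["a", "b", "c"]
vocab_index = {tri: i for i, tri in enumerate(product(phoneme_set, repeat=3))}

utterances = [list("abcabca"), list("aabbccab"), list("cbacbacb")]
doc_matrix = np.stack([trigram_counts(u, vocab_index) for u in utterances], axis=1)
u_k, reduced = svd_reduce(doc_matrix, rank=2)
print(doc_matrix.shape, reduced.shape)
```

A new utterance would be reduced the same way, by multiplying its count vector with `u_k.T`.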

  • Sub-space multinomial modeling
    Every vector of n-gram counts consists of E events (the number of n-grams)
    The log probability of the nth utterance under the multinomial (MN) distribution is:
    [equation image not preserved]
    where the multinomial parameters can be defined as:
    [equation image not preserved]
    The model parameters to be estimated by ML estimation are t and w
    No analytical solution!
    We use Newton-Raphson updates as a numerical solution
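Since the slide equations are not available in the text, the following is only a sketch under a commonly used form of the subspace multinomial model, assumed here rather than taken from the presentation: the event probabilities are a softmax of m + T w, and the log-likelihood of one utterance is the count-weighted sum of their logs. The offset `m`, the helper names, and the backtracking safeguard are all illustrative assumptions; only the Newton-Raphson idea and the parameters t and w come from the slide. The sketch updates w for a single utterance with T held fixed.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log of softmax(z)."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def log_likelihood(counts, m, T, w):
    """Multinomial log-likelihood of one count vector (constant term dropped)."""
    return float(counts @ log_softmax(m + T @ w))


def newton_step_w(counts, m, T, w, ridge=1e-6):
    """One Newton-Raphson update of w, with step halving so the likelihood never drops."""
    phi = np.exp(log_softmax(m + T @ w))
    n = counts.sum()
    grad = T.T @ (counts - n * phi)
    # Negative Hessian of the log-likelihood; the ridge keeps it invertible.
    neg_hess = n * (T.T @ (np.diag(phi) - np.outer(phi, phi)) @ T)
    neg_hess += ridge * np.eye(len(w))
    step = np.linalg.solve(neg_hess, grad)
    base = log_likelihood(counts, m, T, w)
    scale = 1.0
    for _ in range(30):
        candidate = w + scale * step
        if log_likelihood(counts, m, T, candidate) >= base:
            return candidate
        scale *= 0.5
    return w


# Toy problem (assumption): E = 8 events, 2-dimensional subspace.
rng = np.random.default_rng(0)
E, R = 8, 2
m = np.log(np.full(E, 1.0 / E))
T = rng.normal(scale=0.5, size=(E, R))
counts = rng.integers(0, 10, size=E).astype(float)

w = np.zeros(R)
history = [log_likelihood(counts, m, T, w)]
for _ in range(5):
    w = newton_step_w(counts, m, T, w)
    history.append(log_likelihood(counts, m, T, w))
print(history)
```

In a full extractor, updates of this kind for w would alternate with updates of the rows of T over all training utterances.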


  • Sub-space multinomial modeling
    1st solution: consider all 3-grams to be components of a Bernoulli trial
      Model the entire vector of 3-gram counts with one multinomial distribution
      N-gram events are not independent (not consistent with the Bernoulli trial presumption!)

    2nd solution: cluster 3-grams based on their histories
      Model each history with a separate MN distribution
      Data sparsity problem!
      Clustering 3-grams based on a binary decision tree
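The second solution can be illustrated with a small sketch, assuming the usual n-gram meaning of "history" (the first two phonemes of a 3-gram). The function names and the toy counts are hypothetical; each history gets its own multinomial, estimated here simply as relative frequencies.

```python
from collections import defaultdict


def group_by_history(trigram_counts):
    """Group 3-gram counts by their two-phoneme history (first two symbols)."""
    histories = defaultdict(dict)
    for (p1, p2, p3), count in trigram_counts.items():
        histories[(p1, p2)][p3] = count
    return dict(histories)


def history_multinomials(histories):
    """ML estimate of one multinomial per history: relative frequencies."""
    models = {}
    for history, next_counts in histories.items():
        total = sum(next_counts.values())
        models[history] = {p: c / total for p, c in next_counts.items()}
    return models


# Toy 3-gram counts (assumption).
toy_counts = {
    ("a", "b", "c"): 3,
    ("a", "b", "a"): 1,
    ("b", "c", "a"): 2,
    ("b", "c", "b"): 2,
    ("c", "a", "b"): 4,
}
histories = group_by_history(toy_counts)
models = history_multinomials(histories)
print(models[("a", "b")])
```

Histories with very few observations give unreliable estimates, which is the data sparsity problem the slide points to.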

  • Training of iVector extractor
    Number of iterations: 5-7 (depends on sub-space dimension)
    Sub-space dimension: 600

    [Figure panels: 3 seconds, 10 seconds, 30 seconds]

  • Classifiers
    Configuration: L one-to-all linear classifiers
    L: number of targeted languages

    Classifiers:
      SVM
      LR
      Linear Generative Classifier
      MLR (to be done!)
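As an illustration of the one-to-all configuration, here is a minimal sketch with logistic regression trained by plain gradient descent in NumPy. The training procedure, the hyperparameters, and the toy 2-D data are assumptions made for the example; the presentation does not say how its classifiers were trained.

```python
import numpy as np


def train_one_vs_all_lr(X, y, num_langs, lr=0.1, epochs=200):
    """Train num_langs binary logistic-regression detectors, one per language."""
    n, d = X.shape
    W = np.zeros((num_langs, d))
    b = np.zeros(num_langs)
    for lang in range(num_langs):
        target = (y == lang).astype(float)   # 1 for this language, 0 for all others
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-(X @ W[lang] + b[lang])))
            err = p - target
            W[lang] -= lr * (X.T @ err) / n
            b[lang] -= lr * err.mean()
    return W, b


def score(X, W, b):
    """Return one linear score per language for every row of X."""
    return X @ W.T + b


# Toy data (assumption): three well-separated "languages" in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [4.0, -2.0], [-4.0, -2.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(30, 2)) for c in centers])
y = np.repeat(np.arange(3), 30)

W, b = train_one_vs_all_lr(X, y, num_langs=3)
predicted = score(X, W, b).argmax(axis=1)
accuracy = float((predicted == y).mean())
print(accuracy)
```

In the setting of the slides, the rows of `X` would be the reduced n-gram vectors (PCA outputs or iVectors) rather than toy points.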

  • Results on different classifiers
    Task: NIST LRE 2009

                Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
    PCA-SVM       2.83     7.05    17.77     3.62      8.82     21.00
    PCA-LR        2.22     6.22    17.26     2.93      8.29     22.60
    PCA-GLC       2.81     8.25    19.83     3.50      9.88     22.88


  • Results of different systems on LRE09

                    Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
    BASE-HU-SVM       2.83     7.05    17.77     3.62      8.82     21.00
    PCA-HU-LR         2.22     6.22    17.26     2.93      8.29     22.60
    iVect-HU-LR       2.81     8.25    19.83     3.05      8.10     21.05
    iVec-LR HU+RU     1.54     4.44    13.30     2.09      5.34     16.53
    iVec-LR HURU      1.90     5.10    14.69     2.06      5.80     17.79

  • N-gram clustering
    Remove all 3-grams with repetition < 10 over all training utterances
    Model each history with a separate MN distribution
    1084 histories, up to 33 3-grams each

                  Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
    >10 3-gram      8.84    16.04    27.94    10.34     19.92     32.35
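The pruning step can be sketched as follows, assuming each utterance is represented as a dictionary of 3-gram counts. The function name and toy data are hypothetical; the threshold of 10 is the one given on the slide, read here as keeping 3-grams whose total training count is at least 10.

```python
from collections import Counter


def prune_rare_trigrams(utterance_counts, min_count=10):
    """Drop 3-grams whose total count over all training utterances is below min_count."""
    total = Counter()
    for counts in utterance_counts:
        total.update(counts)
    kept = {tri for tri, c in total.items() if c >= min_count}
    pruned = [
        {tri: c for tri, c in counts.items() if tri in kept}
        for counts in utterance_counts
    ]
    return kept, pruned


# Toy training set (assumption): two utterances with 3-gram counts.
train = [
    {("a", "b", "c"): 7, ("b", "c", "a"): 2},
    {("a", "b", "c"): 5, ("c", "a", "b"): 1},
]
kept, pruned = prune_rare_trigrams(train)
print(kept, pruned)
```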

  • Merging histories using BDT
    In the case of a 3-gram PiPjPk
    Merge histories whose merging does not increase the entropy by more than a certain value

    [Diagram: Models 1 with E1 = Entropy(Model1), Models 2 with E2 = Entropy(Model2), D = E1 - E2]
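A minimal sketch of an entropy-based merge criterion, under stated assumptions: the slide does not say which of the two models is the merged one or how entropies are weighted, so this example uses count-weighted entropy (in bits) and defines the cost as the entropy of the merged history minus the sum for the separate histories. The function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def weighted_entropy(next_counts):
    """Count-weighted entropy (N * H, in bits) of a history's next-phoneme counts."""
    total = sum(next_counts.values())
    return -sum(c * math.log2(c / total) for c in next_counts.values() if c > 0)


def merge_cost(counts_a, counts_b):
    """Entropy increase caused by modeling two histories with one shared multinomial."""
    merged = Counter(counts_a) + Counter(counts_b)
    separate = weighted_entropy(counts_a) + weighted_entropy(counts_b)
    return weighted_entropy(merged) - separate


# Toy next-phoneme counts for three histories (assumption).
hist_a = {"x": 8, "y": 2}
hist_b = {"x": 4, "y": 1}   # same distribution as hist_a
hist_c = {"x": 1, "y": 9}   # very different distribution

threshold = 1.0             # hypothetical merge threshold, in bits
cost_ab = merge_cost(hist_a, hist_b)
cost_ac = merge_cost(hist_a, hist_c)
print(cost_ab <= threshold, cost_ac <= threshold)
```

Histories with similar next-phoneme distributions cost almost nothing to merge, so they are the natural candidates for sharing one multinomial.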

  • Results on DT Hist. merging 1089-60

    More training iterations on T => the T matrix moves toward the zero matrix!

    Iteration     Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
    DT              4.36    10.41    22.20     5.46     12.80     27.09
    >10 3-gram      8.84    16.04    27.94    10.34     19.92     32.35

  • Deeper insight into the iVector extractor

  • Strange results
    3-grams with no repetition throughout the whole training set should not affect system performance!
    Remove all 3-grams with no repetition throughout the whole training set
    35973 -> 35406 (a reduction of 567)

    Even worse results if we prune more!


  • DT clustering of n-gram histories
    The overall likelihood is an order of magnitude higher than with the 1st solution
    The change in model likelihood is quite notable in each iteration!
    The T matrix is mainly zero after some iterations!

  • 1st iteration

  • 2nd iteration

  • 3rd iteration

  • 4th iteration

  • 5th iteration

  • 6th iteration

  • Closer look at TRAIN set

    [Figure panels: TRAIN voa, TRAIN cts, DEV voa, DEV cts, EVAL voa, EVAL cts]
    [Axis labels: amha, bosn, cant, creo, croa, dari, engi, engl, fars, fren, geor, haus, hind, kore, mand, pash, port, russ, span, turk, ukra, urdu, viet]

  • iVector inspection
    [Figure panels: Cant, Engl]

  • iVector inspection
    Multiple data sources cause bimodality
    We also see this effect in some single-source languages (Amha)