
iVector approach to Phonotactic LRE
Mehdi Soufifar
2nd May 2011

Phonotactic LRE
Train: recognizer (HVite, BUT PR, ...) -> phoneme sequence -> n-gram counts -> train classifier (LR, SVM, GLC, ...)
Test: recognizer -> phoneme sequence -> n-gram counts -> classifier -> language-dependent score

N-gram counts
Problem: huge vector of n-gram counts (N^3 = 226,981 3-grams for the RU phoneme set).
Solutions:
- Choose the most frequent n-grams
- Choose the top N n-grams discriminatively (LL)
- Compress the n-gram counts with Singular Value Decomposition (SVD): decompose the document matrix D and use the transformation matrix U to reduce the n-gram vector dimensionality (PCA-based dimensionality reduction)
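The SVD-based compression above can be sketched as follows. This is an illustrative example, not the authors' code: the data is random, the dimensions are scaled down (a real RU 3-gram vector would have ~226,981 entries), and the document matrix is assumed to be utterances-by-n-grams.

```python
import numpy as np

rng = np.random.default_rng(0)

n_utts, n_ngrams = 50, 1000  # toy sizes; real n_ngrams would be ~226981
# Document matrix D: one row of (sparse) n-gram counts per utterance
D = rng.poisson(0.1, size=(n_utts, n_ngrams)).astype(float)

# Thin SVD of the document matrix: D = U S Vt
U, S, Vt = np.linalg.svd(D, full_matrices=False)

k = 20  # target dimensionality
# Project each utterance's count vector onto the top-k right singular
# vectors -- a PCA-style dimensionality reduction of the n-gram counts
D_reduced = D @ Vt[:k].T
print(D_reduced.shape)  # (50, 20)
```

New (test) utterances can be projected with the same `Vt[:k]` transformation without recomputing the SVD.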

iVector feature selection

Sub-space multinomial modeling
Every vector of n-gram counts consists of E events (#n-grams).
The log probability of the nth utterance under the multinomial (MN) distribution is:

log P(c_n) = k + sum_{e=1}^{E} c_{ne} log phi_{ne}

where the event probabilities phi_{ne} can be defined through the sub-space as:

phi_{ne} = exp(b_e + t_e w_n) / sum_{i=1}^{E} exp(b_i + t_i w_n)

The model parameters to be estimated by ML are t (the sub-space matrix) and the w_n.
There is no analytical solution! We use Newton-Raphson updates as a numerical solution.
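A minimal sketch of the Newton-Raphson update for a single utterance's iVector w, assuming the standard sub-space multinomial parameterization (probabilities given by a softmax of b + T w). The parameters and counts here are random toy values, not trained ones; in the full recipe the rows of T would be re-estimated in alternation with the w_n.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def update_ivector(c, T, b, w, n_iter=10):
    """Newton-Raphson ML update of one utterance's iVector w.
    c : (E,) n-gram counts, T : (E, R) sub-space matrix, b : (E,) biases."""
    N = c.sum()
    for _ in range(n_iter):
        phi = softmax(b + T @ w)        # multinomial event probabilities
        g = T.T @ (c - N * phi)         # gradient of the log-likelihood
        # Hessian: -N times the covariance of the rows of T under phi
        m = T.T @ phi
        H = -N * ((T.T * phi) @ T - np.outer(m, m))
        w = w - np.linalg.solve(H, g)   # Newton-Raphson step
    return w

# toy example with random parameters
rng = np.random.default_rng(1)
E, R = 30, 4
T = 0.1 * rng.standard_normal((E, R))
b = rng.standard_normal(E)
c = rng.poisson(2.0, size=E).astype(float)
w = update_ivector(c, T, b, np.zeros(R))
```

Because the log-likelihood is concave in w for fixed T, the Newton iteration converges quickly, which is consistent with the small number of training iterations reported below.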


Sub-space multinomial modeling
1st solution: consider all 3-grams to be components of a Bernoulli trial and model the entire vector of 3-gram counts with one multinomial distribution.
Problem: the n-gram events are not independent (not consistent with the Bernoulli-trial assumption!)

2nd solution: cluster the 3-grams based on their histories and model each history with a separate MN distribution.
Problem: data sparsity!
Remedy: cluster the 3-grams with a binary decision tree.

Training of the iVector extractor
Number of iterations: 5-7 (depends on the sub-space dimension)
Sub-space dimension: 600

[Plots: results for the 3-second, 10-second, and 30-second test conditions]

Classifiers
Configuration: L one-to-all linear classifiers, where L is the number of targeted languages.

Classifiers: SVM, LR, Linear Generative Classifier (GLC), MLR (to be done!)
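The one-to-all setup above can be sketched as follows. This is a hedged illustration on synthetic "iVectors" (three fake languages, random class means), using scikit-learn's binary logistic regression as the linear classifier; the slide's actual SVM/LR/GLC back-ends and calibration are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
L, R, n = 3, 10, 200                 # languages, iVector dim, utterances
means = rng.standard_normal((L, R)) * 3
y = rng.integers(0, L, size=n)
X = means[y] + rng.standard_normal((n, R))  # synthetic iVectors

# One binary linear classifier per target language: language l vs. the rest
clfs = [LogisticRegression(max_iter=1000).fit(X, (y == l).astype(int))
        for l in range(L)]

# Score a trial against every language model and pick the highest score
scores = np.column_stack([clf.decision_function(X) for clf in clfs])
pred = scores.argmax(axis=1)
print("accuracy:", (pred == y).mean())
```

The per-language scores (one column per classifier) are what the evaluation metrics below are computed from.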

Results with different classifiers
Task: NIST LRE 2009

System      Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
PCA-SVM       2.83     7.05    17.77     3.62      8.82     21.00
PCA-LR        2.22     6.22    17.26     2.93      8.29     22.60
PCA-GLC       2.81     8.25    19.83     3.50      9.88     22.88
iVec-SVM      6.54    14.07    26.79     8.54     17.5      18.06
iVec-LR       2.44     6.88    18.01     3.05      8.10     21.39
iVec-GLC      2.58     7.13    18.18     2.92      8.03     21.13

Results of different systems on LRE09

System            Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
BASE-HU-SVM         2.83     7.05    17.77     3.62      8.82     21.00
PCA-HU-LR           2.22     6.22    17.26     2.93      8.29     22.60
iVect-HU-LR         2.81     8.25    19.83     3.05      8.10     21.05
iVec+PCA-HU-LR      2.05     5.74    16.71     2.79      7.63     21.05
iVec-RU-LR          2.66     6.46    17.50     2.59      7.42     19.83
iVec-LR HU+RU       1.54     4.44    13.30     2.09      5.34     16.53
iVec-LR HURU        1.90     5.10    14.69     2.06      5.80     17.79

N-gram clustering
Remove all 3-grams with fewer than 10 occurrences over all training utterances.
Model each history with a separate MN distribution: 1084 histories, up to 33 3-grams each.
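The count-pruning step can be sketched as below. The phoneme strings are invented toy data; only the mechanics (accumulate 3-gram counts over all training utterances, drop those below the threshold of 10) follow the slide.

```python
from collections import Counter

# toy "phoneme sequences" standing in for recognizer output
utterances = [
    ["a", "b", "c", "a", "b", "c", "a", "b"] * 5,
    ["a", "b", "d", "a", "b", "c"] * 3,
    ["x", "y", "z"],  # rare material that will be pruned
]

counts = Counter()
for utt in utterances:
    for i in range(len(utt) - 2):
        counts[tuple(utt[i:i + 3])] += 1  # accumulate 3-gram counts

# keep only 3-grams seen at least 10 times over all training utterances
kept = {g: n for g, n in counts.items() if n >= 10}
print(len(counts), "->", len(kept))
```

The surviving 3-grams are then grouped by their two-phoneme history, one MN distribution per history.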

System        Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
>10 3-gram      8.84    16.04    27.94    10.34     19.92     32.35

Merging histories using a BDT
For a 3-gram P_i P_j P_k, merge those histories that do not increase the entropy by more than a certain value.

E1 = Entropy(Model 1), E2 = Entropy(Model 2), D = E1 - E2
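A minimal sketch of the merge criterion. The slide does not fully specify how D is computed, so this uses a common count-weighted entropy criterion from decision-tree clustering: merge two history models only if the entropy increase caused by pooling their counts stays below a threshold. The counts and threshold are made-up toy values.

```python
import numpy as np

def entropy(counts):
    """Count-weighted entropy (nats) of the ML multinomial for `counts`."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]
    return -counts.sum() * np.sum(p * np.log(p))

def should_merge(c1, c2, threshold):
    """Merge two history models if pooling them raises entropy by < threshold."""
    before = entropy(c1) + entropy(c2)               # E1 + E2
    after = entropy(np.asarray(c1) + np.asarray(c2))  # merged model
    return bool(after - before <= threshold)

# two similar histories -> small entropy increase -> merge
print(should_merge([40, 10, 5], [38, 12, 6], threshold=5.0))
# two very different histories -> large increase -> keep separate
print(should_merge([40, 1, 1], [1, 40, 1], threshold=5.0))
```

Applied greedily in a binary decision tree over histories, this trades model resolution against the data-sparsity problem noted earlier.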

Results of DT history merging (1089-60)

More iterations of training T => the T matrix moves toward the zero matrix!

System        Dev-3s  Dev-10s  Dev-30s  Eval-3s  Eval-10s  Eval-30s
DT              4.36    10.41    22.20     5.46     12.80     27.09
>10 3-gram      8.84    16.04    27.94    10.34     19.92     32.35

Deeper insight into the iVector extractor

Strange results
3-grams with no repetition throughout the whole training set should not affect system performance!
Removing all 3-grams with no repetition throughout the whole training set: 35973 -> 35406 (567 fewer).

Even worse results if we prune more!

N-grams   Dev-30s  Dev-10s  Dev-3s  Eval-30s  Eval-10s  Eval-3s
35973        2.44     6.88   18.01      3.05      8.10    21.39
35406        3.35     8.05   19.73      3.63      9.18    22.60

DT clustering of n-gram histories
- The overall likelihood is an order of magnitude higher than with the 1st solution.
- The change of the model likelihood is quite notable in each iteration!
- The T matrix is mainly zero after some iterations!

[Plots: evolution of the T matrix over training iterations 1 through 6]

Closer look at TRAIN set

[Plots: per-language distributions for the TRAIN, DEV, and EVAL sets, VOA vs. CTS sources, over the LRE09 target languages: amha, bosn, cant, creo, croa, dari, engi, engl, fars, fren, geor, haus, hind, kore, mand, pash, port, russ, span, turk, ukra, urdu, viet]

iVector inspection
[Plots: iVector distributions for Cantonese (cant) and English (engl)]

iVector inspection
Multiple data sources cause bimodality.
We also see this effect in some single-source languages, e.g. Amharic (amha).