SNR-Aware PLDA Modeling for Robust Speaker Verification · mwmak/papers/SYSU-CMU-2015.pdf ·...

SNR-Aware PLDA Modeling for Robust Speaker Verification

Department of Electronic and Information Engineering, The Hong Kong Polytechnic University

Sun Yat-sen University–Carnegie Mellon University International Joint Research Institute, Shunde, Guangdong (SYSU-CMU Joint Research Institute)

28 Dec. 2015

Man-Wai MAK enmwmak@polyu.edu.hk

http://www.eie.polyu.edu.hk/~mwmak

http://www.eie.polyu.edu.hk/~mwmak/papers/SYSU-CMU-2015.pdf

Contents

1. I-Vector/PLDA for Speaker Verification
2. SNR-Aware PLDA Modeling
   – SNR-Invariant PLDA
   – Mixture of PLDA
3. Experiments on SRE12
4. Conclusions

I-Vectors for Speaker Verification
• State-of-the-art method for speaker verification
• Factor analysis model:

$\boldsymbol{\mu}_s = \boldsymbol{\mu} + \mathbf{T}\mathbf{x}_s$

where $\boldsymbol{\mu}_s$ is the supervector of speaker s, $\boldsymbol{\mu}$ is the UBM supervector, $\mathbf{T}$ is the low-rank total variability matrix (61440×500), and $\mathbf{x}_s$ is the speaker-dependent i-vector.

• Instead of using the high-dimensional supervector $\boldsymbol{\mu}_s$ to represent speaker s, we use the low-dimensional (typically 500-dim) i-vector $\mathbf{x}_s$.
• T is estimated by an EM algorithm using the utterances of many speakers. T represents the subspace in which the i-vectors vary.
• Given T, we estimate $\mathbf{x}_s$ for each target speaker and $\mathbf{x}_t$ for each test utterance.

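I-Vectors for Speaker Verification
• Given an utterance, we align its acoustic vectors against a UBM to obtain the sufficient statistics: the zeroth-order statistics $\mathbf{N}_i$ and the centered first-order statistics $\tilde{\mathbf{f}}_i$.
• The i-vector of the utterance is the posterior mean of the latent factor of the factor analysis model:

$\langle \mathbf{x}_i \mid O \rangle = \mathbf{L}_i^{-1} \mathbf{T}^{\mathsf T} (\boldsymbol{\Sigma}^{(b)})^{-1} \tilde{\mathbf{f}}_i$

$\mathbf{L}_i^{-1} = \operatorname{cov}(\mathbf{x}_i, \mathbf{x}_i \mid O) = \left(\mathbf{I} + \mathbf{T}^{\mathsf T} (\boldsymbol{\Sigma}^{(b)})^{-1} \mathbf{N}_i \mathbf{T}\right)^{-1}$

where $\boldsymbol{\Sigma}^{(b)}$ is the covariance matrix of the UBM.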
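As a minimal NumPy sketch of the posterior mean and covariance on this slide, assuming a diagonal UBM covariance $\boldsymbol{\Sigma}^{(b)}$ stored as a vector and statistics stacked in supervector order (M mixtures × D feature dimensions), the i-vector could be computed as follows. The function and argument names are hypothetical, and the explicit matrix inverse is used for readability rather than speed.

```python
import numpy as np


def extract_ivector(T, sigma_diag, n_counts, f_centered):
    """Posterior mean and covariance of the i-vector (hypothetical helper).

    T          : (M*D, R) total variability matrix
    sigma_diag : (M*D,)   diagonal of the UBM covariance Sigma^(b)
    n_counts   : (M,)     zeroth-order statistics n_{i,c}
    f_centered : (M*D,)   centered first-order statistics f~_i
    """
    M = n_counts.shape[0]
    D = T.shape[0] // M
    R = T.shape[1]
    n_diag = np.repeat(n_counts, D)                    # diagonal of N_i = diag(n_{i,c} I)
    T_white = T / sigma_diag[:, None]                  # (Sigma^(b))^{-1} T
    L = np.eye(R) + T.T @ (n_diag[:, None] * T_white)  # L_i = I + T^T Sigma^-1 N_i T
    cov = np.linalg.inv(L)                             # cov(x_i, x_i | O) = L_i^{-1}
    mean = cov @ (T_white.T @ f_centered)              # <x_i|O> = L_i^{-1} T^T Sigma^-1 f~_i
    return mean, cov
```

With no aligned frames (all counts zero) the posterior falls back to the prior, i.e. a zero mean and identity covariance, which is a quick sanity check for any implementation.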

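I-Vectors for Speaker Verification
[Figure: each acoustic vector $\mathbf{o}_t$ of utterance i is aligned with the UBM to produce the statistics below.]

$\mathbf{N}_i = \begin{bmatrix} n_{i,1}\mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & n_{i,2}\mathbf{I} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & n_{i,M}\mathbf{I} \end{bmatrix}, \qquad \tilde{\mathbf{f}}_i = \begin{bmatrix} \tilde{\mathbf{f}}_{i,1} \\ \vdots \\ \tilde{\mathbf{f}}_{i,M} \end{bmatrix}$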

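[Figure: i-vector system. Training data are used to train the total variability matrix; the utterance from target speaker s and the test utterance t pass through the i-vector extractor and LDA+WCCN to the scoring method; the decision maker accepts if the score ≥ θ and rejects if the score < θ.]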

• Given an utterance from speaker s and a total variability matrix T, we estimate his/her i-vector $\mathbf{x}_s$.
• Because T defines the combined space describing both speaker variability and channel variability, we use LDA+WCCN to remove channel variability.
[Figure: i-vectors before LDA (x) and after LDA. Each point represents an utterance; each marker type represents a speaker.]
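The slides do not give the LDA+WCCN equations, so the following is a sketch of one common textbook formulation, stated as an assumption rather than the exact recipe used here: LDA solves the generalized eigenproblem $\mathbf{S}_b\mathbf{v} = \lambda\mathbf{S}_w\mathbf{v}$ on the i-vectors, and WCCN then whitens the average within-speaker covariance of the LDA-projected i-vectors. The function name is hypothetical.

```python
import numpy as np


def train_lda_wccn(X, labels, out_dim):
    """Return a projection P such that i-vectors are mapped as X @ P.

    X: (N, d) training i-vectors, labels: (N,) speaker labels, out_dim >= 2.
    """
    classes = np.unique(labels)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-speaker scatter
    Sb = np.zeros((d, d))  # between-speaker scatter
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Solve Sb v = lambda Sw v by whitening Sw with its Cholesky factor.
    Linv = np.linalg.inv(np.linalg.cholesky(Sw))
    evals, evecs = np.linalg.eigh(Linv @ Sb @ Linv.T)
    top = np.argsort(evals)[::-1][:out_dim]
    A = Linv.T @ evecs[:, top]                 # LDA projection, shape (d, out_dim)
    # WCCN: average within-speaker covariance of the LDA-projected i-vectors.
    Y = X @ A
    W = np.zeros((out_dim, out_dim))
    for c in classes:
        W += np.cov(Y[labels == c], rowvar=False, bias=True)
    W /= len(classes)
    B = np.linalg.cholesky(np.linalg.inv(W))   # B B^T = W^{-1}
    return A @ B
```

After projection with the returned matrix, the average within-speaker covariance of the training i-vectors is the identity, which is what WCCN is meant to achieve.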

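I-Vector Scoring
• Given the i-vector of the target speaker and the i-vector of a test utterance, we compute the cosine-distance score:

$S_{\mathrm{CD}}(\mathbf{x}_s, \mathbf{x}_t) = \dfrac{\langle \mathbf{W}^{\mathsf T}\mathbf{x}_s, \mathbf{W}^{\mathsf T}\mathbf{x}_t \rangle}{\lVert \mathbf{W}^{\mathsf T}\mathbf{x}_s \rVert \, \lVert \mathbf{W}^{\mathsf T}\mathbf{x}_t \rVert}, \qquad S_{\mathrm{CD}}(\mathbf{x}_s, \mathbf{x}_t) \in [-1, 1]$

where W is the LDA+WCCN projection matrix.
• If the score is larger than a threshold θ, we accept the speaker; otherwise we reject the speaker.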
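A short NumPy sketch of cosine-distance scoring followed by the threshold decision. The projection W is whatever LDA+WCCN produced; here it is set to the identity purely for illustration, and the helper names are hypothetical.

```python
import numpy as np


def cosine_score(W, x_s, x_t):
    """Cosine-distance score between projected target and test i-vectors."""
    a = W.T @ x_s
    b = W.T @ x_t
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def decide(score, theta):
    """Accept if the score reaches the threshold theta, otherwise reject."""
    return "accept" if score >= theta else "reject"


W = np.eye(2)  # identity projection, for illustration only
print(decide(cosine_score(W, np.array([1.0, 0.0]), np.array([1.0, 1.0])), theta=0.5))  # accept
```

The example score is $1/\sqrt{2} \approx 0.71$. Because the score is a cosine, it can reach −1 for i-vectors pointing in opposite directions, so θ is chosen on development data rather than assumed to lie in [0, 1].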

Probabilistic LDA for SV
• PLDA is based on a generative model that uses pre-processed i-vectors as input.
• It aims to model the speaker and channel variability in the i-vector space.
• The method assumes that there is a speaker subspace V within the i-vector space.
• The i-vector $\mathbf{x}_s$ is written as:

$\mathbf{x}_s = \mathbf{m} + \mathbf{V}\mathbf{z}_s + \boldsymbol{\varepsilon}_s$

where $\mathbf{x}_s$ is the i-vector extracted from the utterance of speaker s, $\mathbf{m}$ is the global mean of all i-vectors, $\mathbf{V}$ defines the speaker subspace, $\mathbf{z}_s$ is the speaker factor, and $\boldsymbol{\varepsilon}_s$ is the residual noise with covariance $\boldsymbol{\Sigma}$.

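Probabilistic LDA for SV
• Similarly, the i-vector $\mathbf{x}_t$ from a test utterance is written as:

$\mathbf{x}_t = \mathbf{m} + \mathbf{V}\mathbf{z}_t + \boldsymbol{\varepsilon}_t$

• Intuitively, you may think of $\mathbf{z}_s$ and $\mathbf{z}_t$ as the projections of the i-vectors onto the speaker subspace defined by the eigenvectors in V.
• But unlike PCA, given an i-vector $\mathbf{x}_t$, there are infinitely many $\mathbf{z}_t$ consistent with it. So we need to consider the joint density of $\mathbf{x}_t$ and $\mathbf{z}_t$, and integrate out $\mathbf{z}_t$, when computing the likelihood of $\mathbf{x}_t$.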

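PLDA Scoring
• Given $\mathbf{x}_s$ and $\mathbf{x}_t$, we test

H0 (same speaker): $\mathbf{x}_s = \mathbf{m} + \mathbf{V}\mathbf{z} + \boldsymbol{\varepsilon}_s, \quad \mathbf{x}_t = \mathbf{m} + \mathbf{V}\mathbf{z} + \boldsymbol{\varepsilon}_t$ (shared speaker factor $\mathbf{z}$)

against

H1 (different speakers): $\mathbf{x}_s = \mathbf{m} + \mathbf{V}\mathbf{z}_s + \boldsymbol{\varepsilon}_s, \quad \mathbf{x}_t = \mathbf{m} + \mathbf{V}\mathbf{z}_t + \boldsymbol{\varepsilon}_t$

• The verification score is the log-likelihood ratio $\log p(\mathbf{x}_s, \mathbf{x}_t \mid H_0) - \log p(\mathbf{x}_s, \mathbf{x}_t \mid H_1)$.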
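Integrating out the speaker factor makes both hypotheses Gaussian in the stacked vector $[\mathbf{x}_s; \mathbf{x}_t]$: under H0 the shared $\mathbf{z}$ gives a cross-covariance of $\mathbf{V}\mathbf{V}^{\mathsf T}$, while under H1 the cross-covariance is zero. A minimal NumPy sketch of the resulting log-likelihood ratio follows; the function names are hypothetical, and the closed-form shortcuts usually used for fast scoring are deliberately omitted.

```python
import numpy as np


def gauss_logpdf(x, mean, cov):
    """log N(x | mean, cov) for a single vector x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def plda_llr(x_s, x_t, m, V, Sigma):
    """log p(x_s, x_t | H0) - log p(x_s, x_t | H1) for x = m + V z + eps."""
    B = V @ V.T                 # between-speaker covariance
    tot = B + Sigma             # total covariance of a single i-vector
    zero = np.zeros_like(B)
    mean = np.concatenate([m, m])
    x = np.concatenate([x_s, x_t])
    cov_same = np.block([[tot, B], [B, tot]])        # H0: shared speaker factor
    cov_diff = np.block([[tot, zero], [zero, tot]])  # H1: independent speaker factors
    return gauss_logpdf(x, mean, cov_same) - gauss_logpdf(x, mean, cov_diff)
```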

Conventional Noise Robust PLDA
• In conventional multi-condition training, we pool i-vectors from various background noise levels to train m, V and Σ.
[Figure: i-vectors from 2 SNR ranges are pooled and passed to the EM algorithm to estimate {m, V, Σ}.]
• Conventional i-vector/PLDA systems use a single channel space (with covariance Σ) to handle all SNR conditions.
[Figure: I-vector/PLDA scoring. Enrollment utterances are scored with the single model {m, V, Σ} to produce PLDA scores.]


SNR-Invariant PLDA
• We argue that the variation caused by SNR can be modeled by an SNR subspace, and that utterances falling within a narrow SNR range should share the same SNR factor (Li & Mak, Interspeech'15; Li & Mak, T-ASLP'15).
[Figure: SNR subspace. Three groups of i-vectors (Group 1, Group 2, Group 3), each associated with its own SNR factor (SNR factors 1, 2 and 3).]
• Method of modeling SNR information
[Figure: i-vectors of clean and 15 dB utterances in the i-vector space, with the SNR variation captured by the SNR subspace.]

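SNR-Invariant PLDA
• PLDA:

$\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \boldsymbol{\varepsilon}_{ij}$

• By adding an SNR factor to the conventional PLDA, we have SNR-invariant PLDA:

$\mathbf{x}_{ij}^{k} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k + \boldsymbol{\varepsilon}_{ij}^{k}$

where $\mathbf{U}$ denotes the SNR subspace, $\mathbf{w}_k$ is an SNR factor, and $\mathbf{h}_i$ is the speaker (identity) factor for speaker i.
• Note that this is not the same as PLDA with a channel subspace R:

$\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{R}\mathbf{r}_{ij} + \boldsymbol{\varepsilon}_{ij}$

In the channel-subspace model, every session has its own channel factor $\mathbf{r}_{ij}$, whereas in SNR-invariant PLDA all i-vectors in SNR group k share the same SNR factor $\mathbf{w}_k$.

i: speaker index; j: session index; k: SNR index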
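To illustrate how the SNR factor differs from a per-session channel factor, the sketch below draws i-vectors from the SNR-invariant PLDA generative model, assuming standard-normal priors on $\mathbf{h}_i$ and $\mathbf{w}_k$ (the usual PLDA convention). The function name and the per-session group assignment are hypothetical.

```python
import numpy as np


def sample_snr_invariant_plda(m, V, U, Sigma, n_speakers, n_sessions, snr_group, rng):
    """Draw i-vectors x_ij^k = m + V h_i + U w_k + eps_ij^k.

    snr_group: (n_speakers, n_sessions) integer SNR-group index k of each session.
    All sessions of speaker i share h_i; all sessions in SNR group k share w_k.
    """
    d = m.shape[0]
    n_groups = int(snr_group.max()) + 1
    h = rng.standard_normal((n_speakers, V.shape[1]))  # speaker factors h_i ~ N(0, I)
    w = rng.standard_normal((n_groups, U.shape[1]))    # SNR factors w_k ~ N(0, I)
    X = np.empty((n_speakers, n_sessions, d))
    for i in range(n_speakers):
        for j in range(n_sessions):
            k = snr_group[i, j]
            eps = rng.multivariate_normal(np.zeros(d), Sigma)  # residual noise
            X[i, j] = m + V @ h[i] + U @ w[k] + eps
    return X
```

With negligible residual noise, moving any speaker from SNR group 0 to group 1 shifts the i-vector by the same vector $\mathbf{U}(\mathbf{w}_1 - \mathbf{w}_0)$, which is exactly the sharing that a per-session $\mathbf{r}_{ij}$ would not impose.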

SNR-Invariant PLDA
• We separate i-vectors into different groups according to the SNR of their utterances.
[Figure: the SNR-grouped i-vectors are passed to the EM algorithm to estimate {m, V, U, Σ}.]
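The slides do not state how the group boundaries are chosen, so the following is only a sketch that assumes fixed, hypothetical SNR boundaries in dB and assigns each utterance to a group index k by binning its SNR.

```python
import numpy as np


def group_by_snr(snr_db, edges_db):
    """Map each utterance SNR (dB) to an SNR-group index 0..len(edges_db).

    edges_db are hypothetical group boundaries; e.g. [6.0, 15.0] gives three
    groups: below 6 dB, 6 dB up to (not including) 15 dB, and 15 dB or above.
    """
    return np.digitize(np.asarray(snr_db, dtype=float), edges_db)


groups = group_by_snr([3.0, 8.5, 20.0, 15.0], [6.0, 15.0])
print(groups)  # [0 1 2 2]
```

Any other partition (for example, quantiles of the training SNR distribution) would plug in the same way, since the model only needs the index k of each i-vector.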

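Compared with Conventional PLDA
[Figure: i-vectors modeled by conventional PLDA, $\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \boldsymbol{\varepsilon}_{ij}$, versus SNR-invariant PLDA.]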

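PLDA vs SNR-Invariant PLDA
• Generative model:
  PLDA: $\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \boldsymbol{\varepsilon}_{ij}$
  SNR-invariant PLDA: $\mathbf{x}_{ij}^{k} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k + \boldsymbol{\varepsilon}_{ij}^{k}$
• Marginal likelihood:
  PLDA: $p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \mathbf{m}, \mathbf{V}\mathbf{V}^{\mathsf T} + \boldsymbol{\Sigma})$
  SNR-invariant PLDA: $p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \mathbf{m}, \mathbf{V}\mathbf{V}^{\mathsf T} + \mathbf{U}\mathbf{U}^{\mathsf T} + \boldsymbol{\Sigma})$
• Parameters:
  PLDA: $\boldsymbol{\theta} = \{\mathbf{m}, \mathbf{V}, \boldsymbol{\Sigma}\}$
  SNR-invariant PLDA: $\boldsymbol{\theta} = \{\mathbf{m}, \mathbf{V}, \mathbf{U}, \boldsymbol{\Sigma}\}$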
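The two marginal likelihoods differ only by the extra $\mathbf{U}\mathbf{U}^{\mathsf T}$ term in the covariance. A minimal NumPy sketch (helper names are hypothetical) makes this explicit: setting U to zero recovers conventional PLDA.

```python
import numpy as np


def gauss_logpdf(x, mean, cov):
    """log N(x | mean, cov) for a single vector x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def plda_marginal(x, m, V, Sigma):
    """log p(x) = log N(x | m, V V^T + Sigma)."""
    return gauss_logpdf(x, m, V @ V.T + Sigma)


def snr_invariant_marginal(x, m, V, U, Sigma):
    """log p(x) = log N(x | m, V V^T + U U^T + Sigma)."""
    return gauss_logpdf(x, m, V @ V.T + U @ U.T + Sigma)
```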

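PLDA vs SNR-Invariant PLDA: E-Step
• PLDA:

$\langle \mathbf{h}_i \mid \mathcal{X} \rangle = \mathbf{L}_i^{-1} \mathbf{V}^{\mathsf T} \boldsymbol{\Sigma}^{-1} \sum_{j=1}^{H_i} (\mathbf{x}_{ij} - \mathbf{m})$

$\langle \mathbf{h}_i \mathbf{h}_i^{\mathsf T} \mid \mathcal{X} \rangle = \mathbf{L}_i^{-1} + \langle \mathbf{h}_i \mid \mathcal{X} \rangle \langle \mathbf{h}_i \mid \mathcal{X} \rangle^{\mathsf T}$

where $\mathbf{L}_i = \mathbf{I} + H_i \mathbf{V}^{\mathsf T} \boldsymbol{\Sigma}^{-1} \mathbf{V}$ and $H_i$ is the number of sessions of speaker i.
[SNR-invariant PLDA E-step equations not recoverable from the source.]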

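PLDA vs SNR-Invariant PLDA: M-Step
• PLDA:

$\mathbf{V} = \Big[\sum_{ij} (\mathbf{x}_{ij} - \mathbf{m}) \langle \mathbf{h}_i \mid \mathcal{X} \rangle^{\mathsf T}\Big] \Big[\sum_{ij} \langle \mathbf{h}_i \mathbf{h}_i^{\mathsf T} \mid \mathcal{X} \rangle\Big]^{-1}$

$\boldsymbol{\Sigma} = \frac{1}{N} \sum_{ij} \Big[(\mathbf{x}_{ij} - \mathbf{m})(\mathbf{x}_{ij} - \mathbf{m})^{\mathsf T} - \mathbf{V} \langle \mathbf{h}_i \mid \mathcal{X} \rangle (\mathbf{x}_{ij} - \mathbf{m})^{\mathsf T}\Big]$

where N is the total number of training i-vectors.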
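To make the E- and M-step equations concrete, here is a NumPy sketch of one EM iteration for conventional PLDA, together with the exact log-likelihood (speaker factors integrated out) so that convergence can be checked. It assumes m is held fixed at the mean of the training i-vectors and Σ is a full covariance matrix; the function names are hypothetical, and the SNR-invariant extension with U is not covered.

```python
import numpy as np


def plda_em_step(X_by_spk, m, V, Sigma):
    """One EM iteration for x_ij = m + V h_i + eps_ij (m held fixed).

    X_by_spk: list of (H_i, d) arrays, one per training speaker.
    """
    d, q = V.shape
    VtSinv = V.T @ np.linalg.inv(Sigma)
    sum_xh = np.zeros((d, q))   # sum_ij (x_ij - m) <h_i|X>^T
    sum_hh = np.zeros((q, q))   # sum_ij <h_i h_i^T|X>
    scatter = np.zeros((d, d))  # sum_ij (x_ij - m)(x_ij - m)^T
    n_total = 0
    for Xi in X_by_spk:
        Hi = Xi.shape[0]
        C = Xi - m
        Linv = np.linalg.inv(np.eye(q) + Hi * VtSinv @ V)  # L_i^{-1}
        h = Linv @ VtSinv @ C.sum(axis=0)                  # E-step: <h_i|X>
        hh = Linv + np.outer(h, h)                         # E-step: <h_i h_i^T|X>
        sum_xh += np.outer(C.sum(axis=0), h)
        sum_hh += Hi * hh
        scatter += C.T @ C
        n_total += Hi
    V_new = sum_xh @ np.linalg.inv(sum_hh)                 # M-step for V
    Sigma_new = (scatter - V_new @ sum_xh.T) / n_total     # M-step for Sigma
    return V_new, 0.5 * (Sigma_new + Sigma_new.T)          # symmetrize for numerical safety


def plda_loglik(X_by_spk, m, V, Sigma):
    """Exact log-likelihood with each speaker factor h_i integrated out."""
    B = V @ V.T
    total = 0.0
    for Xi in X_by_spk:
        Hi, d = Xi.shape
        cov = np.kron(np.eye(Hi), Sigma) + np.kron(np.ones((Hi, Hi)), B)
        x = (Xi - m).reshape(-1)
        _, logdet = np.linalg.slogdet(cov)
        total += -0.5 * (Hi * d * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))
    return total
```

Because the V update does not depend on Σ, updating V first and then Σ maximizes the auxiliary function jointly, so the log-likelihood should not decrease from one iteration to the next.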

SNR-Invariant PLDA Score
[Scoring equations not recoverable from the source.]


Mixture of PLDA (mPLDA)
• Conventional i-vector/PLDA systems use a single PLDA model to handle all SNR conditions.
[Figure: enrollment i-vectors are scored by a single PLDA model {m, V, Σ} to produce PLDA scores.]
• We argue that a PLDA model should focus on a small range of SNR.
[Figure: PLDA models 1, 2 and 3, each producing PLDA scores.]

Mixture of PLDA (mPLDA)
• The full spectrum of SNRs is handled by a mixture of PLDA in which the posteriors of the indicator variables depend on the utterance's SNR (Mak, Interspeech'14; Mak et al., T-ASLP'16).
[Figure: an SNR estimator guides the combination of PLDA models 1, 2 and 3 to produce the PLDA score.]

M.W. Mak, X.M. Pang and J.T. Chien, "Mixture of PLDA for Noise Robust I-Vector Speaker Verification", IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 24, no. 1, pp. 130-142, Jan. 2016.

Motivation of mPLDA
• The idea of mPLDA is based on two hypotheses:
1. Different levels of background noise cause the i-vectors to fall on different regions of the i-vector space.
2. SNR variability negatively affects PLDA speaker recognition accuracy, but its effect can be mitigated by explicitly modelling the SNR-dependent speaker subspaces through a mixture of PLDA.

Motivation of mPLDA
• To verify these two hypotheses, we corrupted 7,156 clean telephone utterances from 763 speakers with babble noise at 6 dB and 15 dB using the FaNT tool.
• This results in 3 sets of i-vectors: clean, 15 dB and 6 dB.
• Then, a three-component GMM is constructed: for each set (clean speech, 15 dB, 6 dB), we extract the i-vectors and compute their mean and covariance, which become the parameters of one mixture component with equal weight, $\{1/3, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}_{k=1}^{3}$.

Motivation of mPLDA
• We used the partition coefficient (PC) and the partition entropy (PE) to quantify the cluster separability of the three groups of i-vectors.
• PC → 1 and PE → 0 mean that the clusters are well separated.
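A NumPy sketch of this separability check, assuming the standard (Bezdek) definitions $\mathrm{PC} = \frac{1}{N}\sum_n\sum_k u_{nk}^2$ and $\mathrm{PE} = -\frac{1}{N}\sum_n\sum_k u_{nk}\log u_{nk}$, where $u_{nk}$ is the posterior of component k of the equal-weight GMM for i-vector n. Helper names are hypothetical.

```python
import numpy as np


def log_gauss_rows(X, mu, C):
    """Row-wise log N(x_n | mu, C) for an (N, d) array X."""
    diff = X - mu
    _, logdet = np.linalg.slogdet(C)
    maha = np.einsum("nd,nd->n", diff @ np.linalg.inv(C), diff)
    return -0.5 * (X.shape[1] * np.log(2 * np.pi) + logdet + maha)


def equal_weight_gmm_posteriors(X, means, covs):
    """Posteriors u_nk of a K-component GMM whose weights are all 1/K."""
    logp = np.stack([log_gauss_rows(X, mu, C) for mu, C in zip(means, covs)], axis=1)
    logp -= logp.max(axis=1, keepdims=True)  # the common 1/K weight cancels out
    post = np.exp(logp)
    return post / post.sum(axis=1, keepdims=True)


def partition_coefficient(U):
    return float(np.mean(np.sum(U ** 2, axis=1)))


def partition_entropy(U):
    logU = np.log(np.clip(U, 1e-300, None))  # avoid log(0)
    return float(-np.mean(np.sum(U * logU, axis=1)))
```

For K = 3 completely overlapping clusters every posterior is 1/3, giving PC = 1/3 and PE = log 3, the opposite extreme from well-separated clusters.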

Motivation of mPLDA
• To verify the 2nd hypothesis, we perform speaker identification experiments under SNR-matched and SNR-mismatched conditions.
• There are 9 combinations of PLDA models and SNR groups, of which three are matched in training and test conditions and six are mismatched.
• The SID accuracy gradually decreases when the SNR of the training data progressively deviates from that of the test data.

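mPLDA: Model Parameters
• Model parameters: one set for modeling the SNR of utterances, including the mixture weights $\boldsymbol{\pi} = \{\pi_k\}_{k=1}^{K}$, and one set for modeling SNR-dependent i-vectors, including the speaker loading matrices $\mathcal{V} = \{\mathbf{V}_k\}_{k=1}^{K}$.

Graphical Model of mPLDA
[Figure: graphical model of mPLDA, where $\ell_{ij}$ is the SNR of the j-th utterance from the i-th speaker and $\mathbf{x}_{ij}$ is the i-vector of the j-th utterance from the i-th speaker.]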

Graphical Model: PLDA vs. mPLDA
[Figure: graphical models of PLDA and mPLDA; in mPLDA, $\ell_{ij}$ denotes the SNR of the j-th utterance from the i-th speaker.]

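Generative Model for mPLDA
• Each i-vector is generated by one of K SNR-dependent PLDA models (with loading matrix $\mathbf{V}_k$ and residual covariance $\boldsymbol{\Sigma}_k$), selected according to the SNR of the utterance, where the posterior probability of SNR is

$\gamma_{\ell_{ij}}(k) = \dfrac{\pi_k \, \mathcal{N}(\ell_{ij} \mid \mu_k, \sigma_k^2)}{\sum_{k'=1}^{K} \pi_{k'} \, \mathcal{N}(\ell_{ij} \mid \mu_{k'}, \sigma_{k'}^2)}$

and $\ell_{ij}$ is the SNR in dB.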
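The SNR posterior is that of a 1-D GMM over SNR values in dB. The sketch below computes it in the log domain for numerical stability; it assumes a GMM with weights $\pi_k$, means $\mu_k$ and standard deviations $\sigma_k$, and the function name and the three-component parameter values are hypothetical.

```python
import numpy as np


def snr_posteriors(snr_db, weights, means_db, stds_db):
    """gamma_k(l) = pi_k N(l | mu_k, sigma_k^2) / sum_k' pi_k' N(l | mu_k', sigma_k'^2)."""
    snr = np.atleast_1d(np.asarray(snr_db, dtype=float))[:, None]  # (N, 1)
    log_pdf = (-0.5 * ((snr - means_db) / stds_db) ** 2
               - np.log(stds_db) - 0.5 * np.log(2 * np.pi))         # (N, K)
    log_joint = np.log(weights) + log_pdf
    log_joint -= log_joint.max(axis=1, keepdims=True)               # stabilize exp
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


# Hypothetical 3-component SNR model; weights, means and stds (dB) are illustrative only.
gamma = snr_posteriors([4.0, 14.0],
                       np.array([1 / 3, 1 / 3, 1 / 3]),
                       np.array([6.0, 15.0, 30.0]),
                       np.array([2.0, 2.0, 5.0]))
```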

PLDA vs. mPLDA
[Table: generative models of PLDA and mixture of PLDA; equations not recoverable from the source.]

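EM: PLDA vs. mPLDA, Auxiliary Function
[Equations: the EM auxiliary function of mixture of PLDA, defined over the latent indicator variables, the SNR of training utterances and the latent speaker factors, with sums over speaker indexes, session indexes and the number of mixtures; symbols not recoverable from the source.]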

EM: PLDA vs. mPLDA

E-Step

EM: PLDA vs. mPLDA M-Step

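Likelihood-Ratio Scores of mPLDA
• Same-speaker likelihood: $p(\mathbf{x}_s, \mathbf{x}_t \mid \ell_s, \ell_t, \text{same speaker})$, where $\mathbf{x}_s$ and $\mathbf{x}_t$ are the i-vectors of the target and test speakers and $\ell_s$ and $\ell_t$ are the SNRs of the target and test utterances.
• Different-speaker likelihood: $p(\mathbf{x}_s, \mathbf{x}_t \mid \ell_s, \ell_t, \text{different speakers})$.
• Verification score = same-speaker likelihood / different-speaker likelihood.

For the full derivation, see http://bioinfo.eie.polyu.edu.hk/mPLDA/SuppMaterials.pdf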
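The slides defer the derivation to the supplementary material, so the sketch below rests on explicit assumptions rather than reproducing it: each i-vector is generated by SNR-dependent PLDA k with probability equal to its SNR posterior $\gamma_\ell(k)$; all components share the speaker factor z (so every $\mathbf{V}_k$ has the same number of columns); and two i-vectors of the same speaker may come from different components k and k'. Under these assumptions both likelihoods are mixtures of Gaussians, and the score is their log ratio. All function names are hypothetical.

```python
import numpy as np


def gauss_logpdf(x, mean, cov):
    """log N(x | mean, cov) for a single vector x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def logsumexp(values):
    a = np.asarray(values, dtype=float)
    top = a.max()
    return float(top + np.log(np.exp(a - top).sum()))


def mplda_llr(x_s, x_t, g_s, g_t, m, V, Sigma):
    """Log-likelihood ratio under the stated mPLDA assumptions.

    g_s, g_t: SNR posteriors gamma(k) of the target and test utterances (all > 0).
    m, V, Sigma: per-component means, loading matrices and residual covariances.
    """
    K = len(m)
    x = np.concatenate([x_s, x_t])
    same, diff_s, diff_t = [], [], []
    for k in range(K):
        tot_k = V[k] @ V[k].T + Sigma[k]
        diff_s.append(np.log(g_s[k]) + gauss_logpdf(x_s, m[k], tot_k))
        diff_t.append(np.log(g_t[k]) + gauss_logpdf(x_t, m[k], tot_k))
        for kp in range(K):
            tot_kp = V[kp] @ V[kp].T + Sigma[kp]
            cross = V[k] @ V[kp].T                 # shared speaker factor couples x_s and x_t
            cov = np.block([[tot_k, cross], [cross.T, tot_kp]])
            mean = np.concatenate([m[k], m[kp]])
            same.append(np.log(g_s[k]) + np.log(g_t[kp]) + gauss_logpdf(x, mean, cov))
    # Same-speaker mixture minus the product of the two different-speaker marginals.
    return logsumexp(same) - (logsumexp(diff_s) + logsumexp(diff_t))
```

With K = 1 and unit posteriors, the expression reduces to the ordinary PLDA log-likelihood ratio, which is a convenient consistency check.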

Complexity Analysis
[Table: complexity analysis in terms of the dimension of i-vectors; details not recoverable from the source.]

Types of mPLDA
• The mixture of PLDA models can be of two types:
1. SNR-independent mPLDA (SI-mPLDA)
2. SNR-dependent mPLDA (SD-mPLDA)
• SNR-independent mPLDA is the supervised version of Hinton's mixture of factor analyzers, where the supervision comes from the speaker labels.
• It is equivalent to clustering in the i-vector space, with the subspaces $\mathbf{V}_k$ of the clusters determined by PLDA.
• It receives no guidance from SNR information.

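SI-mPLDA vs. SD-mPLDA
• SNR-independent mPLDA: the mixture weights are independent of the SNR of utterances:

$p(\mathbf{x}) = \sum_{k=1}^{K} \rho_k \, \mathcal{N}(\mathbf{x} \mid \mathbf{m}_k, \mathbf{V}_k\mathbf{V}_k^{\mathsf T} + \boldsymbol{\Sigma}_k)$

• SNR-dependent mPLDA: the mixture weights are replaced by the posterior probabilities of the utterance's SNR $\ell$, obtained from a 1-D GMM:

$p(\mathbf{x} \mid \ell) = \sum_{k=1}^{K} \gamma_{\ell}(k) \, \mathcal{N}(\mathbf{x} \mid \mathbf{m}_k, \mathbf{V}_k\mathbf{V}_k^{\mathsf T} + \boldsymbol{\Sigma}_k)$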
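The two variants share the same mixture density and differ only in where the weights come from. In the sketch below (helper names hypothetical), SI-mPLDA would pass fixed weights $\rho_k$, identical for every utterance, while SD-mPLDA would pass the SNR posteriors $\gamma_\ell(k)$ of the particular utterance.

```python
import numpy as np


def gauss_logpdf(x, mean, cov):
    """log N(x | mean, cov) for a single vector x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def mixture_log_density(x, weights, m, V, Sigma):
    """log sum_k weights[k] N(x | m_k, V_k V_k^T + Sigma_k).

    SI-mPLDA: weights = fixed rho_k (same for all utterances).
    SD-mPLDA: weights = gamma_l(k), the 1-D GMM posterior of the utterance's SNR l.
    """
    terms = [np.log(w) + gauss_logpdf(x, m_k, V_k @ V_k.T + S_k)
             for w, m_k, V_k, S_k in zip(weights, m, V, Sigma)]
    top = max(terms)
    return top + np.log(sum(np.exp(t - top) for t in terms))
```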

Cluster Alignment in mPLDA
[Figure: alignment of i-vectors to mixture components in SNR-independent mPLDA and SNR-dependent mPLDA.]
• In SD-mPLDA, i-vectors that are aligned to the same mixture component have similar SNR.

SNR-Dependent vs. SNR-Independent
[Figure: performance of SNR-independent mPLDA and SNR-dependent mPLDA on CC4 of NIST SRE12 (male).]


Data and Features
• Evaluation dataset: common evaluation conditions 1 and 4 (CC1 and CC4) of the NIST SRE 2012 core set.
• Parameterization: 19 MFCCs together with energy, plus their 1st and 2nd derivatives → 60-dim
• UBM: gender-dependent, 1024 mixtures
• Total variability matrix: gender-dependent, 500 total factors
• I-vector preprocessing:
  – Whitening by WCCN, then length normalization
  – For SI-PLDA, followed by NFA (500-dim → 200-dim) + WCCN
  – For mPLDA, followed by LDA (500-dim → 200-dim) + WCCN
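The dimensions in these settings can be checked directly: 19 MFCCs plus energy give 20 static features, and appending 1st and 2nd derivatives gives 60 dimensions, so a 1024-mixture UBM yields a 1024 × 60 = 61440-dimensional supervector, matching the 61440×500 total variability matrix. The sketch also shows whitening followed by length normalization, assuming the whitening matrix B is already available (for example, from WCCN); the function and constant names are hypothetical.

```python
import numpy as np

N_MFCC, N_ENERGY, N_MIX = 19, 1, 1024
FEAT_DIM = (N_MFCC + N_ENERGY) * 3    # static + 1st + 2nd derivatives = 60
SUPERVECTOR_DIM = N_MIX * FEAT_DIM    # 61440, the number of rows of T


def whiten_and_length_normalize(X, mean, B):
    """Center i-vectors, whiten with B (x -> B^T (x - mean)), then scale to unit length.

    X: (N, d) i-vectors; B: (d, d') whitening matrix, assumed precomputed.
    """
    Y = (X - mean) @ B
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)
```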

Distribution of SNR in SRE12
[Figure: distribution of SNR in SRE12.]
• Each SNR region is handled by a specific set of SNR factors.

Finding SNR Groups
[Figure: SNR groups found from the training utterances.]

SNR Distributions
• SNR distribution of training and test utterances in CC4
[Figure: SNR distributions of the test utterances and the training utterances.]

Performance on SRE12

Method               K   Q    Male EER(%)  Male minDCF  Female EER(%)  Female minDCF
PLDA                 -   -    5.42         0.371        7.53           0.531
SD-mPLDA             -   -    5.28         0.415        7.70           0.539
SNR-invariant PLDA   3   40   5.42         0.382        6.93           0.528
SNR-invariant PLDA   5   40   5.28         0.381        6.89           0.522
SNR-invariant PLDA   6   40   5.29         0.388        6.90           0.536
SNR-invariant PLDA   8   30   5.56         0.384        7.05           0.545

K: No. of SNR groups; Q: No. of SNR factors (dim of $\mathbf{w}_k$)

Method               K   Q    Male EER(%)  Male minDCF  Female EER(%)  Female minDCF
PLDA                 -   -    2.40         0.332        2.19           0.335
SD-mPLDA             -   -    2.47         0.283        2.07           0.328
SNR-invariant PLDA   3   40   1.96         0.277        1.74           0.290
SNR-invariant PLDA   6   40   1.99         0.278        1.72           0.290

K: No. of SNR groups; Q: No. of SNR factors (dim of $\mathbf{w}_k$)

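Method               K   Q    Male EER(%)  Male minDCF  Female EER(%)  Female minDCF
PLDA                 -   -    3.13         0.312        2.82           0.341
SD-mPLDA             -   -    2.88         0.329        2.71           0.332
SNR-invariant PLDA   3   40   2.72         0.289        2.36           0.314
SNR-invariant PLDA   5   40   2.67         0.291        2.38           0.322
SNR-invariant PLDA   6   40   2.63         0.287        2.43           0.319
SNR-invariant PLDA   8   30   2.70         0.292        2.29           0.313

K: No. of SNR groups; Q: No. of SNR factors (dim of $\mathbf{w}_k$)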

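Method               K   Q    Male EER(%)  Male minDCF  Female EER(%)  Female minDCF
PLDA                 -   -    2.86         0.286        2.47           0.343
SD-mPLDA             -   -    2.86         0.295        2.59           0.332
SNR-invariant PLDA   3   40   2.47         0.273        2.07           0.294
SNR-invariant PLDA   6   40   2.48         0.275        2.04           0.294

K: No. of SNR groups; Q: No. of SNR factors (dim of $\mathbf{w}_k$)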

[Figure: CC4, female. Conventional PLDA vs. SNR-invariant PLDA.]

Conclusions
• We show that while i-vectors of different SNR fall on different regions of the i-vector space, they vary within a single cluster in an SNR subspace.
• Therefore, it is possible to model the SNR variability by adding an SNR loading matrix and SNR factors to the conventional PLDA model.
• We also show that i-vectors derived from utterances of different SNR live in different speaker subspaces.
• Therefore, it is possible to model SNR variability by a mixture of SNR-dependent PLDA.

Bibliography
1. M.W. Mak, X.M. Pang and J.T. Chien, "Mixture of PLDA for Noise Robust I-Vector Speaker Verification", IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 24, no. 1, pp. 130-142, Jan. 2016.
2. N. Li and M.W. Mak, "SNR-Invariant PLDA Modeling in Nonparametric Subspace for Robust Speaker Verification", IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 23, no. 10, pp. 1648-1659, Oct. 2015.
3. W. Rao and M.W. Mak, "Boosting the Performance of I-Vector Based Speaker Verification via Utterance Partitioning", IEEE Trans. on Audio, Speech and Language Processing, vol. 21, no. 5, pp. 1012-1022, May 2013.
4. N. Li and M.W. Mak, "SNR-Invariant PLDA with Multiple Speaker Subspaces", ICASSP'16, March 2016.
5. X.M. Pang and M.W. Mak, "Noise Robust Speaker Verification via the Fusion of SNR-Independent and SNR-Dependent PLDA", International Journal of Speech Technology, Oct. 2015.
6. M.W. Mak, "Fast Scoring for Mixture of PLDA in I-Vector/PLDA Speaker Verification", Proc. APSIPA'15, pp. 587-593, Dec. 2015, Hong Kong.
7. M.W. Mak and H.B. Yu, "A Study of Voice Activity Detection Techniques for NIST Speaker Recognition Evaluations", Computer Speech & Language, vol. 28, no. 1, pp. 295-313, Jan. 2014.
8. N. Li and M.W. Mak, "SNR-Invariant PLDA Modeling for Robust Speaker Verification", Interspeech'15, pp. 2317-2321, Sept. 2015, Dresden, Germany.
9. P. Kenny, "Bayesian speaker verification with heavy-tailed priors", Proc. of Odyssey: Speaker and Language Recognition Workshop, Brno, Czech Republic, June 2010.
10. N. Dehak, P. Kenny, R. Dehak, P. Dumouchel and P. Ouellet, "Front-end factor analysis for speaker verification", IEEE Trans. on Audio, Speech and Language Processing, vol. 19, no. 4, pp. 788-798, May 2011.

Acknowledgment
Xiaomin Pang, Zhili Tan, Shibiao Wan, Wei Rao, Na Li
