SNR-Aware PLDA Modeling for Robust Speaker Verification

Department of Electronic and Information Engineering, The Hong Kong Polytechnic University

Talk given at the SYSU-CMU Joint Research Institute (Sun Yat-sen University-Carnegie Mellon University International Joint Research Institute), Shunde, Guangdong

28 Dec. 2015

Man-Wai MAK [email protected]

http://www.eie.polyu.edu.hk/~mwmak

http://www.eie.polyu.edu.hk/~mwmak/papers/SYSU-CMU-2015.pdf

Contents

1. I-Vector/PLDA for Speaker Verification
2. SNR-Aware PLDA Modeling
   – SNR-Invariant PLDA
   – Mixture of PLDA
3. Experiments on SRE12
4. Conclusions

I-Vectors for Speaker Verification

• State-of-the-art method for speaker verification
• Factor analysis model:

$$\vec{\mu}_s = \vec{\mu} + \mathbf{T}\mathbf{x}_s$$

where $\vec{\mu}$ is the UBM supervector, $\mathbf{T}$ is the low-rank total variability matrix (61440 × 500), and $\mathbf{x}_s$ is the speaker-dependent i-vector.

• Instead of using the high-dimensional supervector $\vec{\mu}_s$ to represent speaker s, we use the low-dimensional (typically 500) i-vector $\mathbf{x}_s$.
• $\mathbf{T}$ is estimated by an EM algorithm using the utterances of many speakers. $\mathbf{T}$ represents the subspace in which the i-vectors vary.
• Given $\mathbf{T}$, estimate $\mathbf{x}_s$ for each target speaker and $\mathbf{x}_t$ for each test utterance.

I-Vectors for Speaker Verification

• Given an utterance, we align its acoustic vectors against a UBM to obtain the sufficient statistics $\mathbf{N}_i$ and $\tilde{\mathbf{f}}_i$.
• The i-vector of the utterance is the posterior mean of the latent factor of the factor analysis model:

$$\langle \mathbf{x}_i \mid \mathcal{O}_i \rangle = \mathbf{L}_i^{-1}\mathbf{T}^{\mathsf{T}}\left(\boldsymbol{\Sigma}^{(b)}\right)^{-1}\tilde{\mathbf{f}}_i$$

$$\mathbf{L}_i^{-1} = \mathrm{cov}(\mathbf{x}_i,\mathbf{x}_i \mid \mathcal{O}_i) = \left(\mathbf{I} + \mathbf{T}^{\mathsf{T}}\left(\boldsymbol{\Sigma}^{(b)}\right)^{-1}\mathbf{N}_i\mathbf{T}\right)^{-1}$$

[Figure: alignment of the utterance's acoustic vectors against the UBM]

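I-Vectors for Speaker Verification

Aligning the acoustic vectors $\mathbf{o}_t$ with the UBM (M mixtures) gives the statistics

$$\mathbf{N}_i = \begin{bmatrix} n_{i,1}\mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & n_{i,2}\mathbf{I} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & n_{i,M}\mathbf{I} \end{bmatrix}, \qquad \tilde{\mathbf{f}}_i = \begin{bmatrix} \tilde{\mathbf{f}}_{i,1} \\ \vdots \\ \tilde{\mathbf{f}}_{i,M} \end{bmatrix}$$

which enter the i-vector formulas:

$$\langle \mathbf{x}_i \mid \mathcal{O}_i \rangle = \mathbf{L}_i^{-1}\mathbf{T}^{\mathsf{T}}\left(\boldsymbol{\Sigma}^{(b)}\right)^{-1}\tilde{\mathbf{f}}_i$$

$$\mathbf{L}_i^{-1} = \mathrm{cov}(\mathbf{x}_i,\mathbf{x}_i \mid \mathcal{O}_i) = \left(\mathbf{I} + \mathbf{T}^{\mathsf{T}}\left(\boldsymbol{\Sigma}^{(b)}\right)^{-1}\mathbf{N}_i\mathbf{T}\right)^{-1}$$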
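The posterior-mean formula can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the UBM covariance Σ^(b) is taken to be diagonal, and the function name `extract_ivector` and its argument layout are hypothetical, not taken from the slides.

```python
import numpy as np

def extract_ivector(T, sigma_diag, n, f_tilde):
    """Posterior mean (i-vector) and posterior covariance of the latent factor.

    T          : (M*F, R) total variability matrix
    sigma_diag : (M*F,)   diagonal of Sigma^(b) (assumed diagonal in this sketch)
    n          : (M,)     occupancy counts n_{i,1}, ..., n_{i,M}
    f_tilde    : (M*F,)   stacked first-order statistics
    """
    MF, R = T.shape
    F = MF // n.shape[0]                      # acoustic feature dimension
    n_diag = np.repeat(n, F)                  # diagonal of the block matrix N_i
    Tt_Sinv = T.T / sigma_diag                # T^T (Sigma^(b))^{-1}
    L = np.eye(R) + (Tt_Sinv * n_diag) @ T    # posterior precision L_i
    cov = np.linalg.inv(L)                    # L_i^{-1}
    ivector = cov @ (Tt_Sinv @ f_tilde)       # L_i^{-1} T^T Sigma^{-1} f_tilde
    return ivector, cov
```

With the dimensions quoted on the slides, `T` would be 61440 × 500, so only the 500 × 500 matrix `L` is ever inverted.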

I-Vectors for Speaker Verification

[Figure: i-vector verification system. Training data are used to train the UBM and the total variability matrix T. The i-vector extractor turns the utterance from target speaker s and the test utterance t into x_s and x_t; LDA+WCCN projects them to W^T x_s and W^T x_t; the scoring method feeds a decision maker that accepts if the score is ≥ θ and rejects if it is < θ.]

• Given an utterance from speaker s and a total variability matrix T, we estimate his/her i-vector x_s.
• Because T defines the combined space describing both speaker variability and channel variability, we use LDA+WCCN to remove channel variability.

I-Vectors for Speaker Verification

[Figure: i-vectors before LDA (x) and after LDA (W^T x). Each point represents an utterance. Each marker type represents a speaker.]

I-Vectors Scoring

• Given the i-vector of the target speaker and the i-vector of a test utterance, we compute the cosine-distance score:

$$S_{\mathrm{CD}}(\mathbf{x}_s,\mathbf{x}_t) = \frac{\langle \mathbf{W}^{\mathsf{T}}\mathbf{x}_s, \mathbf{W}^{\mathsf{T}}\mathbf{x}_t \rangle}{\lVert \mathbf{W}^{\mathsf{T}}\mathbf{x}_s \rVert \, \lVert \mathbf{W}^{\mathsf{T}}\mathbf{x}_t \rVert}, \qquad S_{\mathrm{CD}}(\mathbf{x}_s,\mathbf{x}_t) \in [-1, 1]$$

• If the score is larger than a threshold θ, then we accept the speaker; otherwise we reject the speaker.
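A minimal sketch of cosine-distance scoring and the threshold decision. The function names are hypothetical, and `W` stands for whatever LDA+WCCN projection matrix has been trained.

```python
import numpy as np

def cosine_score(W, x_s, x_t):
    """Cosine of the angle between the projected i-vectors W^T x_s and W^T x_t."""
    a = W.T @ x_s
    b = W.T @ x_t
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def accept(score, theta):
    """Accept the claimed speaker when the score reaches the threshold theta."""
    return score >= theta
```

Because the score depends only on the angle between the projected vectors, rescaling either i-vector leaves the decision unchanged.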

Probabilistic LDA for SV

• PLDA is based on a generative model that uses pre-processed i-vectors as input.
• It aims to model the speaker and channel variability in the i-vector space.
• The method assumes that there is a speaker subspace V within the i-vector space.
• The i-vector x_s is written as:

$$\mathbf{x}_s = \mathbf{m} + \mathbf{V}\mathbf{z}_s + \boldsymbol{\epsilon}_s$$

where $\mathbf{x}_s$ is the i-vector extracted from the utterance of speaker s, $\mathbf{m}$ is the global mean of all i-vectors, $\mathbf{V}$ defines the speaker subspace, $\mathbf{z}_s$ is the speaker factor, and $\boldsymbol{\epsilon}_s$ is residual noise with covariance $\boldsymbol{\Sigma}$.
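To make the generative reading concrete, the sketch below draws several sessions of one speaker from the model. It assumes the usual standard-normal prior on the speaker factor, which the slide does not state, and the function name `sample_plda` is hypothetical.

```python
import numpy as np

def sample_plda(m, V, Sigma, n_sessions, rng):
    """Draw n_sessions i-vectors of a single speaker from x = m + V z + eps."""
    z = rng.standard_normal(V.shape[1])                    # speaker factor, assumed z ~ N(0, I)
    eps = rng.multivariate_normal(np.zeros(m.shape[0]), Sigma, size=n_sessions)
    X = m + V @ z + eps                                    # every session shares m + V z
    return X, z, eps
```

All sessions of a speaker share the same term m + Vz; they differ only in the residual.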

Probabilistic LDA for SV

• Similarly, the i-vector x_t from a test utterance is written as:

$$\mathbf{x}_t = \mathbf{m} + \mathbf{V}\mathbf{z}_t + \boldsymbol{\epsilon}_t$$

• Intuitively, you may think of z_s and z_t as projected vectors on the speaker subspace defined by the eigenvectors in V.
• But unlike PCA, given an i-vector x_t, there are infinitely many possible z_t. So we need to consider the joint density of x_t and z_t, and integrate z_t out, when computing the likelihood of x_t.

PLDA Scoring

H0 (same speaker):

$$\mathbf{x}_s = \mathbf{m} + \mathbf{V}\mathbf{z} + \boldsymbol{\epsilon}_s, \qquad \mathbf{x}_t = \mathbf{m} + \mathbf{V}\mathbf{z} + \boldsymbol{\epsilon}_t$$

against

H1 (different speakers):

$$\mathbf{x}_s = \mathbf{m} + \mathbf{V}\mathbf{z}_s + \boldsymbol{\epsilon}_s, \qquad \mathbf{x}_t = \mathbf{m} + \mathbf{V}\mathbf{z}_t + \boldsymbol{\epsilon}_t$$
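One common way to turn the two hypotheses into a score is a log-likelihood ratio over the stacked pair (x_s, x_t). The sketch below assumes a standard-normal prior on the speaker factors, so the pair is jointly Gaussian with cross-covariance VV^T under H0 and zero cross-covariance under H1. The helper names are hypothetical, and the slides do not show the exact scoring formula that was used.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + d.size * np.log(2 * np.pi))

def plda_llr(x_s, x_t, m, V, Sigma):
    """log p(x_s, x_t | same speaker) - log p(x_s, x_t | different speakers)."""
    S_ac = V @ V.T                      # across-utterance covariance when z is shared
    S_tot = S_ac + Sigma                # total covariance of a single i-vector
    zero = np.zeros_like(S_ac)
    x = np.concatenate([x_s, x_t])
    mean = np.concatenate([m, m])
    cov_same = np.block([[S_tot, S_ac], [S_ac, S_tot]])
    cov_diff = np.block([[S_tot, zero], [zero, S_tot]])
    return log_gauss(x, mean, cov_same) - log_gauss(x, mean, cov_diff)
```

A positive score favours the same-speaker hypothesis. The dense 2D × 2D covariance is used here for clarity, not efficiency.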

Conventional Noise Robust PLDA

• In conventional multi-condition training, we pool i-vectors from various background noise levels to train m, V and Σ.

[Figure: i-vectors with 2 SNR ranges → EM algorithm → {m, V, Σ}]

Conventional Noise Robust PLDA

• Conventional i-vector/PLDA systems use a channel space (with covariance Σ) to handle all SNR conditions.

[Figure: enrollment utterances → i-vector/PLDA scoring with {m, V, Σ} → PLDA scores]

SNR-Invariant PLDA

• We argue that the variation caused by SNR variability can be modeled by an SNR subspace, and that utterances falling within a narrow SNR range should share the same SNR factor (Li & Mak, Interspeech'15; Li & Mak, T-ASLP'15).

[Figure: an SNR subspace in which Groups 1, 2 and 3 are associated with SNR Factors 1, 2 and 3]

SNR-Invariant PLDA

• Method of modeling SNR information

[Figure: i-vectors of 6 dB, 15 dB and clean utterances in the i-vector space map to the SNR factors w_6dB, w_15dB and w_cln in the SNR subspace]

SNR-Invariant PLDA

• PLDA:

$$\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \boldsymbol{\epsilon}_{ij}$$

• By adding an SNR factor to the conventional PLDA, we have SNR-invariant PLDA:

$$\mathbf{x}_{ij}^{k} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k + \boldsymbol{\epsilon}_{ij}^{k}$$

where $\mathbf{U}$ denotes the SNR subspace, $\mathbf{w}_k$ is an SNR factor, and $\mathbf{h}_i$ is the speaker (identity) factor for speaker i. Here i is the speaker index, j is the session index, and k is the SNR index.

• Note that it is not the same as PLDA with channel subspace R:

$$\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{R}\mathbf{r}_{ij} + \boldsymbol{\epsilon}_{ij}$$

The channel factor $\mathbf{r}_{ij}$ differs for every session, whereas the SNR factor $\mathbf{w}_k$ is shared by all utterances in SNR group k.

SNR-Invariant PLDA

• We separate i-vectors into different groups according to the SNR of their utterances.

$$\mathbf{x}_{ij}^{k} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k + \boldsymbol{\epsilon}_{ij}^{k}$$

[Figure: SNR-grouped i-vectors → EM algorithm → {m, V, U, Σ}]
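The grouping step could look like the sketch below. The rule used here, equal-population quantile boundaries on the measured SNR, is an assumption for illustration only, and `group_by_snr` is a hypothetical helper. The slides do not say how the group boundaries were chosen.

```python
import numpy as np

def group_by_snr(snr_db, n_groups):
    """Return the SNR-group index k (0 .. n_groups-1) of every utterance."""
    # Interior quantiles serve as group boundaries (an illustrative choice).
    edges = np.quantile(snr_db, np.linspace(0, 1, n_groups + 1)[1:-1])
    return np.digitize(snr_db, edges)
```

The returned index k selects which SNR factor w_k an i-vector shares with the rest of its group during training.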

Compared with Conventional PLDA

Conventional PLDA:

$$\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \boldsymbol{\epsilon}_{ij}$$

SNR-Invariant PLDA:

$$\mathbf{x}_{ij}^{k} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k + \boldsymbol{\epsilon}_{ij}^{k}$$

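PLDA vs. SNR-Invariant PLDA: Generative Model

PLDA:

$$\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \boldsymbol{\epsilon}_{ij}, \qquad p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \mathbf{m}, \mathbf{V}\mathbf{V}^{\mathsf{T}} + \boldsymbol{\Sigma}), \qquad \boldsymbol{\theta} = \{\mathbf{m}, \mathbf{V}, \boldsymbol{\Sigma}\}$$

SNR-invariant PLDA:

$$\mathbf{x}_{ij}^{k} = \mathbf{m} + \mathbf{V}\mathbf{h}_i + \mathbf{U}\mathbf{w}_k + \boldsymbol{\epsilon}_{ij}^{k}, \qquad p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \mathbf{m}, \mathbf{V}\mathbf{V}^{\mathsf{T}} + \mathbf{U}\mathbf{U}^{\mathsf{T}} + \boldsymbol{\Sigma}), \qquad \boldsymbol{\theta} = \{\mathbf{m}, \mathbf{V}, \mathbf{U}, \boldsymbol{\Sigma}\}$$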
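The two marginal densities differ only by the extra UU^T term in the covariance. A small sketch that evaluates either one (the function name `log_marginal` is hypothetical):

```python
import numpy as np

def log_marginal(x, m, V, Sigma, U=None):
    """log N(x | m, V V^T + Sigma), with U U^T added when an SNR subspace is given."""
    cov = V @ V.T + Sigma
    if U is not None:
        cov = cov + U @ U.T            # extra term of SNR-invariant PLDA
    d = x - m
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + d.size * np.log(2 * np.pi))
```

Calling it without `U` gives the conventional PLDA marginal, so the same routine can be used to compare the two models on held-out i-vectors.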

PLDA vs. SNR-Invariant PLDA: E-Step

PLDA:

$$\langle \mathbf{h}_i \mid \mathcal{X} \rangle = \mathbf{L}_i^{-1}\mathbf{V}^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\sum_{j=1}^{H_i}(\mathbf{x}_{ij} - \mathbf{m})$$

$$\langle \mathbf{h}_i\mathbf{h}_i^{\mathsf{T}} \mid \mathcal{X} \rangle = \mathbf{L}_i^{-1} + \langle \mathbf{h}_i \mid \mathcal{X} \rangle \langle \mathbf{h}_i \mid \mathcal{X} \rangle^{\mathsf{T}}$$

where $H_i$ is the number of sessions of speaker i and $\mathbf{L}_i$ is the posterior precision matrix of $\mathbf{h}_i$.

[Equations: E-step of SNR-invariant PLDA]

PLDA   SNR-­‐invariant  PLDA  

22  

PLDA  versus  SNR-­‐invariant  PLDA M-Step

1( ) | |T Tij i i iij ij

X X−

⎡ ⎤ ⎡ ⎤= − ⎣ ⎦⎣ ⎦∑ ∑V x m h h h

( )( ) | ( )T Tij ij i ijij

ii

X

H

⎡ ⎤− − − −⎣ ⎦=∑

∑x m x m V h x m

Σ
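The E- and M-step formulas for conventional PLDA can be combined into one EM iteration, sketched below. Two points are assumptions rather than content of the slides: the posterior precision is taken to be L_i = I + H_i V^T Σ^{-1} V (the standard Gaussian PLDA expression), and m is held fixed at the global mean. The function name and the data layout (one array of sessions per speaker) are hypothetical.

```python
import numpy as np

def plda_em_step(X_by_spk, m, V, Sigma):
    """One EM iteration of Gaussian PLDA; X_by_spk is a list of (H_i, D) arrays."""
    D, Q = V.shape
    Sinv_V = np.linalg.solve(Sigma, V)            # Sigma^{-1} V
    num_V = np.zeros((D, Q))                      # sum_ij (x_ij - m) <h_i>^T
    den_V = np.zeros((Q, Q))                      # sum_ij <h_i h_i^T>
    S_xx = np.zeros((D, D))                       # sum_ij (x_ij - m)(x_ij - m)^T
    n_total = 0
    for X in X_by_spk:
        H = X.shape[0]
        Xc = X - m
        s = Xc.sum(axis=0)                        # sum_j (x_ij - m)
        L_inv = np.linalg.inv(np.eye(Q) + H * (V.T @ Sinv_V))
        h = L_inv @ (Sinv_V.T @ s)                # E-step: <h_i | X>
        hh = L_inv + np.outer(h, h)               # E-step: <h_i h_i^T | X>
        num_V += np.outer(s, h)
        den_V += H * hh
        S_xx += Xc.T @ Xc
        n_total += H
    V_new = num_V @ np.linalg.inv(den_V)          # M-step for V
    Sigma_new = (S_xx - V_new @ num_V.T) / n_total  # M-step for Sigma
    return V_new, Sigma_new
```

In practice the step would be repeated until the likelihood stops improving. The SNR-invariant variant additionally updates U and the posteriors of w_k, which this sketch does not cover.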

SNR-Invariant PLDA Score

[Equation: SNR-invariant PLDA score]

Mixture of PLDA (mPLDA)

• Conventional i-vector/PLDA systems use a single PLDA model to handle all SNR conditions.

[Figure: enrollment i-vectors → PLDA model {m, V, Σ} → PLDA scores]

Mixture of PLDA (mPLDA)

• We argue that a PLDA model should focus on a small range of SNR.

[Figure: PLDA Models 1, 2 and 3, each producing its own PLDA score]

Mixture of PLDA (mPLDA)

• The full spectrum of SNRs is handled by a mixture of PLDA in which the posteriors of the indicator variables depend on the utterance's SNR (Mak, Interspeech'14; Mak et al., T-ASLP'16).

[Figure: an SNR estimator feeds an SNR posterior estimator, whose outputs combine PLDA Models 1, 2 and 3 into a single PLDA score]

M.W. Mak, X.M. Pang and J.T. Chien, "Mixture of PLDA for Noise Robust I-Vector Speaker Verification", IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 24, no. 1, pp. 130-142, Jan. 2016.

Motivation of mPLDA

• The idea of mPLDA is based on two hypotheses:
  1. Different levels of background noise will cause the i-vectors to fall in different regions of the i-vector space.
  2. SNR variability negatively affects PLDA speaker recognition accuracy, but its effect can be mitigated by explicitly modelling the SNR-dependent speaker subspaces through a mixture of PLDA.

Motivation of mPLDA

• To verify these two hypotheses, we corrupted 7,156 clean telephone utterances from 763 speakers with babble noise at 6 dB and 15 dB using the FaNT tool.
• This results in 3 sets of i-vectors: clean, 15 dB, and 6 dB.
• Then, a GMM is constructed from the three sets.

[Figure: clean speech and its 6 dB and 15 dB versions produced by FaNT each pass through i-vector extraction and mean/covariance computation; the three mean/covariance pairs form a 3-component GMM with equal mixture weights of 1/3]

Motivation of mPLDA

• We used partition coefficients (PC) and partition entropy coefficients (PE) to quantify the cluster separability of the three groups of i-vectors.
• PC → 1 and PE → 0 mean that the clusters are well separated.
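The separability check can be sketched as follows: fit one Gaussian per i-vector set, use the equal-weight GMM posteriors as soft memberships, and summarise them with PC and PE. The definitions used are the usual fuzzy-clustering ones, PC = (1/N) Σ_n Σ_k u_nk² and PE = −(1/N) Σ_n Σ_k u_nk log u_nk. Whether the slides use exactly this normalisation is an assumption, and all function names are hypothetical.

```python
import numpy as np

def fit_equal_weight_gmm(groups):
    """One Gaussian (mean, covariance) per i-vector set; weights are all 1/K."""
    means = [g.mean(axis=0) for g in groups]
    covs = [np.cov(g, rowvar=False) for g in groups]
    return means, covs

def memberships(X, means, covs):
    """Posterior probability of every component for each row of X (equal priors)."""
    log_p = []
    for mu, C in zip(means, covs):
        d = X - mu
        _, logdet = np.linalg.slogdet(C)
        maha = np.einsum("nd,nd->n", d, np.linalg.solve(C, d.T).T)
        log_p.append(-0.5 * (maha + logdet))      # shared constants cancel
    log_p = np.stack(log_p, axis=1)
    log_p -= log_p.max(axis=1, keepdims=True)     # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def partition_coefficient(U):
    """PC: equals 1 for crisp memberships and 1/K for uniform ones."""
    return float(np.mean(np.sum(U ** 2, axis=1)))

def partition_entropy(U, eps=1e-12):
    """PE: equals 0 for crisp memberships and log K for uniform ones."""
    return float(-np.mean(np.sum(U * np.log(U + eps), axis=1)))
```

Passing the pooled i-vectors of the clean, 15 dB and 6 dB sets through `memberships` and then through the two coefficients gives a single pair of numbers describing how distinct the three sets are.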

Motivation of mPLDA

• To verify the 2nd hypothesis, we performed speaker identification (SID) experiments under SNR-matched and SNR-mismatched conditions.
• There are 9 combinations of PLDA models and SNR groups, of which three are matched in training and test conditions and six are mismatched.
• The SID accuracy gradually decreases as the SNR of the training data deviates further from that of the test data.

mPLDA: Model Parameters

• Model parameters:

[Equation: mPLDA parameter set, with one group of parameters for modeling the SNR of utterances and the other for modeling SNR-dependent i-vectors]

Graphical Model of mPLDA

[Figure: graphical model of mPLDA, with one part for modeling the SNR of utterances and the other for modeling SNR-dependent i-vectors]

$\ell_{ij}$: SNR of the j-th utterance from the i-th speaker

$\mathbf{x}_{ij}$: i-vector of the j-th utterance from the i-th speaker

$\mathbf{V} = \{\mathbf{V}_k\}_{k=1}^{K}$, $\boldsymbol{\pi} = \{\pi_k\}_{k=1}^{K}$

Graphical Model: PLDA vs. mPLDA

[Figure: graphical models of PLDA and mPLDA side by side]

$\ell_{ij}$: SNR of the j-th utterance from the i-th speaker

Generative Model for mPLDA

[Equation: generative model of mPLDA, expressed in terms of the posterior probability of the SNR]

[Figure: posterior of SNR plotted against SNR in dB]

PLDA vs. mPLDA

[Table: generative models of PLDA and mixture of PLDA]

EM: PLDA vs. mPLDA (Auxiliary Function)

[Equations: auxiliary functions of PLDA and of mixture of PLDA. The annotations identify the latent indicator variables, the latent speaker factors, the SNR of training utterances, the speaker indexes, the session indexes, and the number of mixtures.]

EM: PLDA vs. mPLDA (E-Step)

[Equations: E-step of PLDA and of mixture of PLDA]

EM: PLDA vs. mPLDA (M-Step)

[Equations: M-step of PLDA and of mixture of PLDA]

Likelihood-Ratio Scores of mPLDA

• Same-speaker likelihood:

[Equation: same-speaker likelihood, a function of the i-vectors of the target and test speakers and of the SNR of the target and test utterances]

Likelihood-Ratio Scores of mPLDA

• Different-speaker likelihood:

[Equation: different-speaker likelihood]

• Verification score = (same-speaker likelihood) / (different-speaker likelihood)

For the full derivation, see http://bioinfo.eie.polyu.edu.hk/mPLDA/SuppMaterials.pdf

Complexity Analysis

[Table: complexity analysis, expressed in terms of the dimension of i-vectors]

Types of mPLDA

• The mixture of PLDA models can be of two types:
  1. SNR-independent mPLDA (SI-mPLDA)
  2. SNR-dependent mPLDA (SD-mPLDA)

Types of mPLDA

• SNR-independent mPLDA is the supervised version of Hinton's mixture of factor analyzers, where the supervision comes from the speaker labels.
• It is equivalent to clustering in the i-vector space, with the subspaces V_k of the clusters determined by PLDA.
• It receives no guidance from SNR information.

SI-mPLDA vs. SD-mPLDA

• SNR-independent mPLDA: the mixture weights $\rho_k$ are independent of the SNR of utterances.

$$p(\mathbf{x}) = \sum_{k=1}^{K}\rho_k\,\mathcal{N}(\mathbf{x} \mid \mathbf{m}_k, \mathbf{V}_k\mathbf{V}_k^{\mathsf{T}} + \boldsymbol{\Sigma}_k)$$

• SNR-dependent mPLDA: the mixture weights are the posterior probabilities of the SNR, obtained from a 1-D GMM.

[Equation: marginal density of SNR-dependent mPLDA]
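A sketch of how SNR posteriors could be computed from a 1-D GMM over the utterance SNR in dB. The function name is hypothetical and any GMM parameters passed in are illustrative. In SD-mPLDA these posteriors would play the role that the fixed weights ρ_k play in the SNR-independent model.

```python
import numpy as np

def snr_posteriors(snr_db, weights, means, stds):
    """Posterior probability of each 1-D GMM component given an utterance SNR.

    snr_db : scalar or (N,) SNR values in dB
    weights, means, stds : (K,) parameters of the 1-D GMM
    Returns an (N, K) array whose rows sum to one.
    """
    snr = np.atleast_1d(np.asarray(snr_db, dtype=float))[:, None]
    log_p = np.log(weights) - np.log(stds) - 0.5 * ((snr - means) / stds) ** 2
    log_p -= log_p.max(axis=1, keepdims=True)     # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)
```

An utterance whose SNR lies close to one component mean receives almost all of its weight from that component, so its score is dominated by the PLDA model trained for that SNR range.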

Cluster Alignment in mPLDA

[Figure: cluster alignment under SNR-independent mPLDA and under SNR-dependent mPLDA]

In SD-mPLDA, i-vectors that are aligned to the same mixture component have similar SNR.

SNR-Dependent vs. SNR-Independent

[Figure: performance on CC4 of NIST12 (male) for PLDA, SNR-independent mPLDA and SNR-dependent mPLDA]

Data and Features

• Evaluation dataset: Common evaluation conditions 1 and 4 of the NIST SRE 2012 core set.
• Parameterization: 19 MFCCs together with energy, plus their 1st and 2nd derivatives → 60-dim
• UBM: gender-dependent, 1024 mixtures
• Total variability matrix: gender-dependent, 500 total factors
• I-vector preprocessing:
  – Whitening by WCCN, then length normalization
  – For SNR-invariant PLDA (SI-PLDA), followed by NFA (500-dim → 200-dim) + WCCN
  – For mPLDA, followed by LDA (500-dim → 200-dim) + WCCN
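The first preprocessing stage, WCCN whitening followed by length normalization, can be sketched as below. The within-class covariance is estimated here as the average of per-speaker covariances and the projection comes from a Cholesky factor of its inverse; both are common choices but assumptions as far as these slides are concerned, and the function names are hypothetical.

```python
import numpy as np

def within_class_cov(X, labels):
    """Average of the per-speaker covariance matrices of the rows of X."""
    covs = []
    for c in np.unique(labels):
        Xc = X[labels == c]
        d = Xc - Xc.mean(axis=0)
        covs.append(d.T @ d / len(Xc))
    return np.mean(covs, axis=0)

def wccn_matrix(X, labels):
    """Matrix B with B B^T = W^{-1}, where W is the within-class covariance."""
    W = within_class_cov(X, labels)
    return np.linalg.cholesky(np.linalg.inv(W))

def length_normalize(X):
    """Scale every i-vector (row) to unit Euclidean length."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)
```

Typical use would be `Y = length_normalize(X @ wccn_matrix(X, labels))`, after which the 500-dim vectors go on to the NFA or LDA projection.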

Distribution of SNR in SRE12

[Figure: distribution of SNR in SRE12]

Each SNR region is handled by a specific set of SNR factors.

Finding SNR Groups

[Figure: SNR groups found from the SNR distribution of the training utterances]

SNR Distributions

• SNR distribution of training and test utterances in CC4

[Figure: SNR histograms of the test utterances and of the training utterances]

Performance on SRE12 (CC1)

K: no. of SNR groups; Q: no. of SNR factors (dimension of w_k)

Method               K   Q    Male              Female
                              EER(%)  minDCF    EER(%)  minDCF
PLDA                 -   -    5.42    0.371     7.53    0.531
SD-mPLDA             -   -    5.28    0.415     7.70    0.539
SNR-Invariant PLDA   3   40   5.42    0.382     6.93    0.528
SNR-Invariant PLDA   5   40   5.28    0.381     6.89    0.522
SNR-Invariant PLDA   6   40   5.29    0.388     6.90    0.536
SNR-Invariant PLDA   8   30   5.56    0.384     7.05    0.545

Performance on SRE12 (CC2)

K: no. of SNR groups; Q: no. of SNR factors (dimension of w_k)

Method               K   Q    Male              Female
                              EER(%)  minDCF    EER(%)  minDCF
PLDA                 -   -    2.40    0.332     2.19    0.335
SD-mPLDA             -   -    2.47    0.283     2.07    0.328
SNR-Invariant PLDA   3   40   1.96    0.277     1.74    0.290
SNR-Invariant PLDA   6   40   1.99    0.278     1.72    0.290

Performance on SRE12 (CC4)

K: no. of SNR groups; Q: no. of SNR factors (dimension of w_k)

Method               K   Q    Male              Female
                              EER(%)  minDCF    EER(%)  minDCF
PLDA                 -   -    3.13    0.312     2.82    0.341
SD-mPLDA             -   -    2.88    0.329     2.71    0.332
SNR-Invariant PLDA   3   40   2.72    0.289     2.36    0.314
SNR-Invariant PLDA   5   40   2.67    0.291     2.38    0.322
SNR-Invariant PLDA   6   40   2.63    0.287     2.43    0.319
SNR-Invariant PLDA   8   30   2.70    0.292     2.29    0.313

Performance on SRE12 (CC5)

K: no. of SNR groups; Q: no. of SNR factors (dimension of w_k)

Method               K   Q    Male              Female
                              EER(%)  minDCF    EER(%)  minDCF
PLDA                 -   -    2.86    0.286     2.47    0.343
SD-mPLDA             -   -    2.86    0.295     2.59    0.332
SNR-Invariant PLDA   3   40   2.47    0.273     2.07    0.294
SNR-Invariant PLDA   6   40   2.48    0.275     2.04    0.294

Performance on SRE12

[Figure: CC4, female: conventional PLDA vs. SNR-invariant PLDA]

Conclusions

• We show that while i-vectors of different SNR fall in different regions of the i-vector space, they vary within a single cluster in an SNR subspace.
• Therefore, it is possible to model the SNR variability by adding an SNR loading matrix and SNR factors to the conventional PLDA model.
• We also show that i-vectors derived from utterances of different SNR live in different speaker subspaces.
• Therefore, it is possible to model SNR variability by a mixture of SNR-dependent PLDA models.

Bibliography

1. M.W. Mak, X.M. Pang and J.T. Chien, "Mixture of PLDA for Noise Robust I-Vector Speaker Verification", IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 24, no. 1, pp. 130-142, Jan. 2016.
2. N. Li and M.W. Mak, "SNR-Invariant PLDA Modeling in Nonparametric Subspace for Robust Speaker Verification", IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 23, no. 10, pp. 1648-1659, Oct. 2015.
3. W. Rao and M.W. Mak, "Boosting the Performance of I-Vector Based Speaker Verification via Utterance Partitioning", IEEE Trans. on Audio, Speech and Language Processing, vol. 21, no. 5, pp. 1012-1022, May 2013.
4. N. Li and M.W. Mak, "SNR-Invariant PLDA with Multiple Speaker Subspaces", ICASSP'16, March 2016.
5. X.M. Pang and M.W. Mak, "Noise Robust Speaker Verification via the Fusion of SNR-Independent and SNR-Dependent PLDA", International Journal of Speech Technology, Oct. 2015.
6. M.W. Mak, "Fast Scoring for Mixture of PLDA in I-Vector/PLDA Speaker Verification", Proc. APSIPA'15, pp. 587-593, Dec. 2015, Hong Kong.
7. M.W. Mak and H.B. Yu, "A Study of Voice Activity Detection Techniques for NIST Speaker Recognition Evaluations", Computer Speech & Language, vol. 28, no. 1, pp. 295-313, Jan. 2014.
8. N. Li and M.W. Mak, "SNR-Invariant PLDA Modeling for Robust Speaker Verification", Interspeech'15, Sept. 2015, Dresden, Germany, pp. 2317-2321.
9. P. Kenny, "Bayesian speaker verification with heavy-tailed priors", in Proc. of Odyssey: Speaker and Language Recognition Workshop, Brno, Czech Republic, June 2010.
10. N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification", IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 4, pp. 788-798, May 2011.

Acknowledgment

Xiaomin Pang, Zhili Tan, Shibiao Wan, Wei Rao, Na Li