
Relevance Vector Machines with Empirical Likelihood-Ratio Kernels for PLDA Speaker Verification


Page 1

Relevance Vector Machines with Empirical Likelihood-Ratio Kernels for PLDA Speaker Verification

Wei Rao and Man-Wai Mak

Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China

ISCSLP 2014

Page 2

Summary

• Our previous studies have shown that PLDA-SVM scoring has three advantages:
  I. Better utilization of multiple enrollment utterances.
  II. Explicit use of the information of background speakers.
  III. Opening up the opportunity of adopting sparse kernel machines for PLDA-based speaker verification systems.

• In this work, we
  – investigate the properties of empirical kernels in SVMs and relevance vector machines (RVMs), and
  – compare the performance of SVMs and RVMs in PLDA-based speaker verification.

Page 3

Outline

• Empirical LR Kernels
  – Likelihood Ratio Score for Gaussian PLDA
  – PLDA Score Space
  – Empirical Kernel Map
  – SVM with Empirical Kernels
• Relevance Vector Machines
  – RVM vs. SVM
  – Limitations of SVM
  – RVM Regression
  – PLDA-RVM Scoring
• Experiment Setup
• Results
• Conclusions

Page 4

Likelihood Ratio Score for Gaussian PLDA

Given a length-normalized test i-vector $\mathbf{x}_t$ and a target speaker's i-vector $\mathbf{x}_s$, the likelihood-ratio score can be computed as follows:

$$
S_{\mathrm{LR}}(\mathbf{x}_s, \mathbf{x}_t)
= \ln \frac{p(\mathbf{x}_s, \mathbf{x}_t \mid \text{same speaker})}{p(\mathbf{x}_s)\, p(\mathbf{x}_t)}
= \ln \mathcal{N}\!\left(\begin{bmatrix}\mathbf{x}_s\\ \mathbf{x}_t\end{bmatrix};
\begin{bmatrix}\mathbf{m}\\ \mathbf{m}\end{bmatrix},
\begin{bmatrix}\mathbf{V}\mathbf{V}^\top + \boldsymbol{\Sigma} & \mathbf{V}\mathbf{V}^\top\\
\mathbf{V}\mathbf{V}^\top & \mathbf{V}\mathbf{V}^\top + \boldsymbol{\Sigma}\end{bmatrix}\right)
- \ln \mathcal{N}\!\left(\begin{bmatrix}\mathbf{x}_s\\ \mathbf{x}_t\end{bmatrix};
\begin{bmatrix}\mathbf{m}\\ \mathbf{m}\end{bmatrix},
\begin{bmatrix}\mathbf{V}\mathbf{V}^\top + \boldsymbol{\Sigma} & \mathbf{0}\\
\mathbf{0} & \mathbf{V}\mathbf{V}^\top + \boldsymbol{\Sigma}\end{bmatrix}\right)
$$

where $\mathbf{V}$ is the factor loading matrix and $\boldsymbol{\Sigma}$ is the covariance of the PLDA model.

The score implicitly uses background information.
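As a minimal sketch of this score computation (assuming numpy/scipy and hypothetical names m, V, and Sigma for the PLDA mean, factor loading matrix, and covariance):

```python
import numpy as np
from scipy.stats import multivariate_normal

def plda_llr_score(x_s, x_t, m, V, Sigma):
    """Gaussian PLDA likelihood-ratio score for a target/test i-vector pair."""
    tot = V @ V.T + Sigma                        # total covariance
    ac = V @ V.T                                 # across-speaker covariance
    mu = np.concatenate([m, m])
    x = np.concatenate([x_s, x_t])
    # Same-speaker hypothesis: the pair shares the speaker factor.
    cov_same = np.block([[tot, ac], [ac, tot]])
    # Different-speaker hypothesis: the two i-vectors are independent.
    cov_diff = np.block([[tot, np.zeros_like(ac)], [np.zeros_like(ac), tot]])
    return (multivariate_normal.logpdf(x, mu, cov_same)
            - multivariate_normal.logpdf(x, mu, cov_diff))
```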

Page 5

PLDA Score Space

• Given a set of target speaker's i-vectors $\mathcal{X}_s$ and a set of background speakers' i-vectors $\mathcal{X}_b$, we create a PLDA score space for this speaker.

• For each target-speaker i-vector, we compute a score vector that lives in this space.

• The score vectors of this speaker are used to train a speaker-dependent SVM for scoring.

[Diagram: target-speaker and background-speaker i-vectors → constructing PLDA score vectors (using the PLDA model) → SVM training, together with the score vectors of background speakers → speaker-dependent SVM]

Page 6

Empirical Kernel Map

• We refer to the mapping from i-vectors to PLDA score vectors as the empirical kernel map.

• Given a test i-vector $\mathbf{x}_t$, we have a test score vector

$$\vec{S}_{\mathrm{LR}}(\mathbf{x}_t) = \left[S_{\mathrm{LR}}(\mathbf{x}_{s,1}, \mathbf{x}_t), \ldots, S_{\mathrm{LR}}(\mathbf{x}_{b,1}, \mathbf{x}_t), \ldots\right]^\top$$

comprising the PLDA scores of $\mathbf{x}_t$ against the target-speaker and background-speaker i-vectors.

[Diagram: test i-vector → constructing the PLDA score vector (using the PLDA model) → test score vector]
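A minimal sketch of this map, reusing the hypothetical plda_llr_score() above, with the target and background i-vectors stored as the rows of X_s and X_b:

```python
import numpy as np

def empirical_kernel_map(x, X_s, X_b, m, V, Sigma):
    """Map an i-vector x to its PLDA score vector against X_s and X_b."""
    anchors = np.vstack([X_s, X_b])   # target i-vectors first, then background
    return np.array([plda_llr_score(a, x, m, V, Sigma) for a in anchors])
```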

Page 7

Utterance Partitioning with Acoustic Vector Resampling (UP-AVR)

• For speakers with a small number of enrollment utterances, we apply utterance partitioning (Rao and Mak, 2013) to produce more i-vectors from a few utterances, as illustrated below.

[Diagram: one enrollment utterance → UP-AVR → RN + 1 enrollment i-vectors]
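A minimal sketch of the UP-AVR idea, assuming each utterance is an array of acoustic frame vectors and extract_ivector() is a hypothetical i-vector extractor; the frames are reshuffled R times, each shuffle is cut into N partitions, and together with the full utterance this yields RN + 1 i-vectors:

```python
import numpy as np

def up_avr(frames, extract_ivector, N=4, R=4, seed=0):
    """Utterance partitioning with acoustic vector resampling (UP-AVR)."""
    rng = np.random.default_rng(seed)
    ivectors = [extract_ivector(frames)]            # the full utterance
    for _ in range(R):
        order = rng.permutation(len(frames))        # resample the frame order
        for part in np.array_split(frames[order], N):
            ivectors.append(extract_ivector(part))  # one i-vector per partition
    return ivectors                                 # R*N + 1 i-vectors in total
```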

Page 8

SVM with Empirical LR Kernel

[Diagram: the target speaker's i-vectors $\mathcal{X}_s$ and the background speakers' i-vectors $\mathcal{X}_b$ are passed through PLDA scoring + empirical kernel map, and the resulting score vectors are used for SVM training, producing a target-speaker SVM. At test time, the test i-vector is mapped through the same PLDA scoring + empirical kernel map before SVM scoring.]

The scoring function of the speaker-dependent SVM is

$$
S_{\mathrm{SVM}}(\mathbf{x}_t; \mathcal{X}_s, \mathcal{X}_b)
= \sum_{j \in \mathrm{SV}_s} \alpha_{s,j} K(\mathbf{x}_t, \mathbf{x}_{s,j})
- \sum_{j \in \mathrm{SV}_b} \alpha_{b,j} K(\mathbf{x}_t, \mathbf{x}_{b,j})
+ d_s
$$

where $\mathrm{SV}_s$ and $\mathrm{SV}_b$ index the support vectors from the target-speaker and background classes, $\alpha_{s,j}$ and $\alpha_{b,j}$ are the corresponding Lagrange multipliers, $K(\cdot,\cdot)$ is the empirical LR kernel, and $d_s$ is the bias of the speaker's SVM.
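A minimal sketch of speaker-dependent SVM training on top of the empirical kernel map, assuming scikit-learn and the hypothetical helpers above; an RBF kernel with width γ is applied to the score vectors (sklearn's gamma corresponds to 1/γ², assuming K(u, v) = exp(−‖u − v‖²/γ²)):

```python
import numpy as np
from sklearn.svm import SVC

def train_speaker_svm(X_s, X_b, m, V, Sigma, width=1900.0, C=1.0):
    """Train one speaker-dependent SVM on PLDA score vectors."""
    phi = lambda x: empirical_kernel_map(x, X_s, X_b, m, V, Sigma)
    scores = np.array([phi(x) for x in np.vstack([X_s, X_b])])
    labels = np.r_[np.ones(len(X_s)), -np.ones(len(X_b))]  # target vs. background
    svm = SVC(kernel="rbf", gamma=1.0 / width**2, C=C)
    svm.fit(scores, labels)
    return svm, phi

# Scoring a test i-vector x_t:  score = svm.decision_function([phi(x_t)])[0]
```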

Page 9

Outline

• Empirical LR Kernels
  – Likelihood Ratio Score for Gaussian PLDA
  – PLDA Score Space
  – Empirical Kernel Map
  – SVM with Empirical Kernels
• Relevance Vector Machines
  – RVM vs. SVM
  – Limitations of SVM
  – RVM Regression
  – PLDA-RVM Scoring
• Experiment Setup
• Results
• Conclusions

Page 10

RVM versus SVM

• RVM stands for relevance vector machine (Tipping, 2001).

• The scoring function of the RVM and the SVM has the same form:

$$f(\mathbf{x}_t) = \sum_{i=1}^{N} w_i K(\mathbf{x}_t, \mathbf{x}_i) + w_0$$

where $w_i$ are the optimized weights, $w_0$ is a bias term, and $K(\cdot,\cdot)$ is a kernel function, e.g., the empirical LR kernel.

• Difference between RVM and SVM:
  – SVM training is based on structural risk minimization.
  – RVM training is based on Bayesian relevance learning.

Page 11

Limitations of SVM

• The number of support vectors in an SVM increases linearly with the size of the training set.

• SVM scores are not probabilistic, meaning that score normalization is required to adjust the score range of individual SVMs.

• It is necessary to trade off the training error and the margin of separation by adjusting the penalty factor for each target speaker during SVM training.

RVM is designed to address the above issues in SVM

Page 12

RVM Regression

• In RVM regression, the target $y$ is assumed to follow a Gaussian distribution with mean $f(\mathbf{x}; \mathbf{w})$ and variance $\sigma^2$:

$$p(y \mid \mathbf{x}, \mathbf{w}, \sigma^2) = \mathcal{N}\!\left(y;\, f(\mathbf{x}; \mathbf{w}),\, \sigma^2\right)$$

• To avoid over-fitting, the RVM defines a prior distribution over the RVM weights $\mathbf{w}$:

$$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}\!\left(w_i;\, 0,\, \alpha_i^{-1}\right)$$

• Note: when $\alpha_i \to \infty$, $w_i \to 0$ and the corresponding training vector is pruned from the model.

Page 13

RVM Regression

• Given a test vector $\mathbf{x}_t$, the predictive distribution is

$$p(y_t \mid \mathbf{x}_t) = \mathcal{N}\!\left(y_t;\, \boldsymbol{\mu}^\top \boldsymbol{\phi}(\mathbf{x}_t),\, \sigma_t^2\right)$$

where

$$\boldsymbol{\mu} = \sigma^{-2} \boldsymbol{\Sigma}' \boldsymbol{\Phi}^\top \mathbf{y}, \qquad
\boldsymbol{\Sigma}' = \left(\sigma^{-2} \boldsymbol{\Phi}^\top \boldsymbol{\Phi} + \operatorname{diag}(\boldsymbol{\alpha})\right)^{-1}, \qquad
\sigma_t^2 = \sigma^2 + \boldsymbol{\phi}(\mathbf{x}_t)^\top \boldsymbol{\Sigma}' \boldsymbol{\phi}(\mathbf{x}_t),$$

$\boldsymbol{\Phi}$ is the design matrix of kernel values on the training set, and $\boldsymbol{\phi}(\mathbf{x}_t)$ is the kernel vector of the test point.

Page 14

PLDA-RVM Scoring

• We use the mean of the predictive distribution as the verification score:

$$S_{\mathrm{RVM}}(\mathbf{x}_t) = \boldsymbol{\mu}^\top \boldsymbol{\phi}(\mathbf{x}_t)$$

where $\boldsymbol{\phi}(\mathbf{x}_t) = \left[1, K(\mathbf{x}_t, \mathbf{x}_1), \ldots, K(\mathbf{x}_t, \mathbf{x}_N)\right]^\top$ and $\boldsymbol{\mu}$ is the posterior mean of the RVM weights.
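A minimal sketch of this scoring under the formulas above, assuming the hyperparameters alpha and sigma2 have already been estimated (e.g., by Tipping's evidence re-estimation, not shown here):

```python
import numpy as np

def rvm_posterior_mean(Phi, y, alpha, sigma2):
    """Posterior mean of the RVM weights given the design matrix Phi."""
    Sigma_w = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(alpha))
    return Sigma_w @ Phi.T @ y / sigma2

def rvm_score(phi_t, mu):
    """PLDA-RVM verification score: mean of the predictive distribution."""
    return mu @ phi_t
```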

Page 15

Outline

• Empirical LR Kernels
  – Likelihood Ratio Score for Gaussian PLDA
  – PLDA Score Space
  – Empirical Kernel Map
  – SVM with Empirical Kernels
• Relevance Vector Machines
  – RVM vs. SVM
  – Limitations of SVM
  – RVM Regression
  – PLDA-RVM Scoring
• Experiment Setup
• Results
• Conclusions

Page 16

Speech Data and PLDA Models

• Evaluation dataset: common evaluation condition 2 of the NIST SRE 2012 male core set.
• Parameterization: 19 MFCCs together with energy, plus their 1st and 2nd derivatives → 60 dimensions.
• UBM: gender-dependent, 1024 mixtures.
• Total variability matrix: gender-dependent, 400 total factors.
• I-vector preprocessing:
  – Whitening by WCCN, then length normalization.
  – Followed by LDA (500-dim → 200-dim) and WCCN.
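A minimal sketch of this preprocessing chain, with hypothetical pre-fitted transforms (a WCCN whitening matrix W, an LDA projection A_lda, and a second WCCN matrix W2):

```python
import numpy as np

def preprocess_ivector(x, W, A_lda, W2):
    """WCCN whitening -> length normalization -> LDA -> WCCN."""
    x = W @ x                      # whitening by WCCN
    x = x / np.linalg.norm(x)      # length normalization
    x = A_lda.T @ x                # LDA projection to the reduced space
    return W2 @ x                  # WCCN in the reduced space
```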

Page 17

Property of Empirical LR Kernels in SVM and RVM

When the number of SVs increases, the performance of SVMs becomes poor.

When the number of RVs is very small, the performance of RVM is poor.

The performance of RVM is fairly stable and better than that of SVMs once the number of RVs is sufficient.

Page 18

Property of Empirical LR Kernels in SVM and RVM

When the RBF width increases, the number of SVs decreases first and then gradually increases.

The number of RVs monotonically decreases when the RBF width increases.

The RVMs are less sparse than the SVMs for a wide range of RBF widths.

[Figure: numbers of SVs and RVs versus RBF width γ]

Page 19

Property of Empirical LR Kernels in SVM and RVM

For the RVMs to be effective, we need at least 50 RVs.

Once the RVMs have sufficient RVs, their performance can be better than that of the SVMs.

[Figure: performance of the SVMs and RVMs versus RBF width γ]

Page 20

Performance Comparison (CC2 of SRE12, male)

Methods               EER (%)   MinNDCF
PLDA                  2.40      0.33
PLDA+UP-AVR           2.32      0.32
PLDA+SVM              2.07      0.31
PLDA+RVM-C            3.76      0.48
PLDA+RVM-R            2.32      0.28
PLDA+UP-AVR+SVM       1.97      0.30
PLDA+UP-AVR+RVM-C     3.00      0.42
PLDA+UP-AVR+RVM-R     1.94      0.28

Both PLDA-SVM and PLDA-RVM (regression) perform better than PLDA.

RVM classification performs poorly because of the small number of RVs.

After performing utterance partitioning (UP-AVR), the performance of both RVM regression and SVM improves, and RVM regression slightly outperforms SVM.

Page 21

Conclusions

• RVM classification is not appropriate for the verification task in NIST SRE, but RVM regression with the empirical LR kernel can achieve performance comparable to that of PLDA-SVM.

• UP-AVR can boost the performance of both SVM and RVM regression, and the performance of RVM regression is slightly better than that of SVM after adopting utterance partitioning.

Page 22

Page 23

Optimization of RBF Width

The preferred value of γ for RVMs is between 1400 and 1500, and that for SVMs is between 1900 and 2000.

Page 24

Property of Empirical LR Kernels in SVM and RVM

• 108 target speakers with true-target trials and impostor trials were extracted from NIST 2012 SRE.

• The EER and minNDCF achieved by the SVMs and RVMs, and their corresponding numbers of support vectors (SVs) and relevance vectors (RVs), were averaged across the 108 speakers.

Page 25

Experiment Setup for SVM and RVM

• Optimization of RBF width: a development set was created for optimizing the RBF parameter of each target speaker's SVM/RVM.
  – True-target trials: UP-AVR was applied to the speaker's enrollment utterances to generate a number of enrollment i-vectors. Some of these i-vectors were used for training the SVM/RVM model; the remaining i-vectors were used as true-target trials.
  – Impostor trials: 200 background utterances were selected from the previous NIST SREs.
  – The width was varied to maximize the difference between the mean of the true-target scores and the mean of the impostor scores, as in the sketch below.
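A minimal sketch of this width selection, assuming a hypothetical score_trials(width, trials) helper that trains the speaker's SVM/RVM with the given RBF width and scores a list of development trials:

```python
import numpy as np

def select_rbf_width(score_trials, target_trials, impostor_trials,
                     widths=np.arange(1000.0, 2500.0, 100.0)):
    """Pick the RBF width maximizing mean(true-target) - mean(impostor) scores."""
    def separation(width):
        return (np.mean(score_trials(width, target_trials))
                - np.mean(score_trials(width, impostor_trials)))
    return max(widths, key=separation)
```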

Page 26

Empirical Kernel Maps