
USING CLASS WEIGHTING IN INTER-CLASS MLLR

Sam-Joo Doh and Richard M. Stern

Department of Electrical and Computer Engineering

and School of Computer Science

Carnegie Mellon University

October 20, 2000


Outline

Introduction

Review: transformation-based adaptation and inter-class MLLR

Application of weights for different neighboring classes

Summary


Introduction

We would like to achieve better adaptation using a small amount of adaptation data, and thereby enhance recognition accuracy

Current method: reduce the number of parameters by assuming a transformation function

Transformation-based adaptation; example: maximum likelihood linear regression (MLLR)


Introduction (cont’d)

In transformation-based adaptation, the transformation classes are assumed to be independent, so it does not achieve reliable estimates for multiple classes from a small amount of adaptation data

A better idea? Utilize inter-class relationships to achieve more reliable estimates for multiple classes


Transformation-Based Adaptation

Estimate each target parameter (mean vector)


Transformation-Based Adaptation (cont’d)

Estimate each transformation function
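As a minimal sketch of what estimating one such transformation function involves (a simplification, not the exact procedure on the slide), the code below fits a single MLLR-style mean transform $\hat{\mu}_k = A\mu_k + b$ from posterior-weighted statistics, assuming identity covariances so that the maximum-likelihood update reduces to weighted least squares:

```python
import numpy as np

def estimate_mean_transform(means, gammas, obs_means):
    """Estimate one MLLR-style mean transform (A, b) for a transformation class.

    means:     (K, d) Gaussian mean vectors belonging to the class
    gammas:    (K,)   total posterior count per Gaussian, sum_t gamma_k(t)
    obs_means: (K, d) posterior-weighted average observation per Gaussian,
               (1 / gamma_k) * sum_t gamma_k(t) o_t

    Simplifying assumption (not from the slides): identity covariances,
    so the maximum-likelihood update reduces to weighted least squares.
    """
    K, d = means.shape
    X = np.hstack([means, np.ones((K, 1))])        # extended means [mu_k ; 1]
    G = (X * gammas[:, None]).T @ X                # sum_k gamma_k xi_k xi_k^T
    Z = (obs_means * gammas[:, None]).T @ X        # sum_k gamma_k obar_k xi_k^T
    W = Z @ np.linalg.inv(G)                       # W = [A  b]
    return W[:, :d], W[:, d]

def adapt_means(means, A, b):
    """Apply the estimated transform: mu_hat_k = A mu_k + b."""
    return means @ A.T + b
```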


Transformation-Based Adaptation (cont’d)

Trade-off

Better estimation of each transformation function (fewer classes) versus more detail in the target parameters (more classes)

[Figure: quality as a function of the number of transformation classes, illustrating this trade-off]


Previous Works

Consider correlations among model parameters

Mostly in a Bayesian framework

Considering a few neighboring models: Not effective

Considering all neighboring models: Too much computation

It is difficult to apply correlations to multi-Gaussian mixtures: no explicit correspondence


Previous Works (cont’d)

Using correlations among model parameters


Inter-Class Relationship

Is there an inter-class relationship among the transformation functions?


Inter-Class Relationship (cont’d)

When the two classes are treated as independent:

Class 1: $\hat{\mu}_{1j} = f_1(\mu_{1j})$

Class 2: $\hat{\mu}_{2k} = f_2(\mu_{2k})$


Inter-Class Relationship (cont’d)

If we know an inter-class transformation g12(.)

Now class 2 data contribute to the estimation of f1(.)

More reliable estimation of f1(.) while keeping the characteristics of Class 1

Transform the Class 2 parameters: $\mu_{2k} \;\rightarrow\; \mu_{2k(12)} = g_{12}(\mu_{2k}) \;\rightarrow\; \hat{\mu}_{2k} = f_1(\mu_{2k(12)})$

(compare with the independent case: Class 1: $\hat{\mu}_{1j} = f_1(\mu_{1j})$, Class 2: $\hat{\mu}_{2k} = f_2(\mu_{2k})$)

f2(.) can be estimated by transforming class 1 parameters
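With the linear forms introduced on the next slide ($g_{12}$ as $\mu \mapsto T_{12}\mu + d_{12}$ and $f_1$ as $\mu \mapsto A_1\mu + b_1$), the chain $f_1(g_{12}(\cdot))$ is simply a composition of two affine maps. A brief sketch, with illustrative variable names:

```python
import numpy as np

def apply_inter_class_transform(mu_2k, T12, d12):
    """g12: map a Class 2 mean into the Class 1 space, mu_2k(12) = T12 mu_2k + d12."""
    return T12 @ mu_2k + d12

def adapt_class2_mean_via_class1(mu_2k, T12, d12, A1, b1):
    """mu_hat_2k = f1(g12(mu_2k)) = A1 (T12 mu_2k + d12) + b1."""
    return A1 @ apply_inter_class_transform(mu_2k, T12, d12) + b1
```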


Use Linear Regression for inter-class transformation

Inter-class transformation: $\mu_{2k(12)} = T_{12}\,\mu_{2k} + d_{12}$

Each class keeps its own MLLR transformation. Class 1 (target class): $\hat{\mu}_{1k} = A_1\mu_{1k} + b_1$; Class 2: $\hat{\mu}_{2k} = A_2\mu_{2k} + b_2$

Inter-class MLLR: estimate $(A_1, b_1)$ to minimize

$$Q = \sum_{k \in \text{Class 1}} \sum_t \gamma_k(t)\,(o_t - A_1\mu_{1k} - b_1)^T C_k^{-1}\,(o_t - A_1\mu_{1k} - b_1) \;+\; \sum_{k \in \text{Class 2}} \sum_t \gamma_k(t)\,(o_t - A_1\mu_{2k(12)} - b_1)^T C_k^{-1}\,(o_t - A_1\mu_{2k(12)} - b_1)$$

where $o_t$ is the input feature vector at time $t$, $\gamma_k(t)$ is the a posteriori probability of being in Gaussian $k$ at time $t$, and $C_k$ is the covariance matrix of Gaussian $k$
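A rough sketch of how this estimation could be carried out, using per-Gaussian sufficient statistics and assuming identity covariances in place of the full $C_k^{-1}$ weighting; the function and variable names are hypothetical rather than taken from the original system:

```python
import numpy as np

def estimate_inter_class_mllr(means_1, gam_1, obar_1,
                              means_2, gam_2, obar_2, T12, d12):
    """Estimate (A1, b1) from Class 1 statistics plus Class 2 statistics
    whose means are first mapped through the inter-class transform g12.

    means_*: (K, d) Gaussian means, gam_*: (K,) posterior counts,
    obar_*:  (K, d) posterior-weighted average observations.
    """
    means_2_in_1 = means_2 @ T12.T + d12           # mu_2k(12) = T12 mu_2k + d12

    # Pool both classes and solve the same least-squares problem as
    # conventional MLLR (identity-covariance simplification of Q).
    means = np.vstack([means_1, means_2_in_1])
    gammas = np.concatenate([gam_1, gam_2])
    obars = np.vstack([obar_1, obar_2])

    X = np.hstack([means, np.ones((len(means), 1))])   # extended means [mu ; 1]
    G = (X * gammas[:, None]).T @ X
    Z = (obars * gammas[:, None]).T @ X
    W = Z @ np.linalg.inv(G)
    return W[:, :-1], W[:, -1]                         # A1, b1
```

Once the Class 2 means are mapped through $g_{12}$, their statistics enter the same normal equations as the Class 1 statistics, which is what allows Class 2 data to contribute to $(A_1, b_1)$.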


Application of Weights

Neighboring classes have different contributions to the target class


Application of Weights (cont’d)

Application of weights to the neighboring classes

We assume that the target-class transformation also applies in neighboring class $n$: $\hat{\mu}_{nk} = A_1\mu_{nk(n1)} + b_1$

The error of using $(A_1, b_1)$ in neighboring class $n$: $o_t = A_1\mu_{nk(n1)} + b_1 + e_{nt}$

$$Q = \sum_{k \in \text{Class 1}} \sum_t \gamma_k(t)\,(o_t - A_1\mu_{1k} - b_1)^T C_k^{-1}\,(o_t - A_1\mu_{1k} - b_1) \;+\; \sum_{n} w_n \sum_{k \in \text{Neighbor } n} \sum_t \gamma_k(t)\,(o_t - A_1\mu_{nk(n1)} - b_1)^T C_k^{-1}\,(o_t - A_1\mu_{nk(n1)} - b_1)$$

where the weight $w_n$ is set from the variance of the error $e_{nt}$

Weighted least squares estimation: use the variance of the error to set the weight (large error → small weight; small error → large weight)
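A minimal sketch of this weighting under the same identity-covariance simplification, estimating each neighbor's error variance from per-Gaussian residuals rather than per-frame errors; because the weight depends on the current $(A_1, b_1)$, it would be applied iteratively in practice. All names are illustrative:

```python
import numpy as np

def neighbor_weight(means_in_1, gammas, obars, A1, b1, eps=1e-6):
    """Weight for one neighboring class: inverse of the average variance of
    the residual e = obar_k - (A1 mu_k(n1) + b1) under the current (A1, b1)."""
    resid = obars - (means_in_1 @ A1.T + b1)
    var = (gammas @ (resid ** 2)).sum() / (gammas.sum() * resid.shape[1] + eps)
    return 1.0 / (var + eps)

def weighted_estimate(class_stats, A1, b1):
    """class_stats: list of (means_in_target_space, gammas, obars, is_target).
    The target class gets weight 1; each neighbor gets its inverse-variance weight."""
    d = class_stats[0][0].shape[1]
    G = np.zeros((d + 1, d + 1))
    Z = np.zeros((d, d + 1))
    for means, gammas, obars, is_target in class_stats:
        w = 1.0 if is_target else neighbor_weight(means, gammas, obars, A1, b1)
        X = np.hstack([means, np.ones((len(means), 1))])
        G += w * (X * gammas[:, None]).T @ X
        Z += w * (obars * gammas[:, None]).T @ X
    W = Z @ np.linalg.inv(G)
    return W[:, :-1], W[:, -1]                         # updated (A1, b1)
```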


Number of Neighboring Classes

Limit the number of neighboring classes

Sort the neighboring classes and set a threshold for the number of samples

Use the "closer" neighboring classes first

Count the number of samples used

Add the next neighboring class until the number of samples exceeds the threshold (as sketched below)
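A sketch of this selection rule, assuming the target class's own samples count toward the total; the data structures are hypothetical:

```python
def select_neighboring_classes(sorted_neighbors, target_count, threshold):
    """sorted_neighbors: list of (class_id, sample_count), closest first.
    target_count: number of adaptation samples in the target class itself.
    Returns the neighboring classes to use for estimating the target transform."""
    selected, total = [], target_count
    for class_id, count in sorted_neighbors:
        if total >= threshold:
            break                                  # enough samples already
        selected.append(class_id)
        total += count
    return selected
```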


Experiments

Test data: 1994 DARPA Wall Street Journal (WSJ) task, 10 non-native speakers × 20 test sentences (Spoke 3: s3-94)

Baseline system: CMU SPHINX-3, continuous HMMs, 6000 senones, 39-dimensional features

• MFCC cepstra + delta + delta-delta + power

Supervised and unsupervised adaptation, focusing on small amounts of adaptation data

13 phonetic-based classes for inter-class MLLR


Experiments (cont'd)

Word error rates, supervised adaptation (numbers in parentheses are relative improvements over conventional MLLR):

Adaptation method                                  1 adapt. sent.   3 adapt. sent.
Baseline (no adaptation)                           27.3%            27.3%
Conventional MLLR (one class)                      24.1%            23.1%
Inter-class MLLR without weights (full + shift)    20.4% (15.4%)    19.6% (15.2%)
Inter-class MLLR with weights (full + shift)       20.2% (16.2%)    19.3% (16.5%)


Experiments (cont'd)

Word error rates, unsupervised adaptation (numbers in parentheses are relative improvements over conventional MLLR):

Adaptation method                                  1 test sent.     10 test sent.
Baseline (no adaptation)                           27.3%            27.3%
Conventional MLLR (one class)                      26.7%            23.9%
Inter-class MLLR without weights (full + shift)    24.0% (10.1%)    20.1% (15.9%)
Inter-class MLLR with weights (full + shift)       24.3% (9.0%)     19.9% (16.7%)


Experiments (cont’d)

Limit the number of neighboring classes

Supervised adaptation: 10 adaptation sentences

[Figure: word error rate (%) versus average number of neighboring classes (0 to 12), supervised adaptation with 10 adaptation sentences; WER axis spans roughly 19% to 21%]


Summary

Application of weights: weighted least squares estimation was helpful in the supervised case but not in the unsupervised case (with a small amount of adaptation data)

Number of neighboring classes: use a smaller number of neighboring classes as more adaptation data become available


Summary (cont'd)

Inter-class transformations can carry speaker-dependent information; we may prepare several sets of inter-class transformations and select the appropriate set for a new speaker

Combination with Principal Component MLLR did not provide additional improvement


Thank you!