
Independent Component Analysis and Unsupervised Learning

Jen-Tzung Chien

TABLE OF CONTENTS

1. Independent Component Analysis

2. Case Study I: Speech Recognition
   • Independent voices
   • Nonparametric likelihood ratio ICA

3. Case Study II: Blind Source Separation
   • Convex divergence ICA
   • Nonstationary Bayesian ICA
   • Online Gaussian process ICA

4. Summary

Introduction

• Independent component analysis (ICA) is essential for blind source separation.

• ICA is applied to separate the mixed signals and find the independent components.

• The demixed components can be grouped into clusters where the intra-cluster elements are dependent and the inter-cluster elements are independent.

• ICA provides an unsupervised learning approach to acoustic modeling, signal separation, and many other tasks.

APSIPA DL: Independent Component Analysis and Unsupervised Learning 3

Blind Source Separation

• Cocktail-party problem

• Goal
  − Unknown: A and s
  − Reconstruct the source signals via demixing matrix W
  − Mixing matrix A is assumed to be fixed


Independent Component Analysis

• Three assumptions
  − the sources are statistically independent
  − the independent components have non-Gaussian distributions
  − the mixing matrix is square

The linear mixing model is

x_t = A s_t,   i.e.   x_m(t) = Σ_{i=1}^{M} a_{mi} s_i(t),   m = 1, …, M,

where s_t collects the M independent sources, A is the square mixing matrix, and x_t the observed mixtures.
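As a minimal illustration of the mixing model (the signals and matrix values below are made up for the demo), mixing two independent non-Gaussian sources with a fixed square A and then applying W = A⁻¹ recovers them exactly; ICA must instead estimate W blindly from x alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian sources (hypothetical example signals):
# a uniform (sub-Gaussian) source and a Laplacian (super-Gaussian) source.
n = 1000
s = np.vstack([rng.uniform(-1, 1, n),
               rng.laplace(0, 1, n)])

# Square mixing matrix A (assumed fixed); observations x = A s.
A = np.array([[0.8, 0.2],
              [0.3, 0.7]])
x = A @ s

# With A known, the ideal demixing matrix is W = A^{-1}; ICA has to
# estimate W without seeing A or s, up to permutation and scaling.
W = np.linalg.inv(A)
s_hat = W @ x
print(np.allclose(s_hat, s))  # True
```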


ICA Objective Function


ICA Learning Rule

• The ICA demixing matrix can be estimated by optimizing an objective function via a gradient-descent or natural-gradient algorithm.
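The slides do not spell the update out, but a common concrete instance is the Infomax-style natural-gradient rule W ← W + η (I − E[φ(y) yᵀ]) W; in the sketch below the score function φ = tanh, the step size, and the demo signals are all assumptions, not the deck's exact algorithm:

```python
import numpy as np

def natural_gradient_ica(x, n_iter=500, lr=0.05, seed=0):
    """Estimate a demixing matrix W by the natural-gradient rule
    W <- W + lr * (I - E[phi(y) y^T]) W, with phi(y) = tanh(y),
    a score function suited to super-Gaussian sources."""
    m, T = x.shape
    rng = np.random.default_rng(seed)
    W = np.eye(m) + 0.01 * rng.standard_normal((m, m))  # near-identity init
    for _ in range(n_iter):
        y = W @ x
        phi = np.tanh(y)                                # score function
        W += lr * (np.eye(m) - (phi @ y.T) / T) @ W     # natural gradient step
    return W

# Demo on two mixed Laplacian (super-Gaussian) sources.
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 5000))
A = np.array([[0.9, 0.4], [0.3, 0.8]])
W = natural_gradient_ica(A @ s)

# W @ A should be close to a scaled permutation matrix (the usual
# ICA indeterminacies).
P = W @ A
print(np.round(P, 2))
```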


TABLE OF CONTENTS

1. Independent Component Analysis

2. Case Study I: Speech Recognition
   • Independent voices
   • Nonparametric likelihood ratio ICA

3. Case Study II: Blind Source Separation
   • Convex divergence ICA
   • Nonstationary Bayesian ICA
   • Online Gaussian process ICA

4. Summary

ICA for Speech Recognition

• Mismatch between training and test data always exists, so adaptation of HMM parameters is important.

• Eigenvoice (PCA) versus independent voice (ICA)
  − PCA performs a linear decorrelation process
  − ICA extracts higher-order statistics

uncorrelatedness (PCA):   E[e_1 e_2 ⋯ e_M] = E[e_1] E[e_2] ⋯ E[e_M]

higher-order correlations are zero (ICA):   E[s_1^r s_2^r ⋯ s_M^r] = E[s_1^r] E[s_2^r] ⋯ E[s_M^r]


Sparseness & Information Redundancy

• The degree of sparseness in the distribution of the transformed signals is proportional to the amount of information conveyed by the transformation.

• Sparseness measurement
  − fourth-order statistics (kurtosis) measure non-Gaussianity

• Information redundancy reduction using ICA is higher than that using PCA.

kurt(s) = E[s^4] − 3 (E[s^2])^2
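The kurtosis measure above is straightforward to compute; a quick sketch (the sample sizes and distributions are chosen only for illustration):

```python
import numpy as np

def kurt(s):
    """Fourth-order sparseness measure: kurt(s) = E[s^4] - 3 (E[s^2])^2.
    Zero for a Gaussian; positive for super-Gaussian (sparse, peaky)
    signals; negative for sub-Gaussian ones."""
    s = s - s.mean()
    return np.mean(s**4) - 3 * np.mean(s**2) ** 2

rng = np.random.default_rng(0)
n = 200_000
print(kurt(rng.normal(size=n)))          # ≈ 0  (Gaussian)
print(kurt(rng.laplace(size=n)))         # > 0  (super-Gaussian, sparse)
print(kurt(rng.uniform(-1, 1, size=n)))  # < 0  (sub-Gaussian)
```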



Eigenvoices versus Independent Voices


Evaluation of Kurtosis

[Figure: kurtosis versus voice index for independent voices and eigenvoices.]


Word Error Rates on Aurora2

K: number of components; L: number of adaptation sentences. Results shown for K = 10 and K = 15.

[Figure: word error rates (%) for no adaptation, eigenvoice (L = 5, 10, 15), and independent voice (L = 5, 10, 15).]

TABLE OF CONTENTS

1. Independent Component Analysis

2. Case Study I: Speech Recognition
   • Independent voices
   • Nonparametric likelihood ratio ICA

3. Case Study II: Blind Source Separation
   • Convex divergence ICA
   • Nonstationary Bayesian ICA
   • Online Gaussian process ICA

4. Summary

Test of Independence

• Given the demixed signals, the null hypothesis H0 (the components are independent) and the alternative hypothesis H1 (the components are dependent) are defined.

• If y is Gaussian distributed, we are testing whether the correlation between each pair of components is equal to zero.


Likelihood Ratio

• The likelihood ratio (LR) serves as the test statistic which measures the confidence for the alternative hypothesis against the null hypothesis.

• The LR is a measure of independence and can act as an objective function for finding the ICA demixing matrix.

• However, Gaussianity cannot be assumed in the ICA problem.


Nonparametric Approach

• Let each sample be transformed by the demixing matrix.

• Instead of assuming Gaussianity, we apply kernel density estimation using a Gaussian kernel.

• The kernel centroids are given by the transformed samples.
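A minimal sketch of such a kernel density estimate (the bandwidth, data, and function name are assumptions, not the slides' exact estimator):

```python
import numpy as np

def gaussian_kde(y, centroids, h):
    """Parzen-window estimate with a Gaussian kernel:
    p_hat(y) = (1/N) sum_n N(y; mu_n, h^2),
    avoiding any parametric (e.g. Gaussian) assumption on the signals."""
    y = np.atleast_1d(y)[:, None]            # evaluation points (column)
    mu = np.asarray(centroids)[None, :]      # kernel centroids (the samples)
    k = np.exp(-0.5 * ((y - mu) / h) ** 2) / (np.sqrt(2 * np.pi) * h)
    return k.mean(axis=1)

# Sanity check on a Laplacian sample (hypothetical data): the density
# estimate should carry total probability mass close to 1.
rng = np.random.default_rng(0)
sample = rng.laplace(size=2000)
grid = np.linspace(-10, 10, 2001)
p = gaussian_kde(grid, sample, h=0.3)
print(p.sum() * (grid[1] - grid[0]))  # ≈ 1.0
```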


Nonparametric Likelihood Ratio

• NLR objective function

with a multivariate Gaussian kernel


ICA Learning Procedure

Output W


• Log-likelihood ratio for the null and alternative hypotheses

• Maximizing it with respect to the model parameters, we obtain the update rules.

[Diagram: training data are Viterbi-aligned with the lexicon; segment-based supervectors X are collected for each subword unit; NLR-ICA transforms X into Y; K-means groups the result into clusters 1…M; cluster-dependent HMMs 1…M are trained and applied by the speech recognizer to the test data to produce the recognition result.]

Segment-Based Supervector

[Diagram: aligned utterances yield segment supervectors x_1, x_2, …, x_T, which are stacked into the segment-based supervector matrix X = [x_1 x_2 ⋯ x_T].]

TABLE OF CONTENTS

1. Independent Component Analysis

2. Case Study I: Speech Recognition
   • Independent voices
   • Nonparametric likelihood ratio ICA

3. Case Study II: Blind Source Separation
   • Convex divergence ICA
   • Nonstationary Bayesian ICA
   • Online Gaussian process ICA

4. Summary

ICA Objective Function

• Negentropy
• Maximum likelihood
• Kurtosis
• Divergence measures
  − Kullback-Leibler (KL) divergence
  − Euclidean divergence
  − Cauchy-Schwarz divergence
  − α-divergence
  − Convex divergence (C-DIV)

Mutual Information & KL Divergence


• Mutual information between two variables is defined by using the Shannon entropy.

• It can be formulated as the KL divergence, or relative entropy, between the joint distribution and the product of the marginal distributions:

I(y_i ; y_j) = D_KL( p(y_i, y_j) ‖ p(y_i) p(y_j) ) = Σ p(y_i, y_j) log [ p(y_i, y_j) / ( p(y_i) p(y_j) ) ]
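For discrete distributions this relation is easy to verify numerically (the joint tables below are made-up examples):

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) = KL( p(x,y) || p(x) p(y) )
             = sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ]."""
    px = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p_xy > 0                        # 0 log 0 := 0
    return float((p_xy[mask] * np.log(p_xy[mask] / (px @ py)[mask])).sum())

# Independent variables: joint = product of marginals, so I = 0.
p_ind = np.outer([0.5, 0.5], [0.25, 0.75])
print(mutual_information(p_ind))   # 0.0

# Fully dependent (X = Y): I equals the Shannon entropy H(X) = log 2.
p_dep = np.array([[0.5, 0.0], [0.0, 0.5]])
print(mutual_information(p_dep))   # ≈ 0.693 (= log 2)
```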

Divergence Measures


• Euclidean divergence

• Cauchy-Schwarz divergence

• α-divergence

Divergence Measures


• f-divergence

• Jensen-Shannon divergence

Entropy is a concave function.

Convex Function

f(·): convex function


• A convex function must satisfy Jensen's inequality:

f(λ x_1 + (1 − λ) x_2) ≤ λ f(x_1) + (1 − λ) f(x_2),   0 ≤ λ ≤ 1

• A general convex function is defined over a convex combination with arbitrary weights.
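A quick numerical check of Jensen's inequality; the convex function f(x) = x log x (which underlies KL-type divergences) and the sample are chosen arbitrarily for illustration:

```python
import numpy as np

# Jensen's inequality for a convex f:  f(E[x]) <= E[f(x)].
# Applied to the empirical distribution of a sample, it holds exactly.
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 2.0, 10_000)

f = lambda v: v * np.log(v)   # convex on v > 0
print(f(x.mean()) <= f(x).mean())  # True
```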

Convex Divergence

• By assuming equal weights, we have the convex divergence (C-DIV).

• For a particular value of α, C-DIV is derived as a special case with the corresponding convex function.


Different Divergence Measures


Convex Divergence ICA

• C-ICA learning algorithm

• A nonparametric C-ICA is established by using the Parzen window density function.


Simulated Experiments

• A parametric demixing matrix

W = [ cos θ_1   sin θ_1
      cos θ_2   sin θ_2 ]

• Two sources: one super-Gaussian and one sub-Gaussian distribution
  − Source 1 (sub-Gaussian, kurtosis −1.13): uniform density p(s_1) = 1/2 for s_1 ∈ [−1, 1], and 0 otherwise
  − Source 2 (super-Gaussian, kurtosis 2.23): Laplacian-type density p(s_2) ∝ exp(−|s_2| / σ_2)

[Figure: separation results using KL-DIV, C-DIV (α = 1), and C-DIV (α = −1).]

Learning Curves


Experiments on Blind Source Separation

• One music signal and two speech signals from two male speakers were sampled from ICA’99 BSS Test Sets at http://sound.media.mit.edu/ica-bench/

• Mixing matrix

A = [ 0.8  0.2  0.3
      0.3  0.8  0.2
      0.3  0.7  0.3 ]

• Evaluation metric: signal-to-interference ratio (SIR)

SIR(dB) = 10 log10 [ Σ_{t=1}^{T} s_t² / Σ_{t=1}^{T} (y_t − s_t)² ]
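The SIR metric can be sketched directly (the signals here are synthetic stand-ins, not the benchmark data):

```python
import numpy as np

def sir_db(s_true, s_est):
    """Signal-to-interference ratio in dB:
    SIR = 10 log10( sum_t s_t^2 / sum_t (y_t - s_t)^2 )."""
    err = s_est - s_true
    return 10 * np.log10(np.sum(s_true**2) / np.sum(err**2))

# Additive interference whose energy is 1% of the signal energy
# gives exactly 20 dB.
rng = np.random.default_rng(0)
s = rng.laplace(size=5000)
noise = rng.normal(size=5000)
y = s + 0.1 * noise * np.sqrt(np.sum(s**2) / np.sum(noise**2))
print(round(sir_db(s, y), 1))  # 20.0
```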


Comparison of Different Methods

[Figure: separation results of PC-ICA and NC-ICA.]

TABLE OF CONTENTS

1. Independent Component Analysis

2. Case Study I: Speech Recognition
   • Independent voices
   • Nonparametric likelihood ratio ICA

3. Case Study II: Blind Source Separation
   • Convex divergence ICA
   • Nonstationary Bayesian ICA
   • Online Gaussian process ICA

4. Summary

Why Nonstationary Source Separation?

• Real-world blind source separation
  − the number of sources is unknown
  − BSS is a dynamic, time-varying system
  − the mixing process is nonstationary

• Why nonstationary?
  − a Bayesian method using ARD can determine the changing number of sources
  − recursive Bayesian learning allows online tracking of nonstationary conditions
  − a Gaussian process provides a nonparametric solution to represent the temporal structure of the time-varying mixing system


Nonstationary Mixing Systems

• Time-varying mixing matrix
• Source signals may abruptly appear or disappear

[Diagram: sources S_1, S_2, S_3 at distances d_1, d_2, d_3 from the sensors; as sources or sensors move, the mixing coefficients become time-varying, e.g. a_22(t+1) ≠ a_22(t) and a_23(t+1) ≠ a_23(t), and a disappearing source gives a_21(t+2) = 0.]

Nonstationary Bayesian (NB) Learning

• Maximum a posteriori estimation of NB-ICA parameters and compensation parameters


[Diagram: at each learning epoch t, the parameters estimated at epoch t − 1 serve as the prior, which is updated with the current data (prior updating) and propagated to epoch t + 1.]

Model Construction

• Noisy ICA model

• Likelihood function of an observation

• Distribution of model parameters

−source

−mixing matrix

−noise


Prior & Marginal Distributions

• Prior distributions−precision of noise

−precision of mixing matrix

• Marginal likelihood of NB-ICA model


Automatic Relevance Determination

• Detection of source signals

−number of sources can be determined


Compensation for Nonstationary ICA

• Prior density of compensation parameter−conjugate prior (Wishart distribution)


Graphical Model for NB-ICA


[Graphical model of NB-ICA: a plate over L epochs containing observations x_t^(l), sources s_t^(l) with parameters Π^(l), mixing matrix A^(l) with hyperparameters α^(l) and H^(l), noise precision R^(l), and compensation parameters whose hyperparameters are updated from epoch l − 1; inner plates of sizes M and N.]

Experiments

• Nonstationary blind source separation
  − ICA'99: http://sound.media.mit.edu/ica-bench/

• Scenarios
  − state of source signals: active or inactive
  − source signals or sensors are moving: nonstationary mixing matrix


Source Signals and ARD Curves


Blue: first source signal; red: second source signal.


TABLE OF CONTENTS

1. Independent Component Analysis

2. Case Study I: Speech Recognition
   • Independent voices
   • Nonparametric likelihood ratio ICA

3. Case Study II: Blind Source Separation
   • Convex divergence ICA
   • Nonstationary Bayesian ICA
   • Online Gaussian process ICA

4. Summary

Online Gaussian Process (OLGP)

• Basic ideas
  − incrementally detect the status of source signals and estimate the corresponding distributions from online observation data
  − the temporal structure of the time-varying mixing coefficients is characterized by a Gaussian process
  − a Gaussian process is a nonparametric model which defines the prior distribution over functions for Bayesian inference


Model Construction

• Noisy ICA model

• Likelihood function

• Distribution of model parameters
  − source

− noise

−P


Gaussian Process

• Mixing matrix
  − the mixing coefficients are generated by a latent function
  − a GP is adopted to describe the distribution of this latent function
  − the kernel function is specified by its hyperparameters
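One common concrete choice is the squared-exponential kernel; the sketch below (kernel form and hyperparameter values are assumptions, not the slides' exact model) draws smooth time-varying mixing coefficients from the GP prior:

```python
import numpy as np

def rbf_kernel(t1, t2, sigma_f=1.0, ell=10.0):
    """Squared-exponential covariance
    k(t, t') = sigma_f^2 exp(-(t - t')^2 / (2 ell^2));
    sigma_f and ell play the role of the kernel hyperparameters."""
    d = t1[:, None] - t2[None, :]
    return sigma_f**2 * np.exp(-0.5 * (d / ell) ** 2)

# Sample trajectories a(t) from the GP prior: each draw is one plausible
# smoothly varying mixing coefficient over time.
t = np.arange(100.0)
K = rbf_kernel(t, t) + 1e-8 * np.eye(t.size)   # jitter for stability
rng = np.random.default_rng(0)
a = rng.multivariate_normal(np.zeros(t.size), K, size=3)
print(a.shape)  # (3, 100)
```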


Graphical Model for OLGP-ICA


[Graphical model of OLGP-ICA: a plate over L frames containing observations x_t^(l), sources s_t^(l), noise ε_t^(l), and time-varying mixing matrix A_t^(l), whose GP and source hyperparameters are updated from frame l − 1; inner plates of sizes M and N.]

Experimental Setup

• Nonstationary source separation using source signals from http://www.kecl.ntt.co.jp/icl/signal/

• Nonstationary scenarios
  − status of source signals: active or inactive
  − source signals or sensors are moving: nonstationary mixing matrix


[Figure: separation results for the male speech, music, and female speech signals.]

Comparison of Different Methods

• Signal-to-interference ratios (SIRs) (dB)

Method           Demixed signal 1   Demixed signal 2
VB-ICA                 7.97              -3.23
BICA-HMM               9.04              -1.50
Switching-ICA         12.06              -4.82
Online VB-ICA         11.26               4.47
OLGP-ICA              17.24               9.96

Summary

• We presented a speaker adaptation method based on independent voices from the ICA perspective.

• A nonparametric likelihood ratio ICA was proposed according to hypothesis test theory.

• A convex divergence was developed as an optimization metric for the ICA algorithm.

• A nonstationary Bayesian ICA was proposed to deal with nonstationary mixing systems.

• An online Gaussian process ICA was presented for nonstationary and temporally correlated source separation.

• ICA methods could be extended to solve nonnegative matrix factorization and single-channel separation.


References

• J.-T. Chien and B.-C. Chen, “A new independent component analysis for speech recognition and separation”, IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 4, pp. 1245-1254, 2006.

• J.-T. Chien, H.-L. Hsieh and S. Furui, “A new mutual information measure for independent component analysis”, in Proc. ICASSP, pp. 1817-1820, 2008.

• H.-L. Hsieh, J.-T. Chien, K. Shinoda and S. Furui, “Independent component analysis for noisy speech recognition”, in Proc. ICASSP, pp. 4369-4372, 2009.

• H.-L. Hsieh and J.-T. Chien, “Online Bayesian learning for dynamic source separation”, in Proc. ICASSP, pp. 1950-1953, 2010.

• H.-L. Hsieh and J.-T. Chien, “Online Gaussian process for nonstationary speech separation”, in Proc. INTERSPEECH, pp. 394-397, 2010.

• H.-L. Hsieh and J.-T. Chien, “Nonstationary and temporally-correlated source separation using Gaussian process”, in Proc. ICASSP, pp. 2120-2123, 2011.

• J.-T. Chien and H.-L. Hsieh, “Convex divergence ICA for blind source separation”, IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 1, pp. 290-301, 2012.



Thanks to

H.-L. Hsieh K. Shinoda S. Furui

jtchien@nctu.edu.tw
