Independent Component Analysis


Independent Component Analysis

Lecturer: 虞台文

Content

What is ICA?

Nongaussianity Measurement — Kurtosis

ICA by Maximization of Nongaussianity

Gradient and FastICA Algorithms Using Kurtosis

Measuring Nongaussianity by Negentropy

FastICA Using Negentropy

Independent Component Analysis

What is ICA?

Motivation

Example: three people are speaking simultaneously in a room that has three microphones.

Denote the microphone signals by x1(t), x2(t), and x3(t).

They are mixtures of sources s1(t), s2(t), and s3(t).

The goal is to estimate the original speech signals using only the recorded signals.

This is called the cocktail-party problem.

The mixing model, written out:

x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t)
x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t)
x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)

The Cocktail-Party Problem

[Figure: the original speech signals and the mixed speech signals]

The Cocktail-Party Problem

[Figure: the original speech signals and the estimated sources]

The Problem

In matrix form, x = As.

Find the sources s1(t), s2(t), and s3(t), and the coefficients aij, from the observed signals x1(t), x2(t), and x3(t).

It turns out that the problem can be solved just by assuming that the sources si(t) are nongaussian and statistically independent.

The solution has the form

s1(t) = b11 x1(t) + b12 x2(t) + b13 x3(t)
s2(t) = b21 x1(t) + b22 x2(t) + b23 x3(t)
s3(t) = b31 x1(t) + b32 x2(t) + b33 x3(t)

i.e., s = A^{-1} x = Bx.
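To make the model concrete, here is a minimal NumPy sketch (mine, not the slides') that mixes two nongaussian sources with an assumed 2x2 matrix A and inverts it; in the real problem A is unknown, and B must be estimated from x alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two nongaussian (uniform, zero-mean, unit-variance) sources.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 1000))

A = np.array([[5.0, 10.0],
              [10.0, 2.0]])    # assumed mixing matrix; unknown in practice
x = A @ s                      # observed mixtures: x = As

B = np.linalg.inv(A)           # if A were known, B = A^{-1} recovers the sources
s_hat = B @ x
print(np.allclose(s_hat, s))   # True
```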

Applications

Cocktail-party problem: separation of voices, music, or sounds

Sensor array processing, e.g., radar

Biomedical signal processing with multiple sensors: EEG, ECG, MEG, fMRI

Telecommunications: e.g., multiuser detection in CDMA

Financial and other time series

Noise removal from signals and images

Feature extraction for images and signals

Brain modelling

Basic ICA Model

xi(t) = ai1 s1(t) + ai2 s2(t) + … + ain sn(t),   i = 1, 2, …, n

The xi(t) are the mixed signals (observable); the si(t) are latent variables (the independent components).

[Figure: joint density p(x1, x2) of two mixtures x1(t), x2(t), with marginal densities p(x1) and p(x2)]

x = As

The Basic Assumptions

The independent components are assumed statistically independent.

The independent components must have nongaussian distributions.

For simplicity, we assume that the unknown mixing matrix A is square.


Assumption I: Statistical Independence

Basically, random variables y1, y2, …, yn are said to be independent if information on the value of yi does not give any information on the value of yj for i ≠ j.

Mathematically, the joint pdf is factorizable in the following way:

p(y1, y2, …, yn) = p1(y1) p2(y2) … pn(yn)

Note that uncorrelatedness does not necessarily imply independence.


Assumption II: Nongaussian Distributions

Note that in the basic model we do not have to know what the nongaussian distributions of the ICs look like.


Assumption III: The Mixing Matrix Is Square

In other words, the number of independent components equals the number of observed mixtures. This simplifies the discussion in the first stage.

However, in the basic ICA model this is not a restriction, as long as the number of observations xi is at least as large as the number of sources sj.


Ambiguities of ICA

We cannot determine the variances (energies) of IC’s.

– Hence we center x so that E[x] = 0 and fix the variances of the ICs; the sign of each si remains indeterminate.

We cannot determine the order of IC’s.


x = Σ_{i=1}^{n} (ai / αi)(αi si)

Any scalar αi multiplying si can be canceled by dividing the corresponding column ai of A, so the scale of each si is unobservable. Therefore, we assume

E[si] = 0,   E[si^2] = 1.

Likewise, x = (A P^{-1})(P s) for any permutation matrix P, so the order of the ICs is unobservable.

Illustration of ICA

p(si) = 1/(2√3) for |si| ≤ √3, and 0 otherwise   (uniform, zero-mean, unit-variance sources)

Mixing with

A = ( 5  10 )
    ( 10  2 )

gives x = As.

Whitening Is Only Half of ICA

z = Vx,   where V is the whitening matrix and x = As.

Whitening Is Only Half of ICA

z = Vx

[Figure: marginal density p(zi) of the whitened data]

By whitening, we have E[zz^T] = I. This, however, does not imply that the zi are independent; we may still have

p(z1, z2, …, zn) ≠ Π_{i=1}^{n} p(zi)

Uncorrelatedness is related to independence, but is weaker than independence.
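As an illustration (one common construction, not fixed by the slides), whitening can be computed from an eigendecomposition of the sample covariance, V = E D^{-1/2} E^T:

```python
import numpy as np

def whiten(x):
    """Return z = Vx with sample covariance ~ I, and the whitening matrix V."""
    x = x - x.mean(axis=1, keepdims=True)   # center
    cov = np.cov(x)                         # rows of x are the variables
    d, E = np.linalg.eigh(cov)              # cov = E diag(d) E^T
    V = E @ np.diag(d ** -0.5) @ E.T        # V = E D^{-1/2} E^T
    return V @ x, V

# usage: z, V = whiten(x); np.cov(z) is close to the identity, but the
# components of z are still not independent in general.
```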

Independent Component Analysis

After whitening (z = Vx), the central limit theorem implicitly tells us that adding components together makes a distribution 'more' Gaussian. Therefore, nongaussianity is an important criterion for ICA.

Degaussianization is hence the central theme of ICA.

Independent Component Analysis

Nongaussianity Measurement — Kurtosis

Moments

The jth moment:   E[X^j] = ∫ x^j p(x) dx

Mean:   m_x = E[X]

The jth central moment:   E[(X − m_x)^j] = ∫ (x − m_x)^j p(x) dx

Variance:   σ_x^2 = E[(X − m_x)^2]

Skewness:   skew(X) = E[(X − m_x)^3]

Moment Generating Function

The moment generating function MX(t) of a random variable X is defined by:

M_X(t) = E[e^{tX}] = ∫ e^{tx} p(x) dx

For X ~ N(μ, σ^2):   M_X(t) = e^{μt + σ^2 t^2 / 2}

For Z ~ N(0, 1):   M_Z(t) = e^{t^2 / 2}

Expanding the exponential:

M_X(t) = E[e^{tX}] = 1 + E[X] t/1! + E[X^2] t^2/2! + E[X^3] t^3/3! + ⋯

Standard Normal Distribution N(0, 1)

M_Z(t) = e^{t^2/2} = 1 + t^2/(2 · 1!) + t^4/(2^2 · 2!) + t^6/(2^3 · 3!) + ⋯

Matching coefficients with M_Z(t) = 1 + E[Z] t/1! + E[Z^2] t^2/2! + ⋯ gives

E[Z^{2k−1}] = 0   (zero for all odd moments)

E[Z^{2k}] = (2k)! / (2^k k!)

In particular, E[Z^2] = 1 and E[Z^4] = 3.
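A quick Monte Carlo check of these moment formulas (illustration only):

```python
import math
import numpy as np

z = np.random.default_rng(1).standard_normal(1_000_000)
for k in (1, 2, 3):
    exact = math.factorial(2 * k) / (2 ** k * math.factorial(k))
    print(f"E[Z^{2 * k}] ~ {np.mean(z ** (2 * k)):.3f}, exact = {exact}")
# E[Z^2] = 1, E[Z^4] = 3, E[Z^6] = 15
```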

Kurtosis

The kurtosis of a zero-mean random variable X is defined by

kurt(X) = E[X^4] − 3 (E[X^2])^2

The normalized kurtosis:

κ̃(X) = E[X^4] / (E[X^2])^2 − 3

For a standard Gaussian variable, kurt(Z) = E[Z^4] − 3 = 0.

Gaussianity

Gaussian:   e.g., p(x) = (1/√(2π)) exp[−x^2/2]

Supergaussian:   e.g., the Laplacian density p(x) = (λ/2) exp[−λ|x|]

Subgaussian:   e.g., the uniform density p(x) = 1/(2a), x ∈ [−a, a]

Kurtosis for Supergaussian

Consider the Laplacian distribution:   p(x) = (λ/2) exp[−λ|x|]

E[X] = 0

E[X^2] = (λ/2) ∫ x^2 exp[−λ|x|] dx = λ ∫_0^∞ x^2 e^{−λx} dx = 2/λ^2

E[X^4] = (λ/2) ∫ x^4 exp[−λ|x|] dx = λ ∫_0^∞ x^4 e^{−λx} dx = 24/λ^4

kurt(X) = E[X^4] − 3 (E[X^2])^2 = 24/λ^4 − 12/λ^4 = 12/λ^4 > 0

κ̃(X) = (24/λ^4) / (4/λ^4) − 3 = 3 > 0


Kurtosis for Subgaussian

Consider the uniform distribution:   p(x) = 1/(2a), x ∈ [−a, a]

E[X] = 0

E[X^2] = (1/(2a)) ∫_{−a}^{a} x^2 dx = a^2/3

E[X^4] = (1/(2a)) ∫_{−a}^{a} x^4 dx = a^4/5

kurt(X) = a^4/5 − 3 (a^2/3)^2 = −(2/15) a^4 < 0

κ̃(X) = (a^4/5) / (a^2/3)^2 − 3 = 9/5 − 3 = −6/5 < 0

Nongaussianity Measurement By Kurtosis

Kurtosis, or rather its absolute value, has been widely used as a measure of nongaussianity in ICA and related fields.

Computationally, kurtosis can be estimated simply from the 4th moment of the sample data (if the variance is kept constant):

kurt(X) = E[X^4] − 3 (E[X^2])^2
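A minimal sketch of that estimator, checked against the three example densities above (expected values: about 0 for the Gaussian, +3 for the unit-variance Laplacian, −1.2 for the uniform on [−√3, √3]):

```python
import numpy as np

def kurt(x):
    """Sample kurtosis kurt(X) = E[X^4] - 3 (E[X^2])^2 of a zero-mean sample."""
    x = np.asarray(x) - np.mean(x)   # center, as the definition assumes
    return np.mean(x ** 4) - 3 * np.mean(x ** 2) ** 2

rng = np.random.default_rng(2)
print(kurt(rng.standard_normal(10 ** 6)))                     # ~ 0    (Gaussian)
print(kurt(rng.laplace(scale=1 / np.sqrt(2), size=10 ** 6)))  # ~ +3   (supergaussian)
print(kurt(rng.uniform(-np.sqrt(3), np.sqrt(3), 10 ** 6)))    # ~ -1.2 (subgaussian)
```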

Properties of Kurtosis

Let X1 and X2 be two independent random variables, both with zero mean. Then

kurt(X1 + X2) = kurt(X1) + kurt(X2)

kurt(αX1) = α^4 kurt(X1)
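Both properties are easy to verify numerically; a sketch using the same example distributions:

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.laplace(scale=1 / np.sqrt(2), size=10 ** 6)   # kurt ~ +3
x2 = rng.uniform(-np.sqrt(3), np.sqrt(3), 10 ** 6)     # kurt ~ -1.2

def kurt(x):
    x = x - x.mean()
    return np.mean(x ** 4) - 3 * np.mean(x ** 2) ** 2

print(kurt(x1 + x2), kurt(x1) + kurt(x2))   # additivity: both ~ +1.8
print(kurt(2 * x1), 2 ** 4 * kurt(x1))      # scaling:    both ~ +48
```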

Independent Component Analysis

ICA By Maximization of Nongaussianity

Restate the Problem

x = As

x: zero-mean mixtures (observable)
A: mixing matrix (unknown)
s: zero-mean, unit-variance ICs (latent; unknown)

Ultimate goal:   s = A^{-1} x.   How?

Simplification

x = As; the ultimate goal s = A^{-1} x is approached via whitening. For simplicity, we assume the sources are i.i.d.

We estimate one independent component by

y = b^T x = b^T A s = q^T s

If b is properly identified, q^T = b^T A contains only one nonzero entry, with value one. This implies that b is one row of A^{-1}.

Nongaussian Is Independent

We estimate one independent component by

y = b^T x = b^T A s = q^T s

and we take the b that maximizes the nongaussianity of b^T x.

Nongaussian Is Independent

[Figure: scatter plot of the two sources (s1, s2)]

Mixing with A = (5 10; 10 2):   x = As

Nongaussian Is Independent

x = As,   z = Vx (whitening)

[Figure: scatter plot of the whitened data z]

Nongaussian Is Independent

z = Vx

[Figure: marginal densities p(zi) of the whitened data]

A sum of components is more Gaussian than the individual components.

Nongaussian Is Independent

y = Wz, with z = Vx and W = (w1, w2)^T: a rotation of the whitened data.

[Figure: scatter plot of (y1, y2) after the rotation]

Nongaussian Is Independent

[Figure: estimated marginal density p(yi) of the rotated data y = Wz, and the scatter of (y1, y2)]

Nongaussian Is Independent

Consider estimating one independent component:

y = w^T z = w^T VAs = q^T s,   with b^T = w^T V and q = (VA)^T w.

Whitening makes VA orthogonal (I = E[zz^T] = (VA)(VA)^T), so

||q||^2 = w^T (VA)(VA)^T w = ||w||^2 = 1.

Nongaussian Is Independent

Project the whitened data onto a unit vector w to get an independent component:

y = w^T z,   with ||q||^2 = ||w||^2 = 1.

Nongaussian Is Independent

In the two-source case,

y = q1 s1 + q2 s2

kurt(y) = kurt(q1 s1) + kurt(q2 s2) = q1^4 kurt(s1) + q2^4 kurt(s2)

We require that

E[y^2] = q1^2 Var(s1) + q2^2 Var(s2) = q1^2 + q2^2 = 1,

so the search space is the unit circle q1^2 + q2^2 = 1. If kurt(s1) = kurt(s2) = c, then kurt(y) = c (q1^4 + q2^4), and |kurt(y)| is maximized on the circle exactly where one of the qi is ±1 and the other is 0, i.e., where y = ±si. We use kurtosis as the nongaussianity measurement.
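A small numeric check (illustration only) that |kurt(y)| = |c|(q1^4 + q2^4) is maximized on the unit circle at multiples of π/2, where one qi is ±1:

```python
import numpy as np

# kurt(y) = c (q1^4 + q2^4) on the circle q1^2 + q2^2 = 1;
# c = -1.2 is the value for two uniform sources.
c = -1.2
theta = np.linspace(0, 2 * np.pi, 3601)
q1, q2 = np.cos(theta), np.sin(theta)
objective = np.abs(c * (q1 ** 4 + q2 ** 4))
print(theta[np.argmax(objective)] / (np.pi / 2))   # an integer: some qi = +/-1
```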

Independent Component Analysis

Gradient Algorithm Using Kurtosis

Criterion for ICA Using Kurtosis

maximize |kurt(w^T z)|   subject to ||w||^2 = 1,   where kurt(X) = E[X^4] − 3 (E[X^2])^2.

For whitened z, E[(w^T z)^2] = ||w||^2, so

∂|kurt(w^T z)| / ∂w = ∂/∂w |E[(w^T z)^4] − 3 ||w||^4|
                    = 4 sign(kurt(w^T z)) { E[z (w^T z)^3] − 3 w ||w||^2 }

Gradient Algorithm

∂|kurt(w^T z)| / ∂w = 4 sign(kurt(w^T z)) { E[z (w^T z)^3] − 3 w ||w||^2 }

The term −3 w ||w||^2 points along w, so it only changes the length of w and is unrelated to the search direction; w is renormalized after every step anyway. The gradient algorithm:

Δw ∝ sign(kurt(w^T z)) E[z (w^T z)^3];   w ← w / ||w||

FastICA Algorithm

At a stable point, the gradient must point in the direction of w, i.e., w ∝ E[z (w^T z)^3] − 3 ||w||^2 w. Using fixed-point iteration (the sign is not important, and ||w|| = 1):

FastICA:   w ← E[z (w^T z)^3] − 3w;   w ← w / ||w||
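A sketch of this fixed-point update in NumPy; the input z is assumed to be whitened, and the convergence test exploits the sign ambiguity (w and −w are equivalent):

```python
import numpy as np

def fastica_kurtosis(z, n_iter=100, seed=0):
    """One-unit FastICA using kurtosis: w <- E[z (w^T z)^3] - 3w, then normalize.

    z: whitened data of shape (n, T), so that E[zz^T] = I.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z                                   # current estimate w^T z
        w_new = (z * y ** 3).mean(axis=1) - 3 * w   # fixed-point update
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1) < 1e-10:         # w and -w are equivalent
            return w_new
        w = w_new
    return w                                        # y = w^T z is one estimated IC
```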

Independent Component Analysis

Measuring Nongaussianity by Negentropy

Critique of Kurtosis

Kurtosis can be very sensitive to outliers: the estimate of kurt(X) = E[X^4] − 3 (E[X^2])^2 may depend on only a few observations in the tails of the distribution.

Kurtosis is therefore not a robust measure of nongaussianity.

Negentropy

Differential entropy:   H(X) = −∫ p_X(x) log p_X(x) dx

Negentropy:   J(X) = H(X_gauss) − H(X) ≥ 0

Negentropy is zero only when the random variable is Gaussian distributed.

For an n-dimensional Gaussian with covariance Σ,

H(X_gauss) = (1/2) log det Σ + (n/2) [1 + log 2π]

Negentropy is invariant under invertible linear transformations.

Approximation of Negentropy (I)

For a zero-mean, unit-variance random variable,

J(X) ≈ κ3(x)^2 / 12 + κ4(x)^2 / 48

where κ3(x) = E[X^3] is the skewness and κ4(x) = E[X^4] − 3 is the kurtosis.

This approximation is of little help, however, because it is just as sensitive to outliers as kurtosis itself.

Approximation of Negentropy (II)

J(X) ≈ k1 (E[G1(X)])^2 + k2 (E[G2(X)] − E[G2(Z)])^2

Choose two nonpolynomial functions: G1(x) odd, measuring asymmetry, and G2(x) even, measuring bimodality vs. a peak at zero, normalized so that, e.g.,

E[G2(Z)^2] = (1/√(2π)) ∫ G2(z)^2 exp[−z^2/2] dz = 1.

The first term is zero if the underlying density is symmetric. Usually, only the second term is used.

Approximation of Negentropy (II)

If only an even nonpolynomial function, say G, is used, we have

J(X) ≈ (E[G(X)] − E[G(Z)])^2

The following two functions are useful:

G1(x) = (1/a1) log cosh(a1 x),   1 ≤ a1 ≤ 2

G2(x) = −exp[−x^2/2]

(compare G3(x) = x^4, which reproduces kurtosis)
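A sketch of the resulting one-unit estimate J(X) ≈ (E[G(X)] − E[G(Z)])^2 with G = G1 and a1 = 1 (function names are mine; E[G(Z)] is sampled rather than tabulated):

```python
import numpy as np

def negentropy_approx(x, a1=1.0, seed=4):
    """J(X) ~ (E[G(X)] - E[G(Z)])^2 with G(x) = (1/a1) log cosh(a1 x).

    x is assumed to be zero-mean and unit-variance; E[G(Z)] is estimated
    here by Monte Carlo rather than a closed form.
    """
    G = lambda u: np.log(np.cosh(a1 * u)) / a1
    z = np.random.default_rng(seed).standard_normal(10 ** 6)
    return (np.mean(G(x)) - np.mean(G(z))) ** 2

rng = np.random.default_rng(5)
print(negentropy_approx(rng.standard_normal(10 ** 6)))                     # ~ 0
print(negentropy_approx(rng.laplace(scale=1 / np.sqrt(2), size=10 ** 6)))  # > 0
```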

Degaussianization

J(X) ≈ (E[G(X)] − E[G(Z)])^2

For ICA, we want to maximize this quantity. Specifically, let z = Vx be the whitened data. For one-unit ICA, we want to find a rotation, say w, to

maximize   J(w) = k (E[G(w^T z)] − E[G(Z)])^2

subject to   ||w||^2 = 1

Gradient Algorithm

maximize   J(w) = k (E[G(w^T z)] − E[G(Z)])^2   subject to   ||w||^2 = 1

Fact: E[(w^T z)^2] = ||w||^2 = 1.

With g = G′, the gradient is

∂J/∂w = γ E[z g(w^T z)],   γ = 2k { E[G(w^T z)] − E[G(Z)] }   (treated as a constant)

Algorithm (batch mode):   Δw ∝ γ E[z g(w^T z)];   w ← w / ||w||

Algorithm (on-line mode):   Δw ∝ γ z g(w^T z);   w ← w / ||w||

Analysis

maximize   J(w) = k (E[G(w^T z)] − E[G(Z)])^2   subject to   ||w||^2 = 1

Consider the term inside the braces:

f(X) = E[G(X)] − E[G(Z)]

with G one of

G1(x) = (1/a1) log cosh(a1 x)
G2(x) = −exp[−x^2/2]
G3(x) = x^4

For the functions G used here (G1 and G2), f has the property

f(X) < 0 if X is supergaussian
f(X) > 0 if X is subgaussian


Hence, to maximize J = k f(w^T z)^2: minimize E[G(w^T z)] if the IC is supergaussian; maximize E[G(w^T z)] if the IC is subgaussian.

Analysis

The derivatives g = G′ of the contrast functions:

g1(x) = tanh(a1 x)
g2(x) = x exp[−x^2/2]
g3(x) = 4x^3

Both g1 and g2 are less sensitive to outliers than g3.

Analysis

Algorithm (batch mode):   Δw ∝ γ E[z g(w^T z)];   w ← w / ||w||

Algorithm (on-line mode):   Δw ∝ γ z g(w^T z);   w ← w / ||w||

γ = 2k { E[G(w^T z)] − E[G(Z)] } controls the search direction; its sign depends on the super/subgaussianity of the samples.

The nonlinearity g(w^T z) weights the samples.

Stability Analysis

Assume that the input data follow the ICA model with whitened data z = VAs, and that G is a sufficiently smooth even function. Then the local maxima (resp. minima) of E[G(w^T z)] under the constraint ||w|| = 1 include the rows of the inverse of the mixing matrix VA for which the corresponding independent components si satisfy

E[si g(si) − g′(si)] > 0   (resp. < 0).

Stability Analysis

Combining the two cases: a row of (VA)^{-1} gives a stable extremum whenever

E[si g(si) − g′(si)] · { E[G(si)] − E[G(Z)] } > 0.

This condition is, in general, true for reasonable choices of G.
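The condition is easy to probe numerically with g = tanh; under the unit-variance examples used earlier, the term is positive for a supergaussian source and negative for a subgaussian one (sketch):

```python
import numpy as np

def stability_term(s):
    """E[s g(s) - g'(s)] with g = tanh (i.e., a1 = 1)."""
    t = np.tanh(s)
    return np.mean(s * t - (1 - t ** 2))

rng = np.random.default_rng(6)
print(stability_term(rng.laplace(scale=1 / np.sqrt(2), size=10 ** 6)))  # > 0 (supergaussian)
print(stability_term(rng.uniform(-np.sqrt(3), np.sqrt(3), 10 ** 6)))    # < 0 (subgaussian)
print(stability_term(rng.standard_normal(10 ** 6)))                     # ~ 0 (Gaussian)
```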

Independent Component Analysis

FastICA Using Negentropy

Clue From Gradient Algorithm

Gradient algorithm:   Δw ∝ γ E[z g(w^T z)] (batch mode) or Δw ∝ γ z g(w^T z) (on-line mode);   w ← w / ||w||

This suggests the fixed-point iteration

w ← E[z g(w^T z)];   w ← w / ||w||

However, nonpolynomial moments do not have the same nice algebraic properties as kurtosis, and such an iteration scheme is poor.

Newton’s Method

Maximize or minimize E[G(w^T z)] subject to ||w||^2 = 1.

Construct the Lagrangian:

L(w) = E[G(w^T z)] + (β/2)(w^T w − 1)

Newton's method finds an extreme point by iterating

w ← w − (∂^2 L / ∂w^2)^{-1} (∂L / ∂w).

Newton’s Methodwwzwzw TTT GEL )]([)( wwzwzw TTT GEL )]([)(

w

zw

w

zww

)()(

1

2

2 TT LL

Newton’s method finds an extreme point of the by letting:

wzwzw

zw

)]([)( T

T

gEL

Izwzzw

zw

)]([

)(2

2TT

T

gEL

wzwzIzwzzw

)]([)]([1 TTT gEgE

Evaluate the Hessian matrix and its inverse

is time consuming.We want to

approximate it.

Newton’s Methodwwzwzw TTT GEL )]([)( wwzwzw TTT GEL )]([)(

wzwzw

zw

)]([)( T

T

gEL

Izwzzw

zw

)]([

)(2

2TT

T

gEL

wzwzIzwzzw

)]([)]([1 TTT gEgE

Izwzwzzzwzz )]([)]([][)]([ TTTTT gEgEEgE

IzwIzwzz )]([)]([ TTT gEgE A diagonal matix

Newton’s Methodwwzwzw TTT GEL )]([)( wwzwzw TTT GEL )]([)(

wzwzw

zw

)]([)( T

T

gEL

Izwzzw

zw

)]([

)(2

2TT

T

gEL

wzwzIzwzzw

)]([)]([1 TTT gEgE

IzwIzwzz )]([)]([ TTT gEgE A diagonal matix

1)]([)]([

zwwzwzw TT gEgE

FastICA

Multiplying the step through by ( E[g′(w^T z)] + β ) and dropping the overall sign and scale (w is renormalized anyway),

w ( E[g′(w^T z)] + β ) − ( E[z g(w^T z)] + βw ) = E[g′(w^T z)] w − E[z g(w^T z)],

so the update simplifies to the algorithm:

w ← E[z g(w^T z)] − E[g′(w^T z)] w;   w ← w / ||w||

FastICA

1. Center the data to make its mean zero.

2. Whiten the data to give z.

3. Choose an initial vector w of unit norm.

4. Update w ← E[z g(w^T z)] − E[g′(w^T z)] w.

5. Normalize w ← w / ||w||.

6. If not converged, go back to step 4.
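Putting steps 1-6 together, a compact one-unit sketch with g = tanh (so g′(u) = 1 − tanh^2 u); names and defaults are mine, not the slides':

```python
import numpy as np

def fastica_one_unit(x, n_iter=200, tol=1e-10, seed=0):
    """One-unit FastICA with g = tanh, following steps 1-6 above."""
    x = x - x.mean(axis=1, keepdims=True)              # 1. center
    d, E = np.linalg.eigh(np.cov(x))                   # 2. whiten: z = Vx
    z = (E @ np.diag(d ** -0.5) @ E.T) @ x
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])                # 3. initial unit vector
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = np.tanh(w @ z)                             # g(w^T z)
        w_new = (z * y).mean(axis=1) - (1 - y ** 2).mean() * w   # 4. update
        w_new /= np.linalg.norm(w_new)                 # 5. normalize
        if abs(abs(w_new @ w) - 1) < tol:              # 6. converged? (+/-w equal)
            return w_new, z
        w = w_new
    return w, z                                        # w^T z estimates one IC
```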

FastICA

Contrast functions and their derivatives:

G1(x) = (1/a1) log cosh(a1 x);   g1(x) = tanh(a1 x);   g1′(x) = a1 (1 − tanh^2(a1 x))

G2(x) = −exp[−x^2/2];   g2(x) = x exp[−x^2/2];   g2′(x) = (1 − x^2) exp[−x^2/2]

G3(x) = x^4;   g3(x) = 4x^3;   g3′(x) = 12x^2

[Figure: plots of the contrast functions G1, G2, G3 and of their derivatives g1, g2, g3 over −4 ≤ x ≤ 4]

Estimating Several IC’s

Deflation Orthogonalization – based on the Gram-Schmidt method.

Symmetric Orthogonalization – adjust all vectors in parallel.

Deflation Orthogonalization

1. Center the data to make its mean zero.

2. Whiten the data to give z.

3. Choose m, the number of ICs to estimate; set the counter p ← 1.

4. Choose an initial vector wp of unit norm, randomly.

5. Update wp ← E[z g(wp^T z)] − E[g′(wp^T z)] wp.

6. Orthogonalize: wp ← wp − Σ_{j=1}^{p−1} (wp^T wj) wj.

7. Normalize: wp ← wp / ||wp||.

8. If wp has not converged, go back to step 5.

9. Set p ← p + 1; if p ≤ m, go back to step 4.
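A sketch of the deflation scheme with g = tanh, reusing the one-unit update; a fixed iteration count stands in for the convergence test of step 8:

```python
import numpy as np

def fastica_deflation(z, m, n_iter=200, seed=0):
    """Estimate m ICs from whitened z (shape (n, T)) by deflation."""
    rng = np.random.default_rng(seed)
    W = []
    for _ in range(m):
        w = rng.standard_normal(z.shape[0])           # step 4: random unit vector
        w /= np.linalg.norm(w)
        for _ in range(n_iter):                       # steps 5-8
            y = np.tanh(w @ z)
            w = (z * y).mean(axis=1) - (1 - y ** 2).mean() * w   # step 5
            for wj in W:                              # step 6: Gram-Schmidt
                w -= (w @ wj) * wj
            w /= np.linalg.norm(w)                    # step 7
        W.append(w)
    return np.array(W)   # rows are the estimated unmixing directions; y = Wz
```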

Symmetric Orthogonalization

1. Choose the number of independent components to estimate, say m.

2. Initialize the wi, i = 1, …, m.

3. Do one iteration of the one-unit algorithm on every wi in parallel.

4. Do a symmetric orthogonalization of the matrix W = (w1, …, wm)^T.

5. If not converged, go back to step 3.

Symmetric Orthogonalization

Method 1 (Classic):

W ← (W W^T)^{-1/2} W

Method 2 (Iterative; recommended):

1. Let W ← W / ||W||.

2. Let W ← (3/2) W − (1/2) W W^T W.

3. If W W^T is not close enough to the identity, go back to step 2.
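Both methods are a few lines of NumPy (sketch; an eigendecomposition implements the inverse square root in Method 1):

```python
import numpy as np

def sym_orth_classic(W):
    """Method 1: W <- (W W^T)^{-1/2} W via eigendecomposition."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def sym_orth_iterative(W, tol=1e-12):
    """Method 2: normalize, then iterate W <- 1.5 W - 0.5 W W^T W until W W^T ~ I."""
    W = W / np.linalg.norm(W)   # Frobenius norm bounds the spectral norm,
                                # which keeps the iteration convergent
    while np.linalg.norm(W @ W.T - np.eye(W.shape[0])) > tol:
        W = 1.5 * W - 0.5 * W @ W.T @ W
    return W
```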