Introduction to Machine Learning: Independent Component Analysis
Barnabás Póczos
Slides: 10701/slides/16_ICA.pdf




Independent Component Analysis


Independent Component Analysis

[Figure: original signals; observations (mixtures) produced by the model; ICA-estimated signals.]


Independent Component Analysis

Model: x = As, where s has independent components and A is an unknown mixing matrix.
We observe: the mixtures x only.
We want: the sources s and the demixing matrix W (the estimate of A^{-1}).
Goal: estimate W so that y = Wx recovers s.
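In symbols (the equations on this slide were images in the source; this is the standard noiseless ICA model the rest of the deck uses):

```latex
\underbrace{x}_{\text{observed}} \;=\; A\,\underbrace{s}_{\text{independent sources}},
\qquad x, s \in \mathbb{R}^N,\;\; A \in \mathbb{R}^{N\times N}\ \text{invertible},
\\[4pt]
\text{Goal: find } W \approx A^{-1} \text{ so that } y = Wx \approx s
\ \text{(up to permutation and scaling of the components).}
```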


The Cocktail Party Problem: Solving with PCA

Sources: s(t)
Mixing (observation): x(t) = A s(t)
PCA estimation: y(t) = W x(t)


The Cocktail Party Problem: Solving with ICA

Sources: s(t)
Mixing (observation): x(t) = A s(t)
ICA estimation: y(t) = W x(t)


ICA vs PCA, Similarities

• Both perform linear transformations
• Both are matrix factorizations:
  PCA: X = US (U is N x M with M < N): low rank matrix factorization for compression
  ICA: X = AS (A is N x N): full rank matrix factorization to remove dependency among the rows
• Columns of U = PCA vectors; columns of A = ICA vectors


ICA vs PCA, Differences

• PCA: X = US, with U^T U = I; ICA: X = AS, with A invertible
• PCA does compression (M < N); ICA does not compress (same number of features, M = N)
• PCA just removes correlations, not higher-order dependence; ICA removes correlations and higher-order dependence
• PCA: some components are more important than others (based on eigenvalues); ICA: components are equally important
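To make the "PCA removes only correlations, not higher-order dependence" point concrete, here is a small NumPy sketch (my own illustration, not from the slides; the signals, mixing matrix, and variable names are made up): PCA whitening of two mixed signals makes the rows exactly uncorrelated, yet each row remains a mixture of both sources.

```python
import numpy as np

t = np.linspace(0, 8, 2000)
# Two independent, non-Gaussian sources: a sine and a square wave.
S = np.vstack([np.sin(3 * t), np.sign(np.sin(5 * t))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # made-up mixing matrix
X = A @ S                               # observed mixtures

# PCA whitening: project onto covariance eigenvectors and rescale.
Xc = X - X.mean(axis=1, keepdims=True)
eigval, U = np.linalg.eigh(np.cov(Xc))
Z = np.diag(eigval ** -0.5) @ U.T @ Xc  # whitened: E[zz^T] = I

# The rows of Z are exactly uncorrelated ...
print(np.round(np.cov(Z), 6))
# ... but still statistically dependent: for independent unit-variance
# rows, E[z1^2 z2^2] would equal 1; here it deviates, because each
# whitened row is still a mixture of both sources.
print(np.mean(Z[0] ** 2 * Z[1] ** 2))
```

Removing this remaining higher-order dependence is exactly the part of the job left for ICA.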


ICA vs PCA

Note:
• PCA vectors are orthogonal
• ICA vectors need not be orthogonal


ICA vs PCA


ICA basis vectors extracted from natural images

These resemble Gabor wavelets, edge detectors, and the receptive fields of V1 cells; similar filters also emerge in deep neural networks.


PCA basis vectors extracted from natural images


ICA Application: Removing Artifacts from EEG

EEG ~ neural cocktail party. EEG activity is severely contaminated by:
• eye movements
• blinks
• muscle activity
• heart (ECG) artifact
• vessel pulse
• electrode noise
• line noise, alternating current (60 Hz)

ICA can improve the signal: it can effectively detect, separate, and remove activity in EEG records coming from a wide variety of artifactual sources (Jung, Makeig, Bell, and Sejnowski). The ICA weights (mixing matrix) help find the location of the sources.


ICA Application: Removing Artifacts from EEG

[Figure from Jung.]


Removing Artifacts from EEG

[Figure from Jung.]


ICA for Image Denoising (Hoyer, Hyvarinen)

[Figure: original, noisy, Wiener-filtered, median-filtered, and ICA-denoised images.]


ICA for Motion Style Components
(Mori & Hoshino 2002; Shapiro et al. 2006; Cao et al. 2003)

A method for the analysis and synthesis of human motion from motion-captured data; it provides perceptually meaningful "style" components.

109 markers (327-dimensional data); motion capture ⇒ data matrix.
Goal: find motion style components.
ICA ⇒ 6 independent components (emotion, content, ...)


[Figure: synthesized motions: walk, sneaky, walk with sneaky style, sneaky with walk style.]


ICA Theory


Statistical (In)dependence

Definition (Independence)
Definition (Shannon entropy)
Definition (KL divergence)
Definition (Mutual Information)
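The formulas behind these definitions were images in the source; these are the standard versions:

```latex
\text{Independence: } p(y_1,\dots,y_N) = \prod_{i=1}^{N} p_i(y_i)
\\[4pt]
\text{Shannon entropy: } H(y) = -\int p(y)\,\log p(y)\,dy
\\[4pt]
\text{KL divergence: } D(p \,\|\, q) = \int p(y)\,\log\frac{p(y)}{q(y)}\,dy
\\[4pt]
\text{Mutual information: } I(y_1,\dots,y_N)
  = D\!\Big(p(y)\,\Big\|\,\prod_{i} p_i(y_i)\Big)
  = \sum_{i} H(y_i) - H(y),
\\[4pt]
I(y_1,\dots,y_N) = 0 \iff y_1,\dots,y_N \text{ are independent.}
```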


Solving the ICA problem with i.i.d. sources


Solving the ICA problem


Whitening

(We assumed centered data)


Whitening (continued)

We have,
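The derivation elided after "We have," is presumably the standard whitening computation, assuming centered data and unit-variance sources (E[ss^T] = I):

```latex
\operatorname{Cov}(x) = E[xx^T] = A\,E[ss^T]\,A^T = AA^T = U D U^T
\quad \text{(eigendecomposition)},
\\[4pt]
z := D^{-1/2}U^T x
\;\Rightarrow\;
E[zz^T] = D^{-1/2}U^T \big(U D U^T\big) U D^{-1/2} = I,
\\[4pt]
\text{and } z = \big(D^{-1/2}U^T A\big)\,s,
\text{ where } D^{-1/2}U^T A \text{ is orthogonal.}
```

So after whitening, the unknown mixing is reduced to an orthogonal matrix.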


Whitening solves half of the ICA problem

[Figure: original, mixed, and whitened data.]

Note: after whitening, only an orthogonal factor remains unknown. The number of free parameters of an N by N orthogonal matrix is N(N-1)/2, roughly half of the N^2 parameters of a general mixing matrix
⇒ whitening solves half of the ICA problem.


Solving ICA

ICA task: given x, find y (the estimation of s) and W (the estimation of A^{-1}).
ICA solution: y = Wx

1. Remove the mean: E[x] = 0
2. Whitening: E[xx^T] = I
3. Find an orthogonal W optimizing an objective function (sequence of 2-d Jacobi (Givens) rotations)

[Figure: original, mixed, whitened, and rotated (demixed) data.]
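The three steps can be sketched end to end. This is a toy 2-d version (my own sketch under the deck's assumptions, not the course's code): remove the mean, whiten, then grid-search the one remaining Jacobi rotation angle for maximal summed |kurtosis|.

```python
import numpy as np

def kurt(y):
    # Excess kurtosis of a (roughly) zero-mean, unit-variance signal.
    return np.mean(y ** 4) - 3.0

def ica_2d(X, n_angles=180):
    """Toy 2-d ICA following the three steps above: center, whiten,
    then search one Givens rotation angle maximizing summed |kurtosis|."""
    Xc = X - X.mean(axis=1, keepdims=True)        # step 1: E[x] = 0
    eigval, U = np.linalg.eigh(np.cov(Xc))
    Z = np.diag(eigval ** -0.5) @ U.T @ Xc        # step 2: E[xx^T] = I
    best_score, best_Y = -np.inf, Z
    for theta in np.linspace(0.0, np.pi / 2, n_angles):  # step 3: rotate
        c, s = np.cos(theta), np.sin(theta)
        Y = np.array([[c, -s], [s, c]]) @ Z
        score = abs(kurt(Y[0])) + abs(kurt(Y[1]))
        if score > best_score:
            best_score, best_Y = score, Y
    return best_Y
```

On a mixture of two non-Gaussian signals (e.g. a sine and a square wave), the returned rows match the sources up to permutation, sign, and scale, which is the usual ICA indeterminacy.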


Optimization Using Jacobi Rotation Matrices

[Figure: a Jacobi rotation mixes only coordinates p and q.]
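For reference, a Jacobi (Givens) rotation G(p, q, θ) equals the identity except in rows and columns p and q, where it embeds a 2-d rotation; any N x N rotation can be written as a product of such matrices, which is why optimizing a sequence of 2-d angles suffices:

```latex
G(p, q, \theta) = I \quad \text{except} \quad
\begin{pmatrix} G_{pp} & G_{pq} \\ G_{qp} & G_{qq} \end{pmatrix}
=
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
```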


ICA Cost Functions

Lemma (proof: homework). Therefore,


ICA Cost Functions

The covariance is fixed to I. Which distribution has the largest entropy under that constraint? The Gaussian.
Therefore: go away from the normal distribution.


Central Limit Theorem

The sum of independent variables converges to the normal distribution.
⇒ For separation, go far away from the normal distribution
⇒ Maximize negentropy or |kurtosis|

(Figs from Ata Kaban)


Maximum Entropy


Independent Subspace Analysis

Hidden, independent sources (subspaces) S; observation X.
Goal: estimate A and S, observing samples from X = AS only.


Independent Subspace Analysis

[Figure: hidden sources, observation, and estimation.]

Objective: in case of perfect separation, WA is a block-permutation matrix.


ICA Algorithms


ICA algorithm based on kurtosis maximization

Kurtosis = 4th-order cumulant. It measures:
• the distance from normality
• the degree of peakedness
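For a zero-mean variable, the 4th-order cumulant (excess kurtosis) the slide refers to is:

```latex
\operatorname{kurt}(y) \;=\; E[y^4] \;-\; 3\,\big(E[y^2]\big)^2.
```

It is 0 for a Gaussian, negative for sub-Gaussian (flat, e.g. uniform) distributions, and positive for super-Gaussian (peaked, e.g. Laplace) ones, so |kurt| serves as a distance from normality.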


The Fast ICA algorithm (Hyvarinen)

Probably the most famous ICA algorithm.

(λ: Lagrange multiplier)

Solve this equation by the Newton–Raphson method.


Newton method for finding a root


Example: Finding a Root

http://en.wikipedia.org/wiki/Newton%27s_method


Newton Method for Finding a Root

Linear Approximation (1st order Taylor approx):

Goal:

Therefore,
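The formulas elided here are presumably the standard Newton step:

```latex
f(x + \Delta) \approx f(x) + f'(x)\,\Delta \qquad \text{(1st-order Taylor)}
\\[4pt]
\text{Goal: } f(x + \Delta) = 0
\;\Rightarrow\; \Delta = -\frac{f(x)}{f'(x)}
\;\Rightarrow\; x_{t+1} = x_t - \frac{f(x_t)}{f'(x_t)}.
```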


Illustration of Newton's method

Goal: finding a root.

[Figure: linearize f at the current point x; the next iterate is the root of that linearization.]


Newton Method for Finding a Root

This can be generalized to multivariate functions: the scalar derivative becomes the Jacobian matrix. (Use the pseudo-inverse if the Jacobian has no inverse.)
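A minimal sketch of the multivariate update x ← x − J(x)⁺ f(x), with the pseudo-inverse as the slide suggests (the example system, function names, and starting point are made up for illustration):

```python
import numpy as np

def newton_root(f, jac, x0, tol=1e-10, max_iter=50):
    """Multivariate Newton's method: x <- x - pinv(J(x)) @ f(x).
    The pseudo-inverse handles the case where J is not invertible."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.pinv(jac(x)) @ f(x)
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Made-up example: intersect the circle x^2 + y^2 = 4 with the line x = y.
f = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]])
jac = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])
root = newton_root(f, jac, [1.0, 0.5])  # converges to (sqrt(2), sqrt(2))
```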


Newton method for FastICA


The Fast ICA algorithm (Hyvarinen)

Solve:

The derivative of F :

Note:


The Fast ICA algorithm (Hyvarinen)

Therefore,

The Jacobian matrix becomes diagonal, and can easily be inverted.
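A minimal one-unit sketch of the resulting fixed-point iteration (my own illustration with the kurtosis nonlinearity g(u) = u³, g′(u) = 3u²; the slides derive it for a general g). On whitened data E[(wᵀz)²] = 1, so the Newton step simplifies to w ← E[z (wᵀz)³] − 3w followed by renormalization:

```python
import numpy as np

def fastica_unit(Z, max_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA on whitened data Z (dims x samples), sketched
    with the kurtosis-based nonlinearity g(u) = u^3."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wz = w @ Z
        # Fixed-point step: E[z (w^T z)^3] - 3 w, then renormalize.
        w_new = (Z * wz ** 3).mean(axis=1) - 3.0 * w
        w_new /= np.linalg.norm(w_new)
        # |<w_new, w>| -> 1 at convergence (the sign may flip per step).
        if 1.0 - abs(w_new @ w) < tol:
            return w_new
        w = w_new
    return w
```

Each converged w extracts one independent component y = wᵀz; to recover the rest, project out the found direction (deflation) and repeat.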


Thanks for your Attention