
Team 7 (Zemeng Wang, Chimiao Wang)
Instructor: Debasis Mitra
CSE 5290

Team Project Report: PCA-ICA


Principal Component Analysis

Team 7

Zemeng Wang

Chimiao Wang

Theory

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. [1] It is commonly used to reduce the dimensionality of a data set while retaining the components that contribute most to the variance in the data. The transformation is defined in such a way that the first principal component has the largest possible variance (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors form an uncorrelated orthogonal basis set. PCA is sensitive to the relative scaling of the original variables. [1] If a multivariate data set is viewed as a set of coordinates in a high-dimensional data space, PCA provides a lower-dimensional picture of it: a 'projection' of the original object as seen from its most informative viewpoint. This allows a small number of principal components to be used to reduce the dimensionality of the data. [2]
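As a quick illustration of this variance-ordering property, the short sketch below (our own example, using scikit-learn's PCA on synthetic data) prints the fraction of variance captured by each component; the values come out in non-increasing order.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic correlated data: 500 samples of 4 variables.
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))

pca = PCA()          # keep all components
pca.fit(X)

# Fraction of total variance captured by each principal component,
# ordered from the first (largest) to the last (smallest).
print(pca.explained_variance_ratio_)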

Mathematical intuition


PCA can be thought of as fitting an n-dimensional ellipsoid to the data, where each axis of the ellipsoid represents a principal component. If some axis of the ellipsoid is small, then the variance along that axis is also small, and by omitting that axis and its corresponding principal component from our representation of the dataset, we lose only a commensurately small amount of information. [4]
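To make the ellipsoid picture concrete, the following minimal sketch (synthetic two-dimensional data of our own choosing) finds the axes by eigendecomposition of the covariance matrix; the small eigenvalue is the variance along the short axis that could be dropped with little loss.

import numpy as np

rng = np.random.default_rng(0)

# A 2-D point cloud stretched along one direction and then rotated.
points = rng.normal(size=(1000, 2)) * [5.0, 0.5]     # long axis, short axis
angle = np.deg2rad(30)
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
data = points @ rotation.T

# The eigenvectors of the covariance matrix are the ellipsoid axes,
# and the eigenvalues are the variances along those axes.
cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)               # ascending eigenvalues
print(eigvals)   # roughly [0.25, 25]: the short axis carries little variance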

More formally, define the mean-centered data matrix X^T of size n × m (each variable shifted so that its mean lies at the origin), whose rows are the n data samples and whose columns are the m variables. The singular value decomposition of X is X = WΣV^T, where the m × m matrix W is the matrix of eigenvectors of XX^T, Σ is an m × n non-negative rectangular diagonal matrix, and the n × n matrix V is the matrix of eigenvectors of X^TX. The PCA transformation is then

Y^T = X^T W = VΣ^T W^T W = VΣ^T.

When m < n − 1, V is not uniquely defined in the usual case, but Y is uniquely defined. Since W is an orthogonal matrix, each row of Y^T is simply a rotation of the corresponding row of X^T. The first column of Y^T is composed of the scores on the first principal component, the second column of the scores on the second principal component, and so on. [4]
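This relationship is easy to verify numerically. The sketch below (our own check, with the matrix shapes following the conventions above) computes the scores once as X^T W and once as VΣ^T and confirms that they agree.

import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 5                        # n samples, m variables
XT = rng.normal(size=(n, m))         # rows are samples, columns are variables
XT = XT - XT.mean(axis=0)            # center each variable at the origin
X = XT.T                             # X is m x n, as in X = W Sigma V^T above

# Columns of W are eigenvectors of X X^T; rows of VT are eigenvectors of X^T X.
W, sigma, VT = np.linalg.svd(X, full_matrices=False)

YT = XT @ W                          # PCA scores: Y^T = X^T W
YT_alt = VT.T @ np.diag(sigma)       # the same scores as V Sigma^T
print(np.allclose(YT, YT_alt))       # True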

Steps

1. Find the mean of each feature (column) and subtract the mean value.

2. Compute the covariance matrix of the centered features.

3. Find the eigenvalues and eigenvectors of the covariance matrix.


4. Sort the eigenvalues from largest to smallest and choose the k largest. The corresponding k eigenvectors, taken as column vectors, form the projection matrix.

5. Project the data points onto the selected k eigenvectors.

In this way the data are transformed from n dimensions down to k dimensions, as the sketch below illustrates.
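The five steps above translate directly into the following sketch (our own NumPy implementation on synthetic input, shown only for illustration).

import numpy as np

def pca(data, k):
    # 1. Subtract the mean of each feature (column).
    centered = data - data.mean(axis=0)
    # 2. Covariance matrix of the centered features.
    cov = np.cov(centered, rowvar=False)
    # 3. Eigenvalues and eigenvectors of the covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort eigenvalues from largest to smallest and keep the top k
    #    eigenvectors as the columns of the projection matrix.
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]
    # 5. Project the data points onto the selected eigenvectors.
    return centered @ components

# Example: reduce synthetic 10-dimensional data to 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
print(pca(X, k=2).shape)    # (100, 2)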

Data

We will use the Yale Face Database B [3] as our data set for implementing the algorithm.
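As a rough sketch of how these face images might be fed into the pca() routine above, assuming the images have been downloaded locally; the directory name, file pattern, and downsampling size are placeholders of our own, not the database's documented layout.

import glob
import numpy as np
from PIL import Image

# Hypothetical location of the downloaded face images; adjust to the real layout.
paths = sorted(glob.glob("yaleB/*.pgm"))

# Downsample each grayscale face and flatten it into one row of the data
# matrix, so the covariance matrix in pca() stays small enough to handle.
faces = np.array([np.asarray(Image.open(p).convert("L").resize((48, 48)),
                             dtype=float).ravel()
                  for p in paths])

# Low-dimensional "eigenface" coordinates for every image.
projected = pca(faces, k=20)
print(projected.shape)    # (number of images, 20)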


Reference

[1] Wikipedia: Principal component analysis. https://en.wikipedia.org/wiki/Principal_component_analysis

[2] Chinese Wikipedia: 主成分分析 (Principal component analysis). https://zh.wikipedia.org/wiki/%E4%B8%BB%E6%88%90%E5%88%86%E5%88%86%E6%9E%90

[3] Yale face database. http://vision.ucsd.edu/content/yale-face-database

[4] What is the principal component of data? https://www.zhihu.com/question/38417101

[5] Stone, James V. Independent Component Analysis: A Tutorial Introduction. Cambridge, MA: MIT Press, 2004. Print.


Independent Component Analysis

Introduction

Imagine you are at a cocktail party. For you it is no problem to follow the discussion of your neighbors, even if there are lots of other sound sources in the room; you might even hear a siren from a passing police car. It is not known exactly how humans are able to separate the different sound sources. Independent component analysis is able to do it, provided there are at least as many microphones or 'ears' in the room as there are different simultaneous sound sources.

Independent component analysis (ICA) is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals.

ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples. In the model, the data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed non-Gaussian and mutually independent, and they are called the independent components of the observed data. These independent components, also called sources or factors, can be found by ICA.

ICA is superficially related to principal component analysis and factor analysis. ICA is a much more powerful technique, however, capable of finding the underlying factors or sources when these classic methods fail completely.

The data analyzed by ICA could originate from many different kinds of application fields, including digital images, document databases, economic indicators and psychometric measurements. In many cases, the measurements are given as a set of parallel signals or time series; the term blind source separation is used to characterize this problem. Typical examples are mixtures of simultaneous speech signals that have been picked up by several microphones, brain waves recorded by multiple sensors, interfering radio signals arriving at a mobile phone, or parallel time series obtained from some industrial process.

Definition of ICA

The data are represented by the random vector x = (x1, …, xm)^T and the components by the random vector s = (s1, …, sn)^T. The task is to transform the observed data x, using a linear static transformation W as s = Wx, into maximally independent components s, measured by some function F(s1, …, sn) of independence.
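To fix the notation, here is a minimal synthetic example of the generative model; the sources, mixing matrix, and sample count are invented purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
n_samples = 2000

# Two non-Gaussian, mutually independent sources s = (s1, s2)^T.
s1 = np.sign(rng.normal(size=n_samples))     # binary, sub-Gaussian source
s2 = rng.laplace(size=n_samples)             # heavy-tailed, super-Gaussian source
S = np.vstack([s1, s2])                      # shape (2, n_samples)

# Unknown mixing matrix A; the observed data are x = A s.
A = np.array([[1.0, 0.7],
              [0.4, 1.2]])
X = A @ S

# ICA looks for W with s = W x; ideally W recovers the inverse of A
# up to permutation and scaling of its rows.
print(X.shape)    # (2, 2000)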

Principles of ICA estimation

The ICA separation of mixed signals gives very good results; it is based on two assumptions and on three effects of mixing source signals.

Two assumptions:

1. The source signals are independent of each other.

2. The values in each source signal have non-Gaussian distributions.


Three effects of mixing source signals:

1. Independence: As per assumption 1, the source signals are independent; however, their signal mixtures are not, because the mixtures share the same source signals.

2. Normality: According to the Central Limit Theorem, the distribution of a sum of independent random variables with finite variance tends towards a Gaussian distribution. Loosely speaking, a sum of two independent random variables usually has a distribution that is closer to Gaussian than either of the two original variables. Here we consider the value of each signal as the random variable.

3. Complexity: The temporal complexity of any signal mixture is greater than that of its simplest constituent source signal.

These principles form the basis of ICA: if the signals we extract from a set of mixtures are independent like source signals, or have non-Gaussian histograms like source signals, or have low complexity like source signals, then they must be the source signals.
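The normality effect in particular is easy to check numerically. In the sketch below (synthetic sources of our own choosing), excess kurtosis is used as a rough measure of distance from Gaussianity: it is zero for a Gaussian, and the mixture lands closer to zero than the heavy-tailed source it contains.

import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
s1 = rng.laplace(size=5000)            # super-Gaussian source (positive excess kurtosis)
s2 = rng.uniform(-1, 1, size=5000)     # sub-Gaussian source (negative excess kurtosis)
mixture = 0.6 * s1 + 0.8 * s2          # a mixture of the two sources

# Excess kurtosis: 0 for a Gaussian; the mixture is closer to 0 than s1.
print(kurtosis(s1), kurtosis(s2), kurtosis(mixture))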

Methods for ICA

1. Measures of non-Gaussianity (see the sketch after this list)

2. Minimization of mutual information

3. Maximum likelihood estimation
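As an illustration of the first approach, the following sketch uses scikit-learn's FastICA (which maximizes non-Gaussianity via a fixed-point iteration) to unmix two synthetic signals; the sources and mixing matrix are invented for the example.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two independent, non-Gaussian sources: a sine wave and a square wave,
# plus a little sensor noise.
S = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]
S += 0.05 * rng.normal(size=S.shape)

# Mix them with an arbitrary mixing matrix A: observations X = S A^T.
A = np.array([[1.0, 0.5],
              [0.6, 1.0]])
X = S @ A.T

# Recover the independent components (up to permutation, sign, and scale).
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)      # estimated sources
A_est = ica.mixing_               # estimated mixing matrix
print(S_est.shape, A_est.shape)   # (2000, 2) (2, 2)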


Preprocessing for ICA

1. Centering

2. Whitening (sketched after this list)

3. Further preprocessing
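A minimal sketch of the first two preprocessing steps, assuming the observations are stored row-wise in a matrix X (the data here are synthetic placeholders):

import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated observations: 1000 samples of 3 mixed variables.
X = rng.normal(size=(1000, 3)) @ np.array([[1.0, 0.4, 0.0],
                                           [0.0, 1.0, 0.3],
                                           [0.2, 0.0, 1.0]])

# 1. Centering: subtract the mean of each observed variable.
Xc = X - X.mean(axis=0)

# 2. Whitening: decorrelate and rescale so that the covariance becomes the
#    identity, here via an eigendecomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
whitener = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
Xw = Xc @ whitener

print(np.allclose(np.cov(Xw, rowvar=False), np.eye(3)))   # True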

Data

Original sound sources: six source signals (waveform figures 1-6 omitted).

Samples at the cocktail party: six mixed signals recorded at the party (waveform figures 1-6 omitted).


Reference

Hyvärinen, A., and E. Oja. "Independent Component Analysis: Algorithms and Applications." Neural Networks 13.4-5 (2000): 411-430. Web.

Stone, James V. Independent Component Analysis: A Tutorial Introduction. Cambridge, MA: MIT Press, 2004. Print.