Team Project Report: PCA-ICA
CSE 5290, Instructor: Debasis Mitra
Team 7: Zemeng Wang, Chimiao Wang

Principal Component Analysis
Theory
Principal Component Analysis (PCA) is a statistical procedure that uses an
orthogonal transformation to convert a set of observations of possibly correlated
variables into a set of values of linearly uncorrelated variables called principal
components. [1] It is commonly used to reduce the dimensionality of a data set
while retaining as much of the variance in the data as possible. The transformation
is defined in such a way that the first principal component has the largest possible
variance (that is, it accounts for as much of the variability in the data as possible),
and each succeeding component in turn has the highest variance possible under the
constraint that it is orthogonal to the preceding components. The resulting vectors
form an uncorrelated orthogonal basis set. PCA is sensitive to the relative scaling of
the original variables. [1] If a multivariate data set is visualized as a set of
coordinates in a high-dimensional data space, PCA supplies a lower-dimensional
picture: a 'projection' of the data seen from its most informative viewpoint. This is
why a small number of principal components can be used to reduce the
dimensionality of the data. [2]
Mathematical intuition
PCA can be thought of as fitting an n-dimensional ellipsoid to the data, where
each axis of the ellipsoid represents a principal component. If some axis of the
ellipsoid is small, then the variance along that axis is also small, and by omitting
that axis and its corresponding principal component from our representation of the
dataset, we lose only a commensurately small amount of information. [4]
To make this precise, let X^T be the n x m matrix of mean-subtracted data
(each feature has had its average removed, so the data set is centered at the
origin): its rows are the n data samples and its columns are the m features. The
singular value decomposition of X is

    X = W Σ V^T,

where the m x m matrix W is the eigenvector matrix of X X^T, Σ is an m x n
non-negative rectangular diagonal matrix, and the n x n matrix V is the eigenvector
matrix of X^T X. The principal component scores are then

    Y^T = X^T W = (W Σ V^T)^T W = V Σ^T W^T W = V Σ^T,

since W is an orthogonal matrix (W^T W = I). When m < n - 1, V is not uniquely
defined in the general case, but Y is. The first column of Y^T is the first
principal component of the data, the second column is the second principal
component, and so on. [4]
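As a quick numerical check (our own sketch in NumPy, not code from the references;
all variable names are ours), the relation Y^T = X^T W = V Σ^T can be verified on
random centered data:

import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 20                          # m features, n samples; X^T is n x m
Xt = rng.normal(size=(n, m))
Xt = Xt - Xt.mean(axis=0)             # subtract the mean of each feature

# SVD of X (features x samples): X = W Sigma V^T
W, sigma, Vt = np.linalg.svd(Xt.T, full_matrices=False)

# Scores Y^T = X^T W equal V Sigma^T; column k is the k-th principal component
Yt = Xt @ W
print(np.allclose(Yt, Vt.T * sigma))  # True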
Steps
1. Compute the mean of each feature (column) and subtract it from the data.
2. Compute the covariance matrix of the centered data.
3. Compute the eigenvalues and eigenvectors of the covariance matrix.
4. Sort the eigenvalues from largest to smallest and choose the k largest. The
   corresponding k eigenvectors, taken as column vectors, form the projection
   matrix.
5. Project the data points onto the selected k eigenvectors.
In this way the original n-dimensional data is turned into new k-dimensional data;
a code sketch of these steps follows.
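A minimal NumPy sketch of the five steps (our own illustration; the function name
and signature are ours, not from the references):

import numpy as np

def pca(data, k):
    """Project n-dimensional data (one sample per row) onto k principal components."""
    # Step 1: subtract the mean of each feature (column)
    centered = data - data.mean(axis=0)
    # Step 2: covariance matrix of the centered data
    cov = np.cov(centered, rowvar=False)
    # Step 3: eigenvalues and eigenvectors; eigh suits symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: sort eigenvalues from largest to smallest, keep k eigenvectors
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]
    # Step 5: project the data onto the selected eigenvectors
    return centered @ components

For example, pca(faces, 50) would reduce a matrix of vectorized face images to 50
dimensions.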
Data
We will use the Yale Face Database B [3] as the data on which to implement and test the algorithm.
References
[1] Wikipedia: Principal component analysis.
https://en.wikipedia.org/wiki/Principal_component_analysis
[2] Chinese Wikipedia: 主成分分析 (Principal component analysis).
https://zh.wikipedia.org/wiki/%E4%B8%BB%E6%88%90%E5%88%86%E5%88%86%E6%9E%90
[3] The Yale Face Database. http://vision.ucsd.edu/content/yale-face-database
[4] What is the principal component of data? (Zhihu, in Chinese).
https://www.zhihu.com/question/38417101
[5] Stone, James V. Independent Component Analysis: A Tutorial Introduction.
Cambridge, MA: MIT Press, 2004.
Independent Component Analysis
Introduction
Imagine you are at a cocktail party. It is no problem for you to follow the
discussion of your neighbors, even though there are many other sound sources in the
room; you might even pick out the siren of a passing police car. It is not known
exactly how humans are able to separate the different sound sources. Independent
component analysis is able to do it, provided there are at least as many microphones
or 'ears' in the room as there are simultaneous sound sources.
Independent component analysis (ICA) is a statistical and computational
technique for revealing hidden factors that underlie sets of random variables,
measurements, or signals.
ICA defines a generative model for the observed multivariate data, which is
typically given as a large database of samples. In the model, the data variables are
assumed to be linear mixtures of some unknown latent variables, and the mixing
system is also unknown. The latent variables are assumed non-Gaussian and mutually
independent, and they are called the independent components of the observed data.
These independent components, also called sources or factors, can be found by ICA.
ICA is superficially related to principal component analysis and factor analysis. ICA
is a much more powerful technique, however, capable of finding the underlying
factors or sources when these classic methods fail completely.
The data analyzed by ICA could originate from many different kinds of
application fields, including digital images, document databases, economic indicators
and psychometric measurements. In many cases, the measurements are given as a set
of parallel signals or time series; the term blind source separation is used to
characterize this problem. Typical examples are mixtures of simultaneous speech
signals that have been picked up by several microphones, brain waves recorded by
multiple sensors, interfering radio signals arriving at a mobile phone, or parallel time
series obtained from some industrial process.
Definition of ICA
The data are represented by the random vector x = (x_1, ..., x_m)^T and the
components by the random vector s = (s_1, ..., s_n)^T. The task is to transform the
observed data x, using a linear static transformation W, as s = Wx, into maximally
independent components s, as measured by some function F(s_1, ..., s_n) of
independence.
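A minimal NumPy illustration of this model (our own sketch): if the mixing matrix A
were known, the ideal unmixing transformation would simply be W = A^-1; the point of
ICA is to estimate such a W from the observations x alone.

import numpy as np

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))          # two independent non-Gaussian sources
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])               # mixing matrix, unknown in practice
x = A @ s                                # observed mixtures: x = As

W = np.linalg.inv(A)                     # ideal unmixing transformation
print(np.allclose(W @ x, s))             # True: s = Wx recovers the sources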
Principles of ICA estimation
The fact that ICA separation of mixed signals gives very good results rests on two
assumptions about the source signals and on three effects of mixing them.
Two assumptions:
1. The source signals are independent of each other.
2. The values in each source signal have non-Gaussian distributions.
Three effects of mixing source signals:
1. Independence: As per assumption 1, the source signals are independent;
however, their signal mixtures are not. This is because the signal mixtures
share the same source signals.
2. Normality: According to the Central Limit Theorem, the distribution of a sum
of independent random variables with finite variance tends towards a
Gaussian distribution. Loosely speaking, a sum of two independent random
variables usually has a distribution that is closer to Gaussian than any of the
two original variables. Here we consider the value of each signal as the
random variable.
3. Complexity: The temporal complexity of any signal mixture is greater than
that of its simplest constituent source signal.
These principles form the basis of ICA: if the signals we extract from a set of
mixtures are independent like source signals, have non-Gaussian histograms like
source signals, or have low complexity like source signals, then they must be the
source signals. The normality effect can be checked numerically, as in the sketch
below.
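In this hedged sketch (our own example; SciPy's kurtosis returns excess kurtosis,
which is 0 for a Gaussian), an equal mixture of two independent Laplace sources has
kurtosis closer to the Gaussian value than either source:

import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
s1 = rng.laplace(size=200_000)      # non-Gaussian source, excess kurtosis ~3
s2 = rng.laplace(size=200_000)      # independent second source, also ~3
x = (s1 + s2) / np.sqrt(2)          # an equal mixture of the two

# The mixture's excess kurtosis (~1.5) lies closer to the Gaussian value 0
# than either source's (~3), as the Central Limit Theorem predicts.
print(kurtosis(s1), kurtosis(s2), kurtosis(x))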
Methods for ICA
1. Measures of non-Gaussianity (illustrated by the sketch after this list)
2. Minimization of mutual information
3. Maximum likelihood estimation
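For instance, scikit-learn's FastICA estimates the components by maximizing
non-Gaussianity (method 1 above). A minimal sketch on two synthetic sources (the
signal shapes and the mixing matrix are our own choices, not from the references):

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]  # two independent sources
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                        # mixing matrix
X = S @ A.T                                       # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)    # recovered sources, up to order, sign, scale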
Preprocessing for ICA
1. Centering (subtracting the mean so each variable has zero mean)
2. Whitening (transforming the centered data so its covariance is the identity
   matrix; both steps are sketched after this list)
3. Further preprocessing
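A minimal NumPy sketch of centering and whitening (our own illustration; the
eigendecomposition route used here is one of several possible whitening transforms):

import numpy as np

def center_and_whiten(X):
    """Center each column, then whiten so the covariance becomes the identity."""
    Xc = X - X.mean(axis=0)                  # centering: zero-mean variables
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    # Whitening: rotate onto the eigenvectors and rescale each direction by
    # 1/sqrt(eigenvalue); assumes the covariance matrix has full rank.
    return Xc @ E / np.sqrt(d)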
Data
Original sound sources:
[six source-signal figures, numbered 1-6]
Samples at the cocktail party:
[six mixed-signal figures, numbered 1-6]
References
Hyvärinen, A., and E. Oja. "Independent Component Analysis: Algorithms and
Applications." Neural Networks 13.4-5 (2000): 411-30.
Stone, James V. Independent Component Analysis: A Tutorial Introduction.
Cambridge, MA: MIT Press, 2004.