Principal Components and Factor Analysis
Dr. Adel Elomri [email protected]
Principal Components Analysis (PCA)
Data Analysis and Presentation

[Data matrix: rows Observation 1, Observation 2, …, Observation n; columns Criterion 1, Criterion 2, …, Criterion n.]
• We have too many observations and dimensions
  – To reason about or obtain insights from
  – To visualize
  – To find classification, clustering, pattern recognition
• How many unique “sub-sets” are in the sample?
• How are they similar / different?
• What are the underlying factors that influence the samples?
• Which time / temporal trends are (anti)correlated?
• Which measurements are needed to differentiate?
• How to best present what is “interesting”?
• To which “sub-set” does this new sample rightfully belong?
Data Analysis and Presentation
[Figures: a univariate plot (H-Bands vs. Person), a bivariate scatter (C-LDH vs. C-Triglycerides), and a trivariate plot (M-EPI against C-Triglycerides and C-LDH).]
Data Presentation
What if we have 5, or 8, or 10 Dimensions?
Data Analysis and Presentation
How do we comment on this data?
Car Type | Engine displacement (cm3) | Power (hp) | Speed (km/h) | Weight (kg) | Width (mm) | Length (mm)
Citroën C2 1.1 Base 1124 61 158 932 1659 3666
Smart Fortwo Coupé 698 52 135 730 1515 2500
Mini 1.6 170 1598 170 218 1215 1690 3625
Nissan Micra 1.2 65 1240 65 154 965 1660 3715
Renault Clio 3.0 V6 2946 255 245 1400 1810 3812
Audi A3 1.9 TDI 1896 105 187 1295 1765 4203
Peugeot 307 1.4 HDI 7 1398 70 160 1179 1746 4202
Peugeot 407 3.0 V6 BVA 2946 211 229 1640 1811 4676
Mercedes Classe C 270 CDI 2685 170 230 1600 1728 4528
BMW 530d 2993 218 245 1595 1846 4841
Jaguar S Type 2.7 V6 Bi 2720 207 230 1722 1818 4905
BMW 745i 4398 333 250 1870 1902 5029
Mercedes Classe S 400 CDI 3966 260 250 1915 2092 5038
Citroën C3 Pluriel 1.6i 1587 110 185 1177 1700 3934
BMW Z4 2.5i 2494 192 235 1260 1781 4091
Audi TT 1.8T 180 1781 180 228 1280 1764 4041
Aston Martin Vanquish 5935 460 306 1835 1923 4665
Bentley Continental GT 5998 560 318 2385 1918 4804
Ferrari Enzo 5998 660 350 1365 2650 4700
Renault Scenic 1.9 dCi 120 1870 120 188 1430 1805 4259
Volkswagen Touran 1.9 TDI 105 1896 105 180 1498 1794 4391
Land Rover Defender Td5 2495 122 135 1695 1790 3883
Land Rover Discovery Td5 2495 138 157 2175 2190 4705
Nissan X Trail 2.2 dCi 2184 136 180 1520 1765 4455
Source: L’argus 2004, France
• We have too many observations and dimensions
  – To reason about or obtain insights from
  – To visualize
  – Too much noise in the data
• Need to “reduce” them to a smaller set of factors
  – Better representation of data without losing much information
  – Can build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition
Data Presentation: The goal
• Discover a new set of factors/dimensions/axes against which to represent, describe, or evaluate the data
  – For more effective reasoning, insights, or better visualization
  – Reduce noise in the data
  – Typically a smaller set of factors: dimension reduction
  – Better representation of data without losing much information
  – Can build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition
• Factors are combinations of observed variables
  – May be more effective bases for insights, even if their physical meaning is obscure
  – Observed data are described in terms of these factors rather than in terms of the original variables/dimensions
Principal Components Analysis (PCA)
PCA: Basic Concept
• Areas of variance in data are where items can be best discriminated and key underlying phenomena observed
  – Areas of greatest “signal” in the data
• If two items or dimensions are highly correlated or dependent
  – They are likely to represent highly related phenomena
  – If they tell us about the same underlying variance in the data, combining them to form a single measure is reasonable
    • Parsimony
    • Reduction in error
• So we want to combine related variables, and focus on uncorrelated or independent ones, especially those along which the observations have high variance
• We want a smaller set of variables that explain most of the variance in the original data, in a more compact and insightful form
PCA: Basic Concept
• What if the dependences and correlations are not so strong or direct?
• And suppose you have 3 variables, or 4, or 5, or 10,000?
• Look for the phenomena underlying the observed covariance/co-dependence in a set of variables
  – Once again, phenomena that are uncorrelated or independent, and especially those along which the data show high variance
• These phenomena are called “factors,” “principal components,” or “independent components.”
PCA:
• The new variables/dimensions
  – Are linear combinations of the original ones
  – Are uncorrelated with one another
    • Orthogonal in the original dimension space
  – Capture as much of the original variance in the data as possible
  – Are called Principal Components
PCA: The goal
PCA is used to reduce the dimensionality of data without much loss of information: it explains/summarizes the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables.
It is used in machine learning, signal processing, and image compression (among other things).
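As a concrete illustration (not part of the original slides), here is a minimal sketch of PCA as a dimension-reduction step using scikit-learn, assuming it is installed; the data matrix is a random placeholder:

```python
# Minimal PCA sketch with scikit-learn (assumed installed).
# X stands in for a real (observations x variables) matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))           # 100 observations, 6 variables

pca = PCA(n_components=2)               # keep the 2 strongest components
scores = pca.fit_transform(X)           # observations in the new 2-D space

print(scores.shape)                     # (100, 2)
print(pca.explained_variance_ratio_)    # variance share of each component
```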
PCA: Applications
• Uses:
  – Data visualization
  – Data reduction
  – Data classification
  – Trend analysis
  – Factor analysis
  – Noise reduction
PCA:
It is all about the way you look at the data.
PCA: Example
Principle of PCA

Suppose we have a population measured on p random variables X1, …, Xp. Note that these random variables represent the p axes of the Cartesian coordinate system in which the population resides. Our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability. This is accomplished by rotating the axes.

Trick: Rotate Coordinate Axes
[Figure: a data cloud on the original axes X1 and X2, with the axes rotated toward the directions of greatest variability.]
Background and theory for PCA
• Suppose attributes are A1 and A2, and we have n training examples. x’s denote values of A1 and y’s denote values of A2 over the training examples.
• Variance of an attribute:
$$\mathrm{var}(A_1) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$$

Background for PCA
• Covariance of two attributes:
• If the covariance is positive, both dimensions increase together. If it is negative, as one increases the other decreases. If it is zero, the dimensions are linearly independent of each other (a nonlinear relationship may still exist).
$$\mathrm{cov}(A_1, A_2) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$
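A small numpy sketch of both formulas (the data values are illustrative); numpy's ddof=1 default for np.cov reproduces the n-1 denominator used above:

```python
# Sample variance and covariance with the (n-1) denominator.
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1])   # values of attribute A1
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0])   # values of attribute A2
n = len(x)

var_x = ((x - x.mean()) ** 2).sum() / (n - 1)
cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)

assert np.isclose(var_x, np.var(x, ddof=1))    # matches numpy's sample variance
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])  # np.cov uses n-1 by default
print(var_x, cov_xy)
```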
Background for PCA
• Covariance matrix
  – Suppose we have n attributes A1, ..., An.
  – Covariance matrix:
$$C_{n \times n} = (c_{i,j}) \quad \text{where} \quad c_{i,j} = \mathrm{cov}(A_i, A_j)$$
Background for PCA
Covariance matrix
For two attributes H and M:
$$\begin{pmatrix} \mathrm{cov}(H,H) & \mathrm{cov}(H,M) \\ \mathrm{cov}(M,H) & \mathrm{cov}(M,M) \end{pmatrix} = \begin{pmatrix} \mathrm{var}(H) & 104.5 \\ 104.5 & \mathrm{var}(M) \end{pmatrix} = \begin{pmatrix} 47.7 & 104.5 \\ 104.5 & 370 \end{pmatrix}$$
Background for PCA
• Eigenvectors:
  – Let M be an n×n matrix.
    • v is an eigenvector of M if M v = λ v
    • λ is called the eigenvalue associated with v
  – For any eigenvector v of M and scalar a, M(a v) = λ(a v).
  – Thus you can always choose eigenvectors of length 1: $v_1^2 + v_2^2 + \dots + v_n^2 = 1$
  – If M is symmetric (as every covariance matrix is) and has any eigenvectors, it has n of them, and they are orthogonal to one another.
  – Thus the eigenvectors can be used as a new basis for an n-dimensional vector space.
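These properties can be checked in a few lines of numpy on the covariance matrix shown above; np.linalg.eigh is the standard routine for symmetric matrices and returns unit-length eigenvectors:

```python
# Eigenvalues/eigenvectors of the symmetric covariance matrix above.
import numpy as np

M = np.array([[47.7, 104.5],
              [104.5, 370.0]])

eigenvalues, eigenvectors = np.linalg.eigh(M)      # columns = eigenvectors

v, lam = eigenvectors[:, 0], eigenvalues[0]
assert np.allclose(M @ v, lam * v)                 # M v = lambda v
assert np.isclose(np.linalg.norm(v), 1.0)          # chosen with length 1
assert np.isclose(eigenvectors[:, 0] @ eigenvectors[:, 1], 0.0)  # orthogonal
```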
Background for PCA
PCA: Algebraic Interpretation
• Given m points in an n-dimensional space, for large n, how does one project onto a low-dimensional space while preserving broad trends in the data and allowing it to be visualized?
PCA: Algebraic Interpretation – 1D
• Given m points in an n-dimensional space, for large n, how does one project onto a 1-dimensional space?
• Choose a line that fits the data so the points are spread out well along the line
• Formally, minimize sum of squares of distances to the line.
PCA: Algebraic Interpretation – 1D
• Minimizing sum of squares of distances to the line is the same as maximizing the sum of squares of the projections on that line, thanks to Pythagoras.
PCA: General
From k original variables: x1,x2,...,xk:
Produce k new variables: y1,y2,...,yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
such that:
• the yk's are uncorrelated (orthogonal)
• y1 explains as much as possible of the original variance in the data set
• y2 explains as much as possible of the remaining variance
• etc.
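A sketch of this construction in numpy (random placeholder data): center the data, take the covariance matrix, and use its eigenvectors, sorted by eigenvalue, as the coefficients a_ij; the resulting y's come out uncorrelated:

```python
# PCA as described above: the a_ij are eigenvector coefficients.
import numpy as np

def pca(X):
    """Return (eigenvalues, coefficients, new variables) for X of shape (n, k)."""
    Xc = X - X.mean(axis=0)                  # center each variable
    C = np.cov(Xc, rowvar=False)             # k x k covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort: largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Y = Xc @ eigvecs                         # y_j = a_j1 x_1 + ... + a_jk x_k
    return eigvals, eigvecs, Y

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                 # placeholder data
eigvals, A, Y = pca(X)
print(np.round(np.cov(Y, rowvar=False), 6))  # ~diagonal: the y's are uncorrelated
```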
PCA: General
[Figure: a 2-D scatter with the 1st Principal Component, y1, along the direction of greatest variance and the 2nd Principal Component, y2, orthogonal to it.]
PCA Scores
[Figure: the same scatter; each observation (xi1, xi2) gets scores (yi,1, yi,2), its projections onto the principal components.]
PCA Eigenvalues
[Figure: the same scatter; the eigenvalues λ1 and λ2 measure the variance along the 1st and 2nd principal components.]
As before, the new variables y1, y2, ..., yk are built so that they are uncorrelated and each in turn explains as much as possible of the remaining variance. The yk's are the Principal Components.
PCA: Another Explanation
{a11, a12, ..., a1k} is the 1st eigenvector of the correlation/covariance matrix, and holds the coefficients of the 1st principal component

{a21, a22, ..., a2k} is the 2nd eigenvector of the correlation/covariance matrix, and holds the coefficients of the 2nd principal component

…

{ak1, ak2, ..., akk} is the kth eigenvector of the correlation/covariance matrix, and holds the coefficients of the kth principal component
PCA Summary
• Rotates multivariate dataset into a new configuration which is easier to interpret
• Purposes
  – simplify data
  – look at relationships between variables
  – look at patterns of units
A 2-D Numerical Example
PCA Example –STEP 1
• Subtract the mean from each of the data dimensions: all the x values have x̄ subtracted and all the y values have ȳ subtracted. This produces a data set whose mean is zero.

Subtracting the mean makes the variance and covariance calculations easier by simplifying their equations. The variance and covariance values are not affected by the mean value.
PCA Example –STEP 1
DATA:
x    y
2.5  2.4
0.5  0.7
2.2  2.9
1.9  2.2
3.1  3.0
2.3  2.7
2.0  1.6
1.0  1.1
1.5  1.6
1.1  0.9
ZERO MEAN DATA:
x y
.69 .49
-1.31 -1.21
.39 .99
.09 .29
1.29 1.09
.49 .79
.19 -.31
-.81 -.81
-.31 -.31
-.71 -1.01
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
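Step 1 in code, as a small numpy sketch that reproduces the zero-mean data above:

```python
# Step 1: subtract the mean of each dimension.
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
                 [1.5, 1.6], [1.1, 0.9]])

zero_mean = data - data.mean(axis=0)   # means: x-bar = 1.81, y-bar = 1.91
print(zero_mean[0])                    # [0.69 0.49], the first row of the table
```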
PCA Example –STEP 2
• Calculate the covariance matrix
cov = ( .616555556  .615444444
        .615444444  .716555556 )

• Since the non-diagonal elements of this covariance matrix are positive, we should expect that the x and y variables increase together.
PCA Example –STEP 3
• Calculate the eigenvectors and eigenvalues of the covariance matrix
eigenvalues = .0490833989
1.28402771
eigenvectors = -.735178656 -.677873399
.677873399 -.735178656
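Steps 2 and 3 in code, continuing the same example; this reproduces the covariance matrix and eigenvalues above (np.linalg.eigh lists eigenvalues in ascending order and may flip eigenvector signs relative to the printed ones):

```python
# Steps 2-3: covariance matrix of the centered data, then its
# eigendecomposition.
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
                 [1.5, 1.6], [1.1, 0.9]])
zero_mean = data - data.mean(axis=0)

cov = np.cov(zero_mean, rowvar=False)
print(cov)                # [[0.61655556 0.61544444]
                          #  [0.61544444 0.71655556]]

eigenvalues, eigenvectors = np.linalg.eigh(cov)
print(eigenvalues)        # [0.0490834 1.2840277]
print(eigenvectors)       # columns are the unit-length eigenvectors
```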
PCA Example –STEP 3
• The eigenvectors are plotted as diagonal dotted lines on the plot.
• Note that they are perpendicular to each other.
• Note that one of the eigenvectors goes through the middle of the points, like a line of best fit.
• The second eigenvector gives us the other, less important, pattern in the data: all the points follow the main line but are off to the side of it by some amount.
PCA Example –STEP 4
• Reduce dimensionality and form a feature vector: the eigenvector with the highest eigenvalue is the principal component of the data set.

In our example, the eigenvector with the largest eigenvalue is the one that points down the middle of the data.

Once the eigenvectors are found from the covariance matrix, the next step is to order them by eigenvalue, highest to lowest. This gives the components in order of significance.
PCA Example –STEP 4
Now, if you like, you can decide to ignore the components of lesser significance.
You do lose some information, but if the eigenvalues are small, you don’t lose much:
• n dimensions in your data
• calculate n eigenvectors and eigenvalues
• choose only the first p eigenvectors
• final data set has only p dimensions
PCA Example –STEP 4
• Feature Vector
FeatureVector = (eig1 eig2 eig3 … eign)

We can either form a feature vector with both of the eigenvectors:

( -.677873399  -.735178656
  -.735178656   .677873399 )

or we can choose to leave out the smaller, less significant component and keep only a single column:

( -.677873399
  -.735178656 )
PCA Example –STEP 5
• Deriving the new data
FinalData = RowFeatureVector x RowZeroMeanData
• RowFeatureVector is the matrix with the eigenvectors in its rows (the eigenvector columns transposed), with the most significant eigenvector at the top.
• RowZeroMeanData is the mean-adjusted data transposed, i.e., the data items are in the columns, with each row holding a separate dimension.
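Step 5 in code, completing the same worked example; with the sign convention printed above for the feature vector, this reproduces the FinalData values shown on the next slide:

```python
# Step 5: FinalData = RowFeatureVector x RowZeroMeanData.
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
                 [1.5, 1.6], [1.1, 0.9]])
zero_mean = data - data.mean(axis=0)

feature = np.array([[-0.677873399, -0.735178656],     # eigenvectors as
                    [-0.735178656,  0.677873399]])    # columns, PC1 first

final_data = (feature.T @ zero_mean.T).T   # same as zero_mean @ feature
print(final_data[0])                       # [-0.82797019 -0.17511531]
# Keep only the first column of `feature` to reduce to one dimension.
```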
PCA Example –STEP 5
FinalData transpose: dimensions along columns x y
-.827970186 -.1751153071.77758033
.142857227-.992197494
.384374989-.274210416
.130417207-1.67580142
-.209498461-.912949103
.175282444.0991094375 -.3498246981.14457216
.0464172582.438046137
.01776462971.22382056
-.162675287
PCA Example –STEP 5
[Figure: the transformed data plotted along the two eigenvector axes.]
A 6-D Numerical Example – XLSTAT
Car Type | Engine displacement (cm3) | Power (hp) | Speed (km/h) | Weight (kg) | Width (mm) | Length (mm)
Citroën C2 1.1 Base 1124 61 158 932 1659 3666
Smart Fortwo Coupé 698 52 135 730 1515 2500
Mini 1.6 170 1598 170 218 1215 1690 3625
Nissan Micra 1.2 65 1240 65 154 965 1660 3715
Renault Clio 3.0 V6 2946 255 245 1400 1810 3812
Audi A3 1.9 TDI 1896 105 187 1295 1765 4203
Peugeot 307 1.4 HDI 7 1398 70 160 1179 1746 4202
Peugeot 407 3.0 V6 BVA 2946 211 229 1640 1811 4676
Mercedes Classe C 270 CDI 2685 170 230 1600 1728 4528
BMW 530d 2993 218 245 1595 1846 4841
Jaguar S Type 2.7 V6 Bi 2720 207 230 1722 1818 4905
BMW 745i 4398 333 250 1870 1902 5029
Mercedes Classe S 400 CDI 3966 260 250 1915 2092 5038
Citroën C3 Pluriel 1.6i 1587 110 185 1177 1700 3934
BMW Z4 2.5i 2494 192 235 1260 1781 4091
Audi TT 1.8T 180 1781 180 228 1280 1764 4041
Aston Martin Vanquish 5935 460 306 1835 1923 4665
Bentley Continental GT 5998 560 318 2385 1918 4804
Ferrari Enzo 5998 660 350 1365 2650 4700
Renault Scenic 1.9 dCi 120 1870 120 188 1430 1805 4259
Volkswagen Touran 1.9 TDI 105 1896 105 180 1498 1794 4391
Land Rover Defender Td5 2495 122 135 1695 1790 3883
Land Rover Discovery Td5 2495 138 157 2175 2190 4705
Nissan X Trail 2.2 dCi 2184 136 180 1520 1765 4455
Correlation matrix (Pearson (n)):
Variables                   Engine displacement (cm3)  Power (hp)  Speed (km/h)  Weight (kg)  Length (mm)  Width (mm)
Engine displacement (cm3)   1                          0.954       0.885         0.692        0.706        0.546
Power (hp)                  0.954                      1           0.934         0.529        0.730        0.448
Speed (km/h)                0.885                      0.934       1             0.466        0.619        0.488
Weight (kg)                 0.692                      0.529       0.466         1            0.477        0.679
Length (mm)                 0.706                      0.730       0.619         0.477        1            0.468
Width (mm)                  0.546                      0.448       0.488         0.679        0.468        1
Principal Component Analysis:
Eigenvalues:
                 F1      F2      F3      F4      F5      F6
Eigenvalue       4.256   0.882   0.432   0.348   0.063   0.019
Variability (%)  70.931  14.694  7.204   5.799   1.053   0.320
Cumulative %     70.931  85.625  92.829  98.627  99.680  100.000
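The variability percentages are each eigenvalue's share of the total; the six eigenvalues sum to 6 because the analysis is based on the correlation matrix of six standardized variables. A quick numpy check:

```python
# Variability (%) and Cumulative (%) from the eigenvalues above.
import numpy as np

eigenvalues = np.array([4.256, 0.882, 0.432, 0.348, 0.063, 0.019])
variability = 100 * eigenvalues / eigenvalues.sum()

print(np.round(variability, 3))           # [70.933 14.7 7.2 5.8 1.05 0.317]
print(np.round(variability.cumsum(), 3))  # ends at 100.0
```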
[Scree plot: eigenvalue (left axis) and cumulative variability % (right axis) for axes F1 through F6.]
Correlations between variables and factors:
                            F1      F2      F3      F4      F5      F6
Engine displacement (cm3)   0.962   -0.129  -0.136  -0.107  -0.140  -0.085
Power (hp)                  0.932   -0.326  -0.092  -0.004  -0.077   0.104
Speed (km/h)                0.891   -0.307  -0.220   0.185   0.170  -0.027
Weight (kg)                 0.745    0.531  -0.112  -0.381   0.074   0.018
Length (mm)                 0.796   -0.140   0.585  -0.058   0.037  -0.011
Width (mm)                  0.693    0.602   0.044   0.392  -0.042   0.006
Contribution of the variables (%):
                            F1      F2      F3      F4      F5      F6
Engine displacement (cm3)   21.764   1.892   4.308   3.318  31.200  37.517
Power (hp)                  20.414  12.034   1.946   0.004   9.348  56.254
Speed (km/h)                18.642  10.677  11.202   9.886  45.793   3.801
Weight (kg)                 13.027  32.010   2.884  41.701   8.746   1.631
Length (mm)                 14.876   2.233  79.214   0.965   2.110   0.601
Width (mm)                  11.277  41.154   0.445  44.126   2.803   0.196
[Variables chart (axes F1 and F2: 85.62 %): correlation circle of the six variables on F1 (70.93 %) and F2 (14.69 %).]
[Observations chart (axes F1 and F2: 85.62 %): the 24 car models plotted by their scores on F1 (70.93 %) and F2 (14.69 %).]
[Biplot (axes F1 and F2: 85.62 %): variables and observations shown together on F1 (70.93 %) and F2 (14.69 %).]
For More on PCA …
• Textbook Chapter 20
• Technical Note on PCA (see attached PDF)
• Case Studies and Examples of PCA (see attached PDF)