Principal Components and Factor Analysis
Dr. Adel Elomri [email protected]
Principal Components Analysis (PCA)
Data Analysis and Presentation

[Data matrix: rows Observation 1, Observation 2, …, Observation n; columns Criterion 1, Criterion 2, …, Criterion n.]
• We have too many observations and dimensions
  – To reason about or obtain insights from
  – To visualize
  – To find classification, clustering, pattern recognition
• How many unique “sub-sets” are in the sample?
• How are they similar / different?
• What are the underlying factors that influence the samples?
• Which time / temporal trends are (anti)correlated?
• Which measurements are needed to differentiate?
• How to best present what is “interesting”?
• To which “sub-set” does this new sample rightfully belong?
Data Analysis and Presentation
[Figures: a univariate plot (H-Bands vs. Person), a bivariate scatter (C-LDH vs. C-Triglycerides), and a trivariate plot (M-EPI against C-Triglycerides and C-LDH).]
Data Presentation
What if we have 5, or 8, or 10 Dimensions?
Data Analysis and Presentation
How do we comment on this data?
Car Type | Engine displacement (cm3) | Power (hp) | Speed (km/h) | Weight (kg) | Width (mm) | Length (mm)
Citroën C2 1.1 Base 1124 61 158 932 1659 3666
Smart Fortwo Coupé 698 52 135 730 1515 2500
Mini 1.6 170 1598 170 218 1215 1690 3625
Nissan Micra 1.2 65 1240 65 154 965 1660 3715
Renault Clio 3.0 V6 2946 255 245 1400 1810 3812
Audi A3 1.9 TDI 1896 105 187 1295 1765 4203
Peugeot 307 1.4 HDI 7 1398 70 160 1179 1746 4202
Peugeot 407 3.0 V6 BVA 2946 211 229 1640 1811 4676
Mercedes Classe C 270 CDI 2685 170 230 1600 1728 4528
BMW 530d 2993 218 245 1595 1846 4841
Jaguar S Type 2.7 V6 Bi 2720 207 230 1722 1818 4905
BMW 745i 4398 333 250 1870 1902 5029
Mercedes Classe S 400 CDI 3966 260 250 1915 2092 5038
Citroën C3 Pluriel 1.6i 1587 110 185 1177 1700 3934
BMW Z4 2.5i 2494 192 235 1260 1781 4091
Audi TT 1.8T 180 1781 180 228 1280 1764 4041
Aston Martin Vanquish 5935 460 306 1835 1923 4665
Bentley Continental GT 5998 560 318 2385 1918 4804
Ferrari Enzo 5998 660 350 1365 2650 4700
Renault Scenic 1.9 dCi 120 1870 120 188 1430 1805 4259
Volkswagen Touran 1.9 TDI 105 1896 105 180 1498 1794 4391
Land Rover Defender Td5 2495 122 135 1695 1790 3883
Land Rover Discovery Td5 2495 138 157 2175 2190 4705
Nissan X Trail 2.2 dCi 2184 136 180 1520 1765 4455
Source: L’argus 2004, France
• We have too many observations and dimensions
  – To reason about or obtain insights from
  – To visualize
  – Too much noise in the data
• Need to “reduce” them to a smaller set of factors
  – Better representation of data without losing much information
  – Can build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition
Data Presentation: The goal
• Discover a new set of factors/dimensions/axes against which to represent, describe, or evaluate the data
  – For more effective reasoning, insights, or better visualization
  – Reduce noise in the data
  – Typically a smaller set of factors: dimension reduction
  – Better representation of data without losing much information
  – Can build more effective data analyses on the reduced-dimensional space: classification, clustering, pattern recognition
• Factors are combinations of observed variables
  – May be more effective bases for insights, even if their physical meaning is obscure
  – Observed data are described in terms of these factors rather than in terms of the original variables/dimensions
Principal Components Analysis (PCA)
PCA: Basic Concept
• Areas of variance in data are where items can be best discriminated and key underlying phenomena observed
  – Areas of greatest “signal” in the data
• If two items or dimensions are highly correlated or dependent
  – They are likely to represent highly related phenomena
  – If they tell us about the same underlying variance in the data, combining them to form a single measure is reasonable
    • Parsimony
    • Reduction in error
• So we want to combine related variables, and focus on uncorrelated or independent ones, especially those along which the observations have high variance
• We want a smaller set of variables that explain most of the variance in the original data, in a more compact and insightful form
PCA: Basic Concept
• What if the dependences and correlations are not so strong or direct?
• And suppose you have 3 variables, or 4, or 5, or 10,000?
• Look for the phenomena underlying the observed covariance/co-dependence in a set of variables
  – Once again, phenomena that are uncorrelated or independent, and especially those along which the data show high variance
• These phenomena are called “factors,” “principal components,” or “independent components.”
PCA:
• The new variables/dimensions
  – Are linear combinations of the original ones
  – Are uncorrelated with one another
    • Orthogonal in the original dimension space
  – Capture as much of the original variance in the data as possible
  – Are called Principal Components
PCA: The goal
PCA is used to reduce the dimensionality of data without much loss of information: it explains/summarizes the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables.
It is used in machine learning, signal processing, and image compression (among other things).
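As a concrete illustration (not part of the original slides), here is a minimal sketch of PCA as a dimension-reduction step using scikit-learn, assuming it is installed; the data matrix is a random placeholder:

```python
# Minimal PCA sketch with scikit-learn (assumed installed).
# X stands in for a real (observations x variables) matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))           # 100 observations, 6 variables

pca = PCA(n_components=2)               # keep the 2 strongest components
scores = pca.fit_transform(X)           # observations in the new 2-D space

print(scores.shape)                     # (100, 2)
print(pca.explained_variance_ratio_)    # variance share of each component
```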
PCA: Applications
• Uses:
  – Data visualization
  – Data reduction
  – Data classification
  – Trend analysis
  – Factor analysis
  – Noise reduction
PCA:
It is all about the way you look at the data.
PCA: Example
Principle of PCA

Suppose we have a population measured on p random variables X1, …, Xp. Note that these random variables represent the p axes of the Cartesian coordinate system in which the population resides. Our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability. This is accomplished by rotating the axes.

Trick: Rotate Coordinate Axes
[Figure: a data cloud on the original axes X1 and X2, with the axes rotated toward the directions of greatest variability.]
Background and theory for PCA
• Suppose attributes are A1 and A2, and we have n training examples. x’s denote values of A1 and y’s denote values of A2 over the training examples.
• Variance of an attribute:
$$\mathrm{var}(A_1) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$$

Background for PCA
• Covariance of two attributes:
• If the covariance is positive, both dimensions increase together. If it is negative, as one increases the other decreases. If it is zero, the dimensions are linearly independent of each other (a nonlinear relationship may still exist).
$$\mathrm{cov}(A_1, A_2) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$
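A small numpy sketch of both formulas (the data values are illustrative); numpy's ddof=1 default for np.cov reproduces the n-1 denominator used above:

```python
# Sample variance and covariance with the (n-1) denominator.
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1])   # values of attribute A1
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0])   # values of attribute A2
n = len(x)

var_x = ((x - x.mean()) ** 2).sum() / (n - 1)
cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)

assert np.isclose(var_x, np.var(x, ddof=1))    # matches numpy's sample variance
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])  # np.cov uses n-1 by default
print(var_x, cov_xy)
```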
Background for PCA
• Covariance matrix
  – Suppose we have n attributes A1, ..., An.
  – Covariance matrix:
$$C_{n \times n} = (c_{i,j}) \quad \text{where} \quad c_{i,j} = \mathrm{cov}(A_i, A_j)$$
Background for PCA
Covariance matrix
For two attributes H and M:
$$\begin{pmatrix} \mathrm{cov}(H,H) & \mathrm{cov}(H,M) \\ \mathrm{cov}(M,H) & \mathrm{cov}(M,M) \end{pmatrix} = \begin{pmatrix} \mathrm{var}(H) & 104.5 \\ 104.5 & \mathrm{var}(M) \end{pmatrix} = \begin{pmatrix} 47.7 & 104.5 \\ 104.5 & 370 \end{pmatrix}$$
Background for PCA
• Eigenvectors:
  – Let M be an n×n matrix.
    • v is an eigenvector of M if M v = λ v
    • λ is called the eigenvalue associated with v
  – For any eigenvector v of M and scalar a, M(a v) = λ(a v).
  – Thus you can always choose eigenvectors of length 1: $v_1^2 + v_2^2 + \dots + v_n^2 = 1$
  – If M is symmetric (as every covariance matrix is) and has any eigenvectors, it has n of them, and they are orthogonal to one another.
  – Thus the eigenvectors can be used as a new basis for an n-dimensional vector space.
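These properties can be checked in a few lines of numpy on the covariance matrix shown above; np.linalg.eigh is the standard routine for symmetric matrices and returns unit-length eigenvectors:

```python
# Eigenvalues/eigenvectors of the symmetric covariance matrix above.
import numpy as np

M = np.array([[47.7, 104.5],
              [104.5, 370.0]])

eigenvalues, eigenvectors = np.linalg.eigh(M)      # columns = eigenvectors

v, lam = eigenvectors[:, 0], eigenvalues[0]
assert np.allclose(M @ v, lam * v)                 # M v = lambda v
assert np.isclose(np.linalg.norm(v), 1.0)          # chosen with length 1
assert np.isclose(eigenvectors[:, 0] @ eigenvectors[:, 1], 0.0)  # orthogonal
```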
Background for PCA
PCA: Algebraic Interpretation
• Given m points in an n-dimensional space, for large n, how does one project onto a low-dimensional space while preserving broad trends in the data and allowing it to be visualized?
PCA: Algebraic Interpretation – 1D
• Given m points in an n-dimensional space, for large n, how does one project onto a 1-dimensional space?
• Choose a line that fits the data so the points are spread out well along the line
• Formally, minimize sum of squares of distances to the line.
PCA: Algebraic Interpretation – 1D
• Minimizing sum of squares of distances to the line is the same as maximizing the sum of squares of the projections on that line, thanks to Pythagoras.
PCA: General
From k original variables: x1,x2,...,xk:
Produce k new variables: y1,y2,...,yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
such that:
• the yk's are uncorrelated (orthogonal)
• y1 explains as much as possible of the original variance in the data set
• y2 explains as much as possible of the remaining variance
• etc.
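A sketch of this construction in numpy (random placeholder data): center the data, take the covariance matrix, and use its eigenvectors, sorted by eigenvalue, as the coefficients a_ij; the resulting y's come out uncorrelated:

```python
# PCA as described above: the a_ij are eigenvector coefficients.
import numpy as np

def pca(X):
    """Return (eigenvalues, coefficients, new variables) for X of shape (n, k)."""
    Xc = X - X.mean(axis=0)                  # center each variable
    C = np.cov(Xc, rowvar=False)             # k x k covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort: largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Y = Xc @ eigvecs                         # y_j = a_j1 x_1 + ... + a_jk x_k
    return eigvals, eigvecs, Y

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                 # placeholder data
eigvals, A, Y = pca(X)
print(np.round(np.cov(Y, rowvar=False), 6))  # ~diagonal: the y's are uncorrelated
```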
PCA: General
[Figure: a 2-D scatter with the 1st Principal Component, y1, along the direction of greatest variance and the 2nd Principal Component, y2, orthogonal to it.]
PCA Scores
[Figure: the same scatter; each observation (xi1, xi2) gets scores (yi,1, yi,2), its projections onto the principal components.]
PCA Eigenvalues
[Figure: the same scatter; the eigenvalues λ1 and λ2 measure the variance along the 1st and 2nd principal components.]
As before, the new variables y1, y2, ..., yk are built so that they are uncorrelated and each in turn explains as much as possible of the remaining variance. The yk's are the Principal Components.
PCA: Another Explanation
{a11, a12, ..., a1k} is the 1st eigenvector of the correlation/covariance matrix, and holds the coefficients of the 1st principal component

{a21, a22, ..., a2k} is the 2nd eigenvector of the correlation/covariance matrix, and holds the coefficients of the 2nd principal component

…

{ak1, ak2, ..., akk} is the kth eigenvector of the correlation/covariance matrix, and holds the coefficients of the kth principal component
PCA Summary
• Rotates multivariate dataset into a new configuration which is easier to interpret
• Purposes
  – simplify data
  – look at relationships between variables
  – look at patterns of units
A 2-D Numerical Example
PCA Example –STEP 1
• Subtract the mean from each of the data dimensions: all the x values have x̄ subtracted and all the y values have ȳ subtracted. This produces a data set whose mean is zero.

Subtracting the mean makes the variance and covariance calculations easier by simplifying their equations. The variance and covariance values are not affected by the mean value.
PCA Example –STEP 1
DATA:
x    y
2.5  2.4
0.5  0.7
2.2  2.9
1.9  2.2
3.1  3.0
2.3  2.7
2.0  1.6
1.0  1.1
1.5  1.6
1.1  0.9
ZERO MEAN DATA:
x y
.69 .49
-1.31 -1.21
.39 .99
.09 .29
1.29 1.09
.49 .79
.19 -.31
-.81 -.81
-.31 -.31
-.71 -1.01
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
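Step 1 in code, as a small numpy sketch that reproduces the zero-mean data above:

```python
# Step 1: subtract the mean of each dimension.
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
                 [1.5, 1.6], [1.1, 0.9]])

zero_mean = data - data.mean(axis=0)   # means: x-bar = 1.81, y-bar = 1.91
print(zero_mean[0])                    # [0.69 0.49], the first row of the table
```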
PCA Example –STEP 2
• Calculate the covariance matrix
cov = ( .616555556  .615444444
        .615444444  .716555556 )

• Since the non-diagonal elements of this covariance matrix are positive, we should expect that the x and y variables increase together.
PCA Example –STEP 3
• Calculate the eigenvectors and eigenvalues of the covariance matrix
eigenvalues = .0490833989
1.28402771
eigenvectors = -.735178656 -.677873399
.677873399 -.735178656
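Steps 2 and 3 in code, continuing the same example; this reproduces the covariance matrix and eigenvalues above (np.linalg.eigh lists eigenvalues in ascending order and may flip eigenvector signs relative to the printed ones):

```python
# Steps 2-3: covariance matrix of the centered data, then its
# eigendecomposition.
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
                 [1.5, 1.6], [1.1, 0.9]])
zero_mean = data - data.mean(axis=0)

cov = np.cov(zero_mean, rowvar=False)
print(cov)                # [[0.61655556 0.61544444]
                          #  [0.61544444 0.71655556]]

eigenvalues, eigenvectors = np.linalg.eigh(cov)
print(eigenvalues)        # [0.0490834 1.2840277]
print(eigenvectors)       # columns are the unit-length eigenvectors
```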
PCA Example –STEP 3
• The eigenvectors are plotted as diagonal dotted lines on the plot.
• Note that they are perpendicular to each other.
• Note that one of the eigenvectors goes through the middle of the points, like a line of best fit.
• The second eigenvector gives us the other, less important, pattern in the data: all the points follow the main line but are off to the side of it by some amount.
PCA Example –STEP 4
• Reduce dimensionality and form a feature vector: the eigenvector with the highest eigenvalue is the principal component of the data set.

In our example, the eigenvector with the largest eigenvalue is the one that points down the middle of the data.

Once the eigenvectors are found from the covariance matrix, the next step is to order them by eigenvalue, highest to lowest. This gives the components in order of significance.
PCA Example –STEP 4
Now, if you like, you can decide to ignore the components of lesser significance.
You do lose some information, but if the eigenvalues are small, you don’t lose much:
• n dimensions in your data
• calculate n eigenvectors and eigenvalues
• choose only the first p eigenvectors
• final data set has only p dimensions
PCA Example –STEP 4
• Feature Vector
FeatureVector = (eig1 eig2 eig3 … eign)

We can either form a feature vector with both of the eigenvectors:

( -.677873399  -.735178656
  -.735178656   .677873399 )

or we can choose to leave out the smaller, less significant component and keep only a single column:

( -.677873399
  -.735178656 )
PCA Example –STEP 5
• Deriving the new data
FinalData = RowFeatureVector x RowZeroMeanData
• RowFeatureVector is the matrix with the eigenvectors in its rows (the eigenvector columns transposed), with the most significant eigenvector at the top.
• RowZeroMeanData is the mean-adjusted data transposed, i.e., the data items are in the columns, with each row holding a separate dimension.
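Step 5 in code, completing the same worked example; with the sign convention printed above for the feature vector, this reproduces the FinalData values shown on the next slide:

```python
# Step 5: FinalData = RowFeatureVector x RowZeroMeanData.
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
                 [1.5, 1.6], [1.1, 0.9]])
zero_mean = data - data.mean(axis=0)

feature = np.array([[-0.677873399, -0.735178656],     # eigenvectors as
                    [-0.735178656,  0.677873399]])    # columns, PC1 first

final_data = (feature.T @ zero_mean.T).T   # same as zero_mean @ feature
print(final_data[0])                       # [-0.82797019 -0.17511531]
# Keep only the first column of `feature` to reduce to one dimension.
```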
PCA Example –STEP 5
FinalData transpose: dimensions along columns x y
-.827970186 -.1751153071.77758033
.142857227-.992197494
.384374989-.274210416
.130417207-1.67580142
-.209498461-.912949103
.175282444.0991094375 -.3498246981.14457216
.0464172582.438046137
.01776462971.22382056
-.162675287
PCA Example –STEP 5
[Figure: the transformed data plotted along the two eigenvector axes.]
A 6-D Numerical Example – XLSTAT
Car Type | Engine displacement (cm3) | Power (hp) | Speed (km/h) | Weight (kg) | Width (mm) | Length (mm)
Citroën C2 1.1 Base 1124 61 158 932 1659 3666
Smart Fortwo Coupé 698 52 135 730 1515 2500
Mini 1.6 170 1598 170 218 1215 1690 3625
Nissan Micra 1.2 65 1240 65 154 965 1660 3715
Renault Clio 3.0 V6 2946 255 245 1400 1810 3812
Audi A3 1.9 TDI 1896 105 187 1295 1765 4203
Peugeot 307 1.4 HDI 7 1398 70 160 1179 1746 4202
Peugeot 407 3.0 V6 BVA 2946 211 229 1640 1811 4676
Mercedes Classe C 270 CDI 2685 170 230 1600 1728 4528
BMW 530d 2993 218 245 1595 1846 4841
Jaguar S Type 2.7 V6 Bi 2720 207 230 1722 1818 4905
BMW 745i 4398 333 250 1870 1902 5029
Mercedes Classe S 400 CDI 3966 260 250 1915 2092 5038
Citroën C3 Pluriel 1.6i 1587 110 185 1177 1700 3934
BMW Z4 2.5i 2494 192 235 1260 1781 4091
Audi TT 1.8T 180 1781 180 228 1280 1764 4041
Aston Martin Vanquish 5935 460 306 1835 1923 4665
Bentley Continental GT 5998 560 318 2385 1918 4804
Ferrari Enzo 5998 660 350 1365 2650 4700
Renault Scenic 1.9 dCi 120 1870 120 188 1430 1805 4259
Volkswagen Touran 1.9 TDI 105 1896 105 180 1498 1794 4391
Land Rover Defender Td5 2495 122 135 1695 1790 3883
Land Rover Discovery Td5 2495 138 157 2175 2190 4705
Nissan X Trail 2.2 dCi 2184 136 180 1520 1765 4455
Correlation matrix (Pearson (n)):
Variables                   Engine displacement (cm3)  Power (hp)  Speed (km/h)  Weight (kg)  Length (mm)  Width (mm)
Engine displacement (cm3)   1                          0.954       0.885         0.692        0.706        0.546
Power (hp)                  0.954                      1           0.934         0.529        0.730        0.448
Speed (km/h)                0.885                      0.934       1             0.466        0.619        0.488
Weight (kg)                 0.692                      0.529       0.466         1            0.477        0.679
Length (mm)                 0.706                      0.730       0.619         0.477        1            0.468
Width (mm)                  0.546                      0.448       0.488         0.679        0.468        1
Principal Component Analysis:
Eigenvalues:
                 F1      F2      F3      F4      F5      F6
Eigenvalue       4.256   0.882   0.432   0.348   0.063   0.019
Variability (%)  70.931  14.694  7.204   5.799   1.053   0.320
Cumulative %     70.931  85.625  92.829  98.627  99.680  100.000
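The variability percentages are each eigenvalue's share of the total; the six eigenvalues sum to 6 because the analysis is based on the correlation matrix of six standardized variables. A quick numpy check:

```python
# Variability (%) and Cumulative (%) from the eigenvalues above.
import numpy as np

eigenvalues = np.array([4.256, 0.882, 0.432, 0.348, 0.063, 0.019])
variability = 100 * eigenvalues / eigenvalues.sum()

print(np.round(variability, 3))           # [70.933 14.7 7.2 5.8 1.05 0.317]
print(np.round(variability.cumsum(), 3))  # ends at 100.0
```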
[Scree plot: eigenvalue (left axis) and cumulative variability % (right axis) for axes F1 through F6.]
Correlations between variables and factors:
                            F1      F2      F3      F4      F5      F6
Engine displacement (cm3)   0.962   -0.129  -0.136  -0.107  -0.140  -0.085
Power (hp)                  0.932   -0.326  -0.092  -0.004  -0.077   0.104
Speed (km/h)                0.891   -0.307  -0.220   0.185   0.170  -0.027
Weight (kg)                 0.745    0.531  -0.112  -0.381   0.074   0.018
Length (mm)                 0.796   -0.140   0.585  -0.058   0.037  -0.011
Width (mm)                  0.693    0.602   0.044   0.392  -0.042   0.006
Contribution of the variables (%):
                            F1      F2      F3      F4      F5      F6
Engine displacement (cm3)   21.764   1.892   4.308   3.318  31.200  37.517
Power (hp)                  20.414  12.034   1.946   0.004   9.348  56.254
Speed (km/h)                18.642  10.677  11.202   9.886  45.793   3.801
Weight (kg)                 13.027  32.010   2.884  41.701   8.746   1.631
Length (mm)                 14.876   2.233  79.214   0.965   2.110   0.601
Width (mm)                  11.277  41.154   0.445  44.126   2.803   0.196
[Variables chart (axes F1 and F2: 85.62 %): correlation circle of the six variables on F1 (70.93 %) and F2 (14.69 %).]
[Observations chart (axes F1 and F2: 85.62 %): the 24 car models plotted by their scores on F1 (70.93 %) and F2 (14.69 %).]
[Biplot (axes F1 and F2: 85.62 %): variables and observations shown together on F1 (70.93 %) and F2 (14.69 %).]
For More on PCA …
• Textbook Chapter 20
• Technical Note on PCA (see attached PDF)
• Case Studies and Examples of PCA (see attached PDF)