Session: Spectral Theorem, PCA & Singular Value Decomposition
Optimization and Computational Linear Algebra for Data Science
Marylou Gabrié (based on material by Léo Miolane)
Midterm

The midterm exam is in one week.
Scope: Session � to Session � included; HW� to HW� included.
Knowing is not enough! You need to practice: review problems are available on last year's course webpage.
Practice is not enough! You need to know the definitions/theorems/propositions.
Past years' midterms are also available, with solutions.
Important: when working on a problem, spend at least �� minutes on it before looking at the solution (in case you are stuck).
You can bring notes, but if you think that you need them for the exam, you are probably not prepared enough.
Contents

1. The Spectral Theorem
   1.1 The Spectral Theorem
   1.2 Consequences
   1.3 The Theorem behind PCA
2. Principal Component Analysis
3. Singular Value Decomposition
1. The Spectral Theorem

1.1 The Spectral Theorem
Theorem. Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. Then there is an orthonormal basis of $\mathbb{R}^n$ composed of eigenvectors of $A$.

That means that if $A$ is symmetric, then there exists an orthonormal basis $(v_1, \dots, v_n)$ of $\mathbb{R}^n$ and $\lambda_1, \dots, \lambda_n \in \mathbb{R}$ such that
$$A v_i = \lambda_i v_i \quad \text{for all } i \in \{1, \dots, n\}.$$

Theorem (Matrix formulation). Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. Then there exist an orthogonal matrix $P$ and a diagonal matrix $D$, both of size $n \times n$, such that
$$A = P D P^T.$$
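As a quick numerical illustration (a minimal NumPy sketch; the random symmetric matrix is an arbitrary example), `numpy.linalg.eigh` computes exactly such a decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                       # symmetrize to get a symmetric matrix

lam, P = np.linalg.eigh(A)              # eigenvalues (ascending) and orthonormal eigenvectors
D = np.diag(lam)

assert np.allclose(P @ P.T, np.eye(4))  # P is orthogonal
assert np.allclose(P @ D @ P.T, A)      # A = P D P^T
```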
The spectral orthonormal basis

From the basis formulation to the matrix formulation: let $(v_1, \dots, v_n)$ be an orthonormal basis of $\mathbb{R}^n$ made of eigenvectors of $A$, and define $P = (v_1 | \cdots | v_n)$ and $D = \mathrm{Diag}(\lambda_1, \dots, \lambda_n)$; $P$ is orthogonal because its columns are orthonormal. Writing any $x \in \mathbb{R}^n$ as $x = \alpha_1 v_1 + \cdots + \alpha_n v_n$, we get
$$A x = \alpha_1 \lambda_1 v_1 + \cdots + \alpha_n \lambda_n v_n = P D P^T x.$$
Since $P D P^T x = A x$ for all $x \in \mathbb{R}^n$, we conclude $A = P D P^T$.
Geometric interpretation

[Hand-drawn sketch: for a symmetric matrix $A = P D P^T$, applying $A$ amounts to rotating by $P^T$, scaling each coordinate axis by the corresponding eigenvalue (the action of $D$), and rotating back by $P$.]
1.2 Consequences

If
$$A = P \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix} P^T$$
for some orthogonal matrix $P$, then:

Consequence #1: $\lambda_1, \dots, \lambda_n$ are the only eigenvalues of $A$, and the number of times an eigenvalue appears on the diagonal equals its multiplicity.
Proof sketch on an example

Consider $n = 3$ and
$$A = P \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix} P^T, \quad \text{where } P = (v_1 | v_2 | v_3) \text{ is an orthogonal matrix.}$$
Then $A v_1 = 3 v_1$, $A v_2 = 3 v_2$ and $A v_3 = -v_3$. The multiplicity of the eigenvalue $3$ satisfies $m_3 \geq 2$, since $\mathrm{Span}(v_1, v_2) \subseteq E_3$ (its eigenspace), and the multiplicity of $-1$ satisfies $m_{-1} \geq 1$, since $\mathrm{Span}(v_3) \subseteq E_{-1}$. Since $m_3 + m_{-1} \leq 3 = \dim \mathbb{R}^3$, both inequalities are equalities and there are no other eigenvalues.
With the same decomposition $A = P \,\mathrm{Diag}(\lambda_1, \dots, \lambda_n)\, P^T$, $P$ orthogonal:

Consequence #2: the rank of $A$ equals the number of non-zero $\lambda_i$'s on the diagonal:
$$\mathrm{rank}(A) = \#\{\, i \mid \lambda_i \neq 0 \,\}.$$
Proof

The eigenspace associated with the eigenvalue $0$ is $\ker(A)$. Order the eigenvalues so that $\lambda_1, \dots, \lambda_r \neq 0$ and $\lambda_{r+1} = \cdots = \lambda_n = 0$. Then $(v_{r+1}, \dots, v_n)$ is a basis of $\ker(A)$:
- $v_{r+1}, \dots, v_n$ are linearly independent, since the family is orthonormal;
- $\mathrm{Span}(v_{r+1}, \dots, v_n) \subseteq \ker(A)$, since $A v_i = 0$ for $i > r$;
- conversely, if $x \in \ker(A)$, write $x = \alpha_1 v_1 + \cdots + \alpha_n v_n$; then $0 = A x = \alpha_1 \lambda_1 v_1 + \cdots + \alpha_r \lambda_r v_r$ forces $\alpha_1 = \cdots = \alpha_r = 0$, i.e. $x \in \mathrm{Span}(v_{r+1}, \dots, v_n)$.
Hence $\dim \ker(A) = n - r$, and $\mathrm{rank}(A) = n - \dim \ker(A) = r = \#\{\, i \mid \lambda_i \neq 0 \,\}$.
Consequence #3: $A$ is invertible if and only if $\lambda_i \neq 0$ for all $i$. In that case,
$$A^{-1} = P \begin{pmatrix} 1/\lambda_1 & 0 & \cdots & 0 \\ 0 & 1/\lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1/\lambda_n \end{pmatrix} P^T.$$
Proof: exercise.
Consequence #4: $\mathrm{Tr}(A) = \lambda_1 + \cdots + \lambda_n$.

Proof: recall that $\mathrm{Tr}(A) = \sum_{i=1}^n A_{ii}$ for $A \in \mathbb{R}^{n \times n}$, and that $\mathrm{Tr}(BC) = \mathrm{Tr}(CB)$ for $B \in \mathbb{R}^{n \times m}$ and $C \in \mathbb{R}^{m \times n}$. Hence
$$\mathrm{Tr}(A) = \mathrm{Tr}(P D P^T) = \mathrm{Tr}(P^T P D) = \mathrm{Tr}(D) = \lambda_1 + \cdots + \lambda_n.$$
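A small NumPy check of Consequences #2 to #4 (an illustrative sketch; the chosen eigenvalues are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
P, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # a random orthogonal matrix
lam = np.array([3.0, 3.0, -1.0, 0.0])             # chosen eigenvalues, one of them zero
A = P @ np.diag(lam) @ P.T

assert np.linalg.matrix_rank(A) == np.count_nonzero(lam)  # Consequence #2: rank
assert np.isclose(np.trace(A), lam.sum())                 # Consequence #4: trace

# Consequence #3: invertible iff all eigenvalues are non-zero (A above is singular).
lam2 = np.array([3.0, 3.0, -1.0, 2.0])
B = P @ np.diag(lam2) @ P.T
assert np.allclose(np.linalg.inv(B), P @ np.diag(1 / lam2) @ P.T)
```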
1.3 The Theorem behind PCA
Theorem. Let $A$ be an $n \times n$ symmetric matrix, let $\lambda_1 \geq \cdots \geq \lambda_n$ be its $n$ eigenvalues and $v_1, \dots, v_n$ an associated orthonormal family of eigenvectors. Then
$$\lambda_1 = \max_{\|v\| = 1} v^T A v \qquad \text{and} \qquad v_1 = \operatorname*{arg\,max}_{\|v\| = 1} v^T A v.$$
Moreover, for $k = 2, \dots, n$:
$$\lambda_k = \max_{\|v\| = 1,\ v \perp v_1, \dots, v_{k-1}} v^T A v \qquad \text{and} \qquad v_k = \operatorname*{arg\,max}_{\|v\| = 1,\ v \perp v_1, \dots, v_{k-1}} v^T A v.$$
For instance, for $k = 2$: $\lambda_2 = \max_{\|v\| = 1,\ v \perp v_1} v^T A v$ and $v_2 = \operatorname*{arg\,max}_{\|v\| = 1,\ v \perp v_1} v^T A v$.
Proof

Let $u \in \mathbb{R}^n$ with $\|u\| = 1$, and let $(\alpha_1, \dots, \alpha_n)$ be the coordinates of $u$ in the basis $(v_1, \dots, v_n)$:
$$u = \alpha_1 v_1 + \cdots + \alpha_n v_n, \qquad A u = \alpha_1 \lambda_1 v_1 + \cdots + \alpha_n \lambda_n v_n.$$
Hence
$$u^T A u = \langle u, A u \rangle = \lambda_1 \alpha_1^2 + \cdots + \lambda_n \alpha_n^2,$$
and we want to maximize $u^T A u$ subject to $\|u\| = 1$.
Equivalently, we maximize $\lambda_1 \alpha_1^2 + \cdots + \lambda_n \alpha_n^2$ subject to $\alpha_1^2 + \cdots + \alpha_n^2 = 1$ (note that $\|u\|^2 = \alpha_1^2 + \cdots + \alpha_n^2$). Since $\lambda_1 \geq \cdots \geq \lambda_n$, the maximum is achieved for $\alpha_1 = 1$, $\alpha_2 = \cdots = \alpha_n = 0$. This corresponds to $u = v_1$, for which $u^T A u = \lambda_1$.
If we now want to maximize $u^T A u$ subject to $\|u\| = 1$ and $u \perp v_1$, the constraint $u \perp v_1$ gives $\alpha_1 = \langle u, v_1 \rangle = 0$, so we maximize $\lambda_2 \alpha_2^2 + \cdots + \lambda_n \alpha_n^2$ subject to $\alpha_2^2 + \cdots + \alpha_n^2 = 1$: as before, the maximum is $\lambda_2$, achieved at $u = v_2$. Iterating the argument gives the result for every $k$.
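To see the theorem numerically, here is a small NumPy sketch (illustrative; the test matrix and the number of random trial vectors are arbitrary) comparing $v^T A v$ over random unit vectors with $\lambda_1$:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                                # a symmetric test matrix

lam, V = np.linalg.eigh(A)                       # ascending: lam[-1] is the largest eigenvalue
v1 = V[:, -1]

vs = rng.standard_normal((10000, 5))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)  # 10000 random unit vectors
quad = np.einsum("ij,jk,ik->i", vs, A, vs)       # v^T A v for each row v

assert quad.max() <= lam[-1] + 1e-9              # no unit vector beats lambda_1
assert np.isclose(v1 @ A @ v1, lam[-1])          # ... and v_1 attains it
```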
2. Principal Component Analysis
Empirical mean and covariance

We are given a dataset of $n$ points $a_1, \dots, a_n \in \mathbb{R}^d$.

For $d = 1$:
Mean: $\mu = \frac{1}{n} \sum_{i=1}^n a_i \in \mathbb{R}$.
Variance: $\sigma^2 = \frac{1}{n} \sum_{i=1}^n (a_i - \mu)^2 \in \mathbb{R}$.

For $d \geq 2$:
Mean: $\mu = \frac{1}{n} \sum_{i=1}^n a_i \in \mathbb{R}^d$.
Covariance matrix:
$$S = \frac{1}{n} \sum_{i=1}^n (a_i - \mu)(a_i - \mu)^T \in \mathbb{R}^{d \times d}, \qquad \text{which reduces to } S = \frac{1}{n} \sum_{i=1}^n a_i a_i^T \text{ if } \mu = 0.$$
PCA

We are given a dataset of $n$ points $a_1, \dots, a_n \in \mathbb{R}^d$, where $d$ is "large".
Goal: represent this dataset in lower dimension, i.e. find $\tilde{a}_1, \dots, \tilde{a}_n \in \mathbb{R}^k$ where $k \ll d$.
Assume that the dataset is centered: $\sum_{i=1}^n a_i = 0$. Then $S$ can be simply written (dropping the $1/n$ factor, which rescales the eigenvalues but leaves the eigenvectors unchanged) as
$$S = \sum_{i=1}^n a_i a_i^T = A^T A,$$
where $A$ is the $n \times d$ "data matrix" (also called the design matrix):
$$A = \begin{pmatrix} \text{---} \; a_1^T \; \text{---} \\ \vdots \\ \text{---} \; a_n^T \; \text{---} \end{pmatrix}.$$
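In code, centering the data and forming the covariance might look like this minimal NumPy sketch (the random data is a stand-in):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 10
A = rng.standard_normal((n, d))  # rows are the data points a_1, ..., a_n

A = A - A.mean(axis=0)           # center: each column now sums to zero
S = A.T @ A / n                  # empirical covariance matrix, d x d
assert np.allclose(S, S.T)       # S is symmetric
```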
Direction of maximal variance

Projecting the (centered) dataset onto a unit vector $v$ gives the scalars $\langle v, a_1 \rangle, \dots, \langle v, a_n \rangle$. Their mean is $\frac{1}{n} \sum_{i=1}^n \langle v, a_i \rangle = \langle v, \frac{1}{n} \sum_{i=1}^n a_i \rangle = 0$, so their variance is
$$\frac{1}{n} \sum_{i=1}^n \langle v, a_i \rangle^2 = \frac{1}{n} \sum_{i=1}^n v^T a_i a_i^T v = v^T \Big( \frac{1}{n} \sum_{i=1}^n a_i a_i^T \Big) v = v^T S v \in \mathbb{R}.$$
The direction of maximal variance is the unit vector $v$ that maximizes $v^T S v$.
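This identity is easy to check numerically (an illustrative NumPy sketch with arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((500, 6))
A -= A.mean(axis=0)                       # center the data
S = A.T @ A / len(A)                      # covariance matrix

v = rng.standard_normal(6)
v /= np.linalg.norm(v)                    # a random unit direction

proj = A @ v                              # the scalars <v, a_i>
assert np.isclose(proj.var(), v @ S @ v)  # empirical variance equals v^T S v
```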
Direction of maximal variance

Good news: $S = A^T A$ is symmetric.
Spectral Theorem: let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$ be the eigenvalues of $S$ and $(v_1, \dots, v_d)$ an associated orthonormal basis of eigenvectors. Moreover, $S$ is positive semi-definite, since $v^T S v = \|A v\|^2 \geq 0$; hence $\lambda_1 \geq \cdots \geq \lambda_d \geq 0$.
By the theorem of Section 1.3, $v \mapsto v^T S v$ is maximized over $\|v\| = 1$ by $v = v_1$.
A one-dimensional reduction of the dataset $a_1, \dots, a_n \in \mathbb{R}^d$ is therefore $\langle v_1, a_1 \rangle, \dots, \langle v_1, a_n \rangle$, each in $\mathbb{R}$.
2nd direction of maximal variance

By the theorem, $v_2$ maximizes $v^T S v$ subject to $\|v\| = 1$ and $v \perp v_1$.
A 2-dimensional reduction of the dataset is then $\tilde{a}_1, \dots, \tilde{a}_n$, each in $\mathbb{R}^2$, with
$$\tilde{a}_i = \begin{pmatrix} \langle v_1, a_i \rangle \\ \langle v_2, a_i \rangle \end{pmatrix}.$$
jth direction of maximal variance

The "jth direction of maximal variance" is $v_j$, since $v_j$ is a solution of
$$\text{maximize } v^T S v, \quad \text{subject to } \|v\| = 1,\ v \perp v_1,\ v \perp v_2,\ \dots,\ v \perp v_{j-1}.$$
The dimensionally reduced dataset in $k$ dimensions is then
$$\tilde{a}_1 = \begin{pmatrix} \langle v_1, a_1 \rangle \\ \langle v_2, a_1 \rangle \\ \vdots \\ \langle v_k, a_1 \rangle \end{pmatrix}, \quad \tilde{a}_2 = \begin{pmatrix} \langle v_1, a_2 \rangle \\ \langle v_2, a_2 \rangle \\ \vdots \\ \langle v_k, a_2 \rangle \end{pmatrix}, \quad \tilde{a}_3 = \begin{pmatrix} \langle v_1, a_3 \rangle \\ \langle v_2, a_3 \rangle \\ \vdots \\ \langle v_k, a_3 \rangle \end{pmatrix}, \quad \dots, \quad \tilde{a}_n = \begin{pmatrix} \langle v_1, a_n \rangle \\ \langle v_2, a_n \rangle \\ \vdots \\ \langle v_k, a_n \rangle \end{pmatrix}.$$
Recap

How to compute the reduced-dimensional dataset (a NumPy sketch follows below):
1. Center the data so that $\sum_{i=1}^n a_i = 0$.
2. Compute the covariance $S = \frac{1}{n} \sum_{i=1}^n a_i a_i^T = \frac{1}{n} A^T A$.
3. Compute the eigendecomposition of $S$, sorted so that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$, with eigenvectors $v_1, \dots, v_d$.
4. Keep the $k$ largest and compute $\tilde{a}_i = (\langle v_1, a_i \rangle, \dots, \langle v_k, a_i \rangle)^T \in \mathbb{R}^k$.
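The sketch (illustrative; the function name `pca_reduce` and the random data are placeholders):

```python
import numpy as np

def pca_reduce(A: np.ndarray, k: int) -> np.ndarray:
    """Reduce the rows of A (n x d, one data point per row) to k dimensions."""
    A = A - A.mean(axis=0)        # 1. center the data
    S = A.T @ A / len(A)          # 2. covariance matrix (d x d)
    lam, V = np.linalg.eigh(S)    # 3. eigendecomposition, ascending order
    Vk = V[:, ::-1][:, :k]        #    re-sort descending, keep the top k eigenvectors
    return A @ Vk                 # 4. coordinates <v_j, a_i>: an n x k matrix

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 8))
X_reduced = pca_reduce(X, k=2)
print(X_reduced.shape)            # (100, 2)
```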
Which value of k should we take?

1st way: keep a given fraction of the total variance.
Choose the smallest $k$ such that
$$\frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_d} \geq 0.8.$$
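In code (an illustrative sketch; 0.8 is just the example threshold from the slide):

```python
import numpy as np

def choose_k(eigenvalues: np.ndarray, threshold: float = 0.8) -> int:
    """Smallest k keeping at least `threshold` of the total variance."""
    lam = np.sort(eigenvalues)[::-1]    # sort in decreasing order
    frac = np.cumsum(lam) / lam.sum()   # fraction of variance kept for each k
    return int(np.argmax(frac >= threshold)) + 1

print(choose_k(np.array([5.0, 3.0, 1.0, 0.5, 0.5])))  # 2, since (5+3)/10 = 0.8
```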
Which value of k should we take?

2nd way: plot the eigenvalues of $S$ (in decreasing order) and look for a gap.
[Hand-drawn plot: the eigenvalues of $S$ with a visible gap between $\lambda_2$ and $\lambda_3$; here one would keep $k = 2$.]
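Such a plot takes a few lines of NumPy/Matplotlib (an illustrative sketch; the synthetic data is built to have a gap after $\lambda_2$):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data whose signal lives in a 2-d subspace, plus noise.
rng = np.random.default_rng(8)
A = rng.standard_normal((300, 2)) @ rng.standard_normal((2, 10)) * 3
A += rng.standard_normal((300, 10))
A -= A.mean(axis=0)
S = A.T @ A / len(A)

lam = np.sort(np.linalg.eigvalsh(S))[::-1]  # eigenvalues of S, decreasing
plt.plot(range(1, 11), lam, "o-")
plt.xlabel("$i$")
plt.ylabel(r"$\lambda_i$")
plt.show()                                  # visible gap between lambda_2 and lambda_3
```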
3. Singular Value Decomposition
[Hand-drawn sketch: a centered dataset $a_1, \dots, a_n$, stacked as the rows of the data matrix $A$.]
PCA

Data matrix $A \in \mathbb{R}^{n \times m}$.
"Covariance matrix" $S = A^T A \in \mathbb{R}^{m \times m}$.
$S$ is symmetric positive semi-definite.
Spectral Theorem: there exists an orthonormal basis $v_1, \dots, v_m$ of $\mathbb{R}^m$ such that the $v_i$'s are eigenvectors of $S$ associated with the eigenvalues $\lambda_1 \geq \cdots \geq \lambda_m \geq 0$.
Singular values/vectors

Let $r = \mathrm{rank}(A)$. For $i = 1, \dots, m$:
- we define $\sigma_i = \sqrt{\lambda_i}$, called the $i$th singular value of $A$;
- we call $v_i$ the $i$th right singular vector of $A$.
For $i = 1, \dots, r$:
- we call $u_i = \frac{1}{\sigma_i} A v_i$ the $i$th left singular vector of $A$.
If $r < n$, we add $u_{r+1}, \dots, u_n$ such that $u_1, \dots, u_n$ is an orthonormal basis of $\mathbb{R}^n$.
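These definitions can be checked against `numpy.linalg.svd` (an illustrative sketch; the random matrix is full rank almost surely, so $r = m$ here):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))          # n = 6, m = 4

lam, V = np.linalg.eigh(A.T @ A)         # eigendecomposition of S = A^T A
lam, V = lam[::-1], V[:, ::-1]           # re-sort in decreasing order
sigma = np.sqrt(np.clip(lam, 0, None))   # singular values sigma_i = sqrt(lambda_i)
U = A @ V / sigma                        # left singular vectors u_i = A v_i / sigma_i

assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False))  # matches NumPy
assert np.allclose(U.T @ U, np.eye(4), atol=1e-8)              # the u_i are orthonormal
```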
Singular Value Decomposition

Theorem. Let $A \in \mathbb{R}^{n \times m}$. Then there exist two orthogonal matrices $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{m \times m}$, and a matrix $\Sigma \in \mathbb{R}^{n \times m}$ with $\Sigma_{1,1} \geq \Sigma_{2,2} \geq \cdots \geq 0$ and $\Sigma_{i,j} = 0$ for $i \neq j$, such that
$$A = U \Sigma V^T.$$
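`numpy.linalg.svd` returns exactly these factors (a quick illustrative check):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A)            # U: 6x6 orthogonal, s: singular values, Vt = V^T
Sigma = np.zeros((6, 4))
Sigma[:4, :4] = np.diag(s)             # embed the singular values in an n x m matrix

assert np.allclose(U @ Sigma @ Vt, A)  # A = U Sigma V^T
```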
Geometric interpretation of $U \Sigma V^T$
Note: covariance matrix, entry by entry. Write $A = (a_1 | \cdots | a_n)^T$, so that row $i$ of $A$ is the data point $a_i = (a_{i,1}, \dots, a_{i,d})$. For centered data:
- the variance of the 1st coordinate is $\frac{1}{n} \sum_{i=1}^n a_{i,1}^2$;
- the covariance of the 1st and 2nd coordinates is $\frac{1}{n} \sum_{i=1}^n a_{i,1} a_{i,2}$ (here $i$ is the data-point index);
- in general, the covariance of features $k$ and $\ell$ is $S_{k,\ell} = \frac{1}{n} \sum_{i=1}^n (a_{i,k} - \mu_k)(a_{i,\ell} - \mu_\ell)$.
Questions?