Session: Spectral Theorem, PCA & Singular Value Decomposition
Optimization and Computational Linear Algebra for Data Science
Marylou Gabrié (based on material by Léo Miolane)
Midterm

The midterm exam is in one week.
Scope: Session � to Session � included; HW� to HW� included.
Knowing is not enough! You need to practice: review problems are available on last year's course webpage.
Practice is not enough! You need to know the definitions/theorems/propositions.
Past years' midterms are also available, with solutions.
Important: when working on a problem, spend at least �� minutes on it before looking at the solution (in case you are stuck).
You can bring notes, but if you think that you need them for the exam, you are probably not prepared enough.
Contents

1. The Spectral Theorem
   1.1 The Spectral Theorem
   1.2 Consequences
   1.3 The Theorem behind PCA
2. Principal Component Analysis
3. Singular Value Decomposition
1. The Spectral Theorem

1.1 The Spectral Theorem
Theorem. Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. Then there is an orthonormal basis of $\mathbb{R}^n$ composed of eigenvectors of $A$.

That means that if $A$ is symmetric, then there exists an orthonormal basis $(v_1, \dots, v_n)$ of $\mathbb{R}^n$ and $\lambda_1, \dots, \lambda_n \in \mathbb{R}$ such that
$$A v_i = \lambda_i v_i \quad \text{for all } i \in \{1, \dots, n\}.$$

Theorem (Matrix formulation). Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. Then there exist an orthogonal matrix $P$ and a diagonal matrix $D$, both of size $n \times n$, such that
$$A = P D P^T.$$
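As a quick numerical illustration (a minimal NumPy sketch; the random symmetric matrix is an arbitrary example), `numpy.linalg.eigh` computes exactly such a decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                       # symmetrize to get a symmetric matrix

lam, P = np.linalg.eigh(A)              # eigenvalues (ascending) and orthonormal eigenvectors
D = np.diag(lam)

assert np.allclose(P @ P.T, np.eye(4))  # P is orthogonal
assert np.allclose(P @ D @ P.T, A)      # A = P D P^T
```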
The spectral orthonormal basis

From the basis formulation to the matrix formulation: let $(v_1, \dots, v_n)$ be an orthonormal basis of $\mathbb{R}^n$ made of eigenvectors of $A$, and define $P = (v_1 | \cdots | v_n)$ and $D = \mathrm{Diag}(\lambda_1, \dots, \lambda_n)$; $P$ is orthogonal because its columns are orthonormal. Writing any $x \in \mathbb{R}^n$ as $x = \alpha_1 v_1 + \cdots + \alpha_n v_n$, we get
$$A x = \alpha_1 \lambda_1 v_1 + \cdots + \alpha_n \lambda_n v_n = P D P^T x.$$
Since $P D P^T x = A x$ for all $x \in \mathbb{R}^n$, we conclude $A = P D P^T$.
Geometric interpretation

[Hand-drawn sketch: for a symmetric matrix $A = P D P^T$, applying $A$ amounts to rotating by $P^T$, scaling each coordinate axis by the corresponding eigenvalue (the action of $D$), and rotating back by $P$.]
1.2 Consequences

If
$$A = P \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix} P^T$$
for some orthogonal matrix $P$, then:

Consequence #1: $\lambda_1, \dots, \lambda_n$ are the only eigenvalues of $A$, and the number of times an eigenvalue appears on the diagonal equals its multiplicity.
Proof sketch on an example

Consider $n = 3$ and
$$A = P \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{pmatrix} P^T, \quad \text{where } P = (v_1 | v_2 | v_3) \text{ is an orthogonal matrix.}$$
Then $A v_1 = 3 v_1$, $A v_2 = 3 v_2$ and $A v_3 = -v_3$. The multiplicity of the eigenvalue $3$ satisfies $m_3 \geq 2$, since $\mathrm{Span}(v_1, v_2) \subseteq E_3$ (its eigenspace), and the multiplicity of $-1$ satisfies $m_{-1} \geq 1$, since $\mathrm{Span}(v_3) \subseteq E_{-1}$. Since $m_3 + m_{-1} \leq 3 = \dim \mathbb{R}^3$, both inequalities are equalities and there are no other eigenvalues.
With the same decomposition $A = P \,\mathrm{Diag}(\lambda_1, \dots, \lambda_n)\, P^T$, $P$ orthogonal:

Consequence #2: the rank of $A$ equals the number of non-zero $\lambda_i$'s on the diagonal:
$$\mathrm{rank}(A) = \#\{\, i \mid \lambda_i \neq 0 \,\}.$$
Proof

The eigenspace associated with the eigenvalue $0$ is $\ker(A)$. Order the eigenvalues so that $\lambda_1, \dots, \lambda_r \neq 0$ and $\lambda_{r+1} = \cdots = \lambda_n = 0$. Then $(v_{r+1}, \dots, v_n)$ is a basis of $\ker(A)$:
- $v_{r+1}, \dots, v_n$ are linearly independent, since the family is orthonormal;
- $\mathrm{Span}(v_{r+1}, \dots, v_n) \subseteq \ker(A)$, since $A v_i = 0$ for $i > r$;
- conversely, if $x \in \ker(A)$, write $x = \alpha_1 v_1 + \cdots + \alpha_n v_n$; then $0 = A x = \alpha_1 \lambda_1 v_1 + \cdots + \alpha_r \lambda_r v_r$ forces $\alpha_1 = \cdots = \alpha_r = 0$, i.e. $x \in \mathrm{Span}(v_{r+1}, \dots, v_n)$.
Hence $\dim \ker(A) = n - r$, and $\mathrm{rank}(A) = n - \dim \ker(A) = r = \#\{\, i \mid \lambda_i \neq 0 \,\}$.
Consequence #3: $A$ is invertible if and only if $\lambda_i \neq 0$ for all $i$. In that case,
$$A^{-1} = P \begin{pmatrix} 1/\lambda_1 & 0 & \cdots & 0 \\ 0 & 1/\lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1/\lambda_n \end{pmatrix} P^T.$$
Proof: exercise.
Consequence #4: $\mathrm{Tr}(A) = \lambda_1 + \cdots + \lambda_n$.

Proof: recall that $\mathrm{Tr}(A) = \sum_{i=1}^n A_{ii}$ for $A \in \mathbb{R}^{n \times n}$, and that $\mathrm{Tr}(BC) = \mathrm{Tr}(CB)$ for $B \in \mathbb{R}^{n \times m}$ and $C \in \mathbb{R}^{m \times n}$. Hence
$$\mathrm{Tr}(A) = \mathrm{Tr}(P D P^T) = \mathrm{Tr}(P^T P D) = \mathrm{Tr}(D) = \lambda_1 + \cdots + \lambda_n.$$
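A small NumPy check of Consequences #2 to #4 (an illustrative sketch; the chosen eigenvalues are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
P, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # a random orthogonal matrix
lam = np.array([3.0, 3.0, -1.0, 0.0])             # chosen eigenvalues, one of them zero
A = P @ np.diag(lam) @ P.T

assert np.linalg.matrix_rank(A) == np.count_nonzero(lam)  # Consequence #2: rank
assert np.isclose(np.trace(A), lam.sum())                 # Consequence #4: trace

# Consequence #3: invertible iff all eigenvalues are non-zero (A above is singular).
lam2 = np.array([3.0, 3.0, -1.0, 2.0])
B = P @ np.diag(lam2) @ P.T
assert np.allclose(np.linalg.inv(B), P @ np.diag(1 / lam2) @ P.T)
```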
1.3 The Theorem behind PCA
Theorem. Let $A$ be an $n \times n$ symmetric matrix, let $\lambda_1 \geq \cdots \geq \lambda_n$ be its $n$ eigenvalues and $v_1, \dots, v_n$ an associated orthonormal family of eigenvectors. Then
$$\lambda_1 = \max_{\|v\| = 1} v^T A v \qquad \text{and} \qquad v_1 = \operatorname*{arg\,max}_{\|v\| = 1} v^T A v.$$
Moreover, for $k = 2, \dots, n$:
$$\lambda_k = \max_{\|v\| = 1,\ v \perp v_1, \dots, v_{k-1}} v^T A v \qquad \text{and} \qquad v_k = \operatorname*{arg\,max}_{\|v\| = 1,\ v \perp v_1, \dots, v_{k-1}} v^T A v.$$
For instance, for $k = 2$: $\lambda_2 = \max_{\|v\| = 1,\ v \perp v_1} v^T A v$ and $v_2 = \operatorname*{arg\,max}_{\|v\| = 1,\ v \perp v_1} v^T A v$.
Proof

Let $u \in \mathbb{R}^n$ with $\|u\| = 1$, and let $(\alpha_1, \dots, \alpha_n)$ be the coordinates of $u$ in the basis $(v_1, \dots, v_n)$:
$$u = \alpha_1 v_1 + \cdots + \alpha_n v_n, \qquad A u = \alpha_1 \lambda_1 v_1 + \cdots + \alpha_n \lambda_n v_n.$$
Hence
$$u^T A u = \langle u, A u \rangle = \lambda_1 \alpha_1^2 + \cdots + \lambda_n \alpha_n^2,$$
and we want to maximize $u^T A u$ subject to $\|u\| = 1$.
Equivalently, we maximize $\lambda_1 \alpha_1^2 + \cdots + \lambda_n \alpha_n^2$ subject to $\alpha_1^2 + \cdots + \alpha_n^2 = 1$ (note that $\|u\|^2 = \alpha_1^2 + \cdots + \alpha_n^2$). Since $\lambda_1 \geq \cdots \geq \lambda_n$, the maximum is achieved for $\alpha_1 = 1$, $\alpha_2 = \cdots = \alpha_n = 0$. This corresponds to $u = v_1$, for which $u^T A u = \lambda_1$.
If we now want to maximize $u^T A u$ subject to $\|u\| = 1$ and $u \perp v_1$, the constraint $u \perp v_1$ gives $\alpha_1 = \langle u, v_1 \rangle = 0$, so we maximize $\lambda_2 \alpha_2^2 + \cdots + \lambda_n \alpha_n^2$ subject to $\alpha_2^2 + \cdots + \alpha_n^2 = 1$: as before, the maximum is $\lambda_2$, achieved at $u = v_2$. Iterating the argument gives the result for every $k$.
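To see the theorem numerically, here is a small NumPy sketch (illustrative; the test matrix and the number of random trial vectors are arbitrary) comparing $v^T A v$ over random unit vectors with $\lambda_1$:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                                # a symmetric test matrix

lam, V = np.linalg.eigh(A)                       # ascending: lam[-1] is the largest eigenvalue
v1 = V[:, -1]

vs = rng.standard_normal((10000, 5))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)  # 10000 random unit vectors
quad = np.einsum("ij,jk,ik->i", vs, A, vs)       # v^T A v for each row v

assert quad.max() <= lam[-1] + 1e-9              # no unit vector beats lambda_1
assert np.isclose(v1 @ A @ v1, lam[-1])          # ... and v_1 attains it
```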
2. Principal Component Analysis
Empirical mean and covariance

We are given a dataset of $n$ points $a_1, \dots, a_n \in \mathbb{R}^d$.

For $d = 1$:
Mean: $\mu = \frac{1}{n} \sum_{i=1}^n a_i \in \mathbb{R}$.
Variance: $\sigma^2 = \frac{1}{n} \sum_{i=1}^n (a_i - \mu)^2 \in \mathbb{R}$.

For $d \geq 2$:
Mean: $\mu = \frac{1}{n} \sum_{i=1}^n a_i \in \mathbb{R}^d$.
Covariance matrix:
$$S = \frac{1}{n} \sum_{i=1}^n (a_i - \mu)(a_i - \mu)^T \in \mathbb{R}^{d \times d}, \qquad \text{which reduces to } S = \frac{1}{n} \sum_{i=1}^n a_i a_i^T \text{ if } \mu = 0.$$
PCA

We are given a dataset of $n$ points $a_1, \dots, a_n \in \mathbb{R}^d$, where $d$ is "large".
Goal: represent this dataset in lower dimension, i.e. find $\tilde{a}_1, \dots, \tilde{a}_n \in \mathbb{R}^k$ where $k \ll d$.
Assume that the dataset is centered: $\sum_{i=1}^n a_i = 0$. Then $S$ can be simply written (dropping the $1/n$ factor, which rescales the eigenvalues but leaves the eigenvectors unchanged) as
$$S = \sum_{i=1}^n a_i a_i^T = A^T A,$$
where $A$ is the $n \times d$ "data matrix" (also called the design matrix):
$$A = \begin{pmatrix} \text{---} \; a_1^T \; \text{---} \\ \vdots \\ \text{---} \; a_n^T \; \text{---} \end{pmatrix}.$$
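In code, centering the data and forming the covariance might look like this minimal NumPy sketch (the random data is a stand-in):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 10
A = rng.standard_normal((n, d))  # rows are the data points a_1, ..., a_n

A = A - A.mean(axis=0)           # center: each column now sums to zero
S = A.T @ A / n                  # empirical covariance matrix, d x d
assert np.allclose(S, S.T)       # S is symmetric
```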
Direction of maximal variance

Projecting the (centered) dataset onto a unit vector $v$ gives the scalars $\langle v, a_1 \rangle, \dots, \langle v, a_n \rangle$. Their mean is $\frac{1}{n} \sum_{i=1}^n \langle v, a_i \rangle = \langle v, \frac{1}{n} \sum_{i=1}^n a_i \rangle = 0$, so their variance is
$$\frac{1}{n} \sum_{i=1}^n \langle v, a_i \rangle^2 = \frac{1}{n} \sum_{i=1}^n v^T a_i a_i^T v = v^T \Big( \frac{1}{n} \sum_{i=1}^n a_i a_i^T \Big) v = v^T S v \in \mathbb{R}.$$
The direction of maximal variance is the unit vector $v$ that maximizes $v^T S v$.
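This identity is easy to check numerically (an illustrative NumPy sketch with arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((500, 6))
A -= A.mean(axis=0)                       # center the data
S = A.T @ A / len(A)                      # covariance matrix

v = rng.standard_normal(6)
v /= np.linalg.norm(v)                    # a random unit direction

proj = A @ v                              # the scalars <v, a_i>
assert np.isclose(proj.var(), v @ S @ v)  # empirical variance equals v^T S v
```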
Direction of maximal variance

Good news: $S = A^T A$ is symmetric.
Spectral Theorem: let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$ be the eigenvalues of $S$ and $(v_1, \dots, v_d)$ an associated orthonormal basis of eigenvectors. Moreover, $S$ is positive semi-definite, since $v^T S v = \|A v\|^2 \geq 0$; hence $\lambda_1 \geq \cdots \geq \lambda_d \geq 0$.
By the theorem of Section 1.3, $v \mapsto v^T S v$ is maximized over $\|v\| = 1$ by $v = v_1$.
A one-dimensional reduction of the dataset $a_1, \dots, a_n \in \mathbb{R}^d$ is therefore $\langle v_1, a_1 \rangle, \dots, \langle v_1, a_n \rangle$, each in $\mathbb{R}$.
2nd direction of maximal variance

By the theorem, $v_2$ maximizes $v^T S v$ subject to $\|v\| = 1$ and $v \perp v_1$.
A 2-dimensional reduction of the dataset is then $\tilde{a}_1, \dots, \tilde{a}_n$, each in $\mathbb{R}^2$, with
$$\tilde{a}_i = \begin{pmatrix} \langle v_1, a_i \rangle \\ \langle v_2, a_i \rangle \end{pmatrix}.$$
jth direction of maximal variance

The "jth direction of maximal variance" is $v_j$, since $v_j$ is a solution of
$$\text{maximize } v^T S v, \quad \text{subject to } \|v\| = 1,\ v \perp v_1,\ v \perp v_2,\ \dots,\ v \perp v_{j-1}.$$
The dimensionally reduced dataset in $k$ dimensions is then
$$\tilde{a}_1 = \begin{pmatrix} \langle v_1, a_1 \rangle \\ \langle v_2, a_1 \rangle \\ \vdots \\ \langle v_k, a_1 \rangle \end{pmatrix}, \quad \tilde{a}_2 = \begin{pmatrix} \langle v_1, a_2 \rangle \\ \langle v_2, a_2 \rangle \\ \vdots \\ \langle v_k, a_2 \rangle \end{pmatrix}, \quad \tilde{a}_3 = \begin{pmatrix} \langle v_1, a_3 \rangle \\ \langle v_2, a_3 \rangle \\ \vdots \\ \langle v_k, a_3 \rangle \end{pmatrix}, \quad \dots, \quad \tilde{a}_n = \begin{pmatrix} \langle v_1, a_n \rangle \\ \langle v_2, a_n \rangle \\ \vdots \\ \langle v_k, a_n \rangle \end{pmatrix}.$$
Recap

How to compute the reduced-dimensional dataset (a NumPy sketch follows below):
1. Center the data so that $\sum_{i=1}^n a_i = 0$.
2. Compute the covariance $S = \frac{1}{n} \sum_{i=1}^n a_i a_i^T = \frac{1}{n} A^T A$.
3. Compute the eigendecomposition of $S$, sorted so that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$, with eigenvectors $v_1, \dots, v_d$.
4. Keep the $k$ largest and compute $\tilde{a}_i = (\langle v_1, a_i \rangle, \dots, \langle v_k, a_i \rangle)^T \in \mathbb{R}^k$.
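The sketch (illustrative; the function name `pca_reduce` and the random data are placeholders):

```python
import numpy as np

def pca_reduce(A: np.ndarray, k: int) -> np.ndarray:
    """Reduce the rows of A (n x d, one data point per row) to k dimensions."""
    A = A - A.mean(axis=0)        # 1. center the data
    S = A.T @ A / len(A)          # 2. covariance matrix (d x d)
    lam, V = np.linalg.eigh(S)    # 3. eigendecomposition, ascending order
    Vk = V[:, ::-1][:, :k]        #    re-sort descending, keep the top k eigenvectors
    return A @ Vk                 # 4. coordinates <v_j, a_i>: an n x k matrix

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 8))
X_reduced = pca_reduce(X, k=2)
print(X_reduced.shape)            # (100, 2)
```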
Which value of k should we take?

1st way: keep a given fraction of the total variance.
Choose the smallest $k$ such that
$$\frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_d} \geq 0.8.$$
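In code (an illustrative sketch; 0.8 is just the example threshold from the slide):

```python
import numpy as np

def choose_k(eigenvalues: np.ndarray, threshold: float = 0.8) -> int:
    """Smallest k keeping at least `threshold` of the total variance."""
    lam = np.sort(eigenvalues)[::-1]    # sort in decreasing order
    frac = np.cumsum(lam) / lam.sum()   # fraction of variance kept for each k
    return int(np.argmax(frac >= threshold)) + 1

print(choose_k(np.array([5.0, 3.0, 1.0, 0.5, 0.5])))  # 2, since (5+3)/10 = 0.8
```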
Which value of k should we take?

2nd way: plot the eigenvalues of $S$ (in decreasing order) and look for a gap.
[Hand-drawn plot: the eigenvalues of $S$ with a visible gap between $\lambda_2$ and $\lambda_3$; here one would keep $k = 2$.]
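Such a plot takes a few lines of NumPy/Matplotlib (an illustrative sketch; the synthetic data is built to have a gap after $\lambda_2$):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data whose signal lives in a 2-d subspace, plus noise.
rng = np.random.default_rng(8)
A = rng.standard_normal((300, 2)) @ rng.standard_normal((2, 10)) * 3
A += rng.standard_normal((300, 10))
A -= A.mean(axis=0)
S = A.T @ A / len(A)

lam = np.sort(np.linalg.eigvalsh(S))[::-1]  # eigenvalues of S, decreasing
plt.plot(range(1, 11), lam, "o-")
plt.xlabel("$i$")
plt.ylabel(r"$\lambda_i$")
plt.show()                                  # visible gap between lambda_2 and lambda_3
```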
3. Singular Value Decomposition
[Hand-drawn sketch: a centered dataset $a_1, \dots, a_n$, stacked as the rows of the data matrix $A$.]
PCA

Data matrix $A \in \mathbb{R}^{n \times m}$.
"Covariance matrix" $S = A^T A \in \mathbb{R}^{m \times m}$.
$S$ is symmetric positive semi-definite.
Spectral Theorem: there exists an orthonormal basis $v_1, \dots, v_m$ of $\mathbb{R}^m$ such that the $v_i$'s are eigenvectors of $S$ associated with the eigenvalues $\lambda_1 \geq \cdots \geq \lambda_m \geq 0$.
Singular values/vectors

Let $r = \mathrm{rank}(A)$. For $i = 1, \dots, m$:
- we define $\sigma_i = \sqrt{\lambda_i}$, called the $i$th singular value of $A$;
- we call $v_i$ the $i$th right singular vector of $A$.
For $i = 1, \dots, r$:
- we call $u_i = \frac{1}{\sigma_i} A v_i$ the $i$th left singular vector of $A$.
If $r < n$, we add $u_{r+1}, \dots, u_n$ such that $u_1, \dots, u_n$ is an orthonormal basis of $\mathbb{R}^n$.
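These definitions can be checked against `numpy.linalg.svd` (an illustrative sketch; the random matrix is full rank almost surely, so $r = m$ here):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))          # n = 6, m = 4

lam, V = np.linalg.eigh(A.T @ A)         # eigendecomposition of S = A^T A
lam, V = lam[::-1], V[:, ::-1]           # re-sort in decreasing order
sigma = np.sqrt(np.clip(lam, 0, None))   # singular values sigma_i = sqrt(lambda_i)
U = A @ V / sigma                        # left singular vectors u_i = A v_i / sigma_i

assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False))  # matches NumPy
assert np.allclose(U.T @ U, np.eye(4), atol=1e-8)              # the u_i are orthonormal
```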
Singular Value Decomposition

Theorem. Let $A \in \mathbb{R}^{n \times m}$. Then there exist two orthogonal matrices $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{m \times m}$, and a matrix $\Sigma \in \mathbb{R}^{n \times m}$ with $\Sigma_{1,1} \geq \Sigma_{2,2} \geq \cdots \geq 0$ and $\Sigma_{i,j} = 0$ for $i \neq j$, such that
$$A = U \Sigma V^T.$$
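`numpy.linalg.svd` returns exactly these factors (a quick illustrative check):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A)            # U: 6x6 orthogonal, s: singular values, Vt = V^T
Sigma = np.zeros((6, 4))
Sigma[:4, :4] = np.diag(s)             # embed the singular values in an n x m matrix

assert np.allclose(U @ Sigma @ Vt, A)  # A = U Sigma V^T
```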
Geometric interpretation of $U \Sigma V^T$
Note: covariance matrix, entry by entry. Write $A = (a_1 | \cdots | a_n)^T$, so that row $i$ of $A$ is the data point $a_i = (a_{i,1}, \dots, a_{i,d})$. For centered data:
- the variance of the 1st coordinate is $\frac{1}{n} \sum_{i=1}^n a_{i,1}^2$;
- the covariance of the 1st and 2nd coordinates is $\frac{1}{n} \sum_{i=1}^n a_{i,1} a_{i,2}$ (here $i$ is the data-point index);
- in general, the covariance of features $k$ and $\ell$ is $S_{k,\ell} = \frac{1}{n} \sum_{i=1}^n (a_{i,k} - \mu_k)(a_{i,\ell} - \mu_\ell)$.
Questions?