Olga Sorkine’s slides, Tel Aviv University
Spectra and diagonalization
If A is symmetric, the eigenvectors are orthogonal (and there's always an eigenbasis). A can therefore be diagonalized as

$$A = U \Lambda U^T, \qquad \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n), \qquad A\mathbf{u}_i = \lambda_i \mathbf{u}_i,$$

where the columns of U are the orthonormal eigenvectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$.
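As a quick numerical illustration (my addition, not part of the original slides), here is a minimal numpy sketch of this decomposition for an arbitrarily chosen symmetric matrix:

```python
import numpy as np

# A small symmetric matrix, chosen arbitrarily for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is specialized for symmetric matrices: it returns real eigenvalues
# and orthonormal eigenvectors (the columns of U).
eigvals, U = np.linalg.eigh(A)
Lambda = np.diag(eigvals)

# Check A = U Lambda U^T and A u_i = lambda_i u_i.
assert np.allclose(A, U @ Lambda @ U.T)
assert np.allclose(A @ U[:, 0], eigvals[0] * U[:, 0])
```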
PCA – the general idea

PCA finds an orthogonal basis that best represents the given data set: the sum of squared distances from the x' axis is minimized.

(Figure: a 2D point cloud shown in the standard axes x, y and in the rotated PCA axes x', y'.)
PCA – the general idea

PCA finds an orthogonal basis that best represents the given data set; in 3D, it finds a best approximating plane (again, in terms of squared distances).

(Figure: a 3D point set in the standard basis x, y, z.)
Managing high-dimensional data
Database of face scans (3D geometry + texture):
- 10,000 points in each scan
- x, y, z, R, G, B: 6 numbers for each point
- Thus, each scan is a 10,000 × 6 = 60,000-dimensional vector!
Managing high-dimensional data
How do we find interesting axes in this 60,000-dimensional space? Axes that measure age, gender, etc.… There is hope: the faces are likely to be governed by a small set of parameters (far fewer than 60,000…).

(Figure: face scans arranged along an "age axis" and a "gender axis".)
Notations
Denote our data points by $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n \in \mathbb{R}^d$:

$$\mathbf{x}_1 = \begin{pmatrix} x_1^1 \\ x_2^1 \\ \vdots \\ x_d^1 \end{pmatrix}, \quad \mathbf{x}_2 = \begin{pmatrix} x_1^2 \\ x_2^2 \\ \vdots \\ x_d^2 \end{pmatrix}, \quad \ldots, \quad \mathbf{x}_n = \begin{pmatrix} x_1^n \\ x_2^n \\ \vdots \\ x_d^n \end{pmatrix}$$
The origin of the new axes

The origin is the zero-order approximation of our data set (a point). It will be the center of mass:

$$m = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$$

It can be shown that:

$$m = \underset{\mathbf{x}}{\operatorname{argmin}} \sum_{i=1}^{n} \| \mathbf{x}_i - \mathbf{x} \|^2$$
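The argmin claim is easy to check numerically. A minimal numpy sketch (my addition; data and names are made up): any perturbation of the mean increases the sum of squared distances.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))           # d x n data matrix: 50 points in R^3

m = X.mean(axis=1, keepdims=True)      # center of mass

def sq_dist_sum(c):
    # sum_i ||x_i - c||^2 for a candidate origin c (a d x 1 column)
    return np.sum((X - c) ** 2)

# The mean beats every randomly perturbed candidate.
for _ in range(100):
    c = m + 0.1 * rng.normal(size=m.shape)
    assert sq_dist_sum(m) <= sq_dist_sum(c)
```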
Scatter matrix
Denote $\mathbf{y}_i = \mathbf{x}_i - m$, $i = 1, 2, \ldots, n$. The scatter matrix is

$$S = YY^T,$$

where Y is the d×n matrix with $\mathbf{y}_k$ as its columns ($k = 1, 2, \ldots, n$). Written out entrywise,

$$S_{ij} = \sum_{k=1}^{n} y_i^k \, y_j^k.$$
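In numpy this is a one-liner; a short sketch continuing in the same spirit (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))           # d x n data matrix

m = X.mean(axis=1, keepdims=True)      # center of mass
Y = X - m                              # centered points y_i as columns
S = Y @ Y.T                            # d x d scatter matrix

assert np.allclose(S, S.T)             # symmetric by construction
```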
Variance of projected points
In a way, S measures the variance (= "scatteredness") of the data in different directions.

Let's look at a line L through the center of mass m, and project our points $\mathbf{x}_i$ onto it. The variance of the projected points $\mathbf{x}'_i$ is:

$$\operatorname{var}(L) = \frac{1}{n} \sum_{i=1}^{n} \| \mathbf{x}'_i - m \|^2$$

(Figure: the original set, a line whose projections have small variance, and a line whose projections have large variance.)
Variance of projected points
Given a direction $\mathbf{v}$, $\|\mathbf{v}\| = 1$, the projection of $\mathbf{x}_i$ onto the line $L = m + \mathbf{v}t$ satisfies:

$$\| \mathbf{x}'_i - m \| = \langle \mathbf{x}_i - m, \mathbf{v} \rangle = \langle \mathbf{y}_i, \mathbf{v} \rangle = \mathbf{v}^T \mathbf{y}_i$$

(Figure: the point $\mathbf{x}_i$, its projection $\mathbf{x}'_i$, and the line L through m with direction $\mathbf{v}$.)
Variance of projected points
So,

$$\operatorname{var}(L) = \frac{1}{n} \sum_{i=1}^{n} \| \mathbf{x}'_i - m \|^2 = \frac{1}{n} \sum_{i=1}^{n} \left( \mathbf{v}^T \mathbf{y}_i \right)^2 = \frac{1}{n} \| \mathbf{v}^T Y \|^2 = \frac{1}{n} (\mathbf{v}^T Y)(\mathbf{v}^T Y)^T = \frac{1}{n} \mathbf{v}^T Y Y^T \mathbf{v} = \frac{1}{n} \mathbf{v}^T S \mathbf{v} = \frac{1}{n} \langle S\mathbf{v}, \mathbf{v} \rangle$$

Here we used the fact that $\mathbf{v}^T Y = (\mathbf{v}^T \mathbf{y}_1, \mathbf{v}^T \mathbf{y}_2, \ldots, \mathbf{v}^T \mathbf{y}_n)$ is the row vector of all the projections, so $\sum_{i=1}^{n} (\mathbf{v}^T \mathbf{y}_i)^2 = \| \mathbf{v}^T Y \|^2$. The constant factor 1/n does not change where the extrema are attained, so we drop it below.
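A quick numerical sanity check of this identity (my sketch, with made-up data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 50))
Y = X - X.mean(axis=1, keepdims=True)  # centered data
S = Y @ Y.T                            # scatter matrix

v = rng.normal(size=(3, 1))
v /= np.linalg.norm(v)                 # unit direction

proj = (v.T @ Y).ravel()               # v^T y_i for every point
var_L = np.mean(proj ** 2)             # variance of the projections

assert np.isclose(var_L, (v.T @ S @ v).item() / Y.shape[1])
```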
Directions of maximal variance
So, we have: $\operatorname{var}(L) = \langle S\mathbf{v}, \mathbf{v} \rangle$ (up to the constant 1/n).

Theorem: Let $f : \{\mathbf{v} \in \mathbb{R}^d \mid \|\mathbf{v}\| = 1\} \to \mathbb{R}$, $f(\mathbf{v}) = \langle S\mathbf{v}, \mathbf{v} \rangle$ (where S is a symmetric matrix). Then the extrema of f are attained at the eigenvectors of S.

So, the eigenvectors of S are the directions of maximal/minimal variance!
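As a sanity check of the theorem (my own sketch), the eigenvector of the largest eigenvalue of S should beat every random unit direction:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(3, 50))
Y -= Y.mean(axis=1, keepdims=True)
S = Y @ Y.T

eigvals, V = np.linalg.eigh(S)       # eigenvalues in ascending order
v_max = V[:, -1]                     # eigenvector of the largest eigenvalue

def f(v):
    return v @ S @ v                 # f(v) = <Sv, v> for a unit vector v

for _ in range(1000):
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    assert f(v) <= f(v_max) + 1e-9   # tolerance for round-off
```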
Summary so far
- We take the centered data points $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n \in \mathbb{R}^d$.
- We construct the scatter matrix $S = YY^T$; it measures the variance of the data points in different directions.
- The eigenvectors of S are the directions of maximal variance.
Scatter matrix – eigendecomposition
S is symmetric, so S has an eigendecomposition:

$$S = V \Lambda V^T = (\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_d) \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d) \, (\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_d)^T$$

The eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_d$ form an orthogonal basis.
Principal components
Eigenvectors that correspond to large eigenvalues are the directions in which the data has strong components (= large variance). If the eigenvalues are all more or less the same, there is no preferable direction.
Principal components
If there is no preferable direction, S looks like a multiple of the identity: the eigenvalues are (roughly) equal and any vector is an eigenvector.

If there is a clear preferable direction, $S = V \Lambda V^T$ where $\lambda_2$ is close to zero, much smaller than $\lambda_1$.
How to use what we got
For finding an oriented bounding box, we simply compute the bounding box with respect to the axes defined by the eigenvectors. The origin is at the mean point m.

(Figure: a point cloud with its oriented bounding box aligned to the eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$.)
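A compact numpy sketch of this recipe (my addition; the function name and return convention are mine):

```python
import numpy as np

def oriented_bounding_box(X):
    """X: d x n matrix of points. Returns the mean point, the eigenvector
    axes (columns of V), and the min/max extents along those axes."""
    m = X.mean(axis=1, keepdims=True)
    Y = X - m
    _, V = np.linalg.eigh(Y @ Y.T)   # orthonormal eigenvector axes
    coords = V.T @ Y                 # points expressed in the new axes
    return m, V, coords.min(axis=1), coords.max(axis=1)
```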
For approximation
(Figure: a 2D point set with its PCA axes $\mathbf{v}_1, \mathbf{v}_2$. The data set projected onto the $\mathbf{v}_1$ axis approximates the original data set, and so does the line segment it spans.)
For approximation
In general dimension d, the eigenvalues are sorted in descending order:

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$$

The eigenvectors are sorted accordingly. To get an approximation of dimension d' < d, we take the d' first eigenvectors and look at the subspace they span (d' = 1 is a line, d' = 2 is a plane…).
For approximation
To get an approximating set, we project the original data points onto the chosen subspace. Writing each point in the eigenbasis,

$$\mathbf{x}_i = m + \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \cdots + \alpha_{d'} \mathbf{v}_{d'} + \cdots + \alpha_d \mathbf{v}_d,$$

the projection keeps the first d' coefficients and zeroes out the rest:

$$\mathbf{x}'_i = m + \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \cdots + \alpha_{d'} \mathbf{v}_{d'} + 0 \cdot \mathbf{v}_{d'+1} + \cdots + 0 \cdot \mathbf{v}_d$$
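The full pipeline fits in a few lines of numpy. A sketch (mine; `d_prime` is an assumed choice of target dimension):

```python
import numpy as np

def pca_project(X, d_prime):
    """Project the d x n data X onto its d'-dimensional PCA subspace."""
    m = X.mean(axis=1, keepdims=True)
    Y = X - m
    _, V = np.linalg.eigh(Y @ Y.T)     # eigenvalues in ascending order
    Vk = V[:, ::-1][:, :d_prime]       # first d' eigenvectors (descending)
    return m + Vk @ (Vk.T @ Y)         # x'_i = m + kept components

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 200))
X_approx = pca_project(X, d_prime=2)   # best 2D approximation of the data
```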
Optimality of approximation
The approximation is optimal in the least-squares sense: it gives the minimum of

$$\sum_{k=1}^{n} \| \mathbf{x}_k - \mathbf{x}'_k \|^2$$

The projected points have maximal variance.

(Figure: the original set, its projection onto an arbitrary line, and its projection onto the $\mathbf{v}_1$ axis.)
Technical remarks:
$\lambda_i \ge 0$, $i = 1, \ldots, d$ (such matrices are called positive semi-definite), so we can indeed sort the eigenvalues by magnitude.

Theorem: $\lambda_i \ge 0 \ \Leftrightarrow \ \langle S\mathbf{v}, \mathbf{v} \rangle \ge 0 \ \forall \mathbf{v}$.

Proof: using $S = V \Lambda V^T$ and substituting $\mathbf{u} = V^T \mathbf{v}$,

$$\langle S\mathbf{v}, \mathbf{v} \rangle = \mathbf{v}^T V \Lambda V^T \mathbf{v} = (V^T \mathbf{v})^T \Lambda (V^T \mathbf{v}) = \langle \Lambda \mathbf{u}, \mathbf{u} \rangle = \lambda_1 u_1^2 + \lambda_2 u_2^2 + \cdots + \lambda_d u_d^2,$$

which is non-negative for every $\mathbf{v}$ if and only if all $\lambda_i \ge 0$. Therefore, $\lambda_i \ge 0 \ \Leftrightarrow \ \langle S\mathbf{v}, \mathbf{v} \rangle \ge 0 \ \forall \mathbf{v}$. For our scatter matrix, $\langle S\mathbf{v}, \mathbf{v} \rangle = \mathbf{v}^T Y Y^T \mathbf{v} = \| Y^T \mathbf{v} \|^2 \ge 0$, so its eigenvalues are indeed non-negative.
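One last numerical check (my sketch): the eigenvalues of a scatter matrix come out non-negative up to floating-point round-off.

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.normal(size=(4, 30))
Y -= Y.mean(axis=1, keepdims=True)
S = Y @ Y.T

eigvals = np.linalg.eigvalsh(S)
assert np.all(eigvals >= -1e-10)   # PSD up to round-off
```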