Olga Sorkine’s slides, Tel Aviv University
Spectra and diagonalization
If A is symmetric, the eigenvectors are orthogonal (and there's always an eigenbasis). A can therefore be diagonalized as

$$A = U \Lambda U^T, \qquad \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n), \qquad A\mathbf{u}_i = \lambda_i \mathbf{u}_i,$$

where the columns of U are the orthonormal eigenvectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$.
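As a quick numerical illustration (my addition, not part of the original slides), here is a minimal numpy sketch of this decomposition for an arbitrarily chosen symmetric matrix:

```python
import numpy as np

# A small symmetric matrix, chosen arbitrarily for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is specialized for symmetric matrices: it returns real eigenvalues
# and orthonormal eigenvectors (the columns of U).
eigvals, U = np.linalg.eigh(A)
Lambda = np.diag(eigvals)

# Check A = U Lambda U^T and A u_i = lambda_i u_i.
assert np.allclose(A, U @ Lambda @ U.T)
assert np.allclose(A @ U[:, 0], eigvals[0] * U[:, 0])
```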
PCA – the general idea

PCA finds an orthogonal basis that best represents the given data set: the sum of squared distances from the x' axis is minimized.

(Figure: a 2D point cloud shown in the standard axes x, y and in the rotated PCA axes x', y'.)
PCA – the general idea

PCA finds an orthogonal basis that best represents the given data set; in 3D, it finds a best approximating plane (again, in terms of squared distances).

(Figure: a 3D point set in the standard basis x, y, z.)
Managing high-dimensional data
Database of face scans (3D geometry + texture):
- 10,000 points in each scan
- x, y, z, R, G, B: 6 numbers for each point
- Thus, each scan is a 10,000 × 6 = 60,000-dimensional vector!
Managing high-dimensional data
How do we find interesting axes in this 60,000-dimensional space? Axes that measure age, gender, etc.… There is hope: the faces are likely to be governed by a small set of parameters (far fewer than 60,000…).

(Figure: face scans arranged along an "age axis" and a "gender axis".)
Notations
Denote our data points by $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n \in \mathbb{R}^d$:

$$\mathbf{x}_1 = \begin{pmatrix} x_1^1 \\ x_2^1 \\ \vdots \\ x_d^1 \end{pmatrix}, \quad \mathbf{x}_2 = \begin{pmatrix} x_1^2 \\ x_2^2 \\ \vdots \\ x_d^2 \end{pmatrix}, \quad \ldots, \quad \mathbf{x}_n = \begin{pmatrix} x_1^n \\ x_2^n \\ \vdots \\ x_d^n \end{pmatrix}$$
The origin of the new axes

The origin is the zero-order approximation of our data set (a point). It will be the center of mass:

$$m = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$$

It can be shown that:

$$m = \underset{\mathbf{x}}{\operatorname{argmin}} \sum_{i=1}^{n} \| \mathbf{x}_i - \mathbf{x} \|^2$$
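The argmin claim is easy to check numerically. A minimal numpy sketch (my addition; data and names are made up): any perturbation of the mean increases the sum of squared distances.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))           # d x n data matrix: 50 points in R^3

m = X.mean(axis=1, keepdims=True)      # center of mass

def sq_dist_sum(c):
    # sum_i ||x_i - c||^2 for a candidate origin c (a d x 1 column)
    return np.sum((X - c) ** 2)

# The mean beats every randomly perturbed candidate.
for _ in range(100):
    c = m + 0.1 * rng.normal(size=m.shape)
    assert sq_dist_sum(m) <= sq_dist_sum(c)
```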
Scatter matrix
Denote $\mathbf{y}_i = \mathbf{x}_i - m$, $i = 1, 2, \ldots, n$. The scatter matrix is

$$S = YY^T,$$

where Y is the d×n matrix with $\mathbf{y}_k$ as its columns ($k = 1, 2, \ldots, n$). Written out entrywise,

$$S_{ij} = \sum_{k=1}^{n} y_i^k \, y_j^k.$$
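In numpy this is a one-liner; a short sketch continuing in the same spirit (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))           # d x n data matrix

m = X.mean(axis=1, keepdims=True)      # center of mass
Y = X - m                              # centered points y_i as columns
S = Y @ Y.T                            # d x d scatter matrix

assert np.allclose(S, S.T)             # symmetric by construction
```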
Variance of projected points
In a way, S measures the variance (= "scatteredness") of the data in different directions.

Let's look at a line L through the center of mass m, and project our points $\mathbf{x}_i$ onto it. The variance of the projected points $\mathbf{x}'_i$ is:

$$\operatorname{var}(L) = \frac{1}{n} \sum_{i=1}^{n} \| \mathbf{x}'_i - m \|^2$$

(Figure: the original set, a line whose projections have small variance, and a line whose projections have large variance.)
Variance of projected points
Given a direction $\mathbf{v}$, $\|\mathbf{v}\| = 1$, the projection of $\mathbf{x}_i$ onto the line $L = m + \mathbf{v}t$ satisfies:

$$\| \mathbf{x}'_i - m \| = \langle \mathbf{x}_i - m, \mathbf{v} \rangle = \langle \mathbf{y}_i, \mathbf{v} \rangle = \mathbf{v}^T \mathbf{y}_i$$

(Figure: the point $\mathbf{x}_i$, its projection $\mathbf{x}'_i$, and the line L through m with direction $\mathbf{v}$.)
Variance of projected points
So,

$$\operatorname{var}(L) = \frac{1}{n} \sum_{i=1}^{n} \| \mathbf{x}'_i - m \|^2 = \frac{1}{n} \sum_{i=1}^{n} \left( \mathbf{v}^T \mathbf{y}_i \right)^2 = \frac{1}{n} \| \mathbf{v}^T Y \|^2 = \frac{1}{n} (\mathbf{v}^T Y)(\mathbf{v}^T Y)^T = \frac{1}{n} \mathbf{v}^T Y Y^T \mathbf{v} = \frac{1}{n} \mathbf{v}^T S \mathbf{v} = \frac{1}{n} \langle S\mathbf{v}, \mathbf{v} \rangle$$

Here we used the fact that $\mathbf{v}^T Y = (\mathbf{v}^T \mathbf{y}_1, \mathbf{v}^T \mathbf{y}_2, \ldots, \mathbf{v}^T \mathbf{y}_n)$ is the row vector of all the projections, so $\sum_{i=1}^{n} (\mathbf{v}^T \mathbf{y}_i)^2 = \| \mathbf{v}^T Y \|^2$. The constant factor 1/n does not change where the extrema are attained, so we drop it below.
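A quick numerical sanity check of this identity (my sketch, with made-up data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 50))
Y = X - X.mean(axis=1, keepdims=True)  # centered data
S = Y @ Y.T                            # scatter matrix

v = rng.normal(size=(3, 1))
v /= np.linalg.norm(v)                 # unit direction

proj = (v.T @ Y).ravel()               # v^T y_i for every point
var_L = np.mean(proj ** 2)             # variance of the projections

assert np.isclose(var_L, (v.T @ S @ v).item() / Y.shape[1])
```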
Directions of maximal variance
So, we have: $\operatorname{var}(L) = \langle S\mathbf{v}, \mathbf{v} \rangle$ (up to the constant 1/n).

Theorem: Let $f : \{\mathbf{v} \in \mathbb{R}^d \mid \|\mathbf{v}\| = 1\} \to \mathbb{R}$, $f(\mathbf{v}) = \langle S\mathbf{v}, \mathbf{v} \rangle$ (where S is a symmetric matrix). Then the extrema of f are attained at the eigenvectors of S.

So, the eigenvectors of S are the directions of maximal/minimal variance!
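As a sanity check of the theorem (my own sketch), the eigenvector of the largest eigenvalue of S should beat every random unit direction:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(3, 50))
Y -= Y.mean(axis=1, keepdims=True)
S = Y @ Y.T

eigvals, V = np.linalg.eigh(S)       # eigenvalues in ascending order
v_max = V[:, -1]                     # eigenvector of the largest eigenvalue

def f(v):
    return v @ S @ v                 # f(v) = <Sv, v> for a unit vector v

for _ in range(1000):
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    assert f(v) <= f(v_max) + 1e-9   # tolerance for round-off
```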
Summary so far
- We take the centered data points $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n \in \mathbb{R}^d$.
- We construct the scatter matrix $S = YY^T$; it measures the variance of the data points in different directions.
- The eigenvectors of S are the directions of maximal variance.
Scatter matrix – eigendecomposition
S is symmetric, so S has an eigendecomposition:

$$S = V \Lambda V^T = (\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_d) \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d) \, (\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_d)^T$$

The eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_d$ form an orthogonal basis.
Principal components
Eigenvectors that correspond to large eigenvalues are the directions in which the data has strong components (= large variance). If the eigenvalues are all more or less the same, there is no preferable direction.
Principal components
If there is no preferable direction, S looks like a multiple of the identity: the eigenvalues are (roughly) equal and any vector is an eigenvector.

If there is a clear preferable direction, $S = V \Lambda V^T$ where $\lambda_2$ is close to zero, much smaller than $\lambda_1$.
How to use what we got
For finding an oriented bounding box, we simply compute the bounding box with respect to the axes defined by the eigenvectors. The origin is at the mean point m.

(Figure: a point cloud with its oriented bounding box aligned to the eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$.)
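A compact numpy sketch of this recipe (my addition; the function name and return convention are mine):

```python
import numpy as np

def oriented_bounding_box(X):
    """X: d x n matrix of points. Returns the mean point, the eigenvector
    axes (columns of V), and the min/max extents along those axes."""
    m = X.mean(axis=1, keepdims=True)
    Y = X - m
    _, V = np.linalg.eigh(Y @ Y.T)   # orthonormal eigenvector axes
    coords = V.T @ Y                 # points expressed in the new axes
    return m, V, coords.min(axis=1), coords.max(axis=1)
```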
For approximation
(Figure: a 2D point set with its PCA axes $\mathbf{v}_1, \mathbf{v}_2$. The data set projected onto the $\mathbf{v}_1$ axis approximates the original data set, and so does the line segment it spans.)
For approximation
In general dimension d, the eigenvalues are sorted in descending order:

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$$

The eigenvectors are sorted accordingly. To get an approximation of dimension d' < d, we take the d' first eigenvectors and look at the subspace they span (d' = 1 is a line, d' = 2 is a plane…).
For approximation
To get an approximating set, we project the original data points onto the chosen subspace. Writing each point in the eigenbasis,

$$\mathbf{x}_i = m + \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \cdots + \alpha_{d'} \mathbf{v}_{d'} + \cdots + \alpha_d \mathbf{v}_d,$$

the projection keeps the first d' coefficients and zeroes out the rest:

$$\mathbf{x}'_i = m + \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \cdots + \alpha_{d'} \mathbf{v}_{d'} + 0 \cdot \mathbf{v}_{d'+1} + \cdots + 0 \cdot \mathbf{v}_d$$
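The full pipeline fits in a few lines of numpy. A sketch (mine; `d_prime` is an assumed choice of target dimension):

```python
import numpy as np

def pca_project(X, d_prime):
    """Project the d x n data X onto its d'-dimensional PCA subspace."""
    m = X.mean(axis=1, keepdims=True)
    Y = X - m
    _, V = np.linalg.eigh(Y @ Y.T)     # eigenvalues in ascending order
    Vk = V[:, ::-1][:, :d_prime]       # first d' eigenvectors (descending)
    return m + Vk @ (Vk.T @ Y)         # x'_i = m + kept components

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 200))
X_approx = pca_project(X, d_prime=2)   # best 2D approximation of the data
```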
Optimality of approximation
The approximation is optimal in the least-squares sense: it gives the minimum of

$$\sum_{k=1}^{n} \| \mathbf{x}_k - \mathbf{x}'_k \|^2$$

The projected points have maximal variance.

(Figure: the original set, its projection onto an arbitrary line, and its projection onto the $\mathbf{v}_1$ axis.)
Technical remarks:
$\lambda_i \ge 0$, $i = 1, \ldots, d$ (such matrices are called positive semi-definite), so we can indeed sort the eigenvalues by magnitude.

Theorem: $\lambda_i \ge 0 \ \Leftrightarrow \ \langle S\mathbf{v}, \mathbf{v} \rangle \ge 0 \ \forall \mathbf{v}$.

Proof: using $S = V \Lambda V^T$ and substituting $\mathbf{u} = V^T \mathbf{v}$,

$$\langle S\mathbf{v}, \mathbf{v} \rangle = \mathbf{v}^T V \Lambda V^T \mathbf{v} = (V^T \mathbf{v})^T \Lambda (V^T \mathbf{v}) = \langle \Lambda \mathbf{u}, \mathbf{u} \rangle = \lambda_1 u_1^2 + \lambda_2 u_2^2 + \cdots + \lambda_d u_d^2,$$

which is non-negative for every $\mathbf{v}$ if and only if all $\lambda_i \ge 0$. Therefore, $\lambda_i \ge 0 \ \Leftrightarrow \ \langle S\mathbf{v}, \mathbf{v} \rangle \ge 0 \ \forall \mathbf{v}$. For our scatter matrix, $\langle S\mathbf{v}, \mathbf{v} \rangle = \mathbf{v}^T Y Y^T \mathbf{v} = \| Y^T \mathbf{v} \|^2 \ge 0$, so its eigenvalues are indeed non-negative.
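One last numerical check (my sketch): the eigenvalues of a scatter matrix come out non-negative up to floating-point round-off.

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.normal(size=(4, 30))
Y -= Y.mean(axis=1, keepdims=True)
S = Y @ Y.T

eigvals = np.linalg.eigvalsh(S)
assert np.all(eigvals >= -1e-10)   # PSD up to round-off
```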