Lecture 15 - Face Recognition 2



PCA in Face Recognition

    Berrin Yanikoglu

    FENS

Computer Science & Engineering, Sabancı University

    Some slides from Derek Hoiem, Lana Lazebnik, Silvio Savarese, Fei-Fei Li


    Overview

    Face recognition, verification, tracking

    Feature subspaces: PCA and FLD

    Results from vendor tests

    Interesting findings about human face recognition


    Face detection and recognition

    Detection → Recognition ("Sally")


    Applications of Face Recognition

    Surveillance

    Digital photography

    Album organization


    Consumer application: iPhoto 2009

    Can be trained to recognize pets!

    http://www.maclife.com/article/news/iphotos_faces_recognizes_cats


    Consumer application: iPhoto 2009


    Error measure

    Face Detection/Verification

    False Positives (%)

    False Negatives (%)

    Face Recognition

    Top-N rates (%)

    Open/closed set problems

    Sources of variations:

    With glasses, without glasses, 3 lighting conditions, 5 expressions


    Starting idea of eigenfaces

    1. Treat pixels as a vector

    2. Recognize face by nearest neighbor

    $\mathbf{x}$: the pixel vector of the test image; $\mathbf{y}_1, \dots, \mathbf{y}_n$: the training face vectors. Classify by

    $\min_k \|\mathbf{y}_k - \mathbf{x}\|$
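    As a concrete illustration of this idea (not from the original slides), here is a minimal nearest-neighbor sketch in Python/NumPy; `train_imgs`, `train_labels`, and `test_img` are hypothetical inputs.

```python
import numpy as np

def nearest_neighbor_id(train_imgs, train_labels, test_img):
    # Flatten each H x W image into a single pixel vector.
    Y = train_imgs.reshape(len(train_imgs), -1).astype(float)
    x = test_img.reshape(-1).astype(float)
    # argmin_k || y_k - x ||  (Euclidean distance in pixel space)
    k = int(np.argmin(np.linalg.norm(Y - x, axis=1)))
    return train_labels[k]
```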


    The space of face images

    When viewed as vectors of pixel values, face images are extremely high-dimensional: a 100x100 image has 10,000 dimensions, with large memory and computational requirements.

    But very few 10,000-dimensional vectors are valid face images.

    We want to reduce dimensionality and effectively model the subspace of face images.


    Principal Component Analysis (PCA)

    Pattern recognition in high-dimensional spaces:

    Problems arise when performing recognition in a high-dimensional space (curse of dimensionality).

    Significant improvements can be achieved by first mapping the data into a lower-dimensional sub-space:

    $\mathbf{x} = [x_1, x_2, \dots, x_N]^T \;\rightarrow\; \mathbf{z} = [z_1, z_2, \dots, z_K]^T$, where $K \ll N$.


    Change of basis

    Example: in the standard basis $(\mathbf{x}_1, \mathbf{x}_2)$, the point $\mathbf{p} = \begin{bmatrix}3\\3\end{bmatrix}$ is written as $\mathbf{p} = 3\begin{bmatrix}1\\0\end{bmatrix} + 3\begin{bmatrix}0\\1\end{bmatrix} = 3\,\mathbf{x}_1 + 3\,\mathbf{x}_2$. The same point has different coordinates in a rotated basis $(\mathbf{z}_1, \mathbf{z}_2)$.


    Dimensionality reduction

    Example: keep only the $\mathbf{z}_1$ coordinate. The point $\mathbf{p}$ is approximated by its projection $\mathbf{q}$ onto the $\mathbf{z}_1$ axis (a single coefficient times the basis vector), and the approximation error is the distance $\|\mathbf{p} - \mathbf{q}\|$.


    Principal Component Analysis (PCA)

    PCA allows us to compute a linear transformation that maps data from a high-dimensional space to a lower-dimensional sub-space. In short, $\mathbf{z} = W\mathbf{x}$, where $\mathbf{x} = [x_1, x_2, \dots, x_N]^T$ and

    $z_1 = u_{11}x_1 + u_{12}x_2 + \dots + u_{1N}x_N$
    $z_2 = u_{21}x_1 + u_{22}x_2 + \dots + u_{2N}x_N$
    $\;\;\vdots$
    $z_K = u_{K1}x_1 + u_{K2}x_2 + \dots + u_{KN}x_N$

    i.e.

    $W = \begin{bmatrix} u_{11} & u_{12} & \dots & u_{1N} \\ u_{21} & u_{22} & \dots & u_{2N} \\ \vdots & \vdots & & \vdots \\ u_{K1} & u_{K2} & \dots & u_{KN} \end{bmatrix}$


    Principal Component Analysis (PCA)

    Lower dimensionality basis

    Approximate vectors by finding a basis in an appropriate lower-dimensional space.

    (1) Higher-dimensional space representation: $\mathbf{x} = x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \dots + x_N\mathbf{v}_N$, where $\mathbf{v}_1, \dots, \mathbf{v}_N$ are the basis vectors of the N-dimensional space.

    (2) Lower-dimensional space representation: $\hat{\mathbf{x}} = z_1\mathbf{u}_1 + z_2\mathbf{u}_2 + \dots + z_K\mathbf{u}_K$, where $\mathbf{u}_1, \dots, \mathbf{u}_K$ are the basis vectors of the K-dimensional sub-space.

    Note: If K = N, then $\hat{\mathbf{x}} = \mathbf{x}$.


    Dimensionality reduction

    (Same example as above: the point $\mathbf{p}$ is approximated by its projection $\mathbf{q}$ onto the $\mathbf{z}_1$ axis.)


    Illustration for projection, variance and bases

    (Figure: data points shown in the original axes x1, x2 and the principal axes z1, z2, illustrating the projection and the variance along each basis direction.)


    Principal Component Analysis (PCA)

    Dimensionality reduction implies information loss!

    How do we determine the best lower-dimensional sub-space? We want to preserve as much information as possible.


    Principal Components Analysis (PCA)

    The projection of x on the direction of u is: $z = \mathbf{u}^T\mathbf{x}$

    Find u such that Var(z) is maximized:

    $\mathrm{Var}(z) = \mathrm{Var}(\mathbf{u}^T\mathbf{x})$
    $= E[(\mathbf{u}^T\mathbf{x} - \mathbf{u}^T\boldsymbol{\mu})(\mathbf{u}^T\mathbf{x} - \mathbf{u}^T\boldsymbol{\mu})^T]$
    $= E[(\mathbf{u}^T\mathbf{x} - \mathbf{u}^T\boldsymbol{\mu})^2]$  // since $\mathbf{u}^T\mathbf{x} - \mathbf{u}^T\boldsymbol{\mu}$ is a scalar
    $= E[\mathbf{u}^T(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\mathbf{u}]$
    $= \mathbf{u}^T E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]\,\mathbf{u}$
    $= \mathbf{u}^T \Sigma\, \mathbf{u}$

    where $\Sigma = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$ is the covariance matrix of x.


    Principal Component Analysis

    Direction that maximizes the variance of the projected data:

    Projection of a data point: $\mathbf{u}^T(\mathbf{x}_i - \boldsymbol{\mu})$

    Covariance matrix of the data: $\Sigma = \dfrac{1}{N}\sum_{i=1}^{N}(\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T$

    Maximize $\mathbf{u}^T\Sigma\mathbf{u}$ subject to $\|\mathbf{u}\| = 1$.


    Maximize $\mathrm{Var}(z) = \mathbf{u}^T\Sigma\mathbf{u}$ subject to $\|\mathbf{u}\| = 1$:

    $\max_{\mathbf{u}_1}\; \mathbf{u}_1^T\Sigma\mathbf{u}_1 - \alpha(\mathbf{u}_1^T\mathbf{u}_1 - 1)$   ($\alpha$, $\beta$: Lagrange multipliers)

    Taking the derivative w.r.t. $\mathbf{u}_1$ and setting it equal to 0, we get $\Sigma\mathbf{u}_1 = \alpha\mathbf{u}_1$, so $\mathbf{u}_1$ is an eigenvector of $\Sigma$. Choose the eigenvector with the largest eigenvalue for Var(z) to be maximum.

    Second principal component: maximize $\mathrm{Var}(z_2)$ subject to $\|\mathbf{u}_2\| = 1$ and $\mathbf{u}_2$ orthogonal to $\mathbf{u}_1$:

    $\max_{\mathbf{u}_2}\; \mathbf{u}_2^T\Sigma\mathbf{u}_2 - \alpha(\mathbf{u}_2^T\mathbf{u}_2 - 1) - \beta(\mathbf{u}_2^T\mathbf{u}_1 - 0)$

    A similar analysis shows that $\Sigma\mathbf{u}_2 = \alpha\mathbf{u}_2$, so $\mathbf{u}_2$ is another eigenvector of $\Sigma$, and so on.


    Maximize $\mathrm{Var}(z) = \mathbf{u}^T\Sigma\mathbf{u}$:

    Consider the eigenvectors of $\Sigma$, for which $\Sigma\mathbf{u} = \lambda\mathbf{u}$, where $\mathbf{u}$ is an eigenvector of $\Sigma$ and $\lambda$ is the corresponding eigenvalue. Multiplying by $\mathbf{u}^T$:

    $\mathbf{u}^T\Sigma\mathbf{u} = \mathbf{u}^T\lambda\mathbf{u} = \lambda\,\mathbf{u}^T\mathbf{u} = \lambda$ for $\|\mathbf{u}\| = 1$. Choose the eigenvector with the largest eigenvalue.


    Principal Component Analysis (PCA)

    Given: N data points $\mathbf{x}_1, \dots, \mathbf{x}_N$ in $\mathbb{R}^d$

    We want to find a new set of features that are linear combinations of the original ones:

    $u(\mathbf{x}_i) = \mathbf{u}^T(\mathbf{x}_i - \boldsymbol{\mu})$   ($\boldsymbol{\mu}$: mean of the data points)

    Choose a unit vector $\mathbf{u}$ in $\mathbb{R}^d$ that captures the most data variance.

    Forsyth & Ponce, Sec. 22.3.1, 22.3.2


    What PCA does

    The transformation $\mathbf{z} = W^T(\mathbf{x} - \mathbf{m})$, where the columns of W are the eigenvectors of $\Sigma$ and $\mathbf{m}$ is the sample mean, centers the data at the origin and rotates the axes.
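    A minimal NumPy sketch of what this transformation computes, assuming a data matrix `X` of shape (n, d) with samples as rows; the function and variable names are illustrative, not from the lecture.

```python
import numpy as np

def pca_fit(X, K):
    m = X.mean(axis=0)                      # sample mean
    Xc = X - m                              # center the data at the origin
    Sigma = np.cov(Xc, rowvar=False)        # d x d covariance matrix
    evals, evecs = np.linalg.eigh(Sigma)    # eigh: Sigma is symmetric
    order = np.argsort(evals)[::-1]         # sort eigenvalues in descending order
    W = evecs[:, order[:K]]                 # top-K eigenvectors as columns of W
    return m, W, evals[order]

def pca_transform(X, m, W):
    # z = W^T (x - m): center the data and rotate onto the principal axes
    return (X - m) @ W
```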


    Eigenvalues of the covariance matrix

    The covariance matrix $\Sigma$ is symmetric, so it can always be diagonalized as

    $\Sigma = W\Lambda W^T$

    where $W = [\mathbf{u}_1, \mathbf{u}_2, \dots]$ is the column matrix consisting of the eigenvectors of $\Sigma$, $W^T = W^{-1}$, and $\Lambda$ is the diagonal matrix whose elements are the eigenvalues of $\Sigma$.


    Principal Component Analysis (PCA)

    Methodology

    Suppose $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_M$ are N×1 vectors.


    Principal Component Analysis (PCA)

    Methodology cont.


    Principal Component Analysis (PCA)

    Linear transformation implied by PCA

    The linear transformation $\mathbb{R}^N \rightarrow \mathbb{R}^K$ that performs the dimensionality reduction is $\mathbf{z} = W^T(\mathbf{x} - \boldsymbol{\mu})$, where the columns of W are the top K eigenvectors.


    Principal Component Analysis (PCA)

    How to choose the principal components?

    To choose K, use the following criterion:


    How to choose k ?

    Proportion of Variance (PoV) explained:

    $\mathrm{PoV}(k) = \dfrac{\lambda_1 + \lambda_2 + \dots + \lambda_k}{\lambda_1 + \lambda_2 + \dots + \lambda_k + \dots + \lambda_d}$

    where the $\lambda_i$ are sorted in descending order.

    Typically, stop at PoV > 0.9.

    A scree graph plots PoV vs. k; stop at the elbow.
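    A small sketch of this rule, assuming `evals` holds the covariance eigenvalues already sorted in descending order (a hypothetical input name).

```python
import numpy as np

def choose_k(evals, threshold=0.9):
    pov = np.cumsum(evals) / np.sum(evals)            # PoV(k) for k = 1..d
    return int(np.searchsorted(pov, threshold)) + 1   # smallest k with PoV >= threshold

# Example: a fast-decaying eigenvalue spectrum
print(choose_k(np.array([5.0, 2.0, 1.0, 0.5, 0.3, 0.2])))  # -> 4
```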


    Principal Component Analysis (PCA)

    What is the error due to dimensionality reduction?

    We saw above that an original vector x can be reconstructed using its principal components.

    It can be shown that the low-dimensional basis based on principal components minimizes the reconstruction error, and that the expected squared error equals the sum of the eigenvalues of the discarded components:

    $e = \sum_{i=K+1}^{N} \lambda_i$


    Principal Component Analysis (PCA)

    Standardization

    The principal components are dependent on the units used to measure the original variables, as well as on the range of values they assume.

    We should always standardize the data prior to using PCA.

    A common standardization method is to transform all the data to have zero mean and unit standard deviation:
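    A minimal sketch of this standardization step, assuming `X` is an (n, d) data matrix (illustrative names).

```python
import numpy as np

def standardize(X):
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0        # guard against constant features
    return (X - mu) / sigma        # zero mean, unit standard deviation per feature
```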


    Eigenface Implementation


    Computation of the eigenfaces


    Computation of the eigenfaces cont.


    Computation of the eigenfaces cont.


    Computation of the eigenfaces cont.
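    The equations for these steps are in the original slide images; as a hedged sketch, the standard Turk–Pentland style computation can be written as below, using the common trick of eigendecomposing the small M×M matrix A·Aᵀ instead of the huge N×N covariance. `faces` is a hypothetical (M, H, W) array of training images.

```python
import numpy as np

def compute_eigenfaces(faces, K):
    M = len(faces)
    X = faces.reshape(M, -1).astype(float)      # each row: one face as an N-vector
    mean_face = X.mean(axis=0)
    A = X - mean_face                           # centered faces, shape (M, N)
    # Trick: if (A A^T) v = lam v, then A^T v is an eigenvector of A^T A,
    # so we only ever decompose an M x M matrix.
    small = A @ A.T
    lam, V = np.linalg.eigh(small)
    order = np.argsort(lam)[::-1][:K]
    U = A.T @ V[:, order]                       # map back to N-dimensional eigenfaces
    U /= np.linalg.norm(U, axis=0)              # normalize each eigenface
    return mean_face, U, lam[order]             # lam: eigenvalues of A A^T (proportional to covariance eigenvalues)
```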


    Face Recognition Using Eigenfaces


    Face Recognition Using Eigenfaces cont.

    The distance $e_r$ can be computed using the common Euclidean distance; however, it has been reported that the Mahalanobis distance performs better:
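    The distance formula itself is an image in the original slides; the sketch below shows one Mahalanobis-style variant often used with eigenfaces, weighting each eigenspace coefficient by the inverse eigenvalue (names and inputs are illustrative assumptions).

```python
import numpy as np

def mahalanobis_eigenspace(w1, w2, eigvals):
    # w1, w2: length-K coefficient vectors in eigenspace; eigvals: the K eigenvalues
    d = w1 - w2
    return float(np.sum(d * d / eigvals))   # squared distance, each axis scaled by 1/lambda_i
```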


    Eigenfaces example

    Training images

    $\mathbf{x}_1, \dots, \mathbf{x}_N$


    Eigenfaces example

    Top eigenvectors: $\mathbf{u}_1, \dots, \mathbf{u}_k$

    Mean: $\boldsymbol{\mu}$


    Visualization of eigenfaces

    Principal component (eigenvector) $\mathbf{u}_k$, visualized as the mean face perturbed along it:

    $\boldsymbol{\mu} + 3\sigma_k\mathbf{u}_k$ and $\boldsymbol{\mu} - 3\sigma_k\mathbf{u}_k$


    Representation and reconstruction

    Face x in face-space coordinates:

    $\mathbf{x} \rightarrow [\mathbf{u}_1^T(\mathbf{x}-\boldsymbol{\mu}), \dots, \mathbf{u}_k^T(\mathbf{x}-\boldsymbol{\mu})] = [w_1, \dots, w_k]$


    Representation and reconstruction

    Face x in face-space coordinates:

    $\mathbf{x} \rightarrow [\mathbf{u}_1^T(\mathbf{x}-\boldsymbol{\mu}), \dots, \mathbf{u}_k^T(\mathbf{x}-\boldsymbol{\mu})] = [w_1, \dots, w_k]$

    Reconstruction:

    $\hat{\mathbf{x}} = \boldsymbol{\mu} + w_1\mathbf{u}_1 + w_2\mathbf{u}_2 + w_3\mathbf{u}_3 + w_4\mathbf{u}_4 + \dots$


    Reconstruction with P = 4, P = 200, and P = 400 eigenfaces.

    The eigenfaces are computed using the 400 face images from the ORL face database. The size of each image is 92×112 pixels, so x has ~10K dimensions.


    Recognition with eigenfaces

    Process labeled training images:
    Find the mean $\boldsymbol{\mu}$ and covariance matrix $\Sigma$.
    Find the k principal components (eigenvectors of $\Sigma$): $\mathbf{u}_1, \dots, \mathbf{u}_k$.
    Project each training image $\mathbf{x}_i$ onto the subspace spanned by the principal components: $(w_{i1}, \dots, w_{ik}) = (\mathbf{u}_1^T(\mathbf{x}_i - \boldsymbol{\mu}), \dots, \mathbf{u}_k^T(\mathbf{x}_i - \boldsymbol{\mu}))$.

    Given a novel image x:
    Project onto the subspace: $(w_1, \dots, w_k) = (\mathbf{u}_1^T(\mathbf{x} - \boldsymbol{\mu}), \dots, \mathbf{u}_k^T(\mathbf{x} - \boldsymbol{\mu}))$.
    Optional: check the reconstruction error $\|\mathbf{x} - \hat{\mathbf{x}}\|$ to determine whether the image is really a face.
    Classify as the closest training face in the k-dimensional subspace.

    M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991


    http://www.cs.ucsb.edu/~mturk/Papers/mturk-CVPR91.pdf
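    A sketch of this recognition procedure in the same hypothetical NumPy setting as the earlier sketches: `mean_face` and the eigenface matrix `U` (columns u1…uk) come from training, `train_coeffs` is the (n, k) array of projected training faces, and `train_labels` their identities.

```python
import numpy as np

def recognize(x, mean_face, U, train_coeffs, train_labels, max_recon_error=None):
    x = x.reshape(-1).astype(float)
    w = U.T @ (x - mean_face)                      # project onto the subspace
    if max_recon_error is not None:
        x_hat = mean_face + U @ w                  # reconstruction from the subspace
        if np.linalg.norm(x - x_hat) > max_recon_error:
            return None                            # large reconstruction error: likely not a face
    dists = np.linalg.norm(train_coeffs - w, axis=1)
    return train_labels[int(np.argmin(dists))]     # closest training face in the k-dim subspace
```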

    PCA

    A general dimensionality reduction technique.

    Preserves most of the variance with a much more compact representation:
    Lower storage requirements (eigenvectors + a few numbers per face)
    Faster matching

    What are the problems for face recognition?


    Limitations

    Global appearance method:

    not robust at all to misalignment

    Not very robust to background variation, scale


    Principal Component Analysis (PCA)

    Problems

    Background (de-emphasize the outside of the face, e.g., by multiplying the input image by a 2D Gaussian window centered on the face)

    Lighting conditions (performance degrades with light changes)

    Scale (performance decreases quickly with changes to head size)

    multi-scale eigenspaces

    scale input image to multiple sizes

    Orientation (performance decreases, but not as fast as with scale changes)

    plane rotations can be handled

    out-of-plane rotations are more difficult to handle


    Fisherfaces


    Limitations

    The direction of maximum variance is not always good for classification.


    A more discriminative subspace: FLD

    Fisher Linear Discriminants ("Fisherfaces")

    PCA preserves maximum variance.

    FLD preserves discrimination: find the projection that maximizes the scatter between classes and minimizes the scatter within classes.

    Reference: Eigenfaces vs. Fisherfaces, Belhumeur et al., PAMI 1997

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.3247&rep=rep1&type=pdf

    Illustration of the Projection

    Using two classes as an example: a poor projection direction mixes the two classes along the projection axis, while a good one separates them. (Figure: two scatter plots in the x1–x2 plane, one with a poor projection and one with a good projection.)


    Comparing with PCA


    Variables

    N sample images: $\{\mathbf{x}_1, \dots, \mathbf{x}_N\}$

    c classes: $\{\chi_1, \dots, \chi_c\}$

    Average of each class: $\mathbf{m}_i = \dfrac{1}{N_i}\sum_{\mathbf{x}_k \in \chi_i} \mathbf{x}_k$

    Average of all data: $\mathbf{m} = \dfrac{1}{N}\sum_{k=1}^{N} \mathbf{x}_k$


    Scatter Matrices

    Scatter of class i: $S_i = \sum_{\mathbf{x}_k \in \chi_i} (\mathbf{x}_k - \mathbf{m}_i)(\mathbf{x}_k - \mathbf{m}_i)^T$

    Within-class scatter: $S_W = \sum_{i=1}^{c} S_i$

    Between-class scatter: $S_B = \sum_{i=1}^{c} N_i\,(\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T$
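    A minimal sketch of these scatter matrices, assuming `X` is an (n, d) data matrix and `y` an array of n class labels (illustrative names).

```python
import numpy as np

def scatter_matrices(X, y):
    d = X.shape[1]
    m = X.mean(axis=0)                             # overall mean
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                       # class mean m_i
        Sw += (Xc - mc).T @ (Xc - mc)              # within-class scatter S_i, accumulated
        diff = (mc - m).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)            # N_i (m_i - m)(m_i - m)^T
    return Sw, Sb
```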


    Illustration

    (Figure: two class scatters $S_1$ and $S_2$ in the x1–x2 plane; the within-class scatter is $S_W = S_1 + S_2$, while the between-class scatter $S_B$ measures the spread of the class means.)


    Mathematical Formulation

    After projection $\mathbf{z} = W^T\mathbf{x}$:

    Between-class scatter: $\tilde{S}_B = W^T S_B W$

    Within-class scatter: $\tilde{S}_W = W^T S_W W$

    Objective: maximize the ratio of between-class to within-class scatter.

    Solution: generalized eigenvectors.

    The rank of $W_{opt}$ is limited, since $\mathrm{Rank}(S_B) \le c - 1$.


    Illustration

    (Same figure as above: class scatters $S_1$ and $S_2$ in the x1–x2 plane, with $S_W = S_1 + S_2$ and $S_B$.)


    Recognition with FLD

    Compute within-class and between-class scatter matrices

    Solve generalized eigenvector problem

    Project to FLD subspace and classify by nearest neighbor

    $W_{opt} = \arg\max_W \dfrac{|W^T S_B W|}{|W^T S_W W|}$

    Generalized eigenvector problem: $S_B \mathbf{w}_i = \lambda_i S_W \mathbf{w}_i, \quad i = 1, \dots, m$

    with the scatter matrices

    $S_i = \sum_{\mathbf{x}_k \in \chi_i} (\mathbf{x}_k - \mathbf{m}_i)(\mathbf{x}_k - \mathbf{m}_i)^T, \qquad S_W = \sum_{i=1}^{c} S_i, \qquad S_B = \sum_{i=1}^{c} N_i\,(\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T$

    Projection to the FLD subspace: $\mathbf{x}' = W_{opt}^T \mathbf{x}$
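    A sketch of solving this generalized eigenproblem with NumPy, reusing scatter matrices computed as above. The pseudo-inverse is an illustrative shortcut for the case where $S_W$ is singular, not the PCA-then-FLD procedure used in the Fisherfaces paper.

```python
import numpy as np

def fld_projection(Sw, Sb, num_components):
    # S_B w = lambda S_W w  is equivalent to an ordinary eigenproblem of pinv(S_W) S_B
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1][:num_components]
    return evecs[:, order].real                # columns are the directions of W_opt

# Usage sketch: for a data matrix X (n, d), the FLD features are X @ W
```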


    Results: Eigenface vs. Fisherface

    Variation in facial expression, eyewear, and lighting.

    Input: 160 images of 16 people. Train: 159 images; test: 1 image (leave-one-out).

    Conditions: with glasses, without glasses, 3 lighting conditions, 5 expressions.

    Reference: Eigenfaces vs. Fisherfaces, Belhumeur et al., PAMI 1997

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.3247&rep=rep1&type=pdf

    Eigenfaces vs. Fisherfaces

    Reference: Eigenfaces vs. Fisherfaces, Belhumeur et al., PAMI 1997

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.3247&rep=rep1&type=pdf


    1. Identify a Specific Instance

    Frontal faces

    Typical scenario: few examples per face; identify or verify a test example.

    What's hard: changes in expression, lighting, age, occlusion, viewpoint.

    Basic approaches (all nearest neighbor):
    1. Project into a new subspace (or kernel space) (e.g., Eigenfaces = PCA)
    2. Measure face features
    3. Make a 3D face model, compare shape + appearance (e.g., AAM)


    Large scale comparison of methods

    FRVT 2006 Report

    Not much (or any) information is available about the methods, but it gives an idea of what is doable.

    http://www.frvt.org/FRVT2006/docs/FRVT2006andICE2006LargeScaleReport.pdf

    FRVT Challenge

    Frontal faces

    FRVT 2006 evaluation: False Rejection Rate at False Acceptance Rate = 0.001


    FRVT Challenge

    Frontal faces

    FRVT 2006 evaluation: controlled illumination


    FRVT Challenge

    Frontal faces

    FRVT 2006 evaluation: uncontrolled illumination


    FRVT Challenge

    Frontal faces

    FRVT 2006 evaluation: computers win!


    Face recognition by humans

    Face recognition by humans: 20 results (2005)

    Slides by Jianchao Yang

    http://web.mit.edu/bcs/sinha/papers/20Results_2005.pdf
    http://www.cs.uiuc.edu/homes/dhoiem/courses/cs598_spring09/slides/spring09_jianchao_biologicalInspiredComputerVision.pdf


    Result 17: Vision progresses from piecemeal to holistic


    Things to remember


    PCA is a generally useful dimensionality reduction technique, but it is not ideal for discrimination.

    FLD is better for discrimination, though it is only ideal under Gaussian data assumptions.

    Computer face recognition works very well under controlled environments; there is still room for improvement in general conditions.


    Normal Distribution and Covariance Matrix


    Normal Distribution & Multivariate Normal Distribution

    Assume we have a d-dimensional input (e.g., 2D), x. We will see how we can characterize p(x), assuming it is normally distributed.

    For 1 dimension, it was the mean ($\mu$) and variance ($\sigma^2$): Mean $= E[x]$, Variance $= E[(x - \mu)^2]$.

    For d dimensions, we need the d-dimensional mean vector and the d×d covariance matrix.

    If $\mathbf{x} \sim N_d(\boldsymbol{\mu}, \Sigma)$, then each dimension of x is univariate normal (the converse is not true).


    Multivariate Normal Distribution

    For a single variable, the normal density function is:

    $p(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$

    For variables in higher dimensions, this generalizes to:

    $p(\mathbf{x}) = \dfrac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$

    where the mean $\boldsymbol{\mu}$ is now a d-dimensional vector, $\Sigma$ is a d×d covariance matrix, and $|\Sigma|$ is the determinant of $\Sigma$.

    Multivariate Parameters: Mean, Covariance


    Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \dots, \mu_d]^T$

    Covariance: $\Sigma = \mathrm{Cov}(\mathbf{x}) = E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T] = E[\mathbf{x}\mathbf{x}^T] - \boldsymbol{\mu}\boldsymbol{\mu}^T$

    $\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \dots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \dots & \sigma_{2d} \\ \vdots & \vdots & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \dots & \sigma_d^2 \end{bmatrix}$

    where $\sigma_{ij} = \mathrm{Cov}(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)]$.


    Variance: how much x varies around its expected value.

    Covariance measures the strength of the linear relationship between two random variables:

    $\mathrm{Cov}(x_i, x_j) = \sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)]$

    The covariance becomes more positive for each pair of values which differ from their means in the same direction, and more negative for each pair of values which differ from their means in opposite directions. If two variables are independent, then their covariance/correlation is zero (the converse is not true).

    Correlation is a dimensionless measure of linear dependence, ranging between -1 and +1:

    $\mathrm{Corr}(x_i, x_j) = \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

    Covariance Matrices


    Contours of constant probability density for a 2D Gaussian distribution with
    a) a general covariance matrix,
    b) a diagonal covariance matrix (the covariance of x1, x2 is 0),
    c) a covariance proportional to the identity matrix (covariances are 0, and the variances of each dimension are the same).


    The shape and orientation of the hyper-ellipsoid centered at $\boldsymbol{\mu}$ are defined by the covariance matrix $\Sigma$; its principal axes lie along the eigenvectors $\mathbf{v}_1$, $\mathbf{v}_2$ of $\Sigma$.


    The projection of a d-dimensional normal distribution onto the vector w is univariate normal:

    $\mathbf{w}^T\mathbf{x} \sim N(\mathbf{w}^T\boldsymbol{\mu},\; \mathbf{w}^T\Sigma\mathbf{w})$

    More generally, when W is a d×k matrix with k < d (with rows $\mathbf{w}_1^T, \mathbf{w}_2^T, \dots$ forming $W^T$), the k-dimensional $W^T\mathbf{x}$ is k-variate normal:

    $W^T\mathbf{x} \sim N_k(W^T\boldsymbol{\mu},\; W^T\Sigma W)$
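    A small empirical check of this statement with hypothetical parameters: samples of $\mathbf{w}^T\mathbf{x}$ should have mean close to $\mathbf{w}^T\boldsymbol{\mu}$ and variance close to $\mathbf{w}^T\Sigma\mathbf{w}$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
w = np.array([0.5, 1.5])

X = rng.multivariate_normal(mu, Sigma, size=100_000)
z = X @ w                                # projections w^T x

print(z.mean(), w @ mu)                  # both approximately -2.5
print(z.var(), w @ Sigma @ w)            # both approximately 3.65
```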


    Eigenvalue Decomposition of the Covariance Matrix

    Eigenvector and Eigenvalue


    Given a linear transformation A, a non-zero vector x is defined to be an eigenvector of the transformation if it satisfies the eigenvalue equation

    $A\mathbf{x} = \lambda\mathbf{x}$ for some scalar $\lambda$.

    In this situation, the scalar $\lambda$ is called an eigenvalue of A corresponding to the eigenvector x.

    $(A - \lambda I)\mathbf{x} = 0 \;\Rightarrow\; \det(A - \lambda I)$ must be 0. This gives the characteristic polynomial, whose roots are the eigenvalues of A.

    Eigenvalues of the covariance matrix


    The covariance matrix $\Sigma$ is symmetric, so it can always be diagonalized as

    $\Sigma = \Phi\Lambda\Phi^T$

    where $\Phi = [\mathbf{u}_1, \mathbf{u}_2, \dots]$ is the column matrix consisting of the eigenvectors of $\Sigma$, $\Phi^T = \Phi^{-1}$, and $\Lambda$ is the diagonal matrix whose elements are the eigenvalues of $\Sigma$.


    If $\mathbf{x} \sim N_d(\boldsymbol{\mu}, \Sigma)$, then each dimension of x is univariate normal. The converse is not true (can you think of a counterexample?).


    (Recall:) The projection of a d-dimensional normal distribution onto the vector w is univariate normal:

    $\mathbf{w}^T\mathbf{x} \sim N(\mathbf{w}^T\boldsymbol{\mu},\; \mathbf{w}^T\Sigma\mathbf{w})$

    More generally, when W is a d×k matrix with k < d (with rows $\mathbf{w}_1^T, \mathbf{w}_2^T, \dots$ forming $W^T$), the k-dimensional $W^T\mathbf{x}$ is k-variate normal:

    $W^T\mathbf{x} \sim N_k(W^T\boldsymbol{\mu},\; W^T\Sigma W)$

    (Figure: the univariate normal distribution of the transformed variable $\mathbf{w}^T\mathbf{x}$.)


    Proof for the first statement:

    $E[\mathbf{w}^T\mathbf{x}] = \mathbf{w}^T E[\mathbf{x}] = \mathbf{w}^T\boldsymbol{\mu}$

    $\mathrm{Var}[\mathbf{w}^T\mathbf{x}] = E[(\mathbf{w}^T\mathbf{x} - \mathbf{w}^T\boldsymbol{\mu})(\mathbf{w}^T\mathbf{x} - \mathbf{w}^T\boldsymbol{\mu})^T]$
    $= E[(\mathbf{w}^T\mathbf{x} - \mathbf{w}^T\boldsymbol{\mu})^2]$  (since $\mathbf{w}^T\mathbf{x} - \mathbf{w}^T\boldsymbol{\mu}$ is a scalar)
    $= E[\mathbf{w}^T(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\mathbf{w}]$
    $= \mathbf{w}^T\Sigma\,\mathbf{w}$  (move w out of the expectation and use the definition of $\Sigma$)


    Similar to the projection onto w, $A^T\mathbf{x}$ is computed here, resulting in mean $A^T\boldsymbol{\mu}$ and covariance $A^T\Sigma A$.

    Whitening Transform


    Given the above, we can construct a transformation matrix $A_w$ to decorrelate the variables and obtain $\Sigma = I$; i.e., compute $A_w(\mathbf{x} - \boldsymbol{\mu})$ to obtain zero-mean, unit-variance, decorrelated data.

    This matrix is called the whitening transform:

    $A_w = \Lambda^{-1/2}\Phi^T$

    where $\Lambda$ is the diagonal matrix of the eigenvalues of the original distribution and $\Phi$ is the matrix composed of the eigenvectors as its columns.

    One can verify that the resulting covariance matrix, $A_w \Sigma A_w^T$, is indeed the identity matrix I (the dimensions are decorrelated).
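    A minimal sketch of the whitening transform, assuming `X` is an (n, d) data matrix with a full-rank covariance (illustrative names); after the transform the sample covariance is approximately the identity.

```python
import numpy as np

def whiten(X):
    m = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    lam, Phi = np.linalg.eigh(Sigma)            # Sigma = Phi diag(lam) Phi^T
    Aw = np.diag(1.0 / np.sqrt(lam)) @ Phi.T    # Aw = Lambda^(-1/2) Phi^T
    Z = (X - m) @ Aw.T                          # apply Aw to each centered sample
    return Z, Aw, m

# np.cov(whiten(X)[0], rowvar=False) is approximately the identity matrix
```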


    From the previous slides, $A_w\mathbf{x}$ has covariance matrix $A_w \Sigma A_w^T$:

    $A_w \Sigma A_w^T = (\Lambda^{-1/2}\Phi^T)(\Phi\Lambda\Phi^T)(\Lambda^{-1/2}\Phi^T)^T$
    $= \Lambda^{-1/2}(\Phi^T\Phi)\,\Lambda\,(\Phi^T\Phi)(\Lambda^{-1/2})^T$
    $= \Lambda^{-1/2}\,\Lambda\,(\Lambda^{-1/2})^T$  (since $\Phi^T = \Phi^{-1}$)
    $= \Lambda^{-1/2}\,\Lambda\,\Lambda^{-1/2}$  (since $\Lambda^{-1/2}$ is symmetric)
    $= I$