
Page 1: Eigenfaces vs Fisher Faces Presentation

Eigenface vs Fisherface

Comparison between Eigenface and Fisherface

by Marian Moise

Page 2: Eigenfaces vs Fisher Faces Presentation

Overview

● Eigenface is still used today. Its main idea is that each face image can be reconstructed as a weighted sum of the principal components of the original training set of face images. Recognition is performed by projecting a new image into the subspace spanned by the eigenfaces (the "face space") and then classifying the face by comparing its position in face space with the positions of known individuals.

● Fisherface, on the other hand, is based on the LDA technique, which searches for the vectors in the underlying space that best discriminate among classes (rather than those that best describe the data). More formally, given a number of independent features with respect to which the data is described, LDA creates a linear combination of them that yields the largest mean differences between the desired classes.

Page 3: Eigenfaces vs Fisher Faces Presentation

PCA vs FLD

PCA actually smears the classes together so that they are no longer linearly separable in the projected space. It is clear that, although PCA achieves larger total scatter, FLD achieves greater between-class scatter, and, consequently, classification is simplified.
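To make this concrete, here is a minimal sketch (not part of the original slides) using NumPy and scikit-learn on made-up two-class data: PCA picks the direction of largest total scatter, so the classes overlap in the projection, whereas LDA/FLD picks the direction that keeps them separated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two elongated Gaussian classes whose means differ along a direction
# that is NOT the direction of largest total variance.
cov = [[10.0, 0.0], [0.0, 0.5]]
X = np.vstack([rng.multivariate_normal([0, 0], cov, 200),
               rng.multivariate_normal([0, 3], cov, 200)])
y = np.array([0] * 200 + [1] * 200)

# PCA picks the direction of largest total scatter (the elongated axis),
# which smears the two classes together.
z_pca = PCA(n_components=1).fit_transform(X).ravel()

# LDA picks the direction that maximizes between-class scatter relative
# to within-class scatter, so the classes stay apart.
z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y).ravel()

def separation(z, y):
    """Distance between class means in units of pooled standard deviation."""
    m0, m1 = z[y == 0].mean(), z[y == 1].mean()
    s = np.sqrt(0.5 * (z[y == 0].var() + z[y == 1].var()))
    return abs(m0 - m1) / s

print("PCA separation:", separation(z_pca, y))   # small: classes overlap
print("LDA separation:", separation(z_lda, y))   # large: classes split apart
```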

Page 4: Eigenfaces vs Fisher Faces Presentation

Face Recognition Using Eigenfaces

➔ The main idea behind Eigenfaces is to represent a face as a linear combination of a set of basis images (the PCA technique):

\Phi_i = \sum_{j=1}^{k} w_j u_j

Page 5: Eigenfaces vs Fisher Faces Presentation

Recognition overview

● Initialization: Acquire the training set of face images and calculate the eigenfaces, which define the face space.

● When a new face image is encountered, calculate a set of weights based on the input image and the M eigenfaces by projecting the input image onto each of the eigenfaces.

● Determine if the image is a face at all (whether known or unknown) by checking to see if the image is sufficiently close to "face space."

Page 6: Eigenfaces vs Fisher Faces Presentation

Recognition overview

● If it is a face, classify the weight pattern as either a known person or as unknown.

● (Optional) If the same unknown face is seen several times, calculate its characteristic weight pattern and incorporate it into the known faces (i.e., learn to recognize it).

Page 7: Eigenfaces vs Fisher Faces Presentation

Eigenfaces algorithm

➔ Obtain M training images I_1, I_2, ..., I_M.

➔ Represent each N×N image I_i as an N²×1 vector Γ_i:

I_i = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix} \longrightarrow \Gamma_i = \begin{bmatrix} a_{11} \\ \vdots \\ a_{1N} \\ a_{21} \\ \vdots \\ a_{2N} \\ \vdots \\ a_{NN} \end{bmatrix}
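A minimal NumPy sketch of this vectorization step, assuming the M training images are already loaded as N×N grayscale arrays in a list (the variable name `images` is illustrative, not from the slides):

```python
import numpy as np

def images_to_vectors(images):
    # images: a list of M grayscale face images, each an N x N NumPy array
    # (hypothetical; load them however your dataset is stored).
    # Flatten each N x N image row by row into an N^2-dimensional vector
    # Gamma_i, and stack them as the columns of an (N^2 x M) matrix.
    return np.column_stack([img.astype(np.float64).ravel() for img in images])
```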

Page 8: Eigenfaces vs Fisher Faces Presentation

Eigenfaces algorithm

Compute the average face vector Ψ:

\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i

Subtract the mean face from each face vector Γ_i to get the set of vectors Φ_i. The purpose of subtracting the mean image from each image vector is to be left with only the distinguishing features of each face, removing the information that is common to all of them:

\Phi_i = \Gamma_i - \Psi
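Continuing the same NumPy sketch, the mean face Ψ and the mean-subtracted vectors Φ_i follow directly (the argument `Gamma` is assumed to be the N²×M matrix produced by the previous sketch):

```python
import numpy as np

def center_faces(Gamma):
    # Gamma: (N^2 x M) matrix whose columns are the face vectors Gamma_i.
    Psi = Gamma.mean(axis=1, keepdims=True)   # average face vector
    Phi = Gamma - Psi                         # Phi_i = Gamma_i - Psi
    return Phi, Psi
```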

Page 9: Eigenfaces vs Fisher Faces Presentation

Eigenfaces algorithm

➔ Find the covariance matrix C:

C = A A^T, \quad \text{where } A = [\, \Phi_1 \; \Phi_2 \; \cdots \; \Phi_M \,]

Note that C is an N² × N² matrix, while A is an N² × M matrix.

Page 10: Eigenfaces vs Fisher Faces Presentation

Eigenfaces algorithm

➔ We now need to compute the eigenvectors u_i of C. However, note that C is an N² × N² matrix, so it has N² eigenvectors, each of dimension N². For an image this is huge, and the computations required would easily make the system run out of memory.

Page 11: Eigenfaces vs Fisher Faces Presentation

Eigenfaces algorithm

Instead of the matrix A A^T, consider the matrix A^T A. Remember that A is an N² × M matrix, so A^T A is an M × M matrix. Finding the eigenvectors of this matrix yields M eigenvectors, each of dimension M × 1; call these eigenvectors v_i.

From the properties of matrices it follows that u_i = A v_i, where the u_i are the eigenvectors of the covariance matrix C corresponding to its M largest eigenvalues, with M ≪ N².

When calculating the eigenvectors u_i, we should also normalize them so that ||u_i|| = 1.
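A sketch of this trick in NumPy, assuming `Phi` is the N²×M matrix A of mean-subtracted face vectors from the earlier step:

```python
import numpy as np

def eigenfaces(Phi):
    # Phi (= A): (N^2 x M) matrix of mean-subtracted face vectors.
    # Work with the small (M x M) matrix A^T A instead of the huge A A^T.
    small = Phi.T @ Phi                       # (M x M)
    eigvals, V = np.linalg.eigh(small)        # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]         # largest eigenvalues first
    eigvals, V = eigvals[order], V[:, order]
    U = Phi @ V                               # u_i = A v_i, eigenvectors of A A^T
    U /= np.linalg.norm(U, axis=0)            # enforce ||u_i|| = 1
    return U, eigvals
```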

Page 12: Eigenfaces vs Fisher Faces Presentation

Eigenfaces algorithm

➔ Select the best K eigenvectors (principal components). The selection of these eigenvectors is usually done heuristically; one possible heuristic is sketched below.
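One common heuristic, shown here only as an example (the slides do not prescribe it) and assuming the eigenvalues are sorted in descending order as in the earlier sketch, is to keep the smallest K whose eigenvalues account for a chosen fraction of the total variance:

```python
import numpy as np

def choose_k(eigvals, energy=0.95):
    # Keep the smallest K such that the top-K eigenvalues explain
    # at least `energy` (e.g. 95%) of the total variance.
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratios, energy) + 1)
```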

Page 13: Eigenfaces vs Fisher Faces Presentation

Finding weights

Each normalized face Φ_i in the training set can now be represented as a linear combination of these eigenvectors:

\Phi_i = \sum_{j=1}^{K} w_j u_j

The weights are calculated as:

w_j = u_j^T \Phi_i
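In the running NumPy sketch, projecting all training faces onto the top-K eigenfaces is a single matrix product (`U` and `Phi` are the outputs of the earlier sketches):

```python
import numpy as np

def project_faces(Phi, U, K):
    # Phi: (N^2 x M) mean-subtracted faces; U: (N^2 x M) eigenfaces.
    # Each column of the result is Omega_i = [w_1 ... w_K]^T with w_j = u_j^T Phi_i.
    return U[:, :K].T @ Phi                   # (K x M) weight matrix
```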

Page 14: Eigenfaces vs Fisher Faces Presentation

Finding weights

So each training image Φ_i (i = 1, 2, ..., M) will be represented in this new basis as:

\Omega_i = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_K \end{bmatrix}

Page 15: Eigenfaces vs Fisher Faces Presentation

Recognition process

Suppose we have a face image Γ that is to be recognized. The following steps are then performed:

1) Face normalization: \Phi = \Gamma - \Psi

2) Project the normalized probe onto the eigenspace and find the weights:

w_j = u_j^T \Phi

3) The normalized probe Φ can now be represented as:

\Omega = [\, w_1 \; w_2 \; \cdots \; w_K \,]^T

Page 16: Eigenfaces vs Fisher Faces Presentation

Recognition process

Classification of the feature vector is done using a distance measure:

e_r = \min_i \lVert \Omega - \Omega_i \rVert

If e_r < Θ, where Θ is a threshold chosen heuristically, the probe image is recognized as the training image with which it gives the lowest score. However, if e_r > Θ, the probe does not belong to the database.
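A sketch of the full recognition step under the assumptions of the earlier sketches; `U`, `Psi`, and the training weight matrix `Omega_train` are names carried over from them, and `theta` is the heuristically chosen threshold Θ:

```python
import numpy as np

def recognize(probe_image, U, K, Psi, Omega_train, theta):
    # 1) Face normalization: Phi = Gamma - Psi
    Phi = probe_image.astype(np.float64).ravel()[:, None] - Psi
    # 2)-3) Project onto the eigenspace: Omega = [w_1 ... w_K]^T, w_j = u_j^T Phi
    Omega = U[:, :K].T @ Phi                          # (K x 1)
    # Distance to every training weight vector
    dists = np.linalg.norm(Omega_train - Omega, axis=0)
    e_r, best = dists.min(), int(dists.argmin())
    # Threshold test: below theta -> known face, above -> not in the database
    return (best if e_r < theta else None), e_r
```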

Page 17: Eigenfaces vs Fisher Faces Presentation

Distance measures

Euclidean distance:

d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

Mahalanobis distance:

d(p, q) = \sqrt{(p - q)^T C^{-1} (p - q)}

Because the Mahalanobis distance also takes into account the covariance of the vectors p and q, removing the problems related to scale and correlation, it is a better choice for pattern recognition problems.
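A small NumPy sketch of the two distances; using the pseudo-inverse of C is an implementation choice made here in case the covariance estimate is poorly conditioned, not something the slides prescribe:

```python
import numpy as np

def euclidean(p, q):
    return np.sqrt(np.sum((p - q) ** 2))

def mahalanobis(p, q, C):
    # C: covariance matrix of the feature vectors (e.g. of the training weights).
    d = p - q
    return np.sqrt(d @ np.linalg.pinv(C) @ d)
```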

Page 18: Eigenfaces vs Fisher Faces Presentation

Eigenfaces issues

➔ All of the above has been derived under the assumption that faces are mostly upright and frontal.

➔ Because the probe image might not be a face and yet still resemble a particular face class stored in the database, face detection is recommended as part of such a system.

➔ The results indicate that changing lighting conditions causes relatively few errors, while performance drops dramatically with size change.

Page 19: Eigenfaces vs Fisher Faces Presentation

Fisherfaces

Fisher's Linear Discriminant (FLD) is an example of a class specific method, in the sense that it tries to “shape” the scatter in order to make it more reliable for the classification. This method selects Wopt (optimal eigenvectors) in such a way that the ratio of the between-class scatter (SB) and the within-class scatter (SW) is maximized (in case SW is non-singular):

W_{opt} = \arg\max_{W} \frac{\lvert W^T S_B W \rvert}{\lvert W^T S_W W \rvert} = [\, w_1 \; w_2 \; \cdots \; w_m \,]

Page 20: Eigenfaces vs Fisher Faces Presentation

Fisherfaces

The between-class scatter matrix is defined as:

S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T

The within-class scatter matrix is defined as:

S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T

where μ_i is the mean image of class X_i and N_i is the number of samples in class X_i.
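A minimal NumPy sketch of these two matrices, assuming `X` is an (n_samples × n_features) array of face vectors (or projected face vectors) and `y` holds their class labels; the names are illustrative:

```python
import numpy as np

def scatter_matrices(X, y):
    # X: (n_samples x n_features) data matrix, y: class label of each row.
    classes = np.unique(y)
    mu = X.mean(axis=0)                             # overall mean
    n_features = X.shape[1]
    S_B = np.zeros((n_features, n_features))        # between-class scatter
    S_W = np.zeros((n_features, n_features))        # within-class scatter
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        diff = (mu_c - mu)[:, None]
        S_B += Xc.shape[0] * (diff @ diff.T)        # N_i (mu_i - mu)(mu_i - mu)^T
        centered = Xc - mu_c
        S_W += centered.T @ centered                # sum (x_k - mu_i)(x_k - mu_i)^T
    return S_B, S_W
```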

Page 21: Eigenfaces vs Fisher Faces Presentation

Singularity issue

In the face recognition problem, one is confronted with the difficulty that the within-class scatter matrix SW is always singular. This stems from the fact that the rank of SW is at most N-c and, in general, the number of images in the learning set N is much smaller than the number of pixels in each image n.

To overcome the complication of a singular S_W, an alternative method, called Fisherface, projects the image set to a lower-dimensional space so that the resulting within-class scatter matrix is non-singular. This is achieved by using PCA to reduce the dimension of the feature space to N − c, and then applying the standard FLD to reduce the dimension to c − 1. More formally, W_opt is given by:

W_{opt}^T = W_{fld}^T \, W_{pca}^T

Page 22: Eigenfaces vs Fisher Faces Presentation

Singularity issue

W_{pca} = \arg\max_{W} \lvert W^T S_T W \rvert

W_{fld} = \arg\max_{W} \frac{\lvert W^T W_{pca}^T S_B W_{pca} W \rvert}{\lvert W^T W_{pca}^T S_W W_{pca} W \rvert}
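A sketch of the complete Fisherface projection following these formulas, reusing the `scatter_matrices` helper sketched earlier; this is an illustration under the stated assumptions, not the authors' reference implementation:

```python
import numpy as np

def fisherfaces(X, y):
    # X: (N x n) data matrix (N images, n pixels), y: class labels (c classes).
    N, n = X.shape
    c = len(np.unique(y))
    # Step 1: PCA down to N - c dimensions so that S_W becomes non-singular.
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[: N - c].T                          # (n x (N - c))
    # Step 2: standard FLD in the PCA-reduced space, keeping c - 1 directions.
    S_B, S_W = scatter_matrices(Xc @ W_pca, y)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1][: c - 1]
    W_fld = eigvecs[:, order].real                 # ((N - c) x (c - 1))
    # Combined projection: W_opt^T = W_fld^T W_pca^T, i.e. W_opt = W_pca W_fld.
    return W_pca @ W_fld                           # (n x (c - 1))
```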

Page 23: Eigenfaces vs Fisher Faces Presentation

Results

Page 24: Eigenfaces vs Fisher Faces Presentation

Related work

These two papers confirm my results:

● Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, Peter N. Belhumeur, Joao P. Hespanha, and David J. Kriegman

● Eigenfaces and Fisherfaces, Naotoshi Seo, University of Maryland

Page 25: Eigenfaces vs Fisher Faces Presentation

Limitations and possible extensions

Faces are assumed to be upright and frontal:
● it would be a good idea to have an additional subsystem that transforms any given face image to a frontal, centered view
● another possibility would be to create a small number of face classes for each known person, corresponding to characteristic views (various poses)

PCA is sensitive to lighting conditions:
● specularity removal should be considered as a preprocessing step
● unfortunately, no such method is available for grayscale images, so highlight removal should be used instead

Page 26: Eigenfaces vs Fisher Faces Presentation

Limitations and possible extensions

The probe image might not be a face and yet still resemble a particular face class stored in the database:
● face detection is recommended as part of such a system

To obtain more significant eigenvectors and improve the recognition rate:
● face cropping should be performed, for example by using an ellipsoidal kernel (filtering matrix)

Page 27: Eigenfaces vs Fisher Faces Presentation

Conclusions

➔ The Fisherface method appears to be the best at simultaneously handling lighting variation, facial expression variation, and the presence of glasses.

➔ As expected, the PCA method suffers when confronted with variation in facial expression and the presence of glasses.

➔ Of the two, Fisherface is the better method: it has several properties that intuitively suggest it should fare well in a variety of circumstances, most notably the fact that it eliminates intra-class differences from its feature set. This suggests that it is close to optimal in deciding which features are relevant to a particular class, given enough examples of that class.

➔ PCA can outperform LDA when the training dataset is small or when only a few eigenvectors are retained.

➔ Both methods perform well when presented with a test image that is similar to an image in the training set.

Page 28: Eigenfaces vs Fisher Faces Presentation

Questions?

THANK YOU !