
Facial Recognition Using Active Shape Models, Local Patches and Support Vector Machines

Utsav Prabhu
ECE Department
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213

[email protected]

Keshav Seshadri
ECE Department
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213

[email protected]

Abstract

In this paper we propose an improved method for recognizing frontal faces using local patches around well-defined facial landmarks. Our method aims to mitigate the problems of illumination variation and in-plane rotation by using only specific discriminative regions of the face, which makes it more robust. A pre-trained Active Shape Model automatically fits 79 landmarks onto every face in our training and test sets. Local patches of fixed size are built around the most discriminative and most accurately located landmarks and are used to extract features. These features are then used to distinguish one class from another with a Support Vector Machine used in a one-against-the-rest form. We evaluate our scheme on random training and test sets drawn from two different databases (NIST Multiple Biometric Grand Challenge-2008 (MBGC-2008) and CMU Multi-PIE) and show that our method achieves good recognition rates (79.36% and 98.19% identification rates, respectively).

1 Introduction

Facial recognition schemes are becoming increasingly accurate; however, the combined effect of illumination changes, pose variations and in-plane rotations of subjects is known to degrade the accuracy of many of them. In this paper we focus on illumination and in-plane rotation effects and do not address pose variation. Several solutions have been proposed to deal with illumination, including de-illumination and re-illumination of faces in the image domain [1], illumination normalization using histogram equalization [2], and the use of near-infrared images [3]. All of these schemes achieve good results, but they focus mainly on compensating for illumination effects rather than on an approach that is inherently robust to them.

It has been shown that a local approach to face recognition is more robust to illumination effects than a global approach [4], [5]. For this reason, our recognition algorithm uses features extracted from small two-dimensional (2D) regions around selected facial landmarks. A modified Active Shape Model (ASM) [6] is used to locate 79 landmark points on every face in our training and test databases. Local patches are isolated around each landmark and used to build features unique to each class using a combination of Gabor filter banks and Principal Component Analysis (PCA). These features are used to train a Support Vector Machine (SVM), which serves as our classifier. Such a scheme draws a large amount of discriminative information from every facial image, unlike global approaches, which use the pixel intensities of the image as a whole and can therefore suffer from background noise, in-plane rotations and similar effects. Similar local approaches have been followed in [4] and [5]. [4] uses Active Shape Models to find landmarks of interest on a face and then compares the facial shape of a test image with those in a training database to classify it. [5] extracts facial feature regions and uses an SVM (in a one-against-the-rest form) to classify them.

To evaluate our algorithm, we measure the identification rates it achieves on images drawn from the NIST Multiple Biometric Grand Challenge-2008 (MBGC-2008) database [7] and from the CMU Multi-PIE database [8]. Both databases are challenging and contain images with illumination and in-plane rotation effects.

The rest of this paper is organized as follows. Section 2 describes the algorithms used in our implementation. Section 3 presents the results of our experiments, and Section 4 presents our conclusions and describes future work.

2 Component tools used by our method

This section describes several existing tools in detail and explains why they are suitable for use in our overall facial recognition method.

2.1 Active shape models

Active Shape Models (ASMs) aim to automatically locate landmark points that define the shape of any statistically modeled object in an image. When modeling faces, the landmark points of interest lie along the boundaries of facial features such as the eyes, lips, nose, mouth and eyebrows.

The training stage of an ASM involves building a statistical facial model from a training set of images with manually annotated landmarks. Our landmarking scheme consists of the 79 facial points shown in Figure 1. Our training set comprised 500 images of 115 subjects from the query set of the still face challenge problem of the MBGC-2008 database. The shapes in the training set are aligned with each other using Generalized Procrustes Analysis (GPA) [9] and then used to generate the mean shape of a typical face. Subsequently, statistical models of the grey-level intensities of the region around each landmark are built using 2D profiles, which are generated by sampling the image in a square region around each landmark. Such profiles are generated for each landmark point in each image and at four different levels of an image pyramid.
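The GPA alignment step can be sketched as follows. This is a minimal NumPy illustration, not the implementation used in this work; it assumes each shape is an (n, 2) array of landmark coordinates, and the helper names `align_similarity` and `generalized_procrustes` are hypothetical. Each shape is aligned to the current mean by the least-squares similarity transform, the mean is recomputed, centred and scaled to unit norm, and the loop stops once the mean stops changing.

```python
import numpy as np


def align_similarity(shape, ref):
    """Align one (n, 2) shape onto ref with the least-squares similarity
    transform (scale, rotation, translation), i.e. ordinary Procrustes."""
    mu_s, mu_r = shape.mean(axis=0), ref.mean(axis=0)
    a, b = shape - mu_s, ref - mu_r
    u, s, vt = np.linalg.svd(a.T @ b)
    if np.linalg.det(u @ vt) < 0:  # forbid reflections
        u[:, -1] *= -1
        s[-1] *= -1
    rot = u @ vt
    scale = s.sum() / (a ** 2).sum()
    return scale * a @ rot + mu_r


def generalized_procrustes(shapes, tol=1e-10, max_iter=100):
    """Iteratively align all shapes to their centred, unit-norm mean shape."""
    shapes = [np.asarray(sh, dtype=float) for sh in shapes]
    mean = shapes[0] - shapes[0].mean(axis=0)
    mean /= np.linalg.norm(mean)
    for _ in range(max_iter):
        aligned = np.array([align_similarity(sh, mean) for sh in shapes])
        new_mean = aligned.mean(axis=0)
        new_mean -= new_mean.mean(axis=0)
        new_mean /= np.linalg.norm(new_mean)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    aligned = np.array([align_similarity(sh, mean) for sh in shapes])
    return aligned, mean
```

If every training shape were an exact similarity transform of the same face, all aligned shapes would coincide; real annotations differ, and the residual variation is what the statistical shape model captures.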

At the testing stage, the OpenCV implementation of the Viola-Jones face detector [10] is used to locate the face in an image. Once the face has been detected, the mean shape is scaled, rotated and translated using a similarity transform to roughly fit on top of the face in the test image. Multi-level profiles are constructed for the image in the same way as at the training stage. Landmarks are repeatedly moved to the locations whose profiles best match the mean profile for that landmark, until their positions no longer change significantly between two successive iterations. This process is repeated at each level of the pyramid, from coarsest to finest, and once convergence is declared at the finest level the final landmark coordinates are obtained. Figure 2 illustrates the ASM fitting of an unseen test image.

Figure 1: Landmarking scheme used in our ASM implementation

Figure 2: Steps involved in ASM at the test stage (detect the face in the test image and align the mean face over it; multi-level profiling, from level 3 down to level 0, to determine the best location for each landmark; final landmark coordinates ready)
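The iterative landmark search can be illustrated with a deliberately simplified, single-level sketch. It assumes square, zero-mean, unit-norm grey-level profiles and moves each landmark to the position in a small window whose profile is closest (in Euclidean distance) to that landmark's mean profile, stopping when no landmark moves by more than `tol` pixels. The modified ASM [6] uses multi-level profiles, its own profile-matching statistics and shape constraints, none of which are reproduced here; all function and parameter names are hypothetical.

```python
import numpy as np


def extract_profile(img, x, y, half=3):
    """Zero-mean, unit-norm square grey-level profile of side 2*half+1
    centred on pixel (x, y); the caller must keep it inside the image."""
    xi, yi = int(round(x)), int(round(y))
    patch = img[yi - half:yi + half + 1, xi - half:xi + half + 1].astype(float).ravel()
    patch -= patch.mean()
    norm = np.linalg.norm(patch)
    return patch / norm if norm > 0 else patch


def fit_landmarks(img, pts, mean_profiles, half=3, search=2, tol=0.5, max_iter=20):
    """Move each landmark to the position in a (2*search+1)^2 window whose
    profile is closest to its mean profile; stop when the largest move in
    one iteration is below tol pixels (single level, no shape constraint)."""
    img = np.asarray(img, dtype=float)
    pad = half + search
    padded = np.pad(img, pad, mode="edge")  # guards profiles near the border
    h, w = img.shape
    pts = np.asarray(pts, dtype=float).copy()
    for _ in range(max_iter):
        new_pts = pts.copy()
        for k, (x, y) in enumerate(pts):
            best_cost = np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    cx = float(np.clip(x + dx, 0, w - 1))
                    cy = float(np.clip(y + dy, 0, h - 1))
                    prof = extract_profile(padded, cx + pad, cy + pad, half)
                    cost = np.sum((prof - mean_profiles[k]) ** 2)
                    if cost < best_cost:
                        best_cost = cost
                        new_pts[k] = (cx, cy)
        shift = np.max(np.linalg.norm(new_pts - pts, axis=1))
        pts = new_pts
        if shift < tol:
            break
    return pts
```

In a multi-level version, the same loop would run on each pyramid level in turn, with the converged coordinates scaled up to initialise the next finer level.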

2.2 Gabor filters for texture analysis

The texture around particular areas of a face image provides sufficient information to construct a robust face recognition engine. This places considerable emphasis on characterizing and evaluating the texture of the image patches, a task we carry out using Gabor filter banks.

Gabor filters are band-pass filters that can be tuned in frequency, orientation and bandwidth. The filter takes the form:

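$$g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\cos\left(2\pi\frac{x'}{\lambda} + \psi\right) \qquad (1)$$

where

$$x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta$$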

Hence, a Gabor filter is simply the product of a Gaussian kernel and a cosine wave. In (1), λ and ψ represent the wavelength and phase of the cosine wave, σ and γ represent the standard deviation and spatial aspect ratio of the Gaussian kernel, and θ represents the orientation of the normal to the parallel stripes of the filter.
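Equation (1) can be sampled on a discrete grid as below. This is a minimal NumPy sketch; the function name and the convention of a square, odd-sized grid centred on the origin are assumptions. Setting ψ = 0 gives the even-symmetric (cosine) filter and ψ = π/2 the odd-symmetric one, since cos(a + π/2) = −sin(a).

```python
import numpy as np


def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Sample g(x, y; lam, theta, psi, sigma, gamma) from Eq. (1) on a
    size x size grid centred on the origin (size should be odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)    # x'
    y_r = -x * np.sin(theta) + y * np.cos(theta)   # y'
    envelope = np.exp(-(x_r ** 2 + gamma ** 2 * y_r ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * x_r / lam + psi)
    return envelope * carrier
```

For example, `gabor_kernel(25, 8.0, np.pi / 4, 0.0, 4.0, 0.5)` returns a 25 × 25 even-symmetric kernel whose centre value is 1.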

Gabor filters have been found to be both efficient and versatile. Consequently, they are widely used in computer vision to identify and differentiate textures in images, usually in the form of a Gabor filter bank consisting of many Gabor filters tuned in different ways. For our experiments, we generate a Gabor filter bank of 384 Gabor filters using 8 orientations, 4 frequencies, 3 scales and 2 spatial aspect ratios. Each of the 8 × 4 × 3 × 2 = 192 combinations of these values yields 2 Gabor filters, one even-symmetric and one odd-symmetric. A subset of the Gabor filters used in our experiments is shown in Figure 3(a). Each local patch extracted from the image is then filtered with these 384 filters, yielding a 384-dimensional feature vector for each patch, as shown in Figure 3(b). To reduce the length of the feature vector describing each image, we use a PCA-based approach similar to the one proposed in [11].

Figure 3: (a) Some of the Gabor filters used in our filter bank (b) Filtering operation on a patch around a landmark (landmark 20)
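A sketch of how such a 384-filter bank and the per-patch feature vector could be assembled is given below. The text does not give the filter parameter values or say how each filter's output on a patch is reduced to a single number, so this sketch assumes that the 4 "frequencies" are wavelengths λ ∈ {4, 6, 8, 12} pixels, the 3 "scales" are σ ∈ {2, 4, 6}, the aspect ratios are γ ∈ {0.5, 1}, even/odd symmetry is ψ ∈ {0, π/2}, and each response is the inner product of the 25 × 25 patch with a 25 × 25 kernel centred on the landmark. All names and values are hypothetical.

```python
import itertools

import numpy as np

PATCH = 25  # patch side length used in Section 3


def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Eq. (1) sampled on a size x size grid centred on the origin."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(x_r ** 2 + gamma ** 2 * y_r ** 2) / (2 * sigma ** 2))
            * np.cos(2 * np.pi * x_r / lam + psi))


def build_bank(size=PATCH):
    """8 orientations x 4 wavelengths x 3 scales x 2 aspect ratios x
    2 phases (even/odd) = 384 kernels. Parameter values are hypothetical."""
    thetas = [k * np.pi / 8 for k in range(8)]
    lams = [4.0, 6.0, 8.0, 12.0]
    sigmas = [2.0, 4.0, 6.0]
    gammas = [0.5, 1.0]
    psis = [0.0, np.pi / 2]
    return np.stack([gabor_kernel(size, lam, th, psi, sg, gm)
                     for th, lam, sg, gm, psi
                     in itertools.product(thetas, lams, sigmas, gammas, psis)])


def patch_features(img, landmarks, bank):
    """One response per filter per landmark: the inner product of each
    filter with the patch centred on the landmark (an assumption)."""
    half = bank.shape[-1] // 2
    padded = np.pad(np.asarray(img, dtype=float), half, mode="edge")
    feats = []
    for x, y in landmarks:
        xi, yi = int(round(x)) + half, int(round(y)) + half
        patch = padded[yi - half:yi + half + 1, xi - half:xi + half + 1]
        feats.append(np.tensordot(bank, patch, axes=([1, 2], [0, 1])))
    return np.concatenate(feats)
```

The edge padding only guards against the image border; the paper avoids the issue of patches leaving the face by discarding the facial-boundary landmarks. With the 64 landmarks used in Section 3, `patch_features` returns 64 × 384 = 24,576 values per image.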

2.3 SVMs for multi class problems

SVMs are predominantly used for binary classification problems; however, they can be extended to multi-class problems in two ways. The first is the one-against-the-rest approach, in which M SVMs are built for M classes (one per class), each treating images from its own class as positive samples and images from all remaining classes as negative samples. The second is the pairwise approach, in which M(M-1)/2 SVMs are built, one for each pair of classes. Both approaches have been found to produce similar results for person recognition [12], so we prefer the one-against-the-rest method, as it requires training far fewer classifiers than the pairwise approach.
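The one-against-the-rest relabelling can be sketched as follows. The experiments in Section 3 use the SVM Multi-Class library [13]; this stand-in instead uses a minimal linear SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss, purely to show how M binary problems are formed from M classes. The function names, the regularization constant `lam` and the epoch count are assumptions.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.1, epochs=50, seed=0):
    """Binary linear SVM (labels in {-1, +1}) trained with Pegasos-style
    stochastic subgradient steps on the L2-regularized hinge loss.
    A constant feature is appended so a bias is learned (and, as a
    simplification, regularized along with the weights)."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violates_margin = y[i] * (Xb[i] @ w) < 1
            w *= 1.0 - eta * lam
            if violates_margin:
                w += eta * y[i] * Xb[i]
    return w[:-1], w[-1]


def train_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per class: that class is +1, all others -1."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    models = [train_linear_svm(X, np.where(labels == c, 1.0, -1.0), **kwargs)
              for c in classes]
    W = np.stack([w for w, _ in models])
    b = np.array([bias for _, bias in models])
    return classes, W, b
```

Because one classifier is trained per class, the 249-class Multi-PIE experiment needs 249 binary SVMs under this scheme, compared with 249 × 248 / 2 = 30,876 for the pairwise scheme.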

In the one-against-the-rest approach, the class label y of a test sample x is assigned as follows:

$$y = n \quad \text{if} \quad d_n(x) > 0 \qquad (2)$$

where $d_n(x) = \max_{i=1,\dots,M} d_i(x)$ and $d_i(x)$ is the signed distance of x from the ith hyperplane (built for the ith class). The larger the value of $d_i(x)$, the more reliable the classification result; hence we choose the final label of a test sample as the class whose SVM model maximizes this distance.

Figure 4: ASM fitting on some images from the MBGC dataset
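The decision rule in (2) can be sketched in a few lines, assuming the M trained hyperplanes are available as a weight matrix `W` (one row per class) and a bias vector `b`; these names and the function `one_vs_rest_predict` are hypothetical.

```python
import numpy as np


def one_vs_rest_predict(X, W, b, classes):
    """Apply Eq. (2): for each row x of X, compute the signed distance
    d_i(x) = (w_i . x + b_i) / ||w_i|| to each of the M hyperplanes and pick
    the class n with the largest value. d_n(x) is returned as well so a
    caller can check the d_n(x) > 0 condition."""
    d = (X @ W.T + b) / np.linalg.norm(W, axis=1)   # shape (n_samples, M)
    n = np.argmax(d, axis=1)
    return np.asarray(classes)[n], d[np.arange(len(X)), n]
```

Dividing by ||w_i|| turns raw decision values into geometric distances, which keeps the M scores comparable when the weight vectors have different norms.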

3 Experimental results

Our first experiment benchmarked our implementation against a global PCA scheme, trained and tested on a subset of images from the still query set (10,687 frontal images of 570 subjects) and the still target set (24,042 frontal images of 466 subjects) of the MBGC-2008 database. Our training set consisted of 129 classes with 20 images per class, while the test set consisted of 94 classes with 5 images per class. Images in this dataset were of size 407 × 527, while the faces in the images were typically of size 300 × 400. ASM was run on all these images to obtain the required 79 facial landmarks. Sample ASM fitting results on the MBGC dataset are shown in Figure 4.

The first recognition method was a global PCA scheme in which the facial region was cropped from every image and resized to 300 × 300. For the training set, the entire facial region was used as the feature vector for an image, and PCA was used to reduce its dimensionality by projecting onto the eigenvectors corresponding to the eigenvalues that modeled 97% of the feature variance. Each training image was thus represented by 273 PCA coefficients, and by projecting onto the eigenvectors built during training, each test image was also represented by the same number of coefficients. Our implementation, on the same training and test sets, first isolated 25 × 25 local patches around 64 landmarks (numbered 16 to 79 in Figure 1). We neglected landmarks along the facial boundary, since patches around them would contain regions outside the face. A Gabor filter bank of 384 filters was then applied to each patch, so that each image was represented by a vector of length 384 × 64 = 24,576. PCA (modeling 97% of the variance) was then applied, after which each image was represented by 179 PCA coefficients. For both methods, identification rates and ROC curves were obtained from a similarity matrix computed (using the cosine distance between feature vectors) over all the training and test images. Thus both schemes used PCA for dimensionality reduction as well as for classification.
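The PCA reduction (keeping 97% of the variance) and the cosine-similarity matching can be sketched as below. The text does not spell out how identification rates are read off the similarity matrix, so this sketch assumes rank-1 nearest-neighbour identification with the training images as the gallery and the test images as probes; all names are hypothetical.

```python
import numpy as np


def fit_pca(X, var_kept=0.97):
    """Return the training mean and the leading principal axes whose
    eigenvalues account for at least `var_kept` of the total variance."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)   # eigenvalues are proportional to s**2
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, len(s))
    return mean, vt[:k]


def project(X, mean, axes):
    """PCA coefficients of each row of X."""
    return (X - mean) @ axes.T


def rank1_identification_rate(gallery, gallery_labels, probes, probe_labels):
    """Label each probe with the gallery entry of highest cosine similarity
    and return the fraction of probes labelled correctly."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    p = probes / np.linalg.norm(probes, axis=1, keepdims=True)
    sim = p @ g.T                                  # cosine similarity matrix
    predicted = np.asarray(gallery_labels)[np.argmax(sim, axis=1)]
    return float(np.mean(predicted == np.asarray(probe_labels)))
```

Computing PCA through the SVD of the centred data avoids forming the 24,576 × 24,576 covariance matrix that the local-patch features would otherwise require.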

To improve the identification rates, we then used an SVM as the classifier, implemented with the SVM Multi-Class library [13]. An SVM model (with a linear kernel) was built in a one-against-the-rest form, first for the global PCA coefficients and then for the PCA coefficients obtained from the filtered local patches. The identification rates obtained with these two methods on the same test set were computed and compared against those of the two earlier methods, which did not use an SVM.

Figure 5: Session 1 of the CMU Multi-PIE database (249 subjects, 15 views numbered 1 to 15, 20 illuminations, 2 expressions)

Figure 6: ROC curves (a) MBGC dataset (b) MPIE dataset

The same schemes were compared in a second experiment, using a training set of 249 classes with 30 images per class and a test set of 249 classes with 10 images per class. These were drawn from the frontal-view set (view 8 in Figure 5) of session 1 of the CMU Multi-PIE database, which contains 149,400 images of size 640 × 480 (with the face occupying approximately 250 × 300 pixels) of 249 subjects, across 15 views, 20 illuminations and 2 expressions (neutral and smiling), as shown in Figure 5. For this dataset, PCA alone reduced each image to 238 coefficients, while PCA applied to the Gabor responses of the patches produced 211 coefficients per image. Table 1 shows the identification rates obtained on the MBGC and MPIE datasets by the global PCA scheme, the scheme applying PCA to Gabor responses of local patches, an SVM classifier on the global PCA coefficients, and an SVM classifier on the local-patch PCA coefficients. Figure 6 compares the ROC curves obtained by the first two methods in each experiment.


Table 1: Identification rates (in %) obtained by 4 methods on the MBGC and MPIE datasets

Method                               MBGC Dataset   MPIE Dataset
Global PCA                           21.70          33.82
PCA + Filtered Local Patches         55.32          53.49
SVM + Global PCA                     71.70          74.45
SVM + PCA + Filtered Local Patches   79.36          98.19

A global PCA scheme performs poorly on both datasets (21.70% on MBGC and 33.82% on MPIE). Performance improves when PCA is applied to the Gabor responses of local patches and used as the classifier. However, the best results were obtained when an SVM was used as the classifier. This is expected, since PCA seldom functions well as a classifier; its role should be dimensionality reduction. As Table 1 shows, using an SVM significantly improves the performance of both the global PCA scheme and the local patch scheme. Crucially, our local approach outperforms the global approach both with and without an SVM. This indicates that using local patches and extracting features from them with Gabor filters is sound, and that with a better choice of Gabor filters and SVM parameters our method could perform even better on challenging databases.

4 Conclusions and future work

We have proposed a facial recognition method that handles illumination changes and in-plane rotations using a local rather than a global approach. Features are obtained by applying a Gabor filter bank to 2D patches around specific facial landmarks located by an accurate Active Shape Model. The subsequent use of PCA (for dimensionality reduction) and SVMs (for classification) has been shown to perform well on two challenging datasets. Benchmarked against a global PCA-based scheme, our implementation obtains far superior results.

Our results indicate that the approach is sound, but there is still scope for improved performance. We have not yet optimized the Gabor filters used for feature extraction, nor completed a study of the SVM parameters that yield the best classifier. Future work will address these areas, as well as the possibility of using Gabor jets for feature extraction instead of a Gabor filter bank. Another area worth investigating is the performance gain from weighting the features obtained for certain landmarks more heavily than others. For example, the ASM we use tends to fit eye and nose coordinates more accurately than other landmarks, so the features obtained from patches around them could be given more weight.
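One simple form the proposed landmark weighting could take is to scale each landmark's block of Gabor responses before PCA. This is a hypothetical sketch of future work, not something evaluated in this paper; the function name, the block size of 384 and the choice of weights (for example, larger values for eye and nose landmarks) are assumptions.

```python
import numpy as np


def weight_landmark_blocks(features, weights, block=384):
    """Scale each landmark's block of Gabor responses by its weight.

    features: (n_images, n_landmarks * block) array
    weights:  (n_landmarks,) array, one weight per landmark
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    blocks = features.reshape(len(features), len(weights), block)
    return (blocks * weights[None, :, None]).reshape(len(features), -1)
```

A weight of zero removes a landmark entirely, so the same function could also be used to test which landmarks are worth keeping.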

References

[1] Brendan Moore, Marshall Tappen, Hassan Foroosh, "Learning Face Appearance Under Different Lighting Conditions," Proceedings of the 3rd IEEE International Conference on Biometrics: Theory, Applications and Systems, September 2008.

[2] Saleh Aly, Alaa Sagheer, Naoyuki Tsuruta, Rin-ichiro Taniguchi, "Face recognition across illumination," The 12th International Symposium on Artificial Life and Robotics, January 2007.

[3] Stan Z. Li, RuFeng Chu, ShengCai Liao and Lun Zhang, "Illumination Invariant Face Recognition Using Near-Infrared Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 627-639, April 2007.

[4] A. Faro, D. Giordano, C. Spampinato, "An Automated Tool for Face Recognition using Visual Attention and Active Shape Models Analysis," Proceedings of the 28th IEEE EMBS Annual International Conference, September 2006.

[5] Bernd Heisele, Purdy Ho, Tomaso Poggio, "Face recognition: component-based versus global approaches," Computer Vision and Image Understanding, vol. 91, pp. 6-21, August 2003.


[6] Keshav Seshadri and Marios Savvides, "Robust Modified Active Shape Model for Automatic Facial Landmark Annotation of Frontal Faces," The 3rd IEEE International Conference on Biometrics: Theory, Applications and Systems, September 2009.

[7] P. Jonathon Phillips, Patrick J. Flynn, J. Ross Beveridge, W. Todd Scruggs, Alice J. O'Toole, David Bolme, Kevin W. Bowyer, Bruce A. Draper, Geof H. Givens, Yui Man Lui, Hassan Sahibzada, Joseph A. Scallan III and Samuel Weimer, "Overview of the Multiple Biometrics Grand Challenge," Proceedings of the 3rd IAPR/IEEE International Conference on Biometrics, June 2009.

[8] R. Gross, I. Matthews, J. Cohn, T. Kanade and S. Baker, "Multi-PIE," Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition, September 2008.

[9] J. C. Gower, "Generalized Procrustes Analysis," Psychometrika, vol. 40, no. 1, pp. 33-51, March 1975.

[10] Intel: Open Source Computer Vision Library, Intel, 2007.

[11] C. Liu and H. Wechsler, "Gabor Feature Based Classification Using the Enhanced Fisher Linear Discriminant Model for Face Recognition," IEEE Transactions on Image Processing, vol. 11, no. 4, pp. 467-476, April 2002.

[12] C. Nakajima, M. Pontil and T. Poggio, "People Recognition and Pose Estimation in Image Sequences," Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, vol. 4, pp. 4189-4195, July 2000.

[13] "SVMmulticlass: Multi-Class Support Vector Machine," http://svmlight.joachims.org/svm_multiclass.html.
