
Robust facial feature points extraction in color images


Engineering Applications of Artificial Intelligence 24 (2011) 195–200

Contents lists available at ScienceDirect

Engineering Applications of Artificial Intelligence

journal homepage: www.elsevier.com/locate/engappai

Robust facial feature points extraction in color images

Yue Zhou*, Yin Li, Zheng Wu, Meilin Ge

Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, P.R. China

Article history:
Received 14 July 2008
Received in revised form 19 July 2010
Accepted 2 September 2010
Available online 25 September 2010

Keywords:
Facial feature points extraction
Skin similarity model
Gabor feature
KNN
LDA

0952-1976/$ - see front matter Crown Copyright © 2010 Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.engappai.2010.09.001

* Corresponding author.
E-mail addresses: [email protected] (Y. Zhou), [email protected] (Y. Li), [email protected] (Z. Wu), [email protected] (M. Ge).

Abstract

A method of facial feature point extraction based on an improved active appearance model (AAM) with Gabor wavelet features is presented in this paper. After pre-processing by a standard face detector and lighting compensation, a hybrid AAM is proposed that combines local skin similarity with the original local grey-level appearance model. Moreover, the feature points found by the hybrid AAM and their neighbors are treated as a classification problem to further refine the results: the Gabor features around the feature points are extracted, reduced by linear discriminant analysis (LDA) and classified by K nearest neighbor (KNN) to give the precise locations of the feature points. Experimental results indicate that facial feature points can be located robustly and precisely by the proposed method.

Crown Copyright © 2010 Published by Elsevier Ltd. All rights reserved.

1. Introduction

Facial feature point extraction (Yuille et al., 1992) is one of the key techniques in face representation, and it is widely used in problems such as face recognition, pose estimation and 3D face reconstruction. With its roots in the point distribution model (PDM), the active appearance model (AAM) is one of the most popular methods for facial feature point extraction.

AAM learns the shape model together with the grey-level appearance model from the training set. The model can then be fitted to a human face image by iteratively changing the shape until it converges to the right location. However, the traditional AAM only deals with the local grey-level appearance, so it may get stuck in a local minimum due to variance in illumination or complex local structure such as facial wrinkles. Fortunately, in color images such variations have little impact on the facial skin chrominance, and the complex local structure can in turn provide additional information for the accurate location of the facial feature points.

To address these problems, a method of facial feature point extraction based on an improved AAM with Gabor wavelet features is presented in this paper. After pre-processing by a standard face detector and lighting compensation, a hybrid AAM is proposed that combines the local skin similarity with the original local grey-level appearance model. Moreover, the feature points found by the hybrid AAM are further refined via a classification


problem. To be more precise, the Gabor features around the feature points are extracted, reduced by linear discriminant analysis (LDA) and classified by K nearest neighbor (KNN) to give the accurate locations of the feature points.

Section 2 describes the pre-processing of the image. Section 3 presents the hybrid AAM. Feature point refinement based on Gabor jets is presented in Section 4. Experimental results are described in Section 5. Finally, Section 6 concludes the paper.

2. Pre-processing of the image

Pre-processing is necessary for the robust extraction of the facial feature points. To extract the feature points, locating the face in a real-world image is crucial. Moreover, lighting compensation is beneficial for the skin similarity model. As a consequence, the AdaBoost based face detector is applied first, followed by lighting compensation.

2.1. Face detection algorithm

Before the extraction of facial features, a face detector is needed to locate the face. Viola and Jones (2004) proposed the well-known AdaBoost algorithm, which handles the detection in an efficient and robust manner. Using Haar-like features, the algorithm combines multiple weak classifiers in a cascade and produces efficient and accurate results. Therefore, the authors adopt this algorithm for face detection in the system. A face detected by AdaBoost is shown in Fig. 1 as an example.


2.2. Lighting compensation

Complex lighting conditions have a negative effect on the estimation of the facial skin similarity probability. Standard lighting compensation uses a "reference white" to normalize the color appearance. Before applying the reference white, the image is processed by gamma correction:

new_pixel_value = old_pixel_value^(1/CG)    (1)

where CG is the gamma coefficient. Empirically, CG = 2.2222 for all images.

After gamma correction, pixels with the top 5% of the luminance values in the image are regarded as the reference white, but only if the number of such pixels is sufficiently large (>100 in our experiments at a resolution of 300×300). The R, G and B components of the color image are adjusted so that the average grey value of these reference-white pixels is linearly scaled to 255. The image is not corrected if the number of reference-white pixels is small (<100) or the color distribution is already close to the skin model described in Section 3.1. Fig. 3a and b demonstrates an example of the lighting compensation results. With lighting compensation, the proposed hybrid appearance model becomes more robust (see Fig. 3c and d).
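The compensation procedure described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name, the per-channel scaling and the skip condition are assumptions, and the skin-distribution check of Section 3.1 is omitted.

```python
import numpy as np

def lighting_compensation(img, gamma=2.2222, min_white_pixels=100):
    """Gamma-correct an RGB image and normalize it by 'reference white'.

    `img` is assumed to be a uint8 H x W x 3 RGB array.
    """
    # Gamma correction, Eq. (1): new_value = old_value ** (1 / CG) on [0, 1].
    corrected = (img.astype(np.float64) / 255.0) ** (1.0 / gamma) * 255.0

    # Reference white: the top 5% brightest pixels (by mean-channel luminance).
    luminance = corrected.mean(axis=2)
    threshold = np.percentile(luminance, 95)
    white_mask = luminance >= threshold

    # Skip the correction when too few reference-white pixels are found.
    if white_mask.sum() <= min_white_pixels:
        return corrected.astype(np.uint8)

    # Linearly scale each channel so the reference-white mean maps to 255.
    for c in range(3):
        mean_white = corrected[..., c][white_mask].mean()
        corrected[..., c] = np.clip(corrected[..., c] * (255.0 / mean_white),
                                    0, 255)
    return corrected.astype(np.uint8)
```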

3. Hybrid appearance model for AAM

Based on the point distribution model (PDM), AAM is one of the most popular algorithms for facial feature point extraction. Specifically, AAM trains a shape model with a local grey-value appearance model and fits the model to the human face image by iteratively changing the shape until it converges to the right location. In this section, a hybrid appearance model is presented that combines grey-value information with facial skin similarity.

Fig. 1. Face detection by Adaboost.

Fig. 2. Normalized distribution of skin in the CbCr subspace.


3.1. Facial skin similarity probability

Modeling skin color requires choosing an appropriate color space and identifying the cluster associated with skin color in that space. Recent studies assume that the chrominance components of the skin-tone color are independent of the luminance component, and the YCbCr color space is often used to build a skin chrominance model. The YCbCr space is also used in this paper to model the skin similarity. In addition, the skin model is learned from the SJTU face database, which contains 2274 face images of Asian people with manually marked feature points.

After transforming from the RGB space to the YCbCr space, facial skin patches (567,959 pixels) were collected from 100 face images in the SJTU image database (Du et al., 2006; Feng et al., 2006). The skin-tone cluster in the CbCr subspace is shown in Fig. 2. The cluster is approximated by a Gaussian distribution (Cootes and Taylor, 2004) with the parameters in Eqs. (2) and (3).

Mean = E[px]    (2)

Cov = E[(px − Mean)(px − Mean)^T]    (3)


Fig. 3. The effect of lighting compensation: (a) original RGB image; (b) a lighting compensated image; (c) skin similarity estimation of (a); and (d) skin similarity estimation of (b).

Fig. 4. Initial shape of AAM.


where px = [Cr, Cb]^T. The parameters are fitted to the samples by maximum likelihood estimation, giving

Mean = [140.9557, 115.2307]^T,
Cov = |  17.6018   −9.7822 |
      |  −9.7822   14.6607 |    (4)

Then the facial skin similarity probability P can be calculated from the Mahalanobis distance to the cluster center as

P = exp(−0.5 (px − Mean)^T Cov^−1 (px − Mean))    (5)

where P ranges over [0, 1]. The probability P is then linearly transformed to [0, 255], and a look-up table is pre-computed to speed up the mapping from the CbCr subspace to the facial skin similarity probability.
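Eqs. (2)–(5) and the look-up table can be sketched in NumPy as follows; the parameter values are those of Eq. (4), while the variable names and the LUT layout are illustrative assumptions.

```python
import numpy as np

# Skin model parameters from Eq. (4); px = [Cr, Cb].
MEAN = np.array([140.9557, 115.2307])
COV = np.array([[17.6018, -9.7822],
                [-9.7822, 14.6607]])
COV_INV = np.linalg.inv(COV)

def skin_similarity(cr, cb):
    """Facial skin similarity probability of Eq. (5) for one (Cr, Cb) pair."""
    d = np.array([cr, cb]) - MEAN
    return np.exp(-0.5 * d @ COV_INV @ d)

# Pre-compute a 256 x 256 look-up table over the CbCr subspace so that a
# whole image can be mapped to a [0, 255] probability map by indexing
# LUT[Cr, Cb] instead of evaluating Eq. (5) per pixel.
cr_grid, cb_grid = np.meshgrid(np.arange(256), np.arange(256), indexing="ij")
diff = np.stack([cr_grid - MEAN[0], cb_grid - MEAN[1]], axis=-1)
maha = np.einsum("...i,ij,...j->...", diff, COV_INV, diff)
LUT = (np.exp(-0.5 * maha) * 255).astype(np.uint8)
```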

3.2. Modeling shape variation

Suppose a set of shape vectors xi ∈ R^nd, aligned into a common coordinate frame, is given (Cootes et al., 1995). These vectors form a cloud of points in R^nd, i.e. a distribution in the nd-dimensional space. AAM applies principal component analysis (PCA) to the data: PCA computes the main axes of this cloud, leading to an approximation of the original points by a model with fewer than nd parameters. Namely, one can write

x = x̄ + Ps bs    (6)

where x̄ is the mean shape, Ps = (p0, p1, ..., p(t−1)) and pi is the eigenvector corresponding to eigenvalue λi (sorted in decreasing order):

bs = Ps^T (x − x̄)    (7)

The vector bs defines a set of parameters of a deformable model, and the shape varies with bs. Furthermore, each parameter bi is limited to ±3√λi, ensuring that the generated shape remains similar to the original training set.
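The shape model of Eqs. (6)–(7) can be sketched in NumPy as follows; the function names and the eigendecomposition route are assumptions (a sketch of the standard PDM training, not the authors' implementation).

```python
import numpy as np

def train_shape_model(shapes, t):
    """Fit the PCA shape model of Eqs. (6)-(7).

    `shapes` is an (N, nd) array of aligned shape vectors; `t` is the
    number of modes kept. Returns the mean shape, the eigenvector
    matrix Ps and the eigenvalues (largest first).
    """
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending order
    order = np.argsort(eigvals)[::-1][:t]         # keep the t largest modes
    return mean, eigvecs[:, order], eigvals[order]

def project(x, mean, Ps):
    """Eq. (7): shape parameters bs = Ps^T (x - mean)."""
    return Ps.T @ (x - mean)

def reconstruct(bs, mean, Ps, eigvals):
    """Eq. (6), with each parameter clamped to +/- 3*sqrt(lambda_i)."""
    limit = 3.0 * np.sqrt(eigvals)
    bs = np.clip(bs, -limit, limit)
    return mean + Ps @ bs
```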

3.3. Local hybrid appearance model

Most of the facial feature points lie on the boundary between skin and non-skin regions; thus, the skin similarity probability can serve as a cue to infer their locations. In this section, a hybrid AAM combining the local skin similarity with the original local grey-level appearance model is presented and discussed.

Suppose that, for a given feature point i, n pixels on either side of the ith point are sampled along the profile in the kth training image. Then, for each point, np = 2n + 1 samples are collected, forming a vector hki = (hki0, ..., hki(np−1))^T (we set n = 30). The local grey-level appearance is represented by the gradient of the corresponding pixels in the vector hki, i.e. dhki = (hki1 − hki0, ..., hki(np−1) − hki(np−2))^T. The gradient is further normalized by dividing by the sum of absolute values:

gki = dhki / Σ_{q=0}^{np−2} |hki(q+1) − hkiq|    (8)

The local skin probability gradient vector pki is calculated in the same way as the local grey gradient vector; the only difference is that the vector hki is sampled from the grey-level image, whereas lki is sampled from the skin similarity probability image:

pki = dlki / Σ_{q=0}^{np−2} |lki(q+1) − lkiq|    (9)

A reasonable approach is to combine gki with pki into a new feature vector cki = (gki ; Wp pki), where Wp is a diagonal matrix of weights for the pki parameters, allowing flexibility in balancing gki against pki (we set Wp = I in the experiments). The feature cki is called the local hybrid appearance model. It combines local grey-level information with skin similarity, leading to a more robust solution.

The feature cki is approximated by a Gaussian distribution with mean and covariance

c̄i = (1/N) Σ_{k=1}^{N} cki    (10)

Covi = (1/N) Σ_{k=1}^{N} (cki − c̄i)(cki − c̄i)^T    (11)

The distance of a new sample cui to the model is given by

d = (cui − c̄i)^T Covi^−1 (cui − c̄i)    (12)

This is the Mahalanobis distance of the sample cui from the model mean c̄i. Minimizing d is equivalent to maximizing the likelihood of cui; thus, the feature points are placed where the distance to the model is minimal.
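Building the hybrid feature of Eqs. (8)–(9) and scoring a candidate with Eq. (12) might be sketched as follows; the function names, the `wp` scalar in place of the diagonal matrix Wp, and the small epsilon guarding against a zero gradient sum are assumptions.

```python
import numpy as np

def profile_feature(grey_profile, skin_profile, wp=1.0):
    """Hybrid appearance vector c = (g ; Wp p) from Eqs. (8)-(9).

    `grey_profile` and `skin_profile` are the 2n+1 samples taken along a
    profile from the grey-level and skin-probability images, respectively.
    """
    def norm_gradient(h):
        dh = np.diff(h)                        # h[q+1] - h[q]
        return dh / (np.abs(dh).sum() + 1e-12) # normalization of Eq. (8)/(9)
    g = norm_gradient(np.asarray(grey_profile, dtype=float))
    p = norm_gradient(np.asarray(skin_profile, dtype=float))
    return np.concatenate([g, wp * p])         # Wp = wp * I here

def mahalanobis(c, mean, cov_inv):
    """Eq. (12): distance of a candidate feature vector to the model."""
    d = c - mean
    return d @ cov_inv @ d
```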

3.4. Initial shape parameters setting

Hsu et al. (2002) proposed an algorithm that quickly locates the eyes and mouth according to their chrominance information. For better initialization, the feature points of the eyes and mouth in the average shape are adapted to fit these locations by updating the shape parameters bs. With pose parameters (θ, s, t), the initial shape is set as X = M(s, θ)(x̄ + Ps bs) + t. An illustration of the initialization is shown in Fig. 4.

3.5. Feature points searching

The optimal feature points are found according to the local hybrid appearance model: the optimal solution consists of the points with the minimum distance defined by the hybrid appearance model. The solution is iteratively refined by updating the parameters (θ, s, t, bs) to best fit the found shape, applying the limits of ±3√λi to the parameters bs, and then reconstructing a new shape. The procedure is repeated until convergence.
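The shape-constraining step of this loop might be sketched as follows. This is a deliberately simplified illustration: the similarity pose transform M(s, θ) and the per-iteration re-search of local candidates via Eq. (12) are omitted, and all names are assumptions.

```python
import numpy as np

def fit_shape(points, mean, Ps, eigvals, n_iter=20):
    """Project candidate points onto the shape model and clamp (Section 3.5).

    `points` is an (n_points, 2) array of the best local candidates; `mean`,
    `Ps` and `eigvals` are the trained shape model of Eqs. (6)-(7).
    """
    x = points.reshape(-1)                   # flatten to (x0, y0, x1, y1, ...)
    limit = 3.0 * np.sqrt(eigvals)
    for _ in range(n_iter):
        bs = Ps.T @ (x - mean)               # Eq. (7)
        bs = np.clip(bs, -limit, limit)      # +/- 3*sqrt(lambda_i) constraint
        x_new = mean + Ps @ bs               # Eq. (6)
        if np.allclose(x_new, x):            # converged
            break
        x = x_new
    return x.reshape(points.shape)
```

In the full algorithm this projection would alternate with a fresh candidate search along each profile, so convergence takes several iterations rather than one.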


4. Gabor jet based precise adjustment

After the hybrid AAM, the feature points on local edges are located well, but feature points at corners, such as eye, nose and mouth points, cannot be located precisely. This is because AAM searches only along the normal direction, which can lead to inaccurate positions. To address this problem, a Gabor jet classification based search algorithm is presented. The algorithm searches in the 2D domain and refines the locations of the feature points.

4.1. Gabor jets and comparing jets

Gabor wavelets are biologically motivated convolution kernels with the shape of plane waves restricted by a Gaussian envelope function. The set of convolution coefficients for kernels of different orientations and frequencies at one image pixel is called a jet. The definition of jets and the similarity functions between jets are given in this section.

A jet describes a small patch of an image I(x) around a given pixel x = (x, y). It is based on a wavelet transform, defined as a convolution:

Jj(x) = ∫ I(x′) ψj(x − x′) d²x′    (13)

where ψj(x) is a family of Gabor kernels with the shape of plane waves with wave vector kj, restricted by a Gaussian envelope function:

ψj(x) = (kj²/σ²) exp(−kj²x²/(2σ²)) [exp(i kj·x) − exp(−σ²/2)]    (14)

kj = (kjx, kjy) = (kv cos φμ, kv sin φμ),  kv = 2^(−(v+2)/2) π,  φμ = μπ/8    (15)

A discrete set of 5 frequencies, denoted v = 0, ..., 4, and 8 orientations, denoted μ = 0, ..., 7, is employed, with the index j = μ + 8v (Fig. 5).

A jet is defined as the set of 40 complex coefficients obtained at the pixel. It can be written as

Jj = aj exp(iφj)    (16)

with magnitude aj(x) and phase φj(x). Wiskott et al. (1999) defined a phase-ignoring similarity function Sa(J, J′) and a phase-sensitive similarity function Sφ(J, J′) for jets:

Sa(J, J′) = Σj aj a′j / sqrt(Σj aj² · Σj a′j²)    (17)

Sφ(J, J′) = Σj aj a′j cos(φj − φ′j − d·kj) / sqrt(Σj aj² · Σj a′j²)    (18)

where d is the estimated displacement between the two jets' positions.
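The kernels of Eqs. (14)–(15), the jet of Eq. (16) and the similarity of Eq. (17) can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the kernel size and σ = 2π are conventional choices, and the jet is computed as a correlation, which matches the convolution of Eq. (13) up to complex conjugation (the magnitudes are identical).

```python
import numpy as np

def gabor_kernel(v, mu, size=33, sigma=2 * np.pi):
    """Gabor kernel of Eq. (14) for frequency index v and orientation mu."""
    kv = 2.0 ** (-(v + 2) / 2.0) * np.pi          # Eq. (15)
    phi = mu * np.pi / 8.0
    kx, ky = kv * np.cos(phi), kv * np.sin(phi)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    sq = kv ** 2 * (x ** 2 + y ** 2)
    envelope = (kv ** 2 / sigma ** 2) * np.exp(-sq / (2 * sigma ** 2))
    # The -exp(-sigma^2/2) term removes the DC component of the plane wave.
    wave = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * wave

def jet(image, px, py, kernels):
    """40-coefficient jet of Eq. (16) at pixel (px, py), as a correlation."""
    half = kernels[0].shape[0] // 2
    patch = image[py - half:py + half + 1, px - half:px + half + 1]
    return np.array([np.sum(patch * k) for k in kernels])

def jet_similarity(J1, J2):
    """Phase-ignoring similarity S_a of Eq. (17), using jet magnitudes."""
    a1, a2 = np.abs(J1), np.abs(J2)
    return (a1 * a2).sum() / np.sqrt((a1 ** 2).sum() * (a2 ** 2).sum())

# Index j = mu + 8v, matching Eq. (15): 5 frequencies x 8 orientations.
kernels = [gabor_kernel(v, mu) for v in range(5) for mu in range(8)]
```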

Fig. 5. Gabor kernels of 5 different frequencies and 8 different orientations.

4.2. Dimension reduction and classification of feature points' Gabor jets

Previous approaches compare the current feature point's Gabor jet with a standard feature point's Gabor jet via the aforementioned similarity functions and choose the most similar point as the optimal one. However, it is difficult to decide which point should serve as the standard feature point. In this section, an LDA and KNN based method is proposed to classify the points' Gabor jets and choose the optimal point.

4.2.1. LDA training

First, the face images in the training set are pre-processed by normalization and lighting compensation. The 9 points around each hand-marked feature point are labeled as 9 classes in all training images (Fig. 6), and corresponding points across the training images are regarded as the same class. The Gabor features of these points are extracted for classification.

Linear discriminant analysis (Belhumeur et al., 1997) is a class-specific method: it selects a projection matrix that maximizes the ratio of the between-class scatter to the within-class scatter. For LDA, the usual choice for the dimension of the reduced space is c − 1, where c is the number of classes. The LDA method therefore projects the aforementioned Gabor features into an 8-dimensional space.

4.2.2. KNN classification

In the low-dimensional space, the Gabor features of points of the same class form clusters, so they can be classified easily by the KNN method (Vapnik, 1995). The algorithm proposed in this paper needs only a training set rather than specially chosen standard feature points; thus, it is more robust than the similarity-function approach.
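The LDA projection and KNN vote might be sketched as follows in pure NumPy (a library implementation would normally be used). The pseudo-inverse of the within-class scatter and the k = 5 vote are assumptions; the c − 1 = 8 reduced dimensions follow the text.

```python
import numpy as np

def lda_fit(X, y, n_components):
    """Projection maximizing between-class over within-class scatter."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))  # within-class scatter
    Sb = np.zeros_like(Sw)                   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
    # Solve the generalized eigenproblem Sb w = lambda Sw w.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real            # projection matrix W

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training samples."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dist)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]
```

A candidate point whose reduced Gabor feature is assigned to class 1 is then taken as the refined feature point, as described in Section 4.3.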

4.3. Feature points search

The extracted points of the hybrid AAM are treated as the initial shape and classified by their Gabor features. Feature points of different classes have different search domains (see Fig. 7). Since most feature points have a relatively high gradient, searching only the high-gradient points in the domain for feature point candidates improves efficiency. A point labeled as class 1 is taken as the feature point; if there are no class 1 points, the point closest to class 1 is chosen instead. After the refinement, the shape is constrained by the shape model and iteratively updated until it converges to the optimal location.

Class 3 Class 2 Class 9

Class 4 Class 1 Class 8

Class 5 Class 6 Class 7

Fig. 6. 9 classes around eye’s corner feature point.

Fig. 7. Different searching domain of different initial points.

Fig. 8. Visual comparison between improved AAM and traditional AAM: (a) failure result using traditional AAM and (b) the corresponding result using improved AAM.

Table 1. Quantitative comparison between hybrid AAM and traditional AAM.

Method            Failure rate (%)   Mean error to hand-marked points
Hybrid AAM        2.33               4.29 pixels
Traditional AAM   4.83               6.57 pixels

Fig. 9. Gabor jet based refinement for eyes' and mouth's feature points: (1) traditional AAM, (2) improved AAM and (3) after Gabor jet refinement.


5. Experimental results

The experiment is conducted on the SJTU face database. 1000 color face images of Asian people were collected under roughly the same lighting conditions, and the facial feature points were marked manually; each image contains 60 facial feature points. The data set is split into a training set of 400 images and a testing set of 600 images.

5.1. Comparison between traditional AAM and hybrid AAM

The authors present a comparison between the hybrid AAM and the traditional AAM on the SJTU face database. The face images are resized to a resolution of 300×300. An extraction failure is counted when the difference between the hand-marked points and the points extracted by AAM exceeds 10 pixels.

Fig. 8 shows a visual comparison of the results, and a quantitative comparison is given in Table 1. With the hybrid AAM, the extraction failure rate drops from 4.83% to 2.33% compared with the traditional AAM, and the mean error is 4.29 pixels, more than 30% more accurate than the traditional method's mean error of 6.57 pixels. The results show that the hybrid AAM outperforms the traditional AAM.

Therefore, it is reasonable to state that the hybrid AAM is more effective and robust than the traditional AAM. The authors attribute the improvement to the use of skin chrominance information in color face images.

5.2. Gabor jet based refinement

Finally, the Gabor jet based refinement further improves the mean error of the feature points to 2.3 pixels. Fig. 9 illustrates the results of the whole method: the feature points from the traditional AAM, from the hybrid AAM, and after the Gabor jet adjustment. It is easy to see that the refinement leads to a better solution.

6. Conclusion

The paper focuses on the robust extraction of facial feature points based on the AAM model. After pre-processing with a standard face detector and lighting compensation, a hybrid AAM is proposed that combines local skin similarity with the original local grey-level appearance model. Moreover, the feature points from the hybrid AAM and their neighbors are treated as a classification problem to further refine the results. Experimental results indicate that facial feature points can be located robustly and precisely by the proposed method.

Acknowledgement

This research has been supported by the National Natural Science Foundation of China (No. 60772097).


References

Yuille, A.L., Hallinan, P.W., Cohen, D.S., 1992. Feature extraction from faces using deformable templates. International Journal of Computer Vision 8 (2), 99–111.

Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J., 1997. Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7), 711–720.

Cootes, T.F., Taylor, C.J., 2004. Statistical models of appearance for computer vision.

Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J., 1995. Active shape models—their training and application. Computer Vision and Image Understanding 61 (1), 38–59.

Du, C., Yang, J., Wu, Q., Zhang, T., Wang, H., Chen, L., Wu, Z., 2006. Extended fitting methods of active shape model for the location of facial feature points. ICVGIP, 610–618.

Feng, L., Yao, L., Yang, J., Ge, X., 2006. The face detection in color images with complex environments. Journal of Shanghai Jiaotong University 40 (5).

Hsu, R.-L., Abdel-Mottaleb, M., Jain, A.K., 2002. Face detection in color images. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (5), 696–706.

Vapnik, V.N., 1995. The Nature of Statistical Learning Theory. Springer, New York.

Viola, P., Jones, M.J., 2004. Robust real-time face detection. International Journal of Computer Vision 57 (2), 137–154.

Wiskott, L., Fellous, J.-M., Krüger, N., von der Malsburg, C., 1999. Face recognition by elastic bunch graph matching. In: Jain, L.C. (Ed.), Intelligent Biometric Techniques in Fingerprint and Face Recognition. CRC Press, pp. 355–396 (Chapter 11).