Engineering Applications of Artificial Intelligence 24 (2011) 195–200
0952-1976/$ - see front matter Crown Copyright © 2010 Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.engappai.2010.09.001
Corresponding author: Y. Zhou. E-mail addresses: [email protected] (Y. Zhou), [email protected] (Y. Li), [email protected] (Z. Wu), [email protected] (M. Ge).
Robust facial feature points extraction in color images
Yue Zhou*, Yin Li, Zheng Wu, Meilin Ge
Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, P.R. China
Article info
Article history:
Received 14 July 2008
Received in revised form
19 July 2010
Accepted 2 September 2010
Available online 25 September 2010
Keywords:
Facial feature points extraction
Skin similarity model
Gabor feature
KNN
LDA
Abstract

A method for facial feature point extraction based on an improved active appearance model (AAM) with Gabor wavelet features is presented in this paper. After pre-processing with a standard face detector and lighting compensation, a hybrid AAM is proposed that combines the local skin similarity with the original local grey-level appearance model. Moreover, the feature points found by the hybrid AAM and their neighbors are treated as a classification problem to further refine the results. Specifically, Gabor features around the feature points are extracted, projected by linear discriminant analysis (LDA), and classified by K-nearest neighbor (KNN) to give the precise location of the feature points. Experimental results indicate that facial feature points can be located robustly and precisely by the proposed method.
Crown Copyright © 2010 Published by Elsevier Ltd. All rights reserved.
1. Introduction
Facial feature point extraction (Yuille et al., 1992) is one of the key techniques in face representation, and it is widely used in problems such as face recognition, pose estimation and 3D face reconstruction. With its root in the point distribution model (PDM), the active appearance model (AAM) is one of the most popular methods for facial feature point extraction.
AAM learns a shape model together with a grey-level appearance model from the training set. The model can then be fitted to a face image by iteratively changing the shape until it converges to the right location. However, the traditional AAM uses only the local grey-level appearance, so it may get stuck in a local minimum due to illumination variance or complex local structure such as facial wrinkles. Fortunately, in color images such variations have little impact on the facial skin chrominance, and the complex local structure can in turn provide additional information for accurately locating the facial feature points.
To address these problems, a method for facial feature point extraction based on an improved AAM with Gabor wavelet features is presented in this paper. After pre-processing with a standard face detector and lighting compensation, a hybrid AAM is proposed that combines the local skin similarity with the original local grey-level appearance model. Moreover, the feature points found by the hybrid AAM are further refined by a classification
problem. More precisely, Gabor features around the feature points are extracted, projected by linear discriminant analysis (LDA), and classified by K-nearest neighbor (KNN) to give the accurate location of the feature points.
Section 2 describes the pre-processing of the image. Section 3 presents the hybrid AAM. Feature point refinement based on Gabor jets is presented in Section 4. Experimental results are described in Section 5. Finally, Section 6 concludes the paper.
2. Pre-processing of the image
Pre-processing is necessary for robust extraction of the facial feature points. To extract the feature points, the location of the face in a real-world image is crucial. Moreover, lighting compensation is beneficial for the skin similarity model. As a consequence, an Adaboost-based face detector is applied first, followed by lighting compensation.
2.1. Face detection algorithm
Before extracting facial features, a face detector is needed to locate the face. Viola and Jones (2004) proposed the well-known Adaboost algorithm, which handles detection in an efficient and robust manner. Using Haar-like features, the algorithm combines multiple weak classifiers in a cascade and produces efficient and accurate results. Therefore, the authors use this algorithm for face detection in the system. A face detected by Adaboost is shown in Fig. 1 as an example.
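The cascade structure used by the Viola–Jones detector can be sketched in a few lines. This is an illustrative toy, not a trained detector: the stage thresholds, weights and feature functions below are placeholders, and a real detector evaluates Haar-like features over integral images.

```python
def cascade_classify(window, stages):
    """Evaluate a candidate window against a cascade of boosted stages.

    stages: list of (weak_classifiers, stage_threshold), where each weak
    classifier is a (feature_fn, weight) pair. A window is rejected as soon
    as one stage's weighted vote falls below its threshold, so most
    non-face windows exit after only a few cheap feature evaluations.
    """
    for weak_classifiers, stage_threshold in stages:
        score = sum(weight * feature_fn(window)
                    for feature_fn, weight in weak_classifiers)
        if score < stage_threshold:
            return False  # early rejection by this stage
    return True  # survived every stage: classified as a face
```

The early-exit structure is what makes the detector efficient: the vast majority of windows are discarded by the first, cheapest stages.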
2.2. Lighting compensation
Complex lighting conditions have a negative effect on the estimation of the facial skin similarity probability. Standard lighting compensation uses a "reference white" to normalize the color appearance. Before applying the reference white, the image is processed by gamma correction:
new_pixel_value = old_pixel_value^(1/CG)   (1)

where CG is the gamma coefficient. Empirically, CG = 2.2222 for all images.
After gamma correction, pixels with the top 5% of illumination values in the image are regarded as the reference white, provided that the number of such pixels is sufficiently large (>100 in our experiment at a resolution of 300×300). The R, G and B components of a color image are adjusted so that the average grey value of these reference-white pixels is linearly scaled to 255. The image is not corrected if the number of reference-white pixels is small (<100) or the color distribution is already close to the skin model described in Section 3.1. Fig. 3a and b demonstrates an example of the lighting compensation results. With lighting compensation, the proposed hybrid appearance model becomes more robust (see Fig. 3c and d).
3. Hybrid appearance model for AAM
Based on the point distribution model (PDM), AAM is one of the most popular algorithms for facial feature point extraction. Specifically, AAM trains a shape model with a local grey-value appearance model and fits the model to a face image by iteratively changing the shape until it converges to the right location. In this section, a hybrid appearance model which
Fig. 1. Face detection by Adaboost.
Fig. 2. Normalized distribution of skin in the CbCr subspace.
combines grey-value information with facial skin similarity is presented.
3.1. Facial skin similarity probability
Modeling skin color requires choosing an appropriate color space and identifying the cluster associated with skin color in that space. Recent studies assume that the chrominance components of the skin-tone color are independent of the luminance component, and the YCbCr color space is often used to build a skin chrominance model. The YCbCr space is also used in this paper to model the skin similarity. In addition, the skin model is learned from the SJTU face database, which contains face images of 2274 Asian people with feature points marked manually.
After transforming from the RGB space to the YCbCr space, facial skin patches (567,959 pixels in total) were collected from 100 face images in the SJTU image database (Du et al., 2006; Feng et al., 2006). The skin-tone cluster in the CbCr subspace is shown in Fig. 2. The cluster is approximated by a Gaussian distribution (Cootes and Taylor, 2004) with parameters given in (2) and (3):
Mean = E[px]   (2)

Cov = E[(px − Mean)(px − Mean)^T]   (3)
Fig. 3. The effect of lighting compensation: (a) original RGB image; (b) a lighting-compensated image; (c) skin similarity estimation of (a); and (d) skin similarity estimation of (b).
Fig. 4. Initial shape of AAM.
where px = [Cr, Cb]^T. The parameters are fitted to the samples by maximum likelihood estimation and are shown in (4):

Mean = [140.9557, 115.2307],   Cov = [ 17.6018  −9.7822 ; −9.7822  14.6607 ]   (4)
Then, the facial skin similarity probability P can be calculated from the Mahalanobis distance to the cluster center, as shown in (5):

P = exp(−0.5 (px − Mean)^T Cov^(−1) (px − Mean))   (5)

where P ranges over [0,1]. The probability P is then linearly transformed to [0,255], and a look-up table is pre-computed to speed up the mapping from the CbCr subspace to the facial skin similarity probability.
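The skin similarity of Eq. (5) and the pre-computed look-up table can be sketched as follows, using the fitted parameters of Eq. (4). The 256×256 table size assumes 8-bit Cr and Cb values.

```python
import numpy as np

# Gaussian skin model fitted in the paper (Eq. 4), with px = [Cr, Cb].
MEAN = np.array([140.9557, 115.2307])
COV = np.array([[17.6018, -9.7822],
                [-9.7822, 14.6607]])
COV_INV = np.linalg.inv(COV)

def skin_similarity(cr, cb):
    """Eq. (5): P = exp(-0.5 (px-Mean)^T Cov^-1 (px-Mean)), in [0, 1]."""
    d = np.array([cr, cb]) - MEAN
    return float(np.exp(-0.5 * d @ COV_INV @ d))

# Look-up table from 8-bit (Cr, Cb) to P linearly scaled to [0, 255],
# so the per-pixel probability image is a single table lookup.
cr_grid, cb_grid = np.meshgrid(np.arange(256), np.arange(256), indexing="ij")
d = np.stack([cr_grid - MEAN[0], cb_grid - MEAN[1]], axis=-1)
maha = np.einsum("...i,ij,...j->...", d, COV_INV, d)
LUT = (255.0 * np.exp(-0.5 * maha)).astype(np.uint8)
```

With the table, the skin probability image of Fig. 3c/d is simply `LUT[Cr, Cb]` evaluated per pixel.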
3.2. Modeling shape variation
Suppose a set of points x_i ∈ R^(nd), aligned into a common image coordinate frame, is given (Cootes et al., 1995). These vectors form a cloud of points in R^(nd), i.e. a distribution in the nd-dimensional space. AAM applies principal component analysis (PCA) to the data. PCA computes the main axes of this cloud, leading to an approximation of the original points with a model of fewer than nd parameters. Namely, one gets

x = x̄ + Ps bs   (6)

where x̄ is the mean shape, Ps = (p0, p1, p2, …, p_(t−1)) and p_i is the eigenvector corresponding to eigenvalue λ_i (sorted in decreasing order):

bs = Ps^T (x − x̄)   (7)
The vector bs defines a set of parameters of a deformable model; the shape varies with bs. Furthermore, the limit of each parameter b_i is set to ±3√λ_i, ensuring that the generated shape remains similar to those in the original training set.
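The shape model of Eqs. (6)–(7) with the ±3√λ_i limits can be sketched in NumPy. Retaining enough modes to explain 98% of the variance is an assumption for illustration; the paper does not state its truncation criterion.

```python
import numpy as np

def train_shape_model(shapes, var_kept=0.98):
    """shapes: (N, nd) array of aligned landmark vectors.
    Returns the mean shape, eigenvector matrix Ps and eigenvalues."""
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]              # decreasing eigenvalues
    evals, evecs = evals[order], evecs[:, order]
    t = np.searchsorted(np.cumsum(evals) / evals.sum(), var_kept) + 1
    return mean, evecs[:, :t], evals[:t]

def project(x, mean, Ps, evals):
    """Eq. (7): bs = Ps^T (x - mean), clamped to +/- 3 sqrt(lambda_i)."""
    b = Ps.T @ (x - mean)
    lim = 3.0 * np.sqrt(evals)
    return np.clip(b, -lim, lim)

def reconstruct(b, mean, Ps):
    """Eq. (6): x = mean + Ps b."""
    return mean + Ps @ b
```

Projecting a shape and reconstructing it gives the closest plausible shape under the model, which is how the shape constraint is enforced during search.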
3.3. Local hybrid appearance model
Most facial feature points lie on the boundary between skin and non-skin regions. Thus, the skin similarity probability can be a cue to infer their locations. In this section, a hybrid AAM combining the local skin similarity with the original local grey-level appearance model is presented and discussed.
Suppose that for a given feature point i, n pixels on either side of the ith point in the kth training image are sampled. Then, for each point, np = 2n + 1 samples are collected, forming a vector h_ki = (h_ki0, …, h_ki(np−1))^T (we set n = 30). The local grey-level appearance is represented by the gradient of the corresponding pixels in h_ki, i.e. dh_ki = (h_ki1 − h_ki0, …, h_ki(np−1) − h_ki(np−2))^T. The gradient is further normalized by dividing by the sum of absolute values:

g_ki = dh_ki / Σ_(q=0)^(np−2) |h_ki(q+1) − h_kiq|   (8)
The local skin probability gradient vector p_ki is calculated in the same way as the local grey gradient vector; the difference is that h_ki is sampled from the grey-level image, while l_ki is sampled from the skin similarity probability image:

p_ki = dl_ki / Σ_(q=0)^(np−2) |l_ki(q+1) − l_kiq|   (9)
A reasonable approach is to combine g_ki with p_ki into a new feature vector c_ki = (g_ki ; Wp p_ki), where Wp is a diagonal matrix of weights for each parameter of p_ki, allowing more flexibility between g_ki and p_ki (we set Wp = I in the experiments). The feature c_ki is called the local hybrid appearance model. It combines local grey-level information with skin similarity, leading to a more robust solution.
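The construction of the hybrid profile feature in Eqs. (8)–(9) and the stacked vector c_ki can be sketched as follows, given the two sampled profiles.

```python
import numpy as np

def normalized_gradient(profile):
    """Eqs. (8)/(9): difference the sampled profile, then divide by the
    sum of absolute differences."""
    d = np.diff(profile.astype(float))
    s = np.abs(d).sum()
    return d / s if s > 0 else d

def hybrid_feature(grey_profile, skin_profile, Wp=None):
    """c_ki = (g_ki ; Wp p_ki): grey-level and skin-probability gradients
    stacked into one vector (Wp = I in the paper's experiments)."""
    g = normalized_gradient(grey_profile)
    p = normalized_gradient(skin_profile)
    if Wp is not None:
        p = Wp @ p
    return np.concatenate([g, p])
```

Each profile here is the vector of np = 2n + 1 values sampled along the normal at a landmark, one from the grey-level image and one from the skin probability image.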
For each feature point i, the features c_ki over the N training images are approximated by a Gaussian distribution with mean and covariance

c̄_i = (1/N) Σ_(k=1)^(N) c_ki   (10)

Cov_i = (1/N) Σ_(k=1)^(N) (c_ki − c̄_i)(c_ki − c̄_i)^T   (11)
The distance of a new sample c_ui to the model is given by

d = (c_ui − c̄_i)^T Cov_i^(−1) (c_ui − c̄_i)   (12)

This is the Mahalanobis distance of the sample c_ui from the model mean c̄_i. Minimizing d is equivalent to maximizing the likelihood of c_ui; thus, the optimal feature points are those with the minimum distance to the model.
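The per-landmark model of Eqs. (10)–(12) amounts to fitting a mean and covariance to the training features and scoring candidates by Mahalanobis distance, which might be sketched as:

```python
import numpy as np

def fit_point_model(C):
    """C: (N, m) hybrid features of one landmark over N training images.
    Returns the mean (Eq. 10) and covariance (Eq. 11)."""
    mean = C.mean(axis=0)
    cov = (C - mean).T @ (C - mean) / len(C)
    return mean, cov

def mahalanobis(c, mean, cov):
    """Eq. (12): d = (c - mean)^T Cov^-1 (c - mean); the candidate point
    minimizing d along the search profile is chosen during fitting."""
    diff = c - mean
    return float(diff @ np.linalg.solve(cov, diff))
```

In practice a regularization term is often added to the covariance before inversion when N is small relative to the feature dimension; the paper does not discuss this, so it is omitted here.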
3.4. Initial shape parameters setting
Hsu et al. (2002) propose an algorithm that quickly locates the eyes and mouth according to their chrominance information. For better initialization, the feature points of the eyes and mouth in the average shape are adapted to fit these locations by updating the shape parameters bs. With pose parameters (θ, s, t), the initial shape is set as X = M(s, θ)(x̄ + Ps bs) + t. An illustration of the initialization is shown in Fig. 4.
3.5. Feature points searching
The optimal feature points are found according to the local hybrid appearance model: they are the points with the minimum distance defined by the model. The solution is iteratively refined by updating the parameters (θ, s, t, bs) to best fit the found shape, applying the limits of ±3√λ_i to the parameters bs, and then reconstructing a new shape. The procedure is repeated until convergence.
4. Gabor jet based precise adjustment
After the hybrid AAM, feature points on local edges are located well, but feature points at corners, such as eye, nose and mouth points, cannot be located precisely. This is because AAM searches only along the normal direction, which can lead to inaccurate positions. To address the problem, a Gabor jet classification based search algorithm is presented. The algorithm searches in the 2D domain and refines the locations of the feature points.
4.1. Gabor jets and comparing jets
Gabor wavelets are biologically motivated convolution kernels shaped as plane waves restricted by a Gaussian envelope function. The set of convolution coefficients for kernels of different orientations and frequencies at one image pixel is called a jet. This section defines jets and the similarity functions between them.
A jet describes a small patch of an image I(x) around a given pixel x = (x, y). It is based on a wavelet transform, defined as a convolution:

J_j(x) = ∫ I(x′) ψ_j(x − x′) d²x′   (13)
where ψ_j(x) is a family of Gabor kernels shaped as plane waves with wave vector k_j, restricted by a Gaussian envelope function:

ψ_j(x) = (k_j²/σ²) exp(−k_j² x² / (2σ²)) [exp(i k_j · x) − exp(−σ²/2)]   (14)

k_j = (k_jx, k_jy) = (k_v cos φ_μ, k_v sin φ_μ),  k_v = 2^(−(v+2)/2) π,  φ_μ = μπ/8   (15)
A discrete set of 5 frequencies, denoted by v = 0, …, 4, and 8 orientations, denoted by μ = 0, …, 7, is employed, with index j = μ + 8v (Fig. 5).
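The kernel bank of Eqs. (14)–(15) and the jet extraction of Eq. (13) might be sketched as follows. The 33×33 spatial support and the use of correlation with the conjugate kernel (rather than a strict convolution) are implementation assumptions common to Gabor-jet code, not details given in the paper.

```python
import numpy as np

def gabor_kernel(v, mu, sigma=2 * np.pi, size=33):
    """Eqs. (14)-(15): a plane wave with wave vector k_j under a Gaussian
    envelope, with a DC-compensation term subtracted."""
    k = 2 ** (-(v + 2) / 2) * np.pi              # k_v
    phi = mu * np.pi / 8                         # orientation phi_mu
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2
    envelope = (k ** 2 / sigma ** 2) * np.exp(-k ** 2 * r2 / (2 * sigma ** 2))
    wave = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * wave

def gabor_jet(image, px, py, kernels):
    """Eq. (13): the complex responses of all kernels at pixel (px, py),
    computed here as a local correlation with each conjugated kernel."""
    jet = []
    for ker in kernels:
        half = ker.shape[0] // 2
        patch = image[py - half:py + half + 1, px - half:px + half + 1]
        jet.append(np.sum(patch * np.conj(ker)))
    return np.array(jet)

# 5 frequencies x 8 orientations = 40 coefficients per jet, index j = mu + 8v.
KERNELS = [gabor_kernel(v, mu) for v in range(5) for mu in range(8)]
```

A production implementation would evaluate the jets for all pixels at once via FFT-based convolution; the per-pixel form above matches the definition most directly.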
A jet is defined as the set of 40 complex coefficients obtained for the pixel. It can be written as

J_j = a_j exp(iφ_j)   (16)
with magnitude a_j(x) and phase φ_j(x). Wiskott et al. (1999) defined a phase-ignored similarity function S_a(J, J′) and a phase-sensitive similarity function S_φ(J, J′) for jets:
S_a(J, J′) = Σ_j a_j a′_j / sqrt(Σ_j a_j² · Σ_j a′_j²)   (17)

S_φ(J, J′) = Σ_j a_j a′_j cos(φ_j − φ′_j − d · k_j) / sqrt(Σ_j a_j² · Σ_j a′_j²)   (18)
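The two similarity functions of Eqs. (17)–(18) translate directly into code; d is an estimated displacement between the two jets' locations and k_j the per-coefficient wave vectors (d = 0 when the jets are taken at the same position).

```python
import numpy as np

def similarity_magnitude(J1, J2):
    """Eq. (17): phase-ignored cosine similarity of the jet magnitudes."""
    a1, a2 = np.abs(J1), np.abs(J2)
    return float(a1 @ a2 / np.sqrt((a1 @ a1) * (a2 @ a2)))

def similarity_phase(J1, J2, d=None, ks=None):
    """Eq. (18): phase-sensitive similarity. ks is the (40, 2) array of
    wave vectors k_j and d the displacement estimate; both optional."""
    a1, a2 = np.abs(J1), np.abs(J2)
    dphi = np.angle(J1) - np.angle(J2)
    if ks is not None and d is not None:
        dphi = dphi - ks @ d          # phase shift predicted by displacement
    num = np.sum(a1 * a2 * np.cos(dphi))
    return float(num / np.sqrt((a1 @ a1) * (a2 @ a2)))
```

Both functions return values in [−1, 1], with 1 for identical jets; the phase-sensitive variant is far more localized, which is why it is preferred for precise positioning.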
Fig. 5. Gabor kernels of 5 different frequencies and 8 different orientations.
4.2. Dimension reduction and classification of feature points' Gabor jets
Previous approaches compare the current feature point's Gabor jet with a standard feature point's Gabor jet via the aforementioned similarity functions and choose the most similar point as the optimal one. However, it is difficult to decide which point should serve as the standard feature point. In this section, an LDA- and KNN-based method is proposed to classify the points' Gabor jets and choose the optimal point.
4.2.1. LDA training
First, the face images in the training set are pre-processed by normalization and lighting compensation. In all training images, the 9 points around each hand-marked feature point are labeled as 9 classes (Fig. 6), and corresponding points across the training images are regarded as the same class. Gabor features of these points are extracted for classification.
Linear discriminant analysis (Belhumeur et al., 1997) is a class-specific method. It selects a projection matrix that maximizes the ratio of the between-class scatter to the within-class scatter. For LDA, the empirical choice for the dimension of the reduced space is c − 1, where c is the number of classes. The LDA method therefore projects the aforementioned Gabor features into an 8-dimensional space.
4.2.2. KNN classification
After projection, Gabor features of points from the same class cluster in a low-dimensional space. They can be classified easily using the KNN method (Vapnik, 1995). The algorithm proposed in this paper needs only a training set rather than specially chosen standard feature points; thus, it is more robust than the similarity functions.
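The LDA projection and KNN vote of Sections 4.2.1–4.2.2 can be sketched compactly; with c = 9 classes, n_components would be 8. This is an illustrative re-implementation, not the authors' code, and the pseudo-inverse used to solve the generalized eigenproblem is an assumption for numerical convenience.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Fisher LDA: find directions maximizing between-class scatter Sb
    relative to within-class scatter Sw (at most c-1 of them)."""
    mean = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    # Generalized eigenproblem Sb w = lambda Sw w via pinv(Sw) @ Sb.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:n_components]].real

def knn_predict(query, X_train, y_train, k=5):
    """Label of the query = majority vote among its k nearest neighbours."""
    dist = np.linalg.norm(X_train - query, axis=1)
    nearest = y_train[np.argsort(dist)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]
```

At search time, each candidate point's Gabor jet is projected by the learned matrix and its class predicted by the vote; class 1 marks the true feature point location.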
4.3. Feature points search
The points extracted by the hybrid AAM are treated as the initial shape and classified by their Gabor features. Feature points of different classes have different searching domains (see Fig. 7). Since most feature points have a relatively high gradient, restricting the search to high-gradient candidate points within the domain improves efficiency. A point labeled as class 1 is taken as the feature point; if there are no class 1 points, the point closest to class 1 is chosen instead. After the refinement, the shape is constrained by the shape model and iteratively updated until it converges to the optimal location.
Fig. 6. The 9 classes around an eye corner feature point, arranged in a 3×3 grid with class 1 at the center (top row: classes 3, 2, 9; middle row: 4, 1, 8; bottom row: 5, 6, 7).
Fig. 7. Different searching domain of different initial points.
Fig. 8. Visual comparison between the improved AAM and the traditional AAM: (a) failure result using the traditional AAM and (b) the corresponding result using the improved AAM.
Table 1. Quantitative comparison between the hybrid AAM and the traditional AAM.

Method            Failure rate (%)   Mean error to hand-marked points
Hybrid AAM        2.33               4.29 pixels
Traditional AAM   4.83               6.57 pixels
Fig. 9. Gabor jet based refinement for the eyes' and mouth's feature points: (1) traditional AAM, (2) improved AAM and (3) after Gabor jet refinement.
5. Experimental results
The experiments are conducted on the SJTU face database: 1000 color face images of Asian people collected under roughly the same lighting conditions, with the facial feature points marked manually. Each image contains 60 facial feature points. The data set is split into a training set of 400 images and a testing set of 600 images.
5.1. Comparison between traditional AAM and hybrid AAM
The authors compare the hybrid AAM with the traditional AAM on the SJTU face database. The face images are resized to a resolution of 300×300. An extraction failure occurs if the difference between the hand-marked points and the points extracted by AAM exceeds 10 pixels.
Fig. 8 shows a visual comparison of the results, and a quantitative comparison is given in Table 1. With the hybrid AAM, the extraction failure rate drops from 4.83% to 2.33% compared with the traditional AAM. The mean error is 4.29 pixels, more than 30% lower than the traditional method's mean error of 6.57 pixels. The results show that the hybrid AAM outperforms the traditional AAM.
Therefore, it is reasonable to state that the hybrid AAM is more effective and robust than the traditional AAM. The authors attribute the improvement to the use of skin chrominance information in color face images.
5.2. Gabor jet based refinement
Finally, the Gabor jet based refinement further reduces the mean error of the feature points to 2.3 pixels. Fig. 9 illustrates the results of the whole pipeline: the feature points from the traditional AAM, from the hybrid AAM, and after the adjustment. It is easy to see that the refinement leads to a better solution.
6. Conclusion
This paper has focused on the robust extraction of facial feature points based on the AAM. After pre-processing with a standard face detector and lighting compensation, a hybrid AAM was proposed by combining the local skin similarity with the original local grey-level appearance model. Moreover, the feature points found by the hybrid AAM and their neighbors were treated as a classification problem to further refine the results. Experimental results indicate that facial feature points can be located robustly and precisely by the proposed method.
Acknowledgement
This research has been supported by the National Natural Science Foundation of China (No. 60772097).
References
Yuille, A.L., Hallinan, P.W., Cohen, D.S., 1992. Feature extraction from faces using deformable templates. International Journal of Computer Vision 8 (2), 99–111.
Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J., 1997. Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7), 711–720.
Cootes, T.F., Taylor, C.J., 2004. Statistical models of appearance for computer vision.
Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J., 1995. Active shape models—their training and application. Computer Vision and Image Understanding 61 (1), 38–59.
Du, C., Yang, J., Wu, Q., Zhang, T., Wang, H., Chen, L., Wu, Z., 2006. Extended fitting methods of active shape model for the location of facial feature points. ICVGIP, 610–618.
Feng, L., Yao, L., Yang, J., Ge, X., 2006. The face detection in color images with complex environments. Journal of Shanghai Jiaotong University 40 (5).
Hsu, R.-L., Abdel-Mottaleb, M., Jain, A.K., 2002. Face detection in color images. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (5), 696–706.
Vapnik, V.N., 1995. The Nature of Statistical Learning Theory. Springer, New York.
Viola, P., Jones, M.J., 2004. Robust real-time face detection. International Journal of Computer Vision 57 (2), 137–154.
Wiskott, L., Fellous, J.-M., Krüger, N., von der Malsburg, C., 1999. Face recognition by elastic bunch graph matching. In: Jain, L.C. (Ed.), Intelligent Biometric Techniques in Fingerprint and Face Recognition. CRC Press, pp. 355–396 (Chapter 11).