
Pattern Recognition Letters 32 (2011) 1948–1955

Contents lists available at SciVerse ScienceDirect

Pattern Recognition Letters

journal homepage: www.elsevier.com/locate/patrec

Cross-pose face recognition based on partial least squares

Annan Li a,b,⇑, Shiguang Shan a, Xilin Chen a, Wen Gao c,a

a Key Lab of Intelligent Information Processing of Chinese Academy of Sciences, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
b Graduate University of Chinese Academy of Sciences, Beijing 100190, China
c Institute of Digital Media, Peking University, Beijing 100871, China


Article history:
Received 30 December 2010
Available online 24 August 2011
Communicated by A.R. Alimi

Keywords:
Face recognition
Cross-pose face recognition
Partial least squares

0167-8655/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.patrec.2011.07.020

⇑ Corresponding author at: Key Lab of Intelligent Information Processing of Chinese Academy of Sciences, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

E-mail address: [email protected] (A. Li).

The pose problem is one of the bottlenecks for face recognition. In this paper we propose a novel cross-pose face recognition method based on partial least squares (PLS). By training on coupled face images of the same identities across two different poses, PLS maximizes the squares of the intra-individual correlations. Therefore, it leads to improvements in recognizing faces across pose differences. The experimental results demonstrate the effectiveness of the proposed method.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Automatic face recognition systems can achieve high performance for recognizing faces under frontal view. However, in real-world applications face images are captured under various viewpoints. Recognition performance drops significantly when the poses of the query and target faces differ (Zhao et al., 2003). Thus, recognizing faces across pose differences remains a challenging problem.

The difficulty of cross-pose face recognition is that pose varies in 3D space, while the image captures only 2D appearances. This leads to a peculiar phenomenon: faces of different identities under similar viewpoints can be more similar than faces of the same identity under different viewpoints (see Fig. 1). This phenomenon explains why many popular face recognition methods, such as ‘‘eigenfaces’’ (Turk and Pentland, 1991) and ‘‘fisherfaces’’ (Belhumeur et al., 1997), fail when confronted with large pose variations. Because pose variations separate the faces into several different subspaces, it is not surprising that directly comparing two vectors from different subspaces leads to poor performance in such methods.

In statistical learning based face recognition approaches, faces are usually represented by linear subspaces. As shown in Fig. 1, it is difficult to represent faces across different poses by a single linear subspace. Therefore, it is better to divide the faces into several pose-specific subspaces. The remaining problem is then how to recognize faces across two different linear subspaces. In this paper, we propose to use partial least squares (PLS) (Rosipal and Kramer, 2006) to recognize faces across poses. Since PLS maximizes the squared correlation between two different subspaces, it is a proper tool for the problem of cross-pose face recognition.

The remainder of this paper is organized as follows: Section 2 gives a brief review of cross-pose face recognition methods. Section 3 describes the proposed PLS based cross-pose face recognition approach. Section 4 presents the experimental results, and Section 5 states our conclusion.

2. Related works

Pose change is a challenging problem in face recognition research. To overcome this problem, a number of methods have been proposed during the past decade; a related literature survey can be found in (Zhang and Gao, 2009). In our opinion, these methods can be divided into three categories: geometric approaches, statistical approaches and hybrid approaches.

Owing to the tight connection between pose variation and 3D shape, pursuing 3D alignment is a natural way to tackle the pose problem.

For the tight connection between the pose variation and 3Dshape, pursuing 3D alignment is a natural way to tackle the poseproblem. The 3D morphable model (3DMM) proposed by Blanzand Vetter (2003) is considered the state of the art in this type ofapproaches. In their approach, input 2D face images are fitted tothe morphable 3D face model. Thus, faces are represented by thesemodels in 3D space. Face recognition is performed by using therepresentation coefficients in 3D space or the transforming non-frontal face images to frontal view (Blanz et al., 2005). Althoughthe performance is good, the 3DMM also have some limits. Forexample, the computational complexity of 3DMM is very high.Besides that generating 2D appearances from 3D shapes needsproper estimation of illumination model. The point model usedin the 3DMM is too simple to represent the illumination variationsin real-world application. To overcome this problem, Zhang andSamaras (2006) improves the 3DMM by using the spherical har-monic illumination model. In essence, reconstructing 3D shape is


Fig. 1. Pose variations lead to a special phenomenon: the difference between two faces of different persons under similar viewpoints (a) can be smaller than that between faces of the same person under different viewpoints (b).


to achieve dense, pixel-level alignment. Besides 3D face reconstruction, some sparse alignment based approaches have also been proposed for recognizing faces across pose, for example, the elastic bunch graph matching method proposed by Wiskott et al. (1997) and the multi-view active appearance models proposed by Cootes et al. (2002).

Besides the geometric approaches, statistical learning is an alternative way to recognize faces across poses. A typical statistical approach is the ‘‘eigen light-fields’’ method proposed by Gross et al. (2004). The ‘‘light-fields’’ are complete appearance models including all possible pose variations. Ideally the ‘‘light-fields’’ can cover all pose variations like 3D face models; thus, they constitute a pose-free face representation. A test image can be viewed as a part of the corresponding ‘‘light-field’’, and the missing parts are estimated from the available parts. By performing principal component analysis on the ‘‘light-fields’’, the method becomes the ‘‘eigen light-fields’’. Cross-pose face recognition is performed by comparing the coefficients of the ‘‘eigen light-fields’’. The tied factor analysis method proposed by Prince et al. (2008) is another typical statistical approach. In this method faces are represented across pose differences by models similar to the ‘‘light-fields’’. The key difference is that they use factor analysis instead of principal component analysis. Specifically, ‘‘tied factors’’ across pose differences are learned using the Expectation–Maximization algorithm. Face recognition is then performed based on a probabilistic distance metric built on the factor loadings.

Besides building pose-invariant statistical models, statistically transforming faces or features from one pose to another is another way to tackle the pose problem. Sanderson et al. (2006) extended the gallery set by transforming the frontal face model to non-frontal views. Lee and Kim (2006) constructed feature spaces for each pose using kernel principal component analysis, and then transformed non-frontal faces to frontal through the feature spaces. Different from the foregoing two methods, which transform holistic faces, Chai et al. (2007) performed linear regression on local patches for virtual frontal view synthesis. In their approach the face is divided into image patches. For a non-frontal face image, each patch predicts a virtual frontal image patch, and a virtual frontal face is synthesized by overlapping these virtual frontal patches.

Although theoretically reasonable, both geometric and statistical approaches have fundamental limits in application. For geometric approaches, reconstructing a 3D object from a single 2D image is an ill-posed problem. As to statistical approaches, it is always hard to collect enough training data to cover all the variations of the face. To improve performance, hybrid approaches that combine geometric alignment and statistical learning have recently been proposed.

One way to integrate geometric and statistical information is to combine local statistical models with coarse geometric alignment. Kanade and Yamada (2003) proposed a probabilistic framework for integrating local models for cross-pose face recognition; the statistical models are built on local patches in their approach. Based on this framework, Liu and Chen (2005) used a simple 3D ellipsoid model and Li et al. (2009) used a generic 3D face model to align patches across different poses, respectively. Ashraf et al. (2008) aligned the patches by learning a 2D affine transform for each patch pair via a Lucas–Kanade like optimization procedure (Lucas and Kanade, 1981). Different from the above methods, which concentrate on matching local patches, Lucey and Chen (2006) extended Kanade and Yamada's work by building similar statistical models between holistic non-frontal faces and local patches on the frontal face.

Besides combining local statistical models with coarse geometric alignment, embedding geometric information into the feature vectors is another way to integrate geometric and statistical information. Specifically, in this kind of method faces are represented by a set of augmented feature vectors, each consisting of two parts that describe the visual appearance and the spatial location respectively. Zhao and Gao (2009) sampled the augmented feature vectors through key point detection, and used a modified Hausdorff distance to measure the similarity between two faces. Wright and Hua (2009) densely sampled the augmented feature vectors on the face, and quantized them into histograms via random projection trees. Face recognition is then performed by matching the histograms. Benefiting from the dense sampling and quantization, the matching is spatially elastic to some extent, so the alignment problem is implicitly alleviated.

It should be pointed out that we also extend the proposed method with coarse geometric alignment and local feature representations. Therefore the proposed method can also be viewed as a hybrid approach.

3. Cross-pose face recognition based on partial least squares

In this section we describe partial least squares and how it leads to pose invariance when trained on coupled face data, and we propose a PLS based cross-pose face recognition method. Local Gabor features have proved successful in face recognition (Liu and Wechsler, 2002). To explore the potential of the proposed model and evaluate it under relatively fair experimental conditions, an enhancement with coarse alignment and local Gabor features is also proposed in this section.

3.1. Cross-pose face recognition using partial least squares

Partial least squares is a classical statistical learning method widely used in chemometrics, bioinformatics, etc. (Rosipal and Kramer, 2006). In recent years, it has also been applied to face recognition (Baek and Kim, 2004; Štruc and Pavešic, 2009) and human detection (Schwartz et al., 2009). PLS avoids the small sample size problem of linear discriminant analysis (LDA) and is therefore used as an alternative to LDA. It should be pointed out that the problem we study is quite different from that in (Baek and Kim, 2004; Štruc and Pavešic, 2009): in this paper, PLS is used to reduce the influence of pose.

PLS models the relations between two sets of variables by means of score vectors. ‘‘The underlying assumption of all PLS methods is that the observed data is generated by a system or process which is driven by a small number of latent variables’’ (Rosipal and Kramer, 2006).
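This latent-variable assumption can be made concrete with a small simulation. The following sketch is our own illustration, not from the paper; the dimensions, latent rank and noise level are arbitrary assumptions. It generates two observed blocks from a shared low-dimensional latent signal, exactly the structure that the decomposition in Eq. (1) recovers.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, N, p = 200, 50, 40, 3  # samples, dims of X and Y, latent dims (assumed)

T = rng.normal(size=(n, p))             # shared latent scores driving both blocks
P = rng.normal(size=(M, p))             # loadings for X
Q = rng.normal(size=(N, p))             # loadings for Y
X = T @ P.T + 0.1 * rng.normal(size=(n, M))   # X = T P^T + E
Y = T @ Q.T + 0.1 * rng.normal(size=(n, N))   # Y = U Q^T + F (here U = T)

# Center the data, as PLS assumes zero-mean variables
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)
```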

Denote n observed samples from the two sets of variables as X = (x_1, x_2, ..., x_n)^T and Y = (y_1, y_2, ..., y_n)^T respectively, with x_i ∈ R^M and y_i ∈ R^N, so that X is an n × M matrix and Y is an n × N matrix. Both X and Y are normalized to zero mean. PLS decomposes X and Y into the form

X = T P^T + E,
Y = U Q^T + F,          (1)

where T and U are n × p matrices of the p extracted score vectors or latent vectors. The M × p matrix P and the N × p matrix Q


Fig. 2. The illustration of PLS based cross-pose face recognition. (a) In the training phase PLS decomposition is performed on the coupled faces across pose differences. (b) In testing, corresponding score vectors of the input faces are estimated using the loading vectors obtained in the training phase. Recognition is performed by comparing these score vectors.

Fig. 3. The correlation distribution in original feature vectors (a) and score vectors (b).

Fig. 4. Gabor magnitude features of 5 scales and 8 orientations are extracted from the patches centered at the fiducial points.


are the matrices of loadings, and E and F are the matrices of residuals, of size n × M and n × N respectively. The optimization goal of PLS is to find weight vectors w and c such that

[cov(t, u)]^2 = [cov(Xw, Yc)]^2 = max_{|r|=|s|=1} [cov(Xr, Ys)]^2,          (2)

where t and u are the corresponding score vectors in T and U, and cov(a, b) = a^T b / n denotes the covariance. In other words, PLS maximizes the squares of the covariance between the score vectors T and U; it directly models the relations between X and Y.

In this paper, we propose that an effective cross-pose face recognition method can be built by combining PLS with faces coupled across pose differences. As shown in Fig. 2, the method includes two parts. In the training phase, we first construct a coupled training set. Denote the training set as (X, Y), where X and Y correspond to two different poses and the samples in X and Y are coupled by identity. PLS is performed on the coupled training set to obtain the optimized loading matrices P and Q. In the testing phase, the loading matrices are used to estimate score vectors for new samples, and cross-pose face recognition is performed by comparing the estimated score vectors. The process of recognition is shown in Fig. 2.

In PLS training the original sample vectors are decomposed into two parts. The optimization goal of PLS implies that the squares of the covariance between the scores are maximized. Because the training faces are coupled by identity, the squares of the intra-individual correlations between score vectors are maximized by the PLS. In Fig. 3, we plot the correlation distributions for the original feature vectors and their corresponding score vectors estimated by PLS. As can be seen, the distributions of the same and different identities are better separated after PLS modeling. The score vectors are thus a good pose-invariant representation of faces, and cross-pose face recognition can be performed by directly matching them.

Denote a gallery face as x_test and a probe face as y_test. Based on Eq. (1), their corresponding score vectors t and u can be estimated using the loading matrices P and Q:

t = x_test (P^T)^{-1},
u = y_test (Q^T)^{-1}.          (3)

The similarity s between x_test and y_test can be simply measured by the correlation between t and u:

s = ⟨t, u⟩ / (‖t‖ · ‖u‖),          (4)

where ⟨a, b⟩ denotes the dot product of a and b. In recognition, a probe face is simply assigned to the gallery face with the highest s value. The testing process of this recognition approach is shown in Fig. 2(b). There are different methods to obtain the solution of Eq. (2).

Because the nonlinear iterative partial least squares (NIPALS) algorithm (Rosipal and Kramer, 2006) is computationally efficient, we use it to solve the PLS problem in this paper.
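A minimal sketch of the NIPALS iteration and the matching of Eqs. (3) and (4) is given below. This is our own illustrative implementation, not the authors' code; in particular, since P^T is not square in general, the Moore–Penrose pseudo-inverse stands in for (P^T)^{-1}.

```python
import numpy as np

def nipals_pls(X, Y, p, n_iter=100, tol=1e-10):
    """Extract p pairs of score/loading vectors with NIPALS.
    X: (n, M), Y: (n, N), both zero-mean."""
    X, Y = X.copy(), Y.copy()
    T, U, P, Q = [], [], [], []
    for _ in range(p):
        u = Y[:, [0]]                        # initialize a score vector
        for _ in range(n_iter):
            w = X.T @ u; w /= np.linalg.norm(w)
            t = X @ w
            c = Y.T @ t; c /= np.linalg.norm(c)
            u_new = Y @ c
            if np.linalg.norm(u_new - u) < tol:
                u = u_new; break
            u = u_new
        p_vec = X.T @ t / (t.T @ t)          # loading vectors
        q_vec = Y.T @ u / (u.T @ u)
        X -= t @ p_vec.T                     # deflate before next component
        Y -= u @ q_vec.T
        T.append(t); U.append(u); P.append(p_vec); Q.append(q_vec)
    return np.hstack(T), np.hstack(U), np.hstack(P), np.hstack(Q)

def match_score(x_test, y_test, P, Q):
    """Eqs. (3)-(4): estimate scores with pinv(P^T), pinv(Q^T), then correlate."""
    t = x_test @ np.linalg.pinv(P.T)
    u = y_test @ np.linalg.pinv(Q.T)
    return float(t @ u / (np.linalg.norm(t) * np.linalg.norm(u)))

# Illustrative usage on synthetic coupled data (sizes are arbitrary)
rng = np.random.default_rng(0)
lat = rng.normal(size=(60, 4))
X = lat @ rng.normal(size=(30, 4)).T + 0.05 * rng.normal(size=(60, 30))
Y = lat @ rng.normal(size=(20, 4)).T + 0.05 * rng.normal(size=(60, 20))
X -= X.mean(0); Y -= Y.mean(0)
T, U, P, Q = nipals_pls(X, Y, p=4)
s = match_score(X[0], Y[0], P, Q)    # correlation in [-1, 1]
```

In a real cross-pose setting, rows of X and Y would be vectorized faces of the same identities under two poses, and a probe would be assigned to the gallery face with the highest score.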

3.2. Extension with coarse alignment and local feature representations

As described in the previous section, integrating geometric alignment with statistical models is a better way to recognize faces across poses than purely statistical approaches. To explore the potential of the proposed method, we enhanced our approach with coarse geometric alignment. As shown in Fig. 4, the alignment is obtained by using landmarks on the face, such as the eye centers and the tip of the nose. Based on these aligned landmarks, we first extract visual features from the local region centered at each landmark. Since Gabor features have proved successful in face recognition (Liu and Wechsler, 2002), we apply them for visual feature representation: Gabor magnitude features of 5 scales and 8 orientations are extracted from the patches centered at the fiducial points.

After feature extraction we built a PLS based classifier on each pair of corresponding landmarks. Therefore we have several


Fig. 7. Five facial landmarks used in the experiments.


independent local models for representing the appearance of faces. There are two different ways to utilize these models: concatenating them into a holistic feature vector, or using them independently. In this paper we chose to use the models independently to reduce the computational cost. The final classification decision is derived from the fusion of these classifiers; we adopted the approach proposed by Kanade and Yamada (2003) for integrating the outputs of the classifiers.

As shown in Eq. (4), in each PLS based classifier the similarity between two faces is measured by the correlation between the projections. For the i-th classifier, we denote the correlation as s_i. Its conditional probability densities under the same and different identities are denoted as P(s_i|same) and P(s_i|dif) respectively, and each is approximated by a Gaussian distribution (see Fig. 5). Accordingly,

P(s_i|same) = 1/(√(2π) σ_i^same) exp[−(1/2) ((s_i − μ_i^same)/σ_i^same)^2],
P(s_i|dif) = 1/(√(2π) σ_i^dif) exp[−(1/2) ((s_i − μ_i^dif)/σ_i^dif)^2],          (5)

where μ^same and μ^dif, σ^same and σ^dif are the corresponding means and standard deviations. This probabilistic modeling process is shown in Fig. 5. From the Bayes rule, the posterior probability that two faces belong to the same identity based on the i-th classifier is given by

P(same|s_i) = P(s_i|same) P(same) / [P(s_i|same) P(same) + P(s_i|dif) P(dif)].          (6)

Here P(same) is the prior probability that the two faces have the same identity, while P(dif) denotes that of different identities. Since there is no prior knowledge, we assume the two are equally likely and set P(same) = P(dif) = 0.5. Each classifier thus yields a probability, and the total similarity between two faces is measured by the sum of these probability values:

S_total = Σ_{i=1}^{k} P(same|s_i),          (7)

where k is the total number of classifiers. In recognition, a probe face is assigned to the gallery face with the highest S_total value. In this

Fig. 5. The process of probabilistic modeling. (a) The correlation matrix between gallery and probe faces. (b) The histogram of correlation for same and different identity. (c) The Gaussian fit of (b).

Fig. 6. Examples of normalized faces of the Multi-PIE (top row) and PIE (bottom row) databases.

fusion approach, the parameters of the Gaussian distributions for each classifier are unknown. We estimate these parameters by performing cross validation on the training set.
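The fusion of Eqs. (5)–(7) can be sketched as follows. This is our own minimal illustration, not the authors' code, and the Gaussian parameters in the usage example are invented for demonstration rather than estimated by cross validation.

```python
import numpy as np

def gaussian_pdf(s, mu, sigma):
    """Eq. (5): Gaussian likelihood of a correlation score."""
    return np.exp(-0.5 * ((s - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def posterior_same(s, mu_same, sig_same, mu_dif, sig_dif,
                   p_same=0.5, p_dif=0.5):
    """Eq. (6): posterior probability that the two faces share an identity."""
    num = gaussian_pdf(s, mu_same, sig_same) * p_same
    den = num + gaussian_pdf(s, mu_dif, sig_dif) * p_dif
    return num / den

def total_similarity(scores, params):
    """Eq. (7): sum the posteriors over the k local classifiers.
    scores: per-landmark correlations s_i;
    params: per-classifier (mu_same, sig_same, mu_dif, sig_dif) tuples."""
    return sum(posterior_same(s, *p) for s, p in zip(scores, params))

# Two hypothetical local classifiers with invented Gaussian parameters
params = [(0.8, 0.1, 0.2, 0.2), (0.7, 0.15, 0.1, 0.25)]
S = total_similarity([0.85, 0.75], params)   # high correlations -> large S_total
```

In recognition, a probe would be assigned to the gallery face with the highest S value, exactly as described for S_total above.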

4. Experiments

To evaluate the proposed methods, we conducted face recognition experiments of recognizing an individual across pose differences with a single enrolled image on the CMU PIE (Sim et al., 2003) and Multi-PIE (Gross et al., 2010) databases. In the experiments shown in Sections 4.1 and 4.2, the poses of the gallery and probe images are assumed to be known. We evaluate the robustness of the proposed method to inaccurate pose estimation in Section 4.3.

4.1. Experiments on the Multi-PIE database

The Multi-PIE database is the latest released multi-pose database. It contains more samples and pose variations than PIE. We used the data of session 1 in the Multi-PIE database with normal illumination and expression, which contains faces of 249 subjects in 15 poses, with 1 image per subject in each pose. The yaw angle difference between neighboring poses is about 15°. In the experiments 100 and 149 subjects are used for training and testing respectively.

In this experiment, we test the proposed method with holistic and local feature representations. The face images are normalized according to three key points, i.e., the two eye centers and the mouth



Fig. 8. Experimental results on the Multi-PIE (top row) and PIE (bottom row) databases. Brighter color denotes higher recognition accuracy. The results shown in matrices (a) and (c) are obtained using holistic features; those in matrices (b) and (d) are based on the local feature representation.


center, respectively. The holistic features are directly sampled from the normalized holistic face regions. Some examples of normalized face regions are shown in Fig. 6. In the experiments the holistic face regions are down-sampled to 45 × 56; thus, the length of the holistic feature vector is 2520.

Besides the holistic features, local Gabor features are also used in the experiments. The Gabor features are sampled from image patches centered at the corresponding fiducial points. As shown in Fig. 7, we used five fiducial points, i.e., the two eye centers, the two mouth corners and the tip of the nose, which are manually labeled in the experiments. We use 40 Gabor filters corresponding to 5 scales and 8 orientations. For each patch, we obtain 40 magnitude images of filter responses. To reduce the dimension of the feature vectors, each magnitude image is down-sampled to 7 × 7. Therefore the total length of the Gabor feature vector is 1960.

When using holistic features, a PLS based classifier is built for each pair of gallery and probe poses. For local features, we build five independent classifiers, each corresponding to a landmark. The five classifiers are integrated based


Fig. 9. Experimental results with frontal gallery faces. Top: recognition rate (%) on Multi-PIE for each probe pose (08_1 through 19_1), comparing PLS – Holistic, PLS – Local, Eigenfaces and Fisherfaces. Bottom: recognition rate (%) on PIE for each probe pose (c22 through c34), comparing PLS – Holistic, PLS – Local, ELF, and Kanade and Yamada.

Table 1
Performance comparison on the CMU PIE database.

Work                             Pose diff. (°)        Rec. rate (%)
ELF (Gross et al., 2004)         22.5/45/90            93/88/39
Chai et al. (2007)               22.5/45               98.5/89.7
3DMM (Blanz and Vetter, 2003)    front/side/profile    99.8/97.8/79.5
Kanade and Yamada (2003)         22.5/45/90            100/100/47
TFA (Prince et al., 2008)        22.5/90               100/91
Our method                       22.5/45/90            100/100/76.47


on the probabilistic modeling approach proposed in the foregoing section. When combining multiple classifiers, the parameters of the Gaussian distributions of the same and different identities need to be determined; in the experiments they are estimated by twofold cross validation on the training set.

We plot the matrices of recognition results (all poses against all poses) with holistic and local features in Fig. 8(a) and (b). The average recognition rates with holistic and local features are 66.65% and 75.93% respectively. A performance comparison under the frontal gallery pose is illustrated in Fig. 9(a). The recognition results of the eigenfaces method (Turk and Pentland, 1991) are also shown as a baseline. As can be seen, when both use holistic features, PLS outperforms eigenfaces. For further comparison, we also give the results of fisherfaces (Belhumeur et al., 1997) in Fig. 9(a). Fisherfaces recognizes faces via class separation; the proposed method does not use any class label information, yet its performance is comparable to fisherfaces. We also show the performance of PLS using local Gabor features: the enhancement with coarse geometric alignment and local Gabor features considerably improves the recognition performance. These results show that the proposed method is effective for cross-pose face recognition.

1 In Fig. 9 and Table 1 we directly used the experimental results reported in the corresponding papers.

4.2. Experiments on the PIE database

Although Multi-PIE is a better benchmark than PIE, previous works did not perform experiments on it. For comparison with previous works, we also conducted experiments on the PIE database. The PIE database contains 68 people whose images were captured in 13 poses with yaw and pitch angle differences; the yaw angle difference between neighboring poses is about 22.5°. In the experiments one half of the data is used for training while the remaining part is used for testing. We again use 5 manually labeled fiducial points.

The experimental results with holistic and local features are given in Fig. 8(c) and (d) respectively. The average recognition rates with holistic and local features are 81.44% and 89.05%. In the experiments on PIE we use 34 subjects for the recognition test, while on Multi-PIE 149 subjects are used. Fewer testing subjects make the experiments on PIE ‘‘easier’’ than those on Multi-PIE; therefore, the results obtained on PIE are better.

In Fig. 9(b) we show the results and comparisons with the frontal gallery pose.1 When both use holistic features, our method outperforms the ELF method. The performance of our method with holistic features is lower than that reported in (Kanade and Yamada, 2003), which is a local patch based approach, but when local Gabor features are used, the performance of our method is better.

To compare with the results reported in (Prince et al., 2008), a performance comparison for three probe views (c05, c11, c22) on the PIE database is given in Table 1. To our knowledge, the 3DMM (Blanz and Vetter, 2003) and the TFA (Prince et al., 2008) are the state of the art in cross-pose face recognition. It should be pointed out that the implementation details differ considerably, so the comparison in Table 1 is empirical. For example, in the experiments of 3DMM and TFA, 6–8 and 14 manually labeled fiducial points are used respectively, whereas we use only 5 points. In the experiments of ELF, TFA and our method, 34 subjects are used for testing, while 68 subjects are used in the experiments of 3DMM; thus, the experiments of 3DMM are more challenging. Although the comparison is empirical, we can see that our method is

Page 7: Cross-pose face recognition based on partial least squares

Fig. 10. Performance declination caused by pose difference.

1954 A. Li et al. / Pattern Recognition Letters 32 (2011) 1948–1955

obviously better than ELF and Takeo and Yamada’s method. Theresults of our method are lower than TFA, but close to 3DMM.

4.3. Experiments with unknown probe pose

In the experiments shown in Sections 4.1 and 4.2, the pose angles of the gallery and probe are assumed to be known. To evaluate the robustness to inaccurate pose estimation, we conducted experiments with unknown probe pose on the Multi-PIE database. In these experiments, the gallery pose is set to frontal (05_1), while the probe pose is unknown and non-frontal. All the poses in Multi-PIE are included except 08_1 and 09_1.

The ‘‘confusion matrices’’ of recognition rates using holistic and local features are shown in Fig. 10(a) and (b), respectively.2 To demonstrate the performance declination caused by inaccurate pose estimation, we calculate the average performance declination when the differences between the estimated pose and the real pose are ±15°, ±30° and ±45°. The results are shown in Fig. 10(c). As can be seen, local features are more robust to inaccurate pose estimation. When the pose estimation error is within ±15°, the average performance declination is 9.15% with local features. These results show that recognizing faces under large pose estimation errors remains difficult. Besides the difference between the assumed and real probe pose, the difference between the gallery pose and the real probe pose is another factor that influences performance. In Fig. 10(d) we show the average performance degradation when the probe poses are within ±45°. We can see that the performance declination is much smaller: with local features the average performance declination is 4.11% when the pose estimation error is within ±15°. We can conclude that the proposed method is robust when the pose estimation error is within ±15° and the probe poses are within ±45°.
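The average declination statistic described above is straightforward to compute from such a recognition-rate matrix. The sketch below illustrates the computation; the rate values and the assumed 15° spacing between neighboring poses are made up for illustration and are not the paper's actual numbers.

```python
# rates[i][j] is a hypothetical recognition rate (in %) when the model
# assumes pose i but the true probe pose is pose j; neighboring poses
# are taken to be 15 degrees apart. All values below are invented.
rates = [
    [70, 66, 58, 49, 40, 33, 27],
    [67, 74, 68, 60, 50, 41, 34],
    [60, 69, 78, 70, 61, 51, 42],
    [50, 61, 71, 82, 72, 62, 51],
    [42, 52, 62, 71, 79, 70, 60],
    [34, 42, 51, 61, 69, 75, 66],
    [28, 34, 41, 50, 59, 67, 71],
]

def avg_declination(rates, step):
    """Average drop in recognition rate when the assumed pose is `step`
    positions (i.e. step * 15 degrees) away from the true probe pose."""
    n = len(rates)
    drops = []
    for j in range(n):            # j indexes the true probe pose
        correct = rates[j][j]     # rate with a correct pose estimate
        for i in (j - step, j + step):
            if 0 <= i < n:        # skip assumed poses outside the range
                drops.append(correct - rates[i][j])
    return sum(drops) / len(drops)

for step in (1, 2, 3):            # pose errors of +/-15, +/-30, +/-45 degrees
    print(step * 15, avg_declination(rates, step))
```

As expected, the average declination grows as the gap between the assumed and true pose widens.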

5. Conclusions

In this paper we proposed a cross-pose face recognition method based on partial least squares. When the pose difference is big, the appearance variations caused by pose are larger than those caused by identity, which leads to confusion in face recognition. In the proposed method, faces across poses are coupled by identity. By performing partial least squares on such a coupled training set, the confusion caused by pose differences can be considerably reduced. Therefore, the proposed method is an effective way to recognize faces across poses. When further enhanced with geometric alignment and local feature representation, its performance is close to the state of the art.
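As a concrete illustration of this coupling, the sketch below trains a single PLS component with the classical NIPALS iteration on synthetic ‘‘coupled’’ data, where the same identity variable appears along a different direction in each pose. The directions, dimensions and nearest-neighbor matching rule are invented for illustration; this is not the paper's actual pipeline, only a minimal demonstration of the principle.

```python
import math
import random

def matT_vec(M, v):
    """Compute M^T v, where M is a list of rows."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]

def mat_vec(M, v):
    """Compute M v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def pls_weights(X, Y, iters=100):
    """One PLS component via the NIPALS iteration: weight vectors w (for X)
    and c (for Y) whose projections X w and Y c have maximal squared
    covariance -- the criterion that couples the two poses.
    (Mean-centering is skipped for brevity; the toy data is nearly rank-one.)"""
    u = [row[0] for row in Y]          # initialize Y-scores
    for _ in range(iters):
        w = normalize(matT_vec(X, u))  # X-weights
        t = mat_vec(X, w)              # X-scores
        c = normalize(matT_vec(Y, t))  # Y-weights
        u = mat_vec(Y, c)              # Y-scores
    return w, c

def standardize(v):
    m = sum(v) / len(v)
    s = math.sqrt(sum((x - m) ** 2 for x in v) / len(v))
    return [(x - m) / s for x in v]

# Toy coupled data: the identity signal z appears along direction a in
# pose A but along a different direction b in pose B (directions made up).
random.seed(0)
a = [1.0, 2.0, 0.0, 1.0, 1.0]
b = [1.0, 0.0, 2.0, 1.0, 1.0]
z = list(range(10))  # 10 identities
X = [[zi * aj + random.uniform(-0.05, 0.05) for aj in a] for zi in z]
Y = [[zi * bj + random.uniform(-0.05, 0.05) for bj in b] for zi in z]

w, c = pls_weights(X, Y)
gallery = standardize(mat_vec(X, w))   # pose-A projections
probe = standardize(mat_vec(Y, c))     # pose-B projections

# Nearest-neighbor matching in the shared one-dimensional latent space.
matches = [min(range(10), key=lambda j: abs(gallery[j] - p)) for p in probe]
accuracy = sum(m == i for i, m in enumerate(matches)) / 10
print(accuracy)
```

Raw pose-A and pose-B vectors live along different directions and compare poorly, yet after projecting each pose through its own PLS weight vector, identities line up in the shared latent space and nearest-neighbor matching succeeds.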

2 The pose arrangement in Fig. 8(a) and (b) is ‘‘11_0, 12_0, 09_0, 08_0, 13_0, 14_0, 05_0, 04_1, 19_0, 20_0, 01_0, 24_0’’.

References

Ashraf, A.B., Lucey, S., Chen, T., 2008. Learning patch correspondences for improved viewpoint invariant face recognition. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1–8.

Baek, J., Kim, M., 2004. Face recognition using partial least squares components. Pattern Recognit. 37 (6), 1303–1306.

Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J., 1997. Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Machine Intell. 19 (7), 711–720.

Blanz, V., Vetter, T., 2003. Face recognition based on fitting a 3D morphable model. IEEE Trans. Pattern Anal. Machine Intell. 25, 1063–1074.

Blanz, V., Grother, P., Phillips, P.J., Vetter, T., 2005. Face recognition based on frontal views generated from non-frontal images. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 454–461.

Chai, X., Shan, S., Chen, X., Gao, W., 2007. Locally linear regression for pose-invariant face recognition. IEEE Trans. Image Process. 16 (7), 1716–1725.

Cootes, T., Wheeler, G., Walker, K., Taylor, C., 2002. View-based active appearance models. Image Vision Comput. 20 (9–10), 657–664.

Gross, R., Matthews, I., Baker, S., 2004. Appearance-based face recognition and light-fields. IEEE Trans. Pattern Anal. Machine Intell. 26 (4), 449–465.

Gross, R., Matthews, I., Cohn, J., Kanade, T., Baker, S., 2010. Multi-PIE. Image Vision Comput. 28 (5), 807–813.

Kanade, T., Yamada, A., 2003. Multi-subregion based probabilistic approach toward pose-invariant face recognition. In: Proc. IEEE Internat. Symposium on Computational Intelligence in Robotics and Automation (CIRA), pp. 954–959.

Lee, H.-S., Kim, D., 2006. Generating frontal view face image for pose invariant face recognition. Pattern Recognition Lett. 27 (7), 747–754.

Li, A., Shan, S., Chen, X., Gao, W., 2009. Maximizing intra-individual correlations for face recognition across pose differences. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 605–611.

Liu, X., Chen, T., 2005. Pose-robust face recognition using geometry assisted probabilistic modeling. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 502–509.

Liu, C., Wechsler, H., 2002. Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition. IEEE Trans. Image Process. 11 (4), 467–476.

Lucas, B.D., Kanade, T., 1981. An iterative image registration technique with an application to stereo vision. In: Proc. 1981 DARPA Image Understanding Workshop, pp. 121–130.

Lucey, S., Chen, T., 2006. Learning patch dependencies for improved pose mismatched face verification. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 909–915.

Prince, S.J.D., Elder, J.H., Warrell, J., Felisberti, F.M., 2008. Tied factor analysis for face recognition across large pose differences. IEEE Trans. Pattern Anal. Machine Intell. 30 (6), 970–984.

Rosipal, R., Krämer, N., 2006. Overview and recent advances in partial least squares. In: Subspace, Latent Structure and Feature Selection, pp. 34–51.

Sanderson, C., Bengio, S., Gao, Y., 2006. On transforming statistical models for non-frontal face verification. Pattern Recognit. 39, 288–302.

Schwartz, W., Kembhavi, A., Harwood, D., Davis, L., 2009. Human detection using partial least squares analysis. In: Proc. IEEE Internat. Conf. on Computer Vision, pp. 24–31.

Sim, T., Baker, S., Bsat, M., 2003. The CMU pose, illumination, and expression database. IEEE Trans. Pattern Anal. Machine Intell. 25 (1), 1615–1618.

Štruc, V., Pavešić, N., 2009. Gabor-based kernel partial-least-squares discrimination features for face recognition. Informatica 20 (1), 115–138.

Turk, M., Pentland, A., 1991. Eigenfaces for recognition. J. Cognit. Neurosci. 3 (1), 71–86.

Wiskott, L., Fellous, J., Krüger, N., von der Malsburg, C., 1997. Face recognition by elastic bunch graph matching. IEEE Trans. Pattern Anal. Machine Intell. 19 (7), 775–779.


Wright, J., Hua, G., 2009. Implicit elastic matching with random projections for pose-variant face recognition. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1502–1509.

Zhang, X., Gao, Y., 2009. Face recognition across pose: a review. Pattern Recognit. 42, 2876–2896.

Zhang, L., Samaras, D., 2006. Face recognition from a single training image under arbitrary unknown lighting using spherical harmonics. IEEE Trans. Pattern Anal. Machine Intell. 28 (3), 351–363.

Zhao, S., Gao, Y., 2009. Textural Hausdorff distance for wider-range tolerance to pose variation and misalignment in 2D face recognition. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1629–1634.

Zhao, W., Chellappa, R., Phillips, P.J., Rosenfeld, A., 2003. Face recognition: a literature survey. ACM Comput. Surv. 35 (4), 399–458.