IEEE SIGNAL PROCESSING LETTERS, VOL. 19, NO. 11, NOVEMBER 2012 721

Continuous Pose Normalization for Pose-Robust Face Recognition

Liu Ding, Xiaoqing Ding, Fellow, IEEE, and Chi Fang

Abstract—Pose variation is a great challenge for robust face recognition. In this paper, we present a fully automatic pose normalization algorithm that can handle continuous pose variations and achieve high face recognition accuracy. First, an automatic method is proposed to find pose-dependent correspondences between 2-D facial feature points and a 3-D face model. This method is based on a multi-view random forest embedded active shape model. Then we densely map each pixel in the face image onto the 3-D face model and rotate it to the frontal view. The filling of occluded face regions is guided by facial symmetry. Recognition experiments were conducted on two western databases, CMU-PIE and FERET, and one eastern database, CAS-PEAL. Currently the algorithm has been trained with pose variation up to 50° in yaw. Our algorithm not only achieves high recognition accuracy for learnt poses but also shows good generalizability for extreme poses. Furthermore, it suggests promising applications for people of different races.

Index Terms—Face recognition, pose normalization.

I. INTRODUCTION

FACE recognition has been an active area of research in the last two decades, with widespread applications such as access control and video surveillance. Despite the rapid development of this technology under controlled conditions, pose variation remains a major challenge in uncontrolled environments. In a series of evaluations held by NIST [1], [2], performance drops drastically under large pose change. The body of pose-robust face recognition algorithms is huge and can mainly be divided into three categories: invariant feature extraction-based, multi-view-based, and pose normalization-based. Among them, the most natural idea is pose normalization [3]–[9]. By generating a novel pose that is the same as the enrolled one, the face recognition system is greatly simplified.

There is a wide variety of approaches related to pose normalization. They can be further divided into two categories, namely 2-D and 3-D methods. The 2-D methods, such as LLR [3], pose parameter manipulation [4], component-wise pose normalization [5], and CPR [6], have been reported to be robust for small pose variation. However, the performance of these methods is limited by the use of 2-D warping, since a 2-D linear transformation is incapable of capturing 3-D rotations. The 3-D methods explicitly handle pose variation using rigid motion models. The seminal work of Blanz and Vetter [7] was evaluated in the Face Recognition Vendor Test (FRVT 2002) [2] and achieved distinct results. Later this method was extended by [8], [9] using only sparse feature points. From sparse correspondences between 2-D facial feature points and 3-D model vertices, the 3-D methods can densely map the non-frontal images onto the 3-D model, then rotate the textured model to frontal views. Only recently have researchers paid attention to the pose-dependent nature of this correspondence. Pose-invariant correspondences would incorrectly direct the texture mapping procedure. In [9], hand labeling is used to set up a pose-specific look-up table of corresponding 3-D vertices.

The novelty of this paper is to propose an automatic continuous pose normalization method which can handle poses up to 60° in yaw. The main difference between our approach and [9] is that the pose-dependent correspondences between 2-D feature points and 3-D model vertices are obtained automatically via feature detection, and we utilize facial symmetry to fill occluded regions. We use a 3-D model to enhance the facial feature detector based on the multi-view random forest embedded active shape model [10]. Compared to the preliminary version, we further improve the accuracy of detection. Furthermore, we propose a sparse reconstruction algorithm to fit the 3-D Morphable Model [7] to the specific person. The key idea is to preserve more discriminative information in face shape instead of only warping to the mean shape [9]. The entire system is shown in Fig. 1.

The rest of the paper is organized as follows. Section II presents our correspondence matching method. Section III describes the details of pose normalization. Section IV shows the pose-normalized images and recognition results. Section V draws the conclusion.

Manuscript received June 25, 2012; revised August 20, 2012; accepted August 22, 2012. Date of publication August 27, 2012; date of current version September 10, 2012. This work was supported by the National Natural Science Foundation of China under Grant 60972094. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Giuseppe Scarpa. The authors are with the State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China (e-mail: [email protected]; [email protected]; [email protected]).

Digital Object Identifier 10.1109/LSP.2012.2215586

II. 2-D–3-D CORRESPONDENCE MATCHING VIA FEATURE DETECTION

In this section, we propose an improved facial feature detector based on a variant of the Active Shape Model (ASM) to connect 2-D facial feature points with the corresponding points in the 3-D model. The underlying assumption is that people under the same pose should share the same configuration of feature points. Thus, we use a virtual view as a bridge between the 3-D model and 2-D images.

1070-9908/$31.00 © 2012 IEEE

Fig. 1. Overview of our pose-invariant face recognition system.

Fig. 2. Illustration of the improved view-based RFE-ASM facial feature detector.

The random forest embedded ASM (RFE-ASM) [10] is able to locate 88 facial feature points automatically; it embeds discriminant learning into the ASM via random forests with pair-compare features. This allows a more robust texture representation than the traditional active shape model and fuses both into a general statistical model. Like the view-based active appearance model (AAM) [11], several RFE-ASM models are trained, each covering a certain range of pose. Despite the pose robustness of multi-view models, too much hand labeling is needed, and too many RFE-ASM models greatly increase the fitting complexity. We currently trained 7 RFE-ASM models, each covering a non-overlapping interval of yaw angles up to 50°. One limitation of such a scheme is that each pose interval is too large for accurate localization. Another drawback of this detector is that pitch variation is not considered in the model.

To this end, we suggest using a virtual image synthesized from a 3-D model by weak perspective projection (with the mean shape and texture from the Basel Face Model [12]) as initialization of the feature detector, to alleviate the adverse effect of large pose variation on each RFE-ASM model. The bridge image improves the overall robustness of the fitting process in two aspects. First, facial feature points on the generated virtual face (with mean shape and texture) are easier to locate by RFE-ASM than on the input face, because it possesses no personal variation. Second, the feature locations on the generated virtual face (at the same pose as the input face) are closer to the ground truth than the mean shape of RFE-ASM, because shape variation caused by changes in pose is eliminated.
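The weak perspective camera used to render the bridge image can be sketched as follows: rotate the model vertices by the estimated yaw, drop the depth axis, then uniformly scale and translate. This is an illustrative sketch only; the function and parameter names are ours, not the paper's implementation.

```python
import numpy as np

def weak_perspective_project(vertices, yaw_deg, scale=1.0, t2d=(0.0, 0.0)):
    """Project N x 3 model vertices to 2-D under weak perspective.

    Weak perspective = rigid rotation, orthographic drop of the depth
    axis, then uniform scale and 2-D translation. The per-vertex depth
    is returned separately, since it is what a Z-buffer test consumes.
    """
    yaw = np.deg2rad(yaw_deg)
    # Rotation about the vertical (y) axis models a yaw head turn.
    R = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
                  [ 0.0,         1.0, 0.0        ],
                  [-np.sin(yaw), 0.0, np.cos(yaw)]])
    rotated = vertices @ R.T
    pts2d = scale * rotated[:, :2] + np.asarray(t2d)  # orthographic x, y
    depth = rotated[:, 2]                             # kept for visibility tests
    return pts2d, depth
```

A full bridge-image renderer would additionally rasterize the mean texture at these projected positions; the projection itself is the part the correspondence step depends on.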

It is essential to obtain pose-specific 2-D–3-D correspondences for pose normalization, as pointed out by [9]. The 2-D feature points detected by the ASM model exhibit a phenomenon similar to that of the AAM in [9]: due to self-occlusion, the boundary features correspond to quite different 3-D model vertices as the head pose changes. In contrast to the offline hand labeling in [9], we automatically match 2-D features and 3-D vertices via the online-synthesized bridge image. They are linked by weak perspective projection, and we find the pose-specific correspondences by tracing the depth buffer at the 2-D feature locations.

The proposed method is executed in two steps. In the first step, we use face detection [13] and eye location [14] results to segment the face from the original image. Then we use the method of [15] to estimate the head pose of the input facial image, after which a virtual image of the same pose is synthesized from the 3-D model. We can hence apply the RFE-ASM model for this specific pose interval to the virtual image for feature localization. Then we read the depth buffer and search the 3-D model to find the corresponding 3-D vertices of these features. In the second step, the 2-D locations of the facial features are transferred to the input image and used as the initial position for refined RFE-ASM fitting. This increases both the speed of searching and the accuracy of detection. The routine of our approach is summarized in Fig. 2.
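The depth-buffer tracing step above can be sketched as a simple lookup, assuming the renderer also writes, for every pixel of the bridge image, the id of the model vertex that won the depth test. The buffer name and layout are our illustration, not the paper's implementation.

```python
def trace_correspondences(feature_pts, vertex_id_buffer):
    """Map 2-D feature locations detected on the rendered bridge image
    to 3-D model vertex ids by reading a per-pixel vertex-id buffer.

    feature_pts      : iterable of (x, y) pixel coordinates
    vertex_id_buffer : H x W integer array; entry (y, x) holds the id of
                       the front-most model vertex rasterized at that
                       pixel, or -1 where no vertex projects.
    """
    ids = []
    for x, y in feature_pts:
        xi, yi = int(round(x)), int(round(y))  # snap to the nearest pixel
        ids.append(int(vertex_id_buffer[yi, xi]))
    return ids
```

Features that land on background pixels come back as −1; in practice such points would fall back to the nearest valid pixel or be discarded before pose normalization.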


Fig. 3. Overview of pose normalization procedure.

Fig. 4. Examples of 3-D pose normalization from (a) CMU-PIE and (b) FERET. Each part's top row contains the input images, and the bottom row contains the corresponding pose-normalized images.

III. POSE NORMALIZATION

Our pose normalization utilizes the feature positions and 2-D–3-D correspondences obtained in Section II to reconstruct a person-specific 3-D model. Specifically, we map the texture from the input image onto the 3-D face model to create a textured 3-D model, from which we render the face in the frontal pose.

The first step is to find a 3-D rigid transformation that transforms the 3-D model vertices from the frontal pose to a pose that optimally matches the 2-D facial features. Following the method in [16], we recover the rigid transformation and the 3-D face shape by local linear fitting of the 3-D Morphable Model. Once both are obtained, we extract texture from the facial image by weak perspective projection. Due to self-occlusion, part of the face texture may be invisible; the visibility of each vertex is tested with a Z-buffer algorithm [17]. We use facial symmetry to guide the filling of the occluded region. Then we render the pose-normalized image in the frontal pose. This procedure is summarized in Fig. 3.

Fig. 4 gives pose normalization examples of images from several data sets using our fully automatic method. In each part, there are two rows and six columns of images. Each image in the top row is an original input image, and the image below it is the synthesized frontal face. For face recognition, the pose-normalized frontal images of input images with different poses are compared.
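The symmetry-guided filling of occluded texture can be sketched per vertex. Here `mirror_idx` is an assumed precomputed table mapping each model vertex to its bilaterally symmetric counterpart; the paper does not specify how this table is built.

```python
import numpy as np

def fill_by_symmetry(texture, visible, mirror_idx):
    """Fill occluded per-vertex texture from the symmetric counterpart.

    texture    : (N,) per-vertex intensity sampled from the input image
    visible    : (N,) bool mask from a Z-buffer visibility test
    mirror_idx : (N,) index of each vertex's left-right mirror vertex
                 (an assumed precomputed table for this sketch)
    """
    filled = texture.copy()
    occluded = ~visible
    # Copy the value of the mirrored vertex into each occluded vertex.
    # (For yaw-only rotation the mirror of an occluded vertex is visible.)
    filled[occluded] = texture[mirror_idx[occluded]]
    return filled
```

For pure yaw rotations, self-occlusion hides one half of the face while the other half stays visible, which is why a single mirrored copy suffices.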

TABLE I
POSE-WISE RANK-1 RECOGNITION RATES (%) FOR CMU-PIE, FERET, AND CAS-PEAL DATABASES

IV. EXPERIMENTAL RESULTS

The face recognition algorithm used here is Gabor-feature based, similar in spirit to the Gabor-Fisher Classifier [18]. We trained the PCA and LDA projection matrices on offline samples. An internal database containing 429 Chinese people with pose variation up to 50° in yaw was used for training the PCA and LDA projection matrices and tuning the parameters. Two feature vectors were compared by normalized cross-correlation. We conducted experiments on the CMU-PIE [19], FERET [20], and CAS-PEAL [21] databases. The CMU-PIE and FERET databases are commonly used for face recognition across different poses and mainly consist of western people, while the CAS-PEAL database consists only of Chinese people. These databases contain 68, 200, and 800 people, respectively. Experiments on these databases are convenient for comparison with other approaches, and demonstrate our system's ability to handle different races and population sizes. The entire fitting and pose normalization process takes about 2 seconds per image on an Intel Xeon processor, including disk access.

We report rank-1 recognition rates arranged by pose in Table I. Unlike [9], there are no failure cases, as our system relies only on the eye location.

CMU-PIE Database: There was no gallery image for subject 04039, so we removed that subject from our results and used the remaining 67 subjects, with neutral expression at 9 different poses (see Table I), for our recognition experiment. The frontal image (Pose ID c27) of each subject was used as the gallery image, and the remaining 8 images per subject were used as probes (408 in total). We tested the larger poses, c02 and c14 (±67.5°); their average rank-1 recognition rate was 81.3%. Excluding these two poses, our system's overall rank-1 recognition rate on this set was approximately 100%.

FERET Database: We used all 200 subjects at 7 different poses (see Table I) for our recognition experiment. The frontal image ba of each subject was used as the gallery image (200 in total), and the remaining 6 images per subject were used as probes (1,200 in total). We tested the larger poses, bb and bi (±60°); their average rank-1 recognition rate was 83.8%. Excluding these two poses, our system's overall rank-1 recognition rate was 97.6%.

CAS-PEAL Database: We selected 800 subjects at 7 different poses (see Table I) for our recognition experiment. The frontal image of each subject was used as the gallery image (800 in total), and the remaining 6 images per subject were used as probes (4,800 in total). Our system's overall rank-1 recognition rate was 96.8%.

Summary of Results: The results show that our system achieves better recognition performance than several state-of-the-art methods on datasets (CMU-PIE, FERET, and CAS-PEAL) with large pose variation, and can be applied to face images of different races. The average rank-1 recognition rate for yaw angles smaller than 45° amounts to over 95%, although it drops slightly as the population size increases. Most existing methods cannot be applied to extreme poses (yaw angles as large as 60°), whereas our method significantly outperformed comparable methods on CMU-PIE and FERET.

V. CONCLUSION

A fully automatic continuous pose normalization algorithm for pose-robust face recognition is proposed. The 3-D pose normalization takes advantage of accurate 2-D feature points and 2-D–3-D correspondences provided by our improved multi-view RFE-ASM feature detector. Our method achieves higher face recognition accuracy than state-of-the-art pose normalization methods on learnt poses in three public datasets and shows better generalizability for extreme poses. In future work, we plan to extend the system to an even wider range of poses; in that case, a different shape model should be used in RFE-ASM for near-profile facial images.

ACKNOWLEDGMENT

The authors would like to thank Dr. T. Vetter for providing the BFM database and the anonymous reviewers for their informative comments.

REFERENCES

[1] D. M. Blackburn, J. M. Bone, and P. J. Phillips, FRVT 2000 Evaluation Report, Tech. Rep., Feb. 2001, pp. 32–35. [Online]. Available: http://biometrics.nist.gov/cs_links/face/frvt/FRVT_2000.pdf

[2] P. J. Phillips, P. Grother, R. J. Micheals, D. M. Blackburn, E. Tabassi, and J. M. Bone, FRVT 2002: Overview and Summary, Tech. Rep., Mar. 2003, pp. 10–11. [Online]. Available: http://biometrics.nist.gov/cs_links/face/frvt/FRVT_2002_Overview_and_Summary.pdf

[3] X. Chai, S. Shan, X. Chen, and W. Gao, "Locally linear regression for pose-invariant face recognition," IEEE Trans. Image Process., vol. 16, no. 7, pp. 1716–1725, Jul. 2007.

[4] D. Gonzalez-Jimenez and J. L. Alba-Castro, "Toward pose-invariant 2-D face recognition through point distribution models and facial symmetry," IEEE Trans. Inf. Forensics Secur., vol. 2, no. 3, pp. 413–429, Sep. 2007.

[5] S. Du and R. Ward, "Component-wise pose normalization for pose invariant face recognition," in Proc. 2009 IEEE ICASSP, pp. 873–876.

[6] A. Asthana, M. J. Jones, T. K. Marks, K. H. Tieu, and R. Goecke, "Pose normalization via learned 2D warping for fully automatic face recognition," in Proc. 2011 BMVC, pp. 127.1–127.11.

[7] V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces," in Proc. SIGGRAPH, 1999, pp. 187–194.

[8] X. Chai, L. Qing, S. Shan, X. Chen, and W. Gao, "Pose invariant face recognition under arbitrary illumination based on 3D face reconstruction," in Proc. Audio- and Video-Based Biometric Person Authentication, New York, 2005, pp. 956–965.

[9] A. Asthana, T. K. Marks, M. J. Jones, K. H. Tieu, and R. MV, "Fully automatic pose-invariant face recognition via 3D pose normalization," in Proc. 2011 IEEE Int. Conf. Computer Vision, pp. 937–944.

[10] L. Wang, L. Ding, X. Ding, and C. Fang, "2D face fitting-assisted 3D reconstruction for pose-robust face recognition," Soft Comput., vol. 15, no. 3, pp. 417–428, 2011.

[11] T. F. Cootes, G. V. Wheeler, K. N. Walker, and C. J. Taylor, "View-based active appearance models," Image Vis. Comput., vol. 20, no. 9–10, pp. 657–664, Aug. 2002.

[12] P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter, "A 3D face model for pose and illumination invariant face recognition," in Proc. 2009 IEEE Int. Conf. Advanced Video and Signal Based Surveillance, pp. 296–301.

[13] Y. Ma and X. Ding, "Real-time multi-view face detection and pose estimation based on cost-sensitive adaboost," Tsinghua Sci. Technol., vol. 10, no. 2, pp. 152–157, Apr. 2005.

[14] Y. Ma, X. Ding, Z. Wang, and N. Wang, "Robust precise eye location under probabilistic framework," in Proc. 2004 IEEE Int. Conf. Automatic Face and Gesture Recognition, pp. 339–344.

[15] C. Huang, X. Ding, and C. Fang, "Head pose estimation based on random forests for multiclass classification," in Proc. 2010 Int. Conf. Pattern Recognition, pp. 934–937.

[16] L. Ding, X. Ding, and C. Fang, "3D face sparse reconstruction based on local linear fitting," Vis. Comput., submitted for publication.

[17] J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes, Computer Graphics: Principles and Practice, 2nd ed. Reading, MA: Addison-Wesley, 1996.

[18] C. Liu and H. Wechsler, "Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition," IEEE Trans. Image Process., vol. 11, no. 4, pp. 467–476, 2002.

[19] T. Sim, S. Baker, and M. Bsat, "The CMU pose, illumination, and expression database," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1615–1618, Dec. 2003.

[20] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1090–1104, Oct. 2000.

[21] W. Gao, B. Cao, S. Shan, X. Chen, D. Zhou, and X. Zhang, "The CAS-PEAL large-scale Chinese face database and baseline evaluations," IEEE Trans. Syst., Man, Cybern. A, vol. 38, no. 1, pp. 149–161, Jan. 2008.

[22] M. S. Sarfraz and O. Hellwich, "Probabilistic learning for fully automatic face recognition across pose," Image Vis. Comput., vol. 28, pp. 744–753, May 2010.

[23] A. Asthana, C. Sanderson, T. Gedeon, and R. Goecke, "Learning-based face synthesis for pose-robust recognition from single image," in Proc. 2009 Brit. Machine Vision Conf., pp. 31.1–31.10.

[24] Z. Wang, "Research on Statistics Based Robust Face Recognition," Ph.D. dissertation, Dept. Electron. Eng., Tsinghua Univ., Beijing, China, 2009.