
Facial Expression Invariant Head Pose Normalization using Gaussian Process Regression

Ognjen Rudovic, Comp. Dept., Imperial College London, UK ([email protected])

Ioannis Patras, Elec. Eng. Dept., Queen Mary University London, UK ([email protected])

Maja Pantic, Comp. Dept., Imperial College London, UK / EEMCS, Univ. Twente, NL ([email protected])

Abstract

We present a regression-based scheme for facial-expression-invariant head pose normalization. We address the problem by mapping the locations of 2D facial points (e.g. mouth corners) from non-frontal poses to the frontal pose. This is done in two steps. First, we propose a head pose estimator that maps the input 2D facial point locations into a head-pose space defined by a low-dimensional manifold attained by means of multi-class LDA. Then, to learn the mappings between a discrete set of non-frontal head poses and the frontal pose, we propose using a Gaussian Process Regression (GPR) model for each pair of target poses (i.e. a non-frontal and the frontal pose). During testing, the head pose estimator is used to activate the most relevant GPR model, which is then applied to project the locations of 2D facial landmarks from an arbitrary pose (that does not have to be one of the training poses) to the frontal pose. In our experiments we show that the proposed scheme (i) performs accurately for continuous head pose in the range from 0° to 45° pan rotation and from 0° to 30° tilt rotation despite the fact that the training was conducted only on a set of discrete poses, (ii) successfully handles both expressive and expressionless faces (even in cases when some of the expression categories were missing in certain poses during the training), and (iii) outperforms both the 3D Point Distribution Model (3D-PDM) and the Linear Regression (LR) model that are used as baseline methods for pose normalization. The proposed method is experimentally evaluated on data from the BU-3DFE facial expression database.

1. Introduction

Rigid motion of the face accounts for a great amount of variance in its appearance in a 2D image array. Simultaneously, the non-rigid deformations of the face (from facial expression variation and identity variation) cause more subtle variations in the 2D image. An individual's identity and/or his or her facial expression, however, are captured by these small variations alone and are not specified by the variance due to the rigid head motion. Thus, it is necessary to compensate or normalize a face for position so that the variance due to rigid motion is minimized. Consequently, the small variations in the image due to identity, facial expression, and so on, become the dominant source of variance in the image and can thus be analyzed for recognition purposes [12].

The main aim of head pose normalization is to generate a 'virtual' view, i.e. to normalize an input face image to a predefined pose (e.g. frontal), before further analysis is conducted. A standard solution to this problem in computer vision applications is to use a 3-D or 2-D face-shape model. In [3], a 3-D morphable model is used to estimate 3-D facial shape and bring it into the frontal view, where further analysis of faces is carried out. In [17], a 3D-PDM and normalized SVD decomposition are used to simultaneously recover facial expression and pose parameters. A similar 3D-PDM is employed in [16] to separate the rigid head rotation from non-rigid facial expressions. A rigid face shape model is applied to build person-dependent descriptors that are later used to decompose facial pose and expression simultaneously [10]. Well-known face-shape models in 2-D are the 2-D PDM and the Active Appearance Model (AAM), the latter of which fits an input face image to a pre-learned face model and consists of separate shape and appearance models [6]. In the work presented in [9], a non-frontal view image is warped onto the 2-D PDM and the target virtual view is synthesized via Thin Plate Splines-based texture mapping. In [7, 1], multi-view AAMs, from which the frontal view of the input face can be easily synthesized, have been proposed. Overall, although head pose normalization can be achieved to some extent by means of 3-D/2-D face-shape models, a disadvantage of these methods is the use of generative models and/or fitting techniques that can fail to converge. In addition, most of these methods are computationally expensive and need a time-consuming initialization process (e.g. due to manual annotation of more than 60 facial landmark points).


Figure 1. The proposed approach. Twenty points marked in yellow are used as input to the pose estimation. In the second block we couple two GPs to obtain more accurate predictions in the frontal pose.


Next to 3-D/2-D model-based methods, another approach to head pose normalization is to apply a 2-D face-shape-free method. In [5], an affine transformation is used to independently map different face regions from a discrete set of non-frontal views to the frontal view. In [4], the Local Linear Regression (LLR) method is proposed. In this method, the whole surface of the face is partitioned into multiple uniform blocks, and the mappings between the corresponding blocks in non-frontal poses and in the frontal pose are learned using a linear regression model. Similarly, in [8], the non-frontal facial region is partitioned into multiple facial components and linear transformations are applied to the different components to generate their frontal counterparts. The underlying assumption of the aforementioned methods is that, by partitioning the whole face surface into multiple patches, the mapping of each patch to its counterpart in the frontal view can be modeled using a linear transform [4]. However, none of the aforementioned works has analyzed the problem of virtual pose generation (i.e. head pose normalization) in the presence of facial expressions. Yet, as rigid head motions and non-rigid facial expressions are non-linearly coupled in 2D [17], the underlying assumption that the mappings between different image patches can be represented as a linear transformation may no longer hold. Moreover, the above models have been tested only on the set of poses that were used to learn the mappings, and were not tested on unknown poses, i.e. poses that were not used for learning the mappings.

In this paper we propose a 2-D face-shape-free method for head pose normalization. In contrast to the above-described methods, the proposed method is facial-expression-invariant and based on geometric (as opposed to appearance) features. Specifically, we address the problem by mapping the locations of 2-D facial points from non-frontal poses to the frontal pose. The proposed two-step approach is illustrated in Fig. 1. In the first step, we propose a head pose estimator that maps the input 2D facial point locations into a head-pose space defined by a low-dimensional manifold attained by means of multi-class LDA. To deal with the problem of facial landmark localization, any state-of-the-art facial-expression-and-head-pose-invariant facial point detector can be used (e.g. see [14]). In this work we have used manual facial landmark annotations instead, which allows us to test the performance of the method alone, without the effect of landmark detection errors. In the second step, we use a Gaussian Process Regression (GPR) [13] model to learn mappings from the 2D locations of landmark points in non-frontal poses to their locations in the frontal pose, and we do so for each pair of target poses (i.e. non-frontal and frontal). The contributions of the proposed approach can be summarized as follows.

1. We propose a novel facial-expression-invariant head pose normalization approach based on geometric features that can handle expressive faces in the range from 0° to +45° pan rotation and from 0° to +30° tilt rotation. The proposed approach performs accurately for continuous head pose (i.e. for unknown poses in the given range) despite the fact that the training was conducted only on a set of discrete poses. It can also successfully handle the problem of an incomplete training dataset (e.g. when instances of certain facial expressions are not included in the training dataset for a given discrete pose).

2. We propose modeling the mappings between different head poses using a state-of-the-art probabilistic non-linear regression model, namely, the Gaussian Process Regression model. We experimentally show that the proposed scheme for facial-expression-invariant head pose normalization outperforms both a baseline face-shape method for pose normalization, the 3D-PDM, and a baseline face-shape-free method, the Linear Regression (LR) model.

The rest of the paper is organized as follows. In Section 2 we present our facial-expression-invariant head pose normalization approach. In Section 3 we present and discuss experimental results, and in Section 4 we conclude the paper.


2. Facial-expression-invariant Head Pose Normalization

In this section we describe the proposed approach for facial-expression-invariant head pose normalization given the 2D locations $p = [p^x_1 \ldots p^x_L \; p^y_1 \ldots p^y_L]_{1 \times d}$ of $L = d/2$ facial landmarks of a facial image observed in an arbitrary pose. The proposed approach consists of two main steps: (i) head pose estimation by applying a pose classifier to $p$, and (ii) pose normalization by mapping the positions $p$ of the facial landmarks from a non-frontal pose to the corresponding 2D positions $p^0$ in the frontal pose. We assume that we have training data for each of $P$ discrete poses and the correspondences between the points for each target pair of poses (a non-frontal and the frontal pose). In our case, we discretize the head pose space into $P = 12$ poses evenly distributed across the range from 0° to +45° pan rotation and from 0° to +30° tilt rotation, with an increment of 15°. We denote by $D_k = \{p^k_1, \ldots, p^k_N\}$ the data set in pose $k$ containing $N$ vectors of facial landmark locations, and by $D = \{D_0, \ldots, D_k, \ldots, D_{P-1}\}$ the whole training data set. In what follows, we first introduce the proposed head pose estimation approach. Then, we describe the base GPR model used to independently learn $P-1$ mapping functions $\{f^1, \ldots, f^k, \ldots, f^{P-1}\}$, one per target pair of poses $(k, 0)$. Note that while in the training phase a landmark vector $p$ is associated with one of the $P$ discrete poses, during testing the pose associated with $p$ is unknown and may not belong to the discrete set of poses (the only constraint is that the pose belongs to the aforementioned range of pan and tilt rotation). We use the output of the pose estimator to select the most relevant GPR model, $f^k$, which then maps the points $p$ from an arbitrary pose to the frontal pose.

2.1. Head Pose Estimation

Various head pose estimation methods based on appearance and/or geometric features have been proposed in the literature [11]. In this paper, we propose to estimate the probability that the input facial landmark vector $p$ is associated with a head pose belonging to a discretized head-pose space. To obtain a low-dimensional expression-invariant head-pose manifold, we apply multi-class LDA. Prior to learning this low-dimensional manifold, we first normalize all examples from $D$ (i.e. the training data containing 2D locations of facial landmarks in $P$ poses) to remove the scale and translation components. To this end, we transform the vector of facial points $p$ into image coordinates $(p^x_i, p^y_i)$, $i = 1 \ldots L$, and compute the gravity center $(p^x_c, p^y_c)$ and the scaling parameter $s_c$ as

$$p^x_c = \frac{1}{L}\sum_{i=1}^{L} p^x_i, \qquad p^y_c = \frac{1}{L}\sum_{i=1}^{L} p^y_i$$

$$s_c = \frac{1}{L}\sum_{i=1}^{L} \left((p^x_i - p^x_c)^2 + (p^y_i - p^y_c)^2\right)^{\frac{1}{2}}$$

where the normalized facial landmark positions are

$$p^{xn}_i = \frac{\left((p^x_i - p^x_c)^2 + (p^y_i - p^y_c)^2\right)^{\frac{1}{2}}}{s_c}\,(p^x_i - p^x_c), \qquad p^{yn}_i = \frac{\left((p^x_i - p^x_c)^2 + (p^y_i - p^y_c)^2\right)^{\frac{1}{2}}}{s_c}\,(p^y_i - p^y_c).$$
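As a minimal sketch of this normalization step (our illustration, not the authors' code; Python with NumPy assumed), implementing the centering and scaling formulas as printed above:

```python
import numpy as np

def normalize_landmarks(points):
    """Remove translation and scale from an (L, 2) array of 2D landmarks."""
    center = points.mean(axis=0)              # gravity center (p_c^x, p_c^y)
    centered = points - center
    radii = np.linalg.norm(centered, axis=1)  # distance of each point to the center
    s_c = radii.mean()                        # scaling parameter s_c
    # Each centered point is weighted by its radial distance divided by s_c,
    # following the normalization formula as reconstructed above.
    return (radii / s_c)[:, None] * centered
```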

The normalized vector $p^n = [p^{xn}_1, \ldots, p^{xn}_L, p^{yn}_1, \ldots, p^{yn}_L]$ is later used as an input to the multi-class LDA. We use LDA since it finds a simple linear transform $S$ that, given a training set of normalized facial landmark points $D^n$ along with the target pose labels, is used to project the high-dimensional input data $p$ to a low-dimensional manifold that best represents pose variations while ignoring other sources of variation such as facial expressions, identity-specific variation across different individuals, etc. The facial landmark vectors projected onto the low-dimensional manifold, $p^{lda} = S \cdot p$, are used to model a head pose $k$ by a Gaussian conditional distribution

$$P(p^{lda}|k) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}\,(p^{lda} - \mu_k)^T \Sigma_k^{-1} (p^{lda} - \mu_k)\right) \qquad (1)$$

where $\mu_k$ and $\Sigma_k$ are the mean and covariance computed from $D^n_k$. By applying Bayes' rule to Eq. (1), we obtain the probability of pose $k$ given the low-dimensional vector $p^{lda}$ as

$$P(k|p^{lda}) \propto P(p^{lda}|k)\,P(k) \qquad (2)$$

where we assume a uniform prior, $P(k) = 1/P$, for each of the $P$ discrete head poses.
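A rough sketch of this pose estimator (our illustration under stated assumptions: Python, NumPy, and scikit-learn's LinearDiscriminantAnalysis standing in for the paper's multi-class LDA) could look as follows:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_pose_estimator(X, y):
    """X: (n, d) normalized landmark vectors p^n; y: pose labels k = 0..P-1."""
    lda = LinearDiscriminantAnalysis()
    Z = lda.fit_transform(X, y)           # p_lda = S * p
    stats = {}                            # per-pose Gaussians on the manifold (Eq. 1)
    for k in np.unique(y):
        Zk = Z[y == k]
        stats[int(k)] = (Zk.mean(axis=0), np.cov(Zk, rowvar=False))
    return lda, stats

def pose_posterior(lda, stats, p):
    """P(k | p_lda) under a uniform prior over the P poses (Eq. 2)."""
    z = lda.transform(p[None, :])[0]
    log_lik = []
    for k in sorted(stats):
        mu, cov = stats[k]
        diff = z - mu
        # Log of the Gaussian in Eq. (1); the (2*pi)^(d/2) factor is shared
        # by all poses and cancels in the normalization below.
        log_lik.append(-0.5 * (diff @ np.linalg.solve(cov, diff)
                               + np.linalg.slogdet(cov)[1]))
    log_lik = np.asarray(log_lik)
    post = np.exp(log_lik - log_lik.max())
    return post / post.sum()
```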

2.2. Gaussian Process Regression (GPR) Model

The GPR model has gained increasing popularity in statistical machine learning as it offers a principled nonparametric Bayesian framework for inference, model fitting and model selection [13]. Formally, for a given set of $N$ input vectors $p^k_i$ in pose $k$ along with the target values $p^0_i$ in the frontal pose, we use the GPR model to find a smooth function $f^k: \mathbb{R}^d \to \mathbb{R}^d$ that maps the input facial landmark locations to the corresponding frontal facial landmark locations. Assuming Gaussian noise $\varepsilon_i$ with zero mean and covariance matrix $\sigma^2_n I$, this is expressed by $p^0_i = f^k(p^k_i) + \varepsilon_i$. Further, a zero-mean Gaussian process prior is placed over the function $f^k$, that is $f^k \sim GP(0, K + \sigma^2_n I)$. Here, $K(D_k, D_k)$ denotes the $N \times N$ covariance matrix obtained by applying the kernel function

$$k(p^k_i, p^k_j) = \sigma^2_s \exp\left(-\frac{1}{2}\,(p^k_i - p^k_j)^T W (p^k_i - p^k_j)\right) + \sigma_l\, {p^k_i}^T p^k_j + \sigma_b \qquad (3)$$

to the pairs $(p^k_i, p^k_j)$, where $i, j = 1 \ldots N$. Here $\sigma_s$ and $W = \mathrm{diag}(w_1, \ldots, w_d)$ are the parameters of the Radial Basis Function (RBF) with different length scales for each input dimension (each coordinate of each landmark point), $\sigma_l$ is the process variance which controls the scale of the output function $f^k$, and $\sigma_b$ is the model bias. We use this compound kernel function as it performed best in a pilot study which compared single RBF, MLP, Polynomial and Linear kernels, as well as their compound versions. In addition, the applied kernel has been widely used in the literature due to its ability to handle both linear and non-linear data structures [2]. For a new facial landmark vector $p^k_*$ from pose $k$ we obtain the predictive mean $f^k(p^k_*)$ and the corresponding covariance $V^k(p^k_*)$ as

$$f^k(p^k_*) = k_*^T (K + \sigma^2_n I)^{-1} D^0 \qquad (4)$$

$$V^k(p^k_*) = k(p^k_*, p^k_*) - k_*^T (K + \sigma^2_n I)^{-1} k_* \qquad (5)$$

with $k_* = k(D_k, p^k_*)$, where $k(\cdot, \cdot)$ is given in Eq. (3). The kernel parameters $\theta = \{\sigma_s, W, \sigma_l, \sigma_b, \sigma_n\}$ are found by maximizing the log marginal likelihood, $\mathcal{L} = \log p(D^0|\theta)$, using the conjugate gradient algorithm. Because we are solving a multivariate regression problem here, we assume that the output dimensions (each coordinate of each landmark point in $p^0_i$) are identically distributed. This allows us to use the same covariance function for each output dimension [13].
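To make Eqs. (3)-(5) concrete, here is a small sketch (our illustration, Python/NumPy; the hyperparameters are assumed already fitted, since the marginal-likelihood optimization is omitted):

```python
import numpy as np

def compound_kernel(A, B, sigma_s, W, sigma_l, sigma_b):
    """Kernel of Eq. (3): ARD-RBF + linear + bias.

    A: (n, d) and B: (m, d) landmark vectors; W: length-d vector holding
    the diagonal of the RBF weight matrix W."""
    d2 = (((A[:, None, :] - B[None, :, :]) ** 2) * W).sum(-1)
    return sigma_s ** 2 * np.exp(-0.5 * d2) + sigma_l * (A @ B.T) + sigma_b

def gpr_predict(Dk, D0, p_star, sigma_s, W, sigma_l, sigma_b, sigma_n):
    """Predictive mean (Eq. 4) and variance (Eq. 5) for one test shape p_star.

    Dk: (N, d) landmark vectors in pose k; D0: (N, d) frontal targets; one
    covariance function is shared by all d output dimensions."""
    Kn = compound_kernel(Dk, Dk, sigma_s, W, sigma_l, sigma_b) \
         + sigma_n ** 2 * np.eye(len(Dk))
    k_star = compound_kernel(Dk, p_star[None, :], sigma_s, W, sigma_l, sigma_b)
    mean = k_star.T @ np.linalg.solve(Kn, D0)          # Eq. (4), shape (1, d)
    var = compound_kernel(p_star[None, :], p_star[None, :],
                          sigma_s, W, sigma_l, sigma_b) \
          - k_star.T @ np.linalg.solve(Kn, k_star)     # Eq. (5)
    return mean.ravel(), var.item()
```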

2.3. Algorithm Summary

We give a summary of our approach to facial-expression-invariant head pose normalization in Alg. 1. In the off-line phase of the algorithm, we first apply multi-class LDA to the training data $D$ to find the transformation matrix $S$ that maps the input data to the low-dimensional manifold of head poses and to estimate the probability that the input data is in each of the discrete training poses. In the second step of the off-line phase, the training data are registered by defining reference faces, one for each pose $k$, and by mapping the landmark points in pose $k$ to the corresponding reference face using an affine transform. The registration is carried out using three referential points: the nasal spine point and the inner corners of the eyes (see Fig. 1), which are chosen because they are stable facial points that are not affected by contractions of the facial muscles. The registered training data are used to train $P-1$ GPR models, one for each pair of a non-frontal and the frontal pose. The learned models are stored for later use in the on-line part of the algorithm. In the on-line phase, an input vector of facial landmark positions $p_*$ in an arbitrary pose is first subjected to the proposed head pose estimation. The output of this step are the probabilities of $p_*$ being in each of the $P$ training poses. In the second step, we register $p_*$ to the most likely pose $k_{max}$ found in the previous step; a sketch of this registration step is given below. Note that the affine transformation used to register the points $p_*$ to pose $k_{max}$ both scales $p_*$ and reduces the error introduced by the discretization of the pose space (i.e. ±7.5°). Finally, if the retrieved pose is not the frontal pose, we select the most relevant GPR model, $f^{k_{max}}$, which is then used to predict the facial landmark locations in the frontal view, $\hat{p}^0_*$, given the registered facial landmark vector $p^{k_{max}}_*$.
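As an illustration of the registration step (ours, not the paper's implementation; Python/NumPy, with `anchor_idx` a hypothetical index list picking out the nasal spine and the inner eye corners):

```python
import numpy as np

def register_to_reference(points, ref_points, anchor_idx):
    """Affinely align an (L, 2) face shape to a pose's reference face.

    The 6-parameter affine transform that maps the three referential
    points onto their reference locations is applied to all L landmarks."""
    src = points[anchor_idx]                   # (3, 2) anchors in the input shape
    dst = ref_points[anchor_idx]               # (3, 2) anchors in the reference face
    src_h = np.hstack([src, np.ones((3, 1))])  # homogeneous coordinates
    M = np.linalg.solve(src_h, dst)            # exact fit: 3 point pairs give a (3, 2) map
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    return pts_h @ M                           # transformed (L, 2) shape
```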

Algorithm 1: Head Pose Normalization

OFF-LINE: Learning model parameters
1. Apply LDA to compute the transformation $S$, and learn $\mu_k$ and $\Sigma_k$ for each head pose $k = 0 \ldots P-1$ (Sec. 2.1)
2. Form pairs of registered facial landmark locations, $(D^{reg}_k, D^{reg}_0)$, $k = 1 \ldots P-1$ (Sec. 2.3)
3. Learn the GPR models $\{f^1, \ldots, f^{P-1}\}$ for the $P-1$ target pairs of poses (Sec. 2.2)

ON-LINE: Inference of facial landmark locations $p_*$ in an arbitrary pose
1. Apply pose estimation to obtain $P(k|p^{lda}_*)$, where $p^{lda}_* = S \cdot p_*$ and $k = 0 \ldots P-1$ (Sec. 2.1)
2. Register $p_*$ to pose $k_{max} = \arg\max_k P(k|p^{lda}_*)$, obtaining $p^{k_{max}}_*$
3. Predict the locations of the facial points in the frontal pose:
   if $k_{max} = 0$ then $\hat{p}^0_* = p^{k_{max}}_*$ else $\hat{p}^0_* = f^{k_{max}}(p^{k_{max}}_*)$
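Tying the pieces together, the on-line phase could be sketched as below (our illustration, reusing the hypothetical helpers from the earlier sketches; for simplicity the shape is stored as an (L, 2) array flattened row-wise rather than in the paper's $[p^x_1 \ldots p^x_L\, p^y_1 \ldots p^y_L]$ layout):

```python
import numpy as np

def normalize_head_pose(p_star, lda, stats, ref_faces, anchor_idx, gpr_models):
    """On-line phase of Alg. 1.

    ref_faces[k]: (L, 2) reference face of pose k; gpr_models[k]: a function
    mapping a registered d-vector in pose k to the frontal pose (e.g. a
    closure around gpr_predict with that pose's learned hyperparameters)."""
    # Step 1: pose estimation; pick the most likely of the P training poses.
    post = pose_posterior(lda, stats, p_star)
    k_max = int(np.argmax(post))
    # Step 2: register the input shape to the reference face of pose k_max.
    p_reg = register_to_reference(p_star.reshape(-1, 2),
                                  ref_faces[k_max], anchor_idx)
    # Step 3: map to the frontal pose, unless the input is already frontal.
    if k_max == 0:
        return p_reg
    return gpr_models[k_max](p_reg.ravel()).reshape(-1, 2)
```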

3. Experiments

The experimental evaluation of the proposed methodology has been carried out using the BU-3D Facial Expression (BU3DFE) database [15]. The BU3DFE database contains 3D range data of 7 facial expressions, shown in Fig. 2, performed by 100 subjects (60% female, of various ethnic origins). All facial expressions except Neutral were sampled at four different levels of intensity. We generate 2D multi-view images of facial expressions from the available 3D data by rotating the 39 facial landmark points provided by the database creators (see Fig. 4), which were further used as the features in our study. The data in our experiment include images of 50 subjects (54% female) from 0° to 45° pan rotation and 0° to 30° tilt rotation (see Fig. 1), with a 5° increment, resulting in 1250 images for each of 70 poses. The training data are subsampled from this dataset to include images of expressive faces in 12 poses only (15° increment in pan and tilt angles). These data (referred to as the BU-TR dataset below) as well as the rest of the data (referred to as the BU-TST dataset and used to test the performance of the proposed method on unknown head poses) were partitioned into five parts in a person-independent manner for use in a 5-fold cross-validation procedure.


Figure 2. Examples of images from the BU3DFE dataset representing the seven facial expressions in the frontal view: Surprise, Angry, Happy, Neutral, Disgust, Fear and Sad.

The rest of this section is organized as follows. First, we evaluate the accuracy of the proposed head pose estimator. Then, we present the experiments aimed at evaluating the accuracy of the proposed head pose normalization method. To measure the accuracy of the method, we used the Root-Mean-Squared Error (RMSE), defined as $\sqrt{\frac{1}{d}\,\|\Delta p\|^2}$, where $\Delta p$ is the facial deformation vector representing the difference between the predicted positions of the facial landmarks in the frontal pose and the ground truth (the manually annotated landmarks in the frontal pose).
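In code, this error measure is simply (a trivial sketch, Python/NumPy):

```python
import numpy as np

def rmse(predicted, ground_truth):
    """RMSE over the d = 2L landmark coordinates of one face."""
    dp = np.ravel(predicted) - np.ravel(ground_truth)  # facial deformation vector
    return np.sqrt(np.mean(dp ** 2))                   # sqrt((1/d) * ||dp||^2)
```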

We evaluate the performance of the proposed head pose estimator when trained on the BU-TR dataset and tested on a subset of the BU-TST dataset containing only the facial data in the discrete poses used for training. A five-fold person-independent cross-validation was performed, where data of 40 subjects from BU-TR were used for training, and data of 10 unknown subjects from BU-TST were used for testing the pose estimator. As can be seen from Table 1, an average classification error rate of 6.9% was attained overall. It is interesting to note that correct pose estimation was easier to attain for facial data in poses far from the frontal pose than for facial data in poses closer to the frontal pose, in which case confusions between neighbouring near-frontal poses were often encountered. This is, however, a natural consequence of the fact that the facial point distribution is rather similar across various near-frontal poses, and rather different across various poses far away from the frontal pose.

           pan +45     pan +30     pan +15     pan +0
tilt +30   0.9±1.3%    2.3±1.8%    5.1±1.1%   12.3±2.3%
tilt +15   2.3±2.3%    3.9±1.6%    7.6±1.2%   14.8±2.5%
tilt +0    3.9±2.2%    7.5±2.1%   10.2±2.2%   11.9±2.1%

Table 1. Average error rate for head pose estimation across the 12 training poses, where training/testing of the pose estimator was performed in a person-independent manner using data from the BU-TR/BU-TST datasets. Head-pose classification was performed by selecting the most likely pose.

To evaluate the proposed head pose normalization approach, we first trained 11 GPR models (as described in Alg. 1) using the data from the BU-TR dataset, and then tested the trained models on the whole BU-TST dataset (including the facial data in unknown poses). We compare the performance of the proposed GPR-based method to that achieved by the 'baseline' methods for pose normalization, namely, the 3D-PDM [17] and the standard LR model [2].

Figure 3. Comparison of the head pose normalization methods GPR, LR and 3D-PDM, trained on BU-TR (12 head poses) and tested on BU-TST (70 head poses) in a person-independent manner, in terms of RMSE.

Figure 4. Prediction of the facial landmarks in the frontal pose for a BU3DFE image of the Happy facial expression in pose (+45°, +30°), obtained by using (a) 3D-PDM, (b) LR and (c) GPR. The blue markers represent the ground truth and the black markers the predicted points. As can be seen, the alignment of the predicted and the corresponding ground-truth facial landmarks is far from perfect in the case of 3D-PDM.

The 3D-PDM was trained using the corresponding frontal 3-D data from the BU-TR dataset, retaining 15 modes of non-rigid shape variation. The LR models were trained in a similar way as the GPR models, i.e. one for each pair of target poses. Fig. 3 shows the comparative results, in terms of RMSE, of the tested head pose normalization methods, along with the results obtained when no pose normalization is performed and only the translation component has been removed. As can be seen, both the GPR- and LR-based methods outperform the 3D-PDM method for pose normalization. Judging from Fig. 4, this is probably due to the fact that the tested 3D-PDM was not able to accurately model the non-rigid facial movements present in the facial expression data.

The performance of the aforementioned models in the presence of noise in the test data was evaluated on the BU-TST data corrupted by four different levels of additive Gaussian noise. As can be seen from Table 2, even in the presence of high levels of noise the performance of the GPR-based method is comparable to that of the 3D-PDM achieved on noise-free data. Although the GPR-based method outperforms the LR-based method (as can be seen from Fig. 3), the performance of the GPR- and LR-based methods can be considered comparable in the aforementioned experiments, where the utilized data were balanced (i.e. the methods were trained on data containing examples of all facial expression categories and their intensity levels in all target poses). However, we further tested the ability of the GPR- and LR-based models to handle unseen non-rigid face motion, i.e. novel facial expressions.


          σ = 0     σ = 0.5   σ = 1     σ = 2     σ = 3
3D-PDM   2.9±0.5   2.8±0.6   3.3±0.8   3.6±0.6   3.8±0.6
LR       1.5±0.3   1.7±0.3   1.9±0.2   2.7±0.2   3.7±0.2
GPR      1.3±0.2   1.5±0.3   1.7±0.3   2.5±0.2   3.4±0.1

Table 2. Comparison of the head pose normalization methods 3D-PDM, LR and GPR, trained on BU-TR (12 head poses) and tested on BU-TST (70 head poses) corrupted with different levels of Gaussian noise with standard deviation σ, in a person-independent manner, in terms of RMSE.

This is important when testing the generalization ability of the learned models. It is also central to the ability of the proposed model to learn the mappings even when examples of some facial expressions are not available in all poses. To evaluate this ability of the LR and GPR models, we did the following. We used the data of 40 subjects from the BU-TR dataset for training the LR and GPR models. The training data contained examples of the apex of six facial expressions (i.e. intensity-level-4 examples, except for the facial expression Neutral, for which only one example per person is available). The testing was performed on intensity levels 2, 3 and 4 of the missing expression category (except Neutral), using data of 10 subjects from the BU-TST dataset. As can be seen from Table 3, the GPR-based method outperforms the LR-based method. This clearly shows that the GPR-based head-pose normalization method is more robust to non-rigid deformations that are not seen in the training set than the LR-based method. Moreover, the GPR-based method generalizes better than the LR-based method even when only a small amount of training data is provided.

      Disgust   Angry     Fear      Happy     Sad       Surprise  Neutral
LR    2.3±1.1   1.8±0.9   1.7±0.7   2.1±1.2   1.9±0.8   2.1±1.2   1.7±0.8
GPR   1.4±0.4   1.2±2.2   1.2±0.4   1.3±0.8   1.1±0.4   1.7±0.6   1.2±0.3

Table 3. RMSE for pose normalization obtained by using the LR- and GPR-based methods in the presence of an incomplete training dataset.

4. Conclusion

We presented a novel 2D-shape-free regression-based scheme for facial-expression-invariant head pose normalization that is based on 2-D geometric features. Experimental results show that the proposed method outperforms both the baseline 3D-face-shape method, the 3D-PDM, and the baseline 2D-shape-free method, the LR-based method. Moreover, we show experimentally that the proposed GPR-based method performs accurately for continuous head pose (i.e. for unknown poses in the given range of poses) despite the fact that the training was conducted only on a (small) discrete set of poses. Finally, the experimental results clearly show that the proposed method can successfully handle the problem of having an incomplete training dataset, e.g. when examples of certain facial expressions were not included in the training dataset for a given pose.

Acknowledgments

The work of Ognjen Rudovic is funded in part by the European Community's 7th Framework Programme [FP7/2007-2013] under grant agreement no. 211486 (SEMAINE). The work of Maja Pantic is funded in part by the European Research Council under the ERC Starting Grant agreement no. ERC-2007-StG-203143 (MAHNOB). The work of Ioannis Patras is partially supported by EPSRC project EP/G033935/1.

References

[1] A. Asthana, R. Goecke, N. Quadrianto, and T. Gedeon. Learning based automatic face annotation for arbitrary poses and expressions from frontal images only. In CVPR, pages 1635-1642, 2009.
[2] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 2007.
[3] V. Blanz, P. Grother, J. P. Phillips, and T. Vetter. Face recognition based on frontal views generated from non-frontal images. In CVPR, pages 454-461, 2005.
[4] X. Chai, S. Shan, X. Chen, and W. Gao. Local linear regression (LLR) for pose invariant face recognition. In AFGR, pages 631-636, 2006.
[5] X. Chai, S. Shan, and W. Gao. Pose normalization for robust face recognition based on statistical affine transformation. In ICSP, pages 1413-1417, 2003.
[6] T. Cootes and C. Taylor. Active shape models - smart snakes. In BMVC, pages 266-275, 1992.
[7] T. Cootes, G. Wheeler, K. Walker, and C. Taylor. View-based active appearance models. Image and Vision Computing, 20(9-10):657-664, 2002.
[8] S. Du and R. Ward. Component-wise pose normalization for pose-invariant face recognition. In ICASSP, pages 873-876, 2009.
[9] D. Gonzalez-Jimenez and J. Alba-Castro. Symmetry-aided frontal view synthesis for pose-robust face recognition. In ICASSP, pages II-237 - II-240, 2007.
[10] S. Kumano, K. Otsuka, J. Yamato, E. Maeda, and Y. Sato. Pose-invariant facial expression recognition using variable-intensity templates. Int'l J. Computer Vision, 83(2):178-194, 2009.
[11] E. Murphy-Chutorian and M. M. Trivedi. Head pose estimation in computer vision: A survey. IEEE Trans. Pattern Analysis and Machine Intelligence, 31(4):607-626, 2009.
[12] M. Pantic and L. J. M. Rothkrantz. Automatic analysis of facial expressions: The state of the art. IEEE Trans. Pattern Analysis and Machine Intelligence, 22:1424-1445, 2000.
[13] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.
[14] M. Valstar, B. Martinez, X. Binefa, and M. Pantic. Facial point detection using boosted regression and graph models. In CVPR, 2010.
[15] J. Wang, L. Yin, X. Wei, and Y. Sun. 3D facial expression recognition based on primitive surface feature distribution. In CVPR, pages 1399-1406, 2006.
[16] T.-H. Wang and J.-J. J. Lien. Facial expression recognition system based on rigid and non-rigid motion separation and 3D pose estimation. Pattern Recognition, 42(5):962-977, 2009.
[17] Z. Zhu and Q. Ji. Robust real-time face pose and facial expression recovery. In CVPR, pages 681-688, 2006.
