Head Pose Estimation Based on Face SymmetryAnalysis
Afifa Dahmane, Slimane Larabi, Ioan Marius Bilasco, Chaabane Djeraba
To cite this version: Afifa Dahmane, Slimane Larabi, Ioan Marius Bilasco, Chaabane Djeraba. Head Pose Estimation Based on Face Symmetry Analysis. Signal, Image and Video Processing, Springer Verlag, 2015, 9 (8), pp. 1871-1880. doi:10.1007/s11760-014-0676-x. hal-01111677.
Head Pose Estimation Based on Face Symmetry Analysis
Afifa Dahmane · Slimane Larabi · Ioan Marius Bilasco · Chabane Djeraba
Abstract This paper addresses the problem of head pose estimation in order to infer non-intrusive feedback from users about gaze attention. The proposed approach exploits the bilateral symmetry of the face: the size and the orientation of the symmetrical area of the face are used to estimate the roll and yaw poses by means of a Decision Tree model. The approach does not need the location of interest points on the face and is robust to partial occlusions. Tests were performed on different datasets (FacePix, CMU PIE, Boston University) which provide variability in illumination and expression. The results demonstrate that the change in the size of the regions that contain a bilateral symmetry provides accurate pose estimation.
Keywords Head pose estimation · Symmetry detection · Pattern recognition
1 Introduction
Head pose is often linked with visual gaze estimation and provides a coarse indication of the gaze in situations where the system should be non-intrusive, using only a regular camera, or in situations where the eyes may not be visible. In this context, a coarse head pose can give a good indication of gaze attention.
Afifa Dahmane, Slimane Larabi
Computer Science Department, USTHB University, Algeria
Tel.: +213 21247607, Fax: +213 21247607
E-mail: fdahmane,[email protected]

Afifa Dahmane, Ioan Marius Bilasco, Chabane Djeraba
LIFL, USTL, University of Lille, UMR CNRS 8022, France
Tel.: +33 3 62531584, Fax: +33 3 28778537
E-mail: afifa.dahmane,marius.bilasco,[email protected]
Head pose estimation is a classic problem in computer vision. It is widely used in many applications such as video conferencing, driver monitoring and human-computer interaction. Moreover, for many pattern recognition applications it is necessary to estimate a coarse head pose in order to eliminate pose variation and obtain better accuracy (e.g. in face recognition or facial expression analysis). Many approaches based on local facial features have been proposed to deal with head pose estimation. However, the obvious difficulty for these local approaches lies in detecting outlying or missing features in situations where facial landmarks are obscured. Also, low-resolution imagery makes it difficult to precisely determine the feature locations.
1.1 Contribution
This paper presents a method based on symmetry for estimating discrete head poses. We exploit the bilateral symmetry of the face to directly deduce two degrees of freedom of the head (yaw and roll). The symmetry is defined using the global skin region instead of local interest points. The proposed approach does not need the location of interest points on the face and can be deployed using low-cost and widely available hardware. Moreover, neither pose initialization nor calibration is required. The estimated pose is coarse but sufficient to infer the general gaze direction. We make three main contributions:
– First, we develop a method for detecting the position of the symmetry axis and its orientation in an image.
– Second, the roll angle is deduced from the inclination of the symmetry axis.
– Third, the yaw angle is calculated using the region which delimits the symmetrical pixels.
The symmetrical region is defined by analysing pixel intensities: the intensity of a pixel on the right side of the face is more similar to that of its mirror pixel than to any other pixel in the image. We have conducted experiments which indicate that the use of facial symmetry as a geometrical indicator of head pose remains reliable when local geometric features (such as eyes, nose or mouth) are missing due to occlusions or wrong detections. We give more insights about the method compared with our previous work and extend it to two degrees of freedom. Besides using public datasets (FacePix, CMU PIE and Boston University [1][2][3]), we have also used web-cam captures in order to cover situations not present in the available datasets. Sample captures are available at www.lifl.fr/~dahmane/VIDEOS.
The paper is organized as follows. We first review related work on head pose estimation in Section 2. Then, Section 3 details the methodology used to estimate the head pose from the symmetrical parts of the face. Section 4 presents and discusses the results of the evaluation process. Finally, we conclude and discuss potential future work in Section 5.
2 Related work
In this section, we review the related work for head pose
estimation regardless of the underlying descriptors and
methodology. We analyse the existing methods in order
to highlight advantages and disadvantages of each one.
Then, we focus our attention on global approaches ex-
ploiting symmetry information. Even though the latter
approaches are less popular, we strongly believe in the
benefits of global symmetry for pose estimation.
Existing techniques for head pose estimation are summarised in [4] and can be categorized into six groups:

Model-based approaches include geometric and flexible-model approaches. Geometric approaches use the locations of facial features such as the eyes, mouth and nose and geometrically determine the pose from their relative configuration [5][6]. Flexible-model approaches use facial features to build a deformable model which is fitted to the image so that it conforms to the facial structure of each individual (e.g. AAM) [7]. However, accurately matching a deformable face model to image sequences with large amounts of head movement is still a challenging task [8].
Classification-based approaches formulate head pose estimation as a pattern classification problem. Several works have used a range of classifiers such as SVMs [9][10]; Chamveha et al. [11] use random trees in addition to SVMs. In [12], Kernel Principal Component Analysis (KPCA) is used to learn a non-linear subspace for each range of views, and a test face is then classified into one of the facial views using a Kernel Support Vector Classifier (KSVC). Classification is also achieved in [13] using a set of randomized ferns, and in [14] a Naive Bayes classifier is applied to estimate head pose.
Regression-based approaches consider the pose angles as regression values. Several regressors are possible, such as Convex Regularized Sparse Regression (CRSR) [15] and Gaussian Process Regression (GPR) [16]. Al Haj et al. [17] proposed a method based on Partial Least Squares (PLS) regression to estimate the head pose. In [18], Support Vector Regressors (SVRs) are trained on Localized Gradient Orientation (LGO) histograms computed on the detected facial region to estimate the driver's head pose. Neural networks are among the most used non-linear regression tools for head pose estimation: Tian et al. [19] use multiple cameras and estimate the head pose with a neural network for each camera.
Template Matching approaches compare images or filtered images to a set of training examples and find the most similar one. In [20], the author represents faces with templates that cover different poses and, for an input image, uses correlation with the model templates to achieve face recognition by finding the best match. A similarity-to-prototypes philosophy is adopted by the authors of [21] in order to calculate a pose similarity ratio.
Manifold Embedding approaches produce a low-dimensional representation of the original facial features and then learn a mapping from the low-dimensional manifold to the pose angles. Biased Manifold Embedding for supervised manifold learning is proposed in [22]. The incorporation of continuous pose-angle information into one or more stages of the manifold learning process, such as Neighbourhood Preserving Embedding (NPE) and Locality Preserving Projection (LPP), is studied in [23]. Huang et al. [24] proposed Supervised Local Subspace Learning (SL2) to learn a local linear model where the mapping from the input data to the embedded space is learned using a Generalized Regression Neural Network (GRNN). In [25] the authors propose the K-manifold clustering method, integrating manifold embedding and clustering.
Tracking approaches use temporal information to improve the head pose estimation using the results of head tracking [26]. In [11], a pedestrian tracker is applied to head videos to infer head pose labels from the walking direction and automatically aggregate ground-truth head pose labels. Ba et al. [27] aim to recognize people's visual focus of attention using a tracking system based on particle filtering techniques. The KLT algorithm is used in [28] to track features over video frames in order to estimate the 3D rotation matrix of the head.
Each approach has specific limitations. Appearance-based approaches suffer from the identity and lighting information contained in the face appearance. For template matching methods, the effect of identity can cause more dissimilarity than the pose itself. Most manifold embedding techniques have a tendency to build manifolds that capture identity as well as pose. In contrast, model-based approaches are identity-independent when the feature points used are linked to human morphology (anatomical points) and not to specific appearance points (mathematical points) such as corners. Appearance-based approaches also require high computational time, which makes it difficult to implement a real-time system. Model-based approaches are fast but sensitive to occlusion and usually require high-resolution images: the difficulty lies in the accurate detection of the facial features (morphological points), since all of them are required (e.g. the outer corners of both eyes). High-resolution imagery may not be available in many applications such as driver monitoring and e-learning systems. Also, model-based approaches [5], [7] require a frontal view to initialize the system.
A specific family of approaches that exploit global features of the face, reducing the dependency on identity and avoiding initialization with a frontal pose, is represented by solutions that exploit facial symmetry. In [29], in order to detect candidate face regions in an image, the authors estimate the symmetry of the regions. The hypothesis is that the amount of symmetry can offer hints about the head orientation.
2.1 Symmetry based approaches
The human perception of head pose is based upon two cues: the deviation of the head shape from bilateral symmetry, and the deviation of the nose orientation from the vertical [30]. We therefore presume that the head pose is closely related to the geometry of face images, and that the symmetry of the face is a good indicator of the geometric configuration and the pose of the head. In the literature, there are few symmetry-based geometric head pose estimation methods.
Despite the fact that the human face is not perfectly symmetrical, the facial symmetry of a person is significant and can be exploited. Some works dealing with head pose estimation through feature points use the symmetrical property of the head. For instance, facial symmetry has been used as a visual intent indicator for people with disabilities in [31]. The face pose (roll and yaw angles) is estimated from a single uncalibrated view in [32], where the symmetric structure of the human face is exploited by taking the mirror image of a test face image as a virtual second view; facial feature points of the test image and its mirror image are matched against each other in order to evaluate the pose. In [33] the authors introduce Gabor filters in order to enhance the symmetry computation process and estimate the yaw. The symmetry-based illumination model proposed in [34] relies on three features (the two eyes and the nose tip): for every combination of two eyes and a nose, the head pose is computed using weak perspective projection and an internally calibrated camera. In the context of face recognition, the authors of [35] use the bilateral symmetry of the face to deduce whether the pose is frontal or not. Besides intensity images, 3D data can also be used for head pose estimation [36], [37].
Symmetry provides high-level knowledge about the geometry of the face. We use the bilateral symmetry of the face to deal with the head pose estimation problem and propose an approach that performs head pose estimation based on the symmetrical properties of the face.
3 Our approach
Our symmetry-based approach aims at being non-intrusive and does not require user collaboration. It has to be independent of user identity and deployable on still images as well as on videos. Our system uses a geometrical model which is not based on specific feature points. We propose an approach that combines the effectiveness of both local and global methods: we select a symmetrical area on the face based on skin pixel intensities and use the size of this area and its orientation to estimate the roll and yaw poses.
The proposed method (Figure 1) first detects the face using the Viola-Jones algorithm [38]. Preprocessing (histogram equalization) is applied in order to reduce the influence of illumination. Then the symmetry axis is searched for within the face area using a symmetry detection algorithm. Once the head and the symmetry axis are detected, we extract the symmetry features: we deduce the roll angle from the orientation of the symmetry axis and estimate the yaw by analysing characteristics of the symmetrical region. As Figure 1 shows, the symmetry detection enables the estimation of the roll angle and, further, the extraction of symmetric features.
In the following, we bring out the correlation between symmetry and head pose by analysing the symmetrical regions of the face. We detail the symmetry axis detection process and the characterization of the symmetry region. Then, we move on to the yaw estimation process by means of a Decision Tree classifier.
Fig. 1 Proposed approach.
3.1 Analysis of symmetrical regions on face
Fig. 2 (a) Variation in the size of the symmetrical region during yaw movement. (b) Variation in the angle of the symmetry axis during roll movement.
When the face is in front of the camera, the symme-
try between its two parts appears clearly and the line
which passes between the two eyes and nose tip defines
the symmetry axis. However, when the head performs
a motion, for example, a yaw motion, this symmetry
decreases. We exploit the difference between the sym-
metries before and after the head rotation to deal with
yaw movement.
Figure 2 shows the variation of the symmetrical re-
gion for various head yaw and roll poses. First, for the
yaw movement, we analyse the amount of symmetrical
parts under various yaw angles (Figure 2 (a)).
Let a and b be two symmetrical points on the face, and let m be the midpoint of the segment [ab]. The projections of these points on the image plane are a_i, b_i and m_i. When the face is in front of the camera, the segments [a_i m_i] and [m_i b_i] are symmetrical with respect to m_i, as shown in Figure 3 (a). When the head performs a yaw motion, the points (a, b, m) are projected onto (a'_i, b'_i, m'_i) (see Figure 3 (b)). Let ω' be the vanishing point associated with the direction of (a, b) in the image plane. Since the central projection preserves the cross-ratio [39], the cross-ratios of (a, b, m, ∞) and (a'_i, b'_i, m'_i, ω') are equal. We obtain:

\frac{\overline{ma}}{\overline{mb}} = \frac{\overline{m'_i a'_i}}{\overline{m'_i b'_i}} \div \frac{\overline{\omega' a'_i}}{\overline{\omega' b'_i}}   (1)

Fig. 3 (a) Projection of a segment (ab) when the face is in front of the camera. (b) Projection of the segment after a yaw motion.
As the two members of Equation 1 are equal to one (m being the midpoint of [ab]), the point m'_i is not the midpoint of [a'_i b'_i], and its position depends on the position of [a'_i b'_i] relative to ω'. Since m is the symmetry centre of [ab], the pixels of the segment [a'_i b'_i] may satisfy a partial symmetry, but in this case the symmetry centre is not the midpoint of [a'_i b'_i]: it is m'_i, and the symmetry concerns the segments [m'_i a'_i] and [m'_i d'_i], where d'_i is located between m'_i and b'_i such that m'_i a'_i = m'_i d'_i (see Figure 3 (b)).
Thus, after a yaw motion, the symmetry on the im-
age plane is partial. The symmetrical part of a segment
linking two symmetrical points on the face, is smaller
Head Pose Estimation Based on Face Symmetry Analysis 5
than the symmetrical part of the same segment before
the movement.
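This argument can be checked numerically. The sketch below uses an assumed one-dimensional pinhole model (the focal length and point coordinates are arbitrary choices of ours) and shows that the projection of the 3D midpoint m coincides with the midpoint of the projected segment only in the frontal view, drifting away as the yaw angle grows:

import numpy as np

f = 500.0                        # focal length in pixels (arbitrary)
a = np.array([-5.0, 0.0, 50.0])  # point on one side of the face
b = np.array([ 5.0, 0.0, 50.0])  # its mirror point
m = (a + b) / 2                  # 3D midpoint (the symmetry centre)

def yaw(p, theta):
    # Rotation about the vertical axis by angle theta.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * p[0] + s * p[2], p[1], -s * p[0] + c * p[2]])

def project(p):
    # Pinhole projection of the x coordinate onto the image plane.
    return f * p[0] / p[2]

for deg in (0, 20, 40):
    t = np.radians(deg)
    ai, bi, mi = project(yaw(a, t)), project(yaw(b, t)), project(yaw(m, t))
    print(deg, mi - (ai + bi) / 2)  # 0 for the frontal view, nonzero after yaw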
Secondly, regarding the roll angle, we estimate that it corresponds to the angle of the symmetry axis (see Figure 2 (b)). We infer the pose angle from the inclination of the symmetry axis, which we compute in the case of a frontal view.
3.2 Symmetry axis detection
We use pixel intensities to detect symmetry in the image. Illumination therefore influences the detection and, in some cases, causes errors, so we apply preprocessing to the images before starting the symmetry detection in order to improve robustness. We use the RGB space, which gives more significant information about skin colour than grayscale and allows us to differentiate between face and background, since one skin pixel is generally more similar to another skin pixel than to a background pixel. For this reason, we apply histogram equalization to each RGB colour channel of the image in order to reduce the illumination effect, and then merge the channels back.
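A minimal sketch of this preprocessing step with OpenCV (which stores colour images in BGR order) might look as follows; the name equalize_rgb is ours:

import cv2

def equalize_rgb(bgr):
    # Equalize each colour channel independently, then merge them back.
    b, g, r = cv2.split(bgr)
    return cv2.merge([cv2.equalizeHist(b), cv2.equalizeHist(g),
                      cv2.equalizeHist(r)])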
Our goal is to find the morphological symmetry of the face under different poses, provided that the desired symmetry does not disappear completely from the image (e.g. when the yaw angle exceeds 45°). Our algorithm is based on Stentiford [40]; it was necessary to adapt the initial algorithm, which highlights the symmetries present in an image regardless of what the image represents, be it a face or an object. After detecting the face, we consider an ellipse inside it and set our region of interest to be the top half of the ellipse. This part of the face is chosen because the upper part is more affected by head rotations: the change in the size of the symmetric region after a right/left rotation is greater in the region of the eyes than in that of the mouth.
In order to detect the position P and the orientation α of the image symmetry axis, we vary α from αmin to αmax with a step αstep. Then, for each inclination, we seek the position of the symmetry axis. Once all the inclinations have been tested, a vote is performed, considering as the best axis the one which accounts for the greatest number of symmetrical pixels while being the closest to the face centre. We consider the distribution of the symmetry axes A_i{P_i, α_i}, each of them weighted by the number of local symmetries it satisfies. We take the n maxima of this distribution and vote for the axis A{P, α} which minimizes the distance to the face centre C, such that:

d(C, A_{\{P,\alpha\}}) = \min_{i \in [1,n]} d(C, A_{i\{P_i,\alpha_i\}})   (2)
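The vote of Equation 2 reduces to a few lines. The sketch below assumes each candidate axis is a (position, alpha, nbr_sym) tuple with a scalar position, a simplification of the A_i{P_i, α_i} notation:

def vote_axis(candidates, face_centre, n=3):
    # candidates: list of (position, alpha, nbr_sym) tuples.
    # Keep the n candidates satisfying the most local symmetries, then pick
    # the one whose position is closest to the face centre C (Equation 2).
    top = sorted(candidates, key=lambda c: c[2], reverse=True)[:n]
    return min(top, key=lambda c: abs(c[0] - face_centre))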
Detecting the symmetry relative to an inclination: the region of interest of the image is divided into small overlapping square blocks ("cells") of side s. We search for local symmetries via the image cells by searching for the symmetrical cell of each non-homogeneous cell in the region of interest. When we find two symmetrical cells, we can determine the position of their symmetry axis: this local axis passes perpendicularly through the middle of the strip which passes through the two cells. We then detect all the symmetry axes A_i{P_i, α}, with i ranging from 1 to the number of axis positions for the given inclination α, and vote for the best axis A{P, α} using the same mechanism as defined previously.
Defining the local symmetries: we test for a match between the original cell and all its mirror cells relative to α (see Figure 4) until a match is found. The location of each mirror cell is calculated with Equation 3, where (x_i, y_i) are the coordinates of the pixel (x, y) in the reflected position: we vary x_i along the width of the region of interest and obtain y_i.

y_i = y + \tan(\alpha) \times (x_i - x)   (3)
If two cells match, we consider them symmetric. Mirror cells lie on the strip which passes through the original cell and is inclined by an angle α + π/2.

– Two cells match if each pixel on the diagonal of the original cell matches its corresponding pixel on the mirror cell.
– A pixel matches another one if the intensity difference on each of the three channels does not exceed a given threshold ε.
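A sketch of Equation 3 and of the pixel matching rule in Python follows; note that alpha must be given in radians for the tangent, and the default threshold mirrors the ε = 25 used later in the experiments:

import numpy as np

def mirror_y(x, y, xi, alpha):
    # Equation 3: y-coordinate of the reflected pixel as xi varies along
    # the width of the region of interest (alpha in radians).
    return int(round(y + np.tan(alpha) * (xi - x)))

def pixels_match(img, p, q, eps=25):
    # A pixel matches another one if the intensity difference on each of
    # the three channels does not exceed the threshold eps.
    diff = np.abs(img[p[1], p[0]].astype(int) - img[q[1], q[0]].astype(int))
    return bool(np.all(diff <= eps))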
Fig. 4 Symmetry axis detection.
Pseudocode for symmetry detection:

ROI: the region of interest (inside the ellipse)
C: a cell belonging to the ROI
r: diagonal of the cell C
Cr: mirror cell of C (after reflection)
xr: first pixel belonging to the diagonal of Cr
correspondence: Boolean indicating the correspondence between two cells

α ← αmin
while α ≤ αmax do
    for each cell C in the ROI do
        if C is non-homogeneous then
            xr ← widthROI
            correspondence ← false
            while correspondence = false and xr > r do
                define Cr (Equation 3)
                if Cr ∈ ROI then
                    test the correspondence between C and Cr
                    if they correspond then
                        save Ai{Pi, α, nbrSym}
                    end
                end
                xr ← xr − 1
            end
        end
    end
    take the n maxima of the distribution of Ai{Pi, α, nbrSym}
    vote for A{P, α, nbrSym} via Equation 2 and save it
    α ← α + αstep
end
take the n maxima of the distribution of A{P, α, nbrSym}
vote for the final axis A via Equation 2

Algorithm 1: Symmetry axis detection
The interval and the step of α influence the results. A small step gives more accuracy but takes more computation time and requires a larger amount of storage; we set the step according to the interval so that we do not obtain a high-dimensional distribution. Also, a large interval may yield symmetries which do not correspond to the bilateral symmetry we are looking for. To this end, we set αmax to not exceed 135° and αmin to not go below 45°, because the natural roll movement does not exceed 45° on each side.
Results of symmetry axis detection can be seen in Figure 2. We set s = 20, ε = 25, n = 3, αmin = 80°, αmax = 100° and αstep = 3°.
3.3 Features
Once we have detected the symmetry axis and rotated the image with respect to the axis inclination, we extract the symmetrical features. We test for a match between every pixel and its symmetrical counterpart with respect to the detected symmetry axis. This time, differently from the previous step (symmetry axis detection), pixels need not be part of a cell: they are tested one by one, in order to define the region of symmetry without excluding homogeneous areas (see Figure 5). In this way, the detection of the symmetrical region is not sensitive to the pixel matching process, since we use all the texture. We can find x2, the symmetrical pixel of x1, with Equation 3. If the difference in intensity between the two pixels is greater than a certain threshold, we decide that the two pixels are not symmetric. Then, we use the convex hull encompassing the symmetrical pixels to characterise the geometric features, namely the size of the symmetrical region (as shown in Figure 5).

After experimental attribute selection, vertical measurements were not kept, because they are not useful for the yaw movement. Therefore, as symmetrical features, we use the width of the hull which contains the symmetrical pixels and the mean distance of all symmetrical pixels to the symmetry axis. We define the width as the Euclidean distance between the two most distant pixels.
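The two retained features can be computed, for instance, as below. The sketch assumes the image has already been rotated so that the symmetry axis is vertical at column axis_x, and uses the fact that the two most distant pixels necessarily lie on the convex hull:

import cv2
import numpy as np
from scipy.spatial.distance import pdist

def symmetry_features(sym_pixels, axis_x):
    # sym_pixels: (N, 2) array of (x, y) coordinates of symmetrical pixels.
    hull = cv2.convexHull(sym_pixels.astype(np.int32)).reshape(-1, 2)
    width = pdist(hull.astype(float)).max()           # two most distant pixels
    mean_dist = np.abs(sym_pixels[:, 0] - axis_x).mean()  # distance to the axis
    return width, mean_dist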
Fig. 5 Examples of feature extraction. (a) Computing a convex hull which includes the symmetrical pixels. (b) Measures relating to the symmetrical region.
3.4 Yaw estimation
3.4.1 Decision tree classifier
In order to determine the amount of yaw motion, a Decision Tree classifier is trained using the relevant features (width of the symmetrical region and mean distance of the symmetrical pixels) extracted from the symmetrical parts according to the amount of yaw motion. Each class of the classifier corresponds to a discrete pose. To increase the prediction performance, we use the Alternating Decision Tree, which is based on boosting [41]. The tree alternates between prediction nodes and decision nodes; the root node is a prediction node and contains a value for each class. The prediction values are used as a measure of confidence in the prediction.

The set of head pose images used for learning represents the angles for which the symmetry axis is properly detected. The poses are discrete and vary from -45° (left) to +45° (right).
We start by extracting features from the region of interest as described in Section 3.3. Then, we construct the model from the feature vectors derived from images of several people recorded in different poses. Right and left poses with the same angle are gathered into the same class, as they contain the same amount of symmetry and, therefore, the same information. Thus, to estimate 2n + 1 discrete poses (n lateral right poses, n lateral left poses and 1 frontal pose), the classifier has n + 1 classes. For this, we use 2n + 1 images per subject to represent the 2n + 1 poses.

The root contains null values as predictions for the n + 1 classes. The first level contains decision nodes based on the values of the feature vector attributes, followed by prediction nodes for each class, and so on until the leaves. The sum of the prediction values encountered when following all paths for which all decision nodes are true is used to classify a given instance; the class with the largest prediction value is the predicted class.

As we use a supervised classification approach, we first train the alternating decision tree classifier using the same number of images per person as the number of classes. With the constructed tree, we can predict the yaw for various test face images. Training and testing images do not have to come from the same dataset.
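The paper uses Weka-style Alternating Decision Trees [41]; scikit-learn offers no ADTree, so the sketch below substitutes boosted decision stumps (AdaBoost), which follow the same boosting principle. train_features and train_labels are placeholders for the extracted feature vectors and their discrete pose classes:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# X: one (width, mean_distance) feature vector per training image;
# y: its class, 0 for frontal and k for the two mirrored k-th lateral poses
# (left and right share a class, so n + 1 classes cover 2n + 1 poses).
X = np.array(train_features)
y = np.array(train_labels)

# Boosted stumps as a stand-in for the Alternating Decision Tree [41].
clf = AdaBoostClassifier(n_estimators=50).fit(X, y)
predicted_class = clf.predict([[width, mean_dist]])[0]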
3.4.2 Left vs Right poses
To differentiate between left and right poses, we use the difference in intensity between the skin and the background. Our assumption is that a pixel on the face is more similar to another pixel on the face than to a pixel on the background.

We take a pixel located on the symmetry axis to ensure that it is on the face, compute the average intensity of the surrounding pixels, and consider this value as a reference. If the symmetry axis is closer to the left contour (resp. right contour) of the face, then the face is oriented to the left (resp. right). We calculate two values: the difference between the reference value and the average intensity of the pixels on the left side of the axis, and the same difference for the right side. If the difference is larger on the left side (resp. right), we conclude that the face in the image is oriented to the left (resp. right).

With this method, we determine to which side the pose is oriented, and this information is combined with the degree of orientation estimated by the Decision Tree in order to obtain the yaw head pose.
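A possible implementation of this left/right test on a grayscale face crop is sketched below; the window sizes half and band are our own illustrative choices, not values from the paper:

def pose_side(gray, axis_x, y, half=4, band=15):
    # Reference: mean intensity around a pixel on the symmetry axis, which
    # is guaranteed to lie on the face.
    ref = gray[y - half:y + half, axis_x - half:axis_x + half].mean()
    left = gray[y - half:y + half, axis_x - band:axis_x - half].mean()
    right = gray[y - half:y + half, axis_x + half:axis_x + band].mean()
    # The side deviating more from the reference is closer to the
    # background, i.e. to the nearer face contour.
    return "left" if abs(ref - left) > abs(ref - right) else "right"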
4 Experimental results and discussion
We evaluate the obtained model in order to validate the features extracted from the symmetry. We first evaluate the approach using the FacePix [1] dataset, which is ideal for the yaw motion: it consists of poses in the interval ±90° at 1° increments. This allows us to form several class configurations, as explained in Section 3.4.1 (e.g. 10 classes for 19 poses). We also test our approach on the CMU PIE dataset [2], which gives more variability in terms of illumination and expression (e.g. eyes closed or smiling). In addition to image datasets, we test the video sequences of the Boston University (BU) dataset [3], in which subjects perform free movements including yaw and roll variations. This allows us to estimate the roll (in-plane rotation) accuracy besides the yaw. Poses in the videos are predicted using the model built with the FacePix dataset. Video sequences were also recorded in our lab to reproduce situations of partial occlusion not present in the available datasets. In all experiments, we use the same parameters ε = 25, s = 20 and n = 3 for symmetry axis detection. The interval of α is [85°, 95°] for FacePix and CMU PIE and [45°, 135°] for the BU dataset, with a step of 3°. The results of our experiments are presented below.
4.1 Face Pix dataset
We use the FacePix dataset [1] to build a head pose model and to evaluate it. The FacePix database consists of three sets of face images: variable pose, variable dark illumination and variable light illumination. The variable-illumination sets contain only the frontal pose, which is why we use only the variable-pose set, composed of 181 pose images of 30 different subjects. Among the 181 poses, we use the poses varying from -45° to +45°, because beyond this interval the bilateral symmetry disappears from the image plane.
We test several configurations, changing the num-
ber of classes each time. Figure 6 shows the confusion
matrix for three classifiers : 19 discrete poses associ-
ated to the yaw angles from -45◦ to 45◦ with 5◦ step
(10 classes), 9 discrete poses associated to the yaw an-
gles from -40◦ to 40◦ with 10◦ step (5 classes) and 7
discrete poses associated to the yaw angles from -45◦
to 45◦ with 15◦ step (4 classes). One can see that the
estimated pose is in the diagonal of the matrix. How-
ever, the 7 poses model had the higher classification rate
but further experiments on continuous image sequences
(the BU dataset’s videos) reported in section (4.3) show
that the model of 19 poses give more accuracy on this
dataset.
Fig. 6 Confusion matrices associated with: (a) the 19-pose classifier with 5° step; (b) the 9-pose classifier with 10° step; (c) the 7-pose classifier with 15° step.
In order to evaluate the model, we split the data into 6 equal subsets and performed 6-fold cross-validation. In each run, 5 subsets are used as the training set and the remaining one as the test set. The subjects in the training and test sets are completely distinct, since each subject appears only once. On this dataset, we test the sensitivity of the method to the accuracy of the symmetry axis detection: we annotated the position of the head as well as the position and orientation of the symmetry axis in order to compare results in semi-automatic and fully automatic settings.
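With scikit-learn, this subject-disjoint 6-fold protocol can be reproduced with grouped cross-validation; X, y and groups (one subject id per sample) are placeholders, and the classifier is the AdaBoost stand-in introduced in Section 3.4.1:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# groups[i] holds the subject id of sample i, so each of the 6 folds keeps
# the training and test subjects completely distinct, as in the protocol above.
scores = cross_val_score(AdaBoostClassifier(n_estimators=50), X, y,
                         groups=groups, cv=GroupKFold(n_splits=6))
print("mean accuracy:", scores.mean())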
A detailed description of the results with 7 poses is shown in Table 1.

Table 1 Classification rates and Mean Absolute Errors (MAE) for the FacePix dataset in semi- and fully automatic modes.

Data | Accuracy (%) | MAE (°)
Head and symmetry axis annotated | 82.38 | 2.71
Head annotated, symmetry automatic | 81.90 | 2.78
Head and symmetry detection automatic | 79.63 | 3.14
When the errors related to head detection and/or symmetry axis detection are removed, the results outperform those of the fully automatic mode. The latter are not much worse, since the classification accuracy still reaches 79.6% for the seven-pose model.
4.2 CMU PIE dataset
The CMU Pose, Illumination and Expression dataset [2] contains images of 68 subjects with a step of 22.5° between poses. In our experiments we use the image set corresponding to variable expression (4 different expressions), the one recorded under variable lighting conditions (21 different flash orientations) and the set with subjects talking. For the first set (Expression), we use images of poses between -45° and 45°. We built a classifier for each set to study separately the robustness to varying expressions and lighting, and calculate the classification rate of each classifier using 6-fold cross-validation. We also merge all images into one encompassing set. The challenge with this dataset is the variable-lighting set: when there is an intense light source on a lateral side, the scene loses its symmetry.
Table 2 shows the results for each set as well as those for all images merged into one encompassing set. To achieve illumination invariance, the RGB histogram equalization is not sufficient, so we apply a discrete cosine transform (DCT) based normalization technique [42] to the full image. A number of DCT coefficients are truncated to minimize illumination variations, since these variations mainly lie in the low-frequency band. This truncation affects the matching process in the Expression set: the accuracy drops from 72.57% to 49.81%. This is unlike the Talking and Lighting sets, where DCT normalization did more good than harm: there, illumination affects the matching of symmetries more strongly than the noise added by the normalization does. On the other hand, DCT normalization gives better results on sets with a large number of learning images. In the CMU Expression set, each pose is represented by 3 or 4 images per person (neutral, blinking, smiling, and for certain subjects with glasses), whereas in the Talking set each pose has 60 images and in the Lighting set 23 images are recorded for each pose. The large number of images used for learning offsets the loss due to the normalization and even slightly improved the accuracy for the Talking set.
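A sketch of such a DCT-based normalization, under our own simplifying assumptions (log domain, zeroing a k × k block of low-frequency coefficients while preserving the DC term), is given below; the exact coefficient selection of [42] may differ:

import numpy as np
from scipy.fft import dctn, idctn

def dct_normalize(gray, k=8):
    # DCT of the log image; zero the k x k low-frequency block (where the
    # illumination variations lie) while preserving the DC coefficient,
    # then transform back and rescale to 8 bits.
    coeffs = dctn(np.log1p(gray.astype(np.float64)), norm="ortho")
    dc = coeffs[0, 0]
    coeffs[:k, :k] = 0
    coeffs[0, 0] = dc
    out = idctn(coeffs, norm="ortho")
    out -= out.min()
    return (255 * out / max(out.max(), 1e-9)).astype(np.uint8)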
Table 2 Results for the CMU PIE dataset.

Data | Classification accuracy (%), RGB Equalization | Classification accuracy (%), DCT
CMU Expression | 72.57 | 49.81
CMU Talking | 81.04 | 87.63
CMU Lighting | 72.51 | 85.90
CMU PIE | 72.48 | 82.26
4.3 Videos
We also run tests on videos, as we aim to use the solution in real environments to obtain real pose-related feedback. We test our method on the video sequences of the Boston University head pose dataset [3]. We recall that the inclination of the symmetry axis corresponds to the roll angle in the case of a frontal view. The yaw and the roll are calculated over all the frames in order to compare with the ground truth. In these experiments, we used the alternating decision tree trained on the FacePix dataset, as it covers the range of face poses better than CMU PIE, which has widely spaced poses (22.5° between poses). We ensure that the size of the face in the BU images is the same as in the FacePix dataset. The best results for the yaw are obtained using the model built with 19 discrete poses and a 5° step from the FacePix dataset, giving a 5.24° mean absolute error (MAE), a 6.80° root mean squared error (RMSE) and a standard deviation (STD) of 4.33°. Results for the yaw and the roll are shown in Table 3.
Table 3 Results for the BU dataset.

 | RMSE (°) | MAE (°) | STD (°)
Roll | 4.39 | 2.57 | 3.56
Yaw (FacePix model, 5° interval) | 7.60 | 5.12 | 5.62
Yaw (FacePix model, 15° interval) | 6.80 | 5.24 | 4.33
We exploit the temporal information contained in the video stream in order to reduce computation time: we use the position and the orientation of the symmetry axis of a given frame to reduce the search interval in the next frame, and perform a full check every 10 frames (approximately one second).
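This temporal narrowing can be expressed as a small helper; the margin of ±6° around the previous inclination is our own assumption, only the full re-check every 10 frames comes from the text:

def next_alpha_interval(prev_alpha, frame_idx, full=(45, 135),
                        margin=6, recheck_every=10):
    # Narrow the search around the previous frame's inclination; fall back
    # to the full interval every 10 frames (about one second of video).
    if prev_alpha is None or frame_idx % recheck_every == 0:
        return full
    return (max(full[0], prev_alpha - margin),
            min(full[1], prev_alpha + margin))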
4.4 Resolution and occlusion
We have conducted experiments which indicate that facial symmetry is a good geometrical indicator of head pose when the local geometric features (such as eyes, nose or mouth) are missing due to occlusions or wrong detections. When the head rotation exceeds 30° (in a left/right rotation), some feature points disappear from the image plane, but a partial symmetry still exists.

In order to measure the robustness of the approach, we generate low-resolution images from the FacePix dataset, where the head size was 80 × 80 pixels. We resize the head to generate two head image sets, the first of 40 × 40 pixels and the second of 25 × 25 pixels. We succeeded in detecting the symmetric features, something that cannot be done when relying on specific feature points. We built a 9-pose classifier for both sets using the parameters ε = 25, s = 2 and n = 3, with 85° ≤ α ≤ 95°. The accuracy of the first classifier is 74.1% and that of the second 63.8%. The accuracy drops from 79.6% because the method is based on local symmetries and our algorithm is sensitive to the symmetry axis calculation: on very low-resolution images, the local symmetries are not sufficiently reliable. Still, the results remain reasonable for heads of 25 × 25 pixels.
We also test the system with a web-cam in the laboratory, simulating local partial occlusions. As the process does not need interest points, partially occluded faces can be processed as long as there is at least one pair of symmetrical pixels in the image: all the texture pixels in the region of interest contribute to the demarcation of the symmetrical area. This can be seen in Figure 7.
Fig. 7 Sample frames from video sequences taken in the lab.
4.5 Summary and comparison with the state of the art
The main advantage of the method is that the computation can start at any pose, without any initialisation, since the head and the symmetry axis are automatically detected for poses between -45° and +45°. Also, new face images can be classified easily using an already built model.
In video sequences where the head performs free movements, wrong detections often occur. To resolve this issue, we exploit the continuity of movement: we exclude detections which are very far from those of the 3 previous frames, considering them wrong, and use an interpolated position of the head instead. The process is then fully automatic, but sensitive to the accuracy of the head detection and of the symmetry axis calculation. The system is robust to changes in lighting conditions and expression, and also to identity information, since the method is geometric. Besides, no specific points need to be detected on the face, so closed eyes or a partially occluded face give the same results as a complete face.
We compare our results with others obtained on the same datasets. Tian et al. [19] obtained ?? % of good classification on the CMU PIE dataset, against 82% for our method. Tables 4 and 5 show the results on the FacePix and BU datasets expressed in MAE, RMSE, STD and classification accuracy (Acc). These results show that our method provides comparable results on the CMU PIE and BU datasets. On FacePix, manifold embedding methods give good results, but there is no explicit solution for out-of-sample embedding in an LLE or LE manifold [4]; these methods are not automatic, unlike ours, where new data can be classified easily through an already built model of examples.

Table 4 Comparison of the yaw results with the state of the art using the FacePix dataset.

Method | Resolution | MAE (°) | Acc (%)
Hao et al. 2011 (Regression) | 60 × 60 | 6.1 | -
Xiangyang et al. 2010* (K-manifold clustering) | 16 × 16 | 3.16 | -
Vineeth et al. 2007* (Biased Isomap) | 32 × 32 | 5.02 | -
Vineeth et al. 2007* (Biased LLE) | 32 × 32 | 2.11 | -
Vineeth et al. 2007* (Biased LE) | 32 × 32 | 1.44 | -
Proposed | 80 × 80 | 3.14 | 79.63
Proposed | 40 × 40 | - | 74.1
Proposed | 25 × 25 | - | 63.8

* A significant drawback of manifold learning techniques is the lack of a projection matrix to treat new data points.

Table 5 Comparison of the BU dataset results with the state of the art.

Method | Angle | RMSE (°) | MAE (°) | STD (°)
Valenti et al. 2012 | Yaw | 6.10 (a) | - | 5.79 (a)
Valenti et al. 2012 | Roll | 3.00 (a) | - | 2.82 (a)
Morency et al. 2010 | Yaw | - | 4.97 | -
Morency et al. 2010 | Roll | - | 2.91 | -
Proposed | Yaw | 6.80 | 5.24 | 4.33
Proposed | Roll | 4.39 (b) | 2.57 (b) | 3.56 (b)

(a) Eye cues used; the pose is estimated only when the eyes are detected.
(b) The roll is estimated in the case of a frontal view.

5 Conclusion

We presented a new approach to perform head pose estimation. We exploit the bilateral symmetry of the face to deal with roll and yaw motions. The orientation of
the symmetry axis indicates the roll angle of the head. The symmetrical region of the face with respect to this orientation provides features, such as the width of the region, which allow us to classify and then predict yaw angles. Symmetrical features can be extracted without detecting specific facial landmarks, and neither calibration nor an initial frontal pose is required. The results obtained by our approach have been evaluated on public datasets, and they outline the good performance of our algorithm with regard to state-of-the-art methods. In future work, we will investigate new features which would allow us to estimate combined yaw and pitch poses. We will also explore more sophisticated regression methods to achieve the two degrees of freedom, and we plan to exploit the temporal correlation obtained from head tracking to extend the range of motion.
Acknowledgements This work was conducted in the context of the ITEA2 "Empathic Products" project, ITEA2 1105, and is supported by funding from DGCIS, France.
References
1. J. Black, M. Gargesha, K. Kahol, P. Kuchi, and S. Panchanathan, "A framework for performance evaluation of face recognition algorithms," in ITCOM, Internet Multimedia Systems II, Boston, 2002.
2. T. Sim, S. Baker, and M. Bsat, "The CMU pose, illumination, and expression database," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1615-1618, 2003.
3. R. Valenti and T. Gevers, "Robustifying eye center localization by head pose cues," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
4. E. Murphy-Chutorian and M. M. Trivedi, "Head pose estimation in computer vision: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 31, no. 4, pp. 607-626, 2009.
5. J.-G. Wang and E. Sung, "EM enhancement of 3D head pose estimated by point at infinity," Image Vision Comput., vol. 25, no. 12, pp. 1864-1874, Dec. 2007.
6. Y. Pan, H. Zhu, and R. Ji, "3-D head pose estimation for monocular image," in Fuzzy Systems and Knowledge Discovery. Springer, 2005.
7. S. Baker, I. Matthews, J. Xiao, R. Gross, T. Kanade, and T. Ishikawa, "Real-time non-rigid driver head tracking for driver mental state estimation," in 11th World Congress on Intelligent Transportation Systems, 2004.
8. A. Caunce, C. Taylor, and T. Cootes, "Improved 3D model search for facial feature location and pose estimation in 2D images," BMVC, 2010.
9. J. Huang, X. Shao, and H. Wechsler, "Face pose discrimination using support vector machines (SVM)," in International Conference on Pattern Recognition (ICPR), 1998.
10. M. Dahmane and J. Meunier, "Object representation based on Gabor wave vector binning: An application to human head pose detection," ICCV, 2011.
11. I. Chamveha, Y. Sugano, D. Sugimura, T. Siriteerakul, T. Okabe, Y. Sato, and A. Sugimoto, "Appearance-based head pose estimation with scene-specific adaptation," ICCV Workshops, 2011.
12. S. Li, Q. Fu, L. Gu, B. Scholkopf, Y. Cheng, and H. Zhang, "Kernel machine based learning for multi-view face detection and pose estimation," in Eighth IEEE International Conference on Computer Vision (ICCV), vol. 2, 2001, pp. 674-679.
13. B. Benfold and I. Reid, "Colour invariant head pose classification in low resolution video," BMVC, 2008.
14. Z. Zhang, Y. Hu, M. Liu, and T. Huang, "Head pose estimation in seminar room using multi view face detectors," in Multimodal Technologies for Perception of Humans, ser. Lecture Notes in Computer Science, R. Stiefelhagen and J. Garofolo, Eds. Springer Berlin Heidelberg, 2007, vol. 4122, pp. 299-304.
15. H. Ji, R. Liu, F. Su, Z. Su, and Y. Tian, "Robust head pose estimation via convex regularized sparse regression," ICIP, 2011.
16. A. Ranganathan and M.-H. Yang, "Online sparse matrix Gaussian process regression and vision applications," ECCV, 2008.
17. M. Al Haj, J. Gonzàlez, and L. S. Davis, "On partial least squares in head pose estimation: How to simultaneously deal with misalignment," CVPR, 2012.
18. E. Murphy-Chutorian, A. Doshi, and M. Trivedi, "Head pose estimation for driver assistance systems: A robust algorithm and experimental evaluation," in Intelligent Transportation Systems Conference (ITSC). IEEE, 2007, pp. 709-714.
19. Y.-L. Tian, L. Brown, J. Connell, S. Pankanti, A. Hampapur, A. Senior, and R. Bolle, "Absolute head pose estimation from overhead wide-angle cameras," in IEEE International Workshop on Analysis and Modeling of Faces and Gestures, 2003.
20. D. J. Beymer, "Face recognition under varying pose," CVPR, pp. 756-761, 1994.
21. J. Sherrah, S. Gong, and E.-J. Ong, "Understanding pose discrimination in similarity space," BMVC, 1999.
22. V. N. Balasubramanian, J. Ye, and S. Panchanathan, "Biased manifold embedding: A framework for person-independent head pose estimation," CVPR, 2007.
23. C. BenAbdelkader, "Robust head pose estimation using supervised manifold learning," ECCV, 2010.
24. D. Huang, M. Storer, F. De la Torre, and H. Bischof, "Supervised local subspace learning for continuous head pose estimation," CVPR, 2011.
25. X. Liu, H. Lu, and W. Li, "Multi-manifold modeling for head pose estimation," ICIP, 2010.
26. R. Valenti, N. Sebe, and T. Gevers, "Combining head pose and eye location information for gaze estimation," IEEE Transactions on Image Processing, vol. 21, no. 2, pp. 802-815, 2012.
27. S. O. Ba and J.-M. Odobez, "Multiperson visual focus of attention from head pose and meeting contextual cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 101-116, 2011.
28. M. Nabati and A. Behrad, "3D head pose estimation and camera mouse implementation using a monocular video camera," Signal, Image and Video Processing, pp. 1-6, 2012.
29. H. A. Rowley, S. Baluja, and T. Kanade, "Rotation invariant neural network-based face detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 1998, pp. 38-.
30. H. R. Wilson, F. Wilkinson, L. Lin, and M. Castillo, "Perception of head orientation," Vision Research, vol. 40, no. 5, pp. 459-472, 2000.
31. T. Luhandjula, E. Monacelli, Y. Hamam, B. van Wyk, and Q. Williams, "Visual intention detection for wheelchair motion," in International Symposium on Visual Computing (ISVC), 2009, pp. 407-416.
32. V. Pathangay, S. Das, and T. Greiner, "Symmetry-based face pose estimation from a single uncalibrated view," in 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2008), The Netherlands, 2008, pp. 1-8.
33. B. Ma, A. Li, X. Chai, and S. Shan, "Head yaw estimation via symmetry of regions," 2013, pp. 1-6.
34. M. Gruendig and O. Hellwich, "3D head pose estimation with symmetry based illumination model in low resolution video," in Lecture Notes in Computer Science, vol. 3175. Springer, 2004, pp. 45-53.
35. J. Harguess, S. Gupta, and J. Aggarwal, "3D face recognition with the average-half-face," in 19th International Conference on Pattern Recognition (ICPR), 2008, pp. 1-4.
36. K. Hattori, S. Matsumori, and Y. Sato, "Estimating pose of human face based on symmetry plane using range and intensity images," in Fourteenth International Conference on Pattern Recognition, vol. 2, 1998, pp. 1183-1187.
37. Z. Gui and C. Zhang, "3D head pose estimation using non-rigid structure-from-motion and point correspondence," IEEE TENCON, 2006.
38. P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2001, pp. I-511 - I-518.
39. H. Coxeter, Projective Geometry, 2nd revised ed. Springer-Verlag, 2003.
40. F. Stentiford, "Attention based facial symmetry detection," in Proc. ICAPR, 2005.
41. G. Holmes, B. Pfahringer, R. Kirkby, E. Frank, and M. Hall, "Multiclass alternating decision trees," in ECML. Springer, 2001, pp. 161-172.
42. W. Chen, M. J. Er, and S. Wu, "Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain," IEEE Transactions on Systems, Man, and Cybernetics, vol. 36, pp. 458-466, 2006.