International Journal of Current Trends in Engineering & Research (IJCTER)
e-ISSN 2455–1392 Volume 2 Issue 4, April 2016 pp. 321 - 336
Scientific Journal Impact Factor : 3.468
http://www.ijcter.com
@IJCTER-2016, All rights Reserved 321
FACIAL POINT DETECTION AND EMOTION RECOGNITION FOR A HUMANOID ROBOT
A.V. Kiranmai, S. Karimulla
Asst. Professor, Dept. of E.C.E,
SVEW, Tirupathi.
Abstract—Automatic perception of facial expressions under scaling differences, pose variations and occlusions would greatly augment natural human-robot interaction. This paper proposes unsupervised automatic facial point detection integrated with regression-based intensity estimation for facial action units (AUs) and emotion clustering to deal with such challenges. The proposed facial point detector is able to detect 54 facial points in images of faces with occlusions, pose variations and scaling differences using Gabor filtering, BRISK (Binary Robust Invariant Scalable Keypoints), an Iterative Closest Point (ICP) algorithm and fuzzy c-means (FCM) clustering. In particular, to deal effectively with occluded images, ICP is first applied to generate neutral landmarks for the occluded facial elements. FCM is then used to infer the shape of the occluded facial region from prior knowledge of the non-occluded facial elements. AU intensity estimation is subsequently conducted for 18 selected AUs using support vector regression and neural networks. FCM is also employed to identify seven basic emotions as well as neutral expressions, and shows great potential for detecting compound and newly arrived novel emotion classes. The overall system is integrated with a humanoid robot and enables it to deal with challenging real-life facial emotion recognition tasks.
Key Words—action units (AUs), BRISK, Iterative Closest Point (ICP), fuzzy C-means (FCM)
I. INTRODUCTION
In order to build robots that interact in a more humanlike and instinctive manner, perception
of human emotions is essential [1–3]. Automatic face and expression recognition has greatly
benefited such multimodal agent based interface development. However, detecting emotions from
natural facial expressions during real life human robot interaction could still be challenging because
of various pose and subject variations, illumination changes, occlusions and background confusion.
In particular, the original vision APIs provided by the SDK of the robot employed in this paper were not capable of handling such challenging facial emotion recognition tasks. Robust and accurate automatic face analysis is thus important for such real-life applications, since the performance of advanced applications such as facial action and emotion detection relies deeply on it. Many parametric and model-specific feature extraction approaches in the computer vision field have been proposed to estimate head pose and detect facial landmarks from real-life images, benefiting subsequent automatic facial behavior perception. However, many of these approaches find it difficult to balance high-quality feature extraction against the low computational requirements essential in real-time applications. This paper is thus motivated to develop a facial emotion detection system for a humanoid robot that handles emotion detection from images with pose variations, illumination changes, occlusions and background noise. A cost-effective unsupervised learning scheme for facial point detection is proposed as the first step of this work, implemented by incorporating a 2D Gabor filter, a novel feature descriptor, BRISK (Binary Robust Invariant Scalable Keypoints), an Iterative Closest Point (ICP) algorithm and fuzzy c-means (FCM) clustering. In order to deal with occluded images effectively, the proposed facial point detector first applies ICP to recover neutral landmark points for an occluded facial region. FCM is then applied to further
infer the shape of the occluded element by taking prior knowledge of the attributes of the non-occluded facial regions into account. Post-processing is subsequently applied to reconstruct the occluded facial region: after FCM yields the shape cluster of the occluded facial element, the five images in the cluster with the highest correlations to the test image are selected and their landmarks averaged to reconstruct the best-fitting geometry for the occluded facial element. The overall system architecture with facial point detection for non-occluded
images is provided in Fig. 1 whereas the system architecture with facial point detection for images
with occlusions is presented in Fig. 2.
Fig. 1. The overall system architecture with facial point detection for nonoccluded images
Fig. 2. The system architecture with facial point detection for images with occlusions
II. UNSUPERVISED FACIAL POINT DETECTION
Supervised facial feature detection based on AAM, ASM or CLM relies heavily on the training data, and such trained models do not always transfer well across databases. [4] indicated that a supervised AAM model trained with frontal-view images tends to have very moderate performance for facial feature extraction from non-frontal or multi-view images in other databases (e.g. PUT and LFW). In particular, its performance declines quickly when dealing with large pose variations, partial occlusions, scaling differences and background clutter [4], while CLM was designed more for real-time applications [4,20]. Therefore, in order to overcome these difficulties, we implemented an unsupervised robust landmark detector. Compared with the above supervised facial feature detection models, this unsupervised facial point detector is more flexible in dealing with diverse landmark detection tasks against pose variations, occlusions and scaling differences.
Algorithm 1. Intelligent Facial Point Detection and Emotion Recognition
Input:
(1) A test facial image
(2) A landmark file with 68 landmarks for any neutral image
Output:
(1) AU intensity of the selected 18 AUs
(2) The detected emotion of the test image
Begin
repeat
Step 1. Load the test input image and the source input landmark file with 68 landmarks taken from the
database.
Step 2. Process the test input image for feature point extraction.
For each image or frame
{
2.1 Conduct face and ROI detection.
2.2 Increase the contrast of each ROI.
2.3 Apply the bilateral filter to reduce the noise of each ROI.
2.4 Apply the Gabor filter on each ROI to detect its contour.
2.5 Apply the BRISK keypoint detector to extract landmarks.
2.6 Use the generated 21 facial points obtained from the combined outputs from 2.4 and 2.5 as the
reference point cloud and transform the source 68 neutral landmarks from CK+ to best match the
above 21 reference points using the Iterative Closest Point algorithm to produce a set of detected 54
facial points.
2.7 If (images contain occlusions)
{
FCM is applied to further infer the shape of the occluded facial element.
Post landmark correlation processing is then applied to derive the best fitting geometry for the
occluded facial element, adjust the neutral landmarks generated by ICP and output the final 54
detected landmarks.
}
2.8 Output and plot the generated 54 landmarks on the test image
}
Step 3. Measure AU intensity based on the detected 54 facial landmarks using SVRs and NNs.
Step 4. Use fuzzy c-means to conduct emotion clustering.
until ESC key is pressed; end
The proposed facial point detector incorporates several advanced feature extraction algorithms and balances high quality feature extraction against low computational requirements. It also offers efficient performance for real-time facial point detection. Algorithm 2 demonstrates the details of this proposed unsupervised keypoint detector.
Algorithm 2. Unsupervised Facial Point Detection
Input:
(1) A test facial image
(2) A source landmark file with 68 landmarks for any neutral image from the CK+ database
Output:
(1) 54 facial points for the test image
begin
repeat
1. Load the test input image from video inputs and the input landmark file with 68 landmarks taken from
CK+.
2. If (the image and landmark files not loaded) exit (0).
3. Load Haar Cascade classifiers for face and regions of interest (i.e. areas of eyes, nose and mouth)
detection.
4. Conduct feature point extraction.
For each image or frame
{
4.1 Convert the input image into a gray scale image.
4.2 Equalize the histogram values of the converted gray scale image to increase contrast.
4.3 Apply Haar face detector and extract the face region.
4.4 Apply Haar Cascade feature detectors on the detected face region to extract ROIs.
4.5 Increase the contrast of each ROI.
4.6 Apply the bilateral filter to reduce the noise of each ROI.
4.7 Apply the Gabor filter on each ROI to detect its edge and derive the initial 16 facial points.
4.8 Apply the BRISK feature detector and extend the detected number of landmarks to 21.
4.9 Apply the Iterative Closest Point algorithm to obtain a set of 54 detected landmarks.
{
4.9.1 Retrieve the previously loaded source 68 neutral landmark cloud provided by the database.
4.9.2 For each ROI, apply the Iterative Closest Point algorithm to reserve 2D facial curves and reconstruct
2D surface with the corresponding source neutral landmark cloud and the corresponding detected
reference facial point cloud obtained from 4.8 as inputs.
}
4.10 If (Occlusion occurred)
{
4.10.1 Apply FCM to infer the shape of the occluded facial region.
4.10.2 Apply post landmark correlation processing to derive the best fitting geometry for the occluded facial
element and adjust the neutral landmarks generated by ICP.
}
4.11 Output the generated 54 landmarks of the test image.
until ESC key is pressed; end
2.1. Face and Region of Interest Detection
We first conduct some pre-processing of input images before applying the face detection algorithm: an input image is converted into gray scale, and a histogram equalization method is then applied to improve the contrast of the converted image. The face region in the test image is then detected using the improved face detection algorithm.
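This contrast step can be sketched with a minimal numpy histogram equalization (OpenCV's equalizeHist performs the same mapping for 8-bit images):

```python
import numpy as np

def equalize_histogram(gray):
    """Histogram equalization for an 8-bit grayscale image.

    Maps pixel intensities through the normalized cumulative
    histogram so the output spreads over the full 0-255 range.
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                 # first non-zero bin of the CDF
    # Classic equalization formula, scaled back to 0..255.
    lut = np.round((cdf - cdf_min) / (gray.size - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]
```

A low-contrast image (e.g. intensities confined to 100–159) is stretched so its darkest pixel maps to 0 and its brightest to 255, which makes the eye, nose and mouth areas more visible to the subsequent detectors.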
As an important step in automatic facial feature extraction, ROI detection is subsequently performed. In this paper, we employ three cascade classifiers, i.e. the right and left eye cascade classifiers and a lower facial region cascade classifier, to detect ROIs. These three cascade classifiers are borrowed from OpenCV [46,47]. According to [46,47], these classifiers were respectively trained with a large number of positive and negative images in order to detect each ROI efficiently. The ROIs are detected in the following way. First of all, histogram equalization is applied to increase the contrast of the face region returned by the face detector, to gain more visibility of the areas of the eyes, nose and mouth. The detected face region is then divided into three sections in order to retrieve ROIs with high accuracy: the right and left upper-face areas and a lower face region (see the left diagram in Fig. 3).
The three cascade classifiers for ROI detection are then respectively applied to these three facial parts. The recovered ROIs include the positions of both eyes along with the eyebrows, and the locations of the nose and the mouth (see the right diagram in Fig. 3). This ROI detection also proves efficient, achieving 100% accuracy on a test set of 1000 selected images. It is also able to detect ROIs successfully from real-time video inputs with rotations of up to 60 degrees.
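The three-way face division described above can be sketched as follows (the half-and-half split ratios are assumptions for illustration; the paper does not state them):

```python
def divide_face(x, y, w, h):
    """Split a detected face rectangle (x, y, w, h) into the three
    search regions used for ROI detection: left-upper and right-upper
    face (for the eye cascades) and the lower face (for nose/mouth).

    The 50/50 split ratios here are illustrative assumptions.
    """
    upper_h = h // 2
    left_upper = (x, y, w // 2, upper_h)
    right_upper = (x + w // 2, y, w - w // 2, upper_h)
    lower = (x, y + upper_h, w, h - upper_h)
    return left_upper, right_upper, lower
```

Running each cascade only inside its own sub-rectangle both reduces false detections (a mouth cannot be found in the upper face) and cuts the search area per classifier.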
Fig. 3. Face division (left) and ROI detection (right).
For a test image with the mouth area occluded, the occluded mouth area is not detected by the Haar cascade detectors while detecting the ROIs. That is, if an ROI is not detected, this indicates that the image is occluded in that region, and image occlusion is thereby detected. A further detailed discussion of the reconstruction of occluded facial regions is provided in Section 2.4. The subsequent Gabor filter and BRISK based analysis is not applied to the occluded facial region, but only to the non-occluded facial elements to identify key facial points.
2.2. Gabor Filtering Based Feature Extraction
The detected ROIs are then further processed to recover their corresponding borders and contours by
using bilateral and Gabor filtering. First, a bilateral filter is employed to reduce the noise of each
ROI. In comparison to other filtering algorithms (such as a normalized box filter and a median filter),
bilateral filtering is not only a nonlinear and noise reducing smoothing process for images, but also
edge preserving [48,49]. It replaces the intensity value of each pixel with a weighted average of
intensity values from nearby pixels to reduce the noise and preserve sharp edges. Bilateral filtering
has been applied individually on each ROI. This process helps us to reduce noise in the input image
and increase the accuracy of the subsequent Gabor filtering based edge detection for each ROI.
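A naive bilateral filter can be written directly from this description (parameter values are illustrative; OpenCV's bilateralFilter is the practical, optimized choice):

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Naive bilateral filter for a float grayscale image.

    Each pixel becomes a weighted average of its neighbours, with
    weights that fall off both with spatial distance (sigma_s) and
    with intensity difference (sigma_r), so strong edges survive.
    """
    h, w = img.shape
    pad = np.pad(img, radius, mode='edge')
    out = np.zeros_like(img, dtype=float)
    norm = np.zeros_like(out)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy: radius + dy + h,
                          radius + dx: radius + dx + w]
            w_spatial = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
            # Range weight: nearly zero across a strong intensity jump,
            # which is what preserves the edge.
            w_range = np.exp(-((shifted - img) ** 2) / (2 * sigma_r ** 2))
            weight = w_spatial * w_range
            out += weight * shifted
            norm += weight
    return out / norm
```

On a hard step edge the pixels on either side barely mix, because the range weight across the jump is vanishingly small, while uniform regions are smoothed as by a Gaussian.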
We subsequently employ Gabor filtering to detect the contour of each ROI. A Gabor filter is a linear filter used in the computer vision field for edge detection. Gabor filters are rotation-sensitive local frequency detectors with optimal localization properties in both the spatial and frequency domains, which makes them suitable for texture segmentation analysis. Gabor filters with different frequencies and orientations can form a filter bank, which has proved effective for feature extraction from images [50,51]. In this research, we employ a 2D Gabor filter to detect the edge of each ROI. Previous work [52] also indicated that Gabor filtering based feature detection performed better than principal component analysis, local feature analysis and Fisher's linear discriminant. Most importantly, Gabor filters are able to remove light and contrast variations in images while preserving their robustness.
The employed 2D Gabor filter is defined as follows:

g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′² + γ²y′²)/(2σ²)) cos(2πx′/λ + ψ)

x′ = x cos θ + y sin θ
y′ = −x sin θ + y cos θ

where λ is the wavelength of the sinusoidal factor, θ is the anti-clockwise rotation of the Gaussian and the plane wave, ψ is the phase offset, σ is the standard deviation of the Gaussian envelope, and γ is the spatial aspect ratio [50]. In this paper, we apply the Gabor filter to each ROI to detect its edge.
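A minimal numpy construction of such a 2D Gabor kernel, using the parameter names defined above (kernel size and default values are illustrative):

```python
import numpy as np

def gabor_kernel(ksize=21, sigma=4.0, theta=0.0, lambd=10.0, gamma=0.5, psi=0.0):
    """2D Gabor kernel: a Gaussian envelope modulated by a sinusoidal
    carrier, both rotated by theta."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_r = x * np.cos(theta) + y * np.sin(theta)    # x' = x cos(theta) + y sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)   # y' = -x sin(theta) + y cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * x_r / lambd + psi)
    return envelope * carrier
```

Convolving an ROI with a small bank of such kernels at different θ responds strongly along edges oriented near θ, which is what produces the white edge pixels used in the next step (OpenCV exposes the same construction as getGaborKernel).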
Fig. 4. Edge detection using a 2D Gabor filter
Example outputs of the Gabor filter are shown in Fig. 4. These include the detected outlines of the
eyes, mouth and nose. These outputs also give the detected important edge regions in white color
with the rest in black color pixels. Also, in order to recover four key points for each ROI based on
these filtering outputs, connect the biggest and closest blocks of white pixels to form a big block and
draw a rectangle around it. A key point is then placed on the left and right sides of the rectangle and at the midpoints of its upper and lower sides (see Fig. 4). We thus recover 16 key points covering both eyes, the nose and the mouth, with four facial points allocated to each of these facial elements. This 2D Gabor filter is also able to deal with head rotations effectively. However, it is not able to recover any key points for the eyebrows, which may play important roles in emotional facial expressions (e.g. AU1 and AU2 are present in the surprised emotion). In order to increase the system's robustness, identify facial features for both eyebrows and further validate the 16 detected facial points for subsequent facial analysis, we employ a robust novel key point descriptor and detector, BRISK. This key point detector further extends the currently detected facial landmarks and is also robust enough to deal with scaling differences, pose variations and head rotations. Built on the high-speed corner detector FAST, BRISK is able to provide more reliable corner detection results than the 2D Gabor filter. It also shows high levels of repeatability under diverse changes of head pose, rotation and scale. In this paper, a sequence of 100 to 200 landmarks is generated by BRISK for all ROIs. An example output of BRISK is provided in Fig. 5. Because BRISK is scale and rotation invariant, our facial point detector is able to reliably deal with rotations and pose variations of at least up to 60 degrees in real-time applications.
Fig. 5. Keypoint detection for each ROI using BRISK (left) and the final detected 21 points (right).
Although Gabor filtering is not as accurate as BRISK based facial point detection, it generates one core corner detection point for each potential candidate corner without any overlap or redundancy, which provides a good reference for the BRISK based feature detection. We therefore combine the two sets of key point outputs for each ROI, produced respectively by the Gabor filter and BRISK, to generate the final output landmarks for each facial element and further increase the accuracy of feature point detection.
The processing shown in Algorithm 3, which consists of three steps, is conducted in order to combine the landmarks generated by BRISK and the Gabor filter.
Algorithm 3. Output Combination of Gabor and BRISK
Input:
(1) The outputs produced by Gabor filter and BRISK
Output:
(1) 21 selected keypoints
begin
Step 1. //Reduce the number of feature points detected by BRISK
For each keypoint in BRISK's output
{
1.1 Apply circle-circle intersection method.
1.2 If (the keypoint is intersected more than 5 times)
1.2.1 Consider it as an important keypoint.
else
1.2.2 Remove the keypoint from the feature set.
1.3 Add the newly selected keypoint to the new set of BRISK feature output.
}
Step 2. For each keypoint in new set of BRISK feature output and Gabor filter output
{
2.1 Apply circle-circle intersection method.
2.2 If (New BRISK feature output keypoint overlapping the Gabor filter output keypoint)
2.2.1 Use BRISK keypoint as the feature point.
else
2.2.2 Remove the keypoint from feature set.
}
Step 3. Combine the eyebrow keypoints generated by BRISK with the above new output.
end
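Step 1 of Algorithm 3 can be sketched as follows (the circle radius and threshold are illustrative assumptions; the paper fixes only the "more than 5 intersections" rule). Two equal-radius circles intersect exactly when their centres are closer than twice the radius:

```python
import numpy as np

def filter_brisk_keypoints(points, radius=3.0, min_intersections=5):
    """Keep only keypoints whose circle (of the given radius)
    intersects more than `min_intersections` other keypoint circles,
    i.e. points lying in dense clusters; isolated responses are
    treated as noise and dropped."""
    pts = np.asarray(points, dtype=float)
    # Pairwise centre distances; equal-radius circles intersect when
    # 0 < d < 2 * radius (d == 0 excludes the point itself).
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    counts = ((d > 0) & (d < 2 * radius)).sum(axis=1)
    return pts[counts > min_intersections]
```

A tight cluster of BRISK responses around a true corner survives the filter, while scattered single responses elsewhere in the ROI are removed before the overlap check against the Gabor output in Step 2.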
Most importantly, unlike Gabor filtering, BRISK is able to provide landmarks for both eyebrows. Eventually, 21 landmarks are generated from the combination of the BRISK and Gabor filtering based facial point detection: three points for each eyebrow, four for each eye, three for the nose and four for the mouth contour.
III. FACIAL POINT DETECTION FOR OCCLUSIONS USING ICP AND FCM
The ICP algorithm is usually employed to reconstruct 2D or 3D surfaces and restore 2D curves in
computer vision research. It is an algorithm employed to minimize the difference between two sets of
points. It employs one reference and one source point cloud as inputs. The reference point cloud will
be kept unchanged while the source point cloud will be transformed to provide the best match to the
reference set [58]. In this research, we take the 21 combined landmark outputs generated by BRISK
and Gabor filtering as the reference point cloud and employ a set of 68 neutral landmarks of a
randomly selected neutral image provided by the CK+ database as the source point cloud. Since we aim to detect 54 facial landmarks for a test image, we select only 54 of the 68 original neutral points, discarding the 14 landmarks that describe the overall facial contour. We then align the neutral landmarks on each ROI of the test image. The
reason that the alignment for each ROI is done separately is to make the neutral landmarks best fitted
with the test image.
Algorithm 4. Iterative Closest Point Algorithm
Input:
(1) 21 reference landmark points generated by Algorithm 3 and neutral 68 source landmark points.
Output:
(1) 54 generated landmarks for the test image.
begin
while (the maximum number of iterations is not met)
{
Step 1. To select each point in the source point cloud and each point in the reference point cloud for each ROI
and match these points to compute the closest points.
Let S be the source point set with Ns points, S = {s_l} for l = 1, 2, ..., Ns, and R the reference point set with Nr points, R = {r_k} for k = 1, 2, ..., Nr. The Euclidean distance between a source point s and the reference set R is

d(s, R) = min_{r in R} ||r − s|| (5)

Referring to C as the closest point operator, the resulting set of closest points, Y, is obtained as follows:

Y = C(S, R) (6)
Step 2. To estimate the transformation using a mean squared error in order to best align each source point to
its match found.
After obtaining the closest point set Y , the least squares quaternion operation, Q as mentioned in [58], is used
to compute the least squares registration as follows:
(q, dms) = Q(S, Y ) (7)
where dms is the mean square point matching error and q is the registration vector.
Step 3. To transform the source points using the obtained transformation (e.g. rotation and translation).
Update the positions of the data set S until it reaches the termination point. The update of positions is given as
follows:
S = q(S) (8)
where q(S) denotes the updated point set S after transformation by the registration vector, q.
}
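Algorithm 4 can be sketched in numpy as follows. Note one substitution: this sketch uses an SVD-based (Kabsch) least-squares registration in place of the quaternion operator Q of [58]; in 2D both yield the same optimal rigid transform.

```python
import numpy as np

def icp_2d(source, reference, max_iter=50):
    """2D ICP sketch: match each source point to its closest reference
    point (Eqs. 5-6), estimate the least-squares rigid transform, and
    update the source points (Eq. 8) until the iteration limit."""
    S = np.asarray(source, dtype=float).copy()
    R = np.asarray(reference, dtype=float)
    for _ in range(max_iter):
        # Closest-point matching: Y = C(S, R)
        d = np.linalg.norm(S[:, None, :] - R[None, :, :], axis=-1)
        Y = R[d.argmin(axis=1)]
        # Least-squares rigid registration of S onto Y (Kabsch, not
        # the quaternion method of [58]).
        mu_s, mu_y = S.mean(0), Y.mean(0)
        U, _, Vt = np.linalg.svd((S - mu_s).T @ (Y - mu_y))
        rot = Vt.T @ U.T
        if np.linalg.det(rot) < 0:          # guard against reflections
            Vt[-1] *= -1
            rot = Vt.T @ U.T
        S = (S - mu_s) @ rot.T + mu_y       # rotation + translation update
    return S
```

Given a source cloud that is a mildly rotated and translated copy of the reference (as when neutral CK+ landmarks are fitted to a detected face), the loop recovers the alignment within a few iterations.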
These detected landmarks for the overall image are subsequently used for SVR based facial AU intensity estimation and FCM based emotion recognition. Moreover, as discussed earlier, the NAO robot's original ALFaceDetection vision API is only able to provide 31 2D points for a facial representation, including the contour of the mouth (eight points) and the nose.
Using ICP, for each ROI we align the source neutral landmarks provided by the database to the reference points generated by BRISK and Gabor filtering, using iterative transformations to gradually reduce the distance between the source and the reference points. The reference point cloud, i.e. the points generated by BRISK and Gabor filtering, is kept unchanged, while the source point cloud, i.e. the neutral points, is transformed to best match the reference point cloud. The ICP algorithm first retrieves the closest point in the reference point set for each source point. It then calculates rotations and translations between each pair of source and reference points in order to best align them. Subsequently, it transforms the source points based on the above estimation. This process continues until the stopping criterion (i.e. the maximum number of iterations) is reached [58]. The pseudo code of the ICP algorithm, with equations following [58], is provided in Algorithm 4. In this application, the ICP algorithm transforms the 54 source neutral landmarks to best match the 21 reference facial points. It allows us to reconstruct 2D surfaces and thus extend the detected facial points from 21 to 54. The 54 landmarks recovered by ICP include 5 points for each eyebrow, 6 landmarks for each eye, 9 for the nose, 20 for the mouth, and 3 points for the chin. An example output of the ICP algorithm is shown in Fig. 6. Fig. 7 shows the overall detected 54 facial landmarks for an example image.
Fig. 6. (a) The combined output of BRISK and Gabor filtering. (b) The fitted 68 neutral landmarks provided by CK+. (c) A set of 54 facial point outputs generated by ICP with (a) and (b) as inputs.
Fig. 7. An example image from CK+ with the detected 54 facial points.
In summary, facial occlusion in an image is detected while detecting the ROIs: if an ROI (e.g. the mouth) is not detected, the image is occluded in that region. The step-by-step procedures for occlusion detection (steps 1 and 2) and landmark generation for the overall image (steps 3–6) are summarized at the end of this section.
However, although the ICP algorithm is able to reconstruct the missing facial features, and achieves reasonable performance especially for images from real-life applications (e.g. with subtle expressions), the facial landmarks generated by ICP always represent a neutral facial element. Therefore, in order to restore emotional expression for the occluded facial regions, FCM clustering, an unsupervised learning technique, is applied to further reason about the shapes of the occluded facial elements. First of all, the whole face is divided into three parts: the left eye, right eye, and lower facial regions. For each facial region, one FCM is employed to infer its shape. Overall, three FCM algorithms are developed to respectively recover the contours or shapes of the occluded left eye, right eye and mouth. Each FCM uses landmarks of the non-occluded facial regions as inputs and outputs three clusters representing an opened, narrowed and neutral facial element. For reasoning about the shape of the mouth, 22 landmarks representing both eye regions (10 denoting the eyebrows and 12 denoting the eyes) are used as inputs, whereas for inferring the shape of either eye, the 20 landmarks forming the geometry of the mouth and the 11 points representing the other visible eye region are used as inputs. Depending on which facial element is occluded, the three output clusters of FCM represent either mouth wide open/lip corner puller/closed neutral mouth, or a widened/tightened/neutral contour of the eye.

Subsequently, the output shape information of each FCM is used to guide a post landmark correlation processing step that further adjusts the neutral facial landmarks of the occluded element produced by ICP. In this research, FCM and the related post-processing are applied on top of ICP only if occlusion occurs in the test image; otherwise landmark generation completes after the application of ICP. As discussed earlier, after the attribute or shape of the occluded facial element of a test image is predicted by FCM, the test image is grouped into a specific shape cluster (e.g. mouth open if the mouth is occluded). Within this shape cluster, landmark correlations between the visible facial elements of the test instance and the corresponding facial elements of the other samples in the cluster are calculated. The five images with the highest correlations to the visible facial elements in the test image are then selected, and their landmark points for the corresponding occluded facial element (e.g. the mouth) are retrieved and averaged to produce the landmarks for the occluded facial element in the test image.

Overall, landmark detection using both ICP and FCM with post correlation processing achieves better detection accuracy than using ICP alone for images with occlusions. This is because the clustering technique further reasons about the shape of the occluded facial element based on prior knowledge of the non-occluded facial regions, and the best-fitting geometry for the occluded region is retrieved from highly correlated images with similar emotion indications to the test image.
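The top-five correlation averaging can be sketched as follows (the Pearson correlation measure and the flattened-landmark representation are assumptions for illustration; the paper does not specify its exact correlation metric):

```python
import numpy as np

def reconstruct_occluded(visible_test, visible_cluster, occluded_cluster, top_k=5):
    """Post-correlation processing: correlate the visible landmarks of
    the test image with those of every sample in the predicted shape
    cluster, take the `top_k` most correlated samples, and average
    their landmarks for the occluded element."""
    v = np.asarray(visible_test, dtype=float).ravel()
    # Pearson correlation between visible-landmark vectors (assumed metric).
    corrs = np.array([np.corrcoef(v, np.asarray(s, dtype=float).ravel())[0, 1]
                      for s in visible_cluster])
    best = np.argsort(corrs)[::-1][:top_k]    # indices of most correlated samples
    return np.asarray(occluded_cluster, dtype=float)[best].mean(axis=0)
```

Samples whose visible landmarks closely track the test image dominate the reconstruction, while poorly correlated (e.g. differently expressed) samples in the same shape cluster are ignored.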
1. We apply cascade classifiers for ROI detection in order to identify the positions of the left eye, right eye, nose tip and mouth.
2. If any of the ROIs is occluded, the corresponding Haar cascade detector will not detect it; that is, image occlusion is detected.
3. Only the detected ROIs are further processed using the Gabor filter and BRISK to recover key landmarks for these facial regions. In other words, the Gabor filter and BRISK are not applied to the occluded facial regions.
4. After applying the Gabor filter and BRISK, we obtain a set of landmark points (fewer than 21) for the non-occluded facial elements (e.g. a total of 17 landmarks for a test image with mouth occlusion).
5. We then employ a set of neutral landmarks with ICP to construct 54 landmarks for the overall image, with neutral landmarks recovered for the occluded region (i.e. the mouth).
6. To further reconstruct the landmarks for the occluded facial region, we employ FCM and post correlation processing to identify the best fitting geometry, further adjust the shape of the occluded facial element and output the final set of 54 landmarks.
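The FCM shape inference used above relies on standard fuzzy c-means; a minimal numpy version is sketched below (cluster count, fuzzifier m = 2 and iteration count are the common textbook defaults, not values from the paper):

```python
import numpy as np

def fuzzy_c_means(X, n_clusters=3, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means: alternate between weighted centroid
    updates and fuzzy membership updates. Returns (memberships U,
    centroids C); each row of U sums to 1."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]          # fuzzy centroids
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1) + 1e-12
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)               # normalize per sample
    return U, C
```

For the occlusion step, X would hold the visible-landmark vectors of the training samples and the three clusters would play the roles of the opened/narrowed/neutral shapes; soft memberships also make ambiguous or compound expressions visible as split memberships.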
IV. AU INTENSITY ESTIMATION AND EMOTION CLUSTERING
Since geometric features capture the physical cues embedded in emotional facial expressions effectively, they are widely used for facial action and emotion recognition [12,33]. In this paper we use motion-based geometric features, i.e. the derived 54 facial points, to estimate AU intensity and recognize emotions of facial images, employing SVR and NNs for the AU intensity estimation of the 18 selected AUs. FCM clustering is also used to detect the eight basic emotion categories, including happiness, anger, sadness, disgust, surprise, fear, contempt and neutral. Experiments on compound emotion recognition using FCM are also conducted.
4.1. AU intensity estimation
Among the 32 AUs defined by the FACS, this paper focuses on the recognition of 18 AUs closely
associated with the expression of eight basic emotions including Inner Brow Raiser (AU1), Outer
Brow Raiser (AU2), Brow Lowerer (AU4), Upper Lid Raiser (AU5), Cheek Raiser (AU6), Lid
Tightener (AU7), Nose Wrinkler (AU9), Upper Lip Raiser (AU10), Lip Corner Puller (AU12),
Dimpler (AU14), Lip Corner Depressor (AU15), Lower Lip Depressor (AU16), Chin Raiser (AU17),
Lip Stretcher (AU20), Lip Tightener (AU23), Lip Pressor (AU24), Lips Part (AU25), Jaw Drop and
Mouth Stretch (AU26/27). We respectively employ 18 SVRs and 18 NNs to measure the intensities
of these 18 selected AUs with each SVR/NN dedicated to intensity estimation for each AU. As
discussed earlier, in recent research, several techniques have been proposed for AU intensity
measurement [30,32]. We employ SVR- and NN-based AU intensity estimation in this paper because of their promising performance and robustness in modeling the problem domain.
As is well known, SVR is a nonlinear regression technique. It uses a nonlinear mapping to transform the original training data into a higher dimension and computes a linear regression function in this transformed high-dimensional feature space. SVR aims to identify the most suitable hyperplane, which is able to accurately predict the distribution of data within an error tolerance value of ε. In
comparison to support vector classification (SVC), SVR estimates a function of the data rather than classifying the data into two or more distinct classes. The crucial differences between SVR and SVC are:
(1) The predicted label of an instance is a continuous value in SVR but a discrete one in SVC. (2) In SVR there is a tolerable error, ε, between the predicted and actual label values, whereas in SVC the predicted and actual class types either match exactly or not at all. SVR has been applied in various fields to solve regression problems, such as financial time series forecasting [59].
MSE = (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)^2

where y_i is the predicted value and ŷ_i is the original annotation. The best parameter set of C, gamma and epsilon within the search space in our application is identified for each SVR, which
yielded the lowest MSE for each AU intensity estimation. The identified optimal parameters for each AU are then used for the training and testing of the corresponding nonlinear SVR model; 18 SVRs are employed for AU intensity measurement of the 18 selected AUs.
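The ε tolerance and the MSE selection criterion above can be illustrated with a small NumPy sketch (the intensity values below are made up for illustration, not the paper's data):

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error between predicted and annotated AU intensities."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return float(np.mean((y_pred - y_true) ** 2))

def eps_insensitive(y_pred, y_true, eps=0.1):
    """SVR's epsilon-insensitive loss: deviations inside the eps tube are free,
    which is the 'tolerable error' separating SVR from SVC."""
    err = np.abs(np.asarray(y_pred, float) - np.asarray(y_true, float))
    return np.maximum(err - eps, 0.0)

y_true = [0.0, 0.5, 1.0]                 # annotated AU intensities (hypothetical)
y_pred = [0.05, 0.45, 0.80]              # SVR predictions (hypothetical)
print(mse(y_pred, y_true))               # 0.015
print(eps_insensitive(y_pred, y_true))   # [0.  0.  0.1]
```

A grid search over (C, gamma, epsilon) would then keep, for each AU, the parameter triple whose cross-validated MSE is lowest.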
Each image also contains 68 AAM-tracked two-dimensional landmark points, stored in an individual file. The database also includes 593 FACS-coded peak-frame emotional images, of which 327 peak facial images have been annotated with the selected seven emotion labels, including happiness, anger, sadness, disgust, surprise, fear and contempt [40,41]. Therefore, 200 FACS-coded peak emotional images with AU intensity annotation and 50 neutral images are used for the cross-validation of these 18 SVRs. Moreover, a different number of images is used for the training of each SVR, based on the availability of the corresponding AU among the extracted 250 training images; this ranges from 15 images for the training of intensity estimation for AU10 to 125 images for AU25. On average, 75 images are used to train each SVR.
4.2. Emotion clustering
In this work, the OpenCV version of the LibSVM package is also recompiled under the latest NAO C++ SDK and integrated with NAO to perform real-time facial AU intensity estimation.
Facial emotion recognition has been carried out using many supervised learning techniques in the past, e.g. NNs and SVMs [6,12,33,36]. In contrast, we employ the unsupervised FCM clustering technique to recognize the seven basic emotions and neutral expressions. This clustering technique also shows great potential in detecting compound and newly arrived novel emotion classes.
Clustering algorithms generally organize objects into groups based on similarity criteria. In order
to deal with different clustering mining tasks, clustering techniques can be classified into the
following categories: partitioning, hierarchical, density based and grid based clustering methods.
However, these basic clustering algorithms tend to force clustering an object into only one cluster.
Sometimes, this rigid clustering may not be desirable in some problem domains. For instance, a compound facial emotion may belong to more than one emotion cluster. An
online shopping review may contain comments related to several products, which may fall into
several product clusters. Therefore, in this work we employ probabilistic model-based clustering to allow one facial expression image, represented by facial actions, to be grouped into more than one emotion cluster [63]. It is even more useful when a weighting is also calculated to reflect the strength with which an object belongs to a cluster. Thus, FCM, a well-known probabilistic model-based clustering algorithm, is used in this paper to detect emotions. Given a set of objects, o1, o2, . . . , on, and k predefined fuzzy clusters, C1, C2, . . . , Ck, FCM clustering groups each object into more than one cluster and generates a partition matrix, M = [wij] (1 ≤ i ≤ n, 1 ≤ j ≤ k), where wij is the degree of membership of a data point in a cluster. The partition matrix also needs to fulfil the
following criteria [63]: (1) the degree of membership wij of each object oi in a cluster Cj should lie in the range [0, 1]; (2) for each object, the memberships over all the clusters should sum to 1; and (3) each cluster should have at least one object with a nonzero membership value.
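These partition-matrix constraints can be checked mechanically. The following NumPy sketch (an illustration, not part of the paper's pipeline) encodes the usual FCM conditions: memberships in [0, 1], each object's memberships summing to 1, and no empty cluster:

```python
import numpy as np

def is_valid_partition(M, tol=1e-9):
    """Check the usual FCM partition-matrix constraints: every membership
    w_ij lies in [0, 1], each row (object) sums to 1 over the k clusters,
    and every cluster has at least one nonzero membership."""
    M = np.asarray(M, float)
    in_range = bool(((M >= -tol) & (M <= 1 + tol)).all())
    rows_sum_to_one = bool(np.allclose(M.sum(axis=1), 1.0))
    no_empty_cluster = bool((M.max(axis=0) > tol).all())
    return in_range and rows_sum_to_one and no_empty_cluster

print(is_valid_partition([[0.7, 0.3], [0.2, 0.8]]))   # True
print(is_valid_partition([[0.7, 0.5], [0.2, 0.8]]))   # False: a row sums to 1.2
```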
In order to calculate a probability distribution over the clusters to obtain degrees of membership
for each data object, an expectation maximization (EM) algorithm is employed. It includes two
procedures: the expectation and maximization steps. The algorithm starts with initialized random
parameters and iterates until the clustering cannot be improved (i.e. the clustering converges or the
change to the membership values between the most recent two iterations is sufficiently small). In each iteration, it calculates the center of each cluster and updates memberships for fuzzy clustering.
Both the expectation (E-step) and maximization (M-step) are involved in each iteration. The E-step first selects random data instances as the initial centers of the clusters. It then calculates degrees of membership for each data point in each cluster, following the idea that if a data point, oi, is closer to a cluster, Cj, the degree of membership of oi for Cj should be higher. For each data object, the sum of its membership values over all the clusters equals 1.
Thus, the E-step produces fuzzy memberships and partitions objects into the clusters.
Algorithm 5. Fuzzy C-Means Clustering
Input: An input file containing:
(1) The number of test data points
(2) The predetermined number of clusters
(3) The number of dimensions of the data points
(4) The fuzziness coefficient m (greater than 1; here m = 2)
(5) The predefined termination criterion (a threshold value, here 0.0005)
(6) The overall test data objects
Output: (1) The membership matrix
begin
1. Initialize parameters including the number of test data points, the predetermined number of clusters,
the number of dimensions of the data points, the fuzziness coefficient and the termination threshold
value.
2. Initialize degrees of membership for all data points for each cluster with random values.
repeat
3. Calculate the centres of cluster vectors using Equation 11.
4. Update the degrees of membership for all data points using Equation 10.
5. Identify the differences between the new and old memberships for all data points and find the maximum difference using Equation 13.
until the maximum difference falls below the predefined termination threshold
6. Output the membership matrix, where each entry wij gives the degree of an object xi belonging to a cluster Cj [63].
end
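Algorithm 5 can be sketched end to end as a generic NumPy implementation of fuzzy C-means under the stated settings (m = 2, termination threshold 0.0005). The centre and membership updates below stand in for Equations 11 and 10, which are not reproduced in this excerpt:

```python
import numpy as np

def fcm(X, k, m=2.0, threshold=5e-4, max_iter=300, seed=0):
    """Fuzzy C-means per Algorithm 5: random initial memberships, then
    alternating centre and membership updates until the largest membership
    change falls below the termination threshold."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    W = rng.random((len(X), k))
    W /= W.sum(axis=1, keepdims=True)                # step 2: random fuzzy partition
    for _ in range(max_iter):
        Wm = W ** m
        centers = (Wm.T @ X) / Wm.sum(axis=0)[:, None]        # step 3: centres
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                        # guard: point on a centre
        inv = d ** (-2.0 / (m - 1.0))
        W_new = inv / inv.sum(axis=1, keepdims=True)          # step 4: memberships
        if np.abs(W_new - W).max() < threshold:      # step 5: max membership change
            W = W_new
            break
        W = W_new
    return W, centers                                # step 6: membership matrix

# two well-separated hypothetical emotion clusters in a 2-D feature space
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(5, 0.1, (20, 2))])
W, C = fcm(X, k=2)
print(np.allclose(W.sum(axis=1), 1.0))               # True: rows sum to 1
```

Taking `W.argmax(axis=1)` assigns each object to its strongest cluster, while the full row of memberships keeps the soft weighting that makes compound emotions expressible.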
V. CONCLUSION AND FUTURE WORK
In this paper, we developed an unsupervised facial point detector (a rarely explored topic), regression-based AU intensity estimation and emotion clustering for the recognition of the eight basic and compound emotions from posed and spontaneous facial expressions. The proposed facial point detection model is able to perform robust landmark extraction from images with illumination changes, head rotations, pose variations, scaling differences, partial occlusions and background clutter. The facial point detector achieved averaged accuracy rates of 80%, 73%, 78%, 85% and 85% respectively in the evaluation of 200 diverse images. On average, it outperformed AAM and CLM by 13% and 9% respectively. It also has low computational cost: it is significantly faster than AAM and CLM, with computational costs comparable to GNDPM. Moreover, the AU intensity estimation and emotion clustering were also evaluated. The SVR-based AU intensity estimation outperformed the NN-based method; the average MSE of the SVR-based intensity estimation for the 18 AUs is 0.0397. FCM clustering not only enables the robot to recognize the seven basic emotions and neutral expressions but also shows great potential to detect compound and newly arrived novel emotion classes, outperforming other state-of-the-art research in the field. Initial
experiments also indicate its efficiency in tackling compound emotions, with an average detection accuracy of 82.83% for the recognition of surprise-related compound emotions.
In future work, we intend to extend the system to deal with emotion recognition for 90° side-view images of spontaneous expressions, which pose challenges to many state-of-the-art applications. We also aim to employ other techniques (e.g. grey-level co-occurrence matrices and evolutionary optimization [69]) to extract the appearance deformations embedded in textures to inform affect analysis when geometric features are unable to provide a full view of emotional behaviors. Compound and spontaneous emotional expressions from further databases and more test subjects will also be employed to further evaluate the system's efficiency. Min-margin based active learning techniques will also be explored to deal with the emotion annotation ambiguity in the FCM clustering results for challenging real-world emotion recognition tasks. To extend FCM to recognize neutral expressions, we may also apply different clustering algorithms. Moreover, according to Kappas [1], human emotions are psychological constructs with notoriously noisy, murky, and fuzzy boundaries. Therefore, in the long term, we also aim to incorporate affect indicators embedded in body gestures and dialogue contexts with facial emotion perception, to draw more reliable conclusions on affect interpretation and better handle open-ended, challenging robot-human interaction.
REFERENCES
[1] A. Kappas, Smile when you read this, whether you like it or not: Conceptual challenges to affect detection, IEEE Trans. Affective Comput. 1 (1) (2010) 38–41.
[2] L. Zhang, J. Barnden, Affect sensing using linguistic, semantic and cognitive cues in multithreaded improvisational
dialogue, Cognitive Comput. 4 (4) (2012) 436– 459.
[3] L. Zhang, G. M, J.A. Barnden, EMMA: An automated intelligent actor in e-drama, in: Proceedings of IUI, Spain, 2008, pp. 409–412.
[4] S. Jaiswal, T. Almaev, M.F. Valstar, Guided unsupervised learning of mode specific models for facial point
detection in the wild, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops,
IEEE, 2013, pp. 370– 377.
[5] G. Tzimiropoulos, M. Pantic, Gauss-Newton deformable part models for face alignment in-the-wild, in: CVPR, IEEE, 2014, pp. 1851–1858.
[6] Z. Zeng, M. Pantic, G.I. Roisman, T.S. Huang, A survey of affect recognition methods: Audio, visual, and
spontaneous expressions, IEEE Trans. Pattern Anal. Mach. Intell. 31 (1) (2009) 39–58.
[7] P. Ekman, W.V. Friesen, Pictures of Facial Affect, Consulting Psychologists Press, Palo Alto, CA, 1976.
[8] P. Ekman, W.V. Friesen, J.C. Hager, Facial Action Coding System, The Manual, Research Nexus Division of
Network Information PaperCorporation, USA, 2002.
[9] P. Ekman, E.L. Rosenberg, What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the
Facial Action Coding System (FACS), 2nd ed., Oxford University Press, New York, 2005.
[10] J.A. Russell, Core affect and the psychological construction of emotion, Psychol. Rev. 110 (2003) 145–172.
[11] A. Martinez, S. Du, A model of the perception of facial expressions of emotion by humans: Paperoverview and
perspectives, J. Mach. Learn. Res. 13 (1) (2012) 1589–1608.
[12] L. Zhang, M. Jiang, D. Farid, A.M. Hossain, Intelligent facial emotion recognition and semantic-based topic detection
for a humanoid robot, Expert Syst. Appl. 40 (2013) 5160–5168.
[13] D. Vukadinovic, M. Pantic, Fully automatic facial feature point detection using gabor feature based boosted features,
in: Proceedings of the International Conference on Systems, Man and Cybernetics, IEEE, 2005, pp. 1692–1698.
[14] T. Senechal, V. Rapp, L. Prevost, Facial feature tracking for emotional dynamic analysis, in: 13th International
Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS), Belgium, Springer, 2011, pp. 495–506.
[15] T.F. Cootes, G.J. Edwards, C.J. Taylor, Active Appearance Models, in: H. Burkhardt, B. Neumann (Eds.),
Proceedings of the European Conference on Computer Vision, Vol. 2, Springer, 1998, pp. 484–498.
[16] I. Matthews, S. Baker, Active appearance models revisited, Int. J. Comput. Vis. 60 (2004) 135–164.
[17] X. Gao, Y. Su, X. Li, D. Tao, A review of active appearance models, IEEE Trans. Syst. Man Cybernet. C: Appl.
Rev. 40 (2) (2010) 145–158.
[18] J. Sung, D. Kim, A background robust active appearance model using active contour technique, Pattern Recogn. 40
(1) (2007) 108–120.
[19] R. Gross, I. Matthews, S. Baker, Constructing and fitting active appearance models with occlusion, in: Proceedings
of the IEEE Conference on Computer Vision Pattern Recognition Workshops, Vol. 5, IEEE, 2004, p. 72.
[20] D. Cristinacce, T. Cootes, Feature detection and tracking with constrained local models, in: Proceedings of British
Machine Vision Conference, Vol. 3, BMVA Press, 2006, pp. 929–938.
[21] A. Asthana, S. Zafeiriou, S. Cheng, M. Pantic, Robust discriminative response map fitting with constrained local
models, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR’13). USA, 2013,
pp. 3444–3451.
[22] B. Martinez, M. Valstar, X. Binefa, M. Pantic, Local evidence aggregation for regression based facial point
detection, IEEE Trans. Pattern Anal. Mach. Intell. 35 (5) (2013) 1149–1163.
[23] H. Zhang, Y. Zhang, T.S. Huang, Pose-robust face recognition via sparse representation, Pattern Recogn. 46 (5)
(2013) 1511–1521.
[24] J. Orozco, O. Rudovic, J. González, M. Pantic, Hierarchical online appearance-based tracking for 3D head pose, eyebrows, lips, eyelids and irises, Image Vis. Comput. 31 (4) (2013) 322–340.
[25] J.C. Lin, C.H. Wu, W.L. Wei, Facial action unit prediction under partial occlusion based on error-weighted cross-correlation model, in: Proceedings of ICASSP, IEEE, 2013, pp. 3482–3486.
[26] C.H. Wu, J.C. Lin, W.L. Wei, Action unit reconstruction of occluded facial expression, in: Proceedings of ICOT,
IEEE, 2014, pp. 177–180.
[27] C. Shan, S. Gong, P.W. McOwan, Facial expression recognition based on local binary patterns: A comprehensive
study, Image Vis. Comput. 27 (2009) 803–816.
[28] F. Tsalakanidou, S. Malassiotis, Real-time 2D+3D facial action and expression recognition, Pattern Recogn. 43 (5)
(2010) 1763–1775.
[29] Z. Zheng, J. Jiong, D. Chunjiang, X. Liu, J. Yang, Facial feature localization based on an improved active shape
model, Inform. Sci. 178 (9) (2008) 2215–2223.
[30] S. Kaltwang, O. Rudovic, M. Pantic, Lecture Notes in Computer Science, Advances in Visual Computing, Vol.
7432, Springer, Heidelberg, 2012, pp. 368–377.
[31] A. Savran, B. Sankur, M.T. Bilge, Regression-based intensity estimation of facial action units, Image Vis. Comput.
30 (10) (2012) 774–784.
[32] Y. Li, S.M. Mavadati, M.H. Mahoor, Q. Ji, A unified probabilistic framework for measuring the intensity of
spontaneous facial action units, in: Proceedings of 10th IEEE International Conference and Workshops on
Automatic Face and Gesture Recognition (FG), China, 2013, pp. 1–7.
[33] X. Li, Q. Ruan, Y. Ming, 3D facial expression recognition based on basic geometric features, in: Proceedings of
IEEE 10th International Conference on Signal Processing (ICSP), China, 2010, pp. 1366–1369.
[34] A. Majumder, L. Behera, V.K. Subramanian, Emotion recognition from geometric facial features using
self-organizing map, Pattern Recogn. 47 (3) (2014) 1282–1293.
[35] H. Fang, N.M. Parthalin, A.J. Aubrey, G.K.L. Tam, R. Borgo, P.L. Rosin, P.W. Grant, D. Marshall, M. Chen, Facial
expression recognition in dynamic sequences: An integrated approach, Pattern Recogn. 47 (3) (2014) 1271–1281.
[36] S. Moore, R. Bowden, Local binary patterns for multi-view facial expression recognition, Comput. Vis. Image
Understand. 115 (4) (2011) 541–558.
[37] H.Y. Chen, C.L. Huang, C.M. Fu, Hybrid-boost learning for multi-pose face detection and facial expression recognition, Pattern Recogn. 41 (3) (2008) 1173–1185.
[38] K. Yu, Z. Wang, L. Zhuo, J. Wang, Z. Chi, D. Feng, Learning realistic facial expressions from web images, Pattern
Recogn. 46 (8) (2013) 2144–2155.
[39] P.A. Viola, M.J. Jones, Rapid object detection using a boosted cascade of simple features, Comput. Vis. Pattern
Recogn. 2001 (1) (2001) 511–518.
[40] T. Kanade, J.F. Cohn, Y. Tian, Comprehensive database for facial expression analysis, in: Proceedings of the Fourth
IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), Grenoble, France, 2000, pp. 46–53.
[41] P. Lucey, J.F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, I. Matthews, The extended Cohn-Kanade dataset (CK+): A complete expression dataset for action unit and emotion-specified expression, in: Proceedings of the Third
International Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB’10), San Francisco,
USA, 2010, pp. 94–101.
[42] G.B. Huang, M. Ramesh, T. Berg, E. Learned-Miller, Labeled faces in the wild: A database for studying face recognition in unconstrained environments, Tech. Rep. 07-49, University of Massachusetts, Amherst, MA, USA,
2007.
[43] A. Kasiński, A. Florek, A. Schmidt, The PUT face database, Image Process. Commun. 13 (3) (2008) 59–64.
[44] R. Lienhart, J. Maydt, An extended set of Haar-like features for rapid object detection, in: IEEE ICIP, IEEE, 2002, pp.
900–903.
[45] L. Zhang, K. Mistry, A. Hossain, Shape and texture based facial action and emotion recognition, in: Proceedings of AAMAS'14, France, 2014 (Demo paper).
[46] M. Castrillón, O. Déniz, C. Guerra, M. Hernández, ENCARA2: Real-time detection of multiple faces at different resolutions in video streams, J. Vis. Commun. Image Represent. 18 (2) (2007) 130–140.
[47] M. Castrillón-Santana, O. Déniz-Suárez, L. Antón-Canals, J. Lorenzo-Navarro, Face and facial feature detection evaluation,
in: Proceedings of Third International Conference on Computer Vision Theory and Applications (VISAPP’08),
INSTICC Institute for Systems and Technologies of Information, Control and Communication, 2008, pp. 167–172.
[48] C. Tomasi, R. Manduchi, Bilateral filtering for gray and color images, in: Proceedings of Sixth International
Conference on Computer Vision, IEEE, 1998, pp. 839– 846.
[49] S. Paris, P. Kornprobst, J. Tumblin, F. Durand, Bilateral filtering: Theory and applications, Found. Trend Comput.
Graph. Vis. 4 (1) (2008) 1–73.
[50] J. Daugman, Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression, IEEE Trans. Acoust. Speech Signal Process. 36 (7) (1988) 1169–1179.
[51] J.J. Henriksen, 3D surface tracking and approximation using Gabor filters, South Denmark University, 2007 (Master
dissertation).
[52] G. Donato, M.S. Bartlett, J.C. Hager, P. Ekman, T.J. Sejnowski, Classifying facial actions, IEEE Trans. Pattern
Anal. Mach. Intell. 21 (10) (1999) 974–989.
[53] H. Bay, A. Ess, T. Tuytelaars, L.V. Gool, Speeded-up robust features (SURF), J. Comput. Vis. Image Understand. 110
(3) (2008) 346–359.
[54] M. Calonder, V. Lepetit, C. Strecha, P. Fua, BRIEF: Binary robust independent elementary features, in: Proceedings of
the 11th European Conference on Computer Vision (ECCV): Part IV, Springer, 2010, pp. 778–792.
[55] R. Ortiz, FREAK: Fast retina keypoint, in: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), IEEE, 2012, pp. 510–517.
[56] S. Leutenegger, M. Chli, R.Y. Siegwart, BRISK: Binary robust invariant scalable keypoints, in: Proceedings of
International Conference on Computer Vision, Spain, 2011, pp. 2548–2555.
[57] E.W. Weisstein, Circle-circle intersection, From MathWorld—A Wolfram Web Resource, 2015. http://mathworld.wolfram.com/CircleCircleIntersection.html (accessed Jan 15).
[58] P.J. Besl, N.D. McKay, A method for registration of 3d shapes, IEEE Trans. Pattern Anal. Mach. Intell. 14 (2)
(1992) 239–256.
[59] G. Montana, F. Parrella, Learning to trade with incremental support vector regression experts, 3rd International
Workshop on Hybrid Artificial Intelligence Systems (HAIS’08), LNCS, Vol. 5271, Springer, 2008, pp. 591–598.
[60] C.C. Chang, C.J. Lin, LIBSVM: A library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011) 1–27 (Article No. 27).
[61] C. Hsu, C. Chang, C. Lin, A Practical Guide to Support Vector Classification, Department of Computer Science, National Taiwan University, 2010.
[62] M.H. DeGroot, Probability and Statistics, second ed., AddisonWesley, 1980.
[63] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, third ed., Morgan Kaufmann, 2011.
[64] P. Belhumeur, D. Jacobs, D. Kriegman, N. Kumar, Localizing parts of faces using a consensus of exemplars, in: CVPR, IEEE, 2011.
[65] V. Le, J. Brandt, Z. Lin, L. Bourdev, T.S. Huang, Interactive facial feature localization, in: ECCV, Springer, 2012.
[66] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, M. Pantic, 300 faces in-the-wild challenge: The first facial landmark localization challenge, in: Proceedings of IEEE International Conference on Computer Vision Workshops (ICCVW'13), 300 Faces in-the-Wild Challenge (300-W), Sydney, Australia, 2013.
[67] S. Jain, C. Hu, J.K. Aggarwal, Facial expression recognition with temporal modeling of shapes, in: IEEE
International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, 2011, pp. 1642–1649.
[68] T. Wu, M. Bartlett, J.R. Movellan, Facial expression recognition using gabor motion energy filters, in: Proceedings
of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW’10),
IEEE, 2010, pp. 42–47.
[69] S.C. Neoh, L. Zhang, K. Mistry, M.A. Hossain, C.P. Lim, N. Aslam, P. Kinghorn, Intelligent facial emotion
recognition using a layered encoding cascade optimization model, Appl. Soft Comput. 34 (2015) 72–93,
doi:10.1016/j.asoc.2015.05.006.