
[IEEE 2014 International Conference on Computer and Information Sciences (ICCOINS) - Kuala Lumpur, Malaysia (2014.6.3-2014.6.5)]



978-1-4799-0059-6/13/$31.00 ©2014 IEEE

Gender Recognition on Real World Faces Based on Shape Representation and Neural Network

Olasimbo Ayodeji Arigbabu 1, Sharifah Mumtazah Syed Ahmad 1,2, Wan Azizun Wan Adnan 2, Salman Yussof 2, Vahab Iranmanesh 3, Fahad Layth Malallah 3

1,2,3 Department of Computer and Communication Systems, Faculty of Engineering, Universiti Putra Malaysia, Selangor, Malaysia.
2 Department of Systems and Networking, College of Information Technology, Universiti Tenaga Nasional, Jalan IKRAM-Uniten, Kajang, Malaysia.

{[email protected], {s_mumtazah, wawa}@upm.edu.my, [email protected]}

Abstract—Gender as a soft biometric attribute has been extensively investigated in the domain of computer vision because of its numerous potential application areas. However, studies have shown that gender recognition performance can be hindered by improper alignment of facial images. As a result, previous experiments have adopted face alignment as an important stage in the recognition process, before performing feature extraction. In this paper, the problem of recognizing human gender from unaligned real world faces using a single image per individual is investigated. A feature descriptor is used to form a shape representation of face images with any arbitrary orientation from the cropped version of the Labeled Faces in the Wild (LFW) dataset. By combining the feature extraction technique with an artificial neural network for classification, a recognition rate of 89.3% is attained.

Keywords—Gender Recognition, Shape Representation, Neural Network

I. INTRODUCTION

Gender recognition has recently become an area of research in the domain of computer vision that provides many interesting application areas. Among the potential application areas for gender recognition are human computer interaction, person categorization, improving biometric recognition and demographic data collection in a passive manner [1]. Even in the commercial world, there are many applications where gender recognition is useful. An instance is when a retailer wants to evaluate the percentage of a particular gender that stopped by to view the displayed products. All these potential applications of gender recognition have led to the necessity of finding practical solutions to the problem of gender recognition for electronic systems. For humans, recognizing gender is an easy task under any kind of constraint. For electronic systems, however, performing accurate gender recognition is a daunting task, especially when facial features cannot be well extracted as a result of several challenges such as low image quality, occlusion, and improperly aligned images [2][3].

Various studies have been carried out on gender recognition, with quite a number of reliable recognition results [4][5]. The previous studies on gender recognition using facial images can be mainly grouped into two: those performing recognition on aligned images and those focusing on recognition on unaligned images. Aligned images are images whose geometric distortions, resulting from false detections by the automatic face detector, have been compensated by centralizing the coordinates of the images. Unaligned images, in contrast, exhibit all sorts of geometric distortions, such as rotation and translation, as a result of variation in the pose and orientation of the individuals in the image or a shift in the bounding box of the face detector.

The first major attempt at face gender classification was reported by Golomb et al. [6], who introduced SEXNET, based on back-propagation neural network classification of 30-by-30 low resolution facial images. The experiment was carried out on 90 male and female images, which yielded an accuracy of 91.9%. Likewise, a similar classification approach was adopted by Mitsumoto et al. [7] for gender recognition on 8x8 and 8x6 low resolution face images, with an accuracy of 90%. Meanwhile, Yang and Moghaddam [8] utilized a linear support vector machine (SVM) for gender classification. It outperformed a human perception test, as well as exceeding the performance of other classifiers such as Fisher Linear Discriminant, Nearest Neighbor and even Radial Basis Function based classifiers. Their result showed an average error rate of 3.4%. Makinen and Raisamo [9] performed a comparative experiment on four different gender classification methods and various techniques to automatically detect and align faces. The authors deduced that the best result was achieved by manually aligning 36-by-36 pixel facial images and using SVM for classification. The recognition rate attained by automatically aligning the face images was 82.1%, while manual alignment yielded an increased recognition rate of 87.1%. Automatic image alignment is computationally expensive and the optimization technique may lack precision, as the optimizer can be trapped in a local minimum [9].

More recently, the influence of facial alignment on gender recognition performance was investigated by Mayo and Zhang [10]. Their argument was that, by deliberately adding some misaligned images to the dataset, learning classifiers can be made robust to misalignment, which reduces the necessity for automatic face alignment in the recognition process. They attained a result of 92.5%. Chu et al. [11] performed gender recognition on only unaligned face images. The unaligned images were achieved by randomly changing the position of the face detector, which resulted in several sample images of each subject with varying pose and orientation. In an organized data compilation setting, where the subjects are cooperative, it is easier to acquire multiple sample images of each individual, especially with the improving level of technology, where multiple camera networks can be exploited for image acquisition. However, in a realistic scenario, it is difficult to compile several images of an individual in an unconstrained environment, because subjects are uncooperative and only one or a few sample images are available for each individual.

In this paper, we propose a technique that can accurately recognize human gender without the need to perform image alignment, using only a single sample image per individual. This is achieved by depending on the holistic facial features of the individuals to be recognized. The facial shape of humans is an obvious feature that can be exploited for this purpose. Even though a 2D image does not give as much detail as a 3D image in terms of intrinsic structure and depth, when properly extracted and represented it can provide useful features for gender recognition. Our proposed technique uses a shape representation that combines Laplacian filtered images with the Pyramid Histogram of Gradient (PHOG) shape descriptor presented by Bosch et al. [12] to aid gender classification. The rest of the paper is organized as follows. In Section 2, the proposed system for gender recognition is explained in detail. Section 3 presents the setup of the experiment. Section 4 provides the experimental results and discussion. Finally, Section 5 concludes the work.

II. FACE GENDER RECOGNITION

The proposed technique utilizes the fundamental operations already established for performing recognition in a computer vision application: pre-processing, feature extraction, and classification, as depicted in Fig. 1.

Fig. 1 Operations required to perform gender recognition

A. Pre-processing

Pre-processing is a data normalization step that can be considered a necessity in the recognition process. Its purpose is to improve the attained result, making it not only accurate but also reliable. Examples of pre-processing techniques include noise removal, spatial transformation and normalization. The requirements of the application domain determine which pre-processing technique(s) should be applied. In this paper, the images used are already spatially resized to equal dimensions by the authors of the database. Noise removal is therefore considered a very important pre-processing tool to normalize the pixels of the images, given that the images were acquired from random sources.

Therefore, we utilized a 2-D Gaussian filter for noise removal. The Gaussian filter is a type of spatial filter that operates by convolving a kernel or mask derived from the Gaussian function with the image. The filtering process gives more significance to pixels near the center of the kernel than to those near its border, thereby reducing edge blur compared to a uniform averaging filter. Moreover, it is computationally efficient and performs equally in all directions, since the filter is isotropic. The 2-D Gaussian function is bell-shaped and is denoted by the mathematical expression shown in equation (1):

G(x, y) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²))     (1)

where x and y are the coordinates of a pixel in the image and σ is the standard deviation. The standard deviation determines the tightness of the bell-shaped Gaussian function [13].
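As a concrete illustration, the smoothing step can be sketched in a few lines of NumPy. This is a minimal re-implementation of equation (1) with a hand-rolled convolution; the kernel size and σ used here are illustrative choices, not values reported in the paper:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized 2-D Gaussian kernel from equation (1)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return g / g.sum()  # normalize so overall image brightness is preserved

def smooth(image, sigma=1.0, size=5):
    """Denoise an image by convolving it with the Gaussian kernel."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")  # replicate borders
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + size, j:j + size] * k)
    return out
```

Because the kernel is normalized to sum to 1, smoothing a flat image leaves it unchanged; larger σ values produce a flatter kernel and stronger smoothing.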

B. Feature Extraction

If raw pixels are used, the size of the image would literally determine the length of the feature vector. However, previous studies [14][15] have indicated that the dimensionality of the feature vector is a major challenge in any biometric recognition task. As such, there is a need to extract discriminating information that is representative of the image, without losing too much information about it. Shapes in image processing can be extracted in the form of edges or contours, and they can be adequately represented using feature descriptors. To achieve this, the shape representation algorithm described in [12] is exploited. In this regard, we utilized the Laplacian operator for edge detection instead of the Canny edge detector used in the initial algorithm. The reason for using the Laplacian operator is that it is an isotropic filter: it detects edge information equally, without bias toward the orientation or direction of the edges. To detect edges, the spatial kernel of the Laplacian operator is convolved with the image. The output of this convolution is normally greyish edge lines: it highlights regions of swift change in pixel intensity and other discontinuities, while suppressing regions of gradual change in pixel intensity, superimposed on an empty dark background [13]. The result of applying the Laplacian operator is presented in Fig. 2(c).
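The edge detection step can be sketched as follows. This uses the common 3x3 Laplacian kernel with replicated borders; the exact kernel variant and border handling used in the paper are not stated, so these are assumptions:

```python
import numpy as np

# Standard 3x3 Laplacian kernel: sums neighbors, subtracts 4x the center,
# so the response is zero in flat regions and large at intensity discontinuities.
LAPLACIAN = np.array([[0.0,  1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0,  1.0, 0.0]])

def laplacian_edges(image):
    """Convolve the image with the Laplacian kernel; strong responses mark edges."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * LAPLACIAN)
    return out
```

The kernel is symmetric in both axes, which is what gives the operator its isotropy: an edge produces the same response magnitude regardless of its direction.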

The PHOG descriptor is inspired by the concept of the Histogram of Oriented Gradients proposed by Dalal and Triggs [16]. PHOG can be considered a combination of several HOGs, with a slight variation in its computation. To compute the PHOG descriptor, the image is first divided into regions, referred to as pyramid levels. Then, we used a second-order derivative operator based on the Laplacian operator to generate edges in all directions in the image. Afterwards, the Sobel operator is applied to generate the image gradients, Grx and Gry, in the horizontal and vertical directions. The following expressions are used to compute the gradient magnitude and orientation, respectively:

G = √(Grx² + Gry²)     (2)

θ = arctan(Gry / Grx)     (3)

The orientation and gradient magnitude of each edge were used to form a local orientation histogram on a dense spatial grid, by calculating a weighted vote for an edge orientation histogram channel based on the direction of the gradient element positioned on it [12]. The weighted votes were accumulated into cells of orientation bins over local spatial regions [12]. The resulting edge orientation and gradient histograms at each spatial pyramid level are then concatenated into a feature descriptor, which gives a more compact shape descriptor, as shown in Fig. 2(d). The feature length of the descriptor can be varied depending on the application domain by reducing or increasing the number of pyramid levels, L, and the number of bins of the local histogram, H.
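The descriptor computation described above can be sketched as follows. This is a simplified reading of [12], not the authors' code; the cell layout, vote weighting and normalization details are assumptions. Note that `levels=3` here (pyramid levels 0-2) corresponds to the paper's L = 2, giving a descriptor length of (1 + 4 + 16) x 8 = 168 for H = 8, consistent with Table I:

```python
import numpy as np

def phog(edges, gx, gy, levels=3, bins=8, max_angle=180.0):
    """Minimal PHOG sketch: orientation histograms over a spatial pyramid.

    edges : edge map (nonzero where an edge pixel lies)
    gx, gy: image gradients in the horizontal and vertical directions
    """
    mag = np.sqrt(gx**2 + gy**2)                      # equation (2)
    ang = np.degrees(np.arctan2(gy, gx)) % max_angle  # equation (3), folded to [0, 180)
    h, w = edges.shape
    descriptor = []
    for level in range(levels):
        cells = 2 ** level                            # a cells x cells grid per level
        for i in range(cells):
            for j in range(cells):
                r0, r1 = i * h // cells, (i + 1) * h // cells
                c0, c1 = j * w // cells, (j + 1) * w // cells
                sel = edges[r0:r1, c0:c1] != 0        # vote only at edge pixels
                hist, _ = np.histogram(
                    ang[r0:r1, c0:c1][sel],
                    bins=bins, range=(0.0, max_angle),
                    weights=mag[r0:r1, c0:c1][sel])   # magnitude-weighted votes
                descriptor.append(hist)
    d = np.concatenate(descriptor)                    # concatenate all levels
    return d / (d.sum() + 1e-12)                      # normalize the descriptor
```

Adding a pyramid level multiplies the number of cells at that level by four, which is why the feature length grows quickly with L (168, 336, 680, 1360 for the parameter combinations in Table I).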

Fig. 2 The results of the pre-processing and feature representation process. (a) The original image. (b) Image smoothed with the Gaussian filter. (c) Laplacian edge detection with L = 2 pyramid levels. (d) PHOG descriptor.

C. Classification

Once the feature vectors have been extracted, the final operation is learning, where the system learns to classify the features into their respective classes. Gender classification is considered a binary classification problem, where there can only be two classes, male or female. Accordingly, a neural network is utilized to classify the genders. The neural network is a popular machine learning algorithm for performing recognition, motivated by the way the human nervous system operates. The major characteristic of a neural network is its ability to learn from previous instances or sample patterns. The architecture of a neural network is mainly composed of an input layer, hidden layer and output layer. In this experiment, the Neural Network Toolbox in MATLAB is used. The network is a feed-forward network with a scaled conjugate gradient back-propagation training function, and consists of a single hidden layer of 30 nodes.
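A network of this shape can be sketched as follows. The paper uses MATLAB's Neural Network Toolbox with scaled conjugate gradient training; this NumPy sketch substitutes plain full-batch gradient descent, so it illustrates only the architecture (one hidden layer of 30 nodes, one sigmoid output), not the exact training function:

```python
import numpy as np

rng = np.random.default_rng(0)

class GenderNet:
    """Single-hidden-layer feed-forward network for binary gender classification."""

    def __init__(self, n_in, n_hidden=30, lr=0.5):
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)   # hidden layer activations
        z = self.h @ self.W2 + self.b2
        return 1.0 / (1.0 + np.exp(-z))           # output in (0, 1), e.g. P(male)

    def train_step(self, X, y):
        p = self.forward(X)
        err = p - y[:, None]                      # output delta (cross-entropy + sigmoid)
        dW2 = self.h.T @ err / len(X)
        dh = (err @ self.W2.T) * (1.0 - self.h**2)  # back-propagate through tanh
        dW1 = X.T @ dh / len(X)
        self.W2 -= self.lr * dW2
        self.b2 -= self.lr * err.mean(axis=0)
        self.W1 -= self.lr * dW1
        self.b1 -= self.lr * dh.mean(axis=0)
```

In use, each PHOG descriptor would be the input vector X and the label y would be 0 or 1 for female or male; the hidden-layer width of 30 matches the configuration reported in the text.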

III. EXPERIMENTS

The proposed approach is evaluated on the cropped version of the Labeled Faces in the Wild (LFW) dataset [17][18]. The dataset was mainly compiled with the aim of solving the problem of unconstrained face recognition. It is composed of several challenging variations that depict real life conditions, such as misalignment, scale, pose and expression variations. The dataset contains 13,233 gray scale facial images, each of size 64 x 64 pixels. Meanwhile, the number of sample images available varies per individual: some individuals have only one sample, while others have up to 150 sample images. Therefore, a ground truth gender label was annotated for each image in the dataset based on our subjective perception. Further, any image whose ground truth could not be well identified, as a result of degradation in image quality, was eliminated. As a result, a total of 2,759 face images were selected, composed of 1,679 different male faces and 1,080 different female faces. Some sample images are shown in Fig. 3. In order to perform classification, we divided the training and testing data for the male and female samples equally. In the training stage, cross validation was performed by randomly using 30% of the training set.

In this paper, an initial experiment is carried out to evaluate the performance of the combination of the Canny edge detector with the described shape descriptor. Then, we performed another experiment using the Laplacian for edge detection and the Sobel operator for generating image gradients. Furthermore, to analyze the two experiments in detail, we evaluate the effect of various feature lengths of the shape descriptor on recognition performance by varying the size of the spatial pyramid, L, and the binning histogram, H. Pyramid levels L = 2, 3 and histogram bins H = 8, 16, with a constant orientation range of 180 degrees, were tested. The recognition rate of the system is computed as the percentage of True Positives (TP) and True Negatives (TN) over the entire population, using equation (4).

Recognition Rate = ((TP + TN) / (TP + TN + FP + FN)) × 100     (4)

where FP and FN denote the False Positives and False Negatives, so the denominator is the entire population.
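In code, the evaluation metric is a one-liner (TP, TN, FP, FN here are counts over the test set):

```python
def recognition_rate(tp, tn, fp, fn):
    """Equation (4): correct decisions as a percentage of all decisions."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)
```

For example, `recognition_rate(50, 40, 5, 5)` returns `90.0`.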

Fig. 3 Sample images from the cropped LFW dataset; the top row includes male and the bottom row female faces.

IV. RESULTS AND DISCUSSIONS

This section presents the results attained in the two experiments carried out in this paper. Notably, all experiments are performed using a single sample image per individual from the LFW dataset, which is a very challenging database. The results achieved using different combinations of PHOG parameters, with their respective feature lengths, are presented in Table I and represented with a plot of recognition performance in Fig. 4.

The first experiment involves using the original implementation of the PHOG algorithm, which uses Canny for edge detection. We tested different spatial pyramid parameters, L, and orientation binning histograms, H. Using L = 2 and H = 8, the system only attained a recognition rate of 79%. Using L = 2 and H = 16 yielded a recognition rate of 80%. However, increasing the pyramid level L to 3 while using H = 8 significantly increased the result to 85.4%, which was the best performance attained using Canny for edge detection, as shown in Fig. 4. Further, using L = 3 and increasing the histogram to H = 16 only degraded the performance of the system to 84.2%. We observed that increasing the binning histogram made little contribution to the recognition rate.

To further evaluate this observation, we used a second-order derivative edge operator based on the Laplacian for edge detection. The same parameters of pyramid level and binning histogram were attempted. Using L = 2 and H = 8, the recognition rate was 83.8%. L = 2 and H = 16 yielded a recognition rate of 85.5%, which is already better than the best result achieved using Canny for edge detection. Moreover, increasing the pyramid level to L = 3 with H = 8 gave an improved result of 88.3%. Furthermore, using L = 3 and increasing the histogram to H = 16 gave a small increase in performance, with a recognition rate of 89.3%, as shown in Fig. 4 and Table I.

We also note that increasing the binning histogram made little positive contribution to the performance of the system, whereas increasing the pyramid level significantly improved it. For instance, it is noticeable from Table I that the recognition rate of the male class reduced from 88.6% to 88.4% when we used pyramid level L = 2 but increased the histogram H from 8 to 16. Similarly, it is evident in Table I that the recognition rate of the female class dropped from 83.7% to 83.1% when using L = 3 and increasing H from 8 to 16. We mainly attribute this degradation to the sparseness of the feature vector when the binning histogram is suddenly increased; the edge orientation range should probably be increased from 180 to 360 degrees when the binning histogram is increased from 8 to 16. Finally, another observation from the experiment is that, even though the best recognition rate of the system was attained with L = 3 and H = 16, the performance of PHOG seems to saturate at L = 3 and H = 8. We infer that, at a much denser pyramid level, the shape information being encoded becomes increasingly sparse.

Fig. 4 Recognition rate against varying feature length.

Table I Results of classification with Neural Network

PHOG Descriptor Parameters | Feature Length | Male (%) | Female (%) | Overall (%)
L = 2, H = 8   |  168 | 88.6 | 76.3 | 83.8
L = 2, H = 16  |  336 | 88.4 | 80.9 | 85.5
L = 3, H = 8   |  680 | 91.3 | 83.7 | 88.3
L = 3, H = 16  | 1360 | 93.2 | 83.1 | 89.3

Note: Best result was achieved using L = 3, H= 16 descriptor parameters

V. CONCLUSION

In this paper, a technique for gender recognition on unaligned face images based on facial shape representation is proposed. The experiment was performed on sample images from the LFW dataset. The images presented several challenges in terms of variation in head pose, facial expression and low image resolution. The proposed technique requires only a single sample image per individual, as opposed to previous techniques, which utilized multiple sample images for each individual. We adopted an existing shape descriptor based on a spatial pyramid of orientation and gradient histograms, with a neural network for classification on an equally divided dataset. Initially, the original implementation of PHOG, which uses Canny for edge detection, was tested; the best result attained was only 85.4%, with L = 3, H = 8. However, the Laplacian operator was later used instead of Canny for edge detection, because of its ability to derive edge information independent of the direction in which the mask is applied, which resulted in a recognition rate of 89.3%. In addition, we note that increasing the binning histogram from 8 to 16 made little contribution to the performance of the proposed system, while increasing the pyramid level improved the recognition rate of the system.

REFERENCES

[1] G. Farinella and J. Dugelay, “Demographic classification: Do gender and ethnicity affect each other?,” in Proceeding of IEEE Intenational Conference on Informatics, Electronics & Vision, 2012, pp. 383–390.

[2] G. Srinivas, S. Rd, and B. Manor, “Gender and Ethnic Classification of Human Faces Using Hybrid Classifiers,” in International Joint Conference on Neural Networks, 1999, vol. 6, pp. 4084–4089.

[3] C. Ng, Y. Tay, and B. Goi, “Recognizing human gender in computer vision: a survey,” in PRICAI 2012: Trends in Artificial Intelligence, 2012, pp. 335–346.

[4] P. Rai and P. Khanna, “Gender Classification Techniques: A Review,” Adv. Comput. Sci. Eng. Appl., vol. 166, pp. 51–59, 2012.

[5] S. A. Khan, M. Nazir, S. Akram, and N. Riaz, “Gender classification using image processing techniques: A survey,” in IEEE 14th International Multitopic Conference (INMIC), 2011, pp. 25–30.

[6] B. Golomb, D. Lawrence, and T. Sejnowski, “SEXNET: A Neural Network Identifies Sex From Human Faces.,” NIPS, 1990.

[7] S. Tamura, H. Kawai, and H. Mitsumoto, “Male/Female Identification from 8 × 6 Very Low Resolution Face Images by Neural Network,” Pattern Recognit., vol. 29, no. 2, pp. 331–335, 1996.

[8] M. Yang and B. Moghaddam, “Support vector machines for visual gender classification,” in Proceedings of 15th International Conference on Pattern Recognition, 2000, pp. 1115–1118.

[9] E. Makinen and R. Raisamo, “Evaluation of gender classification methods with automatically detected and aligned faces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 3, pp. 541–547, 2008.

[10] M. Mayo and E. Zhang, “Improving face gender classification by adding deliberately misaligned faces to the training data,” in IEEE 3rd International Conference Image and Vision Computing, 2008, pp. 1–5.

[11] W.-S. Chu, C.-R. Huang, and C.-S. Chen, “Gender classification from unaligned facial images using support subspaces,” Inf. Sci. (Ny)., vol. 221, pp. 98–109, Feb. 2013.

[12] A. Bosch, A. Zisserman, and X. Munoz, “Representing shape with a spatial pyramid kernel,” in Proceedings of the 6th ACM international conference on Image and video retrieval, 2007, pp. 401–408.

[13] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Prentice Hall, 2008.

[14] L. Lu, Z. Xu, and P. Shi, “Gender Classification of Facial Images Based on Multiple Facial Regions,” in WRI World Congress on Computer Science and Information Engineering, 2009, pp. 48–52.

[15] R. Jafri and H. R. Arabnia, “A Survey of Face Recognition Techniques,” J. Inf. Process. Syst., vol. 5, no. 2, pp. 41–68, 2009.

[16] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2005, vol. 1, pp. 886–893.

[17] C. Sanderson and B. Lovell, “Multi-region probabilistic histograms for robust and scalable identity inference,” Adv. Biometrics, vol. 5558, pp. 199–208, 2009.

[18] G. Huang, M. Mattar, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” University of Massachusetts, Amherst, Technical Report, pp. 1–14, 2007.