
CS528 Project: Facial Attractiveness

[email protected], [email protected]

December 8, 2015

Abstract

Facial attractiveness is hypothesized to have a strong correlation with certain facial features and characteristics. We present several models that classify a dataset of faces as either attractive or unattractive. The classifiers are trained on a dataset of 2,222 images with psychological and demographic attributes for each image, and are evaluated to find the components that best represent human facial attractiveness.

1 Introduction

There exists a correlation between human facial features and attractiveness. In this work, we constructed a series of experiments to test several classifiers against a dataset of 2,222 labeled facial images and to discover the characteristics that accurately predict the attractiveness of a person’s face. For our experiments, the “correct” attractiveness score was the aggregated vote from a Mechanical Turk survey.

2 Related Work

There exists a substantial body of prior research in this area, and there does not seem to be a standardized method for identifying attractiveness.

Jim Hefner and Roddy Lindsay released a project titled “Are you Hot or Not?”, which examined various methods for identifying attractiveness in human faces. They first attempted classification using Eigenfaces followed by SVM and kNN, which resulted in an accuracy of around fifty percent. They then attempted to extract geometric features via distances between point clusters. After calculating the distances between clusters they used both SVM and kNN once again and achieved around 70% [6].

Altwaijry performed a similar classification using a combination of kNN and SVM, which achieved an accuracy of around 63%. They also attempted regression analysis using SVR to predict the attractiveness scores; this method predicted labels with an average deviation of 0.28 from the actual score [3].

3 Dataset

3.1 Description

The dataset is a natural, unbiased set of 10,168 face photographs based on the 1990 US population. The resulting database follows a distribution of faces similar to the US population in terms of gender, race, and age. The database also has variability in attractiveness, image quality, angle of face, emotional expression, and several other features. Images have a resolution of at least 72 pixels per inch and have been cropped with an oval around the face to minimize background effects, then resized to a height of 256 pixels with variable width. Several additional pieces of information were collected for 2,222 images of the larger set. Demographic labels were found for all of these images pertaining to race, gender, etc. A study was also performed on these images to find psychological attributes such as a person’s attractiveness. After the attribute ratings were collected, the score for each face’s attributes was calculated as an average of the collected scores [4]. The attributes that we use along with the images for the purpose of this study are gender and attractiveness score.
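
For illustration, this aggregation amounts to averaging each face’s per-rater scores. A minimal sketch in Python is shown below; the file name and column names ("image", "attractiveness", "gender") are assumptions made for the sketch, not the dataset’s actual schema.

# Sketch: average per-rater attractiveness ratings into one score per face.
# The file name and column names are assumptions for illustration only.
import pandas as pd

ratings = pd.read_csv("face_attributes.csv")   # hypothetical per-rater ratings file
faces = (
    ratings.groupby("image")
    .agg(attractiveness=("attractiveness", "mean"), gender=("gender", "first"))
    .reset_index()
)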

3.2 Preprocessing

The dataset that we retrieved contained images of varying sizes as described above. To resolve this we used a command line utility called sips [5]. We created a command line script that processes the 10k image dataset by resampling each image into a variety of sizes depending on the scale of the classifier, most notably 80 by 64 and 40 by 30 pixels.
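
A minimal sketch of such a script, driving sips from Python, is shown below; the directory names are assumptions, not the exact paths we used.

# Sketch: resample every image in the dataset to the two sizes used by the
# classifiers by shelling out to the macOS `sips` utility. Paths are assumed.
import subprocess
from pathlib import Path

SRC_DIR = Path("faces_raw")            # assumed location of the 10k-image dataset
SIZES = [(80, 64), (40, 30)]           # (height, width) targets from the experiments

for height, width in SIZES:
    out_dir = Path(f"faces_{height}x{width}")
    out_dir.mkdir(exist_ok=True)
    for img in SRC_DIR.glob("*.jpg"):
        # `sips -z <height> <width>` resamples the image to an exact pixel size
        subprocess.run(
            ["sips", "-z", str(height), str(width), str(img), "--out", str(out_dir)],
            check=True,
        )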

3.3 Preliminary Analysis

The baseline probability that a face scores above a ‘2’ in attractiveness is 65.8%. Representing attractiveness categorically (‘1’ is the score 1, ‘2’ is the score 2, etc.), the probability distribution is graphed below. Notably, roughly 47% of the faces received a score of ‘3’.

Figure 1: Baseline Binary Attractiveness Distribution

Figure 2: Baseline Categorical Attractiveness Distribution


Figure 3: Baseline Gender Distribution

Therefore, it is safe to assume that naive, highly biased classifiers can achieve 65.8% accuracy for binary attractiveness, 47.64% for categorical attractiveness, and 54.4% for gender.
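
These figures are simply majority-class baselines, i.e. the accuracy of always guessing the most common label. A minimal sketch of the computation, assuming the labels sit in a NumPy array:

# Accuracy of a classifier that always predicts the most common label.
import numpy as np

def majority_baseline(labels):
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    return counts.max() / counts.sum()

# e.g. majority_baseline(binary_attractiveness_labels) -> roughly 0.658 on this data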

4 Experiments

4.1 SVM Classification

We first attempted SVM classification on the full set of features, which equated to around 4,800 features per image. This caused the classifier to severely overfit, so we experimented with both dimensionality reduction and segmenting our data.

4.1.1 Non Gender Separated

We tested two forms of dimensionality reduction on the unseparated data: non-negative matrix factorization (NMF) and principal component analysis (PCA). For both methods we reduced the data to between 1 and 200 dimensions. We then plotted the accuracy of each trained model against the number of dimensions to visualize the accuracy, and the possibility of overfitting, as the dimensionality of the model increased.
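
A minimal sketch of this sweep using scikit-learn is shown below; the SVM kernel, the train/test split ratio, and the step size are assumptions rather than a record of our exact configuration.

# Sketch: sweep the number of NMF/PCA components and record SVM train/test accuracy.
# X is the flattened pixel matrix (n_images x n_pixels) and y the binary labels
# defined by the condition below; both are assumed to be loaded already.
from sklearn.decomposition import NMF, PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def sweep(X, y, reducer_cls, dims=range(1, 201, 10)):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    results = []
    for d in dims:
        reducer = reducer_cls(n_components=d)
        Z_tr = reducer.fit_transform(X_tr)      # fit the reduction on training data only
        Z_te = reducer.transform(X_te)
        clf = SVC(kernel="rbf").fit(Z_tr, y_tr)
        results.append((d, clf.score(Z_tr, y_tr), clf.score(Z_te, y_te)))
    return results

nmf_results = sweep(X, y, NMF)
pca_results = sweep(X, y, PCA)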

The first test trained an SVM classifier to detect facial attractiveness purely from the reduced-dimensionality data, labeled by mean-separated attractiveness scores. We bucketed the data into a binary labeled dataset using the following condition.

Y = \begin{cases} 0 & a \le \mu_A \\ 1 & a > \mu_A \end{cases}, \quad \forall a \in A
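
In code, this mean thresholding is a single comparison (a sketch, assuming the attractiveness scores are held in a NumPy array):

# Binary label: 1 if a face's attractiveness score exceeds the dataset mean, else 0.
import numpy as np

scores = np.asarray(attractiveness_scores, dtype=float)   # assumed to be loaded
y = (scores > scores.mean()).astype(int)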

The results of the dimensions tested are shown in Figure 5 for NMF and in Figure 4 for PCA. As seen in Figure 5, when using NMF the classifier performs best at around 90 dimensions, with a training accuracy of 67% and a test accuracy of 68%. Dimensionality reduction using PCA showed the best results at around 130 dimensions, with a training accuracy of 70% and a testing accuracy of 69%.


Figure 4: PCA — Attractiveness Classification Results

Figure 5: NMF — Attractiveness Classification Results

4.1.2 Segmented By Gender

Next, we experimented with segmenting the data. When plotting a histogram of the attractiveness scores after separating the data by gender (Figure 6), we noticed that the scores follow different distributions. The male distribution has a mean of 2.72, a variance of 0.37, and a standard deviation of 0.61. The female distribution has a mean of 3.13, a variance of 0.64, and a standard deviation of 0.8.


Figure 6: Gender Attractiveness Scores

Following these results, we separated the data into two datasets and performed several dimensionality reduction experiments. Once again, we used NMF and PCA for dimensionality reduction and recorded the results of reducing our data to between 1 and 200 dimensions. We bucketed each data segment into a binary labeled dataset using the following condition.

Y = \begin{cases} 0 & a \le \mu_{A_g} \\ 1 & a > \mu_{A_g} \end{cases}, \quad \forall a \in A_g,\ g \in \{\text{male}, \text{female}\}
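
A sketch of the per-gender thresholding, assuming a gender array aligned with the score array:

# Per-gender binary labels: each face is compared to the mean score of its own gender.
import numpy as np

scores = np.asarray(attractiveness_scores, dtype=float)   # assumed loaded
genders = np.asarray(gender_labels)                       # assumed loaded, "male"/"female"
y = np.zeros(len(scores), dtype=int)
for g in ("male", "female"):
    mask = genders == g
    y[mask] = (scores[mask] > scores[mask].mean()).astype(int)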

The results from the NMF dimensionality reduction experiments are displayed in Figure 12, and the results from the PCA dimensionality reduction experiments are displayed in Figure 9.

Figure 7: PCA — Male


Figure 8: PCA — Female

Figure 9: PCA — Gender Separated Attractiveness Classification Results


Figure 10: NMF — Male

Figure 11: NMF — Female


Figure 12: NMF — Gender Separated Attractiveness Classification Results

The experiment was performed independently on the male and female data segments. The male dataset performed best at 80 dimensions with NMF and at 78 dimensions with PCA. The top 36 weights for the male segment are shown in Figure 10 for NMF and in Figure 7 for PCA; the top 36 weights for the female segment are shown in Figure 11 for NMF and in Figure 8 for PCA. For the male segment, NMF produced a training accuracy of 69% and a testing accuracy of 66%, while PCA produced a training accuracy of 75% and a testing accuracy of 69%. The female dataset performed best at 120 dimensions with NMF and at 76 dimensions with PCA. NMF produced a training accuracy of 78% and a testing accuracy of 72%, while PCA produced a training accuracy of 72% and a testing accuracy of 74%. The segmented classifiers produced varying results between the male and female segments: compared to the initial unsegmented experiment, the female dataset performed slightly better whereas the male dataset performed roughly the same. Both segments produced better results when dimensionality reduction was performed with PCA.

4.1.3 Bagged Attractiveness Classifier

In order to accomplish our goal of classifying independent images as either attractive or unattractive, we need a way to classify the images as either male or female. In the initial experiment of section 4.1.1 we used unseparated data, so there was no need to classify images as male or female, since gender was ignored in the classification. We could let a user supply their gender along with the image passed to our classification algorithm; however, we would like to produce a model that works like our initial experiment, where only an image is supplied. To accomplish this we trained a bagged classifier model with the following structure:

Gender Classifier
├── Male SVM
└── Female SVM

The first classifier predicts the gender of the supplied image. As in the other experiments, we performed a dimensionality reduction experiment using NMF and PCA. PCA produced the best results at 156 dimensions with a training accuracy of 88% and a test accuracy of 83%. NMF produced the best results at 160 dimensions with a training accuracy of 88% and a test accuracy of 83%. With these results, neither PCA nor NMF produces a significantly more accurate model. The results of the NMF dimensionality experiments are shown in Figure 13 and the results of the PCA dimensionality experiments are shown in Figure 14.

Figure 13: NMF — Gender Classification Results

Figure 14: PCA — Gender Classification Results

The child classifiers are the models that were trained in section 4.1.2. Each model is trained on a dimensionally reduced dataset using both NMF and PCA, and the number of dimensions used in each gender-based classifier was the number that produced the most accurate result in the experiments of section 4.1.2. The NMF- and PCA-reduced datasets produced results similar to the first experiment: the bagged set of NMF-based classifiers produced a training accuracy of 71% and a testing accuracy of 67%, while the bagged set of PCA-based classifiers produced a training accuracy of 73% and a testing accuracy of 68%.
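
A minimal sketch of how the bagged prediction fits together is shown below, assuming the gender model and the two per-gender attractiveness models (each paired with its fitted NMF or PCA reducer) have already been trained as described above.

# Sketch: route each image through the gender SVM first, then score attractiveness
# with the SVM trained for the predicted gender. All models are assumed pre-fitted,
# and male_model / female_model are assumed (reducer, svm) pairs.
import numpy as np

def predict_attractiveness(X):
    gender_pred = gender_svm.predict(gender_reducer.transform(X))
    y_pred = np.empty(len(X), dtype=int)
    for g, (reducer, svm) in (("male", male_model), ("female", female_model)):
        mask = gender_pred == g
        if mask.any():
            y_pred[mask] = svm.predict(reducer.transform(X[mask]))
    return y_pred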

Figure 15: PCA/NMF — Bagged Attractiveness Classification Results

4.1.4 Experiment Results

Through several SVM-based experiments, we determined that the simple classifier in section 4.1.1 produced the most consistent results, though the bagged classifier in section 4.1.3 produced comparable results. It is possible that adding further classifiers to the bagged model could improve its accuracy. Another possibility for improving accuracy would be to boost the gender-based classifier based on the existence of certain facial features.

5 Neural Networks

5.1 Simple Network

The simple hidden-layer networks did not perform well. Most of them converged to choosing a bias and forcing all test outputs to that bias. Therefore, each experiment converged to the 67.87% bias term for binary attractiveness and 47.67% for categorical attractiveness. Gender classification was not tried. Many networks never decreased their loss even after 300 epochs.

5.2 Convolutional Network

The convolutional neural network was able to perform binary attractiveness classification (whether the face scores above 2 in attractiveness) with an accuracy of approximately 68%. The network was tested while varying the number of epochs from 3 to 24. This is not a significant improvement over the naive baseline classifier of 65.8%.
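
The report does not record the exact architecture; purely as an illustration, a small convolutional binary classifier of this general shape could be written with Keras as follows. The layer sizes, the 80x64 grayscale input, and the optimizer are all assumptions, not the network that produced the numbers above.

# Illustrative small CNN for binary attractiveness classification.
# Every architectural detail here is an assumption made for the sketch.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(80, 64, 1)),              # assumed 80x64 grayscale images
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),        # attractive / unattractive
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=12, validation_data=(X_test, y_test))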


Figure 16: Binary Attractiveness Accuracy — Total Epochs

The convolutional neural network was able to perform categorical attractiveness classification (predicting the actual attractiveness score) with an accuracy of approximately 48.53%. This is not a significant improvement over the naive baseline classifier of 47.67%.

The convolutional neural network was able to perform gender classification (whether the face belongs to a male or female human) with an accuracy of approximately 82%. The network was tested while varying the size of the last hidden layer from 2 to 64. This is a significant improvement over the naive 54.4% baseline classifier. Below are the learned weights for the first and second layers of the convolutional neural network.

Figure 17: Gender Accuracy — Size of Last Hidden Layer

Figure 18: Convolution Neural Net — Layer 1 Components


Figure 19: Convolution Neural Net — Layer 2 Components

6 Future Work

As mentioned in the experiment results of section 4.1.4, we would like to explore other features to use within our bagged classifier in order to improve its accuracy. Eisenthal et al. reported in Neural Computation that the left-right symmetry of a face has a strong correlation with its attractiveness score [7]. We could possibly improve the accuracy of our SVM classifier by boosting the produced score by a symmetry factor. Another way to improve accuracy could be to convert our learned model from a classifier into a regression model that predicts the actual score, as other experiments have done.

7 Conclusion

In conclusion, we were able to identify attractive faces when the data was separated into a binary labeled dataset. The results produced were similar to those of the other projects we researched. Classifying facial attractiveness is a much harder problem than gender detection, even though the two appear to be very similar tasks.

References

[1] Pattern Recognition. Elsevier Inc., 2009.

[2] A. Kagian, G. Dror, T. Leyvand, et al. A machine learning predictor of facial attractiveness revealing human-like psychophysical biases. Vision Research, 48(2):235–243, 2008.

[3] Hani Altwaijry. Facial attractiveness scoring. 2011.

[4] W. A. Bainbridge, P. Isola, and A. Oliva. The intrinsic memorability of face photographs. Journal of Experimental Psychology: General, 142(4):1323–1334, 2013.

[5] BSD. BSD General Commands Manual - sips, October 2013. Darwin.

[6] Jim Hefner and Roddy Lindsay. Are you hot or not? 2006.

[7] Y. Eisenthal, G. Dror, and E. Ruppin. Facial attractiveness: Beauty and the machine. Neural Computation, 18(1):119–142, 2006.
