Computational perspectives on the other-race effect


Visual Cognition

Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/pvis20

Alice J. O'Toole and Vaidehi Natu, School of Behavioural and Brain Sciences, University of Texas at Dallas, Richardson, TX, USA

Published online: 14 Jun 2013.

To cite this article: Alice J. O'Toole & Vaidehi Natu (2013) Computational perspectives on the other-race effect, Visual Cognition, 21:9-10, 1121-1137, DOI: 10.1080/13506285.2013.803505

To link to this article: http://dx.doi.org/10.1080/13506285.2013.803505



Psychological studies have long shown that human memory is better for faces of our own race than for faces of other races. In this paper, we review computational studies of own- versus other-race face processing. Computational models examine the visual challenges of representing the uniqueness of individual faces that vary both within and across demographic categories. These models isolate the visual components of the other-race effect and provide an objective control for socio-affective responses to other-race faces. This control allows researchers to compare and test the role of experience/contact in the other-race effect, using various operational definitions of this theoretical construct. The models show that to produce an other-race effect computationally, biased experience or learning must intervene during the process of feature selection. This implicates the critical importance of “developmental” learning in the other-race effect.

Keywords: Computational; Face; Other-race effect.

The perception of own- and other-race faces has been studied with experimental behavioural approaches for decades (Malpass & Kravitz, 1969). These studies document factors in the domains of social and visual perception that give rise to differences in the quality and flexibility of our memories for own- and other-race faces. Over the last few decades, there have been hundreds of papers examining perceptual and memory components of the other-race effect. This Special Issue contains a comprehensive look at these findings and their implications.

Please address all correspondence to Alice J. O’Toole, School of Behavioural and Brain Sciences, University of Texas at Dallas, Richardson, TX 75080, USA. E-mail: [email protected]

Thanks are due to funding from the Technical Support Working Group of the Department of Defense, which supported the authors in preparing this paper. Thanks are also due to Allyson Rice and two anonymous reviewers for comments on a previous version of this manuscript.

Visual Cognition, 2013, Vol. 21, Nos. 9–10, 1121–1137, http://dx.doi.org/10.1080/13506285.2013.803505

© 2013 Taylor & Francis


The purpose of the present paper is to consider a less studied aspect of the other-race effect: its computational foundations. In other words, we will examine computer-based models of face perception and recognition algorithms that have focused on the computational challenges posed by processing faces of different races. We note at the outset that the definition of the other-race effect in a computational model is not entirely analogous to its definition in the psychological literature. Rather, in computational terms, “race” is a visually based stimulus category defined by the statistical variability of faces from within and across demographic race/ethnicity categories. In short, race is one of several demographic categories of faces (e.g., gender, age) that may pose special challenges to computational models. As we will see, “other-race” faces, in computational terms, will be defined in various ways, ranging from race categories that constitute a minority of the faces in a computationally based model memory or experience history, to race categories that are underrepresented where a particular face recognition algorithm was developed. We will return to, and expand on, these definitional variables at several points in the paper.

From the computational perspective, engineers have been developing face recognition models since the 1980s. The earliest attempts at these algorithms quickly came up against the difficulty of encoding the uniqueness of individual faces in the context of populations of faces that share the same set of “features”, arranged in roughly the same configuration. Feature-based approaches to computationally based face recognition did not fare well, because feature descriptors did not adequately capture the uniqueness of individual faces. The second wave of computational attempts fared better because these models captured subtle variations in the form and configuration of the features, using global structure quantifiers, such as principal components. The third wave of computational models has been successful enough to produce commercial algorithms that are now used in industry and by governments for identity verification.

Recent tests indicate that the best algorithms now perform more accurately than humans in some challenging viewing environments (O’Toole, An, Dunlop, Natu, & Phillips, 2012; O’Toole et al., 2007). Although computational models of face recognition are still limited in fundamental ways, this approach has informed psychologists about the nature of the computational problems brains encounter in representing and remembering faces. In particular, the models offer insight into the problem of quantifying information in faces that can specify an individual’s identity and his/her status with respect to visually derived semantic categories (e.g., sex, race, age) (Bruce & Young, 1986). For present purposes, a computational framework provides for a quantitative description of demographic categories of humans (race, sex, age) and a description of how faces within a demographic category differ from one another. This framework has proven useful for evaluating and modelling the role of experience in the other-race effect.

We digress briefly to note that the term experience takes on different connotations in computational and psychological studies. We will argue, however, that there are more points of contact among the multiple meanings of this term than we might first anticipate. To begin, the rich literature on developmental experience shows that exposure or contact during a critical period can contribute to the definition and delineation of neural and perceptual feature sets (Kuhl, 1994, 1998). This type of experience has an analogous form in computational models that derive face representations from statistical learning. Changes to face processing abilities that take place at this juncture are not (easily) malleable later in the life of the person or computational model. In addition, contact or experience with people of different races later in life still impacts face processing skill, but these effects differ in fundamental ways from those imposed by developmental contact. As we will see, the computational analogy to adult learning affects the memory capacity of the model and its ability to keep distinct representations of similar faces. It is worth noting at the outset that race is both a social and a visually derived semantic category (Bruce & Young, 1986). Gender and age are likewise social and visually derived semantic categories. Similarly strong effects of experience on perception and memory have not been reported for age and gender. This may be due to differences in developmental versus later life contact with race versus gender and age. We will return to the theme of learning types at several points in the paper.

In this paper, we review computational studies of the other-race effect. This is a rather sparse literature. Throughout the paper, we emphasize computational findings relevant for understanding the challenges of creating representations that are shaped by “experience” with faces. Although the role of experience in the other-race effect for humans has been controversial, an understanding of the diversity of neural and computational embodiments of experience may help to bridge theoretical gaps that are left open with behavioural approaches.

COMPUTATIONAL APPROACHES TO THE OTHER-RACE EFFECT

Computer vision researchers have been developing computational models of face recognition for roughly two decades now. Progress on this endeavour has been reviewed in detail in a recent collection of papers (Li & Jain, 2011). Historically, and even into the present, these models divide naturally into two types. Both types are inspired by biological/psychological models of human perception, albeit in different ways. The first approach is based on Gabor wavelets connected with dynamic link architectures. Gabor filters are roughly analogous to the receptive field properties of cells in the primary visual cortex (e.g., Wiskott, Fellous, Kruger, & von der Malsburg, 1997; cf. Shen & Bai, 2006, for a review). In these models, local Gabor filters sample the face at points on a square lattice. The process of recognition is implemented by deforming the lattice to “fit” other stored face exemplars. If the amount of deformation required for the fit does not exceed some threshold, the best-fit match is chosen as the recognized identity. These models show some ability to compensate for changes in illumination, expression, and small changes (10–15 degrees) in pose. They have also been used to model aspects of human object and face processing (Biederman & Kalocsai, 1997). As we will see here, these models operate independently of experience with faces, a characteristic that makes them ill-suited to model some aspects of the other-race effect.

The second type of model is based on a global analysis of faces implemented with principal components analysis (PCA; O’Toole, Abdi, Deffenbacher, & Valentin, 1993; Sirovich & Kirby, 1987; Turk & Pentland, 1991). In its original implementation, this analysis was applied to images of faces. Since then, PCA has been applied to “morphable” face representations from two-dimensional images (Hancock, Burton, & Bruce, 1996) and to three-dimensional laser scans (Blanz & Vetter, 1999). In both cases, the data include a separated representation of the shape and reflectance of a face. The psychological appeal of PCA-based computational approaches, especially implemented with morphable models, is that they are metaphorically compatible with face space models (Valentine, 1991). As such, they have proven valuable for understanding well-known effects in human face recognition. In particular, a face-space framework has been used to reason about the effects of typicality on face recognition (Light, Kayra-Stuart, & Hollander, 1979). The framework also provides an appealing metaphor for face adaptation results (Leopold, O’Toole, Vetter, & Blanz, 2001; Webster & MacLin, 1999). For present purposes, face-space representations make for a relatively natural conceptualization of how we perceive and remember own- and other-race faces. As we shall see, however, not all computational implementations of face-space models effortlessly produce an other-race effect. Comparing the ones that do with the ones that do not offers interesting clues about how experience can affect the quality of face representations for own- and other-race faces.
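To make the face-space idea concrete, the following minimal Python sketch (assuming NumPy, with random vectors standing in for aligned face images) derives a PCA face space, projects a face into it, and reconstructs the face from its coordinates. The function names and the 20-by-50 synthetic data set are illustrative assumptions and are not taken from any of the cited models.

```python
import numpy as np

def fit_face_space(faces, n_components):
    """PCA on a (n_faces, n_pixels) array; returns mean face and components."""
    mean_face = faces.mean(axis=0)
    # Rows of vt are the principal axes ("eigenfaces"), ordered by variance.
    _, _, vt = np.linalg.svd(faces - mean_face, full_matrices=False)
    return mean_face, vt[:n_components]

def project(face, mean_face, components):
    """Coordinates of one face in the face space."""
    return components @ (face - mean_face)

def reconstruct(coords, mean_face, components):
    """Map face-space coordinates back to pixel space."""
    return mean_face + components.T @ coords

def distance_from_mean(coords):
    """Distance from the average face: one simple proxy for atypicality."""
    return float(np.linalg.norm(coords))

# Synthetic stand-in for aligned face images (assumption: 20 faces, 50 "pixels").
rng = np.random.default_rng(0)
faces = rng.normal(size=(20, 50))

# 20 centred faces span at most 19 dimensions, so 19 components suffice
# to reconstruct every face that was used to build the space.
mean_face, components = fit_face_space(faces, n_components=19)
coords = project(faces[0], mean_face, components)
rebuilt = reconstruct(coords, mean_face, components)
print("max reconstruction error:", float(np.abs(rebuilt - faces[0]).max()))
print("distance from mean face:", distance_from_mean(coords))
```

In this framework a face is a point whose coordinates are its projections onto the principal components, and its distance from the origin (the average face) is one simple way to express how atypical it is.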

COMPUTATIONAL MODELLING OF THE OTHER-RACE EFFECT

The first attempt to computationally model the other-race effect was carried out by O’Toole, Deffenbacher, Abdi, and Bartlett (1991) using a simple autoassociative neural network. This was applied to images of faces that were aligned so that the eye level and nose roughly coincided. Autoassociative networks implement a PCA with an iterative procedure that is reminiscent of perceptual learning. Specifically, these networks act as content-addressable memories in which the storage of information is parallel and distributed (images share the same storage space). The iterative “perceptual learning” procedure makes gradual adjustments to “neural synaptic” strength to minimize errors in the ability of the memory to reconstruct a stored image from a full or partial cue (occluded image). Using this model, O’Toole et al. conceptualized the cause of the other-race effect as an imbalance in the quality of face representations for own- and other-race faces. This imbalance was thought to be due to differences in the amount of perceptual learning people have for own- versus other-race faces. This type of model embodies the contact hypothesis of the other-race effect (cf. Levin, 2000). The basic elements of a computational approach to the contact hypothesis are summarized in O’Toole et al. and have not changed much in 20 years.

“First, we assume that faces of different races comprise different statistical categories of faces. Second, within a given category of faces, a set of differentially weighted ‘features’ is optimal for encoding faces in a manner that makes faces within the category most discriminable. Different feature sets and weightings, however, are optimal for processing faces from other-race categories of faces. Third, with exposure to many faces of a given race and a smaller number of faces of other races, perceptual learning enables observers to make optimal use of the features that are best for processing faces from the category with which they have had the most experience, typically faces of their own race. By this account, the difficulties experienced with faces of another race are due to the fact that the optimal features for distinguishing faces of one’s own race are not optimal in processing the faces of another race” (O’Toole et al., 1991, pp. 164–165).

To test the model, O’Toole et al. (1991) implemented autoassociative networks trained on a “majority race” (95% of faces) and a “minority race” (5% of faces), with Japanese and Caucasian faces serving alternately as the majority and minority race. Next, they compared the quality of face representations for novel majority versus novel minority-race faces. Novel refers to faces that were not used to create the autoassociative matrix. The results revealed more accurate reconstructions (i.e., representations) for faces from the majority race than from the minority race. Moreover, inter-face similarity, computed as the similarity between all possible pairs of reconstructed images, was higher for the minority-race than majority-race faces. Thus, the model created less distinctive representations for minority faces. This result simulates the basic perceptual components of the other-race effect.
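The logic of this simulation can be illustrated with a small synthetic sketch in Python (assuming NumPy). It is not the original implementation: the two face categories are hypothetical low-dimensional Gaussian clusters rather than Japanese and Caucasian face images, the memory is computed in closed form as a projector onto the stored faces rather than by iterative learning, and reconstruction quality is measured as the cosine between a face and its reconstruction. All names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, RANK = 400, 10   # "pixels" per face; degrees of freedom within a category

def make_category():
    """Hypothetical face category: a mean face plus a low-dimensional basis."""
    return rng.normal(size=DIM), rng.normal(size=(DIM, RANK))

def sample(category, n):
    """Draw n synthetic faces (rows) from a category."""
    mean, basis = category
    return mean + rng.normal(size=(n, RANK)) @ basis.T

def autoassociator(stored):
    """Projector onto the span of the stored faces (rows of `stored`)."""
    u, s, _ = np.linalg.svd(stored.T, full_matrices=False)
    u = u[:, s > 1e-10 * s[0]]          # drop numerically null directions
    return u @ u.T

def quality(memory, faces):
    """Mean cosine between each face and its reconstruction by the memory."""
    recon = faces @ memory              # memory is symmetric
    num = np.sum(faces * recon, axis=1)
    den = np.linalg.norm(faces, axis=1) * np.linalg.norm(recon, axis=1)
    return float(np.mean(num / den))

majority, minority = make_category(), make_category()

# Race-biased learning history: 95 majority faces and 5 minority faces.
memory = autoassociator(np.vstack([sample(majority, 95), sample(minority, 5)]))

q_major = quality(memory, sample(majority, 50))   # novel majority faces
q_minor = quality(memory, sample(minority, 50))   # novel minority faces
print(f"majority {q_major:.3f}  minority {q_minor:.3f}")
```

Under these assumptions, novel majority faces are reconstructed almost perfectly, whereas novel minority faces are reconstructed less well, because five stored exemplars do not span the ways in which minority faces vary.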

The simulation of a face recognition task with this model proved more difficult than simulating the perceptual effects. The problem was that the quality of reconstructions for the learned minority faces was actually higher than the quality of the learned majority faces. This finding was puzzling at first, though the reason for it was ultimately obvious. The minority faces were distinct with respect to the training set. Therefore, the matrix could store the minority faces with minimal interference at the learning stage (i.e., these faces contributed to the features, or PCs, of the model). To simulate the recognition component of the other-race effect, the solution was to use a race-biased face history matrix combined with a short-term “recognition” matrix. The former was strongly race-biased and the latter had equal numbers of Caucasian and Japanese faces. Using a weighted combination of the history matrix (0.75) and the short-term matrix (0.25), O’Toole et al. (1991) found the recognition memory version of the other-race effect.

In summary, O’Toole et al. (1991) showed that it was possible to simulate the perceptual and memory components of the other-race effect using a simple computational model in which experience or training with different races is varied. The need to incorporate different kinds of experience into the model to obtain the recognition effect suggested that a learning history is critical in determining the quality and suitability of the feature space for representing faces. The long-term history may be relevant for understanding the role of developmental learning in constraining the quality of representations possible for faces from different races.

The importance of considering developmental experience in computational accounts of the other-race effect was reinforced in a study over a decade later. Furl, Phillips, and O’Toole (2002) evaluated face recognition algorithms from the Face Recognition Technology (FERET) program (Phillips, Moon, Rizvi, & Rauss, 2000). The FERET evaluation was a US government-sponsored test of computer-based face recognition algorithms, conducted between 1994 and 1997. Furl et al. evaluated six algorithms from that test, plus seven additional control algorithms implemented by the organizers of the FERET as baseline algorithms. Furl et al. tested the algorithms’ ability to recognize Caucasian (majority race) and Asian (minority race) faces. Note that in this study, the majority race of Caucasian was set by the FERET competition database and was not under the control of the experimenters.

The algorithms available from the FERET could be grouped into categories that mapped well onto two psychological hypotheses about the role of experience in the other-race effect. The generic contact hypothesis gives equal weight to learning throughout the “virtual lifespan” of the algorithm. Eight algorithms fit this description, including seven baseline PCA models, implemented by Moon and Phillips (2001),¹ and an eighth PCA-based algorithm from Moghaddam and Pentland (1997).

¹ Jonathon Phillips was the organizer of the FERET test and so the controls were implemented as baseline algorithms against which the others could be compared.



The developmental contact hypothesis assumes that exposure to faces early in life tunes feature selection to optimize the quality of representations for the faces we see most, typically faces of our own race (cf. Nelson, 2001).² Once a critical period has passed, these features remain stable. This account is similar to those proposed by Kuhl (1994, 1998) for native language learning, whereby phonetic features are tuned with early developmental exposure to language, and remain stable thereafter. This perspective also figures prominently in more recent studies of the development of face recognition (cf. Anzures et al., this issue 2013). Returning to the question of faces, three algorithms in the FERET used a two-stage learning process analogous to developmental and mature learning (Moghaddam & Pentland, 1998; Swets & Weng, 1996; Zhao, Krishnaswamy, Chellappa, Swets, & Weng, 1998). The first step was one of feature selection (using PCA). This was followed by a standard linear discriminant analysis on a set of newly learned test faces.

Finally, two additional control algorithms available from the FERET test were deemed noncontact algorithms, because they used the image-based discriminability of faces. Of note, one of these was an algorithm based on a dynamic link architecture that processed the output of Gabor jet filters. This model produces a representation of faces that is independent of the learning history of the algorithm.
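The two-stage structure of the developmental contact algorithms described above can be sketched as follows. This is an illustrative example in Python (assuming NumPy), not a reimplementation of any FERET algorithm: stage 1 fixes a PCA feature set from an initial “developmental” set of faces, and stage 2 computes Fisher linear discriminant axes within those fixed features for newly learned identities. The synthetic data, the function names, and the nearest-class-mean decision rule are all hypothetical, and the sketch shows only the two stages, not a race bias in the developmental set.

```python
import numpy as np

rng = np.random.default_rng(1)

def pca_features(developmental_faces, k):
    """Stage 1: fix the feature set from early ("developmental") exposure."""
    mean = developmental_faces.mean(axis=0)
    _, _, vt = np.linalg.svd(developmental_faces - mean, full_matrices=False)
    return mean, vt[:k]

def fisher_axes(coords, labels):
    """Stage 2: linear discriminant axes computed inside the fixed features."""
    classes = np.unique(labels)
    grand = coords.mean(axis=0)
    dim = coords.shape[1]
    within, between = np.zeros((dim, dim)), np.zeros((dim, dim))
    for c in classes:
        group = coords[labels == c]
        mu = group.mean(axis=0)
        within += (group - mu).T @ (group - mu)
        between += len(group) * np.outer(mu - grand, mu - grand)
    evals, evecs = np.linalg.eig(np.linalg.pinv(within) @ between)
    top = np.argsort(evals.real)[::-1][: len(classes) - 1]
    return evecs[:, top].real

# Synthetic data (assumption): 5 identities, 6 training and 4 test images each.
DIM, K, N_ID = 30, 10, 5
developmental = rng.normal(scale=3.0, size=(200, DIM))
centres = rng.normal(scale=3.0, size=(N_ID, DIM))

def images(n):
    """n noisy images per identity, with identity labels."""
    x = np.repeat(centres, n, axis=0) + rng.normal(scale=0.3, size=(N_ID * n, DIM))
    return x, np.repeat(np.arange(N_ID), n)

mean, comps = pca_features(developmental, K)
train_x, train_y = images(6)
test_x, test_y = images(4)
axes = fisher_axes((train_x - mean) @ comps.T, train_y)

def embed(x):
    """Pixels -> fixed PCA features -> discriminant coordinates."""
    return (x - mean) @ comps.T @ axes

class_means = np.array([embed(train_x[train_y == c]).mean(axis=0) for c in range(N_ID)])
dists = np.linalg.norm(embed(test_x)[:, None, :] - class_means[None], axis=2)
accuracy = float(np.mean(dists.argmin(axis=1) == test_y))
print("nearest-class-mean accuracy:", accuracy)
```

The point of the design is that the PCA features are never re-estimated in stage 2, so whatever the developmental set failed to represent cannot be recovered by the later discriminant analysis.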

More concretely, all algorithms in the FERET evaluation were trained using a large number of faces (n = 501). Caucasians comprised the majority (64%) of faces and Asians were the next most populous ethnic group, making up approximately 7% of the set. Furl et al. (2002) tested face recognition accuracy for the algorithms using an equal number of Asian and Caucasian faces from the training set (old) and an equal number of novel Asian and Caucasian faces. The results were clear. Algorithms in the developmental contact group consistently yielded better performance with the Caucasian faces (majority race). Similar to the result found earlier by O’Toole et al. (1991) with the PCA model without a learning history (i.e., a step whereby PC features were based on a learning step with a race bias), seven of the eight generic contact models performed more accurately on the minority race of Asian faces. The two noncontact models split between an Asian advantage and a Caucasian advantage.

² We assume this will be covered in detail by Anzures et al. (this issue 2013).

The study by Furl et al. (2002) reinforced some basic computational principles relevant for understanding psychological embodiments of the contact hypothesis. Specifically, it is well known that contact with other-race faces, measured in self-report surveys, is a poor predictor of the magnitude of the other-race effect (Levin, 2000). As noted by Levin (2000), using this type of method, some studies support the contact hypothesis (Carroo, 1986; Chiroro & Valentine, 1995; Cross, Cross, & Daly, 1971; Feinman & Entwisle, 1976; Shepherd, Deregowski, & Ellis, 1974) and others do not (Brigham & Barkowitz, 1978; Lavrakas, Buri, & Mayzner, 1976; Malpass & Kravitz, 1969; Ng & Lindsay, 1994). The most consistent effects of contact have been reported in developmental studies, where other-race exposure occurs early in life (Chance, Turner, & Goldstein, 1982; Cross et al., 1971; Feinman & Entwisle, 1976). These studies have now been joined by newer studies with infants, showing that experience with different races of faces can result in differences for own- versus other-race face processing as early as 3 months of age (e.g., Kelly, Liu, Ge, Quinn, Slater, Lee, & Pascalis, 2007).

Following along the lines of the emergence of the other-race effect in infants, Balas (2012) implemented a computational Bayesian model of the development of the other-race effect. He modelled developmental learning as a perceptual narrowing of infants’ ability to discriminate individual faces, which proceeded in the context of the development of face race categories. The model was a variant of Moghaddam and Pentland’s (1998) algorithm, one of the models that exhibited an other-race effect in the study of Furl et al. (2002). This algorithm represents a unique and intriguing approach to the problem in that it directly learns to distinguish appearance differences that are due to intrapersonal variation from appearance differences due to extrapersonal variation. As such, training examples for the model come from difference images created from the same person (intrapersonal variation) and from different people (extrapersonal variation). The key manipulation in the simulations of Balas was the inclusion or exclusion of extrapersonal variations that crossed race boundaries. The model proceeds by generating two face spaces: one from the intrapersonal difference images and the other from the extrapersonal difference images. Next, a Bayesian classifier was trained to discriminate the two types of variation. Using the resultant discriminator, Balas simulated a Visual Paired Comparison (VPC) task commonly employed in infant experiments. In this task, infants view a target face and must compare it to two additional images: one that matches the target identity and a second image of a different person. Here, Balas tested the model on its ability to make VPC discriminations.

As noted, the inclusion or exclusion of training examples of extrapersonal variations that crossed race boundaries was manipulated. When these cross-race examples were included, Balas (2012) found better performance for the majority White faces (90% of the training data) than for the minority Asian race faces. He concluded that the development of the other-race effect in this model is consistent with perceptual narrowing, whereby the formation of race categories plays an important role in determining the relative discriminability of own- and other-race faces.



The importance of a computationally based experience history in the other-race effect was investigated further by Haque and Cottrell (2005), this time with a focus on race categorization. Levin (2000) showed that other-race faces can be categorized by race more quickly than own-race faces. Levin proposed that for other-race faces, race acts as a salient “feature” that is easy to detect. For own-race faces, race is treated as the absence of the feature. Haque and Cottrell modelled the categorization advantage people show for other-race faces using a PCA similar to that used in O’Toole et al. (1991). In this case, however, they modelled the information content of faces from a majority race versus minority race (Asian or Caucasian), using a representation created by the PCA model. Information content was assessed based on Shannon entropy, as a kind of outlier measure, i.e., outlier faces have more information. Indeed, they found that the minority race faces had significantly more information than the majority race faces. They interpret this result in terms of a feature-positive state or salient marker of race in the minority race faces.
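One simple way to treat information content as an outlier measure is to compute the surprisal (negative log probability) of each face under a Gaussian fitted to the learned representation. The Python sketch below (assuming NumPy) adopts this operationalization, which may differ from the exact measure used by Haque and Cottrell (2005). It uses synthetic coordinates in place of PCA projections of face images, and the 90/10 split and the size of the category offset are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 5                        # number of retained coordinates (assumed)
SHIFT = np.full(K, 3.0)      # hypothetical offset of the minority category

def sample(n, minority=False):
    """Synthetic face coordinates; minority faces are offset by SHIFT."""
    return rng.normal(size=(n, K)) + (SHIFT if minority else 0.0)

# Race-biased experience: 90% majority, 10% minority (illustrative proportions).
train = np.vstack([sample(900), sample(100, minority=True)])
mean = train.mean(axis=0)
cov = np.cov(train, rowvar=False)
cov_inv = np.linalg.inv(cov)
_, logdet = np.linalg.slogdet(cov)

def surprisal(faces):
    """-log p(face) in nats under the fitted Gaussian; larger = more of an outlier."""
    diff = faces - mean
    maha = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # squared Mahalanobis distance
    return 0.5 * (maha + logdet + K * np.log(2 * np.pi))

info_major = float(surprisal(sample(200)).mean())
info_minor = float(surprisal(sample(200, minority=True)).mean())
print(f"mean surprisal: majority {info_major:.2f}, minority {info_minor:.2f}")
```

Because the fitted distribution is dominated by the majority category, minority faces sit further from its centre and carry more information in this sense, which is the outlier reading of the result described above.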

Moving the clock forward, we are now in an era in which face recognition algorithms are commercial products, widely used by governments, law enforcement agencies, and other industries where identity verification is necessary. Over the past decade, beginning with the FERET program, the US Government has organized large-scale, international tests of face recognition algorithms on a regular basis. Because these tests attract the best algorithms in the world, and because the results of these tests are publicly available, much is known about the state of the art for automatic face recognition. Since roughly 2005, our lab has had the opportunity to conduct head-to-head comparisons between algorithms and humans. In these experiments, we have compared humans and machines at the task of matching identity in pairs of images.

We digress briefly to describe the procedure we have used in the human–machine comparisons we discuss here. In the large-scale government tests, algorithms match identity in pairs of images (often more than 100 million pairs). They do this by assigning a similarity score to each pair of images. The similarity score indicates the algorithm’s estimate that the two images are of the same person. The algorithm data consists of a distribution of similarity scores for same-identity image pairs (pictures of the same person) and a distribution of similarity scores for different-identity pairs (pictures of different people). A receiver operating characteristic (ROC) curve, created from the same- and different-identity distributions, is used to summarize an algorithm’s performance. Human participants in our experiments likewise match identity in interesting subsets (i.e., demographic groups) of the image pairs used for the algorithm tests. Participants generate a similarity score for each image pair by rating them on the following scale: (1) sure they are the same person; (2) think they are the same person; (3) don’t know; (4) think they are different people; (5) sure they are different people. These ratings are used to create analogous ROC curves for humans and machines using the same image pairs.
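As an illustration of this procedure, the Python sketch below builds an ROC curve from same-identity and different-identity similarity scores and summarizes it by the area under the curve. The scores and ratings are invented for the example, the function names are hypothetical, and negating the 1 to 5 ratings is just one convenient way to make larger values mean “more likely the same person”.

```python
def roc_points(same_scores, diff_scores):
    """(false-alarm rate, hit rate) pairs obtained by sweeping a threshold."""
    thresholds = sorted(set(same_scores) | set(diff_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        hit = sum(s >= t for s in same_scores) / len(same_scores)
        fa = sum(s >= t for s in diff_scores) / len(diff_scores)
        points.append((fa, hit))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical algorithm similarity scores (higher = more likely same person).
algo_same = [0.91, 0.84, 0.77, 0.60, 0.95]
algo_diff = [0.35, 0.52, 0.66, 0.20, 0.41]

# Hypothetical human ratings on the 1-5 scale (1 = sure same ... 5 = sure
# different), negated so that larger values again mean "same person".
human_same = [-r for r in [1, 2, 1, 3, 2]]
human_diff = [-r for r in [5, 4, 3, 4, 2]]

algo_auc = auc(roc_points(algo_same, algo_diff))
human_auc = auc(roc_points(human_same, human_diff))
print(f"algorithm AUC {algo_auc:.2f}, human AUC {human_auc:.2f}")
```

With these invented numbers the area under the curve is 0.96 for the algorithm and 0.90 for the human ratings. Because both curves are computed from the same kind of input, they can be compared directly on the same image pairs.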

In our first human–machine comparisons, we worked with algorithms entered in the Face Recognition Grand Challenge (FRGC; Phillips et al., 2005) and the Face Recognition Vendor Test 2006 (FRVT 2006; Phillips et al., 2010). In those first experiments, we found, much to our surprise, that the best algorithms performed more accurately than humans, even with differences in illumination between the two images (O’Toole et al., 2007; O’Toole, Phillips, & Narvekar, 2008). Figure 1 shows an example image pair, with one image taken with studio lighting and the other image taken in a corridor. With more recent algorithms, tested with highly variable images (i.e., taken indoors and outdoors, and with expression and appearance changes), algorithms are now better than humans in all but the most challenging conditions (O’Toole, An, et al., 2012).

Figure 1. Example of stimulus pair and response options. To view this figure in colour, please see the online issue of the Journal.

One issue with all of these tests is that the database used to evaluate the algorithms contains mostly Caucasian faces. The sheer number of stimulus pairs tested in the FRVT 2006, however, made it possible to determine whether algorithms in these tests showed evidence of an other-race effect. For algorithms, this amounts to asking the following question: “Does the ethnic composition of the population at the geographic origin of the algorithm (i.e., where it was developed) affect how an algorithm performs on faces of different races?” We defer a discussion of why we think this might be the case until after we present the results.

Phillips, Jiang, Narvekar, Ayyad, and O’Toole (2011) carried out the test

as follows. Algorithms in the FRVT 2006 could be divided into those

originating in Western countries (n = 8, from France, Germany, and the

United States) and those originating in East Asian countries (n = 5, from

Japan, Korea, and China). Phillips et al. created a Western fusion algorithm

by combining (see footnote 3) the similarity estimates produced by the Western algorithms

and an East Asian fusion algorithm by combining the similarity scores produced by

the East Asian algorithms. Next, all available same-identity and different-identity

pairs of Caucasian faces (n = 3,359,404) and Asian faces (n = 205,114) were used to create the ROC curves shown in Figure 2. These

curves show the classic other-race effect, with the East Asian fusion

algorithm more accurate with East Asian faces and the Western fusion

algorithm more accurate with Caucasian faces.
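The fusion step can be sketched in a few lines. The source says only that scores were rescaled and averaged; the z-score rescaling used below is an assumption made for illustration, and the function names and scores are hypothetical.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Rescale one algorithm's scores to zero mean and unit variance.

    Z-scoring is an assumed choice; the rescaling method actually used
    in the fusion is not specified in the text.
    """
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse(score_lists):
    """Average the rescaled scores across algorithms, pair by pair."""
    rescaled = [zscore(scores) for scores in score_lists]
    return [mean(pair_scores) for pair_scores in zip(*rescaled)]


# Hypothetical similarity scores from two algorithms on the same three pairs.
algo_a = [0.2, 0.5, 0.8]       # scores on a 0-1 scale
algo_b = [10.0, 40.0, 70.0]    # scores on a different, arbitrary scale
fused = fuse([algo_a, algo_b])
```

Rescaling before averaging keeps an algorithm with a wide numeric range from dominating the fused score.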

Additional experiments by Phillips et al. (2011) replicated the finding that

algorithm and human performance are closely matched. The study also

indicated that humans show an other-race effect at the task of identity

matching, a different task than those typically used in behavioural studies.

Figure 2. Receiver operating characteristic (ROC) curves for East Asian and Western algorithms on

East Asian and Caucasian faces.

3 Combining was done by rescaling the scores from the different algorithms and averaging

them.



Perhaps the most intriguing difference between the human and machine

behaviour was that although humans showed the other-race effect, their

performance was more stable across changes in face race than was the

performance of the algorithms, which in some cases “tanked” on faces of the

other race. Finally, we must ask the question of why the algorithms in this study

showed an other-race effect. Why would the demographic origin of the

algorithm affect its relative performance for faces from the local population

versus faces from a nonlocal population? Unfortunately, this is a question we

cannot answer definitively because many of the algorithms that compete in

international tests are proprietary. To protect the proprietary nature of the

software, while still encouraging the participation of the very best algorithms,

the US Government tests use executable versions of the software for their evaluations. As a consequence, we can only speculate about the reason(s) for

the other-race effect found by Phillips et al. (2011). Almost certainly, part of

this result has to do with the availability of training data (i.e., faces) where

individual algorithms are developed. This, combined with the likelihood that

most state-of-the-art algorithms employ statistical learning analyses for facial

feature selection, could cause an other-race effect. Fortunately, the next study

moves us closer to understanding the role of experience in the other-race effect

for current computer-based face recognition systems. Klare, Burge, Klontz, Vorder Bruegge, and Jain (2012) examined the effects

of demographics on the performance of six face recognition algorithms. Three

of these were commercial off-the-shelf (COTS) algorithms (Cognitec’s

FaceVACS ver. 8.2, PittPatt ver. 5.2.2, and Neurotechnology’s MegaMatcher ver.

3.1). Although the COTS algorithms probably make use of training regimes,

the authors were not able to alter this training in any way for their study. Two

algorithms were nontrainable: a local binary pattern model and a Gabor feature

representation model. The sixth algorithm, the spectrally sampled structural subspace features (4SF) algorithm, was trainable and was developed in-house by

Klare et al. The authors used this trainable algorithm to test hypotheses about

the computational mechanisms underlying the other-race effect.

Klare et al. (2012) tested the six algorithms on a large database that could

be subdivided into eight demographic categories. These categories were race

(Black, Hispanic, and White), sex, and age (younger, 18–30; middle-aged,

30–50; and older, 50–70). All three COTS algorithms and the two untrained

algorithms had lower match accuracies on the following three demographic groups: Blacks, females, and younger subjects. To gain insight into the role of

training, Klare et al. next tested the performance of their 4SF algorithm, as

follows. They compared their algorithm when it was trained with all three

ethnic groups simultaneously to three separate implementations of the

algorithm trained with each of the ethnic groups individually. The results

indicated that face-matching accuracy was best when the system was trained



only on faces of the same ethnicity. The authors suggest that all COTS face

recognition algorithms should have access to multiple face recognition

systems, trained on different demographic cohorts.

The studies of Klare et al. (2012) and Phillips et al. (2011) underscore the

importance of understanding the mechanisms behind human and machine-based other-race effects. Because face recognition technology is now assigned

to critical tasks, including passport and visa screening in many countries, the

relative accuracy of these machines for faces of different races is more than a

question of interest to psychologists. Rather, it has become part of larger

issues of social policy and nondiscrimination in assuring that everyone is

treated fairly and equally by these emerging technologies.

SUMMARY

In conclusion, in the last 20 years (or more) since the first algorithm-based

model of the other-race effect, we have gained insight into computational mechanisms that can give rise to the other-race effect. All of the models we

have seen indicate that differential experience with faces of various races, per

se, is not sufficient to produce the effect. Rather, to produce an other-race

effect computationally, biased experience or learning must intervene during

the process of feature selection. This computational principle aligns well with

developmental learning, which may produce a set of stable features that are

optimal for own-race face representation, but are limited in their ability to

represent the uniqueness of other-race faces.

ON MEASURING THE OTHER-RACE EFFECT FOR ALGORITHMS

In this final section, we briefly discuss a recently identified problem in

accurately measuring the performance of face recognition systems on

different demographic groups of faces. Although this may seem an esoteric

point that is of interest only to researchers who test algorithms, it is an

excellent example of how cross-talk between psychologists and computer

vision researchers can inform attempts to use automatic face recognition

technology. This measurement issue has become increasingly important with

the use of these systems in airports and in other venues where there is a continually changing tableau of faces of many races.

As noted previously, the performance of algorithms on face identification

tasks requires a distribution of similarity scores for pairs of images that show the

same identity and a distribution of similarity scores for pairs of images that show

different people. There is a strong tendency in the computer vision community

to worry only about the same-identity distribution. In other words, the idea is



that if the target and test image are similar, the algorithm will always perform

well. Of course, this is one part of the problem. The other part has to do with the

similarity of pairs of images that show different people. Psychologists have long

worried about the background against which known faces are encoded, in the

context of face typicality effects and other-race effects.

By definition, the same-identity pairs are of the same sex and ethnicity,

though not necessarily of the same age (pictures may be taken weeks, months, or

years apart). Different-identity pairs, however, might show people of

different ethnicities, genders, or ages. Whether different-identity image pairs

are constrained to be of the same sex, race, and age is a decision made by the

researchers. In many cases, the performance of face recognition technology is

evaluated with no constraints on the different-identity distribution.

O’Toole, Phillips, An, and Dunlop (2012) documented the effects of

yoking different-identity pairs by gender, race, or both on estimates of the

performance of face recognition algorithms. As expected, performance, as

measured with ROCs, looks best when the different-identity similarity score

distribution is completely unconstrained. In other words, this occurs when

the different-identity pairs are allowed to differ in race and sex. Concomitantly,

performance looks worse when different-identity pairs are

constrained to be of the same sex and race.
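To make the idea of yoking concrete, the sketch below builds the different-identity pair set with and without demographic constraints. The `Face` record, its labels, and the `yoke_on` parameter are hypothetical names introduced for illustration; they do not come from the evaluation described here.

```python
from itertools import combinations
from typing import NamedTuple


class Face(NamedTuple):
    identity: str
    race: str
    sex: str


def different_identity_pairs(faces, yoke_on=()):
    """Return different-identity pairs, optionally matched on attributes.

    yoke_on is a tuple of attribute names, for example ("race", "sex").
    A pair is kept only if both faces share every listed attribute.
    """
    pairs = []
    for a, b in combinations(faces, 2):
        if a.identity == b.identity:
            continue  # same-identity pairs belong to the other distribution
        if all(getattr(a, attr) == getattr(b, attr) for attr in yoke_on):
            pairs.append((a, b))
    return pairs


# Hypothetical image set; the labels are illustrative only.
faces = [
    Face("p1", "East Asian", "F"),
    Face("p2", "East Asian", "F"),
    Face("p3", "Caucasian", "F"),
    Face("p4", "Caucasian", "M"),
]
unconstrained = different_identity_pairs(faces)
yoked_by_race = different_identity_pairs(faces, yoke_on=("race",))
yoked_by_both = different_identity_pairs(faces, yoke_on=("race", "sex"))
```

Each constraint removes the pairs that differ on that attribute, which are typically the easiest different-identity pairs to reject. This is consistent with the observation that measured performance looks worse as constraints are added.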

Although this is in some ways an obvious result, it is one that becomes

important when we consider the background population against which

automatic face recognition systems must function. Imagine, for a moment,

an international airport in Europe. The ethnic diversity of the background

population may vary by the time of day (i.e., planes from East Asia,

Europe, North America, and Africa may land at different times of the day)

and/or by the time of year (tourist season). An unstable background

population will give rise to unstable expectations about how well the

algorithm will operate for faces from different races. Behavioural studies of

face typicality have focused the attention of psychologists on the importance

of these background distributions (i.e., what is typical in a particular

context). These findings also have a place in understanding and predicting

the behaviour of face recognition software in the field.

CONCLUSION

Computational algorithms provide a basic framework for testing perceptual

mechanisms that may give rise to an other-race effect, and thus have the

potential to inform psychological tests of the phenomenon. Understanding

how humans develop and retain perceptual and memory advantages for

faces of their own race has always been an important question in both

social and eyewitness domains. Face recognition systems are becoming a



commodity that we all deal with at border crossings and in security-monitored

settings (banks, embassies, railways, airports). From the broad

perspective of social and political policy, researchers from both psychology

and computer vision must begin to consider and counter the role face race

can play in the accuracy of these systems. Before closing, it is worth pointing out that, to date, computational

modelling efforts are limited by the availability of truly diverse databases that

have high quality images of several races. Notably absent in this literature is the

inclusion of images of people of Hispanic and African descent. This limits the

generality of the conclusions that can be made both in visual/perceptual and

social terms. Direct comparisons between humans and machines need to

consider both the perceptual challenges posed by faces of other races and

the social context of interactions among own- and other-race people.

Asymmetries in social contact also pose a challenge, with some people (and

categories of people) broadly exposed to other-race faces, and others less so.

Future efforts should be aimed at developing and testing algorithms on (as yet

nonexistent) databases that represent the full diversity of the human race.

REFERENCES

Anzures, G., Quinn, P. C., Pascalis, O., Slater, A. M., & Lee, K. (2013). Development of own and other-race biases. Visual Cognition. doi:10.1080/13506285.2013.821428

Balas, B. (2012). Bayesian face recognition and perceptual narrowing in face-space. Developmental Science, 15, 579–588. doi:10.1111/j.1467-7687.2012.01154.x

Biederman, I., & Kalocsai, P. (1997). Neurocomputational bases of object and face recognition. Philosophical Transactions of the Royal Society of London, 29B, 1203–1219. doi:10.1098/rstb.1997.0103

Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of three dimensional faces. In SIGGRAPH ’99: Proceedings of the 26th annual conference on Computer Graphics and Interactive Techniques (pp. 187–194). New York, NY: ACM Press and Addison-Wesley.

Brigham, J. C., & Barkowitz, P. (1978). Do “They all look alike?” The effects of race, sex, experience and attitudes on the ability to recognize faces. Journal of Applied Social Psychology, 8, 306–318. doi:10.1111/j.1559-1816.1978.tb00786.x

Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305–327. doi:10.1111/j.2044-8295.1986.tb02199.x

Carroo, A. W. (1986). Other-race face recognition: A comparison of Black American and African subjects. Perceptual and Motor Skills, 62, 135–138. doi:10.2466/pms.1986.62.1.135

Chance, J. E., Turner, A. L., & Goldstein, A. G. (1982). Development of differential recognition for own- and other-race faces. Journal of Psychology, 112, 29–37. doi:10.1080/00223980.1982.9923531

Chiroro, P., & Valentine, T. (1995). An investigation of the contact hypothesis of the own-race bias in face recognition. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 48A, 879–894.

Cross, J. F., Cross, J., & Daly, J. (1971). Sex, race, age, and beauty as factors in recognition of faces. Perception and Psychophysics, 10, 393–396. doi:10.3758/BF03210319

Feinman, S., & Entwisle, D. R. (1976). Children’s ability to recognize other children’s faces. Child Development, 47, 506–510. doi:10.2307/1128809



Furl, N., Phillips, P. J., & O’Toole, A. J. (2002). Face recognition algorithms as models of the other-race effect. Cognitive Science, 96, 1–19.

Hancock, P. J. B., Burton, A. M., & Bruce, V. (1996). Face processing: Human perception and principal components analysis. Memory and Cognition, 24, 26–40. doi:10.3758/BF03197270

Haque, A., & Cottrell, G. W. (2005). Modeling the other race advantage. In Proceedings of the 27th annual Cognitive Science conference (pp. 899–904). Mahwah, NJ: Lawrence Erlbaum Associates.

Kelly, D. J., Liu, S., Ge, L., Quinn, P. C., Slater, A. M., Lee, K., & Pascalis, O. (2007). Cross-race preferences for same-race faces extend beyond the African versus Caucasian contrast in 3-month-old infants. Infancy, 11, 87–95. doi:10.1207/s15327078in1101_4

Klare, B. F., Burge, M. J., Klontz, J. C., Vorder Bruegge, R. W., & Jain, A. K. (2012). Face recognition performance: Role of demographic information. IEEE Transactions on Information Forensics and Security, 7, 1789–1801. doi:10.1109/TIFS.2012.2214212

Kuhl, P. K. (1994). Learning and representation in speech. Current Opinion in Neurobiology, 4, 812–822. doi:10.1016/0959-4388(94)90128-7

Kuhl, P. K. (1998). The development of speech and language. In T. J. Carew, R. Menzel, & C. J. Schatz (Eds.), Mechanistic relationships between development and learning (pp. 53–73). New York, NY: Wiley.

Lavrakas, P. J., Buri, J. R., & Mayzner, M. S. (1976). A perspective on the recognition of other-race faces. Perception and Psychophysics, 20, 475–481. doi:10.3758/BF03208285

Leopold, D. A., O’Toole, A. J., Vetter, T., & Blanz, V. (2001). Prototype-referenced shape encoding revealed by high-level aftereffects. Nature Neuroscience, 4, 89–94. doi:10.1038/82947

Levin, D. (2000). Race as a visual feature: Using visual search and perceptual discrimination tasks to understand face categories and the cross-race recognition deficit. Journal of Experimental Psychology: General, 129, 559–574. doi:10.1037/0096-3445.129.4.559

Li, S. Z., & Jain, A. K. (2011). Handbook of face recognition. London: Springer-Verlag.

Light, L. L., Kayra-Stuart, F., & Hollander, S. (1979). Recognition memory for typical and unusual faces. Journal of Experimental: Human Perception and Performance, 5, 212–228.

Malpass, R. S., & Kravitz, J. (1969). Recognition for faces of own and other race. Journal of Personality and Social Psychology, 13, 330–334. doi:10.1037/h0028434

Moghaddam, B., & Pentland, A. (1997). Probabilistic visual learning for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 696–710. doi:10.1109/34.598227

Moghaddam, B., & Pentland, A. (1998). Beyond linear eigenspaces: Bayesian matching for face recognition. In H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman Soulie, & T. S. Huang (Eds.), Face recognition: From theory to applications (pp. 230–243). Berlin: Springer.

Moon, H., & Phillips, P. J. (2001). Computational and performance aspects of PCA-based face recognition algorithms. Perception, 30, 301–321. doi:10.1068/p2896

Nelson, C. A. (2001). The development and neural bases of face recognition. Infant and Child Development, 10, 3–18. doi:10.1002/icd.239

Ng, W., & Lindsay, R. C. L. (1994). Cross-race facial recognition: Failure of the contact hypothesis. Journal of Cross-Cultural Psychology, 25, 217–232. doi:10.1177/0022022194252004

O’Toole, A. J., Abdi, H., Deffenbacher, K. A., & Valentin, D. (1993). Low dimensional representation of faces in higher dimensions of the face space. Journal of the Optical Society of America, 10A, 405–410.

O’Toole, A. J., An, X., Dunlop, J. P., Natu, V., & Phillips, P. J. (2012). Comparing face recognition algorithms to humans on challenging tasks. ACM Transactions on Applied Perception, 9. doi:10.1145/2355598.2355599

O’Toole, A. J., Deffenbacher, K. A., Abdi, H., & Bartlett, J. C. (1991). Simulation of “other-race effect” as a problem in perceptual learning. Connection Science, 3, 163–178. doi:10.1080/09540099108946583



O’Toole, A. J., Phillips, P. J., An, X., & Dunlop, J. (2012). Demographic effects on estimates of automatic face recognition. Image and Vision Computing, 30, 169–176. doi:10.1016/j.imavis.2011.12.007

O’Toole, A. J., Phillips, P. J., Jiang, F., Ayyad, J., Penard, N., & Abdi, H. (2007). Face recognition algorithms surpass humans matching faces across changes in illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 1642–1646. doi:10.1109/TPAMI.2007.1107

O’Toole, A. J., Phillips, P. J., & Narvekar, A. (2008). Humans versus algorithms: Comparisons from the Face Recognition Vendor Test 2006. Paper presented at the eighth IEEE international conference on Automatic Face and Gesture Recognition.

Phillips, P. J., Flynn, P. J., Scruggs, T., Bowyer, K. W., Chang, J., Hoffman, K., . . . Worek, W. (2005). Overview of the face recognition grand challenge. In Proceedings of the Computer Society conference on Computer Vision and Pattern Recognition (pp. 947–954). Los Alamitos, CA: IEEE Computer Society Press.

Phillips, P. J., Jiang, F., Narvekar, A., Ayyad, J., & O’Toole, A. J. (2011). An other-race effect for face recognition algorithms. ACM Transactions on Applied Perception, 8(2), 1–11. doi:10.1145/1870076.1870082

Phillips, P. J., Moon, H., Rizvi, S., & Rauss, P. (2000). The FERET evaluation method for face recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 1090–1104. doi:10.1109/34.879790

Phillips, P. J., Scruggs, W., O’Toole, A. J., Flynn, P. J., Bowyer, K. W., Scott, C. L., & Sharpe, M. (2010). FRVT 2006 and ICE 2006 large scale results. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 831–846. doi:10.1109/TPAMI.2009.59

Shen, L., & Bai, L. (2006). A review on Gabor wavelets for face recognition. Pattern Analysis and Applications, 9, 273–292. doi:10.1007/s10044-006-0033-y

Shepherd, J. W., Deregowski, J. B., & Ellis, H. D. (1974). A cross-cultural study of recognition memory for faces. International Journal of Psychology, 9, 205–212. doi:10.1080/00207597408247104

Sirovich, L., & Kirby, M. (1987). Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America, 4A, 519–524.

Swets, D. L., & Weng, J. (1996). Discriminant analysis and eigenspace partition tree for face and object recognition from views. In Proceedings of second international conference on automatic face and gesture recognition. Los Alamitos, CA: IEEE Computer Society Press.

Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3, 71–86. doi:10.1162/jocn.1991.3.1.71

Valentine, T. (1991). A unified account of the effects of distinctiveness, inversion, and race in face recognition. Quarterly Journal of Experimental Psychology, 43A, 161–204.

Webster, M. A., & MacLin, O. H. (1999). Figural after-effects in the perception of faces. Psychonomic Bulletin & Review, 6, 647–653.

Wiskott, L., Fellous, J.-M., Kruger, N., & von der Malsburg, C. (1997). Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 775–779.

Zhao, W., Krishnaswamy, A., Chellappa, R., Swets, D., & Weng, J. (1998). Discriminant analysis of principal components for face recognition. In H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman Soulie, & T. S. Huang (Eds.), Face recognition: From theory to applications (pp. 73–85). Berlin: Springer.

Manuscript received April 2013

Revised manuscript received May 2013

First published online June 2013

