Optimal Local Basis: A Reinforcement Learning Approach for Face Recognition

This is an author-produced version of a paper published in the International Journal of Computer Vision (ISSN 0920-5691). Citation as published: Mehrtash T. Harandi, Majid Nili Ahmadabadi and Babak N. Araabi, "Optimal Local Basis: A Reinforcement Learning Approach for Face Recognition," International Journal of Computer Vision (IJCV), Vol. 81, No. 2, pp. 191-204, Feb. 2009.
Abstract
This paper presents a novel learning approach for Face Recognition by introducing
Optimal Local Basis. Optimal local bases are a set of bases derived by reinforcement
learning to represent the face space locally. By designing the reinforcement signal to be
correlated to the recognition accuracy, the optimal local bases are derived by finding the
most discriminant features for different parts of the face space, which represents either
different individuals or different expressions, orientations, poses, illuminations, and other
variants of the same individual. Therefore, unlike most of the existing approaches that
solve the recognition problem by using a single basis for all individuals, our proposed
method benefits from local information by incorporating different bases for its decision.
We also introduce a novel classification scheme that uses the reinforcement signal to
build a similarity measure in a non-metric space.
Experiments on AR, PIE, ORL and YALE databases indicate that the proposed method
facilitates robust face recognition under pose, illumination and expression variations.
The performance of our method is compared with that of Eigenface, Fisherface, Subclass
Discriminant Analysis, and Random Subspace LDA methods as well.
Index Terms: Face Recognition, Feature Selection, Reinforcement Learning.
1. Introduction
Individual identification through face recognition is an important and crucial human
ability for effective communication and interaction. Interest in producing artificial face
recognizers has grown rapidly during the past ten years, mainly because of its
various practical applications such as surveillance, identification, authentication, access
control, and mug-shot searching. Among the different approaches devised for face
recognition, the widely studied ones are the statistical learning methods that try to derive
an appropriate basis for face representation [1], [2], [4]. The reason for deriving a basis
is that a complete basis yields unique image representations suitable
for processes like image retrieval and object recognition. Statistical learning theory also
offers a lower dimensional description of faces. The lower dimensional description is
crucial in learning, as the number of examples required for achieving a given
performance exponentially grows with the dimension of the representation space. On the
other hand, low dimensional representations of visual objects have some biological roots
as suggested by Edelman and Intrator [31] :"perceptual tasks such as similarity judgment
tend to be performed on a low-dimensional representation of the sensory data." Principal
Component Analysis (PCA) [1], [7], [8], Linear Discriminant Analysis (LDA) [2],
Subclass Discriminant Analysis (SDA) [3], Independent Component Analysis (ICA) [4],
[10], Locality Preserving Projections (LPP) [12], [13], kernel machines [5], [6], [9] and
hybrid methods [14] are successful examples of applying statistical learning theory to
face recognition by introducing a single basis for face representation. This is where some
important questions come to mind: Is a holistic basis sufficient for a complicated task like
face recognition, or do we have to introduce different bases for different parts of the face
space (individuals) to achieve the best practically possible recognition? How can a single
basis interpret the holistic and feature-analysis behaviors [16] of human beings?
Apparently, using the same projection for all individuals is in contrast with the human
feature-analysis behavior in face recognition. Several studies have shown that even the
derived basis can be further improved by searching for the most discriminant directions
using evolutionary algorithms like evolutionary pursuit [15], [17], GA-Fisher [18], and
individual/combinatorial feature selection methods [19], [28].
Several other approaches in the area of face recognition tackle the problem
by introducing different bases. The very first was the View-based Eigenspaces approach
proposed by Moghaddam [20], who suggested grouping images taken from a specific
view together and building a specific space for each view. The idea was extended by Kim
to LDA spaces with a soft-clustering idea in 2005 [21]. Wang introduced random
subspace LDA to generate a number of subspaces randomly, followed by fusing
the recognition results [11]. In addition, mutual subspaces have also been proposed when
several images from each class are available [37].
Nevertheless, each space in these mixture models is still holistic in terms of individual
representation and identification. Furthermore, the clustering method used to form the
samples of each space does not necessarily group the images to optimally extract their
features (LDA in LLDA and the LDA mixture model, and PCA features in the View-based
approach). One can also expect unacceptable generalization ability if the number of
samples in some clusters is not sufficient.
In this paper, inspired by biological findings in human face recognition, we present a
novel idea to locally represent the face space by using the reinforcement learning (RL)
method. The main idea proposed in this paper is illustrated as an example in Fig. 1.
Considering the complex face space manifold, our learning method tries to find the most
discriminant features for different parts of the space. Different parts of the face space can
correspond to different individuals or even different expressions/poses/illuminations of
an individual. These learned discriminant features vary in number for different parts
of the face space, although they are selected from a unique feature pool. This pool can be
provided by any holistic algorithm. As the learned discriminant features form local
bases for face spaces derived by statistical learning methods, we call our method
Optimal Local Basis (OLB). The proposed method can be considered an extension to
evolutionary algorithms [15] in which only one optimal basis is sought for all individuals.
However, in contrast, our approach tries to learn one or several separate optimal
basis/bases for each individual. We also provide a novel classification scheme to use RL
reward signal for taking its final decision in a non-metric space. The proposed method is
successfully tested on AR [33], PIE [34], ORL [29] and YALE [30] datasets, which cover
a wide range of variations such as different expressions, illuminations and poses. We
apply our method to face spaces derived by LDA and compare with Eigenface,
Fisherface, SDA and random subspace LDA in terms of recognition accuracy.
Fig. 1. An example of the proposed idea where, each subject is represented by its most discriminant
features in the face space.
The rest of this paper is organized as follows: Section 2 provides general background
on RL. In Section 3, our method for optimal basis derivation using RL is
introduced. Section 4 presents the proposed class similarity measure and OLB classifier.
Section 5 describes the computational complexity of the proposed method. Experimental
results are described in Section 6. Section 7 discusses some interesting properties of the
proposed algorithm, followed by concluding remarks and suggestions for future work in
Section 8.
2. Reinforcement Learning
Reinforcement Learning is a machine learning technique for solving sequential decision
problems. Various decision problems in real life are sequential in nature: the received
reward does not depend on an isolated decision but rather on a sequence of decisions.
Therefore, a learning agent maximizes a combination of its immediate and delayed
rewards.
In RL, at each moment $t$, the learning agent senses the environment in state $s_t \in S$ and
takes an action $a_t \in A$, where $S$ and $A$ denote the sets of the agent's states and its available
actions, respectively. The agent's action causes a state transition and a reward $r_t$ from the
environment. The expected value of the received reward by following an arbitrary policy
$\pi$ from an initial state $s_t$ is given by

$$V^{\pi}(s_t) = E\left[\sum_{i=0}^{\infty} \gamma^{i} r_{t+i}\right] \qquad (1)$$

where $r_{t+i}$ is the reward received in state $s_{t+i}$ using policy $\pi$, and $\gamma \in [0,1)$ is a discount
factor that balances delayed versus instant rewards. The learning problem is modeled as
finding the best mapping from states into actions, $\pi : S \rightarrow A$, i.e. finding the policy that
maximizes the sum of expected rewards:

$$\pi^{*} = \arg\max_{\pi} V^{\pi}(s), \quad \forall s \in S \qquad (2)$$
There are different RL methods to find the optimum policy; e.g. Q-learning and Sarsa
[23]. In this paper we use Q-learning to estimate optimal local bases.
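To make the mechanics concrete, here is a minimal, hedged sketch (our own toy example, not the paper's code) of tabular Q-learning with an $\varepsilon$-greedy policy on a two-state chain, where action 1 advances toward a goal worth reward 1:

```python
import random

def q_learning_toy(episodes=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a toy 2-state, 2-action chain.

    Action 1 in state 0 moves to state 1 (reward 0); action 1 in
    state 1 reaches the goal (reward 1). Action 0 stays put (reward 0).
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]          # Q[state][action]
    for _ in range(episodes):
        s = 0
        for _ in range(10):               # cap episode length
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            if s == 1 and a == 1:         # goal transition
                r, s2, done = 1.0, 0, True
            elif a == 1:
                r, s2, done = 0.0, 1, False
            else:
                r, s2, done = 0.0, s, False
            # standard Q-learning update toward r + gamma * max_a' Q(s', a')
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q
```

With enough episodes the table converges to the optimal values: taking action 1 is preferred in both states, and the goal action's value approaches 1.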
3. Optimal Local Basis Learning Using RL
We will introduce our approach in this section. We assume that we are encountering a
classification problem where the feature vectors are $N$-dimensional
and each datum belongs to one of the classes $\{\omega_1, \omega_2, \ldots, \omega_C\}$. An OLB is defined as a
three-tuple $(\mathbf{x}_i, \omega_i, \mathbf{T}_i)$, where $\mathbf{x}_i \in \mathbb{R}^N$ is the OLB representative point,
$\omega_i \in \{\omega_1, \omega_2, \ldots, \omega_C\}$ is the OLB class label, and $\mathbf{T}_i \in \Lambda_N$ is a binary vector expressing the set
of features associated with the OLB, where $\Lambda_N$ is the set of all the $N$-dimensional binary
vectors excluding the null binary vector. Therefore, $\Lambda_N$ has $2^N - 1$ members. For
instance, in a face recognition task where the features are obtained by PCA, $\mathbf{x}_i$ is the
representation of a sample face in the Eigenface space, $N$ is the number of Eigenfaces,
and $\mathbf{T}_i$ is the binary vector indicating which Eigenfaces are the optimal subset to
describe $\mathbf{x}_i$. For example, in Fig. 1 the optimal binary vector for subject A is a binary
vector with four elements equal to one, whereas the OLB optimal binary vector for
subject B has only two ones. The location of the ones in the OLB optimal binary vector
determines which features are optimal for describing the underlying OLB. In this sense,
the binary vector $\mathbf{T}_i$ can be considered a feature selection mapping from a high-dimensional
space into a lower dimensional one. In the sequel, the terms set of optimum local
features, OLB optimal binary vector, and $\mathbf{T}_i$ are used interchangeably.
In solving a classification problem, the problem is modeled by a set of OLBs. The
OLB learning algorithm tries to find the OLB optimal binary vector $\mathbf{T}_i$ for each OLB by
maximizing the discrimination power of its representative point $\mathbf{x}_i$ from the other
representative points $\mathbf{x}_j, j \neq i$. More specifically, the learning process tries to find the
best $\mathbf{T}_i$ so that all the other representative points $\mathbf{x}_j, j \neq i$, with the same class label $\omega_i$
are seen closer to $\mathbf{x}_i$ than the representative points with different class labels $\omega_j$ in the
space defined by $\mathbf{T}_i$.
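As an illustrative sketch (our own names and toy data, not the paper's implementation), the objective above can be phrased as scoring a candidate binary vector by the class purity of the nearest representative points in the masked space:

```python
def masked_dist(a, b, mask):
    """Euclidean distance restricted to the features selected by `mask`."""
    return sum((x - y) ** 2 for x, y, m in zip(a, b, mask) if m) ** 0.5

def score_mask(points, labels, i, mask, k=2):
    """Fraction of the k nearest neighbours of points[i] (in the masked
    space) that share its class label -- a stand-in for the RL reward."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: masked_dist(points[i], points[j], mask))
    return sum(labels[j] == labels[i] for j in others[:k]) / k

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
pts = [(0.0, 5.0), (0.1, -4.0), (1.0, 4.9), (1.1, -3.8)]
lbl = ['A', 'A', 'B', 'B']
```

Scoring the mask that keeps only feature 0 against the mask that keeps only the noisy feature 1 shows why per-point feature selection matters: the former pulls same-class points closest, the latter does not.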
In its formal description, the OLB derivation algorithm can be regarded as a manifold
learning [36] approach; however, here we would like to explore another view of the OLB
algorithm, which is closely related to some interesting findings in biology and
neuroscience. Considering a classification problem defined over a feature space, features
can be categorized either as global or local based on the way they are used in the
classification task. In most classification algorithms, a unique set of features is identically
selected to separate all the classes over the feature space. Such features are called global
features. On the other hand, most often if we consider a portion of feature space, it is
possible to find a handful of features to characterize that portion more efficiently as
compared with global features. Such features may be labeled as local features. Global
features are ideal, provided that features appropriateness for classification does not vary
much across different classes. That is, the same set of features performs equally well to
separate all the classes. However, in most classification problems, it is very difficult or
even impractical to find such an optimal set of global features. Moreover, all features
may not always be processed and used due to computational and/or time resource
limitations. Unlike the global features, the local features may vary on different portions
of the feature space. The OLB algorithm is a practical approach to obtaining the local
features in a complex classification task like face recognition, because $\mathbf{T}_i$ can be
considered the subset of the most discriminant features from the representative point's
point of view. Interestingly, the opportunistic behavior of human face
recognition [16] and recent theories of face spaces can also be partially modeled by the
OLB algorithm. The theory of face space is the most common theoretical framework
proposed for face recognition in both the computational and psychological literature [32]. In
most computational research, a fixed face space with holistic features is assumed;
however, some psychological findings challenge the proposition that the feature space is
ever holistic or fixed. That is, an expert's feature space may become reorganized and
tuned to perform a categorization task more efficiently [35]. In a similar way, the OLB
algorithm manages to find the most discriminant features of each representative point. As
a result, it tunes and reorganizes the face space to boost the recognition performance
from each local observer's point of view.
Person-specific image-based approaches have recently been introduced with promising
results for face recognition [24], [26], [27]. In [24] and [26], the SIFT operator, introduced
by Lowe [25], is used to detect the key points of each gallery image and to extract the
scale/rotation invariant features for each key point. For identifying a probe image, its
SIFT features are compared with those of each gallery image and the closest match is
selected as the recognition result. Our approach and the mentioned person-specific
methods resemble each other at first glance because, in both algorithms, each
training sample is represented by a set of specific features. However, our method differs
from the person-specific methods in two major aspects. Firstly, our method is
independent of the feature space and theoretically can be used with different features.
Secondly, the nature of the feature selection approaches is different. In fact, we find a set
of the most discriminant features for each sample image by looking at its neighboring images,
whereas discrimination power is not directly considered in the feature extraction of the
person-specific methods.
Selection of the best feature subset in an $N$-dimensional space demands examining
$2^N - 1$ possibilities per OLB. Searching such a space is computationally exhaustive even
for medium-size feature spaces. In order to search the feature space more efficiently, the
feature selection task is modeled as an expected-reward maximization problem in an $n$th-order
Markov Decision Process (MDP). This MDP model drastically reduces the
computational complexity of the problem, as in an $n$th-order MDP the learner needs to
keep only its $n$ last states and decisions. To derive the OLB optimal binary vector $\mathbf{T}_i$, the
problem is divided into two steps. In the first step, we obtain a sorted feature set by
learning discrimination factors of all the features describing the OLB representative point
using RL. The discrimination factor shows how discriminative the corresponding feature
is for classifying its representative point along with the other features. In the second step,
the best subset of features is selected as the OLB optimum local features.
In the sequel, we introduce the proposed learning method in more detail. In particular,
Section 3.1 proposes the learning algorithm based on Q-learning, a widely used
implementation of reinforcement learning [23], to estimate the discrimination factors. In
Section 3.2 we provide different methods for selecting the optimal features for each OLB.
The estimated OLBs are integrated into a classification scheme in Section 4. A preliminary
study on the OLB learning algorithm is reported in [22].
3.1. The Learning Method
In this section, our method for assigning a discrimination factor to each feature of an OLB
using RL is presented. We assume that the learning process is applied to
$OLB_i(\mathbf{x}_i, \omega_i, \mathbf{T}_i)$, where the classification problem is modeled by the set
$\{OLB_1(\mathbf{x}_1, \omega_1, \mathbf{T}_1), OLB_2(\mathbf{x}_2, \omega_2, \mathbf{T}_2), \ldots, OLB_i(\mathbf{x}_i, \omega_i, \mathbf{T}_i), \ldots, OLB_m(\mathbf{x}_m, \omega_m, \mathbf{T}_m)\}$. Here $m$ is
the number of OLBs and $m \geq C$.
In order to use RL, we should model the problem as a Markov decision process and
design the reward function. Fig. 2 shows a schematic view of the proposed system, where
the learning process is modeled as an agent-environment interaction. For each OLB, the
corresponding agent traverses the feature space and at each step selects a feature from the
available features. The environment responds to the agent's action (feature
selection) with a reward signal. The agent's task is to learn those features for
$OLB_i(\mathbf{x}_i, \omega_i, \mathbf{T}_i)$ that result in the collection of maximum expected reward.
Fig. 2. A schematic view of the learning system.
The environment is modeled as an $n$th-order MDP where the current state $S_i$ is
represented merely by the $n$ last selected features $(f^s_{i-n}, f^s_{i-n+1}, \ldots, f^s_i)$, as shown in Fig. 3. In
state $S_i$ the agent chooses its decision $a_i$ based on its $n$ last decisions $(f^s_{i-n}, f^s_{i-n+1}, \ldots, f^s_i)$ and the
available remaining features. The selected features are kept in a binary vector $\mathbf{h}$ of length
$N$, where $N$ is the dimension of the feature space. This ensures that a feature is not
selected more than once. Whenever a feature is selected, the corresponding element of
vector $\mathbf{h}$ becomes one. The agent can select only those features whose corresponding
elements in $\mathbf{h}$ are zero.
Fig. 3. An $n$th-order MDP model of the environment, where $f^s_j$ is the $j$th selected feature and $a_i$ is the
agent's action at state $s_i$.
The values of the agent's state-action pairs are modeled by $n+1$ Q-tables
$\{Q^0, Q^1, Q^2, \ldots, Q^n\}$ of sizes $\{N,\; N \times N,\; \ldots,\; \underbrace{N \times N \times \cdots \times N}_{n+1}\}$. The element
$Q^j(i_0, i_1, \ldots, i_{j-1}, i_j)$ of the Q-table $Q^j$ gives the expected value of the received reward
when feature $i_j$ is selected after the features $i_0, i_1, \ldots, i_{j-1}$ were already selected in the $j$ previous
steps. For example, in a first-order MDP, element $Q^1(i_0, i_1)$ gives the expected
reward of selecting direction $i_1$ when $i_0$ was previously selected. The updating equation
for the Q-learning algorithm is

$$Q^j(i_0, \ldots, i_{j-1}, i_j) \leftarrow Q^j(i_0, \ldots, i_{j-1}, i_j) + \alpha \Big( r + \gamma \max_{l=1}^{N} Q^j(i_1, \ldots, i_j, i_l) - Q^j(i_0, \ldots, i_{j-1}, i_j) \Big) \qquad (3)$$

where $Q^j(i_0, i_1, \ldots, i_{j-1}, i_j)$ is the expected reward of selecting feature $i_j$ when the agent is in
the state $(i_0, i_1, \ldots, i_{j-1})$; $Q^j(i_1, \ldots, i_j, i_l)$ is the expected reward of selecting $i_l$ one step
after selecting feature $i_j$; $r$ is the received reward of selecting feature $i_j$; $\alpha$ $(0 < \alpha \leq 1)$ is
the learning rate; and $\gamma$ $(0 \leq \gamma \leq 1)$ is the discount factor. In the learning phase, the agent
traverses the environment with the $\varepsilon$-greedy policy, i.e. in each state the agent selects the
best action with probability $1 - \varepsilon$, while it chooses a random
action with probability $\varepsilon$.
The reward signal is designed to satisfy the following criteria:
• Narrowing the distance between $\mathbf{x}_i$ and the other representative points $\mathbf{x}_j, j \neq i$,
with the same class label ($\omega_i = \omega_j$) in a Euclidean space. This criterion is
equivalent to minimizing the intra-set distance for each class.
• Maximizing the discrimination of $\mathbf{x}_i$ from the representative points $\mathbf{x}_j, j \neq i$, with
different class labels ($\omega_i \neq \omega_j$). This criterion is equivalent to maximizing the inter-set
distances between classes. In this process, those features that segregate $\mathbf{x}_i$ from the
representative points with different class labels are preferred.
The critic (the environment) evaluates the selected features by examining the class
labels of the $K$ nearest neighbors of $\mathbf{x}_i$. To do this, all representative points
$\mathbf{x}_j, j = 1 \ldots m$, are projected into the space defined by the selected features,
$\mathbf{p}_j = \mathrm{diag}(\mathbf{x}_j) \otimes \mathbf{h}, j = 1 \ldots m$. Then the $K$ nearest neighbors of $\mathbf{p}_i$ are obtained using the L2
norm. The received reward is formulated as
$$r = \sum_{j=1}^{K} \Big( f_j(\omega_i) \, R_C(j) + \big(1 - f_j(\omega_i)\big) \, P_{NC}(j) \Big) \qquad (4)$$

In (4), $f_j(\omega_i)$ is a binary-valued function representing whether the $j$-th neighbor has
the same class label as $\omega_i$ or not:

$$f_j(\omega_i) = \begin{cases} 1 & \text{if the } j\text{-th neighbor has the same class label as } \omega_i, \\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$
$R_C(j)$ and $P_{NC}(j)$ are the reward and punishment that the agent receives for correct
and incorrect hits, respectively. That is, the agent receives reward $R_C(j)$ when the
$j$-th neighbor has the same class label as $\omega_i$, and punishment $P_{NC}(j)$ otherwise. The
maximum and minimum expected rewards are $\sum_{j=1}^{K} R_C(j)$ and $\sum_{j=1}^{K} P_{NC}(j)$, respectively.
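A hedged sketch of the reward in (4) (our function names; the per-rank reward and punishment are passed as vectors): project everything through the binary vector h, rank the other representative points by distance, and sum rank-weighted credit over the K nearest.

```python
def olb_reward(points, labels, i, h, R_C, P_NC):
    """Reward of eq. (4): rank-weighted credit over the K nearest
    neighbours of point i in the subspace selected by binary vector h.
    K is implied by the length of the per-rank reward vector R_C."""
    K = len(R_C)
    # p_j: zero out the unselected coordinates of each point
    proj = [[x * m for x, m in zip(p, h)] for p in points]
    others = [j for j in range(len(points)) if j != i]
    # rank the other representative points by squared L2 distance
    others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(proj[i], proj[j])))
    r = 0.0
    for rank, j in enumerate(others[:K]):
        if labels[j] == labels[i]:
            r += R_C[rank]          # correct hit at this rank
        else:
            r += P_NC[rank]         # incorrect hit at this rank
    return r
```

On toy data where the first feature is discriminative and the second is not, the reward for the mask selecting only the first feature is higher than for the mask selecting only the second, which is exactly the gradient the RL agent follows.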
Each episode of RL has at most $N$ steps (the number of available features). The episode
may terminate sooner if the agent receives the maximum reward in $C_{hits}$ consecutive
steps. The agent must visit the feature space sufficiently in order to learn effectively. The
learning algorithm is summarized in Table 1.
Table 1 – OLB derivation algorithm

Algorithm OLB Learning
  Select randomly an OLB $OLB_i(\mathbf{x}_i, \omega_i, \mathbf{T}_i)$ from the training dataset
  $\{OLB_1(\mathbf{x}_1, \omega_1, \mathbf{T}_1), OLB_2(\mathbf{x}_2, \omega_2, \mathbf{T}_2), \ldots, OLB_m(\mathbf{x}_m, \omega_m, \mathbf{T}_m)\}$.
  Initialize the corresponding Q-tables randomly.
  for iteration = 1 to number_of_Episodes do
    $\mathbf{h} = (0, 0, \ldots, 0)$ of length $N$
    repeat
      Select a feature $a_i = f^s_{i+1}$ by the $\varepsilon$-greedy policy.
      Update the selected-feature vector: $\mathbf{h}(f^s_{i+1}) = 1$.
      Project all the representative points $\mathbf{x}_j, j = 1 \ldots m$, in the training dataset into the space defined by $\mathbf{h}$ using $\mathbf{p}_j = \mathrm{diag}(\mathbf{x}_j) \otimes \mathbf{h}$.
      Find the class labels of the $K$-nearest neighbors of the projected datum $\mathbf{p}_i$.
      Update the corresponding cell of the Q-table according to (4).
    until the agent receives the maximum reward in $C_{hits}$ consecutive steps, or all the features are selected.
  end for
In this paper, we confined our experiments to first-order MDPs and face spaces
derived by applying statistical learning methods. As a result, in the context of our
proposed algorithm the words direction and feature are used interchangeably. The
updating equations for the Q-tables are given in (6) and (7). Since the expected reward of
selecting a direction right after $i_0$ is coded in $Q^1$, the $Q^1$ table is used to update the values in
$Q^0$.
$$Q^0(i_0) \leftarrow Q^0(i_0) + \alpha \Big( r + \gamma \max_{l=1}^{N} Q^1(i_0, i_l) - Q^0(i_0) \Big) \qquad (6)$$

$$Q^1(i_0, i_1) \leftarrow Q^1(i_0, i_1) + \alpha \Big( r + \gamma \max_{l=1}^{N} Q^1(i_1, i_l) - Q^1(i_0, i_1) \Big) \qquad (7)$$
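For the first-order case, the two coupled updates (6) and (7) can be sketched as follows (a simplified reading with our own variable names; the order of the two updates within one step is our choice):

```python
def update_q(Q0, Q1, i0, i1, r, alpha=0.1, gamma=0.9):
    """One step of the first-order updates (6) and (7):
    Q1[i0][i1] is pulled toward r + gamma * max_l Q1[i1][l],
    and Q0[i0] toward r + gamma * max_l Q1[i0][l]."""
    n = len(Q0)
    # eq. (7): update the pairwise table first
    Q1[i0][i1] += alpha * (r + gamma * max(Q1[i1][l] for l in range(n))
                           - Q1[i0][i1])
    # eq. (6): the first-feature table is bootstrapped from Q1
    Q0[i0] += alpha * (r + gamma * max(Q1[i0][l] for l in range(n))
                       - Q0[i0])
    return Q0, Q1
```

Starting from zero tables, a single update with reward 1, alpha = 0.5, and gamma = 0.9 moves Q1[i0][i1] to 0.5 and then Q0[i0] to 0.5 * (1 + 0.9 * 0.5) = 0.725, illustrating how Q0 bootstraps from the freshly updated Q1 row.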
3.2. Selecting the most appropriate features
After completing the learning process, the set of most appropriate features for each OLB
should be selected. Here, the term "appropriate" is shorthand for "appropriate for
discrimination of an OLB". Less appropriate features not only fail to provide better
discrimination for the underlying OLB, but may also deteriorate the effect of more
appropriate ones. Now, the question is how to determine the appropriate features for each
OLB. To do this, the features are first sorted according to their discrimination using the
available Q-tables, and then the optimal binary vector $\mathbf{T}_i$ is extracted from the sorted
features.
To obtain a set of features in descending order of appropriateness, we use the already
learned Q-tables in recall mode. For an $n$th-order MDP, the first $n$ appropriate features
are obtained by selecting the position of the maximum in $Q^j, j = 0, 1, 2, \ldots, n-1$,
respectively. In each selection step, the corresponding Q-table and the already selected features
are used. After using the above process to find the first $n$ features in order of
appropriateness, the rest of the features are selected using the last $n$ selected features and
tracing the position of the maximum in the $Q^n$ table. Therefore, the selection of each feature
beyond the $n$th one depends only on the $Q^n$ table along with the last $n$ selected
features.
As an example, for a second-order MDP, we use $Q^0$ to select the first optimal
feature $a_0$. Then the selected feature and $Q^1$ are used, and the second appropriate feature,
$a_1$, is found by locating the maximum given $a_0$, i.e. $a_1 = \arg\max_j (Q^1(a_0, a_j))$. The
remaining appropriate features are obtained from $Q^2$ using
$a_l = \arg\max_j (Q^2(a_{l-2}, a_{l-1}, a_j) \,|\, \mathbf{h}(j) = 1)$, $l = 2, 3, \ldots, N-1$, recursively, where $\mathbf{h}$ is the binary
vector that keeps track of the features selected in the previous steps. The pseudo code for
acquiring the ordered set of appropriate features is shown in Table 2.
After obtaining the ordered features, we have to select a subset of them. Three
different methods can be devised here:
• Static method: Features with Q-values above a predefined threshold are used in
the decision process. A fixed threshold is used for all OLBs.
• Adaptive method: Features with Q-values above a varying threshold are used in
the decision process. The varying threshold is selected proportional to the largest
Q-value or by a clustering method. In this method the threshold varies for each
OLB.
• Validation method: It is also possible to split the data into validation and test
subsets and then find the optimal features using the validation subset. In this
method the threshold varies for each OLB too.
In this paper we use an adaptive method based on two-class clustering.
This is partly due to its performance advantages over the static method and its
relative simplicity compared to the validation method.
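A sketch of such an adaptive, clustering-based cut (our own simple 1-D two-means implementation, not necessarily the authors' exact clustering): split the Q-values into a high and a low cluster and keep the features assigned to the high one.

```python
def two_means_split(values, iters=50):
    """1-D 2-means on per-feature Q-values: returns the indices of the
    values assigned to the higher-centroid cluster (features to keep)."""
    lo, hi = min(values), max(values)
    if lo == hi:                       # degenerate case: keep everything
        return list(range(len(values)))
    for _ in range(iters):
        # assign each value to the nearer of the two centroids
        assign = [abs(v - hi) < abs(v - lo) for v in values]
        hi_vals = [v for v, a in zip(values, assign) if a]
        lo_vals = [v for v, a in zip(values, assign) if not a]
        if not hi_vals or not lo_vals:
            break
        new_hi = sum(hi_vals) / len(hi_vals)
        new_lo = sum(lo_vals) / len(lo_vals)
        if (new_hi, new_lo) == (hi, lo):   # converged
            break
        hi, lo = new_hi, new_lo
    return [i for i, v in enumerate(values) if abs(v - hi) < abs(v - lo)]

q_values = [9.1, 8.7, 0.4, 0.2, 8.9, 0.1]
keep = two_means_split(q_values)      # indices of the high-Q features
```

Because the threshold is the midpoint between the two learned centroids, it adapts to each OLB's own Q-value distribution, which is the point of the adaptive method.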
Table 2 – Sorting the features for an $n$th-order MDP

Algorithm Sorting the features according to their appropriateness
  $C_f = 1$
  $\mathbf{h} = (1, 1, \ldots, 1)$ of length $N$
  $a_0 = \arg\max_j (Q^0(a_j))$
  $ordered\_features = (a_0)$
  $\mathbf{h}(a_0) = 0$
  $cState = (a_0)$
  while $C_f < n$ do
    Select all the Q-values described by $cState$ in $Q^{C_f}$. This is a vector of length $N$ called $Row(Q^{C_f}) = Q^{C_f}(cState, a_j), j = 1 \ldots N$.
    $a_{C_f} = \arg\max_j (Row(Q^{C_f}) \,|\, \mathbf{h}(j) = 1)$
    $ordered\_features = (ordered\_features, a_{C_f})$
    $\mathbf{h}(a_{C_f}) = 0$
    $cState = (a_0, a_1, \ldots, a_{C_f})$
    $C_f = C_f + 1$
  end while
  while $C_f \leq N$ do
    Select all the Q-values described by $cState$ in $Q^n$: $Row(Q^n) = Q^n(cState, a_j), j = 1 \ldots N$. This vector has length $N$.
    $a_{C_f} = \arg\max_j (Row(Q^n) \,|\, \mathbf{h}(j) = 1)$
    $ordered\_features = (ordered\_features, a_{C_f})$
    $\mathbf{h}(a_{C_f}) = 0$
    $cState = (a_{C_f - n}, a_{C_f - n + 1}, \ldots, a_{C_f})$
    $C_f = C_f + 1$
  end while
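For the first-order case used in this paper's experiments, the recall pass of Table 2 reduces to a short greedy walk, sketched below with our own names: take the argmax of $Q^0$ first, then repeatedly take the best not-yet-selected successor from the $Q^1$ row of the last pick.

```python
def order_features(Q0, Q1):
    """Greedy recall for a first-order MDP: the first feature comes from
    Q0; each following feature is the best remaining entry of the Q1 row
    indexed by the previously selected feature."""
    n = len(Q0)
    remaining = set(range(n))
    first = max(remaining, key=lambda j: Q0[j])
    order = [first]
    remaining.discard(first)
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda j: Q1[last][j])
        order.append(nxt)
        remaining.discard(nxt)
    return order
```

The returned list is the full descending-appropriateness ordering; the thresholding step of Section 3.2 then decides where to cut it.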
4. Class Similarity Measure and OLB Classifier
When the training phase is finished and the most appropriate features for each OLB are
selected, we are ready to build an OLB-based classifier. To this end, we need to assess
the similarity of a query datum to all stored classes. Simple similarity judgment based on
ordinary distance measures in the feature space does not work here, since OLBs have
different dimensions and features, as illustrated in Fig. 4. In this figure, the Euclidean
distance between $\mathbf{x}_q$ and $OLB_1(\mathbf{x}_1, \omega_1, \mathbf{T}_1)$ is
$d_1 = \sqrt{(x_{q,1} - x_{1,1})^2 + (x_{q,4} - x_{1,4})^2}$, while the
corresponding distance for $OLB_2(\mathbf{x}_2, \omega_2, \mathbf{T}_2)$ is
$d_2 = \sqrt{(x_{q,1} - x_{2,1})^2 + (x_{q,2} - x_{2,2})^2 + (x_{q,4} - x_{2,4})^2 + (x_{q,5} - x_{2,5})^2}$.
Fig. 4. An illustration of the selected features for each OLB. Due to differences in the dimensions and the optimal features of OLBs, it is impractical to use Euclidean norms as similarity measures.
In order to make different bases comparable, we use the reward signal as the similarity
measure. The similarity between a query datum $\mathbf{x}_q$ and $OLB_i(\mathbf{x}_i, \omega_i, \mathbf{T}_i)$ is computed by
first projecting $\mathbf{x}_q$ and all the representative points $\mathbf{x}_j, j = 1 \ldots m$, into the space defined
by $\mathbf{T}_i$, $\mathbf{p}_j = \mathrm{diag}(\mathbf{x}_j) \otimes \mathbf{T}_i$, finding the labels of the $K$-nearest neighbors of $\mathbf{p}_q$ in that space,
and calculating the following similarity measure:

$$S\big(\mathbf{x}_q, OLB_i(\mathbf{x}_i, \omega_i, \mathbf{T}_i)\big) = \sum_{j=1}^{K} R_C(j) \, f_j(\omega_i) \qquad (8)$$
$f_j(\omega_i)$ in equation (8) is the binary-valued function that indicates whether the class label
of the $j$-th neighbor of $\mathbf{p}_q$ is $\omega_i$ or not (equation (5)), and $R_C(j)$ is the reward function
defined in (4).

Based on equation (8), the similarity measure for a class $\Omega$ is defined by fusing the similarity
measures $S(\mathbf{x}_q, OLB_i(\mathbf{x}_i, \omega_i, \mathbf{T}_i))$ of all the OLBs belonging to it. For fusing them,
different methods can be utilized, including the max, median, sum, and
product rules. In our experiments, the sum rule, shown in equation (9), led to the highest
recognition accuracies:

$$S_{\Omega} = \sum_{i=1}^{n_i} S\big(\mathbf{x}_q, OLB_i(\mathbf{x}_i, \omega_i, \mathbf{T}_i)\big), \quad OLB_i \in \Omega \qquad (9)$$

where $n_i$ is the number of representative points in class $i$.
In order to classify a query input, we adopt a single-stage decision-making strategy
where the similarity measures for all the classes are computed using (9) and the most
similar class is assigned as the output of the classifier.
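Putting (8) and (9) together, the classifier can be sketched as follows (illustrative only, with our own names: `olbs` is a list of (x, label, mask) tuples and `R_C` a per-rank reward vector of length K):

```python
def olb_similarity(xq, olb, olbs, R_C):
    """Eq. (8): project everything by the OLB's binary mask, then sum
    R_C(j) over the K nearest representative points sharing the OLB label."""
    x_i, label_i, mask = olb
    def proj(v):
        return [a * m for a, m in zip(v, mask)]
    pq = proj(xq)
    # rank all representative points by squared distance in the masked space
    ranked = sorted(olbs, key=lambda o: sum((a - b) ** 2
                                            for a, b in zip(pq, proj(o[0]))))
    return sum(R_C[j] for j, o in enumerate(ranked[:len(R_C)])
               if o[1] == label_i)

def olb_classify(xq, olbs, R_C):
    """Eq. (9): sum the per-OLB similarities within each class and
    return the class with the highest total."""
    totals = {}
    for olb in olbs:
        totals[olb[1]] = totals.get(olb[1], 0.0) + olb_similarity(xq, olb, olbs, R_C)
    return max(totals, key=totals.get)
```

Because each OLB scores the query in its own masked space, no cross-space distance is ever compared directly; only the reward-based similarities, which live on a common scale, are fused.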
At the semantic level, the reward and punishment terms in (4) are comparable with the
within- and between-class scatters or distances in a separability measure. In the
classification stage, we want to calculate the similarity of the query $\mathbf{x}_q$ to class $\omega_i$
represented by $OLB_i(\mathbf{x}_i, \omega_i, \mathbf{T}_i)$. Since the punishment reflects the similarity of $\mathbf{x}_q$ to
the other classes, and the similarity of $\mathbf{x}_q$ to $\omega_j, j \neq i$, is already considered when (8) is
calculated for $OLB_j(\mathbf{x}_j, \omega_j, \mathbf{T}_j), j \neq i$, we do not incorporate the punishments in (8),
though an alternative version of (8) with a punishment term is still conceivable.
5. Computational Complexity
The computational complexity of the learning algorithm depends on the order of the MDP
used for training. The computational complexity of the first-order MDP is analyzed here,
as this particular model is utilized in the experiments of this paper. Consider a first-order
MDP with $N$ features. The OLB learning algorithm creates a $Q^1$ table of size $N \times N$.
As every cell of this table must be visited by the agent sufficiently often, we chose the number
of episodes of the RL algorithm equal to the number of cells of $Q^1$, i.e. $N^2$. Since each
episode consists of at most $N$ steps, this ensures that every cell is visited about $N$ times
on average.

In each step of an episode, the agent finds a maximum and updates the appropriate cells.
Therefore, the computational complexity for a first-order MDP is, at most, the number of
episodes, $O(N^2)$, multiplied by the number of steps per episode, $O(N)$. This results in
$O(N^3)$ per OLB. For higher-order MDPs, the computational cost is higher. To
avoid high computational cost, the authors are currently working on function approximation
methods for estimating the Q-values; this approach drastically decreases the
computational cost [23].
6. Experimental Results
We used AR [33], PIE [34], ORL [29] and YALE [30] databases to evaluate the
performance of the proposed method in different face orientations, expressions,
illumination situations and occlusions. In all experiments, no preprocessing except
downsampling was performed on the images. In each experiment, the image set was
partitioned into training and testing sets. For ease of presentation, the experiments are
named $Gm/Pn$, meaning that $m$ images per individual are randomly selected for
training and the remaining $n$ images are used for testing. The experiments are repeated
ten times for every randomly partitioned dataset.
In our method, $Gm = 2$ is the smallest possible number of training images per class. The
reason is that the proposed algorithm employs RL, which is a semi-supervised method,
and like other semi-supervised/supervised algorithms it needs more than one sample per
class for learning.
The proposed method is benchmarked against the well-known Eigenface [1] and Fisherface [2] methods and two state-of-the-art methods: SDA [3] and random subspace LDA [11]. For SDA, the leave-one-out version was used. The number of subclasses in SDA, which is a free parameter, was set to two, three, and five (whenever the gallery size permits), and the highest recognition rates are reported. For random subspace LDA, Wang's suggestions for the number of subspaces and fixed dimensions were adopted, and the majority-voting fusion rule was used [11]. Again, for random subspace LDA the highest recognition rates are reported.
In the OLB algorithm, the most appropriate features were selected by the adaptive thresholding method with k-means clustering. In the learning process of the experiments, we employed a k-NN classifier with k = 3 whenever there were more than two prototypes per individual in the training set; in these cases, the reward signals were R_C = [3 2 1.5] and P_NC = [-1 -1 -1]. When there were only two prototypes per individual in the training set, a k = 2 k-NN classifier was used with R_C = [3 2] and P_NC = [-1 -1].
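Our reading of the rank-ordered reward vectors can be sketched as follows; how R_C and P_NC are combined per neighbor rank is an assumption for illustration, not a statement of the paper's precise rule:

```python
import numpy as np

def knn_reward(dists, neighbor_labels, own_label,
               R_C=(3.0, 2.0, 1.5), P_NC=(-1.0, -1.0, -1.0)):
    """For each of the k nearest neighbors of an OLB's representative
    point (closest first), add R_C[rank] if the neighbor shares the
    OLB's class label and P_NC[rank] otherwise."""
    order = np.argsort(dists)[:len(R_C)]   # indices of the k nearest
    reward = 0.0
    for rank, j in enumerate(order):
        reward += R_C[rank] if neighbor_labels[j] == own_label else P_NC[rank]
    return reward

# all three nearest neighbors in-class -> maximum reward 3 + 2 + 1.5 = 6.5
d = np.array([0.2, 0.5, 0.9, 1.4])
lbl = np.array([1, 1, 1, 0])
r = knn_reward(d, lbl, own_label=1)
```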
In order to make a fair comparison with SDA and random subspace LDA, we trained the proposed method over the LDA space. The number of features passed to the OLB algorithm was C - 1, where C is the number of classes in each database. Consequently, the number of learning episodes was chosen as C^2.
The studied methods are compared in terms of recognition rate, its standard deviation, and the number of features. For every train/test partition, we found the maximum recognition rate of each benchmarked method over all of its recognition-affecting parameters; the reported recognition rate for each method is the average of these maxima. The recognition-affecting parameters are the number of features in Eigenface; the numbers of features and sub-clusters in SDA; and the numbers of subspaces and randomly selected Eigenfaces, in addition to the number of preserved ones, in random subspace LDA. For methods in which the number of features affects the recognition rate, the recognition rate is checked for the top k features as k varies from one to the maximum number of available features.
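The top-k sweep can be illustrated as below; the 1-NN classifier is a stand-in chosen for brevity, and the features are assumed pre-sorted by importance (e.g. eigenvalue order), as in the protocol above:

```python
import numpy as np

def best_topk_accuracy(train_X, train_y, test_X, test_y):
    """Sweep the top-k features (k = 1..max) and return the best
    recognition rate and the k that achieved it. A 1-NN classifier
    is used here as an illustrative stand-in."""
    best_acc, best_k = 0.0, 0
    for k in range(1, train_X.shape[1] + 1):
        # nearest-neighbor classification using only the first k features
        d = np.linalg.norm(test_X[:, None, :k] - train_X[None, :, :k], axis=2)
        pred = train_y[np.argmin(d, axis=1)]
        acc = float(np.mean(pred == test_y))
        if acc > best_acc:
            best_acc, best_k = acc, k
    return best_acc, best_k

# toy example: the first feature already separates the two classes
train_X = np.array([[0.0, 5.0], [1.0, 5.0]])
train_y = np.array([0, 1])
test_X = np.array([[0.1, 5.0], [0.9, 5.0]])
test_y = np.array([0, 1])
acc, k = best_topk_accuracy(train_X, train_y, test_X, test_y)
```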
The reported numbers of features are the averages of the numbers of features associated with the above-mentioned maximum accuracy over the ten runs of each train/test partition. The number of most appropriate features in the proposed method differs from one OLB to another; therefore, the values reported here are obtained by first averaging the number of most appropriate features over all the OLBs and then averaging over the ten runs of each train/test partition. In all cases, the number of features was rounded to the nearest integer.
6.1. AR Database
The AR face database consists of frontal-view images of over 100 people. The subjects were photographed twice at a two-week interval, and during each session 13 conditions with varying facial expressions, illumination, and occlusion were captured. Fig. 5 shows an example of each condition. Experiments on this database were carried out on down-sampled images of size 96x64. Sixty randomly selected individuals from this database were used in our simulations. A random subset with Gm = 3, 7, 11 images per individual was taken to form the training set, and the rest of the database was used for testing. The gallery images were selected only from the first imaging session.
The recognition accuracies with standard deviations and the numbers of features are compared in Tables 3 and 4, respectively. Table 3 reveals that the proposed algorithm outperforms the other studied methods in all cases. The difference between OLB and the other algorithms becomes more significant when the number of gallery samples per individual is smaller. As shown in Table 4, these improvements are obtained even though fewer features are used.
The performance of random subspace LDA and the OLB algorithm were close for Gm = 7, 11 samples per individual; however, the improvement that OLB yields over LDA is very significant. The proposed method can also be applied within each subspace derived by random subspace LDA.
Another interesting property of the results on the AR dataset is that as the number of gallery samples per individual increases, the number of features in the OLB algorithm decreases. That is, with more gallery samples, the OLB agent extracts local bases with fewer features; this is, of course, a consequence of OLB's local representation characteristic.
Fig. 5. Images of one of the subjects in the AR database taken during one session.
Table 3. Recognition rate and its standard deviation of the studied methods for the AR database with different training set sizes.

Method                      G3/P23          G7/P19          G11/P15
Eigenface [1]               63.66, σ=7.1    74.7,  σ=5.1    77.15, σ=2.8
Fisherface [2]              75.75, σ=5.6    86.0,  σ=1.1    87.70, σ=3.0
SDA on Eigenface [3]        75.18, σ=5.3    83.79, σ=3.3    87.13, σ=1.5
Random subspace LDA [11]    77.72, σ=6.4    89.74, σ=1.9    91.04, σ=0.8
Proposed Method             82.47, σ=5.2    90.33, σ=1.8    92.07, σ=0.7
Table 4. The number of features employed by the studied methods for the AR database.

Method                      G3/P23    G7/P19    G11/P15
Eigenface [1]               163       372       359
Fisherface [2]              59        59        59
SDA on Eigenface [3]        67        72        119
Random subspace LDA [11]    20x59     20x59     20x59
Proposed Method             28        16        13
6.2. PIE Database
The CMU PIE face database contains 68 individuals with 41,368 face images. Examples of pose and illumination variation in this database are shown in Fig. 6. The near-frontal poses (C05, C07, C09, C27, and C29) were used in our experiments, which resulted in
170 images per subject under different illuminations and expressions. Experiments on this database were carried out on cropped and down-sampled images of size 119x98. As in the previous experiment, a random subset with Gm = 3, 7, 11 images per individual was taken to form the training set, and the rest of the images formed the testing set. The recognition accuracies with standard deviations and the numbers of features are compared in Tables 5 and 6, respectively. Table 5 demonstrates that the proposed approach performs better than the other studied methods. In addition, the improvement that OLB yields over LDA is very significant.
Fig. 6. Example images of PIE database
Table 5. Recognition rate and its standard deviation of the studied methods for the PIE database with different training set sizes.

Method                      G3/P167         G7/P163          G11/P159
Eigenface [1]               23.85, σ=4.0    37.08, σ=2.5     46.98, σ=5.9
Fisherface [2]              49.94, σ=7.6    62.14, σ=7.9     67.98, σ=6.7
SDA on Eigenface [3]        46.75, σ=8.8    62.42, σ=5.4     72.12, σ=7.7
Random subspace LDA [11]    52.70, σ=8.4    69.08, σ=10.2    78.58, σ=3.7
Proposed Method             55.55, σ=7.6    71.07, σ=8.2     80.56, σ=4.4
Table 6. The number of features employed by the studied methods for the PIE database.

Method                      G3/P167    G7/P163    G11/P159
Eigenface [1]               123        446        460
Fisherface [2]              67         67         67
SDA on Eigenface [3]        58         54         43
Random subspace LDA [11]    20x67      40x67      40x67
Proposed Method             37         23         32
6.3. ORL Database
The ORL face database (developed at the Olivetti Research Laboratory, Cambridge, U.K.) is composed of 400 images: ten different images for each of 40 distinct subjects. The images vary in pose, size, time, and facial expression. Some images from this database are shown in Fig. 7. Experiments on this database were carried out on the original images of size 112x92. For the ORL database, a random subset with Gm = 2, 3, 5 images per individual was taken to form the training set, and the rest of the database was used as the testing set. The recognition accuracies with standard deviations and the numbers of features are compared in Tables 7 and 8, respectively. As in the previous experiments, the OLB algorithm performs better than the other studied methods with fewer features. For small numbers of samples per individual, the difference between the OLB algorithm and the other algorithms is more significant. As on the AR database, the number of features in OLB is inversely proportional to the number of training samples per class.
Fig. 7. Example images of ORL database
Table 7. Recognition rate and its standard deviation of the studied methods for the ORL database with different training set sizes.

Method                      G2/P8           G3/P7           G5/P5
Eigenface [1]               80.08, σ=2.6    88.43, σ=4.3    96.1, σ=1.3
Fisherface [2]              73.06, σ=3.9    85.61, σ=2.6    93.6, σ=0.9
SDA on Eigenface [3]        74.42, σ=4.9    87.32, σ=3.8    93.8, σ=1.1
Random subspace LDA [11]    78.36, σ=3.5    88.95, σ=2.6    96.1, σ=0.8
Proposed Method             83.81, σ=3.2    91.86, σ=2.6    97.7, σ=0.6
Table 8. The number of features employed by the studied methods for the ORL database.

Method                      G2/P8    G3/P7    G5/P5
Eigenface [1]               73       63       59
Fisherface [2]              39       39       39
SDA on Eigenface [3]        53       57       78
Random subspace LDA [11]    5x39     5x39     5x39
Proposed Method             13       10       7
6.4. YALE Database
The Yale face database contains 165 images: 11 different images for each of 15 distinct subjects. The 11 images per subject are taken under different facial expressions or configurations: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink. Some images from this database are shown in Fig. 8. Experiments on this database were carried out on down-sampled images of size 32x32. For the Yale database, a random subset with Gm = 2, 3, 5 images per individual was taken to form the training set, and the rest of the database was used for testing. The recognition accuracies with standard deviations and the numbers of features are compared in Tables 9 and 10, respectively. As in the previous experiments, the proposed algorithm outperforms the other studied methods with fewer features.
Fig. 8. Example images of YALE database
Table 9. Recognition rate and its standard deviation of the studied methods for the YALE database with different training set sizes.

Method                      G2/P9           G3/P8           G5/P6
Eigenface [1]               51.4,  σ=4.5    58.25, σ=2.7    61.11, σ=5.7
Fisherface [2]              46.37, σ=3.7    63.58, σ=5.2    70.67, σ=8.3
SDA on Eigenface [3]        58.52, σ=2.5    66.08, σ=3.2    72.44, σ=6.7
Random subspace LDA [11]    55.38, σ=7.8    70.83, σ=5.2    72.66, σ=5.7
Proposed Method             59.56, σ=4.7    74.08, σ=4.6    76.75, σ=5.2
Table 10. The number of features employed by the studied methods for the YALE database.

Method                      G2/P9    G3/P8    G5/P6
Eigenface [1]               27       40       29
Fisherface [2]              14       14       14
SDA on Eigenface [3]        14       26       23
Random subspace LDA [11]    5x14     5x14     5x14
Proposed Method             9        9        8
7. Further Discussions
In Section 6, we reported the performance of the proposed method on different databases. We now take a closer look at the properties of the proposed algorithm. First, the effect of different face spaces is studied.
The OLB algorithm was trained on the PIE database with two different face spaces: Eigenface and Fisherface. In this experiment, three images per individual were used as gallery data. A sample plot of the Q-values vs. PCA/LDA feature number for the first 65 features of one of the gallery samples is shown in Fig. 9. Examining these plots reveals the following points:
• The directions corresponding to higher eigenvalues are not always the first and best choice for discrimination. For example, the second and third directions in the upper plot of Fig. 9 (PCA space) have negative values. This implies that by selecting these features the OLB agent has received punishment, i.e., the agent has not successfully discriminated its class from the other classes.
• The Q-values in Fig. 9(b) show that the LDA space is much more discriminative for this gallery sample. This observation suggests that one could define a feature pool with different clusters (such as an LDA cluster and a PCA cluster); the OLB agent would first decide which cluster is more appropriate for each sample and then derive the optimal features from the selected cluster.
Fig. 9. Q-values vs. feature number for a gallery sample of PIE database trained over (a) PCA space, (b)
LDA space.
Another interesting observation concerns the order of features in the Eigenface space. We trained the OLB algorithm over the first fifty directions of the PCA space for the ORL database, with three images per individual used for training. In this case, the Eigenface method reached its best accuracy at around 50 features, so we did not train the OLB algorithm over the entire Eigenface space. Fig. 10 shows a contour map of the Q-values vs. feature number for this experiment; each row of the figure corresponds to a training image. Brighter regions of the plot mean that the agent received greater rewards by selecting the corresponding features. Globally, the features corresponding to larger eigenvalues are usually more important than those corresponding to smaller eigenvalues. This observation is consistent with the random subspace LDA method, where the directions corresponding to higher eigenvalues are kept intact. However, the samples with bright regions at lower eigenvalues support our motivation for developing OLB.
Fig. 10. Q-values vs. feature number for all the training images of the ORL database, where three images per individual are used as the gallery. The face space is derived by the Eigenface method. Brighter regions correspond to higher Q-values. Note that for some images the agent did not receive high rewards.
Finally, we study the effect of choosing different k-NN classifiers in the training phase. To study this parameter, a sample experiment was performed: the thirteen images of the first session of the AR database were selected as training data, all thirteen images of the second session served as the testing set, and the OLB algorithm was trained with k-NN classifiers for k = 2, 3, 4, ..., 12. The recognition accuracy vs. the parameter k is shown in Fig. 11. Evidently, when the number of samples per class is small, the designer has to choose a small k as well. Interestingly, the figure also reveals that a small k can yield higher recognition accuracy even when the number of samples per class is large. The reason is that, by choosing a large k, the
OLB agent is forced to derive the best discriminant features for each representative point as globally as possible. This contradicts OLB's local nature and may reduce its generalization ability.
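The k-sweep experiment can be mimicked on synthetic data; the plain majority-vote k-NN below is an illustrative stand-in for the classifier used inside OLB training, and the Gaussian toy classes are an assumption:

```python
import numpy as np

def knn_accuracy(train_X, train_y, test_X, test_y, k):
    """Plain k-NN with majority vote, used to sweep the parameter k."""
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    preds = np.array([np.bincount(train_y[row]).argmax() for row in nearest])
    return float(np.mean(preds == test_y))

# two synthetic classes with 13 training samples each, mirroring the
# 13-images-per-class AR setup (the data here are Gaussian toys)
rng = np.random.default_rng(0)
train_X = np.vstack([rng.normal(0, 1, (13, 2)), rng.normal(4, 1, (13, 2))])
train_y = np.array([0] * 13 + [1] * 13)
test_X = np.vstack([rng.normal(0, 1, (13, 2)), rng.normal(4, 1, (13, 2))])
test_y = train_y.copy()
accs = {k: knn_accuracy(train_X, train_y, test_X, test_y, k)
        for k in range(2, 13)}
```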
Fig. 11. Maximum recognition accuracy for different k-NN classifiers when the thirteen images of the first session of the AR database are used for training.
8. Conclusion
In this paper, we introduced an object called Optimal Local Basis (OLB) as a tool for
face recognition. Each OLB was characterized by a representative point in feature space,
a class label, and a set of locally optimal features. The locally optimal features were
derived from a larger set of features by using a Reinforcement Learning (RL) approach.
The reinforcement signal was designed to be correlated to the recognition accuracy. For
each representative point, the maximum reward was obtained when those data with the
same class label got closer to it in a Euclidean space.
In general, finding discriminant local bases can be modeled as an optimization problem. There is a variety of methods for modeling and solving such optimization problems, such as evolutionary methods, dynamic programming, and RL. The key differences among these methods lie in the complexity of the model and the optimization cost, along with the challenges of defining the cost (fitness) function. By modeling the learning task as a Markov Decision Process (MDP), we reduced the search space considerably. Since the MDP model is not fully known, we solved the optimization problem using RL, which benefits from bootstrapping and Monte Carlo estimation of the optimal values. In addition, with RL, defining a fitness function becomes simpler and more flexible, as we need only local, simple, and discrete reward and punishment signals. We also exploited the reinforcement signal to construct a new non-metric classifier: by utilizing the reinforcement signal, we do not need to define and tune a distance measure for each OLB.
The OLB method is well suited to the face recognition problem and, more generally, to problems with a high-dimensional feature space and a small number of training samples. The performance of the proposed learning algorithm was examined on the face recognition problem using different databases and different numbers of training images, with the LDA transform used to extract face features. Nevertheless, the proposed method is theoretically independent of the type of feature space.
Our proposed method can be compared with "person-specific" methods, where a set of exclusive features is extracted for each individual. The OLBs can indeed be called person-specific, since each OLB is derived to represent an individual. However, the features associated with each OLB are the most discriminant subset of features in a neighborhood of its representative point, and each person may have several OLBs due to the complexity of the face manifold. That is, OLBs are local in addition to being person-specific.
We are working on an incremental version of our learning algorithm, which will enable the OLB method to easily adapt to the changes caused by adding new classes to the database. Pattern classification tasks can be grouped into closed-set and open-set applications; in practice, open-set classification is the more challenging problem. The incremental version of the proposed learning algorithm can tackle the open-set identification problem as well. In addition, since the proposed method extracts the most discriminant features for each prototype in the training database, it could also be a suitable candidate for the face verification task. We plan to investigate the appropriateness of our algorithm for this task in the near future.
References
[1] Turk, M., Pentland, A., (1991). Eigenfaces for recognition. Cognitive Neuroscience, 3, 71-
86.
[2] Belhumeur, P. N., Hespanha, J. P., and Kriegman, D. J., (1997). Eigenfaces vs. Fisherfaces:
recognition using class specific linear projection. IEEE Trans. on Pattern Analysis and
Machine Intelligence, 19(7), 711-720.
[3] Manli, Z., and Martinez, A.M. (2006). Subclass discriminant analysis. IEEE Trans. on
Pattern Analysis and Machine Intelligence, 28(8), 1274-1286.
[4] Bartlett, M. S., Movellan, J. R., and Sejnowski, T. J., (2002). Face Recognition by
Independent Component Analysis. IEEE Trans. on Neural Network, 13(6), 1450-1464.
[5] Kim, K. I., Jung, K., and Kim, H. J., (2002). Face Recognition Using Kernel Principal
Component Analysis. IEEE Signal Processing Letters, 9(2), 40-42.
[6] Liu, C., (2004). Gabor-based kernel PCA with fractional power polynomial models for face
recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 26(5), 572-581.
[7] Yang, J., Zhang, D., Frangi, A. F., and Yang, J-Y., (2004). Two-Dimensional PCA: A New
Approach to Appearance-Based Face Representation and Recognition. IEEE Trans. on
Pattern Analysis and machine intelligence, 26(1), 131-137.
[8] Feng, G. C., Yuen, P. C., and Dai, D. Q., (2002). Human Face Recognition Using PCA on
Wavelet Subband. Electronic Imaging, 9, 226-233.
[9] Lu, J., Plataniotis, K.N., and Venetsanopoulos, A.N., (2003). Face recognition using kernel
direct discriminant analysis algorithms. IEEE Trans. on Neural Networks, 14(1), 117-126.
[10] Liu, C., and Wechsler, H., (2003). Independent component analysis of Gabor features for
face recognition. IEEE Trans. on Neural Networks, 14(4), 919-928.
[11] Wang, X., Tang, X., (2006). Random Sampling for Subspace Face Recognition. Int. Journal
of Computer Vision, 70(1), 91-104.
[12] He, X., Yan, S., Hu, Y., Niyogi, P., and Zhang, H., (2005). Face Recognition Using
Laplacianfaces. IEEE Trans. on Pattern Analysis and Machine Intelligence, 27(3), 328-340.
[13] Cai, D., He, X., Han, J., and H. Zhang, (2006). Orthogonal Laplacianfaces for Face
Recognition. IEEE Trans. on Image Processing, 15(11), 3608-3614.
[14] Yang, J., Frangi, A.F., Yang, J. Y., Zhang, D., and Jin, Z., (2005). KPCA plus LDA: a
complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE
Trans. on Pattern Analysis and Machine Intelligence, 27(2), 230-244.
[15] Liu, C., and Wechsler, H., (2000). Evolutionary pursuit and its application to face
recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(6), 570-582.
[16] Zhao, W., Chellappa, R., Phillips, P., and Rosenfeld, A., (2003). Face recognition: A
literature survey. ACM Computing Surveys, 35(4), 399-458.
[17] Harandi, M., Nili Ahmadabadi, M., Araabi, B. N., and Lucas, C., (2004). Feature selection
using genetic algorithm and its application to face recognition. IEEE International
Conference on Cybernetics and Intelligent Systems, 1367-1372.
[18] Zheng, W.S., Lai, J.H., and Yuen, P.C., (2005). GA-fisher: a new LDA-based face
recognition algorithm with selection of principal components. IEEE Trans. on Systems, Man
and Cybernetics, Part B, 35(5), 1065-1078.
[19] Liu, H., and Motoda, H., (1998). Feature Extraction, Construction and Selection: A Data
Mining Perspective, Kluwer Academic Publishers Norwell, MA, USA.
[20] Pentland, A., Moghaddam, B., and Starner, T., (1994). View-Based and Modular
Eigenspaces for Face Recognition. IEEE International Conference on Computer Vision
and Pattern Recognition, 84-91.
[21] Kim, T.K., and Kittler, J., (2005). Locally linear discriminant analysis for multimodally
distributed classes for face recognition with a single model image. IEEE Trans. on Pattern
Analysis and Machine Intelligence, 27(3), 318-327.
[22] Harandi, M., Nili Ahmadabadi, M., and Araabi, B.N., (2004). Face Recognition Using
Reinforcement Learning. IEEE International Conference on Image Processing, 2709-2712.
[23] Sutton, R.S., and Barto, A. G., (1998). Reinforcement Learning, An Introduction. MIT
Press, Cambridge, MA.
[24] Luo, J., Ma, Y., Takikawa, E., Lao, S., Kawade, M., and Lu, B.-L., (2007). Person-
Specific SIFT Features for Face Recognition. IEEE International Conference on Acoustics,
Speech and Signal Processing, 593-596.
[25] Lowe, D., (2004). Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision, 60(2), 91-110.
[26] Bicego, M., Lagorio, A., Grosso, E., and Tistarelli, M. (2006). On the use of SIFT features
for face authentication. IEEE International Workshop on Biometrics, in association with
CVPR.
[27] Ahonen T., Hadid A., and Pietikäinen, M. (2006). Face description with local binary
patterns: Application to face recognition. IEEE Trans. on Pattern Analysis and Machine
Intelligence, 28(12), 2037-2041.
[28] Ekenel, H.K., and Sankur, B., (2004). Feature selection in the independent component
subspace for face recognition. Pattern Recognition Letters, 25(12), 1377-1388.
[29] ORL database, publicly available, http://www.uk.research.att.com/facedatabase.html.
[30] Yale University Face Image Database, publicly available for non-commercial use,
http://cvc.yale.edu/projects/yalefaces/yalefaces.html.
[31] Edelman, S., and Intrator, N., (1990). Learning as extraction of low-dimensional
representations. Mechanisms of Perceptual Learning, D. Medin, R. Goldstone, and P.
Schyns, Eds. New York: Academic, 1990.
[32] O’Toole, A. J., Wenger, M. J., and Townsend, J. T., (1999) Quantitative models of
perceiving and remembering faces: Precedents and possibilities. M. J. Wenger & J. T.
Townsend (Eds.), Computational, geometric, and process perspectives on facial cognition:
Contexts and challenges, 1-38.
[33] Martinez, A.M., and Kak, A.C., (2001). PCA versus LDA. IEEE Trans. on Pattern Analysis
and Machine Intelligence, 23(2), 228-233.
[34] Sim, T., Baker, S., and Bsat, M., (2003). The CMU pose, illumination and expression
database. IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(12), 1615-1618.
[35] Carbon, C.C., (2003). Face processing: Early processing in the recognition of faces. Ph.D.
dissertation, Free University of Berlin.
[36] Roweis, S.T., and Saul, L.K., (2000). Nonlinear dimensionality reduction by locally linear
embedding. Science, 290, 2323-2326.
[37] Yamaguchi, O., Fukui, K., and Maeda, K., (1998). Face recognition using temporal image
sequence. IEEE International Conference on Automatic Face and Gesture Recognition,
318-323.