Optimal Local Basis: A Reinforcement Learning Approach for Face Recognition

Optimal Local Basis: A Reinforcement

Learning Approach for Face Recognition This is an author-produced version of a paper published in International Journal of Computer Vision (ISSN 0920-5691). Citation as published: Mehrtash T. Harandi, Majid Nili Ahmadabadi and Babak N. Araabi, "Optimal Local Basis: A Reinforcement Learning Approach for Face Recognition," International Journal of Computer Vision (IJCV), Vol. 81, No. 2, pp. 191-204, Feb. 2009.

Abstract

This paper presents a novel learning approach for Face Recognition by introducing

Optimal Local Basis. Optimal local bases are a set of basis derived by reinforcement

learning to represent the face space locally. By designing the reinforcement signal to be

correlated to the recognition accuracy, the optimal local bases are derived by finding the

most discriminant features for different parts of the face space, which represents either

different individuals or different expressions, orientations, poses, illuminations, and other

variants of the same individual. Therefore, unlike most of the existing approaches that

solve the recognition problem by using a single basis for all individuals, our proposed

method benefits from local information by incorporating different bases for its decision.

We also introduce a novel classification scheme that uses reinforcement signal to build a

similarity measure in a non-metric space.

Experiments on AR, PIE, ORL and YALE databases indicate that the proposed method

facilitates robust face recognition under pose, illumination and expression variations.

The performance of our method is compared with that of Eigenface, Fisherface, Subclass

Discriminant Analysis, and Random Subspace LDA methods as well.

Index Terms: Face Recognition, Feature Selection, Reinforcement Learning.

1. Introduction

Individual identification through face recognition is an important and crucial ability of

humans in effective communication and interactions. Interest in producing artificial face

recognizers has been grown rapidly during the past ten years, mainly because of its

various practical applications such as surveillance, identification, authentication, access

control, and Mug shot searching. Among different approaches devised for face

recognition, the widely studied ones are the statistical learning methods that try to derive

an appropriate basis for face representation [1] [2] [4]. The reason behind deriving a basis

is because a complete basis makes the derivation of unique image representations suitable

for processes like image retrieval and object recognition. Statistical learning theory also

offers a lower dimensional description of faces. The lower dimensional description is

crucial in learning, as the number of examples required for achieving a given

performance exponentially grows with the dimension of the representation space. On the

other hand, low dimensional representations of visual objects have some biological roots

as suggested by Edelman and Intrator [31] :"perceptual tasks such as similarity judgment

tend to be performed on a low-dimensional representation of the sensory data." Principal

Component Analysis (PCA) [1], [7], [8], Linear Discriminate Analysis (LDA) [2],

Subclass Discriminant Analysis (SDA) [3], Independent Component Analysis (ICA) [4],

[10], Locality Preserving Projections (LPP) [12], [13], kernel machines [5], [6], [9] and

hybrid methods [14] are successful examples of applying statistical learning theory to

face recognition by introducing a single basis for face representation. This is where some

important questions come to mind: Is a holistic basis sufficient for a complicated task like

face recognition or we have to introduce different bases for different parts of face space

(individuals) to achieve best practically possible recognition? How does a single basis

may interpret the holistic and feature analysis behaviors [16] of human beings?

Apparently using the same projection for all individuals is in contrast with human feature

analysis behavior in face recognition. Several studies showed that even the derived basis

can be further improved by searching for the most discriminant directions using

https://www.researchgate.net/publication/27411100_Two-Dimensional_PCA_A_New_Approach_to_Appearance-Based_Face_Representation_and_Recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/228603675_Human_face_recognition_using_PCA_on_wavelet_subband?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/220566092_Face_Recognition_A_Literature_Survey?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/220279443_Face_recognition_using_kernel_direct_discriminant_analysis_algorithms?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/3342721_Face_recognition_using_kernel_principal_component_analysis?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/226575631_Eigenfaces_vs_Fisherfaces_Recognition_Using_Class_Specific_Linear_Projection?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3


https://www.researchgate.net/publication/2303276_Learning_as_Extraction_of_Low-Dimensional_Representations?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/6720925_Orthogonal_Laplacianfaces_for_Face_Recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/6899681_Subclass_Discriminant_Analysis?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/220181313_Face_Recognition_Using_Laplacianfaces?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/8255822_Gabor-based_Kernel_PCA_with_Fractional_Power_Polynomial_Models_for_Face_Recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/27411101_KPCA_Plus_LDA_A_complete_Kernel_Fisher_Discriminant_Framework_for_Feature_Extraction_and_Recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/5614037_Independent_component_analysis_of_Gabor_features_for_face_recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/288961008_Face_recognition_by_independent_component_analysis?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3


https://www.researchgate.net/publication/283996949_Eigenfaces_for_Recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3


evolutionary algorithms like evolutionary pursuit [15], [17], GA-Fisher [18] and

individual/combinatorial feature selection methods [19] [28].

Several other approaches also exist in the area of face recognition to tackle the problem

by introducing different bases. The very first approach was View-based Eigenspaces

proposed by Moghaddam [20]. In his work, Moghaddam suggested to group images

taken from a specific view together and build a specific space for each view. The idea is

extended by Kim to LDA sp aces with soft clustering idea in 2005 [21]. Wang introduced

random subspace LDA to generate a number of subspaces randomly followed by fusing

the result of recognition [11]. In addition, mutual subspaces are also proposed when

several images from each class are available [37].

Nevertheless each space in these mixture models is still holistic in terms of individual

representation and identification. Furthermore, the clustering method used to form the

samples of each space, does not necessarily group the images to optimally extract their

features (LDA in LLDA and LDA mixture model and PCA features in View-based

approach). One can also expect unacceptable generalization ability if the number of

samples in some clusters are not sufficient.

In this paper, inspired by biological findings in human face recognition, we present a

novel idea to locally represent the face space by using the reinforcement learning (RL)

method. The main idea proposed in this paper is illustrated as an example in Fig. 1.

Considering the complex face space manifold, our learning method tries to find the most

discriminant features for different parts of the space. Different parts of face space can be

corresponds to different individuals or even different expressions/poses/illuminations of

an individual. These learned discriminant features are varied in number for different parts

https://www.researchgate.net/publication/2776205_View-Based_and_Modular_Eigenspaces_for_Face_Recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/3745216_Face_recognition_using_temporal_image_sequence?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/3414067_GA-Fisher_A_New_LDA-Based_Face_Recognition_Algorithm_With_Selection_of_Principal_Components?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/239575017_Feature_Extraction_Construction_and_Selection_A_Data_Mining_Perspective?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/7986020_Locally_linear_discriminant_analysis_for_multimodally_distributed_classes_for_face_recognition_with_a_single_model_image?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/220659416_Random_Sampling_for_Subspace_Face_Recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/3193120_Evolutionary_pursuit_and_its_application_to_face_recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/220647022_Feature_selection_in_the_independent_component_subspace_for_face_recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

of the face space although they are selected from a unique feature pool. This pool can be

provided by any holistic algorithm. As the learned discriminant features formed local

basis for face spaces derived by statistical learning methods, we called our method

Optimal Local Basis (OLB). The proposed method can be considered as an extension to

evolutionary algorithms [15] in which only one optimal basis is sought for all individuals.

However, in contrast, our approach tries to learn one or several separate optimal

basis/bases for each individual. We also provide a novel classification scheme to use RL

reward signal for taking its final decision in a non-metric space. The proposed method is

successfully tested on AR [33], PIE [34], ORL [29] and YALE [30] datasets, which cover

a wide range of variations such as different expressions, illuminations and poses. We

apply our method to face spaces derived by LDA and compare with Eigenface,

Fisherface, SDA and random subspace LDA in terms of recognition accuracy.

Fig. 1. An example of the proposed idea where, each subject is represented by its most discriminant

features in the face space.

Feature Pool

Most Discriminant Features

Subject B

Subject A Most Discriminant Features

https://www.researchgate.net/publication/3193633_The_CMU_Pose_Illumination_and_Expression_Database?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/220182328_Pca_versus_lda?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3


The rest of this paper is organized as follows: Section 2 provides general background

on RL algorithm. In section 3 our method for optimal basis derivation using RL is

introduced. Section 4 presents the proposed class similarity measure and OLB classifier.

Section 5 describes the computational complexity of the proposed method. Experimental

results are described in Section 6. Section 7 provides some interesting properties of the

proposed algorithm followed by conclusion remarks and suggestions for future in

Section 8.

2. Reinforcement Learning

Reinforcement Learning is a machine learning technique for solving sequential decision

problems. Various decision problems in real life are sequential in nature. In these

problems the received reward does not depend on an isolated decision but rather on a

sequence of decisions. Therefore, learning agent maximizes a combination of its

immediate and delayed rewards.

In RL, at each moment t , learning agent senses the environment in state Sst ∈ and

takes an action Aat ∈ where S and A denote the sets of agent’s states and its available

actions, respectively. Agent’s action causes a state transition and reward tr from the

environment. The expected value of the received reward by following an arbitrary policy

π from an initial state ts is given by

⎥⎦

⎤⎢⎣

⎡= ∑

∞

=+

0)(V

iit

it rEs γπ (1)

where itr + is the reward received in state its + using policy π , and [ )1,0∈γ is a discount

factor that balances delayed versus instant rewards. The learning problem is modeled as

the best mapping from states into actions AS → : π i.e. finding the policy that

maximizes the sum of expected rewards:

( )π

arg max V , s s Sππ ∗ = ∀ ∈ (2)

There are different RL methods to find the optimum policy; e.g. Q-learning and Sarsa

[23]. In this paper we use Q-learning to estimate optimal local bases.

3. Optimal Local Basis Learning Using RL

We will introduce our approach in this section. We assume that we are encountering a

classification problem where the feature vectors are represented by N dimensional vectors

and each datum belongs to one of the classes { }1 2, , , Cω ω ω… . An OLB is defined as a

three-tuple ( , , )i i iωx T , where R Ni ∈x is the OLB representative point,

{ }1 2, , ,i Cω ω ω ω∈ … is the OLB class label, i N∈T Λ is a binary vector expressing the set

of features associated with the OLB, and NΛ is the set of all the N dimensional binary

vectors excluding the null binary vector. Therefore, NΛ have 2 1N − members. For

instance, in a face recognition task where the features are obtained by PCA, ix is the

representation of a sample face in the Eigenface space, N is the number of Eigenfaces,

and iT is the binary vector demonstrating which Eigenfaces are the optimal subset to

describe ix . For example in Fig. 1, the optimal binary vector for the subject A is a binary

vector with four elements equal to one whereas the OLB optimal binary vector for the

subject B has only two ones. The location of the ones in the OLB optimal binary vector

determines which features are optimal for describing the underlying OLB. In this sense,

the binary vector iT can be considered as a feature selection mapping from a high

https://www.researchgate.net/publication/5596000_Reinforcement_Learning_An_Introduction?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3


dimensional space into a lower dimensional one. In sequel, the set of optimum local

features, OLB optimal binary vector, and iT are used interchangeably.

In solving a classification problem, the problem is modeled by a set of OLBs. The

OLB learning algorithm tries to find the OLB optimal binary vector iT for each OLB by

maximizing the discrimination power of its representative point ix from other

representative points , j j i≠x . More specifically the learning process tries to find the

best iT so that all the other representative points , j j i≠x with the same class label iω

are seen closer to ix than the representative points with different class labels jω in the

space defined by iT .

In its formal description, OLB derivation algorithm can be regarded as a manifold

learning [36] approach; however, here we would like to explore another view of the OLB

algorithm; which is closely related to some of interesting findings in Biology and

Neuroscience. Considering a classification problem defined over a feature space, features

can be categorized either as global or local based on the way they are used in the

classification task. In most classification algorithms, a unique set of features is identically

selected to separate all the classes over the feature space. Such features are called global

features. On the other hand, most often if we consider a portion of feature space, it is

possible to find a handful of features to characterize that portion more efficiently as

compared with global features. Such features may be labeled as local features. Global

features are ideal, provided that features appropriateness for classification does not vary

much across different classes. That is, the same set of features performs equally well to

separate all the classes. However, in most classification problems, it is very difficult or

https://www.researchgate.net/publication/12204040_Nonlinear_Dimensionality_Reduction_by_Locally_Linear_Embedding?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3


even impractical to find such an optimal set of global features. Moreover, all features

may not always be processed and used due to computational and/or time resource

limitations. Unlike the global features, the local features may vary on different portions

of the feature space. The OLB algorithm is a practical approach to obtain the local

features in a complex classification task like face recognition; because iT can be

considered as the subset of most discriminant features from the OLB representative point

of view. It is interesting to know that the opportunistic behavior of human face

recognition [16] and recent theories of face spaces can also be partially modeled by the

OLB algorithm. The theory of face space is the most common theoretical framework

proposed for face recognition in both computational and psychological literature [32]. In

most computational researches, a fixed face space with holistic features is assumed

however some psychological findings challenge the proposition that the feature space is

ever holistic or fixed. It means, an expert’s feature space may become reorganized and

tuned to perform a categorization task more efficiently [35]. In a similar way, the OLB

algorithm manages to find the most discriminant features of each representative point. As

a result, it, tunes and reorganizes the face space to boost the recognition performance

from each local observer’s point of view.

Person-specific image-based approaches are recently introduced with promising

results for face recognition [24], [26], [27]. In [24] and [26], the SIFT operator -introduced

by Lowe [25]- is used to detect the key points of each gallery image and to extract the

scale/rotation invariant features for each key point. For identifying a probe image, its

SIFT features are compared with those of each gallery image and the closest match is

selected as the recognition result. Our approach and the mentioned person-specific

https://www.researchgate.net/publication/2738478_Quantitative_Models_of_Perceiving_and_Remembering_Faces_Precedents_and_Possibilities?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

https://www.researchgate.net/publication/200038910_Distinctive_Image_Features_from_Scale-Invariant_Keypoints?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3


https://www.researchgate.net/publication/224711149_Person-Specific_SIFT_Features_for_Face_Recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3


https://www.researchgate.net/publication/4245847_On_the_Use_of_SIFT_Features_for_Face_Authentication?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3


https://www.researchgate.net/publication/6690158_Face_Description_with_Local_Binary_Patterns_Application_to_Face_Recognition?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

methods resemble each other at the first glance; because, in both algorithms, each

training sample is represented by a set of specific features. However, our method differs

from the person-specific methods in two major aspects. Firstly, our method is

independent of the feature space and theoretically can be used with different features.

Secondly, the nature of the feature selection approaches is different. In fact, we find a set

of most discriminant features for each sample image by looking at its neighbor images

while discrimination power is not directly considered in extraction of features in the

person-specific methods.

Selection of the best feature subset in an N dimensional space demands for examining

2 1N − possibilities per OLB. Searching such a space is computationally exhaustive even

for medium size feature spaces. In order to search the feature space more efficiently, the

feature selection task is modeled as an expected reward maximization problem in an thn

order Markov Decision Process (MDP). This MDP model drastically reduces the

computational complexity of the problem; as in an thn order MDP the learner needs to

keep its n last states and decisions. To derive the OLB optimal binary vector iT , the

problem is divided into two steps. In the first step we obtain a sorted feature set by

learning discrimination factors of all the features describing the OLB representative point

using RL. The discrimination factor shows how discriminative the corresponding feature

is for classifying its representative point along with the other features. In the second step,

the best subset of features is selected as the OLB optimum local features.

In sequel, we introduce the proposed learning method in more details. In particular

Section 3.1 proposes the learning algorithm based on Q-learning -a widely used

implementation of reinforcement learning [23]- to estimate the discrimination factors. In



Section 3.2 we provide different methods for selecting optimal features for each OLB.

Estimated OLBs are integrated into a classification schema in Section 4. A preliminary

study on the OLB learning algorithm is reported in [22].

3.1. The Learning Method

In this section, our method for assigning discrimination factor to each feature of an OLB

using RL is presented. We assume that the learning process is applied to the

( , , )i i iOLB ωx T where the classification problem is modeled by the set

{ }1 1 2 2 2( , , ), ( , , ), , ( , , ), , ( , , )i i i m m mOLB OLB OLB OLBω ω ω ω1x T x T x T x T… … . Here m is

the number of OLBs and m C≥ .

In order to use RL, we should model the problem as a Markov decision process and

design the reward function. Fig. 2 shows a schematic view of the proposed system where,

the learning process is modeled as agent-environment interaction. For each OLB, the

corresponding agent traverses the feature space and at each step selects a feature from the

available features. The environment responds the agent completing an action (feature

selection) by a reward signal. The agent’s task is to learn those features for

( , , )i i iOLB ωx T that result in the collection of maximum expected reward.

https://www.researchgate.net/publication/221123732_Face_recognition_using_reinforcement_learning?el=1_x_8&enrichId=rgreq-706913ec825869c1465ac05227450fa9-XXX&enrichSource=Y292ZXJQYWdlOzIyMDY1OTUwNjtBUzo5OTg5MDY4NDIzNTc5MUAxNDAwODI3MjAzNzI3

Fig. 2. A schematic view of the learning system.

The environment is modeled as an thn order MDP where the current state iS is

represented merely by n last selected features ( )1, , ,s s si n i n if f f− − + … as shown in Fig. 3. In

state iS agent chooses its decision ia based on its n last decision ( )si

sni

sni fff ,,, 1 …+−− and

available remaining features. The selected features are kept in binary vector h of length

N , where N is the dimension of feature space. This is to ensure that a feature is not

selected more than once. Whenever a feature is selected the corresponding element of

vector h becomes one. The agent can select only those features that their corresponding

elements in h are zero.

Fig. 3. A thn order MDP model of the environment, where sjf is the thj selected feature and ia is the

agent’s action at state is

1−iS

1−ir ir

iS 1+iSs

ii fa 1+=sii fa =−1

( )1 1, , ,s s si n i n if f f− − − −… ( )1, , ,s s s

i n i n if f f− − + … ( )1 2 1, , ,s s si n i n if f f− + − + +…

The value of agent’s state-action pairs are modeled by 1n + Q-tables

{ }0 1 2 , , , nQ Q Q Q… of size 1

, , , n

N N N N N N N N N+

⎧ ⎫× × × … × ×…⎨ ⎬

⎩ ⎭. The element

( )0 1 1, , , ,jj jQ i i i i−… of the Q-table jQ demonstrates the expected value of received reward

by selecting feature ji when the features 0 1 1, , , ji i i −… are already selected in j previous

steps. For example in a first order MDP, element ),( 10 ii of 1Q demonstrate the expected

reward of selecting direction 1i when 0i . is previously selected The updating equations

for Q-learning algorithm is

0 1 1 0 1 1 1 2 0 1 11( , , , , ) ( , , , , ) ( max ( , , , , ) ( , , , , ))

Nj j j j

j j j j j l j jlQ i i i i Q i i i i r Q i i i i Q i i i iα γ− − −=

= + + −… … … … (3)

where 0 1 1( , , , , )jj jQ i i i i−… is the expected reward of selecting feature ji when the agent is in

the state ( )0 1 1, , , ji i i −… ; 1 2( , , , , )jj lQ i i i i… is the expected reward of selecting li one step

after selecting feature ji ; r is the received reward of selecting feature ji ; ( )10 ≤< αα is

the learning rate; and ( ) 0 1γ γ≤ ≤ is the discount factor. In the learning phase, the agent

traverses the environment with the ε Greedy policy, i.e. in each state the agent selects the

best action with probability ε−1 while it is possible for the agent to choose a random

action with the probability of ε .

The reward signal is designed to satisfy the following criteria:

• Narrowing the distance between ix and the other representative points ;j j i≠x

with the same class labels ( )i jω ω= in an Euclidean space. This criterion is

equivalent to minimize intra-set distance for each class.

• Maximizing the discrimination of ix from the representative points ;j j i≠x with

different class labels ( )i jω ω≠ . This criterion is equivalent to maximize inter-set

distances for classes. In this process, those features that segregate ix from the

representative points with different class labels are preferred.

The critic (the environment) evaluates the selected features by examining the class

labels of the K nearest neighbors of ix . To do this, all representative points

, 1j j m=x … are projected into the space defined by the selected features,

( ), 1j jdiag j m= ⊗ =p x h … . Then the K nearest neighbors of ip is obtained using L2

norm. The received reward is formulated as

( )( )1

( ) ( ) 1 ( ) ( )K

j i C j i NCj

r f R j f P jω ω=

= × + − ×∑ (4)

In (4) ( )j if ω is a binary value function representing whether the j-th neighbor has

the same class label as iω or not:

if the neighbor has the same class label as .

otherwise

1 ( )

0 i

j i

j - thf

ωω

⎧= ⎨⎩

(5)

( )CR j and ( )NCP j are the reward and punishment that the agent receives for correct

and incorrect hits, respectively. It means that, the agent receives reward ( )CR j when the

j-th neighbor has the same class label as iω and punishment ( )NCP j otherwise. The

maximum and minimum expected rewards are 1

( )K

Cj

R j=∑ and

1

( )K

NCj

P j=∑ , respectively.

Each episode of RL has at last N steps (the number of available features). The episode

could be terminated sooner if the agent receives the maximum reward in hitsC consequent

steps. The agent must visit the feature space sufficiently in order to learn effectively. The

learning algorithm can be summarized as shown in Table 1.

Table 1 – OLB derivation algorithm Algorithm OLB Learning Select randomly an OLB ( , , )i i iOLB ωx T from the training dataset { }1 1 2 2 2( , , ), ( , , ), , ( , , ), , ( , , )i i i m m mOLB OLB OLB OLBω ω ω ω1x T x T x T x T… … . Initialize the corresponding Q-tables randomly. for iteration=1 to _number_of_Episodes do

0,0, 0N⎛ ⎞

⎜ ⎟=⎜ ⎟⎝ ⎠

h …

repeat Select a feature 1

si ia f += by ε greedy policy.

Update the selected feature vector, 1( ) 1sif + =h .

Project all the representative points mjj 1 , =x in training dataset into space defined by h using ( )j jdiag= ⊗p x h . Find the class labels of the K -nearest neighbors of the projected data from ip. Update the corresponding cell of Q-table according to (4).

until in hitsC consequent steps, the agent receives the maximum reward or if all the features are selected.

end for

In this paper, we confined our experiments to the first order MDPs and face spaces

derived by applying statistical learning methods. As a result, in the context of our

proposed algorithm the words direction and feature are used interchangeably. The

updating equations for Q-tables are given in (6) and (7). Since the expected reward of

selecting a direction right after 0i is coded in 1Q , 1Q table is used to update the values in

0Q .

0 0 1 00 0 0 01

( ) ( ) ( max ( , ) ( ))N

llQ i Q i r Q i i Q iα γ

== + + − (6)

1 1 1 10 1 0 1 1 0 11

( , ) ( , ) ( max ( , ) ( , ))N

llQ i i Q i i r Q i i Q i iα γ

== + + − (7)

3.2. Selecting the most appropriate features

After completing the learning process, the set of most appropriate features for each OLB

should be selected. Here, the term "appropriate" is shorthand for "appropriate for

discrimination of an OLB". Not only less appropriate features cannot provide better

discrimination for the underlying OLB, but also they may deteriorate the effect of more

appropriate ones. Now, the question is how to determine the appropriate features for each

OLB. To do this, firstly the features are sorted according to their discrimination using the

available Q-tables and then from the sorted features the optimal binary vector iT is

extracted.

To obtain a set of features in descending order of appropriateness, we use the already

learned Q-tables in recall mode. For an thn order MDP, the first n appropriate features

are obtained by selecting the position of the maximum in , 0,1, 2,.., 1jQ j n= − ,

respectively. In each selection step, the corresponding Q-table and the selected features

are used. After using the above process to find the first n features in order of

appropriateness, the rest of features are selected using the last n selected features and

tracing the position of maximum in the nQ table. Therefore, selection of each feature

beyond thn selected one depends only on nQ table along with the last n selected

features.

As an example, for a second order MDP, we use the 0Q to select the first optimal

feature 0a . Then the selected feature and 1Q is used and the second appropriate feature,

1a , is found by locating the maximum considering 0a , i.e. 11 0argmax( ( , ))jj

a Q a a= . The

remaining appropriate features are obtained from 2Q using

22 1argmax( ( , , ) | ( ) 1)l l l jj

a Q a a a h j− −= = , 2,3,..., 1l N= − , recursively, where h is the binary

vector that keeps track of the selected features in the previous steps. The pseudo code of

acquiring the ordered set of appropriate features is shown in Table 2.

After obtaining the ordered features, we have to select a subset of them. Three

different methods can be devised here:

• Static method: Features with Q-values above a predefined threshold are used in

the decision process. A fixed threshold is used for all OLBs.

• Adaptive method: Features with Q-values above a varying threshold are used in

the decision process. The varying threshold is selected proportional to the largest

Q-value or by a clustering method. In this method the threshold varies for each

OLB.

• Validation method: It is also possible to split the data into validation and test

subsets and then find the optimal features using the validation subset. In this

method the threshold varies for each OLB too.

In this paper we use an adaptive method, which is based on a two classes clustering.

This is partly due to its performance advantages over the static method and also its

relative simplicity compared to the validation method.

Table 2 – Sorting the features for a thn order MDP Algorithm Sorting the features according to their appropriateness.

1fC =

1,1, 1N

h⎛ ⎞⎜ ⎟=⎜ ⎟⎝ ⎠

…

00 arg max( ( ))j

ja Q a=

0_ ( )ordered features a=

0( ) 0h a = ( )0cState a=

while fC n< do Select all the Q values described by cState in fCQ . This is a vector of length N which is called ( ) ( , ), 1fC fC

jRow Q Q cState a j N= = … .

arg max( ( ) | ( ) 1)fCfC j

a Row Q h j= =

_ ( _ , )fCordered features ordered features a= _ ( ) 0fCrem features a =

( )0 1, , , fCcState a a a= … 1fC fC= +

end while while fC N≤ do

Select all the Q values described by State of nQ , ( ) ( , ), 1n

jRow Q cState a j N= = … . This vector has length N .

arg max( ( ) | ( ) 1)nfC j

a Row Q h j= =

_ ( _ , )fCordered features ordered features a= ( ) 0fCh a =

( )1, , ,fC n fC n fCcState a a a− − += … 1fC fC= +

end while

4. Class Similarity Measure and OLB Classifier

When the training phase is finished and the most appropriate features for each OLB are

selected, we are ready to build an OLB-based Classifier. To this end, we need to assess

the similarity of a query datum to all stored classes. Simple similarity judgment based on

ordinary distance measures on feature space does not work here since OLBs have

different dimensions and features; as illustrated in Fig. 4. In this figure, the Euclidean

distance between qx and 1 1 1( , , )OLB ωx T is ( ) ( )2 2

1 ,1 1,1 ,4 1,4q qd x x x x= − + − while the

corresponding distance for 2 2 2( , , )OLB ωx T is

( ) ( ) ( ) ( )2 2 2 2

2 ,1 2,1 ,2 2,2 ,4 2,4 ,5 2,5q q q qd x x x x x x x x= − + − + − + − .

Fig. 4 An illustration of selected features for each OLB. Due to differences in the dimensions and the optimal features of OLBs, it is impractical to use Euclidean norms as similarity measures.

In order to make different bases comparable, we use the reward signal as the similarity

measure. The similarity between a query datum qx and ( , , )i i iOLB ωx T is defined by

first projecting qx and all the representative points , 1j j m=x … into the space defined

by iT , ( )j j idiag= ⊗p x T , finding the label of K-nearest neighbors of qp in that space

and calculating the following similarity measure:

1( , ( , , )) ( ) ( )

K

q i i i C j ij

S OLB R j fω ω=

=∑x x T (8)

1d

1,1x1 1 1( , , )OLB ωx T

,1qx

2,1x

1,7x1,6x1,5x1,4x1,3x1,2x

2,7x2,6x2,5x2,4x2,3x2,2x

,7qx,6qx,5qx,4qx

,3qx,2qx

2 2 2( , , )OLB ωx T

2d

( )j if ω in equation 8 is a binary value function that demonstrates whether the class label

of the j-th neighbor of qp is iω or not (Equation 5). ( )CR j is the reward function

defined in (4).

Based on equation 8, class Ω similarity measure is defined by fusing the similarity

measures of all the OLBs belonging to it, i.e. ( , ( , , ))q i iS OLB Ωx x T . For fusing the

( , ( , , ))q i iS OLB Ωx x T different methods can be utilized including max, median, sum and

product rule. In our experiments sum rule as shown in equation 9 led to higher

recognition accuracies.

1( , ( , , ))

in

q i ii

S S OLBΩ=

= Ω∑ x x T (9)

where in is the number of representative points in class i .

In order to classify a query input, we adopt a single stage decision making strategy

where the similarity measure for all the classes are computed using (9) and the most

similar match is assigned as the output of the classifier.

At semantic level, reward and punishment terms in (4) are comparable with within

and between scatters or distances in a separability matrix or measure. In the classification

stage we want to calculate the similarity of query qx to class iω represented by

( , , )i i iOLB ωx T . Since the punishment shows similarity of qx to the other classes and

similarity of qx to , j j iω ≠ is considered when (8) is calculated for

( , , ) j j jOLB j iω ≠x T , we do not incorporate the punishments in (8), while an

alternative version of (8) with punishment term is still conceivable.

5. Computational Complexity

The computational complexity of the learning algorithm depends on the order of MDP

used for training. The computational complexity of the first order MDP is analyzed here

as this particular model is utilized in experiments of this paper. Consider a first order

MDP with N features. The OLB learning algorithm creates a 1Q table of size N N× .

As every cell of this table must be visited by the agent sufficiently, we chose the number

of episodes of the RL algorithm equal to the number of cells of 1Q as 2N . Since each

episode maximally consists of N steps, the average number of visits per cell is bounded

by N. This ensures that every cells is visited almost N times in average.

In each step of an episode, the agent finds a maximum and updates appropriate cells.

Therefore, the computational complexity of a first order MDP is equal to the number of

episodes, 2( )O N , multiplied by the number of steps, ( )O N , at most. This results in

3( )O N for each OLB. For higher order MDPs, the computational cost is higher. To

avoid high computational cost, authors are currently working on function approximation

methods for estimation of Q-values. This approach drastically decreases the

computational cost [23].

6. Experimental Results

We used AR [33], PIE [34], ORL [29] and YALE [30] databases to evaluate the

performance of the proposed method in different face orientations, expressions,

illumination situations and occlusions. In all experiments, no preprocessing except

downsampling was performed on the images. In each experiment, the image set was


partitioned into training and testing sets. For ease of representation, the experiments are

named as /Gm Pn which means that m images per individual are randomly selected for

training and the remaining n images are used for testing. The experiments are repeated

ten times for every randomly partitioned dataset.

In our method, 2Gm = is the smallest possible number of training data per class. The

reason is that the proposed algorithm employs RL -which is a semi-supervised method-

and like other semi-supervised/supervised algorithms needs more than one sample per

class for learning.

The proposed method is benchmarked against the well-known Eigenface [1],

Fisherface [2] and two state of the art methods; SDA [3] and random subspace LDA [11].

In SDA, the leave one out version was used. The number of subclasses in SDA -which is

a free parameter - was set to two, three and five (whenever the gallery size permits) and

the highest recognition rates are reported. For random subspace LDA, Wang’s

suggestions for the number of subspaces and fixed dimensions are adopted and the

majority voting fusion rule was used [11]. Again for random subspace LDA, the highest

recognition rates are reported.

In OLB algorithm, most appropriate features were selected by the adaptive

thresholding method with k-means clustering. In the learning process of the experiments,

we employed a k-NN classifiers with 3k = whenever there were more than two

prototypes per individual in the training set. The reward signal were [ ]3 2 1.5CR = and

[ ]-1 -1 -1NCP = in these cases. For the cases where there were only two prototypes per

individual in the training set, a 2k = k-NN classifier was used with [ ]3 2CR = and

[ ]-1 -1NCP = .






In order to make a fair comparison with the SDA and random subspace LDA, we

trained the proposed method over LDA space. The number of features passed to OLB

algorithm was 1C − , where C was the number of classes in each database. Consequently

the number of learning episodes was chosen as 2C .

The studied methods are compared in terms of the recognition rate, its standard

deviation and the number of features. For every partition of train/test sets, we found the

maximum recognition rate of each benchmarking method over all of its recognition-

affecting parameters. The reported recognition rate for each method is the average of that

maximum accuracy. The recognition-affecting parameters are the number of features in

Eigenface, the numbers of features and sub-clusters in SDA, and the numbers of

subspaces and randomly selected Eigenfaces in addition to the number of the preserved

ones in the random subspace LDA. For methods in those the number of features affects

the recognition rate, the recognition rate is checked for the top k features as k varies from

one to the maximum number of available features.

The reported numbers of features are the average of number of features associated to

the above mentioned maximum accuracy over ten runs of each train/test partition. The

number of most appropriate features in the proposed method differs from one OLB to

another. Therefore, the reported values here are obtained by first averaging the number of

most appropriate features over all the OLBs and then averaging over ten runs of each

train/test partition. The number of features for all cases was rounded to the nearest

integer.

6.1. AR Database

The AR face database consists of frontal view images of over 100 people. The subjects

were photographed twice at a 2-week interval. During each session 13 conditions with

varying facial expressions, illumination and occlusion were captured. Fig. 5 shows an

example of each condition. Experiments on this database were carried out on the down-

sampled images of size 96x64. Sixty randomly selected individuals of this database were

used in our simulations. A random subset with ( 3,7,11)Gm = images per individual was

taken to form the training set, and the rest of the database was used for testing. The

gallery images were selected merely from the first imaging session.

The recognition accuracies, standard deviations and the number of features are

compared in Table 3 and 4, respectively. Table 3 reveals that the proposed algorithm

outperforms the other studied methods in all cases. The difference between the OLB and

the other algorithms become more significant when the number of gallery samples per

individual is smaller. The improvements are obtained even by using fewer features as

shown in Table 4.

The performance of the random subspace LDA and OLB algorithm were close for

( 7,11)Gm = samples per individual, however; the improvement that OLB has caused

over LDA is very significant. The proposed method can also be used in each subspace

derived by random subspace LDA.

Another interesting property of the obtained results on AR dataset is that when the

number of gallery samples per individual is increased, the number of features in OLB

algorithm is decreased. This shows that when the number of gallery samples is increased,

the OLB agent extracts the local basis with fewer features; this is of course the result of

OLB local representation characteristic.

Fig. 5. Images of one of the subjects in the AR database taken during one session.

Table 3. Recognition rate and its standard deviation of the studied methods for AR database with different

training set size.

Method G3/P23 G7/P19 G11/P15

Eigenface [1] 63.66, σ=7.1 74.7, σ=5.1 77.15, σ=2.8 Fisherface [2] 75.75, σ=5.6 86.0, σ=1.1 87.70, σ=3.0

SDA on Eigenface [3] 75.18, σ=5.3 83.79, σ=3.3 87.13, σ=1.5 Random subspace LDA [11] 77.72, σ=6.4 89.74, σ=1.9 91.04, σ=0.8

Proposed Method 82.47, σ=5.2 90.33, σ=1.8 92.07, σ=0.7

Table 4. The number of features employed by the studied methods for AR database.


Eigenface [1] 163 372 359 Fisherface [2] 59 59 59

SDA on Eigenface [3] 67 72 119 Random subspace LDA [11] 20x59 20x59 20x59

Proposed Method 28 16 13

6.2. PIE Database

The CMU PIE face database contains 68 individuals with 41368 face images. Examples

of pose and illumination variation in this database are shown in Fig. 6. The near frontal

poses (C05, C07, C09, C27 and C29) were used in our experiments which resulted in

(a) (b) (f)(e)(d)(c) (g)

(h) (i) (m)(l)(k)(j)

having 170 images per subject under different illuminations and expressions.

Experiments on this database were carried out on the cropped and down-sampled images

of size 119x98. Similar to the previous experiment, a random subset with ( 3,7,11)Gm =

images per individual was taken to form the training set, and the rest of the images

formed the testing set. The recognition accuracies, standard deviations and the number of

features are compared in Table 5 and 6, respectively. Table 5 demonstrates that the

proposed approach performs better than the other studied methods. In addition, the

improvement that OLB has caused over LDA is very significant.

Fig. 6. Example images of PIE database

Table 5. Recognition rate and its standard deviation of the studied methods for PIE database with different

training set size.

Method G3/P167 G7/P163 G11/P159




Table 6. The number of features employed by the studied methods for PIE database.

Method G3/P167 G7/P163 G11/P159




6.3. ORL Database ORL face database (developed at the Olivetti Research Laboratory, Cambridge, U.K.) is

composed of 400 images with ten different images for each of the 40 distinct subjects.

The variations of the images are across pose, size, time, and facial expressions. Some of

the images of this database are shown in Fig. 7. Experiments on this database were

carried out on the original images of size 112x92. For ORL database, a random subset

with ( 2,3,5)Gm = images per individual was taken to form the training set, and the rest

of the database was used as the testing set. The recognition accuracies, standard

deviations along with the number of features are compared in Table 7 and 8, respectively.

Similar to previous experiments, it can be observed that the OLB algorithm is performing

better than the other studied methods with fewer features. For small number of samples

per individual, the difference between the OLB algorithm and the other algorithms is

more significant. As in AR database, the number of features in OLB is proportional to the

inverse size of the training samples per class.

Fig. 7. Example images of ORL database

Table 7. Recognition rate and its standard deviation of the studied methods for ORL database with different

training set size.



SDA on Eigenface [3] 74.42, σ= 4.9 87.32, σ=3.8 93.8, σ=1.1 Random subspace LDA [11] 78.36, σ=3.5 88.95, σ=2.6 96.1, σ=0.8


Table 8. The number of features employed by the studied methods for ORL.





6.4. YALE Database

Yale face database contains 165 images with 11 different images for each of the 15

distinct subjects. The 11 images per subject are taken under different facial expression or

configuration: center-light, with glasses, happy, left-light, without glasses, normal, right-

light, sad, sleepy, surprised, and wink. Some of the images of this database are shown in

Fig. 8. Experiments on this database were carried out on the down-sampled images of

size 32x32. For Yale database, a random subset with ( 2,3,5)Gm = images per individual

was taken to form the training set and the rest of the database were used for testing. The

recognition accuracies, standard deviations along with the number of features are

compared in Table 9 and 10, respectively. Like the previous experiments, the proposed

algorithm outperforms the other studied methods with fewer features.

Fig. 8. Example images of YALE database

Table 9. Recognition rate and its standard deviation of the studied methods for YALE database with

different training set size.


Eigenface [1] 51.4, σ=4.5 58.25, σ=2.7 61.11, σ=5.7 Fisherface [2] 46.37, σ=3.7 63.58%, σ=5.2 70.67, σ=8.3



Table 10. The number of features employed by the studied methods for YALE database.





7. Further Discussions

In Section 6, we reported the performance of the proposed method on different databases.

Now, we take a closer look at the properties of our proposed algorithm. Firstly the effect

of different face spaces is studied.

The OLB algorithm was trained on PIE database with two different face spaces;

Eigenface and Fisherface. In this experiment, three images per individual were used as

the gallery data. A sample plot of the Q-values vs. PCA/LDA feature numbers for the

first 65 features for one of the gallery samples is shown in Fig. 9. Examining these plots

reveals the following points:

• The directions corresponding to higher eigenvalues are not usually the first and

the best choice for discrimination. For example the second and third directions in

the upper plot of Fig. 9 (PCA space) have negative values. This implies that by

selecting these features, the OLB agent has received punishment, i.e. the agent has

not successfully discriminated its class from the other classes.

• Based on the Q-values demonstrated in the Fig. 9-b, it can be seen that the LDA

space is much more discriminative for this sample of gallery. This observation

suggests that one can define a feature pool with different clusters (like LDA

cluster and PCA cluster). The OLB agent must first decide which cluster is more

appropriate for each sample and then derive the optimal features based on the

selected cluster.

Fig. 9. Q-values vs. feature number for a gallery sample of PIE database trained over (a) PCA space, (b)

LDA space.

10 20 30 40 50 600

20

40

60

80

100

120

140

Feature Number

Q-v

alue

(b)

0 10 20 30 40 50 60 70-30

-20

-10

0

10

20

30

Feature Number

Q-v

alue

(a)

Another interesting observation is to study the order of features in Eigenface space. We

trained the OLB algorithm over the first fifty directions of the space defined by PCA for

ORL database where three images per individual were used for training. In this case, the

Eigenface method reached its best accuracy around 50 features. As a result, we did not

train the OLB algorithm for all the Eigenface space. In Fig. 10 a contour map of the

Q-values vs. feature number for this experiment is shown. Each row of this figure

corresponds to a training image. The brighter regions of this plot mean that the agent has

received greater rewards by selecting the corresponding features. Looking globally, the

features correspond to larger eigenvalues are usually more important than those

correspond to smaller eigenvalues. This observation is supported by random subspace

LDA method where the directions corresponding to higher eigenvalues are kept intact.

However, the existence of samples with brighter regions seen for lower eigenvalues

supports our idea to develop OLB.

Fig. 10. Q-values vs. feature number for all the training images of ORL images where three images per

individual is used as gallery. The face space is derived by Eigenface method. Brighter regions correspond

to higher Q-values. Note that for some images the agent did not receive high rewards.

Finally it is good to study the effect of choosing different k-NN in the training phase.

In order to study the effect of this parameter, a sample experiment was performed by

selecting thirteen images of the first session of the AR database as training data and all

thirteen images of the second session as the testing set and trained the OLB algorithm

with ( 2,3, 4,...,12)k = k-NN classifiers. The recognition accuracy vs. parameter k is

shown in Fig. 11. It is evident that when the number of samples per class is small, the

designer has to choose a small k too. Interestingly, this figure reveals that it is possible to

get higher recognition accuracy by selecting a small k even when the number of samples

per class is large. The reason for this observation is because, by choosing a large k, the

Feature Number

Trai

ning

Sam

ple

Num

ber

10 20 30 40 50

20

40

60

80

100

120

-20

0

20

40

60

80

100

120

OLB agent is forced to derive the best discriminant features for each representative point

as global as possible. It contradicts OLB’s local nature and it may result in reducing its

generalization ability.

Fig. 11. Maximum recognition accuracy for different k-NN classifiers when the thirteen images of the first session of AR database are used for training.

8. Conclusion

In this paper, we introduced an object called Optimal Local Basis (OLB) as a tool for

face recognition. Each OLB was characterized by a representative point in feature space,

a class label, and a set of locally optimal features. The locally optimal features were

derived from a larger set of features by using a Reinforcement Learning (RL) approach.

The reinforcement signal was designed to be correlated to the recognition accuracy. For

2 3 4 5 6 7 8 9 10 11 120.9

0.905

0.91

0.915

0.92

0.925

0.93

0.935

0.94

each representative point, the maximum reward was obtained when those data with the

same class label got closer to it in a Euclidean space.

In general, finding discriminant local bases can be modeled as an optimization

problem. There is a variety of methods for modeling and solving such optimization

problems; like evolutionary methods, dynamic programming, RL, etc. The key

differences of those methods are in the complexity of the model and the optimization

cost, along with the challenges in defining the cost (fitness) function. By assuming the

learning task as a Markov Decision Process (MDP), we reduced the search space

considerably. Since the MDP model is not fully known, we solved the optimization

problem by using RL; which benefits from bootstrapping and Monte-Carlo estimation of

optimal values. In addition, by using RL, defining a fitness function becomes simpler and

more flexible, as we just need local and very simple and discrete reward and punishment

signals. We also exploited the reinforcement signal to construct a new non-metric

classifier. In our approach to classification, by utilizing the reinforcement signal, we do

not need to define and tune a distance measure for each OLB.

The OLB method perfectly suits the face recognition dilemma and problems with

high dimensional feature space and small number of training samples. The performance

of the proposed learning algorithm was examined on the face recognition problem in

different databases and with different number of training images where LDA transform

was used to extract the face features. Nevertheless the proposed method is theoretically

independent from the type of feature space.

Our proposed method can be compared with “person-specific” methods, where a set

of exclusive features is extracted for each individual. Basically we can call the OLBs

“person-specific”, since each OLB is derived to represent an individual. However, the

features associated to each OLB are the most discriminant subset of features in a

neighborhood of its representative point and each person may have several OLBs due to

the complexity of face manifold. That means OLBs are local in addition to being

person-specific.

We are working on an incremental version of our learning algorithm, which enables

the OLB method to easily adapt to the changes cause by adding new classes to the

database. Pattern classification can be grouped into close-set and open-set applications. In

practice, the open-set classification is a more challenging problem. The incremental

version of the proposed learning algorithm can tackle the open-set identification problem

as well. In addition, as the proposed method extracts the most discriminant features for

each prototype in the training database, it could also be a suitable candidate for face

verification task. We plan to investigate the appropriateness of our proposed algorithm

for this task in near future.

References

[1] Turk, M., Pentland, A., (1991). Eigenfaces for recognition. Cognitive Neuroscience, 3, 71-

86.

[2] Belhumeur, P. N., Hespanha, J. P., and Kriegman, D. J., (1997). Eigenfaces vs. Fisherfaces:

recognition using class specific linear projection. IEEE Trans. on Pattern Analysis and

Machine Intelligence, 19(7), 711-720.

[3] Manli, Z., and Martinez, A.M. (2006). Subclass discriminant analysis. IEEE Trans. on

Pattern Analysis and Machine Intelligence, 28(8), 1274-1286.








[4] Bartlett, M. S., Movellan, J. R., and Sejnowski, T. J., (2002). Face Recognition by

Independent Component Analysis. IEEE Trans. on Neural Network, 13(6), 1450-1464.

[5] Kim, K. I., Jung, K., and Kim, H. J., (2002). Face Recognition Using Kernel Principal

Component Analysis. IEEE Signal Processing Letters, 9(2), 40-42.

[6] Liu, C., (2004). Gabor-based kernel PCA with fractional power polynomial models for face

recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 26(5), 572-581.

[7] Yang, J., Zhang, D., Frangi, A. F., and Yang, J-Y., (2004). Two-Dimensional PCA: A New

Approach to Appearance-Based Face Representation and Recognition. IEEE Trans. on

Pattern Analysis and machine intelligence, 26(1), 131-137.

[8] Feng, G. C., Yuen, P. C., and Dai, D. Q., (2002). Human Face Recognition Using PCA on

Wavelet Subband. Electronic Imaging, 9, 226-233.

[9] Lu, J., Plataniotis, K.N., and Venetsanopoulos, A.N., (2003). Face recognition using kernel

direct discriminant analysis algorithms. IEEE Trans. on Neural Networks, 14(1), 117-126.

[10] Liu, C., and Wechsler, H., (2003). Independent component analysis of Gabor features for

face recognition. IEEE Trans. on Neural Networks, 14(4), 919-928.

[11] Wang, X., Tang, X., (2006). Random Sampling for Subspace Face Recognition. Int. Journal

of Computer Vision, 70(1), 91-104.

[12] He, X., Yan, S., Hu, Y., Niyogi, P., and Zhang, H., (2005). Face Recognition Using

Laplacianfaces. IEEE Trans. on Pattern Analysis and Machine Intelligence, 27(3), 328-340.

[13] Cai, D., He, X., Han, J., and H. Zhang, (2006). Orthogonal Laplacianfaces for Face

Recognition. IEEE Trans. on Image Processing, 15(11), 3608-3614.






















[14] Yang, J., Frangi, A.F., Yang, J. Y., Zhang, D., and Jin, Z., (2005). KPCA plus LDA: a

complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE

Trans. on Pattern Analysis and Machine Intelligence, 27(2), 230-244.

[15] Liu, C., and Wechsler, H., (2000). Evolutionary pursuit and its application to face

recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 22(6), 570-582.

[16] Zhao, W., Chellappa, R., Phillips, P., and Rosenfeld, A., (2003). Face recognition: A

literature survey. ACM Computing Surveys, 35(4), 399-458.

[17] Harandi, M., Nili Ahmadabadi, M., Araabi, B. N., and Lucas, C., (2004). Feature selection

using genetic algorithm and its application to face recognition. IEEE International

Conference on Cybernetics and Intelligent Systems, 1367-1372.

[18] Zheng, W.S., Lai, J.H., and Yuen, P.C., (2005). GA-fisher: a new LDA-based face

recognition algorithm with selection of principal components. IEEE Trans. on Systems, Man

and Cybernetics, Part B, 35(5), 1065-1078.

[19] Liu, H., and Motoda, H., (1998). Feature Extraction, Construction and Selection: A Data

Mining Perspective, Kluwer Academic Publishers Norwell, MA, USA.

[20] Pentland, A., Moghaddam, B., and Starner, T., (1994). View-Based and Modular

Eigenspaces for Face Recognition. IEEE International Conference on Computer Vision

and Pattern Recognition, 84-91.

[21] Kim, T.K., and Kittler, J., (2005). Locally linear discriminant analysis for multimodally

distributed classes for face recognition with a single model image. IEEE Trans. on Pattern

Analysis and Machine Intelligence, 27(3), 318-327.



















[22] Harandi, M., Nili Ahmadabadi, M., and Araabi, B.N., (2004). Face Recognition Using

Reinforcement Learning. IEEE International Conference on Image Processing, 2709-2712.

[23] Sutton, R.S., and Barto, A. G., (1998). Reinforcement Learning, An Introduction. MIT

Press, Cambridge, MA.

[24] Ma, J.L., Takikawa, Y., Lao, E., Kawade, S., and Bao-Liang Lu, M., (2007). Person-

Specific SIFT Features for Face Recognition. IEEE International Conference on Acoustics,

Speech and Signal Processing, 593-596.

[25] Lowe, D., (2004). Distinct image features from scale-invariant keypoints., International

Journal of Computer Vision, 60(2), 91-110.

[26] Bicego, M., Lagorio, A., Grosso, E., and Tistarelli, M. (2006). On the use of SIFT features

for face authentication. IEEE International Workshop on Biometrics, in association with

CVPR.

[27] Ahonen T., Hadid A., and Pietikäinen, M. (2006). Face description with local binary

patterns: Application to face recognition. IEEE Trans. on Pattern Analysis and Machine

Intelligence, 28(12), 2037-2041.

[28] Ekenel, H.K., and Sankur, B., (2004).Feature selection in the independent component

subspace for face recognition. Pattern Recognition Letters, 25(12), 1377-1388.

[29] ORL database, publicly available, http://www.uk.research.att.com /facedatabase.html.

[30] Yale University Face Image Database, publicly available for non-commercial use,

http://cvc.yale.edu/projects/yalefaces/yalefaces.html.
















[31] Edelman, S., and Intrator, N., (1990). Learning as extraction of low-dimensional

representations. Mechanisms of Perceptual Learning, D. Medin, R. Goldstone, and P.

Schyns, Eds. New York: Academic, 1990.

[32] O’Toole, A. J., Wenger, M. J., and Townsend, J. T., (1999) Quantitative models of

perceiving and remembering faces: Precedents and possibilities. M. J. Wenger & J. T.

Townsend (Eds.), Computational, geometric, and process perspectives on facial cognition:

Contexts and challenges, 1-38.

[33] Martinez, A.M., and Kak, A.C., (2001). PCA versus LDA. IEEE Trans. on Pattern Analysis

and Machine Intelligence, 23(2), 228-233.

[34] Sim, T., Baker, S., and Bsat, M., (2003). The CMU pose, illumination and expression

database. IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(12), 1615-1618.

[35] Carbon, C.C., (2003). Face processing, Early processing in the recognition of faces. ph.D

dissertation, Free University of Berlin.

[36] Roseis, S.T., and Saul, L.K., (2000). Nonlinear dimensionality reduction by locally linear

embedding. Science, 290, 2323-2326.

[37] Yamaguchi, O., Fukui, K.; Maeda, K.i., (1998). Face recognition using temporal image

sequence. IEEE International Conference on Automatic Face and Gesture Recognition, 318 -

323.










Documents

Optimal Local Basis: A Reinforcement Learning Approach for Face Recognition