USING ATTRIBUTE EMBEDDED INVERTED INDEXING FOR
THE CONTENT BASED FACE IMAGE EXTRACTION
Reshma S Hebbar
Student, VTU RC Mysore
Abstract—Face image retrieval from large-scale databases is a challenging problem and is beneficial to many real-world applications. In this paper I propose a method that exploits special properties of faces, such as local features compared by Hamming distance, to encode a discriminative global feature for each face in order to retrieve the required image from a large-scale database. I also utilize demographic information and facial marks, which help differentiate identical twins and thereby improve face image matching and retrieval. Automatic attribute detection based on both local features and global image representations is used. Retrieval rests on two key concepts: attribute-enhanced sparse coding and attribute-embedded inverted indexing. Experimental results show good retrieval performance and an efficiency gain of up to 45%, so the system is not only scalable but also outperforms a linear-scan retrieval system.
Keywords—face retrieval, attribute-embedded inverted indexing, sparse coding.

1. INTRODUCTION
Given a face image as a query, my goal is to retrieve images containing faces of the person appearing in the query image from a large-scale database. Among consumer photos, a large percentage contain human faces (estimated at more than 60%). The importance and sheer volume of human face photos make manipulations (e.g., search and mining) of large-scale human face images an important research problem and enable many real-world applications [1], [2]. My
goal in this paper is to address one of the important and
challenging problems – large-scale content-based face
image retrieval. Given a query face image, content-based
face image retrieval tries to find similar face images from
a large image database. It is an enabling technology for many applications including automatic face annotation
[2], crime investigation [3], etc. Traditional
methods for face image retrieval usually use low-level features to represent faces [2], [4], [5], but low-level
features lack semantic meaning, and face images usually have high intra-class variance (e.g., expression, pose), so the retrieval results are unsatisfactory. To tackle this problem, in this work we provide a new
perspective on content based face image retrieval by
incorporating high-level human attributes into face
image representation and index structure. Along with the
facial marks, demographic information (i.e., gender and
ethnicity) and face identification [8] can also be
considered as ancillary information that is useful in
matching face images. The demographic information and
facial marks are collectively referred to as soft biometric
traits. Soft biometric traits are defined as characteristics
that provide some information about the individual, but
lack the distinctiveness and permanence to sufficiently
differentiate any two individuals. By combining low-level features with high-level human attributes, I was able to find better feature representations and achieve better retrieval results. Here I use two orthogonal methods: i) attribute-enhanced sparse coding and ii) attribute-embedded inverted indexing. The selected descriptors are also promising for other applications. The rest of this paper is organized as follows. Section 2 discusses related work. Section 3
describes observations on the face image retrieval
problem and the promising utilities of human attributes.
Section 4 introduces the proposed methods including
attribute-enhanced sparse coding and attribute embedded
inverted indexing. Section 5 gives the experimental
results, and section 6 concludes this paper.
2. RELATED WORK
This work is closely related to several different research topics, including content-based image retrieval (CBIR),
human attribute detection, and content-based face image
retrieval. Traditional CBIR techniques use image content
like colour, texture and gradient to represent images. To
deal with large scale data, mainly two kinds of indexing
systems are used. Many studies have leveraged inverted indexing or hash-based indexing combined with the bag-of-words model (BoW) [9] and local features such as SIFT to achieve efficient similarity search. Although these
Reshma S Hebbar, Int.J.Computer Technology & Applications,Vol 5 (3),1061-1065
IJCTA | May-June 2014 Available [email protected]
1061
ISSN:2229-6093
methods can achieve high precision on rigid object
retrieval, they suffer from low recall problem due to the
semantic gap. Recently, some researchers have focused
on bridging the semantic gap by finding semantic image
representations to improve CBIR performance: some propose using extra textual information to construct semantic codewords, while others use class labels for semantic hashing. The idea of this work is similar to the aforementioned methods, but instead of using extra information that might require intensive human annotation (tagging), this paper exploits automatically detected human attributes to construct semantic codewords for the face image retrieval task.

Automatically detected human attributes have been
shown promising in different applications recently.
Kumar et al. propose a learning framework to
automatically find describable visual attributes [7]. Using
automatically detected human attributes, they achieve
excellent performance on keyword based face image
retrieval and face verification. Some further extend the
framework to deal with multi-attribute queries for
keyword-based face image retrieval. Bayesian network
approach was proposed to utilize the human attributes for
face identification. These works demonstrate the emerging opportunities for human attributes, but attributes have not been exploited to generate more semantic (and scalable) codewords. To the best of my knowledge, very few works
aim to deal with this problem. Due to the rise of photo
sharing/social network services, there rises the strong
need for large-scale content-based face image retrieval.
Meanwhile, consumer photos are more diverse in quality and exhibit more visual variance. Wu et al.
[4] propose a face retrieval framework using component-
based local features with identity-based quantization to
deal with scalability issues. Wang et al. [2] propose an
automatic face annotation framework based on content-
based face image retrieval. In their framework, they
adopt GIST [2] feature with locality sensitive hashing [5]
for face image retrieval. Chen et al. [5] propose to use
component-based local binary pattern (LBP) a well
known feature for face recognition, combined with
sparse coding and partial identity information to
construct semantic code words for content-based face
image retrieval. Although images naturally have very
high dimensional representations, those within the same
class usually lie on a low dimensional subspace. Sparse
coding can exploit the semantics of the data and achieve
promising results in many different applications such as image classification and face recognition. Raina et al.
propose a machine learning framework using unlabeled
data with sparse coding for classification tasks. Yang et al. apply the framework to SIFT descriptors along with
spatial pyramid matching [7] and maximum pooling to
improve classification results. Wright et al. propose to
use sparse representation for face recognition and
achieve state-of-the-art performance. Note that the
proposed methods can be easily combined with the
method proposed in [5] to take advantage of both identity
information and automatically detected human attributes.
Also, low-level feature (i.e., LBP) can be replaced by
other features such as T3HS2 descriptor.
3. OBSERVATIONS
When dealing with face images, as shown in figure 1, systems usually crop only the facial region and normalize the face to the same position and illumination to reduce intra-class variance. These pre-processing steps ignore rich semantic cues of a designated face, such as skin colour, gender, and hair style. When using a cropped version of a face image, face verification performance drops compared with using the original uncropped version for identifying a person. Therefore, I propose to use automatically detected human attributes to compensate for the information loss, so here I use the entire face image rather than the cropped version.
Fig 1: Cropped versions of images

Given a face image, let X be a random variable for the identity of a person, and let Y be an attribute (e.g., gender).
In information-theoretic perspective, knowing attributes
can reduce the entropy for identifying a person and the
information gain can be computed as,
I(X; Y) = H(X) - H(X|Y)    (1)

where H(X) denotes the Shannon entropy of X, which measures the uncertainty of the random variable X, and H(X|Y) is the conditional entropy of X given Y, which shows the uncertainty of X after the value of Y is known. Intuitively, larger mutual information indicates more help from Y in predicting X. The
probability of X is computed by the frequency of the
person in the dataset. Genders of the people are manually
labelled. I consider only gender in Y for simplicity. As a result, I gained up to 0.97 bits of information; that is,
considering the gender attribute allows us to skip nearly
half of the database if the database contains 50% females
and 50% males. Hence I hypothesize that using human
attributes can help the face retrieval task.
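The information gain of Eq. (1) can be estimated directly from labelled data. The sketch below (my own illustration, not part of the original experiment) computes the empirical mutual information in bits from a list of (identity, gender) pairs:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical I(X; Y) in bits from (identity, attribute) pairs."""
    n = len(pairs)
    px = Counter(x for x, _ in pairs)   # identity frequencies
    py = Counter(y for _, y in pairs)   # attribute frequencies
    pxy = Counter(pairs)                # joint frequencies
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi
```

For a perfectly balanced binary gender attribute the gain reaches its maximum of 1 bit; the 0.97-bit figure above corresponds to a nearly balanced dataset.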
4. PROPOSED METHOD
In this section, I explain the proposed methods, attribute-enhanced sparse coding and attribute-embedded inverted indexing, in detail. As shown in figure 2, for every image in the database we first apply the Viola-Jones face detector to find the locations of faces. The human attributes mentioned in the coming sections are automatically detected using the method described in [7]; I obtain 73 different attribute scores. An active shape model is applied to locate 68 facial landmarks on the image. Using this active shape model,
Fig 2: The proposed system framework
barycentric coordinates are applied for face alignment. For each detected facial component, we extract a 7×5 grid of square patches [4]. In total we have 175 grids from five components, namely two eyes, the nose tip, and two mouth corners, located on the aligned image using methods similar to those proposed in [4]. From each grid we extract an image patch and compute a 59-dimensional uniform LBP feature descriptor. The attribute-embedded inverted index described below is then built for efficient retrieval. When a query image arrives, it goes through the same procedure to obtain sparse codewords and human attributes, and these codewords together with the binary attribute signature are used to retrieve images from the index. Figure 2 illustrates the overview of the system.
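As a sketch of the patch descriptor, a 59-bin uniform LBP histogram (the standard 8-neighbour formulation; the paper does not specify the implementation beyond citing [4]) can be computed as follows:

```python
import numpy as np

def uniform_lbp_hist(patch):
    """59-bin uniform LBP histogram of a 2D grayscale patch.

    Uniform 8-bit patterns (at most 2 circular bit transitions) get their
    own bin (58 of them); all non-uniform patterns share one extra bin.
    """
    # Lookup table: uniform codes -> bins 0..57, non-uniform -> bin 58
    table = np.full(256, 58, dtype=np.int64)
    u = 0
    for code in range(256):
        bits = [(code >> k) & 1 for k in range(8)]
        transitions = sum(bits[k] != bits[(k + 1) % 8] for k in range(8))
        if transitions <= 2:
            table[code] = u
            u += 1
    h, w = patch.shape
    center = patch[1:-1, 1:-1].astype(np.int64)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = patch[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(np.int64)
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(table[codes].ravel(), minlength=59).astype(np.float64)
    return hist / max(hist.sum(), 1.0)
```

Concatenating one such histogram per grid yields the patch-level feature that is subsequently sparse-coded.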
4.1. Attribute-enhanced sparse coding (ASC)

In this section, we first describe how to use sparse coding
for face image retrieval. We then describe details of the
proposed attribute-enhanced sparse coding. Note that in
the following sections, we apply the same procedures to all
patches in a single image to find different codewords and
combine all these codewords together to represent the
image.
1) Sparse coding for face image retrieval (SC):
Using sparse coding for face image retrieval, I
solve the following optimization problem:
min_{D,V} Σ_i ||x^(i) - D v^(i)||_2^2 + λ ||v^(i)||_1    (2)

subject to ||D_j||_2^2 = 1 for all j, where x^(i) is the original feature extracted from a patch of face image i, D ∈ R^{d×K} is a to-be-learned dictionary containing K centroids of d dimensions, and V = [v^(1), v^(2), ..., v^(n)] is the sparse representation of the image patches. The constraint on each column D_j of D keeps D from becoming arbitrarily large. After finding v^(i)
for each image patch, we consider nonzero entries as
codewords of image i and use them for inverted indexing.
Note that we apply the above process to 175 different
spatial grids separately, so codewords from different grids
will never match. Accordingly, we can encode the
important spatial information of faces into sparse coding.
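For a fixed unit-norm dictionary, Eq. (2) can be solved per patch with a simple iterative shrinkage-thresholding (ISTA) loop. This is only an illustrative sketch, not necessarily the solver used in the experiments:

```python
import numpy as np

def sparse_code(x, D, lam, n_iter=300):
    """Solve min_v ||x - D v||_2^2 + lam * ||v||_1 by ISTA for one patch x."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    v = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ v - x)             # gradient of the quadratic term
        z = v - grad / L                           # gradient step
        v = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-threshold
    return v
```

The non-zero indices of the returned v are exactly the patch's codewords that are entered into the inverted index.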
2) Attribute-enhanced sparse coding (ASC): In order to consider human attributes in the sparse representations, I first use dictionary selection (ASC-D) [7] to force images with different attribute values to contain different codewords. Here the dictionary centroids are divided into multiple segments based on the number of attributes, and each segment is generated depending on a single attribute.
To further reduce error we can also use soft-weighted attributes (ASC-S). In this method I assign half of the dictionary centroids an attribute value of +1, used to represent the positive demographic attribute (e.g., gender), and the other half an attribute value of -1 to represent images with the negative demographic attribute. The attribute vector a ∈ {+1, -1}^K, where a_j contains the attribute value of the jth centroid, is calculated using the equation below:

a_j = +1 if j ≥ ⌈K/2⌉, and -1 otherwise    (3)
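Under Eq. (3), a single binary attribute simply partitions the dictionary into two halves. A minimal sketch of the ASC-D selection step (function and variable names are mine, not from the paper; centroids are 0-indexed):

```python
import numpy as np

def allowed_centroids(K, attribute_score):
    """Return the indices of the dictionary half whose attribute value
    (Eq. 3) matches the sign of the image's detected attribute score."""
    # a_j = +1 iff j >= ceil(K/2), per Eq. (3)
    a = np.where(np.arange(K) >= (K + 1) // 2, 1, -1)
    target = 1 if attribute_score > 0 else -1
    return np.flatnonzero(a == target)
```

Restricting the sparse coder to these centroids guarantees that, say, male and female faces can never share a codeword for that segment.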
4.2 Attribute-embedded inverted indexing (AEI):
i) Image ranking and inverted indexing: For each image i, after computing the sparse representation, we use the set of its non-zero entries as the codeword set c(i). The similarity S between two images i and j is then computed as

S(i, j) = |c(i) ∩ c(j)|    (4)
The image ranking under this similarity score can be found efficiently using an inverted index structure. To embed attribute information into the index structure, for every image i, along with the sparse codewords c(i), we use a d_b-dimensional binary signature b^(i) to represent its human attributes:

b_j^(i) = 1 if f_a^(i)(j) > 0, and 0 otherwise    (5)

where f_a^(i)(j) is the score of the jth attribute for image i. At query time, the similarity between two images is |c(i) ∩ c(j)| if the Hamming distance between their binary signatures is less than a threshold T, and 0 otherwise.
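Putting Eqs. (4) and (5) together, the attribute-embedded inverted index can be sketched as below (signatures stored as integer bit masks; all names here are illustrative, not from the paper):

```python
from collections import defaultdict

def build_index(images):
    """images: {image_id: (codeword set, attribute-signature bit mask)}."""
    index = defaultdict(list)
    for img_id, (codewords, sig) in images.items():
        for w in codewords:
            index[w].append((img_id, sig))
    return index

def hamming(a, b):
    """Hamming distance between two binary signatures stored as ints."""
    return bin(a ^ b).count("1")

def query(index, q_codewords, q_sig, T):
    """Rank images by |c(q) ∩ c(i)| (Eq. 4), keeping only images whose
    signature lies within Hamming distance T of the query's (Eq. 5)."""
    scores = defaultdict(int)
    for w in q_codewords:
        for img_id, sig in index.get(w, ()):
            if hamming(sig, q_sig) < T:
                scores[img_id] += 1    # one shared codeword
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Because the attribute filter is a cheap bit operation applied inside the posting-list scan, it prunes candidates without a second pass over the database.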
5. EXPERIMENTAL RESULTS
In this section we compare and visualize results using real
examples.
Fig 3: Re-ranking of images

Figure 3 shows the ranking of images based on inverted indexing; red boxes indicate false positives. After using attribute-embedded inverted indexing, the majority of the retrieved references are correct. Figure 4 plots the query time per image, showing how efficient retrieval is in the proposed method.
In my current implementation each codeword needs only 16 bits to store its image ID in the index. Total memory usage is about 94.2 MB, which is a reasonable amount for a general computer server. I have used MATLAB, and the images are stored in a database. For attribute detection, a single attribute can be detected within a few milliseconds.
Fig 4: Query time per image
6. CONCLUSION AND FUTURE ENHANCEMENT
Conventional face matching systems generate only numeric matching scores as the similarity between face images, but the proposed method combines two techniques that utilize automatically detected human attributes to significantly improve content-based face image retrieval. I have adopted the rule of maximization of mutual information to obtain a compact and discriminative dictionary, and some dictionary atoms are also treated as attributes in this paper. The sparse-coding features further speed up the process. To the best of my
knowledge, this is the first proposal of combining low-
level features and automatically detected human attributes
for content-based face image retrieval. Attribute-enhanced
sparse coding exploits the global structure and uses several
human attributes to construct semantic-aware codewords
in the offline stage. Attribute-embedded inverted indexing further considers the local attribute signature of the query
image and still ensures efficient retrieval in the online
stage. The experimental results show that using the
codewords generated by the proposed coding scheme, we
can reduce the quantization error and achieve effective
results. Current methods treat all attributes as equal. I will
investigate methods to dynamically decide the importance
of the attributes and further exploit the contextual
relationships between them.
My ongoing work includes: 1) studying the image resolution requirement for facial mark detection; 2) since I used automatic detection of spontaneous asymmetric expressions, analysing a few basic words in kids before they talk with the help of expressions; and 3) improving the efficiency of image retrieval even in unconstrained environments.
ACKNOWLEDGEMENT
I would like to thank the department of computer science
& engineering, VTU RC Mysore for providing support to
this research work. My indebted gratitude also goes to our
head of the department Dr.K Thippeswamy for his helpful
tips and timely suggestions. I would like to express my
sincere thanks to Dr.K M RAVI KUMAR PG Coordinator,
VTU, PG Centre, Regional Office, Mysore for his support
and guidance, without whose assistance I would have faltered in this effort.
REFERENCES
[1] Y.-H. Lei, Y.-Y. Chen, L. Iida, B.-C. Chen, H.-H. Su, and W. H. Hsu, "Photo search by face positions and facial attributes on touch devices," ACM Multimedia, 2011.
[2] D. Wang, S. C. Hoi, Y. He, and J. Zhu, "Retrieval-based face annotation by weak label regularized local coordinate coding," ACM Multimedia, 2011.
[3] U. Park and A. K. Jain, "Face matching and retrieval using soft biometrics," IEEE Transactions on Information Forensics and Security, 2010.
[4] Z. Wu, Q. Ke, J. Sun, and H.-Y. Shum, "Scalable face image retrieval with identity-based quantization and multi-reference re-ranking," IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[5] B.-C. Chen, Y.-H. Kuo, Y.-Y. Chen, and K.-Y. Chu, "Semi-supervised image retrieval using sparse coding with identity constraint," ACM Multimedia, 2010.
[6] M. Douze, A. Ramisa, and C. Schmid, "Combining attributes and Fisher vectors for efficient image retrieval," IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[7] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Describable visual attributes for face verification and image search," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Special Issue on Real-World Face Recognition, Oct 2011.
[8] W. Scheirer, N. Kumar, K. Ricanek, T. E. Boult, and P. N. Belhumeur, "Fusing with context: a Bayesian approach to combining descriptive attributes," International Joint Conference on Biometrics, 2011.
[9] J. Sivic and A. Zisserman, "Video Google: A text retrieval approach to object matching in videos," International Conference on Computer Vision, 2003.