USING ATTRIBUTE EMBEDDED INVERTED INDEXING FOR
THE CONTENT BASED FACE IMAGE EXTRACTION
Reshma S Hebbar
Student, VTU RC Mysore
Abstract—Face image retrieval from large-scale databases is a challenging problem and is beneficial to many real-world applications. In this paper I propose a method that exploits special properties of faces, such as local features compared by Hamming distance, to encode a discriminative global feature for each face in order to retrieve the required image from a large-scale database. I also utilize demographic information and facial marks, which help differentiate identical twins and thereby improve face image matching and retrieval. Automatic attribute detection based on both local features and global image representations is used. Retrieval rests on two key concepts: attribute-enhanced sparse coding and attribute-embedded inverted indexing. Experimental results show good retrieval performance and an efficiency gain of up to 45%, so the system is not only scalable but also outperforms a linear-scan retrieval system.
Keywords—face retrieval, attribute-embedded inverted indexing, sparse coding.

1. INTRODUCTION
Given a face image as a query, my goal is to retrieve images containing faces of the person appearing in the query image from a large-scale database. Among consumer photos, a large percentage contain human faces (estimated at more than 60%). The importance and sheer volume of human face photos make manipulations (e.g., search and mining) of large-scale human face images an important research problem and enable many real-world applications [1], [2]. My
goal in this paper is to address one of the important and
challenging problems – large-scale content-based face
image retrieval. Given a query face image, content-based
face image retrieval tries to find similar face images from
a large image database. It is an enabling technology for many applications including automatic face annotation
[2], crime investigation [3], etc. Traditional
methods for face image retrieval usually use low-level features to represent faces [2], [4], [5], but low-level
features lack semantic meaning, and face images usually have high intra-class variance (e.g., expression, pose), so the retrieval results are unsatisfactory. To tackle this problem, in this work we provide a new
perspective on content based face image retrieval by
incorporating high-level human attributes into face
image representation and index structure. Along with the
facial marks, demographic information (i.e., gender and
ethnicity) and face identification [8] can also be
considered as ancillary information that is useful in
matching face images. The demographic information and
facial marks are collectively referred to as soft biometric
traits. Soft biometric traits are defined as characteristics
that provide some information about the individual, but
lack the distinctiveness and permanence to sufficiently
differentiate any two individuals. By combining low-level features with high-level human attributes, I was able to find better feature representations and achieve better retrieval results. Here I use two orthogonal methods: i) attribute-enhanced sparse coding and ii) attribute-embedded inverted indexing. The selected descriptors are also promising for other applications. The rest of this paper is organized as follows. Section 2 discusses related work. Section 3
describes observations on the face image retrieval
problem and the promising utilities of human attributes.
Section 4 introduces the proposed methods including
attribute-enhanced sparse coding and attribute embedded
inverted indexing. Section 5 gives the experimental
results, and section 6 concludes this paper.
2. RELATED WORK
This work is closely related to several different research topics, including content-based image retrieval (CBIR),
human attribute detection, and content-based face image
retrieval. Traditional CBIR techniques use image content
like colour, texture and gradient to represent images. To
deal with large scale data, mainly two kinds of indexing
systems are used. Many studies have leveraged inverted indexing or hash-based indexing combined with the bag-of-words model (BoW) [9] and local features such as SIFT to achieve efficient similarity search. Although these
Reshma S Hebbar, Int.J.Computer Technology & Applications,Vol 5 (3),1061-1065
IJCTA | May-June 2014 Available [email protected]
1061
ISSN:2229-6093
methods can achieve high precision on rigid object
retrieval, they suffer from low recall problem due to the
semantic gap. Recently, some researchers have focused
on bridging the semantic gap by finding semantic image
representations to improve CBIR performance: some propose using extra textual information to construct semantic codewords, while others use class labels for semantic hashing. The idea of this work is similar to the aforementioned methods, but instead of using extra information that might require intensive human annotation (tagging), this paper exploits automatically detected human attributes to construct semantic codewords for the face image retrieval task.

Automatically detected human attributes have been
shown promising in different applications recently.
Kumar et al. propose a learning framework to
automatically find describable visual attributes [7]. Using
automatically detected human attributes, they achieve
excellent performance on keyword based face image
retrieval and face verification. Some further extend the
framework to deal with multi-attribute queries for
keyword-based face image retrieval. Bayesian network
approach was proposed to utilize the human attributes for
face identification. These works demonstrate the emerging opportunities for human attributes, but attributes have not been exploited to generate more semantic (and scalable) codewords. To the best of my knowledge, very few works
aim to deal with this problem. Due to the rise of photo
sharing/social network services, there rises the strong
need for large-scale content-based face image retrieval.
Meanwhile, consumer photos are more diverse in quality and exhibit more visual variance. Wu et al.
[4] propose a face retrieval framework using component-
based local features with identity-based quantization to
deal with scalability issues. Wang et al. [2] propose an
automatic face annotation framework based on content-
based face image retrieval. In their framework, they
adopt GIST [2] feature with locality sensitive hashing [5]
for face image retrieval. Chen et al. [5] propose to use
component-based local binary pattern (LBP) a well
known feature for face recognition, combined with
sparse coding and partial identity information to
construct semantic code words for content-based face
image retrieval. Although images naturally have very
high dimensional representations, those within the same
class usually lie on a low dimensional subspace. Sparse
coding can exploit the semantics of the data and achieve
promising results in many different applications such as image classification and face recognition. Raina et al.
propose a machine learning framework using unlabeled
data with sparse coding for classification tasks. Yang et al. apply the framework to SIFT descriptors along with
spatial pyramid matching [7] and maximum pooling to
improve classification results. Wright et al. propose to
use sparse representation for face recognition and
achieve state-of-the-art performance. Note that the
proposed methods can be easily combined with the
method proposed in [5] to take advantage of both identity
information and automatically detected human attributes.
Also, low-level feature (i.e., LBP) can be replaced by
other features such as T3HS2 descriptor.
3. OBSERVATIONS
When dealing with face images, as shown in figure 1, systems usually crop only the facial region and normalize the face to the same position and illumination to reduce intra-class variance. These pre-processing steps ignore rich semantic cues of a designated face, such as skin colour, gender, and hair style. When using a cropped version of a face image, face verification performance drops compared with using the original uncropped version for identifying a person. Therefore, I propose to use automatically detected human attributes to compensate for the information loss, so here I use the entire face image rather than the cropped version.
Fig 1: Cropped versions of images

Given a face image, let X be a random variable for the identity of a person, and let Y be an attribute (e.g., gender).
In information-theoretic perspective, knowing attributes
can reduce the entropy for identifying a person and the
information gain can be computed as,
I(X; Y) = H(X) - H(X|Y)    (1)

where H(X) denotes the Shannon entropy of X, which measures the uncertainty of the random variable X, and H(X|Y) is the conditional entropy of X given Y, which shows the uncertainty of X after the value of Y is known. Intuitively, larger mutual information indicates more help from Y in predicting X. The
probability of X is computed by the frequency of the
person in the dataset. Genders of the people are manually
labelled. I consider only gender in Y for simplicity. As a result, I gained up to 0.97 bits of information; that is,
considering the gender attribute allows us to skip nearly
half of the database if the database contains 50% females
and 50% males. Hence I hypothesize that using human
attributes can help the face retrieval task.
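The information gain of Eq. (1) can be estimated directly from labelled data. The sketch below (my own illustration, not part of the original experiment) computes the empirical mutual information in bits from a list of (identity, gender) pairs:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical I(X; Y) in bits from (identity, attribute) pairs."""
    n = len(pairs)
    px = Counter(x for x, _ in pairs)   # identity frequencies
    py = Counter(y for _, y in pairs)   # attribute frequencies
    pxy = Counter(pairs)                # joint frequencies
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi
```

For a perfectly balanced binary gender attribute the gain reaches its maximum of 1 bit; the 0.97-bit figure above corresponds to a nearly balanced dataset.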
4. PROPOSED METHOD
In this section, I explain the proposed methods, attribute-enhanced sparse coding and attribute-embedded inverted indexing, in detail. As shown in figure 2, for every image in the database we first apply the Viola-Jones face detector to find the locations of faces. The human attributes mentioned in the coming sections are automatically detected using the method described in [7]; I obtain 73 different attribute scores. An active shape model is applied to locate 68 facial landmarks on the image. Using this active shape model,
Fig 2: The proposed system framework
barycentric coordinates are applied for face alignment. For each detected facial component, we extract a 7×5 grid of square patches [4]. In total we have 175 grids from five components, namely two eyes, the nose tip, and two mouth corners, located on the aligned image using methods similar to those proposed in [4]. From each grid we extract an image patch and compute a 59-dimensional uniform LBP feature descriptor. The attribute-embedded inverted index described below is then built for efficient retrieval. When a query image arrives, it goes through the same procedure to obtain sparse codewords and human attributes, and these codewords together with the binary attribute signature are used to retrieve images from the index. Figure 2 illustrates the overview of the system.
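As a sketch of the patch descriptor, a 59-bin uniform LBP histogram (the standard 8-neighbour formulation; the paper does not specify the implementation beyond citing [4]) can be computed as follows:

```python
import numpy as np

def uniform_lbp_hist(patch):
    """59-bin uniform LBP histogram of a 2D grayscale patch.

    Uniform 8-bit patterns (at most 2 circular bit transitions) get their
    own bin (58 of them); all non-uniform patterns share one extra bin.
    """
    # Lookup table: uniform codes -> bins 0..57, non-uniform -> bin 58
    table = np.full(256, 58, dtype=np.int64)
    u = 0
    for code in range(256):
        bits = [(code >> k) & 1 for k in range(8)]
        transitions = sum(bits[k] != bits[(k + 1) % 8] for k in range(8))
        if transitions <= 2:
            table[code] = u
            u += 1
    h, w = patch.shape
    center = patch[1:-1, 1:-1].astype(np.int64)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = patch[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(np.int64)
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(table[codes].ravel(), minlength=59).astype(np.float64)
    return hist / max(hist.sum(), 1.0)
```

Concatenating one such histogram per grid yields the patch-level feature that is subsequently sparse-coded.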
4.1. Attribute-enhanced sparse coding (ASC)

In this section, we first describe how to use sparse coding
for face image retrieval. We then describe details of the
proposed attribute-enhanced sparse coding. Note that in
the following sections, we apply the same procedures to all
patches in a single image to find different codewords and
combine all these codewords together to represent the
image.
1) Sparse coding for face image retrieval (SC):
Using sparse coding for face image retrieval, I
solve the following optimization problem:
min_{D,V} Σ_i ||x^(i) - D v^(i)||_2^2 + λ ||v^(i)||_1    (2)

subject to ||D_j||_2^2 = 1 for all j, where x^(i) is the original feature extracted from a patch of face image i, D ∈ R^{d×K} is a to-be-learned dictionary containing K centroids of d dimensions, and V = [v^(1), v^(2), ..., v^(n)] is the sparse representation of the image patches. The constraint on each column D_j of D keeps D from becoming arbitrarily large. After finding v^(i)
for each image patch, we consider nonzero entries as
codewords of image i and use them for inverted indexing.
Note that we apply the above process to 175 different
spatial grids separately, so codewords from different grids
will never match. Accordingly, we can encode the
important spatial information of faces into sparse coding.
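For a fixed unit-norm dictionary, Eq. (2) can be solved per patch with a simple iterative shrinkage-thresholding (ISTA) loop. This is only an illustrative sketch, not necessarily the solver used in the experiments:

```python
import numpy as np

def sparse_code(x, D, lam, n_iter=300):
    """Solve min_v ||x - D v||_2^2 + lam * ||v||_1 by ISTA for one patch x."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    v = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ v - x)             # gradient of the quadratic term
        z = v - grad / L                           # gradient step
        v = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-threshold
    return v
```

The non-zero indices of the returned v are exactly the patch's codewords that are entered into the inverted index.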
2) Attribute-enhanced sparse coding (ASC): In order to consider human attributes in the sparse representations, I first use dictionary selection (ASC-D) [7] to force images with different attribute values to contain different codewords. Here the dictionary centroids are divided into multiple segments based on the number of attributes, and each segment is generated depending on a single attribute.
To further reduce error we can also use soft-weighted attributes (ASC-S). In this method I assign half of the dictionary centroids an attribute value of +1, used to represent the positive demographic attribute (e.g., gender), and the other half an attribute value of -1 to represent images with the negative demographic attribute. The attribute vector a ∈ {+1, -1}^K, where a_j contains the attribute value of the jth centroid, is calculated using the equation below:

a_j = +1 if j ≥ ⌈K/2⌉, and -1 otherwise    (3)
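Under Eq. (3), a single binary attribute simply partitions the dictionary into two halves. A minimal sketch of the ASC-D selection step (function and variable names are mine, not from the paper; centroids are 0-indexed):

```python
import numpy as np

def allowed_centroids(K, attribute_score):
    """Return the indices of the dictionary half whose attribute value
    (Eq. 3) matches the sign of the image's detected attribute score."""
    # a_j = +1 iff j >= ceil(K/2), per Eq. (3)
    a = np.where(np.arange(K) >= (K + 1) // 2, 1, -1)
    target = 1 if attribute_score > 0 else -1
    return np.flatnonzero(a == target)
```

Restricting the sparse coder to these centroids guarantees that, say, male and female faces can never share a codeword for that segment.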
4.2 Attribute-embedded inverted indexing (AEI):
i) Image ranking and inverted indexing: For each image i, after computing the sparse representation, we use the set of its non-zero entries as the codeword set c(i). The similarity S between two images i and j is then computed as

S(i, j) = |c(i) ∩ c(j)|    (4)
The image ranking under this similarity score can be found efficiently using an inverted index structure. To embed attribute information into the index structure, for every image i, along with the sparse codewords c(i), we use a d_b-dimensional binary signature b^(i) to represent its human attributes:

b_j^(i) = 1 if f_a^(i)(j) > 0, and 0 otherwise    (5)

where f_a^(i)(j) is the score of the jth attribute for image i. At query time, the similarity between two images is |c(i) ∩ c(j)| if the Hamming distance between their binary signatures is less than a threshold T, and 0 otherwise.
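Putting Eqs. (4) and (5) together, the attribute-embedded inverted index can be sketched as below (signatures stored as integer bit masks; all names here are illustrative, not from the paper):

```python
from collections import defaultdict

def build_index(images):
    """images: {image_id: (codeword set, attribute-signature bit mask)}."""
    index = defaultdict(list)
    for img_id, (codewords, sig) in images.items():
        for w in codewords:
            index[w].append((img_id, sig))
    return index

def hamming(a, b):
    """Hamming distance between two binary signatures stored as ints."""
    return bin(a ^ b).count("1")

def query(index, q_codewords, q_sig, T):
    """Rank images by |c(q) ∩ c(i)| (Eq. 4), keeping only images whose
    signature lies within Hamming distance T of the query's (Eq. 5)."""
    scores = defaultdict(int)
    for w in q_codewords:
        for img_id, sig in index.get(w, ()):
            if hamming(sig, q_sig) < T:
                scores[img_id] += 1    # one shared codeword
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Because the attribute filter is a cheap bit operation applied inside the posting-list scan, it prunes candidates without a second pass over the database.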
5. EXPERIMENTAL RESULTS
In this section we compare and visualize results using real
examples.
Fig 3: Re-ranking of images

Figure 3 shows the ranking of images based on inverted indexing; red boxes indicate false positives. After using attribute-embedded inverted indexing, the majority of the retrieved references are correct. Figure 4 plots the query time per image, showing how efficient retrieval is in the proposed method.
In my current implementation each codeword needs only 16 bits to store its image ID in the index. Total memory usage is about 94.2 MB, which is a reasonable amount for a general computer server. I have used MATLAB, and the images are stored in a database. For attribute detection, a single attribute can be detected within a few milliseconds.
Fig 4: Query time per image
6. CONCLUSION AND FUTURE ENHANCEMENT
Conventional face matching systems generate only numeric matching scores as the similarity between face images, but the proposed method combines two techniques that utilize automatically detected human attributes to significantly improve content-based face image retrieval. I have adopted the rule of maximization of mutual information to obtain a compact and discriminative dictionary, and some dictionary atoms are also treated as attributes in this paper. The sparse-coding features further speed up the process. To the best of my
knowledge, this is the first proposal of combining low-
level features and automatically detected human attributes
for content-based face image retrieval. Attribute-enhanced
sparse coding exploits the global structure and uses several
human attributes to construct semantic-aware codewords
in the offline stage. Attribute-embedded inverted indexing further considers the local attribute signature of the query
image and still ensures efficient retrieval in the online
stage. The experimental results show that using the
codewords generated by the proposed coding scheme, we
can reduce the quantization error and achieve effective
results. Current methods treat all attributes as equal. I will
investigate methods to dynamically decide the importance
of the attributes and further exploit the contextual
relationships between them.
My ongoing work includes: 1) studying the image resolution requirement for facial mark detection; 2) since I used automatic detection of spontaneous asymmetric expressions, analysing a few basic words in kids before they talk with the help of expressions; and 3) improving the efficiency of image retrieval even in unconstrained environments.
ACKNOWLEDGEMENT
I would like to thank the department of computer science
& engineering, VTU RC Mysore for providing support to
this research work. My indebted gratitude also goes to our
head of the department Dr.K Thippeswamy for his helpful
tips and timely suggestions. I would like to express my
sincere thanks to Dr.K M RAVI KUMAR PG Coordinator,
VTU, PG Centre, Regional Office, Mysore for his support
and guidance, without whose assistance I would have faltered in this effort.
REFERENCES
[1] Y.-H. Lei, Y.-Y. Chen, L. Iida, B.-C. Chen, H.-H. Su, and W. H. Hsu, "Photo search by face positions and facial attributes on touch devices," ACM Multimedia, 2011.
[2] D. Wang, S. C. Hoi, Y. He, and J. Zhu, "Retrieval-based face annotation by weak label regularized local coordinate coding," ACM Multimedia, 2011.
[3] U. Park and A. K. Jain, "Face matching and retrieval using soft biometrics," IEEE Transactions on Information Forensics and Security, 2010.
[4] Z. Wu, Q. Ke, J. Sun, and H.-Y. Shum, "Scalable face image retrieval with identity-based quantization and multi-reference re-ranking," IEEE Conference on Computer Vision and Pattern Recognition, 2010.
[5] B.-C. Chen, Y.-H. Kuo, Y.-Y. Chen, and K.-Y. Chu, "Semi-supervised image retrieval using sparse coding with identity constraint," ACM Multimedia, 2010.
[6] M. Douze, A. Ramisa, and C. Schmid, "Combining attributes and Fisher vectors for efficient image retrieval," IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[7] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Describable visual attributes for face verification and image search," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Special Issue on Real-World Face Recognition, Oct 2011.
[8] W. Scheirer, N. Kumar, K. Ricanek, T. E. Boult, and P. N. Belhumeur, "Fusing with context: a Bayesian approach to combining descriptive attributes," International Joint Conference on Biometrics, 2011.
[9] J. Sivic and A. Zisserman, "Video Google: A text retrieval approach to object matching in videos," International Conference on Computer Vision, 2003.