
102 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 1, JANUARY 2015

EMR: A Scalable Graph-based Ranking Model for Content-based Image Retrieval

Bin Xu, Student Member, IEEE, Jiajun Bu, Member, IEEE, Chun Chen, Member, IEEE, Can Wang, Member, IEEE, Deng Cai, Member, IEEE, and Xiaofei He, Senior Member, IEEE

Abstract—Graph-based ranking models have been widely applied in the information retrieval area. In this paper, we focus on a well-known graph-based model: the Ranking on Data Manifold model, or Manifold Ranking (MR). It has been successfully applied to content-based image retrieval because of its outstanding ability to discover the underlying geometrical structure of a given image database. However, manifold ranking is computationally expensive, which significantly limits its applicability to large databases, especially when queries are out of the database (new samples). We propose a novel scalable graph-based ranking model called Efficient Manifold Ranking (EMR) that addresses the shortcomings of MR from two main perspectives: scalable graph construction and efficient ranking computation. Specifically, we build an anchor graph on the database instead of a traditional k-nearest neighbor graph, and design a new form of adjacency matrix that speeds up the ranking. An approximate method is adopted for efficient out-of-sample retrieval. Experimental results on several large-scale image databases demonstrate that EMR is a promising method for real-world retrieval applications.

Index Terms—Graph-based algorithm, ranking model, image retrieval, out-of-sample

1 INTRODUCTION

GRAPH-BASED ranking models have been deeply studied and widely applied in the information retrieval area. In this paper, we focus on the problem of applying a novel and efficient graph-based model to content-based image retrieval (CBIR), especially out-of-sample retrieval on large-scale databases.

Traditional image retrieval systems are based on keyword search, such as Google and Yahoo image search. In these systems, a user keyword (query) is matched against the context around an image, including the title, manual annotation, web document, etc. These systems do not utilize information from the images themselves, and they suffer from many problems, such as the shortage of text information and the inconsistency between the meaning of the text and the image. Content-based image retrieval is a viable alternative to overcome these difficulties. CBIR has drawn great attention in the past two decades [1]–[3]. Different from traditional keyword search systems, CBIR systems utilize low-level features, including global features (e.g., color moment, edge histogram, LBP [4]) and local features (e.g., SIFT [5]), automatically extracted from images. A great

• B. Xu, J. Bu, C. Chen, and C. Wang are with the Zhejiang Provincial Key Laboratory of Service Robot, College of Computer Science, Zhejiang University, Hangzhou 310027, China. E-mail: {xbzju, bjj, chenc, wcan}@zju.edu.cn.

• D. Cai and X. He are with the State Key Lab of CAD&CG, College of Computer Science, Zhejiang University, Hangzhou 310027, China. E-mail: {dengcai, xiaofeihe}@cad.zju.edu.cn.

Manuscript received 9 Oct. 2012; revised 7 Apr. 2013; accepted 22 Apr. 2013. Date of publication 1 May 2013; date of current version 1 Dec. 2014. Recommended for acceptance by H. Zha. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier 10.1109/TKDE.2013.70

amount of research has been performed on designing more informative low-level features to represent images, or better metrics (e.g., DPF [6]) to measure perceptual similarity, but their performance is restricted by many conditions and is sensitive to the data. Relevance feedback [7] is a useful tool for interactive CBIR. The user's high-level perception is captured by weights that are dynamically updated based on the user's feedback.

Most traditional methods focus too much on data features and ignore the underlying structure information, which is of great importance for semantic discovery, especially when label information is unknown. Many databases have an underlying cluster or manifold structure. Under such circumstances, the assumption of label consistency is reasonable [8], [9]: nearby data points, or points belonging to the same cluster or manifold, are very likely to share the same semantic label. This phenomenon is extremely important for exploring semantic relevance when label information is unknown. In our opinion, a good CBIR system should consider both images' low-level features and the intrinsic structure of the image database.

Manifold Ranking (MR) [9], [10], a well-known graph-based ranking model, ranks data samples with respect to the intrinsic geometrical structure collectively revealed by a large number of data points. It is exactly in line with our consideration. MR has been widely applied and has shown excellent performance and feasibility on a variety of data types, such as text [11], images [12], [13], and video [14]. By taking the underlying structure into account, manifold ranking assigns each data sample a relative ranking score, instead of an absolute pairwise similarity as in traditional approaches. The score is treated as a similarity

1041-4347 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


XU ET AL.: EMR: A SCALABLE GRAPH-BASED RANKING MODEL FOR CONTENT-BASED IMAGE RETRIEVAL 103

metric defined on the manifold, which is more meaningful for capturing the semantic relevance degree. He et al. [12] first applied MR to CBIR, and significantly improved image retrieval performance compared with state-of-the-art algorithms.

However, manifold ranking has drawbacks in handling large-scale databases: its computational cost is expensive, in both the graph construction and ranking computation stages. In particular, it is unknown how to handle an out-of-sample query (a new sample) efficiently under the existing framework, and it is unacceptable to recompute the model for each new query. This means the original manifold ranking is inadequate for a real-world CBIR system, in which the user-provided query is almost always an out-of-sample.

In this paper, we extend the original manifold ranking and propose a novel framework named Efficient Manifold Ranking (EMR). We address the shortcomings of manifold ranking from two perspectives: the first is scalable graph construction, and the second is efficient computation, especially for out-of-sample retrieval. Specifically, we build an anchor graph on the database instead of the traditional k-nearest neighbor graph, and design a new form of adjacency matrix that speeds up the ranking computation. The model has two separate stages: an offline stage for building (or learning) the ranking model and an online stage for handling a new query. With EMR, we can handle a database with 1 million images and perform online retrieval in a short time. To the best of our knowledge, no previous manifold ranking based algorithm has run out-of-sample retrieval on a database at this scale.

A preliminary version of this work previously appeared as [13]. In this paper, the new contributions are as follows:

• We pay more attention to out-of-sample retrieval (the online stage) and propose an efficient approximate method to compute ranking scores for a new query in Section 4.5. As a result, we can run out-of-sample retrieval on a large-scale database in a short time.

• We have optimized the EMR code1 and re-run all the experiments (Section 5). Three new databases, including two large-scale databases with about 1 million samples each, are added for testing the efficiency of the proposed model. We offer a more detailed analysis of the experimental results.

• We formally define the formulation of the local weight estimation problem (Section 4.1.1) for building the anchor graph, and compare two different methods to determine which is better (Section 5.2.2).

The rest of this paper is organized as follows. In Section 2, we briefly discuss related work, and in Section 3, we review the MR algorithm and analyze it. The proposed approach, EMR, is described in Section 4. In Section 5, we present experimental results on many real-world image databases. Finally, we provide conclusions in Section 6.

1. http://eagle.zju.edu.cn/∼binxu/

2 RELATED WORK

The problem of ranking has recently gained great attention in both the information retrieval and machine learning areas. Conventional ranking models can be content based, like the Vector Space Model, BM25, and language modeling [15]; link structure based, like the famous PageRank [16] and HITS [17]; or cross-media models [18]. Another important category is learning to rank, which aims to optimize a ranking function that incorporates relevance features and avoids tuning a large number of parameters empirically [19], [20]. However, many conventional models ignore the important issue of efficiency, which is crucial for real-time systems such as web applications. In [21], the authors present a unified framework for jointly optimizing effectiveness and efficiency.

In this paper, we focus on a particular kind of ranking model: graph-based ranking. It has been successfully applied in link-structure analysis of the web [16], [17], [22]–[24], social network research [25]–[27], and multimedia data analysis [28]. Generally, a graph [29] can be denoted as G = (V, E, W), where V is a set of vertices in which each vertex represents a data point, E ⊆ V × V is a set of edges connecting related vertices, and W is an adjacency matrix recording the pairwise weights between vertices. The objective of a graph-based ranking model is to decide the importance of a vertex, based on local or global information drawn from the graph.

Agarwal [30] proposed to model the data by a weighted graph, and incorporated this graph structure into the ranking function as a regularizer. Guan et al. [26] proposed a graph-based ranking algorithm for interrelated multi-type resources to generate personalized tag recommendations. Liu et al. [25] proposed an automatic tag ranking scheme that performs a random walk over a tag similarity graph. In [27], the authors made music recommendations by ranking on a unified hypergraph, combining rich social information and music content. The hypergraph is a new graph-based model that has been studied in many works [31]. Recently, there have been some papers on speeding up manifold ranking. In [32], the authors partitioned the data into several parts and computed the ranking function in a block-wise way.

3 MANIFOLD RANKING REVIEW

In this section, we briefly review the manifold ranking algorithm and make a detailed analysis of its drawbacks. We start from the description of notations.

3.1 Notations and Formulations

Given a set of data χ = {x_1, x_2, . . . , x_n} ⊂ R^m, we build a graph on the data (e.g., a kNN graph). W ∈ R^{n×n} denotes the adjacency matrix, with element w_{ij} saving the weight of the edge between points i and j. Normally the weight is defined by the heat kernel w_{ij} = exp[−d²(x_i, x_j)/2σ²] if there is an edge linking x_i and x_j, and w_{ij} = 0 otherwise. The function d(x_i, x_j) is a distance metric between x_i and x_j defined on χ, such as the Euclidean distance. Let r: χ → R be a ranking function which assigns to each point x_i a ranking score r_i. Finally, we define an initial vector y = [y_1, . . . , y_n]^T, in which y_i = 1 if x_i is a query and y_i = 0 otherwise.
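As a concrete illustration of this setup, the heat-kernel kNN adjacency matrix can be sketched in a few lines of NumPy (the function name and toy usage are ours, not from the paper):

```python
import numpy as np

def knn_graph(X, k=5, sigma=1.0):
    """Build a symmetric kNN adjacency matrix with heat-kernel weights.

    W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) if j is among the k
    nearest neighbors of i (or i among those of j), else 0.
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)          # exclude self-edges
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma**2))
    return np.maximum(W, W.T)             # symmetrize
```

Note the O(n²) distance computation here is exactly the scalability bottleneck discussed in Section 3.2.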

Page 3: 102 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA … · based image retrieval (CBIR), especially for out-of-sample retrieval on large scale databases. Traditional image retrieval systems


The cost function associated with r is defined to be

O(r) = \frac{1}{2} \left( \sum_{i,j=1}^{n} w_{ij} \left\| \frac{r_i}{\sqrt{D_{ii}}} - \frac{r_j}{\sqrt{D_{jj}}} \right\|^2 + \mu \sum_{i=1}^{n} \| r_i - y_i \|^2 \right),   (1)

where \mu > 0 is the regularization parameter and D is a diagonal matrix with D_{ii} = \sum_{j=1}^{n} w_{ij}.

The first term in the cost function is a smoothness constraint, which makes nearby points in the space have close ranking scores. The second term is a fitting constraint, which means the ranking result should fit the initial label assignment. With more prior knowledge about the relevance or confidence of each query, we can assign different initial scores to the queries. Minimizing the cost function with respect to r yields the following closed-form solution

r^* = (I_n - \alpha S)^{-1} y,   (2)

where \alpha = \frac{1}{1+\mu}, I_n is the n × n identity matrix, and S is the symmetric normalization of W, S = D^{-1/2} W D^{-1/2}. In large-scale problems, we prefer to use the iteration scheme:

r(t+1) = \alpha S r(t) + (1 - \alpha) y.   (3)

During each iteration, each point receives information from its neighbors (first term) and retains its initial information (second term). The iteration process is repeated until convergence. When manifold ranking is applied to retrieval (such as image retrieval), after the user specifies a query, we can use the closed form or the iteration scheme to compute the ranking score of each point. The ranking score can be viewed as a manifold distance metric, which is more meaningful for measuring semantic relevance.
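The iteration scheme of equation (3) can be sketched as follows (a minimal NumPy implementation of ours; note its fixed point is (1 − α)(I_n − αS)^{-1}y, which differs from the closed form of equation (2) only by a constant factor and therefore produces the same ranking order):

```python
import numpy as np

def manifold_ranking(W, y, alpha=0.99, tol=1e-6, max_iter=1000):
    """Iterate r(t+1) = alpha * S @ r(t) + (1 - alpha) * y,
    where S = D^{-1/2} W D^{-1/2} is the symmetrically normalized W."""
    D = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(D, 1e-12))
    S = (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    r = y.astype(float).copy()
    for _ in range(max_iter):
        r_new = alpha * (S @ r) + (1 - alpha) * y
        if np.linalg.norm(r_new - r, 1) < tol:   # converged
            return r_new
        r = r_new
    return r
```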

3.2 Analysis

Although manifold ranking has been widely used in many applications, it has drawbacks in handling large-scale databases, which significantly limit its applicability.

The first is its graph construction method. The kNN graph is quite appropriate for manifold ranking because of its good ability to capture the local structure of the data. But the construction cost of a kNN graph is O(n² log k), which is expensive in large-scale situations. Moreover, manifold ranking, like many other graph-based algorithms, directly uses the adjacency matrix W in its computation. The storage cost of a sparse W is O(kn). Thus, we need a way to build a graph with both low construction cost and small storage space, as well as a good ability to capture the underlying structure of the given database.

The second is that manifold ranking has a very expensive computational cost because of the matrix inversion operation in equation (2). This has been the main bottleneck for applying manifold ranking in large-scale applications. Although we can use the iteration algorithm in equation (3), it is still inefficient in large-scale cases and may stop at a local convergence. Thus, the original manifold ranking is inadequate for a real-time retrieval system.

4 EFFICIENT MANIFOLD RANKING

We address the shortcomings of the original MR from two perspectives: scalable graph construction and efficient ranking computation. In particular, our method can handle out-of-sample retrieval, which is important for a real-time retrieval system.

4.1 Scalable Graph Construction

To handle large databases, we want the graph construction cost to be sub-linear in the graph size. That means, for each data point, we cannot search the whole database, as the kNN strategy does. To achieve this, we construct an anchor graph [33], [34] and propose a new design of the adjacency matrix W.

The definitions of anchor points and anchor graphs have appeared in other works. For instance, in [35], the authors proposed that each data point on the manifold can be locally approximated by a linear combination of its nearby anchor points, with the linear weights becoming its local coordinate coding. Liu et al. [33] designed the adjacency matrix with a probabilistic measure and used it for scalable semi-supervised learning. That work inspires us greatly.

4.1.1 Anchor Graph Construction

Now we introduce how to use an anchor graph to model the data [33], [34]. Suppose we have a data set χ = {x_1, . . . , x_n} ⊂ R^m with n samples in m dimensions, and U = {u_1, . . . , u_d} ⊂ R^m denotes a set of anchors sharing the same space with the data set. Let f: χ → R be a real-valued function which assigns each data point in χ a semantic label. We aim to find a weight matrix Z ∈ R^{d×n} that measures the potential relationships between data points in χ and anchors in U. Then we estimate f(x) for each data point as a weighted average of the labels on the anchors

\hat{f}(x_i) = \sum_{k=1}^{d} z_{ki} f(u_k), \quad i = 1, \ldots, n,   (4)

with constraints \sum_{k=1}^{d} z_{ki} = 1 and z_{ki} \geq 0. Element z_{ki} represents the weight between data point x_i and anchor u_k. The key point of the anchor graph construction is how to compute the weight vector z_i for each data point x_i. Two issues need to be considered: (1) the quality of the weight vector and (2) the cost of the computation.

Similar to the idea of LLE [8], a straightforward way to measure the local weights is to optimize the following convex problem:

\min_{z_i} \; \varepsilon(z_i) = \frac{1}{2} \Big\| x_i - \sum_{s=1}^{|N(x_i)|} z_{is} u_s \Big\|^2, \quad u_s \in N(x_i),
\quad \text{s.t.} \; \sum_{s} z_{is} = 1, \; z_i \geq 0,   (5)

where N(x_i) is the index set of x_i's nearest anchors. We call the above problem the local weight estimation problem. Standard quadratic programming (QP) can solve this problem, but QP is computationally very expensive. A projected gradient based algorithm was proposed in [33] to compute the weight matrix, and in our previous work [13] a kernel regression method was adopted. In this paper, we compare these two methods for finding the weight vector z_i. Both of them are much faster than QP.



(1) Solving by Projected Gradient

The first method is the projected gradient method, which was used in [33]. The updating rule is expressed as the following iterative formula [33]:

z_i^{(t+1)} = \Pi_s\big(z_i^{(t)} - \eta_t \nabla\varepsilon(z_i^{(t)})\big),   (6)

where \eta_t denotes the step size at time t, \nabla\varepsilon(z) denotes the gradient of \varepsilon at z, and \Pi_s(z) denotes the simplex projection operator on any z \in R^s. The detailed algorithm can be found in Algorithm 1 of [33].

(2) Solving by Kernel Regression

We adopt Nadaraya-Watson kernel regression to assign weights smoothly [13]:

z_{ki} = \frac{K\big(|x_i - u_k| / \lambda\big)}{\sum_{l=1}^{d} K\big(|x_i - u_l| / \lambda\big)},   (7)

with the Epanechnikov quadratic kernel

K_\lambda(t) = \begin{cases} \frac{3}{4}(1 - t^2) & \text{if } |t| \leq 1; \\ 0 & \text{otherwise.} \end{cases}   (8)

The smoothing parameter λ determines the size of the local region in which anchors can affect the target point. It is reasonable to assume that a data point shares the same semantic label with its nearby anchors with high probability. There are many ways to determine the parameter λ. For example, it can be a constant selected by cross-validation on a set of training data. In this paper, we use a more robust way to set λ, replacing it with the nearest-neighborhood size s, that is

λ(xi) = |xi − u[s]|, (9)

where u_{[s]} is the sth closest anchor of x_i. Later, in the experimental part, we discuss the effectiveness and efficiency of the above two methods.
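Under these definitions, the kernel-regression weights of equations (7)–(9) can be sketched as follows (illustrative code of ours; note that with λ(x_i) set to the distance of the sth closest anchor, that anchor itself receives zero weight):

```python
import numpy as np

def anchor_weights(X, anchors, s=3):
    """Build Z (d x n) by Nadaraya-Watson regression with the Epanechnikov
    kernel; each point is tied to its s nearest anchors, and lambda is the
    distance to the s-th closest anchor, as in Eq. (9)."""
    n, d = X.shape[0], anchors.shape[0]
    Z = np.zeros((d, n))
    for i in range(n):
        dist = np.linalg.norm(anchors - X[i], axis=1)
        nearest = np.argsort(dist)[:s]
        lam = dist[nearest[-1]] + 1e-12     # distance to s-th anchor
        t = dist[nearest] / lam
        k = 0.75 * (1.0 - t**2) * (np.abs(t) <= 1)   # Epanechnikov kernel
        if k.sum() > 0:
            Z[nearest, i] = k / k.sum()     # normalize so column sums to 1
    return Z
```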

Specifically, to build the anchor graph, we connect each sample to its s nearest anchors and then assign the weights. The construction has a total complexity of O(nd log s), where d is the number of anchors and s is very small. Thus, the number of anchors determines the efficiency of the anchor graph construction. If d ≪ n, the construction is linear in the database size.

How can we get the anchors? Active learning [36], [37] or clustering methods are reasonable choices. In this paper, we use the k-means algorithm and select the cluster centers as anchors. Fast k-means algorithms [38] can speed up the computation. Random selection is a competitive alternative with extremely low selection cost and acceptable performance.

The main feature, and also the main advantage, of building an anchor graph is that it separates graph construction into two parts: anchor selection and graph construction. Each data sample is independent of the other samples and related only to the anchors. The construction is always efficient since it has linear complexity in the data size. Note that we do not have to update the anchors frequently, as informative anchors for a large database are relatively stable (e.g., the cluster centers), even if a few new samples are added.

4.1.2 Design of Adjacency Matrix

We present a new approach to design the adjacency matrix W and give an intuitive explanation for it. The weight matrix Z ∈ R^{d×n} can be seen as a d-dimensional representation of the data X ∈ R^{m×n}, where d is the number of anchor points. That is to say, data points can be represented in the new space, no matter what the original features are. This is a big advantage for handling high-dimensional data. Then, with the inner product as the metric to measure the adjacency weight between data points, we design the adjacency matrix to be a low-rank form [33], [39]

W = Z^T Z,   (10)

which means that if two data points are correlated (W_{ij} > 0), they share at least one common anchor point; otherwise W_{ij} = 0. By sharing the same anchors, data points have similar semantic concepts with high probability, as intended. Thus, our design helps to explore the semantic relationships in the data.

This formula naturally preserves two good properties of W: sparseness and nonnegativeness. The highly sparse matrix Z makes W sparse, which is consistent with the observation that most points in a graph have only a small number of edges to other points. The nonnegativity makes the adjacency weights more meaningful: in real-world data, the relationship between two items is always positive or zero, but not negative. Moreover, a nonnegative W guarantees the positive semidefiniteness of the graph Laplacian in many graph-based algorithms [33].
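A tiny numeric example of this design (a toy Z of our own choosing) shows these properties directly:

```python
import numpy as np

# Toy Z: 3 anchors, 5 data points, each point tied to at most 2 anchors.
Z = np.array([[0.7, 0.0, 0.0, 0.5, 0.0],
              [0.3, 1.0, 0.0, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0, 1.0]])
W = Z.T @ Z
# W is nonnegative and symmetric, and W[i, j] > 0 only when points i and j
# share at least one anchor (e.g., points 2 and 4 both use anchor 2,
# while points 0 and 2 share none).
```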

4.2 Efficient Ranking Computation

After graph construction, the main computational cost of manifold ranking is the matrix inversion in equation (2), whose complexity is O(n³), so the data size n cannot be too large. Although we can use the iteration algorithm, it is still inefficient for large-scale cases.

One may argue that the matrix inversion can be done offline, so it is not a problem for online search. However, offline calculation can only handle the case when the query is already in the graph (an in-sample). If the query is not in the graph (an out-of-sample), for the exact graph structure we have to update the whole graph to add the new query and compute the matrix inversion in equation (2) again. Thus, offline computation does not work for an out-of-sample query. In practice, for a real CBIR system, the user's query is almost always an out-of-sample.

With the form W = Z^T Z, we can rewrite equation (2), the main step of manifold ranking, by the Woodbury formula as follows. Let H = Z D^{-1/2} and S = H^T H; then the final ranking function r can be directly computed by

r^* = (I_n - \alpha H^T H)^{-1} y = \Big( I_n - H^T \big( H H^T - \tfrac{1}{\alpha} I_d \big)^{-1} H \Big) y.   (11)

By equation (11), the inversion part (which takes the most computational cost) changes from an n × n matrix to a d × d matrix. If d ≪ n, this change can significantly speed up the calculation of manifold ranking. Thus, applying our proposed method to a real-time retrieval system is viable, which the original manifold ranking cannot achieve.

During the computation process, we never use the adjacency matrix W. So we do not store the matrix W in memory,



but store the matrix Z instead. In equation (11), D is a diagonal matrix with D_{ii} = \sum_{j=1}^{n} w_{ij}. When W = Z^T Z,

D_{ii} = \sum_{j=1}^{n} z_i^T z_j = z_i^T v,   (12)

where z_i is the ith column of Z and v = \sum_{j=1}^{n} z_j. Thus we get the matrix D without using W.

A useful trick for computing r^* in equation (11) is to evaluate it from right to left, so that we always multiply a matrix by a vector and avoid matrix-matrix multiplication. As a result, computing the ranking function with EMR has a complexity of O(dn + d³).

4.3 Complexity Analysis

In this subsection, we make a comprehensive complexity analysis of MR and EMR, including the computation cost and the storage cost. As mentioned, both MR and EMR have two stages: the graph construction stage and the ranking computation stage.

For the model of MR:

• MR builds a kNN graph, i.e., for each data sample, we need to calculate the relationships to its k nearest neighbors, so the computation cost is O(n² log k). At the same time, we save the adjacency matrix W ∈ R^{n×n} with a storage cost of O(kn), since W is sparse.

• In the ranking computation stage, the main step is to compute the matrix inversion in equation (2), which is approximately O(n³).

For the model of EMR:

• EMR builds an anchor graph, i.e., for each data sample, we calculate the relationships to its s nearest anchors, so the computation cost is O(nd log s). We use k-means to select the anchors, which costs O(Tdn), where T is the iteration number; but this selection step can be done offline and need not be updated frequently. At the same time, we save the sparse matrix Z ∈ R^{d×n} with a storage cost of O(sn).

• In the ranking computation stage, the main step is equation (11), which has a computational complexity of O(dn + d³).

As a result, EMR has a computational cost of O(dn) + O(d³) (ignoring s and T) and a storage cost of O(sn), while MR has a computational cost of O(n²) + O(n³) and a storage cost of O(kn). Obviously, when d ≪ n, EMR has a much lower computational cost than MR.

4.4 EMR for Content-Based Image Retrieval

In this part, we briefly summarize EMR applied to pure content-based image retrieval. To add more information, we simply extend the data features.

Fig. 1. Extending matrix W (MR) and Z (EMR) in the gray regions for an out-of-sample query.

First of all, we extract the low-level features of the images in the database and use them as the coordinates of data points in the graph (the low-level features are discussed further in Section 5). Secondly, we select representative points as anchors and construct the weight matrix Z with a small neighborhood size s. Anchors are selected off-line and do not affect the on-line process; for a stable data set, we do not need to update the anchors frequently. Finally, after the user specifies or uploads an image as a query, we extract its low-level features, update the weight matrix Z, and directly compute the ranking scores by equation (11). The images with the highest ranking scores are considered the most relevant and are returned to the user.
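The anchor-graph construction step above can be sketched as follows. The Gaussian-kernel weights here are an assumed stand-in for the kernel-regression weights the paper adopts; the function name `build_Z` and parameter `sigma` are ours:

```python
import numpy as np

def build_Z(X, anchors, s=5, sigma=1.0):
    """Sparse anchor-graph weights: link each sample to its s nearest
    anchors with Gaussian-kernel weights normalized to sum to one."""
    d_anchor, n = anchors.shape[0], X.shape[0]
    Z = np.zeros((d_anchor, n))
    # squared Euclidean distances, samples x anchors
    dist2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    for i in range(n):
        idx = np.argsort(dist2[i])[:s]               # s nearest anchors
        w = np.exp(-dist2[i, idx] / (2 * sigma ** 2))
        Z[idx, i] = w / w.sum()                      # local weights sum to 1
    return Z

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 8))                    # 100 samples, 8-dim features
anchors = X[rng.choice(100, 10, replace=False)]      # in practice: k-means centers
Z = build_Z(X, anchors, s=5)
assert Z.shape == (10, 100)
assert np.allclose(Z.sum(axis=0), 1.0)               # each column sums to 1
assert (np.count_nonzero(Z, axis=0) <= 5).all()      # at most s non-zeros per column
```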

4.5 Out-of-Sample Retrieval

For in-sample data retrieval, we can construct the graph and compute the matrix inversion part of equation (2) off-line. But for out-of-sample data, the situation is totally different. A big limitation of MR is that it is hard to handle a new-sample query. A fast strategy for MR is to leave the original graph unchanged and add a new row and a new column to W (left picture of Fig. 1). Although the new W is efficient to compute, it does not help the ranking process (Eq. (2)): computing Eq. (2) for each new query in the online stage is unacceptable due to its high computational cost.

In [40], the authors solve the out-of-sample problem by finding the nearest neighbors of the query and using those neighbors as query points. They do not add the query into the graph, so their database is static. However, their method may change the query's initial semantic meaning, and for a large database, the linear search for nearest neighbors is also costly.

In contrast, our model EMR can efficiently handle a new sample as a query for retrieval. In this subsection, we describe the light-weight computation of EMR for a new sample query. We want to emphasize that this is a big improvement over our previous conference version of this work, and it makes EMR scalable to large-scale image databases (e.g., 1 million samples). We show the algorithm as follows.

For one instant retrieval, it is unwise to update the whole graph or rebuild the anchors, especially for a large database. We believe one point has little effect on the stable anchors of a large data set (e.g., the cluster centers). For EMR, each data point (z_i) is computed independently, so we assign weights between the new query and its nearby anchors, forming a new column of Z (right picture of Fig. 1).

We use z_t to denote the new column. Then D_t = z_t^T v and h_t = z_t D_t^{-1/2}, where h_t is the new column of H. As we have described, the main step of EMR is Eq. (11). Our goal is to further speed up the computation of Eq. (11) for a new query. Let

$$C = \left( HH^T - \frac{1}{\alpha} I_d \right)^{-1} = \left( \sum_{i=1}^{n} h_i h_i^T - \frac{1}{\alpha} I_d \right)^{-1}, \qquad (13)$$


Fig. 2. COREL image samples randomly selected from the semantic concepts balloon, beach, and butterfly.

and the new C′, after adding the column h_t, is

$$C' = \left( \sum_{i=1}^{n} h_i h_i^T + h_t h_t^T - \frac{1}{\alpha} I_d \right)^{-1} \approx C \qquad (14)$$

when n is large and h_t is highly sparse. We can view the matrix C as the inverse of a covariance matrix. The above equation says that one single point will not affect the covariance matrix of a large database. That is to say, the computation of C can be done in the off-line stage.
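The approximation in Eq. (14) is easy to check numerically. The fragment below is our own illustration with arbitrary synthetic H and a hypothetical sparse column h_t, not data from the paper:

```python
import numpy as np

# For large n, adding one sparse column h_t perturbs
# C = (H H^T - (1/alpha) I_d)^{-1} only slightly.
rng = np.random.default_rng(2)
d, n, alpha = 10, 100_000, 0.99
H = np.abs(rng.standard_normal((d, n))) * 0.01   # synthetic non-negative H
h_t = np.zeros(d); h_t[:3] = 0.01                # sparse new column (s = 3)

A = H @ H.T - (1.0 / alpha) * np.eye(d)
C = np.linalg.inv(A)
C_new = np.linalg.inv(A + np.outer(h_t, h_t))    # C' after the rank-one update

rel_change = np.linalg.norm(C_new - C) / np.linalg.norm(C)
assert rel_change < 1e-3   # one point barely moves the "covariance" inverse
```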

The initial query vector y_t is

$$y_t = \begin{bmatrix} 0_n \\ 1 \end{bmatrix}, \qquad (15)$$

where 0_n is an n-length zero vector. We can rewrite Eq. (11) with the new query as

$$r_{(n+1)\times 1} = \left( I_{n+1} - \begin{bmatrix} H^T C \\ h_t^T C \end{bmatrix} \begin{bmatrix} H & h_t \end{bmatrix} \right) \begin{bmatrix} 0_n \\ 1 \end{bmatrix}. \qquad (16)$$

Our focus is the top n elements of r, which equal

$$r_{n\times 1} = -H^T C h_t = E h_t. \qquad (17)$$

The matrix E_{n×d} = −H^T C can be computed offline, i.e., in the online stage we only need to multiply an n × d matrix by a d × 1 vector. As h_t is sparse (with, say, s non-zero elements), the essential computation is to select s columns of E according to h_t and perform a weighted summation. As a result, we need sn scalar multiplications and (s − 1)n scalar additions to get the ranking scores (r_{n×1}) over the database samples, while a linear scan using Euclidean distance needs mn scalar subtractions, mn scalar multiplications, and (m − 1)n scalar additions. As s ≪ m, our model EMR is much faster than a linear scan using Euclidean distance in the online stage.
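The online step can be sketched as follows: E is precomputed offline, and the query's sparse column h_t reduces the product E h_t to a weighted sum of s columns. The helper names and the synthetic H are our own assumptions:

```python
import numpy as np

def offline(H, alpha=0.99):
    """Precompute E = -H^T C with C = (H H^T - (1/alpha) I_d)^{-1}."""
    d = H.shape[0]
    C = np.linalg.inv(H @ H.T - (1.0 / alpha) * np.eye(d))
    return -H.T @ C                       # n x d, computed once offline

def online_scores(E, anchor_idx, weights):
    """r = E h_t where h_t has s non-zeros: a weighted sum of s columns."""
    return E[:, anchor_idx] @ weights     # only s*n multiplications

rng = np.random.default_rng(3)
d, n, s = 20, 1000, 5
H = np.abs(rng.standard_normal((d, n))) * 0.01    # synthetic stand-in
E = offline(H)

anchor_idx = np.array([2, 7, 11, 13, 19])  # the query's s nearest anchors
weights = rng.random(s)                    # its anchor weights (entries of h_t)

r = online_scores(E, anchor_idx, weights)
h_t = np.zeros(d); h_t[anchor_idx] = weights
assert np.allclose(r, E @ h_t)             # identical to the dense product E h_t
assert r.shape == (n,)
```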

5 EXPERIMENTAL STUDY

In this section, we present experimental results and comparisons to evaluate the effectiveness and efficiency of our proposed method EMR on four real world databases: two middle-sized databases, COREL (5,000 images) and MNIST (70,000 images), and two large databases, SIFT1M (1 million SIFT descriptors) and ImageNet (1.2 million images). We use COREL and MNIST to compare the ranking performance, and SIFT1M and ImageNet to show the efficiency of EMR for out-of-sample retrieval. Our experiments are implemented in MATLAB and run on a computer with a 2.0 GHz (×2) CPU and 64 GB RAM.

TABLE 1
Statistics of the Four Databases

5.1 Experiments Setup

The COREL image data set is a subset of the COREL image database, consisting of 5,000 images. COREL is widely used in many CBIR works [2], [41], [42]. All of the images are from 50 different categories, with 100 images per category. Images in the same category belong to the same semantic concept, such as beach, bird, elephant and so on. That is to say, images from the same category are judged relevant, and otherwise irrelevant. We use each image as a query for testing the in-sample retrieval performance. In Fig. 2, we randomly select and show nine image samples from three different categories. In our experiments, we extract five kinds of effective features for the COREL database: Grid Color Moment, edge histogram, Gabor wavelets texture, local binary pattern, and GIST. As a result, an 809-dimensional vector is used for each image [43].

The MNIST database2 of handwritten digits has a set of 70,000 examples. The images were centered in a 28 × 28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28 × 28 field. We use the first 60,000 images as database images and the remaining 10,000 images as queries for testing the out-of-sample retrieval performance. The normalized gray-scale values of each pixel are used as image features.

The SIFT1M database contains one million SIFT features, and each feature is represented by a 128-dimensional vector.

ImageNet is an image database organized according to the WordNet noun hierarchy, in which each node of the hierarchy is depicted by hundreds to thousands of images3. We downloaded about 1.2 million images' BoW representations. A visual vocabulary of 1,000 visual words is adopted, i.e., each image is represented by a 1,000-length vector. Due to the complex structure of the database, the high diversity of images in each node, and the low quality of the simple BoW representation, the retrieval task is very hard.

We use the SIFT1M and ImageNet databases to evaluate the efficiency of EMR on large and high dimensional data. We randomly select 1,000 images as out-of-sample test queries for each. Some basic statistics of the four databases are listed in Table 1. For the COREL, MNIST and SIFT1M databases, the data samples have dense features, while for the ImageNet database, the data samples have sparse features.

5.1.1 Evaluation Metric Discussion

There are many measures to evaluate retrieval results, such as precision, recall, F measure, MAP and NDCG [44].

2. http://yann.lecun.com/exdb/mnist/
3. http://www.image-net.org/index


They are very useful for a real CBIR application, especially for a web application in which only the top returned images can attract user interest. Generally, the image retrieval results are displayed screen by screen; too many images on one screen confuse the user and noticeably degrade the experience. The images on the top pages attract the most interest and attention from the user, so the precision at K metric is significant for evaluating image retrieval performance.

MAP (Mean Average Precision) provides a single-figure measure of quality across recall levels, and has been shown to have especially good discriminative power and stability. For a single query, Average Precision is the average of the precision values obtained for the set of top k items after each relevant item is retrieved, and this value is then averaged over all queries [44]. That is, if the set of relevant items for a query q_j ∈ Q is {d_1, . . . , d_{m_j}} and R_{jk} is the set of ranked retrieval results from the top result until item d_k, then

$$\mathrm{MAP}(Q) = \frac{1}{|Q|} \sum_{j=1}^{|Q|} \frac{1}{m_j} \sum_{k=1}^{m_j} \mathrm{Precision}(R_{jk}). \qquad (18)$$
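For reference, Eq. (18) can be computed from binary relevance lists as follows. This is a standard implementation sketch; here the normalizer is the number of relevant items present in the list, which equals m_j when the full ranking is scored:

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP as in Eq. (18): average the precision@k taken at each rank k
    where a relevant item appears; ranked_relevance is 0/1 down the list."""
    rel = np.asarray(ranked_relevance, dtype=float)
    hits = np.cumsum(rel)
    precision_at_k = hits / (np.arange(len(rel)) + 1)
    if rel.sum() == 0:
        return 0.0
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(runs):
    """MAP: average AP over all queries."""
    return float(np.mean([average_precision(r) for r in runs]))

# relevant items at ranks 1, 3, 5 -> AP = (1/1 + 2/3 + 3/5) / 3
ap = average_precision([1, 0, 1, 0, 1])
assert abs(ap - (1 + 2/3 + 3/5) / 3) < 1e-9
```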

NDCG is a widely used metric for evaluating a ranked list [44]. NDCG@K is defined as:

$$\mathrm{NDCG@}K = \frac{1}{\mathrm{IDCG}} \sum_{i=1}^{K} \frac{2^{r_i} - 1}{\log_2(i+1)}, \qquad (19)$$

where r_i is 1 if the item at position i is relevant and 0 otherwise. IDCG is chosen so that the perfect ranking has an NDCG value of 1.
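A matching sketch of NDCG@K as defined in Eq. (19), with binary relevance grades:

```python
import numpy as np

def ndcg_at_k(relevance, k):
    """NDCG@K as in Eq. (19) with binary gains r_i in {0, 1}:
    DCG = sum over the top K of (2^r_i - 1) / log2(i + 1),
    normalized by the ideal DCG of the same relevance multiset."""
    rel = np.asarray(relevance, dtype=float)[:k]
    discounts = np.log2(np.arange(2, len(rel) + 2))   # log2(i + 1), i = 1..K
    dcg = ((2 ** rel - 1) / discounts).sum()
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    idcg = ((2 ** ideal - 1) / discounts[:len(ideal)]).sum()
    return float(dcg / idcg) if idcg > 0 else 0.0

assert ndcg_at_k([1, 1, 1], 3) == 1.0                  # perfect ranking
assert ndcg_at_k([0, 1], 2) < ndcg_at_k([1, 0], 2)     # relevant item higher is better
```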

5.2 Experiments on COREL Database

The goal of EMR is to improve the speed of manifold ranking with acceptable ranking accuracy loss. We first compare our model EMR with the original manifold ranking (MR) and the fast manifold ranking (FMR [32]) algorithm on the COREL database. As both MR and FMR are designed for in-sample image retrieval, we use each image as a query and evaluate in-sample retrieval performance. More comparisons, including ranking with SVM, can be found in our previous conference version [13]. In this paper, we pay more attention to the trade-off between accuracy and speed for EMR with respect to MR, so we omit the other methods.

We first compare the methods without relevance feedback. Relevance feedback asks users to label some retrieved samples, which makes the retrieval procedure inconvenient; so, if possible, we prefer an algorithm with good performance without relevance feedback. In Section 5.2.4, we evaluate the performance of the methods after one round of relevance feedback. MR-like algorithms can handle relevance feedback very efficiently, by revising the initial score vector y.

5.2.1 Baseline Algorithms

Eud: the baseline method using Euclidean distance for ranking.

MR: the original manifold ranking algorithm, the most important comparison method. Our goal is to improve the speed of manifold ranking with acceptable ranking accuracy loss.

TABLE 2
Precision and Time Comparisons of Two Weight Estimation Methods

FMR: fast manifold ranking [32] first partitions the data into several parts (clustering) and computes the matrix inversion in a block-wise way. It uses the SVD technique, which is time consuming, so its computational bottleneck is transferred to the SVD. When the SVD is solved exactly, FMR equals MR; but FMR uses an approximate solution to speed up the computation. We use 10 clusters and calculate the approximation of the SVD with 10 singular values. Higher accuracy requires much more computational time.

5.2.2 Comparisons of Two Weight Estimation Methods for EMR

Before the main experiment comparing our algorithm EMR with the other models, we use a single experiment to decide which of the weight estimation methods described in Section 4.1.1 should be adopted. We record the average retrieval precision (each image is used as a query) and the computational time (seconds) of EMR with the two weight estimation methods in Table 2.

From the table, we see that the two methods give very close retrieval results. However, the projected gradient is much slower than kernel regression. In the rest of our experiments, we use the kernel regression method to estimate the local weights (computing Z).

5.2.3 Performance

An important issue needs to be emphasized: although we have the image labels (categories), we do not use them in our algorithm, since in real world applications labeling is very expensive. The label information is used only for evaluation and relevance feedback.

Each image is used as a query and the retrieval performance is averaged. Fig. 3 plots the average precision (at 20 to 80) of each method, and Table 3 records the average values of recall, F1 score, NDCG and MAP (MAP is evaluated only for the top-100 returns). For our method EMR, 1,000 anchors are used; later, in the model selection part, we find that using 500 anchors achieves close performance. It is easy to see that the performance of MR and EMR is very close, while FMR loses a little precision due to its SVD approximation. As EMR's goal is to improve the speed of manifold ranking with acceptable ranking accuracy loss, these results are not meant to show which method is better, but to show that the ranking performance of EMR is close to that of MR on COREL.

We also record the offline building time for MR, FMR and EMR in Table 3. For in-sample retrieval, all three methods have the same steps and cost, so we do not report it for COREL. We find that for a database with 5,000 images, all three methods have acceptable building times, and EMR is the most efficient. However, according to the analysis in Section 4.3, MR's computational cost is cubic in the database size while EMR's is linear in the database size; this is borne out by our experiments on the MNIST database.

Fig. 4. Precision at the top 10 returns of the three algorithms on each category of the COREL database.

The anchor points are computed off-line and do not affect the current on-line retrieval system. In [13], we tested different strategies for anchor point selection, including normal k-means, fast k-means and random anchors. The conclusion is that cost and performance are a trade-off in many situations.

To see the performance distribution over the whole data set more concretely, we plot the retrieval precision at the top 10 returns for all 50 categories in Fig. 4. As can be seen, the performance of each algorithm varies across categories. We find that EMR is fairly close to MR in almost every category, while for FMR the distribution is totally different.

5.2.4 Performance with Relevance Feedback

Relevance feedback [7] is a powerful interactive technique used to improve the performance of image retrieval systems. With user-provided relevant/irrelevant information on the retrieved images, the system can capture the semantic concept of the query more correctly and gradually improve the retrieval precision.

Fig. 3. Retrieval precision at top 20 to 80 returns of Eud (left), MR, FMR and EMR (right).

Applying relevance feedback to EMR (as well as MR and FMR) is extremely simple: we update the initial vector y and recompute the ranking scores. We use an automatic labeling strategy to simulate relevance feedback: for each query, the ground truth labels (relevant or irrelevant to the query) of the top 20 returns are used as relevance feedback. It is performed for one round, since users rarely have the patience for more. The retrieval performance is plotted in Fig. 5. With relevance feedback, MR, FMR and EMR achieve higher retrieval precision but still remain close to each other.

5.2.5 Model Selection

Model selection plays a key role in many machine learning methods. In some cases, the performance of an algorithm may vary drastically with different choices of the parameters, so we have to assess the quality of the parameters. In this subsection, we evaluate the performance of our method EMR with different values of the parameters.

There are three parameters in our method EMR: s, α, and d. Parameter s is the neighborhood size in the anchor graph; a small value of s keeps the weight matrix Z very sparse. Parameter α is the trade-off parameter in EMR and MR. Parameter d is the number of anchor points. For convenience, the parameter α is fixed at 0.99, consistent with the experiments performed in [9], [10], [12].

TABLE 3
Recall, F1, NDCG and MAP Values, as well as the Offline Building Time (Seconds) of MR, FMR and EMR

Fig. 5. Retrieval precision at top 20 to 80 returns of Eud (left), MR, FMR and EMR (right) after one round of relevance feedback.

Fig. 6 shows the performance of EMR (precision at 60) with k-means anchors at different values of s. We find that the performance of EMR is not sensitive to the selection of s when s > 3. A small s keeps the matrix Z highly sparse, which is helpful for efficient computation. In our experiments, we simply select s = 5.

Fig. 7 shows the performance of EMR versus different numbers of anchors in the whole data set. We find that the performance increases very slowly once the number of anchors exceeds 500 (approximately). In the previous experiments, we fixed the number of anchors to 1,000. Actually, a smaller number of anchors, such as 800 or 600, achieves close performance. With fewer anchors, the graph construction cost would be further reduced; but as COREL is not large, the saving is minor.

5.3 Experiments on MNIST Database

We also investigate the performance of our method EMR on the MNIST database. The samples are all gray digit images of size 28 × 28. We just use the gray value of each pixel to represent the images, i.e., each sample is represented by a 784-dimensional vector. The database is separated into 60,000 training samples and 10,000 testing samples, and the goal is to evaluate the performance on the testing data. Note that although it is called 'training data', a retrieval system never uses the given labels. All the ranking models use the training data itself to build their models and rank the samples according to the queries. A similar idea can be found in many unsupervised hashing algorithms [45], [46] for approximate and fast nearest neighbor search.

Fig. 6. Retrieval precision versus different values of parameter s. The dotted line represents MR performance.

Fig. 7. Retrieval precision versus different numbers of anchors. The dotted line represents MR performance.

With the MNIST database, we want to evaluate the efficiency and effectiveness of the model EMR. As we have mentioned, MR's cost is cubic in the database size, while EMR is much faster. We record the training time (building the model offline) of MR, FMR and EMR (1k anchors) in Table 4 as the database size increases step by step. The required time for MR and FMR increases very fast, and for the last two sizes their procedures run out of memory due to the inverse operation. The algorithm MR with the solution of Eq. (2) can hardly handle the size of MNIST. FMR performs even worse than MR, as it clusters the samples and computes a large SVD; it seems that FMR is only useful for small databases. However, EMR is much faster in this test: the time cost scales linearly, with 6 seconds for 10,000 samples and 35 seconds for 60,000 samples. We use the k-means algorithm with a maximum of 5 iterations to generate the anchor points; we find that running k-means for 5 iterations is good enough for anchor point selection.
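Anchor selection with a few Lloyd iterations can be sketched in plain NumPy (the paper runs k-means in MATLAB; the function below is an illustrative stand-in):

```python
import numpy as np

def kmeans_anchors(X, d_anchors, iters=5, seed=0):
    """A few Lloyd iterations suffice for anchor selection;
    exact convergence of k-means is unnecessary here."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), d_anchors, replace=False)].copy()
    for _ in range(iters):
        # assign each sample to its nearest center
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist2.argmin(axis=1)
        # move each center to the mean of its assigned samples
        for j in range(d_anchors):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 16))       # synthetic data, 16-dim features
anchors = kmeans_anchors(X, d_anchors=20)
assert anchors.shape == (20, 16)
```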

TABLE 4
Computational Time (s) for Offline Training of MR, FMR, and EMR (1k Anchors) on MNIST Database


Fig. 8. (a) MAP values with different numbers of anchors for EMR. (b) Offline training time of EMR with different numbers of anchors. (c) Online new query retrieval time of EMR with different numbers of anchors on MNIST.

5.3.1 Out-of-Sample Retrieval Test

In this section, we evaluate the response time of EMR when handling an out-of-sample (new) query. As MR's framework (as well as FMR's) is hard to adapt to out-of-sample queries and is too costly to train at the size of MNIST (Table 4), from now on we do not use MR and FMR as comparisons; instead, other methods that generate ranking scores (similarities or distances) should be compared. We use the following two baseline methods:

Eud: linear scan by Euclidean distance. This may be the simplest but most meaningful baseline for out-of-sample retrieval performance. Many previous fast nearest neighbor search or hashing-based algorithms were proposed to accelerate linear scan at some accuracy loss relative to Euclidean distance. Their goal differs from ranking: a ranking model assigns every sample a score, not only the neighbors.

LSH: locality sensitive hashing [45], a famous method for generating hashing codes. We use LSH to generate binary codes for both the training and testing samples, and then use the Hamming distance of a query to all database samples as the ranking metric. We use 128 bits and 256 bits as the code length of LSH.

In Fig. 8(a), we plot the MAP (top 200) values over all testing data for our model EMR with different numbers of anchor points. The performance of Eud and LSH is shown by three horizontal lines. We can see that, when more than 400 anchors are used, EMR outperforms the Euclidean distance metric significantly. LSH is worse than Eud due to its binary representation. We also record EMR's offline training time and online retrieval time in Fig. 8(b) and Fig. 8(c): the computational time for both the offline and online stages increases linearly with the number of anchors.

Then, in Table 5, we record the computational time (in seconds) and the out-of-sample retrieval performance of EMR (1,000 anchors), Eud, and LSH with 128 and 256 code lengths. The best performance in each line is in bold font. EMR and LSH-128 have similar online retrieval times, much faster than the linear scan of Eud – about 30 times faster. LSH has a very small training cost, as its hashing functions are randomly selected, while EMR needs more time to build the model. With the larger offline building cost, EMR achieves higher retrieval performance in terms of precision, NDCG at 100, and MAP; the offline cost is worthwhile. A number marked with ∗ is significantly higher than Eud at the 0.001 significance level.

5.3.2 Case Study

Fig. 9 shows an out-of-sample retrieval case, with Fig. 9(a) using Euclidean distance to measure the similarity, Fig. 9(b) using EMR with 400 anchors, and Fig. 9(c) using EMR with 600 anchors. Since the database structure is simple, we only need a small number of anchors to build the anchor graph. With 400 anchors we already obtain a good result (Fig. 9(b)), and with more anchors we get a better one. It is not hard to see that the results in Fig. 9(b) and (c) are all correct, but the quality of Fig. 9(c) is a little better: the digits are more similar to the query.

5.4 Experiments on Large Scale Databases

In our view, performance covers both efficiency and effectiveness. Since our method is designed to speed up the model 'manifold ranking', efficiency is the main point of this paper. The first several experiments show that our model is much faster than MR in both the offline training and the online retrieval processes, with only a small accuracy loss. The original MR model cannot be directly applied to a large data set, e.g., a data set with 1 million samples. Thus, to show the performance of our method on large data sets, we compare against many state-of-the-art hash-based fast nearest neighbor search algorithms (our ranking model can naturally do the work of nearest neighbor search) on the SIFT1M and ImageNet databases.

TABLE 5
Out-of-Sample Retrieval Time (s) and Retrieval Performance Comparisons of EMR (1k Anchors), Eud and LSH with 128 and 256 Code Lengths on MNIST Database
The best performance is in bold font. A number with ∗ is significantly higher than Eud at the 0.001 significance level.

Fig. 9. Top retrieved MNIST digits via (a) Euclidean distance, (b) EMR with 400 anchor points, and (c) EMR with 600 anchor points. The digit in the first line is a new query and the rest are the top returns.

For these two sets there are no exact labels, so we follow the criterion used in much previous fast nearest neighbor search work [46]: the ground-truth neighbors are obtained by brute force search, and we use the top 1 percent nearest neighbors as ground truth. We record the computational time (offline training and online retrieval) and the ranking performance in Tables 6 and 7. The offline time is for training, and the online time is for one query retrieval (averaged). We randomly select 1,000 images from each database as out-of-sample queries and evaluate the performance.

For comparison, some state-of-the-art hashing methods are used, including LSH, Spectral Hashing [46] and Spherical Hashing (a recently proposed method [47]). For EMR, we select 10% of the database samples to run the k-means algorithm with a maximum of 5 iterations, which is very fast. In the online stage, the Hamming distances between the query sample and the database samples are calculated for LSH, Spectral Hashing and Spherical Hashing, and then the distances are sorted; for our method, we directly compute the scores via Eq. (17) and sort them. If we adopted any filtering strategy to reduce the number of candidate samples, the computational cost of each method would be reduced equally, so we only compare the largest computational cost (brute force search). We adopt 64-bit binary codes for SIFT1M and 128-bit codes for ImageNet for all the hashing methods.
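As an illustration of the LSH baseline, the common random-hyperplane construction generates binary codes and ranks by Hamming distance; the exact LSH variant used in [45] may differ, so treat this as an assumed sketch:

```python
import numpy as np

def lsh_codes(X, n_bits, seed=0):
    """Random-hyperplane LSH: one random projection per bit,
    bit = 1 iff the projection is positive."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))
    return (X @ planes > 0).astype(np.uint8)   # n x n_bits binary matrix

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists), dists

rng = np.random.default_rng(5)
db = rng.standard_normal((1000, 128))          # synthetic 128-dim database
codes = lsh_codes(db, n_bits=64)
q_code = lsh_codes(db[:1], n_bits=64)          # query = first database item
order, dists = hamming_rank(q_code[0], codes)
assert dists[order[0]] == 0                    # the item itself is at distance 0
```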

From Tables 6 and 7, we find that EMR has a comparable online query cost and a high nearest neighbor search accuracy, especially on the high dimensional ImageNet data set, showing its good performance.

TABLE 6
Computational Time (s) and Retrieval Performance Comparison of EMR (1k Anchors), LSH and Spherical Hashing on SIFT1M Database (1 Million Samples, 128-Dim)

5.5 Algorithm Analysis

From the comprehensive experimental results above, we conclude that our algorithm EMR is effective and efficient. It is appropriate for CBIR since it is friendly to new queries. A core point of the algorithm is the selection of anchor points. Two issues should be discussed further: the quality and the number of anchors. Obviously, our goal is to select fewer anchors with higher quality. We discuss them as follows:

• How to select good anchor points? This is an open question. In our method, we use k-means cluster centers as anchors, so any faster or better clustering method would help the selection; there is a trade-off between selection speed and precision. However, the k-means centers are not perfect: some clusters are very close to each other while some clusters are very small. There is still much room for improvement.

• How many anchor points do we need? There is no standard answer, but our experiments provide some clues: the SIFT1M and ImageNet databases are larger than COREL, yet they need a similar number of anchors to achieve acceptable results, i.e., the required number of anchors is not proportional to the database size. This is important; otherwise EMR would be less useful. The number of anchors is determined by the intrinsic cluster structure.

6 CONCLUSION

In this paper, we propose the Efficient Manifold Ranking algorithm, which extends the original manifold ranking to handle large scale databases. EMR addresses the shortcomings of the original manifold ranking from two perspectives: the first is scalable graph construction, and the second is efficient ranking computation, especially for out-of-sample retrieval. Experimental results demonstrate that EMR is feasible for large scale image retrieval systems – it significantly reduces the computational time.

TABLE 7
Computational Time (s) and Retrieval Performance Comparison of EMR (1k Anchors), LSH and Spherical Hashing on ImageNet Database (1.2 Million Samples, 1k-Dim)

ACKNOWLEDGMENTS

This work was supported in part by the National Natural Science Foundation of China under Grants 61125203, 91120302, 61173186, 61222207, and 61173185, and in part by the National Basic Research Program of China (973 Program) under Grant 2012CB316400, the Fundamental Research Funds for the Central Universities, the Program for New Century Excellent Talents in University (NCET-09-0685), Zhejiang Provincial Natural Science Foundation under Grant Y1101043, and the Foundation of Zhejiang Provincial Educational Department under Grant Y201018240.

REFERENCES

[1] R. C. Veltkamp and M. Tanase, “Content-based image retrievalsystems: A survey,” Dept. Computing Science, Utrecht University,Utrecht, The Netherlands, Tech. Rep. UU-CS-2000-34, 2000.

[2] Y. Liu, D. Zhang, G. Lu, and W. Ma, “A survey of content-basedimage retrieval with high-level semantics,” Pattern Recognit.,vol. 40, no. 1, pp. 262–282, 2007.

[3] R. Datta, D. Joshi, J. Li, and J. Wang, “Image retrieval: Ideas,influences, and trends of the new age,” ACM CSUR, vol. 40, no. 2,pp. 1–60, 2008.

[4] T. Ojala, M. Pietikäinen, and D. Harwood, “A comparativestudy of texture measures with classification based on featureddistributions,” Pattern Recognit., vol. 29, no. 1, pp. 51–59, 1996.

[5] D. Lowe, “Object recognition from local scale-invariant features,”in Proc. 7th IEEE ICCV, Kerkyra, Greece, 1999, p. 1150.

[6] B. Li, E. Chang, and C. Wu, “DPF—A perceptual distance functionfor image retrieval,” in Proc. Int. Conf. Image Process., vol. 2. 2002,pp. 597–600.

[7] Y. Rui, T. Huang, M. Ortega, and S. Mehrotra, “Relevance feed-back: A power tool for interactive content-based image retrieval,”IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 5, pp. 644–655,Sept. 2002.

[8] S. Roweis and L. Saul, “Nonlinear dimensionality reduction bylocally linear embedding,” Sci., vol. 290, no. 5500, p. 2323, 2000.

[9] D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf, “Learning with local and global consistency,” in Proc. Adv. NIPS, 2004, pp. 595–602.

[10] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schölkopf, “Ranking on data manifolds,” in Proc. Adv. NIPS, vol. 16, Vancouver, BC, Canada, 2004, pp. 169–176.

[11] X. Wan, J. Yang, and J. Xiao, “Manifold-ranking based topic-focused multi-document summarization,” in Proc. 20th IJCAI, vol. 7, San Francisco, CA, USA, 2007, pp. 2903–2908.

[12] J. He, M. Li, H. Zhang, H. Tong, and C. Zhang, “Manifold-ranking based image retrieval,” in Proc. 12th Annu. ACM Int. Conf. Multimedia, New York, NY, USA, 2004, pp. 9–16.

[13] B. Xu et al., “Efficient manifold ranking for image retrieval,” in Proc. 34th Int. ACM SIGIR, New York, NY, USA, 2011, pp. 525–534.

[14] X. Yuan, X. Hua, M. Wang, and X. Wu, “Manifold-ranking based video concept detection on large database and feature pool,” in Proc. 14th Annu. ACM Int. Conf. Multimedia, New York, NY, USA, 2006, pp. 623–626.

[15] J. Ponte and W. Croft, “A language modeling approach to information retrieval,” in Proc. 21st Annu. Int. ACM SIGIR, Melbourne, VIC, Australia, 1998, pp. 275–281.

[16] S. Brin and L. Page, “The anatomy of a large-scale hypertextual Web search engine,” Comput. Netw. ISDN Syst., vol. 30, no. 1–7, pp. 107–117, 1998.

[17] J. Kleinberg, “Authoritative sources in a hyperlinked environment,” J. ACM, vol. 46, no. 5, pp. 604–632, 1999.

[18] J. Jeon, V. Lavrenko, and R. Manmatha, “Automatic image annotation and retrieval using cross-media relevance models,” in Proc. 26th Annu. Int. ACM SIGIR, Toronto, ON, Canada, 2003, pp. 119–126.

[19] M. Tsai, T. Liu, T. Qin, H. Chen, and W. Ma, “FRank: A ranking method with fidelity loss,” in Proc. 30th Annu. Int. ACM SIGIR, Amsterdam, The Netherlands, 2007, pp. 383–390.

[20] W. Gao, P. Cai, K. Wong, and A. Zhou, “Learning to rank only using training data from related domain,” in Proc. 33rd Int. ACM SIGIR, New York, NY, USA, 2010, pp. 162–169.

[21] L. Wang, J. Lin, and D. Metzler, “Learning to efficiently rank,” in Proc. 33rd Int. ACM SIGIR, New York, NY, USA, 2010, pp. 138–145.

[22] X. He, D. Cai, J. Wen, W. Ma, and H. Zhang, “Clustering and searching WWW images using link and page layout analysis,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 3, no. 2, Article 10, 2007.

[23] D. Cai, X. He, W. Ma, J. Wen, and H. Zhang, “Organizing WWW images based on the analysis of page layout and web link structure,” in Proc. IEEE ICME, vol. 1, 2004, pp. 113–116.

[24] X. He, D. Cai, J.-R. Wen, W.-Y. Ma, and H.-J. Zhang, “Clustering and searching WWW images using link and page layout analysis,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 3, no. 1, Article 10, 2007.

[25] D. Liu, X. Hua, L. Yang, M. Wang, and H. Zhang, “Tag ranking,” in Proc. 18th Int. Conf. WWW, Madrid, Spain, 2009, pp. 351–360.

[26] Z. Guan, J. Bu, Q. Mei, C. Chen, and C. Wang, “Personalized tag recommendation using graph-based ranking on multi-type interrelated objects,” in Proc. 32nd Int. ACM SIGIR, Boston, MA, USA, 2009, pp. 540–547.

[27] J. Bu et al., “Music recommendation by unified hypergraph: Combining social media information and music content,” in Proc. 18th Annu. ACM Int. Conf. Multimedia, Firenze, Italy, 2010, pp. 391–400.

[28] D. Cai, X. He, and J. Han, “Using graph model for face analysis,” Comput. Sci. Dept., UIUC, Champaign, IL, USA, Tech. Rep. UIUCDCS-R-2005-2636, 2005.

[29] W. Liu, J. Wang, and S. Chang, “Robust and scalable graph-based semisupervised learning,” Proc. IEEE, vol. 100, no. 9, pp. 2624–2638, Sept. 2012.

[30] S. Agarwal, “Ranking on graph data,” in Proc. 23rd Int. Conf. Mach. Learn., New York, NY, USA, 2006, pp. 25–32.

[31] J. Yu, D. Tao, and M. Wang, “Adaptive hypergraph learning and its application in image classification,” IEEE Trans. Image Process., vol. 21, no. 7, pp. 3262–3272, Jul. 2012.

[32] R. He, Y. Zhu, and W. Zhan, “Fast manifold-ranking for content-based image retrieval,” in Proc. ISECS Int. Coll. CCCM, Sanya, China, 2009.

[33] W. Liu, J. He, and S. Chang, “Large graph construction for scalable semi-supervised learning,” in Proc. 27th Int. Conf. Mach. Learn., Haifa, Israel, 2010, pp. 679–686.

[34] K. Zhang, J. Kwok, and B. Parvin, “Prototype vector machine for large scale semi-supervised learning,” in Proc. 26th Annu. Int. Conf. Mach. Learn., Montreal, QC, Canada, 2009, pp. 1233–1240.

[35] K. Yu, T. Zhang, and Y. Gong, “Nonlinear learning using local coordinate coding,” in Proc. Adv. NIPS, 2009.

[36] S. Tong and E. Chang, “Support vector machine active learning for image retrieval,” in Proc. 9th ACM Int. Conf. Multimedia, Ottawa, ON, Canada, 2001, pp. 107–118.

[37] K. Yu, J. Bi, and V. Tresp, “Active learning via transductive experimental design,” in Proc. 23rd Int. Conf. Mach. Learn., Pittsburgh, PA, USA, 2006, pp. 1081–1088.

[38] T. Kanungo et al., “An efficient k-means clustering algorithm: Analysis and implementation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 881–892, Jul. 2002.

[39] J. Chen, J. Liu, and J. Ye, “Learning incoherent sparse and low-rank patterns from multiple tasks,” ACM Trans. Knowl. Discov. Data, vol. 5, no. 4, Article 22, 2012.

[40] J. He, M. Li, H. Zhang, H. Tong, and C. Zhang, “Generalized manifold-ranking-based image retrieval,” IEEE Trans. Image Process., vol. 15, no. 10, pp. 3170–3177, Oct. 2006.

[41] H. Yu, M. Li, H. Zhang, and J. Feng, “Color texture moments for content-based image retrieval,” in Proc. Int. Conf. Image Process., vol. 3, 2002, pp. 929–932.

[42] X. He, D. Cai, and J. Han, “Learning a maximum margin subspace for image retrieval,” IEEE Trans. Knowl. Data Eng., vol. 20, no. 2, pp. 189–201, Feb. 2008.

[43] J. Zhu, S. Hoi, M. Lyu, and S. Yan, “Near-duplicate keyframe retrieval by nonrigid image matching,” in Proc. 16th ACM Int. Conf. Multimedia, Vancouver, BC, Canada, 2008, pp. 41–50.

[44] C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. New York, NY, USA: Cambridge University Press, 2008.

[45] A. Gionis, P. Indyk, and R. Motwani, “Similarity search in high dimensions via hashing,” in Proc. Int. Conf. VLDB, Edinburgh, U.K., 1999, pp. 518–529.

[46] Y. Weiss, A. Torralba, and R. Fergus, “Spectral hashing,” in Proc. Adv. NIPS, 2008.

[47] J.-P. Heo, Y. Lee, J. He, S.-F. Chang, and S.-E. Yoon, “Spherical hashing,” in Proc. IEEE Conf. CVPR, Providence, RI, USA, 2012, pp. 2957–2964.

Bin Xu received the B.S. degree in computer science from Zhejiang University, China, in 2009. He is currently a candidate for the Ph.D. degree in computer science at Zhejiang University. His current research interests include information retrieval, recommender systems, and machine learning. He is a student member of the IEEE.

Jiajun Bu received the B.S. and Ph.D. degrees in computer science from Zhejiang University, China, in 1995 and 2000, respectively. He is a Professor in the College of Computer Science, Zhejiang University. His current research interests include embedded systems, data mining, information retrieval, and mobile databases. He is a member of the IEEE.

Chun Chen received the B.S. degree in mathematics from Xiamen University, China, in 1981, and the M.S. and Ph.D. degrees in computer science from Zhejiang University, China, in 1984 and 1990, respectively. He is a Professor in the College of Computer Science, Zhejiang University. His current research interests include information retrieval, data mining, computer vision, computer graphics, and embedded technology. He is a member of the IEEE.

Can Wang received the B.S. degree in economics, and the M.S. and Ph.D. degrees in computer science from Zhejiang University, China, in 1995, 2003, and 2009, respectively. He is currently a Faculty Member in the College of Computer Science at Zhejiang University. His current research interests include information retrieval, data mining, and machine learning. He is a member of the IEEE.

Deng Cai is a Professor in the State Key Lab of CAD&CG, College of Computer Science, Zhejiang University, China. He received the Ph.D. degree in computer science from the University of Illinois at Urbana-Champaign, IL, USA, in 2009. Before that, he received the bachelor's and master's degrees from Tsinghua University, China, in 2000 and 2003, respectively, both in automation. His current research interests include machine learning, data mining, and information retrieval. He is a member of the IEEE.

Xiaofei He received the B.S. degree in computer science from Zhejiang University, China, in 2000, and the Ph.D. degree in computer science from the University of Chicago, IL, USA, in 2005. He is a Professor in the State Key Lab of CAD&CG at Zhejiang University. Prior to joining Zhejiang University in 2007, he was a Research Scientist at Yahoo! Research Labs, Burbank, CA, USA. His current research interests include machine learning, information retrieval, and computer vision. He is a senior member of the IEEE.
