IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 2, FEBRUARY 2009 273

JustClick: Personalized Image Recommendation via Exploratory Search From Large-Scale Flickr Images

Jianping Fan, Daniel A. Keim, Yuli Gao, Hangzai Luo, and Zongmin Li

Abstract—In this paper, we have developed a novel framework called JustClick to enable personalized image recommendation via exploratory search from large-scale collections of Flickr images. First, a topic network is automatically generated to summarize large-scale collections of Flickr images at a semantic level. Hyperbolic visualization is further used to enable interactive navigation and exploration of the topic network, so that users can gain insights into large-scale image collections at first glance, build up their mental query models interactively, and specify their queries (i.e., image needs) more precisely by selecting image topics on the topic network directly. Thus, our personalized query recommendation framework can effectively address both the problem of query formulation and the problem of vocabulary discrepancy and null returns. Second, a small set of the most representative images is recommended for a given image topic according to their representativeness scores. Kernel principal component analysis and hyperbolic visualization are seamlessly integrated to organize and lay out the recommended images (i.e., the most representative images) according to their nonlinear visual similarity contexts, so that users can interactively assess the relevance between the recommended images and their real query intentions. An interactive interface is implemented to allow users to express their time-varying query intentions precisely and to direct our JustClick system to more relevant images according to their personal preferences. Our experiments on large-scale collections of Flickr images show very positive results.

Index Terms—Personalized image recommendation, similarity-based image exploration, topic network.

I. INTRODUCTION

THE LAST FEW years have witnessed enormous growth in digital cameras and online high-quality digital images; thus, there is an increasing need for new techniques to support more effective image search. Because keywords are more intuitive for users to specify their image needs, keyword-based image retrieval approaches are now becoming more popular

Manuscript received August 09, 2007; revised January 17, 2008. First published December 09, 2008; current version published January 30, 2009. This paper was recommended by Associate Editor L. Guan.

J. Fan is with the Department of Computer Science, University of North Carolina, Charlotte, NC 28223 USA (e-mail: [email protected]).

D. A. Keim is with the Computer Science Institute, University of Konstanz, Konstanz D-78457, Germany (e-mail: [email protected]).

Y. Gao was with the University of North Carolina, Charlotte, NC 28223 USA. He is now with HP Labs, Palo Alto, CA 94304 USA (e-mail: [email protected]).

H. Luo was with the University of North Carolina, Charlotte, NC 28223 USA. He is now with East China Normal University, Shanghai, China (e-mail: [email protected]).

Z. Li was with the University of North Carolina, Charlotte, NC 28223 USA. He is now with the Department of Computer Science, China University of Petroleum, Dongyong, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2008.2009258

than traditional content-based image retrieval (CBIR) ones [1]. However, there are at least three main obstacles to supporting keyword-based image retrieval. 1) Automatic annotation of large sets of images with unconstrained contents and capturing conditions is still an ongoing research challenge because of the semantic gap [1], [6]. 2) For the same keyword-based query, different users may look for different types of images with various visual properties, and a few keywords for query formulation cannot capture the users’ query intentions effectively and efficiently [2]–[4]. Thus, most existing CBIR systems tend to return the same set of images to all users (i.e., one size fits all), and users may seriously suffer from the problem of information overload [2]. 3) Users may not be able to find the most suitable keywords to formulate their image needs precisely, or they may not even know what to look for (i.e., “I don’t know what I am looking for, but I’ll know when I find it”) [2]–[5]. In addition, there may be a vocabulary discrepancy between the keywords users choose to formulate their queries and the text terms used for image annotation, and such a discrepancy may result in null returns for mismatching queries. Thus, users may seriously suffer from both the problem of query formulation and the problem of vocabulary discrepancy and null returns. The keywords for image annotation may not describe the rich details of the visual contents of images sufficiently; thus, most existing keyword-based image retrieval systems do not allow users to look for particular images according to their visual properties.

Even though keywords are more intuitive for users to formulate their queries (i.e., image needs), a few keywords cannot describe the users’ real query intentions effectively and efficiently; thus, there is an uncertainty between the keywords for query formulation and the users’ real query intentions. In addition, the users’ query intentions may vary as they observe images over time, and such dynamics cannot be characterized precisely by using pre-learned user profiles. The huge number of online users and the uncertainties of the image retrieval environment (i.e., the dynamic nature of the users’ interests and the huge diversity of image semantics) also make it extremely difficult to learn user profiles adaptively. Therefore, it is very important to develop new algorithms that can capture the users’ dynamic query intentions implicitly and support personalized image recommendation.

It is important to understand that the system alone cannot meet the users’ sophisticated image needs effectively; image search requires greater interactivity between users and systems [2]–[5]. Thus, there is an urgent need to enhance the system’s ability to allow users to communicate their image needs more precisely and express their time-varying query interests effectively. Visualization and interactive image exploration can offer good opportunities for allowing users to

1051-8215/$25.00 © 2009 IEEE


add their background knowledge, powerful pattern recognition capability, and inference skills to direct CBIR systems to find more relevant images according to their personal preferences [3]–[5]. Such a user-system interaction and exploration process can bring the system’s perspective of the image collections and the users’ perspective of their image needs together. Thus, the interactive user-system interface should aid users in understanding and expressing their image needs easily and precisely. Unfortunately, designing interactive user interfaces for CBIR systems has not received enough attention [4], [5].

From the users’ point of view, such interactive user interfaces should allow them to: 1) communicate their image needs easily and precisely; 2) express their time-varying query interests precisely for directing the system to find more relevant images according to their personal preferences; 3) explore large amounts of returned images interactively according to their inherent visual similarity contexts for relevance assessment; and 4) track and visualize their access paths (query contexts) and recommend the most relevant image topics or the most representative images for the next search.

From the system’s point of view, such interactive user interfaces should be able to: 1) disclose a good global overview (i.e., a big picture) of large-scale image collections to assist users in making better query decisions; 2) visualize large amounts of returned images according to their visual similarity contexts to assist users in interactive image exploration and relevance assessment; 3) capture the users’ time-varying query intentions implicitly and integrate them to find more relevant images according to the users’ personal preferences; and 4) provide a good environment for integrating the users’ background knowledge and pattern recognition capacity to bridge the semantic gap in the loop of image retrieval.

Based on these observations, we have developed a novel framework called JustClick to achieve personalized image recommendation via exploratory search. Our research focuses on large-scale collections of Flickr images to bypass the semantic gap problem in automatic image annotation. In addition, an interactive user-system interface is designed to integrate the users’ background knowledge and pattern recognition capability to bridge the semantic gap in the loop of image retrieval. This paper is organized as follows. Section II briefly reviews existing work on tackling these three obstacles. Section III introduces our new scheme for incorporating a topic network and hyperbolic visualization to achieve personalized query recommendation (addressing both the problem of query formulation and the problem of vocabulary discrepancy and null returns). Section IV describes our new algorithm for integrating visualization and interactive image exploration to capture the users’ time-varying query intentions implicitly and achieve user-adaptive image recommendation (addressing the problems of time-varying query intentions and information overload). Section V summarizes our evaluation of the algorithms and the system. We conclude in Section VI.

II. RELATED WORK

To tackle the first obstacle (i.e., bridging the semantic gap for image annotation), automatic image annotation via semantic classification has recently received substantial attention [7]–[11]. Unfortunately, supporting automatic annotation of large-scale image collections with unconstrained contents and capturing conditions is still an ongoing research challenge [7]–[11]. Therefore, two alternative approaches are widely used for supporting keyword-based retrieval of large-scale online image collections: 1) the Google image search engine indexes large-scale online image collections through text terms that are automatically extracted from the associated text documents, image file names, or URLs; 2) the Flickr image search engine indexes large-scale collections of shared images through tags that are manually provided by numerous online users. These keyword-based image search engines have offered many good opportunities to the CBIR community while raising many new challenges.

The two commercial image search engines Google and Flickr have achieved significant progress in supporting keyword-based retrieval of large-scale online image collections by using manual image tags or the associated text terms, but their performance (accuracy, efficiency, and effectiveness) is still not acceptable, for the following reasons. 1) The file names, URLs, and associated text terms may not correspond exactly to the semantics of the images; thus, the Google image search engine returns large amounts of junk images [22]–[24]. 2) Different users may use various keywords with ambiguous word senses to annotate images, and a single keyword may have multiple word senses; thus, the Flickr image search engine may return inconsistent or incomplete results. In addition, there may be a vocabulary discrepancy between the keywords users choose to formulate their queries and the tags used for manual image annotation, and such a discrepancy may result in null returns for mismatching queries. Thus, using only the manual annotations for image retrieval may fall far short of people’s expectations. 3) The visual contents of the images are completely ignored; thus, neither the Google nor the Flickr image search engine allows users to look for particular images of interest according to their visual properties. However, there is some evidence that visual properties are very important when people search for images [1]. Unfortunately, keywords may not be expressive enough to describe the rich details of the visual contents of images sufficiently. Even though the low-level visual features may not carry the semantics of image contents directly [1], [6], they can definitely be used to enhance the users’ ability to find particular images according to their inherent visual similarity contexts. 4) A few keywords for query formulation may not capture the users’ real query intentions effectively; thus, users may seriously suffer from the problem of information overload.

Some pioneering work has been done on incorporating relevance feedback to bridge the semantic gap in the loop of image retrieval [25], [26]. Unfortunately, most existing relevance feedback approaches may place a huge burden on users, and a limited number of labeled images may not be representative enough for learning an accurate prediction model to categorize large amounts of unseen images precisely.

To tackle the second obstacle (i.e., the problems of time-varying query intentions and information overload), personalized information retrieval can be treated as one potential solution, and there are two well-accepted approaches [21]: content-based filtering and collaborative filtering. Unfortunately, neither of them can be directly extended to support personalized image recommendation because of the


huge diversity and the time-varying properties of the users’ preferences; it is very hard, if not impossible, to learn the users’ preferences precisely from a few relevance judgments.

The interfaces of most existing CBIR systems are designed for users to assess the relevance between the returned images and their real query intentions via page-by-page browsing. Such page-by-page browsing may seriously suffer from the following problems. 1) It does not scale with the number of images: many pages are needed to display the large number of images returned by a single keyword-based query, so it is very tedious for users to look for particular images of interest through page-by-page browsing. 2) Because the visual similarity contexts among the returned images are completely ignored in image ranking, visually similar images may be separated into different pages, and each page may lead users to new image contents. Such inter-page visual disconnection may divert the users’ attention from their current query contexts and make it very difficult for users to compare the diverse visual similarities between the returned images. Thus, users cannot assess the relevance between the returned images and their real query intentions effectively. Rather than displaying the returned images page by page, more interactive user-system interfaces should be developed to allow users to explore large amounts of returned images according to their inherent visual similarity contexts, so that users can discover new knowledge from large-scale image collections via exploratory search [3].

Based on these observations, some pioneering work has incorporated visualization to support interactive image navigation and exploration [11]–[14]. Unfortunately, these existing similarity-based image visualization techniques cannot work on large-scale image collections because they may seriously suffer from the following problems: 1) they do not scale with the number of images because of the overlapping problem; 2) they do not scale with the diversity of image semantics because of the shortage of effective techniques for semantic image summarization; 3) the underlying similarity functions may not characterize the diverse visual similarities between images accurately, and the visual similarity structures between images could be nonlinear; 4) most existing techniques for image projection, such as MDS and PCA, may suffer from low convergence rates, get stuck in local minima, or fail to exploit and preserve the nonlinear visual similarity structures between images precisely.

To tackle the third obstacle (i.e., the problem of query formulation and the problem of vocabulary discrepancy and null returns), Flickr provides a list of the most popular tags (i.e., a tag cloud) to assist users with query formulation. Unfortunately, the contextual relationships between image topics are completely ignored. Such inter-topic contextual relationships, which can give a good approximation of the interestingness of the image topics (much as PageRank characterizes the importance of web pages [20]), may play an important role in helping users make better query decisions. When the most relevant image topics are displayed together according to their association contexts, it is easier for users to come up with clearer query concepts. Therefore, it is very attractive to exploit both the image topics of interest and their association contexts to support personalized query recommendation, so that users can make better query decisions and formulate their image needs more precisely.

III. PERSONALIZED QUERY RECOMMENDATION

Every image-seeking process is necessarily initiated by an image need on the user’s side; thus, a successful CBIR system should allow users to obtain a good global overview of large-scale image collections quickly and communicate their image needs more effectively. In our JustClick system, we have developed two innovative techniques to support personalized query recommendation and assist users in communicating their image needs to the system more effectively. 1) A topic network is automatically generated to summarize large-scale collections of Flickr images at a semantic level. 2) Hyperbolic visualization is implemented to enable change of focus, so that users can gain significant insights into large-scale image collections via interactive topic network navigation and exploration.

Our topic network consists of two key components: popular image topics at Flickr and their inter-topic contextual relationships. We have developed an automatic scheme for generating such a topic network from large-scale collections of Flickr images. Each image at Flickr is associated with the users’ tags describing the underlying image semantics and the users’ rating scores. After the images and the associated tags are downloaded from Flickr.com, the text terms for image topic interpretation are identified automatically by using standard text analysis techniques.
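As a rough illustration, the tag-processing step can be as simple as normalizing and counting the user-provided tags; the paper does not detail its text analysis pipeline beyond calling it "standard techniques", so the function below is only a stand-in sketch:

```python
from collections import Counter

def topic_terms(tag_lists, min_count=2):
    """Identify candidate topic terms from per-image tag lists by
    normalizing each tag (trim, lowercase) and keeping the terms that
    occur at least min_count times.  A stand-in for the unspecified
    'standard text analysis techniques'; min_count is illustrative."""
    counts = Counter(
        tag.strip().lower()
        for tags in tag_lists
        for tag in tags
    )
    return {term for term, c in counts.items() if c >= min_count}
```

For example, `topic_terms([["Beach", "sunset"], ["beach", "Sea"], ["beach"]])` keeps only `"beach"`, since the other tags occur once after normalization.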

Concept ontology has recently been used to index and summarize large-scale image collections at the concept level by using a one-direction IS-A hierarchy (i.e., each concept node is linked with only its parent node in one direction) [7], [11]. However, the contextual relationships between the image topics at Flickr can be more complex than such a one-direction IS-A hierarchy: each image topic may link with multiple related image topics, forming a network. Thus, the inter-topic contexts for Flickr image collections cannot be characterized precisely by using only the one-direction IS-A hierarchy. Based on these observations, we have developed a new algorithm for determining more general inter-topic associations by seamlessly integrating both the semantic similarity and the information content gained from the co-occurrence probability of the relevant text terms. The inter-topic association φ(C_i, C_j) is then defined as

φ(C_i, C_j) = α · s(C_i, C_j) + β · I(C_i, C_j)/η    (1)

where φ(C_i, C_j) ∈ [0, 1]; the first part measures the contribution from the semantic similarity s(C_i, C_j) between the image topics C_i and C_j; the second part indicates the contribution from the information content I(C_i, C_j) gained from the co-occurrence probability of the image topics C_i and C_j; η is an integer that keeps s(C_i, C_j) and I(C_i, C_j)/η on the same scale; and α and β are the weighting parameters. In our definition, the strength of the inter-topic association is normalized to 1 and

Page 4: : Personalized Image Recommendation via Exploratory Search From Large-Scale Flickr Images

276 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 2, FEBRUARY 2009

it increases adaptively with the semantic similarity and the information content.

The information content I(C_i, C_j) is defined as

I(C_i, C_j) = log [ P(C_i, C_j) / (P(C_i) · P(C_j)) ]    (2)

where P(C_i, C_j) is the co-occurrence probability of the text terms interpreting the two image topics C_i and C_j, and it can be obtained from the available image annotation documents. From this definition, one can observe that a higher co-occurrence probability of the image topics corresponds to larger information content I(C_i, C_j) and a stronger inter-topic association φ(C_i, C_j).

The semantic similarity s(C_i, C_j) is defined as [28]

s(C_i, C_j) = −log [ l(C_i, C_j) / (2D) ]    (3)

where l(C_i, C_j) is the length of the shortest path between the image topics C_i and C_j in a one-direction IS-A taxonomy, and D is the maximum depth of that taxonomy. From this definition, one can observe that topics that are closer on the taxonomy (i.e., a smaller value of l(C_i, C_j)) have higher semantic similarity s(C_i, C_j) and a stronger inter-topic association φ(C_i, C_j).
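The path-based similarity described above can be computed directly, assuming the Leacock-Chodorow form that matches the description (shortest-path length over twice the maximum taxonomy depth, log-scaled); the exact printed formula is an assumption:

```python
import math

def lch_similarity(path_length, max_depth):
    """Leacock-Chodorow-style similarity on a one-direction IS-A
    taxonomy: -log(path_length / (2 * max_depth)).  Shorter paths
    between two topics give higher similarity, matching the text."""
    return -math.log(path_length / (2.0 * max_depth))
```

With a taxonomy of depth 16, adjacent topics (path length 1) score about 3.47, while topics at the maximum possible distance (path length 32) score 0.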

The information content I(C_i, C_j) gained from the co-occurrence probability of the image topics is more representative for characterizing the complex inter-topic associations at Flickr; thus, it plays a more important role than the semantic similarity s(C_i, C_j) in characterizing the strength of the inter-topic associations φ(C_i, C_j), and we set the weighting parameters α and β heuristically, with β > α.
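Putting the pieces together, the inter-topic association described above (a weighted combination of semantic similarity and rescaled co-occurrence information content) can be sketched as follows; the functional form, symbol names, and default values are assumptions, since the paper only states that the weights are set heuristically with the co-occurrence term weighted more heavily:

```python
def inter_topic_association(sim, info, eta=2, alpha=0.4, beta=0.6):
    """Weighted sum of semantic similarity and scaled information
    content.  eta rescales the information content onto the range of
    the similarity term; beta > alpha reflects the text's choice of
    weighting the co-occurrence term more heavily.  All defaults are
    illustrative, not the paper's actual settings."""
    return alpha * sim + beta * (info / eta)
```

A usage example: `inter_topic_association(0.5, 1.0)` yields 0.5, with the information-content term contributing 0.3 of that and the similarity term 0.2.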

Unlike the one-direction IS-A hierarchy in the concept ontology, each image topic could in principle be linked with all the other image topics on the topic network; thus, the maximum number of such inter-topic associations is N(N − 1)/2, where N is the total number of image topics on the topic network. However, the strength of the associations between some image topics may be very weak, so it is not necessary for each image topic to be linked with all the other topics. Based on this understanding, each image topic is automatically linked with only its top-k most relevant image topics (those with the largest values of the inter-topic association φ), and the potential number of such inter-topic associations is reduced to kN. One small part of our topic network is given in Fig. 1, where the image topics are organized according to the strength of their associations φ(C_i, C_j). One can observe that such a topic network provides a good global overview of large-scale collections of Flickr images at a semantic level; thus, it can be used to assist users in making better query decisions. When users see how the image topics are related to each other on the topic network, they gain a better understanding of their image needs and come up with more precise query concepts.
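The pruning step, keeping only each topic's strongest associations, might look like the minimal sketch below. The symmetric handling (an edge survives if either endpoint ranks it among its strongest) is an assumption about a detail the text leaves open:

```python
def prune_topic_network(assoc, k):
    """Keep, for every topic, only its k strongest associations.
    `assoc` maps (topic_i, topic_j) pairs to association strengths;
    the result maps sorted topic pairs to the surviving strengths."""
    # Collect each topic's candidate neighbors with edge weights.
    neighbors = {}
    for (ti, tj), w in assoc.items():
        neighbors.setdefault(ti, []).append((w, tj))
        neighbors.setdefault(tj, []).append((w, ti))
    # Every topic keeps its k strongest edges; an edge survives if
    # either endpoint selects it.
    pruned = {}
    for topic, cand in neighbors.items():
        for w, other in sorted(cand, reverse=True)[:k]:
            pruned[tuple(sorted((topic, other)))] = w
    return pruned
```

On a toy network `{("a","b"): 0.9, ("a","c"): 0.5, ("a","d"): 0.1, ("b","c"): 0.3}` with k = 1, the weak edge ("b","c") is dropped while ("a","d") survives because it is topic d's only (and therefore strongest) association.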

To integrate the topic network for supporting personalized query recommendation, it is very attractive to enable graphical representation and visualization of the topic network, so that users can obtain a good global overview of large-scale image collections at first glance. It is also very attractive to enable interactive topic network navigation and exploration according to the inherent inter-topic contexts, so that users can easily find the image topics that are most relevant to their mental

Fig. 1. One portion of our topic network for organizing large-scale Flickr image collections at a semantic level.

query models. Thus, the queries do not need to be formulated explicitly; they emerge through the users’ interaction with the topic network (i.e., users can simply click the visible image topics on the topic network rather than generate keywords for query formulation; hence the name JustClick). Unfortunately, visualizing a large-scale topic network (i.e., one consisting of large numbers of image topics) in a 2-D system interface with a limited screen size is not a trivial task because of the overlapping problem [16].

To tackle the overlapping problem for topic network visualization, we have investigated multiple innovative techniques. 1) Interestingness scores are defined for the image topics on the topic network; they are used to highlight the image topics of interest and eliminate less interesting image topics, so that users are always presented with the most interesting image topics and gain significant insights into large-scale image collections at first glance. 2) A new constraint-driven topic clustering algorithm is developed to achieve multilevel representation and visualization of the large-scale topic network, so that our topic network visualization algorithm scales well with the number of image topics. 3) A query-driven topic network visualization algorithm is developed to support goal-directed query recommendation, so that users are presented with a limited number of image topics that are most relevant to their current query interests. 4) Hyperbolic visualization is used to enable interactive topic network exploration and to tackle the overlapping problem interactively via change of focus.

It is worth noting that the processes for automatic topic network construction, interestingness score calculation, constraint-driven topic clustering, multidimensional scaling (MDS) for topic network projection and visualization, and personalized topic network generation can be performed off-line. Only the processes for interactive topic network exploration and query-driven topic visualization must be performed on-line, and they can be achieved in real time.

A. Interactive Topic Network Exploration

We have integrated both the popularity of the image topics and their importance to determine their interestingness scores. The popularity of a certain image topic is related to the number of images under that topic: if an image topic consists of more images, it tends to be more interesting. The importance of a given image topic is also related


FAN et al.: JustClick: PERSONALIZED IMAGE RECOMMENDATION 277

to its linkages with other image topics on the topic network. If an image topic is linked with more image topics on the topic network, it tends to be more interesting [20]. Thus, the interestingness score for a given image topic is defined as

(4)

where the first term depends on the number of images under the given topic, the second term depends on the number of image topics linked with it on the topic network, and an integer exponent keeps the two quantities on the same scale. The number of linked topics is more important than the number of images for characterizing the interestingness of a given image topic; e.g., image topics which consist of fewer images but are linked with many other image topics are more interesting than image topics which consist of more images but are linked with few other image topics. Based on this observation, we set the two weighting parameters heuristically. In our definition, the interestingness score is normalized to 1, and it increases adaptively with both the number of images and the number of linked topics.
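For illustration, the interestingness score of Eq. (4) can be sketched as below. The saturating exponential form and the parameter values `alpha`, `beta`, and `rho` are illustrative assumptions (the paper's exact formula and heuristic settings are not reproduced here); only the design decision that linkage is weighted more heavily than popularity, and that the score stays in [0, 1], is taken from the text.

```python
import math

def interestingness(num_images, num_linked_topics, alpha=0.3, beta=0.7, rho=3):
    """Hedged sketch of Eq. (4): combine topic popularity (image count)
    with topic importance (linked-topic count).

    alpha, beta, rho are illustrative stand-ins for the paper's elided
    parameters; rho rescales the image count so both terms share a scale,
    and linkage (beta) is weighted more heavily than popularity (alpha).
    """
    popularity = 1.0 - math.exp(-num_images / 10.0 ** rho)
    importance = 1.0 - math.exp(-num_linked_topics / 10.0)
    return alpha * popularity + beta * importance  # stays within [0, 1]
```

With these weights, a sparsely-populated but well-connected topic outranks a heavily-populated but isolated one, matching the observation in the text.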

After such interestingness scores are available for all the image topics on the topic network, the image topics are ranked according to their interestingness scores, so that the most interesting image topics with larger interestingness scores can be highlighted effectively and the less interesting image topics with smaller interestingness scores can be eliminated automatically from the topic network. Thus, users are always presented with the most interesting image topics.

We have also developed a new constraint-driven clustering algorithm to achieve multilevel representation of a large-scale topic network, and the pairwise edge weight is defined as

(5)

where the first part denotes the association between the two image topics, the second part indicates their linkage relatedness and constraint, the physical location distance between the two image topics on the topic network is normalized by the variance of such distances, and the weight is defined piecewise according to whether this distance falls below a pre-defined threshold.

After such a pairwise edge weight matrix is obtained, the Normalized Cut algorithm is used to obtain an optimal partition of the large-scale topic network [15]. Thus, the image topics which are strongly correlated and connected to each other on the topic network are partitioned into the same cluster and treated as one super-topic at the higher abstraction level. Each super-topic can be expanded into a group of strongly-correlated image topics (i.e., a smaller-size topic network) at the lower level. Thus, our constraint-driven topic clustering algorithm is able to achieve multilevel representation and visualization of a large-scale topic network. Because the number of image topics at each level is much smaller, our multilevel topic network representation and visualization algorithm can reduce the visual complexity significantly, tackle the overlapping problem effectively, and scale well with the number of image topics.
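The edge-weight construction of Eq. (5) and a two-way Normalized Cut can be sketched as follows. The Gaussian form of the distance penalty and the hard threshold are assumptions consistent with, but not copied from, the paper's elided formula; the bipartition uses the standard spectral relaxation of Normalized Cut (sign of the Fiedler vector of the symmetric normalized Laplacian), whereas the paper cites the full algorithm of [15].

```python
import numpy as np

def edge_weights(assoc, dist, sigma, threshold):
    """Hedged sketch of Eq. (5): combine the inter-topic association
    assoc[i, j] with a Gaussian penalty on the physical location distance
    dist[i, j]; edges beyond `threshold` are cut to zero (the constraint)."""
    w = assoc * np.exp(-dist ** 2 / (2 * sigma ** 2))
    w[dist > threshold] = 0.0
    np.fill_diagonal(w, 0.0)
    return w

def normalized_cut_bipartition(w):
    """Two-way Normalized Cut via the sign of the Fiedler vector (second-
    smallest eigenvector) of the symmetric normalized Laplacian."""
    d = w.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    lap = np.eye(len(w)) - d_inv_sqrt @ w @ d_inv_sqrt
    vals, vecs = np.linalg.eigh(lap)       # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)       # cluster labels in {0, 1}
```

Applied recursively, such bipartitions yield the super-topic hierarchy described above.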

When the users have clear query concepts in mind, they can directly submit their keyword-based queries to our system, and we have developed a novel algorithm to enable query-driven

Fig. 2. Query-driven visualization of the same topic network as shown in Fig. 1, where “sunset” is the query concept from the user.

topic network visualization. As shown in Fig. 2, the large-scale topic network is reorganized according to the user's query concept, and more display space is allocated automatically for the image topics which are more relevant to the given query concept. The query-driven interestingness score for a given image topic on the topic network is defined as

(6)

where one image topic on the topic network is identified as the best match for the user's query concept, and the location distance between a given image topic and this best-matching topic is measured by counting the image topic nodes between them on the topic network. In our current implementation, we only consider the most relevant image topics whose distance is no more than 2 (i.e., second-order nearest neighbors). Our query-driven topic network visualization algorithm offers at least two advantages: 1) it can significantly reduce the overlapping between image topics by focusing on only a small number of image topics which are most relevant to the user's query concept; 2) it can guide the user on which image topics to access in the next search according to the inter-topic contexts.
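The restriction of Eq. (6) to second-order nearest neighbors amounts to a bounded breadth-first search from the best-matching topic. A minimal sketch, assuming a per-hop decay of the base score (the decay form is an illustrative assumption; only the two-hop cutoff is from the paper):

```python
from collections import deque

def query_driven_scores(graph, best_match, base_scores, max_hops=2):
    """Hedged sketch of Eq. (6): keep only topics within `max_hops`
    (second-order nearest neighbors) of the topic best matching the
    user's query; all farther topics are suppressed.  `graph` maps each
    topic to its linked topics on the topic network."""
    hops = {best_match: 0}
    queue = deque([best_match])
    while queue:
        t = queue.popleft()
        if hops[t] == max_hops:
            continue                      # do not expand past the cutoff
        for nb in graph.get(t, ()):
            if nb not in hops:
                hops[nb] = hops[t] + 1
                queue.append(nb)
    # illustrative per-hop decay of the base interestingness score
    return {t: base_scores.get(t, 0.0) / (1 + hops[t]) for t in hops}
```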

We have investigated multiple solutions to lay out the topic network. 1) A string-based approach is incorporated to visualize the topic network with a nested view, where each image topic is displayed close to the most relevant image topics according to the strength of their inter-topic associations. The interestingness score for each image topic can also be visualized, so that users can get a sense of the interestingness of the image topics at the first glance. 2) The inter-topic contextual relationships are represented as weighted undirected edges, and the lengths of such weighted undirected edges are inversely proportional to the strength of the corresponding inter-topic associations, e.g., closer image topics on the topic network are more relevant, with stronger inter-topic associations. Thus, the geometric closeness between the image topics is strongly related to their associations, so that such a graphical representation of the topic network can reveal a great deal about how these image topics are correlated and how they are intended to be used jointly for manual image tagging. 3) An iconic image is selected automatically for each image topic and visualized simultaneously with the corresponding image topic node. Such a combination of the iconic images and the text terms for multimodal


image topic interpretation and visualization can provide a more intuitive format for human cognition.

Our approach for topic network visualization has also exploited hyperbolic geometry [16]. Hyperbolic geometry is particularly well suited for achieving a graph-based layout of the topic network. The essence of our approach is to project the topic network onto a hyperbolic plane according to the strength of the associations between the image topics, and to lay out the topic network by mapping the relevant image topic nodes onto a circular display region. Thus, our hyperbolic topic network visualization scheme takes the following steps. 1) The image topic nodes on the topic network are projected onto a hyperbolic plane according to the strength of their associations by minimizing Sammon's cost function [27]

(7)

where the strength of the inter-topic association between each pair of image topics serves as the target disparity, weighting factors are used to weight the disparities individually and to normalize Sammon's cost function so that it is independent of the absolute scale of the inter-topic associations, and the location distance between two image topics on the hyperbolic plane is

(8)

where the positions of the two image topics on the hyperbolic plane determine this distance. In our current implementation, Sammon's intermediate normalization factor is used to calculate

(9)

2) After such a context-preserving projection of the image topics is obtained, the Poincaré disk model [16] is used to map the image topic nodes on the hyperbolic plane onto a 2-D display coordinate. The Poincaré disk model maps the entire hyperbolic space onto an open unit circle, and produces a non-uniform mapping of the image topic nodes to the 2-D display coordinate.
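The non-uniform mapping of step 2 can be illustrated with the standard Poincaré disk coordinates: a node at hyperbolic distance r from the current focus lands at Euclidean radius tanh(r/2) inside the open unit circle, so nearby nodes get most of the display area while distant nodes are compressed toward the rim. This is a minimal sketch of the standard model, not the paper's full layout pipeline:

```python
import math

def poincare_map(r_hyp, theta):
    """Map a node at hyperbolic distance `r_hyp` from the focus, at angle
    `theta`, to 2-D display coordinates inside the open unit disk.  The
    Euclidean radius tanh(r_hyp / 2) < 1 compresses distant nodes toward
    the rim, producing the fisheye-like focus+context effect."""
    r_euc = math.tanh(r_hyp / 2.0)
    return (r_euc * math.cos(theta), r_euc * math.sin(theta))
```

Changing the focus amounts to re-running this mapping from a different origin, which is why the positions on the hyperbolic plane itself never need to change.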

After the hyperbolic visualization of the topic network is available, it can be used to enable interactive exploration and navigation of the semantic summary (i.e., topic network) of large-scale collections of manually-annotated Flickr images via change of focus. The change of focus is implemented by changing the mapping of the image topic nodes from the hyperbolic plane to the unit disk for display; the positions of the image topic nodes in the hyperbolic plane need not be altered during the focus manipulation [16]. Users can directly see the topics of interest in such an interactive topic network navigation and exploration process, thus they can build up their mental query models interactively and specify their queries precisely by selecting the visible image topics on the topic network directly, as shown in Fig. 3. By supporting interactive topic network exploration, our hyperbolic topic network visualization scheme can support personalized query recommendation interactively, which addresses both the problem of query formulation and the problem of vocabulary discrepancy and

null returns more effectively. Such an interactive topic network exploration process does not require user profiles; thus, our JustClick system can also support new users effectively.

Through change of focus, users are always presented with the image topics of interest according to their time-varying query interests. At the same time, users can also see the local inter-topic contexts which are embedded in a global structure (i.e., the topic network), so that they can easily perceive the appropriate directions for future search, as shown in Fig. 3. Our JustClick system can also track and visualize the users' access paths (query contexts over the topic network), as shown in Fig. 3(a), so that the users can see a “big picture” of which image topics they have visited, which image topics they are visiting now, and which image topics are recommended by our system for the next search. Thus, the users are always presented with the image topics which are most relevant to their current query interests. By tracking and visualizing the users' access paths, the users can see a good global structure of their query contexts and keep track of the progress of their sequential searches.

B. Personalized Topic Network Generation

The users' query interests have two significant properties: 1) dynamics (current and time-varying interests) and 2) consistency (long-term and general interests) [21]. Our user-system interaction interface focuses on capturing the users' dynamic query interests adaptively for supporting personalized topic recommendation. On the other hand, it is also very attractive to learn the users' long-term query interests (i.e., user profiles) for personalized topic recommendation.

In our JustClick system, we have developed a new algorithm for learning the user profiles automatically from the collection of the user's dynamic query actions (i.e., query context maps) to update the system's knowledge of the user's image needs and personal preferences. Thus, the personalized interestingness score for a given image topic on the topic network can be defined as

(10)

where the original interestingness score of the given image topic is combined with the number of times the particular user has visited the topic, the number of seconds the user has stayed on the topic, and the access depth at which the user has interacted with the topic and the images under it, each weighted by a normalization parameter. Thus, the personalized interestingness scores of the image topics can be determined immediately when such user-system interactions happen, and they will converge to stable values that characterize the user's long-term query interests (i.e., user profiles).
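The update of Eq. (10) can be sketched as below. The weights and the saturating ratios are illustrative assumptions (the paper's normalization parameters are not reproduced here); only the three interaction signals (visits, stay time, access depth) and the boost of the original score are taken from the text.

```python
def personalized_score(base, visits, stay_seconds, depth,
                       w_visit=0.2, w_stay=0.2, w_depth=0.2):
    """Hedged sketch of Eq. (10): boost the original interestingness
    score `base` by how often the user visited the topic, how long they
    stayed, and how deeply they explored it.  The saturating ratios keep
    each contribution bounded; the weights are illustrative."""
    boost = (w_visit * visits / (1 + visits)
             + w_stay * stay_seconds / (60 + stay_seconds)
             + w_depth * depth / (1 + depth))
    return min(1.0, base + boost)   # keep the score normalized to 1
```

Because each term saturates, repeated interactions drive the score toward a stable value, mirroring the convergence behavior described above.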

After the personalized interestingness scores for all the image topics are learned from the collection of the user's dynamic query actions, they can be used to highlight the image topics for generating a personalized topic network to represent the user profiles. Thus, the image topics with smaller values of the personalized interestingness scores can be eliminated automatically, so that each user is presented with the most


Fig. 3. Our JustClick system can achieve a good balance between global context and local details: (a) the user's access path on a global context of the topic network and (b)–(d) the local details of the semantic contexts for query recommendation.

interesting image topics according to his/her personal preferences.

The user's interests may change according to his/her timely image observations, and one major problem with integrating the pre-learned user profiles for query recommendation is that such pre-learned user profiles may over-specify the user's interests. Thus, the pre-learned user profiles may hinder the user from accessing other interesting image topics on the topic network. Based on this observation, we have developed a novel algorithm for propagating the user's interests over other relevant image topics on the topic network. Thus, the personalized interestingness score for the image topic to be propagated can be defined as

(11)

where the original interestingness score for the image topic to be propagated is combined with the propagation strength of that topic, which is defined as

(12)

where the propagation runs over the set of accessed image topics in the first-order nearest neighborhood of the given image topic, as shown in Fig. 4, weighted by the inter-topic association between the image topic to be propagated and each image topic that has been accessed by the particular user. Thus, the image topics which obtain larger values of the personalized interestingness scores can be propagated adaptively.

Fig. 4. First-order nearest neighborhood and inter-topic correlations for automatic preference propagation, where the image topic “beach” in the center is to be propagated.

It is worth noting that image topics which are less interesting in the large pool of image topics, and which were eliminated at the beginning to reduce the complexity of large-scale topic network visualization, can be recovered and presented to the particular user when image topics strongly related to them are accessed by that user, so that the values of their personalized interestingness scores become larger. Therefore, our interestingness propagation algorithm allows users to access some less interesting image topics according to their personal preferences.
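The propagation of Eqs. (11)-(12) can be sketched as follows. The additive combination is an illustrative assumption; only the structure — boosting each topic by association-weighted personalized scores of accessed topics in its first-order neighborhood — is taken from the paper.

```python
def propagate_interest(base_scores, personalized, assoc, neighbors):
    """Hedged sketch of Eqs. (11)-(12): a topic's score is boosted by the
    personalized scores of accessed topics in its first-order neighborhood
    (`neighbors`), weighted by the inter-topic association assoc[(a, b)].
    The additive form is an assumption; the neighborhood/association
    structure follows the paper (see Fig. 4)."""
    out = {}
    for topic, base in base_scores.items():
        accessed = [n for n in neighbors.get(topic, ()) if n in personalized]
        strength = sum(assoc.get((topic, n), 0.0) * personalized[n]
                       for n in accessed)
        out[topic] = min(1.0, base + strength)  # keep scores normalized
    return out
```

An eliminated topic with a low base score can thus re-enter the display once its neighbors are accessed, which is exactly the recovery behavior described above.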

By integrating the inter-topic correlations for automatic propagation of the user's preferences, our JustClick system can precisely predict the user's hidden preferences from the collection of his/her dynamic query actions. Thus, the personalized topic


network can be used to represent the user profiles precisely, where the interesting image topics can be highlighted according to their personalized interestingness scores. The personalized topic network, which is treated as the user profile to interpret the user's personal preferences, can recommend the topics of interest to each individual user directly without requiring him/her to make an explicit query.

IV. PERSONALIZED IMAGE RECOMMENDATION

Multiple keywords may simultaneously be used to tag the same image, thus a single image may belong to multiple image topics on the topic network. On the other hand, the same keyword may be used to tag many semantically-similar images, thus each image topic at Flickr may consist of large amounts of semantically-similar images with diverse visual properties. Unfortunately, most existing keyword-based image retrieval systems tend to return all these images to the users without taking their personal preferences into consideration. Thus, query-by-topic via keyword matching will return large amounts of semantically-similar images under the same topic, and users may seriously suffer from the problem of information overload. In order to tackle this problem in our JustClick system, we have developed a novel framework for personalized image recommendation, and it consists of three major components. 1) Topic-Driven Image Summarization and Recommendation: The semantically-similar images under the same topic are first partitioned into multiple clusters according to their nonlinear visual similarity contexts, and a limited number of images are automatically selected as the most representative images according to their representativeness for a given image topic. Our system also allows users to define the number of such most representative images for relevance assessment. 2) Context-Driven Image Visualization and Exploration: Kernel PCA and hyperbolic visualization are seamlessly integrated to enable interactive image exploration according to the inherent visual similarity contexts, so that users can assess the relevance between the recommended images (i.e., most representative images) and their real query intentions more effectively. 3) Intention-Driven Image Recommendation: An interactive user-system interface is designed to allow the user to express his/her time-varying query intentions easily for directing the system to find more relevant images according to his/her personal preferences.

It is worth noting that the processes for kernel-based image clustering, topic-driven image summarization and recommendation (i.e., most representative image recommendation), and context-driven image visualization can be performed off-line without considering the users' personal preferences. Only the processes for interactive image exploration and intention-driven image recommendation must be performed on-line, and they can be achieved in real time.

A. Topic-Driven Image Summarization and Recommendation

The visual properties of the images and their visual similarity contexts are very important for users to assess the relevance between the images and their real query intentions [1].

Unfortunately, the Flickr image search engine has completely ignored such important characteristics of the images. To characterize the diverse visual principles of the images efficiently and effectively, both global visual features and local visual features should be extracted for image similarity characterization [11]. To avoid the pitfalls of image segmentation tools, segmentation is not performed for feature extraction. In our current implementation, a 16-bin color histogram and 62-dimensional texture features from Gabor filter banks are used to characterize the global visual properties of the images, and a number of interest points and their SIFT (scale-invariant feature transform) features are used to characterize the local visual properties of the images.

For a given image pair under the same topic, the visual similarity context is characterized by using a mixture of three basic image kernels (i.e., mixture-of-kernels) [11]

(13)

where each basic image kernel is weighted by an importance factor. Our mixture-of-kernels algorithm can achieve a more accurate approximation of the diverse visual similarities between the images and produce nonlinear separation hypersurfaces between the images. Thus, it can achieve more accurate image clustering and exploit the nonlinear visual similarity contexts for image visualization.
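The mixture-of-kernels of Eq. (13) is a convex combination of basic kernel matrices, one per feature channel (e.g., color histogram, Gabor texture, SIFT). A minimal sketch, assuming the importance factors sum to one (a common convention; the paper's constraint on the factors is not reproduced here):

```python
import numpy as np

def mixture_kernel(kernels, alphas):
    """Hedged sketch of Eq. (13): the visual similarity between images is
    a convex combination of basic image kernels, each weighted by an
    importance factor alpha_i.  A convex combination of positive
    semi-definite kernels is itself a valid kernel."""
    alphas = np.asarray(alphas, dtype=float)
    assert abs(alphas.sum() - 1.0) < 1e-9, "importance factors must sum to 1"
    return sum(a * k for a, k in zip(alphas, kernels))
```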

The semantically-similar images (which belong to the same image topic) are automatically partitioned into multiple clusters according to their kernel-based visual similarity contexts. The optimal partition of the images under the same topic is obtained by minimizing the trace of the within-cluster scatter matrix, which is given by [17]

(14)

where each image's visual features are mapped into the kernel-induced feature space, the partition is taken over the given numbers of images and clusters, the membership parameter equals 1 if an image belongs to the corresponding cluster and 0 otherwise, and the center of each cluster is given as

(15)

where the cluster center is normalized by the number of images in that cluster. The trace of the scatter matrix can be computed by

(16)

and it can further be rewritten as

(17)

Page 9: : Personalized Image Recommendation via Exploratory Search From Large-Scale Flickr Images

FAN et al.: JustClick: PERSONALIZED IMAGE RECOMMENDATION 281

where the kernel-dependent term is defined as

(18)

(19)

which can further be refined as

(20)

(21)

The trace of the scatter matrix can be refined as

(22)

To achieve a geometrical interpretation of the image clusters, we can define a cluster sphere to describe the images in the same cluster (i.e., the sphere with minimal radius containing all the images in the same cluster) [19]. The images which lie on the boundary of the cluster sphere are treated as the support vectors for the corresponding image cluster. Thus, the distance between a given image and the center of the best-matching cluster can be defined as

(23)

The radius of the cluster sphere is given by the distance between a support vector and the center of the cluster sphere; thus, the radius of the cluster sphere for a given image cluster can be defined as

(24)

where the radius is determined by the support vectors of the given cluster. An image can then be detected as an outlier according to

(25)

i.e., an image is treated as an outlier if its distance to the best-matching cluster center exceeds the radius of every cluster sphere.
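The outlier test of Eqs. (23)-(25) can be sketched as a simple check against every cluster sphere; the kernel-space distance and radius computations are assumed to be done beforehand:

```python
def detect_outlier(dists_to_centers, radii):
    """Hedged sketch of Eq. (25): `dists_to_centers[k]` is the
    kernel-space distance from the image to the center of cluster k, and
    `radii[k]` is that cluster's sphere radius (the distance from a
    support vector to the center).  The image is an outlier iff it falls
    outside every cluster sphere."""
    return all(d > r for d, r in zip(dists_to_centers, radii))
```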

Based on such a geometrical interpretation of the image clusters, searching for the optimal values of the membership parameters and kernel weights that minimize the trace in (22) can be achieved effectively by using two iterations: 1) an outer iteration and 2) an inner iteration.

Ideally, a good combination of these three basic image kernels (i.e., with optimal values for the three importance factors) should be able to achieve a more accurate characterization of the nonlinear visual similarity contexts between the images and result in better image clustering with less overlap between the spheres for different image clusters. Thus, the optimal values of the parameters for combining the three basic image kernels are obtained by minimizing the cluster sphere overlapping

(26)

where the two distance terms measure the distances between a given image and the centers of two different image clusters. To reduce the computational cost of the outer iteration, we have pre-defined a set of potential combinations of the three parameters with different values. Such a pre-defined set of parameters can be obtained from a small set of images via semi-supervised learning. Thus, the problem of finding an optimal combination of the three basic image kernels is simplified to searching sequentially for an optimal unit over the pre-defined set of potential parameter combinations.

For the inner iteration (each pass picks one index sequentially), the membership parameters are determined automatically by minimizing the trace of the scatter matrix in (22). In this inner iteration procedure, our algorithm uses a K-means-like strategy, i.e., it repeatedly updates all the cluster centers according to the image memberships obtained by minimizing the trace of the scatter matrix. This inner iteration stops when the cluster centers become stable (i.e., no further change).
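The K-means-like inner iteration is, in effect, kernel k-means: cluster centers live only in the kernel-induced feature space, and distances to them are computed entirely through the kernel matrix. A minimal sketch (the deterministic initialization is an illustrative choice, not the paper's):

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=50):
    """Hedged sketch of the inner iteration: kernel k-means minimizing
    the trace of the within-cluster scatter.  Distances to the implicit
    cluster centers use only the kernel matrix K:
    ||phi(x) - m_c||^2 = K_xx - 2 * mean_j K_xj + mean_{j,l} K_jl."""
    n = len(K)
    labels = np.arange(n) % k                    # deterministic initial labels
    for _ in range(n_iter):
        dist = np.zeros((n, k))
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if len(idx) == 0:
                dist[:, c] = np.inf              # empty cluster gets no members
                continue
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                # cluster centers are stable
        labels = new_labels
    return labels
```

With a linear kernel this reduces to ordinary k-means; with the mixture-of-kernels it produces the nonlinear separation hypersurfaces described above.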

The computational cost for obtaining the kernel matrix grows quadratically with the number of images under the given topic, and the total cost for clustering these images into multiple clusters grows accordingly. Thus, supporting kernel-based image clustering may require huge memory space to store the kernel matrix when the given topic consists of a large number of semantically-similar images. To address this problem, we have developed a new algorithm that reduces the memory cost by seamlessly integrating parallel computing with global decision optimization. Our new algorithm takes the following key steps. 1) To reduce the memory cost, the images under the same topic are randomly partitioned into multiple smaller subsets. 2) Our kernel-based image clustering algorithm is performed in parallel on all these image subsets to obtain a within-subset partition of the images according to their diverse visual similarity contexts. 3) The support vectors for each image subset are validated against the other image subsets by testing the Karush–Kuhn–Tucker (KKT) conditions. The support vectors which violate the KKT conditions are integrated to update the decision boundaries for the corresponding image subset incrementally. This process is repeated until the global optimum is reached and an optimal partition of the large number of images under the same image topic is obtained.

Our kernel-based image clustering algorithm has the following advantages. 1) It can seamlessly integrate multiple kernels to characterize the diverse visual similarities between


the images more accurately. Thus, it can provide a good insight into a large number of images by determining their global distribution structures (i.e., image clusters and their similarity contexts) accurately, and such global image distribution structures can further be used to achieve more effective image visualization by selecting the most representative images automatically from all these image clusters. 2) Only the most representative images (which are the support vectors) are stored and validated against other image subsets, thus it requires far less memory space. The redundant images (which are not support vectors) are eliminated early, thus the kernel-based image clustering process can be accelerated significantly. 3) The support vectors for each image subset are validated against other image subsets, thus our algorithm can handle outliers and noise effectively and generate more robust clustering results.

To allow users to assess the relevance between the images returned by a keyword-based query and their real query intentions, it is very important to visualize the semantically-similar images under the same topic according to their inherent visual similarity contexts. Because each topic may relate to large amounts of semantically-similar images, visualizing such a large set of images on a size-limited display screen may seriously suffer from the overlapping problem. On the other hand, displaying large amounts of redundant images with similar visual properties to the users cannot provide them with any additional information. Obviously, simply selecting only one iconic image from each image cluster is unable to represent and preserve the nonlinear visual similarity contexts among large amounts of semantically-similar images under the same topic, thus more effective techniques are strongly expected to achieve representativeness-based image summarization. Based on these understandings, we have developed a novel algorithm to achieve representativeness-based image summarization and overlapping reduction by selecting the most representative images automatically according to their representativeness for a given image topic.

Different clusters are used to interpret different groups of images with various visual properties and different important aspects of a given image topic, thus the most representative images should be selected from all these clusters and the outliers to preserve the nonlinear visual similarity contexts between the images. Obviously, different clusters may contain different numbers of images, thus a larger number of such most representative images should be selected from the clusters with bigger coverage percentages. On the other hand, some representative images should preferentially be selected from the outliers for supporting serendipitous discovery of unexpected images. The optimal number of such most representative images depends on their effectiveness and efficiency for representing and preserving the nonlinear visual similarity contexts among large amounts of semantically-similar images under the same topic.

For the visually-similar images in the same cluster, we have seamlessly integrated both a subjective measure and an objective measure to quantify their representativeness scores. The subjective measure of an image depends on the users' rating scores that are available at Flickr. The objective measure of an image depends on its representativeness for the underlying nonlinear visual similarity contexts between the images. Thus, three types of images can be selected to achieve representativeness-based summarization of the visually-similar images in the

same cluster. 1) The images which lie on the cluster sphere and are treated as the support vectors for the corresponding image cluster can effectively capture the essentials (i.e., the principal visual properties) of the images in the same cluster. 2) The images which lie at the center of the cluster and are far away from the cluster sphere are more representative of the popular images located in the areas with higher densities. 3) The images which have higher rating scores from numerous online users are more interesting to the users. Thus, the representativeness score

for a given image can be defined as

(27)

where the representativeness score combines an objective measure of the image's representativeness, the users' rating score for the given image, and the context-oriented correlation between the given image and the corresponding image cluster.

The context-oriented correlation can be defined as

(28)

where the first term characterizes the correlation between the given image and the decision boundary (i.e., the support vectors) of the image cluster, and the second term characterizes the correlation between the given image and the center of the image cluster. Thus, images which are closer to the decision boundary, or closer to the cluster center of one of these image clusters, or have higher users' rating scores, will have larger values of the representativeness scores

. The images in the same cluster can then be ranked precisely according to their representativeness scores, and the most representative images with the largest scores can be selected and recommended to the users for relevance assessment.
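The scoring of Eqs. (27)-(28) can be sketched as below. The Gaussian form of the boundary/center correlations and the weights are illustrative assumptions; only the three ingredients (closeness to the support-vector boundary, closeness to the cluster center, users' rating score) come from the paper.

```python
import math

def representativeness(dist_to_boundary, dist_to_center, rating,
                       sigma=1.0, w_obj=0.7, w_subj=0.3):
    """Hedged sketch of Eqs. (27)-(28): images close to the cluster's
    decision boundary (support vectors) or close to its center score high
    on the objective measure; the users' Flickr rating (in [0, 1]) is the
    subjective measure.  Weights and the Gaussian form are illustrative."""
    objective = max(math.exp(-dist_to_boundary ** 2 / sigma ** 2),
                    math.exp(-dist_to_center ** 2 / sigma ** 2))
    return w_obj * objective + w_subj * rating
```

Under this sketch, both a support-vector image and a central image outrank a mid-cluster image with the same rating, matching the three-type selection described above.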

User profiles are not required for selecting the most representative images, thus our topic-driven image recommendation scheme can support new users effectively. Only the most representative images are recommended, and large amounts of redundant images, which have similar visual properties to the most representative images, are eliminated automatically. By supporting topic-driven image recommendation, our JustClick system can significantly reduce both the information overload for relevance assessment and the visual complexity for image visualization and exploration.

B. Context-Driven Image Visualization and Exploration

To assist users in assessing the relevance between the most representative images (i.e., recommended images) and their real query intentions, it is very important to develop new visualization algorithms that are able to preserve the nonlinear visual similarity contexts between the images in the high-dimensional feature space. We have incorporated kernel PCA [18] to project the most representative images onto a hyperbolic plane, so that



Fig. 5. The 200 most representative images are selected to represent and preserve the visual similarity contexts between 30887 semantically-similar images under the same topic "garden."

the nonlinear visual similarity contexts between the images can be preserved precisely for interactive image exploration.

The most representative images for a given image topic are first centered according to their center in the feature space. For a given most representative image with the visual features $x_i$, its feature-based representation $\Phi(x_i)$ can be centered by using the center of these $N$ most representative images:

$$\tilde{\Phi}(x_i) = \Phi(x_i) - \frac{1}{N}\sum_{j=1}^{N}\Phi(x_j) \qquad (29)$$

The covariance matrix and its components are defined via the dot product matrix

$$K_{ij} = \tilde{\Phi}(x_i)\cdot\tilde{\Phi}(x_j) \qquad (30)$$

Then the kernel PCA is obtained by solving the eigenvalue equation

$$N\lambda\,\alpha = K\alpha \qquad (31)$$

where $N$ is the number of the most representative images, $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$ denote the eigenvalues, and $\alpha = (\alpha_1, \ldots, \alpha_N)^{T}$ denotes the expansion coefficients of an eigenvector,

$$V^{l} = \sum_{i=1}^{N}\alpha_i^{l}\,\tilde{\Phi}(x_i) \qquad (32)$$

where $\tilde{\Phi}(x_i)$ is used to express the eigenvectors with non-zero eigenvalues, and $\alpha^{l}$ is the vector of expansion coefficients for the $l$th of the top $m$ eigenvectors with non-zero eigenvalues, $l = 1, \ldots, m$.

For a given image with the visual features $x$, its projection on the selected top $m$ eigenvectors with non-zero eigenvalues can be defined as

$$V^{l}\cdot\tilde{\Phi}(x) = \sum_{i=1}^{N}\alpha_i^{l}\,\bigl(\tilde{\Phi}(x_i)\cdot\tilde{\Phi}(x)\bigr), \qquad l = 1, \ldots, m \qquad (33)$$

The most representative images, which are projected on the top principal components, are further mapped onto the hyperbolic plane. After such a context-preserving projection of the most representative images is obtained, the Poincaré disk model is used to map the most representative images on the hyperbolic plane onto a 2-D display coordinate to support change of focus and interactive image exploration.
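The projection step above can be sketched compactly, following the standard kernel PCA formulation of Schölkopf et al. [18]; the function name and the linear-kernel test are illustrative assumptions only.

```python
import numpy as np

def kernel_pca_projection(K, m=2):
    # K: N x N (uncentered) kernel matrix of the N most representative images.
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    # Center the kernel matrix in feature space.
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigen-decomposition of the centered kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:m]           # top-m eigenvalues
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    # Projection of every image onto the top-m kernel principal components.
    return Kc @ alphas
```

The 2-D projections obtained with m = 2 are what would then be mapped into the Poincaré unit disk for focus+context browsing.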

Even though the low-level visual features may not be able to carry the image semantics directly, supporting similarity-based image visualization can significantly leverage humans' powerful capabilities in pattern recognition for interactive relevance assessment. Through change of focus, users can easily control the presentation and visualization of the recommended images for interactive relevance assessment.

Our hyperbolic visualization result of the most representative images for the image topic "garden" is given in Fig. 5, where the 200 most representative images for this topic are recommended and visualized. One can observe that such 2-D hyperbolic visualization of the most representative images can provide an effective interpretation and summarization of the original visual similarity contexts among large amounts of semantically-similar images under the same topic. Visualizing the most representative images according to their visual similarity contexts allows users to find interesting visual similarity contexts between the images and discover more relevant images accordingly. Through selecting the most representative images for image summarization and visualization, our JustClick system can provide a good global overview of the diverse image contents on a size-limited screen.

C. Intention-Driven Image Recommendation

Through interactive exploration of the recommended images (i.e., most representative images) according to their nonlinear visual similarity contexts, serendipitous discoveries of unexpected images can be achieved effectively, because the most representative images are also selected from the outliers autonomously and hyperbolic visualization allows the users to zoom into the area of interest interactively. Thus, our JustClick system can facilitate the discovery of new knowledge through the interactive image exploration process, and the users can easily find particular images according to their personal interests. We have developed a new scheme to achieve user-adaptive image recommendation by autonomously adjusting the recommendation and visualization of the most representative images according to the users' time-varying query interests. After the users find some images of interest via interactive image exploration, our JustClick system allows the users to click these images to express their time-varying query interests interactively, directing the system to find more relevant images according to their personal preferences.

After the user's time-varying query interests are captured, the personalized interestingness scores for the images under the same topic are calculated automatically, and the personalized interestingness score $\hat{R}(x)$ for a given image with the visual feature $x$ is defined as

$$\hat{R}(x) = R(x)\cdot\kappa(x, x_c) \qquad (34)$$



Fig. 6. Our JustClick system for personalized image recommendation. (a) Most representative images recommended for the topic-based query "sunset," where the image in the blue box is clicked by the user (i.e., query intention). (b) More images which are similar to the accessed image are recommended adaptively according to the user's query intention of "bright sunset."

Fig. 7. Our JustClick system for personalized image recommendation. (a) Most representative images recommended for the keyword-based query "vegetables," where the image in the blue box is clicked by the user. (b) Most relevant images recommended according to the user's query intention of "tomato."

$$\hat{R}(x) = R(x), \qquad \text{for all } x \notin S_c \qquad (35)$$

where $R(x)$ is the original representativeness score for the given image and $\kappa(x, x_c)$ is the kernel-based visual similarity correlation between the given image with the visual features $x$ and the clicked image with the visual features $x_c$, which belong to the same image cluster $S_c$. Thus, the redundant images with larger values of the personalized interestingness scores, which have visual properties similar to those of the clicked image (i.e., belong to the same cluster) and were initially eliminated to reduce the visual complexity of image summarization and visualization, can be recovered and recommended to the users adaptively, as shown in Figs. 6–8. One can observe that integrating the visual similarity contexts between the images for personalized image recommendation can significantly enhance the users' ability to find particular images of interest. With a higher degree of transparency of the underlying image recommender, users can achieve their image retrieval goals (i.e., looking for particular images) with a minimum of cognitive load and a maximum of enjoyment [3]. By supporting intention-driven image recommendation, users can maximize the number of relevant images while minimizing the number of irrelevant images according to their personal preferences.
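The click-driven re-ranking can be sketched as follows, assuming the personalized score (34) multiplies an image's representativeness by its kernel similarity to the clicked image and that re-ranking is restricted to the clicked image's cluster; the Gaussian kernel and all names are assumptions.

```python
import numpy as np

def rerank_on_click(scores, features, cluster_ids, clicked_idx, sigma=1.0, k=5):
    # scores: representativeness scores; features: visual feature matrix;
    # cluster_ids: cluster label per image; clicked_idx: image the user clicked.
    clicked = features[clicked_idx]
    in_cluster = np.flatnonzero(cluster_ids == cluster_ids[clicked_idx])
    d2 = np.sum((features[in_cluster] - clicked) ** 2, axis=1)
    sim = np.exp(-d2 / (2.0 * sigma ** 2))        # kernel similarity to the click
    personalized = scores[in_cluster] * sim       # multiplicative fusion (assumption)
    order = in_cluster[np.argsort(personalized)[::-1]]
    return [int(i) for i in order[:k]]
```

Under this model, images of the clicked cluster that were pruned as redundant during summarization re-enter the recommendation in order of their similarity to the click.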

It is worth noting that our JustClick system for personalized image recommendation is significantly different from traditional relevance feedback approaches for image retrieval [25], [26]. 1) The relevance feedback approaches require users to label a reasonable number of returned images as positive or negative for learning a reliable model to predict the user's query intentions, and thus they may place a huge burden on the users, even though active learning has recently been proposed for label propagation. In contrast, our personalized image recommendation framework allows the users to express their time-varying query intentions easily and thus it can lessen the



Fig. 8. Our JustClick system for personalized image recommendation. (a) Most representative images recommended for the keyword-based query "leaves," where the image in the red box is clicked by the user. (b) Most relevant images recommended according to the user's query intention of "red leaves."

burden on the users significantly. 2) When large-scale image collections come into view, a limited number of labeled images may not be representative of large amounts of unseen images, and is thus insufficient for learning an accurate model to predict the user's query intentions precisely. In contrast, our personalized image recommendation framework selects a reasonable number of most representative images according to how well they represent the nonlinear visual similarity contexts between the images. Thus, the users are always presented with the most representative images according to their personal preferences. 3) Most existing relevance feedback approaches use a page-by-page ranked list to display the query results, and the nonlinear visual similarity contexts between the images are completely ignored. In contrast, our personalized image recommendation framework allows users to see the most representative images and their nonlinear visual similarity contexts at first glance, and thus the users can obtain more significant insights, make better query decisions, and assess image relevance more effectively.

IV. ALGORITHM AND SYSTEM EVALUATION

We carry out our experimental studies by using large-scale collections of Flickr images with unconstrained contents and capturing conditions. We have downloaded more than 1.5 billion Flickr images and their tagging documents. We have learned a large-scale topic network with more than 4000 of the most popular image topics (i.e., the most popular tags over time) at Flickr. Creating a new search engine which scales to image collections of this size may raise many new challenges while offering many good opportunities for the CBIR community.

Our work on algorithm evaluation focuses on: 1) evaluating the response time for supporting change of focus in our JustClick system, which is critical for supporting interactive exploration of the large-scale topic network and large numbers of recommended images; 2) evaluating the performance (efficiency and accuracy) of our JustClick system in achieving personalized image recommendation according to the users' personal preferences; and 3) evaluating the benefits of integrating the topic network (i.e., a global overview of large-scale image collections), hyperbolic visualization, and interactive image exploration for improving image search.

One critical issue for evaluating our personalized image recommendation system is the response time for supporting change of focus. In our JustClick system, the change of focus is used for achieving interactive exploration and navigation of the large-scale topic network and large numbers of recommended images. The change of focus is implemented by changing the Poincaré mapping of the image topic nodes or the recommended images from the hyperbolic plane to the display unit disk; the positions of the image topic nodes or the recommended images in the hyperbolic plane need not be altered during the focus manipulation. Thus, the response time for supporting change of focus depends on two components: 1) the computational time for recalculating the new Poincaré mapping of the large-scale topic network or the recommended images from the hyperbolic plane to the 2-D display unit disk, i.e., recalculating the Poincaré position for each image topic node or each recommended image; and 2) the visualization time for re-laying out and re-rendering the large-scale topic network or the recommended images on the display unit disk according to their new Poincaré mappings.
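The Poincaré re-mapping on a change of focus can be sketched as a single Möbius transformation per node, which costs O(1) per node; this is a standard hyperbolic-browser construction (cf. Lamping and Rao [16]), not necessarily the paper's exact code, and the complex-number representation is an assumption.

```python
import numpy as np

def change_focus(points, focus):
    # points: complex coordinates of topic nodes / images inside the unit disk;
    # focus: the point the user clicks. The Mobius transformation below moves
    # `focus` to the disk center while keeping every node inside the disk, so
    # only this O(1)-per-node re-mapping (not the layout) is recomputed.
    points = np.asarray(points, dtype=complex)
    return (points - focus) / (1.0 - np.conj(focus) * points)
```

The constant per-node cost is consistent with the observation below that the re-mapping time is nearly insensitive to the number of nodes.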

Because the computational time may depend on the number of image topic nodes, we have tested the performance differences for our system when recalculating the Poincaré mappings for different numbers of image topic nodes. Thus, our topic network with 4000 image topic nodes is partitioned into six different scales: 500 nodes, 1000 nodes, 2000 nodes, 3000 nodes, 3500 nodes, and 4000 nodes. We have tested the computational time for recalculating the Poincaré mappings of different numbers of image topic nodes when the focus is changed. As shown in Fig. 9, one can find that the computational time is not sensitive to the number of image topics, and thus recalculating the Poincaré mapping for the large-scale topic network can almost be achieved in real time.

Following the same approach, we have also evaluated the empirical relationship between the computational time and the number of the recommended images. By computing the Poincaré mappings for different numbers of the recommended



Fig. 9. Empirical relationship between the computational time $T$ (seconds) and the number of image topic nodes.

Fig. 10. Empirical relationship between the computational time $T$ (seconds) and the number of recommended images.

images, we have obtained the same conclusion, i.e., the computational time for recalculating the new Poincaré mappings is not sensitive to the number of the recommended images, as shown in Fig. 10, and thus recalculating the Poincaré mapping for large numbers of recommended images can almost be achieved in real time.

We have also evaluated the empirical relationship between the visualization time and the number of image topic nodes and the number of recommended images. In our experiments, we have found that revisualization of the large-scale topic network and large numbers of recommended images is not sensitive to the number of image topics or the number of recommended images, and thus our system can support revisualization of the large-scale topic network and large numbers of recommended images in real time.

From these evaluation results, one can conclude that our JustClick system can support change of focus in real time, and thus it can achieve interactive navigation and exploration of large-scale image collections effectively.

To support similarity-based visualization of the large number of most representative images recommended for each image topic, the computational cost depends on two issues: 1) the computational cost for supporting kernel-based image clustering and achieving representativeness-based image summarization; and 2) the computational cost for performing kernel PCA on the most representative images to obtain their similarity-preserving projections on the hyperbolic plane.

To achieve kernel-based image clustering, the kernel matrix for the images should be calculated, and the computational cost largely depends on the number of images for the given image topic. The computational cost for achieving kernel-based image clustering is approximated as $O(N^2 k)$, where $N$ is the total number of images and $k$ is the number of image clusters. Because each image topic in Flickr may consist of a large number of images, we have obtained the empirical relationship between the computational cost $T$ (CPU time) and the number of images, as shown in Fig. 11. One can observe that the computational cost $T$ grows rapidly (superlinearly) with the number of images. Based on this observation, the images under the same topic

Fig. 11. Empirical relationship between the computational cost $T$ (seconds) and the number of images.

Fig. 12. Empirical relationship between the computational cost $T$ (seconds) and the number of the most representative images.

are first partitioned into multiple subsets, and parallel computing and global decision optimization are integrated to reduce the computational cost significantly from $O(N^2 k)$ to $O(N^2 k / m)$, where $m$ is the number of image subsets. It is also worth noting that such a cost-sensitive process for kernel-based image clustering can be performed off-line.
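Because the complexity expressions are garbled in this copy, the following cost model only illustrates the kind of saving described: clustering cost quadratic in the number of images (with $k$ clusters), and $m$ parallel subsets of $N/m$ images each. Both the $O(N^2 k)$ model and the function names are assumptions.

```python
def kernel_clustering_cost(n_images, n_clusters):
    # Kernel matrix construction plus clustering, modeled as O(N^2 * k).
    return n_images ** 2 * n_clusters

def partitioned_cost(n_images, n_clusters, n_subsets):
    # m subsets of N/m images each, clustered independently (in parallel);
    # the fusion/global-decision step is ignored in this model.
    return n_subsets * kernel_clustering_cost(n_images // n_subsets, n_clusters)
```

For example, partitioning 30000 images into m = 10 subsets cuts the modeled total work by a factor of 10.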

After the images under the same topic are partitioned into multiple clusters via kernel-based clustering, our system can select the most representative images automatically and perform kernel PCA to obtain their similarity-preserving projections on the hyperbolic plane. The computational cost for performing kernel PCA is approximated as $O(n^3)$, where $n$ is the number of the most representative images recommended for the given image topic. As shown in Fig. 12, we have obtained the empirical relationship between the computational cost and the number of most representative images. One can observe that the computational cost grows rapidly (superlinearly) with the number of the most representative images. In our JustClick system, the number of the most representative images is normally less than 500, thus the computational cost is acceptable for supporting interactive image exploration and navigation.

When the most representative images for the given image topic are recommended and visualized, our system further allows users to click one or multiple images of interest to express their query intentions and direct our system to find more relevant images according to their personal preferences. For evaluating the efficiency and the accuracy of our personalized image recommendation system, the benchmark metrics are precision $P$ and recall $R$. The precision $P$ is used to characterize the accuracy of our system in finding more relevant images according to the user's personal preferences, and the recall $R$ is used to characterize the efficiency of our system in finding more relevant images. They are defined as

$$P = \frac{|T|}{|T| + |F|}, \qquad R = \frac{|T|}{|T| + |M|} \qquad (36)$$

where $T$ is the set of true positive images that are visually similar to the images accessed by the users and are recommended correctly, $M$ is the set of false negative images that are visually similar to the accessed images but are not recommended, and $F$ is the set of false positive images that are visually dissimilar to the accessed images but are recommended incorrectly.

TABLE I. PRECISION AND RECALL FOR OUR JUSTCLICK SYSTEM TO LOOK FOR SOME PARTICULAR IMAGES VIA INTENTION-DRIVEN IMAGE RECOMMENDATION

Table I gives the precision and recall of our JustClick system for personalized image recommendation. From these experimental results, one can observe that our system can support personalized image recommendation effectively. Thus, the visual properties of the images (i.e., characterized by using the low-level visual features) are very important for achieving more effective image retrieval, even though the low-level visual features may not be able to carry the semantics of image contents directly. It is also worth noting that such an interactive process for intention-driven image recommendation can be achieved in real time, and thus our JustClick system can support interactive image exploration on-line.
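The evaluation metric in (36) amounts to the usual precision/recall computation over one recommendation round; a set-based sketch with hypothetical names:

```python
def precision_recall(recommended, relevant):
    # recommended: ids returned by the system for one click;
    # relevant: ids that are truly visually similar to the accessed image.
    tp = len(recommended & relevant)       # similar and recommended
    fp = len(recommended - relevant)       # dissimilar but recommended
    fn = len(relevant - recommended)       # similar but missed
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall
```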

V. CONCLUSION

In this paper, we have developed a novel framework called JustClick to enable personalized image recommendation via exploratory search from large-scale collections of Flickr images. The topic network is automatically generated to summarize and visualize large-scale image collections at a semantic level. The nonlinear visual similarity contexts between the images are exploited to select a limited number of most representative images to summarize large amounts of semantically-similar images under the same topic according to their representativeness scores. An interactive user-system interface is designed to allow users to express their time-varying query intentions easily for directing our system to find more relevant images according

to their personal preferences. Our experiments on large-scale image collections (1.5 billion Flickr images) with diverse semantics (4000 image topics) have obtained very positive results.

REFERENCES

[1] M. S. Lew, N. Sebe, C. Djeraba, and R. Jain, "Content-based multimedia information retrieval: State of the art and challenges," ACM Trans. Multimedia Comput. Commun. Appl., vol. 2, no. 1, pp. 1–19, 2006.

[2] N. Sebe and Q. Tian, "Personalized multimedia retrieval: The new trend?," in Proc. ACM Workshop Multimedia Inf. Retrieval, 2007, pp. 299–306.

[3] G. Marchionini, “Exploratory search: from finding to understanding,”Commun. ACM, vol. 49, no. 4, pp. 41–46, 2006.

[4] S. Santini, A. Gupta, and R. Jain, "Emergent semantics through interaction in image databases," IEEE Trans. Knowl. Data Eng., vol. 13, no. 3, pp. 337–351, 2001.

[5] A. Jaimes, "Human factors in automatic image retrieval system design and evaluation," Proc. SPIE, vol. 6061, pp. 101–109, 2006.

[6] J. S. Hare, P. H. Lewis, P. G. B. Enser, and C. J. Sandom, "Mind the gap: Another look at the problem of the semantic gap in image retrieval," Proc. SPIE, vol. 6073, pp. 205–216, 2006.

[7] M. Naphade, J. R. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, and J. Curtis, "Large-scale concept ontology for multimedia," IEEE Multimedia, vol. 13, no. 3, pp. 86–91, 2006.

[8] J. Fan, Y. Gao, and H. Luo, "Multi-level annotation of natural scenes using dominant image components and semantic concepts," in Proc. ACM Multimedia, 2004, pp. 540–547.

[9] E. Chang, K. Goh, G. Sychay, and G. Wu, "CBSA: Content-based annotation for multimodal image retrieval using Bayes point machines," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 1, pp. 26–38, Feb. 2002.

[10] A. Vailaya, M. Figueiredo, A. K. Jain, and H. J. Zhang, "A Bayesian framework for semantic classification of outdoor vacation images," Proc. SPIE, vol. 3656, pp. 120–132, 1998.

[11] J. Fan, Y. Gao, and H. Luo, "Integrating concept ontology and multitask learning to achieve more effective classifier training for multilevel image annotation," IEEE Trans. Image Process., vol. 17, no. 3, pp. 407–426, 2008.

[12] Y. Rubner, C. Tomasi, and L. Guibas, "A metric for distributions with applications to image databases," in Proc. IEEE ICCV, 1998, pp. 59–66.

[13] G. P. Nguyen and M. Worring, "Interactive access to large image collections using similarity-based visualization," J. Vis. Lang. Comput., vol. 19, no. 2, pp. 203–224, 2008.

[14] B. Moghaddam, Q. Tian, N. Lesh, C. Shen, and T. S. Huang, "Visualization and user-modeling for browsing personal photo libraries," Int. J. Comput. Vis., vol. 56, pp. 109–130, 2004.

[15] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, Aug. 2000.

[16] J. Lamping and R. Rao, "The hyperbolic browser: A focus+context technique for visualizing large hierarchies," J. Vis. Lang. Comput., vol. 7, pp. 33–55, 1996.

[17] M. Girolami, "Mercer kernel-based clustering in feature space," IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 780–784, 2002.

[18] B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput., vol. 10, no. 5, pp. 1299–1319, 1998.

[19] A. Ben-Hur, D. Horn, H. T. Siegelmann, and V. Vapnik, "Support vector clustering," J. Mach. Learn. Res., vol. 2, pp. 125–137, 2001.

[20] S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," in Proc. WWW, 1998, pp. 107–117.

[21] J. Teevan, S. Dumais, and E. Horvitz, "Personalized search via automated analysis of interests and activities," in Proc. ACM SIGIR, 2005, pp. 449–456.

[22] Y. Gao, J. Fan, H. Luo, and S. Satoh, "A novel approach for filtering junk images from Google search results," in Proc. MMM'08, 2008, pp. 1–12.

[23] R. Fergus, P. Perona, and A. Zisserman, "A visual category filter for Google images," in Proc. ECCV, 2004, vol. 3021, pp. 242–256.

[24] D. Cai, X. He, Z. Li, W.-Y. Ma, and J.-R. Wen, "Hierarchical clustering of WWW image search results using visual, textual, and link information," in Proc. ACM Multimedia, 2004, pp. 952–959.

[25] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, "Relevance feedback: A power tool in interactive content-based image retrieval," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 5, pp. 644–655, Oct. 1998.



[26] S. Tong and E. Chang, "Support vector machine active learning for image retrieval," in Proc. ACM Multimedia, 2001, pp. 107–118.

[27] T. F. Cox and M. A. Cox, Multidimensional Scaling. New York: Chapman & Hall, 2001.

[28] C. Leacock, M. Chodorow, and G. A. Miller, "Combining local context and WordNet similarity for sense identification," in WordNet: An Electronic Lexical Database, pp. 265–283, 1998.

Jianping Fan received the M.S. degree in theoretical physics from Northwestern University, Xi'an, China, in 1994 and the Ph.D. degree in optical storage and computer science from the Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai, China, in 1997.

He was a Postdoctoral Researcher at Fudan University, Shanghai, China, during 1998. From 1998 to 1999, he was a Researcher with the Japan Society for the Promotion of Science (JSPS), Department of Information System Engineering, Osaka University, Osaka, Japan. From September 1999 to 2001, he was a Postdoctoral Researcher in the Department of Computer Science, Purdue University, West Lafayette, IN. In 2001, he joined the Department of Computer Science, University of North Carolina at Charlotte as an Assistant Professor and then became an Associate Professor. His research interests include image/video analysis, semantic image/video classification, personalized image/video recommendation, surveillance videos, and statistical machine learning.

Daniel A. Keim received the Ph.D. degree in computer science from the University of Munich, Munich, Germany, in 1994.

He is a Full Professor in the Computer and Information Science Department, University of Konstanz. He has been an Assistant Professor in the Computer Science Department, University of Munich, and an Associate Professor in the Computer Science Department, Martin Luther University Halle. He also worked at AT&T Shannon Research Laboratories, Florham Park, NJ. In the field of information visualization, he developed several novel techniques which use visualization technology for the purpose of exploring large databases. He has published extensively on information visualization and data mining, and he has given tutorials on related issues at several large conferences, including Visualization, SIGMOD, VLDB, and KDD.

Dr. Keim was program cochair of the IEEE Information Visualization Symposia in 1999 and 2000, the ACM SIGKDD Conference in 2002, and the Visual Analytics Symposium in 2006. Currently, he is on the Editorial Board of the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, the Knowledge and Information Systems journal, and the Information Visualization journal. He is a member of the IEEE Computer Society.

Yuli Gao received the B.S. degree in computer science from Fudan University, Shanghai, China, in 2002 and the Ph.D. degree in information technology from the University of North Carolina at Charlotte in 2007.

He then joined HP Labs, Palo Alto, CA. His research interests include computer vision, image classification and retrieval, and statistical machine learning.

Dr. Gao received an award from IBM as an Emerging Leader in Multimedia in 2006.

Hangzai Luo received the B.S. degree in computer science from Fudan University, Shanghai, China, in 1998 and the Ph.D. degree in information technology from the University of North Carolina at Charlotte in 2006.

In 1998, he joined Fudan University as a Lecturer. He joined East China Normal University, Shanghai, China, as an Associate Professor in 2007. His research interests include computer vision, video retrieval, and statistical machine learning.

Dr. Luo received a second place award from the Department of Homeland Security in 2007 for his excellent work on video analysis and visualization for homeland security applications.

Zongmin Li received the M.S. degree from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 1992 and the Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2005.

From 2007 to 2008, he was a Visiting Professor with the University of North Carolina at Charlotte. He is currently a Professor of Computer Science at China University of Petroleum, Dongying. His research interests include image processing, pattern recognition, and computer graphics.