
Annotating Personal Albums via Web Mining

    Jimin Jia MOE-Microsoft Key Lab of MCC

    University of Science and Technology of China Hefei 230027, China


    Nenghai Yu MOE-Microsoft Key Lab of MCC

    University of Science and Technology of China Hefei 230027, China


    Xian-Sheng Hua Microsoft Research Asia

    49 Zhichun Road Haidian District

    Beijing 100080, China


ABSTRACT Nowadays personal albums are becoming increasingly popular due to the explosive growth of digital image capturing devices. An effective automatic annotation system for personal albums is desired for both efficient browsing and search. Existing research on image annotation evolves through two stages: learning-based methods and web-based methods. Learning-based methods attempt to learn classifiers or joint probabilities between images and concepts, which makes it difficult to handle large-scale concept sets due to the lack of training data. Web-based methods leverage web image data to learn relevant annotations, which greatly expands the scale of concepts. However, they still suffer from two problems: the query image lacks prior knowledge, and the annotations are often noisy and incoherent. To address these issues, we propose a web-based annotation approach that annotates a collection of photos simultaneously, instead of annotating them independently, by leveraging the abundant correlations among the photos. A multi-graph similarity propagation based semi-supervised learning (MGSP-SSL) algorithm is used to suppress the noise in the initial annotations from the Web. Experiments on real personal albums show that the proposed approach outperforms existing annotation methods.

    Categories and Subject Descriptors H.3.3 [Information Search and Retrieval]: Retrieval Models.

    General Terms Algorithms, Experimentation

Keywords Image annotation, personal albums, multi-graph, semi-supervised learning, similarity propagation

1. INTRODUCTION Recent years have witnessed a rapid growth in the number of digital personal photo albums, due to the popularity of digital cameras, mobile phone cameras and other portable digital imaging devices. With such a large number of personal photo albums, users often encounter difficulties when browsing an album or searching for certain photos. Therefore, effective photo album organization and management has emerged as an important topic [25].

Providing meaningful keywords for the albums is an effective way to manage photo albums, as text indexing and ranking techniques can be easily incorporated [33]. With semantic annotations, it is more convenient for users to browse or search images in their albums. Furthermore, more and more people are willing to upload their own albums to popular image sharing communities like Flickr. If the uploaded albums are associated with accurate descriptions, the experience of image sharing and search on the Web will be significantly improved.

Manual image annotation is accurate, but it relies heavily on human effort and is thus labor-intensive and time-consuming. To reduce this effort, contemporary commercial software products, such as Picasa and ACDSee, allow users to manually select images on the same topic and then provide descriptive text for the series of images. Although such software attempts to ease the tedious labeling work with friendly and attractive interfaces, it still requires considerable labor and time. Therefore, effective automatic annotation for personal albums is highly desired.

Generally, automatic image annotation can be categorized into two scenarios: learning-based methods and web-based methods. In learning-based methods, a statistical model is built to learn a classifier or the joint probabilities between images and annotations. The main problem of this scenario is that the learned models depend heavily on the size and distribution of the training dataset. However, a sufficient training dataset is difficult to obtain since it requires high labeling costs. For example, the commonly used image annotation dataset, Corel, contains only 371 words, and many popular words like ‘ipod’ and ‘mp3’ are not included. For an arbitrary image, it is difficult to provide abundant and accurate annotations based on such limited-size concept sets. Therefore, existing learning-based methods are difficult to extend to large-scale image annotation tasks. On the other hand, web-based annotation methods leverage web-scale data, which can cover an unlimited vocabulary. A typical web-based method is AnnoSearch, proposed by Wang et al. in [7]. For a target image, an initial keyword is provided to conduct a text-based search on the web database. Then a content-based approach is adopted to search for visually similar images, and annotations are extracted from the descriptions of both visually and semantically similar images. Other methods [6][8][34] further extend Wang et al.’s framework and propose more general image annotation schemes in which initial keywords are not required.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM’08, October 26-31, 2008, Vancouver, British Columbia, Canada. Copyright 2008 ACM 978-1-60558-303-7/08/10...$5.00.
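The annotation-mining step in such web-based methods can be sketched roughly as follows. This is our own illustrative simplification, not the actual AnnoSearch algorithm (the function name, stopword list, and toy data are assumptions): after text-based and visual search returns similar web images, candidate keywords are scored by how often they occur across the descriptions of those images.

```python
from collections import Counter

# Hypothetical stopword list; real systems use far larger ones.
STOPWORDS = frozenset({"the", "a", "an", "of", "in", "on", "at", "and"})

def mine_annotations(descriptions, top_k=5):
    """Score candidate annotations for a query image by how often each
    word appears across the descriptions of its similar web images."""
    counts = Counter()
    for desc in descriptions:
        for word in desc.lower().split():
            if word not in STOPWORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_k)]

# Toy descriptions of web images returned by text + visual search.
descriptions = [
    "the forbidden city palace",
    "a palace in beijing",
    "forbidden city palace gate",
]
print(mine_annotations(descriptions, top_k=3))  # most frequent words rank first
```

Real systems refine this frequency ranking with, e.g., visual-similarity weighting of each description, but the core idea is the same: agreement across many similar web images is evidence for an annotation.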

Although web-based methods overcome the insufficiency of training data in learning-based methods, they still suffer from two problems. One is the lack of prior knowledge about the query image, which also exists in learning-based methods: users bring substantial prior knowledge to the interpretation of an image, which is difficult for computers to replicate. The other is that the annotations extracted from the descriptions of web images are noisy, so a refinement process is needed.

To address the aforementioned problems, we propose to simultaneously annotate a collection of photos, instead of individual photos, by leveraging a web image database. To overcome the lack of prior knowledge in a single image, we utilize image correlations to facilitate annotating personal photo albums. For example, given Figure 1(a) alone, it is difficult even for a human to tell what the architecture is. However, given the images in Figure 1(a)-(e), it is much easier to conclude that the architecture is a palace in the Forbidden City. We argue that a collection of images brings more prior knowledge, and that the image correlations in the album can be exploited to improve annotation. Moreover, by grouping the images into clusters, we can build better image queries for learning annotations from similar web images.

However, though multiple photos help infer better annotations from the descriptions of web images, the annotations are still noisy. To address this problem, we propose a multi-graph similarity propagation based semi-supervised learning (MGSP-SSL) approach to reduce the noise and improve annotation coherence. Various properties of personal photo albums, such as visual similarity, temporal consistency and word correlation, are employed to mine the correlations in the album, and a modified graph-based semi-supervised learning scheme is designed accordingly. It is worth highlighting that the inter-relationships between different graphs are learned by a similarity propagation model. Using the modified graph-based semi-supervised learning framework and the similarity propagation model, the annotations for images in the album are cleaned and re-ranked.
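To make the graph-based propagation idea concrete, the following is a minimal sketch of classic graph-based label propagation over a fixed convex combination of similarity graphs (in the style of Zhou et al.'s local-and-global-consistency update). It is our own simplification, not the paper's MGSP-SSL algorithm; in particular, it fixes the graph combination weights rather than learning inter-graph relationships via similarity propagation.

```python
import numpy as np

def propagate(graphs, weights, Y, alpha=0.5, iters=100):
    """Label propagation over a convex combination of similarity graphs.

    graphs  : list of (n, n) symmetric affinity matrices
              (e.g. visual, temporal, textual similarity among photos)
    weights : per-graph combination weights (assumed fixed here)
    Y       : (n, k) initial annotation scores mined from the Web
    Returns refined (n, k) scores: noisy initial labels are smoothed
    toward the labels of similar images in the album.
    """
    W = sum(w * G for w, G in zip(weights, graphs))
    d = W.sum(axis=1)
    d[d == 0] = 1.0                       # guard isolated nodes
    D = np.diag(1.0 / np.sqrt(d))
    S = D @ W @ D                         # symmetrically normalized graph
    F = Y.copy()
    for _ in range(iters):
        # blend neighborhood consensus with the initial web annotations
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F
```

With alpha controlling the trade-off between smoothness over the graph and fidelity to the initial web-mined scores, an image that shares strong visual or temporal edges with confidently annotated neighbors inherits (and cleans up) their annotations.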

Another advantage of the framework is that user interaction can be easily incorporated to help improve the annotations. If users are not satisfied with certain results, they can modify them, and the system produces updated annotations by propagating the users' labels to other images. Moreover, manual annotation of the selected images also reflects the users' preferences, so the final annotations are closely related to their personalized interests.

    The contributions of the work are multifold:

1. We propose to mine annotations for personal albums by leveraging a web database. To the best of our knowledge, this is the first work that attempts to automatically annotate a batch of images simultaneously.

    2. We exploit image correlations in personal albums to overcome the lack of prior knowledge in existing image annotation methods.

3. A multi-graph similarity propagation based semi-supervised learning (MGSP-SSL) method is used to reduce noise in the initial web-based annotations. The method combines annotation propagation with the inter-relationships between different graphs to re-rank the initial annotations.

4. User interaction can be easily incorporated during annotation, and the annotations for all images in the album are re-ranked accordingly.

The organization of the paper is as follows. Section 2 introduces related work. Section 3 presents the annotation framework in detail. User inter