Web Image Retrieval

Bielefeld UniversityFaculty of TechnologyApplied Informatics

Bachelor Thesis

Retrieval of Web Images forComputer Vision Research

September 28, 2009

Author:Maikel Linke (NWI)malinke techfak.uni-bielefeld.de

Supervisors:Dipl.-Inform. Marco KortkampPD Dr.-Ing. Sven Wachsmuth

ii

Abstract

Today, the web is an important source of image data for computer vision research.Search engines and online photo albums allow the retrieval of images related to a givenkeyword. This work contains an overview of these services and their usage. Additionally,it describes the development of a common programming interface to access differentservices. The developed Java library implements the most prominent services.

iv

Contents

1 Introduction 1

2 Related Work 32.1 Datasets generated from the Web . . . . . . . . . . . . . . . . . . . . . . . 32.2 Image Services and their Usage . . . . . . . . . . . . . . . . . . . . . . . . 42.3 Ranking of Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3 Requirements 93.1 The Domain of Web Image Retrieval . . . . . . . . . . . . . . . . . . . . . 93.2 The User’s Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.3 The Developer’s Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.4 Derived Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4 Design 134.1 The User Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134.2 The Architecture Behind . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

5 The Developer’s Guide 175.1 Implementing a new Provider . . . . . . . . . . . . . . . . . . . . . . . . . 175.2 Modifying existent Source Code . . . . . . . . . . . . . . . . . . . . . . . . 185.3 Limits of Pure Extending . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

6 A brief API User Guide 196.1 Getting Web Image Providers . . . . . . . . . . . . . . . . . . . . . . . . . 196.2 Configuring and Executing . . . . . . . . . . . . . . . . . . . . . . . . . . 206.3 Receiving Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

7 Evaluation 237.1 Performance Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237.2 Exemplary Precision Comparison . . . . . . . . . . . . . . . . . . . . . . . 23

8 Summary 27

A Additional Material 33A.1 Performance Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33A.2 Precision Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

B Implementation Notes 37

v

B.1 The Web Image Retrieval API . . . . . . . . . . . . . . . . . . . . . . . . 37B.2 Evaluation Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37B.3 Used Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

C Declaration of Independent Work 39

vi

1 Introduction

Computer vision research requires large datasets of images. The World Wide Web isan immense source of data and rapidly growing1. The number of images on websitesis growing with the web. And photo sharing services like Flickr report thousands ofuploads per minute2. These images can be accessed through a keyword based searchprovided by different search engines and application programming interfaces (APIs).

Therefore a lot of research uses the web as an image data source. Chapter 2 gives anoverview of some important papers. While older algorithms used some fixed datasets toclassify visual objects, new ones try to learn from noisy datasets retrieved from imagesearch engines or photo sharing services3. Most approaches are tested with only afew services or just one. But the more sources an algorithm supports with—constantquality—the more general and adaptive it is. Some papers name the sources only asexample to indicate, that their algorithm works with multiple sources. This is the firstmotivation to develop a common interface for different sources. So algorithms can bedeveloped source-independently and their actual task can be focused.

Another motivation to combine multiple services is, that all services are limited. Mostsearch engines limit the number of search results to one thousand per query. Withmultiple services more images can be retrieved. But there are also some techniques liketranslating the query to other languages to achieve more results. A common librarycould provide such techniques for all supported services.

Third, this work tries to expose further advantages of combining multiple services.Possible advantages are

• easy using through a common interface,

• efficiency by a common infrastructure like a thread pool and

• an easily extendable library to maximize the number of results and to keep up withthe ever-changing web.

In the end, such an open library could support the community by merging web imageretrieval knowledge. Future work in computer vision research could use such a libraryinstead of reinventing the wheel every time.

Chapter 2 introduces literature concerned with datasets and web images. There isalso evaluated, which service-support should be implemented first. The requirements ofthe system are explored in Chapter 3. This leads to the software design presented in

1http://news.netcraft.com/archives/web_server_survey.html2http://www.flickr.com/3This evolution is described in the introduction of [5].

1

http://news.netcraft.com/archives/web_server_survey.html

http://www.flickr.com/

Chapter 4. In Chapter 5 is described, how the software works and how new modules arewritten to extend the library. Chapter 6 explains how to use the library as API user.An exemplary precision comparison of different services and a performance test of thedeveloped library are described in Chapter 7. The results of this work are summarizedin Chapter 8.

2

2 Related Work

The introduced works below describe computer vision algorithms. These algorithmsneed a large amount of image data. Focusing on the pure retrieval (collecting anddownloading) of these images is a way of gaining hints how to design a web imageretrieval system. For that purpose, this chapter is not sorted by papers. It is dividedinto the generation of datasets and the usage of search services. Both points to theranking of these services to decide, which service-support should be implemented first.

2.1 Datasets generated from the Web

This section summarizes some popular datasets, which were build by using images fromthe web. Some datasets also contain non-web images. Retrieval details are not exposedin all cases. A statistical overview of the datasets is given by Table 2.1 on page 5.

Caltech-101 (2004). The paper [7] presents a way of learning object categories usingprior information. The images were collected by using Google Images and then filteredmanually. The result is a dataset of 101 object categories containing over 9,000 images.The dataset can be downloaded at http://www.vision.caltech.edu/.

Faces in the Wild (2004). In 2004 Berg et al. published [2]. It describes automaticassociation of faces and names from the caption of news images. For this purpose, theycollected almost 45,000 images from Yahoo News. The dataset can be downloaded athttp://www.tamaraberg.com/faceDataset/.

Pascal VOC (2005, 2006, . . . ). The 2005 Pascal Visual Object Classes Challenge,described in [5], was a competition of twelve teams. The task was to recognize objectsfrom four visual object classes. At a certain time a development kit and datasets werepublished. Three weeks later the test data has been released. One of the test datasetshas been collected from Google Images and contained 654 images. In 2006 the VOCChallenge ([6]) used an image database collected from personal photographs, Flickrand the Microsoft Research Cambridge (MSRC). It contained over 5,000 images. Thechallenge still takes place every year and all datasets can be downloaded at http://pascallin.ecs.soton.ac.uk/challenges/VOC/.

MSRC (2005). Winn, Criminisi and Minka present an algorithm for automatic cat-egorization of object classes from images in [27]. The test images have been col-lected from acquired photographs, images from the web and from a Pascal dataset

3

http://www.vision.caltech.edu/

http://www.tamaraberg.com/faceDataset/

http://pascallin.ecs.soton.ac.uk/challenges/VOC/

http://pascallin.ecs.soton.ac.uk/challenges/VOC/

(see Pascal VOC above). It is not mentioned, how the web images have been re-trieved. The dataset can be downloaded at http://research.microsoft.com/en-us/projects/objectclassrecognition/.

Animals on the Web (2006). Berg and Forsyth searched for images using Google’s websearch and combined text and visual information for clustering the downloaded images.They selected ten animals as categories and downloaded over 26,000 images. Afterwardsthey re-ranked the images to get good results. The category sizes and precision isdescribed in [3]. The dataset can be downloaded at http://www.tamaraberg.com/animalDataset/.

Caltech-256 (2007). Very similar to the collecting approach of the Caltech-101 datasetis the collecting of the Caltech-256 dataset. In addition to Google Images also Picsearchhas been used to collect the images. By combining the first results of two search enginesthe precision of the collected categories could be increased. Searching and Filteringresulted in 256 object categories with over 30,000 images. The dataset can be downloadedat http://www.vision.caltech.edu/.

LabelMe (2008). The goal of the work in [20] was to collect a large number of imageswith ground truth labels. Google Images, Flickr and Altavista Images have been usedto collect the images and a new web-based tool to label the images was developed. In2008 the dataset contained over 30,000 images with over 100,000 annotations in 183categories. The dataset can be downloaded at http://labelme.csail.mit.edu/.

ImageNet (2009). [4] describes the building of ImageNet, a structured index like Word-Net, but containing images. Several, not named search engines have been used to retrievecandidate images. The 5200 categories were filtered manually and build a dataset of 3.2million images. The dataset can be downloaded at http://www.image-net.org/.

2.2 Image Services and their Usage

A lot of search services are provided in the web. Some of them are listed in this section.The list contains all services found in important papers plus randomly found services.An overview is given in Table 2.2.

Altavista provides its search service since 1995 and is one of the very first search engines,which is still on the market. It supports image, especially photo, and video search.Torralba et al. used Altavista, Ask, Flickr, Cydral, Google, Picsearch and Webshotsto collect 80 million tiny images stored with 32 pixels height and width and analyzedobject classification described in [22].

4

http://research.microsoft.com/en-us/projects/objectclassrecognition/

http://research.microsoft.com/en-us/projects/objectclassrecognition/

http://www.tamaraberg.com/animalDataset/

http://www.tamaraberg.com/animalDataset/

http://www.vision.caltech.edu/

http://labelme.csail.mit.edu/

http://www.image-net.org/

Name Images Classes Annot. Used Web-Service(s) Ref.Caltech-101 8,765 102 8765 Google Images [7]Faces in the Wild 30,281 — — Yahoo News [2]Pascal VOC 2005 654 4 654 Google Images [5]Pascal VOC 2006 5,304 10 5455 Flickr [6]MSRC 591 23 1751 — [27]Animals on the Web ca. 26,000 10 — Google Web [3]Caltech-256 30,607 257 30607 Google Images,

Picsearch[10]

LabelMe 30,369 183 111490 Google Images, Flickr,Altavista Images

[20]

ImageNet 3,247,902 5247 several [4]

Table 2.1: Popular datasets, which were build using the web. Column four lists thenumber of annotations, while the annotation style is varying and not listed.This table does not claim to be complete.

Ask searches for websites, images, news, videos, maps and “answers”. No special optionsfor the image search are provided, but there are distinct portals for some countries withdifferent search results for the same keyword. Ask was used in [22] as described inthe paragraph Altavista. Popescu, Moellic and Millet selected Ask to build an imagedatabase of over 25,000 images because of the high precision for queries with uniqueterms. In [18] they wrote:

For a set of 20 queries with familiar concepts and 50 images per query, themean precision is around 80%, while Picsearch, Yahoo! and Google obtainrespectively 70%, 63% and 56%.

Bing is the name of Microsoft’s new search engine, online since June 2009. The previousversion was called Live Search. Search results are accessible through an API supportingmultiple protocols: JSON, XML and SOAP. During research for this work no paperusing Bing was found. The likely reason is the new release of Bing. But also the oldsearch engines of Microsoft could not be found in important papers.

Cydral is a pure image search engine used by [22] as mentioned in the paragraph Altavista.Until the middle of September 2009 it was not available. After that it provided only afew results1.

Flickr is a popular photo sharing service provided by Yahoo. The website reported,“5,831 uploads in the last minute”2. Multiple protocols are supported by the open APIwith the following response formats: REST, XML-RPC, SOAP, JSON, PHP and YQL.

142 images for “dog” and 91 images for “cat”, 09/15/20092http://www.flickr.com/, 09/08/2009, 16:41

5

http://www.flickr.com/

In 2008 at least five important papers used Flickr: [15, 17, 19, 22, 28]. Li et al. generated3D models of landmarks from a collection of landmark photos. Olivares, Ciaramita andvan Zwol used annotated photos from Flickr to enhance keyword based searches by usingFlickr notes to extract visual features. Raguram and Lazebnik used Flickr photos to findiconic images representing visual categories. The work of Torralba et al. is described inthe paragraph Altavista. Yang, Chen and Wang took over two million tags from Flickrto build a demo version of “Web 2.0 Dictionary”. In 2009 [1] also presented a way offinding iconic images.

Google Images is separated from the paragraph Google Web, because it was used bythe most papers focused in this research. The following 15 papers are using GoogleImages: [9] describes a visual category filter tested with Google Images. [26] groupsGoogle’s results in clusters trained by human labeled images. [5] reports about the PascalVOC 2005, where one test dataset was collected using Google Images. [8] presents analgorithm learning a category by its name without supervision. It retrieves the trainingdata from Google Images. [12] groups the results of an image search engine and presentsa new user interface to show and select the clustered search results (e.g. of GoogleImages). [13] clusters the results of six categories into groups of keywords. [25] studies theeffectiveness of ontology based approaches in image retrieval. [10] describes the retrievalof the Caltech-256 dataset mentioned before. [21] uses textual and visual features toretrieve a large number of images for a given object class. The authors compare threeapproaches: First, collecting images from a website found with Google Web. Second,searching with Google Images and downloading all images from the websites of thefound images. And third, downloading only the images found by Google Images. Theymeasured the precision of all three approaches (same order): 26%, 4% and 39%. [14]merges results from Google Images and Google Web and re-ranks the images using thegroundwork of [21]. [16] describes the re-ranking of Google’s image search results. A userselects wanted images and the algorithm ranks related images higher. [20] reports fromthe LabelMe dataset mentioned above. [22] uses Google Images among others and isdescribed in the paragraph Altavista. [23] follows a multi-modality ontology approach tore-rank images. [24] constructs a multi-modality ontology from Wikipedia and combinesthis with the previous work. Google provides no API for its image search and limits thesearch results to 1000.

Google Web can also be used to search images. Most found websites contain images.Some are related to the keyword, although the precision is very low. The text of thewebsite can be analyzed to evaluate the images. The previously mentioned [3] describesthe search for animal images on websites returned by Google. Even in [21] this methodis compared with Google’s image search and an approach between. [14] combines thesetwo approaches.

Picasa Web Albums is a photo sharing service like Flickr. It is provided by Google andcan be accessed through the Google Data API. [17, 28] mention this portal as alternativeto Flickr, but do not use it.

6

Picsearch searches for images, but also supports web search. It can filter animations,colored pictures, portraits and some picture sizes. In [12] candidate images for clusteringare retrieved from an image search engine (e.g. Picsearch). [10] reports from the Caltech-256 dataset generated with the results of Google Images and Picsearch. [22] is mentionedin Altavista and used Picsearch among many others.

Webshots is a photo sharing portal also used by [22].

Yahoo provides similar search services to Google. Many APIs are usable. The imagesearch returns many Flickr photos3. Yahoo News provides a photo portal with photocaptions used by [2, 11]. The aim of Berg et al. ([2]) was to identify the persons in thephotos. Jeon and Manmatha ([11]) tried to label the photos automatically.

Name Interface(s) Max. results Used ByAltavista 1020 [22]Ask 320 [18, 22]Bing JSON, XML, SOAP 1000 [22]Cydral — [22]Flickr REST, XML-RPC, SOAP,

JSON, PHP, YQL∞ [15, 17, 19, 22, 28, 1]

Google Images 1000 [9, 26, 5, 8, 12, 13, 25, 10,21, 14, 16, 20, 22, 23, 24]

Google Web 1000 [3, 21, 14]Picasa WebAlbums

Atom, JSON ∞ [17, 28]

Picsearch 10000 [12, 10, 22]Webshots ∞ [22]Yahoo 1008 [2, 11] (News Photos)

Table 2.2: Popular image services. This table does not claim to be complete.

2.3 Ranking of Services

Not all services can be implemented in this work. Thus the services have to be ranked todecide, which service-support should be implemented first. Many criteria are possible:used by many papers, new and not studied, good API support, many results, goodresults, good performance, et cetera.

In this work the most crucial criterion is the number of papers, which had used theservice. But the others also play a role. Additionally diverse services should be usedto develop a library supporting most services. The following shows an ordered list ofservices with keyword comments, which criteria were most decisive.

3Example: 11 of the first 18 results of a “dog” search came from Flickr, 09/10/2009, 10:15

7

1. Google Images, most used

2. Flickr, very often used, ready Java API

3. Bing, not used, but promising4 and API differing from Google Images and Flickr

4. Picsearch, often used, high number of results

5. Picasa Web Albums, used, Java API

6. Webshots, many results

7. Altavista

8. Ask, few results, but multiple portals with different results

9. Yahoo, overlap with Flickr

10. Google Web, web search not focused, low precision

11. Cydral, not available or very few results at the moment

The first three services have been implemented in the developed library. The remaininglist can be seen as suggestion, which services should be supported next.

4The evaluation described in Section 7.2 educes a relative low precision.

8

3 Requirements

The requirements of a web image retrieval library are evolved in this chapter. First,the domain of web image retrieval is introduced. Then Section 3.2 lists the goals of alibrary user, while Section 3.3 names the ones of a developer. From these goals furtherrequirements are derived in Section 3.4.

3.1 The Domain of Web Image Retrieval

Most of the terms used in this domain were already mentioned in the previous chapters.Figure 3.1 connects these terms. Search is the main term. What the program shoulddo, is searching for images and providing them. The Search provides a number of WebImages. The Query defines, what kind of images are requested. In this case: all imagesrelated to a given Keyword1. A Keyword comes from a Language, even if some wordsare the same in different languages. How many images are demanded, defines the ImageQuantity. More parameters are imaginable2.

1

Image Quantity

specified by

0..*

provides

executes

Web ImageURL

Keyword

Language

Query Search

generalizes

generalizes

PhotoSharing Portal

ImageSearch Service

Service

Figure 3.1: A domain model without software specific terms.

1“. . . keyword-based search is the de-facto model for query formulation on the Web.” [17]2Common query parameters of the used services Bing, Flickr and Google are: activating a safe-search

filter, setting up a license of the images and defining a media-type, for example “photos”. Eachservice has many more individual search parameters.

9

3.2 The User’s Goals

What are the user’s goals? He wants to retrieve and process images. What kind ofprocessing he does, should not matter. To retrieve images, he needs something providingthese images. This implicates two new terms—Web Image Receiver and Web ImageProvider—generalizing the problem. The user also wants to

• use the library out of the box,

• learn the usage of the library quickly,

• define the image quantity,

• define how many images are needed,

• retrieve no duplicates, if not desired,

• know everything about found images,

• process the images efficiently, and

• select which services should be used.

3.3 The Developer’s Goals

The developer extends the library to increase the number of supported services. Shewants to

• understand the library quickly,

• write as little code as possible,

• put aside common tasks, that are not service specific, and

• benefit from the already coded infrastructure.

3.4 Derived Requirements

What else implicate these goals? Some of them can be directly translated into softwaredesign. Others are more general.

To use the library out of the box, it needs a good default configuration. This isnot possible in all cases. For example the Flickr API needs registered user keys forauthentication. So this service should not be activated until these keys are given. Thisexample also shows, that the services have to be initialized individually. So the libraryneeds something managing these initializations, for example a factory3. The library

3Explained in the Factory design pattern.

10

should also provide a minimal implementation for direct use. Such an implementationcan also serve as an example code.

To learn the usage of the library quickly, it should provide a minimal number of publicinterfaces to use. The details should be encapsulated4.

Sometimes the quantity of available images for a given query is lower than the desiredquantity. A very special query can result in no or only a few images. Therefore only amaximum of images is defined in a query.

Avoiding duplicate images needs a central register of all found images. The usage ofthis register should be optional.

To process the images efficiently, every image should be processed as soon as it isavailable. A concurrent search for more images should go on. This implicates a kind ofthread management and an event-based data processing. This means, that the imagesare not returned as collection, but a process image method is called, when an image isavailable. The event-based processing is not constrained, but a simple solution for thisproblem.

On the other hand, the developer has her own goals. A simple architecture with clearprogramming interfaces are also needed. But the developer’s interfaces differ from theuser’s interfaces. Therefore these interfaces should be divided into separate packages.

To write as little code as possible common abstract classes and tools should be given.That way the developer has to write only the service specific code. These tools shouldprovide all common tasks and encapsulate details to allow the developer to put asidethese details. The more reusable the code is designed, the more a developer can benefitfrom it.

All these requirements have been evolved step by step during the whole softwaredevelopment phase. They are included in the software design presented in the nextchapter.

4Encapsulation is an object oriented design principle, which hides complexity.

11

12

4 Design

In the previous chapter the requirements of a web image retrieval library are described.These lead to the design demonstrated in this chapter. It is divided into the user inter-faces derived from the user’s perspective, and the remaining architecture behind theseinterfaces, which deals with the developer’s needs and all other derived requirements.

4.1 The User Interfaces

Derived from the user’s goals, a minimal set of public interfaces can be designed. Itis presented in Figure 4.1. The client code can get WebImageProviders from the We-bImageProviderFactory. Providers can be configured with query information and thenbe executed. Providing images runs concurrently to the rest of the client code. Afound image is delivered through the processWebImage method of the WebImageReceiverinterface. The client code has to implement this interface to receive the images.

receives fromreceives

WebImage+getImage(): byte[]+getUrl(): URL+getSearchKey(): String+getRank(): int

identify provider by

«enum»WebImageServiceLiterals:BINGFLICKRGOOGLE

instantiates

configuresand executes

«interface»WebImageProvider

+addReceiver(receiver: WebImageReceiver)+removeReceiver(receiver: WebImageReceiver)+configure(searchKey: String, maxNumberOfImages: int, languageOfKey: Language)+activateFiltering()+execute()+waitForEndOfProviding()

WebImageProviderFactory+setup(configurationFile: String)+availableServices(): WebImageService[0..*]+createWebImageProvider(service WebImageService): WebImageProvider

gets providers fromto receive

«interface»WebImageReceiver

+processWebImage(image: WebImage, source WebImageProvider)+providingFinished(source WebImageProvider)

Client Code

Figure 4.1: The basic set of user interfaces.

4.2 The Architecture Behind

Without further requirements, the whole rest of the code could be placed in the samepackage, not visible for the client code. But the developer needs some interfaces, too.So a package hierarchy is useful, where one package contains the user interfaces and onethe developer interfaces. Additionally each implemented service (WebImageProvider)should gain its own package, because they will change independently, when a servicechanges. And implementations added by different developers are desirable.

13

de.unibi.techfak.agai.webImageRetrieval

This is the root package of the library.

The place foruser interfaces.

provider

The place fordeveloper tools.

google

bing flickr

The placesfor eachWebImageProvider.

Figure 4.2: Overview of all packages.

HtmlDecoder+ampDecode(html: String): String

GoogleTranslator+translate(fromLang: Language, toLang: Language, query: String): String

ScheduledDownloader+downloadAndProcess(url: URL, byteArrayProcessor: ByteArrayProcessor, restTime: long = 0): Future<byte[]>+isNewUrl(url URL): URL

«interface»ByteArrayProcessor

+processByteArray(byteArray: byte [0..1])

FutureWebImage+FutureWebImage(searchKey: String, url: URL, rank: int, sender: ReceiverCaller)+processByteArray(byteArray: byte [0..1])

ImageCounter+ImageCounter(maxNumberOfImages: int)+countImage(): boolean+maximumReached(): boolean+actualNumber(): int+neededImages(): int

ImageDownloader+ImageDownloader(counter: ImageCounter, filterImages: boolean, caller: ReceiverCaller, searchKey: String, urlProvider: ImageUrlProvider)+download()

«interface»ImageUrlProvider

+receiverUrls(): URL [0..*]

ReceiverCaller+ReceiverCaller(receivers: ReceiverCollection, provider: WebImageProvider)+sendImage(image: WebImage)+finish()

ReceiverCollection+add(): boolean+remove(): boolean+isEmpty(): boolean+sendImage(image: WebImage, source WebImageProvider)+finish(source WebImageProvider)

DownloadingProviderAbstract operations to implement:#createUrlProvider(searchKey: String, lang: Language): ImageUrlProviderAccess to stored values:#getCounter(): ImageCounter

ThreadedImageProviderAbstract operations to implement:#executeInThread()Access to stored values:#getCaller(): ReceiverCaller

AbstractImageProviderAbstract operations to implement:#checkedExecute()#checkedWait()Access to stored values:#getSearchKey(): String#getMaxNumberOfImages(): int#getLanguageOfKey: Language#getReceivers: ReceiverCollection#filterDuplicates: boolean

Figure 4.3: All classes in the provider package.

14

While the root package contains all user interfaces, the sub-package provider was createdto store the developer tools. The first helpful class here is an AbstractImageProvider. Itimplements the WebImageProvider interface and carries for all architecture-implicatedresponsibilities: managing receivers, storing query configuration and handling of excep-tions. The number of methods, that have to be implemented, shrinks from six to two.

The next responsibility an WebImageProvider has, is to run in a thread. This providesthe ThreadedImageProvider. It extends the AbstractImageProvider and starts an ownthread. The providing finished message is sent last in the thread. Only one abstractmethod has to be implemented.

Then, all implemented providers query an internet server to get image URLs fora given query. After that, they have to download the images and send them to alllistening receivers. Collecting URLs, downloading images and calling the receivers arecommon tasks. But the collecting of URLs differs. The next abstraction is the classDownloadingProvider. All it needs is an ImageUrlProvider, which is gained via anabstract method.

These three abstract implementations cover all needed levels so far and all normal 1

providers extend the DownloadingProvider. All other classes in this package are toolsfor common problems and can be used for own code. The ReceiverCollection can beused to store and inform receivers. To bind a ReceiverCollection to a provider, theReceiverCaller can be used. An ImageDownloader requires such a caller for downloadedimages and an ImageUrlProvider to get the image URLs. To download the adequatenumber of images it also needs an ImageCounter. While the bytes of an image arebeing downloaded, additional information, for example the URL, has to be stored. Forthis purpose the FutureWebImage stores these information. It also implements theByteArrayProcessor interface to receive the downloaded bytes and send the ready webimage via a ReceiverCaller.

One of the most important utility classes is the ScheduledDownloader. It manages athread pool and is used for all downloads. One web host is not stressed by more than onedownload at a time. A delay can be given to make pauses between downloads from onehost. This prevents blocking of requests. Via the isNewUrl method duplicate downloadscan be avoided.

The HtmlDecoder replaces encoded ampersands in found URL strings. And theGoogleTranslator can translate text via Google’s translation service.

Two classes in the root package were not mentioned before. First, the FileWritin-gReceiver is an example implementation of WebImageReceiver and simply writes allreceived images to a directory. Second, the LanguageIterator generates providers fordifferent languages and calls them with translated search keys.

Figure 4.2 presents all classes of the library at once. Package private classes arealso visible. Generalizations and aggregations were all drawn, but not all simple usage-dependencies.

1There are two special providers in the root package. The LanguageIterator and the AllInOneProviderare decorating (design pattern) the providers. For this purpose, they only implement the WebImage-Provider interface.

15

de.unibi.techfak.agai.webImageRetrieval

provider

google

bing

flickr

«interface»ProviderDistributor

AllInOneProvider+DISTRIBUTOR: ProviderDistributor

ResultPage#ResultPage(pageContent: String)#ResultPage(pageBytes: byte [0..*])#getImageUrls(): URL [0..*]#nextResultPageUrl(): URL

GoogleImageSearch#GoogleImageSearch(lang: Language, searchKey: String)

GoogleImageProvider

UrlCreator#UrlCreator(searchKey: String, lang: Language)#creatUrl()

«enum»GoogleImagePortal

Literals:ANDORRABELARUSBULGARIA...

FlickrUrlProvider#FlickrUrlProvider(photosGate: FlickrPhotosGate, SearchKey: String, maxNumberOfImages: int)

FlickrPhotosGate#FlickrPhotosGate(key: String, privateKey: String)#search(params SearchParameters, imagesPerPage: int, page: int): PhotoList#getSizes(photoId: String): Size [0..*]

FlickrImageProvider+setup(key: String, privateKey: String)

BingImageUrlProvider#BingImageUrlProvider(urlTemplate: String, counter: ImageCounter)

BingImageProvider+BingImageProvider(applicationKey: String)

HtmlDecoder+ampDecode(html: String): String

GoogleTranslator+translate(fromLang: Language, toLang: Language, query: String): String

ScheduledDownloader+downloadAndProcess(url: URL, processor: ByteArrayProcessor, restTime: long = 0): Future<byte[]>+isNewUrl(url URL): URL

«interface»ByteArrayProcessor

+processByteArray(byteArray: byte [0..1])

FutureWebImage+FutureWebImage(searchKey: String, url: URL, rank: int, sender: ReceiverCaller)+processByteArray(byteArray: byte [0..1])

ImageCounter+ImageCounter(maxNumberOfImages: int)+countImage(): boolean+maximumReached(): boolean+actualNumber(): int+neededImages(): int

ImageDownloader+ImageDownloader(counter: ImageCounter, filterImages: boolean, caller: ReceiverCaller, searchKey: String, urlProvider: ImageUrlProvider)+download()

«interface»ImageUrlProvider

+receiverUrls(): URL [0..*]

ReceiverCaller+ReceiverCaller(receivers: ReceiverCollection, provider: WebImageProvider)+sendImage(image: WebImage)+finish()

ReceiverCollection+add()+remove()+isEmpty(): boolean+sendImage(image: WebImage, source WebImageProvider)+finish(source WebImageProvider)

DownloadingProviderAbstract operations to implement:#createUrlProvider(searchKey: String, lang: Language): ImageUrlProviderAccess to stored values:#getCounter(): ImageCounter

ThreadedImageProviderAbstract operations to implement:#executeInThread()Access to stored values:#getCaller(): ReceiverCaller

AbstractImageProviderAbstract operations to implement:#checkedExecute()#checkedWait()Access to stored values:#getSearchKey(): String#getMaxNumberOfImages(): int#getLanguageOfKey: Language#getReceivers: ReceiverCollection#filterDuplicates: boolean

«enum»Language

Literals:AFRICAANSALBANIANARABIC...

LanguageIterator+LanguageIterator(distributor: ProviderDistributor)+LanguageIterator(distributor: ProviderDistributor, languages: Language [0..*])

FileWritingReceiver+FileWritingReceiver()+FileWritingReceiver(writeDir: String)

ConfigurationReader#read(configFile: String): Map< WebImageService, Map<String, String> >

WebImage+getImage(): byte [0..*]+getUrl(): URL+getSearchKey(): String+getRank(): int «enum»

WebImageServiceLiterals:BINGFLICKRGOOGLE

«interface»WebImageProvider

+addReceiver(receiver: WebImageReceiver)+removeReceiver(receiver: WebImageReceiver)+configure(searchKey: String, maxNumberOfImages: int, languageOfKey: Language)+activateFiltering()+execute()+waitForEndOfProviding()

WebImageProviderFactory+setup(configurationFile: String)+availableServices(): WebImageService[0..*]+createWebImageProvider(service WebImageService): WebImageProvider+createDistributor(service WebImageService): ProviderDistributor

«interface»WebImageReceiver

+processWebImage(image: WebImage, source WebImageProvider)+providingFinished(source WebImageProvider)

Figure 4.4: All classes sorted in packages. Not all dependencies are visible.

16

5 The Developer’s Guide

A developer wants to extend the library. In this case, she wants to write new implemen-tations of WebImageProvider. There are two ways to realize this:

1. Modifying the existent source code.

2. Using the existent code as library and writing only new code.

The first manner is easier to code and described in the next two sections, while thesecond manner is better to maintain, but has some restrictions. These restrictions andsome possible solutions are given in Section 5.3.

5.1 Implementing a new Provider

As seen in Section 4.2, the easiest way to implement a new image provider is to extendthe DownloadingProvider. It asks for an ImageUrlProvider, which has to be given byimplementing the createUrlProvider method. This URL provider only has to providecollections of image URLs. Downloading the images, delivering them to the receivers,counting the images and finishing the providing is done by the DownloadingProvider.Helpful tools are the ScheduledDownloader to download anything from a given URLand the ImageCounter to avoid providing too many URLs, which will not be used. Thecounter should be taken from the DownloadingProvider, since it uses the same instanceto count the images.

If this solution does not fit, the ThreadedImageProvider can be used. An implemen-tation has to fill the executeInThread method. The whole configuration of the providercan be accessed via get-methods of the extended classes. The finish signal is sent afterreturning of the executeInThread method. The responsibilities of the implementing classare:

• downloading of images,

• counting of images and not downloading more than the maximum,

• delivering of downloaded images,

• returning, when all downloads are finished.

Perhaps the ImageDownloader can be used to fulfill these tasks. Useful, in most cases,will be the ScheduledDownloader and the ReceiverCaller, which can be accessed via theThreadedImageProvider. The FutureWebImage can be useful as well.

17

The simplest abstract implementation of WebImageProvider is the AbstractImage-Provider. It only checks the right usage (state) of the provider, checks the given inputand stores it.

5.2 Modifying existent Source Code

If a new provider has been implemented, it should be integrated in the user’s interfacesby the following steps:

• Adding an identifier of the new service to the WebImageService enumeration.

• Giving the ability to create new instances of the new provider to the WebImage-ProviderFactory.

• Eventually modifying the used configuration file to setup the factory correctly.

After these steps the new provider can be used like any other provider.

5.3 Limits of Pure Extending

If there is only new code to add, a new provider can be implemented as described above.What cannot be done is:

1. modifying WebImageService and

2. modifying WebImageProviderFactory.

WebImageService is a Java Enum, which cannot be extended. As a consequence, the newprovider cannot be identified by an instance of WebImageService and not be recognizedby an extended factory. If the client code depends on only one factory, an extendedfactory can be written, that takes general Enums as argument and evaluates, if it canhandle this Enum as service.

It is also not possible to extend the Language enumeration. The supported languagesand the translator tool have to be maintained in the library’s source code.

Last, the information provided by WebImage could be not sufficient. If more specialinformation about web images is needed, there are two possible solutions:

1. Extending of WebImage. A special provider creates this type and received Web-Images are casted, if they come from this provider.

2. Another special provider stores additional information and provides this via get-methods for given web images.

In the future, a solution should be found for the last problem. The remaining restric-tions are bearable for the present and can be solved by adding new interfaces.

18

6 A brief API User Guide

The usage of the library follows three steps:

1. getting one or more web image providers,

2. configuring and executing,

3. receiving of web images.

These three steps are described in the next sections. First test code is printed in List-ing 6.1. It is the shortest way to download some images and write them to the system’stemporary directory.

/∗ s e t t i n g v a r i a b l e s ∗/f ina l St r ing searchKey = ”dog” ;f ina l int numberOfImages = 3 ;f ina l WebImageReceiver r e c e i v e r = new Fi l eWr i t ingRece ive r ( ) ;/∗ g e t a s p e c i a l prov ider ,

d e c o r a t i n g a l l a v a i l a b l e p r o v i d e r s ∗/f ina l WebImageProvider prov ide r = new AllInOneProvider ( ) ;/∗ c o n f i g u r e and e x e c u t e ∗/prov ide r . addReceiver ( r e c e i v e r ) ;p rov ide r . c o n f i g u r e ( searchKey , numberOfImages ) ;p rov ide r . execute ( ) ;/∗ now r e c e i v i n g . . . ∗/prov ide r . waitForEndOfProviding ( ) ;

Listing 6.1: A short example to use the library.

6.1 Getting Web Image Providers

The example above shows the easiest way of getting a provider. It creates an AllInOne-Provider, which takes all available services and creates a provider of each service. Theavailable services are provided by the WebImageProviderFactory class. Which servicesare available depends on which services are configured. All services can be configured viathe setup method of the factory class. The configuration file could look like Listing 6.2on the next page.

19

<?xml version=” 1 .0 ” encoding=”UTF−8”?><prov ide r s>

<prov ide r name=”Bing”><parameter name=”appKey” value=””/>

</ prov ide r><prov ide r name=” F l i c k r ”>

<parameter name=”key” value=””/><parameter name=” privateKey ” value=””/>

</ prov ide r><prov ide r name=”Google”/>

</ prov ide r s>Listing 6.2: Example Configuration File. The values of the keys have to be inserted.

A more conventional way to get providers, is to query the WebImageProviderFactory.The createWebImageProvider method needs a service as argument and returns a newinstance of the wanted provider type. Similar to this method works the createProvider-

Distributor method. It returns an instance of the ProviderDistributor interface, that isable to produce any number of providers of the previous given type, without demandingany parameter. This interface allow other classes to work service-independently. Forexample the LanguageIterator needs such a distributor to generate a provider for eachlanguage and configures them with translated search keys. The AllInOneProvider pro-vides also a distributor for itself. This way it can be combined with the LanguageIteratorto retrieve the maximum number of images.

6.2 Configuring and Executing

Each provider has four methods to be configured:

• configure

• addReceiver

• removeReceiver

• activateFiltering

The first two methods have to be called before executing. Otherwise an IllegalStateEx-ception will be thrown. All these methods cannot be called after executing. A try willresult in the same exception.

The configure method takes the query information. The language of the key can beomitted.

Each provider demands at least one receiver. Calling addReceiver, then removeReceiver

and executing the provider will throw an IllegalStateException.

20

A call of activateFiltering instructs the provider to register each found image URLand provide the image, only if it was not provided before.

Executing means to call the execute method. It will return immediately. The provid-ing runs concurrently in an own thread.

6.3 Receiving Images

Listing 6.1 uses the FileWritingReceiver to receive all web images. It writes all receivedimages to the system’s temporary directory by default. But another directory can begiven in the constructor. This receiver is only an example and should be replaced bya new implementation. It is the place for the computer vision algorithm. If it receivesan image via the processWebImage method, it gets the web image and its source, theprovider, as reference. If the AllInOneProvider or the LanguageIterator is used, thesource will be one of the many providers created by these two special ones. And each ofthese many sources will send a finish signal.

The received WebImage provides the following information:

• the image file as byte array,

• the URL, from which the image was downloaded,

• the search key, with which the image was found, and

• the rank of the image, which is the position in the result list of the search.

21

22

7 Evaluation

This chapter is divided into two sections evaluating the work done. In Section 7.1 a testis described, which determines if the developed library performs better than a standardscript. Section 7.2 compares the precision of nine services exemplarily.

7.1 Performance Test

In this test scenario two programs downloaded a number of images related to a givensearch key. Their execution time were measured with the shell command time.

This procedure was executed ten times for each number of images: 1, 2, 4, 6, 10, 20,50, 100, 500, 1000. As search key “dog” was chosen, because enough images were foundwith this word.

The tested script is a shell script with wget and perl calls. It is printed in Listing A.1.Listing A.2 shows the Java test candidate. Both use Google Images as search engineand have the same timeouts for HTTP connections, without retrying on failures.

Figure 7.1 on page 25 shows the average execution times, which are also listed inTable A.1. The used script performs better, if only one image is to be downloaded.At this stage, the Java library is not able to benefit from concurrency. But with twoimages it can download concurrently already and terminates earlier. Henceforward, theexecution time of the script increases linearly, while the Java program only waits for thelatest concurrent download.

This test demonstrates, that the developed library performs better in almost all cases.The overhead generated by Java objects and thread pools is marginal.

7.2 Exemplary Precision Comparison

This section compares the precision of the first twenty images of nine different services.The precision of found images is an important factor for computer vision algorithms. Itis defined as the fraction of relevant images in the set of found images. More precisely:If F is the set of found images and R the set of all relevant images, then is:

precision =|R ∩ F ||F |

But, if only found images are observed, then this can be simplified. If f is the numberof found images and r the number of relevant found images, then is:

precision =r

f

23

The question, which images are relevant, is more difficult. Here, a simplified clas-sification is used, that bases on the annotation model of [21]. It describes the threeclasses in-class-good, in-class-okay and non-class with two sub-classes abstract and non-abstract. Since only in-class and non-class matter for the above defined precision, imageswill be labeled only with these two. This means, that also abstract images like drawingscan be in-class.

The compared services are: Altavista, Ask, Bing, Flickr, Google Images, Picasa, Pic-search, Webshots, and Yahoo. Since the intention of this section is to exemplify theprecision of services, only the following three animal names were chosen as search keys:dog, giraffe and mouse. Considering the first twenty images mostly represents the firstimpression of a service user. With these twenty images per search key, sixty images perservice are examined.

Table A.2 lists the measured precision for each service and search key. This datais illustrated in Figure 7.2. The results show, that the ambiguous term mouse is animportant factor. A mouse can be the animal or a computer mouse. Many search resultspresented the last meaning. The high average precision of Ask is derived from only afew computer mice in the search result. Ambiguous search keys can be defined moreprecisely. Picasa’s precision for mouse was measured as 10%, but for mouse+animal aregained 95%.

If the key mouse is omitted, the average precision for dog and giraffe is in the rangefrom 87% (Flickr) to 100% (Google Images). This demonstrates, that all these servicesare generally applicable for computer vision algorithms. To get significant data themeasurement has to be repeated with more search keys.

24

0

50

100

150

200

250

300

350

400

450

0 100 200 300 400 500 600 700 800 900 1000

Exe

cutio

n tim

e in

sec

onds

Requested number of images

ScriptJava

Figure 7.1: The results of the performance test.

0

0.2

0.4

0.6

0.8

1

Altavista Ask Bing Flickr Google Picasa Picsearch Websh. Yahoo

prec

isio

n

average dog giraffe mouse

Figure 7.2: The measured precision values besides the average precision.

25

Figure 7.3: The first twenty dog images returned by Google (random order).

Figure 7.4: The first twenty giraffe images returned by Google (random order).

26

8 Summary

This thesis gives an overview about the retrieval of image data via image search enginesand online photo albums. A Java library has been developed, which provides a commoninterface to access three different services.

Each service can be called with translated search keys automatically. This method isable to download far in excess of 100.000 images per search key. Besides, it is designedto be easily extendable. New services can be added very fast.

The following techniques of many retrieval systems are merged in the library:

• The position of an image in the search results can be a measurement of correlationto the given search key. It is provided by the common interface.

• Translating the search key to other languages can multiply the number of foundimages. 51 languages are supported.

• Different portals in different countries of one service can lead to different results.This is utilized by one image provider. The other services do not return differentresults.

Chapter 7 demonstrates that the developed library performs better than a standardscript and retrieves thousands of images in a few minutes. In the second part is seenthat the precision of the evaluated services is sufficient for computer vision research.The considered services are suitable to implement via the web image retrieval library.

With the solutions given in Chapter 5 the most web image retrieval scenarios aresupported. The library generalizes the problem of image retrieval expediently. In thefuture, the development of a well-engineered data type for web images could enhancethe project. But the library is limited by its content-independent approach. A secondlibrary merging content-dependent knowledge, for example common filters, could usethis work and simplify the development of new computer vision algorithms.

27

28

Bibliography

[1] Tamara L. Berg and Alexander C. Berg. Finding iconic images. In Conference onComputer Vision and Pattern Recognition (CVPR) 2009, 2009.

[2] Tamara L. Berg, Alexander C. Berg, Jaety Edwards, and David A. Forsyth. Who’sin the picture. In Neural Information Processing Systems (NIPS), 2004.

[3] Tamara L. Berg and David A. Forsyth. Animals on the web. In CVPR ’06: Pro-ceedings of the 2006 IEEE Computer Society Conference on Computer Vision andPattern Recognition, pages 1463–1470, Washington, DC, USA, 2006. IEEE Com-puter Society.

[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition(CVPR), 2009.

[5] M. Everingham, A. Zisserman, C. Williams, L. Van Gool, M. Allan, C. Bishop,O. Chapelle, N. Dalal, T. Deselaers, G. Dorko, S. Duffner, J. Eichhorn, J. Farquhar,M. Fritz, C. Garcia, T. Griffiths, F. Jurie, D. Keysers, M. Koskela, J. Laaksonen,D. Larlus, B. Leibe, H. Meng, H. Ney, B. Schiele, C. Schmid, E. Seemann, J. Shawe-Taylor, A. Storkey, S. Szedmak, B. Triggs, I. Ulusoy, V. Viitaniemi, and J. Zhang.The 2005 pascal visual object classes challenge. In Selected Proceedings of the FirstPASCAL Challenges Workshop. Springer-Verlag, 2006.

[6] Mark Everingham, Andrew Zisserman, Chris Williams, and Luc Van Gool. Thepascal visual object classes challenge 2006 (voc2006) results, 2006.

[7] Li Fei-Fei, Rob Fergus, and Pietro Perona. Learning generative visual models fromfew training examples: An incremental bayesian approach tested on 101 objectcategories. Computer Vision and Pattern Recognition Workshop, 12:178, 2004.

[8] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman. Learning object categories fromgoogle”s image search. In ICCV ’05: Proceedings of the Tenth IEEE InternationalConference on Computer Vision, pages 1816–1823, Washington, DC, USA, 2005.IEEE Computer Society.

[9] R. Fergus, P. Perona, A. Zisserman, and Dept Engineering Science. A visual cat-egory filter for google images. In In Proc. ECCV, pages 242–256. Springer-Verlag,2004.

[10] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. TechnicalReport 7694, California Institute of Technology, 2007.

29

[11] J. Jeon and R. Manmatha. Automatic image annotation of news images with largevocabularies and low quality training data. In ACM Multimedia, 2004.

[12] Feng Jing, Changhu Wang, Yuhuan Yao, Kefeng Deng, Lei Zhang, and Wei-YingMa. Igroup: web image search results clustering. In MULTIMEDIA ’06: Proceedingsof the 14th annual ACM international conference on Multimedia, pages 377–384,New York, NY, USA, 2006. ACM.

[13] Su Ming Koh and Liang-Tien Chia. Web image clustering with reduced keywordsand weighted bipartite spectral graph partitioning. In Advances in MultimediaInformation Processing - PCM 2006, volume Volume 4261/2006, pages 880–889,2006.

[14] M. Kortkamp. Automatic generation of visual training sets from the web. Master’sthesis, Bielefeld University, 2008.

[15] Xiaowei Li, Changchang Wu, Christopher Zach, Svetlana Lazebnik, and Jan-Michael Frahm. Modeling and recognition of landmark image collections usingiconic scene graphs. In ECCV ’08: Proceedings of the 10th European Conference onComputer Vision, pages 427–440, Berlin, Heidelberg, 2008. Springer-Verlag.

[16] Nick Morsillo, Chris Pal, and Randal Nelson. Mining the web for visual concepts.In MDM ’08: Proceedings of the 9th International Workshop on Multimedia DataMining, pages 18–25, New York, NY, USA, 2008. ACM.

[17] Ximena Olivares, Massimiliano Ciaramita, and Roelof van Zwol. Boosting imageretrieval through aggregating search results based on visual annotations. In MM’08: Proceeding of the 16th ACM international conference on Multimedia, pages189–198, New York, NY, USA, 2008. ACM.

[18] Adrian Popescu, Pierre-Alain Moellic, and Christophe Millet. Semretriev: an on-tology driven image retrieval system. In CIVR ’07: Proceedings of the 6th ACMinternational conference on Image and video retrieval, pages 113–116, New York,NY, USA, 2007. ACM.

[19] Rahul Raguram and Svetlana Lazebnik. Computing iconic summaries for generalvisual concepts. In 1st Internet Vision Workshop, 2008.

[20] Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman.Labelme: A database and web-based tool for image annotation. Int. J. Comput.Vision, 77(1-3):157–173, 2008.

[21] F. Schroff, A. Criminisi, and A. Zisserman. Harvesting image databases from theweb. In Proceedings of the 11th International Conference on Computer Vision, Riode Janeiro, Brazil, 2007.

[22] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: A large dataset for nonparametric object and scene recognition. Pattern Analysis and MachineIntelligence, IEEE Transactions on, 30(11):1958–1970, 2008.

30

[23] Huan Wang, Liang-Tien Chia, and Song Liu. Image retrieval ++–web imageretrieval with an enhanced multi-modality ontology. Multimedia Tools Appl.,39(2):189–215, 2008.

[24] Huan Wang, Xing Jiang, Liang-Tien Chia, and Ah-Hwee Tan. Ontology enhancedweb image retrieval: aided by wikipedia & spreading activation theory. In MIR’08: Proceeding of the 1st ACM international conference on Multimedia informationretrieval, pages 195–201, New York, NY, USA, 2008. ACM.

[25] Huan Wang, Song Liu, and Liang-Tien Chia. Does ontology help in image re-trieval?: a comparison between keyword, text ontology and multi-modality ontologyapproaches. In MULTIMEDIA ’06: Proceedings of the 14th annual ACM interna-tional conference on Multimedia, pages 109–112, New York, NY, USA, 2006. ACM.

[26] Xin-Jing Wang, Wei-Ying Ma, Qi-Cai He, and Xing Li. Grouping web image searchresult. In MULTIMEDIA ’04: Proceedings of the 12th annual ACM internationalconference on Multimedia, pages 436–439, New York, NY, USA, 2004. ACM.

[27] J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universalvisual dictionary. In ICCV ’05: Proceedings of the Tenth IEEE International Con-ference on Computer Vision, pages 1800–1807, Washington, DC, USA, 2005. IEEEComputer Society.

[28] Qingxiong Yang, Xin Chen, and Gang Wang. Web 2.0 dictionary. In CIVR ’08:Proceedings of the 2008 international conference on Content-based image and videoretrieval, pages 591–600, New York, NY, USA, 2008. ACM.

31

32

A Additional Material

A.1 Performance Test

UrlTemplate=” http :// images . goog l e . com/ images ?gbv=1&hl=de&sa=N&q=$1&s t a r t=”WantedImages=$2

WgetOptions=”−U Moz i l l a \/5 .0 −t 1 −−connect−t imeout 3 −−read−t imeout 10”GoogleWgetOptions=”$WgetOptions −a goog l e . l og ”PerlWgetOptions=”$WgetOptions −a images . l og ”

Regex=’m|<a h r e f=/imgres \? imgurl =([ˆ&]+)&[ˆ>]+> ’\’<img s r c=\S+ width=\d+ he ight=\d+></a>|g ’

for ( ( i =0; i<$WantedImages ; i +=20)); dowget $GoogleWgetOptions $UrlTemplate$i −O − | p e r l −e ”my \$imageCounter = $ i ;whi l e (my \ $ l i n e = <STDIN>) {

whi le (\ $imageCounter < $WantedImages && \ $ l i n e =˜ $Regex ) {system (\ ”wget $PerlWgetOptions \$1\” ) ;\$imageCounter++;}}”

done

Listing A.1: Short shell script.

public stat ic void main ( f ina l St r ing [ ] a rgs ) {i f ( args . l ength != 2) {

System . e x i t ( 1 ) ;}f ina l St r ing searchKey = args [ 0 ] ;f ina l int numberOfImages = I n t e g e r . pa r s e In t ( args [ 1 ] ) ;f ina l WebImageReceiver r e c e i v e r = new Fi l eWr i t ingRece ive r ( ” . ” ) ;f ina l WebImageProvider prov ide r = new GoogleImageProvider ( ) ;p rov ide r . addReceiver ( r e c e i v e r ) ;p rov ide r . c o n f i g u r e ( searchKey , numberOfImages ) ;p rov ide r . execute ( ) ;p rov ide r . waitForEndOfProviding ( ) ;

}Listing A.2: The test candidate using the Web Image Retrieval API.

33

Quantity Shell Java1 1.51 1.772 2.70 1.864 4.07 2.016 4.57 2.05

10 6.75 2.9520 11.87 2.2150 28.94 3.84

100 51.28 10.55500 187.07 30.32

1000 411.69 74.46

Table A.1: Average execution time of ten iterations.

A.2 Precision Statistics

service average dog giraffe mouseAltavista 0.73 0.95 0.90 0.35Ask 0.98 1.00 1.00 0.95Bing 0.68 0.90 0.95 0.20Flickr 0.80 0.95 0.80 0.65Google 0.87 1.00 1.00 0.60Picasa 0.67 1.00 0.90 0.10Picsearch 0.75 0.90 0.95 0.40Webshots 0.70 1.00 0.85 0.20Yahoo 0.82 1.00 0.90 0.55

Table A.2: The measured precision of the first twenty images.

34

0

0.2

0.4

0.6

0.8

1


prec

isio

n

dog giraffe mouse

Figure A.1: The measured precision values of each service.

0

0.2

0.4

0.6

0.8

1


prec

isio

n

average

Figure A.2: The measured average precision of each service.

35

0

0.2

0.4

0.6

0.8

1


prec

isio

n

average dog giraffe

Figure A.3: The measured average precision of the search keys dog and giraffe.

36

B Implementation Notes

B.1 The Web Image Retrieval API

The developed library is written Java 6. For the Flickr service the flickrj API, developedby Anthony Eden, Martin Goebel, Matthew Ray, Matthew MacKenzie and Till Krechis used. Writing the Google implementation was inspired by Java scripts from MarcoKortkamp and C# scripts from Ilan Assayag1.

B.2 Evaluation Scripts

For evaluation purpose, a shell script with wget and perl calls was written. The repeatedcalls with the time command and building average values carried another shell script.Gnuplot generated the plots from these data sets.

B.3 Used Software

The following software has been used to write source code, generate files and convertdata.

• Sun Java JDK 6

• Netbeans 6.7

• Checkstyle plugin by Petr Hejl

• UMLet

• LATEX

• BibTeX

• Texmaker

• Kile

• JabRef

• Bash1http://www.codeproject.com/KB/IP/google_image_search_api.aspx

37

http://www.codeproject.com/KB/IP/google_image_search_api.aspx

• Bc

• Wget

• Perl

• Gnuplot

• ImageMagick

Thanks to all the people who supported the development of the mentioned software.

38

C Declaration of Independent Work

I declare that this thesis is my own work and that i did not used any other resourcesthan the denoted ones. Also the contained data sets, figures and program code werecomposed by me or taken from the denoted sources.

Bielefeld, November 27, 2009

Maikel Linke

39

Documents

Web Image Retrieval