Chapter 1
INTRODUCTION
The amount of audiovisual information available in digital format has grown exponentially in recent years. Gigabytes of new images, audio, and video clips are generated and stored every day, helping to build up a huge, distributed, mostly unstructured repository of multimedia information, much of which can be accessed through the Internet.
Digitization, compression, and archiving of multimedia information have become popular and inexpensive, and there is a broad range of available hardware and software to support these tasks. Subsequent retrieval of the stored information, however, is not as straightforward and might require considerable additional work in order to be effective and efficient. There are basically three ways of retrieving previously stored multimedia data:
1 Free browsing: users browse through a collection of images, audio, and video files, and stop when they find the desired information.
2 Text-based retrieval: textual information (metadata) is added - either manually or using (semi-)automatic tools - to the audiovisual files during the cataloguing stage. In the retrieval phase, this additional information is used to guide conventional, text-based query and search engines to find the desired data.
3 Content-based retrieval: users search the multimedia repository by providing information about the actual contents of the image, audio, or video clip. A content-based search engine translates this information in some way so as to query the database¹ and retrieve the candidates that are most likely to satisfy the user's request.

O. Marques et al., Content-Based Image and Video Retrieval
© Kluwer Academic Publishers 2002
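The contrast between the second and third paradigms can be sketched in a few lines of code. The following toy example is purely illustrative (the file names, tags, and three-element feature vectors are invented): text-based retrieval matches a keyword against manually added annotations, while content-based retrieval ranks items by the distance between feature vectors extracted from the content itself.

```python
# Illustrative sketch (hypothetical data): the same toy collection
# queried by metadata keywords vs. by content-derived features.

def text_based_search(catalog, keyword):
    """Match a keyword against manually added annotations (metadata)."""
    return [name for name, meta in catalog.items() if keyword in meta["tags"]]

def content_based_search(catalog, query_feature):
    """Rank items by Euclidean distance between feature vectors
    extracted from the content itself (e.g., a color histogram)."""
    def dist(feature):
        return sum((a - b) ** 2 for a, b in zip(feature, query_feature)) ** 0.5
    return sorted(catalog, key=lambda name: dist(catalog[name]["feature"]))

catalog = {
    "sunset.jpg": {"tags": ["sunset", "beach"], "feature": [0.9, 0.4, 0.1]},
    "forest.jpg": {"tags": ["trees", "green"], "feature": [0.1, 0.8, 0.2]},
    "dunes.jpg":  {"tags": ["desert"],         "feature": [0.7, 0.5, 0.3]},
}

print(text_based_search(catalog, "sunset"))               # ['sunset.jpg']
print(content_based_search(catalog, [0.85, 0.45, 0.15]))  # nearest first
```

Note that the text-based search finds nothing for "desert sky" unless an annotator thought to add those words, whereas the content-based search needs no annotation at all - which is precisely the trade-off discussed above.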
The first two methods have several limitations. Free browsing is tedious, inefficient, and time-consuming, and becomes completely impractical for large databases. Text-based search and retrieval suffers from two big problems associated with the cataloguing phase: (a) the considerable amount of time and effort needed to manually annotate each individual image or clip; and (b) the imprecision associated with the subjective human perception of the contents being annotated. These problems are aggravated as the multimedia collection grows and may be the cause of unrecoverable errors in later retrieval.
Despite these problems, query by keyword is still the most popular method for searching visual information on the Web. Several general-purpose search engines have extended their capabilities to include keyword-based search of visual media. Examples include: Yahoo!'s Picture Gallery (http://gallery.yahoo.com/), the multimedia searcher of Lycos (http://multimedia.lycos.com/), and AltaVista Photofinder (http://www.altavista.com/sites/search/simage).
In order to overcome the inefficiencies and limitations of text-based retrieval of previously annotated visual data, many researchers, mostly from the Image Processing and Computer Vision community, started to investigate possible ways of retrieving visual information based solely on its contents. In other words, instead of being manually annotated using keywords, images and video clips would be indexed by their own visual content, such as color, texture, objects' shape and movement, among others. Research in the field of Content-Based Visual Information Retrieval (CBVIR) started in the early 1990s [135] and is likely to continue during the first decade of the 21st century. Many research groups in leading universities, research institutes, and companies ([22, 23, 28, 34, 36, 45, 61, 82, 96, 97, 109, 112, 116, 125, 126, 132, 133, 139, 158, 175], among many others) are actively working towards the ultimate goal of enabling users to retrieve the desired image or video clip among massive amounts of visual data in a fast, efficient, semantically meaningful, friendly, and location-independent manner.
Many improvements are still needed to move CBVIR solutions from research lab prototypes to the mainstream. At the time of this writing, the first commercial CBVIR systems are only starting to appear. They usually offer a product suite that performs a number of image or video processing tasks, including filtering, indexing and classification, text- and content-based retrieval, similarity search, and others.

¹The use of the word database in the context of this book is merely to mean a (large) collection of data. It does not necessarily mean that the data is structured in any organized way.
An example of a commercial system is a product suite from LTU Technologies [106] that includes five products: Image-Filter, Image-Indexer, Image-Seeker, Image-Shopper, and Image-Watcher. The core of the LTU technology is a high-level perceptual image analyzer that is capable of indexing, recognizing, and describing images according to their visual features. The image analyzer consists of two steps: (1) image segmentation, in which a complex image is divided into visually relevant segments using a non-parametric, multiscale approach, and (2) image indexing, in which the system assigns a unique identifier (or index) to the segmented image, called the signature, or content DNA. Image-Indexer uses these image analysis techniques to automate classification and semantic categorization of visual content. Image-Seeker performs similarity search within an image database to find a set of images similar to a particular image. The system also uses relevance feedback from the user to refine search results. Figure 1.1 illustrates how Image-Seeker operates.
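The two-step pipeline described above - segment, then derive a compact signature, then rank by signature distance - can be sketched as follows. The actual LTU analyzer is proprietary; the trivial "segmentation" (rows of a pixel grid) and "signature" (per-segment means) below are placeholders standing in for real color, texture, and shape descriptors.

```python
# Minimal sketch of a segment-index-search pipeline (placeholder algorithms,
# not LTU's actual methods; all data below is invented for illustration).

def segment(image):
    """Step 1: split an image into visually relevant segments.
    Here, trivially, rows of a 2D pixel grid stand in for real segments."""
    return [tuple(row) for row in image]

def index(image):
    """Step 2: derive a compact signature ('content DNA') from the segments.
    A real system would combine color, texture, and shape descriptors."""
    return tuple(sum(seg) / len(seg) for seg in segment(image))

def similar(db, query, top=2):
    """Similarity search: rank database images by signature distance."""
    q = index(query)
    def dist(sig):
        return sum((a - b) ** 2 for a, b in zip(sig, q))
    return sorted(db, key=lambda name: dist(index(db[name])))[:top]

db = {
    "beach.jpg": [[1, 2], [3, 4]],
    "night.jpg": [[9, 9], [9, 9]],
    "coast.jpg": [[1, 3], [2, 4]],
}
print(similar(db, [[1, 2], [3, 5]]))  # nearest signatures first
```

Relevance feedback, mentioned above, would iterate this loop: the user marks which of the returned images are relevant, and the query signature is adjusted toward them before re-ranking.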
One interesting new trend is the way commercial CBVIR systems are delivered. Besides offering their CBVIR systems as software products, some companies (such as Cobion and Virage) offer them as a service.
For example, in the case of Cobion [38], its customers purchase the service from Cobion and then offer it to their users. An example is illustrated in Figure 1.2. The heart of Cobion's visual search platform is the Global ImageCenter, a parallel processing supercomputer with 1,000 processors. The system is capable of indexing and processing several million images per day from the Web. The Cobion customer (such as Infoseek) presents its own Web site to its users, where the user starts the search. The customer's server then transforms the query into an HTTP request and sends it to Cobion's server. Cobion's system performs search and retrieval and returns an XML file through Cobion's API. The customer's server then converts the results to an HTML file and presents it to the user.
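The round trip just described - customer server forwards the query, receives XML, renders HTML - can be sketched as below. The URLs, XML tags, and function names are invented for illustration; Cobion's real API and response schema are not documented here, so the provider side is simulated with a local stub rather than an actual HTTP call.

```python
# Hypothetical sketch of the ASP round trip: the customer's server forwards
# the user's query to the provider, receives XML, and renders HTML.
import xml.etree.ElementTree as ET

def provider_search(query):
    """Stand-in for the provider's API: returns search results as XML."""
    root = ET.Element("results")
    for url in ["http://example.com/a.jpg", "http://example.com/b.jpg"]:
        hit = ET.SubElement(root, "image")
        hit.set("url", url)
    return ET.tostring(root, encoding="unicode")

def customer_server(user_query):
    """Customer side: issue the request, convert the XML reply to HTML."""
    xml_reply = provider_search(user_query)  # in reality, an HTTP request
    urls = [img.get("url") for img in ET.fromstring(xml_reply)]
    items = "".join(f'<li><img src="{u}"></li>' for u in urls)
    return f"<ul>{items}</ul>"

html = customer_server("sunset")  # HTML list of result thumbnails
```

The design point is that the end user only ever talks to the customer's site; the content-based search engine itself is hosted and scaled by the provider.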
This book is organized as follows: Chapter 2 provides an overview of the fundamentals of CBVIR systems, primarily from a user's point of view. Chapter 3 takes a look behind the scenes and provides the necessary technical background on the main concepts, techniques, and technical challenges involved in designing and implementing a successful Content-Based Image Retrieval (CBIR) system. In Chapter 4 this discussion is extended to video. Chapter 5 provides a comprehensive survey of existing CBIR systems. Finally, Chapter 6 presents a detailed case study of the design of MUSE - a content-based image retrieval system developed at Florida Atlantic University in Boca Raton, Florida.
[Figure 1.1 appears here: a two-panel diagram in which (a) an image undergoes image understanding and its content description is produced as XML, and (b) Image-Seeker matches that description against an image database to retrieve similar images.]

Figure 1.1. Image-Seeker: (a) An image is presented to the system and real-time image analysis is performed by extracting features of the image. (b) The system then matches these features with the features of images in the database and retrieves similar images (from [106]).
[Figure 1.2 appears here: a diagram of the interaction between the user, the customer's server, and Cobion. The XML file from Cobion contains: the path to thumbnails hosted by Cobion, the image URL, the URL of the website, and the file size and image dimensions.]

Figure 1.2. Cobion's visual search platform is based on the ASP approach. (From [38])