
An object-based image retrieval system for digital libraries


Multimedia Systems (2006) 11(3): 260–270. DOI 10.1007/s00530-006-0010-8

REGULAR PAPER

Sridhar R. Avula · Jinshan Tang · Scott T. Acton

An object-based image retrieval system for digital libraries

Published online: 23 February 2006. © Springer-Verlag 2006

Abstract A novel approach to clustering for image segmentation and a new object-based image retrieval method are proposed. The clustering is achieved using the Fisher discriminant as an objective function. The objective function is improved by adding a spatial constraint that encourages neighboring pixels to take on the same class label. A six-dimensional feature vector, combining color and busyness features for each pixel, is used for clustering. After clustering, the dominant segments in each class are chosen based on area and used to extract features for image retrieval. The color content is represented using a histogram, and Haar wavelets are used to represent the texture feature of each segment. The image retrieval is segment-based; the user can select a query segment to perform the retrieval and assign weights to the image features. The distance between two images is calculated using the distance between features of the constituent segments. Each image is ranked based on this distance with respect to the query image segment. The algorithm is applied to a pilot database of natural images and is shown to improve upon conventional classification and retrieval methods. The proposed segmentation leads to a higher number of relevant images retrieved: 83.5% on average, compared to 72.8% and 68.7% for the k-means clustering and global retrieval methods, respectively.

Keywords Image retrieval · Digital libraries · Image segmentation · Texture · Color · Wavelets

1 Introduction

As image libraries experience spectacular growth, image retrieval systems must be developed that automatically extract

S. R. Avula · J. Tang · S. T. Acton (B)
Department of Electrical and Computer Engineering, Virginia Image and Video Analysis, University of Virginia, 351 McCormick Road, Charlottesville, VA 22904-4743, USA
E-mail: [email protected], [email protected]

features and annotate image data [1–3]. The exponentially growing image databases need efficient storage, cataloging and retrieval systems in order to facilitate use. Images in the database are typically indexed using text annotation, which is dependent upon the language and point of view of the human operator [3]. Content-based image retrieval (CBIR), in contrast, is the method of retrieving images using only the content of the images.

In this paper, we will discuss clustering-based image retrieval. The goal of clustering is to find regions that represent objects or meaningful parts of objects in the image by grouping pixels with similar characteristics [4].

Clustering is a powerful tool when used in retrieval because it allows us to narrow the field of search by selecting the objects of interest in the image. We propose a new clustering algorithm that builds upon the Fisher discriminant by adding spatial constraints. In this method the image information is converted into a ratio (the Fisher discriminant) that is large when the intercluster distance is large or the intracluster distance is small. Each pixel is assigned to the cluster that maximizes the discriminant value for the whole image. The Fisher discriminant [5] is effective in the classification of images and provides a logical grouping of the image pixels, as it considers not only the means of the classes but also the variance of each class in the classification process.

The clustering algorithm developed here has been applied to image retrieval. If the user is interested only in a single object or in a few objects in the image, the retrieval should be performed using local features, whereas global feature-based methods use all the features, including those of the background. Given a query image, the user can select the individual segments or objects from the image that are to be used in retrieval. The features of the query segments are then compared with those of the segments in the database, and a distance function is used to rank the images in the database according to their similarity. The retrieved images are displayed in the order of their ranking.


2 Background

2.1 Clustering

Segmentation is the process of dividing an image into homogeneous regions, which may correspond to the objects in the image. Image segmentation is trivial for images with smooth or constant regions, where gray levels can be used to segment the image [6]. But many images are complex and contain several objects, colors, textures and shapes.

Many segmentation algorithms have been developed in the past, including thresholding [7, 8], edge-based methods [9, 10], region merging techniques [11, 12], level-set based techniques [13, 14] and clustering algorithms [15, 16]. In this paper, we study the clustering approach. In a clustering algorithm, pixels with similar attributes are grouped together to form clusters; each cluster consists of pixels that are similar to each other and dissimilar to pixels of other clusters. Traditional clustering algorithms are basically divided into hierarchical classification and partitioning [17]. In this paper we explore partitioning algorithms, where pixels are iteratively assigned to subsets and certain attributes of the clusters are optimized iteratively to achieve the classification. The attributes generally used are intensity, texture and gradient of the image. Each pixel is classified into one of the classes based on its similarity with the class, measured as the distance between the pixel attributes and the statistics, such as means and variances, that represent the cluster centers.

A widely used clustering method is the k-means algorithm [15, 16]. The pixels in the image are initially assigned to a fixed number of classes. To start the iteration, the cluster center means are calculated using the feature vectors, which can be color and texture representations of each pixel. In each subsequent iteration, the distance between each pixel's feature vector and the mean vector of each class is calculated using a distance measure, and the cluster centers are then updated using the feature vectors of all the pixels of the updated classes. If the cluster memberships remain unchanged, the classification process is terminated.

The typical approach to k-means clustering involves a simple two-step process: (1) assign all the pixels to their nearest centroids and (2) compute the centroids of the newly updated groups. The main disadvantage of this approach is that the process must be iterated many times before the final classification is reached, and the effect of each individual pixel on the objective function is not taken into consideration. In contrast, the clustering algorithm used in this paper employs an iterative optimization method based on the detailed effect on the objective function of moving a point from its cluster to a potentially new one, which forms the basis of this clustering method.
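For concreteness, here is a minimal Python sketch of the standard two-step k-means iteration described above; the function and variable names are ours, not from the paper.

import numpy as np

def kmeans_step(features, centroids):
    """One iteration of standard k-means: assign, then re-center.

    features:  (N, D) array of per-pixel feature vectors
    centroids: (K, D) array of current cluster centers
    """
    # Step 1: assign every pixel to its nearest centroid (Euclidean distance).
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Step 2: recompute each centroid as the mean of its assigned pixels.
    new_centroids = np.array([
        features[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
        for k in range(len(centroids))
    ])
    return labels, new_centroids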

Attempts have been made in this work to overcome the drawbacks of k-means clustering, such as the limitation of considering only the mean as a classification factor. The advantages of the traditional k-means algorithm include straightforward implementation, fairly robust behavior and applicability to multidimensional data. One common method is to apply smoothing before k-means clustering; however, standard smoothing filters can result in a loss of important image details. Recently, an approach [18] was proposed for increasing the robustness of classification to noise by directly modifying the objective function: the distance computed at each pixel is replaced by the weighted sum of distances within a neighborhood of the pixel. That work is the most similar to our current work. Here, spatial constraints on the pixels of the image are imposed in a basic way to upgrade the classification into a segmentation technique. The algorithm presented here is aimed at applying segmentation to improve image retrieval.

2.2 Background on image retrieval

Content-based image retrieval has gained importance in various fields with the advent of technologies to capture and store digital images using less memory, higher computational speeds and better availability and usage of the Internet. To use an expansive database effectively and efficiently, we need an efficient retrieval system [1]. Many retrieval systems have been designed. Query By Image Content (QBIC) [19] was one of the first image retrieval systems developed. The query in the QBIC system can be an example image or user-made drawings and sketches, and colors and textures can be selected. QBIC was also one of the first retrieval systems to use high-dimensional feature indexing. In the Virage system [20], the user can adjust the weights associated with each feature to obtain the desired images. Spatial features have been included in VisualSEEk [21] along with the visual features. Edge flow-based image region segmentation, along with spatial and visual features, has been exploited in NETRA [22].

A vital step in content-based image retrieval is feature extraction. Features can be classified into domain-specific features, like human faces, fingerprints, etc., and general features that include color, texture and shape [1]. We limit this work to general feature extraction.

Color: Color is one of the most widely used visual features in retrieval. It is considered very robust and is independent of the size and orientation of the image. Color can also be used to find locations in an image and to differentiate a large number of objects [23]. The color histogram is obtained by measuring the frequency of a particular color in the image. In addition to color histograms, color moments and color sets have been proposed [24]. Since our retrieval is clustering based, the color features of each segment have to be stored for efficient retrieval. For simplicity, histograms have been used in this work. Colors are quantized and represented in histogram form for the current image retrieval, as discussed in Sect. 4.

Texture: Texture is an innate property of virtually all surfaces. It represents the regularity, smoothness and coarseness of the image. Texture gives a directional sense to the spatial arrangement of image intensities. Filter banks and AM-FM models have been proposed to represent the texture of an image [25, 26]. Although wavelet analysis has been mainly applied to image compression [27, 28], it has also proven very effective in texture representation [29].

3 Image segmentation using J-means clustering and spatial information

Segmentation of images is performed so that the user can select entire objects in the image and use them as a query instead of the whole image. A clustering method has been used to attain the segmentation of the images. A standard k-means algorithm has been modified, and spatial constraints have been added to improve its effectiveness. In the algorithm, the Fisher discriminant has been used to decrease the number of iterations required for convergence. In our implementation, spatial constraints have been added to improve the ability to extract homogeneous regions. In the current segmentation the red–green–blue (RGB) color space was used, but the algorithm could certainly be extended to other color spaces.

3.1 Construction of feature vectors

To represent the texture feature of the image, a busyness factor is calculated for each pixel. In order to classify each pixel, we need a pixel-based representation of the texture of the image; hence, the busyness factor was chosen over wavelets, which are usually employed to represent the texture of larger regions. The given image is converted into a grayscale image I_G. The busyness factor, BF, for each pixel is calculated by counting the number of edges in a small window around the pixel. The vertical, horizontal and diagonal edge strengths are represented by v, h and d, respectively, and are calculated by counting the number of edges in the corresponding directions in a window around each pixel.

Consider a window of size 3 × 3 around a pixel p(x, y) in I_G. Here we display a subimage of 5 × 5 pixels, with 1's surrounding the 3 × 3 window:

1 1 1 1 1
1 2 3 3 1
1 1 4 5 1
1 3 5 6 1
1 1 1 1 1

In this example, p(x, y) = 4, v = 9, h = 5 and d = 16. The values of v, h and d are calculated by counting the number of intensity changes in the row, column and upper-diagonal directions, scanning the window from top to bottom and left to right for each pixel p(e, f), where e = 1, 2, 3 and f = 1, 2, 3; from the above matrix, p(1, 1) = 2 and p(3, 3) = 6. Quantizing the difference between the value of the current pixel and the left, top and upper-diagonal pixels, respectively, yields the values of v, h and d. These values are calculated for each pixel in the matrix and approximately represent the texture feature of each pixel. The busyness factor has been chosen over other texture representations because it is simple to calculate and can be used as a pixel-based feature vector in classification. Hence, our feature vector for classification has six dimensions, given by r, g, b and v, h, d; both color and texture features are represented.
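The worked example does not fully pin down the counting rule, so the Python sketch below is one plausible reading: it counts quantized intensity changes between neighboring pixels in each direction inside the 3 × 3 window. The change threshold `thresh` and the treatment of border pixels (the window slicing assumes an interior pixel) are our assumptions.

import numpy as np

def busyness(gray, x, y, thresh=0):
    """Busyness factors (v, h, d) for pixel (x, y) of a grayscale image.

    One plausible reading of the paper's rule: count intensity changes
    between neighboring pixels in the vertical, horizontal and
    upper-diagonal directions inside the 3x3 window around (x, y).
    """
    w = gray[x - 1:x + 2, y - 1:y + 2].astype(int)  # 3x3 window (interior pixel)
    v = np.count_nonzero(np.abs(np.diff(w, axis=0)) > thresh)  # column direction
    h = np.count_nonzero(np.abs(np.diff(w, axis=1)) > thresh)  # row direction
    d = np.count_nonzero(np.abs(w[1:, 1:] - w[:-1, :-1]) > thresh)  # upper diagonal
    return v, h, d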

3.2 Classification

The features of the given color image I_c, after clustering, are represented in the form of a matrix I_f. I_f is a two-dimensional tensor with a feature vector corresponding to each pixel location. Each element of the matrix I_f is a six-element feature vector representing the color (r, g, b) and busyness (v, h, d) information of each pixel of the image. A matrix L, identical in size to the image, is used to represent the class labeling. Dividing L into K classes row-wise initiates the clustering algorithm. For the Pth iteration, the means m_{iP}(r, g, b, v, h, d) and the variances v_{iP}(r, g, b, v, h, d) for class i are calculated using the feature vectors of I_f as follows

m_{iP} = \frac{1}{N_{iP}} \sum_{L(x,y) \in A_{iP}} I_f(x, y),    (1)

v_{iP}^D = \frac{1}{N_{iP}} \sum_{L(x,y) \in A_{iP}} \left( I_f^D(x, y) - m_{iP}^D \right)^2    (2)

for all i = 1, 2, ..., K, with Eq. (2) evaluated for each dimension D = r, g, b, v, h and d. Here m_{iP}^D is the Dth element of the mean vector m_{iP}, v_{iP}^D is the Dth element of the variance vector v_{iP}, I_f^D is the Dth element of I_f, N_{iP} is the number of pixels in the ith cluster, and A_{iP} denotes the set of labelings of the pixels for the ith cluster. During iteration P, the pixels are reassigned to a class using a criterion based on the Fisher discriminant, defined as the ratio of the sum of differences in means to the sum of variances of each class:

J_P = \frac{\sum_{i=1}^{K} \sum_{j>i}^{K} (m_{iP} - m_{jP})^T (m_{iP} - m_{jP})}{\left( \sum_{i=1}^{K} v_{iP}^T v_{iP} \right)^{1/2}}    (3)

The criterion for reassigning a pixel to a class is that the reassignment must cause an increase in J_P. In order to realize this constraint, we adopt a searching algorithm. Consider reassignment of the tth pixel, where t is a re-parameterization of (x, y), and suppose the pixel belongs to the i_1th class. Assume that we have the means and variances for each class before the pixel is reassigned. For convenience, these means and variances are denoted by m_{iP}(t) and v_{iP}(t), where i = 1, 2, ..., K. Obviously we have K possible reassignments for this pixel. The class to which the pixel is reassigned is obtained by selecting one assignment from the K possible reassignments using the following equation

i(t) = \arg\max_{i_2} J_{P,i_2}(t)    (4)

where

J_{P,i_2}(t) =
  \frac{\sum_{i=1,\, i \neq i_1,i_2}^{K} \sum_{j>i,\, j \neq i_1,i_2}^{K} (m_{iP}(t) - m_{jP}(t))^T (m_{iP}(t) - m_{jP}(t))}
       {\left[ \sum_{i=1,\, i \neq i_1,i_2}^{K} v_{iP}(t)^T v_{iP}(t) + \sum_{k=1}^{2} v'_{i_k P}(t)^T v'_{i_k P}(t) \right]^{1/2}}
+ \frac{\sum_{k=1}^{2} \sum_{j=1,\, j \neq i_1,i_2}^{K} (m'_{i_k P}(t) - m_{jP}(t))^T (m'_{i_k P}(t) - m_{jP}(t))}
       {\left[ \sum_{i=1,\, i \neq i_1,i_2}^{K} v_{iP}(t)^T v_{iP}(t) + \sum_{k=1}^{2} v'_{i_k P}(t)^T v'_{i_k P}(t) \right]^{1/2}}
+ \frac{(m'_{i_1 P}(t) - m'_{i_2 P}(t))^T (m'_{i_1 P}(t) - m'_{i_2 P}(t))}
       {\left[ \sum_{i=1,\, i \neq i_1,i_2}^{K} v_{iP}(t)^T v_{iP}(t) + \sum_{k=1}^{2} v'_{i_k P}(t)^T v'_{i_k P}(t) \right]^{1/2}}    (5)

Here m'_{i_1 P}(t) and v'_{i_1 P}(t) are the means and variances of the i_1th class after the pixel is removed from it, and m'_{i_2 P}(t) and v'_{i_2 P}(t) are the means and variances of the i_2th class after the pixel is reassigned to it. Because each reassignment procedure moves a single pixel, we can compute m'_{i_k P}(t) and v'_{i_k P}(t) using a time-saving iterative algorithm:

m'^D_{i_k P}(t) = \frac{m^D_{i_k P}(t)\, N_{i_k P}(t) - f' I_f^D(t)}{N_{i_k P}(t) - f'},    (6)

v'^D_{i_k P}(t) = \frac{N_{i_k P}(t)\, v^D_{i_k P}(t) - (N_{i_k P}(t) - f')\, T_A^2 - f' T_B^2}{N_{i_k P}(t) - f'}    (7)

where m^D_{i_k P}(t) and m'^D_{i_k P}(t) are the Dth elements of m_{i_k P}(t) and m'_{i_k P}(t), respectively; v^D_{i_k P}(t) and v'^D_{i_k P}(t) are the Dth elements of v_{i_k P}(t) and v'_{i_k P}(t), respectively; and N_{i_k P}(t) is the number of pixels in class i_k before removing or adding the current pixel. Here f' = 1 if the pixel is to be removed from class i_k and f' = −1 if it is to be added to class i_k. T_A and T_B are temporary variables defined by

T_A = m'^D_{i_k P}(t) - m^D_{i_k P}(t),    (8)

T_B = m^D_{i_k P}(t) - I_f^D(t)    (9)
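Equations (6)–(9) amount to a constant-time update of a class's mean and population variance when a single pixel is moved. A Python sketch of this update, applied per dimension (names ours):

import numpy as np

def update_stats(mean, var, n, x, remove=True):
    """Incremental class statistics after moving one pixel (Eqs. (6)-(9)).

    mean, var: current per-dimension mean and population-variance vectors
    n:         current number of pixels in the class
    x:         feature vector of the pixel being moved
    remove:    True removes the pixel from the class (f' = 1),
               False adds it (f' = -1)
    """
    f = 1 if remove else -1
    new_mean = (mean * n - f * x) / (n - f)                         # Eq. (6)
    t_a = new_mean - mean                                           # Eq. (8)
    t_b = mean - x                                                  # Eq. (9)
    new_var = (n * var - (n - f) * t_a**2 - f * t_b**2) / (n - f)   # Eq. (7)
    return new_mean, new_var, n - f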

After we reassign the tth pixel, we use the same procedure to assign the other pixels. After all the pixels have been reassigned, we increment P and refine the classification by repeating the entire update procedure. The processing can continue until the convergence requirement is satisfied. In practice, this type of clustering is very efficient in that the majority of the classification is attained by the end of the first iteration over the image. But at this point the spatial aspects of the pixels are not incorporated, so the clustering can be sensitive to noise in the image. Smoothing could be applied, but image details may be lost in the process. Instead, spatial constraints can be included to discourage unlikely pixel labelings, such as a pixel of one class surrounded entirely by pixels of another class.

3.3 Spatial constraints

Classification using J-means clustering is very effective in the sense that for each pixel assignment the means and the variances of all the classes are taken into account. After the first iteration, most of the pixels are assigned to the proper classes. The spatial constraints are activated in the subsequent iterations. The spatial term is calculated using a small window W of width (2ω + 1) centered around each pixel p(x, y). The term is maximized if the pixel belongs to the same class as its neighboring pixels in the window W.

The window around each pixel p(x, y) is defined as follows: define e and f such that (x − ω) ≤ e ≤ (x + ω) and (y − ω) ≤ f ≤ (y + ω). In each window W, the number of pixels that are assigned to the same class as p(x, y) is counted. α_{i'}(t) is the probability that the tth pixel belongs to the same class i' as the pixels in window W:

\alpha_{i'}(t) = \frac{1}{(2\omega + 1)^2} \sum_{(e,f) \in W} U(e, f),    (10)

where

U(e, f) = \begin{cases} 1, & \text{if } L(e, f) = L(x, y) \\ 0, & \text{otherwise} \end{cases}    (11)
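The spatial term of Eqs. (10)–(11) is simply the fraction of window pixels that share the center pixel's label. A Python sketch (names ours; it assumes (x, y) is far enough from the border for the full window to fit):

import numpy as np

def spatial_term(labels, x, y, omega):
    """Fraction of window pixels sharing the label of (x, y) (Eqs. (10)-(11)).

    labels: 2-D array of current class labels L
    omega:  half-width of the (2*omega + 1) x (2*omega + 1) window W
    """
    win = labels[x - omega:x + omega + 1, y - omega:y + omega + 1]
    return np.count_nonzero(win == labels[x, y]) / (2 * omega + 1) ** 2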


Using i_2 to replace i' in (10), we can modify the objective function in (5) as follows

J_{P,i_2}(t) =
  \frac{\sum_{i=1,\, i \neq i_1,i_2}^{K} \sum_{j>i,\, j \neq i_1,i_2}^{K} (m_{iP}(t) - m_{jP}(t))^T (m_{iP}(t) - m_{jP}(t))}
       {\left[ \sum_{i=1,\, i \neq i_1,i_2}^{K} v_{iP}(t)^T v_{iP}(t) + \sum_{k=1}^{2} v'_{i_k P}(t)^T v'_{i_k P}(t) \right]^{1/2}}
+ \frac{\sum_{k=1}^{2} \sum_{j=1,\, j \neq i_1,i_2}^{K} (m'_{i_k P}(t) - m_{jP}(t))^T (m'_{i_k P}(t) - m_{jP}(t))}
       {\left[ \sum_{i=1,\, i \neq i_1,i_2}^{K} v_{iP}(t)^T v_{iP}(t) + \sum_{k=1}^{2} v'_{i_k P}(t)^T v'_{i_k P}(t) \right]^{1/2}}
+ \frac{(m'_{i_1 P}(t) - m'_{i_2 P}(t))^T (m'_{i_1 P}(t) - m'_{i_2 P}(t))}
       {\left[ \sum_{i=1,\, i \neq i_1,i_2}^{K} v_{iP}(t)^T v_{iP}(t) + \sum_{k=1}^{2} v'_{i_k P}(t)^T v'_{i_k P}(t) \right]^{1/2}}
+ \lambda \alpha_{i_2}(t)    (12)

The spatial term is maximized for a class when all the neighboring pixels and the center pixel belong to the same class. λ is the weight given to the spatial constraint and can be varied accordingly. One problem with this spatial term arises at edges, because at an edge the neighboring pixels need not belong to the same class. Hence, we introduce an edge pixel detector so that spatiality is not enforced on edge pixels: the edge term allows the spatial term to affect the objective function only in the absence of an edge. We detect the edge pixels using the grayscale image I_G. In the window W, we count the number of pixels (γ) in each quarter (Q1, Q2, Q3, Q4) of the window, including the row and column containing the tth pixel p(x, y), that have gray scale intensity similar to p(x, y). If at least one of the four quarters is filled with pixels of similar gray level, then the pixel p(x, y) is called an edge pixel. This can be formalized as follows: for a pixel p(x, y),

Q1 is defined as p(e, f) ∈ Q1 if (x − ω) ≤ e ≤ x and (y − ω) ≤ f ≤ y,
Q2 is defined as p(e, f) ∈ Q2 if (x − ω) ≤ e ≤ x and y ≤ f ≤ (y + ω),
Q3 is defined as p(e, f) ∈ Q3 if x ≤ e ≤ (x + ω) and (y − ω) ≤ f ≤ y,
Q4 is defined as p(e, f) ∈ Q4 if x ≤ e ≤ (x + ω) and y ≤ f ≤ (y + ω).

For each Q_C, where C = 1, 2, 3, 4,

\gamma(Q_C) = \sum_{(e,f) \in Q_C} Z(e, f),    (13)

where

Z(e, f) = \begin{cases} 1, & \text{if } I_G(e, f) \cong I_G(x, y) \\ 0, & \text{otherwise} \end{cases}

and

\delta(t) = \begin{cases} 1, & \text{if } \gamma(Q_C) = (\omega + 1)(\omega + 1) \\ 0, & \text{otherwise.} \end{cases}    (14)
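A Python sketch of this quarter-based edge test follows (names ours). The paper does not give a numeric similarity threshold for I_G(e, f) ≅ I_G(x, y), so `tol` is an assumption, and interior pixels are assumed so the window slicing is valid.

import numpy as np

def edge_flag(gray, x, y, omega, tol=10):
    """Quarter-based edge test delta(t) (Eqs. (13)-(14)).

    Returns 1 if any quarter of the window around (x, y) is entirely
    filled with pixels of gray level similar to gray[x, y], else 0.
    """
    quarters = [
        gray[x - omega:x + 1, y - omega:y + 1],   # Q1
        gray[x - omega:x + 1, y:y + omega + 1],   # Q2
        gray[x:x + omega + 1, y - omega:y + 1],   # Q3
        gray[x:x + omega + 1, y:y + omega + 1],   # Q4
    ]
    for q in quarters:   # if gamma falls short, move on to the next quarter
        gamma = np.count_nonzero(np.abs(q.astype(int) - int(gray[x, y])) <= tol)
        if gamma == (omega + 1) ** 2:   # Eq. (14): quarter completely similar
            return 1
    return 0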

If δ(t) = 0 for a quarter, the test proceeds to the next quarter, as in the sketch above. Now, including the edge factor in the spatial term, Eq. (12) is modified as

J_{P,i_2}(t) =
  \frac{\sum_{i=1,\, i \neq i_1,i_2}^{K} \sum_{j>i,\, j \neq i_1,i_2}^{K} (m_{iP}(t) - m_{jP}(t))^T (m_{iP}(t) - m_{jP}(t))}
       {\left[ \sum_{i=1,\, i \neq i_1,i_2}^{K} v_{iP}(t)^T v_{iP}(t) + \sum_{k=1}^{2} v'_{i_k P}(t)^T v'_{i_k P}(t) \right]^{1/2}}
+ \frac{\sum_{k=1}^{2} \sum_{j=1,\, j \neq i_1,i_2}^{K} (m'_{i_k P}(t) - m_{jP}(t))^T (m'_{i_k P}(t) - m_{jP}(t))}
       {\left[ \sum_{i=1,\, i \neq i_1,i_2}^{K} v_{iP}(t)^T v_{iP}(t) + \sum_{k=1}^{2} v'_{i_k P}(t)^T v'_{i_k P}(t) \right]^{1/2}}
+ \frac{(m'_{i_1 P}(t) - m'_{i_2 P}(t))^T (m'_{i_1 P}(t) - m'_{i_2 P}(t))}
       {\left[ \sum_{i=1,\, i \neq i_1,i_2}^{K} v_{iP}(t)^T v_{iP}(t) + \sum_{k=1}^{2} v'_{i_k P}(t)^T v'_{i_k P}(t) \right]^{1/2}}
+ (1 - \delta(t))\, \lambda \alpha_{i_2}(t)    (15)

The value of J_{P,i_2}(t) is calculated for each pixel and is maximized as we scan the image from left to right and top to bottom. The size factor ω of the window W is increased by a small amount after each iteration P to increase the span of the spatial term. The process is repeated for each pixel, and the iteration is terminated when there is no significant change in the sizes N_{iP} of the classes. Once the image is clustered into K classes, the classes are processed to capture the segments in the image.

The effect of using spatial constraints is illustrated in Fig. 1. Using spatial constraints, we were able to obtain the quarter as one whole segment and the background as the other, unlike the traditional k-means approach, which relegated various parts of the quarter to the background.

Each class i is labeled into p_i regions using connected component labeling (with 4-connectivity) [9]. Let R_{in} represent the nth region in the ith class. The region R_{i s_i} with maximum area is chosen as the dominant component in each class:

\Phi_i(x, y) = 1, \quad \text{if } (x, y) \in R_{i s_i},    (16)


Fig. 1 a Original image, b clustering into two classes using only k-means and c clustering using the proposed J-means algorithm

Fig. 2 Images a, b, c, d illustrate the clustering of color images

for all i = 1, 2, ..., K, where

s_i = \arg\max_{n = 1, \ldots, p_i} \text{area}(R_{in}).

s_i indexes the region of maximum area among the p_i regions labeled in class i, and the resulting dominant component is stored in \Phi_i. The components are stored in the database for each image and are accessed when needed for retrieval. Figure 2 shows some examples of the proposed clustering algorithm.
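A Python sketch of this dominant-component selection, using scipy's connected-component labeling (whose default 2-D structuring element gives the 4-connectivity used here); names are ours:

import numpy as np
from scipy import ndimage

def dominant_components(labels, K):
    """Largest connected region per class (Eq. (16)), 4-connectivity.

    labels: 2-D array of class labels in {0, ..., K-1}
    Returns a list of K boolean masks, one dominant segment per class.
    """
    masks = []
    for i in range(K):
        regions, n = ndimage.label(labels == i)  # default = 4-connectivity
        if n == 0:
            masks.append(np.zeros_like(labels, dtype=bool))
            continue
        sizes = ndimage.sum(labels == i, regions, index=range(1, n + 1))
        s_i = int(np.argmax(sizes)) + 1          # region of maximum area
        masks.append(regions == s_i)
    return masks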

In the results illustrated, each image was segmented into four clusters using the proposed clustering algorithm. The dominant components represent the segments of the image. The details of feature extraction, storage and comparison for retrieval are explained in Sect. 4.

4 Feature extraction and image retrieval

The process of image retrieval proposed here is based on clustering. The characteristic features of the image segments are extracted and stored for expeditious image retrieval.

For each segment, we extract two features: color and texture. Texture is an important feature that describes the regularity, smoothness and coarseness of the image. Many techniques have been proposed to describe the texture feature. We use the method developed in [3] to extract the texture features; the only difference is that the texture feature is extracted not from the whole image but from the segments. To calculate the coefficients, a square is inscribed within the component such that all parts of the square lie inside the component, and this part of the component is used to calculate the wavelet coefficients. The similarity in texture D_T between any two components Q_i and D_j is calculated as the Euclidean distance between the derived wavelet coefficients:

D_T(Q_i, D_j) = \sqrt{\sum_k \left( W_{i,Q}(k) - W_{j,D}(k) \right)^2}    (17)

where W_{i,Q}(k) and W_{j,D}(k) are the energies in the kth subband of the ith component Q_i of the query image Q and of the jth component D_j of the database image D, respectively.
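As an illustration, the sketch below computes level-1 Haar subband energies for the inscribed square and the distance of Eq. (17). This is one possible reading of the texture signature (the paper defers to [3] for details); the even patch side, the normalization and the single decomposition level are our assumptions.

import numpy as np

def haar_energies(patch):
    """Energies of the four level-1 Haar subbands of a square patch.

    patch: the square region inscribed in a segment (side length assumed
    even).  Returns the [LL, LH, HL, HH] subband energies.
    """
    a = patch[0::2, 0::2].astype(float)
    b = patch[0::2, 1::2].astype(float)
    c = patch[1::2, 0::2].astype(float)
    d = patch[1::2, 1::2].astype(float)
    ll, lh = (a + b + c + d) / 4, (a - b + c - d) / 4
    hl, hh = (a + b - c - d) / 4, (a - b - c + d) / 4
    return np.array([np.sum(s**2) for s in (ll, lh, hl, hh)])

def texture_distance(w_q, w_d):
    """Euclidean distance between subband-energy vectors (Eq. (17))."""
    return np.sqrt(np.sum((w_q - w_d) ** 2))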

In the retrieval of specific image objects, color has enormous importance because it is essentially view invariant and resolution independent [23]. The color histogram is obtained by measuring the frequency of the constituent colors. A benefit of this basic representation is that histograms are invariant to translation and rotation [23]. For a given component of an image, its color content is extracted in the form of a histogram of each color band: R, G and B in this case (other color spaces are possible [30]). The distance D_C between components is calculated similarly, using the histogram values of each component:

D_C(Q_i, D_j) = \sqrt{\sum_k \left( H_{i,Q}(k) - H_{j,D}(k) \right)^2}    (18)

where H_{i,Q} and H_{j,D} are the histogram values of the ith component Q_i of the query image Q and of the jth component D_j of the database image D, respectively.

In the retrieval process, a query image is first segmented and the dominant components found are highlighted. Once a dominant component is chosen, the corresponding features are compared with the features of the other image components in the database, and the M closest matches are retrieved. In this paradigm, we use a similarity measure combining the two features. Let the normalized color and texture similarities between two segments be D_C and D_T, obtained by the methods in Eqs. (17) and (18). Then the similarity between two segments is computed as

d(Q_i, D_j) = w_c D_C(Q_i, D_j) + w_t D_T(Q_i, D_j)    (19)

where Q_i is the query segment and D_j is a segment from an image in the database. w_c and w_t are the user-assigned weights for the color and texture features, respectively.
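A minimal sketch of Eqs. (18)–(19) follows; it assumes the two distances have already been normalized to comparable ranges, and the default weights are our placeholders for the user's choices:

import numpy as np

def color_distance(h_q, h_d):
    """Euclidean distance between concatenated R, G, B histograms (Eq. (18))."""
    return np.sqrt(np.sum((np.asarray(h_q) - np.asarray(h_d)) ** 2))

def segment_similarity(d_color, d_texture, w_c=0.5, w_t=0.5):
    """Weighted combination of normalized color/texture distances (Eq. (19))."""
    return w_c * d_color + w_t * d_texture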

5 Experimental results and analysis

In this section we demonstrate the performance of the proposed algorithm for segmentation and retrieval, showing the improvement given by segment-based retrieval over the global feature-based method. Numerical and visual results are presented. We implemented the retrieval using JBuilder 9.0 (Java), and all experiments were run on a Pentium IV 1.59 GHz processor with 512 MB RAM.

Table 1 Number of images in the database containing each segment

Segment            Images containing this segment
Trees              42
Grass              27
Water bodies       72
Sky                45
Clouds             49
Mountains          26
Football fields    25
Miscellaneous      10

Table 2 Combinations of segments present in the images in the database

            Trees  Grass  Water  Sky  Clouds  Mountains
Trees        42     25     22    30     23       8
Grass        25     27      6    22     12       6
Water        22      6     72    17     35      20
Sky          30     22     17    45      0       3
Clouds       23     12     35     0     49      19
Mountains     8      6     20     3     19      26

The experiments were conducted on a database composed of some images from the image database in [31] and some images captured at the University of Virginia. Most of the images are natural scenes. Each image is segmented and the segment features are stored. The typical size of each image is 500 × 700 pixels. For the current database, each image is assumed to contain four segments; this number was chosen upon inspection of the database images. The images in this database consist of different segments, and we choose eight classes of segments as queries with which to perform image retrieval: trees, grass, water bodies, sky, clouds, mountains, football fields and miscellaneous objects. Table 1 shows the number of images having the corresponding segments, and Table 2 shows the various combinations of segments present in the images in the database.

The retrieval performance is measured using the following metrics: precision, recall, false alarm rate, and miss rate [32]; a minimal sketch of their computation follows the list:

(a) Precision is the ratio of the number of relevant images retrieved to the total number of images retrieved.

(b) Recall is the ratio of the number of relevant images retrieved to the total number of relevant images.

(c) False alarm rate is the ratio of the number of irrelevant images retrieved to the total number of images retrieved.

(d) Miss rate is the ratio of the number of relevant images not retrieved to the total number of relevant images.
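The four metrics reduce to two complementary pairs, as in this Python sketch (names ours; it assumes non-empty retrieved and relevant sets):

def retrieval_metrics(retrieved, relevant):
    """Precision, recall, false-alarm and miss rates for one query.

    retrieved: set of image ids returned by the system
    relevant:  set of image ids actually relevant to the query
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm": 1.0 - precision,  # irrelevant retrieved / retrieved
        "miss_rate": 1.0 - recall,       # relevant not retrieved / relevant
    }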

For comparison, three image retrieval schemes were used in the experiments. Two of them were clustering-based image retrieval using J-means clustering and k-means clustering, respectively, and the third was image retrieval using global features. For the global-feature method, a retrieved image is considered relevant if it contains the query segment.


Fig. 3 a The segmented query image; b–f images retrieved (in order) using water as the query segment; g–l images retrieved (in order) using the greenery segment as the query segment


Table 3 shows the number of images retrieved against the number of relevant images retrieved for each segment used, and Figs. 3 and 4 show image retrieval examples using the two clustering-based image retrieval methods (Fig. 3) and retrieval using global features (Fig. 4). From Table 3 and Figs. 3 and 4, we observe that the image retrieval algorithm based on J-means clustering (the method proposed in this paper) retrieved the greatest number of relevant images in nearly all cases.

Figures 5 and 6 show the average precision and false alarm rates for the three methods. In Fig. 5, the number of relevant images retrieved increases with the number of images retrieved; J-means clustering yields a higher number of relevant images, while the global feature-based method performs worst. In Fig. 6, the number of irrelevant images retrieved is compared with the number of images retrieved; the J-means segmentation method retrieved fewer irrelevant images than the other two methods.

As we observe, the J-means clustering method supersedes both the k-means clustering method and the global feature method in terms of retrieval performance. The recall and miss rate behave the same as the precision and false alarm rate, because the number of relevant images in the database exceeds the number of images retrieved and can be taken as equal to the number of images retrieved here. The graph of recall is therefore the same as that of precision, and that of miss rate

Fig. 4 a–f Query results (in order of retrieval) using global features (for the query shown in Fig. 3a)

Table 3 Number of relevant images retrieved versus number of images retrieved, for each query segment

Number of images retrieved:   4     8     12     16     20     24

Water
  J-means                     4     8     11     13     15     16
  k-means                     4     7      9     11     13     15
  Global                      4     6      9     12     14     15

Mountains
  J-means                     4     7     10     12     15     19
  k-means                     2     5      7      9     10     13
  Global                      4     7      9     12     15     17

Grass
  J-means                     4     7     11     14     16     18
  k-means                     4     6      8     10     14     14
  Global                      2     3      4      6      7      8

Sky
  J-means                     4     7     11     15     17     19
  k-means                     4     6      9     12     14     15
  Global                      4     7      8     11     11     13

Trees
  J-means                     3     6      8     12     14     18
  k-means                     3     6     10     13     14     17
  Global                      4     6      7      9      9     11

Clouds
  J-means                     4     7     11     14     16     19
  k-means                     4     8     11     14     18     20
  Global                      4     8     12     16     19     22

Average
  J-means                     3.66  7     10.33  13.33  15.5   18.16
  k-means                     3.5   6.33   9.16  11.66  14     15.83
  Global                      3.66  6.16   8.16  11     12.5   14.33


Fig. 5 Graph representing the precision rate for the three retrieval methods

is similar to that of the false alarm rate. Comparing the results, we observe that J-means based clustering leads to a viable retrieval system.

6 Conclusion

This paper introduces a clustering-based retrieval system capable of retrieving images from a database by comparing the segments in the images. The novelty lies in the clustering, which includes a spatial term that improves the homogeneity of the segments computed in the image. The J-means clustering-based retrieval system was found to be more successful in retrieving relevant images than the global and k-means clustering-based retrieval methods, in terms of both the number of relevant images retrieved and the number of irrelevant images retrieved.

A few extensions and refinements would improve this system toward the goal of a completely automated digital library retrieval system. Areas specific to this system include automatic computation of the number of segments and classification of the database for efficient retrieval. Various methods of determining the number of classes exist, but they do not necessarily match human perception. The retrieval time for vast databases can be reduced by organizing the database into classes, formed on the basis of the dominant features in the images, such as dominant color, dominant texture or dominant shape.

Fig. 6 Graph representing the false alarm rate of the three retrieval methods

References

1. Rui, Y., Huang, T.S., Chang, S.F.: Image retrieval: Past, present, and future. In: Proceedings of ISMIP (December 1997)
2. Tang, J., Avula, S.R., Acton, S.T.: DIRECT: A decentralized image retrieval system for the national STEM digital library. Inform. Technol. Libr. 23(1), 9–15 (2004)
3. Avula, S.R., Tang, J., Acton, S.T.: Image retrieval using segmentation. In: Proceedings of the IEEE Systems and Information Engineering Design Symposium, Charlottesville, Virginia (April 2003)
4. Hartigan, J.: Clustering Algorithms. Wiley, New York (1975)
5. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley (1973)
6. Porter, R., Canagarajah, N.: A robust automatic clustering scheme for image segmentation using wavelets. IEEE Trans. Image Process. 5(4), 662–665 (1996)
7. Haralick, R.M., Shapiro, L.G.: Computer and Robot Vision, vol. 1. Addison-Wesley, New York (1993)
8. Sahoo, P.K., Soltani, S., Wong, A.K.C.: A survey of thresholding techniques. Comput. Vis. Graph. Image Process. 41, 233–260 (1988)
9. Acton, S.T., Mukherjee, D.P.: Scale space classification using area morphology. IEEE Trans. Image Process. 9, 623–635 (2000)
10. Beucher, S., Lantuejoul, C.: Use of watersheds in contour detection. In: Proceedings of the International Workshop on Image Processing, Real-Time Edge and Motion Detection/Estimation, Rennes, France (September 17–21, 1979)
11. Zhu, S.C., Yuille, A.: Region competition: Unifying snakes, region growing and Bayes/MDL for multiband image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 18(9), 884–900 (1996)
12. Partha, P.M., Mukherjee, D.P., Acton, S.T.: Agglomerative clustering for image segmentation. In: Proceedings of the IEEE International Conference on Image Processing, Rochester, New York (September 22–25, 2002)
13. Sethian, J.A.: Level Set Methods and Fast Marching Methods. Cambridge University Press (2000)
14. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. Int. J. Comput. Vis. 1(4), 321–331 (1987)
15. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley (1967)
16. Ray, S., Turi, R.H.: Determination of number of clusters in k-means clustering and application in color image segmentation. In: Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques (ICAPRDT'99), pp. 137–143 (1999)
17. Berkhin, P.: Survey of clustering data mining techniques. Technical report, Accrue Software (2002)
18. Liew, A.W.C., Leung, S.H., Lau, W.H.: Fuzzy image clustering incorporating spatial continuity. IEE Proc. Vision, Image and Signal Processing 147(2), 185–192 (2000)
19. Flickner, M., Sawhney, H., Niblack, W., Ashley, J.: Query by image and video content: The QBIC system. IEEE Comput. 28, 23–33 (1995)
20. Bach, J.R., Fuller, C., Gupta, A., Hampapur, A., Horowitz, B., Humphrey, R., Jain, R.C., Shu, C.: Virage image search engine: An open framework for image management. In: Proceedings of SPIE (Storage and Retrieval for Image and Video Databases IV), vol. 2670, pp. 76–87 (1996)
21. Smith, J.R., Chang, S.F.: VisualSEEk: A fully automated content-based image query system. In: Proceedings of ACM Multimedia, Boston, MA (1996)
22. Ma, W.Y., Manjunath, B.S.: NETRA: A toolbox for navigating large image databases. In: Proceedings of the IEEE International Conference on Image Processing (1997)
23. Swain, M.J., Ballard, D.H.: Indexing via color histograms. In: Proceedings of the Third International Conference on Computer Vision (December 1990)
24. Stricker, M.A., Orengo, M.: Similarity of color images. In: SPIE Proceedings, vol. 2420 (1995)
25. Tay, P., Havlicek, J.P., DeBrunner, V.: Discrete wavelet transform with optimal joint localization for determining the number of image texture segments. In: Proceedings of the IEEE International Conference on Image Processing (2002)
26. Manjunath, B.S., Ma, W.Y.: Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. (Special Issue on Digital Libraries) 18(8), 837–842 (1996)
27. Tang, J., Zhang, Y.G.: A perfect reconstruction, size-limited filter bank for orthogonal, wavelet-based, finite-signal subband processing. Digital Signal Process. 11(4), 304–328 (2001)
28. Shapiro, J.M.: Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. Signal Process. 41, 3445–3462 (1993)
29. Do, M.N., Vetterli, M.: Wavelet-based texture retrieval using generalized Gaussian density and Kullback–Leibler distance. IEEE Trans. Image Process. 11, 146–158 (2002)
30. Tang, J., Acton, S.T.: Locating human faces in a complex background including non-face skin colors. J. Electron. Imaging 12(3), 423–430 (2003)
31. http://www.cs.washington.edu/research/imagedatabase/groundtruth/
32. Cassidy, D., Carthy, J., Drummond, A., Dunnion, J., Sheppard, J.: The use of data mining in the design and implementation of an incident report retrieval system. In: Proceedings of the IEEE Systems and Information Engineering Design Symposium, Charlottesville, Virginia (April 2003)