
Pattern Recognition Letters 24 (2003) 2711–2721

www.elsevier.com/locate/patrec

Shape measures for image retrieval

George Gagaudakis *, Paul L. Rosin

Department of Computer Science, Cardiff University, Newport Road, Cardiff, CF24 3XF, UK

Received 15 December 2002; received in revised form 12 March 2003

Abstract

One of the main goals in content-based image retrieval is to incorporate shape into the process in a reliable manner.

In order to overcome the difficulties of directly obtaining shape information (in particular avoiding region segmenta-

tion) we develop several shape measures that tackle the problem in an indirect manner, requiring only a minimal

amount of segmentation. A histogram-based scheme is then used, maintaining low complexity with high efficiency and

robustness. The obtained results showed that the combination of the shape measures provide an improvement over the

colour histogram.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Content-based image retrieval; Histogram; Shape; Texture; Colour

1. Introduction

A limitation with content-based image retrieval

(CBIR) systems is their restriction to primitive image features such as colour and texture. Ideally

CBIR systems would operate on the semantic im-

age content and accept queries of that nature (e.g.

find pictures of steam trains in the country;

Eakins, 1996).

The early CBIR systems used techniques such

as colour histograms because they were easy to

compute, robust, and fairly effective. However,


randomly scrambling the positions of all the pixels

in the image leaves its histogram unaltered, and it

soon became apparent that it was necessary to

incorporate some spatial information into the search. One scheme was to allow the user to sketch

query shapes which were matched to the image

database (Flickner et al., 1995), but this is not

convenient to the average (unartistic) user. Alter-

natively, in (Pass and Zabih, 1999) a scheme of

joint histograms is exploited to incorporate more

information, than just colour, in the process of

image indexing. That approach kept the indexing process automated while performance was in-

creased.

An obvious approach to include shape is to

segment the image into regions. It is then straight-

forward to measure region shape as well as deter-

mining spatial interrelationships between regions.

The difficulty is that segmentation is inherently


such a difficult task that the performance of

current algorithms falls far short of being able

to provide an adequate input to such schemes

(Gurari and Wechsler, 1982; Cooper, 1998). In

some contexts this may not be such a problem. For

instance, in region-based classification it is acceptable to over-segment the image since the

fragmentation does not disadvantage the classifier.

However, for CBIR the region sizes, shapes, and

positions are just as relevant as their underlying

colour, intensity, and texture. Thus the assump-

tion of many region-based CBIR schemes on very

high-quality segmentation was unrealistic (Chang

et al., 1987), while more recent approaches that minimise their dependency on good segmenta-

tion are limited in scope (Smith and Li, 1999).

(Of course, in some specific applications such as

trademark retrieval it is feasible to reliably extract

the objects as simple segmentation such as thres-

holding is often sufficient, and then a host of shape

measures can be applied; Mehtre et al., 1997.)

Rather than perform region segmentation some researchers have investigated the use of interest

points as a means of localising processing to sig-

nificant image windows. Properties, preferably in-

variant to intensity scaling, rotation, etc. can then

be extracted, and used for retrieval within a voting

scheme (Schmid and Mohr, 1997) or by histogram

matching (Wolf et al., 2000). Still, the difficulty is

that corner detection and other interest operators are typically unreliable since they are looking for

relatively complex features based on small win-

dows of information.

This suggests edge detection as a more reliable

approach, since it lies somewhere in between re-

gions and points. It does not require a complete

partitioning of the image like region segmentation.

Only the edges are of interest, and they only cover a fraction (e.g. a tenth) of the image. It is true that

edges are detected using local window operators

just like corners. However, the standard linking

phase to generate connected edge curves provides

a level of noise suppression, and easily enables

isolated insignificant edges to be eliminated. Nev-

ertheless, although edge detection may be more

tractable than region or corner detection it is still prone to many errors such as mislocalisation, false

edges, drop-out, incorrect linking, and so on, and

therefore edge-based CBIR techniques must be

able to cope with these difficulties.

Some of our earlier work was based on edges

(Gagaudakis and Rosin, 2002). The multi-scale

salience distance transform (Rosin and West,

1995) was run on the edge gradient map. This had the effect of propagating weighted distances from

the edges throughout the image. These values were

histogrammed, and used as a signature of the im-

age. The advantage of this approach was that it

utilised some of the spatial information within the

image without requiring high quality segmenta-

tion. The distance transform is not dependent on

connectivity, and small changes in the edges generally only cause small changes in the distance

map. Weighting the distances by the edge magni-

tudes also reduces the effect of low strength spu-

rious edges while avoiding the sensitivity to an

edge magnitude threshold. The benefit of the his-

togram approach is that it is robust since relatively

small numbers of potentially large errors can be

accommodated. Thus, errors in the edge map produce only a limited effect on the histogram. It can be

seen that our goal is to incorporate some aspects of

shape information into the CBIR process, prefer-

ably without explicitly having to extract shapes

(i.e. regions) from the image.

In the same vein of obtaining some aspect of

shape from edges are approaches by Jain and

Vailaya (1996) and Zhou and Huang (2000). The former histograms the edge orientations. Since

the resulting histogram is not rotation invariant

matching requires the histogram to be cyclically

shifted to find the best match. The latter is based

on the analogy of filling edge curves with a flow of

water. Various features are extracted such as fill

time, the number of forks encountered, the num-

ber of loops, etc. Our previous work also explored other non-edge-based methods of indirect shape

measurement.

2. Methods

Our objective is to enhance the colour histo-

gram with additional information extracted from images. We carried out our strategy in three steps.

We first tried to expand our current methods to


take the spatial distribution of colour into ac-

count, as shown in Section 2.1. Then we used

statistical methods over local windows to extract

aspects of colour and textural characteristics from

the image (Sections 2.2 and 2.3). The final step was

to calculate different shape aspects of the image in an indirect fashion (Sections 2.4 and 2.5).

2.1. Colour labels vs. distance transforms

In (Gagaudakis and Rosin, 2002) we histo-

grammed the intensities of the distance map, pro-

duced by applying multi-scale distance transform

to the detected edges (Fig. 1b). This incorporated spatial information with respect to intensity

changes, but ignored colour content. To remedy

this we followed a similar trend and applied the

multi-scale distance transform to the boundaries

of the extracted colour regions (Fig. 1c),

creating a 1D histogram of distances to colour

edges.

Trying to use colour in a more active way we expanded the process by combining the colour and

shape information, forming a two-dimensional

histogram. Each bin in that histogram represents

the frequency of occurrence of a colour at some

distance from a feature. We applied this approach

both on the detected edges and the boundaries of

the colour regions to obtain two histogram fea-

tures.

A straightforward approach was taken to the

colour region segmentation. The objective was not

to provide semantically meaningful regions, but

rather to provide a stable partitioning. Assuming

no severe illumination change, we achieved that by

a transformation of the image from the RGB space

to a perceptual set of colour labels (Berlin and Kay, 1969; see Seaborn and Stonham (1999) for a recent application). The RGB values were mapped to Berlin and Kay colour labels, so that pixels of perceptually similar colour were assigned the same label. To reduce noise we first apply Gaussian smoothing (σ = 2.8) to the RGB image and after labelling run majority voting filtering using a 5 × 5 mask.

Fig. 1. Example of an image and the resulting distance transforms with the relevant overlaid colour region edges/boundaries: (a) original, (b) edges, (c) region boundaries.
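To make the construction concrete, the following minimal Python sketch builds the colour-label/distance 2D histogram. It is not the authors' implementation: the prototype RGB values standing in for the Berlin and Kay labels are illustrative, and a plain Euclidean distance transform (scipy.ndimage.distance_transform_edt) replaces the multi-scale salience distance transform used in the paper.

```python
import numpy as np
from scipy import ndimage

# Hypothetical RGB prototypes standing in for the 11 Berlin and Kay colour labels.
PROTOTYPES = np.array([
    [0, 0, 0], [255, 255, 255], [128, 128, 128],   # black, white, grey
    [255, 0, 0], [0, 255, 0], [0, 0, 255],         # red, green, blue
    [255, 255, 0], [255, 165, 0], [165, 42, 42],   # yellow, orange, brown
    [255, 192, 203], [128, 0, 128],                # pink, purple
], dtype=float)

def colour_labels(rgb, sigma=2.8):
    """Smooth the image, then assign each pixel its nearest prototype label."""
    smoothed = np.stack([ndimage.gaussian_filter(rgb[..., c].astype(float), sigma)
                         for c in range(3)], axis=-1)
    d = np.linalg.norm(smoothed[..., None, :] - PROTOTYPES, axis=-1)
    return np.argmin(d, axis=-1)

def label_distance_histogram(rgb, n_dist_bins=32, max_dist=64.0):
    """2D histogram: colour label vs. distance to the nearest label boundary."""
    labels = colour_labels(rgb)
    # Boundary pixels: label differs from the pixel below or to the right.
    boundary = np.zeros(labels.shape, bool)
    boundary[:-1, :] |= labels[:-1, :] != labels[1:, :]
    boundary[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    # Plain Euclidean distance transform (the paper uses a multi-scale
    # salience distance transform instead).
    dist = ndimage.distance_transform_edt(~boundary)
    d_bins = np.clip((dist / max_dist * n_dist_bins).astype(int), 0, n_dist_bins - 1)
    hist = np.zeros((len(PROTOTYPES), n_dist_bins))
    np.add.at(hist, (labels.ravel(), d_bins.ravel()), 1)
    return hist / hist.sum()
```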

2.2. Local colour entropy

In an attempt to involve more colour in the

feature histograms we calculated entropy in local

windows over the hue map of the image. As a first step we transformed the image to HSV, keeping only the hue channel and discarding the rest of the information (as we are interested only in the plain colour aspect of the image). Then we define a set of N × N overlapping windows. For each window, centred at position (x, y), we generate the histogram of the hues and then calculate the entropy EH_xy = −Σ_i p_i log₂ p_i of the histogram. The desired image signature is then obtained by simply

histogramming the entropy values over all the

windows.

The size of the hue histogram affects the cal-

culation of the colour entropy. We experimented

with static and dynamic histogram sizes and the

best results were obtained with dynamically sized histograms containing min(w², 256) bins, with w being the width of the window.
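A minimal sketch of the local hue entropy signature follows, assuming the hue channel has already been extracted as an array of values in [0, 1); the 20 × 20 window grid, 25% overlap and min(w², 256) bins follow the text, while the number of output bins is an arbitrary choice.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (base 2) of a normalised histogram."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def local_hue_entropy_signature(hue, n_windows=20, overlap=0.25, n_out_bins=32):
    """Histogram of local hue entropies; `hue` is an HxW array in [0, 1]."""
    h, w_img = hue.shape
    w = max(min(h, w_img) // n_windows, 2)       # local window width
    step = max(int(w * (1.0 - overlap)), 1)      # 25% overlap between windows
    n_bins = min(w * w, 256)                     # dynamically sized hue histogram
    entropies = []
    for y in range(0, h - w + 1, step):
        for x in range(0, w_img - w + 1, step):
            patch = hue[y:y + w, x:x + w]
            counts, _ = np.histogram(patch, bins=n_bins, range=(0.0, 1.0))
            entropies.append(entropy(counts / max(counts.sum(), 1)))
    # Signature: histogram of the entropy values over all windows.
    sig, _ = np.histogram(entropies, bins=n_out_bins, range=(0.0, np.log2(n_bins)))
    return sig / max(sig.sum(), 1)
```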

2.3. Local vs. global statistics

Next we experimented with histogramming the

relation between local statistical image information

and the corresponding global image information.



In particular we wish to capture the similarity and

difference of the local statistics of small windows

against the corresponding statistics of the whole

image. This way we managed to capture the ho-

mogeneity of the image as well as the variation of

the statistics across the image. This can be considered as capturing some sort of global texture of

the image. The basic tool we used for this process is

based on Tsai's (1985) moment-preserving binary

thresholding. This choice was based on a brief

comparative study of a set of automatic thres-

holding methods used for CBIR, as described in

(Gagaudakis and Rosin, 2002).

Thresholding is applied both globally to the image and locally to the individual windows. Then

the percentage of pixels in the window that have

different values according to the two thresholds is

histogrammed.

Additionally, we histogrammed the amount of

blackness (proportion of black pixels) found in the

window content after thresholding.

In both Sections 2.2 and 2.3 we used overlapping windows. The difference caused by the amount of overlap was found to be negligible. In our experiments we used an overlap of 25% of the window size in each direction. In Fig. 2 an example of thresholding for measuring the difference is illustrated.

Fig. 2. A sub-window is shown in (a). The result of thresholding using only the local window information is shown in (b) while thresholding the full image produces the result in (c). The difference between the two (d) captures the difference (indicated by the black pixels) between local and global statistics.
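The sketch below illustrates this local-vs-global comparison, using a basic Otsu threshold as a stand-in for Tsai's moment-preserving threshold and a non-overlapping window grid for brevity; it returns the two signatures described above (threshold disagreement and local blackness).

```python
import numpy as np

def otsu_threshold(gray):
    """Basic Otsu threshold, used here only as a stand-in for Tsai's
    moment-preserving threshold."""
    hist, edges = np.histogram(gray, bins=256)
    mids = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)
    w1 = w0[-1] - w0
    m0 = np.cumsum(hist * mids)
    mu0 = np.where(w0 > 0, m0 / np.maximum(w0, 1), 0.0)
    mu1 = np.where(w1 > 0, (m0[-1] - m0) / np.maximum(w1, 1), 0.0)
    between = w0 * w1 * (mu0 - mu1) ** 2       # between-class variance
    return mids[np.argmax(between)]

def local_vs_global_signatures(gray, n_windows=20, n_bins=32):
    """Histograms of (a) local/global threshold disagreement and
    (b) local blackness, over a grid of windows."""
    t_global = otsu_threshold(gray)
    h, w_img = gray.shape
    w = max(min(h, w_img) // n_windows, 2)
    diffs, blackness = [], []
    for y in range(0, h - w + 1, w):
        for x in range(0, w_img - w + 1, w):
            patch = gray[y:y + w, x:x + w]
            local_bw = patch > otsu_threshold(patch)
            global_bw = patch > t_global
            diffs.append(np.mean(local_bw != global_bw))   # disagreement fraction
            blackness.append(np.mean(~local_bw))           # proportion of black pixels
    hd, _ = np.histogram(diffs, bins=n_bins, range=(0, 1))
    hb, _ = np.histogram(blackness, bins=n_bins, range=(0, 1))
    return hd / max(hd.sum(), 1), hb / max(hb.sum(), 1)
```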

2.4. Delaunay triangulation

In this section we consider another approach to

indirectly measuring shape. Its basis is the gener-

ation of the Delaunay triangulation (Preparata

and Shamos, 1985) of a set of edges. Edge detec-

tion and linking is first carried out, which enables

us to easily eliminate spurious short edge lists. The remainder is further sub-sampled by a factor of 10

which both speeds up triangulation and effectively

performs some noise suppression. The strength of

this approach is that connectivity is used to help

filter out noise but nevertheless the triangulation is

not dependent on connectivity and therefore can cope with edge linking errors. Tao and Grosky (1999a,b) used the Delaunay triangulation for im-

age indexing. In (Tao and Grosky, 1999a) their

scheme requires isolated objects while in (Tao and

Grosky, 1999b) it is assumed that the image con-

tent is rigid. While the method is scale and rotation

invariant it is not tolerant to rearrangement of

objects. Our scheme does not exploit any object extraction methods and we do not try to capture

any spatial dependencies explicitly.

We calculate different aspects of the individual

triangles and histogram their properties over the

complete triangulation. In particular we are in-

terested in:

Area: The area of each triangle.

Aspect ratio: The ratio of the longest to the shortest edge of each triangle.

Length: Lengths of triangle edges.
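The following sketch, assuming the sub-sampled edge coordinates are already available, histograms these three triangle properties using SciPy's Delaunay triangulation; the bin counts and the 99th-percentile clipping of the ranges are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangle_histograms(edge_points, n_bins=32):
    """Histograms of triangle area, aspect ratio (longest/shortest side) and
    side length over the Delaunay triangulation of sub-sampled edge points.

    edge_points: (N, 2) array of edge pixel coordinates (N >= 4).
    """
    tri = Delaunay(np.asarray(edge_points, dtype=float))
    p = tri.points[tri.simplices]                     # (T, 3, 2) triangle vertices
    a, b, c = p[:, 0], p[:, 1], p[:, 2]
    sides = np.stack([np.linalg.norm(b - a, axis=1),
                      np.linalg.norm(c - b, axis=1),
                      np.linalg.norm(a - c, axis=1)], axis=1)
    # Triangle area via the 2D cross-product determinant.
    area = 0.5 * np.abs((b - a)[:, 0] * (c - a)[:, 1] - (b - a)[:, 1] * (c - a)[:, 0])
    aspect = sides.max(axis=1) / np.maximum(sides.min(axis=1), 1e-9)
    hists = []
    for values in (area, aspect, sides.ravel()):
        hi = max(float(np.percentile(values, 99)), 1e-9)    # clip the long tail
        h, _ = np.histogram(values, bins=n_bins, range=(0.0, hi))
        hists.append(h / max(h.sum(), 1))
    return hists   # [area histogram, aspect-ratio histogram, length histogram]
```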

In Fig. 3, an example of an image and the obtained triangulation is illustrated. Note, the sub-sampling of the edges was chosen to be very coarse for viewing purposes.

Fig. 3. Triangulation of sub-sampled edges.

2.5. More shape methods

In an attempt to use more shape in the process

we tried to calculate shape measures in a more

direct way. Avoiding meaningful segmentation,

which is not practical (Gurari and Wechsler, 1982;


Cooper, 1998), we used edges and colour regions

which are potentially more robust. Using the

outcome of that partitioning we calculated several

shape characteristics of region edges and colour

regions.

2.5.1. Edge curvature

In a similar manner to the edge orientation histogram we generated an edge curvature histogram.

Not only will this contain higher order shape in-

formation, but it is invariant to changes in orientation. Curvature is calculated using the cosine

entation. Curvature is calculated using the cosine

curvature. Due to the difficulty in reliably estimating curvature the resulting histogram appeared

tion curvature the resulting histogram appeared

to be noisy when the window size was relatively

small. To compensate we smoothed the results by using a large fixed window size (although more

sophisticated adaptive methods are possible).
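A simple k-cosine formulation is sketched below as one plausible reading of the cosine curvature used here; the arm length k plays the role of the large fixed window mentioned above.

```python
import numpy as np

def cosine_curvature_histogram(curve, k=10, n_bins=32):
    """Histogram of k-cosine curvature along a chain of edge points.

    curve: (N, 2) array of ordered edge coordinates; k is the (fixed,
    fairly large) arm length used to stabilise the estimate.
    """
    curve = np.asarray(curve, dtype=float)
    cosines = []
    for i in range(k, len(curve) - k):
        a = curve[i - k] - curve[i]              # backward arm
        b = curve[i + k] - curve[i]              # forward arm
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:
            cosines.append(np.dot(a, b) / denom)
    hist, _ = np.histogram(cosines, bins=n_bins, range=(-1.0, 1.0))
    return hist / max(hist.sum(), 1)
```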

2.5.2. Region shape histograms

Although reliable segmentation is problematic

we did consider a region-based shape extraction

method. Colour regions were extracted using the

same process as in Section 2.1. Then we calculated the following properties: triangularity, elliptic-

ity, rectangularity, convexity, circularity (Rosin,

2000). In an attempt to overcome faulty segmen-

tation the shape properties were histogrammed

and used as a shape feature. Thus a few incorrect

regions would only have a small effect in the his-

togram.
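As an illustration of the region shape histogram idea, the sketch below histograms one such property (circularity, 4πA/P²) over a labelled region map (e.g. the output of scipy.ndimage.label applied to the colour label image); the perimeter estimate and bin range are rough, illustrative choices.

```python
import numpy as np
from scipy import ndimage

def circularity_histogram(label_image, n_bins=20):
    """Histogram of region circularity (4*pi*area / perimeter^2) over all
    labelled regions; other measures (ellipticity, convexity, ...) would be
    histogrammed in the same way."""
    values = []
    for lab in range(1, int(label_image.max()) + 1):
        mask = label_image == lab
        area = int(mask.sum())
        if area < 10:                          # skip tiny, unstable regions
            continue
        # Crude perimeter estimate: count boundary pixels of the region.
        perimeter = int(np.count_nonzero(mask & ~ndimage.binary_erosion(mask)))
        if perimeter > 0:
            values.append(4.0 * np.pi * area / perimeter ** 2)
    hist, _ = np.histogram(values, bins=n_bins, range=(0.0, 2.0))
    return hist / max(hist.sum(), 1)
```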

Initial testing showed that both of these methods produced poor results and so no further experimental details are provided. The edge curvature

did not appear to provide useful discriminatory

information, while the region shape method failed

due to variations in segmentation that occurred

between even similar images.

3. Testing and results

3.1. Testing protocol

Our objective is to improve the effectiveness of

the simple colour histogram by incorporating

shape. At this point we examine the performance

and potential of these shape methods when combined with other features. There are two main

aspects of interest in this context, namely, effec-

tiveness and efficiency.

Efficiency is closely related with the storage re-

quirements and responsiveness of a CBIR system.

Histogram-based approaches tend to have a low

level of computational complexity. However, at

this point we are not greatly concerned with all aspects of efficiency, such as the histogram size.

Modifications of histogram sizes are possible and

capable of increasing efficiency according to cer-

tain trade-offs.

Effectiveness approximates the performance of

a system by taking into account the relevance of

the retrieved images, given a query, as perceived by

the user. A simple way of obtaining such a ranking is by employing the recall and precision measures.

Recall represents the proportion of ‘‘correct’’

matches in the top-N list of the retrieved images.

Precision represents the spread of a cluster in a

database. So given a cluster the bigger the spread

the worse the precision.

Two remaining issues need to be clarified, (a)

how image similarity is quantified and (b) how multiple histogram features can be joined to a

single similarity value. Image similarity can be de-

termined by the distance between histograms. We

have experimented with various methods to fit the

purpose, such as Euclidean, Chi-square, Earth's Mover, etc. The results shown in this paper were


obtained using the Euclidean distance, while mul-

tiple histogram distances were combined using the

geometric mean to avoid the parameter tuning that

may be required by the weighted sum and the

complexity of more sophisticated methods.
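A minimal sketch of this retrieval step follows: per-feature Euclidean distances between histograms are combined with the geometric mean and the database is ranked by the combined score. The dictionary-of-histograms layout is an assumption made for the example.

```python
import numpy as np

def retrieve(query_feats, db_feats, eps=1e-12):
    """Rank database images by the geometric mean of per-feature Euclidean
    histogram distances.

    query_feats: dict {feature_name: 1D numpy histogram}.
    db_feats: list of such dicts, one per database image.
    """
    scores = []
    for feats in db_feats:
        dists = np.array([np.linalg.norm(query_feats[name] - feats[name])
                          for name in query_feats])
        # Geometric mean avoids per-feature weight tuning.
        scores.append(float(np.exp(np.mean(np.log(dists + eps)))))
    return np.argsort(scores)        # indices of the most similar images first
```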

Using the above methods, an experimental data-set for which the ground-truth classification

is available is essential. It is not hard to create a

collection of still images that can be clustered into

disjoint classes, as long as the content is simple

(such as faces, buildings, landscapes etc.). How-

ever as the number of classes increases and dis-

tinctions become more subtle it becomes harder to

complete the collection without the evaluation becoming very subjective.

To avoid the difficulty (and impracticality) of

manual groundtruthing, we follow a method sim-

ilar to Milanese and Cherbuliez's (1999) where still images extracted from video clips are used. A

broadcast TV signal from a local Greek station

(CRETA Channel) was captured at a rate of

two frames-per-second. This was then re-sampled to obtain nine still images representing each clip. A

total of 387 images, giving 42 groups, were used.

As in (Milanese and Cherbuliez, 1999) we assumed

that (a) the continuity of the visual content of a

clip is implied by the uninterrupted recording of a

video camera, (b) there is gradual change of con-

tent, from frame to frame, due to camera opera-

tions and subject motion, object appearance and disappearance.

Table 1
Methods, description and individual performance

Method  Description                                                  Recall  Precision
TXTR    Circular co-occurrence matrix                                83.49   8.58
CL01    Spatialised colour labels                                    82.77   9.07
CL      Berlin and Kay labels                                        82.51   6.95
BD2D    Colour labels vs. distance from colour region boundaries     81.96   10.40
ORNT    Edge orientation                                             76.71   16.10
CLSH    Colour labels vs. distance from extracted edges              73.87   13.26
ENTR    Local hue entropy                                            72.75   11.16
TARE    Triangle areas                                               42.63   28.22
LBLK    Local blackness                                              38.73   39.54
DIFF    Local binary difference                                      32.27   44.24
TLEN    Triangle length                                              32.21   40.48
BKDT    Multi-scale distance transform of colour region boundaries   30.86   58.46
DST     Multi-scale distance transform of extracted edges            22.83   61.96
TRAT    Aspect ratio of triangles                                    19.93   61.72

This method of performance evaluation is not

perfect. The ground truth is user dependent and

varies in time. Therefore there is not one ‘‘best’’

ground truth that would represent all the users' desires. However, its advantage is that it is scalable: a large test set can be assembled with low manual effort. We use this method of performance

evaluation to identify the potential of the devel-

oped methods and how they mix together. Fur-

thermore, we extensively tested the system with all

the images used as queries, allowing us to use the

results for further inspection of the system's view, as in (Gagaudakis et al., 2000).

Various system parameters exist, but these have not been tuned to the data beyond trialing on a

handful of images (not included in the evalua-

tion images) to check that they operate reasonably.

This revealed that while the edge and region de-

scriptor methods were overly sensitive to changes

in the system parameters, the remaining methods

(listed in Table 1) were robust. While further

parameter tuning would be expected to have an effect on performance, this was not the purpose

of this study. The list of parameters and their

corresponding values follows:

• Circular co-occurrence matrix: The diameter of

the digital circle, 16 pixels.

• Edge detection: Maximal suppression was fol-

lowed by thresholding, keeping the edge pixels having 80% of the maximum magnitude.


• Colour label regions: Prior to labelling the image

pixels Gaussian smoothing (σ = 2.4) was ap-

plied to the image, as we found out that it

helped to avoid some small spurious regions.

• Delaunay triangulation: After linking the image edges, we kept every tenth pixel, while edges

of less than 10 pixels were excluded.

• Window size: In all methods where the image

was partitioned to windows, the window size

was dependent on the image size and was such that a total of 20 × 20 windows would fit.

3.2. Results

In this paper we investigate the potential of

shape measures in CBIR, when used individually or

combined with other measures. Fourteen methods

were considered, including those described in this

paper as well as in (Gagaudakis and Rosin, 2002).

Table 1 shows the individual methods and their

ranking. We developed a system to try out all the possible combinations of subsets of those 14 methods, giving a total of over 16,000 combinations. The results were then grouped in terms of the number of methods used. From these groups we kept only the best two combinations, according to their ranking, which are listed in Table 2.

Table 2
Combinations of methods along with recall/precision ranking

Method                                                               Recall  Precision
TXTR ENTR                                                            91.12   5.79
CL ORNT                                                              90.03   4.92
ORNT ENTR BD2D                                                       93.65   4.45
TXTR ORNT ENTR                                                       93.65   4.74
TXTR ORNT ENTR BD2D                                                  94.80   3.59
TXTR ORNT ENTR DIFF                                                  94.17   4.50
TXTR ORNT ENTR DIFF BD2D                                             95.09   3.49
TXTR ORNT ENTR BD2D TARE                                             95.03   3.65
TXTR ORNT ENTR LBLK DIFF BD2D                                        95.23   3.68
TXTR ORNT ENTR DIFF BD2D TARE                                        95.14   3.52
TXTR ORNT ENTR LBLK BD2D TARE TRAT                                   95.26   4.12
TXTR ORNT ENTR LBLK DIFF BD2D TLEN                                   95.20   4.02
TXTR ORNT ENTR LBLK DIFF BD2D TARE TRAT                              95.11   3.96
TXTR ORNT ENTR LBLK DIFF BD2D TRAT TLEN                              95.06   4.23
TXTR ORNT ENTR LBLK DIFF BD2D TARE TRAT TLEN                         94.65   4.50
TXTR DIST ORNT ENTR LBLK DIFF BD2D TARE TRAT                         94.45   4.60
CL01 TXTR DIST ORNT ENTR LBLK DIFF BD2D TARE TLEN                    94.28   4.62
CL01 TXTR DIST ORNT ENTR LBLK DIFF BD2D TARE TRAT                    94.20   4.61
CL01 TXTR ORNT ENTR CLSH LBLK DIFF BD2D TARE TRAT TLEN               94.14   4.52
CL01 TXTR ORNT ENTR CLSH LBLK DIFF BKDT BD2D TARE TRAT               94.08   4.19
CL01 TXTR ORNT ENTR CLSH LBLK DIFF BKDT BD2D TARE TRAT TLEN          94.02   4.39
CL01 TXTR DIST ORNT ENTR LBLK DIFF BKDT BD2D TARE TRAT TLEN          93.97   4.78
CL01 TXTR DIST ORNT ENTR CLSH LBLK DIFF BKDT BD2D TARE TRAT TLEN     93.71   4.77
CL CL01 TXTR DIST ORNT ENTR LBLK DIFF BKDT BD2D TARE TRAT TLEN       93.36   4.53
CL CL01 TXTR DIST ORNT ENTR CLSH LBLK DIFF BKDT BD2D TARE TRAT TLEN  92.99   4.65

Good methods should gain high recall and low precision values.

Brief investigation showed that some methods appear in most (if not all) of the combinations. In particular, the hue entropy and the texture methods are at the top of the list, followed by the edge orientation and the colour labels vs. distance from colour region boundaries. The methods using the triangulation, although not at the top of the list, are involved in the top performing combinations. This shows that we cannot really predict the result of a combination of methods based only on their individual performance.

The nature of the image source suggested that

noise levels vary according to different parameters

like signal strength and external interference which

Recall Precision

91.12 5.79

90.03 4.92

93.65 4.45

93.65 4.74

94.80 3.59

94.17 4.50

95.09 3.49

95.03 3.65

95.23 3.68

95.14 3.52

95.26 4.12

95.20 4.02

95.11 3.96

95.06 4.23

94.65 4.50

94.45 4.60

EN 94.28 4.62

AT 94.20 4.61

RAT TLEN 94.14 4.52

ARE TRAT 94.08 4.19

ARE TRAT TLEN 94.02 4.39

RE TRAT TLEN 93.97 4.78

2D TARE TRAT TLEN 93.71 4.77

TARE TRAT TLEN 93.36 4.53

BD2D TARE TRAT TLEN 92.99 4.65

2718 G. Gagaudakis, P.L. Rosin / Pattern Recognition Letters 24 (2003) 2711–2721

vary in time. Our target was to investigate the

homogeneity of noise levels through clusters as

well as looking for any potential correlation be-

tween noise levels and feature performance. We

estimated the noise level of the images by measuring the standard deviation of their grayscale map using a fast noise variance estimation method (Immerkaer, 1996). In Fig. 4, the noise level (solid line) across the dataset is illustrated, overlaid by the performance of four features (dotted lines). These graphs show that there is no visible dependency of the feature performance on noise.

Fig. 4. Relation of noise and performance of different histogram features; in parentheses, the Pearson correlation coefficient.
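For reference, a sketch of the Immerkaer-style noise estimate used here is given below; it convolves the grayscale image with the standard Laplacian-difference mask and averages the absolute response.

```python
import numpy as np
from scipy import ndimage

def immerkaer_noise_sigma(gray):
    """Fast noise variance estimation (Immerkaer, 1996): convolve with a
    Laplacian-difference mask and average the absolute response."""
    mask = np.array([[1, -2, 1],
                     [-2, 4, -2],
                     [1, -2, 1]], dtype=float)
    h, w = gray.shape
    response = ndimage.convolve(gray.astype(float), mask)
    # Interior pixels only, to avoid border effects of the convolution.
    s = np.abs(response[1:-1, 1:-1]).sum()
    return np.sqrt(np.pi / 2.0) * s / (6.0 * (w - 2) * (h - 2))
```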

The results (Tables 1 and 2) suggested that the clusters may not be really disjoint as we were expecting. This prompted us to look at the database from two different aspects: at an individual image level, inspecting the histograms of the

images and at a cluster level, inspecting the dis-

tances between images.

At an individual image level we created images that resemble a form of spectrogram of the image histograms (see Fig. 5). Each column of pixels in these images is a histogram of an image. The white

lines are used to separate the image clusters (given

by the ground truth). The images are post-pro-

cessed for viewing purposes. This representation

enabled us to have a visual feel of the similarity or

dissimilarity of the image features, individually.

Additionally we drew conclusions on the ap-

plicability and discrimination potential of features:

Global level uniformity: In some cases the fea-

tures (e.g. DST) showed some uniformity, limit-

ing the discrimination power to a narrow part

of the histogram.

Noise: In addition, a ‘‘noisy’’ nature of the his-

togram caused problems, limiting the discrimi-

nation power even more.

Dithering effect: In another case, BKDT, the

histograms showed a high degree of uniformity

in cluster level and variation in global level,

which would be the ideal behaviour of a feature.

Although performance was not as good as expected due to small variations of the histograms, it looks like the histograms are shifted slightly, which is not taken into account in the

distance measure calculation.

Narrow band bin-population: In many cases a

narrow band of the histograms is populated

leaving a small part of the histogram to be used for discrimination, like TARE, TRAT or

TLEN. Rescaling the histograms could consti-

tute a partial solution, which would potentially

fail in cases of noisy histograms.

Global level variation: This is the kind of behav-

ior of the top performing features. Narrow

banding appears in some cases without consider-

ably affecting the discrimination power of the histogram. Such a phenomenon would suggest

that compression could be easily achieved by

carefully tuning the size of the histogram bins.

At a cluster level, we extracted the distances of

each image to all the other images in the database.

Since the number of possible combinations is large,

we selectively used some features, according to theirperformance. We used the extracted distances to

produce images, as shown in Fig. 6 where the po-

sition of each pixel represents two image indices and

the intensity their distance; the higher the intensity

the more similar the images. The matrix is arranged

in such a way that inter-cluster distances are rep-

resented by 9 × 9 blocks on the diagonal. We

used the distance matrices as an indicator for

(a) precision, which is indicated by homogeneous square blocks on the diagonal, and (b) recall, where the diagonal blocks are brighter than the blocks formed in the rest of the matrix. The performance measures, calculated in Section 3, were compared against the distance matrices and verified their expected appearance. That is, high performing methods displayed the behaviour in these two aspects.

Fig. 5. Histogram spectrums: (a) BKDT, (b) BD2D, (c) DST, (d) TXTR.

Fig. 6. Distance matrices. (a) Clusters are evident on the diagonal indicating good precision but there is confusion as shown by high off-diagonal values which suggest poor recall, (b) clustering is not visible indicating poor precision, (c, d) clusters are formed with the diagonal blocks being highlighted indicating high recall and precision; the difference in precision between the two methods is visible through the intensity difference of the off-diagonal blocks.
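The distance-matrix images of Fig. 6 can be reproduced along the following lines; the sketch assumes one feature vector (or concatenated histogram) per image, ordered so that the nine frames of each clip are adjacent.

```python
import numpy as np

def distance_matrix_image(features):
    """Pairwise-distance image: pixel (i, j) encodes the similarity of images
    i and j. With images ordered clip by clip (nine per clip), well-separated
    clusters show up as bright 9 x 9 blocks on the diagonal."""
    f = np.asarray(features, dtype=float)                 # (n_images, feature_dim)
    d = np.linalg.norm(f[:, None, :] - f[None, :, :], axis=-1)
    similarity = d.max() - d                              # brighter = more similar
    return similarity / max(float(similarity.max()), 1e-12)
```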

4. Conclusions

This paper describes a number of new shape

measures for use in CBIR. The general strategy

was to avoid performing region segmentation as this was considered too unreliable. Instead a va-

riety of schemes based on edges were developed.

We ran the methods on our existing system and

tested all possible combinations of our new and

old measures. In many cases by incorporating

shape the performance was improved over the

plain colour labels histogram. Initial experiments showed that the edge curvature and the colour region shape measure did not work well, and so

they were not investigated further. Focusing on the

shape aspect, we identified the potential of mea-

suring indirect shape using the Delaunay triangu-

lation.


Our future work schedule includes further in-

vestigation on the proposed methods, in particu-

lar:

Local vs. global statistics: As well as investigat-

ing local vs. global statistics we will also measure the change of local statistics over a neighbour-

hood, capturing local information in a more de-

tailed manner.

Delaunay triangulation: Use more properties of

the triangulation, such as vertex order (the number

of edges incident to a vertex), triangle perimeter

and triangle altitude. Furthermore, we will use

the triangulation as a form of interest operator, using the triangles to partition the image and

focus the extraction of features (e.g. colour,

texture) on local triangular windows.

Method ranking: Currently we combine mea-

sures using the geometric mean of the distances

between individual feature histograms. This is

simple and non-parametric; however, we would

like to investigate a more adaptive way of fusing them. One approach could be to learn appropri-

ate combination rules from training data.

Additionally we are working on involving a

learning process to be used as a feature selector. By

investigating individual query results a system

could possibly be enabled to suggest feature com-

binations that comply with certain performance requirements, finding a balance to achieve the best

performance with the least effort (computational

complexity and method reliability compensation).

Acknowledgement

This project is funded by EPSRC grant no. GR/L94628.

References

Berlin, B., Kay, P., 1969. Basic Color Terms: Their Universality

and Evolution. University of California Press.

Chang, S., Shi, Q., Yan, C., 1987. Iconic indexing by 2-D

strings. PAMI 9 (3), 413–428.

Cooper, M., 1998. The tractability of segmentation and scene

analysis. IJCV 30 (1), 27–42.

Eakins, J., 1996. Automatic image content retrieval––are we

getting anywhere? In: Proc. Third Internat. Conf. Electronic

Library and Visual Information (ELVIRA3). pp. 123–135.

Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q.,

Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D.,

Steele, D., Yanker, P., 1995. Query by image and video

content: The QBIC system. Computer 28 (9), 23–32.

Gagaudakis, G., Rosin, P.L., 2002. Incorporating shape into

histograms for CBIR. Pattern Recognition 35, 81–91.

Gagaudakis, G., Rosin, P., Chen, C., 2000. Using CBIR and

pathfinder networks for image database visualisation. In:

ICPR00, vol. I. pp. 1052–1055.

Gurari, E., Wechsler, H., 1982. On the difficulties involved in

the segmentation of pictures. IEEE Trans. PAMI 4 (3), 304–

306.

Immerkaer, J., 1996. Fast noise variance estimation. CVIU 64

(2), 300–302.

Jain, A., Vailaya, A., 1996. Image retrieval using color and

shape. Pattern Recognition 29 (8), 1233–1244.

Seaborn, M., Hepplewhite, L., Stonham, J., 1999. Fuzzy colour category

map for content based image retrieval. British Machine

Vision Conf., 103–112.

Mehtre, B., Kankanhalli, M., Lee, W., 1997. Shape measures

for content based image retrieval: A comparison. Inf. Proc.

and Manag. 33 (3), 319–337.

Milanese, R., Cherbuliez, M., 1999. A rotation, translation, and

scale-invariant approach to content-based image retrieval. J.

Visual Comm. Image Representation 10, 186–196.

Pass, G., Zabih, R., 1999. Comparing images using joint

histograms. Multimedia Systems 7, 234–240.

Preparata, F., Shamos, M., 1985. Computational Geometry.

Springer-Verlag.

Rosin, P., 2000. Measuring shape: Ellipticity, rectangularity,

and triangularity. In: ICPR00, vol. I. pp. 952–955.

Rosin, P., West, G., 1995. Salience distance transforms.

Graphical Models Image Process. 57, 483–521.

Schmid, C., Mohr, R., 1997. Local grayvalue invariants for

image retrieval. PAMI 19 (5), 530–535.

Smith, J., Li, C., 1999. Image classification and querying using

composite region templates. CVIU 75 (1/2), 165–174.

Tao, Y., Grosky, W., 1999a. Delaunay triangulation for image

object indexing: A novel method for shape representation.

In: IS&T/SPIE Symposium on Storage and Retrieval for

Image and Video Databases VII.

Tao, Y., Grosky, W., 1999b. Object-based image retrieval using

point feature maps. In: Proc. Internat. Conf. Database

Semantics (DS-8). pp. 59–73.

Tsai, W., 1985. Moment-preserving thresholding. Computer

Vision, Graphics Image Process. 29, 377–393.

Wolf, C., Jolion, J., Kropatsch, W., Bischof, H., 2000. Content

based image retrieval using interest points and texture

features. In: ICPR00, vol. IV. p. 1A.

Zhou, X., Huang, T., 2000. Image representation and retrieval

using structural features. In: ICPR00, vol. I. pp. 1039–1042.