Spatial layout representation for query-by-sketchcontent-based image retrieval
E. Di Sciascio *, F.M. Donini, M. Mongiello
Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Via Re David, 200 I-70125 Bari, Italy
Received 10 August 2001; received in revised form 30 November 2001
Most content-based image retrieval techniques and systems rely on global features, ignoring the spatial relationships
in the image. Other approaches take into account spatial composition but usually focus on iconic or symbolic images.
Nevertheless a large class of users queries request images having regions, or objects, in well-determined spatial ar-
rangements. Within the framework of query-by-sketch image retrieval, we propose a structured spatial layout repre-
sentation. The approach provides a way to extract the image content starting from basic features and combining them
in a higher-level description of spatial layout of components, characterizing the semantics of the scene. We also propose
an algorithm that measures a weighted global similarity between a sketched query and a database image. 2002Elsevier Science B.V. All rights reserved.
Keywords: Image retrieval; Similarity retrieval; Query by sketch; Spatial relationships
Content-based image retrieval (CBIR) systemsmainly perform extraction of visual features, typ-ically, color, shape and texture as a set of uncor-related characteristics. Such features provide aglobal description of images but fail to considerthe meaning of portrayed objects and the seman-tics of scenes. At a more abstract level of knowl-edge about the content of images, extraction of
object descriptions and their relative positionsprovides a spatial conguration and a logicalrepresentation of images. Because of the lack oflow-level features extraction such methods gener-ally fail to consider the physical extension of ob-jects and their primitive features. Anyway, the twoapproaches for CBIR should be considered ascomplementary. An image retrieval system shouldperform similarity matching based on the repre-sentation of visual features conveying the contentof segmented regions; besides, it should capturethe spatial layout of the depicted scenes in order toface the user expectations.
In this paper we strive to overcome the gapexisting between these two approaches. Hence weprovide a method for describing the content of animage as the spatial composition of objects/regions
Pattern Recognition Letters 23 (2002) 15991612
*Corresponding author. Tel.: +39-805-460641; fax: +39-805-
E-mail addresses: firstname.lastname@example.org (E. Di Sciascio),
email@example.com (F.M. Donini), firstname.lastname@example.org (M.
0167-8655/02/$ - see front matter 2002 Elsevier Science B.V. All rights reserved.PII: S0167-8655 (02 )00124-1
in which each one preserves its visual features,including shape, color and texture. To this aim wepropose a representation for spatial layout ofsketches and a good performance algorithm forcomputing a similarity measure between a sket-ched query and a database image. Similarity is aconcept that involves human interpretation andis aected by the nature of real images; thereforewe base our similarity on a fuzzy concept thatincludes exact similarity in the case of perfectmatching.
The outline of the paper is as follows: in thenext section, we revise related work on spatial re-lationships based image retrieval and outline ourproposal. In Section 3 we dene how we representshapes in the user sketch, and their spatial ar-rangement, as well as other relevant features, suchas color and texture. We dene similarly how wesegment regions in an image, and compare thesketch with an arrangement of regions. Then inSection 4 we dene the formulae we use to com-pute single similarities, and how we compose themusing a more general algorithm. In Section 5 wepresent a set of experimental results, showingthe accordance between our method and a groupof independent human experimenters. Finally, wedraw conclusions in the last section.
2. Related work and proposal of the paper
In the work by Chang et al. (1983), which canbe considered the ancestor of works on image re-trieval by spatial relationships, the modeling oficonic images was presented in terms of 2D strings,each of the strings accounting for the position oficons along one of the two planar dimensions. Inthis approach retrieval of images basically revertsto simpler string matching. In the paper by Gu-divada and Raghavan (1995) objects in a symbolicimage are associated with vertexes in a weightedgraph. The spatial relationships among the objectsare represented through the list of the edges con-necting pairs of centroids. A similarity functioncomputes the degree of closeness between the twoedge lists representing the query and the databasepicture as a measure of the matching between thetwo spatial graphs.
More recent papers on the topic include Gudi-vada (1998) and El-Kwae and Kabuka (1999),which basically propose extensions of the stringapproach for ecient retrieval of subsets of icons.Gudivada (1998) proposes a logical representationof an image based on so-called hR-strings. Sucha representation also provides a geometry-basedapproach to iconic indexing based on spatial re-lationships between the iconic objects in an imageindividuated by their centroid coordinates. Trans-lation, rotation and scale variant images and thevariants generated by an arbitrary composition ofthese three geometric transformations are con-sidered. The similarity between a database and aquery image is obtained through a spatial simi-larity algorithm, simg, that measures the degree ofsimilarity between a query and a database imageby comparing the similarity between their hR-strings. The algorithm recognizes rotation, scaleand translation variants of the arrangement ofobjects, and also subsets of the arrangements. Aconstraint limiting the practical use of this ap-proach is the assumption that an image can con-tain at most one instance of each icon or object. Ofcourse the worst constraint of the algorithm de-pends on the fact that it takes into account onlythe points in which the icons are placed and notthe real conguration of objects. For example apicture with a plane in the left corner and a housein the right one has the same meaning of a picturewith the same objects in dierent proportions: asmall plane and a big house; besides the relativesize and orientation of the objects are not takeninto account.
El-Kwae and Kabuka (1999) propose a furtherextension of the spatial-graph approach that in-cludes both the topological and directional con-straints. The topological extension of the objectscan be obviously useful in determining furtherdierences between images. The similarity algo-rithm extends the graph-matching proposed byGudivada and Raghavan (1995) and retains theproperties of the original approach, including itsinvariance to scaling, rotation and translation andis also able to recognize multiple rotation variants.Even though it considers the topological extensionof objects it is far from considering the composi-tional structure of objects: the objects are consid-
1600 E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612
ered as a whole and no reasoning is possible aboutthe meaning and the purpose of objects or scenesdepicted. Several extensions of these approacheshave been proposed with an evaluation and a com-parison of computational complexity (Zhou et al.,2001).
The computational complexity of evaluating thesimilarity of two arrangements of shapes has beenanalyzed since early 90s. Results have been foundfor what are called type-0 and type-1 similarity inS.K. Chang classication (Chang, 1989). Briey,in type-0 and type-1 similarity, arrangements areconsidered similar if they preserve horizontal andvertical orderings. For example, if object A isbelow and on the left of object B in picture 1,picture 2 is considered similar if the same object Aappears below and on the left of B, regardless oftheir relative size and distance in the two pictures.Tucci et al. (1991) studied the case when there canbe multiple instances of an object, and found thatthe problem is NP-complete. Later, Guan et al.(2000) proved that the same lower bound alsoholds when objects are all distinct. The authorsalso gave a polynomial-time algorithm for type-2similarity, which is stricter than type-1 since itconsiders similar two arrangements only if one ofthe two is a rigid translation of the other. Type-2similarity is too strict for our approach, sincewe admit also rotational and scale variants of anarrangement.
When objects reduce to points, the problem ofevaluating the similarity of arrangements has beencalled point pattern matching. This problem hasbeen studied in computational geometry, where itis known as the constellation problem. In Car-doze and Schulman (1998), a randomized algo-rithm has been given for exact matching of pointsunder translations and rotations, but not underscaling. For matching n points in the plane, thealgorithm works in On2 log n, where the proba-bility of nding wrong matches is a decreasingexponential. A dierent probabilistic algorithmthis one missing good matcheshas been proposedby Irani and Raghavan (1996) for the best match-ing problem, which can also be a non-exact match.The algorithm considers matching under transla-tion, rotation, and scaling. However, the pointpattern matching problem solves just the matching
of centroids of an arrangement of shapes, and itis not obvious if and how algorithms could begeneralized to matching arrangements of shapes.
All the previously described methods are partialsolutions to the problem of image retrieval. Theyconsider disjoint properties instead of a globalsimilarity measure between images. CBIR essen-tially reverts to two basic specications: repre-sentation of the image features and denition ofa metric for similarity retrieval. Most featuresadopted in the literature are global ones, whichconvey informationand measure similaritybased on purely visual appearance of an image.Unfortunately, what a generic user typically con-siders the content of an image is seldom cap-tured by such global features. Particularly forquery-by-sketch image retrieval, the main issue isthe recognition of the sketched components of aquery in one or more database images, followed bya measure of similarity, to nd more relevant ones.In our approach, we assume retrievable only im-ages where all components of the sketch are pre-sent as regions in the image. 1 The approach maybe considered more restrictive than other ap-proaches since only images having all the compo-nents of the sketch are taken into account. This isbecause we consider such components meaningfulto the overall congurationwhy a user wouldsketch them if not? Once all the shapes of theconguration have been found in the image, otherproperties of the shapes concur to dene theoverall similarity measure, such as rotation of eachshape, rotation of the overall arrangement, scaleand translation, color, texture. For measuring thesimilarity of the overall arrangement, we proposea modied version of Gudivadas hR-strings.
There are m!=n! pairings between the n objectsof a sketch and n out of m regions in an image,with m > n, hence a blind trial of all possiblepairings takes exponential time. Results we men-tioned about type-2 similarity and point patternmatching suggest that it might exist a polynomial-time algorithm for matching arrangements of
1 Of course, not all regions in the image must correspond to
components of the sketch. In other words, we do not enforce
full picture matching.
E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612 1601
shapes under translation, rotation and scaling.Since our approach is dierent from the previousones, we devised a non-optimal algorithm to con-duct our experiments, leaving a deeper analysis oncomputational complexity of the problem, andoptimal algorithms to solve it, to future research.The algorithm we employ is not totally blind,though, since it takes advantage of the fact thatusually most shapes in the sketch and regions inthe image do not match each other. Hence, com-puting an n m matrix of similarities betweenshapes and regions, one can ignore most of thewrong matchings from the beginning. Neverthe-less, the worst case of the algorithmwhen shapesare all similar to regionsis exponential.
3. Representing shapes, objects and images
In a previous paper (Di Sciascio et al., 2000a)we proposed a structured language (based on ideasborrowed from Description Logics) to describe thecomplex structure of objects in an image, startingfrom a region-based segmentation of objects. Thecomplete formalism was presented in anotherpaper (Di Sciascio et al., 2000b), together with thesyntax and an extensional semantics, which is fullycompositional.
The main syntactic objects we consider are basicshapes, composite shape descriptions, and trans-formations.
Basic shapes are denoted with the letter B, andhave an edge contour eB characterizing them.We assume that eB is described as a single,closed 2D-curve in a space whose origin coincideswith the centroid of B. Examples of basic shapescan be circle, rectangle, with the contoursecircle , erectangle , but alsoany complete, rough contoure.g., the one of ashipcan be a basic shape.
The possible transformations are the basicones that are present in any drawing tool: rota-tion (around the centroid of the shape), scalingand translation. We globally denote a rotationtranslationscaling transformation as s. Recallthat transformations can be composed in se-quences s1 sn, and they form a mathemat-ical group.
Basic building block of our syntax is a basicshape component hc; t; s;Bi, which represents a re-gion with color c, texture t, and contour seB.With seB we denote the pointwise transfor-mation s of the whole contour of B. For example,s could specify to place the contour eB in theupper left corner of the image, scaled by 1/2 androtated 45 .
Composite shape descriptions are conjunctionsof basic shape components, each one with its owncolor and texture. They are denoted as
C hc1; t1; s1;B1i u u hcn; tn; sn;BniNotice that this is just an internal representation,invisible to the user, that we map in a visual lan-guage actually used to sketch the query.
Gudivada (1998) proposed hR-strings, a geo-metry-based structure for the representation ofspatial relations in images, and an associatedspatial similarity algorithm. The hR-string is asymbolic representation of an image. It is obtainedby associating a name with each domain objectidentied in the image, and then considering thesequence of names and coordinates of the cent-roids of the objects, with reference to a Cartesiancoordinate system. Gudivadas original represen-tation was limited by the assumption that imagescould not contain multiple instances of the sameobject type. We propose a modied formulation ofhR-strings, which allows our similarity algorithmto overcome the previous limitation. Although westill consider the arrangement of components asthe spatial layout of a composite shape, we alsodescribe each shape by including its relevant fea-tures. The icons of the symbolic image in Gudi-vadas hR-string representation are replaced byobjects with their shape. Hence each shape is notscale, rotation, translation invariant (as it was inhR-strings) since we believe such properties havetheir meaning in composite shape descriptions.Therefore we provide a characterization both ofimages and objects in term of basic shapes com-posing complex objects in a sketched query and ofregions composing database images.
To measure similarity, we propose an algorithmthat takes into account the arrangement of shapesin the sketch and compares it with groups of re-gions in an image. The algorithm provides a spa-
1602 E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612
tial similarity measure thatwith respect to Gu-divadas simg algorithmassumes the presence inthe image of a group of regions that correspond tothe components of the sketch. Besides, it consid-ers the relative size of corresponding regions andshapes. To dene a global similarity measure, wetake into account similarity measure depending onall the features that characterize the componentsof objects, i.e., color, texture, rotation and scaling.
3.1. Description of relevant features
While relevant features are properties simplycomputable for elements of a sketch, dealing withregions in real images requires segmentation toobtain a partition of the image. Several segmen-tation algorithms have been proposed in the lit-erature; our approach does not depend on theparticular segmentation algorithm adopted. Any-way, it is obvious that the better the segmentation,the better our approach will work.
Our scope here is limited to the description ofthe computation of image features, assuming asuccessful segmentation. To make the descriptionself-contained, we start dening a generic colorimage as f~IIx; yj16 x6Nh; 16 y6Nvg, where Nh,Nv are the horizontal and vertical dimensions,respectively, and ~IIx; y is a triple R;G;B. Weassume that the image I has been partitioned inm regions ri, i 1; . . . ;m satisfying the followingproperties:
I Sri; i 1; 2; . . . ;m 8i 2 1; 2; . . . ;m, ri is a non-empty and simply-
connected set ri \ rj ; i i 6 j each region satises heuristic and physical re-
We characterize each region ri with the follow-ing attributes: shape, position, size, orientation,color and texture.
3.1.1. ShapeGiven a connected region a point moving along
its boundary generates a complex function denedas: zt xt jyt, t 1; . . . ;Nb, with Nb thenumber of boundary points. Following the ap-
proach proposed in (Rui et al., 1996) we dene thediscrete Fourier transform of zt as:
with k 1; . . . ;Nb.In order to address the spatial discretization
problem we compute the fast Fourier transform(FFT) of the boundary zt; use the rst 2Nc 1FFT coecients to form a dense, non-uniform setof points of the boundary as:
with t 1; . . . ;Ndense, where Ndense is the number ofdense samples used in resampling the ModiedFourier Descriptors.
We then interpolate these samples to obtainuniformly spaced samples zunift, t 0; . . . ;Nunif .We compute again the FFT of zunift obtainingFourier coecients Zunifk, k Nc; . . . ;Nc. Theshape-feature of a region is hence characterized bya vector of 2Nc 1 complex coecients.
3.1.2. Position and sizeWe characterize size and position with the aid
of moment invariants (Pratt, 1991). In order tosimplify notation, let us assume that the extractedexternal contour of each region ri, is placed in anew image Irix; y having the same size of theoriginal image, with a uniform background. Let usalso suppose the new images be binarized, i.e.,discretized with two levels. The p; qth spatialmoment of each Iri can be dened as:
Mup; q XNvy1
The p; qth scaled spatial moment can be denedas:
Mp; q Mup; qNqvN
The denition of size is just the zero momentM0; 0. The position is obtained with reference tothe region centroid, having coordinates:
xx M1; 0M0; 0 yy
M0; 1M0; 0
E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612 1603
3.1.3. OrientationIn order to quantify the orientation of each
region ri we use the same Fourier representation,which stores the orientation information in thephase values. We obviously deal also with specialcases when the shape of a region has more thanone symmetry, e.g. a rectangle or a circle. Rota-tional similarity between a reference shape B and agiven region ri can then be obtained nding max-imum values via cross-correlation:
Ct 12Nc 1
with t 2 0; . . . ; 2Nc
3.1.4. ColorColor information of each region ri is stored,
after quantization in a 112 values color space, asthe mean RGB value within the region:
Rp Gri Xp2ri
Gp Bri Xp2ri
3.1.5. TextureWe extract texture information for each region
ri with a method based on the work in (Pok andLiu, 1999). Following this approach, we extracttexture features convolving the original grey levelimage Ix; y with a bank of Gabor lters, havingthe following impulse response:
hx; y 12pr2
where r is the scale factor, U ; V represents thelter location in frequency-domain coordinates,and k and h are the central frequency and theorientation, respectively dened as:
k U 2 V 2
The processing allows to extract a 24-componentsfeature vector, which characterizes each texturedregion.
Descriptions relative to basic shapes can beused to build complex objects by properly com-posing such components. For example, we cansketch a house using a rectangle and a triangle
posed on its top. To deal with complex objects weassume a set F of basic shapes that can be used forcomparison with other instances of the same shapehaving the same proportions. The image com-posed in Fig. 1 has ve shapes, two of them areoccurrences of the same shape, a circle, but with adierent scale factor. To describe a complex objectwe extend the description for a basic shape.
From now on, the regions of a query sketch willbe denoted with fj, j 1; . . . ; n. Each region fjrepresents an instance of a shape Fj. Consider anobserver posed in the center-of-mass of the sketch.We dene the order in which the components ap-pear as the order in which they are seen from thisviewpoint, counter clock-wise (CCW). For eachcomponent of index j, we call its left-hand sidecomponent, component with index j 1, the fol-lowing component CCW, and its right-hand sidecomponent, with index j 1, the preceding com-ponent CCW. A shape is hence characterized asfollows:
index of Fj; coordinates of the centroid of Cfj of the contour
of fj; length of the segment CQCfj between the cent-
roid of the shape and the centroid of the image; angle of the segment CQCfj with the x-axis; scale factor of the shape fj with respect to the
shape Fj; rotation angle of fj with respect to Fj;
Fig. 1. Information contained in the fk component of a MhR-string.
1604 E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612
index j 1 of the shape Fj1 of which the shapethat precedes fj is an instance;
index j 1 of the shape Fj1 of which the shapethat follows fj is an instance;
color of fj; texture of fj; distance between the centroid of the edge of
shape fj and that of shape fj1 on the left sideof fj;
distance between the centroid of the edge ofshape fj and that of shape fj1 on the right sideof fj.
3.2. Spatial representation with modied hR-string
We now dene the Modied hR-string (MhR-string). The terminology we introduce refers toDenitions 1 through 8 to the work by Gudivada(1998) and properly extends them for our pur-poses. The main dierence concerns the denition:in the original version there is a unique denitionof hR-string for both the database image and thesketched query. Here we give two dierent deni-tions respectively for a sketched query and for areal image. In the denitions we refer to the groupof regions of the segmented image and to the shapecomponents of a complex object. Nearly all theterminology extends to both denitions. For amore accurate analysis of the dierences betweenthe hR-string and the MhR-string recall the de-nition of Augmented hR-string given by Gudi-vada. In a MhR we consider the index of thereference shape instead of the name of the object;the angle h used for dening the order relationamong components is computed between the cen-ter-of-mass of the considered shape and the cent-roid of the centroids in the image instead of thecentroid of the image I. Such an angle inducesthe order relation in which the elements appear inthe same order the components are seen, CCW, byan observer posed in the center-of-mass of them.With respect to the order relation, the denitionof left and right neighbors of a shape (region) ex-tends to the MhR-string, and the indexes of therelative shapes are added to the MhR-string. Thedistance between components are computed usingthe euclidean distance between the centroid and
the angle h. Moreover, in the MhR-string of animage the index of the shape fj in the object de-scription corresponding to the region ri is added.We remind that before determining the MhR-string it is necessary to search for a group ofregions in the image that resembles the set ofcomponents in the sketched object.
Denition 1 (MhR-string for an object Q). TheMhR-string for a complex object Q is a list of nelements, one for each component. The order in whichthe elements appear is the same order the compo-nents are seen, CCW, by an observer posed in thecenter-of-mass of them. Then, each component has aleft-hand side component, and a right-hand side one.The element for the jth component fj of Q containsthe items below:
fj:indexindex of the prototypical shape whichfj is an instance of;
fj:anglethe angle that the line joining the cent-roid (center-of-mass) of fj with the centroid ofthe centroids of Q subtends with the positive x-axis;
fj:distancedistance between the centroid of fjand the centroid of the object description;
fj:indexRindex of the prototypical shape of theshape on the right side of fj;
fj:indexLindex of the prototypical shape of theshape on the left side of fj;
fj:distanceRdistance between the centroid of fjand the centroid of the shape on the right side;
fj:distanceLdistance between the centroid of fjand the centroid of the shape on the left side.
An example is shown in Fig. 1 where we rep-resent the MhR-string for a complex object. Werefer to the MhR-string of the fk component hererepresented by the bold circle. fk:index is the indexof the basic shape which represents the bold circle.The component fk:indexR and fk:indexL are re-spectively the index of the circle on the right sideand of the rectangle on the left side of fk. The othercomponents of the MhR-string are the distancesand the angle highlighted on the gure.
Denition 2 (MhR-string for an image I). TheMhR-string for an image I is a list of m elements,
E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612 1605
one for each segmented region. The order in whichthe elements appear is the same order the compo-nents are seen, CCW, by an observer posed in thecenter-of-mass of them. Then, each component has aleft-hand side component, and a right-hand side one.The element for the ith region ri of I contains theitems below:
(1) ri:indexindex of the shape which ri is an in-stance of;
(2) ri:anglethe angle that the line joining the cent-roid of ri with the centroid of the centroids of Isubtends with the positive x-axis;
(3) ri:distancedistance between the centroid of riand the centroid of the group of regions;
(4) ri:indexRindex of the shape corresponding tothe region on the right side of ri;
(5) ri:indexLindex of the shape corresponding tothe region on the left side of ri;
(6) ri:distanceRdistance between the centroid of riand the centroid of the region on the right side;
(7) ri:distanceLdistance between the centroid of riand the centroid of the region on the left side;
(8) ri:ObjectShapeindex of the shape fj in the ob-ject description corresponding to the region ri(such value is returned by the search for a groupof regions).
4. Computing similarities
In all similarity measures, we adopt the functionUx; gx; gy. This function was determined by trialand error to replace the exponential one used byGudivada in his similarity algorithm. The moti-vation was the the exponential function resultedtoo steep, so that even small variations in the rel-ative position of the centroids would produceconsiderable variations in the similarity functionsimh, which appeared too drastic for real imagesretrieval. We then chose a function with a nullrst-order derivative for x 0. The role of thefunction is to smooth the changes of the quan-tity x, depending on two parameters gx; gy, whichhave been tuned in the experimental phase, and tochange a distance x (in which 0 corresponds toperfect matching) to a similarity measure (in whichthe value 1 corresponds to perfect matching).
Ux; gx; gy gy 1 gy cos px
if 0 < x6 gx
gy 1arctan pxgx1gygxgy
35 if x > gx
where values for gx > 0 and 0 < gy < 1 have beenexperimentally determined, and are presented inTable 1.
Given a query Q with n objects and a picture Isegmented in m regions, from all the groups ofregions in the picture that might resemble thecomponents, we select the groups that presentthe higher spatial similarity with the objects. Inarticial examples in which all shapes in I and Qresemble each other, this may generate an expo-nential number of groups to be tested, given bymm 1 . . . m n 1. Hence, our algorithmmay not be optimal from a computational pointof view. Whether there exist or not polynomial-time algorithms to solve this problem is still anopen question; a polynomial-time algorithm isknown only if scaling transformations of a layout
Conguration parameters, grouped by feature type
Fourier descriptors threshold 0.9
Circular symmetry threshold 0.9
hR-string spatial factor 0.25hR-string scale factor 0.25Spatial similarity threshold 0.2
Spatial similarity weight a 0.30Spatial similarity sensitivity gx 0.5Spatial similarity sensitivity gy 0.4Shape similarity weight b 0.30Shape similarity sensitivity gx 0.005Shape similarity sensitivity gy 0.2Scale similarity weight g 0.1Scale similarity sensitivity gx 0.5Scale similarity sensitivity gy 0.4Rotation similarity weight d 0.1Rotation similarity sensitivity gx 10Rotation similarity sensitivity gy 0.3Color similarity weight c 0.1Color similarity sensitivity gx 110Color similarity sensitivity gy 0.4Texture similarity weight 0.1
Texture similarity sensitivity gx 110Texture similarity sensitivity gy 0.4Global similarity threshold 0.29
1606 E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612
are not considered (Chew et al., 1997). However,in typical real images the similarity betweenshapes is selective enough to yield only a verysmall number of possible groups to try. Therefore,our algorithm has a behavior ecient enough tocarry on experiments. We recall that in Gudi-vadas approach (1998) a strict assumption is made,namely, each basic component in Q does not ap-pear twice, and each region in I matches at mostone component in Q. Our approach, relaxing thisassumption, is more suited for image retrieval. Wecompute the simss shape similarity between re-gions of I and shape components of Q, and selectregions with a similarity greater than a giventhreshold. Computation of simss is invariant withrespect to scale and rotation. The measure is ob-tained as the cosine norm applied to the tuples Zfjand Zri of complex coecients describing respec-tively the shape of a region ri and the shapeof a component fj, Zfj x1; . . . ; x2Nc and Zri y1; . . . ; y2Nc
simssfj; ri P2Nc
l1 xlylP2Ncl1 x
simssfj; ri is a number in the range [0,1], wheresimss 1 corresponds to maximum similarity be-tween a region and a shape component.
4.1. Spatial similarity
We recall that all the components of Q must befound in I. So, the recognition algorithm dis-cardswithout further processingthose imagesthat do not contain all the shape components ofQ. Notice that we consider a shape componentrecognized if its shape similarity with reference toa region in the image is beyond a threshold. Alsonotice that this does not prevent regions notincluded in the sketch Q be present in the image I.As a consequence, we describe the MhR-stringalgorithm assuming the presence in an image of agroup of regions that corresponds to the compo-nents of Q. Let the objects in the sketch be num-bered as 1 . . . n and the regions in the image as1 . . .m. To make precise the correspondence be-tween objects and regions we use an injective
mapping l : f1; . . . ; ng ! f1; . . . ;mg, such thatobject i is matched with region rli. In orderto compute the spatial similarity of the overallarrangement of regions with reference to thearrangement of components in Q, we considertheir centroids. Once the group of regions has beenselected the algorithm extracts for each groupthe corresponding MhR-string and evaluates theoverall similarity. The spatial similarity value simhis computed by the ComputeSIMh algorithmconsidering the relative positions of shapes andregions (normalized with respect to the image size)and accounts for the similarity in terms of thespatial arrangements of objects.
Algorithm ComputeSIMh (MhRQ,MhRI )input MhR-string of the query Q
MhR-string of the group of n regionsrl1; . . . ; rlnoutput simhbegin
simh 0for j 2 f1; . . . ; ng do
i rlj:ObjectShapeif rlj:indexR fi:indexR then
simh simh KSpatialFactorsimh simh KScaleFactorUhjfi:dist-
endifif rlj:indexL fi:indexL then
simh simh KSpatialFactorsimh simh KScaleFactorUhjfi:dist-
The factors KSpatialFactor and KScaleFactor areconstants, subject to the following constraint:2KSpatialFactor 2KScaleFactor 1. KSpatialFactor is thedegree of importance of spatial relationships be-tween the corresponding region and shape incomputing the spatial similarity; KScaleFactor is thedegree of importance of the scale variations be-tween the region and the corresponding shape. The
E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612 1607
multiplicative term 2 comes from the computationof spatial similarity considering both the left andright neighbors of a shape (region). Q:magnituderepresents the extension of the object Q. Its valueis obtained as the mean value of the lengths of thesegments between the centroid of the shape andthe centroid of the image.
4.2. Shape similarity
simshape measures the similarity between shapesin the composite shape description and the regionsin the segmented image,
simshape U maxnj1
simssfj; rlj; gxshape; gyshape:
4.3. Scale similarity
simscale takes into account the dierences inscale between each region rlj in the consideredgroup of regions and the corresponding basicshape fj in Q.
where fj:scale is the scaling factor in fj (the trans-formation for the jth component of Q), rlj:scaleis the scaling factor of region rj when matchedto basic shape fj, and ScaleFactor is the overallscaling factor of the selected group of regionswhen matched with Q. Observe that we choose themaximum, since the dierences are distances, andnot similarities. Then we compute the scale simi-larity with the aid of the function U,
simscale UDscale; gxscale; gyscale:
4.4. Rotation similarity
simrotation takes into account the errors in therotation of each region in the considered group ofregions with respect to the corresponding basicshapes in Q. We start by nding the Rotation-Factor of the group of n regions rl1; . . . ; rln withrespect to Q, in accordance with the followingalgorithm:
Algorithm ComputeSIMRotation (MhRQ,MhRI )input MhR-string of the query Q
MhR-string of the group of n regionsrl1; . . . ; rlnoutput RotationFactorbegin
RotationFactor 0for j 2 f1; . . . ng do
i rlj:ObjectShapeDangle rlj:angle fi:angleif Dangle > 180 then
Dangle Dangle 360else if Dangle6 180 then
Dangle Dangle 360endif
RotationFactor RotationFactor Dangleendfor
RotationFactor RotationFactor=nreturn RotationFactor
In order to compute the rotation of each regionwith respect to the corresponding shapes we con-sider the maximum angles ar, r 2 f1; . . . ; kjg, j 2f1; . . . ; ng, obtained by computing the cross-cor-relation function, as described in Section 3. Eachrotation error Dj, for j 2 f1; . . . ; ng, is then com-puted as follows:
for r 2 f1; . . . kjg doDr jar RotationFactor Bj:anglejDr Dr mod 180
Dj minkjr1 fjDrjg
The maximum rotation dierence is then:
and again, since we computed a distance, wesmooth and convert it into a similarity measurewith the help of U:
simrotation UDrotation; gxrotation; gyrotation:
1608 E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612
4.5. Color similarity
simcolor measures the similarity in terms of colorappearance between the regions and the corres-ponding shapes in the composite shape descrip-tion. In the following formula, Dcolorj:R denotesthe dierence in the red color component betweenthe j-th component of Q and the region rlj, andsimilarly for the green and the blue color compo-nents.
Dcolorj Dcolorj:R2 Dcolorj:G2 Dcolorj:B2
qThen the function U takes the maximum of thedierences to obtain a similarity:
simcolor U maxnj1
fDcolorjg; gxcolor; gycolor
4.6. Texture similarity
Finally, simtexture measures the similarity be-tween the texture features in the components of Qand in the corresponding regions.
simtexture UDtexture; gxtexture; gytexture
4.7. The recognition algorithm
We now formally describe the recognition al-gorithm. The algorithm uses a global similarity
value that is determined from similarity values forthe various features we considered relevant forregions description. These are simh for spatialsimilarity, simshape for shape similarity, simscale forscale similarity, simrotation for rotational similaritysimcolor for color similarity and simtexture for tex-tural similarity. Usually, such features are weigh-ted, as a user may consider colors more importantthan textures, or the arrangement of shapes moreimportant than the single shapes. However, theircomposition is not a weighted sum, since we useminimum for computing similarities. Minimumstems from the fuzzy interpretation of logical
and in the formal language we mentioned in thebeginning. Minimum is also suitable when com-posing the similarities of each object shape fj withthe region rlj it is mapped into; in fact, if fj istotally dissimilar from rlj, its similarity drops tozero, and we want this null value to propagate tothe similarity of the entire layout. In this way,requiring all objects to be present in the image isnot an aside constraint, but comes naturally fromthe semantics. Therefore, we have the problem ofcombining weighted minimums. The problem hasbeen solved by Fagin and Wimmers (2000), in away we briey recall. Suppose that coecients a,b, c, g, d, with a b c g d 1 weightthe relevance each feature has in the global simi-larity computation. We number these coecientsas h1; . . . ; h6, and link them to the (numbered)similarities as follows:
h1 a sim1 simhh2 b sim2 simshapeh3 c sim3 simcolorh4 g sim4 simscaleh5 d sim5 simrotationh6 sim6 simtexture
8>>>>>>>>>>>:Now we reorder coecients in non-increasingorder, by means of a permutation r : f1; . . . ; 6g !f1; . . . ; 6g such that hr1 P hr2 P P hr6.Then, the formula by Fagin and Wimmers is asfollows:
The behavior of the algorithm, hereafter pre-sented, obviously depends on the congurationparameters, shown in Table 1, which determine therelevance of the various features involved in thesimilarity computation.
Algorithm Recognize (Q,I);input a query picture Q composed by n singleobjects, and an image I, segmented into regionsr1; . . . ; rmoutput A similarity measure of the recognitionof Q in Ibegin
similarity hr1 hr2 simr1 2 hr2 hr3 minsimr1; simr2 3 hr3 hr4 minsimr1; simr2; simr3 6hr6 minsimr1; . . . ; simr6 1
E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612 1609
/* for each object in the sketch */for j 2 f1; . . . ; ng do
simssmax 0/* make sure that at least one region issimilar to the object */for i 2 f1; . . . ;mg do
compute similarity simssfj; ri betweenfj and riif simssmax < simssfj; ri then simssmax simssfj; ri
/* if no region is similar to object j, fail */if simssmax < threshold then return (0)
smax 0for all injective functions l: f1; . . . ; ng !f1; . . . ;mg yielding a group of n regionsrl1; . . . ; rlnsuch that for j 1; . . . ; n it is simssfj; rlj >thresholddo:
extract the MhR-string MhRQ ofrl1; . . . ; rln
compute similarity using Formula (1)if smax < s then smax s
The approach described in the preceding sec-tions has been deployed in DESIR (DEscriptionlogics Structured Image Retrieval) a prototypeinteractive system. Queries can be posed by sketch,using pre-stored basic shapes or newly createdones (see an example of query by sketch in Fig. 2),but also by example. Both a query by example ora submitted database image undergo the sameprocedure: segmentation, feature extraction andclassication.
One of the unfortunate aspects of similarityretrieval is that it is inherently fuzzy and subjec-tive, and widely accepted benchmarks are still tobe found. Although our image database currentlystores a few thousands images, in order to easecomparisons with previous work on spatial-re-
lationships-based image-retrieval approaches wechose an experimental setup that is close to the oneproposed by Gudivada and Raghavan (1995), andalso adopted by Gudivada (1998) and El-Kwaeand Kabuka (1999), in which a small, well dened,set of images is used to check the system perfor-mances vs. human users judgement. Our test setconsists of 93 images (available at www-ictserv.poliba.it/disciascio/test-images.htm), picturing sim-ple or composite objects arranged together. Thetotal number of dierent objects was 18. Imageswere captured using a digital camera, all had size1080 720 pixels, 24 bits/pixel. Images were au-tomatically segmented by the system. Thirty-oneimages of the set were selected as queries by ex-ample. The resulting classication carried out bythe system against the image set was dened assystem-provided ranking. Fig. 3 shows one of thequeries and the retrieved set of images. To obtain ausers ranking we then asked to ve volunteers toclassify the 93 images based on their similarity toeach query image. Users could group databaseimages that he/she considered equivalent in termsof similarity to a given query. The ve classica-tions were not univocal, and they were mergedtogether in a unique ranking by considering, foreach image, the minimum ranking among the veavailable. This provided a users-ranking for thesame set of queries. Notice that this approachlimited the weight that images badly classied bysingle users have on the nal ranking. Also fol-
Fig. 2. A query by sketch and retrieved images.
1610 E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612
lowing the approach by Gudivada and Raghavan(1995), we measured the retrieval eectivenessadopting the Rnorm (Bollmann et al., 1985) asquality measure. Assuming G a nite set of imageswith a user-dened preference relation P that iscomplete and transitive, and Dusr the rank orderingof G induced by the user preference relation, letDsys be some rank ordering of G induced by thesimilarity values computed by the image retrievalsystem. The formulation of Rnorm is:
where S is the number of image pairs where abetter image is ranked by the system ahead of aworse one; S is the number of pairs where a worseimage is ranked ahead of a better one and Smax isthe maximum possible number of S. It should benoticed that the calculation of S, S, and Smax isbased on the ranking of image pairs in Dsys relativeto the ranking of corresponding image pairs inDusr. Rnorm values are in the range [0,1.0]; 1.0 cor-responds to a system-provided ordering of thedatabase images that is either identical to the oneprovided by the human experts or has a higherdegree of resolution, lower values correspond to aproportional disagreement between the two. Weobtained an average RnormAVG 0:89, and as low-est value RnormMIN 0:62 and highest RnormMAX1:0. We also computed, as a reference, averagevalues for PrecisionNRR=NR, RecallNRR=N
where NRRnumber of images retrieved and rele-vant and NR total number of relevant images indatabase and N total number of retrieved images.We obtained PrecisionAVG0:91 and RecallAVG0:80.
Results presented by Gudivada and Raghavan(1995) showed an average Rnorm 0:98, on a data-base of 24 iconic images used both as queries anddatabase images and similarity computed only onspatial relationships between icons. Our systemworks on real images and computes similarity onseveral image features; we believe that resultsprove the ability of the system to catch to a goodextent the users information need, and make re-ned distinctions between images when searchingfor composite shapes. Obviously, we do not claimthese results would scale exactly considering largerimage databases, or databases with images pic-turing scenes without well-dened objects. Never-theless they show, in a controlled framework, goodperformances. As a nal remark, notice we do notrequire full picture matching; the image retrievedfrom the sketch may contain other regions not inthe sketch. However, all components of the sketchmust be recognized in an image in order tomake it retrievable. Only in this case the databaseimages is processed to determine its similarity.This approach tends to increase precision, even atthe expenses of recall. It is anyway reasonable toassume that, in an interactive session using query-by-sketch, a user will introduce a partial querypicturing what he/she thinks are the main features/objects representing his/her information need.Should the retrieved set be still too large, furtherdetails will be added to reduce the set.
Starting from the observation that most imageretrieval systems either rely on global features orconcentrate on symbolic images considered onlyin terms of spatial positioning, we proposed anapproach particularly suitable for query by sketchimage retrieval, able to handle queries made byseveral shapes, where the position, orientation andsize of the shapes relative to each other is mean-ingful. In our approach we start extracting basic
Fig. 3. A query by example and retrieved images.
E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612 1611
features of the image and combine them in ahigher-level description of spatial layout of com-ponents, characterizing the semantics of the scene.We also dened a similarity algorithm that mea-sures a weighted global similarity between asketched query and a database image and allowsfor both a perfect matching or an approximaterecognition of sketches in real images.
Bollmann, P., Jochum, F., Reiner, U., Weissmann, V., Zuse,
H., 1985. The LIVE-Project-Retrieval experiments based on
evaluation viewpoints. In: Proceedings of the 8th Annual
International ACM/SIGIR Conference on Research and
Development in Information Retrieval. ACM, New York,
Cardoze, D.E., Schulman, L.J., 1998. Pattern matching for
spatial point sets. In: Proceedings of the 39th Annual
Symposium on the Foundations of Computer Science
Chang, S.K., 1989. Principles of Pictorial Information Systems
Design. Prentice-Hall, Englewood Clis, New Jersey.
Chang, S.K., Shi, Q.Y., Yan, C.W., 1983. Iconic indexing by
2D strings. IEEE Trans. Pattern Anal. Mach. Intelligence 9
Chew, L.P., Goodrich, M.T., Huttenlocher, D.P., Kedem, K.,
Kleinberg, J.M., Kravets, D., 1997. Geometric pattern
matching under euclidean motion. Comput. Geom. 7, 113
Di Sciascio, E., Donini, F.M., Mongiello, M., 2000a. A
Description logic for image retrieval. In: Lamma, E., Mello,
P. (Eds.), AI*IA 99: Advances in Articial Intelligence,
number 1792 in Lecture Notes in Articial Intelligence.
Springer, pp. 1324.
Di Sciascio, E., Donini, F.M., Mongiello, M., 2000b. Semantic
indexing for image retrieval using description logics. In:
Laurini, R. (Ed.), Advances in Visual Information Systems,
number 1929 in Lecture Notes in Computer Science.
Springer, pp. 372383.
El-Kwae, E.A., Kabuka, M.R., 1999. Content-based retrieval
by spatial similarity in image databases. ACM Trans.
Inform. Syst. 17, 174198.
Fagin, R., Wimmers, E.L., 2000. A formula for incorporating
weights into scoring rules. Theor. Comput. Sci. 239, 309
Guan, D.J., Chou, C.Y., Chen, C.W., 2000. Computational
complexity of similarity retrieval in a pictorial database.
Inform. Process. Lett. 75, 113117.
Gudivada, V.N., 1998. hR-string: A geometry-based represen-tation for ecient and eective retrieval of images by spatial
similarity. IEEE Trans. Knowl. Data Eng. 10 (3), 504512.
Gudivada, V.N., Raghavan, J.V., 1995. Design and evaluation
of algorithms for image retrieval by spatial similarity. ACM
Trans. Inform. Syst. 13 (2), 115144.
Irani, S., Raghavan, P., 1996. Combinatorial and experimental
results for randomized point matching algorithms. Proceed-
ings of the 12th ACM Symposium on Computational
Pok, G., Liu, J., 1999. Texture classication by a two-level
hybrid scheme. Storage and Retrieval for Image and Video
Databases VII 3656, 614622, SPIE.
Pratt, W.K., 1991. Digital Image Processing. J. Wiley & Sons
Inc., Englewood Clis, NJ.
Rui, Y., She, A.C., Huang, T.S., 1996. Modied Fourier
descriptors for shape representationa practical approach.
In: Proceedings of 1st Workshop on Image Databases and
Tucci, M., Costagliola, G., Chang, S.K., 1991. A remark on
NP-completeness of picture matching. Inform. Process.
Lett. 39, 241243.
Zhou, X.M., Ang, C.H., Ling, T.W., 2001. Image retrieval
based on objects orientation spatial relationship. Pattern
Recogn. Lett. 22, 469477.
1612 E. Di Sciascio et al. / Pattern Recognition Letters 23 (2002) 15991612