Spatial layout representation for query-by-sketch content-based image retrieval


<ul><li><p>Spatial layout representation for query-by-sketch content-based image retrieval</p><p>E. Di Sciascio *, F.M. Donini, M. Mongiello</p><p>Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Via Re David, 200 I-70125 Bari, Italy</p><p>Received 10 August 2001; received in revised form 30 November 2001</p><p>Abstract</p><p>Most content-based image retrieval techniques and systems rely on global features, ignoring the spatial relationships in the image. Other approaches take into account spatial composition but usually focus on iconic or symbolic images. Nevertheless, a large class of user queries requests images having regions, or objects, in well-determined spatial arrangements. Within the framework of query-by-sketch image retrieval, we propose a structured spatial layout representation. The approach provides a way to extract the image content starting from basic features and combining them in a higher-level description of the spatial layout of components, characterizing the semantics of the scene. We also propose an algorithm that measures a weighted global similarity between a sketched query and a database image. © 2002 Elsevier Science B.V. All rights reserved.</p><p>Keywords: Image retrieval; Similarity retrieval; Query by sketch; Spatial relationships</p><p>1. Introduction</p><p>Content-based image retrieval (CBIR) systems mainly perform extraction of visual features, typically color, shape and texture, as a set of uncorrelated characteristics. Such features provide a global description of images but fail to consider the meaning of portrayed objects and the semantics of scenes. At a more abstract level of knowledge about the content of images, extraction of object descriptions and their relative positions provides a spatial configuration and a logical representation of images.
Because they lack low-level feature extraction, such methods generally fail to consider the physical extension of objects and their primitive features. The two approaches to CBIR should nevertheless be considered complementary. An image retrieval system should perform similarity matching based on the representation of visual features conveying the content of segmented regions; in addition, it should capture the spatial layout of the depicted scenes in order to meet user expectations.</p><p>Pattern Recognition Letters 23 (2002) 1599–1612</p><p>*Corresponding author. Tel.: +39-805-460641; fax: +39-805-460410.</p><p>E-mail addresses: (E. Di Sciascio), (F.M. Donini), (M. Mongiello).</p><p>0167-8655/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-8655(02)00124-1</p></li><li><p>In this paper we strive to bridge the gap between these two approaches. Hence we provide a method for describing the content of an image as the spatial composition of objects/regions, each of which preserves its visual features, including shape, color and texture. To this aim we propose a representation for the spatial layout of sketches and a well-performing algorithm for computing a similarity measure between a sketched query and a database image. Similarity is a concept that involves human interpretation and is affected by the nature of real images; therefore we base our similarity on a fuzzy concept that includes exact similarity in the case of perfect matching.</p><p>The outline of the paper is as follows: in the next section, we review related work on image retrieval based on spatial relationships and outline our proposal. In Section 3 we define how we represent shapes in the user sketch, and their spatial arrangement, as well as other relevant features, such as color and texture. We define similarly how we segment regions in an image, and compare the sketch with an arrangement of regions.
Then in Section 4 we define the formulae we use to compute single similarities, and how we compose them using a more general algorithm. In Section 5 we present a set of experimental results, showing the agreement between our method and a group of independent human experimenters. Finally, we draw conclusions in the last section.</p><p>2. Related work and proposal of the paper</p><p>In the work by Chang et al. (1983), which can be considered the ancestor of works on image retrieval by spatial relationships, the modeling of iconic images was presented in terms of 2D strings, each string accounting for the position of icons along one of the two planar dimensions. In this approach, retrieval of images basically reduces to simpler string matching. In the paper by Gudivada and Raghavan (1995), objects in a symbolic image are associated with vertices in a weighted graph. The spatial relationships among the objects are represented through the list of the edges connecting pairs of centroids. A similarity function computes the degree of closeness between the two edge lists representing the query and the database picture as a measure of the matching between the two spatial graphs.</p><p>More recent papers on the topic include Gudivada (1998) and El-Kwae and Kabuka (1999), which basically propose extensions of the string approach for efficient retrieval of subsets of icons. Gudivada (1998) proposes a logical representation of an image based on so-called θR-strings. Such a representation also provides a geometry-based approach to iconic indexing based on spatial relationships between the iconic objects in an image, identified by their centroid coordinates. Translation, rotation and scale variant images, and the variants generated by an arbitrary composition of these three geometric transformations, are considered.
The similarity between a database image and a query image is obtained through a spatial similarity algorithm, simg, that measures the degree of similarity between the two images by comparing their θR-strings. The algorithm recognizes rotation, scale and translation variants of the arrangement of objects, and also subsets of the arrangements. A constraint limiting the practical use of this approach is the assumption that an image can contain at most one instance of each icon or object. The most severe limitation of the algorithm, however, is that it takes into account only the points at which the icons are placed and not the real configuration of objects. For example, a picture with a plane in the left corner and a house in the right one is treated as having the same meaning as a picture with the same objects in different proportions, such as a small plane and a big house; moreover, the relative size and orientation of the objects are not taken into account.</p></li><li><p>El-Kwae and Kabuka (1999) propose a further extension of the spatial-graph approach that includes both topological and directional constraints. The topological extension of the objects can obviously be useful in determining further differences between images. The similarity algorithm extends the graph matching proposed by Gudivada and Raghavan (1995) and retains the properties of the original approach, including its invariance to scaling, rotation and translation; it is also able to recognize multiple rotation variants. Even though it considers the topological extension of objects, it is far from considering the compositional structure of objects: the objects are considered as a whole and no reasoning is possible about the meaning and the purpose of the objects or scenes depicted.
Several extensions of these approaches have been proposed, with an evaluation and a comparison of computational complexity (Zhou et al., 2001).</p><p>The computational complexity of evaluating the similarity of two arrangements of shapes has been analyzed since the early 1990s. Results have been found for what are called type-0 and type-1 similarity in S.K. Chang's classification (Chang, 1989). Briefly, in type-0 and type-1 similarity, arrangements are considered similar if they preserve horizontal and vertical orderings. For example, if object A is below and to the left of object B in picture 1, picture 2 is considered similar if the same object A appears below and to the left of B, regardless of their relative size and distance in the two pictures. Tucci et al. (1991) studied the case in which there can be multiple instances of an object, and found that the problem is NP-complete. Later, Guan et al. (2000) proved that the same lower bound also holds when objects are all distinct. The authors also gave a polynomial-time algorithm for type-2 similarity, which is stricter than type-1 since it considers two arrangements similar only if one is a rigid translation of the other. Type-2 similarity is too strict for our approach, since we also admit rotational and scale variants of an arrangement.</p><p>When objects reduce to points, the problem of evaluating the similarity of arrangements has been called point pattern matching. This problem has been studied in computational geometry, where it is known as the constellation problem. In Cardoze and Schulman (1998), a randomized algorithm was given for exact matching of points under translations and rotations, but not under scaling. For matching n points in the plane, the algorithm runs in O(n<sup>2</sup> log n) time, with a probability of finding wrong matches that decreases exponentially.
A different probabilistic algorithm, which may instead miss good matches, was proposed by Irani and Raghavan (1996) for the best matching problem, in which the match need not be exact. The algorithm considers matching under translation, rotation, and scaling. However, point pattern matching solves only the matching of the centroids of an arrangement of shapes, and it is not obvious whether and how these algorithms could be generalized to matching arrangements of shapes.</p><p>All the previously described methods are partial solutions to the problem of image retrieval. They consider disjoint properties instead of a global similarity measure between images. CBIR essentially reduces to two basic specifications: representation of the image features and definition of a metric for similarity retrieval. Most features adopted in the literature are global ones, which convey information, and measure similarity, based on the purely visual appearance of an image. Unfortunately, what a generic user typically considers the content of an image is seldom captured by such global features. Particularly for query-by-sketch image retrieval, the main issue is the recognition of the sketched components of a query in one or more database images, followed by a measure of similarity to find the most relevant ones. In our approach, we assume that only images where all components of the sketch are present as regions in the image are retrievable.<sup>1</sup> The approach may be considered more restrictive than other approaches, since only images having all the components of the sketch are taken into account. This is because we consider such components meaningful to the overall configuration: why would a user sketch them otherwise? Once all the shapes of the configuration have been found in the image, other properties of the shapes contribute to defining the overall similarity measure, such as rotation of each shape, rotation of the overall arrangement, scale and translation, color, and texture.
For measuring the similarity of the overall arrangement, we propose a modified version of Gudivada's θR-strings.</p><p><sup>1</sup> Of course, not all regions in the image must correspond to components of the sketch. In other words, we do not enforce full picture matching.</p></li><li><p>There are m!/(m − n)! pairings between the n objects of a sketch and n out of m regions in an image, with m &gt; n, hence a blind trial of all possible pairings takes exponential time. The results we mentioned about type-2 similarity and point pattern matching suggest that a polynomial-time algorithm might exist for matching arrangements of shapes under translation, rotation and scaling. Since our approach is different from the previous ones, we devised a non-optimal algorithm to conduct our experiments, leaving a deeper analysis of the computational complexity of the problem, and optimal algorithms to solve it, to future research. The algorithm we employ is not totally blind, though, since it takes advantage of the fact that usually most shapes in the sketch and regions in the image do not match each other. Hence, by computing an n × m matrix of similarities between shapes and regions, one can discard most of the wrong matchings from the beginning. Nevertheless, the worst case of the algorithm, when all shapes are similar to all regions, is exponential.</p><p>3. Representing shapes, objects and images</p><p>In a previous paper (Di Sciascio et al., 2000a) we proposed a structured language (based on ideas borrowed from Description Logics) to describe the complex structure of objects in an image, starting from a region-based segmentation of objects.
The complete formalism was presented in another paper (Di Sciascio et al., 2000b), together with the syntax and an extensional semantics, which is fully compositional.</p><p>The main syntactic objects we consider are basic shapes, composite shape descriptions, and transformations.</p><p>Basic shapes are denoted by the letter B, and have an edge contour e(B) characterizing them. We assume that e(B) is described as a single, closed 2D curve in a space whose origin coincides with the centroid of B. Examples of basic shapes are circle and rectangle, with the contours e(circle) and e(rectangle), but any complete, rough contour, e.g., that of a ship, can also be a basic shape.</p><p>The possible transformations are the basic ones that are present in any drawing tool: rotation (around the centroid of the shape), scaling and translation. We globally denote a rotation-translation-scaling transformation as τ. Recall that transformations can be composed in sequences τ<sub>1</sub> ∘ ⋯ ∘ τ<sub>n</sub>, and that they form a mathematical group.</p><p>The basic building block of our syntax is a basic shape component ⟨c, t, τ, B⟩, which represents a region with color c, texture t, and contour τ(e(B)). With τ(e(B)) we denote the pointwise transformation τ of the whole contour of B. For example, τ could specify to place the contour e(B) in the upper left corner of the image, scaled by 1/2 and rotated by 45°.</p><p>Composite shape descriptions are conjunctions of basic shape components, each one with its own color and texture. They are denoted as</p><p>C = ⟨c<sub>1</sub>, t<sub>1</sub>, τ<sub>1</sub>, B<sub>1</sub>⟩ ⊓ ⋯ ⊓ ⟨c<sub>n</sub>, t<sub>n</sub>, τ<sub>n</sub>, B<sub>n</sub>⟩</p><p>Notice that this is just an internal representation, invisible to the user, that we map to a visual language actually used to sketch the query.</p><p>Gudivada (1998) proposed θR-strings, a geometry-based structure for the representation of spatial relations in images, and an associated spatial similarity algorithm. The θR-string is a symbolic representation of an image.
It is obtained by associating a name with each domain object identified in the image, and then considering the sequence of names and coordinates of the centroids of the objects, with reference to a Cartesian coordinate system. Gudivada's original representation was limited by the assumption that images could not contain multiple instances of the same object type. We propose a modified formulation of θR-strings, which allows our simi...</p></li></ul>
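As an illustration of the basic shape component of Section 3 (a region with color c, texture t, and a contour obtained by applying a rotation-translation-scaling transformation pointwise to the contour of a basic shape B), the following is a minimal sketch and not the authors' implementation. The names `Transform`, `ShapeComponent`, their fields, and the square contour are hypothetical, introduced only for this example. Rotation and scaling are taken about the origin, which by the convention stated above coincides with the centroid of B. The `compose` and `inverse` methods reflect the remark that such transformations form a group.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    """Hypothetical similarity transform: p -> scale * R(angle) * p + (tx, ty)."""
    angle: float = 0.0  # rotation in radians, counter-clockwise
    scale: float = 1.0
    tx: float = 0.0
    ty: float = 0.0

    def apply(self, point):
        """Transform a single (x, y) point."""
        x, y = point
        c, s = math.cos(self.angle), math.sin(self.angle)
        return (self.scale * (c * x - s * y) + self.tx,
                self.scale * (s * x + c * y) + self.ty)

    def apply_contour(self, contour):
        """Pointwise transformation of a whole contour (list of points)."""
        return [self.apply(p) for p in contour]

    def compose(self, other):
        """Transform equivalent to applying `other` first, then `self`."""
        # self(other(p)) = s1*R1*(s2*R2*p + t2) + t1
        ox, oy = self.apply((other.tx, other.ty))
        return Transform(self.angle + other.angle,
                         self.scale * other.scale, ox, oy)

    def inverse(self):
        """Transform that undoes `self` (requires scale != 0)."""
        linear = Transform(-self.angle, 1.0 / self.scale)
        ix, iy = linear.apply((-self.tx, -self.ty))
        return Transform(linear.angle, linear.scale, ix, iy)


@dataclass(frozen=True)
class ShapeComponent:
    """Hypothetical container for one basic shape component."""
    color: str
    texture: str
    transform: Transform
    contour: tuple  # contour of the basic shape, centered on its centroid

    def placed_contour(self):
        """Contour of the component after applying its transformation."""
        return self.transform.apply_contour(self.contour)


# Assumed example: a square scaled by 1/2, rotated by 45 degrees, then moved.
square = ((1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0))
tau = Transform(angle=math.pi / 4, scale=0.5, tx=10.0, ty=20.0)
component = ShapeComponent("red", "smooth", tau, square)
placed = component.placed_contour()
```

A composite shape description could then be modeled, under the same assumptions, as a plain list of `ShapeComponent` values, one per conjunct.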

