Content-based Image Retrieval using intuitive Shape Partitioning


  • 8/13/2019 Content-based Image Retrieval using intuitive Shape Partitioning

    1/69

    Technische Universität Hamburg-Harburg

    Vision Systems

    Prof. Dr.-Ing. R.-R. Grigat

    Content-based Image Retrieval using

    Intuitive Shape Partitioning

    Studienarbeit

    Andrey Galochkin

    January 2007

    In cooperation with Prof. Kamel, University of Waterloo


    Declaration

    I hereby declare that I wrote this thesis independently and using only

    the resources listed.

    Harburg, 5 January 2007


    Abstract

    In this thesis we present a novel query-by-example shape-based image retrieval system

    that uses the correspondence of visual parts to assess the degree of similarity between

    shapes. The visual parts are explicitly computed based on the cognitive principles of human perception. The developed method is robust to rotation, translation, scale and

    moderate level of noise. In addition, it can deal with articulated or partially occluded

    shapes.

    We compare our system with other part-based methods and evaluate its performance

    using the MPEG-7 benchmark dataset.

    Finally, we discuss the advantages and drawbacks of our system compared to global

    shape similarity measures on the example of the Contour Fourier method.


    Contents

    List of Figures
    List of Tables
    1 Introduction
        1.1 Problem definition
        1.2 Thesis outline
    2 Background Theory
        2.1 Content-based image retrieval
            2.1.1 Architecture of CBIR systems
            2.1.2 Image descriptors
        2.2 Shape description techniques
            2.2.1 Demands on shape features
            2.2.2 Classification of shape descriptors
            2.2.3 Global descriptors
            2.2.4 Structural descriptors and partial shape matching
    3 Cognitive Principles of Shape Partitioning
        3.1 The minima rule
        3.2 Boundary strength (minima salience)
        3.3 Cut length
        3.4 Relative area
        3.5 Protrusion
        3.6 Good continuation
        3.7 Convex partitioning
        3.8 Partitioning problems
    4 The Developed System
        4.1 Definitions
        4.2 Overview
        4.3 Preprocessing
            4.3.1 Holes
            4.3.2 Reduce in size
            4.3.3 Extract boundary
            4.3.4 Adaptive smoothing
            4.3.5 Discrete curve evolution
            4.3.6 Insert auxiliary points
        4.4 Part segmentation
            4.4.1 SplitShape algorithm
            4.4.2 Merge parts
        4.5 Feature extraction
            4.5.1 Global features
            4.5.2 Local features
        4.6 Retrieval algorithm
    5 Performance Evaluation
        5.1 Retrieval rate
        5.2 Time issues
            5.2.1 Feature extraction
            5.2.2 Retrieval
        5.3 Comparison to other part-based methods
            5.3.1 Shape tokens
            5.3.2 Skeletons
            5.3.3 Latecki NL
    6 Conclusions
    7 Future Work
    Bibliography


    List of Figures

    2.1 Typical architecture of a content-based image retrieval system (Reprinted from [19])

    2.2 Classification of shape representation and description techniques (Reprinted from [22])

    2.3 An object (a) and its convex hull (b)

    2.4 Reconstruction of a deer shape with increasing number of FDs. The general form of an object can be described by the first few coefficients.

    2.5 A horse shape has been divided into different tokens. The numbers corresponding to each token are the curvature and the orientation of the token. (Reprinted from [2])

    2.6 The medial axis of a polygon is defined as the locus of centers of maximally inscribed disks. (Reprinted from [22])

    2.7 The sensitivity to noise of the medial axis: small changes in the boundary may induce significant changes in the medial axis. (Reprinted from [18])

    2.8 The parsing of the dog bone into parts at the branch points of the Medial Axis Transform (a) gives the same part structure to a rectangle (b). (Reprinted from [15])

    3.1 When two 3D shapes intersect, they generically create a concave crease at the locus of intersection (reprinted from [15]).

    3.2 Although any subset of an object is physically a part of it, human observers clearly find some parts perceptually natural (b), whereas others seem rather contrived (c) (reprinted from [15]).

    3.3 Sharper negative minima are stronger attractors of part cuts than weaker negative minima. In (b), a slight deviation of the part cut from negative minima looks clearly wrong. However, in (d) a deviation of identical magnitude appears less contrived (reprinted from [15]).


    3.4 The natural part cuts for the shape in (a) are shown in (b). Note that each of these cuts joins a negative minimum of curvature to a point of zero curvature. Simply joining the two negative minima, on the other hand, as in (c), leads to a perceptually unnatural parsing. (Adapted from [16])

    3.5 The role of cut length in determining part cuts. The cut pq in (a) appears far more natural than the cut pr. This is also true in (b) where the areas of the two candidate parts have been equated. (reprinted from [16])

    3.6 An example of the role of good continuation in parsing. The horizontal cuts in (b) appear less natural than the vertical cuts in (c), even though the vertical cuts are longer. (reprinted from [15])

    3.7 (a) is naturally segmented using four part cuts (into a central core and four parts), whereas (b) is naturally segmented using two part cuts (into a large vertical body and two parts on the sides). [16]

    4.1 A contour consisting of 27 points (P1 and P27 coincide)

    4.2 A shape with the cutting segment P8P12. The part P8P12 is the sequence of points P8, P9, P10, P11, P12, P8.

    4.3 If holes in (a) are filled (b), the degree of similarity between (b) and other "lizards" decreases. However, in some cases (c) holes should be filled (d).

    4.4 Shapes before (a, c) and after adaptive smoothing (b, d)

    4.5 A shape and its curvature. After smoothing only global extrema remain. (red: maxima, blue: minima, green: inflection points)

    4.6 A shape before (a) and after discrete curve evolution (b)

    4.7 Contour of cellular_phone-04 after discrete curve evolution.

    4.8 Bad cuts (red). Because "good" points are missing, no "good" cuts exist here.

    4.9 After points have been inserted, intuitive partitioning is possible.

    4.10 (a) Incorrect partitioning of octopus-15. The part cut through the body is wrong, even though its start and end points are salient minima. (b) Correct partitioning.

    4.11 The correct partitioning (a) can be destroyed by an incorrect merge (b)

    5.1 Some shapes used in part B of MPEG-7 Core Experiment CE-Shape-1. Shapes in each row belong to the same class. (reprinted from [20])

    5.2 Results of the MPEG-7 CE-Shape-1 part B test for each class for both Contour Fourier descriptors and our part-based method.

    5.3 Twenty most similar images to device7-10 found by our method. Matched parts are displayed in the same color as the corresponding query parts. Parts for which no correspondence was found are painted black.


    5.4 Twenty most similar images found by the CFD method. Images are displayed as silhouettes because this method does not compute any parts.

    5.5 Twenty most similar images to ray-11 found by our method. Matched parts are displayed in the same color as the corresponding query parts. Parts for which no correspondence was found are painted black.

    5.6 Twenty most similar images to ray-11 found by the CFD method.

    5.7 Inconsistent partitioning makes it difficult to match shapes.


    List of Tables

    5.1 Time needed to perform feature extraction and retrieval.


    Chapter 1

    Introduction

    1.1 Problem definition

    Global shape similarity measures fail when the analyzed shapes are partially occluded,

    globally deformed or their parts articulated. The solution to this problem is to apply

    part-based instead of whole-shape matching.

    The main goal of this thesis project is to design algorithms that mimic the way humans

    partition shapes and then carry out part-based matching which is robust to articulations

    and occlusions.

    1.2 Thesis outline

    The rest of this thesis is organized as follows:

    Chapter 2 is a short survey of CBIR and image descriptors with the focus on shape

    descriptors that we used in our algorithms.

    Chapter 3 explains some cognitive principles of shape partitioning.

    Chapter 4 describes the image retrieval system developed in this project.

    Chapter 5 is about the performance evaluation.


    Chapter 2

    Background Theory

    2.1 Content-based image retrieval

    As the size of digital image collections worldwide increases, searching for images in

    such collections is becoming an important operation. In particular, there is an increasing

    need for describing the complex information of digital images by non-textual descriptions that can be used to efficiently search for similar images. The field within the

    multimedia research area, focusing on using information about the visual content (such

    as color, texture or shape) of the images in order to search an image database, is called

    content-based image retrieval (CBIR). [18]

    One of the main advantages of the CBIR approach is the possibility of an automatic

    retrieval process, instead of the traditional keyword-based approach, which usually re-

    quires very laborious and time-consuming previous annotation of database images. The

    CBIR technology has been used in several applications such as fingerprint identification,

    biodiversity information systems, digital libraries, crime prevention, medicine, histori-

    cal research, among others.

    2.1.1 Architecture of CBIR systems

    Figure 2.1 shows a typical architecture of a content-based image retrieval system.


    Figure 2.1: Typical architecture of a content-based image retrieval system (Reprinted

    from [19])

    Two main functionalities are supported: data insertion and query processing. The data

    insertion subsystem is responsible for extracting appropriate features from images and

    storing them into the image database (see dashed modules and arrows). This process is

    usually performed off-line. The query processing, in turn, is organized as follows: the

    interface allows a user to specify a query by means of a query pattern and to visualize the

    retrieved similar images. The query-processing module extracts a feature vector from

    a query pattern and applies a metric (such as the Euclidean distance) to evaluate the

    similarity between the query image and the database images. Next, it ranks the database

    images in a decreasing order of similarity to the query image and forwards the most

    similar images to the interface module. Database images are often indexed according

    to their feature vectors to speed up retrieval and similarity computation [19]. Note that

    both the data insertion and the query processing functionalities use the feature vector

    extraction module.
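The query-processing step described above can be sketched as follows. This is a minimal illustration, not the system developed in this thesis: the function name `rank_by_similarity`, the `top_k` parameter and the toy feature vectors are assumptions made for the example, and the Euclidean distance is used only because the text names it as one possible metric.

```python
import math

def rank_by_similarity(query_vec, db_vecs, top_k=3):
    """Rank database feature vectors by Euclidean distance to the query.

    Returns (index, distance) pairs, most similar first.
    """
    scored = []
    for index, vec in enumerate(db_vecs):
        scored.append((math.dist(query_vec, vec), index))
    scored.sort()  # smallest distance first, i.e. decreasing similarity
    return [(index, dist) for dist, index in scored[:top_k]]

# Toy database of three 2-D feature vectors (made-up values).
database = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1]]
ranking = rank_by_similarity([0.0, 0.1], database)
```

In a real system the feature vectors would be extracted off-line and indexed, so that the linear scan used here is avoided.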

    The CBIR system that we developed in this thesis is structured in a very similar way.


    2.1.2 Image descriptors

    An image descriptor is a pair consisting of a feature vector extraction function and a distance

    function, used for indexing images by similarity. The extracted feature vector encodes

    the image properties and the distance function measures the dissimilarity between two

    images with respect to their properties [19].

    This section aims to present a brief overview of existing image descriptors. Even though

    the image retrieval system developed in this thesis deals only with shape-based retrieval

    for binary images, we will be discussing, for completeness, color and texture in addition

    to shape features used in image retrieval.

    2.1.2.1 Color

    The color feature is one of the most widely used visual features in image retrieval.

    It is relatively robust to background complication and independent of image size and

    orientation [19].

    Color description techniques can be grouped into two classes based on whether or not

    they encode information related to the color spatial distribution.

    Examples of descriptors that do not incorporate spatial color distribution include Color

    Histogram, Color Moments and Color Sets. Color Histogram is the most commonly

    used descriptor in image retrieval. Statistically, it denotes the joint probability of the

    intensities of the color channels, e.g. RGB [12].
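As a rough illustration of the color histogram as a joint distribution over the color channels, the sketch below quantizes each RGB channel and counts how often each combination of bins occurs. The function name, the choice of four bins per channel and the sample pixels are assumptions for the example; the number of bins is assumed to divide 256.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Joint RGB histogram, normalised so that the entries sum to 1.

    pixels: iterable of (r, g, b) tuples with 8-bit channel values.
    bins_per_channel is assumed to divide 256 evenly.
    """
    pixels = list(pixels)
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    # Each key is a (red bin, green bin, blue bin) cell of the joint histogram.
    return {cell: n / total for cell, n in counts.items()}

# Four made-up pixels: two reddish, one blue, one green.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 255, 0)]
hist = color_histogram(pixels)
```

Because only bin counts are kept, two images with the same colors in different places yield the same histogram, which is why this descriptor belongs to the group that ignores spatial distribution.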

    On the other hand, such descriptors as Color Coherence Vector (CCV), Border/Interior

    Pixel Classification (BIC), and Color Correlogram, incorporate color spatial distribution

    [19].

    2.1.2.2 Texture

    This image property can be characterized by the existence of basic primitives, whose

    spatial distribution creates some visual patterns defined in terms of granularity, direc-

    tionality, and repetitiveness. There exist different approaches to extract and represent

    textures. They can be classified into space-based models, frequency-based models, and texture

    signatures [19].


    The co-occurrence matrix is one of the most traditional techniques for encoding texture information. It describes spatial relationships among grey levels in an image. A cell defined by the position $(i, j)$ in this matrix registers the probability with which two pixels of grey levels $i$ and $j$ occur in two relative positions. A set of statistics derived from the co-occurrence probabilities

    (such as energy, entropy, and contrast) has been proposed to characterize textured regions.

    Another example of a space-based method is the use of auto-regressive models.
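The co-occurrence matrix mentioned above can be sketched in a few lines. This is an illustrative sketch only: the function names, the single fixed pixel offset and the tiny two-level test image are assumptions, and the two statistics shown use their common textbook definitions.

```python
def cooccurrence_matrix(image, levels, offset=(0, 1)):
    """Normalised grey-level co-occurrence matrix for one pixel offset.

    image  : list of rows of integer grey levels in range(levels)
    offset : (row step, column step) from the first to the second pixel
    """
    dr, dc = offset
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    # Convert pair counts to probabilities.
    return [[n / total for n in row] for row in counts]

def energy(p):
    """Sum of squared co-occurrence probabilities."""
    return sum(v * v for row in p for v in row)

def contrast(p):
    """Probabilities weighted by the squared grey-level difference."""
    return sum((i - j) ** 2 * v
               for i, row in enumerate(p) for j, v in enumerate(row))

# Made-up 3x3 image with two grey levels; pairs are taken left to right.
image = [[0, 0, 1],
         [0, 1, 1],
         [1, 1, 1]]
P = cooccurrence_matrix(image, levels=2)
```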

    Frequency-based texture descriptors include, for instance, the Gabor wavelet coefficients,

    which were found to be the best among the tested candidates and matched human

    vision study results [12].

    An example of texture signatures can be found in the proposal of Tamura et al. This

    descriptor aims to characterize texture information in terms of contrast, coarseness, and directionality. The MPEG-7 initiative proposed three texture descriptors: the texture browsing

    descriptor, the homogeneous texture descriptor, and the local edge histogram descriptor

    [19].

    2.2 Shape description techniques

    Shape has been defined as all the geometrical information that remains when location,

    scale and rotational effects are filtered out from an object [17]. Shape description is

    the extraction of shape features in order to quantify important properties of the shape.

    2.2.1 Demands on shape features

    Petrakis et al. [10] state that, among others, the following properties are important for

    reliable shape matching and retrieval: invariance to translation, rotation and scale; robustness to noise and deformations; computational efficiency; and compactness (the features require little storage space).

    In our algorithms we used only features that meet these requirements.


    2.2.2 Classification of shape descriptors

    Shape descriptors are classified into boundary-based (or contour-based) and region-based

    methods. This classification takes into account whether shape features are extracted

    from the contour only or from the whole shape region. These two classes, in

    turn, can be divided into structural (local) and global descriptors. This subdivision is

    based on whether the shape is represented as a whole or by segments/sections.

    Another possible classification categorizes shape description methods into spatial

    and transform domain techniques, depending on whether direct measurements of the

    shape are used or a transformation is applied [22].

    Figure 2.2: Classification of shape representation and description techniques (Reprinted

    from [22])

    Next, we present an overview of the shape descriptors relevant for the rest of this thesis.

    2.2.3 Global descriptors

    Perimeter is the shape boundary length, i.e. the number of pixels on the shape boundary.

    This feature is often used to normalize curves to have unity length (e.g. in discrete curve


    evolution that will be described later).

    Area is the number of pixels constituting the shape.

    Roundness is roughly correlated with the complexity of the contour and can be computed as

    $R = 4\pi \cdot \mathrm{Area} / \mathrm{Perimeter}^2$. Roundness equals one for a circle and zero for a line segment.

    Convex hull

    A region $R$ is convex if and only if for any two points $P_1, P_2 \in R$, the whole line segment whose end points are $P_1$ and $P_2$ is also inside $R$. The convex hull of a region is the

    smallest convex region $H$ which satisfies the condition $R \subseteq H$ [22].

    Figure 2.3: An object (a) and its convex hull (b)

    Solidity is defined as the ratio of the shape's area to the area of its convex hull and

    measures the deviation of a shape from being totally convex.
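The global descriptors of this section can be illustrated on polygons. The sketch below is an assumption-laden stand-in: the text defines area and perimeter as pixel counts, whereas here the shape is given as a list of polygon vertices, the area comes from the shoelace formula, and the convex hull is computed with Andrew's monotone chain algorithm (a standard method, not necessarily the one used in this thesis). All function names are hypothetical.

```python
import math

def polygon_area(pts):
    """Absolute area of a simple polygon (shoelace formula)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def polygon_perimeter(pts):
    """Sum of the edge lengths of a closed polygon."""
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1]))

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices counterclockwise."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def roundness(pts):
    """R = 4 * pi * Area / Perimeter^2."""
    return 4.0 * math.pi * polygon_area(pts) / polygon_perimeter(pts) ** 2

def solidity(pts):
    """Ratio of the shape's area to the area of its convex hull."""
    return polygon_area(pts) / polygon_area(convex_hull(pts))

# Vertices are tuples so that duplicate points can be removed with set().
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
```

For the L-shaped polygon the hull cuts off the concave corner, so the solidity is below one, while the convex square has solidity one.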

    2.2.3.1 Shape signatures

    In general, a shape signature is any 1-D function representing 2-D areas or boundaries

    [21]. Assume the shape boundary coordinates $(x(t), y(t))$, $t = 0, 1, \ldots, L-1$, have been extracted in the preprocessing stage. Then we can define the

    complex coordinates function,


    which is simply the complex number generated from the boundary coordinates:

    $$z(t) = x(t) + i\,y(t)$$

    In order to eliminate the effect of bias, we use the shifted coordinates function:

    $$z(t) = [x(t) - x_c] + i\,[y(t) - y_c]$$

    where $(x_c, y_c)$ is the centroid of the shape, which is the average of the boundary coordinates:

    $$x_c = \frac{1}{L} \sum_{t=0}^{L-1} x(t), \qquad y_c = \frac{1}{L} \sum_{t=0}^{L-1} y(t)$$

    This shift makes the shape representation invariant to translation. An im-

    portant property of this representation is that it is information preserving,

    i.e. it allows full reconstruction of the shape of the contour [21].

    2.2.3.2 Fourier descriptors

    For a given shape signature $s(t)$, $t = 0, 1, \ldots, L-1$, as described above, and assuming it is normalized to $N$ points in the sampling stage, the discrete Fourier

    transform of $s(t)$ is given by

    $$u_n = \frac{1}{N} \sum_{t=0}^{N-1} s(t) \exp\left(-\frac{j 2 \pi n t}{N}\right), \qquad n = 0, 1, \ldots, N-1$$

    The coefficients $u_n$, $n = 0, 1, \ldots, N-1$, are called Fourier descriptors (FD) of the shape, denoted as $FD_n$, $n = 0, 1, \ldots, N-1$ [21].

    Rotation invariance of the FDs is achieved by ignoring the phase information and by taking only the magnitude values of the FDs.

    For the complex coordinates signature, all $N$ descriptors except the first

    one (the DC component, $FD_0$) are needed to index the shape. The DC component

    depends only on the position of the shape; it is not useful for describing

    the shape and is therefore discarded.

    Scale normalization is achieved by dividing the magnitude values of all

    the other descriptors by the magnitude value of the second descriptor, $FD_1$. The invariant feature vector used to index the shape is then given by [21]

    $$f = \left[ \frac{|FD_2|}{|FD_1|}, \frac{|FD_3|}{|FD_1|}, \ldots, \frac{|FD_{N-1}|}{|FD_1|} \right]$$

    Typically, 10-15 descriptors are sufficient to describe shapes. We used

    N=14 in our algorithms.
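The construction of the invariant feature vector can be sketched as follows. This is a minimal sketch under stated assumptions: the contour is taken to be already resampled to N points, the transform is evaluated directly from the formula for $u_n$ rather than with an FFT, and the function name and the test contour are made up for the example.

```python
import cmath
import math

def fourier_descriptor_features(xs, ys):
    """Invariant features [|FD_2|/|FD_1|, ..., |FD_{N-1}|/|FD_1|].

    xs, ys: boundary coordinates, assumed already resampled to N points.
    """
    n = len(xs)
    xc, yc = sum(xs) / n, sum(ys) / n
    # Centroid-shifted complex coordinates signature z(t).
    z = [complex(x - xc, y - yc) for x, y in zip(xs, ys)]
    # Direct O(N^2) discrete Fourier transform, following the formula for u_n.
    fd = [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        for k in range(n)
    ]
    # Magnitudes discard the phase; dividing by |FD_1| removes scale.
    # FD_0 (the DC component) is skipped.
    return [abs(fd[k]) / abs(fd[1]) for k in range(2, n)]

# Made-up closed contour sampled at 16 points.
n_points = 16
ts = [2 * math.pi * m / n_points for m in range(n_points)]
xs = [3 * math.cos(t) + 0.5 * math.cos(3 * t) for t in ts]
ys = [2 * math.sin(t) for t in ts]
features = fourier_descriptor_features(xs, ys)

# The same contour scaled by 2.5, rotated by 0.7 rad and translated.
moved = [2.5 * cmath.exp(0.7j) * complex(x, y) + complex(3, 4)
         for x, y in zip(xs, ys)]
features_moved = fourier_descriptor_features([w.real for w in moved],
                                             [w.imag for w in moved])
```

Up to rounding error, `features` and `features_moved` agree, which illustrates the translation, rotation and scale invariance described above.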


    [Figure panels: the original shape and its reconstructions with N = 1, 2, 4, 6, 10, 15, 20, 30 and 50 Fourier descriptors]

    Figure 2.4: Reconstruction of a deer shape with increasing number of FDs. The general

    form of an object can be described by the first few coefficients.

    2.2.4 Structural descriptors and partial shape matching

    With the structural approach, shapes are broken down into boundary seg-

    ments called primitives. Structural methods differ in the selection of


    primitives and the organization of the primitives for shape representation

    [22].

    2.2.4.1 Shape tokens

    In [2], the curvature zero-crossing points of a Gaussian-smoothed boundary

    are used to obtain primitives, called tokens (Fig. 2.5). The features of

    each token are its maximum curvature and its orientation, and the similarity

    between two tokens is measured by a weighted Euclidean distance.

    Figure 2.5: A horse shape has been divided into different tokens. The numbers cor-

    responding to each token are the curvature and the orientation of the token. (Reprinted

    from [2]).

    Since the feature includes curve orientation, it is not rotation invariant.

    The authors addressed the problem, but did not solve it.

    Given a query shape, the retrieval of similar shapes from the database

    takes two steps. The first step is token retrieval. For all the $N$ tokens

    on the query shape, the similar tokens are found by traversing the index

    tree $N$ times. The set of retrieved tokens having the same shape identifier


    forms a potentially similar shape. The second step is to match the query

    shape and the potentially similar shape using a model-by-model matching

    algorithm, which finds the best match between the tokens of the two shapes

    and involves $O(MN)$ operations ($M$ and $N$ are the numbers of tokens of the two shapes being matched). Matching of tokens in both steps involves

    thresholding, which is ad hoc or empirical. Quantitative retrieval

    performance (precision and recall) and retrieval efficiency are reported

    based on a shape database extracted from classical painted images. Since

    the tree is traversed a number of times in the shape matching, it is not

    clear whether the indexing is better than model-by-model indexing. Only matching performance using different trees is reported. The matching efficiency

    also depends on the number of tokens for each shape, and on the

    scale used in the smoothing stage [22].

    2.2.4.2 Visual parts

    Latecki et al. [9] presented a shape matching approach that works directly on the closed boundaries. It is based on visual parts (VP), where (part

    of) a database shape is simplified in the context of the query shape prior

    to their matching. The simplification process includes the elimination

    of particular points from the database shape such that the similarity to

    the query shape is maximized. The main disadvantage of this method is

    the high computational complexity of the matching algorithm, which is

    $O(N^3 \log N)$, where $N$ is the number of boundary points [1].

    2.2.4.3 Skeletons

    The basic idea is to eliminate redundant information while retaining only

    the topological structure of the object. Skeletons can be computed by

    medial axis transform. The medial axis is the locus of the centers of

    maximal circles that fit within the shape, as illustrated in Fig. 2.6.


    Figure 2.6: The medial axis of a polygon is defined as the locus of centers of maximally

    inscribed disks. (Reprinted from [22]).

    The skeleton is then segmented and represented as a graph according to

    certain criteria. Matching shapes then becomes a graph matching

    problem. This method is sensitive to noise and computationally

    expensive [22].

    Figure 2.7: The sensitivity to noise of the medial axis: small changes in the boundary may induce significant changes in the medial axis. (Reprinted from [18]).

    Another disadvantage of skeleton-based shape partitioning is that it can

    produce unintuitive results (Fig. 2.8).


    Figure 2.8: The parsing of the dog bone into parts at the branch points of the Medial

    Axis Transform (a) gives the same part structure to a rectangle (b).

    (Reprinted from [15]).


    Chapter 3

    Cognitive Principles of Shape Partitioning

    There is strong evidence from cognitive psychology that humans recog-

    nize objects by first decomposing them into parts. Human vision orga-

    nizes object shapes in terms of parts and their spatial relationships. We

    perceive a human hand, for example, as a coherent perceptual object; but

    also as a spatial arrangement of clearly defined parts: five fingers and a

    palm. Hence, perceptual units exist at many levels: at the level of whole

    objects, at the level of parts, and possibly smaller parts nested within

    larger ones [15].

    In this chapter we summarize the main findings in this research area be-

    cause they were explicitly used in the design of our algorithms.

    3.1 The minima rule

    Cognitive experiments have shown that humans perceive as a boundary

    between two "parts" a segment containing at least one point of negative

    curvature [3]. The reason is that when two convex parts overlap, their


    boundary in most cases contains one or two points of negative curvature

    (see Fig. 3.1).

    Figure 3.1: When two 3D shapes intersect, they generically create a concave crease at

    the locus of intersection (reprinted from [15]).

    Therefore we can define the

    Minima Rule for Silhouettes:

    Divide silhouettes into parts using points of negative minima of curvature

    on their bounding contour as boundaries between parts [3].
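On a polygonal contour, the minima rule can be approximated by looking for concave vertices whose turning angle is a local minimum. The sketch below is an illustration under assumptions, not the curvature estimate used later in this thesis: the signed turning angle at each vertex stands in for curvature, the contour is assumed to be a closed polygon listed counterclockwise, and the function names and the test shape are hypothetical.

```python
import math

def turning_angles(pts):
    """Signed exterior angle at each vertex of a closed polygon.

    For a counterclockwise contour, positive values mark convex vertices
    and negative values mark concave ones.
    """
    n = len(pts)
    angles = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = pts[i - 1], pts[i], pts[(i + 1) % n]
        ax, ay = x1 - x0, y1 - y0  # incoming edge
        bx, by = x2 - x1, y2 - y1  # outgoing edge
        angles.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angles

def negative_minima(pts):
    """Indices of vertices that are local minima of negative turning angle."""
    k = turning_angles(pts)
    n = len(k)
    return [
        i for i in range(n)
        if k[i] < 0 and k[i] <= k[i - 1] and k[i] <= k[(i + 1) % n]
    ]

# A rectangle pinched in the middle by two notches (counterclockwise).
pinched = [(0, 0), (2, 0), (3, 1), (4, 0), (6, 0),
           (6, 3), (4, 3), (3, 2), (2, 3), (0, 3)]
part_boundaries = negative_minima(pinched)
```

For this shape the two notch tips at (3, 1) and (3, 2) are returned, which are the points between which a human observer would place the cut separating the left and right halves.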

    Figure 3.2: Although any subset of an object is physically a part of it, human observers

    clearly find some parts perceptually natural (b), whereas others seem rather contrived (c)

    (reprinted from [15]).


    3.2 Boundary strength (minima salience)

    The sharper a curvature minimum M, the more natural it is to a human

    observer to draw a cut through it [15].

    Figure 3.3: Sharper negative minima are stronger attractors of part cuts than weaker

    negative minima. In (b), a slight deviation of the part cut from negative minima looks clearly wrong. However, in (d) a deviation of identical magnitude appears less contrived

    (reprinted from [15]).

    However, a good cut doesn't always connect two curvature minima, even if they are very sharp. Thus geometric constraints in addition to the minima rule are needed to define cuts, and hence the parts themselves.

    For our current purposes, we take a part cut to be a straight-line segment which joins two points on the outline of a silhouette such that

    (1) at least one of the two points has negative curvature,

    (2) the entire segment lies in the interior of the shape.


    Figure 3.4: The natural part cuts for the shape in (a) are shown in (b). Note that each of these cuts joins a negative minimum of curvature to a point of zero curvature. Simply joining the two negative minima, on the other hand, as in (c), leads to a perceptually unnatural parsing (adapted from [16]).

    3.3 Cut length

    Consider the elbow in Figure 3.5. Cut pq on this elbow looks far more natural than cut pr. In Figure 3.5b, we have made the areas of the two segments equal, and pq is still the preferred cut, suggesting that the area of the parts is not determining the cuts in these figures. Instead, examples like these suggest that human vision prefers to divide shapes into parts using the shortest cuts possible.

    Figure 3.5: The role of cut length in determining part cuts. The cut pq in (a) appears far

    more natural than the cut pr. This is also true in (b) where the areas of the two candidate

    parts have been equated. (reprinted from [16])


    3.4 Relative area

    The salience of a part increases as the ratio of its visible area to the visible

    area of the whole silhouette increases [15].

    3.5 Protrusion

    This factor is the degree to which a part sticks out from its object. Parts

    that stick out more seem to be more salient [4]. It can be computed as the

    ratio of the part perimeter to the length of the cutting segment.
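The protrusion ratio described above can be sketched in a few lines of Python (the thesis itself uses Matlab). This is only an illustration under one assumption that the text does not settle: the "part perimeter" is taken to be the contour length from one end of the cut to the other, without the cut itself. The function names are hypothetical.

```python
import math

def polyline_length(points):
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def protrusion(part_points):
    """Protrusion of a part given as an open polyline from M to Q.

    Assumption: the perimeter is the contour length from M to Q,
    not counting the cutting segment MQ itself.
    """
    cut_length = math.dist(part_points[0], part_points[-1])
    return polyline_length(part_points) / cut_length
```

A long "finger" gives a large value, whereas a shallow bump gives a value close to 1.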

    3.6 Good continuation

    Consider the shape in Figure 3.6. Here the parsing induced by the shorter cuts (shown in Figure 3.6b) appears less natural than the one induced by the longer cuts (shown in Figure 3.6c).

    Figure 3.6: An example of the role of good continuation in parsing. The horizontal cuts in (b) appear less natural than the vertical cuts in (c), even though the vertical cuts are longer (reprinted from [15]).

    There is another factor at play here, in addition to minimizing cut length: in Figure 3.6c each cut continues the directions of two tangents at the negative minima of curvature, but not in Figure 3.6b. Hence good continuation between a pair of tangents (one at each of the two part boundaries) is an important geometric factor for determining part cuts.

    3.7 Convex partitioning

    Rosin [11] showed that a partitioning scheme which maximizes the weighted sum of part convexities is closely related to Hoffman and Singh's part salience factors [4]. The idea is to produce few solid parts with maximum relative area.

    3.8 Partitioning problems

    One of the main problems of intuitive shape partitioning is instability, i.e. small changes in shape can cause significant changes in part segmentation. In particular, partitioning is sensitive to the relative size of shape parts.

    We addressed this issue when designing our algorithms.


    Figure 3.7: (a) is naturally segmented using four part cuts (into a central core and four

    parts), whereas (b) is naturally segmented using two part cuts (into a large vertical body

    and two parts on the sides). [16]


    Chapter 4

    The Developed System

    4.1 Definitions

    Contour (Boundary): a sequence of points $P_1 P_2 \ldots P_N P_1$ that, when joined, form a polygon without self-intersections (the contour of a shape). This sequence begins and ends with the same point to ensure that the polygon is closed.

    Figure 4.1: A contour consisting of 27 points ($P_1$ and $P_{27}$ coincide)


    Part:

    Let $M = P_i$ (the $i$-th point on the contour) and $Q = P_j$. Then the subset of the contour $P_i P_{i+1} \ldots P_{j-1} P_j P_i$, with $j > i \pmod N$, is called a part $MQ$. The part has to have area larger than the remaining area. If this is not the case, the part and the remaining shape are swapped. The segment $MQ$ is called a cutting segment or just a cut.

    Figure 4.2: A shape with the cutting segment $P_8 P_{12}$. The part $P_8 P_{12}$ is the sequence of points $P_8, P_9, P_{10}, P_{11}, P_{12}, P_8$.

    As mentioned before, because of cognitive principles, a cutting segment

    always starts at a curvature minimum and must lie completely inside the

    shape contour.

    The words "part" and "shape part" mean the same in the scope of this

    thesis.


    4.2 Overview

    The developed system has two main functionalities: populate the feature database and retrieval.

    In the populate_database method the visual parts of a shape are

    computed, their features extracted and the resulting matrix saved to a file

    imagename_Features.mat.


    Populate Database:

    - read in an image
    - reduce in size
    - extract boundary
    - compute global features
    - compute curvature
    - adaptive smoothing
    - discrete curve evolution (leave only perceptually salient points)
    - insert points enabling to make shortest or straight cuts
    - iteratively split shape using intuition
    - merge incorrectly split parts
    - extract features from each part
    - save feature matrix to file

    In this representation the rows are the found parts of the shape and the

    columns are the corresponding features.

    Note: for a very solid shape, such as a circle, no parts will be found, i.e. the shape will not be split. If two such shapes are to be matched, the system simply computes the Euclidean distance between the two feature vectors.

    The retrieval algorithm is loosely based on the shape tokens retrieval scheme [2] and is described in detail in the next chapter.

    The main idea is to match shapes both on the global and local level:

    $d_1$: distance between global features

    $d_2$: distance between local features

    $$d = a\,d_1 + (1 - a)\,d_2$$

    Note: The retrieval algorithm can only be run after the populate_database method since it requires the extracted shape features.
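As a small illustration of the weighted combination of global and local distances, the following Python sketch may help. The function name and the default weight `a=0.5` are placeholders chosen for the example; the thesis does not state the value of `a` at this point, and the local distance is assumed to be computed elsewhere.

```python
import math

def combined_distance(global_q, global_c, local_distance, a=0.5):
    """Blend global and local dissimilarity: d = a*d1 + (1 - a)*d2.

    global_q, global_c: global feature vectors of query and candidate.
    local_distance: part-based distance d2, computed separately.
    a: weight in [0, 1]; 0.5 is only a placeholder value.
    """
    d1 = math.dist(global_q, global_c)  # Euclidean distance of global features
    d2 = local_distance
    return a * d1 + (1.0 - a) * d2
```

With `a = 1` only the global features count; with `a = 0` only the part-based distance counts.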

    4.3 Preprocessing

    4.3.1 Holes

    In the pre-processing stage we didn't deal with holes because it is not always clear when holes should be filled or opened.

    (a) (b) (c) (d)

    Figure 4.3: if holes in (a) are filled (b), the degree of similarity between (b) and other

    "lizzards" decreases. However, in some cases (c) holes should be filled (d).


    4.3.2 Reduce in size

    This step is required mainly for computational reasons since the speed of all subsequent algorithms strongly depends on the number of boundary points. Empirically we found that N = 128x128 white pixels are sufficient to retain all perceptually important details of most images. Therefore, each image is pre-processed as follows:

    compute area (number of "turned on" pixels)
    if area > N
        reduce in size to make area equal to N.
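Because area scales with the square of the linear image size, the resize step above amounts to scaling both image dimensions by the square root of N/area. A minimal Python sketch of that arithmetic (the function name is hypothetical, and the actual resampling routine is left out):

```python
import math

N_TARGET = 128 * 128  # target number of foreground pixels, as in the text

def resize_factor(area, n_target=N_TARGET):
    """Linear scale factor that brings `area` down to about n_target pixels.

    Shapes that are already small enough are left unchanged (factor 1).
    """
    if area <= n_target:
        return 1.0
    return math.sqrt(n_target / area)
```

For example, a shape with four times the target area would be scaled by 0.5 in each dimension.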

    4.3.3 Extract boundary

    Here we used the Matlab method bwtraceboundary and 8-connectivity to extract the parametrized coordinates [x,y] of the shape contour.

    4.3.4 Adaptive smoothing

    For further processing it is necessary to sufficiently reduce the level of noise and to remove small details that decrease the degree of similarity between shapes. The contour of the shape is iteratively smoothed until the number of curvature extrema becomes sufficiently low (at most 20 curvature minima).

    1 smoothing parameter P threshold
    [...]
    7 increase P
    8 goto 3
    9 end

    Thus, complex shapes with many details are heavily smoothed whereas

    "simple" shapes are left unchanged to prevent loss of information.

    (a) (b)

    (c) (d)

    Figure 4.4: Shapes before (a,c) and after adaptive smoothing (b,d)

    4.3.4.1 gauSmooth

    The easiest and most computationally efficient way to smooth the boundary would be to simply reconstruct the curve using the previously computed Fourier descriptors Fz:

    z=ifft(Fz); % inverse Fourier transform

    xsmooth=real(z);

    ysmooth=imag(z);


    However, such reconstruction sometimes produces self-intersections of

    the approximated curve and ringing, which would significantly affect the

    outcome of the partitioning algorithm.

    This is why we used Gaussian smoothing, which produces natural results

    (like blur of a camera).

    Here the curve point sequence is smoothed by circularly convolving it with a Gaussian

    $$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

    sigma = smoothingParameter*length(x);
    W = 3*sigma;               % outside [-W,W] the Gaussian is negligibly small
    t = (-W:W);                % truncate too small values of the Gaussian
    gau = normpdf(t,0,sigma);
    gau = gau/sum(gau);        % normalize so that the kernel sums to 1
    xx = conv(x1,gau);         % smoothed x coordinates
                               % (x1 is not defined in this listing; presumably x extended periodically)
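The circular convolution can also be written out explicitly. The following Python sketch (not the thesis's Matlab code) assumes that the coordinate list describes a closed contour without the duplicated end point and that the smoothing parameter is positive; the function names are hypothetical. Indices wrap around modulo the contour length, so no padding is needed.

```python
import math

def gaussian_kernel(sigma):
    """Normalized Gaussian weights on the integer grid -W..W with W = ceil(3*sigma)."""
    half_width = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(t * t) / (2.0 * sigma * sigma))
               for t in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights], half_width

def smooth_closed(values, smoothing_parameter):
    """Circularly convolve one coordinate sequence of a closed contour."""
    n = len(values)
    sigma = smoothing_parameter * n  # same scaling as in the listing above
    kernel, half_width = gaussian_kernel(sigma)
    offsets = range(-half_width, half_width + 1)
    return [sum(w * values[(k + off) % n] for w, off in zip(kernel, offsets))
            for k in range(n)]
```

Applying `smooth_closed` separately to the x and y sequences gives the smoothed contour. Since the kernel sums to one, a constant sequence is left unchanged.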

    4.3.4.2 Compute curvature

    Mathematically, a planar, continuous curve can be parameterized with respect to its arc length $t$ and expressed as $c(t) = \{x(t), y(t)\}$.

    Hence, the curvature $\kappa(t)$ of $c(t)$ at the point $\{x(t), y(t)\}$ can be expressed as:

    $$\kappa(t) = \frac{x_t(t)\,y_{tt}(t) - x_{tt}(t)\,y_t(t)}{\left(x_t^2(t) + y_t^2(t)\right)^{3/2}}$$

    The discrete derivatives are computed using the formulae

    $$x_n(k) = \frac{-x(k-1) + x(k+1)}{2}$$

    and

    $$x_{nn}(k) = \frac{-x_n(k-1) + x_n(k+1)}{2}$$

    From these formulae one can see that the computed curvature is very sen-

    sitive to noise, which results in a high number of detected extrema. Hence

    the curve needs to be smoothed before further processing. The smooth-

    ing parameter is adjusted as described in the adaptive smoothing function.
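To make the discrete curvature computation concrete, here is a Python sketch that applies central differences twice and then evaluates the curvature formula. It assumes a closed contour stored without the duplicated end point and traversed counter-clockwise, so that convex stretches get positive and concave stretches negative curvature; the function names are hypothetical.

```python
def central_difference(values):
    """Central difference (v[k+1] - v[k-1]) / 2 with wrap-around indices."""
    n = len(values)
    return [(values[(k + 1) % n] - values[(k - 1) % n]) / 2.0 for k in range(n)]

def curvature(x, y):
    """Discrete curvature of a closed contour given by coordinate lists x and y."""
    xt, yt = central_difference(x), central_difference(y)
    xtt, ytt = central_difference(xt), central_difference(yt)
    result = []
    for k in range(len(x)):
        speed_sq = xt[k] ** 2 + yt[k] ** 2
        if speed_sq > 0:
            result.append((xt[k] * ytt[k] - xtt[k] * yt[k]) / speed_sq ** 1.5)
        else:
            result.append(0.0)  # degenerate case: repeated points
    return result
```

For a uniformly sampled circle of radius R this scheme returns 1/R at every point, which is a convenient sanity check before applying it to noisy contours.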

    (a) (b)

    Figure 4.5: A shape and its curvature. After smoothing only global extrema remain.

    (red: maxima, blue: minima, green: inflection points).

    4.3.5 Discrete curve evolution

    To reduce the computation time it is necessary that the shape boundary

    consists of as few points as possible. A straight-forward downsampling

    (take each n-th point of the boundary) has the disadvantage that some

    perceptually significant points may be removed in areas of high detail

    and too many insignificant points left in areas of low detail.


    For example, to represent one period of a cosine wave, at least 5 points are

    necessary. On the other hand, to downsample a line of arbitrary length,

    just 2 points are enough.

    We implemented the method described by Latecki [6,7].

    In every evolutional step, a pair of consecutive line segments $s_1, s_2$ is replaced by a single line segment joining the endpoints of $s_1 \cup s_2$.

    The key property of this evolution is the order of the substitution. The substitution is achieved according to a relevance measure $K$ given by:

    $$K(s_1, s_2) = \frac{\beta(s_1, s_2)\, l(s_1)\, l(s_2)}{l(s_1) + l(s_2)}$$

    where line segments $s_1, s_2$ are the polygon sides incident to a vertex $v$, $\beta(s_1, s_2)$ is the turn angle at the common vertex of segments $s_1, s_2$, and $l$ is the length function normalized with respect to the total length of a polygonal curve $C$. The main property of this relevance measurement is [7,9]:

    The higher the value of $K(s_1, s_2)$, the larger is the contribution of the arc $s_1 \cup s_2$ to the shape. Given the input boundary polygon $P$ with $n$ vertices, DCE produces a sequence of simpler polygons $P = P^n, P^{n-1}, \ldots, P^3$ such that $P^{n-(k+1)}$ is obtained by removing a single vertex $v$ from $P^{n-k}$ whose shape contribution measured by $K$ is the smallest.
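The evolution described above can be sketched as a simple loop that repeatedly deletes the vertex with the smallest relevance. The Python code below is only an illustration, not the thesis implementation: it recomputes every relevance value in each step (quadratic time), normalizes lengths by the perimeter of the current polygon, and assumes that consecutive vertices are distinct. All names are hypothetical.

```python
import math

def turn_angle(a, v, b):
    """Absolute turn angle (radians) at vertex v between segments a->v and v->b."""
    ang1 = math.atan2(v[1] - a[1], v[0] - a[0])
    ang2 = math.atan2(b[1] - v[1], b[0] - v[0])
    diff = abs(ang2 - ang1)
    return min(diff, 2 * math.pi - diff)

def perimeter(polygon):
    n = len(polygon)
    return sum(math.dist(polygon[k], polygon[(k + 1) % n]) for k in range(n))

def relevance(polygon, k, total_length):
    """K = beta * l1 * l2 / (l1 + l2) for the vertex with index k."""
    n = len(polygon)
    a, v, b = polygon[k - 1], polygon[k], polygon[(k + 1) % n]
    l1 = math.dist(a, v) / total_length
    l2 = math.dist(v, b) / total_length
    return turn_angle(a, v, b) * l1 * l2 / (l1 + l2)

def discrete_curve_evolution(polygon, n_keep):
    """Remove least relevant vertices until n_keep (at least 3) vertices remain."""
    polygon = list(polygon)
    while len(polygon) > max(n_keep, 3):
        total = perimeter(polygon)
        k_min = min(range(len(polygon)),
                    key=lambda k: relevance(polygon, k, total))
        del polygon[k_min]
    return polygon
```

A nearly collinear vertex on a long edge has a tiny turn angle and is therefore removed before any real corner.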


    (a) (b)

    Figure 4.6: a shape before (a) and after discrete curve evolution (b)

    4.3.6 Insert auxiliary points

    4.3.6.1 Motivation

    Empirically we found out that to partition a shape well, several points have to be inserted. For example, if a good cut would have to end at a point that has been removed by discrete curve evolution, then this cut can't be made.

    Figure 4.7: Contour of cellular_phone-04 after discrete curve evolution.

    For example, the shape of a cell phone apparently consists of two parts:

    the body and the antenna. However, since the body is rectangular, its

    lines are straight and thus just two points of each line remain after the

    curve evolution.

    Obviously, none of the possible cuts is intuitive.


    Figure 4.8: bad cuts (red). Because "good" points are missing, no "good" cuts exist

    here.

    Therefore, we need to insert points that will most likely be used to build

    part cuts. We used the "shortest cut" and "good continuation" rules.

    4.3.6.2 insertShortestCut

    As previously mentioned, humans prefer to partition shapes with segments having shortest length. This means that such segments are perpendicular to the opposite lines on the contour. More formally,

    for each curvature minimum M
        for each point Pi on the contour
            compute vector v = Pi->Pi+1       %tangent at Pi
            compute normal vector n ⊥ v       %orthogonal to PiPi+1

            L := M + k*n                      %line through M, in direction of n

            S = intersect(L,PiPi+1)           %intersection point of line L and
                                              %segment PiPi+1

            if between(S, PiPi+1)             %intersection within segment PiPi+1
                insert S                      %into the contour points sequence
            end
        end
    end
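The intersection of the normal line through M with a contour segment is the foot of the perpendicular from M onto that segment, which can be computed by a scalar projection. The Python sketch below follows the pseudocode under that reading; like the pseudocode, it does not check whether the resulting cut lies inside the shape. The function names are hypothetical.

```python
def perpendicular_foot(m, p, q):
    """Foot of the perpendicular from m onto segment p-q.

    Returns None if the foot falls outside the open segment.
    """
    vx, vy = q[0] - p[0], q[1] - p[1]
    length_sq = vx * vx + vy * vy
    if length_sq == 0:
        return None  # degenerate segment
    s = ((m[0] - p[0]) * vx + (m[1] - p[1]) * vy) / length_sq
    if not 0.0 < s < 1.0:
        return None
    return (p[0] + s * vx, p[1] + s * vy)

def shortest_cut_candidates(contour, minima_indices):
    """List of (segment_index, point) pairs to insert after contour[segment_index]."""
    n = len(contour)
    candidates = []
    for i_min in minima_indices:
        m = contour[i_min]
        for i in range(n):
            j = (i + 1) % n
            if i_min in (i, j):
                continue  # skip the two segments that touch the minimum itself
            foot = perpendicular_foot(m, contour[i], contour[j])
            if foot is not None:
                candidates.append((i, foot))
    return candidates
```

For a phone-like outline (a rectangular body with an antenna), the concave corner at the base of the antenna yields a candidate point on the opposite straight side, which is the kind of point removed by the curve evolution.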


    4.3.6.3 insertStraightCut

    Continue each long line segment until it intersects some other segment.

    Insert the intersection point into the sequence of boundary points.

    Figure 4.9: After points have been inserted, intuitive partitioning is possible.

    4.4 Part segmentation

    4.4.1 SplitShape algorithm

    The algorithm uses the previously mentioned cognitive principles, trying to split a shape the way a human would. The following pseudocode illustrates the main idea in strongly simplified form.

    while remaining shape not convex
        for all curvature minima M
            for all points on the boundary P
                if admissible(cut M->P)
                    compute
                        cutLength,
                        areaC         %relative area of the candidate Part,
                        solC          %its Solidity (=Convexity)
                        solR          %Solidity of the Remaining Shape
                        mnSalience    %Salience of start and end points

                    F = areaC + solC + cutLength + solR + mnSalience;

                    save F            %value of utility function for cut M->P
                end
            end
        end
        remove Parts with highest F
    end

    A cut is admissible if it lies completely inside the shape boundary. To verify this for a cut $MP$ we designed a function lineInPoly that checks whether the vector $MP$ intersects any of the existing line segments. (In other words, for all $n$, the points $p_n$ and $p_{n+1}$ must lie on the same side of the line through $M$ and $P$.) This solution is much better than the numerical one (generate 100 points between $M$ and $P$ and check if all of them lie in the polygon).
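One possible way to implement such an admissibility test is sketched below in Python. It is not the thesis's lineInPoly function: it combines a standard orientation-based segment intersection test with a ray-casting check that the midpoint of the cut lies inside the polygon. Degenerate cases (a cut passing exactly through another vertex or running along an edge) are not handled, and all names are hypothetical.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def properly_intersect(a, b, c, d):
    """True if segments ab and cd cross at a point interior to both."""
    return (orient(a, b, c) * orient(a, b, d) < 0 and
            orient(c, d, a) * orient(c, d, b) < 0)

def point_in_polygon(pt, polygon):
    """Ray casting; points exactly on the boundary are not treated specially."""
    inside = False
    n = len(polygon)
    for k in range(n):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % n]
        if (y1 > pt[1]) != (y2 > pt[1]):
            x_cross = x1 + (pt[1] - y1) * (x2 - x1) / (y2 - y1)
            if pt[0] < x_cross:
                inside = not inside
    return inside

def cut_is_admissible(polygon, i, j):
    """Check that the cut between vertices i and j stays inside the polygon."""
    m, p = polygon[i], polygon[j]
    n = len(polygon)
    for k in range(n):
        if properly_intersect(m, p, polygon[k], polygon[(k + 1) % n]):
            return False
    midpoint = ((m[0] + p[0]) / 2.0, (m[1] + p[1]) / 2.0)
    return point_in_polygon(midpoint, polygon)
```

The midpoint check is needed because a segment between two vertices can avoid all edges and still run entirely outside the shape, for example across a concavity.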

    Each component of the utility function F is cognitively motivated:

    areaC

    Part salience increases with relative area. Regions of a shape are normally perceived as parts only if they have some significant area relative to the original shape [15].

    solC

    Rosin [11] showed that a partitioning scheme which maximizes the weighted sum of part solidities is closely related to Hoffman and Singh's part salience factors [4]. Therefore, part salience increases with solidity.

    solR

    Empirically we found out that "good" parts, when removed from the orig-

    inal shape, make it more solid. For example, the solidity of an X-shape is

    about 0.3. However, when the four "legs" are removed, only the central

    core remains, which has solidity=1.


    mnSalience

    Hoffman and Singh [4] showed that sharper extrema of curvature are

    more powerful attractors of part cuts. Latecki [6, 7] demonstrated that

    also the lengths of the correspondent tangents are important.

    Moreover, we found that the components are not significantly correlated

    and thus all of them need to be computed. Each component has been

    appropriately weighted, sometimes using nonlinear transformations.
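Solidity, which enters the utility function through solC and solR, is commonly computed as the ratio of the polygon area to the area of its convex hull. The Python sketch below uses the shoelace formula and Andrew's monotone chain algorithm; these are standard choices assumed for illustration, since the thesis does not say how its hull is built. Vertices are expected as tuples.

```python
def polygon_area(points):
    """Absolute area of a simple polygon by the shoelace formula."""
    n = len(points)
    s = sum(points[k][0] * points[(k + 1) % n][1] -
            points[(k + 1) % n][0] * points[k][1] for k in range(n))
    return abs(s) / 2.0

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def solidity(points):
    """Area of the polygon divided by the area of its convex hull."""
    return polygon_area(points) / polygon_area(convex_hull(points))
```

For a plus sign made of five unit squares the solidity is 5/7, and it rises to 1 once the four arms are removed and only the central square remains. The X-shape mentioned in the text has thinner, longer arms and hence a lower value.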

    After all part candidates have been examined, the algorithm:

    - takes the part $P_i$ with the highest value of the utility function, $F_{max}$,

    - finds all other parts $P_j$ that

    (1) are disjoint with $P_i$ ($P_i$ and $P_j$ have not more than 2 points in common) and

    (2) have a value of the utility function $F_j > 0.9\,F_{max}$.

    Criterion (1) makes sure that any subregion of the shape belongs to exactly one part. Thus, it is forbidden to partition a leg (= foot + shin + hip) into (foot + shin) and (shin + hip).

    Demand (2) is mainly due to computational reasons. To partition a shape

    in less time it is better to remove several parts in one iteration. Moreover,

    if parts were removed one by one, the value of the utility function for

    future iterations would be different and thus the overall result very noise-

    sensitive.
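The selection of several parts per iteration can be illustrated with a short Python sketch. It assumes that each candidate part is represented by its utility value and the set of contour point indices it covers, and that utilities are positive. As an extension not stated in the text, it also requires the additionally selected parts to be disjoint from each other, not only from the best part. The function name and parameters are hypothetical.

```python
def select_parts(candidates, ratio=0.9, max_shared=2):
    """Pick the best part plus all near-best parts that do not overlap.

    candidates: list of (utility, set_of_contour_point_indices).
    Two parts count as disjoint if they share at most max_shared points.
    """
    best_utility, best_points = max(candidates, key=lambda c: c[0])
    selected = [(best_utility, best_points)]
    for utility, points in sorted(candidates, key=lambda c: c[0], reverse=True):
        if points is best_points or utility <= ratio * best_utility:
            continue
        if all(len(points & chosen) <= max_shared for _, chosen in selected):
            selected.append((utility, points))
    return selected
```

In the example of the leg, a candidate (shin + hip) that overlaps an already selected (foot + shin) would be rejected by the disjointness check.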

    The described algorithm splits many simple shapes of the MPEG7 dataset

    in just one iteration (such as device0..device7, apple, cell_phone etc).


    4.4.2 Merge parts

    The main idea of this thesis is not only to partition shapes intuitively, but also to achieve that similar shapes are partitioned in the same way. Also, partitioning should be robust to noise and moderate perturbations of the contour.

    Since partitioning algorithms have no background knowledge about the

    world, wrong cuts cannot be avoided.

    (a) (b)

    Figure 4.10: (a) incorrect partitioning of octopus-15. The part cut through the body is

    wrong, even though its start and end points are salient minima. (b) Correct partitioning.

    The merging algorithm consists of two phases. In the first phase obvious

    errors are corrected, such as splitting of a circle in two halves. These

    errors mainly occur in an attempt to make the remaining shape as convex

    as possible.

    The key idea of the first phase is to merge two or more parts so that:

    (1) the resulting part has high solidity,

    (2) this merge is not "unfair" to other candidates for a merge.

    For example, a regular pentacle (5-pointed star) is naturally split into a pentagon with 5 triangles around it. We have to forbid a merge between the pentagon and any of those triangles to remain "fair" and preserve topology.

    (a) (b)

    Figure 4.11: the correct partitioning (a) can be destroyed by an incorrect merge (b)

    The second phase of the algorithm is "topological" merge. Two or more

    neighbouring parts are merged if each of them has more than one neigh-

    bour. The rationale is that most living things or objects have their limbs

    arranged around just one center.

    (Another extension of the algorithm would be to merge several parts if they can be turned in such a way that this results in high solidity. For

    example, the tail of a ray is basically a long cylinder which can be bent

    several times. Because of low solidity, it will be split in about ten parts

    by the SplitShape algorithm. If all those parts are merged, the degree of

    similarity between the ray with a straight tail and a bent tail will increase

    making the retrieval robust to articulation of limbs.)

    4.5 Feature extraction

    4.5.1 Global features

    The idea is to capture the overall appearance of the shape without going into details. These features are computed right after the boundary extraction. As global features we took 14 normalized Fourier Descriptors obtained by the Contour Fourier method [5]. The advantages of these features are:

    - robustness to noise,

    - invariance to rotation, translation and scale,

    - invariance to starting point on the boundary,

    - computational efficiency.

    4.5.2 Local features

    To describe the segmented parts we used the following features:

    1. Fourier descriptors (also 14, as in the case of global features)

    2. roundness

    3. relative area

    4. solidity

    5. number of neighbours

    We found out that Fourier descriptors are much better at describing convex shapes compared to such features as eccentricity. They can distinguish between such shapes as squares and hexagons, which is not the case with many other global features.

    Roundness has very small values for elongated shapes such as sticks or pencils and is therefore good at detecting them.

    Relative area helps to know at which scale the parts should be matched during retrieval since the system expects whole shapes as a query.

    Solidity of most parts will be equal to one because the SplitShape algorithm tries to remove only solid parts. However, during topological merge several central parts can be merged together. For example, a beetle torso will most likely be split in two halves but later merged together because each part has several neighbours (legs or antennas).

    Number of neighbours makes it possible to distinguish between such classes as device1 and device2. Both of these cogwheel-like structures have a central core with limbs around it. However, the former has six spikes whereas the latter has eight.

    4.6 Retrieval algorithm

    Once a reasonably intuitive shape partitioning has been found, partial

    shape matching can be carried out which is more robust to articulations

    and occlusions than whole shape matching.

    For similarity computation we used a distance measure based on Eu-

    clidean distance between feature vectors.

    We found that with increasing solidity partitioning becomes increasingly unstable and thus part-based distance becomes unreliable. Therefore we decided to weight distances depending on the shape solidity s.

    1 d1


    Compute feature matrix of the query shape Q

    for all part_feature matrices Mk in the database
        if Q has more rows than Mk
            swap(Mk,Q);
        end

        totalDist = 0
        for each row ri of Q
            for each row rj of Mk
                d(j) = dist(ri,rj)       % Euclidean distance between two parts
            end
            totalDist += min(d)          % add dist of best matching parts
            remove matched part from Mk
        end
        totalDist += penalty(unmatched parts)
    end

    Thus, the part-based shape distance is the sum of Euclidean distances between best matching parts and a penalty for unmatched parts.
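The greedy matching of parts can be sketched in Python as follows. The form of the penalty is not specified in the text, so a constant penalty per unmatched part is assumed here purely for illustration; the function name and the default penalty value are hypothetical as well.

```python
import math

def part_based_distance(query_parts, candidate_parts, penalty_per_part=1.0):
    """Greedy part matching between two shapes.

    Each shape is a list of part feature vectors (one row per part).
    The shape with fewer parts is matched against the other one; every
    part left over afterwards adds a constant penalty (an assumption).
    """
    smaller, larger = sorted((list(query_parts), list(candidate_parts)), key=len)
    remaining = list(larger)
    total = 0.0
    for part in smaller:
        distances = [math.dist(part, other) for other in remaining]
        best = min(range(len(remaining)), key=distances.__getitem__)
        total += distances[best]   # distance of the best matching part
        del remaining[best]        # each part can be matched only once
    return total + penalty_per_part * len(remaining)
```

Because the shorter part list always plays the role of the query, the result does not depend on the order of the two arguments. Setting the penalty to zero reproduces the behaviour in which a shape matches any other shape that merely contains all of its parts.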

    The advantage of this algorithm is that once the shape features have been extracted, the retrieval is fast because only Euclidean distances have to be computed, as opposed to graph-matching algorithms where most computation has to be done during retrieval. Typically it takes less than 10 ms to match two shapes of average complexity.

    Another advantage is flexibility. By adjusting the penalty weight one can

    implicitly set the threshold for tolerable occlusion percentage. For ex-

    ample, if the penalty is set to zero (which is the case in the shape tokens

    algorithm [2]), then it is enough to match all parts of one shape to com-

    pute distance. Thus, the distance between a cogwheel and a circle would

    be zero because the core of the cogwheel perfectly matches the circle.

    However, we believe this is not the most intuitive result.


    Chapter 5

    Performance Evaluation

    5.1 Retrieval rate

    To test the performance of our system we evaluated the retrieval rate on the dataset created by the MPEG-7 committee for evaluation of shape similarity measures [20]. The test set consists of 70 different classes of shapes, each class containing 20 similar objects, usually (heavily) distorted versions of a single base shape. The whole dataset therefore consists of 1400 shapes. For example, each row in Figure 5.1 shows four shapes from the same class.

    We focus our attention on the performance evaluation in experiments es-

    tablished in Part B of the MPEG-7 CE-Shape-1 data set.

    Each image was used as a query, and the retrieval rate is expressed by the so-called Bull's Eye Percentage (BEP): the number of images of the query's class among the top 40 matches, relative to the maximum possible number. Since the maximum number of correct matches for a single query image is 20, the total number of correct matches is 28000 (1400 queries x 20).
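The Bull's Eye computation can be sketched in Python as follows, assuming a precomputed matrix of pairwise distances and one class label per shape (both names are hypothetical). The query itself is part of the ranked list, which is why a perfect score on the MPEG-7 set requires 20 hits per query.

```python
def bulls_eye_percentage(distance, labels, top=40):
    """Bull's Eye score in percent.

    distance[i][j]: dissimilarity between shapes i and j.
    labels[i]: class label of shape i.
    For each query the `top` nearest shapes (including the query itself)
    are inspected and those from the query's class are counted.
    """
    n = len(labels)
    hits = 0
    possible = 0
    for q in range(n):
        ranked = sorted(range(n), key=lambda j: distance[q][j])[:top]
        hits += sum(1 for j in ranked if labels[j] == labels[q])
        possible += sum(1 for j in range(n) if labels[j] == labels[q])
    return 100.0 * hits / possible
```

On the MPEG-7 set `possible` adds up to 1400 x 20 = 28000, matching the figure given above.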

    Strong shape variations within the same classes mean that no shape similarity measure achieves a 100% retrieval rate. E.g., see the third row in (a) and the first and the second rows in (b) of Figure 5.1. The third row shows spoons that are more similar to shapes in different classes than to themselves [20].

    Figure 5.1: Some shapes used in part B of MPEG-7 Core Experiment CE-Shape-1. Shapes in each row belong to the same class (reprinted from [20]).


    Figure 5.2: Results of the MPEG-7 CE-Shape-1 part B test for each class for both Contour Fourier descriptors and our part-based method. (Bar chart of the retrieval rate in % per shape class; series: Contour Fourier, our method.)


    Figure 5.2 shows that our method (significantly) outperforms the CFD method for 55 classes. The Bull's Eye percentages are:

    - our method: 63.536 %

    - Contour Fourier method: 57.014 %.

    From Figure 5.2 one can see that our method performs best on shapes having a clear part structure and thus a stable (consistent) partitioning, such as device7, which is most logically split into a central core and 10 triangles around it.

    Our method also deals better with occlusions and articulations than the CFD method (see Fig. 5.5).

    However, in case of unstable partitioning the retrieval rate of our method significantly decreases.

    5.2 Time issues

    To run the Bull's Eye test we used an Intel Pentium 4 CPU with 2.26 GHz and 512 MB of RAM. The programs were unoptimised Matlab code.

                                             Our method       Contour Fourier method
    Time to complete Bull's Eye test         4 h 32 min       2 h 2 min
    Average retrieval time for one query     11.65 seconds    5.23 seconds
    Time to extract features                 117 minutes      20 minutes
    Average time to extract features         5 seconds        0.85 seconds

    Table 5.1: Time needed to perform feature extraction and retrieval.


    5.2.1 Feature extraction

    Table 5.1 shows that the proposed method is about six times slower than the CFD method. In fact, the CFD is a subset of our method (see the "compute global features" step). However, the main amount of computation for feature extraction is due to convex hull construction. Nevertheless, this step cannot be left out because, as previously shown, convexity is one of the main features that determine part saliency.

    It also deserves mentioning that the CFD's running time is $O(n)$ in the number of boundary pixels $n$ and is absolutely independent of the shape (i.e. relative positions of pixels). On the other hand, the running time of our partitioning algorithm strongly depends on shape and is $O(mn)$, where $m$ is the number of curvature minima. However, this increase in complexity has limits because the adaptiveSmoothing and curveEvolution routines simplify the shape boundary and thus limit the number of minima.

    In the best case the shape is convex and doesn't need to be split. Then the algorithm is exactly as fast as the CFD. This explains why the feature extraction time ranges from 1 to 15 seconds per shape.

    5.2.2 Retrieval

    Here the retrieval times differ on average by the factor of two. Again, in

    the CFD case the speed is constant because to compare two shapes thedistance between just two feature vectors has to be computed.

    Our system, on the other hand, needs to compute the distances between
    each pair of parts in addition to the distance between the global
    feature vectors. Thus, the time required grows quadratically in the
    number of shape parts. However, the number of parts is implicitly
    limited by the preceding curve smoothing, so the increase in delay can
    be regarded as a constant factor.
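
    The difference in cost can be sketched as follows. The snippet is a
    minimal Python illustration, not the Matlab implementation. It assumes
    that every part is described by a fixed-length feature vector and that
    the Euclidean distance is used, and the function names are hypothetical.
    It shows only the pairwise distance table, not the actual part matching.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def global_distance(query_vec, target_vec):
    """Global comparison: a single distance, independent of the shape."""
    return euclidean(query_vec, target_vec)


def part_distance_matrix(query_parts, target_parts):
    """Part-based comparison: one distance per pair of parts.

    Needs len(query_parts) * len(target_parts) distance evaluations,
    which is the quadratic growth discussed in the text.
    """
    return [[euclidean(q, t) for t in target_parts] for q in query_parts]
```

    For a query with 3 parts and a target with 2 parts the matrix has 6
    entries, compared with a single distance for the global descriptor.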

    We think that the retrieval speed can be improved by properly indexing

    feature matrices. For example, one could sort them by the number of rows

    (i.e. the number of shape parts) and then match only potential candidates.
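
    One way to realise such an index is sketched below in Python. This is
    an assumption-laden illustration and not part of the implemented
    system: the database layout (a dictionary from shape name to feature
    matrix), the function names and the tolerance parameter are all
    hypothetical.

```python
from collections import defaultdict


def build_index(database):
    """Group shape names by the number of rows (parts) of their matrix."""
    index = defaultdict(list)
    for shape_id, feature_matrix in database.items():
        index[len(feature_matrix)].append(shape_id)
    return index


def candidates(index, query_matrix, tolerance=1):
    """Return shapes whose part count differs by at most `tolerance`.

    `tolerance` is a hypothetical tuning parameter; only the returned
    candidates would then be passed to the full part-based matching.
    """
    n = len(query_matrix)
    result = []
    for k in range(max(0, n - tolerance), n + tolerance + 1):
        result.extend(index.get(k, []))
    return result
```

    A larger tolerance keeps more candidates and therefore trades speed
    for robustness to partitioning errors.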

    We also believe that for a CBIR system the retrieval time is the most
    relevant figure, because this is the time the user has to wait for
    results. Moreover, feature extraction needs to be done only once,
    whereas the retrieval delay occurs with every query.

    5.3 Comparison to other part-based methods

    We compared our system to the Shape Tokens, Latecki NL and
    skeleton-based methods, which were briefly described in chapter 1. To
    obtain the Bull's Eye scores we used http://give-lab.cs.uu.nl/sidestep.
    The authors of this website have reimplemented many popular shape-based
    algorithms, thus we assume that the reported scores are correct.
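
    For reference, the Bull's Eye score itself can be computed from a full
    distance matrix as sketched below. The sketch follows the protocol as
    it is commonly described for the MPEG-7 shape benchmark (for every
    query, count the shapes of the same class among the top 2 x class size
    results, the query included). The function name and the input layout
    are assumptions, and this is not the code of the cited website.

```python
def bulls_eye_score(distances, labels, class_size):
    """Fraction of same-class shapes found among the top 2 * class_size.

    distances[q][j] is the distance from query q to shape j and
    labels[j] is the class of shape j. Every shape is used as a query.
    """
    n = len(labels)
    hits = 0
    for q in range(n):
        ranked = sorted(range(n), key=lambda j: distances[q][j])
        top = ranked[: 2 * class_size]
        hits += sum(1 for j in top if labels[j] == labels[q])
    return hits / (n * class_size)
```

    With 20 shapes per class, as in the MPEG-7 dataset, the 40 best
    matches of each query are inspected and a perfect method scores 1.0.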

    5.3.1 Shape tokens

    As mentioned before, this method is not rotation-invariant. It is also
    not robust to occlusions, since sufficiently large protrusions caused by
    noise can create extra inflection points and thus split shape tokens.
    Our method would simply regard such protrusions as new parts and remove
    them in the SplitShape method.


    5.3.2 Skeletons

    The main problems of skeleton-based matching are sensitivity to noise,
    high computational complexity and sometimes unintuitive partitioning
    (see chapter 1 for details). The advantage is robustness to
    articulations. Usually shock graphs need to be constructed and matched,
    which is very time consuming. The Bull's Eye percentage reported by the
    aforementioned website is 68%.

    5.3.3 Latecki NL

    This method is more accurate than the previous two, with a Bull's Eye
    score of 72%. However, the price paid is O(n^3 log(n)) computational
    complexity during the matching.


    [Figure 5.3 image omitted. Panel labels in rank order, with distance d
    to the query: device7-10 (0), device2-05 (1.71), device7-17 (1.73),
    device7-15 (1.83), device7-02 (1.88), device7-04 (1.9),
    device7-06 (2.23), device7-11 (2.45), device7-16 (2.54),
    device7-03 (2.55), device2-13 (2.58), device7-09 (2.64),
    device2-19 (2.66), device7-13 (2.66), device1-14 (2.72),
    octopus-18 (2.73), device2-17 (2.84), device7-19 (2.84),
    device7-01 (2.86), device7-05 (2.87).]

    Figure 5.3: Twenty most similar images to device7-10 found by our
    method. Matched parts are displayed in the same color as the
    corresponding query parts. Parts for which no correspondence was found
    are painted black.


    [Figure 5.4 image omitted. Panel labels in rank order, with distance d
    to the query: device7-10 (0), device2-05 (0.55), octopus-18 (1.31),
    device3-17 (1.4), device3-04 (1.42), device3-12 (1.42), hat-04 (1.44),
    device3-19 (1.48), device3-02 (1.53), device3-09 (1.54),
    device1-07 (1.6), device3-07 (1.6), device0-12 (1.63),
    device1-15 (1.63), device1-16 (1.65), device1-11 (1.66),
    device3-10 (1.66), device1-14 (1.66), device3-03 (1.66),
    device1-06 (1.67).]

    Figure 5.4: Twenty most similar images found by the CFD method. Images
    are displayed as silhouettes because this method doesn't compute any
    parts.


    [Figure 5.5 image omitted. Panel labels in rank order, with distance d
    to the query: ray-11 (0), ray-12 (0.98), ray-15 (1.55), ray-09 (1.56),
    ray-16 (1.57), ray-06 (1.75), ray-10 (1.78), ray-07 (1.94),
    ray-08 (1.97), ray-04 (1.98), cattle-03 (1.98), ray-13 (2.02),
    ray-03 (2.04), cattle-15 (2.05), cattle-07 (2.06), elephant-06 (2.07),
    cattle-02 (2.11), elephant-09 (2.11), elephant-03 (2.12),
    cattle-10 (2.12).]

    Figure 5.5: Twenty most similar images to ray-11 found by our method.
    Matched parts are displayed in the same color as the corresponding
    query parts. Parts for which no correspondence was found are painted
    black.


    [Figure 5.6 image omitted. Panel labels in rank order, with distance d
    to the query: ray-11 (0), ray-12 (0.6), ray-15 (1.04), ray-16 (1.05),
    butterfly-20 (1.32), butterfly-17 (1.4), ray-06 (1.46),
    elephant-09 (1.46), cattle-01 (1.46), ray-07 (1.48),
    elephant-06 (1.49), ray-19 (1.49), butterfly-14 (1.55),
    cattle-20 (1.56), horse-09 (1.57), horse-05 (1.57), camel-16 (1.59),
    ray-09 (1.63), horse-10 (1.64), deer-19 (1.64).]

    Figure 5.6: Twenty most similar images to ray-11 found by the CFD
    method.


    [Figure 5.7 image omitted. The panels show the partitioned shapes
    device6-01 to device6-20.]

    Figure 5.7: Inconsistent partitioning makes it difficult to match
    shapes.


    Chapter 6

    Conclusions

    As expected, the developed system performs best on shapes having clear

    part structure (such as device1 or device7). It can distinguish between 5-,

    8- and 10-pointed stars even in the presence of noise or articulations.

    Also, because of partial matching, it significantly outperforms global de-

    scriptors when dealing with partially occluded shapes (e.g. the classes

    ray, apple and octopus of the MPEG-7 Shape-1 dataset).

    Problems may arise when dealing with shapes prone to unstable
    partitioning. Whenever a shape is split incorrectly (i.e. in a different
    way than the other members of the same class), the resulting shape
    distance is large. This explains why the retrieval rate is so low for
    the classes fly and dog.

    To overcome this flaw, the developed system was extended to combine
    both part-based and global descriptors. The shape distance is computed
    as a weighted sum of the global and part-based distances. Whenever a
    shape is prone to unstable partitioning (which is most often the case
    with high solidity shapes), the algorithm gives the part-based distance
    less weight. Thus, the system tries to perform at least as well as the
    associated global descriptor.
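
    The weighting idea can be sketched as follows. This is an illustrative
    Python sketch only: the linear ramp and the two solidity thresholds are
    hypothetical choices made for the example and are not the weighting
    function used in the implemented system.

```python
def part_weight(solidity, low=0.5, high=0.9):
    """Weight of the part-based distance as a function of solidity.

    Hypothetical rule: full weight up to `low`, zero weight from `high`
    on, and a linear ramp in between. Both thresholds are assumptions.
    """
    if solidity <= low:
        return 1.0
    if solidity >= high:
        return 0.0
    return (high - solidity) / (high - low)


def combined_distance(d_global, d_parts, solidity):
    """Weighted sum of the global and the part-based shape distance."""
    w = part_weight(solidity)
    return w * d_parts + (1.0 - w) * d_global
```

    For a nearly convex shape (solidity close to 1) the result falls back
    to the global distance, so an unstable partitioning cannot make the
    ranking worse than that of the global descriptor alone.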


    Chapter 7

    Future Work

    To improve the developed system one could implement the following ex-

    tensions:

    - Take relative positions of parts into account. Define features such
      as part orientation and describe the position of each part in polar
      coordinates (distance from the shape centroid and polar angle).

    - Merge part chains. Bent shapes (such as sea_snake) are currently
      split into many convex segments, although from a topological point
      of view such shapes have no part structure. Curved parts can be
      described by their solidity or bending energy. This would make it
      possible to detect the similarity between bone and broken_bone.

    - To make matching more robust to partitioning errors, allow several
      parts to be matched to one part, or even N to M parts at once. Merge
      or split parts at runtime to better match the query shape. This
      extension would be relatively easy to implement because the algorithm
      can already save the computed parts in the database.

    - Compute several representations for each shape (the most probable
      partitions). The shape distance is then the smallest distance over
      all pairs of such partitions. (This would, of course, slow down both
      the feature extraction and the retrieval process, on the order of
      O(n^2).)

    - Use a more powerful global descriptor or a combination thereof, for
      example the multiscale Fourier Descriptor [5].

    - Use envelope detection or just the convex hull before extracting
      shape boundaries. This would make it possible to correctly classify
      shapes such as the distorted pentagons, triangles and squares from
      the MPEG-7 Shape-1 dataset.
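
    As an illustration of the first extension in the list above, the
    position of each part relative to the shape centroid could be computed
    as in the following Python sketch. It is not part of the implemented
    system. The function names are hypothetical, and the centroid is
    approximated by the mean of the boundary points, which is an
    assumption.

```python
import math


def centroid(points):
    """Mean of a list of (x, y) points (approximates the centroid)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def part_polar_positions(shape_points, parts):
    """Polar position (r, theta) of each part w.r.t. the shape centroid.

    r is the distance from the shape centroid to the part centroid and
    theta is the polar angle in radians, in the range [-pi, pi].
    """
    cx, cy = centroid(shape_points)
    positions = []
    for part in parts:
        px, py = centroid(part)
        r = math.hypot(px - cx, py - cy)
        theta = math.atan2(py - cy, px - cx)
        positions.append((r, theta))
    return positions
```

    To keep such features invariant to scale and rotation, r would still
    have to be normalised (for example by the shape size) and theta would
    have to be measured relative to a reference direction of the shape.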


    Contributions of this project thesis

    - Design of a utility function that combines several cognitive
      principles of shape partitioning,

    - Algorithm to find perceptually salient curvature extrema,

    - Algorithm to check whether a cutting segment lies inside a polygon,

    - Shape splitting and merging algorithms,

    - Combining of part-based and global similarity measures,

    - Image database retrieval architecture.


    Appendix

    Results that turned out less useful

    The original project title was "Shape representation and matching using geometric prim-

    itives (geons)". The main idea was to decompose a given 2-D binary shape into gener-

    alized rectangles and ellipses and represent the shape as a directed graph or encode as a

    number.

    However, we found that this approach is only applicable to man-made
    objects that have a clear geometric structure. In contrast, most natural
    objects cannot be reliably represented by simple geometric figures.
    Thus, this decomposition scheme is not robust and hence inappropriate
    for part-based matching.

    Therefore, instead of describing parts in parametric form, we decided to extract global

    features from each part. This means that in the new approach the parts can have arbitrary

    form.


    Bibliography

    [1] N. Alajlan. Multi-Object Shape Retrieval Using Curvature Trees. PhD
    thesis, University of Waterloo, Canada, 2006.

    [2] S. Berretti, A. Del Bimbo, and P. Pala. Retrieval by shape using
    multidimensional indexing structures. In ICIAP, pages 945-950, 1999.

    [3] D. D. Hoffman and W. A. Richards. Parts of recognition. In T. F.
    Shipley and P. J. Kellman, editors, Cognition, chapter 18, pages 65-96.
    Elsevier Science, 1984.

    [4] D. D. Hoffman and M. Singh. Salience of visual parts. Cognition, 63,
    pages 29-78, 1997.

    [5] I. Kunttu, L. Lepisto, J. Rauhamaa, and A. Visa. Multiscale fourier
    descriptor for shape classification. In ICIAP '03: Proceedings of the
    12th International Conference on Image Analysis and Processing, pages
    536-541, Washington, DC, USA, 2003. IEEE Computer Society.

    [6] L. J. Latecki and R. Lakamper. Convexity rule for shape
    decomposition based on discrete contour evolution. Computer Vision
    Image Understanding, 73(3):441-454, 1999.

    [7] L. J. Latecki and R. Lakamper. Shape similarity measure based on
    correspondence of visual parts. IEEE Transactions on Pattern Analysis
    and Machine Intelligence, 22(10):1185-1190, 2000.

    [8] L. J. Latecki, R. Lakamper, and U. Eckhardt. Shape descriptors for
    non-rigid shapes with a single closed contour. In IEEE Conf. on
    Computer Vision and Pattern Recognition (CVPR), pages 424-429, 2000.

    [9] L. J. Latecki, R. Lakamper, and D. Wolter. Optimal partial shape
    similarity. Image and Vision Computing, 23(2):227-236, 2005.


    [10] E. Petrakis, A. Diplaros, and E. Milios. Matching and retrieval of
    distorted and occluded shapes using dynamic programming, 2002.

    [11] P. L. Rosin. Shape partitioning by convexity. BMVC99, pages
    633-642, 1999.

    [12] Y. Rui and T. S. Huang. Image retrieval: Current techniques,
    promising directions, and open issues. Journal of Visual Communication
    and Image Representation 10, pages 39-62, 1999.

    [13] M. Safar, C. Shahabi, and X. Sun. Image retrieval by shape: A
    comparative study. Technical Report 1, University of Southern
    California, 1999.

    [14] K. Siddiqi and B. B. Kimia. Parts of visual form: Computational
    aspects. IEEE Transactions on Pattern Analysis and Machine
    Intelligence, 17(3):239-251, 1995.

    [15] M. Singh and D. D. Hoffman. From Fragments to Objects: Grouping
    and Segmentation in Vision, chapter 9, pages 401-459. Elsevier Science,
    2001.

    [16] M. Singh, G. Seyranian, and D. Hoffman. Parsing silhouettes: The
    short-cut rule. Perception and Psychophysics, 61, pages 636-660, 1999.

    [17] M. B. Stegmann and D. D. Gomez. A brief introduction to statistical
    shape analysis. Technical report, Technical University of Denmark, 2002.

    [18] M. Tanase-Avatavului. Shape Decomposition and Retrieval. PhD
    thesis, Utrecht University, Holland, 2005.

    [19] R. S. Torres and A. X. Falcao. Content-based image retrieval:
    Theory and applications. Revista de Informatica Teorica e Aplicada,
    13(2):161-185, 2006.

    [20] R. C. Veltkamp and L. J. Latecki. Properties and performances of
    shape similarity measures. In Tim Crawford and Remco C. Veltkamp,
    editors, Content-Based Retrieval, Dagstuhl Seminar Proceedings. IBFI,
    Schloss Dagstuhl, Germany, 2006.

    [21] D. Zhang and G. Lu. A comparative study on shape retrieval using
    fourier descriptors with different shape signatures, 2001.

    [22] D. Zhang and G. Lu. Review of shape representation and description
    techniques. Pattern Recognition, 37(1):1-19, 2004.