Locally Affine Invariant Descriptors for Shape Matching and Retrieval

IEEE SIGNAL PROCESSING LETTERS, VOL. 17, NO. 9, SEPTEMBER 2010 803


Zhaozhong Wang, Member, IEEE, and Min Liang

Abstract—This work proposes novel locally affine invariant descriptors for shape representation. The descriptors are theoretically simple and solid, derived from matrix theory. They can be used for matching and retrieval of shapes under affine transformation, articulated motion, or nonrigid deformation. Comparisons with state-of-the-art shape descriptors are performed on a synthetic data set and several well-known databases. The experiments validate that the proposed descriptors achieve higher retrieval accuracy and run faster than most other approaches.

Index Terms—Affine invariance, shape descriptor, shape matching, shape retrieval.

I. INTRODUCTION

SHAPE matching and retrieval have long been key problems in image processing and pattern recognition. Applications of these techniques cover fields such as industry, business, and entertainment [16]. This work proposes novel shape descriptors for shape matching and retrieval. The descriptors are derived from the simple but rigorous theory of matrices; thus they are computationally efficient and mathematically solid. A descriptor for a point on a shape contour is a vector with locally affine-invariant and permutation-invariant properties. It captures the invariance of local shapes under varying transformations such as rotation, shearing, anisotropic scaling, articulated motion, and nonrigid deformation.

Many different approaches have been proposed to represent shape features. One of the most well-researched boundary-based shape descriptions is the curvature scale space (CSS) [14], which uses the number of points with zero-crossing curvature at different levels of smoothed contours. The shape tree [7] is another hierarchical description of an object's boundary, which captures the geometric arrangement of points at different subsampling levels. One more hierarchical descriptor [1] uses the curvature tree (CT) and triangle-area representation (TAR) to model both the shape and topology of image objects. In [9] the authors developed a descriptor called distance sets: each feature point along a contour has a data structure containing its distances to surrounding points. The shape contexts (SC) [4] offer global characterizations of shapes, depicting each point by the distributions of the remaining points

Manuscript received April 12, 2010; revised June 22, 2010; accepted June 29, 2010. Date of publication July 12, 2010; date of current version July 22, 2010. This work was supported by the National Natural Science Foundation of China under Grant 60803071. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Dimitrios Tzovaras.

The authors are with the Image Processing Center, Beihang University, Beijing 100191, China (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/LSP.2010.2057506

relative to it. Based on the SC, the work [5] defines a symbolic descriptor that transforms the contour of each shape into a symbolic string and uses edit distances to measure the similarity. The authors of [12] proposed a descriptor called the inner-distance shape context (IDSC), which is built on the inner distance, defined as the length of the shortest path between landmark points within a shape silhouette. This method shows an advantage in capturing shapes with articulated parts. Shape transformations have also been considered in shape representations. For example, the feature-driven generative model [15] for representing the relationship between shapes allows for transformations such as affine and nonrigid ones. The work [3] and the references therein survey more affine-invariant shape description methods.

Besides these shape descriptions and representations, researchers have also proposed fine matching methods to enhance the performance of shape retrieval. A hierarchical segment-based matching method [13] that proceeds in a global-to-local direction has been proposed. The authors of [2] proposed to replace the original distances between two shapes with distances induced by geodesic paths in the shape manifold. The locally constrained diffusion process (LCDP) [18] takes into account the beneficial influence of other shapes on the similarity measure of each pair of shapes, and uses a diffusion process to propagate the influence.

In this work we use a matching method similar to that used for the SC and IDSC, but the proposed descriptors can also be embedded in other matching frameworks. We present the basic theory in Section II, then design the matching and retrieval algorithm in Section III. Section IV presents experimental results. The last section draws our conclusion.

II. INVARIANT THEORY AND SHAPE DESCRIPTORS

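The fact that shape invariance under simple transformations such as scaling can be achieved by point normalization motivates us to explore general shape invariance under affine transformation. We consider $n$ points in the 2-D Euclidean space; these points may be located on a shape contour. The shape configuration matrix [6] is given as

$$X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]^T \in \mathbb{R}^{n \times 2} \tag{1}$$

where each row vector $\mathbf{x}_i^T$ denotes the Cartesian coordinates of the $i$th point. Assume that $n > 2$ and the matrix $X$ is of full rank. To handle the complete affine transformations in a compact form, we introduce the homogeneous coordinate $[\mathbf{x}_i^T, 1]$ and define the augmented configuration matrix

$$\tilde{X} = [X, \mathbf{1}] \in \mathbb{R}^{n \times 3} \tag{2}$$

where $\mathbf{1} = [1, 1, \ldots, 1]^T \in \mathbb{R}^n$. Then we define

$$T = \begin{bmatrix} A & \mathbf{0} \\ \mathbf{t}^T & 1 \end{bmatrix} \tag{3}$$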

1070-9908/$26.00 © 2010 IEEE


as the full affine matrix, where $\mathbf{t} \in \mathbb{R}^2$ is the translation vector and the nonsingular matrix $A \in \mathbb{R}^{2 \times 2}$ may be called the central affine matrix, which encodes the transformations of rotation, (isotropic and anisotropic) scaling, skewing, reflection, etc. Any affine transform of the points in $X$ can be realized by the matrix product $\tilde{X} T$. Thus we assume in general that an augmented configuration matrix $\tilde{Y}$ is obtained from $\tilde{X}$ by some affine transformation and permutation as

$$\tilde{Y} = P \tilde{X} T \tag{4}$$

where $P \in \mathbb{R}^{n \times n}$ is a permutation matrix, which allows the point sets to be ordered arbitrarily. Our previous work [17] has proven that if $Q_X, Q_Y \in \mathbb{R}^{n \times n}$ are the orthogonal projection matrices onto the column spaces of the matrices $\tilde{X}$ and $\tilde{Y}$, respectively, then (4) implies that

$$Q_Y = P Q_X P^T \tag{5}$$

This equation indicates that the two matrices $Q_X$ and $Q_Y$ are permutation similar under any affine transformation.
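The step from (4) to (5) can be checked directly. The derivation below is a sketch, not a reproduction of the proof in [17]. It assumes the symbol names $\tilde{X}$, $\tilde{Y}$, $T$, $P$, $Q_X$, $Q_Y$ for the augmented configurations, the full affine matrix, the permutation matrix, and the two projectors. It uses the standard formula for the orthogonal projector onto the column space of a full-column-rank matrix, together with $P^T P = I$ and the invertibility of $T$ (which follows from $A$ being nonsingular).

```latex
\begin{aligned}
Q_X &= \tilde{X}\,(\tilde{X}^T \tilde{X})^{-1}\,\tilde{X}^T, \\
Q_Y &= \tilde{Y}\,(\tilde{Y}^T \tilde{Y})^{-1}\,\tilde{Y}^T
     = P\tilde{X}T\,\bigl(T^T \tilde{X}^T P^T P \tilde{X} T\bigr)^{-1}\,T^T \tilde{X}^T P^T \\
    &= P\tilde{X}\,T\,T^{-1}\,(\tilde{X}^T \tilde{X})^{-1}\,T^{-T}\,T^T\,\tilde{X}^T P^T
     = P\,Q_X\,P^T .
\end{aligned}
```

The affine matrix $T$ cancels completely, which is why only the permutation $P$ survives in (5).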

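It is known that a permutation similarity of matrices reorders their diagonal entries [10], i.e., (5) implies

$$\mathrm{diag}(Q_Y) = P\,\mathrm{diag}(Q_X)$$

where $\mathrm{diag}(Q_X), \mathrm{diag}(Q_Y) \in \mathbb{R}^n$ denote the vectors of diagonal elements of $Q_X$ and $Q_Y$. The two vectors contain the same elements but in different orders. To remove the order/permutation effect of $P$, we sort the elements of $\mathrm{diag}(Q_X)$ and $\mathrm{diag}(Q_Y)$ in, for example, descending order as

$$\mathbf{d} = P_1\,\mathrm{diag}(Q_X) = P_2\,\mathrm{diag}(Q_Y) \tag{6}$$

where $P_1$ and $P_2$ are two permutation matrices that reorder the diagonal entries of $Q_X$ and $Q_Y$, and $\mathbf{d} = [d_1, d_2, \ldots, d_n]^T$ with $d_1 \geq d_2 \geq \cdots \geq d_n$ is a vector with ordered entries. The vector $\mathbf{d}$ is the same for the two matrices $\tilde{X}$ and $\tilde{Y}$. In other words, the vector $\mathbf{d}$ is unique under any affine and permutation transformations of the point set $X$. Hence it can be used as a descriptor to identify the point set under such transformations.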
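The construction above can be sketched numerically. The following NumPy example is only an illustration under stated assumptions: the function names `projection_diagonal` and `augmented_descriptor` are hypothetical, the projector is obtained through a thin QR factorization (one of several equivalent ways to compute it), and the test points, affine matrix, and translation are arbitrary values chosen for the demonstration.

```python
import numpy as np

def projection_diagonal(M):
    """Diagonal of the orthogonal projector onto the column space of M.

    Assumes M has full column rank. A thin QR factorization gives an
    orthonormal basis U of col(M); the projector is U U^T, so its diagonal
    is the squared norm of each row of U.
    """
    U, _ = np.linalg.qr(M)
    return np.sum(U * U, axis=1)

def augmented_descriptor(X):
    """Projector diagonal of the augmented matrix [X, 1], sorted descending."""
    X = np.asarray(X, dtype=float)
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.sort(projection_diagonal(X_aug))[::-1]

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))               # 12 arbitrary planar points
A = np.array([[1.5, 0.7], [-0.3, 0.9]])    # nonsingular (det = 1.56)
t = np.array([4.0, -2.0])                  # translation
perm = rng.permutation(len(X))             # arbitrary reordering of the points

Y = (X @ A + t)[perm]                      # affine transform, then permutation

d_X = augmented_descriptor(X)
d_Y = augmented_descriptor(Y)
max_gap = np.max(np.abs(d_X - d_Y))        # expected to be at rounding level
print("largest descriptor difference:", max_gap)
```

Because the projector has rank 3, the entries of the descriptor sum to 3, which gives a quick sanity check on any implementation.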

More descriptors can be derived from the above idea. Define

$$X_c = C X, \quad C = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T \tag{7}$$

as the centerized configuration matrix, where $C$ is called the centering matrix [6] and $I$ is the $n \times n$ identity matrix. We then have the following.

Proposition 1: If two augmented configuration matrices $\tilde{X}$ and $\tilde{Y}$ satisfy (4), then the corresponding centerized configuration matrices $X_c$ and $Y_c$ satisfy

$$Y_c = P X_c A \tag{8}$$

where $A$ is the central affine matrix.

The proof is given in the Appendix. Similar to (4), (8) implies the relation

$$Q_{Y_c} = P Q_{X_c} P^T \tag{9}$$

where $Q_{X_c}$ and $Q_{Y_c}$ are the corresponding orthogonal projection matrices of $X_c$ and $Y_c$. We can use the ordered diagonal vectors of these matrices, denoted $\mathbf{d}_c$, as shape descriptors as well.
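Proposition 1 and the resulting invariance of the centered descriptor can be checked numerically. The sketch below is an illustration only: `centered_descriptor` is a hypothetical helper name, centering is done by subtracting the column means (equivalent to multiplying by the centering matrix), and the affine matrix, translation, and random points are arbitrary test values. The chosen affine matrix has a negative determinant, so the example also covers a reflection.

```python
import numpy as np

def centered_descriptor(X):
    """Projector diagonal of the centered configuration, sorted descending."""
    X = np.asarray(X, dtype=float)
    X_c = X - X.mean(axis=0)        # same as C @ X with C = I - (1/n) 1 1^T
    U, _ = np.linalg.qr(X_c)        # orthonormal basis of col(X_c), assuming rank 2
    return np.sort(np.sum(U * U, axis=1))[::-1]

rng = np.random.default_rng(1)
n = 15
X = rng.normal(size=(n, 2))
A = np.array([[2.0, 1.0], [0.5, -1.0]])   # nonsingular, det = -2.5 (includes a reflection)
t = np.array([-3.0, 7.0])
perm = rng.permutation(n)
P = np.eye(n)[perm]                        # permutation matrix: (P @ M)[i] = M[perm[i]]

Y = P @ (X @ A + t)                        # transformed and reordered points
X_c = X - X.mean(axis=0)
Y_c = Y - Y.mean(axis=0)

# Proposition 1: the translation drops out after centering.
prop1_residual = np.max(np.abs(Y_c - P @ X_c @ A))
# The sorted projector diagonals agree for the two point sets.
descriptor_gap = np.max(np.abs(centered_descriptor(X) - centered_descriptor(Y)))
print(prop1_residual, descriptor_gap)
```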

Fig. 1. Affine invariant descriptors $\mathbf{d}_c$ of different shapes, where $n = 200$ points on each shape contour are sampled. Left: Descriptors of the six shapes in the first column of Fig. 3. Right: Descriptors of the first six shapes of quadrangles in the second row of Fig. 3.

Which descriptor is better? One criterion is robustness: a shape descriptor should be robust to shape perturbations. It is known that perturbation bounds in computing orthogonal projection matrices depend on the condition numbers of the configuration matrices [8]: smaller condition numbers result in lower perturbation bounds. The centerized configuration matrix $X_c$ generally has a smaller condition number than the augmented configuration matrix $\tilde{X}$, so we prefer to use the vector $\mathbf{d}_c$, formed from the ordered diagonal entries of the projection matrix of $X_c$, to describe the shape $X$.
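The condition-number argument can be illustrated with a small experiment. The following sketch is one example under stated assumptions, not a general proof: the shape is an ellipse sampled at 50 points and placed far from the origin (an arbitrary choice that makes the effect visible), and `numpy.linalg.cond` is used with its default 2-norm.

```python
import numpy as np

# An ellipse with semi-axes 3 and 1, centered at (100, 50): far from the origin.
theta = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
X = np.column_stack([100.0 + 3.0 * np.cos(theta), 50.0 + np.sin(theta)])

X_aug = np.hstack([X, np.ones((len(X), 1))])   # augmented configuration [X, 1]
X_cen = X - X.mean(axis=0)                     # centered configuration C X

cond_aug = np.linalg.cond(X_aug)   # large: the columns of [X, 1] are nearly dependent
cond_cen = np.linalg.cond(X_cen)   # equals the axis ratio 3 for this sampling
print("augmented:", cond_aug, "centered:", cond_cen)
```

In this example the centered matrix has condition number 3 (the ratio of the semi-axes), while the augmented matrix is far worse conditioned because the offset makes its columns nearly linearly dependent.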

The proposed descriptors can be viewed as a special kind of normalization for the points in $X$, which removes the effect of affine transformation while maintaining the inherent shape features. Fig. 1 illustrates the shape descriptors $\mathbf{d}_c$. The left plot of Fig. 1 compares the descriptors of different shapes, showing that the proposed descriptors are discriminative. The right plot compares the descriptors of affine-transformed shapes, illustrating the invariance of the descriptors.

III. THE MATCHING AND RETRIEVAL ALGORITHM

In real-world problems, shape distortions caused by perspective projection, nonrigid deformation, or articulated motion can be approximated by locally affine transformations. In this section we design a shape matching and retrieval scheme based on locally affine invariant descriptors. Assume that a shape contour consists of $N$ points in total. For each point $p_i$ on the contour we pick $n$ points in its neighborhood to form a local configuration $X_i \in \mathbb{R}^{n \times 2}$. Then we compute the centerized configuration $X_{c,i}$ and its orthogonal projection matrix $Q_i$. We may use the ordered diagonal vector $\mathbf{d}(p_i)$ of $Q_i$ as the locally affine invariant descriptor of $p_i$; see Fig. 2 for illustration.
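The local descriptor computation described above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: the contour is closed and ordered, the neighborhood of a point is taken as consecutive contour indices centered on it (with wrap-around), and the projector is computed from a thin SVD with a relative rank tolerance so that nearly collinear patches do not produce spurious directions. The name `local_descriptors` and the tolerance value are hypothetical.

```python
import numpy as np

def local_descriptors(contour, n_neighbors=25, tol=1e-10):
    """Sorted projector diagonals of the centered local configurations.

    contour: (N, 2) array of ordered points on a closed contour.
    Returns an (N, n_neighbors) array; row i is the descriptor of point i.
    """
    contour = np.asarray(contour, dtype=float)
    N = len(contour)
    half = n_neighbors // 2
    offsets = np.arange(-half, n_neighbors - half)   # window of n_neighbors indices
    D = np.empty((N, n_neighbors))
    for i in range(N):
        local = contour[(i + offsets) % N]           # wrap around the closed contour
        local_c = local - local.mean(axis=0)         # centered local configuration
        U, s, _ = np.linalg.svd(local_c, full_matrices=False)
        rank = int(np.sum(s > tol * s[0]))           # numerical rank (at most 2)
        U = U[:, :rank]
        D[i] = np.sort(np.sum(U * U, axis=1))[::-1]  # ordered projector diagonal
    return D

# Example: an ellipse and an affinely transformed copy with the same point order.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
contour = np.column_stack([3.0 * np.cos(theta), np.sin(theta)])
A = np.array([[1.0, 0.8], [0.0, 1.2]])               # shear plus anisotropic scaling
D = local_descriptors(contour)
D_affine = local_descriptors(contour @ A + np.array([5.0, -1.0]))
max_gap = np.max(np.abs(D - D_affine))
print(D.shape, max_gap)
```

Each row sums to 2 whenever the local patch is not collinear, since the projector of a centered planar configuration has rank 2.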

For a point $p_i$ on the first shape contour and a point $q_j$ on the second one, we take $n$ points in each of their neighborhoods to form the descriptors $\mathbf{d}(p_i)$ and $\mathbf{d}(q_j)$. Let $c(p_i, q_j)$ denote the cost of matching these two points. The cost can be computed simply as the sum of absolute differences:

$$c(p_i, q_j) = \sum_{k=1}^{n} \left| d_k(p_i) - d_k(q_j) \right| \tag{10}$$

This is much simpler than the $\chi^2$ statistic used in the matching based on the SC and IDSC.
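The cost in (10) is an L1 distance between descriptor vectors, so the full cost table between two contours can be built with one broadcasted operation. The sketch below assumes each contour's descriptors are stored as rows of an array; the function name `cost_matrix` and the small numeric values are made up for illustration.

```python
import numpy as np

def cost_matrix(D1, D2):
    """Pairwise sum of absolute differences between descriptor rows.

    D1: (N1, n) descriptors of the first contour.
    D2: (N2, n) descriptors of the second contour.
    Returns an (N1, N2) array whose (i, j) entry is the cost of matching
    point i of the first contour with point j of the second.
    """
    D1 = np.asarray(D1, dtype=float)
    D2 = np.asarray(D2, dtype=float)
    return np.abs(D1[:, None, :] - D2[None, :, :]).sum(axis=2)

# Toy descriptors with n = 3 entries each (illustrative values only).
D1 = np.array([[0.5, 0.3, 0.2],
               [0.6, 0.3, 0.1]])
D2 = np.array([[0.5, 0.3, 0.2],
               [0.4, 0.4, 0.2],
               [0.6, 0.3, 0.1]])
C = cost_matrix(D1, D2)
print(C)
```

Identical descriptors give zero cost, so in this toy example the first point of `D1` matches the first point of `D2` and the second matches the third at no cost.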

Fig. 2. Locally affine invariant descriptors for points on shape contours. The two shapes in the middle are the same pair of scissors under articulated motion, obtained from the database in Fig. 4. The curves on the left and right sides are the descriptors of the corresponding red square spots on the shape contours. Each contour contains $N$ points in total, but only the $n = 25$ points neighboring the red square spots are used to form the local descriptors.

Given the set of costs $c(p_i, q_j)$ between all pairs of points $p_i$ and $q_j$, we minimize the total cost using the bipartite graph method [4] to achieve a matching of the shapes. If ordering information of the contour points is available, we use dynamic programming, as in [12], to speed up the matching. There may exist contour points that do not satisfy the locally affine transformations. We treat these points as outliers, and set a constant matching cost $\tau$ as the threshold for outlier detection. This approach also allows the numbers of points on the two shape contours to differ, and has been successfully used in the SC and IDSC matching.
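An order-preserving matching with an outlier penalty can be sketched with a standard alignment recursion. The code below is an assumption-laden illustration, not the exact procedure of [12]: it charges a constant penalty `tau` for every point left unmatched on either contour, keeps the contour order fixed, and does not search over cyclic starting points. The function name `dp_match` and the toy cost values are hypothetical.

```python
import numpy as np

def dp_match(C, tau):
    """Order-preserving matching of two point sequences with skip penalty tau.

    C: (N1, N2) matrix of point-to-point matching costs.
    Returns (pairs, total), where pairs lists matched index pairs (i, j) in
    increasing order and total is the minimal cost including skip penalties.
    """
    n1, n2 = C.shape
    T = np.zeros((n1 + 1, n2 + 1))
    T[1:, 0] = tau * np.arange(1, n1 + 1)     # all leading points of shape 1 skipped
    T[0, 1:] = tau * np.arange(1, n2 + 1)     # all leading points of shape 2 skipped
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            T[i, j] = min(T[i - 1, j - 1] + C[i - 1, j - 1],  # match i-1 with j-1
                          T[i - 1, j] + tau,                  # skip point i-1
                          T[i, j - 1] + tau)                  # skip point j-1
    # Backtrack to recover the matched (inlier) pairs, preferring a match on ties.
    pairs = []
    i, j = n1, n2
    while i > 0 and j > 0:
        if np.isclose(T[i, j], T[i - 1, j - 1] + C[i - 1, j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(T[i, j], T[i - 1, j] + tau):
            i -= 1
        else:
            j -= 1
    return pairs[::-1], T[n1, n2]

# Toy cost matrix: the middle point of the second sequence has no good partner.
C = np.array([[0.1, 0.9, 0.8],
              [0.7, 0.9, 0.1]])
pairs, total = dp_match(C, tau=0.3)
print(pairs, total)
```

In this toy case the middle point of the second sequence is skipped as an outlier, which also shows how sequences of different lengths are handled. When no ordering is available, an assignment solver on the same cost matrix would play the role of the bipartite graph method [4].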

After achieving the matching of points between two shapes,we use the matching cost

(11)

to measure the similarity between them, where the mappingdenotes the computed permutation of the indices to form theinlier matching between and , and is the number ofinlier matchings. The number is determined by the threshold

, and in our experiments we usually set to maintain 60% ofcontour points as inliers, i.e., .

The proposed scheme has several theoretical merits. The descriptors are affine and permutation invariant; therefore it is unnecessary to compute the matching twice [12] for mirrored shapes, and unnecessary to establish local tangent coordinates [4] for rotated shapes. Typical descriptors such as the SC and IDSC are invariant only to translation and isotropic scaling, while our descriptors are locally invariant to the full set of affine transformations. Thus they can describe more complex shape deformations. Additionally, the descriptors can be used for partial shape similarity, since the descriptor for each point is formed using only its neighboring points.

IV. EXPERIMENTAL RESULTS

Fig. 3. Synthetic data set for affine-transformed shapes, including ellipses, quadrangles, five-pointed stars, triangles, bottle and heart shapes. Each row contains ten shapes of the same class.

Fig. 4. Articulated data set from [12], which includes 40 images from eight objects. Each column contains five images from the same object.

TABLE I
RETRIEVAL RESULTS FOR THE ARTICULATED SHAPES IN FIG. 4

We perform experiments on various data sets to test the proposed algorithm and compare it with state-of-the-art methods. We first use a synthetic data set of affine-transformed and perturbed shapes, which consists of 60 images from six classes (Fig. 3). The number of sample points is 200 for all the tests based on the SC, the IDSC, and our descriptors, and the matching method is dynamic programming in all cases. Since the affine transformations for each class of images in Fig. 3 are global, we use a “global” version of the proposed descriptors, i.e., all 200 contour points are used to form the descriptors; see Fig. 1. In the tests, each shape is presented as a test query and the top 20 matches (two times the number of shapes in each class) are counted. Then we measure the retrieval accuracy using the so-called Bullseye score, which is defined as the ratio of the number of correct matches over all images to the highest possible number of matches (10 × 60). Here the score of the SC is 64.50%, the IDSC 64.00%, and our approach 98.33%. This verifies the very high accuracy of the proposed descriptors for retrieval under affine transformations.
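The Bullseye score described above can be sketched as a short function over a precomputed distance matrix. This is an illustration under stated assumptions: the query itself is counted among its own top matches (which is consistent with a maximum of 10 × 60 for ten shapes per class), ties are broken by index order, and the function name `bullseye_score` and the toy data are hypothetical.

```python
import numpy as np

def bullseye_score(dist, labels, top_k):
    """Fraction of same-class shapes found among the top_k matches of each query.

    dist:   (M, M) matrix of pairwise shape dissimilarities.
    labels: length-M array of class labels.
    top_k:  number of best matches inspected per query.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    hits = 0
    for i in range(len(labels)):
        nearest = np.argsort(dist[i], kind="stable")[:top_k]  # includes the query itself
        hits += int(np.sum(labels[nearest] == labels[i]))
    # Best achievable count: every member of the query's class is retrieved.
    class_sizes = np.array([np.sum(labels == lab) for lab in labels])
    best = int(np.sum(np.minimum(class_sizes, top_k)))
    return hits / best

# Toy example: four "shapes" on a line, two classes of two, top 2 inspected.
positions = np.array([0.0, 1.0, 1.5, 6.0])
labels = np.array([0, 0, 1, 1])
dist = np.abs(positions[:, None] - positions[None, :])
score = bullseye_score(dist, labels, top_k=2)
print(score)   # 6 correct out of 8 possible
```

For the synthetic data set in the text the corresponding call would use `top_k=20` with ten shapes per class.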

Next we test the articulated data set [12], shown in Fig. 4. It is a challenging data set because of the similarities between different objects and the articulation. For our scheme, 25 points in the neighborhood of each contour point are used to form the local descriptor (see Fig. 2), and all other experimental conditions are the same as in [12]. The retrieval results are summarized as the numbers of the 1st, 2nd, 3rd, and 4th most similar matches that come from the correct object, as shown in Table I. The accuracy of the SC is relatively low. The IDSC, which is designed specifically for articulated shape matching, shows a considerable advantage. Our approach obtains a retrieval accuracy similar to that of the IDSC, but it is superior to the IDSC in this experiment since it consumes much less retrieval time; see Table III.

TABLE II
THE BULLSEYE RETRIEVAL RATES OF DIFFERENT METHODS FOR THE MPEG-7 CE-SHAPE-1 DATABASE

TABLE III
RETRIEVAL TIME OF EACH ALGORITHM FOR DIFFERENT DATA SETS

We also test the MPEG-7 CE-Shape-1 database [11], which is extensively used for assessing shape matching and retrieval methods. It has 1400 silhouette images of 70 types of objects, each having 20 different shapes. This database is very challenging since it contains both natural and artificial shapes and covers a variety of objects. For our method we use

, and use to form the local descriptors; the matching method is still dynamic programming. The Bullseye scores of different methods are shown in Table II. The score of our descriptor is higher than those of all other descriptors. In the table the LCDP [18] performs the best, but its contribution mainly concerns the matching scheme, and the descriptor it uses is still the IDSC. We expect that a powerful matching scheme combined with our descriptors would result in more accurate retrieval.

The computational loads of the different shape descriptors are also recorded. Table III lists the computing time of each method in Matlab for the different data sets. The “s” in the table stands for seconds and “h” for hours. The results show that our approach is less time-consuming than both the SC and IDSC. For the MPEG-7 database, the running time of the IDSC is for 100 sample points [12], while our method uses 200 sample points, as mentioned previously.

V. CONCLUSION

From the rigorous theory of matrices we derived locally affine invariant shape descriptors for matching and retrieval problems. After constructing the (augmented or centerized) configuration matrices from points on shape contours, the descriptors are formed from the ordered diagonal elements of the orthogonal projection matrices of the configurations. The descriptors are invariant to the full set of affine transformations and can approximate local shape distortions caused by nonrigid deformation or articulated motion. Experiments validate that the proposed descriptors are superior to state-of-the-art descriptors such as the SC and IDSC when both retrieval accuracy and speed are taken into account. We shall further study the combination of the descriptors with more efficient matching schemes such as the LCDP to achieve higher performance.

APPENDIX

PROOF OF PROPOSITION 1

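Using the definitions of the augmented configuration matrix and the full affine matrix, (4) becomes

$$[Y, \mathbf{1}] = P\,[X, \mathbf{1}] \begin{bmatrix} A & \mathbf{0} \\ \mathbf{t}^T & 1 \end{bmatrix}$$

It follows from this equation that

$$Y = P X A + \mathbf{1}\mathbf{t}^T \tag{12}$$

where we have used the relation $P\mathbf{1} = \mathbf{1}$. Multiplying both sides of (12) on the left by the centering matrix $C$ gives

$$C Y = C P X A + C \mathbf{1}\mathbf{t}^T$$

The left-hand side of this equation is equal to $Y_c$ since $Y_c = CY$, and for the right-hand side, $C\mathbf{1} = \mathbf{0}$ and $CP = PC$, so that $CPXA = PCXA = PX_cA$.

Therefore (8) holds.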

REFERENCES

[1] N. Alajlan, M. Kamel, and G. Freeman, “Geometry-based image retrieval in binary image databases,” IEEE Trans. Pattern Anal. Machine Intell., vol. 30, no. 6, pp. 1003–1013, 2008.

[2] X. Bai, X. Yang, L. J. Latecki, W. Liu, and Z. Tu, “Learning context sensitive shape similarity by graph transduction,” IEEE Trans. Pattern Anal. Machine Intell., vol. 32, no. 5, pp. 861–874, 2010.

[3] A. Bandera, R. Marfil, and E. Antunez, “Affine-invariant contours recognition using an incremental hybrid learning approach,” Pattern Recognit. Lett., vol. 30, pp. 1310–1320, 2009.

[4] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 509–522, 2002.

[5] M. R. Daliri and V. Torre, “Robust symbolic representation for shape recognition and retrieval,” Pattern Recognit., vol. 41, no. 5, pp. 1782–1798, 2008.

[6] I. L. Dryden and K. V. Mardia, Statistical Shape Analysis. New York: Wiley, 1998.

[7] P. F. Felzenszwalb and J. Schwartz, “Hierarchical matching of deformable shapes,” in IEEE Conf. Computer Vision and Pattern Recognition, 2007, pp. 1–8.

[8] G. H. Golub and C. F. van Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins Univ. Press, 1996.

[9] C. Grigorescu and N. Petkov, “Distance sets for shape filters and shape recognition,” IEEE Trans. Image Process., vol. 12, pp. 1274–1286, 2003.

[10] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.

[11] L. J. Latecki, R. Lakamper, and U. Eckhardt, “Shape descriptors for non-rigid shapes with a single closed contour,” in IEEE Conf. Computer Vision and Pattern Recognition, 2000, pp. 424–429.

[12] H. Ling and D. W. Jacobs, “Shape classification using the inner-distance,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 2, pp. 286–299, 2007.

[13] G. McNeill and S. Vijayakumar, “Hierarchical procrustes matching for shape retrieval,” in IEEE Conf. Computer Vision and Pattern Recognition, 2006, pp. 885–894.

[14] F. Mokhtarian, F. Abbasi, and J. Kittler, “Efficient and robust retrieval by shape content through curvature scale space,” in IEEE Conf. Computer Vision and Pattern Recognition, 2005, pp. 93–93.

[15] Z. Tu and A. L. Yuille, “Shape matching and recognition using generative models and informative features,” in Eur. Conf. Computer Vision, 2004, pp. 195–209.

[16] R. C. Veltkamp and M. Hagedoorn, Principles of Visual Information Retrieval. London, U.K.: Springer-Verlag, 2000.

[17] Z. Wang and H. Xiao, “Dimension-free affine shape matching through subspace invariance,” in IEEE Conf. Computer Vision and Pattern Recognition, 2009, pp. 2482–2487.

[18] X. Yang, S. K. Tezel, and L. J. Latecki, “Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval,” in IEEE Conf. Computer Vision and Pattern Recognition, 2009, pp. 357–364.
