*Corresponding author. Tel.: +61-8-9266-2110; fax: +61-8-9266-2819.
E-mail address: email@example.com (K. Shearer).
Pattern Recognition 34 (2001) 1075–1091
Video indexing and similarity retrieval by largest common subgraph detection using decision trees
Kim Shearer a,*, Horst Bunke b, Svetha Venkatesh a
a Department of Computer Science, Curtin University of Technology, GPO Box U1987, Perth 6001, WA, Australia
b Institut für Informatik und Angewandte Mathematik, Universität Bern, Switzerland
Received 14 May 1999; received in revised form 1 March 2000; accepted 1 March 2000
While the largest common subgraph (LCSG) between a query and a database of models can provide an elegant and intuitive measure of similarity for many applications, it is computationally expensive to compute. Recently developed algorithms for subgraph isomorphism detection take advantage of prior knowledge of a database of models to improve the speed of on-line matching. This paper presents a new algorithm based on similar principles to solve the largest common subgraph problem. The new algorithm significantly reduces the computational complexity of detecting the LCSG between a known database of models and a query given on-line. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Graph matching; Similarity retrieval; Video indexing; Decision tree
With the advent of large on-line databases of images and video, it is becoming increasingly important to address the issues of effective indexing and retrieval. The majority of existing techniques in the area of image and video indexing can be broadly categorised into low- and high-level techniques. Low-level techniques use attributes such as colour and texture measures to encode an image, and perform image similarity retrieval by comparing vectors of such attributes [1–6]. High-level techniques use semantic information about the meaning of the pictures to describe an image, and therefore require intensive manual annotation [7–14]. An additional method of image indexing is by qualitative spatial relationships between key objects. This form of index has been successfully applied to indexing and retrieval of image data [15–17]. Recent work by Shearer and Venkatesh [18,19] has extended the range of this representation to the encoding of video data. With the object information proposed for inclusion in the MPEG-4 standard, such indexes will be simpler to implement. It should be noted that none of these indexes provides a comprehensive retrieval method; rather, they may be considered complementary methods, each being one component of a retrieval toolkit.
When images or video are encoded using spatial information, searches for exact pictorial or subpicture matches may be performed using a compact encoding such as 2D-strings [20–22]. Similarity retrieval, however, is best expressed as an inexact isomorphism detection between graphs representing two images [23,24]. Images are encoded as graphs by representing each object in an image as a vertex in the graph, and placing an edge, labelled with the spatial relationship between the two corresponding objects, between each pair of vertices. Note that this is only one plausible representation of images using graphs. Other representations are possible, and in some cases will be preferable. Indeed, any relational structure represented as a graph could make use of the algorithms discussed in this paper.
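The encoding described above can be sketched in a few lines. This is only an illustration under stated assumptions: the object tuples, the function names `encode_image` and `x_relation`, and the three relation labels are hypothetical and stand in for whatever spatial relationships an index actually uses.

```python
from itertools import combinations


def x_relation(xa, xb):
    # Hypothetical 1-D relation on x-coordinates only (an assumption for
    # illustration); a real index would use richer spatial labels.
    if xa < xb:
        return "left-of"
    if xa > xb:
        return "right-of"
    return "aligned"


def encode_image(objects):
    """objects: dict vertex_id -> (label, x). Returns (vertex_labels, edge_labels)."""
    vertex_labels = {v: label for v, (label, _) in objects.items()}
    edge_labels = {}
    # One labelled edge between each pair of vertices, as in the text.
    for u, v in combinations(sorted(objects), 2):
        edge_labels[(u, v)] = x_relation(objects[u][1], objects[v][1])
    return vertex_labels, edge_labels


image = {1: ("car", 10), 2: ("tree", 40), 3: ("house", 40)}
vertex_labels, edge_labels = encode_image(image)
```

An image with k objects yields k vertices and k(k-1)/2 labelled edges, so the graph is complete; this is why the worst-case behaviour of matching algorithms on complete graphs matters for this representation.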
Inexact isomorphism detection between two graphs can be performed using one of two measures of similarity: edit distance or largest common subgraph. While there are well-known algorithms to solve these problems, these algorithms are exponential in time complexity in the general case. This is a disadvantage when retrieval is likely to involve browsing of a large database, followed by progressive refinement.
0031-3203/01/$20.00 © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(00)00048-0
Recently, new graph isomorphism algorithms have been developed by Messmer and Bunke [25,26], which use a priori knowledge of the database of model graphs to build an efficient index off-line. This index takes advantage of similarity between models to reduce execution time. Messmer and Bunke propose algorithms to solve the subgraph isomorphism problem, and to solve the inexact subgraph isomorphism problem using an edit distance measure. While subgraph isomorphism is often used as a measure of similarity between images, the edit distance measure is not suitable for image and video indexing by spatial relationships.
The major difficulty in applying edit distance methods to the image similarity problem is that there is no clear interpretation for the edit operations. Deletion of a vertex implies the exclusion of an object from the matching part of two graphs, while altering the label of an edge represents altering the relationship between two objects; there is no meaningful comparison between these two operations. This problem means that any similarity measure based on graph edit distance will return a value with little or no physical significance. Inexact isomorphism algorithms based on an edit distance measure also suffer from bias when used to compare an input against multiple model graphs. By the nature of edit distance algorithms, if the input graph is smaller than the models, smaller model graphs are more likely to be chosen as similar.
A more appropriate measure of image similarity is the largest common subgraph between the graphs representing the images. Largest common subgraph is a simple and intuitive measure of image similarity. The largest common subgraph between two graphs $G_1$ and $G_2$, encoding two images $I_1$ and $I_2$, represents the largest collection of objects found in $I_1$ and $I_2$ that maintain the same relationship to each other in both images. The usual algorithm for determining largest common subgraphs is the maximal clique detection method proposed by Levi.
The maximal clique detection algorithm is efficient in the best case, but in the worst case requires $O((nm)^n)$ time, where $n$ is the number of vertices in the input graph and $m$ the number of vertices in the model. The worst case arises for a complete graph with all vertices sharing the same label. This high computational complexity makes it difficult to apply indices based on spatial relationships to large databases.
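The clique-based approach can be illustrated with a small brute-force sketch. It is not Levi's formulation verbatim: the graph representation (vertex-label dict plus edge labels keyed by `frozenset`), the function names, and the simple backtracking search are assumptions made for illustration. Each pair of equally labelled vertices becomes a node of an association graph, two nodes are compatible when the corresponding edge labels agree, and the largest clique of compatible nodes gives the largest common subgraph.

```python
from itertools import product


def edge_label(edges, a, b):
    """Label of the undirected edge {a, b}, or None if absent."""
    return edges.get(frozenset((a, b)))


def largest_common_subgraph(g1, g2):
    """Return the largest vertex mapping [(u, v), ...] preserving all labels."""
    (vl1, el1), (vl2, el2) = g1, g2
    # Association-graph nodes: pairs of vertices with equal labels.
    nodes = [(u, v) for u, v in product(vl1, vl2) if vl1[u] == vl2[v]]

    def compatible(p, q):
        (u1, v1), (u2, v2) = p, q
        return (u1 != u2 and v1 != v2
                and edge_label(el1, u1, u2) == edge_label(el2, v1, v2))

    best = []

    def extend(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        for i, node in enumerate(candidates):
            # Prune: even taking every remaining candidate cannot beat best.
            if len(clique) + len(candidates) - i <= len(best):
                return
            rest = [c for c in candidates[i + 1:] if compatible(node, c)]
            extend(clique + [node], rest)

    extend([], nodes)
    return best


g1 = ({1: "car", 2: "tree", 3: "house"},
      {frozenset((1, 2)): "left", frozenset((1, 3)): "left",
       frozenset((2, 3)): "above"})
g2 = ({"a": "car", "b": "tree", "c": "dog"},
      {frozenset(("a", "b")): "left", frozenset(("a", "c")): "below",
       frozenset(("b", "c")): "left"})
mapping = largest_common_subgraph(g1, g2)
```

When every vertex carries the same label, all n·m pairs become association-graph nodes and the search space grows accordingly, which is the situation the worst-case bound describes.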
In this paper we describe a new algorithm for largest common subgraph detection. The algorithm has been developed from an algorithm originally proposed by Messmer and Bunke [28,29]. As with the original algorithm, this algorithm uses a preprocessing step, applied to the models contained in the database, to provide rapid classification at run time. The proposed algorithm is considerably more efficient in time than any previous algorithm for largest common subgraph detection; however, the space complexity of the algorithm somewhat restricts its application. Considering the difference in time complexity, from $O((nm)^n)$ for the usual algorithm, matching an input of size $n$ to a database of models of size $m$, to the new algorithm's complexity of $O(2^n n^3)$, there is room to perform space-saving operations while still significantly outperforming typical algorithms. One example of space saving is to depth-limit the decision tree, such that searching is halted when a certain depth is reached, and matching between the few graphs left to separate is completed using an algorithm such as Ullman's, instantiated from the matching already performed.
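To get a feel for the gap between the two bounds, the following sketch evaluates them for a small input, reading them as (nm)^n and 2^n n^3. Constants and lower-order terms are dropped, so the figures are purely illustrative and are not measured running times; the function names are hypothetical.

```python
def clique_bound(n, m):
    # Worst-case growth of the maximal clique method: (n * m) ** n.
    return (n * m) ** n


def decision_tree_bound(n):
    # Growth of the new algorithm's bound: 2 ** n * n ** 3.
    return 2 ** n * n ** 3


n, m = 8, 8
ratio = clique_bound(n, m) // decision_tree_bound(n)
```

For n = m = 8 the first bound is 2^48 and the second is 2^17, a ratio of 2^31. A gap of this size is what leaves room for trading some run-time speed for reduced space.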
While largest common subgraph has been applied mostly to image similarity retrieval, the application of these new algorithms to indexing by spatial relationships can also take advantage of the high degree of common structure expected in video databases. In this paper we treat a video as a simple sequence of images. Even with this straightforward treatment it is possible to provide a similarity retrieval scheme that is extremely efficient, due to the high degree of common structure encountered between frames in video. Many frames of a video will be classified by one element of the classification structure used. This removes the impediment of slow similarity retrieval for indices of this type. Furthermore, the algorithms may be applied to labelled, attributed, directed or undirected graphs. Such graphs have a great deal of expressive power, and may be applied to other encodings of image and video information, or other data sets of relational structures.
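One way to picture the common structure between frames is to collapse runs of consecutive frames whose graph encodings are identical, so that a single stored model stands for the whole run. The sketch below is an assumption made for illustration, not the classification structure used by the algorithm; `collapse_frames` and the string placeholders for graphs are hypothetical.

```python
from itertools import groupby


def collapse_frames(frame_graphs):
    """Return [(graph, first_frame, last_frame), ...] for runs of identical graphs."""
    runs = []
    index = 0
    for graph, group in groupby(frame_graphs):
        length = sum(1 for _ in group)
        runs.append((graph, index, index + length - 1))
        index += length
    return runs


# Placeholder encodings: each string stands for a hashable graph encoding.
frames = ["G_a", "G_a", "G_a", "G_b", "G_b", "G_a"]
runs = collapse_frames(frames)
```

Here six frames reduce to three runs, so only three graphs would need to be matched against a query rather than six.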
This paper begins by providing definitions for the key graph-related concepts. The following sections then briefly describe the encoding used for image and video information. The precursor to the new algorithm is then explained, followed by an explanation of the new algorithm itself. The results section compares the new algorithm with other available algorithms and examines the space complexity over an image database.
Definition 1. A graph is a 4-tuple G = (
Definition 2. Given a graph G = (
Table 1. Possible interval relationships, due to Allen

Relation             Symbol
Less than            a < b
Meets                a | b
Overlaps             a / b
Ends                 a ] b
Contains             a % b
Begins               a [ b
Equals               a = b
Begins inverse       a [' b
Contains inverse     a %' b
Ends inverse         a ]' b
Overlaps inverse     a /' b
Meets inverse        a |' b
Less than inverse    a <' b
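The thirteen relationships in Table 1 can be computed from interval end points. The sketch below is a minimal illustration under stated assumptions: intervals are `(start, end)` pairs with `start < end`, the function name `interval_relation` is hypothetical, and the operand order for "begins", "ends" and "contains" follows the standard reading of Allen's relations (a begins b when both share a start and a ends first), which may differ from the convention intended in the table.

```python
def interval_relation(a, b):
    """Name the Table 1 relationship between intervals a and b.

    Assumes proper intervals (start < end). Operand-order conventions
    for begins/ends/contains are an assumption, not taken from the source.
    """
    (a0, a1), (b0, b1) = a, b
    if a1 < b0:
        return "less than"
    if a1 == b0:
        return "meets"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "begins" if a1 < b1 else "begins inverse"
    if a1 == b1:
        return "ends" if a0 > b0 else "ends inverse"
    if a0 < b0 and b1 < a1:
        return "contains"
    if b0 < a0 and a1 < b1:
        return "contains inverse"
    if a0 < b0:
        return "overlaps"  # a0 < b0 < a1 < b1
    # Remaining cases: b starts first.
    if b1 < a0:
        return "less than inverse"
    if b1 == a0:
        return "meets inverse"
    return "overlaps inverse"  # b0 < a0 < b1 < a1


relation = interval_relation((1, 4), (2, 6))
```

Applying such a function to the projections of two objects on each axis gives one relationship per axis, which could serve as the label on the edge joining the two objects.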
that the a and b are a type-0 match, with the additional condition that the objects have the same orthogonal relationships. Insisting on matching orthogonal relationships in addition to a type-0 match restricts type-1 matching to a similar rotational position.
Two pictures P and Q are said to be a type-n match if:
1. For each object $o_i \in P$ there exists an $o_j \in Q$ such that
j, and for ea