
*Corresponding author. Tel.: +61-8-9266-2110; fax: +61-8-9266-2819.

E-mail address: [email protected] (K. Shearer).

Pattern Recognition 34 (2001) 1075-1091

Video indexing and similarity retrieval by largest common subgraph detection using decision trees

Kim Shearer (a,*), Horst Bunke (b), Svetha Venkatesh (a)

(a) Department of Computer Science, Curtin University of Technology, GPO Box U1987, Perth 6001, WA, Australia
(b) Institut für Informatik und Angewandte Mathematik, Universität Bern, Switzerland

Received 14 May 1999; received in revised form 1 March 2000; accepted 1 March 2000

Abstract

While the largest common subgraph (LCSG) between a query and a database of models can provide an elegant and intuitive measure of similarity for many applications, it is computationally expensive to compute. Recently developed algorithms for subgraph isomorphism detection take advantage of prior knowledge of a database of models to improve the speed of on-line matching. This paper presents a new algorithm based on similar principles to solve the largest common subgraph problem. The new algorithm significantly reduces the computational complexity of detection of the LCSG between a known database of models and a query given on-line. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Graph matching; Similarity retrieval; Video indexing; Decision tree

1. Introduction

With the advent of large on-line databases of images and video, it is becoming increasingly important to address the issues of effective indexing and retrieval. The majority of existing techniques in the area of image and video indexing can be broadly categorised into low- and high-level techniques. Low-level techniques use attributes such as colour and texture measures to encode an image, and perform image similarity retrieval by comparing vectors of such attributes [1-6]. High-level techniques use semantic information about the meaning of the pictures to describe an image, and therefore require intensive manual annotation [7-14]. An additional method of image indexing is by qualitative spatial relationships between key objects. This form of index has been successfully applied to indexing and retrieval of image data [15-17]. Recent work by Shearer and Venkatesh [18,19] has extended the range of this representation to encoding of video data. With the object information proposed for inclusion in the MPEG-4 standard, such indexes will be simpler to implement. It should be noted that none of these indexes provide a comprehensive retrieval method, but may be considered complementary methods, each being one component of a retrieval toolkit.

When images or video are encoded using spatial information, searches for exact pictorial or subpicture matches may be performed using a compact encoding such as 2D-strings [20-22]. Similarity retrieval, however, is best expressed as an inexact isomorphism detection between graphs representing two images [23,24]. Images are encoded as graphs by representing each object in an image as a vertex in the graph, and placing an edge, labelled with the spatial relationship between the two corresponding objects, between each pair of vertices. Note that this is only one plausible representation of images using graphs. Other representations are possible, and in some cases will be preferable. Indeed any relational structure represented as a graph could make use of the algorithms discussed in this paper.
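To make this encoding concrete, the sketch below builds such a complete labelled graph from a set of objects with bounding boxes. It is a minimal illustration, not code from the paper: the names image_to_graph and relation are ours, and relation is a toy stand-in for the qualitative spatial relations described in Section 2.

# Sketch (ours): encode an image as a complete labelled graph. Each object
# becomes a vertex; every pair of objects gets an edge labelled with their
# spatial relationship.
def relation(box_a, box_b):
    """Toy stand-in for a qualitative spatial relation along both axes."""
    (ax1, ay1, ax2, ay2) = box_a
    (bx1, by1, bx2, by2) = box_b
    x_rel = "<" if ax2 < bx1 else ("<'" if bx2 < ax1 else "overlap")
    y_rel = "<" if ay2 < by1 else ("<'" if by2 < ay1 else "overlap")
    return (x_rel, y_rel)

def image_to_graph(objects):
    """objects: dict of object label -> bounding box (x1, y1, x2, y2)."""
    labels = list(objects)
    edges = {}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:            # one edge per object pair
            edges[(a, b)] = relation(objects[a], objects[b])
    return labels, edges

vertices, edges = image_to_graph({"tree": (0, 0, 2, 5), "car": (3, 0, 6, 2)})
print(edges)   # {('tree', 'car'): ('<', 'overlap')}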

Inexact isomorphism detection between two graphs can be performed using one of two measures of similarity;


edit distance or largest common subgraph. While there are well-known algorithms to solve these problems, these algorithms are exponential in time complexity in the general case. This is a disadvantage when retrieval is likely to involve browsing of a large database, then progressive refinement.

Recently, new graph isomorphism algorithms have been developed by Messmer and Bunke [25,26], which use a priori knowledge of the database of model graphs to build an efficient index off-line. This index takes advantage of similarity between models to reduce execution time. Messmer and Bunke propose algorithms to solve the subgraph isomorphism problem, and to solve the inexact subgraph isomorphism problem using an edit distance measure. While subgraph isomorphism is often used as a measure of similarity between images, the edit distance measure is not suitable for image and video indexing by spatial relationships.

The major difficulty when applying edit distance methods to the image similarity problem is that there is no clear interpretation for the edit operations. Deletion of a vertex implies the exclusion of an object from the matching part of two graphs, while altering the label of an edge represents altering the relationships of two objects; there is no meaningful comparison between these two operations. As a result, any similarity measure based on graph edit distance will return a value with little or no physical significance. Inexact isomorphism algorithms based on an edit distance measure also suffer from bias when used to compare an input against multiple model graphs. By the nature of edit distance algorithms, if the input graph is smaller than the models, smaller model graphs are more likely to be chosen as similar.

A more appropriate measure of image similarity is the largest common subgraph between the graphs representing the images. Largest common subgraph is a simple and intuitive measure of image similarity. The largest common subgraph between two graphs G_1 and G_2, encoding two images I_1 and I_2, represents the largest collection of objects found in I_1 and I_2 that maintain the same relationship to each other in both images. The usual algorithm for determining largest common subgraphs is the maximal clique detection method proposed by Levi [27].

The maximal clique detection algorithm is efficient in the best case, but in the worst case requires O((nm)^n) time, where n is the number of vertices in the input graph and m the number of vertices in the model, for a complete graph with all vertices sharing the same label. This high computational complexity makes it difficult to apply indices which are based on spatial relationships to large databases.

In this paper we describe a new algorithm for largest common subgraph detection. The algorithm has been developed from an algorithm originally proposed by Messmer and Bunke [28,29]. As with the original algorithm, this algorithm uses a preprocessing step, applied to the models contained in the database, to provide rapid classification at run time. The proposed algorithm is considerably more efficient in time than any previous algorithm for largest common subgraph detection; however, the space complexity of the algorithm somewhat restricts its application. If the difference in time complexity is considered, from O(L(nm)^n) for the usual algorithm for matching an input of size n to a database of L models of size m, to the new algorithm's complexity of O(2^n n^3), there is room to perform space saving operations while still significantly outperforming typical algorithms. One example of space saving is to depth limit the decision tree, such that searching is halted when a certain depth is reached, and matching between the few graphs left to separate is completed using an algorithm such as Ullman's [30], instantiated from the matching already performed.

While largest common subgraph has been applied mostly to image similarity retrieval, the application of these new algorithms to indexing by spatial relationships can also take advantage of the high degree of common structure expected in video databases. In this paper we treat a video as a simple sequence of images. Even with this straightforward treatment it is possible to provide a similarity retrieval scheme that is extremely efficient, due to the high degree of common structure encountered between frames in video. Many frames of a video will be classified by one element of the classification structure used. This removes the impediment of slow similarity retrieval for indices of this type. Furthermore, the algorithms may be applied to labelled, attributed, directed or undirected graphs. Such graphs have a great deal of expressive power, and may be applied to other encodings of image and video information, or other data sets of relational structures.

This paper begins by providing definitions for the key graph related concepts. Following sections will then briefly describe the encoding used for image and video information. The precursor to the new algorithm is then explained, followed by an explanation of the new algorithm. The results section compares the new algorithm with other available algorithms and examines the space complexity over an image database.

1.1. Definitions

Definition 1. A graph is a 4-tuple G = (V, E, μ, ν), where

- V is a set of vertices,
- E ⊆ V × V is the set of edges,
- μ : V → L_V is a function assigning labels to the vertices,
- ν : E → L_E is a function assigning labels to the edges.


Definition 2. Given a graph G = (V, E, μ, ν), a subgraph of G is a graph S = (V_S, E_S, μ_S, ν_S) such that

- V_S ⊆ V,
- E_S = E ∩ (V_S × V_S),
- μ_S and ν_S are the restrictions of μ and ν to V_S and E_S respectively, i.e.

    μ_S(v) = μ(v) if v ∈ V_S, undefined otherwise,
    ν_S(e) = ν(e) if e ∈ E_S, undefined otherwise.

The notation S ⊆ G is used to indicate that S is a subgraph of G.

Definition 3. A bijective function f : V → V' is a graph isomorphism from a graph G = (V, E, μ, ν) to a graph G' = (V', E', μ', ν') if

1. μ(v) = μ'(f(v)) for all v ∈ V.
2. For any edge e = (v_1, v_2) ∈ E there exists an edge e' = (f(v_1), f(v_2)) ∈ E' such that ν(e) = ν(e'), and for any edge e' = (v'_1, v'_2) ∈ E' there exists an edge e = (f^{-1}(v'_1), f^{-1}(v'_2)) ∈ E such that ν(e') = ν(e).

Definition 4. An injective function f : V → V' is a subgraph isomorphism from G to G' if there exists a subgraph S ⊆ G' such that f is a graph isomorphism from G to S.

Note that finding a subgraph isomorphism from G to G' implies finding a subgraph of G' isomorphic to the whole of G. This distinction becomes important in later discussion.

Definition 5. S is a largest common subgraph of two graphs G and G', where S ⊆ G and S ⊆ G', iff for all S': S' ⊆ G ∧ S' ⊆ G' ⇒ |S'| ≤ |S|.

There may or may not be a unique largest common subgraph for any two graphs G and G'.

2. Image indexing and retrieval

Before describing algorithms for retrieval it is necessary to discuss the underlying encoding of video. The encoding presented here is intended as a working example of how graph isomorphism may be used in image and video indexing and retrieval, but it is by no means the only such representation possible. Numerous methods have been proposed for indexing and retrieval of image and video information. These methods vary in the amount of preprocessing required and the type of information available for queries. The index used in this paper is based on spatial relationships between objects.

Spatial relationships between objects in a picture may be described in a number of ways. The most precise description is that proposed initially by Allen [31] for qualitative temporal reasoning. Allen gives the 13 possible relationships between two intervals along an axis (see Table 1). This may be extended to two dimensions by assigning one relation along each axis, giving a total of 169 possible relationships between two objects. The two axes are called either the u- and v-axis, or the x- and y-axis. By convention u or x refers to the horizontal axis, with v or y referring to the vertical axis.
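As a working illustration (ours, not the paper's), the 13 interval relations of Table 1 can be computed along one axis as follows, assuming intervals are given as (start, end) pairs with start < end; the symbols follow Table 1, with primed symbols denoting inverse relations:

# Sketch: classify the Allen relation between two intervals a and b
# along one axis.
def allen(a, b):
    (s1, e1), (s2, e2) = a, b
    if e1 < s2: return "<"                          # a less than b
    if e2 < s1: return "<'"                         # less than inverse
    if e1 == s2: return "|"                         # a meets b
    if e2 == s1: return "|'"                        # meets inverse
    if s1 == s2 and e1 == e2: return "="            # equals
    if s1 == s2: return "[" if e1 > e2 else "['"    # a begins b / inverse
    if e1 == e2: return "]" if s1 < s2 else "]'"    # a ends b / inverse
    if s1 < s2 and e2 < e1: return "%"              # a contains b
    if s2 < s1 and e1 < e2: return "%'"             # contains inverse
    return "/" if s1 < s2 else "/'"                 # a overlaps b / inverse

print(allen((0, 4), (2, 6)))   # "/" : the intervals overlap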

A less precise description, based upon the same concepts, may be derived using the relationship categories proposed by Lee et al. [24]. The 169 possible spatial relationships in two dimensions may be partitioned into five categories, defined by characteristics of the relationships. The categories are:

Definition 6. (1) Disjoint: the two objects a and b do not touch or overlap, that is, there is a less than (<) relationship along at least one axis.

(2) Meets: the two objects a and b touch but do not overlap, thus they have a meets (|) relationship along one axis and a non-overlapping relationship along the other.

(3) Contains: object a contains object b, that is, object a contains (%), begins ([) or ends (]) object b along both axes.

(4) Belongs to: object b contains object a, that is, object b contains (%), begins ([) or ends (]) object a along both axes.

(5) Overlaps: the two objects do not fall into any of the above categories.

These category relationships are rotation and scaling invariant, making them useful as an approximate matching strategy.
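Continuing the earlier sketch (again our illustration, not the paper's code), the five categories of Definition 6 follow directly from the pair of axis relations:

# Sketch: map the pair of axis relations (u, v) produced by allen() to the
# five relationship categories of Definition 6.
DISJOINT = {"<", "<'"}          # non-touching along an axis
MEETS    = {"|", "|'"}          # touching along an axis
CONTAINS = {"%", "[", "]"}      # a contains / begins / ends b
BELONGS  = {"%'", "['", "]'"}   # the inverse relations

def category(u, v):
    if u in DISJOINT or v in DISJOINT:
        return "disjoint"       # (1) less than along at least one axis
    if u in MEETS or v in MEETS:
        return "meets"          # (2) touch but do not overlap
    if u in CONTAINS and v in CONTAINS:
        return "contains"       # (3) b inside a along both axes
    if u in BELONGS and v in BELONGS:
        return "belongs to"     # (4) a inside b along both axes
    return "overlaps"           # (5) none of the above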

Table 1
Possible interval relationships, due to Allen [31]

Relation             Symbol     Relation             Symbol
Less than            a < b      Meets                a | b
Overlaps             a / b      Ends                 a ] b
Contains             a % b      Begins               a [ b
Equals               a = b      Begins inverse       a [' b
Contains inverse     a %' b     Ends inverse         a ]' b
Overlaps inverse     a /' b     Meets inverse        a |' b
Less than inverse    a <' b

General matching schemes based on spatial relationships define three levels of pictorial matching. These levels define the form of correspondence that is required between the spatial relationships of two objects, which appear in two pictures, for the pictures to be considered a matching pair. The matching scheme used in this work is adopted from the B-string notation [24], which is the underlying notation used for storage of spatial relationships. B-strings provide three types of approximate matching, referred to as type-0, type-1 and type-2. Type-2 matching is the most exact, requiring that two objects a and b, present in two pictures P and Q, must have the same spatial relationship along both axes. Thus they must have the same pair, of the 169 possible pairs, of interval relationships. Type-0 matching is the least exact and uses relationship categories, requiring only that for a and b, the spatial relationships fall in the same relationship category. The final type of matching, type-1, requires that a and b are a type-0 match, with the additional condition that the objects have the same orthogonal relationships. Insisting on matching orthogonal relationships in addition to a type-0 match restricts type-1 matching to a similar rotational position.

Two pictures P and Q are said to be a type-n match if:

1. For each object o_i ∈ P there exists an o_j ∈ Q such that o_i ≡ o_j, and for each object o_j ∈ Q there exists an o_i ∈ P such that o_j ≡ o_i.
2. For all object pairs o_i, o_j where o_i ∈ P and o_j ∈ Q, o_i and o_j are a type-n match.

In many cases two pictures may not be complete matches, or even share the same object set, but we may be interested in partial matches. The search for partial matches, or subpicture matches, is referred to as similarity retrieval. A subpicture match between P and Q is defined as:

1. For all objects o_i ∈ Q', where Q' ⊂ Q, there exists an object o_j ∈ P', where P' ⊂ P, such that o_i ≡ o_j.
2. For all objects o_j ∈ P', where P' ⊂ P, there exists an object o_i ∈ Q', where Q' ⊂ Q, such that o_j ≡ o_i.
3. For all object pairs o_i, o_j where o_i ∈ P' and o_j ∈ Q', o_i and o_j are a type-n match.

Thus a subpicture match occurs when a subset of the objects in a picture P are found in a picture Q, with the spatial relationships being a type-n match.
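A compact way to read this definition is as a predicate over a candidate object subset. The following sketch is ours, with same_relation standing in for the type-n test on a pair of objects (with type-0 it would compare relationship categories, with type-2 the exact pair of interval relations):

# Sketch: test for a subpicture match between a subset Q_sub of picture Q
# and a picture P. P and Q_sub map object labels to their geometry.
from itertools import combinations

def subpicture_match(P, Q_sub, same_relation):
    if not set(Q_sub) <= set(P):          # conditions 1 and 2: the subset's
        return False                      # objects all appear in P
    return all(same_relation(P[a], P[b], Q_sub[a], Q_sub[b])
               for a, b in combinations(Q_sub, 2))   # condition 3: pairwise
                                                     # type-n agreement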

When searching a video database, if there are no complete picture matches, we will be interested in the largest subpicture match that may be found. The classic methods for solving this problem cast it as subgraph isomorphism detection. When applied to image or video retrieval, the largest common subgraph method will find the largest subpicture in common between a query image and a database of images. Whether this is the desired retrieval result depends upon a number of factors.

The largest common picture may not contain one or more key objects which are considered essential by the user, or may contain additional objects or have general characteristics which are unsuitable. For this reason a ranked list of the closest matching images is generally returned. Modification of the query, in conjunction with selection of a more exact, or less exact, matching type may then be utilised to refine the retrieval set.


Fig. 1. Two pictures with possible graph encodings.

Variation of the type of matching employed allows either

- restriction of the retrieval set by increased exactitude in matching, or
- expansion of the retrieval set by reduced exactitude in matching.

This, combined with small alterations to the query, allows a rapid focus on the desired results during retrieval. The qualitative nature of the relationships between objects and the less exact matching types allow noise to be accommodated in the browsing and query refinement process. Retrieval for this method is well defined and exact: the image in the database with the greatest number of objects in relationships which match the query will be the first model retrieved. How useful the retrieval method is depends on the form of retrieval desired and the flexibility of the three matching types. A discussion of usefulness and accuracy for a specific video retrieval application can be found in earlier work by the authors [19,32].

3. Encoding videos

Video may be encoded using techniques from image database work, adapted to the task of capturing motion in video [18,33]. In order to keep the representation efficient in space usage, only changes in the video should be represented, rather than duplicating information between frames. This can be achieved by representing the initial frame of a video sequence using 2D strings, then encoding only the changes in spatial relationships between objects for the rest of the sequence. The earlier work performed by this group [18] encodes changing relationships such that matching can be performed directly from the abbreviated notation, without expansion of the notation. This encoding of video using spatial relationships applies graph algorithms to solve similarity retrieval.

In order to use a graph algorithm to solve a particular problem, it is first necessary to encode the operands of the problem as graphs. In this case the operands are either digital pictures or frames from a video. These frames are indexed by the spatial relationships of key objects. The logical graph encoding used for such information is that graph vertices represent objects, with edges labelled by the relationships between the objects. For the task of finding graph isomorphisms this leads to a complete labelled graph, as deduction of relationships at run time would be prohibitively time consuming. In practice, edges are labelled with the relationship category (Definition 6) of the relationships between the two objects, with the actual relationship along each axis used as attributes of the edge. This representation is more efficient as all matching types require that two objects are at least a type-0 match, that is, they have the same relationship category. Matching can therefore be performed by testing the edge label, or relationship category, and proceeding to the attributes if type-1 or type-2 matching is required and the labels are equivalent.
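The two-level edge test described above might be sketched as follows (our illustration; the field names are ours). The category label settles type-0 matching on its own, and the per-axis attributes are consulted only when a stricter match is requested:

# Sketch: an edge carries the relationship category as its label and the
# per-axis interval relations as attributes.
from dataclasses import dataclass

@dataclass
class Edge:
    category: str    # edge label, e.g. "overlaps" (Definition 6)
    u_rel: str       # interval relation along the horizontal axis
    v_rel: str       # interval relation along the vertical axis

def edges_match(e1, e2, match_type=0):
    if e1.category != e2.category:
        return False                 # every match type requires type-0
    if match_type == 0:
        return True
    # Type-2 compares the exact axis relations; the orthogonal-relationship
    # test of type-1 is elided here for brevity.
    return (e1.u_rel, e1.v_rel) == (e2.u_rel, e2.v_rel)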

Fig. 1 shows two pictures and the graphs which represent them. There is a clear similarity between the two pictures, the most obvious being between the subpictures composed of objects A-C. In the context of qualitative reasoning, the object B does not move with respect to A and C, as the relationships between them do not change. An examination of the two graphs reveals that if we remove the vertex for object D, and all arcs leading to or from that vertex, then the remaining parts of the graphs are identical. That is, the remaining vertices and arcs form a subgraph of the first graph isomorphic to a subgraph of the second. In the context of similarity retrieval in pictorial databases, we are interested in the elements of the database which share the largest isomorphic subgraph with the query input. This subgraph is referred to as the largest common subgraph, and is similar to the longest common subsequence problem for text strings.

Fig. 2. Graph with adjacency matrix.

Fig. 3. Row column elements of a matrix.

This encoding of images as graphs enables one image to be compared to another and the largest similar part detected. The algorithms presented in the next section are applicable to similarity retrieval for both image and video databases. While they still perform matching on a single-input query image basis, the compilation of multiple model images into a database explicitly takes advantage of the similarity between frames within a video. It is in part this similarity that leads to the efficiency of the proposed algorithms.

4. Decision tree algorithms

The decision-tree-based algorithm detects graph and subgraph isomorphisms from the input graph to the model graphs. That is, it finds subgraphs of the model graphs that are isomorphic to the input graph. This type of isomorphism detection is important as it is required for query by pictorial example, which is a common form of image database query. When query by pictorial example is used as the retrieval method, the input is an iconic sketch containing key objects. This iconic sketch is translated to a graph for which the vertex set generally represents a subset of the objects in the images retrieved.

The decision tree algorithm is based on a decision tree constructed using the adjacency matrix representation for the model graphs. A graph G may be represented by a matrix M, called the adjacency matrix, where the elements of M are assigned values

    m_ij = μ(v_i)         if i = j,
           ν((v_i, v_j))  if i ≠ j.    (1)

This places the object labels down the diagonal of the matrix, and the labels of the edges in elements corresponding to the vertices they link. Fig. 2 shows a graph with an adjacency matrix that can be used to represent it.
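A direct rendering of Eq. (1), as a minimal sketch with names of our choosing:

# Sketch: build the adjacency matrix of Eq. (1). Vertex labels occupy the
# diagonal; edge labels occupy the off-diagonal elements they connect.
def adjacency_matrix(vertex_labels, edge_labels):
    n = len(vertex_labels)
    M = [[None] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = vertex_labels[i]          # m_ii = mu(v_i)
    for (i, j), label in edge_labels.items():
        M[i][j] = label                     # m_ij = nu((v_i, v_j)), i != j
    return M

M = adjacency_matrix(["a", "b", "a"], {(0, 1): "x", (1, 2): "y"})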

The property of adjacency matrices which leads to the decision tree algorithm is their behaviour under permutation. A permutation matrix is defined as follows.

Definition 7. An n × n matrix P = (p_ij) is called a permutation matrix if

1. p_ij ∈ {0, 1} for 1 ≤ i ≤ n, 1 ≤ j ≤ n,
2. Σ_{i=1..n} p_ij = 1 for 1 ≤ j ≤ n,
3. Σ_{j=1..n} p_ij = 1 for 1 ≤ i ≤ n.

If element p_ij = 1 in a permutation matrix P, then applying the transformation

    M' = P M P^T

will cause the jth vertex in adjacency matrix M to become the ith vertex in M'. One property of adjacency matrices is that given a graph G that is represented by an adjacency matrix M, then any matrix M', where M' = P M P^T and P is a permutation matrix, is also an adjacency matrix representing G. Graph isomorphism detection between two graphs G_1 and G_2 can therefore be recast as the task of finding a permutation matrix P such that

    M_2 = P M_1 P^T,

where M_1 and M_2 are the two adjacency matrices representing graphs G_1 and G_2, respectively.
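The permutation property can be checked numerically; the sketch below (ours) codes labels as small integers so that the matrix product applies:

# Sketch: M' = P M P^T is another adjacency matrix of the same graph.
import numpy as np

M = np.array([[1, 5, 0],        # diagonal: vertex labels (integer-coded)
              [5, 1, 6],        # off-diagonal: edge labels, 0 = no edge
              [0, 6, 2]])

P = np.array([[0, 1, 0],        # p_ij = 1 moves vertex j to position i;
              [1, 0, 0],        # this P swaps vertices 0 and 1
              [0, 0, 1]])

M_prime = P @ M @ P.T           # re-ordered, but represents the same graph
print(M_prime)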

In order to use this property to increase the speed of isomorphism detection, a decision tree is constructed from the adjacency matrices which represent the model graphs. This tree is created and navigated using row column elements of the adjacency matrices. Fig. 3 shows the adjacency matrix from Fig. 2 broken into its row column elements. Each row column element r_i contains one vertex label v_i and the labels of all edges between v_i and vertices v_1, ..., v_{i-1}. The decision tree for a graph G begins with an unlabelled root node, which has as many descendants as there are distinct vertex labels. Each of these initial branches is labelled with a single vertex label; these represent the initial one element row column elements of each of the possible adjacency matrices for G. An example of this is given in Fig. 4, which shows the six adjacency matrices which can represent the graph in Fig. 3, and the resulting decision tree. There are only two unique labels in the graph, so there are only two descendants from the root node. These immediate descendants of the root have one descendant for each of the three element row column elements that follow them in one of the adjacency matrices. Fig. 4 shows the decision tree generated from the six possible permutations of the example graph.

Fig. 4. Decision tree for example graph.

Further graphs, representing other images, can be added to the tree incrementally. For each additional adjacency matrix A, representing a graph G, to be added, the following algorithm is followed, beginning with the root node of the tree and the first row column element:

1. Test the next row column element of A against the labels on the branches descending from the current node.
2. If the last row column element of A has been reached, place a marker at the current node of the decision tree to say that this node represents an isomorphism for the model graph G.
3. If there is a matching branch label, continue this procedure at the node reached along the matching branch, with the next row column element.
4. If there is no matching branch, create a new branch, labelled with the current row column element, and descend that branch to the next node. Continue from the new node with the next row column element.

This algorithm leads to the addition of only those branches which are not already represented in the tree. The higher the degree of similarity between models, the smaller the increase in tree size per model added.
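The insertion procedure can be sketched as a trie keyed on row column elements (our simplified rendering; the full algorithm inserts every permuted adjacency matrix of each model, while this shows a single matrix):

# Sketch: insert one adjacency matrix M into the decision tree. Nodes are
# dicts; branches are keyed by row column elements encoded as tuples.
def row_col_element(M, i):
    # i-th row up to and including the diagonal, then the i-th column above
    return tuple(M[i][:i + 1]) + tuple(M[j][i] for j in range(i))

def insert(tree, M, model_id):
    node = tree
    for i in range(len(M)):
        key = row_col_element(M, i)
        # step 4: a missing branch is created; step 3: otherwise follow it
        node = node.setdefault("children", {}).setdefault(key, {})
    # step 2: mark that this node represents an isomorphism for the model
    node.setdefault("models", []).append(model_id)

tree = {}
insert(tree, [["a", "x"], ["x", "b"]], "G1")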

The algorithm used to detect isomorphisms between an input graph G_I and the model graphs encoded in the decision tree is similar to the procedure used to add new adjacency matrices to the decision tree. Given an input graph G_I to test for subgraph isomorphisms, the adjacency matrix M_I which represents G_I is used to navigate the decision tree. Beginning at the root with the one element row column element at the top left of M_I, the algorithm descends the branches matching each of the row column elements of M_I. When descending the tree there are two possible termination conditions for the algorithm:

1. If at any point there is no arc from the current node in the tree which has a label matching the next row column element, then there is no subgraph isomorphism.
2. If all row column elements of the graph G_I have been used and node n_i has been reached, then all models G_{i1}, ..., G_{ik} associated with node n_i have a subgraph isomorphic to the input graph.

In case 2, if the node n_i is a leaf, then the models G_{i1}, ..., G_{ik} and input G_I are the same size and a graph isomorphism, rather than a subgraph isomorphism, has been found.
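Classification then follows the same keys downward. This sketch reuses row_col_element from the insertion sketch above and returns None when termination condition 1 fires:

# Sketch: classify an input adjacency matrix M_I against the decision tree.
# (Uses row_col_element from the insertion sketch.)
def classify(tree, M_I):
    node = tree
    for i in range(len(M_I)):
        key = row_col_element(M_I, i)       # next row column element
        children = node.get("children", {})
        if key not in children:
            return None                     # condition 1: no isomorphism
        node = children[key]
    models = node.get("models", [])         # condition 2: these models have
    is_leaf = "children" not in node        # a subgraph isomorphic to the
    return models, is_leaf                  # input; at a leaf it is a graph
                                            # isomorphism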

A row column element r_i contains one vertex label v_i and the edge labels for each edge which connects v_i to any vertex v_j, j < i. Thus each row column element encapsulates all edges between v_i and vertices which have already been classified. The process of matching a row column element and descending to a new node in the tree is an incremental addition of vertices to the classified graph.

There are numerous optimisations which may be made to this algorithm [25], involving methods of reducing the number of nodes in the tree. The best space complexity that may be achieved without some form of pruning is

    O(L |l_v| (1 + |l_e|^2)^n),    (2)

where l_v is the number of vertex labels used and l_e is the number of edge labels used in the models. This figure can be achieved while still representing all states in the graph. Further heuristics can be used to prevent generation of further states at strategic points.

While this size complexity is an impediment to the use of this algorithm in general graph matching problems, there are problem domains for which the number of object labels can be small. An example of this is in medical image databases such as chest X-ray data. In such areas there are typically only four or five significant objects, making the decision tree algorithm appropriate.

There are also a number of methods for pruning the decision tree. The first is breadth pruning, in which nodes within the decision tree may be eliminated by sorting the row column elements of the adjacency matrices. Here the row column elements are ordered such that each vertex is connected to at least one vertex which appears in an earlier row column element. This allows a number of permutations to be eliminated. The second method is depth pruning. This method places an upper limit on the depth of the decision tree. At any point in the construction of the decision tree where the maximum depth is reached, no further branches are built, but models are collected at the terminal node. If classification of an input reaches this terminal node, then Ullman's algorithm [30] is used to complete classification. The vertex mappings used to reach the terminal node may also be used to initialise Ullman's algorithm, such that only the unmapped vertices are considered.

The advantage of the decision tree algorithm is that its computational complexity is polynomial in the number of vertices in the input. The time taken to compute graph and subgraph isomorphisms is therefore independent of both the size of the model graphs and the number of model graphs. For an image or video database that will contain a large number of images, and will generally be queried by iconic query, this computational complexity is a clear advantage. Comparing the computational complexity of this algorithm, O(n^2), to that of the previous best algorithm, O(L m^n n^2), shows how rapidly this algorithm can classify the input.

Beyond the space complexity of the base algorithm, which already restricts its application, the inexact isomorphism method used with this algorithm has space complexity exponentially greater than the base algorithm. In order to apply this extremely fast algorithm to image and video databases it was necessary to develop an algorithm for similarity retrieval which is usable. The next section describes the LCSG decision tree algorithm developed for application to image and video indexing and retrieval.

4.1. Decision-tree-based LCSG algorithm

The advantage of the decision-tree-based largest common subgraph (LCSG) algorithm is that its computational complexity is independent of the number of models and the size of the models. The complexity is dependent only on the size of the input, which is advantageous when the models are expected to be larger than the input, as in the case of query by pictorial example. This complexity is due to the depth of descent through the decision tree being limited to the number of vertices in the input.

The largest common subgraph (LCSG) for a pair of graphs is generally expensive to compute, due in part to its non-monotonic nature. This means that any LCSG algorithm must employ some form of backtracking to guarantee finding the optimal solution. The existence of a very fast algorithm for detection of subgraph isomorphisms presents the possibility of a new approach to the largest common subgraph problem.

The algorithm developed is based on the observation that the decision tree algorithm may terminate classification before all possible matches for row column elements are discovered. The decision tree algorithm terminates as soon as a row column element is found for which there is no matching branch from the current node. At this point there may, however, be row column elements further down the adjacency matrix for which a match does exist. Consider the adjacency matrix in Fig. 5(a). When we attempt to classify this using the tree in Fig. 4, the algorithm terminates at the second row column element as there are no matches. However, if we permute the matrix of Fig. 5(a) such that the second and third row column elements are interchanged, we then have the matrix of Fig. 5(b). This represents a graph isomorphic to that represented by Fig. 5(a), but the permuted matrix allows both the first and second row column elements to be classified against the tree in Fig. 4. This gives us a description of a common subgraph of order two for the graphs represented by the decision tree and the input. Clearly this procedure can be applied to larger examples.

Fig. 5. Input graph adjacency matrices.

Fig. 6. Partitions of the adjacency matrix.

Fig. 7. Permutation of matrices during classification.

When this method is applied to descend as far as possible through the tree, the resulting adjacency matrix is partitioned into two distinct parts. Fig. 6 shows a graph and its adjacency matrix, and the resulting matrix after classification with respect to the decision tree of Fig. 4. The final matrix in Fig. 6 is marked to distinguish between the two partitions. The lower right partition contains the row column elements which could not be matched, and the upper left partition contains those row column elements which were matched. The upper left partition forms a permuted matrix which describes a potential largest common subgraph of the input and at least one model graph. The graph represented is only a potential LCSG, as any one vertex, when included in the set of mapped vertices, may prevent correct detection of the largest common subgraph.

This problem is depicted in Fig. 7. An adjacency matrix representing the example graph in Fig. 7(a) is given in Fig. 7(b). When this adjacency matrix is classified using the decision tree from Fig. 4, the initial descent through the tree terminates with the partitioned matrix in Fig. 7(c). This shows a common subgraph of size two, with two unmatched row column elements. Fig. 7(d) shows the partitioned matrix that results if row column element one is permuted to the end of the matrix before classification. Here the matrix shows a common subgraph of size three. Including node 1 therefore prevents detection of the LCSG.

The process of descending the decision tree to the best-possible depth, by mapping row column elements where a match exists and permuting to the end of the adjacency matrix when there is no match, is given in the pseudo-code in Fig. 8, between labels 1 and 4. The block between labels 2 and 3 performs the step of permuting rows to the end of the adjacency matrix (function permuteOut) until either a matching row column element is found, or there are no further rows to test.

In order to find the largest common subgraph it is necessary to perform backtracking. Once the initial descent has been performed, the final node reached is a candidate node for the LCSG. Backtracking is performed by taking the partitioned matrix for the candidate node, and permuting the last row column element of the matched partition to the bottom of the matrix. The dimension of the matrix is then reduced by one to prevent a repeated match. This is performed at label 5 in the pseudo-code. Classification is then resumed at the immediate ancestor of the candidate node. Fig. 7 shows a simple example of this backtracking scheme. After the initial descent, which detects a common subgraph of size two (Fig. 7(c)), row column element 4 is permuted to the end of the adjacency matrix, giving the matrix in Fig. 7(e). This matrix offers no further matches. Row three will then be permuted to the end, also yielding no further matches. The backtracking algorithm then returns to the previous level of matching by popping previous environments off a stack. This gives the original matrix with the first row column element as the only match. The first row column element is then permuted to the end of the matrix, giving the matrix in Fig. 7(d). At this point the whole matrix has been matched, row column element 1 having been discarded, so no further matching is required. It can easily be seen that this algorithm will examine every possible subgraph.

Fig. 8. Pseudo-code for the decision tree LCSG algorithm.
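The descent, permute-out and backtracking steps can be condensed into a recursive search. The sketch below is our simplified rendering (the paper's Fig. 8 works on explicitly permuted matrices and an environment stack); it uses the tree and row column encoding of the earlier sketches, and already includes the pruning bound discussed next:

# Sketch: decision tree LCSG search. A prefix of mapped input vertices
# descends the tree; vertices whose row column element has no matching
# branch are, in effect, permuted out. A branch of the search is pruned
# when even a perfect completion could not beat the best subgraph so far.
def rc_element(M, prefix, v):
    row = tuple(M[v][u] for u in prefix) + (M[v][v],)
    col = tuple(M[u][v] for u in prefix)
    return row + col

def lcsg(tree, M):
    best = []

    def descend(node, prefix, unused):
        nonlocal best
        if len(prefix) + len(unused) <= len(best):
            return                            # pruning bound
        if len(prefix) > len(best):
            best = list(prefix)               # new candidate LCSG
        for v in sorted(unused):
            child = node.get("children", {}).get(rc_element(M, prefix, v))
            if child is not None:             # extend the matched prefix
                descend(child, prefix + [v], unused - {v})

    descend(tree, [], set(range(len(M))))
    return best                               # vertex order of the LCSG found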

Such a search scheme is highly inefficient in this naive form, so a pruning mechanism is introduced. The search tree may be pruned at any point at which the number of nodes which have been permuted to the lower partition, plus the order of the best common subgraph detected so far, is greater than or equal to the current matrix dimension. For example, assume that during a continuing classification a candidate subgraph has been found of size 5, from an input of size 8. If the classification has reached the third branch from the root, then the dimension of the input matrix will have been reduced to 6, as two row column elements (r_1 and r_2) have been discarded. This means that the best-possible subgraph size on this descent is 6, so that any more than one permutation during the descent will make it impossible to exceed the current best effort of size 5.

The best pruning is seen when the depth of the initial descent is a large fraction of the full depth of the tree. At the extreme, if the dimension of the input adjacency matrix is d, and the depth of the initial descent is d - 1, then pruning will occur before the second row column element is permuted out on all branches under the leftmost branch from the root node. On descents from the second branch from the root node pruning will occur when the first permutation is required, and no further branches under the root node will be examined, as it would not be possible to equal the already discovered common subgraph. Consideration of this limit reveals it to be a powerful pruning factor, as shown by the following computational complexity analysis.

4.2. Complexity of tree-based LCSG

For this complexity analysis we will assume that an input graph G_I with n vertices is to be classified against a database of L models with m vertices. The complexity is derived using the equivalence of maximal cliques in association graphs and largest common subgraphs, to provide worst case complexity.

For two graphs G and G', the association graph is a vertex labelled, undirected graph which can be created by the two-step process:

1. For each correct vertex mapping from graph G to G', insert a vertex in the association graph. This vertex is labelled with the vertex mapping between the vertices of G and G'.
2. For each pair of vertices v_i and v_j in the association graph, insert the edge ⟨v_i, v_j⟩ if the vertices mapped from G have the same edge characteristics as the vertices they are mapped to in G'.

Fig. 9. Example of an association graph.

Fig. 9 shows two labelled graphs and the association graph produced for them. The vertices are labelled with the letters a-d, edges are directed but unlabelled, and the vertices of the two graphs are uniquely numbered for identification in the mappings of the association graph. The mapping 1-5 indicates that the vertex labelled a in Fig. 9(a) is mapped to the vertex labelled a in Fig. 9(b).

Each clique within the association graph represents a set of vertices which have the same mutual relationships in G and G'. That is, the cliques represent subgraphs common to G and G'. The maximal clique or cliques in an association graph thus represent the largest common subgraph, or subgraphs. In order to determine the number of possible common subgraphs between two graphs, it is sufficient to examine the characteristics of cliques in an unlabelled, undirected graph of the appropriate size. This is used in analysing the worst-case computational complexity.
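For completeness, the two-step construction can be written directly. This sketch (ours) represents a graph as (vertex_labels, edge_label_dict) with directed edges, and treats the absence of an edge as a shared characteristic too:

# Sketch: build the association graph of two labelled graphs. Maximal
# cliques of the result correspond to largest common subgraphs.
from itertools import product

def association_graph(g1, g2):
    v1, e1 = g1      # list of vertex labels, dict (i, j) -> edge label
    v2, e2 = g2
    # Step 1: one vertex per label-preserving mapping between G and G'.
    nodes = [(i, j) for i, j in product(range(len(v1)), range(len(v2)))
             if v1[i] == v2[j]]
    # Step 2: join two mappings when both vertex pairs have the same edge
    # characteristics in both graphs (checked in both directions).
    edges = {(p, q) for p, q in product(nodes, nodes)
             if p[0] != q[0] and p[1] != q[1]
             and e1.get((p[0], q[0])) == e2.get((p[1], q[1]))
             and e1.get((q[0], p[0])) == e2.get((q[1], p[1]))}
    return nodes, edges

nodes, edges = association_graph((["a", "b"], {(0, 1): "left"}),
                                 (["a", "b"], {(0, 1): "left"}))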

The best-case complexity for the new algorithm is found when the input is an exact subgraph isomorphism of at least one of the model graphs. In this case the time complexity is O(n^2), as the algorithm simply descends directly down the tree. When there are no common subgraphs with more than one vertex, the complexity is simply O(n^4), that is, the cost of a descent with only one possible match for each vertex in the input.

The worst-case time complexity for this algorithm will occur when the size of the largest common subgraph is approximately half the size of the input. There are two factors which determine this:

1. The maximum possible number of common subgraphs is greatest.
2. The pruning due to the best previous result has less effect.

The clearest example of the pruning effect can be seen if we have already found a subgraph of size n - 1, in which case we can prune the search space at any point where a second permutation becomes necessary. This implies that only the two initial branches from the root will be examined.

In order to determine the maximum possible number of subgraphs for a graph of a given size, the following result due to Turán [34] is used. Throughout this theorem and the rest of this section the term order will be used for the number of vertices in a graph.

Theorem 1. Every graph G on n vertices with

    1 + C(n, 2) - t(n - c + r)/2

edges contains a clique of order c + 1, where n = tc + r, 0 ≤ r < c. This is the best possible.

The only graph G' of size

    C(n, 2) - t(n - c + r)/2

which does not have a clique of order c + 1 is the complete c-partite graph with r parts of order t + 1 and c - r parts of order t.

The graph G' gives an indication of the maximal number of order c cliques possible in a graph of order n, namely

    N_c = ((n - r)/c + 1)^r ((n - r)/c)^(c-r).    (3)

When the extra edge is added to G' to give a clique of order c + 1, this will create

    N_{c+1} = 2((n - r)/c + 1)^(r-1) ((n - r)/c)^(c-r)    (4)

cliques of order c + 1. This leaves the number of cliques of size c as

    N_c' = N_c - N_{c+1}    (5)
         = ((n - r)/c + 1)^(r-1) ((n - r)/c)^(c-r) [((n - r)/c + 1) - 2]    (6)
         = ((n - r)/c - 1) ((n - r)/c + 1)^(r-1) ((n - r)/c)^(c-r).    (7)

Empirical analysis reveals that the practical limit to the function in Eq. (3) is approximately 2^n. The function N_c' is bounded by 2^(n-1), as the leading factor approaches unity. We shall use N_c' = 2^n for our analysis; it should be noted that this is a generous overestimate.

A single descent through the decision tree has time complexity

    C_d = n^2 + (n - c) C_perm.    (8)

The second factor in Eq. (8) is the number of permutations required (n - c), multiplied by the complexity of a permutation C_perm. The maximum complexity of a permutation is n^2, the cost of exchanging all elements. Eq. (8) is therefore

    C_d = n^3 + (1 - c) n^2.    (9)

Now, given the number of cliques determined in Eqs. (4) and (7), we can provide an upper bound for the time complexity of the algorithm of

    C = 2^n (n^3 + (1 - (c + 1)) n^2) + 2^(n-1) (n^3 + (1 - c) n^2).    (10)

Eq. (10) consists of the number of cliques of size c + 1 multiplied by the cost of each descent, added to the number of cliques of size c multiplied by the cost of each descent. It can easily be seen from the construction of the graph that every vertex must contribute to at least one clique of size c, so the complexity calculation includes all descents. Removing an edge to reduce the size of a clique, or produce more backtracking, will greatly reduce the number of available cliques, and also reduce the computational complexity. The polynomial factors in this equation have been retained as the values of n for which this algorithm is used may be small, often no more than four.

The time complexity of the new algorithm, O(2^n n^3), compares favourably with the maximal clique finding algorithm, which has a worst case of O((nm)^n).

5. Results of tests over a video database

The video database used in the experiments was drawn mainly from our campus guide database [18,19]. These video clips depict a guide walking between various locations on the Curtin University of Technology campus. In addition to the clips from the campus guide, there are a number of other clips of park and city scenes, and a small number of disparate clips of completely different types of scenes.

The clips used vary in length from 4 to 20 s, and contain between 12 and 19 objects each. The shortest clip contains 71 changes to object relationships, while the longest clip has 402 changes. These may be regarded as typical figures for changes per second, although one clip contains 225 changes in 7 s. The higher frequency of changes in relationship reflects the higher number of key objects in the clip.

Table 2
General performance of the decision tree algorithm

Algorithm    Mean (ms)    Minimum (ms)    Maximum (ms)    Std. dev. (ms)
Ullman       393.2        252             607             113.1
A*           617.1        362             861             178.2
DN ED        109.0        81              209             36.2
DN LCSG      67.4         38              100             19.6
DT LCSG      16.6         6               23              6.5

Table 3
Effect of pruning depth

Depth    Nodes used    Test 1 (ms)    Test 2 (ms)    Test 3 (ms)    Test 4 (ms)
11       59,200        3              3              3              3
9        58,900        11             3              7              7
8        56,706        6              6              5              5
7        49,130        6              5              6              5
6        34,689        5              4              5              4
5        18,135        4              4              5              5
4        6495          36             17             7              6
3        1499          46             15             11             10

All times given in tables are in milliseconds, and are averaged over a number of executions for each query. The algorithms used for comparison in this section are Ullman's algorithm for exact graph isomorphism detection [30], the A* algorithm for inexact graph isomorphism detection [35], and the decomposition network algorithms of Messmer and Bunke [28]. There are two decomposition network algorithms used as comparison, one using an edit distance measure (DN ED) and another using the largest common subgraph (DN LCSG) as the distance measure [36].

5.1. Decision tree

There is no doubt that the decision tree algorithm is an exceptionally fast method for the detection of subpictures of models that are isomorphic to the input. Table 2 shows the times required for the decision tree algorithm to search two thirds of the guide database, containing the longest clips, for isomorphisms with example frames. The algorithm performs classification much faster than either Ullman's algorithm or either of the decomposition network algorithms. The decision tree algorithm not only has by far the lowest mean execution time, but also by far the lowest standard deviation and maximum. This speed of execution does not come without cost, in the form of an increased space requirement over the other algorithms. Whereas Ullman's algorithm, the A* algorithm and both decomposition network algorithms all require similar space, and can load the entire database, it was not possible to load all clips using the decision tree algorithm. This is mostly due to the number of objects in two particular clips. Due to the high space requirement, there are a number of experiments in this section which examine the characteristics of the decision tree algorithm in this area.

One area in which the decision tree algorithm has a marked advantage is when there is no match available. In this case the decision tree algorithm can classify the input in 6 ms, whereas for the example in Table 2 the network algorithm averaged 39 ms. In fact, the decision tree algorithm performs at its best when the network algorithm performs at its worst.

The disadvantage of the decision tree algorithm is its space requirement. While it was possible to build a complete decision tree for two-thirds of the guide database in 20 Mb of memory, it was not possible to build a decision tree for just one of the omitted sequences in 300 Mb. The reason for this is the exponential dependence of the decision tree algorithm upon the number of distinct vertex labels in the graph. In the case of image and video databases each vertex represents a key object. Whereas four clips with nine or 10 objects per frame require a total of 18,135 nodes in a decision tree, adding just two frames of a sequence with 19 objects increases the requirement to 76,913 nodes. This experiment is described in Table 4.

In an attempt to combat the size problems of the decision tree algorithm, the GUB toolkit [26] includes a depth pruned algorithm. This permits the user to specify a maximum depth for the decision tree; once this depth is reached no further nodes are added, models being collected into the leaf node. In run time isomorphism detection, if a node is reached that has been depth limited, then detection continues using Ullman's algorithm, initialised with the results from the partial classification. This may allow rapid elimination of many alternatives, while still executing in a tractable size. The results of creating decision trees of varying depths for example sequences are given in Table 3. Here we give the number of nodes required and the average time taken to match example pictures for the example depths of the decision tree. This shows that although the time required to search the database increases quite rapidly, the total time is still far less than that taken by any other algorithm.

Table 4
Effect of introducing sequences

     Sequences                                   Nodes      d nodes
1    liblr                                       6780
2    liblr librl                                 8211       1431
3    liblr librl bookrl                          13,991     5780
4    liblr bookrl                                12,977     6197
5    liblr bookrl librl                          13,991     1014
6    liblr librl bookrl booklr                   18,135     4144
7    liblr librl bookrl booklr way1dl            19,389     1254
8    liblr librl bookrl booklr cafelr[0-1]       76,913     58,788

Table 5
Performance of LCSG algorithm

Query       Error    A* (ms)    DN ED (ms)    DT LCSG (ms)    DN LCSG (ms)
liblr.10    0        10,737     172           26              78
liblr.0     0        9073       164           22              65
wayq        6        9122       223           7               46
libq        12       35,674     2851          14              72

Table 6
Times for approximate match between two graphs

Query    A* (ms)    DN ED (ms)    DT LCSG (ms)    DN LCSG (ms)
15.1     158        23            22              34
15.2     160        22            23              35
15.3     164        22            22              34

Table 4 gives an indication of factors influencing the growth of the decision tree. Using a pruning depth of 5, the number of nodes was found for a number of combinations of clips from the database. Initially, a database containing only the clip liblr was constructed, then a selection of other clips was added. liblr is relatively short, being only slightly over 5 s in length, and containing 12 indexed objects. Other clips added were:

librl: A clip showing the same background as liblr, but with the guide walking in the opposite direction.

bookrl: A clip of similar length and number of objects to the previous clips, but showing a different location, therefore with a largely disjoint object set.

booklr: A clip showing the same background as bookrl, but with the guide walking in the opposite direction.

way1dl: A clip four times the length of the previous clips, containing two fewer objects, with the object set largely disjoint from the previous clips.

cafelr: A clip of similar length to the shorter clips, but containing 19 indexed objects, also with a largely disjoint object set.

Here we see that the addition of a clip containing a similar object set (1-2) causes only a minor increase in the number of nodes, while introducing a clip of similar size with a disjoint object set (1-4) almost doubles the number of nodes. The increase in size is less than double due to the guide and exit being common objects in these clips. The fact that building a decision tree is a deterministic process is displayed by the equal number of nodes at 3 and 5. When a further sequence which shares most objects is added at 6, we see that a moderate increase in nodes occurs, yet noticeably less than the increase caused by the first sequence with that object set.

The final two lines display the pronounced effect of including a clip with an increased number of objects. At 7 in Table 4 a sequence four times the length of the previous sequences has been added. This sequence has only ten objects, and causes the smallest increase in nodes required of any addition. In contrast to this, at 8 we have added only the first two frames from a sequence containing 18 objects, yet the increase in nodes required is far greater than the total number of nodes required for all other clips.

This clearly displays the weakness of the decision tree algorithm. Given a limited number of object labels, this algorithm can be used to detect database pictures or video frames which contain a user query picture with minimal execution time. However, the number of labelled objects need only increase slightly to make the expression of the problem too expensive in space requirement. The original decision tree algorithm is also restricted to exact matching, with no inexact isomorphism detection method available.

5.2. Results of the LCSG algorithm

Table 5 gives the results of four queries performed using a number of inexact isomorphism detection algorithms. The algorithms used in comparison are the A* with lookahead algorithm, the decomposition network algorithm using edit distance as a similarity measure (DN ED) and the decomposition network algorithm using largest common subgraph as a distance measure (DN LCSG) [36]. The first two queries have exact solutions in the database, while wayq and libq have increasing differences (edit distance 6 and 12, respectively). As expected, both algorithms based on edit distance (A* and DN ED) show a large increase in execution time as the error increases. The LCSG-based algorithms show a decrease in execution time for one inexact query, and a time similar to exact times for the other. In all cases the decision tree LCSG algorithm is much faster than any other, and is at least two orders of magnitude faster than the A* algorithm.

Table 7
Times for approximate match against 11 graphs

Query    A* (ms)    DN inexact (ms)    DT LCSG (ms)    DN LCSG (ms)
11.6     282        30                 17              38
12.2     53         15                 5               14
14.1     84         18                 7               16
15.2     642        34                 23              45
15.3     692        36                 22              46
16.1     136        20                 8               20

Table 6 gives the results of the A*, the DN ED and the two LCSG algorithms for approximate matches between a single model graph and an input graph. The model graph and the input graphs have ten nodes each, and have been constructed to cause worst-case performance for the LCSG algorithms. In each case the edit distance network algorithm performs isomorphism detection in a similar time to the decision tree LCSG algorithm, and these are the two fastest algorithms. The decomposition LCSG algorithm is slightly slower than the inexact decomposition algorithm, as expected given the nature of the graphs, while the A* algorithm is much slower even for a single graph.

When the test in Table 6 is extended to multiple model graphs, the performance of the decision tree LCSG algorithm varies little from that for a single model graph. The results in Table 7 are for queries against 11 model graphs, of similar size and structure to the examples used in Table 6. This model set is once again constructed to produce the worst-case performance from the LCSG algorithms. The results show that the time taken for the edit distance network algorithm increases considerably more slowly than that taken for the A* algorithm. The table also shows that the DT LCSG algorithm is essentially unaffected by the additional graphs in the database, outperforming all other algorithms and returning the same times for the queries from Table 6. This is as predicted by the complexity analysis. Given that the data set is constructed to provide worst case performance for the LCSG algorithms, this is good performance.

While the A* algorithm, the inexact network algorithm and the LCSG algorithms do perform different computations, the end results are used for a similar purpose. The DT LCSG algorithm is actually more efficient than all other algorithms even in its worst case. The choice between LCSG algorithms should depend mostly on the characteristics of the problem, which determine the space complexity of the DT LCSG algorithm. Given a model size which makes the decision tree LCSG algorithm possible, it will provide significantly faster similarity retrieval than any other algorithm.

Whether the edit distance or the LCSG algorithm is more appropriate for a given application depends on a number of factors. Just as there are tasks for which LCSG is the preferred measure of graph similarity, there are also tasks for which edit distance is a better measure. Structural properties of the model graphs should also be considered, the preprocessed algorithms being at their best when there is much common structure in the model graphs.
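A hypothetical toy comparison, with counts chosen by hand rather than computed from real graphs, shows how the two measures can rank the same models differently: a model that contains the whole query plus extra structure is penalised by edit distance but favoured by LCSG size, while a smaller model missing part of the query behaves the other way around.

query_size = 5
models = {                # (model size, LCSG size with the query)
    'A': (10, 5),         # contains the whole query plus 5 extra nodes
    'B': (4, 4),          # misses one query node, contains nothing extra
}
for name, (size, lcsg) in models.items():
    # node insertions plus deletions only; edge costs ignored for simplicity
    edit = (query_size - lcsg) + (size - lcsg)
    print(name, 'edit distance:', edit, ' LCSG size:', lcsg)
# A: edit distance 5, LCSG size 5  -> preferred under LCSG
# B: edit distance 1, LCSG size 4  -> preferred under edit distance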

6. Conclusion

This paper presents a new algorithm for detection of the largest common subgraph between two graphs. The algorithm is intended for use with a database of models of which there is prior knowledge, and provides rapid on-line processing of input queries. The contributions of this paper are the examination of the performance of the algorithm and its ancestors over video database data, and the presentation of the new algorithm for largest common subgraph detection.

The alternative algorithms studied are suitable for certain problems in image and video database retrieval; however, each is limited in this application, as discussed. The largest common subgraph algorithm provides suitable solutions for the task of similarity retrieval for images and video in large databases. The strength of the algorithm presented in this paper is its exceptional classification performance, as shown in the results over video database data.

The algorithm presented here makes largest common subgraph detection tractable for large databases of small model graphs. This is a problem that is becoming more common as complex data such as images, video and sound become widely available in large quantities. These types of data require complex relational descriptions, which have traditionally been slow to process for approximate matching. Given prior knowledge of the database of model graphs, the algorithm presented here offers large improvements in classification time for input graphs.

References

[1] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, P. Yanker, Query by image and video content: the QBIC system, IEEE Comput. 28 (1995) 23–32.

[2] J. Ashley, R. Barber, M. Flickner, J. Hafner, D. Lee, W. Niblack, D. Petkovic, Automatic and semi-automatic methods for image annotation and retrieval in QBIC, SPIE Proceedings of Storage and Retrieval for Image and Video Databases III, 1995, pp. 24–35.

[3] A. Nagasaka, Y. Tanaka, Automatic video indexing and full-video search for object appearances, in: E. Knuth, L.M. Wegner (Eds.), Visual Database Systems, Vol. II, No. A-7 in IFIP Transactions, IFIP, Elsevier Science Publishers, September 1992, pp. 113–127.

[4] M. Kass, A. Witkin, D. Terzopoulos, Snakes: active contour models, Proceedings of the First International Conference on Computer Vision, 1987, pp. 259–269.

[5] A.K. Jain, A. Vailaya, Image retrieval using color and shape, Proceedings of the Second Asian Conference on Computer Vision, Vol. II, IEEE, New York, 1995, pp. 529–533.

[6] S. Santini, R. Jain, Similarity matching, Proceedings of the Second Asian Conference on Computer Vision, Vol. II, IEEE, New York, 1995, pp. 544–548.

[7] B. Holt, L. Hartwick, Visual image retrieval for applications in art and art history, in: W. Niblack, R.C. Jain (Eds.), SPIE Proceedings of Storage and Retrieval for Image and Video Databases II, Vol. 2185, SPIE, February 1994, pp. 70–81.

[8] R. Burke, A. Kass, Refining the universal indexing frame to support retrieval of tutorial stories, Indexing and Reuse in Multimedia Systems, Seattle, WA, AAAI, August 1994, pp. 1–11. Workshop Notes.

[9] L.A. Rowe, J.S. Boreczky, C.A. Eads, Indexing for user access to large video databases, in: W. Niblack, R.C. Jain (Eds.), SPIE Proceedings of Storage and Retrieval for Image and Video Databases II, Vol. 2185, San Jose, CA, IS&T and SPIE, February 1994, pp. 150–161.

[10] A.S. Chakravarthy, Towards semantic retrieval of pictures and video, Indexing and Reuse in Multimedia Systems, Seattle, WA, AAAI, August 1994, pp. 12–18. Workshop Notes.

[11] M. Davis, Knowledge representation for video, Indexing and Reuse in Multimedia Systems, Seattle, WA, AAAI, August 1994, pp. 19–28. Workshop Notes.

[12] B.R. Gaines, M.L. Shaw, Concept maps indexing multimedia knowledge bases, Indexing and Reuse in Multimedia Systems, Seattle, WA, AAAI, August 1994, pp. 36–45. Workshop Notes.

[13] S. Abe, Y. Tonomura, H. Kasahara, Scene retrieval method for video database applications using temporal condition changes, International Workshop on Industrial Applications of Machine Intelligence and Vision, IEEE, New York, April 1989, pp. 355–359.

[14] S. Adali, S. Candan, S.-S. Chen, K. Erol, V.S. Subrahmanian, The advanced video information system: data structures and query processing, Multimedia Systems 4 (1996) 172–186.

[15] H. Tamura, N. Yokoya, Image database systems: a survey, Pattern Recognition 17 (1) (1984) 29–43.

[16] M.-C. Yang, 2D B-string representation and access methods of image database, Master's Thesis, Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, July 1990.

[17] T. Arndt, S. Chang, An intelligent image database system, Proceedings of the IEEE Workshop on Visual Languages, IEEE, New York, 1989, pp. 177–182.

[18] K.R. Shearer, S. Venkatesh, D. Kieronska, Spatial indexing for video databases, J. Visual Commun. Image Representation 7 (1997) 325–335.

[19] K.R. Shearer, D. Kieronska, S. Venkatesh, Resequencing video using spatial indexing, J. Visual Languages Comput. 8 (1997) 193–214.

[20] S. Chang, Q. Shi, C. Yan, Iconic indexing by 2D strings, Proceedings of the IEEE Workshop on Visual Languages, Dallas, Texas, USA, June 1986. Also in IEEE Trans. Pattern Anal. Mach. Intell. 9 (1987) 413–428.

[21] S. Lee, F. Hsu, Spatial reasoning and similarity retrieval of images using 2D C-string knowledge representation, Pattern Recognition 25 (3) (1992) 305–318.

[22] S. Chang, E. Jungert, T. Li, Representation and retrieval of symbolic pictures using generalized 2D strings, SPIE Proceedings of Visual Communications and Image Processing IV, Vol. 1199, SPIE, 1989, pp. 1360–1372.

[23] S. Lee, M. Shan, W. Yang, Similarity retrieval of iconic image database, Pattern Recognition 22 (6) (1989) 675–682.

[24] S. Lee, M. Yang, J. Chen, Signature file as a spatial filter for iconic image database, J. Visual Languages Comput. 3 (1992) 373–397.

[25] B.T. Messmer, H. Bunke, Subgraph isomorphism detection in polynomial time on preprocessed model graphs, Second Asian Conference on Computer Vision, 1995, pp. 151–155.

[26] B.T. Messmer, Efficient graph matching algorithms for preprocessed model graphs, Ph.D. Thesis, Institut für Informatik und angewandte Mathematik, Universität Bern, Switzerland, 1995.

[27] G. Levi, A note on the derivation of maximal common subgraphs of two directed or undirected graphs, Calcolo 9 (1972) 341–354.

[28] B.T. Messmer, H. Bunke, A new algorithm for error-tolerant subgraph isomorphism detection, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 493–504.

[29] B.T. Messmer, H. Bunke, Error-correcting graph isomorphism using decision trees, Int. J. Pattern Recognition Artif. Intell. 12 (1998) 721–742.

[30] J.R. Ullman, An algorithm for subgraph isomorphism, J. Assoc. Comput. Mach. 23 (1) (1976) 31–42.

[31] J.F. Allen, Maintaining knowledge about temporal intervals, Commun. ACM 26 (1983) 832–843.

[32] K. Shearer, S. Venkatesh, D. Kieronska, The visitors guide: a simple video reuse application, Int. J. Pattern Recognition Artif. Intell. 11 (2) (1997) 275–301.

[33] T. Arndt, S.-K. Chang, Image sequence compression by iconic indexing, 1989 IEEE Workshop on Visual Languages, IEEE Computer Society, Silver Spring, MD, October 1989, pp. 177–182.

[34] P. Turán, Eine Extremalaufgabe aus der Graphentheorie, Mat. Fiz. Lapok 48 (1941) 436–452.

[35] N.J. Nilsson, Principles of Artificial Intelligence, Tioga, Palo Alto, 1980.

[36] K. Shearer, S. Venkatesh, H. Bunke, An efficient least common subgraph algorithm for video indexing, Proceedings of the International Conference on Pattern Recognition, Vol. II, IAPR, IEEE Computer Society, Silver Spring, MD, August 1998, pp. 1241–1243.

About the Author–KIM SHEARER holds a Bachelor of Science degree, with honours in computer science, from The University of Western Australia. He then held a position as tutor and worked in industry. During this time Kim undertook research in the distributed systems area.

Kim completed a Ph.D. in computing at the Curtin University of Technology, and held a research fellow position there during 1998 and 1999. His area of study is indexing and retrieval for image and video databases, with associated interests in graph theory and other formal representations.

Kim began a position at IDIAP in Switzerland in 2000, working in the area of video annotation and retrieval.

About the Author–HORST BUNKE received his M.S. and Ph.D. degrees in Computer Science from the University of Erlangen, Germany. From 1980 to 1981 he was on postdoctoral leave visiting Purdue University, West Lafayette, Indiana, and in 1983 he held a temporary appointment at the University of Hamburg, Germany. In 1984, he joined the University of Bern, Switzerland, where he is a full professor in the Computer Science Department. He was department chairman from 1992 to 1996. From 1997 to 1998 he was Dean of the Faculty of Science. Horst Bunke has held visiting positions at the IBM Los Angeles Scientific Center (1989), the University of Szeged, Hungary (1991), the University of South Florida at Tampa (1991, 1996, 1998 and 1999), the University of Nevada at Las Vegas (1994), Kagawa University, Takamatsu, Japan (1995), and Curtin University, Perth, Australia (1999).

Horst Bunke is a Fellow and first Vice-President of the International Association for Pattern Recognition (IAPR). He is an associate editor of the International Journal of Document Analysis and Recognition, editor-in-charge of the International Journal of Pattern Recognition and Artificial Intelligence, and editor-in-chief of the book series on Machine Perception and Artificial Intelligence by World Scientific Publ. Co. He has been on the program and organization committees of many conferences and has served as a referee for numerous journals and scientific organizations.

He has more than 300 publications, including 20 books and special editions of journals. His current interests include pattern recognition, machine vision, and artificial intelligence.

About the Author–SVETHA VENKATESH holds a Bachelor of Engineering from Roorkee University and a Master of Technology degree from the Indian Institute of Technology in India. She received her Ph.D. from the University of Western Australia in Perth. She is currently a professor at the School of Computing at Curtin University, Perth. She has worked in the areas of active vision, object recognition systems, and biologically based vision systems, and is currently working on the application of computer vision to image and video database indexing and retrieval. She is a Senior Member of the IEEE, and was the President of the Australian Pattern Recognition Society in 1991. She is the author of about 160 papers, and is the co-author of the book "From Living Eyes to Seeing Machines".
