


<ul><li><p>*Corresponding author. Tel.: +61-8-9266-2110; fax: +61-8-9266-2819.</p><p>E-mail address: (K. Shearer).</p><p>Pattern Recognition 34 (2001) 1075–1091</p><p>Video indexing and similarity retrieval by largest common subgraph detection using decision trees</p><p>Kim Shearer<sup>a</sup>,*, Horst Bunke<sup>b</sup>, Svetha Venkatesh<sup>a</sup></p><p><sup>a</sup>Department of Computer Science, Curtin University of Technology, GPO Box U1987, Perth 6001, WA, Australia; <sup>b</sup>Institut für Informatik und Angewandte Mathematik, Universität Bern, Switzerland</p><p>Received 14 May 1999; received in revised form 1 March 2000; accepted 1 March 2000</p><p>Abstract</p><p>While the largest common subgraph (LCSG) between a query and a database of models can provide an elegant and intuitive measure of similarity for many applications, it is computationally expensive to compute. Recently developed algorithms for subgraph isomorphism detection take advantage of prior knowledge of a database of models to improve the speed of on-line matching. This paper presents a new algorithm based on similar principles to solve the largest common subgraph problem. The new algorithm significantly reduces the computational complexity of detection of the LCSG between a known database of models and a query given on-line. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.</p><p>Keywords: Graph matching; Similarity retrieval; Video indexing; Decision tree</p><p>1. Introduction</p><p>With the advent of large on-line databases of images and video, it is becoming increasingly important to address the issues of effective indexing and retrieval. The majority of existing techniques in the area of image and video indexing can be broadly categorised into low- and high-level techniques. Low-level techniques use attributes such as colour and texture measures to encode an image, and perform image similarity retrieval by comparing vectors of such attributes [1–6]. 
High-level techniques use semantic information about the meaning of the pictures to describe an image, and therefore require intensive manual annotation [7–14]. An additional method of image indexing is by qualitative spatial relationships between key objects. This form of index has been successfully applied to indexing and retrieval of image data [15–17]. Recent work by Shearer and Venkatesh [18,19] has extended the range of this representation to encoding of video data. With the object information proposed for inclusion in the MPEG-4 standard, such indexes will be simpler to implement. It should be noted that none of these indexes provides a comprehensive retrieval method, but they may be considered complementary methods, each being one component of a retrieval toolkit.</p><p>When images or video are encoded using spatial information, searches for exact pictorial or subpicture matches may be performed using a compact encoding such as 2D strings [20–22]. Similarity retrieval, however, is best expressed as an inexact isomorphism detection between graphs representing two images [23,24]. Images are encoded as graphs by representing each object in an image as a vertex in the graph, and placing an edge, labelled with the spatial relationship between the two corresponding objects, between each pair of vertices. Note that this is only one plausible representation of images using graphs. Other representations are possible, and in some cases will be preferable. Indeed, any relational structure represented as a graph could make use of the algorithms discussed in this paper.</p><p>0031-3203/01/$20.00 © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(00)00048-0</p><p>Inexact isomorphism detection between two graphs can be performed using one of two measures of similarity: edit distance or largest common subgraph. 
While there are well-known algorithms to solve these problems, these algorithms are exponential in time complexity in the general case. This is a disadvantage when retrieval is likely to involve browsing of a large database, then progressive refinement.</p><p>Recently, new graph isomorphism algorithms have been developed by Messmer and Bunke [25,26], which use a priori knowledge of the database of model graphs to build an efficient index off-line. This index takes advantage of similarity between models to reduce execution time. Messmer and Bunke propose algorithms to solve the subgraph isomorphism problem, and to solve the inexact subgraph isomorphism problem using an edit distance measure. While subgraph isomorphism is often used as a measure of similarity between images, the edit distance measure is not suitable for image and video indexing by spatial relationships.</p><p>The major difficulty when applying edit distance methods to the image similarity problem is that there is no clear interpretation for the edit operations. Deletion of a vertex implies the exclusion of an object from the matching part of two graphs, while altering the label of an edge represents altering the relationship between two objects; there is no meaningful comparison between these two operations. This means that any similarity measure based on graph edit distance will return a value with little or no physical significance. Inexact isomorphism algorithms based on an edit distance measure also suffer from bias when used to compare an input against multiple model graphs. By the nature of edit distance algorithms, if the input graph is smaller than the models, smaller model graphs are more likely to be chosen as similar.</p><p>A more appropriate measure of image similarity is the largest common subgraph between the graphs representing the images. Largest common subgraph is a simple and intuitive measure of image similarity. 
The largest common subgraph between two graphs G₁ and G₂, encoding two images I₁ and I₂, represents the largest collection of objects found in I₁ and I₂ that maintain the same relationships to each other in both images. The usual algorithm for determining largest common subgraphs is the maximal clique detection method proposed by Levi [27].</p><p>The maximal clique detection algorithm is efficient in the best case, but in the worst case requires O((nm)ⁿ) time, where n is the number of vertices in the input graph and m the number of vertices in the model, for a complete graph with all vertices sharing the same label. This high computational complexity makes it difficult to apply indices which are based on spatial relationships to large databases.</p><p>In this paper we describe a new algorithm for largest common subgraph detection. The algorithm has been developed from an algorithm originally proposed by Messmer and Bunke [28,29]. As with the original algorithm, this algorithm uses a preprocessing step, applied to the models contained in the database, to provide rapid classification at run time. The proposed algorithm is considerably more efficient in time than any previous algorithm for largest common subgraph detection; however, the space complexity of the algorithm somewhat restricts its application. If the difference in time complexity is considered, from O((nm)ⁿ) for the usual algorithm, matching an input of size n against a database of models of size m, to the new algorithm's complexity of O(2ⁿn³), there is room to perform space saving operations while still significantly outperforming typical algorithms. 
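The maximal clique formulation due to Levi [27] can be illustrated with a short sketch. The Python fragment below is ours, not the paper's: a graph is a pair of dictionaries (vertex labels, edge labels), the association graph of two such graphs is built, and its largest clique, found here by a naive branch-and-bound search, gives a largest common subgraph. The exponential worst case discussed above shows up in the clique search.

```python
from itertools import combinations

def association_graph(g1, g2):
    """Association (product) graph of two labelled graphs.

    A graph is (labels, edges): labels maps vertex -> label, edges maps
    frozenset({u, v}) -> edge label.  Association vertices are pairs (u, v)
    with equal vertex labels; two pairs are adjacent when the corresponding
    edges carry the same label (or are both absent).
    """
    (l1, e1), (l2, e2) = g1, g2
    verts = [(u, v) for u in l1 for v in l2 if l1[u] == l2[v]]
    adj = {p: set() for p in verts}
    for (u1, v1), (u2, v2) in combinations(verts, 2):
        if u1 != u2 and v1 != v2 and \
           e1.get(frozenset({u1, u2})) == e2.get(frozenset({v1, v2})):
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj

def largest_clique(adj):
    """Exhaustive branch-and-bound clique search (exponential worst case)."""
    best = []
    def expand(clique, cand):
        nonlocal best
        if len(clique) + len(cand) <= len(best):
            return                            # bound: cannot beat current best
        if not cand:
            best = clique                     # clique cannot be extended here
            return
        v = next(iter(cand))
        expand(clique + [v], cand & adj[v])   # branch 1: include v
        expand(clique, cand - {v})            # branch 2: exclude v
    expand([], set(adj))
    return best
```

For two pictures encoded as graphs, `largest_clique(association_graph(g1, g2))` returns a set of vertex pairings whose size is the size of the LCSG; the decision-tree algorithm the paper develops replaces this on-line search with preprocessing of the model database.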
One example of space saving is to depth-limit the decision tree, such that searching is halted when a certain depth is reached, and matching between the few graphs left to separate is completed using an algorithm such as Ullman's [30], instantiated from the matching already performed.</p><p>While largest common subgraph has been applied mostly to image similarity retrieval, the application of these new algorithms to indexing by spatial relationships can also take advantage of the high degree of common structure expected in video databases. In this paper we treat a video as a simple sequence of images. Even with this straightforward treatment it is possible to provide a similarity retrieval scheme that is extremely efficient, due to the high degree of common structure encountered between frames in video. Many frames of a video will be classified by one element of the classification structure used. This removes the impediment of slow similarity retrieval for indices of this type. Furthermore, the algorithms may be applied to labelled, attributed, directed or undirected graphs. Such graphs have a great deal of expressive power, and may be applied to other encodings of image and video information, or other data sets of relational structures.</p><p>This paper begins by providing definitions for the key graph-related concepts. The following sections then briefly describe the encoding used for image and video information. The precursor to the new algorithm is then explained, followed by an explanation of the new algorithm. The results section compares the new algorithm with other available algorithms and examines the space complexity over an image database.</p><p>1.1. Definitions</p><p>Definition 1. A graph is a 4-tuple G = (</p></li><li><p>Definition 2. 
Given a graph G = (</p></li><li><p>Table 1. Possible interval relationships, due to Allen [31]</p><p>Less than: a < b; Meets: a | b; Overlaps: a / b; Ends: a ] b; Contains: a % b; Begins: a [ b; Equals: a = b; Begins inverse: a [′ b; Contains inverse: a %′ b; Ends inverse: a ]′ b; Overlaps inverse: a /′ b; Meets inverse: a |′ b; Less than inverse: a <′ b</p><p>… that a and b are a type-0 match, with the additional condition that the objects have the same orthogonal relationships. Insisting on matching orthogonal relationships in addition to a type-0 match restricts type-1 matching to a similar rotational position.</p><p>Two pictures P and Q are said to be a type-n match if:</p><p>1. For each object oᵢ ∈ P there exists an oⱼ ∈ Q such that oᵢ ≡ oⱼ, and for each object oⱼ ∈ Q there exists an oᵢ ∈ P such that oⱼ ≡ oᵢ.</p><p>2. For all object pairs oᵢ, oⱼ, where oᵢ ∈ P and oⱼ ∈ Q, oᵢ and oⱼ are a type-n match.</p><p>In many cases two pictures may not be complete matches, or even share the same object set, but we may be interested in partial matches. The search for partial matches, or subpicture matches, is referred to as similarity retrieval. A subpicture match between P and Q is defined as:</p><p>1. For all objects oᵢ ∈ Q′, where Q′ ⊆ Q, there exists an object oⱼ ∈ P′, where P′ ⊆ P, such that oᵢ ≡ oⱼ.</p><p>2. For all objects oⱼ ∈ P′, where P′ ⊆ P, there exists an object oᵢ ∈ Q′, where Q′ ⊆ Q, such that oⱼ ≡ oᵢ.</p><p>3. For all object pairs oᵢ, oⱼ, where oᵢ ∈ P′ and oⱼ ∈ Q′, oᵢ and oⱼ are a type-n match.</p><p>Thus a subpicture match occurs when a subset of the objects in a picture P are found in a picture Q, with the spatial relationships being a type-n match.</p><p>When searching a video database, if there are no complete picture matches, we will be interested in the largest subpicture match that may be found. 
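The thirteen relationships of Table 1 can be computed for a single axis directly from interval endpoints. The sketch below is illustrative Python written against our reading of the table (in particular, we take a % b to mean that a contains b, and the relation names follow the table rather than Allen's original terminology); the function and its conventions are ours, not the paper's.

```python
def allen_relation(a, b):
    """Classify closed intervals a = (a1, a2) and b = (b1, b2) into one of
    the thirteen interval relations named in Table 1."""
    (a1, a2), (b1, b2) = a, b
    assert a1 < a2 and b1 < b2, "intervals must have positive extent"
    if (a1, a2) == (b1, b2):
        return "equals"
    forward = {
        "less than": a2 < b1,            # a lies entirely before b
        "meets":     a2 == b1,           # a ends exactly where b starts
        "overlaps":  a1 < b1 < a2 < b2,  # partial overlap, a first
        "begins":    a1 == b1 and a2 < b2,
        "ends":      a2 == b2 and a1 > b1,
        "contains":  a1 < b1 and b2 < a2,
    }
    for name, holds in forward.items():
        if holds:
            return name
    # exactly one of the 13 relations holds between any two intervals,
    # so the swapped pair must satisfy one of the forward relations
    return allen_relation(b, a) + " inverse"
```

Since exactly one relation holds for any pair of intervals, the six forward tests plus equality and the swapped call cover all thirteen cases.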
The classic methods for solving this problem cast the problem as subgraph isomorphism detection. When applied to image or video retrieval, the largest common subgraph method will find the largest subpicture in common between a query image and a database of images. Whether this is the desired retrieval result depends upon a number of factors.</p><p>The largest common picture may not contain one or more key objects which are considered essential by the user, or it may contain additional objects or have general characteristics which are unsuitable. For this reason a ranked list of the closest matching images is generally returned. Modification of the query, in conjunction with selection of a more exact, or less exact, matching type may then be utilised to refine the retrieval set.</p></li><li><p>Fig. 1. Two pictures with possible graph encodings.</p><p>Variation of the type of matching employed allows either</p><p>• restriction of the retrieval set by increased exactitude in matching,</p><p>• expansion of the retrieval set by reduced exactitude in matching.</p><p>This, combined with small alterations to the query, allows a rapid focus on the desired results during retrieval. The qualitative nature of the relationships between objects and the less exact matching types allow noise to be accommodated in the browsing and query refinement process. Retrieval for this method is well defined and exact: the image in the database with the greatest number of objects in relationships which match the query will be the first model retrieved. How useful the retrieval method is depends on the form of retrieval desired and the flexibility of the three matching types. A discussion of usefulness and accuracy for a specific video retrieval application can be found in earlier work by the authors [19,32].</p><p>3. Encoding videos</p><p>Video may be encoded using techniques from image database work, adapted to the task of capturing motion in video [18,33]. 
In order to keep the representation efficient in space usage, only changes in the video should be represented, rather than duplicating information between frames. This can be achieved by representing the initial frame of a video sequence using 2D strings, then encoding only the changes in spatial relationships between objects for the rest of the sequence. The earlier work performed by this group [18] encodes changing relationships such that matching can be performed directly from the abbreviated notation, without expansion of the notation. This encoding of video using spatial relationships allows graph algorithms to be applied to similarity retrieval.</p><p>In order to use a graph algorithm to solve a particular problem, it is first necessary to encode the operands of the problem as graphs. In this case the operands are either digital pictures or frames from a video. These frames are indexed by the spatial relationships of key objects. The logical graph encoding for such information has graph vertices representing objects, with edges labelled by the relationships between the objects. For the task of finding graph isomorphisms this leads to a complete labelled graph, as deduction of relationships at run time would be prohibitively time consuming. In practice, edges are labelled with the relationship category (Definition 6) of the relationships between the two objects, with the actual relationship along each axis used as attributes of the edge. This representation is more efficient, as all matching types require that two objects are at least a type-0 match, that is, that they have the same relationship category. Matching can therefore be performed by testing the edge label, or relationship category, and proceeding to the attributes only if type-1 or type-2 matching is required and the labels are equivalent.</p><p>Fig. 1 shows two pictures and the graphs which represent them. There is a clear similarity between the two pictures, the most obvious being between the subpictures composed of objects A–C. In the context of qualitative reasoning, the object B does not move with respect to A and C, as the relationships between them do not change. An examination of the two graphs reveals that if we remove the vertex for object D, and all arcs leading to or from that vertex, then the remaining pa...</p></li><li><p>Fig. 2. Graph with adjacency matrix.</p><p>Fig. 3. Row column elements of a matrix.</p></li></ul>
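The complete labelled graph described in Section 3 can be sketched as follows. This is an illustrative Python fragment under our own simplifications: a coarse four-way axis relation stands in for the full set of Table 1, the edge label is the pair of per-axis relations (the relationship category), and all names are ours rather than the paper's.

```python
from itertools import combinations

def axis_relation(a, b):
    """Coarse qualitative relation between two 1-D extents; a simplified
    stand-in for the per-axis interval relations of Table 1."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if (a1, a2) == (b1, b2):
        return "same"
    return "overlap"

def picture_graph(objects):
    """Encode a picture as a complete labelled graph.

    `objects` maps an object name to its ((x1, x2), (y1, y2)) extents.
    Vertices are the object names; every pair of vertices gets an edge
    labelled with the pair of per-axis relations, i.e. the relationship
    category used as the edge label in the text.
    """
    vertices = sorted(objects)
    edges = {}
    for u, v in combinations(vertices, 2):
        (ux, uy), (vx, vy) = objects[u], objects[v]
        edges[(u, v)] = (axis_relation(ux, vx), axis_relation(uy, vy))
    return vertices, edges
```

Matching two such graphs then compares edge labels first, descending to the per-axis attributes only when type-1 or type-2 matching is required, as described above.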

