Approximate graph edit distance computation by means of bipartite graph matching


<ul><li><p>Image and Vision Computing 27 (2009) 950–959</p>
<p>Contents lists available at ScienceDirect</p>
<p>Image and Vision Computing</p>
<p>journal homepage: www.elsevier.com/locate/imavis</p>
<p>Kaspar Riesen *, Horst Bunke</p>
<p>Institute of Computer Science and Applied Mathematics, University of Bern, Neubrückstrasse 10, CH-3012 Bern, Switzerland</p>
<p>Article info</p>
<p>Article history: Received 15 October 2007. Received in revised form 1 February 2008. Accepted 9 April 2008.</p>
<p>Keywords: Graph based representation; Graph edit distance; Bipartite graph matching</p>
<p>Abstract</p>
<p>In recent years, the use of graph based object representation has gained popularity. Simultaneously, graph edit distance emerged as a powerful and flexible graph matching paradigm that can be used to address different tasks in pattern recognition, machine learning, and data mining. The key advantages of graph edit distance are its high degree of flexibility, which makes it applicable to any type of graph, and the fact that one can integrate domain specific knowledge about object similarity by means of specific edit cost functions. Its computational complexity, however, is exponential in the number of nodes of the involved graphs. Consequently, exact graph edit distance is feasible for graphs of rather small size only. In the present paper we introduce a novel algorithm which allows us to approximately, or suboptimally, compute edit distance in a substantially faster way. The proposed algorithm considers only local, rather than global, edge structure during the optimization process. In experiments on different datasets we demonstrate a substantial speed-up of our proposed method over two reference systems. Moreover, it is empirically verified that the accuracy of the suboptimal distance remains sufficiently accurate for various pattern recognition applications.</p>
<p>© 2008 Elsevier B.V. All rights reserved.</p>
<p>1. Introduction</p>
<p>Graph matching refers to the process of evaluating the structural similarity of graphs. A large number of methods for graph matching have been proposed in recent years [1]. The main advantage of a description of patterns by graphs instead of vectors is that graphs allow for a more powerful representation of structural relations. In the most general case, nodes and edges are labeled with arbitrary attributes. One of the most flexible methods for error-tolerant graph matching that is applicable to various kinds of graphs is based on the edit distance of graphs [2,3]. The idea of graph edit distance is to define the dissimilarity of graphs by the amount of distortion that is needed to transform one graph into another. Using the edit distance, an input graph to be classified can be analyzed by computing its dissimilarity to a number of training graphs. For classification, the resulting distance values may be fed, for instance, into a nearest-neighbor classifier. Alternatively, the edit distance of graphs can also be interpreted as a pattern similarity measure in the context of kernel machines, which makes a large number of powerful methods applicable to graphs [4], including support vector machines for classification and kernel principal component analysis for pattern analysis. There are various applications where the edit distance has proved to be suitable for error-tolerant graph matching [5,6].</p>
<p>Yet, the error-tolerant nature of edit distance potentially allows every node of a graph to be mapped to every node of another graph (unlike exact graph matching methods such as subgraph isomorphism or maximum common subgraph). The time and space complexity of edit distance computation is therefore very high. Consequently, the edit distance can be computed for graphs of a rather small size only.</p>
<p>In recent years, a number of methods addressing the high computational complexity of graph edit distance computation have been proposed. In some approaches, the basic idea is to perform a local search to solve the graph matching problem, that is, to optimize local criteria instead of global, or optimal ones [7,8]. In [9], a linear programming method for computing the edit distance of graphs with unlabeled edges is proposed. The method can be used to derive lower and upper edit distance bounds in polynomial time. Two fast but suboptimal algorithms for graph edit distance computation are proposed in [10]. The authors propose simple variants of an optimal edit distance algorithm that make the computation substantially faster. A number of graph matching methods based on genetic algorithms have been proposed [11]. Genetic algorithms offer an efficient way to cope with large search spaces, but are non-deterministic and suboptimal. A common way to make graph matching more efficient is to restrict considerations to special classes of graphs. Examples include the classes of planar graphs [12], bounded-valence graphs [13], trees [14], and graphs with unique vertex labels [15]. Recently, a suboptimal edit distance algorithm has been proposed [5] that requires the nodes of graphs to be planarly embedded, which is satisfied in many, but not all computer vision applications of graph matching.</p>
<p>* Corresponding author. Tel./fax: +41 316318699. E-mail addresses: (K. Riesen), (H. Bunke).</p>
<p>0262-8856/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.imavis.2008.04.004</p></li><li><p>In this paper, we propose a new efficient algorithm for edit distance computation for general graphs. The method is based on an (optimal) fast bipartite optimization procedure mapping nodes and their local structure of one graph to nodes and their local structure of another graph. This procedure is somewhat similar in spirit to the method proposed in [16,17]. However, rather than using dynamic programming for finding an optimal match between the sets of local structure, we make use of Munkres' algorithm [18]. Originally, this algorithm was proposed to solve the assignment problem in polynomial time. In the present paper, however, we generalize the original algorithm to the computation of graph edit distance. In experiments on semi-artificial and real-world data we demonstrate that the proposed method results in a substantial speed-up of the computation of graph edit distance, while at the same time the accuracy of the approximated distances is not much affected.</p>
<p>A preliminary version of the current paper appeared in [19]. The current paper has been significantly extended with respect to the underlying methodology and the experimental evaluation.</p>
<p>2. Graph edit distance computation</p>
<p>In this section we first define our basic notation and then introduce graph edit distance and its computation. Let \(L\) be a finite or infinite set of labels for nodes and edges.</p>
<p>Definition 1. (Graph) A graph \(g\) is a four-tuple \(g = (V, E, \mu, \nu)\), where</p>
<p>• \(V\) is the finite set of nodes,</p>
<p>• \(E \subseteq V \times V\) is the set of edges,</p>
<p>• \(\mu : V \to L\) is the node labeling function, and</p>
<p>• \(\nu : E \to L\) is the edge labeling function.</p>
<p>This definition allows us to handle arbitrary graphs with unconstrained labeling functions. For example, the labels can be given by the set of integers, the vector space \(\mathbb{R}^n\), or a set of symbolic labels \(L = \{a, b, c, \ldots\}\). Moreover, unlabeled graphs are obtained as a special case by assigning the same label \(l\) to all nodes and edges. Edges are given by pairs of nodes \((u, v)\), where \(u \in V\) denotes the source node and \(v \in V\) the target node of a directed edge. Undirected graphs can be modeled by inserting a reverse edge \((v, u) \in E\) for each edge \((u, v) \in E\) with \(\nu(u, v) = \nu(v, u)\).</p>
<p>Graph matching refers to the task of evaluating the dissimilarity of graphs. One of the most flexible methods for measuring the dissimilarity of graphs is the edit distance [2,3]. Originally, the edit distance was proposed in the context of string matching [20]. Procedures for edit distance computation aim at deriving a dissimilarity measure from the number of distortions one has to apply to transform one pattern into the other. The concept of edit distance has been extended from strings to trees and eventually to graphs [2,3]. Similarly to string edit distance, the key idea of graph edit distance is to define the dissimilarity, or distance, of graphs by the minimum amount of distortion that is needed to transform one graph into another. Compared to other approaches to graph matching, graph edit distance is known to be very flexible since it can handle arbitrary graphs and any type of node and edge labels. Furthermore, by defining costs for edit operations, the concept of edit distance can be tailored to specific applications.</p>
<p>A standard set of distortion operations is given by insertions, deletions, and substitutions of both nodes and edges. We denote the substitution of two nodes \(u\) and \(v\) by \(u \to v\), the deletion of node \(u\) by \(u \to \varepsilon\), and the insertion of node \(v\) by \(\varepsilon \to v\). For edges we use a similar notation. Other operations, such as merging and splitting of nodes [21], can be useful in certain applications but are not considered in this paper. Given two graphs, the source graph \(g_1\) and the target graph \(g_2\), the idea of graph edit distance is to delete some nodes and edges from \(g_1\), relabel (substitute) some of the remaining nodes and edges, and insert some nodes and edges in \(g_2\), such that \(g_1\) is finally transformed into \(g_2\). A sequence of edit operations \(e_1, \ldots, e_k\) that transforms \(g_1\) completely into \(g_2\) is called an edit path between \(g_1\) and \(g_2\). In Fig. 1 an example of an edit path between two graphs \(g_1\) and \(g_2\) is given. This edit path consists of three edge deletions, one node deletion, one node insertion, two edge insertions, and two node substitutions.</p>
<p>Fig. 1. A possible edit path between graph \(g_1\) and \(g_2\) (node labels are represented by different shades of grey).</p>
<p>Obviously, for every pair of graphs \((g_1, g_2)\), there exist a number of different edit paths transforming \(g_1\) into \(g_2\). Let \(\Upsilon(g_1, g_2)\) denote the set of all such edit paths. To find the most suitable edit path out of \(\Upsilon(g_1, g_2)\), one introduces a cost for each edit operation, measuring the strength of the corresponding operation. The idea of such a cost function is to define whether or not an edit operation represents a strong modification of the graph. Obviously, the cost function is defined with respect to the underlying node or edge labels.</p>
<p>Clearly, between two similar graphs, there should exist an inexpensive edit path, representing low cost operations, while for dissimilar graphs an edit path with high costs is needed. Consequently, the edit distance of two graphs is defined by the minimum cost edit path between two graphs.</p>
<p>Definition 2. (Graph Edit Distance) Let \(g_1 = (V_1, E_1, \mu_1, \nu_1)\) be the source and \(g_2 = (V_2, E_2, \mu_2, \nu_2)\) the target graph. The graph edit distance between \(g_1\) and \(g_2\) is defined by</p>
<p>\[ d(g_1, g_2) = \min_{(e_1, \ldots, e_k) \in \Upsilon(g_1, g_2)} \sum_{i=1}^{k} c(e_i), \]</p>
<p>where \(\Upsilon(g_1, g_2)\) denotes the set of edit paths transforming \(g_1\) into \(g_2\), and \(c\) denotes the cost function measuring the strength \(c(e_i)\) of edit operation \(e_i\).</p>
<p>The computation of the edit distance is usually carried out by means of a tree search algorithm which explores the space of all possible mappings of the nodes and edges of the first graph to the nodes and edges of the second graph. A widely used method is based on the A* algorithm [22], which is a best-first search algorithm. The basic idea is to organize the underlying search space as an ordered tree. The root node of the search tree represents the starting point of our search procedure, inner nodes of the search tree correspond to partial solutions, and leaf nodes represent complete, though not necessarily optimal, solutions. Such a search tree is constructed dynamically at runtime by iteratively creating successor nodes linked by edges to the currently considered node in the search tree. In order to determine the most promising node in the current search tree, i.e. the node which will be used for further expansion of the desired mapping in the next iteration, a heuristic function is usually used. Formally, for a node \(p\) in the search tree, we use \(g(p)\) to denote the cost of the optimal path from the root node to the current node \(p\), i.e. \(g(p)\) is set equal to the cost of the partial edit path accumulated so far, and we use \(h(p)\) for denoting the estimated cost from \(p\) to a leaf node. The sum \(g(p) + h(p)\) gives the total cost assigned to an open node in the search tree. One can show that, given that the estimation of the future costs \(h(p)\) is lower than, or equal to, the real costs, the algorithm is admissible, i.e. an optimal path from the root node to a leaf node is guaranteed to be found [22].</p></li><li><p>Algorithm 1. Graph edit distance algorithm</p>
<p>Input: Non-empty graphs \(g_1 = (V_1, E_1, \mu_1, \nu_1)\) and \(g_2 = (V_2, E_2, \mu_2, \nu_2)\), where \(V_1 = \{u_1, \ldots, u_{|V_1|}\}\) and \(V_2 = \{v_1, \ldots, v_{|V_2|}\}\)</p>
<p>Output: A minimum cost edit path from \(g_1\) to \(g_2\), e.g. \(p_{min} = \{u_1 \to v_3, u_2 \to \varepsilon, \ldots, \varepsilon \to v_2\}\)</p>
<p>1: Initialize OPEN to the empty set \(\{\}\)</p>
<p>2: For each node \(w \in V_2\), insert the substitution \(\{u_1 \to w\}\) into OPEN</p>
<p>3: Insert the deletion \(\{u_1 \to \varepsilon\}\) into OPEN</p>
<p>4: loop</p>
<p>5: Remove \(p_{min} = \arg\min_{p \in OPEN} \{g(p) + h(p)\}\) from OPEN</p>
<p>6: if \(p_{min}\) is a complete edit path then</p>
<p>7: Return \(p_{min}\) as the solution</p>
<p>8: else</p>
<p>9: Let \(p_{min} = \{u_1 \to v_{i_1}, \ldots, u_k \to v_{i_k}\}\)</p>
<p>10: if \(k &lt; |V_1|\) then</p>
<p>11: For each \(w \in V_2 \setminus \{v_{i_1}, \ldots, v_{i_k}\}\), insert \(p_{min} \cup \{u_{k+1} \to w\}\) into OPEN</p>
<p>12: ...</p>
<p>14: Insert \(p_{min} \cup \bigcup_{w \in V_2 \setminus \{v_{i_1}, \ldots, v_{i_k}\}} \{\varepsilon \to w\}\) into OPEN</p>
<p>15: end if</p>
<p>16: end if</p>
<p>17: end loop</p>
<p>In Algorithm 1 the A*-based method for optimal graph edit distance computation is given. The nodes of the source graph are processed in the order \(u_1, u_2, \ldots\). The deletion (line 12) or the substitution of a node (line 11) are considered simultaneously, which produces a number of successor nodes in the search tree. If all nodes of the first graph have been processed, the remaining nodes of the second graph are inserted in a single step (line 14). The set OPEN of partial edit paths contains the search tree nodes to be processed in the next steps. The most promising partial edit path \(p \in OPEN\), i.e. the one that minimizes \(g(p) + h(p)\), is always chosen first (line 5). This procedure guarantees that the complete edit path found by the algorithm first is always optimal, i.e. has minimal costs among all possible competing paths (line 7).</p>
<p>... tions have to be performed in a complete edit path at least [2].</p>
<p>Note that edit operations on edges are implied by edit operations on their adjacent nodes, i.e. whether an edge is substituted, deleted, or inserted depends on the edit operations performed on its adjacent nodes. Formally, let \(u, u' \in V_1 \cup \{\varepsilon\}\) and \(v, v' \in V_2 \cup \{\varepsilon\}\), and assume that the two node operations \(u \to v\) and \(u' \to v'\) have been executed. We distinguish three cases.</p>
<p>(1) Assume there are edges \(e_1 = (u, u') \in E_1\) and \(e_2 = (v, v') \in E_2\) in the corresponding graphs \(g_1\) and \(g_2\). Then the edge substitution \(e_1 \to e_2\) is implied by the node operations given above.</p>
<p>(2) Assume there is an edge \(e_1 = (u, u') \in E_1\) but there is no edge \(e_2 = (v, v') \in E_2\). Then the edge deletion \(e_1 \to \varepsilon\) is implied by the node operations given above. Obviously, if \(v = \varepsilon\) or \(v' = \varepsilon\) there cannot be any edge \((v, v') \in E_2\) and thus an edge deletion \(e_1 \to \varepsilon\) has to be performed.</p>
<p>(3) Assume there is no edge \(e_1 = (u, u') \in E_1\) but an edge ...</p></li></ul>
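The search in Algorithm 1, together with the implied edge operations, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the heuristic is fixed to h(p) = 0 (trivially admissible, so the search degenerates to uniform-cost search), costs are unit costs, graphs are passed as plain dictionaries, self-loops are assumed absent, and the names `astar_ged`, `edge_op_cost`, `sub_cost`, and `INDEL` are hypothetical. The search starts from the empty edit path rather than from the operations on u1, which yields the same set of successors.

```python
import heapq
from itertools import count

INDEL = 1.0  # assumed unit cost for any node or edge insertion/deletion


def sub_cost(a, b):
    """Assumed substitution cost: free for equal labels, 1 otherwise."""
    return 0.0 if a == b else 1.0


def edge_op_cost(key1, key2, edges1, edges2):
    """Cost of the edge operation implied by two node operations."""
    in1, in2 = key1 in edges1, key2 in edges2
    if in1 and in2:
        return sub_cost(edges1[key1], edges2[key2])  # edge substitution
    if in1 or in2:
        return INDEL  # deletion (edge only in g1) or insertion (only in g2)
    return 0.0


def astar_ged(nodes1, edges1, nodes2, edges2):
    """Exact edit distance by best-first search with h(p) = 0.

    nodes*: dict node_id -> label; edges*: dict (src, dst) -> label.
    Node ids must not be None (None marks a deleted node).
    Returns (distance, mapping u -> v or None, inserted nodes of g2).
    """
    order = list(nodes1)
    tie = count()  # tie-breaker so the heap never compares node ids
    open_heap = [(0.0, next(tie), (), False)]  # (g, tie, targets, complete)
    while open_heap:
        g, _, targets, complete = heapq.heappop(open_heap)
        used = {t for t in targets if t is not None}
        rest = [v for v in nodes2 if v not in used]
        if complete:
            return g, dict(zip(order, targets)), rest
        k = len(targets)
        if k < len(order):
            u = order[k]
            for w in rest + [None]:  # all substitutions, then the deletion
                step = INDEL if w is None else sub_cost(nodes1[u], nodes2[w])
                for u_prev, w_prev in zip(order, targets):
                    step += edge_op_cost((u_prev, u), (w_prev, w), edges1, edges2)
                    step += edge_op_cost((u, u_prev), (w, w_prev), edges1, edges2)
                heapq.heappush(open_heap, (g + step, next(tie), targets + (w,), False))
        else:
            # All nodes of g1 are processed: insert the remaining nodes of g2
            # together with every edge of g2 that touches one of them.
            missing = set(rest)
            step = INDEL * len(rest)
            step += INDEL * sum(1 for s, t in edges2 if s in missing or t in missing)
            heapq.heappush(open_heap, (g + step, next(tie), targets, True))
    raise RuntimeError("unreachable: OPEN always holds at least one path")


# Usage: g2 extends g1 by one node and one edge.
n1, e1 = {1: "a", 2: "b"}, {(1, 2): "x"}
n2, e2 = {10: "a", 20: "b", 30: "c"}, {(10, 20): "x", (20, 30): "x"}
distance, mapping, inserted = astar_ged(n1, e1, n2, e2)
```

Because every partial edit path stays in OPEN until it is expanded, the running time and memory grow exponentially with the number of nodes, which is the limitation that motivates the approximate method of the paper. A tighter admissible h(p) would prune more of the tree without changing the result.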

