Approximate graph edit distance computation by means of bipartite graph matching

Embed Size (px)

Text of Approximate graph edit distance computation by means of bipartite graph matching

  • ti

    asse

    f gredpattitsntegcomquwence

    is emprically veried that the accuracy of the suboptimal distance remains sufciently accurate for var-

    ocess oumbernt yearaphs iresentes andexibl

    the edit distance of graphs can also be interpreted as a pattern sim-ilarity measure in the context of kernel machines, which makes alarge number of powerful methods applicable to graphs [4], includ-ing support vector machines for classication and kernel principalcomponent analysis for pattern analysis. There are various applica-tions where the edit distance has proved to be suitable for error-tolerant graph matching [5,6].

    authors propose simple variants of an optimal edit distance algo-rithm that make the computation substantially faster. A numberof graph matching methods based on genetic algorithms havebeen proposed [11]. Genetic algorithms offer an efcient wayto cope with large search spaces, but are non-deterministicand suboptimal. A common way to make graph matching moreefcient is to restrict considerations to special classes of graphs.Examples include the classes of planar graphs [12], bounded-va-lence graphs [13], trees [14], and graphs with unique vertex la-bels [15]. Recently, a suboptimal edit distance algorithmhas been proposed [5] that requires the nodes of graphs to be

    * Corresponding author. Tel./fax: +41 316318699.E-mail addresses: riesen@iam.unibe.ch (K. Riesen), bunke@iam.unibe.ch (H.

    Image and Vision Computing 27 (2009) 950959

    Contents lists availab

    io

    elsBunke).erant graph matching that is applicable to various kinds of graphsis based on the edit distance of graphs [2,3]. The idea of graph editdistance is to dene the dissimilarity of graphs by the amount ofdistortion that is needed to transform one graph into another.Using the edit distance, an input graph to be classied can be ana-lyzed by computing its dissimilarity to a number of traininggraphs. For classication, the resulting distance values may befed, for instance, into a nearest-neighbor classier. Alternatively,

    have been proposed. In some approaches, the basic idea is toperform a local search to solve the graph matching problem, thatis, to optimize local criteria instead of global, or optimal ones[7,8]. In [9], a linear programming method for computing theedit distance of graphs with unlabeled edges is proposed. Themethod can be used to derive lower and upper edit distancebounds in polynomial time. Two fast but suboptimal algorithmsfor graph edit distance computation are proposed in [10]. The1. Introduction

    Graph matching refers to the prtural similarity of graphs. A large nmatching have been proposed in recetage of a description of patterns by ggraphs allow for a more powerful reptions. In the most general case, nodarbitrary attributes. One of the most0262-8856/$ - see front matter 2008 Elsevier B.V. Adoi:10.1016/j.imavis.2008.04.004ious pattern recognition applications. 2008 Elsevier B.V. All rights reserved.

    f evaluating the struc-of methods for graphrs [1]. The main advan-nstead of vectors is thatation of structural rela-edges are labeled withe methods for error-tol-

    Yet, the error-tolerant nature of edit distance potentially allowsevery node of a graph to be mapped to every node of another graph(unlike exact graph matching methods such as subgraph isomor-phism or maximum common subgraph). The time and space com-plexity of edit distance computation is therefore very high.Consequently, the edit distance can be computed for graphs of arather small size only.

    In recent years, a number of methods addressing the highcomputational complexity of graph edit distance computationrather than global, edge structure during the optimization process. In experiments on different datasetswe demonstrate a substantial speed-up of our proposed method over two reference systems. Moreover, itApproximate graph edit distance computa

    Kaspar Riesen *, Horst BunkeInstitute of Computer Science and Applied Mathematics, University of Bern, Neubrckstr

    a r t i c l e i n f o

    Article history:Received 15 October 2007Received in revised form 1 February 2008Accepted 9 April 2008

    Keywords:Graph based representationGraph edit distanceBipartite graph matching

    a b s t r a c t

    In recent years, the use ograph edit distance emergaddress different tasks inof graph edit distance areand the fact that one can icic edit cost functions. Itsthe involved graphs. Conseonly. In the present papertimally, compute edit dista

    Image and Vis

    journal homepage: www.ll rights reserved.on by means of bipartite graph matching

    10, CH-3012 Bern, Switzerland

    aph based object representation has gained popularity. Simultaneously,as a powerful and exible graph matching paradigm that can be used toern recognition, machine learning, and data mining. The key advantageshigh degree of exibility, which makes it applicable to any type of graph,rate domain specic knowledge about object similarity by means of spe-putational complexity, however, is exponential in the number of nodes of

    ently, exact graph edit distance is feasible for graphs of rather small sizeintroduce a novel algorithm which allows us to approximately, or subop-in a substantially faster way. The proposed algorithm considers only local,

    le at ScienceDirect

    n Computing

    evier .com/locate / imavis

  • planarly embedded, which is satised in many, but not all com-puter vision applications of graph matching.

    In this paper, we propose a new efcient algorithm for edit dis-

    the concept of edit distance can be tailored to specicapplications.

    K. Riesen, H. Bunke / Image and Vision Computing 27 (2009) 950959 951tance computation for general graphs. The method is based on an(optimal) fast bipartite optimization procedure mapping nodesand their local structure of one graph to nodes and their local struc-ture of another graph. This procedure is somewhat similar in spirit tothe method proposed in [16,17]. However, rather than using dy-namic programming for nding an optimal match between the setsof local structure, we make use of Munkres algorithm [18]. Origi-nally, this algorithm has been proposed to solve the assignmentproblem in polynomial time. However, in the present paper we gen-eralize the original algorithm to the computation of graph edit dis-tance. In experiments on semi-articial and real-world data wedemonstrate that the proposed method results in a substantialspeed-up of the computation of graph edit distance, while at thesame time the accuracy of the approximated distances is not muchaffected.

    A preliminary version of the current paper appeared in [19]. Thecurrent paper has been signicantly extended with respect to theunderlying methodology and the experimental evaluation.

    2. Graph edit distance computation

    In this section we rst dene our basic notation and then intro-duce graph edit distance and its computation. Let L be a nite orinnite set of labels for nodes and edges.

    Denition 1. (Graph) A graph g is a four-tuple g V ; E; l; m,where

    V is the nite set of nodes, E#V V is the set of edges, l : V ! L is the node labeling function, and m : E! L is the edge labeling function.

    This denition allows us to handle arbitrary graphs with uncon-strained labeling functions. For example, the labels can be given bythe set of integers, the vector space Rn, or a set of symbolic labelsL fa; b; c; . . .g. Moreover, unlabeled graphs are obtained as a spe-cial case by assigning the same label l to all nodes and edges. Edgesare given by pairs of nodes u; v, where u 2 V denotes the sourcenode and v 2 V the target node of a directed edge. Undirectedgraphs can be modeled by inserting a reverse edge v;u 2 E foreach edge u; v 2 E with mu; v mv;u.

    Graph matching refers to the task of evaluating the dissimilar-ity of graphs. One of the most exible methods for measuring thedissimilarity of graphs is the edit distance [2,3]. Originally, theedit distance has been proposed in the context of string matching[20]. Procedures for edit distance computation aim at deriving adissimilarity measure from the number of distortions one has toapply to transform one pattern into the other. The concept of editdistance has been extended from strings to trees and eventuallyto graphs [2,3]. Similarly to string edit distance, the key idea ofgraph edit distance is to dene the dissimilarity, or distance, ofgraphs by the minimum amount of distortion that is needed totransform one graph into another. Compared to other approachesto graph matching, graph edit distance is known to be very ex-ible since it can handle arbitrary graphs and any type of node andedge labels. Furthermore, by dening costs for edit operations,

    g1Fig. 1. A possible edit path between graph g1 and g2 (nodA standard set of distortion operations is given by insertions,deletions, and substitutions of both nodes and edges. We denotethe substitution of two nodes u and v by u ! v, the deletion ofnode u by u! e, and the insertion of node v by e ! v. For edgeswe use a similar notation. Other operations, such as merging andsplitting of nodes [21], can be useful in certain applications butare not considered in this paper. Given two graphs, the sourcegraph g1 and the target graph g2, the idea of graph edit distanceis to delete some nodes and edges from g1, relabel (substitute)some of the remaining nodes and edges, and insert some nodesand edges in g2, such that g1 is nally transformed into g2. A se-quence of edit operations e1; . . . ; ek that transform g1 completelyinto g2 is called an edit path between g1 and g2. In Fig. 1 an exam-ple of an edit path between two graphs g1 and g2 is given. This editpath consists of three edge deletions, one node deletion, one nodeinsertion, two edge insertions, and two node substitutions.

    Obviously, for every pair of graphs (g1; g2), there exist a numb