Upload
francesc
View
216
Download
1
Embed Size (px)
Citation preview
Accepted Manuscript
Fast Computation of Bipartite Graph Matching
Francesc Serratosa
PII: S0167-8655(14)00133-0DOI: http://dx.doi.org/10.1016/j.patrec.2014.04.015Reference: PATREC 6004
To appear in: Pattern Recognition Letters
Received Date: 13 December 2013
Please cite this article as: Serratosa, F., Fast Computation of Bipartite Graph Matching, Pattern RecognitionLetters (2014), doi: http://dx.doi.org/10.1016/j.patrec.2014.04.015
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customerswe are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, andreview of the resulting proof before it is published in its final form. Please note that during the production processerrors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
1
Graphical Abstract (Optional) To create your abstract, type over the instructions in the template box below. Fonts or abstract dimensions should not be changed or altered.
Research Highlights (Required)
To create your highlights, please type over the instructions in the template box below:
It should be short collection of bullet points that convey the core findings of the article. It
should include 3 to 5 bullet points (maximum 85 characters, including spaces, per bullet point.) Please find below an example:
• A sub-optimal algorithm is presented to compute the graph edit distance.
• It obtains exactly the same distance value than the Bipartite algorithm. • The computational cost is lower than the Bipartite one.
• Validation shows an important decrease on run time
Fast Computation of Bipartite Graph
Matching
Francesc Serratosa
Leave this area blank for abstract info.
1
Pattern Recognition Letters j ou r n a l h om ep a ge : w w w .e ls e v ie r .c om
Fast Computation of Bipartite Graph Matching
Francesc Serratosaa
a Universitat Rovira i Virgili Tarragona, Catalonia, Spain.
1. Introduction
Attributed Graphs have been of crucial importance in pattern recognition throughout more than 3 decades [1]. Graphs have been used to model several kinds of problems in some pattern recognition fields such as object recognition [2] and [3], scene view alignment [4], [5], [6], [7], multiple object alignment [8], [9], object characterization [10], [11] among a great amount of other applications. Interesting reviews of techniques and applications are [12], [13] and [14]. If elements in pattern recognition are modelled through attributed graphs, error-tolerant graph-matching algorithms are needed that aim to compute a matching between nodes of two attributed graphs that minimizes some kind of objective function. Unfortunately, the time and space complexity to compute the minimum of these objective functions is very high. For this reason, a lot of research has been done on trying to reduce as much as possible the run time of the graph-matching algorithms through sub-optimal techniques. Since its presentation, Bipartite algorithm [15] has been considered one of the best graph-matching algorithms due to it obtains a sub-optimal distance value almost near to the optimal
one but with a considerable decrease on the run time.
This paper presents a variant of the Bipartite algorithm [15] that obtains exactly the same distance value but with a reduced run time. Experimental validation shows a Speed up of 5 on well-known databases and higher Speed up on synthetically generated graphs. In fact, higher is the order of the graphs, higher is also the Speed up of our algorithm. This property is interesting since in the next years, we will see a need on representing the objects (social nets, scenes, proteins…) on larger structures.
The outline of the paper is as follows, in the next section, we define the attributed graphs and the graph-edit distance. On section 3, we explain how to compute the graph edit distance and we concretize on the Bipartite algorithm. Finally, on section 4,
we present our new method and we schematically show our algorithm. On section 5, we show the experimental validation and we finish the article with some conclusions.
2. Attributed Graphs and Graph Edit Distance
In this section, we first define the Attributed Graphs, Cliques
and Graph matching and then we explain the Graph Edit Distance.
2.1. Definitions
Attributed Graph and Cliques: Let ��and ��denote the domains of possible values for attributed vertices and arcs, respectively. An attributed graph (over �� and ��) is defined by a tuple � � ���, ��, �, ��, where �� � �� | � �,… , �� is the set of vertices (or nodes), �� � ��� �| , � ∈ �,… , �� is the set of arcs (or edges), �: �� → �� assigns attribute values to vertices and �: �� → �� assigns attribute values to arcs. The order of graph � is �.
We define a clique � on an attributed graph � as a local structure composed of a node and its outgoing edges � ��� , ��� �|� ∈ �,… , ��, �, ��. Error correcting graph isomorphism between graphs: Let
G� � �Σ��, Σ��, γ��, γ��� and G � �Σ� , Σ� , γ� , γ� � be two attributed
graphs of initial order n and m. To allow maximum flexibility in
the matching process, graphs are extended with null nodes [16] to
be of order n# m. We will refer to null nodes of G� and G by
Σ$�� ⊆ Σ�� and Σ$� ⊆ Σ� respectively. We assume null nodes have
indices & ∈ '( # 1,… , ( #*� and + ∈ '* # 1,… , ( #*� for
graphs G� and G , respectively. Let T be a set of all possible
bijections between two vertex sets �� and � . Bijection
f �, : Σ�� → Σ� , assigns one vertex of G� to only one vertex of G .
ARTIC LE INFO ABSTRAC T
Article history:
Received
Received in revised form
Accepted
Available online
We present a new algorithm to compute the Graph Edit Distance in a sub-optimal way. We
demonstrate that the distance value is exactly the same than the one obtained by the algorithm
called Bipartite but with a reduced run time. The only restriction we impose is that the edit costs
have to be defined such that the Graph Edit Distance can be really defined as a distance function,
that is, the cost of insertion plus deletion of nodes (or arcs) have to be lower or equal than the
cost of substitution of nodes (or arcs). Empirical validation shows that higher is the order of the
graphs, higher is the obtained Speed up.
2012 Elsevier Ltd. All rights reserved.
2 The bijection between arcs, denoted by f��, , is defined accordingly to the bijection of their terminal nodes. In other
words:
./0,123450 6 � 3781 ⇒ .0,1�:40� � :71 ∧ .0,12:506 � :81 (1)
:40, :50 ∈ Σ�� − Σ$�� and :71 , :81 ∈ Σ� − Σ$�
We define the non-existent or null edges by Σ$�� ⊆ Σ��and
Σ$� ⊆ Σ� .
2.2. Graph Edit Distance between two graphs
Graph Edit Distance [17, 18, 19] is the most used method to
solve the error-tolerant graph matching. It is based on defining a
distance between attributed graphs through the minimum modifications that are required to transform one attributed graph
into the other. To do so, it is needed to define these
modifications, which are called edit operations. Basically, six
different edit operations have been defined: insertion, deletion and substitution of both nodes and edges. In this way, for every
pair of attributed graphs (=0 and =1), there is an edit path
3>+?@&?ℎ�=0, =1� � �BC, … , BD� (where each B7 denotes an edit operation) that transforms one graph into the other. Given two
graphs, there is a set of edit paths, which we name E, that each of them transforms one of the graphs into the other. Figure 1 shows
the edit path that transforms =0 into =1. It is composed of the following 5 edit operations: Delete edge, Delete node, Insert
node, Insert edge and Substitute Node. The substitution operation is needed since the attributes of both nodes are different.
Edit cost functions have been introduced to quantitatively evaluate which edit path is the best. The aim of these functions is
to assign a penalty cost to each edit operation according to the amount of modification that it introduces in the transformation
sequence.
Figure 1. One of the edit paths that transforms =0 into =1.
Given two attributed graphs to be compared, we can define a
graph bijection .0,1 ∈ F between them and also we can relate this
bijection with the edit path, 3>+?@&?ℎ�=0, =1� ∈ E. To do so, we
can assume the edit operation called Substitution simply
represents node-to-node assignations. Moreover, the edit operations Deletion and Insertion are transformed to mappings of
non-null nodes of the first or second graph to null nodes of the
second or first graph. Using this transformation, given two
graphs, =0 and =1, and a bijection between their nodes, .0,1, the graph edit cost is given by (Definition 7 of [20]):
G>+?HIJ?�=0,=1, .0,1� � K HLM2:40, :716LNO∈PQORSTQOLUV∈SQVRSTQV
#
∑ HLX2:40, :716LNO∈PQORSTQOLUV∈STQV
#∑ HL72:40, :716LNO∈STQOLUV∈SQVRSTQV
# (2)
K H/M23450 , 3781 6/NYO ∈PZORSTZO/U[V∈SZVRSTZV
# K H/X23450 , 3781 6/NYO ∈PZORSTZO/U[V∈STZV
# K H/723450 , 3781 6/NYO ∈STZO
/U[V∈SZVRSTZV
.0,1�:40� � :71 and ./0,123470 6 � 3781
where the edit costs are: HLM: Cost of substituting node :40 of =0
for node .0,1�:40� of =1. HLX: Cost of deleting node :40 of =0.
HL7: Cost of inserting node :71 of =1. And for edges, H/M: Cost of
substituting edge 3450 of graph =0 for edge ./0,123450 6 of =1. H/X:
Cost of assigning edge 3450 of =0 to a non-existing edge of =1.
H/7: Cost of assigning edge 3451 of =1 to a non-existing edge of
=0. The cost of mapping two null nodes or two null arcs is always defined to be zero. For this reason, we have not
considered this case in the equation. Figure 2 shows the obtained
labelling given the edit path presented in figure 1. The cost of
this labelling is: G>+?HIJ?�=0, =1 , .0,1� � H/X #HLX # HL7 #H/7 # HLM.
Figure 2. Labelling .0,1 between =0 and =1 given the edit path of fig. 2.
Finally, the Graph Edit Distance is defined as the minimum cost
under any bijection in F:
G>+?\+J?&(]3�=0, =1� � min_O,V∈` G>+?HIJ?�=0, =1 , .0,1� (3)
Using this definition, the Graph Edit Distance directly
depends on parameters or functions HLM, HLX, HL7 , H/M, H/X and
H/7. Several definitions of these functions exist, if we focus first
on the definition of functionsHLM and H/M, the most common approaches are the following. The first and simplest approach
considers HLM�:40, :71� � aLM if >+J?2bL0�:40�, bL1�:71�6 >Fℎd3JℎIe> otherwise HLM � 0, >+J?�·� is defined as a distance
function over the domain of the attributes. Specific examples of
this cost can be found in fingerprint verification [21] whereHLM ∈�0,1� or in [20, 22]. The second and most frequently used
approach corresponds to the case whereHLM2:40, :51 , hL6 ∈ ℝ. In
this case, node substitution cost depends on the attributes of the
nodes and possibly on some other parameters hL as shown in [23,
24] and [4], among others. Similar approaches can be used to
defineH/M. With regard to HLX , HL7, H/X andH/7, these functions usually simply assign a constant cost. However, they can also
depend on node or edge attributes [25, 26] and [27].
We say the optimal bijection, .0,1∗, is the one that obtains the
minimum cost,
.0,1∗ � argmin_O,V∈` G>+?HIJ?�=0, =1 , .0,1� (4)
We define the distance and the optimal bijection between two
cliques in a similar way as the distance between two graphs since they are local structures of graphs. We name the cost of
substituting clique Ko� by Kp
as q4,7. The cost of deleting clique
Ko� as q4,r and the cost of inserting clique Kp
as qr,7.
3 Independently of the definition of the edit costs, if we wish the Edit Distance to be defined as a distance function, it is needed to
assure the following four properties [28]:
1) HLM ≤ HLX #HL7 and H/M ≤ H/X # H/7. 2) HLX � HL7 and H/X � H/7. (5)
3) HLM � 0 and H/M � 0 if same attributes.
4) All costs have to be non-negative.
3. Edit Distance Computation
In this section, we first comment the algorithms presented to
compute an optimal and sub-optimal distance value and then we
explain more deeply the Bipartite algorithm.
3.1. Optimal and Suboptimal Edit Distance Algorithms
The A* algorithm [29] is the most widely used method to
compute the exact value of the graph edit distance. It is a classical tree search algorithm that is based on exploring the
space of all possible mappings of nodes and edges of both
attributed graphs. The computational cost of this algorithm is
exponential in the number of nodes in its worst case. To alleviate this problem, some heuristic functions have been published to
reduce the space search. At the beginnings of the 80s decade, the firsts sub-optimal algorithms to solve the error-tolerant graph-
matching problem appeared [17, 30]. Nowadays, other techniques and surveys have been published [12, 13, 31, 32] and
[18]. Recently, some sub-optimal methods to compute the edit
distance have been presented. The main idea is to optimise local criteria instead of global [33, 34]. New methods have been
presented such as the ones based on Dominant Sets [35] or the Hausdorff distance [24]. And some other methods have been
proposed for specific graph definitions such as planar graphs [37,
38], bounded-valence graphs [39], unique vertex labels [40] or image registration [41, 42].
Bipartite [15] is one of the newest methods presented to solve the Edit Distance. Experimental validation shows that, nowadays,
it is one of the best sub-optimal algorithms since it obtains a good approximation of the distance value in cubic computational cost.
The method we present obtains exactly the same distance value but with a reduced computational time.
3.2. Edit Distance Computation by Bipartite algorithm (BP)
The assignment problem considers the task of finding an
optimal assignment of the elements of a set t to the elements of
another set u, where both sets have the same cardinality ( �|t| � |u|. Let us assume there is a (v( cost matrix H. The
matrix elements H7,8 correspond to the cost of assigning the i-th
element of t to the j-th element of u. An optimal assignment is the one that minimises the sum of the assignment costs and so,
the assignment problem can be stated as finding the
permutationp that minimises ∑ Cp,��p�ypzC . Munkres’ algorithm
[43] solves the assignation problem in O�(|�, in the worst case. It is a refinement of an earlier version by Kuhn [44] and is also
referred to as Kuhn-Munkres or Hungarian algorithm. The
algorithm repeatedly finds the maximum number of independent
optimal assignments and in the worst case the maximum number
of operations needed by the algorithm is O�(|�. Due to this local exploration is performed through columns (or rows), in cases the
cost matrix is not symmetric, the real run time drastically
depends on the order of exploration (columns or rows). Later, an algorithm to solve this problem applied to non-square matrices
where presented [45].
Bipartite, or BP for short [15], is an efficient algorithm for edit distance computation for general graphs that
use the Munkres’ algorithm. That is, they generalised the original
Munkres’ algorithm that solve the assignment problem to the
computation of the graph edit distance by defining a specific cost matrix. In experiments on artificial and real-world data, authors
demonstrate BP obtains an important speed-up of the
computation respect other methods while at the same time the
accuracy of the approximated distances is not much affected [15]. For this reason, since its publication, it has become one of the
most used graph-matching algorithms.
Given attributed graphs G� and G , the �n #m�}�n# m� cost matrix C is defined as follows,
Where q4,7 denotes the cost of substituting clique Ko� by Kp ,
q4,r denotes the cost of deleting clique Ko� and qr,7 denotes the
cost of inserting clique Kp . On the basis of this cost matrix
definition, Munkres’ algorithm can be executed to find the minimum cost for all permutations. Obviously, as described in
[15], this minimum cost is a sub-optimal Edit Distance value
between the involved graphs since cost matrix rows are related to
cliques of graph G� and columns are related to cliques of G . Moreover, it is considered a correct permutation the one that
∑ Co,��o�y~�ozC < ∞. That is, all costs are assigned to non-infinitive
values.
Munkres’ algorithm has been demonstrated to be optimal for solving the assignation problem and it has a polynomial
computational cost: O��( #*�|�. Nevertheless, it is important to note that it is suboptimal for solving the graph edit distance. This
is because the second order information is not completely considered since cliques are evaluated individually. For this
reason, the computed distance values are equal or smaller than the distance values obtained in an optimal method, which have to
be computed in an exponential cost method. Finally, in [46], a preliminary version of the BP algorithm
was presented that had a computational cost of O�*&}�(,*�|�. It had the advantage that the cost matrix had not to be extended
but the deletion and insertion costs were not considered. The variant of the BP we present has the same computational cost
than [47] but obtains the same distance value than [15].
4. Fast Bipartite (FBP)
In this section, after a short introduction, we present a new definition of the Edit Distance and then we depict the algorithm
to compute the graph matching.
4.1. Introduction
In this paper, we propose a new method that obtains exactly
the same distance value and mapping between nodes than the BP
but with a reduced computational time. We refer to it as Fast
4 Bipartite or FBP for short. We also used the Munkres’ method but we propose to use a different and smaller matrix cost in the
cases that the costs are defined such that the edit distance is really
a distance function (equation 5). The computational cost of our
method is O��max�(,*��|�. Our method needs the edit distance to properly be defined as
a distance function, which means that the four distance properties
described on equation (5) have to hold. In this case we can
demonstrate the following Lemma:
Lemma 1. Suppose we have two graphs G� and G of order
( #*. These graphs originally had ( and * nodes but have been
enlarged with * and ( null nodes, respectively. If equation (5)
holds then:
1) If ( ≥ * then the optimal bijection .0,1∗ does not
involve any node deletion of ��.
2) If ( ≤ * then the optimal bijection .0,1∗ does not
involve any node insertion of Σ� .
The aim of Lemma 1 is to show that the whole nodes of the graph
with initial lower or equal order are substituted if the Edit Distance is defined as a distance function. Therefore, it is not
possible to have an insertion operation and deletion operation at
the same time in an optimal bijection.
Proof. If C�� ≤ C�� # C�p then the optimal labelling never has
these two operations together:C��2vo�, vp 6 and C�p2v��, v� 6 where
the initial nodes are vo� ∈ Σ�� − Σ$�� and v� ∈ Σ� − Σ$� and the
extended nodes (null nodes) are v�� ∈ Σ$�� and vp ∈ Σ$� . This is
because, for sure the labelling C��2vo�, v� 6 has a lower value and
the cost of substituting the two null nodes v�� and vp is zero. The
same holds with the edges case∎
4.2. New Edit Distance definition
We define the following edit cost,
G>+?HIJ?′�=0, =1, .0,1� � G>+?HIJ?�=0, =1 , .0,1� −HL/_X7�=0, =1� (6)
Where,
HL/_X7�=0, =1� � K HLX2:40, :�16LNO∈PQO
# K HL72:�0, :716LUV∈SQV
#
∑ H/X23450 , 3�16/NYO ∈PZO #∑ H/723�0 , 3781 6/U[
V∈SZV (7)
:�1 ∈ Σ$L1, :�0 ∈ Σ$L0, 3�1 ∈ Σ$/1 and 3�0 ∈ Σ$/0
The new edit cost definition subtracts from the original edit
cost the costs of deleting nodes and arcs from G� and inserting
nodes and arcs from G . Nodes v� and v�� (or arcs e� and e�� )
represent any extended node (or arc). Note that in cases that nodes or arcs where the extended ones (null nodes o arcs), that is,
vo� ∈ Σ$��, vp ∈ Σ$� , eo�� ∈ Σ$�� or ep� ∈ Σ$� , then their corresponding
cost is zero by definition. Moreover, the subtracted cost C��_�p does not depend on any bijection f �, . In a similar way than the original edit cost, the optimal bijection,
.′0,1∗, is the one that obtains the minimum cost,
.′0,1∗ � argmin_O,V∈` G>+?HIJ?′�=0, =1 , .0,1� (8)
Theorem 1: The equality .′0,1∗ � .0,1∗ holds for all pair of
graphs =0 and =1.
Proof. Cost C��_�p�G�, G � is independent of any bijection f �,
therefore it is constant throughout all the exploration space and
does not affect the point where the function minimises ∎
Theorem 1 shows as that minimisingG>+?HIJ?′ is similar than
minimising G>+?HIJ?. In the next section, we describe an
algorithm to minimise G>+?HIJ?′ with lower computational time
than the Bipartite algorithm.
4.3. Edit Distance Computation by Fast Bipartite algorithm
We define D as an n #m vector. The first n positions are filled
with the costs of deleting cliques Ko� that we named H4,r , and the
other * positions are filled with zeros: D � �HC,r ,…H�,r , 0… 0� . Moreover, we define I as an m# n vector. The first * positions
are filled with the costs of inserting cliques Kp that we named
Hr,7, and the other ( positions are filled with zeros: I ��C�,C, …C�,� , 0… 0�. Note that zeros in both vectors represent the
cost of deleting or inserting null cliques. Besides, it is easy to
demonstrate that C��_�p�G�,G � � ⟨1�, �Io # Do�⟩. With these two vectors, we define two �n #m�}�m# n� matrices, \T and ��. The first one is obtained by the replication of
vector D through columns and the second one is obtained by the
replication of vector I through rows.
We are ready to define our cost matrix C′ as follows, C′=C-
2\T # ��6,
H � �
������������CC,C − 2CC,� # C�,C6 …
⋮
… CC,� − 2CC,� # C�,�6 ⋮
⋮ Cy,C − 2Cy,� # C�,C6 …
⋮ … Cy,� − 2Cy,� # C�,�6
0 ∞ …∞ ⋱ ⋮ 0
∞
∞
0 ⋮ ⋱ ∞… ∞ 0
0 ∞ …∞ ⋱ ⋮ 0
∞
∞
0 ⋮ ⋱ ∞… ∞ 0
0
¡¡¡¡¡¡¡¡¡¡¢
Similarly to cost matrix H, this matrix is composed of four
quadrants. The dimensions of each quadrant is: HQ1� � nXm,
HQ2� � nXn,HQ3� � mXm andHQ4� � mXn. Cells in the first one are filled with the value of substituting the cliques (as in
H) but the cost of deleting and inserting the respective cliques is subtracted. The second and third quadrants are composed of
infinitive values except at the diagonal that is filled with zeros. All cells on the fourth quadrant have a zero.
On the basis of the new cost matrix C′ defined above, Munkres’
algorithm [43] can be executed and it finds the optimal
permutation p that minimises ∑ C′o,��o�y~�ozC . Note that any correct
5 permutation on p is equivalent to a bijection f �, between nodes
of graphs G� and G .
Lemma 2. In the case that ( ≤ * (or * ≤ ( ), all correct
permutations hold that there are a minimum of * (or ()
assignments with null value, C′o,��o� � 0. Proof. In the case that ( ≤ * (or * ≤ ( ), matrix C′ has * rows
(or ( columns) with only 0 or infinitive values. Due to a correct permutation does not select matrix cells with value infinitive, the
only option is to select * (or () cells with null value∎
Theorem 2. The value of EditCost′ is equal to a correct
permutation cost of C′, formally:
If ( ≤ * then EditCost′�G�, G , f�, � � ∑ C′o,py~�ozC � ∑ C′o,pyozC .
If * ≤ ( then EditCost′�G�, G , f�, � � ∑ C′o,p�~ypzC � ∑ C′o,p�7zC .
Where f�, �a� � i. Proof. Suppose that ( ≤ *. Moreover, without of loosing
generality suppose that nodes :40: 1 ≤ & ≤ ( are mapped to
nodes :71: 1 ≤ + ≤ ( (if this was not the case, we always can
reorder the nodes of G such that this supposition holds).
Therefore by Lemma 2 we have that :71: ( # 1 ≤ + ≤ * have to
be inserted and there is not any :40 that has to be deleted. By
definition,EditCost� � EditCost − C��_�p �
∑ Co,p�4zC #∑ C�,p®7z�~C #∑ 0�~®4z®~C¯°°°°°°°°°°±°°°°°°°°°°²G>+?HIJ?
−∑ Co,��4zC #∑ C�,p®7zC°̄°°°°±°°°°°²
C��_�p.
Rearranging the terms, EditCost� � ∑ Co,p�4zC −∑ Co,��4zC −∑ C�,p�7zC . Applying the associative property, EditCost� �∑ ³Co,p − 2Co,� # C�,o6´�4zC � ∑ C′o,p�4zC . Finally, by Lemma 2 we
can conclude that ∑ C′o,p�4zC � ∑ C′o,p�~®4zC . The case * < ( is
similar than ( ≤ *∎
Theorem 2 gives us a way to compute the G>+?HIJ?′ through the
permutation matrix C′. Moreover, Theorem 1 showed us that the
optimal bijection of G>+?HIJ?′ is also an optimal bijection of
G>+?HIJ?. And finally, equations (6) and (7) make possible to
obtain the G>+?HIJ? with G>+?HIJ?′. Considering this three facts,
we could implement and algorithm that obtains the G>+?HIJ? through applying the Munkres’ algorithm to matrix C′. Nevertheless, due to the dimensions of C′ are the same than the
original C, the computational cost would be equivalent. If we
restrict the cost such that equation (5) holds then by Lemma 1,
we can assure the Munkres’ algorithm only needs to explore the
sub-matrix composed by the first quadrant composed of the first
n rows and m columns, that we call it HQ1�. Algorithm 1
computes the µ&J?u+¶&d?+?3.
Algorithm 1. Fast Bipartite
HQ1� � Computation_Cost�G�, G �. // HQ1� is the (v* first quadrant of cost matrix C�
P � Munkres�HQ1�� // P is the (v* permutation matrix
EditCost� � Sum2¼½*�@.∗ HQ1��6. // .∗ represents the multiplication element by element
EditCost � EditCost� # C��_�p // Final distance value
End.
As commented, the Munkres algorithm was initially implemented
to find the permutation of a quadratic matrix. In case * ≠ (,
matrix HQ1�can be extended with negative values (lower than
any original cost). Nevertheless, it is usual the implemented
functions of the Munkres’ algorithm to automatically enlarge the
cost matrix. The worst computational cost of Fast Bipartite is the
cost of the Munkres’ algorithm, that is: ¿�max�*,(�|�. The
cost of Bipartite algorithm [15] is ¿��* # (�|�.
5. Experimental Validation
The goodness of the Bipartite algorithm has been tested in
several papers, for this reason, we only want to present the Speed
up of our method respect the classical one [15]. We do not present new recognition-ratio tests or correlation tests between
the sub-optimal distance and the optimal one since we obtain
exactly the same distance value (as described above). We have performed two types of experiments. The first ones are
performed on artificially created graphs. The aim of these experiments is twofold. On the one hand, we want to show the
real computational time of the Bipartite algorithm. On the other hand, we want to show the speed up of the Fast Bipartite respect
of the order of the graphs. The second experiments are performed on well-know graph datasets. The aim is to execute again the
tests published in paper [15] where the Bipartite algorithm was presented and show the Speed up of our method on these largely
used databases.
5.1. Speed up respect graph order
In this first set of experiments, graphs have been randomly created. Nodes have one attribute and arcs do not have attributes
and are directed. The substitution cost between nodes is the absolute value of the difference between attribute values,
C�� � Àγ�2vo�6 − γ�2vp 6À. The substitution cost between arcs is 0
if both exist and 1 otherwise. Cost insertion and cost deletion of
nodes and arcs has been set to C�p � C�� � C�p � C�� � 0.5.
Each experiment has been run 100 times to not be influenced on
the random data. Source code on Matlab can be downloaded from [48].
Figure 3 shows the total execution time in seconds of 100
executions of the Bipartite algorithm tÂÃ and the Fast Bipartite
algorithm tÄÂÃ respect the order of the graphs 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50 (MacPro, processor I7). Axes represent the
order of the graph presented on the first parameter and the second
parameter of the algorithm. We realise that both figures are not
symmetric and it is less costly the computation of the distance when the first graph has lower order than the second graph. In the
Bipartite case, when the order of both graphs is exactly the same,
the increase of the run time respect the order of the graphs is
really lower than in the cases that there is a small difference between both graph orders. The figure of the Fast Bipartite is
completely different since the higher is the order of the first
graph, the higher the run time. From these figures, we conclude
that (using our implementation of the Munkres’ algorithm) it is worth to present on the first parameter of the Munkres’ function
the graph with lower order and on the second parameter, the
graph of higher order.
6 (a) Bipartite algorithm.
(b) Fast Bipartite algorithm.
Figure 3. Run time tÂÃ and tÄÂÃ respect the order of the graphs.
Figure 4 shows the Speed up of our algorithm: ÅÆyÇp���ÈÉÊ�ÅÆyÇp���ÉÊ� . In
all the cases the Speed up is higher than one. It is important to note that the highest Speed up appears in the cases that the first
graph has the lowest order, that is, the desired situation that we
have realised both algorithms have the lowest run time.
Moreover, the larger the difference between graph orders, the better the Speed up.
Figure 4. Speed up (ratio between Bipartite and Fast Bipartite)
5.2. Speed up respect well-known graph datasets
The purpose of the experiments in this section is to empirically
verify the Speed up of our algorithm in some well-known graph databases known as IAM graph database repository. These
databases have been used during some years to test different a
types of algorithms related to graphs such as, classification, clustering, prototyping or graph embedding. We do not want to
add any comment to these databases since a lot of literature has been written talking about them. The first explanations of these
databases can be found in [47] and also they have been commented in [15]. They can be downloaded from the IAPR-
TC15 web page [49]. In this set of experiments we have compared all graphs on the
test set respect all graphs on the reference set as authors did in [15]. Then, from this large number of comparisons, we have
extracted the mean computational times t̅ÉÊ and t̅ÈÉÊ and the
Speed up (table 1). The source code in Matlab can be
downloaded from [48]. Moreover, the graph with the lowest order have been presented as the first parameter and the one with
the highest order as the second parameter since now we know
that this combination is the one that obtains lower run times on
both algorithms.
Table 1. Mean computational time of Bipartite and Fast Bipartite and Speed Up of Fast Bipartite respect Bipartite
Letter (L) Letter (M) Letter (H) COIL GREC Fingerprint Molecules Proteins
Mean Order 4.5 4.6 4.7 3.0 12.9 5.4 9.5 32.6
Max Order 8 10 9 11 39 26 85 126
Ì̅ÍÎ 0.0016 0.0021 0.0018 0.0022 0.0293 0.0098 0.3913 1.4606
Ì̅ÏÍÎ 0.0014 0.0018 0.0016 0.0019 0.0131 0.0062 0.0997 0.278
Speed up 1.0888 1.1344 1.1381 1.1779 2.2319 1.5927 3.9249 5.2544
The Speed up is higher than one on the whole experiments therefore it is always worth to use the Fast Bipartite instead of the
Bipartite algorithm. Moreover, the higher is the mean and the
maximum number of nodes, the higher is the Speed up. Figure 5
shows graphically the run time of both algorithms.
Figure 5. Normalised run time of FBP and BP.
6. Conclusions
This paper presents a new algorithm called Fast Bipartite to
compute the Graph Edit Distance. We have shown that under the
restriction that the edit costs are defined such as the Edit Distance
is really a distance function, we obtain exactly the same distance
value than the Bipartite algorithm but with a reduced run time. Moreover, we have also realised the run time of the Bipartite
algorithm depends on the order of presentation of both graphs.
More precisely, it is important to consider the number of nodes of
graphs. Empirical evaluation shows the Fast Bipartite is always faster than the Bipartite algorithm and higher is the order of both
graphs, better is the Speed up we obtain.
Acknowledgments
This research is supported by the CICYT projects DPI2013-
42458-P.
7 References
1. A. Sanfeliu, R. Alquézar, J. Andrade, J. Climent, F.
Serratosa & J. Vergés, “Graph-based Representations and
Techniques for Image Processing and Image Analysis”, Pattern Recognition 35 (3), pp: 639-650, 2002.
2. He, L., et al., Graph matching for object recognition and
recovery. Pattern Recognition Letters, 2004. 37(7). 3. F. Serratosa, X. Cortés & A. Solé-Ribalta, "Component
Retrieval based on a Database of Graphs for Hand-Written Electronic-Scheme Digitalisation", Expert Systems With
Applications, ESWA 40, pp: 2493-2502 , 2013. 4. Caetano, T., et al., Learning Graph Matching. Transaction
on Pattern Analysis and Machine Intelligence, 2009. 31(6): p. 1048-1058.
5. Williams, M., R. Wilson, and E. Hancock, Multiple Graph Matching with Bayesian Inference. Pattern Recognition
Letters, 1997. 18: p. 1275-1281.
6. Konc, J. and D. Janežič, A Branch and Bound Algorithm for
Matching Protein Structures, in Adaptive and Natural Computing Algorithms. 2007. p. 399-406.
7. Sanromà, G., R. Alquézar, and F. Serratosa, Smooth
Simultaneous Structural Graph Matching and Point-Set
Registration, in Workshop on Graph-based Representations in Pattern Recognition. 2011. p. 142-151.
8. A. Solé-Ribalta & F. Serratosa, "Graduated Assignment
Algorithm for Multiple Graph Matching based on a
Common Labelling", International Journal of Pattern Recognition and Artificial Intelligence, IJPRAI 27 (1), pp:
1,27, 2013 9. A. Solé & F. Serratosa, “Models and Algorithms for
computing the Common Labelling of a set of Attributed Graphs”, Computer Vision and Image
Understanding, CVIU, 115(7), pp: 929-945, 2011. 10. M. Ferrer, E. Valveny, F. Serratosa, K. Riesen & H. Bunke.
“Generalized Median Graph Computation by Means of Graph Embedding in Vector Spaces”, Pattern Recognition,
43 (4), pp. 1642-1655, 2010.
11. M. Ferrer, E. Valveny & F. Serratosa, “Median graphs: A genetic approach based on new theoretical
properties”, Pattern Recognition 42, (9),pp 2003-2012, 2009.
12. Donatello Conte, Pasquale Foggia, Carlo Sansone, Mario
Vento: Thirty Years Of Graph Matching In Pattern Recognition. IJPRAI 18(3): 265-298 (2004).
13. Mario Vento: A One Hour Trip in the World of Graphs, Looking at the Papers of the Last Ten Years. GbRPR 2013,
pp: 1-10. 14. Edwin R. Hancock, Richard C. Wilson, “Pattern analysis
with graphs: Parallel work at Bern and York”, Pattern Recognition Letters 33(7): 833-841 (2012).
15. Kaspar Riesen, Horst Bunke: Approximate graph edit distance computation by means of bipartite graph matching.
Image Vision Comput. 27(7): 950-959 (2009) 16. Wong, A. and M. You, Entropy and Distance of Random
Graphs with Application to Structural Pattern Recognition.
Transaction on Pattern Analysis and Machine Intelligence,
1985. PAMI-7(5): p. 599-609. 17. Sanfeliu, A. and K.-S. Fu, A Distance measure between
attributed relational graphs for pattern recognition. IEEE
transactions on systems, man, and cybernetics, 1983. 13(3):
p. 353-362. 18. Gao, X., et al., A survey of graph edit distance. Pattern
Analysis and applications, 2010. 13(1): p. 113-129.
19. A. Solé, F. Serratosa & A. Sanfeliu, “On the Graph Edit
Distance cost: Properties and Applications”, International Journal of Pattern Recognition
and Artificial Intelligence, IJPRAI 26, (5), 2012.
20. Bunke, H., Error Correcting Graph Matching: On the Influence of the Underlying Cost Function. Transactions
on Pattern Analysis and Machine Intelligence, 1999. 21(9):
p. 917-922.
21. Jain, A.K. and D. Maltoni, Handbook of Fingerprint Recognition. 2003, Springer-Verlag New York.
22. Bunke, H., On a relation between graph edit distance and
maximum common subgraph. Pattern Recognition Letters,
1998. 18(8): p. 689-694. 23. Neuhaus, M. and H. Bunke, Automatic learning of cost
functions for graph edit distance. Information Sciences,
2006. 177(1): p. 239-247. 24. Lladós, J., E. Martí, and J. Villanueva, Symbol Recognition
by Error-Tolerant Subgraph Matching between Region
Adjacency Graphs. TRANSACTIONS ON PATTERN
ANALYSIS AND MACHINE INTELLIGENCE, 2001. 23(10): p. 1137-1143.
25. Wong, A. and M. You, Entropy and Distance of Random
Graphs with Application to Structural Pattern Recognition.
Transaction on Pattern Analysis and Machine Intelligence, 1985. PAMI-7(5): p. 599-609.
26. F. Serratosa, R. Alquézar & A. Sanfeliu, Function-Described Graphs for modelling objects represented by attributed
graphs. Pattern Recognition, 36(3)p. 781-798. 2003. 27. Sanfeliu, A., F. Serratosa, and R. Alquézar, Second-Order
Random Graphs for modelling sets of Attributed Graphs and their application to object learning and recognition.
International Journal of Pattern Recognition and Artificial Intelligence, 2004. 18(3): p. 375-396.
28. Bunke, H., Allermann ,G., “Inexact graph matching for structural pattern recognition”, Pattern Recognition Letters,
1983. 1(4): p. 245–253.
29. P. Hart, N. Nilsson, B. Raphael, A formal basis for the
heuristic determination of minimum cost paths, IEEE Transactions of Systems, Science, and Cybernetics 4(2),
1968: p. 100–107.
30. M. Eshera, K. Fu, A graph distance measure for image
analysis, IEEE Transactions on Systems, Man, and Cybernetics (Part B) 14 (3) (1984) 398–408.
31. Mario Vento, “A long trip in the charming world of graphs
for Pattern Recognition, Pattern Recognition”, Available online 15 January 2014.
32. P. Foggia, G. Percannella and M. Vento, Graph matching and learning in Pattern Recognition in the last 10 years,
International Journal of Pattern Recognition and Artificial Intelligence, 2013.
33. D. Justice, A. Hero, A binary linear programming
formulation of the graph edit distance, IEEE Tran. on Pattern
Analysis and Machine Intelli. 28 (8) (2006) 1200–1214. 34. M. Neuhaus, K. Riesen, H. Bunke, Fast suboptimal
algorithms for the computation of graph edit distance, in: C.
Ribeiro, S. Martins (Eds.), Proceedings of 11th International
Workshop on Structural and Syntactic Pattern Recognition, LNCS, 3059, Springer, 2006, pp. 163–172.
35. N. Rebagliati, A. Solé, M. Pelillo, F. Serratosa, “Computing
the Graph Edit Distance Using Dominant Sets”,
International Conference on Pattern Recognition, ICPR2012, Tsukuba, Japan, Volume , pp: 1080-1083, 2012.
36. Andreas Fischer, Ching Y. Suen, Volkmar Frinken, Kaspar
Riesen, Horst Bunke: A Fast Matching Algorithm for Graph-
Based Handwriting Recognition. GbRPR 2013: 194-203. 37. Neuhaus, M. and H. Bunke, Self-organizing maps for
learning the edit costs in graph matching. Transactions on Systems, Man, and Cybernetics, 2005. 35(3): p. 305-314.
38. Romero, A. and M. Cazorla, Topological SLAM Using Omnidirectional Images: Merging Feature Detectors and
8 Graph-Matching, in Advanced Concepts for Intelligent Vision Systems. 2010.
39. Thrun, S. and M. Montemerlo, The Graph SLAM Algorithm
with Applications to Large-Scale Mapping of Urban
Structures. International Journal of Robotics Research, 2006. 25(5-6): p. 403-429.
40. Xia, S. and E. Hancock, Learning Class Specific Graph
Prototypes, in Image Analysis and Processing. 2009. p. 269-
277. 41. G. Sanromà, R. Alquézar, & F. Serratosa, A New Graph
Matching Method for Point-Set Correspondence using the
EM Algorithm and Softassign, Computer Vision and Image Understanding, CVIU, 116(2), pp: 292-304, 2012.
42. G. Sanromà, R. Alquézar, F. Serratosa & B. Herrera,
Smooth Point-set Registration using Neighbouring
Constraints, Pattern Recognition Letters, PRL 33, pp: 2029-2037, 2012.
43. J. Munkres, Algorithms for the assignment and
transportation problems, Journal of the Society for Industrial
and Applied Mathematics 5 (1957) 32–38.
44. H. Kuhn, The Hungarian method for the assignment problem, Naval Research Logistic Quarterly 2
(1955) 83–97.
45. F. Bourgeois, J. Lassalle, An extension of the Munkres
algorithm for the assignment problem to rectangular matrices, Communications of the ACM 14 (12) (1971) 802–
804.
46. K. Riesen, M. Neuhaus, H. Bunke: Bipartite Graph
Matching for Computing the Edit Distance of Graphs. GbRPR 1-12, 2007.
47. Riesen, K., Bunke, H.: IAM graph database repository for
graph based pattern recognition and machine learning, in N. da Vitoria Lobo et al. (Eds.): Structural, Syntactic, and
Statistical Pattern Recognition, Springer, LNCS 5342, 287 -
297, 2008.
48. http://deim.urv.cat/~francesc.serratosa/SW/ 49. http://iapr-tc15.greyc.fr/links.html
9
- A sub-optimal algorithm is presented to compute the
graph edit distance.
- It obtains exactly the same distance value than the
Bipartite algorithm.
- The computational cost is lower than the Bipartite
one.
- Validation shows an important decrease on run time.