
Information Sciences 177 (2007) 239–247

www.elsevier.com/locate/ins

Automatic learning of cost functions for graph edit distance

Michel Neuhaus *, Horst Bunke

Institute of Computer Science and Applied Mathematics, University of Bern, Neubrückstrasse 10, CH-3012 Bern, Switzerland

Received 10 February 2006; accepted 20 February 2006

Abstract

Graph matching and graph edit distance have become important tools in structural pattern recognition. The graph edit distance concept allows us to measure the structural similarity of attributed graphs in an error-tolerant way. The key idea is to model graph variations by structural distortion operations. As one of its main constraints, however, the edit distance requires the adequate definition of edit cost functions, which eventually determine which graphs are considered similar. In the past, these cost functions were usually defined in a manual fashion, which is highly prone to errors. The present paper proposes a method to automatically learn cost functions from a labeled sample set of graphs. To this end, we formulate the graph edit process in a stochastic context and perform a maximum likelihood parameter estimation of the distribution of edit operations. The underlying distortion model is learned using an Expectation Maximization algorithm. From this model we finally derive the desired cost functions. In a series of experiments we demonstrate the learning effect of the proposed method and provide a performance comparison to other models.
© 2006 Elsevier Inc. All rights reserved.

Keywords: Graph matching; Graph edit distance; Edit cost function

1. Introduction

In recent years, graphs have increasingly been used for structural pattern representation [1–3]. By converting patterns into graphs, we turn the pattern classification problem into the problem of evaluating the structural similarity of graphs, which is commonly referred to as graph matching. A large variety of methods have been proposed for graph matching, ranging from exact matching techniques, such as maximum common subgraph based methods, to error-tolerant approaches based on continuous optimization theory and spectral decomposition methods [7]. One of the most intuitive ways to approach the graph matching problem is via the definition of a graph similarity measure [5,11,24]. Among several possible alternatives, graph edit distance [21,16] has been recognized as a general and flexible method to measure the dissimilarity (or similarity) of attributed graphs by taking structural errors into account. However, the structural dissimilarity of graphs is only correctly reflected by graph edit distance if the underlying edit costs are defined appropriately.


doi:10.1016/j.ins.2006.02.013

* Corresponding author. Tel.: +41 31 6318699; fax: +41 31 6313262. E-mail addresses: [email protected] (M. Neuhaus), [email protected] (H. Bunke).


For the problem of string matching, Ristad and Yianilos [20] propose a model to obtain edit costs from an estimation of the frequency of edit operations. Related to this approach, we introduce in this paper a probabilistic model of the distribution of graph edit operations that allows us to derive edit costs that are optimal with respect to certain criteria. Unlike other graph matching methods based on machine learning techniques [6,26,9,12,14], we propose a training of the graph matching system according to the edit operation model of structural similarity.

This paper is structured as follows. In Section 2 we briefly introduce graph edit distance. Section 3 describes the algorithm for learning edit costs. An experimental evaluation of the proposed model is presented in Section 4, and conclusions are drawn in Section 5.

2. Graph edit distance

The adequate definition of object similarity is a key problem in pattern recognition. In the case of graph matching, objects are represented by graph structures consisting of nodes, edges connecting pairs of nodes, and attribute labels attached to nodes and edges. While in general the similarity of graphs can be measured in various ways [7], some graph matching methods seem to be too restricted to cope with noisy real-world data. Exact graph isomorphism based methods, for instance, provide for a very precise theoretical foundation of graph matching, yet they often fail to successfully model structural variations. Other methods are limited to special classes of graphs that occur rather infrequently in graph data based on real-world problems.

A common concept to measure the similarity of graphs is based on graph edit distance [21,16]. Its main advantage is that graph edit distance can be applied to all kinds of graphs, while being able to model structural variation in a very intuitive and illustrative way. Edit distance has originally been developed for string matching [23], and a considerable number of variants and extensions of edit distance have been proposed in recent years for strings and graphs. The basic idea is to model structural variation by edit operations, which reflect modifications in structure, such as the removal of a single node or the modification of an attribute attached to an edge. A standard set of edit operations consists of a node insertion, node deletion, node substitution, edge insertion, edge deletion, and edge substitution operation. A node deletion operation, for example, is used to model the removal of a node from a graph, and an edge substitution operation is used to model the modification of an edge attribute. A key property is that any graph can be transformed into any other graph by iteratively applying edit operations. Computing the edit distance of two graphs g and g′ is equivalent to finding a sequence of edit operations transforming graph g into graph g′. Such a sequence of edit operations is also termed an edit path from g to g′. A trivial edit path from any graph g to any other graph g′ is given by sequentially deleting all nodes and edges from g and then inserting all nodes and edges into g′, although this is usually not an adequate way to model the differences between the two graphs under consideration. The problem of measuring the similarity of two graphs hence turns into the problem of finding the best model of the structural differences of two graphs.

To be able to quantify whether an edit operation modifies a graph's structure heavily or only slightly, edit cost functions are introduced. Edit cost functions assign a cost value to each edit operation reflecting the strength of the modification applied to the graph. To obtain a cost function on edit paths, the individual edit operation costs of the edit path are accumulated. An edit path from g to g′ with minimal costs can then be defined as the best model for the structural differences of g and g′. Not only can an optimal edit path be used for a visual description of the optimal stepwise transformation from g to g′, but it also provides us with a minimal transformation cost assigned to the two graphs, called graph edit distance. More formally, given two graphs g and g′, let E(g, g′) denote the set of all edit paths from g to g′, and let c denote a function assigning non-negative costs to edit operations. The graph edit distance of g and g′ is then defined by

$$d(g, g') = \min_{(e_1,\ldots,e_k) \in E(g,g')} \sum_{i=1}^{k} c(e_i). \qquad (1)$$
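As a minimal illustration of Eq. (1), the following Python sketch evaluates the minimum over all node-to-node assignments by brute force; edge operations are omitted for brevity, and all function and variable names are our own, not taken from the paper. The factorial enumeration also makes tangible the exponential complexity discussed later in this section.

```python
# Brute-force sketch of Eq. (1) over node operations only; a full
# implementation would also add the edge costs induced by each node mapping.
from itertools import permutations

def edit_distance(g, gp, c_sub, c_del, c_ins):
    """g, gp: dicts mapping node id -> label vector (hypothetical format)."""
    n = max(len(g), len(gp))
    left = list(g) + [None] * (n - len(g))       # None models deletion slots
    best = float("inf")
    for right in permutations(list(gp) + [None] * (n - len(gp))):
        cost = 0.0
        for u, v in zip(left, right):
            if u is None and v is None:
                continue
            elif u is None:
                cost += c_ins(gp[v])             # node insertion
            elif v is None:
                cost += c_del(g[u])              # node deletion
            else:
                cost += c_sub(g[u], gp[v])       # node substitution
        best = min(best, cost)
    return best

# Example with Euclidean substitution costs and unit insertion/deletion costs:
g, gp = {"a": (0.0, 0.0), "b": (1.0, 0.0)}, {"x": (0.1, 0.0)}
d = edit_distance(g, gp,
                  c_sub=lambda x, y: sum((p - q) ** 2 for p, q in zip(x, y)) ** 0.5,
                  c_del=lambda x: 1.0, c_ins=lambda x: 1.0)
```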

While leaving the general edit distance framework unchanged, edit cost functions can be used to tailor edit distance to specific applications and datasets. Node insertion, deletion, and substitution costs, for example, determine whether it is cheaper to delete a node u and subsequently insert a node u′ instead of substituting node u with u′ (which means that one prefers deletion and insertion over substitution in an optimal edit path). Edit operation costs are usually limited to non-negative values. If the cost functions additionally satisfy the conditions of positive definiteness and symmetry as well as the triangle inequality at the level of single edit operations, the resulting edit distance function d: G × G → ℝ⁺ ∪ {0} is known to be a metric [4]. The term distance is therefore legitimate for edit distance.
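Stated symbolically (our paraphrase of the conditions referenced from [4], with u, u′, u″ denoting labels, u → u′ a substitution, and insertions and deletions treated as substitutions involving an empty label), the three conditions read:

$$c(u \to u') > 0 \text{ for } u \neq u', \qquad c(u \to u') = c(u' \to u), \qquad c(u \to u'') \le c(u \to u') + c(u' \to u'').$$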

The edit distance of graphs is usually computed by means of a tree search procedure that basically evaluates all possible node-to-node correspondences [16]. In some cases, the running time and memory requirements can be reduced by applying heuristics to the tree search. Yet, the overall computational complexity is rather high: the time and space complexity of a graph edit distance computation is exponential in the number of nodes involved. Thus, graph edit distance is in general only feasible for small graphs.

In edit distance based graph matching, it is often sufficient to deal with distances of graphs only, without recourse to the actual graph objects. For a graph dataset provided with a full distance matrix, a visual representation of the graph distribution, for instance, can be obtained by applying multidimensional scaling [8]. The result of multidimensional scaling is a set of vectors in two or three dimensions reflecting the original graph edit distances. A commonly used approach in pattern classification is based on nearest-neighbor classification. That is, an unknown object is assigned the class or identity of its closest known element, or nearest neighbor. Although more complex algorithms are available, in the experiments presented in this paper we employ a simple nearest-neighbor classifier.
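Both the multidimensional scaling step and the nearest-neighbor classifier operate purely on a precomputed distance matrix. A minimal sketch, assuming scikit-learn and NumPy; the matrix D and the labels below are random placeholders, not data from the paper's experiments:

```python
# MDS embedding of a precomputed graph distance matrix plus leave-one-out
# 1-NN classification directly on the same matrix.
import numpy as np
from sklearn.manifold import MDS
from sklearn.neighbors import KNeighborsClassifier

D = np.random.rand(30, 30); D = (D + D.T) / 2; np.fill_diagonal(D, 0.0)
labels = np.repeat([0, 1, 2], 10)            # three classes, 10 graphs each

# 2-D embedding approximately preserving the edit distances (cf. [8]).
coords = MDS(n_components=2, dissimilarity="precomputed").fit_transform(D)

# Leave-one-out nearest-neighbor classification on the distance matrix.
knn = KNeighborsClassifier(n_neighbors=1, metric="precomputed")
correct = 0
for i in range(len(D)):
    mask = np.arange(len(D)) != i
    knn.fit(D[np.ix_(mask, mask)], labels[mask])
    correct += knn.predict(D[i, mask].reshape(1, -1))[0] == labels[i]
print("leave-one-out accuracy:", correct / len(D))
```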

3. Learning edit costs from samples

A major difficulty of edit distance, independent of its computational complexity, is the adequate definition of edit costs. The objective is to define edit costs in such a way that the resulting intra-class distances are small while inter-class distances are large, which is equivalent to graphs from the same class being similar and graphs from different classes being dissimilar, based on their edit distance.

The approach presented in this paper is based on a probabilistic modelling of the distribution of edit operations. Motivated by a cost learning method for string matching introduced in [20], we adopt a stochastic view on the process of editing a graph into another one. Instead of applying edit operations to graphs, we assume that a sequence of random edit operations is observed. Under a few weak constraints, one can show that every random sequence of edit operations is equivalent to a pair of graphs. Given a probability distribution on edit operations, the probability of a sequence of edit operations can be derived by multiplying the individual probabilities under the condition of stochastic independence. Hence, as the probability of each edit path is well defined, we define the probability of two graphs g and g′ by

$$p(g, g') = \int_{(e_1,\ldots,e_k) \in E(g,g')} dP(e_1, \ldots, e_k \mid U), \qquad (2)$$

where U denotes the parameters of the edit operation distribution. In this model, the probability of two graphs g and g′ is governed by the probability of all underlying edit transformations from g to g′. In cases where the whole set of edit paths is not available for some reason, one can compute an approximate probability instead:

$$p(g, g') = \max_{(e_1,\ldots,e_k) \in E(g,g')} P(e_1, \ldots, e_k \mid U). \qquad (3)$$

If we assume that the structural similarity of two graphs can be expressed by their probability, we obtain a dissimilarity measure on graphs by defining

$$d(g, g') = -\log p(g, g'). \qquad (4)$$

Hence we arrive at a graph distance measure based on edit distance defined with respect to an underlying edit operation distribution. The issue of learning edit costs can thus be understood as learning the probability distribution of edit operations. As our general objective is to assign low distances to graphs from the same class and high distances to graphs from different classes, we modify the edit operation distribution such that graphs from the same class are assigned high probabilities while graphs from different classes have low probabilities.
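To make the link between the learned distribution and per-operation edit costs explicit (our unpacking of Eqs. (1), (3) and (4); the text leaves this step implicit): inserting the maximum approximation (3) into (4) and using the independence of edit operations gives

$$d(g, g') = \min_{(e_1,\ldots,e_k) \in E(g,g')} \sum_{i=1}^{k} \bigl(-\log p(e_i \mid U)\bigr),$$

which has exactly the form of Eq. (1) with the derived cost function c(e) = −log p(e | U): likely edit operations become cheap, unlikely ones expensive.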


Our general procedure therefore is as follows. We first introduce a model for the distribution of edit operations, train the model to obtain high intra-class probabilities, and finally derive edit costs from the model. These steps are described in greater detail in the remainder of this section.

3.1. Distribution model

In the following we assume that node labels as well as edge labels are vectors of a fixed dimension. Note that this restriction is reasonable and rather weak from the application-oriented point of view, as most real-world graphs belong to this category or can easily be converted into it. The distribution model we propose is based on mixtures of Gaussians. Identifying edit operations by their labels, we use a Gaussian mixture density to model every edit operation type individually. Our model hence consists of three weighted node mixture densities and three weighted edge mixture densities, one for insertion, one for deletion, and one for substitution operations. The probability of an edge insertion, for instance, is given by the probability of the corresponding edge label in the edge insertion mixture density. Similarly, the probability of a node substitution is given by the probability of the corresponding pair of node labels in the node substitution mixture density. That is, the Gaussian mixtures can be considered an approximation of the unknown edit operation distribution in the space of node and edge labels.

More formally, if G(· | μ, Σ) denotes a multivariate Gaussian density with mean μ and covariance matrix Σ, the probability of an edit path e = (e_1, …, e_k) is given by

$$p(e_1, \ldots, e_k) = \prod_{j=1}^{k} \beta_{t_j} \sum_{i=1}^{m_{t_j}} \alpha_{t_j}^{i}\, G\!\left(e_j \mid \mu_{t_j}^{i}, \Sigma_{t_j}^{i}\right), \qquad (5)$$

where t_j denotes the type of edit operation e_j. Every edit operation type is additionally provided with a model weight β_{t_j} and a number of mixture components m_{t_j}, and for each component i ∈ {1, …, m_{t_j}} with a mixture weight α_{t_j}^i, a mean vector μ_{t_j}^i, and a covariance matrix Σ_{t_j}^i. If node labels are d-dimensional vectors, node insertion and deletion mixture components consist of d-variate Gaussian densities, and node substitution mixture components consist of 2d-variate Gaussian densities. In the proposed model we use weighted mixtures of Gaussians as they can be parametrized in a straightforward way and are suitable for an approximation of unknown distributions. The general considerations, however, are not limited to this kind of probability distribution model.
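A compact sketch of how Eq. (5) can be evaluated, assuming SciPy. The Mixture container and the (type, label) encoding of edit operations are our own illustrative choices, not the authors' implementation:

```python
# Sketch of Eq. (5): per-type Gaussian mixtures over edit operation labels;
# a substitution carries the concatenated pair of labels as its vector.
from dataclasses import dataclass
import numpy as np
from scipy.stats import multivariate_normal

@dataclass
class Mixture:
    beta: float        # model weight beta_t of this operation type
    alphas: list       # mixture weights alpha_t^i, summing to one
    means: list        # mean vectors mu_t^i
    covs: list         # covariance matrices Sigma_t^i

    def density(self, x):
        return sum(a * multivariate_normal.pdf(x, mean=mu, cov=c)
                   for a, mu, c in zip(self.alphas, self.means, self.covs))

def path_probability(ops, mixtures):
    """ops: list of (type, label vector); mixtures: dict type -> Mixture."""
    p = 1.0
    for t, x in ops:
        mix = mixtures[t]
        p *= mix.beta * mix.density(np.asarray(x))   # one factor per e_j
    return p
```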

3.2. Cost learning algorithm

The training of the distribution model of edit operations, and hence the learning of the edit cost functions, is intended to improve the compactness of graph classes, that is, to increase intra-class probabilities in a controlled manner. To this end we introduce a maximum likelihood criterion with respect to sample graphs. We assume that a labeled sample set of training graphs is given, that is, the class or identity of every graph in the training set is known. We proceed by extracting intra-class graph pairs from the training corpus, hence obtaining pairs of graphs required to be similar. During learning, the underlying density model is adapted such that the probability of intra-class training pairs is increased, which is equivalent to the intra-class distances according to Eq. (4) being decreased.

To train the edit operation model, we apply the Expectation Maximization (EM) algorithm [10,19]. EM can be used to estimate maximum likelihood parameters while coping with missing or hidden data. An initial parametrized distribution is improved in a dual-step process by locally maximizing the conditional expectation of the hidden data given an observation. A convenient property of the EM algorithm is that the likelihood of the observed data does not decrease in consecutive EM steps. In our context, the likelihood of the training samples is maximized by modifying the hidden parameters of the underlying distribution. For the specific edit operation model introduced above, we can conclude that the training algorithm will converge to a stationary point on the likelihood surface [27].

The EM algorithm we propose is related to general EM mixture density learning. During learning, the parameters of the edit operation model given in Eq. (5) are adapted according to



$$\beta_{t_j} = \frac{\sum_{(e_1,\ldots,e_k)} p(e_1,\ldots,e_k)\, v_{t_j}(e_1,\ldots,e_k)}{\sum_{(e_1,\ldots,e_k)} p(e_1,\ldots,e_k)\, k}, \qquad (6)$$

$$\alpha_{t_j}^{i} = \frac{\sum_{(e_1,\ldots,e_k)} p(e_1,\ldots,e_k) \sum_{l=1}^{k} w_{il}}{\sum_{(e_1,\ldots,e_k)} p(e_1,\ldots,e_k)\, v_{t_j}(e_1,\ldots,e_k)}, \qquad (7)$$

$$\mu_{t_j}^{i} = \frac{\sum_{(e_1,\ldots,e_k)} p(e_1,\ldots,e_k) \sum_{l=1}^{k} w_{il}\, e_l}{\sum_{(e_1,\ldots,e_k)} p(e_1,\ldots,e_k) \sum_{l=1}^{k} w_{il}}, \qquad (8)$$

$$\Sigma_{t_j}^{i} = \frac{\sum_{(e_1,\ldots,e_k)} p(e_1,\ldots,e_k) \sum_{l=1}^{k} w_{il}\, (e_l - \mu_{t_j}^{i})(e_l - \mu_{t_j}^{i})^{T}}{\sum_{(e_1,\ldots,e_k)} p(e_1,\ldots,e_k) \sum_{l=1}^{k} w_{il}}, \qquad (9)$$

where v_{t_j}(e_1, …, e_k) denotes the number of edit operations of type t_j occurring in edit path (e_1, …, e_k), and w_{il} is the posterior probability of mixture component i given edit operation e_l. Note that the summation is performed over all edit paths between training graph pairs.
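The update equations can be sketched in code as follows, under two simplifying assumptions stated in the comments: each training pair contributes only its most likely edit path (the maximum approximation of Eq. (3)) rather than the full path set, and the loop handles a single operation type t. Mixture and path_probability refer to the illustrative sketch in Section 3.1; this is not the authors' implementation.

```python
# One EM update implementing Eqs. (7)-(9) for a single operation type t.
# Eq. (6) for beta couples all operation types and is omitted here, since
# with a single type v_t(e_1,...,e_k) = k and beta reduces to 1.
import numpy as np
from scipy.stats import multivariate_normal

def posteriors(mix, e):
    """w_il: posterior of each mixture component given edit operation e."""
    comp = np.array([a * multivariate_normal.pdf(e, mean=mu, cov=c)
                     for a, mu, c in zip(mix.alphas, mix.means, mix.covs)])
    return comp / comp.sum()

def em_step(paths, mix):
    """paths: list of edit paths, each a list of label vectors of type t."""
    m, dim = len(mix.alphas), len(np.atleast_1d(mix.means[0]))
    num_a, den_w = np.zeros(m), np.zeros(m)
    num_mu = np.zeros((m, dim))
    den_v = 0.0
    cached = []                                  # reuse weights in 2nd pass
    for ops in paths:
        p = path_probability([("t", e) for e in ops], {"t": mix})
        den_v += p * len(ops)                    # v_t counts every op, Eq. (7)
        for e in ops:
            e = np.asarray(e, dtype=float)
            w = posteriors(mix, e)
            cached.append((p, e, w))
            num_a += p * w                       # numerator of Eq. (7)
            den_w += p * w                       # denominator of Eqs. (8)-(9)
            num_mu += p * w[:, None] * e
    alphas = num_a / den_v                       # Eq. (7)
    means = num_mu / den_w[:, None]              # Eq. (8)
    covs = np.zeros((m, dim, dim))
    for p, e, w in cached:                       # Eq. (9), updated means
        d = e[None, :] - means                   # (m, dim) deviations
        covs += p * w[:, None, None] * d[:, :, None] * d[:, None, :]
    covs /= den_w[:, None, None]
    return alphas, means, covs
```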

A predefined initial set of distribution parameters is required by the training algorithm. In general, the learning of edit costs does not lead to globally optimal parameters from the initial set of parameters, but only to a local optimum on the likelihood surface. Instead of choosing random values for the number of mixture components, the mixture weights, and the Gaussian parameters, we employ an iterative technique for mixture density estimation [22]. Every mixture density is initialized with a single component derived from the training samples. Whenever the mixture density appears to converge during training, a number of new candidate components are added by evaluating a Taylor approximation of the likelihood function. The candidate that performs best in the evaluation is finally chosen. This procedure is terminated if the overall likelihood of the samples cannot be improved any further. Theoretical considerations encourage the use of such an iterative parameter initialization strategy [22]. After convergence, all mixture components contributing only little to the overall distribution are removed.
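The initialization strategy can be outlined as a skeleton, with the caveat that all callbacks (fit_em, log_likelihood, propose_candidates) are hypothetical placeholders and the Taylor-approximation scoring of [22] is replaced by direct likelihood evaluation, so this is a simplification rather than the authors' procedure:

```python
# Skeleton of the iterative mixture initialization in the spirit of [22].
import numpy as np

def greedy_init(samples, fit_em, log_likelihood, propose_candidates,
                min_weight=1e-3):
    mix = fit_em(samples, n_components=1)        # single initial component
    best_ll = log_likelihood(mix, samples)
    while True:
        candidates = [fit_em(samples, warm_start=mix, extra_component=c)
                      for c in propose_candidates(mix, samples)]
        scores = [log_likelihood(m, samples) for m in candidates]
        if not scores or max(scores) <= best_ll:
            break                                # likelihood cannot improve
        best_ll = max(scores)
        mix = candidates[int(np.argmax(scores))] # keep the best candidate
    # remove components contributing only little to the overall distribution
    keep = [i for i, a in enumerate(mix.alphas) if a >= min_weight]
    mix.alphas = [mix.alphas[i] for i in keep]
    mix.means = [mix.means[i] for i in keep]
    mix.covs = [mix.covs[i] for i in keep]
    total = sum(mix.alphas)
    mix.alphas = [a / total for a in mix.alphas]  # renormalize weights
    return mix
```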

4. Experimental results

In the following, we present an experimental evaluation of the proposed cost learning method on synthetically generated letter graphs and on graphs extracted from a standard database of fingerprint images.

4.1. Recognition of letter line drawings

In the first experiment we consider the set of all capital characters that consist of straight line segments only (A, E, F, H, I, K, L, M, N, T, V, W, X, Y, Z). To obtain a letter graph database, we manually create a prototype of an ideal letter drawing, one for each of the 15 classes considered. The class prototypes then undergo a distortion process leading to a random removal, insertion, or displacement of one or several line segments. An arbitrary number of sample graphs can be generated by repeatedly distorting the ideal prototype. An illustration of an ideal prototype and two distorted instances of a letter A line drawing are provided in Fig. 1a and b–c, respectively. Such line drawings are then converted into attributed graphs by representing ending points by nodes (with a label giving the position of the node) and lines by edges (without label) [18].

Fig. 1. (a) Ideal letter prototype, (b–c) distorted instances of ideal prototype, and (d–e) sheared instances of ideal prototype.


We begin our evaluation by visualizing the learning of edit costs. For a database of three letter classes and 10 graphs per class, the edit distance between all graph pairs is computed before learning and after termination of the learning algorithm. For a visual representation of the distribution of the graphs, we derive a Euclidean embedding from the full distance matrix by means of a multidimensional scaling technique [8]. In Fig. 2, the graph distribution before learning and after learning is illustrated. It can clearly be observed that before learning the three classes severely overlap (Fig. 2a). After learning, however, well-separated clusters of graphs are obtained (Fig. 2b).

In the following we use a cluster validation index to measure the quality of a clustering in quantitative terms, instead of resorting to a visual interpretation. A cluster validation measure is a function indicating how well the classes are clustered in the graph space. The measure we use is based on the C-index [13] and is adapted so as to produce high values for compact and well-separated clusters. In the context of the learning of edit costs, the edit distance, and hence eventually the underlying edit costs, determine what clusters result from the graph matching process. In Fig. 3a, the edit cost learning process is illustrated in terms of the cluster validation index. Note that an improvement in the clustering structure and an apparent convergence of the validation index after only 10 iterations are clearly visible.
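For concreteness, a small sketch of a C-index style score, assuming NumPy. The exact adaptation used in the paper is not specified, so the 1 − C rescaling below (high values for compact, well-separated clusters) is our assumption:

```python
# C-index [13]: C = (S - S_min) / (S_max - S_min), where S sums the
# intra-class pairwise distances and S_min/S_max sum the k smallest and
# k largest of all pairwise distances (k = number of intra-class pairs).
import numpy as np

def c_index_score(D, labels):
    """D: symmetric distance matrix; labels: class label per object."""
    labels = np.asarray(labels)
    iu = np.triu_indices(len(labels), k=1)       # each pair counted once
    all_d = D[iu]
    same = labels[iu[0]] == labels[iu[1]]        # intra-class pair mask
    S, k = all_d[same].sum(), int(same.sum())
    sorted_d = np.sort(all_d)
    S_min, S_max = sorted_d[:k].sum(), sorted_d[-k:].sum()
    return 1.0 - (S - S_min) / (S_max - S_min)
```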

For the letter graph dataset, another edit cost model that is specifically suitable for the letter graph representation has been heuristically developed. This application-specific model assigns costs that are proportional to the Euclidean distance of two labels in case of substitutions, while insertions and deletions have fixed costs. The model hence takes into account that the node parameters represent Euclidean coordinates.

Fig. 2. Distribution of three graph classes (a) before learning and (b) after learning.

Fig. 3. Clustering quality of the probabilistic model on (a) letter graphs and (b) fingerprint graphs.

Fig. 4. Classification accuracy of the probabilistic model compared to an application-specific model on (a) letter graphs and (b) fingerprint graphs.


To evaluate which model best copes with strong distortions, we carry out another experiment on the same dataset. By applying an additional distortion operator in the form of a shearing transformation to the graphs, the original dataset is converted into a more difficult dataset with stronger distortions. We proceed by generating six sample sets of graphs from the original dataset with various degrees of distortion. (For an example of distorted characters, see an ideal letter prototype in Fig. 1a and two distorted letter instances in Fig. 1d–e.) Comparing the proposed model to the application-specific heuristic model, we find that the probabilistic estimation of the edit operation distribution is especially effective on heavily distorted data. The accuracy of a leave-one-out nearest-neighbor classifier on the original letter dataset and six datasets of various degrees of distortion is illustrated in Fig. 4a. The handcrafted application-specific model performs well compared to the probabilistic model in case of small distortions. With an increasing degree of distortion, however, it rapidly deteriorates, and the stochastic model proposed in this paper becomes superior.

We conclude from the letter graph experiments that the proposed learning algorithm is able to estimate the edit operation distribution such that an improved set of edit costs can be derived from the underlying mixture model. The learning of the edit costs results in compact clusters and an improvement in classification accuracy, particularly in case of heavy distortions.

4.2. Fingerprint classification

Fingerprint classification is the task of grouping fingerprints into classes with similar characteristics [15]. The fingerprint classification system we describe in this paper is based on a structural representation of singular regions extracted from fingerprints [17]. A fingerprint image, the corresponding characteristic regions, and the extracted graph are shown in Fig. 5a–c.

Fig. 5. (a) Fingerprint image, (b) detected characteristic regions, and (c) extracted attributed graph.


We demonstrate the usefulness of the cost learning algorithm on 500 graphs from the NIST-4 database of fingerprints [25]. We again compute a cluster validation index reflecting the quality of the graph clustering. The learning of edit costs in terms of the clustering quality is illustrated in Fig. 3b. It turns out that the validation index improves only slightly, the improvement being virtually invisible in the illustration. Hence, the training process is unable to strongly adapt the edit costs to the fingerprint graph sample.

If we compare the classification accuracy, however, we observe that the probabilistic model proposed in this paper outperforms the application-specific model. In Fig. 4b, the classification performance obtained on an independent test set of 500 fingerprints by means of a nearest-neighbor classifier on the training set is illustrated. As expected from the results of the training process shown in Fig. 3b, the classification performance of the probabilistic model in Fig. 4b is almost independent of the training iteration. The initial estimation of the edit operation distribution is therefore not significantly adapted by the training process. Yet, the proposed system in its initial state already provides us with better cost functions than the manually designed system. Using the proposed model, we can improve the classification rate from 66.6% to 77.6%.

5. Conclusions

In this paper we propose a method to derive graph edit costs from a probabilistic model. Edit costs are used to compute distances between graphs by performing a structural matching. We introduce a probabilistic model for graph edit operations and show how to estimate the edit operation distribution from a labeled set of graphs. The edit costs are adapted so as to decrease the distance between graphs from the same class, leading to compact graph clusters. In an experimental evaluation, we show that our method can be used to learn edit costs that result in enhanced clusterings and improved recognition performance. The learning process is demonstrated on synthetically generated graphs representing letter drawings and on real-world fingerprint graphs. The proposed method is found to outperform application-specific models of edit operation costs. In the future, we intend to apply the learning algorithm to other graph datasets. We also plan to further investigate the convergence behavior of the system on difficult graph representations.

Acknowledgments

This research was supported by the Swiss National Science Foundation NCCR program "Interactive Multimodal Information Management (IM)2" in the Individual Project "Multimedia Information Access and Content Protection".

References

[1] Special Section on Graph Algorithms and Computer Vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (10) (2001) 1040–1151.
[2] Special Issue on Graph Based Representations, Pattern Recognition Letters 24 (8) (2003) 1033–1122.
[3] Special Issue on Graph Matching in Pattern Recognition and Computer Vision, International Journal of Pattern Recognition and Artificial Intelligence 18 (3) (2004) 261–517.
[4] H. Bunke, G. Allermann, Inexact graph matching for structural pattern recognition, Pattern Recognition Letters 1 (1983) 245–253.
[5] H. Bunke, K. Shearer, A graph distance metric based on the maximal common subgraph, Pattern Recognition Letters 19 (3) (1998) 255–259.
[6] W.J. Christmas, J. Kittler, M. Petrou, Structural matching in computer vision using probabilistic relaxation, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (8) (1995) 749–764.
[7] D. Conte, P. Foggia, C. Sansone, M. Vento, Thirty years of graph matching in pattern recognition, International Journal of Pattern Recognition and Artificial Intelligence 18 (3) (2004) 265–298.
[8] T. Cox, M. Cox, Multidimensional Scaling, Chapman and Hall, 1994.
[9] A.D.J. Cross, E.R. Hancock, Graph matching with a dual-step EM algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (11) (1998) 1236–1253.
[10] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society 39 (1) (1977) 1–38.
[11] M.-L. Fernandez, G. Valiente, A graph distance metric combining maximum common subgraph and minimum common supergraph, Pattern Recognition Letters 22 (6–7) (2001) 753–758.
[12] A.M. Finch, R.C. Wilson, E.R. Hancock, Symbolic graph matching with the EM algorithm, Pattern Recognition 31 (11) (1998) 1777–1790.
[13] L. Hubert, J. Schultz, Quadratic assignment as a general data analysis strategy, British Journal of Mathematical and Statistical Psychology 29 (1976) 190–241.
[14] B. Luo, E. Hancock, Structural graph matching using the EM algorithm and singular value decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (10) (2001) 1120–1136.
[15] D. Maltoni, D. Maio, A.K. Jain, S. Prabhakar, Handbook of Fingerprint Recognition, Springer, 2003.
[16] B.T. Messmer, H. Bunke, A new algorithm for error-tolerant subgraph isomorphism detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (5) (1998) 493–504.
[17] M. Neuhaus, H. Bunke, A graph matching based approach to fingerprint classification using directional variance, in: Proceedings of the 5th International Conference on Audio- and Video-Based Biometric Person Authentication, LNCS, vol. 3546, Springer, 2005, pp. 191–200.
[18] M. Neuhaus, H. Bunke, Self-organizing maps for learning the edit costs in graph matching, IEEE Transactions on Systems, Man, and Cybernetics (Part B) 35 (3) (2005) 503–514.
[19] R.A. Redner, H.F. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Review 26 (2) (1984) 195–239.
[20] E. Ristad, P. Yianilos, Learning string edit distance, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (5) (1998) 522–532.
[21] A. Sanfeliu, K.S. Fu, A distance measure between attributed relational graphs for pattern recognition, IEEE Transactions on Systems, Man, and Cybernetics 13 (3) (1983) 353–363.
[22] N. Vlassis, A. Likas, A greedy EM algorithm for Gaussian mixture learning, Neural Processing Letters 15 (1) (2002) 77–87.
[23] R.A. Wagner, M.J. Fischer, The string-to-string correction problem, Journal of the ACM 21 (1) (1974) 168–173.
[24] W.D. Wallis, P. Shoubridge, M. Kraetzl, D. Ray, Graph distances using graph union, Pattern Recognition Letters 22 (6) (2001) 701–704.
[25] C.I. Watson, C.L. Wilson, NIST Special Database 4, Fingerprint Database, March 1992.
[26] R.C. Wilson, E. Hancock, Structural matching by discrete relaxation, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (6) (1997) 634–648.
[27] C.F.J. Wu, On the convergence properties of the EM algorithm, The Annals of Statistics 11 (1) (1983) 95–103.