
JOURNAL OF ALGORITHMS 5, 163-179 (1984)

Heuristic Matching for Graphs Satisfying the Triangle Inequality*

DAVID A. PLAISTED

Department of Computer Science, University of Illinois, Urbana, Illinois 61801

Received January 3, 1981

Two matching heuristics are presented. The hyper-greedy method runs in time $O(n^2 \log n)$ and produces a matching whose cost is at most $2\log_3(1.5n)$ times optimal. Graphs are given causing this method to achieve nearly this ratio. The factor of two method runs in time $O(n^2 \log K)$, where $K$ is the maximum ratio of edge lengths in the graph, and never requires more than $O(n^3)$ time. The factor of two method produces a matching whose cost is at most max(4 log, K, 4 log, n) times optimal, plus lower-order terms. Graphs are given causing this method to achieve a ratio asymptotically equal to (log,n)/2.

INTRODUCTION

We represent an undirected graph $G$ by the ordered pair $(V, E)$, where $V$ is the set of vertices of $G$ and $E$ is the set of edges. From now on we assume that all edges of all graphs under consideration have lengths or weights assigned which are rational numbers. The length of edge $\{x, y\}$ is denoted by $d(x, y)$. A perfect matching of a graph $G$ is a subset $M$ of the edges of $G$ such that every vertex of $G$ is an endpoint of exactly one edge in $M$. The cost of a perfect matching $M$ is the sum of the lengths of the edges in $M$. The problem we are interested in is finding a perfect matching of small cost. The most efficient known general algorithm to find a smallest-cost perfect matching requires $\Theta(n^3)$ operations for graphs with $n$ vertices [3]. We present an algorithm which finds a matching whose cost is at most $2\log_3(1.5n)$ times optimal and which runs in $O(n^2 \log n)$ operations.

*This research was supported in part by the National Science Foundation under Grant MCS 81-09831.

0196-6774/84 $3.00

Copyright © 1984 by Academic Press, Inc. All rights of reproduction in any form reserved.

A sparse graph is a graph which lacks some edges between distinct vertices. Such missing edges are assumed to have as length the length of a shortest path between their endpoints. We assume sparse graphs are connected, since perfect matchings can be found for disconnected graphs by finding a perfect matching for each connected component, assuming each component has an even number of vertices. Therefore we can assume that $|E| \ge n - 1$, where $n$ is the number of vertices. We present a matching algorithm which can find a matching whose cost is bounded as in the above heuristic and which runs in $O(|E|(\log n)^2)$ operations on sparse graphs. A dense graph is a complete graph on its set of vertices (with no edges from a vertex to itself). The length of a path is the sum of the lengths of the edges in the path; the length of a cycle is the number of edges in the cycle. For a discussion of graph terminology see [4]. Terminology similar to the above can be defined for directed graphs. We represent an edge from $x$ to $y$ in a directed graph by the ordered pair $(x, y)$.
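The convention that a missing edge of a sparse graph has the length of a shortest path between its endpoints can be illustrated with a small sketch. The function name `metric_closure` and the integer vertex labels are assumptions made for this illustration, and the Floyd-Warshall recurrence used here takes cubic time, so it only spells out the convention and is not how the algorithms in this paper process sparse graphs.

```python
def metric_closure(n, edges):
    """All-pairs shortest path lengths for vertices 0..n-1.

    edges is a list of (u, v, length) triples of an undirected graph.
    Illustrative helper only (hypothetical name, cubic running time).
    """
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    # Floyd-Warshall: allow intermediate vertices 0..k one at a time.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


closure = metric_closure(3, [(0, 1, 1), (1, 2, 2)])
print(closure[0][2])  # length assigned to the missing edge {0, 2}
```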

1. THE HYPER-GREEDY METHOD

Suppose we are given an undirected graph $G = (V, E)$ satisfying the triangle inequality. Suppose $|V| = n$ and $n$ is even. This method works by constructing a sequence $G_0, G_1, \ldots, G_k$ of undirected graphs in which $G$ is $G_0$. Then subsets $M_k, M_{k-1}, \ldots, M_0$ of the edges of $G_k, G_{k-1}, \ldots, G_0$ are constructed. These subsets are similar to matchings of subsets of the vertices of the $G_i$. Finally, $M_0$ is converted to a matching of $G_0$.

1.1. Constructing the Graphs $G_i$

The graph $G_i$ is $(V_i, E_i)$, and the vertices $V_i$ of $G_i$ are partitioned into two disjoint subsets, $\mathrm{Odd}_i$ and $\mathrm{Even}_i$. We call $\mathrm{Odd}_i$ the “odd vertices” of $G_i$ and $\mathrm{Even}_i$ the “even vertices” of $G_i$. Each vertex in $\mathrm{Odd}_i$ is a set of vertices in $\mathrm{Odd}_{i-1}$, for $i > 0$. Also, $\mathrm{Odd}_0 = V$ and $\mathrm{Even}_0 = \emptyset$.

We obtain $G_1$ from $G_0$ in the following way. First, we construct the directed graph $G_0' = (V_0, E_0')$ in which $E_0'$ includes an edge $(x, y)$ if $y$ is one of the closest neighbors of $x$ in $G_0$. Now, $G_0'$ may include cycles (ignoring the directions of the edges) if we are not careful how $G_0'$ is constructed. For example, if vertices $a$, $b$, and $c$ are all the same distance from each other, the cycle $\{(a, b), (b, c), (c, a)\}$ can be included in $G_0'$. To avoid this, we construct $G_0'$ as follows: For each vertex $x$ of $G_0$, let $n(x)$ be $\min\{d(x, y) : y \in V_0,\ y \ne x\}$, where $d(x, y)$ is the length of the edge $\{x, y\}$. We sort the vertices of $G_0$ and process them in increasing order of $n(x)$. For each vertex $x$, one of the edges $(x, y)$ of length $n(x)$ is added to $E_0'$. If there exists $y$ such that $d(x, y) = n(x)$ and such that $(y, x)$ is already in $E_0'$, then some such $y$ is chosen and the edge $(x, y)$ is added to $E_0'$.

This method of constructing $G_0'$ ensures that $G_0'$ contains no cycles of length greater than two. To see this, suppose $G_0'$ contains a cycle $C$ of length 3 or more. Consider the last edge $e$ added to $C$. Suppose $e$ was added when processing vertex $x$. Let $e = (x, y)$ and suppose $(z, x)$ was already in $C$ when $x$ was processed. Then $d(x, y) = d(x, z)$: if $d(x, y)$ were smaller, then $x$ would have been processed before $z$, and if $d(x, y)$ were larger, then $(x, y)$ would not have been added to $E_0'$. Now, in this case, since $(z, x)$ is already in $E_0'$ and $(y, x)$ is not already in $E_0'$, $(x, y)$ would not have been added to $E_0'$. We know that $(y, x)$ is not already in $E_0'$ because we assume $(x, y)$ is the last edge to be added to $C$. Therefore, ignoring directions of edges, $G_0'$ contains no cycles of length greater than two, and is thus a collection of disjoint trees.

To obtain $G_1$, let $V_1$ contain one vertex $x$ for each tree $T$ in $G_0'$. The vertex $x$ is the set of vertices in $T$. Also, if $x$ and $y$ are vertices of $V_1$, then there is an edge between $x$ and $y$ iff there exists an edge $\{u, v\}$ in $E_0$ such that $u \in x$, $v \in y$. In this case, $d(x, y) = \min\{d(u, v) : u \in x,\ v \in y\}$. Also, $\mathrm{Odd}_1 = \{x \in V_1 : |x| \text{ is odd}\}$ and $\mathrm{Even}_1 = \{x \in V_1 : |x| \text{ is even}\}$. Note that $G_1$ need not satisfy the triangle inequality even if $G_0$ does.
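The construction of $G_0'$ and of the vertex set of $G_1$ described above can be sketched for a dense graph as follows. This is a minimal illustration under stated assumptions, not the implementation analyzed later: the name `first_level`, the callable `d`, and the use of a union-find structure to collect the trees are all choices made for this sketch, and it runs in quadratic time per vertex scan rather than using the data structures of Section 1.4.

```python
def first_level(vertices, d):
    """Build E_0' with the tie-breaking rule and return (E_0', trees).

    vertices: iterable of hashable, sortable labels; d(x, y): exact edge length.
    Hypothetical helper for illustration.
    """
    vertices = list(vertices)
    nearest = {x: min(d(x, y) for y in vertices if y != x) for x in vertices}
    chosen = set()  # directed edges (x, y) of E_0'
    # Process vertices in increasing order of n(x).
    for x in sorted(vertices, key=lambda v: nearest[v]):
        ties = [y for y in vertices if y != x and d(x, y) == nearest[x]]
        # Prefer a closest neighbor y whose edge (y, x) is already present.
        back = [y for y in ties if (y, x) in chosen]
        y = back[0] if back else ties[0]
        chosen.add((x, y))

    # Ignore directions and collect the trees of G_0' with union-find.
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for x, y in chosen:
        parent[find(x)] = find(y)
    trees = {}
    for v in vertices:
        trees.setdefault(find(v), []).append(v)
    return chosen, sorted(sorted(t) for t in trees.values())


edges, trees = first_level([0, 1, 2, 4, 5, 6], lambda x, y: abs(x - y))
print(trees)  # each tree becomes one vertex of G_1; odd-sized trees are odd vertices
```

On the six collinear points used in Section 1.3 this yields two trees of three vertices each, so both vertices of $G_1$ are odd.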

The general method for constructing $G_{i+1}$ from $G_i$ is similar to the above, except that in constructing $G_i'$ from $G_i$, we include an edge from an odd vertex $x$ to some odd vertex $y$ such that the length of the shortest path from $x$ to $y$ is minimal. If there is more than one such $y$, then one of them is chosen as indicated above to ensure that no cycles of length greater than two are created. The length of the edge $(x, y)$ in $G_i'$ is the sum of the lengths of the edges in some shortest path from $x$ to $y$, and is denoted $d'(x, y)$. No edges are constructed in $G_i'$ having even vertices of $G_i$ as endpoints. Thus $G_i'$ is a collection of disjoint trees (ignoring the directions of the edges). For each tree $T$ of $G_i'$, a vertex $x$ is included in $G_{i+1}$; this vertex $x$ is the set of vertices in $T$. Also, $x$ is an odd vertex of $G_{i+1}$ if $T$ contains an odd number of vertices, and $x$ is an even vertex of $G_{i+1}$ otherwise. In addition, all even vertices of $G_i$ are included as even vertices of $G_{i+1}$. As before, there is an edge between vertices $x$ and $y$ of $G_{i+1}$ iff there exists an edge $\{u, v\}$ in $G_i$ such that $u \in x$ and $v \in y$. In this case, $d(x, y)$ is $\min\{d(u, v) : u \in x,\ v \in y\}$.

Note that each odd vertex of $G_i$ is a set containing at least three elements, for $i > 0$. Therefore $|\mathrm{Odd}_{i+1}| \le (1/3)|\mathrm{Odd}_i|$ for $i \ge 0$. Since $|\mathrm{Odd}_i|$ is even for all $i$ and $|\mathrm{Odd}_k| = 0$, $|\mathrm{Odd}_{k-1}| \ge 2$, so $k \le \log_3(3n/2)$.
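The bound on the number of levels follows from the shrinkage of the odd vertex sets in one line. The derivation below only expands the step stated above, taking $k$ to be the first index with $\mathrm{Odd}_k = \emptyset$ and using $|\mathrm{Odd}_0| = n$.

```latex
\[
2 \;\le\; |\mathrm{Odd}_{k-1}| \;\le\; \frac{|\mathrm{Odd}_0|}{3^{\,k-1}} \;=\; \frac{n}{3^{\,k-1}}
\quad\Longrightarrow\quad
3^{\,k-1} \le \frac{n}{2}
\quad\Longrightarrow\quad
k \;\le\; 1 + \log_3\frac{n}{2} \;=\; \log_3\frac{3n}{2}.
\]
```

For the six-vertex example of Section 1.3 the bound is attained: $n = 6$ gives $\log_3 9 = 2$, and $\mathrm{Odd}_2$ is the first empty level.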

1.2. Constructing the Matchings $M_i$

In order to specify how the matchings $M_i$ are obtained, we introduce some terminology.


DEFINITION. An odd matching of a set $W$ of vertices in a graph $G = (V, E)$ is a set $D$ of edges of $G$ such that (a) every vertex in $W$ is an endpoint of an odd number of edges of $D$ and (b) every vertex in $V - W$ is an endpoint of an even number of edges in $D$. If $G$ is a directed graph, we ignore the directions of edges in $G$ when constructing $D$.

Note that an ordinary matching of a graph $G = (V, E)$ is an odd matching of $V$ in $G$. Also, if an odd matching of $W$ in $G$ exists, then $|W|$ must be even.

PROPOSITION. If $T$ is a tree and $W$ is a set containing an even number of vertices of $T$, then there is a unique odd matching of $W$ in $T$.

Proof. To show that an odd matching exists, pair up the vertices of $W$ in some way. There is a unique path connecting each pair of vertices. Let $D$ be the set of edges contained in an odd number of such paths. Then $D$ is an odd matching of $W$ in $T$, since each path contains two edges incident with each of its interior vertices and one edge incident with each of its endpoints. To show uniqueness, suppose $\{x, y\}$ is an edge of $T$. Then removing this edge disconnects $T$ into two trees $T_1$ and $T_2$. If each such tree contains an odd number of vertices in $W$, then $\{x, y\}$ must be contained in every odd matching of $W$ in $T$; otherwise $\{x, y\}$ must be excluded from every such odd matching.

If $T$ is a tree and $|W|$ is even, it is possible to find an odd matching of $W$ in $T$ using the following procedure. This procedure can be made to run in linear time since the vertices of $T$ can be traversed in depth-first order [4] in linear time, and thus leaves of $T$ can be found as necessary in linear time.

procedure Odd(W, T);
    if W = ∅ then return(∅) fi;
    let x be a leaf of T;
    T' ← T with x and the edge incident with x deleted;
    if x ∉ W then return(Odd(W, T'))
    else
        let y be the neighbor of x in T;
        if y ∈ W then return(Odd(W - {x, y}, T') ∪ {{x, y}})
        else return(Odd((W - {x}) ∪ {y}, T') ∪ {{x, y}}) fi
    fi;
end Odd;
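An iterative version of the procedure Odd can be sketched in Python as follows. The name `odd_matching`, the edge-list representation of the tree, and the explicit worklist of leaves are assumptions of this sketch; the leaf-stripping rule itself is the one given in the procedure above (a leaf in $W$ forces its edge into the result and toggles the membership of its neighbor).

```python
def odd_matching(tree_edges, W):
    """Return the odd matching of W in a tree given as a list of (u, v) edges.

    The result is a set of frozenset({u, v}) edges. Hypothetical helper name.
    """
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    W = set(W)
    if len(W) % 2:
        raise ValueError("W must contain an even number of vertices")
    leaves = [v for v in adj if len(adj[v]) == 1]
    D = set()
    while W and leaves:
        x = leaves.pop()
        if len(adj[x]) != 1:
            continue  # x is the last remaining vertex of the tree
        (y,) = adj[x]
        # Delete the leaf x and its incident edge from the tree.
        adj[y].discard(x)
        adj[x].clear()
        if len(adj[y]) == 1:
            leaves.append(y)
        if x in W:
            D.add(frozenset((x, y)))
            W.discard(x)
            W ^= {y}  # y leaves W if it was in W, and joins W otherwise
    return D


path = [(0, 1), (1, 2), (2, 3)]
print(odd_matching(path, {0, 3}))  # all three edges of the path
```

Each vertex enters the worklist at most once, so the sketch does a constant amount of work per tree edge.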

Given a set $D$ of edges in $G_i'$, let $\mathrm{Paths}(D)$ be the corresponding set of paths in $G_i$. Recall that each edge in $G_i'$ corresponds to a path in $G_i$, namely, a shortest path between its endpoints. Also, given a set $D$ of edges in $G_i$, let $\mathrm{Close}(D)$ be the corresponding edges in $G_{i-1}$. Thus if $\{x, y\}$ is in $D$, then $\{u, v\}$ will be in $\mathrm{Close}(D)$ for some $u \in x$ and $v \in y$ such that $d(u, v)$ is minimal.


The matchings $M_i$ of $G_k, G_{k-1}, \ldots, G_0$ are constructed in order. To do this, auxiliary matchings $M_i'$ are also constructed. First, $M_k$ is empty. In general, for $i < k$, $M_i'$ is an odd matching of $\{x \in \mathrm{Odd}_i : x \text{ is included in an even number of edges of } \mathrm{Close}(M_{i+1})\}$ in $G_i'$. Thus, in each tree of $G_i'$, the vertices that are included an odd number of times in edges of $\mathrm{Close}(M_{i+1})$ are considered as already matched, and an odd matching of the remaining vertices is found. Also, for $i < k$, $M_i$ is $M_i^0 \cup \mathrm{Close}(M_{i+1})$, where $M_i^0$ is the set of edges of $G_i$ included in an odd number of paths in $\mathrm{Paths}(M_i')$.

We now show that these matchings exist and that $M_i$ is an odd matching of $\mathrm{Odd}_i$ in $G_i$. We show this by induction, starting with $k$ and going down to 0. Since $M_k$ is empty and $\mathrm{Odd}_k$ is empty, $M_k$ is as desired. Also, $M_k'$ is not used. Suppose $M_{i+1}$ is an odd matching of $\mathrm{Odd}_{i+1}$ in $G_{i+1}$. Consider a tree $T$ of $G_i'$; this tree corresponds to a vertex $u$ of $G_{i+1}$.

If $T$ has an odd number of vertices, then $u$ is in $\mathrm{Odd}_{i+1}$. Therefore there are an odd number of edges of $M_{i+1}$ incident with $u$. Therefore there are an odd number of edges of $\mathrm{Close}(M_{i+1})$ incident with vertices in $T$. Therefore there are an odd number of vertices of $T$ incident with an odd number of edges of $\mathrm{Close}(M_{i+1})$. Since $T$ has an odd number of vertices, there are an even number of vertices of $T$ incident with an even number of edges of $\mathrm{Close}(M_{i+1})$. Thus there is an odd matching of the vertices of $T$ incident with an even number of edges in $\mathrm{Close}(M_{i+1})$.

Suppose $T$ has an even number of vertices. Then $u$ is in $\mathrm{Even}_{i+1}$, so there are an even number of edges of $M_{i+1}$ incident with $u$. Thus there are an even number of edges of $\mathrm{Close}(M_{i+1})$ incident with vertices in $T$. Therefore there are an even number of vertices of $T$ incident with an odd number of edges of $\mathrm{Close}(M_{i+1})$. Since $T$ has an even number of vertices, there are an even number of vertices of $T$ incident with an even number of edges of $\mathrm{Close}(M_{i+1})$. Therefore there is an odd matching of the vertices of $T$ incident with an even number of edges in $\mathrm{Close}(M_{i+1})$. Finally, since all trees of $G_i'$ are edge (and vertex) disjoint, an odd matching exists of the odd vertices of $G_i$ included in an even number of edges of $\mathrm{Close}(M_{i+1})$. We call this matching $M_i'$.

We now show that $M_i^0$ is an odd matching of the odd vertices of $G_i$ included in an even number of edges in $\mathrm{Close}(M_{i+1})$. Suppose $x$ is such a vertex. Then $x$ is an endpoint of an odd number of edges in $M_i'$. Hence $x$ is an endpoint of an odd number of paths in $\mathrm{Paths}(M_i')$. For each such path, there will be one edge of $G_i$ incident with $x$. Also, $x$ may be an interior point of an arbitrary number of paths of $\mathrm{Paths}(M_i')$. For each such path, there will be two edges of $G_i$ incident with $x$. Thus, summing over all paths in $\mathrm{Paths}(M_i')$, there will be an odd number of occurrences of edges incident with $x$. Hence $M_i^0$ contains an odd number of edges incident with $x$.

Suppose $y$ is an odd vertex of $G_i$ included in an odd number of edges in $\mathrm{Close}(M_{i+1})$. Then $y$ is an endpoint of an even number of edges of $M_i'$. Hence $y$ is an endpoint of an even number of paths of $\mathrm{Paths}(M_i')$. Each such path has one edge incident with $y$; thus there are an even number of occurrences of such edges altogether. In addition, $y$ may be an interior point in an arbitrary number of paths of $\mathrm{Paths}(M_i')$. Each such path has two edges incident with $y$. Summing over all paths in $\mathrm{Paths}(M_i')$, there are an even number of occurrences of edges incident with $y$ in paths of $\mathrm{Paths}(M_i')$. Hence there will be an even number of edges of $M_i^0$ incident with $y$.

Suppose $z$ is an even vertex of $G_i$. Then $z$ is not an endpoint of any edges of $M_i'$. Hence $z$ only occurs in the interior of paths of $\mathrm{Paths}(M_i')$, if at all. Reasoning as above, $z$ is an endpoint of an even number of edges of $M_i^0$. Hence $M_i^0$ is an odd matching of the odd vertices of $G_i$ included an even number of times in edges of $\mathrm{Close}(M_{i+1})$.

Now, no path of $\mathrm{Paths}(M_i')$ will contain any edges of $\mathrm{Close}(M_{i+1})$. To see this, suppose $p$ is some such path. Let the endpoints of $p$ be vertices $x$ and $y$. Then $x$ and $y$ are odd vertices of $G_i$ and there is an edge connecting $x$ and $y$ in $G_i'$. Thus $x$ and $y$ are in the same tree of $G_i'$. Furthermore, all other vertices of $p$ are even vertices of $G_i$ (since $p$ is a shortest path from $x$ or $y$ to another odd vertex). If some edge $\{u, v\}$ of $\mathrm{Close}(M_{i+1})$ were in $p$, then $u$ and $v$ would be the endpoints of $p$ (since $u$ and $v$ are odd vertices of $G_i$). But this cannot be, since edges in $\mathrm{Close}(M_{i+1})$ connect distinct trees in $G_i'$, implying that $u$ and $v$ are in different trees of $G_i'$.

From the above result it follows that $M_i^0 \cap \mathrm{Close}(M_{i+1})$ is empty, since every edge in $M_i^0$ occurs in some path of $\mathrm{Paths}(M_i')$. Therefore $M_i$ is an odd matching of $\mathrm{Odd}_i$ in $G_i$. To see this, consider an odd vertex $x$ of $G_i$. Suppose $x$ is an endpoint of an even number of edges of $\mathrm{Close}(M_{i+1})$. Then $x$ is an endpoint of an odd number of edges of $M_i^0$, since $M_i^0$ is an odd matching of the odd vertices of $G_i$ included in an even number of edges of $\mathrm{Close}(M_{i+1})$. Therefore $x$ is an endpoint of an odd number of edges of $M_i$, since $M_i = M_i^0 \cup \mathrm{Close}(M_{i+1})$. Suppose $y$ is an odd vertex of $G_i$ that is an endpoint of an odd number of edges of $\mathrm{Close}(M_{i+1})$. Then $y$ is an endpoint of an even number of edges of $M_i^0$. Hence $y$ is an endpoint of an odd number of edges of $M_i$. Finally, if $z$ is an even vertex of $G_i$ then $z$ is an endpoint of no edges of $\mathrm{Close}(M_{i+1})$, and $z$ is an endpoint of an even number of edges of $M_i^0$. Hence $z$ is an endpoint of an even number of edges of $M_i$. Therefore $M_i$ is an odd matching of $\mathrm{Odd}_i$ in $G_i$, as claimed.

Note that the hyper-greedy method can produce an odd matching of the vertices of any graph $G$ having an even number of vertices, regardless of whether $G$ satisfies the triangle inequality. This follows because $M_0$ is an odd matching of the odd vertices of $G_0$, and $G_0$ is the original graph $G$, and all vertices of $G$ are considered as odd vertices of $G_0$. If $G$ satisfies the triangle inequality, this odd matching can be converted to an ordinary matching in the following way: If $M_0$ is not already an ordinary matching, then some vertex $u$ of $G$ must be an endpoint of three or more edges of $M_0$. Suppose $\{x, u\}$ and $\{u, y\}$ are two of these edges. Suppose $\{x, y\}$ is not in $M_0$. Then $\{x, u\}$ and $\{u, y\}$ can be deleted from $M_0$ and replaced by $\{x, y\}$. Since $G$ satisfies the triangle inequality, this operation will not increase the sum of the lengths of the edges in $M_0$. Furthermore, after this replacement, the resulting set of edges is still an odd matching of the vertices of $G$, so the operation can be repeated. If $\{x, u\}$, $\{u, y\}$, and $\{x, y\}$ are all in $M_0$, they can all be deleted. (It could be that such a situation can never occur.) Eventually an ordinary matching of the vertices of $G$ will be obtained.
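The conversion of the odd matching $M_0$ into an ordinary matching can be sketched as below. The name `to_ordinary_matching` and the adjacency-set representation are assumptions of this sketch. It relies on the observation that a shortcut at $u$ never raises the degree of any other vertex, so each vertex of degree three or more needs to be visited only once; which pair of neighbors is shortcut is arbitrary, so different runs may return different (equally valid) matchings.

```python
from collections import defaultdict


def to_ordinary_matching(odd_matching_edges):
    """Shortcut an odd matching of all vertices into a perfect matching.

    odd_matching_edges: iterable of 2-element edges in which every vertex
    has odd degree. Returns a set of frozenset({u, v}) edges.
    """
    adj = defaultdict(set)
    for e in odd_matching_edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    for u in [v for v in adj if len(adj[v]) >= 3]:
        while len(adj[u]) >= 3:
            x = adj[u].pop()
            y = adj[u].pop()
            adj[x].discard(u)
            adj[y].discard(u)
            if y in adj[x]:
                # {x, u}, {u, y}, {x, y} were all present: delete all three.
                adj[x].discard(y)
                adj[y].discard(x)
            else:
                # Replace {x, u} and {u, y} by {x, y}.
                adj[x].add(y)
                adj[y].add(x)
    return {frozenset((u, v)) for u in adj for v in adj[u]}


# Vertex 1 has degree 3; lengths are |a - b| on the line.
print(to_ordinary_matching([(0, 1), (1, 2), (1, 3)]))
```

Under the triangle inequality the total length of the returned edges is no larger than that of the input, as argued above.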

Now we show that the matching produced by this method is no more than a factor of $2\log_3(1.5n)$ more costly than an optimal matching, where $n$ is the number of vertices of $G$. Let $M$ be the matching produced by the hyper-greedy method. Let $\mathrm{Opt}$ be an optimal matching (that is, minimizing the sum of edge lengths). Given a set $D$ of edges, let $\|D\|$ be the sum of the lengths of the edges in $D$. For a vertex $x$ of $G_i$, let $B(x)$ be $\bigcup\{B(y) : y \in x\}$ if $x$ is not a vertex of $G_0$. If $x$ is a vertex of $G_0$, let $B(x)$ be $\{x\}$. One can easily show by induction that if $x$ is an odd vertex of $G_i$ then $|B(x)|$ is odd, and if $x$ is an even vertex of $G_i$ then $|B(x)|$ is even.

For $i < k$, let $\mathrm{Opt}_i$ be $\{\{u, v\} : u \text{ and } v \text{ are distinct vertices of } G_i \text{ and there exists an odd number of edges } \{x, y\} \text{ in } \mathrm{Opt} \text{ such that } x \in B(u),\ y \in B(v)\}$. Thus $\mathrm{Opt}_i$ is a subset of the edges of $G_i$. Also, if $\{u, v\} \in \mathrm{Opt}_i$ and $x$ and $y$ are as in the definition of $\mathrm{Opt}_i$, then $d(u, v) \le d(x, y)$. Hence $\|\mathrm{Opt}_i\| \le \|\mathrm{Opt}\|$. Furthermore, $\mathrm{Opt}_i$ is an odd matching of the odd vertices of $G_i$. This is because each vertex in $G_0$ is an endpoint of exactly one edge in $\mathrm{Opt}$ and because $|B(x)|$ is odd if $x$ is an odd vertex of $G_i$ and $|B(x)|$ is even if $x$ is an even vertex of $G_i$. Therefore it is possible to form a set of edge-disjoint paths from the edges in $\mathrm{Opt}_i$ having the following property: Each path has two odd vertices as endpoints, and each odd vertex of $G_i$ is an endpoint of exactly one such path. Let $\mathrm{Opt}_i'$ be some such set of paths, and let $\|\mathrm{Opt}_i'\|$ be the sum of the lengths of the edges included in paths of $\mathrm{Opt}_i'$. Note that not all edges of $\mathrm{Opt}_i$ need be included in paths of $\mathrm{Opt}_i'$. Also, $\mathrm{Opt}_i'$ is an ordinary matching of the odd vertices of $G_i$, except that disjoint paths instead of single edges are used to connect vertices. Finally, $\|\mathrm{Opt}_i'\| \le \|\mathrm{Opt}_i\| \le \|\mathrm{Opt}\|$.

We now show that $\|M_i^0\| \le 2\|\mathrm{Opt}_i'\|$. Clearly $\|M_i^0\| \le \|E_i'\|$, since $M_i'$ is a subset of $E_i'$ and every edge of $M_i^0$ lies on a path of $\mathrm{Paths}(M_i')$. We show $\|E_i'\| \le 2\|\mathrm{Opt}_i'\|$ as follows: Consider an odd vertex $x$ of $G_i$. Suppose $(x, y)$ is in $E_i'$; thus the shortest path from $x$ to $y$ has sum of edge weights no larger than any other path from $x$ to another odd vertex. Suppose there is some path from $x$ to $z$ in $\mathrm{Opt}_i'$. Then the sum of the edge lengths of this path is at least as large as the sum of the edge lengths for the path from $x$ to $y$. Summing over all odd vertices $x$ of $G_i$, $\|E_i'\| \le 2\|\mathrm{Opt}_i'\|$, since each edge of $E_i'$ is counted once during the summation and each path of $\mathrm{Opt}_i'$ is counted twice, once for each endpoint. Putting all these inequalities together, $\|M_i^0\| \le 2\|\mathrm{Opt}_i'\| \le 2\|\mathrm{Opt}\|$. Therefore $\|M\| \le \sum_{i=0}^{k-1} \|M_i^0\| \le 2k\|\mathrm{Opt}\|$. Since $k \le \log_3(1.5n)$, $\|M\| \le 2\log_3(1.5n)\|\mathrm{Opt}\|$ as claimed.

1.3. Example

Suppose $G = G_0$ is the complete graph $(V, E)$, where $V$ is $\{0, 1, 2, 4, 5, 6\}$ and the length of edge $\{x, y\}$ is $|y - x|$. Then $G_0'$ has two connected components, $\{0, 1, 2\}$ and $\{4, 5, 6\}$. Hence $G_1$ has two vertices, one of them being the set $\{0, 1, 2\}$ and the other $\{4, 5, 6\}$. Both are odd vertices since each has an odd number of elements. The distance between the two vertices of $G_1$ is 2 since the distance between vertex 2 and vertex 4 is 2. Finally, $G_2$ has one vertex, the set $\{\{0, 1, 2\}, \{4, 5, 6\}\}$, which is an even vertex. Now, $M_1$ includes an edge between the vertex $\{0, 1, 2\}$ and the vertex $\{4, 5, 6\}$. Also, $\mathrm{Close}(M_1)$ is the set containing the edge $\{2, 4\}$. Furthermore, $M_0'$ is an odd matching of the vertices $\{0, 1, 5, 6\}$ in $G_0'$ and consists of the edges $\{0, 1\}$ and $\{5, 6\}$ directed in some way. Also, $M_0^0$ is the set $\{\{0, 1\}, \{5, 6\}\}$ containing two edges. Finally, $M_0$ is $M_0^0 \cup \mathrm{Close}(M_1)$ or $\{\{0, 1\}, \{2, 4\}, \{5, 6\}\}$. In general, the hyper-greedy heuristic will produce an optimal matching of a collinear set of points in the Euclidean plane (that is, a matching with no overlapping edges).
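As a sanity check on this example, the matching $\{\{0, 1\}, \{2, 4\}, \{5, 6\}\}$ of cost 4 can be compared with an exhaustive search over all perfect matchings of the six points. The function `min_cost_perfect_matching` is a hypothetical helper written for this check only; it takes exponential time, assumes an even number of points on a line, and is not part of either heuristic.

```python
def min_cost_perfect_matching(points):
    """Exhaustively find a minimum-cost perfect matching of points on a line.

    Returns (cost, list of (a, b) pairs). Illustration only: exponential time,
    and the number of points must be even.
    """
    pts = sorted(points)
    if not pts:
        return 0, []
    first, rest = pts[0], pts[1:]
    best = None
    # The smallest point must be matched to someone; try every partner.
    for j, partner in enumerate(rest):
        cost, pairs = min_cost_perfect_matching(rest[:j] + rest[j + 1:])
        cost += abs(partner - first)
        if best is None or cost < best[0]:
            best = (cost, [(first, partner)] + pairs)
    return best


print(min_cost_perfect_matching([0, 1, 2, 4, 5, 6]))
```

The search returns the same three edges that the hyper-greedy method produces in the example above.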

1.4. Time Required

We now specify in more detail how the hyper-greedy method can be implemented, and analyze the time required in the worst case. From now on, we assume that with each even vertex $x$ of $G_i$, a vertex $\mathrm{pre}_i(x)$ of $G_i$ is stored such that the mapping $\mathrm{pre}_i$ has the following property: There exists an odd vertex $y$ of $G_i$ and a path $p$ from $x$ to $y$ in $G_i$ such that $p$ begins with the edge $\{x, \mathrm{pre}_i(x)\}$ and such that the length of $p$ is minimal over all such $p$ and $y$. Thus $y$ is one of the closest odd vertices to $x$ (in the sense of minimizing length of a shortest path) and the path $x, \mathrm{pre}_i(x), \mathrm{pre}_i(\mathrm{pre}_i(x)), \ldots, y$ is a shortest path from $x$ to $y$ in $G_i$. The vertices $\mathrm{pre}_i(x)$ can be computed for all even vertices $x$ in $G_i$ using a simple modification of the shortest path algorithm of Dijkstra [2]. This algorithm can be performed in $O(|V_i|^2)$ operations on dense graphs and in $O(|E_i| \log |V_i|)$ operations on sparse graphs $G_i$.

DEFINITION. If $y$ is an odd vertex of $G_i$ then the Voronoi region of $y$ is the set of vertices $x$ of $G_i$ such that the set $\{x, \mathrm{pre}_i(x), \mathrm{pre}_i(\mathrm{pre}_i(x)), \ldots\}$ includes $y$. Thus $y$ itself is in the Voronoi region of $y$. The generalized Voronoi diagram of $G_i$ is the set of Voronoi regions of odd vertices in $G_i$. Note that the generalized Voronoi diagram of $G_i$ is a partition of $V_i$. Also, this partition can be constructed in $O(|E_i|)$ time, assuming the mapping $\mathrm{pre}_i$ has already been computed.
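One way to realize the "simple modification" of Dijkstra's algorithm is to start the search from all odd vertices at once. The sketch below does this with a binary heap, which corresponds to the sparse-graph bound; the name `voronoi_regions`, the adjacency-dictionary input, and the choice to record the region label during the search (rather than in a separate pass over `pre`) are assumptions of this illustration.

```python
import heapq
from itertools import count


def voronoi_regions(adj, odd):
    """Multi-source Dijkstra from the odd vertices.

    adj: {u: {v: length}} for an undirected graph; odd: the odd vertices.
    Returns (dist, pre, region): distance to a nearest odd vertex, the next
    vertex on a shortest path toward it, and the odd vertex owning the region.
    """
    odd = list(odd)
    dist = {y: 0 for y in odd}
    pre = {}
    region = {y: y for y in odd}
    tie = count()  # tie-breaker so the heap never compares vertex labels
    heap = [(0, next(tie), y) for y in odd]
    heapq.heapify(heap)
    done = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u].items():
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                pre[v] = u
                region[v] = region[u]
                heapq.heappush(heap, (d + w, next(tie), v))
    return dist, pre, region


# Path 0 - 1 - 2 - 3 - 4 with lengths 1, 1, 3, 1; vertices 0 and 4 are odd.
adj = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 3}, 3: {2: 3, 4: 1}, 4: {3: 1}}
dist, pre, region = voronoi_regions(adj, [0, 4])
print(region)
```

Following `pre` from any even vertex leads to the odd vertex recorded in `region`, which is the defining property of the Voronoi region.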

PROPOSITION. If $y$ is an odd vertex of $G_i$ then there exists an edge $\{u, v\}$ in $E_i$ such that $u$ is in the Voronoi region of $y$ and $v$ is in the Voronoi region of $w$, and $w$ is an odd vertex of $G_i$ minimizing the length of the shortest path from $y$ to $w$ (subject to the condition $w \ne y$).

Proof. Let $z$ be some odd vertex of $G_i$ distinct from $y$ such that the length of a shortest path from $y$ to $z$ is minimal. Let $p$ be some such shortest path in $G_i$. This path begins in the Voronoi region of $y$ and ends in the Voronoi region of $z$. Let $u$ be the last vertex of $p$ in the Voronoi region of $y$; let $v$ be the next vertex in $p$. Suppose $v$ is in the Voronoi region of $w$; note that $w \ne y$ by the choice of $u$. Then the shortest path from $v$ to $w$ is the same length as the shortest path from $v$ to $z$: it is no longer because $v$ lies in the Voronoi region of $w$, and it is no shorter by the choice of $z$. Hence there is a path $q$ from $y$ to $w$, of the same length as $p$.

This result means that at least one of the closest odd vertices to an odd vertex $y$ of $G_i$ can be found by examining all edges of $E_i$ and considering those edges $\{u, v\}$ with endpoints in different Voronoi regions. This can be done for all odd vertices $y$ of $G_i$ at the same time by examining all the edges of $G_i$. This requires $O(|E_i|)$ time or $O(|E|)$ time, since $|E_i| \le |E|$. To construct $G_i'$ from $G_i$ also requires sorting the odd vertices of $G_i$ by distance to a nearest odd vertex; this requires $O(n \log n)$ time. It is necessary to verify that the method of avoiding cycles in $G_i'$ can still be used. However, this follows because the Voronoi regions of $w$ and $y$ are adjacent if the Voronoi regions of $y$ and $w$ are adjacent. (We say two Voronoi regions $S_1$ and $S_2$ are adjacent if there is an edge $\{u, v\}$ with $u \in S_1$ and $v \in S_2$.) The total work to construct $G_i'$ from $G_i$, assuming $\mathrm{pre}_i$ has already been computed, is therefore $O(|E|)$ operations. Finally, constructing $G_{i+1}$ given $G_i'$ can be done in $O(|E_i| + |E_i'|)$ time using a simple connected components algorithm. Recalling that $G_i'$ is a set of disjoint trees, $|E_i'| < n$, so this part takes $O(n^2)$ time, or $O(|E|)$ time for sparse graphs. The work per level is thus $O(n^2)$ for a total of $O(n^2 \log n)$. For sparse graphs, the work per level is $O(|E| \log n)$ for a total of $O(|E|(\log n)^2)$.
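The edge scan described above can be sketched as a single pass over the edge list. The function name `nearest_odd_via_boundary` and the dictionary inputs `dist` and `region` (distance to the nearest odd vertex and the odd vertex owning each Voronoi region) are assumptions of this sketch; how those dictionaries are produced is left to the shortest-path computation described earlier.

```python
def nearest_odd_via_boundary(edges, dist, region):
    """For each odd vertex, find a closest other odd vertex by scanning edges.

    edges: list of (u, v, length); dist[x]: distance from x to the odd vertex
    region[x] owning its Voronoi region. Returns {y: (path_length, w)}.
    """
    best = {}
    for u, v, w in edges:
        ru, rv = region[u], region[v]
        if ru == rv:
            continue  # both endpoints lie in the same Voronoi region
        # Length of the path ru ... u - v ... rv through this boundary edge.
        length = dist[u] + w + dist[v]
        for a, b in ((ru, rv), (rv, ru)):
            if a not in best or length < best[a][0]:
                best[a] = (length, b)
    return best


edges = [(0, 1, 1), (1, 2, 1), (2, 3, 3), (3, 4, 1)]
dist = {0: 0, 1: 1, 2: 2, 3: 1, 4: 0}
region = {0: 0, 1: 0, 2: 0, 3: 4, 4: 4}
print(nearest_odd_via_boundary(edges, dist, region))
```

Each edge is inspected once, which matches the $O(|E_i|)$ bound for this step; by the Proposition, the minimum found for each odd vertex is the length of a shortest path to one of its closest odd vertices.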

Consider now the work required to construct the matchings $M_i'$ and $M_i$ given $M_{i+1}$. Recall that $M_i'$ is an odd matching in $G_i'$ of the vertices of $\mathrm{Odd}_i$ included an even number of times in edges of $\mathrm{Close}(M_{i+1})$. To find $M_i'$ requires processing all the edges in $M_{i+1}$ and constructing an odd matching. With each edge in $E_{i+1}$ we keep a pointer to the corresponding edge in $\mathrm{Close}(E_{i+1})$. Thus each edge in $E_{i+1}$ can be processed in constant time, for a total of $O(|E|)$ operations per level. Also, since $G_i'$ is a set of disjoint trees, the odd matching $M_i'$ can be found in linear time, that is, $O(n)$ operations. Since $n \le |E| + 1$, $O(|E|)$ operations suffice. Now, $M_i = M_i^0 \cup \mathrm{Close}(M_{i+1})$, where $M_i^0$ is the set of edges of $G_i$ included in an odd number of paths of $\mathrm{Paths}(M_i')$. Since each such path contains at most $n - 1$ edges and there are at most $n - 1$ such paths, $M_i^0$ can be found in $O(n^2)$ operations by processing the paths one at a time. We now show that this work can be reduced to $O(|E|)$ operations for sparse graphs.

Suppose $(y, z)$ is an edge of $G_i'$. This edge corresponds to a shortest path $p$ from $y$ to $z$ in $G_i$. These paths are found by examining edges of $G_i$ with endpoints in different Voronoi regions. Therefore this path consists of three parts: an initial segment $p_1$ in the Voronoi region of $y$, an edge $\{u, v\}$ with $u$ in the Voronoi region of $y$ and $v$ in the Voronoi region of $z$, and a final segment $p_2$ in the Voronoi region of $z$. Also, $p_1$ is a shortest path from $y$ to $u$, meeting the vertices $u, \mathrm{pre}_i(u), \mathrm{pre}_i(\mathrm{pre}_i(u)), \ldots$ in reverse order, and $p_2$ is a shortest path from $v$ to $z$, meeting the vertices $v, \mathrm{pre}_i(v), \mathrm{pre}_i(\mathrm{pre}_i(v)), \ldots$ in order. Therefore any path $p$ of $\mathrm{Paths}(M_i')$ intersecting the Voronoi region of $y$ will contain $y$. Since $y$ is an odd vertex of $G_i$, all paths of $\mathrm{Paths}(M_i')$ that intersect the Voronoi region of $y$ must be from the same tree of $G_i'$. Consider the set of edges $\{u, v\}$ contained in at least one path of $\mathrm{Paths}(M_i')$ such that $u$ and $v$ are both contained in the Voronoi region of $y$, for some particular odd vertex $y$ of $G_i$. This set of edges will be a tree, by the above remarks concerning the structure of paths corresponding to edges of $G_i'$.

We want to find which of these edges are included in an odd number of paths of $\mathrm{Paths}(M_i')$. The straightforward approach, processing the paths one at a time, may process an edge many times. For sparse graphs, this may be inefficient. Instead, we process all the portions of paths in this Voronoi region in parallel, working toward the center of the region from the boundary. For a path $p$ intersecting the Voronoi region of $y$, consider the first edge of $p$ to be the edge including $y$, the second edge of $p$ to be the next edge, et cetera. Let $m$ be the highest numbered edge of any relevant path such that both endpoints of this edge are in the Voronoi region of $y$. We then process all $m$th edges of all relevant paths, then all $(m - 1)$th edges, and so on. Processing in this way, it is easy to keep track of which edges are included in an odd number of paths with only a constant amount of work per edge. Also, edges with endpoints in different Voronoi regions will only be included in one path of $\mathrm{Paths}(M_i')$. In addition, the trees of $G_i'$ can be processed one at a time, because any two trees whose paths intersect a given Voronoi region must be identical. Finally, the numbering of the edges can be found easily in $O(|E_i|)$ time. Thus $M_i^0$ and $M_i$ can be found in $O(|E|)$ operations for sparse graphs. The total work to find the matchings is therefore $O(n^2 \log n)$ operations for dense graphs, and $O(|E| \log n)$ operations for sparse graphs.

The last step of the hyper-greedy method, converting the odd matching $M_0$ to an ordinary matching $M$ of $G$ using the triangle inequality, requires $O(|E|)$ operations because each step reduces the number of edges in the matching by at least one. Therefore the total work for the hyper-greedy method is $O(n^2 \log n)$ operations for dense graphs and $O(|E|(\log n)^2)$ operations for sparse graphs.

1.5. A Lower Bound for the Hyper-Greedy Method

We now show that the upper bound of $2\log_3(1.5n)$ on the ratio for the hyper-greedy method can nearly be achieved. We exhibit a family of graphs having arbitrarily many vertices for which the heuristic produces a matching of cost about $2\log_3(1.5n) - 1$ times optimal, where $n$ is the number of vertices. Let graphs $A_i(\epsilon)$ and $B_i(\epsilon)$ be collinear graphs in the Euclidean plane defined as follows: $A_0(\epsilon)$ is a single vertex. $A_1(\epsilon)$ is three vertices at intervals of $\epsilon/2$. For $i \ge 1$, $A_{i+1}(\epsilon)$ is three copies of $A_i(\epsilon/9)$, with adjacent copies separated by $\epsilon/3$. Therefore $A_i(\epsilon)$ has diameter $\epsilon$ for all $i > 0$. Also, the graph $B_0(\epsilon)$ is two points at distance $\epsilon$. For $i > 0$, $B_i(\epsilon)$ is two copies of $A_i(\epsilon/3)$ separated by a distance of $\epsilon/3$. Thus $B_i(\epsilon)$ has diameter $\epsilon$ for all $i$.
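The recursive definitions of $A_i(\epsilon)$ and $B_i(\epsilon)$ can be checked mechanically. The sketch below builds the point sets as sorted lists of exact coordinates; the function names `A` and `B` mirror the text, while the choice of `fractions.Fraction` and of placing the leftmost point at 0 are assumptions of this illustration.

```python
from fractions import Fraction


def A(i, eps):
    """Coordinates of A_i(eps) on a line, leftmost point at 0."""
    eps = Fraction(eps)
    if i == 0:
        return [Fraction(0)]
    if i == 1:
        return [Fraction(0), eps / 2, eps]
    block = A(i - 1, eps / 9)
    # Three copies of the block, adjacent copies separated by eps / 3.
    step = (block[-1] - block[0]) + eps / 3
    return [p + j * step for j in range(3) for p in block]


def B(i, eps):
    """Coordinates of B_i(eps) on a line, leftmost point at 0."""
    eps = Fraction(eps)
    if i == 0:
        return [Fraction(0), eps]
    block = A(i, eps / 3)
    # Two copies of the block separated by eps / 3.
    step = (block[-1] - block[0]) + eps / 3
    return block + [p + step for p in block]


for i in range(4):
    print(i, len(A(i, 1)), len(B(i, 1)), B(i, 1)[-1] - B(i, 1)[0])
```

The counts come out as $3^i$ points for $A_i$ and $2 \cdot 3^i$ points for $B_i$, and the diameters equal $\epsilon$ as stated, which is what the vertex count in the analysis below relies on.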

Note that $A_i(\epsilon)$ has an odd number of vertices for all $i$ and $B_i(\epsilon)$ has an even number of vertices. Consider graph $G$ (Fig. 1). On this graph, the optimal matching has cost near 1 since all $B_i(\epsilon)$ can be matched using short edges. However, the hyper-greedy heuristic produces a matching of cost about $2k + 3$. To see this, let $G_0$ be the above graph. Then the graph $G_1$ constructed by the hyper-greedy method will have each of the isolated vertices merged with the nearest copy of $B_0(\epsilon)$. Also, vertices within $B_i(\epsilon)$ will merge for $i > 0$. The graph $G_2$ will have the two isolated vertices each merged with the nearest copies of $B_0(\epsilon)$ and $B_1(\epsilon)$. More merging will also occur within $B_i(\epsilon)$ for $i > 1$. This continues until the top half of the graph has all merged into one vertex and similarly for the bottom half of the graph. Thus the top vertex contains an isolated vertex together with $B_i(\epsilon)$ for $0 \le i \le k$, and similarly for the bottom vertex. The shortest edge between these “supervertices” is the leftmost edge, between the two copies of $B_k(\epsilon)$. This edge is the first edge chosen as part of the heuristic matching.

FIGURE 1. The graph $G$ (figure not reproduced; it shows copies of $B_i(\epsilon)$ and isolated vertices joined by edges labeled 1 and $1 + \epsilon$).

The next edges chosen will be between $B_k(\epsilon)$ and $B_{k-1}(\epsilon)$, ignoring short edges. This continues until all edges of length 1 around the circle have been chosen by the heuristic. Since there are $2k + 3$ such edges, the heuristic produces a matching of cost near $2k + 3$. The number $n$ of vertices in this graph is $4(3^k + 3^{k-1} + \cdots + 3 + 1) + 2$, so $n = 2 \cdot 3^{k+1}$. Therefore $2k + 3$ is $2 \log_3(1.5n) - 1$, not far from the upper bound of $2 \log_3(1.5n)$.
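The arithmetic behind this ratio can be verified with integers only. The sketch below is illustrative: the function name `lower_bound_instance` is hypothetical, and the value $2k + 3$ is the approximate heuristic cost stated in the text, not something the code derives from the graph. Since $1.5n = 3^{k+2}$, the logarithm is exact and no floating point is needed.

```python
def lower_bound_instance(k):
    """Return (n, approximate heuristic cost) for the lower-bound graph with parameter k."""
    n = 4 * sum(3 ** i for i in range(k + 1)) + 2   # two copies of each B_i plus 2 isolated vertices
    heuristic_cost = 2 * k + 3                      # number of unit-length edges chosen (from the text)
    return n, heuristic_cost


for k in range(8):
    n, cost = lower_bound_instance(k)
    assert n == 2 * 3 ** (k + 1)           # closed form for the vertex count
    assert 3 * n == 2 * 3 ** (k + 2)       # i.e. 1.5 n = 3^(k+2), so log_3(1.5 n) = k + 2
    assert cost == 2 * (k + 2) - 1         # 2k + 3 = 2 log_3(1.5 n) - 1
```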

2. THE FACTOR OF TWO METHOD

The factor of two method is similar to the hyper-greedy method except that $G_i'$ is defined in a different manner. This method performs well on the above example, but in general does not do much better than the hyper-greedy method. Let $l$ be the length of the shortest path between any two distinct odd vertices of $G_i$. Then $G_i'$ includes an edge $\{y, z\}$ if $y$ and $z$ are distinct odd vertices of $G_i$ and there is a path from $y$ to $z$ in $G_i$ of length less than $2l$. Thus $G_i'$ is an undirected graph. However, edges of $G_i'$ occurring in cycles are deleted until $G_i'$ consists of a set of disjoint trees. Other than this, the factor of two method is identical to the hyper-greedy method.
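One level of this construction can be sketched as follows. This is a hedged illustration and not the implementation analyzed in the paper: the function name `factor_of_two_forest` and the callable `dist` (assumed to return the shortest-path length between two odd vertices of $G_i$) are hypothetical, all pairs are examined directly with no Voronoi diagram, and cycle edges are discarded by scanning candidates in increasing length with a union-find structure. The text leaves the choice of which cycle edges to delete open, so that scan order is an assumption.

```python
from itertools import combinations


def factor_of_two_forest(odd, dist):
    """Return (l, forest edges) for one level, given odd vertices and a distance callable."""
    pairs = [(dist(u, v), u, v) for u, v in combinations(odd, 2)]
    if not pairs:
        return None, []
    l = min(d for d, _, _ in pairs)                    # shortest odd-to-odd distance
    candidates = sorted((p for p in pairs if p[0] < 2 * l), key=lambda p: p[0])

    parent = {v: v for v in odd}                       # union-find over odd vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]              # path halving
            v = parent[v]
        return v

    forest = []
    for d, u, v in candidates:
        ru, rv = find(u), find(v)
        if ru != rv:                                   # skip edges that would close a cycle
            parent[ru] = rv
            forest.append((u, v, d))
    return l, forest


# Hypothetical example: a unit triangle a, b, c plus a far vertex d.
table = {frozenset("ab"): 1, frozenset("ac"): 1, frozenset("bc"): 1,
         frozenset("ad"): 5, frozenset("bd"): 5, frozenset("cd"): 5}
l, forest = factor_of_two_forest(list("abcd"), lambda u, v: table[frozenset((u, v))])
assert l == 1 and len(forest) == 2                     # one triangle edge is dropped
```

In the example the three unit edges form a cycle, so only two of them survive, and the vertex `d` stays isolated because its distances are not less than $2l$.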

2.1. Time Required

The factor of two method can be efficiently implemented using the generalized Voronoi diagram. This is possible for the following reason: If $u$ and $w$ are odd vertices of $G_i$, then the Voronoi regions of $u$ and $w$ will be adjacent unless there is an odd vertex $x$ of $G_i$ such that $d'(u, x) \le d'(u, w)$ and $d'(w, x) < d'(u, w)$. (Recall that $d'(x, y)$ is the length of a shortest path between $x$ and $y$.) For if $p$ is a shortest path between $u$ and $w$, then $p$ will lie entirely within the Voronoi regions of $u$ and $w$ unless there is some vertex $x$ as above such that $p$ intersects the Voronoi region of $x$. Therefore if $u$ and $w$ may be connected by a path of length less than $2l$ then $u$ and $x$ may be connected by such a path and $x$ and $w$ may be connected by such a path. Hence $u$ and $w$ will still end up in the same connected component of $G_i'$ if the generalized Voronoi diagram is used to construct the components.

The maximum level $k$ for the factor of two method is bounded by $\log_2 K$, where $K$ is the maximum ratio of edge lengths in $G$, since the edge length doubles at each level. Since the levels start at 0, and level $k$ does not require any work, the number of relevant levels is $k$, which is bounded by $\log_2 K$. However, the number of levels may be much less than this, and will never be larger than $n$. Hence the total work for the factor of two method is $O(n^2 \log K)$ and is never more than $O(n^3)$. Possibly this heuristic can be implemented more efficiently than this. For sparse graphs, $O(|E| \log n \log K)$ operations suffice.

2.2. An Upper Bound

The analysis of the cost of the matching produced by the factor of two method is similar to that for the hyper-greedy method. However, by the way the graphs $G_i'$ are constructed, $\|E_i'\| < 4\|\mathrm{Opt}_i'\|$ since an optimal path may be nearly twice as long as the corresponding edge in $E_i'$. Thus $\|M_i'\| < 4\|\mathrm{Opt}_i'\| \le 4\|\mathrm{Opt}\|$. As before, $\|M\| \le \sum_{i=0}^{k-1} \|M_i'\|$ so $\|M\| < 4k\|\mathrm{Opt}\|$. Since $k \le \log_2 K$, $\|M\| < 4(\log_2 K)\|\mathrm{Opt}\|$. Thus the matching produced by the factor of two method is at most $4(\log_2 K)$ times as costly as an optimal matching. We now give another upper bound, based not on $K$ but only on the number of vertices in the graph.

Recall the definition of $\mathrm{Opt}_i'$ in the analysis of the worst-case behavior of the hyper-greedy method. In order to analyze the factor of two method, we specify that the paths in $\mathrm{Opt}_i'$ be constructed in a specific manner. Initially $\mathrm{Opt}_0'$ is specified since each optimal edge by itself is a path. In general, assume $\mathrm{Opt}_i'$ is given as some set of edge disjoint paths between odd vertices in $G_i$. The odd vertices of $G_i$ will be joined into "supervertices" to obtain $G_{i+1}$. For each new even vertex $u$ of $G_{i+1}$, we pair up all the elements of $u$ in some arbitrary manner. (Recall that the elements of $u$ are odd vertices of $G_i$.) For each odd vertex $u$ of $G_{i+1}$, we pick a distinguished element of $u$ and pair up the remaining elements of $u$ in some arbitrary manner. We then join the paths of $\mathrm{Opt}_i'$ to construct paths between odd vertices of $G_{i+1}$. We do this so that if path $p$ ends at vertex $x$ of $G_i$ and path $q$ begins at vertex $y$ of $G_i$ and $x$ and $y$ are paired up elements of some vertex $z$ of $G_{i+1}$, then $p$ and $q$ are joined together in a path of $\mathrm{Opt}_{i+1}'$. Similarly, $p$, $q$, and $r$ may be joined together if the other endpoint of $q$ is paired up with an endpoint of $r$. In this way, arbitrarily many paths of $\mathrm{Opt}_i'$ may be joined into longer paths in $\mathrm{Opt}_{i+1}'$. These paths in $\mathrm{Opt}_{i+1}'$ will be edge disjoint. Some of these paths will be cycles; these can be ignored at higher levels. The remaining paths will pair up the odd vertices of $G_{i+1}$ (Fig. 2).

FIGURE 2. Paths $p$, $q$, and $r$ joined end to end between two vertices labeled ODD (figure not reproduced).

Since $G_k$ has no odd vertices, eventually all such paths will become cycles. We analyze each cycle separately, comparing the length of the optimal edges in the cycle to the length of the heuristic matching contributed by vertices in the cycle. Suppose the shortest edge in $G_i'$ has length $l_i$. Then each path $p$ in $\mathrm{Opt}_i'$ must have length $l_i$ or longer (since it is a path between odd vertices). Let $w$ be an endpoint of a path $p$ of $\mathrm{Opt}_i'$; thus $w$ is an odd vertex of $G_i$ and may be an endpoint of an edge $e$ of $E_i'$. The length of $e$ must be in the interval $[l_i, 2l_i)$ since $e$ is an edge of $G_i'$. In this way we relate the length of $e$ to the length of $p$. The length of $e$ is less than twice the length of $p$. Also, since $\|M_i\| \le \|M_i'\| \le \|E_i'\|$, $\|M\| \le \sum_i \|M_i'\| \le \sum_i \|E_i'\|$. Thus we relate the lengths of edges in an optimal matching to the cost of the matching produced by the factor of two method. For convenience, we orient the edges $E_i'$ of $G_i'$ so that each vertex of $G_i'$ has outdegree one. This is possible because $G_i'$ is a collection of disjoint trees. In this way exactly one edge of $E_i'$ is associated with each vertex of $G_i'$.

Suppose a cycle $C$ of length $L$ is produced in $\mathrm{Opt}_{k-1}'$; the same analysis applies if the cycle is produced at a smaller level. We want to find the largest possible contribution of $C$ to the heuristic matching using a fixed number of vertices. Each vertex that was ever an endpoint of a portion of $C$ in some path of $\mathrm{Opt}_i'$ for some $i$ is considered as contributing to the cost of the heuristic matching. The cost such a vertex contributes to the matching is the length of the edge of $E_i'$ leaving it. Now, $l_{k-1} \le L$ so $l_{k-2} \le L/2$, $l_{k-3} \le L/4$, $l_{k-4} \le L/8$, et cetera. Thus the maximum edge length of an edge of $M_{k-1}'$, $M_{k-2}'$, $M_{k-3}'$, et cetera will be $2L$, $L$, $L/2$, et cetera. In fact, these upper bounds cannot be achieved. Hence vertices matched at higher levels can contribute most to the cost of a heuristic matching. To get the worst case, we want as many vertices as possible to be matched at levels as high as possible.

Suppose $C$ is divided into paths $p_1, p_2, \ldots, p_m$ at level $k - 2$. Let $x$ be an endpoint of $p_j$ that is matched at level $k - 2$, that is, $x$ is an endpoint of an edge $e$ of $M_{k-2}'$. Then the length of $e$ is less than $L$ since $l_{k-2} \le L/2$. Hence each vertex $x$ matched at level $k - 2$ contributes less than $L$ to the heuristic matching. Also, if $x$ is an endpoint of $p_j$ then vertex $x$ will contribute less than $2|p_j|$ to the length of the heuristic matching since $|p_j| \ge l_{k-2}$. Since each $p_j$ has two endpoints, the contribution to the heuristic matching for $p_j$ is less than $4|p_j|$. The total contribution at level $k - 2$ is thus less than $4L$. Similarly, the total contribution at levels $k - 3$, $k - 4$, $k - 5$, et cetera, is less than $4L$.
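The per-level bound can be written as a single chain. This is a sketch under two labeled assumptions: $c(x)$ is a symbol introduced here for the cost contributed by endpoint $x$, and the pieces $p_1, \ldots, p_m$ are taken to partition $C$, so that their lengths sum to $L$. The display assumes the `amsmath` package.

```latex
\begin{align*}
c(x) &< 2\,l_{k-2} \le 2\,|p_j|
  && \text{for each endpoint } x \text{ of } p_j, \\
\sum_{j=1}^{m} \;\sum_{x \,\in\, \mathrm{ends}(p_j)} c(x)
  &< \sum_{j=1}^{m} 4\,|p_j| \;=\; 4L
  && \text{since each } p_j \text{ has two endpoints.}
\end{align*}
```

The first inequality uses the fact that every edge of $E_{k-2}'$ has length in $[l_{k-2}, 2l_{k-2})$, and the second uses $|p_j| \ge l_{k-2}$.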

We want to see how $C$ can be subdivided at each level to get the largest number of levels filled with a given number of vertices. Instead of doing a detailed analysis, we note that the maximum number of vertices that can contribute nearly $L$ to the heuristic matching at level $k - 2$ is 4 since each such vertex must be an endpoint of a segment of $C$ of length near $L/2$, and each such segment can be counted at most twice, once per endpoint. By similar reasoning, the maximum contribution per vertex at level $k - 3$ is less than $L/2$ and the maximum number of vertices that can contribute this much is 8. In general, at level $k - j$, the maximum contribution per vertex is less than $L \cdot 2^{2-j}$ and the maximum number of vertices that can contribute this much is $2^j$. Note that at least $2^j$ vertices are required at level $k - j$ to achieve the maximum contribution of $4L$ at that level. Distributing vertices to fill as many high levels as possible, levels $k - 1$ through $k - j$ can be filled by $2 + 4 + 8 + \cdots + 2^j$ or $2^{j+1} - 2$ vertices. This contributes a cost of less than $4Lj$ to the matching. If there are $n = 2^{j+1} - 2$ vertices, $j + 1$ is $\log_2(n + 2)$ so the cycle $C$ contributes less than $4L(\log_2(n + 2) - 1)$ to the heuristic matching. The worst case for this analysis occurs when all vertices of the graph are in one cycle, since this makes the ratio as large as possible. Hence the ratio $\|M\|/\|\mathrm{Opt}\|$ for the factor of two method is bounded by $4 \log_2 n$ plus lower order terms, where $n$ is the number of vertices in the graph. This ratio is not as good as for the hyper-greedy method, but a more careful analysis might yield a better ratio. The running times of the two methods are incomparable, since the hyper-greedy method is faster in the worst case but if $K$ is small the factor of two method is faster. Also, if $K$ is small the upper bound of $4(\log_2 K)$ derived previously for the ratio $\|M\|/\|\mathrm{Opt}\|$ for the factor of two method will be smaller than the ratio $2 \log_3(1.5n)$ for the hyper-greedy method.
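The counting step can be checked numerically. The sketch below is illustrative only: the function name `cycle_bound` is hypothetical, and it simply tabulates the vertex count and the cost bound from the paragraph above for a cycle of length $L$. Because $n + 2$ is a power of two here, `int.bit_length` gives $\log_2(n + 2)$ exactly.

```python
def cycle_bound(j, L=1):
    """Vertices needed to fill levels k-1 .. k-j, and the resulting cost bound 4*L*j."""
    n = sum(2 ** t for t in range(1, j + 1))   # 2 + 4 + ... + 2^j vertices
    bound = 4 * L * j                          # less than 4L contributed per level
    return n, bound


for j in range(1, 12):
    n, bound = cycle_bound(j)
    assert n == 2 ** (j + 1) - 2
    log2_n_plus_2 = (n + 2).bit_length() - 1   # exact, since n + 2 = 2^(j+1)
    assert log2_n_plus_2 == j + 1
    assert bound == 4 * (log2_n_plus_2 - 1)    # 4L(log_2(n + 2) - 1) with L = 1
    # per-vertex cap L * 2^(2-t) times 2^t vertices gives 4L at every level t
    assert all(2 ** t * 2 ** (2 - t) == 4 for t in range(1, j + 1))
```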

2.3. A Lower Bound

In [6] it was stated that the factor of two method produces a matching whose cost is at most eight times optimal. However, this assertion was based on an incorrect attempt by the author to separate the edges in an optimal matching into levels. We now show that the ratio can be nearly as bad as $(1/2) \log_2 n$ for a graph containing $n$ vertices.

Consider graphs $H_m$ (Figs. 3, 4). The graph $H_m$ has $2^m + m$ vertices for $m$ even and $2^m + m + 1$ vertices for $m$ odd. The extra $m$ or $m + 1$ vertices are used to ensure that the length of the shortest edge in $G_j'$ will be $2^{j-1}$. The optimal matching of $H_3$ has cost 4. The heuristic will connect the vertices at distance $1 - \epsilon$ in the first stage. These connected pairs of vertices then become even vertices in the next stage, when the vertices at distance $2 - \epsilon$ are connected. The final matching produced has cost $6 - 4\epsilon$. For $H_4$, the optimal matching has cost 8 and the factor of two method produces a matching of cost $16 - 8\epsilon$. For $H_m$, the optimal matching has cost $2^{m-1}$ and the heuristic matching has cost $m 2^{m-2} - 2^{m-1}\epsilon$. The ratio is therefore $m/2 - \epsilon$. This ratio is asymptotically equal to $(\log_2 n)/2$ and can be arbitrarily large. The same ratios can be achieved even if the shortest edges are retained when removing cycles in the construction of $G_i'$. It is interesting to note that the hyper-greedy method produces an optimal matching for these graphs $H_m$.

FIGURE 3. The graph $H_3$ (figure not reproduced; edge labels include $1 - \epsilon$ and $2 - \epsilon$).

FIGURE 4. The graph $H_m$ (figure not reproduced; edge labels include $2^{m-3} - \epsilon$ and $2^{m-2} - \epsilon$).
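The stated costs for $H_3$, $H_4$, and general $H_m$ can be cross-checked with exact rationals. This is a sketch under an assumption: the general heuristic cost is taken to be $m 2^{m-2} - 2^{m-1}\epsilon$, the closed form that agrees with the two worked values $6 - 4\epsilon$ and $16 - 8\epsilon$ and with the ratio $m/2 - \epsilon$. The function name `h_m_costs` is hypothetical, and the code evaluates formulas only; it does not build the graphs.

```python
from fractions import Fraction


def h_m_costs(m, eps):
    """Return (optimal cost, heuristic cost, ratio) for H_m under the assumed closed forms."""
    eps = Fraction(eps)
    optimal = Fraction(2) ** (m - 1)
    heuristic = m * Fraction(2) ** (m - 2) - Fraction(2) ** (m - 1) * eps
    return optimal, heuristic, heuristic / optimal


eps = Fraction(1, 100)
assert h_m_costs(3, eps)[:2] == (4, 6 - 4 * eps)       # matches the H_3 values in the text
assert h_m_costs(4, eps)[:2] == (8, 16 - 8 * eps)      # matches the H_4 values in the text
for m in range(2, 20):
    assert h_m_costs(m, eps)[2] == Fraction(m, 2) - eps  # ratio m/2 - eps
```

Since $n$ is $2^m$ plus lower-order terms, the ratio $m/2 - \epsilon$ grows like $(\log_2 n)/2$, as claimed.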

3. CONCLUSIONS

Two heuristics have been presented for obtaining close to optimal perfect matchings on graphs satisfying the triangle inequality. Each method has advantages. The following problems remain: Give a more careful analysis of the worst case behavior of the factor of two method. Can the worst case bound be improved if $G_i'$ is constructed so that small edges are retained when deleting edges in cycles? For other possible heuristics, it might be possible to analyze heuristics based on obtaining matchings from traveling salesperson tours obtained by the nearest-neighbor method [5] and other heuristics [1, 5]. Each tour corresponds to two matchings; if the less costly of these is taken, what can be said about how close to optimal it is? One can show that if this heuristic is applied to optimal tours or to tours obtained using the nearest neighbor heuristic, the worst case ratio for graphs containing $n$ vertices can be as bad as $n/4$ asymptotically. Another area of research is to find heuristics that are more efficient for special graphs, such as sets of points in the Euclidean plane. A different kind of analysis of heuristics for graphs in the Euclidean plane is given in [6].

ACKNOWLEDGMENTS

I would like to thank Ed Reingold for suggesting this problem. Also I would like to thank Ed Reingold and Ken Supowit for many valuable and stimulating discussions. The UNIX system at the University of Illinois was helpful in preparing this paper.

REFERENCES

1. N. CHRISTOFIDES, “Worst-Case Analysis of a New Heuristic for the Travelling Salesman Problem,” Technical report, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, Pennsylvania, 1976.

2. E. DIJKSTRA, Two problems in connexion with graphs, Numer. Math. 1 (1959), 269-271.

3. H. GABOW, An efficient implementation of Edmonds' algorithm for maximum matching on graphs, J. Assoc. Comput. Mach. 23 (1976), 221-234.

4. E. REINGOLD, J. NIEVERGELT, AND N. DEO, “Combinatorial Algorithms: Theory and Practice,” Prentice-Hall, Englewood Cliffs, N.J., 1977.

5. D. ROSENKRANTZ, R. STEARNS, AND P. LEWIS, Approximation algorithms for the travelling salesman problem, in “Proceedings, Fifteenth IEEE Symposium on Switching and Automata Theory,” pp. 33-42, IEEE, New York, 1974.

6. K. SUPOWIT, D. PLAISTED, AND E. REINGOLD, Heuristics for weighted perfect matching, in “Proceedings, Twelfth Annual ACM Symposium on Theory of Computing,” pp. 398-419, Association for Computing Machinery, 1980.