Disjoint Sets
Chapter 21
CPTR 430 Algorithms Disjoint Sets 1
Disjoint Sets
A disjoint-set data structure maintains a collection $\mathcal{S} = \{S_1, S_2, \ldots, S_k\}$ of disjoint dynamic sets
Each set has a designated representative, which is an element of the set
For some applications, the representative may be arbitrary
For others, the “smallest” element is the representative (if the elements can be ordered)
Disjoint Set Operations
makeSet(x): creates a new set whose only member is x
union(x, y): combines the sets that contain elements x and y
If $x \in S_x$ and $y \in S_y$, then union(x, y) returns a new set equal to $S_x \cup S_y$
$S_x$ and $S_y$ are disjoint before the union() operation
The representative of the resulting set can be any element in $S_x \cup S_y$, but usually we choose the representative of either $S_x$ or $S_y$
The original sets, $S_x$ and $S_y$, are removed from $\mathcal{S}$
findSet(x): returns a reference to the representative of the set containing x
Skeleton Implementation
    public class DisjointSet {
        public static void makeSet(DSElement element) {
            /* To be determined */
        }
        public static void union(DSElement x, DSElement y) {
            /* To be determined */
        }
        public static DSElement findSet(DSElement element) {
            /* To be determined */
        }
    }
Disjoint Set Analysis
n: the number of makeSet() operations
m: the total number of makeSet(), union(), and findSet() operations
The sets in $\mathcal{S}$ are disjoint, so each union() operation reduces $|\mathcal{S}|$ by 1
After $n - 1$ union() operations, $|\mathcal{S}| = 1$
The number of union() operations is at most $n - 1$
$m \ge n$
For the purposes of analysis, assume the first n operations are makeSet() operations
Applications of Disjoint Sets
Determining the connected components of an undirected graph
Kruskal’s minimum spanning tree algorithm
In FORTRAN, handling the EQUIVALENCE(X,Y) statement
Type unification by compilers and interpreters of dynamically typed programming languages
Image processing—blob coloring
Colorizing old movies
Sample Application: Detecting the Connected Components of an Undirected Graph
This undirected graph has four connected components:
[Figure: an undirected graph on the vertices a, b, c, d, e, f, g, h, i, and j]
Connectivity Algorithm
    public class ConnectedGraph {
        public static void connectedComponents(Graph g) {
            Vertex[] vertices = g.getVertices();
            Edge[] edges = g.getEdges();
            // Each vertex starts in its own singleton set
            for ( int i = 0; i < vertices.length; i++ ) {
                DisjointSet.makeSet(vertices[i]);
            }
            // Merge the sets at the two ends of every edge
            for ( int i = 0; i < edges.length; i++ ) {
                DSElement fromSetRep = DisjointSet.findSet(edges[i].from),
                          toSetRep = DisjointSet.findSet(edges[i].to);
                if ( fromSetRep != toSetRep ) {
                    DisjointSet.union(fromSetRep, toSetRep);
                }
            }
        }

        public static boolean sameComponent(Vertex v1, Vertex v2) {
            return DisjointSet.findSet(v1) == DisjointSet.findSet(v2);
        }
    }
Connectivity Algorithm
connectedComponents() initially places each vertex into its own set
Next, all edges are examined; an edge connecting two vertices implies that the two vertices are to be unioned into one set
After all edges have been examined, two vertices are within the same connected component if sameComponent() returns true
For things to work, a vertex object must reference an associated disjoint set object and vice versa
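The connectivity algorithm can be tried out without the Graph, Vertex, and DSElement classes, which the slides do not define. The following is a minimal stand-alone sketch under assumed names (ComponentsSketch and its methods are hypothetical): vertices are the integers 0 to n - 1, edges are int pairs, and union() is a deliberately naive relabeling rather than any of the structures developed later.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch: vertices are the integers 0..n-1 and
// each edge is an int pair {from, to}. These names are not from the slides.
class ComponentsSketch {
    private final int[] rep;   // rep[v] = representative of v's set

    ComponentsSketch(int vertexCount, int[][] edges) {
        rep = new int[vertexCount];
        for (int v = 0; v < vertexCount; v++) {
            rep[v] = v;                        // makeSet(v)
        }
        for (int[] e : edges) {
            int from = findSet(e[0]), to = findSet(e[1]);
            if (from != to) {
                union(from, to);
            }
        }
    }

    int findSet(int v) {
        return rep[v];
    }

    // Naive union: relabel every member of b's set with representative a.
    private void union(int a, int b) {
        for (int v = 0; v < rep.length; v++) {
            if (rep[v] == b) {
                rep[v] = a;
            }
        }
    }

    boolean sameComponent(int u, int v) {
        return findSet(u) == findSet(v);
    }

    // Number of distinct representatives, i.e. connected components.
    int componentCount() {
        return (int) Arrays.stream(rep).distinct().count();
    }
}
```

For instance, with ten vertices and the made-up edge list {0,1}, {1,2}, {1,3}, {4,5}, {4,6}, {7,8}, the sketch reports four components, and vertex 9 remains in a set of its own.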
Linked List Implementation
The linked list implementation is simple
The first element in the list is the set’s representative
Each element in the list contains:
a data object
a pointer to the next element in the list
a pointer to the representative
[Figure: a list node with the fields data, next, and rep]
Linked List Implementation (cont.)
Pointers head and tail refer, respectively, to the first and last elements in the list
head points to the representative
tail points to the position where a new element can be added and another set can be unioned
[Figure: a linked list holding the elements a, b, c, and d; each node has data, next, and rep fields, with head pointing to the first node and tail to the last]
Efficiency of the Linked Implementation
makeSet()
Create a new list with one element: $O(1)$
findSet()
Return the pointer to the representative stored in each node: $O(1)$
union()
Attach one list to the end of the other
The end can be found quickly via the tail pointer
Updating the representative pointers in every node in the attached list takes time proportional to the length of the attached list: $O(n)$
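The three costs above can be seen in a minimal sketch of the linked-list representation. The class names (ListNode, LinkedDisjointSet), the String payload, and the choice to keep the tail pointer inside the representative node are assumptions made for illustration; the slides only specify the data, next, and rep fields plus head and tail pointers.

```java
// Hypothetical sketch of the linked-list representation; names are illustrative.
class ListNode {
    final String data;
    ListNode next;   // next element in the list, or null at the end
    ListNode rep;    // the set's representative (the first node in the list)
    ListNode tail;   // last node of the list; only meaningful in the representative

    ListNode(String data) {
        this.data = data;
        this.rep = this;    // a new node is its own representative
        this.tail = this;   // and also the last node of its list
    }
}

class LinkedDisjointSet {
    // O(1): a one-element list
    static ListNode makeSet(String data) {
        return new ListNode(data);
    }

    // O(1): every node stores its representative
    static ListNode findSet(ListNode x) {
        return x.rep;
    }

    // Appends y's list to the end of x's list.
    // Returns the number of representative pointers that had to be rewritten,
    // which is the length of the attached list.
    static int union(ListNode x, ListNode y) {
        ListNode a = findSet(x), b = findSet(y);
        if (a == b) {
            return 0;                 // already in the same set
        }
        a.tail.next = b;              // O(1) thanks to the tail pointer
        a.tail = b.tail;
        int updated = 0;
        for (ListNode n = b; n != null; n = n.next) {
            n.rep = a;                // the O(length) part of union()
            updated++;
        }
        return updated;
    }
}
```

Returning the update count from union() is purely a teaching device here: it makes the cost that dominates the analysis directly observable.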
The Amortized Analysis
In the worst case, a sequence of m operations requires $O(n^2)$ time
Take objects $x_1, x_2, \ldots, x_n$ and perform the operations

Operation              Number of Objects Updated
makeSet(x_1)           1
makeSet(x_2)           1
...                    ...
makeSet(x_n)           1
union(x_1, x_2)        1
union(x_2, x_3)        2
union(x_3, x_4)        3
...                    ...
union(x_{n-1}, x_n)    n - 1
The Amortized Analysis (cont.)
The operation sequence is n makeSet()s followed by $n - 1$ union()s such that the longer list is always appended to the shorter list
The n makeSet() operations take $\Theta(n)$ time
The ith union() operation updates i objects
Total number of objects updated by all $n - 1$ union() operations is $\sum_{i=1}^{n-1} i = \Theta(n^2)$
The total number of operations is $2n - 1$
Each operation on average requires $\Theta(n)$ time
By aggregate analysis, then, the amortized cost of each operation is $\Theta(n)$
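The quadratic total comes from the closed form of the arithmetic series, and the per-operation average follows by dividing the total cost by the $2n - 1$ operations. Written out as a worked equation:

```latex
\sum_{i=1}^{n-1} i \;=\; \frac{n(n-1)}{2} \;=\; \Theta(n^2),
\qquad
\frac{\Theta(n) + \Theta(n^2)}{2n - 1} \;=\; \Theta(n).
```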
Weighted-union Heuristic
Ensure that the shorter list is always appended to the longer list
Fewer representative pointers to update
Maintain the length of each list (easy—add an extra integer field)
A union can still require $\Omega(n)$ time if both lists have $\Omega(n)$ elements
Helps a little?
It Does Better than $\Theta(n^2)$
Given: linked list representation with the weighted-union heuristic
Any sequence of m makeSet(), findSet(), and union() operations, n of which are makeSet() operations, takes $O(m + n \lg n)$ time (Theorem 21.1)
Why?
Proof of Theorem 21.1
Consider each object in a set of size n
For a given object, x, how many times has its representative been updated?
The first time it was updated, x had to have been an element of the smaller set, since the weighted-union heuristic always appends the smaller list to the larger one
After x’s representative was updated the first time, x’s resulting set must have had at least two elements (Why?)
The next time x’s representative is updated, x’s set must have at least four elements (Again, why?)
For all $k \le n$, the resulting set has at least k elements after x’s representative has been updated $\lceil \lg k \rceil$ times
Proof of Theorem 21.1 (cont.)
For all $k \le n$, the resulting set has at least k elements after x’s representative has been updated $\lceil \lg k \rceil$ times
The largest set has at most n elements (Why?)
Each element in that largest set has been updated at most $\lceil \lg n \rceil$ times
The time to update the n elements is $O(n \lg n)$
The time to adjust the head and tail pointers, as well as the length field, is constant per union() operation
Proof of Theorem 21.1 (cont.)
The makeSet() and findSet() operations take $O(1)$ time each
There are $O(m)$ makeSet() and findSet() operations
The time for the entire sequence of m operations is $O(m + n \lg n)$
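The bound can be checked empirically by counting representative updates. The sketch below is an assumption-laden stand-in for the linked-list structure: elements are the integers 0 to n - 1, each set's members live in an ArrayList rather than a hand-built list, and the class name WeightedListSets and its updates counter are hypothetical. What it preserves is the weighted-union rule: only the shorter list is relabeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: sets of the ints 0..n-1, each stored as a member list
// so that the weighted-union heuristic can relabel the shorter one.
class WeightedListSets {
    private final int[] rep;                     // rep[x] = representative of x
    private final List<List<Integer>> members;   // members.get(r) = list headed by r
    long updates = 0;                            // representative pointers rewritten so far

    WeightedListSets(int n) {
        rep = new int[n];
        members = new ArrayList<>();
        for (int x = 0; x < n; x++) {            // the n makeSet() operations
            rep[x] = x;
            List<Integer> list = new ArrayList<>();
            list.add(x);
            members.add(list);
        }
    }

    int findSet(int x) {
        return rep[x];
    }

    void union(int x, int y) {
        int a = findSet(x), b = findSet(y);
        if (a == b) {
            return;
        }
        // Weighted union: make a the longer list and b the shorter one.
        if (members.get(a).size() < members.get(b).size()) {
            int t = a; a = b; b = t;
        }
        for (int v : members.get(b)) {           // relabel the shorter list only
            rep[v] = a;
            updates++;
        }
        members.get(a).addAll(members.get(b));
        members.get(b).clear();
    }
}
```

Merging 1024 singletons pairwise, level by level, forces every union to join two equal-sized lists, which is the expensive case for this heuristic. Each of the 10 levels rewrites 512 pointers, for 5120 updates in total, comfortably under $n \lg n = 10240$.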
Disjoint-set Forests
A set is represented by a rooted tree
The root is the set’s representative
Each node points to its parent (the root points to itself)
So, unlike trees we are used to seeing, the pointers point “up” instead of “down”
[Figure: disjoint-set trees over the nodes b, c, d, e, f, g, and h, illustrating union(e, g)]
Operation Implementations
makeSet(): create a tree containing one node
findSet(): follow parent pointers until the root is found
The path is called the find path
union(): redirect the parent pointer of one of the roots to point to the other root
[Figure: disjoint-set trees over the nodes b, c, d, e, f, g, and h, illustrating union(e, g)]
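The three operations translate almost line for line into code. Below is a minimal sketch with no heuristics, assuming integer elements 0 to n - 1 stored in a parent array; the class name NaiveForest and the depth() helper are hypothetical additions for illustration.

```java
// Hypothetical array-based forest with no heuristics.
// parent[x] == x marks a root, matching "the root points to itself".
class NaiveForest {
    private final int[] parent;

    NaiveForest(int n) {
        parent = new int[n];
        for (int x = 0; x < n; x++) {
            parent[x] = x;               // makeSet(x): a one-node tree
        }
    }

    // Follow parent pointers along the find path until the root is reached.
    int findSet(int x) {
        while (parent[x] != x) {
            x = parent[x];
        }
        return x;
    }

    // Redirect the root of x's tree to point at the root of y's tree.
    void union(int x, int y) {
        int rx = findSet(x), ry = findSet(y);
        if (rx != ry) {
            parent[rx] = ry;
        }
    }

    // Number of parent pointers followed from x to its root.
    int depth(int x) {
        int d = 0;
        while (parent[x] != x) {
            x = parent[x];
            d++;
        }
        return d;
    }
}
```

With eight elements, calling union(i, i + 1) for i = 0 through 6 builds a single chain, so element 0 ends up seven pointers away from the root. This is the degenerate shape that the heuristics on the following slides are meant to prevent.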
Efficiency of Disjoint-set Forests
The straightforward approach is no better than the linked list version
A sequence of $n - 1$ union() operations can create a tree that is a single chain of n nodes
A couple of heuristics can tweak the implementation into the asymptotically fastest disjoint-set data structure known
Union by rank
Path compression
Union by Rank
Same idea as weighted-union for linked lists
The root of the tree with fewer nodes points to the root of the tree with more nodes
We could have each node keep track of the number of nodes in its subtree
Instead, each node maintains a rank that is an upper bound on its height
union() then redirects the pointer of the root of the tree with smaller rank to the root of the tree with larger rank
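A minimal sketch of union by rank on its own (no path compression), again assuming integer elements in arrays. The names RankedForest, depth(), and rankOf() are hypothetical; the linking rule is the one described above, with the rank increased only when two roots of equal rank are joined.

```java
// Hypothetical array-based forest using union by rank only.
class RankedForest {
    private final int[] parent;
    private final int[] rank;    // upper bound on the height of each node

    RankedForest(int n) {
        parent = new int[n];
        rank = new int[n];       // all ranks start at 0
        for (int x = 0; x < n; x++) {
            parent[x] = x;
        }
    }

    int findSet(int x) {
        while (parent[x] != x) {
            x = parent[x];
        }
        return x;
    }

    void union(int x, int y) {
        int rx = findSet(x), ry = findSet(y);
        if (rx == ry) {
            return;
        }
        if (rank[rx] > rank[ry]) {
            parent[ry] = rx;             // smaller rank goes under larger rank
        } else {
            parent[rx] = ry;
            if (rank[rx] == rank[ry]) {
                rank[ry]++;              // equal ranks: the new root grows by one
            }
        }
    }

    int depth(int x) {
        int d = 0;
        while (parent[x] != x) {
            x = parent[x];
            d++;
        }
        return d;
    }

    int rankOf(int x) {
        return rank[x];
    }
}
```

The same union(i, i + 1) sequence that produces a chain without heuristics now yields a tree of depth 1. Even the adversarial pairwise merging of eight elements only reaches depth 3, which equals $\lg 8$.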
Path Compression
Simple in concept and in implementation, but very effective
Alter the findSet() operation so that each node on the find path points directly to the root instead of its immediate parent
Path compression does not affect any ranks (Why?)
[Figure: a tree over the nodes b, c, d, e, f, g, and h, illustrating findSet(g) with path compression]
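Path compression can also be written without recursion, as two explicit loops. The sketch below assumes a forest stored in a parent array that is handed to the constructor; the class name CompressingForest is hypothetical, and this iterative form is an alternative formulation of the same idea, not code taken from the slides.

```java
// Hypothetical iterative path compression over a parent array.
class CompressingForest {
    final int[] parent;   // parent[x] == x marks a root

    CompressingForest(int[] parent) {
        this.parent = parent.clone();   // work on a private copy
    }

    int findSet(int x) {
        // Pass 1: walk up the find path to locate the root.
        int root = x;
        while (parent[root] != root) {
            root = parent[root];
        }
        // Pass 2: walk the same path again, pointing each node at the root.
        while (parent[x] != root) {
            int next = parent[x];
            parent[x] = root;
            x = next;
        }
        return root;
    }
}
```

Starting from the chain 0 → 1 → 2 → 3, a single findSet(0) returns 3 and leaves nodes 0, 1, and 2 all pointing directly at 3. No rank field appears anywhere in the method, which is one way to see why compression leaves ranks untouched.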
Disjoint-set Forest Implementation
The node definition is slightly different:
    public class Node {
        public DSElement data;   // Element to store
        public Node parent;      // Pointer to the parent node
        public int rank;         // Upper bound on this node's height

        public Node(DSElement d) {
            data = d;
            parent = this;       // No parent node yet
            rank = 0;            // No subtree for this new node
            data.setNode(this);
        }
    }
Disjoint-set Forest Implementation (cont.)
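    public class DisjointSet {
        public static void makeSet(DSElement element) {
            new Node(element);   // The constructor links element and node
        }

        public static void union(DSElement x, DSElement y) {
            link(findSet(x), findSet(y));
        }

        // x and y must already be set representatives (roots)
        private static void link(DSElement x, DSElement y) {
            Node nx = x.getNode(),
                 ny = y.getNode();
            if ( nx == ny ) {
                return;          // Same set; nothing to link
            }
            if ( nx.rank > ny.rank ) {
                ny.parent = nx;
            } else {
                nx.parent = ny;
                // Equal ranks: the new root's rank grows by one
                if ( nx.rank == ny.rank ) {
                    ny.rank++;
                }
            }
        }
        . . .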
Disjoint-set Forest Implementation (cont.)
    public class DisjointSet {
        . . .
        public static DSElement findSet(DSElement element) {
            Node node = element.getNode();
            if ( node.parent != node ) {
                // Path compression: point this node directly at the root
                node.parent = findSet(node.parent.data).getNode();
            }
            return node.parent.data;
        }
    }
Note the recursion in findSet()
findSet() uses two passes:
The first pass is up the tree to find the root (representative)
The second pass is down the tree to update the parents of all the nodes in the find path (to point directly to the root)
The recursive calls make up the first pass
The returns from the recursive calls make up the second pass
Do the Heuristics Help?
Union by rank, by itself, yields a running time of $O(m \lg n)$
Path compression, by itself, yields a running time of $\Theta\left(n + f \cdot \left(1 + \log_{2 + f/n} n\right)\right)$
where
n is the number of makeSet() operations (which means at most $n - 1$ union() operations)
f is the number of findSet() operations
The Combined Effect
Together, they yield a running time of $O(m \, \alpha(n))$, where $\alpha(n)$ is a very slowly growing function
For all practical applications of disjoint sets, $\alpha(n) \le 4$
Thus, the running time is linear in m, for all practical purposes