
Union Find ADT

Data type for disjoint sets:
• makeSet(x): Given an element x, create a singleton set that contains only this element. Return a locator/handle for x in the data structure.
• find(x): Given a handle for an element x, find the set that contains x. Return a handle/identifier/pointer/label for this set.
• union(A,B): Given two set identifiers, create the union of the two sets.
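To pin down the semantics of the three operations, here is a rough Python rendering of the ADT as an abstract base class, together with a deliberately naive implementation in which every element stores its set label and union relabels by scanning all elements. The names `UnionFind`, `NaiveUnionFind` and `make_set`, and the choice to use the element itself as its handle, are assumptions made for illustration. This baseline is not one of the implementations discussed in these slides.

```python
from abc import ABC, abstractmethod
from collections.abc import Hashable
from typing import Generic, TypeVar

T = TypeVar("T", bound=Hashable)


class UnionFind(ABC, Generic[T]):
    """The three ADT operations; handles and set labels are left abstract."""

    @abstractmethod
    def make_set(self, x: T) -> T:
        """Create the singleton set {x} and return a handle for x."""

    @abstractmethod
    def find(self, x: T) -> T:
        """Return the label of the set that contains x."""

    @abstractmethod
    def union(self, a: T, b: T) -> T:
        """Merge the sets labelled a and b and return the label of the result."""


class NaiveUnionFind(UnionFind[T]):
    """Hypothetical baseline: every element stores the label of its set."""

    def __init__(self) -> None:
        self.label: dict[T, T] = {}

    def make_set(self, x: T) -> T:
        self.label[x] = x  # x labels its own singleton set
        return x           # the element itself serves as its handle

    def find(self, x: T) -> T:
        return self.label[x]

    def union(self, a: T, b: T) -> T:
        # Relabel every member of set b by scanning all elements: O(n) per call.
        for x, lab in self.label.items():
            if lab == b:
                self.label[x] = a
        return a


uf = NaiveUnionFind()
for x in "abcd":
    uf.make_set(x)
uf.union(uf.find("a"), uf.find("b"))
print(uf.find("b"), uf.find("c"))  # a c
```

The linear scan in `union` is what the list and tree implementations that follow are designed to avoid.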


Union Find ADT

Applications:
• keep track of the connected components of a dynamic graph that changes due to insertion of nodes and edges
• Kruskal's Minimum Spanning Tree algorithm
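Here is a sketch of the Kruskal application in Python. It assumes vertices 0..n-1 and edges given as (weight, u, v) triples, and `kruskal` is a hypothetical name. find decides whether an edge would close a cycle, and union merges two components. For the union-find part, it borrows the tree representation with union by size from the later slides.

```python
def kruskal(n: int, edges: list[tuple[float, int, int]]) -> list[tuple[float, int, int]]:
    """Minimum spanning forest of a graph on the vertices 0..n-1.

    `edges` holds (weight, u, v) triples; the chosen edges are returned.
    """
    parent = list(range(n))  # makeSet for every vertex
    size = [1] * n           # tree sizes, meaningful at roots

    def find(x: int) -> int:
        while parent[x] != x:
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):        # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                     # same component: the edge would close a cycle
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru                  # union: smaller tree below the larger root
        size[ru] += size[rv]
        forest.append((w, u, v))
    return forest


edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))  # [(1, 0, 1), (2, 1, 3), (3, 1, 2)]
```

Dropping the sort and processing edges as they are inserted gives the other application: the roots returned by find identify the connected components of the growing graph.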


Union Find ADT

List Implementation:
• the elements of a set are stored in a list; each node has a backward pointer to the tail
• the tail of the list contains the label for the set
• makeSet(x) needs constant time
• find(x) also needs constant time: follow the backward pointer to the tail


Union Find ADT

Union:
• take the smaller of the two sets
• change all of its backward pointers so that they point to the tail (the label) of the larger set
• insert the smaller list at the head of the larger one
• time O(min(|A|,|B|))

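Here is a minimal sketch of the list implementation in Python. It assumes each set is a singly linked list whose tail node carries the label. The extra `head` and `size` fields kept on the tail node are an assumed piece of bookkeeping, added so that union can reach the front of a list and compare set sizes in O(1).

```python
from __future__ import annotations


class Node:
    """One list element; `tail` is the backward pointer to the tail of its list."""

    def __init__(self, value) -> None:
        self.value = value
        self.next: Node | None = None  # next element towards the tail
        self.tail: Node = self         # a singleton list is its own tail
        # Assumed bookkeeping, only meaningful while this node is a tail:
        self.head: Node = self         # first element of the list
        self.size = 1                  # number of elements in the set


def make_set(x) -> Node:
    return Node(x)  # O(1): the new node is head, tail and label at once


def find(x: Node) -> Node:
    return x.tail   # O(1): the tail node serves as the set label


def union(a: Node, b: Node) -> Node:
    """Merge the sets labelled by the tails a and b in O(min(|A|, |B|)) time."""
    if a is b:
        return a
    if a.size < b.size:
        a, b = b, a                    # a now labels the larger set
    node = b.head
    while node is not None:            # redirect every backward pointer of b's list
        node.tail = a
        node = node.next
    b.next = a.head                    # splice b's list in front of a's head
    a.head = b.head
    a.size += b.size
    return a                           # a's tail, and hence its label, is unchanged


def members(label: Node) -> list:
    """List the values of a set by walking from its head to its tail."""
    out, node = [], label.head
    while node is not None:
        out.append(node.value)
        node = node.next
    return out
```

For example, after `union(find(x0), find(x1))`, the element x1 points straight at x0, and `find(x1)` is still a single pointer lookup.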

Union Find

Lemma. The amortized running times are:
• find: O(1)
• makeSet: O(log n)
• union: O(1)

Proof idea: We partially charge the cost of a union operation to the elements involved. In total we charge at most O(log n) to each element. Since each element has to be created with makeSet(x), we get an amortized time bound by inflating the cost of makeSet(x) to O(log n).

• find: actual cost and amortized cost are the same.
• union(A,B): if A == B, do nothing (time O(1) for the check); otherwise add the smaller set to the larger one and charge the cost for this to the elements in the smaller set, one unit per element. The union itself pays nothing beyond the check!

Union Find

How much do we charge to an element?

Observation. Whenever we charge one to an element x, the size of the set A_x that contains x increases by at least a factor of 2.

⇒ as no set has more than n elements, the total charge to an element is at most log₂ n = O(log n).
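To see the charging argument at work, the sketch below merges n singletons in a random order. It always moves the smaller set into the larger one and counts how often each element is moved, that is, charged. The function name `max_charge` and the random merge order are illustrative assumptions. The bound it is checked against, ⌊log₂ n⌋, follows from the doubling observation.

```python
import math
import random


def max_charge(n: int, seed: int = 0) -> int:
    """Merge n singletons in random order, always moving the smaller set into
    the larger one, and return the largest charge any single element received."""
    rng = random.Random(seed)
    members = {i: [i] for i in range(n)}   # set label -> elements of the set
    label = list(range(n))                 # element -> label of its set
    charge = [0] * n
    while len(members) > 1:
        a, b = rng.sample(list(members), 2)
        if len(members[a]) < len(members[b]):
            a, b = b, a                    # a is the larger set
        for x in members[b]:               # relabel the smaller set ...
            label[x] = a
            charge[x] += 1                 # ... charging one to each moved element
        members[a].extend(members.pop(b))
    return max(charge)
```

For n = 1000, `max_charge(1000, seed)` should never exceed ⌊log₂ 1000⌋ = 9, whichever seed is used.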

Union Find

Implementation via trees:
• the root of each tree is the label of its set
• only pointers to the parent exist, so we cannot list all elements of a given set
• Problem: find is no longer constant time

[Figure: a forest of parent-pointer trees; the root of each tree is the label of its set.]
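Here is a minimal Python sketch of the plain tree representation. It assumes the elements are the integers 0..n-1 and that union simply links one root below the other without any balancing. That unbalanced rule is an assumption chosen to show why find is no longer constant, and `ForestUF` is a hypothetical name.

```python
class ForestUF:
    """Parent-pointer forest over the elements 0..n-1; each root labels its set."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # makeSet for every element: all are roots

    def find(self, x: int) -> tuple[int, int]:
        """Return the root of x and the number of parent pointers followed."""
        steps = 0
        while self.parent[x] != x:
            x = self.parent[x]
            steps += 1
        return x, steps

    def union(self, a: int, b: int) -> None:
        """Link root a below root b; nothing keeps the trees shallow."""
        if a != b:
            self.parent[a] = b


uf = ForestUF(8)
for i in range(7):
    uf.union(uf.find(i)[0], i + 1)   # the root of 0's set goes below a fresh singleton
print(uf.find(0))                    # (7, 7): the tree has degenerated into a path
```

The loop builds the path 0 → 1 → … → 7, so a single find already follows n-1 pointers.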

Union Find: Tree Implementation

[Figure: two trees before the union, annotated with subtree sizes.]

Union of two sets:
• the root of the tree is the label of the set
• store the size of a subtree in its root
• make the smaller tree a child of the root of the larger one

[Figure: the same two trees after the union; the smaller tree now hangs below the root of the larger one, whose stored size is updated.]
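Here is a minimal sketch of the size-based union in Python. It assumes elements 0..n-1 and that union receives the two roots (set labels) directly. `SizeUF` is a hypothetical name, and breaking ties in favour of the first argument is an arbitrary choice.

```python
class SizeUF:
    """Parent-pointer forest with union by size (no path compression)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n          # size of the tree, only meaningful at a root

    def find(self, x: int) -> int:
        while self.parent[x] != x:   # go upwards until the root is reached
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> int:
        """Merge the trees with roots a and b in O(1); return the new root."""
        if a == b:
            return a
        if self.size[a] < self.size[b]:
            a, b = b, a              # a is the root of the larger tree
        self.parent[b] = a           # the smaller tree becomes a child of the larger
        self.size[a] += self.size[b]
        return a


uf = SizeUF(6)
r = uf.union(uf.find(0), uf.find(1))  # tie: 0 stays the root
r = uf.union(uf.find(2), r)           # {0, 1} is larger, so 2 goes below 0
print(r, uf.size[0], uf.find(2))      # 0 3 0
```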

Find:
• go upwards until you find the root
• make all visited nodes children of the root (path compression)

[Figure: a tree before a find operation on a deep node.]

[Figure: the same tree after the find; every node visited on the way up is now a child of the root.]
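The find operation with path compression can be sketched in two passes. The first pass walks up to the root. The second walks the same path again and re-links every visited node directly to the root. The sketch assumes the forest is stored as a Python list `parent` with `parent[r] == r` for roots, and the function signature is an assumption.

```python
def find(parent: list[int], x: int) -> int:
    """Return the root of x, making every node on the path a child of the root."""
    root = x
    while parent[root] != root:      # first pass: go upwards until the root
        root = parent[root]
    while x != root:                 # second pass: path compression
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


parent = [0, 0, 1, 2, 2]             # path 3 -> 2 -> 1 -> 0, and 4 hangs below 2
print(find(parent, 3), parent)       # 0 [0, 0, 0, 0, 2]
```

Node 4 is not on the search path, so its parent pointer is untouched. Its own path to the root still becomes shorter, because node 2 now points directly at 0.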

Tree Implementation: Analysis

Analysis
• union(A,B) can be done in time O(1)
• makeSet(x) is still trivial: O(1)
• the cost of find(x) may be large

Observation: The height of the trees is at most O(log n).
• if the distance of an element x to the root increases, then x was in the smaller of the two trees, so the number of elements in the tree containing x at least doubles; as no tree has more than n elements, this happens at most log₂ n times

⇒ the cost of find(x) is at most O(log n), even without amortization.
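The height bound can be checked empirically. The sketch below uses hypothetical helper names, elements 0..n-1, and union by size without path compression. It first builds one tree by always uniting trees of equal size, which reaches height exactly log₂ n when n is a power of two. It then compares trees produced by random union sequences against ⌊log₂ n⌋.

```python
import math
import random


def link(parent: list[int], size: list[int], a: int, b: int) -> None:
    """Union by size on the distinct roots a and b."""
    if size[a] < size[b]:
        a, b = b, a
    parent[b] = a
    size[a] += size[b]


def root_and_depth(parent: list[int], x: int) -> tuple[int, int]:
    d = 0
    while parent[x] != x:
        x, d = parent[x], d + 1
    return x, d


def pairwise_height(k: int) -> int:
    """Height after repeatedly uniting equal-sized trees, starting from 2**k singletons."""
    n = 2 ** k
    parent, size = list(range(n)), [1] * n
    roots = list(range(n))
    while len(roots) > 1:
        for a, b in zip(roots[::2], roots[1::2]):
            link(parent, size, a, b)
        roots = [r for r in roots if parent[r] == r]
    return max(root_and_depth(parent, x)[1] for x in range(n))


def random_height(n: int, seed: int = 0) -> int:
    """Height after uniting random pairs of sets until a single tree remains."""
    rng = random.Random(seed)
    parent, size = list(range(n)), [1] * n
    for _ in range(n - 1):
        while True:
            a = root_and_depth(parent, rng.randrange(n))[0]
            b = root_and_depth(parent, rng.randrange(n))[0]
            if a != b:
                break
        link(parent, size, a, b)
    return max(root_and_depth(parent, x)[1] for x in range(n))
```

`pairwise_height(k)` returns exactly k, so the O(log n) bound on the height cannot be improved for union by size alone.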

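Tree Implementation: Analysis

Can we do better? Yes, with amortization!

Definitions:
• n(v) := the number of nodes that were in the subtree rooted at v when v became a child of another node (for a root v: the current number of nodes in its tree)
• rank $r(v) := \lfloor \log_2 n(v) \rfloor$
• hence $n(v) \ge 2^{r(v)}$

Lemma: The rank of a parent p is strictly larger than the rank of its child c.
• after c is linked to p, the rank of c does not change anymore, while the rank of p might still increase
• directly after the linking, $r(p) = \lfloor \log_2 n(p) \rfloor \ge \lfloor \log_2 (2\,n(c)) \rfloor = r(c) + 1 > r(c)$, because p is the root of the larger tree and therefore now has $n(p) \ge 2\,n(c)$ nodes
• path compression later only makes c a child of the root, which lies higher on the same path and therefore has an even larger rank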
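The arithmetic behind the lemma can be checked directly. For a positive integer m, ⌊log₂ m⌋ equals `m.bit_length() - 1` in Python. Linking a tree of n(c) nodes below a root with at least as many nodes gives the parent at least 2n(c) nodes. The function names below are illustrative assumptions.

```python
def rank(n_v: int) -> int:
    """r(v) = floor(log2 n(v)) for n(v) >= 1, computed exactly on integers."""
    return n_v.bit_length() - 1


def ranks_after_link(size_p: int, size_c: int) -> tuple[int, int]:
    """Ranks of p and c right after the root c (size_c nodes) is linked
    below the root p (size_p >= size_c nodes), as union by size does."""
    if not size_p >= size_c >= 1:
        raise ValueError("union by size links the smaller tree below the larger")
    return rank(size_p + size_c), rank(size_c)


print(ranks_after_link(5, 3))  # (3, 1): n(p) = 8 >= 2 * 3
print(ranks_after_link(4, 4))  # (3, 2): even equal sizes give r(p) > r(c)
```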

Tree Implementation: Analysis

Theorem. There are at most $n/2^s$ nodes of rank s.

Proof:
• a node of rank s had at least $2^s$ nodes in its subtree when it became a child of another node (a root of rank s has at least $2^s$ nodes in its tree now)
• nodes of the same rank have disjoint subtrees, as they cannot be ancestors of each other [observe that a node that was initially not in a subtree T cannot join this subtree via path compression]

More precisely: each node in the tree sees during its lifetime at most one ancestor of rank s; for each node of rank s there are at least $2^s$ nodes that have seen it; hence there can be at most $n/2^s$ nodes of rank s.
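Both the lemma and the theorem can be tested on random operation sequences. The sketch below is an assumed instrumentation of union by size with path compression on the elements 0..n-1. It freezes n(v) when v stops being a root and uses the current tree size for roots, as in the rank definition. `RankedUF` and `check_rank_bounds` are hypothetical names, and passing random tests is evidence, not a proof.

```python
import random
from collections import Counter


class RankedUF:
    """Union by size with path compression, instrumented with n(v) and r(v)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n       # current tree size, meaningful at roots
        self.n_link = [0] * n     # n(v), frozen when v becomes a child

    def rank(self, v: int) -> int:
        # r(v) = floor(log2 n(v)); a root uses its current tree size.
        n_v = self.size[v] if self.parent[v] == v else self.n_link[v]
        return n_v.bit_length() - 1

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:          # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x: int, y: int) -> None:
        a, b = self.find(x), self.find(y)
        if a == b:
            return
        if self.size[a] < self.size[b]:
            a, b = b, a           # a is the root of the larger tree
        self.n_link[b] = self.size[b]
        self.parent[b] = a
        self.size[a] += self.size[b]


def check_rank_bounds(n: int, ops: int, seed: int = 0) -> bool:
    """Apply random unions/finds; after each one, test the lemma and the theorem."""
    rng = random.Random(seed)
    uf = RankedUF(n)
    for _ in range(ops):
        if rng.random() < 0.5:
            uf.union(rng.randrange(n), rng.randrange(n))
        else:
            uf.find(rng.randrange(n))
        r = [uf.rank(v) for v in range(n)]
        # Lemma: every non-root has a parent of strictly larger rank.
        if any(uf.parent[v] != v and r[uf.parent[v]] <= r[v] for v in range(n)):
            return False
        # Theorem: at most n / 2**s nodes have rank s.
        if any(count > n / 2 ** s for s, count in Counter(r).items()):
            return False
    return True
```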

Tree Implementation: Analysis

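Definitions:
• $t(0) := 1$ and $t(i) := 2^{t(i-1)}$ (tower function)
• $\log^* n := \min\{\, i \mid t(i) \ge n \,\}$

Theorem: We can obtain the following amortized running times:
• makeSet(x): $O(\log^* n)$
• find(x): $O(\log^* n)$
• union(A,B): O(1)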

Tree Implementation: Analysis

group-number:
• a node with rank r[v] is in rank-group $\log^* r[v]$
• this means the rank-group g contains ranks t(g-1)+1, …, t(g) (group 0 contains ranks 0 and 1)
• as ranks never exceed $\log_2 n$, there are at most $\log^* n$ different rank-groups
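The rank-groups can be computed straight from these definitions. Here is a small sketch that assumes the tower function t(0) = 1, t(g) = 2^{t(g-1)} and log* n = min{g : t(g) ≥ n}. The function names are assumptions.

```python
def tower(g: int) -> int:
    """t(0) = 1 and t(g) = 2 ** t(g - 1): 1, 2, 4, 16, 65536, ..."""
    t = 1
    for _ in range(g):
        t = 2 ** t
    return t


def log_star(n: int) -> int:
    """log* n = min{ g : t(g) >= n }."""
    g = 0
    while tower(g) < n:
        g += 1
    return g


def rank_group(r: int) -> int:
    """Group g holds the ranks t(g-1)+1, ..., t(g); group 0 holds ranks 0 and 1."""
    return log_star(r)


print([rank_group(r) for r in range(6)])   # [0, 0, 1, 2, 2, 3]
print(rank_group(16), rank_group(17))      # 3 4
```

For n = 2^16, the largest possible rank is 16, which lies in group 3. That gives 4 = log* 2^16 groups in total.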

Tree Implementation: Analysis

Accounting scheme:
• create an account for every find operation
• create an account for every node

The cost of a find operation is equal to the length of the path traversed. We charge the cost for going from v to parent[v] in the following way:
• if the parent of v does not change due to path compression, we charge the cost to the find-account (at most cost 1 per find)
• if the group-number of rank[v] is the same as that of rank[parent(v)] (before starting path compression), we charge the cost to the node-account of v
• otherwise we charge the cost to the find-account
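The accounting rules for a single find can be sketched as a function of the ranks met along the search path. `charge_find_path` is a hypothetical helper. It assumes the path is given bottom-up as a list of ranks (taken before compression) that ends at the root, with rank-groups computed from the tower function t(0) = 1, t(g) = 2^{t(g-1)}.

```python
def tower(g: int) -> int:
    t = 1                            # t(0) = 1, t(g) = 2 ** t(g - 1)
    for _ in range(g):
        t = 2 ** t
    return t


def rank_group(r: int) -> int:
    g = 0                            # smallest g with t(g) >= r
    while tower(g) < r:
        g += 1
    return g


def charge_find_path(ranks: list[int]) -> tuple[int, list[int]]:
    """Split the cost of one find over the find-account and the node-accounts.

    ranks[i] is the rank of the i-th node on the search path, from the queried
    node up to the root, taken before path compression starts.
    """
    edges = len(ranks) - 1
    find_account = 0
    node_account = [0] * edges       # one entry per non-root node on the path
    for i in range(edges):           # the step from node i to its parent i + 1
        if i == edges - 1:
            find_account += 1        # parent is the root: compression keeps it
        elif rank_group(ranks[i]) == rank_group(ranks[i + 1]):
            node_account[i] += 1     # same rank-group: charge the node
        else:
            find_account += 1        # rank-group changes: charge the find
    return find_account, node_account


print(charge_find_path([0, 1, 2, 3, 4, 5, 9]))  # (4, [1, 0, 0, 1, 0, 0])
```

Because ranks strictly increase along the path, the find-account only pays for the step into the root and for each change of rank-group, so it is charged at most once per rank-group.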

Tree Implementation: Analysis

Observations:
• find(x) is charged only $O(\log^* n)$ (the maximum number of rank-groups)
• after a node is charged, path compression re-assigns its parent to a node higher up in the tree, so the parent gets a larger rank
• after some time the parent is in a larger rank-group; from then on the node is never charged again
• the charge to a node in rank-group g is at most $t(g) - t(g-1) \le t(g)$

What is the total number of operations that are charged to nodes?
• the total charge is at most $\sum_g n(g) \cdot t(g)$, where n(g) is the number of nodes in group g

Tree Implementation: Analysis

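hence: for g ≥ 1, the bound on the number of nodes of rank s gives
$n(g) \le \sum_{s=t(g-1)+1}^{t(g)} \frac{n}{2^s} \le \frac{n}{2^{t(g-1)}} = \frac{n}{t(g)}$,
and trivially $n(0) \le n = n/t(0)$, so
$\sum_g n(g) \cdot t(g) \le \sum_g n \le n \log^* n$, as there are only $\log^* n$ groups.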

Tree Implementation: Analysis

If there are only n elements, we charge at most $n \log^* n$ to these elements.

Charging $O(\log^* n)$ to every makeSet() operation gives the result.

The analysis is not tight. In fact, it has been shown that the amortized time for the union-find implementation with union by size and path compression is $O(\alpha(n))$, where $\alpha(n)$ is the inverse Ackermann function, which grows a lot slower than $\log^* n$. There is also a lower bound of $\Omega(\alpha(n))$.