sanjeev-malik
Analysis and Design of Algorithms Unit 4
Sikkim Manipal University Page No.: 180
Unit 4 B – Trees
Structure
4.1 Introduction
Objectives
4.2 Properties of B – Trees
4.3 The height of a B – Tree
4.4 Binomial trees
4.5 Binomial Heaps
4.6 Fibonacci Heaps
4.7 Data Structures for Disjoint Sets
4.8 Summary
4.9 Terminal Questions
4.10 Answers
4.1 Introduction
We know that for binary search trees and red-black trees, any “satellite
information” associated with a key is stored in the same node as the key. In
practice, one might actually store with each key just a pointer to another disk
page containing the satellite information for that key. The pseudocode in
this unit implicitly assumes that the satellite information associated with
a key, or the pointer to such satellite information, travels with the key
whenever the key is moved from node to node. A common variant on a
B – tree, known as a B+ – tree, stores all the satellite information in the
leaves and stores only keys and child pointers in the internal nodes, thus
maximizing the branching factor of the internal nodes.
Objectives
At the end of this unit the student should be able to:
Find the height of a B-tree.
Recognise a Fibonacci heap.
4.2 Properties of B – Trees
A B – tree T is a rooted tree (whose root is root [T]) having the following
properties:
1. Every node x has the following fields:
a. n [x], the number of keys currently stored in node x,
b. the n[x] keys themselves, stored in nondecreasing order, so that
$key_1[x] \le key_2[x] \le \cdots \le key_{n[x]}[x]$,
c. leaf [x], a Boolean value that is TRUE if x is a leaf and FALSE if x is
an internal node.
2. Each internal node x also contains n[x] + 1 pointers $c_1[x], c_2[x], \ldots, c_{n[x]+1}[x]$
to its children. Leaf nodes have no children, so their $c_i$ fields
are undefined.
3. The keys $key_i[x]$ separate the ranges of keys stored in each subtree: if
$k_i$ is any key stored in the subtree with root $c_i[x]$, then
$k_1 \le key_1[x] \le k_2 \le key_2[x] \le \cdots \le key_{n[x]}[x] \le k_{n[x]+1}$.
4. All leaves have the same depth, which is the tree’s height h.
5. There are lower and upper bounds on the number of keys a node can
contain. These bounds can be expressed in terms of a fixed integer $t \ge 2$
called the minimum degree of the B – tree:
a. Every node other than the root must have at least t – 1 keys. Every
internal node other than the root thus has at least t children. If the
tree is nonempty, the root must have at least one key.
b. Every node can contain at most 2t – 1 keys. Therefore, an internal
node can have at most 2t children. We say that a node is full if it
contains exactly 2t – 1 keys.
The simplest B – tree occurs when t = 2. Every internal node then
has either 2, 3, or 4 children, and we have a 2-3-4 tree. In practice,
however, much larger values of t are typically used.
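The properties above can be checked mechanically. The following is a minimal Python sketch, not part of the original unit: the class name BTreeNode and the function check_btree are hypothetical, keys are assumed to be integers, and the tree is assumed to be nonempty. It verifies properties 1 to 5 for a given minimum degree t and returns the height.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    """One B-tree node; the fields mirror n[x], key_i[x], c_i[x] and leaf[x]."""
    keys: List[int] = field(default_factory=list)               # nondecreasing
    children: List["BTreeNode"] = field(default_factory=list)   # empty for a leaf

    @property
    def n(self) -> int:
        return len(self.keys)

    @property
    def leaf(self) -> bool:
        return not self.children


def check_btree(root: BTreeNode, t: int) -> int:
    """Return the height of a nonempty tree; raise AssertionError if a property fails."""
    assert t >= 2

    def visit(x, lo, hi, is_root):
        # Property 5: at most 2t - 1 keys; at least t - 1 (at least 1 for the root).
        assert x.n <= 2 * t - 1
        assert x.n >= (1 if is_root else t - 1)
        # Property 1b: keys are stored in nondecreasing order.
        assert all(a <= b for a, b in zip(x.keys, x.keys[1:]))
        # Property 3: every key lies in the range allowed by the ancestors.
        assert all((lo is None or lo <= k) and (hi is None or k <= hi) for k in x.keys)
        if x.leaf:
            return 0
        # Property 2: an internal node has n[x] + 1 children.
        assert len(x.children) == x.n + 1
        bounds = [lo] + x.keys + [hi]
        heights = {visit(c, bounds[i], bounds[i + 1], False)
                   for i, c in enumerate(x.children)}
        # Property 4: all leaves have the same depth.
        assert len(heights) == 1
        return heights.pop() + 1

    return visit(root, None, None, True)


# A small 2-3-4 tree (t = 2) of height 1.
root = BTreeNode([10, 20], [BTreeNode([1, 5]), BTreeNode([12]), BTreeNode([25, 30, 40])])
print(check_btree(root, t=2))  # prints 1
```

The checker treats a node with no children as a leaf, so leaf[x] is derived rather than stored; an on-disk implementation would more likely keep it as an explicit field, as the text describes.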
4.3 The height of a B – tree
The number of disk accesses required for most operations on a B – tree is
proportional to the height of the B – tree. We now analyze the worst-case
height of a B – tree.
Theorem
If $n \ge 1$, then for any n – key B – tree T of height h and minimum degree
$t \ge 2$,
$$h \le \log_t \frac{n+1}{2}.$$
Proof If a B – tree has height h, the root contains at least one key and all
other nodes contain at least t – 1 keys. Thus, there are at least 2 nodes at
depth 1, at least 2t nodes at depth 2, at least $2t^2$ nodes at depth 3, and so
on, until at depth h there are at least $2t^{h-1}$ nodes. Figure 4.1 illustrates such a
tree for h = 3. Thus, the number n of keys satisfies the inequality below.
Figure 4.1: B – tree of height 3 containing a minimum possible number of
keys. Shown inside each node x is n[x].
depth    number of nodes
0        1
1        2
2        2t
3        2t^2
$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\left(\frac{t^h - 1}{t-1}\right) = 2t^h - 1.$$
By simple algebra, we get $t^h \le (n+1)/2$. Taking base – t logarithms of both
sides proves the theorem.
Here we see the power of B – trees, as compared to red-black trees.
Although the height of the tree grows as O(lg n) in both cases (recall that t is a
constant), for B – trees the base of the logarithm can be many times larger.
Thus, B – trees save a factor of about lg t over red-black trees in the
number of nodes examined for most tree operations. Since examining an
arbitrary node in a tree usually requires a disk access, the number of disk
accesses is substantially reduced.
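To get a feel for the bound, the sketch below evaluates it numerically. It is an illustration only: the function names min_keys and max_height are hypothetical, and integer arithmetic is used instead of a floating-point logarithm so that the result is exact.

```python
def min_keys(h: int, t: int) -> int:
    """Fewest keys in a B-tree of height h and minimum degree t, i.e. 2*t**h - 1."""
    return 2 * t ** h - 1


def max_height(n: int, t: int) -> int:
    """Largest h with 2*t**h - 1 <= n, which equals floor(log_t((n + 1) / 2))."""
    if n < 1 or t < 2:
        raise ValueError("need n >= 1 and t >= 2")
    h = 0
    while min_keys(h + 1, t) <= n:
        h += 1
    return h


if __name__ == "__main__":
    n = 10 ** 9
    for t in (2, 10, 1001):
        print(f"t = {t}: height at most {max_height(n, t)}")
```

For a billion keys, the bound drops from 28 when t = 2 to 2 when t = 1001, which is the saving in disk accesses that the paragraph above describes.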
4.4 Binomial trees
A binomial heap is a collection of binomial trees, so this section starts by
defining binomial trees and proving some key properties. We then define
binomial heaps and show how they can be represented.
4.4.1 Binomial trees
The binomial tree Bk is an ordered tree defined
recursively. As shown in Figure 4.2(a), the binomial tree B0 consists of a
single node. The binomial tree Bk consists of two binomial trees Bk–1 that
are linked together: the root of one is the leftmost child of the root of the
other. Figure 4.2(b) shows the binomial trees B0 through B4.
Some properties of binomial trees are given by the following lemma.
4.4.2 (Properties of binomial trees)
For the binomial tree Bk,
1. there are $2^k$ nodes,
2. the height of the tree is k,
3. there are exactly $\binom{k}{i}$ nodes at depth i for i = 0, 1, ..., k, and
4. the root has degree k, which is greater than that of any other node;
moreover if the children of the root are numbered from left to right by
k – 1, k – 2, ..., 0, child i is the root of a subtree Bi.
Proof: The proof is by induction on k. For each property, the basis is the
binomial tree B0 . Verifying that each property holds for B0 is trivial.
For the inductive step, we assume that the lemma holds for Bk–1.
1. Binomial tree Bk consists of two copies of Bk–1, and so Bk has
$2^{k-1} + 2^{k-1} = 2^k$ nodes.
2. Because of the way in which the two copies of Bk–1 are linked to form
Bk, the maximum depth of a node in Bk is one greater than the maximum
depth in Bk–1.
By the inductive hypothesis, this maximum depth is (k–1) + 1 = k.
3. Let D(k, i) be the number of nodes at depth i of binomial tree Bk. Since
Bk is composed of two copies of Bk–1 linked together, a node at depth i
in Bk–1 appears in Bk once at depth i and once at depth i + 1. In other
words, the number of nodes at depth i in Bk is the number of nodes at
depth i in Bk–1 plus the number of nodes at depth i – 1 in Bk–1. Thus,
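$$D(k, i) = D(k-1, i) + D(k-1, i-1) = \binom{k-1}{i} + \binom{k-1}{i-1} = \binom{k}{i},$$
where the second equality holds by the inductive hypothesis and the third is Pascal's identity.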
Figure 4.2 (a) The recursive definition of the binomial tree Bk. Triangles represent rooted sub-trees. (b) The binomial trees B0 through B4. Node depths in B4 are shown. (c) Another way of looking at the binomial tree Bk.
4. The only node with greater degree in Bk than in Bk–1 is the root, which
has one more child than in Bk–1. Since the root of Bk–1 has degree k – 1,
the root of Bk has degree k. Now, by the inductive hypothesis, and as
Figure 4.2(c) shows, from left to right, the children of the root of Bk–1 are
roots of Bk–2, Bk–3, ……, B0. When Bk–1 is linked to Bk–1, therefore, the
children of the resulting root are roots of Bk–1, Bk–2, ……, B0.
Note:
The maximum degree of any node in an n-node binomial tree is lg n.
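The lemma can also be checked experimentally for small k. The sketch below is an illustration, not part of the original unit: the names Node, binomial_tree, depth_counts and check_lemma are hypothetical. It builds Bk by the recursive definition and asserts the four properties; property 4 is checked only through the root degree and the sizes of the root's subtrees.

```python
from math import comb


class Node:
    def __init__(self):
        self.children = []  # ordered from left to right


def binomial_tree(k: int) -> Node:
    """Build B_k by linking two copies of B_(k-1)."""
    if k == 0:
        return Node()
    a, b = binomial_tree(k - 1), binomial_tree(k - 1)
    b.children.insert(0, a)  # the root of one copy becomes the leftmost child of the other
    return b


def depth_counts(root: Node) -> list:
    """Return counts where counts[i] is the number of nodes at depth i."""
    counts, level = [], [root]
    while level:
        counts.append(len(level))
        level = [c for x in level for c in x.children]
    return counts


def check_lemma(k: int) -> None:
    root = binomial_tree(k)
    counts = depth_counts(root)
    assert sum(counts) == 2 ** k                          # property 1
    assert len(counts) - 1 == k                           # property 2
    assert counts == [comb(k, i) for i in range(k + 1)]   # property 3
    assert len(root.children) == k                        # property 4: root degree
    # The children, left to right, have the sizes of B_(k-1), ..., B_0.
    sizes = [sum(depth_counts(c)) for c in root.children]
    assert sizes == [2 ** i for i in reversed(range(k))]


for k in range(8):
    check_lemma(k)
print(depth_counts(binomial_tree(4)))  # prints [1, 4, 6, 4, 1]
```

The tree has 2^k nodes, so the check is practical only for small k; it is meant as a sanity test of the statement, not as a substitute for the inductive proof.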
4.5 Binomial Heaps
A binomial heap H is a set of binomial trees that satisfies the following
binomial heap properties.
1. Each binomial tree in H obeys the min-heap property: the key of a
node is greater than or equal to the key of its parent. We say that each
such tree is min-heap-ordered.
2. For any nonnegative integer k, there is at most one binomial tree in H
whose root has degree k.
The first property tells us that the root of a min-heap-ordered tree
contains the smallest key in the tree.
The second property implies that an n-node binomial heap H consists of
at most $\lfloor \lg n \rfloor + 1$ binomial trees. To see why, observe that the binary
representation of n has $\lfloor \lg n \rfloor + 1$ bits, say $\langle b_{\lfloor \lg n \rfloor}, b_{\lfloor \lg n \rfloor - 1}, \ldots, b_0 \rangle$,
so that $n = \sum_{i=0}^{\lfloor \lg n \rfloor} b_i 2^i$. By property 1 of 4.4.2, therefore, binomial
tree Bi appears in H if and only if bit $b_i = 1$. Thus, binomial heap H
contains at most $\lfloor \lg n \rfloor + 1$ binomial trees.
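The correspondence with the binary representation of n is easy to compute. The following short sketch (the function name trees_in_binomial_heap is hypothetical) lists the indices i of the binomial trees Bi that make up an n-node binomial heap.

```python
def trees_in_binomial_heap(n: int) -> list:
    """Indices i such that B_i appears in an n-node binomial heap (bit i of n is 1)."""
    if n < 0:
        raise ValueError("n must be nonnegative")
    # For n >= 1, n.bit_length() equals floor(lg n) + 1, the number of bits of n.
    return [i for i in range(n.bit_length()) if (n >> i) & 1]


print(trees_in_binomial_heap(13))  # prints [0, 2, 3]
```

For n = 13, whose binary representation is 1101, the heap consists of B0, B2 and B3, with 1 + 4 + 8 = 13 nodes in total.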
4.6 Fibonacci Heaps
4.6.1 Structure of Fibonacci heaps
Like a binomial heap, a Fibonacci heap is a collection of min-heap-ordered
trees. The trees in a Fibonacci heap are not constrained to be binomial
trees, however. Figure 4.3(a) shows an example of a Fibonacci heap.
Unlike trees within binomial heaps, which are ordered, trees within
Fibonacci heaps are rooted but unordered. As Figure 4.3(b) shows, each
node x contains a pointer p [x] to its parent and a pointer child [x] to any one
of its children. The children of x are linked together in a circular, doubly
linked list, which we call the child list of x. Each child y in a child list has
pointers left [y] and right [y] that point to y’s left and right siblings,
respectively. If node y is an only child, then left [y] = right [y] = y. The order
in which siblings appear in a child list is arbitrary.
Figure 4.3(a) A Fibonacci heap consisting of five min-heap-ordered trees and 14 nodes. The dashed line indicates the root list. The minimum node of the heap is the node containing the key 3. The three marked nodes are blackened. The potential of this particular Fibonacci heap is 5 + 2·3 = 11. (b) A more complete representation showing pointers p (up arrows), child (down arrows), and left and right (sideways arrows).
Two other fields in each node will be of use. The number of children in the
child list of node x is stored in degree[x]. The Boolean-valued field mark[x]
indicates whether node x has lost a child since the last time x was made the
child of another node. Newly created nodes are unmarked, and a node x
becomes unmarked whenever it is made the child of another node.
A given Fibonacci heap H is accessed by a pointer min [H] to the root of a
tree containing a minimum key; this node is called the minimum node of
the Fibonacci heap. If a Fibonacci heap H is empty, then min [H] = NIL.
The roots of all the trees in a Fibonacci heap are linked together using their
left and right pointers into a circular, doubly linked list called the root list of
the Fibonacci heap. The pointer min [H] thus points to the node in the root
list whose key is minimum. The order of the trees within a root list is
arbitrary.
We rely on one other attribute for a Fibonacci heap H : the number of nodes
currently in H is kept in n[H].
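The node and heap fields described above can be sketched in Python as follows. This is an illustration under stated assumptions: the class names FibNode and FibHeap and the helper root_keys are hypothetical, and the insert method, which this unit does not define, is included only to show how a new one-node tree can be spliced into the circular, doubly linked root list.

```python
class FibNode:
    """Node fields follow the text: p, child, left, right, degree, mark."""

    def __init__(self, key):
        self.key = key
        self.p = None                  # parent pointer p[x]
        self.child = None              # pointer to any one child
        self.left = self.right = self  # a lone node is its own left and right sibling
        self.degree = 0                # number of children in the child list
        self.mark = False              # newly created nodes are unmarked


class FibHeap:
    def __init__(self):
        self.min = None  # min[H]; None plays the role of NIL for an empty heap
        self.n = 0       # n[H]

    def insert(self, key) -> FibNode:
        """Splice a new one-node tree into the root list and update min[H]."""
        x = FibNode(key)
        if self.min is None:
            self.min = x
        else:
            # Place x immediately to the right of the current minimum node.
            x.left, x.right = self.min, self.min.right
            self.min.right.left = x
            self.min.right = x
            if key < self.min.key:
                self.min = x
        self.n += 1
        return x

    def root_keys(self) -> list:
        """Keys in the root list, walking right from min[H] until it is reached again."""
        keys, x = [], self.min
        while x is not None:
            keys.append(x.key)
            x = x.right
            if x is self.min:
                break
        return keys


h = FibHeap()
for key in (7, 3, 9):
    h.insert(key)
print(h.min.key, h.n, h.root_keys())
```

Because the root list is circular, insertion needs no special case for the ends of the list, and the order of the roots carries no meaning, in line with the statement that the order of trees in the root list is arbitrary.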
4.6.2 Potential function
For a given Fibonacci heap H, we indicate by t (H) the number of trees in
the root list of H and by m(H) the number of marked nodes in H. The
potential of Fibonacci heap H is then defined by
$$\Phi(H) = t(H) + 2\,m(H) \qquad \text{(a)}$$
For example, the potential of the Fibonacci heap shown in Figure 4.3 is
5 + 2·3 = 11. The potential of a set of Fibonacci heaps is the sum of the
potentials of its constituent Fibonacci heaps. We shall assume that a unit of
potential can pay for a constant amount of work, where the constant is
sufficiently large to cover the cost of any of the specific constant-time pieces
of work that we might encounter.
We assume that a Fibonacci heap application begins with no heaps. The
initial potential, therefore, is 0, and by equation (a), the potential is
nonnegative at all subsequent times.
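The potential can be computed directly from the definition. In the sketch below, a heap is represented, purely for illustration, as a list of nested (key, marked, children) tuples; this representation, the function name potential, and the particular keys and tree shapes are assumptions, chosen only so that the example has five trees, 14 nodes and three marked nodes like the heap in the caption of Figure 4.3.

```python
def potential(roots) -> int:
    """Phi(H) = t(H) + 2 m(H) for a heap given as a list of (key, marked, children) trees."""

    def marked(node):
        _key, mark, children = node
        return int(mark) + sum(marked(c) for c in children)

    t = len(roots)                      # t(H): number of trees in the root list
    m = sum(marked(r) for r in roots)   # m(H): number of marked nodes
    return t + 2 * m


# Hypothetical heap: 5 trees, 14 nodes, 3 marked nodes (keys 18, 39 and 26).
heap = [
    (23, False, []),
    (7, False, []),
    (3, False, [(18, True, [(39, True, [])]),
                (52, False, []),
                (38, False, [(41, False, [])])]),
    (17, False, [(30, False, [])]),
    (24, False, [(26, True, [(35, False, [])]),
                 (46, False, [])]),
]
print(potential(heap))  # prints 11
```

An empty list of roots gives potential 0, matching the initial potential assumed in the text.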
4.6.3 Maximum degree
The amortized analyses we shall perform in the remaining sections of this
unit assume that there is a known upper bound D(n) on the maximum
degree of any node in an n-node Fibonacci heap.
4.7 Data Structures for Disjoint Sets
4.7.1 Disjoint-set operations
A disjoint-set data structure maintains a collection S = {S1, S2, ..., Sk} of
disjoint dynamic sets. Each set is identified by a representative, which is
some member of the set. In some applications, it doesn’t matter which
member is used as the representative; in other applications, there may be a
prespecified rule for choosing the representative, such as choosing the
smallest member in the set.
As in the other dynamic-set implementations we have studied, each element
of a set is represented by an object. Letting x denote an object, we wish to
support the following operations:
MAKE – SET (x) creates a new set whose only member (and thus
representative) is x. Since the sets are disjoint, we require that x not
already be in some other set.
UNION (x, y) unites the dynamic sets that contain x and y, say Sx and
Sy, into a new set that is the union of these two sets. The two sets are
assumed to be disjoint prior to the operation. The representative of the
resulting set is any member of Sx ∪ Sy, although many implementations of
UNION specifically choose the representative of either Sx or Sy as the new
representative. Since we require the sets in the collection to be disjoint, we
“destroy” sets Sx and Sy, removing them from the collection S.
FIND – SET (x) returns a pointer to the representative of the (unique) set
containing x.
Throughout this unit, we shall analyze the running times of disjoint-set data
structures in terms of two parameters: n, the number of MAKE–SET
operations, and m, the total number of MAKE–SET, UNION, and
FIND – SET operations. Since the sets are disjoint, each UNION operation
reduces the number of sets by one. After n –1 UNION operations, therefore,
only one set remains. The number of UNION operations is thus at most
n – 1. Note also that since the MAKE–SET operations are included in the
total number of operations m, we have m ≥ n. We assume that the n
MAKE–SET operations are the first n operations performed.
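The three operations can be sketched in Python as follows. The unit does not prescribe a representation, so the parent-pointer forest used here is an assumption, as are the class name DisjointSets and the method names; no balancing or path-shortening heuristics are applied, so this shows the interface rather than an efficient implementation.

```python
class DisjointSets:
    """Each set is a tree of parent pointers; the root of the tree is the representative."""

    def __init__(self):
        self.parent = {}

    def make_set(self, x):
        # x must not already belong to some other set.
        if x in self.parent:
            raise ValueError(f"{x!r} is already in a set")
        self.parent[x] = x

    def find_set(self, x):
        # Follow parent pointers up to the representative.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx != ry:
            # The representative of Sx becomes the representative of the union.
            self.parent[ry] = rx


ds = DisjointSets()
for x in "abcd":
    ds.make_set(x)      # n = 4 MAKE-SET operations
ds.union("a", "b")
ds.union("c", "d")
ds.union("b", "d")      # after n - 1 = 3 unions a single set remains
print({ds.find_set(x) for x in "abcd"})  # prints {'a'}
```

Here the representative of the first argument's set is always kept, which is one of the choices the text allows for UNION.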
4.7.2 An application of disjoint-set data structures
One of the many applications of disjoint-set data structures arises in
determining the connected components of an undirected graph. Figure
4.4(a), for example, shows a graph with four connected components.
The procedure CONNECTED – COMPONENTS that follows uses the
disjoint-set operations to compute the connected components of a graph.
Once CONNECTED – COMPONENTS has been run as a preprocessing
step, the procedure SAME – COMPONENT answers queries about whether
two vertices are in the same connected component. (The set of vertices of
a graph G is denoted by V [G], and the set of edges is denoted by E[G].)
(a) [Graph on the vertices a, b, c, d, e, f, g, h, i, j]
Edge processed   Collection of disjoint sets
initial sets     {a} {b} {c} {d} {e} {f} {g} {h} {i} {j}
(b, d)           {a} {b, d} {c} {e} {f} {g} {h} {i} {j}
(e, g)           {a} {b, d} {c} {e, g} {f} {h} {i} {j}
(a, c)           {a, c} {b, d} {e, g} {f} {h} {i} {j}
(h, i)           {a, c} {b, d} {e, g} {f} {h, i} {j}
(a, b)           {a, b, c, d} {e, g} {f} {h, i} {j}
(e, f)           {a, b, c, d} {e, f, g} {h, i} {j}
(b, c)           {a, b, c, d} {e, f, g} {h, i} {j}
(b)
Figure 4.4 (a) A graph with four connected components: {a, b, c, d}, {e, f, g},
{h, i}, and {j}.
(b) The collection of disjoint sets after each edge is processed.
CONNECTED – COMPONENTS (G)
for each vertex v ∈ V[G]
    do MAKE – SET (v)
for each edge (u, v) ∈ E[G]
    do if FIND – SET (u) ≠ FIND – SET (v)
        then UNION (u, v)

SAME – COMPONENT (u, v)
if FIND – SET (u) = FIND – SET (v)
    then return TRUE
    else return FALSE
The procedure CONNECTED – COMPONENTS initially places each vertex
in its own set. Then, for each edge (u, v), it unites the sets containing u
and v. After all the edges are processed, two vertices are in the same
connected component if and only if the corresponding objects are in the
same set. Thus, CONNECTED – COMPONENTS computes sets in such a
way that the procedure SAME – COMPONENT can determine whether two
vertices are in the same connected component. Figure 4.4 (b) illustrates
how the disjoint sets are computed by CONNECTED – COMPONENTS.
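The two procedures can be tried out on the edges listed in Figure 4.4(b). The sketch below is a self-contained illustration: the function names and the simple parent-pointer representation of the sets are assumptions, not something this unit specifies.

```python
def connected_components(vertices, edges):
    """Process every edge as in CONNECTED-COMPONENTS and return a find_set function."""
    parent = {v: v for v in vertices}  # MAKE-SET for each vertex

    def find_set(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find_set(u), find_set(v)
        if ru != rv:         # unite only when u and v are in different sets
            parent[rv] = ru  # UNION
    return find_set


def same_component(find_set, u, v):
    return find_set(u) == find_set(v)


vertices = "abcdefghij"
edges = [("b", "d"), ("e", "g"), ("a", "c"), ("h", "i"),
         ("a", "b"), ("e", "f"), ("b", "c")]
find_set = connected_components(vertices, edges)

# Group the vertices by representative to list the components.
groups = {}
for v in vertices:
    groups.setdefault(find_set(v), []).append(v)
print(sorted(groups.values()))
# prints [['a', 'b', 'c', 'd'], ['e', 'f', 'g'], ['h', 'i'], ['j']]
```

The last edge (b, c) changes nothing, because both endpoints already have the same representative, which corresponds to the final row of the table in Figure 4.4(b).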
Self Assessment Questions
1. Briefly describe the properties of B-trees.
2. Explain the height of a B-tree.
3. What do you mean by binomial heaps?
4.8 Summary
In this unit we studied the properties of B-trees in depth. The concepts of binomial
trees and binomial heaps were discussed in a simple manner. Fibonacci
heaps and the concepts related to them were described with suitable
illustrations. Lastly, data structures for disjoint sets were studied.
4.9 Terminal Questions
1. Describe binomial trees.
2. Explain the structure of Fibonacci heaps.
3. Briefly explain Disjoint set operations.
4.10 Answers
Self Assessment Questions
1. Refer to Section 4.2.
2. Refer to Section 4.3.
3. Refer to Section 4.5.
Terminal Questions
1. Refer to Section 4.4.
2. Refer to Section 4.6.1.
3. Refer to Section 4.7.1.