
Analysis and Design of Algorithms Unit 4

Sikkim Manipal University Page No.: 180

Unit 4 B – Trees

Structure

4.1 Introduction

Objectives

4.2 Properties of B – Trees

4.3 The height of a B – Tree

4.4 Binomial trees

4.5 Binomial Heaps

4.6 Fibonacci Heaps

4.7 Data Structures for Disjoint Sets

4.8 Summary

4.9 Terminal Questions

4.10 Answers

4.1 Introduction

We know that for binary search trees and red-black trees, any “satellite

information” associated with a key is stored in the same node as the key. In

practice, one might actually store with each key just a pointer to another disk

page containing the satellite information for that key. The pseudo code in

this chapter implicitly assumes that the satellite information associated with

a key, or the pointer to such satellite information, travels with the key

whenever the key is moved from node to node. A common variant on a

B-tree, known as a B+-tree, stores all the satellite information in the

leaves and stores only keys and child pointers in the internal nodes, thus

maximizing the branching factor of the internal nodes.


Objectives

At the end of this unit the student should be able to:

Find the height of a B-tree.

Recognise a Fibonacci Heap

4.2 Properties of B – Trees

A B – tree T is a rooted tree (whose root is root [T]) having the following

properties:

1. Every node x has the following fields:

a. n [x], the number of keys currently stored in node x,

b. the n[x] keys themselves, stored in nondecreasing order, so that

$key_1[x] \le key_2[x] \le \cdots \le key_{n[x]}[x]$,

c. leaf [x], a Boolean value that is TRUE if x is a leaf and FALSE if x is

an internal node.

2. Each internal node x also contains n[x] + 1 pointers $c_1[x], c_2[x], \ldots, c_{n[x]+1}[x]$ to its children. Leaf nodes have no children, so their $c_i$ fields are undefined.

3. The keys $key_i[x]$ separate the ranges of keys stored in each subtree: if $k_i$ is any key stored in the subtree with root $c_i[x]$, then

$k_1 \le key_1[x] \le k_2 \le key_2[x] \le \cdots \le key_{n[x]}[x] \le k_{n[x]+1}$.

4. All leaves have the same depth, which is the tree’s height h.

5. There are lower and upper bounds on the number of keys a node can

contain. These bounds can be expressed in terms of a fixed integer $t \ge 2$

called the minimum degree of the B-tree:

a. Every node other than the root must have at least t – 1 keys. Every

internal node other than the root thus has at least t children. If the

tree is nonempty, the root must have at least one key.


b. Every node can contain at most 2t – 1 keys. Therefore, an internal

node can have at most 2t children. We say that a node is full if it

contains exactly 2t – 1 keys.

The simplest B – tree occurs when t = 2. Every internal node then

has either 2, 3, or 4 children, and we have a 2-3-4 tree. In practice,

however, much larger values of t are typically used.
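The five properties above can be checked mechanically. The following is a minimal sketch, not part of the original unit: the names `BTreeNode` and `check_btree` are hypothetical, keys are assumed to be integers, and the tree is assumed to be nonempty and held entirely in memory (no disk pages).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    # keys holds key_1[x] .. key_n[x]; children holds c_1[x] .. c_{n+1}[x].
    keys: List[int] = field(default_factory=list)
    children: List["BTreeNode"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        # leaf[x]: a node with no children is a leaf.
        return not self.children


def check_btree(root: BTreeNode, t: int) -> bool:
    """Return True if the nonempty tree at `root` satisfies properties 1-5."""
    leaf_depths = set()

    def visit(x, lo, hi, depth, is_root):
        n = len(x.keys)
        # Property 5: at most 2t - 1 keys; at least t - 1 (root: at least 1).
        if n > 2 * t - 1 or n < (1 if is_root else t - 1):
            return False
        # Properties 1b and 3: keys are nondecreasing and lie between the
        # separating keys inherited from the ancestors.
        bounded = [lo] + x.keys + [hi]
        if any(a > b for a, b in zip(bounded, bounded[1:])):
            return False
        if x.leaf:
            leaf_depths.add(depth)
            return True
        # Property 2: an internal node has n[x] + 1 children.
        if len(x.children) != n + 1:
            return False
        return all(
            visit(c, bounded[i], bounded[i + 1], depth + 1, False)
            for i, c in enumerate(x.children)
        )

    ok = visit(root, float("-inf"), float("inf"), 0, True)
    # Property 4: all leaves have the same depth.
    return ok and len(leaf_depths) == 1


# A small 2-3-4 tree (t = 2): one root key and two leaves.
example = BTreeNode([10], [BTreeNode([3, 5]), BTreeNode([12, 15, 20])])
print(check_btree(example, 2))
```

With t = 2 the right-hand leaf above is full (it holds 2t − 1 = 3 keys); adding a fourth key to it would make the check fail.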

4.3 The height of a B – tree

The number of disk accesses required for most operations on a B – tree is

proportional to the height of the B – tree. We now analyze the worst-case

height of a B – tree.

Theorem

If $n \ge 1$, then for any n-key B-tree T of height h and minimum degree $t \ge 2$,

$h \le \log_t \frac{n+1}{2}$.

Proof: If a B-tree has height h, the root contains at least one key and all other nodes contain at least t − 1 keys. Thus, there are at least 2 nodes at depth 1, at least 2t nodes at depth 2, at least $2t^2$ nodes at depth 3, and so on, until at depth h there are at least $2t^{h-1}$ nodes. Figure 4.1 illustrates such a tree for h = 3. Thus, the number n of keys satisfies the following inequality.

Figure 4.1: B-tree of height 3 containing a minimum possible number of keys. Shown inside each node x is n[x].

depth    number of nodes
0        1
1        2
2        2t
3        2t^2


$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\left(\frac{t^h - 1}{t-1}\right) = 2t^h - 1.$$

By simple algebra, we get $t^h \le (n+1)/2$. Taking base-t logarithms of both sides proves the theorem.
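As a numerical check of the bound, the sketch below evaluates the minimum key count both as the sum used in the proof and in closed form, and then finds the largest height a tree with n keys can have. The function names are illustrative and do not come from the unit; integer arithmetic is used so that no floating-point logarithm is needed.

```python
def min_keys_by_sum(t: int, h: int) -> int:
    """1 + (t - 1) * sum_{i=1..h} 2 t^(i-1): root key plus t - 1 keys per other node."""
    return 1 + (t - 1) * sum(2 * t ** (i - 1) for i in range(1, h + 1))


def min_keys(t: int, h: int) -> int:
    """Closed form of the same quantity: 2 t^h - 1."""
    return 2 * t ** h - 1


def max_height(n: int, t: int) -> int:
    """Largest h with 2 t^h - 1 <= n, i.e. floor(log_t((n + 1) / 2)), for n >= 1."""
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h


print(min_keys(2, 3))             # fewest keys in a height-3 tree with t = 2
print(max_height(10 ** 9, 1001))  # height bound for a billion keys, t = 1001
```

For t = 2 and h = 3 both forms give 15 keys. With t = 1001, a tree of height 3 needs at least 2·1001³ − 1 keys, which exceeds one billion, so a billion-key tree has height at most 2.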

Here we see the power of B – trees, as compared to red-black trees.

Although the height of the tree grows as O(lg n) in both cases (recall that t is a

constant), for B-trees the base of the logarithm can be many times larger.

Thus, B – trees save a factor of about lg t over red-black trees in the

number of nodes examined for most tree operations. Since examining an

arbitrary node in a tree usually requires a disk access, the number of disk

accesses is substantially reduced.

4.4 Binomial trees

A binomial heap is a collection of binomial trees, so this section starts by

defining binomial trees and proving some key properties. We then define

binomial heaps and show how they can be represented.

4.4.1 Binomial trees

The binomial tree $B_k$ is an ordered tree (a rooted tree in which the children of each node are ordered) defined recursively. As shown in Figure 4.2(a), the binomial tree $B_0$ consists of a single node. The binomial tree $B_k$ consists of two binomial trees $B_{k-1}$ that are linked together: the root of one is the leftmost child of the root of the other. Figure 4.2(b) shows the binomial trees $B_0$ through $B_4$.

Some properties of binomial trees are given by the following lemma.


4.4.2 (Properties of binomial trees)

For the binomial tree $B_k$,

1. there are $2^k$ nodes,

2. the height of the tree is k,

3. there are exactly $\binom{k}{i}$ nodes at depth i for i = 0, 1, ..., k, and

4. the root has degree k, which is greater than that of any other node; moreover, if the children of the root are numbered from left to right by k − 1, k − 2, ..., 0, child i is the root of a subtree $B_i$.

Proof: The proof is by induction on k. For each property, the basis is the binomial tree $B_0$. Verifying that each property holds for $B_0$ is trivial.

For the inductive step, we assume that the lemma holds for $B_{k-1}$.

1. Binomial tree $B_k$ consists of two copies of $B_{k-1}$, and so $B_k$ has $2^{k-1} + 2^{k-1} = 2^k$ nodes.

2. Because of the way in which the two copies of $B_{k-1}$ are linked to form $B_k$, the maximum depth of a node in $B_k$ is one greater than the maximum depth in $B_{k-1}$. By the inductive hypothesis, this maximum depth is (k − 1) + 1 = k.

3. Let D(k, i) be the number of nodes at depth i of binomial tree $B_k$. Since $B_k$ is composed of two copies of $B_{k-1}$ linked together, a node at depth i in $B_{k-1}$ appears in $B_k$ once at depth i and once at depth i + 1. In other words, the number of nodes at depth i in $B_k$ is the number of nodes at depth i in $B_{k-1}$ plus the number of nodes at depth i − 1 in $B_{k-1}$. Thus,

$$D(k, i) = D(k-1, i) + D(k-1, i-1) = \binom{k-1}{i} + \binom{k-1}{i-1} = \binom{k}{i},$$

where the second equality follows from the inductive hypothesis and the third from the standard identity for binomial coefficients (Pascal's rule).


Figure 4.2 (a) The recursive definition of the binomial tree Bk. Triangles represent rooted sub-trees. (b) The binomial trees B0 through B4. Node depths in B4 are shown. (c) Another way of looking at the binomial tree Bk.

4. The only node with greater degree in $B_k$ than in $B_{k-1}$ is the root, which has one more child than in $B_{k-1}$. Since the root of $B_{k-1}$ has degree k − 1, the root of $B_k$ has degree k. Now, by the inductive hypothesis, and as Figure 4.2(c) shows, from left to right, the children of the root of $B_{k-1}$ are roots of $B_{k-2}, B_{k-3}, \ldots, B_0$. When $B_{k-1}$ is linked to $B_{k-1}$, therefore, the children of the resulting root are roots of $B_{k-1}, B_{k-2}, \ldots, B_0$.

Note:

The maximum degree of any node in an n-node binomial tree is lg n.
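The four properties of the lemma can be confirmed experimentally for small k. The sketch below builds $B_k$ exactly as in the recursive definition (two copies of $B_{k-1}$, one root becoming the leftmost child of the other) and counts nodes level by level. The class and function names are assumptions made for this illustration.

```python
from math import comb


class Node:
    def __init__(self):
        self.children = []  # ordered from left to right


def binomial_tree(k: int) -> Node:
    """Build B_k by linking two copies of B_{k-1}."""
    if k == 0:
        return Node()
    a, b = binomial_tree(k - 1), binomial_tree(k - 1)
    a.children.insert(0, b)  # root of one copy becomes the leftmost child
    return a


def nodes_per_depth(root: Node) -> list:
    """Return [number of nodes at depth 0, at depth 1, ...]."""
    counts, level = [], [root]
    while level:
        counts.append(len(level))
        level = [c for x in level for c in x.children]
    return counts


for k in range(5):
    root = binomial_tree(k)
    counts = nodes_per_depth(root)
    # node count, height, nodes per depth, root degree
    print(k, sum(counts), len(counts) - 1, counts, len(root.children))
```

For k = 4 the per-depth counts come out as 1, 4, 6, 4, 1, the binomial coefficients $\binom{4}{i}$, and the total is $2^4 = 16$ nodes.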


4.5 Binomial Heaps

A binomial heap H is a set of binomial trees that satisfies the following

binomial heap properties.

1. Each binomial tree in H obeys the min-heap property: the key of a

node is greater than or equal to the key of its parent. We say that each

such tree is min-heap-ordered.

2. For any nonnegative integer k, there is at most one binomial tree in H

whose root has degree k.

The first property tells us that the root of a min-heap-ordered tree

contains the smallest key in the tree.

The second property implies that an n-node binomial heap H consists of at most $\lfloor \lg n \rfloor + 1$ binomial trees. To see why, observe that the binary representation of n has $\lfloor \lg n \rfloor + 1$ bits, say $\langle b_{\lfloor \lg n \rfloor}, b_{\lfloor \lg n \rfloor - 1}, \ldots, b_0 \rangle$, so that

$$n = \sum_{i=0}^{\lfloor \lg n \rfloor} b_i 2^i.$$

By property 1 of 4.4.2, therefore, binomial tree $B_i$ appears in H if and only if bit $b_i = 1$. Thus, binomial heap H contains at most $\lfloor \lg n \rfloor + 1$ binomial trees.
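The correspondence between the bits of n and the trees of the heap is easy to compute. This short sketch, with a hypothetical function name, lists the indices i for which $B_i$ appears in an n-node binomial heap; it relies on Python's `int.bit_length()`, which for n ≥ 1 equals $\lfloor \lg n \rfloor + 1$.

```python
def trees_in_binomial_heap(n: int) -> list:
    """Indices i such that B_i appears in an n-node binomial heap (bit b_i = 1)."""
    return [i for i in range(n.bit_length()) if (n >> i) & 1]


n = 13  # binary 1101
indices = trees_in_binomial_heap(n)
print(indices)                       # which B_i are present
print(sum(2 ** i for i in indices))  # their sizes add up to n
```

For n = 13 the heap consists of $B_0$, $B_2$ and $B_3$, with 1 + 4 + 8 = 13 nodes, which is within the bound of $\lfloor \lg 13 \rfloor + 1 = 4$ trees.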

4.6 Fibonacci Heaps

4.6.1 Structure of Fibonacci heaps

Like a binomial heap, a Fibonacci heap is a collection of min-heap-ordered

trees. The trees in a Fibonacci heap are not constrained to be binomial

trees, however. Figure 4.3(a) shows an example of a Fibonacci heap.

Unlike trees within binomial heaps, which are ordered, trees within

Fibonacci heaps are rooted but unordered. As Figure 4.3(b) shows, each

node x contains a pointer p [x] to its parent and a pointer child [x] to any one

of its children. The children of x are linked together in a circular, doubly

linked list, which we call the child list of x. Each child y in a child list has


pointers left [y] and right [y] that point to y’s left and right siblings,

respectively. If node y is an only child, then left [y] = right [y] = y. The order

in which siblings appear in a child list is arbitrary.

Figure 4.3 (a) A Fibonacci heap consisting of five min-heap-ordered trees and 14 nodes. The dashed line indicates the root list. The minimum node of the heap is the node containing the key 3. The three marked nodes are blackened. The potential of this particular Fibonacci heap is 5 + 2·3 = 11. (b) A more complete representation showing pointers p (up arrows), child (down arrows), and left and right (sideways arrows).

Two other fields in each node will be of use. The number of children in the

child list of node x is stored in degree[x]. The Boolean-valued field mark[x]

indicates whether node x has lost a child since the last time x was made the

child of another node. Newly created nodes are unmarked, and a node x

becomes unmarked whenever it is made the child of another node.

A given Fibonacci heap H is accessed by a pointer min [H] to the root of a

tree containing a minimum key; this node is called the minimum node of

the Fibonacci heap. If a Fibonacci heap H is empty, then min [H] = NIL.


The roots of all the trees in a Fibonacci heap are linked together using their

left and right pointers into a circular, doubly linked list called the root list of

the Fibonacci heap. The pointer min [H] thus points to the node in the root

list whose key is minimum. The order of the trees within a root list is

arbitrary.

We rely on one other attribute for a Fibonacci heap H : the number of nodes

currently in H is kept in n[H].
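The node and heap fields described in this section map directly onto a small class definition. The sketch below is illustrative only: the class names are assumptions, and the `insert` method is not described in this unit; it is included, following the usual approach of splicing a new node into the root list, purely so that the root list can be populated and traversed.

```python
class FibNode:
    def __init__(self, key):
        self.key = key
        self.p = None       # p[x]: parent
        self.child = None   # child[x]: any one of the children
        self.left = self    # left[x] and right[x]: circular, doubly linked
        self.right = self   # list; an only node points to itself
        self.degree = 0     # degree[x]: number of children
        self.mark = False   # mark[x]: newly created nodes are unmarked


class FibHeap:
    def __init__(self):
        self.min = None     # min[H]: NIL (None) when the heap is empty
        self.n = 0          # n[H]: number of nodes

    def insert(self, key):
        # Assumed operation: add a one-node tree to the root list.
        x = FibNode(key)
        if self.min is None:
            self.min = x
        else:
            # Splice x into the root list just to the left of min[H].
            x.right = self.min
            x.left = self.min.left
            self.min.left.right = x
            self.min.left = x
            if x.key < self.min.key:
                self.min = x
        self.n += 1
        return x

    def root_keys(self):
        """Walk the circular root list once, starting at min[H]."""
        keys, x = [], self.min
        if x is not None:
            while True:
                keys.append(x.key)
                x = x.right
                if x is self.min:
                    break
        return keys


H = FibHeap()
for key in [23, 7, 3, 17, 24]:
    H.insert(key)
print(H.min.key, H.n, H.root_keys())
```

Because the list is circular, the traversal stops when it returns to its starting node rather than when it meets a null pointer.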

4.6.2 Potential function

For a given Fibonacci heap H, we indicate by t(H) the number of trees in the root list of H and by m(H) the number of marked nodes in H. The potential of Fibonacci heap H is then defined by

$$\Phi(H) = t(H) + 2\,m(H) \qquad \text{(a)}$$

For example, the potential of the Fibonacci heap shown in Figure 4.3 is

5 + 2·3 = 11. The potential of a set of Fibonacci heaps is the sum of the

potentials of its constituent Fibonacci heaps. We shall assume that a unit of

potential can pay for a constant amount of work, where the constant is

sufficiently large to cover the cost of any of the specific constant-time pieces

of work that we might encounter.

We assume that a Fibonacci heap application begins with no heaps. The

initial potential, therefore, is 0, and by equation (a), the potential is

nonnegative at all subsequent times.
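Equation (a) can be evaluated on a concrete heap. In the sketch below each tree is stored as a simple nested structure rather than with the pointer fields of Section 4.6.1; the `Tree` class and the particular keys are assumptions chosen only to match the counts in the caption of Figure 4.3 (five trees, 14 nodes, three marked nodes).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    key: int
    mark: bool = False
    children: List["Tree"] = field(default_factory=list)


def count_nodes(x: Tree) -> int:
    return 1 + sum(count_nodes(c) for c in x.children)


def count_marked(x: Tree) -> int:
    return int(x.mark) + sum(count_marked(c) for c in x.children)


def potential(root_list: List[Tree]) -> int:
    """Phi(H) = t(H) + 2 m(H)."""
    t = len(root_list)                             # trees in the root list
    m = sum(count_marked(r) for r in root_list)    # marked nodes in the heap
    return t + 2 * m


# Illustrative heap: five roots, three marked nodes (keys 18, 39 and 26).
roots = [
    Tree(23),
    Tree(7),
    Tree(3, children=[
        Tree(18, mark=True, children=[Tree(39, mark=True)]),
        Tree(52),
        Tree(38, children=[Tree(41)]),
    ]),
    Tree(17, children=[Tree(30)]),
    Tree(24, children=[Tree(26, mark=True, children=[Tree(35)]), Tree(46)]),
]
print(sum(count_nodes(r) for r in roots), potential(roots))
```

An empty root list gives potential 0, and since t(H) and m(H) are counts, the value can never be negative.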

4.6.3 Maximum degree

The amortized analyses we shall perform in the remaining sections of this

unit assume that there is a known upper bound D(n) on the maximum

degree of any node in an n-node Fibonacci heap.


4.7 Data Structures for Disjoint Sets

4.7.1 Disjoint-set operations

A disjoint-set data structure maintains a collection $S = \{S_1, S_2, \ldots, S_k\}$ of disjoint dynamic sets. Each set is identified by a representative, which is some member of the set. In some applications, it doesn't matter which member is used as the representative; in other applications, there may be a prespecified rule for choosing the representative, such as choosing the smallest member in the set.

As in the other dynamic-set implementations we have studied, each element

of a set is represented by an object. Letting x denote an object, we wish to

support the following operations:

MAKE-SET(x) creates a new set whose only member (and thus representative) is x. Since the sets are disjoint, we require that x not already be in some other set.

UNION(x, y) unites the dynamic sets that contain x and y, say $S_x$ and $S_y$, into a new set that is the union of these two sets. The two sets are assumed to be disjoint prior to the operation. The representative of the resulting set is any member of $S_x \cup S_y$, although many implementations of UNION specifically choose the representative of either $S_x$ or $S_y$ as the new representative. Since we require the sets in the collection to be disjoint, we "destroy" sets $S_x$ and $S_y$, removing them from the collection S.

FIND-SET(x) returns a pointer to the representative of the (unique) set containing x.

Throughout this unit, we shall analyze the running times of disjoint-set data

structures in terms of two parameters: n, the number of MAKE–SET

operations, and m, the total number of MAKE–SET, UNION, and

FIND – SET operations. Since the sets are disjoint, each UNION operation


reduces the number of sets by one. After n –1 UNION operations, therefore,

only one set remains. The number of UNION operations is thus at most

n − 1. Note also that since the MAKE-SET operations are included in the total number of operations m, we have $m \ge n$. We assume that the n MAKE-SET operations are the first n operations performed.
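The three operations can be sketched in a few lines. The unit does not prescribe a representation at this point, so the parent-pointer dictionary used below is an assumption, as are the class and method names; no balancing or path-compression heuristics are applied.

```python
class DisjointSets:
    def __init__(self):
        self.parent = {}  # assumed representation: each object points to a parent

    def make_set(self, x):
        # x must not already belong to some other set.
        if x in self.parent:
            raise ValueError(f"{x!r} is already in a set")
        self.parent[x] = x  # x is its own representative

    def find_set(self, x):
        # Follow parent pointers up to the representative.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        # The representative of S_x becomes the representative of the union.
        rx, ry = self.find_set(x), self.find_set(y)
        if rx != ry:
            self.parent[ry] = rx

    def count_sets(self):
        return sum(1 for x, p in self.parent.items() if x == p)


ds = DisjointSets()
for x in range(5):           # n = 5 MAKE-SET operations come first
    ds.make_set(x)
for x in range(4):           # n - 1 = 4 UNION operations
    ds.union(x, x + 1)
print(ds.count_sets())       # only one set remains
```

Each UNION of two distinct sets lowers the set count by one, which is why at most n − 1 such operations are possible after n MAKE-SET operations.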

4.7.2 An application of disjoint-set data structures

One of the many applications of disjoint-set data structures arises in

determining the connected components of an undirected graph. Figure

4.4(a), for example, shows a graph with four connected components.

The procedure CONNECTED – COMPONENTS that follows uses the

disjoint-set operations to compute the connected components of a graph.

Once CONNECTED – COMPONENTS has been run as a preprocessing

step, the procedure SAME-COMPONENT answers queries about whether

two vertices are in the same connected component. (The set of vertices of

a graph G is denoted by V[G], and the set of edges is denoted by E[G].)


Edge processed    Collection of disjoint sets
initial sets      {a} {b} {c} {d} {e} {f} {g} {h} {i} {j}
(b, d)            {a} {b, d} {c} {e} {f} {g} {h} {i} {j}
(e, g)            {a} {b, d} {c} {e, g} {f} {h} {i} {j}
(a, c)            {a, c} {b, d} {e, g} {f} {h} {i} {j}
(h, i)            {a, c} {b, d} {e, g} {f} {h, i} {j}
(a, b)            {a, b, c, d} {e, g} {f} {h, i} {j}
(e, f)            {a, b, c, d} {e, f, g} {h, i} {j}
(b, c)            {a, b, c, d} {e, f, g} {h, i} {j}

(b)

Figure 4.4 (a) A graph with four connected components: {a, b, c, d}, {e, f, g}, {h, i} and {j}. (b) The collection of disjoint sets after each edge is processed.

CONNECTED-COMPONENTS(G)

    for each vertex v ∈ V[G]

        do MAKE-SET(v)

    for each edge (u, v) ∈ E[G]

        do if FIND-SET(u) ≠ FIND-SET(v)

            then UNION(u, v)

SAME-COMPONENT(u, v)

    if FIND-SET(u) = FIND-SET(v)

        then return TRUE

        else return FALSE

The procedure CONNECTED-COMPONENTS initially places each vertex v in its own set. Then, for each edge (u, v), it unites the sets containing u and v. After all the edges are processed, two vertices are in the same connected component if and only if the corresponding objects are in the same set. Thus, CONNECTED-COMPONENTS computes sets in such a way that the procedure SAME-COMPONENT can determine whether two vertices are in the same connected component. Figure 4.4(b) illustrates how the disjoint sets are computed by CONNECTED-COMPONENTS.
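The two procedures translate almost line for line into runnable code. The sketch below uses the vertices and edge order listed in Figure 4.4(b); the parent-pointer dictionary standing in for the disjoint-set structure, and the choice to return the `find_set` function for later queries, are assumptions made for this illustration.

```python
def connected_components(vertices, edges):
    """Run CONNECTED-COMPONENTS and return a FIND-SET function for queries."""
    parent = {v: v for v in vertices}      # MAKE-SET(v) for each vertex v

    def find_set(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find_set(u), find_set(v)
        if ru != rv:                       # FIND-SET(u) != FIND-SET(v)
            parent[rv] = ru                # UNION(u, v)
    return find_set


def same_component(find_set, u, v):
    return find_set(u) == find_set(v)


vertices = "abcdefghij"
edges = [("b", "d"), ("e", "g"), ("a", "c"), ("h", "i"),
         ("a", "b"), ("e", "f"), ("b", "c")]
find_set = connected_components(vertices, edges)

# Group the vertices by representative to list the components.
components = {}
for v in vertices:
    components.setdefault(find_set(v), set()).add(v)
print(sorted(sorted(c) for c in components.values()))
print(same_component(find_set, "a", "d"), same_component(find_set, "f", "h"))
```

The last edge, (b, c), causes no UNION because b and c already share a representative, matching the unchanged final row of the table in Figure 4.4(b).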

Self Assessment Questions

1. Briefly describe the properties of B-trees.

2. Explain the height of a B-tree.

3. What is meant by binomial heaps?

4.8 Summary

In this unit we studied the properties of B-trees in depth. The concepts of binomial trees and binomial heaps were discussed in a simple manner. Fibonacci heaps and the concepts related to them were described with suitable illustrations. Lastly, data structures for disjoint sets were studied.

4.9 Terminal Questions

1. Describe the binomial trees

2. Explain the structure of Fibonacci heaps

3. Briefly explain Disjoint set operations.

4.10 Answers

Self Assessment Questions

1. Refer to Section 4.2.

2. Refer to Section 4.3.

3. Refer to Section 4.5

Terminal Questions

1. Refer to Section 4.4

2. Refer to Section 4.6.1

3. Refer to Section 4.7.1