Page 1: External Sorting and Searching

1

External Sorting and Searching

B-Trees, etc.

m-Way Search Trees

In a binary search tree, there is one key value per node and two children.

There is no reason why I couldn’t have (at most) m-1 key values per node and m children.

Such trees are called m-way search trees.
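Here is a minimal Python sketch of an m-way search tree node and a search over it. It assumes a node keeps its key values in a sorted list `keys` (at most m-1 of them) and its subtrees in a list `children`, which is empty for a terminal node. `Node` and `search` are hypothetical names, not from the slides.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # sorted, at most m-1 key values
    children: list = field(default_factory=list)  # [] or len(keys) + 1 subtrees


def search(node, key):
    """Return True if key is stored in the m-way search tree rooted at node."""
    while node is not None:
        # Binary search within the node: i = number of keys smaller than key
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        # Descend into child i, which holds the keys between keys[i-1] and keys[i]
        node = node.children[i] if node.children else None
    return False


# A 3-way search tree: root 120, 240 with children 97 | 200 | 360, 440
tree = Node([120, 240], [Node([97]), Node([200]), Node([360, 440])])
print(search(tree, 440), search(tree, 300))  # True False
```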

m-Way Search Tree Example

Here is a 3-way search tree; each node has a maximum of 3 children.

Root [120, 240] → [97], [200], [360, 440]

m-Way Search Tree Example II

Here is another one.

Nodes: [120, 240], [360, 440], [97], [500]

m-Way Time Complexity

Clearly, the search and insert time for an m-way search tree is still O(n) in the worst case, because nothing keeps the tree balanced. The number of nodes visited is O(n/m), and for each node we must look at up to m-1 values. Using binary search within a node, we could search each node in O(log2(m)) time, yielding at best O((n/m) * log2(m)). Of course, as n gets much larger than m, this is still O(n).
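Here is a hedged sketch of why an unbalanced m-way search tree degrades to O(n). It uses a hypothetical naive insertion rule that does not come from the slides: a key goes into a terminal node if that node has room (fewer than m-1 keys), and otherwise it is pushed down into the appropriate child. When keys arrive in sorted order, they form a chain of full nodes whose height is about n/(m-1).

```python
from bisect import bisect_left


class Node:
    def __init__(self, key):
        self.keys = [key]   # sorted, at most m-1 key values
        self.children = {}  # child index -> Node; a missing index is an empty subtree


def insert(node, key, m):
    """Naive, non-rebalancing insert into an m-way search tree; returns the subtree root."""
    if node is None:
        return Node(key)
    i = bisect_left(node.keys, key)
    if not node.children and len(node.keys) < m - 1:
        node.keys.insert(i, key)  # room left in this terminal node
    else:
        # Otherwise the key belongs in subtree i (between keys[i-1] and keys[i])
        node.children[i] = insert(node.children.get(i), key, m)
    return node


def height(node):
    """Number of levels in the tree."""
    if node is None:
        return 0
    return 1 + max((height(c) for c in node.children.values()), default=0)


for m in (3, 5):
    root = None
    for k in range(1, 13):  # 12 keys arriving in sorted order
        root = insert(root, k, m)
    print(m, height(root))  # prints "3 6", then "5 3"
```

This rule does nothing to prevent the chain. Height balancing is what rules it out.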

B-Trees

What I want is a height-balanced m-way search tree to achieve the best search time.

These are called B-Trees. As with height-balanced BSTs, we will have a re-balancing algorithm to run after every insert and delete.

B-Tree Properties

The root may have between 2 and m children.

All other nodes must have between ⌈m/2⌉ and m children.

A node that has k children will have k-1 key values.

Thus, the root may have only 2 children; all other nodes must be at least half full.
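The size rules can be checked mechanically. This is a hedged sketch using a hypothetical `Node` with a sorted `keys` list and a `children` list that is empty for terminal nodes. It reads the lower bound as ⌈m/2⌉. As an assumption, it counts a terminal node as having len(keys) + 1 empty subtrees.

```python
from dataclasses import dataclass, field
from math import ceil


@dataclass
class Node:
    keys: list                                    # sorted key values
    children: list = field(default_factory=list)  # [] for a terminal node


def sizes_ok(node, m, is_root=True):
    """Check the child-count rules of a B-tree of order m at every node."""
    k = len(node.keys) + 1  # a node with k-1 key values has k children (possibly empty)
    if node.children and len(node.children) != k:
        return False
    low = 2 if is_root else ceil(m / 2)  # root: 2..m children; others: ceil(m/2)..m
    if not low <= k <= m:
        return False
    return all(sizes_ok(child, m, is_root=False) for child in node.children)


ok = Node([240], [Node([120], [Node([97]), Node([200])]),
                  Node([360], [Node([280]), Node([440])])])
too_full = Node([240], [Node([97, 120, 200]), Node([360])])
print(sizes_ok(ok, 3), sizes_ok(too_full, 3))  # True False
```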

B-Tree Properties II

If a B-Tree node has k children ($T_0, T_1, \ldots, T_{k-1}$) and k-1 ordered key values ($D_1, D_2, \ldots, D_{k-1}$), then all the key values in $T_i$ are greater than $D_i$ but less than $D_{i+1}$, for $i = 1, \ldots, k-2$.

All the key values in $T_0$ are less than $D_1$.

All the key values in $T_{k-1}$ are greater than $D_{k-1}$.

This simply means it is a search tree.
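Here is one way to check the ordering rules, sketched with the same kind of hypothetical `Node` and assuming distinct keys. List the keys in the order T0, D1, T1, ..., D(k-1), T(k-1). For a tree with distinct keys, the rules hold exactly when this sequence is strictly increasing.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # D1 .. D(k-1), sorted
    children: list = field(default_factory=list)  # T0 .. T(k-1); [] for a terminal node


def in_order(node):
    """Yield keys in search-tree order: T0, D1, T1, D2, ..., D(k-1), T(k-1)."""
    for i, key in enumerate(node.keys):
        if node.children:
            yield from in_order(node.children[i])
        yield key
    if node.children:
        yield from in_order(node.children[-1])


def is_search_tree(node):
    """True when every key in Ti lies between Di and D(i+1)."""
    keys = list(in_order(node))
    return all(a < b for a, b in zip(keys, keys[1:]))


good = Node([120, 240], [Node([97]), Node([200]), Node([360, 440])])
bad = Node([120, 240], [Node([97]), Node([300]), Node([360, 440])])  # 300 > 240
print(is_search_tree(good), is_search_tree(bad))  # True False
```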

B-Tree Insertion

All insertions are done at the terminal level.

First, search for the terminal-level node that the new key value belongs in, and insert the key value there.

If the number of children of this node does not exceed m, stop.

If the number of children does exceed m...

B-Tree Node Splitting

Split this node into two nodes:

  • Take the middle value out.
  • Create one node with the lower half of the key values and one with the upper half.
  • Insert the middle value into the parent node.
  • Continue recursively until either the parent node can hold the new key value, or you split the root (which creates a new root one level up).
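The insertion and splitting steps can be sketched in Python. This is a minimal sketch, not the slides' own code, and it rests on these assumptions:

  • `Node`, `insert`, `_insert`, and `shape` are hypothetical names.
  • A node overflows when it reaches m key values, i.e. m+1 children.
  • With an even number of keys, the index len(keys) // 2 picks the upper of the two middle values.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # [] for a terminal node


def insert(root, key, m=3):
    """Insert key into the B-tree of order m rooted at root; return the (possibly new) root."""
    split = _insert(root, key, m)
    if split is None:
        return root
    middle, right = split
    return Node([middle], [root, right])  # the root was split: grow a new root


def _insert(node, key, m):
    """Insert below node; return (middle_key, right_half) if node had to split."""
    if not node.children:
        insort(node.keys, key)  # all insertions happen at the terminal level
    else:
        i = bisect_left(node.keys, key)
        split = _insert(node.children[i], key, m)
        if split is None:
            return None
        middle, right = split
        node.keys.insert(i, middle)  # the child's middle value moves up into this node
        node.children.insert(i + 1, right)
    if len(node.keys) < m:  # at most m-1 key values, i.e. at most m children
        return None
    mid = len(node.keys) // 2  # take the middle value out
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])  # upper half
    middle = node.keys[mid]
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]  # lower half
    return middle, right


def shape(node):
    """Nested (keys, [child shapes]) view of a tree."""
    return (node.keys, [shape(c) for c in node.children])


root = Node()
for k in (120, 360, 240, 200, 97, 440, 280):
    root = insert(root, k)
print(shape(root))
```

Inserting 120, 360, 240, 200, 97, 440, 280 into an empty tree of order 3 this way gives the following final tree:

  • 240 in the root.
  • 120 and 360 on level 2.
  • 97 and 200 under 120, and 280 and 440 under 360.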

B-Tree Insert Example

A B-Tree of order 3 (i.e. m=3) is the smallest possible.

It is also the easiest to draw, so we’ll use this order for our example.

This is also called a “2-3 Tree” because each internal node has either 2 or 3 children (and every node holds 1 or 2 key values).

B-Tree Example

Insert 120. A new root node is created and this value is placed into it.

120

Key values left to insert: 360, 240, 200, 97, 440, 280

B-Tree Example

Insert 360. It goes into the root. No further action is required.

120, 360

Key values left to insert: 240, 200, 97, 440, 280

B-Tree Example

Insert 240. It goes into the root. Since this node has 3 values, it must be split.

120, 240, 360

Key values left to insert: 200, 97, 440, 280

B-Tree Example

This shows the result of the split. 120 and 360 go into nodes by themselves, and 240 is placed into a new root node.

Root [240] → [120], [360]

Key values left to insert: 200, 97, 440, 280

B-Tree Example

Insert value 200. It goes into the node with 120. No further action is required.

Root [240] → [120, 200], [360]

Key values left to insert: 97, 440, 280

B-Tree Example

Insert value 97. It goes into the node with 120 and 200. Since this node contains too many values, it must be split.

Root [240] → [97, 120, 200], [360]

Key values left to insert: 440, 280

B-Tree Example

This shows the result of the split. 97 and 200 are placed into their own nodes, and 120 is moved up to the parent. The parent node is OK.

Root [120, 240] → [97], [200], [360]

Key values left to insert: 440, 280

B-Tree Example

Insert 440. It goes into the node with 360. No further action is required.

Root [120, 240] → [97], [200], [360, 440]

Key values left to insert: 280

B-Tree Example

Insert the value 280. It goes into the node with 360 and 440. Since this node has 3 values, it must be split.

Root [120, 240] → [97], [200], [280, 360, 440]

Key values left to insert: none (done)

B-Tree Example

This shows the result of the split. 280 and 440 go into nodes by themselves, and 360 is moved up to the parent node.

Root [120, 240, 360] → [97], [200], [280], [440]

B-Tree Example

The parent node must be split as well. Because it is the root, we must create a new root node.

Root [240] → [120], [360]; [120] → [97], [200]; [360] → [280], [440]

Time Complexity

What is the order of a B-tree search? To answer this, we need to determine the worst-case number of levels in a B-Tree of order m that has N key values.

Let's look at the minimum number of nodes per level: the root (level 1) has 1 node; level 2 has at least 2 nodes; level 3 has at least $2\lceil m/2\rceil$ nodes; level 4 has at least $2\lceil m/2\rceil^2$ nodes; in general, level L has at least $2\lceil m/2\rceil^{L-2}$ nodes.

Time Complexity II

Observation: in any list of N elements, there are N+1 ways for a search to fail.

In a B-tree, all the ways to fail are at level L+1 (these are sometimes called failure nodes).

This gives a relationship between the number of key values and the height of the tree.

Time Complexity III

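Because the previous analysis is a worst case (the fewest possible nodes per level), level L+1 holds at least $2\lceil m/2\rceil^{L-1}$ failure nodes, and this must be less than or equal to N+1:

$$2\lceil m/2\rceil^{L-1} \le N+1$$

$$\lceil m/2\rceil^{L-1} \le \frac{N+1}{2}$$

$$L-1 \le \log_{\lceil m/2\rceil}\frac{N+1}{2}$$

$$L \le \log_{\lceil m/2\rceil}\frac{N+1}{2} + 1$$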
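To make the bound concrete, here is a small sketch with hypothetical function names. It computes the minimum number of nodes per level from the nodes-per-level analysis. It also computes the resulting minimum number of key values in a tree with L levels, which is the inequality rearranged as N ≥ 2⌈m/2⌉^(L-1) - 1.

```python
from math import ceil


def min_nodes_at_level(m, level):
    """Fewest nodes a B-tree of order m can have on a level (the root is level 1)."""
    if level == 1:
        return 1
    return 2 * ceil(m / 2) ** (level - 2)


def min_keys(m, levels):
    """Fewest key values a B-tree of order m with the given number of levels can hold.

    The N+1 failure nodes sit on level L+1, so N + 1 >= min_nodes_at_level(m, L + 1),
    which is the inequality 2 * ceil(m/2)**(L-1) <= N + 1.
    """
    return min_nodes_at_level(m, levels + 1) - 1


print([min_nodes_at_level(3, lv) for lv in range(1, 5)])  # [1, 2, 4, 8]
print([min_keys(3, lv) for lv in range(1, 5)])            # [1, 3, 7, 15]
```

Read the other way, a 2-3 tree holding 7 key values has at most 3 levels, because 4 levels would need at least 15.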

Time Complexity IV

One node at each level must be accessed, so L gives the number of nodes to access.

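In this worst case, each node contains $\lceil m/2\rceil - 1$ key values. Using binary search within each node, the total number of comparisons is about

$$\left(\log_{\lceil m/2\rceil}\frac{N+1}{2} + 1\right) \cdot \log_2\left(\lceil m/2\rceil - 1\right)$$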

Fun With Math

Removing the constants, we may say this search is

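$$O\left(\log_{\lceil m/2\rceil} N \cdot \log_2\lceil m/2\rceil\right) = O\left(\frac{\log_2 N}{\log_2\lceil m/2\rceil} \cdot \log_2\lceil m/2\rceil\right) = O(\log_2 N)$$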

Summing it up:

WHAT??? ALL THIS WORK FOR THE SAME ORDER AS AN AVL-TREE!!!

What’s going on here???

What Really Happens

Remember that this is external sorting and searching, so accessing the information (reading from disk) and doing comparisons (in memory) have very different costs.

Each node in the B-tree is stored in a “block” on the disk; a “block” is the minimum amount of information which can be retrieved with one disk access.

What Really Happens II

Thus, the number of disk accesses is the bottleneck, and this is given by L, the number of levels.
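A rough sketch shows why L matters. It compares the worst-case number of levels in a B-tree (one disk access per level, assuming one node per block) with the minimum height of any binary search tree on the same keys. The order m = 200 and the key count below are hypothetical values chosen for illustration, not figures from the slides.

```python
from math import ceil


def btree_max_levels(n, m):
    """Worst-case number of levels (disk accesses) for a B-tree of order m holding n keys."""
    c = ceil(m / 2)
    levels = 1
    while 2 * c ** levels <= n + 1:  # one more level is possible only if 2*c**L <= n+1
        levels += 1
    return levels


def bst_min_levels(n):
    """Even a perfectly balanced binary search tree with n keys needs floor(log2 n) + 1 levels."""
    return n.bit_length()


n = 1_000_000
print(bst_min_levels(n), btree_max_levels(n, 200))  # 20 3
```

Both searches are O(log N) in comparisons. The difference that matters on disk is about 3 block reads versus about 20.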

A B-tree is built on a field of a data file to speed access to that field.

A “Clustered” or “Primary” B-tree stores the entire record of the file in the B-Tree.

An “Unclustered” or “Secondary” B-tree stores the field’s value and the record number in the node.

What Really Happens III

It is the secondary B-trees that one usually means when one says “B-tree”.

Thus, to search for a record on a field that has a B-tree:

  • Search the B-tree for the key value.
  • When found, retrieve its associated record number.
  • Retrieve that record from the data file.

A Real Example.

What follows is a real example of how a B-tree is used.

Sample Data File

Course   Teacher       Schedule#
CS 470   Prof. Green   23
CS 471   Prof. Green   45
CS 472   Prof. Green   46
CS 473   Prof. Smith   100
CS 474   Prof. Smith   110
CS 475   Prof. Smith   120
CS 476   Prof. Green   140
CS 477   Prof. Green   210

B-Tree on Schedule#

This is the way we would normally view it:

Root [100] → [45], [120]; [45] → [23], [46]; [120] → [110], [140, 210]

B-Tree on Schedule#

This is how it really looks in a file:

Rec#  Child Ptr 1  Key value 1  Data Ptr 1  Child Ptr 2  Key value 2  Data Ptr 2  Child Ptr 3
1     2            100          4           6            0            0           0
2     3            45           2           4            0            0           0
3     0            23           1           0            0            0           0
4     0            46           3           0            0            0           0
5     0            110          5           0            0            0           0
6     5            120          6           7            0            0           0
7     0            140          7           0            210          8           0
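The three-step lookup for a secondary B-tree can be sketched directly against these two files: search the B-tree, take the record number, then fetch the record. The dictionaries below are transcribed from the tables. The sketch makes three assumptions, all consistent with the table's values:

  • A 0 means an empty pointer or an unused key slot.
  • Node record 1 is the root.
  • Data-file record numbers are 1-based positions in the sample data file.

`lookup` is a hypothetical name.

```python
# Node file, one row per disk block:
# rec# -> (child1, key1, data1, child2, key2, data2, child3); 0 means "empty"
NODE_FILE = {
    1: (2, 100, 4, 6, 0, 0, 0),
    2: (3, 45, 2, 4, 0, 0, 0),
    3: (0, 23, 1, 0, 0, 0, 0),
    4: (0, 46, 3, 0, 0, 0, 0),
    5: (0, 110, 5, 0, 0, 0, 0),
    6: (5, 120, 6, 7, 0, 0, 0),
    7: (0, 140, 7, 0, 210, 8, 0),
}

# Data file: record# -> (course, teacher, schedule#)
DATA_FILE = {
    1: ("CS 470", "Prof. Green", 23),
    2: ("CS 471", "Prof. Green", 45),
    3: ("CS 472", "Prof. Green", 46),
    4: ("CS 473", "Prof. Smith", 100),
    5: ("CS 474", "Prof. Smith", 110),
    6: ("CS 475", "Prof. Smith", 120),
    7: ("CS 476", "Prof. Green", 140),
    8: ("CS 477", "Prof. Green", 210),
}

ROOT = 1  # assumption: the root is node record 1


def lookup(schedule_no):
    """Return (data-file record, number of node reads), or (None, reads) if absent."""
    rec, reads = ROOT, 0
    while rec != 0:
        c1, k1, d1, c2, k2, d2, c3 = NODE_FILE[rec]  # one disk access per node
        reads += 1
        if schedule_no == k1:
            return DATA_FILE[d1], reads  # found: follow the data pointer
        if k2 != 0 and schedule_no == k2:
            return DATA_FILE[d2], reads
        if schedule_no < k1:
            rec = c1
        elif k2 == 0 or schedule_no < k2:
            rec = c2
        else:
            rec = c3
    return None, reads


print(lookup(140))  # (('CS 476', 'Prof. Green', 140), 3)
```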

Deleting in a B-tree

To delete from a B-Tree, first locate the key value with the normal search routine.

If the key value is not located in a terminal node, replace it with its in-order successor and delete the in-order successor instead.

Thus, all deletes which reduce the number of key values occur at the terminal level.

Deleting From the Terminal Level

Good news: because there are no children to worry about, we can just remove the key value from the node.

Bad news: what if this removal reduces the number of children below ⌈m/2⌉?

Reality: at some point we will need to reduce the number of nodes...

The “Borrow” Algorithm

When a node is reduced below ⌈m/2⌉ children, first try to borrow a key value from one of its neighbors.

If a neighbor has more than the minimum, rotate the appropriate key from that neighbor up to the parent, and the appropriate key from the parent down to the reduced child.

Borrow Example

Suppose I want to delete 200 from this B-tree of order 3.

To do so, rotate 240 down into the middle child, and 360 up to the root:

Root [120, 240] → [97], [200], [360, 440]

Borrow Example

This shows the result. Problem: what if I now want to delete 240? Borrowing won’t work...

Root [120, 360] → [97], [240], [440]

Combining Nodes

When borrowing won’t work, combine the node with the separating key value from the parent AND a neighbor node that has the minimum number of children.

Repeat the deletion algorithm from the parent, looking first to borrow if possible.

Now, let’s delete 240...

Combining Example

First, remove 240.

Root [120, 360] → [97], [240], [440]

Combining Example

Next, attempt to borrow. Borrowing fails. Combine the empty node with 360 and 440.

Root [120, 360] → [97], <empty>, [440]

Combining Example

This shows the result. The parent is OK, so we are done...

Root [120] → [97], [360, 440]

A Larger Example

Delete 280. This is a “borrow” case:

Root [260] → [120, 180], [360]; [120, 180] → [97], [150], [200]; [360] → [280], [440, 500]

A Larger Example

Delete 360. This is a “combine” case:

Root [260] → [120, 180], [440]; [120, 180] → [97], [150], [200]; [440] → [360], [500]

A Larger Example

First, remove 360...

Root [260] → [120, 180], [440]; [120, 180] → [97], [150], [200]; [440] → <empty>, [500]

A Larger Example

Next, combine the node with its neighbor (500) and 440 from the parent...

Root [260] → [120, 180], [440]; [120, 180] → [97], [150], [200]; [440] → <empty>, [500]

A Larger Example

The parent now has a problem... This is a borrow case:

Root [260] → [120, 180], <empty>; [120, 180] → [97], [150], [200]; <empty> → [440, 500]

A Larger Example

Children must now be considered. What do I do with the node with 200?

Root [180] → [120], [260]; [120] → [97], [150]; [260] → [440, 500]; the node [200] still needs a parent

A Larger Example

Link it under 260. Now, delete 97...

Root [180] → [120], [260]; [120] → [97], [150]; [260] → [200], [440, 500]

A Larger Example

This is a combine case, so bring 120 down and combine with 150...

Root [180] → [120], [260]; [120] → <empty>, [150]; [260] → [200], [440, 500]

A Larger Example

The parent now has a problem. This is a combine case:

Root [180] → <empty>, [260]; <empty> → [120, 150]; [260] → [200], [440, 500]

A Larger Example

The old root is now empty; what to do with it?

Root <empty> → [180, 260]; [180, 260] → [120, 150], [200], [440, 500]

A Larger Example

Just dispose of it properly.

Root [180, 260] → [120, 150], [200], [440, 500]
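The whole deletion procedure can be sketched in Python: swap in the in-order successor, then borrow, else combine, repeating upward. This is a minimal sketch, not the slides' own code, and it rests on these assumptions:

  • Keys are distinct.
  • Nodes are a sorted `keys` list plus a `children` list that is empty for terminal nodes.
  • Every non-root node holds at least ⌈m/2⌉ - 1 keys.
  • Borrowing tries the left neighbor before the right one.
  • Combining prefers the right neighbor, which is the choice the examples above happen to make.

All names (`Node`, `delete`, `_delete`, `_fix_underflow`, `shape`) are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from math import ceil


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # [] for a terminal node


def delete(root, key, m=3):
    """Delete key from the B-tree of order m rooted at root; return the (possibly new) root."""
    _delete(root, key, ceil(m / 2) - 1)
    if not root.keys and root.children:
        return root.children[0]  # the old root is empty: dispose of it
    return root


def _delete(node, key, min_keys):
    i = bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:  # terminal level: just remove the key
        if not found:
            raise KeyError(key)
        node.keys.pop(i)
        return
    if found:
        # Replace the key with its in-order successor (leftmost key of the right subtree)
        succ = node.children[i + 1]
        while succ.children:
            succ = succ.children[0]
        key = succ.keys[0]
        node.keys[i] = key
        i += 1  # now delete the successor from the right subtree
    _delete(node.children[i], key, min_keys)
    if len(node.children[i].keys) < min_keys:
        _fix_underflow(node, i, min_keys)


def _fix_underflow(parent, i, min_keys):
    """Repair child i of parent, which has too few keys: borrow if possible, else combine."""
    child = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.keys) > min_keys:
        # Borrow: parent key rotates down into child, left's largest key rotates up
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
    elif right is not None and len(right.keys) > min_keys:
        # Borrow: parent key rotates down into child, right's smallest key rotates up
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
    elif right is not None:
        # Combine child + separating parent key + right neighbor into child
        child.keys += [parent.keys.pop(i)] + right.keys
        child.children += right.children
        parent.children.pop(i + 1)
    else:
        # Combine left neighbor + separating parent key + child into left
        left.keys += [parent.keys.pop(i - 1)] + child.keys
        left.children += child.children
        parent.children.pop(i)


def shape(node):
    """Nested (keys, [child shapes]) view of a tree."""
    return (node.keys, [shape(c) for c in node.children])


tree = Node([120, 240], [Node([97]), Node([200]), Node([360, 440])])
tree = delete(tree, 200)  # borrow case
tree = delete(tree, 240)  # combine case
print(shape(tree))  # ([120], [([97], []), ([360, 440], [])])
```

Running the larger example (delete 280, then 360, then 97) through the same functions ends with 180 and 260 in the root, over the terminal nodes [120, 150], [200], and [440, 500].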