23
B-Trees Chapter 9

B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

  • View
    217

  • Download
    0

Embed Size (px)

Citation preview

Page 1: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

B-Trees

Chapter 9

Page 2: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

Limitations of binary search

• Though faster than sequential search, binary search still requires an unacceptable number of accesses for data files with more than 1000 records

• Resorting the index after each record is inserted is not practical if the index cannot be kept in memory

Page 3: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

9.3 Binary search tree index

• Tree structure includes pointers to left and right index nodes in addition to a key (and data record pointer)

• Each left node defines a subtree with smaller keys, each right node with larger

• Pointers make sorting the index unnecessary. Why?

Page 4: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

Binary tree balance problem

• Building the tree from the root by inserting randomly ordered incoming records results in paths to some leaves that are much longer than others

• Performance is unacceptably poor for keys on remote paths

• Keeping the tree balanced is non-trivial

Page 5: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

KF

FB SD

WSPAHNCL

AX DE FT JD NR RF TK YJ

LV

MB

NP

TS

TM

ND

LA

NK

UF

Page 6: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

A

YW

X

H I M

Balanced AVL tree

Page 7: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

9.3.2 Paged binary trees

• multiple binary nodes are located on the same page (sector) on secondary storage

• each disk seek returns several nodes in a search path, reducing search complexity from log2N to logk+1N

• random insertions cause imbalance which cannot be easily fixed because keys must be shifted to different pages throughout the tree

Page 8: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

Multi-record index

• number of records in a data file exceeds the maximum number of keys allowed in a single record index

• index must still be maintained in sorted order (across multiple records) to allow binary search

Page 9: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

Searching multi-record index

• total number of keys (data records) is N

• each index record holds k keys

• for binary search, first look at the index record in the middle of the index file

• compare search key to smallest and largest keys in current index record

Page 10: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

record 1keys 1 : k

record 2keys k+1 : 2k

record N/2kkeys N/2k + 1 : N/2

record N/k + 1keys N - N mod k : N

Starting recordfor binary search

Multi-record index file

Page 11: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

9.4 Multilevel indexing

• Level-1 index is a multi-record index for the entire data file

• Each higher level index below the root is a multi-record index to the index below it

• Root level index is a single record• Though multilevel index is entry sequenced

in that the records at each level need not be ordered, record insertion is still a problem

Page 12: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

9.5 B-trees• insertion problem of simple multilevel

index is solved by (1) using partially filled index records(2) splitting records when they fill up, instead of

shifting keys to the next record

• when an index node is split, the largest key in the new node is promoted to the next higher index level

• at worst, insertion causes one node at each level to split

Page 13: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

DC TS

Initial node contains keys C, D, S, and T.

C D S T

A

D T

A

A DC

Figure 9.14 Growth of a B-tree

Insertion of A causes node to split.A new root node is created and the largest key in each leaf node placed in the root.

Key A can now be inserted in the correct leaf node.

Page 14: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

9.7 B-Tree implementation

• Class BTreeNode (supports index record)– subclass of SimpleIndex class– template class allows different types of keys– uses same Search method as SimpleIndex

• Class BTree (supports B-tree index file)– uses RecordFile object to access index file– FindLeaf method sets an array of pointers,

Nodes, to define a search path

Page 15: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

9.9-10 Formal definition

• The order of a B-tree (m) is the maximum number of descendents for each node.

• Every node except the root and leaves must have at least m/2 descendents.

• The root must have at least 2 descendents unless it is a leaf (i.e., the only node).

• All leaves are on the same level.• The leaf level is a complete index.

Page 16: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

Implications of formal definition

• Path length is the same for all searches, and is equal to the tree depth, since only the leaf nodes point to data records.

• The worst case depth can be computed for a B-tree with a given order and number of keys (see § 9.11 in the text)

Page 17: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

Deletion

• maintaining balance requires that each index node hold no more than m keys and no fewer than m/2 keys

• when insertion causes overflow (more than m keys) in a node, it is split

• what happens when deletion results in “underflow” (fewer than m/2 keys)?

Page 18: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

Situations arising from deletion

(Figure 9.21)

a) Victim node has more than m/2 keys, and key to be deleted is not the largest key.

b) Victim node has more than m/2 keys, and key to be deleted is the largest key.

c) Victim node has exactly m/2 keys.

Page 19: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

Merging and Redistribution

• Needed for situation c), when deletion leaves fewer than m/2 keys.

• Two options:– merge with a sibling that has m/2 or

m/2 + 1 keys

– move at least 1 key from a sibling that has at least m/2 + 1 keys

Page 20: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

Questions• What is the minimum and maximum

number of siblings a node can have?• Is it possible that there are no siblings

available with which to merge or redistribute after a deletion?

• Is it possible to have a choice of either merging with or redistributing from the same sibling?

• Is it ever possible to merge two nodes without first deleting at least one key?

Page 21: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

B*tree and Redistribution

• Redistribution may be used optionally to improve storage utilization

• B*tree uses redistribution during insertion to maintain each node 2/3 full (rather than 1/2, as results from simply splitting)

• Notes on B*trees by Jan Jannink: http://www.cise.ufl.edu/~jhammer/classes/b_star.html

Page 22: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

9.15 Page buffering

• Keep a page buffer, or collection of index pages in memory.

• Whenever an index page is needed, first look for it in the page buffer. If it’s there, you save seeking for it on the disk.

• If a needed index page is not in the buffer, load it into the buffer from the disk

Page 23: B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses

Page replacement schemes

• If a needed index page is not in the buffer, but the buffer is full, a page must be replaced.

• LRU replacement scheme is based on the assumption of temporal locality.

• Page height scheme favors pages on higher levels. Why?