B-Trees
CS321 Spring 2014 Steve Cutchin
Topics for Today
• HW #2 Once Over
• B Trees
• Questions PA #3
• Expression Trees
• Balance Factor
• AVL Heights
• Data Structure Animations
• Graphs
B-Tree Motivation
• When data is too large to fit in main memory, then the number of disk accesses becomes important.
• A disk access is unbelievably expensive compared to a typical computer instruction (mechanical limitations).
• One disk access is worth about 200,000 instructions.
• The number of disk accesses will dominate the running time.
Motivation Cont..
• Secondary memory (disk) is divided into equal-sized blocks (typical sizes are 512, 2048, 4096 or 8192 bytes)
• The basic I/O operation transfers the contents of one disk block to/from main memory.
• Our goal is to devise a multiway search tree that will minimize file accesses (by exploiting disk block read).
m-ary Trees
• A node contains multiple keys.
• The order of the subtrees is based on the parent node's keys.
• If each node has m children and there are n keys, then the average time taken to search the tree is proportional to $\log_m n$.
[Figure: an m-ary tree node with keys K1, K2, K3, K4, etc., and subtrees T1, T2, T3. Subtree labels: K < K1 and K1 < K < K2.]
B Tree Definition
• A B-Tree is a search tree with a root node.
• Each node in a B-Tree can have multiple keys.
• Each node in a B-Tree can have multiple children.
• The number of children is dependent on the number of keys.
• A node in a B-Tree has at most 1 more child than it has keys.
Layout of a B-Tree
Each node has at most 3 keys and 4 children. Each node has a minimum of 2 children. This is a 2-3-4 B-Tree
Important Metrics
• The minimal degree of a B-Tree is defined as:
  – Degree = t, t >= 2.
  – Every internal node except the root has at least t children.
  – Every node except the root has at least t-1 keys.
  – Every node has at most 2*t - 1 keys.
• The order of a B-Tree is defined as:
  – Order = m.
  – No node may have more than m children.
• Therefore: Order = 2*degree.
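As a quick check of these bounds, the following Python sketch derives the per-node limits from a minimum degree t. The function name `node_limits` and the dictionary keys are illustrative choices, not part of the course material.

```python
def node_limits(t):
    """Return key/child bounds for a non-root node of a B-Tree with minimum degree t."""
    if t < 2:
        raise ValueError("minimum degree t must be at least 2")
    return {
        "min_keys": t - 1,
        "max_keys": 2 * t - 1,
        "min_children": t,      # applies to internal nodes only
        "max_children": 2 * t,  # this value is the order m
    }


limits = node_limits(2)
print(limits["max_keys"], limits["max_children"])  # 3 4
```

With t = 2 this gives at most 3 keys and 4 children per node, which matches the 2-3-4 tree described earlier.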
Layout of a B-Tree
What is the degree of this B-Tree? What is the order of this B-Tree?
Size of B Trees
• All leaves in a tree have the same depth.
• The depth of a B-Tree is uniform and equal to its height.
• By definition all B-Trees are balanced.
Size of B Trees
• For a given B-Tree with n keys and degree t:
  – Height $h \le \log_t \frac{n+1}{2}$.
• For a given B-Tree with height h and degree t:
  – $n \ge 2t^h - 1$.
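To make the two bounds concrete, here is a small Python sketch. It assumes the convention that a tree consisting of only a root has height 0; the function names `min_keys` and `max_height` are hypothetical. The height is found with integer arithmetic rather than a floating-point logarithm to avoid rounding problems.

```python
def min_keys(t, h):
    """Fewest keys a B-Tree of minimum degree t and height h can hold: 2*t**h - 1."""
    return 2 * t**h - 1


def max_height(n, t):
    """Largest h with 2*t**h - 1 <= n, which is equivalent to h <= log_t((n+1)/2)."""
    if n < 1 or t < 2:
        raise ValueError("need n >= 1 keys and minimum degree t >= 2")
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h


print(min_keys(2, 3))              # 15
print(max_height(1_000_000, 500))  # 2
```

Under these assumptions, a million keys with t = 500 never need a tree taller than height 2.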
B-Tree and Block Size
• A B-Tree node is usually the size of a disk page.
• So if a disk page = 4096 bytes we want our node to be that size:
• Say, 84 bytes overhead for the node.
• 4 bytes for each key. 4 bytes for each child pointer. 4 bytes for num keys, 4 bytes for num children.
B-Tree and Block Size
• 4096 = 4K + 4C + 4 + 4 + 84.
• C = K + 1.
• 4096 = 4K + 4K + 4 + 4 + 4 + 84.
• 4096 = 8K + 12 + 84.
• 4096 - 12 - 84 = 8K.
• K = 500 keys per node for one block.
• C = 501 children per node for each block.
• A tree of height 2 has 125,751,500 keys.
• A tree of height 2 has 251,503 disk blocks.
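The arithmetic above can be reproduced with a short Python sketch. The function names and default parameter values are assumptions taken from the slide's example (4-byte keys and pointers, two 4-byte counters, 84 bytes of overhead), and the tree is assumed to be completely full with the root at height 0.

```python
def keys_per_node(page_size, key_size=4, pointer_size=4, counters=8, overhead=84):
    """Solve page_size = key_size*K + pointer_size*(K+1) + counters + overhead for K."""
    available = page_size - counters - overhead - pointer_size
    return available // (key_size + pointer_size)


def full_tree_size(keys, height):
    """Return (nodes, total_keys) for a full tree where every node holds `keys` keys."""
    children = keys + 1
    nodes = sum(children**level for level in range(height + 1))
    return nodes, nodes * keys


k = keys_per_node(4096)
nodes, total_keys = full_tree_size(k, 2)
print(k, nodes, total_keys)  # 500 251503 125751500
```

Each node occupies one disk block, so the node count is also the number of blocks.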
Definition of a B-Tree
• Def: A B-tree of degree t is a tree with the following properties:
  – The root has at least 2 children, unless it is a leaf.
  – Every non-root node must have at least t-1 keys.
  – Every non-root internal node has at least t children.
  – If the tree is non-empty the root has at least one key.
  – Every node may have at most 2t-1 keys.
  – An internal node may have at most 2t children.
  – A full tree occurs when every node has 2t-1 keys.
Components of B-Tree Nodes
• Every node X has the following attributes:
  – X.n = the number of keys in X.
  – X.keys[n] = the actual keys.
  – X.leaf = is this a leaf? Can the root be a leaf?
  – X.child[n+1] = array of pointers to the children.
• Rule: key[1] <= key[2] <= … <= key[n].
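One possible way to express these attributes in Python is sketched below. The class name `BTreeNode` and the helper `is_well_formed` are hypothetical, and Python lists are indexed from 0 rather than 1. Here `n` and `leaf` are derived from the lists instead of being stored separately, which is a simplification of the slide's layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)              # X.keys, kept in ascending order
    children: List["BTreeNode"] = field(default_factory=list)  # X.child, empty for a leaf

    @property
    def n(self):
        """X.n: the number of keys stored in this node."""
        return len(self.keys)

    @property
    def leaf(self):
        """X.leaf: a node without children is a leaf."""
        return not self.children

    def is_well_formed(self):
        """Check the ordering rule and that an internal node has n + 1 children."""
        keys_sorted = all(a <= b for a, b in zip(self.keys, self.keys[1:]))
        child_count_ok = self.leaf or len(self.children) == self.n + 1
        return keys_sorted and child_count_ok


root = BTreeNode(keys=[10, 20],
                 children=[BTreeNode([5]), BTreeNode([15]), BTreeNode([25])])
print(root.n, root.leaf, root.is_well_formed())  # 2 False True
```

A freshly created `BTreeNode()` has no children, so a root can indeed be a leaf when the tree holds only a few keys.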
Definition of a B-Tree
• Def: A B-tree of order m is a tree with the following properties:
  – The root has at least 2 children, unless it is a leaf.
  – No node in the tree has more than m children.
  – Every node except for the root and the leaves has at least ⌈m/2⌉ children.
  – All leaves appear at the same level.
  – An internal node with k children contains exactly k-1 keys.
B-Trees & Efficiency
• Used in Mac, NTFS, OS2 for file structure.
• Allow insertion and deletion into a tree structure, based on the $\log_m n$ property, where m is the order of the tree.
• The idea is that you leave some key spaces open. So an insert of a new key is done using available space (most cases).
  – Less dynamic than our typical Binary Tree.
  – Efficient for disk based operations.
2-3 Trees
[Figure: a 2-3 tree with nodes G, C, I | M, A, D | E, H, J | K, and N | O.]
B Tree Operations (adt)
• Search(key)
• Insert(key)
• Delete(key)
Searching m-ary Trees
• A generalized SOT will visit all keys in ascending order.
for (i = 1; i <= m-1; i++) {
    visit subtree to left of k_i
    visit k_i
}
visit subtree to right of k_(m-1)
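The traversal above can be sketched in Python as a generator, reading SOT as a symmetric-order (in-order) traversal, which is an assumption about the abbreviation. The `Node` class and the function name `in_order` are hypothetical, and the lists are 0-indexed, so `children[i]` is the subtree to the left of `keys[i]`.

```python
class Node:
    """Minimal multiway node: sorted keys plus len(keys) + 1 children, or none for a leaf."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def in_order(node):
    """Yield every key in the subtree rooted at node in ascending order."""
    for i, key in enumerate(node.keys):
        if node.children:
            yield from in_order(node.children[i])  # subtree to the left of key i
        yield key
    if node.children:
        yield from in_order(node.children[len(node.keys)])  # subtree right of the last key


tree = Node([10, 20], [Node([3, 7]), Node([12, 15]), Node([25, 30])])
print(list(in_order(tree)))  # [3, 7, 10, 12, 15, 20, 25, 30]
```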
Basic Recursive Search
• Ordered Recursive Search. Array indexed by 1.
Search(T, k)
    if (T is empty) return NotFound;
    for (i = 1; i <= m-1; i++) {
        if (k == k_i) return T;
        if (k < k_i) return Search(T.child[i], k);
    }
    return Search(T.child[m], k);
Notice the for loop! O(?)
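A minimal Python sketch of this recursive search follows. The `Node` class and the function name `search` are hypothetical; the sketch adds an explicit equality test and stops at leaves so the recursion terminates, and it uses 0-based indexing.

```python
class Node:
    """Minimal multiway node: sorted keys plus len(keys) + 1 children, or none for a leaf."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def search(node, k):
    """Return the node that contains key k, or None if k is not in the tree."""
    for i, key in enumerate(node.keys):
        if k == key:
            return node
        if k < key:
            # k can only be in the subtree to the left of this key.
            return search(node.children[i], k) if node.children else None
    # k is larger than every key in this node: descend into the last child.
    return search(node.children[len(node.keys)], k) if node.children else None


tree = Node([10, 20], [Node([3, 7]), Node([12, 15]), Node([25, 30])])
print(search(tree, 15).keys)  # [12, 15]
print(search(tree, 13))       # None
```

The loop scans the keys of each visited node one by one, so the cost per node grows with the number of keys it holds, while the number of nodes visited is bounded by the height of the tree.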