B-Trees CS321 Spring 2014 Steve Cutchin

B-Trees - Boise State CScs.boisestate.edu/~scutchin/cs321_spring2014/B-Trees_brief.pdf · 17 B-Trees & Efficiency • Used in Mac, NTFS, OS2 for file structure. • Allow insertion




Topics for Today

•  HW #2 Once Over
•  B Trees
•  Questions PA #3
•  Expression Trees
•  Balance Factor
•  AVL Heights
•  Data Structure Animations
•  Graphs


B-Tree Motivation

•  When data is too large to fit in main memory, then the number of disk accesses becomes important.

•  A disk access is orders of magnitude more expensive than a typical computer instruction (mechanical limitations).

•  One disk access costs about as much as 200,000 instructions.

•  The number of disk accesses will dominate the running time.


Motivation (cont.)

•  Secondary memory (disk) is divided into equal-sized blocks (typical sizes are 512, 2048, 4096 or 8192 bytes).

•  The basic I/O operation transfers the contents of one disk block to/from main memory.

•  Our goal is to devise a multiway search tree that will minimize file accesses (by exploiting the fact that each disk read returns a whole block).


m-ary Trees

•  A node contains multiple keys.
•  The order of the subtrees is based on the parent node's keys.
•  If each node has m children and there are n keys, then the average time taken to search the tree is proportional to log_m(n).

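[Figure: an m-ary node with keys K1 K2 K3 K4 and subtrees T1, T2, T3, etc. Subtree T1 holds keys K < K1, subtree T2 holds keys K1 < K < K2, and so on.]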
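
To get a feel for the log_m(n) claim, the sketch below counts how many levels a completely full m-ary search tree needs in order to hold n keys. The function name `levels_needed` and the sample sizes are illustrative assumptions, not part of the slides.

```python
def levels_needed(n: int, m: int) -> int:
    """Smallest number of levels of a full m-ary search tree holding >= n keys.

    A full tree with h levels has (m**h - 1) / (m - 1) nodes, and each node
    stores m - 1 keys, so the tree holds m**h - 1 keys in total.
    """
    if n < 0 or m < 2:
        raise ValueError("need n >= 0 and m >= 2")
    levels, capacity = 0, 0
    while capacity < n:
        levels += 1
        capacity = m ** levels - 1
    return levels


if __name__ == "__main__":
    # Compare a binary tree with wider nodes for one million keys.
    for m in (2, 4, 501):
        print(f"m={m}: {levels_needed(1_000_000, m)} levels")
```

Each level corresponds to one node visit, so wider nodes mean fewer visits, which is the property a disk-based tree wants.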


B Tree Definition

•  A B-Tree is a search tree with a root node.
•  Each node in a B-Tree can have multiple keys.
•  Each node in a B-Tree can have multiple children.
•  The number of children is dependent on the number of keys.
•  A node in a B-Tree has at most 1 more child than it has keys (an internal node has exactly 1 more; a leaf has none).


Layout of a B-Tree


Each node has at most 3 keys and 4 children.
Each internal node has a minimum of 2 children.
This is a 2-3-4 B-Tree.


Important Metrics

•  The minimum degree of a B-Tree is defined as:
   –  Degree = t, t >= 2.
   –  Every internal node except the root has at least t children.
   –  Every node except the root has at least t-1 keys.
   –  Every node has at most 2*t-1 keys.

•  The order of a B-Tree is defined as:
   –  Order = m
   –  No node may have more than m children.

•  Therefore: Order = 2*degree, because a node with 2*t-1 keys has 2*t children.
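
As a quick check of these relationships, the sketch below derives the key and child limits from a minimum degree t. The function name `degree_bounds` and the dictionary layout are assumptions made for illustration.

```python
def degree_bounds(t: int) -> dict:
    """Key and child limits for a B-Tree of minimum degree t (t >= 2)."""
    if t < 2:
        raise ValueError("minimum degree t must be at least 2")
    return {
        "min_keys": t - 1,       # every node except the root
        "max_keys": 2 * t - 1,   # every node, root included
        "min_children": t,       # internal nodes except the root
        "max_children": 2 * t,   # equals the order m under this convention
    }


if __name__ == "__main__":
    # t = 2 gives the 2-3-4 tree: 1 to 3 keys, 2 to 4 children.
    print(degree_bounds(2))
```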


Layout of a B-Tree


What is the degree of this B-Tree?
What is the order of this B-Tree?


Size of B Trees

•  All leaves in a B-Tree have the same depth.
•  The depth of the leaves is uniform and equal to the height of the tree.
•  By definition all B-Trees are balanced.


Size of B Trees

•  For a given B-Tree with n >= 1 keys and degree t
•  Height h <= log_t((n+1)/2)

•  For a given B-Tree with height h and degree t
•  n >= 2 * t^h - 1
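
The two bounds are the same inequality read in opposite directions. The sketch below evaluates both with integer arithmetic, assuming the root sits at height 0; the function names `min_keys` and `max_height` are illustrative assumptions.

```python
def min_keys(h: int, t: int) -> int:
    """Fewest keys a B-Tree of height h and minimum degree t can hold."""
    return 2 * t ** h - 1


def max_height(n: int, t: int) -> int:
    """Largest h with 2 * t**h - 1 <= n, i.e. h <= log_t((n + 1) / 2)."""
    if n < 1 or t < 2:
        raise ValueError("need n >= 1 and t >= 2")
    h = 0
    # Integer loop avoids floating-point error from math.log.
    while min_keys(h + 1, t) <= n:
        h += 1
    return h


if __name__ == "__main__":
    print(max_height(1_000_000, 250))
```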


B-Tree and Block Size

•  A B-Tree Node is usually the size of a Disk Page.
•  So if a Disk Page = 4096 bytes we want our Node to be that size:
•  Say, 84 bytes overhead for the Node.
•  4 bytes for each key. 4 bytes for each child pointer. 4 bytes for num keys, 4 bytes for num children.


B-Tree and Block Size

•  4096 = 4K + 4C + 4 + 4 + 84, where K = keys per node and C = children per node.
•  C = K + 1.
•  4096 = 4K + (4K + 4) + 4 + 4 + 84.
•  4096 = 8K + 12 + 84
•  4096 - 12 - 84 = 8K
•  K = 500 Keys per Node for one block.
•  C = 501 Children per Node for each block.
•  A full tree of height 2 has 125,751,500 Keys.
•  A full tree of height 2 has 251,503 Disk Blocks.
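
The arithmetic above can be reproduced with a few lines of code. This is a sketch under the slide's assumptions (4096-byte block, 84 bytes of overhead, 4-byte keys, pointers and counters); the function names and parameter names are hypothetical.

```python
def node_capacity(block_size=4096, overhead=84, key_size=4,
                  ptr_size=4, counter_size=4):
    """Return (keys, children) per node so that one node fills one block.

    block = K*key_size + (K+1)*ptr_size + 2*counter_size + overhead
    """
    fixed = overhead + 2 * counter_size + ptr_size  # the "+1" pointer is fixed cost
    keys = (block_size - fixed) // (key_size + ptr_size)
    return keys, keys + 1


def full_tree_totals(keys_per_node: int, height: int):
    """Return (total_keys, total_nodes) for a full tree; the root is at height 0."""
    children = keys_per_node + 1
    nodes = sum(children ** level for level in range(height + 1))
    return nodes * keys_per_node, nodes


if __name__ == "__main__":
    k, c = node_capacity()
    print(k, c, full_tree_totals(k, 2))
```

Each node occupies one disk block, so the node count is also the number of blocks the tree uses.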


Definition of a B-Tree

•  Def: A B-tree of degree t is a tree with the following properties:
   –  The root has at least 2 children, unless it is a leaf.
   –  Every non-root node must have at least t-1 keys.
   –  Every non-root internal node has at least t children.
   –  If the tree is non-empty the root has at least one key.
   –  Every node may have at most 2t-1 keys.
   –  An internal node may have at most 2t children.
   –  A full tree occurs when every node has 2t-1 keys.


Components of B-Tree Nodes

•  Every node X has the following attributes:
   –  X.n = the number of keys in X.
   –  X.keys[1..n] = the actual keys.
   –  X.leaf = is this a leaf? Can the root be a leaf?
   –  X.child[1..n+1] = array of pointers to the children.

•  Rule: key[1] <= key[2] <= … <= key[n].
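
One possible in-memory shape for such a node is sketched below. It is an assumption-laden illustration rather than the course's reference implementation: the class name `BTreeNode` and the method `is_well_formed` are hypothetical, `n` and `leaf` are derived from the lists instead of stored, and Python lists are indexed from 0 rather than 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)
    children: List["BTreeNode"] = field(default_factory=list)

    @property
    def n(self) -> int:
        """Number of keys stored in this node (X.n)."""
        return len(self.keys)

    @property
    def leaf(self) -> bool:
        """True when the node has no children (X.leaf)."""
        return not self.children

    def is_well_formed(self) -> bool:
        """Check the ordering rule and the n+1 children rule for this node only."""
        keys_sorted = all(a <= b for a, b in zip(self.keys, self.keys[1:]))
        child_count_ok = self.leaf or len(self.children) == self.n + 1
        return keys_sorted and child_count_ok


if __name__ == "__main__":
    root = BTreeNode([10, 20], [BTreeNode([5]), BTreeNode([15]), BTreeNode([25])])
    print(root.n, root.leaf, root.is_well_formed())
```

A tree holding a single node shows that the root can be a leaf: `BTreeNode([7]).leaf` is true.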


Definition of a B-Tree

•  Def: A B-tree of order m is a tree with the following properties:
   –  The root has at least 2 children, unless it is a leaf.
   –  No node in the tree has more than m children.
   –  Every node except for the root and the leaves has at least ⌈m/2⌉ children.
   –  All leaves appear at the same level.
   –  An internal node with k children contains exactly k-1 keys.
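
The order-m properties translate fairly directly into a structural check. The sketch below is a minimal illustration: the `Node` class and the function names are hypothetical, and it checks only the five listed properties (it does not verify key ordering or limit the number of keys in a leaf).

```python
import math


class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def leaf_depths(node, depth=0):
    """Set of depths at which leaves occur below (and including) node."""
    if not node.children:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def satisfies_order(root, m):
    """True if the tree rooted at root meets the order-m properties."""
    def check(node, is_root):
        k = len(node.children)
        if k == 0:
            return True  # leaf: only the same-level rule applies
        if k > m or len(node.keys) != k - 1:
            return False
        min_children = 2 if is_root else math.ceil(m / 2)
        if k < min_children:
            return False
        return all(check(child, False) for child in node.children)

    return check(root, True) and len(leaf_depths(root)) == 1


if __name__ == "__main__":
    tree = Node([10], [Node([5]), Node([15, 20])])
    print(satisfies_order(tree, 3))
```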


B-Trees & Efficiency

•  Used in Mac, NTFS, and OS/2 for file structure.
•  Allow insertion and deletion into a tree structure, based on the log_m(n) property, where m is the order of the tree.

•  The idea is that you leave some key spaces open. So an insert of a new key is done using available space (most cases).
   –  Less dynamic than our typical Binary Tree.
   –  Efficient for disk-based operations.


2-3 Trees

[Figure: example 2-3 tree with nodes G; C; I | M; A; D | E; H; J | K; N | O.]


B Tree Operations (adt)

•  Search(key)
•  Insert(key)
•  Delete(key)


Searching m-ary Trees

•  A generalized SOT (symmetric-order traversal) will visit all keys in ascending order.

   for (i = 1; i <= m-1; i++) {
       visit subtree to left of k_i
       visit k_i
   }
   visit subtree to right of k_(m-1)
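
The traversal loop can be written as a short recursive generator. This is a sketch with a hypothetical `Node` class; it uses 0-based indexing, so `children[i]` is the subtree to the left of `keys[i]` and the last child is the subtree to the right of the last key.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def in_order(node):
    """Yield every key in the subtree rooted at node in ascending order."""
    if not node.children:
        # Leaf: its keys are already sorted.
        yield from node.keys
        return
    for i, key in enumerate(node.keys):
        yield from in_order(node.children[i])   # subtree left of keys[i]
        yield key
    yield from in_order(node.children[len(node.keys)])  # rightmost subtree


if __name__ == "__main__":
    root = Node([10, 20], [Node([5]), Node([15]), Node([25, 30])])
    print(list(in_order(root)))
```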


Basic Recursive Search

•  Ordered Recursive Search. Array indexed by 1.

   Search(T, k)
       if (T is empty) return not found;
       for (i = 1; i <= m-1; i++) {
           if (k == k_i) return found at (T, i);
           if (k < k_i) return Search(T.child[i], k);
       }
       return Search(T.child[m], k);

Notice the for loop! O(?)
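
A runnable version of the recursive search might look like the sketch below. The `Node` class and the return convention (a `(node, index)` pair, or `None` when the key is absent) are assumptions for illustration, and indexing is 0-based. The scan inside each node is linear, so one visited node costs up to as many comparisons as it has keys, and one node is visited per level.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def search(node, k):
    """Return (node, index) where k is stored, or None if k is not in the tree."""
    i = 0
    # Linear scan: skip every key smaller than k.
    while i < len(node.keys) and k > node.keys[i]:
        i += 1
    if i < len(node.keys) and k == node.keys[i]:
        return node, i
    if not node.children:
        return None  # reached a leaf without finding k
    # k lies between keys[i-1] and keys[i], so descend into children[i].
    return search(node.children[i], k)


if __name__ == "__main__":
    root = Node([10, 20], [Node([5]), Node([15]), Node([25, 30])])
    print(search(root, 30) is not None, search(root, 17))
```

Because the keys in a node are sorted, the linear scan could be swapped for a binary search (for example with the standard `bisect` module) without changing the result.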