btrees (1)

Embed Size (px)

Citation preview

  • 8/8/2019 btrees (1)

    1/33

    1

    Multilevel Indexing andB+ Trees

  • 8/8/2019 btrees (1)

    2/33

    2

    Indexed Sequential Files

    Provide a choice between two alternativeviews of a file:

    1. Indexed : the file can be seen as a set of records that is indexed by key; or

    2. Sequential: the file can be accessedsequentially (physically contiguousrecords), returning records in order bykey.

  • 8/8/2019 btrees (1)

    3/33

    3

    E xample of applications

    Student record system in a university: Indexed view: access to individual records Sequential view: batch processing when posting

    grades

    Credit card system: Indexed view: interactive check of accounts

    Sequential view: batch processing of payments

  • 8/8/2019 btrees (1)

    4/33

    4

    The initial idea

    Maintain a sequence set: Group the records into blocks in a sorted way. Maintain the order in the blocks as records are

    added or deleted through splitting,concatenation, and redistribution.

    Construct a simple, single level index for

    these blocks. Choose to build an index that contain the key

    for the last record in each block.

  • 8/8/2019 btrees (1)

    5/33

    5

    Maintaining a Sequence Set

    Sorting and re-organizing after insertions anddeletions is out of question. We organize thesequence set in the following way: Records are grouped in blocks . Blocks should be at least half full . Link fields are used to point to the preceding block and

    the following block (similar to doubly linked lists) Changes (insertion/deletion) are localized into blocks

    by performing:Block splitting when insertion causes overflowBlock merging or redistribution when deletion causesunderflow.

  • 8/8/2019 btrees (1)

    6/33

    6

    E xample: insertionBlock size = 4Key : Last name

    ADAMS BIXBY CARSON COLE Block 1

    Insert BAIRD :

    ADAMS BAIRD BIXBY

    CARSON .. COLE

    Block 1

    Block 2

  • 8/8/2019 btrees (1)

    7/33

    7

    E xample: deletionADAMS BAIRD BIXBY BOONE Block 1

    BYNUM CARSON .. CARTER ..

    DENVER ELLIS

    Block 2

    Block 3

    Delete DAVIS, BYNUM, CART E R,

    COLE DAVISBlock 4

  • 8/8/2019 btrees (1)

    8/33

    8

    Add an Index set

    Key Block

    BERNE 1CAGE 2DUTTON 3EVANS 4F OLK 5GADDIS 6

  • 8/8/2019 btrees (1)

    9/33

    9

    Tree indexes

    This simple scheme is nice if the index fits inmemory.If index doesnt fit in memory: Divide the index structure into blocks,

    Organize these blocks similarly building a treestructure.

    Tree indexes: B Trees

    B+ Trees Simple prefix B+ Trees

  • 8/8/2019 btrees (1)

    10/33

    10

    Separators

    Block Range of Keys Separator1 ADAMS-BERNE

    BOL E N2 BOLEN-CAGE

    CAMP3 CAMP-DUTTON

    E MBRY4 EMBRY-EVANS

    FAB E R 5 F ABER- F OLK

    FOLKS6 F OLKS-GADDIS

  • 8/8/2019 btrees (1)

    11/33

    11

    ADAMS-BERNE F OLKS-GADDIS

    FABER- F OLK BOLEN-CAGE

    CAMP-DUTTON EMBRY-EVANS

    BOLEN CAMP

    EMBRY

    FABER F OLKS

    1

    2

    3

    5

    4 6

    Index set

    root

  • 8/8/2019 btrees (1)

    12/33

    12

    B Trees

    B-tree is one of the most important data structuresin computer science.What does B stand for? (Not binary!)

    B-tree is a multiway search tree.Several versions of B-trees have been proposed,

    but only B+ Trees has been used with large files.

    A B+tree is a B-tree in which data records are inleaf nodes, and faster sequential access is possible.

  • 8/8/2019 btrees (1)

    13/33

    13

    Formal definition of B+ Tree Properties

    Properties of a B+ Tree of order v : All internal nodes (except root) has at least v keys

    and at most 2v keys . The root has at least 2 children unless it s a leaf.. All leaves are on the same level. An internal node with k keys has k+1 children

  • 8/8/2019 btrees (1)

    14/33

    14

    B+ tree: Internal/root node

    structureP 0 K1 P 1 K2 P n-1 Kn P n

    Requirements:

    K1 < K 2 < < K n

    For any search key value K in the subtree pointed by Pi,If Pi = P0, we require K < K1If Pi = Pn, Kn e KIf Pi = P1, , Pn-1, Ki < K e Ki+1

    Each P i is a pointer to a child node; each K i is a search key value# of search key values = n, # of pointers = n+1

  • 8/8/2019 btrees (1)

    15/33

    15

    Pointer L points to the left neighbor; R points tothe right neighbor

    K1 < K2 < < K nv e n e 2v (v is the order of this B+ tree)

    W e will use K i* for the pair and omit L andR for simplicity

    B+ tree: leaf node structure

    L K1 r 1 K2 Kn r nR

  • 8/8/2019 btrees (1)

    16/33

    16

    E xample: B+ tree with order of 1

    Each node must hold at least 1 entry, and at most2 entries

    10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

    20 33 51 63

    40

    Root

  • 8/8/2019 btrees (1)

    17/33

    17

    E xample: Search in a B+ tree order 2Search: how to find the records with a given search key value? Begin at root, and use key comparisons to go to leaf

    Examples: search for 5* , 16* , all data entries >= 2 4* ... The last one is a range search, we need to do the sequential scan, starting from

    the first leaf containing a value >= 2 4 .Root

    17 24 30

    2* 3* 5* 7* 14* 15* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

    13

  • 8/8/2019 btrees (1)

    18/33

    18

    B+ Trees in Practice

    Typical order: 100. Typical fill-factor: 67% . average fanout = 1 33 (i.e, # of pointers in internal

    node)

    Can often hold top levels in buffer pool: Level 1 = 1 page = 8 Kbytes Level 2 = 1 33 pages = 1 Mbyte Level 3 = 1 7 ,68 9 pages = 1 33 MBytes

    Suppose there are 1,000,000,000 data entries. H = log 133 (1000000000/1 3 2) < 4

    The cost is 5 pages read

  • 8/8/2019 btrees (1)

    19/33

    19

    H ow to Insert a Data E ntry into a

    B+ Tree?

    Lets look at several examples first.

  • 8/8/2019 btrees (1)

    20/33

    20

    Inserting 16*, 8* into E xample B+ tree

    Root 17 24 3013

    2* 3* 5* 7* 8*

    2* 5* 7*3*

    17 24 3013

    8*

    You overflow

    O ne new child (leaf node)generated; must add one morepointer to its parent, thus one morekey value as well.

    14* 15* 16*

  • 8/8/2019 btrees (1)

    21/33

    21

    Inserting 8* (cont.)

    Copy up themiddle value(leaf split)

    2*3* 5*

    7* 8*

    5Entry to be inserted in parent node.(Note that 5 iscontinues to appear in the leaf.)

    s copied up and

    13 17 24 30

    You overflow!5 13 17 24 30

  • 8/8/2019 btrees (1)

    22/33

    22

    (Note that 17 is pushed up and only

    appears once in the index. Contrast

    Entry to be inserted in parent node.

    this with a leaf split.)

    5 24 30

    17

    13

    Insertion into B+ tree (cont.)

    5 13 17 24 30

    Understanddifference

    between copy-upand push-up

    Observe howminimumoccupancy isguaranteed in

    both leaf and

    index pg splits.

    We split this node, redistribute entries evenly,and push up middle key.

  • 8/8/2019 btrees (1)

    23/33

    23

    E xample B+ Tree After Inserting 8*

    Notice that root was split, leading to increase in height.

    2* 3*

    Root

    17

    24 30

    14* 15* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

    135

    7*5* 8*

  • 8/8/2019 btrees (1)

    24/33

  • 8/8/2019 btrees (1)

    25/33

    25

    Deleting a Data E ntry from a B+

    TreeExamine examples first

  • 8/8/2019 btrees (1)

    26/33

    26

    Delete 19* and 20*

    2* 3*

    Root

    17

    24 30

    14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

    135

    7*5* 8*

    22*

    27* 29*22* 24*

    Have we still forgot something?

  • 8/8/2019 btrees (1)

    27/33

    27

    Deleting 19* and 20* (cont.)

    Notice how 2 7 is c opied up .But can we move it up?Now we want to delete 2 4

    Underflow again! But can we redistribute this time?

    2* 3*

    Root

    17

    30

    14* 16* 33* 34* 38* 39*

    135

    7*5* 8* 22* 24*

    27

    27* 29*

  • 8/8/2019 btrees (1)

    28/33

    28

    Deleting 24*Observe the two leaf nodes are merged, and27 is discarded fromtheir parent, but

    Observe pull down of index entry (below).

    30

    22* 27* 29* 33* 34* 38* 39*

    2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*

    30135 17N ew root

  • 8/8/2019 btrees (1)

    29/33

    29

    Deleting a Data E ntry from a B+ Tree:

    SummaryStart at root, find leaf L where entry belongs.Remove the entry. If L is at least half-full, done! If L has only d-1 entries,

    Try to re-distribute , borrowing from sibling (adja c ent nodewith same parent as L) .If re-distribution fails, merge L and sibling.

    If merge occurred, must delete entry (pointing to L or sibling) from parent of L.Merge could propagate to root, decreasing height.

  • 8/8/2019 btrees (1)

    30/33

    3 0

    E xample of Non-leaf Re-distribution

    Tree is shown below during deletion of 2 4* . (Whatcould be a possible initial tree?)In contrast to previous example, can re-distribute entryfrom left child of root to right child.

    Root

    135 17 20

    22

    30

    14* 16* 17* 18* 20* 33* 34* 38* 39*22* 27* 29*21*7*5* 8*3*2*

  • 8/8/2019 btrees (1)

    31/33

    3 1

    After Re-distribution

    Intuitively, entries are re-distributed by ` pushing through the splitting entry in the parent node.It suffices to re-distribute index entry with key 20;weve re-distributed 1 7 as well for illustration.

    14* 16* 33* 34* 38* 39*22* 27* 29*17* 18* 20* 21*7*5* 8*2* 3*

    Root

    135

    17

    3020 22

  • 8/8/2019 btrees (1)

    32/33

    3 2

    Terminology

    Bucket Factor: the number of records which canfit in a leaf node.Fan-out : the average number of children of aninternal node.A B+tree index can be used either as a primaryindex or a secondary index. Primary index: determines the way the records are

    actually stored (also called a sparse index) Secondary index: the records in the file are not

    grouped in buckets according to keys of secondaryindexes (also called a dense index)

  • 8/8/2019 btrees (1)

    33/33

    33

    SummaryTree-structured indexes are ideal for range-searches, also good for equality searches.B+ tree is a dynamic structure. Inserts/deletes leave tree height-balanced; High fanout ( F )

    means depth rarely more than 3 or 4 . Almost always better than maintaining a sorted file. Typically, 67% occupancy on average. If data entries are data records, splits can change rids!

    Most widely used index in database managementsystems because of its versatility. One of the mostoptimized components of a DBMS.