
File Organizations - Indexing

R McFadyen, ACS-3902

•Tree terms

•root, internal, leaf, subtree

•parent, child, sibling

•balanced, unbalanced

•b+-tree

- split on overflow; merge on underflow

- in practice it is usually 3 or 4 levels deep

•search, insert, delete algorithms
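A rough way to see why a b+-tree is usually only 3 or 4 levels deep is to count how many keys a tree of a given depth can address. The sketch below is only an estimate: the function name `levels_needed` and the fan-out of 100 entries per node are assumptions chosen for illustration, not values from the course.

```python
def levels_needed(num_keys, fanout):
    """Estimate the levels needed so that fanout**levels >= num_keys.

    Assumes every node is full and holds `fanout` entries (an idealization).
    """
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# Assumed fan-out of 100 entries per node, for illustration only.
million = levels_needed(1_000_000, 100)
hundred_million = levels_needed(100_000_000, 100)
```

Under these assumptions, a million keys need 3 levels and a hundred million need 4, which is in line with the depth quoted above.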


ACS-3902

b-trees and b+-trees are used for indexes

b+-trees and other index organizations are used in practice

Cover b+-tree from a theoretical perspective

Variations exist in database systems

Database systems mostly use b+-trees


MySQL – simplified syntax

CREATE [UNIQUE] INDEX index_name

ON tbl_name (index_col_name,...)

USING {BTREE | HASH};

Create index index1 on Employees (dno);


PostgreSQL – simplified syntax

CREATE [UNIQUE] INDEX index_name

ON table_name [USING {btree | hash | gist | spgist | gin | brin}]

(index_col_name,...)

[WHERE predicate];

CREATE INDEX index1 ON Employee (dno);

CREATE INDEX index2 ON Employee (lname, fname);


Indexes are automatically created for:

PRIMARY KEY

Is a constraint that enforces entity integrity for a given column or columns through a unique index.

Only one PRIMARY KEY constraint can be created per table.

UNIQUE

Is a constraint that provides entity integrity for a given column or columns through a unique index.

A table can have multiple UNIQUE constraints.


Clustering

The physical order of rows is the same as the indexed order of the rows.

If index entries are logically close, then the data will be close together physically.

A primary key index is normally clustered.


Motivation (finding one record given its key)

•Scanning a file is time consuming

•b+-tree provides a short access path

[Figure: a B+-tree pointing into a file of records stored in page1, page2, page3]


Motivation

•A b+-tree for a file (table) is stored in a separate file.

•A file (table) could have many b+-trees

[Figure: a B+-tree pointing into a file of records stored in bucket 1, bucket 2, bucket 3]


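b+-tree

•based on the b-tree (the "b" is variously attributed to Bayer, balanced, Boeing, or bushy)

•dynamic

[Figure: a tree with the root at the top, internal nodes below it, and leaf nodes at the bottom level]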


3902: horizontal pointers at the leaf level

•typical of implementations

•provides for sequential access by key


Node structure for b+-tree of order p

non-leaf node (internal node or a root)

• < P_1, K_1, P_2, K_2, …, P_{q-1}, K_{q-1}, P_q > (q ≤ p)

• keys are in sequence

K_1 < K_2 < ... < K_{q-1}

• for any key value, X, in the subtree pointed to by P_i

•K_{i-1} < X ≤ K_i for 1 < i < q

•X ≤ K_1 for i = 1

•K_{q-1} < X for i = q

•each internal node has at most p pointers

•each node except root must have at least ⌈p/2⌉ pointers

•the root, if it has some children, must have at least 2 pointers
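The rule for choosing a subtree pointer can be sketched with a binary search over the keys of a node. This is a minimal illustration, not the course's algorithm: `child_index` is a hypothetical helper, and it returns a 0-based index, so index 0 corresponds to the first pointer of the node.

```python
from bisect import bisect_left


def child_index(keys, x):
    """Return the 0-based index of the pointer to follow for search key x.

    bisect_left finds the first position whose key is >= x, which matches
    the rule "keys less than or equal to a key go to the pointer on its left".
    """
    return bisect_left(keys, x)


# Keys of one internal node (illustrative values).
keys = ["Cory", "Marshall"]
first = child_index(keys, "Amy")        # below every key: first pointer
equal = child_index(keys, "Cory")       # equal to a key: pointer to its left
middle = child_index(keys, "Diane")     # between the two keys
last = child_index(keys, "Zena")        # above every key: last pointer
```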


Node structure for b+-tree of order p

leaf node (terminal node)

•< (K_1, Pr_1), (K_2, Pr_2), …, (K_{q-1}, Pr_{q-1}), P_next >

•K_1 < K_2 < ... < K_{q-1}

•Pr_i points to a record with key value K_i, or Pr_i points to a block containing a record with key value K_i

•each leaf has at least ⌈p/2⌉ keys

•maximum of p keys

•all leaves are at the same level (balanced)

•P_next points to the next leaf for key sequencing
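The leaf layout and the role of the next-leaf pointer can be sketched as follows. This is a sketch under assumptions: the `Leaf` class, the `scan` function, and the integer record ids that stand in for record pointers are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[str]                 # key values in ascending order
    record_ptrs: List[int]          # hypothetical record ids standing in for record pointers
    next: Optional["Leaf"] = None   # pointer to the next leaf in key sequence


def scan(leaf):
    """Yield (key, record pointer) pairs in key order by following next pointers."""
    while leaf is not None:
        for key, ptr in zip(leaf.keys, leaf.record_ptrs):
            yield key, ptr
        leaf = leaf.next


# Two linked leaves; the record ids are made up for the example.
right = Leaf(["Diane", "Ramon"], [0, 2])
left = Leaf(["Amy", "Cory"], [3, 1], next=right)
ordered = list(scan(left))
```

Starting at the leftmost leaf, the scan visits every key in ascending order without going back through the internal nodes.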


Example

•insert records with key values Diane, Cory, Ramon, Amy, Miranda, Marshall, Zena, Rhonda, Vincent, Simon, Mary into a b+-tree with p=3.

internal node : minimum 2 pointers and maximum 3 pointers - inserting a fourth will cause a split

leaf node : at least 2 key/pointer pairs and a maximum of 3 key/pointer pairs - inserting a fourth will cause a split


insert Diane

[ Diane ]

•each key has a pointer to data (wherever the record for Diane is actually stored)

•the leaf has a pointer to the next leaf in ascending key sequence (the horizontal pointer)

insert Cory

[ Cory , Diane ]

Only leaf nodes exist at this point; a split is needed before there are internal nodes.


Example

insert Ramon

[ Cory , Diane , Ramon ]

Only leaf nodes exist at this point.

inserting Amy will cause the node to overflow:

[ Amy , Cory , Diane , Ramon ] This must split


Example

[ Amy , Cory , Diane , Ramon ]

This is logically correct, but it exceeds the space available, so it must split into two leaves.

Do a 50/50 split


split the node into two nodes

[ Amy , Cory ] [ Diane , Ramon ]

Need to promote a key value upwards

• this must be Cory because it’s the highest key value in the left subtree


split the node and promote a key value upwards (this must be Cory because it’s the highest key value in the left subtree)

root: [ Cory ]
leaves: [ Amy , Cory ] [ Diane , Ramon ]

Keys ≤ Cory are reached through the pointer to the left of Cory.

When the root node splits, the tree has grown one level


Splitting Nodes

Any value being promoted upwards will come from the node that is splitting.

•When a leaf splits, a ‘copy’ of a key value is promoted.

•When an internal node splits, the middle key value ‘moves’ from a child to a parent node.

There are three situations to be concerned with:

•a leaf splits,

•an internal node splits,

•a new root is generated.


Leaf splitting

When a leaf splits, a new leaf is allocated

•the original leaf is the left sibling, the new one is the right sibling

•key and pointer pairs of the overflowing node are redistributed: the left sibling will have lesser keys than the right sibling

•a 'copy' of the key value which is the largest of the keys in the left sibling is promoted to the parent

Two situations arise: the parent exists or not.

If the parent exists, then a copy of the key value and the pointer to the right sibling are promoted upwards. Otherwise, the b+-tree is just beginning to grow ...
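The leaf-splitting steps can be sketched on a plain list of key and pointer pairs. This is a minimal illustration under assumptions: `split_leaf` is a hypothetical helper, the record pointers are made-up integers, and updating the parent and the next-leaf pointers is left out.

```python
def split_leaf(entries):
    """Split the sorted (key, record_pointer) pairs of an overflowing leaf.

    Returns (left, right, promoted): the entries kept in the original leaf,
    the entries moved to the new right sibling, and a copy of the largest
    key in the left sibling, which is promoted to the parent.
    """
    mid = len(entries) // 2              # 50/50 split
    left, right = entries[:mid], entries[mid:]
    promoted = left[-1][0]               # a copy; the key also stays in the leaf
    return left, right, promoted


# Overflowing leaf for p=3; the integers are made-up record pointers.
overflowing = [("Amy", 3), ("Cory", 1), ("Diane", 0), ("Ramon", 2)]
left, right, promoted = split_leaf(overflowing)
```

Here the left sibling keeps Amy and Cory, the right sibling receives Diane and Ramon, and a copy of Cory is promoted.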


Internal node splitting

If an internal node splits and it is not the root,

•insert the key and pointer and then determine the middle key

•a new 'right' sibling is allocated

•everything to the left of the middle key stays in the left sibling

•everything to the right of the middle key goes into the right sibling

•the middle key value along with the pointer to the new right sibling is promoted to the parent (the middle key value 'moves' to the parent to become the discriminator between the left and right siblings)
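The internal-node steps can be sketched in the same style. Again this is only an illustration: `split_internal` is a hypothetical helper, the pointer values are placeholder strings, and inserting the promoted key into the parent is not shown.

```python
def split_internal(keys, pointers):
    """Split an overflowing internal node, after the new key and pointer were inserted.

    Returns (left, middle, right), where left and right are (keys, pointers)
    pairs and middle is the key that moves up to the parent.
    """
    assert len(pointers) == len(keys) + 1
    mid = len(keys) // 2
    middle = keys[mid]                               # moves up; kept in neither sibling
    left = (keys[:mid], pointers[:mid + 1])          # everything left of the middle key
    right = (keys[mid + 1:], pointers[mid + 1:])     # everything right of the middle key
    return left, middle, right


# An internal node of order p=3 that has overflowed to 3 keys and 4 pointers.
left, middle, right = split_internal(
    ["Cory", "Marshall", "Ramon"], ["L1", "L2", "L3", "L4"]
)
```

Unlike the leaf case, the middle key is not copied: after the split it appears only in the parent.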


Internal node splitting

When a new root is formed, a key value and two pointers must be placed into it.


A sample trace

Insert Diane, Cory, Ramon, Amy, Miranda, Marshall, Zena, Rhonda, Vincent, Simon, Mary into a b+-tree with p=3.

After Diane, Cory, Ramon, Amy:

root: [ Cory ]
leaves: [ Amy , Cory ] [ Diane , Ramon ]

Next insert: Miranda


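After Miranda:

root: [ Cory ]
leaves: [ Amy , Cory ] [ Diane , Miranda , Ramon ]

Next insert: Marshall. After the 50/50 split, Marshall is the discriminator:

root: [ Cory , Marshall ]
leaves: [ Amy , Cory ] [ Diane , Marshall ] [ Miranda , Ramon ]

Next insert: Zena. Zena fits in the node.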


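After Zena:

root: [ Cory , Marshall ]
leaves: [ Amy , Cory ] [ Diane , Marshall ] [ Miranda , Ramon , Zena ]

Next insert: Rhonda. It causes a split, and the discriminator (Ramon) is promoted:

root: [ Cory , Marshall , Ramon ]
leaves: [ Amy , Cory ] [ Diane , Marshall ] [ Miranda , Ramon ] [ Rhonda , Zena ]

The root now needs four pointers, one more than p=3 allows, so it must split as well.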


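After the overflowing root splits, the middle key Marshall moves up into a new root:

root: [ Marshall ]
internal: [ Cory ] [ Ramon ]
leaves: [ Amy , Cory ] [ Diane , Marshall ] [ Miranda , Ramon ] [ Rhonda , Zena ]

Next insert: Vincent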


After Vincent:

root: [ Marshall ]
internal: [ Cory ] [ Ramon ]
leaves: [ Amy , Cory ] [ Diane , Marshall ] [ Miranda , Ramon ] [ Rhonda , Vincent , Zena ]

Next insert: Simon. It causes a split, and Simon is promoted.


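After Simon:

root: [ Marshall ]
internal: [ Cory ] [ Ramon , Simon ]
leaves: [ Amy , Cory ] [ Diane , Marshall ] [ Miranda , Ramon ] [ Rhonda , Simon ] [ Vincent , Zena ]

Next insert: Mary. Mary fits in the leaf [ Miranda , Ramon ].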
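The whole trace can be replayed with a small in-memory sketch of insertion. This is not a database implementation: the names `Node`, `BPlusTree`, and `levels` are hypothetical, keys are assumed unique, record pointers and deletion are omitted, and nodes live in memory, not in disk pages.

```python
from bisect import bisect_left, insort


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # child nodes (internal nodes only)
        self.next = None     # next leaf in key sequence (leaves only)


class BPlusTree:
    def __init__(self, p=3):
        self.p = p           # max pointers per internal node, max keys per leaf
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:                    # the root split: grow one level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys, new_root.children = [sep], [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert key below node; return (promoted key, new right sibling) on a split."""
        if node.leaf:
            insort(node.keys, key)
            return self._split_leaf(node) if len(node.keys) > self.p else None
        i = bisect_left(node.keys, key)          # keys <= K_i go to the left of K_i
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        return self._split_internal(node) if len(node.children) > self.p else None

    def _split_leaf(self, node):
        mid = len(node.keys) // 2                # 50/50 split
        right = Node(leaf=True)
        right.keys, node.keys = node.keys[mid:], node.keys[:mid]
        right.next, node.next = node.next, right
        return node.keys[-1], right              # copy of largest key in the left leaf

    def _split_internal(self, node):
        mid = len(node.keys) // 2
        sep = node.keys[mid]                     # the middle key moves up
        right = Node(leaf=False)
        right.keys, right.children = node.keys[mid + 1:], node.children[mid + 1:]
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return sep, right


def levels(tree):
    """Return the keys of every node, one list per level, from the root down."""
    out, level = [], [tree.root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


tree = BPlusTree(p=3)
for name in ["Diane", "Cory", "Ramon", "Amy", "Miranda", "Marshall",
             "Zena", "Rhonda", "Vincent", "Simon", "Mary"]:
    tree.insert(name)
final = levels(tree)
```

With these rules the final tree has Marshall in the root, Cory and Ramon/Simon in the internal nodes, and five leaves, with Mary stored in the same leaf as Miranda and Ramon.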


b+-tree operations

•search - always the same search length - tree height+1

•retrieval - sequential access is facilitated as the lowest level is typically linked

•insert - may cause overflow - tree may grow

•delete - may cause underflow - be aware the tree may shrink
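The fixed search length can be checked on the tree that results from the sample trace once Mary has been inserted. The representation below is an assumption made for brevity: internal nodes are `(keys, children)` tuples, leaves are plain lists of keys, and `search` is a hypothetical helper that counts index nodes only, not the final access to the data record.

```python
from bisect import bisect_left

# Tree from the sample trace after Mary is inserted.
tree = (["Marshall"], [
    (["Cory"], [["Amy", "Cory"], ["Diane", "Marshall"]]),
    (["Ramon", "Simon"], [["Mary", "Miranda", "Ramon"],
                          ["Rhonda", "Simon"],
                          ["Vincent", "Zena"]]),
])


def search(node, key):
    """Return (found, nodes_visited) for key, descending from the root to a leaf."""
    visited = 1
    while isinstance(node, tuple):               # internal node: follow one pointer
        keys, children = node
        node = children[bisect_left(keys, key)]
        visited += 1
    return key in node, visited


hit = search(tree, "Amy")
other_hit = search(tree, "Vincent")
miss = search(tree, "Zoe")
```

Every search, successful or not, visits one node per level, because all leaves are at the same depth.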
