102
CS 277 – Spring 2002 Notes 4 1 CS 277: Database System Implementation Notes 4: Indexing Arthur Keller

CS 277: Database System Implementation Notes 4: Indexing

  • Upload
    hachi

  • View
    64

  • Download
    0

Embed Size (px)

DESCRIPTION

CS 277: Database System Implementation Notes 4: Indexing. Arthur Keller. Chapter 4. Indexing & Hashing value. record. ?. value. Topics. Conventional indexes B-trees Hashing schemes. 10. 30. 50. 70. 90. 20. 40. 60. 80. 100. Sequential File. 70. 50. 30. 10. 110. 90. - PowerPoint PPT Presentation

Citation preview

Page 1: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 1

CS 277: Database System Implementation

Notes 4: Indexing

Arthur Keller

Page 2: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 2

Indexing & Hashing

value

Chapter 4

value

record

Page 3: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 3

Topics

• Conventional indexes• B-trees• Hashing schemes

Page 4: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 4

Sequential File

2010

4030

6050

8070

10090

Page 5: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 5

Sequential File

2010

4030

6050

8070

10090

Dense Index

10203040

50607080

90100110120

Page 6: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 6

Sequential File

2010

4030

6050

8070

10090

Sparse Index

10305070

90110130150

170190210230

Page 7: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 7

Sequential File

2010

4030

6050

8070

10090

Sparse 2nd level

10305070

90110130150

170190210230

1090

170250

330410490570

Page 8: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 8

• Comment:{FILE,INDEX} may be contiguous

or not (blocks chained)

Page 9: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 9

Question:

• Can we build a dense, 2nd level index for a dense index?

Page 10: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 10

Notes on pointers:

(1) Block pointer (sparse index) can be smaller than record pointer

BP

RP

Page 11: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 11

(2) If file is contiguous, then we can omitpointers (i.e., compute them)

Notes on pointers:

Page 12: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 12

K1

K3

K4

K2

R1

R2

R3

R4

say:1024 Bper block

• if we want K3 block: get it at offset (3-1)1024 = 2048 bytes

Page 13: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 13

Sparse vs. Dense Tradeoff

• Sparse: Less index space per record can keep more of

index in memory• Dense: Can tell if any record exists

without accessing file

(Later: – sparse better for insertions– dense needed for secondary indexes)

Page 14: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 14

Terms

• Index sequential file• Search key ( primary key)• Primary index (on Sequencing field)• Secondary index• Dense index (all Search Key values in)• Sparse index• Multi-level index

Page 15: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 15

Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

Page 16: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 16

Duplicate keys

1010

2010

3020

3030

4540

Page 17: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 17

1010

2010

3020

3030

4540

10101020

20303030

1010

2010

3020

3030

4540

10101020

20303030

Dense index, one way to implement?

Duplicate keys

Page 18: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 18

1010

2010

3020

3030

4540

10203040

Dense index, better way?

Duplicate keys

Page 19: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 19

1010

2010

3020

3030

4540

10102030

Sparse index, one way?

Duplicate keys

care

ful if lookin

gfo

r 2

0 o

r 3

0!

Page 20: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 20

1010

2010

3020

3030

4540

10203030

Sparse index, another way?

Duplicate keys

– place first new key from block

shouldthis be40?

Page 21: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 21

Duplicate values, primary index

• Index may point to first instance ofeach value only

File Index

Summary

aaa

b

Page 22: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 22

Deletion from sparse index

2010

4030

6050

8070

10305070

90110130150

Page 23: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 23

Deletion from sparse index

2010

4030

6050

8070

10305070

90110130150

– delete record 40

Page 24: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 24

Deletion from sparse index

2010

4030

6050

8070

10305070

90110130150

– delete record 30

4040

Page 25: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 25

Deletion from sparse index

2010

4030

6050

8070

10305070

90110130150

– delete records 30 & 40

5070

Page 26: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 26

Deletion from dense index

2010

4030

6050

8070

10203040

50607080

Page 27: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 27

Deletion from dense index

2010

4030

6050

8070

10203040

50607080

– delete record 30

4040

Page 28: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 28

Insertion, sparse index case

2010

30

5040

60

10304060

Page 29: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 29

Insertion, sparse index case

2010

30

5040

60

10304060

– insert record 34

34

• our lucky day! we have free space where we need it!

Page 30: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 30

Insertion, sparse index case

2010

30

5040

60

10304060

– insert record 15

15

2030

20

• Illustrated: Immediate reorganization• Variation:

– insert new block (chained file)– update index

Page 31: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 31

Insertion, sparse index case

2010

30

5040

60

10304060

– insert record 25

25

overflow blocks(reorganize later...)

Page 32: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 32

Insertion, dense index case

• Similar

• Often more expensive . . .

Page 33: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 33

Secondary indexesSequencefield

5030

7020

4080

10100

6090

Page 34: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 34

Secondary indexesSequencefield

5030

7020

4080

10100

6090

• Sparse index

302080

100

90...

does not make sense!

Page 35: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 35

Secondary indexesSequencefield

5030

7020

4080

10100

6090

• Dense index10203040

506070...

105090...

sparsehighlevel

Page 36: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 36

With secondary indexes:

• Lowest level is dense• Other levels are sparse

Also: Pointers are record pointers

(not block pointers; not computed)

Page 37: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 37

Duplicate values & secondary indexes

1020

4020

4010

4010

4030

Page 38: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 38

Duplicate values & secondary indexes

1020

4020

4010

4010

4030

10101020

20304040

4040...

one option...

Problem:excess overhead!

• disk space• search time

Page 39: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 39

Duplicate values & secondary indexes

1020

4020

4010

4010

4030

10

another option...

4030

20Problem:variable sizerecords inindex!

Page 40: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 40

Duplicate values & secondary indexes

1020

4020

4010

4010

4030

10203040

5060...

Another idea (suggested in class):Chain records with same key?

Problems:• Need to add fields to records• Need to follow chain to know records

Page 41: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 41

Duplicate values & secondary indexes

1020

4020

4010

4010

4030

10203040

5060...

buckets

Page 42: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 42

Why “bucket” idea is useful

Indexes RecordsName: primary EMP

(name,dept,floor,...)

Dept: secondaryFloor: secondary

Page 43: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 43

Query: Get employees in

(Toy Dept) ^ (2nd floor)

Dept. index EMP Floor index

Toy 2nd

Intersect toy bucket and 2nd Floor

bucket to get set of matching EMP’s

Page 44: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 44

This idea used in text information retrieval

Documents

...the cat is fat ...

...was raining cats and dogs...

...Fido the dog ...

Inverted lists

cat

dog

Page 45: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 45

IR QUERIES

• Find articles with “cat” and “dog”• Find articles with “cat” or “dog”• Find articles with “cat” and not “dog”

• Find articles with “cat” in title• Find articles with “cat” and “dog”

within 5 words

Page 46: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 46

Common technique: more info in inverted

list

cat Title 5

Title 100

Author 10

Abstract 57

Title 12

d3d2

d1

dog

typeposit

ion

locatio

n

Page 47: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 47

Posting: an entry in inverted list.Represents occurrence

ofterm in article

Size of a list: 1 Rare words or (in postings) miss-spellings

106 Common words

Size of a posting: 10-15 bits (compressed)

Page 48: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 48

IR DISCUSSION

• Stop words• Truncation• Thesaurus• Full text vs. Abstracts• Vector model

Page 49: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 49

Vector space model

w1 w2 w3 w4 w5 w6 w7 …

DOC = <1 0 0 1 1 0 0 …>

Query= <0 0 1 1 0 0 0 …>

PRODUCT = 1 + ……. = score

Page 50: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 50

• Tricks to weigh scores + normalize

e.g.: Match on common word not asuseful as match on rare

words...

Page 51: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 51

• How to process VS. Queries?

w1 w2 w3 w4 w5 w6 …Q = < 0 0 0 1 1 0 … >

Page 52: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 52

• Try Altavista, Excite, Infoseek, Lycos...

Page 53: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 53

Summary so far

• Conventional index– Basic Ideas: sparse, dense, multi-

level…– Duplicate Keys– Deletion/Insertion– Secondary indexes

– Buckets of Postings List

Page 54: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 54

Conventional indexes

Advantage:- Simple- Index is sequential file

good for scans

Disadvantage:- Inserts expensive, and/or- Lose sequentiality & balance

Page 55: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 55

Example Index (sequential)

continuous

free space

102030

405060

708090

39313536

323834

33

overflow area(not sequential)

Page 56: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 56

Outline:

• Conventional indexes• B-Trees NEXT• Hashing schemes

Page 57: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 57

• NEXT: Another type of index– Give up on sequentiality of index– Try to get “balance”

Page 58: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 58

Root

B+Tree Example n=3

100

120

150

180

30

3 5 11

30

35

100

101

110

120

130

150

156

179

180

200

Page 59: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 59

Sample non-leaf

to keys to keys to keys to keys

< 57 57 k<81 81k<95 95

57

81

95

Page 60: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 60

Sample leaf node:

From non-leaf node

to next leafin

sequence5

7

81

95

To r

eco

rd

wit

h k

ey 5

7

To r

eco

rd

wit

h k

ey 8

1

To r

eco

rd

wit

h k

ey 8

5

Page 61: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 61

In textbook’s notationn=3

Leaf:

Non-leaf:

30

35

30

30 35

30

Page 62: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 62

Size of nodes: n+1 pointersn keys

(fixed)

Page 63: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 63

Don’t want nodes to be too empty

• Use at least

Non-leaf: (n+1)/2pointers

Leaf: (n+1)/2 pointers to data

Page 64: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 64

Full nodemin. node

Non-leaf

Leaf

n=3

12

01

50

18

0

30

3 5 11

30

35

counts

even if

null

Page 65: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 65

B+tree rules tree of order n

(1) All leaves at same lowest level(balanced tree)

(2) Pointers in leaves point to records except for “sequence pointer”

Page 66: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 66

(3) Number of pointers/keys for B+tree

Non-leaf(non-root) n+1 n (n+1)/2 (n+1)/2- 1

Leaf(non-root) n+1 n

Root n+1 n 1 1

Max Max Min Min ptrs keys ptrsdata keys

(n+1)/2 (n+1)/2

Page 67: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 67

Insert into B+tree

(a) simple case– space available in leaf

(b) leaf overflow(c) non-leaf overflow(d) new root

Page 68: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 68

(a) Insert key = 32 n=33 5 11

30

31

30

100

32

Page 69: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 69

(a) Insert key = 7 n=3

3 5 11

30

31

30

100

3 5

7

7

Page 70: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 70

(c) Insert key = 160 n=3

10

0

120

150

180

150

156

179

180

200

160

18

0

160

179

Page 71: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 71

(d) New root, insert 45 n=3

10

20

30

1 2 3 10

12

20

25

30

32

40

40

45

40

30new root

Page 72: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 72

(a) Simple case - no example

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys(d) Cases (b) or (c) at non-leaf

Deletion from B+tree

Page 73: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 73

(b) Coalesce with sibling– Delete 50

10

40

100

10

20

30

40

50

n=4

40

Page 74: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 74

(c) Redistribute keys– Delete 50

10

40

100

10

20

30

35

40

50

n=4

35

35

Page 75: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 75

40

45

30

37

25

26

20

22

10

141 3

10

20

30

40

(d) Non-leaf coalese– Delete 37

n=4

40

30

25

25

new root

Page 76: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 76

B+tree deletions in practice

– Often, coalescing is not implemented– Too hard and not worth it!

Page 77: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 77

Comparison: B-trees vs. static indexed sequential

fileRef #1: Held & Stonebraker

“B-Trees Re-examined”CACM, Feb. 1978

Page 78: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 78

Ref # 1 claims:- Concurrency control harder in B-Trees

- B-tree consumes more space

For their comparison:block = 512 byteskey = pointer = 4 bytes4 data records per block

Page 79: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 79

Example: 1 block static index

127 keys

(127+1)4 = 512 Bytes-> pointers in index implicit! up to 127

blocks

k1

k2

k3

k1

k2

k3

1 datablock

Page 80: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 80

Example: 1 block B-tree

63 keys

63x(4+4)+8 = 512 Bytes-> pointers needed in B-tree up to 63

blocks because index is blocksnot contiguous

k1

k2

...

k63

k1

k2

k3

1 datablock

next

-

Page 81: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 81

Size comparison Ref. #1Size comparison Ref. #1

Static Index B-tree# data # datablocks height blocks

height

2 -> 127 2 2 -> 63 2128 -> 16,129 3 64 -> 3968 316,130 -> 2,048,383 4 3969 -> 250,047 4

250,048 -> 15,752,961 5

Page 82: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 82

Ref. #1 analysis claims

• For an 8,000 block file,after 32,000 inserts

after 16,000 lookups Static index saves enough accesses

to allow for reorganization

Ref. #1 conclusion Static index better!!

Page 83: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 83

Ref #2: M. Stonebraker, “Retrospective on a database

system,” TODS, June 1980

Ref. #2 conclusion B-trees better!!

Page 84: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 84

• DBA does not know when to reorganize• DBA does not know how full to load

pages of new index

Ref. #2 conclusion B-trees better!!

Page 85: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 85

• Buffering– B-tree: has fixed buffer requirements– Static index: must read several

overflow blocks to be efficient (large & variable size buffers needed for this)

Ref. #2 conclusion B-trees better!!

Page 86: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 86

• Speaking of buffering… Is LRU a good policy for B+tree

buffers? Of course not! Should try to keep root in memory

at all times(and perhaps some nodes from second level)

Page 87: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 87

Interesting problem:

For B+tree, how large should n be?

n is number of keys / node

Page 88: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 88

Sample assumptions:(1) Time to read node from disk is

(70+0.05n) msec.(2) Once block in memory, use binary

search to locate key:(a + b LOG2 n) msec.

For some constants a,b; Assume a << 70

(3) Assume B+tree is full, i.e., # nodes to examine is LOGn N

where N = # records

Page 89: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 89

Can get: f(n) = time to find a record

f(n)

nopt n

Page 90: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 90

FIND nopt by f’(n) = 0

Answer is nopt = “few hundred”

(see homework for details)

What happens to nopt as

• Disk gets faster?• CPU get faster?

Page 91: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 91

Variation on B+tree: B-tree (no +)• Idea:

– Avoid duplicate keys– Have record pointers in non-leaf nodes

Page 92: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 92

to record to record to record with K1 with K2 with K3

to keys to keys to keys to keys < K1 K1<x<K2 K2<x<k3 >k3

K1 P1 K2 P2 K3 P3

Page 93: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 93

B-tree example n=2

65

125

145

165

85

105

25

45

10

20

30

40

110

120

90

100

70

80

170

180

50

60

130

140

150

160

• sequence pointers not useful now! (but keep space for simplicity)

Page 94: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 94

Note on inserts

• Say we insert record with key = 25

10

20

30 n=3

leaf

10

– 20 –

25

30

• Afterwards:

Page 95: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 95

So, for B-trees:

MAX MINTree Rec Keys Tree Rec KeysPtrs Ptrs Ptrs Ptrs

Non-leafnon-root n+1 n n (n+1)/2 (n+1)/2-1

(n+1)/2-1Leafnon-root 1 n n 1 (n+1)/2

(n+1)/2

Rootnon-leafn+1 n n 2 1 1

RootLeaf 1 n n 1 1 1

Page 96: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 96

Tradeoffs:

B-trees have faster lookup than B+trees

in B-tree, non-leaf & leaf different sizes in B-tree, deletion more complicated

B+trees preferred!

Page 97: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 97

But note:

• If blocks are fixed size(due to disk and buffering restrictions)

Then lookup for B+tree isactually better!!

Page 98: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 98

Example:- Pointers 4 bytes- Keys 4 bytes- Blocks 100 bytes (just example)- Look at full 2 level tree

Page 99: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 99

Root has 8 keys + 8 record pointers+ 9 son pointers

= 8x4 + 8x4 + 9x4 = 100 bytes

B-tree:

Each of 9 sons: 12 rec. pointers (+12 keys)= 12x(4+4) + 4 = 100 bytes

2-level B-tree, Max # records =12x9 + 8 = 116

Page 100: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 100

Root has 12 keys + 13 son pointers= 12x4 + 13x4 = 100 bytes

B+tree:

Each of 13 sons: 12 rec. ptrs (+12 keys)= 12x(4 +4) + 4 = 100 bytes

2-level B+tree, Max # records= 13x12 = 156

Page 101: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 101

So...

ooooooooooooo ooooooooo 156 records 108 records

Total = 116

B+ B

8 records

• Conclusion:– For fixed block size,– B+ tree is better because it is bushier

Page 102: CS 277: Database System Implementation Notes 4: Indexing

CS 277 – Spring 2002

Notes 4 102

Outline/summary

• Conventional Indexes• Sparse vs. dense• Primary vs. secondary

• B trees• B+trees vs. B-trees• B+trees vs. indexed sequential

• Hashing schemes --> Next