Chapter 11: Indexing and Hashing

Indexing
- Basic Concepts
- Ordered Indices
- B+-Tree Index Files

Hashing
- Static Hashing
- Dynamic Hashing

Basic Concepts

Search key: set of attributes used to look up records in a file.

[Figure: an index entry pairs a search-key value with a pointer to the record holding that value.]

Index Evaluation Metrics

- Access types supported efficiently, e.g.:
  - Point query: find “Tom”
  - Range query: find students whose age is between 20 and 40
- Access time
- Update time
- Space overhead
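
The two access types can be sketched on a small in-memory ordered index. This is only an illustration: the entries, the record ids and the function names are made up, and a sorted Python list stands in for the index file.

```python
from bisect import bisect_left, bisect_right

# Hypothetical ordered index on age: (age, record id) entries sorted by age.
index = [(18, "r7"), (20, "r2"), (25, "r5"), (31, "r1"), (40, "r4"), (52, "r3")]
keys = [age for age, _ in index]

def point_query(age):
    """Return the record ids whose search key equals `age`."""
    lo, hi = bisect_left(keys, age), bisect_right(keys, age)
    return [rid for _, rid in index[lo:hi]]

def range_query(low, high):
    """Return the record ids with low <= age <= high, in key order."""
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    return [rid for _, rid in index[lo:hi]]
```

Both queries use binary search to find the start of the answer. The range query then reads consecutive entries, which is why a sorted index supports it efficiently.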

Ordered Indices

In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library.

Primary index

Also called clustering index: the index is in the same order as the file.

- The search key of a primary index is usually but not necessarily the primary key.

[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), and an index on the search key with entries 10, 30, 50, 70, 90, 110, ..., 230 in the same order.]

Secondary index: non-clustering index.

The index is in a different order from the file.

[Figure: file with blocks (30, 50), (20, 70), (80, 40), (100, 10), (90, 60), not sorted on the search key, and an index with entries 10, 20, 30, 40, 50, 60, 70, ...]

Dense Index: contains an index record for every search-key value.

[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), and a dense index with entries 10, 20, 30, ..., 120.]

Sparse Index: contains index records for only some search-key values.

Applicable when records are sequentially ordered on the search key.

[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), and a sparse index with entries 10, 30, 50, 70, 90, 110, ..., 230.]
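
A lookup through a sparse index can be sketched as follows, assuming one index entry per block that holds the block's first key. The block contents follow the figure. The names `sparse_index` and `sparse_lookup` are illustrative only.

```python
from bisect import bisect_right

# Sequential file: blocks of two records, sorted on the search key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
# Sparse index: one (first key in block, block number) entry per block.
sparse_index = [(blk[0], i) for i, blk in enumerate(blocks)]
index_keys = [k for k, _ in sparse_index]

def sparse_lookup(key):
    """Return (block number, slot) of `key`, or None if it is absent."""
    pos = bisect_right(index_keys, key) - 1      # last entry with key <= search key
    if pos < 0:
        return None                              # smaller than every indexed key
    block_no = sparse_index[pos][1]
    for slot, k in enumerate(blocks[block_no]):  # sequential scan inside the block
        if k == key:
            return (block_no, slot)
    return None
```

The final scan only works because the file is sorted on the search key: the record, if it exists, must be in the block that the chosen entry points to.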

Secondary indexes

- A sparse index does not make sense! The file is not ordered on the search key, so an entry for one key per block says nothing about where the other keys are.

[Figure: file ordered on a different sequence field, with blocks (30, 50), (20, 70), (80, 40), (100, 10), (90, 60), and a sparse index with entries 30, 20, 80, 100, 90, ...]

Multilevel Index

[Figure: sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); a first-level index with entries 10, 30, 50, 70, 90, 110, ..., 230; and a sparse 2nd level with entries 10, 90, 170, 250, 330, 410, 490, 570.]
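
A two-level lookup can be sketched like this. The file contents (keys 10 to 240, two records per block, four index entries per index block) are assumptions chosen to resemble the figure, and the helper names are made up.

```python
from bisect import bisect_right

def chunk(seq, size):
    """Split a list into consecutive pieces of `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

data_blocks = chunk(list(range(10, 250, 10)), 2)    # (10, 20), (30, 40), ..., (230, 240)
level1 = chunk([blk[0] for blk in data_blocks], 4)  # (10, 30, 50, 70), (90, ...), (170, ...)
level2 = [blk[0] for blk in level1]                 # 10, 90, 170

def floor_slot(keys, key):
    """Position of the last entry <= key, or -1 if there is none."""
    return bisect_right(keys, key) - 1

def multilevel_lookup(key):
    """Return the number of the data block holding `key`, or None if absent."""
    i2 = floor_slot(level2, key)          # choose a first-level index block
    if i2 < 0:
        return None
    i1 = floor_slot(level1[i2], key)      # choose a data block within it
    block_no = i2 * 4 + i1
    return block_no if key in data_blocks[block_no] else None
```

Each level narrows the search to one block of the level below, so a lookup reads one block per level plus one data block.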

Multilevel Index

Secondary indexes:
- Lowest level is dense
- Other levels are sparse

[Figure: file ordered on a different sequence field, with blocks (30, 50), (20, 70), (80, 40), (100, 10), (90, 60); a dense lowest-level index with entries 10, 20, 30, 40, 50, 60, 70, ...; and a sparse high level with entries 10, 50, 90, ...]

Conventional indexes

Advantage:
- Simple
- Index is sequential file, good for scans

Disadvantage:
- Inserts expensive

Outline

- Conventional indexes
- B+-Tree (NEXT)

NEXT: Another type of index

- Give up on sequentiality of index
- Try to get “balance”

B+Tree Example n=4

Root:      (100)
Non-leaf:  (30)  (120, 150, 180)
Leaves:    (3, 5, 11)  (30, 35)  (100, 101, 110)  (120, 130)  (150, 156, 179)  (180, 200)

Sample non-leaf

Keys: 57, 81, 95

Pointers, left to right:
- to keys k < 57
- to keys 57 ≤ k < 81
- to keys 81 ≤ k < 95
- to keys k ≥ 95

Key is moved (not copied) from lower level non-leaf node to upper level non-leaf node.
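
Choosing which pointer to follow in such a node is a binary search over its keys. A minimal sketch, assuming a key equal to a separator goes to the right of it (so 57 falls in the range 57 ≤ k < 81). The function name `child_slot` is hypothetical.

```python
from bisect import bisect_right

def child_slot(keys, k):
    """Index of the pointer to follow for search key k in a non-leaf node.

    Slot i covers keys[i-1] <= k < keys[i]; slot 0 covers k < keys[0],
    and the last slot covers k >= keys[-1].
    """
    return bisect_right(keys, k)

node_keys = [57, 81, 95]
```

With these keys, `child_slot(node_keys, 57)` selects slot 1 and `child_slot(node_keys, 95)` selects slot 3, the rightmost pointer.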

Sample leaf node:

Keys: 57, 81, 95

Pointers:
- from non-leaf node (incoming)
- to record with key 57
- to record with key 81
- to record with key 95
- to next leaf in sequence

Key is copied (not moved) from leaf node to non-leaf node.

n=4

[Figure: how nodes are drawn. Leaf: keys 30, 35. Non-leaf: key 30.]

Size of nodes:

- n pointers
- n-1 keys

Don’t want nodes to be too empty

Use at least:
- Root: 2 pointers
- Non-leaf: ⌈n/2⌉ pointers
- Leaf: ⌈(n-1)/2⌉ keys

n=4

            Full node         Min. node
Non-leaf    120, 150, 180     30
Leaf        3, 5, 11          30, 35

A pointer counts even if it is null.

B+tree rules (tree of order n)

(1) All leaves at same lowest level (balanced tree)

(2) Pointers in leaves point to records, except for the “sequence pointer”

(3) Number of pointers/keys for B+tree

                       Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)    n          n-1        ⌈n/2⌉           ⌈n/2⌉ - 1
Leaf (non-root)        n          n-1        ⌈(n-1)/2⌉       ⌈(n-1)/2⌉
Root                   n          n-1        2               1
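
The limits in rule (3) can be computed for a given n as below. The sketch assumes the minimums are rounded up (ceilings), which agrees with the n=4 full and minimum nodes shown earlier. The function name and dictionary layout are illustrative only.

```python
from math import ceil

def occupancy(n):
    """Pointer and key limits per node type for a B+tree of order n."""
    return {
        "non-leaf": {"max_ptrs": n, "max_keys": n - 1,
                     "min_ptrs": ceil(n / 2), "min_keys": ceil(n / 2) - 1},
        "leaf":     {"max_ptrs": n, "max_keys": n - 1,
                     "min_ptrs": ceil((n - 1) / 2), "min_keys": ceil((n - 1) / 2)},
        "root":     {"max_ptrs": n, "max_keys": n - 1,
                     "min_ptrs": 2, "min_keys": 1},
    }
```

For n=4 this gives a minimum of 2 pointers (1 key) in a non-leaf node and 2 keys in a leaf.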

Insert into B+tree

(a) simple case: space available in leaf

(b) leaf overflow

(c) non-leaf overflow

(d) new root

(a) Insert key = 32, n=4

[Figure: part of a tree with root key 100, non-leaf key 30, and leaves (3, 5, 11) and (30, 31). Key 32 fits in the second leaf, which becomes (30, 31, 32).]

(b) Insert key = 7, n=4

[Figure: part of a tree with root key 100, non-leaf key 30, and leaves (3, 5, 11) and (30, 31). The first leaf is full, so it splits into (3, 5) and (7, 11), and key 7 is copied into the parent.]
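
Cases (a) and (b) can be sketched for a single leaf as follows. The split point (left half gets ⌊n/2⌋ of the n keys) is an assumption that reproduces the example. The function name and return shape are made up for illustration.

```python
from bisect import insort

def insert_into_leaf(leaf, key, n):
    """Insert `key` into a sorted leaf that has room for n-1 keys.

    Returns (left, right, separator). `right` and `separator` are None
    when the key fits without a split.
    """
    keys = list(leaf)
    insort(keys, key)                 # keep the leaf sorted
    if len(keys) <= n - 1:
        return keys, None, None       # (a) simple case: space available
    mid = len(keys) // 2              # (b) overflow: split the n keys in two
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]      # first key of the new leaf is copied to the parent
```

For example, `insert_into_leaf([3, 5, 11], 7, 4)` yields the leaves (3, 5) and (7, 11) with separator 7, while `insert_into_leaf([30, 31], 32, 4)` needs no split.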

(c) Insert key = 160, n=4

[Figure: root key 100; non-leaf (120, 150, 180); leaves (150, 156, 179) and (180, 200). Inserting 160 splits the leaf into (150, 156) and (160, 179). The non-leaf is already full, so it splits into (120, 150) and (180), and key 160 moves up to the parent.]

(d) New root, insert 45, n=4

[Figure: root (10, 20, 30) with leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). Inserting 45 splits the last leaf into (30, 32) and (40, 45). The root overflows and splits into (10, 20) and (40), and key 30 moves up into a new root.]

Deletion from B+tree

(a) Simple case - no example

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys

(d) Cases (b) or (c) at non-leaf

(b) Coalesce with sibling: delete 50, n=5

[Figure: parent (10, 40, 100) with leaves (10, 20, 30) and (40, 50). After deleting 50 the second leaf is too empty, so key 40 moves into the sibling, giving (10, 20, 30, 40), and key 40 is removed from the parent.]

(c) Redistribute keys: delete 50, n=5

[Figure: parent (10, 40, 100) with leaves (10, 20, 30, 35) and (40, 50). After deleting 50 the second leaf is too empty and the sibling is full, so key 35 moves over, giving (10, 20, 30) and (35, 40), and the parent key 40 becomes 35.]
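
The choice between coalescing and redistributing at a leaf can be sketched as below. It assumes a leaf must keep at least ⌈(n-1)/2⌉ keys and only looks at the left sibling. The function name and return shape are hypothetical.

```python
from math import ceil

def delete_from_leaf(left, right, key, n):
    """Delete `key` from leaf `right`, whose left sibling is `left`.

    Returns (action, left, right, separator). `separator` is the new parent
    key between the two leaves, or None if the parent key does not change
    (simple case) or is dropped (coalesce).
    """
    min_keys = ceil((n - 1) / 2)
    left = list(left)
    right = [k for k in right if k != key]
    if len(right) >= min_keys:
        return "simple", left, right, None
    if len(left) + len(right) <= n - 1:
        # Everything fits in one leaf: merge; the parent loses a key and a pointer.
        return "coalesce", left + right, None, None
    while len(right) < min_keys:
        right.insert(0, left.pop())   # borrow the largest key from the sibling
    return "redistribute", left, right, right[0]
```

With n=5, deleting 50 from (40, 50) next to (10, 20, 30) coalesces into (10, 20, 30, 40). Next to (10, 20, 30, 35), the same deletion redistributes into (10, 20, 30) and (35, 40) with new separator 35.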

(d) Non-leaf coalesce: delete 37, n=5

[Figure: root (25); non-leaf nodes (10, 20) and (30, 40); leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45). Deleting 37 leaves the leaf too empty, so it is coalesced with its sibling (40, 45). The parent non-leaf is then too empty and is coalesced with its sibling (10, 20), pulling key 25 down from the old root. The merged node becomes the new root.]

B+tree deletions in practice

- Often, coalescing is not implemented
- Too hard and not worth it!

Index Definition in SQL

Create an index:
    create index <index-name> on <relation-name> (<attribute-list>)
E.g.: create index gindex on country(gdp);

To drop an index:
    drop index <index-name>
E.g.: drop index gindex;
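
The two statements can be tried out with Python's built-in sqlite3 module. This is only a sketch: the `country` table, its columns and its rows are made up, and SQLite is one possible engine among many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table country (name text, gdp real)")
conn.executemany("insert into country values (?, ?)",
                 [("A", 1.5), ("B", 3.0), ("C", 0.7)])

def index_names():
    """List the indexes SQLite currently records for the country table."""
    rows = conn.execute(
        "select name from sqlite_master "
        "where type = 'index' and tbl_name = 'country'")
    return [name for (name,) in rows]

conn.execute("create index gindex on country(gdp)")
names_after_create = index_names()

conn.execute("drop index gindex")
names_after_drop = index_names()
```

After the create statement the catalog lists `gindex`; after the drop statement the list is empty again.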