Chapter 12: Indexing and Hashing

• Indexing
  – Basic Concepts
  – Ordered Indices
  – B+-Tree Index Files
• Hashing
  – Static Hashing
  – Dynamic Hashing

Basic Concepts
Search key: a set of attributes used to look up records in a file.
[Figure: an index entry holds a search-key value and a pointer to the record in the file that has that value.]

Index Evaluation Metrics
• Access types supported efficiently, e.g.:
  – Point query: find “Tom”
  – Range query: find students whose age is between 20 and 40
• Access time
• Update time
• Space overhead
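The two access types above can be sketched on an in-memory ordered index. This is a minimal illustration under stated assumptions. The index is two parallel Python lists: sorted search-key values (ages) and, for each value, a hypothetical record pointer, modeled as a position in the data file. With the entries sorted, both a point query and a range query reduce to binary search with the standard `bisect` module.

```python
from bisect import bisect_left, bisect_right

# Hypothetical ordered index on age: entries sorted on the search key.
ages = [19, 23, 23, 31, 38, 40, 45, 52]      # search-key values, sorted
pointers = [4, 0, 6, 2, 7, 1, 3, 5]          # record position for each entry


def point_query(key):
    """Pointers of all records whose search key equals key."""
    lo = bisect_left(ages, key)
    hi = bisect_right(ages, key)
    return pointers[lo:hi]


def range_query(low, high):
    """Pointers of all records with low <= key <= high."""
    lo = bisect_left(ages, low)
    hi = bisect_right(ages, high)
    return pointers[lo:hi]


print(point_query(23))       # [0, 6]
print(range_query(20, 40))   # [0, 6, 2, 7, 1]
```

Because the entries are sorted, a range query finds its first entry once and then reads neighbouring entries. An unordered structure would have to test every entry.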

Ordered Indices
In an ordered index, index entries are stored sorted on the search-key value, e.g., the author catalog in a library.

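[Figure: data file sorted on the search key, two records per block: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), ...; index blocks 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230, one entry per data block.]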

Primary index, also called clustering index: the file is stored sorted on the index's search key, so index entries and records are in the same order.
• The search key of a primary index is usually, but not necessarily, the primary key.

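[Figure: data file blocks (30, 50), (20, 70), (80, 40), (100, 10), (90, 60), not sorted on the search key; index entries 10, 20, 30, 40 | 50, 60, 70, ..., each pointing to the record with that key.]
Secondary index (non-clustering index): the index is ordered on its search key, but the file is stored in a different order.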
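A secondary index can be sketched as a dense, sorted list of (key, location) pairs built over a file stored in some other order. This is a minimal sketch. It uses the block contents from the slide's figure, stands in for each record with its search-key value, and uses hypothetical (block number, slot) pairs as record pointers.

```python
from bisect import bisect_left

# Data file from the figure: blocks of two records, NOT sorted on the key.
blocks = [[30, 50], [20, 70], [80, 40], [100, 10], [90, 60]]

# Dense secondary index: one (key, (block_no, slot)) entry per record,
# sorted on the search key even though the file is not.
secondary_index = sorted(
    (key, (b, s)) for b, block in enumerate(blocks) for s, key in enumerate(block)
)
index_keys = [k for k, _ in secondary_index]


def lookup(key):
    """Return (block_no, slot) of the record with this key, or None."""
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return secondary_index[i][1]
    return None


print(index_keys)   # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(lookup(40))   # (2, 1)
```

Consecutive index entries may point into different blocks. A scan in key order through a secondary index can therefore touch a different block for almost every record.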

Dense Index
[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), ...; dense index blocks 10, 20, 30, 40 | 50, 60, 70, 80 | 90, 100, 110, 120.]
Dense index: contains an index record for every search-key value.
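A dense index can be sketched as one (key, block number) entry per record, kept in file order. This is a minimal sketch over the sequential file from the figure, assuming two records per block. Because every key has an entry, the index alone answers whether a key exists, without reading any data block.

```python
from bisect import bisect_left

# Sequential file from the figure: sorted on the key, two records per block.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: an entry (key, block_no) for EVERY search-key value.
dense = [(key, b) for b, block in enumerate(blocks) for key in block]
dense_keys = [k for k, _ in dense]


def dense_lookup(key):
    """Block number holding key, or None; absent keys are detected from
    the index alone, without touching the data file."""
    i = bisect_left(dense_keys, key)
    if i < len(dense) and dense_keys[i] == key:
        return dense[i][1]
    return None


print(len(dense))         # 10 entries for 10 records
print(dense_lookup(70))   # 3
print(dense_lookup(75))   # None
```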

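Sparse Index
[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), ...; sparse index blocks 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230, one entry per data block.]
Sparse index: contains index records for only some search-key values.
Applicable only when records are sequentially ordered on the search key. To find a record with key K, locate the index entry with the largest search key ≤ K, then scan the file from the block that entry points to.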
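The sparse-index lookup rule can be sketched with `bisect_right`, which finds the largest index entry ≤ K. This is a minimal sketch on the figure's file. It assumes unique keys and one index entry per block, namely the block's first key, so scanning just the block the entry points to is enough.

```python
from bisect import bisect_right

# Sequential file from the figure: sorted on the key, two records per block.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: one entry per block, the block's first (smallest) key.
sparse_keys = [block[0] for block in blocks]   # [10, 30, 50, 70, 90]


def sparse_lookup(key):
    """Block number holding key, or None."""
    i = bisect_right(sparse_keys, key) - 1     # largest entry <= key
    if i < 0:
        return None                            # key precedes the whole file
    # The file is sorted and keys are unique, so the record can only be in
    # block i; a real system would read that block from disk and scan it.
    return i if key in blocks[i] else None


print(sparse_lookup(40))   # 1
print(sparse_lookup(45))   # None
```

The sketch depends on the file being sorted on the search key. On an unsorted file, the entry with the largest key ≤ K says nothing about where K lives, which is the problem the next slide raises.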

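Secondary Indexes
[Figure: data file ordered on a sequence field other than the search key, blocks (30, 50), (20, 70), (80, 40), (100, 10), (90, 60); a sparse index taking the first key of each block would hold 30, 20, 80, 100, 90, ...]
• Sparse index: does not make sense! The file is not sorted on the search key, so the index cannot tell which block holds a key that has no entry of its own (e.g., 10 or 70).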

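Multilevel Index
[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), ...; sparse 1st-level index blocks 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230; sparse 2nd-level index blocks 10, 90, 170, 250 | 330, 410, 490, 570, one entry per 1st-level index block.]
The 1st-level index is itself a sorted sequential file, so a sparse 2nd-level index can be built on it in the same way.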
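A two-level lookup applies the sparse-index rule twice. It first finds the level-1 index block, then the data block. This is a minimal sketch with hypothetical data: 12 sorted data blocks of two records each, with level-1 index blocks of four entries. That gives level-1 blocks 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230, matching the figure, and a level-2 index of 10, 90, 170.

```python
from bisect import bisect_right

# Hypothetical sorted file: (10, 20), (30, 40), ..., (230, 240).
data_blocks = [[k, k + 10] for k in range(10, 240, 20)]

# Level 1: sparse index (first key of each data block), 4 entries per block.
level1_entries = [block[0] for block in data_blocks]
level1_blocks = [level1_entries[i:i + 4] for i in range(0, len(level1_entries), 4)]

# Level 2: sparse index on level 1 (first key of each level-1 block).
level2 = [blk[0] for blk in level1_blocks]    # [10, 90, 170]


def multilevel_lookup(key):
    """Data-block number holding key, or None."""
    j = bisect_right(level2, key) - 1             # which level-1 block
    if j < 0:
        return None
    i = bisect_right(level1_blocks[j], key) - 1   # which entry in that block
    data_block_no = j * 4 + i                     # entry -> data block
    return data_block_no if key in data_blocks[data_block_no] else None


print(multilevel_lookup(140))   # 6
print(multilevel_lookup(145))   # None
```

Each level is a binary search over a small sorted list. Only one block per level has to be read, so for a large index the upper levels can stay in memory.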

Secondary Indexes: Multilevel Index
[Figure: the same data file, blocks (30, 50), (20, 70), (80, 40), (100, 10), (90, 60); dense lowest-level index 10, 20, 30, 40 | 50, 60, 70, ...; sparse higher-level index 10, 50, 90, ...]
• Lowest level is dense.
• Other levels are sparse.

Conventional Indexes
Advantages:
– Simple
– The index is a sequential file, which is good for scans
Disadvantage:
– Inserts are expensive: keeping the index sequential means shifting later entries (or chaining overflow blocks)
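The insert cost of a sequential index can be seen with a flat sorted array. Inserting near the front shifts every later entry by one slot. This is a minimal in-memory sketch. `insert_sequential` is a hypothetical helper, and on disk the analogous cost is rewriting the following index blocks.

```python
from bisect import bisect_left


def insert_sequential(index, key):
    """Insert key into a sorted flat index; return how many existing
    entries had to shift one slot to the right to make room."""
    pos = bisect_left(index, key)
    shifted = len(index) - pos
    index.insert(pos, key)      # list.insert moves every entry after pos
    return shifted


index = list(range(10, 1010, 10))          # 100 entries: 10, 20, ..., 1000
first = insert_sequential(index, 15)       # 99 entries shift
last = insert_sequential(index, 2000)      # 0 entries shift
print(first, last)                         # 99 0
```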

Outline
• Conventional indexes
• B+-Tree (NEXT)

NEXT: Another type of index
– Give up on sequentiality of the index
– Try to get “balance”

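B+Tree Example, n=4
[Figure: root node 100; its left child is a non-leaf node 30 and its right child a non-leaf node 120, 150, 180; leaves, linked in sequence: 3, 5, 11 | 30, 35 | 100, 101, 110 | 120, 130 | 150, 156, 179 | 180, 200.]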
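Searching a B+tree descends from the root, choosing one child per level, until it reaches a leaf. This is a minimal sketch that rebuilds the n=4 example tree in memory. It assumes a hypothetical `Node` class whose leaves store bare keys instead of (key, record pointer) pairs. At a non-leaf, `bisect_right` picks the child whose key range contains the search key.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted search keys
        self.children = children    # None for a leaf


# The n=4 example tree from the slide.
leaves = [Node([3, 5, 11]), Node([30, 35]), Node([100, 101, 110]),
          Node([120, 130]), Node([150, 156, 179]), Node([180, 200])]
root = Node([100], [
    Node([30], leaves[0:2]),
    Node([120, 150, 180], leaves[2:6]),
])


def search(node, key):
    """True if key is stored in the tree rooted at node."""
    while node.children is not None:
        # Child i holds keys k with keys[i-1] <= k < keys[i].
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys


print(search(root, 156))   # True
print(search(root, 157))   # False
```

Every search visits exactly one node per level, because all leaves are at the same depth. Here that is three nodes for any key.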

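Sample non-leaf node
[Figure: non-leaf node with keys 57, 81, 95 and four child pointers, leading to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95.]
When a non-leaf node splits, its middle key is moved (not copied) from the lower-level non-leaf node to the upper-level non-leaf node.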

Sample leaf node
[Figure: leaf node with keys 57, 81, 95, reached from a non-leaf node; one pointer per key leads to the record with key 57, 81 and 95 respectively, and a final pointer leads to the next leaf in sequence.]
When a leaf node splits, a key is copied (not moved) from the leaf node to the non-leaf node: it also stays in the leaf, since every key must appear in some leaf.
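The two split rules, copy up from a leaf and move up from a non-leaf, can be sketched on plain Python lists. This is a minimal illustration that assumes the caller passes a node already holding one key too many. `split_leaf` and `split_nonleaf` are hypothetical names. Letting the left leaf keep ⌈len/2⌉ keys is one common choice, not the only one.

```python
def split_leaf(keys):
    """Split an overfull leaf's sorted keys into two leaves.

    The first key of the right leaf is COPIED up: it stays in the leaf and
    also becomes the separator in the parent.
    """
    mid = (len(keys) + 1) // 2           # left leaf keeps ceil(len/2) keys
    left, right = keys[:mid], keys[mid:]
    return left, right[0], right


def split_nonleaf(keys, children):
    """Split an overfull non-leaf (len(children) == len(keys) + 1).

    The middle key is MOVED up to the parent and kept in neither half.
    """
    mid = len(keys) // 2
    up = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, up, right


# Overfull leaf 3, 5, 7, 11 (n=4 allows 3 keys): 7 is copied up.
print(split_leaf([3, 5, 7, 11]))    # ([3, 5], 7, [7, 11])
# Overfull non-leaf 120, 150, 160, 180: 160 is moved up.
print(split_nonleaf([120, 150, 160, 180], ["c0", "c1", "c2", "c3", "c4"]))
# (([120, 150], ['c0', 'c1', 'c2']), 160, ([180], ['c3', 'c4']))
```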

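n=4
[Figure: leaf node with keys 30 and 35, whose pointers lead to the records with those keys; non-leaf node with key 30, whose pointers lead to keys < 30 and keys ≥ 30.]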

Size of nodes: n pointers, n-1 keys

Don’t want nodes to be too empty
Use at least:
– Root: 2 pointers
– Non-leaf: ⌈n/2⌉ pointers
– Leaf: ⌈(n-1)/2⌉ keys

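Full node vs. minimum node, n=4
[Figure: non-leaf: full node 120, 150, 180 (4 pointers), minimum node 30 (2 pointers); leaf: full node 3, 5, 11, minimum node 30, 35. The leaf's sequence pointer counts toward its n pointers even if it is null.]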

B+tree rules: tree of order n
(1) All leaves are at the same lowest level (balanced tree).
(2) Pointers in leaves point to records, except for the “sequence pointer” to the next leaf.

(3) Number of pointers/keys for a B+tree

                      Max ptrs   Max keys   Min ptrs     Min keys
Non-leaf (non-root)   n          n-1        ⌈n/2⌉        ⌈n/2⌉-1
Leaf (non-root)       n          n-1        ⌈(n-1)/2⌉    ⌈(n-1)/2⌉
Root                  n          n-1        2            1

For leaves, the minimum pointer count covers pointers to data records only.
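The limits in the table can be computed directly for any order n. This is a minimal sketch assuming the slides' convention of n pointers and n-1 keys per node. `node_bounds` is a hypothetical name, and the ceilings use integer arithmetic to avoid floating point.

```python
def node_bounds(n):
    """Limits for a B+tree of order n (n pointers, n-1 keys per node).

    Each tuple is (max_ptrs, max_keys, min_ptrs, min_keys), as in the table;
    for leaves, min_ptrs counts data pointers only, not the sequence pointer.
    """
    half_up = (n + 1) // 2     # ceil(n / 2)
    leaf_min = n // 2          # ceil((n - 1) / 2)
    return {
        "non-leaf": (n, n - 1, half_up, half_up - 1),
        "leaf": (n, n - 1, leaf_min, leaf_min),
        "root": (n, n - 1, 2, 1),
    }


print(node_bounds(4))
# {'non-leaf': (4, 3, 2, 1), 'leaf': (4, 3, 2, 2), 'root': (4, 3, 2, 1)}
```

For n=4 this agrees with the minimum nodes on the "Full node vs. minimum node" slide: a non-leaf with one key (30) and two pointers, and a leaf with two keys (30, 35).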

Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

(a) Insert key = 32, n=4
[Figure: below root 100, non-leaf node 30 over leaves 3, 5, 11 | 30, 31. Leaf 30, 31 has room, so 32 is simply added: 30, 31, 32.]

(b) Insert key = 7, n=4
[Figure: leaf 3, 5, 11 is full, so inserting 7 splits it into 3, 5 and 7, 11; 7 is copied into the parent non-leaf node, which becomes 7, 30.]

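(c) Insert key = 160, n=4
[Figure: under root 100, non-leaf node 120, 150, 180 over leaves ... | 150, 156, 179 | 180, 200. Leaf 150, 156, 179 is full, so it splits into 150, 156 and 160, 179, and 160 is copied up. The non-leaf would then need 4 keys and 5 pointers, so it splits into 120, 150 and 180, and 160 moves up to the root, which becomes 100, 160.]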

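(d) New root, insert 45, n=4
[Figure: root 10, 20, 30 over leaves 1, 2, 3 | 10, 12 | 20, 25 | 30, 32, 40. Leaf 30, 32, 40 splits into 30, 32 and 40, 45, and 40 is copied up. The root overflows (10, 20, 30, 40), splits into 10, 20 and 40, and 30 moves up into a new root.]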
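The four insertion cases fit in one recursive routine. It splits a leaf and copies its separator up, splits a non-leaf and moves its middle key up, and grows a new root when the old root splits. This is a minimal in-memory sketch under stated assumptions: n=4 as on the slides, unique keys, and leaves that store bare keys in place of (key, record pointer) pairs. `Node`, `_insert`, `insert` and `leaf_keys` are hypothetical names.

```python
from bisect import bisect_right, insort

N = 4  # pointers per node, so at most N - 1 = 3 keys (the slides' n=4)


class Node:
    """B+tree node; children is None for a leaf."""

    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children
        self.next = None  # leaf sequence pointer


def _insert(node, key):
    """Insert key under node; return (separator, new right node) if node split."""
    if node.children is None:
        insort(node.keys, key)
        if len(node.keys) <= N - 1:
            return None                            # (a) space available in leaf
        mid = (len(node.keys) + 1) // 2            # (b) leaf overflow: split
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        right.next, node.next = node.next, right   # keep the leaf chain intact
        return right.keys[0], right                # separator is copied up
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.children) <= N:
        return None
    mid = len(node.keys) // 2                      # (c) non-leaf overflow: split
    up = node.keys[mid]                            # separator is moved up
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right


def insert(root, key):
    """Insert key; return the root, which changes only if the old root split."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])              # (d) new root


def leaf_keys(root):
    """All keys in order, read by following the leaf sequence pointers."""
    node = root
    while node.children is not None:
        node = node.children[0]
    out = []
    while node is not None:
        out.extend(node.keys)
        node = node.next
    return out


# Slide (d): root 10, 20, 30 over leaves 1 2 3 | 10 12 | 20 25 | 30 32 40.
leaves = [Node([1, 2, 3]), Node([10, 12]), Node([20, 25]), Node([30, 32, 40])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = insert(Node([10, 20, 30], leaves), 45)
print(root.keys, [c.keys for c in root.children])   # [30] [[10, 20], [40]]

# Building from scratch: the leaf chain always yields the keys in order.
t = Node()
for k in [5, 1, 9, 3, 7, 2, 8, 4, 6, 10, 12, 11]:
    t = insert(t, k)
print(leaf_keys(t))   # keys 1 through 12 in ascending order
```

The tree only grows at the root, which is why all leaves stay at the same depth.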

Deletion from B+tree
(a) Simple case: no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

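(b) Coalesce with sibling: delete 50, n=5
[Figure: parent keys 10, 40, 100 over leaves 10, 20, 30 | 40, 50. After deleting 50, leaf 40 holds fewer than 2 keys; it coalesces with its sibling into 10, 20, 30, 40, and key 40 is removed from the parent.]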

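(c) Redistribute keys: delete 50, n=5
[Figure: parent keys 10, 40, 100 over leaves 10, 20, 30, 35 | 40, 50. After deleting 50, leaf 40 is too small but cannot coalesce (5 keys would not fit), so 35 moves over from its sibling: leaves 10, 20, 30 | 35, 40, and the parent key 40 becomes 35.]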
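Cases (b) and (c) are one decision at the leaf level. If the underfull leaf and its sibling fit in one node, coalesce them; otherwise redistribute their keys. This is a minimal sketch assuming n=5 as on the slides, leaves given as sorted key lists in left-to-right order, and a hypothetical function name. It splits the combined keys evenly, which for slide (c) amounts to borrowing the single key 35. The caller must still update the parent's separator.

```python
N = 5  # pointers per node: at most 4 keys per leaf, at least ceil(4/2) = 2


def fix_leaf_underflow(left, right):
    """Repair two adjacent sibling leaves after one dropped below the minimum.

    Returns (leaves, separator):
      coalesce:     ([merged], None); the parent drops its separator and the
                    pointer to the vanished leaf.
      redistribute: ([left, right], new_sep); the parent replaces its
                    separator between the two leaves with new_sep.
    """
    combined = left + right
    if len(combined) <= N - 1:
        return [combined], None                    # (b) coalesce with sibling
    mid = (len(combined) + 1) // 2                 # (c) redistribute keys
    new_left, new_right = combined[:mid], combined[mid:]
    return [new_left, new_right], new_right[0]


# Slide (b): after deleting 50, leaf [40] coalesces with [10, 20, 30].
print(fix_leaf_underflow([10, 20, 30], [40]))       # ([[10, 20, 30, 40]], None)
# Slide (c): after deleting 50, [40] takes 35 from [10, 20, 30, 35].
print(fix_leaf_underflow([10, 20, 30, 35], [40]))   # ([[10, 20, 30], [35, 40]], 35)
```

When a coalesce removes a separator from the parent, the parent itself may become underfull. That is case (d), where the same decision repeats one level up.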

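(d) Non-leaf coalesce: delete 37, n=5
[Figure: root 25; non-leaf nodes 10, 20 and 30, 40; leaves 1, 3 | 10, 14 | 20, 22 | 25, 26 | 30, 37 | 40, 45. Deleting 37 leaves leaf 30 too small, so it coalesces with a sibling leaf; its parent then has too few pointers and coalesces with its sibling 10, 20, pulling the root key 25 down. The merged non-leaf node becomes the new root.]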

B+tree deletions in practice
– Often, coalescing is not implemented: too hard and not worth it!

Index Definition in SQL
Create an index:
    create index <index-name> on <relation-name> (<attribute-list>)
E.g.: create index gindex on country(gdp);
To drop an index:
    drop index <index-name>
E.g.: drop index gindex;