
Chapter 12: Indexing and Hashing

- Indexing
  - Basic Concepts
  - Ordered Indices
  - B+-Tree Index Files
- Hashing
  - Static
  - Dynamic Hashing
- More: bitmap indexing


Hashing

- Static hashing
- Dynamic hashing


Hashing

[Figure: a key is mapped by the hash function to h(key), which selects one of the buckets (typically 1 disk block each) where the <key> entry is stored.]


Example hash function

- Key = 'x1 x2 … xn', an n-byte character string
- Have b buckets
- h: add x1 + x2 + … + xn, then compute the sum modulo b
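A minimal sketch of the example hash function, assuming the key is encoded as UTF-8 so that each xi is one byte. The function name `h` follows the slide; the parameter names are chosen here for illustration.

```python
def h(key: str, b: int) -> int:
    """Add the bytes x1 + x2 + ... + xn of the key, then reduce modulo b."""
    if b <= 0:
        raise ValueError("need at least one bucket")
    return sum(key.encode("utf-8")) % b


# 'a' is byte 97 and 'b' is byte 98, so the sum is 195 and 195 % 4 is 3.
print(h("ab", 4))
```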


This may not be the best function …

Good hash function: the expected number of keys per bucket is the same for all buckets.
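One way to see why the byte-sum function may fall short of this goal: reordering the bytes of a key never changes the sum, so all permutations of a key land in the same bucket. The sketch below counts keys per bucket for a small, made-up key set; `bucket_counts` is a helper name introduced here.

```python
from collections import Counter


def h(key: str, b: int) -> int:
    """Byte-sum hash from the previous slide."""
    return sum(key.encode("utf-8")) % b


def bucket_counts(keys, b):
    """Return the number of keys that hash to each of the b buckets."""
    counts = Counter(h(k, b) for k in keys)
    return [counts.get(i, 0) for i in range(b)]


# Six permutations of the same three bytes (sum 294, and 294 % 4 is 2).
keys = ["abc", "acb", "bac", "bca", "cab", "cba"]
print(bucket_counts(keys, 4))  # [0, 0, 6, 0]
```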


Within a bucket:

Do we keep keys sorted?

Yes, if CPU time is critical and inserts/deletes are not too frequent.
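The trade-off can be sketched with a bucket that keeps its keys in order: lookups use binary search, while each insert or delete shifts keys to preserve the order. The class name `SortedBucket` is an assumption made for this illustration.

```python
import bisect


class SortedBucket:
    """Bucket whose keys stay sorted: cheap lookups, costlier updates."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        # insort finds the position by binary search, then shifts later keys.
        bisect.insort(self.keys, key)

    def contains(self, key):
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key

    def delete(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            del self.keys[i]
            return True
        return False


bucket = SortedBucket()
for k in ["c", "a", "b"]:
    bucket.insert(k)
print(bucket.keys)
```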


Next: example to illustrate inserts, overflows, deletes

[Figure: h(K) selects the bucket.]


EXAMPLE: 2 records/bucket

INSERT:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0

Bucket 0: d
Bucket 1: a, c
Bucket 2: b
Bucket 3: (empty)

h(e) = 1: bucket 1 is already full, so e goes into an overflow block attached to bucket 1.
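The insert example can be replayed with a small sketch of static hashing with overflow blocks. Everything here is illustrative: `StaticHashFile` is a hypothetical class, each bucket is modelled as a chain of in-memory lists rather than disk blocks, and the hash values are taken from the example (h(a) = 1, and so on).

```python
class StaticHashFile:
    """Fixed number of buckets; a full bucket grows a chain of overflow blocks."""

    def __init__(self, num_buckets, capacity, hash_fn):
        self.capacity = capacity
        self.hash_fn = hash_fn
        # Each bucket is a list of blocks; block 0 is the primary block.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[self.hash_fn(key)]
        for block in chain:
            if len(block) < self.capacity:
                block.append(key)
                return
        chain.append([key])  # every block is full: add an overflow block

    def lookup(self, key):
        return any(key in block for block in self.buckets[self.hash_fn(key)])

    def delete(self, key):
        chain = self.buckets[self.hash_fn(key)]
        for block in chain:
            if key in block:
                block.remove(key)
                break
        else:
            return False
        # Drop overflow blocks that became empty; keep the primary block.
        chain[1:] = [blk for blk in chain[1:] if blk]
        return True


H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}  # hash values from the example
f = StaticHashFile(num_buckets=4, capacity=2, hash_fn=H.__getitem__)
for key in "abcde":
    f.insert(key)
print(f.buckets)
```

This sketch does not move keys from an overflow block up into a freed slot of the primary block; it only discards overflow blocks once they are empty.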


EXAMPLE: deletion

[Figure: buckets 0 to 3 holding the keys a, b, c, d, e, f and g.]

Delete: e, f

After f is deleted from the block that holds f and g, maybe move "g" up.


Rule of thumb:

- Try to keep space utilization between 50% and 80%
- Utilization = (# keys used) / (total # keys that fit)
- If < 50%, wasting space
- If > 80%, overflows become significant; how significant depends on how good the hash function is and on # keys/bucket
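As a worked instance of the utilization formula, the sketch below assumes "total # keys that fit" means buckets times keys per bucket, ignoring overflow blocks. The function names and the default thresholds (taken from the 50% and 80% rule) are choices made here.

```python
def utilization(keys_used: int, num_buckets: int, keys_per_bucket: int) -> float:
    """Utilization = # keys used / total # keys that fit."""
    return keys_used / (num_buckets * keys_per_bucket)


def in_target_range(u: float, low: float = 0.5, high: float = 0.8) -> bool:
    """True when utilization lies within the rule-of-thumb range."""
    return low <= u <= high


# 5 keys stored in 4 buckets of 2 keys each: 5 / 8 = 0.625.
u = utilization(5, 4, 2)
print(u, in_target_range(u))
```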


How do we cope with growth?

- Overflows and reorganizations
- Dynamic hashing
  - Extensible
  - Linear


Extensible hashing: two ideas

(a) Use i of b bits output by hash function

[Figure: h(K) outputs b bits, for example 00110101; the first i bits are used, and i grows over time.]
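Idea (a) can be sketched as a bit shift, assuming the i bits are taken from the leading (most significant) end of the b-bit hash value, as in the example on the following slides. `leading_bits` is a name introduced for this illustration.

```python
def leading_bits(hash_value: int, i: int, b: int) -> int:
    """Return the first i of the b bits of hash_value, as an integer."""
    if not 0 <= i <= b:
        raise ValueError("i must be between 0 and b")
    return hash_value >> (b - i)


hash_value = 0b00110101  # the 8-bit value shown on the slide
for i in range(1, 5):
    # As i grows, more leading bits take part in choosing the bucket.
    print(i, format(leading_bits(hash_value, i, 8), f"0{i}b"))
```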


(b) Use directory

[Figure: the i-bit value h(K)[i] indexes a directory whose entries point to buckets.]


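Example: h(k) is 4 bits; 2 keys/bucket

Before (i = 1):
- directory entry 0: bucket {0001}, uses 1 bit
- directory entry 1: bucket {1001, 1100}, uses 1 bit

Insert 1010: the bucket for entry 1 is full, so it is split and the directory doubles.

New directory (i = 2):
- entries 00 and 01: bucket {0001}, uses 1 bit
- entry 10: bucket {1001, 1010}, uses 2 bits
- entry 11: bucket {1100}, uses 2 bits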


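Example continued

Before (i = 2):
- entries 00 and 01: bucket {0001}, uses 1 bit
- entry 10: bucket {1001, 1010}, uses 2 bits
- entry 11: bucket {1100}, uses 2 bits

Insert: 0111, 0000

0111 fits next to 0001. Inserting 0000 then overflows that bucket, which is split on its second bit; the directory stays at i = 2.

After:
- entry 00: bucket {0000, 0001}, uses 2 bits
- entry 01: bucket {0111}, uses 2 bits
- entry 10: bucket {1001, 1010}, uses 2 bits
- entry 11: bucket {1100}, uses 2 bits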


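Example continued

Before (i = 2):
- entry 00: bucket {0000, 0001}, uses 2 bits
- entry 01: bucket {0111}, uses 2 bits
- entry 10: bucket {1001, 1010}, uses 2 bits
- entry 11: bucket {1100}, uses 2 bits

Insert: 1001

The bucket for entry 10 is full and already uses 2 bits, so it is split on the third bit and the directory doubles.

New directory (i = 3):
- entries 000 and 001: bucket {0000, 0001}, uses 2 bits
- entries 010 and 011: bucket {0111}, uses 2 bits
- entry 100: bucket {1001, 1001}, uses 3 bits
- entry 101: bucket {1010}, uses 3 bits
- entries 110 and 111: bucket {1100}, uses 2 bits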
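The whole insert example (0001, 1001, 1100, 1010, 0111, 0000, 1001) can be replayed with a small in-memory sketch of extensible hashing. This is an illustration under assumptions, not the slides' own implementation: the class and method names are invented here, keys are stored as their 4-bit hash values, and a bucket that is still full after all b bits are used raises an error instead of getting an overflow block.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # number of leading hash bits this bucket uses
        self.keys = []


class ExtensibleHash:
    def __init__(self, b=4, capacity=2):
        self.b = b                # bits output by the hash function
        self.capacity = capacity  # keys per bucket
        self.i = 1                # bits currently used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, h):
        return h >> (self.b - self.i)  # first i bits of h

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        while len(bucket.keys) >= self.capacity:
            if bucket.depth == self.b:
                raise OverflowError("all b bits in use; overflow block needed")
            if bucket.depth == self.i:
                # Double the directory: every entry is duplicated in place.
                self.directory = [bkt for bkt in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(bucket)
            bucket = self.directory[self._slot(h)]
        bucket.keys.append(h)

    def _split(self, bucket):
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        shift = self.b - bucket.depth  # position of the newly used bit
        old = bucket.keys
        bucket.keys = [k for k in old if (k >> shift) & 1 == 0]
        sibling.keys = [k for k in old if (k >> shift) & 1 == 1]
        # Repoint the directory entries whose newly used bit is 1.
        for slot, bkt in enumerate(self.directory):
            if bkt is bucket and (slot >> (self.i - bucket.depth)) & 1:
                self.directory[slot] = sibling

    def dump(self):
        """Map each directory entry (as a bit string) to its bucket's keys."""
        return {
            format(slot, f"0{self.i}b"): [
                format(k, f"0{self.b}b") for k in sorted(bkt.keys)
            ]
            for slot, bkt in enumerate(self.directory)
        }


table = ExtensibleHash(b=4, capacity=2)
for h in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
    table.insert(h)
for entry, keys in table.dump().items():
    print(entry, keys)
```

Directory entries that share a bucket (for example 110 and 111) print the same key list, which mirrors two pointers to one block.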


Extensible hashing: deletion

- No merging of blocks
- Merge blocks and cut directory if possible (reverse insert procedure)


Deletion example:

Run through the insert example in reverse!


Summary: Extensible hashing

(+) Can handle growing files
    - with less wasted space
    - with no full reorganizations
(-) Indirection (not bad if directory in memory)
(-) Directory doubles in size (now it fits, now it does not)


Advanced indexing

- Multiple attributes
- Bitmap indexing


Multiple-Key Access

Use multiple indices for certain types of queries.

Example:

    select account-number
    from account
    where branch-name = "Perryridge" and balance = 1000

Possible strategies?
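Three strategies that are commonly considered for this query are sketched below: use the branch-name index and filter on balance, use the balance index and filter on branch-name, or look up both indices and intersect the record pointers. The rows, the dictionary-based indices and all function names are hypothetical stand-ins; a real system would use B+-tree or hash indices and record identifiers on disk.

```python
from collections import defaultdict

# Hypothetical rows of the account relation (not from the slides).
ACCOUNTS = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 1000},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Perryridge", "balance": 1000},
    {"account_number": "A-215", "branch_name": "Mianus", "balance": 700},
    {"account_number": "A-218", "branch_name": "Perryridge", "balance": 1000},
]


def build_index(rows, column):
    """Map each value of the column to the set of record ids holding it."""
    index = defaultdict(set)
    for rid, row in enumerate(rows):
        index[row[column]].add(rid)
    return index


BRANCH_INDEX = build_index(ACCOUNTS, "branch_name")
BALANCE_INDEX = build_index(ACCOUNTS, "balance")


def via_branch_index(branch, balance):
    """Strategy 1: fetch by branch-name, then test balance on each record."""
    rids = BRANCH_INDEX.get(branch, set())
    return sorted(ACCOUNTS[r]["account_number"]
                  for r in rids if ACCOUNTS[r]["balance"] == balance)


def via_balance_index(branch, balance):
    """Strategy 2: fetch by balance, then test branch-name on each record."""
    rids = BALANCE_INDEX.get(balance, set())
    return sorted(ACCOUNTS[r]["account_number"]
                  for r in rids if ACCOUNTS[r]["branch_name"] == branch)


def via_intersection(branch, balance):
    """Strategy 3: intersect the pointer sets, then fetch only the matches."""
    rids = BRANCH_INDEX.get(branch, set()) & BALANCE_INDEX.get(balance, set())
    return sorted(ACCOUNTS[r]["account_number"] for r in rids)


print(via_intersection("Perryridge", 1000))
```

All three return the same accounts; they differ in how many records must be fetched before the second condition is checked.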


Indices on Multiple Attributes

where branch-name = "PP" and balance = 1000

Suppose we have an index on combined search-key (branch-name, balance).

[Figure: B+-tree on (branch-name, balance); entries are ordered by branch-name first and by balance within a branch. The leaf entries for branch PP are (PP, 300), (PP, 800), (PP, 1000), (PP, 1300), (PP, 1500), (PP, 1560).]


where branch-name = "PP" and balance < 1000

Suppose we have an index on combined search-key (branch-name, balance).

[Figure: the same B+-tree on (branch-name, balance). Search for (PP, 0), then scan the leaf entries up to (PP, 1000).]


where branch-name < "PP" and balance = 1000?

Suppose we have an index on combined search-key (branch-name, balance).

[Figure: the same B+-tree on (branch-name, balance).]

NO! The entries with balance = 1000 are scattered across every branch below "PP", so the index cannot narrow the search to one contiguous range.
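The three where clauses can be compared on a sorted list of (branch-name, balance) pairs, used here as a stand-in for the leaf level of the B+-tree. The entries are a small hypothetical subset, the function names are invented, and the (branch, 0) lower bound assumes balances are never negative.

```python
import bisect

# Hypothetical leaf entries, sorted by branch-name and then by balance.
ENTRIES = sorted([
    ("AA", 2000), ("BB", 1000), ("CC", 200),
    ("PP", 300), ("PP", 800), ("PP", 1000),
    ("PP", 1300), ("PP", 1500), ("PP", 1560),
])


def branch_eq_balance_eq(branch, balance):
    """branch-name = X and balance = Y: one point lookup."""
    lo = bisect.bisect_left(ENTRIES, (branch, balance))
    hi = bisect.bisect_right(ENTRIES, (branch, balance))
    return ENTRIES[lo:hi]


def branch_eq_balance_lt(branch, balance):
    """branch-name = X and balance < Y: one contiguous range."""
    lo = bisect.bisect_left(ENTRIES, (branch, 0))        # search (X, 0)
    hi = bisect.bisect_left(ENTRIES, (branch, balance))  # search (X, Y)
    return ENTRIES[lo:hi]


def branch_lt_balance_eq(branch, balance):
    """branch-name < X and balance = Y: every entry below X is scanned."""
    hi = bisect.bisect_left(ENTRIES, (branch, 0))
    scanned = ENTRIES[:hi]
    matches = [entry for entry in scanned if entry[1] == balance]
    return matches, len(scanned)


print(branch_eq_balance_eq("PP", 1000))
print(branch_eq_balance_lt("PP", 1000))
print(branch_lt_balance_eq("PP", 1000))
```

In the last case the second return value counts the entries examined, which grows with the number of branches below the bound rather than with the number of matches.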


Bitmap Indices

An index designed for multiple-valued search keys


Bitmap Indices (Cont.)

[Figure: a table with gender and income-level columns, with one bitmap for each unique value of gender and one for each unique value of income-level. Each bitmap has size = table size. The income-level value of record 3 is L1.]


Bitmap Indices (Cont.)

Some properties of bitmap indices:
- Number of bitmaps for each attribute?
- Size of each bitmap?
- When is the bitmap matrix sparse, and what attributes are good for bitmap indices?
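A small sketch can help explore these questions. Under the usual construction there is one bitmap per distinct attribute value and one bit per record, so the share of 1 bits in the matrix is 1 / (number of distinct values). The five-record table below is hypothetical, chosen so that the bitmaps for males and for income level L1 come out as 10010 and 10100; `build_bitmaps` is a name introduced here.

```python
def build_bitmaps(values, domain):
    """One bitmap per value in the domain; bit r is 1 if record r has it."""
    return {
        v: "".join("1" if x == v else "0" for x in values)
        for v in domain
    }


# Hypothetical columns of a 5-record table (leftmost bit is the first record).
gender = ["m", "f", "f", "m", "f"]
income_level = ["L1", "L2", "L1", "L4", "L3"]

gender_bitmaps = build_bitmaps(gender, ["m", "f"])
income_bitmaps = build_bitmaps(income_level, ["L1", "L2", "L3", "L4", "L5"])

print(gender_bitmaps)
print(income_bitmaps)

# Each record sets exactly one bit per attribute, so with 5 income levels
# only 1 bit in 5 of the income-level matrix is set.
ones = sum(bm.count("1") for bm in income_bitmaps.values())
total = sum(len(bm) for bm in income_bitmaps.values())
print(ones, total)
```

The more distinct values an attribute has, the more bitmaps are needed and the sparser they become, which is why attributes with few distinct values are the usual candidates.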


Bitmap Indices (Cont.)

- Bitmap indices are generally very small compared with the relation size
  - E.g. if a record is 100 bytes, the space for a single bitmap is 1/800 of the space used by the relation.
  - If the number of distinct attribute values is 8, the bitmaps together are only 1% of the relation size.
- What about insertion? Deletion?
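The arithmetic behind these figures: a 100-byte record occupies 800 bits, while each bitmap spends 1 bit per record. A minimal sketch, with `bitmap_fraction` as an illustrative name:

```python
from fractions import Fraction


def bitmap_fraction(record_bytes: int, distinct_values: int = 1) -> Fraction:
    """Space of the bitmaps relative to the relation: 1 bit per record each."""
    return Fraction(distinct_values, record_bytes * 8)


print(bitmap_fraction(100))     # 1/800 for a single bitmap
print(bitmap_fraction(100, 8))  # 1/100, i.e. 1%, for 8 distinct values
```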


Bitmap Indices Queries

Sample query: Males with income level L1

10010 AND 10100 = 10000

What about the number of males with income level L1?

Even faster! Count the 1 bits in the result; no records need to be fetched.
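The sample query can be reproduced with integer bitwise operations. This sketch assumes 5 records with the leftmost bit standing for the first record (numbered 0 here); the helper names are illustrative.

```python
NUM_RECORDS = 5
MALES = 0b10010      # bitmap for gender = male
INCOME_L1 = 0b10100  # bitmap for income level = L1


def matching_records(bitmap: int, n: int):
    """Positions of the 1 bits, reading the n-bit bitmap left to right."""
    bits = format(bitmap, f"0{n}b")
    return [r for r, bit in enumerate(bits) if bit == "1"]


def count_matches(bitmap: int) -> int:
    """Number of qualifying records: just count the 1 bits."""
    return bin(bitmap).count("1")


both = MALES & INCOME_L1
print(format(both, "05b"))                  # 10000
print(matching_records(both, NUM_RECORDS))  # [0]
print(count_matches(both))                  # 1
```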


Bitmap Indices Queries

Queries are answered using bitmap operations:
- Intersection (and)
- Union (or)
- Complementation (not)
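Union and complementation can be sketched the same way as intersection. One detail worth noting: on Python integers, complement must be masked to the table size, otherwise bits beyond the last record would appear set. The 5-record size and the bitmap values are assumptions carried over from the sample query.

```python
NUM_RECORDS = 5
MASK = (1 << NUM_RECORDS) - 1  # 0b11111: one bit per record


def bm_and(a: int, b: int) -> int:
    return a & b


def bm_or(a: int, b: int) -> int:
    return a | b


def bm_not(a: int) -> int:
    # Mask so that only the NUM_RECORDS real positions can be 1.
    return ~a & MASK


males = 0b10010
income_l1 = 0b10100

print(format(bm_or(males, income_l1), "05b"))           # males or L1
print(format(bm_not(males), "05b"))                     # not male
print(format(bm_and(bm_not(males), income_l1), "05b"))  # L1 and not male
```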