
1

Chapter 12: Indexing and Hashing

Indexing
  Basic Concepts
  Ordered Indices
  B+-Tree Index Files

Hashing
  Static
  Dynamic Hashing

More: bitmap indexing

2

Hashing

Static hashing
Dynamic hashing

3

Hashing

[Figure: key -> h(key) -> one of several buckets holding <key> entries. Buckets are typically 1 disk block.]

4

Example hash function

Key = ‘x1 x2 … xn’, an n-byte character string
Have b buckets
h: add x1 + x2 + … + xn, then compute the sum modulo b
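A minimal sketch of this function in Python, assuming the key is encoded as UTF-8 bytes (the name `h` follows the slide; the sample keys are only illustrative):

```python
def h(key: str, b: int) -> int:
    """Add the byte values x1 + x2 + ... + xn of the key, then take the sum modulo b."""
    return sum(key.encode("utf-8")) % b


# 'abc' has byte values 97 + 98 + 99 = 294, and 294 mod 4 = 2.
print(h("abc", 4))

# Keys built from the same bytes in a different order always collide,
# because addition ignores byte order.
print(h("abc", 7) == h("cba", 7))
```

The second call shows one weakness of a plain byte sum: reordering the characters of a key never changes its bucket.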

5

This may not be the best function …

Good hash function: the expected number of keys/bucket is the same for all buckets

6

Within a bucket:

Do we keep keys sorted?

Yes, if CPU time is critical and inserts/deletes are not too frequent

7

Next: example to illustrate inserts, overflows, deletes

8

EXAMPLE: 2 records/bucket

INSERT:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0

Bucket 0: d
Bucket 1: a, c
Bucket 2: b
Bucket 3: (empty)

h(e) = 1: bucket 1 is already full, so e goes into an overflow block chained to bucket 1.
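The insert sequence above can be replayed with a small sketch. The class name `StaticHashFile`, the list-of-blocks layout, and the lookup table standing in for h are assumptions made for illustration, not part of the slides:

```python
BUCKET_CAPACITY = 2  # 2 records/bucket, as in the example


class StaticHashFile:
    """Static hashing: a fixed number of buckets, each with a chain of overflow blocks."""

    def __init__(self, num_buckets, hash_fn):
        self.hash_fn = hash_fn
        # Each bucket is a list of blocks; block 0 is the primary block.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[self.hash_fn(key)]
        for block in chain:
            if len(block) < BUCKET_CAPACITY:
                block.append(key)
                return
        # Every block in the chain is full: add an overflow block.
        chain.append([key])


# Hash values taken from the example; a lookup table stands in for h.
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
f = StaticHashFile(4, H.__getitem__)
for key in "abcde":
    f.insert(key)

for number, chain in enumerate(f.buckets):
    print(number, chain)
```

Bucket 1 ends up with a primary block holding a and c plus one overflow block holding e.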

9

EXAMPLE: deletion

[Figure: buckets 0 to 3, 2 records/bucket, holding the keys a to g, with an overflow block.]

Delete: e, f

After f is deleted, maybe move “g” up.

10

Rule of thumb: try to keep space utilization between 50% and 80%

Utilization = (# keys used) / (total # keys that fit)

If < 50%, wasting space
If > 80%, overflows significant
  (depends on how good the hash function is and on # keys/bucket)
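As a worked instance of the formula, here is a short sketch; the function names and the sample numbers (5 keys in 4 buckets of 2 keys each) are assumptions chosen for illustration:

```python
def utilization(keys_used: int, num_buckets: int, keys_per_bucket: int) -> float:
    """Utilization = (# keys used) / (total # keys that fit)."""
    return keys_used / (num_buckets * keys_per_bucket)


def assess(u: float) -> str:
    """Apply the 50%-80% rule of thumb."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "ok"


u = utilization(keys_used=5, num_buckets=4, keys_per_bucket=2)
print(u, assess(u))  # 5 / 8 = 0.625, inside the 50%-80% band
```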

11

How do we cope with growth?

Overflows and reorganizations
Dynamic hashing
  Extensible
  Linear

12

Extensible hashing: two ideas

(a) Use i of b bits output by hash function

[Figure: h(K) produces a b-bit value, e.g. 00110101; the first i bits are used, and i grows over time.]
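A small sketch of idea (a), assuming the i bits used are the leading (most significant) bits of the b-bit hash value; the helper name `prefix` is hypothetical:

```python
def prefix(hash_value: int, i: int, b: int = 8) -> int:
    """Return the first (most significant) i bits of a b-bit hash value."""
    return hash_value >> (b - i)


h_k = 0b00110101  # the 8-bit sample value from the slide
for i in range(1, 5):
    # As i grows, more leading bits of the same hash value are used.
    print(i, format(prefix(h_k, i), f"0{i}b"))
```

The hash value itself never changes; only the number of bits consulted does, which is what lets the structure grow without rehashing every key.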

13

(b) Use directory

[Figure: h(K)[i] indexes a directory whose entries point to buckets.]

14

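Example: h(k) is 4 bits; 2 keys/bucket

i = 1, directory 0, 1:
  0 -> bucket (uses 1 bit): 0001
  1 -> bucket (uses 1 bit): 1001, 1100

Insert 1010

The bucket for prefix 1 is full, so it is split and the directory doubles.

New directory, i = 2:
  00, 01 -> bucket (uses 1 bit): 0001
  10     -> bucket (uses 2 bits): 1001, 1010
  11     -> bucket (uses 2 bits): 1100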

15

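Example continued

i = 2, directory 00, 01, 10, 11:
  00, 01 -> bucket (uses 1 bit): 0001
  10     -> bucket (uses 2 bits): 1001, 1010
  11     -> bucket (uses 2 bits): 1100

Insert: 0111, 0000

0111 fits in the bucket holding 0001. 0000 then overflows that bucket, which is split on its second bit; the directory does not grow:
  00 -> bucket (uses 2 bits): 0000, 0001
  01 -> bucket (uses 2 bits): 0111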

16

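Example continued

i = 2, directory 00, 01, 10, 11:
  00 -> bucket (uses 2 bits): 0000, 0001
  01 -> bucket (uses 2 bits): 0111
  10 -> bucket (uses 2 bits): 1001, 1010
  11 -> bucket (uses 2 bits): 1100

Insert: 1001

The bucket for prefix 10 is full and already uses all i = 2 bits, so the directory doubles.

New directory, i = 3: 000, 001, 010, 011, 100, 101, 110, 111
  100 -> bucket (uses 3 bits): 1001, 1001
  101 -> bucket (uses 3 bits): 1010
  The other buckets are unchanged; each is now pointed to by two directory entries.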
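The whole insert example can be replayed with the following sketch. It is one possible implementation under stated assumptions: the class names, the `depth` field (the number of bits a bucket uses), and the choice to raise an error when all 4 bits are exhausted are illustrative and not taken from the slides.

```python
B = 4         # h(k) is 4 bits
CAPACITY = 2  # 2 keys/bucket


class Bucket:
    def __init__(self, depth):
        self.depth = depth  # number of leading hash bits this bucket uses
        self.keys = []


class ExtensibleHash:
    def __init__(self):
        self.i = 1  # number of leading bits used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, h):
        return h >> (B - self.i)  # first i bits of the hash value

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.keys) < CAPACITY:
            bucket.keys.append(h)
            return
        if bucket.depth == B:
            # Assumption: a real system would fall back to an overflow block here.
            raise OverflowError("bucket full and all hash bits already used")
        if bucket.depth == self.i:
            # Double the directory: each old entry becomes two adjacent entries.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.i += 1
        # Split the full bucket on its next bit.
        bucket.depth += 1
        new_bucket = Bucket(bucket.depth)
        old_keys, bucket.keys = bucket.keys, []
        for slot, b in enumerate(self.directory):
            # Directory entries whose newly used bit is 1 move to the new bucket.
            if b is bucket and (slot >> (self.i - bucket.depth)) & 1:
                self.directory[slot] = new_bucket
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
        self.insert(h)  # retry; splits again if the target bucket is still full


table = ExtensibleHash()
for h in (0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001):
    table.insert(h)

print("i =", table.i)
for slot, b in enumerate(table.directory):
    print(format(slot, "03b"), b.depth, [format(k, "04b") for k in b.keys])
```

Running it ends with i = 3, entries 100 and 101 pointing to separate 3-bit buckets, and every other bucket shared by two directory entries.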

17

Extensible hashing: deletion

Options:
  No merging of blocks
  Merge blocks and cut directory if possible (reverse insert procedure)

18

Deletion example:

Run through the insert example in reverse!

19

Summary: Extensible hashing

+ Can handle growing files
    - with less wasted space
    - with no full reorganizations

- Indirection (not bad if directory in memory)
- Directory doubles in size (now it fits, now it does not)

20

Advanced indexing

Multiple attributes
Bitmap indexing

21

Multiple-Key Access

Use multiple indices for certain types of queries.
Example:

    select account-number
    from account
    where branch-name = “Perryridge” and balance = 1000

Possible strategies?
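The slide leaves the question open. One possible strategy, sketched here as an assumption rather than as the answer the slides intend, is to look up each single-attribute index separately and intersect the resulting sets of record ids. The index contents below are made up for illustration:

```python
# Hypothetical single-attribute indices: value -> set of record ids.
branch_index = {"Perryridge": {1, 4, 7}, "Brighton": {2, 3}}
balance_index = {1000: {3, 4, 9}, 500: {1, 2}}


def lookup_both(branch, balance):
    """Intersect the record ids returned by the two indices."""
    return branch_index.get(branch, set()) & balance_index.get(balance, set())


print(lookup_both("Perryridge", 1000))  # only record 4 satisfies both conditions
```

Other strategies use just one of the two indices and test the remaining condition on each fetched record.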

22

Indices on Multiple Attributes

where branch-name = “PP” and balance = 1000

Suppose we have an index on the combined search key (branch-name, balance).

[Figure: B+-tree on (branch-name, balance). Entries are ordered by branch-name first and then by balance; the leaves include (AA,2000), (AA,2300), (AA,2500), …, (PP,300), (PP,800), (PP,1000), (PP,1300), (PP,1500), (PP,1560).]

23

where branch-name = “PP” and balance < 1000

Suppose we have an index on the combined search key (branch-name, balance).

[Figure: B+-tree on (branch-name, balance). Entries are ordered by branch-name first and then by balance; the leaves include (AA,2000), (AA,2300), (AA,2500), …, (PP,300), (PP,800), (PP,1000), (PP,1300), (PP,1500), (PP,1560). The scan runs from search key (PP,0) to search key (PP,1000).]
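The two searches, (PP,0) and (PP,1000), bracket a contiguous run of leaf entries. A sketch of that idea, using a sorted Python list in place of the B+-tree leaves (the list and the helper `range_scan` are assumptions for illustration; the leaf values are those legible in the figure):

```python
from bisect import bisect_left

# Leaf entries of the combined index, sorted by branch-name and then by balance.
index = sorted([
    ("AA", 2000), ("AA", 2300), ("AA", 2500),
    ("PP", 300), ("PP", 800), ("PP", 1000),
    ("PP", 1300), ("PP", 1500), ("PP", 1560),
])


def range_scan(lo, hi):
    """Return all entries e with lo <= e < hi, located by two binary searches."""
    return index[bisect_left(index, lo):bisect_left(index, hi)]


# branch-name = "PP" and balance < 1000
print(range_scan(("PP", 0), ("PP", 1000)))
```

Because tuples compare on the first component and then the second, the matching entries sit next to each other and no other entries need to be inspected.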

24

where branch-name < “PP” and balance = 1000?

Suppose we have an index on the combined search key (branch-name, balance).

[Figure: B+-tree on (branch-name, balance). Entries are ordered by branch-name first and then by balance; the leaves include (AA,2000), (AA,2300), (AA,2500), …, (PP,300), (PP,800), (PP,1000), (PP,1300), (PP,1500), (PP,1560).]

NO! The index is ordered by branch-name first, so the entries with balance = 1000 are scattered across the branch names instead of being stored contiguously.

25

Bitmap Indices

An index designed for multiple valued search keys

26

Bitmap Indices (Cont.)

[Figure: a relation with one bitmap per unique value of gender and one bitmap per unique value of income-level. Each bitmap has size = table size. Example reading: the income-level value of record 3 is L1.]

27

Bitmap Indices (Cont.)

Some properties of bitmap indices
  Number of bitmaps for each attribute?
  Size of each bitmap?
  When is the bitmap matrix sparse, and what attributes are good for bitmap indices?

28

Bitmap Indices (Cont.)

Bitmap indices are generally very small compared with relation size
  E.g. if a record is 100 bytes, the space for a single bitmap is 1/800 of the space used by the relation.
  If the number of distinct attribute values is 8, the bitmap index is only 1% of the relation size.
What about insertion? Deletion?
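The 1/800 and 1% figures follow from one bit per record per bitmap. A short check of the arithmetic, using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

record_bytes = 100
bits_per_record = record_bytes * 8  # a 100-byte record occupies 800 bits

# One bitmap stores 1 bit per record.
single_bitmap = Fraction(1, bits_per_record)

# One bitmap per distinct attribute value.
distinct_values = 8
whole_index = distinct_values * single_bitmap

print(single_bitmap, whole_index)  # 1/800 of the relation, and 8/800 = 1/100 = 1%
```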

29

Bitmap Indices Queries

Sample query: Males with income level L1

10010 AND 10100 = 10000

What about the number of males with income level L1?

Even faster!

30

Bitmap Indices Queries

Queries are answered using bitmap operations
  Intersection (and)
  Union (or)
  Complementation (not)
