Chapter 12: Indexing and Hashing
Indexing
- Basic Concepts
- Ordered Indices
- B+-Tree Index Files
Hashing
- Static Hashing
- Dynamic Hashing
More: bitmap indexing
Hashing
- Static hashing
- Dynamic hashing
Hashing
[Figure: a key is hashed by h(key) to one of an array of buckets; each bucket (typically 1 disk block) holds the records whose keys hash to it.]
Example hash function
Key = '$x_1 x_2 \ldots x_n$', an $n$-byte character string. Have $b$ buckets.
h: add $x_1 + x_2 + \cdots + x_n$ and compute the sum modulo $b$:
$h(\text{key}) = (x_1 + x_2 + \cdots + x_n) \bmod b$
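A minimal Python sketch of this hash function, assuming keys are ASCII strings so that each character is exactly one byte; the function name `h` mirrors the slide and the bucket count `b` is passed as a parameter.

```python
def h(key: str, b: int) -> int:
    """Add up the key's bytes x1 + x2 + ... + xn, then take the sum modulo b."""
    return sum(key.encode("ascii")) % b

# 'a' = 97 and 'b' = 98, so (97 + 98) mod 4 = 195 mod 4 = 3
print(h("ab", 4))  # 3
```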
This may not be the best function …
Good hash function: the expected number of keys per bucket is the same for all buckets.
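One way to see why the byte-sum hash may not be the best: it ignores character order, so all anagrams of a key land in the same bucket. The sketch below hashes the 24 permutations of "abcd" into 8 buckets with the byte-sum hash and, only for comparison, with a hash built from SHA-256 (an assumption chosen for illustration, not a recommendation for index hashing).

```python
import hashlib
from collections import Counter
from itertools import permutations

def additive_hash(key: str, b: int) -> int:
    # the byte-sum hash from the previous slide
    return sum(key.encode("ascii")) % b

def sha_hash(key: str, b: int) -> int:
    # a stronger general-purpose hash, used here only as a point of comparison
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % b

keys = ["".join(p) for p in permutations("abcd")]  # 24 anagrams of "abcd"
b = 8
additive_counts = Counter(additive_hash(k, b) for k in keys)
sha_counts = Counter(sha_hash(k, b) for k in keys)
print(additive_counts)  # Counter({2: 24}): every key lands in one bucket
print(sha_counts)       # keys spread over several buckets
```

Here the expected number of keys per bucket is far from equal under the byte-sum hash, which is exactly the property a good hash function should have.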
Within a bucket:
Do we keep keys sorted?
Yes, if CPU time is critical and inserts/deletes are not too frequent.
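A sketch of one bucket that keeps its keys sorted, using Python's standard `bisect` module; the class name `SortedBucket` is hypothetical. Lookups need only O(log n) comparisons, but every insert or delete shifts the keys after it, which is the cost that makes sorting worthwhile only when updates are not too frequent.

```python
import bisect

class SortedBucket:
    """A bucket that keeps its keys in sorted order (illustrative sketch)."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        bisect.insort(self.keys, key)  # shifts later keys: extra work on every insert

    def delete(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            del self.keys[pos]  # also shifts later keys

    def contains(self, key):
        pos = bisect.bisect_left(self.keys, key)  # binary search: O(log n) comparisons
        return pos < len(self.keys) and self.keys[pos] == key

bucket = SortedBucket()
for k in ["mary", "adam", "kate"]:
    bucket.insert(k)
print(bucket.keys)              # ['adam', 'kate', 'mary']
print(bucket.contains("kate"))  # True
```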
Next: an example to illustrate inserts, overflows, and deletes, using hash function h(K).
EXAMPLE: 2 records/bucket
INSERT:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0
Then h(e) = 1: bucket 1 already holds a and c, so e goes into an overflow block chained to bucket 1.
Resulting buckets:
  0: d
  1: a, c → overflow block: e
  2: b
  3: (empty)
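The insertion example can be replayed with a small static-hashing sketch in which each bucket is a chain of blocks: a primary block followed by overflow blocks. The class name `StaticHashFile` is hypothetical, and the hash function is simply a lookup of the h values given on the slide.

```python
CAPACITY = 2  # records per block, as in the example

class StaticHashFile:
    """Static hashing with overflow chains (illustrative sketch)."""

    def __init__(self, n_buckets, h):
        self.h = h
        # each bucket is a chain of blocks; chain[0] is the primary block
        self.buckets = [[[]] for _ in range(n_buckets)]

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        for block in chain:
            if len(block) < CAPACITY:
                block.append(key)
                return
        chain.append([key])  # every block is full: allocate an overflow block

# the hash values given on the slide
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
f = StaticHashFile(4, H.__getitem__)
for k in "abcde":
    f.insert(k)
print(f.buckets)  # [[['d']], [['a', 'c'], ['e']], [['b']], [[]]]
```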
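EXAMPLE: deletion
[Figure: buckets 0 to 3 holding records a through g, some of them in overflow blocks.]
Delete: e, f
When a deletion frees a slot, a record from the bucket's overflow block (here "g") may be moved up into it; an overflow block that becomes empty can then be released.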
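A sketch of deletion with the optional "move up" step, operating on one bucket's chain of blocks (chain[0] is the primary block). The bucket layout in the usage lines (d and f in the primary block, g in an overflow block) is hypothetical and not necessarily the figure's; refilling from the last overflow block is one possible compaction policy.

```python
CAPACITY = 2  # records per block

def delete(chain, key):
    """Delete key from a bucket's chain of blocks; chain[0] is the primary block."""
    for block in chain:
        if key in block:
            block.remove(key)
            break
    else:
        return False  # key is not in this bucket
    # Optional compaction: refill free slots from the last overflow block
    # and release overflow blocks that become empty.
    while len(chain) > 1:
        last = chain[-1]
        if not last:
            chain.pop()
            continue
        free = next((b for b in chain[:-1] if len(b) < CAPACITY), None)
        if free is None:
            break
        free.append(last.pop())  # "move g up"
    return True

# Hypothetical bucket: primary block holds d and f, overflow block holds g
chain = [["d", "f"], ["g"]]
delete(chain, "f")
print(chain)  # [['d', 'g']]
```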
Rule of thumb: try to keep space utilization between 50% and 80%.
$\text{Utilization} = \dfrac{\#\text{ keys used}}{\text{total }\#\text{ keys that fit}}$
- If < 50%, we are wasting space.
- If > 80%, overflows are significant; how significant depends on how good the hash function is and on the number of keys per bucket.
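A small sketch of the utilization formula and the rule of thumb, applied to the insertion example (5 keys a to e, 4 buckets of 2 records each). The function names `utilization` and `assess` are hypothetical.

```python
def utilization(keys_stored: int, n_buckets: int, keys_per_bucket: int) -> float:
    """Fraction of record slots that are in use."""
    return keys_stored / (n_buckets * keys_per_bucket)

def assess(u: float) -> str:
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows likely significant"
    return "within rule of thumb"

# The insertion example: keys a..e in 4 buckets of 2 records each
u = utilization(5, 4, 2)
print(u, assess(u))  # 0.625 within rule of thumb
```

Even at 62.5% utilization, bucket 1 in the insertion example overflowed, because h sent three of the five keys to it; this is the dependence on the hash function noted above.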
How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
  - Extensible
  - Linear
Extensible hashing: two ideas
(a) Use i of the b bits output by the hash function.
[Figure: h(K) is a b-bit string such as 00110101; only its first i bits are used, and i grows over time.]
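Taking the first i of b bits is a right shift by b − i. A minimal sketch, with the hypothetical helper name `first_bits`, applied to the 8-bit value on the slide:

```python
def first_bits(hv: int, b: int, i: int) -> int:
    """Return the first (leftmost) i bits of the b-bit hash value hv."""
    return hv >> (b - i)

hv = 0b00110101  # the 8-bit example value from the slide
for i in range(1, 5):
    print(i, format(first_bits(hv, 8, i), f"0{i}b"))
# 1 0
# 2 00
# 3 001
# 4 0011
```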
(b) Use a directory.
[Figure: h(K)[i], the first i bits of h(K), indexes a directory of $2^i$ entries, each pointing to a bucket.]
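Example: h(k) is 4 bits; 2 keys/bucket
Start with i = 1: directory entry 0 → bucket {0001}; entry 1 → bucket {1001, 1100}. Each bucket uses 1 bit.
Insert 1010: it belongs in the bucket for 1, which is full. That bucket already uses all i = 1 bits, so the directory doubles to i = 2, and the bucket splits on the first 2 bits into {1001, 1010} (for 10) and {1100} (for 11).
New directory (i = 2):
  00, 01 → {0001} (bucket uses 1 bit)
  10 → {1001, 1010} (bucket uses 2 bits)
  11 → {1100} (bucket uses 2 bits)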
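Example continued
Insert 0111, then 0000 (the directory still has i = 2):
- 0111 goes to the bucket shared by entries 00 and 01, which then holds {0001, 0111}.
- 0000 maps to that same bucket, which is now full. The bucket uses only 1 bit (fewer than i), so it splits without doubling the directory: 00 → {0000, 0001} and 01 → {0111}, each bucket now using 2 bits.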
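Example continued
Directory (i = 2), each bucket using 2 bits:
  00 → {0000, 0001}
  01 → {0111}
  10 → {1001, 1010}
  11 → {1100}
Insert 1001: it maps to entry 10, whose bucket {1001, 1010} is full and already uses all i = 2 bits. The directory doubles to i = 3, and the bucket splits on the first 3 bits into {1001, 1001} (for 100) and {1010} (for 101), each using 3 bits.
New directory (i = 3):
  000, 001 → {0000, 0001}
  010, 011 → {0111}
  100 → {1001, 1001}
  101 → {1010}
  110, 111 → {1100}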
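The whole insertion example can be replayed with a compact extensible-hashing sketch. Assumptions: keys are given directly as their 4-bit hash values, a bucket holds 2 keys, and the names `Bucket`, `ExtensibleHash` and `nbits` (the number of leading bits a bucket uses) are hypothetical.

```python
BITS = 4      # width of the hash values, as in the example
CAPACITY = 2  # keys per bucket

class Bucket:
    def __init__(self, nbits):
        self.nbits = nbits  # how many leading hash bits this bucket uses
        self.keys = []

class ExtensibleHash:
    """Extensible hashing on BITS-bit hash values (illustrative sketch)."""

    def __init__(self):
        self.i = 1                         # the directory uses the first i bits
        self.dir = [Bucket(1), Bucket(1)]  # 2**i entries

    def bucket_for(self, hv):
        return self.dir[hv >> (BITS - self.i)]

    def insert(self, hv):
        b = self.bucket_for(hv)
        if len(b.keys) < CAPACITY:
            b.keys.append(hv)
            return
        if b.nbits == BITS:
            raise OverflowError("all hash bits in use; would need an overflow block")
        if b.nbits == self.i:
            # Double the directory: old entry j becomes new entries 2j and 2j+1.
            self.dir = [bkt for bkt in self.dir for _ in (0, 1)]
            self.i += 1
        # Split b on bit number b.nbits + 1, counting from the left.
        n = b.nbits + 1
        zero, one = Bucket(n), Bucket(n)
        for k in b.keys:
            (one if (k >> (BITS - n)) & 1 else zero).keys.append(k)
        for j, bkt in enumerate(self.dir):
            if bkt is b:
                self.dir[j] = one if (j >> (self.i - n)) & 1 else zero
        self.insert(hv)  # retry; the target bucket may need to split again

table = ExtensibleHash()
for hv in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
    table.insert(hv)

for j, bkt in enumerate(table.dir):
    print(format(j, f"0{table.i}b"), bkt.nbits, [format(k, "04b") for k in bkt.keys])
```

The printout ends with i = 3 and the same eight directory entries as the slide: 000 and 001 share {0001, 0000}, 100 holds {1001, 1001} and 101 holds {1010}. Keys whose full hash values are identical (for example, a third 1001) cannot be separated by splitting; this sketch raises `OverflowError` there, where a real system would need an overflow block.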
Extensible hashing: deletion
- No merging of blocks, or
- Merge blocks and cut the directory if possible (reverse the insert procedure)
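A sketch of the "merge blocks and cut the directory" option. It starts from the final state of the insert example with one copy of 1001 already deleted, merges the now half-empty bucket with its buddy, and halves the directory while every pair of sibling entries points to the same bucket. The names `try_merge`, `shrink` and `nbits` are hypothetical.

```python
CAPACITY = 2

class Bucket:
    def __init__(self, nbits, keys):
        self.nbits = nbits  # leading hash bits this bucket uses
        self.keys = list(keys)

def try_merge(directory, i, j):
    """Merge the bucket at directory index j with its buddy if both fit in one block."""
    b = directory[j]
    if b.nbits == 0:
        return False
    # The buddy differs only in bit number b.nbits (from the left) of the i-bit index.
    buddy = directory[j ^ (1 << (i - b.nbits))]
    if buddy is b or buddy.nbits != b.nbits or len(b.keys) + len(buddy.keys) > CAPACITY:
        return False
    merged = Bucket(b.nbits - 1, b.keys + buddy.keys)
    for k, bkt in enumerate(directory):
        if bkt is b or bkt is buddy:
            directory[k] = merged
    return True

def shrink(directory, i):
    """Halve the directory while entries 2j and 2j+1 always share a bucket."""
    while i > 0 and all(directory[2 * j] is directory[2 * j + 1]
                        for j in range(len(directory) // 2)):
        directory, i = directory[0::2], i - 1
    return directory, i

A = Bucket(2, [0b0001, 0b0000])
C = Bucket(2, [0b0111])
D = Bucket(3, [0b1001])  # one copy of 1001 already deleted
E = Bucket(3, [0b1010])
F = Bucket(2, [0b1100])
directory, i = [A, A, C, C, D, E, F, F], 3

try_merge(directory, i, 4)  # D and E now fit in one block
directory, i = shrink(directory, i)
print(i, [[format(k, "04b") for k in b.keys] for b in directory])
# 2 [['0001', '0000'], ['0111'], ['1001', '1010'], ['1100']]
```

The result is the i = 2 directory that existed before 1001 was inserted the second time, which is the insert procedure run in reverse.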
Deletion example:
Run through the insert example in reverse!
Extensible hashing: summary
(+) Can handle growing files
    - with less wasted space
    - with no full reorganizations
(−) Indirection (not bad if the directory is in memory)
(−) Directory doubles in size (now it fits, now it does not)
Advanced indexing
- Multiple attributes
- Bitmap indexing
Multiple-Key Access
Use multiple indices for certain types of queries. Example:
    select account-number
    from account
    where branch-name = “Perryridge” and balance = 1000
Possible strategies?
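Three strategies one might consider for this query, sketched with two single-attribute indices that map a value to a set of record ids: use the branch-name index and test balance on each fetched record; use the balance index and test branch-name; or intersect the two sets of record pointers and fetch only the survivors. The rows, column layout and helper `build_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (account_number, branch_name, balance)
accounts = [
    ("A-101", "Perryridge", 1000),
    ("A-102", "Perryridge", 400),
    ("A-201", "Brighton", 1000),
    ("A-215", "Mianus", 700),
]

def build_index(rows, col):
    """Secondary index: attribute value -> set of record ids."""
    idx = defaultdict(set)
    for rid, row in enumerate(rows):
        idx[row[col]].add(rid)
    return idx

by_branch = build_index(accounts, 1)
by_balance = build_index(accounts, 2)

# 1) index on branch_name, then test balance on each fetched record
s1 = {accounts[r][0] for r in by_branch["Perryridge"] if accounts[r][2] == 1000}
# 2) index on balance, then test branch_name on each fetched record
s2 = {accounts[r][0] for r in by_balance[1000] if accounts[r][1] == "Perryridge"}
# 3) intersect the two pointer sets, then fetch only matching records
s3 = {accounts[r][0] for r in by_branch["Perryridge"] & by_balance[1000]}
print(s1, s2, s3)  # {'A-101'} {'A-101'} {'A-101'}
```

Strategies 1 and 2 fetch every record matching one condition; strategy 3 fetches only records matching both, which pays off when each condition alone matches many records but their combination matches few.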
Indices on Multiple Attributes
where branch-name = “PP” and balance = 1000
Suppose we have an index on the combined search key (branch-name, balance).
[Figure: a B+-tree on the composite key (branch-name, balance), with leaf entries including (AA,2000), (AA,2300), (AA,2500), (AB,200), (AC,200), (CC,200), (DD,200), (DD,300), (PP,300), (PP,800), (PP,1000), (PP,1300), (PP,1500), (PP,1560).]
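With equality on both attributes, the composite index answers the query with a single probe for the key (PP, 1000). In this sketch a sorted Python list of tuples stands in for the B+-tree leaf level (tuples compare field by field, which matches composite-key order); the entries are modeled on the figure's leaf entries and their exact contents are an assumption.

```python
from bisect import bisect_left

# Leaf entries in (branch-name, balance) order, modeled on the figure
entries = [("AA", 2000), ("AA", 2300), ("AA", 2500), ("AB", 200), ("AC", 200),
           ("CC", 200), ("DD", 200), ("DD", 300), ("PP", 300), ("PP", 800),
           ("PP", 1000), ("PP", 1300), ("PP", 1500), ("PP", 1560)]

def lookup(entries, key):
    """One probe for an exact composite key."""
    pos = bisect_left(entries, key)
    return pos < len(entries) and entries[pos] == key

print(lookup(entries, ("PP", 1000)))  # True
print(lookup(entries, ("PP", 900)))   # False
```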
where branch-name = “PP” and balance < 1000
Suppose we have an index on the combined search key (branch-name, balance).
[Figure: the same B+-tree on (branch-name, balance) as on the previous slide.]
Search for (PP, 0), then scan the leaf entries in order, stopping at (PP, 1000).
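The range condition becomes one contiguous run of composite keys. This sketch again uses a sorted list of tuples, modeled on the figure, as a stand-in for the leaf level, and, like the slide, assumes balances are non-negative so that (PP, 0) is a valid lower bound.

```python
from bisect import bisect_left

# Leaf entries in (branch-name, balance) order, modeled on the figure
entries = [("AA", 2000), ("AA", 2300), ("AA", 2500), ("AB", 200), ("AC", 200),
           ("CC", 200), ("DD", 200), ("DD", 300), ("PP", 300), ("PP", 800),
           ("PP", 1000), ("PP", 1300), ("PP", 1500), ("PP", 1560)]

# branch-name = "PP" and balance < 1000
lo = bisect_left(entries, ("PP", 0))     # first entry >= (PP, 0)
hi = bisect_left(entries, ("PP", 1000))  # first entry >= (PP, 1000): stop here
print(entries[lo:hi])  # [('PP', 300), ('PP', 800)]
```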
where branch-name < “PP” and balance = 1000?
Suppose we have an index on the combined search key (branch-name, balance).
[Figure: the same B+-tree on (branch-name, balance) as on the previous slides.]
NO! Entries with balance = 1000 are not contiguous in this index: they are scattered among all branch-names less than “PP”, so the index cannot narrow the search on balance, and every entry with branch-name < “PP” must be examined.
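A sketch of what the composite index can and cannot do here, using a sorted list of tuples modeled on the figure as the leaf level. The index bounds the scan by branch-name only; every entry below "PP" must then be checked for balance = 1000. The probe key ("PP",) works because Python compares a shorter tuple that is a prefix of a longer one as smaller.

```python
from bisect import bisect_left

# Leaf entries in (branch-name, balance) order, modeled on the figure
entries = [("AA", 2000), ("AA", 2300), ("AA", 2500), ("AB", 200), ("AC", 200),
           ("CC", 200), ("DD", 200), ("DD", 300), ("PP", 300), ("PP", 800),
           ("PP", 1000), ("PP", 1300), ("PP", 1500), ("PP", 1560)]

# branch-name < "PP" and balance = 1000
hi = bisect_left(entries, ("PP",))  # first entry with branch-name >= "PP"
examined = entries[:hi]             # the index cannot narrow this range by balance
matches = [e for e in examined if e[1] == 1000]
print(len(examined), matches)       # 8 []
```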
Bitmap Indices
An index designed for multiple-valued search keys
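Bitmap Indices (Cont.)
[Figure: a relation with one bitmap per unique value of gender and one per unique value of income-level; each bitmap has one bit per record (bitmap size = table size). Annotation: the income-level value of record 3 is L1, so bit 3 of the L1 bitmap is 1.]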
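A sketch of building a bitmap index: one bitmap per distinct value, with one bit per record, leftmost bit = record 1. The 5-record gender and income-level columns are hypothetical, chosen so that the resulting bitmaps match the ones used in the sample query (male = 10010, L1 = 10100) and record 3 has income-level L1; `build_bitmaps` is a hypothetical name.

```python
def build_bitmaps(column):
    """One bitmap per distinct value; bit k (leftmost = record 1) is 1 if record k has it."""
    return {v: "".join("1" if x == v else "0" for x in column) for v in sorted(set(column))}

# Hypothetical 5-record relation, consistent with the bitmaps on the query slide
gender = ["m", "f", "f", "m", "f"]
income_level = ["L1", "L2", "L1", "L4", "L3"]

print(build_bitmaps(gender))        # {'f': '01101', 'm': '10010'}
print(build_bitmaps(income_level))  # {'L1': '10100', 'L2': '01000', 'L3': '00001', 'L4': '00010'}
```

The number of bitmaps equals the number of distinct values, and each bitmap is as long as the relation has records.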
Bitmap Indices (Cont.)
Some properties of bitmap indices:
- Number of bitmaps for each attribute?
- Size of each bitmap?
- When is the bitmap matrix sparse, and what attributes are good for bitmap indices?
Bitmap Indices (Cont.)
- Bitmap indices are generally very small compared with the relation size.
  - E.g., if a record is 100 bytes, the space for a single bitmap is 1/800 of the space used by the relation.
  - If the number of distinct attribute values is 8, the bitmaps for that attribute together take only 1% of the relation size.
- What about insertion? Deletion?
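The arithmetic behind these figures, for a relation of $N$ records ($N$ is introduced here only for the calculation; it cancels out):

```latex
\begin{aligned}
\text{record size} &= 100 \text{ bytes} = 800 \text{ bits} \\
\frac{\text{one bitmap}}{\text{relation}} &= \frac{N \times 1 \text{ bit}}{N \times 800 \text{ bits}} = \frac{1}{800} \\
\frac{8 \text{ bitmaps}}{\text{relation}} &= 8 \times \frac{1}{800} = \frac{1}{100} = 1\%
\end{aligned}
```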
Bitmap Indices Queries
Sample query: males with income level L1
10010 AND 10100 = 10000
What about the number of males with income level L1? Even faster: count the 1 bits in the result bitmap, without accessing the relation at all.
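A sketch of the sample query with each bitmap held as a Python int whose leftmost of 5 bits is record 1. The AND gives the matching records; counting the 1 bits gives the answer to the count query. `bin(x).count("1")` is used rather than `int.bit_count()`, which needs Python 3.10 or later.

```python
n_records = 5
male = int("10010", 2)
level_l1 = int("10100", 2)

both = male & level_l1
print(format(both, f"0{n_records}b"))  # 10000: only record 1 qualifies
print(bin(both).count("1"))            # 1: number of males with income level L1
```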
Bitmap Indices Queries
Queries are answered using bitmap operations:
- Intersection (and)
- Union (or)
- Complementation (not)
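A sketch of the three operations on the same 5-bit bitmaps. Python's `~` on an int yields a negative number (it acts as an infinitely long two's complement), so the complement must be masked to exactly one bit per record. If some record slots were empty or deleted, the complement would also have to exclude them, for example by ANDing with a bitmap of valid records.

```python
N_RECORDS = 5
MASK = (1 << N_RECORDS) - 1  # keeps results to exactly one bit per record

male = int("10010", 2)
level_l1 = int("10100", 2)

def show(bm):
    return format(bm, f"0{N_RECORDS}b")

print(show(male & level_l1))  # 10000  intersection (and)
print(show(male | level_l1))  # 10110  union (or)
print(show(~male & MASK))     # 01101  complementation (not male)
```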