Upload
zhaolinjnu
View
1.446
Download
2
Tags:
Embed Size (px)
DESCRIPTION
hash introduction
Citation preview
2
What is a Hash Table?
• Hash Table– Hash function
• h(k): key ‡ integer [0…n]• e.g., h(‘Susan’) = 7
– Array for keys: T[0…n]– Given a key k, store it in
T[h(k)]
5
Susan4
James3
2
Neil1
0
h(Susan) = 4h(James) = 3h(Neil) = 1
3
Why Hash Table?
• Direct access• Saved space
– Do not reserve a space forevery possible key
• How can we use this idea fora record search in a DBMS?
4
Hashing for DBMS(Static Hashing)
(key, record)
.
..
Disk blocks (buckets)
search key Æ h(key)
0
1
2
3
4
7
Overflow and Chaining
• Inserth(a) = 1h(b) = 2h(c) = 1h(d) = 0h(e) = 1
• Deleteh(b) = 2h(c) = 1
0
1
2
3
Do we keep keys sorted inside a block?
8
Questions
• Q: Do we keep keys sortedinside a block?
• Q: How much space shouldwe use for “good”performance?
9
What Hash Function?
• Example– Key = ‘x1 x2 … xn’ n byte
character string– Have b buckets– h: (x1 + x2 + ….. xn) mod b
• Desired property– Uniformity: same # of keys for
every bucket• Reference
– “The Art of ComputerProgramming Vol. 3” by Knuth
10
Major Problem ofStatic Hashing
• How to cope with growth?– Data tends to grow in size– Overflow blocks unavoidable
102030
405060
708090
39313536
323834
33
overflow blockshash buckets
11
Idea 1
• Periodic reorganization
– New hash function– Complete reassignment of records– Very expensive
102030
4050…
33
102030
405060
708090
33
405060
13
(a) Use i of b bits output byhash function
Extendible Hashing(two ideas)
00110101h(K) Æ
b
use i Æ grows over time
14
(b) Use directory thatmaintains pointers to hashbuckets (indirection)
Extendible Hashing(two ideas)
..
.
.
..
c
e
hash bucketdirectory
h(c)
22
Extendible Hashing:Deletion
• Two optionsa) No merging of bucketsb) Merge buckets and shrink
directory if possible
23
Bucket Merge Example
Can we merge a and b? b and c?
Can we shrink the directory?
1
0001
2
1001
2
1100
00
01
10
11
2i =a
b
c
24
Bucket MergeCondition
• Bucket merge condition– Bucket i’s are the same– First (i-1) bits of the hash key
are the same
• Directory shrink condition– All bucket i’s are smaller than
the directory i
25
Summary ofExtendible Hashing
• Can handle growing files– No periodic reorganizations
• Indirection– Up to 2 disk accesses to
access a key
• Directory doubles in size– Not too bad if the data is not
too large