Hashing
Introduction - Hashing
Use an array to store key values. But instead of storing them in a position relative to other key values (i.e. in ascending order), store them at a position determined by a mathematical calculation on the key.
This mathematical calculation is called a “hash function”.
Discussion
Advantages:
  Fastest search times: O(1) in the best case.
  Inserts and deletes are also fast.
Disadvantages:
  Key values are not available in order, which eliminates the option of range queries.
  What if two key values hash to the same place?
Basic Idea
A hash algorithm has two parts:
  The hash function
  The collision resolution strategy
Simple example: a hash function for integers might be
  H(x) = x mod n
where “n” is the number of elements in the array, or “buckets”.
To resolve collisions, simply search linearly for the next open bucket (linear probing).
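The two parts described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `insert`, the use of `None` to mark an open bucket, and wrapping around at the end of the array are assumptions made for the sketch.

```python
def insert(table, key):
    """Insert key using H(x) = x mod n, resolving collisions by linear probing."""
    n = len(table)
    home = key % n                  # the hash function
    for i in range(n):              # at most n probes
        slot = (home + i) % n       # next bucket, wrapping at the end (assumption)
        if table[slot] is None:     # None marks an open bucket (assumption)
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 10                 # n = 10 buckets
for key in (17, 23, 89, 103, 44):
    insert(table, key)
print(table)
```

With these five keys and n = 10, 103 collides with 23 and 44 collides with 103, so each is placed one bucket past its home position.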
Example
First, hash 17:
Keys to hash: 17, 23, 89, 103, 44    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -    -    -    -    -    -    -
Example
17 hashes to bucket 7... no problem. Now hash 23:
Keys to hash: 23, 89, 103, 44    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -    -    -    -   17    -    -
Example
23 hashes to bucket 3... no problem. Now hash 89:
Keys to hash: 89, 103, 44    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23    -    -    -   17    -    -
Example
89 hashes to bucket 9... no problem. Now hash 103:
Keys to hash: 103, 44    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23    -    -    -   17    -   89
Example
103 hashes to bucket 3; since 23 is already there, place 103 in bucket 4.
Now hash 44:
Keys to hash: 44    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23  103    -    -   17    -   89
Example
44 hashes to bucket 4, which is full, so put 44 in bucket 5.
Keys to hash: done.    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23  103   44    -   17    -   89
Example
Note that bucket 4 contains a key value (103) which does not hash to 4, while the key value that does hash to 4 (44) sits in bucket 5.
This makes delete tricky...
Keys to hash: done.    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23  103   44    -   17    -   89
Searching a Hash Table
Hash the key value to search for.
If it is in that bucket, stop and report success.
If not, search linearly until:
  It is found (success)
  An empty bucket is encountered (failure)
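The search steps above might look like this in Python. It is a sketch under stated assumptions: the name `search` is illustrative, `None` marks an empty bucket, the probe wraps around at the end of the array, and the loop is capped at n probes so that a completely full table still terminates.

```python
def search(table, key):
    """Return the bucket index holding key, or None if key is not in the table."""
    n = len(table)
    home = key % n                   # H(x) = x mod n
    for i in range(n):               # cap at n probes (assumption: table may be full)
        slot = (home + i) % n
        if table[slot] is None:      # empty bucket: the key cannot be further along
            return None
        if table[slot] == key:
            return slot
    return None


# Table state at the end of the insertion example (n = 10).
table = [None, None, None, 23, 103, 44, None, 17, None, 89]
print(search(table, 44))   # found one bucket past its home position
print(search(table, 13))   # probes buckets 3, 4, 5, then hits empty bucket 6
```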
Deleting from a Hash Table
To delete, first hash the key value to delete to find its bucket.
If it is there, skip the next step.
Search linearly in the table until either:
  you find it (success: continue)
  you reach an open bucket (failure: stop)
Remove the key value from its bucket.
Deleting II
But you might have previously hashed keys that collided with this one, so...
For every key value between the emptied bucket and the next open bucket: rehash the key value and move it if necessary.
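Delete followed by the rehash pass can be sketched as follows. This is one possible reading of the procedure, not necessarily the author's exact method: each key after the emptied bucket is lifted out and re-inserted, which moves it only if an earlier bucket on its probe path has opened up. The names `insert` and `delete` and the `None` marker for an open bucket are assumptions.

```python
def insert(table, key):
    """Linear-probing insert with H(x) = x mod n."""
    n = len(table)
    for i in range(n):
        slot = (key % n + i) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


def delete(table, key):
    """Remove key, then rehash the keys up to the next open bucket."""
    n = len(table)
    slot = key % n
    for _ in range(n):
        if table[slot] is None:      # open bucket reached: key is absent
            return False
        if table[slot] == key:
            break
        slot = (slot + 1) % n
    else:
        return False                 # probed every bucket without finding key
    table[slot] = None
    nxt = (slot + 1) % n
    while table[nxt] is not None:    # stop at the next open bucket
        moved = table[nxt]
        table[nxt] = None
        insert(table, moved)         # lands in the same bucket or an earlier one
        nxt = (nxt + 1) % n
    return True


table = [None, None, None, 23, 103, 44, None, 17, None, 89]
delete(table, 17)
delete(table, 23)                    # 103 moves to bucket 3, 44 moves to bucket 4
print(table)
```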
Delete Example
Delete 17:
Keys to delete: 17, 23, 103    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23  103   44    -   17    -   89
Delete Example
17 hashes to bucket 7. It is there, so remove it; 8 is empty, so we’re done.
Delete 23:
Keys to delete: 23, 103    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23  103   44    -    -    -   89
Delete Example
23 hashes to bucket 3; it is there, so remove it.
Now rehash from bucket 4 to 5 (6 is empty).
Keys to delete: 103    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -  103   44    -    -    -   89
Delete Example
What do I do with 103?
Keys to delete: 103    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -  103   44    -    -    -   89
Delete Example
Move it to bucket 3. What about 44?
Keys to delete: 103    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -  103    -   44    -    -    -   89
Delete Example
Move it to bucket 4. Done with delete of 23. Delete 103:
Keys to delete: 103    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -  103   44    -    -    -    -   89
Delete Example
103 hashes to bucket 3; it is there, so remove it.
What about rehashes?
Keys to delete: done    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -   44    -    -    -    -   89
Delete Example
I must rehash from 4 to 4, as bucket 5 is the next open bucket.
44 hashes to bucket 4, so no moves are necessary.
Keys to delete: done    H(x) = x mod 10
Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -   44    -    -    -    -   89
Picking a Good Hash Function
Suppose I wish to hash the values 53, 21, 37, 5, 58, 42, 26, 10.
Suppose further my table size is 16.
I could try H(X) = X mod 16.
Picking a Good Hash Function II
This doesn’t work out so well:
Number   Mod 16
    53        5
    21        5
    37        5
     5        5
    58       10
    42       10
    26       10
    10       10
Every value lands in bucket 5 or bucket 10.
What else can I do?
Try looking at the bits:
Picking a Good Hash Function III
Number   Mod 16   32   16    8    4    2    1
    53        5    1    1    0    1    0    1
    21        5    0    1    0    1    0    1
    37        5    1    0    0    1    0    1
     5        5    0    0    0    1    0    1
    58       10    1    1    1    0    1    0
    42       10    1    0    1    0    1    0
    26       10    0    1    1    0    1    0
    10       10    0    0    1    0    1    0
Picking a Good Hash Function IV
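The low four bits are what “mod 16” keeps, and they barely vary. Build the hash from the bits that do vary (“4bit” is the bit in the 4 column, and so on):
New hash = 4bit*8 + 8bit*4 + 16bit*2 + 32bit*1
Number   Mod 16   32   16    8    4    2    1   New hash
    53        5    1    1    0    1    0    1         11
    21        5    0    1    0    1    0    1         10
    37        5    1    0    0    1    0    1          9
     5        5    0    0    0    1    0    1          8
    58       10    1    1    1    0    1    0          7
    42       10    1    0    1    0    1    0          5
    26       10    0    1    1    0    1    0          6
    10       10    0    0    1    0    1    0          4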
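The comparison between the two hash functions can be reproduced with a short Python sketch. The function names `mod16` and `bit_mix` are made up for illustration; the bit weights follow the formula 4bit*8 + 8bit*4 + 16bit*2 + 32bit*1 from the slide.

```python
def mod16(x):
    """The first attempt: H(X) = X mod 16."""
    return x % 16


def bit_mix(x):
    """Hash built from the 4, 8, 16 and 32 bits of x, weighted 8, 4, 2, 1."""
    bit4 = (x >> 2) & 1
    bit8 = (x >> 3) & 1
    bit16 = (x >> 4) & 1
    bit32 = (x >> 5) & 1
    return bit4 * 8 + bit8 * 4 + bit16 * 2 + bit32 * 1


keys = [53, 21, 37, 5, 58, 42, 26, 10]
print([mod16(k) for k in keys])     # only two distinct buckets
print([bit_mix(k) for k in keys])   # eight distinct buckets
```

For this particular key set the second function spreads the keys over eight different buckets; for other key sets a different choice of bits might be needed.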
Collision Resolution
Linear probing often produces a “cluster” of values. This slows the process of finding/inserting a value.
What else can we do?
  Linear probing with a factor > 1.
  Quadratic probing.
  Double hashing.
  Closed addressing.
Linear Probing with k>1
Suppose we use a hash function such as H(x) = (x + k*i) mod n, where “k” is the multiplying constant.
Start with i=0 and proceed to n-1 until an open bucket is found.
This will move values away from a cluster more quickly.
Linear Probing Problem
In our previous example, n=16. With k=6, we have H(x) = (x + 6*i) mod 16.
What is the probe sequence for x=3?
3, 9, 15, 5, 11, 1, 7, 13, 3
OOPS!!! Not all buckets are reachable: the sequence repeats after visiting only 8 of the 16.
Problem Solution
“n” and “k” must be relatively prime.
Thus, try it with k=3, using H(x) = (x + 3*i) mod 16.
What is the probe sequence for x=3?
3, 6, 9, 12, 15, 2, 5, 8, 11, 14, 1, 4, 7, 10, 13, 0, 3
The absolute best is to choose “n” prime, since every k from 1 to n-1 is then relatively prime to n.
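The effect of the step size k can be checked with a small Python sketch. The helper name `probe_sequence` is an assumption; it lists the buckets visited by (x + k*i) mod n and stops at the first repeat.

```python
from math import gcd


def probe_sequence(x, k, n):
    """Distinct buckets visited by H(x) = (x + k*i) mod n, in probe order."""
    seen = []
    for i in range(n):
        slot = (x + k * i) % n
        if slot in seen:        # the sequence has started to repeat
            break
        seen.append(slot)
    return seen


print(probe_sequence(3, 6, 16))   # gcd(6, 16) = 2, so only 16 / 2 = 8 buckets
print(probe_sequence(3, 3, 16))   # gcd(3, 16) = 1, so all 16 buckets
```

In general the number of reachable buckets is n / gcd(k, n), which is why k and n need to be relatively prime.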
Quadratic Probing
Uses something like H(x) = (x + i^2) mod n
Advantage: Moves values farther more quickly.
Disadvantage: Not guaranteed to probe every bucket (but at least half of the buckets can be reached if n is prime).
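How many buckets quadratic probing can reach is easy to count in Python. This is only an illustration; the helper name `quadratic_buckets` and the table sizes 11 and 16 are chosen for the example.

```python
def quadratic_buckets(x, n):
    """Set of buckets reachable by H(x) = (x + i^2) mod n for i = 0 .. n-1."""
    return sorted({(x + i * i) % n for i in range(n)})


print(quadratic_buckets(3, 11))   # n prime: 6 of the 11 buckets
print(quadratic_buckets(3, 16))   # n not prime: only 4 of the 16 buckets
```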
Double Hashing
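Uses something like
  H1(x) = [x + i*H2(x)] mod n
  H2(x) = R - (x mod R)
Here, it is very important to choose n and R prime and that R < n.
H2(x) is always between 1 and R, so the step is never 0; with n prime and R < n, the step is also relatively prime to n, so the probe sequence can reach every bucket.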
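A double-hashing insert might be sketched like this in Python. The function name `double_hash_insert`, the `None` marker for an open bucket, and the sizes n = 11 and R = 7 are assumptions picked for the example (both prime, with R < n).

```python
def double_hash_insert(table, key, R):
    """Insert key using H1(x) = [x + i*H2(x)] mod n with H2(x) = R - (x mod R)."""
    n = len(table)
    step = R - (key % R)                # H2(x): always between 1 and R, never 0
    for i in range(n):
        slot = (key + i * step) % n     # H1(x) for this value of i
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no open bucket found")


table = [None] * 11                     # n = 11 (hypothetical size)
for key in (17, 23, 89, 103, 44):
    double_hash_insert(table, key, R=7)
print(table)
```

Here 89 collides with 23 in bucket 1; its step is 7 - (89 mod 7) = 2, so it lands in bucket 3 instead of the bucket right next door.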
Closed Addressing
Idea: avoid having to probe for another bucket by allowing more than one key value per bucket.
How to allow multiple values per bucket?
  Linked list
  Binary search tree
  AVL tree
  Another hash table
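A minimal sketch of closed addressing, assuming each bucket is a Python list standing in for the linked list option. The names `chained_insert` and `chained_search` are illustrative.

```python
def chained_insert(table, key):
    """Add key to the bucket it hashes to; colliding keys share the bucket."""
    bucket = table[key % len(table)]
    if key not in bucket:               # ignore duplicate inserts
        bucket.append(key)


def chained_search(table, key):
    """Only the one bucket the key hashes to needs to be examined."""
    return key in table[key % len(table)]


table = [[] for _ in range(10)]         # 10 empty buckets
for key in (17, 23, 89, 103, 44):
    chained_insert(table, key)
print(table[3])                         # 23 and 103 share bucket 3
print(table[4])                         # 44 stays in its home bucket
```

Because a key never leaves its home bucket, delete needs no rehash pass: remove the key from its bucket's list and nothing else moves.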