
Hashing


Introduction - Hashing

Use an array to store key values. But instead of storing them in a position relative to other key values (i.e., in ascending order), store them based on a mathematical calculation.

This mathematical calculation is called a “hash function”.


Discussion

Advantages:
- Fastest search times: O(1) in the best case (no collisions).
- Inserts and deletes are also fast.

Disadvantages:
- Key values are not available in order, which eliminates the option of range queries.
- What if two key values hash to the same place?


Basic Idea

A hash algorithm has two parts:
- The hash function
- The collision resolution strategy

Simple example: a hash function for integers might be

H(x) = x mod n

where “n” is the number of elements in the array, or “buckets”.

To resolve collisions, simply search linearly for the next open bucket (linear probing).
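The two parts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `EMPTY`, `h`, and `insert` are hypothetical, open buckets are marked with `None`, and the probe wraps around to bucket 0 after the last bucket (the slides do not say what happens at the end of the array).

```python
EMPTY = None  # assumed marker for an open bucket


def h(x, n):
    """Hash function from the slide: H(x) = x mod n."""
    return x % n


def insert(table, key):
    """Insert key with linear probing; return the bucket index used."""
    n = len(table)
    start = h(key, n)
    for step in range(n):
        i = (start + step) % n  # assumption: wrap around past the last bucket
        if table[i] is EMPTY:
            table[i] = key
            return i
    raise RuntimeError("hash table is full")


table = [EMPTY] * 10
placed = [insert(table, k) for k in (17, 23, 89, 103, 44)]
print(placed)  # [7, 3, 9, 4, 5]
```

With ten buckets, 103 collides with 23 in bucket 3 and moves to bucket 4, and 44 then finds bucket 4 taken and moves to bucket 5.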


Example

First, hash 17:

Keys to hash: 17, 23, 89, 103, 44
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -    -    -    -    -    -    -

(“-” marks an empty bucket.)

Example

17 hashes to bucket 7; no problem. Now hash 23:

Keys to hash: 23, 89, 103, 44
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -    -    -    -   17    -    -

Example

23 hashes to bucket 3; no problem. Now hash 89:

Keys to hash: 89, 103, 44
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23    -    -    -   17    -    -

Example

89 hashes to bucket 9; no problem. Now hash 103:

Keys to hash: 103, 44
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23    -    -    -   17    -   89

Example

103 hashes to bucket 3; since 23 is already there, place 103 in bucket 4.

Now hash 44:

Keys to hash: 44
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23  103    -    -   17    -   89

Example

44 hashes to bucket 4, which is full, so put 44 in bucket 5.

Keys to hash: done.
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23  103   44    -   17    -   89

Example

Note that bucket 4 contains a key value (103) that does not hash to 4, while the key value that does hash to 4 (44) sits in bucket 5.

This makes delete tricky...

Keys to hash: done.
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23  103   44    -   17    -   89

Searching a Hash Table

Hash the key value to search for.
- If it is in that bucket, stop and report success.
- If not, search linearly until either the key is found (success) or an empty bucket is encountered (failure).
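The search steps above might look like the following sketch. The function name `search`, the `None` marker for empty buckets, and the wrap-around at the end of the array are assumptions made for illustration; the hash is H(x) = x mod n with n taken from the table length.

```python
EMPTY = None  # assumed marker for an open bucket


def search(table, key):
    """Return the bucket index holding key, or -1 if it is not in the table."""
    n = len(table)
    start = key % n  # H(x) = x mod n
    for step in range(n):
        i = (start + step) % n
        if table[i] is EMPTY:
            return -1  # an empty bucket ends the probe: failure
        if table[i] == key:
            return i  # found: success
    return -1  # probed every bucket without finding the key


# Table state after hashing 17, 23, 89, 103, 44 with H(x) = x mod 10.
table = [EMPTY, EMPTY, EMPTY, 23, 103, 44, EMPTY, 17, EMPTY, 89]
print(search(table, 44))  # 5
print(search(table, 33))  # -1
```

Searching for 33 probes buckets 3, 4, and 5 before the empty bucket 6 ends the search.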


Deleting from a Hash Table

To delete:
1. Hash the key value to delete to find its bucket.
2. If it is not there, search linearly in the table until either you find it (success: continue) or you reach an open bucket (failure: stop).
3. Remove the key value from its bucket.


Deleting II

But you might have previously hashed keys that collided with this one, so...

For every key value from the current bucket to the next open bucket: rehash the key value and move it if necessary.
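One way to read "rehash the key value and move it if necessary" is to take each key in the run out of its bucket and insert it again. The sketch below follows that reading; `insert`, `delete`, the `None` marker, and the wrap-around are assumptions for illustration, not necessarily how the original course implemented it.

```python
EMPTY = None  # assumed marker for an open bucket


def insert(table, key):
    """Linear-probing insert with H(x) = x mod n; assumes a free bucket exists."""
    n = len(table)
    i = key % n
    while table[i] is not EMPTY:
        i = (i + 1) % n
    table[i] = key


def delete(table, key):
    """Delete key, then rehash every key up to the next open bucket."""
    n = len(table)
    i = key % n
    for _ in range(n):
        if table[i] is EMPTY:
            return False  # reached an open bucket: key is not present
        if table[i] == key:
            break
        i = (i + 1) % n
    else:
        return False  # probed the whole table without finding the key
    table[i] = EMPTY
    j = (i + 1) % n
    while table[j] is not EMPTY:
        moved = table[j]
        table[j] = EMPTY
        insert(table, moved)  # lands in the same bucket or one closer to home
        j = (j + 1) % n
    return True


table = [EMPTY, EMPTY, EMPTY, 23, 103, 44, EMPTY, 17, EMPTY, 89]
delete(table, 17)
delete(table, 23)
print(table)  # [None, None, None, 103, 44, None, None, None, None, 89]
```

After 23 is removed, 103 moves back to bucket 3 and 44 moves back to bucket 4.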


Delete Example

Delete 17:

Keys to delete: 17, 23, 103
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23  103   44    -   17    -   89

Delete Example

17 hashes to bucket 7. It is there, so remove it; bucket 8 is empty, so there is nothing to rehash and we’re done.

Delete 23:

Keys to delete: 23, 103
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -   23  103   44    -    -    -   89

Delete Example

23 hashes to bucket 3; it is there, so remove it.

Now rehash the keys in buckets 4 through 5 (bucket 6 is empty).

Keys to delete: 103
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -  103   44    -    -    -   89

Delete Example

What do I do with 103?

Keys to delete: 103
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -  103   44    -    -    -   89

Delete Example

Move it to bucket 3. What about 44?

Keys to delete: 103
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -  103    -   44    -    -    -   89

Delete Example

Move it to bucket 4. Done with the delete of 23. Delete 103:

Keys to delete: 103
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -  103   44    -    -    -    -   89

Delete Example

103 hashes to bucket 3; it is there, so remove it.

What about rehashes?

Keys to delete: done
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -   44    -    -    -    -   89

Delete Example

I must rehash bucket 4 only, since bucket 5 is the next open bucket.

44 hashes to bucket 4, so no moves are necessary.

Keys to delete: done
H(x) = x mod 10

Bucket:    0    1    2    3    4    5    6    7    8    9
Key:       -    -    -    -   44    -    -    -    -   89

Picking a Good Hash Function

Suppose I wish to hash the following values:

53, 21, 37, 5, 58, 42, 26, 10

Suppose further my table size is 16.

I could try H(x) = x mod 16.

Picking a Good Hash Function II

This doesn’t work out so well: every value hashes to bucket 5 or bucket 10.

Number  Mod 16
    53       5
    21       5
    37       5
     5       5
    58      10
    42      10
    26      10
    10      10

What else can I do? Try looking at the bits.

Picking a Good Hash Function III

Number  Mod 16   32  16   8   4   2   1
    53       5    1   1   0   1   0   1
    21       5    0   1   0   1   0   1
    37       5    1   0   0   1   0   1
     5       5    0   0   0   1   0   1
    58      10    1   1   1   0   1   0
    42      10    1   0   1   0   1   0
    26      10    0   1   1   0   1   0
    10      10    0   0   1   0   1   0

Picking a Good Hash Function IV

Hash = 4bit*8 + 8bit*4 + 16bit*2 + 32bit*1

Number  Mod 16   32  16   8   4   2   1   Hash
    53       5    1   1   0   1   0   1     11
    21       5    0   1   0   1   0   1     10
    37       5    1   0   0   1   0   1      9
     5       5    0   0   0   1   0   1      8
    58      10    1   1   1   0   1   0      7
    42      10    1   0   1   0   1   0      5
    26      10    0   1   1   0   1   0      6
    10      10    0   0   1   0   1   0      4
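The last column of the table can be reproduced with a few bit operations. The sketch below compares x mod 16 against the bit-based formula from the slide; the function names `mod16` and `mixed_bits` are hypothetical, chosen only for this illustration.

```python
def mod16(x):
    """The first attempt: H(x) = x mod 16."""
    return x % 16


def mixed_bits(x):
    """4bit*8 + 8bit*4 + 16bit*2 + 32bit*1, as in the table's last column."""
    bit4 = (x >> 2) & 1   # the bit worth 4
    bit8 = (x >> 3) & 1   # the bit worth 8
    bit16 = (x >> 4) & 1  # the bit worth 16
    bit32 = (x >> 5) & 1  # the bit worth 32
    return bit4 * 8 + bit8 * 4 + bit16 * 2 + bit32 * 1


values = [53, 21, 37, 5, 58, 42, 26, 10]
print([mod16(v) for v in values])       # [5, 5, 5, 5, 10, 10, 10, 10]
print([mixed_bits(v) for v in values])  # [11, 10, 9, 8, 7, 5, 6, 4]
```

For these eight values, x mod 16 uses only two buckets, while the bit-based formula gives every value its own bucket. The result is always between 0 and 15, so it still fits a table of size 16.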

Collision Resolution

Linear probing often produces a “cluster” of values. This slows the process of finding or inserting a value.

What else can we do?
- Linear probing with a factor > 1
- Quadratic probing
- Double hashing
- Closed addressing

Linear Probing with k>1

Suppose we use a hash function such as

H(x) = (x + k*i) mod n

where “k” is the multiplying constant. Start with i=0 and proceed to n-1 until an open bucket is found.

This will move values away from a cluster more quickly.

Linear Probing Problem

In our previous example, n=16. With k=6, we have H(x) = (x + 6*i) mod 16.

What is the probe sequence for x=3?

3, 9, 15, 5, 11, 1, 7, 13, 3

Oops! The sequence repeats after 8 probes, so not all buckets are reachable.

Problem Solution

“n” and “k” must be relatively prime. Thus, try it with k=3, using H(x) = (x + 3*i) mod 16.

What is the probe sequence for x=3?

3, 6, 9, 12, 15, 2, 5, 8, 11, 14, 1, 4, 7, 10, 13, 0, 3

The absolute best is to choose “n” prime, since every k from 1 to n-1 is then relatively prime to it.
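The two probe sequences above are easy to check by machine. The sketch below generates the sequence H(x) = (x + k*i) mod n and counts how many distinct buckets it reaches; `probe_sequence` and `reachable` are hypothetical helper names.

```python
from math import gcd


def probe_sequence(x, k, n):
    """Buckets visited by H(x) = (x + k*i) mod n for i = 0 .. n-1."""
    return [(x + k * i) % n for i in range(n)]


def reachable(x, k, n):
    """Number of distinct buckets in the probe sequence."""
    return len(set(probe_sequence(x, k, n)))


print(probe_sequence(3, 6, 16)[:9])      # [3, 9, 15, 5, 11, 1, 7, 13, 3]
print(reachable(3, 6, 16), gcd(6, 16))   # 8 2
print(reachable(3, 3, 16), gcd(3, 16))   # 16 1
```

With k=6 the step shares a factor of 2 with n=16, so only 16/2 = 8 buckets are ever probed; with k=3 the greatest common divisor is 1 and all 16 buckets are reached.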

Quadratic Probing

Uses something like H(x) = (x + i^2) mod n

Advantage: moves values farther more quickly.

Disadvantage: not guaranteed to probe every bucket (but at least half of the buckets are probed if n is prime).
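The coverage claim can be checked with a small experiment. The sketch below lists the buckets probed by H(x) = (x + i^2) mod n; the helper name `quadratic_probes` and the table sizes 11 and 16 are assumptions picked for illustration.

```python
def quadratic_probes(x, n):
    """Buckets visited by H(x) = (x + i^2) mod n for i = 0 .. n-1."""
    return [(x + i * i) % n for i in range(n)]


print(quadratic_probes(3, 11))            # [3, 4, 7, 1, 8, 6, 6, 8, 1, 7, 4]
print(len(set(quadratic_probes(3, 11))))  # 6
print(len(set(quadratic_probes(3, 16))))  # 4
```

With the prime size 11, the sequence reaches 6 of the 11 buckets, just over half. With the non-prime size 16, it reaches only 4 of 16.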


Double Hashing

Uses something like

H1(x) = [x + i*H2(x)] mod n
H2(x) = R - (x mod R)

Here, it is very important to choose n and R prime, with R < n.
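The formulas above can be sketched as follows. The helper names `h2` and `double_hash_probes`, and the choice n=11 with R=7, are assumptions for illustration only; any primes with R < n would serve.

```python
def h2(x, r):
    """Second hash from the slide: H2(x) = R - (x mod R), always in 1..R."""
    return r - (x % r)


def double_hash_probes(x, n, r):
    """Buckets visited by H1(x) = [x + i*H2(x)] mod n for i = 0 .. n-1."""
    step = h2(x, r)
    return [(x + i * step) % n for i in range(n)]


# 23 and 12 both start in bucket 1 when n = 11, but they step differently.
print(double_hash_probes(23, 11, 7))  # [1, 6, 0, 5, 10, 4, 9, 3, 8, 2, 7]
print(double_hash_probes(12, 11, 7))  # [1, 3, 5, 7, 9, 0, 2, 4, 6, 8, 10]
```

Because H2 depends on the key, two keys that collide in their first bucket usually follow different probe sequences. H2 is never 0, and since n is prime and R < n, every step size is relatively prime to n, so each sequence visits all n buckets.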


Closed Addressing

Idea: avoid the need to relocate colliding keys by allowing more than one key value per bucket.

How to allow multiple values per bucket?
- Linked list
- Binary search tree
- AVL tree
- Another hash table
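A minimal sketch of the idea, assuming each bucket holds a plain Python list standing in for the linked list option. The class name `ChainedHashTable` and its method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table where each bucket holds a list of key values."""

    def __init__(self, n):
        self.buckets = [[] for _ in range(n)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]  # H(x) = x mod n

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:  # ignore duplicates
            bucket.append(key)

    def contains(self, key):
        return key in self._bucket(key)

    def delete(self, key):
        bucket = self._bucket(key)
        if key in bucket:
            bucket.remove(key)
            return True
        return False


t = ChainedHashTable(10)
for k in (17, 23, 89, 103, 44):
    t.insert(k)
print(t.buckets[3])  # [23, 103]
t.delete(23)
print(t.buckets[3])  # [103]
```

Here 23 and 103 share bucket 3, and deleting 23 needs no rehashing of other keys because no key ever leaves its home bucket.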
