
Hashing


Introduction - Hashing

Use an array to store key values. But instead of storing them in a position relative to other key values (i.e., in ascending order), store them based on a mathematical calculation.

This mathematical calculation is called a “hash function”.


Discussion

Advantages:
  • Fastest search times: O(1) in the best case.
  • Inserts and deletes are also fast.

Disadvantages:
  • Key values are not available in order.
  • Eliminates the option of range queries.
  • What if two key values hash to the same place?


Basic Idea

A hash algorithm has two parts:
  • The hash function
  • The collision resolution strategy

Simple example: a hash function for integers might be

H(x) = x mod n

where “n” is the number of elements in the array, or “buckets”.

To resolve collisions, simply search linearly for the next open bucket (linear probing).
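The two parts can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the function name `insert`, the use of `None` to mark an open bucket, and wrapping around at the end of the array are assumptions that the slides do not spell out. The keys are the ones used in the example that follows.

```python
def insert(table, key):
    """Place key using H(x) = x mod n, resolving collisions by linear probing."""
    n = len(table)
    home = key % n                       # the hash function
    for step in range(n):                # the collision resolution strategy
        slot = (home + step) % n         # assumption: wrap around at the end
        if table[slot] is None:          # assumption: None marks an open bucket
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 10
for key in (17, 23, 89, 103, 44):
    insert(table, key)
print(table)
```

The function returns the bucket it finally used, which makes it easy to see when a key landed somewhere other than its home bucket.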


Example

First, hash 17:

Keys to hash: 17, 23, 89, 103, 44
H(x) = x mod 10

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    -    -    -    -    -    -    -


Example

17 hashes to bucket 7... no problem. Now hash 23:

Keys to hash: 23, 89, 103, 44
H(x) = x mod 10

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    -    -    -    -    17   -    -


Example

23 hashes to bucket 3... no problem. Now hash 89:

Keys to hash: 89, 103, 44
H(x) = x mod 10

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    23   -    -    -    17   -    -


Example

89 hashes to bucket 9... no problem. Now hash 103:

Keys to hash: 103, 44
H(x) = x mod 10

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    23   -    -    -    17   -    89


Example

103 hashes to bucket 3; since 23 is already there, place 103 in bucket 4.

Now hash 44:

Keys to hash: 44
H(x) = x mod 10

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    23   103  -    -    17   -    89


Example

44 hashes to bucket 4, which is full, so put 44 in bucket 5.

Keys to hash: done.
H(x) = x mod 10

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    23   103  44   -    17   -    89


Example

Note that bucket 4 contains a key value (103) that does not hash to 4, while the key value that does hash to 4 (44) sits in bucket 5.

This makes delete tricky...

Keys to hash: done.
H(x) = x mod 10

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    23   103  44   -    17   -    89


Searching a Hash Table

Hash the key value to search for.
  • If it is in that bucket, stop and report success.
  • If not, search linearly until
      - it is found (success), or
      - an empty bucket is encountered (failure).
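The search steps above might look as follows in Python. This is a sketch under assumptions the slides leave open: `None` marks an empty bucket, the probe wraps around at the end of the array, and the name `search` is illustrative. The table is the final one from the insertion example.

```python
def search(table, key):
    """Return the bucket index holding key, or None if the key is absent."""
    n = len(table)
    home = key % n                       # H(x) = x mod n
    for step in range(n):
        slot = (home + step) % n         # assumption: wrap around at the end
        if table[slot] is None:          # empty bucket: the key cannot lie beyond it
            return None
        if table[slot] == key:
            return slot
    return None                          # every bucket was full and none matched


# Final table from the insertion example (None marks an empty bucket).
table = [None, None, None, 23, 103, 44, None, 17, None, 89]
found = search(table, 44)     # probes bucket 4, then bucket 5
missing = search(table, 13)   # probes buckets 3, 4, 5, then hits empty bucket 6
print(found, missing)
```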


Deleting from a Hash Table

To delete:
  1. Hash the key value to delete to find its bucket.
  2. If it is there, skip the next step.
  3. Search linearly in the table until either
       - you find it (success: continue), or
       - you reach an open bucket (failure: stop).
  4. Remove the key value from its bucket.


Deleting II

But you might have previously hashed keys that collided with this one, so...

For every key value from the current bucket to the next open bucket: rehash the key value and move it if necessary.
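One possible reading of the delete procedure, sketched in Python. The names `insert` and `delete`, the `None` marker for open buckets, and the wrap-around at the end of the array are assumptions made for illustration. "Rehash and move if necessary" is implemented here by taking each following key out and inserting it again, which leaves it in place when no better bucket has opened up.

```python
def insert(table, key):
    """Linear-probing insert with H(x) = x mod n (assumed helper)."""
    n = len(table)
    for step in range(n):
        slot = (key % n + step) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


def delete(table, key):
    """Remove key, then rehash the run of keys up to the next open bucket."""
    n = len(table)
    slot = key % n
    for _ in range(n):
        if table[slot] is None:
            return False                 # open bucket reached: key is absent
        if table[slot] == key:
            break
        slot = (slot + 1) % n
    else:
        return False                     # full table scanned without a match
    table[slot] = None
    nxt = (slot + 1) % n
    while table[nxt] is not None:
        moved, table[nxt] = table[nxt], None
        insert(table, moved)             # may land in the same bucket again
        nxt = (nxt + 1) % n
    return True


table = [None, None, None, 23, 103, 44, None, 17, None, 89]
for key in (17, 23, 103):
    delete(table, key)
print(table)
```

Many implementations avoid the rehash pass by marking deleted buckets with a tombstone instead; that is a different technique from the one described on the slide.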


Delete Example

Delete 17:

Keys to delete: 17, 23, 103

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    23   103  44   -    17   -    89

H(x) = x mod 10


Delete Example

17 hashes to bucket 7. It is there, so remove it; 8 is empty, so we’re done.

Delete 23:

Keys to delete: 23, 103

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    23   103  44   -    -    -    89

H(x) = x mod 10


Delete Example

23 hashes to bucket 3; it is there, so remove it.

Now rehash from bucket 4 to 5 (6 is empty)

Keys to delete: 103

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    -    103  44   -    -    -    89

H(x) = x mod 10


Delete Example

What do I do with 103?

Keys to delete: 103

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    -    103  44   -    -    -    89

H(x) = x mod 10


Delete Example

Move it to bucket 3. What about 44?

Keys to delete: 103

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    103  -    44   -    -    -    89

H(x) = x mod 10


Delete Example

Move it to bucket 4. Done with delete of 23. Delete 103:

Keys to delete: 103

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    103  44   -    -    -    -    89

H(x) = x mod 10


Delete Example

103 hashes to bucket 3; it is there, so remove it.

What about rehashes?

Keys to delete: done

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    -    44   -    -    -    -    89

H(x) = x mod 10


Delete Example

I must rehash from 4 to 4 as bucket 5 is the next open bucket.

44 hashes to bucket 4, so no moves are necessary.

Keys to delete: done

Bucket:  0    1    2    3    4    5    6    7    8    9
Key:     -    -    -    -    44   -    -    -    -    89

H(x) = x mod 10


Picking a Good Hash Function

Suppose I wish to hash the following values:

53, 21, 37, 5, 58, 42, 26, 10

Suppose further my table size is 16.

I could try H(X) = X mod 16
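A quick way to try this hash function is to group the keys by the bucket they land in. The short Python sketch below does that; the variable names are illustrative only.

```python
from collections import defaultdict

keys = [53, 21, 37, 5, 58, 42, 26, 10]

# Group each key under its bucket for H(X) = X mod 16.
buckets = defaultdict(list)
for key in keys:
    buckets[key % 16].append(key)

print(dict(buckets))
print("distinct buckets used:", len(buckets))
```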


Picking a Good Hash Function II

This doesn’t work out so well:

Number  Mod 16
    53       5
    21       5
    37       5
     5       5
    58      10
    42      10
    26      10
    10      10

What else can I do?

Try looking at the bits:


Picking a Good Hash Function III

Number  Mod 16   32   16    8    4    2    1
    53       5    1    1    0    1    0    1
    21       5    0    1    0    1    0    1
    37       5    1    0    0    1    0    1
     5       5    0    0    0    1    0    1
    58      10    1    1    1    0    1    0
    42      10    1    0    1    0    1    0
    26      10    0    1    1    0    1    0
    10      10    0    0    1    0    1    0


Picking a Good Hash Function IV

Number  Mod 16   32   16    8    4    2    1   4bit*8 + 8bit*4 + 16bit*2 + 32bit*1
    53       5    1    1    0    1    0    1   11
    21       5    0    1    0    1    0    1   10
    37       5    1    0    0    1    0    1   9
     5       5    0    0    0    1    0    1   8
    58      10    1    1    1    0    1    0   7
    42      10    1    0    1    0    1    0   5
    26      10    0    1    1    0    1    0   6
    10      10    0    0    1    0    1    0   4
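The formula in the last column can be checked with a few bit operations. In the Python sketch below, the function name `bit_hash` is hypothetical; the weights come straight from the column heading, which in effect reverses the order of the 32, 16, 8 and 4 bits and ignores the 2 and 1 bits.

```python
def bit_hash(x):
    """Compute 4bit*8 + 8bit*4 + 16bit*2 + 32bit*1 for an integer x."""
    bit4 = (x >> 2) & 1      # the bit worth 4
    bit8 = (x >> 3) & 1      # the bit worth 8
    bit16 = (x >> 4) & 1     # the bit worth 16
    bit32 = (x >> 5) & 1     # the bit worth 32
    return bit4 * 8 + bit8 * 4 + bit16 * 2 + bit32 * 1


keys = [53, 21, 37, 5, 58, 42, 26, 10]
hashes = {key: bit_hash(key) for key in keys}
print(hashes)
```

For these eight keys every result is different, so no two of them collide in a table of size 16.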


Collision Resolution

Linear probing often produces a “cluster” of values. This slows the process of finding/inserting a value.

What else can we do?
  • Linear probing with a factor > 1.
  • Quadratic probing.
  • Double hashing.
  • Closed addressing.


Linear Probing with k>1

Suppose we use a hash function such as

H(x) = (x + k*i) mod n

where “k” is the multiplying constant. Start with i=0 and proceed to n-1 until an open bucket is found.

This will move values away from a cluster more quickly.


Linear Probing Problem

In our previous example, n=16. With k=6, we have

H(x) = (x + 6*i) mod 16

What is the probe sequence for x=3?

3, 9, 15, 5, 11, 1, 7, 13, 3

OOPS!!! Not all buckets are reachable!


Problem Solution

“n” and “k” must be relatively prime. Thus, try it with k=3, using

H(x) = (x + 3*i) mod 16

What is the probe sequence for x=3?

3, 6, 9, 12, 15, 2, 5, 8, 11, 14, 1, 4, 7, 10, 13, 0, 3

The absolute best is to choose “n” prime, since every k from 1 to n-1 is then relatively prime to n.
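The effect of the multiplying constant can be checked by listing the probe sequence and counting the distinct buckets it reaches. A small Python sketch, in which the function name `probe_sequence` is hypothetical:

```python
from math import gcd


def probe_sequence(x, k, n):
    """Buckets visited by H(x) = (x + k*i) mod n for i = 0 .. n-1."""
    return [(x + k * i) % n for i in range(n)]


for k in (6, 3):
    seq = probe_sequence(3, k, 16)
    print(f"k={k}: gcd(k, 16)={gcd(k, 16)}, reaches {len(set(seq))} buckets")
    print(seq)
```

With k=6 the sequence repeats after 8 probes because gcd(6, 16) = 2; with k=3 all 16 buckets are visited before the sequence returns to bucket 3.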


Quadratic Probing

Uses something like

H(x) = (x + i^2) mod n

Advantage: moves values farther more quickly.

Disadvantage: not guaranteed to probe every bucket (but about half of the buckets are reachable if n is prime).
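The coverage limit can be seen by listing the buckets a quadratic probe visits. The Python sketch below uses a prime table size of 11 as an illustrative assumption; the function name `quadratic_probes` is hypothetical.

```python
def quadratic_probes(x, n):
    """Buckets visited by H(x) = (x + i^2) mod n for i = 0 .. n-1."""
    return [(x + i * i) % n for i in range(n)]


seq = quadratic_probes(3, 11)
print(seq)
print("distinct buckets reached:", len(set(seq)), "of", 11)
```

For n=11 the sequence reaches 6 of the 11 buckets and then starts repeating in reverse order.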



Double Hashing

Uses something like

H1(x) = [x + i*H2(x)] mod n

H2(x) = R - (x mod R)

Here, it is very important to choose n and R prime, with R < n.
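A short Python sketch of the probe sequence that double hashing produces. The values n=11 and R=7 are illustrative choices satisfying the conditions above, and the function names `h2` and `double_hash_probes` are hypothetical.

```python
def h2(x, r):
    """Second hash: R - (x mod R), always between 1 and R, so never zero."""
    return r - (x % r)


def double_hash_probes(x, n, r):
    """Buckets visited by H1(x) = [x + i*H2(x)] mod n for i = 0 .. n-1."""
    step = h2(x, r)
    return [(x + i * step) % n for i in range(n)]


seq = double_hash_probes(3, 11, 7)
print("step:", h2(3, 7))
print(seq)
```

Because n is prime and the step lies between 1 and R < n, the step is relatively prime to n and the sequence visits every bucket once. Two keys that share a home bucket usually get different steps, which is what breaks up clusters.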



Closed Addressing

Idea: sidestep collisions entirely by allowing more than one key value per bucket, so no key ever has to move to another bucket.

How to allow multiple values per bucket?
  • Linked list
  • Binary search tree
  • AVL tree
  • Another hash table
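A minimal Python sketch of the idea, using the simplest of the options listed. A Python `list` stands in for the linked list here, which is an assumption made for brevity; the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Closed addressing: each bucket holds a chain of key values."""

    def __init__(self, n):
        self.buckets = [[] for _ in range(n)]    # one empty chain per bucket

    def _chain(self, key):
        return self.buckets[key % len(self.buckets)]   # H(x) = x mod n

    def insert(self, key):
        chain = self._chain(key)
        if key not in chain:                     # keep key values unique
            chain.append(key)

    def contains(self, key):
        return key in self._chain(key)

    def delete(self, key):
        chain = self._chain(key)
        if key in chain:
            chain.remove(key)                    # no rehashing of other keys needed
            return True
        return False


table = ChainedHashTable(10)
for key in (17, 23, 89, 103, 44):
    table.insert(key)
print(table.buckets[3], table.buckets[4])
```

Keys 23 and 103 share bucket 3 while 44 stays in its own bucket 4, so deleting 23 requires no follow-up moves.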
