COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV

Topics

How to choose a hash function?
Closed hashing
    Linear probing
    Quadratic probing
    Double hashing

Hash Functions

A good hash function:
    Is easy and fast to compute
    Has a minimal number of clashes (collisions)
    Spreads data items uniformly throughout the array

Hashing problems reduce to the following points:
    Finding a hashing method that minimizes collisions
    Resolving collisions when they do happen

Hashing Methods

Integer Type
    It is sufficient for a hash function to operate on integers
    Any arbitrary integer can be converted into an integer within a certain range
    The index of the hash table lies within a specific range

Solutions
    Digit selection
    Folding
    Modulo arithmetic

Hashing Methods

Digit Selection
    Choose a group of digits from the number
    Use a combination of mod/div operations on the search key
    One of the most effective hashing methods

Hashing Methods

Digit Selection Example
    Assume table size = 1000
    Key = 01234567
        Choose the 2nd, 4th, and last digits
        H(key) = 137
    Key = d1 d2 d3 d4 d5 d6 d7 d8 d9
        Choose the leftmost 3 digits
            H(key) = key div 1000000 = d1 d2 d3
        Choose the rightmost 3 digits
            H(key) = key mod 1000 = d7 d8 d9
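
The div/mod selections above can be sketched in Python as follows. This is only an illustration: the function names and the 9-digit sample key are assumptions, not part of the slides.

```python
def leftmost_three(key: int) -> int:
    """Leftmost 3 digits of a 9-digit key: key div 1000000."""
    return key // 1_000_000


def rightmost_three(key: int) -> int:
    """Rightmost 3 digits of a key: key mod 1000."""
    return key % 1000


def select_digits(key: str, positions) -> int:
    """Join the digits at the given 1-based positions into one index."""
    return int("".join(key[p - 1] for p in positions))


sample_key = 123456789  # hypothetical 9-digit key
print(leftmost_three(sample_key))              # 123
print(rightmost_three(sample_key))             # 789
print(select_digits("123456789", [1, 5, 9]))   # 159
```

Passing the key as a string in select_digits keeps any leading zero, which an integer would lose.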

Hashing Methods

Digit Selection: Mid-square Method (Multiplication)
    First variant
        The key is squared, then some digits of the square are selected to give the index.
    Example
        k = 54321
        k^2 = 2950771041
        Pick the 3 middle digits: H(k) = index = 077
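
A minimal sketch of the mid-square idea, assuming the "middle" of an even-length square leans one position to the left (which is what yields 077 for k = 54321). The function name and the num_digits parameter are illustrative choices.

```python
def mid_square(key: int, num_digits: int = 3) -> int:
    """Square the key and return its middle num_digits digits as an index."""
    squared = str(key * key)
    # Start of the middle window; never negative for very short squares.
    start = max(0, (len(squared) - num_digits) // 2)
    return int(squared[start:start + num_digits])


print(mid_square(54321))  # 54321 squared is 2950771041, middle digits 077 -> 77
```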

Hashing Methods

Folding Method
    Digits are added together instead of just being selected
    Digits can first be grouped, and the groups are then added
    Folding can be done more than once on the search key

Hashing Methods

Folding Method Example
    Key = 1234567
    H(Key) = 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28

Disadvantage
    All values fall in a small range (0 to 63 for a 7-digit key)

Solution
    Divide into groups, then fold
        Key = 1234567
        Groups: 12 345 67
        Fold: 12 + 345 + 67 = 424
    Hash again to fit into the table size
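
Both folding variants can be tried out with a short Python sketch. The function names, the group sizes argument, and the table size of 101 are assumptions made for the example.

```python
def fold_digits(key: int) -> int:
    """Add the individual digits of the key."""
    return sum(int(d) for d in str(key))


def fold_groups(key: int, group_sizes) -> int:
    """Split the digits into consecutive groups and add the groups."""
    digits = str(key)
    total, pos = 0, 0
    for size in group_sizes:
        total += int(digits[pos:pos + size])
        pos += size
    return total


def fold_hash(key: int, group_sizes, table_size: int) -> int:
    """Fold, then hash again so the result fits the table."""
    return fold_groups(key, group_sizes) % table_size


print(fold_digits(1234567))                 # digit-by-digit fold
print(fold_groups(1234567, [2, 3, 2]))      # groups 12, 345, 67
print(fold_hash(1234567, [2, 3, 2], 101))   # folded value reduced mod 101
```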

Hashing Methods

Modulo Arithmetic
    Choose a prime table size
    Divide the search key by the table size and take the remainder
        h(x) = x mod TableSize
    Items will be distributed over the table

Advantages
    Simple
    Reduces collisions
        Items tend to be distributed more evenly if the table size is a prime number
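
A small Python experiment, under assumed sample keys, suggests why a prime table size is preferred: keys that share a factor with the table size pile up in a few slots.

```python
def mod_hash(key: int, table_size: int) -> int:
    """h(x) = x mod TableSize."""
    return key % table_size


keys = [10, 20, 30, 40, 50, 60, 70]  # hypothetical keys, all multiples of 10

slots_size_10 = [mod_hash(k, 10) for k in keys]  # non-prime table size
slots_size_11 = [mod_hash(k, 11) for k in keys]  # prime table size

print(slots_size_10)  # [0, 0, 0, 0, 0, 0, 0]: every key collides
print(slots_size_11)  # [10, 9, 8, 7, 6, 5, 4]: no collisions
```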

Hashing Methods

What should be done if the search key is a character string?
    Convert the character string into an integer before applying the hash function

How should we do that?
    Use the ASCII codes
        Adding them up can lead to duplication (e.g. NOTE and TONE will result in the same hash value)
    Write a numeric value for each character in binary
        Concatenate the results

Hashing Methods

Example: Key = NOTE
    Numeric code for each character (its position in the alphabet)
        N = 14 = (01110)
        O = 15 = (01111)
        T = 20 = (10100)
        E =  5 = (00101)
    Concatenation
        Binary result: y = (01110 01111 10100 00101)
        Equivalent decimal: x = 474,757
    Apply the hash function
        h(x) = x mod TableSize
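
The conversion can be sketched in Python, assuming uppercase letters A to Z coded as 1 to 26 in 5 bits each. The function name and the table size of 101 are illustrative.

```python
def string_to_int(word: str) -> int:
    """Concatenate 5-bit letter codes (A=1 ... Z=26) into one integer."""
    value = 0
    for ch in word.upper():
        code = ord(ch) - ord("A") + 1   # position in the alphabet
        value = (value << 5) | code     # append 5 bits
    return value


x = string_to_int("NOTE")
print(x)          # 474757
print(x % 101)    # h(x) for an assumed table size of 101

# Unlike a plain sum of character codes, concatenation keeps the letter order.
print(sum(map(ord, "NOTE")) == sum(map(ord, "TONE")))        # True
print(string_to_int("NOTE") == string_to_int("TONE"))        # False
```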

Closed Hashing (Open Addressing)

No secondary data structure
All the data goes inside the table
On collision, try alternate cells until an empty cell is found. How?
A bigger table is needed

Linear Probing

Linear search from the position where the collision occurred.

[Figure: a hash table with slots [0] to [100] holding records Number 506643548, Number 233667136, Number 281942902, Number 155778322, and Number 580625685; a new record, Number 701466868, has hash value [2].]

    The new record's hash value is [2], but [2] is occupied. What to do?
    This is called a collision, because there is already another valid record at [2].
    When a collision occurs, move forward until you find an empty spot.
    [5] is empty, so the record can be inserted there.
    The new record goes in the empty spot.

Linear Probing

Find the next index in the array; when the maximum subscript is reached, wrap around to the first index

Try alternate cells
    Cells h0(x), h1(x), h2(x), … are tried until a free cell is found
    hi(x) = ( hash(x) + f(i) ) mod TSIZE, with f(0) = 0

Linear probing: f(i) = i
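
A minimal Python sketch of insertion with linear probing, assuming integer keys, hash(x) = x mod TSIZE, and None as the marker for an empty cell. The function name and the sample keys are illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key using h_i(x) = (hash(x) + i) mod TSIZE; return the slot used."""
    size = len(table)
    home = key % size                   # assumed hash function
    for i in range(size):               # f(i) = i
        index = (home + i) % size       # wraps around past the last subscript
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 7
slots = [linear_probe_insert(table, k) for k in (10, 17, 24, 6, 13)]
print(slots)  # [3, 4, 5, 6, 0]: 17 and 24 collide at 3, 13 wraps around to 0
print(table)  # [13, None, None, 10, 17, 24, 6]
```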

Searching for a Key

[Figure: the same hash table, now also holding Number 701466868; a search key with hash value [2] is compared with the records at [2], [3], and [4] ("Not me.") and then matches at [5] ("Yes!").]

    The data that's attached to a key can be found fairly quickly.
    Calculate the hash value.
    Check that location of the array for the key.
    Keep moving forward until you find the key, or you reach an empty spot.
    When the item is found, the information can be copied to the necessary location.
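
The search steps can be sketched in Python as below. The sketch assumes integer keys, hash(x) = x mod TSIZE, None for an empty spot, and a small sample table that was filled by linear probing; all names are illustrative.

```python
def linear_probe_search(table, key):
    """Return the slot holding key, or None if an empty spot is reached first."""
    size = len(table)
    home = key % size                    # calculate the hash value
    for i in range(size):
        index = (home + i) % size        # keep moving forward
        if table[index] is None:         # empty spot: the key is not present
            return None
        if table[index] == key:          # "Yes!"
            return index
    return None                          # every slot was checked


# Hypothetical table of size 7 filled by linear probing with key mod 7.
table = [13, None, None, 10, 17, 24, 6]
print(linear_probe_search(table, 24))   # 5: slots 3 and 4 hold other keys
print(linear_probe_search(table, 31))   # None: the probe reaches empty slot 1
```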

Deleting a Record

[Figure: the same hash table; the record Number 506643548 says "Please delete me." and is then removed from its slot.]

    Records may also be deleted from a hash table.
    But the location must not be left as an ordinary "empty spot" since that could interfere with searches.
    The location must be marked in some special way so that a search can tell that the spot used to have something in it.
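
One common way to mark a deleted location is a special "tombstone" value. The Python sketch below is an illustration under several assumptions: integer keys that are inserted at most once, hash(x) = x mod TSIZE, and hypothetical names such as LinearProbingTable and DELETED.

```python
EMPTY = None
DELETED = object()  # tombstone: the spot used to have something in it


class LinearProbingTable:
    def __init__(self, size):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield the slots h_0(x), h_1(x), ... for linear probing."""
        size = len(self.slots)
        home = key % size
        for i in range(size):
            yield (home + i) % size

    def insert(self, key):
        """Place key in the first empty or deleted slot (keys assumed distinct)."""
        for index in self._probe(key):
            if self.slots[index] is EMPTY or self.slots[index] is DELETED:
                self.slots[index] = key
                return index
        raise RuntimeError("hash table is full")

    def search(self, key):
        """Skip tombstones; stop only at a truly empty slot."""
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return None
            if slot is not DELETED and slot == key:
                return index
        return None

    def delete(self, key):
        """Replace the key with a tombstone instead of an ordinary empty spot."""
        index = self.search(key)
        if index is None:
            return False
        self.slots[index] = DELETED
        return True


table = LinearProbingTable(7)
for k in (10, 17, 24):       # all three hash to slot 3
    table.insert(k)
table.delete(17)             # slot 4 becomes a tombstone
print(table.search(24))      # 5: the search walks past the tombstone
```

If delete stored EMPTY instead of the tombstone, the search for 24 would stop at slot 4 and wrongly report that the key is missing.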

Linear Probing

Advantage
    Uses less memory than chaining
        Don’t have to store all the links

Disadvantages
    Can be slower than chaining
        May have to walk along the table for a long way
    Difficult to delete a key and its associated record
        Deletion has an impact on the search process
    Clustering
        Primary clustering: the table contains groups of consecutively occupied locations

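Quadratic Probing

Linear probing: f(i) = i
Quadratic probing: f(i) = i^2

Insert 10, 40, 60, 20, 30, 70, 80 into a table with indices 0 to 9 (every key hashes to 0)

    Key   Last probe tried   Slot
    10    0^2 = 0            0
    40    1^2 = 1            1
    60    2^2 = 4            4
    20    3^2 = 9            9
    30    4^2 = 16           6   (16 mod 10 = 6)
    70    5^2 = 25           5   (25 mod 10 = 5)
    80    6^2 = 36           6   (36 mod 10 = 6, already occupied by 30)

Table after inserting 70:

    Index:  0   1   2   3   4   5   6   7   8   9
    Value:  10  40  -   -   60  70  30  -   -   20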
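
The insertion sequence can be replayed with a short Python sketch, assuming hash(x) = x mod 10, None for an empty cell, and at most TSIZE probes per key. The function name is illustrative.

```python
def quadratic_probe_insert(table, key):
    """Insert key using h_i(x) = (hash(x) + i*i) mod TSIZE.

    Returns the slot used, or None if no free cell is found
    within TSIZE probes.
    """
    size = len(table)
    home = key % size                      # assumed hash function
    for i in range(size):
        index = (home + i * i) % size      # f(i) = i^2
        if table[index] is None:
            table[index] = key
            return index
    return None


table = [None] * 10
slots = [quadratic_probe_insert(table, k) for k in (10, 40, 60, 20, 30, 70, 80)]
print(slots)  # [0, 1, 4, 9, 6, 5, None]
print(table)  # [10, 40, None, None, 60, 70, 30, None, None, 20]
```

The last key, 80, is never placed: its probes only ever reach slots 0, 1, 4, 5, 6, and 9, even though slots 2, 3, 7, and 8 are free.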

Quadratic Probing

Advantages
    Easy to compute
    Avoids primary clustering

Disadvantages
    Not all entries are searched
        Might not encounter a free storage location even when some locations are still free
    Elements that have the same hash value will probe the same set of alternate cells
        Secondary clustering
        Not a big problem in practice
            Use a good hash function

Double Hashing

Use two hash functions
    The first, as before, generates the ‘home’ position
    The second generates the offset used to step away from the home position, which defines the probe sequence
        probe = (probe + offset) mod N

If the size of the table is prime and the offset is not a multiple of N, this method will eventually examine every position in the table.
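
A minimal Python sketch of double hashing. The second hash function used here, 1 + (key mod (N - 1)), is one common choice and an assumption of this example, as are the function names; it never yields a zero offset.

```python
def probe_sequence(key, size):
    """Yield the slots visited for key: home first, then home + offset, ..."""
    home = key % size                  # first hash: the 'home' position
    offset = 1 + key % (size - 1)      # second hash: assumed, always >= 1
    probe = home
    for _ in range(size):
        yield probe
        probe = (probe + offset) % size


def double_hash_insert(table, key):
    """Insert key into the first free slot of its probe sequence."""
    for index in probe_sequence(key, len(table)):
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 7                     # prime table size
print([double_hash_insert(table, k) for k in (10, 17, 24)])  # [3, 2, 4]
print(list(probe_sequence(17, 7)))     # [3, 2, 1, 0, 6, 5, 4]: every slot once
```

Keys 10, 17, and 24 share home position 3 but step away with different offsets, so they do not follow the same chain of alternate cells.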

Problems with Closed Hashing

Table too full
    Running time too long
    Inserts could fail

Table size must be chosen in advance
    Don’t know the number of elements

Rehashing
    Build a new table that is about twice as big
    Hash the elements into the new table
        Need to apply the new hash function to every item in the old hash table
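
Rehashing can be sketched in Python as follows, assuming linear probing, integer keys, None for empty cells, and no deletion markers in the old table. Picking the next prime at or above twice the old size is an assumption of this example; all names are illustrative.

```python
def is_prime(n):
    """Trial division; adequate for table sizes in a small example."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    """Smallest prime that is >= n."""
    while not is_prime(n):
        n += 1
    return n


def insert(table, key):
    """Linear probing insert with hash(x) = x mod TSIZE."""
    size = len(table)
    for i in range(size):
        index = (key % size + i) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


def rehash(old_table):
    """Build a table about twice as big and re-insert every old item."""
    new_table = [None] * next_prime(2 * len(old_table))
    for key in old_table:
        if key is not None:
            insert(new_table, key)     # uses the new table size
    return new_table


old_table = [13, None, None, 10, 17, 24, 6]   # hypothetical table of size 7
new_table = rehash(old_table)
print(len(new_table))                         # 17
print(new_table.index(24))                    # 7, since 24 mod 17 = 7
```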

Summary

Hash tables are specialized for dictionary operations: Insert, Delete, Search
Principle: turn the key field of the record into a number, which we use as an index for locating the item in an array
    O(1) in the ideal case
Problems: finding a good hash function, collisions, wasted space, no support for ordering queries
Implementations: open hashing, closed hashing, dynamic hashing

Review

What is a perfect hash function?
What is a collision?
What is meant by clustering? How does clustering affect the overall efficiency of hashing?
What is a bucket?
What is the time complexity for insertion, deletion, and search in hashing?