Lecture 10 Sept 29

Goals:

• hashing

• dictionary operations

• general idea of hashing

• hash functions

• chaining

• closed hashing

Dictionary operations

• search • insert • delete

Applications:

• compilers

• web searching

• game playing, state space search

• spell checking etc.

Dictionary operations: search, insert, delete

                 ARRAY                  LINKED LIST
           sorted      unsorted    sorted      unsorted
Search     O(log n)    O(n)        O(n)        O(n)
Insert     O(n)        O(1)        O(n)        O(n)
Delete     O(n)        O(n)        O(n)        O(n)

Costs count comparisons and data movements combined (assuming keys can be compared with <, > and = outcomes).

Exercise: Create two separate tables for data movements (assignments) and comparisons.

Performance goal for dictionary operations:

O(n) is too inefficient.

Goal is to perform each of the operations in:

(a) average O(log n)

(b) worst-case O(log n)

(c) average constant time

Data structures that achieve these goals:

(a) binary search tree

(b) balanced binary search tree (AVL tree)

(c) hashing (but worst-case is O(n))

Hashing

o An important and widely useful technique for implementing dictionaries

o Constant time per operation (on the average)

o Worst-case time proportional to the size of the set for each operation (just like the array and linked list implementations)

General idea

U = Set of all possible keys: (e.g. 9 digit SS #)

If n = |U| is not very large, a simple way to support dictionary operations is:

Map each key e in U to a unique integer h(e) in the range 0 .. n – 1.

Use a Boolean array H[0 .. n – 1] to record which keys are present: H[h(e)] is true exactly when e is in the set.
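
The ideal case above can be sketched in a few lines of C++. This is only an illustration, assuming the keys are already integers in 0 .. n – 1 (so h(e) = e); the class name DirectAddressSet is hypothetical and not part of the lecture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct-address set: H[k] is true exactly when key k is present.
// Assumption: keys are integers in 0 .. n-1, so no hash function is needed.
class DirectAddressSet {
public:
    explicit DirectAddressSet(std::size_t n) : H(n, false) {}

    // Each operation touches one array cell, so it takes constant time.
    void insert(std::size_t key) { assert(key < H.size()); H[key] = true; }
    void remove(std::size_t key) { assert(key < H.size()); H[key] = false; }
    bool contains(std::size_t key) const { assert(key < H.size()); return H[key]; }

private:
    std::vector<bool> H;   // one Boolean per possible key
};
```

All three operations are O(1), but the array needs one cell for every possible key, which is what makes this approach impractical when U is large.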

Ideal case not realistic

• U, the set of all possible keys, is usually very large, so we can’t create an array of size n = |U|.

• Create an array H of size m, much smaller than n.

• The number of keys actually present at any time is usually much smaller than n.

• A mapping h from U to {0, 1, …, m – 1} is called a hash function.

Example: D = students currently enrolled in courses, U = set of all SS #’s, hash table size m = 1000

Hash function h(x) = last three digits of x

Example (continued)

Insert student “Dan”, SS# = 1238769871

h(1238769871) = 871

[Figure: hash table with buckets 0, 1, 2, 3, ..., 999; bucket 871 points to a list node holding Dan, followed by NULL.]

Example (continued)

Insert student “Tim”, SS# = 1872769871

h(1872769871) = 871, same as that of Dan.

Collision

[Figure: the same hash table with buckets 0, 1, 2, 3, ..., 999; bucket 871 already holds Dan, followed by NULL.]
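
The collision in this example can be checked with a small sketch. The function names h and collide and the constant TABLE_SIZE are chosen here for illustration; the only thing taken from the lecture is the rule "last three digits" and the two SS#s.

```cpp
#include <cassert>

// Table size from the lecture's example.
const long long TABLE_SIZE = 1000;

// Hash function from the example: keep the last three decimal digits.
int h(long long key) {
    assert(key >= 0);   // assumption: keys are non-negative numbers
    return static_cast<int>(key % TABLE_SIZE);
}

// Two distinct keys collide when they hash to the same bucket.
bool collide(long long k1, long long k2) {
    return k1 != k2 && h(k1) == h(k2);
}
```

Here h(1238769871) and h(1872769871) are both 871, so collide(1238769871, 1872769871) is true: Dan and Tim compete for the same bucket.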

Hash Functions

If h(k1) == h(k2) for two distinct keys k1 and k2, then k1 and k2 collide at that slot.

There are two approaches to resolve collisions.

Collision Resolution Policies

Two ways to resolve:

(1) Open hashing, also known as separate chaining

(2) Closed hashing

Chaining: keys that collide are stored in a linked list.

Previous Example:

Insert student “Tim”, SS# = 1872769871

h(1872769871) = 871, same as that of Dan.

Collision

[Figure: hash table with buckets 0, 1, 2, 3, ..., 999; bucket 871 now points to a linked list holding both Dan and Tim, terminated by NULL.]

Open Hashing

Each entry of the hash table is a pointer to the head of a linked list.

All elements that hash to a particular bucket are placed on that bucket’s linked list.

Records within a bucket can be ordered in several ways: by order of insertion, by key value, by frequency of access, by most recent access, etc.

Open Hashing Data Organization

[Figure: an array of D buckets, indexed 0, 1, 2, 3, 4, ..., D-1, each pointing to its own linked list of records.]

Hash table class

Data members:

• hashArray: array of list pointers

• currentSize: the number of keys in the table

Member functions:

• search

• insert

• delete

• other supporting functions …

Basic HashTable class

template <typename HashedObj>
class HashTable {
public:
    explicit HashTable(int size = 101);
    bool contains(const HashedObj& x) const;
    void makeEmpty();
    void insert(const HashedObj& x);
    void remove(const HashedObj& x);

private:
    vector<list<HashedObj> > theLists;   // one list (bucket) per table slot
    int currentSize;                     // number of keys in the table
    void rehash();
    int myhash(const HashedObj& x) const;
};

Choice of hash function

A good hash function should:

• be easy to compute

• distribute the keys uniformly to the buckets

• use all the fields of the key object.

Example: key is a string over {a, …, z, 0, …, 9, _}.

Suppose the hash table size is 10007. (Choose the table size to be a prime number.)

Good hash function: interpret the string as a number to base 37 and compute mod 10007.

h(“word”) = ? “w” = 23, “o” = 15, “r” = 18 and “d” = 4.

h(“word”) = (23 * 37^3 + 15 * 37^2 + 18 * 37 + 4) % 10007 = 1186224 % 10007 = 5398
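
The base-37 computation can be sketched as follows. The letter values a = 1, …, z = 26 follow the example above; the values used for the digits and for '_' are an assumption made only so the sketch covers the whole alphabet, and the names digitValue and hashBase37 are hypothetical.

```cpp
#include <cassert>
#include <string>

// Map a character to a base-37 digit.
// Letters a..z -> 1..26 follows the lecture ("d" = 4, "w" = 23).
// ASSUMPTION (not in the lecture): '_' -> 0 and '0'..'9' -> 27..36.
int digitValue(char c) {
    if (c >= 'a' && c <= 'z') return c - 'a' + 1;
    if (c >= '0' && c <= '9') return c - '0' + 27;
    assert(c == '_');   // any other character is outside the alphabet
    return 0;
}

// Interpret key as a base-37 number and reduce it mod tableSize.
// The reduction is applied after every digit so the value stays small.
int hashBase37(const std::string& key, int tableSize = 10007) {
    long long value = 0;
    for (char c : key)
        value = (value * 37 + digitValue(c)) % tableSize;
    return static_cast<int>(value);
}
```

With these values, hashBase37("word") evaluates (23 * 37^3 + 15 * 37^2 + 18 * 37 + 4) % 10007, which is 5398.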

Computing hash function for a string

Horner’s rule: ( … ((a_0 x + a_1) x + a_2) x + … + a_{n-2}) x + a_{n-1}

unsigned int hash(const string& key) {
    // unsigned arithmetic: overflow wraps around instead of being undefined
    unsigned int hashVal = 0;
    for (size_t i = 0; i < key.length(); i++)
        hashVal = 37 * hashVal + key[i];
    return hashVal;
}

int myhash(const HashedObj& x) const {
    unsigned int hashVal = hash(x);
    hashVal %= theLists.size();   // map into 0 .. theLists.size() - 1
    return hashVal;
}

Alternatively, we can apply % theLists.size() after each iteration of the loop in the hash function.

int myhash(const string& key) const {
    int hashVal = 0;
    int s = theLists.size();
    for (size_t i = 0; i < key.length(); i++)
        hashVal = (37 * hashVal + key[i]) % s;   // hashVal stays below s
    return hashVal;
}
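
Reducing after every step gives the same bucket as reducing once at the end, because (37 * v + c) mod s depends only on v mod s. The sketch below compares the two on short keys; the function names hashThenMod and modEachStep are hypothetical, and a 64-bit accumulator is assumed so the first version does not wrap for the keys tried.

```cpp
#include <cassert>
#include <string>

// Reduce once at the end. With a 64-bit accumulator this is exact only
// for short keys (about 11 characters or fewer); longer keys wrap around.
unsigned long long hashThenMod(const std::string& key, unsigned long long s) {
    unsigned long long hashVal = 0;
    for (unsigned char c : key)
        hashVal = 37 * hashVal + c;
    return hashVal % s;
}

// Reduce after every step. The intermediate value stays below 37 * s + 256,
// so it cannot overflow for any realistic table size, whatever the key length.
unsigned long long modEachStep(const std::string& key, unsigned long long s) {
    unsigned long long hashVal = 0;
    for (unsigned char c : key)
        hashVal = (37 * hashVal + c) % s;
    return hashVal;
}
```

Both functions use the character codes directly, as the lecture code does, so for "word" with s = 10007 they both return 9665 on an ASCII system. The per-step version is the safer choice for long keys.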

Implementation of open hashing - search

bool contains(const HashedObj& x) const {
    // search for x in the open hash table;
    // returns true if x is in the table, false otherwise
    int add = myhash(x);
    if (add < 0 || add > (int)theLists.size() - 1) {
        cout << "hash address incorrect" << endl;
        return false;
    }
    const list<HashedObj>& bucket = theLists[add];
    return find(bucket.begin(), bucket.end(), x) != bucket.end();   // std::find from <algorithm>
}

Insert into an open hash table

Key x is inserted at the front of the list for the bucket to which x hashes.

void insert(const HashedObj& x) {
    // insert x into the open hash table (duplicates are ignored)
    int add = myhash(x);
    if (add < 0 || add > (int)theLists.size() - 1)
        cout << "hash address incorrect" << endl;
    else if (!contains(x)) {
        theLists[add].push_front(x);
        currentSize++;
    }
}

Implementation of open hashing - delete

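void remove(const HashedObj& x) {
    // remove x from the open hash table; does nothing if x is not in the table
    // (delete is a reserved word in C++, so the member is named remove)
    list<HashedObj>& bucket = theLists[myhash(x)];
    typename list<HashedObj>::iterator itr = find(bucket.begin(), bucket.end(), x);
    if (itr != bucket.end()) {
        bucket.erase(itr);
        currentSize--;
    }
}

We assume that insertion and deletion are supported by the list class.

We may have to write a different version of insert depending on the ordering policy used within a bucket.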
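
Putting the pieces together, here is a minimal self-contained sketch of open hashing. It is not the lecture's HashTable class: it is specialized to string keys, omits rehashing, always inserts at the front of a chain, and returns a bool from insert and remove for convenience. The class name ChainedHashSet and the size() member are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Separate-chaining hash set of strings (a sketch; no rehashing).
class ChainedHashSet {
public:
    explicit ChainedHashSet(std::size_t buckets = 101)
        : theLists(buckets), currentSize(0) {
        assert(buckets > 0);
    }

    bool contains(const std::string& x) const {
        const std::list<std::string>& bucket = theLists[myhash(x)];
        return std::find(bucket.begin(), bucket.end(), x) != bucket.end();
    }

    // Returns true if x was inserted, false if it was already present.
    bool insert(const std::string& x) {
        if (contains(x)) return false;
        theLists[myhash(x)].push_front(x);   // insert at the front of the chain
        ++currentSize;
        return true;
    }

    // Returns true if x was found and removed.
    bool remove(const std::string& x) {
        std::list<std::string>& bucket = theLists[myhash(x)];
        std::list<std::string>::iterator itr =
            std::find(bucket.begin(), bucket.end(), x);
        if (itr == bucket.end()) return false;
        bucket.erase(itr);
        --currentSize;
        return true;
    }

    std::size_t size() const { return currentSize; }

private:
    std::vector<std::list<std::string> > theLists;   // one chain per bucket
    std::size_t currentSize;                         // number of stored keys

    // Horner's rule with base 37, reduced after every step.
    std::size_t myhash(const std::string& key) const {
        std::size_t hashVal = 0;
        for (std::size_t i = 0; i < key.length(); ++i)
            hashVal = (37 * hashVal + static_cast<unsigned char>(key[i]))
                      % theLists.size();
        return hashVal;
    }
};
```

Constructing the set with a single bucket, ChainedHashSet t(1), forces every key into the same chain, which is a quick way to exercise the collision path: each operation then degrades to a linear scan, the O(n) worst case mentioned earlier.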