HASHING CSC 172 SPRING 2002 LECTURE 22

Hashing

A cool way to get from an element x to the place where x can be found

An array [0..B-1] of buckets

Bucket contains a list of set elements

B = number of buckets

A hash function that takes potential set elements and produces a “random” integer in [0..B-1]

Example

If the set elements are integers then the simplest/best hash function is usually h(x) = x % B

Suppose B = 6 and we wish to store the integers {70, 53, 99, 94, 83, 76, 64, 30}

They belong in the buckets 4, 5, 3, 4, 5, 4, 4, and 0

Note: if B = 7, they belong in the buckets 0, 4, 1, 3, 6, 6, 1, and 2
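The bucket numbers in this example can be checked with a few lines of Java. This is a minimal sketch; the class name ModHashDemo and the helper bucketsFor are made up for illustration and are not part of the lecture.

```java
import java.util.Arrays;

class ModHashDemo {
    // h(x) = x % B, as on the slide (assumes x is non-negative).
    static int h(int x, int numBuckets) {
        return x % numBuckets;
    }

    // Returns the bucket index of each key, in the order given.
    static int[] bucketsFor(int[] keys, int numBuckets) {
        int[] result = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = h(keys[i], numBuckets);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] keys = {70, 53, 99, 94, 83, 76, 64, 30};
        System.out.println(Arrays.toString(bucketsFor(keys, 6))); // [4, 5, 3, 4, 5, 4, 4, 0]
        System.out.println(Arrays.toString(bucketsFor(keys, 7))); // [0, 4, 1, 3, 6, 6, 1, 2]
    }
}
```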

Pitfalls of Hash Function Selection

We want to get a uniform distribution of elements into buckets

Beware of data patterns that cause non-uniform distribution

Example

If integers were all even, then B = 6 would cause only buckets 0, 2, and 4 to fill

If we hashed words in the UNIX dictionary into 10 buckets by length of word, then 20% would go into bucket 7
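The even-integer pitfall is easy to see by counting how many keys land in each bucket. The sketch below uses a made-up data set (the even numbers 0 through 82) and a hypothetical helper bucketCounts; it compares B = 6 with B = 7.

```java
import java.util.Arrays;

class BucketSpreadDemo {
    // Counts how many keys fall into each bucket under h(x) = x % B.
    // Assumes all keys are non-negative.
    static int[] bucketCounts(int[] keys, int numBuckets) {
        int[] counts = new int[numBuckets];
        for (int key : keys) {
            counts[key % numBuckets]++;
        }
        return counts;
    }

    // Illustrative data: the first `count` even numbers 0, 2, 4, ...
    static int[] evens(int count) {
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            result[i] = 2 * i;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] keys = evens(42);
        // With B = 6 the odd-numbered buckets stay empty.
        System.out.println(Arrays.toString(bucketCounts(keys, 6))); // [14, 0, 14, 0, 14, 0]
        // With B = 7 the same keys spread evenly.
        System.out.println(Arrays.toString(bucketCounts(keys, 7))); // [6, 6, 6, 6, 6, 6, 6]
    }
}
```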

Dictionary Operations

Lookup

Go to head of bucket h(x)

Search the bucket list to see if x is in the bucket

Insertion: append if not found

Delete – list deletion from bucket list
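The three dictionary operations can be sketched as a small set of integers with one linked list per bucket. The class name ChainedIntSet and its method names are assumptions chosen for illustration, not code from the lecture.

```java
import java.util.LinkedList;

class ChainedIntSet {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedIntSet(int numBuckets) {
        buckets = new LinkedList[numBuckets];
        for (int i = 0; i < numBuckets; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // h(x) = x mod B; floorMod keeps the result in [0..B-1] for negative x too.
    private int h(int x) {
        return Math.floorMod(x, buckets.length);
    }

    // Lookup: go to bucket h(x) and search its list for x.
    boolean lookup(int x) {
        return buckets[h(x)].contains(x);
    }

    // Insertion: append to the bucket list if x is not already there.
    boolean insert(int x) {
        if (lookup(x)) {
            return false;
        }
        buckets[h(x)].addLast(x);
        return true;
    }

    // Delete: ordinary list deletion from the bucket list.
    boolean delete(int x) {
        return buckets[h(x)].remove(Integer.valueOf(x));
    }
}
```

Each operation touches only one bucket, so its cost is proportional to the length of that bucket's list.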

Analysis

If we pick B to be near n, the number of elements in the set, then the average list is O(1) long

Thus, dictionary ops take O(1) time on average

Worst case: all elements go into one bucket, and ops take O(n) time

Managing Hash Table Size

If n gets as high as 2B, create a new hash table with 2B buckets

“Rehash” every element into the new table

O(n) time total

There were at least n/2 inserts since the last “rehash”

All these inserts took time proportional to n

Thus, we “amortize” the cost of rehashing over the inserts since the last rehash

Constant factor, at worst

So, even with rehashing we get O(1) time ops
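The doubling rule can be sketched as follows. This is an illustration under stated assumptions: the class GrowingIntSet, its starting size of 2 buckets, and the rehashCount field (kept only so the growth can be observed) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class GrowingIntSet {
    private List<List<Integer>> buckets = newTable(2); // assumed starting size
    private int size = 0;
    int rehashCount = 0; // demo-only counter, not needed by the algorithm

    private static List<List<Integer>> newTable(int numBuckets) {
        List<List<Integer>> table = new ArrayList<>(numBuckets);
        for (int i = 0; i < numBuckets; i++) {
            table.add(new ArrayList<>());
        }
        return table;
    }

    private static int h(int x, int numBuckets) {
        return Math.floorMod(x, numBuckets);
    }

    boolean contains(int x) {
        return buckets.get(h(x, buckets.size())).contains(x);
    }

    void insert(int x) {
        if (contains(x)) {
            return;
        }
        buckets.get(h(x, buckets.size())).add(x);
        size++;
        // If n gets as high as 2B, move to a table with 2B buckets.
        if (size >= 2 * buckets.size()) {
            rehash();
        }
    }

    // O(n): every element is hashed again into the larger table.
    private void rehash() {
        List<List<Integer>> bigger = newTable(2 * buckets.size());
        for (List<Integer> chain : buckets) {
            for (int x : chain) {
                bigger.get(h(x, bigger.size())).add(x);
            }
        }
        buckets = bigger;
        rehashCount++;
    }

    int numBuckets() {
        return buckets.size();
    }

    int size() {
        return size;
    }
}
```

Inserting the keys 0 through 99 into this sketch triggers five rehashes (at sizes 4, 8, 16, 32, and 64) that move 124 elements in total, which is a constant factor of the 100 inserts.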

Collisions

A collision occurs when two values in the set hash to the same value

There are several ways to deal with this

Chaining (using a linked list or some secondary structure)

Open Addressing

Double hashing

Linear Probing

Chaining

The earlier example with B = 7, h(x) = x % 7

Bucket  Chain
0       70
1       99 64
2       30
3       94
4       53
5       (empty)
6       83 76

Very efficient time-wise

Other approaches use less space

Open Addressing

When a collision occurs, if the table is not full, find an available space

Linear Probing

Double Hashing

Linear Probing

If the current location is occupied, try the next table location

LinearProbingInsert(K) {
    if (table is full) error;
    probe = h(K);
    while (table[probe] is occupied)
        probe = (probe + 1) % M;
    table[probe] = K;
}

Walk along table until an empty spot is found

Uses less memory than chaining (no links)

Takes more time than chaining (long walks)

Deleting is a pain (mark a slot as having been deleted)
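The remark about deletion can be made concrete with a sketch that marks deleted slots instead of emptying them, so that later lookups keep walking past them. The class LinearProbingSet, its State enum, and the choice to reuse deleted slots on insert are illustrative assumptions, not the lecture's code.

```java
import java.util.Arrays;

class LinearProbingSet {
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final int[] keys;
    private final State[] state;

    LinearProbingSet(int m) {
        keys = new int[m];
        state = new State[m];
        Arrays.fill(state, State.EMPTY);
    }

    private int h(int k) {
        return Math.floorMod(k, keys.length);
    }

    // Returns the slot holding k, or -1 if k is absent.
    // The walk stops at an EMPTY slot but continues past DELETED ones.
    int find(int k) {
        int probe = h(k);
        for (int i = 0; i < keys.length; i++) {
            if (state[probe] == State.EMPTY) {
                return -1;
            }
            if (state[probe] == State.OCCUPIED && keys[probe] == k) {
                return probe;
            }
            probe = (probe + 1) % keys.length;
        }
        return -1;
    }

    // Inserts k and returns its slot; the first non-occupied slot is used.
    int insert(int k) {
        int existing = find(k);
        if (existing >= 0) {
            return existing;
        }
        int probe = h(k);
        for (int i = 0; i < keys.length; i++) {
            if (state[probe] != State.OCCUPIED) {
                keys[probe] = k;
                state[probe] = State.OCCUPIED;
                return probe;
            }
            probe = (probe + 1) % keys.length;
        }
        throw new IllegalStateException("table is full");
    }

    // Marks the slot as deleted rather than empty, so walks are not cut short.
    boolean delete(int k) {
        int slot = find(k);
        if (slot < 0) {
            return false;
        }
        state[slot] = State.DELETED;
        return true;
    }
}
```

If delete simply emptied the slot, a key that had walked past it during insertion would become unreachable.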

Linear Probing

h(K) = K % 13

Insert: 18, 41, 22, 59, 32, 31, 73

h(K) : 5, 2, 9, 7, 6, 5, 8

Slot:   0   1   2   3   4   5   6   7   8   9  10  11  12
Key:    -   -  41   -   -  18  32  59  31  22  73   -   -

31 hashes to 5; slots 5, 6, and 7 are occupied, so it lands in slot 8

73 hashes to 8; slots 8 and 9 are occupied, so it lands in slot 10
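The final table of this walk-through can be reproduced with a direct translation of LinearProbingInsert. In this sketch the class name LinearProbingTrace and the use of -1 as the empty marker (which assumes keys are non-negative) are choices made for illustration.

```java
import java.util.Arrays;

class LinearProbingTrace {
    static final int EMPTY = -1; // assumption: keys are non-negative

    // Inserts the keys in order using h(K) = K % m and returns the table.
    static int[] insertAll(int[] keys, int m) {
        if (keys.length > m) {
            throw new IllegalArgumentException("table is full");
        }
        int[] table = new int[m];
        Arrays.fill(table, EMPTY);
        for (int k : keys) {
            int probe = k % m;
            while (table[probe] != EMPTY) {
                probe = (probe + 1) % m; // try the next table location
            }
            table[probe] = k;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {18, 41, 22, 59, 32, 31, 73};
        System.out.println(Arrays.toString(insertAll(keys, 13)));
        // [-1, -1, 41, -1, -1, 18, 32, 59, 31, 22, 73, -1, -1]
    }
}
```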

Double Hashing

If the current location is occupied, try another table location

Use two hash functions

If M is prime, eventually will examine every location

DoubleHashInsert(K) {
    if (table is full) error;
    probe = h1(K);
    offset = h2(K);
    while (table[probe] is occupied)
        probe = (probe + offset) % M;
    table[probe] = K;
}

Many of the same (dis)advantages as linear probing

Distributes keys more evenly than linear probing

Double Hashing

h1(K) = K % 13

h2(K) = 8 - K % 8

Insert: 18, 41, 22, 59, 32, 31, 73

h1(K) : 5, 2, 9, 7, 6, 5, 8

h2(K) : 6, 7, 2, 5, 8, 1, 7

Slot:   0   1   2   3   4   5   6   7   8   9  10  11  12
Key:    -   -  41  73   -  18  32  59  31  22   -   -   -

31: slot 5 is occupied; stepping by h2(31) = 1 tries 6 and 7 (both occupied), then lands in slot 8

73: slot 8 is occupied; stepping by h2(73) = 7 tries 2 and 9 (both occupied), then lands in slot 3
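The double hashing example can be traced with a direct translation of DoubleHashInsert. The class name DoubleHashingTrace and the -1 empty marker are assumptions for illustration; the two hash functions are the ones given on the slide, and the sketch relies on M being prime so that every probe sequence reaches every slot.

```java
import java.util.Arrays;

class DoubleHashingTrace {
    static final int EMPTY = -1; // assumption: keys are non-negative

    static int h1(int k, int m) {
        return k % m;
    }

    // Offset in the range 1..8, so it is never zero.
    static int h2(int k) {
        return 8 - k % 8;
    }

    // Inserts the keys in order and returns the table. Assumes m is prime.
    static int[] insertAll(int[] keys, int m) {
        if (keys.length > m) {
            throw new IllegalArgumentException("table is full");
        }
        int[] table = new int[m];
        Arrays.fill(table, EMPTY);
        for (int k : keys) {
            int probe = h1(k, m);
            int offset = h2(k);
            while (table[probe] != EMPTY) {
                probe = (probe + offset) % m; // jump by the key's own offset
            }
            table[probe] = k;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {18, 41, 22, 59, 32, 31, 73};
        System.out.println(Arrays.toString(insertAll(keys, 13)));
        // [-1, -1, 41, 73, -1, 18, 32, 59, 31, 22, -1, -1, -1]
    }
}
```

Compared with linear probing on the same keys, 73 ends up in slot 3 instead of slot 10 because its offset is 7 rather than 1.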

Implementing Hash Tables

public class HashMap implements Map {
    private transient Entry table[];
    private transient int count;
    ….
    public HashMap(int initialCapacity, float loadFactor)
    public HashMap(int initialCapacity)
    public boolean containsValue(Object value)
    public boolean containsKey(Object key)
    public Object get(Object key)
    public Object put(Object key, Object value)
    public Object remove(Object key)
}

Constructor

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal InitialCapacity " + initialCapacity);
    if (loadFactor <= 0)
        throw new IllegalArgumentException("Illegal loadFactor " + loadFactor);
    if (initialCapacity == 0)
        initialCapacity = 1;
    this.loadFactor = loadFactor;
    table = new Entry[initialCapacity];
    // rehash once count reaches this many entries
    threshold = (int)(initialCapacity * loadFactor);
} // constructor

containsKey()

public boolean containsKey(Object key) {
    Entry tab[] = table;
    if (key != null) {
        int hash = key.hashCode();
        // mask off the sign bit so the index is non-negative
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry e = tab[index]; e != null; e = e.next)
            if (e.hash == hash && key.equals(e.key))
                return true;
    } else {
        // a null key is stored in bucket 0
        for (Entry e = tab[0]; e != null; e = e.next)
            if (e.key == null)
                return true;
    }
    return false;
} // method containsKey

put()

public Object put(Object key, Object value) {
    Entry tab[] = table;
    int hash = 0;
    int index = 0;

    if (key != null) {
        hash = key.hashCode();
        index = (hash & 0x7FFFFFFF) % tab.length;
        // key already present: replace its value and return the old one
        for (Entry e = tab[index]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                Object old = e.value;
                e.value = value;
                return old;
            }
        }
    } else {
        for (Entry e = tab[0]; e != null; e = e.next) {
            if (e.key == null) {
                Object old = e.value;
                e.value = value;
                return old;
            }
        }
    } // key == null

    modCount++;
    if (count >= threshold) {
        // table is too full: grow it, then recompute the bucket index
        rehash();
        tab = table;
        index = (hash & 0x7FFFFFFF) % tab.length;
    }

    // new key: link a new entry at the head of the bucket list
    Entry e = new Entry(hash, key, value, tab[index]);
    tab[index] = e;
    count++;
    return null;
} // method put

rehash()

private void rehash() {
    int oldCapacity = table.length;
    Entry oldMap[] = table;
    int newCapacity = oldCapacity * 2 + 1;
    Entry newMap[] = new Entry[newCapacity];
    modCount++;
    threshold = (int)(newCapacity * loadFactor);
    table = newMap;
    // walk every old bucket and relink each entry into the new table
    for (int i = oldCapacity; i-- > 0;) {
        for (Entry old = oldMap[i]; old != null;) {
            Entry e = old;
            old = old.next;
            int index = (e.hash & 0x7FFFFFFF) % newCapacity;
            e.next = newMap[index];
            newMap[index] = e;
        }
    }
}

remove()

public Object remove(Object key) {
    Entry tab[] = table;
    if (key != null) {
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry e = tab[index], prev = null; e != null; prev = e, e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                modCount++;
                // unlink e from the bucket list
                if (prev != null) prev.next = e.next;
                else tab[index] = e.next;
                count--;
                Object oldValue = e.value;
                e.value = null;
                return oldValue;
            }
        }
    } else {
        for (Entry e = tab[0], prev = null; e != null; prev = e, e = e.next) {
            if (e.key == null) {
                modCount++;
                if (prev != null) prev.next = e.next;
                else tab[0] = e.next;
                count--;
                Object oldValue = e.value;
                e.value = null;
                return oldValue;
            }
        }
    }
    return null;
}
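From the caller's side, the return values of put() and remove() and the handling of a null key can be observed with the standard java.util.HashMap. This is a usage sketch only: the class name HashMapUsageDemo and its run method are made up here, and it uses the generic form of the API rather than the raw Object signatures shown on the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashMapUsageDemo {
    // Collects the result of each call, in order, so the results can be inspected.
    static List<Object> run() {
        Map<String, Integer> map = new HashMap<>(16, 0.75f);
        List<Object> results = new ArrayList<>();
        results.add(map.put("x", 1));       // null: no previous value
        results.add(map.put("x", 2));       // 1: the old value is returned
        results.add(map.put(null, 99));     // null: a null key is allowed
        results.add(map.containsKey(null)); // true
        results.add(map.get("x"));          // 2
        results.add(map.remove("x"));       // 2: the removed value is returned
        results.add(map.containsKey("x"));  // false
        results.add(map.size());            // 1: only the null key remains
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```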

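Theoretical Results

Expected number of probes, where $\alpha$ is the load factor (number of elements divided by table size)

                  Not Found                                  Found
Chaining          $1 + \alpha$                               $1 + \alpha/2$
Linear Probing    $\frac{1}{2} + \frac{1}{2(1-\alpha)^2}$    $\frac{1}{2} + \frac{1}{2(1-\alpha)}$
Double Hashing    $\frac{1}{1-\alpha}$                       $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$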

[Figure: Expected Probes, with curves for Linear Probing, Double Hashing, and Chaining; axis ticks at 0.5 and 1.0]
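To get a feel for how the three schemes compare, the expected probe counts can be evaluated at a few load factors. The formulas in this sketch are the standard textbook approximations for a table with load factor a; that they are exactly the ones tabulated in the lecture is an assumption, and the class and method names are made up for illustration.

```java
class ExpectedProbes {
    // Assumed formulas (standard textbook approximations), a = load factor.
    static double chainingNotFound(double a) { return 1 + a; }
    static double chainingFound(double a)    { return 1 + a / 2; }

    // Linear probing: defined for 0 <= a < 1.
    static double linearNotFound(double a) { return 0.5 * (1 + 1 / ((1 - a) * (1 - a))); }
    static double linearFound(double a)    { return 0.5 * (1 + 1 / (1 - a)); }

    // Double hashing: defined for 0 < a < 1.
    static double doubleNotFound(double a) { return 1 / (1 - a); }
    static double doubleFound(double a)    { return Math.log(1 / (1 - a)) / a; }

    public static void main(String[] args) {
        for (double a : new double[] {0.5, 0.9}) {
            System.out.printf("a=%.1f chaining %.2f/%.2f linear %.2f/%.2f double %.2f/%.2f%n",
                    a,
                    chainingNotFound(a), chainingFound(a),
                    linearNotFound(a), linearFound(a),
                    doubleNotFound(a), doubleFound(a));
        }
    }
}
```

Under these formulas, an unsuccessful search at load factor 0.9 costs about 50.5 probes with linear probing, 10 with double hashing, and 1.9 with chaining, which is why open-addressing tables are usually kept well below full.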