Hashing
A cool way to get from an element x to the place where x can be found
An array [0..B-1] of buckets
Each bucket contains a list of set elements
B = number of buckets
A hash function that takes potential set elements and produces a “random” integer in [0..B-1]
Example
If the set elements are integers then the simplest/best hash function is usually h(x) = x % B
Suppose B = 6 and we wish to store the integers {70, 53, 99, 94, 83, 76, 64, 30}
They belong in buckets 4, 5, 3, 4, 5, 4, 4, and 0, respectively
Note: If B = 7, the same integers go to buckets 0, 4, 1, 3, 6, 6, 1, and 2
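The bucket numbers above can be checked with a short Java sketch. The class and method names (ModHashDemo, h, buckets) are illustrative assumptions, not part of the slides; Math.floorMod is used instead of % so that negative keys would also land in [0..B-1].

```java
import java.util.Arrays;

class ModHashDemo {
    // h(x) = x % B; floorMod keeps the result in [0, B-1] even for negative x.
    static int h(int x, int B) {
        return Math.floorMod(x, B);
    }

    // Bucket number for every key, in input order.
    static int[] buckets(int[] keys, int B) {
        int[] out = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            out[i] = h(keys[i], B);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] keys = {70, 53, 99, 94, 83, 76, 64, 30};
        System.out.println(Arrays.toString(buckets(keys, 6))); // [4, 5, 3, 4, 5, 4, 4, 0]
        System.out.println(Arrays.toString(buckets(keys, 7))); // [0, 4, 1, 3, 6, 6, 1, 2]
    }
}
```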
Pitfalls of Hash Function Selection
We want to get a uniform distribution of elements into buckets
Beware of data patterns that cause non-uniform distribution
Example
If the integers were all even, then B = 6 would cause only buckets 0, 2, and 4 to fill
If we hashed the words in the UNIX dictionary into 10 buckets by length of word, then 20% would go into bucket 7
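The even-integer pitfall can be seen by counting how many keys fall into each bucket. The sketch below is only an illustration: the class name BucketHistogram, the helper evens, and the choice of 60 keys are assumptions made for the demo.

```java
import java.util.Arrays;

class BucketHistogram {
    // Number of keys that h(x) = x % B sends to each bucket.
    static int[] histogram(int[] keys, int B) {
        int[] counts = new int[B];
        for (int k : keys) {
            counts[Math.floorMod(k, B)]++;
        }
        return counts;
    }

    // The first `count` even integers: 0, 2, 4, ...
    static int[] evens(int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = 2 * i;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] keys = evens(60);
        // B = 6 shares the factor 2 with every key: odd buckets stay empty.
        System.out.println(Arrays.toString(histogram(keys, 6))); // [20, 0, 20, 0, 20, 0]
        // B = 7 shares no factor with the keys: every bucket is used.
        System.out.println(Arrays.toString(histogram(keys, 7))); // [9, 8, 9, 8, 9, 8, 9]
    }
}
```

Choosing B with no factor in common with the pattern in the data (a prime B is a common choice) avoids this particular skew.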
Dictionary Operations
Lookup
Go to head of bucket h(x)
Search the bucket list to see whether x is in the bucket
Insertion: append if not found
Delete – list deletion from bucket list
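The three dictionary operations can be sketched as a small chained set of integers. This is a minimal sketch under stated assumptions: the class name ChainedIntSet, the use of java.util.LinkedList for the bucket lists, and h(x) = x % B as the hash function are all choices made for the illustration.

```java
import java.util.LinkedList;

class ChainedIntSet {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedIntSet(int B) {
        buckets = new LinkedList[B];
        for (int i = 0; i < B; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Go to the head of bucket h(x), with h(x) = x % B.
    private LinkedList<Integer> bucketFor(int x) {
        return buckets[Math.floorMod(x, buckets.length)];
    }

    // Lookup: search the bucket list for x.
    boolean lookup(int x) {
        return bucketFor(x).contains(x);
    }

    // Insertion: append if not found; returns false if x was already present.
    boolean insert(int x) {
        LinkedList<Integer> bucket = bucketFor(x);
        if (bucket.contains(x)) {
            return false;
        }
        bucket.addLast(x);
        return true;
    }

    // Delete: ordinary list deletion from the bucket list.
    boolean delete(int x) {
        return bucketFor(x).remove(Integer.valueOf(x));
    }
}
```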
Analysis
If we pick B to be near n, the number of elements in the set, then the average list is O(1) long
Thus, dictionary ops take O(1) time on average
Worst case: all elements go into one bucket, and ops take O(n) time
Managing Hash Table Size
If n gets as high as 2B, create a new hash table with 2B buckets
“Rehash” every element into the new table
O(n) time total
There were at least n inserts since the last “rehash”
All these inserts took time O(n)
Thus, we “amortize” the cost of rehashing over the inserts since the last rehash
Constant factor, at worst
So, even with rehashing we get O(1) time ops
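To make the amortization argument concrete, the sketch below counts how many element moves the rehashes cost in total. Everything here is an assumption made for illustration: the class name GrowingIntSet, the starting size of one bucket, and the rule of doubling as soon as n reaches 2B.

```java
import java.util.ArrayList;
import java.util.List;

class GrowingIntSet {
    private List<List<Integer>> buckets = newTable(1);
    private int n = 0;
    long rehashMoves = 0; // total elements moved by all rehashes so far

    private static List<List<Integer>> newTable(int B) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < B; i++) {
            table.add(new ArrayList<>());
        }
        return table;
    }

    int bucketCount() { return buckets.size(); }

    int size() { return n; }

    boolean insert(int x) {
        List<Integer> bucket = buckets.get(Math.floorMod(x, buckets.size()));
        if (bucket.contains(x)) {
            return false;
        }
        bucket.add(x);
        n++;
        if (n >= 2 * buckets.size()) {
            grow(); // n reached 2B: rebuild with 2B buckets
        }
        return true;
    }

    // Rehash every element into a table with twice as many buckets.
    private void grow() {
        List<List<Integer>> old = buckets;
        buckets = newTable(2 * old.size());
        for (List<Integer> chain : old) {
            for (int x : chain) {
                buckets.get(Math.floorMod(x, buckets.size())).add(x);
                rehashMoves++;
            }
        }
    }

    public static void main(String[] args) {
        GrowingIntSet set = new GrowingIntSet();
        for (int i = 0; i < 1000; i++) {
            set.insert(i);
        }
        // Rehashes happen at n = 2, 4, 8, ..., 512, moving 1022 elements in total.
        System.out.println(set.rehashMoves + " moves for " + set.size() + " inserts");
    }
}
```

In this run the total rehashing work stays below twice the number of inserts, which is the constant factor the amortized argument predicts.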
Collisions
A collision occurs when two values in the set hash to the same value
There are several ways to deal with this
Chaining (using a linked list or some secondary structure)
Open Addressing
Double hashing
Linear Probing
Chaining
The eight integers from the earlier example, hashed with h(x) = x % 7:
Bucket  Elements
0       70
1       99, 64
2       30
3       94
4       53
5       (empty)
6       83, 76
Chaining is very efficient time-wise
Other approaches use less space
Open Addressing
When a collision occurs, if the table is not full, find an available space
Linear Probing
Double Hashing
Linear Probing
If the current location is occupied, try the next table location
LinearProbingInsert(K) {
    if (table is full) error;
    probe = h(K);
    while (table[probe] is occupied)
        probe = (probe + 1) % M;
    table[probe] = K;
}
Walk along the table until an empty spot is found
Uses less memory than chaining (no links)
Takes more time than chaining (long walks)
Deleting is a pain (mark a slot as having been deleted)
Linear Probing Example
h(K) = K % 13
Insert: 18, 41, 22, 59, 32, 31, 73
h(K):   5,  2,  9,  7,  6,  5,  8
18, 41, 22, 59, and 32 go straight into their home slots 5, 2, 9, 7, and 6
31 hashes to 5, which is occupied; slots 6 and 7 are occupied too, so it lands in slot 8
73 hashes to 8, which now holds 31; slot 9 is occupied, so it lands in slot 10
Final table:
Slot:  0   1   2   3   4   5   6   7   8   9   10  11  12
Key:   -   -   41  -   -   18  32  59  31  22  73  -   -
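The linear probing walk-through (h(K) = K % 13, inserting 18, 41, 22, 59, 32, 31, 73) can be reproduced with a minimal Java sketch. The class name LinearProbingTable and the use of null to mark an empty slot are assumptions for the illustration; like the slide pseudocode, it does not check for duplicate keys or support deletion.

```java
class LinearProbingTable {
    private final Integer[] table; // null marks an empty slot
    private int count = 0;

    LinearProbingTable(int M) {
        table = new Integer[M];
    }

    // Inserts K and returns the slot where it ended up.
    int insert(int K) {
        if (count == table.length) {
            throw new IllegalStateException("table is full");
        }
        int probe = Math.floorMod(K, table.length); // h(K) = K % M
        while (table[probe] != null) {
            probe = (probe + 1) % table.length;     // try the next location
        }
        table[probe] = K;
        count++;
        return probe;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(13);
        for (int k : new int[]{18, 41, 22, 59, 32, 31, 73}) {
            System.out.println(k + " -> slot " + t.insert(k));
        }
        // Slots, in insertion order: 5, 2, 9, 7, 6, 8, 10
    }
}
```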
Double Hashing
If the current location is occupied, try another table location
Use two hash functions
If M is prime, the probe sequence eventually examines every location (provided the offset is never a multiple of M)
DoubleHashInsert(K) {
    if (table is full) error;
    probe = h1(K);
    offset = h2(K);
    while (table[probe] is occupied)
        probe = (probe + offset) % M;
    table[probe] = K;
}
Many of the same (dis)advantages as linear probing
Distributes keys more evenly than linear probing
Double Hashing Example
h1(K) = K % 13
h2(K) = 8 - K % 8
Insert: 18, 41, 22, 59, 32, 31, 73
h1(K):  5,  2,  9,  7,  6,  5,  8
h2(K):  6,  7,  2,  5,  8,  1,  7
18, 41, 22, 59, and 32 go straight into slots 5, 2, 9, 7, and 6
31 hashes to 5 with offset 1: slots 5, 6, and 7 are occupied, so it lands in slot 8
73 hashes to 8 with offset 7: slots 8, 2, and 9 are occupied, so it lands in slot 3
Final table:
Slot:  0   1   2   3   4   5   6   7   8   9   10  11  12
Key:   -   -   41  73  -   18  32  59  31  22  -   -   -
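The double hashing example (h1(K) = K % 13, h2(K) = 8 - K % 8) can be traced with a sketch that follows the DoubleHashInsert pseudocode. The class name DoubleHashingTable is an assumption; h2 is hard-coded to the example's function, so the sketch is only meant for a prime table size larger than 8, such as M = 13.

```java
class DoubleHashingTable {
    private final Integer[] table; // null marks an empty slot
    private int count = 0;

    DoubleHashingTable(int M) {
        table = new Integer[M];
    }

    static int h2(int K) {
        return 8 - Math.floorMod(K, 8); // always in [1, 8], never 0
    }

    // Inserts K and returns the slot where it ended up.
    int insert(int K) {
        if (count == table.length) {
            throw new IllegalStateException("table is full");
        }
        int probe = Math.floorMod(K, table.length); // h1(K) = K % M
        int offset = h2(K);
        while (table[probe] != null) {
            probe = (probe + offset) % table.length; // jump by the key's own offset
        }
        table[probe] = K;
        count++;
        return probe;
    }

    public static void main(String[] args) {
        DoubleHashingTable t = new DoubleHashingTable(13);
        for (int k : new int[]{18, 41, 22, 59, 32, 31, 73}) {
            System.out.println(k + " -> slot " + t.insert(k));
        }
        // Slots, in insertion order: 5, 2, 9, 7, 6, 8, 3
    }
}
```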
Implementing Hash Tables
public class HashMap implements Map {
    private transient Entry table[];
    private transient int count;
    ….
    public HashMap(int initialCapacity, float loadFactor)
    public HashMap(int initialCapacity)
    public boolean containsValue(Object value)
    public boolean containsKey(Object key)
    public Object get(Object key)
    public Object put(Object key, Object value)
    public Object remove(Object key)
}
Constructor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal InitialCapacity " + initialCapacity);
    if (loadFactor <= 0)
        throw new IllegalArgumentException("Illegal loadFactor " + loadFactor);
    if (initialCapacity == 0)
        initialCapacity = 1;
    this.loadFactor = loadFactor;
    table = new Entry[initialCapacity];
    // Rehash once count reaches this many entries
    threshold = (int)(initialCapacity * loadFactor);
} // constructor
containsKey()
public boolean containsKey(Object key) {
    Entry tab[] = table;
    if (key != null) {
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry e = tab[index]; e != null; e = e.next)
            if (e.hash == hash && key.equals(e.key))
                return true;
    } else {
        // The null key is stored with hash 0, so it always lives in bucket 0
        for (Entry e = tab[0]; e != null; e = e.next)
            if (e.key == null)
                return true;
    }
    return false;
} // method containsKey
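The expression (hash & 0x7FFFFFFF) % tab.length clears the sign bit before taking the remainder, because hashCode() may be negative and Java's % keeps the sign of the left operand. A small sketch, with the class and method names (BucketIndex, index) chosen only for illustration:

```java
class BucketIndex {
    // Same index computation as in the HashMap code on the slides.
    static int index(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        System.out.println(-7 % 11);                      // -7: not a valid array index
        System.out.println(index(-7, 11));                // 6
        System.out.println(index(Integer.MIN_VALUE, 11)); // 0
    }
}
```

Masking is used rather than Math.abs because Math.abs(Integer.MIN_VALUE) is still negative.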
put()
public Object put(Object key, Object value) {
    Entry tab[] = table;
    int hash = 0;
    int index = 0;
    if (key != null) {
        hash = key.hashCode();
        index = (hash & 0x7FFFFFFF) % tab.length;
        // Key already present: overwrite and return the old value
        for (Entry e = tab[index]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                Object old = e.value;
                e.value = value;
                return old;
            }
        }
    } else {
        // The null key always lives in bucket 0
        for (Entry e = tab[0]; e != null; e = e.next) {
            if (e.key == null) {
                Object old = e.value;
                e.value = value;
                return old;
            }
        }
    } // key == null
    modCount++;
    if (count >= threshold) {
        // Table is too full: grow it, then recompute the bucket index
        rehash();
        tab = table;
        index = (hash & 0x7FFFFFFF) % tab.length;
    }
    // Key not present: link a new entry at the head of its bucket
    Entry e = new Entry(hash, key, value, tab[index]);
    tab[index] = e;
    count++;
    return null;
} // method put
rehash()
private void rehash() {
    int oldCapacity = table.length;
    Entry oldMap[] = table;
    int newCapacity = oldCapacity * 2 + 1;
    Entry newMap[] = new Entry[newCapacity];
    modCount++;
    threshold = (int)(newCapacity * loadFactor);
    table = newMap;
    // Walk the old buckets from last to first, relinking each entry into the new table
    for (int i = oldCapacity; i-- > 0;) {
        for (Entry old = oldMap[i]; old != null;) {
            Entry e = old;
            old = old.next;
            int index = (e.hash & 0x7FFFFFFF) % newCapacity;
            e.next = newMap[index];
            newMap[index] = e;
        }
    }
}
remove()
public Object remove(Object key) {
    Entry tab[] = table;
    if (key != null) {
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry e = tab[index], prev = null; e != null; prev = e, e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                modCount++;
                // Unlink e from its bucket list
                if (prev != null) prev.next = e.next;
                else tab[index] = e.next;
                count--;
                Object oldValue = e.value;
                e.value = null;
                return oldValue;
            }
        }
    } else {
        // The null key always lives in bucket 0
        for (Entry e = tab[0], prev = null; e != null; prev = e, e = e.next) {
            if (e.key == null) {
                modCount++;
                if (prev != null) prev.next = e.next;
                else tab[0] = e.next;
                count--;
                Object oldValue = e.value;
                e.value = null;
                return oldValue;
            }
        }
    }
    return null;
}
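For comparison, the return-value conventions in the code above (put returns the previous value or null, remove returns the removed value or null, and a null key is allowed) match the behavior of the standard java.util.HashMap. The sketch below exercises the real library class; the wrapper class HashMapUsage and its trace method are hypothetical names used only to collect the results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashMapUsage {
    // Runs a few operations and records what each one returns.
    static List<Object> trace() {
        Map<String, Integer> m = new HashMap<>(16, 0.75f);
        List<Object> out = new ArrayList<>();
        out.add(m.put("a", 1));       // null: key was absent
        out.add(m.put("a", 2));       // 1: the old value is returned
        out.add(m.put(null, 0));      // null: a null key is allowed
        out.add(m.containsKey(null)); // true
        out.add(m.remove("a"));       // 2: the removed value
        out.add(m.remove("a"));       // null: nothing left to remove
        out.add(m.size());            // 1: only the null key remains
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [null, 1, null, true, 2, null, 1]
    }
}
```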
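Theoretical Results
Expected number of probes per search, where α is the load factor (elements stored divided by table size); these are the standard textbook estimates
                  Not Found                 Found
Chaining          1 + α                     1 + α/2
Linear Probing    1/2 + 1/(2(1 - α)^2)      1/2 + 1/(2(1 - α))
Double Hashing    1/(1 - α)                 (1/α) ln(1/(1 - α))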
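To get a feel for these results, the sketch below evaluates the standard textbook probe-count estimates at a few load factors. The formulas coded here are an assumption in the sense that they are the usual estimates for each scheme, written in terms of a load factor a; the class and method names are illustrative.

```java
class ProbeEstimates {
    // Expected probes as a function of the load factor a (0 < a < 1 for open addressing).
    static double chainingNotFound(double a) { return 1 + a; }
    static double chainingFound(double a)    { return 1 + a / 2; }
    static double linearNotFound(double a)   { return 0.5 * (1 + 1 / ((1 - a) * (1 - a))); }
    static double linearFound(double a)      { return 0.5 * (1 + 1 / (1 - a)); }
    static double doubleNotFound(double a)   { return 1 / (1 - a); }
    static double doubleFound(double a)      { return Math.log(1 / (1 - a)) / a; }

    public static void main(String[] args) {
        for (double a : new double[]{0.5, 0.75, 0.9}) {
            System.out.printf("a=%.2f  linear: %.2f / %.2f  double: %.2f / %.2f%n",
                    a, linearNotFound(a), linearFound(a), doubleNotFound(a), doubleFound(a));
        }
    }
}
```

At a = 0.5 an unsuccessful search costs about 2.5 probes with linear probing and 2 with double hashing; both estimates grow quickly as the table approaches full.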