LECTURE 12: COLLISIONS
CSC 213 – Large Scale Programming
Today’s Goal
Review when & why we need search ADTs
  Why Sequence-based approach causes problems
  How hash can help solve these problems
  What is inappropriate and incorrect about hash jokes
Discover hash’s problems & what must be done
  What would happen if keys hashed to same index
  Ways of handling situation so that hash still works
  To remove data, using null may not be best option
Dark secrets of hashing, exposed at lecture’s end
Map Performance
In many situations, speed can be a matter of life-or-death
  911 operators immediately need addresses
  Google’s search performance in TB/s
  O(log n) time too slow for these uses
Would love to use arrays
  Convert key to int with hash function
  With result of hash, have index in table to examine
  put, remove & get would then take only O(1) time
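A minimal Java sketch of the "convert key to int, then use it as an index" idea, assuming the key's own `hashCode()` serves as the hash function. `HashIndex` and `indexFor` are hypothetical names, not a course API. `Math.floorMod` is used instead of `%` because `hashCode()` may be negative.

```java
// Sketch: turn a key into a table index (HashIndex/indexFor are hypothetical names)
class HashIndex {
    // key -> int via hashCode(), then int -> index in [0, tableLength)
    static int indexFor(Object key, int tableLength) {
        int h = key.hashCode();
        // floorMod (not %) so a negative hashCode still gives a valid index
        return Math.floorMod(h, tableLength);
    }

    public static void main(String[] args) {
        String[] table = new String[13];
        table[indexFor("Jay Doe", table.length)] = "Jay Doe";    // put: one hash, one array write
        String found = table[indexFor("Jay Doe", table.length)]; // get: one hash, one array read
        System.out.println(found); // prints Jay Doe
    }
}
```

Both operations cost one hash plus one array access, which is where the hoped-for O(1) comes from.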
Hash Table
(Figure: table of Entrys; • marks a null index)
Index   Entry
0       •
1       0256120001, “Jay Doe”
2       9811010002, “Bob Doe”
3       •
4       4512290004, “Jill Roe”
⁞       ⁞
9997    •
9998    2007519998, “Rhi Smith”
9999    •
Hash Table
Array locations either:
  null
  Reference to Entry
  Marker value*
Table will contain gaps
  Better when spread out
Hash key to index
  Always start with hash
  After hash, move to array
Ideal World
key hashed to unique index
  Hash and done, Entry is there
And then…
You wake up
Collisions
Occurs when 2 keys hash to same index
Ideal hash spreads keys out evenly across table
  As nice side effect, this limits collisions
  Small table size important also, since RAM limited
Unfortunately, no such thing as ideal hash
  Must handle collisions to get O(1) efficiency
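To see what "spreads keys out evenly" buys, this sketch counts the most keys that share a single index. The keys 10, 20, ..., 130 and the two table sizes are illustrative assumptions: under x mod 10 every key lands on index 0, while under x mod 13 no two keys collide.

```java
// Sketch: measure how evenly h(x) = x mod tableSize spreads a set of keys
class SpreadCheck {
    // Largest number of keys that land on one index
    static int maxPerIndex(int[] keys, int tableSize) {
        int[] counts = new int[tableSize];
        int max = 0;
        for (int k : keys) {
            int i = Math.floorMod(k, tableSize);
            counts[i]++;
            max = Math.max(max, counts[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] keys = new int[13];
        for (int j = 0; j < keys.length; j++) keys[j] = 10 * (j + 1); // 10, 20, ..., 130
        System.out.println(maxPerIndex(keys, 10)); // 13: every key lands on index 0
        System.out.println(maxPerIndex(keys, 13)); // 1: no two keys share an index
    }
}
```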
Bad Hash
Perfect hash does not exist
  Cannot know all keys beforehand
  Keys may cluster around a few indices
  Or all keys may hash to same index
Handling bad hash is a necessity
  Even when an Entry is found, always check its key
  Must store multiple Entrys with same hash
(Shot of adrenaline restarts heart)
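The rule "even given an Entry, always check its key" can be sketched with a table that stores at most one Entry per index. The nested `Entry` class and h(x) = x mod capacity are assumptions for illustration. With capacity 13, keys 44 and 31 both hash to 5, so `get(31)` must not return 44's value, and storing 31 silently overwrites 44, which is why collisions need a real strategy.

```java
// Sketch: one slot per index, h(x) = x mod capacity, no collision handling yet
class CheckedSlotTable {
    // Minimal stand-in for the course's Entry type (hypothetical)
    static final class Entry {
        final int key;
        final String value;
        Entry(int key, String value) { this.key = key; this.value = value; }
    }

    private final Entry[] table;

    CheckedSlotTable(int capacity) { table = new Entry[capacity]; }

    private int hash(int key) { return Math.floorMod(key, table.length); }

    // Overwrites the slot, so a colliding key silently replaces the old Entry
    void put(int key, String value) { table[hash(key)] = new Entry(key, value); }

    String get(int key) {
        Entry e = table[hash(key)];
        // Finding an Entry is not enough: it may hold a different key with the same hash
        if (e == null || e.key != key) return null;
        return e.value;
    }
}
```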
Bucket Arrays
Make hash table an array of linked list Nodes
  First node aliased by the array location
Whenever we have collision, we “chain” Entrys
  Create new Node to store the Entry
  The linked list will have new Node at its front
(Figure: bucket array, indices 0 to 5, each index null or referencing the first Node of a chain)
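A bucket array can be sketched as below, assuming int keys, String values and h(x) = x mod capacity; the class and method names are illustrative, not the course's API. Each array slot aliases the first Node of its chain, and a colliding key gets a new Node at the front. `put` first walks the chain so an existing key is updated rather than duplicated, the usual Map behavior.

```java
// Sketch: separate chaining ("bucket array") with h(x) = x mod capacity
class ChainedTable {
    // Singly linked Node holding one key/value pair
    private static final class Node {
        final int key;
        String value;
        Node next;
        Node(int key, String value, Node next) { this.key = key; this.value = value; this.next = next; }
    }

    private final Node[] buckets; // each index aliases the first Node of its chain, or null

    ChainedTable(int capacity) { buckets = new Node[capacity]; }

    private int hash(int key) { return Math.floorMod(key, buckets.length); }

    // Returns the old value, or null if the key was new
    String put(int key, String value) {
        int i = hash(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key == key) { String old = n.value; n.value = value; return old; }
        }
        buckets[i] = new Node(key, value, buckets[i]); // chain the new Node at the front
        return null;
    }

    String get(int key) {
        for (Node n = buckets[hash(key)]; n != null; n = n.next) {
            if (n.key == key) return n.value; // compare keys, not just the hash
        }
        return null;
    }

    // Number of Nodes chained at one index
    int chainLength(int index) {
        int len = 0;
        for (Node n = buckets[index]; n != null; n = n.next) len++;
        return len;
    }
}
```

If every key hashed to one index, `chainLength` there would be n and `get` would walk all n Nodes, the O(n) case described for a really bad hash.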
Bucket Arrays
But what if we have a really bad hash?
  Hashes to same index in every situation
All Entrys now found in single linked list
  O(n) execution times would now be required
  (Also get bad case of the munchies)
Collisions
Normally, table holds one Entry per index
  Need to be smarter when keys collide
Efficiency is critical
  If we do not care, use Sequence-based approach
Several common schemes used to provide speed
  Each of these schemes has strengths & weaknesses
  Silver bullets do not exist in CSC, must balance needs
  If all-powerful answers desired, try Religious Studies
Linear Probing
Musical chairs uses this algorithm
At index where key hashed, examine Entry
  Circle through array until empty index found
Algorithm is very simple
  But creates clusters of Entrys
Linear Probe Example
h(x) = x mod 13
Table (indices 0 to 12) also holds 15, 18, 32 and 76
Now add:
  44, h(44) = 5
  20, h(20) = 7
  22, h(22) = 9
  31, h(31) = 5
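The example can be replayed with a small sketch. It assumes 15, 18, 32 and 76 were inserted first, each landing at its own hash index, and it stores bare Integer keys instead of Entrys to stay short.

```java
import java.util.Arrays;

// Sketch: linear probing with h(x) = x mod table length
class LinearProbeTable {
    final Integer[] slots; // null marks an empty index

    LinearProbeTable(int size) { slots = new Integer[size]; }

    // Places key at the first empty index on its probe sequence and returns that index
    int put(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int j = 0; j < slots.length; j++) {
            int i = (home + j) % slots.length; // home, home+1, home+2, ... wrapping to 0
            if (slots[i] == null) {
                slots[i] = key;
                return i;
            }
        }
        throw new IllegalStateException("table full"); // examined every index
    }

    public static void main(String[] args) {
        LinearProbeTable t = new LinearProbeTable(13);
        for (int k : new int[] {15, 18, 32, 76}) t.put(k); // Entrys already in the table
        for (int k : new int[] {44, 20, 22, 31}) System.out.println(k + " -> " + t.put(k));
        System.out.println(Arrays.toString(t.slots));
    }
}
```

Replaying the adds places 44, 20, 22 and 31 at indices 7, 8, 9 and 10. Adding 31 examines six indices (5 through 10) before finding a free one, because earlier adds built a cluster starting at its hash index.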
Probing Reaction
Oh, ****. Adding to hash table still O(n)
Quadratic Probe
Avoids primary clustering problems
  But does create secondary clustering (no one cares)
Quadratic probe still simple (like linear probe)
  Examine Entry at index k, hashed value of key
  Check (k + j²) % length: k+1, k+4, k+9, k+16, …
  Continue probing until unused array slot found
Guaranteed to work when:
  Probes need to get around the table: table size is a prime number
  Table under 50% full, so many open slots exist
Quadratic Probe Example
h(x) = x mod 13
Table (indices 0 to 12) also holds 15, 18, 32 and 76
Now add:
  44, h(44) = 5
  20, h(20) = 7
  22, h(22) = 9
  31, h(31) = 5
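The quadratic version of the same replay, again assuming 15, 18, 32 and 76 sit at their hash indices before the adds and storing bare Integer keys. Only the probe formula differs from linear probing.

```java
// Sketch: quadratic probing, checking (k + j*j) % length for j = 0, 1, 2, ...
class QuadraticProbeTable {
    final Integer[] slots; // null marks an empty index

    QuadraticProbeTable(int size) { slots = new Integer[size]; }

    int put(int key) {
        int k = Math.floorMod(key, slots.length); // hashed index
        for (int j = 0; j < slots.length; j++) {
            // long arithmetic keeps j * j from overflowing on large tables
            int i = (int) ((k + (long) j * j) % slots.length); // k, k+1, k+4, k+9, ...
            if (slots[i] == null) {
                slots[i] = key;
                return i;
            }
        }
        // Quadratic probes can miss free slots; prime size and under 50% full avoids this
        throw new IllegalStateException("no empty index on this probe sequence");
    }

    public static void main(String[] args) {
        QuadraticProbeTable t = new QuadraticProbeTable(13);
        for (int k : new int[] {15, 18, 32, 76}) t.put(k);
        for (int k : new int[] {44, 20, 22, 31}) System.out.println(k + " -> " + t.put(k));
    }
}
```

Replaying the adds places 44, 20, 22 and 31 at indices 9, 7, 10 and 1: 31 jumps from 5 to 6, 9 and then 14 % 13 = 1 instead of crawling through a cluster. By that last add the table already holds 7 of 13 keys, more than 50%, so the guarantee no longer applies even though a free index turns up.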
Quadratic Probing Reaction
Darn it to heck. Adding to hash table still O(n)
Double Hashing
Solve bad hash with even more hash
  Use 2nd hash function very different from first
  2nd hash function not allowed to return zero (probe would never move)
Re-hash key using 2nd function after the collision
  Check index equal to sum of two hash functions
  Re-add 2nd hash to this sum to continue probing: (h(k) + j × h2(k)) % length for j = 1, 2, 3, …
Guaranteed to work when:
  Still must get around the table: table size is a prime number
Double Hash Example
h(x) = x mod 13
h2(x) = 5 - (x mod 5)
Table (indices 0 to 12) also holds 15, 18, 32 and 76
Now add:
  44, h(44) = 5
  20, h(20) = 7
  22, h(22) = 9
  31, h(31) = 5
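A sketch of double hashing with the example's two functions, again assuming 15, 18, 32 and 76 are already at their hash indices and storing bare Integer keys. The step size comes from h2, so keys that collide at the same index usually move apart with different strides.

```java
// Sketch: double hashing with h(x) = x mod 13 and h2(x) = 5 - (x mod 5)
class DoubleHashTable {
    final Integer[] slots = new Integer[13]; // prime size; null marks an empty index

    static int h(int key) { return Math.floorMod(key, 13); }

    // Always 1..5, never zero, so every probe moves to a new index
    static int h2(int key) { return 5 - Math.floorMod(key, 5); }

    int put(int key) {
        int home = h(key);
        int step = h2(key);
        for (int j = 0; j < slots.length; j++) {
            int i = (home + j * step) % slots.length; // home, home+h2, home+2*h2, ...
            if (slots[i] == null) {
                slots[i] = key;
                return i;
            }
        }
        throw new IllegalStateException("table full");
    }

    public static void main(String[] args) {
        DoubleHashTable t = new DoubleHashTable();
        for (int k : new int[] {15, 18, 32, 76}) t.put(k);
        for (int k : new int[] {44, 20, 22, 31}) System.out.println(k + " -> " + t.put(k));
    }
}
```

Replaying the adds places 44, 20, 22 and 31 at indices 7, 12, 9 and 0. Because 13 is prime and h2 returns 1 to 5, each key's probe sequence visits every index before repeating.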
Double Probing Reaction
Sweet! Double hashing keeps put O(n)
Probing and Searching
Search index where key hashed
If cannot place Entry at index
  The array must keep being probed
  Stop only at usable index
May need to probe every index!
  Searching takes O(n) even with hash
May need to reallocate & rehash table
  Worst case O(n) put even with perfect hash
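A hedged sketch of search under linear probing that counts how many indices get examined, assuming bare Integer keys and h(x) = x mod table length. In a completely full table, looking up a missing key examines all n indices, the O(n) search described above; a full table is also when `put` would have to reallocate and rehash.

```java
// Sketch: searching a linear-probing table; stop at an empty index or after every index
class ProbingSearch {
    final Integer[] slots; // null marks an empty index
    int lastProbeCount;    // indices examined by the most recent find

    ProbingSearch(int size) { slots = new Integer[size]; }

    void put(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int j = 0; j < slots.length; j++) {
            int i = (home + j) % slots.length;
            if (slots[i] == null) {
                slots[i] = key;
                return;
            }
        }
        throw new IllegalStateException("table full: reallocate and rehash");
    }

    // Returns the index holding key, or -1 if it is absent
    int find(int key) {
        int home = Math.floorMod(key, slots.length);
        lastProbeCount = 0;
        for (int j = 0; j < slots.length; j++) {
            int i = (home + j) % slots.length;
            lastProbeCount++;
            if (slots[i] == null) return -1; // empty index ends the search
            if (slots[i] == key) return i;
        }
        return -1; // examined all n indices: O(n)
    }
}
```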
Post-Removal Operations
What happens when we remove an Entry?
  Set index to null in most structures
Consider if we call remove(44) on the linear probe example table
  Before: table holds 15, 18, 44, 20, 32, 22, 31 and 76 (indices 0 to 12)
  After: table holds 15, 18, 20, 32, 22, 31 and 76; index that held 44 is now null
get(31) called, what would happen?
  First check index it is hashed to
  Checks first probe index…
  & stops at null, never reaching 31
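The failure can be reproduced with a sketch that assumes the linear probe example table (15, 18, 32 and 76 inserted before 44, 20, 22 and 31) and stores bare Integer keys. Removal here simply writes null, as most structures would.

```java
// Sketch: remove() that writes null, using linear probing and h(x) = x mod 13
class NullRemovalBug {
    final Integer[] slots = new Integer[13]; // null marks an empty index

    void put(int key) {
        int home = Math.floorMod(key, 13);
        for (int j = 0; j < 13; j++) {
            int i = (home + j) % 13;
            if (slots[i] == null) { slots[i] = key; return; }
        }
        throw new IllegalStateException("table full");
    }

    // Returns the index holding key, or -1; stops at the first null
    int indexOf(int key) {
        int home = Math.floorMod(key, 13);
        for (int j = 0; j < 13; j++) {
            int i = (home + j) % 13;
            if (slots[i] == null) return -1;
            if (slots[i] == key) return i;
        }
        return -1;
    }

    // Naive removal: clear the index to null
    void remove(int key) {
        int i = indexOf(key);
        if (i >= 0) slots[i] = null;
    }

    public static void main(String[] args) {
        NullRemovalBug t = new NullRemovalBug();
        for (int k : new int[] {15, 18, 32, 76, 44, 20, 22, 31}) t.put(k);
        t.remove(44);
        System.out.println(t.indexOf(31)); // -1: search hits the null at index 7 and gives up
    }
}
```

31 is still sitting at index 10, but the search never gets there. 20 is lost the same way: it hashed to 7, the very index remove(44) cleared.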
*Marker Value Explained
Mark cleared indices in hash table
  Since collision could have happened, continue search
  Index can be used to store new Entry
Ways to show that array index is clear
  Entry with null key could be used if one is careful
  Could try and make key which is never used
  Use static final field of type Entry
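A sketch of the last option above, a `static final` field of type Entry used as the marker, assuming linear probing with h(x) = x mod 13 and a minimal stand-in `Entry` class (hypothetical, not the course's). Searches skip past the marker instead of stopping, and `put` may reuse a marked index.

```java
// Sketch: a static final marker Entry for cleared indices (linear probing, h(x) = x mod 13)
class MarkerTable {
    // Minimal stand-in for the course's Entry type (hypothetical)
    static final class Entry {
        final int key;
        Entry(int key) { this.key = key; }
    }

    // Marker compared by identity (==); its key is never examined
    static final Entry REMOVED = new Entry(0);

    final Entry[] slots = new Entry[13]; // null = never used, REMOVED = cleared

    // Assumes key is not already in the table, to keep the sketch short
    int put(int key) {
        int home = Math.floorMod(key, 13);
        for (int j = 0; j < 13; j++) {
            int i = (home + j) % 13;
            if (slots[i] == null || slots[i] == REMOVED) { // a cleared index can be reused
                slots[i] = new Entry(key);
                return i;
            }
        }
        throw new IllegalStateException("table full");
    }

    // Returns the index holding key, or -1
    int indexOf(int key) {
        int home = Math.floorMod(key, 13);
        for (int j = 0; j < 13; j++) {
            int i = (home + j) % 13;
            if (slots[i] == null) return -1; // only a true null ends the search
            if (slots[i] != REMOVED && slots[i].key == key) return i; // skip markers, keep probing
        }
        return -1;
    }

    void remove(int key) {
        int i = indexOf(key);
        if (i >= 0) slots[i] = REMOVED; // mark instead of writing null
    }
}
```

After remove(44) on the linear probe example table, searches for 31 and 20 still succeed, and a later put(57), which also hashes to 5, reuses the marked index 7.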
Why Use Hash Table & Probes?
Hash tables can require O(n) complexity
  But provide O(1) time if you are really good
Ultimately depends on hash function used
  Choose wisely and be rich
Before Next Lecture…
Get updated lab project into SVN directory
  No need to e-mail, I will collect directories at 5PM
Finish working on week #4 assignment
  Due at usual time tomorrow afternoon/evening
Start thinking of your design for the project
  Due Tuesday: a preliminary design & javadoc
Review sections for Map & Dictionary Quiz
  ADTs, hash, probing and other ideas covered
  Initially work on your own, groups get harder questions