A Look at Modern Dictionary Structures & Algorithms
Warren Hunt
Dictionary Structures
Used for storing (key, value) pairs
The bread and butter of a data-structures and algorithms course
Common Dictionary Structures
List (Array)
Sorted List
Linked List
Move-to-Front List
Inverted Index List
Skip List (check this one out…)
Common Dictionary Structures
(Balanced) Binary Search Trees
AVL Tree
Red-Black Tree
Splay Tree
B-Tree
Trie
Patricia Tree
…
Common Dictionary Structures
Hash Tables
Linear (or Quadratic) Probing
Separate Chaining (or Treeing)
Double Hashing
Perfect Hashing
Hash Trees
Cuckoo Hashing
d-ary binned
…
+Every hybrid you can think of!
Unfortunately, they don’t teach the cool ones…
Skip lists are a faster, easier-to-code alternative to most binary search trees (invented in 1990!)
Cuckoo hashing has a huge number of nice properties (IMHO far superior to all other hashing designs) (invented in 2001!)
So many to choose from! Which is best?
That depends on your needs…
Sorted lists are simple and easy to implement (simple means fast on small datasets!)
Binary search trees and sorted lists provide easy access to sorted data
B-trees have great page performance for databases
Hash tables have the fastest asymptotic lookup time
Focus On Hashing for Now
Fastest lookup/insert/delete time: O(1)
Used in Bloom filters (not the graphics kind!)
Useful in garbage collection (or anywhere you want to mark things as visited)
Small hash tables implement an associative cache
Easy to implement! (no pointer chasing)
Traditional Hashing
Just make up an address in an array for some piece of data and stick it there
A hash function generates the address
Problems arise when two things have the same address, so we’ll address that:
Linear (or Quadratic) Probing
Separate Chaining (Treeing…)
Double Hashing
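As a rough sketch (not from the slides), linear probing looks like this: on a collision, step forward one slot until a free one turns up. The table size, the multiplicative hash constant, and the use of 0 as an empty marker are all illustrative assumptions:

```c
/* Illustrative sketch of traditional hashing with linear probing.
   Keys are nonzero ints; 0 marks an empty slot (assumption). */
#include <assert.h>

#define TABLE_SIZE 16  /* assumed power of two so masking works */

static int table[TABLE_SIZE];

static unsigned hash(int key) {
    return ((unsigned)key * 2654435761u) & (TABLE_SIZE - 1);
}

/* Insert: on collision, probe the next slot linearly. */
static void lp_insert(int key) {
    unsigned i = hash(key);
    while (table[i] != 0)              /* collision: keep stepping */
        i = (i + 1) & (TABLE_SIZE - 1);
    table[i] = key;
}

/* Lookup: probe until we find the key or hit an empty slot. */
static int lp_contains(int key) {
    unsigned i = hash(key);
    while (table[i] != 0) {
        if (table[i] == key) return 1;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}
```

Note the cache-friendly access pattern: consecutive probes touch adjacent memory, which is why linear probing stays popular despite its clustering problems.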
Problems With Traditional Hashing
Without separate chaining, tables can’t get too full or bad things happen
With separate chaining, we have poor cache performance and still O(n) worst-case behavior
Separate treeing provides O(log n) worst case, but they don’t teach that in school…
Linear probing is still the most common (fastest cache behavior; bite the bullet on poorer memory utilization)
Good Hash Functions
All hash table implementations require good hash functions (with the exception of separate treeing)
Universal hash functions are required (number theory; I won’t discuss it here)
Cuckoo hashing is less strict (different assumptions are made in each paper to make the proofs easier)
Cuckoo Hashing
Guaranteed O(1) lookup/delete
Amortized O(1) insert
50% space efficient
Requires *mostly* random hash functions
Newish and largely unknown (barely mentioned in the Wikipedia “Hash Tables” article)
Cuckoo Hashing
Use two hash tables and two hash functions
Each element has exactly one “nest” (hash location) in each table
Guarantee that any element will only ever exist in one of its “nests”
Lookup/delete are O(1) because we can check 2 locations (“nests”) in O(1) time
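The two-nest invariant makes lookup and delete trivial; here is a minimal sketch, where the two tables, their sizes, and the stand-in hash functions h1/h2 are all assumptions for illustration:

```c
/* Illustrative cuckoo lookup/delete with two tables.
   h1/h2 are stand-in hash functions, not from any paper. */
#include <assert.h>

#define N 16
static int t1[N], t2[N];   /* 0 marks an empty nest (assumption) */

static unsigned h1(int k) { return ((unsigned)k * 2654435761u) & (N - 1); }
static unsigned h2(int k) { return (((unsigned)k * 40503u) >> 4) & (N - 1); }

/* O(1): an element can only live in one of its two nests. */
static int cuckoo_contains(int k) {
    return t1[h1(k)] == k || t2[h2(k)] == k;
}

/* O(1): clear whichever nest holds the element. */
static void cuckoo_delete(int k) {
    if (t1[h1(k)] == k)      t1[h1(k)] = 0;
    else if (t2[h2(k)] == k) t2[h2(k)] = 0;
}
```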
Cuckoo Hashing - Insertion
1. Insert an element by finding one of its “nests” and putting it there
• This may evict another element! (go to 2.)
2. Insert the evicted element into its *other* “nest”
• This may evict another element! (go to 2.)
• Under reasonable assumptions, this process will terminate in O(1) time…
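The eviction chain above can be sketched as a loop that alternates between the two tables, with a depth cutoff to detect the failure case covered on the “Overflowing the Table” slide. Table size, hash constants, and the cutoff value are illustrative assumptions:

```c
/* Illustrative cuckoo insertion with eviction and a depth cutoff. */
#include <assert.h>

#define N 16
#define MAX_KICKS 32   /* depth cutoff: give up and signal a rehash */

static int t1[N], t2[N];  /* 0 marks an empty nest (assumption) */

static unsigned h1(int k) { return ((unsigned)k * 2654435761u) & (N - 1); }
static unsigned h2(int k) { return (((unsigned)k * 40503u) >> 4) & (N - 1); }

/* Returns 1 on success, 0 if the caller should rehash. */
static int cuckoo_insert(int k) {
    for (int kicks = 0; kicks < MAX_KICKS; kicks++) {
        /* Put k in its table-1 nest, evicting any occupant. */
        int evicted = t1[h1(k)];
        t1[h1(k)] = k;
        if (evicted == 0) return 1;
        /* The evicted element goes to its *other* nest, in table 2. */
        k = evicted;
        evicted = t2[h2(k)];
        t2[h2(k)] = k;
        if (evicted == 0) return 1;
        k = evicted;   /* and keep kicking… */
    }
    return 0;  /* possible loop: choose new hash functions and rehash */
}
```

On a cutoff failure, the recovery is the one the deck describes later: grow the table if needed, pick fresh hash functions, and reinsert everything.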
Why does this work?
Matching property of random graphs
With high probability, any matching under a saturation threshold (50% in this case) can accept another edge without breaking
More details in the paper
Overflowing the Table
Insertion can potentially fail, causing an infinite insertion loop
Detected using a depth cutoff
Due to unlucky hash functions
Due to a full hash table
Fix: double the size of the table (if need be), choose new hash functions, and rehash all of the elements
Example
To the board!
Asymmetric Cuckoo Hashing
Choose one (the first) table to be larger than the other
Improves the probability that we get a hit on the first lookup
Only a minor slowdown on insert
Same-Table Cuckoo Hashing
We didn’t actually need two separate tables; they just made the analysis much easier
But… in practice, we just need two hash functions
d-ary Cuckoo Hashing
Guaranteed O(1) lookup/delete
Amortized O(1) insert
97%+ space efficient
Analysis requires random hash functions
(not quite as easy to implement)
(robust against crappier hash functions)
d-ary Cuckoo Hashing
Use d hash tables instead of two!
Lookup and delete look at d buckets
Insert is more complicated
Insertion sees a tree of possible eviction+insertion paths
BFS to find an empty nest
Random walk to find an empty nest (easier)
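The random-walk variant, the easier of the two insertion strategies above, can be sketched like this; d, the table size, the kick limit, and the per-table hash functions are all illustrative assumptions:

```c
/* Illustrative d-ary cuckoo insertion via random walk.
   D tables with one stand-in hash function each. */
#include <assert.h>
#include <stdlib.h>

#define D 4
#define N 16
#define MAX_KICKS 64   /* depth cutoff before rehashing */

static int tab[D][N];  /* 0 marks an empty nest (assumption) */

static unsigned hashd(int d, int k) {
    return ((unsigned)k * (2654435761u + 2u * (unsigned)d)) & (N - 1);
}

/* Lookup checks all d candidate nests. */
static int dary_contains(int k) {
    for (int d = 0; d < D; d++)
        if (tab[d][hashd(d, k)] == k) return 1;
    return 0;
}

static int dary_insert(int k) {
    for (int kicks = 0; kicks < MAX_KICKS; kicks++) {
        /* Prefer any empty nest among the d choices. */
        for (int d = 0; d < D; d++) {
            unsigned i = hashd(d, k);
            if (tab[d][i] == 0) { tab[d][i] = k; return 1; }
        }
        /* All full: evict from a randomly chosen nest (random walk). */
        int d = rand() % D;
        unsigned i = hashd(d, k);
        int evicted = tab[d][i];
        tab[d][i] = k;
        k = evicted;
    }
    return 0;  /* cutoff hit: rehash */
}
```

A BFS insert would instead explore all d eviction choices level by level and move elements along the shortest path to an empty nest; the random walk trades that optimality for a much simpler loop.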
Bucketed Cuckoo Hashing
Guaranteed O(1) lookup/delete
Amortized O(1) insert
90%+ space efficient
Requires *mostly* random hash functions
(easier to implement)
(better, “good” cache performance)
Bucketed Cuckoo Hashing
Use two hash functions, but each hashes to an associative m-wide bucket
Lookup and delete must check at most two whole buckets
Insertion into a full bucket leaves a choice during eviction
Insertion sees a tree of possible eviction+insertion paths
BFS to find an empty bucket
Best-first uses the most-empty target bucket
Random walk to find an empty bucket (easier)
Use LRI eviction for the easiest implementation
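A lookup in this scheme scans at most two whole buckets, which is still O(1) and maps well onto wide loads. A minimal sketch, with bucket count, width m, and the stand-in hash functions as assumptions:

```c
/* Illustrative bucketed cuckoo lookup with m-wide buckets. */
#include <assert.h>

#define NBUCKETS 8
#define M 4   /* bucket width; the deck's m */

static int t1[NBUCKETS][M], t2[NBUCKETS][M];  /* 0 marks an empty cell */

static unsigned h1(int k) { return ((unsigned)k * 2654435761u) & (NBUCKETS - 1); }
static unsigned h2(int k) { return (((unsigned)k * 40503u) >> 3) & (NBUCKETS - 1); }

/* Scan at most two whole buckets: 2*M cells, still O(1).
   The fixed-width inner loop is a natural fit for SIMD compares. */
static int bucketed_contains(int k) {
    int *b1 = t1[h1(k)], *b2 = t2[h2(k)];
    for (int j = 0; j < M; j++)
        if (b1[j] == k || b2[j] == k) return 1;
    return 0;
}
```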
Generalization: Use both!
Use k hash functions
Use bins of size m
Get the best of both worlds!
Max load for O(1) Insert – 99% Guarantee (proven)

              1 cell   2 cells  4 cells  8 cells
4 hash func   97%      99%      99.9%    100%*
3 hash func   91%      97%      98%      99.9%
2 hash func   49%      86%      93%      96%
1 hash func   0.06%    0.6%     3%       12%
IBM’s Implementation
IBM designed a hash table for the Cell processor
Parameters: K=2, M=4 (SIMD width)
If the hash table fits in scratch L2: lookup in 21 cycles
Simple multiplicative hash functions worked well
Better Cache Performance Than You Would Think
If prefetching is used, the cost of a lookup is one memory latency (plus time to compute the hash function, which can be done in SIMD)
Exactly two cache-line loads
Binary search trees, linear probing, linear chaining, etc. usually take more cache-line loads and have a very branchy search loop
Conclusions
Cuckoo Hashing Provides:
Guaranteed O(1) lookup+delete
Amortized O(1) insert
Efficient memory utilization
Both in space and bandwidth!
Small constant factors
And SIMD friendly!
And is simple to implement (easier than linear probing!)
Good Hash Function?
http://www.burtleburtle.net/bob/c/lookup3.c (very fast, especially if you use the __rotl intrinsic)

#define mix(a,b,c) \
{ \
  a -= c;  a ^= rot(c, 4);  c += b; \
  b -= a;  b ^= rot(a, 6);  a += c; \
  c -= b;  c ^= rot(b, 8);  b += a; \
  a -= c;  a ^= rot(c,16);  c += b; \
  b -= a;  b ^= rot(a,19);  a += c; \
  c -= b;  c ^= rot(b, 4);  b += a; \
}
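To make the macro above self-contained: lookup3.c pairs mix() with a rot() rotate macro. The hash32 wrapper below is an illustrative sketch of turning one key into a table index, not lookup3.c's full hashword routine; the seed handling and golden-ratio constant usage are assumptions:

```c
/* Usage sketch of the mix() step from Bob Jenkins' lookup3.c. */
#include <assert.h>
#include <stdint.h>

/* rot is the 32-bit rotate that lookup3.c defines alongside mix. */
#define rot(x,k) (((x) << (k)) | ((x) >> (32 - (k))))

#define mix(a,b,c) \
{ \
  a -= c;  a ^= rot(c, 4);  c += b; \
  b -= a;  b ^= rot(a, 6);  a += c; \
  c -= b;  c ^= rot(b, 8);  b += a; \
  a -= c;  a ^= rot(c,16);  c += b; \
  b -= a;  b ^= rot(a,19);  a += c; \
  c -= b;  c ^= rot(b, 4);  b += a; \
}

/* Illustrative: hash one 32-bit key into a masked table index. */
static uint32_t hash32(uint32_t key, uint32_t seed, uint32_t mask) {
    uint32_t a = key, b = seed, c = 0x9e3779b9u;  /* golden-ratio constant */
    mix(a, b, c);
    return c & mask;
}
```

Two cuckoo hash functions can then be obtained by calling hash32 with two different seeds.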
Questions?