A Look at Modern Dictionary Structures & Algorithms
Warren Hunt
Dictionary Structures
Used for storing (key, value) pairs
The bread and butter of a data-structures and algorithms course
Common Dictionary Structures
List (Array)
Sorted List
Linked List
Move-to-Front List
Inverted Index List
Skip List (check this one out…)
Common Dictionary Structures
(Balanced) Binary Search Trees
AVL Tree
Red-Black Tree
Splay Tree
B-Tree
Trie
Patricia Tree
…
Common Dictionary Structures
Hash Tables
Linear (or Quadratic) Probing
Separate Chaining (or Treeing)
Double Hashing
Perfect Hashing
Hash Trees
Cuckoo Hashing
d-ary binned
…
+Every hybrid you can think of!
Unfortunately, they don’t teach the cool ones…
Skip lists are a faster, easier-to-code alternative to most binary search trees (invented in 1990!)
Cuckoo hashing has a huge number of nice properties (IMHO far superior to all other hashing designs) (invented in 2001!)
So many to choose from! Which is best?
That depends on your needs…
Sorted lists are simple and easy to implement (simple means fast on small datasets!)
Binary search trees and sorted lists provide easy access to sorted data
B-trees have great page performance for databases
Hash tables have the fastest asymptotic lookup time
Focus On Hashing for Now
Fastest lookup/insert/delete time: O(1)
Used in Bloom filters (not the graphics kind!)
Useful in garbage collection (or anywhere you want to mark things as visited)
Small hash tables implement an associative cache
Easy to implement! (no pointer chasing)
Traditional Hashing
Just make up an address in an array for some piece of data and stick it there
A hash function generates the address
Problems arise when two things have the same address, so we’ll address that:
Linear (or Quadratic) Probing
Separate Chaining (Treeing…)
Double Hashing
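As a rough sketch (not from the slides), linear probing looks like this: on a collision, step forward one slot until a free one turns up. The table size, the multiplicative hash constant, and the use of 0 as an empty marker are all illustrative assumptions:

```c
/* Illustrative sketch of traditional hashing with linear probing.
   Keys are nonzero ints; 0 marks an empty slot (assumption). */
#include <assert.h>

#define TABLE_SIZE 16  /* assumed power of two so masking works */

static int table[TABLE_SIZE];

static unsigned hash(int key) {
    return ((unsigned)key * 2654435761u) & (TABLE_SIZE - 1);
}

/* Insert: on collision, probe the next slot linearly. */
static void lp_insert(int key) {
    unsigned i = hash(key);
    while (table[i] != 0)              /* collision: keep stepping */
        i = (i + 1) & (TABLE_SIZE - 1);
    table[i] = key;
}

/* Lookup: probe until we find the key or hit an empty slot. */
static int lp_contains(int key) {
    unsigned i = hash(key);
    while (table[i] != 0) {
        if (table[i] == key) return 1;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}
```

Note the cache-friendly access pattern: consecutive probes touch adjacent memory, which is why linear probing stays popular despite its clustering problems.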
Problems With Traditional Hashing
Without separate chaining, tables can’t get too full or bad things happen
With separate chaining, we have poor cache performance and still O(n) worst-case behavior
Separate treeing provides O(log n) worst case, but they don’t teach that in school…
Linear probing is still the most common (fastest cache behavior; bite the bullet on poorer memory utilization)
Good Hash Functions
All hash table implementations require good hash functions (with the exception of separate treeing)
Universal hash functions are required (number theory; I won’t discuss it here)
Cuckoo hashing is less strict (different assumptions are made in each paper to make the proofs easier)
Cuckoo Hashing
Guaranteed O(1) lookup/delete
Amortized O(1) insert
50% space efficient
Requires *mostly* random hash functions
Newish and largely unknown (barely mentioned in the Wikipedia “Hash Tables” article)
Cuckoo Hashing
Use two hash tables and two hash functions
Each element has exactly one “nest” (hash location) in each table
Guarantee that any element will only ever exist in one of its “nests”
Lookup/delete are O(1) because we can check 2 locations (“nests”) in O(1) time
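The two-nest invariant makes lookup and delete trivial; here is a minimal sketch, where the two tables, their sizes, and the stand-in hash functions h1/h2 are all assumptions for illustration:

```c
/* Illustrative cuckoo lookup/delete with two tables.
   h1/h2 are stand-in hash functions, not from any paper. */
#include <assert.h>

#define N 16
static int t1[N], t2[N];   /* 0 marks an empty nest (assumption) */

static unsigned h1(int k) { return ((unsigned)k * 2654435761u) & (N - 1); }
static unsigned h2(int k) { return (((unsigned)k * 40503u) >> 4) & (N - 1); }

/* O(1): an element can only live in one of its two nests. */
static int cuckoo_contains(int k) {
    return t1[h1(k)] == k || t2[h2(k)] == k;
}

/* O(1): clear whichever nest holds the element. */
static void cuckoo_delete(int k) {
    if (t1[h1(k)] == k)      t1[h1(k)] = 0;
    else if (t2[h2(k)] == k) t2[h2(k)] = 0;
}
```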
Cuckoo Hashing - Insertion
1. Insert an element by finding one of its “nests” and putting it there
• This may evict another element! (go to 2.)
2. Insert the evicted element into its *other* “nest”
• This may evict another element! (go to 2.)
• Under reasonable assumptions, this process will terminate in O(1) time…
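The eviction chain above can be sketched as a loop that alternates between the two tables, with a depth cutoff to detect the failure case covered on the “Overflowing the Table” slide. Table size, hash constants, and the cutoff value are illustrative assumptions:

```c
/* Illustrative cuckoo insertion with eviction and a depth cutoff. */
#include <assert.h>

#define N 16
#define MAX_KICKS 32   /* depth cutoff: give up and signal a rehash */

static int t1[N], t2[N];  /* 0 marks an empty nest (assumption) */

static unsigned h1(int k) { return ((unsigned)k * 2654435761u) & (N - 1); }
static unsigned h2(int k) { return (((unsigned)k * 40503u) >> 4) & (N - 1); }

/* Returns 1 on success, 0 if the caller should rehash. */
static int cuckoo_insert(int k) {
    for (int kicks = 0; kicks < MAX_KICKS; kicks++) {
        /* Put k in its table-1 nest, evicting any occupant. */
        int evicted = t1[h1(k)];
        t1[h1(k)] = k;
        if (evicted == 0) return 1;
        /* The evicted element goes to its *other* nest, in table 2. */
        k = evicted;
        evicted = t2[h2(k)];
        t2[h2(k)] = k;
        if (evicted == 0) return 1;
        k = evicted;   /* and keep kicking… */
    }
    return 0;  /* possible loop: choose new hash functions and rehash */
}
```

On a cutoff failure, the recovery is the one the deck describes later: grow the table if needed, pick fresh hash functions, and reinsert everything.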
Why does this work?
Matching property of random graphs
With high probability, any matching under a saturation threshold (50% in this case) can accept another edge without breaking
More details in the paper
Overflowing the Table
Insertion can potentially fail, causing an infinite insertion loop
Detected using a depth cutoff
Due to unlucky hash functions
Due to a full hash table
Fix: double the size of the table (if need be), choose new hash functions, and rehash all of the elements
Example
To the board!
Asymmetric Cuckoo Hashing
Choose one (the first) table to be larger than the other
Improves the probability that we get a hit on the first lookup
Only a minor slowdown on insert
Same-Table Cuckoo Hashing
We didn’t actually need two separate tables; they just made the analysis much easier
But… in practice, we just need two hash functions
d-ary Cuckoo Hashing
Guaranteed O(1) lookup/delete
Amortized O(1) insert
97%+ space efficient
Analysis requires random hash functions
(not quite as easy to implement)
(robust against crappier hash functions)
d-ary Cuckoo Hashing
Use d hash tables instead of two!
Lookup and delete look at d buckets
Insert is more complicated
Insertion sees a tree of possible eviction+insertion paths
BFS to find an empty nest
Random walk to find an empty nest (easier)
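The random-walk variant, the easier of the two insertion strategies above, can be sketched like this; d, the table size, the kick limit, and the per-table hash functions are all illustrative assumptions:

```c
/* Illustrative d-ary cuckoo insertion via random walk.
   D tables with one stand-in hash function each. */
#include <assert.h>
#include <stdlib.h>

#define D 4
#define N 16
#define MAX_KICKS 64   /* depth cutoff before rehashing */

static int tab[D][N];  /* 0 marks an empty nest (assumption) */

static unsigned hashd(int d, int k) {
    return ((unsigned)k * (2654435761u + 2u * (unsigned)d)) & (N - 1);
}

/* Lookup checks all d candidate nests. */
static int dary_contains(int k) {
    for (int d = 0; d < D; d++)
        if (tab[d][hashd(d, k)] == k) return 1;
    return 0;
}

static int dary_insert(int k) {
    for (int kicks = 0; kicks < MAX_KICKS; kicks++) {
        /* Prefer any empty nest among the d choices. */
        for (int d = 0; d < D; d++) {
            unsigned i = hashd(d, k);
            if (tab[d][i] == 0) { tab[d][i] = k; return 1; }
        }
        /* All full: evict from a randomly chosen nest (random walk). */
        int d = rand() % D;
        unsigned i = hashd(d, k);
        int evicted = tab[d][i];
        tab[d][i] = k;
        k = evicted;
    }
    return 0;  /* cutoff hit: rehash */
}
```

A BFS insert would instead explore all d eviction choices level by level and move elements along the shortest path to an empty nest; the random walk trades that optimality for a much simpler loop.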
Bucketed Cuckoo Hashing
Guaranteed O(1) lookup/delete
Amortized O(1) insert
90%+ space efficient
Requires *mostly* random hash functions
(easier to implement)
(better, “good” cache performance)
Bucketed Cuckoo Hashing
Use two hash functions, but each hashes to an associative m-wide bucket
Lookup and delete must check at most two whole buckets
Insertion into a full bucket leaves a choice during eviction
Insertion sees a tree of possible eviction+insertion paths
BFS to find an empty bucket
Best-first uses the most-empty target bucket
Random walk to find an empty bucket (easier)
Use LRI eviction for the easiest implementation
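A lookup in this scheme scans at most two whole buckets, which is still O(1) and maps well onto wide loads. A minimal sketch, with bucket count, width m, and the stand-in hash functions as assumptions:

```c
/* Illustrative bucketed cuckoo lookup with m-wide buckets. */
#include <assert.h>

#define NBUCKETS 8
#define M 4   /* bucket width; the deck's m */

static int t1[NBUCKETS][M], t2[NBUCKETS][M];  /* 0 marks an empty cell */

static unsigned h1(int k) { return ((unsigned)k * 2654435761u) & (NBUCKETS - 1); }
static unsigned h2(int k) { return (((unsigned)k * 40503u) >> 3) & (NBUCKETS - 1); }

/* Scan at most two whole buckets: 2*M cells, still O(1).
   The fixed-width inner loop is a natural fit for SIMD compares. */
static int bucketed_contains(int k) {
    int *b1 = t1[h1(k)], *b2 = t2[h2(k)];
    for (int j = 0; j < M; j++)
        if (b1[j] == k || b2[j] == k) return 1;
    return 0;
}
```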
Generalization: Use both!
Use k hash functions
Use bins of size m
Get the best of both worlds!
Max load for O(1) Insert – 99% Guarantee (proven)

              1 cell   2 cells  4 cells  8 cells
4 hash func   97%      99%      99.9%    100%*
3 hash func   91%      97%      98%      99.9%
2 hash func   49%      86%      93%      96%
1 hash func   0.06%    0.6%     3%       12%
IBM’s Implementation
IBM designed a hash table for the Cell processor
Parameters: K=2, M=4 (SIMD width)
If the hash table fits in scratch L2: lookup in 21 cycles
Simple multiplicative hash functions worked well
Better Cache Performance Than You Would Think
If prefetching is used, the cost of a lookup is one memory latency (plus time to compute the hash function, which can be done in SIMD)
Exactly two cache-line loads
Binary search trees, linear probing, linear chaining, etc. usually take more cache-line loads and have a very branchy search loop
Conclusions
Cuckoo Hashing Provides:
Guaranteed O(1) lookup+delete
Amortized O(1) insert
Efficient memory utilization
Both in space and bandwidth!
Small constant factors
And SIMD friendly!
And is simple to implement (easier than linear probing!)
Good Hash Function?
http://www.burtleburtle.net/bob/c/lookup3.c (very fast, especially if you use the __rotl intrinsic)

#define mix(a,b,c) \
{ \
  a -= c;  a ^= rot(c, 4);  c += b; \
  b -= a;  b ^= rot(a, 6);  a += c; \
  c -= b;  c ^= rot(b, 8);  b += a; \
  a -= c;  a ^= rot(c,16);  c += b; \
  b -= a;  b ^= rot(a,19);  a += c; \
  c -= b;  c ^= rot(b, 4);  b += a; \
}
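To make the macro above self-contained: lookup3.c pairs mix() with a rot() rotate macro. The hash32 wrapper below is an illustrative sketch of turning one key into a table index, not lookup3.c's full hashword routine; the seed handling and golden-ratio constant usage are assumptions:

```c
/* Usage sketch of the mix() step from Bob Jenkins' lookup3.c. */
#include <assert.h>
#include <stdint.h>

/* rot is the 32-bit rotate that lookup3.c defines alongside mix. */
#define rot(x,k) (((x) << (k)) | ((x) >> (32 - (k))))

#define mix(a,b,c) \
{ \
  a -= c;  a ^= rot(c, 4);  c += b; \
  b -= a;  b ^= rot(a, 6);  a += c; \
  c -= b;  c ^= rot(b, 8);  b += a; \
  a -= c;  a ^= rot(c,16);  c += b; \
  b -= a;  b ^= rot(a,19);  a += c; \
  c -= b;  c ^= rot(b, 4);  b += a; \
}

/* Illustrative: hash one 32-bit key into a masked table index. */
static uint32_t hash32(uint32_t key, uint32_t seed, uint32_t mask) {
    uint32_t a = key, b = seed, c = 0x9e3779b9u;  /* golden-ratio constant */
    mix(a, b, c);
    return c & mask;
}
```

Two cuckoo hash functions can then be obtained by calling hash32 with two different seeds.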
Questions?