Universal Hashing

Page 1: Universal Hashing

Universal Hashing

Page 2: Universal Hashing

Worst-case analysis vs. probabilistic analysis
  • Probabilistic analysis needs knowledge of the distribution of the inputs.

Indicator random variables: Given a sample space S and an event A, the indicator random variable I{A} associated with event A is defined as:

$$I\{A\} = \begin{cases} 1 & \text{if } A \text{ occurs}, \\ 0 & \text{otherwise}. \end{cases}$$

Page 3: Universal Hashing

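E.g.: Consider flipping a fair coin:
  • Sample space S = { H, T }
  • Define random variable Y with Pr{ Y=H } = Pr{ Y=T } = 1/2
  • We can define an indicator r.v. $X_H$ associated with the coin coming up heads, i.e. Y=H:

$$X_H = I\{Y = H\} = \begin{cases} 1 & \text{if } Y = H, \\ 0 & \text{if } Y = T. \end{cases}$$

$$E[X_H] = E[I\{Y = H\}] = 1 \cdot \Pr\{Y = H\} + 0 \cdot \Pr\{Y = T\} = \Pr\{Y = H\} = \frac{1}{2}.$$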
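The coin example can be checked with exact arithmetic. The following is a minimal sketch; the helper names `indicator` and `expectation` are illustrative and not part of the slides. It computes $E[X_H]$ directly from the definition of expectation and compares it with Pr{Y=H}.

```python
from fractions import Fraction

# Fair coin: each outcome in the sample space S = {H, T} has probability 1/2.
prob = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def indicator(event, outcome):
    """I{A}: 1 if the outcome lies in the event A, 0 otherwise."""
    return 1 if outcome in event else 0

def expectation(f):
    """E[f(Y)], computed as the sum over outcomes y of f(y) * Pr{Y = y}."""
    return sum(f(y) * pr for y, pr in prob.items())

heads = {"H"}                                     # the event "Y = H"
e_xh = expectation(lambda y: indicator(heads, y)) # E[X_H]
pr_heads = sum(pr for y, pr in prob.items() if y in heads)

print(e_xh == pr_heads)
```

Using `Fraction` keeps the comparison exact, so the equality is not affected by floating-point rounding.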

Page 4: Universal Hashing

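Lemma: Given a sample space S and an event A in the sample space S, let $X_A = I\{A\}$. Then $E[X_A] = \Pr\{A\}$.

Proof:

$$E[X_A] = E[I\{A\}] = 1 \cdot \Pr\{A\} + 0 \cdot \Pr\{\bar{A}\} = \Pr\{A\},$$

where $\bar{A} = S - A$ is the complement of A.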

Page 5: Universal Hashing

H = { h: U → {0,…,m-1} }, a finite collection of hash functions.

H is called “universal” if for each pair of distinct keys k, ℓ ∈ U, the number of hash functions h ∈ H for which h(k) = h(ℓ) is at most |H|/m.

Define $n_i$ = the length of list T[i], and let α = n/m be the load factor of a table T with m slots holding n keys.

Theorem: Suppose h is randomly selected from H and is used to hash n keys into T, using chaining to resolve collisions. If k is not in the table, then $E[n_{h(k)}] \le \alpha$. If k is in the table, then $E[n_{h(k)}] \le 1 + \alpha$.

Page 6: Universal Hashing

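Proof: For each pair k and ℓ of distinct keys, define $X_{k\ell} = I\{h(k) = h(\ell)\}$. By definition, $\Pr_h\{h(k) = h(\ell)\} \le 1/m$, and so $E[X_{k\ell}] \le 1/m$.

Define $Y_k$ to be the number of keys other than k that hash to the same slot as k, so that

$$Y_k = \sum_{\ell \in T,\ \ell \ne k} X_{k\ell},$$

$$E[Y_k] = \sum_{\ell \in T,\ \ell \ne k} E[X_{k\ell}] \le \sum_{\ell \in T,\ \ell \ne k} \frac{1}{m}.$$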

Page 7: Universal Hashing

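If k ∉ T, then $n_{h(k)} = Y_k$ and $|\{\ell : \ell \in T,\ \ell \ne k\}| = n$, thus

$$E[n_{h(k)}] = E[Y_k] \le \frac{n}{m} = \alpha.$$

If k ∈ T, then because k appears in T[h(k)] and the count $Y_k$ does not include k, we have $n_{h(k)} = Y_k + 1$ and $|\{\ell : \ell \in T,\ \ell \ne k\}| = n - 1$. Thus

$$E[n_{h(k)}] = E[Y_k] + 1 \le \frac{n-1}{m} + 1 = 1 + \alpha - \frac{1}{m} < 1 + \alpha.$$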
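The two bounds can be checked exactly on a small instance by averaging the chain length over every function in a universal class. This sketch assumes the class $h_{a,b}(k) = ((ak+b) \bmod p) \bmod m$ as the universal family; the values of `p`, `m`, and the stored keys are arbitrary illustrative choices.

```python
from fractions import Fraction

p, m = 17, 5                      # illustrative prime and table size
stored = [2, 3, 5, 7, 11, 13]     # keys currently in the table T
n = len(stored)
alpha = Fraction(n, m)            # load factor

def h(a, b, k):
    """One member of the assumed universal family."""
    return ((a * k + b) % p) % m

def expected_chain_length(k):
    """Average of n_{h(k)} over every h_{a,b} with a in 1..p-1 and b in 0..p-1."""
    total = 0
    for a in range(1, p):
        for b in range(p):
            slot = h(a, b, k)
            # n_{h(k)}: number of stored keys that land in k's slot.
            total += sum(1 for key in stored if h(a, b, key) == slot)
    return Fraction(total, p * (p - 1))

absent = expected_chain_length(4)     # 4 is not in the table
present = expected_chain_length(7)    # 7 is in the table

print(absent <= alpha, present <= 1 + alpha)
```

Because the average runs over the whole family rather than over random samples, the comparison is an exact instance of the theorem and not a statistical estimate.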

Page 8: Universal Hashing

Corollary: Using universal hashing and collision resolution by chaining in an initially empty table with m slots, it takes expected time Θ(n) to handle any sequence of n Insert, Search and Delete operations containing O(m) Insert operations.

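Proof: Since there are O(m) Insert operations, the table never holds more than O(m) keys, so the load factor is α = O(1). By the Thm, each Search takes expected O(1) time. Each Insert and Delete takes O(1) time. By linearity of expectation, the whole sequence of n operations takes expected time O(n); since each operation takes Ω(1) time, the expected time is Θ(n).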
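As an illustration of the operations the corollary counts, here is a minimal sketch of a chained hash table. The class name `ChainedHashTable`, the default prime, and the use of Python lists as chains are assumptions made for this example; in particular, `delete` here scans the chain (expected O(1 + α) time) rather than unlinking a known list node in O(1).

```python
import random

class ChainedHashTable:
    """Chaining with a hash function drawn at random when the table is created."""

    def __init__(self, m, p=2_147_483_647, rng=random):
        # p = 2^31 - 1 is prime; keys are assumed to be integers in 0..p-1.
        self.m, self.p = m, p
        self.a = rng.randrange(1, p)   # a is nonzero modulo p
        self.b = rng.randrange(0, p)
        self.slots = [[] for _ in range(m)]

    def _h(self, k):
        return ((self.a * k + self.b) % self.p) % self.m

    def search(self, k):
        # Scans only the chain of k's slot.
        return k in self.slots[self._h(k)]

    def insert(self, k):
        chain = self.slots[self._h(k)]
        if k not in chain:             # avoid storing duplicates
            chain.append(k)

    def delete(self, k):
        chain = self.slots[self._h(k)]
        if k in chain:
            chain.remove(k)

table = ChainedHashTable(m=8)
for key in (10, 20, 30):
    table.insert(key)
table.delete(20)
print(table.search(10), table.search(20))
```

The result of each operation does not depend on which `a` and `b` were drawn; the random choice only affects how the keys are spread over the slots.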

Page 9: Universal Hashing

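Designing a universal class of hash functions:

Let p be a prime, and let

$$Z_p = \{0, 1, \ldots, p-1\}, \qquad Z_p^* = \{1, 2, \ldots, p-1\}.$$

For any $a \in Z_p^*$ and $b \in Z_p$, define $h_{a,b} : Z_p \to Z_m$ by

$$h_{a,b}(k) = ((ak + b) \bmod p) \bmod m.$$

The class of all such functions is

$$H_{p,m} = \{\, h_{a,b} : a \in Z_p^* \text{ and } b \in Z_p \,\}.$$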
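The definition translates almost directly into code. This is a small sketch; the function name `make_hash` and the sample values p = 17, m = 6, a = 3, b = 4 are chosen only for illustration.

```python
import random

def make_hash(p, m, a=None, b=None):
    """Return h_{a,b}(k) = ((a*k + b) mod p) mod m.

    If a or b is omitted, it is drawn uniformly at random:
    a from Z_p^* = {1, ..., p-1} and b from Z_p = {0, ..., p-1}.
    """
    if a is None:
        a = random.randrange(1, p)
    if b is None:
        b = random.randrange(0, p)

    def h(k):
        return ((a * k + b) % p) % m

    return h

h = make_hash(p=17, m=6, a=3, b=4)
# (3*8 + 4) mod 17 = 28 mod 17 = 11, and 11 mod 6 = 5
print(h(8))

h_random = make_hash(p=17, m=6)   # a random member of H_{17,6}
```

There are p - 1 choices for a and p choices for b, so the class contains p(p - 1) functions.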

Page 10: Universal Hashing

Theorem: $H_{p,m}$ is universal.

Pf: Let k, ℓ be two distinct keys in $Z_p$. Given $h_{a,b}$, let r = (ak + b) mod p and s = (aℓ + b) mod p. Then r - s ≡ a(k - ℓ) (mod p).

Page 11: Universal Hashing

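Since p is prime and both a and (k - ℓ) are nonzero modulo p, their product is nonzero modulo p, so r ≠ s. Hence, for any $h_{a,b} \in H_{p,m}$, distinct inputs k and ℓ map to distinct values r and s modulo p.

Each of the p(p-1) possible choices for the pair (a,b) with a ≠ 0 yields a different resulting pair (r,s) with r ≠ s, since we can solve for a and b given r and s:

$$a = \left((r - s)\left((k - \ell)^{-1} \bmod p\right)\right) \bmod p, \qquad b = (r - ak) \bmod p.$$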
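The inversion step can be checked by brute force for a small prime. In this sketch the values of `p`, `k`, and `ell` are arbitrary, and the modular inverse is computed with Python's three-argument `pow` using exponent -1 (available from Python 3.8).

```python
p = 17
k, ell = 8, 3                       # two distinct keys in Z_p

def forward(a, b):
    """Map (a, b) to (r, s) = ((a*k + b) mod p, (a*ell + b) mod p)."""
    return (a * k + b) % p, (a * ell + b) % p

def recover(r, s):
    """Solve for (a, b) given (r, s), following the two formulas above."""
    inv = pow((k - ell) % p, -1, p)  # (k - ell)^(-1) mod p
    a = ((r - s) * inv) % p
    b = (r - a * k) % p
    return a, b

seen = set()
for a in range(1, p):
    for b in range(p):
        r, s = forward(a, b)
        assert r != s                   # distinct keys stay distinct mod p
        assert recover(r, s) == (a, b)  # the map can be inverted
        seen.add((r, s))

# Every (a, b) produced its own (r, s), so all p(p-1) pairs are hit.
print(len(seen) == p * (p - 1))
```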

Page 12: Universal Hashing

Since there are only p(p-1) possible pairs (r,s) with r ≠ s, there is a one-to-one correspondence between pairs (a,b) with a ≠ 0 and pairs (r,s) with r ≠ s.

Page 13: Universal Hashing

For any given pair of inputs k and ℓ, if we pick (a,b) uniformly at random from $Z_p^* \times Z_p$, the resulting pair (r,s) is equally likely to be any pair of distinct values modulo p.

Page 14: Universal Hashing

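Pr[k and ℓ collide] = $\Pr_{r,s}[\, r \equiv s \pmod m \,]$

Given r, the number of s such that s ≠ r and s ≡ r (mod m) is at most

$$\lceil p/m \rceil - 1 \le \frac{p + m - 1}{m} - 1 = \frac{p-1}{m},$$

∵ the values in $Z_p$ congruent to r modulo m form a progression s, s+m, s+2m, … of values below p, which has at most ⌈p/m⌉ terms, one of them being r itself.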

Page 15: Universal Hashing

Thus,

$$\Pr_{r,s}[\, r \equiv s \pmod m \,] \le \frac{(p-1)/m}{p-1} = \frac{1}{m}.$$

Therefore, for any pair of distinct k, ℓ ∈ $Z_p$, $\Pr[h_{a,b}(k) = h_{a,b}(\ell)] \le 1/m$, so that $H_{p,m}$ is universal.
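For small parameters, universality can also be confirmed by exhaustive counting. The sketch below uses a hypothetical helper `max_collision_count` and arbitrary small values of p and m; it counts, for every pair of distinct keys, how many functions $h_{a,b}$ make them collide and compares the worst case with |H|/m.

```python
from itertools import combinations

def max_collision_count(p, m):
    """Largest number of functions h_{a,b} under which some pair of distinct keys collides."""
    worst = 0
    for k, ell in combinations(range(p), 2):
        count = sum(
            1
            for a in range(1, p)
            for b in range(p)
            if ((a * k + b) % p) % m == ((a * ell + b) % p) % m
        )
        worst = max(worst, count)
    return worst

p, m = 13, 4
family_size = p * (p - 1)
# Universal means: worst-case collision count <= |H| / m.
print(max_collision_count(p, m) * m <= family_size)
```

The comparison multiplies by m instead of dividing, which avoids any rounding when |H| is not a multiple of m.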

Page 16: Universal Hashing

Perfect Hashing

Good for when the keys are static; i.e., once stored, the keys never change, e.g. a CD-ROM, or the set of reserved words in a programming language. A perfect hashing scheme uses O(1) memory accesses for a search in the worst case.

Thm: If we store n keys in a hash table of size $m = n^2$ using a hash function h randomly chosen from a universal class of hash functions, then the probability of there being any collisions is < 1/2.

Page 17: Universal Hashing

Proof: Let h be chosen from a universal family.

Then each pair of keys collides with probability at most 1/m, and there are $\binom{n}{2}$ pairs of keys.

Let X be a r.v. that counts the number of collisions. When $m = n^2$,

$$E[X] \le \binom{n}{2} \cdot \frac{1}{m} = \frac{n^2 - n}{2} \cdot \frac{1}{n^2} < \frac{1}{2}.$$

By Markov's inequality, Pr[X ≥ t] ≤ E[X]/t. Taking t = 1 gives Pr[X ≥ 1] < 1/2.

Page 18: Universal Hashing

Thm: If we store n keys in a hash table of size m = n using a hash function h randomly chosen from a universal class of hash functions, then

$$E\left[\sum_{j=0}^{m-1} n_j^2\right] < 2n,$$

where $n_j$ is the number of keys hashing to slot j.

Page 19: Universal Hashing

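Pf: It is clear that for any nonnegative integer a,

$$a^2 = a + 2\binom{a}{2}.$$

Hence

$$E\left[\sum_{j=0}^{m-1} n_j^2\right] = E\left[\sum_{j=0}^{m-1} \left(n_j + 2\binom{n_j}{2}\right)\right] = E\left[\sum_{j=0}^{m-1} n_j\right] + 2E\left[\sum_{j=0}^{m-1} \binom{n_j}{2}\right].$$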

Page 20: Universal Hashing

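Since $\sum_{j=0}^{m-1} n_j = n$,

$$E\left[\sum_{j=0}^{m-1} n_j^2\right] = E[n] + 2E\left[\sum_{j=0}^{m-1} \binom{n_j}{2}\right] = n + 2E\left[\sum_{j=0}^{m-1} \binom{n_j}{2}\right].$$

The sum $\sum_{j=0}^{m-1} \binom{n_j}{2}$ is the total number of pairs of keys that collide. By universality, its expectation is at most

$$\binom{n}{2} \cdot \frac{1}{m} = \frac{n(n-1)}{2m} = \frac{n-1}{2}, \quad \text{since } m = n.$$

Therefore

$$E\left[\sum_{j=0}^{m-1} n_j^2\right] \le n + 2 \cdot \frac{n-1}{2} = 2n - 1 < 2n.$$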
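As a sanity check of the bound, the expectation can be computed exactly for a small key set by averaging over an entire universal class. The sketch assumes the class $h_{a,b}(k) = ((ak+b) \bmod p) \bmod m$; the prime `p` and the keys are arbitrary illustrative values.

```python
from collections import Counter
from fractions import Fraction

p = 101                                      # illustrative prime
keys = [3, 14, 15, 92, 65, 35, 89, 79]       # n distinct keys in Z_p
n = len(keys)
m = n                                        # table size m = n

def sum_of_squares(a, b):
    """Sum of n_j^2 over all slots j when the keys are hashed with h_{a,b}."""
    counts = Counter(((a * k + b) % p) % m for k in keys)
    return sum(c * c for c in counts.values())

total = sum(sum_of_squares(a, b) for a in range(1, p) for b in range(p))
expected = Fraction(total, p * (p - 1))      # exact E[sum of n_j^2]

print(expected < 2 * n)
```

Empty slots contribute 0 to the sum, so iterating over the occupied slots only gives the same total.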

Page 21: Universal Hashing

Cor: If we store n keys in a hash table of size m = n using a hash function h randomly chosen from a universal class of hash functions, and set the size of each secondary hash table to $m_j = n_j^2$ for j = 0,…,m-1, then the expected amount of storage required for all secondary hash tables in a perfect hashing scheme is < 2n.

Page 22: Universal Hashing

Pf:

$$E\left[\sum_{j=0}^{m-1} m_j\right] = E\left[\sum_{j=0}^{m-1} n_j^2\right] < 2n.$$

Page 23: Universal Hashing

Testing a few randomly chosen hash functions will soon find one using small storage.

Cor: Pr[total storage for secondary hash tables ≥ 4n] < 1/2

Pf: By Markov's inequality, Pr[X ≥ t] ≤ E[X]/t. Take $X = \sum_{j=0}^{m-1} m_j$ and t = 4n:

$$\Pr\left\{\sum_{j=0}^{m-1} m_j \ge 4n\right\} \le \frac{E\left[\sum_{j=0}^{m-1} m_j\right]}{4n} < \frac{2n}{4n} = \frac{1}{2}.$$
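Putting the pieces together, a two-level scheme can be sketched as follows. This is an illustrative construction under stated assumptions, not a definitive implementation: the names `random_hash`, `build_perfect_table`, and `contains` are hypothetical, the keys are assumed to be distinct integers in 0..P-1 with at least one key, and the class $h_{a,b}$ with P = 2^31 - 1 stands in for "a universal class".

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; keys are assumed to lie in 0..P-1

def random_hash(m, rng):
    """Draw h_{a,b}(k) = ((a*k + b) mod P) mod m at random."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m

def build_perfect_table(keys, rng=None):
    """Two-level table for a non-empty list of distinct integer keys."""
    rng = rng or random.Random()
    n = len(keys)
    # Level 1: retry until the secondary tables need less than 4n cells.
    # Each attempt succeeds with probability greater than 1/2.
    while True:
        h = random_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h(k)].append(k)
        if sum(len(bucket) ** 2 for bucket in buckets) < 4 * n:
            break
    # Level 2: slot j gets a table of size m_j = n_j^2; retry until collision-free.
    secondary = []
    for bucket in buckets:
        m_j = len(bucket) ** 2
        while True:
            h_j = random_hash(m_j, rng) if m_j else None
            table = [None] * m_j
            collision = False
            for k in bucket:
                i = h_j(k)
                if table[i] is not None:
                    collision = True
                    break
                table[i] = k
            if not collision:
                break
        secondary.append((h_j, table))
    return h, secondary

def contains(structure, k):
    """Search with one primary and at most one secondary probe."""
    h, secondary = structure
    h_j, table = secondary[h(k)]
    return bool(table) and table[h_j(k)] == k

keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
structure = build_perfect_table(keys, random.Random(0))
print(all(contains(structure, k) for k in keys), contains(structure, 11))
```

Both retry loops terminate quickly in expectation because, by the two results above, each random draw succeeds with probability greater than 1/2.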