Hash tables & probability
Ben Langmead
Department of Computer Science

Please sign guestbook (www.langmead-lab.org/teaching-materials) to tell me briefly how you are using the slides. For original Keynote files, email me ([email protected]).

105 hash and prob pub - Department of Computer Science (langmea/resources/lecture_notes/105_hash_and_prob_pub.pdf)



"Hashing with chaining" or "chain hashing"

[Figure: a hash table whose slots hold pointers to chains of key/value entries; legend: Pointer, Null Pointer, Key, Value]
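A minimal sketch of hashing with chaining, to make the picture concrete. It assumes Python's built-in `hash` as the hash function and a fixed number of buckets; the class name `ChainedHashTable` and its methods are hypothetical, not taken from the slides.

```python
class ChainedHashTable:
    """Toy hash table with chaining (illustrative; names are hypothetical)."""

    def __init__(self, n_buckets=8):
        # Each slot holds a chain (a Python list) of (key, value) pairs.
        # An empty list plays the role of the null pointer in the figure.
        self.buckets = [[] for _ in range(n_buckets)]

    def _slot(self, key):
        # Assumption: Python's built-in hash() stands in for the hash function.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        chain = self.buckets[self._slot(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite the existing entry
                return
        chain.append((key, value))

    def get(self, key, default=None):
        for k, v in self.buckets[self._slot(key)]:
            if k == key:
                return v
        return default


table = ChainedHashTable(n_buckets=4)
for word in ["apple", "banana", "cherry", "date", "elderberry"]:
    table.put(word, len(word))
```

With 5 keys in 4 buckets, at least one chain must hold two or more entries, whichever hash function is used.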

Hash Function

Assume the hash function operates on any item from U (integers, strings, etc.) and is O(1) time

Assume accessing a table slot is O(1)

[Figure: hash function taking an item from U to a table slot; labels: U, N, n]

Hash Table

What "abstract data types" can we implement with this?

map:     <k1, v1> <k2, v2> <k3, v3> <k4, v4>
set:     <k1> <k2> <k3> <k4>
counter: <k1, 7> <k2, 4> <k3, 8> <k4, 5>
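As a rough illustration of the three abstract data types, here is how they look with Python's built-in hash-based containers: `dict`, `set`, and `collections.Counter`. CPython implements these with open addressing rather than chaining, and the word list is made up for the example.

```python
from collections import Counter

words = ["to", "be", "or", "not", "to", "be"]

# map: key -> value (here, the index where each word first appears)
first_index = {}
for i, w in enumerate(words):
    first_index.setdefault(w, i)

# set: keys only
distinct = set(words)

# counter: key -> number of occurrences
counts = Counter(words)
```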

Hash Table

I add m items to an n-bucket hash table. Without probability, what can I say?

Question                                  | Assumption | Statement | Comment
------------------------------------------+------------+-----------+------------------------------
Does any bucket have more than one item?  | m > n      | Yes       | Pigeonhole principle
Is any bucket empty?                      | m < n      | Yes       | "Empty pigeonhole" principle
What is the average bucket occupancy?     | -          | m/n       | -

Nothing profound here

Hash Table

I have added m items to an n-bucket hash table. What "interesting questions" can I ask about the table's state?

m: # items, n: # buckets

How many buckets are empty?

How many items are in the average bucket?

How many items are in the median bucket?

How many items are in the fullest bucket?

What's the chance all buckets are non-empty?

What's the chance no bucket has >1 item?

Probability

Sample space (Ω) is the set of all possible outcomes

An event is a subset of Ω

Pr(A): fraction of outcomes that are in A

When outcomes are equally likely, can use "naive definition of probability"

E.g. Ω = { all possible rolls of 2 dice }

A = { rolls where 1st die is odd }
B = { rolls where 2nd die is even }

Pr(A) = |A| / |Ω| = 18/36 = 0.5

[Figure: 6 × 6 grid of the outcomes in Ω; axes: Die 1 (1 to 6), Die 2 (1 to 6)]
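The two-dice example can be checked by brute-force enumeration. This is a small sketch of the naive definition; the helper name `pr` is chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # all 36 rolls as (die1, die2)

def pr(event):
    """Naive definition: |event| / |omega|, valid for equally likely outcomes."""
    return Fraction(len(event), len(omega))

A = {roll for roll in omega if roll[0] % 2 == 1}  # 1st die is odd
B = {roll for roll in omega if roll[1] % 2 == 0}  # 2nd die is even
```

Here `pr(A)` comes out to 18/36 = 1/2, matching the slide, and `pr(A & B)` gives the chance that both events happen.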


Probability

“Naive definition” of probability fails to apply when outcomes are not equally probable

Examples: loaded coin; # goals scored in soccer game

Probability function Pr

Pr : 𝒫(Ω) → ℝ, where 𝒫(Ω) is the "power set" (set of all subsets) of Ω, satisfies conditions:

1. For any event E, 0 ≤ Pr(E) ≤ 1

2. Pr(Ω) = 1

3. Probabilities of disjoint events E1, E2, ... add:

$\Pr\left(\bigcup_{i \ge 1} E_i\right) = \sum_{i \ge 1} \Pr(E_i)$
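For a finite sample space, the three conditions can be checked exhaustively. The sketch below assumes a made-up loaded three-outcome experiment (the weights are not from the slides) and checks additivity only for pairs of disjoint events, which is enough in the finite case.

```python
from fractions import Fraction
from itertools import chain, combinations

# Assumption: a loaded three-outcome experiment with these weights.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
omega = frozenset(weights)

def pr(event):
    """Probability of an event, i.e. of a subset of omega."""
    return sum((weights[o] for o in event), Fraction(0))

def power_set(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(sub) for sub in subsets]

events = power_set(omega)
axiom1 = all(0 <= pr(e) <= 1 for e in events)
axiom2 = pr(omega) == 1
axiom3 = all(
    pr(e1 | e2) == pr(e1) + pr(e2)
    for e1 in events
    for e2 in events
    if not (e1 & e2)  # only disjoint pairs
)
```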

[Figure: the probability function Pr maps sets of outcomes from the sample space Ω to reals in [0, 1]; axis ticks 0, 1/6, 2/6, 3/6, 4/6, 5/6, 1]

[Figure: "roll one die"; Pr maps a set of two outcomes in Ω to 1/3 on the [0, 1] axis]

Probabilities of disjoint events add; for a set of two die faces, Pr = 1/6 + 1/6 = 1/3

Random variable

Random variables have two "natures"

Function, mapping outcomes from Ω to numbers (in ℝ)

X(⚃) = 4

Potential experiment with a distribution (a Pr for its Ω) and numerical result

Y = 3.5 − X

Random variable

We use capitals, e.g. X, Y, to denote a random variable

Abbreviate with "r.v."

Random variable & probability function

[Figure: "roll one die". The random variable X maps each outcome in Ω to a real (1, 2, 3, 4, 5, 6); the probability function Pr maps sets of outcomes to reals in [0, 1] (axis ticks 0, 1/6, 2/6, 3/6, 4/6, 5/6, 1)]

Pr(X = 2) = Pr(⚁) = 1/6

Pr(X ≥ 4) = Pr(⚃) + Pr(⚄) + Pr(⚅) = 1/2

Y = 3.5 − X

[Figure: the same diagram for Y, whose values are placed on a real axis marked from -3 to 3]
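Both "natures" of a random variable show up if X and Y are written as ordinary functions on the outcomes of one fair die. This is a sketch in which outcomes are named by their pip count and the helper `pr` is hypothetical.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]  # faces of one fair die, named by pip count

def X(outcome):
    return outcome  # r.v.: face -> number of pips

def Y(outcome):
    return 3.5 - X(outcome)  # a second r.v. defined in terms of X

def pr(predicate):
    """Probability that predicate(outcome) holds, for a fair die."""
    hits = [o for o in omega if predicate(o)]
    return Fraction(len(hits), len(omega))

pr_x_eq_2 = pr(lambda o: X(o) == 2)   # 1/6
pr_x_ge_4 = pr(lambda o: X(o) >= 4)   # 1/2
y_values = sorted(Y(o) for o in omega)
```

Y takes the six values -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, each with probability 1/6.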

Expected value

Expectation ("expected value") of a discrete r.v. X, called E[X], is given by

$E[X] = \sum_{x} x \cdot \Pr(X = x)$

where summation is over values x in the range of X.
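The definition translates directly into a few lines of code. In this sketch a distribution is represented as a dictionary from value to probability; the 0/1 coding of the coin is an assumption.

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum over x of x * Pr(X = x); pmf maps value -> probability."""
    return sum(x * p for x, p in pmf.items())

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
fair_coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # assumption: tails = 0, heads = 1
```

`expectation(fair_die)` is 7/2 (that is, 3.5) and `expectation(fair_coin)` is 1/2.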

Linearity of expectation

For discrete r.v.s X1, X2, ..., Xn

$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$

True whether or not the Xi are independent
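To see that independence is not needed, here is a small check with two strongly dependent r.v.s on a single fair die roll. The choice of the roll and its square is an illustrative assumption.

```python
from fractions import Fraction

omega = range(1, 7)   # one fair die roll
p = Fraction(1, 6)    # probability of each outcome

def X1(o):
    return o          # the roll itself

def X2(o):
    return o * o      # fully determined by X1, so X1 and X2 are dependent

e_x1 = sum(X1(o) * p for o in omega)
e_x2 = sum(X2(o) * p for o in omega)
e_sum = sum((X1(o) + X2(o)) * p for o in omega)
```

Even though X2 is a function of X1, `e_sum` equals `e_x1 + e_x2`.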

Expected value

When Z is a linear combination of other r.v.s, E[Z] can be easier to get than Pr_Z

Z = X + Y where X is fair die roll & Y is fair coin flip

E[Z] = E[X] + E[Y] is simple (3.5 + 0.5 = 4)
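The die-plus-coin example can be worked both ways to compare the effort. The sketch assumes the coin is coded as 0 or 1 and, for the "hard way" only, that the die and the coin are independent.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # assumption: tails = 0, heads = 1

# Hard way: build the full distribution of Z = X + Y (assumes independence).
pr_z = defaultdict(Fraction)
for (x, px), (y, py) in product(die.items(), coin.items()):
    pr_z[x + y] += px * py
e_z_from_distribution = sum(z * p for z, p in pr_z.items())

# Easy way: linearity of expectation, no distribution of Z needed.
e_z_from_linearity = (
    sum(x * p for x, p in die.items()) + sum(y * p for y, p in coin.items())
)
```

Both routes give 4, but the second never has to work out Pr(Z = z) for any z.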

Hash Table

I have added m items to an n-bucket hash table

Besides this setup, what else do we need to define a random variable describing the table?

1. Sample space Ω: possible allocations of items to buckets

2. Probability func. Pr

3. Map from outcomes to reals X

Items 2 and 3 depend on question asked, assumptions made about hash function

Balls & bins

Throw m balls into n bins uniformly and independently

Ω_{2,3}: "2 balls in 3 bins"
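For small m and n the sample space can be listed outright. This sketch assumes the balls are distinguishable, so an outcome records which bin each ball lands in; the function name `sample_space` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def sample_space(m, n):
    """All ways to throw m distinguishable balls into n bins.

    Each outcome is a tuple whose i-th entry is the bin of ball i.
    """
    return list(product(range(n), repeat=m))

omega_2_3 = sample_space(m=2, n=3)

# Uniform, independent throws make every outcome equally likely: 1 / n**m.
pr_outcome = Fraction(1, len(omega_2_3))

# Event: both balls land in the same bin.
same_bin = [o for o in omega_2_3 if o[0] == o[1]]
pr_same_bin = Fraction(len(same_bin), len(omega_2_3))
```

Under this assumption there are 3² = 9 outcomes, each with probability 1/9, and the two balls share a bin with probability 1/3.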

Hash Table

I have added m items to an n-bucket hash table. What "interesting questions" can I ask about the table's state?

m: # items, n: # buckets

How many buckets are empty?

How many items are in the average bucket?

How many items are in the median bucket?

How many items are in the fullest bucket?

What's the chance all buckets are non-empty?

What's the chance no bucket has >1 item?

Balls & bins

I throw m balls into n bins uniformly and independently. What can I ask about the bins and their contents?

m: # items, n: # buckets

How many bins are empty?

How many balls are in the average bin?

How many balls are in the median bin?

How many balls are in the fullest bin?

What's the chance all bins are non-empty?

What's the chance no bin has >1 item?
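One way to get a feel for these questions is to simulate a single experiment and read the answers off the result. This is a sketch: the function names, the seed, and the choice m = n = 100 are all assumptions made for illustration.

```python
import random
from statistics import median

def throw(m, n, rng):
    """Throw m balls into n bins uniformly and independently; return bin counts."""
    counts = [0] * n
    for _ in range(m):
        counts[rng.randrange(n)] += 1
    return counts

def summarize(counts):
    """Answer the questions above for one outcome of the experiment."""
    return {
        "empty": sum(1 for c in counts if c == 0),
        "average": sum(counts) / len(counts),
        "median": median(counts),
        "fullest": max(counts),
        "all_non_empty": min(counts) > 0,
        "no_collision": max(counts) <= 1,
    }

rng = random.Random(42)  # fixed seed so the run is repeatable
stats = summarize(throw(m=100, n=100, rng=rng))
```

A single run answers the questions for one outcome only; repeating it many times and averaging gives estimates of the corresponding probabilities and expectations.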

Balls & bins

I throw m balls into n bins uniformly and independently. What can I ask?

Category                     | Questions                                                    | Approach
-----------------------------+--------------------------------------------------------------+-----------------------------------
Empty / non-empty            | How many buckets are empty?                                  | Coupon collector
                             | What's the chance all buckets are non-empty?                 |
Collisions / no collisions   | How many throws until there is a >0.5 chance of a collision? | Birthday problem
                             | What is the chance no bin has >1 item?                       |
Local (single bin) occupancy | What's the occupancy of a given bucket?                      | Binomial, Geometric, Poisson r.v.s
                             | What is the chance a given bucket has >2 items?              |
Global occupancy             | What is the median bucket occupancy?                         | Often hard
                             | What is the maximum bucket occupancy?                        |
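The collision questions have a short exact answer: the k-th ball must avoid the k-1 bins already occupied. Below is a sketch of that calculation, with function names chosen for illustration.

```python
from fractions import Fraction

def pr_no_collision(m, n):
    """Chance that m uniform, independent throws into n bins never share a bin."""
    p = Fraction(1)
    for i in range(m):
        p *= Fraction(n - i, n)  # the next ball must avoid the i bins already used
    return p

def throws_until_likely_collision(n):
    """Smallest m for which the chance of a collision exceeds 0.5."""
    m = 0
    while 1 - pr_no_collision(m, n) <= Fraction(1, 2):
        m += 1
    return m
```

For n = 365 bins this gives 23 throws, the familiar birthday-problem figure, and for m > n the no-collision probability is 0, in line with the pigeonhole principle.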