42
The Poisson distribuon The Poisson distribuon Will Monroe Will Monroe July 12, 2017 July 12, 2017 with materials by Mehran Sahami and Chris Piech

The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Embed Size (px)

Citation preview

Page 1: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

The Poisson distributionThe Poisson distribution

Will MonroeWill MonroeJuly 12, 2017July 12, 2017

with materials byMehran Sahamiand Chris Piech

Page 2: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Announcements: Problem Set 2

(Cell phone location sensing)

Due today!

Page 3: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Announcements: Problem Set 3

(election prediction)

To be released later today.

Due next Wednesday, 7/19, at 12:30pm (before class).

Page 4: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Announcements: Midterm

Two weeks from yesterday:

Tuesday, July 25, 7:00-9:00pm

Tell me by the end of thisweek if you have a conflict!

Page 5: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Review: Basic distributions

X∼Bin (n , p)

Many types of random variables come up repeatedly. Known frequently-occurring distributions lets you do computations without deriving formulas from scratch.

family parametersvariable

We have ________ independent _________, INTEGER PLURAL NOUN

each of which ________ with probability VERB ENDING IN -S

________. How many of the ________ REAL NUMBER REPEAT PLURAL NOUN

________?REPEAT VERB -S

Page 6: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Review: Bernoulli random variable

An indicator variable (a possibly biased coin flip) obeys a Bernoulli distribution. Bernoulli random variables can be 0 or 1.

X∼Ber ( p)

pX (1)=ppX (0)=1−p (0 elsewhere)

Page 7: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Review: Bernoulli fact sheet

probability of “success” (heads, ad click, ...)

X∼Ber ( p)?

image (right): Gabriela Serrano

PMF:

expectation: E [X ]=p

variance: Var(X )=p(1−p)

pX (1)=ppX (0)=1−p (0 elsewhere)

Page 8: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Review: Binomial random variable

The number of heads on n (possibly biased) coin flips obeys a binomial distribution.

pX (k)={(nk) p

k(1−p)n−k if k∈ℕ ,0≤k≤n

0 otherwise

X∼Bin (n , p)

Page 9: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Review: Binomial fact sheet

probability of “success” (heads, crash, ...)

X∼Bin (n , p)

PMF:

expectation: E [X ]=np

variance: Var(X )=np(1−p)

number of trials (flips, program runs, ...)

Ber(p)=Bin (1 , p)note:

pX (k )={(nk) p

k(1−p)

n−k if k∈ℕ ,0≤k≤n

0 otherwise

Page 10: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Poisson random variable

The number of occurrences of an event that occurs with constant rate λ (per unit time), in 1 unit of time, obeys a Poisson distribution.

pX (k)={e−λ λ

k

k !if x∈ℤ , x≥0

0 otherwise

X∼Poi (λ)

Page 11: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Algorithmic ride sharingA ride-sharing program gets an average of 5 requests per minute from a region of the city.

X: number of requests this minute

Which looks like the most likely PMF for the distribution of X?

image: Zbigniew Bzdak, Chicago Tribune

0 1 2 3 4 5 6 7 8 90

0.050.1

0.150.2

0 1 2 3 4 5 6 7 8 90

0.5

10 1 2 3 4 5 6 7 8 9

00.05

0.10.15

0.2

←x

pX (x)

A)

B)

C)

D)

0 1 2 3 4 5 6 7 8 90

0.050.1

0.150.2

https://bit.ly/1a2ki4G → https://b.socrative.com/login/student/Room: CS109SUMMER17

Page 12: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Algorithmic ride sharingA ride-sharing program gets an average of 5 requests per minute from a region of the city.

X: number of requests this minute

Which looks like the most likely PMF for the distribution of X?

image: Zbigniew Bzdak, Chicago Tribune

0 1 2 3 4 5 6 7 8 90

0.050.1

0.150.2

0 1 2 3 4 5 6 7 8 90

0.5

10 1 2 3 4 5 6 7 8 9

00.05

0.10.15

0.2

←x

pX (x)

A)

B)

C)

D)

0 1 2 3 4 5 6 7 8 90

0.050.1

0.150.2

https://bit.ly/1a2ki4G → https://b.socrative.com/login/student/Room: CS109SUMMER17

Page 13: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Algorithmic ride sharingA ride-sharing program gets an average of 5 requests per minute from a region of the city.

X: number of requests this minute

What is P(X = 3)?

image: Zbigniew Bzdak, Chicago Tribune

1 minute =

60 seconds

...0 0 0 0 0 0 0111

X∼Bin (n , p)

Page 14: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Algorithmic ride sharingA ride-sharing program gets an average of 5 requests per minute from a region of the city.

X: number of requests this minute

What is P(X = 3)?

image: Zbigniew Bzdak, Chicago Tribune

1 minute =

60 seconds

...0 0 0 0 0 0 0111

X∼Bin (60,5

60) E [X ]=60 p=5

p=5

60

Page 15: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Algorithmic ride sharingA ride-sharing program gets an average of 5 requests per minute from a region of the city.

X: number of requests this minute

What is P(X = 3)?

image: Zbigniew Bzdak, Chicago Tribune

1 minute =

60000 milliseconds

...

X∼Bin (60000,5

60000) E [X ]=60000 p=5

p=5

60000

Page 16: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Algorithmic ride sharingA ride-sharing program gets an average of 5 requests per minute from a region of the city.

X: number of requests this minute

What is P(X = 3)?

image: Zbigniew Bzdak, Chicago Tribune

1 minute =

∞ instants

...

X∼Bin (∞ ,5∞) E [X ]=∞ p=5

p=5∞

Page 17: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Binomial in the limitP(X=k )= lim

n→∞ (nk)(5n )

k

(1−5n )

n−k

=limn→∞

n!k !(n−k)!

5k

nk

(1−5/n)n

(1−5 /n)k

=limn→∞

n(n−1)⋯(n−k+1)

k !5k

nk

(1−5 /n)n

(1−5 /n)k

=limn→∞

nn

n−1n

⋯n−k+1

n5k

k !(1−5 /n)

n

(1−5/n)k

nn

,n−1n

,⋯,n−k+1

n→1 (1−5/n)

k→1k

=1

(1−5/n)n→e−5

=limn→∞

n(n−1)⋯(n−k+1)

k !5k

nk

(1−5 /n)n

(1−5 /n)k

Page 18: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Binomial in the limitP(X=k )= lim

n→∞ (nk)(5n )

k

(1−5n )

n−k

=limn→∞

n!k !(n−k)!

5k

nk

(1−5/n)n

(1−5 /n)k

=limn→∞

n(n−1)⋯(n−k+1)

k !5k

nk

(1−5 /n)n

(1−5 /n)k

=limn→∞

nn

n−1n

⋯n−k+1

n5k

k !(1−5 /n)

n

(1−5/n)k

=limn→∞

n(n−1)⋯(n−k+1)

k !5k

nk

(1−5 /n)n

(1−5 /n)k

=5k

k !e−5

Page 19: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Algorithmic ride sharingA ride-sharing program gets an average of 5 requests per minute from a region of the city.

X: number of requests this minute

What is P(X = 3)?

image: Zbigniew Bzdak, Chicago Tribune

1 minute =

∞ instants

...

X∼Bin (∞ ,5∞) P(X=3)=

53

3!e−5

≈0.14

Page 20: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Algorithmic ride sharingA ride-sharing program gets an average of 5 requests per minute from a region of the city.

X: number of requests this minute

What is P(X = 3)?

image: Zbigniew Bzdak, Chicago Tribune

1 minute =

∞ instants

...

X∼Poi (5) P(X=3)=53

3!e−5

≈0.14

Page 21: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Poisson: Fact sheet

rate of events (requests, earthquakes, chocolate chips, …)per unit time (hour, year, cookie, ...)

X∼Poi (λ)

PMF: pX (k )={e−λ λ

k

k !if k∈ℤ , k≥0

0 otherwise

Page 22: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Poisson and Cookies

Make a very large chocolate chip cookie recipe.

Recipe tells you the overall ratio ofchocolate chips per cookie (λ).

But some cookies get more, some get less!

X ~ Poi(λ) is the number of chocolate chips in some individual cookie.

(This is called a “Poisson process”: independent discrete events [chocolate chips] scattered throughout continuous time or space [batter], with constant overall rate.)

Page 23: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Poisson: Fact sheet

rate of events (requests, earthquakes, chocolate chips, …)per unit time (hour, year, cookie, ...)

X∼Poi (λ)

PMF: pX (k )={e−λ λ

k

k !if k∈ℤ , k≥0

0 otherwise

expectation: E [X ]=λ

Page 24: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Expectation of Poisson

The cheater’s way:

Poi(λ)=Bin(n=∞ , p= λ∞ )

E [X ]=np=∞⋅λ∞=λ

The real way:

E [X ]=∑k=0

e−λ λk

k !⋅k

=∑k=1

e−λ λk

k !⋅k

=∑k=1

e−λ λk

(k−1)!

=λ e−λ∑k=1

∞λ

k−1

(k−1)!

=λ e−λ∑j=0

∞λ

j

j !

=λ e−λ⋅eλ

∑j=0

∞λ

j

j !=eλ

Page 25: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Poisson: Fact sheet

rate of events (requests, earthquakes, chocolate chips, …)per unit time (hour, year, cookie, ...)

X∼Poi (λ)

PMF:

expectation: E [X ]=λ

pX (k )={e−λ λ

k

k !if k∈ℤ , k≥0

0 otherwise

Page 26: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Variance of Poisson

The cheater’s way:

Poi(λ)=Bin(n=∞ , p= λ∞ )

Var (X )=np(1−p)

The real way:

E [X 2]=∑

k=0

e−λ λk

k !⋅k2

=∑k=1

e−λ λk

k !⋅k2

=λ∑k=1

e−λ λk−1

(k−1)!⋅k

∑j=0

∞λ

j

j !=eλ

=∞⋅λ∞⋅(1− λ

∞)=λ⋅1=λ

=λ∑j=0

e−λ λj

j !⋅( j+1)

=λ [∑j=0

e−λ λj

j !⋅j+∑

j=0

e−λ λj

j ! ]=λ [E [X ]+e−λ∑

j=0

∞λ

j

j ! ]=λ(λ+1)

Var (X )=E [X 2]−(E [X ])

2=λ(λ+1)−λ

2=λ

Page 27: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Poisson: Fact sheet

rate of events (requests, earthquakes, chocolate chips, …)per unit time (hour, year, cookie, ...)

X∼Poi (λ)

PMF:

expectation: E [X ]=λ

variance: Var(X )=λ

pX (k )={e−λ λ

k

k !if k∈ℤ , k≥0

0 otherwise

Page 28: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Siméon Denis Poisson

French mathematician (1781-1840)

● First paper at age 18● Became professor at 21● Published over 300 papers

« La vie n'est bonne qu'à deux choses : à faire des mathématiques et à les professer. »

—quoted in Arago (1854)

(“Life is only good for two things: doing mathematics and teaching it.”)

Page 29: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Break time!

Page 30: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Poisson: Fact sheet

rate of events (requests, earthquakes, chocolate chips, …)per unit time (hour, year, cookie, ...)

X∼Poi (λ)

PMF:

expectation: E [X ]=λ

variance: Var(X )=λ

pX (k )={e−λ λ

k

k !if k∈ℤ , k≥0

0 otherwise

Page 31: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Web server hits

Your web server gets an average of 2 requests each second.

What’s the probability of exactly 5requests in a second?

X: number of requests in next sec.

P(X=5)=e−2 25

5!≈0.0361

Page 32: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Earthquakes

There are an average of 2.8 major earthquakes in the world each year.

What’s the probability of >1major earthquake next year?

X: number of earthquakes next year

P(X>1)=1−P(X=0)−P(X=1)

=1−e−2.8 2.80

0!−e−2.8 2.81

1!=1−0.06−0.17

=0.77

Page 34: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Poisson approximation of binomial

We derived Poisson as the limit of the binomial as n → ∞, with λ = np.

But it works (approximately) for finite n as well!

Bin (n , p)≈Poi (λ=np)

if: n is largep is small

Page 35: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

More space messages

P(X≤1)=P (X=0)+P(X=1)

≈0.67031+0.26815=0.9384 6

=0.99994000+4000⋅0.0001⋅0.99993999

=(40000 )(0.0001)

0(0.9999)

4000−0

+(40001 )(0.0001)

1(0.9999)

4000−1

P(X≤1)=P (X=0)+P(X=1)

≈0.67032+0.26813=0.9384 5

=e−0.4+0.4 e−0.4

≈e−0.4 0.40

0 !+e−0.4 0.41

1!

Sending a 4000-bit message through space. Each bit corrupted (flipped) with probability p = 0.0001.

X: number of bits flippedX ~ Bin(4000, 0.0001) ≈ Poi(0.4)

Page 36: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Poisson approximation of binomial

P(X=k )

k

Page 37: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Poisson approximations are chill

Poisson often still works despite “mild” violations of the binomial distribution assumptions:

– Independence

e.g., # of entries in each bucket in a hash table

– Same p:

e.g. prob. of birthdays is somewhat uneven

Page 38: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Birthdays

m people in a room.What is the probability none share the same birthday?

P(Exy) ≈ 1/365

Exy: x and y have the same birthday (“success”)

P(X=0)=e−(m2 )

1365

⋅((m2 )

1365 )

0

0!=e

−(m2 )1

365

P(X = 0) < 1/2: n = 23

trials, one for each distinct pair of people (x, y)(m2 )

Approximate: X ~ Poi(λ) λ=(m2 )1

365

not all independent

Page 39: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

A real license plate seen at Stanford

Page 40: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Randomness and clumping

uniformly random

regularpattern

“low-discrepancy”pseudorandom

Page 41: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Application: Low-discrepancysampling for rendering

uniformly random

regularpattern

“low-discrepancy”pseudorandom

Page 42: The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

Philosophical note:The structure of the universe

image: Andrew Z. Colvin