Intro to Probability Instructor: Alexandre Bouchard www.stat.ubc.ca/~bouchard/courses/stat302-sp2017-18/





Announcements

• Graded midterm available after lecture

• Webwork due tonight

Regrading policy

• IF you would like a partial regrading, you should, BEFORE or ON Tuesday March 20th (extended from Friday March 15th), hand in to me at the beginning of a lecture:

• your exam

• a clean piece of paper stapled to it that clearly (i) explains the question(s) you would like us to regrade AND (ii) the issue(s) you would like to raise

• NOTE: for fairness, the new grade for the question could stay the same, increase, or, in certain cases, decrease (except if the request is limited to the points mentioned in last Friday's mass email)

Plan for today

• Sum of continuous random variables

• Conditional densities

Review: transformations

• Suppose I tell you the distribution fX of Richter-scale measurements

• What is the distribution of the amplitudes?

• For simplicity:

• Assume Richter scale X ~ Uniform(0, 1)

• What is the distribution of Y = exp(X) ?

Ex 53

Review: recipe for transformations

Recipe for finding the distribution of transforms of r.v.’s

1 Find the CDF

2 Differentiate to find the density

Density fX

[Figure: Richter axis from 0 to 1; amplitude axis from 1 to 10]

• Suppose I tell you the distribution fX of Richter-scale measurements

• What is the distribution of the amplitudes?

• For simplicity:

• Assume Richter scale X ~ Uniform(0,1)

• What is the distribution of exp(X) ?

Review: recipe for transformations

1 Find the CDF

• Suppose I tell you the distribution fX of Richter-scale measurements

• What is the distribution of the amplitudes?

• For simplicity:

• Assume Richter scale X ~ Uniform(0,1)

• What is the distribution of exp(X) ?

FY(y) = P(exp(X) ≤ y)

= P(X ≤ log(y))

= FX(log(y)) = 1[1,e](y) · log(y)

Why?

Why P(exp(X)≤y) = P(X≤log(y))

• Because (exp(X)≤y) = (X≤log(y)), which is true because:

• log is increasing, i.e. x1≤x2 iff log(x1)≤log(x2)

• this means I can take log on both sides of the inequality: (exp(X)≤y) = (log(exp(X))≤log(y))

• log/exp are invertible: log(exp(z)) = z, so (log(exp(X)) ≤ log(y)) = (X ≤ log(y))

Review: recipe for transformations

2 Differentiate to find the density

• Suppose I tell you the distribution fX of Richter-scale measurements

• What is the distribution of the amplitudes?

• For simplicity:

• Assume Richter scale X ~ Uniform(0,1)

• What is the distribution of exp(X) ?

fY(y) = dFY(y)/dy = 1[1,e](y) · (1/y)

(at points where FY is differentiable)
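The recipe's conclusion can be sanity-checked by simulation; a minimal Python sketch (the sample size and test point y0 = 2 are arbitrary choices of mine): since FY(y) = log(y) on [1, e], the empirical CDF of exp(X) at y = 2 should be close to log 2 ≈ 0.69.

```python
import math
import random

random.seed(1)

# Monte Carlo check of the transformation recipe:
# X ~ Uniform(0, 1), Y = exp(X) should have CDF F_Y(y) = log(y) on [1, e].
n = 200_000
samples = [math.exp(random.random()) for _ in range(n)]

# Empirical CDF of Y at a test point y0, versus the theoretical F_Y(y0).
y0 = 2.0
empirical = sum(s <= y0 for s in samples) / n
theoretical = math.log(y0)

print(round(empirical, 3), round(theoretical, 3))
```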

Sums of independent discrete random variables

(exact method)

Sum of independent r.v.s: summary

• Approximations:

• Central limit theorem (Normal approximation)

• Use software/PPL

• Exact methods:

• Binomial distribution (works only for sum of Bernoullis)

• Today: general, exact method CONVOLUTIONS

Simple example

• X: outcome of the white die

• Y: outcome of the black die

• Example: computing P(X + Y = 4)

Ex 68
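The exact computation can be reproduced by enumerating the 36 equally likely outcomes (a sketch in Python; exact fractions are my choice, to avoid rounding):

```python
from collections import Counter
from fractions import Fraction

# PMF of the sum of two fair dice by direct enumeration (the exact method).
pmf = Counter()
for x in range(1, 7):
    for y in range(1, 7):
        pmf[x + y] += Fraction(1, 36)

print(pmf[4])  # P(X + Y = 4): outcomes (1,3), (2,2), (3,1), so 3/36
```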

Simple example

Application

• Not convinced? Play this game:

Settlers of Catan

General formula for discrete r.v.s

If:

Sum of Independent Random Variables

Consider two integer-valued independent r.v.s X and Y with respective p.m.f.s pX(x) and pY(y).

Consider Z = X + Y; we want to compute the p.m.f. of Z, denoted pZ(z).

Assume Y = y; then Z = z if and only if X = z − y, and

P(X = z − y, Y = y) = pX(z − y) pY(y)

so, as Y can take integer values and the events (X = z − y) ∩ (Y = y) and (X = z − y′) ∩ (Y = y′) are mutually exclusive for y ≠ y′, we have

pZ(z) = Σ_{y=−∞}^{∞} pX(z − y) pY(y).

Then:

pZ(z) = Σ_{y=−∞}^{∞} pX(z − y) pY(y).

Prop 16
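Prop 16 translates directly into code; a minimal sketch (the dict-based PMF representation and function name are my own choices):

```python
from fractions import Fraction

def convolve_pmf(p_x, p_y):
    """Convolution of two integer-supported PMFs given as dicts:
    p_Z(z) = sum over y of p_X(z - y) * p_Y(y)."""
    p_z = {}
    for y, py in p_y.items():
        for x, px in p_x.items():
            z = x + y
            p_z[z] = p_z.get(z, 0) + px * py
    return p_z

# Example: sum of two fair dice.
die = {k: Fraction(1, 6) for k in range(1, 7)}
total = convolve_pmf(die, die)
print(total[7])  # the most likely sum
```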

Sums of independent continuous random variables

Sum of continuous r.v.s

• X: a continuous r.v. with density fX

• Y: a continuous r.v. with density fY

• Assume they are indep: f(x, y) = fX(x) fY(y)

• What is the density fZ of the sum Z = X + Y?

Recipe for finding the distribution of transforms of r.v.’s

1 Find the CDF

2 Differentiate to find the density

Density fX

[Figure: Richter axis from 0 to 1; amplitude axis from 1 to 10]

Example

• Let X and Y be independent and both uniform on [0, 1]

• What is the density fZ of the sum Z = X + Y?


Ex 69

Example

• Let X and Y be independent and both uniform on [0, 1]

• What is the density fZ of the sum Z = X + Y?


1 Find the CDF

P( Z ≤ 1 ) = P( X + Y ≤ 1 )

= ?

FZ(z) = P(Z ≤ z) example: z = 1

Example

• Let X and Y be independent and both uniform on [0, 1]

• What is the density fZ of the sum Z = X + Y?


1 Find the CDF

P( Z ≤ 1 ) = P( X + Y ≤ 1 )

= P( (X, Y) ∈ A ),  where A = {(x, y) : x + y ≤ 1}

= ∫∫_A f(x, y) dx dy

= ∫_{−∞}^{∞} ( ∫_{−∞}^{1−x} f(x, y) dy ) dx

= 1/2

[Figure: the unit square with the region A = {(x, y) : x + y ≤ 1} shaded]
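A quick Monte Carlo sketch of this step (the sample size is an arbitrary choice of mine): the region A is a triangle with area 1/2 inside the unit square, so P(X + Y ≤ 1) should be close to 0.5.

```python
import random

random.seed(0)

# Monte Carlo check that P(X + Y <= 1) = 1/2 for independent X, Y ~ Uniform(0, 1).
n = 200_000
hits = sum(random.random() + random.random() <= 1.0 for _ in range(n))
print(round(hits / n, 2))
```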

Example

• Let X and Y be independent and both uniform on [0, 1]

• What is the density fZ of the sum Z = X + Y?


1 Find the CDF

P( Z ≤ z ) = P( X + Y ≤ z )

= ∫_{−∞}^{∞} ( ∫_{−∞}^{z−x} fX(x) fY(y) dy ) dx

= ∫_{−∞}^{∞} fX(x) ( ∫_{−∞}^{z−x} fY(y) dy ) dx

= ∫_{−∞}^{∞} fX(x) FY(z − x) dx      (definition of the CDF FY)

Example

• Let X and Y be independent and both uniform on [0, 1]

• What is the density fZ of the sum Z = X + Y?


1 Find the CDF

FZ(z) = P( Z ≤ z ) = ∫_{−∞}^{∞} fX(x) FY(z − x) dx

2 Differentiate to find the density

fZ(z) = dFZ(z)/dz

= ∫_{−∞}^{∞} fX(x) (d/dz) FY(z − x) dx      (under regularity conditions, you can interchange integrals and derivatives)

= ∫_{−∞}^{∞} fX(x) fY(z − x) ( (d/dz)(z − x) ) dx      (chain rule of calculus)

= ∫_{−∞}^{∞} fX(x) fY(z − x) dx
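The final convolution formula can be checked numerically for two Uniform(0, 1) variables, whose sum has the triangular density fZ(z) = z on [0, 1] and 2 − z on [1, 2] (a sketch; the midpoint-rule quadrature and function names are my own choices):

```python
def f_unif(t):
    # density of Uniform(0, 1)
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def conv_density(z, n=10_000):
    """Numerical convolution f_Z(z) = integral of f_X(x) f_Y(z - x) dx,
    via the midpoint rule on [0, 1] (where f_X is supported)."""
    h = 1.0 / n
    return sum(f_unif(x) * f_unif(z - x)
               for x in (h * (i + 0.5) for i in range(n))) * h

# Triangular density: f_Z(0.5) = 0.5, f_Z(1.0) = 1.0, f_Z(1.5) = 0.5.
print(round(conv_density(0.5), 3), round(conv_density(1.5), 3))
```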

Sum of continuous r.v.s

• X: a continuous r.v. with density fX

• Y: a continuous r.v. with density fY

• What is the density fZ of the sum Z = X + Y?

Sum of Independent Random Variables

In numerous scenarios, we have to sum independent continuous r.v.s: signal + noise, sums of different random effects, etc.

Assume that X, Y are independent continuous r.v.s with respective pdfs fX(x) and fY(y); then Z = X + Y admits the pdf

fZ(z) = ∫_{−∞}^{∞} fX(z − y) fY(y) dy = ∫_{−∞}^{∞} fX(x) fY(z − x) dx

The pdf fZ(z) is the so-called "convolution" of fX(x) and fY(y).

Terminology: ‘convolution’

Prop 16b

• Let X and Y be independent and both uniform on [0, 1]

• What is the density fZ of the sum Z = X + Y?

Note: Not equal to the sum of the densities !!!

Ex 69


Conditional densities

Conditional PMF and density

* if denominator is non-zero

Conditional density given y:

fX|Y(x|y) = joint density / marginal density = f(x, y) / fY(y)

Conditional PMF given y:

pX|Y(x|y) = joint PMF / marginal PMF = p(x, y) / pY(y)

Def 26*
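As a quick illustration of the conditional-PMF definition, here is a sketch with a toy joint PMF on {0,1} × {0,1} (the numbers are made up for illustration, not from the course):

```python
from fractions import Fraction

# Conditional PMF from a joint PMF: p_{X|Y}(x|y) = p(x, y) / p_Y(y).
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 4), (1, 1): Fraction(3, 8)}

def marginal_y(y):
    # p_Y(y): sum the joint PMF over all x values.
    return sum(p for (x, yy), p in joint.items() if yy == y)

def cond_x_given_y(x, y):
    # valid only when the denominator p_Y(y) is non-zero
    return joint[(x, y)] / marginal_y(y)

print(cond_x_given_y(1, 1))  # p(1,1) / p_Y(1) = (3/8) / (1/2)
```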

Rewriting chain rule

P( A, B) = P(A) P(B | A)

For any events A, B, with P(A) > 0:

p(x, y) = pX(x)pY |X(y|x)

Correspondence: A = (X = x), B = (Y = y)

Prop 17a

Rewriting Bayes rule

P(H|E) = P(H) P(E|H) / P(E)

pZ|X(z|x) = pZ(z) pX|Z(x|z) / pX(x)

H: hypothesis (unknown), E: evidence/observation

Correspondence: H = (Z = z), E = (X = x)

Prop 17b

Density versions

Chain rule and Bayes rule, by type:

Events:  P(A, B) = P(A) P(B|A)  |  P(H|E) = P(H) P(E|H) / P(E)

PMFs:  p(x, y) = pX(x) pY|X(y|x)  |  pZ|X(z|x) = pZ(z) pX|Z(x|z) / pX(x)

Densities:  f(x, y) = fX(x) fY|X(y|x)  |  fZ|X(z|x) = fZ(z) fX|Z(x|z) / fX(x)

Prop 17c

Usual warning

• f and p behave similarly in formulas (replacing sums by integrals)

• BUT: as always, f(x, y), fX(x), fY(y) and fX|Y(x|y) are NOT probabilities. We integrate over a region to get probabilities

• For fX(x), fY(y) and fX|Y(x|y), use a single integral

• For f(x, y), use a double integral

Example: Using conditioning to predict the number of future

members of the human species

Simple problem

• I have a measuring tape, but you do not know how long it is.

• Length of tape: Z

• I go in a separate room, unroll it fully, and pick a number at random from the tape.

• Random point on tape: Y

• If I tell you Y, how should we optimally guess Z?

Ex 72

Model

• I have a measuring tape, but you do not know how long it is.

• Length of tape: Z

• Let’s say we think it’s less than 5m

• I go in a separate room, unroll it fully, and pick a number at random from the tape.

• Random point on tape: Y

Z ~ Unif(0, 5)

Y|Z ~ Unif(0, Z)

Ex 72

More ‘dramatic’ version: how to predict the number of future members of the human species?

• I have a measuring tape, but you do not know how long it is (Z).

• I go in a separate room, unroll it fully, and pick a number (Y) at random from the tape.

• If I tell you Y, how should we optimally guess Z?

Total number of humans to ever live, future and past (in trillion)

Number of humans that were born before present (from archeological records, ~0.06 trillion)

Can we guess (probabilistically) how many more humans there will be?

Ex 72

http://en.wikipedia.org/wiki/Doomsday_argument

Conditional probability: continuous case

fZ(z)

New information (observation): a fixed point y

Beliefs before new info (prior) Conditioning Updated beliefs

fZ|Y(z|y)

Exercises: see handout

• Write fZ(z) and fY|Z(y|z)

• Write f(z,y)

• Compute fY(y)

• Compute fZ|Y(z|y)

• Compute the conditional expectation:

E[Z|Y] = ∫_{−∞}^{∞} z fZ|Y(z|Y) dz

Z ~ Unif(0, 5)
Y | Z ~ Unif(0, Z)

Observed: Y ≈ 0.06

Ex 72

Useful formulas for continuous random variables

Marginalization:

fX(x) = ∫_{−∞}^{+∞} f(x, y) dy

Conditional density given y:

fX|Y(x|y) = joint density / marginal density = f(x, y) / fY(y)

Uniform density, U ~ Unif(a, b):

fU(u) = 1(a,b)(u) / (b − a)

Joint density?

Z ~ Unif(0, 5)
Y | Z ~ Unif(0, Z)
Observed: Y ≈ 0.06

A. [1[0,5](z) / 5] · [1[0,z](y) / z]

B. [1[0,5](y) / 5] · [1[0,y](z) / y]

C. [1[y,5](z) / (y − 5)] · [1[0,z](y) / z]

D. [1[0,5](z) / 5] · [1[0,5](y) / 5]

Hint:

Ex 72a


Joint density

Z ~ Unif(0, 5)
Y | Z ~ Unif(0, Z)
Observed: Y ≈ 0.06

f(z, y) = fZ(z) · fY|Z(y|z) = [1(0,5)(z) / 5] · [1(0,z)(y) / z]

Ex 72b

Marginal of Y, fY(y)

Z ~ Unif(0, 5)
Y | Z ~ Unif(0, Z)
Observed: Y ≈ 0.06

For 0 < y < 5:

A. (1/5) (log 5 − log y)

B. (1/5) (2/y² − 2/25)

C. log 5 − log y

D. 2/y² − 2/25

Ex 72b

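Marginalizing the joint density f(z, y) = 1/(5z) for 0 < y < z < 5 over z gives (1/5)(log 5 − log y); this can be checked numerically (a sketch; the midpoint-rule quadrature is my own choice):

```python
import math

# Check the marginal f_Y(y) = integral over z of f(z, y) dz for the tape model:
# f(z, y) = (1/5) * (1/z) for 0 < y < z < 5, so integrating over z in (y, 5)
# should give f_Y(y) = (1/5) * (log 5 - log y).
y = 0.06
n = 100_000
h = (5 - y) / n
numeric = sum(1.0 / (5 * (y + h * (i + 0.5))) for i in range(n)) * h
closed = (math.log(5) - math.log(y)) / 5
print(round(numeric, 4), round(closed, 4))
```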

Posterior density, fZ|Y(z|y)

fZ|Y(z|y) = [1(0,5)(z) · 1(0,z)(y)] / [z (log 5 − log y)]

At y = 0.06, get:

[Figure: plot of the posterior density of Z, for 0 < z < 5]
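A quick sanity check on the posterior (a sketch; the quadrature scheme is my own choice): for fixed y, the density 1/(z (log 5 − log y)) on y < z < 5 should integrate to 1.

```python
import math

# Normalization check of the posterior for the tape model at y = 0.06.
y = 0.06
c = math.log(5) - math.log(y)
n = 100_000
h = (5 - y) / n
# Midpoint-rule integral of 1/(z * c) over z in (y, 5).
total = sum(1.0 / ((y + h * (i + 0.5)) * c) for i in range(n)) * h
print(round(total, 4))
```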

‘Carter catastrophe’

At y = 0.06, get:

[Figure: plot of the posterior density of Z, for 0 < z < 5]

Brandon Carter; McCrea, W. H. (1983). "The anthropic principle and its implications for biological evolution". Philosophical Transactions of the Royal Society of London A, 310 (1512): 347–363. doi:10.1098/rsta.1983.0096.

- Does not mean humanity will come to an end! Why?
- Assumptions (e.g., that our birth rank should be viewed as uniform among all human births) are still hotly debated
- Choice of prior on Z: are we over-pessimistic/optimistic by assuming a uniform prior density on [0, 5]?
- However, note that the math is solid (think about the measuring tape example if uncomfortable with Carter's assumptions)

Conditional expectation

A. 1.5
B. 1.23
C. 1.117
D. 0.9714

E[Z|Y] = ∫_{−∞}^{∞} z fZ|Y(z|Y) dz ≈ ...

Ex 72c

Z ~ Unif(0, 5)
Y | Z ~ Unif(0, Z)
Observed: Y ≈ 0.06

At y = 0.06...

fZ|Y(z|y) = [1(0,5)(z) · 1(0,z)(y)] / [z (log 5 − log y)]

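Using the posterior density above, the conditional expectation can be evaluated numerically; the closed form (5 − y)/(log 5 − log y) is my own simplification (the z in the integrand cancels the z in the posterior's denominator), and both agree at about 1.117 for y = 0.06.

```python
import math

# Conditional expectation for the measuring-tape model:
# Z ~ Unif(0, 5), Y | Z ~ Unif(0, Z), observed y = 0.06.
# Posterior: f_{Z|Y}(z|y) = 1 / (z * (log 5 - log y)) for y < z < 5.
y = 0.06

def posterior(z):
    return 1.0 / (z * (math.log(5) - math.log(y))) if y < z < 5 else 0.0

# Midpoint-rule integration of z * f_{Z|Y}(z|y) over (y, 5).
n = 100_000
h = (5 - y) / n
e_z_given_y = sum((y + h * (i + 0.5)) * posterior(y + h * (i + 0.5))
                  for i in range(n)) * h

# Closed form: the integrand is constant, so the integral is (5 - y) / (log 5 - log y).
closed = (5 - y) / (math.log(5) - math.log(y))
print(round(e_z_given_y, 3), round(closed, 3))
```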