SDS 321: Introduction to Probability and Statistics
Lecture 23: Continuous random variables: Inequalities, CLT

Purnamrita Sarkar
Department of Statistics and Data Science
The University of Texas at Austin
www.cs.cmu.edu/~psarkar/teaching
Roadmap

- Examples of Markov and Chebyshev
- Weak law of large numbers and CLT
- Normal approximation to Binomial
Markov’s inequality: Example

You have 20 independent Poisson(1) random variables X1, . . . , X20. Use
the Markov inequality to bound P(X1 + · · · + X20 ≥ 15).

- P(X1 + · · · + X20 ≥ 15) ≤ E[X1 + · · · + X20]/15 = 20/15 = 4/3
- How useful is this? Not at all here: the bound exceeds 1, so it is vacuous.
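As a quick sanity check (not part of the original slides), a small Monte Carlo simulation can compare the Markov bound of 4/3 with the actual probability; the Poisson sampler and all names below are my own sketch:

```python
import math
import random

random.seed(0)

def poisson1():
    """Draw one Poisson(1) variate via Knuth's multiplication method."""
    L = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

trials = 20000
hits = sum(1 for _ in range(trials)
           if sum(poisson1() for _ in range(20)) >= 15)
estimate = hits / trials   # exact value is roughly 0.9
markov_bound = 20 / 15     # E[sum]/15 = 4/3, which exceeds 1
print(estimate, markov_bound)
```

The estimate comes out near 0.9, comfortably below the bound of 4/3: Markov only uses the mean, so it cannot say anything sharper here.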
Chebyshev’s inequality: Example

You have n independent Poisson(1) random variables X1, . . . , Xn. Use the
Chebyshev inequality to bound P(|X̄ − 1| ≥ 1).

- P(|X̄ − 1| ≥ 1) ≤ var(X̄)/1² = var(X1)/n = 1/n
  = 1/10 when n = 10
  = 1/100 when n = 100
  ...
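The same kind of simulation (again my own sketch, not from the slides) shows how the Chebyshev bound 1/n compares with the actual tail probability:

```python
import math
import random

random.seed(1)

def poisson1():
    # Knuth's multiplication method for one Poisson(1) draw
    L = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def tail_prob(n, trials=10000):
    """Estimate P(|mean of n Poisson(1) draws - 1| >= 1) by Monte Carlo."""
    hits = 0
    for _ in range(trials):
        xbar = sum(poisson1() for _ in range(n)) / n
        if abs(xbar - 1) >= 1:
            hits += 1
    return hits / trials

for n in (10, 100):
    print(n, tail_prob(n), 1 / n)   # estimate vs. Chebyshev bound 1/n
```

The estimates come out far below 1/n: Chebyshev is always valid, but it is often quite loose.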
Weak law of large numbers

The WLLN states that the sample mean of a large number of random
variables is very close to the true mean with high probability.

- Consider a sequence of i.i.d. random variables X1, . . . , Xn with mean µ
  and variance σ².
- Let Mn = (X1 + · · · + Xn)/n.
- E[Mn] = (E[X1] + · · · + E[Xn])/n = µ
- var(Mn) = (var[X1] + · · · + var[Xn])/n² = σ²/n
- So, by Chebyshev’s inequality, P(|Mn − µ| ≥ ε) ≤ σ²/(nε²).
- For large n this probability is small.
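A small simulation (my own sketch) can confirm the two moment calculations above: the sample mean Mn stays centered at µ = 1 while its empirical variance shrinks like σ²/n = 1/n:

```python
import math
import random

random.seed(2)

def poisson1():
    # Knuth's multiplication method for one Poisson(1) draw
    L = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def sample_means(n, reps=2000):
    """Draw `reps` independent copies of Mn, the mean of n Poisson(1) variables."""
    return [sum(poisson1() for _ in range(n)) / n for _ in range(reps)]

for n in (10, 100, 1000):
    ms = sample_means(n)
    mean = sum(ms) / len(ms)
    var = sum((m - mean) ** 2 for m in ms) / len(ms)
    print(n, round(mean, 3), round(var, 5), 1 / n)  # empirical var tracks 1/n
```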
Illustration

Consider the mean of n independent Poisson(1) random variables. For
each n, we plot the distribution of the average.

[Figure: distributions of the sample mean for increasing n]
Can we say more? Central Limit Theorem

It turns out that not only is the sample mean close to the true mean; you
can actually predict its distribution, using the famous Central Limit
Theorem.

- Consider a sequence of i.i.d. random variables X1, . . . , Xn with mean µ
  and variance σ².
- Let X̄n = (X1 + · · · + Xn)/n. Remember E[X̄n] = µ and var(X̄n) = σ²/n.
- Standardize X̄n to get (X̄n − µ)/(σ/√n).
- As n gets bigger, (X̄n − µ)/(σ/√n) behaves more and more like a
  Normal(0, 1) random variable.
- P((X̄n − µ)/(σ/√n) < z) ≈ Φ(z)
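Because a sum of n Poisson(1) variables is exactly Poisson(n), the statement above can be checked without simulation; the sketch below (my own code, not from the slides) compares the exact CDF of the standardized mean against Φ(z):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def poisson_cdf(k, lam):
    """P(S <= k) for S ~ Poisson(lam), summed term by term."""
    term, total = math.exp(-lam), math.exp(-lam)
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total

# X̄ < µ + z·σ/√n  is the same event as  S < n + z·√n  for S ~ Poisson(n).
n, z = 400, 1.0
threshold = n + z * math.sqrt(n)
exact = poisson_cdf(math.ceil(threshold) - 1, n)
print(exact, phi(z))   # the two numbers agree to about two decimal places
```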
Can we say more? Central Limit Theorem

[Figure (Courtesy: Tamara Broderick): You bet!]
Example

You have 20 independent Poisson(1) random variables X1, . . . , X20. Use
the CLT to approximate P(X1 + · · · + X20 ≥ 15).

- P(X1 + · · · + X20 ≥ 15) = P(X1 + · · · + X20 − 20 ≥ −5)
  = P((X̄ − 1)/(1/√20) ≥ −0.25√20)
  ≈ P(Z ≥ −1.12) ≈ 0.87
- How useful is this? Better than Markov’s vacuous bound of 4/3.
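The CLT answer can be checked against the exact Poisson tail, since the sum of 20 Poisson(1) variables is exactly Poisson(20). A sketch (my own code, not from the slides):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def poisson_cdf(k, lam):
    """P(S <= k) for S ~ Poisson(lam), summed term by term."""
    term, total = math.exp(-lam), math.exp(-lam)
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total

exact = 1 - poisson_cdf(14, 20)     # P(sum >= 15), sum ~ Poisson(20)
clt = 1 - phi(-5 / math.sqrt(20))   # P(Z >= -1.118...) ≈ 0.87
markov = 20 / 15                    # the vacuous Markov bound, for contrast
print(round(exact, 3), round(clt, 3), markov)
```

The CLT approximation lands close to the exact tail, while Markov’s bound exceeds 1.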
Example

An astronomer is interested in measuring the distance, in light-years, from his observatory to
a distant star. Although the astronomer has a measuring technique, he knows that, because
of changing atmospheric conditions and normal error, each time a measurement is made it
will not yield the exact distance, but merely an estimate. As a result, the astronomer plans
to make a series of measurements and then use the average value of these measurements as
his estimated value of the actual distance. If the astronomer believes that the values of the
measurements are independent and identically distributed random variables having a
common mean d (the actual distance) and a common variance of 4 (squared light-years),
how many measurements must he make to be 95% sure that his estimated distance is
accurate to within ±0.5 light-years?

- Let X̄n be the mean of the measurements.
- How large does n have to be so that P(|X̄n − d| ≤ 0.5) = 0.95?
- P(|X̄n − d|/(2/√n) ≤ 0.25√n) ≈ P(|Z| ≤ 0.25√n) = 1 − 2P(Z ≤ −0.25√n) = 0.95
  P(Z ≤ −0.25√n) = 0.025
  −0.25√n = −1.96, so √n = 7.84
  n ≈ 62
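The sample-size calculation above reduces to one line of arithmetic; a sketch (the variable names are my own):

```python
import math

# We need 0.25 * sqrt(n) >= z, where z = 1.96 is the 97.5% point of N(0, 1),
# so n >= (z * sigma / accuracy)^2, rounded up to a whole measurement.
sigma = 2.0       # sqrt of the common variance 4
accuracy = 0.5    # desired half-width, in light-years
z = 1.96          # P(|Z| <= 1.96) ≈ 0.95
n = math.ceil((z * sigma / accuracy) ** 2)
print(n)   # 62
```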
Normal Approximation to Binomial

The probability of selling an umbrella is 0.5 on a rainy day. If there are
400 umbrellas in the store, what’s the probability that the owner will sell
at least 180?

- Let X be the total number of umbrellas sold.
- X ∼ Binomial(400, 0.5)
- We want P(X > 180). Computing this exactly means summing a lot of
  binomial terms.
- But can we approximate the distribution of X/n?
- X/n = (∑ᵢ Yᵢ)/n where the Yᵢ are i.i.d. Bernoulli(0.5), so E[Yᵢ] = 0.5
  and var(Yᵢ) = 0.25.
- Sure! The CLT tells us that for large n, (X/400 − 0.5)/√(0.25/400) is
  approximately N(0, 1).
- So P(X > 180) = P((X − 200)/√100 > −2) ≈ P(Z ≥ −2) = 1 − Φ(−2) ≈ 0.98
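A sketch (my own code) comparing the normal approximation with the exact binomial tail, which Python can compute directly with math.comb:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 400, 0.5
# Exact tail P(X > 180) for X ~ Binomial(400, 0.5), via integer arithmetic.
exact = sum(math.comb(n, k) for k in range(181, n + 1)) / 2 ** n
# CLT approximation: standardize with mean np = 200 and sd sqrt(np(1-p)) = 10.
approx = 1 - phi((180 - 200) / 10)
print(round(exact, 4), round(approx, 4))
```

The two values differ only in the third decimal place; a continuity correction (using 180.5 in place of 180) would close most of the remaining gap.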
Frequentist Statistics

- The parameter(s) θ is fixed and unknown.
- Data are generated through the likelihood function p(X; θ) (if
  discrete) or f(X; θ) (if continuous).
- Now we will be dealing with multiple candidate models, one for each
  value of θ.
- We will use Eθ[h(X)] to denote the expectation of the random
  variable h(X) as a function of the parameter θ.
Problems we will look at

- Parameter estimation: We want to estimate unknown parameters
  from data.
  - Maximum likelihood estimation (Section 9.1): Select the
    parameter that makes the observed data most likely, i.e.,
    maximize the probability of obtaining the data at hand.
- Hypothesis testing: An unknown parameter takes a finite number
  of values. One wants to find the best hypothesis based on the data.
- Significance testing: Given a hypothesis, figure out the rejection
  region and reject the hypothesis if the observation falls within this
  region.
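As a preview of maximum likelihood estimation (a sketch of my own, with made-up data, not an example from the slides): for i.i.d. Poisson(λ) observations, the λ that maximizes the likelihood turns out to be the sample mean, which a crude grid search confirms:

```python
import math

# For i.i.d. Poisson(λ) data, the log-likelihood is
#   ℓ(λ) = Σ (x_i log λ − λ − log x_i!),
# and maximizing it gives the closed form λ̂ = sample mean.
data = [0, 2, 1, 1, 3, 0, 2, 1]   # made-up sample

def log_lik(lam, xs):
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

grid = [k / 1000 for k in range(1, 5001)]   # candidate λ values in (0, 5]
mle = max(grid, key=lambda lam: log_lik(lam, data))
print(mle, sum(data) / len(data))   # both 1.25
```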