Sample Mean and Central Limit Theorem
Lecture 21-22, November 17-21
Outline
• Sums of Independent Random Variables
• Chebyshev’s Inequality
• Estimating sample sizes
• Central Limit Theorem
• Normal approximation to the binomial
Sample Mean Statistics
Let $X_1, \dots, X_n$ be a random sample from a population (i.e., the $X_i$ are independent and identically distributed).
The sample mean is defined as
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .$$
What can we say about the distribution of the sample mean?
Sample Mean for Normal
Let $X_1, X_2, \dots, X_n$ be a random sample from a normal $(\mu, \sigma)$ distribution. What is the pdf of $\bar{x}$?
Hint: use the mgf method. The mgf of a normal is
$$m_x(t) = e^{\mu t + \sigma^2 t^2 / 2}$$
Solution
The mgf of $Y = \sum_i X_i$ is
$$m_y(t) = [m_x(t)]^n = \left[e^{\mu t + \sigma^2 t^2/2}\right]^n = e^{n\mu t + n\sigma^2 t^2/2}$$
But we want $\bar{x} = y/n$:
$$m_{\bar{x}}(t) = m_y(t/n) = e^{n\mu (t/n) + n\sigma^2 (t/n)^2/2} = e^{\mu t + (\sigma^2/n)\, t^2/2}$$
Thus the sample mean is normal with mean $\mu$ and variance $\sigma^2/n$.
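For any distribution with mean $\mu$ and standard deviation $\sigma$
What are the mean and variance of $\bar{x}$?
$$E(\bar{x}) = E\left(\frac{\sum_i x_i}{n}\right) = \frac{1}{n}\sum_i E(x_i) = \frac{1}{n}\, n\mu = \mu$$
$$\mathrm{var}(\bar{x}) = \mathrm{var}\left(\frac{\sum_i x_i}{n}\right) = \frac{1}{n^2}\sum_i \mathrm{var}(x_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$$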
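The two identities above can be checked numerically. The following is a minimal simulation sketch, assuming a Uniform(0, 1) population (so $\mu = 1/2$ and $\sigma^2 = 1/12$); the population, the sample size `n = 25`, and the helper name `simulate_sample_means` are illustrative choices, not part of the lecture.

```python
import random
import statistics

def simulate_sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from n draws of draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Assumed population: Uniform(0, 1), so mu = 1/2 and sigma^2 = 1/12.
n, reps = 25, 20000
means = simulate_sample_means(lambda rng: rng.random(), n, reps)

empirical_mean = statistics.fmean(means)
empirical_var = statistics.pvariance(means)
theoretical_var = (1 / 12) / n  # sigma^2 / n

print(f"mean of sample means:     {empirical_mean:.4f} (theory 0.5000)")
print(f"variance of sample means: {empirical_var:.6f} (theory {theoretical_var:.6f})")
```

The simulated values should land close to $\mu$ and $\sigma^2/n$, with the gap shrinking as `reps` grows.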
What if we don’t know the distribution?
If we don’t want to make assumptions about what the distribution is, can we still do things?
• Chebyshev’s Inequality
• Central Limit Theorem
Amazing property true for any distribution
Chebyshev inequality: Consider any random variable X with mean $\mu$ and standard deviation $\sigma$. For any k > 0,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$
[Figure: number line marking E(X) − kσ, E(X), and E(X) + kσ]
In words,
The probability that X deviates from its expected value by at least k standard deviations is at most $1/k^2$.
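To see how loose or tight the bound is in practice, here is a small sketch that compares the Chebyshev bound with simulated tail frequencies. The Exponential(1) population (mean 1, standard deviation 1) and the function names are assumptions made for illustration only.

```python
import random

def chebyshev_bound(k):
    """Upper bound on P(|X - mu| >= k*sigma) from Chebyshev's inequality."""
    return min(1.0, 1.0 / k**2)

def empirical_tail(samples, mu, sigma, k):
    """Fraction of samples lying at least k standard deviations from mu."""
    return sum(abs(x - mu) >= k * sigma for x in samples) / len(samples)

# Assumed population: Exponential(rate=1), which has mu = 1 and sigma = 1.
rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(100_000)]

for k in (1.5, 2, 3):
    tail = empirical_tail(samples, 1.0, 1.0, k)
    print(f"k={k}: simulated tail {tail:.4f} <= bound {chebyshev_bound(k):.4f}")
```

The simulated tails sit well below the bound, which is expected: Chebyshev holds for every distribution with finite variance, so it cannot be sharp for any particular one.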
Example
Suppose that, on average, a post office handles 10000 letters a day with a variance of 2000. What can be said about the probability that this post office will handle between 8000 and 12000 letters tomorrow?
Using Chebyshev’s
Let X = number of letters the post office will handle tomorrow, with
$$E(X) = \mu = 10000, \qquad \mathrm{var}(X) = \sigma^2 = 2000$$
We want $P(8000 < X < 12000)$.
$$\begin{aligned}
P(8000 < X < 12000) &= P(-2000 < X - 10000 < 2000) \\
&= P(|X - 10000| < 2000) \\
&= 1 - P(|X - 10000| \ge 2000) \ge 1 - .0005 = .9995
\end{aligned}$$
because, by Chebyshev’s,
$$P(|X - 10000| \ge 2000) = P(|X - \mu| \ge \sqrt{2000}\,\sigma) \le \frac{1}{2000} = .0005$$
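The arithmetic in this example can be reproduced with a few lines of code. This is only a sketch; the helper name `chebyshev_interval_lower_bound` is hypothetical and it handles only intervals symmetric about the mean, as in the post office example.

```python
def chebyshev_interval_lower_bound(mu, variance, low, high):
    """Lower bound on P(low < X < high) for an interval symmetric about mu."""
    t = high - mu
    if t <= 0 or abs((mu - low) - t) > 1e-9:
        raise ValueError("interval must be symmetric about mu")
    # P(|X - mu| >= t) <= variance / t^2, so the complement is at least 1 - variance / t^2.
    return max(0.0, 1.0 - variance / t**2)

bound = chebyshev_interval_lower_bound(mu=10_000, variance=2_000, low=8_000, high=12_000)
print(bound)  # 0.9995
```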
Alternative forms of Chebyshev
For any k > 0, $P(|X - \mu| \ge k\sigma) \le 1/k^2$
For any t > 0, $P(|X - \mu| \ge t) \le \sigma^2/t^2$, by setting $t = k\sigma$
[Figure: number line marking E(X) − t, E(X), and E(X) + t]
Chebyshev’s applied to Sample Mean
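Recall: if $X_1, \dots, X_n$ are independent and identically distributed with mean $\mu$ and standard deviation $\sigma$, then
$$E(\bar{x}) = E\left(\frac{\sum_i x_i}{n}\right) = \frac{1}{n}\sum_i E(x_i) = \frac{1}{n}\, n\mu = \mu$$
$$\mathrm{var}(\bar{x}) = \mathrm{var}\left(\frac{\sum_i x_i}{n}\right) = \frac{1}{n^2}\sum_i \mathrm{var}(x_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$$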
Chebyshev’s applied to Sample Mean
Applying Chebyshev’s to the sample mean $\bar{X}$, for any ε > 0
$$P(|\bar{X} - \mu| \ge \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2}$$
Example
A biologist wants to estimate the life span of a type of insect. He takes a sample of size n and measures the lifetime from birth to death of each insect, then averages these numbers. If he believes the lifetimes of the insects are iid with variance 1.5 days², how large a sample should he choose to be at least 98% sure that his average is accurate within plus or minus 0.2 days (4.8 hours)?
Solution
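Let $X_i$ be the lifetime of the i-th insect. We want to find n such that
$$P(-0.2 < \bar{X} - \mu < 0.2) \ge .98$$
or equivalently
$$P(|\bar{X} - \mu| \ge 0.2) \le 1 - .98 = .02$$
By Chebyshev’s,
$$P(|\bar{X} - \mu| \ge t) \le \frac{\sigma^2}{n t^2}$$
so with $\sigma^2 = 1.5$ and $t = 0.2$,
$$P(|\bar{X} - \mu| \ge 0.2) \le \frac{1.5}{n(.2)^2} = \frac{37.5}{n} \le .02$$
So we need $n \ge \lceil 37.5/.02 \rceil = 1875$.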
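The sample-size calculation generalizes to any variance, tolerance, and failure probability. A minimal sketch follows; the function name `chebyshev_sample_size` and its parameter names are illustrative, and exact fractions are used only to avoid floating-point rounding when taking the ceiling.

```python
import math
from fractions import Fraction

def chebyshev_sample_size(variance, eps, alpha):
    """Smallest n with variance / (n * eps^2) <= alpha.

    Inputs pass through str() into exact fractions so that decimal
    values such as 0.2 do not pick up binary rounding error.
    """
    var, e, a = (Fraction(str(v)) for v in (variance, eps, alpha))
    return math.ceil(var / (a * e**2))

# Insect example: variance 1.5, tolerance 0.2 days, failure probability 0.02.
n_insects = chebyshev_sample_size(variance=1.5, eps=0.2, alpha=0.02)
print(n_insects)  # 1875
```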
Central Limit Theorem
Idea: No matter what the population distribution may be (as long as it has a finite mean $\mu$ and variance $\sigma^2$), if n is large then the distribution of the sample mean is approximately normal with mean $\mu$ and variance $\sigma^2/n$.
The larger the n, the better the approximation. A common rule of thumb is that the approximation is good for n > 30. Try the applet:
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
Central Limit Theorem
If $\bar{x}$ has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, let
$$U = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
For large n, the distribution of U is approximately Normal(0,1).
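A quick simulation makes the statement concrete. The sketch below standardizes sample means drawn from a deliberately skewed population and compares their empirical cdf with the standard normal cdf. The Exponential(1) population, `n = 40`, and the helper name `standardized_means` are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def standardized_means(draw, mu, sigma, n, reps, seed=0):
    """Simulate U = (xbar - mu) / (sigma / sqrt(n)) for `reps` samples of size n."""
    rng = random.Random(seed)
    scale = sigma / math.sqrt(n)
    return [(fmean(draw(rng) for _ in range(n)) - mu) / scale for _ in range(reps)]

# Assumed skewed population: Exponential(rate=1), with mu = sigma = 1.
us = standardized_means(lambda rng: rng.expovariate(1.0), 1.0, 1.0, n=40, reps=10_000)

for z in (-1.0, 0.0, 1.0, 2.0):
    empirical = sum(u <= z for u in us) / len(us)
    print(f"P(U <= {z:+.1f}): simulated {empirical:.3f}, normal {NormalDist().cdf(z):.3f}")
```

Small discrepancies remain because the population is skewed; increasing `n` should shrink them.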
Example
A soft drink machine dispenses drinks in a cup. The amount dispensed is a R.V. with mean 200 ml and s.d 15 ml. What is the probability that the average amount dispensed in a random sample of size 36 is at least 204 ml?
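By CLT, $\bar{x} = \frac{1}{36}\sum_i x_i$ is approximately normal (200, 15/6).
Solution
$$P(\bar{X} > 204) = P\left(\frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} > \frac{(204 - \mu)\sqrt{n}}{\sigma}\right) \approx P\left(Z > \frac{(204 - 200)\,6}{15}\right) = P(Z > 1.6) = .0548$$
where Z is standard normal.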
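The table lookup can be replaced by a direct computation. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+) and treats the CLT approximation as exact; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 200.0, 15.0, 36

# CLT approximation: the sample mean is roughly Normal(mu, sigma / sqrt(n)).
xbar_dist = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))
p_at_least_204 = 1.0 - xbar_dist.cdf(204.0)

print(round(p_at_least_204, 4))  # 0.0548
```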
Sample Size Problem
Note that as the sample size increases, the standard deviation of the sample mean, $\sigma/\sqrt{n}$, gets smaller.
We can use the sample mean to estimate the true mean. By making the sample size bigger, we can make the estimate as accurate as we desire.
Sample Size Problem
Say we know the standard deviation is σ = 3 for each item in a random sample. Say we want $P(|\bar{X} - \mu| < .5)$ to be close to .95.
How big should the sample size be?
You try it.
Solution
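Using CLT approximation,
$$P(|\bar{X} - \mu| < .5) = P\left(\frac{|\bar{X} - \mu|}{3/\sqrt{n}} < \frac{.5}{3/\sqrt{n}}\right) = .95$$
$$.95 = P(|Z| < 1.96) = P\left(|Z| < \frac{.5}{3/\sqrt{n}}\right)$$
Solve $1.96 = .5\sqrt{n}/3$ for n and round up: n = 139.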
Note
You can also estimate n using Chebyshev’s, but the bound is not as strong.
$$P(|\bar{X} - \mu| < .5) = 1 - P(|\bar{X} - \mu| \ge .5) = .95$$
$$P(|\bar{X} - \mu| \ge .5) \le \frac{\sigma^2}{n(.5)^2} = .05$$
So we need
$$n = \left\lceil \frac{3^2}{.05\,(.5)^2} \right\rceil = 720$$
which is much larger than the estimate based on the CLT, n = 139.
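The two sample-size rules can be put side by side in code. This is a sketch under the slide's assumptions (known σ, two-sided tolerance); the function names are hypothetical, and the `round(..., 9)` guard is only there to keep floating-point noise from pushing the ceiling up by one.

```python
import math
from statistics import NormalDist

def clt_sample_size(sigma, eps, confidence):
    """Smallest n with P(|Xbar - mu| < eps) ~ confidence under the normal approximation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 0.95
    return math.ceil(round((z * sigma / eps) ** 2, 9))

def chebyshev_sample_size(sigma, eps, confidence):
    """Smallest n guaranteed by Chebyshev: sigma^2 / (n eps^2) <= 1 - confidence."""
    return math.ceil(round(sigma**2 / ((1 - confidence) * eps**2), 9))

n_clt = clt_sample_size(sigma=3, eps=0.5, confidence=0.95)
n_cheb = chebyshev_sample_size(sigma=3, eps=0.5, confidence=0.95)
print(n_clt, n_cheb)  # 139 720
```

Chebyshev asks for roughly five times as many observations here because it makes no use of the approximate normality of the sample mean.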
Problem
We admit 1500 students to RPI and we know historically 2/3 actually attend on average. We can assume the decision of each student to attend is independent. What is the probability that more than 1050 students attend?
Thoughts
What is the distribution of the number of students that attend?
Y ~ Binomial(n=1500, p=2/3)
How can we compute P(Y>1050)?
We learned that we can approximate a binomial using a Poisson with λ=np if n is very large and p is very small so that λ is small (<=10). But this is not the case here.
Normal approximation to Binomial
The acceptance of each student can be modeled as a Bernoulli random variable
$$X_i = \begin{cases} 1 & \text{with probability } 2/3 \\ 0 & \text{with probability } 1/3 \end{cases}$$
with mean p = 2/3 and s.d. $\sigma = \sqrt{p(1-p)} = \sqrt{2/9}$.
The ratio of students attending,
$$\bar{X} = \frac{\sum_i X_i}{n},$$
can be approximated as a normal with mean 2/3 and variance
$$\frac{\sigma^2}{n} = \frac{2}{9(1500)}$$
by CLT.
continued
For Y = the number of students attending, Y is really binomial with mean np = 1000 and variance np(1-p) = 1500*2/3*1/3.
You can approximate it by CLT as normal with the same mean and variance, since
$$Y = n\bar{X} \text{ is approximately normal}, \qquad E(Y) = np, \qquad \mathrm{var}(Y) = np(1-p)$$
or equivalently
$$\frac{Y - np}{\sqrt{np(1-p)}} = \frac{Y - 1000}{\sqrt{1500\,(2/3)(1/3)}} \sim N(0,1) \text{ approximately}$$
$$P(Y > 1050) = P\left(\frac{Y - 1000}{\sqrt{1500\,(2/3)(1/3)}} > \frac{1050 - 1000}{\sqrt{1500\,(2/3)(1/3)}}\right) \approx P(Z > 2.74) \approx 0.0031$$
Note: the normal is continuous and the binomial is discrete, so we can improve the approximation a bit.
Under a continuous distribution P(Y=1050) = 0, so approximate P(Y=1050) by applying the CLT to P(1049.5 ≤ Y ≤ 1050.5).
So a better approximation for P(Y>1050) is
$$P\left(\frac{Y - 1000}{\sqrt{1500\,(2/3)(1/3)}} > \frac{1050.5 - 1000}{\sqrt{1500\,(2/3)(1/3)}}\right) \approx P(Z > 2.766) = 0.0028$$
For the binomial itself, P(Y=1050) > 0, whereas a continuous distribution assigns probability 0 to any single value; this is why the interval 1049.5 ≤ Y ≤ 1050.5 stands in for Y = 1050.
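Since n = 1500 is small enough for exact arithmetic, the normal approximations can be compared against the exact binomial tail. The sketch below is one way to do that, assuming p is given as the rational 2/3 so the sum can be carried out in integers; the helper name `binomial_upper_tail` is hypothetical.

```python
import math
from statistics import NormalDist

def binomial_upper_tail(n, p_num, p_den, threshold):
    """Exact P(Y > threshold) for Y ~ Binomial(n, p_num/p_den), summed in integers."""
    q_num = p_den - p_num
    total = sum(
        math.comb(n, k) * p_num**k * q_num ** (n - k)
        for k in range(threshold + 1, n + 1)
    )
    return total / p_den**n  # big-integer ratio, rounded once to a float

n, p = 1500, 2 / 3
mean, sd = n * p, math.sqrt(n * p * (1 - p))
std_normal = NormalDist()

exact = binomial_upper_tail(1500, 2, 3, 1050)
plain = 1 - std_normal.cdf((1050 - mean) / sd)        # no continuity correction
corrected = 1 - std_normal.cdf((1050.5 - mean) / sd)  # with continuity correction

print(f"exact {exact:.4f}, normal {plain:.4f}, corrected {corrected:.4f}")
```

Working in integers avoids the overflow and underflow that a direct floating-point evaluation of the binomial terms would hit at this size.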
Normal Approximation to Binomial
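If Y is binomial (n, p) and we want P(a ≤ Y ≤ b), use P(a − 1/2 ≤ Y ≤ b + 1/2):
$$\begin{aligned}
P(a - 1/2 \le Y \le b + 1/2) &= P\left(\frac{Y - np}{\sqrt{np(1-p)}} \le \frac{b + 1/2 - np}{\sqrt{np(1-p)}}\right) - P\left(\frac{Y - np}{\sqrt{np(1-p)}} \le \frac{a - 1/2 - np}{\sqrt{np(1-p)}}\right) \\
&\approx P\left(Z \le \frac{b + 1/2 - np}{\sqrt{np(1-p)}}\right) - P\left(Z \le \frac{a - 1/2 - np}{\sqrt{np(1-p)}}\right)
\end{aligned}$$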
Check to see if you got it
If Y is binomial (n, p) and a, b are integers, then if we want
• P(a ≤ Y ≤ b), use the approximation on P(a − 1/2 ≤ Y ≤ b + 1/2)
• P(a < Y ≤ b), use the approximation on P(a + 1/2 ≤ Y ≤ b + 1/2)
• P(a < Y < b), use the approximation on P(a + 1/2 ≤ Y ≤ b − 1/2)
• P(a ≤ Y < b), use the approximation on P(a − 1/2 ≤ Y ≤ b − 1/2)
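The four cases above differ only in which half-unit shift is applied at each endpoint, which can be captured in one small function. This is an illustrative sketch; the name `normal_approx_binomial` and the `include_a` / `include_b` flags are not from the lecture, and a and b are assumed to be integers.

```python
import math
from statistics import NormalDist

def normal_approx_binomial(n, p, a, b, include_a=True, include_b=True):
    """Normal approximation, with continuity correction, to a binomial interval.

    include_a=True means "a <= Y", False means "a < Y"; likewise for b.
    """
    lower = a - 0.5 if include_a else a + 0.5
    upper = b + 0.5 if include_b else b - 0.5
    if upper < lower:
        return 0.0  # the interval contains no integers
    dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return dist.cdf(upper) - dist.cdf(lower)

# P(1000 < Y <= 1050) and P(1001 <= Y <= 1050) describe the same event.
open_left = normal_approx_binomial(1500, 2 / 3, 1000, 1050, include_a=False)
closed_left = normal_approx_binomial(1500, 2 / 3, 1001, 1050)
print(open_left, closed_left)
```

Because a strict inequality at an integer endpoint is the same event as a non-strict one at the next integer, both calls produce the same corrected bounds.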
How good is the approximation?
Use the normal approximation to the binomial to determine the probability of getting 2 heads and 3 tails in 5 flips of a balanced coin.
What is the actual distribution?
x      0     1     2      3      4     5
f(x)   1/32  5/32  10/32  10/32  5/32  1/32
Normal approximation
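Mean is np = 5/2 = 2.5
s.d. = $\sqrt{np(1-p)} = \sqrt{1.25} = 1.118$
$$\begin{aligned}
P(2 \text{ heads}) &\approx P(\text{heads} \le 2.5) - P(\text{heads} \le 1.5) \\
&= P\left(\frac{x - 2.5}{1.118} \le \frac{2.5 - 2.5}{1.118}\right) - P\left(\frac{x - 2.5}{1.118} \le \frac{1.5 - 2.5}{1.118}\right) \\
&= P(Z \le 0) - P(Z \le -0.89) \\
&= .5 - (1 - .8133) = .3133
\end{aligned}$$
Compare to the exact value 10/32 = .3125.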
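The same comparison can be made for every outcome of the five flips, not just two heads. The sketch below tabulates the exact binomial probabilities next to the continuity-corrected normal approximation; it uses unrounded z values, so the entry for 2 heads comes out near 0.314 rather than the table-based .3133.

```python
import math
from statistics import NormalDist

n, p = 5, 0.5
dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

rows = []
for k in range(n + 1):
    exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)
    approx = dist.cdf(k + 0.5) - dist.cdf(k - 0.5)  # continuity correction
    rows.append((k, exact, approx))
    print(f"{k} heads: exact {exact:.4f}, normal approx {approx:.4f}")
```

Even with n as small as 5 the approximation is close near the center of the distribution, which is helped by the symmetry of the p = 1/2 case.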