Sample Mean and Central Limit Theorem Lecture 21-22 November 17-21

Sample Mean and Central Limit Theoremhomepages.rpi.edu/~bennek/class/probold/handouts/Lecture_21-08b… · Sample Size Problem Note that as sample size increases, the sample standard



Sample Mean and Central Limit Theorem

Lecture 21-22, November 17-21


Outline

• Sums of Independent Random Variables
• Chebyshev’s Inequality
• Estimating sample sizes
• Central Limit Theorem
• Normal approximation to the binomial


Sample Mean Statistics

Let X1,…,Xn be a random sample from a population (i.e., the Xi are independent and identically distributed).

The sample mean is defined as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i .$$

What can we say about the distribution of the sample mean?


Sample Mean for Normal

Let X1,X2,…,Xn be a random sample from a normal (μ,σ). What is the pdf of $\bar{X}$?

Hint: use the mgf method. The mgf of a normal is:

$$m_X(t) = e^{\mu t + \sigma^2 t^2/2}$$


Solution

The mgf of Y=∑Xi is

$$m_Y(t) = [m_X(t)]^n = \left[e^{\mu t + \sigma^2 t^2/2}\right]^n = e^{n\mu t + n\sigma^2 t^2/2}$$

But we want $\bar{X} = Y/n$:

$$m_{\bar{X}}(t) = m_Y(t/n) = e^{n\mu (t/n) + n\sigma^2 (t/n)^2/2} = e^{\mu t + (\sigma^2/n)\,t^2/2}$$

Thus the sample mean is normal with mean μ and variance $\sigma^2/n$.


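For any distribution with mean μ and standard deviation σ

What are the mean and variance of $\bar{X}$?

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_i X_i\right) = \frac{1}{n}\sum_i E(X_i) = \frac{1}{n}\,n\mu = \mu$$

$$\mathrm{var}(\bar{X}) = \mathrm{var}\left(\frac{1}{n}\sum_i X_i\right) = \frac{1}{n^2}\sum_i \mathrm{var}(X_i) = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n}$$

The variance step uses the independence of the Xi: the variance of a sum of independent random variables is the sum of their variances.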
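As a quick numerical check of these two formulas, here is a minimal simulation sketch. The Exponential(1) population (μ = 1, σ² = 1), the sample size n = 25, the number of trials, and the helper name `simulate_sample_means` are all assumptions chosen for illustration; they are not part of the lecture.

```python
import random
import statistics

def simulate_sample_means(n, trials, seed=0):
    """Draw `trials` independent samples of size n from Exponential(1)
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]

# Exponential(1) has mu = 1 and sigma^2 = 1 (an illustrative choice).
n = 25
means = simulate_sample_means(n, trials=20000)

# The average of the sample means should be close to mu = 1.
print("average of sample means: ", statistics.fmean(means))
# Their variance should be close to sigma^2 / n = 1 / 25 = 0.04.
print("variance of sample means:", statistics.pvariance(means))
```

Changing n should move the second number roughly in proportion to 1/n while the first stays near μ.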


What if we don’t know the distribution?

If we don’t want to make assumptions about what the distribution is, can we still do things?

• Chebyshev’s Inequality
• Central Limit Theorem


Amazing property true for any distribution

Chebyshev inequality: Consider any random variable X with mean μ and standard deviation σ. For any k>0,

$$P(|X-\mu| \ge k\sigma) \le \frac{1}{k^2}$$

[Figure: number line marking E(X)−kσ, E(X), and E(X)+kσ]


In words,

The probability that X deviates from its expected value by at least k standard deviations is at most $1/k^2$.


Example

Suppose that, on average, a post office handles 10000 letters a day with a variance of 2000. What can be said about the probability that this post office will handle between 8000 and 12000 letters tomorrow?


Using Chebyshev’s

X = number of letters the post office will handle tomorrow

$$\mu = E(X) = 10000, \qquad \sigma^2 = \mathrm{var}(X) = 2000. \qquad \text{Want } P(8000 < X < 12000).$$

$$\begin{aligned}
P(8000 < X < 12000) &= P(-2000 < X - 10000 < 2000) \\
&= P(|X - 10000| < 2000) \\
&= 1 - P(|X - 10000| \ge 2000) \ge 1 - .0005 = .9995
\end{aligned}$$

Because by Chebyshev’s

$$P(|X - 10000| \ge 2000) \le \frac{\sigma^2}{2000^2} = \frac{2000}{2000^2} = \frac{1}{2000} = .0005$$
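The arithmetic of the post office example can be reproduced with a few lines of Python. This is only a sketch: the helper name `chebyshev_tail_bound` is hypothetical, and it uses the σ²/t² form of Chebyshev's inequality with the deviation t = 2000 taken from the example.

```python
def chebyshev_tail_bound(variance, t):
    """Chebyshev upper bound on P(|X - mu| >= t): sigma^2 / t^2, capped at 1."""
    if t <= 0:
        raise ValueError("t must be positive")
    return min(1.0, variance / t**2)

variance = 2000   # var(X) from the example
t = 2000          # half-width of the interval (8000, 12000) around mu = 10000

tail = chebyshev_tail_bound(variance, t)
print(tail)       # upper bound on P(|X - 10000| >= 2000), i.e. 0.0005
print(1 - tail)   # lower bound on P(8000 < X < 12000)
```

The bound uses only the mean and variance, so it holds whatever the distribution of the daily letter count is.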


Alternative forms of Chebyshev

For any k>0,

$$P(|X-\mu| \ge k\sigma) \le \frac{1}{k^2}$$

For any t>0, setting t = kσ gives

$$P(|X-\mu| \ge t) \le \frac{\sigma^2}{t^2}$$

[Figure: number line marking E(X)−t, E(X), and E(X)+t]


Chebyshev’s applied to Sample Mean

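Recall if $X_1, \dots, X_n$ are independent and identically distributed with mean μ and standard deviation σ then

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_i X_i\right) = \frac{1}{n}\sum_i E(X_i) = \frac{1}{n}\,n\mu = \mu$$

$$\mathrm{var}(\bar{X}) = \mathrm{var}\left(\frac{1}{n}\sum_i X_i\right) = \frac{1}{n^2}\sum_i \mathrm{var}(X_i) = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n}$$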


Chebyshev’s applied to Sample Mean

Applying Chebyshev’s to the sample mean $\bar{X}$, for any ε>0

$$P(|\bar{X}-\mu| \ge \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2}$$


Example

A biologist wants to estimate the life span of a type of insect. He takes a sample of size n and measures the lifetime from birth to death of each insect. Then he averages these numbers. He believes the lifetimes of the insects are iid with variance 1.5 (in days squared). How large a sample should he choose to be at least 98% sure that his average is accurate within plus or minus 0.2 days (4.8 hours)?


Solution

Let Xi be the lifetime of the ith insect. We want to find n such that

$$P(-0.2 < \bar{X} - \mu < 0.2) \ge .98$$

Or equivalently

$$P(|\bar{X} - \mu| \ge 0.2) \le 1 - .98 = .02$$

By Chebyshev’s

$$P(|\bar{X} - \mu| \ge t) \le \frac{\sigma^2}{n t^2}$$

$$P(|\bar{X} - \mu| \ge 0.2) \le \frac{1.5}{n(.2)^2} = \frac{37.5}{n} \le .02$$

So we need

$$n \ge \lceil 37.5 / .02 \rceil = 1875.$$
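The same sample-size calculation can be wrapped in a small function. This is a sketch under stated assumptions: the name `chebyshev_sample_size` is hypothetical, and exact rational arithmetic via `fractions.Fraction` is a choice made here so that the ceiling is not disturbed by floating-point rounding of values like 0.2 and 0.98.

```python
import math
from fractions import Fraction

def chebyshev_sample_size(variance, eps, confidence):
    """Smallest n with variance / (n * eps**2) <= 1 - confidence.

    Inputs are converted through their decimal strings so that values
    such as 0.2 and 0.98 are treated as exact rationals.
    """
    variance, eps, confidence = (
        Fraction(str(v)) for v in (variance, eps, confidence)
    )
    return math.ceil(variance / ((1 - confidence) * eps**2))

# Insect example: variance 1.5, accuracy 0.2 days, 98% confidence.
print(chebyshev_sample_size(1.5, 0.2, 0.98))  # 1875
```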


Central Limit Theorem

Idea: Whatever the population distribution may be (provided it has a finite mean μ and variance σ²), if n is large then the distribution of the sample mean is approximately normal with mean μ and variance $\sigma^2/n$.

The larger the n, the better the approximation. A common rule of thumb is that the approximation is good for n>30. Try the applet

http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/


Central Limit Theorem

If $\bar{X}$ has mean μ and standard deviation $\sigma/\sqrt{n}$, let

$$U = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$$

For large n, the distribution of U is approximately Normal(0,1)


Example

A soft drink machine dispenses drinks in a cup. The amount dispensed is a R.V. with mean 200 ml and s.d 15 ml. What is the probability that the average amount dispensed in a random sample of size 36 is at least 204 ml?

By CLT, $\bar{X} = \frac{1}{36}\sum_i X_i$ is approximately normal (200, 15/6), that is, with mean 200 and standard deviation 15/6 = 2.5.


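Solution

$$\begin{aligned}
P(\bar{X} > 204) &= P\left(\frac{(\bar{X}-\mu)\sqrt{n}}{\sigma} > \frac{(204-\mu)\sqrt{n}}{\sigma}\right) \\
&\approx P\left(Z > \frac{(204-200)\,6}{15}\right) = P(Z > 1.6) = .0548
\end{aligned}$$

where Z is standard normal.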
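The table lookup in this example can be checked with the standard library's `statistics.NormalDist`. This is a minimal sketch; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

mu, sigma, n = 200.0, 15.0, 36

# Standard deviation of the sample mean: sigma / sqrt(n) = 15 / 6 = 2.5
se = sigma / math.sqrt(n)

# CLT: the sample mean is approximately Normal(mu, se)
sample_mean_dist = NormalDist(mu, se)

# P(sample mean > 204) under the normal approximation
p = 1 - sample_mean_dist.cdf(204)
print(round(p, 4))  # 0.0548
```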


Sample Size Problem

Note that as sample size increases, the standard deviation of the sample mean, $\sigma/\sqrt{n}$, gets smaller.

We can use the sample mean to estimate the true mean. By making the sample size bigger, we can make the estimate as accurate as we desire.


Sample Size Problem

Say we know the standard deviation is σ=3 for each item in a random sample. Say we want $P(|\bar{X}-\mu| < .5)$ to be close to .95.

How big should the sample size be?

You try it.


Solution

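Using CLT approximation,

$$P(|\bar{X}-\mu| < .5) = P\left(\frac{|\bar{X}-\mu|}{3/\sqrt{n}} < \frac{.5}{3/\sqrt{n}}\right) = .95$$

$$.95 = P(|Z| < 1.96) = P\left(|Z| < \frac{.5}{3/\sqrt{n}}\right)$$

Setting $.5\sqrt{n}/3 = 1.96$ gives $n = (1.96 \cdot 3/.5)^2 \approx 138.3$. Solve for n and round up: n = 139.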


Note

You can also estimate n using Chebyshev’s, but the bound is not as strong.

$$P(|\bar{X}-\mu| < .5) = 1 - P(|\bar{X}-\mu| \ge .5) = .95$$

$$P(|\bar{X}-\mu| \ge .5) \le \frac{\sigma^2}{n(.5)^2} = .05$$

So we need

$$n = \left\lceil \frac{3^2}{.05(.5)^2} \right\rceil = 720$$

which is much larger than the estimate based on CLT, n = 139.
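The CLT-based sample size can also be computed without a table. The sketch below assumes a two-sided requirement P(|X̄ − μ| < ε) ≈ confidence; the function name `clt_sample_size` is hypothetical, and `NormalDist().inv_cdf` supplies the critical value that the slides read off as 1.96.

```python
import math
from statistics import NormalDist

def clt_sample_size(sigma, eps, confidence):
    """Smallest n with P(|sample mean - mu| < eps) >= confidence
    under the CLT normal approximation."""
    # Two-sided critical value: about 1.96 when confidence = 0.95
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z * sigma / eps) ** 2)

# sigma = 3, accuracy 0.5, confidence 0.95
print(clt_sample_size(3, 0.5, 0.95))  # 139
```

The Chebyshev requirement of 720 for the same problem is roughly five times larger, which is the price of making no distributional assumption at all.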


Problem

We admit 1500 students to RPI and we know historically 2/3 actually attend on average. We can assume the decision of each student to attend is independent. What is the probability that more than 1050 students attend?


Thoughts

What is the distribution of the number of students that attend?

Y ~ Binomial(n=1500, p=2/3)

How can we compute P(Y>1050)?

We learned that we can approximate a binomial using a Poisson with λ=np if n is very large and p is very small so that λ is small (≤10). But this is not the case here.


Normal approximation to Binomial

The decision of each student to attend can be modeled as a Bernoulli random variable

$$X_i = \begin{cases} 1 & \text{with probability } 2/3 \\ 0 & \text{with probability } 1/3 \end{cases}$$

with mean p=2/3 and s.d. $\sqrt{p(1-p)} = \sqrt{2/9}$.

The proportion of students attending, $\bar{X} = \frac{1}{n}\sum_i X_i$, can be approximated as a normal with mean 2/3 and variance

$$\frac{\sigma^2}{n} = \frac{2}{9(1500)}$$

by CLT.


continued

For Y = the number of students attending, Y is really binomial with mean np=1000 and variance np(1-p)=1500*2/3*1/3.

You can approximate it by CLT as normal with the same mean and variance, since

$$Y = n\bar{X} \text{ is approximately normal}, \qquad E(Y) = np, \qquad \mathrm{var}(Y) = np(1-p)$$

or equivalently

$$\frac{Y-np}{\sqrt{np(1-p)}} = \frac{Y-1000}{\sqrt{1500(2/3)(1/3)}} \approx N(0,1)$$

$$P\left(\frac{Y-1000}{\sqrt{1500(2/3)(1/3)}} > \frac{1050-1000}{\sqrt{1500(2/3)(1/3)}}\right) \approx P(Z > 2.74) = 0.0031$$


Note normal is continuous and binomial is discrete, so we can improve the approximation a bit.

Under the normal approximation P(Y=1050) = 0, but for the binomial P(Y=1050) > 0. Approximate it by applying CLT to P(1049.5 ≤ Y ≤ 1050.5).

So a better approximation for P(Y>1050) is

$$P\left(\frac{Y-1000}{\sqrt{1500(2/3)(1/3)}} > \frac{1050.5-1000}{\sqrt{1500(2/3)(1/3)}}\right) \approx P(Z > 2.766) = 0.0028$$
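To see how much the half-unit correction matters, the sketch below compares the exact binomial tail P(Y > 1050) with the normal approximation, with and without the correction. The helper `binomial_upper_tail` is hypothetical; it takes p as a ratio of integers (2/3 here) so the exact tail can be summed in integer arithmetic.

```python
import math
from statistics import NormalDist

def binomial_upper_tail(n, p_num, p_den, k):
    """Exact P(Y > k) for Y ~ Binomial(n, p_num / p_den), summed with integers."""
    q_num = p_den - p_num
    numerator = sum(
        math.comb(n, j) * p_num**j * q_num**(n - j)
        for j in range(k + 1, n + 1)
    )
    return numerator / p_den**n

n, k = 1500, 1050
p = 2 / 3
mean = n * p                       # np = 1000
sd = math.sqrt(n * p * (1 - p))    # sqrt(np(1-p)), about 18.26
std_normal = NormalDist()

exact = binomial_upper_tail(n, 2, 3, k)
plain = 1 - std_normal.cdf((k - mean) / sd)            # no correction
corrected = 1 - std_normal.cdf((k + 0.5 - mean) / sd)  # uses 1050.5

print("exact binomial tail:      ", exact)
print("normal, no correction:    ", plain)
print("normal, with correction:  ", corrected)
```

All three values are small tail probabilities of similar size; the comparison shows directly which approximation lands closer to the exact answer.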



Normal Approximation to Binomial

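If Y is binomial with parameters n and p, then if we want P(a≤Y≤b), use P(a−1/2 ≤ Y ≤ b+1/2):

$$\begin{aligned}
P(a - 1/2 \le Y \le b + 1/2) &= P\left(\frac{Y-np}{\sqrt{np(1-p)}} \le \frac{b + 1/2 - np}{\sqrt{np(1-p)}}\right) - P\left(\frac{Y-np}{\sqrt{np(1-p)}} \le \frac{a - 1/2 - np}{\sqrt{np(1-p)}}\right) \\
&\approx P\left(Z \le \frac{b + 1/2 - np}{\sqrt{np(1-p)}}\right) - P\left(Z \le \frac{a - 1/2 - np}{\sqrt{np(1-p)}}\right)
\end{aligned}$$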


Check to see if you got it

If Y is binomial (n, p), then if we want

• P(a≤Y≤b), use approximation on P(a−1/2 ≤ Y ≤ b+1/2)
• P(a<Y≤b), use approximation on P(a+1/2 ≤ Y ≤ b+1/2)
• P(a<Y<b), use approximation on P(a+1/2 ≤ Y ≤ b−1/2)
• P(a≤Y<b), use approximation on P(a−1/2 ≤ Y ≤ b−1/2)


How good is the approximation?

Use the normal approximation to the binomial to determine the probability of getting 2 heads and 3 tails in 5 flips of a balanced coin.

What is the actual distribution?

x       0      1      2       3       4      5
f(x)    1/32   5/32   10/32   10/32   5/32   1/32


Normal approximation

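Mean is np = 5/2 = 2.5, s.d. = $\sqrt{np(1-p)} = \sqrt{1.25} \approx 1.118$

$$\begin{aligned}
P(2 \text{ heads}) &\approx P(\text{heads} \le 2.5) - P(\text{heads} \le 1.5) \\
&= P\left(\frac{x-2.5}{1.118} \le \frac{2.5-2.5}{1.118}\right) - P\left(\frac{x-2.5}{1.118} \le \frac{1.5-2.5}{1.118}\right) \\
&= P(Z \le 0) - P(Z \le -0.89) \\
&= .5 - (1 - .8133) = .3133
\end{aligned}$$

Compare to the exact value 10/32 = .3125.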
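The coin-flip comparison is small enough to verify directly. The following sketch computes the exact binomial probability of 2 heads in 5 flips and the continuity-corrected normal approximation over the interval [1.5, 2.5]; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 5, 0.5
mean = n * p                      # 2.5
sd = math.sqrt(n * p * (1 - p))   # sqrt(1.25), about 1.118

# Exact binomial probability of exactly 2 heads: C(5,2) / 2^5 = 10/32
exact = math.comb(n, 2) * p**2 * (1 - p) ** (n - 2)

# Normal approximation with continuity correction: P(1.5 <= Y <= 2.5)
approx_dist = NormalDist(mean, sd)
approx = approx_dist.cdf(2.5) - approx_dist.cdf(1.5)

print("exact: ", exact)    # 0.3125
print("approx:", approx)
```

Carrying the unrounded z value through gives an approximation of about 0.314; the slide's .3133 appears to come from rounding z to two decimals before the table lookup. Either way the approximation is within a few thousandths of the exact value, even for n as small as 5.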