The Central Limit Theorem

Patrick Breheny

October 1

Biostatistical Methods I (BIOS 5710)

Outline: Introduction, The three trends, The central limit theorem, Summary


Kerrich’s experiment

A South African mathematician named John Kerrich was visiting Copenhagen in 1940 when Germany invaded Denmark

Kerrich spent the next five years in an internment camp

To pass the time, he carried out a series of experiments in probability theory

One of them involved flipping a coin 10,000 times


The law of averages?

We know that a coin lands heads with probability 50%

So, loosely speaking, if we flip the coin a lot, we should have about the same number of heads and tails

The subject of today’s lecture, though, is to be much more specific about exactly what happens and precisely what probability theory tells us about what will happen in those 10,000 coin flips


Kerrich’s results

Number of     Number of    Heads -
tosses (n)    heads        0.5·Tosses

        10            4         -1
       100           44         -6
       500          255          5
     1,000          502          2
     2,000        1,013         13
     3,000        1,510         10
     4,000        2,029         29
     5,000        2,533         33
     6,000        3,009          9
     7,000        3,516         16
     8,000        4,034         34
     9,000        4,538         38
    10,000        5,067         67
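The third column of the table is simple arithmetic on the first two, and the same counts can also be expressed as a percentage of heads. The following is a minimal sketch that recomputes both from the reported counts; the names `kerrich` and `summarize` are illustrative and not part of the original slides.

```python
# Kerrich's reported counts, transcribed from the table: (tosses, heads).
kerrich = [
    (10, 4), (100, 44), (500, 255), (1_000, 502), (2_000, 1_013),
    (3_000, 1_510), (4_000, 2_029), (5_000, 2_533), (6_000, 3_009),
    (7_000, 3_516), (8_000, 4_034), (9_000, 4_538), (10_000, 5_067),
]

def summarize(rows):
    """Return (n, heads - n/2, percentage of heads) for each row."""
    return [(n, heads - n / 2, 100 * heads / n) for n, heads in rows]

for n, diff, pct in summarize(kerrich):
    print(f"{n:>6}  heads - n/2 = {diff:+4.0f}  heads = {pct:5.2f}%")
```

Printing both columns side by side makes it easy to compare how the raw difference and the percentage behave as n grows.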


Kerrich’s results plotted

[Figure: number of heads minus half the number of tosses (y-axis, −100 to 100) against number of tosses (x-axis, 10 to 10,000)]

Instead of getting closer, the numbers of heads and tails are getting farther apart


Repeating the experiment 50 times

[Figure: number of heads minus half the number of tosses (y-axis, −100 to 100) against number of tosses (x-axis, 10 to 10,000), for 50 repeated experiments]

This is not a fluke; instead, it occurs systematically and consistently in repeated simulated experiments


Where’s the law of averages?

As the figure indicates, simplistic notions like “we should have about the same number of heads and tails” are inadequate to describe what happens with long-run probabilities

We must be more precise about what is happening: in particular, what is getting more predictable as the number of tosses goes up, and what is getting less predictable?

Consider instead looking at the percentage of flips that are heads
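The contrast between the count of heads and the percentage of heads can be explored with a small simulation. This is only a sketch under stated assumptions: the seed, the checkpoints, and the function name `simulate_flips` are arbitrary choices for illustration, and the code is not the one used to produce the figures in these slides.

```python
import random
import statistics

def simulate_flips(checkpoints, rng):
    """Flip a fair coin max(checkpoints) times. At each checkpoint n, record
    (n, heads - n/2, percentage of heads). Checkpoints must be ascending."""
    wanted = set(checkpoints)
    heads = 0
    results = []
    for n in range(1, max(checkpoints) + 1):
        heads += rng.random() < 0.5  # True counts as 1
        if n in wanted:
            results.append((n, heads - n / 2, 100 * heads / n))
    return results

rng = random.Random(5710)  # arbitrary seed, used only for reproducibility
checkpoints = [10, 100, 1_000, 10_000]
runs = [simulate_flips(checkpoints, rng) for _ in range(50)]

for i, n in enumerate(checkpoints):
    sd_diff = statistics.stdev([run[i][1] for run in runs])
    sd_pct = statistics.stdev([run[i][2] for run in runs])
    print(f"n = {n:>6}: SD of (heads - n/2) = {sd_diff:6.2f}, "
          f"SD of % heads = {sd_pct:5.2f}")
```

Across the 50 repetitions, the spread of the difference should grow with n while the spread of the percentage should shrink.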


Repeating the experiment 50 times, Part II

[Figure: percentage of heads (y-axis, 30 to 70) against number of tosses (x-axis, 10 to 10,000), for 50 repeated experiments]


What happens as n gets bigger?

We now turn our attention to obtaining a precise mathematical description of what is happening to the mean (i.e., the proportion of heads) with respect to three trends:

  Its expected value
  Its variance
  Its distribution

First, however, we need to define joint distributions and prove a few theorems about the expectation and variance of sums


Joint distributions

We can extend the notion of a distribution to include the consideration of multiple variables simultaneously

Suppose we have two discrete random variables X and Y. Then the joint probability mass function is a function f : R² → R defined by

f(x, y) = P(X = x, Y = y)

Likewise, suppose we have two continuous random variables X and Y. Then the joint probability density function is a function f : R² → R satisfying

$$P((X, Y) \in A) = \iint_A f(x, y) \, dx \, dy$$

for all rectangles A ⊂ R²


Marginal distributions

Suppose we have two discrete random variables X and Y. Then the pmf

$$f(x) = \sum_y f(x, y)$$

is called the marginal pmf of X

Likewise, if X and Y are continuous, we integrate out one variable to obtain the marginal pdf of the other:

$$f(x) = \int f(x, y) \, dy$$
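To make the discrete case concrete, here is a minimal sketch that computes marginal pmfs by summing a joint pmf over the other variable. The joint pmf is a hypothetical example that does not appear in the slides: X is the result of the first of two fair coin flips (1 = heads) and Y is the total number of heads.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf: X = first of two fair coin flips (1 = heads),
# Y = total number of heads in both flips.
joint = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4),
}

def marginal(joint_pmf, axis):
    """Sum the joint pmf over the other variable (axis 0 keeps X, 1 keeps Y)."""
    out = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for pair, p in joint_pmf.items():
        out[pair[axis]] += p
    return dict(out)

print(marginal(joint, 0))  # marginal pmf of X
print(marginal(joint, 1))  # marginal pmf of Y
```

Exact fractions are used so that the marginal probabilities sum to exactly 1, with no floating-point rounding.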


Expectation of a sum

Theorem: Let X and Y be random variables. Then

E(X + Y) = E(X) + E(Y)

provided that E(X) and E(Y) exist

Note in particular that X and Y do not have to be independent for this to work
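A sketch of why this holds in the discrete case, using the joint and marginal pmfs defined earlier. The subscripts on the marginals, f_X and f_Y, are added here only to tell them apart; the slides write both as f. No step uses independence.

```latex
\begin{aligned}
E(X + Y) &= \sum_x \sum_y (x + y) \, f(x, y) \\
         &= \sum_x x \sum_y f(x, y) + \sum_y y \sum_x f(x, y) \\
         &= \sum_x x \, f_X(x) + \sum_y y \, f_Y(y) \\
         &= E(X) + E(Y)
\end{aligned}
```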


Variance of a sum

Theorem: Let X and Y be independent random variables. Then

Var(X + Y) = Var(X) + Var(Y)

provided that Var(X) and Var(Y) exist

Note that X and Y must be independent for this to work
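The difference between the two theorems can be checked exactly on a small example. The sketch below uses two hypothetical joint pmfs for fair 0/1 coin flips, neither of which appears in the slides: in one, X and Y are independent; in the other, Y is a copy of X. The helper names `mean` and `var` are illustrative.

```python
from fractions import Fraction
from itertools import product

def mean(pmf, g):
    """E[g(X, Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def var(pmf, g):
    """Var[g(X, Y)], computed as E[(g - E[g])^2]."""
    m = mean(pmf, g)
    return mean(pmf, lambda x, y: (g(x, y) - m) ** 2)

half = Fraction(1, 2)
# Two hypothetical joint pmfs for fair 0/1 coin flips X and Y.
independent = {(x, y): half * half for x, y in product((0, 1), repeat=2)}
dependent = {(0, 0): half, (1, 1): half}  # Y is a copy of X

for name, pmf in [("independent", independent), ("dependent", dependent)]:
    e_sum = mean(pmf, lambda x, y: x + y)
    v_sum = var(pmf, lambda x, y: x + y)
    v_parts = var(pmf, lambda x, y: x) + var(pmf, lambda x, y: y)
    print(f"{name}: E(X+Y) = {e_sum}, Var(X+Y) = {v_sum}, "
          f"Var(X)+Var(Y) = {v_parts}")
```

In both cases E(X + Y) = 1, but the variances add only in the independent case: there Var(X + Y) = 1/2, whereas for the dependent pair Var(X + Y) = 1 while Var(X) + Var(Y) = 1/2.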


The expected value of the mean

Theorem: Suppose X1, X2, ..., Xn are random variables with the same expected value µ, and let X̄ denote the mean of all n random variables. Then

E(X̄) = µ

In other words, for any value of n, the expected value of the sample mean is the expected value of the underlying distribution

When an estimator θ̂ has the property that E(θ̂) = θ, the estimator is said to be unbiased

Thus, the above theorem shows that the sample mean is an unbiased estimator of the population mean
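A short derivation of this result, sketched from the expectation-of-a-sum theorem together with the standard fact that E(cX) = cE(X) for a constant c, which is assumed here rather than restated in these slides:

```latex
E(\bar{X})
  = E\!\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right)
  = \frac{1}{n} \sum_{i=1}^{n} E(X_i)
  = \frac{1}{n} \cdot n\mu
  = \mu
```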


Applying this result to the coin flips example

Theorem: For the binomial distribution, E(X) = nπ

Thus, letting π̂ = X/n, E(π̂) = π, which is exactly what we saw in the earlier picture:

[Figure: percentage of heads (y-axis, 30 to 70) against number of tosses (x-axis, 10 to 10,000)]


The variance of the mean

Our previous result showed that the sample mean is always “close” to the expected value, at least in the sense of being centered around it

Of course, how close it is also depends on the variance, which is what we now consider

Theorem: Suppose X1, X2, ..., Xn are independent random variables with expected value µ and variance σ². Letting X̄ denote the mean of all n random variables,

$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$$

Corollary: SD(X̄) = σ/√n
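A sketch of the argument, using the variance-of-a-sum theorem (which is where independence is needed) and the standard fact that Var(cX) = c²Var(X) for a constant c, which is assumed here rather than restated in these slides:

```latex
\mathrm{Var}(\bar{X})
  = \mathrm{Var}\!\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right)
  = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i)
  = \frac{1}{n^2} \cdot n\sigma^2
  = \frac{\sigma^2}{n}
```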


The square root law

To distinguish between the standard deviation of the data and the standard deviation of an estimator (e.g., the mean), estimator standard deviations are typically referred to as standard errors

As the previous slide makes clear, these are not the same, and are related to each other in a very important way, sometimes called the square root law:

$$\mathrm{SE} = \frac{\mathrm{SD}}{\sqrt{n}}$$

This is true for all averages, although, as we will see later in the course, it must be modified for other types of estimators
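As a worked instance of the square root law, the sketch below applies it to fair coin flips. It relies on the standard fact that a single 0/1 flip with success probability π has SD √(π(1 − π)), which is 0.5 for a fair coin; the function names `se_of_mean` and `sd_of_sum` are illustrative.

```python
import math

def se_of_mean(sd, n):
    """Square root law: standard error of the mean of n independent draws."""
    return sd / math.sqrt(n)

def sd_of_sum(sd, n):
    """Standard deviation of the sum of n independent draws."""
    return sd * math.sqrt(n)

sd_flip = math.sqrt(0.5 * (1 - 0.5))  # SD of one fair 0/1 coin flip = 0.5

for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>6}: SD of number of heads = {sd_of_sum(sd_flip, n):6.2f}, "
          f"SE of proportion = {se_of_mean(sd_flip, n):.4f}")
```

At n = 10,000 the number of heads has a standard deviation of 50, so Kerrich's excess of 67 heads is about 1.3 standard deviations from the expected count, while the standard error of the proportion is only 0.005, or half a percentage point.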


Standard errors

Note that Var(∑X) goes up with n, while Var(X̄) goes down with n, exactly as we saw in our picture from earlier:

[Figure: number of heads minus half the number of tosses (y-axis, −100 to 100) against number of tosses (x-axis, 10 to 10,000)]

[Figure: percentage of heads (y-axis, 30 to 70) against number of tosses (x-axis, 10 to 10,000)]

Indeed, Var(X̄) actually goes down all the way to zero as n→∞


The distribution of the mean

Finally, let’s look at the distribution of the mean by creating histograms of the mean from our 50 simulations

[Figure: four histograms of the proportion of heads (frequency on the y-axis), for n=10, n=20, n=40, and n=80]


The central limit theorem (informal)

In summary, there are three very important phenomena going on here concerning the sample average:

#1 The expected value is always equal to the population average
#2 The standard error is always equal to the population standard deviation divided by the square root of n
#3 As n gets larger, its distribution looks more and more like the normal distribution

Furthermore, these three properties of the sample average hold for any distribution (provided its mean and variance are finite)


The central limit theorem (formal)

Central limit theorem: Suppose X1, X2, ..., Xn are independent, identically distributed random variables with expected value µ and variance σ². Letting X̄ denote the mean of all n random variables,

$$\sqrt{n} \, \frac{\bar{X} - \mu}{\sigma} \xrightarrow{d} N(0, 1)$$

The notation $\xrightarrow{d}$ is read “converges in distribution to”, and means that the limit as n → ∞ of the CDF of the quantity on the left is equal to the CDF on the right (at all points x where the CDF on the right is continuous)
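Convergence in distribution can be checked numerically for the coin-flip case. The sketch below computes the largest gap between the exact CDF of the standardized binomial proportion and the standard normal CDF. The function names are illustrative, and the approach (scanning both sides of every jump of the binomial CDF) is one reasonable choice rather than the method used for the slides.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def max_cdf_gap(n, p=0.5):
    """Largest |F_n(x) - Phi(x)|, where F_n is the CDF of the standardized
    Binomial(n, p) proportion. The maximum is attained at a jump point, so
    both sides of each jump are checked. Intended for n up to about 1000,
    since larger binomial coefficients overflow a float."""
    sd = math.sqrt(n * p * (1 - p))
    gap, cdf = 0.0, 0.0
    for k in range(n + 1):
        z = (k - n * p) / sd  # same as sqrt(n) * (k/n - p) / sqrt(p(1-p))
        phi = normal_cdf(z)
        gap = max(gap, abs(cdf - phi))  # just to the left of the jump
        cdf += math.comb(n, k) * p**k * (1 - p) ** (n - k)
        gap = max(gap, abs(cdf - phi))  # at the jump itself
    return gap

for n in (4, 16, 64, 256, 1024):
    print(f"n = {n:>4}: max CDF gap = {max_cdf_gap(n):.4f}")
```

The gap shrinks as n grows, which is what the formal statement asserts in the limit.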


Graphical idea of convergence in distribution

For the binomial distribution (red = normal, blue = √n(X̄ − µ)/σ):

[Figure: CDF F(x) (y-axis, 0 to 1) against x (x-axis, −3 to 3), for n=1024]


Power of the central limit theorem

This result is one of the most important, remarkable, and powerful results in all of statistics

In the real world, we rarely know the distribution of our data

But the central limit theorem says: we don’t have to


Furthermore, as we have seen, knowing the mean and standard deviation of a distribution that is approximately normal allows us to calculate anything we wish to know with tremendous accuracy, and the distribution of the mean is always approximately normal

The caveat, however, is that for any finite sample size, the CLT only holds approximately

How good is this approximation? It depends...


How large does n have to be?

Rules of thumb are frequently recommended that n = 20 or n = 30 is “large enough” to be sure that the central limit theorem is working

There is some truth to such rules, but in reality, whether n is large enough for the central limit theorem to provide an accurate approximation to the true distribution depends on how close to normal the population distribution is, and thus must be checked on a case-by-case basis

If the original distribution is close to normal, n = 2 might be enough

If the underlying distribution is highly skewed or strange in some other way, n = 50 might not be enough


Example #1

[Figure: two panels, a population density plotted over x from −6 to 6, and the density of sample means for n=10 plotted from −3 to 3]


Example #2

Now imagine an urn containing the numbers 1, 2, and 9:

[Figure: two panels showing the density of the sample mean, for n=20 (x-axis, 1 to 8) and n=50 (x-axis, 2 to 6)]
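One rough way to quantify how far the sample mean is from symmetric in the urn example is its skewness, which is 0 for a normal distribution. The sketch below computes it exactly, assuming draws are made with replacement and each of the three numbers is equally likely; that sampling scheme and the function names are assumptions, since the slides do not spell them out.

```python
from collections import Counter

URN = (1, 2, 9)  # values in the urn, each assumed to have probability 1/3

def sum_pmf(n, values=URN):
    """Exact pmf of the sum of n draws with replacement, built up by
    convolving one draw at a time."""
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = Counter()
        for total, p in pmf.items():
            for v in values:
                nxt[total + v] += p / len(values)
        pmf = dict(nxt)
    return pmf

def skewness_of_mean(n):
    """Skewness (third central moment over SD cubed) of the mean of n draws."""
    pmf = sum_pmf(n)
    mean = sum(s / n * p for s, p in pmf.items())
    var = sum((s / n - mean) ** 2 * p for s, p in pmf.items())
    third = sum((s / n - mean) ** 3 * p for s, p in pmf.items())
    return third / var ** 1.5

for n in (1, 20, 50):
    print(f"n = {n:>2}: skewness of the sample mean = {skewness_of_mean(n):.3f}")
```

The skewness of the mean shrinks in proportion to 1/√n, so it is smaller at n = 50 than at n = 20 but still not zero, consistent with the remark that the adequacy of the normal approximation depends on how skewed the underlying distribution is.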


Example #3

Weight tends to be skewed to the right (more people are overweight than underweight)

Let’s perform an experiment in which the NHANES sample of adult men is the population

I am going to randomly draw twenty-person samples from this population (i.e., I am re-sampling the original sample)


Example #3 (cont’d)

[Figure: density of the sample mean for n=20 (x-axis, 160 to 260)]


Why do so many things follow normal distributions?

We can see now why the normal distribution comes up so often in the real world: any time a phenomenon has many contributing factors, and what we see is the average effect of all those factors, the quantity will approximately follow a normal distribution

For example, there is no one cause of height: thousands of genetic and environmental factors make small contributions to a person’s adult height, and as a result, height is approximately normally distributed

On the other hand, things like eye color, cystic fibrosis, broken bones, and polio have a small number of (or a single) contributing factors, and do not follow a normal distribution


Summary

E(X + Y) = E(X) + E(Y)

Var(X + Y) = Var(X) + Var(Y) if X and Y are independent

Central limit theorem:

  The expected value of the average is always equal to the population average
  SE = SD/√n
  As n gets larger, the distribution of the sample average looks more and more like the normal distribution

Generally speaking, the sampling distribution looks pretty normal by about n = 20, but this could happen faster or slower depending on the underlying distribution, in particular on how skewed it is
