
Source: jwatkins/505d/i-clt.pdf

Topic 9: The Central Limit Theorem and the Normal Distribution∗

June 20, 2011

1 Introduction

In the dice examples, we saw the running averages converge to the distributional mean, which we denoted by µ. In addition, we also learned that the standard deviation of an average of n dice rolls is inversely proportional to √n, the square root of the number of observations.

Does the size of these fluctuations have any regular and predictable structure? This question, and the investigation that led to its answer, the central limit theorem, constitute one of the most important episodes in mathematics.


Figure 1: a. Probability of x successes in 100 Bernoulli trials with p = 0.2, 0.4, 0.6 and 0.8. b. Probability of x successes in Bernoulli trials with p = 1/2 and n = 20, 40 and 80.

∗ © 2011 Joseph C. Watkins


Data Analysis and Probability The Central Limit Theorem

2 The Classical Central Limit Theorem

Let's begin by examining the distribution for the sum of X1, X2, . . . , Xn, independent and identically distributed random variables

Sn = X1 +X2 + · · ·+Xn,

what distribution do we see? Let's look first at the simplest case, Xi Bernoulli random variables. In this case, the sum Sn is a binomial random variable. We look at two cases. In the first, we keep the number of trials the same at n = 100 and vary the success probability p. In the second, we keep the success probability the same at p = 1/2, but vary the number of trials.

The curves in Figure 1 look like bell curves. Their center and spread vary in predictable ways: Sn has mean np and standard deviation √(np(1 − p)).
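As a quick numerical companion (our addition, not part of the original notes), the binomial mean and standard deviation can be checked with a short Python sketch; the helper name binomial_pmf is our own.

```python
from math import comb, sqrt

def binomial_pmf(n, p, k):
    """P{S_n = k}: probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 100, 0.2
mean = n * p                 # np = 20.0
sd = sqrt(n * p * (1 - p))   # sqrt(np(1-p)) = 4.0
# The mass function peaks at the mean, matching the bell shapes in Figure 1:
peak = max(range(n + 1), key=lambda k: binomial_pmf(n, p, k))
print(mean, sd, peak)  # 20.0 4.0 20
```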


Figure 2: Displaying the central limit theorem graphically. The scaled mass functions of the standardized version of the sum of n independent geometric random variables for n = 1 (black), 2 (green), 4 (red), 8 (blue), 16 (orange), and 32 (violet). Note how the skewness of the geometric distribution slowly gives way to the bell curve shape of the normal distribution.

Thus, if we take the standardized version of these sums of Bernoulli random variables,

Zn = (Sn − np)/√(np(1 − p)),

then these bell curve graphs would lie on top of each other.

For our next example, we look at the density of the sum of standardized geometric random variables with p = 1/2. The geometric mass function is strongly skewed, so we have to wait for larger values of n before we see the bell curve emerge. In order to make comparisons, we examine standardized versions of the random variables with mean µ and variance σ2.

To accomplish this,

• we can either standardize using the sum Sn having mean nµ and standard deviation σ√n, or

• we can standardize using the sample mean X̄n having mean µ and standard deviation σ/√n.

This yields two equivalent versions of the z-score.

Zn = (Sn − nµ)/(σ√n) = (X̄n − µ)/(σ/√n). (1)

In Figure 2, we see the densities approaching that of the bell curve for a standard normal random variable. Even for the case n = 32, we see a small amount of skewness that is a remnant of the skewness in the geometric density.
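The convergence pictured in Figure 2 can also be explored by simulation. The Python sketch below is our addition (the helper geometric is our own name): it standardizes sums of 32 geometric random variables as in equation (1) and compares the empirical fraction of z-scores below 1 with Φ(1).

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)

def geometric(p):
    """Number of Bernoulli(p) trials up to and including the first success."""
    t = 1
    while random.random() >= p:
        t += 1
    return t

p, n, reps = 0.5, 32, 10000
mu, sigma = 1 / p, sqrt((1 - p) / p**2)  # mean 2, standard deviation sqrt(2)

# z-scores of the sums S_n, standardized as in equation (1)
z = [(sum(geometric(p) for _ in range(n)) - n * mu) / (sigma * sqrt(n))
     for _ in range(reps)]

# The fraction of z-scores below 1 should be close to Phi(1), about 0.84
frac = sum(1 for v in z if v < 1) / reps
print(round(frac, 2), round(NormalDist().cdf(1), 2))
```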

The theoretical result behind these numerical explorations is called the classical central limit theorem:

Let {Xi; i ≥ 1} be independent random variables having a common distribution. Let µ be their mean and σ2 be their variance. Then the distribution function of the random variable Zn, the standardized score defined by equation (1), converges to the distribution function of Z, a standard normal random variable. More precisely, the distribution function FZn converges to Φ, the distribution function of the standard normal.

In practical terms, the central limit theorem states that P{a < Zn ≤ b} ≈ P{a < Z ≤ b} = Φ(b) − Φ(a).

This theorem is an enormously useful tool in providing good estimates for probabilities of events depending on either Sn or X̄n. We shall begin to show this in the following examples.



Figure 3: Displaying the central limit theorem graphically. Here we compare the distribution function for the z-score of the sum of 32 geometric random variables (violet) and the distribution function for a standard normal random variable.

3 The Normal Distribution


Figure 4: The density function (top) and distribution function (bottom) for a standard normal random variable.

The central limit theorem shows the broad applicability of the normal distribution. To the left we see a figure that has (top) the density of the standard normal random variable and, below it, its distribution function.

For the density curve, probabilities are given by areas between vertical lines. For example, the probability that a standard normal random variable is within one standard deviation of its mean,

P{−1 < Z < 1}, (2)

is the area under the bell curve and above the horizontal axis between the vertical lines at −1 and 1.

These probabilities can also be determined from the distribution function. Looking at z = 1, we see that the probability that a standard normal is below this value is ∼ 0.84. Similarly, the probability that a standard normal is below z = −1 is ∼ 0.16. Thus, the probability in (2) is ∼ 0.84 − 0.16 = 0.68.


Here is a short table of the standard normal distribution function Φ.

z      0.0     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
-3     0.0013  0.0010  0.0007  0.0005  0.0003  0.0002  0.0001  0.0001
-2     0.0228  0.0179  0.0139  0.0107  0.0082  0.0062  0.0047  0.0035  0.0026  0.0019
-1     0.1587  0.1357  0.1151  0.0968  0.0808  0.0668  0.0548  0.0446  0.0359  0.0288
-0     0.5000  0.4601  0.4207  0.3821  0.3446  0.3085  0.2743  0.2420  0.2119  0.1841
 0     0.5000  0.5398  0.5793  0.6179  0.6554  0.6915  0.7257  0.7580  0.7881  0.8159
 1     0.8413  0.8643  0.8849  0.9032  0.9192  0.9332  0.9452  0.9554  0.9641  0.9712
 2     0.9772  0.9821  0.9861  0.9893  0.9918  0.9938  0.9953  0.9965  0.9974  0.9981
 3     0.9987  0.9990  0.9993  0.9994  0.9997  0.9998  0.9999  0.9999
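Entries of such a table can be reproduced with the Python standard library class statistics.NormalDist (available since Python 3.8); this check is our addition, not part of the original notes.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # the standard normal distribution function

# Reproduce a few entries of the table: Phi(-2.0), Phi(1.0), Phi(1.5)
print(round(Phi(-2.0), 4))  # 0.0228
print(round(Phi(1.0), 4))   # 0.8413
print(round(Phi(1.5), 4))   # 0.9332
```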

4 Summary of Normal Approximations

The z-score of some random quantity based on n observations is

Zn = (random quantity − mean)/(standard deviation).

The central limit theorem tells us when the z-score Zn has an approximately standard normal distribution. Thus, using a normal distribution table, we can find good approximations for

• P{Z < z} by looking up the value for z in the table

• P{Z > z} by looking up the value for z in the table and subtracting it from 1 or

• P{z1 < Z < z2} by looking up the values for z2 and z1 in the table and subtracting the value for z1 from the value for z2.

4.1 Sample Sum


Figure 5: The density function for Sn for a random sample of size n = 10 (red), 20 (green), 30 (blue), and 40 (purple). In this example, the observations are normally distributed with mean µ = 1 and standard deviation σ = 10.

If we have a sum Sn of n independent random variables, X1, X2, . . . , Xn, whose common distribution has mean µ and variance σ2, then

• the mean ESn = nµ,

• the variance Var(Sn) = nσ2,

• the standard deviation is σ√n.

Thus, Sn is approximately normal with mean nµ and variance nσ2. The z-score in this case is

Zn = (Sn − nµ)/(σ√n).

We can approximate P{Sn < x} by noting that this is the same as computing

Zn = (Sn − nµ)/(σ√n) < (x − nµ)/(σ√n) = z

and finding P{Zn < z} using the standard normal distribution.
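This procedure can be sketched in Python (our addition; prob_sum_below is an illustrative name, and the sample numbers echo the parameters of Figure 5):

```python
from math import sqrt
from statistics import NormalDist

def prob_sum_below(x, n, mu, sigma):
    """Normal approximation to P{S_n < x} for the sum of n iid observations."""
    z = (x - n * mu) / (sigma * sqrt(n))
    return NormalDist().cdf(z)

# e.g. n = 40 observations with mean 1 and standard deviation 10, as in Figure 5:
print(round(prob_sum_below(100, 40, 1, 10), 2))  # about 0.83
```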


4.2 Sample Mean


Figure 6: The density function for X̄ − µ for a random sample of size n = 1 (black), 10 (red), 20 (green), 30 (blue), and 40 (purple). In this example, the observations are normally distributed with standard deviation σ = 10.

For a sample mean X̄ = (X1 +X2 + · · ·+Xn)/n,

• the mean EX̄ = µ,

• the variance Var(X̄) = σ2/n,

• the standard deviation is σ/√n.

Thus, X̄ is approximately normal with mean µ and variance σ2/n. The z-score in this case is

Zn = (X̄ − µ)/(σ/√n).

Thus,

X̄ < x is equivalent to Zn = (X̄ − µ)/(σ/√n) < (x − µ)/(σ/√n).

4.3 Sample Proportion

For Bernoulli trials X1, X2, . . . , Xn with success probability p, let p̂ = (X1 + X2 + · · · + Xn)/n be the sample proportion. Then

• the mean Ep̂ = p,

• the variance Var(p̂) = p(1− p)/n,

• the standard deviation is √(p(1 − p)/n).

Thus, p̂ is approximately normal with mean p and variance p(1− p)/n. The z-score in this case is

Zn = (p̂ − p)/√(p(1 − p)/n).
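A sketch of this approximation in Python (our addition; the function name and sample numbers are illustrative, not from the notes):

```python
from math import sqrt
from statistics import NormalDist

def prob_phat_below(x, n, p):
    """Normal approximation to P{p-hat <= x} for n Bernoulli(p) trials."""
    z = (x - p) / sqrt(p * (1 - p) / n)
    return NormalDist().cdf(z)

# Illustrative numbers (our own): n = 50 trials with success probability p = 0.3
print(round(prob_phat_below(0.25, 50, 0.3), 2))  # about 0.22
```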

5 Examples

Example 1. For Bernoulli random variables, µ = p and σ = √(p(1 − p)). Sn is the number of successes in n Bernoulli trials. In this situation, the sample mean is the fraction of trials that result in a success. This is generally denoted by p̂ to indicate the fact that it is a sample proportion.

The normalized versions of Sn and p̂ are equal to

Zn = (Sn − np)/√(np(1 − p)) = (p̂ − p)/√(p(1 − p)/n).

For example, in 100 tosses of a fair coin, µ = 1/2 and σ = √(1/2(1 − 1/2)) = 1/2. Thus,

Z100 = (S100 − 50)/5.

So,

P{S100 > 65} = P{(S100 − 50)/5 > 3} = P{Z100 > 3} ≈ 0.0013.


We could also write,

Z100 = (p̂ − 1/2)/(1/20) = 20(p̂ − 1/2)

and

P{p̂ ≤ 0.40} = P{20(p̂ − 1/2) ≤ 20(0.4 − 1/2)} = P{Z100 ≤ −2} ≈ 0.023.
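Both computations in Example 1 can be verified numerically; this Python check is our addition.

```python
from math import sqrt
from statistics import NormalDist

Phi = NormalDist().cdf
n, p = 100, 0.5
sd = sqrt(n * p * (1 - p))  # 5.0

# P{S_100 > 65} = P{Z_100 > 3}
print(round(1 - Phi((65 - n * p) / sd), 4))  # 0.0013

# P{p-hat <= 0.40} = P{Z_100 <= -2}
print(round(Phi(20 * (0.40 - 0.5)), 3))      # 0.023
```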

Example 2. Video projector light bulbs are known to have a mean lifetime of µ = 100 hours and standard deviation σ = 75 hours. The school uses the projectors for 9000 hours per semester. How likely are 100 light bulbs to be sufficient?

Let S100 be the total lifetime of the 100 bulbs. We are asking for the probability of the event {S100 > 9000}. Note that this event is equivalent to

Zn = (S100 − 10000)/(75 · √100) > (9000 − 10000)/(75 · √100) = −4/3

and, by the central limit theorem, P{Zn > −4/3} ≈ 0.909.
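A numerical check of Example 2 (our addition):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 75, 100
z = (9000 - n * mu) / (sigma * sqrt(n))   # (9000 - 10000)/750 = -4/3
print(round(z, 4))                        # -1.3333
print(round(1 - NormalDist().cdf(z), 3))  # 0.909
```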

Example 3. To find the value z∗ so that P{|Z| < z∗} = 0.95, first note that the area between −z∗ and z∗ below the density curve is 0.95. The remaining area outside the vertical red lines in Figure 7 is 0.05. Thus, the probability in each of the tails is P{Z < −z∗} = P{Z > z∗} = 0.025. Looking up 0.025 in the normal table, we find −z∗ = −1.96. Thus, z∗ = 1.96.

Let’s continue this,

probability tail probability z∗

0.95 0.025 1.960.96 0.02 2.050.98 0.01 2.330.99 0.005 2.590.999 0.005 3.29
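These critical values can be recomputed with the inverse of Φ; this Python sketch using the standard library's statistics.NormalDist.inv_cdf is our addition.

```python
from statistics import NormalDist

inv = NormalDist().inv_cdf  # quantile function, the inverse of Phi

def z_star(prob):
    """z* with P{-z* < Z < z*} = prob, i.e. the 1 - (1 - prob)/2 quantile."""
    return inv(1 - (1 - prob) / 2)

for prob in (0.95, 0.96, 0.98, 0.99, 0.999):
    print(prob, round(z_star(prob), 2))
```

Note that (1 − 0.999)/2 = 0.0005 is the tail probability for the last row.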


Figure 7: The area between the red lines under the density curve is 0.95. Thus, the area outside the lines is 0.05.
