Lab 3b: Distribution of the Mean

Outline
• Distribution of the mean: Normal (Gaussian)
  – Central Limit Theorem
• “Justification” of the mean as the best estimate
• “Justification” of the error propagation formula
  – Propagation of errors
  – Standard deviation of the mean

Related courses: 640:104 Elementary Combinatorics and Probability; 960:211-212 Statistics I, II
Finite number of repeated measurements

$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$

The average value is the best estimate of the true mean, while the square of the standard deviation is the best estimate of the true variance:

$\sigma_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$

The same set of data also provides the best estimate of the variance of the mean:

$\sigma_{\bar{x}}^2 = \frac{\sigma_x^2}{N}, \qquad \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$
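These three estimates are easy to compute directly. A minimal sketch in Python/NumPy, using a small set of made-up measurement values for illustration:

```python
import numpy as np

# A hypothetical set of N = 7 repeated measurements (values made up for illustration)
x = np.array([9.8, 10.1, 10.0, 9.9, 10.3, 9.7, 10.2])
N = len(x)

xbar = x.sum() / N                       # best estimate of the true mean
var = ((x - xbar) ** 2).sum() / (N - 1)  # best estimate of the true variance (note N - 1)
sigma = np.sqrt(var)                     # standard deviation of a single measurement
sigma_mean = sigma / np.sqrt(N)          # standard deviation of the mean

print(xbar, sigma, sigma_mean)           # 10.0, ~0.216, ~0.082
```

The same `sigma` comes from `np.std(x, ddof=1)`; the `ddof=1` argument selects the N − 1 denominator.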
The Central Limit Theorem

The mean is a sum of N independent random variables:

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_N}{N}$

In probability theory, the central limit theorem (CLT) states that, under fairly general conditions, the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed:

$P_N(\bar{x}) \xrightarrow{\;N\to\infty\;} G(\bar{x})$

The PDF of the mean is the PDF of the sum of N independent random variables.

*In practice, N just needs to be large enough.
Probability Distribution Function of the mean

An example of the CLT: [figure: histograms of $P(x_1)$, $P(x_1+x_2)$, $P(x_1+x_2+x_3)$, and $P(x_1+x_2+x_3+x_4)$, showing the distribution of the sum approaching a Gaussian]

For this example N > 4 is “enough”.
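The same convergence can be checked numerically. A minimal sketch (the uniform distribution and the sample sizes are my choice, not from the lab): for a Gaussian, ~68.3% of samples fall within one standard deviation of the mean, so that fraction measures how Gaussian the sum of N uniform variables has become.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fraction of samples within one standard deviation of the mean;
# for a true Gaussian this fraction is ~0.683.
fracs = {}
for N in (1, 2, 4, 8):
    s = rng.uniform(size=(200_000, N)).sum(axis=1)   # sum of N uniform variables
    fracs[N] = np.mean(np.abs(s - s.mean()) < s.std())
    print(N, round(fracs[N], 3))
```

For N = 1 the fraction is ~0.577 (flat distribution); it climbs toward 0.683 as N grows.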
More examples of CLT

http://www.statisticalengineering.com/central_limit_theorem.html

E.g. N measurements $x_i$ (i = 1, 2, …, N) of a random variable x can be divided into m subsets of $n_j$ measurements each (j = 1, 2, …, m), with

$N = n_1 + n_2 + \cdots + n_m$

Define the mean of each subset as a new measurement of a new random variable y:

$y_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_i \approx \bar{x}$

If the $n_j$ are large enough, the PDF of the $y_j$ is approximately a Normal (Gaussian) one.

The significant consequence of the CLT: almost any repeated measurement of independent random variables can be effectively described by a Normal (Gaussian) distribution.
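A minimal sketch of this subset construction (the exponential distribution, subset size, and skewness check are my choices for illustration): even for a strongly asymmetric x, the subset means $y_j$ are nearly symmetric, as a Gaussian requires.

```python
import numpy as np

rng = np.random.default_rng(6)

# N measurements of an exponentially distributed (very non-Gaussian) variable x,
# divided into m subsets of n_j = 50 each, so N = m * n_j
m, nj = 2_000, 50
x = rng.exponential(1.0, size=(m, nj))

y = x.mean(axis=1)   # subset means y_j: new "measurements" of a new variable y

# Sample skewness of the y_j.  The exponential itself has skewness 2;
# the CLT predicts the subset means have skewness ~ 2/sqrt(50) ≈ 0.28.
skew = np.mean(((y - y.mean()) / y.std()) ** 3)
print(round(skew, 2))
```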
“Justification” of the mean as the best estimate

Assume all the measured values follow a normal distribution $G_{X,\sigma}(x)$ with unknown parameters X and σ:

$G_{X,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-X)^2/2\sigma^2}$

The probability of obtaining value $x_1$ in a small interval $dx_1$ is:

$\mathrm{Prob}(x_1 \text{ between } x_1 \text{ and } x_1 + dx_1) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x_1-X)^2/2\sigma^2}\, dx_1 \;\propto\; \frac{1}{\sigma}\, e^{-(x_1-X)^2/2\sigma^2}$

Similarly, the probability of obtaining value $x_i$ (i = 1, …, N) in a small interval $dx_i$ is:

$\mathrm{Prob}(x_i \text{ between } x_i \text{ and } x_i + dx_i) \;\propto\; \frac{1}{\sigma}\, e^{-(x_i-X)^2/2\sigma^2}$

(The factors $dx_i/\sqrt{2\pi}$ are constants and are dropped from the proportionality.)
Justification of the mean as best estimate (cont.)

The probability of N measurements with values $x_1, x_2, \ldots, x_N$:

$\mathrm{Prob}_{X,\sigma}(x_1, \ldots, x_N) = \mathrm{Prob}(x_1)\,\mathrm{Prob}(x_2)\cdots\mathrm{Prob}(x_N) \;\propto\; \frac{1}{\sigma^N}\, e^{-\sum_{i=1}^{N}(x_i-X)^2/2\sigma^2}$

Here $x_1, x_2, \ldots, x_N$ are known because they are measured values; X and σ, on the other hand, are unknown parameters. We want to find the best estimate of them from $\mathrm{Prob}_{X,\sigma}(x_1, \ldots, x_N)$. A reasonable procedure is the principle of maximum likelihood: maximizing $\mathrm{Prob}_{X,\sigma}(x_1, \ldots, x_N)$ with respect to X is equivalent to minimizing $\sum_{i=1}^{N}(x_i-X)^2/2\sigma^2$.

Setting the derivative with respect to X to zero:

$\sum_{i=1}^{N}(x_i - X) = 0 \quad\Rightarrow\quad \text{best estimate of } X = \frac{1}{N}\sum_{i=1}^{N} x_i = \bar{x}$
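The maximum-likelihood result can be verified numerically. A minimal sketch (the measurement values and the grid of candidate X are made up for illustration): scan the negative log-likelihood over candidate values of X and confirm the minimum lands at the sample mean.

```python
import numpy as np

# Hypothetical measurements (made-up values)
x = np.array([2.1, 1.9, 2.4, 2.0, 1.8])

def neg_log_likelihood(X, sigma=1.0):
    # Up to constants, -ln Prob = sum_i (x_i - X)^2 / (2 sigma^2)
    return np.sum((x - X) ** 2) / (2 * sigma ** 2)

# Scan candidate X values; the minimum lands at the sample mean
candidates = np.linspace(1.0, 3.0, 2001)
best = candidates[np.argmin([neg_log_likelihood(X) for X in candidates])]
print(best, x.mean())   # both 2.04
```

Note the value of σ does not matter here: it only rescales the quantity being minimized.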
Best estimate of variance

Similarly, we can maximize $\mathrm{Prob}_{X,\sigma}(x_1, \ldots, x_N)$ with respect to σ to get the best estimate of σ:

$\frac{\partial}{\partial\sigma}\left[\frac{1}{\sigma^N}\, e^{-\sum_i (x_i-X)^2/2\sigma^2}\right] = -\frac{N}{\sigma^{N+1}}\, e^{-\sum_i (x_i-X)^2/2\sigma^2} + \frac{1}{\sigma^N}\,\frac{\sum_i (x_i-X)^2}{\sigma^3}\, e^{-\sum_i (x_i-X)^2/2\sigma^2} = 0$

$\Rightarrow\quad N = \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i - X)^2$

$\Rightarrow\quad \text{best estimate of } \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - X)^2$

However, we don’t really know X. In practice, we have to replace X with the mean value $\bar{x}$, which reduces the accuracy of the estimate. To account for this, we replace N with N − 1:

$\text{best estimate of } \sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$
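The need for N − 1 can be seen by simulation. A minimal sketch (the true variance, sample size, and number of trials are my choices): generate many datasets with a known σ², and compare the average of the two estimators.

```python
import numpy as np

rng = np.random.default_rng(1)
true_sigma2 = 4.0            # simulate data with known variance sigma^2 = 4
N, trials = 5, 200_000

data = rng.normal(0.0, np.sqrt(true_sigma2), size=(trials, N))
xbar = data.mean(axis=1, keepdims=True)
ss = ((data - xbar) ** 2).sum(axis=1)    # sum of squared deviations from the sample mean

biased = (ss / N).mean()         # ~ 3.2 = (N-1)/N * sigma^2: systematically low
unbiased = (ss / (N - 1)).mean() # ~ 4.0: correct on average
print(biased, unbiased)
```

Dividing by N underestimates σ² by exactly the factor (N − 1)/N, because $\bar{x}$ itself was fitted to the same data.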
Justification of error propagation formula (I)

Measured quantity plus a constant: $q = x + A$
Measured quantity times a fixed number: $q = Bx$
Sum of two measured quantities: $q = x + y$

For the sum $q = x + y$:

$P(q) = \iint P(x)\,P(y)\,\delta(q - x - y)\,dx\,dy = \int P(x)\,P(q - x)\,dx$

Assuming X = Y = 0:

$P(x) = \frac{e^{-x^2/2\sigma_x^2}}{\sigma_x\sqrt{2\pi}}, \qquad P(y) = \frac{e^{-y^2/2\sigma_y^2}}{\sigma_y\sqrt{2\pi}}$

For the simple cases:

$q = x + A: \quad \bar{q} = X + A,\ \ \sigma_q = \sigma_x$
$q = Bx: \quad \bar{q} = BX,\ \ \sigma_q = |B|\,\sigma_x$
Justification of error propagation formula (II)

Sum of two measured quantities, $q = x + y$:

$P(q) = \int P(x)\,P(q - x)\,dx = \int \frac{e^{-x^2/2\sigma_x^2}}{\sigma_x\sqrt{2\pi}}\,\frac{e^{-(q-x)^2/2\sigma_y^2}}{\sigma_y\sqrt{2\pi}}\,dx$

Completing the square in x and carrying out the Gaussian integral gives

$P(q) = \frac{1}{\sqrt{2\pi\left(\sigma_x^2 + \sigma_y^2\right)}}\, e^{-q^2/2(\sigma_x^2 + \sigma_y^2)}$

i.e. q is again normally distributed, with

$\sigma_q^2 = \sigma_x^2 + \sigma_y^2$
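A quick Monte Carlo check of the quadrature rule (the particular σ values are my choice): the spread of x + y is $\sqrt{\sigma_x^2+\sigma_y^2}$, not $\sigma_x+\sigma_y$.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_x, sigma_y = 0.3, 0.4

x = rng.normal(0.0, sigma_x, 1_000_000)
y = rng.normal(0.0, sigma_y, 1_000_000)
q = x + y

# Quadrature sum: sqrt(0.3^2 + 0.4^2) = 0.5
print(q.std())   # ~ 0.5, not 0.3 + 0.4 = 0.7
```

The errors partially cancel: x is sometimes high when y is low, so the combined uncertainty grows more slowly than a plain sum.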
Justification of error propagation formula (III)

The general case: $q = q(x)$.

Assume $\sigma_x$ is small enough that we are concerned only with values of x close to X. Using a Taylor expansion:

$q(x) \approx q(X) + \frac{\partial q}{\partial x}(x - X) = \underbrace{\left[q(X) - \frac{\partial q}{\partial x}\,X\right]}_{\text{constant}} + \frac{\partial q}{\partial x}\, x$

Combining the constant-shift and fixed-multiplier rules:

$\sigma_q = \left|\frac{\partial q}{\partial x}\right| \sigma_x$
Justification of error propagation formula (IV)

The more general case: $q = q(x, y)$.

Assume $\sigma_x$ and $\sigma_y$ are small enough that we are concerned only with values of x close to X and of y close to Y. Using a Taylor expansion:

$q(x, y) \approx q(X, Y) + \frac{\partial q}{\partial x}(x - X) + \frac{\partial q}{\partial y}(y - Y) = \underbrace{\left[q(X, Y) - \frac{\partial q}{\partial x}\,X - \frac{\partial q}{\partial y}\,Y\right]}_{\text{constant}} + \frac{\partial q}{\partial x}\, x + \frac{\partial q}{\partial y}\, y$

Combining the rules above:

$\sigma_q = \sqrt{\left(\frac{\partial q}{\partial x}\,\sigma_x\right)^2 + \left(\frac{\partial q}{\partial y}\,\sigma_y\right)^2}$
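A minimal sketch testing the general formula by Monte Carlo (the function q = xy and the central values and σ are my choices; σ is kept small so the Taylor approximation holds):

```python
import numpy as np

rng = np.random.default_rng(3)
X, Y = 10.0, 5.0
sigma_x, sigma_y = 0.1, 0.05   # small, so the linear (Taylor) approximation is good

x = rng.normal(X, sigma_x, 1_000_000)
y = rng.normal(Y, sigma_y, 1_000_000)
q = x * y                      # example: q(x, y) = x * y

# Propagation formula: (dq/dx) = y, (dq/dy) = x, evaluated at (X, Y):
# sigma_q^2 = Y^2 sigma_x^2 + X^2 sigma_y^2
predicted = np.sqrt(Y**2 * sigma_x**2 + X**2 * sigma_y**2)
print(predicted, q.std())      # both ~ 0.707
```

The agreement degrades if the σ are made comparable to X and Y, because the dropped higher-order Taylor terms then matter.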
Standard deviation of the mean

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_N}{N}$

Assume the $x_i$ are normally distributed and all have the same X and $\sigma_x$. Applying the error propagation formula with $\partial\bar{x}/\partial x_i = 1/N$:

$\sigma_{\bar{x}}^2 = \sum_{i=1}^{N}\left(\frac{\partial\bar{x}}{\partial x_i}\,\sigma_x\right)^2 = N\cdot\frac{\sigma_x^2}{N^2} = \frac{\sigma_x^2}{N}$

$\Rightarrow\quad \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$
Lab 3b: test the CLT

1. One run of decay rate (R = N/Δt) measurement
   a. Sampling rate: 0.5 second/sample (Δt)
   b. Run time: 30 seconds (# of measurements: n = 60)
   c. Calculate the mean and standard deviation; plot a histogram.
2. Repeat the 30-sec run 20 times (a set of runs)
   a. Remember to save the data.
   b. Plot histograms of the 21 means with proper bin widths.
   c. Make sure you answer all the questions.
3. Sample size (n = T/Δt) dependence
   – Keep the same configuration; vary the run time.
   – Record the mean and standard deviation for each n; plot them vs. log n with error bars given by the standard deviation of the mean.

* “Origin” (OriginLab®) is more convenient than Matlab.

http://www.physics.rutgers.edu/rcem/~ahogan/326.html
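Before doing the lab, the whole procedure can be rehearsed in simulation. A minimal sketch, assuming the counts in each Δt interval are Poisson-distributed; the true rate of 8 counts/s is a made-up value, not the lab's source:

```python
import numpy as np

rng = np.random.default_rng(5)
rate, dt = 8.0, 0.5            # hypothetical true decay rate (counts/s) and sampling interval

for T in (30, 120, 480):       # run time in seconds
    n = int(T / dt)            # sample size n = T / dt
    counts = rng.poisson(rate * dt, size=n)   # Poisson counts per interval
    R = counts / dt            # measured rates R = N / dt
    # Mean rate and its standard error sigma / sqrt(n)
    print(n, R.mean(), R.std(ddof=1) / np.sqrt(n))
```

As the run time grows, the mean rate stays near the true rate while its standard error shrinks like $1/\sqrt{n}$, which is the trend step 3 asks you to plot.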