Introduction to Error Analysis, Part 1: The Basics
Andrei Gritsan, based on lectures by Petar Maksimovic
February 1, 2010
Overview
- Definitions
- Reporting results and rounding
- Accuracy vs. precision; systematic vs. statistical errors
- Parent distribution
- Mean and standard deviation
- Gaussian probability distribution; what a 1σ error means
Definitions
- μ_true: the true value of the quantity x we measure
- x_i: an observed value
- error on x_i: the difference between the observed and true values, x_i - μ_true
- All measurements have errors, so the true value is unattainable.
- We seek the best estimate of the true value, and the best estimate of the true error on x_i.
One view on reporting measurements (from the book)
- Keep only one digit of precision on the error; everything else is noise.
  Example: 410.5 ± 163.819 → (4 ± 2) × 10²
- Exception: when the first digit is 1, keep two digits.
  Example: an error of 17538 → 1.7 × 10⁴
- Round off the final value of the measurement to the significant digits of the error.
  Example: 87654 ± 345 kg → (876 ± 3) × 10² kg
- Rounding rules:
  - 6 and above: round up
  - 4 and below: round down
  - 5: if the digit to the right is even, round down; else round up
    (reason: this reduces systematic errors in rounding)
A different view on rounding
From the Particle Data Group (an authority in particle physics):
http://pdg.lbl.gov/2009/reviews/rpp2009-rev-rpp-intro.pdf
- If the three highest-order digits of the error lie between 100 and 354, round to two significant digits.
  Example: 87654 ± 345 kg → (876.5 ± 3.5) × 10² kg
- If they lie between 355 and 949, round to one significant digit.
  Example: 87654 ± 365 kg → (877 ± 4) × 10² kg
- If they lie between 950 and 999, round up to 1000 and keep two significant digits.
  Example: 87654 ± 950 kg → (87.7 ± 1.0) × 10³ kg
Bottom line: use a consistent approach to rounding that is sound and accepted in your field of study, and use common sense after all.
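As an illustration, here is a minimal sketch of these PDG rules in Python; the helper `pdg_round` and the half-up tie-breaking are my own choices, not something prescribed by the lecture or the PDG text.

```python
import math

def round_half_up(x):
    """Round to the nearest integer, halves going up (avoids Python's banker's rounding)."""
    return math.floor(x + 0.5)

def pdg_round(value, error):
    """Round (value, error) following the PDG convention above; assumes error > 0."""
    exp = math.floor(math.log10(error)) - 2
    lead = error / 10.0**exp      # the three highest-order digits, as a number in [100, 1000)
    if lead < 355:                # 100-354: keep two significant digits of the error
        lsd = exp + 1
    elif lead < 950:              # 355-949: keep one significant digit
        lsd = exp + 2
    else:                         # 950-999: round the error up to 1000, keep two digits
        error = 10.0**(exp + 3)
        lsd = exp + 2
    unit = 10.0**lsd              # decimal place of the last digit kept
    return round_half_up(value / unit) * unit, round_half_up(error / unit) * unit

print(pdg_round(87654, 345))   # (87650.0, 350.0)  i.e. (876.5 +/- 3.5) x 10^2
print(pdg_round(87654, 365))   # (87700.0, 400.0)  i.e. (877 +/- 4) x 10^2
print(pdg_round(87654, 950))   # (87700.0, 1000.0) i.e. (87.7 +/- 1.0) x 10^3
```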
Accuracy vs. precision
- Accuracy: how close the result is to the true value.
- Precision: how well the result is determined (regardless of the true value); a measure of reproducibility.
Example: μ = 30
- x = 23 ± 2: precise but inaccurate; uncorrected biases (large systematic error)
- x = 28 ± 7: accurate but imprecise; subsequent measurements will scatter around μ = 30 but cover the true value in most cases (large statistical (random) error)
An experiment should be both accurate and precise.
Statistical vs. systematic errors
Statistical (random) errors:
- describe by how much subsequent measurements scatter around the common average value
- if limited by instrumental error, use a better apparatus
- if limited by statistical fluctuations, make more measurements
Systematic errors:
- all measurements are biased in a common way
- harder to detect: faulty calibrations, a wrong model, bias by the observer
- also hard to determine (no unique recipe); estimated from analysis of the experimental conditions and techniques
- may be correlated
Parent distribution
(assume no systematic errors for now)
- parent distribution: the probability distribution of results as the number of measurements N → ∞
- however, we have only a limited number of measurements: we observe only a sample of the parent dist., a sample distribution
- the probability distribution of our measurements approaches the parent dist. only as N → ∞
- use the observed distribution to infer the parameters of the parent distribution, e.g., x̄ → μ_true as N → ∞
Notation:
- Greek letters: parameters of the parent distribution
- Roman letters: experimental estimates of the parameters of the parent dist.
Mean, median, mode
Mean of the experimental (sample) distribution:
    \bar{x} \equiv \frac{1}{N} \sum_{i=1}^{N} x_i
...and of the parent distribution:
    \mu \equiv \lim_{N \to \infty} \frac{1}{N} \sum x_i
mean = centroid = average
Median: splits the sample into two equal parts.
Mode: the most likely value (the highest probability density).
Variance
Deviation: d_i ≡ x_i - μ, for a single measurement.
Average deviation: ⟨x_i - μ⟩ = 0 by definition; one could use ⟨|x_i - μ|⟩, but absolute values are hard to deal with analytically.
Variance: instead, use the mean of the squared deviations:
    \sigma^2 \equiv \langle (x_i - \mu)^2 \rangle = \langle x^2 \rangle - \mu^2
    \sigma^2 = \lim_{N \to \infty} \left( \frac{1}{N} \sum x_i^2 \right) - \mu^2
(the mean of the squares minus the square of the mean)
Standard deviation
Standard deviation: the root mean square of the deviations,
    \sigma = \sqrt{\langle x^2 \rangle - \mu^2}
associated with the 2nd moment of the x_i distribution.
Sample variance: replace μ by x̄,
    s^2 \equiv \frac{1}{N-1} \sum (x_i - \bar{x})^2
with N - 1 instead of N because x̄ is obtained from the same data sample and not independently.
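As a quick illustration in code (numpy is my choice of tool here, not the lecture's): `ddof=1` selects exactly this N - 1 sample variance.

```python
import numpy as np

x = np.array([29.1, 30.4, 28.7, 31.2, 30.0])   # some made-up measurements

xbar = x.mean()          # sample mean, the estimate of mu
s2 = x.var(ddof=1)       # sample variance with the N - 1 denominator
s = x.std(ddof=1)        # sample standard deviation

# the same thing written out explicitly
assert np.isclose(s2, ((x - xbar) ** 2).sum() / (len(x) - 1))
print(xbar, s)
```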
So what are we after?
- We want μ. The best estimate of μ is the sample mean: μ ≈ x̄.
- The best estimate of the error on x (and thus on μ) is the square root of the sample variance, s = √(s²).
Weighted averages:
- For a discrete probability distribution P(x_i), replace (1/N) Σ x_i with Σ P(x_i) x_i and (1/N) Σ x_i² with Σ P(x_i) x_i²; by definition, the formulae above are unchanged.
Gaussian probability distribution
- unquestionably the most useful distribution in statistical analysis
- a limiting case of the Binomial and Poisson distributions (which are more fundamental; see next week)
- seems to describe the distribution of random observations for a large number of physical measurements
- so pervasive that all results of measurements are classified as Gaussian or non-Gaussian (even on Wall Street)
Meet the Gaussian
Probability density function for a random variable x, with parameters μ (the center) and σ (the width):
    p_G(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]
Differential probability: the probability to observe a value in [x, x + dx] is
    dP_G(x; \mu, \sigma) = p_G(x; \mu, \sigma)\,dx
Standard Gaussian distribution: replace (x - μ)/σ with a new variable z:
    p_G(z)\,dz = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{z^2}{2} \right) dz
This is a Gaussian centered at 0 with a width of 1. Computers calculate the Standard Gaussian first, and then stretch and shift it to make p_G(x; μ, σ).
Mean and standard deviation: by straight application of the definitions,
    mean = μ (the center), standard deviation = σ (the width).
This is what makes the Gaussian so convenient!
Interpretation of Gaussian errors
We measured x = x_0 ± σ_0; what does that tell us?
The Standard Gaussian contains a probability of 0.683 between z = -1.0 and z = +1.0, so the interval [x_0 - σ_0, x_0 + σ_0] contains the true value of x 68.3% of the time!
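A quick numerical check of that coverage, sketched with scipy's standard normal CDF:

```python
from scipy.stats import norm

# probability for the standard Gaussian to fall between z = -1 and z = +1
print(norm.cdf(1.0) - norm.cdf(-1.0))   # ~0.6827, the quoted 68.3%

# the same game at 2 and 3 sigma
print(norm.cdf(2.0) - norm.cdf(-2.0))   # ~0.9545
print(norm.cdf(3.0) - norm.cdf(-3.0))   # ~0.9973
```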
The Gaussian distribution coverage
[figure]
Introduction to Error Analysis, Part 2: Fitting
Overview
- the principle of Maximum Likelihood
- minimizing χ²
- linear regression
- fitting an arbitrary curve
Likelihood
- we observed N data points drawn from a parent population
- assume a Gaussian parent distribution (mean μ, standard deviation σ)
- the probability to observe x_i, given the true μ and σ, is
    P_i(x_i \,|\, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2 \right]
- the probability to have measured μ in this single measurement, given the observed x_i and σ_i, is called the likelihood:
    P_i(\mu \,|\, x_i, \sigma_i) = \frac{1}{\sigma_i\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x_i - \mu}{\sigma_i} \right)^2 \right]
- for N observations, the total likelihood is
    L \equiv P(\mu) = \prod_{i=1}^{N} P_i(\mu)
Principle of Maximum Likelihood
Maximizing P(μ) gives the best estimate of μ (the most likely population from which the data might have come is assumed to be the correct one).
For Gaussian individual probability distributions (σ_i = const = σ):
    L = P(\mu) = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^{N} \exp\left[ -\frac{1}{2} \sum \left( \frac{x_i - \mu}{\sigma} \right)^2 \right]
Maximizing the likelihood means minimizing the argument of the exponential:
    \chi^2 \equiv \sum \left( \frac{x_i - \mu}{\sigma} \right)^2
Example: calculating the mean
Cross-checking:
    \frac{d\chi^2}{d\mu} = \sum \frac{d}{d\mu} \left( \frac{x_i - \mu}{\sigma} \right)^2 = 0
Since the derivative is linear in μ:
    \sum (x_i - \mu) = 0 \quad\Rightarrow\quad \mu = \bar{x} = \frac{1}{N} \sum x_i
The mean really is the best estimate of the measured quantity.
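The same cross-check can be done numerically; a sketch with made-up data and an off-the-shelf minimizer (scipy, my choice, not the lecture's):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([4.8, 5.1, 5.3, 4.9, 5.2])   # made-up measurements, common sigma
sigma = 0.2

def chi2(mu):
    return np.sum(((x - mu) / sigma) ** 2)

best = minimize_scalar(chi2)
print(best.x, x.mean())   # the chi^2 minimum lands on the sample mean
```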
Linear regression
- the simplest case: a linear functional dependence
- measurements y_i; model (prediction) y = f(x) = a + bx
- at each point, y(x_i) = a + b x_i (special case: f(x_i) = const = a = μ)
- minimize
    \chi^2(a, b) = \sum \left( \frac{y_i - f(x_i)}{\sigma_i} \right)^2
- conditions for a minimum in the 2-dim parameter space:
    \frac{\partial}{\partial a} \chi^2(a, b) = 0, \qquad \frac{\partial}{\partial b} \chi^2(a, b) = 0
- this can be solved analytically, but don't do it by hand in real life (e.g., see p. 105 in Bevington)
Familiar example from day one: linear fit
A program (e.g. ROOT) will do the minimization of χ² for you.
[figure: "example analysis", a straight-line fit to data points, axes "x label [units]" vs. "y label [units]"]
The program will give you the answer for a and b.
Fitting with an arbitrary curve
- a set of measurement pairs (x_i, y_i ± σ_i) (note: no errors on x_i!)
- the theoretical model (prediction) may depend on several parameters {a_i} and doesn't have to be linear:
    y = f(x; a_0, a_1, a_2, \ldots)
- identical approach: minimize the total χ²
    \chi^2(\{a_i\}) = \sum \left( \frac{y_i - f(x_i; a_0, a_1, a_2, \ldots)}{\sigma_i} \right)^2
- the minimization proceeds numerically
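For instance, a minimal sketch in Python, where scipy's curve_fit plays the role the lecture assigns to ROOT; the data are invented, and absolute_sigma=True makes curve_fit treat the σ_i as true measurement errors rather than relative weights:

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):                 # the fit model; any parametric form works here
    return a + b * x

x  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y  = np.array([1.1, 1.9, 3.2, 3.9, 5.1])   # made-up measurements
sy = np.full_like(y, 0.2)                  # per-point errors sigma_i

popt, pcov = curve_fit(f, x, y, sigma=sy, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))   # parameter errors from the covariance matrix
print(popt, perr)               # a +/- sigma_a, b +/- sigma_b
```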
Fitting data points with errors on both x and y
- measurements x_i ± σ_xi and y_i ± σ_yi
- each term in the χ² sum gets a correction from the σ_xi contribution:
    \left( \frac{y_i - f(x_i)}{\sigma_{y_i}} \right)^2 \;\to\; \frac{(y_i - f(x_i))^2}{\sigma_{y_i}^2 + \left[ \frac{f(x_i + \sigma_{x_i}) - f(x_i - \sigma_{x_i})}{2} \right]^2}
[figure: a curve f(x) with a measured point, showing how the error on x maps through f(x ± σ_x) into a contribution to the y-χ²]
Behavior of the χ² function near the minimum
When N is large, χ²(a_0, a_1, a_2, ...) becomes quadratic in each parameter near the minimum:
    \chi^2 = \frac{(a_j - \hat{a}_j)^2}{\sigma_j^2} + C
known as the parabolic approximation.
- C tells us about the goodness of the overall fit (it is a function of all the uncertainties and of the other {a_k}, k ≠ j)
- a_j = â_j ± σ_j ⇔ Δχ² = 1; valid in all cases!
- the parabolic error is set by the curvature at the minimum:
    \frac{\partial^2 \chi^2}{\partial a_j^2} = \frac{2}{\sigma_j^2}
χ² shapes near the minimum: examples
[figure: two example χ² curves vs. a parameter a, one with better errors but a worse fit, the other with asymmetric errors but a better fit]
Two methods for obtaining the error
1. from the curvature at the minimum:
    \sigma_j^2 = 2 \left( \frac{\partial^2 \chi^2}{\partial a_j^2} \right)^{-1}
2. scan each parameter around the minimum, with the others fixed, until Δχ² = 1 is reached
Method #1 is much faster to calculate; method #2 is more generic and works even when the shape of χ² near the minimum is not exactly parabolic.
The scan of Δχ² = 1 defines a so-called one-sigma contour. It contains the truth with 68.3% probability (assuming Gaussian errors). A sketch of method #2 follows below.
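A sketch of method #2 for the one-parameter mean example used earlier (the Δχ² = 1 crossings are located with a root finder; brentq is my choice):

```python
import numpy as np
from scipy.optimize import brentq

x = np.array([4.8, 5.1, 5.3, 4.9, 5.2])
sigma = 0.2

def chi2(mu):
    return np.sum(((x - mu) / sigma) ** 2)

mu_hat = x.mean()               # the chi^2 minimum for this simple model
chi2_min = chi2(mu_hat)

def excess(mu):                 # zero where chi^2 has risen by exactly 1
    return chi2(mu) - chi2_min - 1.0

lo = brentq(excess, mu_hat - 1.0, mu_hat)
hi = brentq(excess, mu_hat, mu_hat + 1.0)
print(mu_hat, mu_hat - lo, hi - mu_hat)   # both sides equal sigma/sqrt(N) here
```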
What to remember
In the end the fit will be done for you by a program:
- you supply the data, e.g. (x_i, y_i), and the fit model, e.g. y = f(x; a_1, a_2, ...)
- the program returns a_1 = â_1 ± σ_1, a_2 = â_2 ± σ_2, ..., and a plot with the model line through the points
You need to understand what is being done; in more complex cases you may need to go deep into the code.
Introduction to Error Analysis, Part 3: Combining measurements
Overview
- propagation of errors
- covariance
- weighted average and its error
- error on the sample mean and sample standard deviation
Propagation of Errors
- x is a known function of u, v, ...: x = f(u, v, ...)
- assume that the most probable value of x is x̄ = f(ū, v̄, ...)
- x̄ is the mean of the x_i = f(u_i, v_i, ...)
- by the definition of variance,
    \sigma_x^2 = \lim_{N \to \infty} \frac{1}{N} \sum (x_i - \bar{x})^2
- expand (x_i - x̄) in a Taylor series:
    x_i - \bar{x} \approx (u_i - \bar{u}) \frac{\partial x}{\partial u} + (v_i - \bar{v}) \frac{\partial x}{\partial v} + \cdots
Variance of x
    \sigma_x^2 \approx \lim_{N \to \infty} \frac{1}{N} \sum \left[ (u_i - \bar{u}) \frac{\partial x}{\partial u} + (v_i - \bar{v}) \frac{\partial x}{\partial v} + \cdots \right]^2
    = \lim_{N \to \infty} \frac{1}{N} \sum \left[ (u_i - \bar{u})^2 \left( \frac{\partial x}{\partial u} \right)^2 + (v_i - \bar{v})^2 \left( \frac{\partial x}{\partial v} \right)^2 + 2 (u_i - \bar{u})(v_i - \bar{v}) \frac{\partial x}{\partial u} \frac{\partial x}{\partial v} + \cdots \right]
    = \sigma_u^2 \left( \frac{\partial x}{\partial u} \right)^2 + \sigma_v^2 \left( \frac{\partial x}{\partial v} \right)^2 + 2 \sigma_{uv}^2 \frac{\partial x}{\partial u} \frac{\partial x}{\partial v} + \cdots
This is the error propagation equation.
σ_uv² is the COvariance; it describes the correlation between the errors on u and v.
For uncorrelated errors, σ_uv² → 0.
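A numerical sketch of this equation, with the partial derivatives taken by central differences; the function and the numbers are invented for illustration:

```python
import numpy as np

def propagate(f, u, v, su, sv, suv2=0.0, eps=1e-6):
    """Error on x = f(u, v) from the first-order error propagation equation."""
    dxdu = (f(u + eps, v) - f(u - eps, v)) / (2 * eps)   # dx/du
    dxdv = (f(u, v + eps) - f(u, v - eps)) / (2 * eps)   # dx/dv
    var = su**2 * dxdu**2 + sv**2 * dxdv**2 + 2 * suv2 * dxdu * dxdv
    return np.sqrt(var)

# x = u*v with uncorrelated 1% errors: relative errors add in quadrature to ~1.4%
f = lambda u, v: u * v
print(propagate(f, 10.0, 20.0, 0.1, 0.2))   # ~2.83 on x = 200
```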
Examples
- x = u + a, where a = const. Then ∂x/∂u = 1, so
    \sigma_x = \sigma_u
- x = au + bv, where a, b = const:
    \sigma_x^2 = a^2 \sigma_u^2 + b^2 \sigma_v^2 + 2ab\,\sigma_{uv}^2
- the correlation can be negative, i.e. σ_uv² < 0: if an error on u is counterbalanced by a proportional error on v, σ_x can get very small!
More examples
- x = auv:
    \frac{\partial x}{\partial u} = av, \qquad \frac{\partial x}{\partial v} = au
    \sigma_x^2 = (av\,\sigma_u)^2 + (au\,\sigma_v)^2 + 2 a^2 uv\,\sigma_{uv}^2
    \frac{\sigma_x^2}{x^2} = \frac{\sigma_u^2}{u^2} + \frac{\sigma_v^2}{v^2} + 2 \frac{\sigma_{uv}^2}{uv}
- x = au/v:
    \frac{\sigma_x^2}{x^2} = \frac{\sigma_u^2}{u^2} + \frac{\sigma_v^2}{v^2} - 2 \frac{\sigma_{uv}^2}{uv}
- etc., etc.
Weighted average
From Part 2: calculate the mean by minimizing
    \chi^2 = \sum \left( \frac{x_i - \mu}{\sigma_i} \right)^2
with the minimum at dχ²/dμ = 0, but now σ_i ≠ const:
    0 = \sum \frac{x_i - \mu}{\sigma_i^2} = \sum \frac{x_i}{\sigma_i^2} - \mu \sum \frac{1}{\sigma_i^2}
The so-called weighted average is therefore
    \mu = \frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2}
Each measurement is weighted by 1/σ_i²!
Error on the weighted average
N points contribute to the weighted average; straight application of the error propagation equation gives
    \sigma_\mu^2 = \sum_i \sigma_i^2 \left( \frac{\partial \mu}{\partial x_i} \right)^2
with
    \frac{\partial \mu}{\partial x_i} = \frac{\partial}{\partial x_i} \left[ \frac{\sum_j x_j/\sigma_j^2}{\sum_k 1/\sigma_k^2} \right] = \frac{1/\sigma_i^2}{\sum_k 1/\sigma_k^2}
Putting both together:
    \sigma_\mu^2 = \sum_i \sigma_i^2 \left[ \frac{1/\sigma_i^2}{\sum_k 1/\sigma_k^2} \right]^2 = \frac{1}{\sum_k 1/\sigma_k^2}
Example of a weighted average
Two measurements:
    x_1 = 25.0 \pm 1.0, \qquad x_2 = 20.0 \pm 5.0
Error:
    \sigma^2 = \frac{1}{1/1^2 + 1/5^2} = \frac{25}{26} \approx 0.96, \qquad \sigma \approx 1.0
Weighted average:
    \bar{x} = \sigma^2 \left( \frac{25}{1^2} + \frac{20}{5^2} \right) = \frac{25}{26} \cdot \frac{25 \cdot 25 + 20 \cdot 1}{25} = 24.8
Result: x = 24.8 ± 1.0. Moral: x_2 practically doesn't matter!
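The same example in code (a minimal sketch; the helper name is mine):

```python
import numpy as np

def weighted_average(x, sigma):
    """Weighted average and its error, with weights w_i = 1/sigma_i^2."""
    x, sigma = np.asarray(x), np.asarray(sigma)
    w = 1.0 / sigma**2
    return np.sum(w * x) / np.sum(w), 1.0 / np.sqrt(np.sum(w))

print(weighted_average([25.0, 20.0], [1.0, 5.0]))   # (24.81, 0.98): x2 barely matters
```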
Error on the mean
- N measurements from the same parent population (μ, σ).
- From Part 1: the sample mean and sample standard deviation are the best estimators of the parent population parameters.
- But taking more measurements still gives the same σ: our knowledge of the shape of the parent population improves, and thus of the original true error σ on each point.
- But how well do we know the true value (i.e. μ)?
- For N points from the same population with error σ, the weighted-average formula gives:
    \frac{1}{\sigma_\mu^2} = \sum \frac{1}{\sigma^2} = \frac{N}{\sigma^2} \quad\Rightarrow\quad \sigma_\mu = \frac{\sigma}{\sqrt{N}} \approx \frac{s}{\sqrt{N}}
This is the standard deviation of the mean, or the standard error.
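In code this is one line, and scipy happens to provide it directly (a sketch reusing the earlier made-up data):

```python
import numpy as np
from scipy.stats import sem

x = np.array([29.1, 30.4, 28.7, 31.2, 30.0])

std_err = x.std(ddof=1) / np.sqrt(len(x))   # s / sqrt(N)
assert np.isclose(std_err, sem(x))          # scipy.stats.sem computes the same thing
print(x.mean(), std_err)
```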
Example: Lightness vs. Lycopene content, scatter plot
[figure: scatter plot of Lycopene content vs. Lightness ("lyc:Light")]
Example: Lightness vs. Lycopene content: RMS as the error
[figure: "Lightness vs Lycopene content -- spread option", Lycopene content vs. Lightness with the RMS spread drawn as the error bars]
The points don't scatter enough: the error bars are too large!
Example: Lightness vs. Lycopene content: error on the mean
[figure: "Lightness vs Lycopene content", the same profile with the error on the mean drawn as the error bars]
This looks much better!
Introduction to Error Analysis, Part 4: Dealing with non-Gaussian cases
Overview
- binomial p.d.f.
- Poisson p.d.f.
Binomial probability density function
- a random process with exactly two outcomes (a Bernoulli process)
- the probability of one outcome ("success") is p
- the probability of exactly r successes (0 ≤ r ≤ N) in N independent trials, where the order of successes and failures doesn't matter, is the binomial p.d.f.:
    f(r; N, p) = \frac{N!}{r!\,(N-r)!}\, p^r (1-p)^{N-r}
- mean: Np; variance: Np(1 - p)
- if r and s are binomially distributed with (N_r, p) and (N_s, p), then t = r + s is binomially distributed with (N_r + N_s, p)
Examples of binomial probability
The binomial distribution shows up whenever the data exhibit binary properties:
- an event passes or fails; the efficiency (an important experimental parameter) is defined as ε = N_pass/N_total (a common error estimate for it is sketched below)
- particles in a sample are positive or negative
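Since N_pass is binomial with variance Np(1 - p), a common estimate of the error on the efficiency is σ_ε = √(ε(1 - ε)/N_total); here is a minimal sketch (note that this simple formula misbehaves when ε is at or near 0 or 1):

```python
import numpy as np

def efficiency(n_pass, n_total):
    """Efficiency and its binomial error estimate sqrt(eps*(1-eps)/N)."""
    eps = n_pass / n_total
    return eps, np.sqrt(eps * (1.0 - eps) / n_total)

print(efficiency(800, 1000))   # (0.8, ~0.013)
```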
Poisson probability density function
- the probability of finding exactly n events in a given interval of x (e.g., of space or time)
- the events are independent of each other and of x
- the average rate is μ events per interval
- the Poisson p.d.f. (μ > 0):
    f(n; \mu) = \frac{\mu^n e^{-\mu}}{n!}
- mean: μ; variance: μ
- the limiting case of the binomial for many trials with low probability: p → 0 and N → ∞ while Np = μ
- the Poisson approaches a Gaussian for large μ (checked numerically below)
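A quick check of the mean = variance property and of the Gaussian limit, sketched with scipy.stats:

```python
import numpy as np
from scipy.stats import poisson, norm

print(poisson.mean(5.0), poisson.var(5.0))   # both equal mu = 5

# Gaussian limit: for large mu the Poisson p.m.f. tracks a Gaussian p.d.f.
mu = 100.0
n = np.arange(60, 141)
gap = np.abs(poisson.pmf(n, mu) - norm.pdf(n, loc=mu, scale=np.sqrt(mu)))
print(gap.max())   # small compared to the peak value of ~0.04
```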
Examples of Poisson probability
Shows up in counting measurements with a small number of events:
- the number of watermelons with circumference c ∈ [19.5, 20.5] in.
- nuclear spectroscopy in the tails of a distribution (e.g., at high channel numbers)
- the Rutherford experiment
Fitting a histogram with Poisson-distributed content
Poisson data require special treatment in fitting!
- in a histogram, the i-th channel contains n_i entries
- for large n_i, P(n_i) is Gaussian, with the Poisson error σ = √μ approximated by √(n_i)
  (WARNING: this is what ROOT uses by default!)
- a Gaussian p.d.f. implies symmetric errors, i.e. an equal probability to fluctuate up or down
- so in a χ²-minimizing fit, μ (the true value) is taken to be equally likely to lie above and below the data!
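A sketch of why this matters: for a constant model, the χ² fit with σ_i = √(n_i) has an analytic minimum at the harmonic mean of the bin contents, which sits systematically below the Poisson maximum-likelihood answer (the arithmetic mean). The toy numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
n = rng.poisson(5.0, size=10000)   # bin contents of a flat histogram, true mu = 5
n = n[n > 0]                       # sigma = sqrt(n) cannot handle empty bins at all

chi2_fit = len(n) / np.sum(1.0 / n)   # analytic chi^2 minimum: the harmonic mean
ml_fit = n.mean()                     # Poisson maximum likelihood: the arithmetic mean

print(chi2_fit, ml_fit)   # the chi^2 answer is biased low; the ML answer is ~5
```

Dropping the empty bins is itself a distortion; a proper Poisson likelihood fit handles them naturally.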
Comparing Poisson and Gaussian p.d.f.
[figure: two panels vs. the expected number of events x; left: the probability Prob(5|x); right: the cumulative probability]
- 5 observed events
- dashed: Gaussian centered at 5 with σ = √5; solid: Poisson with μ = 5
- left: probability density functions (note: the Gaussian extends below 0!)
- right: confidence integrals (the p.d.f. integrated from 5)