Introduction to Error Analysis, Part 1: The Basics
Andrei Gritsan, based on lectures by Petar Maksimovic
February 1, 2010
Overview
- Definitions
- Reporting results and rounding
- Accuracy vs. precision; systematic vs. statistical errors
- Parent distribution
- Mean and standard deviation
- Gaussian probability distribution; what a 1σ error means
Definitions
- μ_true: the true value of the quantity x we measure
- x_i: an observed value
- error on x_i: the difference between the observed and true values, x_i - μ_true
- All measurements have errors, so the true value is unattainable.
- We seek the best estimate of the true value, and the best estimate of the true error on x_i.
One view on reporting measurements (from the book)
- Keep only one digit of precision on the error; everything else is noise.
  Example: 410.5 ± 163.819 → (4 ± 2) × 10²
- Exception: when the first digit is 1, keep two digits.
  Example: an error of 17538 → 1.7 × 10⁴
- Round off the final value of the measurement to the significant digits of the error.
  Example: 87654 ± 345 kg → (876 ± 3) × 10² kg
- Rounding rules:
  - 6 and above: round up
  - 4 and below: round down
  - 5: if the digit to the right is even, round down; else round up
    (reason: this reduces systematic errors in rounding)
A different view on rounding
From the Particle Data Group (an authority in particle physics):
http://pdg.lbl.gov/2009/reviews/rpp2009-rev-rpp-intro.pdf
- If the three highest-order digits of the error lie between 100 and 354, round to two significant digits.
  Example: 87654 ± 345 kg → (876.5 ± 3.5) × 10² kg
- If they lie between 355 and 949, round to one significant digit.
  Example: 87654 ± 365 kg → (877 ± 4) × 10² kg
- If they lie between 950 and 999, round up to 1000 and keep two significant digits.
  Example: 87654 ± 950 kg → (87.7 ± 1.0) × 10³ kg
Bottom line: use a consistent approach to rounding that is sound and accepted in your field of study, and use common sense after all.
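As an illustration, here is a minimal sketch of these PDG rules in Python; the helper `pdg_round` and the half-up tie-breaking are my own choices, not something prescribed by the lecture or the PDG text.

```python
import math

def round_half_up(x):
    """Round to the nearest integer, halves going up (avoids Python's banker's rounding)."""
    return math.floor(x + 0.5)

def pdg_round(value, error):
    """Round (value, error) following the PDG convention above; assumes error > 0."""
    exp = math.floor(math.log10(error)) - 2
    lead = error / 10.0**exp      # the three highest-order digits, as a number in [100, 1000)
    if lead < 355:                # 100-354: keep two significant digits of the error
        lsd = exp + 1
    elif lead < 950:              # 355-949: keep one significant digit
        lsd = exp + 2
    else:                         # 950-999: round the error up to 1000, keep two digits
        error = 10.0**(exp + 3)
        lsd = exp + 2
    unit = 10.0**lsd              # decimal place of the last digit kept
    return round_half_up(value / unit) * unit, round_half_up(error / unit) * unit

print(pdg_round(87654, 345))   # (87650.0, 350.0)  i.e. (876.5 +/- 3.5) x 10^2
print(pdg_round(87654, 365))   # (87700.0, 400.0)  i.e. (877 +/- 4) x 10^2
print(pdg_round(87654, 950))   # (87700.0, 1000.0) i.e. (87.7 +/- 1.0) x 10^3
```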
Accuracy vs. precision
- Accuracy: how close the result is to the true value.
- Precision: how well the result is determined (regardless of the true value); a measure of reproducibility.
Example: μ = 30
- x = 23 ± 2: precise but inaccurate; uncorrected biases (large systematic error)
- x = 28 ± 7: accurate but imprecise; subsequent measurements will scatter around μ = 30 but cover the true value in most cases (large statistical (random) error)
An experiment should be both accurate and precise.
Statistical vs. systematic errors
Statistical (random) errors:
- describe by how much subsequent measurements scatter around the common average value
- if limited by instrumental error, use a better apparatus
- if limited by statistical fluctuations, make more measurements
Systematic errors:
- all measurements are biased in a common way
- harder to detect: faulty calibrations, a wrong model, bias by the observer
- also hard to determine (no unique recipe); estimated from analysis of the experimental conditions and techniques
- may be correlated
Parent distribution
(assume no systematic errors for now)
- parent distribution: the probability distribution of results as the number of measurements N → ∞
- however, we have only a limited number of measurements: we observe only a sample of the parent dist., a sample distribution
- the probability distribution of our measurements approaches the parent dist. only as N → ∞
- use the observed distribution to infer the parameters of the parent distribution, e.g., x̄ → μ_true as N → ∞
Notation:
- Greek letters: parameters of the parent distribution
- Roman letters: experimental estimates of the parameters of the parent dist.
Mean, median, mode
Mean of the experimental (sample) distribution:
    \bar{x} \equiv \frac{1}{N} \sum_{i=1}^{N} x_i
...and of the parent distribution:
    \mu \equiv \lim_{N \to \infty} \frac{1}{N} \sum x_i
mean = centroid = average
Median: splits the sample into two equal parts.
Mode: the most likely value (the highest probability density).
Variance
Deviation: d_i ≡ x_i - μ, for a single measurement.
Average deviation: ⟨x_i - μ⟩ = 0 by definition; one could use ⟨|x_i - μ|⟩, but absolute values are hard to deal with analytically.
Variance: instead, use the mean of the squared deviations:
    \sigma^2 \equiv \langle (x_i - \mu)^2 \rangle = \langle x^2 \rangle - \mu^2
    \sigma^2 = \lim_{N \to \infty} \left( \frac{1}{N} \sum x_i^2 \right) - \mu^2
(the mean of the squares minus the square of the mean)
Standard deviation
Standard deviation: the root mean square of the deviations,
    \sigma = \sqrt{\langle x^2 \rangle - \mu^2}
associated with the 2nd moment of the x_i distribution.
Sample variance: replace μ by x̄,
    s^2 \equiv \frac{1}{N-1} \sum (x_i - \bar{x})^2
with N - 1 instead of N because x̄ is obtained from the same data sample and not independently.
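As a quick illustration in code (numpy is my choice of tool here, not the lecture's): `ddof=1` selects exactly this N - 1 sample variance.

```python
import numpy as np

x = np.array([29.1, 30.4, 28.7, 31.2, 30.0])   # some made-up measurements

xbar = x.mean()          # sample mean, the estimate of mu
s2 = x.var(ddof=1)       # sample variance with the N - 1 denominator
s = x.std(ddof=1)        # sample standard deviation

# the same thing written out explicitly
assert np.isclose(s2, ((x - xbar) ** 2).sum() / (len(x) - 1))
print(xbar, s)
```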
So what are we after?
- We want μ. The best estimate of μ is the sample mean: μ ≈ x̄.
- The best estimate of the error on x (and thus on μ) is the square root of the sample variance, s = √(s²).
Weighted averages:
- For a discrete probability distribution P(x_i), replace (1/N) Σ x_i with Σ P(x_i) x_i and (1/N) Σ x_i² with Σ P(x_i) x_i²; by definition, the formulae above are unchanged.
Gaussian probability distribution
- unquestionably the most useful distribution in statistical analysis
- a limiting case of the Binomial and Poisson distributions (which are more fundamental; see next week)
- seems to describe the distribution of random observations for a large number of physical measurements
- so pervasive that all results of measurements are classified as Gaussian or non-Gaussian (even on Wall Street)
Meet the Gaussian
Probability density function for a random variable x, with parameters μ (the center) and σ (the width):
    p_G(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]
Differential probability: the probability to observe a value in [x, x + dx] is
    dP_G(x; \mu, \sigma) = p_G(x; \mu, \sigma)\,dx
Standard Gaussian distribution: replace (x - μ)/σ with a new variable z:
    p_G(z)\,dz = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{z^2}{2} \right) dz
This is a Gaussian centered at 0 with a width of 1. Computers calculate the Standard Gaussian first, and then stretch and shift it to make p_G(x; μ, σ).
Mean and standard deviation: by straight application of the definitions,
    mean = μ (the center), standard deviation = σ (the width).
This is what makes the Gaussian so convenient!
Interpretation of Gaussian errors
We measured x = x_0 ± σ_0; what does that tell us?
The Standard Gaussian contains a probability of 0.683 between z = -1.0 and z = +1.0, so the interval [x_0 - σ_0, x_0 + σ_0] contains the true value of x 68.3% of the time!
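A quick numerical check of that coverage, sketched with scipy's standard normal CDF:

```python
from scipy.stats import norm

# probability for the standard Gaussian to fall between z = -1 and z = +1
print(norm.cdf(1.0) - norm.cdf(-1.0))   # ~0.6827, the quoted 68.3%

# the same game at 2 and 3 sigma
print(norm.cdf(2.0) - norm.cdf(-2.0))   # ~0.9545
print(norm.cdf(3.0) - norm.cdf(-3.0))   # ~0.9973
```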
The Gaussian distribution coverage
[figure]
Introduction to Error Analysis, Part 2: Fitting
Overview
- the principle of Maximum Likelihood
- minimizing χ²
- linear regression
- fitting an arbitrary curve
Likelihood
- we observed N data points drawn from a parent population
- assume a Gaussian parent distribution (mean μ, standard deviation σ)
- the probability to observe x_i, given the true μ and σ, is
    P_i(x_i \,|\, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2 \right]
- the probability to have measured μ in this single measurement, given the observed x_i and σ_i, is called the likelihood:
    P_i(\mu \,|\, x_i, \sigma_i) = \frac{1}{\sigma_i\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x_i - \mu}{\sigma_i} \right)^2 \right]
- for N observations, the total likelihood is
    L \equiv P(\mu) = \prod_{i=1}^{N} P_i(\mu)
Principle of Maximum Likelihood
Maximizing P(μ) gives the best estimate of μ (the most likely population from which the data might have come is assumed to be the correct one).
For Gaussian individual probability distributions (σ_i = const = σ):
    L = P(\mu) = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^{N} \exp\left[ -\frac{1}{2} \sum \left( \frac{x_i - \mu}{\sigma} \right)^2 \right]
Maximizing the likelihood means minimizing the argument of the exponential:
    \chi^2 \equiv \sum \left( \frac{x_i - \mu}{\sigma} \right)^2
Example: calculating the mean
Cross-checking:
    \frac{d\chi^2}{d\mu} = \sum \frac{d}{d\mu} \left( \frac{x_i - \mu}{\sigma} \right)^2 = 0
Since the derivative is linear in μ:
    \sum (x_i - \mu) = 0 \quad\Rightarrow\quad \mu = \bar{x} = \frac{1}{N} \sum x_i
The mean really is the best estimate of the measured quantity.
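The same cross-check can be done numerically; a sketch with made-up data and an off-the-shelf minimizer (scipy, my choice, not the lecture's):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([4.8, 5.1, 5.3, 4.9, 5.2])   # made-up measurements, common sigma
sigma = 0.2

def chi2(mu):
    return np.sum(((x - mu) / sigma) ** 2)

best = minimize_scalar(chi2)
print(best.x, x.mean())   # the chi^2 minimum lands on the sample mean
```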
Linear regression
- the simplest case: a linear functional dependence
- measurements y_i; model (prediction) y = f(x) = a + bx
- at each point, y(x_i) = a + b x_i (special case: f(x_i) = const = a = μ)
- minimize
    \chi^2(a, b) = \sum \left( \frac{y_i - f(x_i)}{\sigma_i} \right)^2
- conditions for a minimum in the 2-dim parameter space:
    \frac{\partial}{\partial a} \chi^2(a, b) = 0, \qquad \frac{\partial}{\partial b} \chi^2(a, b) = 0
- this can be solved analytically, but don't do it by hand in real life (e.g., see p. 105 in Bevington)
Familiar example from day one: linear fit
A program (e.g. ROOT) will do the minimization of χ² for you.
[figure: "example analysis", a straight-line fit to data points, axes "x label [units]" vs. "y label [units]"]
The program will give you the answer for a and b.
Fitting with an arbitrary curve
- a set of measurement pairs (x_i, y_i ± σ_i) (note: no errors on x_i!)
- the theoretical model (prediction) may depend on several parameters {a_i} and doesn't have to be linear:
    y = f(x; a_0, a_1, a_2, \ldots)
- identical approach: minimize the total χ²
    \chi^2(\{a_i\}) = \sum \left( \frac{y_i - f(x_i; a_0, a_1, a_2, \ldots)}{\sigma_i} \right)^2
- the minimization proceeds numerically
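For instance, a minimal sketch in Python, where scipy's curve_fit plays the role the lecture assigns to ROOT; the data are invented, and absolute_sigma=True makes curve_fit treat the σ_i as true measurement errors rather than relative weights:

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):                 # the fit model; any parametric form works here
    return a + b * x

x  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y  = np.array([1.1, 1.9, 3.2, 3.9, 5.1])   # made-up measurements
sy = np.full_like(y, 0.2)                  # per-point errors sigma_i

popt, pcov = curve_fit(f, x, y, sigma=sy, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))   # parameter errors from the covariance matrix
print(popt, perr)               # a +/- sigma_a, b +/- sigma_b
```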
Fitting data points with errors on both x and y
- measurements x_i ± σ_xi and y_i ± σ_yi
- each term in the χ² sum gets a correction from the σ_xi contribution:
    \left( \frac{y_i - f(x_i)}{\sigma_{y_i}} \right)^2 \;\to\; \frac{(y_i - f(x_i))^2}{\sigma_{y_i}^2 + \left[ \frac{f(x_i + \sigma_{x_i}) - f(x_i - \sigma_{x_i})}{2} \right]^2}
[figure: a curve f(x) with a measured point, showing how the error on x maps through f(x ± σ_x) into a contribution to the y-χ²]
Behavior of the χ² function near the minimum
When N is large, χ²(a_0, a_1, a_2, ...) becomes quadratic in each parameter near the minimum:
    \chi^2 = \frac{(a_j - \hat{a}_j)^2}{\sigma_j^2} + C
known as the parabolic approximation.
- C tells us about the goodness of the overall fit (it is a function of all the uncertainties and of the other {a_k}, k ≠ j)
- a_j = â_j ± σ_j ⇔ Δχ² = 1; valid in all cases!
- the parabolic error is set by the curvature at the minimum:
    \frac{\partial^2 \chi^2}{\partial a_j^2} = \frac{2}{\sigma_j^2}
χ² shapes near the minimum: examples
[figure: two example χ² curves vs. a parameter a, one with better errors but a worse fit, the other with asymmetric errors but a better fit]
Two methods for obtaining the error
1. from the curvature at the minimum:
    \sigma_j^2 = 2 \left( \frac{\partial^2 \chi^2}{\partial a_j^2} \right)^{-1}
2. scan each parameter around the minimum, with the others fixed, until Δχ² = 1 is reached
Method #1 is much faster to calculate; method #2 is more generic and works even when the shape of χ² near the minimum is not exactly parabolic.
The scan of Δχ² = 1 defines a so-called one-sigma contour. It contains the truth with 68.3% probability (assuming Gaussian errors). A sketch of method #2 follows below.
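A sketch of method #2 for the one-parameter mean example used earlier (the Δχ² = 1 crossings are located with a root finder; brentq is my choice):

```python
import numpy as np
from scipy.optimize import brentq

x = np.array([4.8, 5.1, 5.3, 4.9, 5.2])
sigma = 0.2

def chi2(mu):
    return np.sum(((x - mu) / sigma) ** 2)

mu_hat = x.mean()               # the chi^2 minimum for this simple model
chi2_min = chi2(mu_hat)

def excess(mu):                 # zero where chi^2 has risen by exactly 1
    return chi2(mu) - chi2_min - 1.0

lo = brentq(excess, mu_hat - 1.0, mu_hat)
hi = brentq(excess, mu_hat, mu_hat + 1.0)
print(mu_hat, mu_hat - lo, hi - mu_hat)   # both sides equal sigma/sqrt(N) here
```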
What to remember
In the end the fit will be done for you by a program:
- you supply the data, e.g. (x_i, y_i), and the fit model, e.g. y = f(x; a_1, a_2, ...)
- the program returns a_1 = â_1 ± σ_1, a_2 = â_2 ± σ_2, ..., and a plot with the model line through the points
You need to understand what is being done; in more complex cases you may need to go deep into the code.
Introduction to Error Analysis, Part 3: Combining measurements
Overview
- propagation of errors
- covariance
- weighted average and its error
- error on the sample mean and sample standard deviation
Propagation of Errors
- x is a known function of u, v, ...: x = f(u, v, ...)
- assume that the most probable value of x is x̄ = f(ū, v̄, ...)
- x̄ is the mean of the x_i = f(u_i, v_i, ...)
- by the definition of variance,
    \sigma_x^2 = \lim_{N \to \infty} \frac{1}{N} \sum (x_i - \bar{x})^2
- expand (x_i - x̄) in a Taylor series:
    x_i - \bar{x} \approx (u_i - \bar{u}) \frac{\partial x}{\partial u} + (v_i - \bar{v}) \frac{\partial x}{\partial v} + \cdots
Variance of x
    \sigma_x^2 \approx \lim_{N \to \infty} \frac{1}{N} \sum \left[ (u_i - \bar{u}) \frac{\partial x}{\partial u} + (v_i - \bar{v}) \frac{\partial x}{\partial v} + \cdots \right]^2
    = \lim_{N \to \infty} \frac{1}{N} \sum \left[ (u_i - \bar{u})^2 \left( \frac{\partial x}{\partial u} \right)^2 + (v_i - \bar{v})^2 \left( \frac{\partial x}{\partial v} \right)^2 + 2 (u_i - \bar{u})(v_i - \bar{v}) \frac{\partial x}{\partial u} \frac{\partial x}{\partial v} + \cdots \right]
    = \sigma_u^2 \left( \frac{\partial x}{\partial u} \right)^2 + \sigma_v^2 \left( \frac{\partial x}{\partial v} \right)^2 + 2 \sigma_{uv}^2 \frac{\partial x}{\partial u} \frac{\partial x}{\partial v} + \cdots
This is the error propagation equation.
σ_uv² is the COvariance; it describes the correlation between the errors on u and v.
For uncorrelated errors, σ_uv² → 0.
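A numerical sketch of this equation, with the partial derivatives taken by central differences; the function and the numbers are invented for illustration:

```python
import numpy as np

def propagate(f, u, v, su, sv, suv2=0.0, eps=1e-6):
    """Error on x = f(u, v) from the first-order error propagation equation."""
    dxdu = (f(u + eps, v) - f(u - eps, v)) / (2 * eps)   # dx/du
    dxdv = (f(u, v + eps) - f(u, v - eps)) / (2 * eps)   # dx/dv
    var = su**2 * dxdu**2 + sv**2 * dxdv**2 + 2 * suv2 * dxdu * dxdv
    return np.sqrt(var)

# x = u*v with uncorrelated 1% errors: relative errors add in quadrature to ~1.4%
f = lambda u, v: u * v
print(propagate(f, 10.0, 20.0, 0.1, 0.2))   # ~2.83 on x = 200
```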
Examples
- x = u + a, where a = const. Then ∂x/∂u = 1, so
    \sigma_x = \sigma_u
- x = au + bv, where a, b = const:
    \sigma_x^2 = a^2 \sigma_u^2 + b^2 \sigma_v^2 + 2ab\,\sigma_{uv}^2
- the correlation can be negative, i.e. σ_uv² < 0: if an error on u is counterbalanced by a proportional error on v, σ_x can get very small!
More examples
- x = auv:
    \frac{\partial x}{\partial u} = av, \qquad \frac{\partial x}{\partial v} = au
    \sigma_x^2 = (av\,\sigma_u)^2 + (au\,\sigma_v)^2 + 2 a^2 uv\,\sigma_{uv}^2
    \frac{\sigma_x^2}{x^2} = \frac{\sigma_u^2}{u^2} + \frac{\sigma_v^2}{v^2} + 2 \frac{\sigma_{uv}^2}{uv}
- x = au/v:
    \frac{\sigma_x^2}{x^2} = \frac{\sigma_u^2}{u^2} + \frac{\sigma_v^2}{v^2} - 2 \frac{\sigma_{uv}^2}{uv}
- etc., etc.
Weighted average
From Part 2: calculate the mean by minimizing
    \chi^2 = \sum \left( \frac{x_i - \mu}{\sigma_i} \right)^2
with the minimum at dχ²/dμ = 0, but now σ_i ≠ const:
    0 = \sum \frac{x_i - \mu}{\sigma_i^2} = \sum \frac{x_i}{\sigma_i^2} - \mu \sum \frac{1}{\sigma_i^2}
The so-called weighted average is therefore
    \mu = \frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2}
Each measurement is weighted by 1/σ_i²!
Error on the weighted average
N points contribute to the weighted average; straight application of the error propagation equation gives
    \sigma_\mu^2 = \sum_i \sigma_i^2 \left( \frac{\partial \mu}{\partial x_i} \right)^2
with
    \frac{\partial \mu}{\partial x_i} = \frac{\partial}{\partial x_i} \left[ \frac{\sum_j x_j/\sigma_j^2}{\sum_k 1/\sigma_k^2} \right] = \frac{1/\sigma_i^2}{\sum_k 1/\sigma_k^2}
Putting both together:
    \sigma_\mu^2 = \sum_i \sigma_i^2 \left[ \frac{1/\sigma_i^2}{\sum_k 1/\sigma_k^2} \right]^2 = \frac{1}{\sum_k 1/\sigma_k^2}
Example of a weighted average
Two measurements:
    x_1 = 25.0 \pm 1.0, \qquad x_2 = 20.0 \pm 5.0
Error:
    \sigma^2 = \frac{1}{1/1^2 + 1/5^2} = \frac{25}{26} \approx 0.96, \qquad \sigma \approx 1.0
Weighted average:
    \bar{x} = \sigma^2 \left( \frac{25}{1^2} + \frac{20}{5^2} \right) = \frac{25}{26} \cdot \frac{25 \cdot 25 + 20 \cdot 1}{25} = 24.8
Result: x = 24.8 ± 1.0. Moral: x_2 practically doesn't matter!
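The same example in code (a minimal sketch; the helper name is mine):

```python
import numpy as np

def weighted_average(x, sigma):
    """Weighted average and its error, with weights w_i = 1/sigma_i^2."""
    x, sigma = np.asarray(x), np.asarray(sigma)
    w = 1.0 / sigma**2
    return np.sum(w * x) / np.sum(w), 1.0 / np.sqrt(np.sum(w))

print(weighted_average([25.0, 20.0], [1.0, 5.0]))   # (24.81, 0.98): x2 barely matters
```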
Error on the mean
- N measurements from the same parent population (μ, σ).
- From Part 1: the sample mean and sample standard deviation are the best estimators of the parent population parameters.
- But taking more measurements still gives the same σ: our knowledge of the shape of the parent population improves, and thus of the original true error σ on each point.
- But how well do we know the true value (i.e. μ)?
- For N points from the same population with error σ, the weighted-average formula gives:
    \frac{1}{\sigma_\mu^2} = \sum \frac{1}{\sigma^2} = \frac{N}{\sigma^2} \quad\Rightarrow\quad \sigma_\mu = \frac{\sigma}{\sqrt{N}} \approx \frac{s}{\sqrt{N}}
This is the standard deviation of the mean, or the standard error.
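In code this is one line, and scipy happens to provide it directly (a sketch reusing the earlier made-up data):

```python
import numpy as np
from scipy.stats import sem

x = np.array([29.1, 30.4, 28.7, 31.2, 30.0])

std_err = x.std(ddof=1) / np.sqrt(len(x))   # s / sqrt(N)
assert np.isclose(std_err, sem(x))          # scipy.stats.sem computes the same thing
print(x.mean(), std_err)
```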
Example: Lightness vs. Lycopene content, scatter plot
[figure: scatter plot of Lycopene content vs. Lightness ("lyc:Light")]
Example: Lightness vs. Lycopene content: RMS as the error
[figure: "Lightness vs Lycopene content -- spread option", Lycopene content vs. Lightness with the RMS spread drawn as the error bars]
The points don't scatter enough: the error bars are too large!
Example: Lightness vs. Lycopene content: error on the mean
[figure: "Lightness vs Lycopene content", the same profile with the error on the mean drawn as the error bars]
This looks much better!
Introduction to Error Analysis, Part 4: Dealing with non-Gaussian cases
Overview
- binomial p.d.f.
- Poisson p.d.f.
Binomial probability density function
- a random process with exactly two outcomes (a Bernoulli process)
- the probability of one outcome ("success") is p
- the probability of exactly r successes (0 ≤ r ≤ N) in N independent trials, where the order of successes and failures doesn't matter, is the binomial p.d.f.:
    f(r; N, p) = \frac{N!}{r!\,(N-r)!}\, p^r (1-p)^{N-r}
- mean: Np; variance: Np(1 - p)
- if r and s are binomially distributed with (N_r, p) and (N_s, p), then t = r + s is binomially distributed with (N_r + N_s, p)
Examples of binomial probability
The binomial distribution shows up whenever the data exhibit binary properties:
- an event passes or fails; the efficiency (an important experimental parameter) is defined as ε = N_pass/N_total (a common error estimate for it is sketched below)
- particles in a sample are positive or negative
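Since N_pass is binomial with variance Np(1 - p), a common estimate of the error on the efficiency is σ_ε = √(ε(1 - ε)/N_total); here is a minimal sketch (note that this simple formula misbehaves when ε is at or near 0 or 1):

```python
import numpy as np

def efficiency(n_pass, n_total):
    """Efficiency and its binomial error estimate sqrt(eps*(1-eps)/N)."""
    eps = n_pass / n_total
    return eps, np.sqrt(eps * (1.0 - eps) / n_total)

print(efficiency(800, 1000))   # (0.8, ~0.013)
```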
Poisson probability density function
- the probability of finding exactly n events in a given interval of x (e.g., of space or time)
- the events are independent of each other and of x
- the average rate is μ events per interval
- the Poisson p.d.f. (μ > 0):
    f(n; \mu) = \frac{\mu^n e^{-\mu}}{n!}
- mean: μ; variance: μ
- the limiting case of the binomial for many trials with low probability: p → 0 and N → ∞ while Np = μ
- the Poisson approaches a Gaussian for large μ (checked numerically below)
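A quick check of the mean = variance property and of the Gaussian limit, sketched with scipy.stats:

```python
import numpy as np
from scipy.stats import poisson, norm

print(poisson.mean(5.0), poisson.var(5.0))   # both equal mu = 5

# Gaussian limit: for large mu the Poisson p.m.f. tracks a Gaussian p.d.f.
mu = 100.0
n = np.arange(60, 141)
gap = np.abs(poisson.pmf(n, mu) - norm.pdf(n, loc=mu, scale=np.sqrt(mu)))
print(gap.max())   # small compared to the peak value of ~0.04
```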
Examples of Poisson probability
Shows up in counting measurements with a small number of events:
- the number of watermelons with circumference c ∈ [19.5, 20.5] in.
- nuclear spectroscopy in the tails of a distribution (e.g., at high channel numbers)
- the Rutherford experiment
Fitting a histogram with Poisson-distributed content
Poisson data require special treatment in fitting!
- in a histogram, the i-th channel contains n_i entries
- for large n_i, P(n_i) is Gaussian, with the Poisson error σ = √μ approximated by √(n_i)
  (WARNING: this is what ROOT uses by default!)
- a Gaussian p.d.f. implies symmetric errors, i.e. an equal probability to fluctuate up or down
- so in a χ²-minimizing fit, μ (the true value) is taken to be equally likely to lie above and below the data!
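A sketch of why this matters: for a constant model, the χ² fit with σ_i = √(n_i) has an analytic minimum at the harmonic mean of the bin contents, which sits systematically below the Poisson maximum-likelihood answer (the arithmetic mean). The toy numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
n = rng.poisson(5.0, size=10000)   # bin contents of a flat histogram, true mu = 5
n = n[n > 0]                       # sigma = sqrt(n) cannot handle empty bins at all

chi2_fit = len(n) / np.sum(1.0 / n)   # analytic chi^2 minimum: the harmonic mean
ml_fit = n.mean()                     # Poisson maximum likelihood: the arithmetic mean

print(chi2_fit, ml_fit)   # the chi^2 answer is biased low; the ML answer is ~5
```

Dropping the empty bins is itself a distortion; a proper Poisson likelihood fit handles them naturally.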
Comparing Poisson and Gaussian p.d.f.
[figure: two panels vs. the expected number of events x; left: the probability Prob(5|x); right: the cumulative probability]
- 5 observed events
- dashed: Gaussian centered at 5 with σ = √5; solid: Poisson with μ = 5
- left: probability density functions (note: the Gaussian extends below 0!)
- right: confidence integrals (the p.d.f. integrated from 5)