High School Mathematics at the Research Frontier
Don Lincoln
Fermilab
http://www-d0.fnal.gov/~lucifer/PowerPoint/HSMath.ppt
What is Particle Physics?
High Energy Particle Physics is a study of the smallest pieces of matter.
It investigates (among other things) the nature of the universe immediately after the Big Bang.
It also explores physics at temperatures not common for the past 15 billion years (or so).
It’s a lot of fun.
Now (15 billion years)
Stars form (1 billion years)
Atoms form (300,000 years)
Nuclei form (180 seconds)
Nucleons form (10^-10 seconds)
4×10^-12 seconds
??? (Before that)
DØ Detector: Run II
[Figure: the DØ detector, with dimensions marked 30’, 30’, and 50’]
• Weighs 5000 tons
• Can inspect 3,000,000 collisions/second
• Will record 50 collisions/second
• Records approximately 10,000,000 bytes/second
• Will record 10^15 (1,000,000,000,000,000) bytes in the next run (1 PetaByte).
Remarkable Photos
This collision is the most violent ever recorded. It required that particles hit within 10^-19 m, or 1/10,000 the size of a proton.
In this collision, a top and anti-top quark were created, helping establish their existence.
How Do You Measure Energy?
• Go to Walmart and buy an energy detector?
• Ask the guy sitting in the next seat over and hope the teacher doesn’t notice?
• Ignore the problem and spend the day on the beach?
• Design and build your equipment and calibrate it yourself.
Build an Electronic Scale
150 lbs
?? Volts
Volts are a unit of electricity
Car battery = 12 Volts
Walkman battery = 1.5 Volts
Calibrating the Scale
120 lb girl = 9 V (120, 9)
180 lb guy = 12 V (180, 12)
[Figure: "Scale Calibration" plot of Voltage (V) vs. Weight (lb)]
Make a line, solve slope and intercept
y = m x + b
Voltage = (0.05) weight + 3
Implies
Weight = 20 (Voltage – 3)
This implies that you can know the voltage for any weight.
For instance, a weight of 60 lbs will give a voltage of 6 V.
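The slope-and-intercept step can be sketched in a few lines of Python. This is only an illustration, using the two calibration points (120 lb, 9 V) and (180 lb, 12 V); the function names are made up for this example and are not part of the original talk.

```python
# Two calibration points from the slides: (weight in lb, voltage in V).
p1 = (120.0, 9.0)
p2 = (180.0, 12.0)

def line_through(p, q):
    """Return slope m and intercept b of the straight line through p and q."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b

def voltage_from_weight(weight, m, b):
    """Forward calibration: predicted voltage for a given weight."""
    return m * weight + b

def weight_from_voltage(voltage, m, b):
    """Inverse calibration: weight that would produce a given voltage."""
    return (voltage - b) / m

m, b = line_through(p1, p2)
print("slope:", m, "intercept:", b)
print("voltage at 60 lb:", voltage_from_weight(60.0, m, b))
print("weight at 9 V:", weight_from_voltage(9.0, m, b))
```

The inverse function is the same rearrangement as Weight = 20 (Voltage – 3), written for a general slope and intercept.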
Now you have a calibrated scale.
(Or do you?)
Issues with calibrating.
[Figure: "Scale Calibration" plot of Voltage (V) vs. Weight (lb), with four colored curves through the two calibration points]
Fit      Value at 60 lb (V)
Purple   6
Blue     10
Red      -70
Green    11.5
All four of these functions go through the two calibration points. Yet all give very different predictions for a weight of 60 lbs.
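To see how this can happen, here is a small Python sketch. The actual curves on the slide are not known, so the family used here (the straight line plus a term that vanishes at 120 lb and 180 lb) and the curvature values are assumptions, chosen only so that the 60 lb predictions reproduce the four values quoted above.

```python
def line(w):
    """The straight-line calibration: Voltage = 0.05 * weight + 3."""
    return 0.05 * w + 3.0

def bent(w, c):
    """Line plus a term that is zero at both calibration weights (120 and 180 lb)."""
    return line(w) + c * (w - 120.0) * (w - 180.0)

# Hypothetical curvatures (not from the slides). At 60 lb the extra term is
# c * (-60) * (-120) = 7200 * c, so these give 6, 10, -70 and 11.5 volts.
curvatures = {
    "Purple": 0.0,
    "Blue": 4.0 / 7200.0,
    "Red": -76.0 / 7200.0,
    "Green": 5.5 / 7200.0,
}

for name, c in curvatures.items():
    print(name, bent(120.0, c), bent(180.0, c), bent(60.0, c))
```

Every curve passes through both calibration points, so two points alone cannot tell them apart.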
What can we do to resolve this?
Approach: Take More Data
[Figure: "Scale Calibration" plot of Voltage (V) vs. Weight (lb). Label: Easy]
[Figure: "Scale Calibration" plot of Voltage (V) vs. Weight (lb). Label: Hard]
Solution: Pick Two Points
[Figure: "Scale Calibration" plot of Voltage (V) vs. Weight (lb)]
Dreadful representation of data
Solution: Pick Two Points
[Figure: "Scale Calibration" plot of Voltage (V) vs. Weight (lb)]
Better, but still poor, representation of data
Why don’t all the data lie on a line?
• Error associated with each calibration point.
• Must account for that in data analysis.
• How do we determine errors?
• What if some points have larger errors than others? How do we deal with this?
First Retake Calibration Data
• Remeasure the 120 lb point
• Note that the data doesn’t always repeat.
• You get voltages near the 9 Volt ideal, but with substantial variation.
• From this, estimate the error.
Attempt Voltage
1 9.26
2 9.35
3 9.08
4 8.72
5 8.58
6 9.02
7 9.25
8 8.86
9 8.94
10 9.12
11 8.72
12 9.33
Data
[Figure: histogram "Calibration Data for 120 lb", number of measurements (#) vs. Voltages]
Bin     Frequency
6       0
6.5     1
7       2
7.5     7
8       9
8.5     16
9       18
9.5     14
10      16
10.5    9
11      5
11.5    3
12      0
More    0
While the data clusters around 9 volts, it has a range. How we estimate the error is somewhat technical, but we can say
9 ± 1 Volts
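One simple way to turn the histogram into an error estimate is to take the standard deviation of the binned readings. The Python sketch below does that; it is not necessarily the method used for the talk. It assumes each bin label is the upper edge of a 0.5 V wide bin (a common spreadsheet convention), so the bin center is taken as the label minus 0.25 V.

```python
import math

# Histogram of the repeated 120 lb measurements (bin label, frequency).
bin_labels = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0]
counts = [0, 1, 2, 7, 9, 16, 18, 14, 16, 9, 5, 3, 0]

# Assumption: labels are upper bin edges, bins are 0.5 V wide.
BIN_WIDTH = 0.5
centers = [label - BIN_WIDTH / 2 for label in bin_labels]

n = sum(counts)
mean = sum(c * k for c, k in zip(centers, counts)) / n
# Sample variance of the binned readings (n - 1 in the denominator).
variance = sum(k * (c - mean) ** 2 for c, k in zip(centers, counts)) / (n - 1)
spread = math.sqrt(variance)

print(f"n = {n}, mean = {mean:.2f} V, spread = {spread:.2f} V")
```

Under these assumptions the mean lands near 9 V and the spread near 1 V, in line with the quoted 9 ± 1 Volts.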
Redo for All Calibration Points
Weight (lb)   Voltage (V)
60            4.2 ± 0.5
120           9.4 ± 1.0
150           10 ± 0.7
180           13.2 ± 1.2
300           13.2 ± 8.4
[Figure: "Scale Calibration Data" plots of Voltage vs. Weight]
Both lines go through the data.
How to pick the best one?
State the Problem
• How to use mathematical techniques to determine which line is best?
• How to estimate the amount of variability allowed in the found slope and intercept that will also allow for a reasonable fit?
• Answer will be m ± δm and b ± δb
The Problem
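• Given a set of five data points, denoted (x_i, y_i, σ_i) [i.e. weight, voltage, uncertainty in voltage]
• Also given a fit function f(x_i) = m x_i + b
• Define
\[
\chi^2 = \sum_{i=1}^{5} \frac{[y_i - f(x_i)]^2}{\sigma_i^2}
       = \frac{[y_1 - f(x_1)]^2}{\sigma_1^2} + \frac{[y_2 - f(x_2)]^2}{\sigma_2^2} + \cdots + \frac{[y_5 - f(x_5)]^2}{\sigma_5^2}
\]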
Looks Intimidating!
Forget the math, what does it mean?
[Figure: "Scale Calibration Data" plot of Voltage vs. Weight, with a zoomed view (Weight 170 to 190, Voltage 10 to 18) marking x_i, y_i, f(x_i), the separation y_i - f(x_i), and the error bar σ_i]
\[
\frac{[y_i - f(x_i)]^2}{\sigma_i^2}
\]
Each term in the sum is simply the separation between the data and fit in units of error bars. In this case, the separation is about 3.
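More Translation
So
\[
\chi^2 = \sum_{i=1}^{5} \frac{[y_i - f(x_i)]^2}{\sigma_i^2}
\]
Means
\[
\begin{aligned}
\chi^2 ={} & (\text{separation 1 in error bars})^2 \\
        & + (\text{separation 2 in error bars})^2 \\
        & + (\text{separation 3 in error bars})^2 \\
        & + (\text{separation 4 in error bars})^2 \\
        & + (\text{separation 5 in error bars})^2
\end{aligned}
\]
Since f(x_i) = m x_i + b, find m and b that minimize the χ².
\[
\begin{aligned}
\chi^2 ={} & \frac{[y_1 - (m x_1 + b)]^2}{\sigma_1^2} + \frac{[y_2 - (m x_2 + b)]^2}{\sigma_2^2} + \frac{[y_3 - (m x_3 + b)]^2}{\sigma_3^2} \\
        & + \frac{[y_4 - (m x_4 + b)]^2}{\sigma_4^2} + \frac{[y_5 - (m x_5 + b)]^2}{\sigma_5^2}
\end{aligned}
\]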
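As a rough illustration of how the χ² picks between candidate lines, the sketch below evaluates it for two trial lines against the five calibration points from the "Redo for All Calibration Points" table. The two lines compared here (the original two-point line and the best-fit values quoted later in the talk) are just convenient examples, and the function name is an assumption of this sketch.

```python
# Calibration data: (weight in lb, voltage in V, uncertainty in voltage in V).
DATA = [
    (60.0, 4.2, 0.5),
    (120.0, 9.4, 1.0),
    (150.0, 10.0, 0.7),
    (180.0, 13.2, 1.2),
    (300.0, 13.2, 8.4),
]

def chi_square(m, b, points=DATA):
    """Sum over points of (separation between data and line, in error bars) squared."""
    return sum(((y - (m * x + b)) / sigma) ** 2 for x, y, sigma in points)

chi2_two_point_line = chi_square(0.05, 3.0)        # line through (120, 9) and (180, 12)
chi2_quoted_fit = chi_square(0.068781, 0.161967)   # m and b quoted later in the slides

print(f"two-point line: chi2 = {chi2_two_point_line:.2f}")
print(f"quoted fit:     chi2 = {chi2_quoted_fit:.3f}")
```

The line with the smaller χ² sits closer to the data once each point's error bar is taken into account.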
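Approach
\[
\chi^2 = \sum_{i=1}^{5} \frac{[y_i - m x_i - b]^2}{\sigma_i^2}
\]
Find m and b that minimize χ²
Calculus
\[
\frac{\partial \chi^2}{\partial m} = 0 = \sum_{i=1}^{5} \frac{2\,[y_i - (m x_i + b)]\,(-x_i)}{\sigma_i^2}
\]
\[
\frac{\partial \chi^2}{\partial b} = 0 = \sum_{i=1}^{5} \frac{2\,[y_i - (m x_i + b)]\,(-1)}{\sigma_i^2}
\]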
Back to algebra
Note the common term (-2). Factor it out.
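Approach #2
\[
0 = \sum_{i=1}^{5} \frac{[y_i - (m x_i + b)]\,x_i}{\sigma_i^2}, \qquad
0 = \sum_{i=1}^{5} \frac{[y_i - m x_i - b]}{\sigma_i^2}
\]
Now distribute the terms
\[
0 = \sum_{i=1}^{5} \frac{[x_i y_i - m x_i^2 - b x_i]}{\sigma_i^2}, \qquad
0 = \sum_{i=1}^{5} \frac{[y_i - m x_i - b]}{\sigma_i^2}
\]
Rewrite as separate sums
\[
0 = \sum_{i=1}^{5} \frac{x_i y_i}{\sigma_i^2} - \sum_{i=1}^{5} \frac{m x_i^2}{\sigma_i^2} - \sum_{i=1}^{5} \frac{b x_i}{\sigma_i^2}, \qquad
0 = \sum_{i=1}^{5} \frac{y_i}{\sigma_i^2} - \sum_{i=1}^{5} \frac{m x_i}{\sigma_i^2} - \sum_{i=1}^{5} \frac{b}{\sigma_i^2}
\]
Move terms to LHS
\[
\sum_{i=1}^{5} \frac{x_i y_i}{\sigma_i^2} = \sum_{i=1}^{5} \frac{m x_i^2}{\sigma_i^2} + \sum_{i=1}^{5} \frac{b x_i}{\sigma_i^2}, \qquad
\sum_{i=1}^{5} \frac{y_i}{\sigma_i^2} = \sum_{i=1}^{5} \frac{m x_i}{\sigma_i^2} + \sum_{i=1}^{5} \frac{b}{\sigma_i^2}
\]
Factor out m and b terms
\[
\sum_{i=1}^{5} \frac{x_i y_i}{\sigma_i^2} = m \sum_{i=1}^{5} \frac{x_i^2}{\sigma_i^2} + b \sum_{i=1}^{5} \frac{x_i}{\sigma_i^2}, \qquad
\sum_{i=1}^{5} \frac{y_i}{\sigma_i^2} = m \sum_{i=1}^{5} \frac{x_i}{\sigma_i^2} + b \sum_{i=1}^{5} \frac{1}{\sigma_i^2}
\]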
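Approach #3
Notice that this is simply two equations with two unknowns. Very similar to
\[
A x + B y = C, \qquad D x + E y = F
\]
You know how to solve this
\[
x = \frac{CE - BF}{AE - BD}, \qquad y = \frac{AF - CD}{AE - BD}
\]
\[
m = \frac{\left[\sum_{i=1}^{5} \frac{x_i y_i}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{1}{\sigma_i^2}\right] - \left[\sum_{i=1}^{5} \frac{x_i}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{y_i}{\sigma_i^2}\right]}
         {\left[\sum_{i=1}^{5} \frac{x_i^2}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{1}{\sigma_i^2}\right] - \left[\sum_{i=1}^{5} \frac{x_i}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{x_i}{\sigma_i^2}\right]}
\]
\[
b = \frac{\left[\sum_{i=1}^{5} \frac{x_i^2}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{y_i}{\sigma_i^2}\right] - \left[\sum_{i=1}^{5} \frac{x_i y_i}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{x_i}{\sigma_i^2}\right]}
         {\left[\sum_{i=1}^{5} \frac{x_i^2}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{1}{\sigma_i^2}\right] - \left[\sum_{i=1}^{5} \frac{x_i}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{x_i}{\sigma_i^2}\right]}
\]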
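The "two equations with two unknowns" step can be written as a tiny Python function. This is a generic sketch of the textbook solution for A x + B y = C and D x + E y = F; the function name and the example numbers are made up for illustration.

```python
def solve_2x2(A, B, C, D, E, F):
    """Solve A*x + B*y = C and D*x + E*y = F for (x, y)."""
    det = A * E - B * D  # the common denominator AE - BD
    if det == 0:
        raise ValueError("the two equations do not have a unique solution")
    x = (C * E - B * F) / det
    y = (A * F - C * D) / det
    return x, y

# Example: 2x + y = 5 and x + 3y = 10.
x, y = solve_2x2(2.0, 1.0, 5.0, 1.0, 3.0, 10.0)
print(x, y)
```

For the fit, x plays the role of m, y plays the role of b, and A through F are the sums over the data points.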
Substitution
Note the common term in the denominator
ohmigod….
yougottabekiddingme
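i   x_i   y_i    σ_i   x_i y_i / σ_i²
1   60    4.2    0.5   1008.0
2   120   9.4    1.0   1128.0
3   150   10     0.7   3061.2
4   180   13.2   1.2   1650.0
5   300   13.2   8.4     56.1
                 Sum   6903.3
\[
\sum_{i=1}^{5} \frac{x_i y_i}{\sigma_i^2}
= \frac{x_1 y_1}{\sigma_1^2} + \frac{x_2 y_2}{\sigma_2^2} + \frac{x_3 y_3}{\sigma_3^2} + \frac{x_4 y_4}{\sigma_4^2} + \frac{x_5 y_5}{\sigma_5^2}
= 1008 + 1128 + 3061.2 + 1650 + 56.1 = 6903.3
\]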
So each number isn’t bad
Approach #4
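\[
m = \frac{\left[\sum_{i=1}^{5} \frac{x_i y_i}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{1}{\sigma_i^2}\right] - \left[\sum_{i=1}^{5} \frac{x_i}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{y_i}{\sigma_i^2}\right]}
         {\left[\sum_{i=1}^{5} \frac{x_i^2}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{1}{\sigma_i^2}\right] - \left[\sum_{i=1}^{5} \frac{x_i}{\sigma_i^2}\right]^2}
\]
\[
b = \frac{\left[\sum_{i=1}^{5} \frac{x_i^2}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{y_i}{\sigma_i^2}\right] - \left[\sum_{i=1}^{5} \frac{x_i y_i}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{x_i}{\sigma_i^2}\right]}
         {\left[\sum_{i=1}^{5} \frac{x_i^2}{\sigma_i^2}\right]\left[\sum_{i=1}^{5} \frac{1}{\sigma_i^2}\right] - \left[\sum_{i=1}^{5} \frac{x_i}{\sigma_i^2}\right]^2}
\]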
Inserting and evaluating, we get
m = 0.068781, b = 0.161967
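The whole evaluation fits in a short Python sketch. It builds the five weighted sums from the "Redo for All Calibration Points" table and plugs them into the formulas for m and b; the variable and function names are choices made for this example only.

```python
# Calibration data: (weight in lb, voltage in V, uncertainty in voltage in V).
DATA = [
    (60.0, 4.2, 0.5),
    (120.0, 9.4, 1.0),
    (150.0, 10.0, 0.7),
    (180.0, 13.2, 1.2),
    (300.0, 13.2, 8.4),
]

def weighted_line_fit(points):
    """Return (m, b) that minimize chi-square for y = m*x + b."""
    s1 = sum(1.0 / s ** 2 for _, _, s in points)     # sum of 1/sigma^2
    sx = sum(x / s ** 2 for x, _, s in points)       # sum of x/sigma^2
    sy = sum(y / s ** 2 for _, y, s in points)       # sum of y/sigma^2
    sxx = sum(x * x / s ** 2 for x, _, s in points)  # sum of x^2/sigma^2
    sxy = sum(x * y / s ** 2 for x, y, s in points)  # sum of x*y/sigma^2
    det = sxx * s1 - sx ** 2                         # the common denominator
    m = (sxy * s1 - sx * sy) / det
    b = (sxx * sy - sxy * sx) / det
    return m, b

m_fit, b_fit = weighted_line_fit(DATA)
print(f"m = {m_fit:.6f}, b = {b_fit:.6f}")
```

With the table's numbers the result should land very close to the quoted m and b.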
What about significant figures?
\[
\begin{aligned}
\chi^2 ={} & (\text{separation 1 in error bars})^2 + (\text{separation 2 in error bars})^2 + (\text{separation 3 in error bars})^2 \\
        & + (\text{separation 4 in error bars})^2 + (\text{separation 5 in error bars})^2 \\
       ={} & 0.0316 + 0.9689 + 0.4685 + 0.3001 + 0.8178 = 2.587
\end{aligned}
\]
2nd and 5th terms give biggest contribution to χ² = 2.587
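A short Python sketch can list each point's contribution and pick out the largest ones. It uses the quoted (rounded) m and b together with the "Redo for All Calibration Points" table, so the last digits may differ slightly from a calculation with unrounded values.

```python
# Calibration data: (weight in lb, voltage in V, uncertainty in voltage in V).
DATA = [
    (60.0, 4.2, 0.5),
    (120.0, 9.4, 1.0),
    (150.0, 10.0, 0.7),
    (180.0, 13.2, 1.2),
    (300.0, 13.2, 8.4),
]
M_QUOTED, B_QUOTED = 0.068781, 0.161967  # best-fit values quoted in the slides

# One term per data point: (separation between data and fit, in error bars) squared.
terms = [((y - (M_QUOTED * x + B_QUOTED)) / s) ** 2 for x, y, s in DATA]
total = sum(terms)

# Point numbers (1-based) ordered from largest to smallest contribution.
ranking = sorted(range(1, len(terms) + 1), key=lambda i: terms[i - 1], reverse=True)

for i, t in enumerate(terms, start=1):
    print(f"term {i}: {t:.4f}")
print(f"chi2 = {total:.3f}, largest contributions from points {ranking[:2]}")
```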
Best Fit
[Figure: "Scale Calibration Data" plot of Voltage vs. Weight]
Best vs. Good
Best doesn’t always mean good
Goodness of Fit
[Figure: "Scale Calibration Data" plot of Voltage vs. Weight]
Our old buddy, in which the data and the fit seem to agree
[Figure: "Scale Calibration Data" plot of Voltage vs. Weight]
A new hypothetical set of data with the best line (as determined by the same χ² method) overlaid
New Important Concept
• If you have 2 data points and a polynomial of order 1 (line, parameters m & b), then your line will exactly go through your data
• If you have 3 data points and a polynomial of order 2 (parabola, parameters A, B & C), then your curve will exactly go through your data
• To actually test your fit, you need more data than the curve can naturally accommodate.
• This is the so-called degrees of freedom.
Degrees of Freedom (dof )
• The dof of any problem is defined to be the number of data points minus the number of parameters.
• In our case,
• dof = 5 – 2 = 3
• Need to define the χ²/dof
Goodness of Fit
[Figure: two "Scale Calibration Data" plots of Voltage vs. Weight]
χ²/dof = 2.587/(5-2) = 0.862 (our calibration data)
χ²/dof = 22.52/(5-2) = 7.51 (new hypothetical data)
χ²/dof near 1 means the fit is good.
Too high → bad fit
Too small → errors were overestimated
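The arithmetic is small enough to sketch in Python. The function name is an assumption of this example; the two χ² values are the ones quoted above.

```python
def chi2_per_dof(chi2, n_points, n_params):
    """Return chi-square divided by the degrees of freedom (points minus parameters)."""
    dof = n_points - n_params
    if dof <= 0:
        raise ValueError("need more data points than fit parameters")
    return chi2 / dof

# Five data points, two parameters (m and b), so dof = 3 in both cases.
good = chi2_per_dof(2.587, 5, 2)
poor = chi2_per_dof(22.52, 5, 2)
print(f"{good:.3f} {poor:.2f}")
```

The guard on dof reflects the earlier point: with as many parameters as data points, the curve goes through everything and the fit cannot be tested.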
Can calculate probability that data is represented by the given fit. In this case:
Fit with χ²/dof = 7.51: < 0.1%
Fit with χ²/dof = 0.862: 68%
In the interests of time, we will skip how to do this.
Uncertainty in m and b #1
Recall that we found
m = 0.068781, b = 0.161967
What about uncertainty and significant figures?
If we take the derived value for one variable (say m), we can derive the χ² function for the other variable (b).
\[
\chi^2 = \sum_{i=1}^{5} \frac{[(y_i - m x_i) - b]^2}{\sigma_i^2}
       = b^2 \sum_{i=1}^{5} \frac{1}{\sigma_i^2} - 2b \sum_{i=1}^{5} \frac{(y_i - m x_i)}{\sigma_i^2} + \sum_{i=1}^{5} \frac{(y_i - m x_i)^2}{\sigma_i^2}
       \sim A x^2 + B x + C
\]
[Figure: "χ² Distribution for b", χ² (2 to 4.5) vs. b (-0.3 to 0.6)]
The error in b is indicated by the spot at which the χ² is changed by 1.
So ±0.35
Uncertainty in m and b #2
Recall that we found
m = 0.068781, b = 0.161967
What about uncertainty and significant figures?
If we take the derived value for one variable (say b), we can derive the χ² function for the other variable (m).
\[
\chi^2 = \sum_{i=1}^{5} \frac{[(y_i - b) - m x_i]^2}{\sigma_i^2}
       = m^2 \sum_{i=1}^{5} \frac{x_i^2}{\sigma_i^2} - 2m \sum_{i=1}^{5} \frac{x_i (y_i - b)}{\sigma_i^2} + \sum_{i=1}^{5} \frac{(y_i - b)^2}{\sigma_i^2}
       \sim A x^2 + B x + C
\]
[Figure: "χ² Distribution for m", χ² (2 to 4) vs. m (0.064 to 0.074)]
The error in m is indicated by the spot at which the χ² is changed by 1.
So ±0.003
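Reading the error off the plot can be mimicked numerically. The Python sketch below holds one parameter at its quoted value and uses bisection to find how far the other must move before the χ² rises by 1. The bisection helper and its search ranges are choices made for this example, and the data come from the "Redo for All Calibration Points" table.

```python
# Calibration data: (weight in lb, voltage in V, uncertainty in voltage in V).
DATA = [
    (60.0, 4.2, 0.5),
    (120.0, 9.4, 1.0),
    (150.0, 10.0, 0.7),
    (180.0, 13.2, 1.2),
    (300.0, 13.2, 8.4),
]
M_BEST, B_BEST = 0.068781, 0.161967  # values quoted in the slides

def chi_square(m, b):
    """Chi-square of the line y = m*x + b against the calibration data."""
    return sum(((y - (m * x + b)) / s) ** 2 for x, y, s in DATA)

def step_for_unit_rise(rise, lo, hi, iterations=60):
    """Bisect for the step in [lo, hi] at which rise(step) reaches 1."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if rise(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

chi2_min = chi_square(M_BEST, B_BEST)
# Move b with m held fixed, then m with b held fixed.
delta_b = step_for_unit_rise(lambda d: chi_square(M_BEST, B_BEST + d) - chi2_min, 0.0, 5.0)
delta_m = step_for_unit_rise(lambda d: chi_square(M_BEST + d, B_BEST) - chi2_min, 0.0, 0.1)

print(f"delta_b = {delta_b:.3f}, delta_m = {delta_m:.4f}")
```

The steps found this way should come out close to the 0.35 and 0.003 read off the two plots.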
Uncertainty in m and b #3
So now we know a lot of the story
m = 0.068781 ± 0.003
b = 0.161967 ± 0.35
So we see that significant figures are an issue.
Finally we can see
Voltage = (0.069 ± 0.003) × Weight + (0.16 ± 0.35)
[Figure: "Scale Calibration Data" plot of Voltage vs. Weight]
Final complication: When we evaluated the error for m and b, we treated the other variable as constant. As we know, this wasn’t correct.
Error Ellipse
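\[
\begin{aligned}
\chi^2 &= \sum_{i=1}^{5} \frac{[y_i - m x_i - b]^2}{\sigma_i^2} \\
       &= m^2 \sum_{i=1}^{5} \frac{x_i^2}{\sigma_i^2} + b^2 \sum_{i=1}^{5} \frac{1}{\sigma_i^2} + \sum_{i=1}^{5} \frac{y_i^2}{\sigma_i^2}
          - 2m \sum_{i=1}^{5} \frac{x_i y_i}{\sigma_i^2} - 2b \sum_{i=1}^{5} \frac{y_i}{\sigma_i^2} + 2mb \sum_{i=1}^{5} \frac{x_i}{\sigma_i^2} \\
       &= A m^2 + B b^2 + C + D m + E b + F mb \\
       &\sim A x^2 + B y^2 + C + D x + E y + F xy
\end{aligned}
\]
[Figure: error ellipse in the (m, b) plane around the best b & m, with contours at χ²min + 1, χ²min + 2, and χ²min + 3]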
More complicated, but shows that uncertainty in one variable also affects the uncertainty seen in another variable.
[Figure: "Scale Calibration Data" plot of Voltage vs. Weight. Label: Increase intercept, keep slope the same]
To remain ‘good’, if you increase the intercept, you must decrease the slope
[Figure: "Scale Calibration Data" plot of Voltage vs. Weight. Label: Decrease slope, keep intercept the same]
Similarly, if you decrease the slope, you must increase the intercept
Error Ellipse
[Figure: error ellipse in the (m, b) plane, marking the best b & m (m_best, b_best), a new m within errors, and a new b within errors]
When one has an m below m_best, the range of preferred b’s tends to be above b_best.
From both physical principles and strict mathematics, you can see that if you make a mistake estimating one parameter, the other must move to compensate. In this case, they are anti-correlated (i.e. if b↑, then m↓ and if b↓, then m↑.)
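The anti-correlation can be checked numerically. The sketch below is not taken from the slides: it relies on the standard result that, for a straight-line χ² fit, the covariance matrix of (m, b) is the inverse of the matrix of weighted sums appearing in the two fit equations. The data are from the "Redo for All Calibration Points" table.

```python
import math

# Calibration data: (weight in lb, voltage in V, uncertainty in voltage in V).
DATA = [
    (60.0, 4.2, 0.5),
    (120.0, 9.4, 1.0),
    (150.0, 10.0, 0.7),
    (180.0, 13.2, 1.2),
    (300.0, 13.2, 8.4),
]

s1 = sum(1.0 / s ** 2 for _, _, s in DATA)     # sum of 1/sigma^2
sx = sum(x / s ** 2 for x, _, s in DATA)       # sum of x/sigma^2
sxx = sum(x * x / s ** 2 for x, _, s in DATA)  # sum of x^2/sigma^2
det = sxx * s1 - sx ** 2

# Inverse of [[sxx, sx], [sx, s1]] gives the covariance matrix of (m, b).
var_m = s1 / det
var_b = sxx / det
cov_mb = -sx / det

sigma_m = math.sqrt(var_m)
sigma_b = math.sqrt(var_b)
correlation = cov_mb / (sigma_m * sigma_b)

print(f"sigma_m = {sigma_m:.4f}, sigma_b = {sigma_b:.2f}, correlation = {correlation:.2f}")
```

The correlation comes out strongly negative, and the full uncertainties are larger than the ones found by holding the other parameter fixed, which is the point of the error ellipse.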
Back to Physics
Data and error analysis is crucial, whether you work in a high school lab…
Or the Frontier!!!!
References
• P. Bevington and D. Robinson, Data Reduction and Error Analysis for the Physical Sciences, 2nd Edition, McGraw-Hill, Inc., New York, 1992.
• J. Taylor, An Introduction to Error Analysis, Oxford University Press, 1982.
• Rotated ellipses – http://www.mecca.org/~halfacre/MATH/rotation.htm
http://worldscientific.com/books/physics/5430.html