Welcome to Physical Sciences 2 lab! We're very excited about the labs for this course, and we hope you will be, too. Everything about the labs has been newly designed for a great educational experience with a minimum of annoying busywork. We've had a lot of fun working on the labs, and hopefully, you will have a lot of fun doing them.
By this time you should all have sectioned for a lab time assignment. If you haven't, or if you don't remember your lab section time, please contact Kirill immediately: [email protected]. Lab 1 will run next week, from Tuesday, October 3 to Thursday, October 5.
Before you show up to your first lab next week, we would like you to do three things:
1. Download the Logger Pro software. Logger Pro is the data collection and analysis software we will be using for all of the labs in this course. It is powerful yet easy to use, and is available for both Windows and Macintosh. The site license agreement allows any Harvard student to download and use the software for free. (If you don't have a PC or a Mac, or don't want to put Logger Pro on your own computer, you can use one of the computers in the Science Center computer labs.) The program can be downloaded from the HASCS Software Download Page: http://www.fas.harvard.edu/computing/download. Either version 3.4.5 or 3.4.6 is fine; 3.4.6 is the latest, but as of this writing 3.4.5 is the version available from HASCS, and we're told they are working on getting 3.4.6 up there.
2. Learn to use Logger Pro. We recommend you go through some of the tutorials that come with the software. To do so, go to File-Open and then under the folder labeled Experiments, find the subfolder called Tutorials. Tutorial #1 is a quick overview; #5 has information on entering data; #7 is a very brief summary on working with graphs; and #9 teaches you how to analyze data using curve fitting. Some of the other tutorials are also useful, but they require one or more sensors connected so that you can learn how to take data.
3. Read the attached handout, "An Introduction to Measurement and Uncertainty." This document contains ideas that will be new to many of you, even those of you with a background in statistics. We have tried to boil down the most important things you need to know about doing quantitative experimental science and put them in one place, so it is very important: we will be using the ideas from this document over and over throughout the labs this semester. If you have any specific questions about the document, please post them to the Lab Discussion Page on the course website, or contact your Lab TF.
That's it! We look forward to a semester of fun, excitement, and instruction in the labs. See you next week!
Physical Sciences 2 and Physics 11a
An Introduction to Measurement and Uncertainty
1. Measurement and Uncertainty
In the laboratory portion of this course, you will perform experiments and make
observations. You should distinguish between two types of observations: qualitative and
quantitative observations. Although qualitative observations are an important aspect of
experimental science (e.g., “I connected the battery and smoke started pouring out of the
device”), we will focus on quantitative observations, or measurements. You will make
measurements using various measuring devices, and report the values of these
measurements. Physical theories, such as Newton’s laws of motion, make quantitative
predictions about the outcomes of experiments: if we drop a ball from a height h above
the ground, Newton’s laws predict the speed of the ball when it strikes the ground. In
order to test, refine, and develop our physical theories, we must make quantitative
measurements.
Although you make measurements every day—after all, a clock is a device that
measures time—you probably do not give much thought to the process of measurement.
The following schematic should help you think about this process:
[Schematic of the measurement process, in three parts:]
The physical system (what we measure): it is described by certain parameters, such as position, time, velocity, mass, force, etc.
The measuring device (takes a measurement): it could be a stopwatch, ruler, balance, thermometer, etc. The experimenter may be "a part of the device."
The measurement (what we record): it must have three things: a numerical value, an estimated uncertainty, and units.
Within the paradigm of classical physics, we consider the parameters of the physical
system to be defined to infinitely high precision. Any measuring device, however, has
some limits on the precision of its measurements. For instance, you may measure time
using a digital stopwatch that records time to the nearest millisecond. A measuring
device observes a physical system and records a measurement. When you measure
length using a ruler, the ruler alone is not a complete measuring device: you must
interpret the markings on the ruler and record the measurement, so you are a part of the
measuring device. A thermometer connected to a computer is a complete measuring
device, since the computer records the measurements.
All measurements involve some uncertainty, or error. Physicists use the term
error not to describe mistakes (“I dropped the thermometer and it broke”) but to describe
the inevitable uncertainty that accompanies any measurement. When we report a
measurement, we must include three pieces of information: the numerical value of the
measurement, the units of the measurement, and some estimate of the uncertainty of the
measurement. For example, you might report that the length of a metal rod is 13.2 ± 0.1
cm. In the first lab activity of this course, we will try to explore exactly what is meant by
uncertainty.
We distinguish between two types of error in measurement: systematic error and
random error. The following illustration shows examples of these two types of error:
[Illustration: two sets of measured values compared to the "true" value. The left set shows large systematic error and small random error; the right set shows small systematic error and large random error.]
The set of measured values on the left exhibits a large systematic error: they are all lower
than the true value of the parameter. The set of measured values on the right exhibits a
small systematic error: they are, on average, neither higher nor lower than the true value
of the parameter. However, the measured values on the right have more random error
than those on the left: they vary more from one measurement to the next. You may have
heard the terms precision and accuracy used to describe measurements. A measuring
device that has very little systematic error is said to be accurate: its measurements
should, on average, be equal to the true value. A measuring device that has very little
random error is said to be precise: repeated measurements of the same parameter should
not vary much from one measurement to the next.
In principle, you can eliminate systematic error from your measurements by
calibrating your measuring device. If you measure a standard object or system that has a
known value for the parameter of interest, you can determine the sign and magnitude of
the systematic error of your device and compensate for that error in your measurements.
For instance, you can use a mixture of ice and water at equilibrium (which will have a
temperature of 0°C) as a standard reference point to calibrate a thermometer. Ideally, you
should calibrate a measuring device at several different points over its range. A proper
laboratory experiment should always check for the possibility of systematic error and
compensate for that error by calibration.
You can never eliminate random error from your measurements. Electronic
measuring devices, for instance, suffer from various sources of electronic noise. All
devices suffer from thermal fluctuations. Errors made by the operator of a device
(“human error”) can be both systematic and random. For instance, if you measure the
time of an event by pressing a button on a stopwatch, you are likely to press the button
somewhat after the event has actually occurred (a systematic error), and the amount that
you are late is likely to vary from one measurement to the next (a random error).
In the preceding discussion, we have implicitly introduced the concept of making
repeated measurements. You should ask: what does it mean to repeat a measurement?
Often, a physical system will not “sit still” and wait for us to make repeated
measurements. If we want to drop a ball from a height h and measure its velocity when it
strikes the ground, we can probably make only one measurement of the velocity at that
instant. Instead, we repeat the experiment using identical starting conditions and make
one measurement in each repetition. In this case, we could take the same ball and drop it
again from the same height h. As you might expect, this procedure introduces some error
because we can never exactly reproduce the conditions of a particular experiment. We
can control a small number of parameters (e.g. the mass of the ball, its initial height) but
cannot control many other parameters (e.g. the velocity of every molecule of air in the
room). Because our world is ultimately governed by quantum mechanics, we cannot
even in principle control all the relevant physical parameters of a given experiment! We
must, therefore, consider what parameters are likely to have a significant effect on our
experiment and control those parameters to the best of our ability.
2. Repeated Measurements and Statistical Distributions
The “gold standard” of any physical experiment is to perform a huge number of
measurements on repeated experiments with identical starting conditions. This procedure
would yield not one measurement, but a statistical distribution of measurements. We can
report a statistical distribution using a histogram. For instance, 50 repeated
measurements of the velocity at the moment of impact of a particular ball dropped from a
particular height might yield the following histogram:
The x-axis of a histogram shows the values of the measured parameter, divided into bins
of equal width; the y-axis shows the frequency, or number of times that a measured value
fell within a particular bin. In the histogram shown above, the bins are centered around
the values shown on the x-axis; the width of each bin is equal to 0.05 m/s. A histogram is
the best way to report the results of repeated measurements of a parameter: one can see
immediately the overall shape of the distribution, the mean (or average) of the
distribution, and whether there are any notable statistical outliers (values that fall
unusually far from the mean).
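The binning procedure described above is easy to carry out in software. The following Python sketch builds a text histogram with bins of width 0.05 m/s; the velocity values are simulated stand-ins for real measurements, so the numbers themselves are illustrative only:

```python
import numpy as np

# Simulated stand-ins for 50 measured impact velocities (m/s); illustrative only.
rng = np.random.default_rng(0)
velocities = rng.normal(loc=1.03, scale=0.11, size=50)

# Bins of equal width (0.05 m/s), aligned to multiples of the bin width
# and wide enough to cover every measured value.
bin_width = 0.05
lo = np.floor(velocities.min() / bin_width) * bin_width
hi = np.ceil(velocities.max() / bin_width) * bin_width
counts, edges = np.histogram(velocities, bins=np.arange(lo, hi + bin_width, bin_width))

# A text histogram: each row is one bin center and its frequency.
for left, n in zip(edges[:-1], counts):
    print(f"{left + bin_width / 2:.3f} m/s: {'#' * n}")
```

Every measurement falls in exactly one bin, so the counts sum to the number of measurements.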
Obviously, it would be unwieldy to publish a histogram for every measured
parameter in every experiment. Usually, we fit an idealized distribution to the measured
histogram and report a few parameters that characterize the idealized distribution. In
most cases, we can fit a normal or Gaussian distribution to the histogram. The normal
distribution is characterized by two parameters: the arithmetic mean (often symbolized by
the Greek letter µ or the symbol x̄) and the standard deviation (often symbolized by the
Greek letter σ). An approximate formula for the Gaussian distribution (for histograms
containing a total of N measurements with bins of width w) is:
Expected Gaussian frequency for bin centered around x ≈ [Nw / (σ√(2π))] · exp[−(x − µ)² / (2σ²)]
Here is the above histogram along with a Gaussian distribution calculated from the
arithmetic mean and standard deviation of the measured data:
As you can see, the Gaussian distribution offers a reasonable approximation to the
experimental distribution. Indeed, most experimental measurements yield histograms
that are approximately Gaussian.
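The approximate bin-frequency formula above is easy to evaluate directly. A Python sketch (the function name is ours; the numbers are taken from the ball-drop example):

```python
import numpy as np

def expected_bin_frequency(x, mu, sigma, n_total, bin_width):
    """Gaussian frequency expected in a bin of width bin_width centered at x,
    for n_total measurements with mean mu and standard deviation sigma."""
    return (n_total * bin_width / (sigma * np.sqrt(2 * np.pi))
            * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)))

# Ball-drop numbers: N = 50 drops, bins of width w = 0.05 m/s, mean 1.028 m/s,
# standard deviation 0.11 m/s. The tallest bin sits at the mean.
peak = expected_bin_frequency(1.028, mu=1.028, sigma=0.11, n_total=50, bin_width=0.05)
print(f"expected count in the central bin: {peak:.1f}")
```

Comparing such expected counts, bin by bin, against the measured histogram is exactly what "fitting a Gaussian to the histogram" means.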
Several features of the Gaussian distribution make it particularly useful in
describing and analyzing experimental data. This distribution is characterized by only
two parameters: the mean (µ) and the standard deviation (σ). For a Gaussian distribution,
the mean (the arithmetic mean), the median (the “midpoint” of the data) and the mode
(the highest point, or the most common result) are all identical:
The standard deviation (σ) gives a measure of the spread or “width” of the distribution.
Another common measure of the spread of a distribution is the full-width at half-
maximum, or FWHM, which is exactly what it says: the full width of the distribution at
the midpoint between the baseline and the peak of the distribution:
The standard deviation σ of a Gaussian distribution is related to the FWHM by the
following equation:
σ = FWHM / (2√(2 ln 2)) ≈ FWHM / 2.35
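The conversion factor 2√(2 ln 2) ≈ 2.35 can be checked in a few lines of Python:

```python
import math

# FWHM = 2 * sqrt(2 ln 2) * sigma for a Gaussian distribution.
factor = 2 * math.sqrt(2 * math.log(2))
sigma = 0.11  # ball-drop standard deviation (m/s)
print(f"conversion factor = {factor:.4f}, FWHM = {factor * sigma:.3f} m/s")
```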
You can use the standard deviation to estimate how many measurements will fall within a
certain “distance” of the mean. The general rule (often called the “68–95–99.7 rule”)
states that:
68% of the measurements should fall within 1 std. dev. of the mean
95% of the measurements should fall within 2 std. dev. of the mean
99.7% of the measurements should fall within 3 std. dev. of the mean
We can understand the meaning of this rule by examining the area under the Gaussian
curve within these limits:
Thus, knowledge of the standard deviation (which can be derived from a statistical
analysis of the data, from fitting a Gaussian curve to a histogram, or from the FWHM of
the distribution) allows you to estimate the probability that a measurement will fall within
a certain range of the mean. This can be useful in deciding whether to eliminate a
statistical outlier from your data. If your measuring device usually yields a Gaussian
distribution of measurements, and you see a measurement that is, for instance, 4 standard
deviations away from the mean, you may want to reject that measurement as an outlier.
You should also analyze your experimental setup and your measuring device to see if you
can determine why that measurement was erroneous.
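The fractions in the "68–95–99.7 rule", and the rarity of a 4-standard-deviation outlier, follow from the area under the Gaussian curve, which Python's standard library can evaluate via the error function:

```python
import math

def fraction_within(k):
    """Fraction of Gaussian measurements within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    outside = 1 - fraction_within(k)
    print(f"within {k} sigma: {fraction_within(k) * 100:6.2f}%   outside: {outside * 100:.4f}%")
```

A measurement 4 standard deviations from the mean should occur in well under 0.01% of trials, which is why such a value is a candidate for rejection as an outlier.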
3. Normally, Everything is Normal: The Ubiquitous Gaussian Distribution
In nearly all cases, the random error in any set of repeated measurements leads to
a distribution of measurements that is approximately Gaussian. Why is this distribution
so common? In statistics, this distribution is called the normal distribution: data that
follow this distribution are said to be normally distributed. We can understand why this
distribution arises using an important result from statistics known as the central limit
theorem.
The central limit theorem says that if you add together an infinite number of
uncorrelated random variables—with the stipulation that each random variable must have
a mean of zero and a finite standard deviation—the result will be a Gaussian distribution.
Let’s think about this for a minute. First, we require that the variables be random and
uncorrelated (not correlated with one another). Those requirements should be intuitively
obvious. Next, we require that each variable must have a mean of zero. That is another
way of saying that the random variables should not introduce any systematic error: on
average, each random variable should not add or subtract anything to the sum. Finally,
we require that each random variable have a finite standard deviation (stated more often
as the requirement of a finite variance, which is simply the square of the standard
deviation). Any random variable that has an infinite standard deviation would be
unbounded, which poses a challenge to our intuitive notion of randomness: what would
you do if someone told you to pick a random number between one and infinity? (It is
mathematically possible to have an unbounded random variable with a finite standard
deviation—indeed, the Gaussian distribution is an example—but all physical random
variables will be bounded by some limits.) As long as those requirements are fulfilled,
the sum of all the random variables will approach a Gaussian distribution as the number
of random variables approaches infinity. This theorem places no other requirements on
the distribution of each random variable. For instance, a sum of an infinite number of
bimodal distributions will yield a single Gaussian distribution.
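This convergence is easy to see numerically. The sketch below (our own construction, using NumPy) sums 100 bimodal error sources, each ±1 with equal probability, so each has mean zero and finite variance; the sum behaves like a Gaussian with standard deviation √100 = 10:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sources, n_trials = 100, 20_000

# Each error source is bimodal (+1 or -1, equally likely): mean 0, variance 1.
errors = rng.choice([-1.0, 1.0], size=(n_trials, n_sources)).sum(axis=1)

# For a Gaussian, about 68% of values fall within one standard deviation.
frac = np.mean(np.abs(errors - errors.mean()) <= errors.std())
print(f"mean = {errors.mean():+.3f}, std = {errors.std():.2f}, within 1 sd: {frac:.1%}")
```

The sum is discrete, so the one-standard-deviation fraction lands a little above 68%, but the overall shape is already convincingly Gaussian with only 100 sources.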
How is the central limit theorem related to uncertainty in physical measurements?
In any experiment, there will be many sources of error: electronic noise, operator error,
thermal fluctuations, etc. We assume that any systematic error has been eliminated by
proper calibration of the measuring device. Thus, the average error introduced by all of
these various sources should be zero. We expect that these sources of error are
uncorrelated, and they must be bounded by some physical limits, so they will have a
finite standard deviation. Finally, we assume that these sources of error are additive: that
is, the total error is the sum of each of the individual sources of error. As long as there
are a large number of such sources of error, the total distribution will approximate a
Gaussian distribution. Any experiment that yields a non-Gaussian distribution probably
has some source of systematic error or some hidden correlation between the random
sources of error.
We should note that we are considering physical measurements in which the
uncertainty of measured values arises from random errors in the measuring device, not
from variations in the “true” value that is measured. Within the realm of classical
physics, we assume that the “true” value of any physical parameter has no uncertainty
and that all of the uncertainty arises from the process of measurement. Thus, the
measured distributions are nearly always Gaussian. In many other applications of
statistics, however, the underlying parameter may exhibit intrinsic variance and a notably
non-Gaussian distribution. For instance, the distribution of family incomes in the United
States is highly non-Gaussian: the vast majority of families have moderate incomes, but
there is a long “tail” that extends up to very high incomes. Such distributions are said to
be skewed. Under most circumstances, highly skewed distributions will not result from
random measurement errors.
The central limit theorem properly applies only in the limit of an infinite number
of random variables. If one examines how the sum of a finite number of random
variables converges on a Gaussian distribution, one observes that the central part of the
distribution converges quite rapidly, but the “tails” of the distribution converge more
slowly. Although the Gaussian distribution is mathematically unbounded, you should not
take the extreme tails of this distribution seriously: in the “ball drop” experiment, for
instance, a literal interpretation of the Gaussian distribution would suggest that there is a
non-zero probability of measuring a negative velocity, or a velocity faster than the speed
of light. Likewise, although a graph of the heights of adult women shows an
approximately Gaussian distribution, a literal interpretation of this distribution would
suggest that there is a non-zero probability of finding an adult woman who is 100 feet
tall. As far as physical measurements are concerned, you should regard the central limit
theorem as a statement that the middle of a measured distribution should look
approximately Gaussian.
4. Repeating Measurements: Standard Deviation and Standard Error
Ideally, you would repeat every measurement enough times to plot a histogram
and confirm that the distribution is indeed Gaussian. In reality, though, such a procedure
would be unnecessarily time-consuming. Many experiments involve repeating a similar
measurement for several different initial conditions. For instance, you might measure the
velocity of a ball upon impact after dropping it from various heights. You could drop it
from one height 50 times, then drop it from a different height 50 times, and so on. Or,
you could drop it from a single height 50 times, confirm that the distribution is Gaussian
with a particular standard deviation, and then drop it from each other height only once.
You could assume that the standard deviation of the other experiments should be about
the same as the standard deviation of the first experiment. As long as the various sources
of experimental error are random and uncorrelated, this assumption is reasonable. With
each such measurement, you can report the expected standard deviation of that
measurement. You have implicitly followed this procedure whenever you have used a
standard measuring device that has a stated uncertainty. For instance, a laboratory
balance might state an uncertainty of “±0.1 mg.” In this case, the manufacturer has made
repeated measurements of various masses and found that the standard deviation is 0.1
mg. You could, with confidence, make a single measurement of the mass of an object
and report it with an uncertainty of 0.1 mg. (Of course, you would have to be sure that
the balance is in good working order and that it has been calibrated properly. We spend
tens of thousands of dollars each year to calibrate the laboratory equipment used in the
teaching labs in the Science Center!)
In order to determine the standard deviation of a measuring device, you must
collect enough repeated measurements to verify that the distribution is indeed
approximately Gaussian. You must also collect enough measurements to have some
measurements in the “tails” of the distribution. A good rule of thumb is that a standard
deviation will be fairly accurate if you collect at least 30 repeated measurements. With
that number of measurements, you should obtain some measurements beyond two
standard deviations from the mean (according to the “68–95–99.7” rule), and you can
verify that the distribution of measurements is approximately Gaussian.
Even if you know the standard deviation of a measuring device, you might still
want to make repeated measurements. Making repeated measurements should not
change the standard deviation of the measurement: we expect that the standard deviation
is an intrinsic property of the particular experiment and measuring device. However,
making repeated measurements will reduce the standard error of the mean for the
measurement. The standard error of the mean for a series of repeated measurements is
related to the standard deviation σ and the number of measurements N:
Standard error = σ / √N
The standard error can be thought of as the standard deviation of the mean of a series of
repeated measurements. For instance, in the above example the standard deviation is
σ = 0.11 m/s. The experiment was repeated 50 times, so the standard error is 0.016 m/s.
We could report the result of these 50 measurements in the following manner:
Velocity = 1.028 ± 0.016 m/s (N = 50)
Note that the reported uncertainty of ± 0.016 m/s is the uncertainty of the mean, not the
standard deviation of the measurement itself. When you report a measurement in this
fashion, you are implicitly reporting a distribution of measurements, not a single
measurement. Providing the number of measurements (N = 50) tells the reader that you
repeated the measurement 50 times. As a side note, if you are reporting a value using
scientific notation, you should include the uncertainty within the mantissa, as in the
following example:
Velocity = (1.028 ± 0.016) × 10⁻³ km/s (N = 50)
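The bookkeeping above is mechanical and worth encoding once. A Python sketch (the function name is ours; the numbers are those of the ball-drop example):

```python
import math

def report(mean, sigma, n, unit):
    """Format a result as 'mean ± standard error (N = n)', as described above."""
    se = sigma / math.sqrt(n)
    return f"{mean:.3f} ± {se:.3f} {unit} (N = {n})"

# 50 drops with mean 1.028 m/s and standard deviation 0.11 m/s.
print(report(1.028, 0.11, 50, "m/s"))
```

This reproduces the report quoted earlier: 1.028 ± 0.016 m/s (N = 50).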
In general, when a reader sees a measured value reported as “xxx ± yy” he or she
will assume that the distribution is approximately Gaussian with a mean of xxx and a
standard error of yy. You should keep that assumption in mind when reporting scientific
data. Whenever you make repeated measurements, you should:
i) construct a histogram from your data;
ii) calculate the mean and standard deviation;
iii) draw a Gaussian curve for the calculated mean and standard deviation;
iv) if the Gaussian curve is a reasonable fit to the observed histogram, report the mean and the standard error of the mean as described above; if not, you should probably report the full histogram.
Knowing the standard error of the mean allows us to estimate the confidence we
have in our measurement of the mean. Using the “68–95–99.7 rule”, we can be 68%
confident that the true velocity is within one standard error of the mean, and 95%
confident that the true velocity is within two standard errors of the mean. (Of course, this
conclusion is true only if we have eliminated the possibility of systematic error.) Thus,
with 50 measurements, we can state that there is a 95% chance that the true velocity lies
between 1.00 and 1.06 m/s. We use these confidence intervals when we compare the
results from various experiments. For example, we might perform another “ball drop”
experiment with a heavier ball. As long as air resistance is negligible, the velocity upon
impact should be the same with the heavy ball as it was with the light ball. If we find, for
instance, that the velocity of the heavy ball is between 1.03 and 1.09 m/s (with a
confidence of 95%), then the velocity of the heavy ball is statistically indistinguishable
from that of the light ball measured earlier. However, if we find that the velocity of the
heavy ball is between 1.08 and 1.14 m/s (at a 95% confidence level), then we can be 95%
certain that the velocity of the heavy ball is indeed greater than that of the light ball.
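The comparison logic can be sketched as follows (the helper is ours; the heavy-ball mean of 1.06 m/s is inferred from the 1.03–1.09 m/s interval quoted above):

```python
import math

def ci95(mean, sigma, n):
    """Approximate 95% confidence interval for the mean: mean ± 2 standard errors."""
    half = 2 * sigma / math.sqrt(n)
    return (mean - half, mean + half)

light = ci95(1.028, 0.11, 50)   # light ball, from the earlier example
heavy = ci95(1.060, 0.11, 50)   # heavy ball (hypothetical mean of 1.06 m/s)

# Two results are statistically indistinguishable when their intervals overlap.
overlap = light[1] >= heavy[0] and heavy[1] >= light[0]
print(f"light: ({light[0]:.2f}, {light[1]:.2f})  heavy: ({heavy[0]:.2f}, {heavy[1]:.2f})")
print("indistinguishable at 95%" if overlap else "distinguishable at 95%")
```

With a heavier-ball mean of 1.11 m/s instead, the intervals separate and the two velocities become distinguishable, matching the second case in the text.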
If we had made only one measurement, the standard error would be equal to the
standard deviation. Suppose, for instance, that we made only one measurement of the
velocity and we “got lucky”: the measurement was 1.028 m/s (the same as the mean that
we obtained from making 50 measurements). We would report this observation as:
Velocity = 1.03 ± 0.11 m/s (N = 1)
The standard error, for one measurement, is equal to the standard deviation (0.11 m/s). In
this case, we could claim only that there is a 95% chance that the true velocity lies
between 0.81 and 1.25 m/s. Although the standard deviation is the same in both cases,
the use of repeated measurements allows us to make a much more precise statement
about the mean of the distribution. You should keep in mind both the standard deviation
and the standard error of the mean in any discussion or analysis of experimental
measurements.
Note that the standard error is inversely proportional to the square root of the
number of measurements. Thus, to narrow the standard error by a factor of 10, you
would need to make 100 repeated measurements. You could achieve the same result by
improving the experiment and the measuring device to reduce the intrinsic standard
deviation by a factor of 10. Depending on the experiment, one of these procedures may
be more straightforward than the other. Some physical experiments use thousands or
millions of repeated measurements—collected automatically by a computer—to reduce
the standard error of the experiment to within reasonable bounds.
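The square-root scaling is worth seeing concretely. With the ball-drop standard deviation of 0.11 m/s:

```python
import math

sigma = 0.11  # intrinsic standard deviation of a single measurement (m/s)
for n in (1, 100, 10_000):
    print(f"N = {n:>6}: standard error = {sigma / math.sqrt(n):.4f} m/s")
```

Each factor of 100 in the number of measurements narrows the standard error by only a factor of 10.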
5. Propagation of Error
You may have encountered the dreaded term “propagation of error” in a previous
science course. The central concept is that any arithmetic operations on uncertain
numbers will produce a result that is uncertain; the tools of “propagation of error” allow
us to estimate this resulting uncertainty. We do not expect you to memorize formulas for
the propagation of error: you can find such formulas in standard textbooks or on the Web.
We will simply walk through one example so you can see the general concept of
propagation of error and understand how it works.
In your first lab activity, you will simulate sources of random measurement error
using three different techniques. You will assume that the “true value” of a measured
parameter is 100, and you will model the “experimental error” by rolling dice, flipping
coins, and choosing random digits from a phone book. Each of these sources of error
should be random, and you will add them all to the true value of 100 to yield the value
that is measured by the (hypothetical) “noisy instrument”:
(100) + (dice) + (coins) + (phonebook) = measurement
We assume that 100 has no uncertainty, since it is the “true value.” Each of the other
values—dice, coins, and phonebook—has some uncertainty, as does the sum. You will
calculate the standard deviations of each of these sources of error in your lab activity.
Let us represent the standard deviations of the values dice, coins, phonebook, and
measurement by the symbols σd, σc, σp, and σm respectively. Using these symbols, the
expected standard deviation of the total measurement can be calculated from the formula
for the propagation of error for addition:
σm² = σd² + σc² + σp²
You will usually see this formula for the propagation of error written (equivalently) as:
σm = √(σd² + σc² + σp²)
This formula is sometimes referred to as the “RSS” formula for propagation of error: the
initials stand for “root of sum of squares.”
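The RSS combination is a one-liner in practice. A Python sketch (the helper name is ours, and the three standard deviations below are hypothetical stand-ins for the dice, coin, and phone-book values you will measure in lab):

```python
import math

def rss(*sigmas):
    """Combine uncorrelated errors in quadrature: the root of the sum of squares."""
    return math.sqrt(sum(s ** 2 for s in sigmas))

# Hypothetical standard deviations for the three simulated noise sources.
sigma_d, sigma_c, sigma_p = 3.0, 1.5, 2.9
print(f"sigma_m = {rss(sigma_d, sigma_c, sigma_p):.2f}")
```

Note that the combined error is dominated by the largest individual source; halving a small contribution barely changes the total.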
There is an analogous formula for the propagation of error that uses the standard
error instead of the standard deviation. That is, if we denote the standard error of the
mean for the individual values as SEd, SEc, SEp and SEm the expected standard error of the
mean for the overall measurement is given by:
SEm = √(SEd² + SEc² + SEp²)
This fact that the squares of the individual errors are added together to yield the
square of the overall error is often summarized by the statement “errors add in
quadrature.” (Recall that a quadratic equation is an equation that contains a squared term
like x2.) If errors added linearly, then the multiple sources of error in physical
experiments would accumulate so quickly that it would be exceedingly difficult to make
any precise measurements. As an example, consider a measurement whose “true value”
is 100 in which there are five sources of error, each with a standard error of 10. If the
errors added linearly, we would expect the total error to equal 50; that is, we would
expect the measured values to range from 50 to 150. Since the errors add in quadrature,
however, the expected standard error is:
SE = √(10² + 10² + 10² + 10² + 10²) = √500 ≈ 22.4
which is less than half of the standard error that would be expected if the errors added
linearly. We can add errors in quadrature when we expect the errors to be uncorrelated.
For instance, we expect that it is extremely unlikely that in a single experiment all the
sources of error are +10, or that all the sources of error are –10. If some of the errors are
correlated, we must use other formulas for the propagation of error that account for the
correlations. Such considerations are beyond the scope of this course.
6. “Executive Summary”
• All measurements exhibit random error, which is unavoidable, and systematic error,
which can be eliminated by proper calibration of the measuring device.
• Repeated measurements yield a statistical distribution that is almost always Gaussian;
such a distribution is characterized fully by its mean (µ) and standard deviation (σ).
• The standard deviation is a measure of the width of the distribution, and is
mathematically related to the full width at half-maximum, or FWHM.
• You should repeat one measurement at least 30 times with a particular measuring
device to determine the intrinsic standard deviation of that device.
• You may choose to repeat other measurements to minimize the standard error of the
mean, which is inversely proportional to the square root of the number of measurements.
• The standard error of the mean is a measure of the uncertainty of the mean; you can be
95% confident that the “true” mean lies within 2 standard errors of the measured mean.
• Uncorrelated random errors add in quadrature: the overall error is the root of the sum
of the squares of the individual sources of error.
For more information on error analysis and propagation of errors, you should consult the
excellent text by John R. Taylor, An Introduction to Error Analysis, 2nd ed., Sausalito,
CA: University Science Books, 1997.