Welcome to Physical Sciences 2 lab! We're very excited about the labs for this course, and we hope you will be, too. Everything about the labs has been newly designed for a great educational experience with a minimum of annoying busywork. We've had a lot of fun working on the labs, and hopefully, you will have a lot of fun doing them.
By this time you should all have sectioned for a lab time assignment. If you haven't, or if you don't remember your lab section time, please contact Kirill immediately: [email protected]. Lab 1 will run next week, from Tuesday, October 3 to Thursday, October 5.
Before you show up to your first lab next week, we would like you to do three things:
1. Download the Logger Pro software. Logger Pro is the data collection and analysis software we will be using for all of the labs in this course. It is powerful yet easy to use, and is available for both Windows and Macintosh. The site license agreement allows any Harvard student to download and use the software for free. (If you don't have a PC or a Mac, or don't want to put Logger Pro on your own computer, you can use one of the computers in the Science Center computer labs.) The program can be downloaded from the HASCS Software Download Page: http://www.fas.harvard.edu/computing/download. Either version 3.4.5 or 3.4.6 is fine; 3.4.6 is the latest, but as of this writing 3.4.5 is the version available from HASCS, and we're told they are working on getting 3.4.6 up there.
2. Learn to use Logger Pro. We recommend you go through some of the tutorials that come with the software. To do so, go to File-Open and then under the folder labeled Experiments, find the subfolder called Tutorials. Tutorial #1 is a quick overview; #5 has information on entering data; #7 is a very brief summary on working with graphs; and #9 teaches you how to analyze data using curve fitting. Some of the other tutorials are also useful, but they require one or more sensors connected so that you can learn how to take data.
3. Read the attached handout, "An Introduction to Measurement and Uncertainty." This document contains ideas that will be new to many of you, even those of you with a background in statistics. We have tried to boil down the most important things you need to know about doing quantitative experimental science and put them in one place, so it is very important: we will be using the ideas from this document over and over throughout the labs this semester. If you have any specific questions about the document, please post them to the Lab Discussion Page on the course website, or contact your Lab TF.
That's it! We look forward to a semester of fun, excitement, and instruction in the labs. See you next week!
Physical Sciences 2 and Physics 11a
An Introduction to Measurement and Uncertainty
1. Measurement and Uncertainty
In the laboratory portion of this course, you will perform experiments and make
observations. You should distinguish between two types of observations: qualitative and
quantitative observations. Although qualitative observations are an important aspect of
experimental science (e.g., “I connected the battery and smoke started pouring out of the
device”), we will focus on quantitative observations, or measurements. You will make
measurements using various measuring devices, and report the values of these
measurements. Physical theories, such as Newton’s laws of motion, make quantitative
predictions about the outcomes of experiments: if we drop a ball from a height h above
the ground, Newton’s laws predict the speed of the ball when it strikes the ground. In
order to test, refine, and develop our physical theories, we must make quantitative
measurements.
Although you make measurements every day—after all, a clock is a device that
measures time—you probably do not give much thought to the process of measurement.
The following schematic should help you think about this process:
[Schematic of the measurement process, in three parts:]
The physical system (what we measure): it is described by certain parameters, such as position, time, velocity, mass, force, etc.
The measuring device (takes a measurement): it could be a stopwatch, ruler, balance, thermometer, etc. The experimenter may be "a part of the device."
The measurement (what we record): it must have three things: a numerical value, an estimated uncertainty, and units.
Within the paradigm of classical physics, we consider the parameters of the physical
system to be defined to infinitely high precision. Any measuring device, however, has
some limits on the precision of its measurements. For instance, you may measure time
using a digital stopwatch that records time to the nearest millisecond. A measuring
device observes a physical system and records a measurement. When you measure
length using a ruler, the ruler alone is not a complete measuring device: you must
interpret the markings on the ruler and record the measurement, so you are a part of the
measuring device. A thermometer connected to a computer is a complete measuring
device, since the computer records the measurements.
All measurements involve some uncertainty, or error. Physicists use the term
error not to describe mistakes (“I dropped the thermometer and it broke”) but to describe
the inevitable uncertainty that accompanies any measurement. When we report a
measurement, we must include three pieces of information: the numerical value of the
measurement, the units of the measurement, and some estimate of the uncertainty of the
measurement. For example, you might report that the length of a metal rod is 13.2 ± 0.1
cm. In the first lab activity of this course, we will try to explore exactly what is meant by
uncertainty.
We distinguish between two types of error in measurement: systematic error and
random error. The following illustration shows examples of these two types of error:
[Illustration: two sets of measured values compared to the "true" value. The left set shows large systematic error and small random error; the right set shows small systematic error and large random error.]
The set of measured values on the left exhibits a large systematic error: they are all lower
than the true value of the parameter. The set of measured values on the right exhibits a
small systematic error: they are, on average, neither higher nor lower than the true value
of the parameter. However, the measured values on the right have more random error
than those on the left: they vary more from one measurement to the next. You may have
heard the terms precision and accuracy used to describe measurements. A measuring
device that has very little systematic error is said to be accurate: its measurements
should, on average, be equal to the true value. A measuring device that has very little
random error is said to be precise: repeated measurements of the same parameter should
not vary much from one measurement to the next.
In principle, you can eliminate systematic error from your measurements by
calibrating your measuring device. If you measure a standard object or system that has a
known value for the parameter of interest, you can determine the sign and magnitude of
the systematic error of your device and compensate for that error in your measurements.
For instance, you can use a mixture of ice and water at equilibrium (which will have a
temperature of 0°C) as a standard reference point to calibrate a thermometer. Ideally, you
should calibrate a measuring device at several different points over its range. A proper
laboratory experiment should always check for the possibility of systematic error and
compensate for that error by calibration.
You can never eliminate random error from your measurements. Electronic
measuring devices, for instance, suffer from various sources of electronic noise. All
devices suffer from thermal fluctuations. Errors made by the operator of a device
(“human error”) can be both systematic and random. For instance, if you measure the
time of an event by pressing a button on a stopwatch, you are likely to press the button
somewhat after the event has actually occurred (a systematic error), and the amount that
you are late is likely to vary from one measurement to the next (a random error).
In the preceding discussion, we have implicitly introduced the concept of making
repeated measurements. You should ask: what does it mean to repeat a measurement?
Often, a physical system will not “sit still” and wait for us to make repeated
measurements. If we want to drop a ball from a height h and measure its velocity when it
strikes the ground, we can probably make only one measurement of the velocity at that
instant. Instead, we repeat the experiment using identical starting conditions and make
one measurement in each repetition. In this case, we could take the same ball and drop it
again from the same height h. As you might expect, this procedure introduces some error
because we can never exactly reproduce the conditions of a particular experiment. We
can control a small number of parameters (e.g. the mass of the ball, its initial height) but
cannot control many other parameters (e.g. the velocity of every molecule of air in the
room). Because our world is ultimately governed by quantum mechanics, we cannot
even in principle control all the relevant physical parameters of a given experiment! We
must, therefore, consider what parameters are likely to have a significant effect on our
experiment and control those parameters to the best of our ability.
2. Repeated Measurements and Statistical Distributions
The “gold standard” of any physical experiment is to perform a huge number of
measurements on repeated experiments with identical starting conditions. This procedure
would yield not one measurement, but a statistical distribution of measurements. We can
report a statistical distribution using a histogram. For instance, 50 repeated
measurements of the velocity at the moment of impact of a particular ball dropped from a
particular height might yield the following histogram:
The x-axis of a histogram shows the values of the measured parameter, divided into bins
of equal width; the y-axis shows the frequency, or number of times that a measured value
fell within a particular bin. In the histogram shown above, the bins are centered around
the values shown on the x-axis; the width of each bin is equal to 0.05 m/s. A histogram is
the best way to report the results of repeated measurements of a parameter: one can see
immediately the overall shape of the distribution, the mean (or average) of the
distribution, and whether there are any notable statistical outliers (values that fall
unusually far from the mean).
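The binning procedure described above is easy to carry out in software. The following Python sketch builds a text histogram with bins of width 0.05 m/s; the velocity values are simulated stand-ins for real measurements, so the numbers themselves are illustrative only:

```python
import numpy as np

# Simulated stand-ins for 50 measured impact velocities (m/s); illustrative only.
rng = np.random.default_rng(0)
velocities = rng.normal(loc=1.03, scale=0.11, size=50)

# Bins of equal width (0.05 m/s), aligned to multiples of the bin width
# and wide enough to cover every measured value.
bin_width = 0.05
lo = np.floor(velocities.min() / bin_width) * bin_width
hi = np.ceil(velocities.max() / bin_width) * bin_width
counts, edges = np.histogram(velocities, bins=np.arange(lo, hi + bin_width, bin_width))

# A text histogram: each row is one bin center and its frequency.
for left, n in zip(edges[:-1], counts):
    print(f"{left + bin_width / 2:.3f} m/s: {'#' * n}")
```

Every measurement falls in exactly one bin, so the counts sum to the number of measurements.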
Obviously, it would be unwieldy to publish a histogram for every measured
parameter in every experiment. Usually, we fit an idealized distribution to the measured
histogram and report a few parameters that characterize the idealized distribution. In
most cases, we can fit a normal or Gaussian distribution to the histogram. The normal
distribution is characterized by two parameters: the arithmetic mean (often symbolized by
the Greek letter µ or the symbol x̄) and the standard deviation (often symbolized by the
Greek letter σ). An approximate formula for the Gaussian distribution (for histograms
containing a total of N measurements with bins of width w) is:
Expected Gaussian frequency for bin centered around x ≈ [Nw / (σ√(2π))] · exp[−(x − µ)² / (2σ²)]
Here is the above histogram along with a Gaussian distribution calculated from the
arithmetic mean and standard deviation of the measured data:
As you can see, the Gaussian distribution offers a reasonable approximation to the
experimental distribution. Indeed, most experimental measurements yield histograms
that are approximately Gaussian.
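The approximate bin-frequency formula above is easy to evaluate directly. A Python sketch (the function name is ours; the numbers are taken from the ball-drop example):

```python
import numpy as np

def expected_bin_frequency(x, mu, sigma, n_total, bin_width):
    """Gaussian frequency expected in a bin of width bin_width centered at x,
    for n_total measurements with mean mu and standard deviation sigma."""
    return (n_total * bin_width / (sigma * np.sqrt(2 * np.pi))
            * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)))

# Ball-drop numbers: N = 50 drops, bins of width w = 0.05 m/s, mean 1.028 m/s,
# standard deviation 0.11 m/s. The tallest bin sits at the mean.
peak = expected_bin_frequency(1.028, mu=1.028, sigma=0.11, n_total=50, bin_width=0.05)
print(f"expected count in the central bin: {peak:.1f}")
```

Comparing such expected counts, bin by bin, against the measured histogram is exactly what "fitting a Gaussian to the histogram" means.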
Several features of the Gaussian distribution make it particularly useful in
describing and analyzing experimental data. This distribution is characterized by only
two parameters: the mean (µ) and the standard deviation (σ). For a Gaussian distribution,
the mean (the arithmetic mean), the median (the “midpoint” of the data) and the mode
(the highest point, or the most common result) are all identical:
The standard deviation (σ) gives a measure of the spread or “width” of the distribution.
Another common measure of the spread of a distribution is the full-width at half-
maximum, or FWHM, which is exactly what it says: the full width of the distribution at
the midpoint between the baseline and the peak of the distribution:
The standard deviation σ of a Gaussian distribution is related to the FWHM by the
following equation:
σ = FWHM / (2√(2 ln 2)) ≈ FWHM / 2.35
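The conversion factor 2√(2 ln 2) ≈ 2.35 can be checked in a few lines of Python:

```python
import math

# FWHM = 2 * sqrt(2 ln 2) * sigma for a Gaussian distribution.
factor = 2 * math.sqrt(2 * math.log(2))
sigma = 0.11  # ball-drop standard deviation (m/s)
print(f"conversion factor = {factor:.4f}, FWHM = {factor * sigma:.3f} m/s")
```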
You can use the standard deviation to estimate how many measurements will fall within a
certain “distance” of the mean. The general rule (often called the “68–95–99.7 rule”)
states that:
68% of the measurements should fall within 1 std. dev. of the mean
95% of the measurements should fall within 2 std. dev. of the mean
99.7% of the measurements should fall within 3 std. dev. of the mean
We can understand the meaning of this rule by examining the area under the Gaussian
curve within these limits:
Thus, knowledge of the standard deviation (which can be derived from a statistical
analysis of the data, from fitting a Gaussian curve to a histogram, or from the FWHM of
the distribution) allows you to estimate the probability that a measurement will fall within
a certain range of the mean. This can be useful in deciding whether to eliminate a
statistical outlier from your data. If your measuring device usually yields a Gaussian
distribution of measurements, and you see a measurement that is, for instance, 4 standard
deviations away from the mean, you may want to reject that measurement as an outlier.
You should also analyze your experimental setup and your measuring device to see if you
can determine why that measurement was erroneous.
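The fractions in the "68–95–99.7 rule", and the rarity of a 4-standard-deviation outlier, follow from the area under the Gaussian curve, which Python's standard library can evaluate via the error function:

```python
import math

def fraction_within(k):
    """Fraction of Gaussian measurements within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    outside = 1 - fraction_within(k)
    print(f"within {k} sigma: {fraction_within(k) * 100:6.2f}%   outside: {outside * 100:.4f}%")
```

A measurement 4 standard deviations from the mean should occur in well under 0.01% of trials, which is why such a value is a candidate for rejection as an outlier.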
3. Normally, Everything is Normal: The Ubiquitous Gaussian Distribution
In nearly all cases, the random error in any set of repeated measurements leads to
a distribution of measurements that is approximately Gaussian. Why is this distribution
so common? In statistics, this distribution is called the normal distribution: data that
follow this distribution are said to be normally distributed. We can understand why this
distribution arises using an important result from statistics known as the central limit
theorem.
The central limit theorem says that if you add together an infinite number of
uncorrelated random variables—with the stipulation that each random variable must have
a mean of zero and a finite standard deviation—the result will be a Gaussian distribution.
Let’s think about this for a minute. First, we require that the variables be random and
uncorrelated (not correlated with one another). Those requirements should be intuitively
obvious. Next, we require that each variable must have a mean of zero. That is another
way of saying that the random variables should not introduce any systematic error: on
average, each random variable should not add or subtract anything to the sum. Finally,
we require that each random variable have a finite standard deviation (stated more often
as the requirement of a finite variance, which is simply the square of the standard
deviation). Any random variable that has an infinite standard deviation would be
unbounded, which poses a challenge to our intuitive notion of randomness: what would
you do if someone told you to pick a random number between one and infinity? (It is
mathematically possible to have an unbounded random variable with a finite standard
deviation—indeed, the Gaussian distribution is an example—but all physical random
variables will be bounded by some limits.) As long as those requirements are fulfilled,
the sum of all the random variables will approach a Gaussian distribution as the number
of random variables approaches infinity. This theorem places no other requirements on
the distribution of each random variable. For instance, a sum of an infinite number of
bimodal distributions will yield a single Gaussian distribution.
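This convergence is easy to see numerically. The sketch below (our own construction, using NumPy) sums 100 bimodal error sources, each ±1 with equal probability, so each has mean zero and finite variance; the sum behaves like a Gaussian with standard deviation √100 = 10:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sources, n_trials = 100, 20_000

# Each error source is bimodal (+1 or -1, equally likely): mean 0, variance 1.
errors = rng.choice([-1.0, 1.0], size=(n_trials, n_sources)).sum(axis=1)

# For a Gaussian, about 68% of values fall within one standard deviation.
frac = np.mean(np.abs(errors - errors.mean()) <= errors.std())
print(f"mean = {errors.mean():+.3f}, std = {errors.std():.2f}, within 1 sd: {frac:.1%}")
```

The sum is discrete, so the one-standard-deviation fraction lands a little above 68%, but the overall shape is already convincingly Gaussian with only 100 sources.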
How is the central limit theorem related to uncertainty in physical measurements?
In any experiment, there will be many sources of error: electronic noise, operator error,
thermal fluctuations, etc. We assume that any systematic error has been eliminated by
proper calibration of the measuring device. Thus, the average error introduced by all of
these various sources should be zero. We expect that these sources of error are
uncorrelated, and they must be bounded by some physical limits, so they will have a
finite standard deviation. Finally, we assume that these sources of error are additive: that
is, the total error is the sum of each of the individual sources of error. As long as there
are a large number of such sources of error, the total distribution will approximate a
Gaussian distribution. Any experiment that yields a non-Gaussian distribution probably
has some source of systematic error or some hidden correlation between the random
sources of error.
We should note that we are considering physical measurements in which the
uncertainty of measured values arises from random errors in the measuring device, not
from variations in the “true” value that is measured. Within the realm of classical
physics, we assume that the “true” value of any physical parameter has no uncertainty
and that all of the uncertainty arises from the process of measurement. Thus, the
measured distributions are nearly always Gaussian. In many other applications of
statistics, however, the underlying parameter may exhibit intrinsic variance and a notably
non-Gaussian distribution. For instance, the distribution of family incomes in the United
States is highly non-Gaussian: the vast majority of families have moderate incomes, but
there is a long “tail” that extends up to very high incomes. Such distributions are said to
be skewed. Under most circumstances, highly skewed distributions will not result from
random measurement errors.
The central limit theorem properly applies only in the limit of an infinite number
of random variables. If one examines how the sum of a finite number of random
variables converges on a Gaussian distribution, one observes that the central part of the
distribution converges quite rapidly, but the “tails” of the distribution converge more
slowly. Although the Gaussian distribution is mathematically unbounded, you should not
take the extreme tails of this distribution seriously: in the “ball drop” experiment, for
instance, a literal interpretation of the Gaussian distribution would suggest that there is a
non-zero probability of measuring a negative velocity, or a velocity faster than the speed
of light. Likewise, although a graph of the heights of adult women shows an
approximately Gaussian distribution, a literal interpretation of this distribution would
suggest that there is a non-zero probability of finding an adult woman who is 100 feet
tall. As far as physical measurements are concerned, you should regard the central limit
theorem as a statement that the middle of a measured distribution should look
approximately Gaussian.
4. Repeating Measurements: Standard Deviation and Standard Error
Ideally, you would repeat every measurement enough times to plot a histogram
and confirm that the distribution is indeed Gaussian. In reality, though, such a procedure
would be unnecessarily time-consuming. Many experiments involve repeating a similar
measurement for several different initial conditions. For instance, you might measure the
velocity of a ball upon impact after dropping it from various heights. You could drop it
from one height 50 times, then drop it from a different height 50 times, and so on. Or,
you could drop it from a single height 50 times, confirm that the distribution is Gaussian
with a particular standard deviation, and then drop it from each other height only once.
You could assume that the standard deviation of the other experiments should be about
the same as the standard deviation of the first experiment. As long as the various sources
of experimental error are random and uncorrelated, this assumption is reasonable. With
each such measurement, you can report the expected standard deviation of that
measurement. You have implicitly followed this procedure whenever you have used a
standard measuring device that has a stated uncertainty. For instance, a laboratory
balance might state an uncertainty of “±0.1 mg.” In this case, the manufacturer has made
repeated measurements of various masses and found that the standard deviation is 0.1
mg. You could, with confidence, make a single measurement of the mass of an object
and report it with an uncertainty of 0.1 mg. (Of course, you would have to be sure that
the balance is in good working order and that it has been calibrated properly. We spend
tens of thousands of dollars each year to calibrate the laboratory equipment used in the
teaching labs in the Science Center!)
In order to determine the standard deviation of a measuring device, you must
collect enough repeated measurements to verify that the distribution is indeed
approximately Gaussian. You must also collect enough measurements to have some
measurements in the “tails” of the distribution. A good rule of thumb is that a standard
deviation will be fairly accurate if you collect at least 30 repeated measurements. With
that number of measurements, you should obtain some measurements beyond two
standard deviations from the mean (according to the “68–95–99.7” rule), and you can
verify that the distribution of measurements is approximately Gaussian.
Even if you know the standard deviation of a measuring device, you might still
want to make repeated measurements. Making repeated measurements should not
change the standard deviation of the measurement: we expect that the standard deviation
is an intrinsic property of the particular experiment and measuring device. However,
making repeated measurements will reduce the standard error of the mean for the
measurement. The standard error of the mean for a series of repeated measurements is
related to the standard deviation σ and the number of measurements N:
Standard error = σ / √N
The standard error can be thought of as the standard deviation of the mean of a series of
repeated measurements. For instance, in the above example the standard deviation is
σ = 0.11 m/s. The experiment was repeated 50 times, so the standard error is 0.016 m/s.
We could report the result of these 50 measurements in the following manner:
Velocity = 1.028 ± 0.016 m/s (N = 50)
Note that the reported uncertainty of ± 0.016 m/s is the uncertainty of the mean, not the
standard deviation of the measurement itself. When you report a measurement in this
fashion, you are implicitly reporting a distribution of measurements, not a single
measurement. Providing the number of measurements (N = 50) tells the reader that you
repeated the measurement 50 times. As a side note, if you are reporting a value using
scientific notation, you should include the uncertainty within the mantissa, as in the
following example:
Velocity = (1.028 ± 0.016) × 10⁻³ km/s (N = 50)
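The bookkeeping above is mechanical and worth encoding once. A Python sketch (the function name is ours; the numbers are those of the ball-drop example):

```python
import math

def report(mean, sigma, n, unit):
    """Format a result as 'mean ± standard error (N = n)', as described above."""
    se = sigma / math.sqrt(n)
    return f"{mean:.3f} ± {se:.3f} {unit} (N = {n})"

# 50 drops with mean 1.028 m/s and standard deviation 0.11 m/s.
print(report(1.028, 0.11, 50, "m/s"))
```

This reproduces the report quoted earlier: 1.028 ± 0.016 m/s (N = 50).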
In general, when a reader sees a measured value reported as “xxx ± yy” he or she
will assume that the distribution is approximately Gaussian with a mean of xxx and a
standard error of yy. You should keep that assumption in mind when reporting scientific
data. Whenever you make repeated measurements, you should:
i) construct a histogram from your data;
ii) calculate the mean and standard deviation;
iii) draw a Gaussian curve for the calculated mean and standard deviation;
iv) if the Gaussian curve is a reasonable fit to the observed histogram, report the mean and the standard error of the mean as described above; if not, you should probably report the full histogram.
Knowing the standard error of the mean allows us to estimate the confidence we
have in our measurement of the mean. Using the “68–95–99.7 rule”, we can be 68%
confident that the true velocity is within one standard error of the mean, and 95%
confident that the true velocity is within two standard errors of the mean. (Of course, this
conclusion is true only if we have eliminated the possibility of systematic error.) Thus,
with 50 measurements, we can state that there is a 95% chance that the true velocity lies
between 1.00 and 1.06 m/s. We use these confidence intervals when we compare the
results from various experiments. For example, we might perform another “ball drop”
experiment with a heavier ball. As long as air resistance is negligible, the velocity upon
impact should be the same with the heavy ball as it was with the light ball. If we find, for
instance, that the velocity of the heavy ball is between 1.03 and 1.09 m/s (with a
confidence of 95%), then the velocity of the heavy ball is statistically indistinguishable
from that of the light ball measured earlier. However, if we find that the velocity of the
heavy ball is between 1.08 and 1.14 m/s (at a 95% confidence level), then we can be 95%
certain that the velocity of the heavy ball is indeed greater than that of the light ball.
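The comparison logic can be sketched as follows (the helper is ours; the heavy-ball mean of 1.06 m/s is inferred from the 1.03–1.09 m/s interval quoted above):

```python
import math

def ci95(mean, sigma, n):
    """Approximate 95% confidence interval for the mean: mean ± 2 standard errors."""
    half = 2 * sigma / math.sqrt(n)
    return (mean - half, mean + half)

light = ci95(1.028, 0.11, 50)   # light ball, from the earlier example
heavy = ci95(1.060, 0.11, 50)   # heavy ball (hypothetical mean of 1.06 m/s)

# Two results are statistically indistinguishable when their intervals overlap.
overlap = light[1] >= heavy[0] and heavy[1] >= light[0]
print(f"light: ({light[0]:.2f}, {light[1]:.2f})  heavy: ({heavy[0]:.2f}, {heavy[1]:.2f})")
print("indistinguishable at 95%" if overlap else "distinguishable at 95%")
```

With a heavier-ball mean of 1.11 m/s instead, the intervals separate and the two velocities become distinguishable, matching the second case in the text.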
If we had made only one measurement, the standard error would be equal to the
standard deviation. Suppose, for instance, that we made only one measurement of the
velocity and we “got lucky”: the measurement was 1.028 m/s (the same as the mean that
we obtained from making 50 measurements). We would report this observation as:
Velocity = 1.03 ± 0.11 m/s (N = 1)
The standard error, for one measurement, is equal to the standard deviation (0.11 m/s). In
this case, we could claim only that there is a 95% chance that the true velocity lies
between 0.81 and 1.25 m/s. Although the standard deviation is the same in both cases,
the use of repeated measurements allows us to make a much more precise statement
about the mean of the distribution. You should keep in mind both the standard deviation
and the standard error of the mean in any discussion or analysis of experimental
measurements.
Note that the standard error is inversely proportional to the square root of the
number of measurements. Thus, to narrow the standard error by a factor of 10, you
would need to make 100 repeated measurements. You could achieve the same result by
improving the experiment and the measuring device to reduce the intrinsic standard
deviation by a factor of 10. Depending on the experiment, one of these procedures may
be more straightforward than the other. Some physical experiments use thousands or
millions of repeated measurements—collected automatically by a computer—to reduce
the standard error of the experiment to within reasonable bounds.
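The square-root scaling is worth seeing concretely. With the ball-drop standard deviation of 0.11 m/s:

```python
import math

sigma = 0.11  # intrinsic standard deviation of a single measurement (m/s)
for n in (1, 100, 10_000):
    print(f"N = {n:>6}: standard error = {sigma / math.sqrt(n):.4f} m/s")
```

Each factor of 100 in the number of measurements narrows the standard error by only a factor of 10.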
5. Propagation of Error
You may have encountered the dreaded term “propagation of error” in a previous
science course. The central concept is that any arithmetic operations on uncertain
numbers will produce a result that is uncertain; the tools of “propagation of error” allow
us to estimate this resulting uncertainty. We do not expect you to memorize formulas for
the propagation of error: you can find such formulas in standard textbooks or on the Web.
We will simply walk through one example so you can see the general concept of
propagation of error and understand how it works.
In your first lab activity, you will simulate sources of random measurement error
using three different techniques. You will assume that the “true value” of a measured
parameter is 100, and you will model the “experimental error” by rolling dice, flipping
coins, and choosing random digits from a phone book. Each of these sources of error
should be random, and you will add them all to the true value of 100 to yield the value
that is measured by the (hypothetical) “noisy instrument”:
(100) + (dice) + (coins) + (phonebook) = measurement
We assume that 100 has no uncertainty, since it is the “true value.” Each of the other
values—dice, coins, and phonebook—has some uncertainty, as does the sum. You will
calculate the standard deviations of each of these sources of error in your lab activity.
Let us represent the standard deviations of the values dice, coins, phonebook, and
measurement by the symbols σd, σc, σp, and σm respectively. Using these symbols, the
expected standard deviation of the total measurement can be calculated from the formula
for the propagation of error for addition:
σm² = σd² + σc² + σp²
You will usually see this formula for the propagation of error written (equivalently) as:
σm = √(σd² + σc² + σp²)
This formula is sometimes referred to as the “RSS” formula for propagation of error: the
initials stand for “root of sum of squares.”
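The RSS combination is a one-liner in practice. A Python sketch (the helper name is ours, and the three standard deviations below are hypothetical stand-ins for the dice, coin, and phone-book values you will measure in lab):

```python
import math

def rss(*sigmas):
    """Combine uncorrelated errors in quadrature: the root of the sum of squares."""
    return math.sqrt(sum(s ** 2 for s in sigmas))

# Hypothetical standard deviations for the three simulated noise sources.
sigma_d, sigma_c, sigma_p = 3.0, 1.5, 2.9
print(f"sigma_m = {rss(sigma_d, sigma_c, sigma_p):.2f}")
```

Note that the combined error is dominated by the largest individual source; halving a small contribution barely changes the total.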
There is an analogous formula for the propagation of error that uses the standard
error instead of the standard deviation. That is, if we denote the standard error of the
mean for the individual values as SEd, SEc, SEp and SEm the expected standard error of the
mean for the overall measurement is given by:
SEm = √(SEd² + SEc² + SEp²)
This fact that the squares of the individual errors are added together to yield the
square of the overall error is often summarized by the statement “errors add in
quadrature.” (Recall that a quadratic equation is an equation that contains a squared term
like x2.) If errors added linearly, then the multiple sources of error in physical
experiments would accumulate so quickly that it would be exceedingly difficult to make
any precise measurements. As an example, consider a measurement whose “true value”
is 100 in which there are five sources of error, each with a standard error of 10. If the
errors added linearly, we would expect the total error to equal 50; that is, we would
expect the measured values to range from 50 to 150. Since the errors add in quadrature,
however, the expected standard error is:
SE = √(10² + 10² + 10² + 10² + 10²) = √500 ≈ 22.4
which is less than half of the standard error that would be expected if the errors added
linearly. We can add errors in quadrature when we expect the errors to be uncorrelated.
For instance, we expect that it is extremely unlikely that in a single experiment all the
sources of error are +10, or that all the sources of error are –10. If some of the errors are
correlated, we must use other formulas for the propagation of error that account for the
correlations. Such considerations are beyond the scope of this course.
6. “Executive Summary”
• All measurements exhibit random error, which is unavoidable, and systematic error,
which can be eliminated by proper calibration of the measuring device.
• Repeated measurements yield a statistical distribution that is almost always Gaussian;
such a distribution is characterized fully by its mean (µ) and standard deviation (σ).
• The standard deviation is a measure of the width of the distribution, and is
mathematically related to the full width at half-maximum, or FWHM.
• You should repeat one measurement at least 30 times with a particular measuring
device to determine the intrinsic standard deviation of that device.
• You may choose to repeat other measurements to minimize the standard error of the
mean, which is inversely proportional to the square root of the number of measurements.
• The standard error of the mean is a measure of the uncertainty of the mean; you can be
95% confident that the “true” mean lies within 2 standard errors of the measured mean.
• Uncorrelated random errors add in quadrature: the overall error is the root of the sum
of the squares of the individual sources of error.
For more information on error analysis and propagation of errors, you should consult the
excellent text by John R. Taylor, An Introduction to Error Analysis, 2nd ed., Sausalito,
CA: University Science Books, 1997.