Copyright © 2017 National Math + Science Initiative, Dallas, Texas. All rights reserved. Visit us online at www.nms.org
AP Statistics
Designing Studies & Experiments
Student Handout
2017-2018 EDITION
Designing Studies and Experiments
A free response question dealing with sampling or experimental design has appeared on every
AP Statistics exam. The question is designed to assess your understanding of fundamental
concepts such as identifying a potential source of bias and its consequences, describing an
appropriate sampling method, recognizing the difference between random sampling and random
assignment, and recognizing when an inference or generalization can be made based on the
design of the study. Student responses using clear communication and correct statistical
vocabulary earn the highest score.
Reminders about content and communications:
• When identifying potential bias, be sure to link the bias to a consequence with a specific
direction: state whether the statistic is likely to overestimate or underestimate the parameter.
• When describing a simple random sampling method based on a situation, describe a valid
sampling procedure that allows each group of n individuals an equal opportunity to be
randomly selected. Your description should be clear enough to be implemented in the
same way by several different first-year statistics students.
• Without random samples from the population, results cannot be generalized to the
population. All factors of the situation must be considered before making a generalization.
• Without random assignment to treatment groups, cause and effect conclusions cannot be
made from experimental results.
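The simple random sampling reminder above can be sketched in code. This is only an illustration, assuming a hypothetical roster of 30 students and Python's built-in random number generator; the function name `simple_random_sample` is made up for this sketch. Each individual gets a label from 1 to N, labels are drawn at random, repeats are ignored, and drawing stops once n different individuals have been chosen.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Select n distinct units by drawing labels 1..N and ignoring repeats."""
    if n > len(units):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
    chosen_labels = []
    while len(chosen_labels) < n:
        label = rng.randint(1, len(units))  # random label from 1 to N, inclusive
        if label not in chosen_labels:      # ignore repeated labels
            chosen_labels.append(label)
    # Translate each chosen label back to the corresponding unit
    return [units[label - 1] for label in chosen_labels]

# Hypothetical population: 30 students labeled 1 through 30
roster = [f"Student {i}" for i in range(1, 31)]
sample = simple_random_sample(roster, 5, seed=1)
print(sample)
```

A written exam response would describe the same steps in words: how the labels are assigned, how the random numbers are generated, what happens with repeats, and when to stop.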
2011 #3 – slightly modified
An apartment building has nine floors and each floor has four apartments. The building owner wants to install new carpeting in eight apartments to see how well it wears before she decides whether to replace the carpet in the entire building. The figure below shows the floors of apartments in the building with their apartment numbers. Only the nine apartments indicated with an asterisk (*) have children in the apartment.
(a) Is this an observational study or an experiment? Explain.
(b) For convenience, the apartment building owner wants to use a cluster sampling method, in which the floors are clusters, to select the eight apartments. Describe a process for randomly selecting eight different apartments using this method.
(c) An alternative sampling method would be to select a stratified random sample of eight apartments, where the strata are apartments with children and apartments with no children. A stratified random sample of size eight might include two randomly selected apartments with children and six randomly selected apartments with no children. In the context of this situation, give one statistical advantage of selecting such a stratified sample as opposed to a cluster sample of eight apartments using the floors as clusters.
Multiple Choice Questions:
1. A large company wants to conduct a survey to determine the proportion of its male
employees who practice yoga on a daily basis. Two of its regional offices are chosen at
random and all of the male employees at each office are surveyed. This plan is an example of
which type of sampling?
A) Cluster
B) Convenience
C) Simple random
D) Stratified random
E) Systematic
2. Which of the following is a key distinction between well designed experiments and
observational studies?
A) More subjects are available for experiments than for observational studies.
B) Ethical constraints prevent large-scale observational studies.
C) Experiments are less costly to conduct than observational studies.
D) An experiment can show a direct cause-and-effect relationship, whereas an observational
study cannot.
E) Tests of significance cannot be used on data collected from an observational study.
3. A local television station is interested in how citizens in a small town feel about the increased
sales tax proposed by the city council. The question “Are you in favor of the proposed sales
tax increase that will be used to improve the sidewalks and streets in downtown?” was shown
on the screen during the evening news broadcast and viewers were instructed to text their
answer to the number given on the screen. This survey method could produce biased results
for all of the following reasons except
A) the wording of the question is biased.
B) a person could answer the survey multiple times.
C) it uses a stratified sample rather than a simple random sample.
D) the survey excludes voters who do not watch the evening newscast.
E) people who feel strongly about the issue are more likely to respond.
4. The oil used in gasoline engines for cars can be mineral oil, a synthetic blend (a mixture of
mineral oil and synthetic oil), or pure synthetic oil. An experiment is to be conducted to
determine whether oil type affects a car’s engine. In previous studies, it was determined that
engine size (4-cylinder, 6-cylinder, 8-cylinder) is associated with engine life, but car type
(coupe, sedan, wagon) is not associated with engine life. This experiment would best be done
A) by blocking on car type
B) by blocking on oil type
C) by blocking on engine life
D) by blocking on engine size
E) without blocking
5. The director of an alumni association wants to estimate the mean income of the members of
the class of 1999. Each person in the class of 779 graduates was given the survey and 163 of
the graduates returned the survey. How could the nonresponse by the 616 graduates who did
not return the survey cause the results of the survey to be biased?
A) The graduates who did not respond caused the assumption of independence to be invalid.
B) The graduates who did not respond changed the survey from a census to a simple random
sample of graduates.
C) The graduates who did not respond reduced the sample size and smaller samples are more
biased than large samples.
D) The graduates who did not respond may represent a group that is homogeneous with
respect to income and differs from the graduates who did respond.
E) The graduates who did not respond may represent a group that is heterogeneous with
respect to income and is similar to the graduates who did respond.
6. A consumer group for a camping and hiking magazine would like to compare a new
mosquito spray made with essential oils to a traditional spray that contains DEET. The group
is interested in the amount of time the spray protects the wearer from the bloodthirsty insects.
Subjects will be treated and sent into a mosquito filled meadow for a 12-hour period. Which
of the following is the BEST method for assigning the treatments?
A) Have the subjects choose which spray they are willing to use for the 12-hour period.
B) Assign the sprays to the subjects on the basis of their camping and hiking experience
without randomization.
C) Give the new spray to all subjects for a 12-hour period, then give the DEET spray to all
subjects for a second 12-hour period.
D) Randomly assign the subjects to two groups, giving the new spray to one group and the
DEET spray to the second group.
E) Each subject uses a randomization device to select which spray to apply to the right side
of their body. The other spray is then applied to the left side of their body.
7. An education researcher would like to give a survey to a stratified random sample of students
in a large school district using grade level as the strata. Which of the following would NOT
be a characteristic of this stratified random sample?
A) A random sample of students will be chosen.
B) Each student in the population belongs to only one stratum.
C) The population of students will be divided into homogeneous groups by grade level.
D) Proportional numbers of students could be selected from each grade level.
E) Every possible subset of the population of students in the district has the same chance of
being selected.
8. Students taking an exam at Westwood High School were randomly selected to receive either a
peppermint candy or a similar looking candy without peppermint to determine if peppermint
really improves thinking. Both groups showed an increase in test scores.
This is an example of
A) a successful experiment due to the peppermint treatment.
B) poor design due to the lack of a control group.
C) measurement bias since we do not know the difficulty level of the exam.
D) the placebo effect due to the increase of scores in the non-peppermint group.
E) blocking by peppermint and no peppermint candy.
9. A chemical company designs an experiment to determine whether or not a new pesticide will
work better than a commonly used treatment to eliminate ants. The company uses a sample of
fire ants to test the new pesticide. The proportion of surviving ants that were randomly
selected to receive the new pesticide was significantly lower than the proportion of ants that
received the commonly used pesticide. The company concluded that the new pesticide is
indeed better at killing ants. Is this a correct conclusion?
A) No, because there was not a group that received no pesticide.
B) No, because the experiment was only done with fire ants so the results cannot be
generalized to all ants.
C) Yes, because the ants were randomly selected for the two treatment groups.
D) Yes, because the difference was statistically significant.
E) Yes, because there was a control group to reduce the effects of confounding variables.
Additional Free Response Questions:
2004A Question 2
Researchers who are studying a new shampoo formula plan to compare the condition of hair for people who use the new formula with the condition of hair for people who use the current formula. Twelve volunteers are available to participate in this study. Information on these volunteers (numbered 1 through 12) is shown in the table below.
Volunteer Gender Age
1 Male 21
2 Female 20
3 Male 47
4 Female 60
5 Female 62
6 Male 61
7 Male 58
8 Female 44
9 Male 44
10 Female 24
11 Male 23
12 Female 46
(a) These researchers want to conduct an experiment involving the two formulas (new and current) of shampoo. They believe that the condition of hair changes with age but not gender. Because researchers want the size of the blocks in an experiment to be equal to the number of treatments, they will use blocks of size 2 in their experiment. Identify the volunteers (by number) that would be included in each of the six blocks and give the criteria you used to form the blocks.
(b) Other researchers believe that hair condition differs with both age and gender. These researchers will also use blocks of size 2 in their experiment. Identify the volunteers (by number) that would be included in each of the six blocks and give the criteria you used to form the blocks.
(c) The researchers in part (b) decide to select three of the six blocks to receive the new formula and to give the other three blocks the current formula. Is this an appropriate way to assign treatments? If so, describe a method for selecting the three blocks to receive the new formula. If not, describe an appropriate method for assigning treatments.
2008 Question 2
A local school board plans to conduct a survey of parents’ opinions about year-round schooling
in elementary schools. The school board obtains a list of all families in the district with at least
one child in an elementary school and sends the survey to a random sample of 500 of the
families. The survey question is provided below.
A proposal has been submitted that would require students in elementary schools to attend school
on a year round basis. Do you support this proposal? (Yes or No)
The school board received responses from 98 of the families, with 76 of the responses indicating
support for year-round schools. Based on this outcome, the local school board concludes that
most of the families with at least one child in elementary school prefer year-round schooling.
(a) What is a possible consequence of nonresponse bias for interpreting the results of this
survey?
(b) Someone advised the local school board to take an additional random sample of 500 families
and to use the combined results to make their decision. Would this be a suitable solution to
the issue raised in part (a)? Explain.
(c) Suggest a different follow-up step from the one suggested in part (b) that the local school
board could take to address the issue raised in part (a).
Important vocabulary
• The population is the entire group you would like to study or draw a conclusion about. Any numerical value that comes from the population is a parameter. Parameters are usually unknown. The study of an entire population is called a census.
• The sample is the part of the population from which you take data. Any numerical value that comes from a sample is called a statistic.
• A simple random sample (SRS) of size n gives every individual and every group of size n an equal chance of being chosen. To carry out an SRS of size n, number the list of possible subjects or experimental units. Clearly describe how to complete the randomization using a random digit table or a random number generator. For each chosen number, write down the name of the corresponding subject or experimental unit. Ignore repeats. Continue until you have a list of n different subjects or experimental units.
• Choose a stratified random sample if you want to be sure to have some subjects from each subgroup in your sample. Split the population into subgroups called strata. Then take an SRS from each stratum. Note: All subjects in a stratum must be similar (homogeneous) with respect to a characteristic that might be related to the response variable. For example, when investigating the average amount of time spent on homework each night, the strata could be freshmen, sophomores, juniors, and seniors. Then your sample would be sure to have students from each grade level, which matters because grade level is probably related to average homework time.
• Use a cluster sample if you have many groups that are similar to each other. Randomly choose one or more groups to be the sample. Note: The subjects in the groups should not be alike (heterogeneous), but each group should be similar to every other group. For example, if each fourth grade class in an elementary school has students of all ability levels and all socioeconomic groups, then randomly choosing one class as a sample would give an acceptable representation of the fourth grade as a whole.
• A convenience sample is taken by asking whomever you happen to run into. It is quick and easy but not a good idea.
• Systematic random sample: choosing every nth person through a door or on a list.
• Bias occurs when a study systematically favors one outcome over another. It can occur when a certain group is over- or under-represented or when your measurement device impacts the results.
• Voluntary response bias occurs when subjects voluntarily choose to be in the sample, and people usually volunteer only if they have strong motivation.
• Undercoverage occurs when some groups of people are ignored when the sample is being chosen. A survey sent by email ignores people without access to email.
• Non-response bias occurs when one group of subjects having something in common do not answer. (For example, only people with a lot of time on their hands respond.)
• Response bias occurs when subjects give incorrect answers, either because they have forgotten details, they are intimidated by the interviewer, or they lie about embarrassing or illegal activities.
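The stratified and cluster sampling descriptions above can be contrasted in a short sketch. The data below are hypothetical (40 students per grade level, four fourth-grade classes of 25), and the function names `stratified_sample` and `cluster_sample` are invented for illustration. The stratified version takes a separate random sample from every stratum; the cluster version randomly chooses whole groups and keeps everyone in them.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate random sample from each stratum."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        # rng.sample selects without replacement within this stratum
        sample.extend(rng.sample(members, n_per_stratum[name]))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # choose cluster names
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical strata: 40 students in each grade level
strata = {
    "freshmen": [f"Fr{i}" for i in range(1, 41)],
    "sophomores": [f"So{i}" for i in range(1, 41)],
    "juniors": [f"Jr{i}" for i in range(1, 41)],
    "seniors": [f"Sr{i}" for i in range(1, 41)],
}
grade_sample = stratified_sample(strata, {grade: 5 for grade in strata}, seed=2)

# Hypothetical clusters: four fourth-grade classes of 25 students each
classes = {f"Class {c}": [f"{c}{i}" for i in range(1, 26)] for c in "ABCD"}
class_sample = cluster_sample(classes, 1, seed=2)

print(len(grade_sample), len(class_sample))
```

With these made-up numbers, the stratified sample always contains exactly five students from each grade, while the cluster sample is one entire class.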
• Experiments are studies in which the researcher imposes a treatment on experimental units.
• Sometimes different groups are simply compared with one another. If no treatment is
assigned or imposed, the study is called an observational study.
• Some experiments have a control group (a group of experimental units that receive no
treatment or receive only a placebo), but this is not necessary for a well-designed
experiment.
• Experimental units are the smallest independent “objects” to which treatments are assigned
and on which a response is measured.
Consider an experiment that is designed to determine which of several types of fish food
will result in the greatest weight gain for fish. If tanks contain several fish, and food is
added to the water in the tank, then the tank is the experimental unit (not the individual
fish), since the fish in a tank are not independent of one another, but tanks are independent
of one another.
• Replication refers to having multiple experimental units in each treatment group (repeating
the treatment), not to repeating the entire experiment.
• In an experiment, randomization refers to randomly assigning experimental units to the
treatments. Often the experimental units are not a random sample of the population of
interest. While this is not a problem with the experimental design, it may limit the scope of
inference for the experimental results. (Note that random samples are important in surveys.)
• The purpose of random assignment (of experimental units to treatment groups or of
treatments to experimental units) is to even out extraneous variables and make treatment
groups that are approximately similar in all respects except for the treatment.
• In a double blind experiment, someone must know which treatment the experimental unit
received! The subjects (assuming they are people) are blind to which treatments they are
receiving, and anyone who interacts with the subjects should also be blinded to which
treatment was given to each subject.
• If the response variable is in any way a subjective evaluation, then the person
performing that evaluation should be blind to what treatments were applied. But
obviously, some person or people on the research team must have a record of what
treatments have been applied to what subjects.
• A confounding variable is a variable that affects the response variable and also is related
to group membership. A variable that affects the response variable and is not related to
group membership (that is, the variable would be expected to even out across the groups) is
not a confounding variable. You may refer to this type of variable as an extraneous variable.
It is best to avoid using the term lurking variable.
o For example: It has been observed that people who take long vacations have, on
average, significantly longer lifespans than people who don’t. Can we conclude
that vacationing is a way to extend your lifespan? Not necessarily: a person’s
income could be a confounding variable—people with higher incomes are more
likely to be able to take long vacations, and they’re also more likely to afford
health care that could lead to longer lifespans. Note that something like exercise
would probably be an extraneous variable and not a confounding variable.
Exercise may indeed be associated with longer lifespans, but is there an
association between getting exercise and taking long vacations?
• Blocks are groups of experimental units that are homogeneous with respect to some
inherent characteristic that is expected to affect the response to treatments.
o Blocks are considered a form of control – blocks help control known sources of
variability among the experimental units so that the experimenter is better able to
detect differences in the response variable that are due to the treatments.
o “Blocking is used to control the factors you can see; randomization helps balance
the ones you cannot see.” Richard L. Scheaffer, AP Statistics Chief Faculty
Consultant, 1997‐1999
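The ideas of blocking and random assignment can be combined in a short sketch of a randomized block design. Everything named here is hypothetical: the unit labels, the three blocks, the two treatments, and the function `assign_within_blocks`. The sketch assumes the blocks have already been formed from units that are similar on some characteristic, and it randomizes treatments separately inside each block so that every treatment appears in every block.

```python
import random

def assign_within_blocks(blocks, treatments, seed=None):
    """Randomly assign treatments to units separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"{block_name}: block size must be a multiple "
                             "of the number of treatments")
        # Each treatment appears equally often within the block
        labels = list(treatments) * (len(units) // len(treatments))
        rng.shuffle(labels)  # the random assignment step
        for unit, treatment in zip(units, labels):
            assignment[unit] = treatment
    return assignment

# Hypothetical blocks of size 2, already formed from similar units
blocks = {
    "Block 1": ["U1", "U2"],
    "Block 2": ["U3", "U4"],
    "Block 3": ["U5", "U6"],
}
plan = assign_within_blocks(blocks, ["Treatment A", "Treatment B"], seed=3)
for block_name, units in blocks.items():
    print(block_name, {u: plan[u] for u in units})
```

Because the shuffle happens inside each block, the comparison between treatments is made among similar units, and chance alone decides which unit in a block receives which treatment.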
1
California is experiencing a
severe drought which affects
the yield of all crops.
Desalinated water may be an
option for irrigation but the
water still has a higher salt
content than ground water. Is
there a difference in pecan yield
using desalinated water?
2
Stevia is a natural substitute for
sugar extracted from a plant by
the same name.
Does using Stevia in place of
sugar reduce the risk of Type II
diabetes?
3
Good manners are listed as one
of the top ten skills adults
believe children need to succeed
in the modern world. What
percent of high school students
would be categorized as having
good manners?
4
Resveratrol is a plant based
compound found in places such as
the skin of grapes or
blueberries. Some believe
resveratrol lowers the risk of
cancer. How could we determine
if taking resveratrol supplements
is beneficial?
5
A city is hoping to avoid the
need of a new trash landfill by
giving every customer a recycling
bin as well as a trash bin. They
are interested in predicting the
amount of trash reduction going
into the landfill using the new
recycling bins. How should they
conduct a study to determine
this?
6
How long does it really take to
complete AP Statistics
homework?
7
As water shortages become
increasingly common in many
cities throughout the United
States, efforts to conserve this
necessary resource are being
explored. What water saving
habits are citizens most likely to
actually do to conserve water?
8
Sodium nitrate is often added to
lunch meat and hot dogs to fight
harmful bacteria. What long
term health effects do nitrates
have on our health?
9
Sleep:
Not now, of course!
High school students are
increasingly sleep deprived. Is
there a way to solve this
problem? You may come up with
a study or experiment to help
work towards a solution.
10
Wind power is renewable, clean,
and takes up a relatively small
amount of land area.
What is the optimum length for
the rotor blades of a wind
turbine for generating the
maximum amount of electricity?
11
What is the difference in
lifespan between lemurs living in
the wild and lemurs living in
captivity?
12
What is the smallest amount of
pesticide that can be used on
strawberry plants and still
remain effective in keeping bugs
from eating the crop?