
Independent Samples vs. Paired Samples · 2018-03-19 · Lee Kucera Page 1 4/7/13 Independent Samples vs. Paired Samples Sometimes our two samples of data aren’t independent (we


Lee Kucera Page 1 4/7/13

Independent Samples vs. Paired Samples

Sometimes our two samples of data aren't independent, so we can't use a two-sample t-test. Samples may be paired in several ways (think back to Chapter 2): siblings (especially twins) are naturally occurring pairs; individuals can be their own "pair" (before/after results, left/right hand, foot, or side); we can pair measurements over time; and we can place experimental subjects into pairs based on similar characteristics before assigning them to treatments (the two highest blood pressures, the next two, down to the two lowest; the two oldest down to the two youngest; etc.). The purpose of pairing is to reduce variation.
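The last kind of pairing described above (rank subjects on a characteristic such as blood pressure, pair neighbours, then randomize the treatments within each pair) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the helper name `make_matched_pairs`, the treatment labels "A" and "B", and the blood-pressure values are all hypothetical.

```python
import random


def make_matched_pairs(subjects, key, treatments=("A", "B"), seed=None):
    """Hypothetical helper: rank subjects by `key` (highest first), pair
    neighbours, then randomly assign the two treatments within each pair."""
    ranked = sorted(subjects, key=key, reverse=True)
    if len(ranked) % 2:
        raise ValueError("need an even number of subjects to form pairs")
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    pairs = []
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]  # two most similar remaining subjects
        rng.shuffle(pair)                  # coin flip: who gets which treatment
        pairs.append(dict(zip(treatments, pair)))
    return pairs


# Hypothetical systolic blood pressures, not real data.
bp = {"P1": 162, "P2": 118, "P3": 145, "P4": 151, "P5": 127, "P6": 133}
pairs = make_matched_pairs(bp, key=bp.get, seed=1)
for p in pairs:
    print(p)
```

Each printed dict shows which member of a pair gets treatment A and which gets B. Because the random assignment happens only inside a pair, the two treatment groups stay balanced on blood pressure.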

Which type of sample do we have in each case? Explain. (thanks to Bob Hayden)

1. To test the effect of background music on productivity, workers are observed for one month with no music and for another month with background music. A worker's productivity measurement with music is paired with a productivity measurement for the same worker without music. Randomize the order of the two conditions.

2. A random sample of 10 workers in Plant A is to be compared with a sample of 10 workers in Plant B. If we pick a worker in Plant A at random, we have no information that would allow us to match that worker with a particular worker in Plant B. Hence we would treat these as independent samples.

3. A new weight-reducing diet was tried on ten women. The weight of each woman was measured before the diet and again after ten weeks on the diet. A woman's weight before the diet is paired with her weight after the diet.

4. To compare the average weight gain of pigs fed two different rations, nine pairs of pigs were used. The pigs in each pair were litter-mates, so the litter-mates form natural pairs, although some hog farmers question whether this is a worthwhile pairing. Randomly assign the two members of each pair to the two rations.

5. To test the effects of a new fertilizer, 100 plots are treated with the new fertilizer and another 100 plots with the old one. If we pick at random a plot treated with the new fertilizer, we have no information that would allow us to match that plot with one of the plots where the old fertilizer was used. Hence we would treat these as independent samples.

6. A sample of college teachers is taken. We wish to compare the average salaries of male and female teachers. We would like to pair teachers with the same degrees, years of experience, publications, etc., but this is usually impossible and we have no choice but to take independent samples.


7. A new fertilizer is tested on 100 plots. Each plot is divided in half. Fertilizer A is applied to one half and B to the other. This is called a split-plot design. The results obtained on one half of a plot are paired with the results obtained on the other half of the same plot. Randomly assign the fertilizers to the two halves.

8. Consumers Union wants to compare two types of calculators. They get 100 volunteers and ask them to carry out a series of 50 routine calculations (such as figuring discounts, sales tax, totaling a bill, etc.). Each calculation is done on each type of calculator, and the time required for each calculation is recorded. The time it takes one individual to do a particular calculation on one calculator is paired with the time it takes the same person to do the same calculation on the other calculator. Randomly assign the order in which the two calculators are used.

We want to see if cars get better gas mileage running on premium fuel than on regular. Think about two possible designs for a study:

Plan A: get 20 cars, randomly split them into 2 groups, and run one group on regular gas and the other on premium.
Plan B: try each car on both types of gas, randomizing which tankful comes first.

There are many reasons why gas mileage may vary: differences in cars, drivers, routes, traffic, etc. Note that Plan B uses the SAME CARS (and possibly the same drivers and routes) for both kinds of gas. This better control of sources of variability gives us more power to detect a difference caused by the type of gasoline: because the variation from other sources is smaller, true differences due to the fuel stand out better against the background noise. If we were to observe the same mpg difference in studies using Plan A and Plan B, we would find the evidence from Plan B more convincing, and that shows up as a lower P-value.
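A small Python sketch can make the "background noise" argument concrete. The mpg readings below are made up for 10 cars (hypothetical numbers, not data from these notes). The sketch computes the mean premium-minus-regular difference and two t statistics: one that treats the two columns as if they came from separate groups of cars, so car-to-car variation stays in the standard error as it would under Plan A, and the matched pairs statistic that Plan B allows.

```python
import math
import statistics

# Hypothetical mpg readings for the same 10 cars (Plan B); not real data.
regular = [22.0, 31.0, 18.0, 27.0, 35.0, 24.0, 29.0, 20.0, 33.0, 26.0]
premium = [23.0, 31.5, 19.5, 27.5, 36.0, 25.0, 29.0, 21.5, 33.5, 27.0]

diffs = [p - r for p, r in zip(premium, regular)]
mean_diff = statistics.mean(diffs)  # equals mean(premium) - mean(regular)

# Plan A-style standard error: car-to-car variation remains part of the noise.
se_independent = math.sqrt(statistics.variance(premium) / len(premium)
                           + statistics.variance(regular) / len(regular))

# Plan B (matched pairs) standard error: each car is compared with itself.
se_paired = statistics.stdev(diffs) / math.sqrt(len(diffs))

t_independent = mean_diff / se_independent
t_paired = mean_diff / se_paired

print(f"mean difference:             {mean_diff:.2f} mpg")
print(f"t if treated as independent: {t_independent:.2f}")
print(f"t for matched pairs:         {t_paired:.2f}")
```

With these numbers the mean difference is 0.85 mpg in both calculations, but the paired t statistic is about 5.7 versus about 0.35, which corresponds to a far smaller P-value. Analyzing truly paired data as if it were independent is not appropriate; the first calculation is only there to show how much noise the differences between cars add.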

1. Null hypothesis: H0: µd = hypothesized value

Alternative hypothesis: Ha: µd ≠ hypothesized value (or µd < or µd > hypothesized value, depending on the claim)

2. Name test: matched pairs t-test

Requirements/Assumptions:

• The samples are paired.
• Randomize treatment assignments where needed.
• The n sample differences can be viewed as a random sample from a population of differences.
• The number of sample differences is large (generally at least 30) OR the population distribution of differences is approximately normal.
• The number of sample differences is less than 10% of the population of differences.

3. Test statistic: $t = \dfrac{\bar{x}_d - \mu_d}{s_d/\sqrt{n}}$, where $\bar{x}_d$ and $s_d$ are the mean and standard deviation of the n sample differences and $\mu_d$ is the hypothesized value from H0.

P-value: from the t distribution with df = n − 1

4. Significance level: α = ____

Is P-value < α?

Conclusion in context:

ex. 1 A weight reduction center advertises that participants in its program lose an average of at least 5 pounds during the first week of participation. Because of numerous complaints, the state's consumer protection agency (CPA) doubts this claim. To test the claim at the 0.05 level of significance, 12 participants were randomly selected. Their initial weights and their weights after 1 week in the program appear in the table below. Is the CPA correct in doubting the claim? (Peck, Olsen, Devore)

Member   Initial weight   Weight after one week   Difference (initial − one week)
1        195              195                     0
2        153              151                     2
3        174              170                     4
4        125              123                     2
5        149              144                     5
6        152              149                     3
7        135              131                     4
8        143              147                     -4
9        139              138                     1
10       198              192                     6
11       215              211                     4
12       153              152                     1

1. H0: µd = 5, where d = initial weight − weight after one week (the weight loss)
Ha: µd < 5 (concerned that the mean weight loss is less than 5 pounds)

2. Test: matched pairs t-test

Assumptions/Requirements:

• Told that the sample is randomly selected (no treatments to randomize).
• Two weight measurements for each individual: matched pairs, not independent.
• Reasonable to assume there are more than 120 program participants, so n < 10% of N.


• The sample size (12) is small, so we check plots of the differences: the boxplot shows one outlier, but the distribution is reasonably symmetric, and the normal probability plot suggests it is reasonable to assume that the population of differences (weight losses) is approximately normal.

3. Test statistic: t = -3.45
P-value: 0.00269 with df = 11

4. Significance level: α = 0.05

Is P-value < α? Is 0.00269 < 0.05? Yes, so we reject H0 and accept Ha.

Context: There is strong evidence that the mean weight loss for those who took the program for one week is less than 5 pounds.
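The numbers in steps 2 and 3 can be reproduced from the table. The Python sketch below (standard library only) computes the differences, applies the common 1.5 × IQR boxplot rule to flag outliers, and then computes the matched pairs t statistic and its lower-tail P-value. The helper names `t_pdf` and `t_cdf` are hypothetical; the P-value comes from numerically integrating the t density with Simpson's rule rather than from a statistics package, and quartile conventions differ slightly between calculators and software (here `statistics.quantiles` with its default method).

```python
import math
import statistics

# Data from the ex. 1 table.
initial = [195, 153, 174, 125, 149, 152, 135, 143, 139, 198, 215, 153]
one_week = [195, 151, 170, 123, 144, 149, 131, 147, 138, 192, 211, 152]
diffs = [a - b for a, b in zip(initial, one_week)]  # weight loss = initial - one week

# Boxplot check with the 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(diffs, n=4)
iqr = q3 - q1
outliers = [d for d in diffs if d < q1 - 1.5 * iqr or d > q3 + 1.5 * iqr]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t) via Simpson's rule on the density from 0 to |t| (steps even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


mu0 = 5  # hypothesized mean weight loss under H0
n = len(diffs)
df = n - 1
x_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t_stat = (x_bar - mu0) / (s_d / math.sqrt(n))
p_value = t_cdf(t_stat, df)  # lower tail, because Ha: mu_d < 5

print("differences:", diffs)
print("outliers:", outliers)
print(f"t = {t_stat:.2f}, df = {df}, P-value = {p_value:.5f}")
```

It should report t ≈ -3.45 with df = 11 and a P-value of about 0.0027, matching the values above, and flag -4 (member 8) as the single outlier.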


Iowa Test Problem

The Iowa Tests of Basic Skills is a collection of achievement tests given to students in grades 1 through 8. Student achievement levels are reported on a scale that runs from 0 to 13. The North Snowshoe Community Schools are evaluating their reading program for students whose native language is not English. In one part of the study, 10 students, randomly selected from a large population of students in the program, took the Reading Comprehension test in third and fourth grade. Their scores for each year, in grade equivalents, are listed below. (Peck, Olsen, Devore)

Student   3rd Grade Score   4th Grade Score   Difference (4th − 3rd)
1         3.5               4.8               1.3
2         3.2               3.5               0.3
3         2.7               2.6               -0.1
4         3.9               4.8               0.9
5         3.5               4.2               0.7
6         5.8               6.5               0.7
7         4.6               5.2               0.6
8         2.5               2.9               0.4
9         2.4               2.2               -0.2
10        3.5               3.7               0.2

"Normal" growth, by definition, is a change of 1.0. Using the data above, test the hypothesis that the mean difference (4th grade score − 3rd grade score) for students in this program for non-native English speakers is equal to 1.0.

1. H0: µd(4th grade score − 3rd grade score) = 1
Ha: µd(4th grade score − 3rd grade score) < 1 (growth greater than 1 would not be a concern)

2. Matched pairs t-test. Requirements:

• Students were randomly selected.
• Not independent: the reading scores come from the same 10 students in 3rd and 4th grade.
• The sample is not large (n = 10), but the boxplot and histogram show no outliers or skewness, so it is reasonable to assume the population distribution of differences (growth) is approximately normal.
• There are more than 100 students in the program, so n ≤ 10% of N.

3. t = -3.6017, df = 9, P-value = 0.00287

4. α = 0.05. Is P-value ≤ α? Is 0.00287 ≤ 0.05? Yes, so reject H0 and accept Ha. There is strong evidence that the mean difference (growth over one year) for students in this program is less than the normal growth of 1.0 (one year).
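As a hand check of the test statistic in step 3, the worked arithmetic below uses the Difference column of the table (sum of differences 4.8, sum of squared differences 4.18) and the formula $t = (\bar{x}_d - \mu_d)/(s_d/\sqrt{n})$ with hypothesized value 1; intermediate values are rounded, so the last digit differs slightly from the calculator output.

```latex
\begin{aligned}
\bar{x}_d &= \frac{\sum d_i}{n} = \frac{4.8}{10} = 0.48 \\
s_d &= \sqrt{\frac{\sum d_i^2 - n\bar{x}_d^2}{n-1}}
     = \sqrt{\frac{4.18 - 10(0.48)^2}{9}}
     = \sqrt{\frac{1.876}{9}} \approx 0.4566 \\
t &= \frac{\bar{x}_d - 1}{s_d/\sqrt{n}}
   = \frac{0.48 - 1}{0.4566/\sqrt{10}}
   \approx \frac{-0.52}{0.1444} \approx -3.60, \qquad df = n - 1 = 9
\end{aligned}
```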