INFERENCE WITH MATCHED PAIRS

a special type of t-inference

AP Statistics, Chapter 25

Which situation requires a 2-sample t procedure, and which requires matched pairs?

A researcher wishes to determine whether listening to music affects students' performance on a memory test.

a) He randomly selects 50 students and has each student perform a memory test once while listening to music and once without listening to music. He then compares the two scores for each student, “with music” vs. “without music”…

b) He takes 50 students, and has half of them perform a memory test without listening to music, and the other half perform the memory test while listening to music. He then obtains the means and standard deviations of the “with music” and the “without music” groups…

Answers: a) matched pairs; b) 2-sample t

Which situation requires a 2-sample t procedure, and which requires matched pairs?

A manufacturer has designed athletic footwear which it hopes will improve the performance of athletes running the 100-meter sprint.

a) 30 athletes are selected for this study. One group of 15 runs the sprint wearing the new footwear, and the other group of 15 runs with their normal footwear. He then compares the mean sprint times between the two groups…

b) 30 athletes are selected, each of them runs one sprint wearing the new footwear, and also one sprint with their normal footwear. Randomization (flipping a coin for each athlete) determines which footwear they run with first. The two times for each athlete are compared…

Answers: a) 2-sample t; b) matched pairs

A couple of tips (reminders?):

• For an experiment, you don’t need a random sample – volunteers are okay! But use randomization to split subjects into groups (use an RNG… or flip a coin for each person… it is OKAY for groups to be DIFFERENT sizes).

• When designing a matched-pairs procedure, EVERYONE gets both “treatments” – so randomize the order!!! (If practical. For “before-after” scenarios, you can’t really do this…)

DESIGNING STUDIES in tonight’s HW!!! gasp!

30 athletes are selected… randomization (flipping a coin for each athlete) determines which footwear they run with first. The two times for each athlete are compared…

Some run with the new shoes FIRST… some run with the new shoes SECOND.

Why matched pairs? (why not stick with 2-sample t?)

Student        1    2    3    4    5    6    7    8
Before       550  520  500  600  640  615  730  590
After        600  570  550  650  690  665  780  640
Improvement  +50  +50  +50  +50  +50  +50  +50  +50

• Is there variance in the “before” scores? ($s_1 = 73.04$)
• Is there variance in the “after” scores? ($s_2 = 73.04$)
• Is there variance in the improvements? ($s_d = 0$)

So instead of $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ we use $\frac{s_d}{\sqrt{n}}$
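To see the two standard errors side by side, here is a minimal Python sketch using the before/after scores from this slide. The variable names are made up for illustration, and the 2-sample line simply plugs n = 8 in for both group sizes.

```python
from math import sqrt
from statistics import stdev

before = [550, 520, 500, 600, 640, 615, 730, 590]
after = [600, 570, 550, 650, 690, 665, 780, 640]

# One improvement per student (after - before); pairing keeps each
# student's two scores together.
improvements = [a - b for a, b in zip(after, before)]

n = len(before)
s1 = stdev(before)        # sample SD of the "before" scores
s2 = stdev(after)         # sample SD of the "after" scores
sd = stdev(improvements)  # sample SD of the paired differences

# Standard error a 2-sample t procedure would use (treats groups as independent)
se_two_sample = sqrt(s1**2 / n + s2**2 / n)
# Standard error a paired t procedure uses (built from the differences only)
se_paired = sd / sqrt(n)

print(f"s1 = {s1:.2f}, s2 = {s2:.2f}, sd = {sd:.2f}")
print(f"2-sample SE = {se_two_sample:.2f}, paired SE = {se_paired:.2f}")
```

The student-to-student spread shows up in s1 and s2 but cancels out of the differences, which is the whole reason for pairing.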

Last flap of our “means” foldable (outside)

matched pairs!

paired t-interval and paired t-test

Update your foldables (inside, top half)

Define $\mu_d$ (“true mean difference…”)

Conditions:
• Paired data???
• Random sample (pairs)
• (10%)
• Nearly Normal Condition
  o n > 30 (number of PAIRS!!!)
  o boxplots/histogram of DIFFERENCES!!!

So we may use a t-distribution, df = n – 1 (“n” is the number of pairs)

***define which way you are subtracting!!!

***do NOT graph BOTH boxplots!!!

Update your foldables (inside, bottom half)

paired t-interval:

$\bar{x}_d \pm t^*_{df} \cdot \frac{s_d}{\sqrt{n}}$

paired t-test:

$H_0: \mu_d = 0$
$H_A: \mu_d >, <, \neq 0$

$t = \frac{\bar{x}_d - 0}{s_d / \sqrt{n}}$

on calculator, just do “t-test” or “t-interval” with the differences

SOME TIPS ON HOW TO TELL MATCHED PAIRS…

• The two sets of data MUST have the same number of elements…

• HOWEVER, just because both sets of data have the same count does NOT NECESSARILY make it matched pairs (so be careful!)

• Is each PAIR of numbers linked somehow? (sometimes this is very difficult to determine)

(WARNING: THIS IS NOT A COMPLETE LIST)

BIG PICTURE: BLOCKING/STRATIFYING AND INFERENCE

2-sample t (no blocking) → larger p-value
Matched pairs (blocking) → smaller p-value

Blocking (pairing):

• Reduces the variability (spread) of our data (sampling model)

• With LESS variability, we are MORE likely to reject H0 (when there really is an effect).

• Makes it easier to detect an “effect” (a “change” or “difference” or “improvement”, etc.)
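As a rough illustration of the blocking idea, the sketch below borrows the whale-watching counts from the worked example later in these notes and computes the test statistic both ways. Running a 2-sample t on these data would be the wrong procedure, since the data are paired by day; it is computed here only for comparison, and the variable names are made up.

```python
from math import sqrt
from statistics import mean, stdev

morning = [9, 7, 10, 10, 2, 5, 7, 6]
afternoon = [10, 9, 9, 8, 4, 7, 9, 6]
n = len(morning)

# No blocking: treat the two rows as independent samples
se_two_sample = sqrt(stdev(morning)**2 / n + stdev(afternoon)**2 / n)
t_two_sample = (mean(morning) - mean(afternoon)) / se_two_sample

# Blocking by day: work with one difference (morning - afternoon) per day
diffs = [m - a for m, a in zip(morning, afternoon)]
se_paired = stdev(diffs) / sqrt(n)
t_paired = mean(diffs) / se_paired

print(f"2-sample: SE = {se_two_sample:.4f}, t = {t_two_sample:.4f}")
print(f"paired:   SE = {se_paired:.4f}, t = {t_paired:.4f}")
```

The numerator (the difference in means) is the same either way; only the standard error changes, so the paired statistic lands farther from zero.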

A whale-watching company noticed that many customers wanted to know whether it was better to book an excursion in the morning or the afternoon. To test this question, the company collected the following data (number of whales sighted) on 8 randomly selected days over the past month. (Note: days were not consecutive.)

Day           1   2   3   4   5   6   7   8
Morning       9   7  10  10   2   5   7   6
Afternoon    10   9   9   8   4   7   9   6
Differences  -1  -2   1   2  -2  -2  -2   0

Since you have two values for each day, they are dependent on the day – making this data matched pairs.

You may subtract either way – just be careful when writing Ha. (The differences above are morning – afternoon.)
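A quick Python sketch of the subtraction, done both ways, using the counts from the table. The list names are made up for illustration.

```python
morning = [9, 7, 10, 10, 2, 5, 7, 6]
afternoon = [10, 9, 9, 8, 4, 7, 9, 6]

# Pair the counts by day, then subtract in a clearly stated direction
diffs_m_minus_a = [m - a for m, a in zip(morning, afternoon)]
diffs_a_minus_m = [a - m for m, a in zip(morning, afternoon)]

print("morning - afternoon:", diffs_m_minus_a)
print("afternoon - morning:", diffs_a_minus_m)
```

Flipping the direction flips every sign, which is why the direction of Ha has to flip along with it.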

You need to state assumptions using the differences!

Conditions:
• The data are paired by day since whale-watching conditions may change from day to day
• We have a random sample of days for whale-watching
• n = 8 days is certainly less than 10% of all whale-watching days

Nearly Normal Condition:
• The box plot of differences is skewed, but has no outliers, so normality is plausible (especially with this small a sample size) (remember, you can also do a dot plot!!!)

We may use a t-distribution w/ df = 7
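If you want the dot plot instead of the box plot, here is a small text-only sketch in Python. It is just one way to draw it, and the names are made up. Notice it plots the DIFFERENCES, not the morning and afternoon counts separately.

```python
from collections import Counter

diffs = [-1, -2, 1, 2, -2, -2, -2, 0]  # morning - afternoon

counts = Counter(diffs)  # how many days had each difference

# One row per possible value, one star per day with that difference
lines = []
for value in range(min(diffs), max(diffs) + 1):
    lines.append(f"{value:>3} | " + "*" * counts[value])

dot_plot = "\n".join(lines)
print(dot_plot)
```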


At the 5% significance level, is there evidence that more whales are sighted in the afternoon?

$H_0: \mu_d = 0$
$H_a: \mu_d < 0$
$\mu_d$ = true mean difference in whale sightings, morning – afternoon

If you subtract afternoon – morning, then $H_a: \mu_d > 0$


Be careful writing your Ha! Think about how you subtracted: M – A

If afternoon is more, should the differences be + or -?

(Don’t look at numbers!!!!)

define which way you are subtracting!!!

finishing the hypothesis test:

$t = \frac{\bar{x}_d - \mu_0}{s_d / \sqrt{n}} = \frac{-0.75 - 0}{1.5811 / \sqrt{8}} = -1.3416$

df = 7, p-value = 0.1108, $\alpha = 0.05$

In your calculator, perform a t-test using the differences (L3)

Since p-value (0.1108) > $\alpha$ (0.05), we fail to reject H0. We lack sufficient evidence to suggest that more whales are sighted in the afternoon than in the morning.

Notice that if you subtracted A – M, then your test statistic would be t = +1.3416, but the p-value would be the same.
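For anyone checking the calculator by hand, here is a minimal Python sketch of the same test using only the standard library. The helper functions `t_pdf` and `t_lower_tail` are made up for this sketch: they approximate the tail area by numerically integrating the t density with Simpson's rule, which is an assumption about how to get the p-value, not what the calculator does internally.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_lower_tail(t, df, steps=2000):
    """Approximate P(T <= t) with Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 - area if t < 0 else 0.5 + area

diffs = [-1, -2, 1, 2, -2, -2, -2, 0]  # morning - afternoon
n = len(diffs)

# Paired t statistic: (mean difference - 0) / (sd of differences / sqrt(n))
t_stat = (mean(diffs) - 0) / (stdev(diffs) / sqrt(n))

# Ha: mu_d < 0, so the p-value is the lower-tail area
p_value = t_lower_tail(t_stat, df=n - 1)

print(f"t = {t_stat:.4f}, df = {n - 1}, p-value = {p_value:.4f}")
```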

…and now, a paired t-interval

whale watching!

Develop a 90% confidence interval for the true average difference in number of whales sighted (morning – afternoon)

statistic ± (critical value)(SE)

$\bar{x}_d \pm t^*_{df} \cdot \frac{s_d}{\sqrt{n}} = -0.75 \pm 1.895 \cdot \frac{1.5811}{\sqrt{8}} = (-1.809, 0.3091)$

df = 8 – 1 = 7; since this is really a 1-sample interval, get the t* value from the t-table

We are 90% confident that the true mean difference in whale sightings is from 1.809 fewer in the morning to 0.3091 more in the morning.

We can’t really say that it matters when you go whaling!
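The interval arithmetic can be sketched in Python as below. The critical value 1.895 is the t-table entry used in these notes (df = 7, 90% confidence); it is typed in by hand rather than computed, and the variable names are made up.

```python
from math import sqrt
from statistics import mean, stdev

diffs = [-1, -2, 1, 2, -2, -2, -2, 0]  # morning - afternoon
n = len(diffs)

t_star = 1.895                   # t-table value for df = 7, 90% confidence
se = stdev(diffs) / sqrt(n)      # standard error of the mean difference
margin = t_star * se             # margin of error

lower = mean(diffs) - margin
upper = mean(diffs) + margin

print(f"90% CI for mu_d: ({lower:.3f}, {upper:.3f})")
```

Because the table value is rounded, the last digit can differ slightly from a calculator's interval.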

IS CAFFEINE DEPENDENCE REAL?

(here is the problem we did in class, but without the work, if you wish to give it a shot and check with someone later)

The table below contains data on the subjects’ scores on a depression test. Higher scores show more symptoms of depression.

Subject    1   2   3   4   5   6   7   8   9  10  11
Caffeine   5   5   4   3   8   5   0   0   2  11   1
Placebo   16  23   5   7  14  24   6   3  15  12   0

a) Do the data from this study provide statistical evidence at the 5% level of significance that caffeine deprivation leads to an increase in depression?


b) Use a 90% confidence interval to estimate the true mean increase in depression scores that results from being deprived of caffeine.
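If you want to check your arithmetic after giving it a shot, here is a minimal Python sketch that computes the summary numbers. It assumes you subtract placebo – caffeine (so an increase under deprivation shows up as a positive difference); that direction is a choice made for this sketch, not something the slide specifies. The p-value, the t* value, and the conclusion are left to you.

```python
from math import sqrt
from statistics import mean, stdev

caffeine = [5, 5, 4, 3, 8, 5, 0, 0, 2, 11, 1]
placebo = [16, 23, 5, 7, 14, 24, 6, 3, 15, 12, 0]

# One difference per subject, placebo - caffeine (assumed direction)
diffs = [p - c for p, c in zip(placebo, caffeine)]
n = len(diffs)

x_bar_d = mean(diffs)   # mean of the differences
s_d = stdev(diffs)      # sample SD of the differences
se = s_d / sqrt(n)      # standard error of the mean difference
t_stat = (x_bar_d - 0) / se

print(f"n = {n}, df = {n - 1}")
print(f"mean diff = {x_bar_d:.4f}, s_d = {s_d:.4f}, SE = {se:.4f}, t = {t_stat:.4f}")
```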
