Bootstraps and Scrambles: Letting Data Speak for Themselves Robin H. Lock Burry Professor of Statistics St. Lawrence University [email protected] Science Today SUNY Oswego, March 31, 2010


Page 1: Bootstraps and Scrambles: Letting Data Speak for Themselves

Bootstraps and Scrambles: Letting Data Speak for Themselves

Robin H. Lock

Burry Professor of Statistics

St. Lawrence University

[email protected]

Science Today

SUNY Oswego, March 31, 2010

Page 2: Bootstraps and Scrambles: Letting Data Speak for Themselves

Bootstrap CI’s & Randomization Tests

(1) What are they?

(2) Why are they being used more?

(3) Can these methods be used to introduce students to key ideas of statistical inference?

Page 3: Bootstraps and Scrambles: Letting Data Speak for Themselves

Example #1: Perch Weights

Suppose that we have collected a sample of 56 perch from a lake in Finland.

Estimate and find 95% confidence bounds for the mean weight of perch in the lake.

From the sample:

$n = 56$, $\bar{X} = 382.2$ g, $s = 347.6$ g

Page 4: Bootstraps and Scrambles: Letting Data Speak for Themselves

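Classical CI for a Mean (μ)

“Assume” population is normal, then

$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$$

so a confidence interval for μ is

$$\bar{X} \pm t^{*}_{n-1} \cdot \frac{s}{\sqrt{n}}$$

For perch sample:

$$382.2 \pm 2.004 \cdot \frac{347.6}{\sqrt{56}} = 382.2 \pm 2.004 \cdot 46.5 = 382.2 \pm 93.1 = (289.1,\ 475.3)$$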
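The perch arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation: the values of n, the mean, s, and the multiplier 2.004 are taken from the slides, and the variable names are illustrative.

```python
import math

n = 56           # sample size, from the slide
x_bar = 382.2    # sample mean in grams, from the slide
s = 347.6        # sample standard deviation in grams, from the slide
t_star = 2.004   # t multiplier for 95% confidence, as quoted on the slide

standard_error = s / math.sqrt(n)     # s / sqrt(n)
margin = t_star * standard_error      # half-width of the interval
lower, upper = x_bar - margin, x_bar + margin

print(f"SE = {standard_error:.1f}, margin = {margin:.1f}")
print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f})")
```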

Page 5: Bootstraps and Scrambles: Letting Data Speak for Themselves

Possible Pitfalls

What if the underlying population is NOT normal?

[Figure: dot plot of the perch weights; axis: Weight, 200 to 1000]

What if the sample size is small?

What if you have a different sample statistic?

What if the Central Limit Theorem doesn’t apply? (or you’ve never heard of it!)

Page 6: Bootstraps and Scrambles: Letting Data Speak for Themselves

Bootstrap

Basic idea: Simulate the sampling distribution of any statistic (like the mean) by repeatedly sampling from the original data.

Bootstrap distribution of perch means:

• Sample 56 values (with replacement) from the original sample.

• Compute the mean for the bootstrap sample.

• Repeat MANY times.
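The resampling steps can be sketched in Python using only the standard library. The slides do not list the 56 perch weights, so the data below are synthetic, right-skewed stand-ins; the function name `bootstrap_means` and its parameters are illustrative assumptions, not part of the talk.

```python
import random
import statistics

def bootstrap_means(sample, n_boot=1000, seed=0):
    """Return n_boot means, each from a resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)   # n draws WITH replacement
        means.append(statistics.fmean(resample))
    return means

# Synthetic stand-in for the perch weights (NOT the real data).
data_rng = random.Random(1)
weights = [round(data_rng.expovariate(1 / 380), 1) for _ in range(56)]

boot_means = bootstrap_means(weights)
print(f"sample mean            = {statistics.fmean(weights):.1f}")
print(f"mean of bootstrap means = {statistics.fmean(boot_means):.1f}")
print(f"std. dev. of bootstrap means = {statistics.stdev(boot_means):.1f}")
```

The standard deviation of the bootstrap means plays the role of the standard error of the sample mean.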

Page 7: Bootstraps and Scrambles: Letting Data Speak for Themselves

Original Sample (56 fish)

Page 8: Bootstraps and Scrambles: Letting Data Speak for Themselves

Bootstrap “population”

Sample and compute means from this “population”

Page 9: Bootstraps and Scrambles: Letting Data Speak for Themselves

Bootstrap Distribution of 1000 Perch Means

[Figure: dot plot of the 1000 bootstrap means; axis: xbar, 250 to 550]

Page 10: Bootstraps and Scrambles: Letting Data Speak for Themselves

CI from Bootstrap Distribution

Method #1: Use bootstrap std. dev.

$$\bar{X} \pm z^{*} \cdot S_{boot}$$

For 1000 bootstrap perch means: $S_{boot} = 45.8$

$$382.2 \pm 1.96 \cdot 45.8 = 382.2 \pm 89.8 = (292.4,\ 472.0)$$
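A small sketch of Method #1, assuming the bootstrap standard deviation has already been computed. The numbers 382.2 and 45.8 come from the slides; the helper name `bootstrap_se_interval` is hypothetical.

```python
def bootstrap_se_interval(estimate, s_boot, z_star=1.96):
    """Interval of the form estimate +/- z* times the bootstrap std. dev."""
    margin = z_star * s_boot
    return estimate - margin, estimate + margin

# Values quoted on the slide for the perch sample.
ci = bootstrap_se_interval(estimate=382.2, s_boot=45.8)
print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```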

Page 11: Bootstraps and Scrambles: Letting Data Speak for Themselves

CI from Bootstrap Distribution

Method #2: Use bootstrap quantiles

[Figure: dot plot of the 1000 bootstrap means (axis: xbar, 250 to 550) with 2.5% cut off in each tail, at 299.6 and 476.1]

95% CI for μ: (299.6, 476.1)
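Method #2 amounts to sorting the bootstrap statistics and cutting off 2.5% in each tail. The sketch below assumes one simple indexing convention among several in use, and it runs on synthetic data because the perch weights are not listed in the slides; `percentile_interval` is an illustrative name.

```python
import random
import statistics

def percentile_interval(boot_stats, confidence=0.95):
    """Drop an equal share of the sorted bootstrap statistics from each tail."""
    ordered = sorted(boot_stats)
    tail = (1 - confidence) / 2
    lo_index = round(tail * len(ordered))       # 25 when there are 1000 values
    hi_index = len(ordered) - 1 - lo_index      # 974 when there are 1000 values
    return ordered[lo_index], ordered[hi_index]

rng = random.Random(0)
# Synthetic right-skewed "weights" (NOT the real perch data).
weights = [rng.expovariate(1 / 380) for _ in range(56)]
boot_means = [statistics.fmean(rng.choices(weights, k=len(weights)))
              for _ in range(1000)]

low, high = percentile_interval(boot_means)
print(f"sample mean = {statistics.fmean(weights):.1f}")
print(f"95% percentile interval: ({low:.1f}, {high:.1f})")
```

Unlike Method #1, this interval need not be symmetric about the sample mean, which can matter for skewed data.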

Page 12: Bootstraps and Scrambles: Letting Data Speak for Themselves

Example #2: Friendly Observers

Experiment: Subjects were tested for performance on a video game

Conditions:

Group A: An observer shares prize

Group B: Neutral observer

Response: (categorical) Beat/Fail to Beat score threshold

Hypothesis: Players with an interested observer (Group A) will tend to perform less ably.

Butler & Baumeister (1998)

Page 13: Bootstraps and Scrambles: Letting Data Speak for Themselves

A Statistical Experiment

Start with 24 subjects

Divide at random into two groups

Group A: Share    Group B: Neutral

Record the data (Beat or No Beat)

Page 14: Bootstraps and Scrambles: Letting Data Speak for Themselves

Friendly Observer Results

                            Group A          Group B
                            (share prize)    (prize alone)    Total
Beat Threshold                    3                8            11
Failed to Beat Threshold          9                4            13
Total                            12               12            24

Is this difference “statistically significant”?

Page 15: Bootstraps and Scrambles: Letting Data Speak for Themselves

Friendly Observer - Simulation

1. Start with a pack of 24 cards: 11 Black (Beat) and 13 Red (Fail to Beat).

2. Shuffle the cards and deal 12 at random to form Group A.

3. Count the number of Black (Beat) cards in Group A.

4. Repeat many times to see how often a random assignment gives a count as small as the experimental count (3) to Group A.

Automate this
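One way to automate the card shuffling is sketched below in Python. The deck composition (11 Beat, 13 Fail), the group size (12), and the observed count (3) come from the slides; the function name, the number of repetitions, and the seed are arbitrary choices for illustration.

```python
import random

def simulate_friendly_observer(n_reps=10000, seed=0):
    """Estimate how often random assignment puts 3 or fewer Beats in Group A."""
    rng = random.Random(seed)
    deck = ["Beat"] * 11 + ["Fail"] * 13   # 11 black cards, 13 red cards
    observed = 3                           # Beats actually seen in Group A
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(deck)                  # scramble the 24 outcomes
        group_a = deck[:12]                # deal 12 cards to Group A
        if group_a.count("Beat") <= observed:
            hits += 1
    return hits / n_reps

p_hat = simulate_friendly_observer()
print(f"Proportion of shuffles with 3 or fewer Beats in Group A: {p_hat:.4f}")
```

The proportion varies from run to run (and with the seed), but it should land close to the exact probability from the hypergeometric calculation.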

Page 16: Bootstraps and Scrambles: Letting Data Speak for Themselves

Friendly Observer – Fathom Computer Simulation

[Figure: histogram of the number of Beats in Group A (axis: ABeat, 0 to 10) from the scrambled Friendly Observer experiment]

48/1000

Page 17: Bootstraps and Scrambles: Letting Data Speak for Themselves

Automate: Friendly Observers Applet

Allan Rossman & Beth Chance http://www.rossmanchance.com/applets/

Page 18: Bootstraps and Scrambles: Letting Data Speak for Themselves

Observer’s Applet

Page 19: Bootstraps and Scrambles: Letting Data Speak for Themselves

Fisher’s Exact test

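$$P(\text{A Beat} \le 3) = \frac{\binom{12}{3}\binom{12}{8}}{\binom{24}{11}} + \frac{\binom{12}{2}\binom{12}{9}}{\binom{24}{11}} + \frac{\binom{12}{1}\binom{12}{10}}{\binom{24}{11}} + \frac{\binom{12}{0}\binom{12}{11}}{\binom{24}{11}}$$

$$= 0.04363 + 0.0058 + 0.00032 + 0.000005$$

$$P(\text{A Beat} \le 3) = 0.0498$$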
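The four terms of the exact calculation can be reproduced with `math.comb` (Python 3.8 or later). This sketch mirrors the form used on the slide, choosing which 11 of the 24 subjects beat the threshold; the function name `prob_a_beats` is illustrative.

```python
from math import comb

def prob_a_beats(k, beats=11, group_size=12, total=24):
    """P(exactly k of the Beat outcomes fall in Group A) under random assignment."""
    ways_in_a = comb(group_size, k)                     # k Beats among Group A's 12
    ways_in_b = comb(total - group_size, beats - k)     # the rest among Group B's 12
    return ways_in_a * ways_in_b / comb(total, beats)

terms = [prob_a_beats(k) for k in (3, 2, 1, 0)]
p_value = sum(terms)

for k, term in zip((3, 2, 1, 0), terms):
    print(f"P(A Beat = {k}) = {term:.6f}")
print(f"P(A Beat <= 3) = {p_value:.4f}")
```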

Page 20: Bootstraps and Scrambles: Letting Data Speak for Themselves

Example #3: Lake Ontario Trout

X = fish age (yrs.)

Y = % dry mass of eggs

n = 20 fish

[Figure: FishEggs scatter plot; x-axis: Age, 6 to 18; y-axis: % dry mass of eggs, 35.0 to 39.0]

Is there a significant negative association between age and % dry mass of eggs?

r = -0.45

Ho:ρ=0 vs. Ha: ρ<0

Page 21: Bootstraps and Scrambles: Letting Data Speak for Themselves

Randomization Test for Correlation

• Randomize the PctDM values to be assigned to any of the ages (ρ=0).

• Compute the correlation for the randomized sample.

• Repeat MANY times.

• See how often the randomization correlations are as extreme as (at or below) the originally observed r=-0.45.

FishEggs

Case   Age   PctDM
   1     7   37.35
   2     8   38.05
   3     8   37.45
   4     9   38.95
   5     9   37.9
   6     9   36.45
   7     9   36.15
   8    10   38.35
   9    10   37.15
  10    11   36.5
  11    11   35.1
  12    12   37.7
  13    12   37.1
  14    13   37.4
  15    13   37.55
  16    13   36.35
  17    14   36.75
  18    15   37.05
  19    17   36.15
  20    18   35.7

Scrambled FishEggs

Case   Age   PctDM
   1     7   37.15
   2     8   36.15
   3     8   37.1
   4     9   35.7
   5     9   36.15
   6     9   37.05
   7     9   35.1
   8    10   36.75
   9    10   38.95
  10    11   37.7
  11    11   37.4
  12    12   38.05
  13    12   36.45
  14    13   37.9
  15    13   36.5
  16    13   36.35
  17    14   37.55
  18    15   37.45
  19    17   37.35
  20    18   38.35
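The scrambling procedure for the correlation can be sketched in Python with the 20 (Age, PctDM) pairs listed in the FishEggs table. The helper names, the number of repetitions, and the seed are assumptions made for illustration; the talk itself used Fathom.

```python
import math
import random

AGE = [7, 8, 8, 9, 9, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 13, 14, 15, 17, 18]
PCT_DM = [37.35, 38.05, 37.45, 38.95, 37.9, 36.45, 36.15, 38.35, 37.15, 36.5,
          35.1, 37.7, 37.1, 37.4, 37.55, 36.35, 36.75, 37.05, 36.15, 35.7]

def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def randomization_p_value(x, y, n_reps=10000, seed=0):
    """Lower-tail p-value: share of scrambles with r at or below the observed r."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    scrambled = list(y)                    # copy so the original order is kept
    count = 0
    for _ in range(n_reps):
        rng.shuffle(scrambled)             # break any link between x and y
        if pearson_r(x, scrambled) <= observed:
            count += 1
    return observed, count / n_reps

r_obs, p_value = randomization_p_value(AGE, PCT_DM)
print(f"observed r = {r_obs:.2f}")
print(f"randomization p-value (lower tail) = {p_value:.4f}")
```

Because the alternative is ρ<0, only scrambles with a correlation at or below the observed value count toward the p-value.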

Page 22: Bootstraps and Scrambles: Letting Data Speak for Themselves

Randomization Distribution of Sample Correlations when Ho:ρ=0

[Figure: dot plot of correlations from 1000 scrambled FishEggs samples; axis: r, -0.6 to 0.6; observed r=-0.45 marked]

26/1000

Page 23: Bootstraps and Scrambles: Letting Data Speak for Themselves

Confidence Interval for Correlation?

Construct a bootstrap distribution of correlations for samples of n=20 fish drawn with replacement from the original sample.
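A sketch of that bootstrap in Python, resampling whole (Age, PctDM) pairs so each fish keeps its own two measurements. The data are the 20 pairs from the FishEggs table; the function names, the seed, and the use of the 2.5% and 97.5% quantiles as interval endpoints are illustrative assumptions.

```python
import math
import random

AGE = [7, 8, 8, 9, 9, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 13, 14, 15, 17, 18]
PCT_DM = [37.35, 38.05, 37.45, 38.95, 37.9, 36.45, 36.15, 38.35, 37.15, 36.5,
          35.1, 37.7, 37.1, 37.4, 37.55, 36.35, 36.75, 37.05, 36.15, 35.7]

def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_correlations(x, y, n_boot=1000, seed=0):
    """Correlations from n_boot resamples of (x, y) pairs drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    results = []
    while len(results) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample fish, not values
        bx = [x[i] for i in idx]
        by = [y[i] for i in idx]
        # Skip the (very rare) resample where r is undefined.
        if len(set(bx)) > 1 and len(set(by)) > 1:
            results.append(pearson_r(bx, by))
    return results

boot_r = sorted(bootstrap_correlations(AGE, PCT_DM))
low, high = boot_r[25], boot_r[974]      # cut 2.5% of 1000 from each tail
print(f"observed r = {pearson_r(AGE, PCT_DM):.2f}")
print(f"95% percentile interval for the correlation: ({low:.2f}, {high:.2f})")
```

The endpoints depend on the random seed, so they will not match any particular simulation run exactly.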

Page 24: Bootstraps and Scrambles: Letting Data Speak for Themselves

Bootstrap Distribution of Sample Correlations

[Figure: dot plot of bootstrap sample correlations from FishEggs; axis: r, -0.8 to 0.2; marked values r=-0.74 and r=-0.08]

Page 25: Bootstraps and Scrambles: Letting Data Speak for Themselves

Bootstrap/Randomization Methods

• Require few (often no) assumptions/conditions on the underlying population distribution.

• Avoid needing a theoretical derivation of sampling distribution.

• Can be applied readily to lots of different statistics.

• Are more intuitively aligned with the logic of statistical inference.

Page 26: Bootstraps and Scrambles: Letting Data Speak for Themselves

Can these methods really be used to introduce students to the core ideas of statistical inference?

Coming in 2012…

Statistics: Unlocking the Power of Data

by Lock, Lock, Lock, Lock and Lock