
Randomization and Bootstrap Methods in the Introductory Statistics Course

Kari Lock Morgan, Duke University (kari@stat.duke.edu)
Robin Lock, St. Lawrence University (rlock@stlawu.edu)

Panel at the 2013 Joint Mathematics Meetings, San Diego, CA

How might the Intro Stat curriculum change to accommodate/take advantage of bootstrap/randomization methods?

Intro Stat – Traditional Topics
• Descriptive Statistics – one and two samples
• Normal distributions
• Data production (samples/experiments)
• Sampling distributions (mean/proportion)
• Confidence intervals (means/proportions)
• Hypothesis tests (means/proportions)
• ANOVA for several means, Inference for regression, Chi-square tests

Intro Stat – Revise the Topics
• Descriptive Statistics – one and two samples
• Data production (samples/experiments)
• Bootstrap confidence intervals
• Randomization-based hypothesis tests
• Normal distributions
• Sampling distributions (mean/proportion)
• Confidence intervals (means/proportions)
• Hypothesis tests (means/proportions)
• ANOVA for several means, Inference for regression, Chi-square tests

Why start with Bootstrap CIs?
• Minimal prerequisites: population parameter vs. sample statistic; random sampling; dotplot (or histogram); standard deviation and/or percentiles
• Same method of randomization in most cases: sample with replacement from the original sample
• Natural progression: sample estimate ==> How accurate is the estimate?
• Intervals are more useful? A good debate for another session…

Example: Mustang Prices

Data: Sample of 25 Mustangs listed on Autotrader.com

Find a confidence interval for the slope of a regression line to predict prices of used Mustangs based on their mileage.

“Bootstrap” Samples
Key idea:
• Sample with replacement from the original sample, using the same n.
• Compute the sample statistic for each bootstrap sample.
• Collect lots of such bootstrap statistics.

Imagine the “population” is many, many copies of the original sample.
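The key idea can be sketched in Python. This is a minimal illustration, not the software used in the course (the slides mention Fathom and applets). It assumes the sample is stored as two parallel lists, x (mileage) and y (price), and resamples whole (x, y) cases with replacement. The names `slope` and `bootstrap_slopes` are hypothetical helpers.

```python
import random
import statistics


def slope(x, y):
    """Least-squares slope of y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def bootstrap_slopes(x, y, reps=3000, seed=None):
    """Collect `reps` bootstrap slopes by resampling (x, y) cases with replacement."""
    rng = random.Random(seed)
    cases = list(zip(x, y))
    n = len(cases)  # each bootstrap sample uses the same n as the original
    slopes = []
    while len(slopes) < reps:
        resample = rng.choices(cases, k=n)  # sample with replacement
        bx, by = zip(*resample)
        if len(set(bx)) < 2:
            continue  # all x values identical: slope undefined, so redraw
        slopes.append(slope(bx, by))
    return slopes
```

For the Mustang example, one would pass the 25 mileages and prices (hypothetical variable names such as `miles` and `prices`) with `reps=3000`. The data values themselves are not reproduced here.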

[Figure: Distribution of 3000 Bootstrap Slopes]

Using the Bootstrap Distribution to Get a Confidence Interval – Version #1

The standard deviation of the bootstrap statistics estimates the standard error of the sample statistic.

Quick interval estimate:

$\text{Original Statistic} \pm 2 \cdot SE$

For the Mustang slope:

$-0.22 \pm 2 \cdot 0.029 = -0.22 \pm 0.058 = (-0.278,\ -0.162)$
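Version #1 reduces to a one-line calculation. The sketch below assumes the bootstrap statistics are already collected in a list. `interval_from_se` and `se_interval` are hypothetical names, and the multiplier 2 is the slide's rough 95% rule.

```python
import statistics


def interval_from_se(original_stat, se, multiplier=2):
    """Original statistic +/- multiplier * SE (multiplier 2 gives a rough 95% interval)."""
    margin = multiplier * se
    return original_stat - margin, original_stat + margin


def se_interval(original_stat, boot_stats, multiplier=2):
    """Estimate SE as the standard deviation of the bootstrap statistics, then apply the rule."""
    se = statistics.stdev(boot_stats)
    return interval_from_se(original_stat, se, multiplier)
```

`interval_from_se(-0.22, 0.029)` reproduces the slide's arithmetic, giving approximately (-0.278, -0.162) up to floating-point rounding.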

Using the Bootstrap Distribution to Get a Confidence Interval – Version #2

Keep 95% in the middle; chop 2.5% in each tail.

95% CI for slope: (−0.279, −0.163)
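The percentile method can be sketched with the standard library's `statistics.quantiles`. With `n=40`, it returns 39 cut points spaced every 2.5%, so the first and last are the 2.5th and 97.5th percentiles. Quantile interpolation conventions differ slightly between packages, so results may differ from other software in the last digit. `percentile_interval_95` is a hypothetical helper name.

```python
import statistics


def percentile_interval_95(boot_stats):
    """Keep the middle 95% of the bootstrap statistics by chopping 2.5% from each tail."""
    cuts = statistics.quantiles(boot_stats, n=40)  # 39 cut points, every 2.5%
    return cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles
```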

3. Simulation Technology?

Fall 2010: Fathom
Fall 2011: Fathom & Applets

Tactile simulations first?
• Bootstrap – No (sampling with replacement is tough)
• Test for an experiment – Yes (1 or 2)

Desirable Technology Features
• Three distributions
• One to many samples

4. One Crank or Two?

Confidence intervals – Bootstrap – one crank

Significance tests – Two (or more) cranks

Rules for selecting randomization samples for a test. Be consistent with:
1. the null hypothesis
2. the sample data
3. the way the data were collected

Randomization Test for Slope
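One way to follow the three rules for a slope test is to shuffle the observed y values against the observed x values. This keeps the sample data values and breaks any association, which is consistent with H0: slope = 0. The following is a minimal sketch. The two-sided p-value, the function names, and the small tolerance for ties are assumptions of this illustration.

```python
import random
import statistics


def slope(x, y):
    """Least-squares slope of y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def randomization_test_slope(x, y, reps=3000, seed=None):
    """Two-sided randomization test of H0: slope = 0.

    Shuffling y against the fixed x values keeps the observed data values
    but removes any association, as H0 asserts.
    """
    rng = random.Random(seed)
    observed = slope(x, y)
    shuffled = list(y)
    as_extreme = 0
    for _ in range(reps):
        rng.shuffle(shuffled)
        # small tolerance so ties with the observed slope count as "as extreme"
        if abs(slope(x, shuffled)) >= abs(observed) - 1e-12:
            as_extreme += 1
    return observed, as_extreme / reps
```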

5. Test for a 2x2 Table

First example: a randomized experiment
• Test statistic: count in one cell
• Randomize: treatment groups
• Margins: fix both

Later examples vary, e.g., use a difference in proportions, or randomize as independent samples with a common p.
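The first example's test can be sketched as follows, assuming outcomes are coded as success/failure. The approach is to:
• pool the observed outcomes,
• re-deal them to treatment groups of the original sizes, which fixes both margins, and
• record the count of successes in the treatment cell.

The upper-tail p-value and the function name `randomization_test_2x2` are assumptions of this illustration.

```python
import random


def randomization_test_2x2(successes_trt, n_trt, successes_ctrl, n_ctrl,
                           reps=5000, seed=None):
    """Upper-tail randomization test using the treatment-success count.

    Shuffling pooled outcomes and re-splitting into groups of the original
    sizes keeps both margins fixed (group sizes and total successes).
    """
    rng = random.Random(seed)
    total_success = successes_trt + successes_ctrl
    n = n_trt + n_ctrl
    outcomes = [1] * total_success + [0] * (n - total_success)
    as_extreme = 0
    for _ in range(reps):
        rng.shuffle(outcomes)  # random re-assignment to treatment groups
        if sum(outcomes[:n_trt]) >= successes_trt:
            as_extreme += 1
    return as_extreme / reps
```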

6. What about “traditional” methods?

AFTER students have seen lots of bootstrap and randomization distributions (and hopefully begun to understand the logic of inference)…

• Introduce the normal distribution (and later t)

• Introduce “shortcuts” for estimating SE for proportions, means, differences, …
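For a proportion, the shortcut SE can be checked against a bootstrap SE. The formula sqrt(p̂(1 − p̂)/n) is the standard large-sample SE for a sample proportion, and the two values should agree closely. This is a sketch with hypothetical helper names.

```python
import math
import random
import statistics


def formula_se_proportion(p_hat, n):
    """Shortcut: SE of a sample proportion is sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def bootstrap_se_proportion(successes, n, reps=5000, seed=None):
    """SE as the standard deviation of bootstrap sample proportions."""
    rng = random.Random(seed)
    sample = [1] * successes + [0] * (n - successes)
    boot_props = [sum(rng.choices(sample, k=n)) / n for _ in range(reps)]
    return statistics.stdev(boot_props)
```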

Back to Mustang Prices

The regression equation is
Price = 30.5 - 0.219 Miles

Predictor      Coef   SE Coef      T      P
Constant     30.495     2.441  12.49  0.000
Miles      -0.21880   0.03130  -6.99  0.000

S = 6.42211   R-Sq = 68.0%   R-Sq(adj) = 66.6%
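The following is a worked check on this output. It assumes the standard t-interval for a regression slope with n − 2 degrees of freedom; t* ≈ 2.069 is the 97.5th percentile of a t distribution with 23 df. The formula SE of 0.03130 is close to the bootstrap SE of 0.029, and the resulting interval is near the bootstrap intervals for the slope.

```latex
t = \frac{b_1}{SE_{b_1}} = \frac{-0.21880}{0.03130} \approx -6.99,
\qquad df = n - 2 = 25 - 2 = 23
\\[4pt]
b_1 \pm t^{*}_{23}\,SE_{b_1} = -0.2188 \pm 2.069 \cdot 0.0313
  = -0.2188 \pm 0.0648 \approx (-0.284,\ -0.154)
```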

7. Assessment?

New learning goals:
• Understand how to generate bootstrap samples and distribution.
• Understand how to create randomization samples and distribution.
• Be able to use a bootstrap/randomization distribution to find an interval/p-value.

8. How did it go?
• Students enjoyed and were engaged with the new approach.
• Instructor enjoyed and was engaged with the new approach.
• Better understanding of the p-value as reflecting “if H0 is true”.
• Better interpretations of intervals.
• Challenge: Few “experienced” students to serve as resources.

Going forward

Continue with randomization approach?

ABSOLUTELY (3 sections in Fall 2011)
