Using Simulations to Teach Statistical Inference
Beth Chance, Allan Rossman (Cal Poly)
ICTCM 2011

Joint Work with

Soma Roy, Karen McGaughey (Cal Poly), Alex Herrington (Cal Poly undergrad)
John Holcomb (Cleveland State), George Cobb (Mt. Holyoke), Nathan Tintle, Jill VanderStoep, Todd Swanson (Hope College)

This project has been supported by the National Science Foundation, DUE/CCLI #0633349

Outline

Motivation/Goals
Examples
  Binomial process, randomized experiment (binary response), randomized experiment (quantitative response)
Series of lab assignments
Discussion points
  Student feedback, evaluation results
  Design principles & implementation
  Observations, open questions

Motivation

Cobb (2007): 12 reasons to teach permutation tests…
  Model is “simple and easily grasped”
  Matches production process, links data production and inference
  Role for tactile and computer simulations
  Easily extendible to other designs (e.g., blocking)
  Fisherian logic
  -- “The Introductory Statistics Course: A Ptolemaic Curriculum” (TISE)

Goals

Develop an introductory curriculum that focuses on a randomization-based approach to inference
  vs. using simulation to teach traditional inference
  From the beginning of the course, permeate all topics
Improve understanding of inference and the statistical process in general
More modern (computer-intensive) and flexible approach to inferential analysis

Brief overview of labs

Case-study focus
Pre-lab
  Background, review questions submitted in advance
50-minute (computer) lab period
  Online instructions
  Directed questions following the statistical process
  Embedded applets or statistical software
  Application/Extension
Lab report with partner

Example 1: Friend or Foe (Helper/Hinderer)

Videos
Research question
Pre-lab
Descriptive analysis
Introduction of null hypothesis, p-value terminology
Plausible values
Conclusions
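The coin-flipping logic behind this example can be sketched in a few lines of Python. This is a minimal illustration, not the applet used in the labs: the group size of 16 comes from the "16 coins" tactile simulation mentioned in the discussion points, while the observed count of 13, the number of repetitions, and the function names are hypothetical choices made only for this sketch.

```python
import random

def simulate_null_counts(n_infants=16, reps=10_000, seed=1):
    """Repeat the 16-coin flip many times; return the number of heads in each repetition."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_infants)) for _ in range(reps)]

def approx_p_value(null_counts, observed):
    """Proportion of simulated counts at least as large as the observed count."""
    return sum(count >= observed for count in null_counts) / len(null_counts)

null_counts = simulate_null_counts()
observed = 13  # hypothetical observed count, for illustration only
print(f"Approximate p-value: {approx_p_value(null_counts, observed):.4f}")
```

Trying several values of `observed` mirrors the question put to students after the tactile simulation: how many infants would have to choose the helper toy before a 50/50 chance model looks implausible?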

Discussion Points

Can this be done on day one?
  Yes, if you can motivate the simulation
    Loaded dice
    Before revealing the data?

<<After tactile simulation>>
How many infants would need to choose the helper toy for you to be convinced the choice was not made “at random,” but they actually prefer the helper toy?

Many students can reason inferentially
  “If a choice is made at complete random, then having 13 infants would be highly unlikely”
  “Based on the coin flipping experiment, the results stated that at/over 12 was extremely rare. Therefore, at least 12 infants …
  “Would be around 12-16 because it seems highly unlikely that given a 50-50 option 12-16 would choose the helper toy”

<<After tactile simulation>>
How many infants would need to choose the helper toy for you to be convinced the choice was not made “at random,” but they actually prefer the helper toy?

But maybe not as well “distributionally”
  Is it unusual? = “barely over half” vs. unusual compared to distribution
Examine language carefully
  “Unlikely that choice is random”
  “Prove”
  “Simulate”, “Repeated this study”
  “At random” = 50/50, “model”
  “Random” = anything is possible

Discussion Points

Can this be done on day one?
  Yes, if you can motivate the simulation
    Loaded dice
    Before revealing the data?
    Enough understanding of “chance model”?
    Use of class data instead? (“observed” vs. research study)
  Yes, if you return to and build on the ideas throughout the course
So what comes next?

Discussion Points

Tactile simulation
  One coin 16 times vs. 16 coins
Population vs. process
  Defining the parameter
3Ss: statistic, simulate, strength of evidence
  “could have been” distribution of data
  “what if the null was true” distribution of statistic
Fill-in-the-blank wording
Timing of final report
  Follow-up in-class discussion

Example 2: Two Proportions

Is Yawning Contagious?
Modelling entire process: data collection, descriptive statistics, inferential analysis, conclusions
Parallelisms to first example
  Could random assignment alone produce a difference in the group proportions at least this extreme?
Card shuffling, re-create two-way table
Extend to own data
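The card-shuffling step can be mimicked in code along the following lines. This is only a sketch under stated assumptions: the group sizes and counts (12 of 30 versus 5 of 20) are invented for illustration and are not the study's data, and `shuffle_differences` is a hypothetical helper name.

```python
import random

def shuffle_differences(yes_a, n_a, yes_b, n_b, reps=5_000, seed=1):
    """Re-deal the response 'cards' to two groups; return the simulated differences in proportions."""
    rng = random.Random(seed)
    total_yes = yes_a + yes_b
    # One card per subject: 1 = yawned, 0 = did not (responses are fixed under the null)
    cards = [1] * total_yes + [0] * (n_a + n_b - total_yes)
    diffs = []
    for _ in range(reps):
        rng.shuffle(cards)  # random assignment alone decides who lands in which group
        diffs.append(sum(cards[:n_a]) / n_a - sum(cards[n_a:]) / n_b)
    return diffs

# Hypothetical two-way table counts, for illustration only
yes_a, n_a, yes_b, n_b = 12, 30, 5, 20
observed_diff = yes_a / n_a - yes_b / n_b
diffs = shuffle_differences(yes_a, n_a, yes_b, n_b)
# Small tolerance guards against floating-point ties with the observed difference
p_value = sum(d >= observed_diff - 1e-12 for d in diffs) / len(diffs)
print(f"Observed difference: {observed_diff:.3f}, approximate p-value: {p_value:.3f}")
```

Each shuffle re-creates one two-way table that random assignment alone could have produced, so the list `diffs` plays the role of the dotplot students build up by hand.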

Lab Instructions

Exam Questions

Horizontal axis
Shade p-value
Make up a research question

Discussion Points

Starting with a significant result, but when are students ready to discuss a non-significant one?
How critical is authentic data?
Choice of statistic (count vs. difference in proportions)
Role of traditional symbols and notation?
Visualization of bar graphs from trial to trial
Implementation of predict and test

Example 3: Two means

Are there lingering effects of sleep deprivation?
Randomized experiment
Quantitative data
Parallel inferential reasoning process
  Index cards
Possible follow-up/extensions: what if -4.33?, medians, plausible values
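The index-card shuffle for a quantitative response follows the same pattern as the two-proportion case. The sketch below assumes made-up scores for two groups of eight (they are not the sleep deprivation data), and the `stat` parameter is a hypothetical convenience that lets the same function handle the "medians" extension.

```python
import random
from statistics import mean, median

def shuffle_statistic(group_a, group_b, stat=mean, reps=5_000, seed=1):
    """Shuffle all response 'cards' into two groups; return simulated differences in stat."""
    rng = random.Random(seed)
    cards = list(group_a) + list(group_b)
    n_a = len(group_a)
    results = []
    for _ in range(reps):
        rng.shuffle(cards)
        results.append(stat(cards[:n_a]) - stat(cards[n_a:]))
    return results

# Invented scores, for illustration only
unrestricted = [25.2, 14.5, 20.0, 12.4, 18.6, 9.6, 21.3, 16.1]
deprived = [10.7, 4.5, 2.2, 11.6, 7.2, 9.6, -3.0, 13.4]

observed = mean(unrestricted) - mean(deprived)
sim_means = shuffle_statistic(unrestricted, deprived)
p_value = sum(d >= observed for d in sim_means) / len(sim_means)
print(f"Observed difference in means: {observed:.2f}, approximate p-value: {p_value:.4f}")

# Same reasoning with a different statistic: difference in medians
sim_medians = shuffle_statistic(unrestricted, deprived, stat=median)
```

Only the statistic changes between the two calls; the shuffling, and the reasoning about what could have happened under random assignment alone, stay the same.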

Discussion Points

Role of tactile simulation
Scaffolding of lab report
  Introductory sentences, labeling of graphs
  Write conclusion to journal
When should “normal-based” methods be introduced?
  Alternative approximation to simulation
  Position, method for confidence intervals
Choice of technology
  Advantages/Disadvantages
  Applets, Minitab, R, Fathom

Post-Lab Assessment (Fall 2010)

Following the lab comparing two groups on a quantitative variable (65 responses)
  Discuss the purpose of the simulation process
  What information does the simulation process reveal to help you answer the research question?
Essentially correct: 35.4% demonstrated understanding of the big picture (looking at repeated shuffles to assess whether the observed results happened by chance)
Partially: 38.5% (one of null or comparison)
Incorrect: 26.1% (“better understand the data”)

Post-Lab Assessment (Fall 2010)

Did students address the null hypothesis?
  33.9% E / 38.5% P / 27.7% I
Did students reference the random assignment?
  36.9% E / 36.9% P / 26.2% I
Did students focus on comparing the observed result?
  64.6% E / 13.8% P / 21.5% I
Did students explain how they would link the pieces together and draw their conclusion?
  24.6% E / 60% P / 15% I

Student Surveys

Student Surveys

Example 3 simulation

Student Surveys

Helper/Hinderer (Winter 2011) – Did the lab help you understand the overall process of a statistical investigation?

Student Surveys

Did subsequent labs increase understanding?

Remainder of labs

Lab 4: Random babies
Lab 5: Reese’s Pieces (demo)
  Normal approximation, CLT for binary
  Transition to formal test of significance (6 steps)
Lab 6: Sleepless nights (finite population)
  t approximation, CLT for quantitative, confidence interval
Lab 7: Simulation of matched pairs
Lab 8: Simulation of regression sampling
Chi-square, ANOVA

Lab Report

Student Feedback (Winter 2011)

Google Docs survey during last week of course
Two instructors

Student end-of-course surveys (W 11)

Student end-of-course surveys

Top 2 most interesting labs

Instructor A
  Is Yawning Contagious?
  Heart Rates (matched pairs)
Instructor B
  Friend or Foe
  Is Yawning Contagious?
  Reese’s Pieces

Top 2 most/least helpful labs

Most helpful: Friend or Foe
Least helpful (Instructor B):
  Random babies
  Melting away (intro two-sample t, paired)

Exam 1

In a recent Gallup survey of 500 randomly selected US adult Republicans, 390 said they believe their congressional representative should vote to repeal the Healthcare Law. Suppose we wish to determine if significantly more than three-quarters (75%) of US adult Republicans favor repeal.

The coin tossing simulation applet was used to generate the following two dotplots (A) and (B). Which, if either, of the two plots (A) and (B) was created using the correct procedure? Explain how you know.

Exam 1

35% picked B (usually citing null .75 × 500)
  But some look at shape, or later p-value
29% picked A (observed result)
23% neither (wanted .5 × 500 = 250)
13% other responses: 0, .75, 50, can’t tell, anything possible, label is wrong
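The reasoning behind the "null .75 × 500" answer can be checked with a short simulation. The sketch below uses the numbers from the exam question (500 respondents, 390 in favor, hypothesized proportion 0.75); the number of repetitions, the seed, and the function name are assumptions made for illustration, and it stands in for the coin tossing applet rather than reproducing it.

```python
import random

def simulate_counts(n=500, p_null=0.75, reps=2_000, seed=1):
    """Simulate the count in favor of repeal under the null proportion, many times over."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_null for _ in range(n)) for _ in range(reps)]

counts = simulate_counts()
center = sum(counts) / len(counts)
p_value = sum(c >= 390 for c in counts) / len(counts)

# A correctly built null distribution centers near 0.75 * 500 = 375,
# not at the observed 390 and not at 0.5 * 500 = 250.
print(f"Center of simulated counts: {center:.1f}")
print(f"Approximate p-value for 390 or more: {p_value:.3f}")
```

The center of the simulated counts is what distinguishes the correct dotplot: it reflects the hypothesized value, while the observed result only enters when the p-value is read off.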

Exam 2

Heights of females are known to follow a normal distribution with a mean of 64 inches and a standard deviation of 3 inches. Consider the behavior of sample means. Each of the graphs below depicts the behavior of the sample mean heights of females.
a. One graph shows the distribution of sample means for many, many samples of size 10. The other graph shows the distribution of sample means for many, many samples of size 50. Which graph goes with which sample size?

Exam 2

85% matched n=10 and n = 50
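The matching task rests on the fact that sample means vary less as the sample size grows. A small simulation, sketched here with the question's population (normal, mean 64, SD 3) and with an assumed number of repetitions and hypothetical function name, makes the difference in spread concrete.

```python
import random
from statistics import stdev

def sample_means(n, reps=5_000, mu=64, sigma=3, seed=1):
    """Draw many samples of size n from a normal population; return each sample's mean."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

sd_n10 = stdev(sample_means(10))
sd_n50 = stdev(sample_means(50, seed=2))

# Theory predicts sigma / sqrt(n): about 0.95 for n = 10 and about 0.42 for n = 50
print(f"SD of sample means, n = 10: {sd_n10:.3f}")
print(f"SD of sample means, n = 50: {sd_n50:.3f}")
```

The graph with the narrower spread therefore corresponds to samples of size 50.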

Exam 2

Suppose we wish to test the following hypotheses about the population of Cal Poly undergraduate women:

  H_o: μ_Height = 64
  H_A: μ_Height [?] 64

For which graph (A or B) would you expect the p-value to be smaller? Explain using the appropriate statistical reasoning.

Exam 2

77% picked B
  Mixture of appealing to smaller SD/outliers, larger sample size means smaller p-value, and thinking in terms of test statistic
  A few choices not internally consistent

Student understanding of p-value
CAOS questions (final exam)

Statistically significant results correspond to small p-values
  Traditional (National/Hope/CP): 69/86/41%
  Randomization (Hope/CP): 95/95%
Recognize valid p-value interpretation
  Traditional (National/Hope/CP): 57/41/74%
  Randomization (Hope/CP): 60/72%
p-value as probability of Ho (invalid)
  Traditional (National/Hope/CP): 59/69/68%
  Randomization (Hope/CP): 80/89%

Student understanding of p-value
CAOS questions (final exam)

p-value as probability of Ha (invalid)
  Traditional (National/Hope/CP): 54/48/72%
  Randomization (Hope/CP): 45/67%
Recognize a simulation approach to evaluate significance (simulate with no preference vs. repeating the experiment)
  Traditional (National/Hope/CP): 20/20/30%
  Randomization (Hope/CP): 32/40%

Student understanding of p-value

p-value interpretation in regression (final exam)

Student understanding of process

Video game question (Final exam: NCSU, Hope, Cal Poly, UCLA, Rhodes College)
  What is the explanation for the process the student followed?
  Which of the following was used as a basis for simulating the data 1000 times?
  What does the histogram tell you about whether $5 incentives are effective in improving performance on the video game?
  Which of the following could be the approximate p-value in this situation?

Student understanding of process

Simulation process
  Fall: over 40% chose “This process allows her to determine how many times she needs to replicate the experiment for valid results.”
About 70% pick “The $5 incentive and verbal encouragement are equally effective at improving performance.” as underlying assumption
Still evidence some look at center at zero or shape as evidence of no treatment effect
1/3 to 1/2 could estimate p-value from graph

Example: 2009 AP Statistics Exam

A consumer organization would like a method for measuring the skewness of the data. One possible statistic for measuring skewness is the ratio mean/median….
  Calculate statistic for sample data…
  Draw conclusion from simulated data …
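The same "statistic, simulate, strength of evidence" pattern applies to an unfamiliar statistic such as mean/median. The sketch below is illustrative only: the sample values are invented, and simulating from a normal population with the sample's mean and SD is an assumption chosen here to represent "no skewness", not a detail taken from the exam.

```python
import random
from statistics import mean, median, stdev

def ratio_stat(data):
    """Skewness statistic from the example: mean divided by median."""
    return mean(data) / median(data)

def simulate_ratios(n, mu, sigma, reps=5_000, seed=1):
    """Ratio statistic for many samples of size n drawn from a symmetric (normal) population."""
    rng = random.Random(seed)
    return [ratio_stat([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]

# Invented right-skewed sample, for illustration only
sample = [22, 24, 25, 25, 26, 27, 28, 31, 38, 49]
observed = ratio_stat(sample)

ratios = simulate_ratios(len(sample), mean(sample), stdev(sample))
share_as_large = sum(r >= observed for r in ratios) / len(ratios)
print(f"Observed mean/median: {observed:.3f}")
print(f"Share of simulated ratios at least as large: {share_as_large:.4f}")
```

Under a symmetric population the simulated ratios cluster near 1, so an observed ratio far in the upper tail is evidence of right skew.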

Design Principles

Tactile simulation
Visual, contextual animation of tactile simulation
Intermediate animation capability
Level of student construction
  Ease of changing inputs
  Connect elements between graphs
Carefully designed, spiraling activities
  “Stop!” Thought questions
Allow for student exploration

Implementation

Early in course
Repetition through course, connections
Normal approximations
Lab assignments
  Focus on entire statistical process
  Motivating research question
  Follow-up application
  Thought questions
  Screen captures
  Pre-lab questions
  Minitab demos (Adobe Captivate)
Exam questions

Observations

Students quickly get a sense of trying to determine whether a result could be “just due to chance”
Still struggle with more technical understanding
  Under the null hypothesis
  Observed vs. hypothesized value
Students may fail to see connections between scenarios

Suggestions/Open Questions

Begin with class discussion/brainstorming on how to evaluate data before showing class results
  Loaded dice, biased coin tossing
  Thought questions
Student data vs. genuine research article
  “the result” vs. “your result”
Choice of first exposure
  Significant?
  Random sampling or random assignment

Scaffolding
  Observational units, variable
  How would you add one more dot to graph?
At some point, require students to enter the correct “observed result” (e.g., Captivate)
At some point, ask students to design the simulation?
Start with fill-in-the-blank interpretation?

One crank or more?
When to connect to normal approximations?
How to make sure traditional methods don’t take over once they are introduced?
How much to discuss exact methods?
Confidence intervals

Summary

Very promising but also need to be very careful, and need a strong cycle of repetition closely tied to rest of course…
