
Too Large To Fail: Large Samples and False Discoveries


Description: Slides from Galit Shmueli's talk at the closing panel "Too Much Data + Too Much Statistics = Too Many Errors?" at the 2014 Israel Statistical Association symposium.


Page 1: Too Large To Fail: Large Samples and False Discoveries

Galit Shmueli

Indian School of Business, Hyderabad, India

Too Large To Fail: Large Samples and False Discoveries

Lin, Lucas & Shmueli (2013), “Too Large To Fail: Large Samples and the P-Value Problem”, Information Systems Research

Page 2: Too Large To Fail: Large Samples and False Discoveries

Current Empirical Research in Information Systems and Marketing

“over 10,000 publicly available feedback text comments… in eBay”
The Nature and Role of Feedback Text Comments in Online Marketplaces

Pavlou & Dimoka, Information Systems Research 2006

“51,062 rare coin auctions that took place… on eBay”
The Sound of Silence in Online Feedback

Dellarocas & Wood, Management Science 2006

“3.7 million records, encompassing transactions for the Federal Supply Service (FSS) of the U.S. Federal government in fiscal year 2000.”

Using Transaction Prices to Re-Examine Price Dispersion in Electronic Markets
Ghose & Yao, Information Systems Research 2011

108,333 used vehicles offered in the wholesale automotive market
Electronic vs. Physical Market Mechanisms

Overby & Jap, Management Science 2008

For our analysis, we have … 784,882 [portal visits]
Household-Specific Regressions Using Clickstream Data

Goldfarb & Lu, Statistical Science 2006

Page 3: Too Large To Fail: Large Samples and False Discoveries
Page 4: Too Large To Fail: Large Samples and False Discoveries

It’s about Power

Page 5: Too Large To Fail: Large Samples and False Discoveries

Magnify effects

Separate signal from noise

Page 6: Too Large To Fail: Large Samples and False Discoveries

Artwork: “Running the Numbers” by Chris Jordan (www.chrisjordan.com): 426,000 cell phones retired in the US every day

Page 7: Too Large To Fail: Large Samples and False Discoveries

Power = Prob(detecting a true H1 effect) = f(sample size, effect size, α, noise)
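To make this relationship concrete, here is a minimal sketch (added for this write-up, not from the slides) that computes the power of a two-sided one-sample z-test as a function of sample size, effect size, α, and noise; the function name and the chosen effect size are illustrative assumptions.

```python
# Minimal sketch: power of a two-sided one-sample z-test as a function of
# sample size, effect size, alpha (significance level), and noise.
import numpy as np
from scipy.stats import norm

def power_ztest(n, effect, alpha=0.05, noise=1.0):
    """Power for H0: mu = 0 vs. H1: mu = effect, with known sd = noise."""
    z_crit = norm.ppf(1 - alpha / 2)        # two-sided rejection threshold
    shift = effect * np.sqrt(n) / noise     # distance of H1 from H0, in SE units
    return norm.cdf(-z_crit + shift) + norm.cdf(-z_crit - shift)

# Even a tiny effect (0.01 sd) is detected almost surely once n is large enough.
for n in [100, 10_000, 1_000_000]:
    print(f"n={n:>9,}  power={power_ztest(n, effect=0.01):.3f}")
```

With this tiny effect, power sits near α at n = 100 but is essentially 1 at n = 1,000,000.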

Page 8: Too Large To Fail: Large Samples and False Discoveries

The Promise

Small & complex effects

Stronger validity

Rare events

Page 9: Too Large To Fail: Large Samples and False Discoveries

P-value criticized for various reasons

P-value doesn’t measure plausibility of researcher’s hypothesis

Page 10: Too Large To Fail: Large Samples and False Discoveries

Apply a small-sample inference approach to Big Data studies?

Page 11: Too Large To Fail: Large Samples and False Discoveries

“In a large sample, we can obtain very large t statistics with low p-values for our predictors, when, in fact, their effect on Y is very slight”

Applied Statistics in Business & Economics (Doane & Seward)

Page 12: Too Large To Fail: Large Samples and False Discoveries
Page 13: Too Large To Fail: Large Samples and False Discoveries

How much does a “value-added” teacher contribute to a person’s salary at age 28? A dramatic slope? The largest difference is < $1,000.

What happens to the average student test score when a “high value-added teacher enters the school”? It goes up by 0.03 points.

Page 14: Too Large To Fail: Large Samples and False Discoveries

~50% of recent papers in top Information Systems journals, with n>10,000, rely almost exclusively on low p-values and the coefficient sign to draw conclusions

Page 15: Too Large To Fail: Large Samples and False Discoveries

Example

Page 16: Too Large To Fail: Large Samples and False Discoveries

Why Does This “Bug” Occur?

Numerical issues?

Design of test procedure?

Design of hypotheses?

Page 17: Too Large To Fail: Large Samples and False Discoveries

Test statistic, p-value, hypotheses

Page 18: Too Large To Fail: Large Samples and False Discoveries

Suppose H0: b = 0.

For a consistent estimator b̂, the standard error shrinks as the sample grows (roughly at rate 1/√n). So whenever the true b differs from 0 by even a tiny amount, the test statistic b̂/SE(b̂) grows without bound as n increases, and the p-value approaches 0.
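A quick numeric check of this argument (an illustration added here, not taken from the paper; the slope of 0.01 and the sample sizes are arbitrary choices): the t-statistic grows roughly like √n and the p-value collapses.

```python
# Numeric check: with a tiny but nonzero true slope, the t-statistic grows
# roughly like sqrt(n), so the p-value heads to 0. Values are illustrative.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)
true_slope = 0.01   # practically negligible: x explains ~0.01% of var(y)

for n in [1_000, 100_000, 10_000_000]:
    x = rng.normal(size=n)
    y = true_slope * x + rng.normal(size=n)
    res = linregress(x, y)
    print(f"n={n:>10,}  t={res.slope / res.stderr:8.1f}  p-value={res.pvalue:.1e}")
```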

Page 19: Too Large To Fail: Large Samples and False Discoveries

“Are the effects of A and B different? They are always different -- for some decimal place.”

John Tukey, 1991

Page 20: Too Large To Fail: Large Samples and False Discoveries

How can you trust a method when you don’t even know when it is failing?

Lin, Lucas & Shmueli (2013), “Too Large To Fail: Large Samples and the P-Value Problem”, Information Systems Research

Page 21: Too Large To Fail: Large Samples and False Discoveries

The CI Chart

Sample size affects the p-value more drastically than it affects the CI
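A rough numeric illustration of that contrast (the observed mean of 0.02 and the sample sizes are assumptions of this write-up, not the paper's CI chart): as n grows, the 95% CI keeps showing a tiny effect of about 0.02 while the p-value races to zero.

```python
# For a fixed small observed mean (0.02, sd 1), the 95% CI narrows slowly
# (like 1/sqrt(n)) while the p-value collapses to 0 much faster.
import numpy as np
from scipy.stats import norm

observed_mean, sd = 0.02, 1.0        # illustrative values
for n in [1_000, 100_000, 10_000_000]:
    se = sd / np.sqrt(n)
    lo, hi = observed_mean - 1.96 * se, observed_mean + 1.96 * se
    p = 2 * norm.sf(abs(observed_mean / se))
    print(f"n={n:>10,}  95% CI=({lo:+.4f}, {hi:+.4f})  p-value={p:.1e}")
```

The CI keeps reporting the magnitude (about 0.02, trivially small); the p-value only reports that it is not exactly zero.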

Page 22: Too Large To Fail: Large Samples and False Discoveries

The coefficient/p-value/sample-size (CPS) chart

For what n does the p-value problem arise?

Monte Carlo
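A minimal Monte Carlo sketch in the spirit of a CPS chart (the negligible slope of 0.01, the grid of sample sizes, and the number of replications are illustrative assumptions, not the paper's procedure): for each n, estimate how often the negligible coefficient is declared significant at the 5% level.

```python
# Monte Carlo sketch of a CPS-style question: at which n does a negligible
# true slope (0.01) start being declared significant at the 5% level?
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
true_slope, reps = 0.01, 200

for n in [1_000, 10_000, 100_000, 300_000]:
    rejections = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = true_slope * x + rng.normal(size=n)
        if linregress(x, y).pvalue < 0.05:
            rejections += 1
    print(f"n={n:>8,}  rejected H0 in {100 * rejections / reps:.0f}% of runs")
```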

Page 23: Too Large To Fail: Large Samples and False Discoveries

Proposed Solutions

Page 24: Too Large To Fail: Large Samples and False Discoveries

Solutions Proposed in Econometrics

In empirical economics, a common practice is to report confidence intervals or, depending on the context, to be more conservative and report only one of their bounds.

Adjust the p-value threshold downward as the sample size grows (Greene 2003, Leamer 1978): rarely used in practice, and no rules of thumb have been proposed for how such adjustments should be made.

Page 25: Too Large To Fail: Large Samples and False Discoveries

Statisticians’ Solution: Effect Size and Confidence Intervals

“Authors should avoid tests of statistical significance; instead, they should report on effect sizes, confidence intervals, replications/extensions, and meta-analyses.

Practitioners should ignore significance tests and journals should discourage them.”

Scott Armstrong, in “Significance Tests Harm Progress in Forecasting”
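As a concrete illustration of that advice (a sketch on simulated data, not part of the talk): report an effect size and a confidence interval for a difference in means rather than only a significance verdict.

```python
# Report an effect size (Cohen's d) and a 95% CI for a mean difference,
# rather than only a p-value. Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(loc=0.00, scale=1.0, size=50_000)   # group A
b = rng.normal(loc=0.02, scale=1.0, size=50_000)   # group B: tiny true difference

diff = b.mean() - a.mean()
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = diff / pooled_sd                          # standardized effect size
se_diff = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
lo, hi = diff - 1.96 * se_diff, diff + 1.96 * se_diff  # normal-approx 95% CI

print(f"difference = {diff:+.4f}, Cohen's d = {cohens_d:+.3f}, "
      f"95% CI = ({lo:+.4f}, {hi:+.4f})")
```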

Page 26: Too Large To Fail: Large Samples and False Discoveries
Page 27: Too Large To Fail: Large Samples and False Discoveries

“A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world…

If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?”

Cohen (1990), “Things I Have Learned (So Far)”, American Psychologist

Things I have learned (so far): Social scientists are unlikely to abandon classical hypothesis testing

Page 28: Too Large To Fail: Large Samples and False Discoveries

Information Systems literature: authors who recognized the issue tried to:

• Reduce the significance level threshold
• Re-compute the p-value for a small sample
• Report confidence intervals
• Show marginal-effects charts
• Conduct sensitivity analyses, modifying the covariates or the variable structure
• Include new variables
• Rerun the model on random data subsets (see the sketch after this list)
• Compare model coefficients to another model with additional control variables
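A minimal sketch of the “random data subsets” idea (data and subset size are assumptions for illustration): refit the same regression on several modest-sized random subsamples and check whether the coefficient remains both meaningful and stable, rather than relying on the full-sample p-value alone.

```python
# Rerun the same simple regression on random subsamples of a large dataset
# and inspect the coefficient and p-value at a more modest sample size.
# Data and subset size are simulated/chosen purely for illustration.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(3)
N = 1_000_000
x = rng.normal(size=N)
y = 0.01 * x + rng.normal(size=N)   # negligible true effect in the full data

for i in range(5):
    idx = rng.choice(N, size=5_000, replace=False)   # random subset
    res = linregress(x[idx], y[idx])
    print(f"subset {i}: slope={res.slope:+.4f}  p-value={res.pvalue:.3f}")
```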

Page 29: Too Large To Fail: Large Samples and False Discoveries

Conclusion

Increased power leads to a dangerous pitfall as well as a huge opportunity

Page 30: Too Large To Fail: Large Samples and False Discoveries

Superpower challenge: “Meaningless Statistics”

Need a large-sample inference approach