DESCRIPTION
Slides from Galit Shmueli's talk at the closing panel "Too Much Data + Too Much Statistics = Too Many Errors?" at the 2014 Israel Statistical Association symposium.
Galit Shmueli
Indian School of Business, Hyderabad, India
Too Large To Fail: Large Samples and False Discoveries
Lin, Lucas & Shmueli (2013), “Too Large To Fail: Large Samples and the P-Value Problem”, Information Systems Research
Current Empirical Research in Information Systems, Marketing
“over 10,000 publicly available feedback text comments… in eBay”
The Nature and Role of Feedback Text Comments in Online Marketplaces
Pavlou & Dimoka, Information Systems Research 2006
“51,062 rare coin auctions that took place… on eBay”
The Sound of Silence in Online Feedback
Dellarocas & Wood, Management Science 2006
“3.7 million records, encompassing transactions for the Federal Supply Service (FSS) of the U.S. Federal government in fiscal year 2000.”
Using Transaction Prices to Re-Examine Price Dispersion in Electronic Markets
Ghose & Yao, Information Systems Research 2011
108,333 used vehicles offered in the wholesale automotive market
Electronic vs. Physical Market Mechanisms
Overby & Jap, Management Science 2008
“For our analysis, we have … 784,882 [portal visits]”
Household-Specific Regressions Using Clickstream Data
Goldfarb & Lu, Statistical Science 2006
It’s about Power
Magnify effects
Separate signal from noise
Artwork: Running the numbers by Chris Jordan (www.chrisjordan.com) 426,000 cell phones retired in the US every day
Power = Prob(detect true H1 effect) = f(sample size, effect size, α, noise)
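The power relation above can be sketched numerically. This is a minimal illustration (my own, not from the talk) of a two-sided z-test with known noise level, using a hypothetical tiny effect of 0.02 standard deviations:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power(n, effect, sigma=1.0, z_crit=1.96):
    """Power of a two-sided z-test of H0: effect = 0 at alpha = 0.05,
    i.e. Power = f(sample size, effect size, alpha, noise)."""
    ncp = effect * sqrt(n) / sigma  # drift of the test statistic under H1
    return phi(ncp - z_crit) + phi(-ncp - z_crit)

# A 0.02-sd effect: nearly undetectable at n = 100,
# detected with near certainty at n = 1,000,000.
print(power(100, 0.02))        # ~0.055
print(power(1_000_000, 0.02))  # ~1.0
```

The same formula shows why huge samples detect small and complex effects and rare events: power climbs toward 1 for any nonzero effect as n grows.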
Small & complex effects
Stronger validity
Rare events
The Promise
P-value criticized for various reasons
P-value doesn’t measure plausibility of researcher’s hypothesis
Apply a small-sample inference approach to Big Data studies?
“In a large sample, we can obtain very large t statistics with low p-values for our predictors, when, in fact, their effect on Y is very slight”
Applied Statistics in Business & Economics (Doane & Seward)
Dramatic slope? Largest difference < $1,000
How much does a "value added" teacher contribute to a person's salary at age 28?
What happens to the average student test score as a "high value-added teacher enters the school"?
up by 0.03 points
~50% of recent papers in top Information Systems journals, with n>10,000, rely almost exclusively on low p-values and the coefficient sign to draw conclusions
Example
Why Does This “Bug” Occur?
Numerical issues?
Design of test procedure?
Design of hypotheses?
hypotheses → test statistic → p-value
Suppose H0: b = 0. For a consistent estimator b̂: if the true b differs from 0 even slightly, SE(b̂) shrinks like 1/√n, so the test statistic t = b̂ / SE(b̂) grows without bound and the p-value goes to 0 as n → ∞.
“Are the effects of A and B different? They are always different -- for some decimal place.”
John Tukey, 1991
How can you trust a method when you don’t even know when it is failing?
Lin, Lucas & Shmueli (2013), “Too Large To Fail: Large Samples and the P-Value Problem”, Information Systems Research
The CI Chart
Sample size affects the p-value more drastically than it affects the CI
The coefficient/p-value/sample-size (CPS) chart
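The contrast behind the chart can be seen with a back-of-the-envelope calculation (hypothetical numbers, not from the paper): hold the estimated coefficient fixed at b̂ = 0.01 with noise s = 1 and let n grow. The CI half-width shrinks only like 1/√n, while the p-value collapses far faster:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

b_hat, s = 0.01, 1.0  # hypothetical tiny coefficient and noise level
for n in (1_000, 100_000, 10_000_000):
    se = s / sqrt(n)
    lo, hi = b_hat - 1.96 * se, b_hat + 1.96 * se
    p = 2 * phi(-abs(b_hat) / se)  # two-sided z-test of H0: b = 0
    print(f"n={n:>10,}  CI=({lo:+.4f}, {hi:+.4f})  p={p:.1e}")
```

The CI keeps telling the same story at every n (the effect is at most a few hundredths), while the p-value swings from about 0.75 to essentially zero.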
For what n does the p-value problem arise?
Monte Carlo
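A minimal Monte Carlo sketch of that question (my own illustration, assuming a true effect of 0.02 sd): simulate samples of growing size and watch when the two-sided p-value for H0: μ = 0 drops below conventional thresholds.

```python
import random
from math import erf, sqrt

def two_sided_p(z):
    """Two-sided p-value under a normal approximation."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

def simulated_p(n, true_effect=0.02, sigma=1.0, seed=42):
    """Draw one sample of size n with a tiny true mean, test H0: mu = 0."""
    rng = random.Random(seed)
    xs = [rng.gauss(true_effect, sigma) for _ in range(n)]
    mean = sum(xs) / n
    sd = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return two_sided_p(mean / (sd / sqrt(n)))

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, simulated_p(n))
```

For this effect size the p-value typically crosses 0.05 around n ≈ 10⁴ and is indistinguishable from zero by n ≈ 10⁶, even though the effect itself remains negligible throughout.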
Proposed Solutions
Solutions Proposed in Econometrics
• Report confidence intervals or, to be more conservative, only one of their bounds (common practice in empirical economics)
• Adjust the threshold p-value downward as the sample size grows (Greene 2003, Leamer 1978); not used in practice, and no rules of thumb have been proposed for how such adjustments should be made
Statisticians’ Solution: Effect Size and Confidence Intervals
“Authors should avoid tests of statistical significance; instead, they should report on effect sizes, confidence intervals, replications/extensions, and meta-analyses.
Practitioners should ignore significance tests and journals should discourage them.”
Scott Armstrong, in “Significance Tests Harm Progress in Forecasting”
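Reporting effect sizes with confidence intervals can be sketched in a few lines (an illustration, not from the talk; the approximate standard error for Cohen's d below is a common large-sample formula):

```python
from math import sqrt

def cohens_d_with_ci(m1, m2, sd1, sd2, n1, n2, z=1.96):
    """Standardized mean difference (Cohen's d) with an approximate 95% CI:
    reports magnitude plus uncertainty rather than a bare p-value."""
    sp = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))  # approx SE of d
    return d, (d - z * se, d + z * se)

# Hypothetical groups: a tiny standardized difference stays visibly tiny
# no matter how many observations back it, even when p is minuscule.
d, (lo, hi) = cohens_d_with_ci(10.02, 10.00, 1.0, 1.0, 500_000, 500_000)
print(d, lo, hi)
```

With half a million observations per group the interval is very tight, but it is tight around a negligible magnitude, which is exactly the information a significance test alone hides.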
“A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world…
If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?”
Cohen (1990), “Things I have learned (so far)”, American Psychologist
Things I have learned (so far): Social scientists are unlikely to abandon classical hypothesis testing
Information Systems literature: authors who recognized the issue tried to:
• Reduce the significance level threshold
• Re-compute the p-value for a small sample
• Report confidence intervals
• Marginal effects charts
• Conduct sensitivity analysis, modifying the covariates or the variable structure
• Include new variables
• Rerun the model on random data subsets
• Compare model coefficients to another model with additional control variables
Conclusion
Increased power leads to a dangerous pitfall as well as a huge opportunity
Superpower challenge: “Meaningless Statistics”
Need a large-sample inference approach