DESCRIPTION
Slides from Galit Shmueli's talk at the closing panel "Too Much Data + Too Much Statistics = Too Many Errors?" at the 2014 Israel Statistical Association symposium.
Galit Shmueli
Indian School of Business, Hyderabad, India
Too Large To Fail: Large Samples and False Discoveries
Lin, Lucas & Shmueli (2013), “Too Large To Fail: Large Samples and the P-Value Problem”, Information Systems Research
Current Empirical Research in Information Systems, Marketing
“over 10,000 publicly available feedback text comments… in eBay”
The Nature and Role of Feedback Text Comments in Online Marketplaces
Pavlou & Dimoka, Information Systems Research 2006
“51,062 rare coin auctions that took place… on eBay”
The Sound of Silence in Online Feedback
Dellarocas & Wood, Management Science 2006
“3.7 million records, encompassing transactions for the Federal Supply Service (FSS) of the U.S. Federal government in fiscal year 2000.”
Using Transaction Prices to Re-Examine Price Dispersion in Electronic Markets
Ghose & Yao, Information Systems Research 2011
108,333 used vehicles offered in the wholesale automotive market
Electronic vs. Physical Market Mechanisms
Overby & Jap, Management Science 2008
“For our analysis, we have … 784,882 [portal visits]”
Household-Specific Regressions Using Clickstream Data
Goldfarb & Lu, Statistical Science 2006
It’s about Power
Magnify effects
Separate signal from noise
Artwork: Running the numbers by Chris Jordan (www.chrisjordan.com) 426,000 cell phones retired in the US every day
Power = Prob(detect true H1 effect) = f(sample size, effect size, α, noise)
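The power relation above can be sketched numerically. This is a minimal illustration (my own, not from the talk) of a two-sided z-test with known noise level, using a hypothetical tiny effect of 0.02 standard deviations:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power(n, effect, sigma=1.0, z_crit=1.96):
    """Power of a two-sided z-test of H0: effect = 0 at alpha = 0.05,
    i.e. Power = f(sample size, effect size, alpha, noise)."""
    ncp = effect * sqrt(n) / sigma  # drift of the test statistic under H1
    return phi(ncp - z_crit) + phi(-ncp - z_crit)

# A 0.02-sd effect: nearly undetectable at n = 100,
# detected with near certainty at n = 1,000,000.
print(power(100, 0.02))        # ~0.055
print(power(1_000_000, 0.02))  # ~1.0
```

The same formula shows why huge samples detect small and complex effects and rare events: power climbs toward 1 for any nonzero effect as n grows.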
Small & complex effects
Stronger validity
Rare events
The Promise
P-value criticized for various reasons
P-value doesn’t measure plausibility of researcher’s hypothesis
Apply a small-sample inference approach to Big Data studies?
“In a large sample, we can obtain very large t statistics with low p-values for our predictors, when, in fact, their effect on Y is very slight”
Applied Statistics in Business & Economics (Doane & Seward)
Dramatic slope? Largest difference < $1,000
How much does a "value added" teacher contribute to a person's salary at age 28?
What happens to the average student test score as a "high value-added teacher enters the school"?
up by 0.03 points
~50% of recent papers in top Information Systems journals, with n>10,000, rely almost exclusively on low p-values and the coefficient sign to draw conclusions
Example
Why Does This “Bug” Occur?
Numerical issues?
Design of test procedure?
Design of hypotheses?
hypotheses → test statistic → p-value
Suppose H0: b = 0. For a consistent estimator b̂: if the true b differs from 0 even slightly, SE(b̂) shrinks like 1/√n, so the test statistic t = b̂ / SE(b̂) grows without bound and the p-value goes to 0 as n → ∞.
“Are the effects of A and B different? They are always different -- for some decimal place.”
John Tukey, 1991
How can you trust a method when you don’t even know when it is failing?
Lin, Lucas & Shmueli (2013), “Too Large To Fail: Large Samples and the P-Value Problem”, Information Systems Research
The CI Chart
Sample size affects the p-value more drastically than it affects the CI
The coefficient/p-value/sample-size (CPS) chart
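The contrast behind the chart can be seen with a back-of-the-envelope calculation (hypothetical numbers, not from the paper): hold the estimated coefficient fixed at b̂ = 0.01 with noise s = 1 and let n grow. The CI half-width shrinks only like 1/√n, while the p-value collapses far faster:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

b_hat, s = 0.01, 1.0  # hypothetical tiny coefficient and noise level
for n in (1_000, 100_000, 10_000_000):
    se = s / sqrt(n)
    lo, hi = b_hat - 1.96 * se, b_hat + 1.96 * se
    p = 2 * phi(-abs(b_hat) / se)  # two-sided z-test of H0: b = 0
    print(f"n={n:>10,}  CI=({lo:+.4f}, {hi:+.4f})  p={p:.1e}")
```

The CI keeps telling the same story at every n (the effect is at most a few hundredths), while the p-value swings from about 0.75 to essentially zero.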
For what n does the p-value problem arise?
Monte Carlo
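A minimal Monte Carlo sketch of that question (my own illustration, assuming a true effect of 0.02 sd): simulate samples of growing size and watch when the two-sided p-value for H0: μ = 0 drops below conventional thresholds.

```python
import random
from math import erf, sqrt

def two_sided_p(z):
    """Two-sided p-value under a normal approximation."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

def simulated_p(n, true_effect=0.02, sigma=1.0, seed=42):
    """Draw one sample of size n with a tiny true mean, test H0: mu = 0."""
    rng = random.Random(seed)
    xs = [rng.gauss(true_effect, sigma) for _ in range(n)]
    mean = sum(xs) / n
    sd = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return two_sided_p(mean / (sd / sqrt(n)))

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, simulated_p(n))
```

For this effect size the p-value typically crosses 0.05 around n ≈ 10⁴ and is indistinguishable from zero by n ≈ 10⁶, even though the effect itself remains negligible throughout.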
Proposed Solutions
Solutions Proposed in Econometrics
• Report confidence intervals or, to be more conservative, only one of their bounds (common practice in empirical economics)
• Adjust the threshold p-value downward as the sample size grows (Greene 2003, Leamer 1978); not used in practice, and no rules of thumb have been proposed for how such adjustments should be made
Statisticians’ Solution: Effect Size and Confidence Intervals
“Authors should avoid tests of statistical significance; instead, they should report on effect sizes, confidence intervals, replications/extensions, and meta-analyses.
Practitioners should ignore significance tests and journals should discourage them.”
Scott Armstrong, in “Significance Tests Harm Progress in Forecasting”
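Reporting effect sizes with confidence intervals can be sketched in a few lines (an illustration, not from the talk; the approximate standard error for Cohen's d below is a common large-sample formula):

```python
from math import sqrt

def cohens_d_with_ci(m1, m2, sd1, sd2, n1, n2, z=1.96):
    """Standardized mean difference (Cohen's d) with an approximate 95% CI:
    reports magnitude plus uncertainty rather than a bare p-value."""
    sp = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))  # approx SE of d
    return d, (d - z * se, d + z * se)

# Hypothetical groups: a tiny standardized difference stays visibly tiny
# no matter how many observations back it, even when p is minuscule.
d, (lo, hi) = cohens_d_with_ci(10.02, 10.00, 1.0, 1.0, 500_000, 500_000)
print(d, lo, hi)
```

With half a million observations per group the interval is very tight, but it is tight around a negligible magnitude, which is exactly the information a significance test alone hides.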
“A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world…
If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?”
Cohen (1990), “Things I have learned (so far)”, American Psychologist
Things I have learned (so far): Social scientists are unlikely to abandon classical hypothesis testing
Information Systems literature: authors who recognized the issue tried to:
• Reduce the significance level threshold
• Re-compute the p-value for a small sample
• Report confidence intervals
• Marginal effects charts
• Conduct sensitivity analysis, modifying the covariates or the variable structure
• Include new variables
• Rerun the model on random data subsets
• Compare model coefficients to another model with additional control variables
Conclusion
Increased power leads to a dangerous pitfall as well as a huge opportunity
Superpower challenge: “Meaningless Statistics”
Need a large-sample inference approach