Methodology: Upping Your Game in Survey Analytics
DAVID VANNETTE Qualtrics
By the end of this presentation you will:
• Know how to avoid several data analysis pitfalls
• Have a strategy for drawing the most valid and actionable insights from your data
• Be better able to critique research
• Learn strategies to take your analysis to the next level
Goals
This presentation does NOT cover:
• The nuts and bolts of survey data analysis
• Statistical software
Topics
This presentation covers:
• Research strategy
• Fishing
• P-hacking
• Statistical significance
• Transparency
• Data visualization
• Uncertainty in data
• Research validity
Outline
• A new way to approach data analysis
• What to do
• What not to do (Parts 1–7)
Qualtrics cares (and so do I):
• We want you to be as good as you can be
• Lots of people get this stuff wrong
• The world is changing
• When you win, we win
Why This Talk?
©2015 QUALTRICS LLC.
What story do you expect your data to tell you?
Before you have any data:
• What data will you need to reach your objective?
• How will you present these data?
• Make a shell of the report/write-up with clear indications of which numbers are needed and where.
Step 1: Write the Report
Before starting survey design:
• What data will you need to present?
• How will you present them?
• What variables are in your ideal dataset?
• How will you analyze them?
• Determine what format you will use to show the data
Step 2: Assemble the Slide Deck
Now that you know what data you NEED…
• Ask questions that will produce exactly those data (and no more)
• Ask the questions in a way that will produce the most accurate data possible even if it means more data cleaning on the back end
Step 3: Design Your Dataset (AKA WRITE YOUR SURVEY)
Step 4: Collect Your Data
Produce exactly the results that you outlined in your report/presentation
• Data cleaning/restructuring
• Key comparisons/results
• Visualization
Step 5: Analyze Your Data
Step 6: Plug the Results into the Report and Slide Deck
Who does research this way?
Research Philosophy (With Practical Application)
Translation: Whether a result in your analysis is meaningful depends on your context. Some statistically non-significant results can be critical; many statistically significant results can be useless.
How: Determine what matters before looking at the data, and set thresholds for action.
What To Do: Search for Meaning
Translation: Data are never perfect, and your analysis should reflect that fact.
How: Present estimates with error bars, preferably 95% confidence intervals.
What To Do: Embrace Uncertainty
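The "error bars" advice can be sketched in a few lines. This is a minimal illustration with made-up survey scores, not data from the talk; the 1.96 multiplier is the standard normal-approximation cutoff for a 95% interval.

```python
# Sketch: a 95% confidence interval for a mean satisfaction score.
# The scores below are hypothetical illustration values.
import math
import statistics

scores = [7, 8, 6, 9, 7, 8, 5, 9, 8, 7, 6, 8]  # hypothetical responses

n = len(scores)
mean = statistics.mean(scores)
se = statistics.stdev(scores) / math.sqrt(n)   # standard error of the mean
ci_low = mean - 1.96 * se                       # normal approximation
ci_high = mean + 1.96 * se

print(f"mean = {mean:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

A plotting tool (ggplot, Tableau, etc.) can then draw `ci_low` and `ci_high` as error bars around each estimate instead of reporting a bare mean.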
Translation: Present your results in the most transparent way possible.
How: Use data visualization and follow best practices (e.g., use the right visualization type for the data, keep it simple, compare against goals, etc.).
What To Do: Show Off
This means:
• Understanding the structure of your variables (and knowing how to restructure them if necessary)
• Knowing the strengths of your variables (validated, robust, reliable, etc.)
• Knowing the weaknesses of your variables (untested, noisy, etc.)
• Knowing the strengths and weaknesses of your sample (N, necessary subsets, need for post-stratification, etc.)
• Looking at your data: crosstabs, plots, etc.
Know Your Dataset
This does NOT mean:
• Running t-tests, ANOVAs, chi-square tests, regressions, etc., on every possible comparison
KNOW YOUR DATASET
Where do things go wrong?
Understand: The difference between "significant" and "not significant" is not itself statistically significant.
Don’t: Rely solely on significance tests to determine what your data have to say about your research question.
Instead: Determine whether the effect or difference you see is significant for your research question.
What Not To Do: Star-Gazing
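The point above can be made concrete with a small sketch. The effect sizes and standard errors below are hypothetical, chosen only to show the pattern: one estimate clears the z = 1.96 bar and one does not, yet the difference between them does not.

```python
# Sketch (illustrative numbers, not from the talk): two estimated effects
# with the same standard error. One is "significant", the other is not,
# yet the difference between them is itself not significant.
import math

effect_a, se_a = 25.0, 10.0   # z = 2.5  -> "significant" at 0.05
effect_b, se_b = 10.0, 10.0   # z = 1.0  -> "not significant"

z_a = effect_a / se_a
z_b = effect_b / se_b

diff = effect_a - effect_b
se_diff = math.sqrt(se_a**2 + se_b**2)   # standard errors add in quadrature
z_diff = diff / se_diff                  # ~1.06 -> not significant either

print(f"z_a = {z_a:.2f}, z_b = {z_b:.2f}, z_diff = {z_diff:.2f}")
```

Star-gazing would declare effect A "real" and effect B "absent", even though the data cannot distinguish the two effects from each other.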
Understand: p < 0.05 means p < 1/20. Run 20 comparisons where nothing is going on and you should expect about one "significant" result by chance alone.
Don’t: Test many different comparisons or models in search of “significant” results
Instead: Specify your research questions and comparisons in advance and only report on those. Testing others is fine but they should then be replicated before being reported.
What Not To Do: Fishing
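A quick simulation shows why fishing works against you: with 20 true-null comparisons at alpha = 0.05, most runs turn up at least one "significant" result. This sketch draws null z-statistics directly; the counts are illustrative, not from the talk.

```python
# Sketch: when the null hypothesis is true in every comparison, testing
# 20 of them at alpha = 0.05 still "finds" something in most runs.
# Theoretical rate: 1 - 0.95**20 ~= 0.64.
import random

random.seed(42)

n_sims, n_tests = 2000, 20
runs_with_hit = 0
for _ in range(n_sims):
    # each test statistic is standard normal under the null
    hits = sum(1 for _ in range(n_tests) if abs(random.gauss(0, 1)) > 1.96)
    if hits > 0:
        runs_with_hit += 1

print(f"share of runs with >=1 'significant' result: {runs_with_hit / n_sims:.2f}")
```

This is why comparisons specified in advance are reportable, while extra "discoveries" need replication first.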
Understand: Many factors affect p-values, and many of them can be influenced by the researcher. Manipulating them is cheating yourself.
Don't: Manipulate data (recoding variables, trimming convenient "outliers", etc.) or use inappropriate tests in search of "significance".
Instead: Follow your analysis plan and be honest with yourself. There is more to research than stars.
What Not To Do: P-Hacking
• The p-value is not the probability that the null hypothesis is true or the probability that the alternative hypothesis is false. It is not connected to either.
• The p-value is not the probability that a finding is "merely a fluke."
• The p-value is not the probability of falsely rejecting the null hypothesis.
• The p-value is not the probability that replicating the experiment would yield the same conclusion.
• The significance level, such as 0.05, is not determined by the p-value.
• The p-value does not indicate the size or importance of the observed effect.
What P-Values Are Not
The p-value is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, assuming that the null hypothesis is true.
In observational survey research, the null hypothesis is that the result is due to sampling error.
What is a P-Value?
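That definition can be made operational with a permutation test: shuffle the group labels to simulate the null hypothesis, then count how often the shuffled difference is at least as extreme as the observed one. The two groups below are hypothetical ratings, used only to illustrate the mechanics.

```python
# Sketch: the p-value as "probability, under the null, of a result equal
# to or more extreme than what was observed", estimated by permutation.
import random

random.seed(1)

group_a = [4, 5, 6, 7, 5, 6, 8, 5]   # hypothetical ratings
group_b = [3, 4, 4, 5, 4, 6, 3, 4]
observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

pooled = group_a + group_b
n_a = len(group_a)
extreme = 0
n_perm = 5000
for _ in range(n_perm):
    random.shuffle(pooled)                 # null: group labels are arbitrary
    diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
    if abs(diff) >= abs(observed):         # "equal to or more extreme"
        extreme += 1

p_value = extreme / n_perm
print(f"observed diff = {observed:.3f}, permutation p-value = {p_value:.3f}")
```

Note what this number is: the share of null worlds at least as extreme as your data. It says nothing about effect size, importance, or the probability that the null is true.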
It is not an indication of the importance of a result to your research.
It is not an indication about the practical value of a result.
It is not an indication of the theoretical value of a result.
It is not the final arbiter of quality research.
What Statistical Significance Is Not:
How to take your approach to the next level
1. Ditch the spreadsheet
2. Focus on transparency
Getting Ahead of the Curve
1. Invest in learning visualization
2. Analysis should be reproducible
3. Analysis should be version-controlled
4. Analysis should be scalable
Focus on Transparency
1. Learn a new tool:
• R (ggplot)
• Tableau
• GoodData
2. Learn how to effectively display data:
• Edward Tufte
• Stephen Kosslyn
Dataviz
1. Document everything
2. Use standard methods
3. No black boxes
Make Your Work Reproducible
1. Take advantage of modern tools for version control:
• Dropbox
• Git
• Google Docs
Version Control
1. Automate your analysis workflow
2. Design your research to be “plug-and-play”
Scaling Up
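A "plug-and-play" analysis step can be as simple as making the whole computation a function of the raw responses, so that a new wave of data is just another call. The field names (`segment`, `score`) and data below are hypothetical, chosen to show the shape of the idea.

```python
# Sketch of a plug-and-play analysis step: the pipeline is one function
# of the raw responses, so rerunning on new data requires no rework.
def summarize_wave(responses):
    """responses: list of dicts with hypothetical 'segment'/'score' keys.
    Returns the mean score per segment."""
    by_segment = {}
    for row in responses:
        by_segment.setdefault(row["segment"], []).append(row["score"])
    return {seg: sum(vals) / len(vals) for seg, vals in by_segment.items()}

wave_1 = [
    {"segment": "new", "score": 7},
    {"segment": "new", "score": 9},
    {"segment": "returning", "score": 6},
]
print(summarize_wave(wave_1))  # the same call works for wave 2, wave 3, ...
```

Because the logic lives in one documented, version-controlled function rather than a spreadsheet, the analysis is also reproducible and transparent, tying the earlier points together.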
1. Work backwards (at least in your head)
2. Do:
• Think about meaning
• Embrace uncertainty
• Show off your data
3. Don't:
• Fish in your data
• Hack your p-values
• Star-gaze
4. Focus on transparency
Review
Questions?
Thanks!