Methodology: Upping Your Game in Survey Data Analytics


Methodology: Upping Your Game in Survey Analytics

David Vannette, Qualtrics

Goals

By the end of this presentation you will:
•  Know how to avoid several data analysis pitfalls
•  Have a strategy for drawing the most valid and actionable insights from your data
•  Be better able to critique research
•  Have strategies to bring your analysis to the next level

Topics

This presentation covers:
•  Research strategy
•  Fishing
•  P-hacking
•  Statistical significance
•  Transparency
•  Data visualization
•  Uncertainty in data
•  Research validity

This presentation does NOT cover:
•  The nuts and bolts of survey data analysis
•  Statistical software

Outline

•  A new way to approach data analysis
•  What to do
•  What not to do (Parts 1-7)

Why This Talk?

Qualtrics cares (and so do I):
•  We want you to be as good as you can be
•  Lots of people get this stuff wrong
•  The world is changing
•  When you win, we win

©2015 QUALTRICS LLC.

Step 1: Write the Report

What story do you expect your data to tell you?

Before you have any data:
•  What data will you need to reach your objective?
•  How will you present these data?
•  Make a shell of the report/write-up with clear indications of which numbers are needed and where.

Step 2: Assemble the Slide Deck

Before starting survey design:
•  What data will you need to present?
•  How will you present them?
•  What variables are in your ideal dataset?
•  How will you analyze them?
•  Determine what format you will use to show the data

Step 3: Design Your Dataset (AKA Write Your Survey)

Now that you know what data you NEED:
•  Ask questions that will produce exactly those data (and no more)
•  Ask the questions in a way that will produce the most accurate data possible, even if it means more data cleaning on the back end

Step 4: Collect Your Data

Step 5: Analyze Your Data

Produce exactly the results that you outlined in your report/presentation:
•  Data cleaning/restructuring
•  Key comparisons/results
•  Visualization

Step 6: Plug the Results into the Report and Slide Deck


Who does research this way?

Research Philosophy (With Practical Application)

What To Do: Search for Meaning

Translation: Whether a result in your analysis is meaningful depends on your context. Some statistically non-significant results can be critical, and many statistically significant results can be useless.
How: Determine what matters before looking at the data, and set thresholds for action.

What To Do: Embrace Uncertainty

Translation: Data are never perfect, and your analysis should reflect that fact.
How: Present estimates with error bars, preferably 95% confidence intervals.
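The talk does not prescribe a tool, but the error-bar idea can be sketched in a few lines of Python using the normal approximation for a proportion; the 520-of-1,000 figures are invented for illustration:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """95% confidence interval for a survey proportion (normal approximation)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Hypothetical result: 520 of 1,000 respondents chose option A.
low, high = proportion_ci(520, 1000)
print(f"Estimate 52.0%, 95% CI {low:.1%} to {high:.1%}")
```

The low and high values are what the error bars should span in the chart; reporting "52%" alone hides roughly a six-point window of plausible values at this sample size.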

What To Do: Show Off

Translation: Present your results in the most transparent way possible.
How: Use data visualization and follow best practices (e.g., use the right viz type for the data, keep it simple, compare against goals, etc.).

Know Your Dataset

This means:
•  Understanding the structure of your variables (and knowing how to restructure them if necessary)
•  Knowing the strengths of your variables (validated, robust, reliable, etc.)
•  Knowing the weaknesses of your variables (untested, noisy, etc.)
•  Knowing the strengths and weaknesses of your sample (N, necessary subsets, need for post-stratification, etc.)
•  Looking at your data: crosstabs, plots, etc.

Know Your Dataset

This does NOT mean:
•  Running t-tests, ANOVAs, chi-square tests, regressions, etc., on every possible comparison


Where do things go wrong?

What Not To Do: Star-Gazing

Understand: The difference between significant and insignificant is not itself significant.
Don't: Rely solely on significance tests to determine what your data have to say about your research question.
Instead: Determine whether the effect or difference you see is significant for your research question.
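The "difference between significant and insignificant" point (made by Gelman and Stern) can be checked numerically. A minimal sketch with invented effect sizes, using a normal approximation and the Python standard library only:

```python
import math

def two_sided_p(estimate, se):
    """Two-sided p-value for estimate vs. zero, normal approximation."""
    z = abs(estimate / se)
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Hypothetical study: effect A = 25 (SE 10), effect B = 10 (SE 10).
p_a = two_sided_p(25, 10)    # about 0.012: A looks "significant"
p_b = two_sided_p(10, 10)    # about 0.32: B looks "not significant"

# ...but the DIFFERENCE between A and B (15, SE ~14.1) is NOT significant:
p_diff = two_sided_p(25 - 10, math.hypot(10, 10))   # about 0.29
```

Star-gazing would declare A real and B absent, yet the data cannot distinguish the two effects from each other.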

What Not To Do: Fishing

Understand: p < 0.05 means p < 1/20: run twenty comparisons on pure noise and you should expect one of them to come up "significant" by chance alone.
Don't: Test many different comparisons or models in search of "significant" results.
Instead: Specify your research questions and comparisons in advance and report only on those. Testing others is fine, but they should be replicated before being reported.
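A quick simulation makes the 1-in-20 arithmetic concrete. In this sketch (hypothetical data, Python standard library only) every comparison is between two groups drawn from the same distribution, so every "significant" result is a false positive; roughly 5% of them turn up anyway:

```python
import math
import random

random.seed(42)

def null_comparison_p(n=200):
    """p-value for a mean difference between two groups drawn from the SAME
    distribution, so the true effect is exactly zero (known variance 1)."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    z = abs((sum(a) - sum(b)) / n) / math.sqrt(2 / n)
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

hits = sum(null_comparison_p() < 0.05 for _ in range(1000))
print(f"{hits} of 1000 pure-noise comparisons came out 'significant' at p < 0.05")
```

A fishing expedition across a big crosstab performs exactly this experiment on your own data, and the survivors are then reported as findings.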

What Not To Do: Hacking

Understand: Many factors affect p-values, and the researcher can influence most of them. Doing so is cheating yourself.
Don't: Manipulate data (recoding variables, trimming convenient "outliers", etc.) or use inappropriate tests in search of "significance".
Instead: Follow your analysis plan and be honest with yourself. There is more to research than stars.

What P-Values Are Not

•  The p-value is not the probability that the null hypothesis is true, or the probability that the alternative hypothesis is false. It is not connected to either.
•  The p-value is not the probability that a finding is "merely a fluke."
•  The p-value is not the probability of falsely rejecting the null hypothesis.
•  The p-value is not the probability that replicating the experiment would yield the same conclusion.
•  The significance level, such as 0.05, is not determined by the p-value.
•  The p-value does not indicate the size or importance of the observed effect.

What is a P-Value?

The p-value is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, assuming that the null hypothesis is true.

In observational survey research, the null hypothesis is that the result is due to sampling error.
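That definition can be computed directly with a permutation test, which builds the "assuming the null is true" distribution by reshuffling group labels. A sketch with invented 1-10 satisfaction scores from two survey groups:

```python
import random

random.seed(1)

# Hypothetical satisfaction scores from two survey groups.
group_a = [7, 8, 6, 9, 8, 7, 9, 8]
group_b = [6, 5, 7, 6, 5, 6, 7, 5]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(group_a) - mean(group_b)   # 1.875

# Under the null, group labels are arbitrary: reshuffle them many times and
# count how often a difference at least as extreme as the observed one appears.
pooled = group_a + group_b
n_a = len(group_a)
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(f"Observed difference {observed:.3f}, permutation p = {p_value:.4f}")
```

The resulting p-value is literally the fraction of null-world reshuffles that look as extreme as the real data, which is all the definition above claims and nothing more.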

What Statistical Significance Is Not

•  It is not an indication of the importance of a result to your research.
•  It is not an indication of the practical value of a result.
•  It is not an indication of the theoretical value of a result.
•  It is not the final arbiter of quality research.


How to take your approach to the next level

Getting Ahead of the Curve

1.  Ditch the spreadsheet
2.  Focus on transparency

Focus on Transparency

1.  Invest in learning visualization
2.  Analysis should be reproducible
3.  Analysis should be version-controlled
4.  Analysis should be scalable

Dataviz

1.  Learn a new tool: R (ggplot), Tableau, or GoodData
2.  Learn how to effectively display data: Edward Tufte, Stephen Kosslyn

Make Your Work Reproducible

1.  Document everything
2.  Use standard methods
3.  No black boxes

Version Control

1.  Take advantage of modern tools for version control:
    •  Dropbox
    •  Git
    •  Google Docs

Scaling Up

1.  Automate your analysis workflow
2.  Design your research to be "plug-and-play"
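One way to read "plug-and-play": put every cleaning rule and every key number behind a single function, so the next wave of data runs through the identical pipeline instead of a fresh round of spreadsheet edits. A minimal sketch; the column names and the top-box rule are invented for illustration:

```python
import csv
import io

def run_analysis(raw_csv, question="q1"):
    """One repeatable step: clean responses, then compute the report's key numbers."""
    rows = csv.DictReader(io.StringIO(raw_csv))
    # Cleaning rule, documented in code: drop blank answers for this question.
    values = [int(r[question]) for r in rows if r[question].strip()]
    return {
        "n": len(values),
        "mean": sum(values) / len(values),
        "top_box_pct": sum(v >= 4 for v in values) / len(values),  # share rating 4+
    }

# A new wave of data is just another call through the same, versioned logic.
wave1 = "id,q1\n1,5\n2,4\n3,\n4,3\n5,5\n"
print(run_analysis(wave1))
```

Because the cleaning and the key comparisons live in code rather than in ad hoc edits, the same script is simultaneously the automation, the documentation, and the reproducibility guarantee from the previous slides.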

Review

1.  Work backwards (at least in your head)
2.  Do:
    •  Think about meaning
    •  Embrace uncertainty
    •  Show off your data
3.  Don't:
    •  Fish in your data
    •  Hack your p-values
    •  Star-gaze
4.  Focus on transparency


Questions?


Thanks!
