Data Analysis for Everyone

Net2Vic: Effective Data Analysis for Everyone (October 23, 2014)


DESCRIPTION

Data are ubiquitous in our lives and workplaces, and more can easily be collected. Analyzed correctly, those data can provide insights that lead to better decision making. Martin Monkman will present the key ideas for effective and meaningful data analysis, including sources of existing data, things to think about when collecting new data, analyzing results, and effective presentation and reporting of the data.


Martin Monkman

• Provincial Statistician & Director, BC Stats

• Has been paid to do data analysis in one form or another since the mid-1980s

• B.Sc. and M.A. in Geography (UVic)

• member of SABR

• bayesball.blogspot.ca


1. Start with a question

ALWAYS!

And don’t start with data!

• Five Ws


Some examples of questions

• What was the population of Victoria in 1996? And what will the population of Victoria be in 2029?

• What are the demographics of Victoria?

• What do Victoria residents think about infrastructure investment?


2. Get some data

Remember: after your research question has been asked!

Two sources:

• Third party data

• Collect your own


Sources of third party data

Open Data

• Social data: Statistics Canada

• The Census of Canada

• National Household Survey

• www.statcan.gc.ca

• DataBC

• www.data.gov.bc.ca


Collect your own data

Administrative sources

• Registration information

• Transactions

Original data collection

• Survey


Surveys

From the Twenty Questions:

• Who is your population?

• How are you going to reach them?

• What do you already know about them?


3. Data Analysis

• Differences

• Distributions

• Magnitude

• Patterns

• Proportions

• Relationships

• Trends


Data Analysis: How-to

• MOOCs

• Google “Making Sense of Data”

• Coursera

• https://www.coursera.org/course/introstats

• https://www.coursera.org/course/dataanalysis

• https://www.class-central.com/mooc/388/coursera-computing-for-data-analysis


4. Data Visualization

“Graphics are instruments for reasoning about quantitative information.” (Edward R. Tufte)

Purposes

• Exploratory Data Analysis

• Narrative


“The greatest value of a picture is when it forces us to notice what we never expected to see.” – John Tukey


Anscombe’s Quartet

Statistical measures of each of the four data sets

Mean of x = 9 (exact)

Variance of x = 11 (exact)

Mean of y = 7.50

Variance of y = 4.122 or 4.127

Correlation between x and y = 0.816

Regression equation:

y = 3.00 + 0.500x
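
The shared summary measures can be checked directly. What follows is a minimal sketch in Python, not part of the original slides, that recomputes them from the quartet's published values (Anscombe, 1973). The function names are illustrative, and the variance and covariance use the sample (n - 1) denominator, an assumption that matches a variance of x of exactly 11.

```python
from math import sqrt

# Anscombe's quartet. Sets I, II and III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    # Sample variance with the n - 1 denominator.
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def sample_covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def summarize(xs, ys):
    # Least-squares line y = intercept + slope * x, plus the basic moments.
    var_x, var_y = sample_variance(xs), sample_variance(ys)
    cov = sample_covariance(xs, ys)
    slope = cov / var_x
    return {
        "mean_x": mean(xs),
        "var_x": var_x,
        "mean_y": mean(ys),
        "var_y": var_y,
        "corr": cov / sqrt(var_x * var_y),
        "slope": slope,
        "intercept": mean(ys) - slope * mean(xs),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, s in summaries.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

The four rows of output agree only approximately beyond the first couple of decimal places, and none of them says anything about the shape of the data, which is the point of plotting each set.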


Population pyramid


http://cran.r-project.org/


Capital Regional District, population by municipality, 2013

Data source: Statistics Canada & BC Stats


Capital Regional District, population by municipality and region, 2013

Data source: Statistics Canada & BC Stats


Capital Regional District population, 1996-2013

Data source: Statistics Canada & BC Stats


Year-over-year population change, Capital Regional District

Data source: Statistics Canada & BC Stats
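
Year-over-year change is simple arithmetic: each year's value minus the previous year's, optionally expressed as a percentage of the previous year. Here is a minimal sketch in Python, not part of the original slides. The population figures are made-up placeholders, NOT actual Capital Regional District numbers, and the function name is illustrative.

```python
# Hypothetical annual population estimates, for illustration only.
# These are NOT actual Capital Regional District figures.
population_by_year = {2010: 1000, 2011: 1010, 2012: 1025, 2013: 1030}


def year_over_year_change(series):
    """Return {year: (absolute change, percent change)} versus the prior year."""
    years = sorted(series)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        diff = series[curr] - series[prev]
        changes[curr] = (diff, 100 * diff / series[prev])
    return changes


changes = year_over_year_change(population_by_year)
for year, (diff, pct) in changes.items():
    print(f"{year}: {diff:+d} ({pct:+.2f}%)")
```

The first year has no entry because there is no earlier year to compare it with.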


Census tracts

Data source: Statistics Canada & BC Stats


Victoria CMA – median after-tax income (2005), by Census Tract

Data source: Statistics Canada & BC Stats


Data source: Statistics Canada


Mapping

Source: Harvard Dialect Survey / Joshua Katz


How can I improve my data visualizations?

• Work with data

• Experiment

• Get feedback from others

• Look for good examples

• Look for bad examples


Five Degrees of Obfuscation

[Chart: Debris, Garbage, Rubbish, Trash, Waste]

Five Columns of Clarity

[Column chart: Units (axis 0 to 25) for Trash, Debris, Rubbish, Waste, Garbage]


Foreshortened circles


An illusion of distance and volume


No 3D. Ever.


[email protected]

@monkmanmh

bayesball.blogspot.ca