netsquared-victoria
DESCRIPTION
Data are ubiquitous in our lives and workplaces, and more can easily be collected. If analyzed correctly, those data can provide insights that lead to better decision making. Martin Monkman will present the key ideas for effective and meaningful data analysis, including sources of existing data, things to think about when collecting new data, analyzing results, and effective presentation and reporting of the data.
Data Analysis for Everyone
Martin Monkman
• Provincial Statistician & Director, BC Stats
• been getting paid to do data analysis in one form or another since the mid-1980s
• B.Sc. and M.A. in Geography (UVic)
• member of SABR
• bayesball.blogspot.ca
1. Start with a question
ALWAYS!
And don’t start with data!
• Five Ws
Some examples of questions
• What was the population of Victoria in 1996? And what will the population of Victoria be in 2029?
• What are the demographics of Victoria?
• What do Victoria residents think about infrastructure investment?
2. Get some data
Remember: after your research question has been asked!
Two sources:
• Third party data
• Collect your own
Sources of third party data
Open Data
• Social data: Statistics Canada
• The Census of Canada
• National Household Survey
• www.statcan.gc.ca
• DataBC
• www.data.gov.bc.ca
Collect your own data
Administrative sources
• Registration information
• Transactions
Original data collection
• Survey
Surveys
From the Twenty Questions:
• Who is your population?
• How are you going to reach them?
• What do you already know about them?
• Differences
• Distributions
• Magnitude
• Patterns
• Proportions
• Relationships
• Trends
3. Data Analysis
• MOOCs
• Google “Making Sense of Data”
• Coursera
• https://www.coursera.org/course/introstats
• https://www.coursera.org/course/dataanalysis
• https://www.class-central.com/mooc/388/coursera-computing-for-data-analysis
Data Analysis: How-to
“Graphics are instruments for reasoning about quantitative information.” (Edward R. Tufte)
Purposes
• Exploratory Data Analysis
• Narrative
4. Data Visualization
“The greatest value of a picture is when it forces us to notice what we never expected to see.” – John Tukey
Anscombe’s Quartet
STATISTICAL MEASURES OF EACH OF THE FOUR DATA SETS
Mean of x = 9 (exact)
Variance of x = 11 (exact)
Mean of y = 7.50
Variance of y = 4.122 or 4.127
Correlation between x and y = 0.816
Regression equation: y = 3.00 + 0.500x
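The summary measures above can be checked directly. What follows is a minimal sketch in plain Python, assuming the standard published values of Anscombe's four data sets; the names `QUARTET`, `summarize`, and `SUMMARIES` are illustrative and do not come from the slides. It computes the sample mean, sample variance, correlation, and least-squares line for each set.

```python
from math import sqrt

# Anscombe's four data sets (standard published values).
# Sets I-III share the same x values; set IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics for one (x, y) data set."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),  # sample variance
        "r": sxy / sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, s in SUMMARIES.items():
        print(name, {key: round(value, 3) for key, value in s.items()})
```

The four sets agree on every one of these measures to two or three significant figures, yet they look completely different when plotted, which is the argument for graphing the data before trusting the summary.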
Population pyramid
http://cran.r-project.org/
Capital Regional District, population by municipality, 2013
Data source: Statistics Canada & BC Stats
Capital Regional District, population by municipality and region, 2013
Data source: Statistics Canada & BC Stats
Capital Regional District population, 1996-2013
Data source: Statistics Canada & BC Stats
Year-over-year population change, Capital Regional District
Data source: Statistics Canada & BC Stats
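A year-over-year change series like the one charted here can be derived from annual totals in a few lines. This is a minimal sketch, assuming the input holds one population figure per consecutive year; the function name `year_over_year` and the figures in `example` are made up for illustration and are not Capital Regional District data.

```python
def year_over_year(series):
    """Return (year, absolute change, percent change) for each year after the first.

    `series` maps year -> population and is assumed to hold consecutive years.
    """
    years = sorted(series)
    changes = []
    for prev, cur in zip(years, years[1:]):
        change = series[cur] - series[prev]
        changes.append((cur, change, 100.0 * change / series[prev]))
    return changes


# Hypothetical figures for illustration only; not actual CRD estimates.
example = {2011: 1000, 2012: 1010, 2013: 1030}
CHANGES = year_over_year(example)

if __name__ == "__main__":
    for year, change, pct in CHANGES:
        print(f"{year}: {change:+d} ({pct:+.2f}%)")
```

Plotting the change rather than the level makes small swings in growth visible that the total population line tends to hide.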
Census tracts
Data source: Statistics Canada & BC Stats
Victoria CMA – median after-tax income (2005), by Census Tract
Data source: Statistics Canada & BC Stats
Data source: Statistics Canada
Source: Harvard Dialect Survey / Joshua Katz
Mapping
How can I improve my data visualizations?
• Work with data
• Experiment
• Get feedback from others
• Look for good examples
• Look for bad examples
Five Degrees of Obfuscation
[Chart; categories: Debris, Garbage, Rubbish, Trash, Waste]
Five Columns of Clarity
[Chart; categories: Trash, Debris, Rubbish, Waste, Garbage; axis: Units, 0 to 25]
Foreshortened circles
An illusion of distance and volume
No 3D. Ever.
@monkmanmh
bayesball.blogspot.ca