Edwin de Jonge, December 3, 2013 Big Data Visualization “Turning Statistics into Knowledge”, Aguascalientes With thanks to Piet Daas, Martijn Tennekes and Alex Priem

Edwin de Jonge, December 3, 2013

Big Data Visualization

“Turning Statistics into Knowledge”, Aguascalientes

With thanks to Piet Daas, Martijn Tennekes and Alex Priem

Overview

• Big Data
  • Research ‘theme’ at Statistics Netherlands
  • Data driven approach
• Visualization as a tool
  • Why?
  • Examples in our office
    • Census
    • Social Security
    • Social Media
    • Not shown: Traffic loops, Mobile phone data

Why Visualization?

October 1st 2013, Statistics Netherlands

Effective Display!

(see Tor Nørretranders, “Bandwidth of our senses”)

Anscombe’s quartet…

DS1           DS2           DS3           DS4
x    y        x    y        x    y        x    y
10   8.04     10   9.14     10   7.46     8    6.58
8    6.95     8    8.14     8    6.77     8    5.76
13   7.58     13   8.74     13   12.74    8    7.71
9    8.81     9    8.77     9    7.11     8    8.84
11   8.33     11   9.26     11   7.81     8    8.47
14   9.96     14   8.1      14   8.84     8    7.04
6    7.24     6    6.13     6    6.08     8    5.25
4    4.26     4    3.1      4    5.39     19   12.5
12   10.84    12   9.13     12   8.15     8    5.56
7    4.82     7    7.26     7    6.42     8    7.91
5    5.68     5    4.74     5    5.73     8    6.89

Anscombe’s quartet

Property                                    Value
Mean of x1, x2, x3, x4                      All equal: 9
Variance of x1, x2, x3, x4                  All equal: 11
Mean of y1, y2, y3, y4                      All equal: 7.50
Variance of y1, y2, y3, y4                  All equal: 4.1
Correlation for ds1, ds2, ds3, ds4          All equal: 0.816
Linear regression for ds1, ds2, ds3, ds4    All equal: y = 3.00 + 0.500x

Looks the same, right?
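The summary statistics in the table can be checked with a short script. This is a minimal sketch using only the Python standard library; the helper name `summarize` is ours and not part of the original slides. It computes the sample mean, sample variance, Pearson correlation and least-squares line for each of the four data sets.

```python
from math import sqrt
from statistics import mean, variance

# The four data sets of Anscombe's quartet (DS1 to DS3 share the same x values).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "DS1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "DS2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "DS3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "DS4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics listed in the property table."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "mean_x": mx, "var_x": variance(xs),
        "mean_y": my, "var_y": variance(ys),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope, "intercept": my - slope * mx,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, s in summaries.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

The four rows agree to the precision shown in the table, which is why the numbers alone cannot tell the data sets apart.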

Let’s plot!

Visualization

For Big Data:

Use appropriate:

- Summarization

- Granularity

- Noise filtering

Research: What works for big data?

Scatter plot with 100 data points

Scatter plot with 100 000 data points

Example 1: Census

Example Virtual Census

‐ Every 10 years a Census needs to be conducted
‐ No longer with surveys in the Netherlands
  • Last traditional census was in 1971
‐ Now by (re-)using existing information
  • Linking administrative sources and available sample survey data at a large scale
  • Check result
  • How?
  • With a visualisation method: the Tableplot

Making the Tableplot

1. Load file (17 million records)
2. Sort records according to key variable (17 million records)
   • Age in this example
3. Combine records into 100 groups (170,000 records each)
   • Numeric variables
     • Calculate average (avg. age)
   • Categorical variables
     • Ratio between categories present (male vs. female)
4. Plot figure of a select number of variables (up to 12)
   • Colours used are important
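Steps 2 and 3 can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used at Statistics Netherlands: the function name `tableplot_bins`, the ascending sort order and the ten sample records are all assumptions made for the example. Plotting (step 4) is left out.

```python
from collections import Counter
from statistics import mean


def tableplot_bins(records, sort_key, n_bins=100):
    """Sort records by sort_key and summarize them in n_bins row groups.

    Numeric columns are averaged per group; any other column is reduced
    to the share of each category present in the group.
    """
    ordered = sorted(records, key=lambda r: r[sort_key])
    n = len(ordered)
    bins = []
    for b in range(n_bins):
        # Integer arithmetic spreads any remainder evenly over the groups.
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        summary = {}
        for col in chunk[0]:
            values = [r[col] for r in chunk]
            if all(isinstance(v, (int, float)) for v in values):
                summary[col] = mean(values)
            else:
                counts = Counter(values)
                summary[col] = {k: c / len(values) for k, c in counts.items()}
        bins.append(summary)
    return bins


# Hypothetical toy data standing in for the 17 million census records.
ages = [34, 5, 71, 18, 52, 9, 66, 27, 43, 80]
sexes = ["m", "f", "f", "m", "m", "f", "f", "m", "f", "m"]
records = [{"age": a, "sex": s} for a, s in zip(ages, sexes)]

bins = tableplot_bins(records, "age", n_bins=2)
for row in bins:
    print(row)
```

Each output row corresponds to one horizontal band of the plot: a bar length for every numeric variable and a stacked bar of category shares for every categorical one.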

Figure: tableplot of the census test file

Tableplot: Monitor data quality

– All data in the office passes through these stages:
  ‐ Raw data (collected)
  ‐ Preprocessed (technically correct)
  ‐ Edited (completed data)
  ‐ Final (removal of outliers etc.)

Processing of data

Figure panels: Raw (unedited) data, Edited data, Final data

Example 2: Social Security Register

Social Security Register

– Contains all financial data on jobs, benefits and pensions in the Netherlands
  ‐ Collected by the Dutch Tax office
  ‐ A total of 20 million records each month
  ‐ How to obtain insight into so much data?
    • With a visualisation method: a heat map
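The idea behind a heat map is to count records per cell of a grid rather than draw every record. The sketch below shows that counting step under stated assumptions: the function name `bin_counts`, the cell sizes and the seven (age, income) pairs are made up for illustration and are not taken from the register.

```python
from collections import Counter


def bin_counts(pairs, x_width, y_width):
    """Count (x, y) pairs per rectangular cell of size x_width by y_width."""
    counts = Counter()
    for x, y in pairs:
        cell = (int(x // x_width), int(y // y_width))
        counts[cell] += 1
    return counts


# Hypothetical (age, monthly income in euro) pairs.
pairs = [(23, 1800), (24, 2100), (27, 1950), (41, 3900),
         (44, 3600), (45, 3700), (67, 1200)]

# Cells of 10 years by 1000 euro; each count would become one colour.
counts = bin_counts(pairs, x_width=10, y_width=1000)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

The number of cells is fixed by the grid, not by the number of records, so the same picture can be drawn for seven records or 20 million.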

Heat map: Age vs. ‘Income’

Figure axes: Age, Income (euro)

Example 3: Social media

Daily Sentiment in Dutch Social Media

Social media: daily sentiment in Dutch messages

Granularity: From day to week

Social media: daily & weekly sentiment in Dutch messages

Granularity: From day to month

Social media: daily, weekly & monthly sentiment in Dutch messages
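Changing granularity amounts to averaging the daily values within a coarser period. The following is a minimal sketch, assuming a plain mean per ISO week and per calendar month; the slides do not say how the weekly and monthly series were computed. The helper names and the daily values are invented for the example.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def aggregate(daily, key):
    """Average daily values within the period returned by key(day)."""
    groups = defaultdict(list)
    for day, value in daily.items():
        groups[key(day)].append(value)
    return {period: mean(values) for period, values in sorted(groups.items())}


def iso_week(day):
    year, week, _ = day.isocalendar()
    return (year, week)


def month(day):
    return (day.year, day.month)


# Made-up daily series for January and February 2013 (not real sentiment data).
start = date(2013, 1, 1)
daily = {start + timedelta(days=i): float(i % 7) for i in range(59)}

weekly = aggregate(daily, iso_week)
monthly = aggregate(daily, month)
print(len(daily), "days ->", len(weekly), "weeks ->", len(monthly), "months")
```

Coarser periods smooth out day-to-day noise at the cost of fewer points, which is the trade-off the day, week and month series illustrate.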

Enter: Consumer confidence!

Social media: monthly sentiment in Dutch messages & Consumer confidence

Corr: 0.88

Conclusions

Big data is a very interesting data source for official statistics

Visualisation is a great way of getting/creating insight

Not only for data exploration, but also for finding errors

The future of statistics?