35
MIS2502: Data Analytics Principles of Data Visualization David Schuff [email protected] http://community.mis.temple.edu/dschuff

MIS2502: Data Analytics Principles of Data Visualization

  • Upload
    idra

  • View
    59

  • Download
    0

Embed Size (px)

DESCRIPTION

MIS2502: Data Analytics Principles of Data Visualization. David Schuff [email protected] http://community.mis.temple.edu/dschuff. Why Visualizations? . Anscombe’s Quartet. Identical when using summary statistics …. . Why Visualizations? . Anscombe’s Quartet. A. B. - PowerPoint PPT Presentation

Citation preview

Page 1: MIS2502: Data Analytics Principles of Data Visualization

MIS2502:Data AnalyticsPrinciples of Data Visualization

David [email protected]

http://community.mis.temple.edu/dschuff

Page 2: MIS2502: Data Analytics Principles of Data Visualization

Why Visualizations? Anscombe’s Quartet

Identical when using summary statistics ….

Page 3: MIS2502: Data Analytics Principles of Data Visualization

Why Visualizations?

Anscombe’s Quartet

A

D

B

C

But when graphed….

Page 4: MIS2502: Data Analytics Principles of Data Visualization

What makes a good chart?

Wikipedia: Patriotic War of 1812http://en.wikipedia.org/wiki/File:Patriotic_War_of_1812_ENG_map1.svg

Page 5: MIS2502: Data Analytics Principles of Data Visualization

What makes a good chart?

Minard’s map of Napoleon’s campaign into Russia, 1869Reprinted in Tufte (2009), p. 41

Page 6: MIS2502: Data Analytics Principles of Data Visualization

What can you learn from this map?

http://www.popvssoda.com/countystats/total-county.html

Page 7: MIS2502: Data Analytics Principles of Data Visualization

What makes a good chart?

Zhang et al. (2010), “A case study of micro-blogging in the enterprise: use, value, and related issues,” Proceedings of the 28th International Conference on Human Factors in Computing Systems.

This is from an academic

conference paper.

What are the problems with

this chart?

Page 8: MIS2502: Data Analytics Principles of Data Visualization

Some basic principles (adapted from Tufte 2009)

•The chart should tell a story1•The chart should have graphical

integrity2

•The chart should minimize graphical complexity

3Tufte’s fundamental principle:Above all else show the data

Page 9: MIS2502: Data Analytics Principles of Data Visualization

Principle 1: The chart should tell a story

Graphics should be clear on their own

The depictions should enable meaningful comparison

The chart should yield insight beyond the text

“If the statistics are boring, then you’ve got the wrong numbers.” (Tufte 2009)

Page 10: MIS2502: Data Analytics Principles of Data Visualization

Do these tell a story?http://www.evl.uic.edu/aej/491/week03.html

http://flowingdata.com/2009/11/26/fox-news-makes-the-best-pie-chart-ever/

Page 11: MIS2502: Data Analytics Principles of Data Visualization

Daylight Savings Time Explained

http://visual.ly/daylight-saving-time-explained

Page 12: MIS2502: Data Analytics Principles of Data Visualization

Telling a Story

http:

//ec

onom

ix.b

logs

.nyti

mes

.com

/200

9/05

/05/

obes

ity-

and-

the-

fast

ness

-of-f

ood/

http:

//flo

win

gdat

a.co

m/2

011/

01/1

9/st

ates

-with

-the

-mos

t-and

-leas

t-fire

arm

s-m

urde

rs/

Page 13: MIS2502: Data Analytics Principles of Data Visualization

Principle 2: The chart should have graphical integrity

• Basically, it shouldn’t “lie” (mislead the reader)

• Tufte’s “Lie Factor”:

Should be ~ 1

< 1 = understated effect

> 1 = exaggerated effect

Page 14: MIS2502: Data Analytics Principles of Data Visualization

Examples of the “lie factor”

𝐿𝐹=5.3 /0.627.5 /18=

8.831.53=5.77

𝐿𝐹=4280% ( h𝑐 𝑎𝑛𝑔𝑒𝑖𝑛𝑣𝑜𝑙𝑢𝑚𝑒)454% ( h𝑐 𝑎𝑛𝑔𝑒 𝑖𝑛𝑝𝑟𝑖𝑐𝑒)

=9.4Reprinted from Tufte (2009), p. 57 & p. 62

Page 15: MIS2502: Data Analytics Principles of Data Visualization

How is this deceptive?

http://www.guardian.co.uk/environment/interactive/2009/mar/09/food-carbon-emissions

Page 16: MIS2502: Data Analytics Principles of Data Visualization

How about this?

http://20bits.com/articles/politics-and-tuftes-lie-factor/

The original graphic from Real Clear Politics, 2008.

(Look at the y-axis)The adjusted graphic.

Page 17: MIS2502: Data Analytics Principles of Data Visualization

Or this?

http://mediamatters.org/research/2012/10/01/a-history-of-dishonest-fox-charts/190225

The original graphic from Fox Business, 2012. The adjusted graphic.

Page 18: MIS2502: Data Analytics Principles of Data Visualization

Other tips to avoid “lying”

Adjust for inflation

Make sure the context is presented

80

90

100

110

120

130

140

2003 2004 2005 2006 2007 2008 2009 2010Year

Hypothetical Industries, Inc.

Revenue

Adjusted Revenue

350

360

370

380

390

400

410

2009 2010

Theft

s per

100

000

citiz

ens

Hypothetical City Crime

25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010Th

efts p

er 1

0000

0 ci

tizen

s

Hypothetical City Crime

vs.

Page 19: MIS2502: Data Analytics Principles of Data Visualization

Principle 3: The chart should minimize graphical complexity

Key conceptsSometimes

a table is better

Data-ink Chartjunk

Generally, the simpler the better…

Page 20: MIS2502: Data Analytics Principles of Data Visualization

When a table is better than a chartFor a few data points, a table can do just as well…

$0.00

$50,000.00

$100,000.00

$150,000.00

$200,000.00

$250,000.00

Total Sales by SalespersonSalesperson Total Sales

Peacock $225,763.68

Leverling $201,196.27

Davolio $182,500.09

Fuller $162,503.78

Callahan $123,032.67

King $116,962.99

Dodsworth $75,048.04

Suyama $72,527.63

Buchanan $68,792.25

The table carries more information in less space and is more precise.

Page 21: MIS2502: Data Analytics Principles of Data Visualization

The Ultimate Table: The Box Score

• Large amount of information in a very small space

• So why does this work?– Depends on the

reader’s knowledge of the data

Page 22: MIS2502: Data Analytics Principles of Data Visualization

The Business Box Score?

• Applying the same concept to our salesforce example.

• How does this help? How could it hurt?

Sales Performance – March 2011

Salesperson TS WD BD NC DOR

Peacock 225 3 40 20 28

Leverling 201 2 45 18 27

Davolio 182 5 38 22 28

Fuller 162 2 22 16 20

Callahan 123 1 15 14 15

King 116 0.5 20 12 18

Dodsworth 75 0.3 12 10 20

Suyama 72 0 8 10 8

Buchanan 68 0 8 8 12

Key:TS – total salesWD – worst dayBD – best dayNC – number of customersDOR – days on the road

Page 23: MIS2502: Data Analytics Principles of Data Visualization

Data Ink• The amount of “ink” devoted to data in a chart• Tufte’s Data-Ink ratio:

Should be ~ 1

< 1 = more non-data related ink in graphic

= 1 implies all ink devoted to data

Tufte’s principle:Erase ink whenever possible

Page 24: MIS2502: Data Analytics Principles of Data Visualization

Being conscious of data ink

25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010

Theft

s per

100

000

citiz

ens

Hypothetical City Crime

25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010

Theft

s per

100

000

citiz

ens

Hypothetical City Crime

200

270

320 330

370350

400370

2003 2004 2005 2006 2007 2008 2009 2010

Hypothetical City Crime

Lower data-ink ratio(worse)

Higher data-ink ratio(better)

Page 25: MIS2502: Data Analytics Principles of Data Visualization

What makes a good chart?

020000400006000080000

100000120000140000160000

2011 Total Sales

Order Date

Sum of Extended Price

020000400006000080000

100000120000140000160000

2011 Total Sales

Order Date

Sum of Extended Price

Sometimes it’s really a matter of

preference.

These both minimize data ink.

Why isn’t a table better here?

Page 26: MIS2502: Data Analytics Principles of Data Visualization

3-D Charts

$0.00

$50,000.00

$100,000.00

$150,000.00

$200,000.00

$250,000.00

Total Sales by Salesperson

Evaluate this from a data-ink perspective.How does it affect the clarity of the chart?

Page 27: MIS2502: Data Analytics Principles of Data Visualization

Chartjunk: Data Ink “gone wild”

Unnecessary visual clutter that doesn’t provide additional insight

Distraction from the story the chart is supposed to convey

When the data-ink ratio is low, chartjunk is likely to be high

Page 28: MIS2502: Data Analytics Principles of Data Visualization

25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010

Theft

s per

100

000

citiz

ens

Hypothetical City Crime

Example: Moiré effects (Tufte 2009)

Creates illusion of movement

Stands out, in a bad way

$0.00

$50,000.00

$100,000.00

$150,000.00

$200,000.00

$250,000.00

Total Sales by Salesperson

Page 29: MIS2502: Data Analytics Principles of Data Visualization

Example: The Grid

25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010

Theft

s per

100

000

citiz

ens

Hypothetical City Crime25

75

125

175

225

275

325

375

425

2003 2004 2005 2006 2007 2008 2009 2010

Theft

s per

100

000

citiz

ens

Hypothetical City Crime

Why are these examples of chartjunk?

What could you do to remedy

it?

Page 30: MIS2502: Data Analytics Principles of Data Visualization

Data Ink Working Against Us

http:

//w

ww

.eco

nom

ist.c

om/n

ode/

2153

7922

http:

//w

ww

.eco

nom

ist.c

om/n

ode/

2153

7909

Page 31: MIS2502: Data Analytics Principles of Data Visualization

Data Ink Working For Us

Evaluate this chart in terms

of Data Ink.

Imagine this as a bar chart. As a

table!!

Page 32: MIS2502: Data Analytics Principles of Data Visualization

The Infographic

http:

//vi

sual

.ly/w

hat-i

nfog

raph

ic-2

Check out more at www.coolinfographics.com

Page 33: MIS2502: Data Analytics Principles of Data Visualization

Three excerpts from an infographic comparing Hurricane Sandy to Katrina

http://www.huffingtonpost.com/2012/11/04/hurricane-sandy-vs-katrina-infographic_n_2072432.html

Page 34: MIS2502: Data Analytics Principles of Data Visualization
Page 35: MIS2502: Data Analytics Principles of Data Visualization

Another example

http:

//w

ww

.bizt

echm

agaz

ine.

com

/arti

cle/

2012

/08/

how

-mak

e-su

re-y

our-

byod

-pla

n-al

l-goo

d-in

fogr

aphi

c