How Humans See Data - Amazon Cut

How Humans

See Data

John Rauser

@jrauser

January 2017

visualization

is

communication

how to make better visualizations

help humans solve analytical

problems quickly and accurately

with visualization

Part I: Why visualize data at all?

x      y      x      y
1.972  1.236  0.111  0.542
1.112  1.994  0.902  0.005
0.000  1.009  0.598  0.085
0.665  1.942  1.613  1.790
0.235  0.356  1.298  1.955
0.247  1.658  0.651  1.937
1.275  1.961  1.949  1.316
0.702  0.045  0.099  0.567
1.760  0.350  0.862  0.010
1.691  0.277  0.027  0.768
1.628  1.778  0.706  1.956
1.957  1.290  1.042  1.999

pre-attentive processing

A graph is an encoding

of the data.

n x y n x y

1 1.972 1.236 13 0.111 0.542

2 1.112 1.994 14 0.902 0.005

3 0.000 1.009 15 0.598 0.085

4 0.665 1.942 16 1.613 1.790

5 0.235 0.356 17 1.298 1.955

6 0.247 1.658 18 0.651 1.937

7 1.275 1.961 19 1.949 1.316

8 0.702 0.045 20 0.099 0.567

9 1.760 0.350 21 0.862 0.010

10 1.691 0.277 22 0.027 0.768

11 1.628 1.778 23 0.706 1.956

12 1.957 1.290 24 1.042 1.999

Good visualizations optimize

for the human visual system.

How does the human

visual system work?

How does the human visual

system decode a graph?

Cleveland’s three visual

operations of pattern perception:

1. Detection

2. Assembly

3. Estimation

Part II: estimation

Three levels of estimation

a. discrimination X=Y X!=Y

b. ranking X>Y X<Y

c. ratioing X / Y = ?

At the heart of quantitative

reasoning is a single question:

Compared to what?

- Tufte, Envisioning Information

the most

important

thing

The most important measurement should exploit

the highest ranked encoding possible.

• Position along a common scale

• Position on identical but nonaligned scales

• Length

• Angle or Slope

• Area

• Volume or Density or Color saturation

• Color hue

“The first rule of color:

do not talk about color!”

- Tamara Munzner

luminance

saturation

hue

Observation: Alphabetical is

almost never the correct ordering

of a categorical variable.
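
One way to act on this observation is to order the categories by the value being shown before plotting. The following is a minimal sketch in Python (not the R/ggplot2 code linked from the talk); the category names, the numbers, and the helper `order_by_value` are invented for illustration.

```python
# Hypothetical values per category; the numbers are invented for illustration.
values = {"Austria": 42.0, "Belgium": 17.5, "Chile": 88.1, "Denmark": 63.4}


def order_by_value(data, descending=True):
    """Return (category, value) pairs sorted by value instead of by name."""
    return sorted(data.items(), key=lambda item: item[1], reverse=descending)


ordered = order_by_value(values)
for name, value in ordered:
    # Print in plotting order: largest value first.
    print(f"{name:<10} {value:6.1f}")
```

Sorting by value turns the category axis into a ranking, so the reader gets the order for free instead of having to reconstruct it.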

Observation: Stacked

anything is nearly always

a mistake.

Stacking makes the reader

decode lengths, not position

on a common scale.
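
To see why, it can help to compute where each stacked segment starts. This is a small Python sketch with made-up data; `segment_baselines` is a hypothetical helper, not something from the talk's code.

```python
# Hypothetical stacked-bar data: one list of segment values per bar.
bars = {"A": [3.0, 2.0, 4.0], "B": [5.0, 1.0, 2.0]}


def segment_baselines(segments):
    """Return the bottom edge of each segment when the values are stacked."""
    baselines = []
    running_total = 0.0
    for value in segments:
        baselines.append(running_total)
        running_total += value
    return baselines


baselines = {name: segment_baselines(segs) for name, segs in bars.items()}
for name, starts in baselines.items():
    print(name, starts)
```

Only the first segment of every bar starts at the shared baseline of 0. The later segments start at different heights in each bar, so the reader can only compare them by length.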

Observation: Pie charts are

ALWAYS a mistake.

Piecharts are the information visualization

equivalent of a roofing hammer to the

frontal lobe. They have no place in the world

of grownups, and occupy the same semiotic

space as short pants, a runny nose, and

chocolate smeared on one’s face. They are

as professional as a pair of assless chaps.

http://blog.codahale.com/2006/04/29/google-analytics-the-goggles-they-do-nothing/

Tables are preferable to graphics for many small

data sets. A table is nearly always better than a

dumb pie chart; the only thing worse than a pie

chart is several of them, for then the viewer is

asked to compare quantities located in spatial

disarray both within and between pies… Given

their low data-density and failure to order

numbers along a visual dimension, pie charts

should never be used.

-Edward Tufte, The Visual Display of Quantitative Information

Who do you think did a better job in tonight’s debate?

                    Clinton   Trump
Among Democrats     99%       1%
Among Republicans   53%       47%

[Figure: chart with one label per country, Afghanistan through Cameroon in alphabetical order]

All good pie charts are jokes.

Observation: Comparison is trivial

on a common scale.

the dashboard metaphor is

fundamentally flawed

Observation: Scatterplots

show relationships directly.

Observation: Growth charts

usually aren’t.

If growth (slope) is

important, plot it directly.
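
Plotting growth directly means deriving the growth series first and charting that instead of the raw levels. A minimal Python sketch, assuming period-over-period growth is the quantity of interest; the levels below are invented for illustration.

```python
def growth_rates(series):
    """Period-over-period growth, as a fraction of the previous value."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


# Hypothetical levels: the line keeps rising in every period.
levels = [100.0, 120.0, 138.0, 151.8]
rates = growth_rates(levels)

for period, rate in enumerate(rates, start=1):
    print(f"period {period}: {rate:.1%}")
```

In this example the levels rise in every period while the growth rate falls in every period, which is easy to miss when only the levels are plotted.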

Part III: assembly

Gestalt Psychology

reification

emergence

Prägnanz

Law of Closure

Law of Continuity

Observation: Good plots

leverage the law of continuity

to assist with assembly.

Law of Similarity

Law of Proximity

Observation: dodged bar

charts are a bad idea

Part IV: detection

Excel’s defaults are pretty bad

[Figure: chart with y-axis ticks from 0 to 200,000 in steps of 20,000 and x-axis labels 1 to 6]

Observation: Detection isn’t

as trivial as it seems.

“Above all else, show the data.”

-Tufte

Part V: other useful results

Weber’s law: The “Just Noticeable

Difference” is proportional to the

size of the initial stimulus.
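
A small numeric sketch of the law in Python: the same absolute change is a very different relative change depending on the starting size. The pairs (10, 20) and (100, 110) are the ones shown on the slides; the threshold `WEBER_FRACTION` is an assumed value for illustration, not an empirical constant from the talk.

```python
# Assumed Weber fraction, for illustration only. Real values depend on the
# kind of stimulus and are not given in the slides.
WEBER_FRACTION = 0.2


def relative_difference(initial, changed):
    """Change expressed as a fraction of the initial stimulus."""
    return abs(changed - initial) / initial


def is_noticeable(initial, changed, k=WEBER_FRACTION):
    """True when the relative change reaches the assumed threshold k."""
    return relative_difference(initial, changed) >= k


for initial, changed in [(10, 20), (100, 110)]:
    print(initial, changed,
          relative_difference(initial, changed),
          is_noticeable(initial, changed))
```

Both pairs differ by 10 units, but the change is 100% of the stimulus in the first pair and only 10% in the second.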

10 20

100 110

12 units

Observation: Weber’s Law is

why gridlines are useful

“Erase non-data ink.”

-Tufte

“Erase non-data ink,

within reason.”

-Tufte

“Erase non-data ink that interferes

with detection or doesn’t assist

assembly and estimation.”

-Rauser

You are bad at estimating

the difference between lines.

Observation: If a difference is

important, plot it directly.
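
In practice this means computing the difference as its own series and plotting that, instead of asking the reader to judge the gap between two lines. A minimal Python sketch, assuming both series are sampled at the same points; the numbers and the helper name are invented for illustration.

```python
def pointwise_difference(first, second):
    """Return second - first at each position; both series must align."""
    if len(first) != len(second):
        raise ValueError("series must have the same length")
    return [b - a for a, b in zip(first, second)]


# Hypothetical series measured at the same five points.
series_a = [10.0, 14.0, 19.0, 25.0, 32.0]
series_b = [12.0, 17.0, 23.0, 28.0, 34.0]

gap = pointwise_difference(series_a, series_b)
print(gap)
```

Here the gap first widens and then narrows, a pattern that is hard to read off two rising lines but obvious once the difference is plotted on its own scale.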

You are best at detecting variation

in slope near 45 degrees.

banking to 45

Observation: Banking to 45

best shows variation in slope
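
One common heuristic for banking is the median-absolute-slope rule: choose the plot's height-to-width ratio so that the median absolute slope of the line segments is 1 on the page, which is 45 degrees. The slides do not say which banking method is used, so the Python sketch below is an assumption; `bank_to_45` is a hypothetical helper and the data are invented.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Height/width ratio that puts the median absolute segment slope at 45 degrees.

    Assumes the data fill the plotting area, so one data unit in x spans
    width / x_range on the page and one data unit in y spans height / y_range.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # Absolute slope of each segment in data units; vertical segments are skipped.
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
        if x1 != x0
    ]
    if not slopes or median(slopes) == 0 or x_range == 0:
        raise ValueError("need at least one sloped, non-vertical segment")
    # Page slope = data slope * (x_range / y_range) * (height / width).
    # Setting the median page slope to 1 and solving for height / width:
    return y_range / (x_range * median(slopes))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 10.0, 30.0, 40.0, 80.0]
aspect = bank_to_45(xs, ys)
print(f"height / width = {aspect:.3f}")
```

The returned ratio can be passed to whatever plotting tool is in use as the aspect ratio of the panel.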

Q: Should I include 0 on my scale?

A: It depends.

A: Relying on the pre-attentive

perception of size or intensity?

Yes, otherwise you will mislead.

Using position? It’s up to you.
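
The size case can be checked with simple arithmetic: when bars are drawn from a baseline other than zero, the ratio of their lengths no longer matches the ratio of the values. A minimal Python sketch; the values 100 and 110, the baseline of 90, and the helper `apparent_ratio` are chosen for illustration.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of bar lengths for values a and b when bars start at `baseline`."""
    if a == baseline:
        raise ValueError("first bar would have zero length")
    return (b - baseline) / (a - baseline)


true_ratio = apparent_ratio(100.0, 110.0)             # bars start at 0
truncated_ratio = apparent_ratio(100.0, 110.0, 90.0)  # bars start at 90

print(f"baseline 0:  second bar is {true_ratio:.2f}x the first")
print(f"baseline 90: second bar is {truncated_ratio:.2f}x the first")
```

With a zero baseline the second bar is 1.1 times as long as the first; with a baseline of 90 it is twice as long, although the values differ by only 10%.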

“Above all else, show the data.”

-Tufte

“Above all else, show

the variation in the data.”

-Rauser (via Tufte)

R/ggplot2 code for every plot in this

presentation available at http://goo.gl/xH5PLV

The rendered document is at

http://rpubs.com/jrauser/hhsd_notes

This presentation is at

https://goo.gl/LuDNje

I will tweet these links as @jrauser

coda

visualization

is

communication

art

is

communication

visualization

is

art

why does it make you

feel that way?

visualization has as much to

learn from art as from science

end
