How Humans See Data
John Rauser
@jrauser
January 2017
visualization is communication
how to make better visualizations
help humans solve analytical problems quickly and accurately with visualization
Part I: Why visualize data at all?
 n      x      y     n      x      y
 1  1.972  1.236    13  0.111  0.542
 2  1.112  1.994    14  0.902  0.005
 3  0.000  1.009    15  0.598  0.085
 4  0.665  1.942    16  1.613  1.790
 5  0.235  0.356    17  1.298  1.955
 6  0.247  1.658    18  0.651  1.937
 7  1.275  1.961    19  1.949  1.316
 8  0.702  0.045    20  0.099  0.567
 9  1.760  0.350    21  0.862  0.010
10  1.691  0.277    22  0.027  0.768
11  1.628  1.778    23  0.706  1.956
12  1.957  1.290    24  1.042  1.999
pre-attentive processing
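The table hides a simple structure that a plot reveals at a glance. As a quick check (our own arithmetic on the 24 rows above, not a claim made in the talk), every point lies within rounding of a circle of radius 1 centered at (1, 1); the center was found by inspection. A minimal Python sketch:

```python
import math

# The 24 (x, y) points from the table, in row order n = 1..24.
POINTS = [
    (1.972, 1.236), (1.112, 1.994), (0.000, 1.009), (0.665, 1.942),
    (0.235, 0.356), (0.247, 1.658), (1.275, 1.961), (0.702, 0.045),
    (1.760, 0.350), (1.691, 0.277), (1.628, 1.778), (1.957, 1.290),
    (0.111, 0.542), (0.902, 0.005), (0.598, 0.085), (1.613, 1.790),
    (1.298, 1.955), (0.651, 1.937), (1.949, 1.316), (0.099, 0.567),
    (0.862, 0.010), (0.027, 0.768), (0.706, 1.956), (1.042, 1.999),
]


def radii(points, center=(1.0, 1.0)):
    """Euclidean distance of each point from `center` (center chosen by inspection)."""
    cx, cy = center
    return [math.hypot(x - cx, y - cy) for x, y in points]


r = radii(POINTS)
print(f"{min(r):.4f} {max(r):.4f}")
```

Both printed values fall within 0.001 of 1. The eye finds that circle in a scatterplot without any arithmetic; finding it in the table takes deliberate computation.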
A graph is an encoding of the data.
Good visualizations optimize for the human visual system.
How does the human visual system work?
How does the human visual system decode a graph?
Cleveland’s three visual operations of pattern perception:
1. Detection
2. Assembly
3. Estimation
Part II: estimation
Three levels of estimation
a. discrimination: X = Y or X ≠ Y?
b. ranking: X > Y or X < Y?
c. ratioing: X / Y = ?
At the heart of quantitative reasoning is a single question: Compared to what?
- Tufte, Envisioning Information
the most important thing
The most important measurement should exploit the highest ranked encoding possible.
• Position along a common scale
• Position on identical but nonaligned scales
• Length
• Angle or Slope
• Area
• Volume or Density or Color saturation
• Color hue
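The rule above can be read as a greedy assignment: rank the measurements by importance, then hand out encodings from the top of the list. A minimal sketch, assuming the chart type restricts which encodings are available; `assign_encodings` and its inputs are hypothetical names for illustration, not part of the talk's code.

```python
# Encodings ranked from most to least accurately decoded, as listed on the slide.
ENCODING_RANK = [
    "position along a common scale",
    "position on identical but nonaligned scales",
    "length",
    "angle or slope",
    "area",
    "volume, density, or color saturation",
    "color hue",
]


def assign_encodings(measurements_by_importance, available):
    """Map each measurement to an encoding, giving the most important one the
    best encoding the chosen chart type actually offers.

    measurements_by_importance: names ordered from most to least important.
    available: encodings the chart type can use (a hypothetical input).
    """
    offered = set(available)
    usable = [enc for enc in ENCODING_RANK if enc in offered]
    if len(usable) < len(measurements_by_importance):
        raise ValueError("not enough distinct encodings for these measurements")
    return dict(zip(measurements_by_importance, usable))


# Hypothetical example: a chart that can vary position and hue.
print(assign_encodings(["growth rate", "segment"],
                       ["color hue", "position along a common scale"]))
```

Here "growth rate" gets position along a common scale and "segment" is left with color hue, the lowest-ranked encoding.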
“The first rule of color: do not talk about color!”
- Tamara Munzner
luminance
saturation
hue
Observation: Alphabetical is almost never the correct ordering of a categorical variable.
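The usual alternative is to order categories by the quantity being shown, so that position along the axis carries information. A minimal sketch with hypothetical labels and counts; in R/ggplot2, which the talk's own code uses, the same idea is commonly expressed by reordering the factor with `reorder()`.

```python
def order_by_value(categories, values, descending=True):
    """Return category labels sorted by their values rather than alphabetically.

    Ties keep their original relative order because Python's sort is stable.
    """
    pairs = sorted(zip(categories, values), key=lambda pair: pair[1], reverse=descending)
    return [category for category, _ in pairs]


# Hypothetical counts; alphabetical order would be apples, bananas, cherries.
print(order_by_value(["apples", "bananas", "cherries"], [3, 12, 7]))
# ['bananas', 'cherries', 'apples']
```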
Observation: Stacked anything is nearly always a mistake.
Stacking makes the reader decode lengths, not position on a common scale.
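The baselines make the problem concrete. A minimal sketch computing where each segment of a stacked bar starts and ends; `stacked_segments` and the two bars are hypothetical.

```python
from itertools import accumulate


def stacked_segments(values):
    """Return (baseline, top) for each segment of one stacked bar, bottom segment first."""
    tops = list(accumulate(values))
    baselines = [0] + tops[:-1]
    return list(zip(baselines, tops))


# Two hypothetical stacked bars with three segments each.
print(stacked_segments([5, 3, 4]))  # [(0, 5), (5, 8), (8, 12)]
print(stacked_segments([2, 6, 4]))  # [(0, 2), (2, 8), (8, 12)]
```

Only the bottom segments share the zero baseline. The middle segments (3 and 6) both end at 8, so their tops line up even though one is twice as long as the other; comparing them means judging lengths that start at different places.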
Observation: Pie charts are ALWAYS a mistake.
Piecharts are the information visualization equivalent of a roofing hammer to the frontal lobe. They have no place in the world of grownups, and occupy the same semiotic space as short pants, a runny nose, and chocolate smeared on one’s face. They are as professional as a pair of assless chaps.
http://blog.codahale.com/2006/04/29/google-analytics-the-goggles-they-do-nothing/
Tables are preferable to graphics for many small data sets. A table is nearly always better than a dumb pie chart; the only thing worse than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies… Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used.
-Edward Tufte, The Visual Display of Quantitative Information
Who do you think did a better job in tonight’s debate?
                   Clinton  Trump
Among Democrats        99%     1%
Among Republicans      53%    47%
[figure: chart with one entry per country, labeled alphabetically from Afghanistan to Cameroon]
All good pie charts are jokes.
Observation: Comparison is trivial on a common scale.
the dashboard metaphor is fundamentally flawed
Observation: Scatterplots show relationships directly.
Observation: Growth charts usually aren’t.
If growth (slope) is important, plot it directly.
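Plotting growth directly means computing it first and giving it its own position scale, instead of asking the reader to judge the slope of a cumulative curve. A minimal sketch with hypothetical totals; whether absolute change or percentage growth is the right quantity depends on the question being asked.

```python
def first_differences(series):
    """Absolute change per period, y[t] - y[t-1]."""
    return [cur - prev for prev, cur in zip(series, series[1:])]


def growth_rates(series):
    """Period-over-period growth, (y[t] - y[t-1]) / y[t-1].

    Assumes every value except the last is nonzero.
    """
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]


# Hypothetical monthly totals.
totals = [100, 120, 150, 150, 120]
print(first_differences(totals))  # [20, 30, 0, -30]
print(growth_rates(totals))       # [0.2, 0.25, 0.0, -0.2]
```

Either output series can then be drawn as points or a line on a common scale, where the period-to-period changes are read as positions rather than slopes.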
Part III: assembly
Gestalt Psychology
reification
emergence
Prägnanz
Law of Closure
Law of Continuity
Observation: Good plots leverage the law of continuity to assist with assembly.
Law of Similarity
Law of Proximity
Observation: Dodged bar charts are a bad idea.
Part IV: detection
Excel’s defaults are pretty bad
[figure: chart with Excel’s default formatting; y axis from “-” (zero) to 200,000 in steps of 20,000, x axis 1 to 6]
Observation: Detection isn’t as trivial as it seems.
“Above all else, show the data.”
-Tufte
Part V: other useful results
Weber’s law: The “Just Noticeable Difference” is proportional to the size of the initial stimulus.
[figure: 10 vs 20 and 100 vs 110, the same absolute difference at two magnitudes; two marks labeled “12 units”]
Observation: Weber’s law is why gridlines are useful.
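A small arithmetic sketch of Weber's law using the slide's numbers, plus one way (our reading, not stated on the slides) to see why gridlines help: a nearby gridline turns a comparison of two long bars into a comparison of the short stretches past the gridline, where the same absolute difference is a much larger fraction. The function names and the gridline at 90 are illustrative assumptions.

```python
def weber_fraction(initial, changed):
    """Change relative to the initial stimulus. Weber's law says the just
    noticeable difference corresponds to a roughly constant value of this ratio."""
    return abs(changed - initial) / initial


def above_gridline(value, gridline):
    """Length of the part of a bar that extends past a gridline."""
    return value - gridline


print(weber_fraction(10, 20))    # 1.0: a 100% change, easy to see
print(weber_fraction(100, 110))  # 0.1: same absolute change, only 10%
# With a gridline at 90, the eye can compare the 10 and 20 units past it instead.
print(weber_fraction(above_gridline(100, 90), above_gridline(110, 90)))  # 1.0
```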
“Erase non-data ink.”
-Tufte
“Erase non-data ink, within reason.”
-Tufte
“Erase non-data ink that interferes with detection or doesn’t assist assembly and estimation.”
-Rauser
You are bad at estimating the difference between lines.
Observation: If a difference is important, plot it directly.
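If the gap between two lines is the message, compute it and draw the difference as its own series on a common scale. A minimal sketch with hypothetical data, assuming both series are sampled at the same x values:

```python
def difference_series(a, b):
    """Pointwise difference a[t] - b[t] between two series sampled at the same x values."""
    if len(a) != len(b):
        raise ValueError("series must be aligned and of equal length")
    return [x - y for x, y in zip(a, b)]


# Hypothetical series: two rising curves whose gap is hard to judge by eye.
a = [10, 14, 20, 30, 45]
b = [8, 11, 16, 25, 39]
print(difference_series(a, b))  # [2, 3, 4, 5, 6]
```

Plotted on its own, the steadily widening gap becomes a straight, obvious trend instead of something the reader has to estimate between two steep curves.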
You are best at detecting variation in slope near 45 degrees.
banking to 45
Observation: Banking to 45 best shows variation in slope.
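Banking chooses the plot's aspect ratio from the data. A minimal sketch of one simple variant, the median-absolute-slope rule, assuming each axis's data range fills the full panel width or height (no padding). In ggplot2 the result could be passed as a fixed aspect ratio via `theme(aspect.ratio = ...)`; that step is not shown.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Aspect ratio (height / width) that draws the median absolute segment slope at 45 degrees.

    Median-absolute-slope rule, one of several banking methods.
    Assumes xs is strictly increasing and the median absolute slope is nonzero.
    """
    slopes = [abs((y1 - y0) / (x1 - x0))
              for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # A data slope s is drawn with on-screen slope s * (x_range / y_range) * aspect,
    # so making that equal 1 for the median slope gives:
    return y_range / (x_range * median(slopes))


print(bank_to_45([0, 1, 2, 3], [0, 2, 4, 6]))  # 1.0: a straight line in a square panel
print(bank_to_45([0, 1, 2, 3], [0, 3, 4, 5]))  # about 1.667: taller than wide
```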
Q: Should I include 0 on my scale?
A: It depends.
Relying on the pre-attentive perception of size or intensity? Yes, otherwise you will mislead.
Using position? It’s up to you.
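Why the answer hinges on size: a bar's drawn length is measured from where the axis starts, so moving the baseline changes the ratio the eye reads. A minimal sketch with hypothetical values; `drawn_length_ratio` is an illustrative name.

```python
def drawn_length_ratio(a, b, baseline=0.0):
    """Ratio of two bars' drawn lengths when both bars start at `baseline`."""
    return (a - baseline) / (b - baseline)


# Hypothetical values 60 and 50: the true ratio is 1.2.
print(drawn_length_ratio(60, 50))      # 1.2
# Start the axis at 45 and the bars are drawn 15 and 5 units long.
print(drawn_length_ratio(60, 50, 45))  # 3.0
```

A dot on a position scale has no length to misread; the reader compares it with the axis labels, which is presumably why the slide leaves that case to the author.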
“Above all else, show the data.”
-Tufte
“Above all else, show the variation in the data.”
-Rauser (via Tufte)
R/ggplot2 code for every plot in this presentation is available at http://goo.gl/xH5PLV
The rendered document is at http://rpubs.com/jrauser/hhsd_notes
This presentation is at https://goo.gl/LuDNje
I will tweet these links as @jrauser
coda
visualization is communication
art is communication
visualization is art
why does it make you feel that way?
visualization has as much to learn from art as from science
end