28
Oh my god, it's full of data! A biased & incomplete introduction to visualization Bastian Rieck

Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Oh my god, it's full of data!

A biased & incomplete introductionto visualization

Bastian Rieck

Page 3: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

What is visualization?

Page 4: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

“Computer-based visualization systems provide visualrepresentations of datasets intended to help people carry out sometask better.”

— Tamara Munzner, Visualization Design and Analysis: Abstractions,Principles, and Methods

Page 5: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Why is visualization useful?

Page 6: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Anscombe’s quartet

I II III IVx y x y x y x y

10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.588.0 6.95 8.0 8.14 8.0 6.77 8.0 5.7613.0 7.58 13.0 8.74 13.0 12.74 8.0 7.719.0 8.81 9.0 8.77 9.0 7.11 8.0 8.8411.0 8.33 11.0 9.26 11.0 7.81 8.0 8.4714.0 9.96 14.0 8.10 14.0 8.84 8.0 7.046.0 7.24 6.0 6.13 6.0 6.08 8.0 5.254.0 4.26 4.0 3.10 4.0 5.39 19.0 12.5012.0 10.84 12.0 9.13 12.0 8.15 8.0 5.567.0 4.82 7.0 7.26 7.0 6.42 8.0 7.915.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89

Page 7: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

From the viewpoint of statistics

x y

Mean 9 7.50Variance 11 4.127Correlation 0.816Linear regression line y = 3.00 + 0.500x

Page 8: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

From the viewpoint of visualization

4

6

8

10

12

4 6 8 10 12 14 16 18

4

6

8

10

12

4 6 8 10 12 14 16 18

4

6

8

10

12

4 6 8 10 12 14 16 18

4

6

8

10

12

4 6 8 10 12 14 16 18

Page 9: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

How does it work?

Page 10: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Parallel coordinatesTabular data (e.g. attributes in columns, instances in rows)Create an axis for each attribute dimensionDraw a line through these axes to represent an instance

9

46.6

3

8

68

455

0

230

1613

5140

8

24.8

70

82

USA

Europe

Japan

MPG Cylinders Displacement Horsepower Weight Acceleration Year Origin

Page 11: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Brushing fuel-efficient cars

9.0

46.6

3

8

68.00

455.0

0.0

230.0

1613.

5140.

8.0

24.8

70

82

USA

Europe

Japan

MPG Cylinders Displacement Horsepower Weight Acceleration Year Origin

Page 12: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Some drawbacks

Does not work for dimensions � 10Order of axes matters (d ! possibilities)Rapid overplotting

(Some of these drawbacks have been solved, others involveworkarounds, which in turn cause other drawbacks, . . . )

Page 13: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Some patterns

Positive correlation Negative correlation

Two clusters Normal distribution

Page 14: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Scatterplot matrices

Set of vectors from Rn

Create (n · n − 1) 2-dimensional scatterplotsArrange them in a matrix

Page 15: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Brushing by speciesIris setosa, Iris versicolor, Iris virginica

Page 16: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Analysis

Advantages

Brushing+linking easily possibleConceptually simpleExtendible (histograms, densities, . . . )

DrawbacksQuadratic increase in number of plotsDoes not show all interesting projectionsOcclusion possible

Page 17: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

ScagnosticsA cure for some drawbacks

ProcedureCalculate k � n measures for each scatterplotAssign each projection a vector of measuresShow all vectors in a (smaller!) scatterplot

Page 18: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Outlying

Skewed

Clum py

Sparse

Striated

Convex

Skinny

Stringy

0.0 0.2 0.4 0.6 0.8 1.0Measure

Monotonic

Source: Leland Wilkinson and Graham Wills. “Scagnostics Distributions.” Journal of Computational and GraphicalStatistics (JCGS) 17:2 (2008), pp. 473–491.

Page 19: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Heatmaps

“Matrix visualization”Assign colours according to valueScale globally, per row, or per column

Page 20: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Game

sFie

ld go

al per

centag

e

Blocks

Defensi

ve reb

ounds

Total

reboun

ds

ree

-point

s mad

e

ree

-point

s atte

mpts

Freeth

row pe

rcenta

ge

ree

-point

s perc

entag

e

Minutes

play

ed

Point

sFie

ld go

als m

ade

Field

goals

attem

pts

Assis

tsSte

alsTur

novers

Freeth

rows m

ade

Freeth

rows a

ttemp

ts

Offensiv

e rebo

unds

Dwight HowardShaquille O'NealTum DuncanYao MingLaMacrus AldridgePaul GasolZachary RandolphAntawn JamisonAl JeffersonDirk NowitzkiDavid WestChris BoschChris PaulLeBron JamesDwayne WadeNate RobinsonJosh HowardRichard HamiltonJamal CrawfordJoe JohnsonBen GordonVince CarterRashard LewisAl HarringtonRudy GayJohn SalmonsO.J. MayoChauncey BillupsJason TerryMaurice WilliamsRay AllenKevin MartinDanny GrangerCarmelo AnthonyKevin DurantKobe BryantCorey MaggetteAmare StoudemireMonta EllisMichael ReddStephen JacksonCaron ButlerDeron WilliamsAllen IversonDevin HarrisTony ParkerAndre IguodaBrandon RoyRichard JeffersonPaul Pierce Low

High

Page 21: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Just what do you think you’re doing, Dave?

MPG Cylinders Displacement Horsepower Weight Accceleration Year Origin Low

High

Page 22: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

What should I remember aboutvisualization?

Page 23: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

“I am putting myself to the fullest possible use, which is all I thinkthat any conscious entity can ever hope to do.”

— HAL 9000, 2001: A Space Odyssey

Model-based approaches help explain dataVisualization may help arrive at a description of the modelIt is challenging to scale methods to larger data setsIt is easy to get it wrong, but hard to get it right

Page 24: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

I want to learn more!

Page 25: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

PeopleAsk your local friendly visualization researchers at INF 368, 5th

floor, rooms 528, 529, and 531.

Page 26: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

ToolsD3.js (http://d3js.org)IBM ManyEyes(http://www.ibm.com/software/analytics/manyeyes)Tableau Software (http://www.tableausoftware.com)

Page 27: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

BooksStephen Few. Show Me the Numbers: Designing Tables andGraphs to Enlighten.Tamara Munzner. Visualization Design and Analysis:Abstractions, Principles, and Methods.Edward R. Tufte. The Visual Display of QuantitativeInformation.Robin Williams. The Non-Designer’s Design Book.

Page 28: Oh my god, it's full of data! - bastian.rieck.me · Does not show all interesting projections Occlusion possible. Scagnostics A cure for some drawbacks Procedure Calculate k ˝ n

Thank you.