50
“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”

“It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

“It is a capital mistake to theorize

before one has data. Insensibly one

begins to twist facts to suit theories,

instead of theories to suit facts.”

Page 2: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist
Page 3: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Data Science

Page 4: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

“Data Science” is thinking with data

Page 5: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

How to categorize data

How to computationally

explore data

How to visually explore data

Page 6: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Please ask questions if you have them!!!

Page 7: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

How to categorize data

How to computationally

explore data

How to visually explore data

Page 8: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

What are the different properties of

the data

Page 9: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Data falls into two categories:

Qualitative: Descriptions, categories, and observations

Quantitative: Numeric measures

Page 10: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Data about this book:

Qualitative: Old

By Shakespeare

Published in London

Quantitative: 142 pages

20,000 words

1,700 nouns

Page 11: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Data can also be:

Ordinal Categorical/Nominal

Discrete Continuous

Page 12: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

What are the different levels of detail

we can look at?

Page 13: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Scales Overview

Detail

Overview: High-level patterns looking

across all the data

Detail: Low-level patterns looking at specific pieces of the data

Page 14: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

How to categorize data

How to computationally

explore data

How to visually explore data

Page 15: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist
Page 16: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Reduce the dataset using mathematics

and logic

Page 17: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

All models are wrong, but some are

useful. --George E. P. Box

Page 18: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Use statistics to group the data into

manageable units

Page 19: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

data

data data

data

data

data

data

data

data

data

data

data

data

data data data

data

data

data

Algorithmically categorize dataset

based on properties of the data

Ma

th &

Lo

gic

Page 20: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Topic Models: Identify words that categorize groups of texts in a

corpus

Clustering: Identify groups of datapoints with similar properties

Bayes Nets: Compute how likely it is that a text belongs to

different groups based on its properties

Explainers: Determine how similar different texts are to an

example text

Page 21: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

How to categorize data

How to computationally

explore data

How to visually explore data

Page 22: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

You need statistics to describe data, but

then visualization to see it in context.

-- Andy Kirk

Page 23: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist
Page 24: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Visualizations let us explore and communicate large

amounts of data visually

Page 25: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

1) Visually encode the data

2) Arrange the encoded data to highlight

patterns of interest

3) Design complementary methods for

looking at the data that can answer

complex analysis questions

4) Design ways for interacting with the

encoded data that support your

analysis

Page 26: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

1) Visually encode the data

2) Arrange the encoded data to highlight

patterns of interest

3) Design complementary methods for

looking at the data that can answer

complex analysis questions

4) Design ways for interacting with the

encoded data that support your

analysis

Page 27: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

A

B

A B

A B

A B

A B

A

A B

Position

Size

Shape

Value/Lightness

Color

Orientation

Texture

Visual Encodings: Ways to map data values to

visual marks

Different visual encodings

highlight different properties

in the data

Page 28: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

A B A B +

A B

Encodings can be combined to communicate

multiple properties of the data

Page 29: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

1) Visually encode the data

2) Arrange the encoded data to highlight

patterns of interest

3) Design complementary methods for

looking at the data that can answer

complex analysis questions

4) Design ways for interacting with the

encoded data that support your

analysis

Page 30: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Juxtapositioning encoded data side-by-side

Superpositioning encoded data in the same

space

Explicitly encoding relationships of interest

between datapoints

Once data is encoded, we can

highlight relationships in the data by:

Page 31: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Small Multiples: Juxtapose large

numbers of small

visualizations to

communicate high-

level patterns

Can either subdivide

the data or

properties of the

data

Page 32: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

1) Visually encode the data

2) Arrange the encoded data to highlight

patterns of interest

3) Design complementary methods for

looking at the data that can answer

complex analysis questions

4) Design ways for interacting with the

encoded data that support your

analysis

Page 33: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Coordinated Views: Create multiple visualizations that work together to

support complex analysis

Page 34: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Index Membership Freq Grouped Freq Pos in Reference In

dex

G

rou

ped

Fre

q

Pos

in R

efer

ence

Dynamic Remapping: Allow the user to change what data maps to which

visual channels to highlight different patterns

Page 35: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

1) Visually encode the data

2) Arrange the encoded data to highlight

patterns of interest

3) Design complementary methods for

looking at the data that can answer

complex analysis questions

4) Design ways for interacting with the

encoded data that support your

analysis

Page 36: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Don’

Always connect back to the person:

how can we make insights meaningful?

Page 37: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Interaction

Page 38: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Some techniques for visualizing data…

Page 39: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Bar Charts

Compare values

Page 40: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Line Graphs

Identify trends

Page 41: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Scatterplots

Identify clusters

Page 42: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Pie Charts

Communicate proportions of a whole

Page 43: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Heatmaps

Color to convey values compactly

Page 44: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Histogram

Distribution over different properties

Page 45: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Choropleths: Color to convey

values

Cartograms: Size to convey

values

Page 46: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Networks/Node-Link Diagrams

Connect related objects

Page 47: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Trees & Hierarchies

Communicate hierarchical relationships

Page 48: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

Learn More about Visualization

CS638/838: Visualization Prof. Michael Gleicher

11:00-12:15 Tu/Th

Visualization Reading Group 2pm every other Thursday

Page 49: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

1) Break into groups—mix “techies” and

“humanists”

2) Pick one dataset from your group to

talk about

3) Sketch how you might approach

analyzing this data

4) Rinse and repeat

5) Group critique

Page 50: “It is a capital mistake to theorize before one has data ...danielleszafir.com/DHRN.pdf · “It is a capital mistake to theorize before one has data. Insensibly one begins to twist

What are the different properties of the data?

What are the interesting relationships between these

properties and why?

What are common or informative labels that can

describe different aspects of the data?

What, if any, questions do you want to explore in the

data?

What levels of detail are interesting?

What would be some interesting ways to “look” at this

data?

What patterns (or lack thereof) would you hope to find in

this data and what would they mean?