
Graphing to visualize data

Satish Raghunath, rsatish@alum.rpi.edu
Shiv Kalyanaraman, Rensselaer Polytechnic Institute (Google: “Shiv RPI”)
http://www.ecse.rpi.edu/Homepages/shivkuma

Overview

- Issues with graphing
- Types of graphs
- Examples of graph usage & what you get out of them
- Art: how to choose what graph to use?
- Graphing tools
- Pitfalls and mistakes in graphing
- Advanced: visualization
- In-class work: reviewing graphing use in selected technical papers

Thoughts on Presentation Styles

Primary purpose: illustrate to help understand

“The goal of simulation is intuition, not numbers.” - R. W. Hamming

Corollary: don’t dump data on the reader. Distill it into presentations that give insight instead…

Descriptive Statistics

Involves
- Collecting data
- Presenting data
- Characterizing data
- Understanding data: distill insights!

Statistics obtained from data: $\bar{X} = 30.5$, $S^2 = 113$

[Figure: values in dollars for Q1 to Q4, vertical axis 0 to 50]

Insights: somewhat skewed bell shape; perhaps a Poisson distribution would fit?

To graph or not to graph

Use graphs when
- Trends in data are not obvious
- It is hard to explain the X-Y relationship in words

Consider tables if
- The number of data points is small
- The reader might find the exact values of the data points useful

Summary Table: Frequencies

1. Lists categories & no. of elements in each category
2. Obtained by tallying responses in each category
3. May show frequencies (counts), % or both

Each row is a category:

Major        Count
Accounting     130
Economics       20
Management      50
Total          200
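The tallying step can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` is made up here, and the list of responses is rebuilt from the counts on the slide rather than taken from real survey data.

```python
from collections import Counter


def frequency_table(responses):
    """Tally responses and return (category, count, percent) rows plus a total row."""
    counts = Counter(responses)
    total = sum(counts.values())
    rows = [(cat, n, 100.0 * n / total) for cat, n in sorted(counts.items())]
    rows.append(("Total", total, 100.0))
    return rows


# Responses rebuilt from the slide's counts (130, 20, 50); not real survey data.
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50
table = frequency_table(responses)
for category, count, percent in table:
    print(f"{category:<12}{count:>6}{percent:>7.1f}%")
```

Showing counts and percentages side by side covers item 3 of the list: the reader gets both the exact values and the relative sizes.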

Example Tables from Networking

SACK (Multiple Sources)

LT-TCP (Multiple Sources)

What kind of graph?

- Pie-charts to depict “fraction of a whole”
- Bar-charts when data-points are few and a table is not suitable
- Line-plots when there are a lot of data-points
- Box-plots if statistical inference is drawn: shows 1st, 2nd, 3rd quartile for each point
- Scatter-plots, 3-d plots only if necessary – AVOID complex graphs

Pie Chart

1. Shows breakdown of quantity into categories
2. Useful for showing relative differences
3. Angle size = (360°) x (percent)

[Figure: pie chart of majors: Acct. 65%, Mgmt. 25%, Econ. 10%]

Example: the Econ. slice spans (360°) (10%) = 36°
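The angle rule can be checked with a short Python sketch. The helper name `pie_angles` is an assumption made for this example; the percentages are the ones shown for the majors.

```python
def pie_angles(percents):
    """Map each category's percent share to its slice angle in degrees."""
    total = sum(percents.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"shares must sum to 100%, got {total}")
    return {name: 360.0 * share / 100.0 for name, share in percents.items()}


angles = pie_angles({"Acct.": 65, "Mgmt.": 25, "Econ.": 10})
for name, angle in angles.items():
    print(f"{name:<6} {angle:6.1f} degrees")
```

The sum check guards against the most common pie-chart mistake: shares that do not add up to a whole.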

Pie Chart Networking Example

Source: http://www.caida.org/~bhuffake/papers/skitviz/

Another example: VPN Classification

Bar Chart

[Figure: horizontal bar chart of frequency by major (Acct., Econ., Mgmt.), frequency axis 0 to 150]

- Horizontal bars for categorical variables
- Bar length shows frequency or %
- 1/2 to 1 bar width
- Equal bar widths
- Zero point
- Percent used also
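As a rough illustration of these rules, the sketch below draws a text-only horizontal bar chart. The function name `text_bar_chart` and the `width` parameter are assumptions for this example; the counts are the major counts from the frequency table earlier in the deck.

```python
def text_bar_chart(counts, width=40):
    """Render horizontal bars whose lengths are proportional to the counts.

    Every bar starts at zero, so bar length alone encodes the value.
    """
    longest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, value in counts.items():
        bar = "#" * round(width * value / longest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(lines)


chart = text_bar_chart({"Acct.": 130, "Econ.": 20, "Mgmt.": 50})
print(chart)
```

All bars have the same thickness (one text row) and share a zero point, which are two of the properties listed above.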

Networking Example Bar Chart

Example Analysis with Bar Charts

LT-TCP is able to
- reduce timeouts drastically
- keep the queue non-empty, maximizing throughput and capacity utilization
- minimize use of FEC to the level needed

Histogram: for “distributions”

Class          Freq.
15 but < 25        3
25 but < 35        5
35 but < 45        2

[Figure: histogram of the classes above. Horizontal axis: lower class boundaries (0, 15, 25, 35, 45, 55). Vertical axis: count (0 to 5), shown as frequency, relative frequency or percent. Bars touch.]
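The class frequencies behind such a histogram can be computed as sketched below. The raw values in `data` are hypothetical, chosen only so that they reproduce the frequencies 3, 5 and 2; the function name `class_frequencies` is likewise an assumption.

```python
def class_frequencies(data, lower, width, n_classes):
    """Count values in the classes [lower + k*width, lower + (k+1)*width)."""
    freqs = [0] * n_classes
    for x in data:
        k = int((x - lower) // width)
        if 0 <= k < n_classes:  # values outside all classes are ignored
            freqs[k] += 1
    return freqs


# Hypothetical raw data chosen to match the class frequencies 3, 5, 2.
data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
freqs = class_frequencies(data, lower=15, width=10, n_classes=3)
for k, f in enumerate(freqs):
    lo = 15 + 10 * k
    print(f"{lo} but < {lo + 10}: count {f}, {100 * f / len(data):.0f}%")
```

Each class includes its lower boundary and excludes its upper boundary, matching the “15 but < 25” notation.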

Recall: Real Example Histogram

- What is the fairness between TCP goodputs when we use different queuing policies?
- What is the confidence interval around your estimates of mean file size?
- Note: “distribution” need not just be a probability/frequency distribution

[Figure: two panels, FQ and RED, each plotting throughput (Mbps) against flow number (1 to 31). FQ vertical axis: 0 to 2. RED vertical axis: 0 to 10.]

Dot Chart or Scatterplots

[Figure: dot chart of frequency by major (Acct., Econ., Mgmt.), frequency axis 0 to 150]

- Horizontal lines for categorical variables
- Line length shows frequency or %
- Equal spacing
- Like horizontal bar chart
- Zero point
- Percent used also

Scatter Plots

Scatter plots with trends

WiFi Analysis: Scatter Plots

http://www.sigcomm.org/sigcomm2004/papers/p442-aguayo1111.pdf

Line Charts: Example

Comparative Performance

Note: also plots confidence intervals!

Line Plots for Distributions: Example

Hop count and RTT distributions

Source: http://www.caida.org/~bhuffake/papers/skitviz/

Recall: Distribution Shape

1. Describes how data are distributed
2. Measures of shape: skew (degree of asymmetry)

Order of the averages along the axis, left to right:
- Left-skewed: Mean, Median, Mode
- Symmetric: Mean = Median = Mode
- Right-skewed: Mode, Median, Mean
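The ordering of mean and median suggests a quick check for skew direction. The sketch below is a rule of thumb only, not a formal skewness statistic, and it can mislead on multimodal or very small samples. The function name and the sample lists are made up for illustration.

```python
import statistics


def shape_from_averages(data):
    """Guess skew direction by comparing the mean with the median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "right-skewed"  # a long right tail pulls the mean up
    if mean < median:
        return "left-skewed"   # a long left tail pulls the mean down
    return "symmetric"


# Hypothetical samples, one per shape.
print(shape_from_averages([1, 2, 2, 3, 10]))
print(shape_from_averages([1, 8, 9, 9, 10]))
print(shape_from_averages([1, 2, 3, 4, 5]))
```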

Box Plot

Graphical display of data using the 5-number summary

[Figure: box plot on an axis from 4 to 12, marking X_smallest, Q1, median, Q3 and X_largest]
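The five numbers behind a box plot can be computed with Python's standard library, as sketched below. The data set is hypothetical, chosen to span the 4 to 12 range of the axis in the figure. Quartile conventions differ between tools; this sketch uses the "inclusive" method of `statistics.quantiles`, so other packages may report slightly different quartiles for the same data.

```python
import statistics


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest) for a sample."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Hypothetical sample spanning 4 to 12.
summary = five_number_summary([4, 5, 6, 7, 8, 9, 10, 11, 12])
print(summary)
```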

3D Graphs Example Illustrates a complex parameter response surface ...

3D Plots: Network Example: Code Red Worm Analysis

http://www.prism.uvsq.fr/users/qst/Tomography/Articles_jmf/renesys_bgp_instabilities2001.pdf
http://www.caida.org/outreach/isma/0112/talks/andyo/index.pdf
http://www.renesys.com/resource_library/Renesys-NANOG23.pdf

Contd…

Tools: Gnuplot

- To use with data-generating programs for repetitive plotting
- E.g., generate the plot of throughput for every 1-hour interval in the last week
- http://www.gnuplot.info
- TIP: Export gnuplot plots as a “.fig” file and edit it in xfig for greater flexibility
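One way to drive gnuplot for repetitive plotting is to generate its script from a program. The sketch below builds a script with one plot per 1-hour window over a week. The file name `throughput.dat`, its two-column layout (time in seconds, throughput in Mbps), the output file names and the helper name are all assumptions for illustration. The `terminal` argument is passed straight to gnuplot's `set terminal`; whether a given terminal (for example `fig`) is available depends on the gnuplot build.

```python
def hourly_gnuplot_script(data_file, start, hours, terminal="png"):
    """Build a gnuplot script that writes one throughput plot per 1-hour window.

    Assumes data_file has two whitespace-separated columns:
    time in seconds and throughput in Mbps (a hypothetical layout).
    """
    lines = [
        f"set terminal {terminal}",
        'set xlabel "Time (s)"',
        'set ylabel "Throughput (Mbps)"',
    ]
    for h in range(hours):
        lo = start + 3600 * h
        hi = lo + 3600
        lines.append(f'set output "throughput_{h:03d}.{terminal}"')
        lines.append(f"set xrange [{lo}:{hi}]")
        lines.append(f'plot "{data_file}" using 1:2 with lines title "hour {h}"')
    return "\n".join(lines) + "\n"


# One plot for every hour of the last week: 24 * 7 = 168 plots.
script = hourly_gnuplot_script("throughput.dat", start=0, hours=24 * 7)
print(script[:300])
```

Saving the returned text to a file such as `plots.gp` and running `gnuplot plots.gp` would then produce the plots, provided the data file exists in the assumed layout.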

Tools: XmGrace

- For more intricate details (e.g., creating error-bars, different shades for bar-charts); GUI-driven, very user friendly
- http://plasma-gate.weizmann.ac.il/Grace/
- Exports images to EPS (good for LaTeX documents), PNG (good for PowerPoint), etc.
- Can also run on Windows on top of Cygwin!

Tools: MATLAB

For complex 3-d and other statistical plots like box-plots and scatter-plots, and in general when enormous quantities of data are involved.

http://www.mathworks.com

Tools: Excel Data Presentations

Open up Excel to a new worksheet. Code a data set as below:

Blue    34
White   68
Red     25
Green   50

Explore simple data presentation possibilities…

Graphs: things to watch out for

- Purpose: illustrate entire time-series or response distribution
- Label the x- and y-axes
- Check what units the x- and y-axes are in (not “goats” or “sheep”!)
- Check if either scale is logarithmic (changes meaning)
- Check where the origin (or zero point) is for each axis!
- After understanding WHAT is being plotted, close your eyes and ask: what will different patterns on this graph imply (relative to what I want to understand)?
- See if the relative performance is over- or under-emphasized (if two systems are being compared)
- Several examples in the Jain textbook

Errors in Presenting Data

1. Using ‘Chart Junk’

2. No Relative Basis in Comparing Data Batches

3. Compressing the Vertical Axis

4. No Zero Point on the Vertical Axis

‘Chart Junk’

Year   Minimum wage
1960   $1.00
1970   $1.60
1980   $3.10
1990   $3.80

[Figure: two presentations of the minimum wage data. Bad presentation: annotated with the values above. Good presentation: minimum wage in $ (vertical axis 0 to 4) against year (1960, 1970, 1980, 1990).]

No Relative Basis

[Figure: A’s by class (FR, SO, JR, SR). Bad presentation: frequency on the vertical axis (0 to 300). Good presentation: percent on the vertical axis (0% to 30%).]
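Putting the counts on a relative basis is a one-line division, sketched below. The counts of A's and the class sizes are hypothetical numbers invented for this example, as is the helper name `percent_of_group`.

```python
def percent_of_group(counts, group_sizes):
    """Express each group's count as a percentage of that group's size."""
    return {g: 100.0 * counts[g] / group_sizes[g] for g in counts}


# Hypothetical data: number of A's and number of students per class.
a_grades = {"FR": 300, "SO": 200, "JR": 100, "SR": 30}
class_sizes = {"FR": 1000, "SO": 800, "JR": 500, "SR": 100}

rates = percent_of_group(a_grades, class_sizes)
for group, rate in rates.items():
    print(f"{group}: {a_grades[group]:>3} A's, {rate:.0f}% of class")
```

With these made-up numbers the seniors earn the fewest A's in raw counts yet tie for the highest rate, which is the kind of reversal a frequency-only chart hides.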

Compressing Vertical Axis

[Figure: quarterly sales in $ for Q1 to Q4. Bad presentation: vertical axis runs 0 to 200, so the data is compressed into a small part of the plot. Good presentation: vertical axis runs 0 to 50.]

No Zero Point on Vertical Axis

[Figure: monthly sales in $ for J, M, M, J, S, N. Bad presentation: vertical axis runs 36 to 45, with no zero point. Good presentation: vertical axis runs 0 to 60.]
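The distortion caused by a missing zero point can be quantified by comparing drawn heights with actual values. In the sketch below, the sales values 45 and 39 are hypothetical, picked to lie within the 36 to 45 axis range; the helper name `apparent_ratio` is also an assumption.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of drawn heights for values a and b when the axis starts at axis_start."""
    if min(a, b) <= axis_start:
        raise ValueError("both values must lie above the axis start")
    return (a - axis_start) / (b - axis_start)


true_ratio = apparent_ratio(45, 39)                      # axis starts at zero
truncated_ratio = apparent_ratio(45, 39, axis_start=36)  # axis starts at 36
print(f"zero-based axis: {true_ratio:.2f}x")
print(f"axis from 36:    {truncated_ratio:.2f}x")
```

On the truncated axis the larger value is drawn three times as tall as the smaller one, although the two values differ by only about 15%.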

Graphing Practices: In pictures

Graphing Practices…

Graphing Practices…

Graphing Practices….

Checklist: In textbook

More Complex Visualizations

Internet topology aspects: CAIDA skitter project
http://www.caida.org/tools/measurement/skitter/visualizations.xml

More…

The End