Transcript
Page 1: 2013.10.24 big datavisualization

Visualizing “Big” Data
Sean Kandel & Jeffrey Heer
Trifacta Inc. @trifacta

Page 2: 2013.10.24 big datavisualization

How can we visualize and interact with billion+ record databases in real-time?

Page 3: 2013.10.24 big datavisualization

Two Challenges:
1. Effective visual encoding
2. Real-time interaction

Page 4: 2013.10.24 big datavisualization

Perceptual and interactive scalability should be limited by the chosen resolution of the visualized data, not the number of records.

Page 5: 2013.10.24 big datavisualization

Perception

Page 6: 2013.10.24 big datavisualization

Data: Sampling, Modeling, Binning

Page 7: 2013.10.24 big datavisualization

Google Fusion Tables (Sampling)

Page 8: 2013.10.24 big datavisualization

imMens (Binned Aggregation)

Page 9: 2013.10.24 big datavisualization

Bin > Aggregate (> Smooth) > Plot

1. Bin: Divide data domain into discrete “buckets”
   Categories: Already discrete (but check cardinality)
   Numbers: Choose bin intervals (uniform, quantile, ...)
   Time: Choose time unit: Hour, Day, Month, etc.
   Geo: Bin x, y coordinates after cartographic projection
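The binning choices above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and Web Mercator is assumed as the cartographic projection because the slide does not name one.

```python
import bisect
import math
from datetime import datetime

def uniform_bin(value, lo, hi, n_bins):
    """Index of the equal-width bin containing value; hi falls in the last bin."""
    if not lo <= value <= hi:
        raise ValueError("value outside [lo, hi]")
    width = (hi - lo) / n_bins
    return min(int((value - lo) / width), n_bins - 1)

def quantile_edges(values, n_bins):
    """Interior bin edges chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]

def quantile_bin(value, edges):
    """Index of the quantile bin: the number of edges that are <= value."""
    return bisect.bisect_right(edges, value)

def time_bin(ts: datetime, unit: str) -> datetime:
    """Truncate a timestamp to the chosen time unit."""
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError("unsupported unit: " + unit)

MAX_LAT = 85.05112878  # latitude limit of the square Web Mercator map (assumed projection)

def geo_bin(lon, lat, n_bins):
    """Project (lon, lat) with Web Mercator, then bin the projected x, y uniformly."""
    phi = math.radians(max(-MAX_LAT, min(MAX_LAT, lat)))
    x = (lon + 180.0) / 360.0
    y = (1.0 - math.log(math.tan(phi) + 1.0 / math.cos(phi)) / math.pi) / 2.0
    col = min(max(int(x * n_bins), 0), n_bins - 1)
    row = min(max(int(y * n_bins), 0), n_bins - 1)
    return col, row
```

Categories need no function here: they are already discrete, so the category value itself serves as the bin key once its cardinality has been checked.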

Page 10: 2013.10.24 big datavisualization

Number of Bins?

Page 11: 2013.10.24 big datavisualization

Hexagonal or Rectangular Bins?

[Figure: 100,000 data points, shown with rectangular bins and with hexagonal bins]

Hex bins better estimate density for 2D plots, but the improvement is marginal [Scott 92], while rectangles support reuse and query processing.

Page 12: 2013.10.24 big datavisualization

2. Aggregate: Count, Sum, Average, Min, Max, ...
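A minimal sketch of the aggregation step, assuming each record has already been assigned a bin key. The `aggregate` function and its `key`/`value` callbacks are hypothetical names chosen for illustration.

```python
def aggregate(records, key, value):
    """Group records by bin key and compute count, sum, min, max, mean in one pass."""
    stats = {}
    for rec in records:
        k, v = key(rec), value(rec)
        s = stats.get(k)
        if s is None:
            stats[k] = {"count": 1, "sum": v, "min": v, "max": v}
        else:
            s["count"] += 1
            s["sum"] += v
            s["min"] = min(s["min"], v)
            s["max"] = max(s["max"], v)
    # The mean is derived from sum and count once all records are seen.
    for s in stats.values():
        s["mean"] = s["sum"] / s["count"]
    return stats
```

Count, sum, min and max can each be updated incrementally, so the summary size depends on the number of bins rather than the number of records.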

Page 13: 2013.10.24 big datavisualization

(3. Smooth: Optional; smooth aggregates [Wickham ’13])
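One simple way to smooth binned counts is a weighted moving average over neighboring bins. The sketch below uses a triangular kernel as an assumption for illustration; it does not reproduce the smoother described in [Wickham ’13], and `smooth_counts` is a hypothetical name.

```python
def smooth_counts(counts, bandwidth=1):
    """Triangular-kernel moving average over bins, renormalized at the edges."""
    n = len(counts)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - bandwidth), min(n, i + bandwidth + 1)):
            w = bandwidth + 1 - abs(i - j)  # weight falls off linearly with distance
            num += w * counts[j]
            den += w
        out.append(num / den)
    return out
```

Because smoothing operates on the aggregates, its cost also scales with the number of bins, not the number of records.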

Page 14: 2013.10.24 big datavisualization

[1] Wickham 2013

Page 15: 2013.10.24 big datavisualization

4. Plot: Visualize the aggregate summary values

Page 16: 2013.10.24 big datavisualization

Plot: Visual Encoding

Choose Most Effective Encoding [Cleveland & McGill ’84]

1D Plot -> Position or Length Encoding
Histograms, line charts, etc.

2D Plot -> Area or Color Encoding
Spatial dimensions (x, y) already allocated.
While less effective than area for magnitude estimation, color can be used at the per-pixel level and provides an overall “gestalt”.

Page 17: 2013.10.24 big datavisualization

Standard Color Ramp
Counts near zero are white.
-> Outliers are missed

Add Discontinuity after Zero
Counts near zero remain visible.
-> Outliers can be seen

Page 18: 2013.10.24 big datavisualization

Linear Alpha Interpolation is not perceptually linear.

Cube-Root Alpha Interpolation approximates perceptual linearity.

Page 19: 2013.10.24 big datavisualization

Color Encoding

Luminance (in range 0-1)

Min. Non-Zero Intensity (α=0.15) [1]
Perceptual Scaling (γ=1/3) [2]
User-Adjustable Min/Max Values [3]

[1] Keep small non-zero values visible (outliers!)
[2] Match color ramp to perceptual distances
[3] Enable exploration across value ranges
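The three parameters above could be combined into a single mapping from a bin's value to an intensity in the range 0-1. The formula below is one plausible combination, assumed for illustration; the slides give the parameter values (α=0.15, γ=1/3) but not the exact expression, and `bin_alpha` is a hypothetical name.

```python
def bin_alpha(value, vmin, vmax, alpha_min=0.15, gamma=1/3):
    """Map a bin value to an intensity in [0, 1].

    Empty bins stay at 0; any non-empty bin gets at least alpha_min,
    and the remaining range is scaled by a power of gamma (cube root by default).
    vmin and vmax are the user-adjustable ends of the value range.
    """
    if value <= 0:
        return 0.0
    if vmax <= vmin:
        return 1.0
    clamped = min(max(value, vmin), vmax)
    t = (clamped - vmin) / (vmax - vmin)
    return alpha_min + (1 - alpha_min) * t ** gamma
```

Under these assumptions a bin holding 1 record out of a maximum of 1000 maps to roughly 0.235, whereas a linear ramp would give 0.001, which is effectively invisible.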

Page 20: 2013.10.24 big datavisualization

Design Space of Binned Plots

Page 21: 2013.10.24 big datavisualization

Interaction

Page 22: 2013.10.24 big datavisualization

Interaction Techniques?
1. Select: Detail-on-Demand
2. Navigate: Pan & Zoom
3. Query: Brush & Link

Page 23: 2013.10.24 big datavisualization
Page 24: 2013.10.24 big datavisualization
Page 25: 2013.10.24 big datavisualization

5-D Data Cube: Month, Day, Hour, X, Y

[Figure: data cube with axes Month (0 to 11), Day (0 to 30), Hour (0 to 23), X, and Y]

12 x 31 x 24 x 512 x 512 = ~2.3 billion cells

Page 26: 2013.10.24 big datavisualization

Brushing January: Month, Day, Hour, X, Y

[Figure: data cube with axes Month (0 to 11), Day (0 to 30), Hour (0 to 23), X, and Y]

31 x 24 x 512 x 512 = ~195 million cells
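The cell counts can be checked directly, and brushing a single month amounts to filtering the cube on one axis and summing over the rest. The sketch below stores the cube as a sparse dictionary keyed by (month, day, hour, x, y); that layout and the function names are assumptions for illustration only.

```python
from collections import Counter
from math import prod

DIMS = {"month": 12, "day": 31, "hour": 24, "x": 512, "y": 512}

def cell_count(dims, fixed=()):
    """Number of cells left after fixing the named dimensions to a single bin."""
    return prod(size for name, size in dims.items() if name not in fixed)

def brush(cube, dim_index, selected, keep):
    """Sum the cells whose bin along dim_index is selected, grouped by the kept axes."""
    out = Counter()
    for key, count in cube.items():
        if key[dim_index] in selected:
            out[tuple(key[i] for i in keep)] += count
    return out

# Tiny example cube: (month, day, hour, x, y) -> count
cube = {(0, 3, 14, 10, 20): 5, (0, 4, 9, 10, 20): 2, (1, 3, 14, 10, 20): 7}
january_heatmap = brush(cube, dim_index=0, selected={0}, keep=(3, 4))
```

Even with the month fixed, the brushed slice still spans about 195 million cells, which motivates the smaller data tiles introduced next.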

Page 27: 2013.10.24 big datavisualization
Page 28: 2013.10.24 big datavisualization
Page 29: 2013.10.24 big datavisualization

Multivariate Data Tiles
1. Send data, not pixels
2. Embed multi-dim data

Page 30: 2013.10.24 big datavisualization

Full 5-D Cube

For any pair of 1D or 2D binned plots, the maximum number of dimensions needed to support brushing & linking is four.
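The bound of four follows from taking the union of the dimensions shown in the two linked plots: at most 2 + 2. A small sketch, with hypothetical function names and an assumed set of plots (one X-Y heatmap plus three histograms):

```python
from itertools import combinations

def tile_dims(plot_a, plot_b):
    """Dimensions a tile must carry so that brushing one plot can update the other."""
    return tuple(sorted(set(plot_a) | set(plot_b)))

def required_tiles(plots):
    """Distinct dimension sets needed to link every pair of plots."""
    return {tile_dims(a, b) for a, b in combinations(plots, 2)}

plots = [("x", "y"), ("month",), ("day",), ("hour",)]
tiles = required_tiles(plots)
```

For this assumed set of plots no pair needs more than three dimensions; four are needed only when two 2D plots with disjoint dimensions are linked.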

Σ Σ Σ Σ

Page 31: 2013.10.24 big datavisualization

X: 512 bins
Y: 512 bins

Page 32: 2013.10.24 big datavisualization
Page 33: 2013.10.24 big datavisualization
Page 34: 2013.10.24 big datavisualization
Page 35: 2013.10.24 big datavisualization
Page 36: 2013.10.24 big datavisualization
Page 37: 2013.10.24 big datavisualization
Page 38: 2013.10.24 big datavisualization
Page 39: 2013.10.24 big datavisualization
Page 40: 2013.10.24 big datavisualization
Page 41: 2013.10.24 big datavisualization
Page 42: 2013.10.24 big datavisualization
Page 43: 2013.10.24 big datavisualization
Page 44: 2013.10.24 big datavisualization
Page 45: 2013.10.24 big datavisualization
Page 46: 2013.10.24 big datavisualization
Page 47: 2013.10.24 big datavisualization
Page 48: 2013.10.24 big datavisualization

Full 5-D Cube: ~2.3B bins

13 3-D Data Tiles: ~17.6M bins (in 352KB!)

Σ Σ Σ Σ
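The reduction in bin count can be illustrated with simple arithmetic. The decomposition below (three 3-D projections that pair X and Y with Month, Day, and Hour) is an assumption that happens to match the ~17.6M figure; the transcript does not say how the 13 tiles are laid out.

```python
from math import prod

def cells(*bins_per_dim):
    """Number of cells in a dense cube with the given bins per dimension."""
    return prod(bins_per_dim)

full_cube = cells(12, 31, 24, 512, 512)

# Assumed decomposition: X x Y x Month, X x Y x Day, X x Y x Hour
projections = cells(512, 512, 12) + cells(512, 512, 31) + cells(512, 512, 24)

reduction = full_cube / projections
```

Summing out the dimensions a plot pair does not need is what keeps each tile at three dimensions instead of five.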

Page 49: 2013.10.24 big datavisualization

Query & Render on GPU via WebGL

Pack data tiles as PNG image files, bind to WebGL as image textures.
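A minimal sketch of packing bin counts into a PNG, using only the Python standard library. It stores one 32-bit count per RGBA pixel; this packing scheme and the function names are assumptions for illustration, since the slides do not specify how imMens lays out values inside the image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def _chunk(tag, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

def pack_tile_png(counts, width, height):
    """Encode row-major 32-bit counts as an 8-bit RGBA PNG, one bin per pixel."""
    if len(counts) != width * height:
        raise ValueError("counts must have width * height entries")
    raw = bytearray()
    for r in range(height):
        raw.append(0)  # scanline filter type 0 (no filtering)
        for c in counts[r * width:(r + 1) * width]:
            raw += struct.pack(">I", c)  # the four bytes become R, G, B, A
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)  # 8-bit RGBA
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(raw))) + _chunk(b"IEND", b""))

def unpack_tile_png(png):
    """Decode a PNG written by pack_tile_png back into the list of counts."""
    pos, idat, width, height = len(PNG_SIGNATURE), b"", 0, 0
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        tag, data = png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        if tag == b"IHDR":
            width, height = struct.unpack(">II", data[:8])
        elif tag == b"IDAT":
            idat += data
        pos += 12 + length  # length field + type + data + CRC
    raw = zlib.decompress(idat)
    stride = 1 + 4 * width
    counts = []
    for r in range(height):
        counts.extend(struct.unpack(">%dI" % width, raw[r * stride + 1:(r + 1) * stride]))
    return counts
```

PNG compression is lossless, so the counts survive the round trip exactly, and sparse tiles with many zero bins compress well.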

Page 50: 2013.10.24 big datavisualization

Query & Render on GPU via WebGL

Σ

Invoke program for each output bin. Executes in parallel on GPU.
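The per-output-bin program can be emulated on the CPU to show what each invocation computes. This Python sketch assumes a dense 3-D tile indexed as tile[x][y][t] and a brushed range on the third dimension; the names are hypothetical and the real version would run as a WebGL shader.

```python
def rollup_bin(tile, x, y, brush_lo, brush_hi):
    """Work for one output bin: sum the tile over the brushed range (inclusive)."""
    return sum(tile[x][y][brush_lo:brush_hi + 1])

def render(tile, brush_lo, brush_hi):
    """Run the per-bin program for every output bin.

    Each call to rollup_bin reads only its own slice of the tile, so the
    calls are independent and could execute in parallel.
    """
    return [[rollup_bin(tile, x, y, brush_lo, brush_hi)
             for y in range(len(tile[x]))]
            for x in range(len(tile))]

# 2 x 2 output bins, 3 bins along the brushed dimension
tile = [[[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [0, 0, 1]]]
heatmap = render(tile, brush_lo=0, brush_hi=1)
```

The cost per output bin depends on the width of the brushed range, not on the number of underlying records.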

Page 51: 2013.10.24 big datavisualization

Query & Render on GPU via WebGL

Σ

Page 52: 2013.10.24 big datavisualization

Performance Benchmarks
Simulate interaction: brushing & linking across binned plots.

- imMens vs. Profiler
- 4x4 and 5x5 plots
- 10 to 50 bins

Measure time from selection to render.

Test setup:
2.3 GHz MacBook Pro (4-core)
NVIDIA GeForce GT 650M
Google Chrome v.23.0

Page 53: 2013.10.24 big datavisualization

~50fps querying of visual summaries of 1B data points.

[Chart: In-Memory Data Cube vs. imMens, by Number of Data Points; 5 dimensions x 50 bins/dim x 25 plots]

Page 54: 2013.10.24 big datavisualization

[1] Lins et al. InfoVis 2013

[2] Sismanis et al. SIGMOD 2002

NanoCubes

Page 55: 2013.10.24 big datavisualization

[1] Lins et al. InfoVis 2013

NanoCubes

Page 56: 2013.10.24 big datavisualization

Resources
imMens: vis.stanford.edu/projects/immens
Tableau Public: tableausoftware.com/public
BigVis (R): github.com/hadley/bigvis
Nanocubes: nanocubes.net
BlinkDB: blinkdb.org
MapD: geops.csail.mit.edu/docs/

Page 57: 2013.10.24 big datavisualization

Acknowledgments
Zhicheng “Leo” Liu
Biye Jiang

Page 58: 2013.10.24 big datavisualization

Visualizing “Big” Data
Sean Kandel & Jeffrey Heer
Trifacta Inc. @trifacta

