A Browser Framework for Visualizing Big Data
Leo Meyerovich, Matt Torok, Ras Bodik @LMeyerov UC Berkeley / Graphistry
SUPERCONDUCTOR
Why Big Data Visualization?
Yes No
Analysis Result: No
Histogram of Voter Turnout by Town
[Histogram: x-axis voter turnout (0% to 100%), y-axis # towns]
Most towns had a 40% voter turnout
Who’s ballot stuffing?
Tree Map Demo
Ex: Time Series in IBM’s IT Monitor
GE Demo
parse
selectors
layout
render
Browser Engine ~= Chart Engine!
DSLs
Exploit Parallelism in Each One
layout
render
selectors
Deploy Today via Parallel JavaScript
HTML data CSS styling JS script
Pixels
Parser
Selectors
Layout
Renderer
JavaScript VM
Renderer.GL
Parser.js webpage
Layout.CL
Selectors.CL GPU
superconductor.js
data styling widgets
data viz
Data stays on GPU!
Compiler
DSL 1: Data via JSON
JavaScript, Ruby, Python, Java, …
Easy… until 1-10s data loading
Parsing Demo
span b { width: 83% }
div .dog { float: left }
p, span b { font-size: 7px }
DSL 2: Designers Selectors
<div>
<p> <span>
<img class="dog"> <b>
<b>
<i> <b>
span b { width: 83% }
div .dog { float: left }
p, span b { font-size: 7px }
Problem: O(sels * tree log tree)
<div>
<p> <span>
<img class="dog"> <b>
<b>
<i> <b>
<span>
1K-100K HTML nodes × 1-10K selectors
Good News: Embarrassing Parallelism!
<div>
<p> <span>
<img class="dog"> <b>
<b>
<i> <b>
span b { width: 83% }
div .dog { float: left }
p, span b { font-size: 7px }
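A minimal sketch of why selector matching is embarrassingly parallel: each node's style depends only on that node and its ancestors, so the per-node loop has no cross-node dependencies. The tree encoding, the `rules` format, and the helpers `flatten`, `matches`, and `styleAll` are hypothetical names chosen for this illustration, not Superconductor's API, and only tag, class, and descendant selectors are handled.

```javascript
// Hypothetical tree encoding, loosely following the slide's example.
const tree = {
  tag: "div", cls: null, children: [
    { tag: "p", cls: null, children: [
      { tag: "img", cls: "dog", children: [] },
      { tag: "b", cls: null, children: [] },
    ] },
    { tag: "span", cls: null, children: [
      { tag: "b", cls: null, children: [] },
    ] },
  ],
};

// Each rule is a chain of simple steps joined by the descendant combinator.
// "p, span b" is expanded into two rules.
const rules = [
  { steps: [{ tag: "span" }, { tag: "b" }], style: { width: "83%" } },
  { steps: [{ tag: "div" }, { cls: "dog" }], style: { float: "left" } },
  { steps: [{ tag: "p" }], style: { fontSize: "7px" } },
  { steps: [{ tag: "span" }, { tag: "b" }], style: { fontSize: "7px" } },
];

// Flatten the tree (preorder) into an array of nodes with parent links.
function flatten(root) {
  const nodes = [];
  (function visit(node, parent) {
    const entry = { tag: node.tag, cls: node.cls, parent };
    nodes.push(entry);
    node.children.forEach((child) => visit(child, entry));
  })(root, null);
  return nodes;
}

const stepMatches = (step, node) =>
  (step.tag === undefined || step.tag === node.tag) &&
  (step.cls === undefined || step.cls === node.cls);

// Match right to left: the last step must match the node itself,
// earlier steps must match ancestors in order.
function matches(steps, node) {
  let i = steps.length - 1;
  if (!stepMatches(steps[i], node)) return false;
  i--;
  for (let a = node.parent; a !== null && i >= 0; a = a.parent) {
    if (stepMatches(steps[i], a)) i--;
  }
  return i < 0;
}

// Every iteration of this map is independent of the others;
// this is the loop a parallel engine would distribute across cores.
function styleAll(nodes, ruleList) {
  return nodes.map((node) =>
    ruleList.reduce(
      (style, rule) => (matches(rule.steps, node) ? { ...style, ...rule.style } : style),
      {}
    )
  );
}

const nodes = flatten(tree);
const styles = styleAll(nodes, rules);
```

Here `styles[i]` holds the merged style of the i-th node in preorder. The sketch runs the loop sequentially; it only shows that nothing in the loop body reads another node's result.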
Selector Engine Implementation
selectors.css
selectors.webcl
compiler.js
…
Dynamic animation! Edit style at runtime, then recompile
DSL 3: Layout
CSS: parallelizable layout
JS: flexible compute
FTL: parallelizable compute in declarative layout
Step 1/2: Schema of Visualization
Tree class hierarchy
Node attributes
[Figure: tree nodes annotated with x, y, w, h attributes]
Step 2/2: Schema Attribute Constraints
[Figure: tree of Root, HBox, and Leaf nodes; inputs (10px, 5px) and vars (x, y, w, h) attached to nodes]
[Kastens 1980, Saraiva 2003] [WWW 2010, PPOPP 2013]
HBox → left=IBox right=IBox
w := left.w + right.w
…
1. Local
2. Single-assignment
Compiler Output: Layout as Tree Traversals
[Figure: successive tree traversals, first computing w,h, then x,y, …]
1. Works for all data sets
2. Compiler automatically parallelizes!
[WWW 2010]
logical joins
logical spawns
Parallel
Parallelism in each traversal!
Mozilla, Microsoft
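A minimal sketch of layout as tree traversals, assuming a two-pass schedule: a bottom-up pass for w and h, then a top-down pass for x and y. Only `w := left.w + right.w` comes from the slides; the rules for h, x, and y, and the names `leaf`, `hbox`, `sizePass`, and `positionPass`, are assumptions made for illustration.

```javascript
// Leaves carry input sizes; HBox attributes are computed by the passes.
const leaf = (w, h) => ({ kind: "Leaf", w, h });
const hbox = (left, right) => ({ kind: "HBox", left, right });

// Pass 1 (bottom-up): sizes. The two recursive calls are independent
// (logical spawns); the parent's assignments happen after both (logical join).
function sizePass(n) {
  if (n.kind === "HBox") {
    sizePass(n.left);
    sizePass(n.right);
    n.w = n.left.w + n.right.w;            // from the slide
    n.h = Math.max(n.left.h, n.right.h);   // assumed rule
  }
}

// Pass 2 (top-down): positions. Children only read attributes
// that their parent has already set.
function positionPass(n, x, y) {
  n.x = x;
  n.y = y;
  if (n.kind === "HBox") {
    positionPass(n.left, x, y);               // assumed rule
    positionPass(n.right, x + n.left.w, y);   // assumed rule
  }
}

const root = hbox(hbox(leaf(10, 5), leaf(10, 5)), leaf(20, 8));
sizePass(root);
positionPass(root, 0, 0);
```

Each attribute is assigned once and reads only a node's own or neighboring attributes, which is what lets a compiler pick the pass order and parallelize within each pass.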
DSL 4: Rendering as a Layout Extension
HBox → left=IBox right=IBox
@render @Rectangle(x,y,w,h,color)
…
w := left.w + right.w
…
parallel for loop (level synchronous)
Traversals: Flattened & Level-Synchronous
level 1
Tree
level n
w h x y
Nodes in arrays
Array per attribute
Compiler automates code + data transformations.
[Blelloch 93]
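A minimal sketch of a flattened, level-synchronous bottom-up traversal, assuming nodes are stored in breadth-first order with one typed array per attribute. The array names, the `levels` table, and `bottomUpWidths` are hypothetical; in a GPU version the inner loop would be the parallel for loop, with one work item per node in the level.

```javascript
// Tree: root HBox(inner HBox(Leaf 10, Leaf 10), Leaf 20), in breadth-first order.
// index: 0 = root, 1 = inner HBox, 2 = Leaf 20, 3 = Leaf 10, 4 = Leaf 10
const left = Int32Array.from([1, 3, -1, -1, -1]);   // -1 marks a leaf
const right = Int32Array.from([2, 4, -1, -1, -1]);
const w = Float32Array.from([0, 0, 20, 10, 10]);    // one array per attribute

// [start, end) index range of each tree level in the flattened arrays.
const levels = [[0, 1], [1, 3], [3, 5]];

function bottomUpWidths(left, right, w, levels) {
  // Levels run sequentially from the deepest one up to the root.
  for (let l = levels.length - 1; l >= 0; l--) {
    const [start, end] = levels[l];
    // Nodes within one level are independent of each other.
    for (let i = start; i < end; i++) {
      if (left[i] >= 0) w[i] = w[left[i]] + w[right[i]];
    }
  }
}

bottomUpWidths(left, right, w, levels);
```

Because children always sit in a deeper level than their parent, every read in the inner loop refers to a value finished in an earlier iteration of the outer loop.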
circ(…)
Problem: Dynamic Memory Allocation on GPU?
square(…) rect(…); …
line(…); …
rect(…); …
oval(…)
1.0 0.8 0.5 0.2 0 0.2
function circ(x, y, r) {
  // dynamic allocation: the buffer size depends on the runtime radius r
  const buffer = new Array(r * 10);
  for (let i = 0; i < r * 10; i++) buffer[i] = Math.cos(i);
  return buffer;
}
Dynamic Allocation as SIMD Traversals
allocCirc(…) → 4
allocRect(…) → 6
allocLine(…) → 6
allocRect(…) → 7
fillCirc(…)
fillRect(…)
fillLine(…)
fillRect(…)
1. Prefix sum for needed space
2. Allocate buffers
3. Fill vertex buffers in parallel
4. Give OpenGL the buffer pointer
[Figure: filled vertex buffers, e.g. 1.0 0.8 0.5 0.2 0 0.2]
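A minimal sketch of the first three steps, using the sizes 4, 6, 6, 7 from the alloc calls on this slide. The helper names `exclusivePrefixSum` and `allocateAndFill` are hypothetical, the fill callback is a stand-in for fillCirc, fillRect, and fillLine, and the prefix sum is written sequentially here where a GPU version would use a parallel scan.

```javascript
// Step 1: an exclusive prefix sum turns per-node sizes into buffer offsets.
function exclusivePrefixSum(sizes) {
  const offsets = new Uint32Array(sizes.length);
  let total = 0;
  for (let i = 0; i < sizes.length; i++) {
    offsets[i] = total;
    total += sizes[i];
  }
  return { offsets, total };
}

function allocateAndFill(sizes, fill) {
  const { offsets, total } = exclusivePrefixSum(sizes);
  // Step 2: one allocation for all nodes instead of one per node.
  const buffer = new Float32Array(total);
  // Step 3: each node writes only its own slice, so iterations are independent.
  for (let i = 0; i < sizes.length; i++) {
    fill(buffer.subarray(offsets[i], offsets[i] + sizes[i]), i);
  }
  return { buffer, offsets };
}

const sizes = [4, 6, 6, 7];
// Stand-in fill: write the node number (1-based) into that node's slice.
const { buffer, offsets } = allocateAndFill(sizes, (slice, node) => slice.fill(node + 1));
```

Splitting each draw call into a size-only "alloc" pass and a "fill" pass is what removes dynamic allocation from the parallel part: sizes are known before any buffer is written.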
CPU vs. GPU for Election Treemap: 5 traversals over 100K nodes
[Chart: time (ms), axis ticks 1 to 10,000, for layout (4 passes), rendering pass, and TOTAL; series: Naïve JS (Chrome 26) vs. GPU (Safari + WebCL 11/3); 24fps marker]
WebCL: 31X
WebCL: 5X
COMBINED: 54X!
DSLs for Big Data Visualization, Today.
Superconductor
• Explore data with interactive visualization
• Script charts like web pages: DSLs!
• Hardware accelerate each DSL
• We use WebCL: GPGPU, keeps data on GPU, dynamic compilation
Find us!
sc-lang.com Leo: @LMeyerov / [email protected] Matt: [email protected]
Extra
Parsing Demo
Optimizing JSON Parsing
Parallel parsing easy! … when you fix the format
Server: raw.json (23MB) → partition → raw1.json (1.9MB), …, raw12.json → compress + zip → csr1.zip (0.2MB), …, csr12.zip
Browser: parallel parse (multiple workers) → big JavaScript object
Each worker:
1. native JSON parse # csr.json
2. decompress # obj.json
3. 0-copy return: typed arrays!
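A minimal sketch of partitioned parsing, assuming a hypothetical partition format in which each part is a JSON array of numbers. `parsePartition` and `mergeParts` are illustrative names. In a browser each `parsePartition` call would run in its own Web Worker and hand its result back with `postMessage(result, [result.buffer])`, which transfers the underlying ArrayBuffer instead of copying it; here the calls run sequentially so the sketch stays self-contained.

```javascript
// Parse one partition into a typed array (the per-worker job).
function parsePartition(text) {
  return Float64Array.from(JSON.parse(text));
}

// Combine per-partition results into one array on the main thread.
function mergeParts(parts) {
  const total = parts.reduce((n, part) => n + part.length, 0);
  const out = new Float64Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}

// Hypothetical partitions standing in for raw1.json, …, raw12.json.
const partitions = ["[1,2,3]", "[4,5]", "[6]"];
const merged = mergeParts(partitions.map(parsePartition));
```

Partitioning works because each part is a complete JSON document on its own, so no worker needs to know where another worker's input begins or ends.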