Big Data Ecosystem & The Stratosphere Project

Stratosphere: Above the Clouds

Stratosphere

Massively Parallel Analytics

Alexander Alexandrov, Stephan Ewen, Joseph Harjung, Fabian Hüske, Moritz Kaufmann, Aljoscha Krettek, Volker Markl, Kostas Tzoumas, Sebastian Schelter

Stratosphere – Parallel Analytics Beyond MapReduce

The Big Data Context


Large Quantities of Data

Diverse Data Structures

Complex Analysis Tasks

[Diagram labels: NoMapReduce, SQL, NoSQL, SQL--, ?]

Question 1: Is it faster to add a HiveQL parser and an HDFS adapter to your favorite parallel database, or to develop a parallel engine from scratch?

Question 2: Have we closed the circle (“we want SQL!”) or is there more in analytics?

[Slide build-up, labels only: scripting; SQL--; XQuery+/-; scalable parallel sort (“not a sorting problem!”); column store--; a query plan]


Question 3: How do we architect systems for the next wave of rich data analysis?

10 commandments for Big Data Analytics


case class Vertex(id: Int, component: Int)
case class Edge(from: Int, to: Int)

val vertices = hdfsFile(…)
val edges = hdfsFile(…)

val result = step iterate (vertices distinctBy {_.id}, vertices)

def step = (s: Data[Vertex], ws: Data[Vertex]) => {
  // Send each workset vertex's component id to its neighbors
  val allNeighbors = ws join edges on {_.id} isEqualTo {_.from} using {(v, e) => Vertex(e.to, v.component)}
  // Keep the smallest candidate component per vertex
  val minNeighbors = allNeighbors reduceBy {_.id} (minBy _.component)
  // Keep only candidates that improve on the current solution
  val s1 = minNeighbors join s on {_.id} isEqualTo {_.id} using {(c, o) => if (c.component < o.component) Some(c) else None}
  (s1, s1)
}

(I) Thou shalt…


… use declarative languages!


Executive Summary

Connected components of a graph.

- Joins and aggregations on custom data types

- Incremental / Delta Iterations

- Mixture of operators and UDFs
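The incremental (delta) iteration for connected components can be sketched on a single machine with plain Scala collections. This is only an illustration under assumptions, not the Stratosphere API: the object name `DeltaIterationSketch`, the function `connectedComponents`, the in-memory `Map` standing in for the solution set, and the choice to treat edges as undirected are all made up for this example.

```scala
object DeltaIterationSketch {
  case class Vertex(id: Int, component: Int)
  case class Edge(from: Int, to: Int)

  // Solution set: current component id per vertex.
  // Workset: vertices whose component changed in the previous superstep.
  def connectedComponents(vertexIds: Seq[Int], edges: Seq[Edge]): Map[Int, Int] = {
    // Assumption: edges are undirected, so the reverse direction is added.
    val neighbors: Map[Int, Seq[Int]] =
      (edges ++ edges.map(e => Edge(e.to, e.from)))
        .groupBy(_.from)
        .map { case (from, es) => from -> es.map(_.to) }

    // Every vertex starts in its own component.
    var solution: Map[Int, Int] = vertexIds.distinct.map(id => id -> id).toMap
    var workset: Seq[Vertex] = solution.toSeq.map { case (id, c) => Vertex(id, c) }

    while (workset.nonEmpty) {
      // "Join" the workset with the edges: propose the component id to each neighbor.
      val candidates = for {
        v <- workset
        n <- neighbors.getOrElse(v.id, Seq.empty)
      } yield Vertex(n, v.component)

      // "Reduce" by vertex id: keep the smallest proposed component.
      val minPerVertex = candidates.groupBy(_.id).map { case (_, cs) => cs.minBy(_.component) }

      // "Join" with the solution set: keep only real improvements (the delta).
      val delta = minPerVertex.filter(c => solution.get(c.id).exists(_ > c.component)).toSeq

      solution = solution ++ delta.map(v => v.id -> v.component)
      workset = delta // only changed vertices are processed in the next superstep
    }
    solution
  }
}
```

For example, `DeltaIterationSketch.connectedComponents(Seq(1, 2, 3, 4, 5), Seq(Edge(1, 2), Edge(2, 3), Edge(4, 5)))` assigns vertices 1 to 3 to component 1 and vertices 4 and 5 to component 4. The workset shrinks with every superstep, which is the effect the incremental variant exploits.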









(II) Thou shalt…

… accept external (dynamic) sources! “In situ” data - no load









(III) Thou shalt…

… use rich primitives! (beyond MapReduce)










Map

Reduce

Cross

Match

CoGroup









(IV) Thou shalt…

… define queries and UDFs in the same language!

UDF

Query definition









(V) Thou shalt…

… use an algebraic but rich data model!

Custom object-oriented and functional data types

Use functions as references to fields/attributes









(VI) Thou shalt…

… optimize! Auto-parallelization and optimization à la relational databases.









(VII) Thou shalt…

… not treat UDFs as black boxes!

Static code analysis of UDFs to determine field accesses and modifications

Vastly increases optimization potential









(VIII) Thou shalt…

… iterate/recurse!

Step function

Needed for most interesting analysis cases









(IX) Thou shalt…

… exploit dynamic computation!

[Figure: # vertices (thousands) per superstep, Naïve (Bulk) vs. Incremental]

Pregel as a Stratosphere plan with comparable performance.


(X) Thou shalt…

… use a scalable and efficient execution engine!

Pipeline and data parallelism, flexible checkpointing, optimized network data transfers


Conclusion

Write like a programming language

Execute like a database

Add a bit of "languages and compilers" sauce to the database stack…


Stratosphere Programming Stack


Nephele Dataflow Engine

Runtime Operators

Sopremo Compiler

Meteor Script

Scala

Scala-Compiler Plugin

Stratosphere Optimizer

Nephele Parallel Dataflow

PACT Program

Layered approach – several entry points to the system



Pact program / Scala program

Scala compiler plug-in

Runtime: hash- and sort-based out-of-core operator implementations, memory management

Stratosphere optimizer: picks data shipping and local strategies, operator order

Execution plan

Nephele Execution Engine: task scheduling, network data transfers, resource allocation, checkpointing

Job graph, Execution graph




Part 1: PARALLEL PROGRAMMING MODEL


Background: PACTs

[Diagram: Data → second-order function wrapping a first-order function (UDF) → Data; second-order functions: Map, Reduce, Cross, Match, CoGroup]

■ Data flow operators (UDFs) are first-order functions

■ UDFs are applied to the data through second-order functions that define the parallel semantics

■ Declarative, as execution strategies are not fixed

D. Battré, S. Ewen, F. Hueske, O. Kao, V. Markl, D. Warneke: Nephele/PACTs: a programming model and execution framework for web-scale analytical processing
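The five second-order functions can be illustrated on plain Scala collections. This is a rough single-machine sketch, not the PACT API: the object `PactSketch`, the method names (`matchOn` is used because `match` is a Scala keyword), and all signatures are assumptions. Each function only defines which records reach the UDF together; how that is executed in parallel is deliberately left open, which is the declarative aspect.

```scala
object PactSketch {
  // Map: the UDF is called once per record.
  def map[A, B](in: Seq[A])(udf: A => Seq[B]): Seq[B] = in.flatMap(udf)

  // Reduce: records are grouped by key; the UDF is called once per group.
  def reduce[A, K, B](in: Seq[A])(key: A => K)(udf: (K, Seq[A]) => Seq[B]): Seq[B] =
    in.groupBy(key).toSeq.flatMap { case (k, group) => udf(k, group) }

  // Cross: the UDF is called for every pair of the Cartesian product.
  def cross[A, B, C](l: Seq[A], r: Seq[B])(udf: (A, B) => Seq[C]): Seq[C] =
    for (a <- l; b <- r; c <- udf(a, b)) yield c

  // Match: the UDF is called for every pair of records with equal keys (equi-join).
  def matchOn[A, B, K, C](l: Seq[A], r: Seq[B])(lk: A => K, rk: B => K)(udf: (A, B) => Seq[C]): Seq[C] = {
    val index = r.groupBy(rk) // hash the right input by key
    for (a <- l; b <- index.getOrElse(lk(a), Seq.empty); c <- udf(a, b)) yield c
  }

  // CoGroup: both inputs are grouped by key; the UDF sees both groups of a key at once.
  def coGroup[A, B, K, C](l: Seq[A], r: Seq[B])(lk: A => K, rk: B => K)(udf: (K, Seq[A], Seq[B]) => Seq[C]): Seq[C] = {
    val lg = l.groupBy(lk)
    val rg = r.groupBy(rk)
    (lg.keySet ++ rg.keySet).toSeq.flatMap { k =>
      udf(k, lg.getOrElse(k, Seq.empty), rg.getOrElse(k, Seq.empty))
    }
  }
}
```

A UDF passed to `matchOn` could, for instance, emit a combined record only when a field exceeds a threshold, while a UDF passed to `reduce` could compute sums and averages per key.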


[Diagram: example PACT data flow]

Source 1: Extract (A,B)

Source 2: Extract (D,E)

Map: C := max(A,B)

Map: if (D>4) emit

Match (A = D): if (A>3) emit

Reduce (on A): sum(B), avg(C)

Sink 1



Iterative Programs

S. Ewen, K. Tzoumas, M. Kaufmann, V. Markl: Spinning Fast Iterative Data Flows. PVLDB 5(11), 2012

[Diagram, Bulk Iteration (PageRank): Join P and A as Match (on pid), emitting (tid, k=r*p); Sum up partial ranks as Reduce (on tid), emitting (pid=tid, r=∑ k); inputs p with (pid, r) and A with (pid, tid, p)]

[Diagram, Incremental Iteration (Connected Components): Match over workset Wi and Edges, CoGroup with solution set Si, producing Wi+1 and Di+1; records (vid, cid), (v1,v2), (v2, cid)]


How does it look in code

val result = step iterate (vertices distinctBy {_.id}, messages)

def step = (s: Data[Vertex], ws: Data[Message]) => {
  val sNext = ws join s on {…} isEqualTo {…} using {…}
  val wNext = sNext join edges on …
  (sNext, wNext)
}

Java

Scala


Incremental Iterations matter…

[Figure: # vertices (thousands) per superstep, Naïve (Bulk) vs. Incremental; runtime chart for Twitter and Webbase (20)]

Changes to the iteration's result for Connected Components in each superstep…

… and runtime.


Pregel as a Pact program




Part 2: THE PROGRAM COMPILER AND OPTIMIZER


Why an Optimizer for such Programs?


Do you want to hand-optimize that?


Pact Optimizer Overview

■ Cost-based optimizer produces a physical execution plan for a given PACT program
□ Annotates data channels with distribution patterns, e.g., broadcast, partition
□ Chooses physical execution strategies (e.g., hash/sort)
□ Reorders PACT functions
Deeply embeds MapReduce-style UDFs in the optimization

■ Optimization of iterative programs
□ Passing data between supersteps
□ Loop-invariant data
□ Efficient state maintenance in partitioned indexes

■ Challenge: semantics of user-defined functions are unknown


Current architecture

1) Analyze 2) Reorder 3) Parallelize


1) Opening the Black Boxes …


Analyze user code to discover:

■ Read set Rf: Attributes of the input record(s) that might influence output

■ Write set Wf: Attributes of the output record(s) that might have different values from respective input attributes

■ Emit cardinality Ef: Bounds on records emitted per call (1, >1, …)

PACTf

(Rf,Wf,Ef)


… via Static Code Analysis

void match(Record left, Record right, Collector col) {
    Record out = copy(left);
    if (left.get(0) > 3) {
        double a = right.get(2);
        out.set(2, 1.0 / a);
    }
    out.set(1, 42);
    out.set(3, right.get(0));
    out.set(4, right.get(1));
    out.set(5, right.get(2));
    col.emit(out);
}

Feasible:
1. No control flow between operators
2. Record data model, fixed API

Correct:
■ Difficulty comes from different code paths
■ Correctness guaranteed through conservatism
■ Add to R, W when in doubt
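The conservative rule ("add to R, W when in doubt") can be sketched on a tiny hand-written intermediate representation. Everything here is an assumption made for illustration: the object `UdfAnalysisSketch`, the statement types `SetField`, `If`, and `Emit`, and the hand transcription of the `match` UDF. A real analysis would work on the compiled code rather than on such a mini IR. The copy of `left` into `out` is treated as an implicit pass-through and not as a read.

```scala
object UdfAnalysisSketch {
  type Field = (String, Int) // (input name, field position)

  sealed trait Stmt
  // out.set(field, <expression that reads the given input fields>)
  case class SetField(field: Int, reads: Set[Field]) extends Stmt
  // if (<condition that reads the given input fields>) { body }
  case class If(condReads: Set[Field], body: Seq[Stmt]) extends Stmt
  case object Emit extends Stmt

  // maxEmits is an upper bound on emitted records per call (this IR has no loops).
  case class Props(readSet: Set[Field], writeSet: Set[Int], maxEmits: Int)

  // Conservative: a field counts as read or written if that happens on ANY code path.
  def analyze(stmts: Seq[Stmt]): Props =
    stmts.foldLeft(Props(Set.empty, Set.empty, 0)) {
      case (p, SetField(f, r)) =>
        p.copy(readSet = p.readSet ++ r, writeSet = p.writeSet + f)
      case (p, If(c, body)) =>
        val b = analyze(body) // the branch may or may not run, so its effects are included
        Props(p.readSet ++ c ++ b.readSet, p.writeSet ++ b.writeSet, p.maxEmits + b.maxEmits)
      case (p, Emit) =>
        p.copy(maxEmits = p.maxEmits + 1)
    }

  // The match UDF, transcribed by hand into the mini IR.
  val matchUdf: Seq[Stmt] = Seq(
    If(Set("left" -> 0), Seq(SetField(2, Set("right" -> 2)))),
    SetField(1, Set.empty),
    SetField(3, Set("right" -> 0)),
    SetField(4, Set("right" -> 1)),
    SetField(5, Set("right" -> 2)),
    Emit
  )
}
```

Under these assumptions, `analyze(matchUdf)` yields a read set of `left(0)`, `right(0)`, `right(1)`, `right(2)`, a write set of output fields 1 to 5, and at most one emitted record per call. Field 2 lands in the write set even though it is only written inside the `if` branch, which is the conservatism at work.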


Conditions for reordering UDFs

Enabled optimizations:
■ Selection push-down
■ (Bushy) join reordering
■ Aggregation push-down (equivalent to invariant grouping transformation [Chaudhuri & Shim 1994])
■ Reordering of non-relational Reduce functions

Theorem 1: Two Map operators can be reordered if their UDFs have only read-read conflicts

Theorem 2: For a Map and a Reduce, we additionally need the Reduce key groups to be preserved
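The read-read condition of Theorem 1 reduces to three set intersections over read and write sets. The sketch below assumes a hypothetical `Udf` record with named fields; the object and function names are illustrative only. The additional key-group check of Theorem 2 is not covered.

```scala
object ReorderSketch {
  case class Udf(name: String, readSet: Set[String], writeSet: Set[String])

  // Only read-read conflicts: neither UDF writes a field
  // that the other one reads or writes.
  def onlyReadReadConflicts(f: Udf, g: Udf): Boolean =
    (f.writeSet & g.readSet).isEmpty &&  // no write-read conflict
    (g.writeSet & f.readSet).isEmpty &&  // no read-write conflict
    (f.writeSet & g.writeSet).isEmpty    // no write-write conflict
}
```

For example, a Map that computes `C := max(A,B)` (reads A and B, writes C) and a Map that filters on D (reads D, writes nothing) share no written field, so the two can be swapped. A Map that reads C cannot be moved in front of the one that writes it.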


Optimizer Architecture (I)

■ Simple enumeration algorithm that checks pairwise reordering for all neighboring operators

■ Current problem: Walking all points in the search space

■ Next: Deduce join-graph-like information from reordering degrees-of-freedom
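A pairwise-swap enumeration can be sketched for the simplified case of a linear operator chain. The restriction to a chain, the object `EnumerationSketch`, and the `canSwap` predicate (which would be backed by the reordering conditions) are assumptions for this illustration. The sketch also shows the stated problem: every reachable plan is materialized.

```scala
object EnumerationSketch {
  // All operator orders reachable from `plan` by repeatedly swapping
  // two neighboring operators whenever `canSwap` allows it.
  def enumerate[Op](plan: Vector[Op])(canSwap: (Op, Op) => Boolean): Set[Vector[Op]] = {
    @scala.annotation.tailrec
    def loop(frontier: Set[Vector[Op]], seen: Set[Vector[Op]]): Set[Vector[Op]] =
      if (frontier.isEmpty) seen
      else {
        // Try every neighboring pair in every plan found in the last round.
        val next = for {
          p <- frontier
          i <- 0 until p.length - 1
          if canSwap(p(i), p(i + 1))
        } yield p.updated(i, p(i + 1)).updated(i + 1, p(i))
        val fresh = next -- seen
        loop(fresh, seen ++ fresh)
      }
    loop(Set(plan), Set(plan))
  }
}
```

With three operators that may all be swapped freely, the result contains all six orders; with n freely swappable operators it grows to n! plans, which motivates deriving join-graph-like information instead of walking the whole space.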


Optimizer Architecture (II)

■ Operators are defined in terms of possible global data properties (partitioning/replication/...) and local data properties (order/grouping/uniqueness/...)

■ Nodes propagate requested properties top-down
□ Filtered by the UDF's field modifications
□ Filtered by incompatibility
□ Every data flow edge has a set of possible requested properties

■ Requested properties are instantiated at each point
□ Global properties by exchange strategies
□ Local properties by local operators

■ Requested properties are used for pruning candidates (as with interesting properties)
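How requested global properties might be filtered by a UDF's field modifications and then instantiated by an exchange strategy can be sketched as follows. All types and names here (`GlobalProps`, `ShipStrategy`, `pushThroughUdf`, `shipStrategy`) are hypothetical and much simpler than a real optimizer; in particular, partitioning is only considered satisfied when the key sets are equal.

```scala
object PropertiesSketch {
  sealed trait GlobalProps
  case object RandomDistribution extends GlobalProps
  case class HashPartitioned(keys: Set[String]) extends GlobalProps
  case object FullyReplicated extends GlobalProps

  sealed trait ShipStrategy
  case object Forward extends ShipStrategy
  case class PartitionHash(keys: Set[String]) extends ShipStrategy
  case object Broadcast extends ShipStrategy

  // A requested partitioning can only be pushed below a UDF
  // if the UDF does not modify the partitioning fields.
  def pushThroughUdf(requested: GlobalProps, writeSet: Set[String]): Option[GlobalProps] =
    requested match {
      case HashPartitioned(keys) if (keys & writeSet).nonEmpty => None
      case other => Some(other)
    }

  // Choose an exchange strategy that establishes the requested property,
  // given the property the data already has.
  def shipStrategy(existing: GlobalProps, requested: GlobalProps): ShipStrategy =
    (existing, requested) match {
      case (_, RandomDistribution) => Forward // nothing requested
      case (HashPartitioned(have), HashPartitioned(want)) if have == want => Forward
      case (_, HashPartitioned(want)) => PartitionHash(want)
      case (FullyReplicated, FullyReplicated) => Forward
      case (_, FullyReplicated) => Broadcast
    }
}
```

Data that is already hash-partitioned on `pid` can be forwarded to an operator requesting that partitioning, whereas randomly distributed data needs a repartitioning step. This is where pushing a request further down the plan can save an exchange.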


Optimizer Architecture (III)

■ Determine static and dynamic data flow paths for iterations
□ Static path contains data that is loop-invariant

■ Use heuristics to place caches such that loop-invariant computations are not repeated
□ Cache loop-invariant data also in ordered form, or as hash tables

■ Weigh costs for static and dynamic path differently
□ Optimizer favors plans that "push" work into the static path
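One simple way to weigh the two paths differently is to charge the static path once and the dynamic path once per expected superstep. The formula, the `PlanCost` type, and the idea of an expected iteration count are assumptions for this sketch, not the optimizer's documented cost model.

```scala
object IterationCostSketch {
  case class PlanCost(name: String, staticCost: Double, dynamicCostPerIteration: Double) {
    // The static path runs once; the dynamic path runs in every superstep.
    def total(expectedIterations: Int): Double =
      staticCost + expectedIterations * dynamicCostPerIteration
  }

  // Pick the plan with the lowest total cost for the expected number of supersteps.
  def cheapest(plans: Seq[PlanCost], expectedIterations: Int): PlanCost =
    plans.minBy(_.total(expectedIterations))
}
```

With made-up costs, a plan with static cost 10 and dynamic cost 5 per superstep wins for a single superstep, while a plan with static cost 100 and dynamic cost 1 wins once 30 supersteps are expected. A plan that does more work up front (for example, building and caching a hash table on loop-invariant data) pays off as the iteration count grows.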


PageRank: Two Optimizer Plans

[Diagrams: two alternative execution plans for PageRank. Both join P and A and sum up partial ranks, with inputs p (pid, r) and A (pid, tid, p). Strategy labels in plan 1: broadcast, CACHE, buildHashTable (pid), probeHashTable (pid), part./sort (tid). Strategy labels in plan 2: partition (pid), CACHE, buildHashTable (pid), probeHashTable (pid), part./sort (tid), fifo]



Part 3: THE FUNCTIONAL LANGUAGE COMPILATION


The Compiler Mismatch

The Database Approach (Query Compiler): Parser/Checker → Optimizer → Code Generation → Runtime

Code generation AFTER the context of operation is fixed.

UDF Systems: MapReduce & Stratosphere (original) (Language Compiler): Parser/Checker → Code Generation → Optimizer → Runtime

Code generation BEFORE the context of operation is fixed.


The Program Compilation Pipeline

[Diagram: program compilation pipeline]

Program Code

Parser/Checker

ByteCode Generator

Analyzer and Code Generator

Global Schema Generator

Pact Optimizer

Program Instantiation

Schema and Code Finalization

Parallel Data Flow Generator

Parallel Data Flow

Language Compiler


1) Analyzing Data Types

■ Supported types
□ Primitive (integers, floating-point, strings, …), lists, tuples, product types (classes), summation types (class hierarchies), recursive types

■ Data types are logically flattened
□ Some fields are transparent members of the flat model, some are black-box members

■ Transparent members may be referenced in selector functions

■ Selector functions are likewise analyzed and translated into logical positions
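Logical flattening and the translation of a selector into a logical position can be sketched with a small type description. The ADT (`PrimitiveType`, `OpaqueType`, `ProductType`), the representation of a selector as a path of field names, and the function names are assumptions for illustration; the actual compiler plug-in analyzes Scala types and selector functions directly.

```scala
object FlattenSketch {
  sealed trait DataType
  case object PrimitiveType extends DataType // transparent leaf
  case object OpaqueType extends DataType    // black-box member
  case class ProductType(fields: List[(String, DataType)]) extends DataType

  // Flatten a nested type into (path, isTransparent) entries.
  // The index in the resulting list is the logical position.
  def flatten(t: DataType, prefix: List[String] = Nil): List[(List[String], Boolean)] =
    t match {
      case PrimitiveType => List(prefix -> true)
      case OpaqueType => List(prefix -> false)
      case ProductType(fields) =>
        fields.flatMap { case (name, ft) => flatten(ft, prefix :+ name) }
    }

  // Translate a selector such as _.edge.to, given as the path List("edge", "to"),
  // into a logical position. Only transparent members can be selected.
  def position(t: DataType, selector: List[String]): Option[Int] = {
    val i = flatten(t).indexWhere { case (path, transparent) => transparent && path == selector }
    if (i >= 0) Some(i) else None
  }
}
```

For a hypothetical message type with a nested `edge` (fields `from`, `to`), a black-box `payload`, and a `weight`, the flat model has four positions, and the selector path `edge.to` maps to position 1. The black-box member keeps a position but cannot be used as a key.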


2) Generating Glue Code

■ User code is pure Scala, with no Stratosphere-specific types or interfaces

■ Wrapper code is necessary to run it as a UDF in Stratosphere

■ Serializer/comparator code is generated as a template (omitting exact field positions, storing logical positions)

■ Code is inserted by modifying the program's abstract syntax tree


3) Schema Generation

■ Schema generated from the logical flattened model

■ Each field in every operator’s result gets a unique name
□ Unless it is an exact copy of an input field (info from code analysis)

■ Run Stratosphere optimizer
□ Potentially reorders functions

■ Prune unused fields early
□ Information on whether fields are accessed by a UDF comes from code analysis

■ Create physical data layout

■ Finalize serializer / comparator code


Some preliminary results...



Related Work

■ MapReduce
■ Pig, JAQL, Hive
■ AQL
■ Scope
■ Datalog for Machine Learning
■ BOOM
■ Twister / HaLoop
■ Spark
■ Naiad
■ Flume Java / Plume Java
■ Scalops
■ Jet
■ LINQ
