DESCRIPTION
Presented on Nov 22, 2012 at the Stratosphere Onsite Meeting 2012. The presentation explains extensions to Stratosphere's execution engine Nephele for latency-constrained stream processing. It also sheds light on future work in programming models for scalable, real-time stream processing. The Stratosphere Streaming Distribution, which implements the researched techniques, is available as open source via github.com: https://github.com/bjoernlohrmann/stratosphere More about me: http://www.cit.tu-berlin.de/menue/personen/lohrmann_bjoern/parameter/en/
Stream Processing under Latency Constraints
Björn Lohrmann
Daniel Warneke
Odej Kao
Technische Universität Berlin
Background
20.11.2012 - Björn Lohrmann - Stream Processing Under Latency Constraints
Nephele and PACTs currently focus on batch-job workloads.
What about streaming workloads?
- Generally possible with Nephele
- PACT support is work in progress
- May have different goals:
  - Meet pipeline latency and throughput requirements
  - Maximize/minimize other custom metrics
Motivation
Live processing of streamed data is also worth looking at. Some examples:
- Incremental search index updates (Google Percolator replaced MapReduce!)
- Social media streams (see Twitter Storm)
- Sensor networks in science and industry
- Multimedia streams:
  - User-generated content from mobile phones
  - CCTV cameras
Agenda
1. Latency-constrained stream processing with Nephele
   1. Internal framework design
   2. Meeting latency requirements
      1. Current Nephele design implications
      2. Latency constraints and measurement
      3. Strategy 1: Adaptive output buffer sizing
      4. Strategy 2: Task chaining
      5. Experimental results
2. Streaming on the PACT layer (work in progress)
   1. Sliding window semantics
Nephele IO Layer Design
[Diagram: Nephele I/O layer. Tasks n and n+1 are spread across compute nodes X, Y and Z; each task runs as its own thread/process, consumes data items from an input buffer queue, and emits them via an output buffer.]
Sample Application: Video Livestreaming
[Diagram: compute nodes 1 through n running the pipeline Decoder -> Merger -> Overlay -> Encoder -> RTP Server, with a Partitioner distributing the video streams across nodes.]
Latency w/o Optimizations
Setup:
- 200 nodes, 800 cores
- 32 KB output buffer size
- 6400 video streams
Results:
- Latency oscillates around 4 s
- Large buffers cause bursts
Implications for Streaming Applications
Effects of output buffers:
- Large buffer = high throughput, high latency
- Small buffer = low throughput, low latency
- A trade-off needs to be found to meet latency goals
Thread/process model:
- The 1 task = 1 thread model is flexible, but has overhead: thread scheduling, synchronization, communication
- Serialization may be necessary (bad for throughput and latency)
- An n tasks = 1 thread model can sometimes provide better throughput and latency
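The buffer trade-off above can be sketched with a toy model (illustrative numbers and function name, not Nephele code): an output buffer is only shipped once it is full, so a data item can wait up to buffer_size / data_rate before it even leaves the sender.

```python
def output_buffer_latency(buffer_size_bytes, byte_rate):
    """Worst-case time a data item can sit in an output buffer before the
    buffer fills up and is shipped (illustrative model, not Nephele's
    actual accounting)."""
    return buffer_size_bytes / byte_rate

# A 32 KB buffer on a channel carrying 16 KB/s adds up to 2 s of latency;
# shrinking it to 1 KB cuts that to ~62 ms, at the cost of shipping overhead.
slow = output_buffer_latency(32 * 1024, 16 * 1024)  # 2.0 s
fast = output_buffer_latency(1024, 16 * 1024)       # 0.0625 s
```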
Latency Constraints
QoS goal: meet latency constraint X, keep throughput as high as possible
We designed two strategies:
1. Adaptive output buffer sizing
2. Dynamic task chaining
Both strategies
- work autonomously (only the latency constraint is required)
- are applied on demand at runtime
[Figure: example pipeline annotated with a 300 ms latency constraint]
Measuring Latency
To meet latency constraints, we need to measure first!
General approach:
- Determine which tasks & channels need measuring
- Add & evaluate periodic timestamps
- Determine wait time caused by buffers
- Determining task latency is a little trickier without knowing task semantics (e.g. reduce)
- Ship measurement data to a collector node
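The timestamp-based measurement can be sketched as follows (hypothetical helper names; Nephele piggybacks such timestamps on a sample of the real data items rather than wrapping every item):

```python
import time

class TimestampTag:
    """Hypothetical helper: records when a data item entered a channel."""
    def __init__(self, payload, now=None):
        self.payload = payload
        self.sent_at = time.monotonic() if now is None else now

def channel_latency(tag, now=None):
    """Buffer wait plus transport time for one channel hop of a tagged item."""
    arrived_at = time.monotonic() if now is None else now
    return arrived_at - tag.sent_at

# Deterministic example with explicit clock values:
tag = TimestampTag("frame-42", now=10.0)
latency = channel_latency(tag, now=10.3)  # ~0.3 s spent in buffers / transit
```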
Measuring Latency
Problem:
- Combinatorial explosion of paths through the execution graph for which constraints must be evaluated
- Infeasible to do on a central node for large-scale workflows
Current approach:
- Split the execution graph into subgraphs (heuristic)
- Assign each subgraph to a worker node responsible for collecting measurements & applying runtime optimizations when a constraint is violated
- Successfully scaled to 200 nodes in experiments
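A minimal sketch of the subgraph splitting, assuming a linear execution graph and a naive contiguous-partition heuristic (the actual heuristic is more involved):

```python
def split_into_subgraphs(vertices, num_workers):
    """Split an execution-graph vertex list into contiguous subgraphs,
    one per worker, so each worker only evaluates constraints for the
    paths inside its own subgraph (simplistic stand-in for the heuristic)."""
    size = -(-len(vertices) // num_workers)  # ceiling division
    return [vertices[i:i + size] for i in range(0, len(vertices), size)]

parts = split_into_subgraphs(
    ["decoder", "merger", "overlay", "encoder", "rtp"], 2)
# Each sublist would be assigned to one worker node for measurement collection.
```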
Adaptive Output Buffer Sizing
Only applied when the latency constraint is violated.
For each channel:
- Determine the output buffer latency (obl)
- If obl > threshold, decrease the buffer size:
  size := max(0.98 * size, 200)
- If obl < threshold, increase the buffer size again:
  size := min(1.1 * size, 500 * 10^3)
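One adaptation step can be sketched as below. The multiplicative update is a reconstruction from the slide's constants (0.98 and a 200-byte floor for shrinking, 1.1 and a 500 * 10^3-byte ceiling for growing); it is not Nephele's exact formula.

```python
def adapt_buffer_size(size, obl, threshold):
    """One per-channel adaptation step: shrink the output buffer while its
    measured output buffer latency (obl) exceeds the threshold, grow it
    again otherwise. Constants follow the slide; the update rule itself is
    a reconstruction."""
    if obl > threshold:
        return max(int(size * 0.98), 200)        # shrink, floor at 200 B
    return min(int(size * 1.1), 500 * 10**3)     # grow, cap at 500 KB

# Starting from the 32 KB default with a 300 ms constraint violated:
smaller = adapt_buffer_size(32 * 1024, obl=0.5, threshold=0.3)
```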
Task Chaining
Conditions:
- Pipeline of unchained tasks
- Sum of CPU utilizations is < 90% of the capacity of one core (reuses available Nephele profiling data)
- Apply to the longest chainable pipeline of tasks
- Control-flow manipulation requires map-like tasks
[Diagram: tasks n and n+1 running as separate threads on a compute node before chaining, and fused into a single thread on one compute node after chaining.]
Again, only applied when the overall latency constraint is violated.
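The chaining condition can be sketched as a scan for the longest contiguous run of pipeline tasks whose summed CPU utilization stays below 90% of one core (simplified; the map-like control-flow check is omitted):

```python
def longest_chainable_run(cpu_utils, capacity=1.0, limit=0.9):
    """Return the longest contiguous slice of tasks whose summed CPU
    utilization stays below `limit` * `capacity` (one core). Sketch of the
    chaining condition only; Nephele also checks task semantics."""
    best = (0, 0)  # (length, start index)
    for start in range(len(cpu_utils)):
        total = 0.0
        for end in range(start, len(cpu_utils)):
            total += cpu_utils[end]
            if total >= limit * capacity:
                break
            if end - start + 1 > best[0]:
                best = (end - start + 1, start)
    length, start = best
    return cpu_utils[start:start + length]

# Tasks at 30%, 40%, 50% and 10% load: the first two (0.7 total) form the
# longest run that still fits under 90% of one core.
chain = longest_chainable_run([0.3, 0.4, 0.5, 0.1])
```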
Schematic Overview
[Diagram: distributed measurement setup under a 300 ms constraint. The job manager (JM) deploys scheduled tasks plus the distributed measurement setup onto the task managers (TMs); the TMs exchange payload data together with (latency, throughput) measurements, and issue buffer size updates and chain commands.]
Latency w/ Adaptive Buffer Sizing
[Chart: final latency with adaptive buffer sizing and the resulting improvement]
Latency w/ ABS + TC (Adaptive Buffer Sizing + Task Chaining)
[Chart: final latency with adaptive buffer sizing plus task chaining and the resulting improvement]
Moving Further up the Stack
Key steps to push constraints up to PACTs:
1. Find a model to express stream semantics for the blocking PACTs (reduce, match, cogroup, cross) and implement them in the PACT runtime
2. Define latency constraint annotations for PACT jobs
3. Adapt the PACT compiler to produce streamable plans and push constraints down to the Nephele layer
Stream Semantics
Literature shows many different ways of defining stream semantics for blocking relational operators.
Key aspect: the sliding data window
Degrees of freedom:
- Tuple-based: take the N most recent tuples
- Time-based: take all tuples whose timestamp is fresher than T time units
- Partition-based: partition the stream and take the union of the N most recent tuples from each partition
- Slide length (tuples or time units)
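The tuple-based and time-based variants can be sketched as follows (illustrative generators with slide length 1, not the Stratosphere API):

```python
from collections import deque

def tuple_window(stream, n):
    """Tuple-based sliding window: after each arrival, yield the N most
    recent tuples (slide length 1)."""
    win = deque(maxlen=n)
    for t in stream:
        win.append(t)
        yield list(win)

def time_window(stream, horizon):
    """Time-based window over (timestamp, value) pairs: keep only tuples
    whose timestamp is fresher than `horizon` time units."""
    win = deque()
    for ts, v in stream:
        win.append((ts, v))
        while win[0][0] <= ts - horizon:  # evict stale tuples
            win.popleft()
        yield list(win)

tuple_wins = list(tuple_window([1, 2, 3, 4], 2))
time_wins = list(time_window([(0, "a"), (1, "b"), (3, "c")], 2))
```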
Timestamp Sources
Obviously, for all window types we need timestamped tuples.
This affects determinism: depending on where timestamps are added, replaying the same data will or will not yield the same results.
Degrees of freedom:
- Timestamping at the data source, outside Stratosphere: yields identical results upon replay
- Timestamping within Stratosphere: yields different results upon replay (caused by scheduling, network, etc.)
Video Workflow Translated to PACTs
Pipeline: data source -> map (Decoder) -> reduce (Merger) -> map (Overlay) -> map (Encoder) -> reduce (RTP Server) -> data sink
Record types along the pipeline: (Stream-ID, Packet) into the Merger; (Group-ID, Frame) between Merger, Overlay and Encoder; (Group-ID, Packet) into the RTP Server. The timestamp source sits at the data source.
Merger reduce: Reduce-Key: Group-ID, Window-Type: Partition-Based, Partitioning-By: Stream-ID, Window-Size: 1 tuple (per partition), Slide-Length: 1
RTP Server reduce: Reduce-Key: Group-ID, Window-Type: Tuple-Based, Window-Size: 1 tuple, Slide-Length: 1
Proposed Model
Which semantics are needed is largely application-dependent.
Therefore, provide PACTs with common, user-configurable window semantics.
Key configuration parameters:
- Window type and slide length
- Force the user to define timestamp source locations
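Such a user-configurable window annotation might look like this (a hypothetical sketch; the class and field names are assumptions modeled on the video-workflow slide, not the Stratosphere API):

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class WindowSpec:
    """Hypothetical annotation carrying user-configurable window semantics
    for a blocking PACT."""
    window_type: Literal["tuple", "time", "partition"]
    window_size: int
    slide_length: int = 1
    partition_by: Optional[str] = None  # only meaningful for partition-based windows

    def __post_init__(self):
        if self.window_type == "partition" and self.partition_by is None:
            raise ValueError("partition-based windows need a partition key")

# The Merger reduce from the video workflow: one tuple per stream partition.
merger_window = WindowSpec("partition", window_size=1, partition_by="Stream-ID")
```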