Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Peter R. Pietzuch [email protected]
Integrating Scale Out and Fault Tolerance in Stream Processing using
Operator State Management
with Raul Castro Fernandez*Matteo Migliavacca+ and Peter Pietzuch*
*Imperial College London, +Kent Univerisity
Big data …
… in numbers: – 2.5 billions on gigabytes of data every day (source IBM)
– LSST telescope, Chile 2016, 30 TB nightly
… come from everywhere: – web feeds, social networking– mobile devices, sensors, cameras– scientific instruments– online transactions (public and private sectors)
… have value:– Global Pulse forum for detecting human crises internationally– real-time big data analytics in UK £25 billions à £216 billions in 2012-17– recommendation applications (LinkedIn, Amazon)
2
E processing infrastructure for big data analysis
A black-box approach for big data analysis • users issue analysis queries with real-time semantics• streams of data updates, time-varying rates, generated in real-time • streams of result data ü processing in near real-time
3
time
StreamProcessing
System
• queries consist of operators (join, map, select, ..., UDOs)• operators form graphs• operators process streams of tuples on-the-fly• operators span nodes
Distributed Stream Processing System
4
Elastic DSPSs in the Cloud
Real-time big data analysis challenge traditional DSPS:? what about continuous workload surges?? what about real-time resource allocation to workload variations?? keeping the state correct forstateful operators?
Massively scalable , cloud-based DSPSs [SIGMOD 2013]1. gracefully handles stateful operators’ state2. operator state management for combined scale out and
fault tolerance3. SEEP system and evaluation 4. related work5. future research directions
5
Stream Processing in the Cloud• clouds provide infinite pools of resources
6
? How do we build a stream processing platform in the Cloud?
• Failure resilience:– active fault-tolerance needs 2x resources– passive fault-tolerance leads to long
recovery times
• Intra-query parallelism:– provisioning for workload peaks
unnecessarily conservative
E dynamic scale out: increase resources when peaks appear
E hybrid fault-tolerance: low resource overhead with fast recovery
E Both mechanisms must support stateful operators
Stateless vs Stateful Operators
7
stateless:ü failure recoveryü scale out
filter > 5
filterfilter
counter
countercounter
stateful:× failure recovery× scale out
(the, 10)(with, 5) (the, 10)
(with, 5)the with the
(the, 2) !=12(with, 1) !=6
7 1 5 9 97
99
(the, …)
(with, …)with
operator state: a summary of past tuples’ processing
State Management
8
processing state: (summary of past tuples’ processing)
routing state: (routing of tuples)
buffer state: (tuples)
E operator state is an external entity managed by the DSPSE primitives for state managementE mechanisms (scale out, failure recovery) on top of primitivesE dynamic reconfiguration of the dataflow graph
A
B
C
State Management Primitives
9
takes snapshot of state and makes it externally available
E restore
E backup
AA
BB
E checkpoint
E partition
moves copy of state from one operator to another
splits state in a semantically correct fashion for parallel processing
State Management Scale Out, Stateful Ops
10
A
A
periodically, stateful operators checkpoint and back up state to designated upstream backup node, in memory
A
A
scale
out A
backup node already has state of operator to be parallelised
A’new operator
A
A’A
A’
E checkpoint
E backup
E partition
E restore upstream ops sendunprocessed tuplesto update checkpointed state
B
E How do we partition stateful operators?
Partitioning Stateful Operators• 1. Processing state modeled as (key, value) dictionary
• 2. State partitioned according to key k of tuples
• 3. Tuples will be routed to correct operator as of k
11
t=1, key=c, “computer”t=3, key=c, “cambridge”
t=3, (c, computer:1, cambridge:1)t=1, “computer”t=2, “laboratory”t=3, “cambridge” splitter
counter
t=2, key=l, “laboratory”
(a à k), A(l à z), A’
t=2, (l, laboratory:1)
counter
A
A’routing state
buffer state
processing state
Passive Fault-Tolerance Model
• recreate operator state by replaying tuples after failure:– upstream backup: sends acks upstream for tuples processed downstream
• may result in long recovery times due to large buffers:– system is reprocessing streams after failure è inefficient
12
ACKs
dataA B C D
Recovering using State Management (R+SM)
13
AA
Achang
e rou
ting
state
new instance
• Benefit from state management primitives:– use periodically backed up state on upstream node to recover faster– trim buffers at backup node– same primitives as in scale out
A
A
state is restored and unprocessed tuples are replayed from buffer
E same primitives for parallel recovery
AA’
State Management in Action: SEEP
14
(1)
(2)
(1) dynamic Scale Out: detect bottleneck , add new parallelised operator
(2) failure Recovery: detect failure, replace with new operator
EC2 stats
faultdetector
scale outcoordinator
deployment manager
query manager
queries
bottleneck detectorscaling policyVM pool
faults
recoverycoordinator
Dynamic Scale Out: Detecting bottlenecks
CPUutilisation
report
35%85%
30%
logical infrastructure view
35% 85% 30%bottleneckdetector
15
The VM Pool: Adding operators
• problem: allocating new VMs takes minutes...
16
bottleneckdetector
monitoringinformation
Cloud provider
VM1 VM2virtual machine pool
provision VM from cloud (order of mins)
add new VM to pool
fault detectorVM2
VM3 (dynamic pool size)
Experimental Evaluation• Goals:
– investigate effectiveness of scale out mechanism
– recovery time after failure using R+SM– overhead of state management
• Scalable and Elastic Event Processing (SEEP):– implemented in Java; Storm-like data flow model
• Sample queries + workload– Linear Road Benchmark (LRB) to evaluate scale out [VLDB’04]
• provides an increasing stream workload over time
• query with 8 operators, 3 are stateful; SLA: results < 5 secs
– Windowed word count query (2 ops) to evaluate fault tolerance
• induce failure to observe performance impact
• Deployment on Amazon AWS EC2– sources and sinks on high-memory double extra large instances
– operators on small instances
17
Scale Out: LRB Workload
18
0
1
2
3
4
5
6
7
0 500 1000 1500 2000 0 5 10 15 20 25 30 35 40 45 50 55 60
Tupl
es/s
(x10
0K)
Num
ber o
f VM
s
Time (seconds)
Throughput (tuples/s)Input rate (tuples/s)
Num. of VMs
0
1000
2000
3000
4000
5000
0 500 1000 1500 2000 0 5 10 15 20 25 30 35 40 45 50 55 60
Late
ncy
(milli
seco
nds)
Num
ber o
f VMs
Time (seconds)
LatencyNum. of VMs
scales to load factor L=350 with 50 VMs on Amazon EC2
(automated query parallelisation, scale out policy at 70%)
L=512 highest result [VLDB’12]
(hand-crafted query on cluster)
scale out leads to latency peaks, but remains within LRB SLA
E SEEP scales out to increasing workload in the Linear Road Benchmark
Conclusions
19
• Stream processing will grow in importance:– handling the data deluge– enables real-time response and decision making
• Integrated approach for scale out and failure recovery:– operator state an independent entity – primitives and mechanisms
• Efficient approach extensible for additional operators:– effectively applied to Amazon EC2 running LRB– parallel recovery