State-Slice: New Paradigm of Multi-query Optimization of Window-based Stream Queries
Song Wang
Elke Rundensteiner
Database Systems Research Group
Worcester Polytechnic Institute
Worcester, MA, USA.
Samrat Ganguly
Sudeept Bhatnagar
NEC Laboratories America Inc.
Princeton, NJ, USA.
32nd VLDB Conference, Seoul, Korea, 2006
Computation Sharing for Stream Processing
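Figure: users register continuous queries over streaming data and receive streaming results; the queries form a shared SPJA query network of selection (σ), projection (П), join, and aggregation (Agg) operators with window constraints w1, w2, w3.
New Challenges:
• In-memory processing of stateful operators
• Stateful operators with various window constraints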
Window Constraints for Stateful Operators
Time-based sliding window constraints:
• Each tuple has a timestamp
• Only tuples whose timestamps lie within W of each other can be joined to form an output
Figure: the join A[W] ⋈ B[W] keeps Buffer A and Buffer B as its states.
Observations:
• States in the operator dominate memory usage
• State size is proportional to the input rate and the window length
• Join CPU cost is proportional to the state size
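A minimal sketch of a time-based sliding window join, written in Python as an illustration only (the function name `window_join` and the `(stream, ts, key)` event format are assumptions, not CAPE's API). It assumes events arrive in timestamp order. Each arrival purges expired tuples, probes the opposite state, and is then inserted into its own state, so the state holds about rate × window tuples and every probe scans all of them.

```python
from collections import deque

def window_join(events, w):
    """Symmetric time-based sliding window equi-join (illustrative sketch).

    events: (stream, ts, key) tuples with stream in {"A", "B"}, assumed to be
            sorted by timestamp ts (no out-of-order arrivals).
    w:      window length; an A and a B tuple join iff their keys match and
            |Ta - Tb| < w.
    Returns (results, peak_state): joined (A tuple, B tuple) pairs and the
    largest number of tuples held in both states at once.
    """
    states = {"A": deque(), "B": deque()}      # each state is oldest-first
    other = {"A": "B", "B": "A"}
    results, peak_state = [], 0
    for stream, ts, key in events:
        # Purge: drop tuples that are w or more older than the new arrival.
        for state in states.values():
            while state and ts - state[0][0] >= w:
                state.popleft()
        # Probe: every tuple left in the opposite state is inside the window.
        for ots, okey in states[other[stream]]:
            if okey == key:
                pair = ((ts, key), (ots, okey))
                results.append(pair if stream == "A" else pair[::-1])
        # Insert the new tuple into its own stream's state.
        states[stream].append((ts, key))
        peak_state = max(peak_state, len(states["A"]) + len(states["B"]))
    return results, peak_state
```

Purging both states on every arrival keeps the state size exact; a lazier variant that purges only the opposite state does less work per tuple but holds expired tuples longer.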
A Motivation Example
Q1: SELECT A.*
    FROM Temperature A, Humidity B
    WHERE A.LocationId = B.LocationId
    WINDOW w1 min
Q2: SELECT A.*
    FROM Temperature A, Humidity B
    WHERE A.LocationId = B.LocationId AND A.Value > Threshold
    WINDOW w2 min
Let: w1 < w2
Figure: unshared plans; Q1 is A[w1] ⋈ B[w1], and Q2 applies σA to stream A before A[w2] ⋈ B[w2].
Observations:
• State A[w1] overlaps with state A[w2]
• State B[w1] overlaps with state B[w2]
• Joined results of Q1 and Q2 overlap
Sharing with Selection Pull-up [CDF02, HFA+03]
• Selection pull-up
• Use the larger window (w2)
Figure: Q1 and Q2 share a single join A[w2] ⋈ B[w2]; a router R sends results with |Ta - Tb| < w1 to Q1 and applies σA to all results before sending them to Q2.
[CDF02]: J. Chen, D. J. DeWitt, and J. F. Naughton. Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In ICDE’02.
[HFA+03]: M. A. Hammad, M. J. Franklin, W. G. Aref, and A. K. Elmagarmid. Scheduling for shared window joins over data streams. In VLDB’03.
Sharing with Selection Pull-up [CDF02, HFA+03]
Pros:
• Single join operator
Cons:
• Wasted computation without early filtering
• Wasted state memory without early filtering
• Per-output-tuple routing cost
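To make the per-output-tuple routing cost concrete, here is a small non-incremental sketch of selection pull-up sharing for Q1 and Q2. A brute-force `join` helper stands in for a real window join operator; the helper names, the `(ts, key, value)` layout of A tuples, and the `(ts, key)` layout of B tuples are assumptions for illustration. Every result over the larger window w2 passes through the router, and results that neither query keeps are computed and then discarded.

```python
def join(a_tuples, b_tuples, w):
    """Reference (non-incremental) window equi-join: keys equal, |Ta - Tb| < w."""
    return [(a, b) for a in a_tuples for b in b_tuples
            if a[1] == b[1] and abs(a[0] - b[0]) < w]

def pull_up_share(a_tuples, b_tuples, w1, w2, pred):
    """Selection pull-up: one shared join over the larger window w2, then a
    router that tests every output tuple for Q1 (window) and Q2 (selection)."""
    assert w1 < w2
    q1, q2, wasted = [], [], 0
    for a, b in join(a_tuples, b_tuples, w2):   # shared join, no early filtering
        to_q1 = abs(a[0] - b[0]) < w1           # router test for Q1's window
        to_q2 = pred(a)                         # pulled-up selection for Q2
        if to_q1:
            q1.append((a, b))
        if to_q2:
            q2.append((a, b))
        if not (to_q1 or to_q2):
            wasted += 1                         # joined, then thrown away
    return q1, q2, wasted
```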
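Stream Partition with Selection Pushdown [KFH04]
• Split stream A by A.Value
• Route shared join results
Figure: a split operator S partitions A into A1 (A.Value > Threshold) and A2 (A.Value <= Threshold). Join 1 computes A1[w2] ⋈ B1[w2] and join 2 computes A2[w1] ⋈ B2[w1]. A router R sends all join-1 results to Q2 and sends those with |Ta - Tb| < w1 to a union with the join-2 results, which produces Q1.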
[KFH04]: S. Krishnamurthy, M. J. Franklin, J. M. Hellerstein, and G. Jacobson. The case for precision sharing. In VLDB’04.
Stream Partition with Selection Pushdown [KFH04]
Pros:
• Selection pushdown: no wasted join computation
Cons:
• Multiple join operators
• Duplicated state memory in multiple join operators
• Per-output-tuple routing cost
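The same two queries under stream partitioning can be sketched the same way. The helper names and tuple layouts below are illustrative assumptions, and `join` is a brute-force stand-in for a window join operator. A is split by the selection before joining, so no join work is wasted, but B feeds both joins (its state is duplicated) and join-1 results still pass through a router.

```python
def join(a_tuples, b_tuples, w):
    """Reference (non-incremental) window equi-join: keys equal, |Ta - Tb| < w."""
    return [(a, b) for a in a_tuples for b in b_tuples
            if a[1] == b[1] and abs(a[0] - b[0]) < w]

def partition_share(a_tuples, b_tuples, w1, w2, pred):
    """Stream partition with selection pushdown for Q1 (w1) and Q2 (w2, pred)."""
    a1 = [a for a in a_tuples if pred(a)]        # A1: tuples satisfying σA
    a2 = [a for a in a_tuples if not pred(a)]    # A2: all other A tuples
    j1 = join(a1, b_tuples, w2)                  # join 1: A1[w2] ⋈ B1[w2]
    j2 = join(a2, b_tuples, w1)                  # join 2: A2[w1] ⋈ B2[w1]
    q2 = j1                                      # every join-1 result is for Q2
    # Router: join-1 results within w1 also belong to Q1; union with join 2.
    q1 = j2 + [(a, b) for a, b in j1 if abs(a[0] - b[0]) < w1]
    return q1, q2
```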
State-Slice: New Sharing Paradigm
Key Ideas:
• State-slice concept for sliding window join
• Pipelined chain of join slices
Prospective Benefits:
• Fine-grained selection pushdown
• Pipelined join operators
• Avoiding per-tuple routing cost
One-way State-Sliced Window Join
Figure: the state of stream A covers [w1, w2]; A tuples are inserted into it, and each arriving B tuple probes it, producing joined results, purged A tuples, and a propagated B tuple.
• Sliding window with a lower bound: [w1, w2]
• A B tuple only probes A tuples that are older than itself by at least w1 but at most w2
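The probe rule of a one-way slice can be written as a filter on tuple age. In this illustrative sketch, `slice_matches` is a hypothetical helper, tuples are assumed to be `(ts, key)` pairs, and the slice is treated as half-open, [w1, w2), so that two adjacent slices never both report a pair whose age is exactly w1.

```python
def slice_matches(b, a_state, lo, hi):
    """A tuples that B tuple b joins with in a one-way slice [lo, hi).

    b and every element of a_state are (ts, key) pairs. A match needs equal
    keys and an A tuple older than b by at least lo and by less than hi.
    """
    tb, key = b
    return [a for a in a_state if a[1] == key and lo <= tb - a[0] < hi]

b = (10, 7)
a_state = [(1, 7), (4, 7), (6, 7), (9, 7), (8, 3)]
near = slice_matches(b, a_state, 0, 3)   # ages in [0, 3) -> [(9, 7)]
far = slice_matches(b, a_state, 3, 8)    # ages in [3, 8) -> [(4, 7), (6, 7)]
```

The two slices are disjoint, and together they return exactly the matches of the single window [0, 8).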
The Chain of One-way State-Sliced Joins
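• Split the state memory into a chain of joins
• No overlap of state memory in the chain of joins
Figure: J1 holds the state of stream A for [0, w1] and J2 holds it for [w1, w2]; A tuples purged from J1 and B tuples that have probed J1 pass to J2 through queues, and the union of the J1 and J2 results equals the joined result over [0, w2].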
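The chain of one-way slices can be simulated as follows. This is a single-threaded sketch, not CAPE's pipelined implementation: the `Slice` class, `run_chain`, and the `(stream, ts, key)` event format are assumptions; events are assumed to arrive in timestamp order with distinct timestamps; and the queues between slices are replaced by direct hand-off. The reference `one_way_join` allows a check that slice outputs are disjoint and that their union equals the one-way join over the full window.

```python
from collections import deque

class Slice:
    """One-way sliced join: stores A tuples whose age lies in [start, end)."""
    def __init__(self, start, end):
        self.start, self.end = start, end   # start is implied by the chain
        self.a_state = deque()              # (ta, key), oldest first

    def process_b(self, b, next_slice):
        tb, key = b
        # Purge: A tuples at least `end` old leave this slice and are handed
        # to the next slice (or dropped at the end of the chain).
        while self.a_state and tb - self.a_state[0][0] >= self.end:
            old = self.a_state.popleft()
            if next_slice is not None:
                next_slice.a_state.append(old)
        # Probe: tuples younger than `start` were never handed to this slice,
        # so only the key needs checking.
        return [(a, b) for a in self.a_state if a[1] == key]

def run_chain(events, bounds):
    """events: time-ordered (stream, ts, key); bounds: slice ends w1 < w2 < ..."""
    starts = [0] + bounds[:-1]
    chain = [Slice(s, e) for s, e in zip(starts, bounds)]
    outputs = [[] for _ in chain]           # joined results per slice
    for stream, ts, key in events:
        if stream == "A":
            chain[0].a_state.append((ts, key))
        else:                               # the B tuple is propagated down the chain
            for i, sl in enumerate(chain):
                nxt = chain[i + 1] if i + 1 < len(chain) else None
                outputs[i] += sl.process_b((ts, key), nxt)
    return outputs

def one_way_join(events, lo, hi):
    """Reference: (A, B) pairs with matching key and lo <= tb - ta < hi."""
    a_list = [(t, k) for s, t, k in events if s == "A"]
    return [((ta, ka), (tb, kb)) for s, tb, kb in events if s == "B"
            for ta, ka in a_list if ka == kb and lo <= tb - ta < hi]
```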
From One-way to Two-way Binary Join
• Intuitively, a combination of two one-way joins
• Two references (male and female) for each A or B tuple
• Male tuples are used to probe states; female tuples are inserted into their respective states and cross-purged
Figure: J1 holds the states of streams A and B for [0, w1] and J2 holds them for [w1, w2]; queues connect J1 to J2, and the union of their results is the joined result.
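A two-way chain can be sketched by giving every arrival both roles. In this illustration (function names and the `(stream, ts, key)` format are assumptions; timestamp-ordered input is assumed), the male copy walks the chain, cross-purging the opposite stream's state in each slice into the next slice and then probing it; the female copy is stored in the first slice. Each pair is found exactly once, when its later tuple probes, in the slice that covers the pair's age.

```python
from collections import deque

def run_two_way_chain(events, bounds):
    """Two-way state-sliced join chain (single-threaded sketch).

    events: (stream, ts, key) with stream in {"A", "B"}, in timestamp order.
    bounds: slice end points w1 < w2 < ...; slice i covers ages [w_{i-1}, w_i).
    Returns one list of joined (A tuple, B tuple) pairs per slice.
    """
    # Each slice keeps female (stored) tuples of both streams, oldest first.
    states = [{"A": deque(), "B": deque()} for _ in bounds]
    outputs = [[] for _ in bounds]
    other = {"A": "B", "B": "A"}
    for stream, ts, key in events:
        opp = other[stream]
        # Male copy: walk the chain, cross-purging and probing the opposite state.
        for i, end in enumerate(bounds):
            opp_state = states[i][opp]
            while opp_state and ts - opp_state[0][0] >= end:
                old = opp_state.popleft()             # cross-purge
                if i + 1 < len(bounds):
                    states[i + 1][opp].append(old)    # female moves to next slice
            for ots, okey in opp_state:
                if okey == key:
                    pair = ((ts, key), (ots, okey))
                    outputs[i].append(pair if stream == "A" else pair[::-1])
        # Female copy: stored in the first slice's state for its own stream.
        states[0][stream].append((ts, key))
    return outputs

def window_join(events, w):
    """Reference: all key-matching (A, B) pairs with |Ta - Tb| < w."""
    a = [(t, k) for s, t, k in events if s == "A"]
    b = [(t, k) for s, t, k in events if s == "B"]
    return [(x, y) for x in a for y in b if x[1] == y[1] and abs(x[0] - y[0]) < w]
```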
State-Sliced Join Chain: The Example
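• States of sliced joins in a chain are disjoint from each other: minimizes state memory usage
• Selection can be pushed down into the middle of the join chain: avoids unnecessary resource waste
• No routing step is needed: avoids per-output-tuple routing cost completely
Figure: the separate plans for Q1 (A[w1] ⋈ B[w1]) and Q2 (σA, then A[w2] ⋈ B[w2]) become one chain. Slice 1 covers [0, w1]; σA filters the A tuples passed on to slice 2, which covers [w1, w2]. Q1 takes the results of slice 1; Q2 takes the union of σA applied to the slice-1 results and the results of slice 2.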
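The motivation example can be run on such a chain with σA pushed between the two slices. In this sketch (function names and the `("A", ts, key, value)` / `("B", ts, key)` event layout are assumptions), A tuples that fail σA are stopped before slice 2, both as males walking the chain and as females being purged into it. Q1 reads slice 1 alone, and Q2 unions the filtered slice-1 results with slice 2, with no router on the output.

```python
from collections import deque

def run_pushdown_chain(events, w1, w2, pred):
    """Two-slice chain shared by Q1 (window w1, no selection) and Q2 (window w2,
    selection pred on A), with pred pushed down between slice 1 and slice 2.

    events: ("A", ts, key, value) or ("B", ts, key), in timestamp order.
    Returns (q1, q2) as lists of (A tuple, B tuple) pairs.
    """
    bounds = [w1, w2]
    states = [{"A": deque(), "B": deque()}, {"A": deque(), "B": deque()}]
    out = [[], []]
    for stream, *rest in events:
        t = tuple(rest)                              # (ts, key[, value])
        opp = "B" if stream == "A" else "A"
        for i, end in enumerate(bounds):
            if i == 1 and stream == "A" and not pred(t):
                break                                # σA stops this male A tuple
            st = states[i][opp]
            while st and t[0] - st[0][0] >= end:
                old = st.popleft()                   # cross-purge
                if i == 0 and (opp == "B" or pred(old)):
                    states[1][opp].append(old)       # σA also filters female A tuples
            for o in st:
                if o[1] == t[1]:
                    out[i].append((t, o) if stream == "A" else (o, t))
        states[0][stream].append(t)
    q1 = out[0]                                      # slice 1 alone answers Q1
    q2 = [p for p in out[0] if pred(p[0])] + out[1]  # σA on slice 1, plus slice 2
    return q1, q2

def window_join(events, w):
    """Reference: all key-matching (A, B) pairs with |Ta - Tb| < w."""
    a = [tuple(e[1:]) for e in events if e[0] == "A"]
    b = [tuple(e[1:]) for e in events if e[0] == "B"]
    return [(x, y) for x in a for y in b if x[1] == y[1] and abs(x[0] - y[0]) < w]
```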
Summary: State-Sliced Join Chain
Pros:
• Minimized memory usage
• Reduced routing cost
• No need for operator synchronization in the chain
Cons:
• Stream traffic between pipelined joins
• Purge cost
Sharing via Chains: Memory-Optimal Chain
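Figure: for queries Q1, ..., QN with windows w1 < w2 < ... < wN, the chain has N slices covering [0, w1], [w1, w2], ..., [wN-1, wN], and Qi takes the union of the results of slices 1 to i.
• No selection: the slices are chained directly.
• With selection: filters σ'1, ..., σ'N are pushed into the chain in front of the slices, and each query's own selection σi is applied to the slice results it unions.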
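A sketch of how a memory-optimal chain could be laid out for N queries. The helper names and the `SlicePlan` record are hypothetical, and choosing σ'i as the disjunction of σi, ..., σN (the selections of every query that still reads slice i) is an assumption made for illustration. A `None` selection means the query has no filter, so nothing can be pushed in front of any slice that query reads.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Pred = Optional[Callable[[float], bool]]   # filter on an A attribute; None = no filter

@dataclass
class SlicePlan:
    start: float
    end: float
    pushed_filter: Pred        # σ'_i applied to A tuples entering this slice

def disjunction(preds: List[Pred]) -> Pred:
    """OR of the given filters; None if any downstream query keeps every tuple."""
    if any(p is None for p in preds):
        return None
    return lambda v: any(p(v) for p in preds)

def mem_optimal_chain(windows: List[float], selections: List[Pred]):
    """windows: w_1 < ... < w_N of Q_1..Q_N; selections[i]: filter of Q_{i+1}."""
    assert all(a < b for a, b in zip(windows, windows[1:])), "windows must increase"
    slices = []
    for i, end in enumerate(windows):
        start = windows[i - 1] if i > 0 else 0
        # Slice i feeds queries i..N-1, so it may drop tuples none of them wants.
        slices.append(SlicePlan(start, end, disjunction(selections[i:])))
    # Q_i unions the outputs of slices 0..i, each filtered by its own selection.
    query_inputs = [list(range(i + 1)) for i in range(len(windows))]
    return slices, query_inputs
```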
Is the Mem-Optimal Chain also CPU-Optimal?
Figure: a memory-optimal chain for Q1, ..., Q5 with one slice per window range [0, w1], [w1, w2], ..., [w4, w5], each followed by a union feeding the next query.
Overheads:
• Too many operators may increase system context-switch cost
• Too many sliced states increase purging cost
Merging Sliced Joins
Tradeoff:
• Gain from merging: fewer join operators; less extra purging cost
• Loss from merging: introduces routing cost; increases memory usage due to selection pull-up
Cost Model for CPU Usage
Figure: adjacent slices [wi-1, wi], ..., [wj-1, wj] serving Qi, ..., Qj are merged into one slice [wi-1, wj]; a router R then tests |Ta - Tb| against the original slice boundaries (e.g., < wi, ≥ wj-1) to decide which of Qi, ..., Qj receive each result.
CPU-Opt. Chain: Search Space & Solution
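Figure: vertices v0, ..., v5 mark the window boundaries w0, ..., w5, and an edge from vi to vj stands for one sliced join over [wi, wj]. The example solution uses the slices [0, w2], [w2, w3], and [w3, w5]; routers on the merged slices split results by |Ta - Tb| < w1 (for Q1) and |Ta - Tb| < w4 (for Q4), and unions combine slice results for the larger-window queries.
Legend: vi: window start/end time; vi to vj: one slice window
Finding the CPU-optimal chain is a shortest path problem on this graph.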
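Because every edge goes from a smaller to a larger boundary, the graph is a DAG, and the shortest path can be found by dynamic programming over v0, ..., vN in order. In the sketch below, `edge_cost(i, j)` is a hypothetical stand-in for the CPU cost of one merged slice over [wi, wj] (operator, purging, and routing costs); the paper's actual cost model is not reproduced here.

```python
def cpu_optimal_chain(n, edge_cost):
    """Shortest path from v0 to vn in the slice DAG.

    Vertex i is window boundary w_i (w_0 = 0); edge (i, j), i < j, is one sliced
    join over [w_i, w_j] serving queries i+1..j. edge_cost(i, j) is a
    caller-supplied cost estimate (hypothetical). Returns (total_cost, slices).
    """
    INF = float("inf")
    best = [INF] * (n + 1)        # best[j]: cheapest chain covering [w_0, w_j]
    prev = [-1] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):     # vertices are already in topological order
        for i in range(j):
            c = best[i] + edge_cost(i, j)
            if c < best[j]:
                best[j], prev[j] = c, i
    path, j = [], n               # walk predecessors back from v_n
    while j > 0:
        path.append((prev[j], j))
        j = prev[j]
    return best[n], path[::-1]
```

For example, with an invented toy cost `4 + 3 * (j - i - 1) ** 2` (a fixed per-operator charge plus a routing charge that grows with the number of queries merged into one slice) and five queries, the cheapest chain uses three slices at a total cost of 18: merging some slices beats both the memory-optimal chain (five slices, cost 20) and a single fully merged join (cost 52).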
Summary: Mem-Opt. vs. CPU-Opt. Join Chain
Mem-Optimal: minimized memory usage; higher system overhead; higher purging cost
CPU-Optimal: minimized CPU usage; more memory usage if selections are pulled up to merge slices
Figure: starting from selection pull-up sharing, state slicing yields the Mem-Opt. chain, and state merging then yields the CPU-Opt. chain.
Experimental WPI Stream Engine: CAPE
Software demonstration at VLDB’04
Figure: CAPE architecture. End users register streams and queries (Query 1, ..., Query n) through a GUI. Components: Stream/Query Registration, Query Plan Generator, CAPE Query Engine, Operator Configurator, Operator Scheduler, Plan Reoptimizer, QoS Inspector, Execution Engine, Storage Manager, Distribution Manager, Stream Feeder, Stream Sender, and Stream Receiver; streaming data arrives over the Internet. The legend distinguishes control flow from data flow.
Experiment Study 1: Memory Consumption
Experiment Study 2: Total Service Rate
Experiment Study 3: Mem-Opt. vs. CPU-Opt.
Figure: window distributions used for 12 queries; panels: Small-Large with 12 queries, Small-Large with 24 queries.
Conclusion
• Pipelined state-sliced join chain
• Mem-Optimal chain construction
• CPU-Optimal chain construction
• Implemented in CAPE
• Performance evaluation
Thank You!
Visit CAPE Homepage
http://davis.wpi.edu/dsrg/CAPE/index.html
Supported by:
CRI grant CNS 05-51584