
State-Slice: New Paradigm of Multi-query Optimization of Window-based Stream Queries

Song Wang

Elke Rundensteiner

Database Systems Research Group

Worcester Polytechnic Institute

Worcester, MA, USA.

Samrat Ganguly

Sudeept Bhatnagar

NEC Laboratories America Inc.

Princeton, NJ, USA.

32nd VLDB Conference, Seoul, Korea, 2006

Computation Sharing for Stream Processing

[Figure: continuous queries are registered with the system; streaming data flows through a shared SPJA query network (σ, Π, and Agg operators over windows w1, w2, w3) and leaves as streaming results.]

New Challenges:
• In-memory processing of stateful operators
• Stateful operators with various window constraints

Window Constraints for Stateful Operators

Time-based sliding window constraints:
• Each tuple has a timestamp
• Only tuples within a time frame of length W can form an output

[Figure: window join of streams A and B with state buffers A[w] (Buffer A) and B[w] (Buffer B).]

Observations:
• States in the operator dominate memory usage
• State size is proportional to the input rate and window length
• Join CPU cost is proportional to the state size
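The purge, probe, and insert steps behind these observations can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the class name `WindowJoin`, the `(timestamp, key)` tuple layout, and the requirement that tuples arrive in non-decreasing timestamp order are all assumptions.

```python
from collections import deque


class WindowJoin:
    """Symmetric time-based sliding-window equi-join (illustrative sketch)."""

    def __init__(self, window):
        self.window = window
        # One buffer per stream, holding (timestamp, key) tuples, oldest first.
        self.state = {"A": deque(), "B": deque()}

    def process(self, stream, ts, key):
        """Feed one tuple and return the joined (a, b) pairs it produces."""
        other = "B" if stream == "A" else "A"
        buf = self.state[other]
        # Purge: drop tuples of the other stream that left the window.
        while buf and ts - buf[0][0] >= self.window:
            buf.popleft()
        # Probe: everything left is inside the window, so only the key is checked.
        new = (ts, key)
        if stream == "A":
            results = [(new, old) for old in buf if old[1] == key]
        else:
            results = [(old, new) for old in buf if old[1] == key]
        # Insert: keep the new tuple for later probes from the other stream.
        self.state[stream].append(new)
        return results


join = WindowJoin(window=5)
join.process("A", 0, "x")
first = join.process("B", 3, "x")
```

Each buffer holds every tuple that arrived during the last `window` time units, which is why state size grows with both input rate and window length, and each probe scans that buffer, which is why CPU cost follows state size.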

A Motivation Example

Q1:
SELECT A.*
FROM Temperature A, Humidity B
WHERE A.LocationId = B.LocationId
WINDOW w1 min

Q2:
SELECT A.*
FROM Temperature A, Humidity B
WHERE A.LocationId = B.LocationId AND A.Value > Threshold
WINDOW w2 min

Let: w1 < w2

[Figure: unshared plans. Q1 joins A and B with states A[w1] and B[w1]; Q2 combines a selection σA on stream A with a join that keeps states A[w2] and B[w2].]

Observations:
• State A[w1] overlaps with state A[w2]
• State B[w1] overlaps with state B[w2]
• Joined results of Q1 and Q2 overlap

Sharing with Selection Pull-up [CDF02, HFA+03]

• Selection pull-up
• Using the larger window (w2)

[Figure: the separate plans for Q1 (states A[w1], B[w1]) and Q2 (σA, states A[w2], B[w2]) are combined into one shared join with states A[w2] and B[w2]. A router (R) sends all joined results toward Q2, where σA is applied, and sends the results with |Ta-Tb| < W1 to Q1.]

[CDF02]: J. Chen, D. J. DeWitt, and J. F. Naughton. Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In ICDE’02.
[HFA+03]: M. A. Hammad, M. J. Franklin, W. G. Aref, and A. K. Elmagarmid. Scheduling for shared window joins over data streams. In VLDB’03.

Sharing with Selection Pull-up [CDF02, HFA+03]

Pros:
• Single Join Operator

Cons:
• Wasted Computation without Early Filtering
• Wasted State Memory without Early Filtering
• Per Output-Tuple Routing Cost
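A small batch-style sketch can make the wasted work of selection pull-up visible. It is not the implementation from [CDF02] or [HFA+03]: the function names, the `(timestamp, location, value)` layout for A tuples, the `(timestamp, location)` layout for B tuples, and the sample data are all assumptions, and the nested-loop join only stands in for a streaming window join.

```python
def window_join(a_tuples, b_tuples, window):
    """Batch stand-in for a sliding-window equi-join on the location id."""
    return [(a, b) for a in a_tuples for b in b_tuples
            if a[1] == b[1] and abs(a[0] - b[0]) < window]


def selection_pullup_plan(a_tuples, b_tuples, w1, w2, threshold):
    # One shared join that uses the larger window w2 and no early filtering.
    joined = window_join(a_tuples, b_tuples, w2)
    # Router: results with |Ta - Tb| < w1 go to Q1 ...
    q1 = [(a, b) for a, b in joined if abs(a[0] - b[0]) < w1]
    # ... and every result is checked against the pulled-up selection for Q2.
    q2 = [(a, b) for a, b in joined if a[2] > threshold]
    return q1, q2, joined


A = [(0, "L1", 10), (4, "L1", 30), (9, "L1", 5)]
B = [(1, "L1"), (8, "L1"), (12, "L1")]
q1, q2, joined = selection_pullup_plan(A, B, w1=2, w2=6, threshold=20)
# Results that the shared join computed but neither query wants.
wasted = [p for p in joined if p not in q1 and p not in q2]
```

In this sample, the pair built from the A tuple at time 9 and the B tuple at time 12 is joined and routed even though it fails both the w1 window and the selection, which is the wasted computation listed under Cons.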

Stream Partition with Selection Pushdown [KFH04]

• Split stream A by A.Value
• Route shared join results

[Figure: the separate plans for Q1 (states A[w1], B[w1]) and Q2 (σA.Value>Threshold, states A[w2], B[w2]) are combined into a partitioned plan. A split operator (S) divides stream A into partitions A1 and A2 by comparing A.Value with Threshold (> or <=). Two joins, 1 and 2, keep the states A[w1], B[w1] and A[w2], B[w2] over inputs A1, B1 and A2, B2. A router (R) forwards all results of the larger-window join to Q2 and those with |Ta-Tb| < W1 to a union (U) that produces Q1.]

[KFH04]: S. Krishnamurthy, M. J. Franklin, J. M. Hellerstein, and G. Jacobson. The case for precision sharing. In VLDB’04.

Stream Partition with Selection Pushdown [KFH04]

Pros:
• Selection pushdown: no wasted Join Computation

Cons:
• Multiple Join Operators
• Duplicated State Memory in Multiple Join Operators
• Per Output-Tuple Routing Cost
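The partitioned plan can be sketched in the same batch style. This is one reading of the approach under stated assumptions, not the implementation from [KFH04]: the function names and tuple layouts are hypothetical, and the choice that the partition failing the selection feeds the w1 join while the partition passing it feeds the w2 join follows from which queries can still use each partition.

```python
def window_join(a_tuples, b_tuples, window):
    """Batch stand-in for a sliding-window equi-join on the location id."""
    return [(a, b) for a in a_tuples for b in b_tuples
            if a[1] == b[1] and abs(a[0] - b[0]) < window]


def partition_plan(a_tuples, b_tuples, w1, w2, threshold):
    # Split stream A by A.Value before any join runs (selection pushdown).
    low = [a for a in a_tuples if a[2] <= threshold]   # useful to Q1 only
    high = [a for a in a_tuples if a[2] > threshold]   # useful to Q1 and Q2
    join_1 = window_join(low, b_tuples, w1)    # stores B over window w1
    join_2 = window_join(high, b_tuples, w2)   # stores B again over window w2
    # Router on join_2: everything goes to Q2, the narrow-window part to Q1.
    q2 = join_2
    q1 = join_1 + [(a, b) for a, b in join_2 if abs(a[0] - b[0]) < w1]
    return q1, q2


A = [(0, "L1", 10), (4, "L1", 30), (9, "L1", 5)]
B = [(1, "L1"), (8, "L1"), (12, "L1")]
q1, q2 = partition_plan(A, B, w1=2, w2=6, threshold=20)
```

No result is computed in vain here, but stream B is buffered by both joins and the output of the second join still passes through a router, which matches the Cons above.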

State-Slice: New Sharing Paradigm

Key Ideas:
• State-Slice Concept for Sliding Window Join
• Pipelined Chain of Join Slices

Prospective Benefit:
• Fine-grained Selection Push-down
• Pipelined Join Operators
• Avoiding Per-tuple Routing Cost

One-way State Sliced Window Join

[Figure: a one-way sliced join holding the state of stream A for the window slice [w1, w2]. Inputs: A Tuple, B Tuple (which probes the state). Outputs: Joined-Result, Purged-A-Tuple, Propagated-B-Tuple.]

• The sliding window has a lower bound: [w1, w2]
• A B tuple only probes A tuples that are at least W1, but at most W2, older than itself
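A single sliced join operator might look like the sketch below. The class and method names are hypothetical, tuples are assumed to be `(timestamp, key)` pairs arriving in non-decreasing timestamp order, and the slice is treated as the half-open interval [w_start, w_end), which is an assumption about the boundary handling.

```python
from collections import deque


class SlicedOneWayJoin:
    """One-way sliced window join of A[w_start, w_end] with stream B (sketch)."""

    def __init__(self, w_start, w_end):
        self.w_start, self.w_end = w_start, w_end
        self.state_a = deque()          # A tuples as (ts, key), oldest first

    def insert_a(self, a):
        self.state_a.append(a)

    def process_b(self, b):
        """Return (joined results, purged A tuples, propagated B tuple)."""
        tb, key_b = b
        # Cross-purge: A tuples older than the slice's upper bound leave the state.
        purged = []
        while self.state_a and tb - self.state_a[0][0] >= self.w_end:
            purged.append(self.state_a.popleft())
        # Probe: only A tuples at least w_start older than b may join.
        joined = [(a, b) for a in self.state_a
                  if a[1] == key_b and tb - a[0] >= self.w_start]
        # The B tuple itself is passed on unchanged.
        return joined, purged, b


join = SlicedOneWayJoin(2, 5)
join.insert_a((0, "x"))
join.insert_a((4, "x"))
joined, purged, propagated = join.process_b((5, "x"))
```

The three return values correspond to the three outputs named on the slide: the joined result, the purged A tuples, and the propagated B tuple.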

The Chain of One-way State-Sliced Joins

• Split state memory into chain of joins
• No overlap of state memory in chain of joins

[Figure: joins J1 and J2 are connected by queue(s). J1 holds the state of stream A for [0, w1] and J2 holds the state of stream A for [w1, w2]. A tuples and B tuples enter at J1, B tuples probe each state, and a union (U) merges the joined results; the figure marks this chain as equal (=) to the unsliced join.]
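The chain can be sketched as a list of disjoint states that A tuples migrate through. This is a simplified illustration: the class name `OneWayChain` is hypothetical, a synchronous loop stands in for the queues between pipelined operators, slices are half-open intervals, and tuples are assumed to arrive in non-decreasing timestamp order.

```python
from collections import deque


class OneWayChain:
    """Chain of one-way sliced joins over stream A, probed by stream B (sketch)."""

    def __init__(self, boundaries):
        # boundaries [w1, w2, ...] define slices [0, w1), [w1, w2), ...
        self.ends = list(boundaries)
        self.states = [deque() for _ in self.ends]

    def insert_a(self, a):
        self.states[0].append(a)        # A tuples enter the first slice only

    def process_b(self, b):
        """Return one list of joined (a, b) pairs per slice."""
        tb, key_b = b
        per_slice = []
        for i, w_end in enumerate(self.ends):
            state = self.states[i]
            # Cross-purge: tuples too old for this slice move to the next one.
            while state and tb - state[0][0] >= w_end:
                old = state.popleft()
                if i + 1 < len(self.states):
                    self.states[i + 1].append(old)
            # Probe: remaining tuples lie inside this slice by construction.
            per_slice.append([(a, b) for a in state if a[1] == key_b])
        return per_slice


chain = OneWayChain([2, 5])             # slices [0, 2) and [2, 5)
chain.insert_a((0, "x"))
chain.insert_a((3, "x"))
per_slice = chain.process_b((4, "x"))
```

The results for a query with window w1 are the output of the first slice alone, and the results for window w2 are the union of both slices, while every A tuple is stored in exactly one state at any time.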

From One-way to Two-way Binary Join

• Intuitively a combination of two one-way joins
• Two references for each A or B tuple
• Male tuples are used to probe states
• Female tuples are inserted and cross-purged to respective states

[Figure: joins J1 and J2 connected by queue(s). J1 holds the states of stream A and stream B for [0, w1]; J2 holds the states of stream A and stream B for [w1, w2]. Each A tuple and B tuple enters as a male and a female copy, and a union (U) merges the joined results.]
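The male and female roles can be sketched as two uses of the same arriving tuple. The sketch is illustrative only: the class name `TwoWayChain` is hypothetical, a synchronous loop replaces the operator queues, slices are half-open, and tuples are assumed to be `(timestamp, key)` pairs in non-decreasing timestamp order.

```python
from collections import deque


class TwoWayChain:
    """Chain of two-way sliced joins with male and female copies (sketch)."""

    def __init__(self, boundaries):
        self.ends = list(boundaries)    # slices [0, w1), [w1, w2), ...
        # Per stream, one state per slice; states store the female copies.
        self.states = {s: [deque() for _ in self.ends] for s in ("A", "B")}

    def process(self, stream, ts, key):
        """Return one list of joined (a, b) pairs per slice."""
        other = "B" if stream == "A" else "A"
        new = (ts, key)
        per_slice = []
        # Male copy: walks the chain, cross-purging and probing the other stream.
        for i, w_end in enumerate(self.ends):
            state = self.states[other][i]
            while state and ts - state[0][0] >= w_end:
                old = state.popleft()   # purged female moves to the next slice
                if i + 1 < len(self.ends):
                    self.states[other][i + 1].append(old)
            matches = [old for old in state if old[1] == key]
            if stream == "A":
                per_slice.append([(new, old) for old in matches])
            else:
                per_slice.append([(old, new) for old in matches])
        # Female copy: inserted into the first slice of its own stream.
        self.states[stream][0].append(new)
        return per_slice


chain = TwoWayChain([2, 5])
chain.process("A", 0, "x")
per_slice = chain.process("B", 1, "x")
```

Because the male copy only reads the other stream's states and the female copy only writes to its own stream's state, the two directions of the join do not interfere with each other.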

State-Sliced Join Chain: The Example

• States of sliced joins in a chain are disjoint with each other: Minimize State Memory Usage
• Selection can be pushed down into middle of join chain: Avoid Unnecessary Resource Waste
• No routing step is needed: Avoid Per Output-Tuple Routing Cost Completely

[Figure: the separate plans for Q1 (states A[w1], B[w1]) and Q2 (σA, states A[w2], B[w2]) are combined into a chain of two sliced joins. Join 1 covers [0, W1] and feeds Q1; join 2 covers [W1, W2]. σA is placed between the two joins and on the join 1 results that a union (U) merges with the join 2 results for Q2.]
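The algebra of this example can be checked with a batch-style sketch. It is a simplification under stated assumptions: the function names are hypothetical, A tuples are `(timestamp, location, value)`, B tuples are `(timestamp, location)`, and a nested-loop join over half-open slices stands in for the streaming operators.

```python
def sliced_join(a_tuples, b_tuples, w_start, w_end):
    """Batch stand-in for one sliced join over w_start <= |Ta - Tb| < w_end."""
    return [(a, b) for a in a_tuples for b in b_tuples
            if a[1] == b[1] and w_start <= abs(a[0] - b[0]) < w_end]


def state_slice_plan(a_tuples, b_tuples, w1, w2, threshold):
    # Slice 1 covers [0, w1) and sees every A tuple because Q1 has no selection.
    slice_1 = sliced_join(a_tuples, b_tuples, 0, w1)
    # The selection sits between the slices: only A tuples that pass it move on.
    selected_a = [a for a in a_tuples if a[2] > threshold]
    slice_2 = sliced_join(selected_a, b_tuples, w1, w2)
    q1 = slice_1
    # Q2 is the union of the filtered slice 1 results and the slice 2 results.
    q2 = [(a, b) for a, b in slice_1 if a[2] > threshold] + slice_2
    return q1, q2, len(slice_1) + len(slice_2)


A = [(0, "L1", 10), (4, "L1", 30), (9, "L1", 5)]
B = [(1, "L1"), (8, "L1"), (12, "L1")]
q1, q2, computed = state_slice_plan(A, B, w1=2, w2=6, threshold=20)
```

Every computed pair ends up in at least one query result, and neither output is split by a timestamp-based router: Q1 reads slice 1 directly and Q2 is a plain union.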

Summary: State-Sliced Join Chain

Pros:
• Minimized Memory Usage
• Reduced Routing Cost
• No Need of Operator Synchronization in the Chain

Cons:
• Stream traffic between pipelined joins
• Purge cost

Sharing via Chains: Memory-Optimal Chain

[Figure, No Selection: streams A and B feed a chain of sliced joins 1, 2, 3, …, N with window slices [0,w1], [w1,w2], [w2,w3], …, [wN-1,wN]. Join 1 outputs Q1 directly; each later query Q2, Q3, …, QN is produced by a union (U) that adds the results of the next slice.]

[Figure, With Selection: the same chain with selections σ1, σ2, σ3, …, σN and σ’1, σ’2, σ’3 placed along the chain.]
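For the case without selections, the construction of the memory-optimal chain can be sketched as cutting the time axis at every distinct query window. The function name `mem_optimal_chain` and its return format are assumptions made for illustration.

```python
def mem_optimal_chain(windows):
    """Cut [0, max window] at every distinct query window (illustrative sketch)."""
    bounds = sorted(set(windows))
    starts = [0] + bounds[:-1]
    # One sliced join per consecutive pair of boundaries.
    slices = list(zip(starts, bounds))
    # A query with window w reads the union of all slices that end at or before w.
    unions = {w: list(range(k + 1)) for k, w in enumerate(bounds)}
    return slices, unions


slices, unions = mem_optimal_chain([5, 2, 10, 5])
```

Queries that share a window length share the same union, so the number of sliced joins equals the number of distinct windows rather than the number of queries.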

Mem-Optimal Chain CPU-Optimal Chain?

[Figure: memory-optimal chain of five sliced joins over [0,w1], [w1,w2], [w2,w3], [w3,w4], [w4,w5], with unions (U) producing Q1 to Q5.]

Overheads:
• Too many operators may increase system context switch cost
• Too many sliced states increase purging cost

Merging Sliced Joins

Tradeoff:

Gain from Merging:
• Reduce number of Join operators
• Reduce extra purging cost

Loss from Merging:
• Introduce routing cost
• Increase memory usage due to selection pullup

Cost Model for CPU Usage

[Figure: two sliced joins i and j, over [wi-1,wi] and [wj-1,wj] and feeding Qi and Qj through unions (U), are merged into one sliced join over [wi-1,wj]. A router (R) on the merged join splits its results by |Ta-Tb|, with branches labeled < wi and ≥ wj-1.]

CPU-Opt. Chain: Search Space & Solution

[Figure: a directed graph over nodes v0 to v5, which correspond to the window boundaries w0 to w5. The example solution is a chain of three sliced joins over [0,w2], [w2,w3], and [w3,w5]; routers (R) with conditions |Ta-Tb| < w1 and |Ta-Tb| < w4 split the results of the merged slices, and unions (U) assemble the outputs for Q1 to Q5.]

Legend:
• Vi: window start/end time
• Vi to Vj: one slice window

Shortest path problem
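The shortest-path formulation can be sketched as dynamic programming over the window boundaries, since every edge points from an earlier boundary to a later one. The slides name a CPU cost model but do not give its formula, so `edge_cost` is a hypothetical caller-supplied function and `toy_cost` is an invented placeholder, not the authors' model.

```python
def cpu_optimal_chain(windows, edge_cost):
    """Shortest path from v0 to vN over window boundaries (illustrative sketch).

    edge_cost(w_start, w_end, merged) is a hypothetical cost for one sliced
    join over [w_start, w_end] that merges `merged` memory-optimal slices.
    """
    bounds = [0] + sorted(set(windows))
    n = len(bounds)
    best = [0.0] + [float("inf")] * (n - 1)   # cheapest cost to reach each node
    prev = [None] * n                         # predecessor on the cheapest path
    for j in range(1, n):
        for i in range(j):                    # edge vi -> vj is one slice window
            cost = best[i] + edge_cost(bounds[i], bounds[j], j - i)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    slices, j = [], n - 1
    while j > 0:                              # walk the path back to v0
        slices.append((bounds[prev[j]], bounds[j]))
        j = prev[j]
    return slices[::-1], best[-1]


def toy_cost(w_start, w_end, merged):
    # Invented numbers: fixed per-operator overhead plus a routing penalty
    # that grows with the number of merged slices.
    return 5 + 2 * (merged - 1) ** 2


chain, total = cpu_optimal_chain([1, 2, 3], toy_cost)
```

With a cost that ignores routing, the search collapses to a single merged join, and with a very high routing penalty it returns the memory-optimal chain, which mirrors the tradeoff between merging gains and losses.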

Summary: Mem-Opt. vs. CPU-Opt. Join Chain

Mem-Optimal:
• Minimized Memory Usage
• Higher System Overhead
• Higher Purging Cost

CPU-Optimal:
• Minimized CPU Usage
• More Memory Usage if Selection is Pulled Up to Merge Slices

[Figure labels: Selection PullUp Sharing, Mem-Opt. Chain, CPU-Opt. Chain, State Slice, State Merge.]

Experimental WPI Stream Engine: CAPE

Software Demonstration VLDB’04

[Figure: CAPE architecture. Components shown: Stream / Query Registration GUI, Query Plan Generator, Distribution Manager, CAPE Query Engine, Operator Configurator, Operator Scheduler, Plan Reoptimizer, QoS Inspector, Execution Engine, Storage Manager, Stream Sender, Stream Feeder, Stream Receiver, Internet. End users register Query 1, Query 2, …, Query n; streaming data enters the system. Legend: Control Flow, Data Flow.]

Experiment Study 1: Memory Consumption

Experiment Study 2: Total Service Rate

Experiment Study 3: Mem-Opt. vs. CPU-Opt.

[Figure panels: Window Distributions Used for 12 Queries; Small-Large: 12 Queries; Small-Large: 24 Queries.]

Conclusion

• Pipelined state sliced join chain
• Mem-Optimal chain construction
• CPU-Optimal chain construction
• Implemented in CAPE
• Performance evaluation


Thank You!

Visit CAPE Homepage

http://davis.wpi.edu/dsrg/CAPE/index.html

Supported by:

CRI grant CNS 05-51584