Query Optimization over Web Services

Utkarsh Srivastava, Jennifer Widom, Kamesh Munagala, Rajeev Motwani



Performance Numbers

[Figure: Relative Contribution to Research. Percent contribution (0 to 100) vs. time in program (0 to 5 years), with curves for Student, Advisor, and This Work.]


Future Directions (sample)

• Web services with monetary cost

• Web services with unstable response times (QoS guarantees?)

• Multiple web services for same data

• Caching web-service query results

• More expressive queries, also workflows

• Web service profiling and statistics-tracking


New Query Optimization Problem

First Steps in Big Problem

Our contribution


Web Services

Standardized way of sharing data and functionality with users/clients

• Description and discovery: WSDL, UDDI

• Communication: SOAP


Example Web Services

• WS1: Stock symbol → Company info

• WS2: Stock symbol → Stock activity

(Providers shown in the diagram: NASDAQ, Reuters)


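Querying Across Web Services

[Diagram: a user/client sends a query that spans WS1 (Stock symbol → Company info) and WS2 (Stock symbol → Stock activity), provided by NASDAQ and Reuters, and gets back results.]

Example query: Get info about all companies with high-activity stock

Desired properties:

• Easy

• Transparent

• Efficient

• Etc.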


Same Basic Goal as Traditional DBMS

[Diagram: a user/client sends a query through a declarative interface to a database management system over the data, and gets back results.]

• Easy

• Transparent

• Efficient

• Etc.


Web Service Management System

[Diagram: a user/client sends a query to the Web Service Management System, which invokes WS1 and WS2 (NASDAQ, Reuters) and returns results.]

• Easy

• Transparent

• Efficient

• Etc.


WSMS Architecture

[Diagram: the client passes a query plus input data to the WSMS through a declarative interface and receives results; the WSMS issues WS invocations to WS1, WS2, …, WSn.]

WSMS components:

• Metadata Component: web service registration, schema mapper

• Query Processing Component: plan selection, plan execution

• Profiling and Statistics Component: response-time profiler, statistics tracker


Running Example

Credit card company wants to send offers to people with:

a) credit rating > 600, and

b) payment history = “good” on prior credit card

Company has at its disposal:

L: List of potential recipients (identified by SSN)

WS1: SSN → credit rating

WS2: SSN → cc number(s)

WS3: cc number → payment history


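Plan 1

[Query plan diagram: the client passes L(SSN) to the WSMS, which calls WS1 (SSN → cr), WS2 (SSN → ccn), and WS3 (ccn → ph) one after another.]

• L → WS1: send SSN, get (SSN, cr); filter on cr, keep SSN

• → WS2: send SSN, get (SSN, ccn)

• → WS3: send ccn, get ph, giving (SSN, ccn, ph); filter on ph, keep SSN

Note: Pipelined processing

Example with L = {1, 2}:

• WS1 returns (1, 500) and (2, 700); the filter keeps SSN 2

• WS2 returns (2, 123) and (2, 456)

• WS3 returns (123, bad) and (456, good)

• Result: SSN 2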
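The pipelined execution of Plan 1 can be mimicked in a few lines of Python. This is a minimal single-threaded sketch, assuming the three web services are replaced by hypothetical in-memory functions `ws1`, `ws2`, `ws3` holding the toy values from the Plan 1 slide; a real WSMS would issue SOAP calls and overlap the calls of different stages. Generators give the pipelined flavor: each stage pulls one tuple at a time from the previous stage instead of materializing the whole intermediate result.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical in-memory stand-ins for the three remote web services,
# filled with the toy values from the Plan 1 illustration.
CREDIT_RATING: Dict[int, int] = {1: 500, 2: 700}              # WS1: SSN -> cr
CARDS: Dict[int, List[int]] = {2: [123, 456]}                  # WS2: SSN -> ccn(s)
PAYMENT_HISTORY: Dict[int, str] = {123: "bad", 456: "good"}    # WS3: ccn -> ph


def ws1(ssn: int) -> int:
    """Stand-in for a call to WS1: SSN -> credit rating."""
    return CREDIT_RATING[ssn]


def ws2(ssn: int) -> List[int]:
    """Stand-in for a call to WS2: SSN -> zero or more credit card numbers."""
    return CARDS.get(ssn, [])


def ws3(ccn: int) -> str:
    """Stand-in for a call to WS3: credit card number -> payment history."""
    return PAYMENT_HISTORY[ccn]


def plan1(ssns: Iterable[int]) -> Iterator[int]:
    """Plan 1 (L -> WS1 -> WS2 -> WS3), written as a chain of lazy generators."""
    # Stage 1: call WS1, filter on cr > 600, keep SSN.
    good_credit = (ssn for ssn in ssns if ws1(ssn) > 600)
    # Stage 2: call WS2; one SSN may produce several (SSN, ccn) tuples.
    with_cards = ((ssn, ccn) for ssn in good_credit for ccn in ws2(ssn))
    # Stage 3: call WS3, filter on ph = "good", keep SSN.
    for ssn, ccn in with_cards:
        if ws3(ccn) == "good":
            yield ssn


print(list(plan1([1, 2])))  # [2]
```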


Simple Representation of Plan 1

L → WS1 (SSN → cr) → WS2 (SSN → ccn) → WS3 (ccn → ph) → Results


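Plan 2

[Query plan diagram: the WSMS sends L(SSN) both to WS1 (SSN → cr) and to WS2 (SSN → ccn), whose output feeds WS3 (ccn → ph); the two branches are joined on SSN.]

• Branch 1: L → WS1: send SSN, get (SSN, cr); filter on cr, keep SSN

• Branch 2: L → WS2: send SSN, get (SSN, ccn) → WS3: send ccn, get ph, giving (SSN, ccn, ph); filter on ph, keep SSN

• Join the two branches on SSN

Example with L = {1, 2}: the services return the same values as in Plan 1, and the join yields SSN 2.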
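A matching sketch for Plan 2, under the same assumptions (hypothetical in-memory `ws1`, `ws2`, `ws3` with the toy values). For brevity the two branches are collected into sets before the join on SSN; this is a simplification, not necessarily how a pipelined WSMS would implement the join. A call counter records how often each service is invoked.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Same hypothetical toy data as in the plan illustrations.
CREDIT_RATING: Dict[int, int] = {1: 500, 2: 700}              # WS1: SSN -> cr
CARDS: Dict[int, List[int]] = {2: [123, 456]}                  # WS2: SSN -> ccn(s)
PAYMENT_HISTORY: Dict[int, str] = {123: "bad", 456: "good"}    # WS3: ccn -> ph

calls: Counter = Counter()  # number of simulated invocations per web service


def ws1(ssn: int) -> int:
    calls["WS1"] += 1
    return CREDIT_RATING[ssn]


def ws2(ssn: int) -> List[int]:
    calls["WS2"] += 1
    return CARDS.get(ssn, [])


def ws3(ccn: int) -> str:
    calls["WS3"] += 1
    return PAYMENT_HISTORY[ccn]


def plan2(ssns: Iterable[int]) -> List[int]:
    """Plan 2: WS1 in parallel with WS2 -> WS3, joined on SSN at the WSMS."""
    ssns = list(ssns)  # L is fed to both branches
    # Branch 1: call WS1, filter on cr > 600, keep SSN.
    branch1: Set[int] = {ssn for ssn in ssns if ws1(ssn) > 600}
    # Branch 2: call WS2 then WS3, filter on ph = "good", keep SSN.
    branch2: Set[int] = {ssn for ssn in ssns for ccn in ws2(ssn) if ws3(ccn) == "good"}
    # Join the two branches on SSN.
    return sorted(branch1 & branch2)


print(plan2([1, 2]), dict(calls))  # [2] {'WS1': 2, 'WS2': 2, 'WS3': 2}
```

Here WS2 is called for every SSN in L. Under Plan 1, WS2 would only be called for SSN 2, the one that passes the credit-rating filter.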


Simple Representation of Plan 2

L → WS1 (SSN → cr) → Results

L → WS2 (SSN → ccn) → WS3 (ccn → ph) → Results

(the two branches are joined on SSN)


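Quiz

Plan 1: L → WS1 → WS2 → WS3 → Results

Plan 2: L → WS1 → Results, in parallel with L → WS2 → WS3 → Results

Which plan is better?

• Cost metric: steady-state throughput

• Assume join is “free”

Plan 1 is never worse: WS2 and WS3 only receive tuples for the SSNs that pass WS1's credit-rating filter, rather than for all of L.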


Query Optimization Primer

Possible query plans: P1, …, Pn

Data/access statistics: S

Execution cost metric: cost(Pi, S)

GOAL: Find least-cost plan


Queries and Plans

“Select-Project-Join” queries over input data L and a set of web services WS1, …, WSn

Precedence constraints: output of WSi may be needed as input for WSj

Ex: WS2: SSN → ccn and WS3: ccn → ph, so WS2 must precede WS3

Precedence DAG defines space of query plans
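To make the role of precedence constraints concrete, the sketch below enumerates the linear plans (orderings) of the running example that respect WS2 → WS3. It is brute force over permutations, so it is only practical for tiny n, and it covers linear plans only; the plan space here also includes non-linear plans such as Plan 2. The function name `linear_plans` is illustrative.

```python
from itertools import permutations
from typing import List, Set, Tuple


def linear_plans(services: List[str], must_precede: Set[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """All orderings of `services` in which every (before, after) pair is respected."""
    plans = []
    for order in permutations(services):
        position = {ws: k for k, ws in enumerate(order)}
        if all(position[a] < position[b] for a, b in must_precede):
            plans.append(order)
    return plans


# Running example: WS3 consumes the ccn values produced by WS2.
for plan in linear_plans(["WS1", "WS2", "WS3"], {("WS2", "WS3")}):
    print(" -> ".join(plan))
```

This prints three orderings: WS1 -> WS2 -> WS3, WS2 -> WS1 -> WS3, and WS2 -> WS3 -> WS1.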


Statistics

1) Web service response times

2) Web service selectivities



Statistics: Response Times

ri: per-tuple response time of WSi from client

[Diagram: the client sends SSN to WS1 (SSN → cr) and receives cr; r1 is the per-tuple response time of this call.]

• Assume independent response times within query plans

• ri ≈ 1/throughput; can be reduced by batching and parallel calls (see paper)



Statistics: Selectivities

si: selectivity of WSi

Average number of output tuples per input tuple to WSi, including post-filtering in the query plan

• WS1: SSN → cr, filter cr > 600. If 90% of SSNs have cr > 600, then s1 = 0.9

• WS2: SSN → ccn. If on average each SSN has 2 credit cards, then s2 = 2.0


• Assume independent selectivities within query plans

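The prototype assumes statistics are provided, and statistics tracking is listed as future work, so the following is only one plausible way to obtain si: call the service on a sample of inputs, apply the plan's post-filter, and average the number of surviving output tuples. The function name `estimate_selectivity` and the sample data are hypothetical, chosen to reproduce s1 = 0.9 and s2 = 2.0.

```python
from typing import Callable, Dict, List, Sequence


def estimate_selectivity(sample: Sequence[int], call: Callable[[int], List[object]]) -> float:
    """Average number of output tuples per input tuple over a sample of inputs.

    `call` must already include any post-filtering the plan applies, so an
    input whose output is filtered away contributes 0 tuples.
    """
    if not sample:
        raise ValueError("need at least one sample input")
    return sum(len(call(t)) for t in sample) / len(sample)


# Hypothetical sample: 9 of 10 SSNs have cr > 600 (WS1 plus its filter).
ratings: Dict[int, int] = {ssn: 700 for ssn in range(9)}
ratings[9] = 550
s1 = estimate_selectivity(list(ratings), lambda ssn: [ratings[ssn]] if ratings[ssn] > 600 else [])

# Hypothetical sample: 3 SSNs holding 2, 1, and 3 credit cards (WS2).
cards: Dict[int, List[int]] = {0: [11, 12], 1: [13], 2: [14, 15, 16]}
s2 = estimate_selectivity(list(cards), lambda ssn: cards[ssn])

print(s1, s2)  # 0.9 2.0
```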


Bottleneck Cost Metric

Conference Lunch Buffet

[Diagram: diners moving along Dish 1 → Dish 2 → Dish 3 → Dish 4]

Average per-tuple processing time = response time of the slowest (bottleneck) stage in the pipeline

Note: selectivities = 1 in this example
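The buffet claim can be checked with a small simulation, assuming each stage (dish) serves one tuple (diner) at a time, all tuples are queued at the first stage from the start, and buffers between stages are unbounded. The per-dish times and the function name are hypothetical. For n identical tuples the total time works out to sum(times) + (n − 1) · max(times), so the average per-tuple time approaches the slowest stage.

```python
from typing import List


def pipelined_total_time(stage_times: List[float], n_tuples: int) -> float:
    """Time until n identical tuples have cleared every stage of a pipeline.

    Each stage handles one tuple at a time, all tuples wait at the first stage
    from time 0, and buffers between stages are unbounded.
    """
    free_at = [0.0] * len(stage_times)  # time at which each stage becomes free
    for _ in range(n_tuples):
        ready = 0.0  # when this tuple is available to the current stage
        for i, t in enumerate(stage_times):
            start = max(ready, free_at[i])  # wait for both the tuple and the stage
            free_at[i] = start + t
            ready = free_at[i]              # the tuple moves on to the next stage
    return free_at[-1]


dishes = [1.0, 3.0, 2.0, 1.0]  # hypothetical time spent at Dish 1..4
n = 1000
print(pipelined_total_time(dishes, n) / n)  # 3.004
```

With 1000 diners the average comes out at 3.004, just above the slowest dish's 3.0, even though one diner alone needs 7.0 to visit all four dishes.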


Cost Equation for Plan P

$R_i(P)$: predecessors of WSi in plan P

• Fraction of input tuples seen by WSi = $\prod_{j \in R_i(P)} s_j$

• WSi response time per input tuple = $\left( \prod_{j \in R_i(P)} s_j \right) r_i$

• Bottleneck cost metric (assumes WSMS processing is not the bottleneck):

$$\mathrm{cost}(P) = \max_{1 \le i \le n} \left( \prod_{j \in R_i(P)} s_j \right) r_i$$
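The cost equation translates directly into code. A minimal sketch, assuming a plan is given as a hypothetical dictionary mapping each web service to its parents, and reading Ri(P) as all services upstream of WSi (direct or indirect), since each of them changes how many tuples reach WSi. s1 and s2 follow the selectivity slide; s3 and all ri are made-up values. The function names are illustrative, and `math.prod` needs Python 3.8 or later.

```python
from math import prod
from typing import Dict, Set


def predecessors(parents: Dict[str, Set[str]], ws: str) -> Set[str]:
    """R_i(P): every web service upstream of `ws` in the plan DAG (direct or indirect)."""
    seen: Set[str] = set()
    stack = list(parents.get(ws, set()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, set()))
    return seen


def bottleneck_cost(parents: Dict[str, Set[str]], r: Dict[str, float], s: Dict[str, float]) -> float:
    """max over i of (product of s_j for j in R_i(P)) * r_i; the WSMS is assumed never to be the bottleneck."""
    return max(prod(s[j] for j in predecessors(parents, i)) * r[i] for i in r)


# Running example with hypothetical statistics (s3 and every r_i are made up).
r = {"WS1": 1.0, "WS2": 1.0, "WS3": 1.0}
s = {"WS1": 0.9, "WS2": 2.0, "WS3": 0.5}
plan1 = {"WS1": set(), "WS2": {"WS1"}, "WS3": {"WS2"}}  # L -> WS1 -> WS2 -> WS3
plan2 = {"WS1": set(), "WS2": set(), "WS3": {"WS2"}}    # WS1 in parallel with WS2 -> WS3
print(bottleneck_cost(plan1, r, s), bottleneck_cost(plan2, r, s))
```

With these numbers Plan 1 costs 1.8 (its WS3 term, 0.9 × 2.0 × 1.0) and Plan 2 costs 2.0 (its WS3 term, 2.0 × 1.0).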


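Contrast with Sum Cost Metric

“Polite” Lunch Buffet

[Diagram: Dish 1 → Dish 2 → Dish 3 → Dish 4; each diner waits until the previous one has visited every dish, so the per-tuple time is the sum of the stage times.]

$$\mathrm{cost}(P) = \sum_{1 \le i \le n} \left( \prod_{j \in R_i(P)} s_j \right) r_i$$

Settings that use the sum metric:

• Stream filter ordering

• Expensive predicate placement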
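The two metrics can disagree about which plan is best. The sketch below, restricted to linear plans for brevity, scores both orderings of a hypothetical two-service instance (A: r = 3, s = 0.1; B: r = 1, s = 0.9). Service names, numbers, and function names are all illustrative.

```python
from math import prod
from typing import Dict, List, Sequence


def terms(order: Sequence[str], r: Dict[str, float], s: Dict[str, float]) -> List[float]:
    """Per-service load in a linear plan: (product of earlier selectivities) * r_i."""
    return [prod(s[j] for j in order[:k]) * r[ws] for k, ws in enumerate(order)]


def bottleneck_cost(order: Sequence[str], r: Dict[str, float], s: Dict[str, float]) -> float:
    return max(terms(order, r, s))


def sum_cost(order: Sequence[str], r: Dict[str, float], s: Dict[str, float]) -> float:
    return sum(terms(order, r, s))


# Hypothetical two-service instance, no precedence constraints.
r = {"A": 3.0, "B": 1.0}
s = {"A": 0.1, "B": 0.9}
for order in (("A", "B"), ("B", "A")):
    print(order, "max =", round(bottleneck_cost(order, r, s), 2),
          "sum =", round(sum_cost(order, r, s), 2))
```

The bottleneck metric prefers B before A (2.7 vs 3.0), while the sum metric prefers A before B (3.1 vs 3.7): A's strong filtering pays off in the sum even though A itself is the slowest stage.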


Problem Statement

Input:

• Web services WS1, …, WSn

• Response times r1, …, rn

• Selectivities s1, …, sn

• Precedence constraints among web services

Output:

• Web services arranged into a plan P

• P respects all precedence constraints

• cost(P) is minimized


No Precedence Constraints

All selectivities ≤ 1:

• Theorem: optimal to order linearly by increasing ri (selectivities irrelevant)

General case (optimal):

• “Selective” web services (si ≤ 1) in a chain ordered by response time, followed by the “proliferative” web services (si > 1) in parallel, joined at the WSMS to produce the results
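The general-case plan shape can be built mechanically. This sketch assumes “selective” means si ≤ 1 and “proliferative” means si > 1, and uses hypothetical statistics and illustrative function names. The optimality proof is in the paper; the brute-force comparison against all linear orderings is only a sanity check on this one instance.

```python
from itertools import permutations
from math import prod
from typing import Dict, Sequence, Set

Plan = Dict[str, Set[str]]  # web service -> its parents in the plan DAG


def plan_without_constraints(r: Dict[str, float], s: Dict[str, float]) -> Plan:
    """Chain the selective services (s <= 1) by increasing r_i, then attach every
    proliferative service (s > 1) to the end of that chain, in parallel."""
    chain = sorted((ws for ws in r if s[ws] <= 1), key=lambda ws: r[ws])
    plan: Plan = {ws: ({chain[k - 1]} if k > 0 else set()) for k, ws in enumerate(chain)}
    for ws in r:
        if s[ws] > 1:
            plan[ws] = {chain[-1]} if chain else set()
    return plan


def linear(order: Sequence[str]) -> Plan:
    """A plain linear plan: each service's only parent is the one before it."""
    return {ws: ({order[k - 1]} if k > 0 else set()) for k, ws in enumerate(order)}


def upstream(plan: Plan, ws: str) -> Set[str]:
    """All direct and indirect predecessors of ws, i.e. R_i(P)."""
    seen: Set[str] = set()
    stack = list(plan[ws])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(plan[p])
    return seen


def bottleneck_cost(plan: Plan, r: Dict[str, float], s: Dict[str, float]) -> float:
    return max(prod(s[j] for j in upstream(plan, i)) * r[i] for i in r)


# Hypothetical statistics: A and B are selective, C and D are proliferative.
r = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 3.0}
s = {"A": 0.5, "B": 0.8, "C": 2.0, "D": 1.5}

best = plan_without_constraints(r, s)
best_linear = min(bottleneck_cost(linear(o), r, s) for o in permutations(r))
print(best)
print(bottleneck_cost(best, r, s), best_linear)
```

Here the constructed plan costs 1.6 while the best purely linear ordering costs 2.4: any linear order puts one proliferative service upstream of the other, multiplying its load.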


With Precedence Constraints

$$\mathrm{cost}(P) = \max_{1 \le i \le n} \left( \prod_{j \in R_i(P)} s_j \right) r_i$$


With Precedence Constraints

Sum cost metric

• Hard even to obtain a plan within a factor O(n) of optimal

[Figure: percent contribution vs. time in program (0 to 5 years), with curves for Student and Advisor.]

$$\mathrm{cost}(P) = \sum_{1 \le i \le n} \left( \prod_{j \in R_i(P)} s_j \right) r_i$$


With Precedence Constraints

Bottleneck (max) cost metric

• Surprisingly, the optimal solution can be found in polynomial time

• $O(n^5)$ algorithm in paper: add one WS at a time to the plan, with each WS chosen by solving a linear program

[Figure: percent contribution vs. time in program (0 to 5 years), with curves for Student and Advisor.]

$$\mathrm{cost}(P) = \max_{1 \le i \le n} \left( \prod_{j \in R_i(P)} s_j \right) r_i$$


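Example Revisited

Plan 1: L → WS1 (SSN → cr) → WS2 (SSN → ccn) → WS3 (ccn → ph) → Results

Plan 2: L → WS1 (SSN → cr) → Results, in parallel with L → WS2 (SSN → ccn) → WS3 (ccn → ph) → Results

• Selective: WS1

• Proliferative: WS2

• Precedence constraint: WS2 → WS3

Placing the selective WS1 first, then WS2 → WS3, gives Plan 1.

$$\mathrm{cost}(P) = \max_{1 \le i \le n} \left( \prod_{j \in R_i(P)} s_j \right) r_i$$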
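For an instance as small as the running example, the optimum can also be found by exhaustive search. This is a useful cross-check, not the paper's polynomial-time algorithm. The sketch treats a plan as any DAG over the three services (assuming a service with several parents receives the join of their outputs), keeps the DAGs that are acyclic and place WS2 upstream of WS3, and picks the one with the lowest bottleneck cost. s1 and s2 follow the selectivity slide; s3, every ri, and the function names are hypothetical.

```python
from itertools import combinations
from math import prod

SERVICES = ("WS1", "WS2", "WS3")
# Every possible directed edge "a feeds b" between two distinct services.
EDGES = [(a, b) for a in SERVICES for b in SERVICES if a != b]


def upstream(edges, ws):
    """All services with a directed path to ws, i.e. R_i(P) for ws."""
    seen, stack = set(), [a for a, b in edges if b == ws]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(a for a, b in edges if b == p)
    return seen


def valid(edges):
    """Acyclic, and WS2 is upstream of WS3 (the precedence constraint)."""
    acyclic = all(ws not in upstream(edges, ws) for ws in SERVICES)
    return acyclic and "WS2" in upstream(edges, "WS3")


def cost(edges, r, s):
    """Bottleneck cost: max over services of (product of upstream s_j) * r_i."""
    return max(prod(s[j] for j in upstream(edges, i)) * r[i] for i in SERVICES)


r = {"WS1": 1.0, "WS2": 1.5, "WS3": 0.5}
s = {"WS1": 0.9, "WS2": 2.0, "WS3": 0.5}

plans = [sub for n in range(len(EDGES) + 1) for sub in combinations(EDGES, n) if valid(sub)]
best = min(plans, key=lambda e: cost(e, r, s))
print(len(plans), sorted(best), cost(best, r, s))
```

With these numbers the winner puts WS1 upstream of WS2 and WS2 upstream of WS3, which is Plan 1, at cost 0.9 × 1.5 = 1.35; Plan 2 (only the edge WS2 → WS3) scores 1.5.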


Implementation

Built prototype WSMS query processor

• Optimizer and execution engine

• Assumes schema issues resolved, statistics provided

• Written in Java and uses Apache Axis (open-source SOAP implementation)

• Experiments (see paper) validate analytical results


Isn’t Problem the Same as … ?

Web service composition

• Targeted for workflow-oriented applications

• No provably optimal strategies

Parallel/distributed query optimization

• Freedom to place query operators

• Much larger space of execution plans

Data integration, mediators

• For general sources of data

• Optimization of total resource consumption


Conclusion

New Query Optimization Problem

Our contribution


Questions?

[Figure: percent contribution vs. time in program (0 to 5 years), with curves for Student and Advisor.]