Query Optimization over Web Services
Utkarsh Srivastava
Jennifer Widom
Kamesh Munagala
Rajeev Motwani
2
Performance Numbers
Relative Contribution to Research
[Figure: Percent Contribution (0 to 100) vs. Time in Program (0 to 5 years); series: Student, Advisor, This Work]
3
Future Directions (sample)
• Web services with monetary cost
• Web services with unstable response times (QoS guarantees?)
• Multiple web services for same data
• Caching web-service query results
• More expressive queries, also workflows
• Web service profiling and statistics-tracking
4
New Query Optimization Problem
First Steps in Big Problem
Our contribution
5
Web Services
Standardized way of sharing data and functionality
• Description and discovery: WSDL, UDDI
• Communication: SOAP
[Diagram: Users/Clients ↔ Web Services (Data, Functionality)]
6
Example Web Services
WS1 (NASDAQ): Stock symbol → Company info
WS2 (Reuters): Stock symbol → Stock activity
7
Querying Across Web Services
WS1 (NASDAQ): Stock symbol → Company info
WS2 (Reuters): Stock symbol → Stock activity
Query: Get info about all companies with high-activity stock
[Diagram: User/Client sends Query, receives Results]
• Easy
• Transparent
• Efficient
• Etc.
8
Same Basic Goal as Traditional DBMS
[Diagram: User/Client sends Query to a Database Management System over Data, receives Results]
Declarative Interface
• Easy
• Transparent
• Efficient
• Etc.
9
Web Service Management System
[Diagram: User/Client sends Query to the Web Service Management System, receives Results; the Web Service Management System calls WS1 (NASDAQ) and WS2 (Reuters)]
• Easy
• Transparent
• Efficient
• Etc.
10
WSMS Architecture
Client → WSMS: Query + input data (Declarative Interface)
WSMS → Client: Results
WSMS → WS1, WS2, …, WSn: WS Invocations
WSMS components:
• Metadata Component: Web service registration, Schema mapper
• Query Processing Component: Plan selection, Plan execution
• Profiling and Statistics Component: Response-time profiler, Statistics tracker
11
Running Example
Credit card company wants to send offers to people with:
a) credit rating > 600, and
b) payment history = “good” on prior credit card
Company has at its disposal:
L : List of potential recipients (identified by SSN)
WS1 : SSN → credit rating
WS2 : SSN → cc number(s)
WS3 : cc number → payment history
12
Plan 1
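Client sends Query to WSMS; WSMS runs the plan against WS1, WS2, WS3
Plan:
1) Send L(SSN) to WS1 (SSN → cr); receive SSN, cr
2) Filter on cr, keep SSN
3) Send SSN to WS2 (SSN → ccn); receive SSN, ccn
4) Send ccn to WS3 (ccn → ph); receive SSN, ccn, ph
5) Filter on ph, keep SSN
Note: Pipelined processing
Example data:
L (SSN): 1, 2
WS1 output (SSN, cr): (1, 500), (2, 700)
WS2 output (SSN, ccn): (2, 123), (2, 456)
WS3 output (ccn, ph): (123, bad), (456, good)
Results (SSN): 2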
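The pipelined flow of Plan 1 can be sketched with Python generators. This is only an illustration: the dictionaries stand in for real web-service calls, and the function names (`ws1`, `ws2`, `ws3`, `plan1`) are hypothetical. The data values are the ones shown on the slide.

```python
# Toy data copied from the slide. The dict lookups are stand-ins for real
# web-service invocations (an assumption made for illustration only).
CREDIT_RATING = {1: 500, 2: 700}             # WS1: SSN -> cr
CARD_NUMBERS = {2: [123, 456]}               # WS2: SSN -> ccn(s)
PAYMENT_HISTORY = {123: "bad", 456: "good"}  # WS3: ccn -> ph


def ws1(ssns):
    """WS1 followed by the filter on cr; only the SSN is kept."""
    for ssn in ssns:
        if CREDIT_RATING[ssn] > 600:
            yield ssn


def ws2(ssns):
    """WS2: emit one (SSN, ccn) pair per credit card."""
    for ssn in ssns:
        for ccn in CARD_NUMBERS.get(ssn, []):
            yield ssn, ccn


def ws3(pairs):
    """WS3 followed by the filter on ph; only the SSN is kept."""
    for ssn, ccn in pairs:
        if PAYMENT_HISTORY[ccn] == "good":
            yield ssn


def plan1(ssns):
    # Generators are lazy, so each tuple moves through all three stages
    # without waiting for the whole list to clear an earlier stage.
    return list(ws3(ws2(ws1(ssns))))


print(plan1([1, 2]))  # [2]
```

SSN 1 is dropped by the filter on cr, and SSN 2 survives through its card 456, which matches the result shown on the slide.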
13
Simple Representation of Plan 1
L → WS1 (SSN → cr) → WS2 (SSN → ccn) → WS3 (ccn → ph) → Results
14
Plan 2
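Client sends Query to WSMS; WSMS runs the plan against WS1, WS2, WS3
Plan: L(SSN) feeds two branches
Branch A:
1) Send SSN to WS1 (SSN → cr); receive SSN, cr
2) Filter on cr, keep SSN
Branch B:
1) Send SSN to WS2 (SSN → ccn); receive SSN, ccn
2) Send ccn to WS3 (ccn → ph); receive SSN, ccn, ph
3) Filter on ph, keep SSN
Join the SSN outputs of the two branches
Example data:
L (SSN): 1, 2
WS1 output (SSN, cr): (1, 500), (2, 700)
WS2 output (SSN, ccn): (2, 123), (2, 456)
WS3 output (ccn, ph): (123, bad), (456, good)
Results (SSN): 2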
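Plan 2 differs from Plan 1 only in that WS1 no longer sits in front of WS2; both branches read L directly and their SSN outputs are joined at the WSMS. A minimal sketch, again with dictionaries standing in for the services and with hypothetical function names. For brevity it materializes each branch as a set instead of pipelining, which is a simplification of the slide.

```python
# Toy data from the slide; dict lookups replace real web-service calls.
CREDIT_RATING = {1: 500, 2: 700}             # WS1: SSN -> cr
CARD_NUMBERS = {2: [123, 456]}               # WS2: SSN -> ccn(s)
PAYMENT_HISTORY = {123: "bad", 456: "good"}  # WS3: ccn -> ph


def branch_ws1(ssns):
    """Branch A: WS1, filter on cr, keep SSN."""
    return {ssn for ssn in ssns if CREDIT_RATING[ssn] > 600}


def branch_ws2_ws3(ssns):
    """Branch B: WS2, then WS3, filter on ph, keep SSN."""
    good = set()
    for ssn in ssns:
        for ccn in CARD_NUMBERS.get(ssn, []):
            if PAYMENT_HISTORY[ccn] == "good":
                good.add(ssn)
    return good


def plan2(ssns):
    # Join on SSN: keep the SSNs that survive both branches.
    return sorted(branch_ws1(ssns) & branch_ws2_ws3(ssns))


print(plan2([1, 2]))  # [2]
```

Both plans return the same answer; they differ in how many tuples each web service has to process, which is what the cost metric later in the talk measures.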
15
Simple Representation of Plan 2
Branch A: L → WS1 (SSN → cr)
Branch B: L → WS2 (SSN → ccn) → WS3 (ccn → ph)
Branches A and B → Results
16
Quiz
Plan 1: L → WS1 → WS2 → WS3 → Results
Plan 2: L → WS1 and L → WS2 → WS3, both branches → Results
Which plan is better?
• Cost metric: steady-state throughput
• Assume join is “free”
Plan 1 is never worse
17
Query Optimization Primer
Possible query plans: P1, …, Pn
Data/access statistics: S
Execution cost metric: cost(Pi, S)
GOAL: Find least-cost plan
19
Queries and Plans
“Select-Project-Join” queries over input data L and set of web services WS1, …, WSn
Precedence constraints
Output of WSi may be needed as input for WSj
Ex: WS2: SSN → ccn and WS3: ccn → ph
Precedence DAG defines space of query plans
20
Query Optimization Primer
Possible query plans: P1, …, Pn
Data/access statistics: S
Execution cost metric: cost(Pi, S)
GOAL: Find least-cost plan
21
Statistics
1) Web service response times
2) Web service selectivities
22
Statistics: Response Times
ri: per-tuple response time of WSi from client
Example: client sends SSN to WS1 (SSN → cr) and receives cr after time r1
• Assume independent response times within query plans
• ri ≈ 1/throughput, can be reduced by batching, parallel calls (see paper)
23
Statistics: Selectivities
si: selectivity of WSi
Average # output tuples per input tuple to WSi
including post-filtering in query plan
WS1: SSN → cr, filter cr > 600
If 90% of SSNs have cr > 600 then s1 = 0.9
WS2: SSN → ccn
If on average each SSN has 2 credit cards then s2 = 2.0
• Assume independent selectivities within query plans
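The selectivity definition above (average number of output tuples per input tuple, after post-filtering) can be made concrete with a small sketch. The sample of ten SSNs and the helper names are made up so that the numbers reproduce the slide's s1 = 0.9 and s2 = 2.0; they are not measurements.

```python
def selectivity(call, inputs):
    """Average number of output tuples per input tuple."""
    inputs = list(inputs)
    total_outputs = sum(len(call(x)) for x in inputs)
    return total_outputs / len(inputs)


# Hypothetical sample: SSNs 1..9 have rating 700, SSN 10 has rating 500,
# and every SSN has exactly two credit cards.
RATINGS = {ssn: 700 for ssn in range(1, 10)}
RATINGS[10] = 500
CARDS = {ssn: [ssn * 10, ssn * 10 + 1] for ssn in range(1, 11)}


def ws1_with_filter(ssn):
    """WS1 plus the post-filter cr > 600: returns zero or one tuple."""
    return [ssn] if RATINGS[ssn] > 600 else []


def ws2(ssn):
    """WS2: returns one tuple per credit card."""
    return [(ssn, ccn) for ccn in CARDS[ssn]]


s1 = selectivity(ws1_with_filter, RATINGS)  # 9 outputs / 10 inputs
s2 = selectivity(ws2, CARDS)                # 20 outputs / 10 inputs
print(s1, s2)  # 0.9 2.0
```

A selectivity below 1 marks a service that shrinks the tuple stream, and a selectivity above 1 marks one that grows it.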
24
Query Optimization Primer
Possible query plans: P1, …, Pn
Data/access statistics: S
Execution cost metric: cost(Pi, S)
GOAL: Find least-cost plan
25
Bottleneck Cost Metric
26
Bottleneck Cost Metric
Conference Lunch Buffet
Dish 1 Dish 2 Dish 3 Dish 4
Average per-tuple processing time = response time of slowest (bottleneck) stage in pipeline
Note: selectivities = 1 in this example
27
Cost Equation for Plan P
$R_i(P)$: Predecessors of $WS_i$ in plan $P$
• Fraction of input tuples seen by $WS_i$ = $\prod_{j \in R_i(P)} s_j$
• $WS_i$ response time per input tuple = $\left(\prod_{j \in R_i(P)} s_j\right) \cdot r_i$
• Bottleneck cost metric: $\mathrm{cost}(P) = \max_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) \cdot r_i \right)$
(assumes WSMS processing is not the bottleneck)
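To make the bottleneck metric concrete, the sketch below evaluates it for Plan 1 and Plan 2 of the running example. A plan is represented simply as a map from each web service to its predecessor set Ri(P). The selectivities s1 = 0.9 and s2 = 2.0 come from the slides; the response times and s3 are assumed values chosen only for illustration.

```python
from math import prod


def bottleneck_cost(predecessors, r, s):
    """max over i of (product of s_j for j in R_i(P)) * r_i."""
    return max(
        prod(s[j] for j in preds) * r[ws]
        for ws, preds in predecessors.items()
    )


# s1, s2 are from the slides; r1, r2, r3 and s3 are assumptions.
r = {"WS1": 1.0, "WS2": 2.0, "WS3": 1.0}
s = {"WS1": 0.9, "WS2": 2.0, "WS3": 0.5}

# Plan 1: L -> WS1 -> WS2 -> WS3
plan1 = {"WS1": set(), "WS2": {"WS1"}, "WS3": {"WS1", "WS2"}}
# Plan 2: WS1 in parallel with WS2 -> WS3
plan2 = {"WS1": set(), "WS2": set(), "WS3": {"WS2"}}

cost1 = bottleneck_cost(plan1, r, s)  # max(1.0, 0.9*2.0, 0.9*2.0*1.0)
cost2 = bottleneck_cost(plan2, r, s)  # max(1.0, 2.0, 2.0*1.0)
print(cost1, cost2)
```

With these assumed numbers Plan 1 has the lower bottleneck cost, because WS1 filters tuples before they reach the slower WS2.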
28
Contrast with Sum Cost Metric
“Polite” Lunch Buffet
Dish 1 Dish 2 Dish 3 Dish 4
• Stream filter ordering
• Expensive predicate placement
$\mathrm{cost}(P) = \sum_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) \cdot r_i \right)$
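The two metrics can prefer different plans. The sketch below compares both orderings of two hypothetical services A and B (names, response times, and selectivities are all made up for illustration) under the max and the sum metric.

```python
def stage_loads(order, r, s):
    """Per-input-tuple work at each stage of a linear plan."""
    loads, fraction = [], 1.0
    for ws in order:
        loads.append(fraction * r[ws])  # tuples reaching ws, times r
        fraction *= s[ws]               # tuples surviving ws
    return loads


def bottleneck_cost(order, r, s):
    return max(stage_loads(order, r, s))


def sum_cost(order, r, s):
    return sum(stage_loads(order, r, s))


# Hypothetical services: A is fast but filters little,
# B is slow but filters a lot.
r = {"A": 1.0, "B": 2.0}
s = {"A": 0.9, "B": 0.1}

for order in (("A", "B"), ("B", "A")):
    print(order, bottleneck_cost(order, r, s), sum_cost(order, r, s))
```

Here the bottleneck metric prefers A before B (roughly 1.8 versus 2.0), while the sum metric prefers B before A (roughly 2.1 versus 2.8), so an optimizer built for one metric cannot simply be reused for the other.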
29
Problem Statement
Input:
• Web services WS1, …, WSn
• Response times r1, …, rn
• Selectivities s1, …, sn
• Precedence constraints among web services
Output:
• Web services arranged into a plan P
• P respects all precedence constraints
• cost(P) is minimized
30
No Precedence Constraints
All selectivities ≤ 1
Theorem: Optimal to order linearly by ri (selectivities irrelevant)
General case (optimal):
“selective” web services ordered by response time → “proliferative” web services … → join at WSMS → Results
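The theorem for the all-selective case can be checked by brute force on a small instance. The sketch below sorts three hypothetical services by increasing response time and compares the resulting bottleneck cost with the minimum over all linear orderings. The service names and numbers are assumptions, and the check covers this one instance only; it is not a proof.

```python
from itertools import permutations


def linear_bottleneck_cost(order, r, s):
    """Bottleneck cost of a linear plan visiting services in `order`."""
    cost, fraction = 0.0, 1.0
    for ws in order:
        cost = max(cost, fraction * r[ws])
        fraction *= s[ws]
    return cost


def order_by_response_time(r):
    """Fastest service first; selectivities are not consulted."""
    return sorted(r, key=r.get)


# Hypothetical instance with all selectivities <= 1.
r = {"A": 3.0, "B": 1.0, "C": 2.0}
s = {"A": 0.5, "B": 0.9, "C": 0.2}

greedy = order_by_response_time(r)
greedy_cost = linear_bottleneck_cost(greedy, r, s)
best_cost = min(linear_bottleneck_cost(p, r, s) for p in permutations(r))
print(greedy, greedy_cost, best_cost)
```

Placing slow services late means they see only the tuples that survived the earlier filters, which is why the sorted order matches the brute-force minimum here.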
31
With Precedence Constraints
$\mathrm{cost}(P) = \max_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) \cdot r_i \right)$
32
With Precedence Constraints
Sum cost metric
• Hard to even obtain a factor O(n) of optimal
[Figure: Percent Contribution (0 to 100) vs. Time in Program (0 to 5 years); series: Student, Advisor]
$\mathrm{cost}(P) = \sum_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) \cdot r_i \right)$
33
With Precedence Constraints
Bottleneck (max) cost metric
• Surprisingly, optimal solution in polynomial time
• $O(n^5)$ algorithm in paper
  – Add one WS at a time to the plan
  – WS chosen by solving a linear program
[Figure: Percent Contribution (0 to 100) vs. Time in Program (0 to 5 years); series: Student, Advisor]
$\mathrm{cost}(P) = \max_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) \cdot r_i \right)$
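For very small inputs, the optimum under precedence constraints can be found by exhaustive search, which is useful as a reference when testing an optimizer. The sketch below is NOT the polynomial-time algorithm from the paper; it is a naive exponential baseline. It assumes a plan can be described by giving each web service a predecessor set that contains all required predecessors and is closed under "predecessor of a predecessor". Response times and s3 are assumed values.

```python
from itertools import combinations, product
from math import prod


def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def best_plan(r, s, precedence):
    """Exhaustive search; `precedence` holds (before, after) pairs."""
    names = list(r)
    # Candidate predecessor sets for each service (never itself).
    choices = [list(subsets(n for n in names if n != ws)) for ws in names]
    best = None
    for combo in product(*choices):
        R = dict(zip(names, combo))
        if any(a not in R[b] for a, b in precedence):
            continue  # violates a precedence constraint
        if not all(R[j] <= R[i] for i in names for j in R[i]):
            continue  # not transitively closed, so not a valid ordering
        cost = max(prod(s[j] for j in R[i]) * r[i] for i in names)
        if best is None or cost < best[0]:
            best = (cost, R)
    return best


# s1, s2 from the slides; r values and s3 are assumptions.
r = {"WS1": 1.0, "WS2": 2.0, "WS3": 1.0}
s = {"WS1": 0.9, "WS2": 2.0, "WS3": 0.5}
cost, plan = best_plan(r, s, precedence=[("WS2", "WS3")])
print(cost, plan)
```

On this instance the search puts WS1 ahead of WS2 and WS3, which is the shape of Plan 1. The number of candidates grows exponentially in n, so this is only practical for a handful of services.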
34
Example Revisited
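Plan 1: L → WS1 (SSN → cr) → WS2 (SSN → ccn) → WS3 (ccn → ph) → Results
Plan 2: L → WS1 (SSN → cr) and L → WS2 (SSN → ccn) → WS3 (ccn → ph), both branches → Results
Selective: WS1
Proliferative: WS2
Precedence constraint: WS2 → WS3
$\max_{1 \le i \le n} \left( \left(\prod_{j \in R_i(P)} s_j\right) \cdot r_i \right)$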
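The quiz answer ("Plan 1 is never worse") follows from applying the bottleneck formula to both plans. A short worked derivation, assuming only that WS1 is selective, i.e. $s_1 \le 1$:

```latex
\begin{align*}
\mathrm{cost}(\text{Plan 1}) &= \max\left(r_1,\; s_1 r_2,\; s_1 s_2 r_3\right) \\
\mathrm{cost}(\text{Plan 2}) &= \max\left(r_1,\; r_2,\; s_2 r_3\right) \\
s_1 \le 1 \;&\Rightarrow\; s_1 r_2 \le r_2 \ \text{ and } \ s_1 s_2 r_3 \le s_2 r_3 \\
&\Rightarrow\; \mathrm{cost}(\text{Plan 1}) \le \mathrm{cost}(\text{Plan 2})
\end{align*}
```

Each term inside the max for Plan 1 is no larger than the matching term for Plan 2, so placing the selective WS1 ahead of the proliferative WS2 cannot raise the bottleneck.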
35
Implementation
Built prototype WSMS query processor
• Optimizer and execution engine
• Assumes schema issues resolved, statistics provided
• Written in Java and uses Apache Axis (open-source SOAP implementation)
• Experiments (see paper) validate analytical results
36
Isn’t Problem the Same as … ?
Web Service composition
• Targeted for workflow-oriented applications
• No provably optimal strategies
Parallel/distributed query optimization
• Freedom to place query operators
• Much larger space of execution plans
Data integration, mediators
• For general sources of data
• Optimization of total resource consumption
37
Future Directions (sample)
• Web services with monetary cost
• Web services with unstable response times (QoS guarantees?)
• Multiple web services for same data
• Caching web-service query results
• More expressive queries, also workflows
• Web service profiling and statistics-tracking
38
Conclusion
Our contribution
New Query Optimization Problem
40
Questions?
[Figure: Percent Contribution (0 to 100) vs. Time in Program (0 to 5 years); series: Student, Advisor]