Automatic Physical Design Tuning: Workload as a Sequence
Sanjay Agrawal, Microsoft Research
Eric Chu, University of Wisconsin-Madison
Vivek Narasayya, Microsoft Research
04/22/23 SIGMOD 2006 2
Automatic Physical Design Tuning
DB applications are becoming more complex and varied.
Considerable time is spent on tuning.
Goal: reduce the cost of ownership of an RDBMS.
Automatically recommend a physical design. Supported by DB vendors:
  Database Engine Tuning Advisor (Microsoft)
  Design Advisor (IBM)
  SQL Access Advisor (Oracle)
Microsoft Database Engine Tuning Advisor
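[Figure: architecture diagram. Applications supply a workload (a set of queries and updates) to the Database Engine Tuning Advisor, which issues “what-if” calls to the extended query optimizer in Microsoft SQL Server 2005 and outputs a recommendation (a set of indexes, materialized views, and horizontal partitions).]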
Workload as a Sequence: Motivation
Data warehousing: query by day, update at night.
Set: no index is recommended when update costs outweigh benefits.
Sequence: may exploit the benefits of indexes without incurring update costs.
  Insert “create” and “drop” of indexes into the workload.
  Exploit the order of statements.
[Figure: timeline of Queries (day), Updates (night), Queries (day), with Create Indexes before each query phase and Drop Indexes before the update phase.]
Set vs. Sequence
Set-based:
  Recommendation is robust to changes in the order of statement arrival.
  Can miss good recommendations compared to the sequence-based approach.
Outputs are different:
  Set: which indexes to create or drop?
  Sequence: which indexes to create or drop, and where in the sequence?
[Figure: the sequence Queries, Updates, Queries with Create Indexes, Drop Indexes, and Create Indexes inserted.]
Model Workload as a Sequence
  Motivation
  Problem Definition
Optimal Algorithm
Disjoint Sequences
Greedy-SEQ
Experiments
Problem Setting
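Workload: S = [S1, S2, …, SN], where each Si ∈ {Select, Insert, Delete, Update}.
Configurations: C0, C1, C2, C3, …, CN, CN+1, where statement Si executes under Ci.
Cost(Si, Ci): cost of executing Si with Ci.
TC(C1, C2): transition cost from C1 to C2.
Sequence execution cost:
$$\sum_{k=1}^{N} \bigl(\mathrm{Cost}(S_k, C_k) + \mathrm{TC}(C_{k-1}, C_k)\bigr) + \mathrm{TC}(C_N, C_{N+1})$$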
Problem Definition
Given: database D, workload W = [S1, …, SN], initial configuration C0, and storage bound M.
Find: configurations C1, C2, …, CN+1 that minimize the sequence execution cost
$$\sum_{k=1}^{N} \bigl(\mathrm{Cost}(S_k, C_k) + \mathrm{TC}(C_{k-1}, C_k)\bigr) + \mathrm{TC}(C_N, C_{N+1})$$
subject to: storage of Ci ≤ M, for all i.
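As a hedged illustration of the objective above, the following sketch evaluates the sequence execution cost for a given choice of configurations. The cost model (`toy_stmt_cost`, `toy_trans_cost`, the index name `"I"`, and every number) is invented for illustration; a real tuner would presumably obtain these costs from the optimizer's what-if interface.

```python
from typing import Callable, FrozenSet, Sequence

Config = FrozenSet[str]


def sequence_cost(
    statements: Sequence[str],
    configs: Sequence[Config],          # [C1, ..., CN, CN+1]
    c0: Config,
    stmt_cost: Callable[[str, Config], float],
    trans_cost: Callable[[Config, Config], float],
) -> float:
    """Sum of Cost(Sk, Ck) + TC(Ck-1, Ck) for k = 1..N, plus TC(CN, CN+1)."""
    if len(configs) != len(statements) + 1:
        raise ValueError("need one configuration per statement plus CN+1")
    total = 0.0
    prev = c0
    for stmt, cfg in zip(statements, configs):   # zip stops after CN
        total += trans_cost(prev, cfg) + stmt_cost(stmt, cfg)
        prev = cfg
    return total + trans_cost(prev, configs[-1])


# Hypothetical toy cost model (all numbers are made up).
def toy_stmt_cost(stmt: str, cfg: Config) -> float:
    if stmt == "Q":                      # query: the index helps
        return 2.0 if "I" in cfg else 10.0
    return 8.0 if "I" in cfg else 1.0    # update: the index must be maintained


def toy_trans_cost(old: Config, new: Config) -> float:
    return 3.0 * len(new - old) + 1.0 * len(old - new)   # create = 3, drop = 1


I, NONE = frozenset({"I"}), frozenset()
workload = ["Q", "U", "Q"]
print(sequence_cost(workload, [I, NONE, I, NONE], NONE, toy_stmt_cost, toy_trans_cost))  # 13.0
print(sequence_cost(workload, [I, I, I, I], NONE, toy_stmt_cost, toy_trans_cost))        # 15.0
print(sequence_cost(workload, [NONE] * 4, NONE, toy_stmt_cost, toy_trans_cost))          # 21.0
```

Under these made-up costs, creating the index for the queries and dropping it around the update is cheaper than either keeping it throughout or never building it.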
Search Space
Given N statements and M indexes:
Sequence-based tuning:
  $2^M$ distinct configurations for each statement.
  $2^{M(N+1)}$ possible execution sequences.
Set-based tuning:
  $2^M$ configurations.
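To make the gap concrete, this small sketch evaluates both counts for example values of N and M. The function names are illustrative only.

```python
def set_based_space(num_indexes: int) -> int:
    """Number of configurations a set-based tuner chooses from: 2^M."""
    return 2 ** num_indexes


def sequence_based_space(num_statements: int, num_indexes: int) -> int:
    """Number of execution sequences C1..CN+1: (2^M)^(N+1) = 2^(M(N+1))."""
    return 2 ** (num_indexes * (num_statements + 1))


print(set_based_space(2))            # 4
print(sequence_based_space(3, 2))    # 256
```

Even with only 3 statements and 2 indexes, the sequence space is 64 times larger than the set space.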
Optimal Algorithm for Single-Index Case
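Node costs: Cost(Si, { }) and Cost(Si, {I}).
Edge costs: 0, Ic (cost of creating index I), and Id (cost of dropping index I).
Cost of the shortest path includes node and edge costs.
[Figure: DAG for a single index and N statements. Each statement S1, S2, …, SN has two nodes, { } and {I}; edges run from SOURCE through consecutive stages to DESTINATION and are labeled 0, Ic, or Id.]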
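A minimal sketch of the single-index shortest path, written as a stage-by-stage dynamic program over the two nodes per statement. It assumes the source is the empty configuration and that edges into the destination cost 0 (the final configuration is free); the function name, parameters, and costs are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def single_index_shortest_path(
    cost_without: Sequence[float],   # Cost(Si, { }) for each statement
    cost_with: Sequence[float],      # Cost(Si, {I}) for each statement
    create_cost: float,              # Ic
    drop_cost: float,                # Id
) -> Tuple[float, List[bool]]:
    """Return (total cost, plan); plan[i] is True if I exists when Si runs."""
    # best[state] = cheapest (cost, plan) ending in that state; the source is { }.
    best: Dict[bool, Tuple[float, List[bool]]] = {
        False: (0.0, []),
        True: (float("inf"), []),
    }
    for without, with_index in zip(cost_without, cost_with):
        node = {False: without, True: with_index}
        new: Dict[bool, Tuple[float, List[bool]]] = {}
        for state in (False, True):
            candidates = []
            for prev in (False, True):
                if prev == state:
                    edge = 0.0                                  # keep as is
                else:
                    edge = create_cost if state else drop_cost  # Ic or Id
                prev_cost, prev_plan = best[prev]
                candidates.append((prev_cost + edge + node[state], prev_plan + [state]))
            new[state] = min(candidates, key=lambda t: t[0])
        best = new
    # Assumed: reaching the destination from either final node costs 0.
    return min(best.values(), key=lambda t: t[0])


# Query, update, query with made-up costs: create = 3, drop = 1.
print(single_index_shortest_path([10, 1, 10], [2, 8, 2], create_cost=3, drop_cost=1))
# (12.0, [True, False, True])
```

The work is linear in N because each stage only compares two predecessor nodes for each of its two nodes.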
General Case – Multiple Indexes
At each stage, enumerate all possible configurations from the set of indexes.
Algorithm is linear in the number of nodes and edges of the DAG.
However, the number of nodes in the DAG is exponential in the number of indexes: M indexes => $O(N \cdot 2^M)$ nodes and $O(N \cdot 2^M)$ edges.
[Figure: DAG used by EXHAUSTIVE. Each stage S1, S2, …, SN contains one node per configuration; paths start at C0 and end at CN+1.]
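The exhaustive search can be sketched as the same stage-by-stage shortest path, now with one node per configuration that fits the storage bound. This is an illustrative sketch, not the authors' implementation: the cost functions, index names, and sizes are assumptions, and the final transition is taken to be free because CN+1 can simply equal CN.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence, Tuple

Config = FrozenSet[str]


def all_configs(indexes: Iterable[str], size: Dict[str, float], bound: float) -> List[Config]:
    """Enumerate every subset of the indexes whose total size fits the bound."""
    items = sorted(indexes)
    return [
        frozenset(combo)
        for r in range(len(items) + 1)
        for combo in combinations(items, r)
        if sum(size[i] for i in combo) <= bound
    ]


def exhaustive(
    statements: Sequence[str],
    indexes: Iterable[str],
    size: Dict[str, float],
    bound: float,
    stmt_cost: Callable[[str, Config], float],
    trans_cost: Callable[[Config, Config], float],
    c0: Config = frozenset(),
) -> Tuple[float, List[Config]]:
    configs = all_configs(indexes, size, bound)
    best: Dict[Config, Tuple[float, List[Config]]] = {c0: (0.0, [])}
    for stmt in statements:
        new: Dict[Config, Tuple[float, List[Config]]] = {}
        for cfg in configs:
            cost, path = min(
                ((pc + trans_cost(prev, cfg), pp) for prev, (pc, pp) in best.items()),
                key=lambda t: t[0],
            )
            new[cfg] = (cost + stmt_cost(stmt, cfg), path + [cfg])
        best = new
    return min(best.values(), key=lambda t: t[0])   # CN+1 = CN, so TC = 0


# Hypothetical toy cost model (all numbers are made up).
def toy_stmt_cost(stmt: str, cfg: Config) -> float:
    if stmt == "U":
        return 1.0 + 5.0 * len(cfg)                 # each index slows the update
    helper = {"Qa": "Ia", "Qb": "Ib"}[stmt]
    return 2.0 if helper in cfg else 10.0


def toy_trans_cost(old: Config, new: Config) -> float:
    return 3.0 * len(new - old) + 1.0 * len(old - new)


cost, plan = exhaustive(["Qa", "Qb", "U", "Qa"], {"Ia", "Ib"}, {"Ia": 1, "Ib": 1}, 1,
                        toy_stmt_cost, toy_trans_cost)
print(cost, [sorted(c) for c in plan])   # 18.0 [['Ia'], ['Ib'], [], ['Ia']]
```

Each stage scans every pair of configurations, which is why the approach stops scaling once the candidate index set grows.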
Optimal Solution
[Figure: pipeline. Sequence, constraints -> candidate set of structures -> solve sequence using EXHAUSTIVE -> recommendation.]
Search-Space Pruning
Techniques to reduce the number of nodes:
Cost-based Pruning
  Leverages shortest-path solutions of individual indexes.
  Prunes configurations at each stage without loss of optimality.
Disjoint Sequences
  Divide-and-conquer approach.
  Splits the input sequence and candidate index set.
Greedy-SEQ
  Guarantees a polynomial number of nodes.
Exploiting Disjoint Sequences
Two sequences X and Y are disjoint if they share no statements and no indexes.
Disjoint sequences are common, e.g., a server hosts multiple applications that touch different databases.
Approach:
  Split the workload into disjoint sequences.
  Solve each sequence independently.
  Merge to get the final solution.
Idea: the DAG for each disjoint sequence has fewer nodes.
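One way to sketch the split step is to group statements whose candidate indexes overlap, directly or transitively, using union-find. The mapping `relevant` from each statement to the indexes that can affect it is an assumed input here, and statement names are assumed to be unique.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def split_disjoint(
    statements: Sequence[str],
    relevant: Dict[str, Set[str]],
) -> List[Tuple[List[str], FrozenSet[str]]]:
    """Return (sub-sequence, index set) pairs that share no statements or indexes."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Indexes relevant to the same statement must end up in the same group.
    for stmt in statements:
        idx = sorted(relevant[stmt])
        for other in idx[1:]:
            parent[find(other)] = find(idx[0])

    groups: Dict[object, Tuple[List[str], Set[str]]] = {}
    for pos, stmt in enumerate(statements):
        idx = relevant[stmt]
        # A statement with no candidate index forms its own trivial sequence.
        key = find(min(idx)) if idx else ("no-index", pos)
        stmts, used = groups.setdefault(key, ([], set()))
        stmts.append(stmt)           # statement order within W is preserved
        used.update(idx)
    return [(stmts, frozenset(used)) for stmts, used in groups.values()]


relevant = {"S1": {"I1"}, "S2": {"I2"}, "S3": {"I1"}, "S4": {"I1"},
            "S5": {"I2"}, "S6": {"I2"}, "S7": {"I3"}}
for stmts, idx in split_disjoint(["S1", "S2", "S3", "S4", "S5", "S6", "S7"], relevant):
    print(stmts, sorted(idx))
# ['S1', 'S3', 'S4'] ['I1']
# ['S2', 'S5', 'S6'] ['I2']
# ['S7'] ['I3']
```

With three candidate indexes, the unsplit DAG would need 8 nodes per stage, while each of these sub-sequences needs only 2.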
Efficiency Gain with Disjoint Sequences
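[Figure: workload W = [S1, S2, S3, S4, S5, S6, S7] with candidate indexes {I1, I2, I3} needs 8 nodes at each stage. It splits into W1 = [S1, S3, S4] with {I1}, W2 = [S2, S5, S6] with {I2}, and W3 = [S7] with {I3}, which need 2 nodes at each stage for each sequence.]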
Merge Solutions of W1, W2, and W3: No Storage Violations
[Figure: shortest-path solutions from SRC to DEST for each sub-sequence. W1 = [S1, S3, S4]: create I1, then S1 {I1}, S3 {I1}, drop I1, S4 { }. W2 = [S2, S5, S6]: create I2, then S2 {I2}, S5 {I2}, drop I2, S6 { }. W3 = [S7]: create I3, then S7 {I3}.]
Pu is optimal when there are no storage violations.
[Figure: merged solution Pu from SRC to DEST: S1 {I1}, S2 {I1,I2}, S3 {I1,I2}, S4 {I2}, S5 {I2}, S6 { }, S7 {I3}.]
Merge in the Presence of Storage Violation
Suppose the storage bound allows only 1 index.
Pu is not a valid solution, as it has configurations with storage violations.
[Figure: Pu from SRC to DEST: S1 {I1}, S2 {I1,I2}, S3 {I1,I2}, S4 {I2}, S5 {I2}, S6 { }, S7 {I3}. The configuration {I1,I2} at S2 and S3 exceeds the bound.]
Pu’ = merge of P1, P2, and P3 that yields a valid solution.
[Figure: Pu’ from SRC to DEST over S1, …, S7, using only the configurations { }, {I1}, {I2}, and {I3}.]
Note that the cost of Pu is a lower bound on the cost of any valid solution.
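The union merge Pu and the storage check can be sketched as follows. The slides do not spell out when an index is held between two statements of its own sub-sequence, so this sketch assumes it is kept exactly when both the preceding and the following statement of that sub-sequence use it. All names, sizes, and the helper `storage_violations` are illustrative, and statement names are assumed to be unique.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Config = FrozenSet[str]
Part = Tuple[List[str], List[Config]]   # (sub-sequence, its configuration per statement)


def merge_union(workload: Sequence[str], parts: Sequence[Part]) -> List[Config]:
    """Build Pu: at each statement of W, union what every sub-solution holds."""
    position = {stmt: i for i, stmt in enumerate(workload)}
    merged: List[Config] = []
    for pos in range(len(workload)):
        cfg = set()
        for stmts, configs in parts:
            before = [c for s, c in zip(stmts, configs) if position[s] <= pos]
            after = [c for s, c in zip(stmts, configs) if position[s] >= pos]
            # Assumed rule: hold only the indexes that both neighbours use.
            if before and after:
                cfg |= before[-1] & after[0]
        merged.append(frozenset(cfg))
    return merged


def storage_violations(workload: Sequence[str], merged: Sequence[Config],
                       size: Dict[str, float], bound: float) -> List[str]:
    """Statements whose merged configuration exceeds the storage bound."""
    return [s for s, c in zip(workload, merged) if sum(size[i] for i in c) > bound]


E = frozenset()
I1, I2, I3 = frozenset({"I1"}), frozenset({"I2"}), frozenset({"I3"})
W = ["S1", "S2", "S3", "S4", "S5", "S6", "S7"]
parts = [(["S1", "S3", "S4"], [I1, I1, E]),
         (["S2", "S5", "S6"], [I2, I2, E]),
         (["S7"], [I3])]
pu = merge_union(W, parts)
print([sorted(c) for c in pu])
# [['I1'], ['I1', 'I2'], ['I1', 'I2'], ['I2'], ['I2'], [], ['I3']]
sizes = {"I1": 1, "I2": 1, "I3": 1}
print(storage_violations(W, pu, sizes, bound=1))   # ['S2', 'S3']
print(storage_violations(W, pu, sizes, bound=3))   # []
```

This reproduces the Pu shown on the slides and flags S2 and S3 when only one index fits; repairing those stages to obtain Pu’ is not covered by this sketch.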
Solution with Split and Merge
[Figure: pipeline. Sequence, constraints -> candidate set of structures -> apply Split operator to get disjoint sequences -> solve each sequence independently using EXHAUSTIVE or GREEDY-SEQ -> merge results of disjoint sequences -> recommendation.]
Greedy Approach
Goal:
  Explore a polynomial number of good configurations.
  Run shortest path over the DAG constructed with these configurations.
  Obtain a solution close to optimal.
Greedy-SEQ: adaptation of an existing greedy technique to the sequence model.
Greedy-SEQ
Steps of Greedy-SEQ:
1. Get the optimal solution for each index. Record configurations.
2. Initialize current best to be the lowest-cost solution seen so far.
3. Improve current best by combining it with other solutions and resetting current best. Record new configurations of current best. Repeat until there is no more improvement.
4. Run shortest path over the configurations collected.
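The four steps can be sketched roughly as below. The slides do not define the combine operation precisely, so this sketch assumes that combining two solutions means running shortest path where stage i may use either solution's configuration or their union (when it fits). The `fits` predicate, the toy cost model, and all names are assumptions for illustration only.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence, Tuple

Config = FrozenSet[str]
Solution = Tuple[float, List[Config]]
EMPTY: Config = frozenset()


def shortest_path(statements, stage_configs, stmt_cost, trans_cost) -> Solution:
    """Cheapest path when stage i may only use the configurations in stage_configs[i]."""
    best = {EMPTY: (0.0, [])}                       # assumed source: C0 = { }
    for stmt, allowed in zip(statements, stage_configs):
        new = {}
        for cfg in allowed:
            cost, path = min(
                ((pc + trans_cost(prev, cfg), pp) for prev, (pc, pp) in best.items()),
                key=lambda t: t[0],
            )
            new[cfg] = (cost + stmt_cost(stmt, cfg), path + [cfg])
        best = new
    return min(best.values(), key=lambda t: t[0])


def greedy_seq(statements: Sequence[str], indexes: Iterable[str],
               fits: Callable[[Config], bool], stmt_cost, trans_cost) -> Solution:
    n = len(statements)

    def solve(stage_configs):
        return shortest_path(statements, stage_configs, stmt_cost, trans_cost)

    # Step 1: optimal solution for each index on its own; record configurations.
    solutions = []
    for ix in sorted(indexes):
        single = frozenset([ix])
        solutions.append(solve([[EMPTY, single] if fits(single) else [EMPTY]] * n))
    recorded = {cfg for _, path in solutions for cfg in path} | {EMPTY}
    # Step 2: current best is the lowest-cost solution seen so far.
    best = min(solutions, key=lambda t: t[0])
    # Step 3: combine current best with other solutions until nothing improves.
    improved = True
    while improved:
        improved = False
        for _, path in solutions:
            stages = [[c for c in {a, b, a | b} if fits(c)] for a, b in zip(best[1], path)]
            candidate = solve(stages)
            if candidate[0] < best[0]:
                best, improved = candidate, True
                recorded |= set(candidate[1])
    # Step 4: shortest path over every configuration collected.
    return solve([sorted(recorded, key=sorted)] * n)


# Hypothetical toy cost model (all numbers are made up).
def toy_stmt_cost(stmt: str, cfg: Config) -> float:
    if stmt == "U":
        return 1.0 + 5.0 * len(cfg)
    return 2.0 if {"Qa": "Ia", "Qb": "Ib"}[stmt] in cfg else 10.0


def toy_trans_cost(old: Config, new: Config) -> float:
    return 3.0 * len(new - old) + 1.0 * len(old - new)


cost, plan = greedy_seq(["Qa", "Qb", "U", "Qa"], {"Ia", "Ib"},
                        lambda c: len(c) <= 1, toy_stmt_cost, toy_trans_cost)
print(cost, [sorted(c) for c in plan])   # 18.0 [['Ia'], ['Ib'], [], ['Ia']]
```

On this toy workload the sketch happens to reach the optimum; in general it only visits the recorded configurations, so no such guarantee is implied.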
Combining Two Single-Index Solutions
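[Figure: stages S0, S1, S2, …, SK, …, SL, …, SN, SN+1. The single-index solution for I1 (configurations { } and {I1}) and the single-index solution for I2 (configurations { } and {I2}) are combined into a solution for I1,I2 whose stages draw on { }, {I1}, {I2}, and {I1,I2}.]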
End-to-End Solution
[Figure: pipeline. Sequence, constraints -> candidate set of structures -> apply split operator to get disjoint sequences -> solve each sequence independently using EXHAUSTIVE or GREEDY-SEQ, with cost-based pruning applied on each sequence -> merge results of disjoint sequences -> recommendation.]
Sequence vs. Set-based Approaches
% improvement relative to the optimal set-based solution.
Sequence is better when updates are present and/or the storage bound is low.
Workload            M = 1.2 GB    M = 3 GB
TPCH-22             19%           0%
TPCH-22-I-10-MID    22%           16%
TPCH-22-I-10-END    25%           28%
Greedy-SEQ vs. Exhaustive
Greedy-SEQ is much faster with minimal degradation in quality.
Workload      % reduction in running time                 % reduction in quality
TPCH-3        50%                                         <1%
TPCH-5-M-5    98.4%                                       2.3%
TPCH-22       Exhaustive was terminated after 24 hours    Not available
Effectiveness of Split and Merge
With split and merge (SPMR) vs. without (WO-SPMR)
Workload      % reduction in running time    % reduction in quality
              compared to WO-SPMR            compared to WO-SPMR
TPCH-22       <0.1%                          0%
WKLD1         89.9%                          0%
WKLD1-LOW     71.4%                          3.0%
Conclusion
The sequence model allows more optimization opportunities than the set model.
Model the problem as finding the shortest path over a DAG.
Heuristics give nearly optimal solutions in much less running time.