Page 1: Self-Tuning Database systems

Self-Tuning Database systems

Wang Haocong

Jan 8, 2009

Page 2: Self-Tuning Database systems

Tuning databases

Logical database design
Physical database design (indexes)
Memory, disks
Materialized Views
Partitioning

Page 3: Self-Tuning Database systems

Self-Tuning Systems

Databases are complicated!
Schema design is hard
Lots of “knobs” to tweak
Need appropriate information

Does the DB approach give us more ability to “self-tune” than some other approach (e.g., Java)?

Page 4: Self-Tuning Database systems

What Would We Like to Auto-Tune?

Query optimization – statistics, bad decisions, …

The schema itself?
  Indices
  Auxiliary materialized views
  Data partitioning
  Perhaps logging?

Page 5: Self-Tuning Database systems

What Are The Challenges in Building Adaptive Systems?

Really, a generalization of those in adaptive query processing

Information gathering – how do we get it?
Extrapolating – how do we do this accurately and efficiently?
  Sampling or piloting
Minimizing the impact of mistakes if they happen
Using app-specific knowledge

Page 6: Self-Tuning Database systems

Who’s Interested in these Problems?

Oracle: Materialized view “wizard”

Microsoft “AutoAdmin”:
  Index selection, materialized view selection
  Stats on materialized views
  Database layout

IBM SMART (Self-Managing And Resource Tuning):
  Histogram tuning (“LEO” learning optimizer)
  Partitioning in clusters
  Index selection
  Adaptive query processing

Page 7: Self-Tuning Database systems

A Particular Instance: Microsoft’s Index Tuning Wizard

Why not let the system choose the best index combination(s) for a workload?

The basic idea:
  Log a whole bunch of queries that are frequently run
  See what set of indices is best

Why is this hard? Why not index everything?

Create these indices with little or no human input
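
To make the query-logging step concrete, here is a minimal sketch, assuming the workload arrives as raw SQL strings; the names WorkloadLog and normalize are illustrative, not part of the Microsoft tool.

```python
from collections import Counter

def normalize(sql):
    """Collapse whitespace and case so repeated queries group together."""
    return " ".join(sql.split()).lower()

class WorkloadLog:
    """Counts how often each (normalized) query is run."""

    def __init__(self):
        self.counts = Counter()

    def record(self, sql):
        self.counts[normalize(sql)] += 1

    def frequent(self, top_n=100):
        """The top_n most frequent query templates with their frequencies."""
        return self.counts.most_common(top_n)

# Example usage
log = WorkloadLog()
log.record("SELECT * FROM orders WHERE customer_id = 42")
log.record("select *   from orders where customer_id = 42")
print(log.frequent(5))  # [('select * from orders where customer_id = 42', 2)]
```

The resulting (query, frequency) pairs become the workload against which candidate index sets are then evaluated.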

Page 8: Self-Tuning Database systems

Possible Approaches

Obviously: only consider indices that would be useful
  The optimizer can “tell” which indices it might use in executing a query

But that continues to be a lot of indices!
  Can exhaustively compare all possible indices
  Note that indices can interact (esp. for updates)

How do we compare costs and benefits of indices?
  Execute for real
  Use optimizer cost model with whatever stats we have
  Gather some stats (e.g., build histograms, sample) and use cost model
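
One way to read the “use the optimizer cost model” option is sketched below: cost every candidate configuration against the whole workload without actually building any index. Here estimated_cost(query, config) is a stand-in for an optimizer “what-if” call; the real interface is product-specific.

```python
from itertools import combinations

def workload_cost(workload, config, estimated_cost):
    """Optimizer-estimated cost of a workload under an index configuration.

    workload:       list of (query, frequency) pairs
    config:         frozenset of candidate index names
    estimated_cost: callable (query, config) -> estimated cost
    """
    return sum(freq * estimated_cost(query, config) for query, freq in workload)

def best_configuration(workload, candidates, estimated_cost, max_indices=2):
    """Exhaustively compare all configurations of up to max_indices indexes."""
    best = frozenset()
    best_cost = workload_cost(workload, best, estimated_cost)
    for size in range(1, max_indices + 1):
        for combo in combinations(candidates, size):
            config = frozenset(combo)
            cost = workload_cost(workload, config, estimated_cost)
            if cost < best_cost:
                best, best_cost = config, cost
    return best, best_cost
```

Exhaustive comparison is affordable only for tiny configurations, which is why the approach on the next slides caps the per-query search and then turns greedy.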

Page 9: Self-Tuning Database systems

Physical design

Page 10: Self-Tuning Database systems

SQL Server Architecture

Page 11: Self-Tuning Database systems

Their Approach in More Detail

For a workload of n queries:
  Generate a separate workload with each query
  Evaluate the candidate indices for this query to find the best “configuration” – limited to 2 indices, 2 tables, single joins
  Candidate index set for the workload is the union of all configurations

Too expensive to enumerate all; use a greedy algorithm (sketched after this slide):
  Exhaustively enumerate (using the optimizer) the best m-index configuration
  Pick a new index I to add, which seems to save cost relative to adding some other I’ or to the current cost
  Repeat until we’ve added “enough” k indices
  “Despite interaction among indices, the largest cost reductions often result from indices that are good candidates by themselves”

They iteratively expand to 2-column indices – the index on the leading column must be desirable for this to be desirable
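
A minimal sketch of that greedy loop, assuming a cost_of(config) function such as workload_cost from the earlier sketch; the parameters m and k and the stopping rule are simplifications, not the tool's exact behavior.

```python
from itertools import combinations

def greedy_index_selection(candidates, cost_of, m=2, k=8):
    """candidates: iterable of index names; cost_of: frozenset -> estimated cost."""
    candidates = list(candidates)

    # Seed: exhaustively find the best configuration of at most m indexes.
    best = frozenset()
    for size in range(1, m + 1):
        for combo in combinations(candidates, size):
            if cost_of(frozenset(combo)) < cost_of(best):
                best = frozenset(combo)

    # Grow: repeatedly add the single index with the largest estimated saving,
    # stopping at k indexes or when no addition reduces the estimated cost.
    while len(best) < k:
        additions = [best | {index} for index in candidates if index not in best]
        if not additions:
            break
        cheapest = min(additions, key=cost_of)
        if cost_of(cheapest) >= cost_of(best):
            break
        best = cheapest
    return best
```

The greedy step leans on the quoted observation: even though indices interact, the biggest savings usually come from indices that look good on their own.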

Page 12: Self-Tuning Database systems

Further Enhancements

Use the tool for “what-if” analysis (a toy sketch follows this slide)
  What if a table grows by a substantial amount?

Supplement with extra info gathered from real query execution
  Maybe we can “tweak” estimates for certain selectivities
  An attempt to compensate for the “exponential error” problem
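
A toy illustration of the what-if idea: scale one table's cardinality in the statistics fed to a cost model and re-cost the workload. The statistics dictionary and the stand-in cost model below are assumptions for illustration only.

```python
def what_if_table_grows(cost_model, stats, table, growth_factor):
    """Return (cost_now, cost_if_grown) for a hypothetical growth of one table."""
    grown = dict(stats)
    grown[table] = int(stats[table] * growth_factor)
    return cost_model(stats), cost_model(grown)

# Example with a made-up cost model: a scan of orders plus lookups on customers.
stats = {"orders": 1_000_000, "customers": 50_000}

def cost_model(s):
    return s["orders"] + 3 * s["customers"]

now, later = what_if_table_grows(cost_model, stats, "orders", 10)
print(now, later)  # 1150000 10150000
```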

Page 13: Self-Tuning Database systems

Physical tuning tool

Decide when to tune
Decide what “representative” workload to use
Run the tool and examine the recommended physical design changes
Implement them if appropriate

Page 14: Self-Tuning Database systems

Alternative Tuning Models

Alerter
  When to tune
  Lightweight tools

Workload as a Sequence
  Read/update queries
  Create/drop physical structures

Page 15: Self-Tuning Database systems

Dynamic tuning

Page 16: Self-Tuning Database systems

Dynamic tuning

Low overhead; must not interfere with the normal functioning of the DBMS

Balance cost of transitioning and potential benefits

Avoid unwanted oscillations
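
A minimal sketch of balancing transition cost against benefit: only switch physical designs when the projected saving over some horizon clearly exceeds the cost of making the change, which also damps oscillation. The horizon and safety margin are assumed knobs, not values from the papers.

```python
def should_transition(current_cost_per_hour, candidate_cost_per_hour,
                      transition_cost, horizon_hours=24.0, safety_margin=1.5):
    """Return True only if the projected saving clearly outweighs the switch cost."""
    projected_saving = (current_cost_per_hour - candidate_cost_per_hour) * horizon_hours
    return projected_saving > safety_margin * transition_cost

# A design saving 10 cost units/hour is adopted only if switching is cheap enough.
print(should_transition(100.0, 90.0, transition_cost=150.0))  # True  (240 > 225)
print(should_transition(100.0, 90.0, transition_cost=200.0))  # False (240 < 300)
```

Requiring the saving to beat the transition cost by a margin means two designs with nearly equal costs never keep displacing each other.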

Page 17: Self-Tuning Database systems

Impact

Tuning Large Workloads (a sketch follows this slide)
  Partition
  Sample

Tuning Production Servers
  Test server
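
One plausible reading of “partition and sample” is sketched below: group queries by template, sample within each group, and redistribute each group's weight over its sample so the compressed workload still costs roughly the same. This is an illustration, not the specific compression scheme of any cited system.

```python
import random
from collections import defaultdict

def compress_workload(workload, per_partition=5, seed=0):
    """workload: list of (template, query, weight); returns a smaller list whose
    total weight per template matches the original."""
    rng = random.Random(seed)
    partitions = defaultdict(list)
    for template, query, weight in workload:
        partitions[template].append((query, weight))

    compressed = []
    for template, items in partitions.items():
        total_weight = sum(weight for _, weight in items)
        sample = rng.sample(items, min(per_partition, len(items)))
        share = total_weight / len(sample)  # spread the weight over the sample
        compressed.extend((template, query, share) for query, _ in sample)
    return compressed
```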

Page 18: Self-Tuning Database systems

Future Directions

Ability to compare the quality of automated physical design solutions

Lightweight approaches

Machine learning techniques, control theory, and online algorithms

Page 19: Self-Tuning Database systems

Memory tuning in DB2

Innovative cost-benefit analysis
  Simulation technique vs. modeling

Tunes memory distribution and total memory usage

Simple greedy memory tuner

Control algorithms to avoid oscillations

Performs very well in experiments
  For both OLTP and DSS
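
A sketch of what a simple greedy memory tuner could look like: repeatedly shift a fixed increment of memory from the consumer with the smallest estimated marginal benefit to the one with the largest, stopping when the gap is negligible (a crude guard against oscillation). benefit_per_page() stands in for DB2's simulation-based cost/benefit estimates; the increment and thresholds are made up.

```python
def rebalance_memory(memory, benefit_per_page, increment=1000,
                     min_gap=1e-6, max_steps=100):
    """memory: dict consumer -> pages currently allocated.
    benefit_per_page(consumer, pages): estimated time saved per extra page."""
    for _ in range(max_steps):
        gains = {c: benefit_per_page(c, pages) for c, pages in memory.items()}
        giver = min(gains, key=gains.get)
        taker = max(gains, key=gains.get)
        if gains[taker] - gains[giver] <= min_gap or memory[giver] <= increment:
            break  # benefits are (nearly) balanced, or nothing left to take
        memory[giver] -= increment
        memory[taker] += increment
    return memory
```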

Page 20: Self-Tuning Database systems

SBPX Operation

[Diagram: page flow between the Buffer Pool, the SBPX, and Disk]

1. Victimize page (move to SBPX)
2. Load new page from disk
3. Page request for …
4. Check buffer pool
5. Check SBPX
6. Start timer
7. Victimize BP page (send to SBPX)
8. Load page from disk
9. Stop timer
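
A sketch of the SBPX idea as read from this slide: keep only the identifiers of recently victimized pages (no data) in a simulated extension, and when a buffer-pool miss hits the SBPX, time the disk read; the accumulated time estimates how much a larger real buffer pool would have saved. The class and method names here are illustrative, not DB2's implementation.

```python
import time
from collections import OrderedDict

class SimulatedBufferPoolExtension:
    """Tracks page ids evicted from the real buffer pool and the time a larger
    pool would plausibly have saved (steps 1-9 above)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.page_ids = OrderedDict()   # page id -> None, kept in LRU order
        self.saved_seconds = 0.0

    def on_victimize(self, page_id):
        """Steps 1 and 7: the real buffer pool evicts a page; remember its id."""
        self.page_ids[page_id] = None
        self.page_ids.move_to_end(page_id)
        if len(self.page_ids) > self.capacity:
            self.page_ids.popitem(last=False)

    def on_buffer_pool_miss(self, page_id, read_from_disk):
        """Steps 3-9: on a miss, check the SBPX and time the disk read if it hits."""
        hit = page_id in self.page_ids
        if hit:
            del self.page_ids[page_id]
        start = time.perf_counter()      # step 6: start timer (meaningful on a hit)
        page = read_from_disk(page_id)   # step 8: load the page from disk
        if hit:
            self.saved_seconds += time.perf_counter() - start  # step 9: stop timer
        return page
```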

Page 21: Self-Tuning Database systems

Experimental results – tuning a static workload

[Chart: transactions per minute and buffer pool (BP) size over time, across Phases 1–3]

Page 22: Self-Tuning Database systems

Experimental results – workload shift

[Chart: time in seconds vs. order of execution (runs 1–34); phase averages of 959, 2285, and 6206 seconds; annotations: “Reduce 63%”, “Some indexes dropped”]

Page 23: Self-Tuning Database systems

Thank you!