KORHAN KOCABIYIK 1
R* Optimizer Validation and Performance Evaluation for Distributed Queries
Lothar F. Mackert, Guy M. Lohman
IBM Almaden Research Center
1986
presented by: KORHAN KOCABIYIK
Paper Layout
Overview of Distributed Compilation and Optimization
Instrumentation
General Measurements
Distributed Join Results
Suggestions for Improving their Performance
Conclusions
Suggested Readings
About Paper
It evaluates the performance of R* on distributed queries (queries whose tables reside at multiple sites)
The message, I/O, and CPU resources actually consumed by R* are compared with the corresponding costs estimated by the optimizer
A large number of tests were run over a wide variety of dynamically created test databases, SQL queries, and system parameters
Networks of different speeds yield different performance for distributed queries
What is System R?
System R is a database system built as a research project at IBM Almaden Research Center in the 1970's.
System R introduced the SQL language and also demonstrated that a relational system could provide good transaction processing performance.
Non-procedural user interface: users specify only what data is desired, leaving the system's optimizer to choose how to access that data
Optimizer plays a central role regarding system performance
R*
R* inherited the optimization algorithms of System R and extended them to a distributed environment
Each site is as autonomous as possible
– No central scheduler, no central deadlock detection, no central catalog, etc.
R* uses snapshot data – a “copy of a relation(s) in which the data is consistent but not necessarily up to date,” used to provide a static copy of the database
The unit of distribution in R* is a table, and each table is stored at one and only one site
Optimization Should Answer
Under what circumstances (regions of parameter space) does the optimizer choose a suboptimal plan, or, worse, a particularly bad plan?
To which parameters is the actual performance most sensitive? Are these parameters being modeled accurately by the optimizer?
What is the impact of variations from the optimizer’s simplifying assumptions?
Is it possible to simplify the optimizer’s model (by using heuristics, for example) to speed up optimization?
What are the best database statistics to support optimization?
Distributed Compilation and Optimization
Distributed Query: Any SQL data manipulation statement that references tables at sites other than the query site
Query Site: The site to which the application program is submitted for compilation
Master Site: The site that coordinates the optimization of all SQL statements embedded in that program
Apprentice Site: For each query, any site other than the master site that stores a table referenced in the query
Join Site: The site at which each join takes place. There are two methods for transferring a copy of the inner table to the join site:
– Ship Whole: A copy of the entire table is transferred
– Fetch Matches: Only matching tuples are transferred
Fetch Matches
1) Project the outer tuple onto its join-column value and ship this value to the site of the inner table
2) Find those tuples in the inner table that match the value sent and project them onto the columns needed
3) Ship a copy of the projected matching inner tuples back to the join site
4) Join the matches to the outer table
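The four steps above can be sketched in Python. This is a minimal in-memory stand-in for illustration (the table/column names are hypothetical); in R* the "ship" steps are actual network transfers between sites:

```python
# Sketch of R*'s fetch-matches transfer strategy for a two-table join.
# Tables are modeled as lists of dicts; the "ship" comments mark where
# R* would send data over the network.

def fetch_matches_join(outer, inner, outer_col, inner_col, inner_proj):
    """Join `outer` (at the join site) with `inner` (at a remote site)."""
    result = []
    for out_tuple in outer:
        # 1) Project the outer tuple's join-column value; "ship" it to
        #    the inner table's site.
        value = out_tuple[outer_col]
        # 2) At the inner site, find matching tuples and project them
        #    onto the columns needed.
        matches = [{c: t[c] for c in inner_proj}
                   for t in inner if t[inner_col] == value]
        # 3) "Ship" the projected matches back to the join site.
        # 4) Join the matches with the outer tuple.
        for m in matches:
            result.append({**out_tuple, **m})
    return result
```

Note that one round trip happens per outer tuple, which is exactly why fetch-matches can congest the network when the outer table is large.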
Optimization in R*
Compilation, and hence optimization, is truly distributed in R*
The master’s optimizer makes all inter-site decisions, such as the sites at which inter-site joins take place and the method and order of transfers between sites
Intra-site decisions are only suggested by the master’s planner
R* Cost Structure
The optimizer tries to minimize a cost function that is a linear combination of four components: CPU, I/O, the number of messages, and the total number of bytes transferred
– Cost = Wcpu(#_instructions) + Wi/o(#_I/Os) + Wmsg(#_msgs) + Wbyt(#_bytes)
Wmsg is a constant per-message penalty for sending a communication over the network
Wbyt is a penalty proportional to message length
CPU cost
Both local and distributed queries found CPU cost to be important
CPU costs are high in sorts:
– 1. Allocating temporary disk space for partially sorted strings
– 2. Encoding and decoding sorted columns
– 3. Quicksorting individual pages in memory
CPU costs are also high in scans
Although “CPU cost is significant . . . [it] is not enough to affect the choice of the optimizer”
Instruments
1. Added three statements to the SQL language for test control and performance monitoring (EXPLAIN, COLLECT COUNTERS, FORCE OPTIMIZER)
2. Developed pre-compiled applications for automatically
– (a) testing queries using the SQL statements of (1)
– (b) analyzing the data collected by step (a)
3. Stored the output of the SQL statements of (1) and the application programs of (2) in database tables to establish an interface between (1), (2a), and (2b)
Distributed EXPLAIN
The EXPLAIN command writes to a user-owned PLAN_TABLE information describing the access plan chosen by the optimizer for a given SQL statement
No single site has the complete access plan for a distributed query
The master site has the inter-site access plan, and each apprentice has its local access plans
The R* EXPLAIN command was augmented to store each apprentice site’s plan in a local PLAN_TABLE, and the test application program was altered to retrieve that information from each apprentice’s PLAN_TABLE
Distributed COLLECT COUNTERS
This new SQL statement collects, and stores in a user-owned table, the values of 40 internal counters of the storage component RSS* (counts of disk reads, writes, and lookups) and some newly implemented counters of the communications component DC*
A process run at the master site and child processes run at the apprentices collect these counters and insert them into the COUNTER_TABLE at the master site
Distributed FORCE OPTIMIZER
To measure the cost of suboptimal plans, they had to force the optimizer to use a specified plan
The FORCE OPTIMIZER statement chooses the plan for the next SQL statement
Apprentice optimization can be forced by simply telling each apprentice to utilize the optimization decisions recommended by the master-site
Experimental Environment
2 identical IBM 4381 machines, each with 40 buffer pages of 4 KB
3,000 random integer values for the join column
4 integer & 5 fixed-size character columns
Each tuple is exactly 66 bytes long
The system stored 50 tuples per page
Join tests produced result sets of 100 to 6,000 tuples
Costs Measured
Distributed Join
Focused on distributed joins rather than single-table queries
In R*, n-table joins are executed as a sequence of n-1 two-table joins
Intermediate results of these joins are called composite tables
For simplicity they used only two-table joins
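The n-1 pairwise join sequence can be sketched in Python. This is a naive in-memory illustration (helper names are hypothetical, and a nested-loop equi-join on a single shared column stands in for R*'s actual join executor); the point is only that the running intermediate result plays the role of the composite table:

```python
from functools import reduce

# Sketch: an n-table join executed as a sequence of n-1 two-table joins,
# with each intermediate result acting as the composite (outer) table.

def two_table_join(composite, inner, col):
    """Naive nested-loop equi-join on the shared column name `col`."""
    return [{**c, **t} for c in composite for t in inner if c[col] == t[col]]

def n_table_join(tables, col):
    # reduce folds left-to-right: n tables -> n-1 pairwise joins.
    return reduce(lambda comp, tbl: two_table_join(comp, tbl, col), tables)
```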
Distributed Joins (Cont’)
There are two different join methods:
– Merge Scan Join (-M-)
– Nested Loop Join (-N-)
There are two different transfer methods:
– Ship whole table (W)
– Fetch matches only (F)
AIW-M-B means: the indexed outer table A is shipped whole to inner table B and joined by merge scan
Inner Table Transfer Strategy
If we ship the inner table B as a whole
– the best plan is AI-M-BIW
If we fetch the matching inner tuples
– the best plan is AI-M-BIF
Transfer Trade-offs
Option W – transfer the whole inner table
– Negatives
• No indexes can be shipped with it
• May ship inner tuples that have no matching outer tuples
– Positives
• Predicates applied before shipping may reduce size
• Only one transfer is needed for the join, which may result in lower overall network costs
Transfer Trade-offs (Cont.)
Option F – fetch matching tuples only
• Idea – send outer-tuple join-column values, match them with inner tuples, then send these inner tuples over the network
– Negatives
• Multiple rounds of sending congest the network
• May have to send the whole table anyway – W better
– Positives
• Tables may have few actual join values in common, which can eliminate the need to send many inner tuples
Use W or F Option?
With W, network costs were only 2.9% of the total
Strategy F is handy when:
– The cardinality of the outer table is < 0.5 × the number of messages required to ship the inner table as a whole
• Idea behind the rule – beat plan W in theory by sending the few join values from the outer table that will weed out most inner tuples
– The join cardinality is < the inner cardinality
• Idea behind the 2nd rule – since most inner tuples will not be needed, we can beat plan W by sending only outer join values and eliminating most inner tuples
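The two conditions above can be expressed as a small decision helper. This is a sketch of the slide's rule of thumb only (the function name and the reading of the two bullets as a conjunction are assumptions, not the optimizer's actual code):

```python
# Sketch of the W-vs-F rule of thumb described above. Assumes both
# slide conditions must hold for fetch-matches (F) to be attractive.

def prefer_fetch_matches(outer_card, inner_card, join_card,
                         msgs_to_ship_inner_whole):
    # Condition 1: the outer table is small relative to the number of
    # messages needed to ship the inner table whole.
    small_outer = outer_card < 0.5 * msgs_to_ship_inner_whole
    # Condition 2: the join is selective, so most inner tuples are
    # never needed at the join site.
    selective_join = join_card < inner_card
    return small_outer and selective_join
```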
Optimizer performance (Local)
The optimizer has trouble modeling unclustered indexes on smaller tables. In such cases the optimizer actually picks the worst plan thinking it is the best, and thinks the best plan is actually the worst
Why?
– Tuple order is unimportant to a nested-loop join, and an index on the outer table may clutter the buffer
– A highly selective predicate may eliminate the need for a scan, in which case the index is important
Adding an index can increase the cost – the optimizer models each table independently, ignoring competition for buffer space between the two tables being joined
Distributed vs. Local Joins
Total resources consumed by distributed joins are higher than by local joins
Response time of distributed joins is less than that of local joins
What does this mean?
– More machines do the work in a distributed join, so they can work in parallel: more work is done, but since more machines are doing it, the result takes less time
Distrib vs. Local Joins Cont.
The response-time improvement for distributed queries has 2 reasons:
– 1. Parallelism
– 2. Reduced contention – accesses through unclustered indexes benefit greatly from larger buffers, and n machines = n × the buffer size
Negative of distributed – slow network speeds make the reverse true; then local joins are faster
Alternative Join methods
R* also introduced some alternative join methods:
– joins using dynamically created indexes
– semijoins
– joins using hashing (Bloom) filters (Bloomjoins)
Setup: S is a table at site 1 and T is a table at site 2, joined on S.a = T.b, where a is a column of S and b is a column of T
Two cases:
– Both S and T have an (unclustered) index on their join column(s)
– Neither has an (unclustered) index on its join column(s)
Dynamically Create Temporary Index on Inner Table
Since we cannot send an index, we can try to make one
Cost may be high:
– Scan the entire inner table and send it to site 1
– Store the table and create a temporary index on it at site 1
– Execute the best local join plan
Semijoin
Sort S and T on the join column, producing S’ and T’
Send S’ ’s join-column values to T’s site
Match them against T’ and send the matching tuples to S’s site
Merge-join T’ ’s matching tuples with S’ ’s tuples to get the answer
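As a sketch, the semijoin steps look like this in Python (in-memory stand-ins for the two sites; the final join uses a nested loop for brevity where R* would merge-join the sorted runs):

```python
# Sketch of the semijoin method: ship only S's join-column values to
# T's site, then ship back only the matching T tuples.

def semijoin(S, T, a, b):
    S_sorted = sorted(S, key=lambda t: t[a])   # S'
    T_sorted = sorted(T, key=lambda t: t[b])   # T'
    # "Ship" S''s join-column values to T's site...
    join_values = {t[a] for t in S_sorted}
    # ...match against T' and "ship" the matching tuples back to S's site.
    T_matches = [t for t in T_sorted if t[b] in join_values]
    # Final join at S's site (nested loop here for brevity;
    # R* merge-joins the two sorted inputs).
    return [{**s, **t} for s in S_sorted for t in T_matches if s[a] == t[b]]
```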
Bloom join
Use a Bloom filter – a bit string, rather like a hash table in which each bit represents a bucket. All bits start at 0; if a value hashes to bit x, set bit x to 1
Generate a Bloom filter for table S and send it to T’s site
Hash T using the same hash function and ship any tuples that hash to a 1 in S’s Bloom filter
At site S, join T’s shipped tuples with table S
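The steps above can be sketched in Python. A single hash function and a fixed filter size are simplifying assumptions for illustration (real Bloom filters typically use several hash functions):

```python
# Sketch of a Bloomjoin: site S builds a bit filter over its join column,
# ships only the (small) filter to T's site, and T ships back only tuples
# whose hash lands on a set bit.

FILTER_BITS = 1024  # illustrative filter size

def bloom_filter(S, a):
    bits = [0] * FILTER_BITS
    for t in S:
        bits[hash(t[a]) % FILTER_BITS] = 1
    return bits

def bloom_join(S, T, a, b):
    bits = bloom_filter(S, a)  # built at S's site, "shipped" to T's site
    # T ships back only tuples that pass the filter (false positives
    # are possible, missed matches are not).
    candidates = [t for t in T if bits[hash(t[b]) % FILTER_BITS]]
    # The final join at S's site removes any false positives.
    return [{**s, **t} for s in S for t in candidates if s[a] == t[b]]
```

Only the bit string and the surviving T tuples cross the network, which is why Bloomjoin's message costs stay comparable to the semijoin's while avoiding its second full join pass.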
Comparison of join methods
Bloomjoins generally outperform the other methods
Semijoins are advantageous when both the data and the (unclustered) index pages of the inner table fit into the buffer, so that efficient reuse of those pages keeps semijoin processing costs low. If not, constant paging of the unclustered index results in poor performance
Why are Bloom Joins better?
The message costs of semijoins and Bloomjoins are comparable
The semijoin incurs higher local processing costs to perform a “second join”: first send S’ ’s join column to T’ and join, then send this result back and join these T’ tuples with S’
Conclusions
Shipping the entire inner table to the join site and storing it there dominates the fetch-matches strategy
The R* optimizer’s modeling of message costs was very accurate (a necessary condition for picking the correct join site)
Even for medium-speed, long-haul networks linking geographically dispersed hosts, local costs for CPU and I/O are significant
Bloomjoin dominates the other distributed join methods in all cases investigated
They did not test joins over very large tables or over more than 2 tables, so those may behave differently
They used an environment in which all sites have the same capabilities and a uniform network link, so adapting the optimizer to real life may be rather difficult
Questions? Thanks!
presented by: KORHAN KOCABIYIK