Global Query Optimization Mai

7/30/2019 Global Query Optimization Mai

GLOBAL QUERY OPTIMIZATION

Query optimization: the process of producing a query execution plan, which represents an execution strategy for the query.

Input: Fragment query

Find the best (not necessarily optimal) global schedule

Minimize a cost function


Query Optimization Process


Search Space

Restrict by means of heuristics

Perform unary operations before binary operations

Restrict the shape of the join tree

linear versus bushy trees

Linear tree: a tree in which at least one operand of each operator node is a base relation.

Bushy tree: a tree in which both operands of an operator node may be intermediate relations.
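The distinction between linear and bushy join trees can be checked mechanically. The following is a minimal sketch, assuming a join tree is represented either as a base relation name (a string) or as a hypothetical `Join` node with two operands; the class and function names are illustrative only and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Join:
    """A binary join operator node (hypothetical representation)."""
    left: "Node"
    right: "Node"


# A node is either a base relation (its name) or a join of two nodes.
Node = Union[str, Join]


def is_linear(tree: Node) -> bool:
    """Return True if every join node has at least one base-relation operand."""
    if isinstance(tree, str):
        return True  # a base relation on its own is trivially linear
    has_base_operand = isinstance(tree.left, str) or isinstance(tree.right, str)
    return has_base_operand and is_linear(tree.left) and is_linear(tree.right)


# ((R1 join R2) join R3) join R4: each join has a base relation as one operand.
linear_tree = Join(Join(Join("R1", "R2"), "R3"), "R4")

# (R1 join R2) join (R3 join R4): the top join has two intermediate operands.
bushy_tree = Join(Join("R1", "R2"), Join("R3", "R4"))
```

Restricting the optimizer to trees for which `is_linear` holds shrinks the search space, at the price of excluding plans such as `bushy_tree`.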


Search Strategy

The most popular search strategy used by query optimizers is dynamic programming

Deterministic

Start from base relations and build plans by adding one relation at each step

Dynamic programming: breadth-first

Greedy: depth-first

Randomized

Search for optimalities around a particular starting point

Trade optimization time for execution time

Better when the query involves more than 5-6 relations
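As an illustration of the deterministic greedy strategy, here is a minimal sketch that starts from one base relation and adds one relation at each step, always picking the relation that yields the smallest intermediate result. It assumes that cost is measured by intermediate result cardinality and that join selectivities are known per pair of relations; the function name, the relation names, and the statistics are hypothetical.

```python
from fractions import Fraction


def greedy_join_order(card, sel):
    """Build a join order greedily, one relation per step.

    card: {relation name: cardinality}
    sel:  {frozenset({a, b}): join selectivity}; missing pairs count as 1
          (that is, a Cartesian product).
    Returns (join order, estimated cardinality of the final result).
    """
    remaining = set(card)
    # Assumption: start from the smallest base relation (ties broken by name).
    current = min(remaining, key=lambda r: (card[r], r))
    order, joined, size = [current], {current}, card[current]
    remaining.remove(current)

    while remaining:
        def result_size(r):
            # Estimated cardinality after joining r with everything joined so far.
            estimate = size * card[r]
            for j in joined:
                estimate *= sel.get(frozenset({j, r}), 1)
            return estimate

        nxt = min(remaining, key=lambda r: (result_size(r), r))
        size = result_size(nxt)
        order.append(nxt)
        joined.add(nxt)
        remaining.remove(nxt)
    return order, size


# Hypothetical statistics, using exact fractions to avoid rounding noise.
card = {"EMP": 400, "ASG": 1000, "PROJ": 100}
sel = {
    frozenset({"EMP", "ASG"}): Fraction(1, 400),
    frozenset({"ASG", "PROJ"}): Fraction(1, 100),
}
order, size = greedy_join_order(card, sel)
```

Because the greedy strategy commits to one choice per step, it explores a single path through the search space and may miss the cheapest plan that dynamic programming would find.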


Distributed cost model

An optimizer's cost model includes cost functions to predict the cost of operators, statistics on base data, and formulas to evaluate the sizes of intermediate results.

Cost Function

The cost of a distributed execution strategy can be expressed with respect to either the total time or the response time

Total time - The sum of all time components

Response time - Elapsed time from the initiation to the completion of the query.


Total time =

Tcpu * #insts + TI/Os * #I/Os + Tmsg * #msgs + TTR * #bytes

Tcpu - Time of a CPU instruction

TI/Os - Time of a disk I/O

Tmsg - Fixed time of initiating and receiving a message

TTR - Time it takes to transmit a data unit from one site to another
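The total time formula translates directly into code. The sketch below is only an illustration: the `CostParams` class, its field names, and the numeric values are assumptions chosen for the example, not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass
class CostParams:
    t_cpu: float  # Tcpu: time of one CPU instruction
    t_io: float   # TI/Os: time of one disk I/O
    t_msg: float  # Tmsg: fixed time of initiating and receiving a message
    t_tr: float   # TTR: time to transmit one data unit between sites


def total_time(p, insts, ios, msgs, nbytes):
    """Sum of all time components, regardless of where or when they occur."""
    return (p.t_cpu * insts
            + p.t_io * ios
            + p.t_msg * msgs
            + p.t_tr * nbytes)


# Hypothetical unit costs and workload, in arbitrary time units.
params = CostParams(t_cpu=1, t_io=10, t_msg=100, t_tr=2)
cost = total_time(params, insts=1000, ios=5, msgs=2, nbytes=50)
# 1*1000 + 10*5 + 100*2 + 2*50 = 1350
```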


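Response_time =

Tcpu * seq_#insts + TI/Os * seq_#I/Os + Tmsg * seq_#msgs + TTR * seq_#bytes

seq_#x is the maximum number of x that must be done sequentially for the execution of the query, where x can be instructions (insts), I/Os, messages (msgs), or bytes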
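To see how response time differs from total time, consider a hypothetical scenario in which two sites each send their data to a third site in parallel, and local processing is ignored. The sketch below assumes one message per transfer; the function names and the numbers are illustrative only.

```python
def total_transfer_time(t_msg, t_tr, x, y):
    """Total time: both transfers are counted, even though they overlap."""
    return 2 * t_msg + t_tr * (x + y)


def response_transfer_time(t_msg, t_tr, x, y):
    """Response time: the transfers run in parallel, so only the longer
    one (the sequential part) determines the elapsed time."""
    return max(t_msg + t_tr * x, t_msg + t_tr * y)


# Hypothetical values: site 1 sends 100 data units, site 2 sends 300.
t_msg, t_tr = 10, 1
total = total_transfer_time(t_msg, t_tr, x=100, y=300)        # 2*10 + 400 = 420
response = response_transfer_time(t_msg, t_tr, x=100, y=300)  # max(110, 310) = 310
```

Minimizing response time rewards parallelism, whereas minimizing total time rewards reducing the overall amount of work.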


Database Statistics

The primary factor affecting the performance of an execution strategy is the size of the intermediate relations that are produced during the execution.

For each relation R defined over the attributes A = {A1, A2, ..., An} and fragmented as R1, ..., Rr, the statistical data are:

the length of each attribute: length(Ai)

the number of distinct values of each attribute in each fragment: card(ΠAi(Rj))

the maximum and minimum values in the domain of each attribute: min(Ai), max(Ai)

the cardinality of each domain: card(dom[Ai])

the cardinality of each fragment: card(Rj)


Selectivity factor of each operation for relations

The join selectivity factor of relations R and S, denoted SFj, is a real value between 0 and 1

For example, an SFj of 0.5 corresponds to a very large joined relation, whereas 0.001 corresponds to a small one.

These statistics are useful to predict the size of intermediate relations
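A short sketch shows why 0.5 is large and 0.001 is small. It assumes the usual definition of the join selectivity factor as the ratio of the join result cardinality to the cardinality of the Cartesian product, so that card(R join S) = SFj * card(R) * card(S); this definition is not spelled out above, and the function name and the cardinalities are hypothetical.

```python
from fractions import Fraction


def join_cardinality(sf_j, card_r, card_s):
    """Estimate card(R join S) from the join selectivity factor SFj."""
    if not 0 <= sf_j <= 1:
        raise ValueError("SFj must lie between 0 and 1")
    return sf_j * card_r * card_s


# Hypothetical relations with 1000 and 2000 tuples.
large = join_cardinality(Fraction(1, 2), 1000, 2000)     # half of the Cartesian product
small = join_cardinality(Fraction(1, 1000), 1000, 2000)  # a thousandth of it
```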


The size of an intermediate relation R is computed as follows:

size(R) = card(R) * length(R)

length(R) is the length of a tuple of R, computed from the length of its attributes
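The size formula can be sketched as follows, assuming that the tuple length is simply the sum of the attribute lengths (for example, in bytes). The attribute names, lengths, and cardinality are hypothetical.

```python
def tuple_length(attr_lengths):
    """length(R): assumed here to be the sum of the attribute lengths."""
    return sum(attr_lengths.values())


def relation_size(card, attr_lengths):
    """size(R) = card(R) * length(R)."""
    return card * tuple_length(attr_lengths)


# Hypothetical relation with three attributes and 400 tuples.
emp_attrs = {"ENO": 4, "ENAME": 30, "TITLE": 20}
emp_size = relation_size(400, emp_attrs)  # 400 * (4 + 30 + 20) = 21600
```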

Cardinalities of intermediate results

Selection: The cardinality of selection is


Centralized Query Optimization

INGRES: dynamic

System R: static, exhaustive search


INGRES Algorithm

Decompose each multi-variable query into a sequence of mono-variable queries with a common variable

Process each by a one-variable query processor

Choose an initial execution plan (heuristics)

Order the rest by considering intermediate relation sizes


INGRES Algorithm: Decomposition

Replace an n-variable query q by a series of queries

q1 → q2 → ... → qn

where qi uses the result of qi-1.

Detachment

Query q is decomposed into q' → q", where q' and q" have a common variable which is the result of q'

Tuple substitution

Replace the value of each tuple with actual values and simplify the query

q(V1, V2, ..., Vn) → (q'(t1, V2, V3, ..., Vn), t1 ∈ R)
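Tuple substitution can be illustrated on a two-variable query. The sketch below is not the INGRES implementation; it assumes relations are held in memory as lists of dictionaries and that the query is given as a predicate over a pair of tuples. All names and data are hypothetical.

```python
def tuple_substitution(r, s, predicate):
    """Evaluate a two-variable query q(V1, V2) by tuple substitution.

    For each tuple t1 of R, V1 is replaced by the actual values of t1,
    which leaves a one-variable query q'(t1, V2) to run over S.
    """
    results = []
    for t1 in r:  # t1 ∈ R
        # q'(t1, V2): the original predicate with t1 fixed.
        def q_prime(t2, t1=t1):
            return predicate(t1, t2)

        # The one-variable query is processed by a simple scan of S.
        results.extend((t1, t2) for t2 in s if q_prime(t2))
    return results


# Hypothetical data: employees and their project assignments.
emp = [{"ENO": "E1", "ENAME": "Doe"}, {"ENO": "E2", "ENAME": "Smith"}]
asg = [
    {"ENO": "E1", "PNO": "P1"},
    {"ENO": "E1", "PNO": "P2"},
    {"ENO": "E3", "PNO": "P1"},
]

matches = tuple_substitution(emp, asg, lambda e, a: e["ENO"] == a["ENO"])
names_and_projects = [(e["ENAME"], a["PNO"]) for e, a in matches]
```

Since one sub-query is generated per tuple of R, substituting on the smallest relation keeps the number of sub-queries low.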


System R Algorithm

1. Simple (i.e., mono-relation) queries are executed according to the best access path

2. Execute joins

2.1 Determine the possible ordering of joins

2.2 Determine the cost of each ordering

2.3 Choose the join ordering with minimal cost
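Steps 2.1 to 2.3 can be sketched with dynamic programming over sets of relations. This is a simplified illustration rather than the actual System R optimizer: it assumes linear join orders only, a cost equal to the sum of intermediate result cardinalities, and known pairwise join selectivities. It also ignores access paths and join methods. The function name and the statistics are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def dp_join_order(card, sel):
    """Return (cost, result cardinality, join order) of the cheapest linear plan.

    card: {relation name: cardinality}
    sel:  {frozenset({a, b}): join selectivity}; missing pairs count as 1.
    """
    rels = sorted(card)

    def join_sel(subset, r):
        # Combined selectivity of joining r with all relations in subset.
        factor = 1
        for j in sorted(subset):
            factor *= sel.get(frozenset({j, r}), 1)
        return factor

    # best[subset] = cheapest (cost, cardinality, order) producing that subset.
    best = {frozenset({r}): (0, card[r], (r,)) for r in rels}

    for k in range(2, len(rels) + 1):
        for combo in combinations(rels, k):
            subset = frozenset(combo)
            candidates = []
            for r in combo:  # r is the relation added last
                rest = subset - {r}
                cost, size, order = best[rest]
                new_size = size * card[r] * join_sel(rest, r)
                candidates.append((cost + new_size, new_size, order + (r,)))
            best[subset] = min(candidates)  # keep only the cheapest ordering
    return best[frozenset(rels)]


# Hypothetical statistics, using exact fractions to avoid rounding noise.
card = {"EMP": 400, "ASG": 1000, "PROJ": 100}
sel = {
    frozenset({"EMP", "ASG"}): Fraction(1, 400),
    frozenset({"ASG", "PROJ"}): Fraction(1, 100),
}
best_cost, best_card, best_order = dp_join_order(card, sel)
```

Keeping only the cheapest plan per subset is what lets the search discard orderings such as joining EMP with PROJ first, which would form a Cartesian product of 40000 tuples.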

