anbarasi-masilamani
7/30/2019 Global Query Optimization Mai
GLOBAL QUERY OPTIMIZATION
Query optimization: the process of producing a query execution plan, which represents an execution strategy for the query.
Input: Fragment query
Find the best (not necessarily optimal) global schedule
Minimize a cost function
Query Optimization Process
Search Space
Restrict by means of heuristics
Perform unary operations before binary operations
Restrict the shape of the join tree
Linear versus bushy trees
Linear tree: a tree in which at least one operand of each operator node is a base relation.
Bushy tree: a tree in which an operator node may have no base relation as an operand, i.e. both operands can be intermediate relations.
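The distinction between the two tree shapes can be sketched in a few lines of Python. The `Join` class and the relation names R1 to R4 are hypothetical, introduced only for illustration; a plain string stands for a base relation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Join:
    """A binary join node; each operand is a base relation or another join."""
    left: "Node"
    right: "Node"


Node = Union[str, Join]  # a str names a base relation


def is_linear(tree: Node) -> bool:
    """True if every join node has at least one base relation as operand."""
    if isinstance(tree, str):
        return True
    if isinstance(tree.left, Join) and isinstance(tree.right, Join):
        return False  # both operands are intermediate relations
    return is_linear(tree.left) and is_linear(tree.right)


# ((R1 join R2) join R3) join R4: one new base relation per step
linear_tree = Join(Join(Join("R1", "R2"), "R3"), "R4")
# (R1 join R2) join (R3 join R4): the root joins two intermediate results
bushy_tree = Join(Join("R1", "R2"), Join("R3", "R4"))

print(is_linear(linear_tree), is_linear(bushy_tree))
```

Restricting the optimizer to linear trees shrinks the search space, at the price of ruling out plans in which independent joins could run in parallel.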
Search Strategy
The most popular search strategy used by query optimizers is
dynamic programming
Deterministic
Start from base relations and build plans by adding one
relation at each step
Dynamic programming: breadth-first
Greedy: depth-first
Randomized
Search for an optimal solution around a particular starting point
Trade optimization time for execution time
Better when the query involves more than 5-6 relations
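A minimal sketch of the two deterministic strategies, under a toy cost model that is an assumption of this example and not taken from the slides: the cost of a plan is the sum of the estimated cardinalities of its intermediate results, and only linear (left-deep) orders are considered. The relations R, S, T, their cardinalities and selectivities are hypothetical.

```python
from itertools import combinations


def result_card(rels, card, sel):
    """Estimated cardinality of joining all relations in `rels`."""
    size = 1.0
    for r in rels:
        size *= card[r]
    for a, b in combinations(sorted(rels), 2):
        size *= sel.get((a, b), 1.0)  # 1.0 means no predicate (Cartesian product)
    return size


def dp_join_order(card, sel):
    """Breadth-first: build the best plan for every subset of k relations
    before moving on to subsets of k + 1 relations."""
    names = sorted(card)
    best = {frozenset([r]): (0.0, [r]) for r in names}
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            s = frozenset(subset)
            out = result_card(s, card, sel)
            candidates = []
            for last in subset:  # the relation added at this step
                prev_cost, prev_order = best[s - {last}]
                candidates.append((prev_cost + out, prev_order + [last]))
            best[s] = min(candidates)
    return best[frozenset(names)]


def greedy_join_order(card, sel):
    """Depth-first: build a single plan, always taking the cheapest next step."""
    order = [min(card, key=card.get)]
    cost = 0.0
    while len(order) < len(card):
        rest = [r for r in sorted(card) if r not in order]
        nxt = min(rest, key=lambda r: result_card(order + [r], card, sel))
        order.append(nxt)
        cost += result_card(order, card, sel)
    return cost, order


card = {"R": 1000, "S": 100, "T": 10}
sel = {("R", "S"): 0.01, ("S", "T"): 0.1}  # keys are alphabetically sorted pairs

print(dp_join_order(card, sel))
print(greedy_join_order(card, sel))
```

Dynamic programming keeps one best plan per subset of relations, so its table grows exponentially with the number of relations; that growth is what makes randomized strategies attractive for larger queries.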
Distributed cost model
An optimizer's cost model includes cost functions to predict the cost of operators, statistics on the base data, and formulas to evaluate the sizes of intermediate results.
Cost Function
The cost of a distributed execution strategy can be
expressed with respect to either the total time or the
response time
Total time - The sum of all time components
Response time - Elapsed time from the initiation to the
completion of the query.
Total_time =
Tcpu * #insts + TI/Os * #I/Os + Tmsg * #msgs + TTR * #bytes
Tcpu - Time of a CPU instruction
TI/Os - Time of a disk I/O
Tmsg - Fixed time of initiating and receiving a message
TTR - Time it takes to transmit a data unit from one site to another
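The total time formula translates directly into code. The sketch below is illustrative only: the class name, field names and the coefficient values in the example are assumptions, not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass
class CostCoefficients:
    t_cpu: float  # time of one CPU instruction
    t_io: float   # time of one disk I/O
    t_msg: float  # fixed time of initiating and receiving one message
    t_tr: float   # time to transmit one data unit from one site to another


def total_time(c, insts, ios, msgs, nbytes):
    """Sum of all time components, regardless of where or when they occur."""
    return (c.t_cpu * insts
            + c.t_io * ios
            + c.t_msg * msgs
            + c.t_tr * nbytes)


# Hypothetical coefficients, in arbitrary time units
coeff = CostCoefficients(t_cpu=1, t_io=10, t_msg=100, t_tr=2)
print(total_time(coeff, insts=5, ios=3, msgs=2, nbytes=7))
```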
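Response_time =
Tcpu * seq_#insts + TI/Os * seq_#I/Os + Tmsg * seq_#msgs + TTR * seq_#bytes
seq_#x - the maximum number of x that must be done sequentially during the execution of the query, where x can be instructions (insts), I/Os, messages or bytes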
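To see how response time differs from total time, consider a hypothetical case in which several sites each ship data to one result site at the same time and only communication cost is counted. The function names and the numbers below are assumptions made for this sketch.

```python
def transfer_time(t_msg, t_tr, units):
    """One message carrying `units` data units."""
    return t_msg + t_tr * units


def total_time(t_msg, t_tr, transfers):
    """Every transfer is paid for, wherever it happens."""
    return sum(transfer_time(t_msg, t_tr, u) for u in transfers)


def response_time(t_msg, t_tr, transfers):
    """The transfers run in parallel, so only the longest one is sequential."""
    return max(transfer_time(t_msg, t_tr, u) for u in transfers)


# Site 1 sends 100 units and site 2 sends 300 units to site 3
transfers = [100, 300]
print(total_time(10, 1, transfers))     # counts both transfers
print(response_time(10, 1, transfers))  # counts only the slower transfer
```

Minimizing total time reduces overall resource usage, while minimizing response time rewards parallel execution, so the two objectives can favor different plans.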
Database Statistics
The primary factor affecting the performance of an execution strategy is the size of the intermediate relations that are produced during the execution.
For each relation R defined over the attributes A = {A1, A2, ..., An} and fragmented as R1, ..., Rr, the statistical data are:
the length of each attribute: length(Ai)
the number of distinct values for each attribute in each fragment: card(Π_Ai(Rj))
the maximum and minimum values in the domain of each attribute: min(Ai), max(Ai)
the cardinality of each domain: card(dom[Ai])
the cardinality of each fragment: card(Rj)
Selectivity factor of each operation for relations
The join selectivity factor, denoted SFj, of relations R and S is a real value between 0 and 1
For example, an SFj of 0.5 corresponds to a very large joined relation, while 0.001 corresponds to a small one.
These statistics are useful to predict the size of intermediate relations
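The join selectivity factor is commonly defined as SFj = card(R ⋈ S) / (card(R) * card(S)), so the join cardinality can be estimated by rearranging that ratio. The sketch below assumes this definition; the function name and the cardinalities are hypothetical.

```python
def join_cardinality(sf_j, card_r, card_s):
    """Estimate card(R join S) as SFj * card(R) * card(S)."""
    if not 0.0 <= sf_j <= 1.0:
        raise ValueError("a selectivity factor must lie between 0 and 1")
    return sf_j * card_r * card_s


# With card(R) = 1000 and card(S) = 2000:
print(join_cardinality(0.5, 1000, 2000))    # half of the Cartesian product
print(join_cardinality(0.001, 1000, 2000))  # a far smaller result
```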
The size of an intermediate relation R is computed as follows:
size(R) = card(R) * length(R)
length(R) is the length of a tuple of R, computed from the lengths of its attributes
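As a small worked example of the size formula, the sketch below derives length(R) from per-attribute lengths. The attribute names and byte lengths are invented for illustration.

```python
def tuple_length(attr_lengths, attrs):
    """length(R): the sum of the lengths of the attributes of R."""
    return sum(attr_lengths[a] for a in attrs)


def relation_size(card, attr_lengths, attrs):
    """size(R) = card(R) * length(R)."""
    return card * tuple_length(attr_lengths, attrs)


# Hypothetical attribute lengths in bytes
lengths = {"ENO": 4, "ENAME": 20, "TITLE": 10}

# An intermediate relation of 400 tuples that keeps only ENO and ENAME
print(relation_size(400, lengths, ["ENO", "ENAME"]))
```

Because size depends on both cardinality and tuple length, projecting away unneeded attributes early reduces the data that has to be shipped between sites.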
Cardinalities of intermediate results
Selection: the cardinality of selection is
Centralized Query Optimization
INGRES: dynamic
System R: static, exhaustive search
INGRES Algorithm
Decompose each multi-variable query into a sequence
of mono-variable queries with a common variable
Process each by a one variable query processor
Choose an initial execution plan (heuristics)
Order the rest by considering intermediate relation
sizes
INGRES Algorithm: Decomposition
Replace an n-variable query q by a series of queries
q1 → q2 → ... → qn
where qi uses the result of qi-1.
Detachment
Query q is decomposed into q' → q'', where q' and q'' have a common variable which is the result of q'
Tuple substitution
Replace the value of each tuple with actual values and simplify the query
q(V1, V2, ..., Vn) → (q'(t1, V2, V3, ..., Vn), t1 ∈ R)
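Detachment and tuple substitution can be illustrated on a two-variable query. This is a simplified sketch, not the INGRES implementation: relations are plain lists of dictionaries, `one_variable_query` stands in for a one-variable query processor, and the EMP and ASG data are hypothetical.

```python
def one_variable_query(relation, predicate):
    """Stand-in for a one-variable query processor: a filtered scan."""
    return [t for t in relation if predicate(t)]


def detach_then_substitute(r, s, r_filter, join_pred):
    # Detachment: q' is a one-variable query on R; its result feeds q''.
    r_prime = one_variable_query(r, r_filter)
    result = []
    # Tuple substitution: for each tuple t1 of the detached result, the
    # variable over R is replaced by t1, leaving a one-variable query on S.
    for t1 in r_prime:
        matches = one_variable_query(s, lambda t2, t1=t1: join_pred(t1, t2))
        result.extend((t1, t2) for t2 in matches)
    return result


EMP = [
    {"eno": 1, "title": "Engineer"},
    {"eno": 2, "title": "Analyst"},
    {"eno": 3, "title": "Engineer"},
]
ASG = [
    {"eno": 1, "pno": "P1"},
    {"eno": 2, "pno": "P2"},
    {"eno": 3, "pno": "P1"},
    {"eno": 1, "pno": "P3"},
]

# Projects assigned to engineers
pairs = detach_then_substitute(
    EMP, ASG,
    r_filter=lambda e: e["title"] == "Engineer",
    join_pred=lambda e, a: e["eno"] == a["eno"],
)
print([(e["eno"], a["pno"]) for e, a in pairs])
```

Running the detached selection first shrinks the relation that tuple substitution iterates over, which is why the order of subqueries is chosen by considering intermediate relation sizes.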
System R Algorithm
1. Simple (i.e., mono-relation) queries are executed according to the best access path
2. Execute joins
2.1 Determine the possible ordering of joins
2.2 Determine the cost of each ordering
2.3 Choose the join ordering with minimal cost
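Steps 2.1 to 2.3 can be sketched as a brute-force search. Two caveats: this sketch enumerates every permutation, whereas System R prunes the alternatives with dynamic programming, and the cost model used here (sum of estimated intermediate result cardinalities) is a simplifying assumption. The relations, cardinalities and selectivities are hypothetical.

```python
from itertools import combinations, permutations


def result_card(rels, card, sel):
    """Estimated cardinality of joining all relations in `rels`."""
    size = 1.0
    for r in rels:
        size *= card[r]
    for a, b in combinations(sorted(rels), 2):
        size *= sel.get((a, b), 1.0)  # 1.0 means Cartesian product
    return size


def ordering_cost(order, card, sel):
    """Sum of the sizes of the results produced after each join step."""
    return sum(result_card(order[:k], card, sel)
               for k in range(2, len(order) + 1))


def best_join_ordering(card, sel):
    orderings = permutations(sorted(card))                          # step 2.1
    costed = [(ordering_cost(o, card, sel), o) for o in orderings]  # step 2.2
    return min(costed)                                              # step 2.3


card = {"EMP": 400, "ASG": 1000, "PROJ": 50}
sel = {("ASG", "EMP"): 0.0025, ("ASG", "PROJ"): 0.01}  # alphabetically sorted keys

cost, order = best_join_ordering(card, sel)
print(cost, order)
```

In this example the cheapest orderings join ASG with PROJ first and avoid the EMP and PROJ Cartesian product, which has no join predicate.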