Global Query Optimization Mai

  • 7/30/2019 Global Query Optimization Mai

    1/31

    GLOBAL QUERY OPTIMIZATION

    Query optimization: the process of producing a query execution plan, which represents an execution strategy for the query.

    Input: Fragment query

    Find the best (not necessarily optimal) global schedule

    Minimize a cost function

    Query Optimization Process

    Search Space

    Restrict by means of heuristics

    Perform unary operations before binary operations

    Restrict the shape of the join tree

    Linear versus bushy trees

    A linear tree is a tree in which at least one operand of each operator node is a base relation. In a bushy tree, an operator node may have both operands as intermediate relations.
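The distinction between linear and bushy trees can be checked mechanically. The following is a minimal sketch, assuming a join tree is represented with a hypothetical `Join` node whose operands are either base relation names (strings) or other `Join` nodes; neither the class nor the relation names come from the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Join:
    """A binary join node; each operand is a base relation name or another Join."""
    left: "Plan"
    right: "Plan"


Plan = Union[str, Join]


def is_linear(plan: Plan) -> bool:
    """True if every join node has at least one base relation as an operand."""
    if isinstance(plan, str):  # a base relation is trivially linear
        return True
    has_base_operand = isinstance(plan.left, str) or isinstance(plan.right, str)
    return has_base_operand and is_linear(plan.left) and is_linear(plan.right)


# ((R join S) join T) join U: every join has a base relation as one operand.
linear = Join(Join(Join("R", "S"), "T"), "U")
# (R join S) join (T join U): the top join has two intermediate operands.
bushy = Join(Join("R", "S"), Join("T", "U"))

print(is_linear(linear))  # True
print(is_linear(bushy))   # False
```

Restricting the optimizer to linear trees shrinks the search space, at the price of ruling out plans in which two intermediate results are built independently and then joined.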

    Search Strategy

    The most popular search strategy used by query optimizers is dynamic programming.

    Deterministic

    Start from base relations and build plans by adding one

    relation at each step

    Dynamic programming: breadth-first

    Greedy: depth-first

    Randomized

    Search for optimal solutions around a particular starting point

    Trade optimization time for execution time

    Better when the query involves more than 5-6 relations
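The two deterministic strategies can be contrasted on a toy join-ordering problem. This is only a sketch under stated assumptions: the cardinalities and selectivities in `CARD` and `SEL` are hypothetical, plans are restricted to linear orders, and the cost of a plan is taken to be the sum of the estimated sizes of its intermediate results, which is a simplification rather than the cost model of any particular system.

```python
from itertools import combinations

# Hypothetical statistics, chosen only for illustration.
CARD = {"R": 1000, "S": 100, "T": 10, "U": 500}
# Join selectivity per pair; a missing pair is treated as a Cartesian product.
SEL = {frozenset("RS"): 0.01, frozenset("ST"): 0.1, frozenset("TU"): 0.02}


def result_card(rels):
    """Estimated cardinality of the join of all relations in rels."""
    names = sorted(rels)
    card = 1.0
    for name in names:
        card *= CARD[name]
    for a, b in combinations(names, 2):
        card *= SEL.get(frozenset((a, b)), 1.0)
    return card


def dp_order(relations):
    """Breadth-first: keep the best plan for every subset of size 1, 2, ..., n."""
    best = {frozenset([r]): (0.0, (r,)) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            card = result_card(subset)
            # Extend the best plan of each (size - 1)-subset by one relation.
            best[subset] = min(
                (best[subset - {r}][0] + card, best[subset - {r}][1] + (r,))
                for r in subset
            )
    return best[frozenset(relations)]


def greedy_order(relations):
    """Depth-first: commit to the locally cheapest extension at each step."""
    order = (min(relations, key=CARD.get),)
    cost = 0.0
    while len(order) < len(relations):
        nxt = min((r for r in relations if r not in order),
                  key=lambda r: result_card(order + (r,)))
        order += (nxt,)
        cost += result_card(order)
    return cost, order


print(dp_order(["R", "S", "T", "U"]))
print(greedy_order(["R", "S", "T", "U"]))
```

Dynamic programming never returns a costlier linear plan than the greedy strategy under the same cost function, but it examines every subset of relations, so its work grows exponentially with the number of relations. That growth is the reason randomized strategies become attractive for larger queries.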

    Distributed cost model

    An optimizer's cost model includes cost functions to predict the cost of operators, statistics on base data, and formulas to evaluate the sizes of intermediate results.

    Cost Function

    The cost of a distributed execution strategy can be

    expressed with respect to either the total time or the

    response time

    Total time - The sum of all time components

    Response time - Elapsed time from the initiation to the

    completion of the query.

    Total time =

    Tcpu * #insts + TI/Os * #I/Os + Tmsg * #msgs + TTR * #bytes

    Tcpu - Time of a CPU instruction

    TI/Os - Time of a disk I/O

    Tmsg - Fixed time of initiating and receiving a message

    TTR - Time it takes to transmit a data unit from one site to another
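The total-time formula is a weighted sum, which can be written out directly. In this sketch the class name `CostCoefficients`, its field names, and all numeric values are assumptions made for illustration (arbitrary time units); the slides do not give concrete coefficients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostCoefficients:
    t_cpu: int  # time of one CPU instruction
    t_io: int   # time of one disk I/O
    t_msg: int  # fixed time of initiating and receiving one message
    t_tr: int   # time to transmit one data unit from one site to another


def total_time(c: CostCoefficients, insts: int, ios: int, msgs: int, nbytes: int) -> int:
    """Sum of all time components, regardless of which site incurs them."""
    return c.t_cpu * insts + c.t_io * ios + c.t_msg * msgs + c.t_tr * nbytes


# Hypothetical coefficients in arbitrary time units.
coeffs = CostCoefficients(t_cpu=1, t_io=1000, t_msg=5000, t_tr=10)
print(total_time(coeffs, insts=1_000_000, ios=200, msgs=4, nbytes=8000))  # 1300000
```

With these made-up numbers the CPU term dominates; on a slow wide-area network the two communication terms would typically be the ones to minimize.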

    Response time =

    Tcpu * seq_#insts + TI/Os * seq_#I/Os + Tmsg * seq_#msgs + TTR * seq_#bytes

    seq_#x - Maximum number of x that must be done sequentially to execute the query, where x can be instructions (insts), I/Os, messages (msgs), or bytes
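The difference between total time and response time shows up as soon as work runs in parallel. The sketch below assumes a hypothetical scenario in which two sites each ship data to a third site at the same time, and it counts communication cost only; the coefficients and data volumes are invented for illustration.

```python
T_MSG = 5000  # hypothetical fixed cost per message
T_TR = 10     # hypothetical cost per transmitted data unit


def transfer_time(units: int) -> int:
    """Cost of one message carrying `units` data units."""
    return T_MSG + T_TR * units


x, y = 8000, 2000  # data units sent by site 1 and site 2 to site 3

# Total time adds up every transfer, whether or not they overlap.
total = transfer_time(x) + transfer_time(y)
# Response time counts only the longest sequential chain: the two
# transfers proceed in parallel, so the slower one determines it.
response = max(transfer_time(x), transfer_time(y))

print(total)     # 110000
print(response)  # 85000
```

Minimizing response time favors plans with more parallelism even when they do more work overall, whereas minimizing total time favors plans that use fewer resources across all sites.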

    Database Statistics

    The primary factor affecting the performance of an execution strategy is the size of the intermediate relations that are produced during the execution.

    For each relation R defined over the attributes A = {A1, A2, ..., An} and fragmented as R1, ..., Rr, the statistical data are:

    the length of each attribute: length(Ai)

    the number of distinct values for each attribute in each fragment: card(ΠAi(Rj))

    the maximum and minimum values in the domain of each attribute: min(Ai), max(Ai)

    the cardinality of each domain: card(dom[Ai])

    the cardinality of each fragment: card(Rj)

    Selectivity factor of each operation for relations

    The join selectivity factor of relations R and S, denoted SFj, is a real value between 0 and 1.

    For example, an SFj of 0.5 corresponds to a very large joined relation, while 0.001 corresponds to a small one.

    These statistics are useful for predicting the size of intermediate relations.
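A common way to define the join selectivity factor is as the ratio of the join result's cardinality to the size of the Cartesian product, SFj = card(R join S) / (card(R) * card(S)). The slides do not spell the definition out, so treat it as an assumption here. The sketch below measures it on two tiny in-memory relations (invented data, hypothetical attribute `dept`) and then uses it to estimate a join cardinality.

```python
def join_selectivity(r, s, attr):
    """Fraction of the Cartesian product of r and s that satisfies r.attr = s.attr."""
    matches = sum(1 for tr in r for ts in s if tr[attr] == ts[attr])
    return matches / (len(r) * len(s))


def estimate_join_card(card_r, card_s, sf_j):
    """Estimated cardinality of the join, given the selectivity factor."""
    return sf_j * card_r * card_s


# Invented sample relations.
R = [{"dept": d} for d in (1, 2, 3, 4)]
S = [{"dept": d} for d in (1, 1, 2, 5, 5)]

sf = join_selectivity(R, S, "dept")
print(sf)  # 0.15 (3 matching pairs out of 20)
print(estimate_join_card(len(R), len(S), sf))
```

In practice the optimizer does not compute the join to obtain the factor; it derives an estimate from stored statistics such as the number of distinct values of the join attribute.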

    The size of an intermediate relation R is computed as follows:

    size(R) = card(R) * length(R)

    length(R) is the length of a tuple of R, computed from the lengths of its attributes.
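As a small worked instance of size(R) = card(R) * length(R), the sketch below stores per-attribute lengths and a cardinality in a hypothetical `RelationStats` record. The relation, attribute names, and byte lengths are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationStats:
    card: int          # number of tuples, card(R)
    attr_lengths: dict  # length(Ai) for each attribute, here in bytes

    @property
    def length(self) -> int:
        """length(R): tuple length, the sum of the attribute lengths."""
        return sum(self.attr_lengths.values())

    @property
    def size(self) -> int:
        """size(R) = card(R) * length(R)."""
        return self.card * self.length


# Hypothetical intermediate relation with three attributes.
emp = RelationStats(card=400, attr_lengths={"ENO": 4, "ENAME": 20, "TITLE": 10})
print(emp.length)  # 34
print(emp.size)    # 13600
```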

    Cardinalities of intermediate results

    Selection: The cardinality of selection is

    Centralized Query Optimization

    INGRES: dynamic

    System R: static, exhaustive search

    INGRES Algorithm

    Decompose each multi-variable query into a sequence of mono-variable queries with a common variable

    Process each by a one-variable query processor

    Choose an initial execution plan (heuristics)

    Order the rest by considering intermediate relation sizes

    INGRES Algorithm: Decomposition

    Replace an n-variable query q by a series of queries

    q1 → q2 → ... → qn

    where qi uses the result of qi-1.

    Detachment

    Query q is decomposed into q' → q", where q' and q" have a common variable which is the result of q'

    Tuple substitution

    Replace the value of each tuple with actual values and simplify the query

    q(V1, V2, ..., Vn) → (q'(t1, V2, V3, ..., Vn), t1 ∈ R)
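Detachment followed by tuple substitution can be illustrated on a two-variable query over in-memory relations. This is a loose sketch, not the INGRES implementation: the relations `emp` and `proj`, their attributes, the query itself, and the helper `one_variable_query` are all hypothetical. The query asks for employee and project names where the employee's department runs a project with a budget above 100.

```python
def one_variable_query(relation, predicate):
    """Stand-in for the one-variable query processor: a filtered scan."""
    return [t for t in relation if predicate(t)]


def detach_then_substitute(emp, proj):
    # Detachment: q' is the one-variable part of the query (a selection on
    # proj). Its result is the common variable that q" then ranges over.
    proj_prime = one_variable_query(proj, lambda p: p["budget"] > 100)

    # Tuple substitution: replace the variable over proj_prime by each actual
    # tuple t1, which leaves a one-variable query on emp for every t1.
    result = []
    for t1 in proj_prime:
        matching = one_variable_query(emp, lambda e, d=t1["dept"]: e["dept"] == d)
        for e in matching:
            result.append((e["name"], t1["pname"]))
    return result


# Invented sample data.
emp = [{"name": "Ann", "dept": 1}, {"name": "Bob", "dept": 2}, {"name": "Cy", "dept": 1}]
proj = [{"pname": "P1", "dept": 1, "budget": 150}, {"pname": "P2", "dept": 2, "budget": 50}]

print(detach_then_substitute(emp, proj))  # [('Ann', 'P1'), ('Cy', 'P1')]
```

Running the detached selection first keeps the relation that is substituted tuple by tuple as small as possible, which is the motivation for ordering subqueries by intermediate relation size.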

    System R Algorithm

    1. Simple (i.e., mono-relation) queries are executed according to the best access path

    2. Execute joins

    2.1 Determine the possible ordering of joins

    2.2 Determine the cost of each ordering

    2.3 Choose the join ordering with minimal cost
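Steps 2.1 to 2.3 can be sketched as a plain enumeration over join orderings. This is a naive illustration under stated assumptions, not the System R optimizer itself, which prunes the search rather than costing every permutation. The relation names, cardinalities, selectivities, and the cost function (sum of estimated intermediate result sizes) are all hypothetical.

```python
from itertools import combinations, permutations

# Hypothetical statistics, chosen only for illustration.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}
# Join selectivity per pair; EMP and PROJ share no join predicate here,
# so joining them directly is a Cartesian product (selectivity 1.0).
SEL = {frozenset(("EMP", "ASG")): 1 / 400, frozenset(("ASG", "PROJ")): 1 / 100}


def result_card(rels):
    """Estimated cardinality of the join of all relations in rels."""
    names = sorted(rels)
    card = 1.0
    for name in names:
        card *= CARD[name]
    for a, b in combinations(names, 2):
        card *= SEL.get(frozenset((a, b)), 1.0)
    return card


def ordering_cost(order):
    """Sum of the estimated sizes of every intermediate and final result."""
    return sum(result_card(order[:k]) for k in range(2, len(order) + 1))


def best_join_order(relations):
    orderings = list(permutations(relations))         # 2.1 possible orderings
    costs = {o: ordering_cost(o) for o in orderings}  # 2.2 cost of each ordering
    return min(costs, key=costs.get)                  # 2.3 minimal-cost ordering


best = best_join_order(list(CARD))
print(best, ordering_cost(best))
```

With these numbers, any ordering that begins by combining EMP and PROJ pays for a Cartesian product, so the chosen ordering joins through ASG first.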
