78
PART 4 DATA STORAGE AND QUERY

PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

Embed Size (px)

DESCRIPTION

May 2008Database System Concepts - Chapter 14 Query Optimization -3 §14.1 Overview why optimization needed §14.2 Transformation of relational expressions equivalence rules §14.4 Choice of evaluation plans — Query optimization cost-based optimization,§14.2 heuristic optimization, §14.3 Main parts in Chapter 14

Citation preview

Page 1: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

PART 4

DATA STORAGE AND QUERY

Page 2: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

Chapter 14

Query Optimization

Page 3: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 3

§14.1 Overview why optimization needed

§14.2 Transformation of relational expressions equivalence rules

§14.4 Choice of evaluation plans — Query optimization cost-based optimization,§14.2 heuristic optimization, §14.3

Main parts in Chapter 14

Page 4: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 4

A SQL query statement may corresponds to several equivalent expressions ( or query evaluation plans ), each of which may have different costs

Query optimization is needed to choice an efficient query-evaluation plan with lower costs

E.g.1. select * from r, s where r.A = s.A σr.A=s.A (r ╳ s) is much slower than r s

§14.1 Overview

Page 5: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 5

E.g.2 Find the names of all customers who have an account at any branch located in Brooklyn

select customer-name from branch, account, depositor where branch.name=account.name AND account.number =depositor.number AND branch-city = Brooklyn

Fig.14.1

§14.1 Overview

Page 6: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

Fig. 14.1 Equivalent Expression

Page 7: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 7

The procedures of optimization generating the equivalent expressions/query trees, by

transforming of relational algebra expressions according to equivalence rules in§14.2

generating alternative evaluation plans, by annotating the resultant equivalent expressions with implementation algorithms for each operations in the expressions

choosing the optimal (that is, cheapest) or near-optimal plan based on the estimated cost, by

cost-based optimization heuristic optimization

14.1 Overview (cont.)

Page 8: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 8

The cost of operations is needed to estimate based on statistical information about relations

e.g. number of tuples, number of distinct values for join attributes, etc.

14.1 Overview (cont.)

Page 9: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 9

§14.2 Transformation of Relational Expressions

Definition. Two relational algebra expressions are equivalent, if on every legal database instances, the two expression generate the same set of tuples

Page 10: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 10

Fig.14.0.1 Schema diagram for the banking enterprise

14.2.1 Equivalence Rules

Page 11: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 11

Rule1.( 选择串接律 , 将 1 个选择操作分解为 2 个选择操作 ) Conjunctive selection operations can be deconstructed into a sequence of individual selections

e.g. refer to

Rule2. ( 选择交换律 ) Selection operations are commutative :

))(()(2121

EE

))(())((1221

EE

14.2.1 Equivalence Rules (cont.)

Page 12: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 12

Rule3. ( 投影串接律 ) Only the final operation in a sequence of project operations is needed

note: it is required that L1 ⊆ L2 …. L⊆ ⊆ n

e.g. {loan-number} { {loan-number, amount} (loan) }

= {loan-number} (loan)

)())))((((121

EE LLnLL

14.2.1 Equivalence Rules (cont.)

Page 13: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 13

Rule4. ( 以连接操作代替选择和笛卡尔乘积 ) Selections can be combined with Cartesian products and theta joins

a. (E1 ╳ E2) = E1 E2 note: definition of joinb. 1(E1 2 E2) = E1 1 2 E2 e.g. branch-name=“Perryridge”( borrower loan) = borrower (branch-name=“Perryridge) (borrower.loan-numbr=loan.loan-number) loan

By Rule4, the number of operations can be reduced, and the costs of the left-hand expressions are less than that of the right-hand expressions

often used in heuristic query optimizing

14.2.1 Equivalence Rules (cont.)

Page 14: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 14

Rule5. ( 连接操作可交换 ) Theta join operations are commutative

E1 E2 = E2 E1

Fig. 14.2

14.2.1 Equivalence Rules (cont.)

Page 15: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 15

Rule6. ( 连接操作的结合率 , associative) a. natural join operations are associative

(E1 E2) E3 = E1 (E2 E3)

b. Theta joins are associative in the following manner: (E1 1 E2) 2 3 E3 = E1 1 3 (E2 2 E3)

, where 2 involves attributes from only E2 and E3

14.2.1 Equivalence Rules (cont.)

Page 16: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

Fig.14.2 Pictorial Depiction of Equivalence Rules

Rule 5

Rule 6a

Rule 7a

Page 17: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 17

Rule7. Selection operation distributes over theta-join( 选择操作对于连接操作的分配率 , 选择条件下移 ) a. when all the attributes in 0 involve only the attributes

of one of the expressions , e.g. E1, being joined

0E1 E2) = (0(E1)) E2

b. when 1 involves only the attributes of E1, and 2 involves only the attributes of E2

1 E1 E2) = (1(E1)) ( (E2)) e.g. in Fig.14.0.2

14.2.1 Equivalence Rules (cont.)

Page 18: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

When Rule7 is applied, the size of relations participating in theta-join operation is reduced, thus evaluation costs are reduced

account branch

Fig.14.0.2 An example of Rule7

Page 19: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 19

Rule8. Projection operation distributes over theta-join( 投影操作对于连接操作的分配率 , 投影下移 ) a. L1 and L2 are the attributes of E1 and E2 respectively. if

join condition involves only attributes in L1 L2, then

))(())(()( 2......12.......1 2121

EEEE LLLL

14.2.1 Equivalence Rules (cont.)

Page 20: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

{branch-name, branch-city} {branch-name, amount}

branch loan

{branch-name, branch-city} {branch-name, amount}

branch loan

Fig.14.0.3 An example of Rule 8a

L1 L2

E1: branch (branch-name, branch-city, assets) E2:loan(loan-number, branch-name, amount)

Page 21: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 21

b. consider a join E1 E2

let L1 and L2 be sets of attributes from E1 and E2, respectively

let L3 be attributes of E1 that are involved in join condition , but are not in L1 L2, and

let L4 be attributes of E2 that are involved in join condition , but are not in L1 L2.

)))(())((().....( 2......121 42312121EEEE LLLLLLLL

14.2.1 Equivalence Rules (cont.)

Page 22: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

{branch-city} { loan-number}

branch loan

{branch-city, assets} {loan-number, amount}

branch loan

Fig.14.0.4 An example of Rule 8b

B. assets> L. amount

L1 L2

L4L3

B. assets> L. amount

E1: branch (branch-name, branch-city, assets) E2:loan(loan-number, branch-name, amount)

{branch-city} { loan-number}

Page 23: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 23

Rule9. The set operations union and intersection are commutative

E1 E2 = E2 E1 E1 E2 = E2 E1

set difference is not commutative

Rule10. Set union and intersection are associative (E1 E2) E3 = E1 (E2 E3) (E1 E2) E3 = E1 (E2 E3)

14.2.1 Equivalence Rules (cont.)

Page 24: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 24

Rule11. ( 选择操作对于集合并、交、差的分配率 ) The selection operation distributes over , and –.

(E1 – E2) = (E1) – (E2) similarly for and in place of –

(E1 – E2) = (E1) – E2

similarly for in place of –, but not for

Rule12. ( 投影操作对于集合并的分配率 ) The projection operation distributes over union

L(E1 E2) = (L(E1)) (L(E2))

14.2.1 Equivalence Rules (cont.)

Page 25: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 25

Find the names of all customers who have an account at some branch located in Brooklyncustomer-name(branch-city = “Brooklyn”

(branch (account depositor))) Transformation using rule 7a.

customer-name

((branch-city =“Brooklyn” (branch)) (account depositor))

Performing the selection as early as possible reduces the size of the relation to be joined.

14.2.2 Example One

Page 26: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 26

For

When we compute t ←(branch-city = “Brooklyn” (branch) account ) we obtain a temporal relation t whose schema is:T(branch-name, branch-city, assets, account-number, balance)

Push projections using equivalence rules 8a and 8b; eliminate unneeded attributes from intermediate results to get: customer-name ( account-number t(branch-name, branch-city, assets, account-

number, balance) depositor(customer-name, account-number))

customer-name((branch-city = “Brooklyn” (branch) account) depositor)

14.2.2 Example Two

Page 27: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 27

Find the names of all customers with an account at a Brooklyn branch whose account balance is over $1000.customer-name((branch-city = “Brooklyn” balance > 1000

(branch (account depositor)))

Transformation using join associatively (Rule 6a), refer to Fig.14.3

14.2.2 Example Three

Page 28: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 28

Fig. 14.3 Multiple transformations

14.2.2 Example Three (cont.)

Page 29: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 29

14.2.3 Join Ordering

For all relations r1, r2, and r3,

(r1 r2) r3 = r1 (r2 r3 ) If r2 r3 is quite large and r1 r2 is small, we choose

(r1 r2) r3

so that we compute and store a smaller temporary relation.

Page 30: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 30

Consider the expressioncustomer-name ((branch-city = “Brooklyn” (branch))

account depositor) Could compute account depositor first, and join result with

branch-city = “Brooklyn” (branch), but account depositor is likely to be a large relation.

Since it is more likely that only a small fraction of the bank’s customers have accounts in branches located in Brooklyn, it is better to compute

branch-city = “Brooklyn” (branch) account

first.

14.2.3 An Example

Page 31: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 31

How to estimate the cost of an operation estimating the size of the resultant relations produced by the

operation p, according to statistics in the catalog about the relations on which the operation is conducted

e.g. customer-name(balance2500(account) customer)

computing the cost of the operation in terms of disk access, by means of the cost formulae shown in chapter13

disk access is the predominant cost, and the number of block transfers from disk is used as a measure of the actual cost of evaluation.

§14.3 Estimating Statistics of Expression Results

Page 32: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 32

nr: number of tuples in a relation r br: number of blocks containing tuples of r lr: size of a tuple of r in bytes fr: blocking factor of r

the number of tuples of r that fit into one block V(A, r): number of distinct values that appear in r for attribute

A same as the size of A(r)

Statistics about indices, such as heights of B+-trees, that is HTi, and the number of leaf pages in the indices, are also in catalog

14.3.1 Statistics in Catalog for Optimization

Page 33: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 33

If tuples of r are stored together physically in a file, then:

rfrn

rb

Statistics in Catalog for Optimization (cont.)

Page 34: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 34

An example for statistical information faccount= 20 (20 tuples of account fit in one block) V(branch-name, account) = 50 (50 branches) V(balance, account) = 500 (500 different balance values) naccount = 10000 (account has 10,000 tuples) assume the following indices exist on account

a primary, B+-tree index for attribute branch-name a secondary, B+-tree index for attribute balance

Statistics in Catalog for Optimization (cont.)

Page 35: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 35

The size estimation of the result of a selection operation depends on the selection predicates, that is single equality predicate, single comparison predicate, and combination of predicates

Equality selection A=a (r) assuming uniform distribution of values in relations, and the

value a appears in attribute A of some record in r the selection is estimated to have

nr / V(A, r) tuples nr /V(A, r) /fr blocks that these tuples occupy

14.3.2 Selection Size Estimation

Page 36: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 36

Comparison selections AV(r) the lowest and highest values of attributes, that is min(A, r)

and max(A, r), are available in catalog the estimated number of records/tuples satisfying comparison

condition A V 0, if v < min(A, r) nr, if v max(A, r) , otherwise

in absence of statistical information cost is assumed to be nr / 2

),min(),max(),min(.

rArArAvnr

14.3.2 Selection Size Estimation (cont.)

Page 37: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 37

The selectivity of a condition i is the probability that a tuple in the relation r satisfies i . If si is the number of satisfying tuples in r, the selectivity of i is given by si /nr.

Conjunction: 1 2. . . n (r). The estimate for number of tuples in the result is:

Disjunction:1 2 . . . n (r). Estimated number of tuples:

Negation: (r). Estimated number of tuples:nr – size((r))

nr

nr n

sssn

. . . 21

)1(...)1()1(1 21

r

n

rrr n

sns

nsn

14.3.2 Selection Size Estimation (cont.)

Page 38: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 38

14.4 Choice of Evaluation Plans— Query optimization

How to generate the evaluation plan for an relational algebra expression with lower or the lowest cost

generating of the optimized query trees selecting

implementation algorithms for each operations in the tree pipeline or materialization strategy for the expression

E.g. Fig.14.5

Page 39: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

Fig.14.5 An evaluation plan

Page 40: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 40

Query optimization choosing the evaluation plan with lowest or the lower cost

Query optimization is conducted by two strategies/approaches cost-based optimization,§14.4.2 heuristic optimization, §14.4.3!

Practical query optimizers incorporate elements of these two approaches

14.4 Choice of Evaluation Plans(cont.)

Page 41: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 41

14.4.2 Cost-based Optimization

Cost-based optimization search all the possible plans and choose the best plan a optimal plan with the minimum costs can be obtained

Cost-based optimization is expensive, though results optimal plan

Page 42: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 42

Heuristic optimization uses heuristics (or heuristic rules) to choose a better plan a suboptimal plan with lower costs can be obtained

Principles: transforms the query-tree by heuristics to reduce cost perform selection early to reduce the number of tuples perform projection early to reduces the number of attributes substitute Cartesian product and selection with join

14.4.3 Heuristic Optimization

Page 43: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 43

perform most restrictive selection and join operations before other similar operations

the most restrictive operations generate resulting relations with smallest size

14.4.3 Heuristic Optimization (cont.)

Page 44: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 44

Step1 使用 rule1 ,将 conjunctive selection 分解为多个单独的选择操作,以使单个选择操作尽可能沿查询树下移(尽早执行选择操作,以减少中间计算结果) Step2 根据选择操作的交换率和分配率,利用 rule2,

rule7.a, rule7.b, rule11 , 将查询树上的每个选择操作尽可能移向叶节点,以便尽早执行选择操作 Step3 根据连接操作的结合律和交换率,使用 rule6 ,重新安排查询树中的叶结点,使得具有 restrictive selection 特征的叶结点先执行

restrictive selection :执行此操作后,产生的结果关系最小(所含元组最少) refer to Fig. 14.3

Steps in Heuristic Optimization

Page 45: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 45

Step4 利用 rule4.a, 以连接操作代替相邻的选择和笛卡尔乘积操作 Step5 利用 rule3, 8.a, 8.b,12, 将查询树上的投影操作尽可能下移,以便尽早执行投影操作,减少中间计算结果 Step6 将最后的查询树分解为多个子树,使子树中的各操作可以采用流水线方式执行,以减少对外设的访问次数

e.g. Fig.14.5

重点:前 5 步 !!!

Steps in Heuristic Optimization (cont.)

Page 46: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 46

Given S(S#, sname, age, sex) C(C#, cname, teacher) SC(S#, C#, grade)

Find all the students, who are male, and get grades more than 90 when they learn some one course

Step1. SQL statement SELECT DISTINCT sname

FROM S, SC WHERE S.s# = SC.s# AND sex=M AND grade > 90

join condition

selection conditions

Example One

Page 47: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 47

Principles select A1, A2, …, An from r1, r2, …, rn

where P corresponds to ∏A1, A2, …, An (σP ( r1 ╳ r2 ╳ …╳ rn ) )

Example One (cont.)

Page 48: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 48

Step2. Initial Relational algebra expression E1 =

sname( s.s#=sc.s# ∧ sex=M grade>90 ∧ (S ╳ SC ) )

Step3. Initial query tree Fig. 14.0.5 (a), E1

Step4. By means of Rule1, Rule7b, distribute operations over relation S and SC

Fig. 14.0.5 (b), E2

Example One (cont.)

Page 49: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

sname

grade>90 sex=M

S SC

( b ) E2

sname

S (S#, sname, age, sex)

SC (S#, C#, grade)

(a) Initial query tree E1

s.s#=sc.s# ∧ sex=M ∧grade>90

s.s#=sc.s#

Page 50: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 50

Step5. By means of Rule4.a, replace selection and Cartesian product with natural join Fig. 14.0.5 (c), E3

Step6. !! By means of Rule8b, distribute operations over relation S and SC Fig. 14.0.5 (d),

The final optimized expression E4 that is equivalent to initial expression E1 is sname {sname, S# (sex=M (S))

grade>90 (grade, S# (SC)) }

Example One (cont.)

Page 51: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

( c ) E3

s.s#=sc.s#

Πsname, S#

SC(S#, C#, grade)

( d ) E4

sname

grade>90 sex=M

S(S#, sname, age, sex)

SC (S#, C#, grade)

s.s#=sc.s#

sname

sex=Mgrade>90

ΠS#

S(S#, sname, age, sex)

Page 52: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 52

Consider the following insurance database, where the primary keys are underlined,

Person(driver-id, name, address) Car(license, model, year) Accident(report-number, date, location) Participated(driver-id , license, report-number, damage-

amount)

Example Two

Page 53: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 53

Problems Give a SQL statement to find out the driver name, license,

report-number of the accidents which happened in Beijing and before May 3, 2008

Give the initial query tree for this query, and construct an optimized and equivalent relational algebra expression for it by means of heuristic optimization

Example Two (cont.)

Page 54: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 54

Step1. SQL statement Select name, license, Accident.report-number From Person, Participate, Accident Where Person.driver_id = Participate.driver_id AND Accident.report_number = Participate.report_number AND Location=Beijing AND date < 5/3/2008

Step2. Initial expression name, license, Accident.report-number

( Person.driver_id = Participate.driver_id AND Accident.report_number = Participate.report_number AND

Location=Beijing AND date < 5/3/2003

(Person ╳ Participate ╳ Accident )

Example Two (cont.)

Page 55: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

name, license, Accident.report-number

Person.driver_id = Participate.driver_id AND Accident.report_number =

Participate.report_number AND Location=Beijing AND date < 5/3/2003

Person Participate Accident

(a) initial query tree

Page 56: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

name, license, Accident.report-number

Accident.report_number

Person(driver-id, name, address)

Participate(driver-id, license, report-number, damage-amount)

Accident(report-number, date ,location)

(b) optimal query tree

Accident.report.number = Participate.report_number

Person.driver_id = Participate.driver_id

driver_id,

name

driver_id,

license,

report-number

Location=Beijing

AND date < 5/3/2003

name, license,

report-number

Page 57: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 57

Consider the following relations in banking enterprise database, where the primary keys are underlined

branch (branch-name, branch-city, assets), loan ( loan-number, branch-name , amount) borrower( customer-name, loan-number, borrow-date) customer (customer-name, customer-street, customer-city) account (account-number, branch-name, balance) depositor (customer-name, account-number , deposit-date)

Example Three

Page 58: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 58

For the query “ Find the names of all customers who have an loan at any branch that is located in Brooklyn and have assets more than $100,000, requiring that loan-amount is less than $1000”

give an SQL statement for this query given a initial query tree for the query, and convert it into an

optimized query tree by means of heuristic optimization

Example Three (cont.)

Page 59: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 59

SQL select customer-namefrom borrower, loan, branchwhere loan.loan-number=borrower.loan-number and branch.branch-name=loan.branch-name and branch-city=”Brooklyn” and assets>100000 and amount<1000

Example Three (cont.)

Page 60: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

customer-name

loan.loan-number customer- name, loan-number

borrower ( customer-name, loan-number, borrow-date)

loan.loan-number

,branch-name

amount< 100

loan( loan-number, branch-name , amount)

branch-name

branch-city=”Brooklyn”

AND assets>100000

Branch (branch-name, branch-city, assets)

Page 61: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

给定如下关系数据库 Employee(Employee#, Name, Address, Super-E#, E-D#) Department(D#, Dname, manager#, depart-location) Project(P#, Pname, Plocation, P-D#) 查询:对于每个在“哈尔滨”进行的工程项目,列出其工程项目号 P# ,工程所属部门号 P-D#, 该部门领导的名字

Name 和地址 Address 要求:写出该查询的 SQL 语句;转换为关系代数表达式并利用启发式方法进行查询优化;给出优化后的关系代数表达式。 Select P#, P-D#, Name, Address) From Project, Department, Employee Where Plocation = 哈尔滨 AND P-D#= D# AND manager# = Employee#

Example Four (cont.)

Page 62: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

P#, P-D#, Name, Address

P#, P-D#, manager# employee#, name, address

Employee P#, P-D#

P-location=“ 哈尔滨”

Project

D#, manager#

Department

Page 63: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 63

作业 Consider the following schema, where the primary keys are

underlined , Suppliers (supplier-id, supplier-name, city, telephone, address) Parts ( parts-id, parts-name, parts-color) Catalog (supplier-id, parts-id, price ) .   (1) Give an SQL statement to find out the name and telephone of

the suppliers who supply a red part whose price is below $2000. (2) Translate this SQL statement into an initial query tree, and

give an optimized query tree for it, by means of heuristic query optimization.

Page 64: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 64

Appendix A Sybase 平台下的查询执行计划优化 Select max(BscpID) From BSC

Page 65: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 65

Appendix A Sybase 平台下的查询执行计划优化(续)

Page 66: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization
Page 67: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 67

查询实验内容 单表、多表(如 3 张表)的查询、插入、删除、修改所对应的查询执行计划比较 有无索引、索引类型对查询、插入、删除、修改等操作的代价的影响 查询结果相同、但表达形式不一样的多个等价的查询、删除、插入、修改语句的查询执行计划的比较例如。

delete from loan where amount between 0 and 50

Appendix A Sybase 平台下的查询执行计划优化(续)

Page 68: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 68

Appendix A Sybase 平台下的查询执行计划优化(续)delete from loanwhere amount in ( select amount from loan where amount >= 0 and amount <= 50)

Page 69: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 69

Appendix B DDBS 查询代价与数据分布的关系

分布式数据库系统 DDBS 中 , 数据在各站点的分布和站点间通讯代价 / 传输时间将影响到整个查询代价 例 .

S(S# , SNAME , AGE , SEX) , 104 个元组,站点 A C(C# , CNAME , TEACHER), 105 个元组,站点 B SC(S# , C# , GRADE), 106 个元组,站点 A 元组平均长度 100bit ,站点间传输速率 v = 104 bit/s ,通 信延迟 d=1s 。假设无副本

查询要求 在支持分片透明性的 DDBS (分布在 A 和 B 2 个站点)中, 查询“所有选修“ MATH“ 的男生的学号和姓名”

Page 70: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 70

SQL (全局)查询语句 SELECT S# , SNAME

FROM S , SC , C WHERE S.S# = SC.S# AND SC.C# = C.C# ( 连接条件 ) AND SEX=M AND CNAME= MATH (选择条件) 查询代价 (query cost)

只考虑通信代价 TC (x) =1s ╳ 传输次数 + 传输的元组 bit 数 ÷ 104 bit/s

Appendix B DDBS 查询代价与数据分布的关系 ( 续 )

Page 71: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 71

策略 1 将关系 C 传到 A 地,在 A 地进行查询处理 TC1 = 1s + 105 ╳100 bit ÷ 104 bit/s ≈ 103 s ≈ 16.7 min

策略 2 将关系 S 和 SC 传到 B 地,在 B 地进行查询处理 TC2 = 2s + ( 104 +106 ) ╳ 100 bit ÷ 104 bit/s ≈10100

s≈28h

Appendix B DDBS 查询代价与数据分布的关系 ( 续 )

Page 72: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 72

策略 3 A 地从关系 SC 、 S 中求出所有男学生的成绩元组,共

105 个 根据元组中的 C#, 向 B 地的 C 关系询问 ( 应答过程 )

CNAME=MATH?共询问 105 次 TC3 ≈ 2 10╳ 5 1 ≈ ╳ 2.3 天

策略 4 在 B 地从关系 C 中求出 CNAME=MATH 的课程元组,假设有 10 个 根据 C# 询问 A 地的 S 、 SC 的连接,核实是否为选修

MATH ( SC.C#= C#? )的男生 (S.SEX=M?)

Appendix B DDBS 查询代价与数据分布的关系 ( 续 )

Page 73: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 73

TC4 ≈ 2 10 ╳ ╳ 1 ≈ 20s

策略 5 在 A 地从关系 S 、 SC 中求出男生的的课程元组,有 105 个 将结果传到 B 地,在 B 地执行查询 TC5= 1s + 105 ╳ 100 bit ÷ 10 4bit/s ≈ 103 s ≈ 16.7 m

策略 6 在 B 地从关系 C 中求出 CNAME=MATH 的课程元组,有

10 个 将结果传到 A 地,在 A 地执行查询 TC6 = 1s + 10 100 bit ÷ 10╳ 4 bit/s ≈ 1s

Appendix B DDBS 查询代价与数据分布的关系 ( 续 )

Page 74: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 74

Appendix C DB2 查询优化 对于 IBM DB2来说,它有非常好的优化算法,大致是基于代价估算后选择代价最小的策略。 代价估算的方法

搜集系统目录相关数据和要被执行的 SQL 语句,然后选择若干方案进行代价计算。 DB2 应用程序开发过程中部署测试环境时,可以专门设置虚拟的但是和将来实际运行环境比较一致的系统目录数据来模拟真实环境下的 DB2 的 SQL 执行策略。 对于 DB2 的优化器也可以设置不同的优化级别,可以使用

SET CURRENT QUERY OPTION命令或者某些子句进行相应设置,其中 0 :最小的优化 1 :相当于 DB2/6000 Version 1 中的优化加一些低成本的优化

Page 75: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 75

2 :使用 opt level 5 的功能,但连接优化上进行了一定简化 3 :中等优化,大约相当于 DB2 for Z/OS 5 :使用一定数量的优化,包括 Heuristic Rules (这是默认的值) 7 :使用一定数量的优化,不包括 Heuristic Rules 9 :使用所有可用的优化技术

Appendix C DB2 查询优化(续)

Page 76: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 76

有 4种查询优化策略访问工具,分别能够针对静态 SQL 、动态SQL 、 CLI 应用程序、 DRDA 应用程序请求者( DRDA AR )提供图形界面或者文本界面查询访问或者提供基本信息。

第一种。基本的是直接访问 Explain Table (可以直接使用SELECT 语句来访问它,所以这是一个各种OS 平台上都可以采取的措施),它的缺点是没有提供任何界面,必须由人去理解表中的数据。另外只能对执行的 SQL来处理,如果是未执行的静态包则无法产生信息。

第二种。基于图形界面的 Visual Explain (主要用于 windows 平台,甚至可以在 windows 平台上对诸如 UNIX 上的 DB2 系统上的SQL 语句进行这种图形化分析),它不支持 DADA AR ,也不支持分析未执行的静态包,不能一次分析多条 SQL 语句,但最直观。

Appendix C DB2 查询优化(续)

Page 77: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

May 2008 Database System Concepts - Chapter 14 Query Optimization - 77

第三种。 db2exfmt 以预先定义好的报告格式获得explain table 的信息,它不支持 DADA AR ,也不支持分析未执行的静态包,但支持一次分析多条语句。

第四种。 db2expln允许我们查看作为静态绑定包( static package )的一部分访问策略,它提供一种文本格式描述的策略。不支持 CLI或者 DRDA AR ,可以一次分析多条语句。

Appendix C DB2 查询优化(续)

Page 78: PART 4 DATA STORAGE AND QUERY. Chapter 14 Query Optimization

Have

a brea

k