Cost-based Query Optimization Christoph Koch


Cost-based Query Optimization

Christoph Koch


Computing the cost of a query plan

We know how to estimate the cost of each individual operator. But to do this we need to know the size of the input.

• Compute the size of each relation produced by some operator, bottom-up.
• We need to know estimates of selectivities/reduction factors for both selection and join conditions.

Given a fixed query plan, if we exchange a particular operator implementation (e.g. NL join for hash join), the output does not change and we do not have to recompute output sizes.
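The bottom-up size computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Scan`, `Select`, and `Join` classes and the `output_size` function are made-up names, the join order is just one possible order, and the relation sizes and selectivities are the ones listed on the Assumptions slide of this deck.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Scan:
    name: str
    tuples: int          # number of tuples in the base relation


@dataclass
class Select:
    child: "Plan"
    selectivity: float   # reduction factor of the selection condition


@dataclass
class Join:
    left: "Plan"
    right: "Plan"
    selectivity: float   # reduction factor of the join condition


Plan = Union[Scan, Select, Join]


def output_size(node: Plan) -> float:
    """Estimate the number of output tuples, working from the leaves upward."""
    if isinstance(node, Scan):
        return float(node.tuples)
    if isinstance(node, Select):
        return node.selectivity * output_size(node.child)
    if isinstance(node, Join):
        return node.selectivity * output_size(node.left) * output_size(node.right)
    raise TypeError(f"unknown plan node: {node!r}")


# Sizes and selectivities from the Assumptions slide; the join order is illustrative.
s, h, v, p = Scan("s", 38000), Scan("h", 60000), Scan("v", 2000), Scan("p", 800)
socrates = Select(p, 1.25e-3)          # Sel[p.Name=...]
sh = Join(s, h, 2.6e-5)                # Sel[sh]
shv = Join(sh, v, 5e-4)                # Sel[hv]
plan = Join(shv, socrates, 1.25e-3)    # Sel[vp]

for label, node in [("sel(p)", socrates), ("s join h", sh), ("plan", plan)]:
    print(label, round(output_size(node), 1))
```

Note that `output_size` never looks at how a join is implemented, which mirrors the remark above: swapping an NL join for a hash join leaves all estimated sizes unchanged.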


Model of Query plans

We use a very powerful model. Ingredients:

• Algebra tree.
• For each operation, a choice of implementation.
• For some binary operations (e.g. NL joins), an annotation saying which of the two inputs is the outer and which is the inner loop.
• A choice of whether an operator's output is to be written back to disk or pipelined into the next operator (sometimes there is no choice).
• An allocation of memory buffer pages to operators and their input/output lines.

In case of pipelining, the allocation may not be operator by operator but may span several operations.
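One way to picture these ingredients is as fields of a plan-node record. The following Python sketch is an assumption-laden illustration, not the lecture's data structure: the class name `PlanNode`, its field names, and the buffer numbers in the example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    operator: str                       # algebra operator, e.g. "scan", "select", "join"
    implementation: str                 # chosen algorithm, e.g. "NL", "hash", "merge", "index"
    children: List["PlanNode"] = field(default_factory=list)
    outer_child: Optional[int] = None   # for NL joins: index of the child used as outer loop
    pipelined: bool = True              # False: output is written back to disk
    buffer_pages: int = 0               # memory buffer pages allocated to this node


def total_buffers(node: PlanNode) -> int:
    """Sum the buffer pages allocated across the whole plan."""
    return node.buffer_pages + sum(total_buffers(c) for c in node.children)


# Hypothetical two-way NL join with made-up buffer assignments.
scan_s = PlanNode("scan", "file scan", buffer_pages=1)
scan_h = PlanNode("scan", "file scan", buffer_pages=9)
join_sh = PlanNode("join", "NL", children=[scan_s, scan_h], outer_child=1)

# A plan is only admissible if it fits into the available buffer pool.
AVAILABLE_PAGES = 21  # assumed pool size for this example
print(total_buffers(join_sh) <= AVAILABLE_PAGES)
```

Keeping the buffer allocation on the nodes makes it easy to check a candidate plan against the available memory before costing it.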


A Remark on the cost function for block NL joins

On these slides I use the formula

pages_outer + round_up(pages_outer / (buffer_pages - 1)) * (pages_inner - 1) + 1

I explained this formula in class, but it is not in the book. You can use the formula from the book instead:

pages_outer + round_up(pages_outer / (buffer_pages - 1)) * pages_inner
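To compare the two variants numerically, both formulas can be written as small Python functions. The function names and the input values below are illustrative assumptions; the arithmetic follows the two formulas exactly as stated above.

```python
import math


def block_nl_cost_slides(pages_outer: int, pages_inner: int, buffer_pages: int) -> int:
    """Block NL join cost, variant used on these slides."""
    chunks = math.ceil(pages_outer / (buffer_pages - 1))
    return pages_outer + chunks * (pages_inner - 1) + 1


def block_nl_cost_book(pages_outer: int, pages_inner: int, buffer_pages: int) -> int:
    """Block NL join cost, variant from the book."""
    chunks = math.ceil(pages_outer / (buffer_pages - 1))
    return pages_outer + chunks * pages_inner


# Illustrative inputs: 40 outer pages, 200 inner pages, 10 buffer pages.
slides = block_nl_cost_slides(40, 200, 10)
book = block_nl_cost_book(40, 200, 10)
print(slides, book, book - slides)
```

Algebraically the two results differ by the number of outer chunks minus one, so the gap is small whenever the outer relation fits into a few buffer loads.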


A University Database

Professor (p)

id    name        grad   room
2125  Socrates    Full   226
2137  Kant        Full   7
2136  Curie       Full   36
2134  Augustinus  Assoc  309
2133  Popper      Assoc  52
2127  Kopernikus  Assoc  310
2126  Russel      Full   232

Student (s)

sid    name          semester
24002  Xenokrates    18
29555  Feuerbach     2
29120  Theophrastos  2
28106  Carnap        3
27550  Schopenhauer  6
26830  Aristoxenos   8
26120  Fichte        10
25403  Jonas         12

Course (v)

cid   name                 credits  taught_by
4630  The three critiques  4        2137
5022  CS432                2        2134
5259  The Vienna Circle    2        2133
5001  Foundations          4        2137
5216  Bioethics            2        2126
5052  Theory of Science    3        2126
4052  Logics               4        2125
5049  Maieutics            2        2125
5043  Epistemology         3        2126
5041  Ethics               4        2125

Takes (h)

sid    cid
28106  5216
25403  5022
29555  5022
29120  5049
29120  5041
26120  5001
29120  5001
28106  5259
28106  5052
28106  5041
27550  4052
27550  5001


Assumptions

Relation sizes (#tuples):
|p| = 800
|s| = 38000
|v| = 2000
|h| = 60000

Avg. tuple size:
p: 50 Bytes
s: 50 Bytes
v: 100 Bytes
h: 16 Bytes

Selectivities:
Sel[sh] = 2.6 * 10^-5
Sel[hv] = 5 * 10^-4
Sel[vp] = 1.25 * 10^-3
Sel[p.Name=...] = 1.25 * 10^-3

Page size: 1024 Bytes
Main memory buffers: m = 20+1 pages

#tuples per page (page overhead is ignored):
p: floor(1024/50) = 20
s: floor(1024/50) = 20
v: floor(1024/100) = 10
h: floor(1024/16) = 64

#pages (we ignore the overhead of the primary index):
p: 800/20 = 40
s: 38000/20 = 1900
v: 2000/10 = 200
h: ceil(60000/64) = 938
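The page counts on this slide follow mechanically from the relation sizes, tuple sizes, and page size. As a quick check, here is a small Python sketch; the function and variable names are illustrative, while the numbers are the ones given on the slide.

```python
PAGE_SIZE = 1024  # bytes, page overhead ignored as on the slide

# relation -> (number of tuples, average tuple size in bytes)
RELATIONS = {
    "p": (800, 50),
    "s": (38000, 50),
    "v": (2000, 100),
    "h": (60000, 16),
}


def tuples_per_page(tuple_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Whole tuples that fit on one page (rounded down)."""
    return page_size // tuple_bytes


def pages(num_tuples: int, tuple_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Pages needed to store the relation (rounded up)."""
    per_page = tuples_per_page(tuple_bytes, page_size)
    return -(-num_tuples // per_page)  # integer ceiling division


for name, (n, size) in RELATIONS.items():
    print(name, tuples_per_page(size), pages(n, size))
```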

Example 1

[Figure: query plan for Example 1. Root: π s.semester (with duplicates). Joins: Join s.sid=h.sid, Join v.cid=h.cid, Join p.id=v.taught_by. Inputs: s, h, v, and Sel p.name='Socrates' over Access(p).]

And now we have to allocate memory buffers...

Example 1a

[Figure: the Example 1 plan with all three joins implemented as NL joins (NL-Join s.sid=h.sid, NL-Join v.cid=h.cid, NL-Join p.id=v.taught_by). Edges carry buffer-page assignments or are marked "pipl." (pipelined); one inner input is marked "(inner: recompute!)".]

Example 1b

[Figure: the Example 1 plan with all three joins implemented as NL joins (NL-Join s.sid=h.sid, NL-Join v.cid=h.cid, NL-Join p.id=v.taught_by). Edges carry buffer-page assignments or are marked "pipl." (pipelined).]

Example 1c

[Figure: the Example 1 plan with all three joins implemented as NL joins (NL-Join s.sid=h.sid, NL-Join v.cid=h.cid, NL-Join p.id=v.taught_by). Edges carry buffer-page assignments or are marked "pipl." (pipelined); one edge is marked "Store intermediate result".]

Example 2

(Same query, different join order)

[Figure: query plan for Example 2. Operators: π s.semester; NL-Join s.sid=h.sid; NL-Join v.cid=h.cid; NL-Join p.id=v.taught_by; Sel p.name='Socrates' over Access(p). Base relations: s, h, v. Edges carry buffer-page assignments or are marked "pipl." (pipelined).]

Example 3a

[Figure: query plan for Example 3a. Operators: π s.semester; Index-Join s.sid=h.sid; Index-Join p.id=v.taught_by (index on taught_by); Merge-Join v.cid=h.cid; Sort v.cid; Sort h.cid; Sel p.name='Socrates' over Access(p). Base relations: v, h, s. Edges carry buffer-page assignments (including "20+1") or are marked "pipl." (pipelined).]

Example 3b

[Figure: query plan for Example 3b. Operators: π s.semester; Index-Join s.sid=h.sid; Index-Join p.id=v.taught_by (index on taught_by); Merge-Join v.cid=h.cid; Sort v.cid; Sort h.cid; Sel p.name='Socrates' over Access(p). Base relations: v, h, s. Additional labels: Index, Dup, Hash. Edges carry buffer-page assignments (including "20+1") or are marked "pipl." (pipelined).]

Example 3c

[Figure: query plan for Example 3c. Operators: π s.semester; Index-Join s.sid=h.sid; Index-Join p.id=v.taught_by (index on taught_by); Hash-Join v.cid=h.cid (15 hash buckets); Sel p.name='Socrates' over Access(p). Base relations: v, h, s. Additional labels: Index, Dup, Hash. Edges carry buffer-page assignments or are marked "pipl." (pipelined).]

Example 3d

[Figure: query plan for Example 3d. Operators: π s.semester; Index-Join s.sid=h.sid; Index-Join p.id=v.taught_by (index on taught_by); NL-Join v.cid=h.cid; Sel p.name='Socrates' over Access(p). Base relations: v, h, s. Additional labels: Index, Dup, Hash. Edges carry buffer-page assignments or are marked "pipl." (pipelined).]

Example 3e

[Figure: query plan for Example 3e. Operators: π s.semester; Index-Join s.sid=h.sid; Index-Join p.id=v.taught_by (index on taught_by); Index-Join v.cid=h.cid; Sel p.name='Socrates' over Access(p). Base relations: v, h, s. Additional labels: Index, Dup, Hash. Edges carry buffer-page assignments or are marked "pipl." (pipelined).]


Observations

Join order has BIG impact on query evaluation time!

The costs of operations and the sizes of intermediate results can be computed separately.

The selection of a join operator has only local influence on the runtime of the query, except if it destroys properties of the data (such as sort order) that are important for operations later on in the pipeline or if it requires main memory buffers that are allocated to other operators.


Example 4: Selectivity estimates

Now no selectivities are given, but we know that

• Professors (p) are in a 1:n relationship with Courses (v) (via foreign key taught_by).
• Each Professor has either rank "Full" or "Assoc".

[Figure: query plan. Index-Join p.id=v.taught_by over Select p.grad='Full' (on Access(p)) and v, using an unclustered B+-tree on taught_by.]
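From these two facts, selectivities can be estimated with the usual textbook heuristics. The sketch below is an illustration under assumptions that the slide does not state: both ranks are taken to be equally frequent, every course is taken to reference exactly one existing professor, and the function names are made up. The relation sizes are the ones from the Assumptions slide.

```python
def equality_selectivity(distinct_values: int) -> float:
    """Uniformity assumption: each distinct value is equally likely."""
    return 1.0 / distinct_values


def foreign_key_join_selectivity(referenced_tuples: int) -> float:
    """1:n relationship: each referencing tuple matches exactly one referenced tuple."""
    return 1.0 / referenced_tuples


P_TUPLES, V_TUPLES = 800, 2000          # |p| and |v| from the Assumptions slide

sel_full = equality_selectivity(2)      # two ranks: "Full" or "Assoc"
sel_join = foreign_key_join_selectivity(P_TUPLES)

full_professors = sel_full * P_TUPLES                   # estimated output of the selection
join_output = sel_join * full_professors * V_TUPLES     # estimated output of the join

print(sel_full, sel_join, full_professors, round(join_output))
```

Under these assumptions the join selectivity comes out as 1/800, which agrees with the value Sel[vp] = 1.25 * 10^-3 given earlier in the deck.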

Example 4a: Selectivity estimates

[Figure: query plan. Index-Join p.id=v.taught_by (index on taught_by) over Select p.grad='Full' (on Access(p)) and v, with a buffer assignment of 1.]

Example 4b: Selectivity estimates

It is better NOT to use the index!

Cost:

[Figure: query plan. NL-Join p.id=v.taught_by over Select p.grad='Full' (on Access(p)) and v; the unclustered B+-tree on taught_by is not used.]
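Why a plain NL join can beat an unclustered index here can be illustrated with a rough I/O count. Everything below is a sketch under assumptions not given on the slide: half of the 800 professors are taken to be Full (400 tuples, about 20 pages), each professor teaches 2000/800 = 2.5 courses on average, an index lookup is assumed to cost 2 I/Os, every matching course is assumed to cost one further I/O because the index is unclustered, 20 buffer pages are available, and the block NL cost uses the book formula from earlier in the deck. The function names are made up.

```python
import math


def nl_join_cost(scan_pages_outer: int, outer_result_pages: int,
                 pages_inner: int, buffer_pages: int) -> int:
    """Scan the outer once, then scan the inner once per buffer load of outer tuples."""
    chunks = math.ceil(outer_result_pages / (buffer_pages - 1))
    return scan_pages_outer + chunks * pages_inner


def index_join_cost(scan_pages_outer: int, outer_tuples: int,
                    matches_per_outer: float, lookup_ios: int) -> float:
    """Scan the outer once; per outer tuple, one index lookup plus one I/O per match."""
    return scan_pages_outer + outer_tuples * (lookup_ios + matches_per_outer)


P_PAGES, V_PAGES = 40, 200      # page counts from the Assumptions slide
FULL_PROFESSORS = 400           # assumed: half of |p| = 800
FULL_PROFESSOR_PAGES = 20       # 400 tuples at 20 tuples per page

nl = nl_join_cost(P_PAGES, FULL_PROFESSOR_PAGES, V_PAGES, buffer_pages=20)
idx = index_join_cost(P_PAGES, FULL_PROFESSORS, matches_per_outer=2.5, lookup_ios=2)

print(nl, idx)
```

With these assumed numbers the NL join needs far fewer I/Os, because the selection is not very selective and the unclustered index pays roughly one I/O per matching tuple.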


Using Histograms

[Figure: histogram of Student.Age with buckets 16-17, 18-19, 20-21, 22-23, 24-25, 26-27; vertical axis from 0 to 400.]

Histogram computed by "analyze table" statement
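A histogram like this lets the optimizer estimate the selectivity of a range predicate instead of guessing. The Python sketch below shows the idea for an equi-width histogram, assuming values are spread uniformly within each bucket. The bucket boundaries match the figure, but the counts are hypothetical, since the bar heights are not recoverable from the slide; the function name is also made up.

```python
from typing import List, Tuple

Bucket = Tuple[int, int, int]  # (low, high, count) with inclusive integer bounds


def estimate_range_count(hist: List[Bucket], lo: int, hi: int) -> float:
    """Estimate how many tuples satisfy lo <= value <= hi."""
    total = 0.0
    for b_lo, b_hi, count in hist:
        overlap = min(hi, b_hi) - max(lo, b_lo) + 1   # number of covered integer values
        if overlap > 0:
            total += count * overlap / (b_hi - b_lo + 1)
    return total


# Hypothetical counts per bucket of Student.Age.
AGE_HISTOGRAM: List[Bucket] = [
    (16, 17, 50), (18, 19, 300), (20, 21, 350),
    (22, 23, 200), (24, 25, 80), (26, 27, 20),
]

total_students = sum(count for _, _, count in AGE_HISTOGRAM)
matching = estimate_range_count(AGE_HISTOGRAM, 19, 22)
print(matching, matching / total_students)
```

The ratio printed last is the estimated selectivity of the predicate 19 <= Age <= 22 for these made-up counts.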


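Example 5: (co-)Clusters

p and v are stored co-clustered:

Prof 1
  Course 1.1
  Course 1.2
Prof 2
  Course 2.1
Prof 3
  Course 3.1
  Course 3.2
  Course 3.3

[Figure: query plan. NL-Join p.id=v.taught_by over Select p.grad='Full' and the co-clustered relations p and v, with buffer assignments of 1.]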

Example 6: Seeks

[Figure: query plan. NL-Join p.id=v.taught_by over Select p.grad='Full' (on p) and v, with an unclustered B+-tree on taught_by.]