Chapter 6 Query Execution. Query Query Compilation (Chapter 7 ) query plan Query execution metadata ( Chapter 6 ) data the major parts Of the query processor

  • View
    227

  • Download
    6

Embed Size (px)

Transcript

  • Slide 1

Chapter 6 Query Execution Slide 2 Query Query Compilation (Chapter 7 query plan Query execution metadata Chapter 6 data the major parts Of the query processor Slide 3 Query Compilation 1.Parse Query parse tree 2.Rewrite Query logical query plan 3.Implement Query physical query plan 2+3 => Query Optimization Slide 4 Metadata for Query Optimization The size of each relation Statistics such as the approximate number and frequency of different values for an attribute. The existence of certain indexes The layout of data on disk. Slide 5 Overview of Query Optimization Slide 6 parse convert apply laws estimate result sizes consider physical plans estimate costs pick best execute {P1,P2,..} {(P1,C1),(P2,C2)...} Pi answer SQL query parse tree logical query plan improved l.q.p l.q.p. +sizes statistics Slide 7 Example: SQL query SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE %1960 ); (Find the movies with stars born in 1960) Slide 8 Example: Parse Tree SELECT FROM WHERE IN title StarsIn ( ) starName SELECT FROM WHERE LIKE name MovieStar birthDate %1960 Slide 9 Example: Generating Logical Query Plan title StarsIn IN name birthdate LIKE %1960 starName MovieStar Slide 10 Example: Logical Query Plan title starName=name StarsIn name birthdate LIKE %1960 MovieStar Slide 11 Example: Improved Logical Query Plan title starName=name StarsIn name birthdate LIKE %1960 MovieStar Question: Push project to StarsIn? Slide 12 Example: Estimate Result Sizes Need expected size StarsIn MovieStar Slide 13 Physical Query Plan Selecting algorithms to implement each of the operators of the logical query plan. Selecting an order of execution for these operators. Slide 14 Example: One Physical Plan Parameters: join order, memory size, project attributes,... Hash join SEQ scanindex scan Parameters: Select Condition,... StarsInMovieStar Slide 15 Example: Estimate costs L.Q.P P1 P2 . Pn C1 C2 . Cn Pick best! Slide 16 Query Execution To execute the operations of physical query plan Major approaches: scanning, hashing, sorting, and indexing Different costs and structures due to the amount of available main memory Slide 17 Textbook outline Chapter 6 6.1Algebra for queries [bags vs sets] - Select, project, join, .[project list a,a+b->x, ] - Duplicate elimination, grouping, sorting 6.2Physical operators - Scan,sort, 6.3-6.10 Implementing operators + estimating their cost Slide 18 An Algebra for Queries Relational Algebra SQL Operators Union UNION Intersection INTERSECTION Difference - EXCEPT Selection WHERE Projection SELECT Product FROM Join JOIN Duplicate elimination DISTINCT Grouping GROUP BY Sorting ORDER BY Slide 19 Differences Between SQL Relations and Set Relations are bags Relations have schemas Slide 20 Union, Intersection, and Difference 1.For R U S, a tuple t is in the result as many times as the number of times it is in R plus the number of times it is in S 2.For R S, a tuple t is in the result the minimum of the number times it is in R and S 3.For R S, a tuple t is in the result the number of times it is in R minus it is in S, but not fewer than zero times For example Suppose R={A, B, B} S={C, A, B, C} R U S={A, A, B, B, B,C, C} R S={A, B} R S={B} Slide 21 The Selector Operator Selection c (R), produces the bag of those tuples in R that satisfy the condition C Condition C may involve 1. Arithmetic or string operators on constants and/or attributes 2. Comparison between terms constructed in 1 3. Boolean connectives AND, OR, and Not applied to terms constructed in 2. For example R a b a1 (R): a b b3 AND a+b6 (R) 0 1 2 3 a b 2 3 4 5 4 5 4 5 2 3 2 3 Slide 22 The Projection Operator L(R) is the projection of R onto the list L L can have the following kinds of elements 1. A single attribute of R 2. An expression x y, where x and y are names for attributes In the result, the attribute x of R is renamed with y; 3. An expression E z, where E is an expression, z is a new name for the attribute that results from calculation implied by E For example R a b c a,b+cx (R) a x 0 1 2 0 3 3 4 5 3 9 Slide 23 The Product of Relations The product RXS is a relation whose schema consists of attributes of R and the attributes of S. For example R: a b S: b c 0 1 1 4 2 3 1 4 2 3 2 5 aR.b S.b c 000222222000222222 1 3 1 2 1 2 1 2 4 5 4 5 4 5 R S: Slide 24 Joins Natural Join: R S = L( c(RXS) ), where : 1.C is a condition that equates all pairs of attributes of R and S that have the same name. 2.L is a list of all the attributes of R and S, except that one copy of each pair of equated attributes is omitted. For example R: a b S: b c 0 1 1 4 2 3 1 4 2 3 2 5 a b c 0 1 4 Slide 25 The theta-join: R S = c(RXS), c If C is x = y, x from R and y from S, we call it equijoin For example R: a b S: b c 0 1 1 4 2 3 1 4 2 3 2 5 a+R.b For StarsIn (title, year, starName), to find, for each star who has appeared in at least three movies, the earliest year in which they appeared. SQL: SELECT starNme, MIN(year) AS minYear FROM StarsIn GROUP BY starName HAVING COUNT(title)>=3; starName,minYear ctTitle>=3 starName, MIN(year) minYear, COUNT(title) ctTitle StarsIn Slide 29 The Sorting Operator L (R) denotes relation R is sorted in the order indicated by L For example, for R(a, b, c), c,b (R) orders the tuples of R by their value of c, and tuples with the same c-value are ordered by their b value. Tuples that agree on both b and c may be ordered arbitrarily. Slide 30 Logical Query Plan For example, there are two tables MovieStar(name,addr,gender,birthdate) StarsIn(title,year,starName) Convert SQL statements into logical query plan SELECT title,birthdate FROM MovieStar,StarIn WHERE year = 1996 AND gender=F AND starName=name; title,birthdate title,birthdate year=1996 AND gender=FAND starName=name starName=name gender= F year=1996 MovieStar StarIn MovieStar StarsIn Slide 31 Physical-Query-Plan Scanning a table Executing algebraic operators Passing data among operators Slide 32 Scanning Tables Tablescan Indexscan Slide 33 Sorting While Scanning Tables (1) scanning index (2) main-memory sorting algorithm (3) multiway merge sort Slide 34 The Model of Computation for Physical Operators Use the number of disk I/O as the measure of cost for an operation Assume the arguments of any operator on disk and the result left in memory Pipeline the result of one operator to another operator Slide 35 Parameters for Measuring Costs M: the number of main-memory buffers available to an execution of a particular operator B(R): the number of blocks that are needed to hold all the tuples of R T(R): the number of tuples in R V(R,a): the number of distinct values of the column for a in R Slide 36 I/O Cost for Sort-Scan Operator Fits in main memory Does not fit in main memory ClusteredB3B Not ClusteredTT+2B Slide 37 Iterators for Implementation of Physical Operators Open (R) { b:= the first block of R; t:= the first tuple of block b; Found:= TRUE; } GetNext(R) { IF ( t is past the last tuple on block b ) { increment b to the next block; IF ( there is no next block) { FOUND:=FALSE;RETURN;} ELSE t:= first tuple on block b; oldt:= t; increment t to the next tuple of b; RETURN oldt } Close(R) { } Slide 38 Building a union iterator from its components Open (R,S) { R.Open(); CurRel:=R; } GetNext (R,S) { IF (CurRel=R) { t:=R.GetNext(); IF (FOUND) RETURN t; ELSE S.Open(); CurRel:=S; } } RETURN S.GetNext() } Close(R,S){ R.Close() ; S.Close(); } Slide 39 Implementation Algorithms for Algebraic Operators 1.Sorting-based methods 2.Hash-based methods 3.Index-based methods Three degrees of difficulty and cost: one-pass, two-passes, three or more passes Slide 40 One-Pass Algorithms for Database Operators 1.Tuple-at-a-time, unary operations 2.Full-relation, unary operations 3.Full-relation, binary operations Slide 41 R Input bufferOutput buffer Unary operate COST If R is clustered disk I/O is B; If R is not clustered disk I/O is T. One-Pass Algorithms for Tuple-at-a-Time Operations Slide 42 One-Pass Algorithms for Unary, Full-Relation Operations Duplication Elimination(R) Read a block from disk for each tuple to make a decision 1. If it has not been seen before, copy it to the output 2. Otherwise, ignore it In order to support this decision keep one copy in memory B((R)) M-1 B((R)) M R Input buffer Seen before? M-1 buffers Output buffer Main Memory: Hash Table or Binary Search Tree Slide 43 One-Pass Algorithm for Binary Operations Set Union Set Intersection Set Difference Bag Intersection Bag Difference Product Natural Join S R Output B(S) Two-Pass Algorithms Based on Sorting Two-pass algorithm for sorting B(R) (B(R)>M): Read M blocks of R into main memory Sort M blocks, using an efficient sorting algorithm Write the sorted list into M blocks of disk. Main memory disk 1.Read data, process in some way 2.Write out to disk, again 3.Reread from disk to complete the operation First pass Slide 52 Duplicate Elimination Using Sorting R M buffersSame M buffers Read Write Sorting Sorted sublists of R Reread Choose the first unconsidered tuple t in sorted order Output the first copy of t and remove the other copies of t. All of sorted sublists are exhausted. Slide 53 Example 6.15: assume M=3, and only two tuples fit on a block. The relation R consists of 17 tuples: ( 2,5,2,1,2,2,) ( 4,5,4,3,4,2,) (1,5,2,1,3) R1 R2 R3 sublistIn memoryWaiting on disk R1 1 2 2 2, 2 5 R2 2 3 4 4, 4 5 R3 1 1 2 3, 5 sublistIn memoryWaiting on disk R1 2 2 2, 2 5 R2 2 3 4 4, 4 5 R3 2 3 5 sublistIn memoryWaiting on disk R1 5 R2 3 4 4, 4 5 R3 3 5 sublistIn memoryWaiting on disk R1 5 R2 4 4 4 5 R3 5 Slide 54 Analysis: The total cost of this algorithm is 3B(R). to compute (R) with the two-pass algorithm requires only B(R) blocks of memory since BM(M-1)M 1.B(R) to read each block of R when creating the sorted sublists. 2.B(R) to write each of the sor