Query Optimization (CB Chapter 23.1-23.3) CPSC 356 Database Ellen Walker Hiram College (Includes figures from Database Systems: An Application Oriented Approach 2ed by Kifer, Bernstein & Lewis, © Addison Wesley 2005)

SQL

• Widely used (only?) standard query language for relational databases

• Once SEQUEL (Structured English QUEry Language), now Structured Query Language

• Objectives
  – Easy to learn, easy to use
  – Create and modify the database, and query it

• DDL (Data Definition Language) defines the database structure

• DML (Data Manipulation Language) manipulates the data

SQL is Declarative, RA is Procedural

• SQL Statements describe the desired results, but do not specify a sequence of operations to get those results

• Relational Algebra expressions describe a specific sequence of operations to perform

• To evaluate a SQL statement, the DBMS must first translate it into (a computer implementation of) RA!

Query Processing

• Query Processing is the translation of SQL into RA-like nested function calls

• One query can have multiple translations

SELECT roomNo, hotelName FROM Room, Hotel WHERE hotelName = 'Savoy' AND Room.hotelNo = Hotel.hotelNo;

$\pi_{\text{roomNo, hotelName}}(\sigma_{\text{hotelName='Savoy'} \,\wedge\, \text{Room.hotelNo=Hotel.hotelNo}}(\text{Room} \times \text{Hotel}))$

$\pi_{\text{roomNo, hotelName}}(\sigma_{\text{Room.hotelNo=Hotel.hotelNo}}(\sigma_{\text{hotelName='Savoy'}}(\text{Hotel}) \times \text{Room}))$
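A minimal Python sketch of the two translations as nested function calls. The toy rows and the helper names (`select`, `project`, `product`) are assumptions made for illustration, not part of any DBMS.

```python
# Toy relations (hypothetical data); clashing attribute names are qualified.
hotel = [
    {"Hotel.hotelNo": 1, "hotelName": "Savoy"},
    {"Hotel.hotelNo": 2, "hotelName": "Ritz"},
]
room = [
    {"roomNo": 101, "Room.hotelNo": 1},
    {"roomNo": 102, "Room.hotelNo": 1},
    {"roomNo": 201, "Room.hotelNo": 2},
]

def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """Projection: keep only attrs, dropping duplicate rows (sorted for display)."""
    return sorted({tuple(t[a] for a in attrs) for t in rel})

def product(r, s):
    """Cartesian product: every pairing of a tuple of r with a tuple of s."""
    return [{**a, **b} for a in r for b in s]

def is_savoy(t):
    return t["hotelName"] == "Savoy"

def same_hotel(t):
    return t["Room.hotelNo"] == t["Hotel.hotelNo"]

# First translation: one big select over the full product.
plan1 = project(["roomNo", "hotelName"],
                select(lambda t: is_savoy(t) and same_hotel(t),
                       product(room, hotel)))

# Second translation: select Savoy hotels first, then product, then match.
plan2 = project(["roomNo", "hotelName"],
                select(same_hotel,
                       product(select(is_savoy, hotel), room)))

print(plan1 == plan2, plan1)
```

Both nestings return the same rows; they differ only in how much intermediate data they build along the way.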

Query Optimization

• Choose the translation that minimizes resource use (time, space)

• The second translation below is better. (Why?)

SELECT roomNo, hotelName FROM Room, Hotel WHERE hotelName = 'Savoy' AND Room.hotelNo = Hotel.hotelNo;

$\pi_{\text{roomNo, hotelName}}(\sigma_{\text{hotelName='Savoy'} \,\wedge\, \text{Room.hotelNo=Hotel.hotelNo}}(\text{Room} \times \text{Hotel}))$

$\pi_{\text{roomNo, hotelName}}(\sigma_{\text{hotelName='Savoy'}}(\text{Hotel}) \bowtie_{\text{Room.hotelNo=Hotel.hotelNo}} \text{Room})$
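One way to approach the "Why?" is to count how many tuple pairs each translation has to build or examine. The sketch below assumes 100 hotels with 10 rooms each and exactly one hotel named Savoy; these sizes and the function names are made up for illustration.

```python
# Hypothetical table sizes: 100 hotels, 1000 rooms (10 per hotel), one "Savoy".
hotels = [{"hotelNo": h, "hotelName": "Savoy" if h == 0 else f"Hotel{h}"}
          for h in range(100)]
rooms = [{"roomNo": r, "hotelNo": r % 100} for r in range(1000)]

def product_then_select():
    """First translation: materialize Room x Hotel, then filter it."""
    pairs = [(h, r) for h in hotels for r in rooms]
    result = [(r["roomNo"], h["hotelName"]) for h, r in pairs
              if h["hotelName"] == "Savoy" and r["hotelNo"] == h["hotelNo"]]
    return len(pairs), result

def select_then_join():
    """Second translation: filter Hotel first, then join with Room."""
    savoy = [h for h in hotels if h["hotelName"] == "Savoy"]
    examined = 0
    result = []
    for h in savoy:
        for r in rooms:
            examined += 1
            if r["hotelNo"] == h["hotelNo"]:
                result.append((r["roomNo"], h["hotelName"]))
    return examined, result

pairs_plan1, rows_plan1 = product_then_select()
pairs_plan2, rows_plan2 = select_then_join()
print(pairs_plan1, pairs_plan2, sorted(rows_plan1) == sorted(rows_plan2))
```

Under these assumptions the first translation builds 100,000 pairs while the second examines 1,000, and both return the same 10 rooms.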

Query Processing

User → SQL → Decomposition → Relational Algebra → Optimization → Efficient Relational Algebra → Processing Engine → Result (table)

More Detail of Query Processing

Steps in Query Processing

• Query Decomposition (create relational algebra expression)

• Query Optimization (create execution plan)

• Code Generation

• Query Execution

Parts of Optimization

• Query Plan Generator
  – Comes up with viable relational algebra expressions to improve the initial naïve one

• Cost Estimator
  – Estimates the cost (time / space) of each plan

• Optimization
  – Choosing the plan with the lowest cost, or at least one that is “reasonably cheap”
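The three parts can be sketched as three small Python functions. This is a toy under stated assumptions: the plan names, the table sizes, and the cost measure (estimated intermediate tuples) are all hypothetical, and a real optimizer considers far more plans and cost factors.

```python
# Assumed statistics for the Savoy query (hypothetical numbers).
N_HOTEL = 100    # tuples in Hotel
N_ROOM = 1000    # tuples in Room
N_SAVOY = 1      # hotels expected to be named 'Savoy'

def generate_plans():
    """Query plan generator: list candidate (equivalent) plans by name."""
    return ["product_then_select", "select_then_product"]

def estimate_cost(plan):
    """Cost estimator: estimated number of intermediate tuples produced."""
    if plan == "product_then_select":
        return N_HOTEL * N_ROOM
    if plan == "select_then_product":
        return N_SAVOY + N_SAVOY * N_ROOM
    raise ValueError(f"unknown plan: {plan}")

def optimize():
    """Optimization: choose the candidate with the lowest estimated cost."""
    return min(generate_plans(), key=estimate_cost)

best = optimize()
print(best, estimate_cost(best))
```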

Query Decomposition

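• Check Syntax

• Build a relational algebra tree

(Figure: relational algebra tree for the Savoy query, with leaves Hotel and Room and operator nodes ×, $\sigma_{\text{hotelName='Savoy'}}$ and $\sigma_{\text{Room.hotelNo=Hotel.hotelNo}}$)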
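A relational algebra tree can be held in a small recursive data structure. The sketch below builds one possible tree, the naive translation of the Savoy query (a single select over the product); the class names and the `show` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Relation:
    name: str

@dataclass
class Select:
    condition: str
    child: "Node"

@dataclass
class Project:
    attrs: List[str]
    child: "Node"

@dataclass
class Product:
    left: "Node"
    right: "Node"

Node = Union[Relation, Select, Project, Product]

def show(node, depth=0):
    """Return the tree as indented lines, root first."""
    pad = "  " * depth
    if isinstance(node, Relation):
        return [pad + node.name]
    if isinstance(node, Select):
        return [pad + f"select[{node.condition}]"] + show(node.child, depth + 1)
    if isinstance(node, Project):
        return [pad + f"project[{', '.join(node.attrs)}]"] + show(node.child, depth + 1)
    return [pad + "product"] + show(node.left, depth + 1) + show(node.right, depth + 1)

# Naive tree: project over one big select over Room x Hotel.
tree = Project(
    ["roomNo", "hotelName"],
    Select("hotelName='Savoy' AND Room.hotelNo=Hotel.hotelNo",
           Product(Relation("Room"), Relation("Hotel"))),
)
print("\n".join(show(tree)))
```

Query transformations then become functions that take one such tree and return an equivalent one.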

Query Transformation (Selection)

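• Select with multiple AND conditions can be a sequence of selects:

$\sigma_{\text{hotelName='Savoy'} \,\wedge\, \text{Room.hotelNo=Hotel.hotelNo}}(\ldots) = \sigma_{\text{hotelName='Savoy'}}(\sigma_{\text{Room.hotelNo=Hotel.hotelNo}}(\ldots))$

• Order of Select operations doesn’t matter:

$\sigma_{\text{hotelName='Savoy'}}(\sigma_{\text{Room.hotelNo=Hotel.hotelNo}}(\ldots)) = \sigma_{\text{Room.hotelNo=Hotel.hotelNo}}(\sigma_{\text{hotelName='Savoy'}}(\ldots))$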
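Both selection rules can be checked on a toy relation. The rows below stand in for a few tuples of Room × Hotel and are hypothetical, as is the `select` helper.

```python
# A few tuples of Room x Hotel (hypothetical data, qualified attribute names).
rows = [
    {"hotelName": "Savoy", "Room.hotelNo": 1, "Hotel.hotelNo": 1},
    {"hotelName": "Savoy", "Room.hotelNo": 2, "Hotel.hotelNo": 1},
    {"hotelName": "Ritz", "Room.hotelNo": 2, "Hotel.hotelNo": 2},
]

def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def is_savoy(t):
    return t["hotelName"] == "Savoy"

def same_hotel(t):
    return t["Room.hotelNo"] == t["Hotel.hotelNo"]

combined = select(lambda t: is_savoy(t) and same_hotel(t), rows)  # one AND select
cascade_a = select(is_savoy, select(same_hotel, rows))            # split in two
cascade_b = select(same_hotel, select(is_savoy, rows))            # other order

print(combined == cascade_a == cascade_b)
```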

Query Transformation (Projection)

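• Extra intermediate projections don’t matter:

$\pi_{\text{Name}}(\pi_{\text{Name, Status}}(\text{Student})) = \pi_{\text{Name}}(\text{Student})$

• Order of select and project doesn’t matter, as long as the selection condition uses only projected attributes:

$\sigma_{\text{Status='SR'}}(\pi_{\text{Status}}(\text{Student})) = \pi_{\text{Status}}(\sigma_{\text{Status='SR'}}(\text{Student}))$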
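The two projection rules, checked on a small Student relation. The rows and helper names are hypothetical; `project` removes duplicates, as the relational algebra operator does.

```python
# Hypothetical Student relation.
students = [
    {"Id": 1, "Name": "Ann", "Status": "SR"},
    {"Id": 2, "Name": "Bob", "Status": "JR"},
    {"Id": 3, "Name": "Cy", "Status": "SR"},
]

def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """Projection: keep only attrs, dropping duplicate rows."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

def is_senior(t):
    return t["Status"] == "SR"

# Rule 1: an intermediate projection that keeps Name changes nothing.
names_direct = project(["Name"], students)
names_cascade = project(["Name"], project(["Name", "Status"], students))

# Rule 2: select and project commute when the condition only uses Status.
select_after = select(is_senior, project(["Status"], students))
select_before = project(["Status"], select(is_senior, students))

print(names_direct == names_cascade, select_after == select_before)
```

If the projection dropped Status, the select could no longer be applied afterwards, which is why the order only commutes when the condition's attributes survive the projection.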

Query Transformation (Join)

• Push Select through Join
  – Replace a select on a cross-product with a join
  – Joins can be implemented at the lowest level more efficiently than “materialized cross-product”

• Push Select through Product
  – If the attributes of the condition all belong to one table of the join, put the select on only that table, so a smaller table is joined
  – Joins on smaller tables are faster than on larger ones

• More rules pp. 640-642

Pushing Select Example

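• Find all seniors that take CPSC356

$\sigma_{\text{stu\_id=id} \,\wedge\, \text{crs='CPSC356'}}(\text{Student} \times \text{Transcript})$

• Separate the selects

$\sigma_{\text{stu\_id=id}}(\sigma_{\text{crs='CPSC356'}}(\text{Student} \times \text{Transcript}))$

• Push the inner select

$\sigma_{\text{stu\_id=id}}(\text{Student} \times \sigma_{\text{crs='CPSC356'}}(\text{Transcript}))$

• Replace Select/Product by Join

$\text{Student} \bowtie_{\text{stu\_id=id}} \sigma_{\text{crs='CPSC356'}}(\text{Transcript})$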
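The four steps can be replayed on toy data to confirm that each rewrite returns the same rows while shrinking the intermediate result. The rows and helper names are hypothetical, and `join` is a plain nested-loop join chosen for brevity.

```python
# Hypothetical Student and Transcript relations.
students = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
transcript = [
    {"stu_id": 1, "crs": "CPSC356"},
    {"stu_id": 1, "crs": "MATH101"},
    {"stu_id": 2, "crs": "CPSC356"},
    {"stu_id": 3, "crs": "ENGL200"},
]

def select(pred, rel):
    return [t for t in rel if pred(t)]

def product(r, s):
    return [{**a, **b} for a in r for b in s]

def join(r, s, pred):
    """Nested-loop join: test each pair without storing the full product."""
    return [{**a, **b} for a in r for b in s if pred({**a, **b})]

def ids_match(t):
    return t["stu_id"] == t["id"]

def is_356(t):
    return t["crs"] == "CPSC356"

step1 = select(lambda t: ids_match(t) and is_356(t), product(students, transcript))
step2 = select(ids_match, select(is_356, product(students, transcript)))
step3 = select(ids_match, product(students, select(is_356, transcript)))
step4 = join(students, select(is_356, transcript), ids_match)

full_product = len(product(students, transcript))                     # before the push
pushed_product = len(product(students, select(is_356, transcript)))   # after the push
print([t["name"] for t in step4], full_product, pushed_product)
```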

Query Processing Example

Execution Plans

• Add specific algorithms to each relational algebra step

• Determine whether/how indices will be used

• Add pipelining (not storing intermediate data) where possible
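Pipelining can be illustrated with Python generators: each operator hands one row at a time to the next, so no intermediate table is stored. The operator names and the `scanned` log are assumptions for this sketch; `project` here does not remove duplicates, since that would require remembering earlier rows.

```python
scanned = []  # records every roomNo the scan has read so far

def scan(table):
    """Read the table one row at a time."""
    for row in table:
        scanned.append(row["roomNo"])
        yield row

def select(pred, rows):
    """Pass along only the rows that satisfy pred."""
    for row in rows:
        if pred(row):
            yield row

def project(attrs, rows):
    """Keep only attrs of each row (no duplicate removal)."""
    for row in rows:
        yield tuple(row[a] for a in attrs)

rooms = [
    {"roomNo": 101, "hotelNo": 1},
    {"roomNo": 102, "hotelNo": 1},
    {"roomNo": 201, "hotelNo": 2},
]

pipeline = project(["roomNo"], select(lambda r: r["hotelNo"] == 1, scan(rooms)))

first = next(pipeline)            # pulls a single row through all three operators
scanned_after_first = len(scanned)
rest = list(pipeline)             # drains the remaining rows

print(first, scanned_after_first, rest)
```

The first result is available after reading only one row of the table, which is the point of pipelining.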

Choosing Transformations

• Estimate cost of each tree based on
  – Table sizes
  – Numbers of distinct attribute values
  – Average number of tuples for selection condition
  – Methods used for join (e.g. indexed, hashed)

• Choose lowest cost tree

• Because estimates aren’t perfect, the absolute best tree might not be chosen!
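A rough sketch of how table sizes and distinct-value counts turn into estimates. It uses two common textbook approximations that assume uniformly distributed values: an equality selection returns about nTuples / nDistinct rows, and an equijoin returns about |R|·|S| / max(distinct values on each side). The statistics below are hypothetical, and "cost" is simplified to the number of intermediate tuples.

```python
# Hypothetical catalog statistics.
N_HOTEL, N_ROOM = 100, 1000
DISTINCT_HOTEL_NAMES = 100        # distinct hotelName values in Hotel
DISTINCT_ROOM_HOTELNO = 100       # distinct hotelNo values in Room

def est_equality_selection(n_tuples, n_distinct):
    """Expected rows for attr = constant, assuming uniform values."""
    return n_tuples / n_distinct

def est_equijoin(n_r, n_s, distinct_r, distinct_s):
    """Expected rows for R join S on one attribute, assuming uniform values."""
    return n_r * n_s / max(distinct_r, distinct_s)

# Tree A: materialize Room x Hotel, then select.
cost_a = N_HOTEL * N_ROOM

# Tree B: select the Savoy hotels, then join the survivors with Room.
savoy = est_equality_selection(N_HOTEL, DISTINCT_HOTEL_NAMES)
joined = est_equijoin(savoy, N_ROOM, min(savoy, N_HOTEL), DISTINCT_ROOM_HOTELNO)
cost_b = savoy + joined

print(cost_a, cost_b)
```

If the real data are skewed (say, many hotels share one name), these estimates are off, which is how a non-optimal tree can end up being chosen.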

Heuristics (Rules of Thumb)

• Perform Selection as early as possible
  – Unless doing it later lets you use an index

• Combine × (Cartesian product) and Selection into a join operation

• Execute most restrictive Selections first

• Perform Projection as early as possible

• Compute common expressions once
  – Creating a view is a way to do this!

Consequences for SQL Programmer

• Using RA operations in SQL (e.g. explicit Join) constrains optimization
  – Good when “programmer knows best”
  – Bad when programmer prevents a better optimization

• Intermediate tables (i.e. views) can constrain optimization
  – Use views to compute common subexpressions

• When performance is substandard, tweaking the SQL can help! (Remember the heuristics).
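When tweaking SQL, it helps to look at the plan the DBMS actually chose. The sketch below uses SQLite through Python's `sqlite3` module purely as a stand-in (the slides do not name a DBMS); the tables and rows are hypothetical, and the text of `EXPLAIN QUERY PLAN` output varies by SQLite version, so it is printed rather than interpreted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hotel (hotelNo INTEGER PRIMARY KEY, hotelName TEXT);
CREATE TABLE Room (roomNo INTEGER, hotelNo INTEGER REFERENCES Hotel(hotelNo));
INSERT INTO Hotel VALUES (1, 'Savoy'), (2, 'Ritz');
INSERT INTO Room VALUES (101, 1), (102, 1), (201, 2);
""")

# Same query written with an implicit product and with an explicit join.
implicit = ("SELECT roomNo, hotelName FROM Room, Hotel "
            "WHERE hotelName = 'Savoy' AND Room.hotelNo = Hotel.hotelNo")
explicit = ("SELECT roomNo, hotelName FROM Room JOIN Hotel "
            "ON Room.hotelNo = Hotel.hotelNo WHERE hotelName = 'Savoy'")

rows_implicit = sorted(conn.execute(implicit).fetchall())
rows_explicit = sorted(conn.execute(explicit).fetchall())

# Ask SQLite which plan it picked for each form.
plans = {}
for label, sql in (("implicit", implicit), ("explicit", explicit)):
    plans[label] = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(label, plans[label])
```

Whether the two forms end up with different plans depends on the DBMS and its optimizer; the results must match either way.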