Upload
donald-chandler
View
215
Download
0
Embed Size (px)
Citation preview
Query Optimization(CB Chapter 23.1-23.3)
CPSC 356 Database
Ellen Walker
Hiram College
(Includes figures from Database Systems: An Application Oriented Approach 2ed by Kifer, Bernstein & Lewis, © Addison Wesley 2005)
SQL
• Widely used (only?) standard query language for relational databases
• Once SEQUEL (Structured English QUEry Language), now Structured Query Language
• Objectives– Easy to learn, easy to use– Create and modify the database and query from it
• DDL defines• DML manipulates
SQL is Declarative, RA is Procedural
• SQL Statements describe the desired results, but do not specify a sequence of operations to get those results
• Relational Algebra expressions describe a specific sequence of operations to perform
• To evaluate a SQL statement, it needs to be translated into (a computer implementation of) RA first!
Query Processing
• Query Processing is the translation of SQL into RA-like nested function calls
• One query can have multiple translations
SELECT roomNo, HotelName FROM Room, Hotel WHERE HotelName = ‘Savoy’ and Room.hotelNo=Hotel.hotelNo;
roomNo,HotelName ( hotelName=Savoy and Room.hotelNo=Hotel.hotelNo (Room x Hotel) )
roomNo,HotelName ( Room.hotelNo=Hotel.hotelNo ( (hotelName=Savoy (Hotel)) x Room) )
Query Optimization
• Choose the translation that minimizes resource use (time, space)
• The second translation below is better. (Why?)
SELECT roomNo, HotelName FROM Room, Hotel WHERE HotelName = ‘Savoy’ and Room.hotelNo=Hotel.hotelNo;
roomNo,HotelName ( hotelName=Savoy and Room.hotelNo=Hotel.hotelNo (Room x Hotel) )
roomNo,HotelName ( (hotelName=Savoy (Hotel)) Room.hotelNo=Hotel.hotelNo Room)
Query Processing
User
SQLDecom-position
RelationalAlgebra
Optimi-zation
ProcessingEngine
Result
(table) Efficient Rel.
Algebra
More Detail of Query Processing
Steps in Query Processing
• Query Decomposition (create relational algebra expression)
• Query Optimization (create execution plan)• Code Generation
• Query Execution
Parts of Optimization
• Query Plan Generator– Comes up with viable relational algebra
expressions to improve the initial naïve one
• Cost Estimator– Estimates the cost (time / space) of each plan
• Optimization– Choosing the plan with the lowest cost, or at least
“reasonably cheap”
Query Decomposition
• Check Syntax• Build a relational algebra tree
Hotel Room
hotelName=Savoy
Room.hotelNo=Hotel.hotelNo
X
Query Transformation (Selection)
• Select with multiple AND conditions can be sequence of selectshotelName=Savoy and Room.hotelNo=Hotel.hotelNo( …)
= hotelName=Savoy (Room.hotelNo=Hotel.hotelNo ( …))
• Order of Select operations doesn’t matter= Room.hotelNo=Hotel.hotelNo ( hotelName=Savoy ( …))
Query Transformation (Projection)
• Extra intermediate projections don’t matterName(Name, Status(Student))= Name (Student)
• Order of select and project doesn’t matterStatus=‘SR’(Status(Student)) =
Status(Status=‘SR’(Student))
Query Transformation (Join)
• Push Select through Join– Replace a select on a cross-product with a join– Joins can be implemented at the lowest level more efficiently
than “materialized cross-product”
• Push Select through Product– If the attributes of the condition all belong to one table of the
join, put the select on only the one table, so a smaller table is joined
– Joins on smaller tables are faster than on larger ones
• More rules pp. 640-642
Pushing Select Example
• Find all seniors that take CPSC356
stu_id=id & crs=‘CPSC356’(Student x Transcript)
• Separate the selects
stu_id=id ( crs=‘CPSC356’(Student x Transcript))
• Push the inner select
stu_id=id (Student x (crs=‘CPSC356’(Transcript)))
• Replace Select/Product by JoinStudent |x| stu_id=id (crs=‘CPSC356’(Transcript)))
Query Processing Example
Query Processing Example
Execution Plans
• Add specific algorithms to each relational algebra step
• Determine whether/how indices will be used• Add pipelining (not storing intermediate data)
where possible
Choosing Transformations
• Estimate cost of each tree based on– Table sizes– Numbers of distinct attribute values– Average number of tuples for selection condition– Methods used for join (e.g. indexed, hashed)
• Choose lowest cost tree• Because estimates aren’t perfect, the
absolute best tree might not be chosen!
Heuristics (Rules of Thumb)
• Perform Selection as early as possible– Unless doing it later lets you use an index
• Combine X and Selection into join operation• Execute most restrictive Selections first• Perform Projection as early as possible• Compute common expressions once
– Creating a view is a way to do this!
Consequences for SQL Programmer
• Using RA operations in SQL (e.g. explicit Join) constrains optimization– Good when “programmer knows best”– Bad when programmer prevents a better optimization
• Intermediate tables (i.e. views) can constrain optimization– Use views to compute common subexpressions
• When performance is substandard, tweaking the SQL can help! (Remember the heuristics).