Query Plan Interpretation

BID207 Understanding Sybase IQ Optimizer(or What is that Query Plan Telling Me?)

Lou StantonPrincipal System [email protected] / (301) 896-1733August 2003

Query Processing and Query Plans

In this presentation - Show you how Sybase IQ Processes Queries Show methods for viewing Query Plans Understanding and Interpreting Query Plans How to influence Query Execution Discuss general Performance and Tuning tips

This presentation is Sybase IQ 12.5 specific Query plans were enhanced to provide more detail New features in v12.5 will be addressed

By the End of this Presentation You Will Know …

How to obtain and interpret Query Plans

How much (or little) the optimizer knows about your data

How important indexes are to Query Performance

The importance of table and database design

Why are Query Plans Important?

You suspect (or know) a query runs poorly Query plans can help you investigate the problem

You’re not sure you have the best indexes for a query Query plans tell you what indexes are used in a query

Query Execution Phases in Sybase

Upon submission of a query Syntax and Permissions are Checked

• Performed by the ASA front-end Query is Parsed

• Broken down into object codes Query is Optimized

• Most efficient execution method is determined• Query Plan is created

Query is executed Resources are cleaned up

Sybase IQ Query Execution

Server Front EndShared with ASAnywhere (ASA)Handles ConnectionsParses Incoming StatementsCross-DB Decomposition (CIS)Security CheckingJava SupportStored Procedures

OptimizerPredicate InferencePredicate Selectivity EstimationJoin OptimizationGrouping Algorithm SelectionSubquery OptimizationIndex Access Selection

Run-Time EnginePrefetch ManagerPredicate ExecutionTuple (Row) ProjectionJoin ExecutionGrouping ExecutionSortingSubquery Execution

Query Plan

How do I See a Query Plan?

Query Plans are Generated - In the IQ Message File As HTML Pages (if requested) in a separate file

Set Option Commands used with Query Plans Usually set as Temporary Options Example:

Set Temporary Option Query_Detail = ‘On’;

v12.5 Query_Plan option is ‘ON’ by default Query Plan appears in the IQ Message File You may turn this option off, if desired

Database Options for Query Plans

Query_Plan = ‘On’ (default ‘On’) Provides a Basic Query Execution plan in the IQ Message File May not provide enough detail

Query_Detail = ‘On’ (default ‘Off’) More detailed information in a Query Plan (Recommended) Must be used with Query_Plan or Query_Plan_As_HTML

Query_Plan_As_HTML = ‘On’ (default ‘Off’) Creates Query Plan as an external HTML file

More Database Options for Query Plans

Query_Plan_After_Run = ‘On’ (default ‘Off’) Delays creating the query plan until query completes

Query_Timing = ‘On’ (default ‘Off’) Includes execution times at each stage of the query Must be used with Query_Plan_After_Run

Query_Name = ‘query_name’ (default ‘’) Prints name provided in the Query Plan or as part of the

file name (for HTML Query Plans)

NoExec = ‘On’ (default ‘Off’) Creates Query Plan but does not execute the query

HTML Query Plans

Since HTML plans are easier to read these will be used in this presentation Query_Detail will also be set ‘On’ for all examples

When you specify Query_Plan_As_HTML = ‘On’ File will be written in the directory with the Database File (.db) The File name that is produced -

• Prefaced with User Name, Date and Time– Also contains the “Query Name” if specified

• has a “.html” file extension and is opened with a browser

Simple Example of HTML Query Plan File

Set temporary option Query_Name =‘stanton’;

Select count(*) from Central_Fact_Table;

HTML file generated in Database File Directory

Query Plan (HTML) Output – Two Parts

1) Query Tree

2) Query Detail

HTML Query Plan – The Query Tree

Node TypeAnd Number

The Node Number (underlined) is a hot linkto the Nodes below in the Query Detail

Estimated RowsMoving Up the Tree

Query Tree and Nodes

Query Tree is a representation of the Query Plan Consists of Nodes representing execution plan steps Nodes are numbered sequentially and identified by type

• Some housekeeping nodes are not displayed so there may be gaps in the Node number sequence

Query Tree also displays the estimated number of rows flowing up the branches between nodes

The Query Tree is displayed inverted As you examine a Query Plan you will see the last node at the

top of the Query Tree – usually the Root Node Leaf Nodes are typically at the bottom of the tree (but will

appear throughout the tree if there are many tables)

A More Complex Query Tree

Leaf Nodes

Nodes - General

Header Shows the Node Number and Node Type

Child Node(s) (if any) Node(s) that ‘feed’ this node Hot link to the Child Node(Leaf Nodes rarely have children)

Estimated Result Rows Number of rows the optimizer thinks

will come out of this Node Actual Row Counts are shown with

Query_Plan_After_Run = ‘On’

Some Node Types (there are others)

Root Scrolling Cursor Store Filter Group By

Sort Hash Indexes

Order By Store Semi-join Filter Filler Leaf

Join Hash (HJ) Hash Pushdown (HJPD) Nested Loop (NL) Nested Loop Pushdown

(NLPD) Sort-Merge (SM) Sort-Merge Pushdown

(SMPD) Cartesian

Subquery Union All

Node Details

Some Nodes Contain a Wealth of Information Detail in other nodes is not as important

Nodes to Concentrate On - Root Leaf Join Group By Filter

Root Node (only with Query_Detail = ‘ON’)

Node Detail• Child Node• Query Name (if any)• User Name (login)• Temp Space Used• Num of Users in Server• Num of CPUs

• Database Options Set

• Output from this node

Examining the Root Node

Number of ‘Active’ users Many Active users will have an effect on query response Optimizer may choose a different Query Plan depending upon

#CPU, IQ Cache Size and Number of Active Users

Temp Space Usage Is the space required greater than the Temp Cache Size? It may indicate Temp Cache is paging to disk

Database Options may affect the Query Plan A user or dba may have set options influencing joins,

aggregation or server behavior

Leaf Nodes

A Leaf Node Represents an IQ Table Several Types of Leaf Nodes

(Regular) Leaf Aggregation Leaf Grouping Leaf Order Leaf

Leaf Nodes can have a wealth of information Table Row Count Query Predicate(s) - WHERE clause search arguments

• Indexes available on columns used in predicates• Estimated Selectivity of predicates• Usefulness of the predicate• Index used by the predicate

Output Columns from the table

Leaf Node Discussion

Portion of a table used in a query is a ‘found set’ The optimizer knows that Table ‘A’ has X rows If there are no predicates on columns in the table we are done

• We just need the column(s) from the table for the query• Rows are passed up the Query Tree to the next Node

If predicates exist Optimizer must determine the best order to execute them If only one predicate then there is no decision With multiple predicates each one is evaluated and assigned a

USEFULNESS Score• Usefulness Score is a scale of 0 to 10 (10 = Most Useful)

IQ Optimizer - Usefulness

Usefulness is used to rank the predicates for execution order Predicate with the highest value (score) is executed first Remaining predicates executed in descending order of

Usefulness

What determines Usefulness Score? How well a Query Predicate will reduce the found set (Table) Factors influencing Usefulness Score -

• Type of Query Predicate• Index(es) available for Column• Optimized FP indexes on Column• How fast the predicate can be executed

Leaf Node Discussion – Example Query

Single table (147 MM rows) - 5 Predicates What is the best order to execute these ?? What do you need to know to decide ??

select paid_amtfrom central_fact_tableWhere elig_beg_date >= '2000-01-01' andMEMBER_GENDER = 'M' andMEMBER_DOB < '1950-01-01' andcomplication = 'OUTPATIENT' andservice_rollup = 'Neoplasms'

Optimizer Search Criteria – NO indexes

Rules the optimizer uses (IQ 12.5) when NO indexes exist for a query predicate (partial list)

Percent of table Query Operator Returned (Estimated)Equality (=) 20%Open Range (>) 40%Between 40%Like (%) 20%Inter-column equality (t.a = t.b) 30%Inter-column comparison (t.a < t.b) 50%

(You will see these estimates in Query Plans)

Leaf Node Discussion–Indexes on Predicates

If a ‘useable’ index exists on a predicate it can provide more information about a column Depends on the type of predicate and the index type(s)

Beware of some functions used on a column as they will negate the use of the index! Substring( t.a, 5, 5 ) = ‘fghijk’ May result in a Column Scan The optimizer will use the default rule for rows returned

(20% of the rows for an Equality search)

Query Predicate Operators

Equality and Inequality ( = , != ) Includes IN and NOT IN lists

Ranges ( <, >, <=, >=, Between) Including NOT (!)

Like ( % ) or Not Like Contains

IS [NOT] NULL

Query Predicate Operators and IQ Indexes

Equality and Inequality ( = , != ) Includes IN and NOT IN lists

Ranges ( <, >, <=, >=, Between) Including NOT (!)

Like ( % ) or Not Like Contains

IS [NOT] NULL

LF and HG

DATE, HNG,LF and HG

FP and WD

Null Bit

The Enumerated Indexes – HG and LF

These indexes provide exact data distribution counts to the optimizer - The number of distinct values for a column and The number of rows for each value

When columns with these indexes are used in certain query predicates they provide the optimizer with exact row counts that will be found Equality and Inequality operators (also IN List, NOT IN List)

LF/HG indexes also help Range Searches by providing counts of distinct values within the Range

Other Indexes

The other indexes used in query predicates locate the rows to satisfy the query They do not provide counts of rows to the optimizer You will see them referenced in Leaf Nodes

If Enumerated indexes or Optimized FP indexes are not available then Optimizer can only guess the number of rows that satisfy the predicate Optimizer will use the defaults (discussed earlier) Later you will see how to help the optimizer avoid bad guesses

Query Example – Leaf Node

Back to our Query …

select paid_amt

from central_fact_table

Where

elig_beg_date >= '2000-01-01' and

MEMBER_GENDER = 'M' and

MEMBER_DOB < '1950-01-01' and

complication = 'OUTPATIENT' and

service_rollup = 'Neoplasms'

The Leaf Node for This Query

This is a largeLeaf Node !(about 2 pages)

We will examineit in parts

Leaf Node (Part 1) Row Information

(Information here is hidden for clarity)

Estimated Result Set

Original Table Size

Leaf Node (Part 2) – Predicate Evaluation

Each predicate for the table is evaluated If an HG/LF index exists it provides Selectivity Statistics

(number rows returned as a Percentage of the Table) Usefulness Score is assigned based on found set size

The smaller the set, the more useful the predicate

.0001 % of the table is returned

That is very useful!!

Each Predicate is Evaluated and Ranked

Selectivity

Usefulness

Index Used

Usefulness For Each Predicate Assigned

Predicates are ranked by Usefulness

Query Predicate

==========================================

service_rollup = 'Neoplasms' .001 9.99

MEMBER_GENDER = 'M' .389 9.61

elig_beg_date >= '2000-01-01' .764 6.23

MEMBER_DOB < '1950-01-01' .389 5.61

complication = 'OUTPATIENT' .200 2.80

Selectivity Usefulness

This looks like a ‘Guess’ – 20% for an Equality

Est. Row Count from the Table is Derived from Selectivity Estimates Applied to all the Predicates

Estimates are applied to each subsequent predicate to determine the table’s Found Set Condition 1: 149,178,536 * 0.001995 = 297,587 rows Condition 2: 297,587 * 0.389099 = 115,791 rows Condition 3: 115,791 * 0.764858 = 88,564 rows Condition 4: 88,564 * 0.389793 = 34,521 rows Condition 5: 34,521 * .2000000 = 6,904 rows

Optimized FP Indexes (1 and 2 byte FPs)

Play important roles in 12.5 Engine Provide distinct counts in the absence of LF/HG index Help with Range Searches, LIKE predicates, push-down join

conditions, and predicates containing an expression on a column such as - SUBSTR(t.x, 1, 2) = 'TE'

Move data up the query tree faster (fewer bytes to move)

12.5 Query Plans display usage of these FPs

12.5 has new Database Option to “auto-create” Optimized FP indexes for all columns MINIMIZE_STORAGE (default OFF)

Minimize_Storage Option

Late addition to 12.5 Release Option description did not make first printing of any IQ Docs Look in the Release Bulletin for 12.5 for details

When set ‘ON’ has the affect of IQ Unique(255) for all columns in tables you create As data is loaded column FP index will ‘roll over’ to 2 byte and

then Flat FP index as necessary Highly recommended for all tables < 1000 columns

After Upgrading to IQ 12.5 Set Option ON and create a new table Load from old table into new table using INSERT Drop old table and rename new table

Optimized FP Indexes – Changes in 12.5

Optimizer no longer considers IQ Unique() value in Create Table for approx distinct counts of a column 12.4.x optimizer looked at the value in absence of HG/LF index That value could have changed over time and been way wrong There is no method to ‘update’ the IQ Unique() value

In 12.5 - Uses Optimized FP indexes to Improve Query Performance Reduce Data Storage in IQ Store

• Many cases of dramatic storage reduction• Fully Indexed IQ data = 50% of input data size

Leaf Node – Predicate Information

There is more detail on predicates in the Leaf Node This shows how much IQ knows about the data

• HG and Date Index on column• 8578 Distinct values• 1096 Values satisfy query• Column stored as FP(2)

elig_beg_date >= '2000-01-01'

Back to Our Query Example

Was the Optimizer Estimate correct?

Use Query_Plan_After_Run option Delays the printing of the Query Plan until Query Completes

New details appear in the Query Plan After Run Estimated and Actual Row Counts at each Node Estimated and Actual Temp Space Used

Sample Query - Query Plan After Run

Row Estimate Off by a factor or 72.6

Temp Space Usage Just slightly below actual

Questions … Why is row estimate wrong? Is this a problem?

Row Estimates

Several Reasons for Wrong Estimates Method used to derive estimate is not perfect

• Assumes even distribution of data and• Predicates are not correlated

One Predicate was a Guess (no useable index for column)• Importance of HG/LF indexes cannot be stressed enough

Wrong Estimates May NOT be a Problem Nodes using HASH tables are more sensitive to row counts

• There are row limitations and performance implications Sort Nodes are not sensitive to wrong estimates

Corrective Action and Results

Created LF Index on 149MM row table Took 110 Seconds (Sun 64 w/ 900 MHZ cpu)

New index is more Selective (and Useful)

Estimates are much closer to reality This is about as close as we can get for now

Query Plan After Index Added

User Supplied Estimates for Queries

For cases where no index can be used for a query You can advise the optimizer the percentage of rows that will

be returned by the predicateSyntax: Add percentage to predicate after a comma within

parentheses()

Select count(*) from central_fact_tableWhere (service_rollup like ‘%Poison%’ , .003)

This example - .00003 percent of rows match Value provided is a percentage

Query Plan - No Estimate Provided

Query Plan with User Supplied Estimate

Leaf Nodes – Wrap Up

Show Indexes Available and Indexes Used Order of Predicate Execution Selectivity and Usefulness of Indexes Estimated and Actual Row Counts from Table Other

Output Columns from Table

Next Node Type: Join Nodes

Join Nodes – Seven Type of Joins in IQ

Hash (HJ)

Nested Loop (NL)

Sort-Merge (SM)

Hash Pushdown (HJPD)

Nested Loop Pushdown (NLPD)

Sort-Merge Pushdown (SMPD)

Cartesian Nested Loop Joins

You will see the Full Join Type Names and Abbreviations in Query Plans

(Short description of these join types follows)

Nested Loop Joins

Nested Loop (NL) Takes a join key from the larger side of the join and compare it

to each row of the smaller side

Nested Loop Pushdown (NLPD) Take the join keys of the smaller side of the join into an IN list

on the large table

Hash Joins

Use Hashing Algorithms for Joins This Join will be handled in Memory

Hash Operations are Restricted by the DB Option MAX_HASH_ROWS – default 2,500,000 (configurable)

Hash Joins are Sensitive to Row Estimates If Estimated Row Counts are too low Thrashing can occur

• Thrashing involves Disk I/O Server will abort a query (rollback) if thrashing becomes

excessive Hash_Thrashing_Percent DB Option controls this behavior

• Default = 10 (percent) and is configurable

Sort-Merge

Classic Join Algorithm Sort both tables and match up the join keys Used for joining two large tables Uses Temp Cache/Store for sorting

Accompanied with Order By Nodes for both Tables Where sorting is performing

Sort-Merge Push Down Creates an IN List of Keys on smaller table to ‘probe’ and filter

rows from the other table Unique to Sybase IQ

Joins in a “Simple” Query

Join Node - General

Has Two Child Nodes Shows Estimated Result Rows after joining

Actual Row counts with Query_Plan_After_Run

Lists the Types of Joins the Optimizer considered Could have picked any one in the list

Shows Join Condition(s) But NOT the indexes used!

Other information specific to the type of join

Sort-Merge Join Node (Query Tree)

• Sort-Merge Joins are usually preceded by an Order By Node

Sort-Merge Join Node (partial)

Child Nodes Estimated Result Rows Join Types Considered

Optimization Note

Join Constraint

‘Push Down’ Joins

When the Optimizer uses a Push Type join you will see different notations in Leaf Nodes that are ‘probed’

The Leaf for the larger side of the join is filtered by an IN list to reduce the number of keys that flow from that Node

Join Nodes - Performance Considerations

Create Primary Keys on tables At the very least HG or Unique HG indexes on all join keys

Use Query_Plan_After_Run to check row counts If estimates are are ‘way off’ then check for missing indexes in

Leaf Nodes and/or create Primary Keys Hash Join types are most sensitive to row estimates Sort-Merge, Nested Loop joins are not

Complex Query – No Primary Keys Defined

Some BIG Numbers!

Complex Query

Query involved many join columns on large tables Indexes were correct for all joins and predicates The optimizer made very conservative estimates since it could

not accurately predicate final result set from join

Actual results were much smaller than the estimate

Creating Primary keys on tables changed the plan Result set estimates while still on the high side were decidedly

smaller

Row Estimates - Before and After Primary Key

Result Set Estimate:

7,355,725,745,151,829

New Estimate

24,396,774,363

300,000 xsmaller!

Tweaking Joins

You can influence Join Type selection Database Option - Join_Preference

You can also influence Join Order Database Option - Join_Optimization

Join_Preference Database Option

‘1’ – Prefer Sort/Merge ‘2’ – Prefer Nested Loop ‘3’ – Prefer Nested Loop PD ‘4’ – Prefer Hash ‘5’ – Prefer Hash PD ‘6’ – Prefer PreJoin ‘7’ – Prefer Sort Merge PD

‘-1’ – Avoid Sort-Merge ‘-2’ – Avoid Nested Loop ‘-3’ – Avoid Nested Loop PD ‘-4’ – Avoid Hash ‘-5’ – Avoid Hash PD ‘-6’ – Avoid PreJoin ‘-7’ – Avoid Sort-Merge PD

Default = ‘0’ (Optimizer Decides)

You can only influence join type from the list of joins the serverconsidered in the Query Plan, otherwise it will ignore the option

(and it may just ignore it anyway!!)

Join_Preference Option

Set Temporary Option Join_Preference = ‘-2’; Telling Optimizer to Avoid Nested Loop Joins Option value must be quoted as shown

For problem solving and performance testing

Use as a Temporary Option with a Query

Join_Optimization Option

Default = ‘On’ – Optimizer Decides Join Order

If set ‘Off’ will join tables as listed in the FROM clause

Use this only to diagnose obscure join performance issues And use it as a Temporary Option as well

Group By Nodes

Three types of Group By Nodes Group By (Hash) – for smaller result sets

• All done in memory Group By (Sort) – larger sets

• Will use IQ Temp Memory/dbspace for sorts Grouped Leaf – Grouping with indexes in a Leaf Node

• For single tables when all Group By columns are indexed

Group By (Hash) Same database options apply as for joins 2,500,000 row max (default)

Group By Nodes - General

Have a Child Node Show Estimated Result Rows

Actual rowcount if Query_Plan_After_Run = ‘On’

Grouping Expressions Used But no information on indexes used

Output columns

Group By Performance

Index All Grouping Expressions LF or HG indexes needed

They will provide the best estimates for results Group By (Hash) generally faster than Group By (Sort) Sort method will be used when optimizer is in doubt

If estimates in child node are way under the wrong method may be used Check Estimated vs Actual counts

Tweaking Group By Performance

Like Joins, You can influence the optimizer Database Option: Aggregation_Preference

Default = ‘0’ – Optimizer decides

‘1’ – Prefer Sort ‘-1’ Avoid Sort

‘2’ – Prefer Using Indexes ‘-2’ Avoid Indexes

‘3’ – Prefer Hash ‘-3’ Avoid Hash

Use this option with care and always as a Temporary Option

Filter Nodes

Used with some expressions and “Having” clauses that cannot by filtered with an indexExample:

Select diag_catgy_cd, sum(paid_amt)

From inpatient

Group By diag_catgy_cd

Having sum(paid_amt) > 10000

The Having expression must be evaluated and filtered after the groups are formed

Query Plan with Filter Node

Node Detail – Leaf Node

No HG/LF Index !

Node Detail –Group By (Hash)

Node detail – Filter Node

Index on Group By Column May Help

Build on index on the Grouping Expression HG Index is appropriate

Rerun The Query

New Query Plan (new HG Index)

• Grouped Leaf instead of Leaf Node and Group By(hash)

Grouped Leaf Node and Filter Node

You Should be Dangerous Now …

We have covered Query Plans “101” for Sybase IQ

Now you know … Indexes are important (always have been) Primary keys are important (more than ever) Optimized FP indexes are important (even more than ever) The optimizer is conservative and can be influenced

And there is more, but we just don’t have the time

Questions

Visit the IQ Technical Team on the Boardwalk

Lou Stanton

[email protected]

Documents

Query Plan Interpretation