DB2 SQL Tuning Best Practices


  • 8/12/2019 DB2 SQL Tuning Best Practices


    DBA BEST PRACTICES

    DB2 UDB LUW

    SQL TUNING

    FEBRUARY 2010


    © 2010 Computer Sciences Corporation.

    TABLE OF CONTENTS

    1.0 Overview

    2.0 Introduction

    3.0 UDB DB2 Database Manager Background

    4.0 Assumptions

    5.0 Best Practices

    5.1 Best Practices for Database Configuration

    5.1.1 Database Optimization Class Registry Setting

    5.1.2 Database Manager Instance Configuration File Parameters

    5.1.3 Database Configuration File Parameters

    5.1.4 Database Bufferpool and Tablespace Configuration

    5.2 Database Table and Index Best Practices

    5.2.1 Database Table and Index Design

    5.3 UDB DB2 Database RUNSTATS

    5.3.1 RUNSTATS Command

    5.4 UDB DB2 Database Table Reorganization

    5.4.1 REORGANIZE and REORGCHK Commands

    5.5 SQL Workload Tuning Best Practices

    5.5.1 Prioritize then Divide and Conquer

    5.5.2 Get Baseline Run Times and EXPLAIN Plans

    5.5.3 Best Practice Coding Techniques

    5.5.4 Review Joins and Indexes

    5.5.5 Review All Selected Columns and Table Indexes

    5.5.6 Retest the Entire Work Load After SQL Performance Tuning

    5.5.7 DB2 Index Advisor

    db2advis - DB2 design advisor command

    5.6 Explain Tools

    5.6.1 Visual Explain Tool

    Visual Explain

    5.6.2 DB2expln Facility


    SQL and XQuery explain tool

    6.0 Appendix


    BUSINESS INTELLIGENCE PRACTICE DATABASE ADMINISTRATION COMPETENCY

    © 2008 Computer Sciences Corporation.

    1.0 Overview

    The intent of this document is to describe best practices for SQL tuning of DB2 databases in LUW environments. The document covers:

    Database Maintenance Best Practices

    Database Configuration for Best Performance

    Database Design Issues for Best Performance

    SQL Coding Best Practices

    SQL Explain Tools for Performance Tuning

    Version   Revision Date   Revised By        Revision Summary
    1         02/02/2010      Bruce Woodcraft   Initial draft

    2.0 Introduction

    This document describes best practices for writing Structured Query Language (SQL) scripts that retrieve data from an IBM DB2 database running on a Linux, UNIX, or Windows (LUW) server. It covers best practices for writing SQL, the database maintenance that affects data retrieval, the database configuration parameters that impact performance, design issues for database objects such as tables and indexes, and the use of explain tools to assist in performance tuning.

    SQL query tuning factors fall into several categories:

    Database Configuration

    Database Object Maintenance

    Database Object Design (Tables and Indexes)

    SQL Coding Techniques

    DB2 Explain Plan Tools

    Many factors determine the performance of a given SQL query, and many of them are beyond the control of the SQL query developer. For instance, database configuration parameter settings and table maintenance activities are controlled by the DBA; the SQL developer most likely does not have access to change them. It is widely reported in the database tuning literature that the SQL query itself is the single largest performance factor in more than three out of four cases. For this reason, this document focuses most heavily on SQL coding techniques for performance. The other contributing factors are discussed in far less detail, because their remedies are covered in other documents and are beyond the scope of this one.


    3.0 UDB DB2 Database Manager Background

    Before discussing these SQL tuning factors, we should first consider some background on IBM's DB2 Universal Database (UDB) database manager for LUW environments. The most important component of the product for running queries that retrieve data is the Optimizer. The optimizer of any Relational Database Management System (RDBMS) provides the intelligence for determining the best steps for accessing and retrieving the data needed to satisfy a query. This set of steps is known as the optimized access path. Thus the optimizer determines how queries will be performed within the database, and it is the component that most distinguishes one RDBMS from another.

    Below is a brief description of DB2's optimizer from an IBM technical article titled "Coding DB2 SQL for Performance: The Basics":

    http://www.ibm.com/developerworks/data/library/techarticle/0210mullins/0210mullins.html#author

    The Optimizer

    The optimizer is the heart and soul of DB2. It analyzes SQL statements and determines the most efficient access path available for satisfying each statement (see Figure 1). DB2 UDB accomplishes this by parsing the SQL statement to determine which tables and columns must be accessed. The DB2 optimizer then queries system information and statistics stored in the DB2 system catalog to determine the best method of accomplishing the tasks necessary to satisfy the SQL request.

    Figure 1. DB2 optimization in action.


    The optimizer is equivalent in function to an expert system. An expert system is a set of standard rules that, when combined with situational data, returns an "expert" opinion. For example, a medical expert system takes the set of rules determining which medication is useful for which illness, combines it with data describing the symptoms of ailments, and applies that knowledge base to a list of input symptoms. The DB2 optimizer renders expert opinions on data retrieval methods based on the situational data housed in DB2's system catalog and a query input in SQL format.

    The notion of optimizing data access in the DBMS is one of the most powerful capabilities of DB2. Remember, you access DB2 data by telling DB2 what to retrieve, not how to retrieve it. Regardless of how the data is physically stored and manipulated, DB2 and SQL can still access that data. This separation of access criteria from physical storage characteristics is called physical data independence. DB2's optimizer is the component that accomplishes this physical data independence.

    If you remove the indexes, DB2 can still access the data (although less efficiently). If you add a column to the table being accessed, DB2 can still manipulate the data without changing the program code. This is all possible because the physical access paths to DB2 data are not coded by programmers in application programs, but are generated by DB2.

    Compare this with non-DBMS systems in which the programmer must know the physical structure of the data. If there is an index, the programmer must write appropriate code to use the index. If someone removes the index, the program will not work unless the programmer makes changes. Not so with DB2 and SQL. All this flexibility is attributable to DB2's capability to optimize data manipulation requests automatically.

    The optimizer performs complex calculations based on a host of information. To visualize how the optimizer works, picture the optimizer as performing a four-step process:

    1. Receive and verify the syntax of the SQL statement.

    2. Analyze the environment and optimize the method of satisfying the SQL statement.

    3. Create machine-readable instructions to execute the optimized SQL.

    4. Execute the instructions or store them for future execution.

    The second step of this process is the most intriguing. How does the optimizer decide how to execute

    the vast array of SQL statements that you can send its way?

    The optimizer has many types of strategies for optimizing SQL. How does it choose which of these strategies to use in the optimized access paths? IBM does not publish the actual, in-depth details of how the optimizer determines the best access path, but the optimizer is a cost-based optimizer. This means the optimizer will always attempt to formulate an access path for each query that reduces overall cost. To accomplish this, the DB2 optimizer applies query cost formulas that evaluate and weigh four factors for each potential access path: the CPU cost, the I/O cost, statistical information in the DB2 system catalog, and the actual SQL statement.


    4.0 Assumptions

    This document assumes the target audience has some experience and knowledge of SQL query scripting with some relational database, and it points out specific best practices for using IBM's UDB DB2 database product in Linux, UNIX, and Windows (LUW) environments. Detailed configuration of UDB DB2 instance and database parameters is beyond the scope of this paper; however, these settings play an important role in overall performance optimization and are briefly mentioned below.

    5.0 Best Practices

    5.1 Best Practices for Database Configuration

    This section describes some UDB DB2 system and database configuration parameters that a DBA can change and that could have the greatest impact on SQL query performance. They are examples of the "Other System Information" shown in the optimizer diagram in Figure 1 above. These parameters are mentioned here but are covered in more detail in the Best Practices for Database Design for UDB DB2. CAUTION: Only the DBA should consider tuning these settings, because they affect all database activity, so the utmost caution is needed.

    5.1.1 DATABASE OPTIMIZATION CLASS REGISTRY SETTING

    Changing the query optimization class (the CURRENT QUERY OPTIMIZATION special register for dynamic SQL, whose default is taken from the DFT_QUERYOPT database configuration parameter) can provide some of the advantages of explicitly specifying optimization techniques, especially in the following cases:

    To manage very small databases or very simple dynamic queries

    To accommodate memory limitations at compile time on your database server

    To reduce query compilation time, such as the time spent in PREPARE

    A query optimization class is a set of query rewrite rules and optimization techniques for compiling queries. Per IBM's UDB Information Center for LUW on this subject:

    To set the query optimization class for dynamic SQL, enter the following command in the command line processor: SET CURRENT QUERY OPTIMIZATION = n;

    Most statements can be adequately optimized with a reasonable amount of resources by using optimization class 5, which is the default query optimization class. At a given optimization class, the query compilation time and resource consumption are primarily influenced by the complexity of the query, particularly the number of joins and subqueries. However, compilation time and resource usage are also affected by the amount of optimization performed.

    Query optimization classes 1, 2, 3, 5, and 7 are all suitable for general-purpose use. Consider

    class 0 only if you require further reductions in query compilation time and you know that

    the SQL statements are extremely simple.
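The register can also be set from application code before dynamic statements are prepared. As a minimal sketch (the helper name is hypothetical and not part of any IBM API), the Python function below builds the statement and rejects values that are not DB2 LUW optimization classes; the documented classes are 0, 1, 2, 3, 5, 7, and 9, with 5 the default.

```python
# DB2 LUW query optimization classes; 5 is the default (DFT_QUERYOPT).
VALID_OPTIMIZATION_CLASSES = (0, 1, 2, 3, 5, 7, 9)


def set_optimization_sql(opt_class: int) -> str:
    """Return the statement that sets the optimization class used to
    compile subsequent dynamic SQL in the current connection."""
    if opt_class not in VALID_OPTIMIZATION_CLASSES:
        raise ValueError(
            f"invalid optimization class {opt_class}; "
            f"expected one of {VALID_OPTIMIZATION_CLASSES}"
        )
    return f"SET CURRENT QUERY OPTIMIZATION = {opt_class}"


if __name__ == "__main__":
    # Lower the class for a batch of very simple, frequently prepared queries.
    print(set_optimization_sql(1))
```

The special register affects only dynamic SQL; static SQL packages take their class from the QUERYOPT bind option instead.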


    Again, use CAUTION when changing this setting. More information and a complete discussion of this setting can be found in the IBM UDB Information Center for LUW: http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp

    5.1.2 DATABASE MANAGER INSTANCE CONFIGURATION FILE PARAMETERS

    Each UDB DB2 Instance has an Instance Configuration file that contains 68 parameters.

    There are a few that have a significant impact on performance which are listed below.

    Table source: IBM Redbook DB2 UDB Enterprise Edition V8.1: Basic Performance Tuning Guidelines

    http://www.redbooks.ibm.com/redpapers/pdfs/redp4251.pdf


    These parameters should be tuned by the database support DBA with CAUTION.

    For further detail on these parameters see the source document.

    5.1.3 DATABASE CONFIGURATION FILE PARAMETERS

    Each UDB DB2 database has its own Database Configuration File, which contains 82 different parameters. Below are the parameters that could have the greatest performance impact. Again, use caution when changing any UDB DB2 parameter.


    Table source: IBM Redbook DB2 UDB Enterprise Edition V8.1: Basic Performance Tuning Guidelines

    http://www.redbooks.ibm.com/redpapers/pdfs/redp4251.pdf

    Like the DB2 instance settings that can be tuned, many DB2 database configuration settings can have a significant effect on database performance. Several key settings are: AVG_APPLS, which the optimizer uses to estimate how much buffer pool memory each application will get; CATALOGCACHE_SZ, which determines how much memory is used to cache system catalog information; and SORTHEAP, which specifies the amount of memory available for each sort operation. Tuning these parameters is discussed in detail in the IBM Redbook referenced above, in the UDB DB2 Database Tuning Best Practices, and in IBM's UDB DB2 Administration manual.
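Before asking a DBA to change any of these parameters, it helps to record their current values. A minimal sketch, assuming the text output of the CLP command `db2 get db cfg for <dbname>` has been captured to a string: each parameter line ends with the parameter name in parentheses, an equals sign, and the value, so a regular expression can pull out the three parameters named above. The function name is hypothetical, and the sample lines are abbreviated illustrations of that layout, not real output.

```python
import re

# Parameters discussed in this section.
KEY_PARAMS = ("AVG_APPLS", "CATALOGCACHE_SZ", "SORTHEAP")

# Matches the "(PARAM_NAME) = value" tail of a GET DB CFG output line.
# A unit such as "(4KB)" earlier in the line is skipped because it is not
# followed by "=".
_CFG_LINE = re.compile(r"\(([A-Z0-9_]+)\)\s*=\s*(.*?)\s*$")


def parse_db_cfg(text: str, wanted=KEY_PARAMS) -> dict:
    """Return {parameter: value} for the wanted parameters found in text."""
    values = {}
    for line in text.splitlines():
        match = _CFG_LINE.search(line)
        if match and match.group(1) in wanted:
            values[match.group(1)] = match.group(2)
    return values


if __name__ == "__main__":
    # Abbreviated, illustrative lines in the GET DB CFG layout.
    sample = """
     Catalog cache size (4KB)              (CATALOGCACHE_SZ) = 300
     Sort list heap (4KB)                         (SORTHEAP) = AUTOMATIC(256)
     Average number of active applications       (AVG_APPLS) = 1
    """
    print(parse_db_cfg(sample))
```

Values are kept as text because DB2 may report AUTOMATIC settings or expressions rather than plain numbers.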

    5.1.4 DATABASE BUFFERPOOL AND TABLESPACE CONFIGURATION

    In any database design and configuration, the size and allocation of the database's buffer pools and table spaces have the greatest impact on the database's performance. Buffer pools cache data pages in memory for reading and writing to disk, and data is handled much faster from memory than from disk. Generally, there are just a few buffer pools of different page sizes, matching the page sizes of the table spaces. Special-purpose buffer pools may be created for specific data and processing methods. Likewise, table spaces come in many sizes and can serve specific purposes; for instance, temporary table spaces are created and assigned to specific buffer pools. UDB DB2 has options for partitioning large tables across multiple table spaces for data separation and faster I/O performance. Frequently used data can be set up in its own buffer pool and table space so that it stays in memory for fast access. In tuning queries, you may come across often-used data that can be separated out and tuned in this fashion. Table space changes, and to a lesser extent buffer pool changes, may be needed to optimize a given query workload; they are the responsibility of a DBA, not a developer.

    Remember, database configuration changes like those mentioned above need to be made with CAUTION, as they could be counterproductive for other queries in the workload, especially if one buffer pool is reduced to create another. For this reason, workloads need to be tuned as a group and measured as a group, after individually looking at the slow performers and the most often run queries. (Do not underestimate the improvement to the overall run time of a workload that can come from tuning a small query that is run a million times.)


    5.2 Database Table and Index Best Practices

    Tables organize and group the data that fills the database while indexes provide maps to

    specific data in the tables and speeds the I/O processing. The importance of good designand planning here will immediately impact the databases performance.

    5.2.1 DATABASE TABLE AND INDEX DESIGN

    Two other key elements of a well-performing database are the design and function of its tables and indexes. Too often, tables are collections of fields with no thought given to their function and use. Indexes get added to provide the tables with a key, but the design ends there. Tables with too many columns should be considered for splitting into two parts, one with the most used columns and one with the least used columns. Some tables that are constantly joined to another table may be combined with it for operational efficiency, even though the result is no longer fully normalized. More detail on the benefits of good table design can be found in the UDB DB2 Database Design Best Practices. Note, however, that table design and structure play an important role in the tuning of every query that reads from or joins to the table.

    UDB DB2 offers a variety of table structures to store and retrieve data with optimal performance: Range-Clustered Tables (RCT), Multidimensional Clustering (MDC) tables, and, for even larger tables, Range-Partitioned (RP) tables. These table structures have specific indexing methods that are very beneficial when used properly. Again, see the UDB DB2 Database Design Best Practices for more detail on these table structures and indexing methods.

    One of the biggest factors affecting query performance is which indexes are available for the optimizer to use. The primary role of indexes is to shorten the access path so that the data can be retrieved as fast as possible. Indexes perform a powerful service for the database, but creating too many indexes, or adding too many columns to a particular index, can be detrimental to the entire workload, especially when rows are inserted into or updated in the over-indexed table. Adding indexes to a table is always a tradeoff between retrieval time on one side and maintenance time plus storage space on the other. Usually retrieval time is more important, and index maintenance is done during a batch cycle when no one is waiting for it to finish. Also, UDB DB2 v9.7 has new index compression features that make indexes smaller and faster to use, offsetting some of the cost associated with an index on a larger table.

    Most, if not all, tables will have an index of some kind. Generally, most have a unique index that serves as the primary key and is explicitly declared as the PRIMARY KEY. (Note that in UDB DB2 the primary key can be created as a CONSTRAINT, and an index will be created for it.)

    Rule to Remember:

    At most five to seven indexes per table, each with at most five to nine columns.
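The rule can be checked mechanically. A minimal sketch, assuming the index list has already been fetched from the DB2 catalog with a query such as `SELECT TABNAME, INDNAME, COLCOUNT FROM SYSCAT.INDEXES WHERE TABSCHEMA = 'MYSCHEMA'` (MYSCHEMA is a placeholder; COLCOUNT counts key columns plus any INCLUDE columns). The function name and the use of the upper ends of the ranges (seven indexes, nine columns) as default limits are illustrative assumptions.

```python
from collections import defaultdict


def check_index_rule(rows, max_indexes=7, max_columns=9):
    """Flag tables and indexes that exceed the rule of thumb.

    rows: iterable of (table_name, index_name, column_count) tuples, e.g. the
    result of a SYSCAT.INDEXES query. Returns a list of warning strings.
    """
    indexes_per_table = defaultdict(int)
    warnings = []
    for table, index, column_count in rows:
        indexes_per_table[table] += 1
        if column_count > max_columns:
            warnings.append(
                f"{table}.{index}: {column_count} columns (limit {max_columns})"
            )
    for table, count in indexes_per_table.items():
        if count > max_indexes:
            warnings.append(f"{table}: {count} indexes (limit {max_indexes})")
    return warnings


if __name__ == "__main__":
    # Hypothetical catalog rows for two tables.
    rows = [("ORDERS", f"IX_ORDERS_{i}", 2) for i in range(8)]
    rows += [("CUSTOMER", "PK_CUSTOMER", 1), ("CUSTOMER", "IX_CUST_WIDE", 11)]
    for warning in check_index_rule(rows):
        print(warning)
```

A warning is a prompt to review the table's indexes against the workload, not an instruction to drop anything.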


    Unique indexes can be created on columns other than the primary key (PK); these are referred to as alternate keys (AK). For example, a sequence number (or identity column) may be added to the row to provide a sequential numeric column to use as the PK, while a group of other columns forms the natural key and is a unique combination of columns. Unique indexes may INCLUDE other, non-key columns that provide a direct data source for a few table columns. This becomes an extremely effective tool, especially for large rows with many columns. Adding a few extra columns to the unique index (or AK) permits the I/O to be limited to the index only, saving large row reads. This I/O technique is known as Index Only Reads and is quite efficient compared to reading both the index and the data rows.

    In a snowflake or hub-and-spoke data model, where a few fact tables are linked to numerous attribute tables, the fact table should have single-column attribute key indexes that match the indexes of the attribute tables. UDB DB2 has a special join operator called the STAR JOIN, which handles this type of join and index processing in a highly efficient way using RID processing and index ANDing. See the IBM UDB Information Center for complete details of the STAR JOIN.
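The Index Only Reads technique described above can be observed with any cost-based optimizer. As a sketch, the Python script below uses SQLite (through the standard sqlite3 module) as a stand-in for DB2, so the plan text is SQLite's, not DB2's. SQLite has no INCLUDE clause, so the extra column is appended to the index key instead; in DB2 the equivalent unique alternate key would be written as `CREATE UNIQUE INDEX ... ON customer (email) INCLUDE (name)`. The table and index names are hypothetical.

```python
import sqlite3


def plan_for(index_ddl: str) -> str:
    """Build a throwaway table, add one index, and return SQLite's plan
    for a lookup that selects a single non-key column."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE customer ("
            " cust_id INTEGER PRIMARY KEY,"
            " email TEXT, name TEXT, address TEXT, notes TEXT)"
        )
        con.execute(index_ddl)
        rows = con.execute(
            "EXPLAIN QUERY PLAN SELECT name FROM customer WHERE email = ?",
            ("a@example.com",),
        ).fetchall()
        # The last column of each plan row is the human-readable detail.
        return " | ".join(row[-1] for row in rows)
    finally:
        con.close()


# Index on the lookup key only: the index is searched, then the row is read.
key_only = plan_for("CREATE INDEX ix_email ON customer (email)")
# Lookup key plus the selected column: the index alone answers the query.
covering = plan_for("CREATE INDEX ix_email_name ON customer (email, name)")

if __name__ == "__main__":
    print("key only:", key_only)
    print("covering:", covering)
```

With the key-only index, SQLite reports a plain index search and must then read the table row to get name; with the wider index it reports a covering index, meaning the data rows are never touched. In a DB2 explain plan, the same effect typically shows up as an IXSCAN operator with no FETCH operator above it.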

    5.3 UDB DB2 Database RUNSTATS

    As shown in the optimizer diagram in Figure 1, the UDB DB2 database uses statistical data in the system catalog to help the optimizer determine the best steps for retrieving the needed data. The following sections discuss the importance of this data and the necessity of keeping it up to date.

    Rule to Remember:

    Use the Primary Key on a table whenever possible, unless another index provides more columns and a faster access path.


    5.3.1 RUNSTATS COMMAND

    The UDB DB2 database uses catalog statistics and column distribution counts to help the optimizer determine the optimal data access path. Because the optimizer uses these counts to estimate the costs of the various steps, these statistics are critical to its decision-making. The RUNSTATS command generates fresh row counts and column distributions; run it whenever a table has been modified in a significant way since RUNSTATS was last run.

    Rule to Remember:

    Run RUNSTATS command after significant changes or a total refresh of a table.
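To see why stale statistics matter, the sketch below again uses SQLite as a stand-in: its ANALYZE statement plays the role of RUNSTATS and stores row counts in the sqlite_stat1 table, much as DB2 stores them in its system catalog (for example, CARD and STATS_TIME in SYSCAT.TABLES). In DB2 the corresponding step would be a command such as `RUNSTATS ON TABLE myschema.sales WITH DISTRIBUTION AND DETAILED INDEXES ALL`, where myschema.sales is a placeholder. The script shows that the recorded row count does not change when the table grows tenfold, until the statistics are collected again.

```python
import sqlite3


def index_row_count(con: sqlite3.Connection, index_name: str) -> int:
    """Row count that the last ANALYZE recorded for an index."""
    (stat,) = con.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = ?", (index_name,)
    ).fetchone()
    # stat looks like "1000 100": total entries, then average rows per key value.
    return int(stat.split()[0])


def add_rows(con: sqlite3.Connection, count: int) -> None:
    con.executemany(
        "INSERT INTO sales (region) VALUES (?)", [(i % 10,) for i in range(count)]
    )


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region INTEGER)")
con.execute("CREATE INDEX ix_sales_region ON sales (region)")
add_rows(con, 1_000)

con.execute("ANALYZE")  # collect statistics: the RUNSTATS step
before = index_row_count(con, "ix_sales_region")

add_rows(con, 9_000)  # a significant change: the table grows tenfold
stale = index_row_count(con, "ix_sales_region")  # catalog still says 1000

con.execute("ANALYZE")  # rerun after the significant change
fresh = index_row_count(con, "ix_sales_region")

if __name__ == "__main__":
    print(before, stale, fresh)  # 1000 1000 10000
```

Between the two ANALYZE runs, any plan the optimizer builds is costed for a table one tenth of its real size, which is exactly the situation the rule above is meant to prevent.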


    5.4 UDB DB2 Database Table Reorganization

    Another important UDB DB2 database maintenance command is REORG (reorganize), which rearranges the rows in a table or index and reclaims the space left by deleted rows.

    5.4.1 REORGANIZE AND REORGCHK COMMANDS

    UDB DB2 Enterprise Manager uses the REORGCHK command to test tables to see whether they need to be reorganized with the REORG command.

    The REORGCHK command calculates statistics on the database to determine whether tables, indexes, or both need to be reorganized or cleaned up.

    Rule to Remember:

    Run REORG command after significant deletions and additions to a table or index.

    Rule to Remember:

    Run REORGCHK command to check to see if a table or index needs to be cleaned up.
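The rules above are often combined into one maintenance pass: check, reorganize where REORGCHK recommends it, then refresh the statistics so the optimizer sees the new layout. The Python sketch below only generates DB2 command text for a list of fully qualified table names (the helper and the table names are hypothetical); the output could be saved to a file and run from a connected CLP session with `db2 -tvf <file>`. Review the REORGCHK report, where an asterisk in the REORG column marks a recommended reorganization, before running the REORG step.

```python
def maintenance_script(tables, check_only=False):
    """Return DB2 CLP commands for the given fully qualified table names.

    With check_only=True only the REORGCHK step is generated, so the
    report can be reviewed before any table is reorganized.
    """
    commands = []
    for table in tables:
        # Evaluate the reorganization formulas using the existing statistics.
        commands.append(f"REORGCHK CURRENT STATISTICS ON TABLE {table};")
        if check_only:
            continue
        # Classic (offline) reorganization of the table.
        commands.append(f"REORG TABLE {table};")
        # Recollect statistics so the optimizer sees the reorganized data.
        commands.append(
            f"RUNSTATS ON TABLE {table} "
            "WITH DISTRIBUTION AND DETAILED INDEXES ALL;"
        )
    return "\n".join(commands)


if __name__ == "__main__":
    print(maintenance_script(["MYSCHEMA.SALES", "MYSCHEMA.CUSTOMER"]))
```

CURRENT STATISTICS evaluates the formulas against the statistics already in the catalog; leaving it out (UPDATE STATISTICS is the default) makes REORGCHK collect fresh statistics first, at extra cost.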


    5.5 SQL Workload Tuning Best Practices

    5.5.1 PRIORITIZE THEN DIVIDE AND CONQUER

    In most database environments, a large set of SQL statements is run against the database in any given time window. Some statements are repeated daily by online applications or report programs; others are ad hoc queries run one time by a single user. After capturing the complete set of statements, subdivide them by application and user priority. Also reduce the ad hoc queries to a representative subset, as it is impossible to optimize the database for every query, let alone for ad hoc queries that may only be run once. Also identify the queries that are run most often, as optimizing them returns big savings over time. Batch report queries need to run efficiently but may not be prioritized as highly as online screen queries that need sub-second response times.

    Review and tune the queries based on their priority and use. Focus on the most important queries and those with the most visibility.
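One simple way to rank the captured statements is by total elapsed time (average time per execution multiplied by the number of executions), which makes the payoff from tuning frequently run queries visible. The sketch below is illustrative only: the query names and timings are hypothetical, and in practice the figures would come from a monitoring capture of the workload.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    name: str
    avg_seconds: float  # average elapsed time per execution
    executions: int     # executions in the capture window

    @property
    def total_seconds(self) -> float:
        return self.avg_seconds * self.executions


def prioritize(queries):
    """Order queries by total elapsed time, largest first."""
    return sorted(queries, key=lambda q: q.total_seconds, reverse=True)


# Hypothetical workload figures.
workload = [
    QueryStats("monthly_report", avg_seconds=600.0, executions=1),
    QueryStats("order_lookup", avg_seconds=0.02, executions=1_000_000),
    QueryStats("adhoc_sales", avg_seconds=45.0, executions=3),
]

if __name__ == "__main__":
    for q in prioritize(workload):
        print(f"{q.name:15} {q.total_seconds:10.1f} s")
```

In this made-up workload, a 20-millisecond lookup executed a million times accounts for far more total time than a ten-minute report run once, so it ranks first. Total time is one input alongside the application and user priorities described above, not a replacement for them.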

    5.5.2 GET BASELINE RUN TIMES AND EXPLAIN PLANS

    Once you have determined the query workload to tune, get baseline run times and explain plans. These are needed for comparison, to measure performance improvement during and at the end of the tuning process.
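A baseline is only useful if it is captured the same way before and after every change. A minimal Python sketch of a repeatable harness, using SQLite as a stand-in database: it records the plan text and the median of several timed runs, fetching every row as the application would. With DB2 the plan would instead be captured with the explain tools listed in section 5.6 (Visual Explain, db2expln). The function and table names are hypothetical.

```python
import sqlite3
import statistics
import time


def baseline(con, sql, params=(), runs=5):
    """Return (median elapsed seconds, plan lines) for one statement."""
    plan = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql, params)]
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        con.execute(sql, params).fetchall()  # fetch all rows, as the application would
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), plan


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region INTEGER)")
con.executemany(
    "INSERT INTO sales (region) VALUES (?)", [(i % 50,) for i in range(10_000)]
)

median_s, plan = baseline(con, "SELECT COUNT(*) FROM sales WHERE region = ?", (7,))

if __name__ == "__main__":
    print(f"median {median_s * 1000:.2f} ms; plan: {plan}")
```

The median is used rather than the mean so that a single slow run (for example, a cold cache) does not distort the baseline.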

    5.5.3 BEST PRACTICE CODING TECHNIQUES

    There are some basic SQL coding techniques to follow to ensure the best performance from an SQL script. SQL should be written to return exactly the data needed, with the fewest steps and the least amount of data processed. Queries need to use column and row filtering to quickly reduce the possible rows in the result set. Using indexed columns, using simple predicates, and avoiding bad coding techniques will help the optimizer determine the best data access path for the query. Below are a few guidelines to keep in mind when coding and reviewing SQL scripts for optimal performance.

    Keep WHERE Expressions Simple - When it comes to WHERE conditions, the simpler the better. Try to avoid complex expressions that prevent the optimizer from using the catalog statistics to estimate an accurate selectivity. Such expressions might also limit the choice of access plans that can be used to apply the predicate.

    Avoid Functions in JOINS - Joins are limited to the slower nested loop join method when one of the join predicates contains an expression or function. The expressions may also make the cardinality estimates inaccurate and cause the optimizer to select a non-optimal path.

    Avoid Expressions on JOIN Columns - Avoid applying expressions to JOIN columns that have an index, because the expression disqualifies the use of that index. If possible, rewrite the query to compare the indexed columns directly, or apply the reverse operation of the expression to the other side of the comparison. Applying expressions over columns prevents the use of index start and stop keys, leads to inaccurate selectivity estimates, and requires extra processing at query execution time. These expressions also prevent or hamper query rewrite optimization steps.
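The "reverse operation" advice can be seen in a small experiment. As a sketch, the Python script below uses SQLite (standard sqlite3 module) as a stand-in for DB2 and a simple WHERE predicate rather than a join, because the effect on the index is the same: with arithmetic applied to the indexed column the whole table is scanned, while moving the arithmetic to the constant side (price * 2 > 100 becomes price > 50) lets the index supply a start key. The table and index names are hypothetical, and DB2's own plan should be checked with its explain tools.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, price INTEGER, status TEXT)"
)
con.execute("CREATE INDEX ix_orders_price ON orders (price)")


def plan(sql: str) -> str:
    """Return SQLite's query plan for sql as one line of text."""
    return " | ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))


# Arithmetic applied to the indexed column: no index start/stop key is possible.
wrapped = plan("SELECT * FROM orders WHERE price * 2 > 100")
# Same condition after applying the reverse operation to the constant (100 / 2).
rewritten = plan("SELECT * FROM orders WHERE price > 50")

if __name__ == "__main__":
    print("wrapped:  ", wrapped)
    print("rewritten:", rewritten)
```

The two predicates select exactly the same rows; only the form of the predicate changes, and with it the access path available to the optimizer.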

    Match JOIN Column Types - Avoid joining columns whose data types do not match, because data type mismatches prevent the use of hash joins. Also note that if the JOIN column data type is CHAR, GRAPHIC, DECIMAL, or DECFLOAT, the lengths must be the same.

    Avoid Non-Equality JOINS - JOIN predicates that use comparison operators other than equality should be avoided, because the join method is limited to nested loop. Also, the optimizer might not be able to compute an accurate selectivity estimate for the JOIN predicate. When a non-equality JOIN cannot be avoided, be sure an appropriate index exists on either table, because the join predicates will be applied on the inner table of the nested loop join.

    Don't Use Distinct Aggregations - The DISTINCT keyword causes a sort of the final result set, making it one of the more expensive sorts. Note that as of DB2 V9 the optimizer looks for opportunities to use an index to eliminate the sort for uniqueness, as it already does when optimizing a GROUP BY. Rewriting the SQL script using GROUP BY or a subselect (or IN predicate) will usually be more efficient. Also avoid multiple DISTINCT aggregations (e.g., SUM(DISTINCT colx), AVG(DISTINCT coly)) in the same SELECT; this becomes very expensive, because the optimizer rewrites the original query into a separate aggregation and sort for each DISTINCT and then combines the multiple aggregations using a UNION operation.
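A sketch of the DISTINCT rewrites mentioned above, run against SQLite (standard sqlite3 module) only to confirm that the three forms return the same rows; the table names and data are hypothetical, and the relative cost of each form in DB2 has to be confirmed with DB2's own explain plans.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Birch'), (3, 'Cobalt');
    INSERT INTO orders (cust_id, amount) VALUES (1, 10), (1, 20), (1, 5), (3, 7);
    """
)

QUERIES = {
    # Original: the join repeats Acme three times and DISTINCT sorts the copies away.
    "distinct": """
        SELECT DISTINCT c.cust_id, c.name
        FROM customer c JOIN orders o ON o.cust_id = c.cust_id""",
    # Rewrite 1: an IN subquery tests existence, so customers never repeat.
    "in": """
        SELECT c.cust_id, c.name
        FROM customer c
        WHERE c.cust_id IN (SELECT o.cust_id FROM orders o)""",
    # Rewrite 2: GROUP BY on the selected columns.
    "group_by": """
        SELECT c.cust_id, c.name
        FROM customer c JOIN orders o ON o.cust_id = c.cust_id
        GROUP BY c.cust_id, c.name""",
}

results = {name: sorted(con.execute(sql).fetchall()) for name, sql in QUERIES.items()}

if __name__ == "__main__":
    for name, rows in results.items():
        print(f"{name:9} {rows}")
```

The IN form is a safe replacement here because every selected column comes from customer and the select list includes its primary key, so customer rows cannot repeat. If the select list also contained columns from orders, the DISTINCT would still be needed.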

    Avoid Outer Joins Unless Necessary - The left outer join can prevent a number of optimizations, including the use of specialized star-schema join access methods. However, in some cases the query optimizer can automatically rewrite a left outer join to an inner join, depending on the other predicates in the SQL script. The inner equijoin is often more efficient, so use it where possible.
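A common case where the outer join is unnecessary is a WHERE filter on a column of the outer-joined table. As a sketch (SQLite via the standard sqlite3 module is used only to check that the results match; the tables and data are hypothetical), the filter below discards the NULL-extended rows that the LEFT JOIN adds, so the query returns exactly what the inner equijoin returns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, status TEXT);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Birch'), (3, 'Cobalt');
    INSERT INTO orders (cust_id, status) VALUES (1, 'OPEN'), (1, 'SHIPPED'), (3, 'OPEN');
    """
)

# Birch has no orders, so the LEFT JOIN produces a row with o.status NULL;
# the WHERE filter then rejects it ...
left_join = """
    SELECT c.name, o.order_id FROM customer c
    LEFT JOIN orders o ON o.cust_id = c.cust_id
    WHERE o.status = 'OPEN'"""
# ... which makes the query equivalent to a plain inner equijoin.
inner_join = """
    SELECT c.name, o.order_id FROM customer c
    JOIN orders o ON o.cust_id = c.cust_id
    WHERE o.status = 'OPEN'"""

left_rows = sorted(con.execute(left_join).fetchall())
inner_rows = sorted(con.execute(inner_join).fetchall())

if __name__ == "__main__":
    print(left_rows == inner_rows, inner_rows)
```

The rewrite is only valid because the WHERE clause rejects NULLs from orders. If the filter were moved into the ON clause, or tested o.status IS NULL, the LEFT JOIN would return different rows and must stay.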

    Tell the Optimizer How Many Rows to Expect - When the size of the result set is known or can be closely estimated, use the OPTIMIZE FOR N ROWS clause along with the FETCH FIRST N ROWS ONLY clause. The OPTIMIZE FOR N ROWS clause tells the optimizer that the application intends to retrieve only N rows, although the query will still return the complete result set. The FETCH FIRST N ROWS ONLY clause specifies that the query should return only N rows. Use OPTIMIZE FOR N ROWS together with FETCH FIRST N ROWS ONLY to encourage access plans that return rows directly from the referenced tables. Such plans avoid first performing a buffering operation such as inserting into a temporary table, sorting, or inserting into a hash join hash table. NOTE: applications that specify OPTIMIZE FOR N ROWS to encourage plans that avoid buffering operations, but then retrieve all rows of the result set, could experience degraded performance. This is because the access plan that returns the first N rows fastest might not be the best plan if the entire result set is being retrieved.
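
    SQLite has no OPTIMIZE FOR N ROWS clause, and its LIMIT is only an analogue of FETCH FIRST N ROWS ONLY. The sketch below therefore uses a hypothetical table t to illustrate only the difference in what each clause promises. It shows no effect on the access plan. One query returns a capped result set. The other returns a complete result set of which the application reads only the first N rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 101)])

N = 5

# Like FETCH FIRST N ROWS ONLY: the result set itself is capped at N rows.
capped = conn.execute("SELECT id FROM t ORDER BY id LIMIT ?", (N,)).fetchall()

# Like OPTIMIZE FOR N ROWS: the application reads N rows first, but the
# query still produces the complete result set, which remains fetchable.
cur = conn.execute("SELECT id FROM t ORDER BY id")
first_n = cur.fetchmany(N)
rest = cur.fetchall()

print(len(capped), len(first_n), len(rest))  # 5 5 95
```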

    Avoid Redundant Predicates - Eliminate duplicate predicates, especially when they occur across different tables. In some cases, the optimizer cannot detect that the predicates are redundant. This might result in cardinality underestimation and the selection of a suboptimal access plan. Review the SQL for columns that hold the same data under different column names, where the same tests are being performed. Again, keep the predicates as simple as possible and remove the same test on similar columns wherever possible.
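
    The following sketch shows a redundant predicate, using SQLite as a stand-in and hypothetical tables. Removing c.id = 1 leaves the result unchanged, because the join predicate already implies it. The second half uses a deliberately naive, hypothetical estimator that multiplies the two selectivities as if they were independent. It is not DB2's cost model, but it shows how a duplicated test can lead to the cardinality underestimation described above.

```python
import sqlite3
from fractions import Fraction

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
# Nine orders spread evenly: three per customer.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i % 3 + 1) for i in range(9)])

# c.id = 1 is implied by the join predicate plus o.cust_id = 1, so it is redundant.
redundant_sql = """
    SELECT o.id FROM orders o JOIN customer c ON c.id = o.cust_id
    WHERE o.cust_id = 1 AND c.id = 1 ORDER BY o.id
"""
simplified_sql = """
    SELECT o.id FROM orders o JOIN customer c ON c.id = o.cust_id
    WHERE o.cust_id = 1 ORDER BY o.id
"""
redundant_rows = conn.execute(redundant_sql).fetchall()
simplified_rows = conn.execute(simplified_sql).fetchall()

def count(sql):
    return conn.execute(sql).fetchone()[0]

# Hypothetical estimator: treats both predicates as independent filters
# and multiplies their selectivities (illustration only).
join_rows = count("SELECT COUNT(*) FROM orders o JOIN customer c ON c.id = o.cust_id")
sel_orders = Fraction(count("SELECT COUNT(*) FROM orders WHERE cust_id = 1"),
                      count("SELECT COUNT(*) FROM orders"))
sel_customer = Fraction(count("SELECT COUNT(*) FROM customer WHERE id = 1"),
                        count("SELECT COUNT(*) FROM customer"))
naive_estimate = join_rows * sel_orders * sel_customer
actual = len(simplified_rows)
print(naive_estimate, actual)  # 1 3
```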

    Select Only the Columns Needed - Avoid using SELECT *, which returns every column of each row. This causes more I/O processing and slows down SORTs with needless data. Also, don't select columns whose values are already known from the SQL itself, which causes more unneeded data handling. For example, SELECT A, B, C FROM T WHERE C = 1958 causes column C data to be processed needlessly. Likewise, don't select columns used for sorting or grouping if these columns are not needed in the returned data set.
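
    The following small illustration uses SQLite as a stand-in and a hypothetical table T. The character count is only a crude proxy for the data handled. The point is that SELECT * drags the wide NOTES column along, while column C, which the predicate fixes, adds nothing to the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table T with a wide NOTES column to exaggerate the cost of SELECT *.
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c INTEGER, notes TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(i, f"name{i}", 1958 if i % 2 == 0 else 1960, "x" * 200) for i in range(10)],
)

def payload_chars(rows):
    """Crude proxy for data handled: total characters across all returned values."""
    return sum(len(str(value)) for row in rows for value in row)

star_rows = conn.execute("SELECT * FROM t WHERE c = 1958").fetchall()
abc_rows = conn.execute("SELECT a, b, c FROM t WHERE c = 1958").fetchall()
ab_rows = conn.execute("SELECT a, b FROM t WHERE c = 1958").fetchall()

# Column c carries no information: the predicate already fixes its value.
c_values = {row[2] for row in abc_rows}

print(payload_chars(star_rows), payload_chars(abc_rows), payload_chars(ab_rows))
```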

    Select Only the Rows Needed - Reducing the set of rows returned in a result set makes the query handle less data and run faster. Use row filter predicates to limit the rows being returned. When writing SQL with multiple predicates, determine the predicate that will filter out the most data and place it at the start of the list. By sequencing your predicates in this manner, the subsequent predicates will have less data to filter and process.
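
    The following sketch contrasts application-side filtering with a row filter predicate. It uses SQLite as a stand-in and a hypothetical sales table. Both approaches produce the same answer, but only the second limits the rows that leave the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(i, "EAST" if i % 10 == 0 else "WEST", i * 3) for i in range(1, 1001)],
)

# Anti-pattern: return every row and filter in the application.
all_rows = conn.execute("SELECT id, region, amount FROM sales ORDER BY id").fetchall()
filtered_in_app = [(i, amt) for i, region, amt in all_rows if region == "EAST"]

# Preferred: a row filter predicate, so only qualifying rows are returned.
filtered_in_sql = conn.execute(
    "SELECT id, amount FROM sales WHERE region = ? ORDER BY id", ("EAST",)
).fetchall()

print(len(all_rows), len(filtered_in_sql))  # 1000 100
```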

    Use an INDEX in Place of a SORT - Creating an index on commonly sorted data columns can save a SORT of the result set, because the rows can be read in index order.
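
    The effect is visible in SQLite's EXPLAIN QUERY PLAN output, used here as a stand-in for a DB2 explain. Before the hypothetical index exists, the plan includes a temporary B-tree for the ORDER BY. Afterwards the sort step disappears, because the index already returns rows in order. DB2 plans show this differently (as a SORT operator), and DB2's optimizer may make a different choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, last_name TEXT, hire_year INTEGER)")
conn.executemany("INSERT INTO emp (last_name, hire_year) VALUES (?, ?)",
                 [(f"name{n % 97}", 1990 + n % 30) for n in range(500)])

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT last_name FROM emp ORDER BY last_name"
before = plan(query)  # expected: table scan plus a temporary B-tree for the sort
conn.execute("CREATE INDEX emp_last_name_ix ON emp (last_name)")
after = plan(query)   # expected: index scan already in last_name order
print(before)
print(after)
```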

    5.5.4 REVIEW JOINS AND INDEXES

    Table joins should use indexed columns whenever possible for best performance. Review the JOINs and the columns used. Ideally, use the primary key for at least one of the tables. Using indexed columns in the JOINs lets the optimizer use the column statistics and the index to determine the best access path. It can also reduce I/O by reading the index rather than the table data. Using indexed columns in filtering predicates likewise reduces processing and data handling by taking advantage of the indexes and index access methods.
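
    SQLite serves as a stand-in here; DB2's join methods and choices differ. In SQLite, a join on the primary key of one table typically appears in the plan as a key lookup on that table for each row of the other table. The tables in this sketch are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, amount INTEGER);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, f"cust{i}") for i in range(1, 51)])
conn.executemany("INSERT INTO orders (cust_id, amount) VALUES (?, ?)",
                 [(n % 50 + 1, n) for n in range(1000)])

# Join on customer's primary key: each order row probes customer by key.
join_sql = "SELECT o.id, c.name FROM orders o JOIN customer c ON c.id = o.cust_id"
join_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + join_sql)]
print(join_plan)
```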

    5.5.5 REVIEW ALL SELECTED COLUMNS AND TABLE INDEXES

    Selected columns should be reviewed as well as the JOIN columns. The columns needed to satisfy the query may be available in the index used for a table JOIN or in an index used to access the table. If all of the selected columns are in an index, I/O processing can be limited to the index pages. This is known as an Index-Only Read, which is much more efficient than reading both the index and the data table. Note that UNIQUE indexes can have additional non-key data columns INCLUDEd in the index pages. This is very useful when the majority of needed columns are already in the index and only one or two more columns are needed from the data row. If the row contains many columns, having all of the needed columns in an index becomes significantly more efficient than the alternative.
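
    The following sketch shows an index-only read, using SQLite's EXPLAIN QUERY PLAN as a stand-in. In DB2 LUW the extra column would typically be added with CREATE UNIQUE INDEX ... INCLUDE (...). SQLite has no INCLUDE clause, so here amount simply becomes a second key column. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER,
                status TEXT, amount INTEGER, notes TEXT)""")
conn.executemany(
    "INSERT INTO orders (cust_id, status, amount, notes) VALUES (?, ?, ?, ?)",
    [(n % 100, "OPEN", n, "x" * 100) for n in range(1000)])

def plan(sql):
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT cust_id, amount FROM orders WHERE cust_id = 42"

# Index on the filter column only: amount must still be read from the table row.
conn.execute("CREATE INDEX orders_cust_ix ON orders (cust_id)")
before = plan(query)

# Add the one extra needed column to the index: the query becomes index-only.
conn.execute("DROP INDEX orders_cust_ix")
conn.execute("CREATE INDEX orders_cust_amt_ix ON orders (cust_id, amount)")
after = plan(query)
print(before)
print(after)
```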

    5.5.6 RETEST THE ENTIRE WORK LOAD AFTER SQL PERFORMANCE TUNING

    Making index changes while tuning individual SQL statements may have an unplanned impact on other parts of a given workload. It is important to retest the entire workload after tuning the SQL statements individually. Use the recorded baselines to compare performance improvements. Compare the ending explain plans and estimated TIMERONS (DB2's relative unit of estimated resource cost) with those captured for the baseline.
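
    The following minimal retest harness assumes SQLite as a stand-in database and a hypothetical two-statement workload. It records a baseline (best-of-three elapsed time and plan) for every statement, applies the tuning change, then reruns the whole workload and reports which plans changed. Elapsed times are noisy at this scale. On DB2, the comparison would use the explain plans and estimated timerons described above.

```python
import sqlite3
import time

def run_workload(conn, workload, repeats=3):
    """Return {label: (best_seconds, plan_details)} for every statement."""
    results = {}
    for label, sql in workload.items():
        plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            conn.execute(sql).fetchall()
            best = min(best, time.perf_counter() - start)
        results[label] = (best, plan)
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, "
             "status TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders (cust_id, status, amount) VALUES (?, ?, ?)",
                 [(n % 500, ("OPEN", "SHIPPED", "CLOSED")[n % 3], n) for n in range(20000)])

# Hypothetical workload: every statement is re-run, not just the one being tuned.
workload = {
    "by_customer": "SELECT amount FROM orders WHERE cust_id = 7",
    "total_by_status": "SELECT status, SUM(amount) FROM orders GROUP BY status",
}

baseline = run_workload(conn, workload)                          # before the change
conn.execute("CREATE INDEX orders_cust_ix ON orders (cust_id)")  # the tuning change
retest = run_workload(conn, workload)                            # after the change

for label in workload:
    (t0, p0), (t1, p1) = baseline[label], retest[label]
    print(f"{label}: {t0 * 1e3:.2f} ms -> {t1 * 1e3:.2f} ms, plan changed: {p0 != p1}")

plan_changed = {label for label in workload if baseline[label][1] != retest[label][1]}
```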


    5.5.7 DB2 INDEX ADVISOR

    DB2 has a tool to review and recommend INDEXES for a specified query workload. This tool reads a file of SQL statements and generates a list of used and recommended indexes for that workload (or statement), as well as a list of unused indexes. The output of this tool specifies the estimated percent performance improvement for each new recommended index and its expected size.

    Note that this tool may recommend several indexes to add for a given workload or statement. Adding indexes involves a trade-off of storage space and processing time, since every index must be maintained when rows are inserted, updated, or deleted. Be very cautious when adding indexes.

    See the IBM DB2 Information Center for further details of this tool.

    http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html
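
    The used/unused index report can be approximated for SQLite, which serves as a stand-in because db2advis itself requires a DB2 database. This hypothetical helper only checks which existing indexes appear in the workload's plans. Unlike the Design Advisor, it does not recommend new indexes or estimate improvements.

```python
import sqlite3

def index_usage(conn, workload):
    """Return (used, unused) index names based on EXPLAIN QUERY PLAN output."""
    # sql IS NOT NULL skips SQLite's internal automatic indexes.
    indexes = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL")}
    used = set()
    for sql in workload:
        for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
            tokens = row[-1].split()
            # Plan lines name an index as "... INDEX <name> ...".
            for i, token in enumerate(tokens[:-1]):
                if token == "INDEX" and tokens[i + 1] in indexes:
                    used.add(tokens[i + 1])
    return sorted(used), sorted(indexes - used)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, status TEXT, amount INTEGER);
    CREATE INDEX orders_cust_ix ON orders (cust_id);
    CREATE INDEX orders_status_ix ON orders (status);
    CREATE INDEX orders_amount_ix ON orders (amount);
""")
conn.executemany("INSERT INTO orders (cust_id, status, amount) VALUES (?, ?, ?)",
                 [(n % 50, ("OPEN", "CLOSED")[n % 2], n) for n in range(1000)])

workload = [
    "SELECT amount FROM orders WHERE cust_id = 3",
    "SELECT COUNT(*) FROM orders WHERE status = 'OPEN'",
]
used, unused = index_usage(conn, workload)
print("used:", used)
print("unused:", unused)
```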

    db2advis - DB2 design advisor command

    The DB2 Design Advisor advises users on the creation of materialized query tables (MQTs) and indexes, the repartitioning of tables, the conversion to multidimensional clustering (MDC) tables, and the deletion of unused objects.

    The recommendations are based on one or more SQL statements provided by the user. A group of related SQL statements is known as a workload. Users can rank the importance of each statement in a workload and specify the frequency at which each statement in the workload is to be executed. The Design Advisor outputs a DDL CLP script that includes CREATE INDEX, CREATE SUMMARY TABLE (MQT), and CREATE TABLE statements to create the recommended objects.
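
    The advisor's input workload can be generated from a list of statements and frequencies. The sketch below assumes the db2advis input-file convention of semicolon-terminated statements, each preceded by a --#SET FREQUENCY line. Verify that syntax against the Information Center for your DB2 level. The statements, schema, and function name are hypothetical.

```python
import tempfile
from pathlib import Path

def write_workload_file(statements, path):
    """Write (sql, frequency) pairs as a semicolon-delimited workload file,
    each statement preceded by a --#SET FREQUENCY line (assumed format)."""
    lines = []
    for sql, frequency in statements:
        if not isinstance(frequency, int) or frequency < 1:
            raise ValueError(f"frequency must be a positive integer, got {frequency!r}")
        lines.append(f"--#SET FREQUENCY {frequency}")
        lines.append(sql.strip().rstrip(";") + ";")  # exactly one terminator
    Path(path).write_text("\n".join(lines) + "\n")
    return Path(path)

# Hypothetical statements and frequencies, for illustration only.
workload = [
    ("SELECT ORDER_ID, AMOUNT FROM SALES.ORDERS WHERE CUST_ID = 42", 500),
    ("SELECT STATUS, SUM(AMOUNT) FROM SALES.ORDERS GROUP BY STATUS;", 10),
]
with tempfile.TemporaryDirectory() as tmp:
    text = write_workload_file(workload, Path(tmp) / "workload.sql").read_text()
print(text)
```

    The resulting file could then be supplied to the advisor, for example with db2advis -d <database> -i workload.sql.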


    5.6 Explain Tools

    DB2 provides two tools for generating Explain Plans for a given SQL statement. These tools are useful for reviewing and tuning queries, as they identify which indexes are being used and where table scans are being performed.
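
    The following sketch performs the same kind of check in SQLite, used as a stand-in for a DB2 explain. The hypothetical access_summary helper sorts the plan lines into index accesses and full table scans. This makes the effect of creating an index on the filtered column visible.

```python
import sqlite3

def access_summary(conn, sql):
    """Split EXPLAIN QUERY PLAN lines into index accesses and full table scans."""
    summary = {"index_access": [], "table_scans": []}
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        detail = row[-1]
        if "INDEX" in detail or "PRIMARY KEY" in detail:
            summary["index_access"].append(detail)
        elif detail.startswith("SCAN"):
            summary["table_scans"].append(detail)
    return summary

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)")
query = "SELECT id FROM emp WHERE dept = 'HR'"

before = access_summary(conn, query)  # expected: a full table scan
conn.execute("CREATE INDEX emp_dept_ix ON emp (dept)")
after = access_summary(conn, query)   # expected: a search through emp_dept_ix
print(before)
print(after)
```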

    5.6.1 VISUAL EXPLAIN TOOL

    This tool is available from the DB2 Control Center and will graphically display the Explain Plan for the SQL statement specified.

    See the IBM DB2 Information Center for further details of this tool.

    http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html

    Visual Explain

    Visual Explain lets you view the access plan for explained SQL or XQuery statements as a graph. You can use the information available from the graph to tune your queries for better performance.

    Important: Access to Visual Explain through the Control Center tools has been deprecated in Version 9.7 and might be removed in a future release. For more information, see "Control Center tools have been deprecated." Accessing Visual Explain functionality through the Data Studio toolset has not been deprecated.

    You can use Visual Explain to:

    View the statistics that were used at the time of optimization. You can then compare these statistics to the current catalog statistics to help you determine whether rebinding the package might improve performance.

    Determine whether or not an index was used to access a table. If an index was not used, Visual Explain can help you determine which columns might benefit from being indexed.

    View the effects of performing various tuning techniques by comparing the before and after versions of the access plan graph for a query.

    Obtain information about each operation in the access plan, including the total estimated cost and number of rows retrieved (cardinality).

    An access plan graph shows details of:

    Tables (and their associated columns) and indexes

    Operators (such as table scans, sorts, and joins)

    Table spaces and functions.
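
    Visual Explain draws the plan as a graph. A rough text analogue can be produced from SQLite's EXPLAIN QUERY PLAN rows, which carry a parent id for each node. This sketch uses SQLite as a stand-in, assumes SQLite 3.24 or later for that row layout, and uses hypothetical tables. It indents each operator under its parent.

```python
import sqlite3

def plan_tree(conn, sql):
    """Render EXPLAIN QUERY PLAN rows (id, parent, notused, detail) as an indented tree."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    children = {}
    for node_id, parent, _notused, detail in rows:
        children.setdefault(parent, []).append((node_id, detail))
    lines = []

    def walk(parent, depth):
        # Top-level nodes have parent 0; recurse into each node's children.
        for node_id, detail in children.get(parent, []):
            lines.append("  " * depth + detail)
            walk(node_id, depth + 1)

    walk(0, 0)
    return lines

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_emp (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE former_emp (id INTEGER PRIMARY KEY, name TEXT);
""")
sql = "SELECT name FROM current_emp UNION ALL SELECT name FROM former_emp"
tree = plan_tree(conn, sql)
print("\n".join(tree))
```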

    Note: Visual Explain cannot be invoked from the command line, only from various database objects in the Control Center.

    To start Visual Explain:

    From the Control Center, right-click a database name and select either Show Explained Statements History or Explain Query.

    From the Command Editor, execute an explainable statement on the Interactive page or the Script page.

    From the Query Patroller, click Show Access Plan from either the Managed Queries Properties notebook or the Historical Queries Properties notebook.


    5.6.2 DB2EXPLN FACILITY

    DB2 comes with an operating system level command to generate the Explain Plan for a given SQL statement.

    See the IBM DB2 Information Center for further details of this tool.

    http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.qb.dbconn.doc/doc/c0004770.html

    SQL and XQuery explain tool

    The db2expln command describes the access plan selected for SQL or XQuery statements.

    You can use this tool to obtain a quick explanation of the chosen access plan when explain data was not captured.

    For static SQL and XQuery statements, db2expln examines the packages that are stored in the system catalog. For dynamic SQL and XQuery statements, db2expln examines the sections in the query cache.

    The explain tool is located in the bin subdirectory of your instance sqllib directory. If db2expln is not in your current directory, it must be in a directory that appears in your PATH environment variable.

    The db2expln command uses the db2expln.bnd, db2exsrv.bnd, and db2exdyn.bnd files to bind itself to a database the first time the database is accessed.

    Description of db2expln output: Explain output from the db2expln command includes both package information and section information for each package.


    6.0 Appendix
