IBM Global Services - IBM eServer iSeries
© 2005 IBM Corporation
Indexing Strategies for DB2 UDB for iSeries
Shantan Kethireddy (shantank@us.ibm.com)
Agenda
• Types of indexes
• How are indexes used
• General approach to creating indexes
• The “perfect” index
• Examples
The material in this presentation was taken from the white paper: Indexing and Statistics Strategies for DB2 UDB for the iSeries.
• ibm.com/servers/enable/site/education/abstracts/indxng_abs.html
How Are Indexes Used and Why Are They Important
Binary Radix Index
• Key values are compressed
  – Common patterns are stored once
  – Unique portion stored in “leaf” pages
  – Positive impact on size and depth of the index tree
• Algorithm used to find values
  – Binary search
  – Modified to fit the data structure
• Very efficient process to find a unique value
• Maintenance
  – Index data is automatically spread across all available disk units
  – Tree is automatically rebalanced to maintain an efficient structure
Binary Radix Index
Database Table
001  ARKANSAS
002  MISSISSIPPI
003  MISSOURI
004  IOWA
005  ARIZONA
…    …
ADVANTAGES:
• Quick access to a single key value (for a million-entry index, only about 20 tests on average)
• Also efficient for a small, selected range of key values (low cardinality)
DISADVANTAGES:
• Table rows are retrieved in order of key values (not physical order), which equates to many random I/Os when selecting a large number of keys (high cardinality)
• No way to predict which physical index pages are next when traversing the index for a large number of key values
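[Figure: binary radix tree over the table above. ROOT and the test nodes lead to the shared prefixes AR and MISS and to the key IOWA (row 004). AR branches to IZONA (row 005) and KANSAS (row 001); MISS branches to ISSIPPI (row 002) and OURI (row 003).]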
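The "only 20 tests" figure for a million-entry index follows from halving the search range on every test. The sketch below uses a plain binary search over a sorted Python list, not the compressed radix tree itself; the function name probes_to_find is hypothetical and only serves to count the tests.

```python
import math

def probes_to_find(sorted_keys, target):
    """Binary-search sorted_keys for target; return (position or None, tests used)."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1                      # one "test" against a key
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1                 # continue in the upper half
        else:
            hi = mid - 1                 # continue in the lower half
    return None, probes

# A stand-in for a million-entry index: the keys 0 .. 999,999 in sorted order.
keys = list(range(1_000_000))

# Upper bound on tests for one million entries.
max_tests = math.ceil(math.log2(len(keys)))

for wanted in (0, 123_456, 499_999, 999_999):
    position, tests = probes_to_find(keys, wanted)
    print(wanted, position, tests, tests <= max_tests)
```

Each test discards half of the remaining entries, so the number of tests grows with the logarithm of the index size rather than with the size itself.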
Encoded Vector Index (EVI)
• New index object for delivering fast data access in decision support and query reporting environments
  – Complementary alternative to the existing index object (binary radix tree structure: keyed logical file or SQL index)
  – Advanced technology from IBM Research that is a variation on bitmap indexing
  – Easy-to-access data statistics improve query optimizer decision making
• Can only be created through an SQL interface
  – CREATE ENCODED VECTOR INDEX Library/EVI_Name ON Library/Table_Name (Column) WITH n DISTINCT VALUES
Encoded Vector Index (EVI)
Vector
Row Number   Code
1            1
2            17
3            18
4            9
5            2
6            7
7            38
8            38
9            1
Symbol Table
Key Value   Code   First Row   Last Row   Count
Arizona     1      1           80005      5000
Arkansas    2      5           99760      7300
…
Virginia    37     1222        30111      340
Wyoming     38     7           83000      2760
What is it?
• New type of index: *FILE object type, LF attribute
• Composed of two parts:
  – Symbol table: contains information for each distinct key value. Each key value is assigned a unique code. The code is 1, 2, or 4 bytes, depending on the number of distinct key values
  – Vector: rather than a bit array for each distinct key value, the index has one array of codes
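The two parts of an EVI can be sketched in a few lines. This is only an illustration of the symbol table and vector described above, not how DB2 stores them: the names build_evi, SymbolEntry and code_width_bytes are hypothetical, codes are assumed to be handed out in sorted key order starting at 1, and the 1/2/4-byte thresholds assume unsigned codes.

```python
from dataclasses import dataclass

@dataclass
class SymbolEntry:
    code: int        # unique code assigned to the key value
    first_row: int   # first row number holding the key value
    last_row: int    # last row number holding the key value
    count: int       # number of rows holding the key value

def build_evi(column_values):
    """Return (symbol_table, vector) for a column; rows are numbered from 1."""
    symbol_table = {}
    # Assumption: codes are assigned in sorted key order, starting at 1.
    for code, key in enumerate(sorted(set(column_values)), start=1):
        symbol_table[key] = SymbolEntry(code, 0, 0, 0)
    vector = []
    for row, key in enumerate(column_values, start=1):
        entry = symbol_table[key]
        if entry.count == 0:
            entry.first_row = row
        entry.last_row = row
        entry.count += 1
        vector.append(entry.code)   # one code per table row
    return symbol_table, vector

def code_width_bytes(distinct_values):
    """Bytes per code, assuming unsigned 1-, 2- or 4-byte codes."""
    if distinct_values <= 255:
        return 1
    if distinct_values <= 65_535:
        return 2
    return 4

states = ["Arizona", "Wyoming", "Arkansas", "Arizona"]
symbols, vector = build_evi(states)
print(vector)
print(symbols["Arizona"])
```

The symbol table doubles as a set of ready-made statistics (distinct values and their counts), which is why the optimizer can use an EVI even when it does not run the query through it.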
Index Selection
• Selection criteria are applied to ranges of index entries to quickly get a subset of rows before the table is retrieved.
  – Advantages:
    • Only those index entries that are within a selected range are processed
    • Provides quick access to rows in an OLTP environment
  – Potential Disadvantages:
    • Can perform poorly when a large number of rows are selected
    • Requires a separate random I/O against the table to extract the values
  – Rule of Thumb:
    • Used when only asking for or expecting a few rows returned from the index
    • Used when sequencing the rows is required for ordering or grouping
    • The selection columns match the first (n) key fields of the index
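Index selection as described above can be sketched with a sorted list of (key, row number) entries. The table contents and the helper names select_rows and rows_for_year are hypothetical; the point is that selection on the leading key columns maps to one contiguous range of index entries, followed by one table lookup per entry.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: row number -> (year, quarter, customer)
table = {
    1: (2000, 4, "Acme"),
    2: (1999, 1, "Bolt"),
    3: (2000, 4, "Cask"),
    4: (2000, 1, "Dune"),
    5: (2001, 4, "Echo"),
}

# Index keyed on (year, quarter): entries sorted by key, each pointing at a row.
index = sorted(((year, quarter), rrn) for rrn, (year, quarter, _) in table.items())
keys = [key for key, _ in index]

def select_rows(year, quarter):
    """Selection on both key columns: position to the range, then fetch each row."""
    lo = bisect_left(keys, (year, quarter))
    hi = bisect_right(keys, (year, quarter))
    # Each fetch stands in for a random I/O against the table.
    return [table[rrn] for _, rrn in index[lo:hi]]

def rows_for_year(year):
    """Selection on the first key column only still maps to one contiguous range."""
    lo = bisect_left(keys, (year,))
    hi = bisect_left(keys, (year + 1,))
    return [rrn for _, rrn in index[lo:hi]]

print(select_rows(2000, 4))
print(rows_for_year(2000))
```

A predicate on quarter alone would not map to a contiguous range in this index, which is the reason for the rule that selection columns should match the first (n) key fields.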
General Approach to Creating Indexes
Proactive Query Tuning
• Remember, the goal of creating indexes is to give the optimizer the statistics and implementation choices it needs while it is choosing an access plan for the query.
  – Requires an understanding of the database model and types of queries that will be run against it
  – Build indexes for the largest or most commonly used queries
  – For ad-hoc (OLAP) or less frequently used queries, build single-key EVIs over the local selection columns used in the queries
  – Make sure that statistics exist for the most and least selective columns for the query
    • This may mean creating an index that will never be used to implement the query but only to provide the correct statistics
  – Customize this approach to your own environment and query needs
Reactive Query Tuning
• Reactive query tuning means developing the application and any initial indexes, then running the application to see what gets used or created by the optimizer.
  – Usually highlights the slower running queries, even on a subset of the entire database records (test database)
  – Useful for tuning existing applications that are not performing as expected
  – Use the feedback from the optimizer to discover:
    • Any indexes or statistics the optimizer recommends for local selection
    • Any temporary indexes used for the query
    • The implementation method(s) that the optimizer has chosen to run the queries
  – Use the index advisor to help guide you as to what local selection columns may provide the best index coverage
  – Create permanent indexes over the same columns that any temporary indexes were created upon. Try to eliminate the temporary index builds
    • This also applies to temporary hash tables built over the entire table with no selection applied
Other Indexing Tips
• Avoid null-capable columns if expecting to use index-only access. Index-only access is not available when a key column in the index is null capable
• Avoid derived expressions in local selection. Access via an index may not be used for predicates that have derived values.
  – T1.ShipDate > (CURRENT DATE - 10 DAYS)
  – UPPER(T1.CustomerName) = 'SMITH'
• Index access is not used for predicates where both operands come from the same table
  – T1.ShipDate > T1.OrderDate
• Consider index-only access if all of the columns used in the query are represented in the index as key columns
• Use the most selective columns as keys in the index
  – Preference should be given to columns used in equal comparisons
• For key columns that are unique, specify the UNIQUE keyword when creating the index
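The effect of a derived expression on index use can be observed in any engine that exposes its access plan. The sketch below uses SQLite through Python's sqlite3 module purely as an analogy; it is a different optimizer from DB2 UDB for iSeries, and the table and index names are hypothetical. It compares the plan for a direct comparison on an indexed column with the plan for the same column wrapped in UPPER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX ix_customers_name ON customers (name)")

def plan(sql):
    """Return the plan detail text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in rows)   # column 3 holds the detail text

# Direct comparison on the indexed column: the index can be probed for the value.
direct = plan("SELECT id FROM customers WHERE name = 'SMITH'")

# Derived value: the index keys are not the UPPER() results, so no probe is possible.
derived = plan("SELECT id FROM customers WHERE UPPER(name) = 'SMITH'")

print(direct)
print(derived)
```

In SQLite the first plan is a keyed search and the second is a scan. The deck's advice for DB2 is the same in spirit: keep the indexed column bare on one side of the comparison where possible.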
The “Perfect” Index
Perfect Index Guidelines
• Order of the columns in an index is very important. The optimizer may not use an index if the columns are in an incorrect order. Use the following guidelines:
  – Equal predicates first. Predicates using the “=” operator generally eliminate the largest number of non-participating rows and should therefore be first in the index
  – If all of the predicates have an equal operator, then order the columns as follows:
    • Selection predicates + join predicates
    • Join predicates + selection predicates
    • Selection predicates + group by columns
    • Selection predicates + order by columns
  – Always place the most selective columns as the first key in the index
  – Create perfect indexes ahead of time for pre-determined queries or queries that produce a standard report
  – Indexes take up system resources; find a balance between query performance and system (index) maintenance
  – A binary radix index is the fastest data access method available for a query that is highly selective and returns a small number of rows
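The ordering guideline can be written down as a small helper. This is a sketch, not an IBM tool: perfect_index_columns and create_index_sql are hypothetical names, and the caller is assumed to pass the equal-predicate columns already sorted from most to least selective. Non-equal (range) columns are placed last, as in the deck's Non-Equal Predicates example.

```python
def perfect_index_columns(equal_cols, trailing_cols=(), non_equal_cols=()):
    """Order key columns: equal-predicate selection columns first, then the
    join / GROUP BY / ORDER BY columns, then any non-equal (range) columns."""
    ordered = []
    for col in (*equal_cols, *trailing_cols, *non_equal_cols):
        if col not in ordered:          # a column appears only once in the key
            ordered.append(col)
    return ordered

def create_index_sql(name, table, columns):
    """Format a CREATE INDEX statement for the chosen key order."""
    return f"CREATE INDEX {name} ON {table} ({', '.join(columns)})"

# Selection predicates + order by columns (the one-table example in this deck).
one_table = perfect_index_columns(
    ["Year", "Quarter", "ShipMode", "ReturnFlag"],
    trailing_cols=["Customer_Number", "Item_Number"],
)
print(create_index_sql("Perfect_Index", "Items", one_table))

# Equal selection + join column, with the range column last.
time_dim = perfect_index_columns(["Quarter"], ["TimeKey"], non_equal_cols=["Year"])
print(create_index_sql("Perfect_TimeDim_Index", "TimeDim", time_dim))
```

The helper only encodes column order. Deciding which equal-predicate column is the most selective still depends on the data and on the other queries that share the index.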
Using a Query Graph
• Queries can also be represented as a graph to help visualize what columns should be considered for index creation:
  – Separate all of the tables and major functions in the query into different nodes of the graph
    • Create a different node for each table, grouping, ordering or join requirement
  – Push all of the selection to the lowest level possible in the graph
  – Process the columns starting at the bottom of the graph to determine which ones should be included in any indexes
  – Use the Perfect Index Guidelines to determine what should be included in the index
• Columns at the top of the graph may be better suited for a column stat rather than a permanent index.
Examples
One-Table Query
SELECT Customer, Customer_Number, Item_Number
FROM Items
WHERE Year = 2000
  AND Quarter = 4
  AND ReturnFlag = 'R'
  AND ShipMode = 'AIR'
ORDER BY Customer_Number, Item_Number
Query has four local selection predicates and two ORDER BY columns. Follow the general guideline of Selection predicates + order by columns:
CREATE INDEX Perfect_Index ON Items (Year, Quarter, ShipMode, ReturnFlag, Customer_Number, Item_Number)
Place the most selective selection predicates first in this index, based upon the database model and other queries' use of the same columns.
One-Table Query Graph
Always start looking at the bottom of the graph for the columns to place into the index first. Then work your way up the graph looking for additional columns.
[Query graph, listed from top to bottom]
Final Select
ORDER BY Customer_Number, Item_Number
Table (Items)
  WHERE Year = 2000 AND Quarter = 4 AND ReturnFlag = 'R' AND ShipMode = 'AIR'
Three-Table Join Query
SELECT T3.Year, T1.Customer_Name, SUM(T2.Revenue_WO_Tax)
FROM CustDim T1, SalesFact T2, TimeDim T3
WHERE T2.CustKey = T1.CustKey
  AND T2.TimeKey = T3.TimeKey
  AND T3.Year IN (2000, 2001)
  AND T3.Quarter = 1
  AND T1.Continent = 'America'
  AND T1.Country = 'United States'
  AND T1.Region = 'Central'
  AND T1.Territory = 'Five'
GROUP BY T3.Year, T1.Customer_Name
ORDER BY T1.Customer_Name, T3.Year
Query has two join predicates and six selection predicates. Focus first on the selection predicates for each table in the query.
Three-Table Join Query Graph
[Query graph, listed from top to bottom]
Final Select
ORDER BY T1.Customer_Name, T3.Year
GROUP BY T3.Year, T1.Customer_Name
Joins: T2.CustKey = T1.CustKey; T2.TimeKey = T3.TimeKey
Table (CustDim)
  WHERE T1.Continent = 'America' AND T1.Country = 'United States' AND T1.Region = 'Central' AND T1.Territory = 'Five'
Table (SalesFact)
Table (TimeDim)
  WHERE T3.Year IN (2000, 2001) AND T3.Quarter = 1
The columns at the top of the graph may be better suited for a column stat rather than an index.
Three-Table Join Query
SELECT T3.Year, T1.Customer_Name, SUM(T2.Revenue_WO_Tax)
FROM CustDim T1, SalesFact T2, TimeDim T3
WHERE T2.CustKey = T1.CustKey
  AND T2.TimeKey = T3.TimeKey
  AND T3.Year IN (2000, 2001)
  AND T3.Quarter = 1
  AND T1.Continent = 'America'
  AND T1.Country = 'United States'
  AND T1.Region = 'Central'
  AND T1.Territory = 'Five'
GROUP BY T3.Year, T1.Customer_Name
ORDER BY T1.Customer_Name, T3.Year
The TimeDim table has two equal selection predicates and one join predicate. Use the general guideline for Selection predicates + join predicates:
CREATE INDEX Perfect_TimeDim_Index ON TimeDim (Year, Quarter, TimeKey)
Three-Table Join Query
SELECT T3.Year, T1.Customer_Name, SUM(T2.Revenue_WO_Tax)
FROM CustDim T1, SalesFact T2, TimeDim T3
WHERE T2.CustKey = T1.CustKey
  AND T2.TimeKey = T3.TimeKey
  AND T3.Year IN (2000, 2001)
  AND T3.Quarter = 1
  AND T1.Continent = 'America'
  AND T1.Country = 'United States'
  AND T1.Region = 'Central'
  AND T1.Territory = 'Five'
GROUP BY T3.Year, T1.Customer_Name
ORDER BY T1.Customer_Name, T3.Year
The CustDim table has four equal selection predicates and one join predicate. Again use the general guideline for Selection predicates + join predicates:
CREATE INDEX Perfect_CustDim_Index ON CustDim (Continent, Country, Region, Territory, CustKey)
Three-Table Join Query
SELECT T3.Year, T1.Customer_Name, SUM(T2.Revenue_WO_Tax)
FROM CustDim T1, SalesFact T2, TimeDim T3
WHERE T2.CustKey = T1.CustKey
  AND T2.TimeKey = T3.TimeKey
  AND T3.Year IN (2000, 2001)
  AND T3.Quarter = 1
  AND T1.Continent = 'America'
  AND T1.Country = 'United States'
  AND T1.Region = 'Central'
  AND T1.Territory = 'Five'
GROUP BY T3.Year, T1.Customer_Name
ORDER BY T1.Customer_Name, T3.Year
The SalesFact table has only two join predicates. Since we don’t know the order in which the tables will be joined, we must give the optimizer the flexibility of an index for each join predicate:
CREATE INDEX Perfect_SalesFact_Index1 ON SalesFact (CustKey)
CREATE INDEX Perfect_SalesFact_Index2 ON SalesFact (TimeKey)
Non-Equal Predicates
SELECT T3.Year, T1.Customer_Name, SUM(T2.Revenue_WO_Tax)
FROM CustDim T1, SalesFact T2, TimeDim T3
WHERE T2.CustKey = T1.CustKey
  AND T2.TimeKey = T3.TimeKey
  AND T3.Year >= 2000
  AND T3.Quarter = 1
  AND T1.Continent = 'America'
  AND T1.Country = 'United States'
  AND T1.Region = 'Central'
  AND T1.Territory = 'Five'
GROUP BY T3.Year, T1.Customer_Name
ORDER BY T1.Customer_Name, T3.Year
CREATE INDEX Perfect_TimeDim_Index ON TimeDim (Quarter, TimeKey, Year)
Inequalities tend to return more rows because they deal with a range of values rather than the single value of an equal operator. Thus they should be placed at the end of the index.
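One way to see why the range column belongs after the equal columns is to count how many index entries lie between the first and last matching key. The sketch below uses hypothetical (Year, Quarter) data and leaves TimeKey out for brevity; entries_scanned is an illustrative helper, not a DB2 measurement.

```python
from itertools import product

# Hypothetical TimeDim keys: six years, four quarters each.
rows = list(product(range(1998, 2004), range(1, 5)))   # (year, quarter) pairs

def matches(year, quarter):
    """The selection from the query: Quarter = 1 AND Year >= 2000."""
    return quarter == 1 and year >= 2000

def entries_scanned(key_order):
    """Return (entries between first and last match, number of matches)
    for an index sorted by key_order."""
    index = sorted(rows, key=key_order)
    hits = [i for i, (year, quarter) in enumerate(index) if matches(year, quarter)]
    return hits[-1] - hits[0] + 1, len(hits)

# Equal column first, range column last: matches form one contiguous run.
equal_first = entries_scanned(lambda r: (r[1], r[0]))   # (Quarter, Year)

# Range column first: matches are spread through the Year >= 2000 range.
range_first = entries_scanned(lambda r: (r[0], r[1]))   # (Year, Quarter)

print(equal_first, range_first)
```

With the equal column leading, every entry in the scanned range is a match; with the range column leading, the same matches are spread over a wider span of entries that must be passed over.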
Non-Equal Predicates Graph
[Query graph, listed from top to bottom]
Final Select
ORDER BY T1.Customer_Name, T3.Year
GROUP BY T3.Year, T1.Customer_Name
Joins: T2.CustKey = T1.CustKey; T2.TimeKey = T3.TimeKey
Table (CustDim)
  WHERE T1.Continent = 'America' AND T1.Country = 'United States' AND T1.Region = 'Central' AND T1.Territory = 'Five'
Table (SalesFact)
Table (TimeDim)
  WHERE T3.Year >= 2000 AND T3.Quarter = 1
The non-equal predicate can be added as the last key in the index, or a column stat could be created.
EVI One-Table Query
SELECT Customer, Customer_Number, Item_Number
FROM Items
WHERE Year = 2000
  AND Quarter = 4
  AND ReturnFlag = 'R'
  AND ShipMode = 'AIR'
ORDER BY Customer_Number, Item_Number
CREATE ENCODED VECTOR INDEX Perfect_EVI1 ON Items (Year)
CREATE ENCODED VECTOR INDEX Perfect_EVI2 ON Items (Quarter)
CREATE ENCODED VECTOR INDEX Perfect_EVI3 ON Items (ReturnFlag)
CREATE ENCODED VECTOR INDEX Perfect_EVI4 ON Items (ShipMode)
Dynamic bitmaps will be created from the EVIs and the results will be ANDed together to satisfy the query request.
When a query is not very selective (20% - 70%), skip sequential is usually the best method. EVIs can be scanned more efficiently. EVIs can be used for selection in these queries; however, they cannot be used for ordering, grouping or joins.
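The ANDing of dynamic bitmaps can be sketched with Python integers used as bit arrays. The rows and the bitmap helper are hypothetical, and each bitmap is built here straight from the column values rather than from an EVI's vector and symbol table; the combining step is the part being illustrated.

```python
# Hypothetical Items rows: (Year, Quarter, ReturnFlag, ShipMode)
items = [
    (2000, 4, "R", "AIR"),
    (2000, 4, "N", "AIR"),
    (1999, 4, "R", "AIR"),
    (2000, 4, "R", "RAIL"),
    (2000, 4, "R", "AIR"),
]

def bitmap(column, value):
    """Build a bitmap in which bit i is set when row i (0-based) holds value."""
    bits = 0
    for i, row in enumerate(items):
        if row[column] == value:
            bits |= 1 << i
    return bits

# One bitmap per single-column selection, then AND them together.
combined = (
    bitmap(0, 2000)      # Year = 2000
    & bitmap(1, 4)       # Quarter = 4
    & bitmap(2, "R")     # ReturnFlag = 'R'
    & bitmap(3, "AIR")   # ShipMode = 'AIR'
)

# Only rows whose bit survived the AND need to be read from the table.
selected_rows = [i for i in range(len(items)) if (combined >> i) & 1]
print(bin(combined), selected_rows)
```

Because each bitmap covers a single column, the same bitmaps can be recombined for other queries that select on a different subset of these columns.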
EVI One-Table Query
SELECT Customer, Customer_Number, Item_Number
FROM Items
WHERE Year = 2000
  AND Month IN (1, 2, 3)
  AND ReturnFlag = 'R'
  AND ShipMode = 'RAIL'
In this case, the EVI indexes we already created on Year, ReturnFlag and ShipMode can be reused for this new query.
Either a new index over the Month column could be created, or we can leave that selection to be processed as post dynamic bitmap selection.
CREATE ENCODED VECTOR INDEX Perfect_EVI5 ON Items (Month)
Indexing Strategy Summary
• Indexes don’t have to be used by the optimizer to be considered helpful
• Proactively create indexes that you know will be useful based upon the database model
• Use tools to help you reactively create indexes that are still required
• Remember, the perfect index can be different for every single query; try to find the right blend of indexes for your environment
• Read the white paper: “Indexing and Statistics Strategies for DB2 UDB for the iSeries”
  – ibm.com/servers/enable/site/education/abstracts/indxng_abs.html
Additional Information
• DB2 UDB for iSeries home page – http://www.iseries.ibm.com/db2
• Newsgroups
  – USENET: comp.sys.ibm.as400.misc, comp.database.ibm-db2
  – iSeries Network (NEWS/400 Magazine) SQL & DB2 Forum – http://www.iseriesnetwork.com/Forums/main.cfm?CFApp=59
• Education Resources – Classroom & Online
  – http://www.iseries.ibm.com/db2/db2educ_m.htm
  – http://www.iseries.ibm.com/developer/education/ibo/index.html
• DB2 UDB for iSeries Publications
  – Online Manuals: http://www.iseries.ibm.com/db2/books.htm
  – Porting Help: http://www.iseries.ibm.com/developer/db2/porting.html
  – DB2 UDB for iSeries Redbooks (http://ibm.com/redbooks)
    • Stored Procedures & Triggers on DB2 UDB for iSeries (SG24-6503)
    • DB2 UDB for AS/400 Object Relational Support (SG24-5409)
    • SQL Query Engine (http://publib-b.boulder.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg2456598.html)
  – SQL/400 Developer’s Guide by Paul Conte & Mike Cravitz
    • http://iseriesnetwork.com/str/books/Uniquebook2.cfm?NextBook=183
  – iSeries and AS/400 SQL at Work by Howard Arner
    • http://www.sqlthing.com/books.htm
• Please send questions or comments to rchudb@us.ibm.com