Indexing Strategies

Indexing Primer

John LaSpada

Senior SQL Architect – IBM DBA Solution Center of Excellence

Overview

Table Concepts

Table Structures & Heaps

Index Concepts

Finding the Right Balance

Table Scans

Clustered vs. Non-clustered Indexes

Index for Performance

Table Concepts

Base unit for data storage = table

Collection of unordered pages = heap

Table with no index = heap = exhaustive searches

Why exhaustive?

Table scans, table scans -> Not Good

No indexes means no guarantee of uniqueness

No order to the data: there could be more matching records even if you find a match on the first row of the first page (e.g. WHERE ID = 1)

Table Structure – Heap

Table without a Clustered Index

Records are NOT ORDERED

No Doubly-Linked List (ordering algorithm or sequence)

If NO Indexes exist – a full Table Scan occurs.
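
A minimal sketch of the heap case, assuming SQL Server and a hypothetical table named dbo.DemoHeap (the table and column names are illustrative and do not come from the deck). With no index of any kind, the catalog reports the table as a heap, and the WHERE ID = 1 query has no access path other than reading every page.

```sql
-- Hypothetical demo table: no primary key and no clustered index, so it is stored as a heap.
CREATE TABLE dbo.DemoHeap
(
    ID   int         NOT NULL,
    Name varchar(50) NOT NULL
);

INSERT INTO dbo.DemoHeap (ID, Name) VALUES (1, 'first');
INSERT INTO dbo.DemoHeap (ID, Name) VALUES (2, 'second');
-- Nothing enforces uniqueness, so a second row with ID = 1 is accepted.
INSERT INTO dbo.DemoHeap (ID, Name) VALUES (1, 'duplicate ID');

-- A heap appears in sys.indexes with index_id = 0 and type_desc = 'HEAP'.
SELECT name, index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.DemoHeap');

-- The engine cannot stop at the first match: it has to scan the whole table.
SELECT ID, Name
FROM dbo.DemoHeap
WHERE ID = 1;
```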


If a tree were data and you were looking for leaves with a certain property, you would have two options to find that data:

1) Touch every leaf, reading each one to determine whether it holds that property … SCAN

2) Root -> Branch -> Leaves … SEEK
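
The two options can be compared side by side in an estimated plan. This is only a sketch, assuming SQL Server and a client such as SSMS or sqlcmd that understands the GO batch separator; dbo.DemoLeaves and IX_DemoLeaves_Color are hypothetical names. The optimizer makes the final choice, so treat the comments as what the plan would typically show rather than a guarantee.

```sql
-- Hypothetical table and index, used only for illustration.
CREATE TABLE dbo.DemoLeaves
(
    LeafID int         NOT NULL,
    Color  varchar(20) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_DemoLeaves_Color ON dbo.DemoLeaves (Color);
GO

-- Return the estimated plan as text instead of executing the queries.
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- Predicate on the indexed column: eligible for an Index Seek (root -> branch -> leaf).
SELECT Color FROM dbo.DemoLeaves WHERE Color = 'red';

-- Predicate on a column with no index: every row has to be touched (Table Scan).
SELECT LeafID FROM dbo.DemoLeaves WHERE LeafID = 42;
GO

SET SHOWPLAN_TEXT OFF;
GO
```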


Pros

Can speed up access to data: more options than exhaustive table scans

Can guarantee uniqueness of data

Can offer better lock granularity

Generally lead to better balanced performance when indexed appropriately

Cons

Add overhead to INSERTs and DELETEs (pointers must be updated); add overhead to UPDATEs only when an indexed column is modified

Add overhead in terms of disk space

Add overhead in terms of maintenance (there are strategies for minimizing this)

Finding a Balance

Start with a minimal number of indexes:

Clustered Index: how the base data is stored (1 per table)

Primary Key

Unique Keys

Manually index foreign keys: speeds up join performance

Use ITW (Index Tuning Wizard)

Manually index based on either:

Execution Plans

WHERE Clauses

Query frequency
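
A possible starting set of indexes, sketched under the assumption of SQL Server and two hypothetical tables (dbo.DemoCustomer and dbo.DemoOrder; all names are illustrative). The primary keys supply the clustered indexes, the unique key is backed by its own index, and the foreign key column is indexed by hand because SQL Server does not create that index automatically.

```sql
-- Hypothetical parent table.
CREATE TABLE dbo.DemoCustomer
(
    CustomerID int          NOT NULL
        CONSTRAINT PK_DemoCustomer PRIMARY KEY CLUSTERED,   -- the one clustered index
    Email      varchar(100) NOT NULL
        CONSTRAINT UQ_DemoCustomer_Email UNIQUE             -- unique key, enforced by a unique index
);

-- Hypothetical child table referencing the parent.
CREATE TABLE dbo.DemoOrder
(
    OrderID    int NOT NULL
        CONSTRAINT PK_DemoOrder PRIMARY KEY CLUSTERED,
    CustomerID int NOT NULL
        CONSTRAINT FK_DemoOrder_DemoCustomer
        FOREIGN KEY REFERENCES dbo.DemoCustomer (CustomerID)
);

-- The foreign key constraint does not create an index on DemoOrder.CustomerID,
-- so add one manually to support joins back to DemoCustomer.
CREATE NONCLUSTERED INDEX IX_DemoOrder_CustomerID
    ON dbo.DemoOrder (CustomerID);
```

Anything beyond this baseline would then be justified by execution plans, WHERE clauses, or query frequency.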

Dreaded TABLE SCANS

Caused when a search is performed on a column that is not indexed

Constructs that can cause scans: SELECT *, OR, BETWEEN, LIKE, IN (try to stay away)

Utilize execution plans and the optimizer's hints!
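
Whether one of these constructs scans often depends on how the predicate is written. The sketch below assumes SQL Server and a hypothetical dbo.DemoPerson table (names are illustrative); the comments describe typical optimizer behavior, which should be confirmed in the execution plan.

```sql
-- Hypothetical table with an index on each searched column.
CREATE TABLE dbo.DemoPerson
(
    PersonID  int         NOT NULL CONSTRAINT PK_DemoPerson PRIMARY KEY CLUSTERED,
    LastName  varchar(50) NOT NULL,
    BirthDate datetime    NOT NULL
);

CREATE NONCLUSTERED INDEX IX_DemoPerson_LastName  ON dbo.DemoPerson (LastName);
CREATE NONCLUSTERED INDEX IX_DemoPerson_BirthDate ON dbo.DemoPerson (BirthDate);

-- Leading wildcard: the sort order of the index cannot narrow the search, so expect a scan.
SELECT PersonID FROM dbo.DemoPerson WHERE LastName LIKE '%son';

-- Trailing wildcard: matches one contiguous range of the index, so a seek is possible.
SELECT PersonID FROM dbo.DemoPerson WHERE LastName LIKE 'John%';

-- Function wrapped around the column: typically prevents a seek on BirthDate.
SELECT PersonID FROM dbo.DemoPerson WHERE YEAR(BirthDate) = 1980;

-- Same filter as a range on the bare column: a seek is possible.
SELECT PersonID FROM dbo.DemoPerson
WHERE BirthDate >= '19800101' AND BirthDate < '19810101';
```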

Demo

Clustered Indexes

Clustered Index = Actual table sorted in order of the clustered key

Each table = 1 Clustered Index

New rows can either be added at the end (if sequential), or the row has to be inserted into the correct data page, which might require a page split if there is not enough room on the page for the new row.

Pointers maintain the order between the pages, so rows in other pages do not have to move.

Clustered Index Candidates

Identity columns are ideal…

Narrow small keys (int)

Unique: minimal overhead, data takes care of uniqueness

Improved Performance

Reduces fragmentation: improves up time!

Minimizes cache
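
A short sketch of such a candidate, assuming SQL Server and a hypothetical dbo.DemoEvent table (names are illustrative). The key is a 4-byte int, unique by construction, and ever-increasing, so new rows are appended at the end of the clustered index rather than forced into the middle of existing pages.

```sql
-- Hypothetical table clustered on a narrow, unique, sequential identity key.
CREATE TABLE dbo.DemoEvent
(
    EventID   int IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_DemoEvent PRIMARY KEY CLUSTERED,
    EventName varchar(100) NOT NULL
);

-- EventID is generated by the engine; each new row gets the next value
-- and therefore belongs at the end of the clustered index.
INSERT INTO dbo.DemoEvent (EventName) VALUES ('first');
INSERT INTO dbo.DemoEvent (EventName) VALUES ('second');

SELECT EventID, EventName
FROM dbo.DemoEvent
ORDER BY EventID;
```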

Non Clustered Index

249 allowed per table (999 from SQL Server 2008 onward); each takes additional space.

Similar to indexes in back of books

Contains the indexed columns and a pointer or bookmark pointing to the actual row.

Think of a Google search…

Demo 2

Hints

Covered Indexes - All of the columns requested in the output are covered by a single index.

Crucial queries - Consider creating a covering index to give the query the best performance.

Avoid “Bookmark Lookup” in the execution plan.

Anytime you see "Hash" in your plan it means temp tables and this can be done better!

Use Stored Procedures whenever you can.
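
The covering-index hint above could look like the following. This is a sketch, assuming SQL Server 2005 or later (for the INCLUDE clause) and a hypothetical dbo.DemoSale table; the table, column, and index names are illustrative.

```sql
-- Hypothetical table for a "crucial query" that filters on CustomerID.
CREATE TABLE dbo.DemoSale
(
    SaleID     int IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_DemoSale PRIMARY KEY CLUSTERED,
    CustomerID int      NOT NULL,
    SaleDate   datetime NOT NULL,
    Amount     money    NOT NULL
);

-- Key column supports the WHERE clause; included columns supply the rest of the output,
-- so the query can be answered from this index alone.
CREATE NONCLUSTERED INDEX IX_DemoSale_CustomerID_Covering
    ON dbo.DemoSale (CustomerID)
    INCLUDE (SaleDate, Amount);

-- Every requested column is in the index, so no bookmark lookup into the base table is needed.
SELECT CustomerID, SaleDate, Amount
FROM dbo.DemoSale
WHERE CustomerID = 42;
```

Adding a column to the SELECT list that is not in the index (for example SaleID is fine because it is the clustered key, but a new column would not be) brings the lookup back.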

Keeping Performance Optimal

For the Optimizer to use indexes appropriately you must have Statistics!

Auto Update Statistics

Auto Create Statistics

Rebuild Index

Removes all levels of fragmentation

Updates statistics
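
These maintenance steps might be scripted roughly as follows. The sketch assumes SQL Server 2005 or later; DemoDb and dbo.DemoStats are hypothetical names, and DemoDb is assumed to be the current database.

```sql
-- Hypothetical database name: replace DemoDb with the database being tuned.
ALTER DATABASE DemoDb SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE DemoDb SET AUTO_UPDATE_STATISTICS ON;

-- Hypothetical table used to show the rebuild.
CREATE TABLE dbo.DemoStats
(
    StatID int         NOT NULL CONSTRAINT PK_DemoStats PRIMARY KEY CLUSTERED,
    Label  varchar(50) NOT NULL
);

-- Check fragmentation for every index in the current database.
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       i.name                     AS IndexName,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Rebuilding removes fragmentation and refreshes the statistics that belong to the rebuilt indexes.
ALTER INDEX ALL ON dbo.DemoStats REBUILD;

-- Column statistics that are not tied to an index are not touched by a rebuild; update them separately.
UPDATE STATISTICS dbo.DemoStats;
```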

Use System Views

-- Top 10 missing-index suggestions by user seeks, with a generated CREATE INDEX statement.
SELECT DISTINCT
    sys.objects.name, sys.partitions.rows, migs.user_seeks, migs.avg_total_user_cost, migs.avg_user_impact,
    -- ISNULL keeps the statement from becoming NULL when there are no equality columns.
    'CREATE NONCLUSTERED INDEX <NewNameHere> ON ' + sys.objects.name + ' ( ' + ISNULL(mid.equality_columns, '') +
    CASE WHEN mid.inequality_columns IS NULL THEN ''
         ELSE CASE WHEN mid.equality_columns IS NULL THEN '' ELSE ',' END + mid.inequality_columns END + ' ) ' +
    CASE WHEN mid.included_columns IS NULL THEN ''
         ELSE 'INCLUDE (' + mid.included_columns + ')' END +
    ' with (online =ON, maxdop = 2, sort_in_tempdb = ON ) on IndexFileGroup ;' AS CreateIndexStatement,
    mid.equality_columns, mid.inequality_columns, mid.included_columns
FROM sys.dm_db_missing_index_group_stats AS migs
INNER JOIN sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle
INNER JOIN sys.objects WITH (nolock) ON mid.object_id = sys.objects.object_id
INNER JOIN sys.partitions ON sys.objects.object_id = sys.partitions.object_id
    --and sys.partitions.index_id = 1
WHERE migs.group_handle IN
    (SELECT TOP (10) group_handle
     FROM sys.dm_db_missing_index_group_stats WITH (nolock)
     ORDER BY user_seeks DESC)
ORDER BY
    --migs.user_seeks desc
    --migs.avg_total_user_cost desc
    migs.avg_user_impact desc

Demo 3

System Views cont.

-- Missing-index suggestions ranked by index_advantage = seeks * cost * (impact %).
select
    TableName = o.name, migs_adv.index_advantage,
    s.avg_user_impact,
    s.avg_total_user_cost,
    s.last_user_seek,
    s.unique_compiles,
    d.index_handle,
    d.equality_columns, d.inequality_columns, d.included_columns, d.[statement]
from sys.dm_db_missing_index_group_stats s
inner join sys.dm_db_missing_index_groups g on g.index_group_handle = s.group_handle
inner join sys.dm_db_missing_index_details d on d.index_handle = g.index_handle
inner join sys.objects o on o.object_id = d.object_id
inner join (select user_seeks * avg_total_user_cost * (avg_user_impact * 0.01) as index_advantage,
                   migs.*
            from sys.dm_db_missing_index_group_stats migs) as migs_adv
    on migs_adv.group_handle = g.index_group_handle
order by migs_adv.index_advantage desc, s.avg_user_impact desc

Indexing for Performance

Here's the Key

Do not adopt the mindset that every column has to have an index. Have a strategy!

Use a strategy with data to back it.

Utilize server-side traces to see how your system is used by each client at different times of day.

Look at activity, data usage, and query access:

Find the most used queries

Find the highest duration queries

Find the highest CPU queries

Prioritize access in terms of user queries/type

Minimize total indexes and find the right balance!

Start with only the necessary indexes and ADD from there.
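
One way to put data behind the "minimize total indexes" point, alongside the traces, is to compare how often each nonclustered index is read with how often it is maintained. This sketch is an addition rather than part of the original slides. It assumes SQL Server 2005 or later and uses the sys.dm_db_index_usage_stats view, whose counters reset when the instance restarts, so the numbers only mean something after a representative period of uptime.

```sql
-- Nonclustered indexes in the current database, least-read first.
-- An index with many writes and few or no reads is a candidate for review.
SELECT OBJECT_NAME(i.object_id) AS TableName,
       i.name                   AS IndexName,
       ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0) + ISNULL(us.user_lookups, 0) AS Reads,
       ISNULL(us.user_updates, 0) AS Writes
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON  us.object_id   = i.object_id
    AND us.index_id    = i.index_id
    AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.index_id > 1            -- skip heaps (0) and clustered indexes (1)
ORDER BY Reads ASC, Writes DESC;
```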

Questions
