
SQL Tuner Documentation


Table of Contents

Synopsis
Project Development Life Cycle (PDLC)
Objective & Scope
    Objective of SQL Tuner
    Scope of SQL Tuner
Theoretical Background
    What is performance tuning?
    Optimizing Database Performance
        Designing Federated Database Servers
        Database Design
        Query Tuning
        Advanced Query Tuning Concepts
        Application Design
        Optimizing Utility and Tool Performance
        Optimizing Server Performance
    Indexes
        Purpose and Structure
        Index Types
        Index Characteristics
    SQL Server Performance Killers
        Poor Indexing
        Inaccurate Statistics
        Excessive Blocking and Deadlocks
        Poor Query Design
        Poor Database Design
        Excessive Fragmentation
        Non-reusable Execution Plans
        Frequent Recompilation of Execution Plans
        Improper Use of Cursors
        Improper Configuration of the Database Log
        Ineffective Connection Pooling
Problem Definition
System Analysis & Design
    Query Execution Process
    Performance Tuning Process
    Query Optimizer Architecture
    Advanced Types of Optimization
    Displaying Execution Plans
        Execution Plan Basics
        Graphics-Based Execution Plans
        Query Analyzer Graphical Estimated Execution Plans
        Text-Based Execution Plans
    Estimated rows column in an execution plan
    Bookmark Lookups
    SQL Server - Indexes and Performance
        What happens over time?
        Defragmenting Indexes
    How to Select Indexes for Your SQL Server Tables
    Analyzing a Database Execution Plan
System Planning
Methodology
System Implementation
    Prerequisites for system implementation
        .NET Framework 2.0 Installation
        SQL Server 2000 Installation
        SQL Tuner Installation
Technical Specification
    Hardware Requirements
    Software Requirements
User Manual
Future Enhancements
    Optimizing more complex queries
    Optimizing Database Structure
    Optimizing Queries Embedded in the Applications
Bibliography
    Websites
    Books
    Components Used


SYNOPSIS

Project Name: SQL TUNER.

Project Members:

This project was done by a group of two people. The project members are:

1. Joydeep Dass
2. Sapana Rodrigues

Problem Statement:

The normal scenario in today's industry is that whenever a programmer or developer writes a new query, it has to be submitted to the company's DBA for tuning. Even the DBA is often unable to tune the query fully, and when a query can be tuned, doing so consumes a great deal of the DBA's time and resources. During crunch time it is simply not possible to tune each and every query, and since there are no set rules for tuning, the DBA has to rely on experience. Sometimes even an experienced DBA cannot tune a query.

Why SQL Tuner?

This topic was chosen to reduce the DBAs' query-tuning workload. The tool can also be used by developers to tune their queries themselves instead of taking them to the DBA.

Project Scope:

This project is developed for tuning SQL queries. Tuning is done by reducing the total CPU time and the I/O consumed by a query.

Tuning is done in two ways:

Syntax Tuning:

Checking the logical and physical operators used by the query.

Index Tuning:

Checking the indexes used in the query (if any) and which indexes can be applied to the columns used in the query.


Methodology:

The user types queries into the interface provided by the software and has two choices: tune the query or execute it. If tuning is selected, the software gives suggestions for improving the performance of the query. If execution is selected, the software simply runs the query. Beyond this, it provides almost all of the facilities offered by the Query Analyzer of MS SQL Server.

Software Requirements:

.NET Framework 2.0
SQL Server 2000

Hardware Requirements:

Processor: Preferably 1.0 GHz or greater.

RAM: 128 MB or greater.

Limitations of the Software:

This project was made to understand how SQL Server parses and tunes queries internally, so it is currently able to tune only simple queries.

Future Enhancements:

To tune more complex and bigger queries.

To study the database structure and provide the user with suggestions to improve the database structure for best performance.


Project Development Life Cycle (PDLC)


Objective & Scope

Objective of SQL Tuner

Performance tuning is an important part of today’s database applications. Very often, large savings in both time and money can be achieved with proper performance tuning. The beauty of performance tuning is that, in many cases, a small change to an index or a SQL query can result in a far more efficient application.

Query optimization is an important skill for SQL developers and database administrators (DBAs). In order to improve the performance of SQL queries, developers and DBAs need to understand the query optimizer and the techniques it uses to select an access path and prepare a query execution plan. Query tuning involves knowledge of techniques such as cost-based and heuristic-based optimizers, plus the tools an SQL platform provides for explaining a query execution plan.

The main objective of SQL Tuner is to analyze the query provided by the user and suggest ways in which it can be optimized for performance.

SQL Tuner is a tool principally made for DBAs to minimize their workload; developers can also use it to prepare optimized queries.

Scope of SQL Tuner

The main objective of this project is to tune the SQL queries provided by the user. Tuning means making the queries entered by the user more efficient; this can be done by reducing the total CPU time taken by the query and by reducing the input/output required by the computer to compute it.

Both total CPU time and input/output can be reduced by tuning the user's queries in two ways:

Syntax Tuning:

Syntax tuning is done by checking the logical operators, arithmetic operators, and relational operators in the queries provided by the user.

Logical Operators:


These are the operators used to combine two or more conditions in the WHERE clause of the user's query. The logical operators are AND, OR, and NOT. Of the three, NOT hampers query performance far more than the other two, so it is preferable to avoid the NOT operator; this can often be done by rewriting the WHERE clause.

Example: Suppose we want to find all the students in a college who are not boys.

Solution: This query can be written in two ways:

SELECT * FROM students WHERE NOT (gender = 'BOYS')

OR

SELECT * FROM students WHERE gender = 'GIRLS'

The first solution searches for records that are not boys, and the second searches for records that are girls.

To the user, both solutions mean one and the same thing; but to the SQL optimizer, the first solution means it must first find all the records that are boys, hold them in a temporary table, and then search for the records that do not match the records in that temporary table. In the second solution the optimizer only has to search for records that are girls, so it creates no temporary table and performs no comparisons against one. This reduces query execution time drastically, and the input/output is also lower, because the records do not have to be fetched again and again.

In some cases, however, it is not possible to write a query without a NOT; in such cases nothing can be done syntactically to optimize the query.

Arithmetic Operators:

These are the operators with which the user can obtain calculated values. The basic arithmetic operators are addition (+), subtraction (-), division (/), multiplication (*), and exponentiation (^). Care should be taken when writing the query so that none of these operators is applied on the left-hand side of the relational operators in the WHERE clause.

Example: Find all the salespersons whose sale is just 2000 short of a target (say, a target of 10000).

Solution: This query can be written in many ways, two of which are as follows:

SELECT salesperson FROM Sales WHERE sale + 2000 = 10000

OR

SELECT salesperson FROM Sales WHERE sale = 10000 - 2000

Here also both queries seem the same to the user, but for the SQL optimizer the second query works faster than the first, because it does not have to compute sale + 2000 for every salesperson before comparing the result with the target; in the second form, the right-hand side is a constant expression evaluated once, and the sale column is left untouched for comparison. With arithmetic operators, therefore, try to avoid placing the operators on the left-hand side of the relational operators.

Relational Operators:

These operators are used to relate a column name to the expression it is compared against in the WHERE clause. Listed in decreasing order of performance: the highest performance is given by equal (=); then come greater than (>) and less than (<); then greater than or equal to (>=) and less than or equal to (<=); then the LIKE operator; and the least performance is given by not equal to (<>).
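To illustrate, here is a hedged sketch, assuming a hypothetical Orders table whose status column holds only the values 'OPEN', 'CLOSED', and 'CANCELLED'. A predicate written with the low-performing <> operator can often be rewritten with the higher-performing = operator:

-- Lower-performing form: the inequality must be tested against every status value.
SELECT orderid FROM Orders WHERE status <> 'OPEN'

-- Often better, when the remaining values are known and few:
SELECT orderid FROM Orders WHERE status = 'CLOSED' OR status = 'CANCELLED'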

Index Tuning:

Indexing is one of the most crucial elements in increasing the performance of SQL Server. A well-written query will not show its effectiveness unless powered by an appropriate index or indexes on the table(s) used in a query, especially if the tables are large.

Indexes exist to make data selection faster, so the focus of this section is on ways you can select the best indexes for faster data retrieval. This is done in a two-step process.

Step One: Gathering Information
Step Two: Taking Actions on the Information Gathered

Indexing can be quite a challenging task if you are not familiar with your databases, the relationships between tables, and how queries are written in stored procedures and embedded SQL. How many of the companies you have worked for keep a proper ERD diagram of their databases and follow a textbook style of programming? In the real world, time is often limited, and the result is poor SQL Server database performance.

If you have been tasked with optimizing a database's performance (at least to a respectable level), or you want to be proactive with your databases to prevent potential future performance issues, following these steps should help you in tuning tables, just as they have helped me. These steps are applicable at any stage of a project, even if a deadline is just around the corner.


Theoretical Background

What is performance tuning?

What is the goal of tuning a SQL Server database? The goal is to improve performance until acceptable levels are reached. Acceptable levels can be defined in a number of ways. For a large online transaction processing (OLTP) application, the performance goal might be to provide sub-second response time for critical transactions and to provide a response time of less than two seconds for 95 percent of the other main transactions. For some systems, typically batch systems, acceptable performance might be measured in throughput. For example, a settlement system may define acceptable performance in terms of the number of trades settled per hour. For an overnight batch suite, acceptable performance might be that it must finish before the business day starts.

Whatever the system, designing for performance should start early in the design process and continue after the application has gone live. Performance tuning is not a one-off process but an iterative process during which response time is measured, tuning performed, and response time measured again.

There is no right way to design a database; there are a number of possible approaches and all these may be perfectly valid. It is sometimes said that performance tuning is an art, not a science. This may be true, but it is important to undertake performance tuning experiments with the same kind of rigorous, controlled conditions under which scientific experiments are performed. Measurements should be taken before and after any modification, and these should be made one at a time so it can be established which modification, if any, resulted in an improvement or degradation.

What areas should the database designer concentrate on? The simple answer to this question is that the database designer should concentrate on those areas that will return the most benefit. In my experience, for most database designs I have worked with, large gains are typically made in the area of query and index design. Inappropriate indexes and badly written queries, as well as some other contributing factors, can negatively influence the query optimizer such that it chooses an inefficient strategy.

To give you some idea of the gains to be made in this area, I was once asked to look at a query that joined a number of large tables together. The query was abandoned after it had not completed within 12 hours. The addition of an index, in conjunction with a modification to the query, meant the query then completed in less than eight minutes! This magnitude of gain cannot be achieved just by purchasing more hardware or by twiddling some arcane SQL Server configuration option. A database designer or administrator's time is always limited, so make the best use of it! The other main area where gains can be dramatic is lock contention. Removing lock bottlenecks in a system with a large number of users can have a huge impact on response times.

Now, some words of caution when chasing performance problems. If users phone up to tell you that they are getting poor response times, do not immediately jump to conclusions about what is causing the problem. Circle at a high altitude first. Having made sure that you are about to monitor the correct server, use the System Monitor to look at the CPU, disk subsystem, and memory use. Are there any obvious bottlenecks? If there are, then look for the culprit. Everyone blames the database, but it could just as easily be someone running his or her favorite game! If there are no obvious bottlenecks, and the CPU, disk, and memory counters in the System Monitor are lower than usual, then that might tell you something. Perhaps the network is sluggish, or there is lock contention. Also be aware of the fact that some bottlenecks hide others. A memory bottleneck often manifests itself as a disk bottleneck.


Optimizing Database Performance

The goal of performance tuning is to minimize the response time for each query and to maximize the throughput of the entire database server by reducing network traffic, disk I/O, and CPU time. This goal is achieved through understanding application requirements, the logical and physical structure of the data, and tradeoffs between conflicting uses of the database, such as online transaction processing (OLTP) versus decision support.

Performance issues should be considered throughout the development cycle, not at the end when the system is implemented. Many significant performance improvements are achieved by careful design from the outset. To most effectively optimize the performance of Microsoft® SQL Server™ 2000, you must identify the areas that will yield the largest performance increases over the widest variety of situations and focus analysis on those areas.

Although other system-level performance issues, such as memory, hardware, and so on, are certainly candidates for study, experience shows that the performance gain from these areas is often incremental. Generally, SQL Server automatically manages available hardware resources, reducing the need (and thus, the benefit) for extensive system-level manual tuning.

The following topics cover these areas:

Designing Federated Database Servers: Describes how to achieve high levels of performance, such as those required by large Web sites, by balancing the processing load across multiple servers.

Database Design: Describes how database design is the most effective way to improve overall performance. Database design includes the logical database schema (such as tables and constraints) and the physical attributes such as disk systems, object placement, and indexes.

Query Tuning: Describes how the correct design of the queries used by an application can significantly improve performance.

Application Design: Describes how the correct design of the user application can significantly improve performance. Application design includes transaction boundaries, locking, and the use of batches.

Optimizing Utility and Tool Performance: Describes how some of the options available with the utilities and tools supplied with Microsoft SQL Server 2000 can highlight ways in which the performance of these tools can be improved, as well as the effect of running these tools and your application at the same time.

Optimizing Server Performance: Describes how settings in the operating system (Microsoft Windows NT®, Microsoft Windows® 95, Microsoft Windows 98, or Microsoft Windows 2000) and SQL Server can be changed to improve overall performance.

Designing Federated Database Servers

To achieve the high levels of performance required by the largest Web sites, a multitier system typically balances the processing load for each tier across multiple servers. Microsoft® SQL Server™ 2000 shares the database processing load across a group of servers by horizontally partitioning the SQL Server data. These servers are managed independently, but cooperate to process the database requests from the applications; such a cooperative group of servers is called a federation.

A federated database tier can achieve extremely high levels of performance only if the application sends each SQL statement to the member server that has most of the data required by the statement. This is called collocating the SQL statement with the data required by the statement. Collocating SQL statements with the required data is not a requirement unique to federated servers. It is also required in clustered systems.

Although a federation of servers presents the same image to the applications as a single database server, there are internal differences in how the database services tier is implemented.

Single server tier:

• There is one instance of SQL Server on the production server.
• The production data is stored in one database.
• Each table is typically a single entity.
• All connections are made to the single server, and all SQL statements are processed by the same instance of SQL Server.

Federated server tier:

• There is one instance of SQL Server on each member server.
• Each member server has a member database. The data is spread through the member databases.
• The tables from the original database are horizontally partitioned into member tables. There is one member table per member database, and distributed partitioned views are used to make it appear as if there were a full copy of the original table on each member server.
• The application layer must be able to collocate SQL statements on the member server containing most of the data referenced by the statement.


While the goal is to design a federation of database servers to handle a complete workload, you do this by designing a set of distributed partitioned views that spread the data across the different servers.
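As a rough sketch of what such a distributed partitioned view looks like: the linked server names, database name, table names, and partitioning ranges below are illustrative assumptions, not part of this project.

-- On each member server, a member table holds one range of the data,
-- enforced by a CHECK constraint on the partitioning column.
CREATE TABLE Customers_33
(
    customerid INT PRIMARY KEY CHECK (customerid BETWEEN 1 AND 32999),
    lastname   VARCHAR(40) NOT NULL
)

-- Defined identically on every member server, the view presents the
-- member tables as if they were a single Customers table.
CREATE VIEW Customers
AS
SELECT * FROM Server1.CustomerDB.dbo.Customers_33
UNION ALL
SELECT * FROM Server2.CustomerDB.dbo.Customers_66
UNION ALL
SELECT * FROM Server3.CustomerDB.dbo.Customers_99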

Database Design

There are two components to designing a database: logical and physical. Logical database design involves modeling your business requirements and data using database components, such as tables and constraints, without regard for how or where the data will be physically stored. Physical database design involves mapping the logical design onto physical media, taking advantage of the hardware and software features available, which allows the data to be physically accessed and maintained as quickly as possible, and indexing.

It is important to correctly design the database to model your business requirements, and to take advantage of hardware and software features early in the development cycle of a database application, because it is difficult to make changes to these components later.

Logical Database Design

Using Microsoft® SQL Server™ 2000 effectively begins with normalized database design. Normalization is the process of removing redundancies from the data. For example, when you convert from an indexed sequential access method (ISAM) style application, normalization often involves breaking data in a single file into two or more logical tables in a relational database. Transact-SQL queries then recombine the table data by using relational join operations. By avoiding the need to update the same data in multiple places, normalization improves the efficiency of an application and reduces the opportunities for introducing errors due to inconsistent data.

However, there are tradeoffs to normalization. A database that is used primarily for decision support (as opposed to update-intensive transaction processing) may not have redundant updates and may be more understandable and efficient for queries if the design is not fully normalized. Nevertheless, data that is not normalized is a more common design problem in database applications than over-normalized data. Starting with a normalized design, and then selectively denormalizing tables for specific reasons, is a good strategy.

Whatever the database design, you should take advantage of these features in SQL Server to automatically maintain the integrity of your data:

CHECK constraints ensure that column values are valid.

DEFAULT and NOT NULL constraints avoid the complexities (and opportunities for hidden application bugs) caused by missing column values.

PRIMARY KEY and UNIQUE constraints enforce the uniqueness of rows (and implicitly create an index to do so).

FOREIGN KEY constraints ensure that rows in dependent tables always have a matching master record.


IDENTITY columns efficiently generate unique row identifiers.

Timestamp columns ensure efficient concurrency checking between multiple-user updates.

User-defined data types ensure consistency of column definitions across the database.
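A small sketch showing how several of these features can be combined in one table definition; the table and column names are illustrative, not from this project:

CREATE TABLE dbo.Orders
(
    orderid    INT IDENTITY (1, 1) PRIMARY KEY,       -- IDENTITY generates unique row identifiers
    customerid INT NOT NULL
               REFERENCES dbo.Customers (customerid), -- FOREIGN KEY guarantees a matching master record
    orderdate  DATETIME NOT NULL DEFAULT GETDATE(),   -- DEFAULT and NOT NULL avoid missing values
    quantity   INT NOT NULL CHECK (quantity > 0),     -- CHECK ensures column values are valid
    ts         TIMESTAMP                              -- timestamp column supports concurrency checking
)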

By taking advantage of these features, you can make the data rules visible to all users of the database, rather than hiding them in application logic. These server-enforced rules help avoid errors in the data that can arise from incomplete enforcement of integrity rules by the application itself. Using these facilities also ensures that data integrity is enforced as efficiently as possible.

Physical Database Design

The I/O subsystem (storage engine) is a key component of any relational database. A successful database implementation usually requires careful planning at the early stages of your project. The storage engine of a relational database requires much of this planning, which includes determining:

What type of disk hardware to use, such as RAID (redundant array of independent disks) devices.

How to place your data onto the disks.

Which index design to use to improve query performance in accessing data.

How to set all configuration parameters appropriately for the database to perform well.


Query Tuning

It may be tempting to address a performance problem solely by system-level server performance tuning; for example, memory size, type of file system, number and type of processors, and so forth. Experience has shown that most performance problems cannot be resolved this way. They must be addressed by analyzing the application, queries, and updates that the application is submitting to the database, and how these queries and updates interact with the database schema.

Unexpected long-lasting queries and updates can be caused by:

Slow network communication.

Inadequate memory in the server computer, or not enough memory available for Microsoft® SQL Server™ 2000.

Lack of useful statistics.

Out-of-date statistics.

Lack of useful indexes.

Lack of useful data striping.

When a query or update takes longer than expected, use the following checklist to improve performance.

1. Is the performance problem related to a component other than queries? For example, is the problem slow network performance? Are there any other components that might be causing or contributing to performance degradation? Windows NT Performance Monitor can be used to monitor the performance of SQL Server and non-SQL Server related components.

2. If the performance issue is related to queries, which query or set of queries is involved? Use SQL Profiler to help identify the slow query or queries.

The performance of a database query can be determined by using the SET statement to enable the SHOWPLAN, STATISTICS IO, STATISTICS TIME, and STATISTICS PROFILE options.

SHOWPLAN describes the method chosen by the SQL Server query optimizer to retrieve data. For more information, see SET SHOWPLAN_ALL.

STATISTICS IO reports information about the number of scans, logical reads (pages accessed in cache), and physical reads (number of times the disk was accessed) for each table referenced in the statement. For more information, see SET STATISTICS IO.


STATISTICS TIME displays the amount of time (in milliseconds) required to parse, compile, and execute a query.

STATISTICS PROFILE displays a result set after each executed query representing a profile of the execution of the query.

In SQL Query Analyzer, you can also turn on the graphical execution plan option to view a graphical representation of how SQL Server retrieves data.

The information gathered by these tools allows you to determine how a query is executed by the SQL Server query optimizer and which indexes are being used. Using this information, you can determine if performance improvements can be made by rewriting the query, changing the indexes on the tables, or perhaps modifying the database design.
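For example, a query under investigation can be wrapped as follows in Query Analyzer. This is a minimal sketch; the query itself is illustrative and uses the employee01 table that appears in examples later in this document:

SET STATISTICS IO ON
SET STATISTICS TIME ON
GO

SELECT emp_id, lname FROM employee01 WHERE job_lvl >= 100
GO

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO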

3. Was the query optimized with useful statistics?

Statistics on the distribution of values in a column are automatically created on indexed columns by SQL Server. They can also be created on nonindexed columns either manually, using SQL Query Analyzer or the CREATE STATISTICS statement, or automatically, if the auto create statistics database option is set to true. These statistics can be used by the query processor to determine the optimal strategy for evaluating a query. Maintaining additional statistics on nonindexed columns involved in join operations can improve query performance. For more information, see Statistical Information. Monitor the query using SQL Profiler or the graphical execution plan in SQL Query Analyzer to determine if the query has enough statistics.

4. Are the query statistics up-to-date? Are the statistics automatically updated?

SQL Server automatically creates and updates query statistics on indexed columns (as long as automatic query statistic updating is not disabled). Additionally, statistics can be updated on nonindexed columns either manually, using SQL Query Analyzer or the UPDATE STATISTICS statement, or automatically, if the auto update statistics database option is set to true. Up-to-date statistics are not dependent upon date or time data. If no UPDATE operations have taken place, then the query statistics are still up-to-date. If statistics are not set to update automatically, then set them to do so.
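As a sketch of the statements involved (the statistics name and table are illustrative; pubs is one of the SQL Server sample databases):

-- Create statistics manually on a nonindexed column:
CREATE STATISTICS hiredate_stats ON employee01 (hire_date) WITH SAMPLE 50 PERCENT

-- Refresh the statistics on a table by hand:
UPDATE STATISTICS employee01

-- Or let SQL Server maintain statistics automatically:
EXEC sp_dboption 'pubs', 'auto create statistics', 'true'
EXEC sp_dboption 'pubs', 'auto update statistics', 'true'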

5. Are suitable indexes available? Would adding one or more indexes improve query performance?

6. Are there any data or index hot spots? Consider using disk striping.

7. Is the query optimizer provided with the best opportunity to optimize a complex query?


Analyzing a Query

Microsoft SQL Server 2000 offers these ways to present information on how it navigates tables and uses indexes to access the data for a query:

Graphically display the execution plan using SQL Query Analyzer

In SQL Query Analyzer, click Query and select Display Execution Plan. After executing a query, you can select the Execution Plan tab to see a graphical representation of execution plan output.

SET SHOWPLAN_TEXT ON

After this statement is executed, SQL Server returns the execution plan information for each query.

SET SHOWPLAN_ALL ON

This statement is similar to SET SHOWPLAN_TEXT, except that the output is more detailed and is intended to be consumed by applications rather than read directly.

When you display the execution plan, the statements you submit to the server are not executed; instead, SQL Server analyzes the query and displays how the statements would have been executed as a series of operators.

The best execution plan used by the query engine for individual data manipulation language (DML) and Transact-SQL statements is displayed, and it reveals compile-time information about stored procedures and triggers invoked by a batch, including called stored procedures and triggers invoked to an arbitrary number of calling levels. For example, executing a SELECT statement can show that SQL Server uses a table scan to obtain the data. Alternatively, an index scan may have been used instead if the index was determined to be a faster method of retrieving the data from the table.

The results returned by the SHOWPLAN_TEXT and SHOWPLAN_ALL statements are a tabular representation (rows and columns) of a tree structure. The execution plan tree structure uses one row in the result set for each node in the tree, each node representing a logical or physical operator used to manipulate the data to produce expected results. SQL Query Analyzer instead graphically displays each logical and physical operator as an icon.
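A minimal usage sketch (the query is illustrative); while SHOWPLAN_TEXT is ON, statements are analyzed but not executed:

SET SHOWPLAN_TEXT ON
GO

SELECT emp_id, lname FROM employee01 WHERE job_lvl >= 100
GO

SET SHOWPLAN_TEXT OFF
GO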


Advanced Query Tuning Concepts

Microsoft SQL Server 2000 performs sort, intersect, union, and difference operations using in-memory sorting and hash join technology. Using this type of query plan, SQL Server supports vertical table partitioning, sometimes called columnar storage.

SQL Server employs three types of join operations:

Nested loops joins

Merge joins

Hash joins

If one join input is quite small (such as fewer than 10 rows) and the other join input is fairly large and indexed on its join columns, index nested loops are the fastest join operation because they require the least I/O and the fewest comparisons. If the two join inputs are not small but are sorted on their join column (for example, if they were obtained by scanning sorted indexes), merge join is the fastest join operation. If both join inputs are large and the two inputs are of similar sizes, merge join with prior sorting and hash join offer similar performance. However, hash join operations are often much faster if the two input sizes differ significantly from each other. Hash joins can process large, unsorted, nonindexed inputs efficiently. They are useful for intermediate results in complex queries because:

Intermediate results are not indexed (unless explicitly saved to disk and then indexed) and often are not produced suitably sorted for the next operation in the query plan.

Query optimizers estimate only intermediate result sizes. Because estimates can be an order of magnitude wrong in complex queries, algorithms to process intermediate results not only must be efficient but also must degrade gracefully if an intermediate result turns out to be much larger than anticipated.

Hash joins allow reductions in the use of denormalization. Denormalization is typically used to achieve better performance by reducing join operations, in spite of the dangers of redundancy, such as inconsistent updates. Hash joins reduce the need to denormalize, and they allow vertical partitioning (representing groups of columns from a single table in separate files or indexes) to become a viable option for physical database design.
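When comparing join strategies during tuning, the optimizer's choice can be overridden with a query hint. This is a hedged sketch (the tables and join column are assumed for illustration), and in production the choice is normally best left to the optimizer:

SELECT e.emp_id, e.lname
FROM employee01 AS e
JOIN Sales AS s ON s.salesperson = e.emp_id   -- assumed join column
OPTION (HASH JOIN)                            -- force hash joins for this query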


Application Design

Application design plays a pivotal role in determining the performance of a system using Microsoft® SQL Server™ 2000. Consider the client the controlling entity rather than the database server. The client determines the type of queries, when they are submitted, and how the results are processed. This in turn has a major effect on the type and duration of locks, amount of I/O, and processing (CPU) load on the server, and hence on whether performance is generally good or bad.

For this reason, it is important to make the correct decisions during the application design phase. However, even if a performance problem occurs using a turn-key application, where changes to the client application seem impossible, this does not change the fundamental factors that affect performance: the client plays a dominant role, and many performance problems cannot be resolved without making client changes. A well-designed application allows SQL Server to support thousands of concurrent users. Conversely, a poorly designed application prevents even the most powerful server platform from handling more than a few users.

Guidelines for client-application design include:

Eliminate excessive network traffic.

Network roundtrips between the client and SQL Server are usually the main reason for poor performance in a database application, an even greater factor than the amount of data transferred between server and client. Network roundtrips describe the conversational traffic sent between the client application and SQL Server for every batch and result set. By making use of stored procedures, you can minimize network roundtrips. For example, if your application takes different actions based on data values received from SQL Server, make those decisions directly in the stored procedure whenever possible, thus eliminating network traffic.

If a stored procedure has multiple statements, then by default SQL Server sends a message to the client application at the completion of each statement, detailing the number of rows affected by each statement. Most applications do not need these messages. If you are confident that your applications do not need them, you can disable these messages, which can improve performance on a slow network. Use the SET NOCOUNT session setting to disable these messages for the application.
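A minimal sketch of both ideas together, assuming a hypothetical Orders table: the decision is made inside the procedure, and SET NOCOUNT ON suppresses the per-statement messages:

CREATE PROCEDURE dbo.CloseStaleOrders
AS
SET NOCOUNT ON              -- suppress "rows affected" messages

UPDATE Orders
SET status = 'CLOSED'
WHERE orderdate < GETDATE() - 90

IF @@ROWCOUNT = 0
    RETURN 1                -- decided server-side; no extra roundtrip
RETURN 0
GO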

Use small result sets.

Retrieving needlessly large result sets (for example, thousands of rows) for browsing on the client adds CPU and network I/O load, makes the application less capable of remote use, and limits multi-user scalability. It is better to design the application to prompt the user for sufficient input so queries are submitted that generate modest result sets.


Application design techniques that facilitate this include exercising control over wildcards when building queries, mandating certain input fields, not allowing ad hoc queries, and using the TOP clause (optionally with PERCENT) or the SET ROWCOUNT Transact-SQL statement to limit the number of rows returned by a query.
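For example (a sketch with illustrative names), the same cap can be applied either in the statement or at the session level:

-- In the statement:
SELECT TOP 100 customerid, lastname
FROM Customers
ORDER BY lastname

-- At the session level, for tools that cannot modify the statement:
SET ROWCOUNT 100
SELECT customerid, lastname FROM Customers ORDER BY lastname
SET ROWCOUNT 0    -- reset: 0 means return all rows again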

Allow cancellation of a query in progress when the user needs to regain control of the application.

An application should never force the user to restart the client computer to cancel a query. Ignoring this can lead to irresolvable performance problems. When a query is canceled by an application, for example by using the open database connectivity (ODBC) SQLCancel function, proper care should be exercised regarding transaction level. Canceling a query, for example, does not commit or roll back a user-defined transaction. All locks acquired within the transaction are retained after the query is canceled. Therefore, after canceling a query, always either commit or roll back the transaction. The same issues apply to DB-Library and other application programming interfaces (APIs) that can be used to cancel queries.

Always implement a query or lock time-out.

Do not allow queries to run indefinitely. Make the appropriate API call to set a query time-out. For example, use the ODBC SQLSetStmtOption function.

Do not use application development tools that do not allow explicit control over the SQL statements sent to SQL Server.

Do not use a tool that transparently generates Transact-SQL statements based on higher-level objects if it does not provide crucial features such as query cancellation, query time-out, and complete transactional control. It is often not possible to maintain good performance or to resolve a performance problem if the application generates transparent SQL statements, because this does not allow explicit control over transactional and locking issues, which are critical to the performance picture.

Do not intermix decision support and online transaction processing (OLTP) queries.

Do not use cursors more than necessary.

Cursors are a useful tool in relational databases; however, it is almost always more expensive to use a cursor than to use a set-oriented SQL statement to accomplish a task.

In set-oriented SQL statements, the client application tells the server to update the set of records that meet specified criteria. The server figures out how to accomplish the update as a single unit of work. When updating through a cursor, the client application requires the server to maintain row locks or version information for every row, just in case the client asks to update the row after it has been fetched.

Also, using a cursor implies that the server is maintaining client state information, such as the user's current rowset at the server, usually in temporary storage. Maintaining this state for a large number of clients is an expensive use of server resources. A better strategy with a relational database is for the client application to get in and out quickly, maintaining no client state at the server between calls. Set-oriented SQL statements support this strategy.

However, if the query uses cursors, determine whether the cursor query could be written more efficiently either by using a more efficient cursor type, such as fast forward-only, or by using a single query.

Keep transactions as short as possible.

Use stored procedures.

Use prepared execution to execute a parameterized SQL statement.

Always process all results to completion.

Do not design an application or use an application that stops processing result rows without canceling the query. Doing so will usually lead to blocking and slow performance.

Ensure that your application is designed to avoid deadlocks.

Ensure that all the appropriate options for optimizing the performance of distributed queries have been set.


Optimizing Utility and Tool Performance

Three operations performed on a production database that can benefit from optimal performance are:

Backup and restore operations.

Bulk copying data into a table.

Performing database console command (DBCC) operations.

Generally, these operations do not need to be optimized. However, in situations where performance is critical, techniques can be used to fine-tune performance.

Optimizing Server Performance

Microsoft SQL Server 2000 automatically tunes many of the server configuration options, therefore requiring little, if any, tuning by a system administrator. Although these configuration options can be modified by the system administrator, it is generally recommended that these options be left at their default values, allowing SQL Server to automatically tune itself based on run-time conditions.

However, if necessary, the following components can be configured to optimize server performance:

SQL Server memory

I/O subsystem

Microsoft Windows NT options


Indexes

Indexes are structured to facilitate the rapid return of result sets. The two types of indexes that SQL Server supports are clustered and nonclustered indexes. Indexes are applied to one or more columns in tables or views. The characteristics of an index affect its use of system resources and its lookup performance. The Query Optimizer uses an index if it will increase query performance.

Purpose and Structure

An index in SQL Server assists the database engine with locating records, just like an index in a book helps you locate information quickly. Without indexes, a query causes SQL Server to search all records in a table (table scan) in order to find matches. A database index contains one or more column values from a table (called the index key) and pointers to the corresponding table records. When you perform a query using the index key, the Query Optimizer will likely use an index to locate the records that match the query.

An index is structured by the SQL Server Index Manager as a balanced tree (or B-tree).

A B-tree is analogous to an upside-down tree with the root of the tree at the top, the leaf levels at the bottom, and intermediate levels in between. Each object in the tree structure is a group of sorted index keys called an index page. A B-tree facilitates fast and consistent query performance by carefully balancing the width and depth of the tree as the index grows. Sorting the index on the index key also improves query performance. All search requests begin at the root of a B-tree and then move through the tree to the appropriate leaf level. The number of table records and the size of the index key affect the width and depth of the tree. Index key size is called the key width. A table that has many records and a large index key width creates a deep and wide B-tree. The smaller the tree, the more quickly a search result is returned.

For optimal query performance, create indexes on columns in a table that are commonly used in queries. For example, users can query a Customers table based on last name or customer ID. Therefore, you should create two indexes for the table: a last-name index and a customer ID index. To efficiently locate records, the Query Optimizer uses an index that matches the query. The Query Optimizer will likely use the customer ID index when the following query is executed:

SELECT * FROM Customers WHERE customerid = 798

Do not create indexes for every column in a table, because too many indexes will negatively impact performance. The majority of databases are dynamic; that is, records are added, deleted, and changed regularly. When a table containing an index is modified, the index must be updated to reflect the modification. If index updates do not occur, the index will quickly become ineffective. Therefore, insert, update, and delete events trigger the Index Manager to update the table indexes. Like tables, indexes are data structures that occupy space in the database. The larger the table, the larger the index that is created to contain the table. Before creating an index, you must be sure that the increased query performance afforded by the index outweighs the additional computer resources necessary to maintain the index.
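A sketch of how the two indexes suggested above might be created, assuming the Customers table layout used in the example:

CREATE CLUSTERED INDEX IX_Customers_CustomerID ON Customers (customerid)
CREATE NONCLUSTERED INDEX IX_Customers_LastName ON Customers (lastname)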

Index Types

There are two types of indexes: clustered and nonclustered. Both types of indexes are structured as B-trees. A clustered index contains table records in the leaf level of the B-tree. A nonclustered index contains a bookmark to the table records in the leaf level. If a clustered index exists on a table, a nonclustered index uses it to facilitate data lookup. In most cases, you will create a clustered index on a table before you create nonclustered indexes.

Clustered Indexes

There can be only one clustered index on a table or view, because the clustered index key physically sorts the table or view. This type of index is particularly efficient for queries, because the data records (the data pages) are stored in the leaf level of the B-tree. The sort order and storage location of a clustered index is analogous to a dictionary, in that the words in a dictionary are sorted alphabetically and definitions appear next to the words. When you create a primary key constraint in a table that does not contain a clustered index, SQL Server will use the primary key column for the clustered index key. If a clustered index already exists in a table, a nonclustered index is created on the column defined with a primary key constraint. A column defined as the PRIMARY KEY is a useful index because the column values are guaranteed to be unique. Unique values create smaller B-trees than redundant values and thus make more efficient lookup structures.

To force the type of index to be created for a column or columns, you can specify the CLUSTERED or NONCLUSTERED clause in the CREATE TABLE, ALTER TABLE, or CREATE INDEX statements. Suppose that you create a Persons table containing the following columns: PersonID, FirstName, LastName, and SocialSecurityNumber. The PersonID column is defined as a primary key constraint, and the SocialSecurityNumber column is defined as a unique constraint. To make the SocialSecurityNumber column a clustered index and the PersonID column a nonclustered index, create the table by using the following syntax:

CREATE TABLE dbo.Persons
(
    personid smallint PRIMARY KEY NONCLUSTERED,
    firstname varchar(30),
    lastname varchar(40),
    socialsecuritynumber char(11) UNIQUE CLUSTERED
)

Indexes are not limited to constraints. You can create indexes on any column or combination of columns in a table or view. Clustered indexes enforce uniqueness internally. Therefore, if you create a nonunique clustered index on a column that contains redundant values, SQL Server creates a unique value for the redundant rows to serve as a secondary sort key. To avoid the additional work required to maintain unique values for redundant rows, favor clustered indexes for columns defined with primary key constraints.

Nonclustered Indexes

On a table or view, you can create up to 249 nonclustered indexes and one clustered index. You must first create a unique clustered index on a view before you can create nonclustered indexes; this restriction does not apply to tables. A nonclustered index is analogous to an index in the back of a book. You can use a book's index to locate pages that match an index entry. The database uses a nonclustered index to locate matching records in the database. If a clustered index does not exist on a table, the table is unsorted and is called a heap. A nonclustered index created on a heap contains pointers to table rows. Each entry in an index page contains a row ID (RID). The RID is a pointer to a table row in a heap, and it consists of a page number, a file number, and a slot number. If a clustered index exists on a table, the index pages of a nonclustered index contain clustered index keys rather than RIDs. An index pointer, whether it is a RID or an index key, is called a bookmark.

Index Characteristics

A number of characteristics (aside from the index type, which is clustered or nonclustered) can be applied to an index. An index can be defined as follows:

• Unique: duplicate index keys are not allowed
• A composite of columns: an index key made up of multiple columns
• With a fill factor to allow index pages to grow when necessary
• With a pad index to change the space allocated to intermediate levels of the B-tree
• With a sort order to specify ascending or descending index keys

Unique

When an index is defined as unique, the index keys and the corresponding column values must be unique. A unique index can be applied to any column if all column values are unique. A unique index can also be applied to a group of columns (a composite of columns). The composite column unique index must maintain distinctiveness. For example, a unique index defined on a lastname column and a social security number column must not contain NULL values in both columns. Furthermore, if there are values in both columns, the combination of lastname and social security number must be unique.

SQL Server automatically creates a unique index for a column or columns defined with a primary key or unique constraint. Therefore, use constraints to enforce data distinctiveness, rather than directly applying the unique index characteristic. SQL Server will not allow you to create an index with the uniqueness property on a column containing duplicate values.
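As a sketch of the two forms (names illustrative; these are alternatives, not meant to be applied together), the constraint is the preferred way to express the rule, and it creates the unique index implicitly:

-- Preferred: a constraint, which implicitly creates a unique index
ALTER TABLE dbo.Persons
ADD CONSTRAINT UQ_Persons_Name_SSN UNIQUE (lastname, socialsecuritynumber)

-- Equivalent index created directly
CREATE UNIQUE INDEX IX_Persons_Name_SSN
ON dbo.Persons (lastname, socialsecuritynumber)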

Composite

A composite index is any index that uses more than one column in a table for its index key. Composite indexes can improve query performance by reducing input/output (I/O) operations, because a query on a combination of columns contained in the index can be satisfied entirely from the index. When the result of a query is obtained from the index without having to rely on the underlying table, the query is considered covered, and the index is considered covering. A single-column query, such as a query on a column with a primary key constraint, is covered by the index that is automatically created on that column. A covered query on multiple columns uses a composite index as the covering index. Suppose that you run the following query:

SELECT emp_id, lname, job_lvl
FROM employee01
WHERE hire_date < (GETDATE() - 30)
AND job_lvl >= 100
ORDER BY job_lvl

If a clustered index exists on the Emp_ID column and a nonclustered index named INco exists on the LName, Job_Lvl, and Hire_Date columns, then INco is a covering index. Remember that the bookmark of a nonclustered index created on a table containing a clustered index is the clustered index key. Therefore, the INco index contains all columns specified in the query (the index is covering, and the query is covered), and the Query Optimizer uses INco in the query execution plan.
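A sketch of how the indexes described in this example might be created:

CREATE CLUSTERED INDEX IX_employee01_EmpID ON employee01 (emp_id)
CREATE NONCLUSTERED INDEX INco ON employee01 (lname, job_lvl, hire_date)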

Fill Factor and Pad Index

When a row is inserted into a table, SQL Server must locate some space for it. An insert operation occurs when the INSERT statement is used or when the UPDATE statement is used to update a clustered index key. If the table doesn’t contain a clustered index, the record and the index page are placed in any available space within the heap. If the table contains a clustered index, SQL Server locates the appropriate index page in the B-tree and then inserts the record in sorted order. If the index page is full, it is split: half of the rows remain in the original index page, and half move to the new index page. If the inserted row is large, additional page splits might be necessary. Page splits are complex and resource intensive, and the most common page splits occur in the leaf-level index pages.

To reduce the occurrence of page splits, specify how full the index page should be when it is created. This value is called the fill factor. By default, the fill factor is zero, meaning that the index page is filled completely when it is created on existing data; a fill factor of zero is synonymous with a fill factor of 100. You can specify a global default fill factor for the server by using the sp_configure stored procedure, or for a specific index with the FILLFACTOR clause. In high-capacity transaction systems, you might also want to allocate additional space to the intermediate-level index pages. The additional space assigned to the intermediate levels is called the pad index.
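
A sketch of both options (the index name is illustrative; 'fill factor' is an advanced server option, so 'show advanced options' may need to be enabled first):

--Leave 20% free space in leaf pages and pad the intermediate levels too
CREATE NONCLUSTERED INDEX ncl_emp_hire ON employee01(hire_date)
WITH PAD_INDEX, FILLFACTOR = 80
GO

--Or set a server-wide default fill factor
EXEC sp_configure 'fill factor', 80
RECONFIGURE WITH OVERRIDE
GO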

Sort Order

When you create an index, it is sorted in ascending order. Both clustered and nonclustered indexes are sorted; the clustered index represents the sort order of the table. Consider the following SELECT statement:

SELECT emp_id, lname, job_lvl
FROM employee01
WHERE hire_date < (GETDATE() - 30)
AND job_lvl >= 100

Notice that no sort order is specified. The composite index is nonclustered, and the first column in the index is lname. No sort order was specified when the index was created; therefore, the result is sorted in ascending order starting with the lname column. The ORDER BY clause is not specified, which saves computing resources, yet the result still appears sorted by lname. The sort order depends on the index used to return the result (unless you specify the ORDER BY clause or tell the SELECT statement which index to use). If the Query Optimizer uses a clustered index to return a result, the result appears in the sort order of the clustered index, which is also the order of the data pages in the table. The following Transact-SQL statement uses the clustered index on the Emp_ID column to return a result in ascending order:

SELECT emp_id, lname, fname FROM employee01
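
If descending keys are needed, SQL Server 2000 lets you specify a per-column sort order when the index is created, as in this sketch (the index name is illustrative):

CREATE NONCLUSTERED INDEX ncl_emp_lvl ON employee01(lname ASC, job_lvl DESC)
GO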

SQL Server Performance Killers

Let’s now consider the major problem areas that can degrade SQL Server performance. By being aware of the main performance killers in SQL Server in advance, you will be able to focus your tuning efforts on the likely causes.

Once you have optimized the hardware, operating system, and SQL Server settings, the main performance killers in SQL Server are as follows, in a rough order (with the worst appearing first):

• Poor indexing

• Inaccurate statistics

• Excessive blocking and deadlocks

• Poor query design

• Poor database design

• Excessive fragmentation

• Non-reusable execution plans

• Frequent recompilation of execution plans

• Improper use of cursors

• Improper configuration of the database log

• Ineffective connection pooling

Let’s take a quick look at each of these.

Poor Indexing

Poor indexing is usually one of the biggest performance killers in SQL Server. In the absence of proper indexing for a query, SQL Server has to retrieve and process much more data while executing the query. This causes high amounts of stress on the disk, memory, and CPU, increasing the query execution time significantly. Increased query execution time then leads to excessive blocking and deadlocks in SQL Server.

Generally, indexes are considered to be the responsibility of the database administrator (DBA). However, the DBA cannot define how to use the indexes, since the use of indexes is determined by the database queries and stored procedures written by the developers. Therefore, defining the indexes should be the responsibility of the developers. Indexes created without the knowledge of the queries serve little purpose.

Inaccurate Statistics

As SQL Server relies heavily on cost-based optimization, accurate data-distribution statistics are extremely important for the effective use of indexes. Without accurate statistics, SQL Server’s built-in query optimizer cannot accurately estimate the number of rows affected by a query. As the amount of data to be retrieved from a table is highly important in deciding how to optimize the query execution, the query optimizer is much less effective if the data distribution statistics are not maintained accurately.
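
A sketch of how statistics can be inspected and refreshed by hand (the table and index names echo the earlier examples and are illustrative):

--View the histogram and density information the optimizer uses
DBCC SHOW_STATISTICS ('employee01', INco)
GO
--Rebuild the statistics by scanning every row
UPDATE STATISTICS employee01 WITH FULLSCAN
GO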

Excessive Blocking and Deadlocks

Because SQL Server is fully Atomicity, Consistency, Isolation, Durability (ACID) compliant, the database engine ensures that modifications made by concurrent transactions are properly isolated from one another. By default, a transaction sees the data either in the state before another concurrent transaction modified it or after the other transaction completed—it does not see an intermediate state.

Because of this isolation, when multiple transactions try to access a common resource concurrently in noncompatible ways, blocking occurs in the database. A deadlock, which is an outcome of blocking, aborts the victimized database request, which must then be resubmitted for successful execution. The execution time of a query is adversely affected by the amount of blocking and deadlocking it faces. For scalable performance of a multi-user database application, properly controlling the isolation levels and transaction scopes of the queries to minimize blocking and deadlocks is critical; otherwise, the execution time of the queries will increase significantly, even though the hardware resources may be highly underutilized.
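
A sketch of the general pattern (table and values are illustrative): keep the transaction scope small and commit promptly, since locks are held until the transaction ends.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN
UPDATE orders SET amount = amount + 10 WHERE order_id = 42
COMMIT TRAN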

Poor Query Design

The effectiveness of indexes depends entirely on the way you write SQL queries. Retrieving an excessively large number of rows from a table, or specifying a filter criterion that returns a larger result set than required, renders the indexes ineffective. To improve performance, you must ensure that your SQL queries are written to make the best use of new or existing indexes. Failing to write cost-effective SQL queries may prevent SQL Server from choosing proper indexes, which increases query execution time and database blocking.

Query design covers not only single queries but also sets of queries that are often used to implement database functionalities, such as queue management among queue readers and writers. Even when the performance of each individual query in the design is fine, the overall performance of the database can be very poor. Resolving this kind of bottleneck requires a broad understanding of the different characteristics of SQL Server that can affect the performance of database functionalities.

Poor Database Design

A database should be adequately normalized to increase the performance of data retrieval and reduce blocking. For example, if you have an under-normalized database with customer and order information in the same table, then the customer information is repeated in all of that customer’s order rows. This repetition of information in every row increases the I/Os required to fetch all the orders placed by a customer. At the same time, a data writer working on a customer’s order will reserve all the rows that include the customer information, and thus block all other data writers and data readers trying to access the customer profile.

Over-normalization of a database is as bad as under-normalization. Over-normalization increases the number and complexity of the joins required to retrieve data; an over-normalized database contains a large number of tables, each with a very small number of columns. As a very general rule of thumb, you may continue the normalization process unless it causes many queries to have four-way or greater joins. Having too many joins in a query may also mean that database entities have not been partitioned very distinctly, or that the query serves a very complex set of requirements that could perhaps be better served by creating a new view or stored procedure.
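
As a sketch of the adequately normalized form of the earlier customer/order example (table and column names are illustrative):

CREATE TABLE customer
(cust_id INT PRIMARY KEY,
cust_name VARCHAR(40),
phone VARCHAR(12))

CREATE TABLE cust_order
(order_id INT PRIMARY KEY,
cust_id INT REFERENCES customer(cust_id),
amount MONEY)
GO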

Excessive Fragmentation

While analyzing data retrieval operations, you can usually assume that the data is organized in an orderly way, as indicated by the index used by the data retrieval operation. However, if the pages containing the data are fragmented in a non-orderly fashion, or if they contain a small amount of data due to frequent page splits, then the number of read operations required by the data retrieval operation will be much higher than might otherwise be required. The increase in the number of read operations caused by fragmentation hurts query performance.

Non-reusable Execution Plans

To execute a query in an efficient way, SQL Server’s query optimizer spends a fair amount of CPU cycles creating a cost-effective execution plan. The good news is that the plan is cached in memory, so you can reuse it once created. However, if the plan is designed so that you cannot plug variable values into it, SQL Server creates a new execution plan every time the same query is resubmitted with different variable values. So, for better performance, it is extremely important to submit SQL queries in forms that help SQL Server cache and reuse the execution plans.
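
One common way to get a reusable plan from ad hoc Transact-SQL is sp_executesql, which separates the query text from the variable values, as in this sketch (the query itself is illustrative):

DECLARE @sql NVARCHAR(200)
SET @sql = N'SELECT emp_id, lname FROM employee01 WHERE job_lvl >= @lvl'
EXEC sp_executesql @sql, N'@lvl INT', @lvl = 100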

Frequent Recompilation of Execution Plans

One of the standard ways of ensuring a reusable execution plan, independent of variable values used in a query, is to use a stored procedure. Using a stored procedure to execute a set of SQL queries allows SQL Server to create a parameterized execution plan.

A parameterized execution plan is independent of the parameter values supplied during the execution of the stored procedure, and it is consequently highly reusable. However, the execution plan of the stored procedure can be reused only if SQL Server does not have to recompile the execution plan every time the stored procedure is run. Frequent recompilation of a stored procedure increases pressure on the CPU and the query execution time.
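
A sketch of the stored procedure approach (the procedure and query are illustrative):

CREATE PROCEDURE usp_emps_by_level @lvl INT
AS
SELECT emp_id, lname FROM employee01 WHERE job_lvl >= @lvl
GO

--The parameterized plan is reused for any @lvl value
EXEC usp_emps_by_level @lvl = 100
GO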

Improper Use of Cursors

By preferring a cursor-based (row-at-a-time) result set instead of a regular set-based SQL query, you add a fair amount of overhead on SQL Server. Use set-based queries whenever possible, but if you are forced to use cursors, be sure to use efficient cursor types such as fast forward–only. Excessive use of inefficient cursors increases stress on SQL Server resources, slowing down system performance.
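
If a cursor truly is unavoidable, a sketch of the efficient declaration (table and cursor names are illustrative):

DECLARE emp_cur CURSOR FAST_FORWARD FOR
SELECT emp_id FROM employee01
OPEN emp_cur
FETCH NEXT FROM emp_cur
WHILE @@FETCH_STATUS = 0
FETCH NEXT FROM emp_cur
CLOSE emp_cur
DEALLOCATE emp_cur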

Improper Configuration of the Database Log

By failing to follow the general recommendations for configuring a database log, you can adversely affect the performance of an Online Transaction Processing (OLTP)–based SQL Server database. For optimal performance, SQL Server relies heavily on efficient access to the database log.
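
One such general recommendation is to place the log on its own physical drive, as in this sketch (database name, file paths, and sizes are illustrative):

CREATE DATABASE orders_db
ON (NAME = orders_dat, FILENAME = 'D:\data\orders_db.mdf')
LOG ON (NAME = orders_log, FILENAME = 'E:\log\orders_db.ldf', SIZE = 500MB)
GO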

Ineffective Connection Pooling

If you don’t use connection pooling, or if you don’t have enough connections in your pool, then each database connection request goes across the network to establish a database connection. This network latency increases the query execution time. Poor use of the connection pool will also increase the amount of memory used in SQL Server, since a large number of connections requires a large amount of memory on the server.

Problem Definition

A common scenario today is that the user or programmer working with the database finds that queries execute slowly. One reason for this can be that many users are accessing SQL Server at the same time, but the user can also reduce the time SQL Server takes to execute a query by optimizing the query itself.

The difficulty in optimizing is that the user or programmer is usually not well versed in the SQL Server optimizer and the way it executes queries. So the user or programmer submits the query to the organization's DBA, who must go through the query and change it, or the underlying tables, so that the query performs optimally. Even DBAs are often unable to tune a user's query efficiently, and when they can, the tuning consumes a great deal of the DBA's time and resources. This is not feasible when the user or programmer needs the query urgently.

One answer is a tool that optimizes the query quickly and without the DBA's help. A further problem the DBA faces is having to rely on personal skill and experience, since there are no set rules for optimization. This also means there is no reference against which the DBA can be assured that a query has been optimized properly.

System Analysis & Design

Query Execution Process

The path that a query traverses through a DBMS until its answer is generated is shown in Figure 1. The system modules through which it moves have the following functionality:

The Query Parser checks the validity of the query and then translates it into an internal form, usually a relational calculus expression or something equivalent.

The Query Optimizer examines all algebraic expressions that are equivalent to the given query and chooses the one that is estimated to be the cheapest.

The Code Generator or the Interpreter transforms the access plan generated by the optimizer into calls to the query processor.

The Query Processor actually executes the query.

Queries are posed to a DBMS by interactive users or by programs written in general-purpose programming languages (e.g., C/C++, Fortran, PL/1) that have queries embedded in them. An interactive (ad hoc) query goes through the entire path shown in Figure 1. On the other hand, an embedded query goes through the first three steps only once, when the program in which it is embedded is compiled (compile time).

The code produced by the Code Generator is stored in the database and is simply invoked and executed by the Query Processor whenever control reaches that query during the program execution (run time). Thus, independent of the number of times an embedded query needs to be executed, optimization is not repeated until database updates make the access plan invalid (e.g., index deletion) or highly suboptimal (e.g., extensive changes in database contents). There is no real difference between optimizing interactive or embedded queries.

The area of query optimization is very large within the database field. The purpose here is primarily to discuss the core problems in query optimization and their solutions, and only to touch upon the wealth of results that exist beyond that. More specifically, we concentrate on optimizing a single flat SQL query with ‘and’ as the only Boolean connective in its qualification (also known as a conjunctive query, select-project-join query, or non-recursive Horn clause) in a centralized relational DBMS, assuming that full knowledge of the run-time environment exists at compile time.
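
An example of such a conjunctive (select-project-join) query, using the emp/dept schema that appears in later examples (the extra predicate is illustrative):

select e.name, d.floor
from emp e, dept d
where e.dno = d.dno and e.job = 'Sr. Programmer' and e.sal > 50000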

Performance Tuning Process

The performance tuning process consists of identifying performance bottlenecks, troubleshooting their cause, applying different resolutions, and then quantifying performance improvements. It is necessary to be a little creative, since most of the time there is no one silver bullet to improve performance. The challenge is to narrow down the list of possible reasons and evaluate the effects of different resolutions. You may even undo modifications as you iterate through the tuning process. During the tuning process, you must examine various hardware and software factors that can affect the performance of a SQL Server–based application. A few of the general questions you should be asking yourself during the performance analysis are as follows:

• Is any other resource-intensive application running on the same server?

• Is the hardware subsystem capable of withstanding the maximum workload?

• Is SQL Server configured properly?

• Is the database connection between SQL Server and the database application efficient?

• Does the database design support the fastest data retrieval (and modification for an updateable database)?

• Is the user workload, consisting of SQL queries, optimized to reduce the load on SQL Server?

• Does the workload support the maximum concurrency?

If any of these factors is not configured properly, then the overall system performance may suffer. Performance tuning is an iterative process, where you identify major bottlenecks, attempt to resolve them, measure the impact of your changes, and return to the first step until performance is acceptable.

While applying your solutions, you should follow the golden rule of making only one change at a time. Any change usually affects other parts of the system, so you must re-evaluate the effect of each change on the performance of the overall system. As an example, the addition of an index may fix the performance of a specific query, but it could cause other queries to run more slowly. Evaluating one change at a time also helps in prioritizing the implementation order of the changes on the production server, based on their relative contributions.

You can keep chipping away at performance bottlenecks and improving the system performance gradually. Initially, you will be able to resolve big performance bottlenecks and achieve significant performance improvements, but as you proceed through the iterations, your returns will gradually diminish. Therefore, to use your time efficiently, it is worthwhile to quantify the performance objectives first (for example, an 80% reduction in the time taken for a certain query, with no adverse effect anywhere else on the server), and then work toward them.
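
A sketch of how each iteration's impact can be quantified (the query shown is illustrative):

--Bracket the query under test so the effect of each change can be measured
SET STATISTICS TIME ON
SET STATISTICS IO ON
GO
SELECT emp_id, lname FROM employee01 WHERE job_lvl >= 100
GO
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
GO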

The performance of a SQL Server application is highly dependent on the amount and distribution of user activity (or workload) and data. Both the amount and distribution of workload and data change over time, and differing data can cause SQL Server to execute SQL queries differently. Therefore, to ensure an optimum system performance on a continuing basis, you will need to analyze performance at regular intervals. Performance tuning is a never-ending process, as shown in Figure 1-1.

You can see that the steps to optimize the costliest query make for a complex process, which also requires multiple iterations to troubleshoot the performance issues within the query and apply one change at a time. The steps involved in the optimization of the costliest query are shown in Figure 1-2.

As you can see from this process, there is quite a lot to do to ensure that you correctly tune the performance of a given query. It is important to use a solid process like this in performance tuning, to focus on the main identified issues.

Having said this, it also helps to try and keep a broader perspective about the problem as a whole, since sometimes you may believe that you are trying to solve the correct performance bottleneck, when in reality something else is causing the problem.

Query Optimizer Architecture

Given a database and a query on it, several execution plans exist that can be employed to answer the query. In principle, all the alternatives need to be considered so that the one with the best estimated performance is chosen. An abstraction of the process of generating and testing these alternatives is shown in Figure 2, which is essentially a modular architecture of a query optimizer. Although one could build an optimizer based on this architecture, in real systems the modules shown do not always have boundaries as clear-cut as those in Figure 2. Based on Figure 2, the entire query optimization process can be seen as having two stages: rewriting and planning. There is only one module in the first stage, the Rewriter, whereas all other modules are in the second stage. The functionality of each of the modules in Figure 2 is analyzed below:

Rewriter

This module applies transformations to a given query and produces equivalent queries that are hopefully more efficient, e.g., replacement of views with their definitions, flattening out of nested queries, etc. The transformations performed by the Rewriter depend only on the declarative, i.e., static, characteristics of queries and do not take into account the actual query costs for the specific DBMS and database concerned. If the rewriting is known or assumed to always be beneficial, the original query is discarded; otherwise, it is sent to the next stage as well. By the nature of the rewriting transformations, this stage operates at the declarative level.

Planner

This is the main module of the planning stage. It examines all possible execution plans for each query produced in the previous stage and selects the overall cheapest one to be used to generate the answer to the original query. It employs a search strategy, which examines the space of execution plans in a particular fashion. This space is determined by two other modules of the optimizer, the Algebraic Space and the Method-Structure Space. For the most part, these two modules and the search strategy determine the cost, i.e., running time, of the optimizer itself, which should be as low as possible. The execution plans examined by the Planner are compared based on estimates of their cost so that the cheapest may be chosen. These costs are derived by the last two modules of the optimizer, the Cost Model and the Size-Distribution Estimator.

Algebraic Space

This module determines the action execution orders that are to be considered by the Planner for each query sent to it. All such series of actions produce the same query answer but usually differ in performance. They are usually represented in relational algebra as formulas or in tree form. Because of the algorithmic nature of the objects generated by this module and sent to the Planner, the overall planning stage is characterized as operating at the procedural level.

Method-Structure Space

This module determines the implementation choices that exist for the execution of each ordered series of actions specified by the Algebraic Space. This choice is related to the available join methods for each join (e.g., nested loops, merge scan, and hash join), whether supporting data structures are built on the fly, if/when duplicates are eliminated, and other implementation characteristics of this sort that are predetermined by the DBMS implementation. This choice is also related to the available indices for accessing each relation, which is determined by the physical schema of each database stored in its catalogs. Given an algebraic formula or tree from the Algebraic Space, this module produces all corresponding complete execution plans, which specify the implementation of each algebraic operator and the use of any indices.

Cost Model

This module specifies the arithmetic formulas that are used to estimate the cost of execution plans. For every different join method, for every different index-type access, and in general for every distinct kind of step that can be found in an execution plan, there is a formula that gives its cost. Given the complexity of many of these steps, most of these formulas are simple approximations of what the system actually does and are based on certain assumptions regarding issues like buffer management, disk-CPU overlap, sequential vs. random I/O, etc. The most important input parameters to a formula are the size of the buffer pool used by the corresponding step, the sizes of relations or indices accessed, and possibly various distributions of values in these relations. While the first one is determined by the DBMS for each query, the other two are estimated by the Size-Distribution Estimator.

Size-Distribution Estimator

This module specifies how the sizes (and possibly frequency distributions of attribute values) of database relations and indices, as well as (sub)query results, are estimated. As mentioned above, these estimates are needed by the Cost Model. The specific estimation approach adopted in this module also determines the form of statistics that need to be maintained in the catalogs of each database, if any.

Advanced Types of Optimization

Semantic Query Optimization

Semantic query optimization is a form of optimization mostly related to the Rewriter module. The basic idea lies in using integrity constraints defined in the database to rewrite a given query into semantically equivalent ones. These can then be optimized by the Planner as regular queries, and the most efficient plan among them all can be used to answer the original query. As a simple example, using a hypothetical SQL-like syntax, consider the following integrity constraint:

assert sal-constraint on emp: sal > 100K where job = 'Sr. Programmer'

Also consider the following query:

select name, floor
from emp, dept
where emp.dno = dept.dno and job = 'Sr. Programmer'

Using the above integrity constraint, the query can be rewritten into a semantically equivalent one to include a selection on sal:

select name, floor
from emp, dept
where emp.dno = dept.dno and job = 'Sr. Programmer' and sal > 100K

Having the extra selection could help tremendously in finding a fast plan to answer the query if the only index in the database is a B+-tree on emp.sal. On the other hand, it would certainly be a waste if no such index exists. For such reasons, all proposals for semantic query optimization present various heuristics or rules on which rewritings have the potential of being beneficial and should be applied, and which do not.

Global Query Optimization

So far, we have focused our attention on optimizing individual queries. Quite often, however, multiple queries become available for optimization at the same time, e.g., queries with unions, queries from multiple concurrent users, queries embedded in a single program, or queries in a deductive system. Instead of optimizing each query separately, one may be able to obtain a global plan that, although possibly suboptimal for each individual query, is optimal for the execution of all of them as a group. Several techniques have been proposed for global query optimization. As a simple example of the problem of global optimization, consider the following two queries:

select name, floor
from emp, dept
where emp.dno = dept.dno and job = 'Sr. Programmer'

select name
from emp, dept
where emp.dno = dept.dno and budget > 1M

Depending on the sizes of the emp and dept relations and the selectivities of the selections, it may well be that computing the entire join once and then applying the two selections separately to obtain the results of the two queries is more efficient than performing the join twice, each time taking into account the corresponding selection. Developing Planner modules that examine all the available global plans and identify the optimal one is the goal of global/multiple query optimizers.

Parametric/Dynamic Query Optimization

As mentioned earlier, embedded queries are typically optimized once at compile time and executed multiple times at run time. Because of this temporal separation between optimization and execution, the values of various parameters that are used during optimization may be very different during execution. This may make the chosen plan invalid (e.g., if indices used in the plan are no longer available) or simply not optimal (e.g., if the number of available buffer pages or operator selectivities have changed, or if new indices have become available). To address this issue, several techniques have been proposed that use various search strategies (e.g., randomized algorithms or the strategy of Volcano) to optimize queries as much as possible at compile time, taking into account all possible values that interesting parameters may have at run time. These techniques use the actual parameter values at run time and simply pick the plan that was found optimal for them, with little or no overhead. Of a drastically different flavor is the technique of Rdb/VMS, where, by dynamically monitoring how the probability distribution of plan costs changes, plan switching may actually occur during query execution.

Displaying Execution Plans

Execution Plan Basics

I have always considered that one of the easiest ways to tune a stored procedure is simply to study its execution plan. An execution plan is essentially a road map that graphically or textually shows the data retrieval methods chosen by the SQL Server query optimizer for a stored procedure or ad hoc query. It is a very useful tool for understanding the performance characteristics of a query or stored procedure, since the plan is what SQL Server places in its cache and uses to execute the stored procedure or query. Most developers will grow to the point where it is a simple matter to look at an execution plan and decide which step of a stored procedure is causing performance issues.

Execution plans can be viewed in either a graphical or a textual format, depending on the method used to obtain the plan. Query Analyzer and a small group of third-party tools (I personally use mssqlXpress, available at www.xpressapps.com) can turn the text-based plan into an easily viewed set of icons. From there it is a simple matter of understanding the different icons and knowing how to drill down into each icon to retrieve detailed data. If you do not use Query Analyzer or have a third-party tool available, you can use Transact-SQL to display a text-based execution plan. Transact-SQL provides several commands to display execution plans: SET STATISTICS PROFILE, SET STATISTICS IO, SET STATISTICS TIME, SET SHOWPLAN_ALL, and SET SHOWPLAN_TEXT. You can use one or all of these commands to display a text-based execution plan with varying degrees of detail.

Graphics-Based Execution Plans

Most developers prefer the graphics-based execution plans displayed by Query Analyzer or a third-party tool, as a quick glance at them can reveal any major performance problems with a query. While the methods to retrieve graphical execution plans vary by application, most of the icons used are very similar in functionality and appearance. The next few examples show how to return graphical execution plans with Query Analyzer. If you use a third-party tool in your development, please see that tool's help section on execution plans to learn how to display them.

Query Analyzer Graphical Execution Plans

Once you have loaded your query or created a call to a stored procedure in the editor pane, click Query on the toolbar and then select Show Execution Plan. Execute the query; after the query has finished executing, select the Execution Plan tab to see the graphical execution plan output.

Example 1: Type in the following query, enable execution plans and then execute the query.

--Change to pubs database
USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO

Execution Plan Output

Query Analyzer Graphical Estimated Execution Plans

As you can see from the above example, Show Execution Plan will actually execute the query or stored procedure and output the execution plan used by the optimizer to a separate window. What if you did not want the query to actually execute but wanted to get a sense of what the optimizer is going to do? Query Analyzer allows you to simply show the estimated execution plan without actually running the query or stored procedure by using the Display Estimated Execution Plan tool.

A point to remember is that you cannot generate an estimated plan if your query contains temporary objects or references to objects the query itself builds, unless those objects already exist. You will have to build the temporary or permanent object first and then obtain the estimated plan. Once you have loaded your query or created a call to a stored procedure in the editor pane, click Query on the toolbar and then select Display Estimated Execution Plan. The estimated plan is displayed immediately, without the query actually being executed.

Example 2: Type in the following query and then display its estimated execution plan.

USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO

As you will notice from the outputs shown above, the graphics-based plans do not seem to give you the detailed information you may need to determine a problem. To obtain detailed information from each icon, simply place the cursor over the icon and the information is displayed.

Text-Based Execution Plans

If you do not have the ability to obtain a graphical execution plan, you can still use a series of Transact-SQL commands to retrieve an execution plan. While not as flashy for those of us who are visually oriented, these Transact-SQL commands still provide a developer with a wealth of information for finding performance issues within a stored procedure or query.

SET SHOWPLAN_ALL - instructs SQL Server not to execute Transact-SQL statements but instead to return detailed information about how the statements would be executed, along with estimates of the resource requirements for the statements.

Syntax: SET SHOWPLAN_ALL {ON | OFF}

Example 3 - Type and execute the following query.

--Enable SET SHOWPLAN_ALL
SET SHOWPLAN_ALL ON
GO
--Change to pubs database
USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO

SET SHOWPLAN_ALL Output

SET SHOWPLAN_TEXT - instructs SQL Server not to execute Transact-SQL statements but instead to return detailed information about how the statements would be executed.

Syntax: SET SHOWPLAN_TEXT {ON | OFF}

Example 4 - Type and execute the following query.

--Enable SET SHOWPLAN_TEXT
SET SHOWPLAN_TEXT ON
GO
--Change to pubs database
USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO

SET SHOWPLAN_TEXT Output

SET STATISTICS PROFILE - instructs SQL Server to display the profile information for a statement after executing the statement.

Syntax: SET STATISTICS PROFILE {ON | OFF}

Example 5: Type and execute the following query.

--Enable SET STATISTICS PROFILE
SET STATISTICS PROFILE ON
GO
--Change to pubs database
USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO

SET STATISTICS PROFILE Output

SET STATISTICS IO - instructs SQL Server to display information regarding the amount of disk activity generated by Transact-SQL statements after executing the statement.

Syntax: SET STATISTICS IO {ON | OFF}

Example 6 - Type and execute the following query.

--Enable SET STATISTICS IO
SET STATISTICS IO ON
GO
--Change to pubs database
USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO

SET STATISTICS IO Output

Estimated rows column in an execution plan

The number of rows estimated by the optimizer shown in the execution plan can be a major factor in how the optimizer creates the execution plan. Understanding the number of estimated rows can help a developer in understanding the options used by the optimizer to create an execution plan. A large number of estimated rows can tell the developer why a merge join is more appropriate than a nested loop or why an index scan is favored over an index seek. Developers should investigate situations when small numbers of estimated rows with large estimated costs are seen in execution plans.

The difference between the estimated rows shown in the execution plan and the actual number of rows returned can serve as a major warning for a developer. The query optimizer uses column statistics to determine the estimated row count for an execution plan, and if those statistics are out of date, the optimizer can make very bad choices in the options it uses for a query. A developer should compare the estimated rows in the execution plan with the actual rows returned by the query to determine whether the column statistics need to be updated; updating them can dramatically change the execution plan and increase the performance of the query.

To understand the problems caused by outdated statistics, which make the estimated row count wrong, let's look at the following example.

The first thing we need to do is build a database with two tables, add a small number of rows to the table, build some indexes, and then look at the execution plan for a simple query.

SET NOCOUNT ON
USE master
GO

--Create new database
IF EXISTS(SELECT name FROM master.dbo.sysdatabases WHERE name = 'test_est_rows')
DROP DATABASE test_est_rows
GO
CREATE DATABASE test_est_rows
GO
USE test_est_rows
GO

--Create tables for query
IF OBJECT_ID('test_est_rows') IS NOT NULL
DROP TABLE test_est_rows
GO
IF OBJECT_ID('test_est_rows1') IS NOT NULL
DROP TABLE test_est_rows1
GO
CREATE TABLE test_est_rows
(intCol1 INTEGER,
intCol2 INTEGER)

CREATE TABLE test_est_rows1
(intCol1 INTEGER,
intCol2 INTEGER)
GO

--Insert 10 rows into each table
DECLARE @intLoop INTEGER
SET @intLoop = 10

WHILE @intLoop > 0
BEGIN
INSERT INTO test_est_rows VALUES (@intLoop,@intLoop)
SET @intLoop = @intLoop - 1
END

SET @intLoop = 10

WHILE @intLoop > 0
BEGIN
INSERT INTO test_est_rows1 VALUES (@intLoop,@intLoop)
SET @intLoop = @intLoop - 1
END
GO

--Build nonclustered indexes
CREATE NONCLUSTERED INDEX ncl_test_est_rows ON test_est_rows(intCol1)
GO

--Build nonclustered indexes
CREATE NONCLUSTERED INDEX ncl_test_est_rows1 ON test_est_rows1(intCol1)
GO

--Obtain execution plan
SET STATISTICS PROFILE ON
GO

--Return data
SELECT t1.intCol1, t2.intCol1
FROM dbo.test_est_rows t1
INNER JOIN dbo.test_est_rows1 t2 ON t1.intCol1 = t2.intCol1
WHERE t1.intCol1 = 5
GO

--Obtain execution plan
SET STATISTICS PROFILE OFF
GO

Execution Plan for the above query

Next we will turn off SQL Server’s ability to update the statistics for the two indexes created above and add 50,000 new rows to the tables. These added rows should produce a situation in which the statistics are dramatically outdated, as they were built on 10 rows of data rather than 50,010.

--Turn off auto update stats
ALTER DATABASE test_est_rows SET AUTO_UPDATE_STATISTICS OFF
GO

--Modify number of rows in each table
DECLARE @intLoop INTEGER
SET @intLoop = 50000

WHILE @intLoop > 10
BEGIN
INSERT INTO test_est_rows VALUES (@intLoop,@intLoop)
SET @intLoop = @intLoop - 1
END

SET @intLoop = 50000

WHILE @intLoop > 10
BEGIN
INSERT INTO test_est_rows1 VALUES (@intLoop,@intLoop)
SET @intLoop = @intLoop - 1
END
GO

--Obtain execution plan
SET STATISTICS PROFILE ON
GO

--Return data
SELECT t1.intCol2, t2.intCol2
FROM dbo.test_est_rows t1
INNER JOIN dbo.test_est_rows1 t2 ON t1.intCol1 = t2.intCol1
WHERE t1.intCol1 = 5
GO

--Obtain execution plan
SET STATISTICS PROFILE OFF
GO

Execution Plan for the above query

You can see from the execution plan that the route taken by the optimizer is dramatically different from the one in our 10-row table plan. In my case, the optimizer thought there might be 2,500,000 rows in the table instead of 50,010. The optimizer developed a plan based on the estimated rows and used parallelism, a table spool, and table scans to create the best route for the query. Now, let's take a look at what the optimizer does when it has correct statistics from which to determine the estimated rows.

--Update statistics
exec sp_updatestats
GO

--Obtain execution plan
SET STATISTICS PROFILE ON
GO

--Return data
SELECT t1.intCol2, t2.intCol2
FROM dbo.test_est_rows t1
INNER JOIN dbo.test_est_rows1 t2 ON t1.intCol1 = t2.intCol1
WHERE t1.intCol1 = 5
GO

--Obtain execution plan
SET STATISTICS PROFILE OFF
GO

Execution Plan for the above query

Look, the index seeks are back. The estimated rows matched the actual rows. The parallelism is gone. The table spool has been removed. A much better execution plan for the query has been produced once the optimizer has the correct statistics to obtain estimated rows from.

Bookmark Lookups

One of the major overheads associated with the use of non-clustered indexes is the cost of bookmark lookups. Bookmark lookups are a mechanism to navigate from a non-clustered index row to the actual data row in the base table (or clustered index), and they can be very expensive when dealing with large numbers of rows.

When a small number of rows are requested by a query, the SQL Server optimizer will try to use a non-clustered index on the column or columns contained in the WHERE clause to retrieve the data requested by the query. If the query requests data from columns not contained in the non-clustered index, SQL Server must go back to the data pages to obtain the data in those columns. It doesn't matter whether the table contains a clustered index or not; the query will still have to return to the table or clustered index to retrieve the data.

Bookmark lookups require data page access in addition to the index page access needed to filter the table data rows. Because this requires the query to access two sets of pages instead of only one, the number of logical READs performed by the query will increase. If the data pages are not in the buffer cache, additional I/O operations will be required, and in most large tables the index pages and the corresponding data pages are not located close to each other on the disk. These additional logical READs and physical I/Os can make bookmark lookups quite costly. While this cost may be acceptable for small result sets, it becomes increasingly prohibitive as result sets grow larger. In fact, as the result set grows, the optimizer may consider the cost of the bookmark lookups too high, discard the non-clustered index, and simply perform a table scan instead.

Example of a Bookmark Lookup

SET STATISTICS PROFILE ON
GO

USE pubsGO

--Find phone number for White, Johnson
SELECT phone
FROM dbo.authors
WHERE au_lname = 'White'
AND au_fname = 'Johnson'
GO

Execution Plan (abridged)

Rows  Executes  StmtText
----  --------  ------------------------------------------------------------
1     1         SELECT [phone]=[phone] FROM [dbo].[authors]
1     1         |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([pubs].[dbo].[authors]))
1     1           |--Index Seek(OBJECT:([pubs].[dbo].[authors].[aunmind]...

Because both au_lname and au_fname are contained in a non-clustered index, the optimizer can use that index to filter the rows contained in the table and return only the requested phone numbers. However, because the phone column in the authors table is not contained in that index or in any other non-clustered index, the optimizer must return to the authors table to fetch the matching phone number, which creates a bookmark lookup.

Finding the offending column(s)

In order to resolve the bookmark lookup, you must find the column or columns that cause it. To find the offending columns, look at the index usage in the execution plan to see which index is utilized by the optimizer for the query.

Execution Plan (abridged)

StmtText
------------------------------------------------------------
|--Index Seek(OBJECT:([pubs].[dbo].[authors].[aunmind]), SEEK:([authors].[au_lname]='White' AND [authors].[au_fname]='Johnson') ORDERED FORWARD)

In this case we see that the authors.aunmind index is being used by the optimizer for the query. A quick check of the columns included in the index, using sp_helpindex on the authors table, shows that the index consists of the au_lname and au_fname columns.

index_name    index_description                    index_keys
aunmind       nonclustered located on PRIMARY      au_lname, au_fname

A review of the execution plan OutputList column reveals that the phone column is the only remaining column being requested by the query.

Execution Plan (abridged)

OutputList
------------------------------------------------------------
[authors].[phone]

Since the phone column is not in the index, you can deduce that the phone column is the offending column in this case.

Resolving bookmark lookups

Once you discover the columns responsible for a bookmark lookup, you will need to consider one of the following methods to resolve it:

1. Create a covering index
2. Remove the offending column
3. Convert a non-clustered index into a clustered index

Create a covering index

Given the example listed earlier in this section, if the following covering index had been created, the result would be the removal of the bookmark lookup from the execution plan.

CREATE NONCLUSTERED INDEX ncl_authors_phone ON authors(au_lname, au_fname, phone)
GO

Execution Plan

SELECT [phone]=[phone] FROM [dbo].[authors] WHERE [au_lname]=@1 AND [au_fname]=@2
|--Index Seek(OBJECT:([pubs].[dbo].[authors].[ncl_authors_phone]), SEEK:([authors].[au_lname]=[@1] AND [authors].[au_fname]=[@2]) ORDERED FORWARD)

Remove the offending column

In the simple query below, the developer returned all the columns from the authors table when all the query asked for was the ID of the author.

SET STATISTICS PROFILE ON
GO

USE pubs
GO

--Find ID number for White, Johnson
SELECT *
FROM dbo.authors
WHERE au_lname = 'White'
AND au_fname = 'Johnson'
GO

Execution Plan

StmtText
------------------------------------------------------------
SELECT * FROM [dbo].[authors] WHERE [au_lname]=@1 AND [au_fname]=@2
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([pubs].[dbo].[authors]))
  |--Index Seek(OBJECT:([pubs].[dbo].[authors].[aunmind]), SEEK:([authors].[au_lname]='White' AND [authors].[au_fname]='Johnson') ORDERED FORWARD)

Removing the additional, unneeded columns and returning only the au_id column removes the bookmark lookup in this case.

SET STATISTICS PROFILE ON
GO

USE pubs
GO

--Find ID number for White, Johnson
SELECT au_id
FROM dbo.authors
WHERE au_lname = 'White'
AND au_fname = 'Johnson'
GO

Execution Plan

StmtText
------------------------------------------------------------
SELECT [au_id]=[au_id] FROM [dbo].[authors] WHERE [au_lname]=@1 AND [au_fname]=@2
|--Index Seek(OBJECT:([pubs].[dbo].[authors].[aunmind]), SEEK:([authors].[au_lname]=[@1] AND [authors].[au_fname]=[@2]) ORDERED FORWARD)

Bookmark lookups are often caused by additional columns being returned in the data set "just in case" they are needed at a later date. Developers should strive to include in their result sets only those columns needed for the defined query requirements. Additional columns can always be added at a later date.

Convert a non-clustered index into a clustered index

When developers are faced with bookmark lookups that cannot be removed with the other choices described above, an alternative is to convert an existing index being used by the query into a clustered index. Converting an existing index into a clustered index effectively places all the columns of the table in the index and prevents the need for a bookmark lookup.

SET STATISTICS PROFILE ON
GO
USE pubs
GO

--Find information for employee PMA42628M
SELECT fname + ' ' + lname + ' Hire Date: ' + CAST(hire_date AS VARCHAR(12))
FROM dbo.employee
WHERE emp_id = 'PMA42628M'
GO

Execution Plan

StmtText
------------------------------------------------------------
SELECT fname + ' ' + lname + ' Hire Date: ' + CAST(hire_date AS VARCHAR(12)) FROM dbo.employee WHERE emp_id = 'PMA42628M'
|--Compute Scalar(DEFINE:([Expr1002]=[employee].[fname]+' '+[employee].[lname]+' Hire Date: '+Convert([employee].[hire_date])))
  |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([pubs].[dbo].[employee]))
    |--Index Seek(OBJECT:([pubs].[dbo].[employee].[PK_emp_id]), SEEK:([employee].[emp_id]='PMA42628M') ORDERED FORWARD)

To resolve the bookmark lookup, the developer can change the existing clustered index on the lname, fname, and minit columns into a non-clustered index.

--Change original clustered index into a non-clustered index
DROP INDEX employee.employee_ind
GO
CREATE INDEX employee_ind ON employee(lname,fname,minit)
GO

Once the clustered index has been changed into a non-clustered index, a new clustered index can be built on the emp_id column to resolve the bookmark lookup. In this particular case emp_id is the PRIMARY KEY of the table, so instead of an index, the developer needs to recreate a clustered PRIMARY KEY.

--Create new clustered index
--Drop CONSTRAINT
ALTER TABLE employee DROP CONSTRAINT PK_emp_id
GO

--Recreate CONSTRAINT
ALTER TABLE employee ADD CONSTRAINT PK_emp_id PRIMARY KEY CLUSTERED (emp_id)
GO

--Test removal of bookmark lookup
--Find information for employee PMA42628M
SELECT fname + ' ' + lname + ' Hire Date: ' + CAST(hire_date AS VARCHAR(12))
FROM dbo.employee
WHERE emp_id = 'PMA42628M'
GO

Execution Plan

StmtText
------------------------------------------------------------
SELECT fname + ' ' + lname + ' Hire Date: ' + CAST(hire_date AS VARCHAR(12)) FROM dbo.employee WHERE emp_id = 'PMA42628M'
|--Compute Scalar(DEFINE:([Expr1002]=[employee].[fname]+' '+[employee].[lname]+' Hire Date: '+Convert([employee].[hire_date])))
  |--Clustered Index Seek(OBJECT:([pubs].[dbo].[employee].[PK_emp_id]), SEEK:([employee].[emp_id]='PMA42628M') ORDERED FORWARD)

While converting a non-clustered index into a clustered index is a possible solution to bookmark lookups, applications often depend on the current clustered index, and this solution can be almost impossible to implement in the real world.

SQL Server - Indexes and Performance

One of the keys to SQL Server performance is ensuring that you have the proper indexes on a table, so that any queries written against it can run efficiently. There are many articles about designing indexes, choosing columns, and so on for optimizing performance, so I will refrain from repeating most of what is written elsewhere; I have included a few resources at the end of this article for this topic.

However, once we have built the indexes, there is still work to be done. As your data sets grow over time, SQL Server will continue to rebuild indexes and move data around as efficiently as possible. This happens in a number of ways, but the result is that you may need to perform maintenance on your indexes over time, despite all of the automatic tools built into SQL Server. This article discusses some of the issues with data growth over time, as well as a technique to find tables in need of maintenance and how to perform that maintenance.

What happens over time?

If SQL Server includes automatic statistics updating, a query optimizer that can learn to be more efficient with your queries, etc., why do we need to perform maintenance? Well, let's examine what happens over time.

When you build an index on a table (let's assume a clustered index), SQL Server parcels the data across pages and extents. With v7.x and above, extents can be shared between objects (with v6.5, extents contain a single object). As a result, let's assume you create a table with rows that are less than 1/4 of a page in size, so that four rows fit on a page. If you have 20 rows, then you have 5 pages worth of data. Is your data stored on 5 pages? Only if your FILLFACTOR is 100%. The fill factor determines how much, percentage-wise, your pages are filled. Let's assume a FILLFACTOR of 50%; then you would have 10 pages of data allocated to this table. This is getting complicated quickly, but let's examine it a bit more.

If we expand this example over time, we may grow to 100 pages of data. These require (at a minimum) 13 extents if this object does not share any extents. Each page within the extents links to another page with a pointer. The next page in the chain, however, may not be in the same extent. Therefore, as we read the pages, we may need to "switch" to another extent.

The simplest example is to assume we take 3 consecutive pages of data in the following order:

Extent 1        Extent 2
Page n          Page n + 1
Page n + 2

These are any three pages where page n links to page n+1, then to page n+2, and so on. To read these three pages we read extent 1, then switch to extent 2, then switch back to extent 1. These "switches" do not necessarily entail physical I/O, but all of these switches add up. They may not be a big deal on your local server or even a lightly loaded server, but a web application that has hundreds or thousands of users could see a large performance impact from repeated scans of this table.

Why does the table end up looking like this? This is how the table is designed to function over time. SQL Server will allocate space for each row based on the space available at that time. As a result, while a clustered index stores the data in physical order on a page, the pages may not be in physical order. Instead, each page has a linkage to the next page in the sequence. Just as your hard disk can become fragmented over time as you delete and insert files, the allocation of pages for a table can become fragmented over time across extents as the data changes.

So why doesn't SQL Server just rebuild the indexes? I am not sure if I would even want it to do so. I would hate for this to occur right after a large web marketing campaign! Instead the engineers in Redmond have left it up to the DBA to track this fragmentation and repair it as necessary.

Running DBCC SHOWCONTIG

Prior to SQL Server 2000, you had to first get the object ID using the following command:

SELECT object_id('<object name>')

For the user table

SELECT object_id('user')

This returned me some long number (from sysobjects) that means nothing to me, but the SQL team in Redmond must use this often and did not feel like including the join in their code. I guess someone complained long and loud enough because in SQL 2000 you can use the name of the object in dbcc showcontig like this:

dbcc showcontig (user)

This produces the following statistics on your indexes:

DBCC SHOWCONTIG scanning 'User' table...

Table: 'User' (962102468); index ID: 1, database ID: 7

TABLE level scan performed.

Pages Scanned................................: 899
Extents Scanned..............................: 121
Extent Switches..............................: 897
Avg. Pages per Extent........................: 7.4
Scan Density [Best Count:Actual Count].......: 12.58% [113:898]
Logical Scan Fragmentation ..................: 99.89%
Extent Scan Fragmentation ...................: 80.99%
Avg. Bytes Free per Page.....................: 2606.5
Avg. Page Density (full).....................: 67.80%

The above output is explained in detail below:

Pages Scanned - Gives the number of physical pages scanned in this index. Not really relevant by itself, but it gives you the total size occupied by this index (each page is 8 KB).

Extents Scanned - An extent is 8 pages, so this should be pretty close to Pages Scanned / 8. In this example we have 121 extents, which is 968 pages. Since the index is only 899 pages, we have a number of shared extents. Not necessarily a bad thing, but it gives you an idea that you are slightly fragmented. Of course, you do not know how much physical fragmentation there is, which can contribute to longer query times. The minimum number of extents for the 899 pages above would be 113 (899/8, rounded up).

Extent Switches - The number of times the scan forced a switch from one extent to another. As this gets close to the number of pages, you have pretty high fragmentation and may want to rebuild the index. See the detailed example below.

Average Pages/Extent - Gives the result of Pages Scanned / Extents Scanned. Not of any great value other than saving you a trip to Calculator. Fully populated extents would give a value of 8 here.

Scan Density [Best Count:Actual Count].......: 12.58% [113:898]

This is the tough one. It shows a percentage and two numbers separated by a colon. I will explain it carefully, as I missed it the first two times around. The percentage is the result of dividing the first number (113) by the second (898). So what are the two numbers?

The first number is the ideal number of extents read if everything were linked in a contiguous chain. The second number is the number of extents moved through, which is one more than the number of extent switches (by definition). This is really another view of fragmentation: 100% would be minimal (I hate to say zero) fragmentation. As you can see, this table is fairly fragmented. The scan is constantly switching back and forth from one extent to another instead of following links from page to page within an extent.

Logical Scan Fragmentation ..................: 99.89%

I am still not sure what this means. I have not gotten a good explanation of this anywhere, so here is my best interpretation: it shows the percentage of pages in the index whose pointer to the next page differs from the next page in physical (leaf) order. This is only relevant for clustered indexes, as the data (leaf pages) should be physically in the order of the clustered index.

So how do you use this? If you figure it out, let me know. Since this number is high for me, and other items lead me to think this index is fragmented, I take it as a bad sign. So aim for a low number in OLAP systems and a medium number in OLTP systems.

Extent Scan Fragmentation ...................: 80.99%

Again, here is the official BOL explanation (v7.x and 2000 Beta 2).

Percentage of out-of-order extents in scanning the leaf pages of an index. This number is not relevant to heaps. An out-of-order extent is one for which the extent containing the current page for an index is not physically the next extent after the extent containing the previous page for an index.

This shows the percentage of pages where the next page in the index is not physically located next to the current page. This tells me the I/O system must move fairly often (80% of the time) when scanning the index to find the next page. A Detailed Explanation is given below.

Avg. Bytes Free per Page.....................: 2606.5

This tells you (on average) how many bytes are free per page. Since a page holds 8096 bytes of data, it appears I have filled about 68% of the pages on average. This can be good or bad. If this is an OLTP system with frequent inserts to this table, then with more free space per page, a page split is less likely when an insert occurs. You want to monitor this on tables with heavy activity and periodically rebuild the index to spread out the data and create free space on pages. Of course you do this during periods of low activity (read: 3 a.m.) so that there is free space and page splits are minimal during periods of high activity (when everyone can yell at you for a slow database). Since this is an OLTP system, I am in pretty good shape.

If this were an OLAP system, then I would rather have this closer to zero, since most of the activity would be read-based and I would want the reads to grab as much data as possible from each page (to reduce the time it takes to read the index). As your OLAP table grows, this becomes more critical and can substantially impact the time a query takes to complete.

(build test data of 10,000,000 rows and test index of 99% v 1% fillfactor).

Avg. Page Density (full).....................: 67.80%

This gives the percentage based on the previous number. (I calculated it above as 1 - (2606.5 / 8096) and rounded.)

This all means that we need to defragment this table. There are a large number of extent switches that occur, each of which could potentially cause a large I/O cost to queries using this table and index.

Defragmenting Indexes

We can rebuild the clustered index, which causes the server to read the clustered index and then begin moving the data to new extents and pages, putting everything back in physical order and reducing fragmentation. There is another way:

In SQL 2000, the SQL developers added another DBCC option which is INDEXDEFRAG. This can defragment both clustered and nonclustered indexes which (according to BOL) should improve performance as the physical order will match the logical order and (theoretically) reduce the I/O required by the server to scan the index.
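As a sketch, using the database ID (7) and index ID (1) reported by DBCC SHOWCONTIG above for the 'User' table, either command would do the job:

DBCC DBREINDEX ('User')           -- rebuilds all indexes on the table (offline)
DBCC INDEXDEFRAG (7, 'User', 1)   -- defragments index ID 1 in database ID 7 (online)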

A couple of caveats about this: if your index spans files, it defragments each file separately and does NOT move pages between files. Not a good thing if you have added a new filegroup and allowed objects to grow across files.

A good thing that is way, way, way, extremely, absolutely, without-a-doubt long overdue is the reporting of progress by DBCC INDEXDEFRAG as it works. Every 5 minutes this will report the estimated progress back to the user. Of course many of us who have installed software with a feedback progress bar often wonder why the bar moves quickly to 99% and remains there for 80% of the total install time. So time will tell whether this is of any use, but I think some feedback is better than none.

Another addition that is way, way, way, (you get the idea) overdue is the ability to stop the DBCC. I cannot tell you how many late nights I wished I could do this in v6.5. In fact I often held off on running DBCC until the latest possible time since I could not stop it once it started. (well, there was that O-N-O-F-F switch.)


Still one further addition that ranks above the other two: this is an online operation. Let me repeat that. It's an ONLINE operation. It does not hold long locks on the table, since it operates as a series of short transactions to move pages. It also runs more quickly than rebuilding the index, and the time required is related to the amount of fragmentation in the object. Of course this means that you may need extensive log space if this is a large index. Something to keep in mind: watch the log growth when you run this to see how much space it eats up.


How to Select Indexes for Your SQL Server Tables

Indexing is one of the most crucial elements in increasing the performance of SQL Server. A well-written query will not show its effectiveness unless powered by an appropriate index or indexes on the table(s) used in a query, especially if the tables are large.

Indexes exist to make data selection faster, so the focus of this article is on ways you can select the best indexes for faster data retrieval. This is done in a two-step process.

Step One: Gathering Information
Step Two: Taking Actions on the Information Gathered

Indexing can be quite a challenging task if you are not familiar with your databases, the relationships between tables, and how queries are written in stored procedures and embedded SQL. How many companies that you have worked for have a proper ERD diagram of their databases and have followed textbook-style programming methods? In the real world, time is often limited, and that results in poor SQL Server database performance.

If you have been tasked with optimizing a database's performance (at least to a respectable level), or you want to be proactive with your databases to prevent potential future performance issues, following these steps should help you in tuning tables, just as they have helped me. These steps are applicable at any stage of a project, even if a deadline is just around the corner.

 Step One (Gathering Information)

Interact with the people who know about the database and its table structures. If you know it already, that’s great. This is very important and makes your life easier.

1) Identify key tables, based on:

Static tables (often called master tables).
Highly transactional tables.
Tables used within a lot of stored procedures or embedded SQL.
Tables with an index size greater than their data size. You can use sp_spaceused with the table name to find table space usage (see the example after this list).
The top 10 or 15 largest tables. Check a prior year's database if available or applicable; the idea is to identify the largest tables in the database after it is in production.
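For example, to compare a table's index size against its data size (using the pubs authors table for illustration):

EXEC sp_spaceused 'authors'
-- Returns name, rows, reserved, data, index_size and unused;
-- an index_size larger than data flags a possibly over-indexed table.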


2) Identify the most frequently called stored procedures/queries and list all of the tables used by them.


3) Get a SQL Profiler trace of:

Production site (if available/applicable). Running a trace on the production box during typical activity is worth the effort and will be fruitful in later analysis.

Testing site (if one is available/applicable). Otherwise, get it from your development server.

It is advisable to write down information you collect in a document for later retrieval.

4) Before we dive into analyzing the information gathered, here are a few things to keep in mind while tuning your tables:

To see the query/execution plans of queries, highlight them in SQL Query Analyzer (isqlw.exe) and select Display Estimated Execution Plan (Ctrl+L) from the Query menu. If you want to see the query plan of a stored procedure, select Show Execution Plan (Ctrl+K) and execute the stored procedure. Also, turn on the SET STATISTICS IO ON option (see the example below). Examining query/execution plans can be a bit time-consuming, but you will find it easier if you really understand the database and its tables before you begin.
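For example, in Query Analyzer you might run the following (a minimal sketch against the pubs database):

SET STATISTICS IO ON
GO
SELECT * FROM authors WHERE au_lname LIKE 'r%'
GO
-- The Messages tab now reports, per table, the scan count and the
-- logical, physical, and read-ahead reads performed by the query.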

You need to have a good foundation on how clustered and non-clustered indexes work.

Preferred SQL Server Index Types

When you use Query Analyzer to produce a graphical execution plan, you will notice that there are several different ways SQL Server uses indexes.

Clustered Index Seek

A Clustered Index Seek uses the seeking ability of indexes to retrieve rows directly from a clustered index. In most cases, they provide the best performance on SELECT statements.

In Query Analyzer, go to the pubs database and type the following query:

SELECT * FROM authors WHERE au_id LIKE '2%'

Highlight the query and press Ctrl+L, or highlight the query and press F5. You will see the following in the "Estimated Execution Plan" tab.


Take a close look at the Argument section of the above illustration. Notice that the “UPKCL_auidind” clustered index is used to retrieve the data.

Index Seek

An Index Seek uses a non-clustered index to retrieve data, and in some ways, acts like a clustered index. This is because all of the data retrieved is fetched from the leaf layer of the non-clustered index, not from any of the data pages. You often see this behavior in a covering index.

In Query Analyzer, go to the pubs database and type the following query:

SELECT title_id, title FROM titles WHERE title LIKE 't%'

Highlight the query and press Ctrl+L, or highlight the query and press F5. You will see the following in the "Estimated Execution Plan" tab:


In the Argument section in the above illustration, note that the “titleind” non-clustered index is used to retrieve the data.

Bookmark Lookup

A Bookmark Lookup uses a non-clustered index to select the data. It starts with an index seek in the leaf nodes of the non-clustered index to identify the location of the data in the data pages, then retrieves the necessary data directly from the data pages. Leaf nodes of non-clustered indexes contain row locators that point to the actual data on data pages.

In Query Analyzer, go to the pubs database and type the following query:

SELECT * FROM titles WHERE title LIKE 't%'

Highlight the query and press Ctrl+L, or highlight the query and press F5. You will see the following in the "Estimated Execution Plan" tab.


In the Argument section of the Index Seek, notice that the "titlecind" non-clustered index is used; but once the data pages are identified by looking them up in the leaf pages of the non-clustered index, a Bookmark Lookup must be performed. Again, a Bookmark Lookup is when the Query Optimizer has to look up the data in the data pages in order to retrieve it. In the Argument section of the Bookmark Lookup, note that a Bookmark Lookup called "Bmk1000" is used. This name is assigned automatically by the Query Optimizer.

Scans

Scans (Table scans, Index scan, and Clustered Index scans) are usually bad unless the table has very few rows and the Query Optimizer determines that a table scan will outperform the use of an available index. Watch out for scans in your execution plans.

In Query Analyzer, go to the pubs database and type the following query:

SELECT * FROM employee WHERE hire_date > '1992-08-01'

Highlight the query and press Ctrl+L, or highlight the query and press F5. You will see the following in the "Estimated Execution Plan" tab:


Notice that in this case, a Clustered Index Scan was performed, which means that every row in the clustered index had to be examined to fulfill the requirements of the query.

Now that we understand some of the basics of how to read Query Execution Plans, let’s take a look at some additional information that you will find useful when analyzing queries for proper index use:

If you create execution plans for multiple queries or stored procedures at the same time in Query Analyzer, you can compare their costs to see which is more efficient. This is useful for comparing different versions of the same query or stored procedure.

Primary Key constraints create clustered indexes automatically if no clustered index already exists on the table and a non-clustered index is not specified when you create the PRIMARY KEY constraint (see the sketch after this list).

Non-clustered indexes store clustered index keys as their row locators. This overhead can be used as a benefit by creating a covering index (explained later). Using covering indexes requires caution.

A table's size comprises both the table’s data and the size of any indexes on that table.

Adding too many indexes on a table increases the total index size of a table and can often degrade performance.

Always add a clustered index to every table, unless there is a valid reason not to, like the table has few rows.

Seeks shown in Query/Execution plans for SELECT statements are good for performance, while scans should be avoided.

A table's size (number of rows) is also a major factor used by the Query Optimizer when determining the best query plan.
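A minimal sketch of the PRIMARY KEY behavior mentioned in this list (the table name is hypothetical):

CREATE TABLE CustomerDemo (
    CustomerID int NOT NULL PRIMARY KEY,  -- becomes a clustered index by default
    LastName   varchar(40) NOT NULL
)
-- To keep the clustered index free for another column, be explicit:
-- CustomerID int NOT NULL PRIMARY KEY NONCLUSTERED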


Index order plays an important role in many query plans. For example, in the authors table of the pubs database, a non-clustered index is defined in the order of au_lname, then au_fname.

Fine Query A

SELECT * FROM AUTHORS WHERE au_lname like 'r%'

This uses a Bookmark Lookup and an Index seek.

 Fine Query B

SELECT * FROM AUTHORS WHERE au_lname LIKE 'r%' AND au_fname LIKE 'a'

This uses a Bookmark Lookup and an Index Seek.

 Not so Fine Query C

SELECT * FROM AUTHORS WHERE au_fname LIKE 'a'

This uses a Clustered Index Scan.

SQL Server 2000 (but not earlier versions) allows both ascending and descending sort orders to be specified in an index. This can be useful for queries that use the ORDER BY ... DESC clause, as in the sketch below.
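A sketch under that assumption (the index name is illustrative; Orders and OrderDate are from the Northwind database):

CREATE INDEX IX_Orders_OrderDate_Desc
ON Orders (OrderDate DESC)
-- Can serve queries such as:
-- SELECT TOP 10 * FROM Orders ORDER BY OrderDate DESC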

To find a particular word (e.g., a table name) used in stored procedure code, you can use the following query. For example, you can use this to list the stored procedures that use a given table.

SELECT DISTINCT a.name AS SPName FROM syscomments b, sysobjects a WHERE b.text LIKE '%authors%' AND a.id=b.id AND a.type='P'

This query returns all stored procedures that have the text "authors" in their code. Note that it might return extra procedures, for example when a stored procedure only mentions the text in a comment.

 


Step Two: What to Do Once You Have Gathered the Necessary Information

Actions for Key Tables

For static tables (tables that rarely, if ever, change), you can be liberal with the number of indexes you add. As mentioned earlier, too many indexes can degrade the performance of highly transactional tables, but this does not apply to tables whose data will not change. The only real consideration is disk space. Set all index fill factors on static tables to 100 in order to minimize disk I/O for even better performance.

For highly transactional tables, try to limit the number of indexes. Always keep in mind that a non-clustered index contains the clustered index key. Because of this, limit the number of columns in your clustered index in order to keep its size small. Any index on a busy transactional table has to be highly justifiable. Choose the fill factor with caution (usually 80 to 90%) for these indexes in order to avoid potential page splitting.

Tables used a lot in stored procedures/embedded SQL play an important role over the application's lifetime, as they are called most often, so they require special attention. What is important is to look at how tables are being accessed in queries, in order to eliminate scans and convert them into seeks. Watch the logical I/O reported by SET STATISTICS IO ON to help you determine which queries are accessing the most data; less logical I/O is better than more. Choose the clustered index with caution. Depending on how transactional the table is, choose a higher fill factor.

For tables with an index size greater than the data size: this implies a lot of indexes, so review them and make sure their existence is useful and justified.

For the top 10 or 15 largest tables, keep their size in mind when creating indexes, as their indexes will also be large. Also check whether the tables are static or non-static, which is helpful information when deciding which columns need to be indexed.

For the most frequently called stored procedures/embedded SQL, examine the query plans and the logical I/O they use.

SQL Profiler is a very good tool. It tracks the calls being executed in SQL Server at any given point in time, along with their execution time, I/O reads, user logins, the executing SQL statement, etc. It can also be used as a debugging tool. Analyzing a Profiler trace is important for identifying slow-running queries; you can set the duration filter to > 100 ms to see queries that take more than 100 milliseconds to execute.


Using a Covering Index

A non-clustered index stores the clustered index key as its row locator. One can leverage this fact: a non-clustered index can behave like a clustered index if it contains all of the columns referenced in the SELECT list, WHERE clause, and JOIN conditions of a query.

Example 1

In the Orders table of the Northwind database, there is currently a non-clustered index on the ShippedDate column.

Try running the following:

SELECT ShippedDate, shipcity FROM orders WHERE ShippedDate > '8/6/1996'

The query plan for this statement produces a Clustered Index Scan.

Now add the column shipcity to the non-clustered index on ShippedDate.

CREATE INDEX [ShippedDate] ON [dbo].[Orders] ([ShippedDate], [ShipCity]) WITH DROP_EXISTING

Now run the query again. This time, the query plan produces an Index Seek.

This magic happens because all of the fields (ShippedDate and ShipCity) in the SELECT and WHERE clauses are part of the index.

Example 2

In the Titles table of the Pubs database, check out the following execution plan for this query:

SELECT title_id, title FROM titles WHERE title LIKE 't%'

Notice that the execution plan shows an Index Seek, not a Bookmark Lookup (which is what you usually find with a non-clustered index). This is because the non-clustered index on the title column contains the clustered index key title_id, and this query references only title_id and title in its SELECT list and WHERE clause.


Analyzing a Database Execution Plan

My everyday job is to develop back-office applications for a mobile telecom operator. When a customer orders a service through the web or voice front-end, our applications have to provide very quick feedback. Although we are required to answer in less than one second, we have to perform complex SQL statements on databases of dozens of GB.

In this environment, a single inefficient query can have disastrous effects. A bad statement may overload all database processors, so that they are no longer available to serve other customers' orders. Of course, such problems typically occur shortly after the launch of new offers... that is, precisely under heavy marketing fire. Could you imagine the mood of our senior management if such a disaster happens?

Unfortunately, suboptimal statements are difficult to avoid. Applications are generally tested against a much smaller amount of data than in the production environment, so performance problems are not likely to be detected empirically.

That's why every database developer (and every application developer coping with databases) should understand the basic concepts of database performance tuning. The objective of this article is to give a theoretical introduction to the problem. At the end of this article, you should be able to answer the question: is this execution plan reasonable given the concrete amount of data I have?

I have to warn you: this is about theory. I know everyone dislikes it, but there is no serious way around it. So expect to find a lot of logarithms and probabilities here... Not afraid? Then let's continue.

Scenario

I need a sample database for the examples of this article. Let's set up the scene.

The CUSTOMERS table contains general information about all customers. Say the company has about a million customers. This table has a primary key CUSTOMER_ID, which is indexed by PK_CUSTOMERS. The LAST_NAME column is indexed by IX_CUSTOMERS_LAST_NAME. There are 100,000 unique last names. Records in this table have an average size of 100 bytes.

The REGION_ID column of the CUSTOMERS table references the REGIONS table, which contains all the geographical regions of the country. There are approximately 50 regions. This table has a primary key REGION_ID indexed by PK_REGIONS.

I will use the notations RECORDS(CUSTOMERS) and PAGES(CUSTOMERS) to denote, respectively, the number of records and pages in the CUSTOMERS table, and similarly for other tables and even for indexes. Prob[CUSTOMERS.LAST_NAME = @LastName] will denote the probability that a customer is named @LastName when we have no other information about him.

What is an execution plan

An SQL statement expresses what you want but does not tell the server how to do it. Using an SQL statement, you may for instance ask the server to retrieve all customers living in the region of Prague. When the server receives the statement, the first thing it does is parse it. If the statement does not contain any syntax errors, the server can go on. It decides the best way to compute the results: it chooses whether it is better to read the table of customers completely or whether using an index would be faster, by comparing the cost of all possible approaches. The way a statement can be physically executed is called an execution plan or a query plan.

An execution plan is composed of primitive operations. Examples of primitive operations are: reading a table completely, using an index, performing a nested loop or a hash join, and so on. We will detail them in this series of articles. All primitive operations have an output: their result set. Some, like the nested loop, have one input. Others, like the hash join, have two inputs. Each input should be connected to the output of another primitive operation. That's why an execution plan can be sketched as a tree: information flows from the leaves to the root. There are plenty of examples below in this article.

The component of the database server that is responsible for computing the optimal execution plan is called the optimizer. The optimizer bases its decision on its knowledge of the database content.

How to inspect an execution plan

If you are using Microsoft SQL Server 2000, you can use Query Analyzer to see which execution plan is chosen by the optimizer. Simply type an SQL statement in the Query window and press Ctrl+L. The plan is displayed graphically:

As an alternative, you can get a text representation. This is especially useful if you have to print the execution plan. Using a Command Prompt, open the isql program (type isql -? to display the possible command-line parameters) and follow these instructions:

1. Type set showplan_text on, and press Enter.
2. Type go, and press Enter.
3. Paste your SQL statement at the command prompt, and press Enter.
4. Type go, and press Enter.
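A session might look like this (the query itself is illustrative):

set showplan_text on
go
SELECT * FROM authors WHERE au_lname LIKE 'r%'
go
-- isql prints the statement followed by an indented tree of physical
-- operators instead of executing the query.
set showplan_text off
go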

The top operation of this execution plan is a hash join, whose inputs are an index scan of UNC_Dep_DepartmentName and a clustered index scan of PK_USERS. The objective of this series of articles is to learn how to understand such execution plans.

What are we optimizing?

Application developers usually have to minimize processor use and sometimes memory use. However, when developing database applications, the bottleneck is elsewhere. The main concern is to minimize disk access.

The main disk allocation unit of database engines is called a page. The size of a page is typically a few kilobytes, and a page usually contains between dozens and hundreds of records. This is important to remember: sometimes you may think a query is optimal from the point of view of record accesses, while it is not if you look at page accesses.

Looking for records in tables


Full table scan

Say we are looking for a few records in a single table -- for instance we are looking for the customers whose last name is @LastName.

sql1 ::= SELECT * FROM CUSTOMERS WHERE LAST_NAME = @LastName

The first strategy is to read the records from the table of customers and select the ones fulfilling the condition LAST_NAME = @LastName. Since the records are not sorted, we have to read absolutely all of them, from the beginning to the end of the table. This operation is called a full table scan. It has linear complexity, which means that the execution time is a multiple of the number of rows in the table. If it takes 500 ms to look for a record in a table of 1,000 records, it may take 8 minutes in a table of one million records and 5 days in a table of one billion records...

To compute the cost of sql1, we set up a table with primitive operations. For each operation, we specify the cost of one occurrence and the number of occurrences. The total cost of the query is obviously the sum of the products of operation unit cost and number of repetitions.

Operation                      Unit Cost           Number
Full table scan of CUSTOMERS   PAGES(CUSTOMERS)    1

Let's take a metaphor: a full table scan is like finding all occurrences of a word in a novel.

Index seek and index range scan

Now what if the book is not a novel but a technical manual with an exhaustive index at the end? For sure, the search would be much faster. But what precisely is an index?

o An index is a collection of pairs of key and location. The key is the value by which we are looking. In the case of a book, the location is the page number; in the case of a database, it is the physical row identifier. Looking for a record in a table by physical row identifier has constant complexity, that is, it does not depend on the number of rows in the table.

o Keys are sorted, so we don't have to read all the keys to find the right one. Indeed, searching in an index has logarithmic complexity. If looking for a record in an index of 1,000 records takes 100 ms, it may take 200 ms in an index of a million rows and 300 ms in an index of a billion rows. (Here I'm talking about B-Tree indexes. There are other types of indexes, but they are less relevant for application development.)


If we are looking for customers by name, we can perform the following physical operations:

o Seek the first entry in IX_CUSTOMERS_LAST_NAME where LAST_NAME=@LastName. This operation is named an index seek.

o Read the index from this entry to the last entry where LAST_NAME = @LastName is still true. This costs PAGES(IX_CUSTOMERS_LAST_NAME) * Prob[LAST_NAME = @LastName] page reads from disk. This operation (always coupled with an index seek) is called an index range scan.

o Each index entry found by the previous steps gives us the physical location of a record in the CUSTOMERS table. We still have to fetch each record from the table, which implies RECORDS(CUSTOMERS) * Prob[LAST_NAME = @LastName] page fetches. This operation is called a table seek.

The detailed cost analysis of sql1 using an index range scan is the following.

Operation                                   Unit Cost                                                     Number
Index Seek of IX_CUSTOMERS_LAST_NAME        Log( PAGES(IX_CUSTOMERS_LAST_NAME) )                          1
Index Range Scan of IX_CUSTOMERS_LAST_NAME  PAGES(IX_CUSTOMERS_LAST_NAME) * Prob[LAST_NAME = @LastName]   1
Table Seek of CUSTOMERS                     1                                                             RECORDS(CUSTOMERS) * Prob[LAST_NAME = @LastName]

The bad news is that the query complexity is still linear, so the query time is still a multiple of the table size. The good news is that we cannot really do better: the complexity of a query cannot be smaller than the size of its result set.

In the next section of this article, we will accept a simplification: we will assume that an index look-up has unit cost. This estimation is not so rough, because a logarithmic cost can always be neglected when it is added to a linear cost. The simplification is not valid if it is multiplied by another cost.

Index selectivity

Comparing the cost of the full table scan approach and the index range scan approach introduces us to a crucial concept in database tuning. The conclusion of the previous section is that the index range scan approach shall be faster if, in terms of order of magnitude, the following condition is true:

[1] RECORDS(CUSTOMERS) * Prob[LAST_NAME = @LastName] < PAGES(CUSTOMERS)

The probability that a customer has a given name is simply the number of customers having this name divided by the total number of customers. Let KEYS(IX_CUSTOMERS_LAST_NAME) denote the number of unique keys in the index IX_CUSTOMERS_LAST_NAME. The number of customers named @LastName is statistically RECORDS(CUSTOMERS) / KEYS(IX_CUSTOMERS_LAST_NAME).

So the probability can be written:

[2] Prob[LAST_NAME = @LastName]
    = (RECORDS(CUSTOMERS) / KEYS(IX_CUSTOMERS_LAST_NAME)) / RECORDS(CUSTOMERS)
    = 1 / KEYS(IX_CUSTOMERS_LAST_NAME)

Injecting [2] into [1], we have:

[3] RECORDS(CUSTOMERS) / KEYS(IX_CUSTOMERS_LAST_NAME) < PAGES(CUSTOMERS)

That is, an index is adequate if the number of records per unique key is smaller than the number of pages of the table.

The inverse of the left member of the previous expression is called the selectivity of an index:

SELECTIVITY(IX_CUSTOMERS_LAST_NAME) = KEYS(IX_CUSTOMERS_LAST_NAME) / RECORDS(CUSTOMERS)

The selectivity of a unique index is always 1. The more selective an index (the larger its selectivity coefficient), the more efficient it is. Corollary: indexes with poor selectivity can be counter-productive.
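To make this concrete with the numbers from our scenario: with 8 KB pages, PAGES(CUSTOMERS) is roughly 1,000,000 * 100 bytes / 8 KB, or about 12,500 pages. RECORDS(CUSTOMERS) / KEYS(IX_CUSTOMERS_LAST_NAME) = 1,000,000 / 100,000 = 10 records per key, far below 12,500, so condition [3] holds and IX_CUSTOMERS_LAST_NAME is worthwhile; its selectivity is 100,000 / 1,000,000 = 0.1. By contrast, a hypothetical index on REGION_ID would have about 1,000,000 / 50 = 20,000 records per key, more than the number of pages in the table, so it would be counter-productive.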

Joining tables with nested loops


Things become much more difficult when you need to retrieve information from more than one table.

Suppose we want to display the name of the region besides the name of the customer:

SELECT r.NAME, c.FIRST_NAME, c.LAST_NAME FROM CUSTOMERS c, REGIONS r WHERE c.REGION_ID = r.REGION_ID

Among the possible strategies, I will present in this article the most natural one: choosing a table, reading it from beginning to end and, for each record, searching for the corresponding record in the second table. The first table is called the outer table or leading table, and the second one the inner table. The dilemma is of course deciding which table should lead.

So let's first try to start with the table of regions. We learnt before that an index on CUSTOMERS.REGION_ID would have too low selectivity to be efficient, so our first candidate execution plan is to read the table of regions and, for each region, perform a full table scan of CUSTOMERS.

Operation                      Unit Cost           Number
Full table scan of REGIONS     PAGES(REGIONS)      1
Full table scan of CUSTOMERS   PAGES(CUSTOMERS)    RECORDS(REGIONS)

The leading cost is clearly PAGES(CUSTOMERS)*RECORDS(REGIONS). If we give numeric values, we have approximately 50*PAGES(CUSTOMERS).

Now what if we did the opposite? Since the table of regions is so small that it has a single page, it is useless to have an index, so we choose again two nested full table scans.

Operation                      Unit Cost           Number
Full table scan of CUSTOMERS   PAGES(CUSTOMERS)    1
Full table scan of REGIONS     PAGES(REGIONS)      RECORDS(CUSTOMERS)

At first sight, the leading cost is PAGES(REGIONS) * RECORDS(CUSTOMERS). Since the table of regions is so small that it fits in one page, and since we have approximately 80 customer records per page (pages have, say, 8 KB), we can write that the leading cost is 80 * PAGES(CUSTOMERS), which seems a little worse than the first approach. However, this second join order is in fact much faster than the first one. To see this, we have to take into account a factor that we have forgotten up to now: the memory cache.

Since we are interested only in minimizing disk access, we can consider that the cost of reading a page from memory is zero.

The REGIONS table and its primary key can both be stored in cache memory. It follows that the cost matrix can be rewritten as follows:

Operation                           Unit Cost           Number
Full table scan of CUSTOMERS        PAGES(CUSTOMERS)    1
First full table scan of REGIONS    PAGES(REGIONS)      1
Next full table scans of REGIONS    0                   RECORDS(CUSTOMERS)

So, finally, the leading cost term is PAGES(CUSTOMERS), which is around 50 times better than the first join order.


System Planning

Based on the analysis done, we came up with the following plan, sketched in the flowchart below.

[Flowchart: System Planning — parse the query and get the execution plan used by the Query Optimizer. If index seeks are used, the query is already optimized. If an index scan, clustered index scan, or non-clustered index scan is used, give suggestions to optimize. If a table scan is used, suggest adding a clustered index on the columns with the most distinct values.]

 Methodology

Connect Form

This is the second form of the project. When the form loads, the server combo box is filled with all the server names, depending on the data source used in the connection string. The user then selects the server name. If Windows Authentication is selected, the server is connected to using the Windows username and password; if SQL Authentication is selected, the user enters a username and password. When the OK button is clicked with SQL Authentication selected, a new connection string is created using the entered server name, username, and password. If the connection succeeds, the user is logged into the corresponding SQL Server and taken to the next form, i.e. the Main Form.

[Flowchart: Connect Form — show frmConnect and collect ServerName, UserName, and Password. If Cancel is clicked, stop. If OK is clicked, check the username and password and, if they are valid, show frmMain.]

Main Form

[Flowchart: Main Form — dispatches to the File, Edit, Query, Window, and Utilities menus, each of which branches to its own handling as described below.]

This is the MDI form of the project, and by default it contains the Analyzer form, which is a child of the MDI form. This form contains the main functionality of the project. It contains five main menus and a toolbar. The main function is to parse the query; parsing here means optimizing the query given by the user.

The parsing function works as follows: the query given by the user is executed to obtain the showplan, using SQL Server's SET SHOWPLAN_ALL ON option. After getting the showplan, it is checked whether an index scan or a table scan is used. If indexes are used, it is then checked whether an index seek or an index scan is used, and output is given accordingly. If a table scan is used, suggestions for applying indexes are given to the user by calculating the number of distinct values in each column of the tables.
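A minimal sketch of what the tool does internally (the query is illustrative):

SET SHOWPLAN_ALL ON
GO
SELECT * FROM authors WHERE au_lname LIKE 'r%'
GO
SET SHOWPLAN_ALL OFF
GO
-- Instead of executing the query, SQL Server returns the plan as rows;
-- the tool reads the LogicalOp column for values such as 'Index Seek',
-- 'Index Scan', 'Clustered Index Scan', or 'Table Scan' and builds its
-- suggestions from them.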

Execution of the query is done by passing the SQL query to the SQL engine and returning the output.


File Menu

[Flowchart: File Menu — Connect opens a connection; Disconnect closes the connection with the SQL Server; New opens a new blank document; Open opens an existing SQL document; Save saves the document; Exit closes the current SQL document.]

Edit Menu

[Flowchart: Edit Menu — Undo reverses the last action; Cut and Copy move or copy the selected text to the clipboard; Paste inserts the text from the clipboard; Select All selects all the text present in the SQL document.]

Query Menu

[Flowchart: Query Menu — Change Database switches the working database; Parse gives suggestions about optimizing the query entered by the user; Execute runs the query entered by the user; Cancel Execution stops the execution of the query.]

Window Menu

[Flowchart: Window Menu — Switch Pane changes the focus to the next pane; Hide Result Pane hides or shows the result pane.]

Utility Menu

[Flowchart: Utilities Menu — Insert/Update Template opens the template form.]

Change Database Form

This form is used to change the database that the user wants to query. When the form loads, it displays every database name for the SQL Server it is connected to in a data grid. To change the database, the user can click the OK button or double-click the desired database; either action creates a new connection string with the selected database name. The Cancel button exits the form without changing the database selection.

[Flowchart: Change Database Form — clicking OK or double-clicking a database changes the current database to the one selected from the list; Cancel exits without changes.]

Insert/Update Template Form

[Flowchart: Insert/Update Template Form — choose Insert or Update, select a table from the list for the current database, select columns manually or check All Columns, then click Generate to create the script; Cancel stops.]

This form is used to create templates for Insert or Update queries, depending on the user's selection. This form is also a child form of the MDI form mentioned above. The database is selected from the combo box present on the Main form, and a connection string is created for that particular database.

A combo box is filled with all the tables present in that database. A list box is filled with the columns of the table selected from the combo box, using the stored procedure sp_columns, which returns all the columns of a particular table (see the example below). The user now has two options: select the columns manually, or select all the columns by clicking a check box which checks all the columns in the list box. The user then clicks Generate to create the template. On Generate, all the details are checked; if details are missing, a message box is shown, and if all details are filled in, it is checked whether Insert or Update has been selected. Depending on these choices, the template is created.
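For example (a sketch using the pubs database):

EXEC sp_columns @table_name = 'authors'
-- Returns one row per column (name, type, length, nullability),
-- which the form uses to populate the column list box.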


System Implementation

Prerequisites for system implementation:

.NET Framework 2.0
SQL Server 2000

.NET Framework 2.0 Installation

Step 1: Insert CD 1 for .NET and then click on Windows Component.

Step 2: Then insert CD 5 for .NET; it will start installing the .NET Windows Component, as shown in the fig below, when "Update Now" is clicked.


Step 3: The Windows component includes “Microsoft .NET Framework” as shown in the fig below.

Step 4: Installation of Windows Component including Microsoft .NET Framework.


SQL Server 2000 Installation :

Step 1: Insert CD of Microsoft SQL Server 2000 then select “SQL Server 2000 Components” as shown in fig below.

Step 2: Select “Install Database Server” as shown in fig below.


Step 3: Click "Next" on the Welcome screen as shown in the fig below.

Step 4: Select the option “Local Computer” to install the SQL Server on the local machine.


Step 5: On the next screen, for installation options, select "Create a new instance of SQL Server, or install Client Tools."

Step 6: On the Installation Type screen, select "Server and Client Tools" to install a server.


Step 7: On the Service Accounts screen, select the option "Use a Domain User account" and enter the username, password, and machine name for the Windows user account.

Step 8: On the "Authentication Mode" screen, select "Mixed Mode", which allows both Windows Authentication and SQL Server Authentication.


SQL Tuner Installation

Step 1: Open the SQL Tuner “Installation Package” and click on “SQL_Tuner.msi”.

Step 2: Click on the “Next” button on the Welcome screen to install the SQL Tuner.


Step 3: On the Confirm Installation screen, click the "Next" button to install SQL Tuner.

Step 4: Installation of SQL Tuner will be in progress.


Step 5: Installation is complete; click "Close" to exit the setup.


Technical Specification

Hardware Requirements

Requirement          Minimum             Recommended
Processor            900 MHz             1.2 GHz
RAM                  128 MB              512 MB
Disk Space           500 MB              500 MB
Operating System     Windows 2000, XP    Windows 2000, XP, 2003

Software Requirements

.NET Framework 2.0
SQL Server 2000


User Manual

Connect Form (frmConnect.cs)

 

This form is used to connect to a particular server for performing optimization or executing a query. It is the first form of the SQL Tuner tool that the user will see, and it determines which server the newly created blank document belongs to.

Working:

Select the server to which you want to connect by selecting a particular SQL Server from the combo box.

Select from the option buttons whether you want to connect to the above mentioned SQL Server using "Windows Authentication" or by using "SQL Server Authentication".

If "Windows Authentication" is selected, SQL Tuner connects to the SQL Server using the logged-in user's Windows username and password.

If "SQL Server Authentication" is selected the SQL Server will connect to SQL Server using the UserName and Password of the particular SQL Server provided by the user.

UserName and Password textboxes will be enabled only if "SQL Server Authentication" is selected.

"OK" button will check for the particular SQL Server is present or not and whether the username and password is valid or not. If details provided by the user are valid then a new blank SQL query document will be created and if details provided are not valid then user will get the notification.


Main Form (frmMain.cs)

This form is the main form of the tool. It is an MDI form which contains all the forms provided by SQL Tuner; by default it contains the Analyzer form at startup. This form provides all the important functionality of SQL Tuner.

Working

This form contains five main menus, which are as follows:

File
Connect: This sub menu is used to open a new SQL document for a particular SQL Server.
Disconnect: This sub menu is used to disconnect the SQL document from the SQL Server it was connected to and then close the document.
New: This sub menu is used to open a new SQL document.
Open: This sub menu is used to open an existing SQL document which has been saved before.
Save: This sub menu is used to save an SQL document.
Exit: This sub menu is used to exit from the form.


Edit
Undo: This sub menu is used to remove the changes done to the SQL document.
Cut: This sub menu is used to cut the selected text of the SQL document to the clipboard.
Copy: This sub menu is used to copy the selected text of the SQL document to the clipboard.
Paste: This sub menu is used to paste the text on the clipboard into the SQL document.
Select All: This sub menu is used to select all the text present in the SQL document.

Query
Change Database: This sub menu is used to change the database on which the user will prepare the query.
Parse: This sub menu is used to provide suggestions to the user regarding the syntax and indexes of the query being parsed.
Execute: This sub menu is used to provide the output of the query executed by the user in the SQL document.
Cancel Execution: This sub menu is used to cancel the execution of a query which has been submitted to SQL Tuner for execution.

Window
Switch Pane: This sub menu is used to switch the focus of the Analyzer form from one control to another.
Hide Result Pane: This sub menu is used to toggle the result pane of the Analyzer form.

Utilities
Insert/Update Template: This sub menu is used to create an SQL statement template for Insert or Update statements.


Analyzer Form (frmAnalyzer.cs)

This form is the actual SQL document on which the Main form's functionality performs its actions. This form is a child of the Main form, and the user can open more than one Analyzer form in SQL Tuner.

Working

The first textbox lets the user input the query. The data grid provides the user with the output of the query. The second textbox provides the user with any recommendations or any errors in the SQL query.


Change Database Form (frmChangeDB.cs)

 

This form is used by the user to change the database against which the SQL query written in the SQL document will run. The form lists all the databases present in the SQL Server to which the SQL document is connected.

Working

The grid provides the user with all the databases provided by the SQL Server to which the SQL document is connected.

Select the database you want to work on and then click on "OK" button.


Insert/Update Template Form (frmTemplate.cs)

This form is used to create an Insert or an Update statement for the selected table and its corresponding columns. The form creates the statement with complete syntax, and the values are supplied by variables.

Working

The user has to select the option of "Insert" or "Update" depending on the type of statement the user wants to generate.
The combo box provides the names of the tables in the database; the user has to select one table from the combo box.
The list box provides all the columns of the selected table, with a checkbox for each column.
The user can select all the columns by checking the "Check All Columns" checkbox.
Clicking the "Generate" button creates the query and displays it in the textbox.
The user can then copy the textbox contents by clicking the "Copy" button on the form.


Future Enhancements

This project was developed to understand how the SQL Server optimizer optimizes queries and reduces a query's CPU time and the input/output it requires.

There are many possibilities for future enhancement of this project. The future enhancements that are possible are as follows:

To optimize more complex queries, i.e. queries which include joins, unions, sub-queries, etc.

To study the database structure and provide the user with suggestions to improve the database structure for best performance.

To optimize queries embedded in an application, without the user or programmer having to enter the query into SQL Tuner.

Optimizing more complex queries

As of now, SQL Tuner tunes simple queries; it does not tune complex queries which contain joins between two or more tables, sub-queries (a query which acts as a WHERE condition for another query), or unions (a combination of two or more queries). Insert, update, and delete queries also cannot be tuned in the current version of SQL Tuner.

All these limitations can be addressed by using more sophisticated parsing methodologies and a more detailed study of how the SQL optimizer works in more complex situations.

Optimizing Database Structure

SQL Tuner can be made capable of tuning databases, i.e. it can provide suggestions to improve database design by analyzing the databases and the types of queries that are fired against them. Database design is important with respect to performance because bad logical database design results in bad physical database design, and generally results in poor database performance.


This can be implemented by following some standard rules as specified below.

Following standard database normalization recommendations when designing OLTP databases can greatly maximize a database's performance.

Consider denormalizing some of the tables in order to reduce the number of required joins.

If we are designing a database that could potentially be very large, holding millions or billions of rows, we must consider horizontally partitioning our large tables.

To optimize SQL Server performance, we must design rows in such a way as to maximize the number of rows that can fit into a single data page.

TEXT, NTEXT, and IMAGE data should be stored separately from the rest of the data in a table. The table itself contains (in the appropriate columns) a 16-byte pointer that points to the separate data pages holding the TEXT, NTEXT, or IMAGE data. This is done to enhance performance; see the sketch below.
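A sketch of the related per-table switch (using the pubs pub_info table, which has text and image columns): by default SQL Server already stores these types off-row via the pointer described above, and SQL Server 2000's 'text in row' option controls that behavior.

EXEC sp_tableoption 'pub_info', 'text in row', 'OFF'
-- 'OFF' keeps TEXT/NTEXT/IMAGE data on separate text pages;
-- a numeric value such as '256' would instead allow small values
-- to be stored in the data row itself.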

There are many other rules that could be implemented in SQL Tuner by using procedures to analyze the database or table design.

Optimizing Queries Embedded in the Applications

As of now, users have to write the query within the interface provided by SQL Tuner, and only then can it be tuned or executed. But SQL Tuner could also be modified so that it searches for queries in the application provided to it and automatically tunes them; this would impose less load on the user.

It could also be developed as a component with which developers can provide tuning facilities to the users of their own applications.

All this is achievable by changing the interface and adding functionality to accept the application's source code files, traverse the code for queries, analyze the database, and make the respective changes. We could also prepare an assembly containing all the functions and properties, so that a developer can use it in his application to improve its performance.


Bibliography

Websites

http://www.google.com
http://www.sql-server-performance.com
http://www.sqlservercentral.com
http://www.sqlite.org
http://www.transactsql.com
http://www.iAnywhere.com
http://www.blogs.msdn.com/queryoptteam
http://www.informit.com
http://www.dotnetbips.com
http:// [email protected]

Books

Microsoft SQL Server 2000 Performance Optimization and Tuning Handbook - Ken England (Butterworth-Heinemann)

Microsoft T-SQL Performance Tuning - Kevin Kline, Andrew Zanevsky, and Lee Gould (Applications and Database Management, Quest Software, Inc.)

Query Optimization - Yannis E. Ioannidis (University of Wisconsin)

SQL Server Books Online - Microsoft Corporation

Components Used

gudusoft.gsqlparser.dll

gudusoft.gsqlparser.yyrec.dll
