Introduction to SQL Server Internals: How to Think Like the Engine


Intro to Internals: How to Think Like the SQL Engine

Brent Ozar, Brent Ozar Unlimited

MIT License Copyright © 2016 Brent Ozar. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


I know, I hate killing trees.

But having these next 3 pages in your hand will help a lot as we talk through the demos.

Print this 3-page PDF to follow along: http://u.BrentOzar.com/engine.pdf

Brent Ozar
Consultant, Brent Ozar Unlimited
I make SQL Server faster and more reliable.

I created sp_Blitz® and the SQL Server First Responder Kit, and I love sharing knowledge at BrentOzar.com. I hold a bunch of certifications and awards including the rare Microsoft Certified Master. You don’t care about any of that though.

Download the PDF: BrentOzar.com/go/enginepdf

/brentozar @brento brentozar

Agenda

When you pass in a query, how does SQL Server build the results? Time to role play: Brent will be an end user sending in queries, and you will play the part of the SQL Server engine. Using simple spreadsheets as your tables, you will learn how SQL Server builds execution plans, uses indexes, performs joins, and considers statistics.

This session is for DBAs and developers who are comfortable writing queries, but not so comfortable when it comes to explaining nonclustered indexes, lookups, sargability, fill factor, and corruption detection.

[Diagram: an 8KB page made up of a page header, index or data rows, and a slot array; index pages and leaf pages.]

You: SQL Server.
Me: end user.

First query:
SELECT Id
FROM dbo.Users

Your execution plan:
1. Shuffle through all of the pages, saying the Id of each record out loud.

SQL Server’s execution plan

SET STATISTICS IO ON
Logical reads: the number of 8K pages we read.
(79,672 x 8KB = 637MB)

That’s 159 reams.
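Here is the arithmetic behind those two numbers as a quick sketch. The 500-sheet ream and one printed page per 8KB page are assumptions; the slide only gives the totals.

```python
# Logical reads reported by SET STATISTICS IO for the scan (from the slide).
logical_reads = 79_672
PAGE_KB = 8             # SQL Server data pages are 8KB
SHEETS_PER_REAM = 500   # assumption: standard ream, one sheet per 8KB page

total_kb = logical_reads * PAGE_KB
total_mb = total_kb / 1000   # the slide rounds using 1MB = 1000KB
reams = logical_reads / SHEETS_PER_REAM

print(f"{total_kb:,} KB read, about {total_mb:.0f} MB")
print(f"about {reams:.0f} reams of paper")
```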

Let’s add a filter.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'

Your execution plan:
1. Shuffle through all of the pages, saying the Id of each record out loud, if their LastAccessDate > '2014/07/01'.

SQL Server’s execution plan

Lesson: Using a WHERE without a matching index means scanning all the data.
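To make that lesson concrete, here is a toy version of the role play in Python. The page layout, row counts, and dates are made up for illustration; the only point is that the number of pages read does not depend on how many rows match.

```python
from datetime import date, timedelta

def build_pages(row_count, rows_per_page):
    """Toy clustered index: rows stored in Id order, chunked into pages."""
    rows = [
        {"Id": i, "LastAccessDate": date(2014, 1, 1) + timedelta(days=(i * 37) % 365)}
        for i in range(1, row_count + 1)
    ]
    return [rows[i:i + rows_per_page] for i in range(0, row_count, rows_per_page)]

def scan_with_filter(pages, cutoff):
    """Read every page; keep only Ids whose LastAccessDate is after cutoff."""
    pages_read = 0
    ids = []
    for page in pages:
        pages_read += 1  # one logical read per page, match or no match
        for row in page:
            if row["LastAccessDate"] > cutoff:
                ids.append(row["Id"])
    return ids, pages_read

pages = build_pages(row_count=1000, rows_per_page=50)
some_ids, reads_some = scan_with_filter(pages, date(2014, 7, 1))
no_ids, reads_none = scan_with_filter(pages, date(2099, 1, 1))
print(len(some_ids), "matches,", reads_some, "pages read")
print(len(no_ids), "matches,", reads_none, "pages read")
```

Even the filter that matches nothing still shuffles through every page.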

Lesson: Estimated Subtree Cost guesses at CPU and IO work required for a query.

Let’s add a sort.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate

Your execution plan:
1. Shuffle through all of the pages, writing down fields __________ for each record, if their LastAccessDate > '2014/07/01'.

2. Sort the matching records by LastAccessDate.

SQL Server’s execution plan

Cost is up ~4x.
We needed space to write down our results, so we got a memory grant.

Order By:

Memory is set when the query starts, and not revised.

SQL Server has to assume other people will run queries at the same time as you.

Your memory grant can change with each time that you run a query.

You can’t always get what you want.

And if you run out of memory…

Let’s get all the fields.
SELECT *
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate

Your execution plan:
1. Shuffle through all of the pages, writing down fields __________ for each record, if their LastAccessDate > '2014/07/01'.

2. Sort the matching records by LastAccessDate.

Lesson: SELECT * sucks.

But let’s dig deeper.

Why does it suck?
Do we work harder to read the data?
Do we work harder to write the data?
Do we work harder to sort the data?
Do we work harder to output the data?

SQL Server’s execution plan

            SELECT Id   SELECT *
No order    66          66
ORDER BY    259         20,666

Lesson: Sorting is expensive, and more fields make it worse.
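A rough way to see why more fields hurt: everything you select has to be written down before you can sort it. The sketch below uses entirely hypothetical row counts and column widths (none of them come from the demo database), and it only models the size of the pile you sort, not SQL Server's actual costing.

```python
# Hypothetical numbers for illustration only.
MATCHING_ROWS = 100_000
ID_ONLY = {"Id": 4, "LastAccessDate": 8}
ALL_FIELDS = {"Id": 4, "LastAccessDate": 8, "DisplayName": 80,
              "Location": 200, "AboutMe": 2000}

def sort_workspace_bytes(row_count, column_bytes):
    """Bytes written down before sorting: every row carries every selected field."""
    return row_count * sum(column_bytes.values())

narrow = sort_workspace_bytes(MATCHING_ROWS, ID_ONLY)
wide = sort_workspace_bytes(MATCHING_ROWS, ALL_FIELDS)
print(f"SELECT Id: {narrow:,} bytes to sort")
print(f"SELECT *:  {wide:,} bytes to sort ({wide / narrow:.0f}x)")
```

Same number of rows, same sort key, but a much bigger pile of paper to shuffle.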

Let’s run it a few times.
SELECT *
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;
GO 5

Your execution plan:
1. Shuffle through all of the pages, writing down all the fields for each record, if their LastAccessDate > '2014/07/01'.

2. Sort the matching records by LastAccessDate.

3. Keep the output so you could reuse it the next time you saw this same query?

Oracle can. (It better, since it costs $47,000 per core.)

SQL Server reads & sorts 5 times.

Lesson: SQL Server caches raw data pages, not output.
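Here is a minimal sketch of that behavior. `ToyEngine` and its counters are invented for illustration and are not SQL Server internals; the idea is just that pages stay in memory after the first read, while the filter and sort are redone on every run.

```python
from datetime import date

class ToyEngine:
    """Caches raw pages in a buffer pool, but never caches query output."""

    def __init__(self, disk_pages):
        self.disk_pages = disk_pages  # page number -> list of rows
        self.buffer_pool = {}
        self.physical_reads = 0       # pages fetched from "disk"
        self.logical_reads = 0        # pages touched, cached or not
        self.sorts = 0

    def read_page(self, page_no):
        self.logical_reads += 1
        if page_no not in self.buffer_pool:
            self.physical_reads += 1
            self.buffer_pool[page_no] = self.disk_pages[page_no]
        return self.buffer_pool[page_no]

    def run_query(self, cutoff):
        matches = []
        for page_no in sorted(self.disk_pages):
            for row in self.read_page(page_no):
                if row["LastAccessDate"] > cutoff:
                    matches.append(row)
        self.sorts += 1  # the sort is repeated on every execution
        return sorted(matches, key=lambda r: r["LastAccessDate"])

disk = {
    0: [{"Id": 1, "LastAccessDate": date(2014, 8, 1)},
        {"Id": 2, "LastAccessDate": date(2013, 5, 5)}],
    1: [{"Id": 3, "LastAccessDate": date(2014, 7, 2)}],
}
engine = ToyEngine(disk)
for _ in range(5):  # like GO 5
    result = engine.run_query(date(2014, 7, 1))

print(engine.physical_reads, engine.logical_reads, engine.sorts)
```

After five runs the pages came off disk once each, but they were read from memory and sorted all five times.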

Nonclustered indexes: copies.
Stored in the order we want
Include the fields we want
CREATE INDEX IX_LastAccessDate_Id
ON dbo.Users(LastAccessDate, Id)

Let’s go simple again.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

Your execution plan:
1. Grab IX_LastAccessDate_Id and seek to 2014/07/01.
2. Read the Ids out in order.
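That seek-then-read plan can be sketched with a sorted list and a binary search. The sample rows are made up, and a real index is a B-tree rather than a Python list, so treat this as an analogy only.

```python
from bisect import bisect_right
from datetime import date

# Toy table: Id -> LastAccessDate (hypothetical sample rows).
users = {
    1: date(2014, 8, 1),
    2: date(2013, 5, 5),
    3: date(2014, 7, 1),
    4: date(2014, 7, 2),
    5: date(2014, 9, 15),
}

# The nonclustered index: a copy of the data sorted by (LastAccessDate, Id).
index = sorted((last_access, user_id) for user_id, last_access in users.items())

def seek_greater_than(index, cutoff):
    """Jump to the first entry after cutoff, then read Ids out in index order."""
    # (cutoff, inf) sorts after every entry that has exactly the cutoff date.
    start = bisect_right(index, (cutoff, float("inf")))
    return [user_id for _, user_id in index[start:]]

ids = seek_greater_than(index, date(2014, 7, 1))
print(ids)
```

No filter pass over the whole table and no sort step: the index is already in the order we asked for.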

SQL Server’s execution plan

                        SELECT Id   SELECT *
No order                66          66
ORDER BY                259         20,666
ORDER BY (with index)   10          6,354

Lesson: Indexes reduce reads.

Duh.

Lesson: Indexes also reduce CPU time.

Yes, this is a “seek.”

Don’t think scan = terrible.

That’s a covering index.

It covers the fields we need in this query. But if we change the query…

Let’s add a couple of fields.
SELECT Id, DisplayName, Age
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'
ORDER BY LastAccessDate;

One execution plan:
1. Grab IX_LastAccessDate_Id, seek to 2014/07/01.
2. Write down the Id and LastAccessDate of matching records.
3. Grab the clustered index (white pages), and look up each matching row by its Id to get DisplayName and Age.
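Those three steps can be sketched as a seek on the narrow index followed by one lookup per matching row into the clustered index. The names and ages below are invented sample data, and the dictionary stands in for the clustered index keyed on Id.

```python
from bisect import bisect_right
from datetime import date

# Clustered index ("white pages"): the full row, keyed by Id. Sample data is made up.
clustered = {
    1: {"DisplayName": "Ann", "Age": 31, "LastAccessDate": date(2014, 8, 1)},
    2: {"DisplayName": "Bob", "Age": 27, "LastAccessDate": date(2013, 5, 5)},
    3: {"DisplayName": "Cho", "Age": 39, "LastAccessDate": date(2014, 7, 1)},
    4: {"DisplayName": "Dee", "Age": 45, "LastAccessDate": date(2014, 7, 2)},
}

# Nonclustered index: only (LastAccessDate, Id), sorted.
nonclustered = sorted(
    (row["LastAccessDate"], user_id) for user_id, row in clustered.items()
)

def seek_then_lookup(cutoff):
    """Seek the narrow index, then do one key lookup per matching row."""
    start = bisect_right(nonclustered, (cutoff, float("inf")))
    results = []
    key_lookups = 0
    for _, user_id in nonclustered[start:]:
        row = clustered[user_id]  # key lookup: fetch the fields the index lacks
        key_lookups += 1
        results.append((user_id, row["DisplayName"], row["Age"]))
    return results, key_lookups

rows, lookups = seek_then_lookup(date(2014, 7, 1))
print(rows, lookups)
```

The number of key lookups grows with the number of matching rows, which matters once a lot of rows match.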

The SQL Server equivalent

For simplicity, I told you I created this index with the Id.

SQL Server always includes your clustering keys whether you ask for ‘em or not because it has to join indexes.

That’s why SQL Server includes the key

Classic index tuning sign

Key lookup is required when the index doesn’t have all the fields we need.
Hover your mouse over the key lookup and look for the OUTPUT fields.
Small? Frequently used? Add ‘em to the index.
DO NOT ADD A NEW INDEX.

But to get that plan, I had to cheat.

Because with 2014/07/01, I get:

Lesson: Even with indexes, there’s a tipping point where scans work better.
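A deliberately crude model of that tipping point: compare the page reads for a full scan with the reads for one key lookup per matching row. The 79,672 pages come from the earlier scan; the 3 reads per lookup is an assumption, and the real optimizer's costing is more involved than this.

```python
def choose_plan(matching_rows, table_pages=79_672, reads_per_lookup=3):
    """Pick whichever toy plan touches fewer pages.

    reads_per_lookup is an assumed cost of walking the clustered index
    once for each matching row.
    """
    lookup_reads = matching_rows * reads_per_lookup
    if lookup_reads < table_pages:
        return "index seek + key lookup"
    return "clustered index scan"

for rows in (100, 10_000, 30_000, 1_000_000):
    print(f"{rows:>9,} matching rows -> {choose_plan(rows)}")
```

In this model the scan wins once the matching rows pass roughly a third of the page count, long before the query matches most of the table.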

Enter statistics.

Statistics help SQL Server:
• Decide which index to use
• What order to process tables/indexes in
• Whether to do seeks or scans
• Guess how many rows will match your query
• How much memory to allocate for the query

WHERE LastAccessDate > '2014/07/01'

Add it up, add it up
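Here is a sketch of what "adding it up" could look like against a statistics histogram. The step values are hypothetical; the column names mirror the RANGE_HI_KEY, RANGE_ROWS, and EQ_ROWS columns that DBCC SHOW_STATISTICS returns. The linear interpolation inside a partially covered step is a simplifying assumption, not a claim about SQL Server's exact math.

```python
from datetime import date

# Hypothetical histogram steps: (range_hi_key, range_rows, eq_rows).
# range_rows: rows strictly between the previous step's key and this key.
# eq_rows: rows exactly equal to this key.
HISTOGRAM = [
    (date(2014, 1, 1), 0, 10),
    (date(2014, 7, 1), 500, 20),
    (date(2014, 8, 1), 300, 15),
    (date(2014, 9, 14), 400, 25),
]

def estimate_rows_greater_than(histogram, cutoff):
    """Estimate how many rows have a value greater than cutoff."""
    total = 0.0
    prev_hi = None
    for hi, range_rows, eq_rows in histogram:
        if cutoff >= hi:
            pass  # the whole step is at or below the cutoff
        elif prev_hi is None or cutoff <= prev_hi:
            total += range_rows + eq_rows  # the whole step is above the cutoff
        else:
            # Cutoff lands inside the step: assume rows are spread evenly.
            fraction = (hi - cutoff).days / (hi - prev_hi).days
            total += range_rows * fraction + eq_rows
        prev_hi = hi
    return total

estimate = estimate_rows_greater_than(HISTOGRAM, date(2014, 7, 1))
print(estimate)
```

With the cutoff sitting exactly on a step boundary, the estimate is just the sum of the later steps.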

Keep statistics updated.

Automatic stats updates aren’t enough. Consider:
• http://Ola.Hallengren.com
• http://MinionWare.net/reindex
Typical strategy: weekly statistics updates
Updated statistics on an index invalidate query plans that involve that index
• Affects your plan cache analysis
• Can cause unpredictable query plan changes

How about on a single random date?

Let’s write it differently.

Wait – what?

Why can’t I get just one row?

Lesson: This is called Cardinality Estimation, and it’s not just about keeping stats updated.

The Cardinality Estimator has huge improvements.

To turn ‘em on, just change your Compatibility Level.

Fortunately, SQL 2014/2016 fixes this.

And run the exact same query again

All better!

Lesson: 2014/2016’s new Cardinality Estimator is, uh, new

Let’s add a join.

Lesson: bad cardinality estimation is at the dark heart of many bad plans.

Whew. That’s a lot of lessons.

What we learned
• Clustered indexes hold all the fields*
• Nonclustered indexes are light-weight* copies of the table
• NC indexes reduce not just reads, but also CPU work
• SQL Server caches raw data pages, not query output
• Statistics drive seek vs scan, index choice, memory
• Statistics aren’t the only part: cardinality estimation matters
• Includes and seeks aren’t magically delicious

Thank You

Learn more from Brent Ozar
help@brentozar.com or follow @BrentO
