SQL Server 2012 Best Practices

SQL Server Best Practices - Microsoft TechNet Thursday

About me

@Ryazame

Part 1 - Generic, Independent of SQL Version

What’s the goal?

No, seriously… What are these best practices?

General rules and guidelines

Intended to improve:

Maintenance

Performance

Availability

Quality

Not always 100% implementable, but at least try

Document why

Coding (Believe it or not...)

If you don’t do this right, it’s like...

Coding Best Practices

There is no performance loss for documented code: code tells you how, comments tell you why

Don’t hardcode: SQL Server supports variables ;-)

Format your code for readability: there is no “right” method

But… make clear agreements

And follow them

Windows Authentication

Easier to administer

Centralized

Better auditing

More Secure

Flexible

Always-On

Contained Databases

Normalize

Normal forms: aim for 3rd normal form

Normalize first

DEnormalize when required

DEnormalization -> Sometimes OK; it can be done using many techniques

Un-Normalized

Data Integrity

Top priority

Maintained by using constraints; sometimes you’ll have to rely on triggers

Never trust code outside of the table to ensure the data integrity of a table

Primary keys should ALWAYS exist, even if you have to create a surrogate key

Declare your alternate keys

Declarative referential integrity: foreign keys (fast)

If there is absolutely no other choice -> Trigger code (Slow)

Data Integrity

Limit your column data: similar to referential integrity, but there is no table holding your values

Why not build that table?

Easier to manage, easier to scale

Use check constraints

Limit your data type: why is everyone afraid of the “big” bad tinyint?

Or even worse, the bit…
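
The integrity rules above (primary key, declared alternate key, foreign key, check constraint, a bit-like column) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the tables country and customer are hypothetical. The point it tries to show is that the table itself rejects bad rows, whatever the calling code does.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled

conn.executescript("""
CREATE TABLE country (
    country_id INTEGER PRIMARY KEY,      -- surrogate primary key
    iso_code   TEXT NOT NULL UNIQUE      -- declared alternate key
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    country_id  INTEGER NOT NULL REFERENCES country (country_id),  -- foreign key
    is_active   INTEGER NOT NULL CHECK (is_active IN (0, 1))       -- bit-like flag
);
INSERT INTO country (country_id, iso_code) VALUES (1, 'BE');
""")


def rejected(sql):
    """Return True when the table itself refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "valid row": rejected("INSERT INTO customer VALUES (1, 1, 1)"),
    "unknown country": rejected("INSERT INTO customer VALUES (2, 99, 1)"),
    "bad flag": rejected("INSERT INTO customer VALUES (3, 1, 7)"),
    "duplicate alternate key": rejected("INSERT INTO country VALUES (2, 'BE')"),
}
print(results)
```

Only the first insert is accepted; the other three are stopped by the foreign key, the check constraint and the unique constraint respectively.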

Clustered Index

Your table should have one: except in very specific, well-documented cases, it will be faster

The primary key is usually NOT the best choice; it is merely the default

Best choice can only be determined by usage: if usage determines the PK to be the best choice, then it is!

Always keep partitioning in mind: the clustered index should be your (range)-scan key

Non-Clustered Indexes: OLTP vs. OLAP

Avoid having more indexes than data... This is what makes a lot of databases SLOW²

Think about Scan vs. Seek

Think about entry points

Be careful with: composite indexes with more than 2 columns

ABC <> BCA <> BAC -> If you’re not careful you’ll be creating all 3

Included columns

Don’t include 90% of your table

Filtered Indexes

Know your logic and test!
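
Why column order in a composite index matters (ABC <> BCA <> BAC) can be shown with a query plan. The sketch below assumes SQLite via Python's sqlite3 as a stand-in; SQL Server's optimizer and plan output are different, but the seek-on-leading-column idea is the same. The table orders and the index ix_orders_abc are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (a INTEGER, b INTEGER, c INTEGER, payload TEXT);
CREATE INDEX ix_orders_abc ON orders (a, b, c);
""")


def plan(sql):
    """Return the planner's own description of how it will read the data."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the detail text


# Filter on the leading column: the index can be used as an entry point.
leading = plan("SELECT payload FROM orders WHERE a = 1")
# Filter on the second column only: the (a, b, c) index offers no entry point.
non_leading = plan("SELECT payload FROM orders WHERE b = 1")

print(leading)
print(non_leading)
```

This is how a database ends up with indexes on ABC, BCA and BAC at the same time: each one serves a different entry point, and each one adds write overhead.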

Think about... NULLs

Generates quite some overhead

Has a meaning <> ‘None’

Data types: don’t overuse (n)varchar(max); think about the content

Examples

Telephone numbers (consist of 4 blocks that can all have a leading 0) - E.164 standard

Country code (max 3) | region code + number (max 15) | extension (max 4)

‘00999-(0)1-123.12.23 ext1234’ [varchar(33)] (2+33 bytes= 35 bytes)

‘+99911231223’,’1234’ [varchar(18)]+[varchar(4)] (2+18 + 2+4 bytes= 26 bytes)

tinyint,smallint | tinyint, tinyint | tinyint, int, int (1+2+1+1+1+4 (+4) = 10 + 4 Bytes)

Length, Value | Length, Value | Length, Value | Extension -> Other table (to avoid NULLs)

Bad Data types -> Avoid

TEXT: string functions are limited

Indexing becomes useless

LARGE

NTEXT: … no comment

FLOAT, REAL: approximate numeric values

Not exact!

Can give “funny” errors: 1E-999 <> 0
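
A quick way to see what "approximate" means: Python's float is a binary floating-point type (IEEE 754 double), the same family as FLOAT and REAL, while decimal.Decimal behaves like an exact DECIMAL/NUMERIC type. This is a sketch of the general effect in Python, not of SQL Server's own arithmetic.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
float_matches = float_sum == 0.3

# An exact decimal type keeps the digits as they were written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_matches = decimal_sum == Decimal("0.3")

print(float_matches, decimal_matches)
```

The float comparison fails and the decimal comparison succeeds, which is why amounts and other exact quantities belong in an exact numeric type.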

Char vs. Varchar

Action | Char | Varchar
Length | Known | Unknown
Fragmentation | Easier to control | Bad with updates
Flexibility | None (from 1 to 8000) | From 1 to MAX
Frequent updates | Size is allocated | Needs to resize/split
Indexable | Supports online | Depends
NULL size | Full size is allocated + overhead | Overhead
Avoid (when possible) | Empty space / NULLs | MAX

SET-based

SQL is a set-based language: the optimizer is designed to do just that

Batch-mode

Typically represents 1000 rows of data.

Optimized for the multicore CPUs and increased memory throughput.

Batch mode processing spreads metadata costs and overhead.
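
The set-based point can be made concrete by writing the same change twice: once row by row (what a cursor loop does) and once as a single statement over the whole set. A minimal sketch, assuming SQLite via Python's sqlite3 as a stand-in and a hypothetical product table; it shows the shape of the two approaches and does not measure performance.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, price_cents INTEGER NOT NULL);
INSERT INTO product VALUES (1, 1000), (2, 2500), (3, 400);
""")

# Row by row: one statement per row, driven by a loop.
ids = [row[0] for row in conn.execute("SELECT product_id FROM product")]
for product_id in ids:
    conn.execute(
        "UPDATE product SET price_cents = price_cents + 100 WHERE product_id = ?",
        (product_id,),
    )
row_by_row_statements = len(ids)

# Set-based: one statement describes the change for every row at once.
cursor = conn.execute("UPDATE product SET price_cents = price_cents + 100")
set_based_rows = cursor.rowcount

prices = [
    row[0]
    for row in conn.execute("SELECT price_cents FROM product ORDER BY product_id")
]
print(row_by_row_statements, set_based_rows, prices)
```

Both approaches touch the same three rows, but the loop needs three statements where the set-based version needs one, and only the second leaves the engine free to choose how to process the set.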

UDFs

User-defined functions: make code easier to read

Make code easier to write

Be careful with non-deterministic functions

Can have a very negative impact on performance

Select *

Never use Select *

Avoid operators that don’t use your indexes

Explicit column lists: are less error prone

Easier to debug

Reduce Disk IO

More Maintainable

Columns can be added or repositioned
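
What "columns can be added or repositioned" means for the caller can be sketched as follows. The example assumes SQLite via Python's sqlite3 as a stand-in and a hypothetical person table; it compares the row shape returned by SELECT * and by an explicit column list before and after a column is added in front of name.

```python
import sqlite3


def first_row(create_sql, select_sql):
    """Build a fresh hypothetical person table and return the first row of a query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    conn.execute("INSERT INTO person (person_id, name) VALUES (1, 'Ann')")
    return conn.execute(select_sql).fetchone()


original = "CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT)"
# The same table after someone adds a column in front of "name".
changed = "CREATE TABLE person (person_id INTEGER PRIMARY KEY, title TEXT, name TEXT)"

star_before = first_row(original, "SELECT * FROM person")
star_after = first_row(changed, "SELECT * FROM person")
explicit_before = first_row(original, "SELECT person_id, name FROM person")
explicit_after = first_row(changed, "SELECT person_id, name FROM person")

print(star_before, star_after)
print(explicit_before, explicit_after)
```

With SELECT * the second position silently changes from the name to the new title column; with the explicit list the result keeps the same shape.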

Always

Use Begin and END

Even if it only contains one statement

Use schema name

There is a slight performance improvement

Makes code more readable

Use table alias

Even when not joining

Eliminates ambiguity

Reduces the chance of typos

Assists IntelliSense

SET NOCOUNT ON

Always

Use ANSI join syntax: the old T-SQL join syntax can return incorrect results

Is deprecated

Easier to read

Avoid

Table Hints

Index Hints

Join Hints

Lock Hints (this should be done on a higher level)

Very rare for the optimizer not to choose the best plan

Triple check your query (and do so with the full dataset)

Hints break your DBA’s ability to tune the database

Be careful with

Dynamic SQL: if used wrongly, it will perform slower

Increased security risks because it does not take part in ownership chaining

@@IDENTITY: can return wrong values when used in combination with triggers

Use SCOPE_IDENTITY() or IDENT_CURRENT() instead

TRUNCATE: minimally logged

Doesn’t fire Triggers

Cannot use schema binding

Stored Procedures

Anticipate debugging: you can add a @Debug flag that talks or logs more

Make sure your stored procedures return values

Call SPs with their parameter names: easier to read

Less error prone, because the parameter order no longer matters

Error handling

Handle your nested transactions!

Temp Tables vs. Table Variable vs. Table Parameters

Size does matter

Test!

Consider derived tables or CTEs

Never forget IO and scaling

Check your query plans

Think carefully about the order of execution; take indexing into consideration

Query plan regeneration

Default values

Avoid

String = “Expression”: both in SELECT lists and in WHERE clauses

Be careful with NULLs: a NULL value has a meaning

And it doesn’t mean “default” or “not available”
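
How NULL behaves in comparisons is easy to get wrong, so here is a small sketch. It assumes SQLite via Python's sqlite3 as a stand-in (SQL Server gives the same answers with ANSI_NULLS ON) and a hypothetical contact table with three rows: an extension, a NULL and an empty string.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, extension TEXT);
INSERT INTO contact VALUES (1, '1234'), (2, NULL), (3, '');
""")


def count(where):
    """Count the rows for which the WHERE condition evaluates to true."""
    sql = "SELECT COUNT(*) FROM contact WHERE " + where
    return conn.execute(sql).fetchone()[0]


equals_1234 = count("extension = '1234'")   # only row 1
not_1234 = count("extension <> '1234'")     # only row 3: the NULL row is "unknown"
is_null = count("extension IS NULL")        # only row 2
equals_null = count("extension = NULL")     # no rows: nothing ever equals NULL

print(equals_1234, not_1234, is_null, equals_null)
```

The NULL row is returned by neither the equality test nor the inequality test, which is exactly why a NULL should not be used as a stand-in for a default value.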

ANSI/ISO Standards

Use ANSI standards where possible

ISNULL vs. COALESCE

CURRENT_TIMESTAMP vs. GETDATE()

ROWVERSION vs. TIMESTAMP

ANSI SETTINGS -> ON

ANSI_NULLS

ANSI_PADDING

ANSI_WARNINGS

ARITHABORT

CONCAT_NULL_YIELDS_NULL

QUOTED_IDENTIFIER

NUMERIC_ROUNDABORT -> Should be OFF

Always Format your date time using ISO standards

YYYY-MM-DDTHH:MM:SS
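
Why the ISO format is worth insisting on: a regional date string can be read in two ways, while the ISO form has only one reading. The sketch below uses Python's datetime module purely as an illustration; the sample values are made up.

```python
from datetime import datetime

moment = datetime(2012, 4, 3, 14, 30, 5)

# ISO 8601 text in the YYYY-MM-DDTHH:MM:SS layout.
iso_text = moment.isoformat()

# A regional string depends entirely on who is reading it.
ambiguous = "03/04/2012"
as_day_first = datetime.strptime(ambiguous, "%d/%m/%Y")    # 3 April 2012
as_month_first = datetime.strptime(ambiguous, "%m/%d/%Y")  # 4 March 2012

# The ISO text parses back to exactly the original value.
round_trip = datetime.strptime(iso_text, "%Y-%m-%dT%H:%M:%S")

print(iso_text, as_day_first.date(), as_month_first.date())
```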

Part 2 - 2012 Specific

Always ON

ColumnStore Indexes

Contained Databases

FileTable

Always-On vs. Clustering vs. Mirroring

Always ON

AlwaysOn: superior to Mirroring (deprecated)

Pros

Good wizard

Good dashboards

Same responsiveness in failover

Only one IP address

Multiple replicas

Readable replicas

Drop the [#@!*] snapshots

Cons

Same overhead

Same maintenance problems

Even more sensitive to bad database design

Always-On: Be careful with

Snapshot Isolation

Repeatable-read (LOCKS!)

Logins

Creating indexes for reporting on live databases: overhead

Backups on secondary: copy-only for the time being

TF9532 (enable multiple replicas in AlwaysOn)

Keep your settings compatible (ex. TF’s)

Bulk load isn’t supported

Always-On: Solutions

CRUD overhead: Partition!

Maintenance overhead: Partition!

No “good” indexes for reporting vs. overhead for OLTP: Partition!

Users/logins/SIDs: Partition! (kidding)

Use Windows Authentication

Use 'sp_help_revlogin' and automate it!

Careful with maintenance plans

AlwaysOn: Performance benefits

Benefits hugely from being combined with: Resource Governor

Compression

Non-Wizard maintenance

Read-only partitions

Dedicated data-network

Local (SSD) Storage

Documentation

PARTITIONING

Column Store Indexes: Fundamentals

Stores data in a highly compressed format, with each column kept in a separate group of pages

Uses the vector-based query execution method called "batch processing"

Segment Elimination

Engine pushes filters down into the scans

Makes the table/partition read-only

The key to performance is to make sure your queries process the large majority of data in batch mode
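
Segment elimination can be pictured with a toy model: each segment of a column keeps its minimum and maximum value, and a range filter skips every segment whose range cannot match. The sketch below is only an illustration of that idea in Python. It is not SQL Server's implementation, the segment size of 4 rows is an assumption for readability, and every name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A toy column segment that remembers the range of values it holds."""
    values: List[int]

    def __post_init__(self):
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def build_segments(column, rows_per_segment):
    """Cut one column into consecutive fixed-size segments."""
    return [
        Segment(column[i:i + rows_per_segment])
        for i in range(0, len(column), rows_per_segment)
    ]


def scan_range(segments, low, high):
    """Return the matching values and the number of segments actually read."""
    matches, segments_read = [], 0
    for segment in segments:
        if segment.max_value < low or segment.min_value > high:
            continue  # eliminated from metadata alone, values never touched
        segments_read += 1
        matches.extend(v for v in segment.values if low <= v <= high)
    return matches, segments_read


order_years = [2009] * 4 + [2010] * 4 + [2011] * 4 + [2012] * 4
segments = build_segments(order_years, rows_per_segment=4)
matches, segments_read = scan_range(segments, 2012, 2012)

print(len(segments), segments_read, matches)
```

In this toy run only one of the four segments is read. The effect depends on the data being ordered so that segments cover narrow value ranges, which is one more reason the deck keeps coming back to partitioning.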

Column Store Indexes: Do’s & Don’ts

Do’s: Only on large tables

Include every column

Star joins with grouping and aggregation

BATCH mode

On the OLAP part of your database

Don’ts: String filters on column store indexes

OUTER/CROSS JOIN

NOT IN

UNION ALL

ROW mode

ON the OLTP part of your database

Column Store Indexes: Maximise Performance

Resource Governor: MAXDOP >= 2

CTEs: work around NOT IN joins

Work around UNION ALL

Careful with EXISTS, IN -> Inner joins

Data management: DROP/rebuild approach on data updates

Queries can become complex, but focus on Batch mode

Contained Databases: Security

Disable the guest account

Duplicate logins: Sysadmins

Different passwords

Initial catalog

Containment Status of a Database

Attaching (Restricted_User mode)

Kerberos

Restrict access to the database file

Don’t use auto close -> DOS attacks

Escaping contained databases

Filetable

(Disable Windows indexing on these disk volumes)

Disable generation of 8.3 names (command: FSUTIL BEHAVIOR SET DISABLE8DOT3 1)

Disable last file access time tracking (command: FSUTIL BEHAVIOR SET DISABLELASTACCESS 1)

Keep some space empty (let us say 15% for reference) on drive if possible

Defragment the volume

Is supported in AlwaysOn, if the property is enabled on all servers

Using VNNs

AlwaysOn

Mirroring, Clustering, Log Shipping

Contained Databases, Column Store Index

AlwaysOn complements these technologies

In a way, AlwaysOn replaces Mirroring (deprecated)

Clearly a step in a new direction

To use these technologies optimally, the Part 1 best practices are very important

Your database design should be as optimal as possible

Partitioning becomes a MUST

Resource Governor becomes a MUST

You’ll need the Enterprise edition

Call to action

Start giving feedback to your developers / 3rd party vendors NOW

Start thinking about:

Data flows

Data retention

Data management

Partitioning

Filegroups/Files

Data-tiering

Don’t restrict your view to the boundary of a database

Q&A
