Presented by Karel Coenye.
SQL Server Best Practices
Microsoft TechNet Thursday
About me
@Ryazame
Part 1 - Generic: Independent of SQL Version
What’s the goal?
No, seriously… What are these best practices?
General rules and guidelines
Intended to improve:
Maintenance
Performance
Availability
Quality
Not always 100% implementable
But at least try
Document why
Coding (believe it or not...)
If you don’t do this right, it’s like...
Coding Best Practices
There is no performance loss for documented code
Code tells you how, comments tell you why
Don’t hardcode
SQL Server supports variables ;-)
Format your code for readability
There is no “right” method
But… make clear agreements
And follow them
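A minimal sketch of the three coding rules above: a comment that explains why, a variable instead of a hardcoded literal, and one consistent layout. The table variable `@Orders`, its columns, and the sample rows are hypothetical and exist only so the batch runs on its own.

```sql
-- Hypothetical sample data, held in a table variable so the batch is self-contained.
DECLARE @Orders TABLE
(
    OrderID   int  NOT NULL PRIMARY KEY,
    OrderDate date NOT NULL
);

INSERT INTO @Orders (OrderID, OrderDate)
VALUES (1, '2011-12-30'),
       (2, '2012-02-14');

-- Why a variable: the cutoff changes per report run, so it is declared once
-- instead of being repeated as a literal throughout the query.
DECLARE @CutoffDate date = '2012-01-01';

SELECT o.OrderID,
       o.OrderDate
FROM @Orders AS o
WHERE o.OrderDate >= @CutoffDate;
```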
Windows Authentication
Easier to administer
Centralized
Better auditing
More Secure
Flexible
Always-On
Contained Databases
Normalize
Normal forms
Aim for third normal form
Normalize first
DEnormalize when required
DEnormalization -> sometimes OK
DEnormalization can be done using many techniques
Un-Normalized
Data Integrity
Top priority
Maintained by using constraints
Sometimes you’ll have to rely on triggers
Never trust code outside of the table to ensure the data integrity of a table
Primary keys should ALWAYS exist
Even if you have to create a surrogate key
Declare your alternate keys
Declared referential integrity
Foreign keys (fast)
If there is absolutely no other choice -> trigger code (slow)
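As an illustration of declared integrity, the sketch below uses a surrogate primary key, a declared alternate key (UNIQUE), and a foreign key. The tables `dbo.Customer` and `dbo.CustomerOrder` and all column names are hypothetical.

```sql
-- Parent table: surrogate primary key plus the business key declared as an alternate key.
CREATE TABLE dbo.Customer
(
    CustomerID   int IDENTITY(1,1) NOT NULL,
    CustomerCode char(8)           NOT NULL,
    Name         nvarchar(100)     NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY (CustomerID),
    CONSTRAINT UQ_Customer_CustomerCode UNIQUE (CustomerCode)
);

-- Child table: referential integrity is declared, not enforced by application or trigger code.
CREATE TABLE dbo.CustomerOrder
(
    OrderID    int IDENTITY(1,1) NOT NULL,
    CustomerID int               NOT NULL,
    OrderDate  date              NOT NULL,
    CONSTRAINT PK_CustomerOrder PRIMARY KEY (OrderID),
    CONSTRAINT FK_CustomerOrder_Customer FOREIGN KEY (CustomerID)
        REFERENCES dbo.Customer (CustomerID)
);
```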
Data Integrity
Limit your column data
Similar to referential integrity, but there is no table holding your values
Why not build that table?
Easier to manage, easier to scale
Use check constraints
Limit your data type
Why is everyone afraid of the “big” bad tinyint?
Or even worse, the bit…
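A small sketch of limiting column data with narrow types and a check constraint. The table `dbo.Product` and its columns are hypothetical.

```sql
CREATE TABLE dbo.Product
(
    ProductID   int           NOT NULL CONSTRAINT PK_Product PRIMARY KEY,
    Name        nvarchar(100) NOT NULL,
    -- tinyint stores 0..255 in one byte, which is enough for a percentage.
    DiscountPct tinyint       NOT NULL
        CONSTRAINT CK_Product_DiscountPct CHECK (DiscountPct <= 100),
    -- bit is the natural type for a yes/no flag.
    IsActive    bit           NOT NULL
        CONSTRAINT DF_Product_IsActive DEFAULT (1)
);
```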
Clustered Index
Your table should have one
Unless in very specific, well-documented cases, it will be faster
The primary key is usually NOT the best choice
It is the default
Best choice can only be determined by usage
If usage determines the PK to be the best choice, then it is!
Always keep partitioning in mind
Should be your (range) scan key
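One possible way to separate the primary key from the clustered index, assuming usage shows that most queries scan a date range. The table `dbo.SalesEvent` and its columns are hypothetical.

```sql
CREATE TABLE dbo.SalesEvent
(
    SalesEventID bigint IDENTITY(1,1) NOT NULL,
    EventDate    date                 NOT NULL,
    Amount       decimal(10,2)        NOT NULL,
    -- Declare the PK as NONCLUSTERED so it does not claim the clustered index by default.
    CONSTRAINT PK_SalesEvent PRIMARY KEY NONCLUSTERED (SalesEventID)
);

-- Cluster on the range-scan key (also a natural partitioning column).
CREATE CLUSTERED INDEX CIX_SalesEvent_EventDate
    ON dbo.SalesEvent (EventDate);
```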
Non-Clustered Indexes
OLTP vs. OLAP
Avoid having more indexes than data...
This is what makes a lot of databases SLOW²
Think about Scan vs. Seek
Think about entry points
Be careful with:
Composite indexes with more than 2 columns
ABC <> BCA <> BAC -> if you’re not careful you’ll be creating all 3
Included columns
Don’t include 90% of your table
Filtered Indexes
Know your logic and test!
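A sketch of a composite index with a single included column and of a filtered index. The table `dbo.Invoice`, its columns, and the assumption that most queries look for unpaid invoices are hypothetical.

```sql
CREATE TABLE dbo.Invoice
(
    InvoiceID   int           NOT NULL CONSTRAINT PK_Invoice PRIMARY KEY,
    CustomerID  int           NOT NULL,
    InvoiceDate date          NOT NULL,
    PaidDate    date          NULL,
    TotalAmount decimal(10,2) NOT NULL
);

-- Key column order matters: this index supports seeks on CustomerID,
-- or on CustomerID + InvoiceDate, but not on InvoiceDate alone.
CREATE NONCLUSTERED INDEX IX_Invoice_CustomerID_InvoiceDate
    ON dbo.Invoice (CustomerID, InvoiceDate)
    INCLUDE (TotalAmount);   -- include only what the query needs, not the whole table

-- Filtered index: covers only the (assumed small) set of unpaid invoices.
CREATE NONCLUSTERED INDEX IX_Invoice_Unpaid
    ON dbo.Invoice (InvoiceDate)
    WHERE PaidDate IS NULL;
```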
Think about... NULLs
Generate quite some overhead
Have a meaning <> ‘None’
Data types
Don’t overuse (n)varchar(max); think about the content
Examples
Telephone numbers (consist of 4 blocks that can each have a leading 0) – E.164 standard
Country code (max 3) | region code + number (max 15) | extension (max 4)
‘00999-(0)1-123.12.23 ext1234’ [varchar(33)] (2 + 33 bytes = 35 bytes)
‘+99911231223’, ‘1234’ [varchar(18)] + [varchar(4)] (2 + 18 + 2 + 4 bytes = 26 bytes)
tinyint, smallint | tinyint, tinyint | tinyint, int, int (1 + 2 + 1 + 1 + 1 + 4 (+ 4) = 10 + 4 bytes)
Length, value | length, value | length, value | extension -> other table (to avoid NULLs)
Bad data types -> avoid
TEXT
String functions are limited
Indexing becomes useless
LARGE
NTEXT
… No comment
FLOAT, REAL
Approximate numeric values
Not exact!
Can give “funny” errors: 1E-999 <> 0
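To make "not exact" concrete, the sketch below compares the same sum in float and in decimal. The variable names are illustrative; because float is an IEEE 754 double, the float comparison is expected to report a mismatch while the decimal one matches.

```sql
DECLARE @a float = 0.1, @b float = 0.2;                   -- approximate numerics
DECLARE @x decimal(9,2) = 0.1, @y decimal(9,2) = 0.2;     -- exact numerics

SELECT
    CASE WHEN @a + @b = 0.3 THEN 'equal' ELSE 'not equal' END AS FloatComparison,
    CASE WHEN @x + @y = 0.3 THEN 'equal' ELSE 'not equal' END AS DecimalComparison;
```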
Char vs. Varchar
Action | Char | Varchar
Length | Known | Unknown
Fragmentation | Easier to control | Bad with updates
Flexibility | None (from 1 to 8000) | From 1 to MAX
Frequent updates | Size is allocated | Needs to resize/split
Indexable | Supports online | Depends
Null size | Full size is allocated + overhead | Overhead
Avoid (when possible) | Empty space / NULLs | MAX
SET-based
SQL is a set-based language
The optimizer is designed to do just that
Batch-mode
Typically represents 1000 rows of data.
Optimized for the multicore CPUs and increased memory throughput.
Batch mode processing spreads metadata costs and overhead.
UDFs
User-defined functions
Make code easier to read
Make code easier to write
Be careful with non-deterministic functions
Can have a very negative impact on performance
Select *
Never use Select *
Avoid operators that don’t use your indexes
Explicit column lists
Are less error-prone
Are easier to debug
Reduce disk IO
Are more maintainable
Columns can be added or repositioned
Always
Use BEGIN and END
Even if it only contains one statement
Use schema name
There is a slight performance improvement
Makes code more readable
Use table alias
Even when not joining
Eliminates ambiguity
Reduces the chance of typos
Assists IntelliSense
SET NOCOUNT ON
Always
Use ANSI join syntax
The old T-SQL join syntax can return incorrect results
and is deprecated
ANSI joins are easier to read
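The "always" rules above could look like this in one stored procedure. This is only a sketch: the procedure name and the tables `dbo.Customer` and `dbo.CustomerOrder` are hypothetical.

```sql
CREATE PROCEDURE dbo.GetOrdersForCustomer
    @CustomerID int
AS
BEGIN                      -- BEGIN/END even around a short body
    SET NOCOUNT ON;        -- suppress "n rows affected" messages

    SELECT o.OrderID,      -- explicit column list, no SELECT *
           o.OrderDate,
           c.Name
    FROM dbo.CustomerOrder AS o          -- schema-qualified names with aliases
    INNER JOIN dbo.Customer AS c         -- ANSI join syntax
        ON c.CustomerID = o.CustomerID
    WHERE o.CustomerID = @CustomerID;
END;
```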
Avoid
Table Hints
Index Hints
Join Hints
Lock Hints (this should be done on a higher level)
Very rare for the optimizer not to choose the best plan
Triple check your query (and do so with the full dataset)
Hints break your DBA’s ability to tune the database
Be careful with
Dynamic SQL
If used wrongly, it will perform slower
Increased security risks because it does not take part in ownership chaining
@@IDENTITY
Can return wrong values if used in combination with triggers
Use SCOPE_IDENTITY() or IDENT_CURRENT() instead
TRUNCATE
Minimally logged
Doesn’t fire triggers
Cannot be used with schema binding
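A brief sketch of two of the points above: reading the new identity value with SCOPE_IDENTITY(), and running dynamic SQL through a parameterized `sp_executesql` call rather than string concatenation. The table `dbo.AuditedItem` and the variable names are hypothetical.

```sql
CREATE TABLE dbo.AuditedItem
(
    ItemID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_AuditedItem PRIMARY KEY,
    Name   nvarchar(50)      NOT NULL
);

INSERT INTO dbo.AuditedItem (Name)
VALUES (N'Widget');

-- SCOPE_IDENTITY() is limited to the current scope, so an identity value
-- generated by a trigger on this table would not leak into the result.
DECLARE @NewItemID int = SCOPE_IDENTITY();

-- Parameterized dynamic SQL: the value is passed as a parameter, not concatenated.
DECLARE @sql nvarchar(200) =
    N'SELECT i.ItemID, i.Name FROM dbo.AuditedItem AS i WHERE i.ItemID = @ItemID;';

EXEC sys.sp_executesql @sql, N'@ItemID int', @ItemID = @NewItemID;
```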
Stored Procedures
Anticipate debugging
You can add a @Debug flag that talks or logs more
Make sure your stored procedures return values
Call SPs with their parameter names
Easier to read
Less error-prone, because the parameter order no longer matters
Error handling
Handle your nested transactions!
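A possible shape for such a procedure, as a sketch only: it takes a @Debug flag, returns a status value, handles errors with TRY/CATCH, and only commits or rolls back a transaction it started itself. The procedure name and the table `dbo.CustomerOrder` are hypothetical.

```sql
CREATE PROCEDURE dbo.ArchiveOrders
    @CutoffDate date,
    @Debug      bit = 0
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @StartedTran bit = 0;
    DECLARE @Rows int = 0;

    BEGIN TRY
        -- Nested call handling: open a transaction only if the caller has none.
        IF @@TRANCOUNT = 0
        BEGIN
            BEGIN TRANSACTION;
            SET @StartedTran = 1;
        END;

        DELETE o
        FROM dbo.CustomerOrder AS o
        WHERE o.OrderDate < @CutoffDate;

        SET @Rows = @@ROWCOUNT;   -- capture immediately, later statements reset it

        IF @Debug = 1
        BEGIN
            PRINT 'Rows deleted: ' + CAST(@Rows AS varchar(12));
        END;

        IF @StartedTran = 1
        BEGIN
            COMMIT TRANSACTION;
        END;

        RETURN 0;                 -- success
    END TRY
    BEGIN CATCH
        IF @StartedTran = 1 AND @@TRANCOUNT > 0
        BEGIN
            ROLLBACK TRANSACTION;
        END;

        IF @Debug = 1
        BEGIN
            PRINT ERROR_MESSAGE();
        END;

        RETURN 1;                 -- failure
    END CATCH;
END;
```

Called with named parameters, the order no longer matters: `DECLARE @rc int; EXEC @rc = dbo.ArchiveOrders @Debug = 1, @CutoffDate = '2011-01-01';`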
Temp Tables vs. Table Variable vs. Table Parameters
Size does matter
Test!
Consider derived tables or CTEs
Never forget IO and scaling
Check your query plans
Think carefully about the order of execution
Take indexing into consideration
Query plan regeneration
Default values
Avoid
String = “Expression”
Both in SELECT lists and in WHERE clauses
Be careful with NULLs
A NULL value has a meaning
And it doesn’t mean “default” or “not available”
ANSI/ISO Standards
Use ANSI standards where possible
ISNULL vs. COALESCE
CURRENT_TIMESTAMP vs. GETDATE()
ROWVERSION vs. TIMESTAMP
ANSI settings -> ON
ANSI_NULLS
ANSI_PADDING
ANSI_WARNINGS
ARITHABORT
CONCAT_NULL_YIELDS_NULL
QUOTED_IDENTIFIER
NUMERIC_ROUNDABORT -> should be OFF
Always format your date time using ISO standards
YYYY-MM-DDThh:mm:ss
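A short sketch of the settings and standards listed above. The variable names and the sample date are illustrative only.

```sql
-- Session settings from the list above.
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT,
    CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;

-- ANSI-standard alternatives to ISNULL and GETDATE().
DECLARE @MiddleName nvarchar(50) = NULL;

SELECT COALESCE(@MiddleName, N'') AS MiddleName,
       CURRENT_TIMESTAMP          AS CurrentDateTime;

-- ISO 8601 literal in, ISO 8601 string out (CONVERT style 126).
DECLARE @d datetime = '2012-03-15T14:30:00';

SELECT CONVERT(char(19), @d, 126) AS IsoFormatted;
```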
Part 2 - 2012 Specific
Always ON
ColumnStore Indexes
Contained Databases
Filetable
Always-On vs. Clustering vs. Mirroring
Always ON
AlwaysOn is superior to Mirroring (deprecated)
Pros
Good wizard
Good dashboards
Same responsiveness in failover
Only one IP address
Multiple replicas
Readable replicas
Drop the [#@!*] snapshots
Contra
Same overhead
Same maintenance problems
Even more sensitive to bad database design
AlwaysOn
Be careful with
Snapshot Isolation
Repeatable-read (LOCKS!)
Logins
Creating indexes for reporting on live databases
Overhead
Backups on secondary
Copy-only for the time being
TF9532 (enable multiple replicas in AlwaysOn)
Keep your settings compatible (e.g. TFs)
Bulk load isn’t supported
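For the copy-only backup on a secondary, a sketch under stated assumptions: the database name `SalesDb` and the backup path are hypothetical, and the check uses `sys.fn_hadr_backup_is_preferred_replica` so the same job can be deployed on every replica.

```sql
DECLARE @DatabaseName sysname = N'SalesDb';   -- hypothetical database name

-- Only back up on the replica that the availability group prefers for backups.
IF sys.fn_hadr_backup_is_preferred_replica(@DatabaseName) = 1
BEGIN
    BACKUP DATABASE @DatabaseName
        TO DISK = N'D:\Backup\SalesDb_copyonly.bak'   -- hypothetical path
        WITH COPY_ONLY;                               -- full backups on a secondary must be copy-only
END;
```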
AlwaysOn
Solutions
CRUD overhead
Partition!
Maintenance overhead
Partition!
No “good” indexes for reporting vs. overhead for OLTP
Partition!
Users/logins/SIDs
Partition! (kidding)
Use Windows Authentication
Use 'sp_help_revlogin' and automate it!
Careful with maintenance plans
AlwaysOn
Performance benefits
Has huge benefits from combining it with:
Resource Governor
Compression
Non-Wizard maintenance
Read-only partitions
Dedicated data-network
Local (SSD) Storage
Documentation
PARTITIONING
Column Store Indexes
Fundamentals
Stores data in a highly compressed format, with each column kept in a separate group of pages
Uses the vector-based query execution method called "batch processing"
Segment elimination
Engine pushes filters down into the scans
Makes the table/partition read-only
The key to performance is to make sure your queries process the large majority of data in batch mode
Column Store Indexes
Do’s & Don’ts
Do’s
Only on large tables
Include every column
Star joins with grouping and aggregation
BATCH mode
On the OLAP part of your database
Don’ts
String filters on column store indexes
OUTER/CROSS JOIN
NOT IN
UNION ALL
ROW mode
On the OLTP part of your database
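A sketch of the do's above in SQL Server 2012 syntax: a nonclustered columnstore index that includes every column of a fact table, followed by a grouping and aggregation query of the kind that benefits from it. The table `dbo.FactSales` and its columns are hypothetical.

```sql
CREATE TABLE dbo.FactSales
(
    DateKey    int   NOT NULL,
    ProductKey int   NOT NULL,
    StoreKey   int   NOT NULL,
    Quantity   int   NOT NULL,
    Amount     money NOT NULL
);

-- Include every column, so any query against the fact table can use the index.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
    ON dbo.FactSales (DateKey, ProductKey, StoreKey, Quantity, Amount);

-- Grouping and aggregation over the fact table.
SELECT f.DateKey,
       SUM(f.Amount) AS TotalAmount
FROM dbo.FactSales AS f
GROUP BY f.DateKey;
```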
Column Store Indexes
Maximise Performance
Resource Governor
MAXDOP >= 2
CTEs
Work around NOT IN joins
Work around UNION ALL
Careful with EXISTS / IN -> inner joins
Data management
DROP/rebuild approach on data updates
Queries can become complex, but focus on batch mode
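The drop/rebuild approach could be sketched as below, here using disable and rebuild rather than an actual DROP. The table `dbo.FactShipment`, its columns, and the sample row are hypothetical.

```sql
CREATE TABLE dbo.FactShipment
(
    DateKey  int NOT NULL,
    Quantity int NOT NULL
);

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactShipment
    ON dbo.FactShipment (DateKey, Quantity);

-- In SQL Server 2012 the table is read-only while the columnstore index is enabled,
-- so the index is disabled for the load and rebuilt afterwards.
ALTER INDEX NCCI_FactShipment ON dbo.FactShipment DISABLE;

INSERT INTO dbo.FactShipment (DateKey, Quantity)
VALUES (20120315, 10);

ALTER INDEX NCCI_FactShipment ON dbo.FactShipment REBUILD;
```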
Contained Databases
Security
Disable the guest account
Duplicate logins
Sysadmins
Different passwords
Initial catalog
Containment status of a database
Attaching (RESTRICTED_USER mode)
Kerberos
Restrict access to the database file
Don’t use auto close -> DoS attacks
Escaping contained databases
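A minimal sketch of setting up a partially contained database with some of the security points above applied. The database name `ContainedDemo`, the user `AppUser`, and the password placeholder are hypothetical; `GO` is the batch separator used by SSMS and sqlcmd.

```sql
-- Allow contained database authentication on the instance.
EXEC sys.sp_configure N'contained database authentication', 1;
RECONFIGURE;
GO

CREATE DATABASE ContainedDemo CONTAINMENT = PARTIAL;
GO

USE ContainedDemo;
GO

-- Contained user: authenticates at the database, no server login required.
CREATE USER AppUser WITH PASSWORD = N'<replace-with-a-strong-password>';

-- Disable the guest account in this database.
REVOKE CONNECT FROM guest;

-- Keep auto close off, as advised above.
ALTER DATABASE ContainedDemo SET AUTO_CLOSE OFF;
```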
Filetable
(Disable Windows indexing on these disk volumes)
Disable generation of 8.3 names (command: FSUTIL BEHAVIOR SET DISABLE8DOT3 1)
Disable last file access time tracking (command: FSUTIL BEHAVIOR SET DISABLELASTACCESS 1)
Keep some space empty (let us say 15% for reference) on drive if possible
Defragment the volume
Is supported in AlwaysOn!
If the property is enabled on all servers
Using VNNs
AlwaysOn
Mirroring, Clustering, Log shipping
Contained Databases, Column Store Index
AlwaysOn complements these technologies
In a way, AlwaysOn replaces Mirroring (deprecated)
Clearly a step in a new direction
To use these technologies optimally
Part 1 best practices are very important
Your database design should be as optimal as possible
Partitioning becomes a MUST
Resource Governor becomes a MUST
You’ll need the Enterprise edition
Call to action
Start giving feedback to your developers / 3rd party vendors NOW
Start thinking about
Data flows
Data retention
Data management
Partitioning
Filegroups/Files
Data-tiering
Don’t restrict your view to the boundary of a database
Q&A