Andras Belokosztolszki Red Gate Software


SQL Server Storage Engine

Software architect at Red Gate Software
Responsible for SQL tools:
◦ SQL Compare, SQL Data Compare, SQL Packager
◦ SQL Log Rescue
◦ SQL Refactor
◦ … many others
Events (NxtGenUG, VBUG, SQL Bits, PASS, many other user groups)
SQL Server Central
Blog: http://www.simple-talk.com/community/blogs/andras/default.aspx
Articles: http://www.simple-talk.com/author/andr%c3%a1s-belokosztolszki/


Physical storage
◦ Pages, rows, data types, index structure
Data and schema modifications
◦ What happens when you change the schema
◦ What happens when a row is inserted, deleted, etc.
SQL Server 2008 features
◦ Compression and file streams

Agenda

Primary database file (*.mdf)
Secondary database files (*.ndf)
◦ Optional, can be more than one
Log files (not covered)

Database files

[Diagram: a database made up of a primary data file, secondary data files, and log files]

Data files are divided up into 8KB pages
All information is stored in pages (data, schema, database information, space allocation (GAM, SGAM, IAM), DLLs)
Identified by fileId:PageId (2+4 bytes)
8 pages = 1 extent
Most important for us is the data page

Pages

1:0 1:1 1:2 1:3
1:4 1:5 1:6 1:7
1:8 1:9 1:A 1:B
1:C 1:D 1:E 1:F
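The page facts above (8 KB pages, fileId:PageId addressing, 8 pages per extent) translate into simple arithmetic. A minimal sketch in Python, assuming the page number in an identifier such as 1:B is written in hexadecimal as in the diagram; `parse_page_id` and `locate` are made-up names for illustration, not SQL Server functions:

```python
PAGE_SIZE = 8 * 1024     # data files are divided into 8 KB pages
PAGES_PER_EXTENT = 8     # 8 pages = 1 extent


def parse_page_id(text):
    """Split 'file:page' into two ints (page part assumed to be hex, as in the diagram)."""
    file_part, page_part = text.split(":")
    return int(file_part), int(page_part, 16)


def locate(text):
    """Return (file_id, page_no, byte_offset_in_file, extent_no) for a page id."""
    file_id, page_no = parse_page_id(text)
    return file_id, page_no, page_no * PAGE_SIZE, page_no // PAGES_PER_EXTENT


print(locate("1:B"))   # (1, 11, 90112, 1)
```

Page 1:B is the twelfth page of file 1, starts 90112 bytes into the file, and lies in the second extent (extent number 1).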

Page header (96 bytes)
Data rows
Offset array
DBCC PAGE
◦ (db, file, page, options)
    2 – raw, 3 – row details
◦ Trace flag 3604 (sends the DBCC output to the client)
Structure of a data page

Demo

Fixed length data will always use its allocated space (even when it is null)
Must fit a page (max 8060 bytes)
◦ Some items can overflow: Overflow space

Data row format (field sizes in bytes):
Stat A (1) | Stat B (1) | Null offset (2) | Fixed Length Data | Column Count (2) | Null bitmap (Ceiling(ColCnt/8)) | Var-Len Column Count (2) | Var. Offsets | Var-Len Data
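The field sizes in the data row format add up to a simple row size estimate. A minimal sketch in Python, assuming 2 bytes per variable-length offset and that the variable-length part is omitted entirely when a row has no variable-length columns (both are assumptions, not stated on the slide); `data_row_size` and its parameters are made-up names for illustration:

```python
import math


def data_row_size(fixed_bytes, total_cols, var_col_sizes):
    """Estimate the size in bytes of one data row.

    fixed_bytes   -- total size of all fixed-length columns
    total_cols    -- number of columns (drives the null bitmap size)
    var_col_sizes -- actual byte lengths of the variable-length values
    """
    size = 1 + 1 + 2                     # Stat A, Stat B, null offset
    size += fixed_bytes                  # fixed-length data, always present
    size += 2                            # column count
    size += math.ceil(total_cols / 8)    # null bitmap: one bit per column
    if var_col_sizes:                    # assumed: skipped when there are none
        size += 2                        # variable-length column count
        size += 2 * len(var_col_sizes)   # assumed: one 2-byte offset per column
        size += sum(var_col_sizes)       # variable-length data
    return size


# int (4 bytes) + char(10) + a varchar currently holding 5 bytes
print(data_row_size(fixed_bytes=14, total_cols=3, var_col_sizes=[5]))   # 30
```

Each row also takes an entry in the page's offset array, which this estimate leaves out.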

Demo

See sys.types
◦ Fixed length (some can be adjusted: time, decimal, char(), …)
    Always consumes this space
◦ Variable length (varchar, varbinary, …)
◦ Bit (packed)
◦ sql_variant
◦ Binary large objects (ntext, varchar(max), …)
    After a certain size stored on other pages

Data types

From tables to pages

Heap/Index (sys.indexes) 1 : N Partition (sys.partitions) 1 : 3 Allocation Unit (sys.allocation_units)
Allocation unit types: In row data, LOB, Row overflow
sys.partitions sp JOIN sys.allocation_units au ON sp.partition_id = au.container_id

Clustered index

[Diagram: B-tree with the root level at the top, the interior levels below it, and the leaf level (level 0) holding the row data]

The full row record is at the leaf level
◦ Consequently there can be only one clustered index
The intermediary and root levels store a clustering key for the first entry of each page on the next level down
If the key row length is e.g. 15 bytes, an intermediary page can store up to 539 rows (8096/15, rounded down), i.e. reference 539 pages
Exact space usage in sys.allocation_units
Pages are doubly linked

Clustered index
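The 8096/15 figure can be extended into a rough estimate of how many levels sit above the leaf level. A minimal sketch in Python that follows the slide's arithmetic only (8096 usable bytes per page, a fixed key row length, completely full pages); the function names are made up for illustration:

```python
PAGE_BODY = 8096   # usable bytes per page, the figure used on the slide


def entries_per_page(key_row_len):
    """How many index rows of the given length fit on one non-leaf page."""
    return PAGE_BODY // key_row_len


def levels_above_leaf(leaf_pages, key_row_len):
    """Count the non-leaf levels needed to reach a single root page."""
    fanout = entries_per_page(key_row_len)
    pages, levels = leaf_pages, 0
    while pages > 1:
        pages = -(-pages // fanout)   # ceiling division: pages on the next level up
        levels += 1
    return levels


print(entries_per_page(15))               # 539
print(levels_above_leaf(1_000_000, 15))   # 3
```

With 15-byte key rows, a million leaf pages need 1856 pages on the level above, then 4, then a single root page.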

Nonclustered index

[Diagram: B-tree with the root and interior levels above a leaf level, whose entries point to the row data]

See sys.allocation_units
Max 900 bytes of key columns per entry!
An index entry contains:
◦ Index key columns
◦ Record locator (nonclustered)
    Row ID or clustering key (not stored redundantly)
◦ Down pointer (for non leaf pages)

Index space usage

Index row format (field sizes in bytes):
Stat A (1) | Fixed Length Data | Column Count (2) | Null bitmap (Ceiling(ColCnt/8)) | Var-Len Column Count (2) | Var. Offsets | Var-Len Data

Motivation:
◦ When using a nonclustered index on a heap, an item is looked up, then one more page is read to retrieve the extra data
◦ When using a nonclustered index on a clustered table (B-tree), the clustered index structure is also traversed
You can include extra columns in a non-clustered index
These will not be used to look up rows in the table
Increases the coverage of an index
Increases the size of an index record -> the total size
Extra maintenance

Included columns

Everything is stored on pages
Rows have fixed and variable length portions
◦ Differences between certain data types and their limitations
Index structures
◦ Size estimates for indexes, page estimates for queries

The fewer pages we load into memory, the better?

Summary of static data storage

Schema changes
• Adding a column
• Changing a column
• Dropping a column
Data changes
• Inserting a row
• Deleting a row
• Altering a row

Modifications to the stored information

◦ What can happen:
    No rows are modified, only meta information
    All rows are examined
        E.g. changing nullability
        Int to smallint (wasted space!)
    All rows are rebuilt
◦ We may end up wasting a lot of valuable space!
    How can we reclaim the space?

Schema modification

Demo

Insert: added where there is space
Delete: removed or marked as ghost
Update: indexes refer to rows by file:page:slot, so a row that no longer fits on its page cannot easily be moved -> it is moved, but a reference to it is left behind (forwarded record)

Modifications on heaps

Insert: since the rows are ordered, if there is not enough space on a page, the page is split into two (can happen many times)
Update:
◦ Like inserts, if the new row is too big to fit
◦ Changes to clustering columns = delete + insert
Delete: the row is marked as ghost or is deleted

Modifications on clustered tables
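The page split on insert can be illustrated with a toy model. A minimal sketch in Python, assuming each page holds a fixed number of keys and an overflowing page is cut in the middle; the capacity of 4 and the `insert` helper are hypothetical, and real pages are limited by bytes rather than by row count:

```python
import bisect


def insert(pages, key, rows_per_page=4):
    """Insert key into a list of sorted pages, splitting a page that overflows."""
    first_keys = [page[0] for page in pages if page]
    # Target page: the last one whose first key is <= key (or the first page).
    i = max(bisect.bisect_right(first_keys, key) - 1, 0)
    page = pages[i]
    bisect.insort(page, key)              # rows stay ordered within the page
    if len(page) > rows_per_page:         # no room left: split the page in two
        mid = len(page) // 2
        pages[i:i + 1] = [page[:mid], page[mid:]]
    return pages


pages = [[]]
for key in (10, 20, 30, 40):
    insert(pages, key)
print(pages)          # [[10, 20, 30, 40]]
insert(pages, 25)
print(pages)          # [[10, 20], [25, 30, 40]]
```

The insert of 25 lands on a full page, so the page is split and both halves end up partly empty.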

Fill Factor and Pad Index
Fill Factor
• Leaf pages only
• Specified as percentage
Pad Index
• Intermediary pages only
• On/off option that applies the fill factor percentage
Only when index is created or rebuilt. The free space is NOT maintained. (See index reorganization and rebuilding later.)
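The effect of a fill factor on leaf pages can be estimated with integer arithmetic. A minimal sketch in Python, assuming 8096 usable bytes per page (the figure used in the clustered index example) and ignoring the per-row entry in the offset array; `rows_per_leaf_page` is a made-up name for illustration:

```python
PAGE_BODY = 8096   # usable bytes per page, as in the clustered index example


def rows_per_leaf_page(row_size, fill_factor=100):
    """Rows placed on a leaf page when the index is built with this fill factor."""
    usable = PAGE_BODY * fill_factor // 100   # bytes filled at build time
    return usable // row_size


full = rows_per_leaf_page(100)          # 80 rows at 100 percent
padded = rows_per_leaf_page(100, 80)    # 64 rows at 80 percent
print(full, padded, full - padded)      # 80 64 16
```

At a fill factor of 80, each leaf page in this example has room for about 16 more 100-byte rows before it has to split.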

sys.dm_db_index_physical_stats()
Logical fragmentation: the next leaf page of an index page is not the next page that is allocated to the index
Extent fragmentation: extents are not contiguous
Page fill

Fragmentation
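The definition of logical fragmentation can be turned into a small counting exercise. A minimal sketch in Python, assuming leaf pages are given as page numbers in key order and that a page counts as out of order when the page following it in key order is not the next higher page number allocated to the index. This is a toy model of the definition, not necessarily the exact figure that sys.dm_db_index_physical_stats reports:

```python
def logical_fragmentation(pages_in_key_order):
    """Percentage of leaf pages whose logical successor is not the next allocated page."""
    if not pages_in_key_order:
        return 0.0
    allocated = sorted(pages_in_key_order)
    next_allocated = dict(zip(allocated, allocated[1:]))   # page -> next page by number
    out_of_order = sum(
        1
        for page, successor in zip(pages_in_key_order, pages_in_key_order[1:])
        if next_allocated.get(page) != successor
    )
    return 100.0 * out_of_order / len(pages_in_key_order)


print(logical_fragmentation([100, 101, 102, 103]))        # 0.0
print(logical_fragmentation([100, 104, 101, 102, 103]))   # 40.0
```

In the second call, page 100 has been split and its new half was allocated as page 104, so both 100 and 104 are followed by a page that is not next in allocation order.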

Drop and create the clustered index
◦ Index is offline
ALTER INDEX REORGANIZE
◦ This is the replacement for DBCC INDEXDEFRAG
◦ Reorganizes index pages (and compacts pages and LOBs) (NO new pages)
ALTER INDEX REBUILD
◦ This is the replacement for DBCC DBREINDEX
◦ Basically drops and recreates the index

Handling fragmentation

Introduced in SQL Server 2008
Stores fixed length data as variable length
◦ E.g. Integer – can use 1,2,3,4 bytes + bits instead of 4 bytes + bit
Available in Enterprise edition

Row compression
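The integer example can be made concrete by computing how many bytes a given value needs. A minimal sketch in Python, assuming a plain signed two's-complement encoding; the real storage format has further special cases that are not modelled here, and `min_signed_bytes` is a made-up name for illustration:

```python
def min_signed_bytes(value, max_bytes=4):
    """Smallest byte count (1..max_bytes) whose signed range contains value."""
    for n in range(1, max_bytes + 1):
        if -(1 << (8 * n - 1)) <= value < (1 << (8 * n - 1)):
            return n
    raise ValueError("value does not fit the declared type")


for v in (5, 300, 70000, 2_000_000_000):
    print(v, min_signed_bytes(v))
# 5 1
# 300 2
# 70000 3
# 2000000000 4
```

A column declared as a 4-byte integer that mostly holds small values therefore takes 1 or 2 bytes per row instead of 4 under this model.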

CREATE TABLE RowCompressedTable(…) WITH (DATA_COMPRESSION = Row);

CD Array: 0 = null, 1 – 9 number of bytes, 10 – long

Self contained

Compressed row
[Diagram: compressed row layout with the fields Stat A (1), Column Count (1/2), CD Array (4b/col), Null bitmap (Ceiling(ColCnt/8)), Short data, Var-Len Column Count (2), Var. Offsets, Var-Len Data]

WITH (data_compression = page)
Applied in order: row compression, prefix compression, dictionary compression
When a table is created, there is no compression
Prefix and dictionary compression kick in when a page fills up, i.e. when a page split would otherwise occur
When a table with data is converted, it is rebuilt
sp_estimate_data_compression_savings

Page compression

Prefix compression

Before:
Page header
aaabb  | aaaab  | abcd
aaabcc | bbbb   | abcd
aaaccc | aaaacc | bbbb

After:
Page header
Anchor: aaabcc | aaaacc | abcd
4b   | 4b    | []
[]   | 0bbbb | []
3ccc | []    | 0bbbb
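The prefix notation in this example can be reproduced in a few lines. A minimal sketch in Python, reading the slide as a page of three rows and three columns with one anchor value per column (an interpretation of the diagram); the anchors are taken from the slide rather than computed, and the string notation mirrors the slide, not SQL Server's byte format:

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(value, anchor):
    """'[]' if the value equals the anchor, else '<shared prefix length><rest>'."""
    if value == anchor:
        return "[]"
    n = common_prefix_len(value, anchor)
    return f"{n}{value[n:]}"


rows = [
    ["aaabb", "aaaab", "abcd"],
    ["aaabcc", "bbbb", "abcd"],
    ["aaaccc", "aaaacc", "bbbb"],
]
anchors = ["aaabcc", "aaaacc", "abcd"]   # one anchor per column, from the slide

encoded = [[prefix_encode(v, a) for v, a in zip(row, anchors)] for row in rows]
for row in encoded:
    print(row)
# ['4b', '4b', '[]']
# ['[]', '0bbbb', '[]']
# ['3ccc', '[]', '0bbbb']
```

For instance, aaabb shares its first 4 characters with the anchor aaabcc, so only "4" and the remaining "b" are stored.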

Dictionary compression

Before:
Page header
Anchor: aaabcc | aaaacc | abcd
4b   | 4b    | []
[]   | 0bbbb | []
3ccc | []    | 0bbbb

After:
Page header
Anchor: aaabcc | aaaacc | abcd
Dictionary: 0 = 4b, 1 = 0bbbb
0    | 0  | []
[]   | 1  | []
3ccc | [] | 1
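The dictionary step of this example can be sketched the same way. A minimal sketch in Python, assuming that every prefix-compressed value occurring more than once on the page goes into a page-level dictionary and is replaced by its index; the input is the slide's prefix-compressed page read as three rows of three columns (an interpretation of the diagram), and `dictionary_encode` is a made-up name for illustration:

```python
from collections import Counter


def dictionary_encode(rows):
    """Replace values that repeat on the page by their dictionary index."""
    counts = Counter(v for row in rows for v in row if v != "[]")
    dictionary = []
    for row in rows:                       # first-seen order, row by row
        for v in row:
            if v != "[]" and counts[v] > 1 and v not in dictionary:
                dictionary.append(v)
    encoded = [
        [str(dictionary.index(v)) if v in dictionary else v for v in row]
        for row in rows
    ]
    return dictionary, encoded


prefix_compressed = [
    ["4b", "4b", "[]"],
    ["[]", "0bbbb", "[]"],
    ["3ccc", "[]", "0bbbb"],
]
dictionary, encoded = dictionary_encode(prefix_compressed)
print(dictionary)   # ['4b', '0bbbb']
for row in encoded:
    print(row)
# ['0', '0', '[]']
# ['[]', '1', '[]']
# ['3ccc', '[]', '1']
```

Unlike the prefix anchors, the dictionary is shared by all columns of the page, which is why the two occurrences of 0bbbb in different columns both map to entry 1.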

B-tree structure
Many pages need to be looked up
Smaller BLOBs can be inlined
◦ sp_tableoption <tablename>, 'text in row', <length>

BLOB Structure

[Diagram: the text pointer in the data row leads to a root entry, which points to intermediate nodes, which in turn point to the data fragments]

When BLOBs are not enough:
◦ Large items (over 1 MB)
◦ Very fast read is needed
◦ 2GB++
Can use T-SQL to access
File stream access via Win32 API

Filestreams

Static data storage
◦ Table and index rows
◦ The way these are linked together
What happens during schema and data modifications
Lessons to take away
◦ Minimize the number of pages you need to read or write
◦ Rebuild your tables and use fill factor, and rebuild indexes during off peak hours!
◦ Use the specialized data types and storage options

Summary

Thanks to SQL Bits & Sponsors
Blog: http://www.simple-talk.com/community/blogs/andras/default.aspx
Email: Andras.Belokosztolszki (at) red-gate.com

Questions


