Upload
vinnie
View
51
Download
0
Embed Size (px)
DESCRIPTION
Andras Belokosztolszki Red Gate Software. SQL Server Storage Engine. [email protected]. Software architect at Red Gate Software Responsible for SQL tools: SQL Compare, SQL Data Compare, SQL Packager SQL Log Rescue SQL Refactor … many others - PowerPoint PPT Presentation
Citation preview
Andras BelokosztolszkiRed Gate Software
SQL Server Storage Engine
Software architect at Red Gate Software Responsible for SQL tools:
◦ SQL Compare, SQL Data Compare, SQL Packager◦ SQL Log Rescue◦ SQL Refactor◦ … many others
Events (NxtGenUG, VBUG, SQL Bits, PASS, many other user groups)
SQL Server Central Blog: http://www.simple-talk.com/community/blogs/andras/default.aspx Articles: http://www.simple-talk.com/author/andr%c3%a1s-belokosztolszki/
Physical storage◦ Pages, rows, data types, index structure
Data and schema modifications◦ What happens when you change the schema◦ What happens when a row is inserted, delted, etc
SQL Server 2008 features◦ Compression and file streams
Agenda
Primary database file (*.mdf) Secondary database files (*.ndf)
◦ Optional, can be more than one Log files (not covered)
Database files
Database
Primary Secondary
Secondary LogLog
Data files are dividied up into 8KB pages
All information is stored in pages (data, schema, database information, space allocation(GAM, SGAM, IAM), dlls)
Identified by fileId:PageId (2+4 bytes)
8 pages = 1 extent Most important for us is the
data page
Pages
1:0 1:1 1:31:2
1:4 1:5 1:71:6
1:8 1:9 1:B1:A
1:C 1:D 1:F1:E
Page header (96 bytes) Data rows Offset array
DBCC PAGE ◦ (db,file,page,options)
2 – raw, 3 – row details◦ Trace flag 3604
Structure of a data pagePage header
Demo
Fixed length data will always use its allocated space (even when it is null)
Must fit a page (max 8060 bytes)◦ Some items can overflow: Overflow space
Data row formatStat
A(1)
StatB
(1)
Null offset
(2)
Null bitmap
Ceiling(ColCnt/8)
Fixed Length Data
Var. Offsets
Var-Len
Column
Count(2)
Column
Count(2)
Var-Len Data
Demo
See sys.types◦ Fixed length (some can be adjusted (time,
decimal, char(), …) Always consumes this space
◦ Variable length (varchar, varbinary, …)◦ Bit (packed)◦ SqlVariant◦ Binary large objects (ntext, varchar(max), …)
After a certain size stored on other pages
Data types
From tables to pages
Heap/Index Partition Allocation Unit
N1 1 3
sys.indexes sys.allocation_units
In row data LOB Row overflow
sys.partitions sp JOIN sys.allocation_units au ON sp.partition_id = au.container_id
sys.partitions
Clustered index
Level 0Leaf levelRow Data
Interior levels
Root level
The full row record is at the leaf level◦ Consequently there can be only one clustered
index In the intermediary and root levels a
clustered key is stored, for the first entries of the next level pages
If the key row length is e.g. 15 bytes, an intermediary page can store up to (8096/15 =) 539 rows (reference 539 pages)
Exact space usage in sys.allocation_units Pages are double linked
Clustered index
Nonclustered index
Row Data
Leaf level
Root level/Interior levels
See sys.allocation_units Max 900 bytes per entry! Index entry contains the key columns, and
◦ Index key columns◦ Record locator (nonclusered)
Row ID or clustering key (not stored redundantly)◦ Down pointer (for non leaf pages)
Index space usage
StatA
(1)
Null bitmap
Ceiling(ColCnt/8)
Fixed Length Data
Var. Offsets
Var-Len
Column
Count(2)
Column
Count(2)
Var-Len Data
Motivation:◦ When using a clustered index on heap, an item is looked
up, then one more page read to retrieve extra data◦ When using a clustered index on a B-tree, the clustered
index structure is also traversed You can include extra columns in a non-clustered
index These will not be used to look up rows in the table Increases the coverage of an index Increases the size of an index record -> the total
size Extra maintenance
Included columns
Everything is stored on pages Rows have fixed and variable length
portions◦ Differences between certain data types and their
limitations Index structures
◦ Size estimates for indexes, page estimates for queries
The fewer pages we load into memory, the better?
Summary of static data storage
Schema changes
•Adding a column•Changing a column•Dropping a column
Data changes
•Inserting a row•Deleting a row•Altering a row
Modifications to the stored information
◦ What can happen: No rows are modified, only meta information All rows are examined
E.g. changing nullability Int to smallint (wasted space!)
All rows are rebuilt
◦ We may end up wasting a lot of valueable space! How can we reclaim the space?
Schema modification
Demo
Insert: added where there is space Delete: removed or marked as ghost Update: Since indexes refer to file:page:slot
if a row no longer fits on a page, it cannot easily be moved -> it is moved, but a reference to it is left (forwarded record)
Modifications on heaps
Insert: Since the rows are ordered, if there is not enough space on a table, the table is split into two (can happen many times)
Update: ◦ like inserts, if the new row is too big to fit◦ Changes to clustering columns = delete+insert
Delete: the row is marked as ghost or is deleted
Modifications on clustered tables
Phil Factor and Pad IndexPad Index• Intermediary pages
only• Specified as
percentage
Fill Factor• Leaf pages only• Specified as
percentage
Only when index is created or rebuilt. The free space is NOT maintained. (see later index reorganization and rebuilding)
sys.dm_db_index_physical_stats() Logical fragmentation: next leaf page for index page is
not the next page that is allocated to the index Extent fragmentation: extents are not contiguous Page fill
Fragmentation
Drop and create the clustered index◦ Index is offline
ALTER INDEX REORGANIZE◦ This is the replacement for DBCC INDEXDEFRAG◦ Reorganizes index pages (and compacts pages
and LOBs) (NO new pages) ALTER INDEX REBUILD
◦ This is the replacement for DBCC DBREINDEX◦ Basically drops and recreates the index
Handling fragmentation
Introduced in SQL Server 2008 Stores fixed length data as variable length
◦ E.g. Integer – can use 1,2,3,4 bytes + bits instead of 4 bytes + bit
Available in Enterprise edition
Row compression
CREATE TABLE RowCompressedTable(…) WITH (DATA_COMPRESSION = Row);
CD Array: 0 = null, 1 – 9 number of bytes, 10 – long
Self contained
Compressed rowStat
A(1)
CD Array(4b/col)
Column
Count (1/2)
Null bitmap
Ceiling(ColCnt/8)
Var. Offsets
Var-Len
Column
Count(2)
Short dataVar-Len Data
WITH (data_compression = row)
Row compression Prefix compression Dictionary compression
When table created, there is no compression Row compression kicks in when otherwise a
page split would occur When table with data converted it is rebuilt
sp_estimate_data_compression_savings
Page compression
Prefix compression
Page header
aaabb aaaab abcdaaabcc bbbbaaaccc
abcdaaaacc bbbb
Page header
4b 4b [][] 0bbbb
3ccc[]
[] 0bbbb
aaabcc aaaacc abcd
Dictionary compression
Page header
4b 4b [][] 0bbbb
3ccc[]
[] 0bbbb
aaabcc aaaacc abcd
Page header
0 0 [][] 1
3ccc[]
[] 1
aaabcc aaaacc abcd4b 0bbbb
B-tree structure Many pages need to be looked up Smaller BLOBs can be inlined
sp_tableoption <tablename>, ‘text in row’, <length>
BLOB Structure
Data row Text Pointer
Root entry
Intermediate node Intermediate node
Data fragment
Data fragment Data fragment Data
fragment
When BLOBs are not enough:◦ Large items (over 1Mb)◦ Very fast read is needed◦ 2GB++
Can use T-SQL to access File stream access vie Win32 API
Filestreams
Static data storage ◦ Table and index rows◦ The way these are linked together
What happens during schema and data modifications
Lessons to take away◦ Minimize the number of pages you need to read or
write◦ Rebuild your tables and use fill factor, and rebuild
indexes durng off peak hours!◦ Use the specialized data types and storage options
Summary
Thanks to SQL Bits & Sponsors Blog: http://www.simple-talk.com/community/blogs/andras/default.aspx Email: Andras.Belokosztolszki (at) red-
gate.com
Questions