Data Warehousing Enhancements
Dr Keith Burns, Data Architect, DPE, Microsoft Ltd.
Microsoft SQL Server 2008
Transparent Data Encryption, External Key Management, Data Auditing, Pluggable CPU, Transparent Failover for Database Mirroring, Declarative Management Framework, Server Group Management, Streamlined Installation, Enterprise System Management, Performance Data Collection, System Analysis, Data Compression, Query Optimization Modes, Resource Governor, Entity Data Model, LINQ, Visual Entity Designer, Entity Aware Adapters, SQL Server Change Tracking, Synchronized Programming Model, Visual Studio Support, SQL Server Conflict Detection, FILESTREAM data type, Integrated Full Text Search, Sparse Columns, Large User Defined Types, Date/Time Data Type, LOCATION data type, SPATIAL data type, Virtual Earth Integration, Partitioned Table Parallelism, Query Optimizations, Persistent Lookups, Change Data Capture, Backup Compression, MERGE SQL Statement, Data Profiling, Star Join, Enterprise Reporting Engine, Internet Report Deployment, Block Computations, Scale out Analysis, BI Platform Management, Export to Word and Excel, Author reports in Word and Excel, Report Builder Enhancements, TABLIX, Rich Formatted Data, Personalized Perspectives … and many more
MERGE
• New DML statement that combines multiple DML operations
− Building block for more efficient ETL
− SQL-2006 compliant implementation
• Source can be any table or query
• Target can be any table or updatable view
• If a source row matches a target row, UPDATE
• If a source row has no match in the target, INSERT
• If a target row has no match in the source (WHEN NOT MATCHED BY SOURCE), DELETE
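The three MERGE actions above can be modelled in a few lines. This is a minimal sketch, assuming source and target are Python dicts keyed on the join column; `merge_sync` is a hypothetical name, and the code only mimics the semantics of the WHEN clauses, not how SQL Server executes the statement. Dict keys make each source row unique, which matters because a real MERGE raises an error if one target row matches several source rows.

```python
from typing import Dict, List, Tuple


def merge_sync(target: Dict[str, int], source: Dict[str, int]) -> List[Tuple[str, str]]:
    """Mimic MERGE ... WHEN MATCHED / WHEN NOT MATCHED / WHEN NOT MATCHED BY SOURCE.

    Mutates `target` in place and returns (action, key) pairs, similar to what
    OUTPUT $action would report.
    """
    actions: List[Tuple[str, str]] = []
    for key, value in source.items():
        if key in target:
            target[key] = value                 # WHEN MATCHED THEN UPDATE
            actions.append(("UPDATE", key))
        else:
            target[key] = value                 # WHEN NOT MATCHED THEN INSERT
            actions.append(("INSERT", key))
    for key in [k for k in target if k not in source]:
        del target[key]                         # WHEN NOT MATCHED BY SOURCE THEN DELETE
        actions.append(("DELETE", key))
    return actions


target = {"A": 1, "B": 2}
source = {"B": 20, "C": 3}
print(merge_sync(target, source))  # [('UPDATE', 'B'), ('INSERT', 'C'), ('DELETE', 'A')]
print(target)                      # {'B': 20, 'C': 3}
```

With all three clauses present, one statement makes the target an exact copy of the source. That is the "building block for more efficient ETL" point: a single pass replaces separate UPDATE, INSERT and DELETE statements.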
MERGE
MERGE Stock S
USING Trades T
    ON S.Stock = T.Stock
WHEN MATCHED AND (S.Qty + T.Delta = 0) THEN
    DELETE                     -- delete stock if Qty reaches 0
WHEN MATCHED THEN              -- the DELETE clause above takes precedence over this UPDATE
    UPDATE SET Qty += T.Delta
WHEN NOT MATCHED THEN
    INSERT VALUES (T.Stock, T.Delta)
OUTPUT $action, T.Stock, inserted.Qty;
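The Stock/Trades statement can be traced row by row. The following is a minimal Python model, assuming at most one trade per stock (dict keys). `apply_trades` is a hypothetical name. The model shows why the conditional DELETE clause must come first, and what the OUTPUT clause reports for each action.

```python
from typing import Dict, List, Optional, Tuple


def apply_trades(stock: Dict[str, int], trades: Dict[str, int]) -> List[Tuple[str, str, Optional[int]]]:
    """Model of the Stock/Trades MERGE; returns rows like OUTPUT $action, T.Stock, inserted.Qty."""
    output: List[Tuple[str, str, Optional[int]]] = []
    for name, delta in trades.items():
        if name in stock:
            if stock[name] + delta == 0:
                # First WHEN MATCHED clause: its extra condition is tested first,
                # so DELETE wins over UPDATE when the quantity reaches zero.
                del stock[name]
                output.append(("DELETE", name, None))  # inserted.* is NULL for a deleted row
            else:
                stock[name] += delta                   # second WHEN MATCHED: UPDATE SET Qty += Delta
                output.append(("UPDATE", name, stock[name]))
        else:
            stock[name] = delta                        # WHEN NOT MATCHED: INSERT VALUES (Stock, Delta)
            output.append(("INSERT", name, delta))
    return output


stock = {"MSFT": 10, "IBM": 5, "SAP": 1}
trades = {"MSFT": -10, "IBM": 3, "ORCL": 7}
print(apply_trades(stock, trades))  # [('DELETE', 'MSFT', None), ('UPDATE', 'IBM', 8), ('INSERT', 'ORCL', 7)]
print(stock)                        # {'IBM': 8, 'SAP': 1, 'ORCL': 7}
```

SAP is left alone because this statement has no WHEN NOT MATCHED BY SOURCE clause.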
INSERT over DML
• Ability to have an INSERT statement consume the results of DML
− Enhancement over the OUTPUT INTO <table> clause
• DML OUTPUT can be filtered with a WHERE clause
− Data-accessing predicates not allowed (sub-queries, data-accessing UDFs and full-text)
• Why?
− History tracking of slowly changing dimensions
− Dumping the DML data stream to a secondary table for post-processing
INSERT over DML
INSERT INTO Books (ISBN, Price, Shelf, EndValidDate)
SELECT ISBN, Price, Shelf, GETDATE()
FROM (MERGE Books T
      USING WeeklyChanges AS S ON T.ISBN = S.ISBN AND T.EndValidDate IS NULL
      WHEN MATCHED AND (T.Price <> S.Price OR T.Shelf <> S.Shelf) THEN
          UPDATE SET Price = S.Price, Shelf = S.Shelf
      WHEN NOT MATCHED THEN
          INSERT VALUES (S.ISBN, S.Price, S.Shelf, NULL)
      OUTPUT $action, S.ISBN, Deleted.Price, Deleted.Shelf
) AS Changes (Action, ISBN, Price, Shelf)
WHERE Action = 'UPDATE';
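The Books query keeps history by inserting the pre-update values captured through `Deleted.*` and keeping only the UPDATE rows. The following is a minimal Python model of that flow, assuming an in-memory list of rows. `BookRow` and `apply_weekly_changes` are hypothetical names, and `None` stands in for a NULL EndValidDate.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class BookRow:
    isbn: str
    price: float
    shelf: int
    end_valid_date: Optional[date]  # None marks the current version of a book


def apply_weekly_changes(books: List[BookRow], changes: List[Tuple[str, float, int]], today: date) -> None:
    """Model of INSERT over MERGE: update current rows in place and keep their old values as history."""
    current = {b.isbn: b for b in books if b.end_valid_date is None}
    history: List[BookRow] = []
    for isbn, price, shelf in changes:
        row = current.get(isbn)
        if row is None:
            # WHEN NOT MATCHED THEN INSERT a new current version
            new_row = BookRow(isbn, price, shelf, None)
            books.append(new_row)
            current[isbn] = new_row
        elif row.price != price or row.shelf != shelf:
            # OUTPUT Deleted.* captures the pre-update values; the outer
            # INSERT ... WHERE Action = 'UPDATE' stores them with an end date.
            history.append(BookRow(row.isbn, row.price, row.shelf, today))
            row.price, row.shelf = price, shelf        # WHEN MATCHED AND (...) THEN UPDATE
    books.extend(history)                              # outer INSERT runs after the MERGE


books = [BookRow("111", 10.0, 1, None)]
apply_weekly_changes(books, [("111", 12.0, 1), ("222", 5.0, 2)], date(2008, 6, 2))
for b in books:
    print(b)
```

Rows for newly inserted books are filtered out by `WHERE Action = 'UPDATE'`. For this reason only changed books gain a closed-off history row.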
MERGE statement demo
Logging Enhancements
• Minimal logging = log only what is strictly necessary for rollback
− Normally individual rows are logged
− Page allocations are sufficient to UNDO insertions
• Recovery model must be simple or bulk-logged
• Previous releases
− CREATE INDEX
− SELECT INTO
− BULK INSERT/BCP with TABLOCK
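Logging Enhancements
• SQL Server 2008
− INSERT ... SELECT into a table supports minimal logging (e.g. into a heap with a TABLOCK hint)
− 3X-5X performance boost over fully logged INSERT
[Chart: run time of Index Insert and Heap Insert, SQL Server 2008 compared with the previous SQL Server release]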
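The "page allocations are sufficient to UNDO insertions" argument can be shown with a toy model. The following sketch assumes a hypothetical page holding four rows and models log records as tuples. It does not reflect SQL Server's real log format. It counts records rather than bytes and does not attempt to reproduce the measured 3X-5X figure.

```python
from typing import List, Tuple

ROWS_PER_PAGE = 4  # hypothetical, tiny page capacity to keep the example small


def bulk_insert(pages: List[List[int]], rows: List[int], minimal: bool) -> List[Tuple]:
    """Append rows on newly allocated pages and return the log records written."""
    log: List[Tuple] = []
    for start in range(0, len(rows), ROWS_PER_PAGE):
        page_id = len(pages)
        pages.append([])
        log.append(("ALLOC_PAGE", page_id))               # always logged
        for row in rows[start:start + ROWS_PER_PAGE]:
            pages[page_id].append(row)
            if not minimal:
                log.append(("INSERT_ROW", page_id, row))  # fully logged: one record per row
    return log


def rollback(pages: List[List[int]], log: List[Tuple]) -> None:
    """Undo in reverse order. Deallocating the new pages alone removes every inserted row."""
    for record in reversed(log):
        if record[0] == "INSERT_ROW":
            pages[record[1]].remove(record[2])
        elif record[0] == "ALLOC_PAGE":
            pages.pop(record[1])  # new pages were appended at the end, so this pops the last one


pages = [[99]]                      # one pre-existing page
log = bulk_insert(pages, list(range(10)), minimal=True)
print(len(log))                     # 3 page allocations instead of 13 records
rollback(pages, log)
print(pages)                        # [[99]]
```

Both logging modes roll back to the same state. The per-row records add log volume without being needed for undo, and that surplus is what minimal logging removes.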
Logging demo
Change Data Capture
• Mechanism to easily track changes on a table
− Changes captured from the log asynchronously
− Information on what changed at the source
• Table-Valued Functions (TVF) to query change data
− Easily consumable from Integration Services
[Diagram: SourceTable → Transaction Log → Capture Process → Change Table → CDC Functions]
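The CDC flow in the diagram (source table, log, capture process, change table, query functions) can be sketched in Python. The operation codes 1 to 4 follow the `__$operation` values used by SQL Server change tables. `CaptureProcess`, `get_changes` and the log-record tuple shape are hypothetical stand-ins and are not the real CDC jobs or the `cdc.fn_cdc_get_*` functions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# __$operation codes used by SQL Server CDC change tables
DELETE, INSERT, UPDATE_OLD, UPDATE_NEW = 1, 2, 3, 4

LogRecord = Tuple[int, str, Dict[str, Any], Dict[str, Any]]   # (lsn, op, before, after)
ChangeRow = Tuple[int, int, Dict[str, Any]]                    # (lsn, operation, row image)


@dataclass
class CaptureProcess:
    """Hypothetical capture job: reads the log after the fact and fills a change table."""
    change_table: List[ChangeRow] = field(default_factory=list)
    last_lsn: int = 0

    def scan(self, log: List[LogRecord]) -> None:
        for lsn, op, before, after in log:
            if lsn <= self.last_lsn:
                continue                          # already harvested on an earlier scan
            if op == "insert":
                self.change_table.append((lsn, INSERT, after))
            elif op == "delete":
                self.change_table.append((lsn, DELETE, before))
            elif op == "update":
                self.change_table.append((lsn, UPDATE_OLD, before))
                self.change_table.append((lsn, UPDATE_NEW, after))
            self.last_lsn = lsn


def get_changes(change_table: List[ChangeRow], from_lsn: int, to_lsn: int) -> List[ChangeRow]:
    """Stand-in for a CDC table-valued function: all changes in an LSN range."""
    return [row for row in change_table if from_lsn <= row[0] <= to_lsn]


log: List[LogRecord] = [
    (1, "insert", {}, {"id": 1, "v": "a"}),
    (2, "update", {"id": 1, "v": "a"}, {"id": 1, "v": "b"}),
]
capture = CaptureProcess()
capture.scan(log)                                  # runs independently of the writers
log.append((3, "delete", {"id": 1, "v": "b"}, {}))
capture.scan(log)                                  # picks up only the new record
print([op for _, op, _ in capture.change_table])   # [2, 3, 4, 1]
```

Because capture reads the log rather than firing inside each transaction, the writers do no extra work. An ETL package then asks for a bounded LSN range, much as `get_changes` does here.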
Data Compression
• Problem:
− Database sizes are growing
− Storage costs are becoming the dominant hardware cost
• Main goal: shrink DW fact tables
• Secondary goal: improve query performance
• Enabled per table or index
• Trade-off: higher CPU usage
Data Compression
DateId    CarrierTracking  OfferID  PriceDisc
20070601  4911-403C-98     10         0.00
20070601  4911-403C-99     10         0.00
20070602  6431             10         0.00
20070602  6431-4D57-83     10         0.00
20070602  6431-4D57-84     10         0.00
20070602  6431-4D57-85     10       100.00
20070603  4E0A-4F89-AE     10         0.00
Data Compression
• SQL Server 2005 SP2
− VarDecimal storage
• Enables decimal values to be stored as variable-length data
Data Compression: Fixed-length Column
• SQL Server 2008 extends the logic to all fixed-length data types
− e.g. int, bigint, etc.
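"Extending the vardecimal logic to fixed-length types" means a value is stored in only as many bytes as it needs. The following is a minimal Python sketch of that arithmetic. It assumes two's-complement widths and that zero needs no data bytes, which mirrors the documented treatment of NULL and 0 under row compression. The sketch ignores the small per-column metadata the real format keeps. `min_bytes` and `column_bytes` are hypothetical helpers.

```python
from typing import List, Tuple


def min_bytes(value: int) -> int:
    """Smallest two's-complement byte width that holds `value`; 0 needs no data bytes."""
    if value == 0:
        return 0
    n = 1
    while not (-(1 << (8 * n - 1)) <= value < (1 << (8 * n - 1))):
        n += 1
    return n


def column_bytes(values: List[int], fixed_size: int) -> Tuple[int, int]:
    """(bytes as fixed-length, bytes as variable-length) for one column's values."""
    return fixed_size * len(values), sum(min_bytes(v) for v in values)


offer_ids = [10] * 7                # the OfferID column from the sample rows, stored as int (4 bytes)
print(column_bytes(offer_ids, 4))   # (28, 7)
print(min_bytes(20070601))          # 4: large values still need the full width
```

The gain depends on the data. Small integers such as OfferID shrink fourfold, while an 8-digit DateId stored as int gains nothing. This is one reason prefix and dictionary compression are layered on top.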
Data Compression: Prefix Compression
• A prefix list is stored in the page for common prefixes
• Individual values are replaced by
− Token for the prefix
− Suffix for the value
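Page after prefix compression (each value keeps only its suffix):
DateId  CarrierTracking  OfferID  PriceDisc
1       8                10         0.00
1       9                10         0.00
2                        10         0.00
2       3                10         0.00
2       4                10         0.00
2       5                10       100.00
3       4E0A-4F89-AE     10         0.00
[Figure: the page's prefix list and the per-value prefix references are not recoverable from the extracted text]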
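The prefix idea can be sketched per column: pick one anchor value, then store each value as "number of leading characters shared with the anchor" plus its remaining suffix. This is a minimal Python illustration on strings. The anchor-selection heuristic (the most prefix sharing with the rest of the column) is an assumption made for the sketch, and SQL Server's own choice of prefix and its byte-level format may differ.

```python
from os.path import commonprefix
from typing import List, Tuple


def choose_anchor(values: List[str]) -> str:
    """Heuristic: the value sharing the most prefix characters with the rest of the column."""
    return max(values, key=lambda a: sum(len(commonprefix([a, v])) for v in values))


def prefix_compress(values: List[str]) -> Tuple[str, List[Tuple[int, str]]]:
    anchor = choose_anchor(values)
    encoded = []
    for v in values:
        k = len(commonprefix([anchor, v]))
        encoded.append((k, v[k:]))          # (characters shared with the anchor, remaining suffix)
    return anchor, encoded


def prefix_decompress(anchor: str, encoded: List[Tuple[int, str]]) -> List[str]:
    return [anchor[:k] + suffix for k, suffix in encoded]


date_ids = ["20070601", "20070601", "20070602", "20070602", "20070602", "20070602", "20070603"]
anchor, encoded = prefix_compress(date_ids)
print(anchor)        # 20070602
print(encoded[:3])   # [(7, '1'), (7, '1'), (8, '')]
```

This matches the shape of the DateId column in the compressed page, where each date shrinks to a single trailing digit. The shared "2007060" is stored once per page instead of once per row.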
Data Compression: Dictionary Compression
• A common value dictionary is stored in the page
• Common values are replaced by tokens
1.5X to 7X compression ratio anticipated for real DW fact data, depending on the data
[Figure: the prefix-compressed page before and after dictionary compression; afterwards, repeated values such as 10 and 0.00 no longer appear in the rows and are held once in the page dictionary]
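A page dictionary can be sketched as follows: collect values that occur more than once anywhere on the page, store each value once, and replace every occurrence with a token. This is a minimal Python illustration, assuming string cells and a simple `("tok", index)` / `("val", literal)` encoding, both of which are hypothetical. SQL Server applies dictionary compression to the already prefix-compressed bytes, whereas this sketch uses the raw sample values for readability.

```python
from collections import Counter
from typing import List, Tuple, Union

Encoded = Tuple[str, Union[int, str]]   # ("tok", dictionary index) or ("val", literal value)


def dictionary_compress(rows: List[List[str]]) -> Tuple[List[str], List[List[Encoded]]]:
    """Build one dictionary for the whole page from values that occur more than once."""
    counts = Counter(v for row in rows for v in row)
    dictionary = [v for v, n in counts.items() if n > 1]
    index = {v: i for i, v in enumerate(dictionary)}
    encoded = [[("tok", index[v]) if v in index else ("val", v) for v in row] for row in rows]
    return dictionary, encoded


def dictionary_decompress(dictionary: List[str], encoded: List[List[Encoded]]) -> List[List[str]]:
    return [[dictionary[x] if kind == "tok" else x for kind, x in row] for row in encoded]


page = [
    ["20070601", "4911-403C-98", "10", "0.00"],
    ["20070601", "4911-403C-99", "10", "0.00"],
    ["20070602", "6431", "10", "0.00"],
    ["20070602", "6431-4D57-83", "10", "0.00"],
    ["20070602", "6431-4D57-84", "10", "0.00"],
    ["20070602", "6431-4D57-85", "10", "100.00"],
    ["20070603", "4E0A-4F89-AE", "10", "0.00"],
]
dictionary, encoded = dictionary_compress(page)
print(dictionary)   # ['20070601', '10', '0.00', '20070602']
```

Unlike prefix compression, the dictionary spans all columns on the page. A value therefore pays off as soon as it repeats, whichever column it appears in.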
Partitioned Table Parallelism
Table: Orders, partitioned on a weekly basis on OrderDate
• Monday morning: run weekly report => great response time, happy users
• Tuesday morning: run weekly report => poor response time, unhappy users
• Why?
Partitioned Table Parallelism
• SQL Server 2005 query
− One partition => multiple threads
− Multiple partitions => single thread per partition
• SQL Server 2008 query
− Multiple partitions => all threads utilised
− Far more predictable query performance
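One plausible reading of the Monday/Tuesday puzzle is that Monday's report reads exactly one weekly partition, while Tuesday's seven-day window spans two partitions. The toy model below encodes the slide's thread rules under that assumption. It also assumes a hypothetical 8-way server and that work splits evenly across the threads actually used. The function names are illustrative.

```python
def threads_used(partitions: int, dop: int, version: int) -> int:
    """Toy model of the thread-allocation rules on the slide."""
    if version == 2005:
        if partitions == 1:
            return dop                   # one partition => multiple threads
        return min(partitions, dop)      # multiple partitions => one thread per partition
    return dop                           # 2008: all threads, however many partitions


def relative_runtime(rows: int, partitions: int, dop: int, version: int) -> float:
    """Assumes the work splits evenly across the threads actually used."""
    return rows / threads_used(partitions, dop, version)


# Hypothetical 8-way server; the same report touching one vs two weekly partitions.
for version in (2005, 2008):
    print(version, relative_runtime(8_000_000, 1, 8, version), relative_runtime(8_000_000, 2, 8, version))
# 2005 1000000.0 4000000.0
# 2008 1000000.0 1000000.0
```

Under these assumptions the same report becomes four times slower on Tuesday in SQL Server 2005, but not in SQL Server 2008. That is the "far more predictable" claim.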
Partition Aligned Indexed Views
• SQL Server 2005:
− SELECT ProductName, COUNT_BIG(*) FROM ProductSales GROUP BY ProductName
− Indexed view is not partition aligned
− Drop the indexed view before switching partitions
• SQL Server 2008
− Indexed views can be partition aligned
− Basically:
− Create the view WITH SCHEMABINDING, as in 2005
− Create the index on the view, but add an ON clause naming the same partition scheme (filegroup) as the table
− Do this for both tables in the SWITCH statement
− http://msdn.microsoft.com/en-us/library/bb964715.aspx
− Gives the performance of an indexed view without having to drop the view when switching partitions
Star Join Query Processing
[Diagram: a Fact Table joined to Dimensions 1 to 4, contrasting the SQL Server 2005 strategies (e.g. table scan) with the additional query plans considered by SQL Server 2008]
[Plan diagram: Fact Table Scan hash-joined to Dimension 1 and Dimension 2]
• SQL Server 2005 can create one bitmap filter
• SQL Server 2008 can create multiple bitmap filters
• SQL Server 2008 can move and reorder the filters
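The following sketch models the bitmap filters described above. Each dimension's hash-join build side produces a small bitmap of qualifying keys, and fact rows are checked against all bitmaps during the scan, before any join work is done. This is a minimal Python model, assuming dict rows and a 64-bit hashed bitmap. `Bitmap`, `star_join` and the "smallest dimension first" ordering are hypothetical stand-ins for the optimizer's cost-based placement.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


class Bitmap:
    """Fixed-size bit array over hashed keys: false positives possible, false negatives not."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.bits = 0

    def add(self, key: int) -> None:
        self.bits |= 1 << (hash(key) % self.size)

    def might_contain(self, key: int) -> bool:
        return ((self.bits >> (hash(key) % self.size)) & 1) == 1


def star_join(fact: List[Row], dims: List[Tuple[str, Dict[int, str]]]) -> Tuple[List[Row], List[Row]]:
    """dims: (fact foreign-key column, filtered dimension keyed by its primary key)."""
    # The build side of each hash join also produces a bitmap of qualifying dimension keys.
    filters = []
    for column, table in dims:
        bitmap = Bitmap()
        for key in table:
            bitmap.add(key)
        filters.append((len(table), column, bitmap))
    # Heuristic ordering: test the smallest (usually most selective) dimension first.
    filters.sort(key=lambda f: f[0])
    # Fact rows are checked against every bitmap during the scan, before any join.
    survivors = [r for r in fact if all(b.might_contain(r[c]) for _, c, b in filters)]
    # The hash joins themselves discard any bitmap false positives.
    joined = [r for r in survivors if all(r[c] in table for c, table in dims)]
    return survivors, joined


fact = [{"d1": i % 10, "d2": i % 7} for i in range(100)]
dims = [("d1", {1: "Q1", 2: "Q2"}), ("d2", {3: "Red"})]
survivors, joined = star_join(fact, dims)
print(len(fact), len(survivors), len(joined))   # 100 2 2
```

With a single bitmap (the SQL Server 2005 case), rows failing the other dimension would still reach a hash join. Having several filters and ordering them lets most fact rows be discarded at the scan.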
Grouping Sets
• Extension to the GROUP BY clause
• Ability to define multiple groupings in the same query
• Produces a single result set that is equivalent to a UNION ALL of differently grouped rows
• SQL 2006 standard compatible
Makes aggregation querying and reporting easier and faster
SELECT a, b, c, d, SUM(sales) FROM [Table] GROUP BY GROUPING SETS ((a, b, c), (c, d), ())
Example (GROUPING SETS)
-- Use UNION ALL on dual SELECT statements
SELECT customerType, NULL AS TerritoryID, MAX(ModifiedDate) FROM Sales.Customer GROUP BY customerType
UNION ALL
SELECT NULL AS customerType, TerritoryID, MAX(ModifiedDate) FROM Sales.Customer GROUP BY TerritoryID
ORDER BY TerritoryID;
-- Use GROUPING SETS on single SELECT statement
SELECT customerType, TerritoryID, MAX(ModifiedDate) FROM Sales.Customer
GROUP BY GROUPING SETS ((customerType), (TerritoryID)) ORDER BY customerType;
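The UNION ALL equivalence can be made concrete. Each grouping set is aggregated separately, columns outside the set come back as NULL, and the pieces are concatenated. This is a minimal Python sketch, assuming SUM over a hypothetical `amount` column instead of the example's MAX(ModifiedDate). `grouping_sets` is an illustrative helper. Like this sketch, plain SQL cannot tell a "grouped away" NULL from a real NULL, which is what the GROUPING() function is for.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def grouping_sets(rows: List[Dict[str, Any]], columns: Sequence[str],
                  sets: Sequence[Sequence[str]], value: str) -> List[Tuple[Any, ...]]:
    """SUM(value) for each grouping set, concatenated like UNION ALL.
    Columns outside the current set come back as None, the way SQL returns NULL."""
    result: List[Tuple[Any, ...]] = []
    for gset in sets:
        totals: Dict[Tuple[Any, ...], Any] = defaultdict(int)
        for r in rows:
            key = tuple(r[c] if c in gset else None for c in columns)
            totals[key] += r[value]
        result.extend(key + (total,) for key, total in totals.items())
    return result


sales = [
    {"customerType": "S", "TerritoryID": 1, "amount": 10},
    {"customerType": "S", "TerritoryID": 2, "amount": 5},
    {"customerType": "I", "TerritoryID": 1, "amount": 1},
]
result = grouping_sets(sales, ["customerType", "TerritoryID"],
                       [("customerType",), ("TerritoryID",), ()], "amount")
for row in result:
    print(row)
```

The empty set `()` yields the grand total row. This is the same role `()` plays in the `GROUPING SETS ((a, b, c), (c, d), ())` example.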
Backup Compression
• Pain points:
− Keeping disk-based backups online is expensive
− Backups take longer, windows are shrinking
• SQL Server 2008
− WITH COMPRESSION clause to BACKUP
− Less storage required to keep backups online
− Backups run significantly faster, as less IO is done
− Restore automatically detects compression and adjusts accordingly
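Two of the points above can be illustrated with `zlib`: fewer bytes are written, and the restore side can decide for itself whether to decompress. This is a minimal Python sketch, assuming a hypothetical one-byte header that marks compressed output. That header is not SQL Server's backup format, and zlib is only a stand-in for whatever algorithm SQL Server uses.

```python
import zlib

# Hypothetical one-byte header; this is NOT SQL Server's backup file format.
PLAIN, COMPRESSED = b"P", b"Z"


def write_backup(data: bytes, compress: bool) -> bytes:
    """Return the bytes that would be written to the backup device."""
    return COMPRESSED + zlib.compress(data) if compress else PLAIN + data


def restore(backup: bytes) -> bytes:
    """Detect compression from the header, so the caller never has to say which it was."""
    header, body = backup[:1], backup[1:]
    if header == COMPRESSED:
        return zlib.decompress(body)
    if header == PLAIN:
        return body
    raise ValueError("unrecognised backup header")


pages = b"20070602 6431-4D57-84 10 0.00\n" * 10_000   # repetitive, like many DW pages
compressed = write_backup(pages, compress=True)
plain = write_backup(pages, compress=False)
print(len(plain), len(compressed))
assert restore(compressed) == restore(plain) == pages
```

The bytes saved are IO not performed, which is why the slide expects backups to run faster as well as smaller. The actual ratio depends on how repetitive the data is, and the synthetic input here is deliberately highly repetitive.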
SQL 2005 Resource Management
• Single resource pool
• Database engine doesn’t differentiate workloads
• Best effort resource sharing
[Diagram: all workloads (Backup, Admin Tasks, Executive Reports, OLTP Activity, Ad-hoc Reports) share one set of SQL Server resources (memory, CPU, threads, …)]
Resource Governor – Workloads
• Ability to differentiate workloads
− e.g. app_name, login
• Per-request limits
− Max memory %
− Max CPU time
− Grant timeout
− Max Requests
• Resource monitoring
[Diagram: Admin Workload (Backup, Admin Tasks), OLTP Workload (OLTP Activity) and Report Workload (Ad-hoc Reports, Executive Reports) share SQL Server resources (memory, CPU, threads, …)]
Resource Governor – Importance
• A workload can have an importance label
− Low
− Medium
− High
• Gives resource allocation preference to workloads based on importance
[Diagram: the Admin, OLTP and Report workloads sharing SQL Server resources, one of them labelled High importance]
Resource Governor – Pools
• Resource pool: a virtual subset of physical database engine resources
• Provides controls to specify
− Min Memory %
− Max Memory %
− Min CPU %
− Max CPU %
− Max DOP (set on the workload group rather than on the pool)
• Resource monitoring
• Up to 20 resource pools
[Diagram: the workloads mapped to an Admin Pool and an Application Pool, with pool limits including Min Memory 10%, Max Memory 20%, Max CPU 20% and Max CPU 90%; one workload is labelled High importance]
Resource Governor
Putting it all together
• Workloads are mapped to Resource Pools (n : 1)
• Online changes of groups/pools
• SQL Server 2005 = default group + default pool
Main Benefit
• Prevent runaway queries
[Diagram: the Admin, OLTP and Report workloads mapped to the Admin Pool and the Application Pool]
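The n : 1 mapping can be sketched in Python. A classifier assigns each session to a workload group, and several groups can share a pool. The group and pool names, and the rules in `classify`, are hypothetical. In SQL Server the classifier is a user-written function that can inspect values such as the login and application name. Sessions that no rule matches fall into the default group, which is how SQL Server 2005 behaviour ("default group + default pool") is preserved.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Session:
    login: str
    app_name: str


# Hypothetical configuration: several workload groups can share one pool (n : 1).
GROUP_TO_POOL: Dict[str, str] = {
    "AdminGroup": "AdminPool",
    "OltpGroup": "ApplicationPool",
    "ReportGroup": "ApplicationPool",
    "default": "default",
}


def classify(session: Session) -> str:
    """Stand-in for the classifier function: returns a workload group name."""
    if session.login == "backup_operator":
        return "AdminGroup"
    if session.app_name.startswith("ReportServer"):
        return "ReportGroup"
    if session.app_name == "OrderEntry":
        return "OltpGroup"
    return "default"   # unclassified sessions land in the default group


def pool_for(session: Session) -> str:
    return GROUP_TO_POOL[classify(session)]


for s in (Session("backup_operator", "SSMS"), Session("alice", "ReportServer"), Session("bob", "Excel")):
    print(s.login, classify(s), pool_for(s))
```

Classification happens once, when a session connects. Per-request limits are then applied from the group, and the resource caps come from whichever pool the group maps to.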
Demo: Martin Bell, Carillon Software Systems Limited
New Date and Time data types
• Date
− Date only
− From 0001-01-01 to 9999-12-31
− 3 bytes
• Time
− Time only
− Optional precision up to 100 nanoseconds
− 3 to 5 bytes (default 5 bytes, i.e. full resolution)
• DateTimeOffset
− Time-zone-aware datetime (stored with its offset from UTC)
− Optional precision up to 100 nanoseconds
− 8 to 10 bytes (default 10 bytes, i.e. full resolution)
• DateTime2
− Large date range
− Optional precision up to 100 nanoseconds
− 6 to 8 bytes (default 8 bytes, i.e. full resolution)
Plus assorted new date and time functions, e.g. SYSDATETIMEOFFSET()
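The byte ranges above depend on the fractional-second precision chosen for each column (0 to 7 digits, where 7 means 100 ns). The following sketch encodes that relationship. It assumes the documented steps of one extra byte at precision 3 and another at precision 5 for time and datetime2, and applies the same steps to datetimeoffset to match the 8 to 10 byte range on the slide. `storage_bytes` is a hypothetical helper.

```python
from datetime import date

# Python's date range matches SQL Server's date type: 0001-01-01 through 9999-12-31.
MIN_DATE, MAX_DATE = date(1, 1, 1), date(9999, 12, 31)

BASE_BYTES = {"time": 3, "datetime2": 6, "datetimeoffset": 8}


def storage_bytes(type_name: str, precision: int = 7) -> int:
    """Bytes for a value of the given type at fractional-second precision 0..7 (7 = 100 ns)."""
    if type_name == "date":
        return 3
    if not 0 <= precision <= 7:
        raise ValueError("precision must be between 0 and 7")
    base = BASE_BYTES[type_name]
    if precision <= 2:
        return base
    if precision <= 4:
        return base + 1
    return base + 2


for t in ("date", "time", "datetime2", "datetimeoffset"):
    print(t, storage_bytes(t, 0), storage_bytes(t))   # minimum size, default (full-resolution) size
```

Declaring a lower precision, for example datetime2(0) for values that never carry fractions, saves two bytes per value. Across a large fact table, that adds up much like the compression savings earlier.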
Sparse Column Storage
• The problem
− Need to store sparse data
− Possibly hundreds of columns
− Typically only a few % are populated
Desired schema: [a wide table with columns PK, Q1 to Q10 and nine rows, each populating only two or three of the ten Q columns]
Typical Solution (one row per populated value):
ID  Column  Value
1   Q1      C
1   Q2      1
1   Q10     9
2   Q1      B
2   Q3      4
2   Q5      Low
3   Q1      C
3   Q7      6
3   Q8      5
Sparse Columns
• “Sparse” as a storage attribute on a column
− 0 bytes for a NULL, 4 byte overhead for non-NULL
− No change in Query/DML behavior
− Same limitations as normal tables, e.g. 1024 columns
• Wide Table: defining a “Sparse Column Set”
− An untyped XML column, with a published format
− Logical grouping for all sparse columns in a table
− SELECT * returns all non-sparse columns plus the sparse column set (XML)
− Allows generic retrieval/update of all sparse columns as a set
− 30,000 sparse columns allowed in a table (2Gb), 1000 indexes
-- SPARSE as a storage attribute in CREATE/ALTER TABLE statements
CREATE TABLE Products (Id int, Type nvarchar(16)…, Resolution int SPARSE NULL, ZoomLength int SPARSE NULL);
-- Create a sparse column set
CREATE TABLE Products (Id int, Type nvarchar(16)…,
    Resolution int SPARSE NULL, ZoomLength int SPARSE NULL,
    Properties XML COLUMN_SET FOR ALL_SPARSE_COLUMNS);
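The storage trade-off and the column set's XML view can be sketched together. This is a minimal Python illustration, assuming 4-byte int columns and the slide's 4-byte overhead per non-NULL sparse value. The element-per-column XML follows the general shape of a column set value. The helper names and the Q1 to Q10 row (borrowed from the sparse storage example) are illustrative.

```python
from typing import Dict, Optional
from xml.sax.saxutils import escape

SPARSE_OVERHEAD = 4   # extra bytes per non-NULL sparse value, as stated on the slide


def sparse_bytes(row: Dict[str, Optional[int]], value_size: int = 4) -> int:
    """Storage for sparse int columns: NULL costs 0 bytes, non-NULL costs value + overhead."""
    return sum(value_size + SPARSE_OVERHEAD for v in row.values() if v is not None)


def dense_bytes(row: Dict[str, Optional[int]], value_size: int = 4) -> int:
    """Same columns stored as ordinary fixed-length ints: every column takes its full width."""
    return value_size * len(row)


def column_set_xml(row: Dict[str, Optional[int]]) -> str:
    """Render non-NULL sparse columns as one XML element per column, like a column set value."""
    return "".join(f"<{name}>{escape(str(v))}</{name}>" for name, v in row.items() if v is not None)


row: Dict[str, Optional[int]] = {f"Q{i}": None for i in range(1, 11)}
row["Q1"] = 12
print(sparse_bytes(row), dense_bytes(row))   # 8 40
print(column_set_xml(row))                   # <Q1>12</Q1>
```

Because each non-NULL value costs twice its width here, SPARSE only pays off when most values in the column are NULL. That is exactly the "only a few % populated" case described in the problem statement.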
Filtered Indexes
• Filtered Indexes and Statistics
− Indexing a portion of the data in a table
− Filtered/correlated statistics creation and usage
− Query/DML optimization to use filtered indexes and statistics
− Restricted to non-clustered indexes
• Benefits
− Lower storage and maintenance costs for a large number of indexes
− Query/DML performance benefits: IO only for qualifying rows
-- Create a filtered index
CREATE INDEX ZoomIdx ON Products(ZoomLength) WHERE Type = 'Camera';
-- The optimizer can pick the filtered index when the query predicate matches the filter
SELECT Id, Type, Resolution, ZoomLength FROM Products WHERE Type = 'Camera';
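The following Python sketch models why a filtered index is smaller, and why it can only serve queries whose predicate implies the filter. It assumes in-memory dict rows. `build_filtered_index` and `seek` are hypothetical helpers, and the product rows are invented for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def build_filtered_index(rows: List[Row], key: str, predicate: Callable[[Row], bool]) -> Dict[Any, List[int]]:
    """Index only rows that satisfy the filter: key value -> row positions."""
    index: Dict[Any, List[int]] = {}
    for pos, row in enumerate(rows):
        if predicate(row):
            index.setdefault(row[key], []).append(pos)
    return index


def seek(rows: List[Row], index: Dict[Any, List[int]], value: Any) -> List[Row]:
    """Only valid for queries whose WHERE clause implies the index filter."""
    return [rows[pos] for pos in index.get(value, [])]


products = [
    {"Id": 1, "Type": "Camera", "ZoomLength": 3},
    {"Id": 2, "Type": "Phone", "ZoomLength": None},
    {"Id": 3, "Type": "Camera", "ZoomLength": 10},
    {"Id": 4, "Type": "Phone", "ZoomLength": None},
]
zoom_idx = build_filtered_index(products, "ZoomLength", lambda r: r["Type"] == "Camera")
print(sum(len(p) for p in zoom_idx.values()))   # 2 entries instead of 4
print(seek(products, zoom_idx, 3))              # [{'Id': 1, 'Type': 'Camera', 'ZoomLength': 3}]
```

A query on phones cannot use `zoom_idx`, because phones were never indexed. This is why the optimizer matches the query predicate against the index filter before choosing it.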
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after
the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.