Performance Tuning SSIS
Brian Knight, CEO Pragmatic Works [email protected]
About the Ugly Guy Speaking
SQL Server MVP
Founder of Pragmatic Works
Co-Founder of BIDN.com, SQLServerCentral.com and SQLShare.com
Written more than a dozen books on SQL Server
SQL Server Integration Services
Integration is a seamless, manageable operation
Source, prepare, and load data in a single, auditable process
Scale to handle heavy and complex data requirements
Integration Services in Action
[Diagram labels. Sources: geospatial data (semi-structured), legacy data (binary files), application database, mobile data. SSIS components: custom source, standard sources, geospatial components, data-cleansing components, merges, data mining components. Outputs: warehouse, reports, cube.]
Advanced Session
Today’s Problems with Integration
Integration today
  Increasing data volumes
  Increasingly diverse sources
Requirements reached the Tipping Point
  Low-impact source extraction
  Efficient transformation
  Bulk loading techniques
Tuning Decisions
  Choose the right tool for the job
  Don’t be afraid to use T-SQL
  Will parallelism work?
Source Optimization
  Flat files – When available, use Fast Parse
  OLE DB sources – Change network packet size
  Use T-SQL whenever possible in the OLE DB Source
    Joining
    NULL handling
    Where clauses
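The advice to push joining, NULL handling, and filtering into the source query can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for an OLE DB source; the `customers` and `orders` tables and their columns are hypothetical, and the SQL is SQLite syntax rather than T-SQL, although `JOIN`, `COALESCE`, and `WHERE` behave the same way there.

```python
import sqlite3

# Hypothetical tables standing in for a source system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, NULL), (3, 'West');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 2, 75.0), (12, 3, 5.0), (13, 1, NULL);
""")

# The join, the NULL handling, and the filter all run in the source query,
# so only the rows and columns the data flow needs leave the database.
SOURCE_QUERY = """
    SELECT o.id,
           COALESCE(c.region, 'Unknown') AS region,
           COALESCE(o.amount, 0.0)       AS amount
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE COALESCE(o.amount, 0.0) >= 10.0
    ORDER BY o.id
"""

rows = conn.execute(SOURCE_QUERY).fetchall()
for row in rows:
    print(row)
```

The alternative would be to extract both tables in full and do the same work with separate join, derived-column, and conditional-split steps in the pipeline, which moves more data over the network.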
Impact of Compression on ETL
[Chart: BULK INSERT into a Heap with and without Data Compression. X axis: Compression Type (NONE, ROW, PAGE). Series: Time to BULK INSERT 50M rows (min); Table Size after Load (GB).]
* Not official Microsoft results.
Tuning the Source
Demo
Connection manager tuning
Flat file tuning
OLE DB Source tuning
SSIS Data Flow Architecture
[Diagram: transform components operating on the rows of a buffer]
The Pipeline presents the buffer to each downstream component
Synchronous vs. Non Synchronous
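As a rough illustration of the distinction, the sketch below models buffers as Python lists of rows. It is not SSIS code: `upper_name` and `sort_by_name` are hypothetical transforms. The first behaves like a synchronous component (it edits each buffer in place and passes the same buffer on), the second like an asynchronous one (it must consume all input before it can emit new buffers, as a sort does).

```python
from typing import Dict, Iterable, Iterator, List

Buffer = List[Dict[str, str]]

def upper_name(buffers: Iterable[Buffer]) -> Iterator[Buffer]:
    """Synchronous style: modify each buffer in place and hand it on at once."""
    for buf in buffers:
        for row in buf:
            row["name"] = row["name"].upper()
        yield buf  # the same buffer object continues downstream

def sort_by_name(buffers: Iterable[Buffer], rows_per_buffer: int = 2) -> Iterator[Buffer]:
    """Asynchronous (blocking) style: read everything, then emit new buffers."""
    all_rows = [row for buf in buffers for row in buf]
    all_rows.sort(key=lambda r: r["name"])
    for i in range(0, len(all_rows), rows_per_buffer):
        yield all_rows[i:i + rows_per_buffer]  # freshly allocated output buffer

source = [[{"name": "carol"}, {"name": "alice"}], [{"name": "bob"}]]

sync_out = list(upper_name(source))
buffers_reused = all(out is src for out, src in zip(sync_out, source))

async_out = list(sort_by_name(sync_out))
print(buffers_reused, async_out)
```

The synchronous step adds no memory copies, while the blocking step holds every row in memory before any output appears, which is why such components tend to dominate data flow cost.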
Case Study: Patterns
105 seconds 83 seconds
Source Data Extraction
Extracting data from the source is expensive
Efficient extraction is key to improving ETL performance
Involves bulk loading data into staging areas or warehouse
Time consuming & resource intensive
  Triggers (synchronous IO penalty)
  Timestamp columns (Schema changes)
  Complex queries (delayed IO penalty)
  Custom (ISV, mirror, snapshot, …)
Incremental data load is key to efficient extraction
Need to know what changed at source since a point in time
Expensive lookups to determine changed columns
Providing information up front about which columns changed will improve efficiency
SQL Server 2008: Change Data Capture (CDC)
Information about what changed at the source
Changes captured from the log asynchronously
Enabled per table
CDC APIs provide access to change data
[Diagram labels: OLTP, Change Tables, Data Warehouse]
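The idea of loading only what changed since a point in time can be sketched with a simulated change table. This is a minimal illustration using sqlite3, not SQL Server's CDC API: the `customer_changes` table, its `seq` column (a stand-in for a log sequence number), and the `apply_changes` helper are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical change table: one row per captured change, ordered by seq.
    CREATE TABLE customer_changes (
        seq         INTEGER PRIMARY KEY,  -- stand-in for a log sequence number
        op          TEXT,                 -- 'I' insert, 'U' update, 'D' delete
        customer_id INTEGER,
        name        TEXT
    );
    CREATE TABLE dw_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer_changes (op, customer_id, name) VALUES
        ('I', 1, 'Ann'), ('I', 2, 'Bob'), ('U', 1, 'Anne'), ('D', 2, NULL);
""")

def apply_changes(conn, last_seq):
    """Apply changes newer than last_seq to the warehouse; return the new watermark."""
    changes = conn.execute(
        "SELECT seq, op, customer_id, name FROM customer_changes "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, op, customer_id, name in changes:
        if op == "D":
            conn.execute("DELETE FROM dw_customer WHERE customer_id = ?", (customer_id,))
        else:
            conn.execute("INSERT OR REPLACE INTO dw_customer VALUES (?, ?)", (customer_id, name))
        last_seq = seq
    conn.commit()
    return last_seq

# First run processes everything; later runs only read rows past the watermark.
watermark = apply_changes(conn, 0)
conn.execute("INSERT INTO customer_changes (op, customer_id, name) VALUES ('I', 3, 'Cy')")
watermark = apply_changes(conn, watermark)

final_rows = conn.execute("SELECT customer_id, name FROM dw_customer ORDER BY customer_id").fetchall()
print(watermark, final_rows)
```

Storing the watermark between runs is what makes the load incremental: the source tables are never rescanned or compared column by column.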
Change Data Capture
Demo
Traditional CDC with SSIS
Integrating CDC in 2008
Lookup Component
Three modes of operation
  Full Cache: for small lookup datasets
  No Cache: for volatile lookup datasets
  Partial Cache: for large lookup datasets
Tradeoff memory vs. performance
Use Cascaded Lookups
Merge Join may be alternative
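The memory versus performance tradeoff between the three modes can be sketched as follows. This is an illustration only, not the Lookup component's implementation: `REFERENCE` is a hypothetical lookup table, `query_reference` simulates one round trip to it, and the partial cache is approximated with a small least-recently-used cache.

```python
from functools import lru_cache

# Hypothetical reference data standing in for the lookup table.
REFERENCE = {1: "Gold", 2: "Silver", 3: "Bronze"}
queries_issued = 0

def query_reference(key):
    """Simulate one round trip to the reference table."""
    global queries_issued
    queries_issued += 1
    return REFERENCE.get(key)

def full_cache_lookup(keys):
    cache = dict(REFERENCE)  # one bulk read up front, then memory only
    return [cache.get(k) for k in keys]

def no_cache_lookup(keys):
    return [query_reference(k) for k in keys]  # one query per input row

@lru_cache(maxsize=2)  # bounded memory: only recently used keys are kept
def _cached_query(key):
    return query_reference(key)

def partial_cache_lookup(keys):
    return [_cached_query(k) for k in keys]

keys = [1, 1, 2, 1, 3, 3]

queries_issued = 0
no_cache_result = no_cache_lookup(keys)
no_cache_queries = queries_issued

queries_issued = 0
partial_result = partial_cache_lookup(keys)
partial_queries = queries_issued

full_result = full_cache_lookup(keys)
print(no_cache_queries, partial_queries)
```

All three return the same values; they differ in how many round trips they make and how much reference data they hold in memory.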
SQL Server 2008: Lookup Transform
Hydrate cache files for large data sets
Can reuse cache
Can load cache during day and use in nightly ETL
Demo
Cascading lookup optimizations
Cache file lookup
Data Destinations
Use “Fast Load” or SQL Server Destination
Table Lock on insert operations
Trace flags for improvement
Old principles still apply
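One of the old principles behind fast loading is to insert rows in batches rather than one at a time. The sketch below shows that idea with sqlite3; it is not the Fast Load option itself, and the `target` table, the `batched_load` helper, and the commit size of 1,000 are assumptions chosen for the example.

```python
import sqlite3
from itertools import islice

def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

def batched_load(conn, rows, commit_size):
    """Insert rows in batches, committing once per batch; return the commit count."""
    commits = 0
    for chunk in batches(rows, commit_size):
        conn.executemany("INSERT INTO target (id, val) VALUES (?, ?)", chunk)
        conn.commit()
        commits += 1
    return commits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, val TEXT)")

source_rows = ((i, f"row-{i}") for i in range(2500))  # streamed, never fully in memory
commits = batched_load(conn, source_rows, commit_size=1000)
loaded = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(commits, loaded)
```

A larger commit size means fewer transactions but more work to redo if a batch fails, so the right value depends on the destination and the recovery requirements.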
Destination Tuning
Demo
Building a Work Queue System
Create a work queue table.
Create a loop that works through the queue, continually checking out work
Spawn x times with a batch file
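A minimal sketch of these three steps follows, assuming a SQLite file as the queue store and Python threads in place of separately spawned processes. The `work_queue` table layout, the status values, and every function name here are hypothetical. The important part is that checking out work is atomic, so two workers never claim the same item.

```python
import os
import sqlite3
import tempfile
import threading

def connect(path):
    # isolation_level=None: transactions are controlled explicitly below.
    return sqlite3.connect(path, timeout=30, isolation_level=None)

def create_queue(path, items):
    """Step 1: create the work queue table and fill it."""
    conn = connect(path)
    conn.execute(
        "CREATE TABLE work_queue ("
        " id INTEGER PRIMARY KEY, item TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending', worker TEXT)"
    )
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO work_queue (item) VALUES (?)", [(i,) for i in items])
    conn.execute("COMMIT")
    conn.close()

def check_out(conn, worker):
    """Atomically claim the next pending item, or return None if none is left."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    row = conn.execute(
        "SELECT id, item FROM work_queue WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is not None:
        conn.execute(
            "UPDATE work_queue SET status = 'running', worker = ? WHERE id = ?",
            (worker, row[0]),
        )
    conn.execute("COMMIT")
    return row

def mark_done(conn, job_id):
    conn.execute("BEGIN IMMEDIATE")
    conn.execute("UPDATE work_queue SET status = 'done' WHERE id = ?", (job_id,))
    conn.execute("COMMIT")

def worker_loop(path, worker, processed):
    """Step 2: loop over the queue, checking out work until it is empty."""
    conn = connect(path)
    while True:
        job = check_out(conn, worker)
        if job is None:
            break
        processed.append(job[1])  # placeholder for the real work, e.g. loading a file
        mark_done(conn, job[0])
    conn.close()

def run(items, worker_count):
    """Step 3: start several workers against the same queue."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "queue.db")
        create_queue(path, items)
        processed = []
        threads = [
            threading.Thread(target=worker_loop, args=(path, f"worker-{n}", processed))
            for n in range(worker_count)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        conn = connect(path)
        remaining = conn.execute(
            "SELECT COUNT(*) FROM work_queue WHERE status <> 'done'"
        ).fetchone()[0]
        conn.close()
    return processed, remaining

items = [f"file-{i}" for i in range(20)]
processed, remaining = run(items, worker_count=4)
print(len(processed), remaining)
```

Because each worker pulls the next item only when it is free, the load balances itself: a worker that draws a large item simply checks out fewer items overall.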
Demo Results
[Charts: Elapsed Time per run, one chart per process count]
1 Process finishes in 64 seconds
2 Processes finish in 36 seconds
4 Processes finish in 28 seconds
8 Processes finish in 27 seconds
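The demo timings (64, 36, 28, and 27 seconds for 1, 2, 4, and 8 processes) show diminishing returns, which a short calculation makes explicit. Speedup here is the single-process time divided by the parallel time, and efficiency is speedup divided by the number of processes; these are the usual definitions rather than figures taken from the slides.

```python
# Elapsed seconds per process count, as reported in the demo results.
timings = {1: 64, 2: 36, 4: 28, 8: 27}

baseline = timings[1]
summary = {}
for processes, seconds in timings.items():
    speedup = baseline / seconds
    efficiency = speedup / processes
    summary[processes] = (round(speedup, 2), round(efficiency, 2))
    print(f"{processes} process(es): {seconds}s, speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Going from 4 to 8 processes saves only one second in this demo, which suggests that something other than CPU, such as I/O or locking on the destination, had become the bottleneck.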
Parallel Load
Demo
Managing Resources
Logging events to watch pipeline internals
  PipelineExecutionPlan, PipelineExecutionTree, BufferSizeTuning
System Monitor to track I/O issues
  Buffers In Use tracks how many buffers are presently being used
  Buffers Spooled tracks how many 10 MB buffers have been spooled to disk
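To get a feel for how row width interacts with the 10 MB buffer, the sketch below estimates rows per buffer from the data flow defaults (DefaultBufferSize of 10,485,760 bytes and DefaultBufferMaxRows of 10,000). The formula is a simplification assumed for illustration; the engine's actual sizing logic has more rules, and `rows_per_buffer` is a hypothetical helper.

```python
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024  # bytes; the data flow default of 10 MB
DEFAULT_BUFFER_MAX_ROWS = 10_000        # the data flow default row cap

def rows_per_buffer(row_width_bytes,
                    buffer_size=DEFAULT_BUFFER_SIZE,
                    max_rows=DEFAULT_BUFFER_MAX_ROWS):
    """Simplified estimate: rows are capped by both the row limit and the byte limit."""
    if row_width_bytes <= 0:
        raise ValueError("row width must be positive")
    return min(max_rows, buffer_size // row_width_bytes)

narrow_rows = rows_per_buffer(200)   # row cap applies: the buffer is mostly empty
wide_rows = rows_per_buffer(4000)    # byte cap applies: fewer rows fit
print(narrow_rows, wide_rows)
```

Under this estimate, narrow rows leave most of each buffer unused unless the row cap is raised, while wide rows reduce the rows per buffer and increase the number of buffers in flight, which is when the Buffers Spooled counter is worth watching.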
Measuring Performance
Perfmon
Location
Consider the following configuration…
Where should SSIS run? (Licensing issues aside)
[Diagram labels: SQL Server 1, SQL Server 2, SSIS Server]
WSRM
Windows System Resource Manager (WSRM) can throttle CPU and memory
Creates a soft throttle
Can be scheduled so SSIS gets priority on weekends and nights
Only activates policy if resources begin to become constrained (about 70%)
WSRM is free with Windows Server 2003 Enterprise Edition and included in Windows Server 2008
WSRM
Creating a soft schedule cap
Demo
Summary
Planning
  Don’t underestimate the power of the whiteboard!
Use the right tool for the right job
  Leverage the power of the engine
Patterns and Practices
  Understand best practices
  But don’t be afraid to experiment
The End Already?
Questions
http://www.bidn.com/people/brianknight
@BrianKnight
http://www.youtube.com/pragmaticworks