SSIS: Exploring Scalability, Performance and Deployment

Vinod Kumar & Srinivas Sampath, MVP – SQL Server


Presentation Scope

A high level view
  Design considerations
  How to measure performance
  Performance implications of architecture
  Manageability aspects of SSIS
  Deployment tips

Out of scope
  Prescriptive guidance for specific situations

Agenda

Buffers and Memory

OVAL Concept Detailed

Component Specific Notes

Manageability Features

Deployment Considerations

Introduction

SSIS Life Cycle tools

Design the SSIS Package
  Business Intelligence Studio (Visual Studio)
  Migration wizard for pre-SQL 2005 packages
  Version Control Integration (VSS)

Deployment/Execution
  Deployment Utility to copy packages
  Command Line execution (dtexec.exe and dtexecui.exe)
  Flexible Configuration Options

Supportability
  Rich per-package Logging
  SQL Management Studio for monitoring running packages and organizing stored packages
  Checkpoint - Restartability

SSIS Tools

[Diagram relating SSIS packages to the tools: BI Studio, SSIS Service, Mgt Studio, Import Export Wizard, Deployment (installer file set), Dtexec.exe, Dtexecui.exe, Dtutil.exe. Connector labels: execution; view running and import\export; deploy.]

Deep dive into Performance

Buffers and Memory

Buffers based on design time metadata
  The width of a row determines the size of the buffer
  Smaller rows = more rows in memory = greater efficiency

Memory copies are expensive!
  A buffer might have placeholder columns filled by downstream components
  Pointer magic where possible
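To make "smaller rows = more rows in memory" concrete, here is a rough sketch of how row width limits the number of rows a buffer can hold. It is a simplified model, not the engine's actual sizing logic. The two limits are assumptions modeled on the data flow task properties DefaultBufferSize and DefaultBufferMaxRows, which the slides do not mention, and the function name is hypothetical.

```python
# Assumed limits, modeled on the data flow task properties
# DefaultBufferSize (bytes) and DefaultBufferMaxRows. Check your own package settings.
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024
DEFAULT_BUFFER_MAX_ROWS = 10_000


def rows_per_buffer(row_width_bytes,
                    buffer_size=DEFAULT_BUFFER_SIZE,
                    max_rows=DEFAULT_BUFFER_MAX_ROWS):
    """Estimate how many rows of a given width fit in one buffer.

    Simplified model: the buffer holds as many rows as fit in
    buffer_size, capped at max_rows.
    """
    if row_width_bytes <= 0:
        raise ValueError("row width must be positive")
    return min(max_rows, buffer_size // row_width_bytes)


narrow = rows_per_buffer(100)    # narrow row: the row cap is the limit
wide = rows_per_buffer(4096)     # wide row: the byte size is the limit
```

Under this model, trimming unneeded or oversized columns helps only once the row is wide enough for the byte limit, rather than the row cap, to be the constraint.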

Component Types

Row based (synchronous outputs)
  Logically works at a row level
  Buffer reused
  Data Convert, Derived Column

Partially Blocking (asynchronous outputs)
  May logically work at a row level
  Data copied to new buffers
  Merge, Merge Join, Union All

Blocking (asynchronous outputs)
  Needs all input buffers before producing any output rows
  Data copied to new buffers
  Aggregate, Sort

CPU Utilization

Execution Tree
  Starts from a source or an async output
  Ends at a destination or an input that has no sync outputs
  Each Execution Tree can get a worker thread
  MaxEngineThreads to control parallelism
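The execution tree rules above can be sketched as a small graph walk. This only illustrates the grouping rule as stated on the slide and is not how the engine is implemented. The function name and the component names in the example are hypothetical.

```python
from collections import defaultdict


def execution_trees(edges, sources, async_components):
    """Group data flow components into execution trees.

    edges: (upstream, downstream) pairs describing the data flow.
    sources: components that start the data flow.
    async_components: components whose outputs are asynchronous.

    A tree starts at a source or at an async output. It ends at a
    destination or at the input of an async component.
    """
    downstream = defaultdict(list)
    for up, down in edges:
        downstream[up].append(down)

    trees = []
    roots = list(sources)
    started = set()
    while roots:
        root = roots.pop(0)
        if root in started:
            continue
        started.add(root)
        tree = [root]
        stack = list(downstream[root])
        while stack:
            node = stack.pop()
            tree.append(node)
            if node in async_components:
                # The input ends this tree; the async output starts a new one.
                roots.append(node)
            else:
                stack.extend(downstream[node])
        trees.append(tree)
    return trees


trees = execution_trees(
    edges=[("Flat File Source", "Derived Column"),
           ("Derived Column", "Sort"),
           ("Sort", "OLE DB Destination")],
    sources=["Flat File Source"],
    async_components={"Sort"},
)
```

In this example the Sort appears twice: once as the input that ends the first tree and once as the output that starts the second. Each of the two trees can then be given its own worker thread.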

Performance Strategy

Use OVAL to identify the factors affecting data integration performance…

Operations: What logic should be applied to the data?

Volume: How much data must be processed?

Application: Which app is best suited to these operations on this volume of data? For example, use SQL Server or SSIS for sorting data?

Location: Where should the app run? For example, on a shared server, or on a standalone machine?

An OVAL Example: Loading a Text File

Simple scenario…

Interesting performance considerations!

[Diagram: text file on Server 1, SQL Server on Server 2]

Operations

Understand all operations performed
  1. Open a transaction on SQL Server
  2. Read data from the text file
  3. Load data into the SSIS data flow
  4. Load the data into SQL Server
  5. Commit the transaction

Beware of hidden operations
  Data conversion in either step 3 or 4

Operations - Sharpen

File Source
  Unnecessary data type conversions
  ‘FastParse’ in Flat File Source
  Unnecessary operations: e.g., converting from text to datetime, then from datetime to date

Reduce database operations
  Database logging
  Commit size
  Fast Load
  Table lock
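As a rough illustration of the "commit size" item, the sketch below loads the same rows while committing once per batch instead of once per row. It uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table name, column names and the load_rows helper are hypothetical.

```python
import sqlite3


def load_rows(conn, rows, commit_size):
    """Insert rows, committing once every commit_size rows.

    Returns the number of commits issued, to show how a larger
    commit size reduces the number of database operations.
    """
    commits = 0
    pending = 0
    cur = conn.cursor()
    for row in rows:
        cur.execute("INSERT INTO staging (id, name) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the final partial batch.
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, name TEXT)")
rows = [(i, f"row {i}") for i in range(1000)]
commits = load_rows(conn, rows, commit_size=250)
loaded = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
```

The right commit size depends on the target database, its logging settings and how much work you can afford to redo after a failure, so measure rather than assume.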

Volume

Reduce where possible
  Don’t push unneeded columns
  Conditional split for filtering rows

Do not parse or convert columns unnecessarily
  In a fixed-width format you can combine adjacent unneeded columns into one
  Leave unneeded columns as strings

Volume - Sharpen

Use appropriate data types
  An integer in the range 1-999 takes 2 bytes as an integer, 3 bytes as a string, but 4 bytes as a real
  Suggest Types in the flat file connection manager UI

Use parallelism
  If loading multiple files, can they be loaded in parallel?
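The 2, 3 and 4 byte figures for the value 999 can be checked with a short sketch. It uses Python's struct module to stand in for a 2-byte integer and a 4-byte real, and single-byte ASCII for the string form. It does not use SSIS's own type system.

```python
import struct

value = 999

as_int16 = struct.pack("<h", value)        # 2-byte signed integer
as_text = str(value).encode("ascii")       # one byte per digit
as_real = struct.pack("<f", float(value))  # 4-byte single-precision real

sizes = {
    "int16": len(as_int16),
    "text": len(as_text),
    "real": len(as_real),
}
```

Multiplied across millions of rows, a per-column difference like this decides how many rows fit in each buffer.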

Application

Is SSIS right for this?
  Overhead of starting up an SSIS package may offset any performance gain over BCP for small data sets.

Is BCP good enough?
  Is the greater manageability and control of SSIS needed?

Bulk Insert Task vs. Data Flow

Location

Consider the following configuration…

[Diagram: text file on Server 1, SQL Server on Server 2]

Where should SSIS run? (Licensing issues aside)

Location Considerations

SSIS on Server 1
  Competes with apps for resources
  Will data conversion on Server 1 reduce or increase the volume of data transferred across the network?
  Cannot use the fast SSIS SQL Server Destination

SSIS on Server 2
  Competes with SQL Server for resources
  Will pulling the text across the network before conversion be expensive?
    Also consider transferring the file unparsed to Server 2 and reading it locally from there
  Can use the fast SSIS SQL Server Destination

Measuring Performance

OVAL does not provide prescriptive guidance
  Too many variables

Improve performance by applying OVAL and measuring
  SSIS Logging
  Performance counters
  SQL Server Profiler
    For extract queries, lookups and loading

Parallelism

Focus on critical path

Utilize available resources

[Diagram labels: Memory Constrained; Reader and CPU Constrained; Let it rip!; Optimize the slowest]
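One way to picture "utilize available resources" when several independent files must be loaded is sketched below, using a thread pool from the Python standard library. The load_file function is a hypothetical stand-in for a real load and only counts rows. The file names are made up for the example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def load_file(path):
    """Stand-in for one load: return the number of rows read."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def load_files_in_parallel(paths, max_workers=4):
    """Load independent files concurrently; return rows read per file."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(load_file, paths)))


# Create three small sample files with 1, 2 and 3 rows.
folder = tempfile.mkdtemp()
paths = []
for i in range(3):
    path = os.path.join(folder, f"extract_{i}.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("row\n" * (i + 1))
    paths.append(path)

row_counts = load_files_in_parallel(paths)
```

Whether this helps depends on where the bottleneck is: if the reader or the target is already saturated, adding workers only adds contention, so focus on the slowest step first.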

Moving Ahead

Manageability Features

Logging and Log Providers

Checkpoint Restartability

Precedence Constraints

Configurations

SSIS Service

Logging and Log Providers

Log entries are a blend of status and result messages

Can select which ‘details’ to log per control flow object within each package (e.g. OnError, OnWarning, OnPreExecute)

Can select which fields to log (e.g. computer, operator, ExecutionID…)

Can define multiple log providers (SQL, text file, Windows Event…) per package

Checkpointing

[Diagram: Package Loads and the Checkpoint File is created; Data Flow Task, Data Flow Task and Send Mail Task run in turn, each followed by Write Checkpoint; Package Completes and the Checkpoint File is deleted.]
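The checkpoint life cycle in the diagram (file created, written after each task, deleted on completion) can be sketched as follows. This is a simplified illustration using a JSON file and does not reproduce the actual SSIS checkpoint file format. The run_package helper and the task names are hypothetical.

```python
import json
import os
import tempfile


def run_package(tasks, checkpoint_path):
    """Run (name, callable) tasks in order, skipping tasks already checkpointed.

    The checkpoint file is rewritten after every successful task and
    deleted once all tasks have completed.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as handle:
            done = json.load(handle)

    executed = []
    for name, task in tasks:
        if name in done:
            continue  # completed in an earlier run
        task()        # an exception here leaves the checkpoint file behind
        executed.append(name)
        done.append(name)
        with open(checkpoint_path, "w", encoding="utf-8") as handle:
            json.dump(done, handle)

    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed


attempts = {"count": 0}


def flaky_send_mail():
    """Fail on the first attempt only, to simulate a transient error."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("mail server unavailable")


tasks = [
    ("Data Flow Task 1", lambda: None),
    ("Data Flow Task 2", lambda: None),
    ("Send Mail Task", flaky_send_mail),
]
checkpoint = os.path.join(tempfile.mkdtemp(), "package.chk")

try:
    run_package(tasks, checkpoint)
except RuntimeError:
    pass  # first run fails at the last task; the checkpoint file remains

rerun = run_package(tasks, checkpoint)
```

On the second run only the failed task executes, so the two data flows are not repeated.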

Configurations

‘Feed’ changes into a package and alter execution without editing the package directly (e.g. file name to load)

The ‘feed’ can be sourced from a SQL table, XML file, Registry key, OS environment variable, or a Parent package.

You can apply one or many configuration sets per package, from a mix of sources
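The idea of feeding values into a package from several sources can be sketched as plain dictionaries applied in order. This is only an analogy for the concept and not the SSIS configuration mechanism itself. The property names, the environment variable name and both helper functions are hypothetical.

```python
import os


def apply_configurations(package_properties, configurations):
    """Return a copy of the properties with each configuration applied in order.

    Later configurations override earlier ones. Unknown property names
    are rejected so that typos are caught when the package loads.
    """
    result = dict(package_properties)
    for config in configurations:
        for prop, value in config.items():
            if prop not in result:
                raise KeyError(f"unknown package property: {prop}")
            result[prop] = value
    return result


def env_configuration(mapping, environ=os.environ):
    """Build a configuration from environment variables.

    mapping: package property name -> environment variable name.
    Variables that are not set are skipped.
    """
    return {prop: environ[var] for prop, var in mapping.items() if var in environ}


design_time = {
    "SourceFile": "dev_input.txt",
    "DatabaseServer": "DEVDB",
    "MailServer": "devmail",
}
file_config = {"DatabaseServer": "PRODDB", "MailServer": "prodmail"}
env_config = env_configuration(
    {"SourceFile": "SSIS_SOURCE_FILE"},
    environ={"SSIS_SOURCE_FILE": "prod_input.txt"},
)

configured = apply_configurations(design_time, [file_config, env_config])
```

The design-time values stay untouched, which matches the goal of moving a package between environments without editing it.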

Configuration Scenario

[Diagram: Multiple Configurations. The package is handed off between the Dev, Test and Production machines where packages are designed, tested and executed; each has its own database (Dev DB, Test DB, Prod DB). The configuration updates the package on load with DB locations (and mail server, file share locations…).]

Precedence constraints

Directs Flow from object to object…

Basically, ‘when do I move on’

Success, Failure, Completion or one of those plus an expression (condition)

[Diagram: Dataflow Task linked to SendMail Task by constraints labeled Success, Completion, Failure, and Success & expression]
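The "when do I move on" decision can be sketched as a small function. It covers only the combinations named on the slide: a constraint on its own, or a constraint plus an expression. The function name, the variable name RowCount and the use of a Python callable for the expression are assumptions made for illustration.

```python
def constraint_satisfied(constraint, outcome, expression=None, variables=None):
    """Decide whether the next task may start.

    constraint: "Success", "Failure" or "Completion".
    outcome: "Success" or "Failure", the result of the preceding task.
    expression: optional callable taking a dict of variables; when given,
        it must also evaluate to true (constraint plus expression).
    """
    if constraint not in ("Success", "Failure", "Completion"):
        raise ValueError(f"unknown constraint: {constraint}")
    if outcome not in ("Success", "Failure"):
        raise ValueError(f"unknown outcome: {outcome}")

    # Completion fires regardless of how the preceding task ended.
    outcome_ok = constraint == "Completion" or constraint == outcome
    if expression is None:
        return outcome_ok
    return outcome_ok and bool(expression(variables or {}))


def has_rows(variables):
    """Example expression: true when at least one row was processed."""
    return variables["RowCount"] > 0


send_mail_on_success = constraint_satisfied("Success", "Success")
send_mail_on_failure = constraint_satisfied("Failure", "Success")
send_mail_on_completion = constraint_satisfied("Completion", "Failure")
send_mail_if_rows = constraint_satisfied(
    "Success", "Success", expression=has_rows, variables={"RowCount": 0}
)
```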

Manageability Demo

Deployment Flow

Tools to organize and ‘copy’ packages and supporting files

BI Studio
• Design Package
• Add Configurations
• Add Miscellaneous files
• Set Project Deployment properties
• Build

User
• Copy/Move Deployment folder\files

Installation Wizard
• Choose Destination (SQL, File System)
• Modify protection level
• Choose location of supporting files
• Change configurations
• Execute

SQL Agent
• Create desired agent jobs

SQL Management Studio

Utilizes the SSIS service

Allows Monitoring of currently Executing packages

Maintain stored package structure

Ad hoc Package execution

Deployment Demo

Some more Tips

Lookup

Aggregate

Sort

Swapping

Performance of Lookups

The reference set
  Restrict to only those columns you actually use
  Restrict rows with WHERE if possible

The lookup cache
  Caching can improve performance
  Full cache
    When the reference set will fit comfortably in memory
  Partial
    Build a cache as the input records are matched
    Useful for duplicate keys in the input, such as SKUs
  None
    Reference set doesn’t fit in memory and partial cache has no advantage
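The partial cache behavior described above, building the cache as input records are matched, can be sketched with a dictionary in front of the reference lookup. The class name, the SKU values and the in-memory dictionary standing in for a reference table query are all hypothetical.

```python
class PartialCacheLookup:
    """Look up keys against a reference source, caching each key on first use."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: key -> value (stand-in for a reference query)
        self._cache = {}
        self.fetches = 0     # how many times the reference source was hit

    def lookup(self, key):
        if key not in self._cache:
            self.fetches += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]


reference = {"SKU-1": "Widget", "SKU-2": "Gadget"}
lookup = PartialCacheLookup(reference.get)

input_keys = ["SKU-1", "SKU-2", "SKU-1", "SKU-1", "SKU-2"]
names = [lookup.lookup(key) for key in input_keys]
```

Five input rows cause only two reference fetches here, which is why a partial cache pays off when the input repeats keys. With no repeats, every row would still cost one fetch.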

Performance of Aggregate

Majority of work happens in ProcessInput call.
  This is on the thread in the previous execution tree!

Memory requirements depend on how ‘deep’ the aggregations are

Can reuse buckets if one agg can be derived from another

Use when memory is limited, single threaded operation

Performance of Sort

ProcessInput hangs on to the incoming data

PrimeOutput does the sort and is the expensive part

Sort needs all data to be in memory

Sort can have unpredictable CPU requirements

Merging is single threaded

Stock Sort component will be good enough for most users

Third party (“fastest sort in the world”) available if you really need it

Swapping buffers

When physical memory is not available
  Each buffer gets written out to one file
  Multiple paths can be specified for swapping buffers
  BufferTempStoragePath property on the Pipeline

Do everything in your power to avoid swapping
  Else, performance is really unpredictable
  Options: 64-bit, out of process execution, serializing operations

SSIS: Summary

Fast!
  Data flows process large volumes of data efficiently, even through complex operations
  Exceptional price / performance on multi-core

Feature Rich
  Many pre-built adapters and transformations reduce hand coding
  Extensible object model enables specialized custom or scripted components
  Highly productive visual environment speeds development and debugging
  Integral part of a complete BI stack (IS-AS-RS)

Beyond ETL
  Enables integration of XML, RSS and Web Services data
  Data cleansing features enable “difficult” data to be handled during loading
  Data and Text mining allow “smart” handling of data for imputation of incomplete data, conditional processing of potential problems, or smart escalation of issues such as fraud detection

Your Feedback is Important!

Please fill out the feedback form

Questions?

Links & Resources

Vinod Kumar, MVP-SQL Server, www.ExtremeExperts.com, Intel Technology India Pvt.

Srinivas Sampath, MVP-SQL Server, www32.brinkster.com/srisamp, SCT Software Solutions, srisamp@

SQL Server Integration Services public site: http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx

SQL Server Business Intelligence public site: http://www.microsoft.com/sql/evaluation/bi/default.asp

SSIS MVPs community site: http://www.sqlis.com

Newsgroups: microsoft.private.sqlserver2005.dts

© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.