Integration Services
Creating an ETL Solution with SSIS
Module Overview
• Introduction to ETL with SSIS
• Implementing Data Flow
Lesson 1: Introduction to ETL with SSIS
• What Is SSIS?
• SSIS Projects and Packages
• The SSIS Design Environment
• Using the Import/Export Wizard
What Is SSIS?
• A platform for ETL operations
• Installed as a feature of SQL Server
• Control flow engine:
  • Runtime resources and operational support for data flow
• Data flow engine:
  • Pipeline architecture for buffer-oriented rowset processing
[Figure: Control Flow Engine, Data Flow Engine, Pipeline]
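To make "buffer-oriented rowset processing" more concrete, the following is a minimal conceptual sketch in Python. It is not how the SSIS data flow engine is implemented; it only illustrates the idea of moving rows through a pipeline one buffer at a time. The function names, the sample rows, and the buffer size are all assumptions made for illustration.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def read_in_buffers(rows: Iterable[Row], buffer_size: int) -> Iterator[List[Row]]:
    """Yield the source rows in fixed-size batches (the last one may be smaller)."""
    iterator = iter(rows)
    while True:
        buffer = list(islice(iterator, buffer_size))
        if not buffer:
            return
        yield buffer


def run_pipeline(rows: Iterable[Row],
                 transforms: List[Callable[[Row], Row]],
                 buffer_size: int = 2) -> List[Row]:
    """Apply every transform to one buffer before reading the next buffer."""
    destination: List[Row] = []
    for buffer in read_in_buffers(rows, buffer_size):
        for transform in transforms:
            buffer = [transform(row) for row in buffer]
        destination.extend(buffer)
    return destination


# Hypothetical source rows and a single hypothetical transformation.
source_rows = [{"id": i, "name": f"reseller {i}"} for i in range(5)]
upper_name = lambda row: {**row, "name": str(row["name"]).upper()}
result = run_pipeline(source_rows, [upper_name])
```

Processing a batch at a time keeps memory use bounded by the buffer size rather than by the size of the source.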
SSIS Projects and Packages
• Package Deployment Model
• SSIS Packages are deployed and managed individually
• Project Deployment Model
• Multiple packages are deployed in a single project
[Figure: Project deployment model. A project contains packages, a project-level parameter, package-level parameters, a project-level connection manager, and package connection managers, and is deployed to the SSIS catalog. In the package deployment model, packages are deployed individually.]
The SSIS Design Environment
Main elements of the design environment:
• Control Flow design surface
• Data Flow tab
• Event Handlers tab
• Package Explorer
• Package-level parameters
• Solution Explorer
• Properties pane
• Connection Managers pane
• SSIS Toolbox pane
• Variables pane
The Variables and SSIS Toolbox buttons are at the upper right of the design surface.
Using the Import/Export Wizard
• Can be used to export data from a table or query in SQL Server
• The destination can be one of a wide variety of database systems or file types
• Can be used to import data into a SQL Server table
• The resulting package can be saved for reuse
• Supports only limited data type transformations
Demonstration: Exploring Source Data
In this demonstration, you will see how to:
• Extract Data with the Import and Export Data
Wizard
• Explore Data in Microsoft Excel
Demonstration Steps
Extract Data with the Import and Export Data Wizard
• Ensure that the 20463C-MIA-DC and 20463C-MIA-SQL virtual machines are both running, and then log on to 20463C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
• In the D:\Demofiles\Mod04 folder, right-click Setup.cmd, and then click Run as administrator.
• When you are prompted to confirm, click Yes, and then wait for the batch file to complete.
• On the Start screen, type Import and Export, and then start the SQL Server 2014 Import and Export Data (64-bit) app.
• On the Welcome to SQL Server Import and Export Wizard page, click Next.
• On the Choose a Data Source page, set the following options, and then click Next:
  Data source: SQL Server Native Client 11.0
  Server name: localhost
  Authentication: Use Windows Authentication
  Database: ResellerSales
• On the Choose a Destination page, select the following options, and then click Next:
  Destination: Flat File Destination
  File name: D:\Demofiles\Mod04\Top 500 Resellers.csv
  Locale: English (United States)
  Unicode: Unselected
  Code page: 1252 (ANSI – Latin 1)
Lesson 2: Implementing Data Flow
• Connection Managers
• The Data Flow Task
• Data Sources
• Data Destinations
• Data Transformations
• Optimizing Data Flow Performance
• Demonstration: Implementing a Data Flow
Connection Managers
• A connection to a data source or destination:
  • Provider (for example, ADO.NET, OLE DB, or flat file)
  • Connection string
  • Credentials
• Project or package level:
  • Project-level connection managers:
    • Can be shared across packages
    • Are listed in Solution Explorer and in the Connection Managers pane for packages in which they are used
  • Package-level connection managers:
    • Can be shared across objects in the package
    • Are listed only in the Connection Managers pane for packages in which they are used
The Data Flow Task
• The core control flow task in most SSIS packages
• It encapsulates a data flow pipeline
• You define the pipeline for the task on the Data Flow tab
Data Sources
• The source of data for a data flow:
  • Connection manager
  • Table, view, or query (where supported)
  • Columns that are included
• Many sources are supported:
  • Database (ADO.NET, OLE DB, CDC Source)
  • File (Excel, Flat File, XML, Raw File)
  • Custom
Data Destinations
• Endpoint for a data flow:
  • Connection manager
  • Table or view (where supported)
  • Column mapping
• Multiple destination types:
  • Database (ADO.NET, OLE DB, SQL Server, SQL Server Compact)
  • File (Excel, Flat File, Raw File)
  • SQL Server Analysis Services (data mining model training, dimension processing, partition processing)
  • Rowset (DataReader, Recordset)
  • Custom
Data Transformations
• Row Transformations
  • Character Map, Copy Column, Data Conversion, Derived Column, Export Column, Import Column, OLE DB Command
• Rowset Transformations
  • Aggregate, Sort, Percentage Sampling, Row Sampling, Pivot, Unpivot
• Split and Join Transformations
  • Conditional Split, Multicast, Union All, Merge, Merge Join, Lookup, Cache, CDC Splitter
• Auditing Transformations
  • Audit, Row Count
• BI Transformations
  • Slowly Changing Dimension, Fuzzy Grouping, Fuzzy Lookup, Term Extraction, Term Lookup, Data Mining Query, Data Cleansing
• Custom Transformations
  • Script, Custom Component
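As a rough illustration of what two of these transformations do to rows, here is a minimal Python sketch of a Derived Column step followed by a Lookup step. It models the behavior only and is not SSIS code; the function names, column names, and sample data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def derived_column(rows: List[Row], new_column: str,
                   expression: Callable[[Row], object]) -> List[Row]:
    """Row transformation: add a column computed from each row."""
    return [{**row, new_column: expression(row)} for row in rows]


def lookup(rows: List[Row], reference: Dict[object, object],
           key_column: str, output_column: str) -> Tuple[List[Row], List[Row]]:
    """Join each row to a reference set; return (matched, unmatched) outputs."""
    matched: List[Row] = []
    unmatched: List[Row] = []
    for row in rows:
        key = row[key_column]
        if key in reference:
            matched.append({**row, output_column: reference[key]})
        else:
            unmatched.append(row)
    return matched, unmatched


# Hypothetical product rows and subcategory reference data.
products = [
    {"ProductName": "Road Bike", "SubcategoryKey": 1},
    {"ProductName": "Helmet", "SubcategoryKey": 9},
]
subcategories = {1: "Road Bikes"}

with_upper = derived_column(products, "ProductNameUpper",
                            lambda r: str(r["ProductName"]).upper())
matched, unmatched = lookup(with_upper, subcategories,
                            "SubcategoryKey", "SubcategoryName")
```

Returning unmatched rows separately mirrors the idea that rows without a reference match can be routed somewhere other than the main output.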
Optimizing Data Flow Performance
• Optimize queries:
  • Select only the rows and columns that you need
• Avoid unnecessary sorting:
  • Use presorted data where possible
  • Set the IsSorted property where applicable
• Configure Data Flow task properties:
  • Buffer size
  • Temporary storage location
  • Parallelism
  • Optimized mode
For more information, go to http://go.microsoft.com/fwlink/?LinkID=248854, Tuning Your SSIS Package Data Flow in the Enterprise (SQL Server Video), and http://go.microsoft.com/fwlink/?LinkID=248858, Understanding SSIS Data Flow Buffers (SQL Server Video).
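One reason presorted data matters is that merge-style joins can walk two sorted inputs in a single pass without sorting them again. The sketch below shows that idea in Python under simplifying assumptions: both inputs are already sorted on the join key and the key is unique in each input. It is a conceptual illustration, not the Merge Join transformation itself, and the sample rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner join of two inputs that are ALREADY sorted on `key`.

    Assumes the key is unique in each input, so one forward pass is enough.
    """
    joined: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key == right_key:
            joined.append({**left[i], **right[j]})
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1  # no partner for this left row
        else:
            j += 1  # no partner for this right row
    return joined


# Hypothetical inputs, both presorted on "ProductKey".
product_rows = [{"ProductKey": 1, "Name": "Road Bike"},
                {"ProductKey": 2, "Name": "Helmet"},
                {"ProductKey": 4, "Name": "Gloves"}]
price_rows = [{"ProductKey": 2, "Price": 40},
              {"ProductKey": 3, "Price": 15},
              {"ProductKey": 4, "Price": 25}]
joined_rows = merge_join(product_rows, price_rows, "ProductKey")
```

If the source query already returns rows in key order, marking the output as sorted avoids adding a separate sort step just to satisfy the join.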
Demonstration: Implementing a Data Flow
In this demonstration, you will see how to:
• Configure a Data Source
• Use a Derived Column Transformation
• Use a Lookup Transformation
• Configure a Destination
Preparation Steps
Complete the previous demonstrations in this module.
Demonstration Steps
Configure a Data Source
• Ensure you have completed the previous demonstrations in this module.
• Start SQL Server Management Studio and connect to the MIA-SQL database engine instance using Windows authentication.
• In Object Explorer, expand Databases, expand Products, and expand Tables. Then right-click each of the following tables, click Select Top 1000 Rows, and view the data they contain:
  dbo.Product
  dbo.ProductCategory
  dbo.ProductSubcategory
• In Object Explorer, under Databases, expand DemoDW, and expand Tables. Then right-click dbo.DimProduct and click Select Top 1000 Rows to verify that this table is empty.
• Start Visual Studio and create a new Integration Services project named DataFlowDemo in the D:\Demofiles\Mod04 folder.
• If the Getting Started (SSIS) window is displayed, close it.
• In Solution Explorer, expand SSIS Packages, right-click Package.dtsx, and click Rename. Then change the package name to ExtractProducts.dtsx.
• In Solution Explorer, right-click Connection Managers and click New Connection Manager. Then add a new OLEDB connection manager with the following settings:
  Server name: localhost
  Log on to the server: Use Windows Authentication
  Select or enter a database name: Products
Lab: Implementing Data Flow in an SSIS Package
• Exercise 1: Exploring Source Data
• Exercise 2: Transferring Data by Using a Data Flow Task
• Exercise 3: Using Transformations in a Data Flow
Logon Information
Virtual machine: 20463C-MIA-SQL
User name: ADVENTUREWORKS\Student
Password: Pa$$w0rd
Estimated Time: 60 minutes
SSIS Control Flow
Implementing Control Flow in an SSIS Package
Module Overview
• Introduction to Control Flow
• Creating Dynamic Packages
• Using Containers
Lesson 1: Introduction to Control Flow
• Control Flow Tasks
• Precedence Constraints
• Grouping and Annotations
• Demonstration: Implementing Control Flow
• Using Multiple Packages
Control Flow Tasks
• Data Flow Tasks
• Database Tasks
• File and Internet Tasks
• Process Execution Tasks
• WMI Tasks
• Custom Logic Tasks
• Database Transfer Tasks
• Analysis Services Tasks
• SQL Server Maintenance Tasks
Precedence Constraints
• Connect sequences of tasks
• Three control flow conditions
• Success
• Failure
• Completion
• Multiple constraints
• Logical AND
• Logical OR
[Figure: Tasks 1 to 10 connected by precedence constraints. Each constraint is labeled Success, Failure, or Completion and is combined with other constraints by logical AND or logical OR.]
• The control flow starts with Task 1.
• If Task 1 succeeds, Task 2 is executed.
• If either Task 1 or Task 2 fails, Task 3 is executed.
• If Task 2 or Task 3 succeeds, Task 4 is executed.
• If Task 4 fails, Task 5 is executed, and if Task 4 succeeds, Task 6 is executed.
• If Task 5 or Task 6 completes, Task 7 is executed.
• If Task 7 completes, Task 8 is executed.
• If Task 7 and Task 8 succeed, Task 9 is executed.
• If Task 3 and Task 9 complete, Task 10 is executed.
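The ten-task example above can be modeled in a few lines of Python to check which tasks run for a given set of outcomes. This is a simplified sketch, not the SSIS runtime: it assumes a task that never ran satisfies no constraint, and it evaluates tasks in numeric order, which works here only because every constraint points from a lower-numbered task to a higher-numbered one. All names in the code are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Target task -> (logical operator, [(source task, required condition)]),
# transcribed from the bullet list for Tasks 1 to 10.
CONSTRAINTS: Dict[int, Tuple[str, List[Tuple[int, str]]]] = {
    2: ("AND", [(1, "success")]),
    3: ("OR", [(1, "failure"), (2, "failure")]),
    4: ("OR", [(2, "success"), (3, "success")]),
    5: ("AND", [(4, "failure")]),
    6: ("AND", [(4, "success")]),
    7: ("OR", [(5, "completion"), (6, "completion")]),
    8: ("AND", [(7, "completion")]),
    9: ("AND", [(7, "success"), (8, "success")]),
    10: ("AND", [(3, "completion"), (9, "completion")]),
}


def constraint_met(result: Optional[str], condition: str) -> bool:
    """A constraint is met only if the source task ran and ended as required."""
    if result is None:  # source task never executed (simplifying assumption)
        return False
    return condition == "completion" or result == condition


def run_control_flow(outcomes: Dict[int, str]) -> List[int]:
    """Return the tasks that execute; tasks not in `outcomes` succeed."""
    results: Dict[int, str] = {}
    for task in range(1, 11):
        if task in CONSTRAINTS:
            operator, sources = CONSTRAINTS[task]
            checks = [constraint_met(results.get(src), cond)
                      for src, cond in sources]
            should_run = all(checks) if operator == "AND" else any(checks)
        else:
            should_run = True  # Task 1 has no incoming constraint
        if should_run:
            results[task] = outcomes.get(task, "success")
    return sorted(results)


all_succeed = run_control_flow({})
task1_fails = run_control_flow({1: "failure"})
```

Under these assumptions, Task 10 runs only on a path where Task 3 has run, because its two Completion constraints are combined with a logical AND.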
Grouping and Annotations
• Group tasks to manage them as a unit at design time
  • Show/Hide
  • Move
• Add annotations to provide documentation
[Figure: Tasks 1 to 4 on the design surface. Grouped tasks can be managed as a unit. Annotations appear as notes on the design surface.]
Demonstration: Implementing Control Flow
In this demonstration, you will see how to:
• Add Tasks to a Control Flow
• Use Precedence Constraints to Define a Control Flow
Preparation Steps
Start the 20463C-MIA-DC and 20463C-MIA-SQL virtual machines.
Demonstration Steps
Add Tasks to a Control Flow
• Ensure that the 20463C-MIA-DC and 20463C-MIA-SQL virtual machines are both running, and then log on to 20463C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
• In the D:\Demofiles\Mod05 folder, run Setup.cmd as Administrator.
• Start Visual Studio and open ControlFlowDemo.sln from the D:\Demofiles\Mod05 folder.
• In Solution Explorer, double-click Control Flow.dtsx.
• If the SSIS Toolbox is not visible, on the SSIS menu, click SSIS Toolbox. Then, from the SSIS Toolbox, drag a File System Task to the control flow surface.
• Double-click the File System Task and configure the following settings:
  Name: Delete Files
  Operation: Delete directory content
  SourceConnection: A new connection with a Usage type of Create folder, and a Folder value of D:\Demofiles\Mod05\Demo.
• From the SSIS Toolbox, drag a second File System Task to the control flow surface. Then double-click the File System Task and configure the following settings:
  Name: Delete Folder
  Operation: Delete directory
  SourceConnection: Demo
Using Multiple Packages
• Create reusable units of workflow
• Run multiple control flows in parallel
• Separate ETL workflows to fit data acquisition windows
[Figure: Packages Pkg1 to Pkg4, connected by Execute Package tasks]
Lesson 2: Creating Dynamic Packages
• Variables
• Parameters
• Expressions
• Demonstration: Using Variables and Parameters
Variables
• User variables:
  • Variables created by an SSIS developer to hold dynamic values
  • Defined in the User namespace by default
  • Defined at a specified scope
• System variables:
  • Built-in variables with dynamic system values
  • Defined in the System namespace
Example user variable (User::fName):
  Name: fName
  Data Type: String
  Value: MyFile.csv
  Scope: Package
Example system variable (System::StartTime):
  Name: StartTime
  Data Type: DateTime
  Value: When the package started running
The fully qualified naming syntax for variables is namespace::variable_name.
Parameters
• Project parameters
  • Accessible from any package in the project
• Package parameters
  • Exist only at the package level
[Figure: A project containing Package1 and Package2, with these example parameters]
  Project::folderPath, default value "D:\MyFiles\"
  Package::dbConnStr, default value "Server=localhost…"
  Package::ftpSrvr, default value "ftpsrv01"
Expressions
• Used to set values dynamically:
  • Properties
  • Conditional split criteria
  • Derived column values
  • Precedence constraints
• Based on Integration Services expression syntax
  • Can include variables and parameters
• Can be created graphically by using Expression Builder
@[$Project::folderPath]+@[User::fName]
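To show what the expression above evaluates to, here is a small Python sketch that resolves each @[namespace::name] reference and concatenates the results. It handles only this one pattern (references joined by +) and is in no way a parser for the Integration Services expression language. The function name and regular expression are assumptions for illustration; the sample values are the ones shown on the Variables and Parameters slides.

```python
import re
from typing import Dict

# Matches one reference such as @[User::fName] or @[$Project::folderPath].
REFERENCE = re.compile(r"@\[(\$?\w+::\w+)\]")


def resolve_concatenation(expression: str, values: Dict[str, str]) -> str:
    """Resolve references joined by '+' and concatenate their string values."""
    resolved = []
    for part in expression.split("+"):
        match = REFERENCE.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"Unsupported expression part: {part!r}")
        resolved.append(values[match.group(1)])
    return "".join(resolved)


values = {
    "$Project::folderPath": "D:\\MyFiles\\",  # project parameter default
    "User::fName": "MyFile.csv",              # user variable value
}
full_path = resolve_concatenation("@[$Project::folderPath]+@[User::fName]", values)
```

Because the folder comes from a parameter and the file name from a variable, either can change at run time without editing the expression.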
Demonstration: Using Variables and Parameters
In this demonstration, you will see how to:
• Create a Variable
• Create a Parameter
• Use Variables and Parameters in an Expression
Preparation Steps
Complete the previous demonstration in this module.
Demonstration Steps
Create a Variable
• Ensure you have completed the previous demonstration in this module.
• Start Visual Studio and open the VariablesAndParameters.sln solution in the D:\Demofiles\Mod05 folder.
• In Solution Explorer, double-click Control Flow.dtsx.
• On the View menu, click Other Windows, and click Variables.
• In the Variables pane, click the Add Variable button and add a variable with the following properties:
  Name: fName
  Scope: Control Flow
  Data type: String
  Value: Demo1.txt
Lesson 3: Using Containers
• Sequence Containers
• Demonstration: Using a Sequence Container
• For Loop Containers
• Demonstration: Using a For Loop Container
• Foreach Loop Containers
• Demonstration: Using a Foreach Loop Container
Sequence Containers
• Define a control flow subset
• Enable you to manage properties for multiple tasks
• Create a scope for variables, transactions, and precedence
[Figure: Tasks 1 to 4 and a sequence container]
Unlike a group, a sequence container exists at both design time and run time.
Demonstration: Using a Sequence Container
In this demonstration, you will see how to:
• Use a Sequence Container
Preparation Steps
Complete the previous demonstrations in this module.
Demonstration Steps
Use a Sequence Container
• Ensure you have completed the previous demonstrations in this module.
• Start Visual Studio and open the SequenceContainer.sln solution in the D:\Demofiles\Mod05 folder.
• In Solution Explorer, double-click Control Flow.dtsx.
• Right-click the Group indicator around the Delete Files and Delete Folder tasks and click Ungroup to remove it.
• Drag a Sequence Container from the SSIS Toolbox to the control flow design surface.
• Right-click the precedence constraint that connects Delete Files to Send Failure Notification, and click Delete. Then delete the precedence constraints connecting Delete Folder to Send Failure Notification and Create Folder.
• Click and drag around the Delete Files and Delete Folder tasks to select them both, and then drag them into the sequence container.
• Drag a precedence constraint from the sequence container to Create Folder. Then right-click the precedence constraint and click Completion.
• Drag a precedence constraint from the sequence container to Send Failure Notification. Then right-click the precedence constraint and click Failure.
• Run the package and view the results, then stop debugging.
• Click the sequence container and press F4. Then in the Properties pane, set the Disable property to True.
• Run the package again and note that neither of the tasks in the sequence container is executed. Then stop debugging and close Visual Studio.
For Loop Containers
• Implement iterative control flow
• Similar to a C# For loop
• Initialization expression
• Evaluation expression
• Iteration expression
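[Figure: A For Loop container with an iterator variable (Count). Initialization expression: @Count = 0. Evaluation expression: @Count < 4. Iteration expression: @Count = @Count + 1. While the evaluation is true (Yes), the task in the loop runs; when it is false (No), the loop ends.]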
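The three For Loop expressions map directly onto an ordinary counting loop. The following Python sketch mirrors the example values from the slide (start at 0, continue while less than 4, add 1 each time); the function name and the task callback are hypothetical stand-ins for the tasks inside the container.

```python
from typing import Callable, List


def for_loop_container(task: Callable[[int], None]) -> int:
    """Run `task` once per iteration and return the final counter value."""
    count = 0                # initialization expression: @Count = 0
    while count < 4:         # evaluation expression:     @Count < 4
        task(count)          # the task(s) inside the container
        count = count + 1    # iteration expression:      @Count = @Count + 1
    return count


iterations: List[int] = []
final_count = for_loop_container(iterations.append)
```

The evaluation expression is checked before each pass, so a loop whose condition is false at the start runs its tasks zero times.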
Demonstration: Using a For Loop Container
In this demonstration, you will see how to:
• Use a For Loop Container
Foreach Loop Containers
Iterate through an enumerated collection:
• ADO: rows in a recordset
• ADO.NET Schema Rowset: objects in a database schema
• File: files in a folder
• Variable: elements in an array variable
• Item: enumerated property values of an item
• Nodelist: nodes in an XML document
• SMO: SQL Server Management Objects
[Figure: A Foreach Loop container running a task, with an enumerator variable (for example, file name)]
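The File enumerator case can be pictured as a loop that assigns each file name in a folder to a variable and then runs the contained task. Below is a minimal Python sketch of that behavior; it is an analogy rather than SSIS code, and the function name, the temporary folder, and the file names are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def foreach_file(folder: str, task: Callable[[str], None]) -> None:
    """Call `task` once per file in `folder`, passing the name and extension."""
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            f_name = path.name  # plays the role of the enumerator variable
            task(f_name)


# Build a throwaway folder with two hypothetical files, then enumerate it.
with tempfile.TemporaryDirectory() as folder_path:
    for name in ("Demo2.txt", "Demo1.txt"):
        (Path(folder_path) / name).write_text("demo")
    processed: List[str] = []
    foreach_file(folder_path, processed.append)
```

Sorting the paths makes the order deterministic for the example; it is a choice made in this sketch, not a statement about the order in which SSIS enumerates files.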
Demonstration: Using a Foreach Loop Container
In this demonstration, you will see how to:
• Use a Foreach Loop Container
Preparation Steps
Complete the previous demonstrations in this module.
Demonstration Steps
Use a Foreach Loop Container
• Ensure you have completed the previous demonstrations in this module.
• Start Visual Studio and open the ForeachLoopContainer.sln solution in the D:\Demofiles\Mod05 folder.
• In Solution Explorer, double-click Control Flow.dtsx.
• From the SSIS Toolbox, drag a Foreach Loop Container to the control flow design surface. Then double-click the Foreach Loop Container to view the Foreach Loop Editor dialog box.
• On the Collection tab, in the Enumerator list, select Foreach File Enumerator, and in the Expressions box, click the ellipsis (…) button. Then in the Property Expressions Editor dialog box, in the Property list, select Directory and in the Expression box click the ellipsis (…) button.
• In the Expression Builder dialog box, expand the Variables and Parameters folder and drag the $Project::folderPath parameter to the Expression box to specify that the loop should iterate through files in the folder referenced by the folderPath project parameter. Then click OK to close the Expression Builder, and click OK again to close the Property Expressions Editor.
• In the Foreach Loop Editor dialog box, on the Collection tab, in the Retrieve file name section, select Name and extension to return the file name and extension for each file the loop finds in the folder.
• In the Foreach Loop Editor dialog box, on the Variable Mappings tab, in the Variable list, select User::fName and in the Index column select 0 to assign the file name of each file found in the folder to the fName variable. Then click OK.
• Remove the precedence constraints that are connected to and from the Copy File task, and then drag the Copy File task into the Foreach Loop Container.
Lab: Implementing Control Flow in an SSIS Package
• Exercise 1: Using Tasks and Precedence in a Control Flow
• Exercise 2: Using Variables and Parameters
• Exercise 3: Using Containers
Logon Information
Virtual machine: 20463C-MIA-SQL
User name: ADVENTUREWORKS\Student
Password: Pa$$w0rd
Estimated Time: 60 minutes
Deploying and Configuring SSIS Packages
Module Overview
• Overview of SSIS Deployment
• Deploying SSIS Projects
Lesson 1: Overview of SSIS Deployment
• SSIS Deployment Models
• Package Deployment Model
• Project Deployment Model
• Deployment Model Comparison
SSIS Deployment Models
Package Deployment Model
• SSIS Packages are deployed and managed individually
Project Deployment Model
• Multiple packages are deployed in a single project
Package Deployment Model
• Storage
  • MSDB
  • File System
• Package Configurations
  • Property values to be set dynamically at run time
• Package Deployment Utility
  • Generate all required files for easier deployment
Project Deployment Model
• The SSIS catalog
  • Storage and management for SSIS projects on a SQL Server instance
• Folders
  • A hierarchical structure for organizing and securing SSIS projects
Deployment Model Comparison
Feature                 Package Deployment                   Project Deployment
Unit of deployment      Package                              Project
Storage                 File system or MSDB                  SSIS catalog
Dynamic configuration   Package configurations               Environment variables mapped to project-level parameters and connection managers
Compiled format         Multiple .dtsx files                 Single .ispac file
Troubleshooting         Configure logging for each package   SSIS catalog includes built-in reports and views
Lesson 2: Deploying SSIS Projects
• Creating an SSIS Catalog
• Environments and Variables
• Deploying an SSIS Project
• Viewing Project Execution Information
• Demonstration: Deploying an SSIS Project
Creating an SSIS Catalog
• Prerequisites
  • SQL Server 2012 or later
  • SQL CLR enabled
• Creating a catalog
  • Use SQL Server Management Studio
  • One SSIS catalog per SQL Server instance
• Catalog security
  • Folder security
  • Object security
  • Catalog encryption
  • Sensitive parameters
Environments and Variables
• Environments
  • Execution contexts for projects
• Variables
  • Environment-specific values that can be mapped to project parameters and connection manager properties at run time
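The relationship between environments, variables, and parameters can be sketched as a lookup: a parameter keeps its default unless it is mapped to an environment variable, in which case the value comes from whichever environment is chosen for the run. The Python below is a conceptual model only and does not use any SSIS catalog API; the environment names, variable names, and paths are hypothetical.

```python
from typing import Dict


def resolve_parameters(defaults: Dict[str, str],
                       references: Dict[str, str],
                       environment: Dict[str, str]) -> Dict[str, str]:
    """Return parameter values for one execution.

    `references` maps a parameter name to the environment variable that
    supplies its value; unmapped parameters keep their defaults.
    """
    resolved = dict(defaults)
    for parameter, variable in references.items():
        resolved[parameter] = environment[variable]
    return resolved


# Hypothetical project parameters, mapping, and two environments.
defaults = {"folderPath": "D:\\MyFiles\\", "ftpSrvr": "ftpsrv01"}
references = {"folderPath": "FolderPath"}
environments = {
    "Test": {"FolderPath": "D:\\TestFiles\\"},
    "Production": {"FolderPath": "E:\\ProdFiles\\"},
}

test_values = resolve_parameters(defaults, references, environments["Test"])
prod_values = resolve_parameters(defaults, references, environments["Production"])
```

The same deployed project can therefore run against different folders or servers simply by selecting a different environment at execution time.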
Deploying an SSIS Project
• Integration Services Deployment Wizard
• Visual Studio
• SQL Server Management Studio
Viewing Project Execution Information
• Integration Services Dashboard provides built-in reports
• Additional sources of information:
  • Event Handlers
  • Error Outputs
  • Logging
  • Debug Dump Files
Demonstration: Deploying an SSIS Project
In this demonstration, you will see how to:
• Configure the SSIS Environment
• Deploy an SSIS Project
• Create Environments and Variables
• Run an SSIS Package
• View Execution Information