Microsoft IT: A Case Study on Building a Highly Scalable Enterprise Application that uses Massively Parallel Processing (MPP) to Deliver High Performance and Scalability on SSIS-SQL Server 2012
SQL Server Technical Article
Summary: This paper shares the approach used to understand and determine the following:
• Using the new Data Flow transformation component, Balanced Data Distributor (BDD), in SSIS-SQL Server BIDS-2008/2012, including performance analysis.
• The specifics involved in leveraging BDD to build a highly scalable enterprise application that uses massively parallel processing (MPP) to deliver high performance and scalability on SSIS.
BDD is a simple, new Data Flow component that can have huge benefits when working with data that takes too long to load in a single Data Flow. The intention of BDD is to improve performance through multi-threading. This white paper describes the scenario and the pattern where BDD can be leveraged in SQL Server 2008/2012. It also covers the challenges and best practices to consider when using BDD to build enterprise solutions, as well as the performance analysis at different stages.
This content is suitable for developers, architects, and database administrators. It is assumed that readers of this white paper have basic knowledge of SQL Server 2008/2012/2014 and SQL Server administration.
Author: Prabhakaran Sethuraman (PRAB), Microsoft
Technical Reviewers: Prabhakaran Sethuraman (PRAB), Microsoft | Shubhra Mittal, Microsoft | Mohammed Maqsood, Microsoft | Hariharan Sethuraman, Microsoft | Mandi Ohlinger, Microsoft
Published: September 2014
Applies to: SQL Server 2008, SQL Server 2012, and SQL Server 2014
Copyright
This document is provided "as-is". Information and views expressed in this document, including URL and other Internet Web site references, can change without notice. You bear the risk of using it.
This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You can copy and use this document for your internal reference purposes.
© 2014 Microsoft. All rights reserved.
Contents
Introduction
About Balanced Data Distributor (BDD)
Balanced Data Distributor as a Solution Case Study
Context
Objectives
Current Architecture
Proposed Architecture
Steps to add BDD in SSIS tool box
Environments for performance analysis
Performance analysis before using BDD
Performance analysis after using BDD
Design Considerations
Conclusion
Benefits
Appendix
Acknowledgements
References
For more information
Introduction
There is a new Data Flow transform component available for SQL Server Integration Services (SSIS): the Balanced Data Distributor (BDD). BDD allows you to create more than one independent segment against the destination. BDD [1] provides an easy way to ramp up your usage of multi-processor and multi-core servers by introducing parallelism in the data flow of an SSIS package. This paper describes the following:
• Using the new Balanced Data Distributor (BDD) Data Flow transformation component in SSIS-SQL Server BIDS-2008/2012, including performance analysis.
• The specifics involved in leveraging BDD to build a highly scalable enterprise application that uses massively parallel processing (MPP) to deliver high performance and scalability on SSIS.
BDD [1] is a simple, new Data Flow component that can have huge benefits when working with data that takes too long to load in a single Data Flow. The intention of BDD is to improve performance through multi-threading. This white paper describes the scenario and the pattern where BDD can be leveraged in SQL Server 2008/2012. It also covers the challenges and best practices to consider when using BDD to build enterprise solutions, as well as the performance analysis at different stages.
This content is suitable for developers, architects, and database administrators. It is assumed that readers of this white paper have basic knowledge of SQL Server 2008, SQL Server 2012, SQL Server 2014, and SQL Server administration.
About Balanced Data Distributor (BDD)
The functionality of the BDD [1, 2] is very simple. It takes its input data and routes it in equal proportions to its outputs, however many there are. If you have four outputs, roughly ¼ of the input rows go to each output. Instead of routing individual rows, the BDD operates on buffers of data, so it's very efficient. The value of the BDD comes from the way modern servers work: parallelism. When there are independent segments of an SSIS data flow, SSIS can distribute the work over multiple threads. BDD provides an easy way to create independent segments.
The following diagram illustrates this. In the diagram, BDD-MarketProduct (the BDD component) takes its input data from the data flow component (Market Product) and distributes it across four independent segments, so roughly a quarter of the input rows go to each output.
The BDD [1, 2, 3] Data Flow component was developed as a way to split processing within an SSIS package to take advantage of today's multi-processor and multi-core servers. Because BDD introduces parallelism inside the data flow, it does not require running multiple copies of equivalent packages in parallel or other, harder alternatives. The BDD component is simple to use. Once it is installed, open your Data Flow and connect BDD to your flat file or OLE DB source. Then create as many output flows as you need. Each output flow should contain the same processing as the slow-moving part of the original flow. The outputs can write to an equivalent destination or be combined again using a "Union All", depending on your needs. Regardless of the number of outputs, BDD routes each pipeline buffer of data to the outputs in turn. For example, if there are three outputs, the first buffer goes to output 1, the second buffer goes to output 2, the third buffer goes to output 3, and so on until the entire input is consumed.
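The buffer-level routing described above can be sketched in a few lines of Python. This only illustrates the round-robin idea as this paper describes it, not the BDD implementation: the function name `distribute_buffers` and the representation of a pipeline buffer as a list of rows are assumptions made for the example.

```python
from itertools import cycle


def distribute_buffers(buffers, output_count):
    """Route whole buffers (not individual rows) to outputs in turn.

    Hypothetical helper for illustration: `buffers` is any iterable of
    row batches, and the return value is one list of buffers per output.
    """
    if output_count < 1:
        raise ValueError("output_count must be at least 1")
    outputs = [[] for _ in range(output_count)]
    # cycle() yields 0, 1, ..., output_count - 1, 0, 1, ... endlessly.
    for buffer, target in zip(buffers, cycle(range(output_count))):
        outputs[target].append(buffer)
    return outputs


# Seven buffers of two rows each, split across three outputs.
buffers = [[f"row{2 * i}", f"row{2 * i + 1}"] for i in range(7)]
outputs = distribute_buffers(buffers, 3)
for index, routed in enumerate(outputs, start=1):
    print(f"output {index}: {len(routed)} buffers")
```

Because whole buffers are handed out, the split is only roughly even: with seven buffers and three outputs, one output receives one more buffer than the others.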
If you run the Data Flow on a laptop, there probably won't be a speed advantage, and there may even be a speed cost. If you run the Data Flow on a server with multiple cores and many disk spindles supporting the destination database, then there may be a substantial speed advantage.
In this paper, the main focus is on the scenario and the pattern where BDD can be leveraged in SSIS 2008 and SSIS 2012, including challenges and best practices when considering BDD to build enterprise solutions.
Balanced Data Distributor as a Solution Case Study
Context
Mastering data may include data about clients and customers, employees, products, service offerings, and more. Mastering data is typically shared by multiple users and groups across an organization and stored on different systems. Different systems might pull data through replication, especially merge replication, because it supports filtering. Filtering allows applications to pull data based on their domain and regional (geographical) values.
MSIT has a mastering system in the Enterprise Service Business called Global PES (Product Entitlement System). Global PES is a centralized data warehouse that contains a variety of information on products, service offerings, and entitlements that are defined by a market's Support Policy (in most circumstances). Global PES maintains this information at the market level. Global PES pushes all new and modified data to regional MS CRM instances through merge replication. Specifically:
• Global PES publishes data through merge replication with a filter, because each regional MS CRM consumes product and services data based on its specific market.
MSIT has 6 regional MS CRM applications that consume mastering data from Global PES through merge replication. Each regional CRM extracts the data to an intermediate database using a nightly SQL Agent job, based on specific transformations and customizations at the regional level. As a result, all subscribing applications/regional CRM systems have a copy of the PES DB (created through replication), which is a redundant copy of the mastering data. The following diagram shows the context of the current architecture of Global PES and CRM integration with the local subscribers.
Figure 0: Interaction of Mastering System (Global PES) and MS CRM used within Services business
Objectives
• Perform a proof of concept to leverage BDD in SSIS to retire the duplicate copy of the mastering data using SQL Server 2012.
• Improve the performance and scalability of Extraction, Transformation, and Loading through SSIS and retire a SQL Agent job.
• Explain the improvements observed when migrating from the SQL Agent job to the ETL (Extraction, Transformation, and Load) process.
Current Architecture
The current architecture of the Global PES (Publisher/Mastering System) and the subscribing applications (Regional MS CRM) is shown in the following figure. In this scenario, merge replication with a filter replicates the data changes to the subscribing systems (Regional CRM). Administrators use the Admin tool to administer and manage the data in the mastering system. Changes made to the data are replicated to the subscribing systems immediately, because merge replication runs in continuous mode.
[Figure: Global PES replicating to the regional systems NA CRM, HKTW CRM, Korea, Japan CRM, APAC CRM, and EMEA CRM]
Figure 1: Current Architecture of Mastering and Subscribing System [Simplified Architecture]
In the current architecture, the Global PES (Mastering DB) is replicated to all subscribing applications, which creates duplicate copies of the same mastering data at the subscribing systems. The Interface DB is used by the subscribing CRM applications. Based on defined transformation and business rules, a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the Interface DB. Full data extraction is always used, and the Interface DB does not have any integrity constraints such as primary keys, check constraints, or foreign key constraints.
The idea is to retire the copy of mastering data at the regional level and to improve the performance and scalability of the SQL Agent job by replacing it with ETL. This step improves supportability and maintainability, and it reduces the infrastructure cost by not duplicating the mastering data. To accomplish this goal, we use the Balanced Data Distributor (BDD) in SSIS to redesign our architecture and take advantage of parallelism.
Proposed Architecture
In the current scenario, consuming applications see modified data once a day, after the SQL Agent job executes. The scope is therefore to use BDD to cut off replication and to transform the data directly from the mastering DB to the Interface DB through SSIS. BDD is a good fit for this problem for the following reasons:
1. The Interface DB tables do not have any constraints (they are heap tables).
2. The data load is a full load every time the SQL Agent job runs.
3. The schema/entity structure of all regions is the same.
Because there are no integrity constraints in the Interface DB, BDD is a great solution, and the data load is faster than with the traditional data flow components. Using BDD and parallelism, we can accomplish the transformation from the mastering DB to the subscribing applications.
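Because the destination tables carry no constraints and every run is a full load, each segment can append rows without coordinating with the others about order or keys. The following Python sketch shows that pattern with threads standing in for independent data flow segments. It is an illustration under stated assumptions, not the SSIS implementation: `destination`, `load_segment`, and the buffer sizes are hypothetical, and Python threads demonstrate the structure only, not the performance gain.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

destination = []           # stand-in for a heap table with no constraints
destination_lock = Lock()  # guards the shared list


def load_segment(buffers):
    """Write every buffer routed to this segment; return rows written."""
    written = 0
    for buffer in buffers:
        with destination_lock:
            destination.extend(buffer)
        written += len(buffer)
    return written


# 12 buffers of 5 rows, dealt to 4 segments in turn (round-robin).
buffers = [[(b, r) for r in range(5)] for b in range(12)]
segment_count = 4
segments = [buffers[i::segment_count] for i in range(segment_count)]

with ThreadPoolExecutor(max_workers=segment_count) as pool:
    written_per_segment = list(pool.map(load_segment, segments))

print(written_per_segment)
print(len(destination))
```

The order in which rows land in `destination` can differ from run to run, which is acceptable only because nothing downstream depends on row order or on constraints being checked during the load.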
The following figure shows the redesigned architecture using BDD.
Figure 2: SSIS is configured in the mastering system and runs against each region based on the defined schedule and input parameters
In this redesigned architecture:
• ETL is designed using BDD. There are several regional consuming applications of the mastering DB. These applications continue to consume data from the Interface DB, which is loaded through SSIS (ETL) instead of replication and a SQL Agent job.
• The Interface DB structure is similar across regions. As a result, the SSIS (ETL) package with BDD is designed so that it can be used with different regions by simply changing the input parameter values, such as Region ID and Server Name.
The following steps sync the mastering and consuming applications:
1. Create a configuration table in Global PES that stores the regional server names and region IDs. SSIS-ETL uses this data to retrieve the product and service information for each region.
   Region ID/Market ID | Server Name
   13 | NACRM
   43 | EMEACRM
   Because the data must be filtered in the same way as under merge replication, this configuration table is created and used by the SSIS package.
2. Create an SSIS package using BDD so that it automatically uses multiple threads against the same destination. We created data flows for 35 entities, as seen in the following diagram. The sample data flow for MarketProduct uses BDD and has 4 independent segments (Marketproduct-thread1 to Marketproduct-thread4) writing to the destination. With this design, data transformation from the mastering system to the subscribers is very fast, and the local copy of PES can be retired.
   Figure 3: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor) - Data Flow
3. Design the SSIS package control flow to take care of the transformation and business rules, as seen in the following diagram. In this scenario, the following occurs:
   i. Extract the Region ID and Server Name from the configuration table.
   ii. Capture the ETL process log, including the error log.
   iii. Truncate all tables in the Interface DB before loading the mastering data.
   iv. Start loading the data from Global PES to the Interface DB through BDD.
   v. Apply localization based on the regional collation.
   Figure 4: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor) - Control Flow
4. Configure the SSIS package as a job for every region in the mastering system. The job runs on a scheduled frequency.
5. The SSIS package takes the [Region ID] and [Server Name] values as input parameters from the configuration table and transforms the data against the retrieved server name.
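The sequence of steps above (look up the region in the configuration table, log, truncate, reload) can be sketched with Python's built-in `sqlite3` module standing in for SQL Server. This is a minimal illustration, not the package itself: the table and column names (`Configuration`, `RegionId`, `ServerName`, `ProductName`), the sample product rows, and the treatment of 13 and 43 as region IDs are assumptions. The localization step and the parallel BDD load are left out.

```python
import sqlite3

# Stand-in for Global PES: configuration plus one sample entity.
source = sqlite3.connect(":memory:")
source.executescript(
    """
    CREATE TABLE Configuration (RegionId INTEGER, ServerName TEXT);
    INSERT INTO Configuration VALUES (13, 'NACRM'), (43, 'EMEACRM');
    CREATE TABLE MarketProduct (RegionId INTEGER, ProductName TEXT);
    INSERT INTO MarketProduct VALUES
        (13, 'Product A'), (13, 'Product B'), (43, 'Product C');
    """
)

# One stand-in Interface DB per regional server name (no constraints).
interfaces = {}
for (name,) in source.execute("SELECT ServerName FROM Configuration").fetchall():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE MarketProduct (ProductName TEXT)")
    interfaces[name] = db


def run_region_load(region_id, log):
    """Full reload of one region's Interface DB; returns rows loaded."""
    # i. Look up the target server for this region.
    (server_name,) = source.execute(
        "SELECT ServerName FROM Configuration WHERE RegionId = ?", (region_id,)
    ).fetchone()
    target = interfaces[server_name]
    # ii. Record progress in a simple in-memory log.
    log.append(f"start {server_name}")
    # iii. Empty the table first (SQLite has no TRUNCATE statement).
    target.execute("DELETE FROM MarketProduct")
    # iv. Copy only this region's rows from the mastering data.
    rows = source.execute(
        "SELECT ProductName FROM MarketProduct WHERE RegionId = ?", (region_id,)
    ).fetchall()
    target.executemany("INSERT INTO MarketProduct VALUES (?)", rows)
    target.commit()
    log.append(f"loaded {len(rows)} rows into {server_name}")
    return len(rows)


log = []
loaded = {region: run_region_load(region, log) for region in (13, 43)}
print(loaded)
```

Truncating before each load keeps the job repeatable: running it twice for the same region leaves the same rows in the Interface DB rather than duplicating them.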
Steps to add BDD in SSIS tool box
Step 1: Run BalancedDataDistributor-x86.exe to install the BDD component in the Toolbox.
Step 2: Once installed, right-click on the Data Flow Transformation and select Choose Items.
Step 3: In Choose Items, select the SSIS Data Flow Items tab, select Balanced Data Distributor, and select OK.
Balanced Data Distributor (BDD) is added.
Environments for performance analysis
The performance analyses completed throughout the migration process use the following environments:
System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Global PES (Mastering System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLGCPESSQL09 | 16 GB | 500 GB | 2.35 GHz
Table 1: Global PES SQL Server versions and complete system configurations
System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Local PES (CRM System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLCLFYCRM01 | 16 GB | 900 GB | 2.35 GHz
Table 2: CRM System (Local PES) SQL Server versions and complete system configurations
The performance of SSIS BDD versus the SQL Agent job is evaluated in two parts:
1. Performance analysis of the existing solution (SQL Agent job without SSIS-BDD)
2. Performance analysis of the new solution leveraging SSIS-BDD
Performance analysis before using BDD
Global PES (Mastering DB) is replicated across the CRM systems, which creates a local copy of the PES in each regional system. Existing Global PES implementation:
DB Name | Description | DB Size
Global PES | Global PES DB (Mastering DB) | 75 GB
Table 3: Global PES Database Size (Mastering System)
The regional CRM system subscribes to Global PES so that the local PES DB copy is created. A nightly feed SQL Agent job extracts data from the local PES DB and inserts it into the Interface DB based on the defined transformation and business rules. Local PES implementation:
DB Name | Description | DB Size
Local PES | Local PES DB (Local Copy of Mastering DB in CRM) | 75 GB
Interface DB (Populated through Nightly Feed Job) | Pulls data to an intermediate database using a nightly SQL Agent job based on specific transformations and customizations at the regional level | 8 GB
Table 4: CRM System (Local PES) Local PES and Interface DB size
The following table lists the performance numbers of the existing implementation without using SSIS-BDD (Balanced Data Distributor):
Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region
Merge Replication from Global PES to CRM | Global PES to CRM System through Replication for creating Local PES copy | 75 GB | Continuous Replication | 75 GB | Continuous Replication
Nightly Feed Job in CRM | To populate Interface DB | 7 GB (around 20 million for some of the key tables) | 35 minutes | 7 GB (around 20 million for some of the key tables) | 30 minutes
Table 5: Performance in minutes for every run in regional CRM systems
As you can see, performance differs from region to region, because the size of the data varies with the filtering, which is defined according to regional requirements.
Performance analysis after using BDD
The following table lists the performance numbers of data extraction by leveraging SSIS-BDD. This SSIS-BDD implementation shows an improvement of about 220% across the regional runs.
Version | State | Rows ret | NA Region | EMEA Region
SSIS (ETL) through BDD | Replication has been retired and Local PES copy creation has been eliminated. This saved around 75 GB storage | | Retired | Retired
SSIS (ETL) through BDD | Nightly Feed Job to populate Interface DB has been retired through SSIS BDD | 7-8 GB (around 20 million for some of the key tables) | 15 minutes | 13 minutes
The following chart shows the execution time (in minutes).
[Chart: SSIS BDD vs SQL Agent Job. X-axis: Region; Y-axis: Minutes. NA Region (Interface Feed Job): 35 before BDD, 15 after BDD. EMEA Region (Interface Feed Job): 30 before BDD, 13 after BDD.]
Figure 5: Performance (in minutes) of the SSIS-ETL run in different regions
The following chart shows the database size (in GB).
[Chart: Local PES Size, Before BDD (Replication) vs After BDD (No Replication). X-axis: Region; Y-axis: DB size (GB). NA Region (Local PES DB) and EMEA Region (Local PES DB): 75 GB each before BDD.]
Figure 6: Database size after the implementation of SSIS-ETL run in different regions
By leveraging SSIS-BDD, we can retire the local copy of the PES database. Direct data extraction from Global PES to the Interface DB takes less time and shows a performance improvement of about 220%. This is a significant win for large data loads. When taking this solution forward, consider the trade-off between OLTP and integration workloads so that BDD continues to yield good performance.
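The nightly feed run times reported in this section can be turned into per-region ratios with a few lines of Python. This is a worked calculation added for illustration, not part of the original analysis. The paper does not state how its headline improvement figure was derived, so the two measures below (speedup ratio and share of run time saved) are assumptions about reasonable ways to express the gain.

```python
# Nightly feed run times in minutes, taken from the tables in this paper.
run_times = {
    "NA": {"before": 35, "after": 15},
    "EMEA": {"before": 30, "after": 13},
}

# Speedup ratio: how many times faster the new run is.
speedups = {
    region: minutes["before"] / minutes["after"]
    for region, minutes in run_times.items()
}
# Share of the original run time that is no longer needed.
time_saved = {
    region: 1 - minutes["after"] / minutes["before"]
    for region, minutes in run_times.items()
}

for region in run_times:
    print(
        f"{region}: {speedups[region]:.2f}x faster, "
        f"{time_saved[region]:.0%} less run time"
    )
```

Under these definitions both regions come out at a speedup of roughly 2.3, which corresponds to a little under 60% less run time per nightly load.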
Design Considerations
Using the BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing to think through these things, there can be significant benefits. Consider using BDD when the following conditions hold:
1. There is a large amount of data coming in.
2. The data can be read faster than the rest of the data flow can process it, either because there is significant transformation work to do or because the destination is the bottleneck. If the destination is the bottleneck, it must be parallelizable.
3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted, don't split it up using BDD.
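The three considerations above can be written down as a simple checklist. The sketch below is one hypothetical way to encode them in Python. The class `DataFlowProfile`, its field names, and the function `bdd_is_worth_considering` are all assumptions made for illustration, and the result is a rule of thumb, not a substitute for measuring the data flow.

```python
from dataclasses import dataclass


@dataclass
class DataFlowProfile:
    """Hypothetical description of a data flow; all fields are assumptions."""
    large_input: bool
    source_outpaces_downstream: bool
    bottleneck_is_destination: bool
    destination_parallelizable: bool
    rows_must_stay_ordered: bool


def bdd_is_worth_considering(profile):
    """Apply the three design considerations as a yes/no checklist."""
    # 1 and 2: plenty of data, and the source is not the limiting factor.
    if not profile.large_input or not profile.source_outpaces_downstream:
        return False
    # 2 (continued): a bottlenecked destination must accept parallel writes.
    if profile.bottleneck_is_destination and not profile.destination_parallelizable:
        return False
    # 3: splitting the stream would break any required row order.
    return not profile.rows_must_stay_ordered


heap_load = DataFlowProfile(True, True, True, True, False)
sorted_feed = DataFlowProfile(True, True, False, False, True)
print(bdd_is_worth_considering(heap_load))
print(bdd_is_worth_considering(sorted_feed))
```

The first profile resembles the full load into unconstrained tables described in this case study; the second represents a feed whose rows must stay sorted.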
Conclusion
We compared the performance results before and after the redesign, and the results are positive. BDD is an effective solution for transforming data from the mastering system to the consuming applications while reducing the number of copies of the mastering data.
• After leveraging BDD, the local copy of the database has been retired.
• Interface job performance has improved. An average gain of about 220% was achieved.
Benefits
• Increased performance by around 220%
• Reduced infrastructure cost by retiring the local DB copy
• Decommissioned the Local PES DB (6 copies, each around 75 GB)
• Reusable solution for future requirements
• Easier supportability and maintainability
• Freedom from replication issues
• Increased CSAT and CPE
Appendix
Acknowledgements
Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed Maqsood. We would like to acknowledge the leadership and support provided by Hariharan Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking efforts to make useful information available anytime, anywhere.
References
[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS", in SQL Server Technical Article
[2] SSIS Blog, Balance Data Distributor, "SQL Server Blog", in Microsoft TechNet Blogs
[3] Steve Wake, Balanced Data Distributor (BDD) for SSIS, "BI Blog"
For more information
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
Copyright
This document is provided ldquoas-isrdquo Information and views expressed in this document including URL and other Internet Web site references can change without notice You bear the risk of using it
This document does not provide you with any legal rights to any intellectual property in any Microsoft product You can copy and use this document for your internal reference purposes
copy 2014 Microsoft All rights reserved
ContentsContents3
Introduction4
About Balanced Data Distributor (BDD)4
Balanced Data Distributor as a Solution Case Study6
Context6
Objectives7
Current Architecture7
Proposed Architecture8
Steps to add BDD in SSIS tool box12
Environments for performance analysis13
Performance analysis before using BDD14
Performance analysis after using BDD15
Design Considerations17
Conclusion18
Benefits18
Appendix19
Acknowledgements19
References19
For more information19
Introduction
There is a new Data Flow transform component available for SQL Server Integration Services ndash Balanced
Data Distributor (BDD) BDD allows you to create more than one independent segment against the
destination BDD [1] provides an easy way to ramp up your usage of multi-processor and multi-core
servers by introducing parallelism in the data flow of an SSIS package This paper describes the
following
Using the new Balanced Data Distributor (BDD) Data Flow transformation component in SSIS-
SQL Server BIDS-20082012 including performance analysis
Understand the specifics involved when leveraging BDD to build a highly scalable enterprise
application that uses massively parallel process (MPP) to deliver high performance and scalability
on SSIS
BDD [1] is a simple and new Data flow component that can have huge benefits when working with data
that takes too long to load in a single Data Flow The intention of BDD is to improve performance through
multi-threading This white paper describes the scenario and the pattern where BDD can be leveraged in
SQL 20082012 It also includes the challenges and best practices when considering BDD to build
enterprise solutions as well as the performance analysis at different stages
This content is suitable for developers architects and database administrators It is assumed that
readers of this white paper have basic knowledge of SQL Server 2008 SQL 2012 SQL 2014 and SQL
Server administration
About Balanced Data Distributor (BDD)
The functionality of the BDD [1 2] is very simple It takes its input data and routes it in equal proportions to
its outputs however many there are If you have four outputs roughly frac14 of the input rows go to each
output Instead of routing individual rows the BDD operates on buffers of data so itrsquos very efficient The
value of the BDD comes from the way modern servers work Parallelism When there are independent
segments of an SSIS data flow SSIS can distribute the work over multiple threads BDD provides an
easy way to create independent segments The following diagram provides more of an understanding
about BDD In the diagram BDD-MarketProduct (BDD component) takes its input data from the data
flow component (Market Product) and routes it in equal proportions to its outputs however many there
are We have four independent segments roughly a quarter of the input rows go to each output Instead
of routing individual rows the BDD operates on buffers of data so itrsquos very efficient
This BDD[123] Data Flow component was developed as a way to split your processing within an SSIS
package to take advantage of todays multi-processor and multi-core servers By introducing parallelism
BDD does not require running multiple copies of equivalent packages in parallel or alternative harder
solutions The BDD component is incredibly simple to use Once installed open your Data Flow and use
BBD against your computer file or OLEDB source Then you create several output flows as you wish
From there the new output flows should be equivalent to the execution of the slow moving processes
They will write to an equivalent output or be incorporated back using a ldquoUnion Allrdquo based on your needs
Regardless of the range of outputs you utilize BDD control can do the rendering by routing every pipeline
buffer of information to the outputs For example if there are three outputs setup the first buffer goes to
output 1 the second buffer goes to output 2 the third buffer goes to output 3 and so on until the entire
input file is complete
If you run the Data Flow on a laptop there probably wonrsquot be a speed advantage and there may even be
a speed cost If you run the Data Flow on a server with multiple cores and many disk spindles supporting
the destination database then there may be a substantial speed advantage
In this paper the main focus is on the scenario and the pattern where BDD can be leveraged in SSIS
2008 and SSIS 2012 including challenges and best practices when considering BDD to build enterprise
solutions
Balanced Data Distributor as a Solution Case Study
Context
Mastering data may include data about clients and customers employees product service offerings and
more Mastering data is typically shared by multiple users and groups across an organization and stored
on different systems Different systems might be pulling data through replication especially merge
replication since filtering is enabled Filtering allows applications to pull data based on their domain and
regional (geographical) values
MSIT has a mastering system in Enterprise Service Business called Global PES (Product Entitlement
System) Global PES is a centralized data warehouse that contains a variety of information on products
service offerings and entitlements that are defined by a marketrsquos Support Policy (in most circumstances)
Global PES maintains this information at the market level Global PES pushes all new and modified data
to Regional MS CRM instances through Merge replication Specifically
Global PES is publishing data through Merge Replication with filter as regional MS CRM
consumes product and services data based on the specific market
MSIT has 6 regional MS CRM applications that consume mastering data from Global PES through merge
replication The regional CRM extracts the data to an intermediate database using a nightly SQL Agent
job based on specific transformations and customizations at the regional level So all subscribing
applicationsregional CRM have a copy of the PES DB (created through replication) which is a redundant
copy of mastering data The following diagram shows the context of the current architecture of Global
PES and CRM integration with the local subscribers
Figure 0 Interaction of Mastering System (Global PES) and MS CRM used within Services business
Objectives
Perform a proof of concept to leverage BDD in SSIS to retire the duplicate copy of the mastering
data using SQL Server 2012
Improve the performance and scalability of Extraction Transformation and Loading through SSIS
and retire a SQL Agent job
Explain the improvements observed when migrating from the SQL Agent job to the ETL
(Extraction Transformation and Load) process
Current Architecture
The current architecture of the Global PES (PublisherMastering Systems) and subscribing applications
(Regional MS CRM) is shown in the following figure In this scenario merge replication with filter
replicates the data changes to the subscribing systems [Regional CRM] Administrators use the Admin
tool to administer and manage the data in the mastering system Changes made to the data are
replicated to the subscribing systems immediately Merge replication is running in continuous mode
NA CRM
HKTW CRM KOREA
Japan CRMAPAC CRM
EMEA CRM
Global PES
Figure1 Current Architecture of Mastering and Subscribing System [Simplified Architecture]
In the current architecture the Global PES (Mastering DB) is replicated against all subscribing
applications which creates duplicates of the same mastering data copy at the subscribing systems The
Interface DB is used by the subscribing CRM applications Based on defined transformation and business
rules a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the
Interface DB Full data extraction is always used and the Interface DB does not have any integrity
constraints like Primary key Check Constraints Foreign Constraints and so on
The current idea is to retire the copy of mastering data at the regional level and improve the performance
and scalability of the SQL Agent job by replacing it with ETL This step enriches the supportability and
maintainability experience and reduces the infrastructure cost by not duplicating the mastering data To
accomplish this goal we use Balanced Data Distributor (BDD) in SSIS to redesign our architecture and
take advantage of parallelism
Proposed Architecture
In the current scenario consuming applications recognize the modified data daily after the SQL Agent job
executes So the scope is to use BDD to cut off replication and to transform the data directly from the
mastering DB to the Interface DB through SSIS BDD is the exact component and solution to solve this
problem because of the following
1 The Interface DB does not have any constraints (heap table)
2 The data load is full every time the SQL Agent job runs
3 The schemaentity structure of all regions is the same
Since there are no integrity constraints in the Interface DB BDD is a great solution Data load is faster
than the traditional data flow components Using BDD and parallelism we can accomplish the
transformation from the mastering DB to the subscribing applications
The following figure shows the redesigned architecture using BDD
Figure2 SSIS-is configured in mastering system and runs against each region based on the defined schedule and input parameters
In this redesigned architecture
ETL is designed using BDD There are several regional consuming applications of the mastering
DB These applications continue to consume data from the Interface DB which is loaded through
SSIS (ETL) instead of replications and a SQL Agent job
The Interface DB structure is similar across regions As a result the SSIS (ETL) package with
BDD is designed so it can be used with different regions by simply changing the input parameter
values like Region ID and Server Name
The following steps sync the mastering and consuming applications
1 Create a Configuration table in Global PES that stores the regional server names and region IDs
This data is used by SSIS-ETL to retrieve the product and service information based on the
regional server names and region IDs
Region IDMarket ID Server Name
13 NACRM
43 EMEACRM
As we need to filter data similar to Merge replication this configuration table is created and will be
used by the SSIS package
2. Create an SSIS package using BDD so that it automatically uses multiple threads against the same
destination. We created data flows for 35 entities, as seen in the following diagram. The sample
data flow for MarketProduct uses BDD and has 4 independent segments (Marketproduct-thread1 to
Marketproduct-thread4) that write to the destination. Data transformation from the Mastering
system to the Subscribers is now very fast, and the local copy of PES can be retired.
Figure 3: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor) - Data Flow
3. The SSIS package control flow is designed to take care of transformation and business rules, as
seen in the following diagram. In this scenario, the control flow does the following:
i. Extracts the Region ID and Server Name from the configuration table.
ii. Captures the ETL process log, including the error log.
iii. Truncates all tables in the Interface DB before loading the Mastering data.
iv. Loads the data from Global PES to the Interface DB through BDD.
v. Applies localization based on the regional collation.
Figure 4: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor) - Control Flow
4. The SSIS package is configured as a job for every region in the Mastering System. The job runs on
a scheduled frequency.
5. The SSIS package takes the [Region ID] and [Server Name] values as input parameters from the
configuration table and transforms the data against the retrieved server name.
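The steps above can be sketched as a short program. This is a minimal sketch in Python rather than an SSIS package, with in-memory stand-ins for the databases. RegionConfig, InterfaceDb, run_package and the sample GLOBAL_PES rows are hypothetical names and data introduced for illustration. The two configuration rows reuse the values shown in the configuration table, and localization (step v) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class RegionConfig:
    region_id: int
    server_name: str


@dataclass
class InterfaceDb:
    """In-memory stand-in for a regional Interface DB (heap tables only)."""
    tables: dict = field(default_factory=dict)

    def truncate_all(self):
        for rows in self.tables.values():
            rows.clear()


# Hypothetical contents of the configuration table from step 1.
CONFIG_TABLE = [RegionConfig(13, "NACRM"), RegionConfig(43, "EMEACRM")]

# Hypothetical mastering data: (region_id, entity, payload).
GLOBAL_PES = [
    (13, "MarketProduct", "product-a"),
    (13, "MarketProduct", "product-b"),
    (43, "MarketProduct", "product-c"),
]


def run_package(config, interface_db, log):
    """Mirror control-flow steps ii to iv for one region."""
    log.append(f"start region={config.region_id} server={config.server_name}")  # ii
    try:
        interface_db.truncate_all()                                  # iii
        for region_id, entity, payload in GLOBAL_PES:                # iv
            if region_id == config.region_id:                        # regional filter
                interface_db.tables.setdefault(entity, []).append(payload)
        log.append(f"done region={config.region_id}")
    except Exception as exc:
        log.append(f"error region={config.region_id}: {exc}")
        raise


log = []
servers = {cfg.server_name: InterfaceDb() for cfg in CONFIG_TABLE}
for cfg in CONFIG_TABLE:                                             # i
    run_package(cfg, servers[cfg.server_name], log)

print(servers["NACRM"].tables)    # {'MarketProduct': ['product-a', 'product-b']}
print(servers["EMEACRM"].tables)  # {'MarketProduct': ['product-c']}
```

Truncating before each load is what makes the job safe to rerun: a second run for the same region produces the same table contents instead of duplicate rows.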
Steps to add BDD in SSIS tool box
Step 1: Run BalancedDataDistributor-x86.exe to install the BDD component in the Toolbox.
Step 2: Once installed, right-click the Data Flow Transformation section and select Choose Items.
Step 3: In Choose Items, select the SSIS Data Flow Items tab, select Balanced Data Distributor, and
select OK.
Balanced Data Distributor (BDD) is added.
Environments for performance analysis
The performance analyses completed throughout the migration process use the following environments:
System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Global PES (Mastering System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLGCPESSQL09 | 16 GB | 500 GB | 2.35 GHz
Table 1: Global PES SQL Server versions and complete system configurations
System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Local PES (CRM System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLCLFYCRM01 | 16 GB | 900 GB | 2.35 GHz
Table 2: CRM System (Local PES) SQL Server versions and complete system configurations
The performance analysis of SSIS BDD versus the SQL Agent job is presented in two parts:
1. Performance analysis of the existing solution (SQL Agent job without SSIS-BDD)
2. Performance analysis of the new solution leveraging SSIS-BDD
Performance analysis before using BDD
Global PES (Mastering DB) is replicated across the CRM systems, which creates a local copy of PES in the
regional systems. Existing Global PES implementation:
DB Name | Description | DB Size
Global PES | Global PES DB (Mastering DB) | 75 GB
Table 3: Global PES database size (Mastering System)
The regional CRM system subscribes to Global PES so that the local PES DB copy is created. A nightly
feed SQL Agent job extracts data from the local PES DB and inserts it into the Interface DB based on the
defined transformation and business rules. Local PES implementation:
DB Name | Description | DB Size
Local PES | Local PES DB (local copy of Mastering DB in CRM) | 75 GB
Interface DB (populated through nightly feed job) | Pulls data to an intermediate database using a nightly SQL Agent job based on specific transformations and customizations at the regional level | 8 GB
Table 4: CRM System (Local PES) Local PES and Interface DB size
The following table lists the performance numbers of the existing implementation without using
SSIS-BDD (Balanced Data Distributor):
Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region
Merge Replication from Global PES to CRM | Global PES to CRM System through replication for creating Local PES copy | 75 GB | Continuous replication | 75 GB | Continuous replication
Nightly Feed Job in CRM | To populate Interface DB | 7 GB (around 20 million rows for some of the key tables) | 35 minutes | 7 GB (around 20 million rows for some of the key tables) | 30 minutes
Table 5: Performance in minutes for every run in regional CRM systems
As you can see, performance differs from region to region because the size of the data varies with the
filtering, which is defined according to regional requirements.
Performance analysis after using BDD
The following table lists the performance numbers for data extraction using SSIS-BDD. This SSIS-BDD
implementation shows a 220% improvement across the regional runs.
Version | State | Rows ret | NA Region | EMEA Region
SSIS (ETL) through BDD | Replication has been retired and Local PES copy creation has been eliminated | This saved around 75 GB storage | Retired | Retired
SSIS (ETL) through BDD | Nightly Feed Job to populate Interface DB has been retired through SSIS BDD | 7-8 GB (around 20 million rows for some of the key tables) | 15 minutes | 13 minutes
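The paper does not state how the overall improvement figure was derived. As a cross-check, the short calculation below uses only the run times reported in Table 5 and in the table above. The variable names are illustrative, and the three measures shown (speedup factor, minutes saved, percentage reduction in run time) are one possible way to read the numbers, not necessarily the authors' method.

```python
# Run times in minutes, taken from Table 5 (before) and the table above (after).
before = {"NA": 35, "EMEA": 30}
after = {"NA": 15, "EMEA": 13}

for region in before:
    speedup = before[region] / after[region]
    saved = before[region] - after[region]
    reduction = 100 * saved / before[region]
    print(f"{region}: {speedup:.2f}x faster, {saved} min saved, {reduction:.0f}% less time")

# NA: 2.33x faster, 20 min saved, 57% less time
# EMEA: 2.31x faster, 17 min saved, 57% less time
```

Both regions come out at a little over twice as fast, which is in line with the improvement the paper reports.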
The following figure shows the execution time (in minutes).
[Chart: SSIS BDD vs SQL Agent Job. X axis: Region. Y axis: Minutes. NA Region (Interface Feed Job): 35 before BDD, 15 after BDD. EMEA Region (Interface Feed Job): 30 before BDD, 13 after BDD.]
Figure 5: Performance (in minutes) of the SSIS-ETL run in different regions
The following figure shows the database size (in GB).
[Chart: Local PES Size, Before BDD (Replication) vs After BDD (No Replication). X axis: Region. Y axis: DB Size (GB). NA Region (Local PES DB) and EMEA Region (Local PES DB): 75 GB each before BDD; the local copy is retired after BDD.]
Figure 6: Database size after the implementation of SSIS-ETL run in different regions
By leveraging SSIS-BDD, we can retire the local copy of the PES database. Direct data extraction from
Global PES to the Interface DB takes less time and shows a 220% performance improvement. This is a
significant win for large data loads. Going forward, the trade-off between OLTP workload and
integration workload must be considered to keep getting good performance from BDD.
Design Considerations
Using the BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of
your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing
to think through these things, there can be significant benefits. Consider using BDD when the following
conditions hold:
1. There is a large amount of data coming in.
2. The data can be read faster than the rest of the data flow can process it, either because there is
significant transformation work to do or because the destination is the bottleneck. If the
destination is the bottleneck, it must be parallelizable.
3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted,
don't split it up using BDD.
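The three considerations above can be written down as a simple checklist. The sketch below is only an illustration: DataFlowProfile, bdd_is_worth_trying, and the threshold of one million rows for "a large amount of data" are assumptions made for this example, not values taken from the paper or from SSIS. The profile used for the interface feed is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataFlowProfile:
    rows_per_run: int
    bottleneck: str                  # "source", "transform", or "destination"
    destination_parallelizable: bool
    order_dependent: bool


def bdd_is_worth_trying(p, large_row_threshold=1_000_000):
    """Return (decision, reasons against) for the three considerations above."""
    reasons = []
    if p.rows_per_run < large_row_threshold:
        reasons.append("data volume is small")
    if p.bottleneck == "source":
        reasons.append("reading the source is the bottleneck")
    if p.bottleneck == "destination" and not p.destination_parallelizable:
        reasons.append("destination is the bottleneck but cannot be loaded in parallel")
    if p.order_dependent:
        reasons.append("rows must stay in order")
    return (not reasons, reasons)


# Hypothetical profile: large full load into unconstrained heap tables.
interface_feed = DataFlowProfile(20_000_000, "destination", True, False)
print(bdd_is_worth_trying(interface_feed))   # (True, [])

# Same load, but the rows must stay sorted.
sorted_feed = DataFlowProfile(20_000_000, "destination", True, True)
print(bdd_is_worth_trying(sorted_feed))      # (False, ['rows must stay in order'])
```

A checklist like this only narrows the candidates. Whether BDD actually helps still has to be measured on the target hardware.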
Conclusion
We compared the performance results before and after the redesign, and the results are positive. BDD is
an effective solution for transforming data from the Mastering system to consuming applications, and it
reduces the number of redundant copies held by the consumers.
• After leveraging BDD, the local copy of the database has been retired.
• Interface job performance has improved. An average gain of 220% was achieved.
Benefits
• Increased performance by around 220%
• Reduced infrastructure cost by retiring the local DB copy
• Decommissioning of the Local PES DB (6 copies, each around 75 GB)
• Reusable solution for future requirements
• Easier supportability and maintainability
• Freedom from replication issues
• Increased CSAT and CPE
Appendix
Acknowledgements
Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed
Maqsood. We would like to acknowledge the leadership and support provided by Hariharan
Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking
efforts to make useful information available anytime, anywhere.
References
[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS," in SQL Server Technical Article
[2] SSIS Blog, Balanced Data Distributor, "SQL Server Blog," in Microsoft TechNet Blogs
[3] Steve Wake, Balanced Data Distributor (BDD) for SSIS, "BI Blog"
For more information
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
Did this paper help you? Please provide feedback. On a scale of 1 (poor) to 5 (excellent), how would you
rate this paper, and why have you given this rating? For example:
• Are you rating it high due to having good examples, excellent screen shots, clear writing, or
another reason?
• Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback helps improve the quality of white papers released.
Introduction
There is a new Data Flow transform component available for SQL Server Integration Services ndash Balanced
Data Distributor (BDD) BDD allows you to create more than one independent segment against the
destination BDD [1] provides an easy way to ramp up your usage of multi-processor and multi-core
servers by introducing parallelism in the data flow of an SSIS package This paper describes the
following
Using the new Balanced Data Distributor (BDD) Data Flow transformation component in SSIS-
SQL Server BIDS-20082012 including performance analysis
Understand the specifics involved when leveraging BDD to build a highly scalable enterprise
application that uses massively parallel process (MPP) to deliver high performance and scalability
on SSIS
BDD [1] is a simple and new Data flow component that can have huge benefits when working with data
that takes too long to load in a single Data Flow The intention of BDD is to improve performance through
multi-threading This white paper describes the scenario and the pattern where BDD can be leveraged in
SQL 20082012 It also includes the challenges and best practices when considering BDD to build
enterprise solutions as well as the performance analysis at different stages
This content is suitable for developers architects and database administrators It is assumed that
readers of this white paper have basic knowledge of SQL Server 2008 SQL 2012 SQL 2014 and SQL
Server administration
About Balanced Data Distributor (BDD)
The functionality of the BDD [1 2] is very simple It takes its input data and routes it in equal proportions to
its outputs however many there are If you have four outputs roughly frac14 of the input rows go to each
output Instead of routing individual rows the BDD operates on buffers of data so itrsquos very efficient The
value of the BDD comes from the way modern servers work Parallelism When there are independent
segments of an SSIS data flow SSIS can distribute the work over multiple threads BDD provides an
easy way to create independent segments The following diagram provides more of an understanding
about BDD In the diagram BDD-MarketProduct (BDD component) takes its input data from the data
flow component (Market Product) and routes it in equal proportions to its outputs however many there
are We have four independent segments roughly a quarter of the input rows go to each output Instead
of routing individual rows the BDD operates on buffers of data so itrsquos very efficient
This BDD[123] Data Flow component was developed as a way to split your processing within an SSIS
package to take advantage of todays multi-processor and multi-core servers By introducing parallelism
BDD does not require running multiple copies of equivalent packages in parallel or alternative harder
solutions The BDD component is incredibly simple to use Once installed open your Data Flow and use
BBD against your computer file or OLEDB source Then you create several output flows as you wish
From there the new output flows should be equivalent to the execution of the slow moving processes
They will write to an equivalent output or be incorporated back using a ldquoUnion Allrdquo based on your needs
Regardless of the range of outputs you utilize BDD control can do the rendering by routing every pipeline
buffer of information to the outputs For example if there are three outputs setup the first buffer goes to
output 1 the second buffer goes to output 2 the third buffer goes to output 3 and so on until the entire
input file is complete
If you run the Data Flow on a laptop, there probably won't be a speed advantage, and there may even be
a speed cost. If you run the Data Flow on a server with multiple cores and many disk spindles supporting
the destination database, there may be a substantial speed advantage.
In this paper, the main focus is on the scenario and the pattern where BDD can be leveraged in SSIS
2008 and SSIS 2012, including the challenges and best practices to consider when using BDD to build
enterprise solutions.
Balanced Data Distributor as a Solution: Case Study
Context
Mastering data may include data about clients and customers, employees, products, service offerings, and
more. Mastering data is typically shared by multiple users and groups across an organization and stored
on different systems. Different systems might pull this data through replication, especially merge
replication, because it supports filtering. Filtering allows applications to pull data based on their
domain and regional (geographical) values.
MSIT has a mastering system in the Enterprise Service Business called Global PES (Product Entitlement
System). Global PES is a centralized data warehouse that contains a variety of information on products,
service offerings, and entitlements that are defined (in most circumstances) by a market's Support Policy.
Global PES maintains this information at the market level. Global PES pushes all new and modified data
to the regional MS CRM instances through merge replication. Specifically:
• Global PES publishes data through merge replication with a filter, because each regional MS CRM
consumes product and services data based on its specific market.
MSIT has 6 regional MS CRM applications that consume mastering data from Global PES through merge
replication. Each regional CRM extracts the data to an intermediate database using a nightly SQL Agent
job, based on specific transformations and customizations at the regional level. As a result, every
subscribing application (regional CRM) has a copy of the PES DB (created through replication), which is
a redundant copy of the mastering data. The following diagram shows the context of the current
architecture of Global PES and CRM integration with the local subscribers.
Figure 0. Interaction of Mastering System (Global PES) and MS CRM used within Services business
Objectives
• Perform a proof of concept to leverage BDD in SSIS to retire the duplicate copy of the mastering
data using SQL Server 2012.
• Improve the performance and scalability of Extraction, Transformation, and Loading (ETL) through
SSIS, and retire a SQL Agent job.
• Explain the improvements observed when migrating from the SQL Agent job to the ETL process.
Current Architecture
The current architecture of Global PES (the publisher/mastering system) and the subscribing applications
(regional MS CRM) is shown in the following figure. In this scenario, merge replication with a filter
replicates the data changes to the subscribing systems (regional CRM). Administrators use the Admin
tool to administer and manage the data in the mastering system. Because merge replication runs in
continuous mode, changes made to the data are replicated to the subscribing systems immediately.
[Diagram: Global PES replicating to the regional subscribers: NA CRM, HKTW CRM, Korea, Japan CRM, APAC CRM, EMEA CRM]
Figure 1. Current Architecture of Mastering and Subscribing System [Simplified Architecture]
In the current architecture, Global PES (the mastering DB) is replicated to all subscribing
applications, which creates duplicate copies of the same mastering data at the subscribing systems. The
Interface DB is used by the subscribing CRM applications. Based on defined transformation and business
rules, a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the
Interface DB. A full data extraction is always used, and the Interface DB does not have any integrity
constraints such as primary keys, check constraints, or foreign key constraints.
The idea is to retire the copy of the mastering data at the regional level and to improve the performance
and scalability of the SQL Agent job by replacing it with ETL. This step improves supportability and
maintainability, and reduces infrastructure cost by not duplicating the mastering data. To
accomplish this goal, we use Balanced Data Distributor (BDD) in SSIS to redesign our architecture and
take advantage of parallelism.
Proposed Architecture
In the current scenario, consuming applications see the modified data once a day, after the SQL Agent job
executes. So the scope is to use BDD to cut off replication and to transform the data directly from the
mastering DB to the Interface DB through SSIS. BDD fits this problem well for the following reasons:
1. The Interface DB tables do not have any constraints (they are heap tables).
2. The data load is a full load every time the SQL Agent job runs.
3. The schema/entity structure of all regions is the same.
Because there are no integrity constraints in the Interface DB, several BDD outputs can insert into the
same destination in parallel, which makes BDD a good fit. The data load is faster than with a traditional
single-path data flow. Using BDD and parallelism, we can accomplish the transformation from the
mastering DB to the subscribing applications.
The following figure shows the redesigned architecture using BDD.
Figure 2. SSIS is configured in the mastering system and runs against each region based on the defined schedule and input parameters
In this redesigned architecture:
• ETL is designed using BDD. There are several regional consuming applications of the mastering
DB. These applications continue to consume data from the Interface DB, which is now loaded through
SSIS (ETL) instead of through replication and a SQL Agent job.
• The Interface DB structure is similar across regions. As a result, the SSIS (ETL) package with
BDD is designed so that it can be used for different regions by simply changing the input parameter
values, such as Region ID and Server Name.
The following steps sync the mastering and consuming applications:
1. Create a configuration table in Global PES that stores the regional server names and region IDs.
This data is used by SSIS-ETL to retrieve the product and service information based on the
regional server names and region IDs.
Region ID/Market ID | Server Name
13 | NACRM
43 | EMEACRM
Because we need to filter data in the same way that merge replication did, this configuration table is
created and used by the SSIS package.
2. Create an SSIS package using BDD so that it automatically uses multiple threads against the same
destination. We have created the data flow for 35 entities, as seen in the following diagram. Here
you can see the sample data flow of MarketProduct, which uses BDD and has 4 independent
segments (Marketproduct-thread1 to Marketproduct-thread4) to the destination. Data
transformation from the mastering system to the subscribers is now very fast, and the local copy of
PES can be retired.
Figure 3. SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor): Data Flow
3. The SSIS package control flow is designed to take care of transformation and business rules, as
seen in the following diagram. In this scenario, the following occurs:
i. Extract the Region ID and Server Name from the configuration table.
ii. Capture the ETL process log, including the error log.
iii. Truncate all tables in the Interface DB before loading the mastering data.
iv. Start loading the data from Global PES to the Interface DB through BDD.
v. Apply localization based on the regional collation.
Figure 4. SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor): Control Flow
4. The SSIS package is configured as a job for every region in the mastering system. The job runs on a
scheduled frequency.
5. The SSIS package takes the [Region ID] and [Server Name] values as input parameters from the
configuration table and transforms the data against the retrieved server name.
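The steps above can be sketched as a small, configuration-driven load routine. The following Python example is only an illustration of the pattern, not the SSIS package itself: it uses two in-memory SQLite databases as stand-ins for Global PES and one regional Interface DB, and every table name, column name, and function name in it is hypothetical. The localization step (v) and the parallel BDD outputs are left out to keep the sketch short.

```python
import sqlite3

# Hypothetical stand-ins: two in-memory SQLite databases play the roles of
# Global PES (mastering DB) and one regional Interface DB.
master = sqlite3.connect(":memory:")
master.executescript("""
    CREATE TABLE region_config (region_id INTEGER, server_name TEXT);
    INSERT INTO region_config VALUES (13, 'NACRM'), (43, 'EMEACRM');
    CREATE TABLE market_product (region_id INTEGER, product_id INTEGER, name TEXT);
    INSERT INTO market_product VALUES
        (13, 1, 'Product A'), (13, 2, 'Product B'), (43, 3, 'Product C');
""")
interface = sqlite3.connect(":memory:")
interface.execute("CREATE TABLE market_product (product_id INTEGER, name TEXT)")
interface.execute("INSERT INTO market_product VALUES (99, 'stale row')")


def run_region_load(master, interface, region_id):
    """Fully reload one region's interface table from the mastering data."""
    log = []
    # i. Look up the target server for this region in the configuration table.
    row = master.execute(
        "SELECT server_name FROM region_config WHERE region_id = ?", (region_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no configuration for region {region_id}")
    server_name = row[0]
    # ii. Keep a simple process log.
    log.append(f"start load for {server_name}")
    # iii. Full reload: empty the target table first.
    interface.execute("DELETE FROM market_product")
    # iv. Extract only this region's rows; this filter plays the role that
    #     the merge replication filter played before.
    rows = master.execute(
        "SELECT product_id, name FROM market_product WHERE region_id = ?",
        (region_id,),
    ).fetchall()
    interface.executemany("INSERT INTO market_product VALUES (?, ?)", rows)
    interface.commit()
    log.append(f"loaded {len(rows)} rows")
    return server_name, log


server_name, log = run_region_load(master, interface, region_id=13)
print(server_name, log)
```

In this sketch a single connection stands in for the regional server; in a real deployment the retrieved server name would select the destination connection. The point of the design is that one routine serves every region, and only the input parameter changes.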
Steps to add BDD to the SSIS Toolbox
Step 1: Run BalancedDataDistributor-x86.exe to install the BDD component.
Step 2: Once installed, right-click Data Flow Transformations in the Toolbox and select Choose Items.
Step 3: In Choose Items, select the SSIS Data Flow Items tab, select Balanced Data Distributor, and
select OK.
Balanced Data Distributor (BDD) is added to the Toolbox.
Environments for performance analysis
The performance analyses completed throughout the migration process used the following environments:
System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Global PES (Mastering System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLGCPESSQL09 | 16 GB | 500 GB | 2.35 GHz
Table 1. Global PES SQL Server versions and complete system configurations
System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Local PES (CRM System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLCLFYCRM01 | 16 GB | 900 GB | 2.35 GHz
Table 2. CRM System (Local PES) SQL Server versions and complete system configurations
The performance analysis of SSIS BDD versus the SQL Agent job is evaluated in two parts:
1. Performance analysis of the existing solution (SQL Agent job without SSIS-BDD)
2. Performance analysis of the new solution, leveraging SSIS-BDD
Performance analysis before using BDD
Global PES (the mastering DB) is replicated across the CRM systems, which creates a local copy of PES in
the regional systems. Existing Global PES implementation:
DB Name | Description | DB Size
Global PES | Global PES DB (Mastering DB) | 75 GB
Table 3. Global PES Database Size (Mastering System)
The regional CRM system subscribes to Global PES, so a local PES DB copy is created. A nightly
feed SQL Agent job extracts data from the local PES DB and inserts it into the Interface DB, based on the
defined transformation and business rules. Local PES implementation:
DB Name | Description | DB Size
Local PES | Local PES DB (local copy of Mastering DB in CRM) | 75 GB
Interface DB (populated through Nightly Feed Job) | Pulls data to an intermediate database using a nightly SQL Agent job, based on specific transformations and customizations at the regional level | 8 GB
Table 4. CRM System (Local PES): Local PES and Interface DB size
The following table lists the performance numbers of the existing implementation, without SSIS-BDD
(Balanced Data Distributor):
Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region
Merge Replication from Global PES to CRM | Global PES to CRM System through replication, creating the Local PES copy | 75 GB | Continuous Replication | 75 GB | Continuous Replication
Nightly Feed Job in CRM | To populate Interface DB | 7 GB (around 20 million rows for some of the key tables) | 35 Minutes | 7 GB (around 20 million rows for some of the key tables) | 30 Minutes
Table 5. Performance in minutes for every run in regional CRM systems
As you can see, performance differs from region to region, because the size of the data varies
with the filtering, which is defined according to each region's requirements.
Performance analysis after using BDD
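The following table lists the performance numbers of data extraction with SSIS-BDD. This SSIS-BDD
implementation shows an improvement of around 220% across the regional runs: the nightly load dropped
from 35 to 15 minutes in the NA region and from 30 to 13 minutes in the EMEA region.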
Version | State | Rows ret | NA Region | EMEA Region
SSIS (ETL) through BDD | Replication has been retired and Local PES copy creation has been eliminated | This saved around 75 GB of storage | Retired | Retired
SSIS (ETL) through BDD | Nightly Feed Job to populate Interface DB has been retired through SSIS BDD | 7-8 GB (around 20 million rows for some of the key tables) | 15 Minutes | 13 Minutes
The following chart shows the execution time (in minutes).
[Chart: SSIS BDD vs SQL Agent Job. X axis: Region. Y axis: Minutes. NA Region (Interface Feed Job): 35 before BDD, 15 after BDD. EMEA Region (Interface Feed Job): 30 before BDD, 13 after BDD.]
Figure 5. Performance (in minutes) of the SSIS-ETL run in different regions
The following chart shows the database size (in GB).
[Chart: Local PES Size, Before BDD (Replication) vs After BDD (No Replication). X axis: Region. Y axis: DB Size (GB). NA Region (Local PES DB) and EMEA Region (Local PES DB) each show 75 GB before BDD (Replication); these are the only data labels in the chart.]
Figure 6. Database size after the implementation of the SSIS-ETL run in different regions
By leveraging SSIS-BDD, we can retire the local copy of the PES database. Direct data extraction from
Global PES to the Interface DB takes less time and shows a performance improvement of around 220%. This
is a clear win where large data loads are concerned. To take the solution forward, there is a trade-off
to consider between the OLTP workload and the integration workload in order to keep getting good
performance from BDD.
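The paper does not state how its improvement figure was computed. As a hedged illustration, the short Python calculation below derives two common measures from the run times reported in the tables above (35 and 30 minutes before BDD, 15 and 13 minutes after). The variable and function names are assumptions made for this example.

```python
# Run times in minutes, taken from the before/after tables: (before, after).
runs = {"NA": (35, 15), "EMEA": (30, 13)}


def speedup(before, after):
    """How many times faster the new run is than the old one."""
    return before / after


def time_saved_pct(before, after):
    """Percentage reduction in elapsed time."""
    return 100 * (before - after) / before


for region, (before, after) in runs.items():
    print(
        f"{region}: {speedup(before, after):.2f}x as fast, "
        f"{time_saved_pct(before, after):.0f}% less run time"
    )
```

On these inputs both regions come out at a little over twice as fast, with somewhat more than half of the elapsed time saved, so the quoted figure is best read as an approximate throughput ratio rather than a reduction in run time.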
Design Considerations
Using BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of
your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing
to think through these things, there can be significant benefits. Consider using BDD when the following
conditions hold:
1. There is a large amount of data coming in.
2. The data can be read faster than the rest of the data flow can process it, either because there is
significant transformation work to do or because the destination is the bottleneck. If the
destination is the bottleneck, it must be parallelizable.
3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted,
don't split it up using BDD.
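The ordering caveat in the third condition can be demonstrated with a small Python sketch. It splits a sorted input into buffers, hands the buffers to three outputs in turn, and then merges the outputs again. Modeling the merge as a simple output-by-output concatenation is an assumption made to keep the example deterministic; in a real data flow the merged order depends on timing. The function name and sizes are hypothetical.

```python
def split_round_robin(rows, num_outputs, buffer_size):
    """Hand consecutive buffers of rows to the outputs in turn."""
    outputs = [[] for _ in range(num_outputs)]
    for start in range(0, len(rows), buffer_size):
        target = (start // buffer_size) % num_outputs
        outputs[target].extend(rows[start:start + buffer_size])
    return outputs


sorted_rows = list(range(1, 13))  # input arrives sorted: 1..12
outputs = split_round_robin(sorted_rows, num_outputs=3, buffer_size=2)

# Assumed merge: concatenate the outputs one after another.
merged = [row for output in outputs for row in output]

print("merged order:", merged)
print("still sorted:", merged == sorted_rows)
print("same rows:", sorted(merged) == sorted_rows)
```

No rows are lost or duplicated, but the original order is gone after the merge, which is why sorted data that must stay sorted is a poor candidate for this pattern.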
Conclusion
We have compared the performance results before and after the redesign, and the results are positive.
BDD is an effective way to transform data from the mastering system to the consuming applications while
also reducing the number of redundant copies held by consumers.
• After leveraging BDD, the local copy of the database has been retired.
• Interface job performance has improved. An average gain of around 220% was achieved.
Benefits
• Increased performance by around 220%
• Reduced infrastructure cost by retiring the local DB copy
• Decommissioned the Local PES DB (6 copies, each around 75 GB in size)
• Reusable solution for future requirements
• Easier supportability and maintainability
• Free from replication issues
• Increased CSAT and CPE
Appendix
Acknowledgements
Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed
Maqsood. We would like to acknowledge the leadership and support provided by Hariharan
Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking
efforts to make useful information available anytime, anywhere.
References
[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS", in SQL Server Technical Article
[2] SSIS Blog, Balanced Data Distributor, "SQL Server Blog", in Microsoft TechNet Blogs
[3] Steve Wake, Balanced Data Distributor (BDD) for SSIS, "BI Blog"
For more information
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
Did this paper help you? Please provide feedback. On a scale of 1 (poor) to 5 (excellent), how would you
rate this paper, and why have you given it this rating? For example:
• Are you rating it high due to having good examples, excellent screen shots, clear writing, or
another reason?
• Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback helps improve the quality of the white papers released.
Send Feedback
Balanced Data Distributor as a Solution Case Study
Context
Mastering data may include data about clients and customers employees product service offerings and
more Mastering data is typically shared by multiple users and groups across an organization and stored
on different systems Different systems might be pulling data through replication especially merge
replication since filtering is enabled Filtering allows applications to pull data based on their domain and
regional (geographical) values
MSIT has a mastering system in Enterprise Service Business called Global PES (Product Entitlement
System). Global PES is a centralized data warehouse that contains a variety of information on products,
service offerings, and entitlements that are defined by a market's Support Policy (in most circumstances).
Global PES maintains this information at the market level. Global PES pushes all new and modified data
to Regional MS CRM instances through merge replication. Specifically:
• Global PES publishes data through merge replication with a filter, because each regional MS CRM
consumes product and services data based on its specific market.
MSIT has 6 regional MS CRM applications that consume mastering data from Global PES through merge
replication. Each regional CRM extracts the data to an intermediate database using a nightly SQL Agent
job, based on specific transformations and customizations at the regional level. As a result, every
subscribing application (regional CRM) has a copy of the PES DB (created through replication), which is a
redundant copy of the mastering data. The following diagram shows the context of the current architecture
of Global PES and CRM integration with the local subscribers.
Figure 0: Interaction of Mastering System (Global PES) and MS CRM used within Services business
Objectives
• Perform a proof of concept to leverage BDD in SSIS to retire the duplicate copy of the mastering
data, using SQL Server 2012.
• Improve the performance and scalability of Extraction, Transformation, and Loading through SSIS,
and retire a SQL Agent job.
• Explain the improvements observed when migrating from the SQL Agent job to the ETL
(Extraction, Transformation, and Load) process.
Current Architecture
The current architecture of the Global PES (Publisher/Mastering System) and subscribing applications
(Regional MS CRM) is shown in the following figure. In this scenario, merge replication with a filter
replicates the data changes to the subscribing systems (Regional CRM). Administrators use the Admin
tool to administer and manage the data in the mastering system. Changes made to the data are
replicated to the subscribing systems immediately, because merge replication runs in continuous mode.
[Diagram labels: Global PES; NA CRM, HKTW CRM, Korea, Japan CRM, APAC CRM, EMEA CRM]
Figure 1: Current Architecture of Mastering and Subscribing System [Simplified Architecture]
In the current architecture, the Global PES (Mastering DB) is replicated to all subscribing
applications, which creates a duplicate copy of the same mastering data at each subscribing system. The
Interface DB is used by the subscribing CRM applications. Based on defined transformation and business
rules, a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the
Interface DB. Full data extraction is always used, and the Interface DB does not have any integrity
constraints such as primary keys, check constraints, or foreign keys.
The goal is to retire the copy of mastering data at the regional level and to improve the performance
and scalability of the SQL Agent job by replacing it with ETL. This step improves supportability and
maintainability and reduces the infrastructure cost by not duplicating the mastering data. To
accomplish this goal, we use Balanced Data Distributor (BDD) in SSIS to redesign our architecture and
take advantage of parallelism.
Proposed Architecture
In the current scenario, consuming applications see the modified data once a day, after the SQL Agent job
executes. So the scope is to use BDD to cut off replication and to transform the data directly from the
mastering DB to the Interface DB through SSIS. BDD fits this problem for the following reasons:
1. The Interface DB tables do not have any constraints (heap tables).
2. The data load is a full load every time the SQL Agent job runs.
3. The schema/entity structure of all regions is the same.
Because the Interface DB has no integrity constraints, several parallel paths can load the same
destination, which makes BDD a good fit, and the data load is faster than with a traditional
single-path data flow. Using BDD and parallelism, we can accomplish the transformation from the
mastering DB to the subscribing applications.
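To make the parallelism concrete, the idea can be pictured with a small simulation. This is not SSIS code and not the BDD implementation. It is a minimal Python sketch, assuming that the component hands whole buffers of rows to its outputs in turn and that each output path is drained by its own worker thread. The names make_buffers, load_worker, and distribute_buffers, and the buffer size of 1000 rows, are hypothetical and chosen only for illustration.

```python
import queue
import threading


def make_buffers(rows, buffer_size):
    """Split the source rows into fixed-size buffers (lists of rows)."""
    return [rows[i:i + buffer_size] for i in range(0, len(rows), buffer_size)]


def load_worker(inbox, destination, lock):
    """Drain one output path and append its rows to the shared heap table."""
    while True:
        buffer = inbox.get()
        if buffer is None:      # sentinel: no more buffers for this path
            break
        with lock:              # stand-in for an insert that tolerates parallel writers
            destination.extend(buffer)


def distribute_buffers(rows, outputs=4, buffer_size=1000):
    """Hand buffers to the output paths in round-robin order (assumed policy)."""
    destination = []            # models a heap table: no keys, no guaranteed order
    lock = threading.Lock()
    inboxes = [queue.Queue() for _ in range(outputs)]
    workers = [
        threading.Thread(target=load_worker, args=(inbox, destination, lock))
        for inbox in inboxes
    ]
    for worker in workers:
        worker.start()

    rows_per_output = [0] * outputs
    for n, buffer in enumerate(make_buffers(rows, buffer_size)):
        inboxes[n % outputs].put(buffer)
        rows_per_output[n % outputs] += len(buffer)

    for inbox in inboxes:
        inbox.put(None)         # tell every worker to stop
    for worker in workers:
        worker.join()
    return destination, rows_per_output


if __name__ == "__main__":
    loaded, rows_per_output = distribute_buffers(list(range(10_000)))
    print(len(loaded), rows_per_output)
```

In this model every row arrives exactly once, but the arrival order depends on thread scheduling. That is acceptable for a full load into a table with no constraints, and it is the reason data that must stay sorted is a poor candidate for this pattern.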
The following figure shows the redesigned architecture using BDD.
Figure 2: SSIS is configured in the mastering system and runs against each region based on the defined schedule and input parameters
In this redesigned architecture:
• ETL is designed using BDD. There are several regional consuming applications of the mastering
DB. These applications continue to consume data from the Interface DB, which is now loaded through
SSIS (ETL) instead of replication and a SQL Agent job.
• The Interface DB structure is similar across regions. As a result, the SSIS (ETL) package with
BDD is designed so that it can be used with different regions by simply changing the input parameter
values, such as Region ID and Server Name.
The following steps sync the mastering and consuming applications:
1. Create a Configuration table in Global PES that stores the regional server names and region IDs.
This data is used by SSIS-ETL to retrieve the product and service information based on the
regional server names and region IDs.
   Region ID/Market ID    Server Name
   13                     NACRM
   43                     EMEACRM
Because we need to filter data in the same way that merge replication did, this configuration table
is created and used by the SSIS package.
2. Create an SSIS package using BDD so that it automatically uses multiple threads against the same
destination. We have created the data flow for 35 entities, as seen in the following diagram. Here
you can see the sample data flow of MarketProduct, which uses BDD and has 4 independent
segments (Marketproduct-thread1 to Marketproduct-thread4) to the destination. Data
transformation from the Mastering system to the Subscribers is now much faster, and the Local copy
of PES can be retired.
Figure 3: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor) - Data Flow
3. The SSIS package control flow is designed to take care of transformation and business rules, as
seen in the following diagram. In this scenario, the following occurs:
   i. Extract the Region ID and Server Name from the configuration table.
   ii. Capture the ETL process log, including the error log.
   iii. Truncate all tables in the Interface DB before loading the Mastering data.
   iv. Start loading the data from Global PES to the Interface DB through BDD.
   v. Apply localization based on the regional collation.
Figure 4: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor) - Control Flow
4. The SSIS package is configured as a job for every region in the Mastering System. The job runs on a
scheduled frequency.
5. The SSIS package takes the [Region ID] and [Server Name] values as input parameters from the
configuration table and transforms the data against the retrieved server name.
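The control-flow steps above can be sketched end to end. The following is a minimal Python model, not the SSIS package itself. It uses in-memory SQLite databases as stand-ins for Global PES and the Interface DB, and the table and column names (RegionConfig, MarketProduct, RegionId, ProductName) and the sample product rows are hypothetical. Only the region IDs and server names are taken from the configuration table in step 1. The parallel BDD paths and the localization step (v) are left out to keep the sketch short.

```python
import sqlite3


def build_global_pes():
    """Create a stand-in for Global PES: a configuration table and one entity."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE RegionConfig (RegionId INTEGER, ServerName TEXT)")
    db.executemany(
        "INSERT INTO RegionConfig VALUES (?, ?)",
        [(13, "NACRM"), (43, "EMEACRM")],
    )
    db.execute("CREATE TABLE MarketProduct (RegionId INTEGER, ProductName TEXT)")
    db.executemany(
        "INSERT INTO MarketProduct VALUES (?, ?)",
        [(13, "Product A"), (13, "Product B"), (43, "Product C")],
    )
    return db


def build_interface_db():
    """Create a stand-in Interface DB table with no keys or constraints."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE MarketProduct (RegionId INTEGER, ProductName TEXT)")
    return db


def run_region_load(global_pes, interface_db, server_name, log):
    """Mirror control-flow steps i to iv for one region and return the row count."""
    # i. Look up the region ID for the target server in the configuration table.
    row = global_pes.execute(
        "SELECT RegionId FROM RegionConfig WHERE ServerName = ?", (server_name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no configuration row for {server_name}")
    region_id = row[0]

    # ii. Record progress; a real package would also capture errors here.
    log.append(f"start {server_name} (region {region_id})")

    # iii. Empty the target table. SQLite has no TRUNCATE, so DELETE stands in.
    interface_db.execute("DELETE FROM MarketProduct")

    # iv. Full load of only the rows that belong to this region.
    rows = global_pes.execute(
        "SELECT RegionId, ProductName FROM MarketProduct WHERE RegionId = ?",
        (region_id,),
    ).fetchall()
    interface_db.executemany("INSERT INTO MarketProduct VALUES (?, ?)", rows)
    interface_db.commit()
    log.append(f"loaded {len(rows)} rows")
    return len(rows)
```

Because the target table is emptied before every load, running the same region twice leaves the same rows in place rather than doubling them, which matches the full-load behavior described above.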
Steps to add BDD to the SSIS Toolbox
Step 1: Run BalancedDataDistributor-x86.exe to install the BDD component.
Step 2: Once installed, right-click Data Flow Transformation in the Toolbox and select Choose Items.
Step 3: In Choose Items, select the SSIS Data Flow Items tab, select Balanced Data Distributor, and
select OK.
Balanced Data Distributor (BDD) is added to the Toolbox.
Environments for performance analysis
The performance analyses completed throughout the migration process use the following environments.
System: Global PES (Mastering System)
Technology Stack: SQL Server 2012 R2 & Windows Server 2012 R2
Server Name: AGLGCPESSQL09
RAM: 16 GB
Secondary Memory: 500 GB
Processor Speed: 2.35 GHz
Table 1: Global PES SQL Server versions and complete system configurations
System: Local PES (CRM System)
Technology Stack: SQL Server 2012 R2 & Windows Server 2012 R2
Server Name: AGLCLFYCRM01
RAM: 16 GB
Secondary Memory: 900 GB
Processor Speed: 2.35 GHz
Table 2: CRM System (Local PES) SQL Server versions and complete system configurations
The performance of SSIS BDD versus the SQL Agent job is evaluated in two parts:
1. Performance analysis of the existing solution (SQL Agent job without SSIS-BDD)
2. Performance analysis of the new solution that leverages SSIS-BDD
Performance analysis before using BDD
Global PES (Mastering DB) is replicated across CRM systems and creates a Local copy of the PES in the
regional systems. Existing Global PES implementation:
DB Name: Global PES
Description: Global PES DB (Mastering DB)
DB Size: 75 GB
Table 3: Global PES Database Size (Mastering System)
The Regional CRM system subscribes to Global PES so that the local PES DB copy is created. A nightly
feed SQL Agent job extracts data from the local PES DB and inserts it into the Interface DB based on the
defined transformation and business rules. Local PES implementation:
DB Name: Local PES
Description: Local PES DB (Local Copy of Mastering DB in CRM)
DB Size: 75 GB
DB Name: Interface DB (Populated through Nightly Feed Job)
Description: Pulls data to an intermediate database using a nightly SQL Agent job based on specific transformations and customizations at the regional level
DB Size: 8 GB
Table 4: CRM System (Local PES) Local PES and Interface DB size
The following table lists the performance numbers of the existing implementation without using
SSIS-BDD (Balanced Data Distributor).
Version: Merge Replication from Global PES to CRM
State: Global PES to CRM System through Replication for creating Local PES copy
DB Size (NA Region): 75 GB
NA Region: Continuous Replication
DB Size (EMEA Region): 75 GB
EMEA Region: Continuous Replication
Version: Nightly Feed Job in CRM
State: To populate Interface DB
DB Size (NA Region): 7 GB (around 20 million rows for some of the key tables)
NA Region: 35 minutes
DB Size (EMEA Region): 7 GB (around 20 million rows for some of the key tables)
EMEA Region: 30 minutes
Table 5: Performance in minutes for every run in regional CRM systems
As you can see, performance differs from region to region because the size of the data varies
with the filtering, which is defined according to regional requirements.
Performance analysis after using BDD
The following table lists the performance numbers for data extraction with SSIS-BDD. This
SSIS-BDD implementation shows a 220% improvement across the regional runs.
Version: SSIS (ETL) through BDD
State: Replication has been retired and Local PES copy creation has been eliminated
Rows ret: This saved around 75 GB storage
NA Region: Retired
EMEA Region: Retired
Version: SSIS (ETL) through BDD
State: Nightly Feed Job to populate Interface DB has been retired through SSIS BDD
Rows ret: 7-8 GB (around 20 million rows for some of the key tables)
NA Region: 15 minutes
EMEA Region: 13 minutes
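The before and after numbers can be compared with a few lines of arithmetic. The sketch below is an illustration only. It uses the run times and sizes reported in this paper, while the function names and the two ways of expressing the gain (a speedup ratio and a percentage of run time saved) are assumptions, because the paper does not state how its improvement figure was derived.

```python
# Run times in minutes, copied from the before-BDD and after-BDD tables in this paper.
RUN_TIMES = {
    "NA": {"before": 35, "after": 15},
    "EMEA": {"before": 30, "after": 13},
}
LOCAL_PES_COPY_GB = 75   # approximate size of one Local PES copy, per the paper
REGIONAL_COPIES = 6      # number of regional CRM subscribers, per the paper


def speedup(before, after):
    """Ratio of old to new run time; 2.0 means the job finishes twice as fast."""
    return before / after


def time_saved_percent(before, after):
    """Share of the original run time that is no longer needed."""
    return (before - after) / before * 100


def storage_freed_gb(copies, size_gb):
    """Total storage released when every regional copy is decommissioned."""
    return copies * size_gb


def summarize(run_times):
    """Return one summary line per region."""
    lines = []
    for region, t in run_times.items():
        lines.append(
            f"{region}: {speedup(t['before'], t['after']):.2f}x faster, "
            f"{time_saved_percent(t['before'], t['after']):.0f}% less run time"
        )
    return lines


if __name__ == "__main__":
    for line in summarize(RUN_TIMES):
        print(line)
    print(f"Storage freed: {storage_freed_gb(REGIONAL_COPIES, LOCAL_PES_COPY_GB)} GB")
```

With the reported numbers, both regions come out at roughly 2.3 times faster, or about 57% less run time, and retiring six copies of about 75 GB each frees about 450 GB in total.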
The following chart shows the execution time (in minutes).
[Chart: SSIS BDD vs SQL Agent Job. X axis: Region. Y axis: Minutes. NA Region (Interface Feed Job): 35 before BDD, 15 after BDD. EMEA Region (Interface Feed Job): 30 before BDD, 13 after BDD.]
Figure 5: Performance (in minutes) of the SSIS-ETL run in different regions
The following chart shows the database size (in GB).
[Chart: Local PES Size, Before BDD (Replication) vs After BDD (No Replication). X axis: Region. Y axis: DB Size (GB). NA Region (Local PES DB) and EMEA Region (Local PES DB): 75 GB each before BDD.]
Figure 6: Database size after the implementation of the SSIS-ETL run in different regions
By leveraging SSIS-BDD, we can retire the local copy of the PES database. Direct data extraction from
Global PES to the Interface DB takes less time and shows a 220% performance improvement, which is a
significant win for large data loads. When taking the solution forward, weigh the trade-off between
the OLTP and integration workloads so that leveraging BDD continues to yield good performance.
Design Considerations
Using BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of
your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing
to think through these things, there can be significant benefits. Consider using BDD when the following
conditions hold:
1. There is a large amount of data coming in.
2. The data can be read faster than the rest of the data flow can process it, either because there is
significant transformation work to do or because the destination is the bottleneck. If the
destination is the bottleneck, it must be parallelizable.
3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted,
don't split it up using BDD.
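The three conditions above can be turned into a simple pre-check. The sketch below is illustrative only. The DataFlowProfile fields, the bdd_concerns function, and the one-million-row threshold for "a large amount of data" are assumptions introduced here and do not come from the paper or from SSIS.

```python
from dataclasses import dataclass


@dataclass
class DataFlowProfile:
    """Hypothetical measurements for one data flow; all field names are illustrative."""
    row_count: int
    source_rows_per_sec: float
    downstream_rows_per_sec: float
    destination_is_bottleneck: bool
    destination_parallelizable: bool
    order_dependent: bool


def bdd_concerns(profile, large_row_count=1_000_000):
    """Return reasons BDD is unlikely to help; an empty list means it is worth testing."""
    concerns = []
    # Condition 1: there should be a large amount of incoming data (threshold is assumed).
    if profile.row_count < large_row_count:
        concerns.append("data volume is small")
    # Condition 2: the source must outpace the rest of the data flow.
    if profile.source_rows_per_sec <= profile.downstream_rows_per_sec:
        concerns.append("source is not faster than the rest of the data flow")
    # Condition 2, continued: a bottleneck destination must accept parallel loads.
    if profile.destination_is_bottleneck and not profile.destination_parallelizable:
        concerns.append("destination is the bottleneck but cannot be loaded in parallel")
    # Condition 3: rows must not depend on arrival order.
    if profile.order_dependent:
        concerns.append("rows must stay in order")
    return concerns
```

A full load of a large entity into an unconstrained table, as described in this case study, would return an empty list, while a sorted feed would be flagged by the last check.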
Conclusion
We have compared the performance results before and after the redesign, and the results are positive. BDD
is a clear win for transforming data from the Mastering system to the consuming applications, and it
reduces the number of redundant copies of the mastering data.
• After leveraging BDD, the local copy of the database has been retired.
• Interface job performance has improved. An average gain of 220% was achieved.
Benefits
• Increased performance by around 220%
• Reduced infrastructure cost by retiring the local DB copy
• Decommissioned the Local PES DB (6 copies, each around 75 GB)
• Reusable solution for future requirements
• Easier supportability and maintainability
• Freedom from replication issues
• Increased CSAT and CPE
Appendix
Acknowledgements
Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed
Maqsood. We would like to acknowledge the leadership and support provided by Hariharan
Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking
efforts to provide useful information that is available anytime, anywhere.
Figure1 Current Architecture of Mastering and Subscribing System [Simplified Architecture]
In the current architecture the Global PES (Mastering DB) is replicated against all subscribing
applications which creates duplicates of the same mastering data copy at the subscribing systems The
Interface DB is used by the subscribing CRM applications Based on defined transformation and business
rules a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the
Interface DB Full data extraction is always used and the Interface DB does not have any integrity
constraints like Primary key Check Constraints Foreign Constraints and so on
The current idea is to retire the copy of mastering data at the regional level and improve the performance
and scalability of the SQL Agent job by replacing it with ETL This step enriches the supportability and
maintainability experience and reduces the infrastructure cost by not duplicating the mastering data To
accomplish this goal we use Balanced Data Distributor (BDD) in SSIS to redesign our architecture and
take advantage of parallelism
Proposed Architecture
In the current scenario consuming applications recognize the modified data daily after the SQL Agent job
executes So the scope is to use BDD to cut off replication and to transform the data directly from the
mastering DB to the Interface DB through SSIS BDD is the exact component and solution to solve this
problem because of the following
1 The Interface DB does not have any constraints (heap table)
2 The data load is full every time the SQL Agent job runs
3 The schemaentity structure of all regions is the same
Since there are no integrity constraints in the Interface DB BDD is a great solution Data load is faster
than the traditional data flow components Using BDD and parallelism we can accomplish the
transformation from the mastering DB to the subscribing applications
The following figure shows the redesigned architecture using BDD
Figure2 SSIS-is configured in mastering system and runs against each region based on the defined schedule and input parameters
In this redesigned architecture
ETL is designed using BDD There are several regional consuming applications of the mastering
DB These applications continue to consume data from the Interface DB which is loaded through
SSIS (ETL) instead of replications and a SQL Agent job
The Interface DB structure is similar across regions As a result the SSIS (ETL) package with
BDD is designed so it can be used with different regions by simply changing the input parameter
values like Region ID and Server Name
The following steps sync the mastering and consuming applications
1 Create a Configuration table in Global PES that stores the regional server names and region IDs
This data is used by SSIS-ETL to retrieve the product and service information based on the
regional server names and region IDs
Region IDMarket ID Server Name
13 NACRM
43 EMEACRM
As we need to filter data similar to Merge replication this configuration table is created and will be
used by the SSIS package
2 Create an SSIS package using BDD so it automatically uses multiple threads against the same
destination We have created the data flow for 35 entities as seen in the following diagram Here
you can see the sample data flow of MarketProduct which uses BDD and has 4 independent
segments (Marketproduct-thread1 to Marketproduct-thread4) to the destination Now data
transformation from the Mastering to the Subscribers is very fast and the Local copy of PES can
be retired
Figure 3 SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor)-Data Flow
3 The SSIS package control flow is designed to take care of transformation and business rules as
seen in the following diagram In this scenario the following occurs
i Extract the Region ID and Server name from the configuration table
ii Capture the ETL process log including the Error log
iii Truncate all tables in the Interface DB before loading the Mastering data
iv Start loading the data from Global PES to Interface DB through BDD
v Apply localization based on the regional collation
Figure 4 SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor)-Control Flow
4 SSIS package are configured as a job in every region in the Mastering System The job runs on a
scheduled frequency
5 SSIS package take the [Region ID] and [Server Name] values as input parameters from the
configuration table and transform the data against the retrieved server name
Steps to add BDD in SSIS tool box
Step 1 Run BalancedDataDistributor-x86exe to install the BDD component in the Toolbox
Step 2 Once installed right-click on the Data Flow Transformation and select Choose Items
Step 3 In Choose Items select the SSIS Data Flow Items tab select Balanced Data Distributor and
select OK
Balanced Data Distributor (BDD) is added
Environments for performance analysis
The performance analyses completed throughout the migration process uses the following environments
System Technology Stack
Server Name RAM Secondary Memory
Processor Speed
Global PES (Mastering System)
SQL Server 2012 R2 amp Windows Server 2012 R2
AGLGCPESSQL09 16 GB 500 GB 235 GHZ
Table 1 Global PES SQL Server versions and complete system configurations
System Technology Stack
Server Name RAM Secondary Memory
Processor Speed
Local PES (CRM System)
SQL Server 2012 R2 amp Windows Server 2012 R2
AGLCLFYCRM01 16 GB 900 GB 235 GHZ
Table 2 CRM System (Local PES) SQL Server versions and complete system configurations
The performance of SSIS BDD versus the SQL Agent job is evaluated in two parts:
1. Performance analysis of the existing solution (SQL Agent job without SSIS-BDD)
2. Performance analysis of the new solution leveraging SSIS-BDD
Performance analysis before using BDD
Global PES (Mastering DB) is replicated across the CRM systems, which creates a local copy of the PES in the
regional systems.
Existing Global PES implementation:
DB Name | Description | DB Size
Global PES | Global PES DB (Mastering DB) | 75 GB
Table 3. Global PES database size (Mastering System)
The regional CRM system subscribes to Global PES so that the local PES DB copy is created. A nightly
feed SQL Agent job extracts data from the local PES DB and inserts it into the Interface DB based on the
defined transformation and business rules.
Local PES implementation:
DB Name | Description | DB Size
Local PES | Local PES DB (local copy of the Mastering DB in CRM) | 75 GB
Interface DB (populated through the nightly feed job) | Pulls data to an intermediate database using a nightly SQL Agent job, based on specific transformations and customizations at the regional level | 8 GB
Table 4. CRM System (Local PES): Local PES and Interface DB size
The following table lists the performance numbers of the existing implementation, without SSIS-BDD
(Balanced Data Distributor):
Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region
Merge Replication from Global PES to CRM | Global PES to CRM System through replication, creating the Local PES copy | 75 GB | Continuous replication | 75 GB | Continuous replication
Nightly Feed Job in CRM | To populate the Interface DB | 7 GB (around 20 million rows for some of the key tables) | 35 minutes | 7 GB (around 20 million rows for some of the key tables) | 30 minutes
Table 5. Performance (in minutes) for every run in the regional CRM systems
Performance differs from region to region because the size of the data varies with the filtering,
which is defined according to each regional requirement.
Performance analysis after using BDD
The following table lists the performance numbers of data extraction leveraging SSIS-BDD. This SSIS-BDD
implementation shows an improvement of about 220% across the regional runs: the nightly load drops from
35 to 15 minutes in the NA region and from 30 to 13 minutes in the EMEA region.
Version | State | Rows ret | NA Region | EMEA Region
SSIS (ETL) through BDD | Replication has been retired and Local PES copy creation has been eliminated | This saved around 75 GB of storage | Retired | Retired
SSIS (ETL) through BDD | Nightly Feed Job to populate the Interface DB has been retired through SSIS BDD | 7-8 GB (around 20 million rows for some of the key tables) | 15 minutes | 13 minutes
The following chart shows the execution time (in minutes).
[Bar chart "SSIS BDD vs SQL Agent Job". X-axis: Region (NA Region Interface Feed Job, EMEA Region Interface Feed Job). Y-axis: Minutes. Before BDD: 35 (NA), 30 (EMEA). After BDD: 15 (NA), 13 (EMEA).]
Figure 5. Performance (in minutes) of the SSIS-ETL run in different regions
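The reported run times can be turned into two common measures with a short calculation. The sketch below uses only the minutes reported for the nightly feed job before and after BDD. The paper does not state the formula behind its "220" figure, so the two measures here (speed-up ratio and percentage of run time saved) are shown as assumptions about how such a gain is usually expressed, not as the author's method.

```python
def speedup(before_min, after_min):
    """How many times faster the new run is compared with the old one."""
    return before_min / after_min


def time_reduction_pct(before_min, after_min):
    """Share of the original run time that was saved, in percent."""
    return (before_min - after_min) / before_min * 100


# Minutes per nightly run, taken from the before/after tables: (before BDD, after BDD).
runs = {"NA": (35, 15), "EMEA": (30, 13)}

for region, (before, after) in runs.items():
    print(
        f"{region}: {speedup(before, after):.2f}x faster, "
        f"{time_reduction_pct(before, after):.0f}% less run time"
    )
# NA: 2.33x faster, 57% less run time
# EMEA: 2.31x faster, 57% less run time
```

Both regions land at roughly 2.3 times the original throughput, which is in the same range as the gain quoted in the text.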
The following chart shows the Local PES database size (in GB).
[Bar chart "Local PES Size: Before BDD (Replication) vs After BDD (No Replication)". X-axis: Region (NA Region Local PES DB, EMEA Region Local PES DB). Y-axis: DB Size (GB). Before BDD (Replication): 75 GB in each region. After BDD (No Replication): local copy retired.]
Figure 6. Database size after the implementation of the SSIS-ETL run in different regions
By leveraging SSIS-BDD, we can retire the local copy of the PES database. Direct data extraction from
Global PES to the Interface DB takes less time and shows a performance improvement of about 220%, which is
a significant win for large data loads. To take the solution forward, consider the trade-off
between OLTP and integration workloads so that BDD continues to yield good performance.
Design Considerations
Using BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of
your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing
to think through these things, there can be significant benefits. Consider using BDD when the following
conditions hold:
1. There is a large amount of data coming in.
2. The data can be read faster than the rest of the data flow can process it, either because there is
significant transformation work to do or because the destination is the bottleneck. If the
destination is the bottleneck, it must be parallelizable.
3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted,
don't split it up using BDD.
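The third condition is easier to see with a small model of the pattern. The sketch below is not how BDD is implemented internally. It is a minimal Python analogy, assuming a shared queue of row buffers and a handful of worker threads that stand in for the parallel paths of the data flow. The function name distribute and the buffer_size parameter are hypothetical.

```python
import queue
import threading


def distribute(rows, worker_count, transform, buffer_size=3):
    """Cut rows into buffers and let worker threads pull buffers from a shared queue."""
    buffers = queue.Queue()
    for start in range(0, len(rows), buffer_size):
        buffers.put(rows[start:start + buffer_size])

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                buf = buffers.get_nowait()
            except queue.Empty:
                return  # nothing left to process
            processed = [transform(row) for row in buf]
            with lock:
                # Buffers arrive here in whatever order the threads finish.
                results.extend(processed)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    rows = list(range(10))
    out = distribute(rows, worker_count=4, transform=lambda r: r * 2)
    # Every row is processed exactly once...
    print(sorted(out) == [r * 2 for r in rows])
    # ...but the arrival order depends on thread scheduling and is not guaranteed.
    print(out)
```

Every row reaches the destination once, but nothing guarantees that rows arrive in their original order, which is why sorted input should not be split this way. Python threads are used here only to show the routing. They do not demonstrate the speed-up measured in the paper.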
Conclusion
We compared the performance results before and after the redesign, and the results are positive. BDD is an
effective solution for transforming data from the Mastering system to consuming applications and for
reducing the number of database copies kept for consumption.
• After leveraging BDD, the local copy of the database has been retired.
• Interface job performance has improved. An average gain of about 220% was achieved.
Benefits
• Increased performance by around 220%
• Reduced infrastructure cost by retiring the local DB copy
• Decommissioned the Local PES DB (6 copies, each around 75 GB)
• Reusable solution for future requirements
• Easy supportability and maintainability
• Free from replication issues
• Increases CSAT and CPE
Appendix
Acknowledgements
Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed
Maqsood. We would like to acknowledge the leadership and support provided by Hariharan
Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking
efforts to make useful information available anytime, anywhere.
References
[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS," in SQL Server Technical Article
[2] SSIS Blog, Balanced Data Distributor, "SQL Server Blog," in Microsoft TechNet Blogs
[3] Steve Wake, Balanced Data Distributor (BDD) for SSIS, "BI Blog"
For more information:
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
Did this paper help you? Please provide feedback. On a scale of 1 (poor) to 5 (excellent), how would you
rate this paper, and why have you given this rating? For example:
• Are you rating it high due to having good examples, excellent screen shots, clear writing, or
another reason?
• Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback helps improve the quality of white papers released.
3 The SSIS package control flow is designed to take care of transformation and business rules as
seen in the following diagram In this scenario the following occurs
i Extract the Region ID and Server name from the configuration table
ii Capture the ETL process log including the Error log
iii Truncate all tables in the Interface DB before loading the Mastering data
iv Start loading the data from Global PES to Interface DB through BDD
v Apply localization based on the regional collation
Figure 4 SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor)-Control Flow
4 SSIS package are configured as a job in every region in the Mastering System The job runs on a
scheduled frequency
5 SSIS package take the [Region ID] and [Server Name] values as input parameters from the
configuration table and transform the data against the retrieved server name
Steps to add BDD in SSIS tool box
Step 1 Run BalancedDataDistributor-x86exe to install the BDD component in the Toolbox
Step 2 Once installed right-click on the Data Flow Transformation and select Choose Items
Step 3 In Choose Items select the SSIS Data Flow Items tab select Balanced Data Distributor and
select OK
Balanced Data Distributor (BDD) is added
Environments for performance analysis
The performance analyses completed throughout the migration process uses the following environments
System Technology Stack
Server Name RAM Secondary Memory
Processor Speed
Global PES (Mastering System)
SQL Server 2012 R2 amp Windows Server 2012 R2
AGLGCPESSQL09 16 GB 500 GB 235 GHZ
Table 1 Global PES SQL Server versions and complete system configurations
System Technology Stack
Server Name RAM Secondary Memory
Processor Speed
Local PES (CRM System)
SQL Server 2012 R2 amp Windows Server 2012 R2
AGLCLFYCRM01 16 GB 900 GB 235 GHZ
Table 2 CRM System (Local PES) SQL Server versions and complete system configurations
The performance analysis of SSIS BDD versus the SQL Agent Job is evaluated as
1 Performance Analysis of Existing Solution (SQL Agent Job without SSIS-BDD)
2 Performance Analysis of New Solution by leveraging SSIS-BDD
Performance analysis before using BDD
Global PES (Mastering DB) is replicated across CRM systems and creates a Local copy of the PES in the
regional systems Existing Global PES implementation
DB Name Description DB Size
Global PES Global PES DB (Mastering DB) 75 GB
Table 3 Global PES Database Size (Mastering System)
The Regional CRM system subscribes to Global PES so that the local PES DB copy is created A nightly
feed SQL Agent job extracts data from the local PES DB and inserts into the Interface DB based on the
defined transformation and business rules Local PES implementation
DB Name Description DB Size
Local PES Local PES DB (Local Copy of Mastering DB in CRM) 75 GB
Interface DB (Populated through Nightly Feed Job)
Pulls data to an intermediate database using a nightly SQL Agent job based on specific transformations and customizations at the regional level
8 GB
Table 4 CRM System (Local PES) Local PES and Interface DB size
The following table lists the performance numbers of the existing implementation without using SSIS-
BDD (Balance Data Distributor)
Version State DB Size (NA Region)
NA Region DB Size (EMEA Region)
EMEA Region
Merge Replication from Global PES to CRM
Global PES to CRM System through Replication for creating Local PES copy
75 GB Continuous Replication
75 GB Continuous Replication
Nightly Feed Job in CRM
To populate Interface DB
7 GB (around 20 Million for some of the Key tables)
35 Minutes 7 GB (around 20 Million for some
of the Key tables)
30 Minutes
Table 5 performance in Minutes for every run in regional CRM systems
As you can see there is a difference in performance from region-to-region as the size of the data varies
based on the filtering which is defined according to a regional requirement
Performance analysis after using BDD
The following table lists the performance number of data extraction by leveraging SSIS-BDD This SSIS-
BDD implementation shows a 220 improvement in different regional runs
Version State Rows ret NA Region EMEA Region
SSIS (ETL) through BDD
Replication has been retired and Local PES copy creation has been eliminated
This saved around 75 GB storage
Retired Retired
SSIS (ETL) through BDD
Nightly Feed Job to populate Interface DB has been retired through SSIS BDD
7-8 GB (around 20 Million for some of the Key tables)
15 Minutes 13 Minutes
The following table shows the execution time (in minutes)
NA Region (Interface Feed Job) EMEA Region (Interface Feed Job)0
5
10
15
20
25
30
35
40
35
30
15
13
SSIS BDD vs SQL Agent Job
Before BDD (Minutes) After BDD (Minutes)
Region
Min
utes
Figure 5 Performance (in minutes) of the SSIS-ETL run in different regions
The following table shows the database size (in GB)
NA Region (Local PES DB)
EMEA Region (Local PES DB)
0
10
20
30
40
50
60
70
80
Before BDD (Replication)(GB)
After BDD (No Replication)(GB)
7575
Local PES Size (Before BDD -Replication Vs After BDD (No Replication)
Before BDD (Replication)(GB) After BDD (No Replication)(GB)
REGION
DB S
IZE
(GB)
Figure 6 Database Size after the implementation of SSIS-ETL run in different regions
By leveraging SSIS-BDD we can retire the local copy of the PES database Direct data extraction from
Global PES to the Interface DB takes less time and shows a 220 performance improvement So this is
really a great win as far as huge data load is concerned To take the solution forward there is a trade-off
between OLTP and integration to consider yielding good performance by leveraging BDD
Design Considerations
Using the BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing to think through these things, there can be significant benefits. Consider using BDD when the following conditions hold:
1. There is a large amount of data coming in.
2. The data can be read faster than the rest of the data flow can process it, either because there is significant transformation work to do or because the destination is the bottleneck. If the destination is the bottleneck, it must be parallelizable.
3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted, don't split it up using BDD.
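To make conditions 2 and 3 concrete, here is a minimal Python sketch of the distribution pattern. It is not SSIS code and not the BDD implementation: it assumes a simple round-robin policy for handing buffers of rows to parallel paths, and the names `make_buffers`, `transform`, and `distribute` are hypothetical. Python threads are used only to show the structure.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def make_buffers(rows, buffer_size):
    """Group an iterable of rows into lists of at most buffer_size rows."""
    it = iter(rows)
    while True:
        buf = list(islice(it, buffer_size))
        if not buf:
            return
        yield buf


def transform(buffer):
    """Placeholder for the expensive downstream work (assumed, not from the paper)."""
    return [row * 2 for row in buffer]


def distribute(rows, n_outputs=4, buffer_size=3):
    """Assign buffers to n_outputs paths round-robin, then run the paths in parallel."""
    paths = [[] for _ in range(n_outputs)]
    for i, buf in enumerate(make_buffers(rows, buffer_size)):
        paths[i % n_outputs].append(buf)

    def run_path(buffers):
        out = []
        for buf in buffers:
            out.extend(transform(buf))
        return out

    with ThreadPoolExecutor(max_workers=n_outputs) as pool:
        # One result list per output path, in path order.
        return list(pool.map(run_path, paths))


results = distribute(range(10), n_outputs=2, buffer_size=3)
print(results)
```

Each path receives every second buffer, so no single path sees the rows in their original order. This is why a sorted input should not be split this way unless it is re-sorted afterwards.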
Conclusion
We have compared the performance results of the designs before and after BDD, and the results are positive. BDD is an effective solution for transforming data from the Mastering system to consuming applications, and it reduces the number of consumption copies.
• After leveraging BDD, the local copy of the database has been retired.
• Interface job performance has improved. An average gain of 220% was achieved.
Benefits
• Increased performance by around 220%
• Reduced infrastructure cost by retiring the local DB copy
• Decommissioned the Local PES DB (6 copies, each around 75 GB)
• Reusable solution for future requirements
• Easier supportability and maintainability
• Freedom from replication issues
• Increased CSAT and CPE
Appendix
Acknowledgements
Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed Maqsood. We would like to acknowledge the leadership and support provided by Hariharan Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking efforts to provide useful information that is available anytime, anywhere.
References
[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS," in SQL Server Technical Article
[2] SSIS Blog, Balanced Data Distributor, "SQL Server Blog," in Microsoft TechNet Blogs
[3] Steve Wake, Balanced Data Distributor (BDD) for SSIS, "BI Blog"
For more information:
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
Did this paper help you? Please provide feedback. On a scale of 1 (poor) to 5 (excellent), how would you rate this paper, and why have you given this rating? For example:
• Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason?
• Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback helps improve the quality of white papers released.
4. The SSIS packages are configured as a job in every region in the Mastering System. The job runs on a scheduled frequency.
5. The SSIS package takes the [Region ID] and [Server Name] values as input parameters from the configuration table and transforms the data against the retrieved server name.
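The configuration lookup in step 5 can be sketched as follows. This is a minimal Python illustration using an in-memory SQLite database, not the SSIS package itself. The table name `etl_config`, its column names, the region labels, and the server names are all hypothetical stand-ins for the paper's configuration table and its [Region ID] and [Server Name] values.

```python
import sqlite3


def load_server_name(conn, region_id):
    """Return the server name configured for a region, or None if it is not configured."""
    row = conn.execute(
        "SELECT server_name FROM etl_config WHERE region_id = ?",
        (region_id,),
    ).fetchone()
    return row[0] if row else None


# Hypothetical configuration table with one row per region.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE etl_config (region_id TEXT PRIMARY KEY, server_name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO etl_config (region_id, server_name) VALUES (?, ?)",
    [("NA", "na-crm-server"), ("EMEA", "emea-crm-server")],
)

# The scheduled job for a region would read its target server before extracting.
target = load_server_name(conn, "EMEA")
print(target)
```

Keeping the region-to-server mapping in a table rather than in the package means the same package can be scheduled in every region, with only the configuration row differing.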
Steps to add BDD to the SSIS Toolbox
Step 1: Run BalancedDataDistributor-x86.exe to install the BDD component in the Toolbox.
Step 2: Once installed, right-click the Data Flow Transformation section and select Choose Items.
Step 3: In Choose Items, select the SSIS Data Flow Items tab, select Balanced Data Distributor, and select OK.
The Balanced Data Distributor (BDD) is added to the Toolbox.
Environments for performance analysis
The performance analyses completed throughout the migration process use the following environments:
System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Global PES (Mastering System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLGCPESSQL09 | 16 GB | 500 GB | 2.35 GHz
Table 1: Global PES SQL Server versions and complete system configurations
System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Local PES (CRM System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLCLFYCRM01 | 16 GB | 900 GB | 2.35 GHz
Table 2: CRM System (Local PES) SQL Server versions and complete system configurations
The performance analysis of SSIS BDD versus the SQL Agent job is evaluated in two parts:
1. Performance analysis of the existing solution (SQL Agent job without SSIS-BDD)
2. Performance analysis of the new solution, which leverages SSIS-BDD
Performance analysis before using BDD
Global PES (Mastering DB) is replicated across the CRM systems, which creates a local copy of the PES in each regional system. Existing Global PES implementation:
DB Name | Description | DB Size
Global PES | Global PES DB (Mastering DB) | 75 GB
Table 3: Global PES database size (Mastering System)
The regional CRM system subscribes to Global PES so that the local PES DB copy is created. A nightly feed SQL Agent job extracts data from the local PES DB and inserts it into the Interface DB based on the defined transformation and business rules. Local PES implementation:
DB Name | Description | DB Size
Local PES | Local PES DB (Local Copy of Mastering DB in CRM) | 75 GB
Interface DB (Populated through Nightly Feed Job) | Pulls data to an intermediate database using a nightly SQL Agent job based on specific transformations and customizations at the regional level | 8 GB
Table 4: CRM System (Local PES) Local PES and Interface DB size
The following table lists the performance numbers of the existing implementation, which does not use SSIS-BDD (Balanced Data Distributor).
Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region
Merge Replication from Global PES to CRM | Global PES to CRM System through Replication for creating Local PES copy | 75 GB | Continuous Replication | 75 GB | Continuous Replication
Nightly Feed Job in CRM | To populate Interface DB | 7 GB (around 20 Million for some of the Key tables) | 35 Minutes | 7 GB (around 20 Million for some of the Key tables) | 30 Minutes
Table 5: Performance in minutes for every run in regional CRM systems
As you can see, performance differs from region to region because the size of the data varies with the filtering, which is defined according to regional requirements.
The following table shows the execution time (in minutes)
NA Region (Interface Feed Job) EMEA Region (Interface Feed Job)0
5
10
15
20
25
30
35
40
35
30
15
13
SSIS BDD vs SQL Agent Job
Before BDD (Minutes) After BDD (Minutes)
Region
Min
utes
Figure 5 Performance (in minutes) of the SSIS-ETL run in different regions
The following table shows the database size (in GB)
NA Region (Local PES DB)
EMEA Region (Local PES DB)
0
10
20
30
40
50
60
70
80
Before BDD (Replication)(GB)
After BDD (No Replication)(GB)
7575
Local PES Size (Before BDD -Replication Vs After BDD (No Replication)
Before BDD (Replication)(GB) After BDD (No Replication)(GB)
REGION
DB S
IZE
(GB)
Figure 6 Database Size after the implementation of SSIS-ETL run in different regions
By leveraging SSIS-BDD we can retire the local copy of the PES database Direct data extraction from
Global PES to the Interface DB takes less time and shows a 220 performance improvement So this is
really a great win as far as huge data load is concerned To take the solution forward there is a trade-off
between OLTP and integration to consider yielding good performance by leveraging BDD
Design Considerations
Using the BDD [1 2] requires an understanding of the hardware you will be running on the performance of
your data flow and the nature of the data involved BDD isnrsquot for everyone But for those who are willing
to think through these things there can be significant benefits Consider using BBD when the following
occurs
1 There is a large amount of data coming in
2 The data can be read faster than the rest of the data flow can process it either because there is
significant transformation work to do or because the destination is the bottleneck If the
destination is the bottleneck it must be parallelizable
3 There is no ordering dependency in the data rows For example if the data needs to stay sorted
donrsquot go and split it up using BDD
Conclusion
We compared the performance results of the designs before and after BDD, and the results are positive.
BDD is an effective solution for moving data from the Mastering system to consuming applications, and it
reduces the number of database copies kept for consumption.
• After leveraging BDD, the local copy of the database has been retired.
• Interface job performance has improved; an average gain of 220% was achieved.
Benefits
• Increased performance by around 220%
• Reduced infrastructure cost by retiring the local DB copy
• Decommissioned the Local PES DB (6 copies, each around 75 GB)
• Reusable solution for future requirements
• Easy supportability and maintainability
• Freedom from replication issues
• Increased CSAT and CPE
Appendix
Acknowledgements
Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed
Maqsood. We would like to acknowledge the leadership and support provided by Hariharan
Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking
efforts to provide useful information that is available anytime, anywhere.
References
[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS," SQL Server Technical Article.
[2] SSIS Blog, "Balanced Data Distributor," SQL Server Blog, Microsoft TechNet Blogs.
[3] Steve Wake, "Balanced Data Distributor (BDD) for SSIS," BI Blog.
For more information
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
Did this paper help you? Please provide feedback. On a scale of 1 (poor) to 5 (excellent), how would you
rate this paper, and why have you given this rating? For example:
• Are you rating it high due to having good examples, excellent screen shots, clear writing, or
another reason?
• Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback helps improve the quality of white papers released.
Send Feedback