Microsoft IT: A Case Study on Building a Highly Scalable Enterprise Application that Uses Massively Parallel Processing (MPP) to Deliver High Performance and Scalability on SSIS-SQL Server 2012

SQL Server Technical Article

Summary: This paper shares the approach used to:

• Use the new Data Flow transformation component, Balanced Data Distributor (BDD), in SSIS (SQL Server BIDS 2008/2012), including performance analysis.
• Understand the specifics involved in leveraging BDD to build a highly scalable enterprise application that uses massively parallel processing (MPP) to deliver high performance and scalability on SSIS.

BDD is a simple, new Data Flow component that can bring large benefits when working with data that takes too long to load in a single Data Flow. The intention of BDD is to improve performance through multi-threading. This white paper describes the scenario and the pattern where BDD can be leveraged in SQL Server 2008/2012. It also covers the challenges and best practices to consider when using BDD to build enterprise solutions, as well as the performance analysis at different stages.

This content is suitable for developers, architects, and database administrators. It is assumed that readers of this white paper have basic knowledge of SQL Server 2008/2012/2014 and SQL Server administration.


Author: Prabhakaran Sethuraman (PRAB), Microsoft

Technical Reviewers: Prabhakaran Sethuraman (PRAB), Microsoft | Shubhra Mittal, Microsoft | Mohammed Maqsood, Microsoft | Hariharan Sethuraman, Microsoft | Mandi Ohlinger, Microsoft

Published: September 2014

Applies to: SQL Server 2008, SQL Server 2012, and SQL Server 2014

Copyright

This document is provided "as-is". Information and views expressed in this document, including URL and other Internet Web site references, can change without notice. You bear the risk of using it.

This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You can copy and use this document for your internal reference purposes.

© 2014 Microsoft. All rights reserved.

Contents

Introduction
About Balanced Data Distributor (BDD)
Balanced Data Distributor as a Solution Case Study
    Context
    Objectives
    Current Architecture
    Proposed Architecture
    Steps to add BDD in SSIS tool box
    Environments for performance analysis
    Performance analysis before using BDD
    Performance analysis after using BDD
Design Considerations
Conclusion
Benefits
Appendix
Acknowledgements
References
For more information

Introduction

A new Data Flow transformation component is available for SQL Server Integration Services (SSIS): the Balanced Data Distributor (BDD). BDD lets you create more than one independent segment against the destination. BDD [1] provides an easy way to make fuller use of multi-processor and multi-core servers by introducing parallelism into the data flow of an SSIS package. This paper describes the following:

• Using the new Balanced Data Distributor (BDD) Data Flow transformation component in SSIS (SQL Server BIDS 2008/2012), including performance analysis.
• The specifics involved in leveraging BDD to build a highly scalable enterprise application that uses massively parallel processing (MPP) to deliver high performance and scalability on SSIS.

BDD [1] is a simple, new Data Flow component that can bring large benefits when working with data that takes too long to load in a single Data Flow. The intention of BDD is to improve performance through multi-threading. This white paper describes the scenario and the pattern where BDD can be leveraged in SQL Server 2008/2012. It also covers the challenges and best practices to consider when using BDD to build enterprise solutions, as well as the performance analysis at different stages.

This content is suitable for developers, architects, and database administrators. It is assumed that readers of this white paper have basic knowledge of SQL Server 2008, SQL Server 2012, SQL Server 2014, and SQL Server administration.

About Balanced Data Distributor (BDD)

The functionality of the BDD [1, 2] is very simple. It takes its input data and routes it in equal proportions to its outputs, however many there are. If you have four outputs, roughly 1/4 of the input rows go to each output. Instead of routing individual rows, the BDD operates on buffers of data, so it's very efficient. The value of the BDD comes from the way modern servers work: parallelism. When a data flow contains independent segments, SSIS can distribute the work over multiple threads, and BDD provides an easy way to create independent segments. The following diagram illustrates this. In the diagram, BDD-MarketProduct (the BDD component) takes its input data from the data flow component (Market Product) and routes it in equal proportions to four independent segments, so roughly a quarter of the input rows go to each output.

The BDD [1, 2, 3] Data Flow component was developed as a way to split processing within an SSIS package to take advantage of today's multi-processor and multi-core servers. Because BDD introduces parallelism inside the data flow, you do not need to run multiple copies of equivalent packages in parallel or resort to other, harder solutions. The BDD component is simple to use. Once it is installed, open your Data Flow and attach BDD to your flat file or OLE DB source. Then create as many output flows as you need. Each output flow should carry the same processing as the slow-moving part of the original data flow. The flows can write to an equivalent destination or be recombined using a "Union All" transformation, depending on your needs. Regardless of the number of outputs, BDD distributes the work by routing each pipeline buffer of data to the outputs in turn. For example, if three outputs are set up, the first buffer goes to output 1, the second buffer goes to output 2, the third buffer goes to output 3, and so on until the entire input is consumed.
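The buffer-by-buffer routing described above can be illustrated with a minimal Python sketch. It is a simulation under stated assumptions, not the component's implementation: the function names, the buffer size of 4 rows, and the strict round-robin order are illustrative choices, and a real SSIS pipeline sizes and schedules its buffers itself.

```python
from itertools import islice


def make_buffers(rows, buffer_size):
    """Group incoming rows into fixed-size buffers (the last one may be short)."""
    it = iter(rows)
    while True:
        buf = list(islice(it, buffer_size))
        if not buf:
            return
        yield buf


def distribute(rows, n_outputs, buffer_size):
    """Hand whole buffers to the outputs in turn: 1, 2, ..., n, 1, 2, ..."""
    outputs = [[] for _ in range(n_outputs)]
    for i, buf in enumerate(make_buffers(rows, buffer_size)):
        outputs[i % n_outputs].extend(buf)
    return outputs


# 24 input rows, three outputs, hypothetical buffer size of 4 rows.
outputs = distribute(range(1, 25), n_outputs=3, buffer_size=4)
for number, rows in enumerate(outputs, start=1):
    print(f"output {number}: {rows}")
```

Each output ends up with a third of the rows, but in whole-buffer chunks rather than interleaved row by row, which is why the routing itself costs very little.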

If you run the Data Flow on a laptop, there probably won't be a speed advantage, and there may even be a speed cost. If you run the Data Flow on a server with multiple cores and many disk spindles supporting the destination database, then there may be a substantial speed advantage.

In this paper, the main focus is on the scenario and the pattern where BDD can be leveraged in SSIS 2008 and SSIS 2012, including the challenges and best practices to consider when using BDD to build enterprise solutions.

Balanced Data Distributor as a Solution Case Study

Context

Mastering data may include data about clients and customers, employees, products, service offerings, and more. Mastering data is typically shared by multiple users and groups across an organization and stored on different systems. Different systems might pull the data through replication, especially merge replication, because it supports filtering. Filtering allows applications to pull data based on their domain and regional (geographical) values.

MSIT has a mastering system in the Enterprise Service Business called Global PES (Product Entitlement System). Global PES is a centralized data warehouse that contains a variety of information on products, service offerings, and entitlements that are defined by a market's Support Policy (in most circumstances). Global PES maintains this information at the market level. Global PES pushes all new and modified data to the regional MS CRM instances through merge replication. Specifically:

• Global PES publishes data through merge replication with a filter, because each regional MS CRM consumes product and services data based on its specific market.

MSIT has 6 regional MS CRM applications that consume mastering data from Global PES through merge replication. Each regional CRM extracts the data to an intermediate database using a nightly SQL Agent job, based on specific transformations and customizations at the regional level. As a result, every subscribing application (regional CRM) holds a copy of the PES DB (created through replication), which is a redundant copy of the mastering data. The following diagram shows the context of the current architecture of Global PES and CRM integration with the local subscribers.

Figure 0: Interaction of Mastering System (Global PES) and MS CRM used within Services business

Objectives

• Perform a proof of concept to leverage BDD in SSIS to retire the duplicate copy of the mastering data, using SQL Server 2012.
• Improve the performance and scalability of Extraction, Transformation, and Loading (ETL) through SSIS, and retire a SQL Agent job.
• Explain the improvements observed when migrating from the SQL Agent job to the ETL process.

Current Architecture

The current architecture of Global PES (Publisher/Mastering System) and the subscribing applications (Regional MS CRM) is shown in the following figure. In this scenario, merge replication with a filter replicates the data changes to the subscribing systems (Regional CRM). Administrators use the Admin tool to administer and manage the data in the mastering system. Changes made to the data are replicated to the subscribing systems immediately, because merge replication runs in continuous mode.

[Diagram: Global PES at the center, replicating to the regional subscribers NA CRM, HKTW CRM, Korea, Japan CRM, APAC CRM, and EMEA CRM]

Figure 1: Current Architecture of Mastering and Subscribing System (Simplified Architecture)

In the current architecture, Global PES (Mastering DB) is replicated to all subscribing applications, which creates duplicate copies of the same mastering data at the subscribing systems. The Interface DB is used by the subscribing CRM applications. Based on defined transformation and business rules, a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the Interface DB. Full data extraction is always used, and the Interface DB does not have any integrity constraints such as primary keys, check constraints, or foreign keys.

The idea is to retire the copy of the mastering data at the regional level and to improve the performance and scalability of the SQL Agent job by replacing it with ETL. This step improves supportability and maintainability and reduces infrastructure cost by not duplicating the mastering data. To accomplish this goal, we use Balanced Data Distributor (BDD) in SSIS to redesign our architecture and take advantage of parallelism.

Proposed Architecture

In the current scenario, consuming applications see modified data once a day, after the SQL Agent job executes. The scope is therefore to use BDD to cut off replication and to transform the data directly from the mastering DB to the Interface DB through SSIS. BDD fits this problem well for the following reasons:

1. The Interface DB does not have any constraints (its tables are heaps).
2. The data load is a full load every time the SQL Agent job runs.
3. The schema/entity structure of all regions is the same.

Because there are no integrity constraints in the Interface DB, rows can be written by several independent segments at once, so the data load is faster than with the traditional data flow components. Using BDD and parallelism, we can accomplish the transformation from the mastering DB to the subscribing applications.

The following figure shows the redesigned architecture using BDD.

Figure 2: SSIS is configured in the mastering system and runs against each region based on the defined schedule and input parameters

In this redesigned architecture:

• ETL is designed using BDD. There are several regional consuming applications of the mastering DB. These applications continue to consume data from the Interface DB, which is now loaded through SSIS (ETL) instead of replication and a SQL Agent job.
• The Interface DB structure is similar across regions. As a result, the SSIS (ETL) package with BDD is designed so that it can be used with different regions by simply changing the input parameter values, such as Region ID and Server Name.

The following steps sync the mastering and consuming applications:

1. Create a configuration table in Global PES that stores the regional server names and region IDs. SSIS-ETL uses this data to retrieve the product and service information for each regional server name and region ID.

   Region ID/Market ID | Server Name
   13 | NACRM
   43 | EMEACRM

   Because the data must be filtered in the same way that merge replication filtered it, this configuration table is created and used by the SSIS package.

2. Create an SSIS package that uses BDD so that it automatically uses multiple threads against the same destination. We created data flows for 35 entities, as seen in the following diagram. The sample data flow for MarketProduct uses BDD and has 4 independent segments (Marketproduct-thread1 to Marketproduct-thread4) to the destination. Data transformation from the Mastering system to the subscribers is now very fast, and the local copy of PES can be retired.

   Figure 3: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor), Data Flow

3. The SSIS package control flow is designed to handle the transformation and business rules, as seen in the following diagram. In this scenario, the following occurs:

   i. Extract the Region ID and Server Name from the configuration table.
   ii. Capture the ETL process log, including the error log.
   iii. Truncate all tables in the Interface DB before loading the mastering data.
   iv. Load the data from Global PES to the Interface DB through BDD.
   v. Apply localization based on the regional collation.

   Figure 4: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor), Control Flow

4. The SSIS package is configured as a job for every region in the Mastering System. The job runs on a scheduled frequency.

5. The SSIS package takes the [Region ID] and [Server Name] values as input parameters from the configuration table and transforms the data against the retrieved server name.
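To make the sequence of steps concrete, here is a minimal Python sketch of the same flow against in-memory stand-ins. Everything in it is an assumption for illustration: `CONFIG`, `MARKET_PRODUCT`, `load_segment`, and `run_package` are hypothetical names, the lower-casing is a placeholder transformation, rows are split row by row rather than by pipeline buffer, and the localization step is left out. It shows the structure only; Python threads here do not demonstrate the speed-up an SSIS server would see.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the configuration table and one Global PES entity.
CONFIG = [
    {"region_id": 13, "server_name": "NACRM"},
    {"region_id": 43, "server_name": "EMEACRM"},
]
MARKET_PRODUCT = [
    {"region_id": 13 if i % 2 else 43, "product": f"P{i:03d}"}
    for i in range(1, 21)
]


def load_segment(rows):
    """One independent segment; the lower-casing is a placeholder transformation."""
    return [dict(row, product=row["product"].lower()) for row in rows]


def run_package(region_id, interface_db, log, n_segments=4):
    """Run the control-flow steps for one region against in-memory tables."""
    # Look up the target server for this region in the configuration table.
    server = next(c["server_name"] for c in CONFIG if c["region_id"] == region_id)
    # Record progress (and any failure) in the process log.
    log.append(f"start region={region_id} server={server}")
    try:
        # Full load: empty the target before loading.
        interface_db.clear()
        # Filter by region, split the rows, and process the shares in parallel.
        source = [r for r in MARKET_PRODUCT if r["region_id"] == region_id]
        shares = [source[i::n_segments] for i in range(n_segments)]
        with ThreadPoolExecutor(max_workers=n_segments) as pool:
            for loaded in pool.map(load_segment, shares):
                interface_db.extend(loaded)  # heap-like target: no keys, no order
    except Exception as exc:
        log.append(f"error: {exc}")
        raise
    log.append(f"loaded {len(interface_db)} rows")
    return server


interface_db, log = [], []
server = run_package(13, interface_db, log)
print(server, len(interface_db), log[-1])
```

Running the same function with region ID 43 would target EMEACRM, which mirrors how one package serves every region by changing only its input parameters.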

Steps to add BDD in SSIS tool box

Step 1: Run BalancedDataDistributor-x86.exe to install the BDD component.

Step 2: Once it is installed, right-click Data Flow Transformation in the Toolbox and select Choose Items.

Step 3: In Choose Items, select the SSIS Data Flow Items tab, select Balanced Data Distributor, and select OK.

Balanced Data Distributor (BDD) is added to the Toolbox.

Environments for performance analysis

The performance analyses completed throughout the migration process use the following environments.

System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Global PES (Mastering System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLGCPESSQL09 | 16 GB | 500 GB | 2.35 GHz

Table 1: Global PES SQL Server versions and complete system configurations

System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Local PES (CRM System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLCLFYCRM01 | 16 GB | 900 GB | 2.35 GHz

Table 2: CRM System (Local PES) SQL Server versions and complete system configurations

The performance of SSIS BDD versus the SQL Agent job is evaluated in two parts:

1. Performance analysis of the existing solution (SQL Agent job without SSIS-BDD)
2. Performance analysis of the new solution, leveraging SSIS-BDD

Performance analysis before using BDD

Global PES (Mastering DB) is replicated across the CRM systems, which creates a local copy of PES in each regional system. Existing Global PES implementation:

DB Name | Description | DB Size
Global PES | Global PES DB (Mastering DB) | 75 GB

Table 3: Global PES Database Size (Mastering System)

The regional CRM system subscribes to Global PES so that the local PES DB copy is created. A nightly feed SQL Agent job extracts data from the local PES DB and inserts it into the Interface DB, based on the defined transformation and business rules. Local PES implementation:

DB Name | Description | DB Size
Local PES | Local PES DB (Local Copy of Mastering DB in CRM) | 75 GB
Interface DB (Populated through Nightly Feed Job) | Pulls data to an intermediate database using a nightly SQL Agent job, based on specific transformations and customizations at the regional level | 8 GB

Table 4: CRM System (Local PES), Local PES and Interface DB size

The following table lists the performance numbers of the existing implementation, without SSIS-BDD (Balanced Data Distributor).

Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region
Merge Replication from Global PES to CRM | Global PES to CRM System through Replication for creating Local PES copy | 75 GB | Continuous Replication | 75 GB | Continuous Replication
Nightly Feed Job in CRM | To populate Interface DB | 7 GB (around 20 million rows for some of the key tables) | 35 minutes | 7 GB (around 20 million rows for some of the key tables) | 30 minutes

Table 5: Performance in minutes for every run in regional CRM systems

As you can see, performance differs from region to region, because the size of the data varies with the filtering, which is defined according to each region's requirements.

Performance analysis after using BDD

The following table lists the performance numbers of data extraction using SSIS-BDD. The SSIS-BDD implementation shows an improvement of roughly 220% across the regional runs: run time fell from 35 to 15 minutes in the NA region and from 30 to 13 minutes in the EMEA region.

Version | State | Rows ret | NA Region | EMEA Region
SSIS (ETL) through BDD | Replication has been retired and Local PES copy creation has been eliminated | This saved around 75 GB storage | Retired | Retired
SSIS (ETL) through BDD | Nightly Feed Job to populate Interface DB has been retired and replaced by SSIS BDD | 7-8 GB (around 20 million rows for some of the key tables) | 15 minutes | 13 minutes
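The run times reported in this paper can be converted into speed-up ratios with a few lines of arithmetic. The sketch below uses only the before and after minutes from the tables; how the paper's own 220 figure was derived is not stated, so the ratios are offered as a cross-check rather than as its definition.

```python
# Run times in minutes, taken from the tables in this paper.
before = {"NA": 35, "EMEA": 30}
after = {"NA": 15, "EMEA": 13}

# Speed-up ratio (old time / new time) and fraction of run time saved.
ratios = {region: before[region] / after[region] for region in before}
saved = {region: 1 - after[region] / before[region] for region in before}

for region in before:
    print(f"{region}: {ratios[region]:.2f}x faster, "
          f"run time cut by {saved[region]:.0%}")
# NA: 2.33x faster, run time cut by 57%
# EMEA: 2.31x faster, run time cut by 57%
```

Both regions land at a little over twice as fast, which is in the same range as the improvement the paper reports.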

The following chart shows the execution time (in minutes).

[Bar chart "SSIS BDD vs SQL Agent Job". X axis: Region. Y axis: Minutes. NA Region (Interface Feed Job): 35 before BDD, 15 after BDD. EMEA Region (Interface Feed Job): 30 before BDD, 13 after BDD.]

Figure 5: Performance (in minutes) of the SSIS-ETL run in different regions

The following chart shows the database size (in GB).

[Bar chart "Local PES Size: Before BDD (Replication) vs After BDD (No Replication)". X axis: Region. Y axis: DB Size (GB). NA Region (Local PES DB) and EMEA Region (Local PES DB): 75 GB each before BDD (replication); the local copy is retired after BDD.]

Figure 6: Database size after the implementation of the SSIS-ETL run in different regions

By leveraging SSIS-BDD, we can retire the local copy of the PES database. Direct data extraction from Global PES to the Interface DB takes less time and shows a performance improvement of roughly 220%, which is a clear win for a large data load. To take the solution forward, there is a trade-off to consider between the OLTP and integration workloads in order to keep the performance gained by leveraging BDD.

Design Considerations

Using the BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing to think through these things, there can be significant benefits. Consider using BDD when the following conditions hold:

1. There is a large amount of data coming in.
2. The data can be read faster than the rest of the data flow can process it, either because there is significant transformation work to do or because the destination is the bottleneck. If the destination is the bottleneck, it must be parallelizable.
3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted, don't split it up using BDD.
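The ordering caveat is easy to see in a small simulation. The Python sketch below assumes strict round-robin routing of whole buffers and uses plain concatenation as a stand-in for any recombination that does not re-sort; the function name and the buffer size of 3 rows are illustrative, not taken from SSIS.

```python
def split_buffers(rows, n_outputs, buffer_size):
    """Round-robin whole buffers across the outputs."""
    outputs = [[] for _ in range(n_outputs)]
    for start in range(0, len(rows), buffer_size):
        target = (start // buffer_size) % n_outputs
        outputs[target].extend(rows[start:start + buffer_size])
    return outputs


sorted_rows = list(range(1, 13))
outputs = split_buffers(sorted_rows, n_outputs=2, buffer_size=3)

# Recombine without sorting, as an order-agnostic merge of the outputs would.
merged = [row for output in outputs for row in output]

print(merged)                 # [1, 2, 3, 7, 8, 9, 4, 5, 6, 10, 11, 12]
print(merged == sorted_rows)  # False
```

No rows are lost, but the sorted input comes back out of order, so any downstream step that relies on sort order would need an explicit sort after the split.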

Conclusion

We compared the performance results of the designs before and after the change, and the results are positive. BDD is an effective solution for transforming data from the Mastering system to the consuming applications, and it reduces the number of copies that the consumers hold.

• After leveraging BDD, the local copy of the database has been retired.
• Interface job performance has improved. An average gain of roughly 220% was achieved.

Benefits

• Performance increased by around 220%.
• Infrastructure cost is reduced by retiring the local DB copy.
• The Local PES DB is decommissioned (6 copies, each around 75 GB).
• The solution is reusable for future requirements.
• Supportability and maintainability are easier.
• The solution is free from replication issues.
• CSAT and CPE increase.

Appendix

Acknowledgements

Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed Maqsood. We would like to acknowledge the leadership and support provided by Hariharan Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking efforts to provide useful information that is available anytime, anywhere.

References

[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS," in SQL Server Technical Article.

[2] SSIS Blog, Balance Data Distributor, "SQL Server Blog," in Microsoft TechNet Blogs.

[3] Steve Wake, Balanced Data Distributor (BDD) for SSIS, "BI Blog."

For more information

http://www.microsoft.com/sqlserver/: SQL Server Web site

http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter

http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

Did this paper help you? Please provide feedback. On a scale of 1 (poor) to 5 (excellent), how would you rate this paper, and why have you given this rating? For example:

• Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason?
• Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?

This feedback helps improve the quality of the white papers released.

Page 2: A Case Study on Building a Highly Scalable …download.microsoft.com/download/D/2/0/D20E1C5F-72EA-450… · Web viewMicrosoft IT: A Case Study on Building a Highly Scalable Enterprise

Copyright

This document is provided ldquoas-isrdquo Information and views expressed in this document including URL and other Internet Web site references can change without notice You bear the risk of using it

This document does not provide you with any legal rights to any intellectual property in any Microsoft product You can copy and use this document for your internal reference purposes

copy 2014 Microsoft All rights reserved

ContentsContents3

Introduction4

About Balanced Data Distributor (BDD)4

Balanced Data Distributor as a Solution Case Study6

Context6

Objectives7

Current Architecture7

Proposed Architecture8

Steps to add BDD in SSIS tool box12

Environments for performance analysis13

Performance analysis before using BDD14

Performance analysis after using BDD15

Design Considerations17

Conclusion18

Benefits18

Appendix19

Acknowledgements19

References19

For more information19

Introduction

There is a new Data Flow transform component available for SQL Server Integration Services ndash Balanced

Data Distributor (BDD) BDD allows you to create more than one independent segment against the

destination BDD [1] provides an easy way to ramp up your usage of multi-processor and multi-core

servers by introducing parallelism in the data flow of an SSIS package This paper describes the

following

Using the new Balanced Data Distributor (BDD) Data Flow transformation component in SSIS-

SQL Server BIDS-20082012 including performance analysis

Understand the specifics involved when leveraging BDD to build a highly scalable enterprise

application that uses massively parallel process (MPP) to deliver high performance and scalability

on SSIS

BDD [1] is a simple and new Data flow component that can have huge benefits when working with data

that takes too long to load in a single Data Flow The intention of BDD is to improve performance through

multi-threading This white paper describes the scenario and the pattern where BDD can be leveraged in

SQL 20082012 It also includes the challenges and best practices when considering BDD to build

enterprise solutions as well as the performance analysis at different stages

This content is suitable for developers architects and database administrators It is assumed that

readers of this white paper have basic knowledge of SQL Server 2008 SQL 2012 SQL 2014 and SQL

Server administration

About Balanced Data Distributor (BDD)

The functionality of the BDD [1 2] is very simple It takes its input data and routes it in equal proportions to

its outputs however many there are If you have four outputs roughly frac14 of the input rows go to each

output Instead of routing individual rows the BDD operates on buffers of data so itrsquos very efficient The

value of the BDD comes from the way modern servers work Parallelism When there are independent

segments of an SSIS data flow SSIS can distribute the work over multiple threads BDD provides an

easy way to create independent segments The following diagram provides more of an understanding

about BDD In the diagram BDD-MarketProduct (BDD component) takes its input data from the data

flow component (Market Product) and routes it in equal proportions to its outputs however many there

are We have four independent segments roughly a quarter of the input rows go to each output Instead

of routing individual rows the BDD operates on buffers of data so itrsquos very efficient

This BDD[123] Data Flow component was developed as a way to split your processing within an SSIS

package to take advantage of todays multi-processor and multi-core servers By introducing parallelism

BDD does not require running multiple copies of equivalent packages in parallel or alternative harder

solutions The BDD component is incredibly simple to use Once installed open your Data Flow and use

BBD against your computer file or OLEDB source Then you create several output flows as you wish

From there the new output flows should be equivalent to the execution of the slow moving processes

They will write to an equivalent output or be incorporated back using a ldquoUnion Allrdquo based on your needs

Regardless of the range of outputs you utilize BDD control can do the rendering by routing every pipeline

buffer of information to the outputs For example if there are three outputs setup the first buffer goes to

output 1 the second buffer goes to output 2 the third buffer goes to output 3 and so on until the entire

input file is complete

If you run the Data Flow on a laptop there probably wonrsquot be a speed advantage and there may even be

a speed cost If you run the Data Flow on a server with multiple cores and many disk spindles supporting

the destination database then there may be a substantial speed advantage

In this paper the main focus is on the scenario and the pattern where BDD can be leveraged in SSIS

2008 and SSIS 2012 including challenges and best practices when considering BDD to build enterprise

solutions

Balanced Data Distributor as a Solution Case Study

Context

Mastering data may include data about clients and customers employees product service offerings and

more Mastering data is typically shared by multiple users and groups across an organization and stored

on different systems Different systems might be pulling data through replication especially merge

replication since filtering is enabled Filtering allows applications to pull data based on their domain and

regional (geographical) values

MSIT has a mastering system in Enterprise Service Business called Global PES (Product Entitlement

System) Global PES is a centralized data warehouse that contains a variety of information on products

service offerings and entitlements that are defined by a marketrsquos Support Policy (in most circumstances)

Global PES maintains this information at the market level Global PES pushes all new and modified data

to Regional MS CRM instances through Merge replication Specifically

Global PES is publishing data through Merge Replication with filter as regional MS CRM

consumes product and services data based on the specific market

MSIT has 6 regional MS CRM applications that consume mastering data from Global PES through merge

replication The regional CRM extracts the data to an intermediate database using a nightly SQL Agent

job based on specific transformations and customizations at the regional level So all subscribing

applicationsregional CRM have a copy of the PES DB (created through replication) which is a redundant

copy of mastering data The following diagram shows the context of the current architecture of Global

PES and CRM integration with the local subscribers

Figure 0 Interaction of Mastering System (Global PES) and MS CRM used within Services business

Objectives

Perform a proof of concept to leverage BDD in SSIS to retire the duplicate copy of the mastering

data using SQL Server 2012

Improve the performance and scalability of Extraction Transformation and Loading through SSIS

and retire a SQL Agent job

Explain the improvements observed when migrating from the SQL Agent job to the ETL

(Extraction Transformation and Load) process

Current Architecture

The current architecture of the Global PES (PublisherMastering Systems) and subscribing applications

(Regional MS CRM) is shown in the following figure In this scenario merge replication with filter

replicates the data changes to the subscribing systems [Regional CRM] Administrators use the Admin

tool to administer and manage the data in the mastering system Changes made to the data are

replicated to the subscribing systems immediately Merge replication is running in continuous mode

NA CRM

HKTW CRM KOREA

Japan CRMAPAC CRM

EMEA CRM

Global PES

Figure1 Current Architecture of Mastering and Subscribing System [Simplified Architecture]

In the current architecture, the Global PES (Mastering DB) is replicated to all subscribing applications, which creates duplicate copies of the same mastering data at the subscribing systems. The Interface DB is used by the subscribing CRM applications. Based on defined transformation and business rules, a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the Interface DB. Full data extraction is always used, and the Interface DB does not have any integrity constraints such as primary keys, check constraints, or foreign key constraints.

The current idea is to retire the copy of mastering data at the regional level and to improve the performance and scalability of the SQL Agent job by replacing it with ETL. This step improves supportability and maintainability and reduces infrastructure cost, because the mastering data is no longer duplicated. To accomplish this goal, we use Balanced Data Distributor (BDD) in SSIS to redesign our architecture and take advantage of parallelism.

Proposed Architecture

In the current scenario, consuming applications see modified data once a day, after the SQL Agent job executes. So the scope is to use BDD to eliminate replication and to transform the data directly from the mastering DB to the Interface DB through SSIS. BDD is a good fit for this problem because of the following:

1. The Interface DB does not have any constraints (heap tables).

2. The data load is full every time the SQL Agent job runs.

3. The schema/entity structure of all regions is the same.

Since there are no integrity constraints in the Interface DB, BDD is a great solution. The data load is faster than with the traditional data flow components. Using BDD and parallelism, we can accomplish the transformation from the mastering DB to the subscribing applications.

The following figure shows the redesigned architecture using BDD.

Figure 2: SSIS is configured in the mastering system and runs against each region based on the defined schedule and input parameters

In this redesigned architecture:

• ETL is designed using BDD. There are several regional consuming applications of the mastering DB. These applications continue to consume data from the Interface DB, which is now loaded through SSIS (ETL) instead of replication and a SQL Agent job.

• The Interface DB structure is similar across regions. As a result, the SSIS (ETL) package with BDD is designed so that it can be used with different regions by simply changing input parameter values such as Region ID and Server Name.

The following steps sync the mastering and consuming applications:

1. Create a configuration table in Global PES that stores the regional server names and region IDs. This data is used by SSIS-ETL to retrieve the product and service information based on the regional server names and region IDs.

| Region ID/Market ID | Server Name |
|---|---|
| 13 | NACRM |
| 43 | EMEACRM |

Because we need to filter data in the same way that merge replication did, this configuration table is created and used by the SSIS package.

2. Create an SSIS package using BDD so that it automatically uses multiple threads against the same destination. We have created the data flow for 35 entities, as seen in the following diagram. Here you can see the sample data flow of MarketProduct, which uses BDD and has 4 independent segments (Marketproduct-thread1 to Marketproduct-thread4) to the destination. Data transformation from the Mastering system to the Subscribers is now very fast, and the local copy of PES can be retired.

Figure 3: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor): Data Flow

3. The SSIS package control flow is designed to take care of transformation and business rules, as seen in the following diagram. In this scenario, the following occurs:

i. Extract the Region ID and Server Name from the configuration table.

ii. Capture the ETL process log, including the error log.

iii. Truncate all tables in the Interface DB before loading the Mastering data.

iv. Start loading the data from Global PES to the Interface DB through BDD.

v. Apply localization based on the regional collation.

Figure 4: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor): Control Flow

4. The SSIS package is configured as a job for every region in the Mastering System. The job runs on a scheduled frequency.

5. The SSIS package takes the [Region ID] and [Server Name] values as input parameters from the configuration table and transforms the data against the retrieved server name.
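The control-flow steps above can be sketched outside SSIS to make the sequence concrete. The following Python sketch is only an illustration under stated assumptions: in-memory SQLite databases stand in for Global PES and a regional Interface DB, and the table and column names (RegionConfig, RegionID, ServerName, ProductName) are hypothetical and do not come from the actual package. Treating 13 and 43 as region IDs is also an assumption based on the configuration table shown in step 1.

```python
import sqlite3


def build_demo_source():
    """In-memory stand-in for Global PES, including the configuration table."""
    src = sqlite3.connect(":memory:")
    src.executescript(
        """
        CREATE TABLE RegionConfig (RegionID INTEGER, ServerName TEXT);
        INSERT INTO RegionConfig VALUES (13, 'NACRM'), (43, 'EMEACRM');
        CREATE TABLE MarketProduct (RegionID INTEGER, ProductName TEXT);
        INSERT INTO MarketProduct VALUES
            (13, 'Product A'), (13, 'Product B'), (43, 'Product C');
        """
    )
    return src


def build_demo_interface():
    """In-memory stand-in for a regional Interface DB (heap table, no constraints)."""
    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE MarketProduct (RegionID INTEGER, ProductName TEXT)")
    return dst


def run_region_load(src, dst, region_id):
    """Run one full reload for a region and return the process log."""
    log = []
    # i. Look up the region in the configuration table.
    row = src.execute(
        "SELECT ServerName FROM RegionConfig WHERE RegionID = ?", (region_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"region {region_id} is not configured")
    # ii. Capture a process log entry.
    log.append(f"start region={region_id} server={row[0]}")
    # iii. Empty the target table (SQLite has no TRUNCATE, so DELETE is used).
    dst.execute("DELETE FROM MarketProduct")
    # iv. Full load of this region's rows only, mirroring the replication filter.
    rows = src.execute(
        "SELECT RegionID, ProductName FROM MarketProduct WHERE RegionID = ?",
        (region_id,),
    ).fetchall()
    dst.executemany("INSERT INTO MarketProduct VALUES (?, ?)", rows)
    dst.commit()
    log.append(f"loaded {len(rows)} rows")
    return log
```

Because the target is emptied before every load, rerunning the function for the same region does not accumulate duplicate rows, which is the property that makes a constraint-free heap table safe to reload in full. The localization step (v) and the parallel BDD segments are deliberately left out of this sketch.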

Steps to add BDD in SSIS tool box

Step 1: Run BalancedDataDistributor-x86.exe to install the BDD component.

Step 2: Once installed, right-click Data Flow Transformations in the Toolbox and select Choose Items.

Step 3: In Choose Items, select the SSIS Data Flow Items tab, select Balanced Data Distributor, and select OK.

Balanced Data Distributor (BDD) is added to the Toolbox.

Environments for performance analysis

The performance analyses completed throughout the migration process use the following environments.

| System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed |
|---|---|---|---|---|---|
| Global PES (Mastering System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLGCPESSQL09 | 16 GB | 500 GB | 2.35 GHz |

Table 1: Global PES SQL Server versions and complete system configurations

| System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed |
|---|---|---|---|---|---|
| Local PES (CRM System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLCLFYCRM01 | 16 GB | 900 GB | 2.35 GHz |

Table 2: CRM System (Local PES) SQL Server versions and complete system configurations

The performance analysis of SSIS BDD versus the SQL Agent job is evaluated in two parts:

1. Performance analysis of the existing solution (SQL Agent job without SSIS-BDD)

2. Performance analysis of the new solution leveraging SSIS-BDD

Performance analysis before using BDD

Global PES (Mastering DB) is replicated across CRM systems and creates a local copy of the PES in the regional systems. Existing Global PES implementation:

| DB Name | Description | DB Size |
|---|---|---|
| Global PES | Global PES DB (Mastering DB) | 75 GB |

Table 3: Global PES Database Size (Mastering System)

The Regional CRM system subscribes to Global PES so that the local PES DB copy is created. A nightly feed SQL Agent job extracts data from the local PES DB and inserts it into the Interface DB based on the defined transformation and business rules. Local PES implementation:

| DB Name | Description | DB Size |
|---|---|---|
| Local PES | Local PES DB (Local Copy of Mastering DB in CRM) | 75 GB |
| Interface DB (Populated through Nightly Feed Job) | Pulls data to an intermediate database using a nightly SQL Agent job based on specific transformations and customizations at the regional level | 8 GB |

Table 4: CRM System (Local PES) Local PES and Interface DB size

The following table lists the performance numbers of the existing implementation without using SSIS-BDD (Balanced Data Distributor).

| Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region |
|---|---|---|---|---|---|
| Merge Replication from Global PES to CRM | Global PES to CRM System through Replication for creating Local PES copy | 75 GB | Continuous Replication | 75 GB | Continuous Replication |
| Nightly Feed Job in CRM | To populate Interface DB | 7 GB (around 20 Million for some of the Key tables) | 35 Minutes | 7 GB (around 20 Million for some of the Key tables) | 30 Minutes |

Table 5: Performance in minutes for every run in regional CRM systems

As you can see, performance differs from region to region, because the size of the data varies with the filtering, which is defined according to regional requirements.

Performance analysis after using BDD

The following table lists the performance numbers of data extraction leveraging SSIS-BDD. This SSIS-BDD implementation shows a 220% improvement in the different regional runs.

| Version | State | Rows ret | NA Region | EMEA Region |
|---|---|---|---|---|
| SSIS (ETL) through BDD | Replication has been retired and Local PES copy creation has been eliminated | This saved around 75 GB storage | Retired | Retired |
| SSIS (ETL) through BDD | Nightly Feed Job to populate Interface DB has been retired through SSIS BDD | 7-8 GB (around 20 Million for some of the Key tables) | 15 Minutes | 13 Minutes |

The following chart shows the execution time (in minutes).

Figure 5: Performance (in minutes) of the SSIS-ETL run in different regions (bar chart "SSIS BDD vs SQL Agent Job"; x-axis: Region; y-axis: Minutes; NA Region (Interface Feed Job): 35 before BDD, 15 after BDD; EMEA Region (Interface Feed Job): 30 before BDD, 13 after BDD)

The following chart shows the database size (in GB).

Figure 6: Database Size after the implementation of SSIS-ETL run in different regions (bar chart "Local PES Size: Before BDD (Replication) vs After BDD (No Replication)"; x-axis: Region; y-axis: DB size (GB); NA Region and EMEA Region Local PES DB: 75 GB each before BDD)

By leveraging SSIS-BDD, we can retire the local copy of the PES database. Direct data extraction from Global PES to the Interface DB takes less time and shows a 220% performance improvement. This is a clear win where large data loads are concerned. To take the solution forward, there is a trade-off between OLTP and integration workloads to consider when leveraging BDD for performance.
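A quick way to sanity-check the quoted improvement is to recompute it from the run times reported in the tables (35 and 30 minutes before BDD, 15 and 13 minutes after). The paper does not state how its figure of 220 was derived, so the two metrics below (a speedup ratio and a percentage of time saved) are assumptions about reasonable ways to read it, not the authors' formula.

```python
# Run times in minutes, taken from the before/after tables in this paper.
BEFORE_MINUTES = {"NA": 35, "EMEA": 30}
AFTER_MINUTES = {"NA": 15, "EMEA": 13}


def speedup(before, after):
    """How many times faster the new run is (before / after)."""
    return before / after


def time_saved_pct(before, after):
    """Share of the original run time that was eliminated, in percent."""
    return (before - after) / before * 100


def summarize():
    """Return {region: (speedup ratio, percent of time saved)}."""
    return {
        region: (
            speedup(BEFORE_MINUTES[region], AFTER_MINUTES[region]),
            time_saved_pct(BEFORE_MINUTES[region], AFTER_MINUTES[region]),
        )
        for region in BEFORE_MINUTES
    }


if __name__ == "__main__":
    for region, (ratio, saved) in summarize().items():
        print(f"{region}: {ratio:.2f}x faster, {saved:.1f}% less time")
```

Both regions come out at roughly 2.3 times faster, or a little under 60% less run time, which is in the same range as the paper's stated figure if that figure is read as a ratio of old to new run time.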

Design Considerations

Using the BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing to think through these things, there can be significant benefits. Consider using BDD when the following conditions hold:

1. There is a large amount of data coming in.

2. The data can be read faster than the rest of the data flow can process it, either because there is significant transformation work to do or because the destination is the bottleneck. If the destination is the bottleneck, it must be parallelizable.

3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted, don't split it up using BDD.
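The third consideration is easier to see with a small model of the routing. The sketch below is a plain Python illustration, not SSIS code: it assumes the buffer-by-buffer, round-robin routing that this paper describes for BDD (first buffer to output 1, second to output 2, and so on), and every name in it (make_buffers, distribute, load_parallel) is hypothetical. Python threads are used only to show independent segments writing to one destination; they are not a statement about SSIS threading or performance.

```python
import threading


def make_buffers(rows, buffer_size):
    """Group rows into fixed-size buffers, the unit that gets routed."""
    return [rows[i:i + buffer_size] for i in range(0, len(rows), buffer_size)]


def distribute(buffers, n_outputs):
    """Route whole buffers round-robin: buffer i goes to output i % n_outputs."""
    outputs = [[] for _ in range(n_outputs)]
    for index, buf in enumerate(buffers):
        outputs[index % n_outputs].append(buf)
    return outputs


def load_parallel(outputs):
    """Let one thread per output append its buffers to a shared destination."""
    destination = []
    lock = threading.Lock()

    def worker(assigned):
        for buf in assigned:
            with lock:  # serialize writes to the shared list
                destination.extend(buf)

    threads = [threading.Thread(target=worker, args=(a,)) for a in outputs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return destination


if __name__ == "__main__":
    rows = list(range(100))
    outputs = distribute(make_buffers(rows, 10), 4)
    loaded = load_parallel(outputs)
    print([len(o) for o in outputs], len(loaded))
```

Every row arrives at the destination exactly once, but the arrival order depends on how the threads interleave, so a sorted input is not guaranteed to stay sorted. That is the reason order-dependent data is a poor fit for this pattern.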

Conclusion

We have compared the performance results before and after the redesign, and the results are positive. BDD is a real win for transforming data from the Mastering system to the consuming applications, as well as for reducing the number of copies of the mastering data held by consumers.

• After leveraging BDD, the local copy of the database has been retired.

• Interface job performance has improved. An average gain of 220% was achieved.

Benefits

• Increased performance by around 220%

• Reduced infrastructure cost by retiring the local DB copy

• Decommissioned the Local PES DB (6 copies, each around 75 GB)

• Reusable solution for future requirements

• Easier supportability and maintainability

• Freedom from replication issues

• Increased CSAT and CPE

Appendix

Acknowledgements

Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed Maqsood. We would like to acknowledge the leadership and support provided by Hariharan Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking efforts and provided useful information available anytime, anywhere.

References

[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS," in SQL Server Technical Article

[2] SSIS Blog, Balance Data Distributor, "SQL Server Blog," in Microsoft TechNet Blogs

[3] Steve Wake, Balanced Data Distributor (BDD) for SSIS, "BI Blog"

For more information

http://www.microsoft.com/sqlserver/: SQL Server Web site

http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter

http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

Did this paper help you? Please provide feedback. On a scale of 1 (poor) to 5 (excellent), how would you rate this paper, and why have you given this rating? For example:

• Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason?

• Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?

This feedback helps improve the quality of white papers released.






Introduction

There is a new Data Flow transformation component available for SQL Server Integration Services (SSIS): the Balanced Data Distributor (BDD). BDD lets you create more than one independent segment against the destination. BDD [1] provides an easy way to make better use of multi-processor and multi-core servers by introducing parallelism in the data flow of an SSIS package. This paper describes the following:

• Using the new Balanced Data Distributor (BDD) Data Flow transformation component in SSIS-SQL Server BIDS-2008/2012, including performance analysis.
• The specifics involved when leveraging BDD to build a highly scalable enterprise application that uses massively parallel processing (MPP) to deliver high performance and scalability on SSIS.

BDD [1] is a simple, new Data Flow component that can have huge benefits when working with data that takes too long to load in a single Data Flow. The intention of BDD is to improve performance through multi-threading. This white paper describes the scenario and the pattern where BDD can be leveraged in SQL Server 2008/2012. It also covers the challenges and best practices to consider when using BDD to build enterprise solutions, as well as the performance analysis at different stages.

This content is suitable for developers, architects, and database administrators. It is assumed that readers of this white paper have basic knowledge of SQL Server 2008, SQL Server 2012, SQL Server 2014, and SQL Server administration.

About Balanced Data Distributor (BDD)

The functionality of the BDD [1, 2] is very simple: it takes its input data and routes it in equal proportions to its outputs, however many there are. If you have four outputs, roughly ¼ of the input rows go to each output. Instead of routing individual rows, the BDD operates on buffers of data, so it's very efficient. The value of the BDD comes from the way modern servers work: parallelism. When there are independent segments of an SSIS data flow, SSIS can distribute the work over multiple threads. BDD provides an easy way to create independent segments.

The following diagram illustrates this. In the diagram, BDD-MarketProduct (the BDD component) takes its input data from the data flow component (Market Product) and routes it in equal proportions to its outputs. Because there are four independent segments, roughly a quarter of the input rows go to each output.

This BDD [1, 2, 3] Data Flow component was developed as a way to split processing within an SSIS package to take advantage of today's multi-processor and multi-core servers. Because BDD introduces parallelism inside the data flow, it does not require running multiple copies of equivalent packages in parallel, or other, harder solutions. The BDD component is simple to use. Once it is installed, open your Data Flow and use BDD against your file or OLE DB source. Then create as many output flows as you wish. Each new output flow should perform the same work as the original slow-moving path. The outputs can write to an equivalent destination or be brought back together using a "Union All", depending on your needs. Regardless of the number of outputs you use, BDD does the distribution by routing each pipeline buffer to one of the outputs. For example, if there are three outputs set up, the first buffer goes to output 1, the second buffer goes to output 2, the third buffer goes to output 3, and so on until the entire input is consumed.
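The buffer-by-buffer routing described above can be sketched in a few lines of Python. This is only an illustration of the round-robin idea, not the BDD implementation: the function names `make_buffers` and `distribute` and the `buffer_size` parameter are assumptions made for the example, and a real SSIS pipeline sizes its buffers through its own settings.

```python
from itertools import islice


def make_buffers(rows, buffer_size):
    """Group an iterable of rows into lists of at most buffer_size rows."""
    it = iter(rows)
    while True:
        buf = list(islice(it, buffer_size))
        if not buf:
            return
        yield buf


def distribute(rows, n_outputs, buffer_size):
    """Route whole buffers to outputs in round-robin order (hypothetical helper)."""
    outputs = [[] for _ in range(n_outputs)]
    for i, buf in enumerate(make_buffers(rows, buffer_size)):
        # Buffer 0 goes to output 0, buffer 1 to output 1, and so on, wrapping around.
        outputs[i % n_outputs].extend(buf)
    return outputs


if __name__ == "__main__":
    for index, out in enumerate(distribute(range(12), n_outputs=3, buffer_size=2), start=1):
        print(f"output {index}: {out}")
```

With 12 rows, three outputs and two rows per buffer, each output receives four rows, always as whole buffers, which is why the split is only roughly equal when the row count is not a multiple of the buffer size.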

If you run the Data Flow on a laptop, there probably won't be a speed advantage, and there may even be a speed cost. If you run the Data Flow on a server with multiple cores and many disk spindles supporting the destination database, there may be a substantial speed advantage.

This paper focuses on the scenario and the pattern where BDD can be leveraged in SSIS 2008 and SSIS 2012, including the challenges and best practices to consider when using BDD to build enterprise solutions.

Balanced Data Distributor as a Solution: Case Study

Context

Mastering data may include data about clients and customers, employees, products, service offerings, and more. Mastering data is typically shared by multiple users and groups across an organization and stored on different systems. Different systems might pull data through replication, especially merge replication, because it supports filtering. Filtering allows applications to pull data based on their domain and regional (geographical) values.

MSIT has a mastering system in the Enterprise Service Business called Global PES (Product Entitlement System). Global PES is a centralized data warehouse that contains a variety of information on products, service offerings, and entitlements that are defined by a market's Support Policy (in most circumstances). Global PES maintains this information at the market level. Global PES pushes all new and modified data to regional MS CRM instances through merge replication. Specifically:

• Global PES publishes data through merge replication with a filter, because each regional MS CRM consumes product and services data based on its specific market.

MSIT has 6 regional MS CRM applications that consume mastering data from Global PES through merge replication. Each regional CRM extracts the data to an intermediate database using a nightly SQL Agent job, based on specific transformations and customizations at the regional level. As a result, every subscribing application (regional CRM) has a copy of the PES DB (created through replication), which is a redundant copy of the mastering data. The following diagram shows the context of the current architecture of Global PES and CRM integration with the local subscribers.

Figure 0: Interaction of Mastering System (Global PES) and MS CRM used within Services business

Objectives

• Perform a proof of concept that leverages BDD in SSIS to retire the duplicate copy of the mastering data, using SQL Server 2012.
• Improve the performance and scalability of extraction, transformation, and loading through SSIS, and retire a SQL Agent job.
• Explain the improvements observed when migrating from the SQL Agent job to the ETL (Extraction, Transformation, and Load) process.

Current Architecture

The current architecture of Global PES (Publisher/Mastering System) and the subscribing applications (Regional MS CRM) is shown in the following figure. In this scenario, merge replication with a filter replicates the data changes to the subscribing systems (Regional CRM). Administrators use the Admin tool to administer and manage the data in the mastering system. Merge replication runs in continuous mode, so changes made to the data are replicated to the subscribing systems immediately.

[Figure labels: Global PES; NA CRM; HKTW CRM; KOREA; Japan CRM; APAC CRM; EMEA CRM]

Figure 1: Current Architecture of Mastering and Subscribing System [Simplified Architecture]

In the current architecture, Global PES (the Mastering DB) is replicated to all subscribing applications, which creates duplicate copies of the same mastering data at the subscribing systems. The Interface DB is used by the subscribing CRM applications. Based on defined transformation and business rules, a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the Interface DB. Full data extraction is always used, and the Interface DB does not have any integrity constraints such as primary keys, check constraints, or foreign keys.

The goal is to retire the copy of mastering data at the regional level and to improve the performance and scalability of the SQL Agent job by replacing it with ETL. This improves supportability and maintainability, and reduces infrastructure cost by not duplicating the mastering data. To accomplish this goal, we use the Balanced Data Distributor (BDD) in SSIS to redesign our architecture and take advantage of parallelism.

Proposed Architecture

In the current scenario, consuming applications see modified data once a day, after the SQL Agent job executes. So the scope is to use BDD to cut off replication and to transform the data directly from the mastering DB to the Interface DB through SSIS. BDD is a good fit for this problem for the following reasons:

1. The Interface DB does not have any constraints (its tables are heaps).
2. The data load is a full load every time the SQL Agent job runs.
3. The schema/entity structure of all regions is the same.

Since there are no integrity constraints in the Interface DB, BDD is a great solution: the data load is faster than with the traditional data flow components. Using BDD and parallelism, we can accomplish the transformation from the mastering DB to the subscribing applications.

The following figure shows the redesigned architecture using BDD.

Figure 2: SSIS is configured in the mastering system and runs against each region based on the defined schedule and input parameters

In this redesigned architecture:

• ETL is designed using BDD. There are several regional consuming applications of the mastering DB. These applications continue to consume data from the Interface DB, which is now loaded through SSIS (ETL) instead of replication and a SQL Agent job.
• The Interface DB structure is similar across regions. As a result, the SSIS (ETL) package with BDD is designed so that it can be used with different regions by simply changing input parameter values such as Region ID and Server Name.

The following steps sync the mastering and consuming applications:

1. Create a Configuration table in Global PES that stores the regional server names and region IDs. This data is used by SSIS-ETL to retrieve the product and service information based on the regional server names and region IDs.

   Region ID/Market ID | Server Name
   13 | NACRM
   43 | EMEACRM

   Because we need to filter data in the same way that merge replication did, this configuration table is created and used by the SSIS package.

2. Create an SSIS package using BDD so that it automatically uses multiple threads against the same destination. We have created the data flow for 35 entities, as seen in the following diagram. Here you can see the sample data flow of MarketProduct, which uses BDD and has 4 independent segments (Marketproduct-thread1 to Marketproduct-thread4) to the destination. Data transformation from the Mastering system to the Subscribers is now very fast, and the Local copy of PES can be retired.

   Figure 3: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor): Data Flow

3. The SSIS package control flow is designed to take care of transformation and business rules, as seen in the following diagram. In this scenario, the following occurs:

   i. Extract the Region ID and Server Name from the configuration table.
   ii. Capture the ETL process log, including the Error log.
   iii. Truncate all tables in the Interface DB before loading the Mastering data.
   iv. Start loading the data from Global PES to the Interface DB through BDD.
   v. Apply localization based on the regional collation.

   Figure 4: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor): Control Flow

4. The SSIS packages are configured as a job for every region in the Mastering System. The job runs on a scheduled frequency.

5. The SSIS package takes the [Region ID] and [Server Name] values as input parameters from the configuration table and transforms the data against the retrieved server name.
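The order of operations in the control flow can be sketched in plain Python. This is a toy model under stated assumptions, not the SSIS package itself: `RegionConfig`, `InterfaceDb`, `run_region`, and the `region_id` key on each row are hypothetical names introduced for the example, in-memory lists stand in for database tables, and the localization step is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RegionConfig:
    """One row of the configuration table (Region ID, Server Name)."""
    region_id: int
    server_name: str


# Values taken from the sample configuration table above.
CONFIG_TABLE = [RegionConfig(13, "NACRM"), RegionConfig(43, "EMEACRM")]


@dataclass
class InterfaceDb:
    """Toy stand-in for a regional Interface DB: table name -> list of rows."""
    tables: dict = field(default_factory=dict)

    def truncate_all(self):
        for rows in self.tables.values():
            rows.clear()


def run_region(config, master_rows, interface_db, log):
    """Run one regional load and return the number of rows loaded."""
    # (i) Region ID and Server Name arrive through `config`.
    # (ii) Capture the process log, including errors.
    log.append(f"start: region {config.region_id} -> {config.server_name}")
    try:
        # (iii) Every run is a full load, so clear the target tables first.
        interface_db.truncate_all()
        # (iv) Load only this region's rows, mirroring the replication filter.
        selected = [row for row in master_rows if row["region_id"] == config.region_id]
        interface_db.tables.setdefault("MarketProduct", []).extend(selected)
    except Exception as exc:
        log.append(f"error: {exc}")
        raise
    # (v) Localization by regional collation is omitted from this sketch.
    log.append(f"done: {len(selected)} rows")
    return len(selected)
```

Because the regional schemas are the same, the only things that change between runs are the values in `config`, which is the property that lets one package serve every region.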

Steps to add BDD in SSIS tool box

Step 1: Run BalancedDataDistributor-x86.exe to install the BDD component in the Toolbox.
Step 2: Once it is installed, right-click Data Flow Transformation and select Choose Items.
Step 3: In Choose Items, select the SSIS Data Flow Items tab, select Balanced Data Distributor, and select OK.

Balanced Data Distributor (BDD) is added.

Environments for performance analysis

The performance analyses completed throughout the migration process use the following environments.

System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Global PES (Mastering System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLGCPESSQL09 | 16 GB | 500 GB | 2.35 GHz

Table 1: Global PES SQL Server versions and complete system configurations

System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed
Local PES (CRM System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLCLFYCRM01 | 16 GB | 900 GB | 2.35 GHz

Table 2: CRM System (Local PES) SQL Server versions and complete system configurations

The performance of SSIS BDD versus the SQL Agent Job is evaluated in two parts:

1. Performance analysis of the existing solution (SQL Agent Job without SSIS-BDD)
2. Performance analysis of the new solution, leveraging SSIS-BDD

Performance analysis before using BDD

Global PES (Mastering DB) is replicated across the CRM systems, which creates a Local copy of PES in the regional systems. Existing Global PES implementation:

DB Name | Description | DB Size
Global PES | Global PES DB (Mastering DB) | 75 GB

Table 3: Global PES Database Size (Mastering System)

The regional CRM system subscribes to Global PES so that the local PES DB copy is created. A nightly feed SQL Agent job extracts data from the local PES DB and inserts it into the Interface DB based on the defined transformation and business rules. Local PES implementation:

DB Name | Description | DB Size
Local PES | Local PES DB (Local Copy of Mastering DB in CRM) | 75 GB
Interface DB (Populated through Nightly Feed Job) | Pulls data to an intermediate database using a nightly SQL Agent job, based on specific transformations and customizations at the regional level | 8 GB

Table 4: CRM System (Local PES): Local PES and Interface DB size

The following table lists the performance numbers of the existing implementation, without SSIS-BDD (Balanced Data Distributor).

Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region
Merge Replication from Global PES to CRM | Global PES to CRM System through Replication for creating Local PES copy | 75 GB | Continuous Replication | 75 GB | Continuous Replication
Nightly Feed Job in CRM | To populate Interface DB | 7 GB (around 20 Million for some of the Key tables) | 35 Minutes | 7 GB (around 20 Million for some of the Key tables) | 30 Minutes

Table 5: Performance in minutes for every run in regional CRM systems

As you can see, performance differs from region to region because the size of the data varies with the filtering, which is defined according to regional requirements.

Performance analysis after using BDD

The following table lists the performance numbers for data extraction using SSIS-BDD. This SSIS-BDD implementation shows a 220% improvement across the regional runs.

Version | State | Rows ret | NA Region | EMEA Region
SSIS (ETL) through BDD | Replication has been retired and Local PES copy creation has been eliminated | This saved around 75 GB storage | Retired | Retired
SSIS (ETL) through BDD | Nightly Feed Job to populate Interface DB has been retired through SSIS BDD | 7-8 GB (around 20 Million for some of the Key tables) | 15 Minutes | 13 Minutes

The following chart shows the execution time (in minutes).

[Chart: SSIS BDD vs SQL Agent Job. Horizontal axis: Region (NA Region (Interface Feed Job), EMEA Region (Interface Feed Job)). Vertical axis: Minutes, 0 to 40. Before BDD (Minutes): 35 and 30. After BDD (Minutes): 15 and 13.]

Figure 5: Performance (in minutes) of the SSIS-ETL run in different regions

The following chart shows the database size (in GB).

[Chart: Local PES Size, Before BDD (Replication) vs After BDD (No Replication). Horizontal axis: Region (NA Region (Local PES DB), EMEA Region (Local PES DB)). Vertical axis: DB Size (GB), 0 to 80. Data labels: 75 and 75.]

Figure 6: Database Size after the implementation of SSIS-ETL run in different regions

By leveraging SSIS-BDD, we can retire the local copy of the PES database. Direct data extraction from Global PES to the Interface DB takes less time and shows a 220% performance improvement. This is a real win where large data loads are concerned. To take the solution forward, the trade-off between OLTP and integration workloads must be considered so that leveraging BDD continues to yield good performance.
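The paper does not state how the improvement figure was calculated, so the following sketch simply shows two common ways to summarize the measured run times (35 to 15 minutes for NA, 30 to 13 minutes for EMEA). The function names are assumptions made for the example, and neither metric is claimed to be the one the authors used.

```python
def speedup(before_minutes, after_minutes):
    """How many times faster the new run is (before / after)."""
    return before_minutes / after_minutes


def time_reduction_pct(before_minutes, after_minutes):
    """Percentage of the original run time that was saved."""
    return 100 * (before_minutes - after_minutes) / before_minutes


# Run times in minutes, taken from the before and after tables.
RUNS = {"NA": (35, 15), "EMEA": (30, 13)}

if __name__ == "__main__":
    for region, (before, after) in RUNS.items():
        print(
            f"{region}: {speedup(before, after):.2f}x faster, "
            f"{time_reduction_pct(before, after):.1f}% less time"
        )
```

On these numbers both regions run a little over twice as fast, which corresponds to a bit more than half of the original run time being saved.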

Design Considerations

Using the BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing to think through these things, there can be significant benefits. Consider using BDD when the following conditions hold:

1. There is a large amount of data coming in.
2. The data can be read faster than the rest of the data flow can process it, either because there is significant transformation work to do or because the destination is the bottleneck. If the destination is the bottleneck, it must be parallelizable.
3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted, don't split it up using BDD.
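The ordering caveat in item 3 can be seen with a small Python sketch. It is an illustration only: `split_round_robin` and `union_all` are hypothetical helpers, and simple concatenation stands in for one possible arrival order when parallel outputs are brought back together.

```python
def split_round_robin(rows, n_outputs, buffer_size):
    """Send consecutive buffers of rows to outputs in round-robin order."""
    outputs = [[] for _ in range(n_outputs)]
    for start in range(0, len(rows), buffer_size):
        target = (start // buffer_size) % n_outputs
        outputs[target].extend(rows[start:start + buffer_size])
    return outputs


def union_all(outputs):
    """Recombine outputs one after another, with no ordering guarantee."""
    merged = []
    for out in outputs:
        merged.extend(out)
    return merged


if __name__ == "__main__":
    sorted_rows = list(range(8))
    merged = union_all(split_round_robin(sorted_rows, n_outputs=2, buffer_size=2))
    print(merged)                         # same rows, different order
    print(merged == sorted_rows)          # False: the sort order is lost
    print(sorted(merged) == sorted_rows)  # True: no rows were lost
```

No rows are dropped, but the recombined stream is no longer sorted, so a data flow that depends on row order would need an additional sort step after the split or should avoid the split altogether.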



This BDD[123] Data Flow component was developed as a way to split your processing within an SSIS

package to take advantage of todays multi-processor and multi-core servers By introducing parallelism

BDD does not require running multiple copies of equivalent packages in parallel or alternative harder

solutions The BDD component is incredibly simple to use Once installed open your Data Flow and use

BBD against your computer file or OLEDB source Then you create several output flows as you wish

From there the new output flows should be equivalent to the execution of the slow moving processes

They will write to an equivalent output or be incorporated back using a ldquoUnion Allrdquo based on your needs

Regardless of the range of outputs you utilize BDD control can do the rendering by routing every pipeline

buffer of information to the outputs For example if there are three outputs setup the first buffer goes to

output 1 the second buffer goes to output 2 the third buffer goes to output 3 and so on until the entire

input file is complete

If you run the Data Flow on a laptop there probably wonrsquot be a speed advantage and there may even be

a speed cost If you run the Data Flow on a server with multiple cores and many disk spindles supporting

the destination database then there may be a substantial speed advantage

In this paper the main focus is on the scenario and the pattern where BDD can be leveraged in SSIS

2008 and SSIS 2012 including challenges and best practices when considering BDD to build enterprise

solutions

Balanced Data Distributor as a Solution Case Study

Context

Mastering data may include data about clients and customers employees product service offerings and

more Mastering data is typically shared by multiple users and groups across an organization and stored

on different systems Different systems might be pulling data through replication especially merge

replication since filtering is enabled Filtering allows applications to pull data based on their domain and

regional (geographical) values

MSIT has a mastering system in Enterprise Service Business called Global PES (Product Entitlement

System) Global PES is a centralized data warehouse that contains a variety of information on products

service offerings and entitlements that are defined by a marketrsquos Support Policy (in most circumstances)

Global PES maintains this information at the market level Global PES pushes all new and modified data

to Regional MS CRM instances through Merge replication Specifically

Global PES is publishing data through Merge Replication with filter as regional MS CRM

consumes product and services data based on the specific market

MSIT has 6 regional MS CRM applications that consume mastering data from Global PES through merge

replication The regional CRM extracts the data to an intermediate database using a nightly SQL Agent

job based on specific transformations and customizations at the regional level So all subscribing

applicationsregional CRM have a copy of the PES DB (created through replication) which is a redundant

copy of mastering data The following diagram shows the context of the current architecture of Global

PES and CRM integration with the local subscribers

Figure 0 Interaction of Mastering System (Global PES) and MS CRM used within Services business

Objectives

Perform a proof of concept to leverage BDD in SSIS to retire the duplicate copy of the mastering

data using SQL Server 2012

Improve the performance and scalability of Extraction Transformation and Loading through SSIS

and retire a SQL Agent job

Explain the improvements observed when migrating from the SQL Agent job to the ETL

(Extraction Transformation and Load) process

Current Architecture

The current architecture of the Global PES (PublisherMastering Systems) and subscribing applications

(Regional MS CRM) is shown in the following figure In this scenario merge replication with filter

replicates the data changes to the subscribing systems [Regional CRM] Administrators use the Admin

tool to administer and manage the data in the mastering system Changes made to the data are

replicated to the subscribing systems immediately Merge replication is running in continuous mode

NA CRM

HKTW CRM KOREA

Japan CRMAPAC CRM

EMEA CRM

Global PES

Figure1 Current Architecture of Mastering and Subscribing System [Simplified Architecture]

In the current architecture the Global PES (Mastering DB) is replicated against all subscribing

applications which creates duplicates of the same mastering data copy at the subscribing systems The

Interface DB is used by the subscribing CRM applications Based on defined transformation and business

rules a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the

Interface DB Full data extraction is always used and the Interface DB does not have any integrity

constraints like Primary key Check Constraints Foreign Constraints and so on

The current idea is to retire the copy of mastering data at the regional level and improve the performance

and scalability of the SQL Agent job by replacing it with ETL This step enriches the supportability and

maintainability experience and reduces the infrastructure cost by not duplicating the mastering data To

accomplish this goal we use Balanced Data Distributor (BDD) in SSIS to redesign our architecture and

take advantage of parallelism

Proposed Architecture

In the current scenario consuming applications recognize the modified data daily after the SQL Agent job

executes So the scope is to use BDD to cut off replication and to transform the data directly from the

mastering DB to the Interface DB through SSIS BDD is the exact component and solution to solve this

problem because of the following

1 The Interface DB does not have any constraints (heap table)

2 The data load is full every time the SQL Agent job runs

3 The schemaentity structure of all regions is the same

Since there are no integrity constraints in the Interface DB BDD is a great solution Data load is faster

than the traditional data flow components Using BDD and parallelism we can accomplish the

transformation from the mastering DB to the subscribing applications

The following figure shows the redesigned architecture using BDD

Figure2 SSIS-is configured in mastering system and runs against each region based on the defined schedule and input parameters

In this redesigned architecture

ETL is designed using BDD There are several regional consuming applications of the mastering

DB These applications continue to consume data from the Interface DB which is loaded through

SSIS (ETL) instead of replications and a SQL Agent job

The Interface DB structure is similar across regions As a result the SSIS (ETL) package with

BDD is designed so it can be used with different regions by simply changing the input parameter

values like Region ID and Server Name

The following steps sync the mastering and consuming applications

1 Create a Configuration table in Global PES that stores the regional server names and region IDs

This data is used by SSIS-ETL to retrieve the product and service information based on the

regional server names and region IDs

Region IDMarket ID Server Name

13 NACRM

43 EMEACRM

As we need to filter data similar to Merge replication this configuration table is created and will be

used by the SSIS package

2 Create an SSIS package using BDD so it automatically uses multiple threads against the same

destination We have created the data flow for 35 entities as seen in the following diagram Here

you can see the sample data flow of MarketProduct which uses BDD and has 4 independent

segments (Marketproduct-thread1 to Marketproduct-thread4) to the destination Now data

transformation from the Mastering to the Subscribers is very fast and the Local copy of PES can

be retired

Figure 3 SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor)-Data Flow

3 The SSIS package control flow is designed to take care of transformation and business rules as

seen in the following diagram In this scenario the following occurs

i Extract the Region ID and Server name from the configuration table

ii Capture the ETL process log including the Error log

iii Truncate all tables in the Interface DB before loading the Mastering data

iv Start loading the data from Global PES to Interface DB through BDD

v Apply localization based on the regional collation

Figure 4 SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor)-Control Flow

4 SSIS package are configured as a job in every region in the Mastering System The job runs on a

scheduled frequency

5 SSIS package take the [Region ID] and [Server Name] values as input parameters from the

configuration table and transform the data against the retrieved server name

Steps to add BDD in SSIS tool box

Step 1 Run BalancedDataDistributor-x86exe to install the BDD component in the Toolbox

Step 2 Once installed right-click on the Data Flow Transformation and select Choose Items

Step 3 In Choose Items select the SSIS Data Flow Items tab select Balanced Data Distributor and

select OK

Balanced Data Distributor (BDD) is added

Environments for performance analysis

The performance analyses completed throughout the migration process uses the following environments

System Technology Stack

Server Name RAM Secondary Memory

Processor Speed

Global PES (Mastering System)

SQL Server 2012 R2 amp Windows Server 2012 R2

AGLGCPESSQL09 16 GB 500 GB 235 GHZ

Table 1 Global PES SQL Server versions and complete system configurations

System Technology Stack

Server Name RAM Secondary Memory

Processor Speed

Local PES (CRM System)

SQL Server 2012 R2 amp Windows Server 2012 R2

AGLCLFYCRM01 16 GB 900 GB 235 GHZ

Table 2 CRM System (Local PES) SQL Server versions and complete system configurations

The performance analysis of SSIS BDD versus the SQL Agent Job is evaluated as

1 Performance Analysis of Existing Solution (SQL Agent Job without SSIS-BDD)

2 Performance Analysis of New Solution by leveraging SSIS-BDD

Performance analysis before using BDD

Global PES (Mastering DB) is replicated across CRM systems and creates a Local copy of the PES in the

regional systems Existing Global PES implementation

DB Name Description DB Size

Global PES Global PES DB (Mastering DB) 75 GB

Table 3 Global PES Database Size (Mastering System)

The Regional CRM system subscribes to Global PES so that the local PES DB copy is created A nightly

feed SQL Agent job extracts data from the local PES DB and inserts into the Interface DB based on the

defined transformation and business rules Local PES implementation

DB Name Description DB Size

Local PES Local PES DB (Local Copy of Mastering DB in CRM) 75 GB

Interface DB (Populated through Nightly Feed Job)

Pulls data to an intermediate database using a nightly SQL Agent job based on specific transformations and customizations at the regional level

8 GB

Table 4 CRM System (Local PES) Local PES and Interface DB size

The following table lists the performance numbers of the existing implementation, without SSIS-BDD (Balanced Data Distributor).

| Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region |
|---|---|---|---|---|---|
| Merge Replication from Global PES to CRM | Global PES to CRM System through replication, for creating the Local PES copy | 75 GB | Continuous replication | 75 GB | Continuous replication |
| Nightly Feed Job in CRM | To populate Interface DB | 7 GB (around 20 million rows for some of the key tables) | 35 minutes | 7 GB (around 20 million rows for some of the key tables) | 30 minutes |

Table 5. Performance in minutes for every run in regional CRM systems

As you can see, performance differs from region to region because the size of the data varies with the filtering, which is defined according to regional requirements.

Performance analysis after using BDD

The following table lists the performance numbers for data extraction using SSIS-BDD. The SSIS-BDD implementation shows an improvement of about 220% across the regional runs.

| Version | State | Rows ret | NA Region | EMEA Region |
|---|---|---|---|---|
| SSIS (ETL) through BDD | Replication has been retired and Local PES copy creation has been eliminated | This saved around 75 GB storage | Retired | Retired |
| SSIS (ETL) through BDD | Nightly Feed Job to populate Interface DB has been retired through SSIS BDD | 7-8 GB (around 20 million rows for some of the key tables) | 15 minutes | 13 minutes |

The following chart shows the execution time (in minutes).

[Bar chart: "SSIS BDD vs SQL Agent Job". X axis: Region; Y axis: Minutes. NA Region (Interface Feed Job): 35 before BDD, 15 after BDD. EMEA Region (Interface Feed Job): 30 before BDD, 13 after BDD.]

Figure 5. Performance (in minutes) of the SSIS-ETL run in different regions

The following chart shows the database size (in GB).

[Bar chart: "Local PES Size: Before BDD (Replication) vs After BDD (No Replication)". X axis: Region; Y axis: DB Size (GB). NA Region (Local PES DB) and EMEA Region (Local PES DB): 75 GB each before BDD; no Local PES copy after BDD.]

Figure 6. Database size after the implementation of SSIS-ETL run in different regions

By leveraging SSIS-BDD, we can retire the local copy of the PES database. Direct data extraction from Global PES to the Interface DB takes less time and shows a performance improvement of about 220%, which is a significant win for large data loads. In taking the solution forward, there is a trade-off to consider between the OLTP workload on the mastering system and the integration load, while still achieving good performance with BDD.
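The paper does not state how its improvement figure was derived. As a rough cross-check, the following sketch computes two common readings from the run times reported in the tables (35 and 30 minutes before, 15 and 13 minutes after). The dictionary and function names are illustrative assumptions, not part of the original solution.

```python
# Run times in minutes, taken from the before/after tables in this paper.
RUN_TIMES = {
    "NA": {"before": 35, "after": 15},
    "EMEA": {"before": 30, "after": 13},
}

def speedup(before, after):
    """How many times faster the new run is (before / after)."""
    return before / after

def time_saved_pct(before, after):
    """Percentage of the original run time that was eliminated."""
    return 100.0 * (before - after) / before

# Map each region to (speedup ratio, percentage of run time saved).
summary = {
    region: (round(speedup(**t), 2), round(time_saved_pct(**t), 1))
    for region, t in RUN_TIMES.items()
}

for region, (ratio, saved) in summary.items():
    print(f"{region}: {ratio}x faster, {saved}% less run time")
```

Under these definitions, both regions come out at roughly 2.3 times faster, or about 57% less run time, which is in the same range as the improvement reported above.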

Design Considerations

Using BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing to think through these things, there can be significant benefits. Consider using BDD when the following conditions hold:

1. There is a large amount of data coming in.

2. The data can be read faster than the rest of the data flow can process it, either because there is significant transformation work to do or because the destination is the bottleneck. If the destination is the bottleneck, it must be parallelizable.

3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted, don't split it up using BDD.
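The ordering condition is easier to see with a small analogy. The following sketch is plain Python, not SSIS, and is only an assumption-laden illustration of the idea: incoming rows are cut into buffers, and several output threads each take the next available buffer. The function name `distribute` and its parameters are hypothetical.

```python
import queue
import threading

def distribute(rows, num_outputs=4, buffer_size=1000):
    """Hand fixed-size buffers of rows to whichever output thread is free."""
    buffers = queue.Queue()
    for start in range(0, len(rows), buffer_size):
        buffers.put(rows[start:start + buffer_size])

    outputs = [[] for _ in range(num_outputs)]  # one list per output path

    def drain(index):
        # Each output pulls the next available buffer until none are left.
        while True:
            try:
                batch = buffers.get_nowait()
            except queue.Empty:
                return
            outputs[index].extend(batch)

    threads = [threading.Thread(target=drain, args=(i,)) for i in range(num_outputs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return outputs

outputs = distribute(list(range(10_000)), num_outputs=4, buffer_size=500)
merged = [row for output in outputs for row in output]
```

Every row arrives exactly once, but `merged` is not guaranteed to be in source order, because which thread picks up which buffer depends on scheduling. That is why a data flow that must stay sorted is a poor candidate for this kind of split.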

Conclusion

We compared the performance results before and after the redesign, and the results are positive. BDD is an effective way to transform data from the Mastering system to the consuming applications and to reduce the number of redundant copies held by those applications.

• After leveraging BDD, the local copy of the database has been retired.

• Interface job performance has improved. An average gain of about 220% was achieved.

Benefits

• Performance increased by around 220%.
• Infrastructure cost is reduced by retiring the local DB copy.
• The Local PES DB is decommissioned (6 copies, each around 75 GB, roughly 450 GB in total).
• The solution is reusable for future requirements.
• Supportability and maintainability are easier.
• Replication issues are eliminated.
• CSAT and CPE increase.

Appendix

Acknowledgements

Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed Maqsood. We would like to acknowledge the leadership and support provided by Hariharan Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking efforts and provided useful information available anytime, anywhere.

References

[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS," in SQL Server Technical Article.

[2] SSIS Blog, Balanced Data Distributor, "SQL Server Blog," in Microsoft TechNet Blogs.

[3] Steve Wake, Balanced Data Distributor (BDD) for SSIS, "BI Blog."

For more information

http://www.microsoft.com/sqlserver/: SQL Server Web site

http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter

http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

Did this paper help you? Please provide feedback. On a scale of 1 (poor) to 5 (excellent), how would you rate this paper, and why have you given this rating? For example:

• Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason?
• Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?

This feedback helps improve the quality of white papers released.


Balanced Data Distributor as a Solution Case Study

Context

Mastering data may include data about clients and customers, employees, products, service offerings, and more. Mastering data is typically shared by multiple users and groups across an organization and stored on different systems. Different systems might pull data through replication, especially merge replication, since filtering is enabled. Filtering allows applications to pull data based on their domain and regional (geographical) values.

MSIT has a mastering system in Enterprise Service Business called Global PES (Product Entitlement System). Global PES is a centralized data warehouse that contains a variety of information on products, service offerings, and entitlements that are defined by a market's Support Policy (in most circumstances). Global PES maintains this information at the market level. Global PES pushes all new and modified data to the regional MS CRM instances through merge replication. Specifically:

• Global PES publishes data through merge replication with a filter, because each regional MS CRM consumes product and services data based on its specific market.

MSIT has 6 regional MS CRM applications that consume mastering data from Global PES through merge replication. The regional CRM extracts the data to an intermediate database using a nightly SQL Agent job, based on specific transformations and customizations at the regional level. As a result, all subscribing applications/regional CRMs have a copy of the PES DB (created through replication), which is a redundant copy of the mastering data. The following diagram shows the context of the current architecture of Global PES and CRM integration with the local subscribers.

Figure 0 Interaction of Mastering System (Global PES) and MS CRM used within Services business

Objectives

• Perform a proof of concept to leverage BDD in SSIS to retire the duplicate copy of the mastering data, using SQL Server 2012.

• Improve the performance and scalability of Extraction, Transformation, and Loading through SSIS, and retire a SQL Agent job.

• Explain the improvements observed when migrating from the SQL Agent job to the ETL (Extraction, Transformation, and Load) process.

Current Architecture

The current architecture of the Global PES (Publisher/Mastering System) and the subscribing applications (Regional MS CRM) is shown in the following figure. In this scenario, merge replication with a filter replicates the data changes to the subscribing systems (Regional CRM). Administrators use the Admin tool to administer and manage the data in the mastering system. Changes made to the data are replicated to the subscribing systems immediately, because merge replication runs in continuous mode.

[Diagram: Global PES replicating to the regional subscribers NA CRM, HKTW CRM, KOREA, Japan CRM, APAC CRM, and EMEA CRM]

Figure 1. Current Architecture of Mastering and Subscribing System [Simplified Architecture]

In the current architecture, the Global PES (Mastering DB) is replicated to all subscribing applications, which creates duplicate copies of the same mastering data at the subscribing systems. The Interface DB is used by the subscribing CRM applications. Based on defined transformation and business rules, a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the Interface DB. Full data extraction is always used, and the Interface DB does not have any integrity constraints such as primary keys, check constraints, or foreign keys.

The goal is to retire the copy of the mastering data at the regional level and to improve the performance and scalability of the SQL Agent job by replacing it with ETL. This change improves supportability and maintainability and reduces infrastructure cost by not duplicating the mastering data. To accomplish this goal, we use the Balanced Data Distributor (BDD) in SSIS to redesign our architecture and take advantage of parallelism.

Proposed Architecture

In the current scenario, consuming applications see modified data daily, after the SQL Agent job executes. The scope is therefore to use BDD to eliminate replication and to transform the data directly from the mastering DB to the Interface DB through SSIS. BDD fits this problem for the following reasons:

1. The Interface DB does not have any constraints (its tables are heaps).

2. The data load is full every time the SQL Agent job runs.

3. The schema/entity structure of all regions is the same.

Because there are no integrity constraints in the Interface DB, BDD is a good fit: the data load is faster than with the traditional data flow components. Using BDD and parallelism, we can accomplish the transformation from the mastering DB to the subscribing applications.

The following figure shows the redesigned architecture using BDD.

Figure 2. SSIS is configured in the mastering system and runs against each region based on the defined schedule and input parameters

In this redesigned architecture:

• ETL is designed using BDD. There are several regional consuming applications of the mastering DB. These applications continue to consume data from the Interface DB, which is now loaded through SSIS (ETL) instead of replication and a SQL Agent job.

• The Interface DB structure is similar across regions. As a result, the SSIS (ETL) package with BDD is designed so that it can be used with different regions by simply changing input parameter values such as Region ID and Server Name.

The following steps sync the mastering and consuming applications:

1. Create a configuration table in Global PES that stores the regional server names and region IDs. SSIS-ETL uses this data to retrieve the product and service information based on the regional server names and region IDs.

| Region ID/Market ID | Server Name |
|---|---|
| 13 | NACRM |
| 43 | EMEACRM |

Because we need to filter data in the same way that merge replication did, this configuration table is created and used by the SSIS package.

2. Create an SSIS package using BDD so that it automatically uses multiple threads against the same destination. We created the data flow for 35 entities, as seen in the following diagram. The sample data flow for MarketProduct uses BDD and has 4 independent segments (Marketproduct-thread1 to Marketproduct-thread4) to the destination. Data transformation from the Mastering system to the subscribers is now very fast, and the local copy of PES can be retired.

Figure 3. SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor): Data Flow

3. The SSIS package control flow is designed to take care of transformation and business rules, as seen in the following diagram. In this scenario, the following occurs:

i. Extract the Region ID and Server Name from the configuration table.

ii. Capture the ETL process log, including the error log.

iii. Truncate all tables in the Interface DB before loading the Mastering data.

iv. Start loading the data from Global PES to the Interface DB through BDD.

v. Apply localization based on the regional collation.

Figure 4. SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor): Control Flow

4. The SSIS package is configured as a job for every region in the Mastering System. The job runs on a scheduled frequency.

5. The SSIS package takes the [Region ID] and [Server Name] values as input parameters from the configuration table and transforms the data against the retrieved server name.
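The steps above can be sketched in ordinary code to make the sequence concrete. This is a simplified Python analogy under stated assumptions, not the SSIS package itself: `REGION_CONFIG`, `MASTER_ROWS`, and `load_region` are hypothetical stand-ins for the configuration table, the Global PES data, and the package run, and the localization step is left out.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the configuration table and the mastering data.
REGION_CONFIG = [
    {"region_id": 13, "server_name": "NACRM"},
    {"region_id": 43, "server_name": "EMEACRM"},
]
MASTER_ROWS = [
    {"region_id": 13, "product": "P1"},
    {"region_id": 43, "product": "P2"},
    {"region_id": 13, "product": "P3"},
]

def load_region(config, master_rows, interface_dbs, threads=4):
    """Run one full load for a region and return its process log."""
    region_id, server = config["region_id"], config["server_name"]  # read config
    log = [f"start {server}"]                                        # process log
    target = interface_dbs.setdefault(server, [])
    target.clear()                                 # truncate before the full load
    rows = [r for r in master_rows if r["region_id"] == region_id]  # regional filter
    chunks = [rows[i::threads] for i in range(threads)]             # one slice per thread
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each worker copies its slice; results are appended to the same target.
        for loaded in pool.map(lambda chunk: [dict(r) for r in chunk], chunks):
            target.extend(loaded)
    log.append(f"loaded {len(target)} rows into {server}")
    return log

interface_dbs = {}
logs = [load_region(c, MASTER_ROWS, interface_dbs) for c in REGION_CONFIG]
```

Because each run clears the target before loading, running `load_region` again for the same region leaves the same rows rather than doubling them, which mirrors the truncate-then-full-load design described in the control flow.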

Steps to add BDD in SSIS tool box

Step 1: Run BalancedDataDistributor-x86.exe to install the BDD component in the Toolbox.

Step 2: Once installed, right-click the Data Flow Transformation and select Choose Items.

Step 3: In Choose Items, select the SSIS Data Flow Items tab, select Balanced Data Distributor, and select OK.

Balanced Data Distributor (BDD) is added.



Figure 0 Interaction of Mastering System (Global PES) and MS CRM used within Services business

Objectives

Perform a proof of concept to leverage BDD in SSIS to retire the duplicate copy of the mastering

data using SQL Server 2012

Improve the performance and scalability of Extraction Transformation and Loading through SSIS

and retire a SQL Agent job

Explain the improvements observed when migrating from the SQL Agent job to the ETL

(Extraction Transformation and Load) process

Current Architecture

The current architecture of the Global PES (PublisherMastering Systems) and subscribing applications

(Regional MS CRM) is shown in the following figure In this scenario merge replication with filter

replicates the data changes to the subscribing systems [Regional CRM] Administrators use the Admin

tool to administer and manage the data in the mastering system Changes made to the data are

replicated to the subscribing systems immediately Merge replication is running in continuous mode

NA CRM

HKTW CRM KOREA

Japan CRMAPAC CRM

EMEA CRM

Global PES

Figure1 Current Architecture of Mastering and Subscribing System [Simplified Architecture]

In the current architecture the Global PES (Mastering DB) is replicated against all subscribing

applications which creates duplicates of the same mastering data copy at the subscribing systems The

Interface DB is used by the subscribing CRM applications Based on defined transformation and business

rules a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the

Interface DB Full data extraction is always used and the Interface DB does not have any integrity

constraints like Primary key Check Constraints Foreign Constraints and so on

The current idea is to retire the copy of mastering data at the regional level and improve the performance

and scalability of the SQL Agent job by replacing it with ETL This step enriches the supportability and

maintainability experience and reduces the infrastructure cost by not duplicating the mastering data To

accomplish this goal we use Balanced Data Distributor (BDD) in SSIS to redesign our architecture and

take advantage of parallelism

Proposed Architecture

In the current scenario consuming applications recognize the modified data daily after the SQL Agent job

executes So the scope is to use BDD to cut off replication and to transform the data directly from the

mastering DB to the Interface DB through SSIS BDD is the exact component and solution to solve this

problem because of the following

1 The Interface DB does not have any constraints (heap table)

2 The data load is full every time the SQL Agent job runs

3 The schemaentity structure of all regions is the same

Since there are no integrity constraints in the Interface DB BDD is a great solution Data load is faster

than the traditional data flow components Using BDD and parallelism we can accomplish the

transformation from the mastering DB to the subscribing applications

The following figure shows the redesigned architecture using BDD

Figure2 SSIS-is configured in mastering system and runs against each region based on the defined schedule and input parameters

In this redesigned architecture

ETL is designed using BDD There are several regional consuming applications of the mastering

DB These applications continue to consume data from the Interface DB which is loaded through

SSIS (ETL) instead of replications and a SQL Agent job

The Interface DB structure is similar across regions As a result the SSIS (ETL) package with

BDD is designed so it can be used with different regions by simply changing the input parameter

values like Region ID and Server Name

The following steps sync the mastering and consuming applications

1 Create a Configuration table in Global PES that stores the regional server names and region IDs

This data is used by SSIS-ETL to retrieve the product and service information based on the

regional server names and region IDs

Region IDMarket ID Server Name

13 NACRM

43 EMEACRM

As we need to filter data similar to Merge replication this configuration table is created and will be

used by the SSIS package

2 Create an SSIS package using BDD so it automatically uses multiple threads against the same

destination We have created the data flow for 35 entities as seen in the following diagram Here

you can see the sample data flow of MarketProduct which uses BDD and has 4 independent

segments (Marketproduct-thread1 to Marketproduct-thread4) to the destination Now data

transformation from the Mastering to the Subscribers is very fast and the Local copy of PES can

be retired

Figure 3 SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor)-Data Flow

3 The SSIS package control flow is designed to take care of transformation and business rules as

seen in the following diagram In this scenario the following occurs

i Extract the Region ID and Server name from the configuration table

ii Capture the ETL process log including the Error log

iii Truncate all tables in the Interface DB before loading the Mastering data

iv Start loading the data from Global PES to Interface DB through BDD

v Apply localization based on the regional collation

Figure 4 SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor)-Control Flow

4 SSIS package are configured as a job in every region in the Mastering System The job runs on a

scheduled frequency

5 SSIS package take the [Region ID] and [Server Name] values as input parameters from the

configuration table and transform the data against the retrieved server name

Steps to add BDD in SSIS tool box

Step 1 Run BalancedDataDistributor-x86exe to install the BDD component in the Toolbox

Step 2 Once installed right-click on the Data Flow Transformation and select Choose Items

Step 3 In Choose Items select the SSIS Data Flow Items tab select Balanced Data Distributor and

select OK

Balanced Data Distributor (BDD) is added

Environments for performance analysis

The performance analyses completed throughout the migration process uses the following environments

System Technology Stack

Server Name RAM Secondary Memory

Processor Speed

Global PES (Mastering System)

SQL Server 2012 R2 amp Windows Server 2012 R2

AGLGCPESSQL09 16 GB 500 GB 235 GHZ

Table 1 Global PES SQL Server versions and complete system configurations

System Technology Stack

Server Name RAM Secondary Memory

Processor Speed

Local PES (CRM System)

SQL Server 2012 R2 amp Windows Server 2012 R2

AGLCLFYCRM01 16 GB 900 GB 235 GHZ

Table 2 CRM System (Local PES) SQL Server versions and complete system configurations

The performance analysis of SSIS BDD versus the SQL Agent Job is evaluated as

1 Performance Analysis of Existing Solution (SQL Agent Job without SSIS-BDD)

2 Performance Analysis of New Solution by leveraging SSIS-BDD

Performance analysis before using BDD

Global PES (Mastering DB) is replicated across CRM systems and creates a Local copy of the PES in the

regional systems Existing Global PES implementation

DB Name Description DB Size

Global PES Global PES DB (Mastering DB) 75 GB

Table 3 Global PES Database Size (Mastering System)

The Regional CRM system subscribes to Global PES so that the local PES DB copy is created A nightly

feed SQL Agent job extracts data from the local PES DB and inserts into the Interface DB based on the

defined transformation and business rules Local PES implementation

DB Name Description DB Size

Local PES Local PES DB (Local Copy of Mastering DB in CRM) 75 GB

Interface DB (Populated through Nightly Feed Job)

Pulls data to an intermediate database using a nightly SQL Agent job based on specific transformations and customizations at the regional level

8 GB

Table 4 CRM System (Local PES) Local PES and Interface DB size

The following table lists the performance numbers of the existing implementation without using SSIS-

BDD (Balance Data Distributor)

Version State DB Size (NA Region)

NA Region DB Size (EMEA Region)

EMEA Region

Merge Replication from Global PES to CRM

Global PES to CRM System through Replication for creating Local PES copy

75 GB Continuous Replication

75 GB Continuous Replication

Nightly Feed Job in CRM

To populate Interface DB

7 GB (around 20 Million for some of the Key tables)

35 Minutes 7 GB (around 20 Million for some

of the Key tables)

30 Minutes

Table 5 performance in Minutes for every run in regional CRM systems

As you can see there is a difference in performance from region-to-region as the size of the data varies

based on the filtering which is defined according to a regional requirement

Performance analysis after using BDD

The following table lists the performance number of data extraction by leveraging SSIS-BDD This SSIS-

BDD implementation shows a 220 improvement in different regional runs

Version State Rows ret NA Region EMEA Region

SSIS (ETL) through BDD

Replication has been retired and Local PES copy creation has been eliminated

This saved around 75 GB storage

Retired Retired

SSIS (ETL) through BDD

Nightly Feed Job to populate Interface DB has been retired through SSIS BDD

7-8 GB (around 20 Million for some of the Key tables)

15 Minutes 13 Minutes

The following table shows the execution time (in minutes)

NA Region (Interface Feed Job) EMEA Region (Interface Feed Job)0

5

10

15

20

25

30

35

40

35

30

15

13

SSIS BDD vs SQL Agent Job

Before BDD (Minutes) After BDD (Minutes)

Region

Min

utes

Figure 5 Performance (in minutes) of the SSIS-ETL run in different regions

The following table shows the database size (in GB)

NA Region (Local PES DB)

EMEA Region (Local PES DB)

0

10

20

30

40

50

60

70

80

Before BDD (Replication)(GB)

After BDD (No Replication)(GB)

7575

Local PES Size (Before BDD -Replication Vs After BDD (No Replication)

Before BDD (Replication)(GB) After BDD (No Replication)(GB)

REGION

DB S

IZE

(GB)

Figure 6 Database Size after the implementation of SSIS-ETL run in different regions

By leveraging SSIS-BDD we can retire the local copy of the PES database Direct data extraction from

Global PES to the Interface DB takes less time and shows a 220 performance improvement So this is

really a great win as far as huge data load is concerned To take the solution forward there is a trade-off

between OLTP and integration to consider yielding good performance by leveraging BDD

Design Considerations

Using the BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing to think through these things, there can be significant benefits. Consider using BDD when the following conditions hold:

1. There is a large amount of data coming in.

2. The data can be read faster than the rest of the data flow can process it, either because there is significant transformation work to do or because the destination is the bottleneck. If the destination is the bottleneck, it must be parallelizable.

3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted, don't split it up using BDD.
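The ordering condition can be illustrated with a small simulation. This is a sketch only: it assumes, as BDD is generally described, that whole buffers of rows are handed to the outputs in round-robin fashion, and the function names and buffer size are hypothetical rather than anything exposed by SSIS.

```python
from itertools import islice


def to_buffers(rows, buffer_size):
    """Split an input row stream into fixed-size buffers (pipeline batches)."""
    it = iter(rows)
    while True:
        buf = list(islice(it, buffer_size))
        if not buf:
            return
        yield buf


def distribute_round_robin(rows, buffer_size, n_outputs):
    """Hand whole buffers to the outputs in turn: 0, 1, ..., n-1, 0, 1, ..."""
    outputs = [[] for _ in range(n_outputs)]
    for i, buf in enumerate(to_buffers(rows, buffer_size)):
        outputs[i % n_outputs].extend(buf)
    return outputs


sorted_input = list(range(1, 13))  # 12 rows that arrive already sorted
outputs = distribute_round_robin(sorted_input, buffer_size=2, n_outputs=3)

# Each path finishes independently. Appending the paths one after another is
# just one possible arrival order at a shared destination.
arrived = [row for path in outputs for row in path]

print(outputs)
print("order preserved:", arrived == sorted_input)
print("same rows:", sorted(arrived) == sorted_input)
```

Every row still reaches the destination, but the sorted sequence is interleaved across the paths, which is why a data flow that depends on row order is a poor candidate for BDD.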

Conclusion

We have compared the performance results before and after the redesign, and the results are positive. BDD is an effective solution for transforming data from the Mastering system to the consuming applications, and it reduces the number of consumption copies.

• After leveraging BDD, the local copy of the database has been retired.

• Interface job performance has improved. An average gain of 220% was achieved.

Benefits

Increased performance by around 220%
Reduced infrastructure cost by retiring the local DB copy
Decommissioned the Local PES DB (6 copies, each around 75 GB)
Reusable solution for future requirements
Easier supportability and maintainability
Freedom from replication issues
Increased CSAT and CPE

Appendix

Acknowledgements

Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed Maqsood. We would like to acknowledge the leadership and support provided by Hariharan Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking efforts and provided useful information that is available anytime, anywhere.

References

[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS," in SQL Server Technical Article

[2] SSIS Blog, Balance Data Distributor, "SQL Server Blog," in Microsoft TechNet Blogs

[3] Steve Wake, Balanced Data Distributor (BDD) for SSIS, "BI Blog"

For more information

http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

Did this paper help you? Please provide feedback. On a scale of 1 (poor) to 5 (excellent), how would you rate this paper, and why have you given it this rating? For example:

Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason?

Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?

This feedback helps improve the quality of white papers released.

Send Feedback

Figure 1: Current architecture of the mastering and subscribing systems [simplified architecture]

In the current architecture, the Global PES (Mastering DB) is replicated to all subscribing applications, which creates a duplicate copy of the same mastering data at each subscribing system. The Interface DB is used by the subscribing CRM applications. Based on defined transformation and business rules, a nightly SQL Agent job in the regional CRM system extracts data from the Local PES to the Interface DB. Full data extraction is always used, and the Interface DB does not have any integrity constraints such as primary keys, check constraints, or foreign keys.

The idea is to retire the copy of mastering data at the regional level and to improve the performance and scalability of the SQL Agent job by replacing it with ETL. This step improves supportability and maintainability and reduces infrastructure cost by not duplicating the mastering data. To accomplish this goal, we use the Balanced Data Distributor (BDD) in SSIS to redesign our architecture and take advantage of parallelism.

Proposed Architecture

In the current scenario, consuming applications see modified data daily, after the SQL Agent job executes. The scope is therefore to use BDD to eliminate replication and to transform the data directly from the mastering DB to the Interface DB through SSIS. BDD fits this problem for the following reasons:

1. The Interface DB does not have any constraints (heap tables).

2. The data load is full every time the SQL Agent job runs.

3. The schema/entity structure of all regions is the same.

Because there are no integrity constraints in the Interface DB, rows can be inserted in parallel and in any order, which makes BDD a good fit. The data load is faster than with the traditional data flow components. Using BDD and parallelism, we can accomplish the transformation from the mastering DB to the subscribing applications.

The following figure shows the redesigned architecture using BDD.

Figure 2: SSIS is configured in the mastering system and runs against each region based on the defined schedule and input parameters

In this redesigned architecture:

• ETL is designed using BDD. There are several regional consuming applications of the mastering DB. These applications continue to consume data from the Interface DB, which is now loaded through SSIS (ETL) instead of replication and a SQL Agent job.

• The Interface DB structure is similar across regions. As a result, the SSIS (ETL) package with BDD is designed so that it can be used with different regions by simply changing the input parameter values, such as Region ID and Server Name.

The following steps sync the mastering and consuming applications:

1. Create a Configuration table in Global PES that stores the regional server names and region IDs. This data is used by SSIS-ETL to retrieve the product and service information based on the regional server names and region IDs.

| Region ID/Market ID | Server Name |
|---|---|
| 13 | NACRM |
| 43 | EMEACRM |

Because we need to filter data in the same way that Merge replication did, this configuration table is created and used by the SSIS package.

2. Create an SSIS package using BDD so that it automatically uses multiple threads against the same destination. We have created the data flow for 35 entities, as seen in the following diagram. Here you can see the sample data flow of MarketProduct, which uses BDD and has 4 independent segments (Marketproduct-thread1 to Marketproduct-thread4) to the destination. Data transformation from the Mastering system to the Subscribers is now much faster, and the Local copy of PES can be retired.

Figure 3: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor), Data Flow

3. The SSIS package control flow is designed to take care of transformation and business rules, as seen in the following diagram. In this scenario, the following occurs:

i. Extract the Region ID and Server Name from the configuration table.

ii. Capture the ETL process log, including the error log.

iii. Truncate all tables in the Interface DB before loading the Mastering data.

iv. Start loading the data from Global PES to the Interface DB through BDD.

v. Apply localization based on the regional collation.

Figure 4: SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor), Control Flow

4. SSIS packages are configured as a job for every region in the Mastering System. The job runs on a scheduled frequency.

5. The SSIS package takes the [Region ID] and [Server Name] values as input parameters from the configuration table and transforms the data against the retrieved server name.
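The steps above can be sketched end to end in a few lines. This is a toy model under stated assumptions, not the SSIS package itself: the configuration rows, the `GLOBAL_PES` sample data, the `InterfaceTable` class, and the `run_region` function are all hypothetical stand-ins, the destination is an in-memory list rather than a SQL Server heap table, and the localization step is omitted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the Configuration table in Global PES.
CONFIG = [
    {"region_id": 13, "server_name": "NACRM"},
    {"region_id": 43, "server_name": "EMEACRM"},
]

# Hypothetical mastering rows: (region_id, product_name).
GLOBAL_PES = [(13, f"product-{i}") for i in range(10)] + [
    (43, f"product-{i}") for i in range(6)
]


class InterfaceTable:
    """Toy heap-style destination: no keys, no constraints, no ordering."""

    def __init__(self):
        self.rows = []
        self._lock = threading.Lock()

    def truncate(self):
        with self._lock:
            self.rows.clear()

    def bulk_insert(self, batch):
        with self._lock:
            self.rows.extend(batch)


def run_region(region_id, destination, n_paths=4, buffer_size=2):
    """Full reload for one region: log, truncate, then load over parallel paths."""
    log = [f"start region {region_id}"]
    destination.truncate()  # full load on every run
    source = [row for row in GLOBAL_PES if row[0] == region_id]  # regional filter
    buffers = [source[i:i + buffer_size] for i in range(0, len(source), buffer_size)]
    # Deal the buffers out round-robin, one slice per parallel path.
    paths = [buffers[p::n_paths] for p in range(n_paths)]

    def load_path(path_buffers):
        for buf in path_buffers:
            destination.bulk_insert(buf)

    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        list(pool.map(load_path, paths))  # list() surfaces any worker exception
    log.append(f"loaded {len(destination.rows)} rows")
    return log


destinations = {cfg["server_name"]: InterfaceTable() for cfg in CONFIG}
for cfg in CONFIG:
    print(run_region(cfg["region_id"], destinations[cfg["server_name"]]))
```

The same function serves every region because only the region ID and the destination change between runs, which mirrors the parameterized package described in step 5. The four paths correspond to the four MarketProduct segments in step 2, and the insert order at the destination is not guaranteed, which is acceptable only because the destination has no ordering or integrity constraints.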

Steps to add BDD in SSIS tool box

Step 1: Run BalancedDataDistributor-x86.exe to install the BDD component in the Toolbox.

Step 2: Once installed, right-click on the Data Flow Transformation and select Choose Items.

Step 3: In Choose Items, select the SSIS Data Flow Items tab, select Balanced Data Distributor, and select OK.

Balanced Data Distributor (BDD) is added.

Environments for performance analysis

The performance analyses completed throughout the migration process use the following environments.

| System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed |
|---|---|---|---|---|---|
| Global PES (Mastering System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLGCPESSQL09 | 16 GB | 500 GB | 2.35 GHz |

Table 1: Global PES SQL Server versions and complete system configurations

| System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed |
|---|---|---|---|---|---|
| Local PES (CRM System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLCLFYCRM01 | 16 GB | 900 GB | 2.35 GHz |

Table 2: CRM System (Local PES) SQL Server versions and complete system configurations

The performance analysis of SSIS BDD versus the SQL Agent job is evaluated in two parts:

1. Performance analysis of the existing solution (SQL Agent job without SSIS-BDD)

2. Performance analysis of the new solution using SSIS-BDD

Performance analysis before using BDD

Global PES (Mastering DB) is replicated across the CRM systems, which creates a Local copy of the PES in the regional systems. Existing Global PES implementation:

| DB Name | Description | DB Size |
|---|---|---|
| Global PES | Global PES DB (Mastering DB) | 75 GB |

Table 3: Global PES database size (Mastering System)

The Regional CRM system subscribes to Global PES so that the local PES DB copy is created. A nightly feed SQL Agent job extracts data from the local PES DB and inserts it into the Interface DB based on the defined transformation and business rules. Local PES implementation:

| DB Name | Description | DB Size |
|---|---|---|
| Local PES | Local PES DB (Local Copy of Mastering DB in CRM) | 75 GB |
| Interface DB (Populated through Nightly Feed Job) | Pulls data to an intermediate database using a nightly SQL Agent job based on specific transformations and customizations at the regional level | 8 GB |

Table 4: CRM System (Local PES), Local PES and Interface DB size

The following table lists the performance numbers of the existing implementation without using SSIS-BDD (Balanced Data Distributor).

| Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region |
|---|---|---|---|---|---|
| Merge Replication from Global PES to CRM | Global PES to CRM System through Replication for creating Local PES copy | 75 GB | Continuous Replication | 75 GB | Continuous Replication |
| Nightly Feed Job in CRM | To populate Interface DB | 7 GB (around 20 million for some of the key tables) | 35 minutes | 7 GB (around 20 million for some of the key tables) | 30 minutes |

Table 5: Performance in minutes for every run in the regional CRM systems

Page 10: A Case Study on Building a Highly Scalable …download.microsoft.com/download/D/2/0/D20E1C5F-72EA-450… · Web viewMicrosoft IT: A Case Study on Building a Highly Scalable Enterprise

1 Create a Configuration table in Global PES that stores the regional server names and region IDs

This data is used by SSIS-ETL to retrieve the product and service information based on the

regional server names and region IDs

Region IDMarket ID Server Name

13 NACRM

43 EMEACRM

As we need to filter data similar to Merge replication this configuration table is created and will be

used by the SSIS package

2 Create an SSIS package using BDD so it automatically uses multiple threads against the same

destination We have created the data flow for 35 entities as seen in the following diagram Here

you can see the sample data flow of MarketProduct which uses BDD and has 4 independent

segments (Marketproduct-thread1 to Marketproduct-thread4) to the destination Now data

transformation from the Mastering to the Subscribers is very fast and the Local copy of PES can

be retired

Figure 3 SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor)-Data Flow

3 The SSIS package control flow is designed to take care of transformation and business rules as

seen in the following diagram In this scenario the following occurs

i Extract the Region ID and Server name from the configuration table

ii Capture the ETL process log including the Error log

iii Truncate all tables in the Interface DB before loading the Mastering data

iv Start loading the data from Global PES to Interface DB through BDD

v Apply localization based on the regional collation

Figure 4 SSIS-ETL implementation by leveraging BDD (Balanced Data Distributor)-Control Flow

4 SSIS package are configured as a job in every region in the Mastering System The job runs on a

scheduled frequency

5 SSIS package take the [Region ID] and [Server Name] values as input parameters from the

configuration table and transform the data against the retrieved server name

Steps to add BDD in SSIS tool box

Step 1 Run BalancedDataDistributor-x86exe to install the BDD component in the Toolbox

Step 2 Once installed right-click on the Data Flow Transformation and select Choose Items

Step 3 In Choose Items select the SSIS Data Flow Items tab select Balanced Data Distributor and

select OK

Balanced Data Distributor (BDD) is added

Environments for performance analysis

The performance analyses completed throughout the migration process uses the following environments

System Technology Stack

Server Name RAM Secondary Memory

Processor Speed

Global PES (Mastering System)

SQL Server 2012 R2 amp Windows Server 2012 R2

AGLGCPESSQL09 16 GB 500 GB 235 GHZ

Table 1 Global PES SQL Server versions and complete system configurations

System Technology Stack

Server Name RAM Secondary Memory

Processor Speed

Local PES (CRM System)

SQL Server 2012 R2 amp Windows Server 2012 R2

AGLCLFYCRM01 16 GB 900 GB 235 GHZ

Table 2 CRM System (Local PES) SQL Server versions and complete system configurations

The performance analysis of SSIS BDD versus the SQL Agent Job is evaluated as

1 Performance Analysis of Existing Solution (SQL Agent Job without SSIS-BDD)

2 Performance Analysis of New Solution by leveraging SSIS-BDD

Performance analysis before using BDD

Global PES (Mastering DB) is replicated across CRM systems and creates a Local copy of the PES in the

regional systems Existing Global PES implementation

DB Name Description DB Size

Global PES Global PES DB (Mastering DB) 75 GB

Table 3 Global PES Database Size (Mastering System)

The Regional CRM system subscribes to Global PES so that the local PES DB copy is created A nightly

feed SQL Agent job extracts data from the local PES DB and inserts into the Interface DB based on the

defined transformation and business rules Local PES implementation

DB Name Description DB Size

Local PES Local PES DB (Local Copy of Mastering DB in CRM) 75 GB

Interface DB (Populated through Nightly Feed Job)

Pulls data to an intermediate database using a nightly SQL Agent job based on specific transformations and customizations at the regional level

8 GB

Table 4 CRM System (Local PES) Local PES and Interface DB size

The following table lists the performance numbers of the existing implementation, which does not use SSIS-BDD (Balanced Data Distributor).

| Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region |
|---|---|---|---|---|---|
| Merge Replication from Global PES to CRM | Global PES to CRM System through replication, creating the Local PES copy | 75 GB | Continuous replication | 75 GB | Continuous replication |
| Nightly Feed Job in CRM | Populates the Interface DB | 7 GB (around 20 million rows for some of the key tables) | 35 minutes | 7 GB (around 20 million rows for some of the key tables) | 30 minutes |

Table 5. Performance (in minutes) for every run in the regional CRM systems

As the table shows, performance differs from region to region because the size of the data varies with the filtering defined for each region's requirements.

Performance analysis after using BDD

The following table lists the performance numbers for data extraction using SSIS-BDD. The SSIS-BDD implementation shows a 220% improvement across the regional runs.

| Version | State | Rows ret | NA Region | EMEA Region |
|---|---|---|---|---|
| SSIS (ETL) through BDD | Replication has been retired and Local PES copy creation has been eliminated | This saved around 75 GB of storage | Retired | Retired |
| SSIS (ETL) through BDD | The nightly feed job that populated the Interface DB has been retired and replaced by SSIS BDD | 7-8 GB (around 20 million rows for some of the key tables) | 15 minutes | 13 minutes |
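The paper does not state how the 220% figure was derived. As a rough cross-check only, the sketch below computes two common measures (speedup factor and percentage reduction in run time) from the timings in the tables above. The function names and the choice of measures are assumptions made for illustration and are not taken from the paper.

```python
# Illustrative arithmetic only: the measures below are assumptions,
# not the paper's own method for arriving at its 220% figure.

# Run times in minutes, taken from the before/after tables: (before, after).
RUNS = {"NA": (35, 15), "EMEA": (30, 13)}


def speedup(before_min, after_min):
    """How many times faster the new run is (before / after)."""
    return before_min / after_min


def time_reduction_pct(before_min, after_min):
    """Percentage of the original run time that was eliminated."""
    return (before_min - after_min) / before_min * 100.0


def summarize(runs):
    """Return rounded speedup and time reduction for each region."""
    return {
        region: {
            "speedup": round(speedup(before, after), 2),
            "time_reduction_pct": round(time_reduction_pct(before, after), 1),
        }
        for region, (before, after) in runs.items()
    }


if __name__ == "__main__":
    for region, metrics in summarize(RUNS).items():
        print(region, metrics)
```

Under these definitions both regions run a little over twice as fast as before, which is the same order of magnitude as the improvement the paper reports.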

The following chart shows the execution time (in minutes).

[Chart: SSIS BDD vs SQL Agent Job. X-axis: Region (NA Region and EMEA Region, Interface Feed Job). Y-axis: Minutes (0 to 40). Before BDD: 35 (NA), 30 (EMEA). After BDD: 15 (NA), 13 (EMEA).]

Figure 5. Performance (in minutes) of the SSIS-ETL run in different regions

The following chart shows the database size (in GB).

[Chart: Local PES Size, Before BDD (Replication) vs After BDD (No Replication). X-axis: Region (NA Region and EMEA Region, Local PES DB). Y-axis: DB size in GB (0 to 80). Before BDD (Replication): 75 GB in both regions. After BDD (No Replication): the local copy is retired.]

Figure 6. Database size after the implementation of the SSIS-ETL run in different regions

By leveraging SSIS-BDD, we can retire the local copy of the PES database. Direct data extraction from Global PES to the Interface DB takes less time and shows a 220% performance improvement, which is a significant win for large data loads. Taking the solution forward means weighing the trade-off between the OLTP workload and the integration workload so that BDD continues to yield good performance.

Design Considerations

Using the BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing to think through these things, there can be significant benefits. Consider using BDD when the following conditions hold:

1. There is a large amount of data coming in.

2. The data can be read faster than the rest of the data flow can process it, either because there is significant transformation work to do or because the destination is the bottleneck. If the destination is the bottleneck, it must be parallelizable.

3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted, don't split it up using BDD.
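The conditions above can be made concrete with a small conceptual sketch. It is not the SSIS implementation: it assumes, for illustration only, that whole buffers of rows are handed to parallel paths in round-robin order, and the names `distribute`, `run_parallel`, and `transform` are hypothetical. It shows why the downstream work must be parallelizable and why row order is not preserved.

```python
import queue
import threading


def distribute(buffers, n_outputs):
    """Assign whole buffers to outputs in round-robin order (assumed policy)."""
    outputs = [[] for _ in range(n_outputs)]
    for index, buf in enumerate(buffers):
        outputs[index % n_outputs].append(buf)
    return outputs


def run_parallel(buffers, n_outputs, transform):
    """Process each output path on its own thread and collect the results.

    The order of the returned buffers depends on thread scheduling, which is
    why a data flow with an ordering dependency is a poor fit for this pattern.
    """
    results = queue.Queue()

    def worker(path_id, path_buffers):
        for buf in path_buffers:
            results.put((path_id, [transform(row) for row in buf]))

    threads = [
        threading.Thread(target=worker, args=(path_id, path_buffers))
        for path_id, path_buffers in enumerate(distribute(buffers, n_outputs))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # All workers have finished, so the queue can be drained safely.
    loaded = []
    while not results.empty():
        loaded.append(results.get())
    return loaded


if __name__ == "__main__":
    rows = list(range(12))
    buffers = [rows[i:i + 3] for i in range(0, len(rows), 3)]  # 4 buffers of 3 rows
    for path_id, buf in run_parallel(buffers, n_outputs=2, transform=lambda r: r * 10):
        print(f"path {path_id}: {buf}")
```

Every row is processed exactly once, but buffers from different paths may arrive in any order, so a sort order in the source is not guaranteed to survive the split.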

Conclusion

We have compared the performance results of the designs before and after BDD, and the results are positive. BDD is a winning solution for transforming data from the Mastering system to consuming applications, and it reduces the number of copies kept for consumption.

• After leveraging BDD, the local copy of the database has been retired.

• Interface job performance has improved. An average gain of 220% was achieved.

Benefits

• Increased performance by around 220%

• Reduced infrastructure cost by retiring the local DB copy

• Decommissioned the Local PES DB (6 copies, each around 75 GB)

• Reusable solution for future requirements

• Easy supportability and maintainability

• Freedom from replication issues

• Increased CSAT and CPE
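As a quick worked figure for the storage benefit, the sketch below multiplies the number of retired copies by the approximate size of each copy, both as listed above. The function name is hypothetical, and the result is only as precise as the "around 75 GB" estimate.

```python
def reclaimed_storage_gb(copies, size_per_copy_gb):
    """Approximate storage freed by decommissioning the local copies."""
    return copies * size_per_copy_gb


if __name__ == "__main__":
    # 6 copies of roughly 75 GB each, as listed in the benefits.
    print(f"Approximate storage reclaimed: {reclaimed_storage_gb(6, 75)} GB")
```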

Appendix

Acknowledgements

Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed Maqsood. We would like to acknowledge the leadership and support provided by Hariharan Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking efforts to provide useful information that is available anytime, anywhere.

References

[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS," in SQL Server Technical Article

[2] SSIS Blog, Balance Data Distributor, "SQL Server Blog," in Microsoft TechNet Blogs

[3] Steve Wake, Balanced Data Distributor (BDD) for SSIS, "BI Blog"

For more information

http://www.microsoft.com/sqlserver/: SQL Server Web site

http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter

http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

Did this paper help you? Please provide feedback. On a scale of 1 (poor) to 5 (excellent), how would you rate this paper, and why have you given it this rating? For example:

• Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason?

• Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?

This feedback helps improve the quality of white papers released.

Send feedback.







Environments for performance analysis

The performance analyses completed throughout the migration process use the following environments:

| System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed |
|---|---|---|---|---|---|
| Global PES (Mastering System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLGCPESSQL09 | 16 GB | 500 GB | 2.35 GHz |

Table 1. Global PES: SQL Server versions and complete system configurations

| System | Technology Stack | Server Name | RAM | Secondary Memory | Processor Speed |
|---|---|---|---|---|---|
| Local PES (CRM System) | SQL Server 2012 R2 & Windows Server 2012 R2 | AGLCLFYCRM01 | 16 GB | 900 GB | 2.35 GHz |

Table 2. CRM System (Local PES): SQL Server versions and complete system configurations

The performance of SSIS BDD versus the SQL Agent job is evaluated in two parts:

1. Performance analysis of the existing solution (SQL Agent job without SSIS-BDD)

2. Performance analysis of the new solution that leverages SSIS-BDD

Performance analysis before using BDD

Global PES (Mastering DB) is replicated across the CRM systems, which creates a local copy of the PES in each regional system. Existing Global PES implementation:

| DB Name | Description | DB Size |
|---|---|---|
| Global PES | Global PES DB (Mastering DB) | 75 GB |

Table 3. Global PES database size (Mastering System)

The regional CRM system subscribes to Global PES so that the local PES DB copy is created. A nightly feed SQL Agent job extracts data from the local PES DB and inserts it into the Interface DB according to the defined transformation and business rules. Local PES implementation:

| DB Name | Description | DB Size |
|---|---|---|
| Local PES | Local PES DB (local copy of Mastering DB in CRM) | 75 GB |
| Interface DB (populated through Nightly Feed Job) | Pulls data to an intermediate database using a nightly SQL Agent job based on specific transformations and customizations at the regional level | 8 GB |

Table 4. CRM System (Local PES): Local PES and Interface DB size

The following table lists the performance numbers of the existing implementation, which does not use SSIS-BDD (Balanced Data Distributor).

| Version | State | DB Size (NA Region) | NA Region | DB Size (EMEA Region) | EMEA Region |
|---|---|---|---|---|---|
| Merge Replication from Global PES to CRM | Global PES to CRM System through Replication for creating Local PES copy | 75 GB | Continuous Replication | 75 GB | Continuous Replication |
| Nightly Feed Job in CRM | To populate Interface DB | 7 GB (around 20 million rows for some of the key tables) | 35 minutes | 7 GB (around 20 million rows for some of the key tables) | 30 minutes |

Table 5. Performance in minutes for every run in regional CRM systems

As the table shows, performance differs from region to region because the size of the data varies with the filtering defined for each region's requirements.

Performance analysis after using BDD

The following table lists the performance numbers for data extraction using SSIS-BDD. This SSIS-BDD implementation shows a 220% improvement across the regional runs.

| Version | State | Rows ret | NA Region | EMEA Region |
|---|---|---|---|---|
| SSIS (ETL) through BDD | Replication has been retired and Local PES copy creation has been eliminated | This saved around 75 GB storage | Retired | Retired |
| SSIS (ETL) through BDD | Nightly Feed Job to populate Interface DB has been retired through SSIS BDD | 7-8 GB (around 20 million rows for some of the key tables) | 15 minutes | 13 minutes |
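The paper does not state the formula behind the 220% figure. As a rough cross-check, the following Python sketch expresses the gain in two common ways, using only the run times reported in the tables (35 and 30 minutes before BDD, 15 and 13 minutes after). The function names are illustrative and are not part of the original solution.

```python
# Run times in minutes, taken from the before/after tables in this paper.
runtimes = {
    "NA": {"before": 35, "after": 15},
    "EMEA": {"before": 30, "after": 13},
}

def speedup(before, after):
    """How many times faster the new run is (before / after)."""
    return before / after

def time_saved_pct(before, after):
    """Percentage of the original run time that was eliminated."""
    return (before - after) / before * 100

for region, t in runtimes.items():
    print(f"{region}: {speedup(**t):.2f}x faster, "
          f"{time_saved_pct(**t):.0f}% less run time")
# NA: 2.33x faster, 57% less run time
# EMEA: 2.31x faster, 57% less run time
```

Both regions run a little over twice as fast, which is the same order of magnitude as the reported gain; which of these measures the 220% figure corresponds to is not specified in the paper.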

The following chart shows the execution time (in minutes).

[Bar chart "SSIS BDD vs SQL Agent Job". X-axis: Region (NA Region and EMEA Region, Interface Feed Job). Y-axis: Minutes. Before BDD: 35 (NA), 30 (EMEA). After BDD: 15 (NA), 13 (EMEA).]

Figure 5. Performance (in minutes) of the SSIS-ETL run in different regions

The following chart shows the database size (in GB).

[Bar chart "Local PES Size: Before BDD (Replication) vs. After BDD (No Replication)". X-axis: Region (NA Region and EMEA Region, Local PES DB). Y-axis: DB size (GB). Before BDD (Replication): 75 GB in each region. After BDD (No Replication): no data labels shown.]

Figure 6. Database size after the implementation of the SSIS-ETL run in different regions

By leveraging SSIS-BDD, we can retire the local copy of the PES database. Direct data extraction from Global PES to the Interface DB takes less time and shows a 220% performance improvement, which is a significant win for large data loads. Taking the solution forward requires weighing the trade-off between OLTP and integration workloads so that BDD continues to yield good performance.

Design Considerations

Using the BDD [1, 2] requires an understanding of the hardware you will be running on, the performance of your data flow, and the nature of the data involved. BDD isn't for everyone, but for those who are willing to think through these things, there can be significant benefits. Consider using BDD when the following conditions hold:

1. There is a large amount of data coming in.

2. The data can be read faster than the rest of the data flow can process it, either because there is significant transformation work to do or because the destination is the bottleneck. If the destination is the bottleneck, it must be parallelizable.

3. There is no ordering dependency in the data rows. For example, if the data needs to stay sorted, don't split it up using BDD.
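The three conditions above can be illustrated outside SSIS. The following Python sketch is an analogy only, not the BDD component or any SSIS API: it reads rows in batches (stand-ins for data flow buffers), hands each batch to whichever worker is free, and collects results as they finish. The names `batches`, `transform`, and `distribute`, and the doubling transformation, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import islice

def batches(rows, size):
    """Yield successive lists of up to `size` rows (stand-ins for buffers)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def transform(batch):
    """Hypothetical per-row transformation; real work would be heavier."""
    return [row * 2 for row in batch]

def distribute(rows, workers=4, batch_size=1000):
    """Process batches on parallel workers; output order is NOT guaranteed."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(transform, b) for b in batches(rows, batch_size)]
        # Collect batches in completion order, not submission order.
        for fut in as_completed(futures):
            results.extend(fut.result())
    return results

out = distribute(range(10_000), workers=4, batch_size=1000)
print(len(out))  # 10000
```

Because batches are collected in completion order, the output may not match the input order, which is why condition 3 matters. Python threads also do not speed up CPU-bound work, so this sketch shows only the distribution pattern, not the performance characteristics of BDD.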

Conclusion

We have compared the performance results of the designs before and after BDD, and the results are positive. BDD is an effective solution for transforming data from the Mastering system to consuming applications while reducing the number of consumption copies.

• After leveraging BDD, the local copy of the database has been retired.

• Interface job performance has improved. An average gain of 220% was achieved.

Benefits

• Increased performance by around 220%

• Reduced infrastructure cost by retiring the local DB copy

• Decommissioned the Local PES DB (6 copies, each around 75 GB)

• Reusable solution for future requirements

• Easier to support and maintain

• Free from replication issues

• Increased CSAT and CPE
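As a quick worked figure for the storage benefit, the following Python sketch multiplies the number of decommissioned copies by the approximate size of each. The result is an estimate, since the paper gives each copy as "around 75 GB"; the variable names are illustrative.

```python
# Figures from the Benefits list: 6 Local PES copies, each around 75 GB.
copies = 6
size_per_copy_gb = 75

reclaimed_gb = copies * size_per_copy_gb
print(f"Approximate storage reclaimed: {reclaimed_gb} GB")
# Approximate storage reclaimed: 450 GB
```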

Appendix

Acknowledgements

Many thanks for the technical information and input provided by Shubhra Mittal and Mohammed Maqsood. We would like to acknowledge the leadership and support provided by Hariharan Sethuraman. Special thanks to the Microsoft community on the Internet, who have taken painstaking efforts and provided useful information, available anytime, anywhere.

References

[1] Len Wyatt, SQL Server Team, "The Balanced Data Distributor for SSIS," in SQL Server Technical Article.

[2] SSIS Blog, Balance Data Distributor, "SQL Server Blog," in Microsoft TechNet Blogs.

[3] Steve Wake, Balanced Data Distributor (BDD) for SSIS, "BI Blog."

For more information

http://www.microsoft.com/sqlserver/: SQL Server Web site

http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter

http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

Did this paper help you? Please provide feedback. On a scale of 1 (poor) to 5 (excellent), how would you rate this paper, and why have you given this rating? For example:

• Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason?

• Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?

This feedback helps improve the quality of white papers released.

Send Feedback
