Upgrade to PowerCenter Grid Within Six Months and Significantly Improve ETL Throughput
Pravin Darbare
Technology Solution Manager
Symantec
Agenda
1. Introductions
2. Business Problem
3. Strategy
4. Landscape Overview
5. Lessons Learned (what to do / what not to do)
6. How to Minimize Risk
Symantec Company Overview
Symantec is a global leader in providing security, storage and systems management solutions to help consumers and organizations secure and manage their information and identities.
Symantec At a Glance
• Approximately 19,500 employees
• Founded in 1982; IPO in 1989
• Operations in more than 50 countries
• 100 percent of Fortune 500 companies are customers
• #382 on the 2010 Fortune 500
• $6.2 billion revenue in FY 2011; approximately 50% outside of the U.S.
• More than 1,200 global patents
• Symantec footprint on more than one billion systems
• Included on Fortune’s Most Admired Companies list
• Invests 13% of annual revenue in R&D*
* R&D investment is Non-GAAP
Who We Are
Transform data into insight
The Information Management & Business Intelligence (IMBI) team at Symantec
Team Mission: Provide data intelligence & insights to Symantec & its partners, anytime & anywhere, in a secure & easy-to-use environment.
Pravin Darbare – Solution Manager
The Business Problem: Lack of Reliable Insights to Drive Sound Business Decisions
• Inability to optimize customer support by segment
• Inability to properly analyze discounts
• Inability to identify cross-sell & up-sell opportunities
• Inability to identify, track, measure and manage renewals
Do we know what, or where, the real answers are?
Go-Forward Strategy: Connect the Dots
Data → MDM → Apps → Reporting & Analytics
• Leverage the power of source systems and an EDW
• Leverage the power of consistent definitions and hierarchies
• Leverage the power of the portfolio
  • Business Intelligence
  • Business Discovery
Data & BI Governance (spanning all of the above)
Execution of the Strategy: Stabilize, Standardize, Strategize & Align on a Go-Forward Architecture & Plan
Phases: Stabilize & Standardize → Strategize & Align → Transform
Architecture Overview
IMBI COE Overview
• Source Systems: ERP, SFDC, EBE, SAS, Aprimo and others
• Replication: Shareplex replicates source data into the Staging/ODS layer (“ODS”, “CODS”)
• Enterprise MDM hubs: “CDH” and “PDH”
• ETL: PowerCenter Enterprise Grid / Informatica Data Quality (IDQ)
• EDW:
  • Integration Layer: enterprise view, integration, conforming, enrichment, history
  • Dimensional Layer: departmental, summary, aggregates, conformed dimensions
• Business Access Layer: QlikView dashboards, Xcelsius dashboards, eStore, Business Objects reports, SAP & QlikView Mobile
• Self-serve capabilities:
  • Data profiling, rules and quality metrics
  • Report management, migration and scheduling
  • Dashboard creation, management, data integration, migration and scheduling
  • Mobile dashboard creation, management and scheduling
  • Access to replicated source data and to dimensional (EDW) data through the Business Access Layer
• Governance and access controls span all layers
Informatica Footprint
Corporate standard for ETL tool
• Informatica PowerCenter version 9.1.0 Hotfix 2
• Informatica Data Quality 9.1.0 Hotfix 2
• Informatica Cloud
• Informatica PowerExchange 9.1.0 (Teradata connector)
• Informatica Metadata Manager
Large Deployment
• 900+ workflows and 4,000+ mappings in PowerCenter
• Deployed for 8+ years
• Churns through 300-500GB+ of data daily
• Complex workflows and logic
Program Approach
• Informatica Grid implementation was part of a larger Strategic
Business Intelligence program
• Piece-by-piece renewal of the entire platform
• 18 month transformation initiative
• Setting the stage for future growth
• Informatica Grid implementation in 6 months
• Commodity H/W: 3-node grid of Dell R710 servers with quad-core CPUs and 288GB RAM each
• Moved from Sun Solaris to Linux/Dell
• Parallel run of the old and new environments for 3 months
Lessons Learned
• Maximize the use of available resources
• Observations:
  • Informatica is an I/O-bound application
  • Data in transit generates high I/O
• How to speed up I/O and improve response time?
  • Maximize the use of available memory
  • Reduce use of the disk file subsystem
  • Reduce cache writes to disk
Lessons Learned
• How to implement memory changes:
  • Define the right number of processes on each node
  • Define the maximum memory allowed for the auto memory attribute based on the resources available
• The maximum memory allowed for the auto memory attribute was changed to 2GB
• 100 processes/node can use 200GB+ of memory
• Selected processes with high I/O were configured to use more than 2GB
Node 1 Node 2 Node 3
Maximum Processes 100 100 100
Allocated Memory 288GB 288GB 288GB
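The per-node budget behind the table above is simple arithmetic; here is a minimal sketch, using the figures from the slides (100 processes/node, 2GB auto-memory cap, 288GB RAM per node). The helper name is illustrative, not an Informatica API.

```python
# Sanity-check the per-node memory budget: worst-case demand when every
# process uses its full auto-memory allowance, and the remaining headroom.
# Figures come from the slides; the function is an illustrative helper.

def memory_budget(max_processes: int, auto_memory_gb: float, node_ram_gb: float):
    """Return (worst-case demand in GB, headroom in GB) for one node."""
    demand = max_processes * auto_memory_gb
    return demand, node_ram_gb - demand

demand, headroom = memory_budget(max_processes=100, auto_memory_gb=2, node_ram_gb=288)
print(demand, headroom)  # 200 88 -> 200GB worst-case demand, 88GB headroom
```

The headroom is what lets selected high-I/O processes be granted more than the 2GB cap without oversubscribing the node.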
Lessons Learned
• Maximize the use of available resources
• Observations:
• Underutilized CPU
• CPU utilization on the server is less than 30%
• How to make Informatica processes use the CPU power?
• Need more processes to use the available CPU cycles
• Partition the sessions based on the available CPUs
• Launch more Informatica jobs in parallel
Lessons Learned
• How to implement CPU changes:
  • Check the CPU run-queue
  • Start a few parallel processes and increase the number gradually
  • Monitor the run-queue length
• Observations:
  • Processes start taking longer once the run-queue length exceeds 4-5
  • Limit the maximum concurrent processes on each node to 40
Node 1 Node 2 Node 3
Number of CPUs 8 8 8
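The gradual ramp-up rule above reduces to one check: launch another job only while both limits hold. A minimal sketch, using the run-queue comfort zone (4-5) and 40-process cap quoted in the slides; the function name is illustrative.

```python
# Ramp-up gate for launching parallel Informatica jobs on one node:
# stay under the per-node process cap AND keep the CPU run-queue short.
# Thresholds are the values from the slides; the helper is illustrative.

def can_launch_more(running: int, run_queue_len: int,
                    max_processes: int = 40, max_run_queue: int = 5) -> bool:
    """Allow one more process only while both limits hold."""
    return running < max_processes and run_queue_len <= max_run_queue

print(can_launch_more(running=30, run_queue_len=3))  # True
print(can_launch_more(running=30, run_queue_len=6))  # False: run-queue too long
print(can_launch_more(running=40, run_queue_len=2))  # False: at the process cap
```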
Lessons Learned
• High Availability for critical business processes
• Checklist for High Availability:
  • Architecture that supports HA (system, H/W, firmware, etc.)
  • Number of nodes in the domain (minimum 3)
  • Clustering software
  • Shared common storage for all nodes
  • Shared location for session/workflow runtime files: parameter files, cache, input/output files
  • Shared location for state-of-operation files: active service requests, scheduled tasks, completed/running processes
  • HA repository database
  • Domain services (Integration/Repository) configured for HA (restart/failover)
Lessons Learned
• High Availability for critical business processes
• Checklist continued …
  • Set the relevant Integration Service properties/process variables
    • Resilience timeouts (client connection)
    • Process variables (switching the Integration Service in case of failure)
• Additional tips
  • Semaphore and shared memory settings (OS level)
  • Monitor memory: top, vmstat, Perceiver, etc.
  • Monitor CPU: top, vmstat, Perceiver, run-queue, etc.
  • Disk I/O: iostat, vmstat, sar, queue length on the storage port, etc.
  • Network: netstat, sftp, nfsstat, etc.
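The run-queue signal monitored with vmstat above is the `r` column of its output. A small sketch of extracting it; the sample text is a typical Linux vmstat layout, hardcoded here so the parsing is reproducible rather than dependent on a live system.

```python
# Extract the run-queue length (the `r` column) from vmstat output,
# the signal the tuning slides watch when ramping up parallel jobs.
# SAMPLE_VMSTAT is example output, hardcoded for reproducibility.

SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 6  0      0 812340 102400 904512    0    0    12    34  210  450 25  5 68  2  0
"""

def run_queue_length(vmstat_output: str) -> int:
    """Return the `r` value from the last data row of vmstat output."""
    lines = [l for l in vmstat_output.strip().splitlines() if l.strip()]
    header = lines[1].split()   # column names: r, b, swpd, free, ...
    values = lines[-1].split()  # most recent sample row
    return int(values[header.index("r")])

print(run_queue_length(SAMPLE_VMSTAT))  # 6 -> above the 4-5 comfort zone
```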
Lessons Learned
• Summary
  • Frame the infrastructure need in the context of a business problem resolution
  • Do upfront capacity planning
  • Plan for critical resource (SME) continuity through planning and execution
  • Take advantage of the grid features
    • Session on Grid (SONG): improves scalability and performance by distributing session threads to multiple DTM processes running on nodes in the grid
    • Workflow on Grid (WONG): assign resources to nodes, create and configure the grid, and configure the Integration Service to run on the grid
  • Take advantage of the H/W resources
    • Memory/CPU (application configuration)
    • Partition the sessions
Lessons Learned
• What not to do?
  • Don’t expect a miracle on day 1: “Performance is not magic; it’s a journey”
  • Don’t skip performance/load testing
  • Test and validate each and every component, including the network, I/O on the shared storage, memory and CPU
Performance!
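The load-testing advice above can be sketched as a minimal throughput harness: time a batch of identical tasks at increasing parallelism so regressions surface before go-live. The workload below is a stand-in; a real test would exercise actual ETL sessions, network, storage and CPU.

```python
# Minimal throughput harness: run n_tasks at a given parallelism and
# report tasks/second. The work() body is a placeholder workload.

import time
from concurrent.futures import ThreadPoolExecutor

def work(_):
    # Placeholder task; replace with a representative ETL unit of work.
    return sum(i * i for i in range(10_000))

def throughput(n_tasks: int, parallelism: int) -> float:
    """Run n_tasks with the given parallelism; return tasks/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(work, range(n_tasks)))
    return n_tasks / (time.perf_counter() - start)

for p in (1, 2, 4):
    print(f"parallelism={p}: {throughput(40, p):.1f} tasks/s")
```

Comparing the rates across parallelism levels against the run-queue limits from the earlier slides shows where adding processes stops paying off.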
Questions?