Upgrade to PowerCenter Grid Within Six Months and Significantly Improve ETL Throughput
Pravin Darbare
Technology Solution Manager
Symantec
Agenda
1. Introductions
2. Business Problem
3. Strategy
4. Landscape Overview
5. Lessons Learned (what to do / what not to do)
6. How to Minimize Risk
Symantec Company Overview
Symantec is a global leader in providing security, storage and systems management solutions to help consumers and organizations secure and manage their information and identities.
Symantec At a Glance
• Approximately 19,500 employees
• Founded in 1982; IPO in 1989
• Operations in more than 50 countries
• 100 percent of Fortune 500 companies are customers
• #382 on the 2010 Fortune 500
• $6.2 billion revenue in FY 2011; approximately 50% outside of the U.S.
• More than 1,200 global patents
• Symantec footprint on more than one billion systems
• Included on Fortune’s Most Admired Companies list
• Invests 13% of annual revenue in R&D*
* R&D investment is Non-GAAP
Who We Are
Transform data into insight
The Information Management & Business Intelligence (IMBI) team at Symantec
Team Mission: Provide data intelligence & insights to Symantec & its partners, anytime & anywhere, in a secure & easy-to-use environment.
Pravin Darbare – Solution Manager
The Business Problem: Lack of Reliable Insights to Drive Sound Business Decisions
• Inability to optimize customer support by segment
• Inability to properly analyze discounts
• Inability to identify cross-sell & up-sell opportunities
• Inability to identify, track, measure and manage renewals
Do we know what, or where, the real answers are?
Go-Forward Strategy: Connect the Dots
Data → MDM → Apps → Reporting & Analytics
• Leverage the power of source systems and an EDW
• Leverage the power of consistent definitions and hierarchies
• Leverage the power of the portfolio
  • Business Intelligence
  • Business Discovery
Data & BI Governance (spanning all of the above)
Execution of the Strategy: Stabilize, Standardize, Strategize & Align on a Go-Forward Architecture & Plan
Phases: Stabilize & Standardize → Strategize & Align → Transform
Architecture Overview
IMBI COE Overview
• Source Systems: ERP, SFDC, EBE, SAS, Aprimo and others
• Replication: Shareplex replicates source data into the Staging/ODS layer (“ODS”, “CODS”)
• Enterprise MDM hubs: “CDH” and “PDH”
• ETL: PowerCenter Enterprise Grid / Informatica Data Quality (IDQ)
• EDW:
  • Integration Layer: enterprise view, integration, conforming, enrichment, history
  • Dimensional Layer: departmental, summary, aggregates, conformed dimensions
• Business Access Layer: QlikView dashboards, Xcelsius dashboards, eStore, Business Objects reports, SAP & QlikView Mobile
• Self-serve capabilities:
  • Data profiling, rules and quality metrics
  • Report management, migration and scheduling
  • Dashboard creation, management, data integration, migration and scheduling
  • Mobile dashboard creation, management and scheduling
  • Access to replicated source data and to dimensional (EDW) data through the Business Access Layer
• Governance and access controls span all layers
Informatica Footprint
Corporate standard for ETL tool
• Informatica PowerCenter version 9.1.0 Hotfix 2
• Informatica Data Quality 9.1.0 Hotfix 2
• Informatica Cloud
• Informatica PowerExchange 9.1.0 (Teradata connector)
• Informatica Metadata Manager
Large Deployment
• 900+ workflows and 4,000+ mappings in PowerCenter
• Deployed for 8+ years
• Churns through 300-500GB+ of data daily
• Complex workflows and logic
Program Approach
• Informatica Grid implementation was part of a larger Strategic
Business Intelligence program
• Piece-by-piece renewal of the entire platform
• 18 month transformation initiative
• Setting the stage for future growth
• Informatica Grid implementation in 6 months
• Commodity H/W: 3-node grid of Dell R710 servers with quad-core CPUs and 288GB RAM each
• Moved from Sun Solaris to Linux/Dell
• Parallel run of the old and new environments for 3 months
Lessons Learned
• Maximize the use of available resources
• Observations:
  • Informatica is an I/O-bound application
  • Data in transit generates high I/O
• How to speed up I/O and improve response time?
  • Maximize the use of available memory
  • Reduce use of the disk file subsystem
  • Reduce cache writes to disk
Lessons Learned
• How to implement memory changes:
  • Define the right number of processes on each node
  • Define the maximum memory allowed for the auto memory attribute based on the resources available
• The maximum memory allowed for the auto memory attribute was changed to 2GB
• 100 processes/node can use 200GB+ of memory
• Selected processes with high I/O were configured to use more than 2GB
Node 1 Node 2 Node 3
Maximum Processes 100 100 100
Allocated Memory 288GB 288GB 288GB
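The per-node budget behind the table above is simple arithmetic; here is a minimal sketch, using the figures from the slides (100 processes/node, 2GB auto-memory cap, 288GB RAM per node). The helper name is illustrative, not an Informatica API.

```python
# Sanity-check the per-node memory budget: worst-case demand when every
# process uses its full auto-memory allowance, and the remaining headroom.
# Figures come from the slides; the function is an illustrative helper.

def memory_budget(max_processes: int, auto_memory_gb: float, node_ram_gb: float):
    """Return (worst-case demand in GB, headroom in GB) for one node."""
    demand = max_processes * auto_memory_gb
    return demand, node_ram_gb - demand

demand, headroom = memory_budget(max_processes=100, auto_memory_gb=2, node_ram_gb=288)
print(demand, headroom)  # 200 88 -> 200GB worst-case demand, 88GB headroom
```

The headroom is what lets selected high-I/O processes be granted more than the 2GB cap without oversubscribing the node.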
Lessons Learned
• Maximize the use of available resources
• Observations:
• Underutilized CPU
• CPU utilization on the server is less than 30%
• How to make Informatica processes use the CPU power?
• Need more processes to use the available CPU cycles
• Partition the sessions based on the available CPUs
• Launch more Informatica jobs in parallel
Lessons Learned
• How to implement CPU changes:
  • Check the CPU run-queue
  • Start a few parallel processes and increase the number gradually
  • Monitor the run-queue length
• Observations:
  • Processes start taking longer once the run-queue length exceeds 4-5
  • Limit the maximum concurrent processes on each node to 40
Node 1 Node 2 Node 3
Number of CPUs 8 8 8
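The gradual ramp-up rule above reduces to one check: launch another job only while both limits hold. A minimal sketch, using the run-queue comfort zone (4-5) and 40-process cap quoted in the slides; the function name is illustrative.

```python
# Ramp-up gate for launching parallel Informatica jobs on one node:
# stay under the per-node process cap AND keep the CPU run-queue short.
# Thresholds are the values from the slides; the helper is illustrative.

def can_launch_more(running: int, run_queue_len: int,
                    max_processes: int = 40, max_run_queue: int = 5) -> bool:
    """Allow one more process only while both limits hold."""
    return running < max_processes and run_queue_len <= max_run_queue

print(can_launch_more(running=30, run_queue_len=3))  # True
print(can_launch_more(running=30, run_queue_len=6))  # False: run-queue too long
print(can_launch_more(running=40, run_queue_len=2))  # False: at the process cap
```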
Lessons Learned
• High Availability for critical business processes
• Checklist for High Availability:
  • Architecture that supports HA (system, H/W, firmware, etc.)
  • Number of nodes in the domain (minimum 3)
  • Clustering software
  • Shared common storage for all nodes
  • Shared location for session/workflow runtime files: parameter files, cache, input/output files
  • Shared location for state-of-operation files: active service requests, scheduled tasks, completed/running processes
  • HA repository database
  • Domain services (Integration/Repository) configured for HA (restart/failover)
Lessons Learned
• High Availability for critical business processes
• Checklist continued …
  • Set the relevant Integration Service properties/process variables
    • Resilience timeouts (client connection)
    • Process variables (switching the Integration Service in case of failure)
• Additional tips
  • Semaphore and shared memory settings (OS level)
  • Monitor memory: top, vmstat, Perceiver, etc.
  • Monitor CPU: top, vmstat, Perceiver, run-queue, etc.
  • Disk I/O: iostat, vmstat, sar, queue length on the storage port, etc.
  • Network: netstat, sftp, nfsstat, etc.
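The run-queue signal monitored with vmstat above is the `r` column of its output. A small sketch of extracting it; the sample text is a typical Linux vmstat layout, hardcoded here so the parsing is reproducible rather than dependent on a live system.

```python
# Extract the run-queue length (the `r` column) from vmstat output,
# the signal the tuning slides watch when ramping up parallel jobs.
# SAMPLE_VMSTAT is example output, hardcoded for reproducibility.

SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 6  0      0 812340 102400 904512    0    0    12    34  210  450 25  5 68  2  0
"""

def run_queue_length(vmstat_output: str) -> int:
    """Return the `r` value from the last data row of vmstat output."""
    lines = [l for l in vmstat_output.strip().splitlines() if l.strip()]
    header = lines[1].split()   # column names: r, b, swpd, free, ...
    values = lines[-1].split()  # most recent sample row
    return int(values[header.index("r")])

print(run_queue_length(SAMPLE_VMSTAT))  # 6 -> above the 4-5 comfort zone
```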
Lessons Learned
• Summary
  • Frame the infrastructure need in the context of a business problem resolution
  • Do upfront capacity planning
  • Plan for critical resource (SME) continuity through planning and execution
  • Take advantage of the grid features
    • Session on Grid (SONG): improves scalability and performance by distributing session threads to multiple DTM processes running on nodes in the grid
    • Workflow on Grid (WONG): assign resources to nodes, create and configure the grid, and configure the Integration Service to run on the grid
  • Take advantage of the H/W resources
    • Memory/CPU (application configuration)
    • Partition the sessions
Lessons Learned
• What not to do?
  • Don’t expect a miracle on day 1: “Performance is not magic; it’s a journey”
  • Don’t skip performance/load testing
  • Test and validate each and every component, including the network, I/O on the shared storage, memory and CPU
Performance!
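The load-testing advice above can be sketched as a minimal throughput harness: time a batch of identical tasks at increasing parallelism so regressions surface before go-live. The workload below is a stand-in; a real test would exercise actual ETL sessions, network, storage and CPU.

```python
# Minimal throughput harness: run n_tasks at a given parallelism and
# report tasks/second. The work() body is a placeholder workload.

import time
from concurrent.futures import ThreadPoolExecutor

def work(_):
    # Placeholder task; replace with a representative ETL unit of work.
    return sum(i * i for i in range(10_000))

def throughput(n_tasks: int, parallelism: int) -> float:
    """Run n_tasks with the given parallelism; return tasks/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        list(pool.map(work, range(n_tasks)))
    return n_tasks / (time.perf_counter() - start)

for p in (1, 2, 4):
    print(f"parallelism={p}: {throughput(40, p):.1f} tasks/s")
```

Comparing the rates across parallelism levels against the run-queue limits from the earlier slides shows where adding processes stops paying off.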
Questions?