• Tower International Overview
• QAD deployments at Tower
• QAD infrastructure upgrade and virtualization
• DR Architecture
• Additional application and infrastructure upgrades
Agenda
9/27/2013
• Tower International is a leading integrated global manufacturer of engineered structural
metal components and assemblies primarily serving automotive original equipment
manufacturers
Tower International at a glance
• Revenue: $2.1 billion
• Employees: 9,000
• Corp. Headquarters: Livonia, Michigan, USA
• Locations: Our products are manufactured at 29 production facilities strategically
located near our customers in North America, South America, Europe and Asia. We
support our manufacturing operations through eight engineering and sales locations
throughout the world.
• QAD deployed globally at all Tower locations
• QAD infrastructure centralized at our USA-based Global Data Center
• QAD versions deployed at Tower include:
– MfgPro 9.0 / Progress 9.1E
– EB2 (various SP levels) / Progress 9.1E
– QAD 2008 SE / Progress 10.1C
– QAD 2010 SE with .NET / Progress 10.2B
• Total QAD users around 1600
QAD at Tower
• MfgPro 9.0 and EB2 run centrally from the USA-based Data Center
• Infrastructure:
– HP PA-RISC servers running HP-UX 11.x
• Obsolete hardware. Slow performance.
• No hardware fault tolerance. Risk of single server failure bringing down
QAD for an entire region.
– EMC SAN storage
• Costly and complex outsourced storage solution
– DR (Disaster Recovery) solution with 2 day recovery time from tape
backups
• Recovery time too long
• Solution does not scale well
• Costly outsourced solution
Pre-virtualization QAD at Tower
• Project started in 2010 with the following objectives:
1. Upgrade QAD server infrastructure (replace obsolete hardware)
2. Improve QAD performance (5X)
3. Replace outsourced SAN storage solution (reduce cost and
complexity)
4. Implement scalable (across multiple applications including QAD)
DR solution with a recovery time objective of 4 hours or less
Infrastructure upgrade project
• Key infrastructure changes
– Migrate from HP-UX to Red Hat Linux
• Linux made hardware upgrade possible without requiring a QAD application
upgrade
• Utilize latest Intel X86 CPUs for improved performance
– Virtualize servers on VMware
• Provide hardware fault tolerance without the complexity
• Simplify future QAD upgrades. New servers to test upgrades can be setup easily
without having to purchase new hardware.
• Server portability for future Data Center move
– Migrate from EMC SAN storage to Netapp NAS storage
• Reduce cost and complexity of storage solution
• Already proven technology at Tower
• New DR solution based on VMware virtualization and Netapp storage
replication technologies
– QAD servers and databases to be replicated to DR site
New Solution
• Key Success Factors
– Red Hat Enterprise Linux 4 with increased CPU and memory capacity
– Increased DB cache for improved performance
– Single-tier QAD/Progress architecture: client and DB on a single
server. This later allowed self-service client (shared-memory)
connections to boost performance.
– QAD databases, binaries, users on NFS version 3 TCP mounts.
Simplified storage management.
– Leveraged Bravepoint’s Pro dump/load utility to minimize outage
windows during migrations
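The single-tier layout above put the databases, binaries, and user directories on NFSv3 TCP mounts; a minimal /etc/fstab sketch of such mounts (the filer name, export paths, and mount options are illustrative assumptions, not Tower's actual configuration):

```
# Hypothetical NetApp NFSv3 TCP mounts for QAD databases and binaries.
# hard,intr and large rsize/wsize are common practice for Progress DBs on NFS.
filer1:/vol/qad_db   /qad/db   nfs  vers=3,proto=tcp,rsize=32768,wsize=32768,hard,intr  0 0
filer1:/vol/qad_bin  /qad/bin  nfs  vers=3,proto=tcp,rsize=32768,wsize=32768,hard,intr  0 0
```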
Migration to Linux
• Key Benefits
– LDAP (AD) authentication for users
– Improved performance (5X)
• Several large operations including MRP runs saw over 5X speed
improvement. (MRP runs went from around 50min to around 10min)
• Larger DB Cache, Large read cache on Netapp and faster CPUs
• Lessons Learned
– Code compilation issues. Source code for some custom programs could not
be properly identified. Better source code management was needed.
– Slow telnet performance on Linux, especially for regions outside the
USA. The Nagle algorithm had to be disabled (set the NODELAY flag in
/etc/xinetd.d/telnet) to fix telnet performance.
– Direct printing to Korea printers did not work in Linux. NetTerm (Terminal
emulation software) print to local functionality was used as a workaround.
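The Nagle workaround above amounts to one flag in the xinetd service file; a sketch (the other settings vary by distribution and are shown only for completeness):

```
# /etc/xinetd.d/telnet (sketch) -- NODELAY disables Nagle's algorithm
# on the telnet socket, fixing slow interactive response over the WAN.
service telnet
{
        flags           = REUSE NODELAY
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/sbin/in.telnetd
        disable         = no
}
```

After editing the file, reload xinetd (e.g. `service xinetd reload`) for the change to take effect.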
Migration to Linux (contd.)
• Key success factors:
– VMware vSphere 4 cluster with Intel Nehalem CPUs optimized for
virtualization
– No oversubscription of CPU/memory resources, to ensure consistent
performance
– Simplified VMware configuration with NFS volumes for data stores
– Separate gigabit network dedicated for storage access
– VMware best practices including Jumbo frames (MTU 9000)
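The jumbo-frame item above is a per-interface MTU setting on the Linux side; a sketch assuming a dedicated storage NIC named eth1 (the interface name and file paths are illustrative, and the switch ports and NetApp interfaces must also run MTU 9000 end to end):

```
# One-off change on a hypothetical dedicated storage NIC:
ip link set dev eth1 mtu 9000

# Persistent RHEL-style setting in
# /etc/sysconfig/network-scripts/ifcfg-eth1:
#   MTU=9000
```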
• Key Benefits:
– VMware snapshots: Facilitates simple back out from system changes and
recovery from OS corruption.
– Automatic load balancing of virtual servers across hosts
– Ease of virtual server resource (cpu/mem) changes and new server
deployments
– Cloning: Test servers could be easily cloned from production ensuring same
OS configuration and patches on test.
Virtualization of QAD / Progress
• Lessons learned:
– vMotion of systems with large amounts of memory was disruptive to
the systems being moved.
• Set up rules to pin systems with large memory to specific hosts in
order to prevent automatic relocation. This did not impact hardware
fault tolerance.
– IP-hash load balancing across 4 gigabit ports on the VM host
storage network was not optimal: 95% of the traffic stayed on a
single gigabit port.
– Storage NICs on the VM hosts were shared between storage and
vMotion traffic, which caused contention at times.
Virtualization of QAD / Progress (contd.)
• Key success factors
– Dual FAS 3160 Storage controllers for high availability
– Netapp sizing based on peak IOPS usage on EMC storage
– 250GB of flash based read cache per controller
– High-performance 15K RPM Fibre Channel disks with large stripe size
– EtherChannel-bonded storage network interface with 8 gigabit ports
– Netapp best practices including jumbo frames (9000 MTU) on
storage interfaces
Netapp Storage
• Key benefits
– Eliminated frequent dump and loads for QAD databases
• Old infrastructure required frequent (multiple times a year) dump/loads
due to poor performance
• Performance has been good and consistent since the upgrade
minimizing the need for dump/loads
– Snapshots: easily create logical snapshots of volumes for backup
and recovery
– Dedupe: Eliminates duplicate data blocks on volumes. Saw a 35%
space savings on QAD DB volumes
– NFS and CIFS access to same volume. Eliminated the need for
users to FTP to QAD servers to access print files.
• Lessons learned
– I/O inefficiencies due to misaligned partitions on Windows 2003 and
Red Hat Linux 4/5 virtual servers.
• Administrative correction was needed to fix the issue
Netapp Storage (contd.)
• Key network design considerations:
– Different system IP addresses in DR
• Employ DNS to manage IP change
– Fenced DR environment to protect production environment during
DR testing
• Access to DR environment controlled via access lists on Cisco layer 3
switch
– Optimize replication traffic between Data Center and DR site
• Over 80% traffic reduction achieved with Riverbed WAN accelerators
– Adequate network bandwidth to replicate daily changes between
Data Center and DR site
• 45Mbps line used at DR site
QAD Disaster Recovery
• Replication Process:
– Daily replication of all QAD servers and ancillary systems to DR site
using Netapp SMVI utility
• Swap partitions are on separate volumes and are not replicated on a
daily basis
– Daily QAD DB Replication using Netapp SnapMirror technology:
• Quiet the QAD DBs with proquiet and take a snapshot (takes a few seconds)
• Release the QAD DBs to normal operation
• Replicate snapshot copy of DB to DR site
– Only data blocks that changed since previous day’s snapshot are
replicated
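The daily cycle above can be sketched as a script, assuming the Progress `proquiet` command and 7-Mode NetApp `snap`/`snapmirror` commands; the database path, volume names, and filer hostnames are hypothetical:

```
#!/bin/sh
# Nightly QAD DB replication sketch (illustrative names throughout).

# 1. Quiet the database so its on-disk image is consistent.
proquiet /qad/db/proddb enable

# 2. Snapshot the DB volume on the filer (completes in seconds).
ssh filer1 snap create qad_db nightly

# 3. Release the database back to normal operation.
proquiet /qad/db/proddb disable

# 4. From the DR filer, replicate only the blocks changed since the
#    previous snapshot.
ssh dr-filer1 snapmirror update -S filer1:qad_db dr-filer1:qad_db_mirror
```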
QAD Server/Data Replication Process
• DR Startup process:
– VMware SRM (Site Recovery Manager) used to automate the
following:
• Change IP addresses of all servers
• Change DNS entries to reflect new IPs
• Change DB mounts on all QAD servers to reflect DR Netapp
• Create a logical copy (flex clone) of DR volumes
• Start all systems (Entire process takes less than 4 hours)
• DR Testing:
– All QAD environments are validated annually
– Logical copies of volumes are discarded after testing is completed.
– No impact to production systems or replicated data in DR
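The flex clone and cleanup steps above might look like this on a 7-Mode filer (volume and snapshot names are illustrative):

```
# Create a writable, space-efficient clone of the replicated DR volume
# from the latest snapshot -- production and mirror data are untouched.
vol clone create qad_db_test -s none -b qad_db_mirror nightly

# After DR testing completes, discard the clone:
vol offline qad_db_test
vol destroy qad_db_test -f
```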
QAD DR Startup and Testing
• 5X QAD performance improvement
• Reduced storage cost and complexity with Netapp storage
– Less than 2 year return on investment based on outsourced storage
expense savings
• Eliminated frequent database dump/loads – saved 0.5 FTE (full-time equivalent)
• Automatic hardware failure protection with VMware
• Standby scalable (across multiple apps) DR solution with a less
than 4 hour system recovery time.
Key accomplishments of this project
• Server environment: Red Hat 5 64-bit, 8 vCPUs and 64GB of RAM.
• 10 NA plants were migrated from a MfgPro 9.0 environment to a QAD
2010SE multi-domain environment
• System performance was good until a large plant came on board
• Linux system load frequently exceeded 80 resulting in severe
performance degradation
• High cpu utilization
• Resolution steps:
– Implemented a larger server with 24 vcpus and 128GB of RAM
– System load issues were resolved after enabling the "-q" (quick request)
parameter in client connections, so code is searched for only once
– Performance was significantly improved after implementing self-service
client (shared-memory) connections
– Corrected several custom programs that were consuming large amounts of
cpu due to indexing issues
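The "-q" fix above is a Progress client startup parameter; a hedged sketch of a client session line (the database path and .pf file are illustrative, not Tower's actual scripts):

```
# Hypothetical self-service client startup: -q ("quick request") makes the
# client search PROPATH for each program only once, cutting system load.
mpro /qad/db/proddb -q -pf /qad/etc/client.pf
```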
QAD 2010SE multi-domain virtualization
MRP (23.2 Full Regen) Run time improvements
[Chart: post-upgrade MRP (23.2 Full Regen) run times in minutes, 4/9/2012–4/20/2012, for the Auburn, Bardstown, Chicago, Clinton, Elkton, Madison, Meridian, Ohio, Plymouth, and Smyrna plants]
• 10Gbit storage network
– Simplified storage network due to the reduced number of ports
– Much faster vMotion speeds; no impact on systems with large
amounts of memory during vMotion.
• VMware upgrade to vSphere 5 along with distributed virtual
switch implementation
– Eliminated VM host load-balancing inefficiencies across multiple
NICs
• Upgraded Netapp to FAS 3250 cluster
– SSDs used for read/write caching
– Aggregate / Volume level caching control with SSD cache
Netapp and VMware upgrades