TOYOTA MOTOR EUROPE
Experiences with a custom High Availability solution
GSE 03/06/2010
Vladimir Patcevski
Toyota Europe Data Center
date 6/3/10 - page 2 date 6/3/10 - page 2
Agenda
• A few words about Toyota
• The Case
• HA setup sequence
• Final Solution
• Questions
Before Toyota
• 1897 Sakichi Toyoda completes his first automatic loom
• 1918 - Toyoda Spinning and Weaving Co. was established
The First passenger car
• In 1933, the company’s automobile department was established.
• 1936 - Logo name is changed from Toyoda to Toyota
• 1936 – Production begins of the model AA passenger car
• 2009 - Toyota operates 75 manufacturing companies across 28 countries globally, and markets vehicles in more than 170 countries.
Toyota in Europe
• Toyota began selling cars in Europe in 1963
• 1965 Total export to Europe was 5900 units
• 1970 – Toyota Motor Corporation Brussels Office opens in Belgium
Toyota in Europe
• Our European operations are supported by a network of 31 NMSCs in 56 countries, around 3,000 retailers, and nine manufacturing plants.
• In 2009, in Europe, vehicle sales were 858 thousand units
TME Information Systems
• TME Information Systems is located in Brussels.
• Supports Toyota and Lexus automotive businesses in Europe
- IT Management
- Application Development
- Toyota Europe Data Centre > Systems Engineering > Large Systems > Unix&AS400
My Team
The Case (TechDoc3 Project)
• Document Management System for Importing, Authoring, Translation and Publication of Technical Car documentation in Europe
• Designed to provide complete Technical Car Document Management System for all TME (and independent) Car Repair Centres.
• Application Layer: Many different Application components covering complete Document Management Lifecycle (hosted on 13 LDOMs configured on Sun Blades). WebSphere 6.1 Application Servers, IBM Filenet, Text Search Applications and several Custom Applications
• Technical Documentation Files stored on File Systems (so far 1.4TB already authorized)
• Database Layer (TechDoc3 Metadata) : DB2 Instance (hosted on LPARs configured on IBM P6 servers 2 x 2 Cores CPUs, 4.7 GHz, 32 GB ), 8 Databases (~ 100GB allocated).
• Some TechDoc3 services (Document Search) require an HA solution
The Case (TechDoc3 Project)
DB HA Planning and Considerations
• What is the required availability?
- 24 hours a day/seven days a week?
- Or… 23 hours a day/six days a week?
• What could happen to disrupt the availability?
• What is the allotted time for replacing a failed resource?
• Which failures will be automatically detected as cluster events?
• Available Infrastructure
• In house IT Expertise
Our HA choice
• 2 IBM P6 servers 2 x 2 Cores CPUs, 4.7 GHz, 32 GB
• 4 Logical Partitions (LPARs) per physical server (2 logical Partitions dedicated for HA DB layer created on 2 physical servers)
• IBM AIX 5.3 (5.3.9.4 TL09 - AIX Kernel Version)
• DB2 9.1.0.4
• GPFS 3.2 (shared-disk file system for cluster computers)
• HACMP 5.5 (HACMP is IBM's solution for high-availability clusters on the AIX Unix and Linux platforms)
HA Setup Sequence
• Plan and prepare the secondary LPAR
• Create DB2 Instance on the secondary server
• Install and Setup GPFS
• Migrate Instance/DB related files from JFS2 to GPFS File System
• Install and Configure HACMP
• Configure Applications for HA connection
• Test
Plan and prepare the secondary LPAR
• The secondary DB server environment runs in Warm/Cold (Hybrid) Standby mode
- Secondary Logical Partition UP
- Secondary Database Instance DOWN
Plan and prepare the secondary LPAR
• Configure the secondary LPAR in uncapped mode (allows it to take full advantage of unused CPU clock cycles in the shared processor pool).
• If multiple partitions in the frame are under heavy load at the same time, the managed system gives preference to the partitions that have a higher weight
Plan and prepare the secondary LPAR
• After failover, the LPAR configuration allows the DB2 databases to use unused CPU clock cycles in the shared processor pool
• A higher weight is set for this LPAR than for the others (if needed, it will take more VCPUs)
All LPARs per Physical Server
LPAR Configuration
• Processing Units = CPU core
- Minimum
- Assigned (Initial value)
- Maximum (all CPU cores)
- Weight (Determines LPAR priority in assigning CPU resources). The default value is 128. The maximum value is 255. The number is not significant in itself; what matters is its value relative to the other LPARs on this managed server.
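As a rough illustration of why only the relative weight matters, the sketch below splits spare capacity from the shared processor pool among competing uncapped LPARs in proportion to their weights. This is a simplified model under stated assumptions: the function name, LPAR names and numbers are hypothetical, and the real hypervisor also respects limits such as each partition's virtual processor count.

```python
def share_spare_capacity(spare_units, weights):
    """Split spare processing units among competing uncapped LPARs
    in proportion to their weights (simplified model, an assumption)."""
    total = sum(weights.values())
    if total == 0:
        # Assumption of this sketch: a weight of 0 earns no spare capacity.
        return {name: 0.0 for name in weights}
    return {name: spare_units * w / total for name, w in weights.items()}


# Hypothetical example: the DB LPAR has a higher weight than two others.
shares = share_spare_capacity(
    2.0, {"db_lpar": 200, "app_lpar1": 100, "app_lpar2": 100}
)
print(shares)
```

Doubling every weight in the example leaves the shares unchanged, which is the point made above: the number is only meaningful relative to the other LPARs.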
Install DB2 SW on the secondary server
• All users/passwords, user IDs/group IDs, file system names and sizes are created as on the primary server (“mirror configuration”)
• Installation of DB2 Software on JFS2 (local) File System
• Folder structure same as in the primary LPAR
GPFS ( General Parallel File System )
• GPFS is IBM’s parallel, shared-disk file system for cluster computers
• Provides high performance by allowing data to be accessed over multiple computers at once
• Can be configured to eliminate single points of failure
Install and Setup GPFS
Migrate Instance/DB related files
• Instance Home
• Data Files
• Active log files
• Archived Log files
• Backup location
• Create DB2 Instance on the secondary LPAR
Remove Already migrated JFS2 File Systems
• At this stage we already have some kind of HA DB system
• Requires manual intervention concerning
- Process Cleanup
- DB2 Instance Failover
- Application connections
Install HACMP (High Availability Cluster Multiprocessing)
Configure HACMP
• Resource Groups (NFS)
• Cluster Events (Node unavailable)
• Application automation: Minimizing manual intervention
- Start/Stop and cleanup scripts
- Monitoring and notifications
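A monitoring script of the kind listed above could look roughly like the sketch below. It is not the script used in this project: it assumes the monitor reports health through its exit code (0 = healthy), that the DB2 engine shows up as a `db2sysc` process owned by the instance owner, and that `ps -eo user,comm` is available. The instance owner `db2inst1` mentioned afterwards is a hypothetical name.

```python
import subprocess


def instance_is_up(ps_output, instance_owner, process_name="db2sysc"):
    """Return True if `ps -eo user,comm` style output contains the DB2
    engine process running under the given instance owner."""
    for line in ps_output.splitlines():
        fields = line.split()
        if (
            len(fields) >= 2
            and fields[0] == instance_owner
            and fields[1] == process_name
        ):
            return True
    return False


def monitor_exit_code(instance_owner):
    """Exit code for a monitor script: 0 = healthy, 1 = instance down."""
    ps_output = subprocess.run(
        ["ps", "-eo", "user,comm"],
        capture_output=True,
        text=True,
        check=False,
    ).stdout
    return 0 if instance_is_up(ps_output, instance_owner) else 1
```

A wrapper script would end with something like `sys.exit(monitor_exit_code("db2inst1"))`, so that the cluster software can act on the exit code and send a notification when it is non-zero.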
Configure Resource Groups (NFS)
Configure Cluster Events (Node unavailable)
Application automation
...And how does it work?
Failover process – Primary Instance DOWN
• OS generates PANIC alarm
• DB2 Instance not accessible/available/crash
• HACMP detects the event that triggers Fail-over process
• NFS-mounted File Systems start to fail over
• Cleanup/shutdown of the failed LPAR is being completed
Failover process - Secondary Services Active
• Failover of NFS mounted File Systems completed
• Stop/Start scripts have completed startup of the DB instance.
• DB2 Database crash recovery is being completed
• Application connections reconnect to the secondary server (LPAR)
• Complete Process takes 5-10 minutes
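On the application side, reconnecting after a failover can be sketched as a simple retry loop. The example below is an assumption about how a client might behave, not the mechanism used by the TechDoc3 applications: `connect` is a hypothetical zero-argument callable standing in for a real database driver call, and the defaults (25 attempts, 30 seconds apart, about 12 minutes in total) are chosen only so that the loop outlasts the 5-10 minute failover window.

```python
import time


def connect_with_retry(connect, attempts=25, delay_seconds=30, sleep=time.sleep):
    """Call `connect()` until it succeeds or the attempts run out.

    `connect` is any zero-argument callable that returns a connection
    or raises ConnectionError (hypothetical stand-in for a real driver).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                # Wait before the next try; no wait after the final failure.
                sleep(delay_seconds)
    raise last_error
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting; production code would normally leave it at the default.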
Final configuration
Summary
• Advantages
- Easy to configure
- In-house expertise used
- Flexible (Many Fail-over HACMP options possible)
- Reliable
• Disadvantages
- Not Suitable for 24/7/365 DB Availability Requirement
- Rollback of all uncommitted transactions
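The availability limitation can be made concrete with a little arithmetic. The sketch below counts only failover outages, using the 5-10 minute failover time quoted earlier; the number of failovers per year is a hypothetical input, and planned maintenance is ignored.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability(failovers_per_year, minutes_per_failover):
    """Fraction of the year the database is reachable, counting only
    failover outages (planned maintenance is ignored in this sketch)."""
    downtime = failovers_per_year * minutes_per_failover
    return 1 - downtime / MINUTES_PER_YEAR


# Hypothetical: four failovers a year at the 10-minute worst case.
print(f"{availability(4, 10):.5%}")
```

For comparison, a "five nines" (99.999%) target allows about 5.26 minutes of downtime per year, so a single failover at the upper end of the 5-10 minute range already exceeds that budget.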
Questions?
Thank you