High Availability @ KBC
Jan Tielemans
Agenda
Back in time: what the HA environment @ KBC looked like in 2006 (pros and cons)
The different steps (projects) taken to move to today's HA environment
Future directions/plans
Why a ‘High Availability’ environment?
Back in time (1996/7….)
Objective of the HA environment:
- Keep the 24*7 applications running during ‘technical maintenance’
- Be able to ‘re-direct’ 24*7 applications to ‘one’ system in case of disaster (no DRP)
- Stable performance
How it was implemented technically:
- ‘Headoffice & Branches’ workload runs on one LPAR (M1)
- Retail workload (24*7) runs on the other LPAR (M2); both workloads share the same DB2 data
- Developed a ‘Switch procedure’ to redirect the Retail workload to another machine
HA@KBC anno 2006
[Diagram: LPARs M1 and M2 with three subsystem stacks: SIPC-SDPC-SMPC (IC1, IC2; DNS A), IPC1-DPC1-MPC1 (IC5, IC6; DNS B) and IPC2-DPC2-MPC2 (IC3, IC4; DNS XXXX). WAS servers connect over TCP/IP: ‘Headoffice & Branches’ to SIPC (DNS A); Retail to IPC1 (DNS B) and to IPC2 (DNS XXXX). AGF framework (KCF => connection cleanup after 20 min).]
Technical environment:
- DB2: data sharing, 2 active members, 1 sleeping member
- IMS: no shared queues; subsystems/regions for Retail are not the same as for ‘Headoffice & Branches’
- MQ: no shared queues; subsystems for Retail are not the same as for ‘Headoffice & Branches’. For some queues we use the MQ clustering technique.
Switch procedure:
- Very complex and error prone
- Often resulted in unavailability of the 24*7 applications
- Lots of interaction with WAS servers: redirecting workload is done on each WAS server, controlled via a mainframe application
- Knowledge and maintenance by one person; the mainframe application was designed in NetView ($)
Static workload distribution: under-utilization of CPU capacity at certain time periods
Availability was high: +99%
Things changed after the initial setup in 1996:
- More CPU/memory available per machine
- New (critical) applications for the ‘Headoffice & Branches’ and Retail workloads
- Other workload implemented
- More and more a mix of concurrent online and batch workload
- Regulations: BASEL, CBFA, …..…….
Why change?
- Complex, error-prone switch procedure
- Better utilize CPU capacity
- Better use and exploit sysplex technology
- Prepare for scalability (cloned subsystems)
A 3-step approach:
- Make open systems independent from the mainframe
- Dynamic transaction routing to the mainframe
- Workload balancing for Retail and for ‘Headoffice & Branches’
Implementation DTM Switch Procedure (06/2007)
[Diagram: LPARs M1 and M2 behind a TCP/IP Sysplex Distributor (DNS B on M2 and on M1). Subsystem stacks: SIPC-SDPC-SMPC (IC1+xit, IC2+xit; DNS A), IPC1-DPC1-MPC1 (IC5+xit, IC6+xit; DNS B) and IPC2-DPC2-MPC2 (IC3+xit, IC4+xit; DNS B). Clients: ‘Headoffice & Branches’ (SIPC, DNS A); Retail, BankSys and KBC Phone (GIPC, DNS B). AGF framework (KCF => time to live 5 minutes).]
Time-managed connections to IMS
DTM Switch Procedure:
- Written in automation
- Retail workload redirected in less than 10 minutes
- No interface/communication with WAS servers: the switch is a mainframe-ONLY operation
- Technical maintenance for M2 now starts at 13:00h instead of 22:00h
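One plausible reading of "time-managed connections" is that every client connection is closed and reopened once it exceeds its time to live (5 minutes per the KCF setting), so a mainframe-only switch reaches all clients within one TTL period without touching the WAS servers. The sketch below models that idea under stated assumptions: the class and function names are hypothetical, a fake clock stands in for real time, and it does not reproduce the KBC-internal AGF/KCF framework.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical model of a time-managed connection, loosely after the KCF
# "time to live 5 minutes" setting. All names here are illustrative.
TTL_SECONDS = 5 * 60


@dataclass
class Connection:
    target: str        # IMS subsystem the distributor routed this connection to
    opened_at: float   # clock value when the connection was opened


@dataclass
class TimeManagedConnection:
    connect: Callable[[], str]             # opens a connection, returns the chosen target
    ttl: float = TTL_SECONDS
    clock: Callable[[], float] = time.monotonic
    _conn: Optional[Connection] = field(default=None, init=False)

    def get(self) -> Connection:
        now = self.clock()
        # Reopen when there is no connection yet or the current one has expired.
        # Each reopen goes back through the distributor, so after a switch the
        # client lands on the new target within one TTL period.
        if self._conn is None or now - self._conn.opened_at >= self.ttl:
            self._conn = Connection(target=self.connect(), opened_at=now)
        return self._conn


# Usage with a fake clock: the switch moves routing from IPC1 to IPC2.
fake_now = [0.0]
routed_to = ["IPC1"]
conn = TimeManagedConnection(connect=lambda: routed_to[0], clock=lambda: fake_now[0])
print(conn.get().target)   # IPC1
routed_to[0] = "IPC2"      # mainframe-side switch
fake_now[0] = 120.0
print(conn.get().target)   # IPC1: still within its time to live
fake_now[0] = 300.0
print(conn.get().target)   # IPC2: expired after 5 minutes and reopened
```

The trade-off in this model is a short TTL (faster redirection) against more frequent reconnects.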
Cloned IMS and MQ subsystems
Exploit TCP/IP Sysplex Distributor
Implemented an IMS exit to:
- Map the IMS group name to an active IMS subsystem
- Check whether the IMS subsystem is active; if not, redirect to the other one
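The two exit functions above can be sketched as a lookup from a group name to its members, skipping inactive ones. This is a hypothetical model, not the actual exit: the group table, its preference order and the function name are assumptions; only the names GIPC, SIPC, IPC1 and IPC2 come from the slides.

```python
from typing import Dict, List, Set

# Hypothetical group table: IMS group name -> member subsystems in order of
# preference. The table and its ordering are assumptions for illustration.
GROUP_MEMBERS: Dict[str, List[str]] = {
    "GIPC": ["IPC1", "IPC2"],
}


def resolve(name: str, active: Set[str]) -> str:
    """Map an IMS group name to an active member subsystem."""
    members = GROUP_MEMBERS.get(name)
    if members is None:
        return name  # not a group name: a specific subsystem was requested
    for imsid in members:
        if imsid in active:
            return imsid  # first active member in preference order
    raise LookupError(f"no active IMS subsystem in group {name}")


print(resolve("GIPC", {"SIPC", "IPC1", "IPC2"}))  # IPC1
print(resolve("GIPC", {"SIPC", "IPC2"}))          # IPC2: IPC1 down, redirected
print(resolve("SIPC", {"SIPC", "IPC1"}))          # SIPC: not a group name
```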
To be solved for the next step(s): workload balancing
Application changes:
- Eliminate system affinity in application logic, e.g.:
  Get IMSID – If substr(IMSID,3,1) eq ‘P’ then …..
- Identified applications which could suffer from DB2 data sharing
- Identified serial transactions: how to serialize trx's in a parallel environment?
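To see why the affinity check above is a problem, it helps to evaluate it against the subsystem names used in these slides. The sketch below assumes those names (SIPC, IPC1, IPC2) are the IMSID values the check sees and reads substr with REXX-style 1-based positions; under those assumptions the test is true only on SIPC, so the branch taken depends on where a transaction happens to run once it can run on more than one IMS.

```python
def rexx_substr(s: str, start: int, length: int) -> str:
    """SUBSTR with REXX semantics: 1-based start, result blank-padded to length."""
    return s[start - 1:start - 1 + length].ljust(length)


# Subsystem names from the slides, assumed here to be the IMSIDs the check sees.
for imsid in ["SIPC", "IPC1", "IPC2"]:
    print(imsid, rexx_substr(imsid, 3, 1) == "P")
# SIPC True
# IPC1 False
# IPC2 False
```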
Communication: presentations for different departments (mind change)
ATTENTION!
We are running in HIGH AVAILABILITY. Transactions can be executed on both “online” production systems!
HA for Retail (11/2008)
[Diagram: Sysplex Distributor, WLM managed (DNS B on M1 and M2), with WLM on both LPARs. Subsystem stacks: SIPC-SDPC-SMPC (IC1+xit, IC2+xit on DNS A; IC3+xit, IC4+xit on DNS B) and IPC1-DPC1-MPC1 (IC3+xit, IC4+xit on DNS B). Clients: ‘Headoffice & Branches’ (SIPC, DNS A); KBC Phone, ONL (elb & ipa) / AUT / KID / IIP and BankSys (GIPC, DNS B). AGF framework (KCF => time to live 5 minutes).]
HA for Retail (12/2008)
[Diagram: Sysplex Distributor WEIGHTEDACTIVE (DNS B on M1 and M2) with weights 10 (M1) and 90 (M2). Subsystem stacks: SIPC-SDPC-SMPC (IC1+xit, IC2+xit on DNS A; IC3+xit, IC4+xit on DNS B) and IPC1-DPC1-MPC1 (IC3+xit, IC4+xit on DNS B). Clients: ‘Headoffice & Branches’ (SIPC, DNS A); KBC Phone, ONL (elb & ipa) / AUT / KID / IIP and BankSys (GIPC, DNS B). AGF framework (KCF => time to live 5 minutes).]
IMS exit: built-in logic to not distribute ‘some’ serial transactions from ‘Headoffice & Branches’
- Run these serial transactions on only ONE IMS subsystem (SIPC)
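The exit rule above amounts to overriding the distributed target for a fixed set of transaction codes, so those transactions never run in parallel on two systems. A minimal sketch follows; the transaction codes and the function name are hypothetical (the real serial transactions are not listed in the slides), and only SIPC comes from the source.

```python
from typing import FrozenSet

# Hypothetical serial transaction codes; the real codes are not in the slides.
SERIAL_TRANSACTIONS: FrozenSet[str] = frozenset({"SERTRX01", "SERTRX02"})
SERIAL_HOME = "SIPC"  # the single IMS subsystem that runs serial transactions


def exit_target(trancode: str, chosen: str) -> str:
    """Override the distributor's choice for serial transactions only."""
    if trancode in SERIAL_TRANSACTIONS:
        return SERIAL_HOME  # never spread: always the same subsystem
    return chosen  # everything else keeps the distributed target


print(exit_target("SERTRX01", "IPC1"))  # SIPC
print(exit_target("OTHERTRX", "IPC1"))  # IPC1
```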
Switch from the SERVERWLM to the WEIGHTEDACTIVE distribution method:
- Too much important work defined in WLM
- Heterogeneous workload on M1 and M2
- What information does WLM send to the Sysplex Distributor?
- WEIGHTEDACTIVE weights can easily be modified with ‘simple’ commands.
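The idea behind weighted-active distribution, keeping each target's share of active connections in line with a configured weight, can be modelled in a few lines. This is a simplified model, not the z/OS Sysplex Distributor implementation: the class name is hypothetical, and treating weight 0 as "no new connections" is an assumption of the model. The 10/90 weights are the z10 split from the slides.

```python
from typing import Dict


class WeightedActiveModel:
    """Simplified model of weighted-active distribution: each new connection
    goes to the target whose active connections are furthest below its
    weighted share. Not the z/OS implementation."""

    def __init__(self, weights: Dict[str, int]) -> None:
        self.weights = dict(weights)
        self.active = {target: 0 for target in weights}

    def set_weight(self, target: str, weight: int) -> None:
        # Stands in for changing a weight with an operator command.
        self.weights[target] = weight

    def connect(self) -> str:
        # Assumption of this model: weight 0 means "no new connections".
        eligible = [t for t, w in self.weights.items() if w > 0]
        if not eligible:
            raise RuntimeError("no target with a non-zero weight")
        # Lowest active/weight ratio wins; ties go to the larger weight.
        target = min(eligible, key=lambda t: (self.active[t] / self.weights[t], -self.weights[t]))
        self.active[target] += 1
        return target

    def disconnect(self, target: str) -> None:
        self.active[target] -= 1


model = WeightedActiveModel({"M1": 10, "M2": 90})  # the z10 split from the slides
for _ in range(100):
    model.connect()
print(model.active)  # {'M1': 10, 'M2': 90}
```

Because the choice depends only on configured weights and current connection counts, changing a weight takes effect on the next connection, which matches the point that weights can be adjusted with simple commands.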
[Diagram: the same clients and subsystem stacks, with a Sysplex Distributor WEIGHTEDACTIVE for DNS B on M1 and M2 and a second Sysplex Distributor WEIGHTEDACTIVE for DNS B on M1. Weights shown: 10, 100 and 90.]
Benefits:
- One environment (IPC2-DPC2-MPC2) less to manage/maintain
- ‘Retail’ workload balancing is dynamically adjustable…..
- Better utilization of resources (LPARs, CPU, memory..)
- Pre-z10: 40% to M1, 60% to M2; z10: 10% to M1, 90% to M2
Cons:
- WLM management of the workload is not possible due to the heterogeneous workload on the system(s)
HA for ‘Headoffice & Branches’ (7/2009)
[Diagram: Sysplex Distributor WEIGHTEDACTIVE for DNS A and DNS B on M1 and M2, plus a Sysplex Distributor WEIGHTEDACTIVE for DNS A and DNS B on M1. Subsystem stacks: SIPC-SDPC-SMPC (IC1+xit, IC2+xit on DNS A; IC3+xit, IC4+xit on DNS B) and IPC1-DPC1-MPC1 (IC1+xit, IC2+xit on DNS A; IC3+xit, IC4+xit on DNS B), linked by an MSC IP connection. Clients: ‘Headoffice & Branches’ (SIPC, DNS A); KBC Phone, ONL (elb & ipa) / AUT / KID / IIP and BankSys (GIPC, DNS B). AGF framework (KCF => time to live 5 minutes). Weights shown: 10, 90, 90, 10, 100 and 100.]
IMS MSC (Multiple Systems Coupling):
- Defined SIPC as the base system for serial transactions
- Defined serial transactions as Local / Remote
- Still have (and will keep) 21 serial transactions
- All serial transactions are now managed by MSC; the logic in the IMS exit was removed
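With MSC the routing decision moves from exit code into per-system transaction definitions: on the base system (SIPC) a serial transaction is defined as local, elsewhere as remote, so IMS forwards it over the MSC link. The sketch models only that lookup; the data structure, function name and transaction code "SERTRX01" are hypothetical, real IMS system definitions look different, and non-serial transactions are simply assumed local everywhere.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TranDef:
    remote_system: Optional[str] = None  # None: LOCAL; else the MSC partner that runs it


LOCAL = TranDef()

# Hypothetical per-system definitions for one serial transaction. SIPC and
# IPC1 are from the slides; "SERTRX01" is a made-up transaction code.
DEFINITIONS: Dict[str, Dict[str, TranDef]] = {
    "SIPC": {"SERTRX01": LOCAL},                          # base system for serial trx
    "IPC1": {"SERTRX01": TranDef(remote_system="SIPC")},  # forwarded over the MSC link
}


def executing_system(arrived_on: str, trancode: str) -> str:
    """Return the IMS system that processes trancode after it arrives on arrived_on."""
    tdef = DEFINITIONS.get(arrived_on, {}).get(trancode, LOCAL)
    return tdef.remote_system or arrived_on


print(executing_system("IPC1", "SERTRX01"))  # SIPC
print(executing_system("SIPC", "SERTRX01"))  # SIPC
print(executing_system("IPC1", "OTHERTRX"))  # IPC1: non-serial trx run where they arrive
```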
HA in Performance figures
[Diagram: HA configuration with performance figures]
SSPC: LP 18 (2094, z9), memory 40 GB
SSQC: LP 10 (2097, z10), memory 40 GB
I/O: < 2 ms, 25K I/Os per second; < 20 µsec
Workloads shown: ‘Headoffice & Branches’, Retail, IBS I, IBS II FF, Housekeeping, Personeelsnet, Service center, Asset center, Webseal……, LDAP, Batch
Sysplex Distributor WEIGHTEDACTIVE:
- ‘Headoffice & Branches’: 60-65 ms (2/3 DB2, 1/3 PGM); 5.5-6 million trx/day, 200 trx/sec, 9:00-17:00
- ONL (elb & ipa) / AUT / KID / IIP: 80-85 ms (2/3 DB2, 1/3 PGM); 7.5-8 million trx/day, 250 trx/sec, 00:00-24:00
- AGF framework (KCF)
Weights shown: 90, 10, 90, 10, 50 and 50
Mirrored (GDPS managed)
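The volume and response-time figures above can be cross-checked with simple arithmetic. The sketch assumes that each workload's daily volume falls entirely inside its stated time window and that the 2/3 DB2, 1/3 PGM split applies to the whole response time; both are readings of the slide, not measured data.

```python
from typing import Tuple


def avg_rate(trx_per_day: float, window_hours: float) -> float:
    """Average transactions per second if the day's volume falls inside the window."""
    return trx_per_day / (window_hours * 3600)


def split_response(ms: float, db2_share: float = 2 / 3) -> Tuple[float, float]:
    """Split a response time into (DB2 ms, program ms) using the 2/3 DB2 ratio."""
    db2 = ms * db2_share
    return db2, ms - db2


# 'Headoffice & Branches': 5.5-6 million trx in 9:00-17:00 (8 hours)
print(f"{avg_rate(5.5e6, 8):.0f}-{avg_rate(6e6, 8):.0f} trx/sec")    # 191-208 trx/sec
# ONL (elb & ipa) / AUT / KID / IIP: 7.5-8 million trx over 24 hours
print(f"{avg_rate(7.5e6, 24):.0f}-{avg_rate(8e6, 24):.0f} trx/sec")  # 87-93 trx/sec
for ms in (60, 65, 80, 85):
    db2, pgm = split_response(ms)
    print(f"{ms} ms = {db2:.1f} ms DB2 + {pgm:.1f} ms PGM")
```

The business-hours average of roughly 191-208 trx/sec lines up with the 200 trx/sec on the slide. For the 24-hour workload the average is only about 87-93 trx/sec, which suggests the 250 trx/sec figure describes a peak rather than a daily average.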
Future Plans
- Standardization of subsystem names: done
- Implement “subsystem” failure management
- Design for a SERVERWLM distribution
- Design for CBU implementation vs. priorities: Retail and ‘Headoffice & Branches’ cannot run on one LPAR (system) at peak times (normal business hours)
QUESTIONS?