Upload
tranliem
View
232
Download
0
Embed Size (px)
Citation preview
#IDUG
Continuous Availability and Scalability with DB2 for z/OS and WebSphere
Pallavi PriyadarshiniArchitectJCC – DB2 Connect, [email protected]
Maryela WeihrauchDistinguished EngineerDB2 for z/OS, [email protected]
#IDUG
2
Agenda
• DB2 z/OS concepts for availability
• Client configurations
• Enabling Sysplex Workload Balancing in WAS environments
• Algorithms for Workload Balancing and Automatic Client Reroute
• Recommended WAS, JCC and DB2 settings
#IDUG
3
DB2 Data Sharing - Availability
Goal: Continuous availability across any planned or unplanned outage
• Elimination of single point of failure • Remove all causes for planned outages
– rolling “on-line” upgrades and maintenance– online schema evolution– online utilities
• On a failure: – Isolate failure to lowest granularity possible– Automate recovery and recover fast
Challenge: the distributed application server needs to find the (best) available path to the data.
#IDUG
4
DVIPA and Sysplex Distributor – the Concept
• Needs to be configured at network layer and used for initial connections to DB2 group
• Group DVIPA provides a virtual TCPIP address into the Data Sharing Group
• Sysplex Distributor routes the connection request to the most available member based on WLM recommendation
SD Vx
DB2A DB2Bz/OS 1 z/OS 2CF
1
2
3connect to Vx:446
distribute connectionto DB2B
#IDUG
5
DVIPA and Sysplex Distributor – Consideration
• Benefit: – Connections are successful as long as one member is up– Connection level workload balancing between members– Setup is isolated to z/OS environment
• Drawbacks– SD on one lpar may route to a member on a different lpar, which
results into slightly higher response time compared to direct member access.
– Information about availability of data sharing members is only considered at creation of a "new" connection but application server typically maintain long-running connections (eg. WAS connection pool)
DVIPA and Sysplex Workload Balancing at the application server need to be enabled to ensure the highest availability characteristics in a distr. environment.
#IDUG
DB2 Client – Panoramic view
PhP Python/Jython Ruby/JRubyJavascript node.js
Scala
Zend framework adapters
SqlAlchemy/Django Adapter
DB2
DB2 CLI and ODBC driver DB2 JCC JDBC driver
Rails Adapter Liftnode-odbc node.js
JSON driver
c
Python interpreter
java c
Ruby interpreter
java
Java
JDBC API
pureQueryAPI
JSON API
Hibernate
JPA
#IDUG
7
DB2 Client Configuration to Access DB2 z/OS – Strategy
• Clients access DB2 z/OS directly (will not change licensing model)
• Sysplex Workload Balancing is supported by DB2 Connect gateway, JCC type 4, and Data Server Driver for CLI and .NET application
C based Clients
JDBC/SQLJ/pureQuery/ObjectGrid/Data Web services
CLI/ Ruby/ Perl
.NET DRDA
DB2 Connect (Gateway)
Type 4 DRDA
DRDA
DB2 z/OS
DB2 Group
DB2 z/OS
Java based Clients
JCC
Type 4-like DRDA
DRDA CF
#IDUG
8
JCC Sysplex Support in WAS environment• Set DataSource IP as DVIPA (Initial connection to Sysplex Distributor)• Connects applications to a DB2 data sharing group as though it were a single
database server• Global JCC property OR DataSource specific setting (enableSysplexWLB)
Group DVIPA of data sharing group
#IDUG
9
JCC Sysplex Support in WAS environment• Custom properties available on Data Source
#IDUG
10
JCC Sysplex Support in WAS environment• Set DataSource property enableSysplexWLB to TRUE
#IDUG
11
JCC Sysplex Support in WAS environment
• Enables Workload Balancing (WLB)– Workload spread amongst different members, based on server lists dynamically
provided by Work Load Manager (WLM)
– Transactions distributed amongst best available members for maximum throughput and scalability
• Enables Automatic Client Reroute (ACR)– On by default when WLB enabled– Enables client application to recover from loss of communication so that application
continues with minimal interruption
– When one member of the group fails, connection should be re-established to the next best available member of the group
#IDUG
12
Basic concepts underlying EnableSysplexWLB
• Server lists
– Available members identified by identified by member IP address and port
– Member priority (Weight) that indicates capacity of each member to handle work relative to others
– Weights obtained from z/OS Workload Manager
– Returned whenever a connection is established or reused
• Connection pool
– Logical connection from the application to the driver
– No physical network connection while in pool
• Transport pool
– Physical connection from driver to DB2
– Global pool of Transport objects per JVM
– Connection reuses/changes transport at transaction boundary
– EnableSysplexWLB automatically enables use of transport pool
#IDUG
13
WebSphere Application Server - End-to-end Example
Transport 1
Transport 2
Application Server
Application
ResourceAdapter
JCAConnectionManager
DBConnection
Pool
DB2 Universal Driver JDBC/SQLJ
LogicalConn 3
LogicalConn 1
LogicalConn 2
disconnectat commit/rollback
CFpooled connectionsto DB2 Data Sharing
Group
JVM
• Always use connection pool closest to application
• Enable Sysplex Workload Balancing for best availability.
#IDUG
14
Without enableSysplexWLB – Application outage
Connection Pool with size 2
C1 T1
C2
WAS
DB2 B
DB2 A
T2
JCC Driver DB2 Data Sharing Group
Connection Pool with size 2
C1 T1
C2
WAS
DB2 B
DB2 A
T2
JCC Driver DB2 Data Sharing Group
X-4498StaleConnectionException
App1
App2
App1
App2
#IDUG
15
With enableSysplexWLB – Application continuity
Connection Pool with size 2
SysplexWLB=true
C1 T1
C2
WAS
DB2 B
DB2 A
T2
JCC Driver
DB2 Data Sharing Group
Connection Pool with size 2
SysplexWLB=true
C1 T1
C2
WAS
DB2 B
DB2 A
T2
JCC Driver
DB2 Data Sharing Group
X
T3
-30108 network failure -20542 SQL failure
App1
App2
App1
App2
DDF Shutdown
#IDUG
16
Changed driver defaults for Sysplex properties• V9.7FP6/V10.1FP2 and above recommended for Sysplex support
• Only set enableSysplexWLB to true
• Stay with driver defaults as much as possible
• Better defaults for Global properties– db2.jcc.maxTransportObjects
• Max # of connections to DB2 server across all datasources
• Default value of 1000 (previous default -1, unlimited transports)
• # of transports driver can go upto = #logical connections * # ds members
– db2.jcc.maxTransportObjectIdleTime • Maximum elapsed time in seconds before an idle transport is dropped
• Default value is 60 sec
– db2.jcc.maxTransportObjectWaitTime• Number of seconds the client waits for a transport to become available
• Default value is 1 sec (previous default = -1, unlimited wait)
#IDUG
17
Changed driver defaults for Sysplex properties– db2.jcc.maxRefreshInterval
• Max elapsed time in sec before server list is refreshed
• Default is 10 sec (previous default 30 sec)
• Better defaults for Connection and DataSource properties– maxRetriesForClientReroute
• Maximum number of connection retries for automatic client reroute
• Default is 1 (previous default -1 unlimited, previous recommendation was to set to # of ds members)
• Cycling through all members of a group counts as 1 retry (previous behavior was retry against each member was counted as 1)
– retryIntervalForClientReroute
• Number of sec between consecutive connection
• Default 0 sec
#IDUG
18
Work Load Balancing
• Initial connection to DB2 data sharing group using distributed IP address
• Initial connection returns a member IP list with WLM weights
• Subsequent connection directly to members – Any following connection uses
client Sysplex workload balancing algorithm to determine which member to
use
• Driver asks for new server list after every maxRefreshInterval secs
• Workload i.e. Active transactions should be distributed in the ratio of the
server returned member weights
• Single logical connection from an application can be directed to one member
for one transaction and directed to different member for another transaction.
• At start of each transaction, best member computed
#IDUG
19
Work Load Balancing
• At the end of a transaction, server determines whether logical connection can execute next transaction on a different member. This decision, called reuse permission, is conveyed to client in the commit or rollback reply.
• DB2 returns indicator as part of COMMIT/ROLLBACK if transport can be reused and returns SET statements to replay connection state at reuse
– No open WITH HOLD cursor– No declared global temporary tables must exist
• Used declared global temp tables must be explicitly or implicitly dropped– No reference to packages bound with KEEPDYNAMIC YES
• If reuse permission, transport given back to the transport pool at the end of the transaction
#IDUG
Work load balancing (WLB) algorithm• DB2/WLM reports following to the driver
– M1 : priority=70 IPADDR=/148.171.100.23 port=7330
– M2 : priority=30 IPADDR=/148.171.100.21 port=7330
– Member 1 has a weight of 70 i.e. ratio is 70/100 = 0.7
– Member 2 has a weight of 30 i.e. ratio is 30/100 = 0.3– Goal is for 70% of the work to be allocated to member 1, and 30% to member 2
• Next transaction should be assigned to a member where – (Member’s currently active transactions) / (Total active transactions for the group)
<= (Member priority) / (Total of all member priorities)
• With reference to WLB– No of active transactions on a member = No of transports in use
#IDUG
Work load balancing (WLB) algorithm
• What does the new algorithm do– Member 1 has 35 active connections (in a UoW)
– Member 2 has 10 active connections
– Current ratios -
• Member 1 = active connections / total connections = 35 / 45 = 0.78 (which is > 0.7 target ratio)
• Member 2 = 10 / 45 = 0.22 (which is < 0.3 target ratio)
– Next transaction would be assigned to member 2.
– Weight ratios are re-calculated each time a new member list is received
#IDUG
22
Transport Pool Behavior• Every time a connection needs a transport, WLB algorithm first provides the
‘best member’
• Logical connection requests the pool for a transport to this 'best member' using
the Database Name, Server, Port, SSL, Server Type, Auth Mechanism as a
composite key
• Every logical connection remembers the transport object it last used against a
given member.
• It passes in the key to the last transport (if any) that the connection used for this
member. Algo attempts to use this transport object when it needs to go to the
member
• Until maxTransportObjects is hit, the pool prefers to create a new transport if
one is not available
#IDUG
23
Transport Pool behavior• This leads to:
– maxPossibleTransports = # of data sharing members X # of logical connections
– If maxPossibleTransports > maxTransportObjects , the transports should remain at maxTransportObjects, the excess need will result in logical connections being put on a ‘wait queue’.
– At end of maxTransportObjectWaitTime, default is 1 second at the end of which connection receives a SQLCODE -4210, SQLSTATE 57033
• “[jcc][t4][10391][12247][3.64.82] Timeout getting a transport object from pool. ERRORCODE=-4210, SQLSTATE=57033”
Connection Pool with size 2
SysplexWLB=true
C1 T1
C2
WAS
DB2 B
DB2 A
JCC Driver
T3
T4
T2
Active
Active
Idle
Idle
App1
App2
#IDUG
Transport Pool Statistics
• For monitoring driver Sysplex behavior (global transport object pool)
• Configuration property settings– db2.jcc.dumpPool=DUMP_SYSPLEX_MSG|DUMP_POOL_ERROR
– db2.jcc.dumpPoolStatisticsOnSchedule=60
– db2.jcc.dumpPoolStatisticsOnScheduleFile=/home/WAS/logs/srv1/poolstats
• Example fields– npr - total number of requests made to the pool since the pool was created.
– nsr – number of times pool returned an object (successful requests)
– lwroc – light weight reuse
– hwroc – heavy weight reuse
– lbt – longest time thread waited to get an object
– tpo – transport object count
• Application API also exists to gather transport pool statistics (DB2PoolMonitor class)
#IDUG
25
ACR Behavior in a Data Sharing Group
• Seamless – On a connection failure, driver silently re-establishes the connection
to another member. Calling application remains unaware of this.
• Non-Seamless – On a connection failure, driver reports an exception
(SQLCODE -30108) to the calling application if the connection could be
established to an alternate member.
• When working with a Data Sharing Group, driver always evaluates the failing
transaction for a Seamless failover. Unless certain conditions are met,
Seamless failover can not be performed.
• Conditions such as 2nd or more SQL of a transaction, held cursors, open LOB
streams prevent seamless failover.
#IDUG
26
Behavior at Connection Errors• If first SQL stmt in transaction fails and reuse OK (Seamless reroute)
– No errors reported back to application– SET statements associated with the logical connection are replayed with first SQL
on another transport• If subsequent SQL fails and reuse OK (Non-seamless reroute)
– -30108 reuse error returned to application (transaction is rolled back)– Up to application to retry transaction– Reconnection happens only when application resubmits (different behavior than
older releases)– SET statements are replayed to recover connection state
• If subsequent SQL and reuse not OK (Non-seamless reroute)– -30081 connection failed error returned to application. – Connection returned to initial (default) state – application needs to reestablish connection state and retry transaction
• If all members in the member list are tried and none seems to be available, the initial data source url (distributed IP address) is retried to make sure that really no member is available.
#IDUG
27
• Connection object pool maintained by WebSphere– Saves creating/destroying Connection objects which is relatively expensive
• Connection object holds references to other Java objects like preparedStatement objects that would be destroyed with the Connection object
• Connection Pooling Properties – Max Connections
• max connections from JVM instance– Min Connections
• lazy minimum number of connections in pool– Reap Time
• How often cleanup of pool is scheduled in seconds– Unused Timeout
• How long to let a connection sit in the pool unused– Aged Timeout
• How long to let a connection live before recycling– Purge Policy
• After StaleConnection, does the entire pool get purged or only individual connection
WebSphere Connection Pooling
#IDUG
28
Recommended WebSphere Connection Pool Properties with SysplexWLB
• Disable Reap Time, Unused Timeout and Aged Timeout by setting values to zero
• Actual physical connections are handled by driver while connections handled by WAS are logical connections.
#IDUG
29
End-to-end Pooling
Connection pool
WAS JCC Driver
App threads Transport pool
DB2 for z/OS
DDF connections DBATs
Idle/Pooled
Inactive Pooled
#IDUG
30
Important DB2 zparm Values• CONDBAT – Max. # of distributed connection into DB2 system
– Includes inactive and active connections, may be large because of many inactive connections
– Default is 10,000
• MAXDBAT - Max # database access threads (DBATs) that can be active concurrently. – In many installations, max. value determined by available storage in DBM1 (check
IFCID 225) – Default is 200
• IDTHTOIN – time in sec an active server thread remain idle before it is canceled – Inactive connections are not subject to idle thread timeout– Strongly recommended to not set to 0, default of 120 secs works well
• To control connections waiting on DBAT, new parms introduced -– MAXCONQW – how long in seconds a transport/client connection should wait for a
DBAT before re-eroute– MAXCONQN - how many transports/client connections should wait for a DBAT
before reroute
#IDUG
31
WAS/DB2 - Tuning Considerations
DB2 Zparms:
•MAXDBAT•CONDBAT•MAXCONQN•MAXCONQW•IDTHTOIN
Application Server Connection Pool:•max connection•unused timeout
DB2 C-based /JCC client:
•EnablesysplexWLB•MaxTransportsObjects•MaxTransportObjectIdleTime•MaxTransportObjectWaitTime
#IDUG
32
Summary
• Highly available environment for strategic, enterprise applications involves:
– DB2 data sharing group enabled with DVIPA and Sysplex Distributor
– Application servers/DB2 Client drivers enabled for Sysplex Workload
Balancing
– Applications are well-behaved
• DB2, Driver and WAS teams have worked together to come up with optimal out
of box behavior for most scenarios
• Some tuning may be needed for HA features of DB2, Drivers and WAS
#IDUG
33
References
• DB2 for z/OS and WebSphere Integration for Enterprise Java Applications
http://www.redbooks.ibm.com/abstracts/sg248074.html?Open
• DB2 9 for z/OS Data Sharing: Distributed Load Balancing and Fault Tolerant Configurationhttp://www.redbooks.ibm.com/abstracts/redp4449.html?Open
• DB2 Version 9.1 for z/OS Data Sharing: Planning and Administration (SC18-9845)http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.dshare/db2z_dshare.htm
• DB2 for z/OS best practice education https://www.ibm.com/developerworks/data/bestpractices/db2zos/