© 2015 IBM Corporation DMX-2628 – Always On: High Availability Best Practices for Informix Nagaraju Inturi [email protected] Scott Lashley [email protected]


Page 2: Always on high availability best practices for informix

Please Note:

•  IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
•  Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
•  The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
•  The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Page 3: Always on high availability best practices for informix

Industry Terms
•  Recovery Point Objective (RPO)
   §  How much data are you willing to lose?
•  Recovery Time Objective (RTO)
   §  How much time does it take to recover from a failure?
•  Example
   §  ONCONFIG parameter RTO_SERVER_RESTART
      Monitors transaction activity and coordinates checkpoints so that, in the event of a server crash, the server can restart within the time specified by RTO_SERVER_RESTART.

Page 4: Always on high availability best practices for informix

Hot Standby
•  Fred wants to implement an RTO policy of 15 seconds in the event of a failure.

[Diagram: Primary, Secondary]

Page 5: Always on high availability best practices for informix

Updatable Secondary
•  Fred wants to extend his HDR solution to utilize the secondary.

[Diagram: Primary, Secondary]

Page 6: Always on high availability best practices for informix

Updatable Secondary
•  How do updates on the secondary work?
   §  Row locks are acquired on secondary as updates are applied from primary
   §  Initial read is done on secondary
   §  Update is forwarded to primary
      •  If row versioning is defined in the schema for the table, the version is compared to determine if update can be applied
      •  Otherwise, whole row is compared to determine if update can be applied
•  What isolation levels are supported on a secondary?
   §  Dirty Read
   §  Committed Read
   §  Committed Read Last Committed

http://www-01.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.admin.doc/ids_admin_0874.htm%23ids_admin_0874?lang=en
http://www-01.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.admin.doc/ids_admin_0875.htm?lang=en
http://www-01.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.admin.doc/ids_admin_0877.htm?lang=en

Page 7: Always on high availability best practices for informix

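Application Perspective - Locking & Queries

[Diagram: Anatomy Of Update. Transaction steps: Begin Work, Read Row (V1), Update Row (V2), Apply Update (Sec), Commit Work, Apply Commit (Sec). Locks on primary (Pri): ulock, xlock, release lock. Locks on secondary (Sec): xlock, release lock.]

Query Primary: row(V1) before the update; while the row is locked, DR=row(V2), CRLC=row(V1), CR,CS,RR=block
Query Secondary: row(V1) before the update is applied; while the row is locked, DR=row(V2), CRLC=row(V1), CR=block

(DR = Dirty Read, CRLC = Committed Read Last Committed, CR = Committed Read, CS = Cursor Stability, RR = Repeatable Read)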

Page 8: Always on high availability best practices for informix

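Application Perspective - Locking & Updates

[Diagram: Anatomy Of Update. Transaction steps: Begin Work, Read Row (V1), Update Row (V2), Apply Update, Commit Work, Apply Commit. Locks on primary: ulock, xlock, release lock. Locks on secondary (Sec): xlock, release lock.]

Update Primary: row(V1) before the update; block while the row is locked
Update Secondary: if hot row, push to primary; otherwise, row(V1); block while the row is locked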

Page 9: Always on high availability best practices for informix

Application behavior
•  I’m on an updatable secondary and my application just updated a row, but the update is not committed yet. If I read the row, what version of the row will I see?
   §  When my session (or any other session) attempts to read a row that was recently updated, it waits for the secondary server I’m connected to to replay that row update before reading the row.

[Diagram: Begin Work, Read Row (V1), Forward Update, Wait, Apply Update (Sec), Commit Work, Apply Commit (Sec). Read Row Again: block until row is applied.]

Page 10: Always on high availability best practices for informix

Application behavior
•  When I get error 7350 “Attempt to update a stale version of a row”, what happened?
   §  My application read a row from the secondary node and, between the time the row was read and the time the update was forwarded to the primary, another transaction completed an update to the row.

[Diagram: Update Secondary: Read row (V1), Forward update (V3). Update Primary: Read row (V1), Update row (V2), Commit row (V2).]

At this point, the forwarded update is based on a row version that no longer matches the committed version, so the error is returned.
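The stale-version check behind error 7350 can be pictured as an optimistic version comparison. The following is only a toy Python model, not Informix code: the `Primary` class, the `StaleRowVersion` exception and the `update_with_retry` helper are hypothetical names invented for illustration, and the retry loop is one possible application-side reaction rather than a documented recommendation.

```python
from dataclasses import dataclass


class StaleRowVersion(Exception):
    """Stands in for Informix error 7350 in this toy model."""


@dataclass
class Row:
    value: str
    version: int = 1


class Primary:
    """Toy primary: holds one row and checks versions on forwarded updates."""

    def __init__(self, value: str) -> None:
        self.row = Row(value)

    def read(self) -> Row:
        # Return a copy so a reader keeps the version it originally saw.
        return Row(self.row.value, self.row.version)

    def update(self, seen_version: int, new_value: str) -> int:
        # Apply the update only if the caller read the current version.
        if seen_version != self.row.version:
            raise StaleRowVersion(
                f"read version {seen_version}, current is {self.row.version}"
            )
        self.row = Row(new_value, self.row.version + 1)
        return self.row.version


def update_with_retry(primary: Primary, new_value: str, attempts: int = 3) -> int:
    """Re-read the row and retry when the forwarded update was stale."""
    for _ in range(attempts):
        seen = primary.read()
        try:
            return primary.update(seen.version, new_value)
        except StaleRowVersion:
            continue
    raise StaleRowVersion("gave up after repeated stale reads")
```

In this model, a session that read V1 before another session committed V2 gets `StaleRowVersion` when its update arrives; re-reading the row and resubmitting succeeds.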

Page 11: Always on high availability best practices for informix

What’s new?
•  My application is using the UPDATABLE_SECONDARY configuration to perform queries and updates on all the members of my HDR cluster. How do I coordinate transactions across the HDR cluster?
•  CLUSTER_TXN_SCOPE is an ONCONFIG and session parameter that controls when the application receives an acknowledgement of the commit of a user’s transaction.

CLUSTER_TXN_SCOPE | Connected to Primary | Connected to Secondary
SESSION | ACK when commit is complete | ACK when commit is complete on primary
SERVER (default) | ACK when commit is complete | ACK when commit is complete on primary and processed on the node I’m connected to
CLUSTER | ACK when commit has been applied to all nodes | ACK when commit has been applied to all nodes
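For readers who prefer a lookup to a table, the CLUSTER_TXN_SCOPE rows can be transcribed as a small Python mapping. This is only a restatement of the slide; the function name `ack_condition` and the `"primary"` / `"secondary"` labels are made up for this sketch and are not Informix identifiers.

```python
# Transcription of the CLUSTER_TXN_SCOPE table as a lookup.
# Keys are (scope, node type the session is connected to).
ACK_RULES = {
    ("SESSION", "primary"): "commit is complete",
    ("SESSION", "secondary"): "commit is complete on primary",
    ("SERVER", "primary"): "commit is complete",
    ("SERVER", "secondary"): (
        "commit is complete on primary and processed on the connected node"
    ),
    ("CLUSTER", "primary"): "commit has been applied to all nodes",
    ("CLUSTER", "secondary"): "commit has been applied to all nodes",
}


def ack_condition(scope: str = "SERVER", connected_to: str = "primary") -> str:
    """Return when the application receives the commit acknowledgement."""
    key = (scope.upper(), connected_to.lower())
    if key not in ACK_RULES:
        raise ValueError(f"unknown scope/node combination: {key}")
    return ACK_RULES[key]
```

SERVER is used as the default argument because the slide marks it as the default scope.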

Page 12: Always on high availability best practices for informix

What’s new?
•  DRINTERVAL & HDR_TXN_SCOPE
   These parameters work together to determine synchronization between primary and secondary nodes
•  FULL_SYNC is new

DRINTERVAL | HDR_TXN_SCOPE | Buffered logging | Unbuffered logging
-1 | n/a | Async | Near sync
0 | FULL_SYNC | Full sync | Full sync
0 | ASYNC | Async | Async
0 | NEAR_SYNC | Near sync | Near sync
>0 | n/a | Async | Async
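The interaction of the two parameters can also be written as a small decision function. The Python sketch below only encodes the table rows as given on the slide; `hdr_sync_mode` is a hypothetical helper name, and it refuses combinations the table does not list instead of guessing.

```python
from typing import Optional


def hdr_sync_mode(
    drinterval: int,
    hdr_txn_scope: Optional[str] = None,
    buffered_logging: bool = False,
) -> str:
    """Return 'async', 'near sync' or 'full sync' per the slide's table."""
    if drinterval > 0:
        # Positive interval: HDR_TXN_SCOPE does not apply.
        return "async"
    if drinterval == -1:
        # Mode follows the database logging mode.
        return "async" if buffered_logging else "near sync"
    if drinterval == 0:
        modes = {
            "FULL_SYNC": "full sync",
            "ASYNC": "async",
            "NEAR_SYNC": "near sync",
        }
        if hdr_txn_scope is None or hdr_txn_scope.upper() not in modes:
            raise ValueError("DRINTERVAL=0 needs a valid HDR_TXN_SCOPE")
        return modes[hdr_txn_scope.upper()]
    raise ValueError(f"DRINTERVAL value not covered by the table: {drinterval}")
```

For example, `hdr_sync_mode(0, "FULL_SYNC")` gives full sync regardless of logging mode, while `hdr_sync_mode(-1, buffered_logging=True)` gives async.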

Page 13: Always on high availability best practices for informix

DRINTERVAL & HDR_TXN_SCOPE
•  My RPO is 0 for a single point of failure
   DRINTERVAL=0 HDR_TXN_SCOPE=NEAR_SYNC
   This setting makes sure that committed transactions are received by the secondary. If the primary fails, all committed transactions are guaranteed to be at least in volatile memory on the secondary.
•  My RPO is 10 seconds for a single point of failure
   DRINTERVAL=10
   Makes sure a buffer is sent to the secondary at least every 10 seconds.
•  My RPO is 0 for multiple points of failure
   DRINTERVAL=0 HDR_TXN_SCOPE=FULL_SYNC
   This setting makes sure that committed transactions are received and written to disk by the secondary. If the primary fails, all committed transactions are guaranteed to be hardened to disk on the secondary.
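Going the other way, from an objective to settings, the three cases on this slide can be sketched as follows. This is a minimal Python illustration; `hdr_settings_for_rpo` and `as_onconfig` are hypothetical helpers, and the sketch assumes a non-zero RPO maps directly to DRINTERVAL in seconds as in the 10-second example.

```python
from typing import Dict


def hdr_settings_for_rpo(
    rpo_seconds: int, survive_multiple_failures: bool = False
) -> Dict[str, str]:
    """Pick DRINTERVAL / HDR_TXN_SCOPE values for a recovery point objective."""
    if rpo_seconds < 0:
        raise ValueError("RPO cannot be negative")
    if rpo_seconds == 0:
        # RPO of zero: near sync covers a single failure, full sync covers
        # the case where the secondary may fail as well.
        scope = "FULL_SYNC" if survive_multiple_failures else "NEAR_SYNC"
        return {"DRINTERVAL": "0", "HDR_TXN_SCOPE": scope}
    # Non-zero RPO: send a buffer at least every rpo_seconds seconds.
    return {"DRINTERVAL": str(rpo_seconds)}


def as_onconfig(settings: Dict[str, str]) -> str:
    """Render the settings as 'NAME value' lines."""
    return "\n".join(f"{name} {value}" for name, value in settings.items())
```

Calling `as_onconfig(hdr_settings_for_rpo(0, survive_multiple_failures=True))` produces the two lines `DRINTERVAL 0` and `HDR_TXN_SCOPE FULL_SYNC`.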

Page 14: Always on high availability best practices for informix

Offsite disaster
•  Fred wants to extend his HDR solution to include offsite replication in case of site disaster.

[Diagram: Primary, Secondary, RSS Secondary]

Page 15: Always on high availability best practices for informix

Remote Standalone Secondary (RSS)
•  You want our remote site located in Timbuktu? How’s the network connectivity to that site?
•  You dropped what database?
   §  DELAY_APPLY
•  You’re planning to do what maintenance this weekend?
   §  Stop Apply command
•  RSS Limitations
   §  Can only be promoted to HDR secondary, not primary
   §  SYNC mode not supported

Page 16: Always on high availability best practices for informix

Improved Network performance
•  SMX_NUMPIPES
   §  There is a limit on how many TCP buffers can be in flight across a wire between a pair of ports until a TCP ACK is sent to the sender. This is referred to as the TCP window. SMX can be configured to have multiple pairs of ports between two given servers, in effect filling in the gaps that would otherwise occur on the network wire. This is especially advantageous if the network connection is over a WAN or is of less than the best quality. In such conditions, setting SMX_NUMPIPES to 2 can result in twice as much data being sent across the wire.
   §  SMX will reorganize the transmissions on the target node so that they appear to have been received across a single serial connection.
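The "twice as much data" claim follows from simple window arithmetic: one connection can have at most one TCP window in flight per round trip, so its throughput is bounded by window size divided by round-trip time. The Python sketch below works that out under idealized assumptions (fixed window, no loss); the function name and the example numbers (64 KiB window, 50 ms round trip) are illustrative and do not come from the slides.

```python
from typing import Optional


def max_throughput_bytes_per_sec(
    window_bytes: int,
    rtt_seconds: float,
    pipes: int = 1,
    link_bytes_per_sec: Optional[float] = None,
) -> float:
    """Upper bound on throughput for `pipes` window-limited connections."""
    if window_bytes <= 0 or rtt_seconds <= 0 or pipes < 1:
        raise ValueError("window, RTT and pipe count must be positive")
    # Each connection can send at most one window per round trip.
    total = (window_bytes / rtt_seconds) * pipes
    if link_bytes_per_sec is not None:
        # Extra pipes cannot push more than the link itself can carry.
        total = min(total, link_bytes_per_sec)
    return total
```

With a 64 KiB window and a 50 ms round trip, one pipe is bounded at roughly 1.3 MB/s and two pipes at roughly 2.6 MB/s, until the link capacity becomes the limit.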

Page 17: Always on high availability best practices for informix

What’s new (and really cool)?
•  Informix Warehouse Accelerator (IWA)

Page 18: Always on high availability best practices for informix

What’s really cool?

“Hey Scott, we are having an online sale this weekend and we expect a huge influx of internet activity on our web site. I might have forgotten to tell you that. Can our infrastructure handle that?”

•  Shared Disk Secondary (SDS)
   §  Adjust capacity as demand changes
   §  Does not duplicate disk space
   §  No special hardware
      •  Cluster mgr or SDS_LOGCHECK
   §  Coexist with ER, HDR & RSS
   §  Primary can failover to any SDS
•  ifxclone
   §  Make a quick copy

Page 19: Always on high availability best practices for informix

What’s improved?
•  Index page logging (IPL)
   §  Copies a newly created index from primary to secondary using the logical log.
   §  Required for RSS secondary servers
   §  Big performance boost (4x)

Page 20: Always on high availability best practices for informix

Best Practices for HDR, RSS, SDS
•  All nodes which are candidates for failover (HDR secondary & SDS) should have similar specs in case there is a failover
•  Use unbuffered database logging to minimize lost transactions
•  ONCONFIG parameter OFF_RECVRY_THREADS should be set to a prime number near (# of CPUs) * 3
•  Turn on AUTO_READAHEAD on secondary
•  Larger BUFFERPOOL can alleviate some random I/O
•  ONCONFIG parameter TEMPTAB_NOLOG=1 to default temp tables to non-logging
•  ONCONFIG parameter HA_ALIAS=TCP network-based server alias
   §  Tells the server which network interface and port to use for server-to-server replication traffic.
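The OFF_RECVRY_THREADS guideline is terse, so here is one way to turn it into a number. This Python sketch assumes the guideline means "the smallest prime at or above three times the CPU count"; that reading is an interpretation of the slide, not something it states, and `suggested_off_recvry_threads` is a hypothetical helper.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, fine for small thread counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def suggested_off_recvry_threads(cpu_count: int) -> int:
    """Smallest prime >= 3 * cpu_count (assumed reading of the guideline)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    candidate = cpu_count * 3
    while not is_prime(candidate):
        candidate += 1
    return candidate
```

Under this reading, a 4-CPU secondary would use 13 recovery threads and an 8-CPU secondary 29.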

Page 21: Always on high availability best practices for informix

Best practices for HDR
•  ONCONFIG parameter DRINTERVAL=0 and use HDR_TXN_SCOPE (ASYNC, NEAR_SYNC or FULL_SYNC)
•  ONCONFIG parameter DRAUTO=3 and use Connection Manager to arbitrate failover
•  ONCONFIG parameter LOG_STAGING_DIR always set
   §  Some log records, like CHECKPOINT, require serialized processing which can block the primary from sending log data. When an HDR secondary is configured with a log staging directory, the logs can be spooled to disk while the serialized log record is applied on the secondary. Once the log record has been applied, the secondary will apply the spooled log until it catches up with the primary. This can alleviate backflow pressure from the secondary to the primary.

Page 22: Always on high availability best practices for informix

Best practices for RSS
•  ONCONFIG parameter RSS_FLOW_CONTROL
   §  This ONCONFIG parameter controls RPO (units=amount of data rather than time) for the RSS node so it doesn’t fall too far behind
•  ONCONFIG parameter SMX_NUMPIPES
   §  Take advantage of parallel data transmission using multiple network pipes

Page 23: Always on high availability best practices for informix

Best practices for SDS
•  ONCONFIG parameter SDS_LOGCHECK
   User scenario… I’m using HDR SDS with no cluster manager. How do I avoid disk corruption and split brain in a failover scenario?
   §  SDS_LOGCHECK is used to watch the log space in the event of a failover scenario. After waiting N seconds, if no log activity is seen, the SDS secondary will assume takeover.
   §  10 is a good starting value
•  ONCONFIG parameter SDS_FLOW_CONTROL
   §  This ONCONFIG parameter controls RTO (units=amount of data rather than time) for the SDS node so it doesn’t fall too far behind
      •  No data will be lost because the disks are shared!
      •  By not falling too far behind, it maintains RTO in the event of a failover so there isn’t too much log to apply in order to catch up
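The SDS_LOGCHECK idea, "watch the shared log for N seconds and take over only if nothing moves", can be sketched as a pure decision function. This Python model is an illustration of the described behavior, not the server's implementation; the `(timestamp, log_position)` sample format and the name `should_take_over` are assumptions made for the example.

```python
from typing import Iterable, Tuple


def should_take_over(
    samples: Iterable[Tuple[float, int]], logcheck_seconds: float
) -> bool:
    """Decide on takeover from (timestamp, log_position) observations.

    The first sample marks the moment the primary appeared to be down.
    """
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")
    start_time, start_pos = ordered[0]
    deadline = start_time + logcheck_seconds
    for timestamp, position in ordered:
        if position != start_pos:
            # Something is still writing to the shared log: the primary
            # is alive, so taking over would risk a split brain.
            return False
        if timestamp >= deadline:
            # Watched the full window without any log activity.
            return True
    # Window has not elapsed yet: keep watching.
    return False
```

With the suggested starting value of 10 seconds, samples at 0 s, 5 s and 10 s that all show the same log position lead to takeover, while any change in position cancels it.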

Page 24: Always on high availability best practices for informix

Connection Manager

Page 25: Always on high availability best practices for informix

Connection Manager
•  Route client connection…

[Diagram: client connection routed (?) to Cluster or Flexible Grid / ER]

Page 26: Always on high availability best practices for informix

Connection Manager
•  Failover arbitration

[Diagram: Cluster, New Primary]

Page 27: Always on high availability best practices for informix

Connection Manager
•  Act as a proxy

[Diagram: Client that cannot be recompiled; Port Blocked; CM as Proxy; CM-used port allowed]

Page 28: Always on high availability best practices for informix

Connection Manager
•  Connection unit types
   1) CLUSTER
   2) REPLSET
   3) GRID
   4) SERVERSET

[Diagram: Primary, HDR, RSS, Enterprise Replication]

Page 29: Always on high availability best practices for informix

Connection Manager – Best Practices
•  Avoid single point of failure

Client’s INFORMIXSQLHOSTS:
   g_mySLA     group     -        -        c=1,i=123456
   cm1_mySLA   onsoctcp  cm1Host  cm1Port  g=g_mySLA
   cm2_mySLA   onsoctcp  cm2Host  cm2Port  g=g_mySLA
   cm3_mySLA   onsoctcp  cm3Host  cm3Port  g=g_mySLA

Page 30: Always on high availability best practices for informix

c=1?

http://publib.boulder.ibm.com/infocenter/idshelp/v117/topic/com.ibm.admin.doc/ids_admin_0175.htm
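As a rough illustration of why `c=1` matters for a group of Connection Managers, the Python sketch below models the order in which a client would try group members. It assumes, as my reading of the linked sqlhosts documentation, that `c=0` tries members in the listed order and `c=1` starts at a random member and wraps around; `connection_order` is a hypothetical helper and not part of any Informix client API.

```python
import random
from typing import List, Optional, Sequence


def connection_order(
    members: Sequence[str], c: int = 0, rng: Optional[random.Random] = None
) -> List[str]:
    """Order in which group members are tried (assumed c=0 / c=1 semantics)."""
    ordered = list(members)
    if not ordered:
        return []
    if c == 0:
        # Always start with the first listed member.
        return ordered
    if c == 1:
        # Random starting point, then continue through the list in order.
        rng = rng or random.Random()
        start = rng.randrange(len(ordered))
        return ordered[start:] + ordered[:start]
    raise ValueError("c must be 0 or 1")
```

Under that assumption, `c=1` spreads new connections across cm1, cm2 and cm3 instead of sending every client to cm1 first, while still falling back to the remaining members if the chosen one is down.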

Page 31: Always on high availability best practices for informix

Network paths offer perspective

[Diagram: two views of PRI and HDR connected through a switch. From one network path: Is PRI down? Yes. From the other: Is PRI down? No.]


Page 33: Always on high availability best practices for informix


Notices and Disclaimers Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.

Page 34: Always on high availability best practices for informix


Notices and Disclaimers (con’t)

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

•  IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DB2® , DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, IMS™, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.

Page 35: Always on high availability best practices for informix


Thank You