vcs_for_ha

Oracle High Availability using Veritas Cluster Server (VCS)

Chris LawsonPerformance [email protected]

VCS INTRODUCTIONVeritas Cluster Server is a popular variety of HA --High Availability architecture.Other examples are HP MC Service Guard, IBM HACMP.Goal: Minimize downtime due to server failure.Software used: Veritas Database Edition for Oracle 2.1 (includes VCS 1.1.2); Oracle 8.1.6.2, Solaris 2.6

ADVANTAGES OF VCSFail-over occurs with no loss of data. There is no need to apply redo logs.Fail-over criteria is configurable.Fail-over occurs without DBA intervention, and typically requires just a few minutes.

Service Group BService Group ANODE X

NODE Yshared diskshared disk Local disk DATABASE AVirtual IP

Virtual IP

LISTENER_ALISTENER_BLocal disk DATABASE BApplication

Application

ADVANTAGES OF VCSThe clients are automatically routed to the correct node. Fail-over systems such as used in VCS are proven technology and are widely used.Future databases can be added to cluster with only moderate effort.Far simpler to implement than Replication or Oracle Parallel Server.No modification to application required.

DISADVANTAGES OF VCSThorough testing required to ensure that all related applications properly fail-over together.

Higher level of Sys Admin and DBA expertise and time required for setup.Care must be taken to ensure that nodes are consistently and correctly configured.

DISADVANTAGES OF VCS(continued) Upon fail-over, to return the database and applications to the original node will require another, brief, interruption in service.Since the two nodes are physically co-located, a disaster could still cause service interruption. Note: Products such as Veritas Global Cluster Manager address this issueServicing of database is slightly more complex

MONITORING VCS uses Oracle Enterprise Agents to monitor the health of the Oracle system. These agents perform monitoring at both the database and Listener levels.

MONITORING RDBMSPrimary check: Confirms that Oracle background processes are presentSecondary check: Perform simple database transaction Failure of either test will cause fail-over.

LISTENER MONITORINGPrimary check: Confirm process is runningSecondary check: Status commandThus, listener.log should show a status command every one minute.Optionally, failure of either listener test will not cause failover, but rather multiple restart attempts first.

DATABASE SETUPPlace ALL oracle data files on the shared volume group. Do not intermix files from different databases.In oratab file on each node; specify N as VCS will control startup.Optionally, two sets of archive logs may be specifiedone set on local, one on shared.

TIPS: RECOVERY TIMEInstance recovery time depends on:time to roll-forward un-checkpointed transactions PLUS time to rollback uncommitted transactions:Roll Forward: Time is proportional to frequency of checkpoints.

TIPS: RECOVERY TIMECan control checkpoint frequency several ways: via size of redo logs, and parameters such as:FAST_START_IO_TARGET LOG_CHECKPOINT_INTERVAL.Roll Back: Rollback on Demand feature (default) drastically reduces this time.

LISTENER SETUPA listener should be defined for each service group that has an Oracle database. This allows VCS to try to restart the listener if a failure is detected. This RestartLimit value is not set using hagui; rather, this feature is entered in the file OracleTypes.cf.

LISTENER SETUPSuggest setting parameter RestartLimit to three so that VCS will try 3 times to restart listener before initiating a failover. The count is reset when VCS sets this resource offline.A separate listener (and port) for each database is required because VCS will shutdown the listener for a service group upon failover.

CLIENT SETUPThe client tnsnames.ora file specifies the Service Group (virtual) IP address.Upon failover, this IP address will automatically point to the correct node.No change to client tnsnames.ora file is required.

hagui Sample Screen

VCS BASIC SETUPThe Sys Admin will setup a VCS configuration file, main.cf. This file lists the various Resource Types that constitute a cluster Service Group.Some typical Resource Types are: DiskGroup, IP, and Volume. It is most convenient to use the hagui utility to define the resources for each Service Group.

ORACLE AGENT SETUPVCS uses two agents to monitor both the database and listener. The agents are defined in the main.cf file using two new Resource Types: Oracle and Sqlnet. The GUI tool may again be used to configure the agents.

ORACLE AGENT SETUP

SQLNET AGENT SETUP

SECONDARY-MONITOR SETUP

Besides checking for the background processes, VCS will attempt to perform a simple update transaction. This secondary check is automatically enabled when the following Oracle attributes are defined: MonScript [script executed], User, Pword, Table.

SECONDARY-MONITOR SETUP

Thus, for each database to be monitored, create an oracle user for this transaction. Grant minimal privilege. create a table with column: TIMESTAMP Insert one row into the tableConfirm that this user can perform simple update of the table.

SECONDARY DB CHECK For example:Create user dbcheck identified by dbcheck;Grant connect, resource to dbcheck;Connect dbcheck/dbcheckCreate table DBTEST ( TSTAMP DATE);Insert into DBTEST values (SYSDATE );

FAILOVER & LOG

TIP: INITIAL SETUPGood investment: If complications arise, hire Veritas consultant for a few days to perform the basic setup

This can easily save weeks of frustration

TIP: INITIAL SETUPThe Sql*net parameter RestartLimit must be manually entered into the VCS configuration file. This will allow VCS to attempt listener restart before failing over. Note also that the database secondary monitoring feature is automatically enabled when the User/Pword/Table fields are defined.

TIP: DATABASE CHECKWhenever the database is rebuilt, remember to create the user that performs the secondary database check.Good idea: Have a script handy to create this user and the associated check table.Otherwise, database check will fail, and failover will occur immediately.

TRAP: PROCESSES PARAMSetting of parameter is critical! Else secondary database check will fail. This will cause an unexpected failover. VCS log will indicate that the database became offline unexpectedly, on its own

TRAP: SHARED MEMORYOn Solaris 2.6, shared memory and semaphores may not be released Upon failover involving Shutdown abort, this will leave node unavailable to that instance.Upon fail-back, the instance will be unable to start.

SHARED MEMORY Detect this error and clear shared memory after a failover has occurred. Perhaps set cron job to notify DBA.Detect then correct this problem using the unix commands ipcs and ipcsrm. A shared memory segment can be matched to a particular instance by the memory size.Another method: unix command sysresv shows shared memory and semaphores for an instance (see next slide)

SHARED MEMORY

export ORACLE_SID=acxsysresv

IPC Resources for ORACLE_SID "acx" :Shared Memory:ID KEY2 0x68972d94

Semaphores:ID KEY196609 0x0b0f6b852 0x0b0f6b863 0x0b0f6b87

USEFUL REFERENCESVeritas VCS Reference GuideVeritas VCS Oracle Agent GuideListserver: [email protected]

http://support.veritas.com/

Contact InformationChris [email protected]

For other papers, see:http://dbspecialists.com/(see DBA Resources)

Documents

vcs_for_ha