Copyright © 2008, Oracle. All rights reserved.
Oracle Clusterware for Sysadmins
Oracle Clusterware Architecture
• Primary components, functions, and layers
• Process architecture
• Process interaction (CSS, CRS, EVM, ONS)
• OPROCD integration
• Voting mechanism
• Heartbeating mechanism
• OCR (Oracle Cluster Registry)
Oracle Clusterware 10g
Oracle Clusterware 11g
Primary components, functions, and layers
• Portable cluster infrastructure that provides HA to RAC
databases and/or other applications:
– Monitors applications’ health
– Restarts applications on failure
– Can fail over applications on node failure
[Diagram: three-node cluster. Every node runs Oracle Clusterware from its CRS home on top of the system files. Node 1 and Node 2 each run a listener and a RAC database instance from an ORACLE_HOME, plus a protected application (App A on Node 1, App B on Node 2); Node 3 runs only the CRS home.]
Oracle Clusterware (OCW)
[Diagram: Nodes 1 through n are connected to the public network; each node runs EVMD, CRSD, OPROCD, ONS, and CSSD, and hosts a node VIP (VIP1 … VIPn). All nodes attach to shared storage holding the OCR and voting disks on raw devices. CSSD runs at real-time priority.]
Process architecture
[Diagram: init spawns oprocd, ocssd, crsd, and evmd. crsd spawns racgimon processes and action scripts (racgwrap + racgmain) and maintains the OCR; evmd spawns the permanent child evmlogger, which invokes racgevtf and callouts. The voting disks sit on shared storage. oprocd appears on Linux starting with 10.2.0.4/11.1.0.6.]
Oracle Clusterware daemons
• OCW comprises several daemons, each with a specific
function in the stack. The daemons are located in the
directory $CRS_HOME/bin. The following daemons exist
in 10.2.0.3 and later; depending on the platform and
whether third-party vendor clusterware is present,
some processes may not exist:
– ocssd.bin
– crsd.bin
– evmd.bin
– oclsvmon.bin
– oclsomon.bin
– oprocd
Oracle Clusterware daemons
• When these daemons are running, Oracle Clusterware
is fully started. They are started via the init.* scripts
(init.cssd, init.crsd, and init.evmd).
• Note that there are fewer init.* scripts than daemons;
this is because init.cssd starts more than one daemon:
– init.cssd starts ocssd.bin, oclsomon, oclsvmon, and
oprocd (the CSS family).
– init.crsd starts crsd.bin.
– init.evmd starts evmd.bin.
Oracle Clusterware “control” files
• Control files (also known as SCLS_SRC files)
• These files are used to control some aspects of OCW,
such as:
– Enabling/disabling processes from the CSSD family (e.g.,
oprocd, oclsvmon)
– Stopping the daemons (ocssd.bin, crsd.bin, etc.)
– Preventing OCW from being started when the machine boots
Oracle Clusterware daemon functionality
OCSSD
• OCSSD is part of RAC and of single-instance
deployments with ASM
• Provides access to node membership
• Provides group services
• Provides basic cluster locking
• Integrates with existing vendor clusterware, when
present
• Can also run without vendor clusterware integration
• Runs as the oracle user
• A failure exit causes a machine reboot:
– This is a feature to prevent data corruption in the
event of a split brain.
Oracle Clusterware daemon functionality
CRSD
• Engine for HA operations
• Manages 'application resources'
• Starts, stops, and fails over 'application resources'
• Spawns separate 'actions' to start/stop/check
application resources
• Maintains configuration profiles as well as resource
statuses in the OCR (Oracle Cluster Registry)
• Stores the current known state in the OCR
• Runs as root
• Is restarted automatically on failure
Oracle Clusterware daemon functionality
CRSD
• CRSD spawns dedicated processes called RACGIMON
that monitor the health of the database and ASM
instances and host various feature threads such as
Fast Application Notification (FAN).
• One RACGIMON process is spawned for each instance.
• CRSD can spawn temporary children to execute
particular actions such as:
– racgeut (Execute Under Timer), to kill actions that do not
complete after a certain amount of time
– racgmdb (Manage Database), to start/stop/check instances
– racgchsn (Change Service Name), to add/delete/check service
names for instances
– racgons, to add/remove ONS configuration in the OCR
– racgvip, to start/stop/check the instance virtual IP
Oracle Clusterware daemon functionality
EVMD
• Generates events when things happen
• Spawns a permanent child, evmlogger
• evmlogger spawns children on demand
• Scans the callout directory and invokes callouts
• Runs as the oracle user
• Is restarted automatically on failure
OPROCD: Oracle's fencing driver
• OPROCD is Oracle's cluster I/O fencing solution. It is
started on UNIX platforms only when vendor
clusterware is not running.
• OPROCD does not run on Windows, where the
equivalent function is provided by OraFenceService.
• It is used on Linux starting with 10.2.0.4.
• The OPROCD executable is intended to detect potential
node hangs.
– When it detects a potential node hang, it causes a
node reboot to ensure that, if the node has been
evicted by other cluster nodes, none of its processes
can issue I/O after the hang clears.
OPROCD: Oracle's fencing driver
• The OPROCD executable installs a signal handler for
SIGALRM and sets the interval timer based on the
to-millisec parameter provided.
• The alarm handler gets the current time and checks it
against the time the handler was last entered. If the
difference exceeds (to-millisec + margin-millisec), it
fails; the production version causes a node reboot.
• OPROCD takes two parameters:
– Timeout value (-t <to-millisec>): the length of time
between executions
– Margin (-m <margin-millisec>): the acceptable
leeway for dispatches
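The timer check described above can be sketched in shell. This is a simplified illustration only, not the real OPROCD (which is a C program driven by SIGALRM and an interval timer); the variable names and values are invented:

```shell
# Simplified illustration of the OPROCD check: compare the gap
# between consecutive wake-ups against timeout + margin.
TO_MS=1000      # stand-in for -t: interval between checks (ms)
MARGIN_MS=500   # stand-in for -m: acceptable scheduling leeway (ms)

last=$(date +%s%N)            # nanoseconds since epoch (GNU date)
sleep 1                       # stand-in for the interval timer firing
now=$(date +%s%N)

elapsed_ms=$(( (now - last) / 1000000 ))
if [ "$elapsed_ms" -gt $(( TO_MS + MARGIN_MS )) ]; then
    # The real OPROCD would reboot the node at this point.
    echo "HANG DETECTED: ${elapsed_ms}ms > $(( TO_MS + MARGIN_MS ))ms"
else
    echo "OK: woke up after ${elapsed_ms}ms"
fi
```

If scheduling delays push the measured gap past timeout + margin, the check fires; that is exactly the "potential node hang" condition of the previous slide.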
Manually Control Oracle Clusterware Stack
Might be needed for planned outages:
# crsctl stop crs
# crsctl start crs
# crsctl disable crs
# crsctl enable crs
MISSCOUNT: Important CSS Parameter
• Determines the CSS heartbeat timeout before node
eviction
• Has a default value of 30 seconds that is appropriate in
most cases
• Can be temporarily changed:
1. Shut down Oracle Clusterware on all nodes but one.
2. As root on the available node, use: crsctl set css misscount M+1
3. Reboot the available node.
4. Restart all other nodes.
• The default should never be changed when using non-
Oracle (vendor) clusterware
Multiplexing Voting Disks
• Voting disk is a vital resource for your cluster
availability.
• Use one voting disk if it is stored on a reliable disk.
• Otherwise, use mirrored voting disks:
– There is no need to rely on multipathing solutions.
– Mirrors should be stored on independent devices.
– Make sure that there is no I/O starvation for your voting
disks devices.
– Use at least three mirrors.
• CSS uses a simple majority rule to decide whether
voting disk reads are consistent: to tolerate f disk
failures, configure v = 2f + 1 voting disks.
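The v = 2f + 1 rule can be checked with a little shell arithmetic (illustrative only):

```shell
# Voting-disk majority rule: to tolerate f failed voting disks,
# a cluster needs v = 2*f + 1 disks, so that the surviving
# (f + 1) disks still form a strict majority of v.
f=1                                  # failures to tolerate
v=$(( 2 * f + 1 ))                   # disks required
surviving=$(( v - f ))
majority=$(( v / 2 + 1 ))            # strict majority of v

echo "disks=$v surviving=$surviving majority=$majority"
# With f=1: 3 disks, and the 2 survivors still meet the majority of 2.
```

This is why a single voting-disk failure is survivable with three disks, but losing two of three takes the cluster down.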
Change Voting Disk Configuration
• Voting disk configuration can be changed dynamically.
• To add a new voting disk:
# crsctl add css votedisk <new voting disk path>
• To remove a voting disk:
# crsctl delete css votedisk <old voting disk path>
• If Oracle Clusterware is down on all nodes, use the -force option:
# crsctl add css votedisk <new voting disk path> -force
# crsctl delete css votedisk <old voting disk path> -force
Back Up and Recover Your Voting Disks
• The recommendation is to use symbolic links.
• List the configured voting disks:
$ crsctl query css votedisk
• Back up one voting disk by using the dd command:
$ dd if=<voting disk path> of=<backup path> bs=4k
– After Oracle Clusterware installation
– After node addition or deletion
– Can be done online
• Recover voting disks by restoring the first one using the
dd command, and then mirror it if necessary.
• If no voting disk backup is available, reinstall Oracle
Clusterware.
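The dd backup/restore round trip can be rehearsed on an ordinary scratch file; the paths below are invented examples, and on a real cluster if= would point at the voting disk device:

```shell
# Simulate a voting-disk backup and restore with dd, using a
# scratch file in place of the real device (paths are examples).
votedisk=/tmp/fake_votedisk
backup=/tmp/votedisk.bak
restored=/tmp/fake_votedisk.restored

dd if=/dev/urandom of="$votedisk" bs=4k count=16 2>/dev/null  # dummy contents
dd if="$votedisk" of="$backup" bs=4k 2>/dev/null              # back up
dd if="$backup" of="$restored" bs=4k 2>/dev/null              # restore

# Verify the round trip preserved every byte.
cmp "$votedisk" "$restored" && echo "restore matches original"
```

The same bs=4k block size from the slide is used for both directions; dd copies the raw bytes, so the restored copy is bit-identical to the source.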
Heartbeat Mechanisms
• Two heartbeat mechanisms for cluster membership
– Network HeartBeat (NHB)
– Disk HeartBeat (DHB)
• The heartbeat mechanisms are used for different
purposes; they are not redundant:
– NHB for detection of loss of cluster viability
– DHB for network split resolution
Network HeartBeat (NHB)
• Indicates that a node can participate in cluster activities,
e.g., group membership changes
• When the NHB is missing for too long, a cluster
membership change (cluster reconfig) is required
• The definition of 'too long' is constant over time
(misscount)
• Loss of network connectivity is not necessarily fatal
Disk HeartBeat (DHB)
• The final word on whether a node is alive: when the
DHB is missing for too long, the node is assumed to
be dead
• When connectivity to a disk is lost for 'too long', the
disk is considered offline
• The definition of 'too long' varies:
– Most of the time, 'too long' is the 'long disk I/O time'
(LIOT), default 200 seconds
– During a cluster node membership change (reconfig),
it is the 'short disk I/O time', which is related to
misscount (misscount – reboottime; reboottime
defaults to 3 seconds)
• Connectivity to a majority of voting files must be
maintained for a node to stay active
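With the defaults above (misscount 30 s, reboottime 3 s, LIOT 200 s), the two disk-timeout windows work out as follows (a quick illustrative calculation):

```shell
# Disk-heartbeat timeout windows with default CSS settings.
misscount=30     # network heartbeat timeout (seconds)
reboottime=3     # time allowed for a reboot to complete (seconds)
liot=200         # long disk I/O time, steady state (seconds)

siot=$(( misscount - reboottime ))   # short disk I/O time, during reconfig

echo "steady-state disk timeout: ${liot}s"
echo "reconfig disk timeout:     ${siot}s"
# With the defaults: 200s steady state, 27s during a reconfig.
```

The window shrinks sharply during a reconfig because that is exactly when the disk heartbeat is being used to resolve a possible network split.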
CSS Logging
• Default logging levels differ:
– Production default is 1
– Test default is 2
• Changing the logging level in production:
– Execute as root on a node with the clusterware stack
up:
crsctl debug log css CSSD:N (N is the logging level)
– Execute on all nodes, or restart the stack on all other
nodes after executing
Diagnosability
• Stack dump now in the CSSD log
• Signals now trapped to allow printing of diagnostic
data for SEGVs, etc.
• Other diagnostic data printed prior to termination:
– Detailed logging
– Most of the memory
• Data may be lost if the node reboots before the log
buffers are flushed to disk:
– Set diagwait to allow data to be flushed to disk:
crsctl set css diagwait 13
(run as root on a node with the CRS stack up, then
restart the stack on all nodes)
OCR Architecture
[Diagram: each of the three nodes maintains an OCR cache used by its CRS process; client processes on Node 1 and Node 3 read through the local cache. The OCR primary file and the OCR mirror file reside on shared storage.]
Automatic OCR Backups
• The OCR content is critical to Oracle Clusterware.
• OCR is automatically backed up physically:
– Every four hours: CRS keeps the last three copies.
– At the end of every day: CRS keeps the last two copies.
– At the end of every week: CRS keeps the last two copies.
• Inspect the default automatic backup location:
$ cd $ORACLE_BASE/Crs/cdata/jfv_clus
$ ls -lt
-rw-r--r-- 1 root root 4784128 Jan 9 02:54 backup00.ocr
-rw-r--r-- 1 root root 4784128 Jan 9 02:54 day_.ocr
-rw-r--r-- 1 root root 4784128 Jan 8 22:54 backup01.ocr
-rw-r--r-- 1 root root 4784128 Jan 8 18:54 backup02.ocr
-rw-r--r-- 1 root root 4784128 Jan 8 02:54 day.ocr
-rw-r--r-- 1 root root 4784128 Jan 6 02:54 week_.ocr
-rw-r--r-- 1 root root 4005888 Dec 30 14:54 week.ocr
• Change the default automatic backup location:
# ocrconfig -backuploc /shared/bak
Back Up OCR Manually
• Take daily backups of your automatic OCR backups to a
different storage device:
– Use your favorite backup tool.
• Take logical backups of your OCR before and after
making significant changes:
# ocrconfig -export <file name>
• Make sure that you restore OCR backups that match
your current system configuration.
OCR Considerations
• If you use raw devices to store OCR files, make sure
they exist before running add or replace operations.
• You must be the root user to add, replace, or remove
an OCR file with ocrconfig.
• While you are adding or replacing an OCR file, its
mirror needs to be online.
• If you remove a primary OCR file, the mirror OCR file
becomes primary.
• Never remove the last remaining OCR file.
OCR / Voting disk placement and protection
• Oracle Clusterware files include the voting disks, used
to monitor cluster node status, and the Oracle Cluster
Registry (OCR), which contains configuration
information about the cluster. The voting disks and
OCR are shared files in a cluster or network file
system environment. If you do not use a cluster file
system, then you must place these files on shared
block devices or shared raw devices. Oracle Universal
Installer (OUI) automatically initializes the OCR during
the Oracle Clusterware installation.
OCR / Voting disk placement and protection
• For voting disk file placement, Oracle recommends that
each voting disk be configured so that it does not share
a hardware device, disk, or other single point of failure
with another voting disk. Any node that cannot access
an absolute majority of the configured voting disks
(more than half) will be restarted.
OCR / Voting disk placement and protection
• Critical cluster configuration repository and split-brain
resolution mechanism
• Oracle mirroring is available from 10g Release 2 onward:
– crsctl add css votedisk path
– ocrconfig -replace ocrmirror destination_file or disk
• Three mirrors are recommended for the voting disk:
– Split-brain resolution requires a majority of disks to
allow a sub-cluster to continue
Useful notes on MetaLink
Note: 259301.1 CRS and 10g Real Application Clusters
Note: 276434.1 Modifying the VIP of a Cluster Node
Note: 272332.1 Extended "CRS/CSS 10g Diagnostic Collection Guide"
Note: 268937.1 Repairing or Restoring an Inconsistent OCR in RAC
Note: 279793.1 How to Restore a Lost Voting Disk in 10g
Note: 240001.1 Troubleshooting CRS Root.sh Problems
Note: 265769.1 Troubleshooting CRS Reboots
Note: 289690.1 Data Gathering for Troubleshooting RAC and CRS issues
Note: 301137.1 OS Watcher User Guide (OS Watcher is available at http://coe.oraclecorp.com/pls/prod/osw/)
Note: 301138.1 RAC-DDT User Guide (RAC Diag tool: http://coe.oraclecorp.com/pls/prod/racddt)
Note: 357808.1 Diagnosability for CRS / EVM / RACG
Note: 338706.1 Cluster Ready Services (CRS) rolling upgrade
Note: 391116.1 10.2.0.3 Patch Set - List of Bug Fixes by Problem Type
Note: 401435.1 10.2.0.3 Patch Set - Known Issues
Note: 390880.1 OCR Corruption after Adding/Removing voting disk to a cluster when CRS stack is running
Note: 459694.1 Procwatcher: Script to Monitor and Examine Oracle and CRS Processes
Note: 239998.1 10g RAC How to Clean Up After a Failed CRS Install
Note: 269320.1 Removing a Node from a 10g RAC Cluster
Note: 272332.1 CRS 10g Diagnostic Collection Guide
Q U E S T I O N S
A N S W E R S