8/13/2019 ASM Troubleshooting Overview
ASM Troubleshooting
Yahoo! August 2009
Kevin Moore, Technical Lead, Advanced Customer Services
ASM L & L Topics
1. ASM init.ora Parameters
2. ASM Alert log messages
3. Yahoo! Alert Log & messages
4. ASM data Gathering
5. Troubleshooting Scenarios
6. Instance Events
7. Instance Tracing
8. ASM Rebalancing operations
9. ASM Extent management
10. Performance Considerations
11. ASM Templates
12. Background Processes
13. ASM Views
14. ASMCMD Commands
15. New 11g Commands
16. ASM MySupport Documents
ASM initSID.ora
##############################################################################
# Copyright (c) 1991, 2001, 2002 by Oracle Corporation
##############################################################################
###########################################
# Cluster Database
###########################################
cluster_database=true
###########################################
# Miscellaneous
###########################################
diagnostic_dest=/home/oracle
instance_type=asm
###########################################
# Pools
###########################################
large_pool_size=12M
asm_diskgroups='DATA'
+ASM2.instance_number=2
+ASM1.instance_number=1
ASM Alert Log
Mon Aug 24 15:14:10 2009
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 eth0 192.168.1.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth1 67.0.0.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 2
Using LOG_ARCHIVE_DEST_1 parameter default value as /home/oracle/oracle/product/11.1.0/rdbms/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 11.1.0.6.0.
Using parameter settings in server-side pfile /home/oracle/oracle/product/11.1.0/rdbms/dbs/init+ASM1.ora
System parameters with non-default values:
large_pool_size = 12M
instance_type = "asm"
cluster_database = TRUE
instance_number = 1
asm_diskgroups = "DATA"
diagnostic_dest = "/home/oracle"
Cluster communication is configured to use the following interface(s) for this instance
192.168.1.161
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Yahoo! ASM Alert Log
Sun May 4 00:19:05 2008
kjbdomatt send to node 0 * One line for each node *
kjbdomatt send to node 1
kjbdomatt send to node 2
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DISK116 label:DISK116 * One line for each node *
NOTE: cache opening disk 2 of grp 2: DISK117 label:DISK117
NOTE: attached to recovery domain 2
Sun May 4 00:19:14 2008
NOTE: recovering COD for group 1/0x8ccb7277 (DATA) * Metadata for tracking long running trx *
SUCCESS: completed COD recovery for group 1/0x8ccb7277 (DATA)
Sun May 4 00:19:14 2008
NOTE: opening chunk 14 at fcn 0.0 ABA
NOTE: seq=2 blk=0
Sun May 4 00:19:14 2008
NOTE: cache mounting group 2/0x8CDB7278 (TEMP) succeeded
SUCCESS: diskgroup TEMP was mounted
Sun May 4 00:19:17 2008
NOTE: recovering COD for group 2/0x8cdb7278 (TEMP)
SUCCESS: completed COD recovery for group 2/0x8cdb7278 (TEMP)
NOTE: enlarging ACD for group 1/0x8ccb7277 (DATA)
Sun May 4 00:21:10 2008
SUCCESS: ACD enlarged for group 1/0x8ccb7277 (DATA) * Metadata REDO *
NOTE: enlarging ACD for group 2/0x8cdb7278 (TEMP)
SUCCESS: ACD enlarged for group 2/0x8cdb7278 (TEMP)
ASM Data gathering
Please gather all files from the ASM bdump and udump directories covering the specified time frame of the problem, and be sure to include the alert logs for ALL ASM instances.
For hang/performance issues, please gather system state dumps from the ASM instances.
Please use the script below to query the ASM views, and provide the spooled output (from each instance).
set newpage none
set feedback off
set heading off
set termout off
column grp format 99
column disk format 99999
column lxn format 999
column flg format 999
column chk format 999
spool asm
-- Disk group summary
select group_number as grp, name, state, type, total_mb, free_mb from v$asm_diskgroup;
-- Separator row with a timestamp (MI is minutes; MM would print the month)
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
-- Disk allocation table
select group_kfdat, number_kfdat, aunum_kfdat, v_kfdat, fnum_kfdat, i_kfdat, xnum_kfdat, raw_kfdat from x$kfdat;
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
-- Disk partnerships
select grp, disk, NUMBER_KFDPARTNER, PARITY_KFDPARTNER, ACTIVE_KFDPARTNER from x$kfdpartner;
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
-- File extent map
select group_kffxp as grp, number_kffxp as num, incarn_kffxp as incarn, PXN_KFFXP, XNUM_KFFXP, LXN_KFFXP as lxn, DISK_KFFXP as disk, AU_KFFXP, FLAGS_KFFXP as flg, CHK_KFFXP as chk from x$kffxp;
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
set linesize 1500
-- Disk details
select GROUP_NUMBER, DISK_NUMBER, INCARNATION, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, STATE, LIBRARY, TOTAL_MB, FREE_MB, NAME, FAILGROUP, LABEL, PATH, CREATE_DATE, MOUNT_DATE, READS, WRITES, READ_ERRS, WRITE_ERRS, READ_TIME, WRITE_TIME, BYTES_READ, BYTES_WRITTEN from v$asm_disk;
spool off
exit
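When an alert log has to be limited to the time frame of the problem, the timestamp lines can act as section markers. The Python sketch below assumes the timestamp layout seen in the alert log excerpts in this deck (for example "Mon Aug 24 15:14:10 2009"). The function names are hypothetical helpers, not part of any Oracle tool.

```python
from datetime import datetime

# Assumed timestamp layout, taken from the alert log excerpts in this deck,
# e.g. "Mon Aug 24 15:14:10 2009" or "Sun May 4 00:19:05 2008".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_timestamp(line):
    """Return a datetime if the whole line is an alert log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def entries_in_window(lines, start, end):
    """Keep the lines whose most recent timestamp falls within [start, end]."""
    selected = []
    current = None  # timestamp of the section currently being read
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current = ts
        if current is not None and start <= current <= end:
            selected.append(line.rstrip("\n"))
    return selected


sample = [
    "Sun May 4 00:19:05 2008",
    "kjbdomatt send to node 0",
    "Sun May 4 00:19:14 2008",
    "SUCCESS: diskgroup TEMP was mounted",
    "Sun May 4 00:21:10 2008",
    "SUCCESS: ACD enlarged for group 1/0x8ccb7277 (DATA)",
]
window = entries_in_window(
    sample,
    datetime(2008, 5, 4, 0, 19, 10),
    datetime(2008, 5, 4, 0, 20, 0),
)
print("\n".join(window))
```

Lines that appear before the first timestamp are skipped, because they cannot be placed in time.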
ASM Troubleshooting Scenarios
ASM space issues
1. ASM-level errors: ORA-15041, ORA-15047
2. RDBMS-level errors when storage is on ASM
3. Inconsistencies in what is perceived as the available space
4. Inconsistencies between V$ASM_DISKGROUP and X$ views
Note #351117.1 - Information to gather when diagnosing ASM space issues (contains scripts for collecting specific ASM information)
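One frequent source of a perceived-space mismatch is mirroring: FREE_MB in V$ASM_DISKGROUP is raw space, not space usable for new files. The Python sketch below applies the relationship Oracle documents for the USABLE_FILE_MB column. REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB are V$ASM_DISKGROUP columns that the gathering script in this deck does not select. The function name and the sample numbers are illustrative assumptions.

```python
# Mirroring factor per disk group TYPE as reported by V$ASM_DISKGROUP.
REDUNDANCY_FACTOR = {"EXTERN": 1, "NORMAL": 2, "HIGH": 3}


def usable_file_mb(free_mb, required_mirror_free_mb, group_type):
    """Approximate the space usable for new files, in MB.

    Raw free space is reduced by the space reserved to restore redundancy
    after a failure, then divided by the mirroring factor. A negative result
    means redundancy could not be restored after a failure.
    """
    factor = REDUNDANCY_FACTOR[group_type.upper()]
    return (free_mb - required_mirror_free_mb) / factor


# Illustrative numbers only, not taken from a real system.
print(usable_file_mb(4096, 0, "EXTERN"))     # 4096.0
print(usable_file_mb(4096, 1024, "NORMAL"))  # 1536.0
print(usable_file_mb(3072, 0, "HIGH"))       # 1024.0
```

Comparing this figure with FREE_MB usually explains why a disk group that looks half empty still raises ORA-15041.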
ASM Troubleshooting Scenarios
ASM Disk Missing
1. Use OS utilities to determine which disk cannot be found
TRUSSing or STRACEing the RBAL process while selecting * from v$asm_disk can often show errors in the path of the command
SESSION #1
strace -f -o /tmp/rbal.trc -p <RBAL pid>    (Linux)
truss -ef -o /tmp/rbal.out -p <RBAL pid>    (AIX, Solaris)
SESSION #2
select * from v$asm_disk
SESSION #3
tailf /tmp/rbal.trc
Examine rbal.out for errors:
1147090: 1871929: chdir("dev/") = 0
1147090: 1871929: statx("rhdisk8, ", 0x0FFFFFFFFFFFAA80, 176, 010) Err#2 ENOENT
This says that rhdisk8 cannot be found
2. ORA-15063: ASM discovered an insufficient number of disks for diskgroup "%s"
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "%s" is missing
Note #452770.1 - ASM disk not found/visible/discovered issues
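Scanning a long truss or strace file by eye for the failing device is tedious. The Python sketch below looks for calls that failed with ENOENT and reports the first quoted argument of each. It is based only on the truss line shown on this slide and on the usual strace "= -1 ENOENT" form. The function name is hypothetical.

```python
import re

# A system call name followed by its first argument in double quotes.
CALL_RE = re.compile(r'(\w+)\("([^"]*)"')


def missing_paths(trace_lines):
    """Return (syscall, path) pairs for calls that failed with ENOENT."""
    hits = []
    for line in trace_lines:
        if "ENOENT" not in line:
            continue  # only "no such file or directory" failures are of interest
        match = CALL_RE.search(line)
        if match:
            syscall, path = match.groups()
            # Drop stray separators such as the trailing ", " seen in the example.
            hits.append((syscall, path.strip(" ,")))
    return hits


sample = [
    '1147090: 1871929: chdir("dev/") = 0',
    '1147090: 1871929: statx("rhdisk8, ", 0x0FFFFFFFFFFFAA80, 176, 010) Err#2 ENOENT',
]
print(missing_paths(sample))
```

Any device reported here is a candidate for the disk behind ORA-15042. It still has to be checked against the expected disk list, since some ENOENT results during discovery are harmless.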
ASM Troubleshooting Scenarios
ASM is Unable to Detect ASMLIB Disks/Devices
1) First, scan the disks (on all the nodes if RAC):
dbaasm.us.oracle.com:+ASM:oracle:11g> /etc/init.d/oracleasm scandisks
Scanning system for ASM disks: [ OK ]
2) Second, make sure the disks can be listed:
dbaasm.us.oracle.com:+ASM:oracle:11g> /etc/init.d/oracleasm listdisks
VOL1_10G
VOL2_10G
3) Query each disk:
dbaasm.us.oracle.com:+ASM:oracle:11g> /etc/init.d/oracleasm querydisk VOL1_10G
Disk "VOL1_10G" is a valid ASM disk on device [3, 18]
dbaasm.us.oracle.com:+ASM:oracle:11g> /etc/init.d/oracleasm querydisk VOL2_10G
Disk "VOL2_10G" is a valid ASM disk on device [3, 22]
4) Check that they exist at the OS level:
dbaasm.us.oracle.com:+ASM:oracle:11g> ls -l /dev/oracleasm/disks/VOL1_10G
brw-rw---- 1 oracle dba 3, 18 Aug 13 09:54 /dev/oracleasm/disks/VOL1_10G
dbaasm.us.oracle.com:+ASM:oracle:11g> ls -l /dev/oracleasm/disks/VOL2_10G
brw-rw---- 1 oracle dba 3, 22 Aug 13 09:55 /dev/oracleasm/disks/VOL2_10G
5) Then, in the initialization parameter file, set the disk discovery string parameter as follows:
asm_diskstring = ORCL:*
Note: You can also set it through DBCA (during disk group creation) by pressing the [Change Disk Discovery Path] button.
ASM Troubleshooting Scenarios
ASM is Unable to Detect ASMLIB Disks/Devices (LINUX Specific)
6) If the problem persists, you can set the disk discovery string as follows:
asm_diskstring = /dev/oracleasm/disks/*
7) As a workaround, you can set asm_diskstring = /dev/oracleasm/disks/*. This is possible for Oracle 10g Release 2 and onwards, since it can access block devices. Oracle uses the O_DIRECT flag, which can be used when opening block devices to bypass the OS cache.
8) If the problem persists, please open a new service request with Oracle Support and provide the following information (from all the nodes if RAC):
Upload the following files:
================================
=)> /var/log/messages
=)> New /etc/sysconfig/oracleasm
=)> alert_+ASM#.log for each instance
================================
And the output of the following commands:
================================
$> cat /etc/*release
$> uname -a
$> rpm -qa | grep oracleasm
$> df -ha
$> ls -l /dev/oracleasm/disks
$> powermt display dev=emcpower# (on all the partitions if using PowerPath from EMC)
================================
$> /etc/init.d/oracleasm status
$> /usr/sbin/oracleasm-discover
$> /usr/sbin/oracleasm-discover 'ORCL:*'
SQL> show parameter asm
Note #457369.1 - ASM is Unable to Detect ASMLIB Disks/Devices
ASM Instance Events
Applicable Event Levels (15xxx)
Level 7 - DEBUG - Trace information for ASM/OSM debugging purposes only
Level 6 - NLOOPS - Trace deeply nested loops within a function
Level 5 - LOOPS - Trace loops within a function
Level 4 - CALLS - Trace function call entry
Level 3 - NORMAL - Trace normal paths within a function
Level 2 - WARN - Trace warning paths within a function
Level 1 - ERROR - Trace error paths within a function
Kx 0x0000010 /* Array portion flags */
Kxx 0x0000020 /* Alias-Directory operations */
Kxx 0x0000040 /* Block validation interface */
Kxx 0x0000080 /* metadata cache */
Kxx 0x0000100 /* disk operations */
Kxx 0x0000200 /* file operations */
Kxx 0x0000400 /* disk group operations */
Kxx 0x0000800 /* I/O layer (to ASMLIB or KSFD) */
Kxx 0x0001000 /* node monitor (ie CSS interface) */
Kxx 0x0002000 /* network layer (ie RDBMS-ASM connections) */
Kxx 0x0004000 /* PLSQL package */
Kxx 0x0008000 /* recovery */
Kxx 0x0010000 /* templates */
Kxx 0x0020000 /* SQL execution (processing ASM SQL commands) */
Kxxx 0x0040000 /* ASM DBWR */
Kxxx 0x0080000 /* ASM LGWR */
Kxxx 0x0100000 /* I/O handles mirroring, striping, etc. */
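The hexadecimal values in the listing above each occupy a single bit, which suggests they can be OR-ed into one mask. The slide does not state this, so treat it as an assumption. Under that assumption, the Python sketch below turns a mask back into the component descriptions. The component names are abbreviated in the source, so the descriptions are used as labels. The function names are hypothetical.

```python
# Flag values and descriptions copied from the listing above.
COMPONENT_FLAGS = {
    0x0000010: "Array portion flags",
    0x0000020: "Alias-Directory operations",
    0x0000040: "Block validation interface",
    0x0000080: "metadata cache",
    0x0000100: "disk operations",
    0x0000200: "file operations",
    0x0000400: "disk group operations",
    0x0000800: "I/O layer (to ASMLIB or KSFD)",
    0x0001000: "node monitor (ie CSS interface)",
    0x0002000: "network layer (ie RDBMS-ASM connections)",
    0x0004000: "PLSQL package",
    0x0008000: "recovery",
    0x0010000: "templates",
    0x0020000: "SQL execution (processing ASM SQL commands)",
    0x0040000: "ASM DBWR",
    0x0080000: "ASM LGWR",
    0x0100000: "I/O handles mirroring, striping, etc.",
}


def decode_components(mask):
    """Return the description of every listed flag set in mask, lowest bit first."""
    return [name for bit, name in sorted(COMPONENT_FLAGS.items()) if mask & bit]


def unknown_bits(mask):
    """Return the bits of mask that do not correspond to any listed flag."""
    known = 0
    for bit in COMPONENT_FLAGS:
        known |= bit
    return mask & ~known


mask = 0x0000100 | 0x0000400
print(decode_components(mask))
print(hex(unknown_bits(mask)))
```

A non-zero result from unknown_bits is a hint that a mask was mistyped before it is passed to an event setting.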
ASM Instance Tracing
Trace RBAL process
[oracle@rac1 ~]$ ps -ef | grep rbal
oracle 7745 1 0 09:24 ? 00:00:02 asm_rbal_+ASM1
oracle 9255 1 0 09:27 ? 00:00:00 ora_rbal_whsed1
oracle 9971 5367 0 11:31 pts/1 00:00:00 grep rbal
[oracle@rac1 ~]$ strace -f -o /tmp/rbal.trc -p 7745
Process 7745 attached - interrupt to quit
Process 7745 detached
[oracle@rac1 ~]$ more /tmp/rbal.trc
7745 semtimedop(163842, 0xbfb973f4, 1, {2, 350000000}) = -1 EAGAIN (Resource temporarily unavailable)
7745 gettimeofday({1251917133, 714243}, NULL) = 0
7745 gettimeofday({1251917133, 714337}, NULL) = 0
7745 gettimeofday({1251917133, 714395}, NULL) = 0
7745 getrusage(RUSAGE_SELF, {ru_utime={2, 79683}, ru_stime={1, 81835}, ...}) =
7745 sendmsg(13, {msg_name(16)={sa_family=AF_INET, sin_port=htons(32963), sin_addr=inet_addr("192.168.1.162")}, msg_iov(2)=[{"\4\3\2\1\327\263\200\0\0\0\0\0MRON\0\1\0\0\220\0\0\0\1"..., 68}, {"KSXP\2\0\0\0\1\0\2\0\20\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0r"..., 144}], msg_controllen=0, msg_flags=0}, 0) = 212
"buffer busy" or "rdbms ipc reply" events
ASM Rebalancing
Rebalancing is the activity of spreading data evenly among the disks in an ASM disk group
Happens in the background but can also be started manually
Internally, the rebalance happens on a file-by-file basis
Only one RBAL process runs per node
Rebalance requests on the same diskgroup are processed serially
ASM decides how best to balance load across available disks
Uses one of three allocation schemes for selecting disks
1. Placement by file/extent number
2. Random-seeded ordering of all disks in the ASM disk directory
3. Balanced placement over all disks
ASM Rebalancing
Parallel execution based on rebalance POWER
POWER settings range from 0 to 11 (default 1)
Used to throttle overhead during normal operations
Rebalance moves 1 MB chunks at a time
Setting POWER to 0 defers rebalancing to another time
ASM Rebalancing
Displaying & changing rebalance POWER setting
SQL> show parameter limit
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
asm_power_limit                      integer     1
Changing setting
SQL> alter diskgroup dg1 rebalance power 8;
Verifying Change
SQL> select * from v$asm_operation;
GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE
------------ ----- ---- ---------- ---------- ---------- ---------- ----------
           1 REBAL RUN           8          8          0        407          0
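The V$ASM_OPERATION columns shown above are enough for a rough completion estimate. The Python sketch below assumes that SOFAR and EST_WORK count allocation units and that EST_RATE is allocation units per minute, as Oracle describes the view. The view also has its own EST_MINUTES column, which is not shown in the output above. The function name is hypothetical.

```python
def remaining_minutes(sofar, est_work, est_rate):
    """Estimate the minutes left for a rebalance, or None if no rate is known yet."""
    if est_rate <= 0:
        return None  # the operation has just started or is not moving data
    remaining = max(est_work - sofar, 0)
    return remaining / est_rate


# First call uses the values from the output above: no rate has been measured yet.
print(remaining_minutes(0, 407, 0))      # None
# Hypothetical later sample of the same operation.
print(remaining_minutes(107, 407, 60))   # 5.0
```

EST_WORK is itself an estimate that can change while the rebalance runs, so the result is only a rough guide.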
ASM AU/Extent Management
Allocation Units (AU) at the disk level and Extents at the file level
Default AU size is 1 MB
Default extent size is 1 MB
Extents are allocated in 1, 4, 16, and 64 MB chunks (11g)
Extent placement is circular when disks are the same size
AU size cannot be changed without recreating the diskgroup
Templates can be created and added to diskgroups
ASM Performance Considerations
Metadata ONLY is Cached In The ASM Instance
ASM Diskgroup Configuration
External Redundancy
Normal Redundancy (default)
High Redundancy
ASM Instance Configuration (large_pool_size)
Resolving ORA-4031
ASM Allocation Unit Size (1 MB default)
ASM Fine Grained Stripe Size (8 x 128 KB stripes)
MAX I/O Size
Oracle Block Size
ASM Default Template
Archivelog Files - Coarse
Autobackup - Coarse
Controlfile - Fine Grained
Datafile - Coarse
Flashback data - Fine Grained
Online REDO - Fine Grained
SPFILE - Coarse
Tempfile - Coarse
Coarse: 1 MB stripe size
Fine Grained: 8 x 128 KB stripes
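The difference between the two stripe types is easier to see by mapping a file offset to an allocation unit. The Python sketch below is a simplified model, not ASM's actual allocation code. It assumes the default 1 MB AU, one AU per extent, and AU indices that are logical positions in the file's extent list rather than disk numbers. The function names are hypothetical.

```python
KB = 1024
MB = 1024 * KB

AU_SIZE = 1 * MB          # default allocation unit size from the slides
FINE_STRIPE = 128 * KB    # fine-grained stripe size
FINE_WIDTH = 8            # fine-grained stripes are spread over 8 allocation units


def coarse_location(offset):
    """Return (au_index, offset_in_au) for coarse striping: one 1 MB stripe per AU."""
    return offset // AU_SIZE, offset % AU_SIZE


def fine_location(offset):
    """Return (au_index, offset_in_au) for fine-grained striping.

    Consecutive 128 KB chunks rotate over a set of 8 AUs; once every AU in the
    set is full (8 MB of file data), the next set of 8 AUs is used.
    """
    chunk = offset // FINE_STRIPE             # which 128 KB chunk of the file
    chunks_per_au = AU_SIZE // FINE_STRIPE    # 8 chunks fit in one AU
    chunks_per_set = chunks_per_au * FINE_WIDTH
    au_set, chunk_in_set = divmod(chunk, chunks_per_set)
    row, column = divmod(chunk_in_set, FINE_WIDTH)
    au_index = au_set * FINE_WIDTH + column
    offset_in_au = row * FINE_STRIPE + offset % FINE_STRIPE
    return au_index, offset_in_au


# The first 1 MB of a file: one AU when coarse, eight AUs when fine-grained.
for offset in range(0, MB, FINE_STRIPE):
    print(offset // KB, coarse_location(offset)[0], fine_location(offset)[0])
```

In this model a 1 MB sequential write touches a single AU under coarse striping but all eight AUs under fine-grained striping. This is consistent with the slide assigning fine-grained striping to small, latency-sensitive files such as online redo logs and control files.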
ASM Templates
Striping Attributes: Fine, Coarse
Redundancy Attributes
Mirror: 2-way
High: 3-way
Unprotected: not mirrored
ASM Templates
Viewing Templates
select * from V$ASM_TEMPLATE;
Altering Template
alter diskgroup DG modify template NAME attributes (coarse/fine);
Adding Template
alter diskgroup DG add template NAME attributes (attributes);
Dropping Templates
alter diskgroup DG drop template NAME;
ASM Background Processes
ora_asmb_whsed1 - Runs in the database instance and connects to a foreground process in ASM, which services the commands of the database client
asm_pmon_+ASM1 - Process monitor, same as database
asm_vktm_+ASM1 - Process to maintain a fast timer, same as database
asm_diag_+ASM1 - Diag process, same as database
asm_ping_+ASM1 - Process to measure network latency, same as database
asm_psp0_+ASM1 - Process that Starts other Processes, used to startup other backgrounds
asm_dia0_+ASM1 - Diag slave process, same as database
asm_lmon_+ASM1 - Lock monitor, Same as database
asm_lmd0_+ASM1 - Lock manager daemon, same as database
asm_lms0_+ASM1 - Lock manager server, same as database
asm_mman_+ASM1 - Autotune SGA process, Same as Database.
asm_dbw0_+ASM1 - DB writer, same as database DBWR, but deals with the ASM cache
asm_lgwr_+ASM1 - Log writer, similar to database, but deals with diskgroups
asm_ckpt_+ASM1 - Checkpoint process, Similar to database CKPT
asm_smon_+ASM1 - Recovery process, Same as database SMON, but deals with diskgroup recovery
asm_rbal_+ASM1 - Background process that is used for diskgroup management
asm_gmon_+ASM1 - Group monitor, used for partner and status table, and node membership
asm_lck0_+ASM1 - Lock monitor slave, Same as database
ASM Views (10g & 11G)
View              Contents
V$ASM_ALIAS       Aliases in each disk group mounted by the ASM instance
V$ASM_CLIENT      Databases using disk groups managed by the ASM instance
V$ASM_DISK        Disks discovered by the ASM instance
V$ASM_DISKGROUP   Disk groups known by the ASM instance
V$ASM_FILE        Files in each disk group mounted by the ASM instance
V$ASM_OPERATION   Long-running operations executing in the ASM instance
V$ASM_TEMPLATE    Templates present in each disk group mounted by the ASM instance
ASMCMD Command Reference
cd - Changes the current directory to the specified directory.
du - Displays the total disk space occupied by ASM files in the specified ASM directory.
exit - Exits ASMCMD.
find - Lists the paths of the specified name (with wildcards) under the specified directory.
help - Displays the syntax and description of ASMCMD commands.
ls - Lists the contents of an ASM directory, the attributes of the specified file, or the names and attributes of all disk groups.
lsct - Lists information about current ASM clients.
lsdg - Lists all disk groups and their attributes.
mkalias - Creates an alias for a system-generated filename.
mkdir - Creates ASM directory.
pwd - Displays the path of the current ASM directory.
rm - Deletes the specified ASM files or directories.
rmalias - Deletes the specified alias, retaining the file that the alias points to.
New 11g ASM Commands
cp - Enables you to copy files between ASM disk groups on local instances and remote instances.
lsdsk - Lists ASM disk information with or without a running ASM instance. Also useful for system or storage administrators to obtain lists of disks that an ASM instance uses.
md_backup and md_restore - These commands enable you to re-create a pre-existing ASM disk group with the same disk path, disk name, failure groups, attributes, templates and alias directory structure. You can use md_backup to back up the disk group environment and use md_restore to re-create the disk group before loading from a database backup.
remap - You can remap and recover bad blocks on an ASM disk in normal or high redundancy that have been reported by storage management tools such as disk scrubbers. ASM reads from the good copy of an ASM mirror and rewrites these blocks to an alternate location on disk.
MySupport ASM References
Note: 340417.1 - Data Gathering for Troubleshooting ASM Issues
Note: 267982.1 - Automatic Storage Management (ASM) Knowledge Browser Product Page
Note: 824354.1 - How To Trace ASMCMD on Unix
Note: 351866.1 - How To Reclaim ASM Disk Space
Note: 345180.1 - How to duplicate a controlfile when ASM is involved
Note: 553319.1 - ORA-15036 When Starting An ASM Instance