
OS Truth, little white lies, and the Oracle Wait Interface
neooug.org/members/documents/shared/John_Hurley...
John Hurley



OS Truth, little white lies, and the Oracle Wait Interface

John Hurley, Senior DBA, Federal Reserve Bank of Cleveland

The Federal Reserve System may or may not use databases and/or may or may not use any commercial database products, and specifically we do not endorse any or all hardware/software vendors. Shocking but true: I am still waiting to be invited to my first FOMC meeting on monetary policy.


You may be able to find the Grumpy Old DBA here:
Blog: grumpyolddba.blogspot.com
Twitter: @GrumpyOldDBA
President of Northeast Ohio Oracle Users Group: www.neooug.org
Great Lakes Oracle Conference, May 18-20 2015: www.neooug.org/gloc


Oracle Performance Tuning:

Cary Millsap and Method R provided a light at the end of the tunnel.

“Optimizing Oracle response time is, for the most part, a solved problem.”

Method R is based on instrumentation provided via the Oracle Wait Interface.

It works extremely well, 99+ percent of the time, if followed diligently.


COMMON GROUND

Ad hoc definition of the Oracle Wait Interface:

• An Oracle-provided tool set that helps debug important code paths and records time waited, to identify bottlenecks throughout the life of an Oracle database session. The Oracle kernel code is deeply instrumented to record waits.

• So what does a wait look like?


1100+ wait events in 11.2

Events can be categorized:

• Input/Output

• Network

• Executing SQL/PLSQL (on CPU or waiting to get back on CPU)

• Concurrency (waiting for some other session to release a resource)

• Other categories

Wait = t1 - t0 (t0 is when the wait starts, t1 is when it ends)
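To make "Wait = t1 - t0" concrete, here is a hypothetical bash sketch (not Oracle code) that brackets any blocking command the way an instrumented code path brackets a wait: take a timestamp, make the call, take another timestamp, record the difference. It assumes GNU `date` (for the `%N` nanoseconds format); the `nam=`/`ela=` field names only imitate the style of a 10046 trace WAIT line, where `ela` is in microseconds.

```shell
#!/bin/bash
# Hypothetical sketch: record a "wait" as t1 - t0 around a blocking command.
# Assumes GNU date, whose %N format gives nanoseconds.
timed_wait() {
  local event=$1
  shift
  local t0 t1
  t0=$(date +%s%N)   # t0: just before the blocking call
  "$@"               # the call being waited on (sleep, read, I/O, ...)
  t1=$(date +%s%N)   # t1: just after the call returns
  # Wait = t1 - t0, reported in microseconds
  echo "nam='$event' ela=$(( (t1 - t0) / 1000 ))"
}
```

For example, `timed_wait "demo sleep" sleep 0.25` prints a line whose `ela=` value is roughly 250000 or slightly more, since the timestamps themselves take a little time to collect.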


COMMON GROUND

Oracle code continuously samples database system activity, but a license is required to access some parts of the monitored information.

Comprehensive diagnostics using low-level tracing can be kicked off for a session by using an Oracle 10046 trace.

Oracle Enterprise Manager (or Grid Control) provides a graphical user interface showing system activity based on sampled instrumentation data.
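For a session other than your own, one documented way to get the equivalent of a 10046 trace at level 12 (waits plus binds) is `DBMS_MONITOR.SESSION_TRACE_ENABLE`. The sketch below only prints the SQL*Plus command for a given SID and SERIAL#, so it can be reviewed before being piped into SQL*Plus; the `trace_sql` helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the DBMS_MONITOR call that switches extended
# SQL trace (waits + binds, i.e. 10046 level 12) on or off for one session.
# Usage: trace_sql on|off SID SERIAL#
trace_sql() {
  action=$1
  sid=$2
  serial=$3
  case $action in
    on)
      echo "EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => $sid, serial_num => $serial, waits => TRUE, binds => TRUE)"
      ;;
    off)
      echo "EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => $sid, serial_num => $serial)"
      ;;
    *)
      echo "usage: trace_sql on|off SID SERIAL#" >&2
      return 1
      ;;
  esac
}
```

Usage might look like `trace_sql on 123 4567 | sqlplus -s / as sysdba`, followed later by the matching `off` call; the trace file is written to the database's trace directory.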


Oracle provides graphical tools, based on instrumentation, to see what is going on:

• Database Console (11g)

• Grid Control

• Cloud Control


SYSTEM OVERVIEW

• 4-core Xeon processor, 32 GB memory, 10 GB SGA

• Runs 1000+ connected sessions from 10 am until 5 pm

• OEL 5.5

• Running EE 11.1 64-bit, patched up to 11.1.0.7.6

• No RAC … single-instance database (same ORACLE_HOME for the database and ASM instances)

• No Grid Control (OEM with Diagnostics/Tuning packs)

• EMC CLARiiON direct-attached storage

• Using ASM disk groups (external redundancy)


NORMAL MONDAY MORNING 11/14/2011


Green = using CPU, aka doing work

Blue = doing IO

Normal afternoon … pretty busy day


• In Cleveland we live and die with the orange and brown.

• Our NFL-granted replacement team has not won a lot of games since the “old winning team” was moved to Baltimore (1996).

A hard-core Browns fan did a YouTube send-up of what the stadium has turned into lately …


ROUGH AFTERNOON AT THE STADIUM ???


OEM INSTANCE LOCKS


AWR REPORT:

Guessing game time:

Win something in a brown paper sack!


MASSIVE CPU STARVATION?

APPLICATION LOCKING (CODE CHANGES?)

IO WAITS (STORAGE DELAYS / PROBLEMS?)

• Starting just after 1:49 pm, intermittent CPU spikes appear on the OEM Top Activity display

• The Maximum CPU line sits at 4 across the graph

• At 1:57 pm the spiking gets worse

• Spikes of 15 to 20 active sessions on CPU, on a 4-core box

• Even higher CPU spikes past 2:15 pm


OEM TOP ACTIVITY SCREEN RECAP


OS WATCHER

• This tool, or something similar, should be running on all production systems!

• Details differ slightly by operating system, but it samples information using ps, top, mpstat, iostat, netstat, traceroute, and vmstat.

• User Guide and setup instructions are available via Metalink Doc ID 301137.1

• My systems are configured to collect information once a minute and retain it for 10 days. Collection has no noticeable impact on the system.
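The collect-once-a-minute, keep-10-days idea can be sketched in a few lines of shell. This is a hypothetical, stripped-down imitation, not OSWatcher itself (use the real tool from Doc ID 301137.1): each sample gets a `zzz ***<date>` header like the archives shown in this deck, output goes to hourly `.dat` files, and files older than the retention period are purged. The directory, file naming, and `vmstat 1 3` arguments are assumptions.

```shell
#!/bin/sh
# Hypothetical mini collector in the spirit of OS Watcher (NOT the real tool).

# Append one timestamped sample of a command's output to an hourly .dat file.
osw_sample_once() {
  dir=$1
  shift
  file="$dir/$(basename "$1")_$(date +%y.%m.%d.%H00).dat"
  {
    echo "zzz ***$(date)"
    "$@"
  } >> "$file"
}

# Delete .dat files older than DAYS days.
osw_purge() {
  find "$1" -name '*.dat' -mtime +"$2" -exec rm -f {} +
}

# Sample every INTERVAL seconds and keep DAYS days of history; runs forever.
osw_loop() {
  dir=$1
  interval=$2
  days=$3
  mkdir -p "$dir"
  while :; do
    osw_sample_once "$dir" vmstat 1 3
    osw_sample_once "$dir" top -b -c -n 2
    osw_purge "$dir" "$days"
    sleep "$interval"
  done
}
```

Something like `osw_loop /var/tmp/osw 60 10 &` would approximate the one-minute, 10-day setup described above.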


zzz ***Tue Mar 1 14:23:20 EST 2011
top - 14:23:24 up 16 days, 13:33,  3 users,  load average: 1.28, 1.28, 1.58
Tasks: 1393 total,   2 running, 1391 sleeping,   0 stopped,   0 zombie
Cpu(s): 11.2%us,  1.5%sy,  0.0%ni, 85.6%id,  0.8%wa,  0.1%hi,  0.7%si,  0.0%st
Mem:  32959880k total, 32708568k used,   251312k free,   266488k buffers
Swap: 33456120k total,   143108k used, 33313012k free,  6943192k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
26111 oracle    15   0 10.2g  27m  22m S 24.7  0.1  20:16.99 oracleprod (LOCAL=N
20885 oracle    16   0 10.2g  29m  23m S  8.5  0.1   0:13.36 oracleprod (LOCAL=N
 9586 root      15   0 13696 2096  796 R  2.6  0.0   0:00.15 top -b -c -n 2
32674 oracle    18   0 2033m 426m  22m S  2.0  1.3  27:56.81 /u01/app/oracle/pro
 7626 oracle    15   0 10.2g  32m  23m S  1.6  0.1   0:39.45 oracleprod (LOCAL=N
 9681 oracle    16   0 13672 2096  808 S  1.6  0.0  13:36.42 top
32108 oracle    18   0 10.2g  29m  13m S  1.3  0.1  32:54.91 ora_dia0_prod
 3732 oracle    15   0 10.2g  30m  23m S  1.0  0.1   0:19.90 oracleprod (LOCAL=N
14654 oracle    15   0 10.2g  21m  17m S  1.0  0.1   0:23.52 oracleprod (LOCAL=N
 3515 oracle    15   0 10.2g  34m  25m S  0.7  0.1   0:57.47 oracleprod (LOCAL=N


zzz ***Tue Mar 1 14:23:20 EST 2011

avg-cpu: %user %nice %system %iowait %steal %idle

15.58 0.00 3.50 1.17 0.00 79.75

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util

sda 0.00 29.00 0.00 8.67 0.00 301.33 34.77 0.04 4.15 0.58 0.50

sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

sda2 0.00 29.00 0.00 8.67 0.00 301.33 34.77 0.04 4.15 0.58 0.50

...

dm-0 0.00 0.00 0.00 37.67 0.00 301.33 8.00 0.18 4.79 0.13 0.50

dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerab 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerk 0.00 0.00 0.33 0.67 10.67 13.33 24.00 0.00 3.00 3.00 0.30

emcpowerl 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpoweraa 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerr 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerm 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowern 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerac 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowero 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerp 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerad 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerj 0.00 0.00 3.33 1.00 53.33 13.33 15.38 0.02 3.77 3.62 1.57

emcpowera 0.00 0.00 2.00 0.67 32.00 10.67 16.00 0.01 5.38 5.12 1.37

emcpowerq 0.00 0.00 3.00 1.00 48.00 16.00 16.00 0.01 3.75 3.17 1.27

emcpowerb 0.00 0.00 1.33 0.33 21.33 5.33 16.00 0.01 5.00 5.00 0.83

emcpowere 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpoweri 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

emcpowerc 0.00 0.00 0.00 4.67 0.00 25.33 5.43 0.00 0.93 0.93 0.43

emcpowerd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
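A snapshot like this is easier to scan when idle devices are filtered out. The hypothetical awk filter below assumes the 12-column extended iostat layout shown above (`await` in column 10, `%util` in column 12; the layout differs between sysstat versions) and prints only devices at or above a utilization threshold. In this sample nothing is close to saturation: the busiest EMC PowerPath device (`emcpowerj`) sits under 2% util.

```shell
#!/bin/sh
# Hypothetical filter: show iostat -x device rows whose %util >= MIN_UTIL.
# Assumes the 12-column layout: Device rrqm/s wrqm/s r/s w/s rsec/s wsec/s
# avgrq-sz avgqu-sz await svctm %util
busy_devices() {
  awk -v min="$1" '
    NF == 12 && $1 != "Device:" && $12 + 0 >= min + 0 {
      printf "%s await=%s util=%s\n", $1, $10, $12
    }'
}
```

For example, `busy_devices 60 < iostat_sample.dat` (file name hypothetical) would list only devices that are at least 60% busy in that sample.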


zzz ***Tue Mar 1 14:22:20 EST 2011
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 0 143108 263504 266392 6942772 0 0 3142 305 3 4 14 3 78 5 0
1 1 143108 260124 266392 6942620 0 0 264 215 11707 10619 17 7 73 2 0
0 0 143108 260372 266392 6942820 0 0 640 454 11232 10701 16 4 74 6 0

zzz ***Tue Mar 1 14:23:20 EST 2011
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
5 0 143108 249832 266484 6943140 0 0 3142 305 3 4 14 3 78 5 0
1 0 143108 250784 266488 6942928 0 0 48 520 7835 7407 30 8 60 1 0
0 0 143108 251032 266488 6943072 0 0 88 32 6960 6954 8 1 90 1 0

zzz ***Tue Mar 1 14:24:20 EST 2011
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 0 143108 264044 266572 6944272 0 0 3142 305 3 4 14 3 78 5 0
0 0 143108 260440 266576 6944424 0 0 56 684 8555 8132 15 9 73 3 0
0 0 143108 260440 266576 6944316 0 0 120 4 7031 6401 7 1 90 1 0

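Data accumulates in the output *.dat file every minute. The first row of each vmstat sample reports averages since boot, which is why its io, system, and cpu columns are identical in all three samples.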
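One quick way to mine these samples for CPU pressure is to compare vmstat's run queue (`r`) with the core count. The hypothetical awk sketch below skips the first row of each sample (the since-boot averages), reports the peak `r` per `zzz` sample, and flags samples where it exceeds the number of cores. It assumes the 17-column vmstat layout shown above; the `vmstat_runq` name and the `CPU-SATURATED` label are invented here.

```shell
#!/bin/sh
# Hypothetical summarizer for OS Watcher-style vmstat archives.
# Usage: vmstat_runq CORES < vmstat_file.dat
vmstat_runq() {
  awk -v cores="$1" '
    function report() {
      printf "%s max_r=%d%s\n", ts, maxr, (maxr > cores + 0 ? " CPU-SATURATED" : "")
    }
    /^zzz \*\*\*/ {
      if (ts != "") report()          # flush the previous sample
      ts = $0
      sub(/^zzz \*\*\*/, "", ts)      # keep only the timestamp
      maxr = 0
      rows = 0
      next
    }
    NF == 17 && $1 ~ /^[0-9]+$/ {
      rows++
      if (rows == 1) next             # first row = averages since boot
      if ($1 + 0 > maxr) maxr = $1 + 0
    }
    END { if (ts != "") report() }'
}
```

Run against the 14:23:20 sample above with `vmstat_runq 4`, it would report `max_r=1`, which is consistent with the mostly idle CPU that `top` shows at the same moment.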


OS TRUTH

• Utilities and diagnostics at the operating system level give an objective view of reality.

• If ps or top (or Windows Task Manager) do not show a process using CPU, then it is (probably) not using CPU.

• At times … especially when dealing with Oracle bugs and/or uninstrumented sections of Oracle code … information from the Oracle Wait Interface can be incomplete, misleading, false, or just plain confusing.

• Best to have some slices of OS Truth generated and stashed away ahead of time, for those situations when OWI-based problem solving efforts just do not cut it.
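As a small example of checking OEM against OS Truth, the hypothetical one-liner below counts Oracle processes that `ps` reports in state `R`. On Linux, `R` means running or runnable (on the run queue), so it lines up with the "on CPU or waiting to get back on CPU" category. The `oracle`/`ora_` name patterns are assumptions based on process names like `oracleprod` and `ora_dia0_prod` in the `top` output above; adjust them for your SID.

```shell
#!/bin/sh
# Hypothetical check: count Oracle processes the OS sees as running/runnable.
# Reads "STAT ARGS" lines, as produced by: ps -eo stat=,args=
count_on_cpu() {
  awk '$1 ~ /^R/ && $2 ~ /^(oracle|ora_)/ { n++ } END { print n + 0 }'
}
```

Usage: `ps -eo stat=,args= | count_on_cpu`. Sampled a few times during a spike, a large and persistent gap between this count and the 15 to 20 "sessions on CPU" that OEM shows on a 4-core box would be exactly the kind of disagreement worth chasing.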


Prod system DBA toolkit

#!/bin/sh
# Two rounds of hanganalyze + systemstate dumps, 30 seconds apart
for round in 1 2; do
sqlplus / as sysdba <<EOF
oradebug setmypid
oradebug unlimit
oradebug dump hanganalyze 3
exit;
EOF
sqlplus / as sysdba <<EOF
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
exit;
EOF
if [ "$round" -eq 1 ]; then sleep 30; fi
done

Probably smart to have something similar canned and ready to go if and when things go badly … Oracle Support may want different levels of detail or other stuff, but it is a good starting point. If you cannot connect normally, you may need to use sqlplus -prelim / as sysdba …

Oracle Doc ID references: 452358.1, 121779.1


ORACLE WAIT INTERFACE (OWI)

• Oracle continues to develop, enhance, and maintain the wait interface instrumentation. Each new release brings new events and additional instrumentation, as well as fixes to existing events.

• Cary Millsap and “Optimizing Oracle Performance” have given us a proven methodology for attacking and diagnosing problems using the OWI.

• When you are licensed for OEM tools such as the Diagnostics and Tuning packs, the GUI tools often give you visibility into problems and how to fix them.

• Traces such as the 10046 and the Oracle Wait Interface capabilities are often comprehensive and usually give you a good view of what is going on in your system.

• Sometimes you run into problems where the Oracle Wait Interface gives you an incorrect picture of reality. Your system may get stuck in some obscure part of the Oracle code … the last calls to the OWI, and the last information OEM has, may not correspond to where the process is now.


PSTACK SHOWS PROCESS STACK

• At a given point in time, pstack shows exactly where a process is: which module and offset is currently executing (or waiting), and the whole chain of calls that led there.

• It is often possible to dig into the names of the functions/modules shown in pstack output and figure out what a process/program is doing.

• It is a point-in-time view … if you run pstack several times against the same process and it stays in the same routine, it is probably stuck/waiting.

• It may not be a good idea to run pstack on Oracle background processes … use with caution here.

• You may need to run it as root to get output (OS dependent?).
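The repeat-and-compare idea can be scripted. In this hypothetical sketch, `top_frame` pulls the function name out of frame `#0` of gdb-style pstack output (`#0  0x… in func () from lib`), and `stuck_check` takes several samples and reports whether the top frame ever changed. The stack command and interval can be overridden through `STACK_CMD` and `STACK_INTERVAL` (names invented here); as noted above, keep it away from background processes.

```shell
#!/bin/sh
# Hypothetical helpers for "several pstacks, same routine => probably stuck".

# Print the function name in frame #0 of gdb-style pstack output on stdin.
top_frame() {
  awk '/^#0 / {
         for (i = 2; i < NF; i++)
           if ($i == "in") { print $(i + 1); exit }
         print $2          # frame printed without an address: "#0  func (...)"
         exit
       }'
}

# stuck_check PID [SAMPLES]: sample the stack repeatedly, compare top frames.
stuck_check() {
  pid=$1
  samples=${2:-3}
  stack_cmd=${STACK_CMD:-pstack}
  interval=${STACK_INTERVAL:-1}
  first=""
  i=0
  while [ "$i" -lt "$samples" ]; do
    frame=$($stack_cmd "$pid" | top_frame)
    if [ -z "$first" ]; then
      first=$frame
    elif [ "$frame" != "$first" ]; then
      echo "pid=$pid moving: $first -> $frame"
      return 1
    fi
    i=$((i + 1))
    [ "$i" -lt "$samples" ] && sleep "$interval"
  done
  echo "pid=$pid same top frame in $samples samples: $first"
}
```

For example, `stuck_check 12345 5` (run as a user allowed to attach to that process) takes five samples one second apart. A process that keeps reporting the same top frame is a candidate for the kind of stuck session described in this deck, not proof of one.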


The Oracle Wait Interface is …

• Useless (sometimes)

• See the James Morle and Tanel Poder joint blog (http://jamesmorle.wordpress.com/2009/11/09/the-oracle-wait-interface-is-useless-sometimes-pt/) (see part 1 / part 2 / part 3).

• At times, low-level OS-based tools such as pstack and pmap and/or system call tracing tools (Linux: strace, SystemTap; AIX: truss; HP-UX: tusc; Solaris: truss, DTrace; Windows: ProcMon, ProcExp, StraceNT), as well as the slices of OS truth (from OS Watcher or elsewhere), are needed to help solve problems.
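As a hypothetical example of using system call tracing (not the method used in this deck, which relied on pstack), a file descriptor shortage typically surfaces as system calls failing with `EMFILE` ("Too many open files"). A trace captured with something like `strace -f -tt -p <pid> -o /tmp/ocssd.trc` could be summarized with the awk sketch below, which counts `EMFILE` failures per system call. It assumes strace's usual `name(args) = -1 EMFILE (...)` line format; the function name is invented here.

```shell
#!/bin/sh
# Hypothetical summarizer: count system calls that failed with EMFILE
# ("Too many open files") in strace output read from stdin.
emfile_summary() {
  awk '/= -1 EMFILE/ {
         for (i = 1; i <= NF; i++)
           if (match($i, /^[a-z_0-9]+\(/)) {   # first "name(" token
             calls[substr($i, 1, RLENGTH - 1)]++
             break
           }
       }
       END { for (c in calls) print c, calls[c] }' | sort
}
```

Usage: `emfile_summary < /tmp/ocssd.trc`. Any non-empty output means some calls in the traced process were refused a new descriptor.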


The top function ntevpque() is the real function/operation this Oracle process was doing (regardless of what the V$ views or standard monitoring tools say). Functions starting with NT mean Network Transport, which means that the process was currently doing (stuck in) network-related (or interprocess communication) tasks. Also, the clsc* functions like clsc_select_ext() indicated that the functions at the top of the stack were related to CLuster ServiCes (CLSC). This was an indication that the Oracle processes got stuck when they tried to communicate with the cluster services or ASM processes.
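Reading a stack this way can be partly mechanized. The hypothetical awk sketch below tags each frame of gdb-style pstack output using only the two prefixes discussed in this passage (`nt*` for Network Transport, `clsc*` for Cluster Services); anything else is labeled `other`, and a real mapping would need many more prefixes.

```shell
#!/bin/sh
# Hypothetical frame tagger for gdb-style pstack output on stdin.
# Only the nt* and clsc* prefixes are mapped; everything else is "other".
classify_frames() {
  awk '/^#[0-9]+ / {
         name = $2                                      # "#N  func (...)" form
         for (i = 2; i < NF; i++)
           if ($i == "in") { name = $(i + 1); break }   # "#N  0x... in func ()" form
         area = "other"
         if (name ~ /^nt/) area = "network transport"
         else if (name ~ /^clsc/) area = "cluster services"
         print $1, name, "(" area ")"
       }'
}
```

Usage: `pstack <pid> | classify_frames`. A stack whose top frames are all tagged as network transport or cluster services points in the same direction as the analysis quoted above.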


SO WHAT ACTUALLY HAPPENED?

• OEM and the Oracle Wait Interface were pointing to application locking issues and/or CPU shortages.

• Our business was pushing this system to new record levels of sales, connections, and transactions … past levels that had never been encountered before.

• We opened an SR with Oracle the first time we ran into problems, but we were not getting any good answers from Oracle Support. We used an outside resource, Tanel Poder, working remotely, to help us diagnose the problem.

• In reality we had an ASM configuration issue related to an Oracle bug. At times, dedicated server processes running application SQL need to communicate with the ASM instance (to read a block from ASM storage into the buffer cache, for example).

• The Oracle code needed to talk through a piece of software, the CSS daemon (ocssd.bin), to reach the ASM instance. This process gets kicked off at boot time and had its open-file resource limit set too low (an Oracle bug) … it was sitting at 1024.


ASM and Database Instance hang when exceeding around 1800 sessions (Doc ID 858279.1)


#!/bin/ksh
# Find the PID of the running ocssd.bin process
PS_NUM=`ps -ef | grep ocssd.bin | grep -v grep | awk '{ print $2 }' `
echo " The process running ocssd.bin is $PS_NUM"
# /proc is a pseudo file system: an interface to kernel data structures
cat /proc/$PS_NUM/limits

[root@HOSTNAME ~]# ./temp_mon.sh
The process running ocssd.bin is 8900
Limit               Soft Limit  Hard Limit  Units
Max cpu time        unlimited   unlimited   seconds
Max file size       unlimited   unlimited   bytes
Max data size       unlimited   unlimited   bytes
Max stack size      10485760    unlimited   bytes
Max core file size  unlimited   unlimited   bytes
Max resident set    unlimited   unlimited   bytes
Max processes       270336      270336      processes
Max open files      65536       65536       files      (was 1024 before the fix)
Max locked memory   unlimited   unlimited   bytes
Max address space   unlimited   unlimited   bytes
Max file locks      unlimited   unlimited   locks
Max pending signals 270336      270336      signals
Max msgqueue size   819200      819200      bytes
Max nice priority   0           0
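Checking the limit is half the story; comparing it with how many descriptors the process actually has open shows how close it is to the wall. This hypothetical Linux-only sketch reads the "Max open files" soft limit from `/proc/<pid>/limits` (the same file as above) and counts the entries in `/proc/<pid>/fd`. Reading another user's `fd` directory generally requires root, and the 80% warning threshold is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical check: open file descriptors vs. "Max open files" soft limit.
# Usage: fd_headroom PID [WARN_PCT]   (Linux /proc only)
fd_headroom() {
  pid=$1
  warn=${2:-80}
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  used=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
  pct=$(( used * 100 / limit ))
  msg="pid=$pid open_fds=$used soft_limit=$limit used_pct=$pct"
  if [ "$pct" -ge "$warn" ]; then
    msg="$msg WARNING"
  fi
  echo "$msg"
}
```

Run periodically as root, for example `fd_headroom "$(pgrep -f ocssd.bin)"`, this would be expected to show a descriptor count creeping toward a 1024 limit well before sessions start hanging.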


DETAILS (CONTINUED)

• With max open files set to 1024, the ocssd.bin process has a low limit on file descriptors. IPC is used between database instance processes and ASM instance processes.

• Programs connected via dedicated server that have done a physical read are still holding onto an ASM resource (a file descriptor).

• An application program might hold a lock and then do something (in this case, an insert) requiring a block from ASM storage.

• It gets stuck because the ocssd.bin process cannot get another file descriptor. It holds the lock while it waits to talk to ASM.

• At 11.1 the Oracle code was not instrumented to record these interactions with ASM in the Oracle Wait Interface. 11.2 adds a number of new wait events related to database-to-ASM intercommunication.

• Other processes soon get caught waiting to talk to ASM, or waiting for locks held by the earlier blocked process.

• OEM shows a very misleading picture … searches in Metalink on clssinit point to bugs …

The fix, shown as a diff of the edited init.cssd against its backup copy:

diff /etc/rc.d/init.d/init.cssd /etc/rc.d/init.d/init.cssd.bak
1747,1748d1746
< $ULIMIT_CORE
< ulimit -n 65536


TO DEBUG: WE HAD TO CATCH A STUCK PROCESS AND USE PSTACK ON IT

• Pstack gave us an accurate picture of where the process was in its execution path.

• Since it was stuck and going nowhere, multiple pstack commands gave us the same result.

• Searches in Metalink on the function names led us to the published bug. Many bugs and documents in Oracle Metalink now include relevant module names, which makes matching weird problems somewhat easier.


SUMMARY

• While OEM and the Oracle Wait Interface provide a ton of information, at times they may give an inaccurate picture of reality.

• Operating system TRUTH gives you a way of comparing and contrasting information and facts when problems are encountered.

• If the OS Truth looks different from what OEM and OWI are showing you … then OEM and/or OWI are probably wrong. Uninstrumented code or bugs in Oracle code may be leading you down the wrong path.

• Tools like OSWatcher are low impact but allow you to collect and retain OS Truth.

• Utilities like pstack and pmap (in Linux and some other operating systems), along with low-level tracing utilities, can provide diagnostic information beyond what OEM and OWI provide.


#!/bin/bash
. /home/oracle/ora_11_2_env

# Logon string to oracle (can be just "/" if local authentication is configured)
ORA_LOGON=username/password@connstring

# For how many seconds a session must have been stuck waiting in order for the hang detection to kick in
THRESHOLD=60

TMPFILE=mon_hang_stack.tmp
LOGFILE=mon_hang_stack.log

rm -f $TMPFILE
# mon_stack.sql waits until a long non-idle wait appears, then lists waiters and ultimate blockers
sqlplus -s $ORA_LOGON @mon_stack $THRESHOLD > $TMPFILE

# WARNING! Note that I deliberately search for "ULTIMATE_BLOCKER_USER" only (and not BACKGROUND)
# as it's not a good idea to run Linux's pstack command on background processes regularly
# (this is because Linux pstack attaches to the target process using the gdb debugger and suspends it briefly,
# potentially causing other issues too if you get unlucky. So you definitely don't want to cause trouble
# for the LGWR process, for example; thus I've disabled stack sampling for background processes here)
ULTIMATE_BLOCKERS=`grep ULTIMATE_BLOCKER_USER $TMPFILE | awk '{ print $2 }'`

echo >> $LOGFILE
cat $TMPFILE >> $LOGFILE
echo >> $LOGFILE
echo DATE=`date +"%Y-%m-%d %H:%M:%S"` ULTIMATE_BLOCKERS=$ULTIMATE_BLOCKERS >> $LOGFILE

for i in $ULTIMATE_BLOCKERS ; do
    echo >> $LOGFILE
    echo DATE=`date +"%Y-%m-%d %H:%M:%S"` running pstack on PID=$i >> $LOGFILE
    echo >> $LOGFILE
    pstack $i >> $LOGFILE
    sleep 1
    echo " after 1 sec sleep pstack repeated" >> $LOGFILE
    pstack $i >> $LOGFILE
done


SET LINES 2000 PAGES 5000 TRIMSPOOL ON TRIMOUT ON FEEDBACK OFF VERIFY OFF
SET SERVEROUT ON SIZE 1000000

DEFINE threshold=&1

-- First sleep and monitor V$SESSION to find long waits in the database
-- This PL/SQL block will just keep running until a long enough wait is seen
DECLARE
  l_threshold NUMBER := &threshold;
  l_max_wait  NUMBER;
BEGIN
  WHILE TRUE LOOP
    SELECT MAX(seconds_in_wait) INTO l_max_wait
      FROM v$session
     WHERE state = 'WAITING'
       AND wait_class != 'Idle';
    IF l_max_wait > l_threshold THEN
      EXIT;
    END IF;
    DBMS_LOCK.SLEEP(30);
  END LOOP;
END;
/

PROMPT Long wait detected, listing long waiters from V$SESSION....

SET HEADING OFF
SELECT 'CURRENT_TIME= '||TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS') FROM dual;
SET HEADING ON

-- Long waiters and their immediate blockers (uses the same threshold as above)
SELECT bp.spid blocker_spid
     , p.spid waiter_spid
     , s.sid
     , s.program
     , s.sql_id
     , s.event
     , s.p1
     , s.p2
     , s.p3
     , s.seconds_in_wait
     , s.blocking_session_status
     , s.blocking_session blocker_sid
     , bs.program blocker_program
     , bs.sql_id blocker_sql_id
     , bs.state blocker_state
     , bs.event blocker_event
     , bs.p1 blocker_p1
     , bs.p2 blocker_p2
     , bs.p3 blocker_p3
     , bs.seconds_in_wait blocker_sec_in_wait
  FROM v$session s
     , v$process p
     , v$session bs
     , v$process bp
 WHERE s.paddr = p.addr
   AND s.blocking_session = bs.sid
   AND bs.paddr = bp.addr
   AND s.state = 'WAITING'
   AND s.wait_class != 'Idle'
   AND s.seconds_in_wait > &threshold
/

-- Ultimate blockers: sessions at the head of a wait chain (not blocked themselves)
SELECT 'ULTIMATE_BLOCKER_'||TRIM(s.type)||'= '||TRIM(osid) blocking_spid
     , w.in_wait_secs
     , w.pid
     , w.sid
     , w.in_wait
     , w.wait_event
     , w.p1
     , w.p2
     , w.p3
  FROM v$wait_chains w
     , v$session s
 WHERE w.sid = s.sid
   AND w.sess_serial# = s.serial#
   AND w.blocker_sid IS NULL
   AND w.num_waiters > 0
/

EXIT