102
!"#$%&’( *"$+,# -&./,$0&’( 1$&,/"#2 13" 0#20&’( $’4 4&$(’320&+ 5"$+0&+# 6#"#.&$7 8&,03’

Breaking Oracle

Embed Size (px)

Citation preview

Page 1: Breaking Oracle

!"#$%&'()*"$+,#

-&./,$0&'()1$&,/"#2)13")0#20&'()

$'4)4&$('320&+)5"$+0&+#

6#"#.&$7)8&,03'

Page 2: Breaking Oracle

About Jeremiah

! !"#$%&'()*+,(-)./!

! Working with Oracle since 1994

! Owner, ORA‐600 Consulting" Architecture, scaling, performance

" Availability, stability, complex recovery

" Training, seminars, recruiting

! UW Certificate Program instructor

! Internals and nontrivial issue resolution

Page 3: Breaking Oracle

Class objectives! Learn to induce realistic application load in test! Learn to create realistic failures and problems! Learn to detect, assess and diagnose problems! Learn appropriate pathways to resolution! Apply critical thinking to emergent problems! Learn to reduce outage times! Learn self‐diagnostics and self‐resolution! Student participation (hands on the keyboard)! Have fun watching Oracle break

Page 4: Breaking Oracle

Introductions

! Where you work

! Your role

! Background (optional)

! What you want to get from class

! A little about your environment" Number of instances

" Criticality" Team size

Page 5: Breaking Oracle

7/16/2008 5

Problem profiles

! Hangs" Single‐sesson" Multi‐session" Whole instance" Multi‐instance

! Spins" Server process" Background process

! Crashes" Session/server/process" Whole instance" ORA‐600, ORA‐7445

! Curruption/data loss" Files" Blocks" Logical" Diabolical

Page 6: Breaking Oracle

Rationale

! Substitute for real‐world ordeals! Hard to find good troubleshooters! High cost of outages! Opportunity for improvement! Obscurity of diagnostic skills" Not a standard DBA skill" Not well documented

! Inadequacy of OWS first‐line! Fun, exciting

Page 7: Breaking Oracle

Most applicable environments

! Mission‐critical apps" Heavily‐used public website" Internal business‐critical systems

! Usefulness obviated by HA?" (RAC, clusters, DataGuard, replication)"Most issues NOT addressed by HA technologies

! Environments where DBA will never encounter such issues" Allows professional development

Page 8: Breaking Oracle

Section 1: Inducing Load

! Need a realistic load to induce hangs, etc.

! Resource contention is a problem of concurrency

! Under load, problems get worse

! Helps find scaling limits of a system

! An inactive site is no excuse for not learning

! Many recent options available

Page 9: Breaking Oracle

Induced Load: Options

! Generated workload" Artificially generated transactions" Simulates a type of application (OLTP/DSS)

" Can be turned up to exhaust server resources! Recorded workload" 0%1,)#223+4#-+%&'()-,15)3%#6" Tunable playback" Less opportunity to ratchet up

Page 10: Breaking Oracle

Load Testing Tools! Application loading" HP (Mercury) LoadRunner" Borland (Segue) Silk Performer" IBM Rational Robot" Web Performance Suite" OpenSTA (open source) " Swingbench (open source)

! Oracle sessions record/playback" Oracle Database Replay" Quest Benchmark Factory" Hammerora (open source)

Page 11: Breaking Oracle

7/16/2008 11

Application Load Testing Tools

! Advantages" Can exercise many services/databases

" Provides whole‐application loading

" Scripted load can be multiplied

! Shortcomings" Probably will miss some database load and application work

" Many manual steps to define and tune load

" Concurrent DB txns not coordinated (errors)

! Script (manually define) workload as a user! Playback workload against application

Page 12: Breaking Oracle

7/16/2008 12

Oracle Session Record/Playback Tools

! Advantages" Real workload, not 7#1-8%,569

" Captures all activity, not just one application

" Some (DB Replay) coordinate txns in proper order

! Shortcomings" :#&'-);5)"13-+23+56<)%&3=)wait time can be removed

" Only applies to DB, not app or other services on which the app is dependent

! Obtain workload from 10046 trace (or internally)! Play back load faithfully or with wait time removed

Page 13: Breaking Oracle

Swingbench

! Open‐source tool by Dominic Giles (Oracle UK)! Synthetic load harness! Useful canned workloads" Order Entry" Calling Circle

! Simple to roll your own workload! Quick and easy to set up! http://www.dominicgiles.com/swingbench

Page 14: Breaking Oracle

Database Replay

! Part of 11! Real Application Testing

! Capture from earlier versions" 9.2.0.8, 10.2.0.3, 10.2.0.4

! Allows workload to resemble real application

! Allows subsetting by user, app, etc.

! Premium option

! 9"&.$"&,:)13")+7$'(#)$22/"$'+#

Page 15: Breaking Oracle

Swingbench step by step

! Assumes Oracle binaries already installed

1. Set up a dedicated test database2. Add new service to tnsnames.ora on a client3. Download and install Swingbench on client4. Configure Swingbench for your environment5. Create and populate the SOE schema6. Run a load test

Page 16: Breaking Oracle

Swingbench1. Set up a dedicated test database

oracle@db02$ dbca -silent -createdatabase \> -templatename General_Purpose.dbc -storagetype fs \> -gdbname od08 -sid od08 -totalmemory 500 \> -syspassword od08pw -systempassword od08pw \> -emconfiguration none -characterset al32utf8 \> -datafiledestination /opt/oracle/oradata \> -initparams _disable_interface_checking=trueCopying database files...Creating and starting Oracle instance...Completing Database Creation...Database creation complete.For details check the logfiles at:/opt/oracle/cfgtoollogs/dbca/od08.Database Information:Global Database Name:od08System Identifier(SID):od08

Page 17: Breaking Oracle

Swingbench2. Add new service to client tnsnames.ora! Become user oracle

jeremiah@db01$ su - oraclepassword:

! Set environmentoracle@db01$ . oraenvORACLE_SID = [oracle] ? *The Oracle base for ORACLE_HOME=/opt/oracle/app/oracle/product/11.1.0/db_1 is /opt/oracle

! Add to $ORACLE_HOME/network/admin/tnsnames.ora:od08=(description=(address=(protocol=tcp)(host=192.168.1.205)(port=1521))(connect_data=(service_name=od08))

)

Page 18: Breaking Oracle

Swingbench3. Download and install on a client

jeremiah@db01$ wget http://www.dominicgiles.com/swingbench/swingbench22.zipLength: 7504679 (7.2M) [application/zip]Saving to: `swingbench22.zip'100%[============================================>] 7,504,679 370K/s in 20s

jeremiah@db01$ unzip swingbench22.zipcreating: swingbench/inflating: swingbench/.DS_Storecreating: swingbench/bin/inflating: swingbench/bin/ccwizard

...inflating: swingbench/winbin/sample/oeconfig.xmlinflating: swingbench/winbin/sample/soeconfig.xmlinflating: swingbench/winbin/sample/spconfig.xmlinflating: swingbench/winbin/swingbench.batinflating: swingbench/winbin/swingconfig.xml

Page 19: Breaking Oracle

Swingbench4. Configure Swingbench

! Read README.txt! Modify ./swingbench.env:

export ORACLE_HOME=/opt/oracle/app/oracle/product/11.1.0/db_1export JAVAHOME=$ORACLE_HOME/jdk/jreexport SWINGHOME=~/swingbenchexport LOADGENUSER=jeremiahexport CLASSPATH=$JAVAHOME/lib/rt.jar:$JAVAHOME/lib/tools.jar:$ORACLE_HOME/jdbc/lib/ojdbc5.jar:$SWINGHOME/lib/mytransactions.jar:${SWINGHOME}/lib/swingbench.jar:$ANTHOME/ant.jar

! Modify ./bin/oewizard.xml:<WizardConfig Name="Oracle Entry Install Wizard" Mode="LightsOut"><Parameter Key="dbapassword" Value="od08pw"/><Parameter Key="connectionstring" Value="//192.168.1.205/od08"/>

! Modify ./bin/swingconfig.xml:<ConnectString>//db02/od08</ConnectString><NumberOfUsers>30</NumberOfUsers><TransactionList WaitTillAllLogon="true" MinDelay="0" MaxDelay="100"

MaxTransactions="-1" QueryTimeout="60">

Page 20: Breaking Oracle

Swingbench5. Create and poplulate SOE schema

! Set environment:jeremiah@db01$ . oraenvORACLE_SID = [oracle] ? *The Oracle base for ORACLE_HOME=/opt/oracle/app/oracle/product/11.1.0/db_1 is /opt/oracle

jeremiah@db01$ export DISPLAY=<displayhost>:0

! Create tablespaces:jeremiah@db01$ rlwrap sqlplus sys@od08 as sysdbaSQL*Plus: Release 11.1.0.6.0 - Production

SQL> create tablespace soedatafile '/opt/oracle/oradata/od08/soe.dbf' size 100M reuseautoextend on next 50m maxsize unlimitedextent management local uniform size 100k nologging;

Tablespace created.

SQL> create tablespace soeindexdatafile '/opt/oracle/oradata/od08/soeindex.dbf' size 100M reuseautoextend on next 50m maxsize unlimitedextent management local uniform size 100k nologging;

Tablespace created.

Page 21: Breaking Oracle

Swingbench5. Create and poplulate SOE schema (cont.)

! Run oewizard to set up sample schema and data:jeremiah@db01$ ./oewizardSwingBench WizardAuthor : Dominic GilesVersion : 2.2Running in Lights Out Mode using config file : oewizard.xml15:25:03 07/10 [DBUG] SQLScriptRunner Starting script ../sql/soecreateuser.sql15:25:03 07/10 [DBUG] SQLScriptRunner Script completed in 0 secs15:25:03 07/10 [DBUG] Step5 Connecting15:25:04 07/10 [DBUG] Step5 Connected...15:25:21 07/10 [DBUG] SQLScriptRunner Starting script ../sql/soepackage.sql15:25:24 07/10 [DBUG] SQLScriptRunner Script completed in 2 secs15:25:24 07/10 [DBUG] Step5 Populating Customers15:25:50 07/10 [DBUG] Step5 Populated Customers in 25 secs15:25:50 07/10 [DBUG] Step5 Populating Orders15:26:43 07/10 [DBUG] Step5 Populated Orders in 53 secs15:26:43 07/10 [DBUG] SQLScriptRunner Starting script ../sql/soeconstraints.sql15:26:54 07/10 [DBUG] SQLScriptRunner Script completed in 11 secs15:26:54 07/10 [DBUG] SQLScriptRunner Starting script ../sql/soeindexes.sql15:27:08 07/10 [DBUG] SQLScriptRunner Script completed in 13 secs...15:28:42 07/10 [DBUG] Step5 Exiting LightsOut Session

Page 22: Breaking Oracle

Swingbench6. Run a load test

jeremiah@db01$ ./charbenchAuthor : Dominic GilesVersion : 2.2

Results will be written to results.xml.Hit Return to Start & Terminate Run...

Users : 15 TPM : 246 Nested TPM : 0

<return>

Users : 0 TPM : 234 Nested TPM : 0Completed Run.

Page 23: Breaking Oracle

SwingbenchDemo GUI

! Start remote terminal connection to PC in lab! Start Xmanager X server on PC! Set DISPLAY on swingbench host to lab PC! Run swingbench

! GUI version allows:" Start/Stop/Restart" Parameter changes" Runtime statistics

Page 24: Breaking Oracle

DB Replay step by step

1. Prepare database for capture

2. Capture a workload

3. Clone DB to capture start SCN" We will use Flashback Database

4. Move workload to clone DB" Unnecessary for us; see above

5. Run workload

Page 25: Breaking Oracle

DB Replay1. Prepare database for capture

! Make sure flashback is onjeremiah@db02$ sqlplus / as sysdba

SQL> shutdown immediateSQL> startup mountSQL> alter database archivelog;SQL> alter database flashback on;SQL> alter database open;

! Create a capture directorySQL> create directory wloadcap as '/opt/oracle/wloadcap';

! Make sure there is some load to capturejeremiah@db01$ ./charbench

Page 26: Breaking Oracle

DB Replay2. Capture a workload

! Start capturejeremiah@db01$ sqlplus sys@od08 as sysdba

SQL> exec dbms_workload_capture.start_capture( -name=>'od08cap', dir=>'WLOADCAP')

PL/SQL procedure successfully completed.

! >#+-)#)?8+35@)"5#&?8+35)A5-)-85)(-#,-)B:CSQL> column name format a20SQL> select name,

to_char(start_time,'yyyy-mm-dd hh24:mi:ss') start_time, start_scn

from wrr$_captures;NAME START_TIME START_SCN-------------------- ------------------- ----------od08cap 2008-07-10 17:04:49 894606

! End the captureSQL> exec dbms_workload_capture.finish_capturePL/SQL procedure successfully completed.

Page 27: Breaking Oracle

DB Replay3. Clone (flash back) to start SCN

! Normally you would clone from a backup:RMAN> duplicate target database to od08clone until scn 894606;

! Flashed back DB is the clone for this demooracle@db02$ sqlplus / as sysdbaSQL> shutdown immediateSQL> startup mountORACLE instance started.SQL> flashback database to scn 894606;Flashback complete.SQL> alter database open resetlogs;Database altered.

Page 28: Breaking Oracle

DB Replay5. Run workload

! Normally we move workload files to cloneoracle@db02$ ls /opt/oracle/wloadcaporacle@db02$ scp -r /opt/oracle/wloadcap db03:/opt/oracle/

! Here we flashed back so leave them in placejeremiah@db01$ rlwrap sqlplus sys@od08 as sysdbaSQL> exec dbms_workload_replay.process_capture( -

capture_dir=>'WLOADCAP')PL/SQL procedure successfully completed.SQL> exec dbms_workload_replay.initialize_replay( -

replay_name=>'od08cap', -replay_dir=>'WLOADCAP')

PL/SQL procedure successfully completed.SQL> exec dbms_workload_replay.prepare_replay( -

think_time_scale=>0)

PL/SQL procedure successfully completed.

! Also move replay files to workload clientsjeremiah@db01$ scp oracle@db02:/opt/oracle/wloadcap/* replay/

Page 29: Breaking Oracle

DB Replay5. Run workload (cont.)

! Launch the workload replay clientsjeremiah@db01$ wrc system/od08pw@od08 replaydir=replay \

> connection_override=trueWorkload Replay Client: Release 11.1.0.6.0 - ProductionWait for the replay to start (21:06:32)

! Start the replayjeremiah@db01$ rlwrap sqlplus sys@od08 as sysdbaSQL> exec dbms_workload_replay.start_replayPL/SQL procedure successfully completed.

! Back in wrc:Replay started (21:09:01)Replay finished (22:29:53)

Page 30: Breaking Oracle

7/16/2008 30

Section 2: Inducing problems

! Hangs" Single‐sesson" Multi‐session" Whole instance" Multi‐instance

! Spins" Server process" Background process

! Crashes" Session/server/process" Whole instance" ORA‐600, ORA‐7445

! Curruption/data loss" Files" Blocks" Logical" Diabolical

Page 31: Breaking Oracle

Hangs

! One or more sessions getting "stuck"! Really means waiting on something! Locks, latches, I/O, object serialization! Hanging sessions may be holding resources needed by others

! Work ethic of waits! Long (legitimate) waits vs. hangs" Oracle's view" Customer's view

Page 32: Breaking Oracle

Single‐session hangs! Simplest case: blocked by an enqueue

SESS1> select order_id from orders where order_date > sysdate - .1 for update;

SESS2> update orders set order_date=sysdatewhere order_id = 26104;

SESS3> column program format a15 truncSESS3> column event format a45SESS3> select sid, program, event, state,

seconds_in_wait, blocking_sessionfrom v$sessionwhere event like 'enq%' or sid in (select blocking_session

from v$sessionwhere event like 'enq%');

! Worse when a blocked session is holding resources! Save above script as locks.sql! !&546%-5(@

Page 33: Breaking Oracle

Single‐session hangs (cont.)! Session hang on uninstrumented operation

SESS1> create table t as select * from all_objects where 1 = 0;

SESS1> exec dbms_stats.gather_table_stats(user,'T')SESS1> alter session set "_optimizer_search_limit" = 100;SESS1> select sys_context('USERENV','SID') from dual;SESS1> select * fromt t1,t t2,t t3,t t4,t t5,t t6,t t7,t t8,t t9,t t10,t t11,t t12,t t13,t t14,t t15,t t16,t t17,t t18,t t19,t t20,t t21,t t22,t t23,t t24,t t25,t t26,t t27,t t28,t t29,t t30,t t31,t t32,t t33,t t34,t t35,t t36,t t37,t t38,t t39,t t40,t t41,t t42,t t43,t t44,t t45,t t46,t t47,t t48,t t49,t t50,t t51,t t52,t t53,t t54,t t55,t t56,t t57,t t58,t t59,t t60,t t61,t t62,t t63,t t64,t t65,t t66,t t67,t t68,t t69,t t70,t t71,t t72,t t73,t t74,t t75,t t76,t t77,t t78,t t79,t t80,t t81,t t82,t t83,t t84,t t85,t t86,t t87,t t88,t t89,t t90,t t91,t t92,t t93,t t94,t t95,t t96,t t97,t t98,t t99;

SQL> @waits

Page 34: Breaking Oracle

Multi‐session hangs

! Worse case: blocked by an enqueue while holding resources needed by othersSESS1> select customer_id from orders where order_id =

(select max(order_id) from orderswhere order_date > sysdate - .5 and mod(order_id,3) = 0) for update;

SESS2> update orders set order_date = sysdatewhere order_date > sysdate - .5 and mod(order_id,3) = 0;

jeremiah@db01$ ./charbenchSESS3> @locks

! D2-+"+(-+4)E(F)25((+"+(-+4)3%4G+&A@

Page 35: Breaking Oracle

Multi‐session hangs (cont.)! Essential troubleshooting Q: who is at the head of the line and what are they doing?

! Another case: inter‐component chain of dependenciesSESS1> update customers set credit_limit = 1000

where cust_last_name = 'Edwards';SESS2> alter session set ddl_lock_timeout = 10000;SESS2> alter table customers modify

(cust_first_name null);SESS3> column program format a15 truncSESS3> column event format a45SESS3> select sid, program, event, state,

seconds_in_wait, blocking_sessionfrom v$session where type != 'BACKGROUND';

! Save the above script as waits.sql! Chains of resource holding

Page 36: Breaking Oracle

Whole‐instance hangs

! Hang I/O calls by processes that can't time outoracle@db01$ mkdir -p /opt/oracle/bctroot@db01# exportfs -i -o rw,no_root_squash /opt/oracle/bctroot@db02# mount -F nfs -o rw \

192.168.1.202:/opt/oracle/oradata/od08/bct /mnt/orabctSYS> alter database enable block change tracking

using file '/mnt/orabct/bct.ora';oracle@db02$ ./charbenchroot@db02# /etc/init.d/nfs stopSYS> @waits

! Most Oracle BG processes will time out/crash" DBW"" LGWR" CTWR is an exception

Page 37: Breaking Oracle

Spins

! Endless loops AKA "out to lunch"

! Process may be hanging or not

! Found with top or ps

! Consumes CPU resources

! If hanging may be holding resources needed by others

Page 38: Breaking Oracle

Server process spins 

! Hang and spin in regexp bugSQL> select 1 from dual where regexp_like(' ','^*[ ]*a');oracle@db02$ ps -eo pid,pcpu,args | sort -n +1 | tail -10

! Also other sessions hang

! Total denial of service available to any user

Page 39: Breaking Oracle

Server process spins! Hang and spin on uninstrumented operation

jeremiah@db01$ sqlplus sys@od08 as sysdbaSQL> grant create any directory to soe;SESS1> create directory mydir as '/tmp';oracle@db02$ mknod /tmp/myfile pSESS1> create table mytab (a number)

organization external (type oracle_loader default directory mydiraccess parameters (records delimited by newline fields terminated by ','(a)) location ('myfile'));

Table created.SESS1> select * from mytab;SQL> @waitsoracle@db02$ ps -eo pid,pcpu,args | sort -n +1 | tail -10

Page 40: Breaking Oracle

Background process spins

! Spinning background procs can't always be killed safelyoracle@db02$ ps -eo pid,s,args | grep ora_arcoracle@db02$ kill -STOP `ps -eo pid,args | grep ora_arc \

| grep -v grep | awk '{print $1}'`oracle@db02$ ps -eo pid,s,args | grep ora_arcSQL> select group#, sequence#, archived, status from v$log

order by sequence#;SQL> alter system switch logfile;SQL> alter system switch logfile;SQL> alter system switch logfile;oracle@db02 $ ps -eo pid,pcpu,args | sort -n +1 | tail -10SQL> column event format a45SQL> select event, state, seconds_in_wait from v$session

where type = 'BACKGROUND' and program like '%LGWR%';

Page 41: Breaking Oracle

Crashes! Usually ORA‐00600 and ORA‐07445! Single process crash #$" take down whole instance! ORA‐00600: internal error code, arguments: [] [] [] []

" First argument tells you where in the code" Additional arguments provide more information" Process/session does not always die" Not necessarily an emergency (anecdotes)" OERI

! ORA‐07445: exception encountered: core dump [] []" Core dump" First argument tells you where in the code (10!+)" Second argument is the signal (kill ‐l)" Additional arguments provide more information

Page 42: Breaking Oracle

ORA‐00600 Example

! Simplest case in PL/SQLSQL> declare

a exception;pragma exception_init(a,-600);

beginraise a;

end;

! Nicer, lets you specify the argumentsSQL> oradebug unit_test dbke_test dde_flow_kge_ora 12333 0 0

Page 43: Breaking Oracle

Bug that raises ORA‐00600

! Bug 6073325: SELECT QUERY WITH CONNECT BY PRIOR FAILS WITH ORA‐00600 [KKQCBYDRV:1]

SQL> select 1 from sys.table_privileges tp, user_objects uowhere tp.grantee in

(select 1 from sys.dba_role_privsconnect by prior granted_role = granteestart with grantee = 'scott');

! Raises ORA‐600, but we are sill connected! Not all ‐600 errors are fatal (most are not)! Just a unhandled exception ‐ no reason to panic

Page 44: Breaking Oracle

Bug that raises ORA‐00600! Bug 6310653: ORA‐600 [KKQCTMDCQ: QUERY BLOCK COULD NOT] ON INSERT

SQL> create table t_6310653(oid number, nm varchar(128), snm varchar(128));

SQL> create table r_6310653(cid number, oid number, sc number, dst number, rw number);

SQL> insert into t_6310653 values (1, 'foo', 'foo');SQL> commit;SQL> insert into r_6310653 (cid, rw, oid, sc)

with ors as(select oid, score(1) rs from t_6310653where contains(nm, '"foo"', 1) > 0order by rs desc, nm, oid)select 13, rownum, ors.oid, ors.rsfrom ors where rownum <= 100;

Page 45: Breaking Oracle

Bug that raises ORA‐00600! Bug 6618235: ORA‐00600: [KKQCBYDRV:1] ON V$ARCHIVED_LOG WITH LEFT OUTER JOIN

SQL> select hh24.hh24, nvl (sum(blocks*block_size),0)from (select trunc(sysdate-7 + (rownum-1)/24, 'hh24') hh24

from dual connect by trunc(sysdate-7+(rownum-1)/24,'hh24')<=sysdate) hh24left outer join (

select thread#,sequence#,next_time, max(blocks) blocks,max(block_size) block_sizefrom v$archived_logwhere dest_id = 1 group by thread#,sequence#, next_time)

on hh24.hh24=trunc(next_time, 'hh24')group by hh24.hh24order by 1;

Page 46: Breaking Oracle

ORA‐07445 Example

! Simplest case: send a signalSQL> select spid from v$process p, v$session s

where p.addr = paddrand sid = sys_context('USERENV','SID');

oracle@db02$ kill -SEGV 2513

! Use PL/SQLSQL> declare

a exception;pragma exception_init(a,-7445);

beginraise a;

end;

Page 47: Breaking Oracle

Real ORA‐07445 bug! Bug 6244173: ORA‐07445 IN QEESTRAVERSEEXPR FOR HIERARCHICAL QUERYSQL> create table t2(col1 varchar2(60));SQL> create table t1(c1 varchar2(60),

c2 varchar2(1),c3 varchar2(60),c4 varchar2(60));

SQL> explain plan forselect 1 from t1 a, t2 b ,t1 cwhere b.col1 = 'xxslc_department'and a.c1 not between c.c3 and c.c4start with a.c2='p'connect by prior a.c1 between a.c3 and a.c4;

! Raises ORA‐HIIHJ)(%)?5)3%%G)+&)#35,-)3%A@! Nature of a crashed process to generate a disconnect! Continued use of dead connection gives app:

" ORA‐3114: Not connected to Oracle" ORA‐1041: internal error. hostdef extension doesn't exist

! oerr ora 1041 ‐ Call support!

Page 48: Breaking Oracle

Whole‐instance crashes

! Something causes a required background process to exit

! ORA‐600, ORA‐7445, I/O errors, etc." Can actually be any error that prevents the next step

! Some will restart,  some crash the instance

! Usually, but not always sensible

Page 49: Breaking Oracle

11! Background Processes:Which ones crash the instance?

9"3+#22);$.#

<#2+"&50&3'

=>?- =03.&+)+3'0"3,1&,# 03).#.3":)2#"@#"

=A>n A#43),3()$"+7&@#"2

>6Bn 63C)2+7#4/,#")+33"4&'$03"

>D9E >7#+%53&'0

<nnn <&25$0+7#"2

<!A? A#23/"+#).$'$(#")5"3+#22

<!8n <$0$C$2#)F"&0#")5"3+#22#2

<G=H <&$('32&C&,&0: 5"3+#22)H

<G=I <&$('32&C&,&0: +33"4&'$03"

J<!= J,$27C$+%)4$0$)$"+7&@#" 5"3+#22

6''' 63C 2+7#4/,#")5"3+#22#2

KI8A A#43),3( F"&0#"

K?<n I,3C$,)#'L/#/# 2#"@&+#)4$#.3'2

K?*; I,3C$,)#'L/#/# 2#"@&+#).3'&03"

??=; ?#.3":).$'$(#"

9"3+#22);$.#

<#2+"&50&3'

??;K ?$'$(#$C&,&0:)?3'&03")9"3+#22)M

??*; ?$'$(#$C&,&0:)?3'&03")9"3+#22

9G;I G'0#"+3''#+0),$0#'+:).#$2/"#.#'0

9?*; 9"3+#22).3'&03"

9-9n 9"3+#22)25$F'#"2

Bnnn B/#/#)+,#$'/5)5"3+#22#2

B?;> B/#/#)+33"4&'$03"

AN>* <&20"&C/0#4)"#+3@#":)5"3+#22

A?-n A=>).$'$(#.#'0)2#"@#"

AO8A A#+3@#":)F"&0#"

-nnn -7$"#4 2#"@#"2

-?>* -5$+#).$'$(#.#'0)+33"4&'$03"

-?*; -:20#.).3'&03")5"3+#22

ODE? O&"0/$,)%##5#")31)0&.#)5"3+#22

8nnn -5$+# .$'$(#.#'0)5"3+#22#2

Page 50: Breaking Oracle

Instance crashes

! Simple case: kill an essential background process (tail the alert log)oracle@db02$ ps -eo pid,args | grep ora_ckpt | grep -v greporacle@db02$ kill -KILL <pid>

! Simple case: send a SIGSEGV or SIGBUS to an essential background processoracle@db02$ ps -eo pid,args | grep ora_dbrm | grep -v greporacle@db02$ kill -SEGV <pid>

" Raises ORA‐07445

Page 51: Breaking Oracle

Instance crashes

! Cause fatal errors in essential background processesSQL> select pid, program, background from v$process

where background = 1;SQL> oradebug setorapid 16SQL> oradebug call kgeasnmierr 4455547624 18446744071472029760

18446744071562043788 2 1 1

! Couldn't find a good ORA‐600 for background processes

Page 52: Breaking Oracle

Take a backup! Mount some mass storage

root@db01# exportfs -i -o rw,no_root_squash 192.168.1.205:/mnt/usb/od08root@db02# mount -F nfs -o vers=3 192.168.1.202:/mnt/usb/od08 \

/mnt/remote

! Move flash recovery area and increase to DB sizeSQL> select sum(bytes)/1024/1024/1024 gb from dba_segments;SQL> alter system set db_recovery_file_dest = '/mnt/remote' scope=both;SQL> alter system set db_recovery_file_dest_size = 5g scope = both;

! Delete old logsjeremiah@db01$ rlwrap rman target sys@od08RMAN> delete archivelog all;

! Fix block change trackingRMAN> backup incremental level 0 database;

! Take a backupSQL> alter database enable block change tracking

Page 53: Breaking Oracle

Instance disappears without a trace

! Most common on Windows

! Usually clusterware or OS services

! Difficult to diagnose

Page 54: Breaking Oracle

Corruption

! Catch‐all term! Physical" File headers" Data blocks" Controlfiles, logfiles, other logs

! Logical" Application tables" Data dictionary, SYS

! Break while backup runs (about 30 minutes)SQL> select message from v$session_longops

where message like 'RMAN%' order by start_time;

Page 55: Breaking Oracle

File header corruption

! Oracle bugs

! OS/hardware bugs

! Array mirror scenario is scary

! Header contents" File ID, tablespace ID, create and checkpoint data" Read at checkpoint time; implications of delay

! We use dd with conv=notrunc to corrupt" dd if=foo of=bar bs=512 oseek=512 count=1 conv=notrunc

Page 56: Breaking Oracle

File header corruption

! Simple example: write wrong file into headeroracle@db02$ dd if=/opt/oracle/oradata/od08/soeindex.dbf \

of=/opt/oracle/oradata/od08/soe.dbf \bs=8192 iseek=1 oseek=1 count=15 conv=notrunc

15+0 records in15+0 records outSQL> alter system checkpoint;

! Check the alert log! Restore datafile (under 2 minutes)

RMAN> restore datafile '/opt/oracle/oradata/od08/soe.dbf';RMAN> recover datafile '/opt/oracle/oradata/od08/soe.dbf';SQL> alter database datafile

'/opt/oracle/oradata/od08/soe.dbf' online;

Page 57: Breaking Oracle

Data block corruption! Simple example: garbage into a block! Find a block in a known table

SQL> select min(dbms_rowid.rowid_block_number(rowid))from soe.customers;

SQL> select customer_id, cust_email from soe.customerswhere dbms_rowid.rowid_block_number(rowid) = 12;

oracle@db02 $ dd if=/opt/oracle/oradata/od08/soe.dbf bs=8192 iseek=12 \count=1 | strings | grep [email protected]

oracle@db02$ dd if=$ORACLE_HOME/bin/oracle \of=/opt/oracle/oradata/od08/soe.dbf \bs=8192 oseek=12 count=1 conv=notrunc

1+0 records in1+0 records outSQL> alter system checkpoint;

! Check the alert log ‐ no errors!! Read the block

SQL> select customer_id, cust_email from soe.customerswhere dbms_rowid.rowid_block_number(rowid) = 12;

SQL> alter system flush buffer_cache;SQL> select customer_id, cust_email from soe.customers

where dbms_rowid.rowid_block_number(rowid) = 12;

! Restore data block (read again)RMAN> blockrecover datafile '/opt/oracle/oradata/od08/od08/soe.dbf' block 12;

Page 58: Breaking Oracle

Controlfile corruption

! Get controlfile locationsSQL> show parameter control_files

! Write garbage into one controlfileoracle@db02$ dd if=$ORACLE_HOME/bin/oracle \

of=/opt/oracle/oradata/od08/control01.ctl \bs=8192 oseek=1 count=30 conv=notrunc

! CheckpointSQL> alter system checkpoint;

! Check alert log! No need to restore from backup; just use one of the others

Page 59: Breaking Oracle

Controlfile corruption! Get controlfile locations

SQL> show parameter control_files

! Write garbage into all controlfilesoracle@db02$ dd if=$ORACLE_HOME/bin/oracle \

of=/opt/oracle/oradata/od08/control01.ctl \bs=8192 oseek=1 count=30 conv=notrunc

! Repeat for each of 3 copies! Checkpoint

SQL> alter system checkpoint;

! Check backupsoracle@db02$ ls -l /mnt/remote/OD08/backupset/2008_07_15RMAN> restore controlfile from

'/mnt/remote/OD08/backupset/2008_07_15/<backuppiece>.bkp';RMAN> recover database;SQL> alter database open resetlogs;

! Take a new level 0 backup

Page 60: Breaking Oracle

Logfile corruption

! Make a second member for each groupSQL> alter database add logfile member'/opt/oracle/oradata/od08/redo01a.log' to group 1;

" (Repeat)

! Cycle logs and run some loadSQL> alter system switch logfile;SQL> alter system switch logfile;SQL> alter system switch logfile;jeremiah@db01$ ./charbench

! Find the current logsSQL> column member format a55SQL> select l.group#, member, archived, l.status

from v$log l, v$logfile lf where l.group# = lf.group# order by l.group#, member;

Page 61: Breaking Oracle

Logfile corruption (cont.)

! Corrupt one of the current logs$ dd if=$ORACLE_HOME/bin/oracle \of=/opt/oracle/oradata/od08/redo01.log bs=512 oseek=1 \count=100 conv=notrunc

! Crash  and restart the instanceSQL> shutdown abortSQL> startup

! Check the alert log

! Can corrupt both during diagnostic and resolution exercises

Page 62: Breaking Oracle

Other vulnerable files

! Archived redo logs

! Flashback logs

! Flashback archives

! Block change tracking file

! Backups

Page 63: Breaking Oracle

Logical corruption! Erroneously changed data

" Missing/incorrect predicate (where clause)! Human error/application bug! Oracle bug (wrong results)

! Many tools to resolve" Flashback query" Flashback transaction" Flashback table" Flashback database" Log Miner" Traditional point‐in‐time recovery" Mini‐clone recovery

Page 64: Breaking Oracle

Logical corruption

! User oops: missing where clauseSQL> set feedback offSQL> update customers set cust_first_name = 'Nimrod'

where rownum < 1000;SQL> commit;SQL> set feedback onSQL> alter table customers enable row movement;SQL> flashback table customers to timestamp

to_date('2008-07-13 22:20:00','yyyy-mm-dd hh24:mi:ss');

! Quality resolution requires examining "versions between" to get exact SCN of change. Increase undo_retention to support.

Page 65: Breaking Oracle

Logical corruption! Some insidious examples! Update sys.obj$

SQL> select dbms_flashback.get_system_change_number from dual;SQL> update obj$ set name = 'FOO' where object_id < 500;SQL> commit;SQL> select * from cat;SQL> shutdown abortSQL> flashback database to scn 1549974;SQL> alter database open resetlogs;

! Much of dictionary is cached and seldom read! Could be weeks or months (expired backups) before discovery

Page 66: Breaking Oracle

Section 3: Detect and resolve

! Detection"Manual

"Monitoring (automated)

! Resolution" Investigative"Manual

" Automated

Page 67: Breaking Oracle

Detect and resolvesingle‐session hangs

! Simplest case: blocked by an enqueueSESS1> select order_id from orders

where order_date > sysdate - .1 for update;SESS2> update orders set order_date=sysdate

where order_id = 26104;SESS3> column program format a15 truncSESS3> column event format a45SESS3> @lock.sql

! Locks and other identifiable waits should be monitored and detected

! Business rules can dictate approach to resolution" Blocker blocking over " seconds is killed" Success depends on resiliency of application's connection pool! ORA‐00028: your session has been killed

Page 68: Breaking Oracle

A manifesto for DBA sanity andstatistics‐based monitoring

! Leverage history in AWR" dba_hist_active_sess_history

! Monitoring should:" examine current waits and statistics (all)" determine if current values fall outside STDDEV for

! Same time yesterday! Same time last week! Same time last month! Same time last year

! Exposes more real problems than static thresholds! Space management example" dba_hist_seg_stat

Page 69: Breaking Oracle

The paramount importance of ASH

! ASH/AWR are extra‐cost options

! Inappropriate, bad decision"More peoples' systems will run poorly

" Those who make money decisions don't understand the importance of this data

! Time‐series of waits per session per SQL

! Extremely rich" explore v$active_session_history

Page 70: Breaking Oracle

SASH

! Kyle Hailey (Perfvision) write a v$ view‐based pretty‐good ASH

! Easy to install/use:oracle@db02$ wget \

http://www.perfvision.com/ftp/sashpack151107.shoracle@db02$ wget http://www.perfvision.com/sql/ash/ashstart.sqloracle@db02$ vi sashpack151107.sh #edit for your envSQL> create user sash identified by sash

default tablespace users temporary tablespace temp;SQL> grant connect, resource, analyze any, create table,

create view, alter session, create sequence, create database link, unlimited tablespace, create public database link to sash;

oracle@db02$ sh sashpack151107.sh

! Provides most of ASH features" SASH.SASH_% tables

Page 71: Breaking Oracle

Detect and resolvesingle‐session hangs (cont.)

! Session hang on uninstrumented operationSESS1> alter session set "_optimizer_search_limit" = 100;SESS1> select sys_context('USERENV','SID') from dual;SESS1> select * fromt t1,t t2,t t3,t t4,t t5,t t6,t t7,t t8,t t9,t t10,t t11,t t12,t t13,t t14,t t15,t t16,t t17,t t18,t t19,t t20,t t21,t t22,t t23,t t24,t t25,t t26,t t27,t t28,t t29,t t30,t t31,t t32,t t33,t t34,t t35,t t36,t t37,t t38,t t39,t t40,t t41,t t42,t t43,t t44,t t45,t t46,t t47,t t48,t t49,t t50,t t51,t t52,t t53,t t54,t t55,t t56,t t57,t t58,t t59,t t60,t t61,t t62,t t63,t t64,t t65,t t66,t t67,t t68,t t69,t t70,t t71,t t72,t t73,t t74,t t75,t t76,t t77,t t78,t t79,t t80,t t81,t t82,t t83,t t84,t t85,t t86,t t87,t t88,t t89,t t90,t t91,t t92,t t93,t t94,t t95,t t96,t t97,t t98,t t99;

SQL> @waits

Page 72: Breaking Oracle

Important wait event concept

! Even if SECONDS_IN_WAIT is increasing, this value is only valid over 1 second if STATE = 'WAITING'

! WAITED SHORT TIME with high SECONDS_IN_WAIT is '30)$)"#$,)F$&0

! If the session is hanging, we are stuck in uninstrumented code

Page 73: Breaking Oracle

If stuck in uninstrumented code! Is there ASH data?

SQL> select event, session_state, sum(time_waited) twfrom v$active_session_history where session_id = 123 and sample_time > sysdate - (.25/24) group by event, session_state order by tw;

! Is the session spinning in CPU?SQL> select spid from v$session s, v$process p

where s.paddr = p.addrand s.sid = <SID of hanging session>;

oracle@db02$ ps -eo pid,pcpu,args | sort -n +1 | tail -10

! Obtain errorstack (multiple if spinning)SQL> oradebug set ospid <PID>SQL> oradebug dump errorstack 1SQL> oradebug tracefile_name

! Examine errorstack" Search Metalink" https://metalink.oracle.com/metalink/plsql/ml2_documents.sh

owDocument?p_database_id=NOT&p_id=153788.1! Open an SR

Page 74: Breaking Oracle

Dumps: errorstack

! Call stack trace and process state dump

! Dump for a hanging process or on error

! Levels:" 0: dump error buffer

" 1: 0+call stack" 2: 1+processstate" 3: 2+context area

Page 75: Breaking Oracle

Frontloading an Oracle Support SR

! Main time wasters in an SR are" Back‐and‐forth with diagnostic requests"Waiting for the next ocurrence

" Doing meaningless operations at OWS request

! Frontloading diagnostics" Errorstacks for spins and hangs" Alert log" RDA or ADR package

Page 76: Breaking Oracle

Generating an ADR package! Already took several errorstacks but ADR did not see errors so no incident is registered.

! Trigger a fatal error " Make sure you tell Support which one was "fake"oracle@db02$ kill -SEGV 21999

! Tail the alert log! Package the incident for Support

oracle@db02$ adrciadrci> show homesadrci> set home diag/rdbms/od08adrci> show incidentsadrci> ips create package incident 33801adrci> ips generate package 1 in /opt/oracleoracle@db02$ unzip -l <package>.zip

! Upload the zip file to support when opening SR" Contains everything they need

Page 77: Breaking Oracle

Detect and resolvemulti‐session hangs

! Blocked by an enqueue while holding resources needed by othersSESS1> select customer_id from orders where order_id =

(select max(order_id) from orderswhere order_date > sysdate - 1 and mod(order_id,3) = 0) for update;

SESS2> update orders set order_date = sysdatewhere order_date > sysdate - 1 and mod(order_id,3) = 0;

jeremiah@db01$ ./charbenchSESS3> @locks

! Same rules for automatic lock detection and adaptive monitoring apply

! In addition, specific diagnostics can be taken automatically in problem situations

Page 78: Breaking Oracle

Reactive diagnostics

! Monitors should take appropriate diagnostics" Lock pile‐up should automatically trigger locks script

! In this situation, we can issue the generic hanganalyze dump" Covers a variety of issues of chain of resource posession

SQL> oradebug setmypidSQL> oradebug hanganalyze 1

Page 79: Breaking Oracle

Hanganalyze: Chain of resource custody

! Shows who is waiting for whom and on what" Level 1‐2: dependency graph and hang chains

" Level 3: add process dumps of hanging sessions

" Level 4: add process dumps of blocking sessions

" Level 5: add process dumps of all sessions in chains

" Level 10: dump all processes in instance !

! Easy to read compared to systemstate

Page 80: Breaking Oracle

Hanganalyze dump interpretationState of nodes ([nodenum]/sid/sess_srno/session/state/start/finish/[adjlist]/predecessor): [0]/1/1/0xa6f8b0/IGN/1/2//none [1]/2/1/0xa70230/IGN/3/4//none [3]/4/1/0xa71530/IGN/5/6//none [4]/5/1/0xa71eb0/IGN/7/8//none [5]/6/1/0xa72830/IGN/9/10//none [6]/7/1/0xa731b0/IGN/11/12//none [7]/8/1/0xa73b30/IGN/13/14//none [8]/9/1/0xa744b0/IGN_DMP/15/18/[130]/none [9]/10/1/0xa74e30/IGN/19/20//none [10]/11/4202/0xa757b0/IGN/21/22/[130]/none [11]/12/1196/0xa76130/NLEAF/23/28/[49]/none [12]/13/1/0xa76ab0/IGN/29/30/[130]/none [37]/38/37/0xa85830/NLEAF/73/76/[50]/46 [46]/47/15/0xa8adb0/NLEAF/91/92/[37][50]/none

! Now take a look at our hanganalyze dump

Page 81: Breaking Oracle

Detect and resolvemulti‐session hangs (cont.)

! Essential troubleshooting Q: who is at the head of the line and what are they doing?

! Inter‐component chain of dependenciesSESS1> update customers set credit_limit = 1000

where cust_last_name = 'Edwards';SESS2> alter session set ddl_lock_timeout = 10000;SESS2> alter table customers modify

(cust_first_name null);SESS3> @waits

! Save the above script as waits.sql! Chains of resource holding also visible in v$wait_chains ‐ the online hanganalyze view

Page 82: Breaking Oracle

11! V$WAIT_CHAINS! Like hanganalyze! Enqueues act funny

select chain_id, sid, blocker_sid, in_wait, wait_event_text, in_wait_secsfrom v$wait_chains;

! Available for monitoring! Column names are better in this view! Cause some library cache contentionSQL> alter session set ddl_lock_timeout = 10000;SQL> beginloop

execute immediate 'alter table soe.orders modify (order_mode varchar2(12))';execute immediate 'alter table soe.orders modify (order_mode varchar2(8))';

end loop;end;/SQL> @waits

! K%,5@

Page 83: Breaking Oracle

11! V$WAIT_CHAINS

! Library cache contentionSQL> select chain_id, count(*) ct from v$wait_chainshaving count(*) > 1 group by chain_id order by ct;SQL> select chain_id, sid, blocker_sid, wait_event_text

from v$wait_chains wc, (select chain_id cid, count(*) ct from v$wait_chains having count(*) > 1 group by chain_id order by ct) m

where m.cid = wc.chain_id order by chain_id;

! Here we see real chains! LC pin/lock holder queries used to be poor! Now we can focus on blockers at head of chain

Page 84: Breaking Oracle

Whole‐instance hangs

! Hang CTWR again:SQL> alter system set db_recovery_file_dest='/opt/oracle'

scope=both;SQL> shutdown immediateSQL> startupSQL> alter database enable block change tracking

using file '/mnt/remote/bct.ora';oracle@db02$ ./charbenchroot@db01# service nfs stopSQL> alter system checkpoint;SQL> @waits

! L85(5)?#+-()35#6)1()-%):L>M@

Page 85: Breaking Oracle

Chasing CTWR's problem

! Check CTWR's waitsSQL> select sid, program, event, state,

seconds_in_wait, blocking_sessionfrom v$session where program like '%CTWR%'

! Shows waiting on I/O, look for causeroot@db02# lsof -p <CTWR pid>root@db02# truss -p <CTWR pid>

! Could still report to Oracle as enhancement request

! Follow same errorstack procedure to file SR

Page 86: Breaking Oracle

Emergency diagnostics

! Things to get if a restart is imminent" create table sav_ash as select * from v$active_session_history;

" create table sav_hang as select * from v$wait_chains;

" hanganalyze level 3 (x3)

" errorstacks of any blocking/hanging processes

! If you can't log in" > 10! has 'prelim' connection

" 9% and lower there is still a way

Page 87: Breaking Oracle

10! and 11!: When you can't log in

! Use the prelim connectionoracle@od08$ sqlplus /nologSQL> set _prelim onSQL> connect / as sysdbaSQL> oradebug setmypidSQL> oradebug direct_access enable traceSQL> oradebug direct_access disable replySQL> oradebug direct_access set content_type = 'text/plain'SQL> oradebug direct_access select * from x$ksdhng_chainsSQL> oradebug tracefile_name

! Any x$ view can be specified! Get ASH and other data as neede! Use v$fixed_view_definition

SQL> select view_definition from v$fixed_view_definitionwhere view_name = 'gv$active_session_history';

Page 88: Breaking Oracle

9%: When you can't log in

! Use a symbolic debugger on an active process

! Attach and call ksudss (dump system state)oracle@db02$ ps -eo pid,pcpu,args | sort -n +1 | tail -10oracle@db02$ gdb $ORACLE_HOME/bin/oracle 11559(gdb) call ksudss(10)(gdb) detach(gdb) q

! Process is stopped while attached

! Same available for other dumps with some research

Page 89: Breaking Oracle

Detect and diagnoseserver process spin

! Hang and spin in regexp bugSQL> select 1 from dual where regexp_like(' ','^*[ ]*a');

! Monitoring should detect runaways with ps" Escalating time with high CPUoracle@db02$ ps -eo pid,pcpu,time,args

! Dump errorstack to find culprit SQLoracle@db02$ ps -eo pid,pcpu,args | sort -n +1 | tail -10SQL> oradebug setospid 13120SQL> oradebug dump errorstack 1SQL> oradebug tracefile_name

! Try call stack search! Open SR, search metalink for issue

Page 90: Breaking Oracle

Detect and diagnose:server process spins

! Hang and spin on uninstrumented operationjeremiah@db01$ sqlplus sys@od08 as sysdbaSQL> grant create any directory to soe;SESS1> create directory mydir as '/tmp';oracle@db02$ mknod /tmp/myfile pSESS1> create table mytab (a number)

organization external (type oracle_loader default directory mydiraccess parameters (records delimited by newline fields terminated by ','(a)) location ('myfile'));

Table created.SESS1> select * from mytab;SQL> @waitsoracle@db02$ ps -eo pid,pcpu,args | sort -n +1 | tail -10

! Class exercise: detect and diagnose

Page 91: Breaking Oracle

Detect and diagnose:background process spins

oracle@db02$ ps -eo pid,s,args | grep ora_arcoracle@db02$ kill -STOP `ps -eo pid,args | grep ora_arc \

| grep -v grep | awk '{print $1}'`oracle@db02$ ps -eo pid,s,args | grep ora_arcSQL> select group#, sequence#, archived, status from v$log

order by sequence#;SQL> alter system switch logfile;SQL> alter system switch logfile;SQL> alter system switch logfile;oracle@db02 $ ps -eo pid,pcpu,args | sort -n +1 | tail -10SQL> column event format a45SQL> select event, state, seconds_in_wait from v$session

where type = 'BACKGROUND' and program like '%LGWR%';

Page 92: Breaking Oracle

Detect and diagnose:bug that raises ORA‐00600

SQL> select 1 from sys.table_privileges tp, user_objects uowhere tp.grantee in

(select 1 from sys.dba_role_privsconnect by prior granted_role = granteestart with grantee = 'scott');

! Detection for these fatal errors requires alert log and application log monitoring

! search metalink for kkqcbydrv:1

Page 93: Breaking Oracle

Detect and diagnose: bug that raises ORA‐00600

(as sys)SQL> create table t_6310653

(oid number, nm varchar(128), snm varchar(128));SQL> create table r_6310653

(cid number, oid number, sc number, dst number, rw number);

SQL> insert into t_6310653 values (1, 'foo', 'foo');SQL> commit;SQL> insert into r_6310653 (cid, rw, oid, sc)

with ors as(select oid, score(1) rs from t_6310653where contains(nm, '"foo"', 1) > 0order by rs desc, nm, oid)

select 13, rownum, ors.oid, ors.rsfrom ors where rownum <= 100;

Page 94: Breaking Oracle

Detect and diagnose:bug that raises ORA‐00600

SQL> select hh24.hh24, nvl (sum(blocks*block_size),0)from (select trunc(sysdate-7 + (rownum-1)/24, 'hh24') hh24

from dual connect by trunc(sysdate-7+(rownum-1)/24,'hh24')<=sysdate) hh24left outer join (

select thread#,sequence#,next_time, max(blocks) blocks,max(block_size) block_sizefrom v$archived_logwhere dest_id = 1 group by thread#,sequence#, next_time)

on hh24.hh24=trunc(next_time, 'hh24')group by hh24.hh24order by 1;

Page 95: Breaking Oracle

Detect and diagnose: ORA‐07445 bugSQL> create table t2(col1 varchar2(60));SQL> create table t1(c1 varchar2(60),

c2 varchar2(1),c3 varchar2(60),c4 varchar2(60));

SQL> explain plan forselect 1 from t1 a, t2 b ,t1 cwhere b.col1 = 'xxslc_department'and a.c1 not between c.c3 and c.c4start with a.c2='p'connect by prior a.c1 between a.c3 and a.c4;

! Raises ORA‐HIIHJ)(%)?5)3%%G)+&)#35,-)3%A@

! Search Metalink for qeesTraverseExpr

Page 96: Breaking Oracle

Detect and diagnose:Instance crashes

! Cause fatal errors in essential background processesSQL> select pid, program, background from v$process

where background = 1;SQL> oradebug setorapid 16SQL> oradebug call kgeasnmierr 4455547624 18446744071472029760

18446744071562043788 2 1 1

! Alert log monitoring is essential to catch this! Availability monitoring finds down instances! Diagnose like any other errored process! Higher support severity

Page 97: Breaking Oracle

Detect and diagnose: Logical corruption

! More accurate restore that doesn't clobber newer txnsSQL> set feedback offSQL> update customers set cust_first_name = 'Nimrod'

where rownum < 1000;SQL> commit;SQL> set feedback onSQL> select versions_startscn, versions_endscn, versions_xid

from customers versions between timestamp sysdate-(.25/24) and sysdatewhere cust_first_name = 'Nimrod';

SQL> select undo_sql from flashback_transaction_querywhere xid = '00090015000003A1'

! Detection: generally user‐detected" Other ideas: triggers, resource limits

Page 98: Breaking Oracle

Detect and diagnose:logfile corruption

! Find the current logsSQL> column member format a55SQL> select l.group#, member, archived, l.status

from v$log l, v$logfile lf where l.group# = lf.group# order by l.group#, member;

! Corrupt both of the current logs$ dd if=$ORACLE_HOME/bin/oracle \of=/opt/oracle/oradata/od08/redo01.log bs=512 oseek=1 \count=100 conv=notrunc

$ dd if=$ORACLE_HOME/bin/oracle \of=/opt/oracle/oradata/od08/redo01a.log bs=512 oseek=1 \count=100 conv=notrunc

! Would be caught (or good log copies made) immediately by a standby

Page 99: Breaking Oracle

Logfile corruption (cont.)

! Crash  and restart the instanceSQL> shutdown abortSQL> startup

! Check the alert log

! Possible resolution paths" Skip corruptions" Open an inconsistent DB" Data unloader" Patch the logfiles

Page 100: Breaking Oracle

Review objectives! Learn to induce realistic application load in test! Learn to create realistic failures and problems! Learn to detect, assess and diagnose problems! Learn appropriate pathways to resolution! Apply critical thinking to emergent problems! Learn to reduce outage times! Learn self‐diagnostics and self‐resolution! Student participation (hands on the keyboard)! Have fun watching Oracle break

Page 101: Breaking Oracle

Next steps

! Internals courses

! Exploration/discovery

! Integration with organizational standards

! Integration with drill days