Oracle Diagnostics Hemant K Chitale

Oracle diagnostics 11g

DESCRIPTION

Oracle Database Diagnostics. Presentation file from November 2010

Page 1: Oracle diagnostics 11g

Oracle Diagnostics

Hemant K Chitale

Page 2: Oracle diagnostics 11g

Hemant K Chitale

• whoami ?
• Oracle 5 to Oracle 10gR2 : DOS, Xenix, 8 flavours of Unix, Linux, Windows
• Financial Services, Govt/Not-for-Profit, ERP, Custom
• Production Support, Consulting, Development
• A DBA, not a Developer
• Product Specialist, Standard Chartered Bank
• My Oracle Blog http://hemantoracledba.blogspot.com

Page 3: Oracle diagnostics 11g

Outline

Running Sessions

Tracing

Other Debugging

• no “hands-on”

Page 4: Oracle diagnostics 11g

Licensing

• The OTN License : The OTN Standard Licence
• We grant you a nonexclusive, nontransferable limited license to use the programs only for the purpose of developing, testing, prototyping and demonstrating your application, and not for any other purpose.
• You may not:
  - use the programs for your own internal data processing or for any commercial or production purposes, or use the programs for any purpose except the development of your application;
  - use the application you develop with the programs for any internal data processing or commercial or production purposes without securing an appropriate license from us;

• The Diagnostic Pack of Oracle Enterprise Manager : Oracle Diagnostics Pack

Page 5: Oracle diagnostics 11g

Diagnostics for Running Sessions

• Long Running SQL Statements
• Latches and Enqueues
• Locks and LockTrees
• “Runaway” Processes

Page 6: Oracle diagnostics 11g

Long Running SQLs- 1

• How do you use LAST_CALL_ET ?

A simple query might be :

select s.sid, s.serial#, s.program, s.machine, s.last_call_et, p.spid
from v$session s, v$process p
where s.paddr = p.addr
and s.last_call_et > 30 -- session active more than 30 seconds
and s.status = 'ACTIVE'
and s.type != 'BACKGROUND'
and s.program not like 'oracle@%P0%'
order by s.last_call_et desc, s.sid ;

Caveat : LAST_CALL_ET is reset at each *call* from a client.

Page 7: Oracle diagnostics 11g

Example 1 : (when LAST_CALL_ET cannot flag a long running query)

I ran a query :

09:20:28 SQL> l

1* select * from my_large_table

09:20:29 SQL> /

which returned

1149240 rows selected.

Elapsed: 00:42:16.68

10:02:55 SQL>

in about 42 minutes.

However, monitoring the session, using LAST_CALL_ET and STATUS=‘ACTIVE’ didn’t flag it as a long running SQL :

Page 8: Oracle diagnostics 11g

09:20:43 SQL> l
  1  select status, sql_id, last_call_et, event, seq#, state, seconds_in_wait, wait_time_micro
  2  from v$session
  3* where username = 'HEMANT'
09:20:44 SQL> /

STATUS   SQL_ID        LAST_CALL_ET EVENT                             SEQ# STATE
-------- ------------- ------------ --------------------------- ---------- -------------------
SECONDS_IN_WAIT WAIT_TIME_MICRO
--------------- ---------------
INACTIVE 6b80y82aqw9vm            0 SQL*Net message from client        243 WAITING
              0           19555

09:20:44 SQL>
09:26:16 SQL> /

STATUS   SQL_ID        LAST_CALL_ET EVENT                             SEQ# STATE
-------- ------------- ------------ --------------------------- ---------- -------------------
SECONDS_IN_WAIT WAIT_TIME_MICRO
--------------- ---------------
INACTIVE 6b80y82aqw9vm            0 SQL*Net message from client      12096 WAITING
              0           18256

09:26:17 SQL>

Page 9: Oracle diagnostics 11g

Only *after* the query had ended did I see LAST_CALL_ET incrementing :

10:02:03 SQL> /

STATUS   SQL_ID        LAST_CALL_ET EVENT                             SEQ# STATE
-------- ------------- ------------ --------------------------- ---------- -------------------
SECONDS_IN_WAIT WAIT_TIME_MICRO
--------------- ---------------
INACTIVE 6b80y82aqw9vm            0 SQL*Net message from client      25516 WAITING
              0           13248

10:02:03 SQL>
10:03:24 SQL> /

STATUS   SQL_ID        LAST_CALL_ET EVENT                             SEQ# STATE
-------- ------------- ------------ --------------------------- ---------- -------------------
SECONDS_IN_WAIT WAIT_TIME_MICRO
--------------- ---------------
INACTIVE                         31 SQL*Net message from client      27469 WAITING
             30        29805792

10:03:25 SQL>

Page 10: Oracle diagnostics 11g

10:03:33 SQL> /

STATUS   SQL_ID        LAST_CALL_ET EVENT                             SEQ# STATE
-------- ------------- ------------ --------------------------- ---------- -------------------
SECONDS_IN_WAIT WAIT_TIME_MICRO
--------------- ---------------
INACTIVE                         39 SQL*Net message from client      27469 WAITING
             38        37820708

10:03:33 SQL> /

STATUS   SQL_ID        LAST_CALL_ET EVENT                             SEQ# STATE
-------- ------------- ------------ --------------------------- ---------- -------------------
SECONDS_IN_WAIT WAIT_TIME_MICRO
--------------- ---------------
INACTIVE                         49 SQL*Net message from client      27469 WAITING
             48        47980099

10:03:43 SQL>

Page 11: Oracle diagnostics 11g

Thus, LAST_CALL_ET was now showing true inactive time (39 seconds, then 49 seconds, in step with SECONDS_IN_WAIT at 38 and 48). Notice that SEQ# is no longer incrementing: no new waits are arising.

What had been happening earlier was that the session had been rapidly transiting between “ACTIVE” and “INACTIVE”, with a new wait (SEQ# being incremented) for “SQL*Net message from client” each time.
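As a rough check on how quickly the session was cycling through waits, the SEQ# values from the V$SESSION listings above can be turned into a rate. This is only a sketch: it assumes the SQL*Plus prompt times are close enough to the actual sample times, and the helper name is illustrative rather than anything Oracle provides.

```python
from datetime import datetime

# (prompt time, SEQ#) pairs copied from the V$SESSION listings above
samples = [
    ("09:20:44", 243),
    ("09:26:16", 12096),
    ("10:02:03", 25516),
]

def waits_per_second(earlier, later):
    """Average number of new waits (SEQ# increments) per second between two samples."""
    t0 = datetime.strptime(earlier[0], "%H:%M:%S")
    t1 = datetime.strptime(later[0], "%H:%M:%S")
    seconds = (t1 - t0).total_seconds()
    return (later[1] - earlier[1]) / seconds

rates = [waits_per_second(a, b) for a, b in zip(samples, samples[1:])]
for (start, _), (end, _), rate in zip(samples, samples[1:], rates):
    print(f"{start} -> {end}: {rate:.1f} new waits per second")
```

Every one of those samples showed STATUS = INACTIVE and LAST_CALL_ET = 0, yet the session was recording several new waits each second, which is why a filter on LAST_CALL_ET alone never catches it.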

Page 12: Oracle diagnostics 11g

Here is an extract from the trace file :

PARSING IN CURSOR #3 len=28 dep=0 uid=184 oct=3 lid=184 tim=1286557710299640 hv=2507024243 ad='32643628' sqlid='6b80y82aqw9vm'

select * from my_large_table

END OF STMT

PARSE #3:c=63990,e=67645,p=213,cr=125,cu=0,mis=1,r=0,dep=0,og=1,plh=1177583212,tim=1286557710299634

EXEC #3:c=0,e=38,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1177583212,tim=1286557710299770

WAIT #3: nam='SQL*Net message to client' ela= 12 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710299985

WAIT #3: nam='direct path read' ela= 477 file number=4 first dba=17715 block cnt=13 obj#=85340 tim=1286557710301366

WAIT #3: nam='direct path read' ela= 728 file number=4 first dba=17729 block cnt=15 obj#=85340 tim=1286557710303388

FETCH #3:c=3999,e=3709,p=28,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=1177583212,tim=1286557710303755

WAIT #3: nam='SQL*Net message from client' ela= 1418 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710305396

WAIT #3: nam='SQL*Net message to client' ela= 25 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710305640

FETCH #3:c=0,e=206,p=0,cr=1,cu=0,mis=0,r=25,dep=0,og=1,plh=1177583212,tim=1286557710305717

WAIT #3: nam='SQL*Net message from client' ela= 63948 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710369770

WAIT #3: nam='SQL*Net message to client' ela= 5 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710370018

FETCH #3:c=0,e=117,p=0,cr=1,cu=0,mis=0,r=25,dep=0,og=1,plh=1177583212,tim=1286557710370067

WAIT #3: nam='SQL*Net message from client' ela= 51133 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710421261

WAIT #3: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710421421
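The WAIT lines in a trace extract like this can be summed per event to see where the time goes. The sketch below is a minimal, hypothetical parser (not tkprof and not an Oracle-supplied tool); it assumes the `WAIT #n: nam='...' ela= N` layout shown above, with `ela` in microseconds, and uses lines copied from the extract.

```python
import re
from collections import defaultdict

# Lines copied from the trace extract above (FETCH lines are included to show they are skipped)
TRACE_EXTRACT = """\
WAIT #3: nam='SQL*Net message to client' ela= 12 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710299985
WAIT #3: nam='direct path read' ela= 477 file number=4 first dba=17715 block cnt=13 obj#=85340 tim=1286557710301366
WAIT #3: nam='direct path read' ela= 728 file number=4 first dba=17729 block cnt=15 obj#=85340 tim=1286557710303388
FETCH #3:c=3999,e=3709,p=28,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=1177583212,tim=1286557710303755
WAIT #3: nam='SQL*Net message from client' ela= 1418 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710305396
WAIT #3: nam='SQL*Net message to client' ela= 25 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710305640
FETCH #3:c=0,e=206,p=0,cr=1,cu=0,mis=0,r=25,dep=0,og=1,plh=1177583212,tim=1286557710305717
WAIT #3: nam='SQL*Net message from client' ela= 63948 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710369770
WAIT #3: nam='SQL*Net message to client' ela= 5 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710370018
FETCH #3:c=0,e=117,p=0,cr=1,cu=0,mis=0,r=25,dep=0,og=1,plh=1177583212,tim=1286557710370067
WAIT #3: nam='SQL*Net message from client' ela= 51133 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710421261
WAIT #3: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=85340 tim=1286557710421421
"""

WAIT_LINE = re.compile(r"^WAIT #\d+: nam='(?P<event>[^']+)' ela= *(?P<ela>\d+)")

def total_wait_micros(trace_text):
    """Sum the ela (elapsed microseconds) of every WAIT line, grouped by event name."""
    totals = defaultdict(int)
    for line in trace_text.splitlines():
        match = WAIT_LINE.match(line)
        if match:
            totals[match.group("event")] += int(match.group("ela"))
    return dict(totals)

totals = total_wait_micros(TRACE_EXTRACT)
for event, micros in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{event:<30} {micros:>8} us")
```

Even in this short extract, the time spent waiting on the client dwarfs the disk reads and the sends to the client.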

Page 13: Oracle diagnostics 11g

And this is what the tkprof shows :

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    45971      7.86       8.08      16385      61722          0     1149240
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    45973      7.87       8.09      16385      61723          0     1149240

Rows     Row Source Operation
-------  ---------------------------------------------------
1149240  TABLE ACCESS FULL MY_LARGE_TABLE (cr=61722 pr=16385 pw=0 time=12818149 us cost=4502 size=192733146 card=931078)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   45971        0.00          0.44
  direct path read                             1035        0.00          0.32
  SQL*Net message from client                 45971        0.60       2442.69
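The Fetch count follows from the fetch size. The trace shows the first FETCH returning 1 row (r=1) and later ones returning 25 (r=25). The sketch below is a simplified model under that assumption; `fetch_calls` is a hypothetical helper, and the 100 and 500 row sizes are only there for comparison.

```python
def fetch_calls(total_rows, arraysize, first_fetch_rows=1):
    """Number of FETCH calls, assuming the first fetch returns first_fetch_rows
    and every later fetch returns up to arraysize rows."""
    if total_rows <= first_fetch_rows:
        return 1
    remaining = total_rows - first_fetch_rows
    return 1 + -(-remaining // arraysize)  # ceiling division

TOTAL_ROWS = 1149240           # rows selected in Example 1
CLIENT_WAIT_SECONDS = 2442.69  # "SQL*Net message from client" total from tkprof

calls_at_25 = fetch_calls(TOTAL_ROWS, 25)
mean_client_wait = CLIENT_WAIT_SECONDS / calls_at_25

print(f"fetch calls at 25 rows per fetch: {calls_at_25}")
print(f"mean wait on the client per fetch: {mean_client_wait * 1000:.1f} ms")
for arraysize in (25, 100, 500):
    print(f"rows per fetch = {arraysize:>3}: {fetch_calls(TOTAL_ROWS, arraysize)} fetch calls")
```

The model reproduces the 45971 fetches reported by tkprof. Each of those fetches is a separate call, so each one resets LAST_CALL_ET.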

Page 14: Oracle diagnostics 11g

Example 2 : Using LAST_CALL_ET to monitor an SQL that runs in the database.

I ran this SQL :

07:31:10 SQL> create table another_large_table as select * from my_large_table;

Table created.

07:32:11 SQL>

This SQL runs entirely in the database.

Page 15: Oracle diagnostics 11g

Monitoring shows :

07:30:45 SQL> l
  1  select status, sql_id, last_call_et, event, seq#, state, seconds_in_wait, wait_time_micro
  2  from v$session
  3* where username = 'HEMANT'
07:30:50 SQL> /

STATUS   SQL_ID        LAST_CALL_ET EVENT                             SEQ# STATE
-------- ------------- ------------ --------------------------- ---------- -------------------
SECONDS_IN_WAIT WAIT_TIME_MICRO
--------------- ---------------
INACTIVE                         45 SQL*Net message from client         67 WAITING
             45        45330503

07:30:51 SQL>
07:31:41 SQL> /

STATUS   SQL_ID        LAST_CALL_ET EVENT                             SEQ# STATE
-------- ------------- ------------ --------------------------- ---------- -------------------
SECONDS_IN_WAIT WAIT_TIME_MICRO
--------------- ---------------
ACTIVE   4yf5v6kwvy5am           23 buffer busy waits                  909 WAITING
              0          229769

Page 16: Oracle diagnostics 11g

07:31:42 SQL> /

STATUS   SQL_ID        LAST_CALL_ET EVENT                             SEQ# STATE
-------- ------------- ------------ --------------------------- ---------- -------------------
SECONDS_IN_WAIT WAIT_TIME_MICRO
--------------- ---------------
ACTIVE   4yf5v6kwvy5am           40 Data file init write              1381 WAITING
              0           49629

07:31:59 SQL>
07:33:20 SQL>
07:33:22 SQL> /

STATUS   SQL_ID        LAST_CALL_ET EVENT                             SEQ# STATE
-------- ------------- ------------ --------------------------- ---------- -------------------
SECONDS_IN_WAIT WAIT_TIME_MICRO
--------------- ---------------
INACTIVE 4yf5v6kwvy5am           72 SQL*Net message from client       2136 WAITING
             71        71039815

07:33:22 SQL>

Page 17: Oracle diagnostics 11g

07:35:20 SQL> /

STATUS   SQL_ID        LAST_CALL_ET EVENT                             SEQ# STATE
-------- ------------- ------------ --------------------------- ---------- -------------------
SECONDS_IN_WAIT WAIT_TIME_MICRO
--------------- ---------------
INACTIVE 4yf5v6kwvy5am          190 SQL*Net message from client       2136 WAITING
            190       189650841

07:35:21 SQL>

Page 18: Oracle diagnostics 11g

This is from the tkprof :

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.03          0          1          0           0
Execute      1      2.34       5.11      16385      16638      19413     1149240
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      2.35       5.15      16385      16639      19413     1149240

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 184

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  LOAD AS SELECT (cr=17253 pr=16385 pw=16384 time=0 us)
1149240   TABLE ACCESS FULL MY_LARGE_TABLE (cr=16389 pr=16385 pw=0 time=5145273 us cost=4502 size=192733146 card=931078)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path read                             1035        0.18          0.47
  direct path write                             522        0.01          0.21
  log buffer space                                1        0.02          0.02
  log file switch completion                      2        0.07          0.07
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00

Page 19: Oracle diagnostics 11g

Learnings :

From the first Example :

1. The user/client SQL*Plus session noted an execution time of close to 42 minutes. But the database server noted an execution time of only 8.09 seconds.

2. Querying by LAST_CALL_ET would never have flagged this as a long running query.

3. 2442 seconds are lost on SQL*Net message waits. The database server process is NOT ‘ACTIVE’ and LAST_CALL_ET gets reset to 0 at each of these waits.

From the second Example :

4. When the SQL runs on the server, although waits do change while it is running, LAST_CALL_ET does not get reset. This is because all the waits are within the one call.

When running PL/SQL :

5. LAST_CALL_ET counts from the start of the “top” (top-level) procedure call, not from each SQL statement within it.
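The first three learnings can be put into numbers using only the figures reported for Example 1. This is a small arithmetic sketch; the variable names are illustrative.

```python
def elapsed_to_seconds(hhmmss):
    """Convert a SQL*Plus 'Elapsed: HH:MI:SS.ss' value to seconds."""
    hours, minutes, seconds = hhmmss.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

client_elapsed = elapsed_to_seconds("00:42:16.68")  # as seen by the SQL*Plus client
server_elapsed = 8.09    # tkprof total elapsed for the statement
client_wait = 2442.69    # tkprof "SQL*Net message from client"

server_share = server_elapsed / client_elapsed
wait_share = client_wait / client_elapsed

print(f"client elapsed         : {client_elapsed:.2f} s")
print(f"server call time       : {server_elapsed:.2f} s ({server_share:.1%})")
print(f"waiting on the client  : {client_wait:.2f} s ({wait_share:.1%})")
```

Well under 1% of what the user experienced was database call time; roughly 96% was the server process waiting for the client to ask for the next batch of rows.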

Page 20: Oracle diagnostics 11g

Long Running SQLs- 2

• How do you use V$SQL_MONITOR ?

A simple query might be :

select username, sid, sql_id, action, elapsed_time, fetches, buffer_gets
from v$sql_monitor
where status = 'EXECUTING';

Page 21: Oracle diagnostics 11g

Long Running SQLs- 3

• How do you use V$SESSION_LONGOPS ?

A simple query might be :

select sid, opname, target, sofar, totalwork, units,
       to_char(start_time,'HH24:MI:SS') StartTime,
       elapsed_seconds, time_remaining, message, username
from v$session_longops
where sofar != totalwork
order by start_time ;

Caveat : It is based on *operations*, not on the executing SQL. Thus, it gets reset at each operation. (The view is populated only if the operation takes 6 seconds or more.)

Page 22: Oracle diagnostics 11g

Example 3 : Using V$SESSION_LONGOPS without PQ

I ran a query :

08:01:26 SQL> create table a_large_table
08:01:44   2  as select * from my_large_table union all select * from my_large_table
08:02:17   3  union all select * from my_large_table;

Table created.

08:03:06 SQL>

This is the Execution Plan :

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 893555804

-----------------------------------------------------------------------------------------
| Id  | Operation              | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT |                |  2793K|   551M| 31487  (67)| 00:06:18 |
|   1 |  LOAD AS SELECT        | A_LARGE_TABLE  |       |       |            |          |
|   2 |   UNION-ALL            |                |       |       |            |          |
|   3 |    TABLE ACCESS FULL   | MY_LARGE_TABLE |   931K|   183M|  4502   (1)| 00:00:55 |
|   4 |    TABLE ACCESS FULL   | MY_LARGE_TABLE |   931K|   183M|  4502   (1)| 00:00:55 |
|   5 |    TABLE ACCESS FULL   | MY_LARGE_TABLE |   931K|   183M|  4502   (1)| 00:00:55 |
-----------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

Page 23: Oracle diagnostics 11g

It started appearing in V$SESSION_LONGOPS only after some time :

08:02:02 SQL> l
  1  select sid, sql_plan_line_id, sql_plan_operation, opname, target, sofar, totalwork,
  2  units, to_char(start_time,'HH24:MI:SS') StartTime,
  3  elapsed_seconds, time_remaining, message, username
  4  from v$session_longops
  5  where sofar != totalwork
  6* order by start_time
08:02:41 SQL> /

no rows selected

08:02:43 SQL> /

no rows selected

Page 24: Oracle diagnostics 11g

08:02:48 SQL> /

       SID SQL_PLAN_LINE_ID SQL_PLAN_OPERATION OPNAME
---------- ---------------- ------------------ ----------
TARGET                     SOFAR  TOTALWORK UNITS  STARTTIM
--------------------- ---------- ---------- ------ --------
ELAPSED_SECONDS TIME_REMAINING
--------------- --------------
MESSAGE
-----------------------------------------------------------------
USERNAME
--------
        44                4 TABLE ACCESS       Table Scan
HEMANT.MY_LARGE_TABLE      16054      16556 Blocks 08:02:42
             11              0
Table Scan: HEMANT.MY_LARGE_TABLE: 16054 out of 16556 Blocks done
HEMANT

08:02:54 SQL> /

no rows selected

Page 25: Oracle diagnostics 11g

08:02:57 SQL> /

       SID SQL_PLAN_LINE_ID SQL_PLAN_OPERATION OPNAME
---------- ---------------- ------------------ ----------
TARGET                     SOFAR  TOTALWORK UNITS  STARTTIM
--------------------- ---------- ---------- ------ --------
ELAPSED_SECONDS TIME_REMAINING
--------------- --------------
MESSAGE
-----------------------------------------------------------------
USERNAME
--------
        44                5 TABLE ACCESS       Table Scan
HEMANT.MY_LARGE_TABLE      12902      16556 Blocks 08:02:54
              7              2
Table Scan: HEMANT.MY_LARGE_TABLE: 12902 out of 16556 Blocks done
HEMANT

08:03:03 SQL> /

no rows selected

08:03:09 SQL>

Page 26: Oracle diagnostics 11g

Example 4 : Using V$SESSION_LONGOPS with PQ

I ran this query :

07:47:08 SQL> l
  1* select /*+ FULL (a) PARALLEL (a 2) */ count(*) from a_large_table a
07:47:09 SQL> /

  COUNT(*)
----------
  27581760

07:48:13 SQL> select blocks from user_segments where segment_name = 'A_LARGE_TABLE';

    BLOCKS
----------
    394112

(approx 3GB)

Page 27: Oracle diagnostics 11g

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------
Plan hash value: 3384045684

-------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name          | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |               |     1 |  7451   (1)| 00:01:30 |        |      |            |
|   1 |  SORT AGGREGATE        |               |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |               |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000      |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |               |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |               |  3447K|  7451   (1)| 00:01:30 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| A_LARGE_TABLE |  3447K|  7451   (1)| 00:01:30 |  Q1,00 | PCWP |            |
-------------------------------------------------------------------------------------------------------------

Page 28: Oracle diagnostics 11g

Querying for LONGOPS :

07:48:04 SQL> /

       SID SQL_PLAN_LINE_ID SQL_PLAN_OPERATION OPNAME           TARGET                    SOFAR  TOTALWORK UNITS  STARTTIM ELAPSED_SECONDS TIME_REMAINING
---------- ---------------- ------------------ ---------------- -------------------- ---------- ---------- ------ -------- --------------- --------------
MESSAGE
----------------------------------------------------------------------
USERNAME
--------
        40                6 TABLE ACCESS       Rowid Range Scan HEMANT.A_LARGE_TABLE      14675      15141 Blocks 07:47:57               7              0
Rowid Range Scan: HEMANT.A_LARGE_TABLE: 14675 out of 15141 Blocks done
HEMANT

07:48:04 SQL> /

no rows selected

07:48:07 SQL> /

       SID SQL_PLAN_LINE_ID SQL_PLAN_OPERATION OPNAME           TARGET                    SOFAR  TOTALWORK UNITS  STARTTIM ELAPSED_SECONDS TIME_REMAINING
---------- ---------------- ------------------ ---------------- -------------------- ---------- ---------- ------ -------- --------------- --------------
MESSAGE
----------------------------------------------------------------------
USERNAME
--------
        40                6 TABLE ACCESS       Rowid Range Scan HEMANT.A_LARGE_TABLE      12801      15141 Blocks 07:48:04               8              1
Rowid Range Scan: HEMANT.A_LARGE_TABLE: 12801 out of 15141 Blocks done
HEMANT

07:48:13 SQL>

Page 29: Oracle diagnostics 11g

Learnings :

1. Each step in the Execution Plan is a separate Operation. If a Full Table Scan appears more than once OR is executed more than once (e.g. inside a Nested Loop), each execution of that step (operation) is a separate entry in V$SESSION_LONGOPS

2. When ParallelQuery is used, each PQ Slave is allocated a certain number of blocks by the QueryCoordinator (e.g. 10,000 blocks). A scan of this range of blocks shows as a “Rowid Range Scan” in V$SESSION_LONGOPS. That is why you would see multiple occurrences of “Rowid Range Scan” for a large table FullTableScan using PQ, as each Slave restarts with a new set of blocks.

3. So, V$SESSION_LONGOPS will not necessarily tell you the expected duration of the SQL statement if there are multiple operations and/OR multiple passes (e.g. Nested Loop OR ParallelQuery)

4. You can’t monitor an import with V$SESSION_LONGOPS

5. V$SESSION_LONGOPS can hold 500 entries. Inactive entries are not cleared up immediately
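TIME_REMAINING describes only the current operation, which is why it says little about the whole statement. The values in Examples 3 and 4 are consistent with a simple linear extrapolation from SOFAR, TOTALWORK and ELAPSED_SECONDS. Whether Oracle computes it exactly this way is an assumption here; the function below is a hypothetical sketch, checked only against the four rows shown in those listings.

```python
def estimate_time_remaining(sofar, totalwork, elapsed_seconds):
    """Linear extrapolation: the remaining work proceeds at the rate seen so far."""
    if sofar <= 0:
        return None  # no progress yet, so nothing to extrapolate from
    return round(elapsed_seconds * (totalwork - sofar) / sofar)

# (SOFAR, TOTALWORK, ELAPSED_SECONDS, TIME_REMAINING) from Examples 3 and 4
observed = [
    (16054, 16556, 11, 0),  # Table Scan, plan line 4
    (12902, 16556, 7, 2),   # Table Scan, plan line 5
    (14675, 15141, 7, 0),   # Rowid Range Scan
    (12801, 15141, 8, 1),   # Rowid Range Scan
]

estimates = [estimate_time_remaining(s, t, e) for s, t, e, _ in observed]
for (sofar, totalwork, elapsed, reported), estimate in zip(observed, estimates):
    print(f"{sofar}/{totalwork} blocks after {elapsed}s: estimated {estimate}s, reported {reported}s")
```

Each estimate covers one scan of one range of blocks. A statement with three table scans, or a parallel scan split into many rowid ranges, produces a fresh entry and a fresh estimate every time.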

Page 30: Oracle diagnostics 11g

Long Running SQLs- 4

• How do you use V$ACTIVE_SESSION_HISTORY ?

A simple query might be :

select * from (
  select sample_time, sql_id, sql_plan_line_id, sql_plan_operation,
         current_obj#, seq#, event, p1, p2, p3
  from v$active_session_history
  where session_id = '&sid'
  and sample_time > (systimestamp - (10/1440))
  order by sample_id desc )
where rownum < 240
order by sample_time asc ;

Caveat : If the session is not ‘ACTIVE’ (i.e. it is idle) you won’t see any entries. If it is on CPU rather than waiting on an event (e.g. it is doing logical I/O), the samples show no EVENT.

Page 31: Oracle diagnostics 11g

Example 5 : Using V$ACTIVE_SESSION_HISTORY (note : DiagPack Licence reqd !)

I run this query :

08:01:39 SQL> select count(*) from a_large_table;

  COUNT(*)
----------
  27581760

08:03:11 SQL>

Page 32: Oracle diagnostics 11g

10-OCT-10 08.03.04.213 AM    94duwhgx0jh10                2 TABLE ACCESS              85347      47291
direct path read                        4     405072         16

10-OCT-10 08.03.05.213 AM    94duwhgx0jh10                2 TABLE ACCESS              85347      47545
direct path read                        4     409136         16

10-OCT-10 08.03.06.223 AM    94duwhgx0jh10                2 TABLE ACCESS              85347      47839
direct path read                        4     413840         16

10-OCT-10 08.03.07.223 AM    94duwhgx0jh10                2 TABLE ACCESS              85347      48053
direct path read                        4     417264         16

10-OCT-10 08.03.08.223 AM    94duwhgx0jh10                2 TABLE ACCESS              85347      48260
direct path read                        4     420576         16

10-OCT-10 08.03.09.223 AM    94duwhgx0jh10                2 TABLE ACCESS              85347      48617
direct path read                        4     426288         16

10-OCT-10 08.03.10.233 AM    94duwhgx0jh10                2 TABLE ACCESS              85347      48990
direct path read                        4     432256         16

85 rows selected.

08:04:31 SQL> l
  1  select * from (
  2  select sample_time, sql_id, sql_plan_line_id, sql_plan_operation, current_obj#, seq#, event, p1,p2,p3 from v$active_session_history where session_id='&sid'
  3  and sample_time > (systimestamp-(10/1440)) order by sample_id desc )
  4  where rownum < 240
  5* order by sample_time asc
08:04:54 SQL>

Page 33: Oracle diagnostics 11g

Another Example : A query with GROUP BY, ORDER BY

03:42:36 SQL> select country, store_type, count(*)
03:42:46   2  from store_list
03:42:48   3  group by country, store_type
03:42:52   4  order by country, store_type
03:42:56   5  /

343 rows selected.

03:43:53 SQL>

Page 34: Oracle diagnostics 11g

Querying V$ACTIVE_SESSION_HISTORY :

SAMPLE_TIME                  SQL_ID        SQL_PLAN_LINE_ID SQL_PLAN_OPERATION CURRENT_OBJ#       SEQ#
---------------------------- ------------- ---------------- ------------------ ------------ ----------
EVENT                                  P1         P2         P3
------------------------------ ---------- ---------- ----------
11-OCT-10 03.43.08.660 AM    dnhs0zc9ub960                2 TABLE ACCESS              85347       8623
direct path read                        4      42704         16

11-OCT-10 03.43.09.660 AM    dnhs0zc9ub960                2 TABLE ACCESS              85347       9180
direct path read                        4      51616         16

11-OCT-10 03.43.10.660 AM    dnhs0zc9ub960                1 SORT                      85347       9443
                                        4      61456         16

11-OCT-10 03.43.11.660 AM    dnhs0zc9ub960                2 TABLE ACCESS              85347       9851
direct path read                        4      67984         16

11-OCT-10 03.43.12.670 AM    dnhs0zc9ub960                2 TABLE ACCESS              85347      10211
direct path read                        4      73744         16

11-OCT-10 03.43.13.670 AM    dnhs0zc9ub960                2 TABLE ACCESS              85347      10593
                                        4      79856         16
...

Page 35: Oracle diagnostics 11g

11-OCT-10 03.43.17.680 AM    dnhs0zc9ub960                1 SORT                      85347      12548
                                        4     111136         16

11-OCT-10 03.43.18.680 AM    dnhs0zc9ub960                2 TABLE ACCESS              85347      13118
                                        4     120256         16

11-OCT-10 03.43.19.680 AM    dnhs0zc9ub960                2 TABLE ACCESS              85347      13579
                                        4     127632         16

11-OCT-10 03.43.20.680 AM    dnhs0zc9ub960                2 TABLE ACCESS              85347      14100
direct path read                        4     135968         16

11-OCT-10 03.43.21.690 AM    dnhs0zc9ub960                2 TABLE ACCESS              85347      14422
direct path read                        4     141120         16

11-OCT-10 03.43.22.690 AM    dnhs0zc9ub960                1 SORT                      85347      14884
                                        4     148512         16
...

11-OCT-10 03.43.51.770 AM    dnhs0zc9ub960                1 SORT                      85347      31661
                                        4     416944         16

11-OCT-10 03.43.52.780 AM    dnhs0zc9ub960                2 TABLE ACCESS              85347      32284
direct path read                        4     426912         16

03:44:23 SQL>

Page 36: Oracle diagnostics 11g

Notice that the samples are once every second – thus, NOT *every* Wait is reported. SEQ# increases significantly within each second.

Page 37: Oracle diagnostics 11g

Learnings :

1. A snapshot is obtained every second. Thus, NOT *every* Wait is reported. SEQ# increases significantly within each second.

2. The view does *NOT* get reset at each new SQL.

3. If you query only by SID, you might see data for two different sessions – one that had logged out and the second that was a new session, reusing the same SID. So, always query by SESSION_ID, SESSION_SERIAL# together.
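The first learning can be quantified from the SEQ# column: the difference between consecutive samples is the number of waits that occurred in that second, of which ASH recorded at most one. The sketch below uses the seven consecutive samples from Example 5 (08:03:04 to 08:03:10); the variable names are illustrative.

```python
# SEQ# values from seven consecutive one-second ASH samples in Example 5
seq_numbers = [47291, 47545, 47839, 48053, 48260, 48617, 48990]

# Waits that happened between one sample and the next
waits_between_samples = [later - earlier
                         for earlier, later in zip(seq_numbers, seq_numbers[1:])]
total_waits = seq_numbers[-1] - seq_numbers[0]

print("waits between samples:", waits_between_samples)
print(f"{len(seq_numbers)} samples cover about {total_waits} waits")
```

ASH is therefore a statistical picture of where time goes, not a record of every wait; for every wait you need a trace.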

Page 38: Oracle diagnostics 11g

Long Running SQLs- 5

• How do you identify “expensive” SQLs from V$SQLSTATS ?

An example query might be :

select * from (
  select cpu_time, elapsed_time, application_wait_time,
         concurrency_wait_time, user_io_wait_time,
         disk_reads, buffer_gets, rows_processed, executions,
         px_servers_executions, last_active_time, sql_id, sql_text
  from v$sqlstats
  order by 2 desc )
where rownum < 11 ;
-- change the ORDER BY clause as appropriate

Page 39: Oracle diagnostics 11g

Querying for SQLs with largest Elapsed_Time :

  CPU_TIME ELAPSED_TIME APPLICATION_WAIT_TIME CONCURRENCY_WAIT_TIME USER_IO_WAIT_TIME DISK_READS BUFFER_GETS ROWS_PROCESSED
---------- ------------ --------------------- --------------------- ----------------- ---------- ----------- --------------
EXECUTIONS PX_SERVERS_EXECUTIONS LAST_ACTI SQL_ID
---------- --------------------- --------- -------------
SQL_TEXT
--------------------------------------------------------------------------------
  25744086     69716822                     0                     0          65719693     786452      786469              2
         2                     0 11-OCT-10 5j3srpqk9ut8p
select count(*) from STORE_LIST x

  30791318     45650857                     0                     0          29723427     393228      393260            343
         1                     0 11-OCT-10 dnhs0zc9ub960
select country, store_type, count(*) from store_list group by country, store_type order by country, store_type

   9488558     29879924                     0                     0          28339308     302096      302075              0
         1                     0 11-OCT-10 20dyx0fhnrcv2
select count(*) from store_list
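ELAPSED_TIME in V$SQLSTATS is cumulative over all executions and is reported in microseconds, so the statement at the top of the list is not necessarily the slowest per execution. A small sketch using the three rows from the listing above (the helper name is illustrative):

```python
# (sql_id, elapsed_time in microseconds, executions) from the listing above
rows = [
    ("5j3srpqk9ut8p", 69716822, 2),
    ("dnhs0zc9ub960", 45650857, 1),
    ("20dyx0fhnrcv2", 29879924, 1),
]

def seconds_per_execution(elapsed_micros, executions):
    """Average elapsed seconds per execution; treats 0 executions as 1 to avoid dividing by zero."""
    return elapsed_micros / 1_000_000 / max(executions, 1)

ranked = sorted(rows, key=lambda r: seconds_per_execution(r[1], r[2]), reverse=True)
for sql_id, elapsed_micros, executions in ranked:
    per_exec = seconds_per_execution(elapsed_micros, executions)
    print(f"{sql_id}: {per_exec:6.2f} s per execution over {executions} execution(s)")
```

Ranked this way, the GROUP BY query (dnhs0zc9ub960) comes out ahead of the statement with the largest total, which ran twice.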

Page 40: Oracle diagnostics 11g

Long Running SQLs- 6

• How do you use V$TRANSACTION to monitor DML ?

An example query might be :

select s.sid, s.serial#, p.spid, s.username, s.program,
       t.xidusn, t.used_ublk, t.used_urec, t.start_time, s.last_call_et
from v$process p, v$session s, v$transaction t
where s.paddr = p.addr
and s.taddr = t.addr
order by s.sid ;

Page 41: Oracle diagnostics 11g

Example : A large transaction :

04:19:04 SQL> update store_list set store_type = 'DEPT-X' where store_type = 'DEPT-L';

396360 rows updated.

04:24:41 SQL>
04:28:58 SQL> commit;

Commit complete.

04:29:00 SQL>

Page 42: Oracle diagnostics 11g

04:20:00 SQL> /

       SID    SERIAL# SPID   USERNAME PROGRAM
---------- ---------- ------ -------- ------------------------------------------------
    XIDUSN  USED_UBLK  USED_UREC START_TIME           LAST_CALL_ET
---------- ---------- ---------- -------------------- ------------
        46        114 4402   HEMANT   [email protected] (TNS V1-V3)
         4        124      14313 10/11/10 04:19:25              36

04:20:01 SQL> /

       SID    SERIAL# SPID   USERNAME PROGRAM
---------- ---------- ------ -------- ------------------------------------------------
    XIDUSN  USED_UBLK  USED_UREC START_TIME           LAST_CALL_ET
---------- ---------- ---------- -------------------- ------------
        46        114 4402   HEMANT   [email protected] (TNS V1-V3)
         4        382      44499 10/11/10 04:19:25              48

04:20:13 SQL>
04:22:01 SQL> /

       SID    SERIAL# SPID   USERNAME PROGRAM
---------- ---------- ------ -------- ------------------------------------------------
    XIDUSN  USED_UBLK  USED_UREC START_TIME           LAST_CALL_ET
---------- ---------- ---------- -------------------- ------------
        46        114 4402   HEMANT   [email protected] (TNS V1-V3)
         4       3034     355570 10/11/10 04:19:25             156

04:22:01 SQL>

Page 43: Oracle diagnostics 11g

04:24:43 SQL> /

       SID    SERIAL# SPID   USERNAME PROGRAM
---------- ---------- ------ -------- ------------------------------------------------
    XIDUSN  USED_UBLK  USED_UREC START_TIME           LAST_CALL_ET
---------- ---------- ---------- -------------------- ------------
        46        114 4402   HEMANT   [email protected] (TNS V1-V3)
         4       5918     442000 10/11/10 04:19:25               2

04:24:44 SQL>
04:28:59 SQL> /

no rows selected

04:29:01 SQL>

Page 44: Oracle diagnostics 11g

Learnings :

1. USED_UREC doesn’t actually map to the number of Table Rows (or Table Rows + (Table Rows * No. of Indexes updated)). An Undo Record is not a 1-to-1 match to Table + Index records. However, you can still use it to extrapolate or compare transaction sizes if the table and index definitions are similar.

2. USED_UREC is useful to monitor *rollback*. If a transaction has failed internally and is automatically being rolled back, you would see USED_UREC declining much before you get a transaction failure error.

3. Direct Path Operations (e.g. INSERT /*+ APPEND */) would reflect as only 1 Undo Record – the table has to be locked for the duration of the INSERT.
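The samples from the large UPDATE above illustrate the first learning and show how USED_UREC can be turned into a rate. This sketch uses LAST_CALL_ET as the clock, which works here only because the UPDATE is a single call; the names are illustrative.

```python
# (LAST_CALL_ET seconds, USED_UBLK, USED_UREC) from the three samples taken while the UPDATE ran
samples = [
    (36, 124, 14313),
    (48, 382, 44499),
    (156, 3034, 355570),
]

def undo_records_per_second(earlier, later):
    """Growth in USED_UREC per second of call time between two samples."""
    return (later[2] - earlier[2]) / (later[0] - earlier[0])

rates = [undo_records_per_second(a, b) for a, b in zip(samples, samples[1:])]

ROWS_UPDATED = 396360     # "396360 rows updated."
FINAL_USED_UREC = 442000  # USED_UREC after the UPDATE completed, before the COMMIT
undo_records_per_row = FINAL_USED_UREC / ROWS_UPDATED

for rate in rates:
    print(f"{rate:.0f} undo records per second")
print(f"{undo_records_per_row:.3f} undo records per updated row")
```

The ratio is about 1.1, not 1, which is why USED_UREC is better used for comparing or extrapolating similar transactions than for counting rows.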

Page 45: Oracle diagnostics 11g

Latches and Enqueues

• Latches are points of concurrency ; Enqueues are points of serialisation.

• Three sessions may be waiting on a Latch – any of the three may get the Latch before the other two.

• If sessions are waiting on an Enqueue, access is serialised – the first waiter gets the Enqueue first.
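The difference in ordering can be illustrated with a toy model. This is not how Oracle implements either mechanism: the enqueue is modelled as a plain FIFO queue, and the latch as a random pick among the waiters each time it is released. All names are hypothetical.

```python
import random
from collections import deque

def enqueue_grant_order(waiters):
    """Enqueue: waiters are served strictly in arrival order (FIFO)."""
    queue = deque(waiters)
    granted = []
    while queue:
        granted.append(queue.popleft())
    return granted

def latch_grant_order(waiters, rng):
    """Latch: there is no queue; model each release as any remaining waiter winning."""
    remaining = list(waiters)
    granted = []
    while remaining:
        winner = rng.choice(remaining)
        remaining.remove(winner)
        granted.append(winner)
    return granted

waiters = ["session A", "session B", "session C"]
enqueue_order = enqueue_grant_order(waiters)
latch_order = latch_grant_order(waiters, random.Random())

print("enqueue:", enqueue_order)
print("latch  :", latch_order)
```

The enqueue order always matches the arrival order; the latch order can differ from run to run.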

Page 46: Oracle diagnostics 11g

Latches

• Identify Latches with V$LATCHNAME, V$LATCH, V$LATCH_PARENT, V$LATCH_CHILDREN

• Latch Holders are in V$LATCHHOLDER
• “willing-to-wait” and “no-wait” latches
• “gets”, “misses” and “sleeps” – 1,2,3…
• Most commonly known :
  – library cache ; shared pool
  – cache buffers chains

Page 47: Oracle diagnostics 11g

Identifying Latches (listing from 11.2) :

SQL> select count(*) from v$latchname;

  COUNT(*)
----------
       535

SQL> select count(*) from v$latch;

  COUNT(*)
----------
       535

SQL> select count(*) from v$latch_parent;

  COUNT(*)
----------
       535

SQL> select count(*) from v$latch_children;

  COUNT(*)
----------
      2773

SQL>
SQL> desc v$latchholder
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 PID                                                NUMBER
 SID                                                NUMBER
 LADDR                                              RAW(4)
 NAME                                               VARCHAR2(64)
 GETS                                               NUMBER

SQL>

Page 48: Oracle diagnostics 11g

Refer to Oracle Support Document ID “What are Latches and What Causes Latch Contention [ID 22908.1]” for sample queries for latches :

/* ** Display System-wide latch statistics. */

/* ** Given a latch address, find out the latch name. */

/* ** Display latch statistics by latch name. */

Page 49: Oracle diagnostics 11g

Library Cache Latch usage :
This latch protects SQL statements, object definitions etc. Oracle internally determines the number of latches available (as a prime number).
Adding new SQL statements and Objects requires the Latch. One latch would be protecting multiple SQLs.
Note : If you have DDLs modifying object definitions, there would be waits on the Library Cache Latches protecting those objects.

Library Cache Pin Latch usage :
When a statement is re-executed, it has to be pinned to ensure that it is not modified !

Shared Pool Latch :
This latch protects the allocation of memory in the Shared Pool. Multiple child latches are created.
Frequent Hard Parsing of SQLs would cause frequent access to the Shared Pool Latch.

Page 50: Oracle diagnostics 11g

Cache Buffers Chains Latch :
Protects Memory Buffers for Database Blocks. Multiple (Child) Latches are present, each Latch protecting multiple buffers (blocks).
Blocks are loaded into memory based on a hash of the Database Block Address. A Linked List of the Buffer Headers is maintained so that a Block can be found quickly in the Buffer Cache.
Look for CBC Latches with very high SLEEPs, indicating very frequent retries.

Don’t run this query: it will take a long time on a busy instance / large database.
-- query from How To Identify a Hot Block Within The Database Buffer Cache. [ID 163424.1]
… query text deleted …

Typically Hot Blocks can be Index Root / Branch Blocks when an Index is frequently used in a Nested Loop. So, look at the Sessions and SQLs and Execution Plans.
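Why one hot block shows up as contention on one child latch can be sketched with a toy model. Everything here is an assumption for illustration: the modulo "hash", the bucket and latch counts, and the block addresses bear no relation to Oracle's actual hash function or sizing.

```python
from collections import Counter

N_BUCKETS = 16  # hypothetical number of hash chains
N_LATCHES = 4   # hypothetical number of child latches; each protects several chains

def bucket_for(dba):
    """Toy hash: map a Data Block Address to a hash chain."""
    return dba % N_BUCKETS

def latch_for(dba):
    """Each child latch protects a fixed subset of the chains."""
    return bucket_for(dba) % N_LATCHES

def latch_gets(accesses):
    """Count how many times each child latch would be taken for a list of block accesses."""
    return Counter(latch_for(dba) for dba in accesses)

HOT_BLOCK = 17715                    # arbitrary address standing in for an index root block
other_blocks = list(range(100, 120)) # twenty blocks read once each
accesses = [HOT_BLOCK] * 1000 + other_blocks

gets = latch_gets(accesses)
for latch, count in sorted(gets.items()):
    print(f"child latch {latch}: {count} gets")
```

In this model the child latch covering the hot block takes almost all of the gets, and the unrelated blocks that happen to hash to the same latch queue up behind it.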

Page 51: Oracle diagnostics 11g

Cloned Buffers :Buffers are “cloned” when different sessions require different versions for Read Consistency.

Assume User “A” started a query at time t0Assume User “C” modified Buffer 123 (representing a specific table block) at time t5 with an UPDATE statementIf User “A”s session comes to the the same table block, the DBA (DataBlockAddress) requires it read Buffer 123. It finds, from the Buffer Header SCN, that the Block has been modified. The ITL entry identifies the Undo Segment and slot. From the Undo information, the session now as to recreate the “pre-change” image of the block. So, it “clones” the Buffer as another Buffer in memory and applies the Undo information to it – because it actually has to modify the block, which it cannot [and should not] do against the “dirty” Buffer 123 last updated by User “C”.

The same buffer can have multiple clones. Also, a session might have to keep “rolling back” a block through multiple updates to get to its desired state (SCN).

So : A Buffer Cache of 800MB doesn’t necessarily mean that you have 800MB of data. Some of it could be multiple copies of the same database block, as of different points in time.
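The cloned (Consistent Read) copies can be counted from V$BH, where they appear with STATUS = 'cr'. This is a minimal sketch; like any scan of the buffer headers it can be slow on a large Buffer Cache, so treat it as something for a test instance :

```sql
-- Blocks that currently have more than one Consistent Read clone
select file#, block#, count(*) cr_copies
from v$bh
where status = 'cr'
group by file#, block#
having count(*) > 1
order by cr_copies desc;
```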

Page 52: Oracle diagnostics 11g

Cache Buffers LRU Chain Latch :
The LRU Chain is a list of buffers. Oracle maintains multiple lists.
A process that needs to load a block into memory “walks” the chain to identify a buffer that can be “used” (e.g. an empty or clean buffer). If it cannot find a buffer, it marks a list of dirty buffers for DBWR to flush to disk. When the buffers are flushed to disk, they are marked “clean” and can be reused. This latch is required whenever changes are to be made to the list.

Waits on these would mean that DBWR isn’t fast enough.
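Whether DBWR is keeping up can be checked from the system-wide wait events alongside the latch itself. A sketch, assuming the standard event names “free buffer waits” and “write complete waits” :

```sql
-- Sessions waiting for DBWR to free or finish writing buffers
select event, total_waits, time_waited
from v$system_event
where event in ('free buffer waits', 'write complete waits');

-- Contention on the LRU Chain latch itself
select name, gets, misses, sleeps
from v$latch
where name = 'cache buffers lru chain';
```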

Page 53: Oracle diagnostics 11g

Learnings :

1. Frequent Hard Parses strain the Shared Pool and Library Cache Latches.

2. Hot Blocks cause waits on particular Cache Buffers Chains Latches. The Hot Blocks need to be identified based on the SQLs of the Sessions waiting on CBC Latches. Fixes could be SQL tuning, rebuilding the table/index, or reverse key indexes (avoid them !!).

3. If DBWR cannot keep up : move to faster I/O, use Async I/O, and increase DB_WRITER_PROCESSES only as a last resort.

4. Latch Issues point to Concurrency issues.
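Before changing anything for Learning 3, the current I/O-related settings can be listed. A small sketch; which parameters matter depends on the platform and filesystem :

```sql
-- Current DBWR and asynchronous I/O settings
select name, value, isdefault
from v$parameter
where name in ('db_writer_processes', 'dbwr_io_slaves',
               'disk_asynch_io', 'filesystemio_options')
order by name;
```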

Page 54: Oracle diagnostics 11g

Enqueues

• Enqueues are Queuing Mechanisms

• The most famous ones are Row Locks (“TX”) and DML Locks (“TM”)

• There are 64 different types of Enqueues (11.2) [62 in 10.2]. See Appendix D of the Reference Guide
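The instance itself describes its lock types in V$LOCK_TYPE (10g onwards). A sketch; the number of rows returned may differ from the count of enqueues documented in the Reference Guide :

```sql
-- Descriptions of the two best-known enqueues
select type, name, description
from v$lock_type
where type in ('TX', 'TM')
order by type;

-- How many lock types this instance knows about
select count(*) from v$lock_type;
```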

Page 55: Oracle diagnostics 11g

The Controlfile (CF) Enqueue is taken when LGWR or ARCH is updating the Controlfile or when CKPT is updating checkpoint information. NOLOGGING operations also take CF enqueues ! I’ve seen database instances crash when the CF enqueue is held for too long by one background process. RMAN uses a snapshot controlfile to avoid CF enqueues.

The Undo Segment (US) Enqueue is taken when adding undo segments, taking them online or offline. When a “storm” of activity occurs, you may find US enqueue waits.

The Space Transaction (ST) Enqueue is for allocation / deallocation of extents.

The Object Checkpoint (KO) Enqueue is for Oracle to checkpoint an object/segment, e.g. for a TRUNCATE or DROP or before Parallel Query.

The Sequence Number (SQ) Enqueue is for incrementing Sequences. Setting appropriate CACHE sizes is important.
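Sequences with a small or no cache can be found from DBA_SEQUENCES. A sketch; the threshold of 20 (the default CACHE) is only a starting point, and the sequence name in the commented ALTER is hypothetical :

```sql
-- Application sequences still at (or below) the default cache size
select sequence_owner, sequence_name, cache_size, order_flag
from dba_sequences
where cache_size <= 20
and sequence_owner not in ('SYS', 'SYSTEM')
order by sequence_owner, sequence_name;

-- Example of raising the cache (hypothetical owner and sequence name)
-- alter sequence app_owner.order_seq cache 1000;
```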

The Job Queue (JQ) Enqueue is for Jobs.

The Cross Instance (CI) Enqueue is not limited to RAC ! You will see requests and waits in non-RAC as well.

Page 56: Oracle diagnostics 11g

Enqueue Waits in an *idle* instance :

SQL> desc v$enqueue_stat
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 INST_ID                                NUMBER
 EQ_TYPE                                VARCHAR2(2)
 TOTAL_REQ#                             NUMBER
 TOTAL_WAIT#                            NUMBER
 SUCC_REQ#                              NUMBER
 FAILED_REQ#                            NUMBER
 CUM_WAIT_TIME                          NUMBER

SQL> select eq_type, total_req#, total_wait#
  2  from v$enqueue_stat where total_wait# > 0 order by 1;

EQ TOTAL_REQ# TOTAL_WAIT#
-- ---------- -----------
CF       2895           1  -- controlfile
JS      64043          15  -- not documented
KO         18           1  -- multiple object checkpoint
PR        358           3  -- process startup
PV         53           3  -- not documented
TH        361           1  -- not documented

6 rows selected.

SQL>

Page 57: Oracle diagnostics 11g

After running :
SQL> create table abc as select * from dba_objects
  2  union all select * from dba_objects
  3  union all select * from dba_objects;

Table created.

EQ TOTAL_REQ# TOTAL_WAIT#
-- ---------- -----------
KO         27           2

On running :
SQL> select /*+ PARALLEL (a 4) */ count(*) from abc a;

  COUNT(*)
----------
    229848

SQL>

EQ TOTAL_REQ# TOTAL_WAIT#
-- ---------- -----------
KO         36           3
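The counts alone do not show how long sessions waited. A sketch that adds CUM_WAIT_TIME from the same view (documented as milliseconds) and derives an average per wait; the column alias is an assumption of this example :

```sql
-- Enqueues ranked by total time waited, with average per wait
select eq_type, total_req#, total_wait#, cum_wait_time,
       round(cum_wait_time / total_wait#, 1) avg_wait_ms
from v$enqueue_stat
where total_wait# > 0
order by cum_wait_time desc;
```

Running it before and after a test, as in the KO example above, shows which enqueues the test actually waited on.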

Page 58: Oracle diagnostics 11g

Learnings :

1. There are many different points of serialisation other than Row Locks.

2. Watch out for critical enqueues – CF, TS, SQ, KO.

Page 59: Oracle diagnostics 11g

Locks and Lock Trees

• Row Locks are Enqueues

• They serialise access to rows

• A transaction may hold Row Locks on multiple rows – this is represented as a single entry in V$TRANSACTION but single or multiple entries in the ITL slots in various table / index blocks

• ITLs allow different transactions to lock different rows in the same block concurrently.

• Lock Trees are multiple sessions waiting “in order”, with potentially more than one session waiting on the same row lock
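The “single entry in V$TRANSACTION” can be seen by joining it to V$SESSION. A minimal sketch; however many rows the transaction has locked, it appears as one row here :

```sql
-- One row per active transaction, with the session that owns it
select s.sid, s.serial#, s.username,
       t.xidusn, t.xidslot, t.xidsqn,   -- undo segment, slot, sequence
       t.used_ublk, t.used_urec,        -- undo blocks and records used
       t.start_time
from v$transaction t, v$session s
where s.taddr = t.addr
order by t.start_time;
```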

Page 60: Oracle diagnostics 11g

A Lock Tree :

Script “utllockt.sql” can provide a tree-like diagram.

 * This script prints the sessions in the system that are waiting for
 * locks, and the locks that they are waiting for. The printout is tree
 * structured. If a sessionid is printed immediately below and to the right
 * of another session, then it is waiting for that session. The session ids
 * printed at the left hand side of the page are the ones that everyone is
 * waiting for.
 *
 * For example, in the following printout session 9 is waiting for
 * session 8, 7 is waiting for 9, and 10 is waiting for 9.
 *
 * WAITING_SESSION   TYPE MODE REQUESTED    MODE HELD         LOCK ID1 LOCK ID2
 * ----------------- ---- ----------------- ----------------- -------- --------
 * 8                 NONE None              None              0        0
 *    9              TX   Share (S)         Exclusive (X)     65547    16
 *       7           RW   Exclusive (X)     S/Row-X (SSX)     33554440 2
 *       10          RW   Exclusive (X)     S/Row-X (SSX)     33554440 2
 *
 * The lock information to the right of the session id describes the lock
 * that the session is waiting for (not the lock it is holding).

The script can be enhanced to provide more session information. The script uses DDLs to drop and create temp tables – so another enhancement would be to have those tables created in advance as GTTs and only populated and queried by the script.
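A tree can also be drawn without any temp tables, using the BLOCKING_SESSION column of V$SESSION (10g onwards) in a hierarchical query. This is a sketch, not a replacement for the script; it shows wait events rather than lock modes :

```sql
-- Blockers at the left margin, waiters indented beneath them
select lpad(' ', 3 * (level - 1)) || sid session_tree,
       blocking_session, event, seconds_in_wait
from v$session
start with blocking_session is null
       and sid in (select blocking_session
                   from v$session
                   where blocking_session is not null)
connect by prior sid = blocking_session;
```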

Page 61: Oracle diagnostics 11g

select s.blocking_session, to_number(s.sid) Waiting_Session, s.event,
       s.seconds_in_wait, p.pid, p.spid "ServerPID", s.process "ClientPID",
       s.username, s.program, s.machine, s.osuser, s.sql_id,
       substr(sq.sql_text,1,75) SQL
from v$sql sq, v$session s, v$process p
where s.event like 'enq: TX%'
and s.paddr=p.addr
and s.sql_address=sq.address
and s.sql_hash_value=sq.hash_value
and s.sql_id=sq.sql_id
and s.sql_child_number=sq.child_number
union all
select s.blocking_session, to_number(s.sid) Waiting_Session, s.event,
       s.seconds_in_wait, p.pid, p.spid "ServerPID", s.process "ClientPID",
       s.username, s.program, s.machine, s.osuser, s.sql_id,
       substr(sq.sql_text,1,75) SQL
from v$sql sq, v$session s, v$process p
where s.sid in (select distinct blocking_session from v$session
                where event like 'enq: TX%')
and s.paddr=p.addr
and s.sql_address=sq.address(+)
and s.sql_hash_value=sq.hash_value(+)
and s.sql_id=sq.sql_id(+)
and s.sql_child_number=sq.child_number(+)
order by 1 nulls first, 2
/

Page 62: Oracle diagnostics 11g

Two separate sessions :
SQL> connect / as sysdba
Connected.
SQL> update hemant.test_row_lock set content = 'Another' where pk=1;

1 row updated.

SQL>

SQL> connect hemant/hemant
Connected.
SQL> update test_row_lock set content = 'First' where pk=1;
….. now waiting …..

Page 63: Oracle diagnostics 11g

BLOCKING_SESSION WAITING_SESSION EVENT                          SECONDS_IN_WAIT        PID
---------------- --------------- ------------------------------ --------------- ----------
ServerPID    ClientPID    USERNAME   PROGRAM
------------ ------------ ---------- ------------------------------
MACHINE                OSUSER     SQL_ID
---------------------- ---------- -------------
SQL
---------------------------------------------------------------------------
                              17 SQL*Net message from client                 82         27
13788        13770        SYS        [email protected] (TNS V1-V3)
localhost.localdomain  oracle

              17              26 enq: TX - row lock contention               52         19
13791        3449         HEMANT     [email protected] (TNS V1-V3)
localhost.localdomain  oracle     fuwn3bnuh2axg
update test_row_lock set content = 'First' where pk=1

SQL>

Page 64: Oracle diagnostics 11g

“Runaway” Processes

• A “Runaway” is a process continuously taking CPU on the server, with no corresponding Client process, because the client was abruptly terminated

• This is where you can use queries by LAST_CALL_ET – because there is no client that is sending calls, the LAST_CALL_ET keeps increasing

• But remember : You MUST check to see if there is a client (or Application Server) that has submitted the SQL and *is* legitimately waiting for the results.
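A sketch of such a check, listing long-running active sessions together with the client details needed to verify whether a client still exists. The one-hour threshold is an arbitrary assumption; “CPU used by this session” is reported in centiseconds :

```sql
-- Candidate runaways : active, no new call for over an hour
select s.sid, s.serial#, s.username, s.machine, s.osuser,
       s.process client_pid,            -- check this on the client machine
       p.spid    server_pid,            -- check CPU usage of this on the server
       s.last_call_et,
       st.value  cpu_centiseconds
from v$session s, v$process p, v$sesstat st, v$statname sn
where s.paddr = p.addr
and st.sid = s.sid
and st.statistic# = sn.statistic#
and sn.name = 'CPU used by this session'
and s.status = 'ACTIVE'
and s.type != 'BACKGROUND'
and s.last_call_et > 3600
order by s.last_call_et desc;
```

Only when the client PID on the listed machine is confirmed gone should the server process be treated as a Runaway.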