Upload
noman78
View
212
Download
0
Embed Size (px)
Citation preview
Performance troubleshooting using Active Session History
OUG Ireland Conference 2012
Marcin Przepiórowski
Intro : About me
• Oracle DBA / consultant / trainer since 2000
• Oracle ACE – since 2010
• Blogger - http://oracleprof.blogspot.com/
2
Agenda
• Performance tuning problems and goals
• DBTime is money
• Average Active Session
• Active Session History
• Case study
3
Performance tuning
• Define a metrics to measure – best metrics are related to time and business activities
• Compare performance with system baseline and business goals
IT USER
4
Tuning goals
• 99 % of response time for SQL/business transaction has to be X seconds, rest has to be maximum Y seconds
• For 99 % of executions - X number of rows/business operations have to be processed in Y seconds
5
DBTime is money
DB Time
– Oracle doc “Database time represents the total time spent in database calls, and is an indicator of the total instance workload”
– Sum of CPU time and not idle wait time (IO and others)
– ASH samples counts = DB Time in seconds (proof in DB Time Oracle Performance Tuning: Theory and Practice by Graham Wood, John Beresniewicz)
6
DB Time
• DB Time increase when system load increase
– number of active sessions increase
– number of calls increase
• DB Time increase when system throughput/performance decrease
– IO time increase
– non-idle events time increase (locks, network problems)
7
Average Active Session
Average Active Session
AAS = ∆DB Time / Elapsed clock time
Average Active Session using ASH samples
AAS = ∑ ASH samples (∆ t=1s)/ Elapsed clock time
For more detail go to Appendix A of Average active sessions: the magic metric ? By John Beresniewicz
8
Average Active Session
Time
t0 t1
AAS = (DB TIME(t1) – DB TIME(t0) ) / (t1 – t0)
9
Average Active Session
Time
t1
AAS = ∑ ASH samples/ (t1 – t0)
2 2 0 3 3 2 2 2 No of samples
t0
10
AAS using ASH
Time
t1 t0
Be aware of using small number of samples – poor and misleading results More samples – better results
DB TIME
ASH samples
11
Average Active Session
• Total upper AAS boundary is equal to number of sessions connected to instance
• AAS ≈ 0 – database is idle
• AAS < no of CPU – no system bottleneck
• AAS >> no of CPU – database bottleneck
12
• Upper boundary for CPU class – number of CPU cores
• If CPU class is close to upper boundary there is a CPU bottleneck in the system
Average Active Session - CPU
13
• Upper boundary for IO vs. maximum system IO/s
• Single IO operation time
– single block read
– multiple block read
• Number of IO requests in one ASH sample #IO requests / sample ≈ 1 s / max ( sbrt, mbrt )
Average Active Session - IO
14
System max IO/s = 8000 IO/s Avg IO time = 5 ms (0.005 s)
Max #IO per ASH sample = 1 / 0.005 = 200
Upper boundary for User/System IO class is
8000 / 200 = 40
Average Active Session - IO
15
Active Session History
• Active Session History is system-wide activity trace
• Active sessions are sampled and keep in-memory
– Active != Idle - not idle event or on CPU
– Sampled every second
– Inserted into circular buffer in SGA
– MMON Light – take care about it
16
Active Session History
• Started with first release of Oracle 10g
• Enhancement in all next versions – number of columns/metrics increased
• Together with AWR can provide historical information – every 1/10 of samples are store in persistent tables
17
Active Session History
• 45 columns in 10.2.0.4
• 66 columns in 11.1.0.7
• 93 columns in 11.2.0.1
Important new columns:
- sql_plan_line_id – 11g
- machine – 11gR2
- delta columns – 11gR2
18
Active Session History
V$ACTIVE_SESSION_HISTORY
Main view based on in-memory X$ASH table. Circular buffer size from 1 to 128 MB about 2 MB / CPU.
Flushed to disk:
- every hour / AWR sample
- when buffer is 2/3 full
19
DBA_HIST_ACTIVE_SESS_HISTORY
Flushed history of ASH – persistent table
– 1/10 of sampled data
– Partitioned for easier purging
– Part of AWR system
Active Session History
20
Using ASH – ON CPU
21
Column name Value
SESSION_STATE ON CPU
WAIT_TIME Non zero value
EVENT, P1, P2, P3, CURRENT_OBJ#, CURRENT_FILE#, CURRENT_BLOCK#
Maybe not cleared from previous event
Using ASH - WAITING
22
Column name Value
SESSION_STATE WAITING
WAIT_TIME 0
TIME_WAITED only last sample is updated/fixed after event – DO NOT use in ASH calculations
Using ASH - WAITING
SQL> select sample_time, event, TIME_WAITED from
v$active_session_history where session_id = 170
and sql_id = '6548zp29zsqgj' order by 1;
SAMPLE_TIME EVENT TIME_WAITED
------------------ ------------------------------- -----------
08.39.59.483 AM enq: TX - row lock contention 0
08.40.00.483 AM enq: TX - row lock contention 0
08.40.01.483 AM enq: TX - row lock contention 0
08.40.02.483 AM enq: TX - row lock contention 0
. . .
08.40.18.503 AM enq: TX - row lock contention 0
08.40.19.503 AM enq: TX - row lock contention 21295426
23
Concurrency Cluster
User I/O Application
System I/O Queuing
Administrative Scheduler
Other Network
Configuration Commit
Using ASH - WAITING
• EVENTS – 1142 in 11.2.0.2
• WAIT CLASS
24
ASH - Math
• Use count(*) for calculate wait or ON CPU time
• Group by session_id, sql_id, client_id, etc....
• As ASH is a sample do not use time_waited column in calculations:
– sum(time_waited)
– avg(time_waited)
– etc.
25
ASH - Math
Example of query – Profile of SQL queries from ASH
select
sql_id,
event,
count(*) cnt
from v$active_session_history
group by sql_id,event
26
Using ASH - time
• sample_time – when it happen
• Using samples across time
– in-memory – 1 second sample
– on-disk – 10 second sample
select count(*)
from v$active_session_history ...
select count(*) * 10
from dba_hist_active_sess_history ...
27
Using ASH - Drilldown
• session_id
• user_id
• program
• module
• action
• machine
• client_id
• sql_id
• pl/sql
• blocking_session
• object
• file
• etc.
28
Top SQL
select sql_id,
round((count(*) / sum(count(*)) over
())*100,2) ActPct
from v$active_session_history
where sql_id is not null
and sample_time > sysdate–5/1440
group by sql_id
order by ActPct
29
Top Session
select session_id, round((count(*) /
sum(count(*)) over ())*100,2) ActPct
from v$active_session_history
where
sample_time > sysdate–5/1440
group by session_id
order by ActPct desc
30
Top program
select program, round((count(*) /
sum(count(*)) over ())*100,2) ActPct
from v$active_session_history
where
sample_time > sysdate–5/1440
group by program
order by ActPct desc
31
Blocking session
select
session_id, event, blocking_session
from v$active_session_history
where
blocking_session is not null
and sample_time > sysdate - 5/1440;
32
OEM 11g
33
OEM 12c – ASH Analytics
34
35
OEM 12c – ASH Analytics
36
OEM 12c – ASH Analytics
37
OEM 12c – ASH Analytics
38
OEM 12c – ASH Analytics
ASH availability
• Oracle 10g onwards
• Oracle Enterprise Edition only
• Diagnostic and Tuning Pack required
• Accessibility:
– Oracle Enterprise Manager – Performance Tab
– Text / html reports – ASHRPT.SQL
– V$ and DBA_HIST tables
39
Other options
• 3rd party products – DB Optimizer, I3, Spotlight
• Free solution – Simulating-ASH
http://ashmasters.com/
https://sourceforge.net/projects/orasash https://github.com/pioro/orasash
• Interactive ASH tool with statistics Snapper v. 3.0. by Tanel Poder
40
Case study
• Average Active Sessions vs. system throughput
• Is “big stuff” always a problem ?
• Do I need to investigate a “big stuff” ?
• Compare it with throughput first – DB Time will increase when system load increase
41
Case study
AverageTransactions/s - 80.23 AverageTransactions/s - 98.18
42
Case Study
• Average Active Session – system load
• Load profile
• SQL plan flip
• SQL query “profile”
43
select mtime, round(sum(c1),2) AAS_WAIT,
round(sum(c2),2) AAS_CPU,
round(sum(cnt),2) AAS
from ( select
to_char(sample_time,'YYYY-MM-DD HH24‘) mtime,
decode(session_state,'WAITING',count(*),0)/360 c1,
decode(session_state,'ON CPU',count(*),0) /360 c2,
count(*)/360 cnt
from dba_hist_active_sess_history
group by to_char(sample_time,'YYYY-MM-DD HH24'),
session_state
)
group by mtime order by mtime;
AAS – system load
44
AAS – system load
MTIME AAS_WAIT AAS_CPU AAS
------------- -------- ------- ------
2012-03-03 08 6.36 4.23 10.58
2012-03-03 09 17.91 6.64 24.54
2012-03-03 10 33.10 8.63 41.73
2012-03-03 11 29.34 8.90 38.24
2012-03-03 12 29.76 8.76 38.52
2012-03-03 13 33.30 9.59 42.89
2012-03-03 14 38.02 9.40 47.42
2012-03-03 15 21.05 6.11 27.16
2012-03-03 16 3.81 1.63 5.44
2012-03-03 17 3.84 1.41 5.25
45
AAS – system load
46
Load profile
select * from (
select
decode(session_state,'WAITING',event,'ON CPU'),
count(*) cnt,
count(*) / (sum(count(*)) over ()) * 100 pct
from dba_hist_active_sess_history
where
sample_time between X and Y
group by
decode(session_state,'WAITING',event,'ON CPU')
order by cnt desc
) where rownum <= 5;
47
Load profile
EVENT CNT PCT
------------------------------ ---------- -------
enq: US - contention 8059 53.65
ON CPU 3107 20.68
db file sequential read 645 4.29
latch: undo global data 536 3.57
enq: TX - row lock contention 446 2.97
ON CPU 1347 38.38
db file sequential read 524 14.93
enq: US - contention 365 10.40
LNS wait on SENDREQ 215 6.13
LGWR-LNS wait on channel 201 5.73
48
SQL plan flip
select sql_id, count(*)
from (
select distinct sql_id, sql_plan_hash_value
from dba_hist_active_sess_history
where sql_opname <> 'INSERT' and user_id = X
and sql_plan_hash_value <> 0)
group by sql_id
having count(*) > 1
order by 2;
49
SQL plan flip
SQL_ID COUNT(*)
------------- ----------
a7rf2v00g2y9t 2
8w2904am00nbq 2
dyuz1k3bdr36s 2
7upuf7wbk7nbf 2
da6wsxtsz2ftk 3
gur9jrbxvbvur 6
50
No sql_opname filter
SQL plan flip ?
select distinct sql_id, sql_plan_hash_value
from dba_hist_active_sess_history
where sql_id = 'gur9jrbxvbvur';
SQL_ID SQL_PLAN_HASH_VALUE
------------- -------------------
gur9jrbxvbvur 2263242137
gur9jrbxvbvur 3926490670
select sql_id from v$sql
where plan_hash_value = 3926490670;
SQL_ID
-------------
62fbdqwqht3x1
51
Lazy clean up
select *
from table(dbms_xplan.display_cursor
('62fbdqwqht3x1',null));
SQL_ID 62fbdqwqht3x1, child number 0
-------------------------------------
select KVAL_SEQUENCE.nextval from dual
Plan hash value: 3926490670
----------------------
| 0 | SELECT STATEMENT |
| 1 | SEQUENCE |
| 2 | FAST DUAL |
52
SQL plan flip
select
distinct sql_id, sql_plan_hash_value, sql_opname
from dba_hist_active_sess_history
where sql_id = 'dyuz1k3bdr36s';
SQL_ID SQL_PLAN_HASH_VALUE SQL_OPNAME
------------- ------------------- -----------
dyuz1k3bdr36s 760792886 SELECT
dyuz1k3bdr36s 2913971644 SELECT
53
SQL plan flip
select
to_char(sample_time, 'YYYY-MM-DD HH24') sample_time,
sql_id, sql_plan_hash_value
from dba_hist_active_sess_history
where
sql_id = 'dyuz1k3bdr36s'
group by
to_char(sample_time, 'YYYY-MM-DD HH24'),
sql_id, sql_plan_hash_value
order by sample_time;
54
SQL plan flip
SAMPLE_TIME SQL_ID SQL_PLAN_HASH_VALUE
------------- ------------- -------------------
2012-03-01 08 dyuz1k3bdr36s 760792886
2012-03-01 09 dyuz1k3bdr36s 760792886
2012-03-01 10 dyuz1k3bdr36s 760792886
2012-03-01 10 dyuz1k3bdr36s 2913971644
2012-03-01 11 dyuz1k3bdr36s 2913971644
. . .
2012-03-02 11 dyuz1k3bdr36s 2913971644
2012-03-02 12 dyuz1k3bdr36s 760792886
2012-03-02 12 dyuz1k3bdr36s 2913971644
2012-03-02 13 dyuz1k3bdr36s 760792886
2012-03-02 14 dyuz1k3bdr36s 760792886
55
SQL query “profile”
select sql_plan_hash_value, sql_exec_id,
count(*) cnt
from dba_hist_active_sess_history
where sample_time between X and Y
and sql_id = 'dyuz1k3bdr36s'
group by sql_plan_hash_value,sql_exec_id
order by sql_plan_hash_value, cnt;
SQL_PLAN_HASH_VALUE SQL_EXEC_ID CNT
------------------- ----------- ----------
760792886 17324679 1
2913971644 17324251 51
56
SQL query “profile”
select
decode(session_state,'WAITING',event,'ON CPU'),
count(*) cnt
from dba_hist_active_sess_history
where sql_id = 'dyuz1k3bdr36s'
and SQL_EXEC_ID = 999999999
group by
decode(session_state,'WAITING',event,'ON CPU')
order by cnt;
57
SQL query “profile”
SQL EXEC ID - 17324255
EVENT CNT
----------------------------- ----------
db file sequential read 3
ON CPU 6
db file scattered read 41
SQL EXEC ID - 17324679
EVENT CNT
----------------------------- ----------
db file scattered read 1
58
Q & A 59