26
#IDUG DB2 Performance Tuning: Tips & Techniques Sweta Singh IBM Session 6 16:15 17:15 Bangalore 25 March 2014 - Chennai 27 March 2014 Platform: DB2 LUW

DB2 Performance Tuning: Tips & Techniques

  • Upload
    others

  • View
    17

  • Download
    1

Embed Size (px)

Citation preview

Page 1: DB2 Performance Tuning: Tips & Techniques

#IDUG

DB2 Performance Tuning: Tips & Techniques

Sweta Singh IBM

Session 6 16:15 – 17:15 – Bangalore 25 March 2014 - Chennai 27 March 2014 Platform: DB2 LUW

Page 2: DB2 Performance Tuning: Tips & Techniques

#IDUG

Agenda

• Performance Basics

• Diagnosing Performance Problems • Sanity Check

• First Aid toolkit

• OS tools

• New Monitoring in DB2 9.7

• OS & DB2 Monitoring Combo

• RCA Decision Tree

• Interesting Case Study

2

Page 3: DB2 Performance Tuning: Tips & Techniques

#IDUG

3

Basics: Orchestration matters

Hardware: CPU, Memory, Storage and Network

• CPU cores and RAM available continue to grow

• 12 cores/chip for both Intel Ivybridge EP and IBM POWER8

• 8GB RAM/core is a good Rule of Thumb

• Think of IOPS and MB/sec read/write for sizing storage

• Include SSDs in your storage strategy– they can make an astonishing difference!

o Random I/O, especially reads, benefit significantly

o Internal SSDs have best and most cost-effective performance but

• They often don't have a write cache

• Consider HA and DR

Software: OS, DB2 and Application

• Use best practice recommendations

• Stay current on maintenance

Page 4: DB2 Performance Tuning: Tips & Techniques

#IDUG

Basics: Understand the physical layout of your database

4

• Disk spindles still matter – With sophisticated storage subsystems and storage

virtualization it just requires more sleuthing than ever to find them

– Drives keep getting bigger, 146GB-300GB now the norm

• “Spread everything everywhere” strategy works reasonably well • For OLTP, make sure that the transaction logs

reside on a separate array

• Be leery of Storage Administrators that

tell you – “Don’t worry, it doesn’t matter”

– “The cache will take care of it”

Page 5: DB2 Performance Tuning: Tips & Techniques

#IDUG

Basics: Storage best practices

5

Page 6: DB2 Performance Tuning: Tips & Techniques

#IDUG

Basics: Best Practices (DB2 configuration)

6

Use DB2 automatic storage – A modest number of storage paths that map sensibly to the SAN storage

Minimize the use of distinct page sizes – 8K is a good default (no more than 2)

– Specify the smaller one at create database time for the catalog

Minimize the number of bufferpools – 1 for data/index, 1 for temp is usually sufficient

Splitt tablespaces for logical groups – Often helpful to split data/index/LOB (especially LOB) into separate tablespaces

– No need to put EACH table in its own tablespace

Use the Autonomics – Autoconfigure & STMM

• But give DB2 a budget via instance_memory or database_memory

Compression – Turn on compression at create table time for top ~50 large tables

Page 7: DB2 Performance Tuning: Tips & Techniques

#IDUG

Agenda

• Performance Basics

• Diagnosing Performance Problems • Sanity Check

• First Aid toolkit

• OS tools

• New Monitoring in DB2 9.7

• How to identify Resource Bottleneck

• RCA Decision Tree

• Interesting Case Studies

7

Page 8: DB2 Performance Tuning: Tips & Techniques

#IDUG

Performance problem has occurred: Sanity Check

• Did something change since when performance met expectations?

• New database applications?

• Other non-database loads on the system?

• More users?

• More data?

• Software configuration changes?

• DB, DBM configuration, registry variables, schema changes, etc.

• Hardware configuration changes?

• If you can identify the change at this stage, can it / should it be undone? Or is this something you have to live with?

• If this is a real problem, a methodical approach is key to finding the root cause

8

Page 9: DB2 Performance Tuning: Tips & Techniques

#IDUG

First Aid toolkit: OS tools

• Make vmstat your best friend • It will give you the first hint at where your bottleneck is

• iostat gives valuable insights into I/O activity

• Use “iostat –D” on AIX, “iostat –x” on Linux to get I/O response time

9

kthr memory page faults cpu time

----- ----------- ------------------------ ------------ ----------- --------

r b avm fre re pi po fr sr cy in sy cs us sy id wa hr mi se

12 1 17339981 47266810 0 0 0 0 0 0 24924 82462 72310 22 6 55 16 00:08:51

8 0 17339986 47266800 0 0 0 0 0 0 24454 78758 70106 21 6 55 17 00:09:01

6 0 17339986 47266797 0 0 0 0 0 0 22510 64438 58757 17 6 59 18 00:09:11

13 0 17339987 47266793 0 0 0 0 0 0 26469 88133 78037 24 7 50 19 00:09:21

11 0 17340019 47266759 0 0 0 0 0 0 25557 84203 74263 23 7 54 17 00:09:31

9 0 17340019 47266756 0 0 0 0 0 0 25454 82285 73301 22 7 54 17 00:09:41

hdisk77 xfer: %tm_act bps tps bread bwrtn

99.0 10.6M 1293.7 8.8M 1.8M

read: rps avgserv minserv maxserv timeouts fails

1072.9 12.3 0.1 307.3 0 0

write: wps avgserv minserv maxserv timeouts fails

220.8 0.8 0.1 515.2 0 0

queue: avgtime mintime maxtime avgwqsz avgsqsz sqfull

0.0 0.0 0.1 0.0 13.0 0.0

hdisk78 xfer: %tm_act bps tps bread bwrtn

99.3 10.5M 1282.2 8.8M 1.7M

read: rps avgserv minserv maxserv timeouts fails

1069.5 13.0 0.1 398.4 0 0

write: wps avgserv minserv maxserv timeouts fails

212.7 1.0 0.1 589.6 0 0

queue: avgtime mintime maxtime avgwqsz avgsqsz sqfull

0.0 0.0 0.0 0.0 14.0 0.0

Page 10: DB2 Performance Tuning: Tips & Techniques

#IDUG

First Aid toolkit: New Monitoring in DB2 9.7

• DB2 9.7 brings many big improvements in monitoring

• Component & wait times

• Static SQL / SQL procedure monitoring

• Improvements in low-overhead activity monitoring via event monitors

• db2pd sorting & currently committed statistics

• Section actuals

• SQL Access to performance data

• A good set of monitor queries makes diagnosing problems much easier

• Choosing & tracking the most important metrics

• Calculating derived values (hit ratio, time per IO, etc.)

• Comparing against baseline values

• Transitioning from snapshots to SQL monitoring made easy

• MONREPORT module

10

Page 11: DB2 Performance Tuning: Tips & Techniques

#IDUG

Glimpse of New Monitoring: monreport.dbsummary

11

Monitoring report - database summary -------------------------------------------------------------------------------- Database: DTW Generated: 07/14/2010 22:59:05 Interval monitored: 10 ================================================================================ Part 1 - System performance Work volume and throughput -------------------------------------------------------------------------------- Per second Total --------------------- ----------------------- TOTAL_APP_COMMITS 137 1377 ACT_COMPLETED_TOTAL 3696 36963 APP_RQSTS_COMPLETED_TOTAL 275 2754 TOTAL_CPU_TIME = 23694526 TOTAL_CPU_TIME per request = 8603 Row processing ROWS_READ/ROWS_RETURNED = 1 (25061/20767) ROWS_MODIFIED = 22597 Wait times -------------------------------------------------------------------------------- -- Wait time as a percentage of elapsed time -- % Wait time/Total time --- ---------------------------------- For requests 70 286976/406856 For activities 70 281716/401015 -- Time waiting for next client request -- CLIENT_IDLE_WAIT_TIME = 13069 CLIENT_IDLE_WAIT_TIME per second = 1306 -- Detailed breakdown of TOTAL_WAIT_TIME -- % Total --- --------------------------------------------- TOTAL_WAIT_TIME 100 286976 I/O wait time POOL_READ_TIME 88 253042 POOL_WRITE_TIME 6 18114 DIRECT_READ_TIME 0 100 DIRECT_WRITE_TIME 0 0 LOG_DISK_WAIT_TIME 1 4258 LOCK_WAIT_TIME 3 11248 AGENT_WAIT_TIME 0 0 Network and FCM TCPIP_SEND_WAIT_TIME 0 0 TCPIP_RECV_WAIT_TIME 0 0 IPC_SEND_WAIT_TIME 0 198 IPC_RECV_WAIT_TIME 0 15 FCM_SEND_WAIT_TIME 0 0 FCM_RECV_WAIT_TIME 0 0 WLM_QUEUE_TIME_TOTAL 0 0 Component times -------------------------------------------------------------------------------- -- Detailed breakdown of processing time -- % Total ---------------- -------------------------- Total processing 100 119880 Section execution TOTAL_SECTION_PROC_TIME 11 13892 TOTAL_SECTION_SORT_PROC_TIME 0 47 Compile TOTAL_COMPILE_PROC_TIME 22 27565 TOTAL_IMPLICIT_COMPILE_PROC_TIME 2 3141 Transaction end processing TOTAL_COMMIT_PROC_TIME 0 230 TOTAL_ROLLBACK_PROC_TIME 0 0 Utilities TOTAL_RUNSTATS_PROC_TIME 0 0 TOTAL_REORGS_PROC_TIME 0 0 TOTAL_LOAD_PROC_TIME 0 0 Buffer pool -------------------------------------------------------------------------------- Buffer pool hit ratios Type Ratio Reads (Logical/Physical) --------------- --------------- ---------------------------------------------- Data 72 54568/14951 Index 79 223203/45875 XDA 0 0/0 Temp data 0 0/0 Temp index 0 0/0 Temp XDA 0 0/0 I/O -------------------------------------------------------------------------------- Buffer pool writes POOL_DATA_WRITES = 817 POOL_XDA_WRITES = 0 POOL_INDEX_WRITES = 824 Direct I/O DIRECT_READS = 122 DIRECT_READ_REQS = 15 DIRECT_WRITES = 0 DIRECT_WRITE_REQS = 0 Log I/O LOG_DISK_WAITS_TOTAL = 1275 Locking -------------------------------------------------------------------------------- Per activity Total ------------------------------ ---------------------- LOCK_WAIT_TIME 0 11248 LOCK_WAITS 0 54 LOCK_TIMEOUTS 0 0 DEADLOCKS 0 0 LOCK_ESCALS 0 0 Routines -------------------------------------------------------------------------------- Per activity Total ------------------------ ------------------------ TOTAL_ROUTINE_INVOCATIONS 0 1377 TOTAL_ROUTINE_TIME 2 105407 TOTAL_ROUTINE_TIME per invocation = 76 Sort -------------------------------------------------------------------------------- TOTAL_SORTS = 55 SORT_OVERFLOWS = 0 POST_THRESHOLD_SORTS = 0 POST_SHRTHRESHOLD_SORTS = 0 Network -------------------------------------------------------------------------------- Communications with remote clients TCPIP_SEND_VOLUME per send = 0 (0/0) TCPIP_RECV_VOLUME per receive = 0 (0/0) Communications with local clients IPC_SEND_VOLUME per send = 367 (1012184/2754) IPC_RECV_VOLUME per receive = 317 (874928/2754) Fast communications manager FCM_SEND_VOLUME per send = 0 (0/0) FCM_RECV_VOLUME per receive = 0 (0/0) Other -------------------------------------------------------------------------------- Compilation TOTAL_COMPILATIONS = 5426 PKG_CACHE_INSERTS = 6033 PKG_CACHE_LOOKUPS = 24826 Catalog cache CAT_CACHE_INSERTS = 0 CAT_CACHE_LOOKUPS = 14139 Transaction processing TOTAL_APP_COMMITS = 1377 INT_COMMITS = 0 TOTAL_APP_ROLLBACKS = 0 INT_ROLLBACKS = 0 Log buffer NUM_LOG_BUFFER_FULL = 0 Activities aborted/rejected ACT_ABORTED_TOTAL = 0 ACT_REJECTED_TOTAL = 0 Workload management controls WLM_QUEUE_ASSIGNMENTS_TOTAL = 0 WLM_QUEUE_TIME_TOTAL = 0 DB2 utility operations -------------------------------------------------------------------------------- TOTAL_RUNSTATS = 0 TOTAL_REORGS = 0 TOTAL_LOADS = 0 ================================================================================ Part 2 - Application performance drill down Application performance database-wide -------------------------------------------------------------------------------- TOTAL_CPU_TIME TOTAL_ TOTAL_APP_ ROWS_READ + per request WAIT_TIME % COMMITS ROWS_MODIFIED ---------------------- ----------- ------------- ---------------------------- 8603 70 1377 47658 Application performance by connection -------------------------------------------------------------------------------- APPLICATION_ TOTAL_CPU_TIME TOTAL_ TOTAL_APP_ ROWS_READ + HANDLE per request WAIT_TIME % COMMITS ROWS_MODIFIED ------------- ------------------- ----------- ------------- ------------- 7 0 0 0 0 22 10945 70 30 1170 23 7977 68 46 1117 24 12173 65 33 1257 58 14376 73 28 1419 59 8252 69 43 1337 60 12344 71 32 1569 61 9941 71 37 1558 63 0 0 0 0 Application performance by service class -------------------------------------------------------------------------------- SERVICE_ TOTAL_CPU_TIME TOTAL_ TOTAL_APP_ ROWS_READ + CLASS_ID per request WAIT_TIME % COMMITS ROWS_MODIFIED -------- ------------------- ----------- ------------- ------------- 11 0 0 0 0 12 0 0 0 0 13 8642 68 1379 47836 Application performance by workload -------------------------------------------------------------------------------- WORKLOAD_ TOTAL_CPU_TIME TOTAL_ TOTAL_APP_ ROWS_READ + NAME per request WAIT_TIME % COMMITS ROWS_MODIFIED ------------- ---------------------- ----------- ------------- ------------- SYSDEFAULTADM 0 0 0 0 SYSDEFAULTUSE 11108 68 1376 47782 ================================================================================ Part 3 - Member level information - I/O wait time is (POOL_READ_TIME + POOL_WRITE_TIME + DIRECT_READ_TIME + DIRECT_WRITE_TIME). TOTAL_CPU_TIME TOTAL_ RQSTS_COMPLETED_ I/O MEMBER per request WAIT_TIME % TOTAL wait time ------ ---------------------- ----------- ---------------- ----------------- 0 8654 68 2760 271870

Work volume and throughput

-------------------------------------------

Per second Total

------------------------ ----------- -------

TOTAL_APP_COMMITS 137 1377

ACT_COMPLETED_TOTAL 3696 36963

APP_RQSTS_COMPLETED_TOTAL 275 2754

TOTAL_CPU_TIME = 23694526

TOTAL_CPU_TIME per request = 8603

Row processing

ROWS_READ/ROWS_RETURNED = 1 (25061/20767)

ROWS_MODIFIED = 22597

Wait times

---------------------------------------

-- Wait time as a percentage of elapsed time

% Wait time/Total time

--- --------------------

For requests 70 286976/406856

For activities 70 281716/401015

-- Time waiting for next client request --

CLIENT_IDLE_WAIT_TIME = 13069

CLIENT_IDLE_WAIT_TIME per second = 1306

-- Detailed breakdown of TOTAL_WAIT_TIME

% Total

--- -------------

TOTAL_WAIT_TIME 100 286976

I/O wait time

POOL_READ_TIME 88 253042

POOL_WRITE_TIME 6 18114

DIRECT_READ_TIME 0 100

DIRECT_WRITE_TIME 0 0

LOG_DISK_WAIT_TIME 1 4258

LOCK_WAIT_TIME 3 11248

AGENT_WAIT_TIME 0 0

:

Component times

-----------------------------------------------

-- Detailed breakdown of processing time --

% Total

---- --------

Total processing 100 119880

Section execution

TOTAL_SECTION_PROC_TIME 11 13892

TOTAL_SECTION_SORT_PROC_TIME 0 47

Compile

TOTAL_COMPILE_PROC_TIME 22 27565

TOTAL_IMPLICIT_COMPILE_PROC_TIME 2 3141

Transaction end processing

TOTAL_COMMIT_PROC_TIME 0 230

TOTAL_ROLLBACK_PROC_TIME 0 0

:

Buffer pool

---------------------------------------------

Buffer pool hit ratios

Type Ratio Reads (Logical/Physical)

---------- -------- ------------------------

Data 72 54568/14951

Index 79 223203/45875

XDA 0 0/0

Temp data 0 0/0

Temp index 0 0/0

Temp XDA 0 0/0

Page 12: DB2 Performance Tuning: Tips & Techniques

#IDUG

First Aid toolkit Queries: Core Activity

• Transactions, statements, rows

12

select

current timestamp as "Timestamp",

substr(workload_name,1,32) as "Workload",

sum(TOTAL_APP_COMMITS) as "Total app. commits",

sum(ACT_COMPLETED_TOTAL) as "Total activities",

case when sum(TOTAL_APP_COMMITS) < 100 then null else

cast( sum(ACT_COMPLETED_TOTAL) / sum(TOTAL_APP_COMMITS) as decimal(6,1)) end

as "Activities / UOW",

case when sum(TOTAL_APP_COMMITS) = 0 then null else

cast( 1000.0 * sum(DEADLOCKS)/ sum(TOTAL_APP_COMMITS) as decimal(8,3)) end

as "Deadlocks / 1000 UOW",

case when sum(ROWS_RETURNED) < 1000 then null else

sum(ROWS_READ)/sum(ROWS_RETURNED) end as "Rows read/Rows ret",

case when sum(ROWS_READ+ROWS_MODIFIED) < 1000 then null else

cast(100.0 * sum(ROWS_READ)/sum(ROWS_READ+ROWS_MODIFIED) as decimal(4,1)) end

as "Pct read act. by rows"

from table(mon_get_workload(null,-2)) as t

group by rollup ( substr(workload_name,1,32) );

Page 13: DB2 Performance Tuning: Tips & Techniques

#IDUG

First Aid toolkit: Time-spent Query #1

• TOTAL_RQST_TIME is “Time Spent in DB2” • Wait Time vs Processing Time

13

select

current timestamp as "Timestamp",

TOTAL_RQST_TIME,

TOTAL_RQST_TIME / TOTAL_APP_COMMITS “Time Spent per txn”,

(TOTAL_CPU_TIME+500) / 1000 as TOTAL_CPU_TIME ,

TOTAL_WAIT_TIME,

case when TOTAL_RQST_TIME < 100 then null else

cast(100.0 * TOTAL_WAIT_TIME/TOTAL_RQST_TIME as decimal(4,1)) end

as Wait_time_pct,

cast(100.0 * ((TOTAL_CPU_TIME+500) / 1000) / TOTAL_RQST_TIME as decimal(4,1))

as CPU_time_pct

from table(mon_get_workload(null,-2)) as t

group by rollup ( substr(workload_name,1,32) );

Page 14: DB2 Performance Tuning: Tips & Techniques

#IDUG

First Aid toolkit Queries: Time-spent Query #2

14

select current timestamp as "Timestamp",

substr(workload_name,1,32) as "Workload", ...

case when sum(TOTAL_SECTION_SORTS) < 1000 then null else cast(

100.0 * sum(SORT_OVERFLOWS)/sum(TOTAL_SECTION_SORTS)

as decimal(4,1)) end as "Pct spilled sorts",

case when sum(TOTAL_SECTION_TIME) < 100 then null else cast(

100.0 * sum(TOTAL_SECTION_SORT_TIME)/sum(TOTAL_SECTION_TIME)

as decimal(4,1)) end as "Pct section time sorting",

case when sum(TOTAL_SECTION_SORTS) < 100 then null else cast(

float(sum(TOTAL_SECTION_SORT_TIME))/sum(TOTAL_SECTION_SORTS)

as decimal(6,1)) end as "Avg sort time",

case when sum(TOTAL_RQST_TIME) < 100 then null else cast(

100.0 * sum(TOTAL_COMPILE_TIME)/sum(TOTAL_RQST_TIME)

as decimal(4,1)) end as "Pct request time compiling”,

case when sum(PKG_CACHE_LOOKUPS) < 1000 then null else cast(

100.0 * sum(PKG_CACHE_LOOKUPS-PKG_CACHE_INSERTS)/sum(PKG_CACHE_LOOKUPS)

as decimal(4,1)) end as "Pkg cache h/r",

case when sum(CAT_CACHE_LOOKUPS) < 1000 then null else cast(

100.0 * sum(CAT_CACHE_LOOKUPS-CAT_CACHE_INSERTS)/sum(CAT_CACHE_LOOKUPS)

as decimal(4,1)) end as "Cat cache h/r"

• Sorting, Compilation Time

Page 15: DB2 Performance Tuning: Tips & Techniques

#IDUG

15

First Aid toolkit : Time-spent Query #3

select current timestamp as "Timestamp",substr(workload_name,1,32) as "Workload",

sum(TOTAL_RQST_TIME) as "Total request time",

sum(CLIENT_IDLE_WAIT_TIME) as "Client idle wait time",

case when sum(TOTAL_RQST_TIME) < 100 then null else

cast(float(sum(CLIENT_IDLE_WAIT_TIME))/sum(TOTAL_RQST_TIME) as decimal(10,2)) end

as "Ratio of client wt to request time",

case when sum(TOTAL_RQST_TIME) < 100 then null else

cast(100.0 * sum(TOTAL_WAIT_TIME)/sum(TOTAL_RQST_TIME) as decimal(4,1)) end

as "Wait time pct of request time",

case when sum(TOTAL_WAIT_TIME) < 100 then null else

cast(100.0*sum(LOCK_WAIT_TIME)/sum(TOTAL_WAIT_TIME) as decimal(4,1)) end

as "Lock wait time pct of Total Wait",

... sum(POOL_READ_TIME+POOL_WRITE_TIME)/ ... as "Pool I/O pct of Total Wait",

... sum(DIRECT_READ_TIME+DIRECT_WRITE_TIME)/ ... as "Direct I/O pct of Total Wait",

... sum(LOG_DISK_WAIT_TIME)/ ... as "Log disk wait pct of Total Wait",

... sum(TCPIP_RECV_WAIT_TIME+TCPIP_SEND_WAIT_TIME)/ ... as "TCP/IP wait pct ...",

... sum(IPC_RECV_WAIT_TIME+IPC_SEND_WAIT_TIME)/ ... as "IPC wait pct of Total Wait",

... sum(FCM_RECV_WAIT_TIME+FCM_SEND_WAIT_TIME)/ ... as "FCM wait pct of Total Wait",

... sum(WLM_QUEUE_TIME_TOTAL)/ ... as "WLM queue time pct of Total Wait",

... sum(XML_DIAGLOG_WRITE_WAIT_TIME)/ ... as "diaglog write pct of Total Wait"

• Lock wait, I/O Wait, Network wait, Log wait….

Page 16: DB2 Performance Tuning: Tips & Techniques

#IDUG

First Aid toolkit : Hot queries

• Sorted by CPU, ROWS_READ, NUM_EXECUTIONS

16

select MEMBER, TOTAL_ACT_TIME, TOTAL_CPU_TIME,

(TOTAL_CPU_TIME+500)/1000 as "TOTAL_CPU_TIME (ms)",

TOTAL_SECTION_SORT_PROC_TIME,

NUM_EXEC_WITH_METRICS, substr(STMT_TEXT,1,40) as stmt_text

from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t

order by TOTAL_CPU_TIME desc fetch first 20 rows only;

select ROWS_READ, ROWS_RETURNED,

case when ROWS_RETURNED = 0 then null

else ROWS_READ / ROWS_RETURNED end as "Read / Returned",

TOTAL_SECTION_SORTS, SORT_OVERFLOWS, TOTAL_SECTION_SORT_TIME,

case when TOTAL_SECTION_SORTS = 0 then null

else TOTAL_SECTION_SORT_TIME / TOTAL_SECTION_SORTS end as "Time / sort",

NUM_EXECUTIONS, substr(STMT_TEXT,1,40) as stmt_text ...

select TOTAL_ACT_TIME, TOTAL_ACT_WAIT_TIME, LOCK_WAIT_TIME,

FCM_SEND_WAIT_TIME+FCM_RECV_WAIT_TIME as "FCM wait time",

LOCK_TIMEOUTS, LOG_BUFFER_WAIT_TIME, LOG_DISK_WAIT_TIME,

TOTAL_SECTION_SORT_TIME-TOTAL_SECTION_SORT_PROC_TIME as "Sort wait time",

NUM_EXECUTIONS, substr(STMT_TEXT,1,40) as stmt_text ...

Page 17: DB2 Performance Tuning: Tips & Techniques

#IDUG

Finding your resource bottleneck

17

“OS & DB2 Monitoring” Combo

Bottleneck OS System Monitoring DB2 Monitoring

Disk • High I/O wait seen in vmstat / iostat / perfmon

• Following symptoms in iostat • Disk > 80% busy • Average I/O Response time > 10 ms • Long disk queues

• Disk I/O Wait time pct of wait time > 10% - 30% • ms per Bufferpool Read > 10 ms • ms per Bufferpool Write > 8 ms • ms per Log Write > 6 ms

CPU • Total CPU utilization near 100% in vmstat / perfmon

• Majority of time spent in processing (TOTAL_RQST_TIME – TOTAL_WAIT_TIME)

• Percent of time spent in sorting • > 5% for OLTP • > 25% for DSS

• Percent of time spent in compilation • > 1% for OLTP • > 5-10% for DSS

Page 18: DB2 Performance Tuning: Tips & Techniques

#IDUG

18

Finding your resource bottleneck Bottleneck OS System Monitoring DB2 Monitoring

Memory Low free memory seen in vmstat / perfmon

Swapping reported in vmstat / perfmon

Activity on swap disks seen in iostat

Higher-than-normal system CPU time seen in vmstat

• Db2pd –memsets • Db2pd -mempools

Lazy System • Lower than expected CPU utilization in vmstat

• Lock Wait time pct of wait time > 10% • Latch Wait time pct of wait time > 10%

Page 19: DB2 Performance Tuning: Tips & Techniques

#IDUG

RCA Decision Tree(*)

19

Bottleneck

Type?

Disk

Bottleneck

Data

Tablespace

Temp

Tablespace

Index

Tablespace

Log

Devices

Bad plan giving tablescan?

• Old statistics?

• Need more indexes?

Insufficient prefetchers?

Over-aggressive cleaning?

LOB reads/writes?

Bad plan(s) giving excessive

index scanning?

• Need more/different indexes?

Insufficient sortheap?

Missing indexes?

Anything sharing the disks?

High transaction rate

• Too-frequent commits?

• Mincommit too low?

High data volume

• Logging too much data?

CPU

Bottleneck Network

Bottleneck

“Lazy

System”

Data

Tablespace

High User

Time

High

System

Time

SQL w/o parameter markers?

Too small dyn SQL cache?

Apps connecting/disconnecting?

Non-parallelize application?

Old device drivers

Creating/destroying agents

Too many connections

Excessive data flow?

• LOBs

• intermediate data

Shared n/w conflict

Lock escalation?

Lock contention?

Deadlocks?

Too few prefetchers?

Too few cleaners?

Append contention?

Mincommit too high?

Inadequate disk configuration / subsystem?

(*) Steve Rees

Page 20: DB2 Performance Tuning: Tips & Techniques

#IDUG

Agenda

• Performance Basics

• Diagnosing Performance Problems

• Sanity Check

• RCA Decision Tree

• First Aid toolkit

• OS tools

• New Monitoring in DB2 9.7

• How to identify Resource Bottleneck

• Interesting Case Study

20

Page 21: DB2 Performance Tuning: Tips & Techniques

#IDUG

Case study

Problem Statement: During month end processing of a financial application, end users complain of very poor response time. Sometimes it gets so bad that DBAs have to ask end users to log off to bring performance under control

21

BTIMESTAMP ACT_COMPLETED_TOTAL TOTAL_RQST_TIME TOTAL_CPU_TIME TOTAL_WAIT_TIME WAIT_TIME_PCT

------------------------------------------------- -------------------- -------------------- -------------------- ------------------------------------------------------------------------------------------------

2013-12-2709.56.06.953792 6529869 10849020 4013850 2824043 26.0

2013-12-2710.26.40.357524 7745399 54753448 5276679 19553857 35.7

2013-12-2710.57.22.446657 10858202 62094470 7724838 24864248 40.0

2013-12-2711.28.20.821182 8357401 170340138 7457519 91122840 53.4

2013-12-2711.59.16.159745 11007690 55956823 9997584 22008190 39.3

2013-12-2712.29.57.777965 9404492 58226702 10549689 26355559 45.2

2013-12-2713.00.47.218433 7274839 39615706 7158033 12774956 32.2

2013-12-2713.31.32.355720 8404885 36547009 6592374 18233963 49.8

2013-12-2714.02.19.522536 7663970 92621096 7499419 56746389 61.2

2013-12-2714.33.06.331026 8458188 182307546 8925829 105314269 57.7

2013-12-2715.07.57.484561 8062552 213516694 7368941 130611517 61.1

2013-12-2715.43.00.412277 9089660 169469181 9180359 115169806 67.9

2013-12-2716.16.16.403637 10209457 133897425 10349969 87745314 65.5

2013-12-2716.49.40.864305 12077339 99660976 13295425 55573696 55.7

2013-12-2717.21.12.302235 11608401 62731089 12411983 31806858 50.7

2013-12-2717.54.08.895777 11497929 78053625 11681713 42326863 54.2

2013-12-2718.27.15.545434 12356058 94755516 10579364 53262337 56.2

Time Spent Query #1 : Where is DB2 spending time?

Page 22: DB2 Performance Tuning: Tips & Techniques

#IDUG

22

Time Spent Query : Wait time breakup & I/O response time BTIMESTAMP TOTAL_RQST_TIME CLIENT_IDLE_WAIT_TIME TOTAL_WAIT_TIME Wait time pct of

request time Lock wait time pct of Total Wait Pool I/O pct of Total Wait Direct I/O pct of Total Wait Log

disk wait pct of Total Wait TCP/IP wait pct of Total Wait IPC wait pct of Total Wait FCM wait

pct of Total Wait Lock waits Lock timeouts Deadlocks Lock escalations

-------------------------- -------------------- --------------------- --------------------

2013-12-2714.33.06.331026 182307546 751154075 105314269

57.7 11.1 83.9 3.5

1.0 0.3 0.0 0.0

5044 8 3 0

2013-12-2715.07.57.484561 213516694 681158134 130611517

61.1 6.8 86.6 4.9

1.3 0.2 0.0 0.0

14553 18 1 0

2013-12-2715.43.00.412277 169469181 770049577 115169806

67.9 15.7 79.2 3.4

1.2 0.3 0.0 0.0

9546 22 5 0

2013-12-2716.16.16.403637 133897425 802722190 87745314

65.5 7.8 86.3 3.5

1.7 0.5 0.0 0.0

10540 0 1 0

BTIMESTAMP BPName Total Reads Total Writes Async writes

Page cleaning ratio Avg read time Avg write time

-------------------------- --------------------

2013-12-2714.02.19.485416 IBMDEFAULTBP 16920842 663812

660660 99.5 3.18 24.50

2013-12-2714.33.06.293863 IBMDEFAULTBP 20670203 738442

734670 99.4 9.24 44.02

2013-12-2715.07.57.393521 IBMDEFAULTBP 21575971 384557

381150 99.1 17.88 45.06

2013-12-2715.43.00.216328 IBMDEFAULTBP 20366286 567330

566939 99.9 14.44 29.56

2013-12-2716.16.16.374577 IBMDEFAULTBP 22007743 596890

596750 99.9 9.72 20.10

Page 23: DB2 Performance Tuning: Tips & Techniques

#IDUG

OS tools

23

xfer: %tm_act bps tps bread bwrtn rps avgserv minserv maxserv timeouts fails wps avgserv

minserv maxserv timeouts fails avgtime mintime maxtime avgwqsz avgsqsz sqfull

hdisk434 95.1 2.4M 242.5 74.5K 2.3M 14.7 10.4 0.1 2.5S 0 0 227.8 12.3 0.1

5.0S 0 0 5.6 0.0 250.3 1.0 2.0 27.6 15:00:59

hdisk434 94.5 1.8M 258.3 110.5K 1.7M 21.8 11.7 0.1 2.5S 0 0 236.5 12.3 0.1

5.0S 0 0 7.2 0.0 250.3 1.0 3.0 41.0 15:01:59

hdisk434 94.3 1.8M 258.9 108.5K 1.7M 21.3 10.2 0.1 2.5S 0 0 237.7 12.5 0.1

5.0S 0 0 7.6 0.0 250.3 2.0 3.0 44.3 15:02:59

hdisk434 93.3 1.8M 251.9 103.4K 1.7M 20.2 10.2 0.1 2.5S 0 0 231.7 11.8 0.1

5.0S 0 0 6.6 0.0 250.3 1.0 2.0 35.7 15:03:59

2 disks at ~100%

disk utilization

Page 24: DB2 Performance Tuning: Tips & Techniques

#IDUG

Hot Queries

24

TIME ROWNUM RR/RS Rows Read Rows Returned TOTAL_ACT_WAIT_TIME

NUM_EXECUTIONS SQL_TEXT

---------------------------------------------------------------------------------------------------------

2012-12-27-14.32.45.822962 1 112 32599 291 1600321 294

select a.percentage,b.descr,d.cts_rm_billability from X a

2012-12-27-14.32.45.822962 2 - 1309 0 871785 1309

UPDATE PSIBQUEUEINST SET

2012-12-27-14.32.45.822962 3 1 679 679 819480 2401

SELECT POL_TOTAL FROM YL WHERE PERIOD_END_DT=? AND EMPLID = ?

2012-12-27-14.32.45.822962 4 1 1177 1177 563161 523

SELECT SHEET_ID, SEQ_NUM, ATTACHSYSFILENAME, ATTACHUSERFILE, LAST_UPDATE_DTTM,

2012-12-27-14.32.45.822962 5 1 537 536 510173 556

SELECT CUST_ID, PROJ_ROLE, DESCR FROM XXX WHERE CUST_ID=?

2012-12-27-14.32.45.822962 6 0 0 949 352744 1025

SELECT 'Y' FROM YYY WHERE EMPLID = ? FETCH FIRST 1 ROWS ONLY

2012-12-27-14.32.45.822962 7 85 8391 98 340375 109

SELECT SHEET_ID, SEQ_NBR_3, EXPENSE_TYPE, DESCR, TXN_AMOUNT,

2012-12-27-14.32.45.822962 8 1 268 268 333748 131

SELECT CUST_ID, PROJ_ROLE, DESCR FROM..

2012-12-27-14.32.45.822962 9 1 3964 3859 325703 36

select b.emplid from x a, y b where a.oprid = ?

2012-12-27-14.32.45.822962 10 3 925 293 288198 293

SELECT BUSINESS_UNIT, PROJECT_ID, EFFDT, RESOURCE_CATEGORY

2012-12-27-14.32.45.822962 11 1 181 181 285040 112

SELECT FILL.PROJECT_MANAGER,FILL APPROVAL_STS,FILL.COUNT_7

• Explain plans for the hot queries shows lot of I/O happening due to large index-scans and

table-scans

Page 25: DB2 Performance Tuning: Tips & Techniques

#IDUG

Performance Tuning

25

• Two pronged approach • Query Tuning: Create indexes to reduce large index scans and table

scans

• DB level Tuning :

• Compressing the top 50 large tables

• Increasing the Bufferpool size

• Storage team advised to add I/O capacity

Page 26: DB2 Performance Tuning: Tips & Techniques

#IDUG

Sweta Singh IBM [email protected]

DB2 Performance Tuning: Tips & Techniques Session 6

Please fill out your session

evaluation before leaving!