Upload
manikandan-kamatchi-kamatchi
View
64
Download
18
Embed Size (px)
DESCRIPTION
SAP BASIS
Citation preview
Session:
Problem Determination and Database monitoring using db2pd
Jorge Daniel Vaquero
Jairo Balart
DAMA - UPC
2
"What's happening in the engine”
Monitoring scenarios
– Presented to better understand requirements
Tools available in v8.2 and v9.1
– System monitor
– db2pd
db2pd: tool to monitor and troubleshoot DB2– Standalone utility shipped with the DB2 engine starting with the DB2 v8.2
– Used by customers to monitor and troubleshoot
– Gives the user a closer view into the DB2 engine
Advantages of using db2pd– Tool collects information without acquiring any latches or using any engine resources
which has two major benefits
– Faster retrieval
– No competition for engine resources
3
Determine whether instance or database is up
"db2pd -" tells whether instance is up and for how long
db2pd –
Database Partition 0 -- Active -- Up 4 days 05:34:53
“db2pd -db <database> -” tells how long the database has been active
db2pd -db sample –
Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 19:11:07
db2pd -alldbs –
Database Partition 0 -- Database XMLDB -- Active -- Up 0 days 19:11:25
Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 19:11:43
4
Determine whether database is up (cont‟d)
Attempting to take an offline backup of an activated database
db2 backup db sample to /home/db2inst1
SQL1035N The database is currently in use. SQLSTATE=57019
db2diag.log will contain
2006-01-04-02.04.17.162649-300 I22772A342 LEVEL: Error
PID : 6275140 TID : 1 PROC : db2bp
INSTANCE: db2inst1 NODE : 000
FUNCTION: DB2 UDB, database utilities, sqlubConnectDatabase, probe:1259
DATA #1 : Hexdump, 4 bytes
0x0FFFFFFFFFFFF490 : FFFF FBF5
db2 list applications command will not work
– it only tells you whether or not any applications are active on the database
5
Monitoring progress and behavior of DB2 agents
db2pd -agents [db=<database>] [ [agent=<agentid>] | [application=<appid>] ]
Agents:
Current agents: 8
Idle agents: 5
Active agents: 2
Coordinator agents: 2
Address AppHandl [nod-index] AgentPid Priority Type
State ClientPid Userid ClientNm Rowsread Rowswrtn LkTmOt
DBName
0x0780000001A35540 0 [000-00000] 2809968 0
Idle n/a n/a n/a 0 0 NotSet
n/a
0x0780000001A16A00 1406 [000-01406] 1134686 0 Coord
Inst-Active 598024 dabrashk db2bp 2 0 NotSet
SAMPLE
0x0780000001A17460 581 [000-00581] 884778 0 Coord
Inst-Active 2969776 dabrashk db2bp 14 0 NotSet
SAMPLE
0x0780000001A34AE0 0 [000-00000] 2076776 0 Coord
Pooled n/a n/a n/a 0 0 NotSet
SAMPLE
6
SQL1226N due to too many db2agent processes
If many db2agent processes remain attached to an instance
– They may use up all of the agents (MAXAGENTS)
– New connections will trigger SQL1226N
– “The maximum number of client connections are already started”
– ADM7009E error message will be logged in a notify log
Use “db2pd –agents” to find ApplHandl, ClientPid, UserId and ClientNm
db2pd -agents | awk '/Address|^0x/ { print $2, $8, $9, $10}'
AppHandl ClientPid Userid ClientNm
643 3612794 dabrashk db2bp
642 3227856 dabrashk db2bp
Force the identified AppHandl off with command
db2 "force application (642)"
7
Repeat Option: db2pd –rep [num sec] [count]
db2pd -age -rep 10 3
Database Partition 0 -- Active -- Up 0 days 00:10:42
AppHandl AgentPid ClientPid Userid ClientNm Rowsread Rowswrtn DBName120 1605826 1130588 jmcmahon db2bp 18943 23998 SAMPLE13 1085632 1675376 jmcmahon db2bp 3489 9898 SAMPLE
AppHandl AgentPid ClientPid Userid ClientNm Rowsread Rowswrtn DBName120 1605826 1130588 jmcmahon db2bp 98941 100091 SAMPLE13 1085632 1675376 jmcmahon db2bp 4100 9898 SAMPLE
AppHandl AgentPid ClientPid Userid ClientNm Rowsread Rowswrtn DBName120 1605826 1130588 jmcmahon db2bp 100999 230448 SAMPLE13 1085632 1675376 jmcmahon db2bp 6777 9898 SAMPLE
Repeat option is handy for watching activities. Example above watches agent’s reads and writes
Combine -repeat option with the file redirection
– db2pd -age file=agents.out -rep 10 3
Combine multiple options and use mixed scope options– db2pd –db sample –loc –tra –age –fil lock.txt
– Use –file for multiple options
8
Monitoring progress and behavior of applications
db2pd –applications –db sample
Use to map application to a coordinator agent
Use to determine a status of application
Use to map application to the dynamic SQL statement
db2pd -activestatements
– Any statement that is part of the active statement list is reported
– Gives the user the ability to identify all active dynamic statements for all
applications
Use to map application ID to IP address and port
– Simplified in v9.1: no need to convert
9
Monitoring currently executing dynamic SQL statements
db2pd -db sample -activestatements
Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 00:43:12
Active Statement List:
Address AppHandl [nod-index] UOW-ID StmtID AnchID StmtUID EffISO EffLockTOut EffDegree StartTime LastRefTime
0x0780000020A405A0 51 [000-00051] 7 1 73 1 1 -2 0 Sat Jan 7 00:17:43 2006 Sat Jan 7 00:17:43 2006
0x07800000209FB8A0 44 [000-00044] 1 2 44 1 1 -2 0 Sat Jan 7 00:10:12 2006 Sat Jan 7 00:10:12 2006
db2pd -db sample -dyn
Dynamic SQL Statements:
Address AnchID StmtUID NumEnv NumVar NumRef NumExe Text
0x0780000020A08660 43 1 2 2 4 3 create table staff3 like staff
0x0780000020A076C0 44 1 1 1 1 1 select name from staff
Monitoring SQL statements executed on the instance– sqltext script
– Uses insert time as reported by „db2pd –dynamic‟ SQL Variations section.
10
Monitoring transactions: db2pd -transactions
Transactions:Address AppHandl [nod-index] TranHdl Locks State Tflag
Tflag2 Firstlsn Lastlsn LogSpace SpaceReserved TID AxRegCnt GXID
0x078000002024DA80 1000 [000-01000] 2 1020 READ 0x00000000 0x00000000 0x000177000C00 0x0000017FF000 1230 10000 0x000000006614 1 n/a
0x078000002024E780 1004 [000-01004] 3 35 WRITE 0x00000000 0x00000000 0x0001801CF000 0x000001999600 115 10000 0x000000006627 1 n/a
Useful for determining the amount of resources a transaction is using
– db2pd -transactions provides
– number of locks
– first lsn, last lsn– Log Sequence Number represents relative byte address, within the database log, for the first
byte of the log record
– logspace used (in pages)
– space reserved (in pages)
Monitor the progress and behavior of any transaction
11
Monitoring application progress
Identifying slow or hanging aplications
Monitor rows read and written for agents
– db2pd -agents | awk '/Address|^0x/ { print $2, $11, $12, $14;}„
AppHandl Rowsread Rowswrtn DBName
51 109 58 SAMPLE
44 46 0 SAMPLE
9 0 0 n/a
8 126 107 SAMPLE
0 0 0 SAMPLE
– Use AppHandl to determine the application to take action on
– Use db2pd –dynamic to find SQL statement
– Use db2pd –static to find package
– Use AppId to find out application‟s IP address and port number (if applicable)
12
Monitoring application progress (cont‟d)
Monitor start time of statements in db2pd –activestatements– db2pd -activestat -db sample | awk '/Address/ { print $2,$6,$7,$11,$12 } /^0x/ {print
$2,$6,$7,substr($0,115);}'
AppHandl AnchID StmtUID StartTime LastRefTime
76 44 1 Sat Jan 7 17:57:35 2006 Sat Jan 7 17:57:35 2006
– Use AnchID and StmtUID to identify SQL statement
– Use AppHandl to identify application
Find “non-committing” transactions
– Use the first and last LSN (log sequence number) of the transaction
• db2flsn executable can be used to identify the log file for a specific lsn.
– db2pd -trans -db sample | awk '/Address|^0x/ { print $2,$9,$10;}'
AppHandl Firstlsn Lastlsn
76 0x0001801CF000 0x000202999600
44 0x000177000C00 0x000177000C00
Find biggest lockers
– Use AppHandl and Locks fields of db2pd -transactions
– db2pd -trans -db sample | awk '/^0x/ { print $5, $2}' | sort -rn
7 76
4 74
2 44
13
Monitoring OS: db2pd -osinfo
Operating System Information:
OSName: AIX
NodeName: mymachine
Version: 5
Release: 2
Machine: 000ABCD123
CPU Information:
TotalCPU OnlineCPU ConfigCPU Speed(MHz) HMTDegree
4 4 4 1453 1
Physical Memory and Swap (Megabytes):
TotalMem FreeMem AvailMem TotalSwap FreeSwap
16384 10201 n/a 16384 16366
Virtual Memory (Megabytes):
Total Reserved Available Free
40960 n/a n/a 38907
Shared Memory Information:
ShmMax ShmMin ShmIds ShmSeg
68719476736 1 131072 0
14
Using db2pd to diagnose hangs
15
Useful DB2 tools for hangs
The “db2pd” tool
– Purpose: To gather information quickly and non-intrusively
from the DB2 engine
The “db2cos” tool
– Purpose: To be called “inline” from DB2 code to collect
information about problems
Latch tracking
– Purpose: To track latch ownership
Snapshot monitoring
– Purpose: To understand point in time status of queries or
entities in the DB2 engine
DB2 trace
– To investigate possible movement in DB2 agents
16
Troubleshooting deadlocks and lock wait timeouts
For transient/infrequent deadlocks or lock wait timeouts
– These can last a short period of time and need to be caught quickly
– db2pdcfg -catch uses inline code to trigger an action immediately
– Default (primary) action is to run the DB2 call out script (db2cos)
db2pdcfg -catch locktimeout count=1Error Catch #1
Sqlcode: 0
ReasonCode: 0
ZRC: -2146435004
ECF: 0
Component ID: 0
LockName: Not Set
LockType: Not Set
Current Count: 0
Max Count: 1
Bitmap: 0x4A1
Action: Error code catch flag enabled
Action: Execute sqllib/db2cos callout script
Action: Produce stack trace in db2diag.log
17
Collecting call stacks in v9.1
Used to determine what DB2 agent is doing
Call stack collection has been sped up significantly in v9.1
– Stack traces (trap files) collection is dissociated from binary dump files
Produce stack trace for all PIDs or chosen PID
– db2pd -stack [all|<pid>]
– Produces trap file(s) in the DIAGPATH directory
Produce dump file and stack trace for all PIDs or chosen PID
– db2pd -dump [all|<pid>]
18
db2pd -latches option (v9.1)
Latch tracking is always-on in v9.1
db2pd -latches
Latches:
Address Holder Waiter Filename LOC LatchType
0x07800000203C40C8 1695804 0 sqldpool.C 529
SQLO_LT_SQLB_CLNR_PAUSE_CB__preventSuspendLatch
0x07800000203C4190 1695804 0 sqlbpool.C 2254
SQLO_LT_SQLB_PTBL__pool_table_latch
0x07800000203C5678 1695804 0 sqlbistorage.h 5169
SQLO_LT_SQLB_PTBL__ptfLatches
– Holder is agent process ID
– LatchType is a latch identifier
To group latches by holders and waiters
– db2pd -latches group
Latch Holders:
Address Holder Filename LOC LatchType
0x07800000204A8A48 1695804 sqlbilatch.C 1150 SQLO_LT_SQLB_POOL_CB__writeLatch
Latch Waiters:
Address Waiter Filename LOC LatchType
19
Detecting hangs for specific applications Take a few application snapshots a minute apart to determine
– what the status of the application is (db2pd –app: Status) and
– whether any work is being done (db2pd –agent: RowsRead/Wrtn)
– It is useful to have turned on all of the monitor switches prior to a re-creatable or recurring problem scenario
Use db2pd -app to determine status of an application– db2pd -app -db sample | awk '/Address|^0x/ { print $6 }'
If status is UOW Waiting, the hang is not occurring at the DB2 server– The client application should be investigated to find out what it is waiting for.
– In DPF environment, this may indicate a problem with another partition
If status is Executing and counters like rows-read/written are increasing, it is likely a performance issue
If status is Lock-wait than it is a locking/concurrency issue– Exception is the case when the application being waited on is in UOW Executing and making
no progress
If status is Executing yet no counters are increasing, then the agent or agents servicing the application may be in an abnormal state
– More diagnostics is needed
20
Monitoring locks: lock contention
Monitor for slowdowns– Important to figure out “who is waiting for whom”
Identify the lock owner to consider action of releasing the lock
db2pd -db sample -loc
Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 00:00:26
Locks:... TranHdl Lockname Type Mode Sts Owner ...... 2 00020003000000040000000052 Row ..X G 2... 3 00020003000000040000000052 Row ..X W 2
Look for the “W” status for waiters– Transaction 2 is holding the lock that transaction 3 is waiting on
db2pd -db sample -locks wait
– Show locks with a wait status and their waiter (v8.2 FP9)
Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 01:17:17
Locks:... TranHdl Lockname Type Mode Sts Owner Dur HldCnt Att Rlse... 2 000200040000000D0000000052 Row .NS W 3 1 0 0 0x0... 4 00020003000000270000000052 Row .NS W 0 1 0 0 0x0... 2 00020003000000270000000052 Row ..X G 2 1 0 8 0x40
Look for the “G” status for holders
21
Monitoring locks (cont‟d) Lockname <==> hex representation of the physical object that is being waited on
db2pd -db sample -loc showlocks
Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 00:10:42
Locks:... Lockname Type... 000200030000001A0000000052 Row TbspaceID 2 TableID 3 RecordID 0x1A... 000200030000001F0000000052 Row TbspaceID 2 TableID 3 RecordID 0x1F... 53514C4332453036C8324ABC41 Internal P Pkg UniqueID 53514c43 32453036 Name c8324abc... 53514C4332453036C8324ABC41 Internal P Pkg UniqueID 53514c43 32453036 Name c8324abc... 0002000D000000050000000052 Row TbspaceID 2 TableID 13 RecordID 0x5... 00020003000000000000000054 Table TbspaceID 2 TableID 3... 0002000D000000000000000054 Table TbspaceID 2 TableID 13
„showlocks‟ suboption will expand the lockname into meaningful explanations
To determine who is holding a lock in your database
– db2pd –database sample –locks –transactions –agents –file lock.txt
– -agents will contain UserID for transaction handle that is holding a lock (status is G (granted))
Map lock info to a table name– Use TableID from Lockname or showlocks output
db2pd -tcbstats -db sample | awk '/Address|^0x/ { print $2,$3,$4}'
TbspaceID TableID TableName
0 18 SYSROUTINES
0 81 SYSROUTINEPROPERTI
2 4 DEPARTMENT
22
Monitoring buffer pool
Determine whether we are spending time flushing buffers
– due to space constraint or poor allocation of pools
– It‟s needed to identify areas for tuning
db2pd -buffer -db sample
Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days
00:00:41
Bufferpools:
First Active Pool ID 1h
Max Bufferpool ID 1
Max Bufferpool ID on Disk 1
Num Bufferpools 5
Address Id Name PageSz PA-NumPgs BA-NumPgs
BlkSize NumTbsp PgsToRemov CurrentSz PostAlter SuspndTSCt
0x078000002034A8C0 1 IBMDEFAULTBP 4096 1000 0 0
3 0 1000 1000 0
23
Buffer pool statistics (v9.1)
db2pd -db sample -bufferpools– DatLRds (DatPRds) number of logical (physical) data page reads for this
bufferpool
– Hit ratio for data pages given the above logical and physical reads
– Same for index pages: IdxLRds, IdxPRds, HitRatio
Bufferpool Statistics for all bufferpools (when BUFFERPOOL monitor switch is ON):
BPID DatLRds DatPRds HitRatio TmpDatLRds TmpDatPRds HitRatio IdxLRds IdxPRds HitRatio TmpIdxLRds TmpIdxPRds HitRatio
1 78 22 71.79% 0 0 00.00% 100 58 42.00% 0 0 00.00%
BPID DataWrts IdxWrts DirRds DirRdReqs DirRdTime DirWrts DirWrtReqs DirWrtTime1 0 0 42 5 0 0 0 0
BPID AsDatRds AsDatRdReq AsIdxRds AsIdxRdReq AsRdTime AsDatWrts AsIdxWrts AsWrtTime 1 0 0 0 0 0 0 0 0
BPID TotRdTime TotWrtTime VectIORds VectIOReq BlockIORds BlockIOReq PhyPgMaps FilesClose NoVictAvl UnRdPFetch
1 104 0 0 0 0 0 0 8 0 0
24
Buffer pool statistics (v9.1) (cont)
Filtering bufferpools output by bufferpool ID
– -bufferpools <bpID>
db2pd -db sample -bufferpools 4099
Address Id Name PageSz PA-NumPgs BA-NumPgs
BlkSize NumTbsp PgsToRemov CurrentSz PostAlter SuspndTSCt
0x07800000203C99A0 4099 IBMSYSTEMBP32K 32768 16 0 0
0 0 16 16 0
Bufferpool Statistics for bufferpool 4099 (when BUFFERPOOL monitor switch is
ON):
25
db2pd -pages (v9.1)
db2pd -db sample -pages
– Pages for all bufferpools
db2pd -db sample -pages [<bpID>]
Monitor bufferpool behavior
– Tells which pages are in the bufferpool
– Use to determine what is in the bufferpool that is the cause for the hit ratio to be lower than you expect
Allows user to check how many pages each object (table, index, etc) has within any particular bufferpool
Similar to IDS's onstat -b option
might help to detect a problem, such as an insufficient number of buffers in the buffer pool or high “read aheads”
26
db2pd -pages: example (v9.1)
db2pd -db sample -page
Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 00:01:01
Bufferpool Pages:First Active Pool ID 1Max Bufferpool ID 1Max Bufferpool ID on Disk 1Num Bufferpools 5
Pages for all bufferpools:Address BPID TbspaceID TbspacePgNum ObjID ObjPgNum ObjClass ObjType Dirty
Prefetched0x07800000204DD040 1 0 6 19 6 Perm Index N N 0x07800000204DD0F0 1 0 0 19 0 Perm LOBA N N 0x07800000204DD1A0 1 0 0 19 0 Perm Index N N 0x07800000204DD250 1 0 1 19 1 Perm Index N N 0x07800000204DD300 1 0 2 19 2 Perm Index N N 0x07800000204DD3B0 1 0 0 1 0 Perm Data N N 0x07800000204DD460 1 0 4 19 4 Perm Index N N 0x07800000204DD510 1 0 0 19 0 Perm Data N N 0x07800000204DDB40 1 0 1 87 1 Perm Index N N 0x07800000204DDBF0 1 0 7 19 7 Perm Index N N 0x07800000204DDCA0 1 0 8 19 8 Perm Index N N 0x07800000204DDD50 1 0 9 19 9 Perm Index N N 0x07800000204DDE00 1 0 10 19 10 Perm Index N N <snip>Total number of pages: 80
Summary info for all bufferpools:…
27
Monitoring for SQL errors: db2pdcfg -catch
Functionality moved from db2pd to db2pdcfg in v9.1
Purpose
– allow the user to catch any sqlcode (and reason code), zrc or ecf codes (internal error codes)
– capture the information needed to solve the error code
Primary action: execute the db2cos (callout script)
– template db2cos file is located in sqllib/bin (v9.1)
– db2cos may be altered to run any command (db2pd,OS or other) needed to solve the problem: default is „db2pd -db $database‟ in "SQLCODE“ section)
Defaults
– Up to 10 catch points simultaneously
– “Error catch array is full. Use 'clear' suboption to clear an element"
– Up to 255 invocations
– Max Count: 255
28
Index Statistics: db2pd -tcbstats all
Useful for performance tuning
Numerous statistics are reported to provide characteristics about each
index‟s use
– Only database activation/deactivation will reset these statistics
“db2pd –tcbstats all” or “db2pd –tcbstats index”
TCB Index Stats:
TbspaceID TableID TableName SchemaNm ID RootSplits Scans KeyUpdates Merg
0 2 SYSTABLES SYSIBM 6 0 0 0 0
0 2 SYSTABLES SYSIBM 5 0 0 0 0
Scans
– The number of scans against the index
29
Detect full-table scan vs index scans
Detect full-table scans for every table
– Use „db2pd -tcb -db <dbname>‟
db2 “select * from employee”
db2pd -tcb -db sample | awk '/TCB Table Stats/ { found =1} found==1 { print}' | grep -i employee | awk '{print "Scans: ", $3}'
Detect number of index scans
– Use 'TCB Index Stats' portion of the 'db2pd -tcb index -db <dbname>' output
db2 "CREATE INDEX LNAME ON EMPLOYEE (LASTNAME ASC)“
db2 “select * from employee”
db2pd -tcb index -db sample | awk '/TCB Index Stats/ { found =1} found==1 { print}' | grep -i employee | awk '$8 > 0 {print "Index Scans: ", $8}'
30
DB2 table access ratio
db2pd can help determine the access frequency of each table
– Operations such as select, update, insert and delete
– Reported by db2pd -db <dbname> -tcbstats
Identify tables with most inserts done to them
db2pd -db sample -tcbstats | awk '/TCB Table Stats/ { found =1} found==1 { print}' |
awk '/^0x/ { print $9, $2}' | sort -rn | head -5
36 STAFF3
32 EMPLOYEE
20 PROJECT
7 SYSCOLUMNS
1 SYSUSERAUTH
32
Monitoring progress of transaction logging
By watching the Pages Written output, you can determine whether the log usage is progressing
db2pd -logs -db sample
Logs:Current Log Number 4Pages Written 464
Address StartLSN State Size Pages Filename0x000000022022FEB8 0x000000FA0000 0x00000000 1000 597 S0000000.LOG
0x000000022022FF78 0x000001388000 0x00000000 1000 5 S0000001.LOG
0x0000000220008E78 0x000001770000 0x00000000 1000 3 S0000002.LOG
0x0000000220A57F58 0x000001B58000 0x00000000 1000 1000 S0000003.LOG
0x0000000220A32598 0x000001F40000 0x00000000 1000 1000 S0000004.LOG
Monitor amount of log space consumed over the course of 10 minutes– db2pd -db SAMPLE -logs -repeat 60 10
33
Monitoring log usage (FP9 enhancements)
db2pd -logs has some new information since v8.2.2:Logs:Current Log Number 5Pages Written 846Method 1 Archive Status SuccessMethod 1 Next Log to Archive 5Method 1 First Failure n/aMethod 2 Archive Status SuccessMethod 2 Next Log to Archive 5Method 2 First Failure n/a
Address StartLSN State Size Pages Filename0x000000023001BF58 0x000001B58000 0x00000000 1000 1000 S0000002.LOG0x000000023001BE98 0x000001F40000 0x00000000 1000 1000 S0000003.LOG0x0000000230008F58 0x000002328000 0x00000000 1000 1000 S0000004.LOG
Two problems can be identified with this output– Problem with archiving
– if Archive Status is set to Failure, the most recent log archive failed
– If First Failure is set, ongoing archive failure is preventing logs from archiving
– Log archiving is proceeding very slowly
– Next Log to Archive will be behind Current Log Number (this can cause the log path to fill up completely)
– Monitor „Next Log to Archive‟ compared to „Current Log Number‟
– If next log is 3 and current is 5, then logs 3 and 4 haven‟t been logged yet
– Log 5 is the current log being written into
34
Monitoring Tablespaces and Containers
Single tablespace can be monitored
– db2pd -tab[lespaces] <tablespaceID> -rep[eat] <numSecs>
db2 "create tablespace dms1 managed by database using (file 'tbspace1' 1M)“
db2pd -db sample -tab | grep DMS1 | awk '{print $2, $15}„6 DMS1 tablespace ID is 6
db2pd -db sample -tab 6 | perl -ane 'if (/TotalPgs * UsablePgs/ .. /^$/) { print "$F[2] $F[3]\n" } '
UsablePgs UsedPgs
224 96
db2 "insert into staff3 select * from staff“…repeat …
db2pd -db sample -tab 6 | perl -ane 'if (/TotalPgs * UsablePgs/ .. /^$/) { print "$F[2] $F[3]\n" } '
UsablePgs UsedPgs
224 160
35
Verifying isolation level
Current isolation level of dynamic SQL statement
– “how can I tell what isolation level is being used ?”
– db2pd -db sample -dynamic
– ISO column in Dynamic SQL Environments sectionDynamic SQL Environments:
Address AnchID StmtUID EnvID Iso QOpt Blk
0x0780000020CEBC40 41 1 1 CS 5 B
0x0780000020BB5D80 42 2 1 CS 5 B
0x0780000020CE2FC0 235 1 1 RR 5 B
– db2pd -activestatements
– EffISO column
– 0=RR,1=CS,2=UR and 3=RS
“Will setting DB2_EVALUNCOMMITTED to ON help?”
– DB2_EVALUNCOMMITTED enables deferred locking
– available if application is using cursor stability or read stability
36
Monitoring memory usage
db2pd -memsets -mempools
reports statistics about DB2 Memory Sets and Memory Pools which helps in
understanding memory usage
Memory Sets:
Name Address Id Size Key DBP Type Ov OvSize
DBMS 0x0780000000000000 19398699 56639488 0x59FE1161 0 0 Y 8241152
FMP 0x0780000010000000 72220781 245284864 0x0 0 2 N 0
Trace 0x0770000000000000 28704896 134906824 0x59FE1174 0 -1 N 0
Memory Pools:
Address MemSet PoolName Id Overhead LogSz LogUpBnd LogHWM
PhySz PhyUpBnd PhyHWM Bnd BlkCnt CfgParm
0x0780000000001230 DBMS fcmrqb 79 193888 1290312 1953864 1290312
1507328 1966080 1507328 Ovf 0 n/a
…
0x07800000000003C0 DBMS eduah 72 2496 112032 112064 112032
114688 114688 114688 Ovf 0 n/a
0x07800000100003C0 FMP undefh 59 48000 737400 245163840 737400
786432 245170176 786432 Phy 0 n/a
37
Monitoring memory usage (cont)
db2pd –memblocks
– Reports all memory blocks in DBMS set (list)
– Followed by the sorted 'per-pool' output Memory blocks sorted by size for ostrack pool:
PoolID PoolName TotalSize(Bytes) TotalCount LOC File
57 ostrack 5160048 1 3047 698130716
57 ostrack 240048 1 3034 698130716
57 ostrack 240 1 2983 698130716
57 ostrack 80 1 2999 698130716
57 ostrack 80 1 2970 698130716
57 ostrack 80 1 3009 698130716
Total size for ostrack pool: 5400576 bytes
– Final section sorts the consumers of memory for the entire setAll memory consumers in DBMS memory set:
PoolID PoolName TotalSize(Bytes) %Bytes TotalCount %Count LOC File
57 ostrack 5160048 71.90 1 0.07 3047 698130716
50 sqlch 778496 10.85 1 0.07 202 2576467555
50 sqlch 271784 3.79 1 0.07 260 2576467555
57 ostrack 240048 3.34 1 0.07 3034 698130716
50 sqlch 144464 2.01 1 0.07 217 2576467555
69 krcbh 73640 1.03 5 0.36 547 4210081592
Report memory blocks for private memory on UNIX and Linux
– db2pd -memb pid=159770
39
Monitoring utilities: db2pd -utilities
Utilities:
ID Type DBName StartTime NumPhase CurPhase Desc
2 BACKUP SAMPLE Wed 12:35:00 Apr 28 2004 1 1 offlinedb
Progress:
ID PhaseNum StartTime CompletedWork TotalWork
2 1 Wed Apr 28 12:35:42 2004 22782661 bytes 24303325 bytes
Utility types: BACKUP, RUNSTATS, REORG, RESTORE, CRASH_RECOVERY,
ROLLFORWARD_RECOVERY, LOAD, RESTART_RECREATE_INDEX
– db2 backup db sample
– db2 restore db sample
Use to monitor utilities‟ progress
– Determine whether to throttle or unthrottle BACKUP or RUNSTATS utility (using SET
UTIL_IMPACT_PRIORITY)
40
Monitoring Table Reorgs: db2pd -reorgs
Table reorg output including tablespace id, table id, table name, phases, counters, type (offline/online), start time, and end time are reported
Table Reorg Stats:
TbspaceID TableID TableName MaxPhase Phase CurCount MaxCount Type
2 2 PDTEST 2 Replace 0 2 Offline
Phase field (only applies to offline table reorganization)
– The phase of the table reorganization: Sort, Build, Replace, InxRecreat
Status field (only applies to online table reorganization)
– status of an online table reorganization: Started, Paused, Stopped, Done, Truncat
– “Done" status indicates that the reorg utility has been completed
Completion field– success indicator for the table reorganization. Possible values:
– 0. The table reorganization completed successfully
– -1. The table reorganization failed
41
Recovery: db2pd -recovery
Recovery:Recovery Status 0x00000401Current Log S0000000.LOGCurrent LSN 000000BB800CJob Type ROLLFORWARD RECOVERYJob ID 2Job Description Database Rollforward RecoveryInvoker Type User Total Phases 2 Current Phase 1
Progress:PhaseNum Description StartTime CompletedWork TotalWork1 Forward Wed May 5 10:48:09 2004 0 bytes Unknown2 Backward NotStarted 0 bytes Unknown
Monitoring recovery
db2pd –recovery shows several counters to make sure recovery is progressing:
– Current Log and Current LSN provide the log position
– CompletedWork counts the number of bytes completed thus far
42
Figuring out which application is using up your tablespace Identify number of Inserts for table (here, temp table TEMP1)
– db2pd -tcbstats
TCB Table Stats:
Address TbspaceID TableID TableName SchemaNm Scans Inserts
ObjClass UDI DataSize
0x000000022094AA58 4 2 TEMP1 SESSION 0 124
Temp 0 1
Map to tablespace 4 in db2pd -tablespaces output:
Tablespaces:
Address Id Type Content AS AR PageSize ExtentSize Auto
Prefetch BufID BufIDDisk
0x0000000220942F80 4 DMS UsrTmp No No 4096 32 Yes 32
1 1
Containers:
Address TspId ContainNum Type TotalPages UseablePgs StripeSet
Container
0x0000000220377CE0 4 0 File 10000 9952 0
/export/home/jmcmahon/tempspace2a
Notice the space filling up by watching UseablePgs vs. TotalPages
43
Figuring out which application is using up your tablespace (2)
Identify the dynamic sql statement using a table called TEMP1
– db2pd -db sample -dyn
Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 00:13:06
Dynamic Cache:
Current Memory Used 1072198
Total Heap Size 1271398
Cache Overflow Flag 0
Number of References 7540
Number of Statement Inserts 3981
Number of Statement Deletes 3924
Number of Variation Inserts 2459
Number of Statements 57
Dynamic SQL Statements:
Address AnchID StmtUID NumEnv NumVar NumRef NumExe
Text
0x0000000220A08C40 78 1 2 2 3 2
declare global temporary table temp1 (c1 char(6)) not logged
0x0000000220A8D960 253 1 1 1 24 24
insert into session.temp1 values(TEST)
44
Figuring out which application is using up your tablespace (3) Map this to -app output to identify the application
– db2pd -app -db sample
Applications:
Address AppHandl [nod-index] NumAgents CoorPid Status C-
AnchID C-StmtUID L-AnchID L-StmtUID Appid
0x0000000200661840 501 [000-00501] 1 11246 UOW-Waiting 0
0 253 1 *LOCAL.jmcmahon.050202160426
db2pd -agent output will show the number of rows written as
verification
Address AppHandl [nod-index] AgentPid Priority Type
State ClientPid Userid ClientNm Rowsread Rowswrtn LkTmOt
DBName
0x0000000200698080 501 [000-00501] 11246 0 Coord
Inst-Active 26377 jmcmahon db2bp 100999 230448 NotSet
SAMPLE
45
Monitoring implicit temporary table space
Steps are different for the implicit temporary table
Use db2pd -tcbstats to identify tables with large numbers of insertsTCB Table Information:
Address TbspaceID TableID PartID MasterTbs MasterTab TableName SchemaNm ObjClass
DataSize ...
0x0780000020CC0D30 1 2 n/a 1 2 TEMP (00001,00002) <30> <JMC Temp
2470 ...
0x0780000020CC14B0 1 3 n/a 1 3 TEMP (00001,00003) <31> <JMC Temp
2367 ...
0x0780000020CC21B0 1 4 n/a 1 4 TEMP (00001,00004) <30> <JMC Temp
1872 ...
TCB Table Stats:
Address TableName Scans UDI PgReorgs NoChgUpdts Reads FscrUpdates Inserts ...
0x0780000020CC0D30 TEMP (00001,00002) 0 0 0 0 0 0 43219 ...
0x0780000020CC14B0 TEMP (00001,00003) 0 0 0 0 0 0 42485 ...
0x0780000020CC21B0 TEMP (00001,00004) 0 0 0 0 0 0 0 ...
Notice large number of inserts for implicit temporary tables
– tables with the naming convention "TEMP (TbspaceID, TableID)“
– Identify the application doing the work
– values in the SchemaNm column have a naming convention of
<AppHandl><SchemaNm>
46
Monitoring implicit temporary table space (cont) Map that info to the used space for table space 1
– Use db2pd –tablespaces
– Notice the UsedPgs vs the UsablePgs in the table space statisticsTablespace Configuration:
Address Id Type Content PageSz ExtentSz Auto Prefetch BufID BufIDDisk FSC
NumCntrs MaxStripe LastConsecPg Name
0x07800000203FB5A0 1 SMS SysTmp 4096 32 Yes 320 1 1 On
10 0 31 TEMPSPACE1
Tablespace Statistics:
Address Id TotalPgs UsablePgs UsedPgs PndFreePgs FreePgs HWM State
MinRecTime NQuiescers
0x07800000203FB5A0 1 6516 6516 6516 0 0 0 0x00000000
0 0
Tablespace Autoresize Statistics:
Address Id AS AR InitSize IncSize IIP MaxSize LastResize LRF
0x07800000203FB5A0 1 No No 0 0 No 0 None No
Identify the application handles 30 and 31
– Seen in the -tcbstats output
– db2pd -app
Map this to the Dynamic SQL using db2pd -dyn
47
db2pd –fmp command
Returns information about the process in which the fenced routines are executed
db2pd –fmp
FMP:Pool Size: 1Max Pool Size: 250Keep FMP: YESInitialized: YESTrusted Path: /home/dabrashk/sqllib/function/unfencedFenced User: nobody
FMP Process:Address FmpPid Bit Flags ActiveThrd PooledThrd Active0x07800000007A5FE0 6455492 64 0x00000000 0 0 No
Active Threads:Address FmpPid EduPid ThreadId No active threads.
Pooled Threads:Address FmpPid ThreadId No pooled threads.
48
db2pd -fmp (cont‟d)
Useful flags (StateFlags field)
– 0x00000001 - JVM initialized
– 0x00000002 - Is threaded
– 0x00000004 - Used to run federated wrappers
– 0x00000008 - Used for Health Monitor
– 0x00000010 - Marked for shutdown and will not accept new tasks
– 0x00000020 - Marked for cleanup by db2sysc
– 0x00000040 - Marked for agent cleanup
– 0x00000100 - All ipcs for the process have been removed
– 0x00000200 - .NET runtime initialized
– 0x00000400 - JVM initialized for debugging
– 0x00000800 - Termination flag
ActiveTh - Number of active threads running in the fmp process.
PooledTh - Number of pooled threads held by the fmp process.
Active - Active state of the fmp process. Values are Yes or No.
49
-fcm command improvements in v9.1
Use the new hwm option to see historical information about
applications that consume large amounts of fast communication
manager (FCM) resources
– Retrieve high-watermark consumptions of FCM buffers and channels by
applications since the start of the DB2 instance
– The high-watermark consumption values of applications are retained even
if they have disconnected from the database already
The output will now contain FCM channel usage statistics, including
the high and low water mark values with respect to the number of
channels used.
50
DB2 Troubleshooting and Problem Determination
Resources
– DB2 Monitoring and Troubleshooting: db2pd tool
– http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.a
dmin.doc/doc/r0011729.htm
– Problem Determination Guide
– http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.r
n.doc/doc/c0023244.htm
– What’s new for V9.1: Troubleshooting and problem determination enhancements
– http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.r
n.doc/doc/c0023244.htm
– DB2 Product Support site
– http://www-306.ibm.com/software/data/db2/udb/support/index.html
– DB2 APARs (Authorized Program Analysis Reports)
– http://www-306.ibm.com/software/data/db2/udb/support/apars.html
51
Summary
Understanding requirements for PD and monitoring
Different monitoring scenarios
Tools available in v8.2 and v9.1
– db2pd - Monitor and troubleshoot DB2 Universal Database Command
– Options either added or extended for each release/fixpack
– “Proactive” FFDC: catching errors
Questions
52
Appendix
Miscellaneous monitoring tools
54
Miscellaneous monitoring tools
Lock Timeout Report Tool
db2mc: The DB2 Monitoring Console - Open Source Project
– Light-weight, web-based console for DB2 for Linux, UNIX and Windows
– http://sourceforge.net/projects/db2mc
– Help: http://sourceforge.net/docman/?group_id=211760
db2top: Single System View Monitor for DB2
– Specifically designed for DPF environments
– The user interface is character-based and built using the curses library
– Unix only
– http://dl.alphaworks.ibm.com/technologies/db2top/db2top.pdf
55
Troubleshooting lock timeouts
Lock Timeout Report Tool– Introduced in v9.5. Will be back ported to v9.1FP4 and v8.2F16
– Lock timeouts and deadlocks happen very frequently in customer situations
– End result is that some application (unit of work) fails and gets rolled back forcing the user to resubmit the work.
– It is very desirable to avoid lock timeouts and deadlocks in a production environment.
– The Lock Timeout Reporting tool makes debugging these and making the needed application changes to avoid them possible.
Enabled by setting registry variable DB2_CAPTURE_LOCKTIMEOUT– To enable/disable lock timeout reporting
db2set DB2_CAPTURE_LOCKTIMEOUT=ON
db2set DB2_CAPTURE_LOCKTIMEOUT=
56
Troubleshooting lock timeouts (cont)
Lock timeout report
– Lock in contention• Lock name and type • Lock specifics, including row ID, table space ID, and table ID. Use this information to query the
SYSCAT.TABLES system catalog view to identify the name of the table.
– Lock Requestor
– Lock owner or representative– There can be more than one lock owner: the first lock owner is the representative for other lock owners
Lock timeout report files– Report file is generated by the agent receiving the lock timeout error
– When the lock timeout reporting function is active and a lock timeout occurs
– The report is stored in a file using the following name format: db2locktimeout.par.AGENTID.yyyy-mm-dd-hh-mm-ss, where– par is the database partition number. In non-partitioned database environments, par is set to 0.
– AGENTID is the agent ID.
– yyyy-mm-dd-hh-mm-ss is the time stamp, consisting of the year, month, day, hour, minute, and second.
– An example of a lock timeout report file name is /home/juntang/sqllib/db2dump/db2locktimeout.000.4944050.2006-08-11-11-09-43.
57
Monitoring STMM
Tune a system from an out-of-the-box configuration to near-optimal memory usage in
an hour or less
Retrieve the current size of a buffer pool set to AUTOMATIC
– db2pd -database MYDB1 -bufferpools
– See CurrentSz column
Monitor STMM bufferpool changes
db2diag -g "message:=Altering bufferpool" db2diag.log
Monitor DB configuration changes
– db2diag -node 1 -g "changeevent:=STMM CFG DB" db2diag.log
Tool to parse the STMM log files
– parseStmmLogFile.pl <log file> <database name> <options>
– http://www.ibm.com/developerworks/db2/library/techarticle/dm-0708naqvi/
58
Monitoring memory usage: db2mtrk
Provide complete report of memory status, for instances, databases
and agents
Outputs the following memory pool allocation information:
– Current size
– Maximum size (hard limit)
– Largest size (high water mark)
– Type (identifier indicating function for which memory will be used)
– Agent who allocated pool (only if the pool is private)
The "Other Memory" reported is the memory associated with the
overhead of operating the database management system
db2top tool
60
db2top: monitoring tool
Single System View Monitor for DB2
– Specifically designed for DPF environments
– The user interface is character-based and built using the curses library
– Unix only
– Uses the DB2 snapshot monitoring APIs to retrieve data
– Uses both global as well as partition-specific monitoring information to provide
• aggregation
• quick drill-down capabilities
db2top info
– db2top –h
– http://dl.alphaworks.ibm.com/technologies/db2top/db2top.pdf
db2top configuration file: .db2toprc
– Used to setup parameters at initialization time
– Location is determined by $DB2TOPRC
– Default is current directory (and home directory if not found)
– Type w in db2top to generate resource configuration file
61
db2top Command Options
-n specifies the node to attach to.
-d specifies the database to monitor.
-u specifies the DB2 username used to access the database.
-p specifies the DB2 password.
-V specifies the default schema used in explains.
-i specifies the delay between screen updates.
-k specifies whether to display actual or delta values.
-R Reset snapshot at startup.
-P <number>. Snapshot issued on current or partition number.
-x specifies whether to display additionnal counters on session snd appl creens (might run slower on session).
-b tells db2top to run in background mode.
-a specifies only active queries will be displayed.
-C Runs db2top in snapshot collector mode, raw snapshot data is saved in <db2snap-<Machine>.bin>.
-f <file> </pattern> <+offset>. Run db2top in replay mode when snapshot data has been previously collected in <file>. offset will jump to a certain point in time in the file. It can be expressed in seconds (+10s), minutes (+10m) or hours (+10h). /pattern will analyse file and display at which offset matches appear. pattern can specified as regular expression.
-m Will limit duration of db2top in minutes for -b and -C.
-o <outfile>. Outfile for -b option.
-h short help.
62
db2top Interactive Commands
d Goto database screen
l Goto sessions screen
a Goto application details for agent
G Toggle between all partitions and current partitions.
P Select db partition on which to issue snapshot.
t Goto tablespaces screen.
T Goto tables screen.
b Goto bufferpools screen.
D Goto the dynamic SQL screen.
m Display memory pools.
s Goto the statements screen.
U Goto the locks screen.
p Goto the partitions screen.
H Goto the history screen (experimental).
f Freeze screen.
W Watch mode for agent_id, os_user, db_user, application or netname.
/ Enter expression to filter data.
<|> Move to left or right of screen.
z|Z Sort on ascending or descending order.
c This option allows to change the order of the columns displayed on the screen.
S Run native DB2 snapshot.
L Allows to display the complete query text from the SQL screen.
R Reset snapshot data.
i Toggle idle sessions on/off.
k Toggle actual vs delta values.
g Toggle graph on/off.
X Toggle extended mode on/off.
C Toggle snapshot data collector on/off.
V Set default explains schema.
O Display session setup.
w Write session settings to .db2toprc.
q Quit db2top.
63
db2top example
Running db2top monitoring utility in interactive mode in a DPF environment– db2top -d TEST -n mynode -u user -p passwd -V skm4 -B -i 1
– Command parameters are as follows:
– -d TEST --> database name
– -n mynode --> node name
– -u user --> user id
– -p passwd --> password
– -V skm4 --> Schema name
– -B --> Bold enabled
– -i 1 --> Screen update interval: 1 second
Bufferpool snapshot– b command
lqqqqqqqqqqqqqqwqqqqqqqqqqqqwqqqqqqqqqqqqwqqqqqqqqqqqqwqqqqqqqqqqqkx x 25%x 50%x 75%x 100%xxHit Ratio% x--------------------------------------------------xmqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqj
Bufferpool Delta Delta Hit Async Delta Name l_reads/s p_reads/s Ratio% Reads% Writes/s --------------- ------------ ------------ ------- ------- ------------IBMDEFAULTBP 23015 241 98.95% 100.00% 212 IBMSYSTEMBP16K 0 0 0.00% 0.00% 0 IBMSYSTEMBP32K 0 0 0.00% 0.00% 0 IBMSYSTEMBP4K 0 0 0.00% 0.00% 0 IBMSYSTEMBP8K 0 0 0.00% 0.00% 0
64
Bufferpool performance
Determine performance of buffer pools– In real time and over time interval
Use performance analysis option of db2top (-A) for bufferpools– db2top -d sample -A -b b
---- Top twenty performance report for 'Bufferpools' between 13:19:52 and 13:20:18-- Sort criteria 'Pages_VctIOs/s'--
Rank Bufferpool_Name Percentage fromTime toTime sum(Pages_VctIOs/s)----- ------------------------------ ----------- -------- --------- ------------------------------
1 IBMDEFAULTBP 100.0000% 13:19:52 13:20:18 182 IBMSYSTEMBP8K 0.0000% 13:19:52 13:20:18 03 IBMSYSTEMBP4K 0.0000% 13:19:52 13:20:18 04 IBMSYSTEMBP16K 0.0000% 13:19:52 13:20:18 05 IBMSYSTEMBP32K 0.0000% 13:19:52 13:20:18 0
---- Performance report, breakdown by 300 seconds --
fromTime sum(Pages_VctIOs/s) Percentage Top Five in 300 seconds interval-------- ------------------------------ ---------- +----------------------------------------------+
18 100.0000% |Rank|Percentage|Bufferpool_Name |- - | 1| 100.0000%|IBMDEFAULTBP |- - | 2| 0.0000%|IBMSYSTEMBP8K |- - | 3| 0.0000%|IBMSYSTEMBP4K |- - | 4| 0.0000%|IBMSYSTEMBP16K |- - | 5| 0.0000%|IBMSYSTEMBP32K |
+----------------------------------------------+---- Performance report, breakdown by 0.5 hour --
fromTime sum(Pages_VctIOs/s) Percentage Top Five in 0.5 hour interval-------- ------------------------------ ---------- +----------------------------------------------+
18 100.0000% |Rank|Percentage|Bufferpool_Name |- - | 1| 100.0000%|IBMDEFAULTBP |- - | 2| 0.0000%|IBMSYSTEMBP8K |- - | 3| 0.0000%|IBMSYSTEMBP4K |
65
Bufferpool performance (cont)
db2top will report in real time
– data hit ratio for each bufferpool
– hit ratio for all tablespaces
– logica/physical read/writes per second per bufferpool
– efficiency of the prefetch (% of unread prefetch pages)
– whether block based bufferpools are used
– overall temp/data/index hit ratio
db2top collector mode
– Gather a lot of data at once and replay it back later
– db2top -C -d sample
– Start collection
– Answer N to „create named pipe‟ question
– db2top -f db2snap-sample-AIX64.bin -d sample
– Replay collection in another window
– db2top does not need to attach to the DB2 instance in replay mode
– Convenient for remote monitoring
– Limit the content and size of the stream file
– specify any of the suboptions available to the -C switch
– db2top -C b -d sample
66
db2top: Bottleneck analysis
Type „B‟ when in interactive mode
lqqqqqqqqqqqqqqwqqqqqqqqqqqqwqqqqqqqqqqqqwqqqqqqqqqqqqwqqqqqqqqqqqk
x x 25%x 50%x 75%x 100%x
xwait lock ms x x
xsort ms x x
xbp r/w ms x ------------------------------------------------ x
xasync r/w ms x ----------------------- x
xpref wait ms x ---------- x
xdir r/w ms x x
mqqqqqqqqqqqqqqvqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqj
Server Top Resource Resource Application
Resource Agent Usage Value Name
-- ------------ -------- -------- ---------------- --------------------
=> Cpu N/A 0% 0 N/A
=> SessionCpu 1223 99.87% 4.844 db2bp
=> IO r/w 1223 100.00% 4894 db2bp
=> Memory 1223 50.00% 384.0K db2bp
=> Locks N/A 0% 0 N/A
=> Sorts N/A 0% 0 N/A
=> Sort Time N/A 0% 0 N/A
=> Log Used 1223 100.00% 2.1M db2bp
=> Overflows N/A 0% 0 N/A
=> RowsRead 1223 100.00% 8960 db2bp
=> RowsWritten 1223 100.00% 8960 db2bp
=> TQ r/w N/A 0% 0 N/A
=> MaxQueryCost N/A 0% 0 N/A
Overview of new db2pd features in v9.1
db2pdcfg tool
69
db2pdcfg tool – new in v9.1
Configure DB2 database for problem determination behavior command
Sets flags in the DB2 database memory sets to influence the database
system behavior for problem determination purposes
db2pdcfg –help
Clear separation between db2pd and db2pdcfg tools
– db2pd is the “read-only” tool
– It will never write to DB2 shared memory or change anything
– db2pdcfg is the “read-write” tool
– Sets and displays parameters used
db2pdcfg will be substantially extended in post-v9.1 releases
db2cos tool in v9.1
71
DB2 call-out script improvements in v9.1
In v9.1, importance of db2cos for PD is significantly increased
New cases of automatic invocation
– Panic, trap, segmentation violation or exception
– Configurable by db2pdcfg
– ON by default
– Diagnostic dumps
– Configurable by db2pdcfg
– OFF by default
Addition of Windows OS support: db2cos.bat
Standardized place for db2cos script
– No need to copy file from sqllib/cfg
– $HOME/sqllib/bin on Unix
– %DB2PATH%\bin on Windows
Standardized place for db2cos output files
– directory specified by the DIAGPATH database manager configuration parameter
72
DB2 call-out script improvements in v9.1
Standardized header for db2cos report files
2006-09-07-01.32.32.481578
PID : 7057616 TID : 1 PROC : db2cos
INSTANCE: dabrashk NODE : 0 DB : LDSORTDB
APPHDL : APPID: *LOCAL.dabrashk.060907053224
FUNCTION: oper system services, sqloEDUCodeTrapHandler, probe:999
EVENT : Invoking /home/dabrashk/sqllib/bin/db2cos from oper system services
sqloEDUCodeTrapHandler
Trap Caught
Instance dabrashk uses 64 bits and DB2 code release SQL09010
…
<detailed db2pd output follows>
…
Speed and scalability improvement for large customer systems
– db2cos output files are named db2cosXXXYYY.ZZZ, where XXX is the process ID
(PID), YYY is the thread ID (TID) and ZZZ is the database partition number (or 000 for
single partition databases)
– Necessitated by db2pdcfg –catch feature (especially for catching errors)
Diagnosing traps
74
Call-out script (db2cos) trap output (v9.1)
Increased control over the set of diagnostic information produced when the database manager encounters a panic, trap, exception, or segmentation violation
– In such situations, the db2cos script is now automatically run
DB2 call-out script is executed on traps
– search for "TRAP“ in db2cos script
To disable generation of db2cos report in trap scenarios
– db2pdcfg –cos off
Output goes to db2cos<pid><tid>.<node> file
– db2cos48621401.0
To print the status
– db2pdcfg –cos status
Contains output of db2pd command (all options)
– db2pd –inst OR db2pd -db $database -inst
– You can edit the db2cos script to collect more or less information
75
Call-out script (db2cos) trap output (v9.1)
START and STOP are logged into db2diag.log– db2diag -g "funcname=pdInvokeCalloutScript“
– START : Invoking /home/dabrashk/sqllib/bin/db2cos from oper system services sqloEDUCodeTrapHandler
To specify number of times to execute db2cos during a trap– db2pdcfg –cos count=<count>
– default is 255
Specify how long to sleep between checking the size of the output file generated by db2cos
– db2pdcfg –cos sleep=<numsec>
– default is 3 seconds
Specify how long of a timeout when checking if the output file generated by db2cos is growing in size
– db2pdcfg –cos timeout=<numsec>
– default is 30 seconds
Instruct the database manager to execute db2cos when receiving SQLO_SIG_DUMP signal (db2pd –dump; kill -36 <agentPID> on AIX)
– db2pdcfg –cos SQLO_SIG_DUMP
76
db2pd additions in v9.1
77
Overview of new db2pd features for DB2 v9
PD Infrastructure enhancements in v9.1
– db2pdcfg tool
– db2cos tool improvements
Improvements for diagnosing traps
v9.1 db2pd additions
Troubleshooting and monitoring using db2pd
78
db2pd additions in v9.1
New commands
– -latches
– -fmp
– -pages
– -memblocks
– -dump
New options
– -bufferpools command: Bufferpool ID that contains the page
– -fcm command: high water mark option (hwm)
db2pd usability improvements
79
db2pd usability improvements
Agents
– v8.2: -agents [db=<database>] [ [agent=<agentid>] | [application=<appid>] ]
– v9.1: -agents [db=<database>] [[<AgentId> | [app=<AppHandl>]]
Applications
– v8.2: -applications [ [application=<appid>] | [agent=<agentid>] ]
– v9.1: -applications [[<AppHandl> | [agent=<AgentId>]]
Transactions
– v8.2: -transactions [tran=<tranhdl>] [app=<apphdl>]
– v9.1: -transactions [<TranHdl> | [app=<AppHandl>]]
Locks
– v8.2: -locks [tran=<tranhdl>] [showlocks] [wait]
– v9.1: -locks [<TranHdl>] [showlocks] [wait]
Table Control Block Stats
– v8.2: -tcbstats [all|index] [tbspaceid=<tbspaceid> [tableid=<tableid>]]
– v9.1: -tcbstats [all|index] [<TbspaceID> [<TableID>]]
Tablespaces/Containers
– v8.2: -tablespaces [group] [tablespace=<tablespace id>]
– v9.1: -tablespaces [<Tablespace ID>] [group]
80
Backup slides
81
db2pd -transactions states
-transactions “State” field FREE - Free State
READ - Read, no log record written
WRITE - Log record written
COMMIT - In commiting state
ABORT - In aborting state
ABORTDL - In aborting state, needs to be aborted on datalink file servers
SAVEPNT - In rollback to savepoint
PREP - Prepared transaction
HCOMT - heurestically commited
HABRT - heurestically rolled back
HAING - heurestically rolling back, used by recovery if further rollback processing
required
FRG - The transaction is forgotting
REPAIR - Transaction needs repair due to I/O errors on tablespaces it uses or PIT
tablespace rollforward
REUSE - Federated transaction has reached beginning of federated two-phase-commit
processing. If no other log record follows, then this signals prepare is not fully
finished and transaction needs to be rolled back at resync time.
82
Mapping Application ID to IP address and port
A TCP/IP-generated application ID is composed of three sections (v8.2)
– IP address represented as a 32-bit number (8 hexadecimal chars)
– port number (4 hexadecimal characters)
– unique identifier for the instance of this application
When IP address or port number begin with 0-9, they are changed to G-P respectively. For example, "0" is mapped to "G", "1“ is mapped to "H", and so on.
The IP address, AC10150C.NA04.006D07064947 is interpreted as follows:
– The IP address remains AC10150C, which translates to 172.16.21.12.
– The port number is NA04. The first character is "N", which maps to "7". Therefore, port number is 7A04, which translates to 31236 in decimal form.
The IP address and port can be used with lsof to find out which remote application is using the application ID
AC10150C.NA04.006D07064947
IP Address Port Generated ID