1
Performance Instrumentation
beyond what you do now
Cary Millsap
[email protected]
Percona Performance Conference
Santa Clara, California
9:00a–9:55a Thursday 23 April 2009
2
Introductions
3
Cary Millsap carymillsap.blogspot.com cary_millsap
4
1986
1989
1999
2008
Software Developer
and
Performance Analyst
5
6
Method R Corporation
http://method-r.com
7
What we do at Method R Corporation…
• Write code for you
• Troubleshoot performance problems
• Teach you how to do what we do
• Write software tools that make your work easier
8
Thinking clearly about performance
9
Performance is HARD
10
“Our users say that everything is slow, but I
don’t know where to begin.”
11
“Our users are complaining, but all our dials are green.”
12
A story.
13
In the beginning...
(1989: Oracle 6.0.26)
14
“Tuning” was…
15
bstat.sql
...
estat.sql
report.txt
16
V$DB_OBJECT_CACHE
V$FILESTAT
V$LATCH
V$LIBRARYCACHE
V$LOCK
V$OPEN_CURSOR
V$PARAMETER
V$PROCESS
V$ROLLSTAT
V$ROWCACHE
V$SESSION
V$SESSTAT
V$SQL
V$SQLTEXT
V$TIMER
V$TRANSACTION
V$WAITSTAT
V$SESS_IO
V$SYSSTAT
V$FIXED_VIEW_DEFINITION
ps
sar
vmstat
iostat
netstat
pstat
nfsstat
17
People looked for “bad numbers.”
18
Inefficiencies.
19
But how can you know what causes a specific task to be
slow?
20
21
It's latches
It's I/O
It's always I/O
It's bad SQL
It's always bad SQL
There's not enough memory
There's never enough memory
22
My problem…
23
How can you possibly
know that?
24
Reminded me of…
25
vailroger.googlepages.com/orionconstellation
26
You do see it...
Right?
27
vailroger.googlepages.com/orionconstellation
28
But who says
that is what you have to see?
29
30
Why not?
31
Performance is hard.
32
A good pilot makes it look easy.
—Van R. Millsap, 1936–2004
33
Performance is EASY
34
How?
35
It’s the
user’s experience
that matters.
36
37
A user’s performance experience consists of two elements…
38
1. a task
2. time
39
Task
40
The things we used to “computerize”… tasks.
http://olathe.lib.ks.us/images/Image/Computer%20User.jpg
41
A task is a business unit of work.
• Post to the General Ledger
• Enter an order
• Look up a book by author
42
Tasks can nest.
• Print Addresses is a task
• Print Address #42 is a (sub)task
• Often, a program is a task
• Often, a tiny part of a program is a task
Posting
PO AP AR … FA
43
Tasks are it.
Business people don’t care about the “system” except
through execution of the tasks that make up their business.
44
Tasks are it.
Tasks are what system owners care
about.
45
Time
46
Performance is about time.
47
How fast: “Daddy, can your car go 500 miles?”
He meant “500 miles per hour.”
To talk about performance (speed), you have to talk about time.
48
Two ways to measure performance…
49
tasks per time (that’s throughput)
time per task (that’s response time)
50
Throughput and response time…
• Throughput (X)
  – The tasks-per-time way
  – Number of task executions completed in a given duration
    • “orders/second”
• Response time (R)
  – The time-per-task way
  – Elapsed duration of an execution of a given task
    • “seconds/order”
51
X = 1/R
(kind of)
52
Average throughput is the inverse of average response time.
X = 1,000 txn/sec?
Then R = (1 sec)/(1,000 txn) = 0.001 sec/txn
But…
53
…Adding load to create higher throughput
changes response time.
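A small worked sketch of why X = 1/R only holds "kind of." It computes average response time and throughput from begin/end timestamps of task executions. The timestamps are made-up numbers and the function names are illustrative assumptions, not part of the talk.

```python
def mean_response_time(executions):
    """Average elapsed seconds per task execution (R)."""
    return sum(end - begin for begin, end in executions) / len(executions)


def throughput(executions):
    """Completed task executions per second of the observation window (X)."""
    window_begin = min(begin for begin, _ in executions)
    window_end = max(end for _, end in executions)
    return len(executions) / (window_end - window_begin)


# Hypothetical data: (begin, end) timestamps in seconds.
# Four executions run one after another, 0.5 s each.
serial = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5), (1.5, 2.0)]
# Four executions run at the same time; assume each slows to 0.8 s under load.
concurrent = [(0.0, 0.8), (0.0, 0.8), (0.0, 0.8), (0.0, 0.8)]

for label, runs in (("serial", serial), ("concurrent", concurrent)):
    r = mean_response_time(runs)
    x = throughput(runs)
    print(f"{label}: R = {r:.3f} s/task, X = {x:.3f} tasks/s, 1/R = {1 / r:.3f}")
```

In the serial case X and 1/R agree. In the concurrent case throughput goes up while each individual response time gets worse, so X and 1/R no longer match.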
54
…Which leads to a whole ’nother conversation I’d love to have with you some other time.
55
Sequence Diagram
56
[Sequence diagram: response time labeled R_A]
A simple way to view response time is with a UML sequence diagram.
http://www.websequencediagrams.com
57
[Sequence diagram: nested response times labeled R_A and R_B]
More complicated systems have nested levels of suppliers and consumers.
http://www.websequencediagrams.com
58
[Sequence diagram: response time labeled R_User]
The tiers represent the way your system is constructed.
http://www.websequencediagrams.com
59
[Sequence diagram: response time labeled R_User]
This sequence diagram shows the complicated interactions among consumers and suppliers.
http://www.websequencediagrams.com
60
The sequence diagram is a
good conceptual tool.
61
But when you need to analyze thousands of calls, you need something else.
62
Profile
63
A profile is a complete account of a task’s response time.

Response time (seconds)   # Calls   R/call (seconds)   Call name
    0.769      50.3%        5,003           0.000154   unaccounted-for between dbcalls
    0.393      25.7%        5,010           0.000078   SQL*Net message from client
    0.381      24.9%        5,013           0.000076   CPU service, execute calls
    0.090       5.9%           11           0.008194   CPU service, prepare calls
    0.027       1.8%            1           0.027396   log file sync
    0.008       0.5%        5,010           0.000002   SQL*Net message to client
    0.000       0.0%            9           0.000000   CPU service, fetch calls
   –0.138      –9.1%        5,031          –0.000028   unaccounted-for within dbcalls
    1.530     100.0%                                   Total
64
You’ve done this before, if you’ve ever used…
gcc -pg …; gprof …
java -prof …; java ProfilerViewer …
perl -d:DProf …; dprofpp …
dbms_monitor.session_trace_enable(…); p5prof …
65
Profile
• Full account of response time
  – Spanning (sum ≮ R)
  – Non-overlapping (sum ≯ R)
• Sorted by descending R
• Useful dimension
  – Flat profile
  – Call graph
• Contributions as %R
• Duration per call
  – Mean, minimum, maximum, …
  – Skew
• Drill-down
  – Individual call level of detail
  – Maybe even deeper
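The properties listed above can be sketched in a few lines. This is a minimal illustration, assuming call-by-call records of the form (call name, seconds) and a separately measured response time R. The record format, the call names, and the function name are assumptions for illustration, not the format of any real trace file.

```python
from collections import defaultdict


def flat_profile(calls, response_time):
    """Aggregate (name, seconds) call records into a flat profile.

    Returns rows of (name, total_seconds, percent_of_R, call_count,
    mean_seconds_per_call), sorted by descending total. Time not covered by
    the records is reported as 'unaccounted-for' so the rows sum to R.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, seconds in calls:
        totals[name] += seconds
        counts[name] += 1

    gap = response_time - sum(totals.values())
    if abs(gap) > 1e-9:
        totals["unaccounted-for"] += gap  # keeps the profile spanning R

    rows = []
    for name, total in totals.items():
        n = counts[name]
        mean = total / n if n else None  # no per-call mean for the gap row
        rows.append((name, total, 100.0 * total / response_time, n, mean))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical call-by-call data for one task execution with R = 1.00 s.
calls = [
    ("CPU service", 0.30),
    ("disk read", 0.05),
    ("CPU service", 0.20),
    ("network round trip", 0.25),
    ("disk read", 0.10),
]
rows = flat_profile(calls, response_time=1.00)
for name, total, pct, n, mean in rows:
    mean_text = f"{mean:.6f}" if mean is not None else "-"
    print(f"{total:8.3f} {pct:6.1f}% {n:6d} {mean_text:>10}  {name}")
```

The "unaccounted-for" row is what makes the account complete: without it, the listed contributions would sum to less than R.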
66
Response Time
67
To optimize throughput, you
must analyze response time.
68
(Proof)
You cannot optimize X for a task that’s inefficient.
You cannot measure a task’s efficiency without measuring its R.
Therefore, to optimize X, you must first analyze R.
69
The universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail.
—Donald Knuth
70
(Programmers aren’t very good at guessing where their code spends time.)
71
To optimize performance (throughput or response time),
people need profiles.
72
Performance is EASY
73
Performance is easy if you can
stop guessing where your code is slow.
74
When you have profiles for task response times, performance
problems cannot hide from you.
75
Some surprising things I’ve learned by measuring R…
76
Disk I/O is often less important
than people think.
http://carymillsap.blogspot.com/2009/04/cary-on-joel-on-ssd.html
77
Common performance problems:
CPU
Network I/O
Software serialization
78
The point…
79
Your problems have nothing to do with experiences I’ve had.
So measure.
80
Finding what you need to see
81
How are you supposed to
create these profiles?
82
You have to insist on seeing where time goes for any task you think is important.
83
To drill down, you need call-by-call data.
(NOT data about aggregations of calls.)
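A short sketch of what aggregations can hide. Both lists below are made-up per-call durations for the same kind of call. They have the same call count, the same total, and the same mean, so an aggregate view cannot tell them apart. Only the call-by-call data shows that one call dominates in the second case.

```python
from statistics import mean

# Hypothetical per-call durations in seconds.
uniform = [0.010] * 100            # every call takes 10 ms
skewed = [0.001] * 99 + [0.901]    # 99 fast calls and one very slow call


def summarize(durations):
    """Return (call_count, total_seconds, mean_seconds), the aggregate view."""
    return len(durations), sum(durations), mean(durations)


def slowest_share(durations):
    """Fraction of total time spent in the single slowest call."""
    return max(durations) / sum(durations)


for label, durations in (("uniform", uniform), ("skewed", skewed)):
    count, total, avg = summarize(durations)
    print(
        f"{label}: calls={count}, total={total:.3f} s, mean={avg:.6f} s, "
        f"slowest call share={slowest_share(durations):.1%}"
    )
```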
84
In Oracle, we do it with a feature called extended SQL tracing.
• For Developers: Making Friends with the Oracle Database for Fast, Scalable Applications
  – Cary Millsap
  http://method-r.com/downloads/doc_details/10-for-developers-making-friends-with-the-oracle-database-cary-millsap
• Optimizing Oracle Performance
  – Cary Millsap with Jeff Holt
85
The stuff you need…
86
Feature (attribute)       Oracle             MySQL   App tier
Task identification       y
Call-by-call coverage     98%+
DB call begin sequence    partly derivable
DB call begin time        partly derivable
DB call end time          y
DB call context info      y
OS call begin sequence    partly derivable
OS call begin time        derivable
OS call end time          y
OS call context info      y
Call SQL context          y
Call CPU (sys mode)       -
Call CPU (usr mode)       -
Call CPU (total)          y
SQL execution plans       y
87
Recap
88
Here’s what I hope you take away today…
89
Performance is about time and tasks.
90
If you’re interested in performance, then
read Goldratt’s The Goal.
91
Don’t guess; you’re probably wrong.
Measure response time before you optimize anything.
Insist on it.
92
Performance is easy (and fun!)
when code measures its own time and tasks.
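One way code might measure its own time and tasks, as a minimal sketch. The `timed_task` helper, the in-memory `TRACE` list, and the task names are hypothetical. A real application would more likely write these records to a log or trace file.

```python
import time
from contextlib import contextmanager

TRACE = []  # hypothetical in-memory trace; one record per completed task


@contextmanager
def timed_task(name):
    """Record the task name plus begin and end times of the enclosed block."""
    begin = time.perf_counter()
    try:
        yield
    finally:
        end = time.perf_counter()
        TRACE.append({"task": name, "begin": begin, "end": end})


def enter_order():
    """Stand-in for a business task with two nested subtasks."""
    with timed_task("Enter an order"):
        with timed_task("validate order"):
            time.sleep(0.01)  # placeholder for real work
        with timed_task("write order"):
            time.sleep(0.02)  # placeholder for real work


enter_order()
for record in TRACE:
    print(f"{record['task']:<16} {record['end'] - record['begin']:.6f} s")
```

Each record carries a task identifier, a begin time, and an end time, which is the call-by-call detail needed to build a profile later. Records are appended when a task ends, so nested subtasks appear before the task that contains them.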
93