
Page 1: Interpreting Performance Test Results

INTERPRETING AND REPORTING PERFORMANCE TEST RESULTS

ERIC PROEGLER

Page 2: Interpreting Performance Test Results

2

ABOUT ME

• 20 Years in Software, 14 in Performance, Context-Driven for 12
• Performance Engineer/Teacher/Consultant
• Product Manager
• Board of Directors
• Lead Organizer
• Mentor

Page 3: Interpreting Performance Test Results

3

DAN DOWNING’S 5 STEPS OF LOAD TESTING

1. Discover: ID processes & SLAs; define use case workflows; model production workload
2. Develop: Develop test scripts; configure environment monitors; run shakedown test
3. Analyze: Run tests; monitor system resources; analyze results
4. Fix: Diagnose; fix; re-test
5. Report: Interpret results; make recommendations; present to stakeholders

Page 4: Interpreting Performance Test Results

4

ABOUT THIS SESSION

• Participatory!
  • Graphs from actual projects – have any?
  • Learn from each other
• Not About Tools
• First Half: Making observations and forming hypotheses
• Break (~10:00)
• Second Half: Interpreting and reporting actionable results

Page 5: Interpreting Performance Test Results

5

WHAT CAN WE OBSERVE ABOUT THIS APP?

Page 6: Interpreting Performance Test Results

6

WHAT’S THIS SUGGESTING?

Page 7: Interpreting Performance Test Results

7

WHAT’S THIS SUGGESTING?

Page 8: Interpreting Performance Test Results

8

AND THIS?

Page 9: Interpreting Performance Test Results

9

PERFORMANCE KPIS*

• Scalability
• Throughput
• System Capacity
• Workload Achievement

*Key Performance Indicators

Page 10: Interpreting Performance Test Results

10

HOW COULD WE ANNOTATE THIS GRAPH?

Note scale of each metric: mixed units (sec., count, %)

Page 11: Interpreting Performance Test Results

11

WHAT DOES THIS SAY ABOUT CAPACITY?

Page 12: Interpreting Performance Test Results

12

WHAT OBSERVATION CAN WE MAKE HERE?

Page 13: Interpreting Performance Test Results

13

AND HERE?

Page 14: Interpreting Performance Test Results

14

HMMM…YIKES!

Page 15: Interpreting Performance Test Results

15

WHAT CAN WE SAY HERE?

Page 16: Interpreting Performance Test Results

16

WHAT CAN WE SAY HERE?

Page 17: Interpreting Performance Test Results

17

DESCRIBE WHAT HAPPENED HERE?

Page 18: Interpreting Performance Test Results

18

TELL A PLAUSIBLE STORY ABOUT THIS

Page 19: Interpreting Performance Test Results

19

WHAT’S THE LESSON FROM THIS GRAPH?

Hurricane Center “average” of US hurricane forecast models

Averages lie!
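To make that concrete, here is a minimal Python sketch (with invented numbers) showing how two very different response-time distributions can share the same average while their 90th percentiles tell opposite stories:

```python
# Minimal sketch: the same "average" can describe very different experiences.
# The sample numbers are invented purely for illustration.
import statistics

steady = [2.0] * 100                      # every page takes ~2 s
spiky = [0.5] * 90 + [15.5] * 10          # most pages fast, 10% are terrible

for name, sample in [("steady", steady), ("spiky", spiky)]:
    p90 = statistics.quantiles(sample, n=10)[-1]   # 90th percentile
    print(f"{name}: avg={statistics.mean(sample):.1f}s  p90={p90:.1f}s")
```

Both samples average 2.0 seconds; only the percentile exposes the painful tail.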

Page 20: Interpreting Performance Test Results

20

MEASURE WHAT WHERE?

[Test environment diagram: a NeoLoad Controller (1000 vuser license), load injectors for remote users (Mentora/NoVA) and local users, and a resource monitor drive traffic through an F5 load balancer (80/443) to web servers (Linux/WebLogic), app servers (Linux/Tuxedo), and a DB server (Solaris/Oracle). Connections shown include ssh (port 22), https (16000), JDBC (1521), and application ports 7100/7200; numbered callouts 1-6 mark the measurement points.]

Page 21: Interpreting Performance Test Results

21

MEASURE WHAT WHERE?

[Same environment diagram, annotated with what to measure at each callout:]

• Proper load balancing (really measured at the web/app servers)
• HW resources; web server connections, queuing, errors
• HW resources; JVM heap memory; DB connection pools
• HW resources; lock waits / deadlocks; SQL recompiles; full table scans; slowest queries; SAN IOPS
• Bandwidth throughput; load injector capacity
• Load; response time; HW resources

Page 22: Interpreting Performance Test Results

22

MEASURE WITH WHAT?

Page 23: Interpreting Performance Test Results

23

ANYTHING CONCERNING HERE?

Before: Slowest transactions show spikes of 5-8 seconds, every 10 minutes
After: Spikes substantially reduced after VM memory increased to 12 GB

Page 24: Interpreting Performance Test Results

24

WHAT ARE WE LOOKING AT HERE?

When does this become a problem? When heap space utilization keeps growing, despite garbage collection, and reaches its max allocation.
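As one way to check for that pattern in collected data, here is a minimal sketch, assuming a hypothetical heap_samples.csv of heap_used_mb readings sampled during the test (e.g., exported from JVM monitoring):

```python
# Minimal sketch: flag a suspected leak when the heap "floor" keeps rising.
# Assumes a hypothetical CSV with a "heap_used_mb" column sampled over the run.
import csv
import statistics

def leak_suspected(path, window=20):
    """Compare average heap usage early vs. late in the run.

    Sustained growth across the run, despite the GC sawtooth, suggests heap
    utilization is climbing toward its maximum allocation.
    """
    with open(path, newline="") as f:
        heap = [float(row["heap_used_mb"]) for row in csv.DictReader(f)]
    if len(heap) < 2 * window:
        return False                      # not enough samples to judge
    early = statistics.mean(heap[:window])
    late = statistics.mean(heap[-window:])
    return late > early * 1.5             # arbitrary threshold; tune for context

if __name__ == "__main__":
    print(leak_suspected("heap_samples.csv"))
```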

Page 25: Interpreting Performance Test Results

25

ANY HYPOTHESES ABOUT THIS?

Before: Abnormally high TCP retransmits between Web and App server
After: Network issues resolved
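If retransmits are a suspect, one low-tech way to quantify them on a Linux host is to diff the kernel's cumulative TCP counters over an interval; a minimal sketch (Linux-specific; field names come from /proc/net/snmp):

```python
# Minimal sketch (Linux-only): estimate the TCP retransmit rate during a test
# window by sampling the kernel's cumulative Tcp counters twice.
import time

def tcp_counters():
    """Return (OutSegs, RetransSegs) parsed from /proc/net/snmp."""
    with open("/proc/net/snmp") as f:
        lines = [line.split() for line in f if line.startswith("Tcp:")]
    header, values = lines[0], lines[1]
    row = dict(zip(header[1:], (int(v) for v in values[1:])))
    return row["OutSegs"], row["RetransSegs"]

out1, ret1 = tcp_counters()
time.sleep(60)                              # sample window; align with the test
out2, ret2 = tcp_counters()

sent, retrans = out2 - out1, ret2 - ret1
print(f"{retrans} of {sent} segments retransmitted "
      f"({100.0 * retrans / max(sent, 1):.2f}%)")
```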

Page 26: Interpreting Performance Test Results

26

TELL A DATA-SUPPORTED STORY ABOUT THIS; ANNOTATE THE GRAPH

Page 27: Interpreting Performance Test Results

27

HOW MUCH LOAD?

Page 28: Interpreting Performance Test Results

28

HOW MUCH LOAD?

Page 29: Interpreting Performance Test Results

29

HOW MUCH LOAD?

Page 30: Interpreting Performance Test Results

30

HOW MUCH LOAD?

Page 31: Interpreting Performance Test Results

31

FOR ACTIONABLE PERFORMANCE RESULTS…

…Think “CAVIAR”: Collecting, Aggregating, Visualizing, Interpreting, Assessing, Reporting

Page 32: Interpreting Performance Test Results

32

COLLECTING

• Objective: Gather all results from the test that
  • help gain confidence in results validity
  • portray system scalability, throughput & capacity
  • provide bottleneck / resource-limit diagnostics
  • help formulate hypotheses

What to collect (type, examples, sources, granularity, value):

• Load: users/sessions simulated, files sent, "transactions" completed. Sources: test "scenario" / test-harness counts, web/app logs, DB queries. Granularity: at appropriate levels (virtual users, throughput) for your context. Value: correlated with response times, yields "system scalability".
• Errors: HTTP, application, DB, network, test harness. Sources: web/app logs, DB utility, network trace. Granularity: raw data at the most granular level. Value: confidence in results validity.
• Response times: page/action, "transaction", end-to-end times. Sources: test tools / "scripts", web logs. Granularity: at various levels of app granularity, linked to objectives. Value: the fundamental unit of measure for performance.
• Resources: network, "server", "middleware", database, storage, queues. Sources: OS tools (vmstat, nmon, sar, perfmon), vendor monitoring tools. Granularity: 5-15 second sampling rates, with logging, to capture transient spikes. Value: correlated with response times, yields "system capacity".
• Anecdotes: manual testing, transient resources, screenshots. Sources: people manually testing or monitoring during the test. Granularity: manual testing by different people & locations. Value: confidence / corroboration / triangulation of results.
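As an illustration of the 5-15 second resource sampling above, here is a minimal sketch using the third-party psutil package (an assumption on my part; vmstat, nmon, sar, or perfmon are equally valid sources):

```python
# Minimal sketch: sample CPU and memory every 10 seconds and append to a CSV,
# so transient spikes survive for later correlation with response times.
# Assumes the third-party psutil package is installed (pip install psutil).
import csv
import time
import psutil

INTERVAL_S = 10           # 5-15 s keeps transient spikes visible
DURATION_S = 60 * 60      # cover the full test run

with open("resources.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["epoch", "cpu_pct", "mem_pct"])
    end = time.time() + DURATION_S
    while time.time() < end:
        writer.writerow([int(time.time()),
                         psutil.cpu_percent(interval=None),
                         psutil.virtual_memory().percent])
        f.flush()         # keep the data even if the run is cut short
        time.sleep(INTERVAL_S)
```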

Page 33: Interpreting Performance Test Results

33

AGGREGATING

• Objective: Summarize measurements using
  • various-sized time-buckets to provide tree & forest views
  • consistent time-buckets across types to enable accurate correlation
  • meaningful statistics: scatter, min-max range, variance, percentiles
  • multiple metrics to “triangulate”, confirm (or invalidate) hypotheses

What to aggregate (type, examples, statistics, value), using testing tool graphs, monitoring tools, and Excel pivot tables:

• Load: users/sessions; requests; no. of files / msgs sent/rcvd. Statistics: avg. Value: basis for ID'ing load sensitivity of all other metrics.
• Errors: error rate, error counts; by URL / type; HTTP 4xx & 5xx. Statistics: avg-max. Value: ID if errors correlate with load or resource metrics.
• Response times: workflow end-to-end time; page/action time. Statistics: min-avg-max-std deviation, 90th percentile. Value: quantify system scalability.
• Network throughput: megabits/sec. Statistics: avg-max. Value: ID if throughput plateaus while load is still ramping, or exceeds network capacity.
• App throughput: page view rate; completed transactions by type. Statistics: avg-max. Value: ID if the page view rate can be sustained, or if "an hour's work can be done in an hour".
• Resources: % CPU; CPU and disk queue depth; memory usage; IOPS; queued requests; DB contention. Statistics: avg-max. Value: ID limiting resources; provide diagnostics; quantify system capacity.
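A minimal sketch of the time-bucketing idea, assuming a hypothetical results.csv of raw samples with timestamp and resp_time_s columns (pandas):

```python
# Minimal sketch: summarize raw response-time samples into consistent
# 60-second buckets with the statistics listed above.
# Assumes a hypothetical results.csv with "timestamp" and "resp_time_s" columns.
import pandas as pd

df = pd.read_csv("results.csv", parse_dates=["timestamp"])
resampled = df.set_index("timestamp")["resp_time_s"].resample("60s")

buckets = pd.DataFrame({
    "count": resampled.count(),
    "min": resampled.min(),
    "avg": resampled.mean(),
    "max": resampled.max(),
    "std": resampled.std(),
    "p90": resampled.quantile(0.90),
})
print(buckets.head())
```

Keeping the same 60-second buckets for load, errors, and resources is what makes the later correlation step honest.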

Page 34: Interpreting Performance Test Results

34

VISUALIZING

• Objective: Gain a “forest view” of metrics relative to load
  • Turn barrels of numbers into a few pictures
  • Vary graph scale & summarization granularity to expose hidden facts
  • ID the load point where degradation begins
  • ID the system tier(s) where bottlenecks appear, limiting resources

Page 35: Interpreting Performance Test Results

35

VISUALIZING

• My key graphs, in order of importance:
  • Errors over load (“results valid?”)
  • Bandwidth throughput over load (“system bottleneck?”)
  • Response time over load (“how does the system scale?”)
    • Business process end-to-end
    • Page level (min-avg-max-SD-90th percentile)
  • System resources (“how’s the infrastructure capacity?”)
    • Server CPU over load
    • JVM heap memory/GC
    • DB lock contention, I/O latency
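As a sketch of the “response time over load” graph, assuming a hypothetical buckets.csv produced by an aggregation step like the one above (matplotlib, with load on its own y-axis):

```python
# Minimal sketch: plot 90th-percentile response time against active load,
# with load on its own y-axis so the two scales don't hide each other.
# Assumes a hypothetical buckets.csv with "timestamp", "active_users", "p90_s".
import pandas as pd
import matplotlib.pyplot as plt

buckets = pd.read_csv("buckets.csv", parse_dates=["timestamp"])

fig, ax_rt = plt.subplots()
ax_load = ax_rt.twinx()                       # load gets its own y-axis

ax_rt.plot(buckets["timestamp"], buckets["p90_s"], color="tab:red",
           label="90th pct response time (s)")
ax_load.plot(buckets["timestamp"], buckets["active_users"], color="tab:blue",
             label="active users")

ax_rt.set_xlabel("test time")
ax_rt.set_ylabel("response time (s)")
ax_load.set_ylabel("active users")
fig.legend(loc="upper left")
fig.autofmt_xdate()
plt.savefig("response_time_over_load.png", dpi=150)
```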

Page 36: Interpreting Performance Test Results

36

INTERPRETING

• Objective: Draw conclusions from observations and hypotheses
  • Make objective, quantitative observations from graphs / data
  • Correlate / triangulate graphs / data
  • Develop hypotheses from correlated observations
  • Test hypotheses and achieve consensus among tech teams
  • Turn validated hypotheses into conclusions

Page 37: Interpreting Performance Test Results

37

INTERPRETING

• Observations: “I observe that…”; no evaluation at this point!
• Correlations: “Comparing graph A to graph B…” – relate observations to each other
• Hypotheses: “It appears as though…” – test these with the extended team; corroborate with other information (anecdotal observations, manual tests)
• Conclusions: “From observations a, b, c, corroborated by d, I conclude that…”

Page 38: Interpreting Performance Test Results

38

SCALABILITY: RESPONSE TIME OVER LOAD

• Is 2.5 sec/page acceptable? Drill down to the page level to ID the key contributors, and look at 90th or 95th percentiles (averages are misleading)

Two styles for showing system scalability; the top graph shows load explicitly on its own y-axis

Note the consistent 0.5 sec/page up to ~20 users; above that, response time degrades steeply, to 5x at max load
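A minimal sketch of that page-level drill-down, assuming a hypothetical results.csv with page and resp_time_s columns:

```python
# Minimal sketch: rank pages by 95th-percentile response time, since a
# tolerable overall average can hide a few badly degrading pages.
# Assumes a hypothetical results.csv with "page" and "resp_time_s" columns.
import pandas as pd

df = pd.read_csv("results.csv")
by_page = (df.groupby("page")["resp_time_s"]
             .agg(count="count",
                  avg="mean",
                  p90=lambda s: s.quantile(0.90),
                  p95=lambda s: s.quantile(0.95))
             .sort_values("p95", ascending=False))
print(by_page.head(10))      # the pages most likely driving the 2.5 s average
```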

Page 39: Interpreting Performance Test Results

39

THROUGHPUT PLATEAU WITH LOAD RISING = BOTTLENECK SOMEWHERE!

• Note throughput tracking load through ~45 users, then leveling off
• The culprit was an Intrusion Detection appliance limiting bandwidth to 60 Mbps

In a healthy system, throughput should closely track load.

Page 40: Interpreting Performance Test Results

40

BANDWIDTH TRACKING WITH LOAD = HEALTHY

All 3 web servers show network interface throughput tracking with load throughout the test

A healthy bandwidth graph looks like Mt. Fuji

Page 41: Interpreting Performance Test Results

41

ERRORS OVER LOAD: MUST EXPLAIN!

• Note relatively few errors
• Largely HTTP 404s on missing resources

An error rate of <1% can be attributed to “noise” and dismissed; >1% should be analyzed and fully explained.

Sporadic bursts of HTTP 500 errors near the end of the test while the customer was “tuning” web servers.
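A minimal sketch for putting numbers behind that rule of thumb, assuming a hypothetical results.csv with timestamp and HTTP status columns:

```python
# Minimal sketch: bucket errors per minute so bursts (like the HTTP 500s above)
# can be tied to load level or to mid-test events such as server "tuning".
# Assumes a hypothetical results.csv with "timestamp" and "status" columns.
import pandas as pd

df = pd.read_csv("results.csv", parse_dates=["timestamp"]).set_index("timestamp")
per_min = df["status"].resample("60s")

summary = pd.DataFrame({
    "requests": per_min.count(),
    "errors": per_min.apply(lambda s: int((s >= 400).sum())),
})
summary["error_pct"] = 100.0 * summary["errors"] / summary["requests"]
print(summary[summary["error_pct"] > 1.0])   # buckets that need explaining
```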

Page 42: Interpreting Performance Test Results

42

END USER EXPERIENCE SLA VIOLATIONS

Outlier, not on VPN

Page 43: Interpreting Performance Test Results

43

SLA VIOLATIONS DRILL DOWN

Felipe B. (Brazil, Feb 28th, 7:19AM-1:00PM CST, 10.74.12.55): > 20 second response on page “Media Object Viewer”.
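A minimal sketch of the same drill-down in code, assuming a hypothetical results.csv with timestamp, user, page, and resp_time_s columns and an illustrative 20-second SLA:

```python
# Minimal sketch: list every sample that violated the SLA, with enough context
# (user, page, time) to chase down outliers like the one described above.
# Assumes a hypothetical results.csv; the 20 s threshold is illustrative.
import pandas as pd

SLA_SECONDS = 20.0

df = pd.read_csv("results.csv", parse_dates=["timestamp"])
violations = (df[df["resp_time_s"] > SLA_SECONDS]
                .sort_values("resp_time_s", ascending=False)
                [["timestamp", "user", "page", "resp_time_s"]])
print(violations.to_string(index=False))
```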

Page 44: Interpreting Performance Test Results

44

NETWORK THROUGHPUT – RAW GRAPH

Page 45: Interpreting Performance Test Results

45

NETWORK THROUGHPUT - INTERPRETED

Page 46: Interpreting Performance Test Results

46

CAPACITY: SYSTEM RESOURCES - RAW

Page 47: Interpreting Performance Test Results

47

CAPACITY: SYSTEM RESOURCES - INTERPRETED

Monitor resources liberally, provide (and annotate!) graphs selectively: which resources tell the main story?

Page 48: Interpreting Performance Test Results

48

ASSESSING

• Objective: Turn conclusions into recommendations
  • Tie conclusions back to test objectives – were objectives met?
  • Determine remediation options at the appropriate level – business, middleware, application, infrastructure, network
  • Perform agreed-to remediation
  • Re-test
• Recommendations:
  • Should be specific and actionable at a business or technical level
  • Should be reviewed (and if possible, supported) by the teams that need to perform the actions (nobody likes surprises!)
  • Should quantify the benefit, if possible the cost, and the risk of not doing it
  • The final outcome is management’s judgment, not yours

Page 49: Interpreting Performance Test Results

49

REPORTING

• Objective: Convey recommendations in stakeholders’ terms
• Identify the audience(s) for the report; write / talk in their language
• Executive Summary – 3 pages max
  • Summarize objectives, approach, target load, acceptance criteria
  • Cite factual Observations
  • Draw Conclusions based on Observations
  • Make actionable Recommendations
• Supporting Detail
  • Test parameters: date/time executed, business processes, load ramp, think-times, system tested (hw config, sw versions/builds)
  • Sections for Errors, Throughput, Scalability, Capacity
  • In each section: annotated graphs, observations, conclusions
• Associated Docs (if appropriate)
  • Full set of graphs, workflow detail, scripts, test assets

Page 50: Interpreting Performance Test Results

50

REPORTING

• Step 1: *DO NOT* press “Print” on the tool’s default report
  • Who is your audience?
  • Why would they want to see 50 graphs and 20 tables? What will they be able to see?
  • Data + Analysis = INFORMATION

Page 51: Interpreting Performance Test Results

51

REPORTING

• Step 2: Understand What Is Important
  • What did you learn? Study your results; look for correlations.
  • What are the 3 things you need to convey?
  • What information is needed to support these 3 things?
  • Discuss findings with technical team members: “What does this look like to you?”

Page 52: Interpreting Performance Test Results

52

REPORTING

• Step 3: So, What Is Important?
  • Prepare a three-paragraph summary for email
  • Prepare a 30-second elevator summary for when someone asks you about the testing
  • More people will consume these than any test report
  • Get feedback

Page 53: Interpreting Performance Test Results

53

REPORTING

• Step 4: Preparing Your Final Report: Audience
  • Your primary audience is usually executive sponsors and the business. Write the Summary at the front of the report for them.
  • Language, Acronyms, and Jargon
  • Level of Detail
  • Correlation to business objectives

Page 54: Interpreting Performance Test Results

54

REPORTING

• Step 5: Audience (cont.)
  • Rich Technical Detail within:
    • Observations, including selected graphs
    • Feedback from the Technical Team
    • Conclusions
    • Recommendations

Page 55: Interpreting Performance Test Results

55

REPORTING

• Step 6: Present!
  • Remember, no one is going to read the report.
  • Gather your audience: executive, business, and technical.
  • Present your results.
  • Help shape the narrative. Explain the risks. Earn your keep.
  • Call to action! Recommend solutions.

Page 56: Interpreting Performance Test Results

56

…REMEMBER: CAVIAR!

Collecting, Aggregating, Visualizing, Interpreting, Assessing, Reporting

Page 57: Interpreting Performance Test Results

57

A FEW RESOURCES

• WOPR (Workshop On Performance and Reliability)
  • http://www.performance-workshop.org
  • Experience reports on performance testing
  • Spring & fall facilitated, theme-based peer conferences
• SOASTA Community
  • http://cloudlink.soasta.com
  • Papers, articles, presentations on performance testing
• PerfBytes Podcast
• Mark Tomlinson’s blog
  • http://careytomlinson.org/mark/blog/
• Richard Leeke’s blog (Equinox.nz)
  • http://www.equinox.co.nz/blog/Lists/Posts/Author.aspx?Author=Richard Leeke
  • Data visualization
• Scott Barber’s resource page
  • http://www.perftestplus.com/resources.htm
• STP Resources
  • http://www.softwaretestpro.com/Resources
  • Articles, blogs, papers on a wide range of testing topics

Page 58: Interpreting Performance Test Results

58

THANKS FOR ATTENDING

Please fill out an evaluation form

[email protected]
