
Page 1: Interpreting Performance Test Results

INTERPRETING AND REPORTING PERFORMANCE TEST RESULTS

ERIC PROEGLER

Page 2: Interpreting Performance Test Results

2

ABOUT ME

• 20 Years in Software, 14 in Performance, Context-Driven for 12
• Performance Engineer/Teacher/Consultant
• Product Manager
• Board of Directors
• Lead Organizer
• Mentor

Page 3: Interpreting Performance Test Results

3

DAN DOWNING’S 5 STEPS OF LOAD TESTING

1. Discover: ID processes & SLAs; define use case workflows; model production workload
2. Develop: Develop test scripts; configure environment monitors; run shakedown test
3. Analyze: Run tests; monitor system resources; analyze results
4. Fix: Diagnose; fix; re-test
5. Report: Interpret results; make recommendations; present to stakeholders

Page 4: Interpreting Performance Test Results

4

ABOUT THIS SESSION

• Participatory!
  • Graphs from actual projects – have any?
  • Learn from each other
• Not About Tools
• First Half: Making observations and forming hypotheses
• Break (~10:00)
• Second Half: Interpreting and reporting actionable results

Page 5: Interpreting Performance Test Results

5

WHAT CAN WE OBSERVE ABOUT THIS APP?

Page 6: Interpreting Performance Test Results

6

WHAT’S THIS SUGGESTING?

Page 7: Interpreting Performance Test Results

7

WHAT’S THIS SUGGESTING?

Page 8: Interpreting Performance Test Results

8

AND THIS?

Page 9: Interpreting Performance Test Results

9

PERFORMANCE KPIS*

• Scalability
• Throughput
• System Capacity
• Workload Achievement

*Key Performance Indicators

Page 10: Interpreting Performance Test Results

10

HOW COULD WE ANNOTATE THIS GRAPH?

Note scale of each metric: mixed units (sec., count, %)

Page 11: Interpreting Performance Test Results

11

WHAT DOES THIS SAY ABOUT CAPACITY?

Page 12: Interpreting Performance Test Results

12

WHAT OBSERVATION CAN WE MAKE HERE?

Page 13: Interpreting Performance Test Results

13

AND HERE?

Page 14: Interpreting Performance Test Results

14

HMMM…YIKES!

Page 15: Interpreting Performance Test Results

15

WHAT CAN WE SAY HERE?

Page 16: Interpreting Performance Test Results

16

WHAT CAN WE SAY HERE?

Page 17: Interpreting Performance Test Results

17

DESCRIBE WHAT HAPPENED HERE?

Page 18: Interpreting Performance Test Results

18

TELL A PLAUSIBLE STORY ABOUT THIS

Page 19: Interpreting Performance Test Results

19

WHAT’S THE LESSON FROM THIS GRAPH?

Hurricane Center “average” of US hurricane forecast models

Averages lie!
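To make that concrete, here is a minimal Python sketch (with invented numbers) showing how two very different response-time distributions can share the same average while their 90th percentiles tell opposite stories:

```python
# Minimal sketch: the same "average" can describe very different experiences.
# The sample numbers are invented purely for illustration.
import statistics

steady = [2.0] * 100                      # every page takes ~2 s
spiky = [0.5] * 90 + [15.5] * 10          # most pages fast, 10% are terrible

for name, sample in [("steady", steady), ("spiky", spiky)]:
    p90 = statistics.quantiles(sample, n=10)[-1]   # 90th percentile
    print(f"{name}: avg={statistics.mean(sample):.1f}s  p90={p90:.1f}s")
```

Both samples average 2.0 seconds; only the percentile exposes the painful tail.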

Page 20: Interpreting Performance Test Results

20

MEASURE WHAT WHERE?

[Test environment diagram: a NeoLoad Controller (1000 vuser license), load injectors for remote users (Mentora/NoVA) and local users, and a resource monitor drive traffic through an F5 load balancer (80/443) to web servers (Linux/WebLogic), app servers (Linux/Tuxedo), and a DB server (Solaris/Oracle). Connections shown include ssh (port 22), https (16000), JDBC (1521), and application ports 7100/7200; numbered callouts 1-6 mark the measurement points.]

Page 21: Interpreting Performance Test Results

21

MEASURE WHAT WHERE?

[Same environment diagram, annotated with what to measure at each callout:]

• Proper load balancing (really measured at the web/app servers)
• HW resources; web server connections, queuing, errors
• HW resources; JVM heap memory; DB connection pools
• HW resources; lock waits / deadlocks; SQL recompiles; full table scans; slowest queries; SAN IOPS
• Bandwidth throughput; load injector capacity
• Load; response time; HW resources

Page 22: Interpreting Performance Test Results

22

MEASURE WITH WHAT?

Page 23: Interpreting Performance Test Results

23

ANYTHING CONCERNING HERE?

Before: Slowest transactions show spikes of 5-8 seconds, every 10 minutes
After: Spikes substantially reduced after VM memory increased to 12 GB

Page 24: Interpreting Performance Test Results

24

WHAT ARE WE LOOKING AT HERE?

When does this become a problem? When heap space utilization keeps growing, despite garbage collection, and reaches its max allocation.
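As one way to check for that pattern in collected data, here is a minimal sketch, assuming a hypothetical heap_samples.csv of heap_used_mb readings sampled during the test (e.g., exported from JVM monitoring):

```python
# Minimal sketch: flag a suspected leak when the heap "floor" keeps rising.
# Assumes a hypothetical CSV with a "heap_used_mb" column sampled over the run.
import csv
import statistics

def leak_suspected(path, window=20):
    """Compare average heap usage early vs. late in the run.

    Sustained growth across the run, despite the GC sawtooth, suggests heap
    utilization is climbing toward its maximum allocation.
    """
    with open(path, newline="") as f:
        heap = [float(row["heap_used_mb"]) for row in csv.DictReader(f)]
    if len(heap) < 2 * window:
        return False                      # not enough samples to judge
    early = statistics.mean(heap[:window])
    late = statistics.mean(heap[-window:])
    return late > early * 1.5             # arbitrary threshold; tune for context

if __name__ == "__main__":
    print(leak_suspected("heap_samples.csv"))
```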

Page 25: Interpreting Performance Test Results

25

ANY HYPOTHESES ABOUT THIS?

Before: Abnormally high TCP retransmits between Web and App server
After: Network issues resolved
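If retransmits are a suspect, one low-tech way to quantify them on a Linux host is to diff the kernel's cumulative TCP counters over an interval; a minimal sketch (Linux-specific; field names come from /proc/net/snmp):

```python
# Minimal sketch (Linux-only): estimate the TCP retransmit rate during a test
# window by sampling the kernel's cumulative Tcp counters twice.
import time

def tcp_counters():
    """Return (OutSegs, RetransSegs) parsed from /proc/net/snmp."""
    with open("/proc/net/snmp") as f:
        lines = [line.split() for line in f if line.startswith("Tcp:")]
    header, values = lines[0], lines[1]
    row = dict(zip(header[1:], (int(v) for v in values[1:])))
    return row["OutSegs"], row["RetransSegs"]

out1, ret1 = tcp_counters()
time.sleep(60)                              # sample window; align with the test
out2, ret2 = tcp_counters()

sent, retrans = out2 - out1, ret2 - ret1
print(f"{retrans} of {sent} segments retransmitted "
      f"({100.0 * retrans / max(sent, 1):.2f}%)")
```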

Page 26: Interpreting Performance Test Results

26

TELL A DATA-SUPPORTED STORY ABOUT THIS; ANNOTATE THE GRAPH

Page 27: Interpreting Performance Test Results

27

HOW MUCH LOAD?

Page 28: Interpreting Performance Test Results

28

HOW MUCH LOAD?

Page 29: Interpreting Performance Test Results

29

HOW MUCH LOAD?

Page 30: Interpreting Performance Test Results

30

HOW MUCH LOAD?

Page 31: Interpreting Performance Test Results

31

FOR ACTIONABLE PERFORMANCE RESULTS…

…Think “CAVIAR”: Collecting, Aggregating, Visualizing, Interpreting, Assessing, Reporting

Page 32: Interpreting Performance Test Results

32

COLLECTING

• Objective: Gather all results from the test that
  • help gain confidence in results validity
  • portray system scalability, throughput & capacity
  • provide bottleneck / resource-limit diagnostics
  • help formulate hypotheses

What to collect (type, examples, sources, granularity, value):

• Load: users/sessions simulated, files sent, "transactions" completed. Sources: test "scenario" / test-harness counts, web/app logs, DB queries. Granularity: at appropriate levels (virtual users, throughput) for your context. Value: correlated with response times, yields "system scalability".
• Errors: HTTP, application, DB, network, test harness. Sources: web/app logs, DB utility, network trace. Granularity: raw data at the most granular level. Value: confidence in results validity.
• Response times: page/action, "transaction", end-to-end times. Sources: test tools / "scripts", web logs. Granularity: at various levels of app granularity, linked to objectives. Value: the fundamental unit of measure for performance.
• Resources: network, "server", "middleware", database, storage, queues. Sources: OS tools (vmstat, nmon, sar, perfmon), vendor monitoring tools. Granularity: 5-15 second sampling rates, with logging, to capture transient spikes. Value: correlated with response times, yields "system capacity".
• Anecdotes: manual testing, transient resources, screenshots. Sources: people manually testing or monitoring during the test. Granularity: manual testing by different people & locations. Value: confidence / corroboration / triangulation of results.
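As an illustration of the 5-15 second resource sampling above, here is a minimal sketch using the third-party psutil package (an assumption on my part; vmstat, nmon, sar, or perfmon are equally valid sources):

```python
# Minimal sketch: sample CPU and memory every 10 seconds and append to a CSV,
# so transient spikes survive for later correlation with response times.
# Assumes the third-party psutil package is installed (pip install psutil).
import csv
import time
import psutil

INTERVAL_S = 10           # 5-15 s keeps transient spikes visible
DURATION_S = 60 * 60      # cover the full test run

with open("resources.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["epoch", "cpu_pct", "mem_pct"])
    end = time.time() + DURATION_S
    while time.time() < end:
        writer.writerow([int(time.time()),
                         psutil.cpu_percent(interval=None),
                         psutil.virtual_memory().percent])
        f.flush()         # keep the data even if the run is cut short
        time.sleep(INTERVAL_S)
```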

Page 33: Interpreting Performance Test Results

33

AGGREGATING

• Objective: Summarize measurements using
  • various-sized time-buckets to provide tree & forest views
  • consistent time-buckets across types to enable accurate correlation
  • meaningful statistics: scatter, min-max range, variance, percentiles
  • multiple metrics to “triangulate”, confirm (or invalidate) hypotheses

What to aggregate (type, examples, statistics, value), using testing tool graphs, monitoring tools, and Excel pivot tables:

• Load: users/sessions; requests; no. of files / msgs sent/rcvd. Statistics: avg. Value: basis for ID'ing load sensitivity of all other metrics.
• Errors: error rate, error counts; by URL / type; HTTP 4xx & 5xx. Statistics: avg-max. Value: ID if errors correlate with load or resource metrics.
• Response times: workflow end-to-end time; page/action time. Statistics: min-avg-max-std deviation, 90th percentile. Value: quantify system scalability.
• Network throughput: megabits/sec. Statistics: avg-max. Value: ID if throughput plateaus while load is still ramping, or exceeds network capacity.
• App throughput: page view rate; completed transactions by type. Statistics: avg-max. Value: ID if the page view rate can be sustained, or if "an hour's work can be done in an hour".
• Resources: % CPU; CPU and disk queue depth; memory usage; IOPS; queued requests; DB contention. Statistics: avg-max. Value: ID limiting resources; provide diagnostics; quantify system capacity.
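A minimal sketch of the time-bucketing idea, assuming a hypothetical results.csv of raw samples with timestamp and resp_time_s columns (pandas):

```python
# Minimal sketch: summarize raw response-time samples into consistent
# 60-second buckets with the statistics listed above.
# Assumes a hypothetical results.csv with "timestamp" and "resp_time_s" columns.
import pandas as pd

df = pd.read_csv("results.csv", parse_dates=["timestamp"])
resampled = df.set_index("timestamp")["resp_time_s"].resample("60s")

buckets = pd.DataFrame({
    "count": resampled.count(),
    "min": resampled.min(),
    "avg": resampled.mean(),
    "max": resampled.max(),
    "std": resampled.std(),
    "p90": resampled.quantile(0.90),
})
print(buckets.head())
```

Keeping the same 60-second buckets for load, errors, and resources is what makes the later correlation step honest.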

Page 34: Interpreting Performance Test Results

34

VISUALIZING

• Objective: Gain a “forest view” of metrics relative to load
  • Turn barrels of numbers into a few pictures
  • Vary graph scale & summarization granularity to expose hidden facts
  • ID the load point where degradation begins
  • ID the system tier(s) where bottlenecks appear, limiting resources

Page 35: Interpreting Performance Test Results

35

VISUALIZING

• My key graphs, in order of importance:
  • Errors over load (“results valid?”)
  • Bandwidth throughput over load (“system bottleneck?”)
  • Response time over load (“how does the system scale?”)
    • Business process end-to-end
    • Page level (min-avg-max-SD-90th percentile)
  • System resources (“how’s the infrastructure capacity?”)
    • Server CPU over load
    • JVM heap memory/GC
    • DB lock contention, I/O latency
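As a sketch of the “response time over load” graph, assuming a hypothetical buckets.csv produced by an aggregation step like the one above (matplotlib, with load on its own y-axis):

```python
# Minimal sketch: plot 90th-percentile response time against active load,
# with load on its own y-axis so the two scales don't hide each other.
# Assumes a hypothetical buckets.csv with "timestamp", "active_users", "p90_s".
import pandas as pd
import matplotlib.pyplot as plt

buckets = pd.read_csv("buckets.csv", parse_dates=["timestamp"])

fig, ax_rt = plt.subplots()
ax_load = ax_rt.twinx()                       # load gets its own y-axis

ax_rt.plot(buckets["timestamp"], buckets["p90_s"], color="tab:red",
           label="90th pct response time (s)")
ax_load.plot(buckets["timestamp"], buckets["active_users"], color="tab:blue",
             label="active users")

ax_rt.set_xlabel("test time")
ax_rt.set_ylabel("response time (s)")
ax_load.set_ylabel("active users")
fig.legend(loc="upper left")
fig.autofmt_xdate()
plt.savefig("response_time_over_load.png", dpi=150)
```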

Page 36: Interpreting Performance Test Results

36

INTERPRETING

• Objective: Draw conclusions from observations and hypotheses
  • Make objective, quantitative observations from graphs / data
  • Correlate / triangulate graphs / data
  • Develop hypotheses from correlated observations
  • Test hypotheses and achieve consensus among tech teams
  • Turn validated hypotheses into conclusions

Page 37: Interpreting Performance Test Results

37

INTERPRETING

• Observations: “I observe that…”; no evaluation at this point!
• Correlations: “Comparing graph A to graph B…” – relate observations to each other
• Hypotheses: “It appears as though…” – test these with the extended team; corroborate with other information (anecdotal observations, manual tests)
• Conclusions: “From observations a, b, c, corroborated by d, I conclude that…”

Page 38: Interpreting Performance Test Results

38

SCALABILITY: RESPONSE TIME OVER LOAD

• Is 2.5 sec/page acceptable? Drill down to the page level to ID the key contributors, and look at 90th or 95th percentiles (averages are misleading)

Two styles for showing system scalability; the top graph shows load explicitly on its own y-axis

Note the consistent 0.5 sec/page up to ~20 users; above that, response time degrades steeply, to 5x at max load
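A minimal sketch of that page-level drill-down, assuming a hypothetical results.csv with page and resp_time_s columns:

```python
# Minimal sketch: rank pages by 95th-percentile response time, since a
# tolerable overall average can hide a few badly degrading pages.
# Assumes a hypothetical results.csv with "page" and "resp_time_s" columns.
import pandas as pd

df = pd.read_csv("results.csv")
by_page = (df.groupby("page")["resp_time_s"]
             .agg(count="count",
                  avg="mean",
                  p90=lambda s: s.quantile(0.90),
                  p95=lambda s: s.quantile(0.95))
             .sort_values("p95", ascending=False))
print(by_page.head(10))      # the pages most likely driving the 2.5 s average
```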

Page 39: Interpreting Performance Test Results

39

THROUGHPUT PLATEAU WITH LOAD RISING = BOTTLENECK SOMEWHERE!

• Note throughput tracking load through ~45 users, then leveling off
• The culprit was an Intrusion Detection appliance limiting bandwidth to 60 Mbps

In a healthy system, throughput should closely track load.

Page 40: Interpreting Performance Test Results

40

BANDWIDTH TRACKING WITH LOAD = HEALTHY

All 3 web servers show network interface throughput tracking with load throughout the test

A healthy bandwidth graph looks like Mt. Fuji

Page 41: Interpreting Performance Test Results

41

ERRORS OVER LOAD: MUST EXPLAIN!

• Note relatively few errors
• Largely HTTP 404s on missing resources

An error rate of <1% can be attributed to “noise” and dismissed; >1% should be analyzed and fully explained.

Sporadic bursts of HTTP 500 errors near the end of the test while the customer was “tuning” web servers.
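A minimal sketch for putting numbers behind that rule of thumb, assuming a hypothetical results.csv with timestamp and HTTP status columns:

```python
# Minimal sketch: bucket errors per minute so bursts (like the HTTP 500s above)
# can be tied to load level or to mid-test events such as server "tuning".
# Assumes a hypothetical results.csv with "timestamp" and "status" columns.
import pandas as pd

df = pd.read_csv("results.csv", parse_dates=["timestamp"]).set_index("timestamp")
per_min = df["status"].resample("60s")

summary = pd.DataFrame({
    "requests": per_min.count(),
    "errors": per_min.apply(lambda s: int((s >= 400).sum())),
})
summary["error_pct"] = 100.0 * summary["errors"] / summary["requests"]
print(summary[summary["error_pct"] > 1.0])   # buckets that need explaining
```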

Page 42: Interpreting Performance Test Results

42

END USER EXPERIENCE SLA VIOLATIONS

Outlier, not on VPN

Page 43: Interpreting Performance Test Results

43

SLA VIOLATIONS DRILL DOWN

Felipe B. (Brazil, Feb 28th, 7:19AM-1:00PM CST, 10.74.12.55): > 20 second response on page “Media Object Viewer”.
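A minimal sketch of the same drill-down in code, assuming a hypothetical results.csv with timestamp, user, page, and resp_time_s columns and an illustrative 20-second SLA:

```python
# Minimal sketch: list every sample that violated the SLA, with enough context
# (user, page, time) to chase down outliers like the one described above.
# Assumes a hypothetical results.csv; the 20 s threshold is illustrative.
import pandas as pd

SLA_SECONDS = 20.0

df = pd.read_csv("results.csv", parse_dates=["timestamp"])
violations = (df[df["resp_time_s"] > SLA_SECONDS]
                .sort_values("resp_time_s", ascending=False)
                [["timestamp", "user", "page", "resp_time_s"]])
print(violations.to_string(index=False))
```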

Page 44: Interpreting Performance Test Results

44

NETWORK THROUGHPUT – RAW GRAPH

Page 45: Interpreting Performance Test Results

45

NETWORK THROUGHPUT - INTERPRETED

Page 46: Interpreting Performance Test Results

46

CAPACITY: SYSTEM RESOURCES - RAW

Page 47: Interpreting Performance Test Results

47

CAPACITY: SYSTEM RESOURCES - INTERPRETED

Monitor resources liberally, provide (and annotate!) graphs selectively: which resources tell the main story?

Page 48: Interpreting Performance Test Results

48

ASSESSING

• Objective: Turn conclusions into recommendations
  • Tie conclusions back to test objectives – were objectives met?
  • Determine remediation options at the appropriate level – business, middleware, application, infrastructure, network
  • Perform agreed-to remediation
  • Re-test
• Recommendations:
  • Should be specific and actionable at a business or technical level
  • Should be reviewed (and if possible, supported) by the teams that need to perform the actions (nobody likes surprises!)
  • Should quantify the benefit, if possible the cost, and the risk of not doing it
  • The final outcome is management’s judgment, not yours

Page 49: Interpreting Performance Test Results

49

REPORTING

• Objective: Convey recommendations in stakeholders’ terms
• Identify the audience(s) for the report; write / talk in their language
• Executive Summary – 3 pages max
  • Summarize objectives, approach, target load, acceptance criteria
  • Cite factual Observations
  • Draw Conclusions based on Observations
  • Make actionable Recommendations
• Supporting Detail
  • Test parameters: date/time executed, business processes, load ramp, think-times, system tested (hw config, sw versions/builds)
  • Sections for Errors, Throughput, Scalability, Capacity
  • In each section: annotated graphs, observations, conclusions
• Associated Docs (if appropriate)
  • Full set of graphs, workflow detail, scripts, test assets

Page 50: Interpreting Performance Test Results

50

REPORTING

• Step 1: *DO NOT* press “Print” on the tool’s default report
  • Who is your audience?
  • Why would they want to see 50 graphs and 20 tables? What will they be able to see?
  • Data + Analysis = INFORMATION

Page 51: Interpreting Performance Test Results

51

REPORTING

• Step 2: Understand What Is Important
  • What did you learn? Study your results; look for correlations.
  • What are the 3 things you need to convey?
  • What information is needed to support these 3 things?
  • Discuss findings with technical team members: “What does this look like to you?”

Page 52: Interpreting Performance Test Results

52

REPORTING

• Step 3: So, What Is Important?
  • Prepare a three-paragraph summary for email
  • Prepare a 30-second elevator summary for when someone asks you about the testing
  • More people will consume these than any test report
  • Get feedback

Page 53: Interpreting Performance Test Results

53

REPORTING

• Step 4: Preparing Your Final Report: Audience
  • Your primary audience is usually executive sponsors and the business. Write the Summary at the front of the report for them.
  • Language, Acronyms, and Jargon
  • Level of Detail
  • Correlation to business objectives

Page 54: Interpreting Performance Test Results

54

REPORTING

• Step 5: Audience (cont.)
  • Rich Technical Detail within:
    • Observations, including selected graphs
    • Feedback from the Technical Team
    • Conclusions
    • Recommendations

Page 55: Interpreting Performance Test Results

55

REPORTING

• Step 6: Present!
  • Remember, no one is going to read the report.
  • Gather your audience: executive, business, and technical.
  • Present your results.
  • Help shape the narrative. Explain the risks. Earn your keep.
  • Call to action! Recommend solutions.

Page 56: Interpreting Performance Test Results

56

…REMEMBER: CAVIAR!

Collecting, Aggregating, Visualizing, Interpreting, Assessing, Reporting

Page 57: Interpreting Performance Test Results

57

A FEW RESOURCES

• WOPR (Workshop On Performance and Reliability)
  • http://www.performance-workshop.org
  • Experience reports on performance testing
  • Spring & fall facilitated, theme-based peer conferences
• SOASTA Community
  • http://cloudlink.soasta.com
  • Papers, articles, presentations on performance testing
• PerfBytes Podcast
• Mark Tomlinson’s blog
  • http://careytomlinson.org/mark/blog/
• Richard Leeke’s blog (Equinox.nz)
  • http://www.equinox.co.nz/blog/Lists/Posts/Author.aspx?Author=Richard Leeke
  • Data visualization
• Scott Barber’s resource page
  • http://www.perftestplus.com/resources.htm
• STP Resources
  • http://www.softwaretestpro.com/Resources
  • Articles, blogs, papers on a wide range of testing topics

Page 58: Interpreting Performance Test Results

58

THANKS FOR ATTENDING

Please fill out an evaluation form

[email protected]
