
Predictive Performance Testing: Integrating Statistical Tests into Agile Development Life-cycles


This presentation was delivered by Tom Kleingarn at HP Software Universe 2010 in Washington DC. It describes basic statistical tests that can be applied to any performance engineering practice to improve accuracy and confidence in your test results.


Page 1: Predictive Performance Testing: Integrating Statistical Tests into Agile Development Life-cycles

©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Predictive Performance TestingIntegrating Statistical Tests into Agile Development Lifecycles

Tom KleingarnLead, Performance Engineering

Digital River

http://www.linkedin.com/in/tomkleingarn

http://www.perftom.com

Page 2

Agenda

> Introduction

> Performance engineering

> Agile

> Outputs from LoadRunner

> Basic statistics

> Advanced statistics

> Summary

> Practical application

Page 3

About Me

> Tom Kleingarn

> Lead, Performance Engineering - Digital River

> 4 years in performance engineering

> Tested over 100 systems/applications

> Hundreds of performance tests

> Tools

> LoadRunner

> JMeter

> Webmetrics, Keynote, Gomez

> ‘R’ and Excel

> Quality Center

> QuickTest Professional

Page 4

> Leading provider of global e-commerce solutions

> Builds and manages online businesses for software and game publishers, consumer electronics manufacturers, distributors, online retailers and affiliates.

> Comprehensive platform offers:

> Site development and hosting

> Order management

> Fraud management

> Export control

> Tax management

> Physical and digital product fulfillment

> Multi-lingual customer service

> Advanced reporting and strategic marketing

Page 5

Performance Engineering

> The process of experimental design, test execution, and results analysis used to validate system performance as part of the Software Development Lifecycle (SDLC).

> Performance requirements – measurable targets of speed, reliability, and/or capacity used in performance validation.

> Latency < 10ms, measured at the 99th percentile

> 99.95% uptime

> Throughput of 1,000 requests per second
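Requirements like the first one can be checked directly against raw results. A minimal sketch using Python's standard library (the latency values here are hypothetical, standing in for a LoadRunner raw-results export):

```python
import statistics

# Hypothetical latency measurements in milliseconds (illustration only).
latencies = sorted([4, 5, 5, 6, 6, 7, 7, 8, 8, 9] * 50)  # 500 samples

# statistics.quantiles(n=100) returns the 1st..99th percentile cut points.
p99 = statistics.quantiles(latencies, n=100)[98]

# Requirement: latency < 10 ms, measured at the 99th percentile.
meets_requirement = p99 < 10
print(p99, meets_requirement)  # 9.0 True
```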

Page 6

Performance Testing Cycle

1. Requirements Analysis

2. Create test plan

3. Create automated scripts

4. Define workload model

5. Execute scenarios

6. Analyze results

> Rinse and repeat if…

> Defects identified

> Change in requirements

> Setup or environment issues

> Performance requirement not met

Digital River Test Automation

Page 7

Agile

> A software development paradigm that emphasizes rapid process cycles, cross-functional teams, frequent examination of progress, and adaptability.

Scrum

Initial Plan

Deploy

Page 8

Agile Performance Engineering

> Clear and constant communication

> Involvement in initial requirements and design phase

> Identify key business processes before they are built

> Coordinate with analysts and development to build key business processes first

> Integrate load generation requirements into project schedule

> Test immediately with v1.0

> Schedule tests to auto-start, run independently

> Identify invalid test results before deep analysis

Page 9

LoadRunner Results

> Measures of central tendency

> Average = ∑(all samples) / (sample size)

> Median = 50th percentile

> Mode – highest frequency, the value that occurred the most

> Measures of variability

> Min, max

> Standard Deviation = √( ∑(xᵢ − x̄)² / (n − 1) )

> 90th percentile
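All of these summary statistics are easy to recompute outside LoadRunner, e.g. to sanity-check an Analysis report. A sketch using Python's standard statistics module on made-up response times:

```python
import statistics

# Hypothetical transaction response times in seconds (illustration only).
latency = [2.8, 3.1, 3.1, 3.4, 3.5, 3.6, 3.6, 3.6, 3.9, 4.4]

average = statistics.mean(latency)                # sum of samples / sample size
median  = statistics.median(latency)              # 50th percentile
mode    = statistics.mode(latency)                # value that occurred the most
stdev   = statistics.stdev(latency)               # sample std deviation (n - 1)
p90     = statistics.quantiles(latency, n=10)[8]  # 90th percentile

print(average, median, mode)
```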

Page 10

LoadRunner Results

[Chart: the median splits the response-time distribution into 50% below and 50% above; the 90th percentile splits it into 90% below and 10% above]

Page 11

Basic Statistics – Sample vs. Population

> Performance requirement: average latency < 3 seconds

> What if you ran 50 rounds? 100 rounds?

Page 12

Basic Statistics – Sample vs. Population

> Sample – set of values, subset of population

> Population – all potentially observable values

> Measurements

> Statistic – the estimated value from a collection of samples

> Parameter – the “true” value you are attempting to estimate

Not a representative sample!

Page 13

Basic Statistics – Sample vs. Population

> Sampling distribution – the probability distribution of a given statistic based on a random sample of size n

> Dependent on the underlying population

> How do you know the system under test met the performance requirement?

Page 14

Basic Statistics – Normal Distribution

> With larger samples, data tend to cluster around the mean

Page 15

Basic Statistics – Normal Distribution

Sir Francis Galton’s “Bean Machine”

Page 16

Confidence Intervals

> An interval, computed from sample data, that contains the true mean parameter μ with a stated probability

> 95% confidence interval: x̄ ± 1.96 · (s / √n)

> … where 1.96 is the z-score from the normal distribution associated with 95% probability

Page 17

Confidence Intervals

> In repeated rounds of testing, a confidence interval will contain the true mean parameter with a certain probability:

True Average

Page 18

Confidence Intervals in Excel

> 95% confidence – the true average latency is between 3.273 and 3.527 seconds

> 99% confidence – the true average latency is between 3.233 and 3.567 seconds

> The interval is wider at 99% than at 95%: 0.334 sec vs. 0.254 sec

Statistic            | 95% Value | 99% Value | Formula
Average              | 3.40      | 3.40      |
Standard Deviation   | 1.45      | 1.45      |
Sample Size          | 500       | 500       |
Confidence Level     | 0.95      | 0.99      |
Significance Level   | 0.05      | 0.01      | =1-(Confidence Level)
Margin of Error      | 0.127     | 0.167     | =CONFIDENCE(Sig. Level, Std Dev, Sample Size)
Lower Bound          | 3.273     | 3.233     | =Average - Margin of Error
Upper Bound          | 3.527     | 3.567     | =Average + Margin of Error
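The same margins can be reproduced outside Excel. A sketch using Python's statistics.NormalDist with the summary statistics above (mean 3.40, standard deviation 1.45, n = 500):

```python
import math
from statistics import NormalDist

# Summary statistics from the table above.
mean, stdev, n = 3.40, 1.45, 500
std_error = stdev / math.sqrt(n)

def margin_of_error(confidence):
    """Two-sided margin of error, equivalent to Excel's =CONFIDENCE()."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. 1.96 at 95%
    return z * std_error

margin95 = margin_of_error(0.95)
margin99 = margin_of_error(0.99)
print(round(mean - margin95, 3), round(mean + margin95, 3))  # 3.273 3.527
print(round(mean - margin99, 3), round(mean + margin99, 3))  # 3.233 3.567
```

At 95% the margin is about 0.127 and the interval 3.273 to 3.527; at 99%, about 0.167 and 3.233 to 3.567, matching the table.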

Page 19

The T-test

> Tests whether the true mean is greater than or less than a certain value, based on your sample mean

> Performance requirement:

Mean latency < 3 seconds

> Null hypothesis:

Mean latency >= 3 seconds

> Alternative hypothesis:

Mean latency < 3 seconds


Page 20

T-test – Raw Data from LoadRunner

n = 500

Page 21

T-test in ‘R’

> ‘R’ for statistical analysis

> http://www.r-project.org/

Load test data from a file:

> datafile <- read.table("C:\\Data\\test.data", header = FALSE, col.names = c("latency"))

Attach the dataframe:

> attach(datafile)

Create a “vector” from the dataframe:

> latency <- datafile$latency

Page 22

T-test in ‘R’

> t.test(latency, alternative="less", mu=3)

One Sample t-test 

data: latency

t = -2.9968, df = 499, p-value = 0.001432

alternative hypothesis: true mean is less than 3

> The p-value of 0.0014 means that if the true average latency were actually 3 seconds or more, a sample mean this low would occur only 0.14% of the time. We therefore reject the null hypothesis.

> We conclude with high confidence that the true average latency is less than 3 seconds
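The same one-sided test can be sketched from summary statistics alone. The numbers below are hypothetical (not the presentation's data set); with n = 500, the normal distribution is a close approximation to the t distribution with 499 degrees of freedom:

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics (illustration only).
sample_mean, sample_sd, n = 2.95, 0.37, 500
mu0 = 3.0                                   # requirement: mean latency < 3 s

t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
# One-sided p-value for H1: mean < mu0, using the normal CDF as a
# large-sample approximation to the t distribution (df = 499).
p_value = NormalDist().cdf(t_stat)

reject_null = p_value < 0.05
print(round(t_stat, 2), reject_null)  # -3.02 True
```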

Page 23

T-test – Number of Samples Required

> power.t.test(sd=sd, sig.level=0.05, power=0.90, delta=mean(latency)*0.01, type="one.sample")

One-sample t test power calculation

n = 215.5319

delta = 0.03241267

sd = 0.1461401

sig.level = 0.05

power = 0.9

alternative = two.sided

> We need at least 216 samples

> Our sample size is 500, we have enough samples to proceed
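A back-of-the-envelope version of this calculation uses the normal approximation n ≈ ((z₁₋α⁄₂ + z_power) · σ ⁄ δ)², with the slide's inputs:

```python
import math
from statistics import NormalDist

# Inputs as reported by power.t.test above:
# sd, delta, alpha = 0.05 (two-sided), power = 0.90.
sd, delta = 0.1461401, 0.03241267
z_alpha = NormalDist().inv_cdf(1 - 0.05 / 2)  # ~1.96
z_power = NormalDist().inv_cdf(0.90)          # ~1.28

n_required = math.ceil(((z_alpha + z_power) * sd / delta) ** 2)
print(n_required)  # 214
```

This gives about 214 samples; R's power.t.test reports 215.5 (i.e. 216) because it uses the t distribution, which slightly raises the requirement.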

Page 24

Test for Normality

> Test whether the data are “normal”

> Clustered around a central value, no outliers

> Roughly fits the normal distribution

> shapiro.test(latency)

Shapiro-Wilk normality test

data: latency

p-value = 0.8943

> Our p-value is well above 0.05, so we cannot reject normality – the sample distribution is approximately normal

> p-value < 0.05 indicates the distribution is not normal
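Without R, a crude stand-in for shapiro.test is to check that sample skewness and excess kurtosis are both near zero, as they should be for normal data. A sketch on synthetic latencies (the 0.5 cutoffs are an informal rule of thumb, not a formal test):

```python
import random
import statistics

# Synthetic "latency" data drawn from a normal distribution, so the
# check below should pass for this sample.
random.seed(7)
latency = [random.gauss(3.0, 0.5) for _ in range(2000)]

m = statistics.mean(latency)
s = statistics.pstdev(latency)
skew = sum((x - m) ** 3 for x in latency) / (len(latency) * s ** 3)
excess_kurt = sum((x - m) ** 4 for x in latency) / (len(latency) * s ** 4) - 3

roughly_normal = abs(skew) < 0.5 and abs(excess_kurt) < 0.5
print(roughly_normal)
```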

Page 25

Review

> Sample vs. Population

> Normal distribution

> Confidence intervals

> T-test

> Sample size

> Test for normality

> Practical application

> Performance requirements

> Compare two code builds

> Compare system infrastructure changes

Page 26

Case Study

> Engaged in a new web service project

> Average latency < 25ms

> Applied statistical analysis

> System did not meet requirement

> Identified problem transaction

> Development fix applied

> Additional test, requirement met

> Prevented a failure in production

Page 27

Implementation in Agile Projects

> Involvement in early design stages

> Identify performance requirements

> Build key business processes first

> Calculate required sample size

> Apply statistical analysis

> Run fewer tests with greater confidence in your results

> Prevent performance defects from entering production

> Prevent SLA violations in production

Page 28