The Mathematics of Performance Management and Capacity Planning - Overview
Descriptive and Predictive Analytics in the Age of Virtual Systems
Tim Browning
Presented at the Greater Atlanta Computer Measurement Group Fall Conference, October 22, 2008



Mathematics of Performance & Capacity - Tim Browning, October 2008

On Mathematics & Statistics

There are two kinds of statistics, the kind you look up and the kind you make up.  ~Rex Stout, Death of a Doxy

How many times can you subtract 7 from 83, and what is left afterwards?  You can subtract it as many times as you want, and it leaves 76 every time.  ~Author Unknown

In ancient times, they had no statistics, so they had to fall back on lies.  ~Stephen B. Leacock


Goals: Performance Engineering and Capacity Management

• Goals of Performance Engineering
– Monitor/manage/predict system performance
– Reflect and understand the customer experience
– Provide the foundation of evidence-based capacity management

• Goals of Capacity Management
– Assure that computing supply is available to meet business demand
– Determine the best use of existing resources (optimization)


Probability, Probity and Authority

• Before the seventeenth century, legal evidence in Europe was considered of greater weight if a person testifying had “probity”. “Empirical evidence” was barely a concept. Probity was a measure of authority, so evidence came from authority. A noble person had probity. Yet today, probability is the very measure of the weight of empirical evidence in science, arrived at from inductive or statistical inference.

• The term 'probable' (Latin probabilis) meant approvable and was applied in that sense to opinion and to action: a probable action or opinion was one that sensible people would undertake or hold in the circumstances.

• Even so, in business and government enterprises the jury of executive opinion is most often swayed by the consensus of expert opinion, usually at considerable cost.


• Probability and statistics are not the same; they are related, but only indirectly:

– Probability can be viewed either as the long-run frequency of occurrence or as a measure of the plausibility of an event given incomplete knowledge, but not both.

– Statistics are functions of the observations (data) that often have useful and even surprising properties.

• The relationship between probability and statistics runs as follows:
– From the observations we compute statistics, which we use to estimate population parameters; those parameters index the probability density, from which we can compute the probability of a future observation.

– In general, probability asks what is likely to happen, while statistics describes what has already happened (and forms the basis for what is likely).

– In statistics, you don’t know how a process works but can observe its outcomes; in probability, you already know how a process works and want to predict what will happen. The combination is the foundation of statistical inference.
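A minimal Python sketch of that chain (observations, then statistics, then estimated parameters, then the probability of a future observation). It assumes, for illustration only, that response times are normally distributed (real response-time data are often skewed); the sample values, the 2.0-second threshold, and the helper `prob_exceeds` are hypothetical. It uses the standard library's `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

# Hypothetical observed response times in seconds (the observations).
observed = [1.2, 0.9, 1.4, 1.1, 1.0, 1.3, 0.8, 1.5, 1.2, 1.1]

# Statistics computed from the data (sample mean and standard deviation)
# estimate the parameters that index an assumed normal density.
model = NormalDist.from_samples(observed)


def prob_exceeds(dist: NormalDist, threshold: float) -> float:
    """Probability that a future observation from `dist` exceeds `threshold`."""
    return 1.0 - dist.cdf(threshold)


p = prob_exceeds(model, 2.0)
print(f"mean={model.mean:.3f}s sd={model.stdev:.3f}s P(next > 2.0s)={p:.6f}")
```

The normality assumption is the weak link: if the real distribution has a long right tail, this estimate of a slow response will be too optimistic.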


• Descriptive statistics describe the basic features of the data gathered from an experimental study. They provide simple summaries of the sample and the measures and, together with simple graphical analysis, form the basis of virtually every quantitative analysis of data.

• There are two objectives in formulating a summary statistic:
– To choose a statistic that shows how different units are similar. Statistics textbooks call one solution to this objective a measure of central tendency.
– To choose another statistic that shows how they differ. This kind of statistic is often called a measure of statistical variability.


“Central Tendency”
Central – middle value, center
Tendency – expected value, most frequent, representative

Arithmetic Mean
The arithmetic mean is the most common measure of central tendency. It is simply the sum of the numbers divided by the number of numbers. The symbol μ is used for the mean of a population, and the symbol M is used for the mean of a sample. The formula for M is:

$$M = \frac{\sum X}{N}$$

where ΣX is the sum of all the numbers in the sample and N is the number of numbers in the sample. For example, the mean of the numbers 1, 2, 3, 6, and 8 is

$$M = \frac{1+2+3+6+8}{5} = \frac{20}{5} = 4$$

regardless of whether the numbers constitute the entire population or just a sample from the population.


• Other, less common measures of central tendency:
– Median is the middle value: the point where half the values lie on each side, i.e., half are larger and half are smaller. It is the ‘middle’ of the distribution of values, the number separating the higher half of a sample, a population, or a probability distribution from the lower half. If you divide a distribution into quarters (quartiles), the median is the 2nd quartile.

• Useful in performance management in the presence of outliers, where we are more concerned about frequency of occurrence relative to a ‘central’ value than about a theoretical ‘average’ that may not even occur in the data. Response time is an example.

– Percentiles group data by putting equal numbers of data values into each group. The nth percentile is the point below which n% of the data are found.

• Useful in performance work because they provide a very good view of the user’s experience.
• Useful in capacity planning for ‘sizing’ a system to accommodate its historical high points, for example the 90th percentile of CPU busy.
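A minimal Python sketch contrasting the mean, the median, and a percentile on hypothetical response times that include one extreme value. The helper `percentile_nearest_rank` is illustrative and uses the nearest-rank convention; spreadsheet and SQL percentile functions often interpolate instead, so their results can differ slightly.

```python
import math
from statistics import mean, median


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile using the nearest-rank method.

    This is only one of several percentile conventions; many tools
    interpolate between neighboring values instead.
    """
    if not values or not 0 < pct <= 100:
        raise ValueError("need at least one value and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in seconds, including one extreme value.
response_times = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 30.0]

print("mean  :", round(mean(response_times), 3))
print("median:", round(median(response_times), 3))
print("p90   :", percentile_nearest_rank(response_times, 90))
```

Here the single 30-second value drags the mean to about 4 seconds, a value no request actually experienced, while the median and the 90th percentile stay close to what most requests saw.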


• When to use the arithmetic mean:
– When your data contain no outliers (extreme values that are not typical or normative).
– When the variability between values is low, as in utilization metrics, for example when the variability is less than 20%.

• What can you do about outliers (dirty data)?
– Eliminate them (e.g., when they are few and unlikely to recur).
– Use a weighted mean that discounts the outliers. The weighted mean is similar to an arithmetic mean (the most common type of average), except that instead of each data point contributing equally to the final average, some data points contribute more than others.
– Use the geometric mean, which is much less sensitive to large outliers than the arithmetic mean.


The Dirty Data Experiment with the Geometric Mean
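The slide's spreadsheet is not reproduced here. As a minimal sketch of the same kind of experiment, the Python below adds one dirty value to an otherwise clean series of hypothetical response times and compares how far the arithmetic and geometric means move. It uses `statistics.geometric_mean` (Python 3.8+).

```python
from statistics import fmean, geometric_mean

clean = [1.0, 1.2, 0.9, 1.1, 1.0]   # hypothetical clean response times (seconds)
dirty = clean + [50.0]              # the same series plus one dirty value

for label, fn in (("arithmetic", fmean), ("geometric", geometric_mean)):
    before, after = fn(clean), fn(dirty)
    print(f"{label:>10}: clean={before:.3f} dirty={after:.3f} shift x{after / before:.2f}")
```

In this sketch the arithmetic mean grows almost ninefold while the geometric mean less than doubles. The geometric mean still moves, so it is less sensitive to a large outlier rather than immune to it, and it is undefined for zero or negative values.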


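The Dirty Data Experiment with the Weighted Mean

Spreadsheet weight formulas: (1/19)-(1/19)*0.2 for the down-weighted value (the suspected outlier), and (1/19)+((1/19)*0.2)/18 for each of the other 18 values:

$$w_{\text{outlier}} = \frac{1}{19} - 0.2\cdot\frac{1}{19}, \qquad w_{\text{other}} = \frac{1}{19} + \frac{0.2\cdot\frac{1}{19}}{18}$$

A convex combination is a linear combination of points (which can be vectors, scalars, etc.) where all coefficients are non-negative and sum to 1.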
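A minimal Python sketch of that weighting scheme as a convex combination. It assumes the reading of the slide's spreadsheet formulas given above: 19 observations, the suspected outlier's weight of 1/19 cut by 20%, and the removed weight shared equally by the other 18. The data and both helper names are hypothetical.

```python
def slide_weights(n, outlier_index, discount=0.2):
    """Weights following the slide's spreadsheet formulas (interpretation assumed):
    the suspected outlier gets 1/n reduced by `discount`, and the removed weight
    is shared equally by the other n - 1 values."""
    base = 1.0 / n
    weights = [base + (base * discount) / (n - 1)] * n
    weights[outlier_index] = base - base * discount
    return weights


def weighted_mean(values, weights, tol=1e-9):
    """Weighted mean as a convex combination: weights must be non-negative and sum to 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights differ in length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights do not form a convex combination")
    return sum(v * w for v, w in zip(values, weights))


values = [1.0] * 18 + [20.0]          # hypothetical: 18 typical values, 1 outlier
weights = slide_weights(len(values), outlier_index=18)
print("arithmetic:", sum(values) / len(values), "weighted:", weighted_mean(values, weights))
```

With a 20% discount, the weighted mean (1.8) only partly removes the outlier's pull relative to the arithmetic mean (2.0); a larger discount moves it closer to the typical value of 1.0.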


“There are liars, outliers, and out-and-out liars.”

• What are ‘outliers’?
– Extreme values not typical of the group
– “Rare events” that do not fit within the range of other data values
– Non-normative, anomalous, or exceptional data

• How are they detected?
– Visually, using statistical graphics
– Statistical filtering
– Interquartile fencing: values more than 1.5 interquartile ranges below the lower quartile or above the upper quartile
– More advanced methods: Grubbs’ test, etc.

There is no such thing as a simple test!
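A minimal Python sketch of interquartile fencing, assuming the conventional multiplier k = 1.5 (Tukey's fences) and hypothetical response times. `tukey_fence_outliers` is an illustrative helper built on `statistics.quantiles` (Python 3.8+).

```python
from statistics import quantiles


def tukey_fence_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Quartiles use the 'inclusive' method; other quartile conventions shift
    the fences slightly, especially for small samples.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high], (low, high)


response_times = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 30.0]  # hypothetical
outliers, fences = tukey_fence_outliers(response_times)
print("fences:", fences, "outliers:", outliers)
```

Fencing only flags candidates; whether a flagged value is dirty data or a real event (a genuine slow transaction) still needs investigation.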


The Geometric Mean
• Instead of adding the set of numbers and then dividing the sum by the count of numbers in the set, n, the numbers are multiplied and then the nth root of the resulting product is taken.

• For instance, the geometric mean of two numbers, say 2 and 8, is just the square root (i.e., the second root) of their product, 16, which is 4. As another example, the geometric mean of 1, ½, and ¼ is the cube root (i.e., the third root) of their product (0.125), which is ½.

$$GM = \sqrt[N]{x_1 x_2 \cdots x_N} = \left(\prod_{i=1}^{N} x_i\right)^{1/N} = \exp\left(\frac{1}{N}\sum_{i=1}^{N} \ln x_i\right) = \exp\big(\mathrm{mean}(\ln X)\big)$$

In SQL-ese:

SELECT
  EXP(AVG(LN(Response_Time))) AS GEOMEAN
FROM
  the_data
WHERE
  Response_Time > 0
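The SQL works because the geometric mean is the exponential of the mean of the logarithms. A minimal Python sketch (the helper names are hypothetical) checks both forms against the worked examples above and shows why the log form is the safer one to compute: the direct product of many values can overflow (or underflow) floating point. `math.prod` requires Python 3.8+.

```python
import math


def geomean_product(xs):
    """Textbook form: nth root of the product (can overflow or underflow)."""
    return math.prod(xs) ** (1.0 / len(xs))


def geomean_log(xs):
    """Log form, mirroring EXP(AVG(LN(x))); defined only for positive values."""
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


print(geomean_log([2, 8]), geomean_log([1, 0.5, 0.25]))
# The product 1e600 overflows to inf; the log form stays finite.
print(geomean_product([1e200] * 3), geomean_log([1e200] * 3))
```

The positivity check plays the same role as the `Response_Time > 0` filter in the query, since the logarithm of zero or a negative value is undefined.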


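The ‘geometry’ part of the Geometric Mean: consider a ‘line’ where the beginning is at point ‘A’ and the end is at point ‘C’. Where is the ‘middle’ (point ‘B’)? In the multiplicative sense, B is the point for which A is to B as B is to C:

$$\frac{A}{B} = \frac{B}{C} \;\Rightarrow\; B \cdot B = A \cdot C \;\Rightarrow\; B^2 = A \cdot C \;\Rightarrow\; B = \sqrt{A \cdot C}$$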


Measures of variability

• Variance – the amount of ‘spread’ in the data around the mean.

• Standard Deviation – the square root of the variance. In a normal distribution, approximately 2/3 (about 68%) of the data lie within one standard deviation of the mean, on either side.

$$S^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

In performance work, a large standard deviation of response time is usually bad; you want response times to be low and repeatable. Wide variations upset people more than long but consistent times.
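A minimal Python sketch of the sample variance formula above (the n - 1 form), checked against the standard library's `statistics.variance`, plus the "about two-thirds within one standard deviation" figure for a normal distribution. The response-time values are hypothetical.

```python
from statistics import NormalDist, variance


def sample_variance(xs):
    """S^2 with the n - 1 denominator, written out term by term."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


response_times = [0.9, 1.1, 1.0, 1.3, 0.7]   # hypothetical seconds
s2 = sample_variance(response_times)
print("S^2 =", s2, "S =", s2 ** 0.5)

# Share of a normal distribution within one standard deviation of the mean.
std_normal = NormalDist()
print("within 1 sd:", std_normal.cdf(1) - std_normal.cdf(-1))
```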


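The Geometric Standard Deviation
• The antilog of the standard deviation of the natural-log-transformed values of x, or

$$G_{sd} = \exp\big(\mathrm{stdev}(\ln x)\big) = \exp\left(\sqrt{\frac{1}{N}\sum_{i=1}^{N} (\ln x_i)^2 - \left(\frac{1}{N}\sum_{i=1}^{N} \ln x_i\right)^2}\right) = \exp\left(\sqrt{\mathrm{mean}\big((\ln x)^2\big) - \big(\mathrm{mean}(\ln x)\big)^2}\right)$$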

In SQL-ese:

SELECT
  EXP(STDDEV(LN(Response_Time))) AS GEOSTDEV
FROM
  the_data
WHERE
  Response_Time > 0
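A minimal Python sketch, with hypothetical data, that computes the geometric standard deviation in the 1/N form of the formula above and, for comparison, with the n - 1 sample standard deviation of the logs. The distinction matters when translating to SQL: in several dialects (for example Oracle and PostgreSQL) STDDEV returns the sample (n - 1) standard deviation, so the query above corresponds to the `gsd_sample` variant here. Both helper names are hypothetical.

```python
import math
from statistics import pstdev, stdev


def gsd_population(xs):
    """Geometric standard deviation using the 1/N (population) form."""
    logs = [math.log(x) for x in xs]   # raises ValueError for x <= 0
    n = len(logs)
    mean_log = sum(logs) / n
    mean_sq = sum(v * v for v in logs) / n
    # max() guards against a tiny negative value caused by rounding
    return math.exp(math.sqrt(max(mean_sq - mean_log ** 2, 0.0)))


def gsd_sample(xs):
    """Same idea with the n - 1 (sample) standard deviation of the logs."""
    return math.exp(stdev([math.log(x) for x in xs]))


data = [1, 2, 4, 8]   # hypothetical response times
print(gsd_population(data), gsd_sample(data))
```

A geometric standard deviation of 1.0 means no spread; it is a multiplicative factor, so "one geometric standard deviation above" means multiplying the geometric mean by it.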


Correlation and Regression
• Correlation – how things vary together (or not): the strength and direction of a linear relationship between two random variables, or the departure of two variables from independence.

• There are several correlation coefficients; Pearson’s is the most common in performance analysis (though misnamed).

• Probably the most misused statistical tool.
• Obtained by dividing the covariance of two variables by the product of their standard deviations:

$$r = \frac{\mathrm{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
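A minimal Python sketch of the covariance-over-standard-deviations definition, with hypothetical CPU and throughput samples. The second example illustrates one common misuse: Pearson's r measures only linear association, so a perfect but non-linear dependence can yield r = 0. `pearson_r` is an illustrative helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Hypothetical paired samples: CPU busy (%) and transactions per second.
cpu = [20, 35, 50, 65, 80]
tps = [110, 190, 260, 330, 420]
print("r(cpu, tps) =", round(pearson_r(cpu, tps), 4))

# r measures only *linear* association.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]                 # y is completely determined by x ...
print("r(x, x^2) =", pearson_r(x, y))  # ... yet r is 0
```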


• Linear regression and its cousins (non-linear, multiple, logistic, etc.) are all methods for fitting lines or curves to data in a statistically optimal manner. “The best way of drawing a line since the invention of the straight edge” – Pat Artis.

• Often used by managers to observe ‘trends’ and predict the future (or explain the past). Often misused for the same purpose.

• In statistics, linear regression is a form of regression analysis in which the relationship between one or more independent variables and another variable, called the dependent variable, is modeled by a least squares function called the linear regression equation. This function is a linear combination of one or more model parameters, called regression coefficients. A linear regression equation with one independent variable represents a straight line. The results are subject to statistical analysis.
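A minimal Python sketch of a least squares fit with one independent variable, using the closed-form slope Sxy/Sxx and intercept ȳ - slope·x̄. The monthly CPU figures and the helper name are hypothetical.

```python
def least_squares_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical monthly average CPU busy (%), months 1..6.
months = [1, 2, 3, 4, 5, 6]
cpu = [40, 43, 45, 49, 52, 54]
a, b = least_squares_line(months, cpu)
print(f"cpu ~ {a:.2f} + {b:.2f} * month; month 12 projection: {a + b * 12:.1f}%")
```

Projecting the line to month 12 is exactly the kind of trend extrapolation that can be misused: it assumes the linear relationship continues to hold outside the observed range.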


Linear regression in Excel: using graphical techniques

[Figure: “Linear Regression” chart of Y (0 to 600) versus X (20 to 50) showing the data points and an Excel linear trendline, “Linear (Data Points)”, with fitted equation y = 10.142x + 20.458 and R² = 0.9638]


Examples of Capacity/Performance Reporting in use now

Traditional time series line charts…


Advanced Statistical Graphics

[Figure panels: 3-D performance surface; multi-temporal density plot; expected high/low/actual]


SAP – CCMS Metrics via SAS/Graph


Application Response Time Modeling

[Figure: application response time versus increasing application workload, with regions labeled “Small Changes, Large Impact”, “Large Changes, Small Impact”, and “System Unresponsive”]
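The shape of that curve can be illustrated with the simplest queueing model, a single M/M/1 queue, where mean response time is R = S / (1 - U) for mean service time S and utilization U. This is a swapped-in textbook model, not necessarily the model behind the slide, and the service time below is hypothetical; real applications usually need multi-server or network-of-queues models.

```python
def mm1_response_time(service_time, utilization):
    """Mean response time R = S / (1 - U) for a single M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("the queue is unstable at or above 100% utilization")
    return service_time / (1.0 - utilization)


S = 0.1  # hypothetical mean service time in seconds
for u in (0.10, 0.30, 0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"U = {u:.2f}  R = {mm1_response_time(S, u):6.3f} s")
```

Moving from 10% to 30% busy adds a few hundredths of a second, while moving from 90% to 95% doubles the response time: large changes with small impact at low load, small changes with large impact near saturation.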


How does Modeling differ from Trending in prediction?

[Figure: Application Modeling vs. Linear Regression via Trending. Application response time versus application workload over the months Jan through Dec, showing system load measurements, the SLA threshold, and the dates at which the SLA threshold is predicted to be reached via modeling and via trending]
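A minimal Python sketch of why the two predicted dates differ, under loudly hypothetical assumptions: utilization grows linearly by 6 points per month from 30%, response time follows the M/M/1 form R = S / (1 - U) with S = 0.2 s, and the SLA threshold is 1.0 s. The "trending" forecast fits a straight line to the first four months of response times and extrapolates it; the "modeling" forecast evaluates the queueing model at each month's projected utilization. All names and numbers are illustrative.

```python
import math

SERVICE_TIME = 0.2  # hypothetical seconds of service per request
SLA = 1.0           # hypothetical response-time SLA threshold (seconds)


def utilization(month):
    """Hypothetical linear workload growth: 30% busy in month 1, +6 points per month."""
    return 0.30 + 0.06 * (month - 1)


def modeled_response(month):
    """M/M/1-style response time R = S / (1 - U); infinite at or beyond saturation."""
    u = utilization(month)
    return math.inf if u >= 1.0 else SERVICE_TIME / (1.0 - u)


def first_month_over_sla_model(max_month=60):
    """First month in which the modeled response time exceeds the SLA."""
    for m in range(1, max_month + 1):
        if modeled_response(m) > SLA:
            return m
    return None


def first_month_over_sla_trend(history_months=4):
    """Fit a least squares line to early response times and extrapolate to the SLA."""
    xs = list(range(1, history_months + 1))
    ys = [modeled_response(m) for m in xs]  # stands in for measured history
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return math.ceil((SLA - intercept) / slope)  # assumes a rising trend (slope > 0)


model_month = first_month_over_sla_model()
trend_month = first_month_over_sla_trend()
print(f"SLA breach predicted via modeling: month {model_month}; via trending: month {trend_month}")
```

Because response time is non-linear in load, the straight line fitted while the system is lightly loaded crosses the SLA far later (month 23) than the model does (month 10).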


Modern Dynamic Systems are Challenging to Understand


• Responses to a Capacity/Performance Crisis:

• 1. System/application tuning, re-engineering, and optimization:
– Benefit: Considerable gains are possible, sometimes improvements of hundreds of percent. These are achieved via system administrative action (usually parametric changes to the OS) and by algorithmic and parametric re-specification (for the application). No capital expense. Efficient use of resources.

– Detriments: The effects may not be enduring for dynamic systems, as version/release changes and application functionality changes can, and do, degrade performance tuning effects quickly. System reinitialization (reboot, IPL) is often required and creates an availability/service delivery issue. Application re-engineering for performance may be, and often is, cost prohibitive and/or unsupported by executive management.

• 2. Capacity increase via upgrade/replacement or technology refresh:
– Benefit: Reduces the risk of unsupported/unrecoverable infrastructure conditions. The effect is usually long term. Accommodates increased application functionality for business utility.

– Detriments: Capital expense may be incurred. Inefficiencies remain. Risk management to avoid undersizing or oversizing requires expensive predictive modeling tools. Predictive analytics requires advanced skills in technical staff. New technologies carry risks and may increase complexity (e.g., virtualization). Costs may be unsupported by executive management.


Modeling? Why?

Reactive Problem Solving vs Modeling

• damage grows rapidly with time;
• the longer the error goes undiscovered, the more useless and damaging work based on the error will be done;
• when the error is discovered, it and all the associated damage have to be removed;
• the system will then need therapy to recover;
• the death rate increases dramatically with late discovery;
• alternatively, the survival rate increases dramatically with early discovery.

"Crude measures of the right things are better than precise measures of the wrong things."- from Jim Clemmer's article, "Strategic Measurements Guide Change and Improvement"


Summary of Performance Analysis Techniques

• Reactive problem solving
– Suitable: almost never
– Unsuitable: almost always

• Measurement
– Suitable: for a current status report
– Unsuitable: for modern, dynamic systems with complex, rapidly changing technology

• Consensus of expert guessing
– Suitable: quick and dirty decisions where risk is low
– Unsuitable: high risk, complexity, high variance between experts

• Analytic modeling
– Suitable: for models such as capacity plans where the models will be reusable
– Unsuitable: for projects with a) new and untested architectures, b) new technology, or c) complex, heterogeneous, highly distributed behavior

• Simulation modeling
– Suitable: for predicting the performance of complex new systems and technology in general, and modern distributed information systems and computer systems in particular
– Unsuitable: for rough, quick estimates for large numbers of configurations of simple, mature systems where analytic modeling is suitable

• Benchmarking
– Suitable: to determine the performance of a particular workload on a particular configuration
– Unsuitable: when many workloads, designs, and configurations must be analyzed


Predictive Analytics: Benefits

Predictive analytics provide a practical way to detect problems and allow early correction as well as avoid resource saturation conditions.

Simulation provides a practical way to detect such problems and allow early correction. Avoiding the use of simulation substantially increases the risk of failure.

Analytical modeling provides fast and accurate answers based on existing performance data. It allows for a variety of what-if scenarios to be easily crafted to determine the best course of action when systems are experiencing change.

Statistical Forecasting and Analysis provides descriptive and predictive aspects of IT performance data topology through the use of measures of central tendency, variability, correlation, linear regression, and statistical pattern recognition.


SAP-specific Capacity Planning Methodology for CCE

• We want to acquire capacity to provide required service levels for sustained busy periods. Typical examples:

– Month-end closing
– Busy daily window (e.g., 09:00 to 11:00)
– Mondays
– Completing the batch window on time to deliver operational reports or schedule deliveries/shipments/print picking papers, etc.

• The best approach is to choose the percentile you want to satisfy:
– The 90th percentile of hourly MIPS across the month is reflective of busy daily periods.
– Likewise, the 95th percentile reflects the sustained busy period where there is a pronounced financial-systems month-end closing effect.
– In legacy OLTP we often see peak-to-average ratios between 1.5:1 and 2:1, depending on the definition of peak (e.g., 90th vs. 95th percentile).
– This really is a view of sustained busy periods.
– No one can afford to buy for absolute peaks (99th or 100th percentile).
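A minimal Python sketch of percentile-based sizing on an invented month of hourly MIPS. Every number below is hypothetical and chosen only to make the effects visible: the 90th percentile lands on the busy daily window, the 95th picks up the month-end close, and each is expressed as a peak-to-average ratio. The helper names are illustrative, and the nearest-rank percentile is one of several conventions.

```python
import math


def percentile_nearest_rank(values, pct):
    """pct-th percentile by the nearest-rank method (one of several conventions)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def hypothetical_hourly_mips(days=30):
    """Invented month of hourly MIPS: quiet nights, a 09:00-11:00 busy window,
    and a heavier month-end close over the last three days (08:00-22:00)."""
    series = []
    for day in range(1, days + 1):
        for hour in range(24):
            if day > days - 3 and 8 <= hour < 22:
                series.append(1100.0)   # month-end closing
            elif 9 <= hour < 11:
                series.append(900.0)    # busy daily window
            elif 8 <= hour < 18:
                series.append(650.0)    # normal business hours
            else:
                series.append(400.0)    # off hours
    return series


hourly = hypothetical_hourly_mips()
average = sum(hourly) / len(hourly)
for pct in (90, 95):
    peak = percentile_nearest_rank(hourly, pct)
    print(f"P{pct}: {peak:.0f} MIPS, peak-to-average {peak / average:.2f}:1")
```

In this made-up series the ratios come out near 1.63:1 and 1.99:1; with real data, the point is to compute them from measured hourly values rather than assume them.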


Capacity Planning for the Newly Virtual

Three essential elements:
• Measurement, to ascertain critical data like IT resource availability, utilization, and usage patterns
• Second-level analysis, to focus on the long-term needs of the enterprise rather than the immediate concern to bump up resources
• Business realignment, to ensure that IT is keeping pace with business needs, not the other way around


Capacity Planning for the Newly Virtual

Over half (54%) of the virtual-server adopters have experienced a net growth in capacity, while only 7% reported a net decrease (ESG Research)

Focus on understanding our “virtualization” factors:
– Effect of non-concurrent peaks of multiple workloads
– Follow the sun in a global operation
– Better understanding of these effects can be gained by looking at the 90th/95th percentiles
– Landscape dimensions:
• a workload level
• a platform (processor complex) level
• a Sysplex/cluster level
• a server/LPAR level, etc.

The ‘virtualization’ analysis will tell us how much we can over-commit resources:
• The 95th percentile of the sums vs. the sum of the 95th percentiles
• It is often the case that we can load the platform until the sum of the 95th percentiles reaches 115%
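A minimal Python sketch of the comparison in the first bullet, using two invented workloads whose peaks do not coincide: an online workload busy by day and a batch workload busy overnight. Sizing each guest for its own 95th percentile and adding the results overstates what the shared platform needs; the 95th percentile of the combined hourly demand reflects the non-concurrent peaks. All names and numbers are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """pct-th percentile by the nearest-rank method (one of several conventions)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


HOURS = 24 * 10  # ten hypothetical days of hourly samples

# Hypothetical workloads with non-concurrent peaks (units: e.g. % of one server).
online = [80.0 if 8 <= h % 24 < 18 else 20.0 for h in range(HOURS)]  # daytime peak
batch = [70.0 if h % 24 < 6 else 15.0 for h in range(HOURS)]         # overnight peak
combined = [a + b for a, b in zip(online, batch)]

sum_of_p95 = percentile_nearest_rank(online, 95) + percentile_nearest_rank(batch, 95)
p95_of_sum = percentile_nearest_rank(combined, 95)
print(f"sum of 95th percentiles: {sum_of_p95:.0f}")
print(f"95th percentile of the sum: {p95_of_sum:.0f}")
print(f"over-commit ratio: {sum_of_p95 / p95_of_sum:.2f}")
```

In this contrived case the sum of the 95th percentiles is about 1.58 times the 95th percentile of the sum. The 115% figure above is an observation about real workloads; actual ratios depend on how much the measured peaks overlap.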


Organizational Support

Institutionalize the process

The resource reporting and modeling is actually the easy part of this.

The more difficult and more important part of institutionalizing the process is connecting the application blueprinting/design process to the capacity planning process:
– This creates the understanding of the business drivers, which is key to scaling factors and calibration.
– This is also a potential trigger for alerting the organization to the need for a risk mitigation plan. For example, step-function workload increases from new workloads should lead to a performance testing activity.


Organizational Support for Capacity Planning

Market the lesser-known benefits of capacity planning

Strengthened relationships with developers and end users. Communication, negotiation, and a sense of joint ownership can all combine to nurture a healthy, professional relationship between IT and its customers

Improved communications with suppliers. Involving key suppliers and support staffs with your capacity plans can promote effective communications among these groups

Increased collaboration with other infrastructure groups. Network services, technical support, database administration, operations, desktop support, and even facilities may all play a role in capacity planning. In order for the plan to be thorough and effective, all these various groups must support and collaborate with each other.

Promotion of a culture of strategic planning as opposed to tactical firefighting. One of the most significant benefits of developing an overall and ongoing capacity-planning program is the institutionalizing of a strategic-planning culture


Author/Contact:
C. Tim Browning
Coca-Cola Enterprises, Technical Architecture
Enterprise Performance & Capacity Planning
Tel: (770) 370-8566 (office); (404) 210-7051 (cell)

Go Green – Stop Global Whining