The Mathematics of Performance Management and Capacity Planning - Overview
Descriptive and Predictive Analytics in the Age of Virtual Systems
Tim Browning
Presented at the Greater Atlanta Computer Measurement Group Fall Conference, October 22, 2008
Mathematics of Performance & Capacity
Slide 2Tim Browning
October, 2008
On Mathematics & Statistics
There are two kinds of statistics, the kind you look up and the kind you make up. ~Rex Stout, Death of a Doxy
How many times can you subtract 7 from 83, and what is left afterwards? You can subtract it as many times as you want, and it leaves 76 every time. ~Author Unknown
In ancient times, they had no statistics, so they had to fall back on lies. ~Stephen B. Leacock
Goals: Performance Engineering and Capacity Management
• Goals of Performance Engineering
– Monitor/Manage/Predict System Performance
– Reflect and Understand Customer Experience
– Foundation of evidence-based Capacity Management
• Goals of Capacity Management
– Assure Computing Supply is available to Meet Business Demand
– Determine Best use of existing resources (optimization)
Probability, Probity and Authority
• Before the seventeenth century, legal evidence in Europe was considered of greater weight if a person testifying had “probity”. “Empirical evidence” was barely a concept. Probity was a measure of authority, so evidence came from authority. A noble person had probity. Yet today, probability is the very measure of the weight of empirical evidence in science, arrived at from inductive or statistical inference.
• The term 'probable' (Latin probabilis) meant approvable, and was applied in that sense, to opinion and to action. A probable action or opinion was one such as sensible people would undertake or hold, in the circumstances.
• Even so, the jury of executive opinion, in the business-government Enterprise, is most often swayed by the consensus of expert opinion, usually at considerable cost.
• Probability and Statistics are not the same - They are related, but circuitously related:
– Probability can be viewed either as the long-run frequency of occurrence or as a measure of the plausibility of an event given incomplete knowledge - but not both.
– Statistics are functions of the observations (data) that often have useful and even surprising properties.
• So we see the relationship(s) between probability and statistics:
– From the observations we compute statistics that we use to estimate population parameters, which index the probability density, from which we can compute the probability of a future observation from that density.
– In general, probability asks what is likely to happen and statistics describes what has already happened (and forms the basis for what is likely)
– In statistics, you don’t know how a process works but are able to observe the outcomes; in probability you already know how a process works but want to know how to predict what will happen. The combination is the foundation of statistical inference.
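The chain described above (observations, then statistics, then estimated parameters, then the probability of a future observation) can be sketched in Python. This is only an illustration: the response times are made-up values, the 1.5 second threshold is hypothetical, and the choice of a normal density is an assumption that real response-time data often violates.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical observed response times in seconds (illustrative data only).
observed = [1.1, 0.9, 1.3, 1.0, 1.2, 0.8, 1.4, 1.0, 1.1, 1.2]

# Statistics computed from the observations...
mu_hat = mean(observed)
sigma_hat = stdev(observed)

# ...are used as estimates of the parameters that index the density.
# Assumption: the population is modeled as normal.
model = NormalDist(mu_hat, sigma_hat)

# Probability that a future observation exceeds a hypothetical threshold.
threshold = 1.5
p_exceed = 1 - model.cdf(threshold)
print(f"estimated mean={mu_hat:.2f}, sd={sigma_hat:.2f}, P(X > {threshold}) = {p_exceed:.3f}")
```

The first half of the script is statistics (describing what was observed); the last two lines are probability (what is likely to happen next, given the fitted model).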
• Descriptive Statistics are used to describe the basic features of the data gathered from an experimental study in various ways. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data.
• Two objectives for formulating a summary statistic:
– To choose a statistic that shows how different units seem similar. Statistical textbooks call one solution to this objective a measure of central tendency.
– To choose another statistic that shows how they differ. This kind of statistic is often called a measure of statistical variability.
“Central Tendency”
Central – middle value, center
Tendency – expected value, most frequent, representative

Arithmetic Mean
The arithmetic mean is the most common measure of central tendency. It is simply the sum of the numbers divided by the number of numbers. The symbol μ is used for the mean of a population. The symbol M is used for the mean of a sample. The formula for M is shown below:

$$M = \frac{\sum X}{N}$$

where ΣX is the sum of all the numbers in the sample and N is the number of numbers in the sample. As an example, the mean of the numbers 1, 2, 3, 6, 8 is

$$M = \frac{1 + 2 + 3 + 6 + 8}{5} = \frac{20}{5} = 4$$

regardless of whether the numbers constitute the entire population or just a sample from the population.
• Other, less common measures of central tendency:
– Median is the middle value: the point where half the values lie on each side, i.e. half are larger and half are smaller. It is the number separating the higher half of a sample, a population, or a probability distribution from the lower half. If you divide a distribution into fourths (quartiles), the median is the 2nd quartile.
• Useful in performance management in the presence of outliers, where we are more concerned about frequency of occurrence relative to a ‘central’ value than about a theoretical ‘average’ that may not even occur in the data. For example, response time.
– Percentiles group data by putting equal numbers of data into each group. The nth percentile is the point below which n% of the data are found.
• Useful in performance as it provides a very good view of the user’s experience.
• Useful in capacity planning for ‘sizing’ a system based on accommodation of its historical high points. For example, the 90th percentile of CPU busy.
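A short Python sketch of the mean, median, and a percentile on the same data may make the contrast concrete. The response times are hypothetical, and the `percentile` helper uses the nearest-rank convention, which is one of several common definitions and is an assumption here rather than anything the slides specify.

```python
from statistics import mean, median

def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    # Integer form of ceil(p * n / 100), avoiding floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Hypothetical response times in seconds, with one extreme value.
response_times = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 30.0]

avg = mean(response_times)       # pulled upward by the single outlier
mid = median(response_times)     # stays near the typical value
p90 = percentile(response_times, 90)

print(f"mean={avg:.2f}s median={mid:.2f}s 90th percentile={p90:.2f}s")
```

In this data the mean is a value that no user actually experienced, while the median and the 90th percentile both correspond to response times near what was observed.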
• When to use the arithmetic mean:
– When your data contains no outliers (extreme values that are not typical or normative).
– When the variability is low between values, for example in utilization metrics, i.e. when the variability is less than 20%.
• What can you do about outliers (dirty data)?
– Eliminate them (if they are few and unlikely to recur).
– Use a weighted mean that discounts the outliers. The weighted mean is similar to an arithmetic mean (the most common type of average), except that instead of each data point contributing equally to the final average, some data points contribute more than others.
– Use the Geometric Mean, which has remarkable insensitivity to outliers.
The Dirty Data Experiment with the Geometric Mean
The Dirty Data Experiment with the Weighted Mean
=(1/19)-(1/19)*0.2
=(1/19)+((1/19)*0.2)/18
A convex combination is a linear combination of points (which can be vectors, scalars, etc.) where all coefficients are non-negative and sum up to 1.
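The two spreadsheet-style weight formulas on this slide can be checked with a small Python sketch. The reading used here is an assumption, since the slide's chart did not survive: the first formula is taken as the discounted weight of a single outlier among 19 points, and the second as the weight of each of the other 18 points. The data values are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean as a convex combination of the values."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical data: 18 typical values and one outlier.
values = [10.0] * 18 + [200.0]
n = len(values)        # 19
discount = 0.2         # the 0.2 factor from the slide

# Assumed interpretation of the slide's formulas.
outlier_weight = (1 / n) - (1 / n) * discount
other_weight = (1 / n) + ((1 / n) * discount) / (n - 1)
weights = [other_weight] * (n - 1) + [outlier_weight]

plain = sum(values) / n
weighted = weighted_mean(values, weights)
print(f"arithmetic mean={plain:.2f} weighted mean={weighted:.2f}")
```

The weight removed from the outlier is spread evenly over the other points, so the weights still sum to 1 and the result remains a convex combination.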
“There are liars, outliers, and out-and-out liars.”
• What are ‘outliers’?
– Extreme values not typical of the group
– “Rare events” that do not fit within the range of other data values.
– Non-normative data, anomalous, exceptional, etc.
• How are they detected?
– Visually using statistical graphics
– Statistical Filtering
– Interquartile fencing – less than lower quartile; greater than upper quartile
– More advanced methods: Grubbs’ Test, etc.
There is no such thing as a simple test!
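As one possible illustration of interquartile fencing, the sketch below flags values outside fences placed 1.5 interquartile ranges beyond the lower and upper quartiles. The 1.5 multiplier is the common Tukey convention and is an assumption here; the slide itself only mentions the quartiles. The data is hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside the interquartile fences."""
    q1, _, q3 = quantiles(values, n=4)   # lower quartile, median, upper quartile
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical response times in seconds.
response_times = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 30.0]
outliers = iqr_outliers(response_times)
print("flagged as outliers:", outliers)
```

Setting `k=0` reproduces fencing exactly at the quartiles, which would flag roughly half of any data set, so some multiplier is normally needed in practice.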
The Geometric Mean
• Instead of adding the set of numbers and then dividing the sum by the count of numbers in the set, n, the numbers are multiplied and then the nth root of the resulting product is taken.
• For instance, the geometric mean of two numbers, say 2 and 8, is just the square root (i.e., the second root) of their product, 16, which is 4. As another example, the geometric mean of 1, ½, and ¼ is the cube root (i.e., the third root) of their product (0.125), which is ½.

$$GM = \sqrt[N]{x_1 x_2 \cdots x_N}$$

$$G = \left(\prod_{i=1}^{N} x_i\right)^{1/N} = \exp\left(\frac{1}{N}\sum_{i=1}^{N} \ln x_i\right) = \exp(\mathrm{mean}(\ln(X)))$$
In SQL-eese:
SELECT
EXP(AVG(LN(Response_Time)))
as GEOMEAN
FROM
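The same exp-of-mean-of-logs calculation can be sketched in Python and checked against the two worked examples on this slide. The helper name `geomean` and the response-time data are illustrative; `statistics.geometric_mean` is the standard-library equivalent (Python 3.8 or later).

```python
import math
from statistics import geometric_mean, mean

def geomean(xs):
    """Geometric mean as exp(mean(ln(x))); all values must be positive."""
    return math.exp(mean([math.log(x) for x in xs]))

print(geomean([2, 8]))          # the slide's first example
print(geomean([1, 0.5, 0.25]))  # the slide's second example

# Hypothetical response times with one large outlier.
times = [1.0] * 9 + [100.0]
print(f"arithmetic={mean(times):.2f} geometric={geomean(times):.2f}")
```

Because the logarithm compresses large values, the single 100-second observation moves the geometric mean far less than it moves the arithmetic mean. The log transform also explains why zero or negative values must be excluded first.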
The ‘geometry’ part of the Geometric Mean:
Consider a ‘line’ where the beginning is at point ‘A’ and the end is at point ‘C’. Where is the ‘middle’ (point ‘B’)?

$$\frac{A}{B} = \frac{B}{C}$$

$$B \cdot B = A \cdot C$$

$$B^2 = A \cdot C$$

$$B = \sqrt{A \cdot C}$$
Measures of variability

• Variance – the amount of ‘spread’ in the data around the mean.

$$S^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

• Standard Deviation – square root of the variance. In a normal distribution approximately 2/3 of the data are within one standard deviation of the mean on either side.

In performance, large response time standard deviations are usually bad; you want response time to be low and repeatable. Wide variations upset people more than long but consistent times.
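A brief Python sketch of the sample variance formula above, using two hypothetical sets of response times that share the same mean but differ sharply in spread. The function name and data are illustrative only.

```python
import statistics
from math import sqrt

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

# Hypothetical response times in seconds; both sets average 2.0.
steady = [2.0, 2.1, 1.9, 2.0, 2.0]
erratic = [0.5, 0.6, 0.5, 7.9, 0.5]

for name, data in (("steady", steady), ("erratic", erratic)):
    var = sample_variance(data)
    print(f"{name}: mean={sum(data) / len(data):.2f} variance={var:.3f} sd={sqrt(var):.3f}")
```

The mean alone cannot distinguish the two workloads; the standard deviation does.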
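The Geometric Standard Deviation
• The antilog of the standard deviation of the natural log transformed values of x, or

$$\mathrm{Gsd} = \exp(\mathrm{stdev}(\ln(x)))$$

$$\mathrm{Gsd} = \exp\left(\sqrt{\frac{1}{N}\sum_{i=1}^{N} (\ln x_i)^2 - \left(\frac{1}{N}\sum_{i=1}^{N} \ln x_i\right)^2}\right)$$

$$\mathrm{Gsd} = \exp\left(\sqrt{\mathrm{mean}(\ln(x)^2) - (\mathrm{mean}(\ln(x)))^2}\right)$$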
In SQL-eese:
SELECT
EXP(STDDEV(LN(Response_Time)))
as GEOSTDEV
FROM
the_data
WHERE
Response_Time>0
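A Python counterpart to the query above, offered as a sketch. It filters out non-positive values the way the `WHERE Response_Time>0` clause does, then takes the antilog of the standard deviation of the logs. It uses the sample standard deviation (divisor n − 1); whether `STDDEV` in a given database means the sample or the population form depends on the dialect, so that correspondence is an assumption.

```python
import math
import statistics

def geometric_sd(xs):
    """exp(stdev(ln(x))) over the positive values in xs."""
    logs = [math.log(x) for x in xs if x > 0]   # mirrors WHERE Response_Time > 0
    return math.exp(statistics.stdev(logs))

# Hypothetical response times in seconds; the zero is dropped before the log.
response_times = [0.0, 0.5, 1.0, 2.0, 4.0]
print(f"geometric sd = {geometric_sd(response_times):.3f}")
```

The geometric standard deviation is a multiplicative factor, not an additive amount: it is applied by multiplying and dividing the geometric mean rather than by adding and subtracting.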
Correlation and Regression
• Correlation – How things vary together (or not); the strength and direction of a linear relationship between two random variables or the departure of two variables from independence.
• There are several; Pearson is the most common in performance analysis (but mis-named).
• Probably the most misused statistical tool.
• Obtained by dividing the covariance of two variables by the product of their standard deviations.
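The last bullet translates directly into code. The sketch below computes the Pearson coefficient as the covariance divided by the product of the standard deviations; the CPU and response-time figures are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance / (sd of x * sd of y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical measurements: CPU busy (%) and response time (seconds).
cpu_busy = [20, 30, 40, 50, 60]
response_time = [1.0, 1.2, 1.5, 2.1, 3.0]
r = pearson_r(cpu_busy, response_time)
print(f"r = {r:.3f}")
```

A high coefficient here says only that the two series move together roughly linearly; it says nothing about cause, and the relationship in this made-up data is visibly curved even though r is close to 1.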
• Linear Regression and its cousins (non-linear, multiple, logistic, etc.) are all methods for fitting curves or lines to data in a statistically optimal manner. “The best way of drawing a line since the invention of the straight edge” – Pat Artis.
• Often used by managers to observe ‘trends’ and predict the future (or explain the past). Often misused for the same purpose.
• In statistics, linear regression is a form of regression analysis in which the relationship between one or more independent variables and another variable, called the dependent variable, is modeled by a least squares function, called the linear regression equation. This function is a linear combination of one or more model parameters, called regression coefficients. A linear regression equation with one independent variable represents a straight line. The results are subject to statistical analysis.
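For the one-variable case, the least squares line and its R² can be computed in a few lines of Python. This is a minimal sketch with hypothetical workload and response-time data, not a replacement for a statistics package.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data: transactions per second vs. CPU busy (%).
workload = [20, 25, 30, 35, 40, 45, 50]
cpu_busy = [22, 27, 31, 38, 41, 47, 52]
slope, intercept, r2 = fit_line(workload, cpu_busy)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r2:.4f}")
```

The slope and intercept are the regression coefficients; R² reports how much of the variation in y the straight line accounts for.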
Linear regression in Excel: Using Graphical techniques
[Figure: Excel scatter chart titled “Linear Regression”, Y (0 to 600) versus X (20 to 50), showing the data points and the fitted linear trendline y = 10.142x + 20.458 with R² = 0.9638]
Examples of Capacity/Performance Reporting in use now
Traditional time series line charts…
Advanced Statistical Graphics
3-D Performance Surface
Multi-temporal density plot
Expected high/low/actual
SAP – CCMS Metrics via SAS/Graph
Application Response Time Modeling
[Figure: application response time (vertical axis) versus increasing application workload (horizontal axis), with regions labeled “Small Changes, Large Impact”, “Large Changes, Small Impact”, and “System Unresponsive”]
How does Modeling differ from Trending in prediction?
[Figure: Application Modeling vs. Linear Regression via Trending. Application response time versus application workload, with a Jan to Dec time axis, marking the system load measurement, the SLA threshold, the date predicted via modeling, and the date predicted via trending]
Modern Dynamic Systems are Challenging to Understand
• Response to Capacity/Performance Crisis:
• 1. System/Application tuning, re-engineering, and optimization:
– Benefit: Considerable gains are possible, sometimes improvements in the hundreds of percent. Achieved via system administrative action (usually parametric changes for the OS) and by algorithmic and parametric re-specification (for the application). No capital expense. Efficient use of resources.
– Detriments: The effects may not endure for dynamic systems, as version/release changes and application functionality changes can, and do, quickly degrade the effects of performance tuning. Often system reinitialization (reboot, IPL) is required and creates an availability/service delivery issue. Application re-engineering for performance may be, and often is, cost prohibitive and/or unsupported by executive management.
• 2. Capacity increase via upgrade/replacement or technology refresh:
– Benefit: Reduces risk of unsupported/unrecoverable infrastructure conditions. The effect is usually long term. Accommodates increased application functionality for business utility.
– Detriments: Capital expense may be incurred. Inefficiencies remain. Risk management to avoid undersizing or oversizing requires expensive predictive modeling tools. Predictive analytics requires advanced skills in technical staff. New technologies (e.g. virtualization) carry risks and may increase complexity. Costs may be unsupported by executive management.
Modeling? Why?
Reactive Problem Solving vs Modeling
• damage grows rapidly with time;
• the longer the error goes undiscovered, the more useless and damaging work based on the error will be done;
• when the error is discovered, it and all the associated damage have to be removed;
• the system will then need therapy to recover;
• the death rate increases dramatically with late discovery;
• alternatively, the survival rate increases dramatically with early discovery.
"Crude measures of the right things are better than precise measures of the wrong things." - from Jim Clemmer's article, "Strategic Measurements Guide Change and Improvement"
Summary of Performance Analysis Techniques
• Reactive problem solving
– Suitable: Almost never
– Unsuitable: Almost always
• Measurement
– Suitable: For a current status report
– Unsuitable: For modern, dynamic systems with complex, rapidly changing technology
• Consensus of expert guessing
– Suitable: Quick and dirty decisions where risk is low
– Unsuitable: High risk, complexity, high variance between experts
• Analytic Modeling
– Suitable: For models such as capacity plans where the models will be reusable
– Unsuitable: For projects with a) new and untested architectures, b) new technology, or c) complex, heterogeneous, highly distributed behavior
• Simulation Modeling
– Suitable: For predicting the performance of complex new systems and technology in general and modern distributed information systems and computer systems in particular
– Unsuitable: For rough, quick estimates for large numbers of configurations of simple, mature systems where analytic modeling is suitable
• Benchmarking
– Suitable: To determine the performance of a particular workload on a particular configuration
– Unsuitable: When many workloads, designs, and configurations must be analyzed
Predictive Analytics: Benefits
Predictive analytics provide a practical way to detect problems and allow early correction as well as avoid resource saturation conditions.
Simulation provides a practical way to detect such problems and allow early correction. Avoiding the use of simulation substantially increases the risk of failure.
Analytical modeling provides fast and accurate answers based on existing performance data. It allows for a variety of what-if scenarios to be easily crafted to determine the best course of action when systems are experiencing change.
Statistical Forecasting and Analysis provides descriptive and predictive aspects of IT performance data topology through the use of measures of central tendency, variability, correlation, linear regression, and statistical pattern recognition.
SAP-specific Capacity Planning Methodology for CCE
• We want to acquire capacity to provide required service levels for sustained busy periods. Typical examples:
– Month end closing
– Busy daily window (e.g., 09:00 to 11:00)
– Mondays
– Complete batch window on time to deliver operational reports or schedule deliveries/shipment/print picking papers/etc.
• The best approach is to choose the percentile you want to satisfy
– The 90th percentile of hourly MIPS across the month is reflective of busy daily periods
– Likewise the 95th percentile reflects the sustained busy where there is a pronounced financial systems month end closing effect
– In legacy OLTP we often see peak to average ratios between 1.5:1 and 2:1 based on the definition of peak (e.g., 90th vs 95th)
– This really is a view of sustained busy
– No one can afford to buy for absolute peaks (99th or 100th percentile)
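How the choice of percentile changes the "peak" and therefore the peak-to-average ratio can be sketched in Python. The hourly MIPS values are invented for illustration, and the nearest-rank `percentile` helper is an assumed convention; monitoring tools may interpolate differently.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)   # ceil(p * n / 100)
    return ordered[rank - 1]

# Hypothetical hourly MIPS samples: mostly quiet, some busy hours, one spike.
hourly_mips = [100] * 10 + [150] * 8 + [200, 400]

average = sum(hourly_mips) / len(hourly_mips)
ratios = {}
for p in (90, 95, 100):
    peak = percentile(hourly_mips, p)
    ratios[p] = peak / average
    print(f"{p}th percentile = {peak} MIPS, peak-to-average = {ratios[p]:.2f}:1")
```

Sizing to the 100th percentile in this made-up data means buying for a single spike hour, which is the cost the last bullet warns against.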
Capacity Planning for the Newly Virtual
Three Essential Elements
• measurement to ascertain critical data like IT resource availability, utilization and usage patterns
• second-level analysis to focus on the long-term needs of the enterprise rather than the immediate concern to bump up resources
• business realignment to ensure that IT is keeping pace with business needs, not the other way around
Capacity Planning for the Newly Virtual
Over half (54%) of the virtual-server adopters have experienced a net growth in capacity, while only 7% reported a net decrease (ESG Research)
Focus on understanding our “virtualization” factors
o Effect of non-concurrent peaks of multiple workloads
o Follow the sun in a global operation
o Better understanding of these effects can be gained by looking at the 90th/95th percentiles
o Landscape dimensions:
• a workload level,
• a platform (processor complex) level,
• a Sysplex / Cluster level
• Server/LPAR level, etc.
The ‘virtualization’ analysis will tell us how much we can over-commit resources
• The 95th percentile of the sums vs the sum of the 95th percentiles
• It is often the case that we have the ability to load to 115% with the sum of the 95th percentiles
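The comparison between the 95th percentile of the sums and the sum of the 95th percentiles can be illustrated with two hypothetical workloads whose peaks do not coincide. The data is deliberately extreme to make the effect visible, and the nearest-rank `percentile` helper is an assumed convention.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)   # ceil(p * n / 100)
    return ordered[rank - 1]

# Hypothetical utilization per interval for two workloads sharing one platform.
# Workload A peaks in the first intervals, workload B in the last ones.
workload_a = [90, 90] + [10] * 18
workload_b = [10] * 18 + [90, 90]

# Sizing each workload separately and adding the results.
sum_of_p95 = percentile(workload_a, 95) + percentile(workload_b, 95)

# Sizing the combined load actually seen by the shared platform.
combined = [a + b for a, b in zip(workload_a, workload_b)]
p95_of_sum = percentile(combined, 95)

overcommit = sum_of_p95 / p95_of_sum
print(f"sum of 95th percentiles = {sum_of_p95}, 95th percentile of sums = {p95_of_sum}")
print(f"over-commit factor = {overcommit:.2f}")
```

When peaks are concurrent the two numbers converge and the factor approaches 1, which is why the analysis has to be repeated at each landscape level rather than assumed.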
Organizational Support
Institutionalize the process
The resource reporting and modeling is actually the easy part of this.
The more difficult and more important part of institutionalizing the process is connecting the application blueprinting/design process to the capacity planning process:
– This creates the understanding of the business drivers, which is key to scaling factors and calibration
– This is also a potential trigger for alerting the organization to the need for a risk mitigation plan. For example, step-function workload increases from new workloads should lead to a performance testing activity
Organizational Support for Capacity Planning
Market the lesser-known benefits of capacity planning
Strengthened relationships with developers and end users. Communication, negotiation, and a sense of joint ownership can all combine to nurture a healthy, professional relationship between IT and its customers
Improved communications with suppliers. Involving key suppliers and support staffs with your capacity plans can promote effective communications among these groups
Increased collaboration with other infrastructure groups. Network services, technical support, database administration, operations, desktop support, and even facilities may all play a role in capacity planning. In order for the plan to be thorough and effective, all these various groups must support and collaborate with each other.
Promotion of a culture of strategic planning as opposed to tactical firefighting. One of the most significant benefits of developing an overall and ongoing capacity-planning program is the institutionalizing of a strategic-planning culture
C. Tim Browning
Coca-Cola Enterprises
Technical Architecture
Enterprise Performance & Capacity Planning
Tel: (770) 370-8566 (OFFICE)
(404) 210-7051 (CELL)
MAIL: [email protected]
Go Green – Stop Global Whining