Self-Adaptive SLA-Driven Capacity Management for Internet Services

Bruno Abrahao, Virgilio Almeida, Jussara Almeida, Federal University of Minas Gerais, Brazil

Alex Zhang, Dirk Beyer, Fereydoon Safai, Hewlett-Packard Labs, Palo Alto, CA

IEEE NOMS 2006, 6 April 2006

Motivation
• IT outsourcing for Internet Services
− Contracts with a provider
− Multi-service shared Internet Data Centers (IDCs)

• Providers’ challenging task
− Cost effectiveness while satisfying the customers’ SLA requirements

• Complexity
− Keeping track of different application requirements, system characteristics, and simultaneous workload variations, while (more importantly!) taking the provider’s business goal into account

Challenges

Probabilistic performance requirements

Per-use service accounting

Multiple metric requirements

High workload fluctuations

Unexpected workload peaks

Application Heterogeneity

• New customer demands

• Application characteristics

• Manual management becomes impractical

• Even more complex business and system models

Goal
• To present a self-adaptive capacity management scheme for IDCs that aims to maximize the provider’s service revenue

− Takes into account the new challenges of modern IT business and infrastructure

− Allows providers to offer customers flexible service plans

− Minimizes management costs for service providers

IDC Environment

• VMs provide admission control mechanisms

• Virtualization
− Transparent and flexible capacity expansion/contraction

Self-Adaptive Framework

• Control Interval

Capacity Manager Scheme
• Provides IDC configurations that maximize the business objective of the provider

Cost Model

• Allows per-use service accounting
− Customers pay for extra capacity (beyond what is normally required) only when needed

• Service accounting
− Based on the performance achieved by the virtual machines rather than simply on resource utilization

Cost Model
• Allows probabilistic response time requirements: $P(R_i \le R_{SLA_i})$

• Allows multiple-metric service levels

− Throughput, subject to a guarantee on the response time of the processed transactions: $\{X \mid P(R \le R_{SLA})\}$

Cost Model

Two-level SLA contracts
− Normal operation mode
− Surge operation mode

Penalty/Reward model

Provider’s business objective
− Maximize the net result of penalties and rewards

Extra processing limit

Normal processing requirement
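The slide lists the ingredients of the two-level contract but not its functional form. As a minimal sketch, assuming linear rates (a penalty per unit of throughput shortfall below the normal processing requirement, and a reward per unit of extra throughput served in surge mode, capped at the extra processing limit), the net result for one control interval could be computed as below. `TwoLevelSLA`, `net_result` and every rate value are hypothetical illustrations, not the authors' model.

```python
from dataclasses import dataclass


@dataclass
class TwoLevelSLA:
    """Hypothetical two-level SLA contract with linear penalty/reward rates."""
    x_normal: float       # normal processing requirement (transactions/s)
    x_extra_limit: float  # extra processing limit above x_normal (transactions/s)
    penalty_rate: float   # penalty per transaction/s of shortfall (assumed linear)
    reward_rate: float    # reward per transaction/s of extra work (assumed linear)


def net_result(sla: TwoLevelSLA, offered: float, served: float) -> float:
    """Rewards minus penalties for one control interval.

    offered: workload demanded by the customer (transactions/s)
    served:  throughput delivered within the SLA response time (transactions/s)
    """
    # Normal mode: the provider owes min(offered, x_normal); a shortfall is penalized.
    owed = min(offered, sla.x_normal)
    shortfall = max(0.0, owed - served)
    # Surge mode: work above x_normal is rewarded, but only up to the extra limit,
    # so the customer pays for extra capacity only when it is actually needed.
    extra_demand = min(max(0.0, offered - sla.x_normal), sla.x_extra_limit)
    extra_served = min(max(0.0, served - sla.x_normal), extra_demand)
    return sla.reward_rate * extra_served - sla.penalty_rate * shortfall


sla = TwoLevelSLA(x_normal=100.0, x_extra_limit=50.0, penalty_rate=2.0, reward_rate=1.0)
print(net_result(sla, offered=130.0, served=120.0))  # surge: 20 extra tx/s rewarded
print(net_result(sla, offered=80.0, served=70.0))    # normal: 10 tx/s short, penalized
```

Linear rates keep the example readable; any monotone penalty/reward curve could be substituted without changing the structure of the computation.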

Performance Model

• application system characteristics

• performance requirements

• current workload intensity

Performance Model

Capacity allocation decision

• Throughput

• Utilization

• Response time probability distribution

• Based on queuing theory

Performance Model

• Utilization and throughput can be estimated using well-known queuing formulas

• Approximations are often needed to estimate the response time probability distribution

− Markov: $P(R_i > R_{SLA_i}) \le \dfrac{E[R_i]}{R_{SLA_i}}$

− Chebyshev: $P(R_i > R_{SLA_i}) \le \dfrac{\mathrm{var}[R_i]}{(R_{SLA_i} - E[R_i])^2}$

− Percentile (M/M/1): $P(R_i > R_{SLA_i}) = e^{-R_{SLA_i} f_i (1 - \rho_i) / E[S_i]}$
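To compare the three approximations numerically, the sketch below evaluates each estimate of the tail probability $P(R_i > R_{SLA_i})$ for one VM modeled as an M/M/1 queue. It assumes that $f_i$ is the share of server capacity allocated to application $i$ (so its service rate is $f_i / E[S_i]$) and that $\rho_i$ is its utilization; this reading of the symbols, and the function names, are assumptions made for illustration. Because the M/M/1 response time is exponential, $\mathrm{var}[R_i] = E[R_i]^2$.

```python
import math


def mm1_response_moments(arrival_rate, mean_service, share):
    """Mean and variance of the M/M/1 response time, plus the utilization.

    Assumption: a VM holding capacity share `share` (0 < share <= 1) serves
    requests at rate share / mean_service.
    """
    mu = share / mean_service            # effective service rate
    rho = arrival_rate / mu              # utilization rho_i
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization >= 1")
    mean_r = 1.0 / (mu * (1.0 - rho))    # E[R_i]
    return mean_r, mean_r ** 2, rho      # exponential: var[R_i] = E[R_i]^2


def tail_estimates(arrival_rate, mean_service, share, r_sla):
    """Three estimates of P(R_i > r_sla): Markov bound, Chebyshev bound, exact M/M/1."""
    mean_r, var_r, rho = mm1_response_moments(arrival_rate, mean_service, share)
    markov = min(1.0, mean_r / r_sla)
    # Chebyshev only yields a useful bound when the target exceeds the mean.
    chebyshev = min(1.0, var_r / (r_sla - mean_r) ** 2) if r_sla > mean_r else 1.0
    percentile = math.exp(-r_sla * share * (1.0 - rho) / mean_service)
    return {"markov": markov, "chebyshev": chebyshev, "percentile": percentile}


# Illustrative values: 500 req/s, E[S_i] = 1 ms, full capacity, 10 ms target.
print(tail_estimates(500.0, 1e-3, 1.0, 0.01))
```

With these assumed numbers the two bounds are much looser than the exact M/M/1 tail, which is why a capacity manager relying on them tends to allocate more capacity than strictly necessary.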

Optimization model

Cost Model

Perf. Model

Capacity allocation

Provider’s business objective
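One building block of the optimization is a feasibility check: how much capacity an application needs before its response time requirement holds. The sketch below finds, by bisection, the smallest capacity share meeting $P(R > R_{SLA}) \le \alpha$ under each approximation. It assumes an M/M/1 VM whose service rate is share / $E[S_i]$; `tail_estimate`, `min_share`, and the workload numbers are hypothetical. The full capacity manager would combine such constraints with the cost model's penalties and rewards, which this sketch does not attempt.

```python
import math


def tail_estimate(method, arrival_rate, mean_service, share, r_sla):
    """Estimate of P(R > r_sla) for an M/M/1 VM holding capacity share `share`.

    Assumption: the VM serves requests at rate share / mean_service.
    """
    mu = share / mean_service
    rho = arrival_rate / mu
    if rho >= 1.0:
        return 1.0                           # unstable: requirement cannot hold
    mean_r = 1.0 / (mu * (1.0 - rho))        # exponential, so var[R] = mean_r ** 2
    if method == "markov":
        return min(1.0, mean_r / r_sla)
    if method == "chebyshev":
        return min(1.0, (mean_r / (r_sla - mean_r)) ** 2) if r_sla > mean_r else 1.0
    if method == "percentile":
        return math.exp(-r_sla / mean_r)     # exact M/M/1 tail
    raise ValueError(f"unknown method: {method}")


def min_share(method, arrival_rate, mean_service, r_sla, alpha, tol=1e-6):
    """Smallest share in (0, 1] meeting P(R > r_sla) <= alpha; None if even 1 fails.

    Bisection is valid because each estimate never increases as the share grows.
    """
    if tail_estimate(method, arrival_rate, mean_service, 1.0, r_sla) > alpha:
        return None
    lo, hi = 0.0, 1.0                        # invariant: `hi` is feasible
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tail_estimate(method, arrival_rate, mean_service, mid, r_sla) <= alpha:
            hi = mid
        else:
            lo = mid
    return hi


# Assumed workload: 500 req/s, E[S_i] = 1 ms, requirement P(R > 0.1) <= 0.10.
for method in ("markov", "chebyshev", "percentile"):
    print(method, round(min_share(method, 500.0, 1e-3, 0.1, 0.10), 4))
```

Under these assumed numbers Markov asks for a share of about 0.60, Chebyshev about 0.54 and the exact M/M/1 percentile about 0.52, a small illustration of Markov overestimating capacity needs.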

Experimental Analysis
• Self-adaptive versus static configuration
− Examine the resulting provider’s payoff
− Examine whether performance requirements are met and queue stability is maintained

• Compare the degree of accuracy provided by each of the performance approximations

• How
− Simulate and analyze the behavior of two competing applications that receive different workload levels over time
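The evaluation relies on simulation. As a minimal sketch of that idea (not the authors' simulator, and for a single queue rather than two competing applications), the code below simulates an FCFS M/M/1 queue with Lindley's recursion and compares the simulated fraction of responses exceeding a target with the closed-form M/M/1 tail. The rates, target and seed are assumed, illustrative values.

```python
import math
import random


def simulate_mm1(arrival_rate, service_rate, n_customers, seed=1):
    """Response times of an FCFS M/M/1 queue, via Lindley's recursion."""
    rng = random.Random(seed)
    wait = 0.0            # queueing delay of the current customer
    prev_service = 0.0    # service time of the previous customer
    responses = []
    for i in range(n_customers):
        if i > 0:
            gap = rng.expovariate(arrival_rate)  # interarrival time A_n
            # Lindley: W_n = max(0, W_{n-1} + S_{n-1} - A_n)
            wait = max(0.0, wait + prev_service - gap)
        service = rng.expovariate(service_rate)
        responses.append(wait + service)         # response = waiting + service
        prev_service = service
    return responses


# Illustrative parameters (assumed): utilization 0.5 and a 5 ms response-time target.
arrival_rate, service_rate, r_sla = 500.0, 1000.0, 0.005
responses = simulate_mm1(arrival_rate, service_rate, 200_000)
empirical_tail = sum(r > r_sla for r in responses) / len(responses)
analytic_tail = math.exp(-(service_rate - arrival_rate) * r_sla)  # exponential sojourn
mean_response = sum(responses) / len(responses)
print(f"P(R > {r_sla}): simulated {empirical_tail:.4f}, closed form {analytic_tail:.4f}")
```

Lindley's recursion avoids an event queue entirely, which is enough for a single FCFS server; modeling two applications sharing a host would need a discrete-event loop or one such recursion per VM with its own capacity share.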

Experimental Analysis

• Net result of the provider (M/M/1)

• Queue size M/M/1

• Theoretical value: $Q_i = \dfrac{\rho_i^2}{1 - \rho_i} = \dfrac{(0.95)^2}{1 - 0.95} = 18.05$
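The theoretical value can be reproduced directly from the M/M/1 formula $Q_i = \rho_i^2 / (1 - \rho_i)$, the mean number of requests waiting (excluding the one in service). The small sketch below evaluates it for a few utilizations to show how sharply the queue grows as $\rho_i$ approaches 1, which is why queue stability is worth checking; the function name is a hypothetical label.

```python
def mm1_queue_length(rho: float) -> float:
    """Mean number of requests waiting in an M/M/1 queue (not counting the one in service)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must lie in [0, 1) for a stable queue")
    return rho ** 2 / (1.0 - rho)


for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"rho = {rho:.2f} -> Q = {mm1_queue_length(rho):.2f}")
# rho = 0.95 gives Q = 18.05, the theoretical value on the slide
```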

• Response time M/M/1

• Requirement: $P(R > 0.1) \le 0.10$

Experimental Analysis
• Penalty/Rewards M/M/1

Conclusions
• The self-adaptive capacity management model, with any of the approximations, is able to:
− Increase the business potential of the provider (higher payoffs)
− Maintain application stability (stable service queues, response time requirements satisfied)

• Markov’s approximation overestimates capacity needs

• Chebyshev and Percentile result in an equivalent degree of precision in the M/M/1 model

• Accounts for the new challenges of the problem

Time for questions

Backup slides

Experimental setup

• Two similar applications

• Service demand: $E[S_i] = 10^{-3}$ sec

Environment

• utilization = busy time / total time

• Virtualization
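The utilization definition above can be applied directly to a trace of busy periods. The sketch below assumes such a trace is available as non-overlapping (start, end) pairs in seconds, which is a hypothetical input format, and also shows the standard utilization law $U = X \cdot E[S]$ as a cross-check from throughput, using the $E[S_i] = 10^{-3}$ s service demand of the experimental setup and an assumed throughput.

```python
def utilization(busy_intervals, window_start, window_end):
    """Utilization = busy time / total time over [window_start, window_end).

    busy_intervals: iterable of (start, end) pairs, assumed not to overlap;
    each interval is clipped to the observation window.
    """
    total = window_end - window_start
    if total <= 0:
        raise ValueError("empty observation window")
    busy = 0.0
    for start, end in busy_intervals:
        lo, hi = max(start, window_start), min(end, window_end)
        if hi > lo:
            busy += hi - lo
    return busy / total


def utilization_law(throughput, mean_service_demand):
    """Utilization law: U = X * E[S] (throughput times mean service demand)."""
    return throughput * mean_service_demand


# Hypothetical busy periods (seconds) of one VM during a 10-second window.
busy = [(0.0, 2.0), (5.0, 6.0), (9.0, 12.0)]  # the last period is clipped at t = 10
print(utilization(busy, 0.0, 10.0))            # 0.4
print(utilization_law(400.0, 1e-3))            # 400 req/s at 1 ms each
```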

Cost Model

Figure label: $Y$

Cost Model

Figure labels: $Y$, $X_{SLA_N}$

Cost Model

Figure labels: $Z$, $X_{SLA_N}$

Cost Model

Figure labels: $Z$, $X_{SLA_S}$, $X_{SLA_N}$

Net result M/M/1 and M/G/1 PS

Experimental Analysis
• Queue size M/G/1 (PS)

• Theoretical value: $Q_i = \dfrac{\rho_i^2}{1 - \rho_i} = \dfrac{(0.95)^2}{1 - 0.95} = 18.05$

• Response time M/G/1 (PS)

• Requirement: $P(R > 0.1) \le 0.10$

• Penalty/Reward M/G/1 (PS)
