SLA-aware Virtual Resource Management for Cloud Infrastructures
On the Management and Efficiency of Cloud-Based Services
Eitan Rosenfeld December 8th, 2010
Problem Space
Automate the management of virtual servers
Why this is challenging:
Must take into account the high-level SLA requirements of hosted applications
Must take into account resource management costs
When applications change state, their resource demands are likely to change
High-Level Solution
Generate a global utility function based on:
Degree of SLA fulfillment
Operating costs
Constraint Programming approach
Autonomic resource manager built on the utility function
Decouple resource provisioning and VM placement
Why use (and automate) the Cloud?
Static allocation of resources typically results in only 15-20% utilization
VMs allow decoupling of applications from physical servers
Automating the management process (scaling up and scaling down) can reduce cost and boost or maintain performance
Decoupling in Two Stages
Provisioning stage
Allocate resource capacity (virtual machines) to a given application
Driven by performance goals associated with the business-level SLAs of the hosted applications
Placement stage
Map virtual machines to physical machines
Driven by data center policies regarding resource management costs. A typical example is lowering energy consumption by minimizing the number of active physical servers.
[Figure: Application Environments A, B, and C, whose VMs (VM1 to VM7) are hosted on Physical Machines 1 to 4]
[Figure: Allocating new resources with state changes (State = 1, State = 2, State = 3)]
Automation Criteria
What are the requirements for successful automation?
Dynamic provisioning and placement
Support for both:
online applications with stringent QoS requirements
batch-oriented, CPU-intensive applications
Support for a variety of application topologies
Provisioning maps to application-specific functions
Placement maps to a global decision layer
The utility function is their means of communication. It returns a scalar value from 0 (unsatisfied) to 1 (satisfied) based on the application state (workload, resource capacity, and SLA).
Both provisioning and placement are modeled as Constraint Satisfaction Problems
Some Definitions
Satisfaction: whether an application is achieving its performance goals
Constraint Programming: solving a problem by stating constraint relations between variables; any solution must satisfy all of the constraints
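To make the constraint programming definition concrete, here is a minimal sketch of a backtracking constraint solver in Python. It is not the solver used in the paper. The variable names `n_A` and `n_B` and their bounds are hypothetical, chosen only to show how constraints prune partial assignments during the search.

```python
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, int]
# A constraint sees a partial assignment and must return True
# whenever the variables it mentions are not all assigned yet.
Constraint = Callable[[Assignment], bool]


def solve(domains: Dict[str, List[int]],
          constraints: List[Constraint]) -> Optional[Assignment]:
    """Return the first assignment satisfying every constraint, or None."""
    names = list(domains)

    def backtrack(i: int, assignment: Assignment) -> Optional[Assignment]:
        if i == len(names):
            return dict(assignment)  # every variable assigned consistently
        var = names[i]
        for value in domains[var]:
            assignment[var] = value
            # Prune as soon as any constraint is violated by the partial assignment.
            if all(c(assignment) for c in constraints):
                result = backtrack(i + 1, assignment)
                if result is not None:
                    return result
            del assignment[var]
        return None

    return backtrack(0, {})


# Hypothetical example: VM counts for two applications sharing 4 VM slots.
domains = {"n_A": list(range(5)), "n_B": list(range(5))}
constraints: List[Constraint] = [
    lambda a: "n_A" not in a or a["n_A"] >= 2,
    lambda a: "n_B" not in a or a["n_B"] >= 1,
    lambda a: "n_A" not in a or "n_B" not in a or a["n_A"] + a["n_B"] <= 4,
]
print(solve(domains, constraints))  # {'n_A': 2, 'n_B': 1}
```

Real constraint solvers add propagation and smarter variable ordering on top of this basic search. The brute-force version only illustrates the "state constraints, then search" model.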
Assumptions
Physical machines can run multiple VMs
Application Environment (AE):
AEs can span multiple VMs
SLAs apply to AEs
A VM can only run one AE at a time
High-Level Architecture
Local Decision Module (LDM) for each AE
Computes satisfaction (the utility function) from current resources, workload, and service metrics
Evaluates whether to allocate more VMs to the AE or release existing VMs from it
Global Decision Module (GDM)
Arbitrates resource requirements based on the utility functions and the performance of VMs and PMs
Notifies LDMs of changes to VMs and manages the VM lifecycle (start, stop, migrate)
Local Decision Module
Each LDM is associated with two utility functions:
(1) Fixed service-level: maps the service level to a utility value
(2) Dynamic resource-level: maps a resource capacity to a utility value, which is communicated to the GDM
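The slides do not give the shape of either utility function. The following is a minimal sketch under stated assumptions. The service-level function is a hypothetical linear ramp on response time between an SLA target and a maximum tolerable time. The resource-level function chains it with a hypothetical M/M/1-style performance model that predicts response time from allocated CPU. Both models, and all parameter names, are illustrative choices rather than the paper's.

```python
def service_level_utility(response_time_ms: float, target_ms: float, max_ms: float) -> float:
    """Hypothetical service-level utility: 1.0 at or below the SLA target,
    0.0 at or beyond max_ms, linear in between."""
    if response_time_ms <= target_ms:
        return 1.0
    if response_time_ms >= max_ms:
        return 0.0
    return (max_ms - response_time_ms) / (max_ms - target_ms)


def predicted_response_time_ms(cpu_mhz: float, demand_rps: float, mhz_per_rps: float) -> float:
    """Hypothetical M/M/1-style model: the allocation serves cpu_mhz / mhz_per_rps
    requests per second, and mean response time is 1 / (mu - lambda) seconds."""
    service_rate = cpu_mhz / mhz_per_rps
    if service_rate <= demand_rps:
        return float("inf")  # saturated: the queue grows without bound
    return 1000.0 / (service_rate - demand_rps)


def resource_level_utility(cpu_mhz: float, demand_rps: float, mhz_per_rps: float,
                           target_ms: float, max_ms: float) -> float:
    """Dynamic resource-level utility: capacity -> predicted service level -> utility."""
    rt = predicted_response_time_ms(cpu_mhz, demand_rps, mhz_per_rps)
    return service_level_utility(rt, target_ms, max_ms)


print(service_level_utility(600, target_ms=200, max_ms=1000))  # 0.5
print(resource_level_utility(1000, 5, 100, target_ms=200, max_ms=1000))  # 1.0
```

In this design the service-level mapping stays fixed, while the resource-level value changes with workload (`demand_rps`). That mirrors the fixed/dynamic split described for the LDM.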
Variables
Let $A = (a_1, a_2, \ldots, a_i, \ldots, a_m)$ denote the set of AEs,
$P = (p_1, p_2, \ldots, p_j, \ldots, p_q)$ denote the set of PMs in the datacenter, and
$S = (s_1, s_2, \ldots, s_k, \ldots, s_c)$ denote the set of $c$ classes of VMs, where $s_k = (s_k^{cpu}, s_k^{ram})$ specifies the VM CPU capacity in MHz and the VM memory capacity in megabytes.
LDM (cont’d)
Utility function (2), $u_i$, for application $a_i$:
$u_i = f_i(N_i)$, where $N_i$ is the VM allocation vector of application $a_i$:
$N_i = (n_{i1}, n_{i2}, \ldots, n_{ik}, \ldots, n_{ic})$, where $n_{ik}$ is the number of VMs of class $s_k$ attributed to application $a_i$.
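The allocation vector $N_i$ only counts VMs per class. A resource-level utility $u_i = f_i(N_i)$ needs the capacity that vector represents, and one natural input is the total CPU and RAM it provides, $\sum_k n_{ik} \cdot s_k$. Here is a minimal sketch, assuming hypothetical VM class sizes.

```python
from typing import List, Tuple

# Hypothetical VM classes s_k = (CPU in MHz, RAM in MB); not taken from the paper.
VM_CLASSES: List[Tuple[int, int]] = [(500, 512), (1000, 1024), (2000, 2048)]


def allocated_capacity(n_i: List[int], vm_classes: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Total (CPU MHz, RAM MB) given to application a_i by allocation vector N_i,
    where n_i[k] is the number of VMs of class s_k."""
    if len(n_i) != len(vm_classes):
        raise ValueError("N_i needs one entry per VM class")
    cpu = sum(n_ik * s_k[0] for n_ik, s_k in zip(n_i, vm_classes))
    ram = sum(n_ik * s_k[1] for n_ik, s_k in zip(n_i, vm_classes))
    return cpu, ram


print(allocated_capacity([2, 0, 1], VM_CLASSES))  # (3000, 3072)
```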
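Application Constraints
Each application also provides an upper bound on the number of VMs it is willing to accept, both for each VM class, $N_i^{max} = (n_{i1}^{max}, n_{i2}^{max}, \ldots, n_{ik}^{max}, \ldots, n_{ic}^{max})$, and in total, $T_i^{max}$:
$$n_{ik} \le n_{ik}^{max}, \quad 1 \le i \le m,\ 1 \le k \le c \qquad (1)$$
$$\sum_{k=1}^{c} n_{ik} \le T_i^{max}, \quad 1 \le i \le m \qquad (2)$$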
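Constraints (1) and (2) are simple per-application bounds. Here is a minimal Python check of both, with hypothetical example values.

```python
from typing import Sequence


def satisfies_app_constraints(n_i: Sequence[int], n_i_max: Sequence[int], t_i_max: int) -> bool:
    """Check constraints (1) and (2) for one application a_i.

    (1) n_ik <= n_ik^max for every VM class k
    (2) sum over k of n_ik <= T_i^max
    """
    if len(n_i) != len(n_i_max):
        raise ValueError("N_i and N_i^max must have one entry per VM class")
    per_class_ok = all(n_ik <= cap for n_ik, cap in zip(n_i, n_i_max))
    total_ok = sum(n_i) <= t_i_max
    return per_class_ok and total_ok


print(satisfies_app_constraints([1, 2], [2, 2], 3))  # True
print(satisfies_app_constraints([1, 2], [2, 2], 2))  # False: total 3 > T_i^max
```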
Global Decision Module
Duties (each a Constraint Satisfaction Problem):
Determining the VM allocation vectors $N_i$ for each application (provisioning)
Placing VMs on PMs such that the number of active PMs is minimized (packing)
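Provisioning
The VMs allocated to all applications are constrained by the capacity of the physical servers (constraint (3)):
CPU capacity:
$$\sum_{i=1}^{m} \sum_{k=1}^{c} n_{ik} \cdot s_k^{cpu} \le \sum_{j=1}^{q} C_j^{cpu}$$
RAM capacity:
$$\sum_{i=1}^{m} \sum_{k=1}^{c} n_{ik} \cdot s_k^{ram} \le \sum_{j=1}^{q} C_j^{ram} \qquad (3)$$
where $C_j$ is the capacity of PM $p_j$.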
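Constraint (3) compares the total demand of all allocation vectors against the summed capacity of all PMs. Here is a minimal sketch of that check. The PM sizes follow the simulation environment (4 PMs of 4000 MHz / 4000 MB), while the VM classes are hypothetical.

```python
from typing import List, Sequence, Tuple

Resources = Tuple[int, int]  # (CPU in MHz, RAM in MB)


def satisfies_capacity(allocations: Sequence[Sequence[int]],
                       vm_classes: Sequence[Resources],
                       pm_capacities: Sequence[Resources]) -> bool:
    """Constraint (3): total CPU and RAM of all VMs allocated to all
    applications must not exceed the total capacity of all PMs."""
    need_cpu = sum(n_ik * vm_classes[k][0] for n_i in allocations for k, n_ik in enumerate(n_i))
    need_ram = sum(n_ik * vm_classes[k][1] for n_i in allocations for k, n_ik in enumerate(n_i))
    have_cpu = sum(c[0] for c in pm_capacities)
    have_ram = sum(c[1] for c in pm_capacities)
    return need_cpu <= have_cpu and need_ram <= have_ram


pms: List[Resources] = [(4000, 4000)] * 4
classes: List[Resources] = [(1000, 1024), (2000, 2048)]  # hypothetical VM classes
print(satisfies_capacity([[2, 1], [1, 0]], classes, pms))  # True: 5000 MHz, 5120 MB
print(satisfies_capacity([[8, 5]], classes, pms))          # False: 18000 MHz > 16000
```

Because the check uses aggregate capacity, it is necessary but not sufficient for a feasible placement. For example, three 2500 MHz VMs (7500 MHz) fit within the 8000 MHz of two 4000 MHz PMs in aggregate, but no single PM can host two of them. Per-PM feasibility is therefore left to the packing stage.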
Provisioning Output
The provisioning phase outputs a set of vectors $N_i$ for which constraints (1), (2), and (3) are satisfied.
Comparing the new $N_i$ to the existing $N_i$ tells the GDM which VMs will be created, destroyed, or resized.
The global utility $U_{global}$, a weighted sum of application utilities minus operating costs, is maximized:
$$U_{global} = \max \sum_{i=1}^{m} \left( \alpha_i \cdot u_i - \varepsilon \cdot cost(N_i) \right)$$
where $\alpha_i$ is the weight of the utility function $u_i$ for application $a_i$, and $\varepsilon$ is a coefficient that lets the administrator trade performance goals against the operating cost of $N_i$.
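Here is a minimal sketch of the provisioning objective. It uses exhaustive enumeration in place of a real constraint solver, which is only viable for tiny instances. It enumerates every combination of allocation vectors that satisfies constraints (1) to (3) and keeps the one maximizing $\sum_i (\alpha_i u_i - \varepsilon \cdot cost(N_i))$. The utility functions and the single VM class are hypothetical. The per-application cost, the share of total CPU used by $N_i$, is an assumption modeled on the simulation's CPU cost function.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Tuple[int, ...]
# (alpha_i, utility function f_i, N_i^max, T_i^max) for each application
App = Tuple[float, Callable[[Vector], float], Sequence[int], int]


def candidates(n_max: Sequence[int], t_max: int) -> List[Vector]:
    """All allocation vectors N_i that satisfy constraints (1) and (2)."""
    return [n for n in product(*(range(m + 1) for m in n_max)) if sum(n) <= t_max]


def usage(n: Vector, vm_classes: Sequence[Tuple[int, int]], dim: int) -> int:
    """Total CPU (dim=0) or RAM (dim=1) consumed by allocation vector n."""
    return sum(count * cls[dim] for count, cls in zip(n, vm_classes))


def provision(apps: Sequence[App], vm_classes: Sequence[Tuple[int, int]],
              total_cpu: int, total_ram: int,
              epsilon: float) -> Tuple[Optional[Tuple[Vector, ...]], float]:
    """Exhaustively maximize sum_i(alpha_i * u_i - epsilon * cost(N_i))."""
    best, best_u = None, float("-inf")
    for combo in product(*(candidates(n_max, t_max) for _, _, n_max, t_max in apps)):
        # Constraint (3): aggregate CPU and RAM capacity.
        if sum(usage(n, vm_classes, 0) for n in combo) > total_cpu:
            continue
        if sum(usage(n, vm_classes, 1) for n in combo) > total_ram:
            continue
        # Assumed cost(N_i): share of total CPU that N_i uses.
        u_global = sum(alpha * f(n) - epsilon * usage(n, vm_classes, 0) / total_cpu
                       for (alpha, f, _, _), n in zip(apps, combo))
        if u_global > best_u:
            best, best_u = combo, u_global
    return best, best_u


# Hypothetical setup: one VM class; A is fully satisfied by 2 VMs, B by 3.
classes = [(1000, 1000)]
apps: List[App] = [
    (0.5, lambda n: min(1.0, n[0] / 2), (4,), 4),
    (0.5, lambda n: min(1.0, n[0] / 3), (4,), 4),
]
allocation, u = provision(apps, classes, total_cpu=4000, total_ram=4000, epsilon=0.05)
print(allocation)  # ((2,), (2,))
```

The number of combinations grows exponentially with the number of applications and VM classes. That growth is why a dedicated constraint solver is needed beyond toy sizes.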
Packing (Placement)
$V = (vm_1, vm_2, \ldots, vm_l, \ldots, vm_v)$ lists all VMs running at the current time.
For each PM $p_j \in P$, the bit vector $H_j = (h_{j1}, h_{j2}, \ldots, h_{jl}, \ldots, h_{jv})$ denotes the set of VMs assigned to $p_j$: $h_{jl} = 1$ if $p_j$ is hosting $vm_l$, and $0$ otherwise.
$R = (r_1, r_2, \ldots, r_l, \ldots, r_v)$ is the resource capacity (CPU, RAM) of all VMs, where $r_l = (r_l^{cpu}, r_l^{ram})$.
Packing (Physical Resource) Constraints
The sum of the resource capacities of the VMs on PM $p_j$ must be less than or equal to the resource capacity of $p_j$:
$$\sum_{l=1}^{v} r_l^{cpu} \cdot h_{jl} \le C_j^{cpu}, \quad 1 \le j \le q$$
$$\sum_{l=1}^{v} r_l^{ram} \cdot h_{jl} \le C_j^{ram}, \quad 1 \le j \le q$$
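Here is a minimal sketch of the per-PM packing constraints, with the placement matrix `H` (one bit vector $H_j$ per PM), VM sizes `R`, and PM capacities `C`. The VM sizes are hypothetical. The sketch also adds one assumption not stated on the slide: every VM must be placed on exactly one PM. Without that rule, an all-zero `H` would trivially satisfy the constraints.

```python
from typing import Sequence, Tuple

Resources = Tuple[int, int]  # (CPU in MHz, RAM in MB)


def satisfies_packing(H: Sequence[Sequence[int]], R: Sequence[Resources],
                      C: Sequence[Resources]) -> bool:
    """H[j][l] = 1 if PM p_j hosts vm_l; R[l] = (r_l^cpu, r_l^ram); C[j] = (C_j^cpu, C_j^ram)."""
    # Per-PM physical resource constraints from the slide.
    for h_j, (cap_cpu, cap_ram) in zip(H, C):
        used_cpu = sum(r_l[0] * h_jl for r_l, h_jl in zip(R, h_j))
        used_ram = sum(r_l[1] * h_jl for r_l, h_jl in zip(R, h_j))
        if used_cpu > cap_cpu or used_ram > cap_ram:
            return False
    # Assumption (not stated on the slide): every VM is placed on exactly one PM.
    return all(sum(h_j[l] for h_j in H) == 1 for l in range(len(R)))


pms = [(4000, 4000), (4000, 4000)]
vms = [(2000, 1024), (2000, 1024), (1000, 512)]  # hypothetical VM sizes
print(satisfies_packing([[1, 1, 0], [0, 0, 1]], vms, pms))  # True
print(satisfies_packing([[1, 1, 1], [0, 0, 0]], vms, pms))  # False: 5000 MHz on the first PM
```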
Packing Output
Packing produces the VM placement vectors $H_j$.
The GDM runs periodically and uses the previous $H_j$ to determine which VMs need to be migrated.
The goal is to minimize the number of active PMs, $X$:
$$X = \sum_{j=1}^{q} u_j, \quad \text{where } u_j = \begin{cases} 1 & \text{if } \exists\, vm_l \in V \mid h_{jl} = 1 \\ 0 & \text{otherwise} \end{cases}$$
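Here is a minimal sketch of the packing output step. It computes $X$ from a placement matrix and compares a previous and a new placement to find the VMs that must migrate. It assumes both placements index the same VM list $V$ (no VMs created or destroyed in between). Indices are 0-based, unlike the 1-based math notation.

```python
from typing import Dict, List, Sequence, Tuple


def active_pms(H: Sequence[Sequence[int]]) -> int:
    """X = sum_j u_j, where u_j = 1 if PM p_j hosts at least one VM."""
    return sum(1 for h_j in H if any(h_j))


def hosts(H: Sequence[Sequence[int]]) -> Dict[int, int]:
    """Map each VM index l to the index j of the PM with h_jl = 1."""
    return {l: j for j, h_j in enumerate(H) for l, h_jl in enumerate(h_j) if h_jl}


def migrations(H_prev: Sequence[Sequence[int]],
               H_new: Sequence[Sequence[int]]) -> List[Tuple[int, int, int]]:
    """(vm index, old PM, new PM) for every VM whose host changed.
    Assumes both placements index the same VM list V."""
    prev, new = hosts(H_prev), hosts(H_new)
    return [(l, prev[l], new[l]) for l in sorted(new) if l in prev and prev[l] != new[l]]


H_prev = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
H_new = [[1, 1, 0], [0, 0, 0], [0, 0, 1]]
print(active_pms(H_prev), active_pms(H_new))  # 3 2
print(migrations(H_prev, H_new))  # [(1, 1, 0)]
```

Here, consolidating from three active PMs to two costs one migration. That migration overhead is not part of the objective $X$.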
Simulation Environment
4 PMs, each with 4000 MHz (CPU) and 4000 MB (RAM)
2 applications
Cost function:
$$Cost(CPU) = \frac{CPU_{demand}}{CPU_{total}}$$
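Here is a small worked instance of the cost function. It assumes that $CPU_{demand}$ is the CPU capacity (in MHz) allocated to the applications and that $CPU_{total}$ is the capacity of all 4 simulated PMs, 4 × 4000 = 16000 MHz. The 6000 MHz allocation is hypothetical.

```python
PM_COUNT = 4
PM_CPU_MHZ = 4000
CPU_TOTAL_MHZ = PM_COUNT * PM_CPU_MHZ  # 16000 MHz in the simulated environment


def cpu_cost(cpu_demand_mhz: float, cpu_total_mhz: float = CPU_TOTAL_MHZ) -> float:
    """Cost(CPU) = CPU_demand / CPU_total; lies in [0, 1] when demand fits."""
    return cpu_demand_mhz / cpu_total_mhz


cost = cpu_cost(6000)  # hypothetical 6000 MHz allocation
print(cost)  # 0.375
for epsilon in (0.05, 0.3):  # the ε values used in Simulations 1 and 2
    print(epsilon, epsilon * cost)  # operating-cost penalty subtracted from U_global
```

With the larger ε of Simulation 2, the same allocation carries a six times larger penalty. Holding CPU is therefore harder to justify in the global utility.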
Simulation 1
Minimize operating cost impact: $\varepsilon = 0.05$
4 VM classes
Given Table II below, application A is given priority
[Figure: plots of demand ($D_A$ and $D_B$), CPUs ($R_A$ and $R_B$), number of physical machines, global utility, and response times ($T_A$ and $T_B$)]
Simulation (cont’d)
Simulation 2: the operating cost factor $\varepsilon$ increases to 0.3
Simulation 3: new utility function for both A and B
Simulation 3 results
At $t_4$ and $t_5$, the CPU resource allocated to B decreases faster than in the first test
Simulation 4: changing weight factors
With $\alpha_A = 0.3$ and $\alpha_B = 0.7$, B obtains enough CPU, but A does not and fails to meet its SLA
Recommendations
The constraint solver used to optimize provisioning and packing is not discussed.*
There is no mention of the overhead of migrating a VM to a new PM or of allocating a new VM.
The simulations do not examine the $N$ vectors chosen for VM provisioning.
There is no discussion of the cost or frequency of running the GDM.
*The Choco open-source constraint solver is used.
Conclusions
Dynamic placement and attention to application-specific goals are valuable
Modeling on top of constraints allows for flexibility
Utility functions provide a uniform way for applications to self-optimize