CNSM 2011 slides
Gossip-based resource allocation with performance and energy-savings objectives for large clouds

Rerngvit Yanggratoke, Fetahi Wuhib, Rolf Stadler
LCN Seminar, KTH Royal Institute of Technology
April 7, 2011
Motivation
“Datacenters in the US consumed 1.5% of total US power consumption, resulting in an energy cost of $4.5 billion.”
– U.S. Environmental Protection Agency, 2007
“The per-server power consumption of a datacenter, over its lifetime, is now more than the cost of the server itself.”
– Christian L. Belady, 2007
Server Consolidation
Minimize the number of active servers; idle servers can be shut down.
VMware Distributed Power Management (DPM)
Why?
The average utilization level of servers in datacenters is just 15% (EPA, 2007).
An idle server typically consumes at least 60% of its power consumption under full load (VMware DPM, 2010).
Existing works
Products: VMware DPM, Ubuntu Enterprise Cloud Power Management.
Research: (G. Jung et al., 2010), (V. Petrucci et al., 2010), (C. Subramanian et al., 2010), (M. Cardosa et al., 2010), ...
All of these are based on centralized solutions:
bottleneck, single point of failure.
Design goals and design principles
Design goals
Server consolidation in case of underload.
Fair resource allocation in case of overload.
Dynamic adaptation to changes in load patterns.
Scalable operation.
Design principles
A distributed middleware architecture.
Distributed protocols: epidemic or gossip-based algorithms.
A generic protocol for resource management (GRMP); an instantiation that solves the goals above (GRMP-Q).
The problem setting
The cloud service provider operates the physical infrastructure.
The cloud hosts sites belonging to its clients.
Users access sites through the Internet.
A site is composed of modules.
Our focus: allocating CPU and memory resources to sites.
The stakeholders.
Middleware architecture
Key components: machine manager and site manager.
The middleware runs on all machines in the cloud.
Middleware architecture (Cont.)
The machine pool service.
Standby: ACPI G2 (Soft-off).
Activate: wake-on-LAN (WoL) packet.
Modeling resource allocation
Demand and capacity
M, N: the sets of modules and machines (servers), respectively.
ω_m(t), γ_m: CPU and memory demands of module m ∈ M.
Ω, Γ: CPU and memory capacity of a machine in the cloud.
Resource allocation
ω_{n,m}(t) = α_{n,m}(t)·ω_m(t): demand of module m on machine n.
A(t) = (α_{n,m}(t))_{n,m}: the configuration matrix.
Machine n allocates ω̂_{n,m}(t) CPU and γ_m memory to module m.
Local resource allocation policy: ω̂_{n,m}(t) = Ω·ω_{n,m}(t) / Σ_i ω_{n,i}(t), i.e., each module receives a proportional share of the machine's CPU capacity.
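As a minimal Python sketch (not from the slides; function name is my own), the proportional policy above can be written as:

```python
def allocate_cpu(demands, capacity):
    """Proportional-share CPU allocation on one machine.

    When aggregate demand exceeds capacity, each module gets a share
    proportional to its demand; otherwise demand is satisfied in full.
    """
    total = sum(demands)
    if total <= capacity:
        return list(demands)
    return [capacity * d / total for d in demands]
```

For example, with capacity 4 and demands 6 and 2, the modules receive 3 and 1 respectively.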
Utility and power consumption
Utility
u_{n,m}(t) = ω̂_{n,m}(t)/ω_{n,m}(t): utility of module m on machine n.
u(s, t) = min_{n, m ∈ M_s} u_{n,m}(t): site utility.
U^c(t) = min_{s | u(s,t) ≤ 1} u(s, t): cloud utility.
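A hypothetical Python rendering of the three utility levels (function names are my own):

```python
def module_utility(allocated, demand):
    # u_{n,m} = allocated CPU / demanded CPU.
    return allocated / demand

def site_utility(module_utilities):
    # Site utility is the minimum utility over the site's module instances.
    return min(module_utilities)

def cloud_utility(site_utilities):
    # Cloud utility is the minimum over sites whose utility is at most 1;
    # sites with fully satisfied demand do not lower the cloud utility.
    capped = [u for u in site_utilities if u <= 1]
    return min(capped) if capped else 1.0
```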
Power consumption
Assuming homogeneous machines:
P_n(t) = 0 if row_n(A(t))·1 = 0, and 1 otherwise.
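Under this model, the cloud's power consumption is simply the number of machines with a non-zero row in A. A small sketch (my own helper, not from the slides):

```python
def active_machines(A):
    # A machine consumes power (P_n = 1) iff its row of the configuration
    # matrix is non-zero, i.e. it hosts at least one module instance.
    return sum(1 for row in A if any(x > 0 for x in row))
```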
The resource allocation problem
Resource allocation as a utility maximization problem:

maximize U^c(t+1)
minimize P^c(t+1)
minimize c*(A(t), A(t+1))
subject to A(t+1) ≥ 0, 1^T·A(t+1) = 1^T,
Ω̂(A(t+1), ω(t+1))·1 ≤ Ω·1,
sign(A(t+1))·γ ≤ Γ·1.
Cost of reconfiguration
c* is the number of module instances that are started to reconfigure the system.
Protocol GRMP: pseudocode for machine n
initialization
1: read ω, γ, Ω, Γ, row_n(A);
2: initInstance();
3: start passive and active threads;

active thread
1: for r = 1 to r_max do
2:   n' = choosePeer();
3:   send(n', row_n(A));
4:   row_{n'}(A) = receive(n');
5:   updatePlacement(n', row_{n'}(A));
6:   sleep until end of round;
7: write row_n(A);

passive thread
1: while true do
2:   row_{n'}(A) = receive(n');
3:   send(n', row_n(A));
4:   updatePlacement(n', row_{n'}(A));
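The round structure can be simulated synchronously. A hypothetical Python sketch (names and the toy `equalize` are my own; GRMP itself leaves `choose_peer` and `update_placement` abstract):

```python
def run_gossip(rows, rounds, choose_peer, update_placement):
    # Synchronous stand-in for GRMP's active/passive threads: each round,
    # every machine exchanges state with one peer, and both sides apply
    # the instantiation-specific updatePlacement.
    for _ in range(rounds):
        for n in range(len(rows)):
            peer = choose_peer(n, len(rows))
            if peer != n:
                update_placement(n, peer, rows)
    return rows

def equalize(n, j, rows):
    # Toy updatePlacement for illustration: split the two machines' total
    # load evenly between them (GRMP-Q's equalize acts in this spirit).
    avg = [(a + b) / 2 for a, b in zip(rows[n], rows[j])]
    rows[n] = list(avg)
    rows[j] = list(avg)
```

Note that any `update_placement` that only shifts load between the two gossiping machines conserves total demand, which is what makes the protocol safe to run continuously.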
Three abstract methods:
initInstance();
choosePeer();
updatePlacement(n', row_{n'}(A));
Protocol GRMP-Q: pseudocode for machine n
initInstance()
1: read N_n;

choosePeer()
1: if rand(0..1) < p then
2:   n' = unifrand(N_n);
3: else
4:   n' = unifrand(N − N_n);

updatePlacement(j, row_j(A))
1: if (ω_n + ω_j ≥ 2Ω) then
2:   equalize(j, row_j(A));
3: else
4:   if j ∈ N_n then
5:     packShared(j);
6:   else packNonShared(j);

where ω_n = Σ_m ω_{n,m}.
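The biased peer selection can be sketched in Python (the default p = 0.5 is my assumption; the slides leave p unspecified):

```python
import random

def choose_peer(n, neighbors, all_machines, p=0.5):
    # With probability p, gossip with a machine in N_n (machines sharing
    # modules with n); otherwise with a uniformly random other machine.
    others = sorted(set(all_machines) - set(neighbors) - {n})
    if neighbors and (not others or random.random() < p):
        return random.choice(sorted(neighbors))
    return random.choice(others)
```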
Principles:
Prefer a gossiping peer with common modules: N_n = {j ∈ N : j has modules in common with n}.
Equalize if the aggregate load ≥ the aggregate capacity.
Pick the destination machine to pack:
the higher-loaded machine if both are underloaded;
the underloaded machine if one is overloaded.
Utilize both CPU and memory during the packing process.
pickSrcDest(j)
1: dest = arg max(ω_n, ω_j);
2: src = arg min(ω_n, ω_j);
3: if ω_dest > Ω then swap dest and src;
4: return (src, dest);
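A direct Python transcription of pickSrcDest, operating on the two machines' aggregate CPU loads:

```python
def pick_src_dest(load_n, load_j, capacity):
    # The heavier machine is the packing destination, unless it is already
    # overloaded, in which case roles are swapped so load drains off it.
    # Returns indices: 0 for machine n, 1 for machine j.
    loads = (load_n, load_j)
    dest = 0 if load_n >= load_j else 1
    src = 1 - dest
    if loads[dest] > capacity:
        src, dest = dest, src
    return src, dest
```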
GRMP-Q - method updatePlacement(j, rowj(A))
Protocol GRMP-Q (Cont.)
packShared(j)
1: (s, d) = pickSrcDest(j);
2: Δω_d = Ω − Σ_m ω_{d,m};
3: if ω_s > Ω then Δω_s = Σ_m ω_{s,m} − Ω else Δω_s = Σ_m ω_{s,m};
4: let mod be the list of modules shared by s and d, sorted by decreasing γ_{s,m}/ω_{s,m};
5: while mod ≠ ∅ ∧ Δω_s > 0 ∧ Δω_d > 0 do
6:   m = popFront(mod);
7:   δω = min(Δω_d, Δω_s, ω_{s,m});
8:   Δω_d -= δω; Δω_s -= δω; δα = α_{s,m}·δω/ω_{s,m};
9:   α_{d,m} += δα; α_{s,m} -= δα;

Definitions: v_n = Σ_m ω_{n,m}/Ω; g_n = Σ_m γ_{n,m}/Γ.
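A simplified Python sketch of packShared (my own simplification: it shifts per-module CPU demand directly instead of updating the α shares, and all names are illustrative):

```python
def pack_shared(src, dst, mem, capacity):
    # src, dst: {module: CPU demand on that machine}; mem: {module: memory}.
    # Move CPU demand of shared modules onto the destination, preferring
    # memory-heavy modules (high gamma/omega ratio), until the destination's
    # spare CPU or the source's movable CPU is exhausted.
    spare_d = capacity - sum(dst.values())
    load_s = sum(src.values())
    movable_s = load_s - capacity if load_s > capacity else load_s
    shared = sorted(set(src) & set(dst),
                    key=lambda m: mem[m] / src[m], reverse=True)
    for m in shared:
        if spare_d <= 0 or movable_s <= 0:
            break
        delta = min(spare_d, movable_s, src[m])
        src[m] -= delta
        dst[m] += delta
        spare_d -= delta
        movable_s -= delta
```

Moving only modules already present on the destination is what keeps this step free of reconfiguration cost: no new module instance has to be started.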
packNonShared(j)
1: (s, d) = pickSrcDest(j);
2: Δγ_d = Γ − Σ_m γ_{d,m}; Δω_d = Ω − Σ_m ω_{d,m};
3: if ω_s > Ω then Δω_s = Σ_m ω_{s,m} − Ω else Δω_s = Σ_m ω_{s,m};
4: if v_d ≥ g_d then sortCri = γ_{s,m}/ω_{s,m} else sortCri = ω_{s,m}/γ_{s,m};
5: let nmod be the list of modules on s not shared with d, sorted by decreasing sortCri;
6: while nmod ≠ ∅ ∧ Δγ_d > 0 ∧ Δω_d > 0 ∧ Δω_s > 0 do
7:   m = popFront(nmod);
8:   δω = min(Δω_s, Δω_d, ω_{s,m}); δγ = γ_{s,m};
9:   if Δγ_d ≥ δγ then
10:    δα = α_{s,m}·δω/ω_{s,m}; α_{d,m} += δα;
11:    α_{s,m} -= δα; Δγ_d -= δγ;
12:    Δω_d -= δω; Δω_s -= δω;
Properties of GRMP-Q
Overload scenarios: fair allocation.
Sufficient memory scenario
Optimal solution: the protocol converges to a configuration where ⌊|N|·CLF⌋ machines are fully packed (w.r.t. CPU) and |N| − ⌈|N|·CLF⌉ machines are empty.
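The packing bound can be computed directly (a trivial helper of my own, just to make the floor/ceiling arithmetic concrete):

```python
import math

def optimal_packing(n_machines, clf):
    # Optimal configuration under sufficient memory: floor(|N|*CLF)
    # machines fully packed, |N| - ceil(|N|*CLF) machines empty; any
    # fractional remainder of load occupies the one machine in between.
    full = math.floor(n_machines * clf)
    empty = n_machines - math.ceil(n_machines * clf)
    return full, empty
```

For example, 10 machines at CLF = 0.35 gives 3 fully packed and 6 empty machines, with one machine partially loaded.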
General case
As long as there is a free machine in the cloud, the protocolguarantees that the demand of all sites is satisfied.
Simulation: demand, capacity and evaluation metrics
Demand ω changes at discrete points in time, at which GRMP-Q recomputes A.
Demand: CPU demand of sites is Zipf-distributed. Memory demand of modules is selected from {128 MB, 256 MB, 512 MB, 1 GB, 2 GB}.
Capacity: CPU and memory capacities are fixed at 34.513 GHz and 36.409 GB, respectively.
Evaluation scenarios:
different CPU and memory load factors (CLF, MLF): CLF = 0.1, 0.4, 0.7, 1.0, 1.3 and MLF = 0.1, 0.3, 0.5, 0.7, 0.9;
different system sizes.
Evaluation metrics: power reduction, fairness, satisfied demand, cost of reconfiguration.
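A hypothetical generator for the Zipf-distributed site demands (the exponent s = 1.0 and the scaling to a fixed total are my assumptions; the slides only say the CPU demand of sites is Zipf-distributed):

```python
def zipf_demands(n_sites, total_cpu, s=1.0):
    # Site k (1-indexed) gets weight 1/k^s, scaled so demands sum to
    # total_cpu: a few large sites, a long tail of small ones.
    weights = [1.0 / (k ** s) for k in range(1, n_sites + 1)]
    z = sum(weights)
    return [total_cpu * w / z for w in weights]
```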
Measurement Results
(a) Fraction of machines that can be shut down. (b) Fraction of sites with satisfied demand.
(c) Cost of change in configuration. (d) Fairness among sites.
Measurement Results (Cont.)
Scalability with respect to the number of machines and sites. We evaluate two different (CLF, MLF) settings: (0.5, 0.5) and (0.25, 0.25).
Conclusion
We introduce and formalize the problem of server consolidation in a site-hosting cloud environment.
We develop GRMP, a generic gossip-based protocol for resource management that can be instantiated with different objectives.
We develop an instance of GRMP, called GRMP-Q, which provides a heuristic solution to the server consolidation problem.
We perform a simulation study of the performance of GRMP-Q, which indicates that the protocol qualitatively behaves as expected from its design. For all parameter ranges investigated, the protocol uses at most 30% more machines of the cloud than an optimal solution.
Future work
Work relating to the resource allocation protocol GRMP-Q:
Its convergence properties for large CLF values.
Support for heterogeneous machines.
Robustness to failures.
Regarding the middleware architecture:
Design a mechanism for deploying new sites.
Extend design to span multiple clusters.