
Optimal Power Allocation and Load Distribution for Multiple Heterogeneous Multicore Server Processors across Clouds and Data Centers

Junwei Cao, Senior Member, IEEE, Keqin Li, Senior Member, IEEE, and Ivan Stojmenovic, Fellow, IEEE

Abstract—For multiple heterogeneous multicore server processors across clouds and data centers, the aggregated performance of the cloud of clouds can be optimized by load distribution and balancing. Energy efficiency is one of the most important issues for large-scale server systems in current and future data centers. The multicore processor technology provides new levels of performance and energy efficiency. The present paper aims to develop power and performance constrained load distribution methods for cloud computing in current and future large-scale data centers. In particular, we address the problem of optimal power allocation and load distribution for multiple heterogeneous multicore server processors across clouds and data centers. Our strategy is to formulate optimal power allocation and load distribution for multiple servers in a cloud of clouds as optimization problems, i.e., power constrained performance optimization and performance constrained power optimization. Our research problems in large-scale data centers are well-defined multivariable optimization problems, which explore the power-performance tradeoff by fixing one factor and minimizing the other, from the perspective of optimal load distribution. It is clear that such power and performance optimization is important for a cloud computing provider to efficiently utilize all the available resources. We model a multicore server processor as a queuing system with multiple servers. Our optimization problems are solved for two different models of core speed, where one model assumes that a core runs at zero speed when it is idle, and the other model assumes that a core runs at a constant speed. Our results in this paper provide new theoretical insights into power management and performance optimization in data centers.

Index Terms—Load distribution, multicore server processor, power allocation, queuing model, response time

1 INTRODUCTION

1.1 Motivation

Cloud computing has recently received considerable attention and is widely accepted as a promising and ultimate way of managing and improving the utilization of data and computing center resources and delivering various computing and IT services. Currently, server sprawl is a common situation, in which multiple under-utilized servers take up more space and consume more energy than can be justified by their workload. According to a recent report [2], many companies typically run at 15-20 percent of their capacity, which is not a sustainable ratio in the current economic environment. Server consolidation is an effective cloud computing approach to increasing the efficient usage of computer server resources to reduce the total number of servers or server locations that an organization requires. By centralized management of computing resources, cloud computing delivers hosted services over the Internet, such that access to shared hardware, software, databases, information, and all resources is provided to users on demand.

In a data center with multiple servers, the aggregated performance of the data center can be optimized by load distribution and balancing. Cloud-based applications depend even more heavily on load balancing and optimization than traditional enterprise applications. For end users, load balancing capabilities will be seriously considered when they select a cloud computing provider. For cloud providers, load balancing capabilities will be a source of revenue, which is directly related to service quality (e.g., task response time). Hence, an efficient load balancing strategy is a key component to building out any cloud computing architecture.

Power and cooling costs for data centers have skyrocketed by 800 percent since 1996, and the escalating costs see no end in sight [15]. Energy efficiency is one of the most important issues for large-scale server systems in current and future data centers. Cloud computing provides the ultimate way of effective and efficient management of energy consumption. Cloud computing can be an inherently energy-efficient technology, due to centralized energy management of computations on large-scale computing systems, instead of distributed and individualized applications without efficient energy consumption control [6]. Moreover, such potential for significant energy savings can be fully explored with balanced consideration of system performance and energy consumption.


. J. Cao is with the Research Institute of Information Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China. E-mail: [email protected].

. K. Li is with the Department of Computer Science, State University of New York, New Paltz, New York 12561. E-mail: [email protected].

. I. Stojmenovic is with the School of Electrical Engineering and Computer Science, University of Ottawa, Ontario K1N 6N5, Canada, and the School of Software, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China. E-mail: [email protected].

Manuscript received 7 Sept. 2012; revised 3 Mar. 2013; accepted 18 May 2013; published online 29 May 2013. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TCSI-2012-09-0645. Digital Object Identifier no. 10.1109/TC.2013.122.


Permanently altering the course of computing, the multicore processor technology provides new levels of performance and energy efficiency. Ideal for multitasking, multimedia, and networking applications, multicore technology delivers exceptional energy-efficient performance for the ultimate computing experience. By incorporating multiple processor cores in a single package that delivers parallel execution of multiple software applications, multicore technology enables higher levels of performance while consuming less power than is typically required by a higher-frequency single-core processor with equivalent performance.

Software techniques for power reduction are supported by a mechanism called dynamic voltage scaling (equivalently, dynamic frequency scaling, dynamic speed scaling, dynamic power scaling). A power-aware system management algorithm can change supply voltage and frequency at appropriate times to optimize a combined consideration of performance and energy consumption. Dynamic power management at the operating system level provides supply voltage and clock frequency adjustment schemes implemented while tasks are running. These energy conservation techniques explore the opportunities for tuning the energy-delay tradeoff [28].

As cloud computing evolves into the next generation, a cloud of clouds is able to utilize computing resources across multiple clouds and data centers. Such cloud federation not only integrates heterogeneous resources and computing power, but also provides more powerful and flexible services to various requests in scientific, business, and industrial applications. In such an environment with multiple heterogeneous servers across multiple clouds and data centers, optimal power allocation and load distribution has an even greater impact on the aggregated performance and energy efficiency of a cloud of clouds. Such a situation provides us a more challenging opportunity to address and discuss power and performance on a wider scale and to generate more significant influence.

1.2 Our Contributions

The present paper aims to develop power and performance constrained load distribution methods for cloud computing in current and future large-scale data centers. In particular, we address the problem of optimal power allocation and load distribution for multiple heterogeneous multicore server processors across clouds and data centers. We define two important research problems that explore the power-performance tradeoff in large-scale data centers from the perspective of optimal power allocation and load distribution. Our strategy is to formulate optimal power allocation and load distribution for multiple servers in a cloud of clouds as optimization problems. Our problems are defined for multiple multicore server processors with different sizes, and a certain workload:

. Power constrained performance optimization—Given a power constraint, our problem is to find an optimal power allocation to the servers (i.e., to determine the server speeds) and an optimal workload distribution among the servers, such that the average task response time is minimized and that the average power consumption of the servers does not exceed the given power limit.

. Performance constrained power optimization—Given a performance constraint, our problem is to find an optimal power allocation to the servers (i.e., to determine the server speeds) and an optimal workload distribution among the servers, such that the average power consumption of the servers is minimized and that the average task response time does not exceed the given performance limit.

Our research problems in large-scale data centers are well-defined multivariable optimization problems, which explore the power-performance tradeoff by fixing one factor and minimizing the other, from the perspective of optimal load distribution. By power constrained performance optimization, a cloud of clouds can deliver the highest quality of service, while maintaining low cost for energy consumption. By performance constrained power optimization, a cloud of clouds can minimize the cost for energy consumption, while guaranteeing certain quality of service.

We model a multicore server processor as a queuing system with multiple servers. Our optimization problems are solved for two different models of core speed, where one model assumes that a core runs at zero speed when it is idle, and the other model assumes that a core runs at a constant speed. Our investigation in this paper makes an initial effort toward an analytical study of the power-performance tradeoff in data centers with heterogeneous servers. Our results in this paper provide new theoretical and practical insights into power management and performance optimization in cloud computing. To the best of our knowledge, such optimal power allocation and load distribution for multiple heterogeneous servers has not been investigated before as optimization problems.

2 RELATED RESEARCH

Load distribution and balancing in general parallel and distributed computing systems have been extensively studied and a huge body of literature exists (see the excellent collection [27]). In particular, the problems of optimal load distribution have been investigated by using queuing models [16], [30], with various performance metrics such as weighted mean response time [19], arithmetic average response time [20], probability of load imbalance [21], probability of load balancing success [25], mean response ratio [29], and mean miss rate [14]. Optimal load distribution in a heterogeneous distributed computer system with both generic and dedicated applications was studied in [7], [26]. Optimal load distribution of generic tasks without special tasks for a group of heterogeneous multiserver queuing systems was studied in [14]. Optimal load distribution of generic tasks together with special tasks in cluster and grid computing environments was studied in [22], where each server is modeled as a queuing system with a single server. Optimal load distribution of generic tasks together with special tasks for a group of heterogeneous blade servers in a cloud computing environment was investigated in [24], where each server is modeled as a queuing system with multiple servers. To the best of our knowledge, optimal load distribution with power and performance constraints for multiple queuing systems with multiple servers has not been considered before.


There exists a huge body of literature on energy-efficient computing and communication (see [3], [4], [5], [31], [32] for comprehensive surveys). Efficient power management and performance optimization in large-scale data centers and server clusters has gained much attention in the research community in recent years [9], [10], [11], [12], [13]. In [17], the authors developed a framework for hierarchical autonomic power and performance management in high-performance distributed data centers. In [23], the author formulated and solved the optimal power allocation problem for multiple heterogeneous servers in a data center. In [33], the authors proposed a highly scalable hierarchical power control architecture for large-scale data centers. In [34], the authors presented a novel cluster-level control architecture that coordinates individual power and performance control loops for virtualized server clusters. In [36], the authors proposed an energy proportional model based on queuing theory and service differentiation in server clusters, which can provide controllable and predictable quantitative control over power consumption with theoretically guaranteed service performance, where a server is treated as an M/G/1 queuing system, i.e., a single server system. Furthermore, the vary-on vary-off mechanism is adopted, which turns servers on and off to adjust the number of active servers based on the workload.

3 MODELING MULTICORE SERVER PROCESSORS

To formulate and study the problem of optimal load distribution and balancing for multiple heterogeneous multicore server processors in a cloud computing environment with power and performance constraints, we need a model for a multicore server and a group of multicore servers. A queuing model for a group of $n$ heterogeneous multicore servers $S_1, S_2, \ldots, S_n$ of sizes $m_1, m_2, \ldots, m_n$ and speeds $s_1, s_2, \ldots, s_n$ across multiple clouds and data centers is given in Fig. 1. Assume that a multicore server $S_i$ has $m_i$ identical cores with speed $s_i$. In this paper, a multicore server processor is treated as an M/M/m queuing system, which is elaborated as follows.

Fig. 1. A group of $n$ heterogeneous multicore servers $S_1, S_2, \ldots, S_n$.

There is a Poisson stream of tasks with arrival rate $\lambda$ (measured by the number of tasks per second), i.e., the interarrival times are independent and identically distributed (i.i.d.) exponential random variables with mean $1/\lambda$. A load distribution and balancing algorithm will split the stream into $n$ substreams, such that the $i$th substream with arrival rate $\lambda_i$ is sent to server $S_i$, where $1 \le i \le n$, and $\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_n$. A multicore server $S_i$ maintains a queue with infinite capacity for waiting tasks when all the $m_i$ cores are busy. The first-come-first-served (FCFS) queuing discipline is adopted by all servers. The task execution requirements (measured by the number of instructions to be executed) are i.i.d. exponential random variables $r$ with mean $\bar{r}$. The $m_i$ cores of server $S_i$ have identical execution speed $s_i$ (measured by billion instructions per second (BIPS)). Hence, the task execution times on the cores of server $S_i$ are i.i.d. exponential random variables $x_i = r/s_i$ with mean $\bar{x}_i = \bar{r}/s_i$.

Power dissipation and circuit delay in digital CMOS circuits can be accurately modeled by simple equations, even for complex microprocessor circuits. CMOS circuits have dynamic, static, and short-circuit power dissipation; however, the dominant component in a well-designed circuit is dynamic power consumption (i.e., the switching component of power), which is approximately $P = aCV^2 f$, where $a$ is an activity factor, $C$ is the loading capacitance, $V$ is the supply voltage, and $f$ is the clock frequency [8]. Since $s \propto f$, where $s$ is the processor speed, and $f \propto V^{\phi}$ with $0 < \phi \le 1$ [35], which implies that $V \propto f^{1/\phi}$, we know that power consumption is $P \propto f^{\alpha}$ and $P \propto s^{\alpha}$, where $\alpha = 1 + 2/\phi \ge 3$. For ease of discussion, we will assume that the power allocated to a processor core with speed $s$ is simply $s^{\alpha}$.

We will consider two types of core speed models. In the idle-speed model, a core runs at zero speed when there is no task to perform. Let $\mu_i = 1/\bar{x}_i = s_i/\bar{r}$ be the average service rate, i.e., the average number of tasks that can be finished by a processor core of server $S_i$ in one unit of time. The core utilization is $\rho_i = \lambda_i/(m_i \mu_i) = \lambda_i \bar{x}_i/m_i = (\lambda_i/m_i)(\bar{r}/s_i)$, which is the average percentage of time that a core of $S_i$ is busy. Since the power for speed $s_i$ is $s_i^{\alpha}$, the average amount of energy consumed by a core in one unit of time is $\rho_i s_i^{\alpha} = (\lambda_i/m_i)\bar{r} s_i^{\alpha-1}$, where we notice that the speed of a core is zero when it is idle. The average amount of energy consumed by an $m_i$-core server $S_i$ in one unit of time, i.e., the average power supply to server $S_i$, is $P_i = m_i \rho_i s_i^{\alpha} = \lambda_i \bar{r} s_i^{\alpha-1}$, where $m_i \rho_i = \lambda_i \bar{x}_i$ is the average number of busy cores in $S_i$. Since a processor core still consumes some amount of power $P_i^*$ even when it is idle (assume that an idle core consumes certain base power $P_i^*$, which includes static power dissipation, short circuit power dissipation, and other leakage and wasted power [1]), we will include $P_i^*$ in $P_i$, i.e., $P_i = m_i(\rho_i s_i^{\alpha} + P_i^*) = \lambda_i \bar{r} s_i^{\alpha-1} + m_i P_i^*$. Notice that when $P_i^* = 0$, the above $P_i$ is independent of $m_i$.

In the constant-speed model, all cores run at the speed $s_i$ even if there is no task to perform. Again, we use $P_i$ to represent the power allocated to server $S_i$. Since the power for speed $s_i$ is $s_i^{\alpha}$, the power allocated to server $S_i$ is $P_i = m_i(s_i^{\alpha} + P_i^*)$.
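As a concrete reading of the two power models, the following short Python sketch (our own illustration; the function and variable names are not from the paper, and the numbers in the example call, including $\alpha = 3$, are only assumed values) computes the core utilization $\rho_i$ and the server power $P_i$ of a single server under both models.

```python
def core_utilization(lam_i, m_i, s_i, r_bar):
    # rho_i = (lambda_i / m_i) * (r_bar / s_i): fraction of time a core of S_i is busy
    return (lam_i / m_i) * (r_bar / s_i)

def server_power(lam_i, m_i, s_i, r_bar, alpha, p_star, idle_speed=True):
    """Average power of server S_i.

    idle-speed model:     P_i = m_i * (rho_i * s_i**alpha + P_i*)
    constant-speed model: P_i = m_i * (s_i**alpha + P_i*)
    """
    if idle_speed:
        rho_i = core_utilization(lam_i, m_i, s_i, r_bar)
        return m_i * (rho_i * s_i ** alpha + p_star)
    return m_i * (s_i ** alpha + p_star)

# Hypothetical example: a 4-core server at 1.5 BIPS, alpha = 3, base power 2 W per core,
# arrival rate 5 tasks/s, mean task size 1 giga instructions.
print(server_power(5.0, 4, 1.5, 1.0, 3.0, 2.0, idle_speed=True))    # idle-speed model
print(server_power(5.0, 4, 1.5, 1.0, 3.0, 2.0, idle_speed=False))   # constant-speed model
```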

4 POWER CONSTRAINED PERFORMANCE OPTIMIZATION

Let $p_{i,k}$ denote the probability that there are $k$ tasks (waiting or being processed) in the M/M/m system for $S_i$. Then, we have ([18, p. 102]):

$$p_{i,k} = \begin{cases} p_{i,0} \dfrac{(m_i \rho_i)^k}{k!}, & k \le m_i; \\[2mm] p_{i,0} \dfrac{m_i^{m_i} \rho_i^{k}}{m_i!}, & k \ge m_i, \end{cases}$$

where

$$p_{i,0} = \left( \sum_{k=0}^{m_i - 1} \frac{(m_i \rho_i)^k}{k!} + \frac{(m_i \rho_i)^{m_i}}{m_i!} \cdot \frac{1}{1 - \rho_i} \right)^{-1}.$$

The probability of queuing (i.e., the probability that a newly arrived task must wait because all processor cores are busy) is

$$P_{q,i} = \frac{p_{i,m_i}}{1 - \rho_i} = p_{i,0} \frac{m_i^{m_i}}{m_i!} \cdot \frac{\rho_i^{m_i}}{1 - \rho_i}.$$

The average number of tasks (in waiting or in execution) in $S_i$ is

$$\bar{N}_i = \sum_{k=0}^{\infty} k\, p_{i,k} = m_i \rho_i + \frac{\rho_i}{1 - \rho_i} P_{q,i}.$$

Applying Little's result, we get the average task response time as

$$T_i = \frac{\bar{N}_i}{\lambda_i} = \bar{x}_i + \frac{P_{q,i}}{m_i (1 - \rho_i)} \bar{x}_i = \bar{x}_i \left( 1 + \frac{P_{q,i}}{m_i (1 - \rho_i)} \right).$$

In other words, the average task response time on multicore server $S_i$ is

$$T_i = \bar{x}_i \left( 1 + p_{i,0} \frac{m_i^{m_i - 1}}{m_i!} \cdot \frac{\rho_i^{m_i}}{(1 - \rho_i)^2} \right) = \frac{\bar{r}}{s_i} \left( 1 + p_{i,0} \frac{m_i^{m_i - 1}}{m_i!} \cdot \frac{\rho_i^{m_i}}{(1 - \rho_i)^2} \right).$$

The average task response time $T$ on a group of $n$ heterogeneous multicore servers $S_1, S_2, \ldots, S_n$ is

$$T = \frac{\lambda_1}{\lambda} T_1 + \frac{\lambda_2}{\lambda} T_2 + \cdots + \frac{\lambda_n}{\lambda} T_n.$$

Notice that the above $T$ can be treated as a function of the load distribution $\lambda_1, \lambda_2, \ldots, \lambda_n$ and the server speeds $s_1, s_2, \ldots, s_n$, i.e., $T(\lambda_1, \lambda_2, \ldots, \lambda_n; s_1, s_2, \ldots, s_n)$.
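The closed-form expressions above translate directly into code. The sketch below is our own illustration (the helper names are ours, not the paper's); it evaluates $p_{i,0}$, $T_i$, and the weighted mean response time $T$.

```python
import math

def mmm_response_time(lam_i, m_i, s_i, r_bar):
    """Mean response time T_i of an M/M/m queue with m_i cores of speed s_i."""
    rho = (lam_i / m_i) * (r_bar / s_i)          # core utilization rho_i
    if rho >= 1.0:
        return math.inf                          # the server is saturated
    p0 = 1.0 / (sum((m_i * rho) ** k / math.factorial(k) for k in range(m_i))
                + (m_i * rho) ** m_i / (math.factorial(m_i) * (1.0 - rho)))
    return (r_bar / s_i) * (1.0 + p0 * (m_i ** (m_i - 1) / math.factorial(m_i))
                            * rho ** m_i / (1.0 - rho) ** 2)

def mean_response_time(lams, ms, ss, r_bar):
    """Overall T = sum_i (lambda_i / lambda) * T_i."""
    lam = sum(lams)
    return sum(l / lam * mmm_response_time(l, m, s, r_bar)
               for l, m, s in zip(lams, ms, ss))
```

For a single-core server ($m_i = 1$) the expression reduces to the familiar M/M/1 response time $(\bar{r}/s_i) \cdot 1/(1 - \rho_i)$, which is a quick sanity check for the helper.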

The average power consumption $P(\lambda_1, \lambda_2, \ldots, \lambda_n; s_1, s_2, \ldots, s_n)$ of a group of $n$ heterogeneous multicore servers $S_1, S_2, \ldots, S_n$ is also a function of the load distribution $\lambda_1, \lambda_2, \ldots, \lambda_n$ and the server speeds $s_1, s_2, \ldots, s_n$. For the idle-speed model, we have

$$P(\lambda_1, \ldots, \lambda_n; s_1, \ldots, s_n) = \sum_{i=1}^{n} P_i = \sum_{i=1}^{n} m_i \left( \rho_i s_i^{\alpha} + P_i^* \right) = \sum_{i=1}^{n} \lambda_i \bar{r} s_i^{\alpha-1} + \sum_{i=1}^{n} m_i P_i^*.$$

For the constant-speed model, we have

$$P(\lambda_1, \ldots, \lambda_n; s_1, \ldots, s_n) = \sum_{i=1}^{n} P_i = \sum_{i=1}^{n} m_i \left( s_i^{\alpha} + P_i^* \right) = \sum_{i=1}^{n} m_i s_i^{\alpha} + \sum_{i=1}^{n} m_i P_i^*,$$

which is actually a function of the server speeds $s_1, s_2, \ldots, s_n$ only, i.e., $P(s_1, s_2, \ldots, s_n)$.

Our optimal power allocation and load distribution problem for multiple heterogeneous multicore server processors can be specified as follows: given the number of multicore servers $n$, the sizes of the servers $m_1, m_2, \ldots, m_n$, the base power supply $P_1^*, P_2^*, \ldots, P_n^*$, the average task execution requirement $\bar{r}$, the task arrival rate $\lambda$, and the available power $\tilde{P}$, find the task arrival rates on the servers $\lambda_1, \lambda_2, \ldots, \lambda_n$ and the server speeds $s_1, s_2, \ldots, s_n$, such that the average task response time $T(\lambda_1, \ldots, \lambda_n; s_1, \ldots, s_n)$ is minimized, subject to the constraints $F(\lambda_1, \lambda_2, \ldots, \lambda_n) = \lambda$, where $F(\lambda_1, \lambda_2, \ldots, \lambda_n) = \lambda_1 + \lambda_2 + \cdots + \lambda_n$, and $\rho_i < 1$, i.e., $\lambda_i < m_i/\bar{x}_i = m_i s_i/\bar{r}$, for all $1 \le i \le n$; and $P(\lambda_1, \ldots, \lambda_n; s_1, \ldots, s_n) = \tilde{P}$, namely,

$$G(\lambda_1, \ldots, \lambda_n; s_1, \ldots, s_n) = \sum_{i=1}^{n} \lambda_i \bar{r} s_i^{\alpha-1} = \tilde{P} - \sum_{i=1}^{n} m_i P_i^*,$$

for the idle-speed model, and

$$G(s_1, s_2, \ldots, s_n) = \sum_{i=1}^{n} m_i s_i^{\alpha} = \tilde{P} - \sum_{i=1}^{n} m_i P_i^*,$$

for the constant-speed model. The optimization problem defined above is a power constrained performance optimization problem, i.e., optimizing performance (minimizing the average task response time) subject to a power consumption constraint.

To solve the above well-defined multivariable optimization problem, one can employ the classic method of Lagrange multiplier. However, this method results in a system of nonlinear equations with $2(n+1)$ variables, i.e., $\lambda_1, \lambda_2, \ldots, \lambda_n$, $s_1, s_2, \ldots, s_n$, and two Lagrange multipliers. These equations are nontrivial to solve, and there is no closed-form solution. The main contribution of the paper is to develop an effective and efficient algorithm to find a numerical solution. Our strategy is to carefully analyze the problem structure and to reveal the intricate relationship between the speed and the workload of a server. Our analysis successfully reduces the $2(n+1)$ variables to $n+1$ variables, i.e., $\lambda_1, \lambda_2, \ldots, \lambda_n$ and a Lagrange multiplier. We then develop two procedures to find $\lambda_1, \lambda_2, \ldots, \lambda_n$ and the Lagrange multiplier, respectively. Since the standard bisection method is used to search each value in a small interval, our algorithm is very time efficient in finding a solution to the multivariable optimization problem.

The above centralized problem definition and solution can be justified in two ways. First, the main approach in cloud computing is centralized resource management and performance optimization to minimize the cost of computing and to maximize the quality of service. Second, our optimization problem is solved only once in a while for a cloud, based on stable statistical information such as $\lambda$ and $\bar{r}$. As the user demand in an application environment changes, the optimization problem can be solved again to reconfigure the servers, so that system operation and performance can be optimized for the new user requirement.


5 THE IDLE-SPEED MODEL

First of all, we establish the following result, which gives the optimal server speed setting for the idle-speed model.

Theorem 1. For the idle-speed model, the average task response time $T$ on $n$ multicore servers is minimized when all servers have the same speed, i.e., $s_1 = s_2 = \cdots = s_n = s$, where

$$s = \left( \frac{1}{\lambda \bar{r}} \left( \tilde{P} - \sum_{i=1}^{n} m_i P_i^* \right) \right)^{1/(\alpha - 1)}.$$

Proof. We can minimize $T$ by using the method of Lagrange multiplier, namely,

$$\nabla T(\lambda_1, \ldots, \lambda_n; s_1, \ldots, s_n) = \phi \nabla F(\lambda_1, \ldots, \lambda_n) + \psi \nabla G(\lambda_1, \ldots, \lambda_n; s_1, \ldots, s_n),$$

that is,

$$\frac{\partial T}{\partial \lambda_i} = \phi \frac{\partial F}{\partial \lambda_i} + \psi \frac{\partial G}{\partial \lambda_i} = \phi + \psi \bar{r} s_i^{\alpha - 1},$$

for all $1 \le i \le n$, where $\phi$ and $\psi$ are Lagrange multipliers, and

$$\frac{\partial T}{\partial s_i} = \psi \frac{\partial G}{\partial s_i} = \psi \lambda_i \bar{r} (\alpha - 1) s_i^{\alpha - 2},$$

for all $1 \le i \le n$.

It is clear that

$$\frac{\partial T}{\partial \lambda_i} = \frac{1}{\lambda} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right), \qquad \text{where} \qquad \frac{\partial T_i}{\partial \lambda_i} = \frac{\partial T_i}{\partial \rho_i} \cdot \frac{\partial \rho_i}{\partial \lambda_i} = \frac{\bar{r}}{m_i s_i} \cdot \frac{\partial T_i}{\partial \rho_i}.$$

Hence, we get

$$\frac{1}{\lambda} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) = \phi + \psi \bar{r} s_i^{\alpha - 1}, \qquad (1)$$

where

$$\frac{\partial T_i}{\partial \rho_i} = \frac{m_i^{m_i - 1}}{m_i!} \cdot \frac{\bar{r}}{s_i} \left( \frac{\partial p_{i,0}}{\partial \rho_i} \cdot \frac{\rho_i^{m_i}}{(1 - \rho_i)^2} + p_{i,0} \frac{\rho_i^{m_i - 1} (m_i - (m_i - 2)\rho_i)}{(1 - \rho_i)^3} \right),$$

and

$$\frac{\partial p_{i,0}}{\partial \rho_i} = -p_{i,0}^2 \left( \sum_{k=1}^{m_i - 1} \frac{m_i^k \rho_i^{k-1}}{(k-1)!} + \frac{m_i^{m_i}}{m_i!} \cdot \frac{\rho_i^{m_i - 1}(m_i - (m_i - 1)\rho_i)}{(1 - \rho_i)^2} \right),$$

for all $1 \le i \le n$.

It is also clear that

$$\frac{\partial T}{\partial s_i} = \frac{\lambda_i}{\lambda} \cdot \frac{\partial T_i}{\partial s_i}.$$

Let $T_i = (\bar{r}/s_i) D_i = \bar{r}(D_i/s_i)$, where

$$D_i = 1 + p_{i,0} \frac{m_i^{m_i - 1}}{m_i!} \cdot \frac{\rho_i^{m_i}}{(1 - \rho_i)^2}.$$

Then, we have

$$\frac{\partial T_i}{\partial s_i} = \bar{r} \left( -\frac{D_i}{s_i^2} + \frac{1}{s_i} \cdot \frac{\partial D_i}{\partial \rho_i} \cdot \frac{\partial \rho_i}{\partial s_i} \right) = -\bar{r} \left( \frac{1}{\bar{r} s_i} T_i + \frac{1}{s_i} \cdot \frac{\partial D_i}{\partial \rho_i} \cdot \frac{\lambda_i \bar{r}}{m_i s_i^2} \right) = -\bar{r} \left( \frac{1}{\bar{r} s_i} T_i + \frac{\rho_i}{s_i^2} \cdot \frac{\partial D_i}{\partial \rho_i} \right).$$

Since

$$\frac{\partial D_i}{\partial \rho_i} = \frac{m_i^{m_i - 1}}{m_i!} \left( \frac{\partial p_{i,0}}{\partial \rho_i} \cdot \frac{\rho_i^{m_i}}{(1 - \rho_i)^2} + p_{i,0} \frac{\rho_i^{m_i - 1}(m_i - (m_i - 2)\rho_i)}{(1 - \rho_i)^3} \right) = \frac{s_i}{\bar{r}} \cdot \frac{\partial T_i}{\partial \rho_i},$$

we get

$$\frac{\partial T_i}{\partial s_i} = -\frac{1}{s_i} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right),$$

which implies that

$$\frac{\partial T}{\partial s_i} = \frac{\lambda_i}{\lambda} \cdot \frac{\partial T_i}{\partial s_i} = -\frac{\lambda_i}{\lambda s_i} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) = \psi \lambda_i \bar{r} (\alpha - 1) s_i^{\alpha - 2}, \qquad (2)$$

namely,

$$-\frac{1}{\lambda \bar{r} (\alpha - 1) s_i^{\alpha - 1}} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) = \psi, \qquad (3)$$

for all $1 \le i \le n$.

Based on the above discussion, i.e., (1) and (3), we have $-\bar{r}(\alpha - 1)s_i^{\alpha - 1} = \phi/\psi + \bar{r} s_i^{\alpha - 1}$, which gives rise to $s_i = \left( -(\phi/\psi) \cdot 1/(\alpha \bar{r}) \right)^{1/(\alpha - 1)}$, for all $1 \le i \le n$. The above result means that all servers must have the same speed $s$, i.e., $s_1 = s_2 = \cdots = s_n = s$, where

$$s = \left( -\frac{\phi}{\psi} \cdot \frac{1}{\alpha \bar{r}} \right)^{1/(\alpha - 1)}.$$

Consequently, from the constraint

$$G(\lambda_1, \ldots, \lambda_n; s_1, \ldots, s_n) = \sum_{i=1}^{n} \lambda_i \bar{r} s_i^{\alpha - 1} = \lambda \bar{r} s^{\alpha - 1} = \tilde{P} - \sum_{i=1}^{n} m_i P_i^*,$$

we get the identical server speed as

$$s = \left( \frac{1}{\lambda \bar{r}} \left( \tilde{P} - \sum_{i=1}^{n} m_i P_i^* \right) \right)^{1/(\alpha - 1)}.$$

Hence, the theorem is proven. □

By the above theorem, we immediately get

$$P_i = \lambda_i \bar{r} s_i^{\alpha - 1} + m_i P_i^* = \lambda_i \bar{r} s^{\alpha - 1} + m_i P_i^* = \frac{\lambda_i}{\lambda} \left( \tilde{P} - \sum_{j=1}^{n} m_j P_j^* \right) + m_i P_i^*,$$

for all $1 \le i \le n$. It remains to find $\lambda_1, \lambda_2, \ldots, \lambda_n$. We will describe our method in Section 7.

From the above proof, we also know that $\psi \bar{r} s_i^{\alpha - 1} = \psi \bar{r} s^{\alpha - 1} = -\phi/\alpha$. Therefore, by (1), we have

$$\frac{1}{\lambda} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) = \left( 1 - \frac{1}{\alpha} \right) \phi, \qquad (4)$$

for all $1 \le i \le n$.
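As a quick illustration of Theorem 1, the common speed $s$ and the power allocation $P_i$ can be computed directly; the sketch below is ours, assumes $\alpha = 3$, and uses the Example 1 parameters from Section 9 (the $\lambda_i$ themselves still come from the algorithm of Section 7).

```python
# Idle-speed model, Theorem 1: all servers share one optimal speed s.
n = 7
m = [2 * (i + 1) for i in range(n)]   # server sizes m_i = 2i (Example 1)
P_star = [2.0] * n                    # base power per core (Watts)
r_bar, lam, P_tilde, alpha = 1.0, 55.0, 200.0, 3.0   # alpha = 3 is our assumption

budget = P_tilde - sum(mi * pi for mi, pi in zip(m, P_star))   # P~ - sum_i m_i * P_i*
s = (budget / (lam * r_bar)) ** (1.0 / (alpha - 1.0))          # Theorem 1

# Once the lambda_i are known, P_i = (lambda_i / lambda) * budget + m_i * P_i*.
def power_allocation(lam_i, m_i, P_i_star):
    return (lam_i / lam) * budget + m_i * P_i_star
```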

6 THE CONSTANT-SPEED MODEL

The following result gives the optimal server speed setting for the constant-speed model.

Theorem 2. For the constant-speed model, the average task response time $T$ on $n$ multicore servers is minimized when $s_i = (\lambda_i/(\beta m_i))^{1/\alpha}$, for all $1 \le i \le n$, where

$$\beta = \frac{\lambda}{\tilde{P} - (m_1 P_1^* + m_2 P_2^* + \cdots + m_n P_n^*)}.$$

Proof. Again, we can minimize $T$ by using the method of Lagrange multiplier, namely,

$$\nabla T(\lambda_1, \ldots, \lambda_n; s_1, \ldots, s_n) = \phi \nabla F(\lambda_1, \ldots, \lambda_n) + \psi \nabla G(s_1, \ldots, s_n),$$

that is,

$$\frac{\partial T}{\partial \lambda_i} = \phi \frac{\partial F}{\partial \lambda_i} = \phi,$$

for all $1 \le i \le n$, where $\phi$ is a Lagrange multiplier, and

$$\frac{\partial T}{\partial s_i} = \psi \frac{\partial G}{\partial s_i} = \psi m_i \alpha s_i^{\alpha - 1},$$

for all $1 \le i \le n$, where $\psi$ is another Lagrange multiplier.

Notice that (1) becomes

$$\frac{1}{\lambda} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) = \phi. \qquad (5)$$

Also, (2) becomes

$$\frac{\partial T}{\partial s_i} = -\frac{\lambda_i}{\lambda s_i} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) = \psi m_i \alpha s_i^{\alpha - 1},$$

namely,

$$\frac{1}{\lambda} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) = -\psi \alpha \cdot \frac{m_i s_i^{\alpha}}{\lambda_i}. \qquad (6)$$

Based on (5) and (6), we get $\phi = -\psi \alpha (m_i s_i^{\alpha}/\lambda_i)$, and equivalently, $\lambda_i = -\alpha (\psi/\phi) m_i s_i^{\alpha}$. From the last equation and the constraint that

$$G(s_1, s_2, \ldots, s_n) = m_1 s_1^{\alpha} + m_2 s_2^{\alpha} + \cdots + m_n s_n^{\alpha} = \tilde{P} - \sum_{i=1}^{n} m_i P_i^*,$$

we obtain

$$\lambda = -\alpha \frac{\psi}{\phi} \left( \tilde{P} - \sum_{i=1}^{n} m_i P_i^* \right),$$

which implies that $\lambda_i = \beta m_i s_i^{\alpha}$ and $s_i = (\lambda_i/(\beta m_i))^{1/\alpha}$, for all $1 \le i \le n$, where

$$\beta = -\alpha \frac{\psi}{\phi} = \frac{\lambda}{\tilde{P} - (m_1 P_1^* + m_2 P_2^* + \cdots + m_n P_n^*)}.$$

This proves the theorem. □

By the above theorem, we immediately get the power allocation as

$$P_i = m_i \left( s_i^{\alpha} + P_i^* \right) = m_i \left( \frac{\lambda_i}{\beta m_i} + P_i^* \right) = \frac{\lambda_i}{\lambda} \left( \tilde{P} - \sum_{j=1}^{n} m_j P_j^* \right) + m_i P_i^*,$$

for all $1 \le i \le n$. Notice that the above expression of $P_i$ is the same as that of the idle-speed model; however, the actual value of $P_i$ is different because $\lambda_i$ is different; also, the resulting speed $s_i$ is different.

The above theorem essentially means that $T_i$ can be viewed as a function of $\lambda_i$, and $T$ can be viewed as a function of $\lambda_1, \lambda_2, \ldots, \lambda_n$. It remains to find $\lambda_1, \lambda_2, \ldots, \lambda_n$. The method will be presented in the next section. Notice that if we keep the relation $s_i = (\lambda_i/(\beta m_i))^{1/\alpha}$, then $\rho_i$ only depends on (and is an increasing function of) $\lambda_i$, and we have

$$\rho_i = \frac{\lambda_i \bar{r}}{m_i s_i} = \beta \bar{r} s_i^{\alpha - 1} = \beta \bar{r} \left( \frac{\lambda_i}{\beta m_i} \right)^{(\alpha - 1)/\alpha} = \beta^{1/\alpha} \bar{r} \left( \frac{\lambda_i}{m_i} \right)^{(\alpha - 1)/\alpha},$$

which implies that

$$\frac{\partial \rho_i}{\partial \lambda_i} = \frac{\beta^{1/\alpha} \bar{r}}{m_i} \left( 1 - \frac{1}{\alpha} \right) \left( \frac{\lambda_i}{m_i} \right)^{-1/\alpha},$$

and

$$\lambda_i \frac{\partial \rho_i}{\partial \lambda_i} = \beta^{1/\alpha} \bar{r} \left( 1 - \frac{1}{\alpha} \right) \left( \frac{\lambda_i}{m_i} \right)^{1 - 1/\alpha} = \left( 1 - \frac{1}{\alpha} \right) \rho_i,$$

for all $1 \le i \le n$. Thus, we obtain

$$\frac{\partial T}{\partial \lambda_i} = \frac{1}{\lambda} \left( T_i + \lambda_i \frac{\partial \rho_i}{\partial \lambda_i} \cdot \frac{\partial T_i}{\partial \rho_i} \right) = \frac{1}{\lambda} \left( T_i + \left( 1 - \frac{1}{\alpha} \right) \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right),$$

and, similar to (4), we get

$$\frac{1}{\lambda} \left( T_i + \left( 1 - \frac{1}{\alpha} \right) \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) = \phi, \qquad (7)$$

for all $1 \le i \le n$. We observe that (4) and (7) are very similar and can be solved by using the same algorithm.
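A matching sketch for Theorem 2 (our own illustration; the three-server load split, the power budget, and $\alpha = 3$ are hypothetical): once a load distribution is fixed, $\beta$ and the per-server speeds follow immediately.

```python
# Constant-speed model, Theorem 2: s_i = (lambda_i / (beta * m_i))**(1/alpha),
# where beta = lambda / (P~ - sum_i m_i * P_i*).
def constant_speed_setting(lams, m, P_star, P_tilde, alpha):
    lam = sum(lams)
    beta = lam / (P_tilde - sum(mi * pi for mi, pi in zip(m, P_star)))
    return [(li / (beta * mi)) ** (1.0 / alpha) for li, mi in zip(lams, m)]

# Hypothetical load split over three servers (the optimal split comes from Section 7).
print(constant_speed_setting([10.0, 20.0, 25.0], [2, 4, 8], [2.0, 2.0, 2.0], 100.0, 3.0))
```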

7 THE ALGORITHM

Since (4) and (7) are very similar, the optimal load distribution $\lambda_1, \lambda_2, \ldots, \lambda_n$ and the optimal server speeds $s_1, s_2, \ldots, s_n$ (i.e., the optimal power allocation $P_1, P_2, \ldots, P_n$), as well as the optimal task response time $T$, can be found by using the same algorithm for the two core speed and power consumption models. We use a Boolean parameter idle to distinguish the two models, namely, we refer to the idle-speed model when idle is true, and the constant-speed model when idle is false.

Given $n$, $m_1, m_2, \ldots, m_n$, $P_1^*, P_2^*, \ldots, P_n^*$, $\bar{r}$, $\lambda$, $\tilde{P}$, and idle, our optimal power allocation and load distribution algorithm to find $\phi$, $\lambda_1, \lambda_2, \ldots, \lambda_n$, $s_1, s_2, \ldots, s_n$, and $T$ is given in Fig. 3. The algorithm uses another subalgorithm Find_$\lambda_i$ described in Fig. 2, which, given $m_i$, $\bar{r}$, $\alpha$, idle, $\phi$, $s$, and $\beta$, finds $\lambda_i$ that satisfies (4) or (7).

The key observation is that the left-hand sides of (4) and (7) are increasing functions of $\lambda_i$. Therefore, given $\phi$, we can find $\lambda_i$ by using the bisection method to search $\lambda_i$ in a certain interval $[lb, ub]$ (lines (5)-(14) in Fig. 2). We set $lb$ simply as 0 (line (2)). For $ub$, we notice that

$$\lambda_i = \frac{\rho_i m_i s}{\bar{r}} < \frac{m_i s}{\bar{r}},$$

for the idle-speed model with $s$ given in Theorem 1, and

$$\lambda_i = \frac{m_i \rho_i^{\alpha/(\alpha-1)}}{\beta^{1/(\alpha-1)} \bar{r}^{\alpha/(\alpha-1)}} < \frac{m_i}{\beta^{1/(\alpha-1)} \bar{r}^{\alpha/(\alpha-1)}},$$

for the constant-speed model with $\beta$ given in Theorem 2, where $1 \le i \le n$. Hence, $ub$ is set in line (3) based on the above discussion, where the value of a conditional expression $c\ ?\ u : v$ is $u$ if the condition $c$ is true and $v$ if the condition $c$ is false. For a given $\lambda_i$, the server speed $s_i$ is $s$ given by Theorem 1 for the idle-speed model, and $(\lambda_i/(\beta m_i))^{1/\alpha}$ given by Theorem 2 for the constant-speed model (line (8)). The left-hand side of (4) or (7) is compared with $(1 - 1/\alpha)\phi$ or $\phi$ (line (9)), and the search interval $[lb, ub]$ is adjusted accordingly (lines (10) and (12)).

Fig. 2. An algorithm to find $\lambda_i$.

Fig. 3. An algorithm to calculate the minimized $T$.

The value of $\phi$ can also be found by using the bisection method (lines (14)-(25) in Fig. 3). The search interval $[lb, ub]$ for $\phi$ is determined as follows. We set $lb$ simply as 0 (line (12)). As for $ub$, we notice that $\phi$ is at least $(1/\lambda)(\bar{r}/s)$ (line (4)) for both speed models, because $T_i \ge \bar{x}_i = \bar{r}/s$. Then, $\phi$ is repeatedly doubled (line (7)), until the sum of the $\lambda_i$'s found by Find_$\lambda_i$ (line (8)) is at least $\lambda$ (lines (5)-(10) and (13)). Once $[lb, ub]$ is decided, $\phi$ can be searched based on the fact that the $\lambda_i$'s increase with $\phi$ (lines (20)-(24)). After $\phi$ is determined (line (26)), $\lambda_1, \lambda_2, \ldots, \lambda_n$ and $s_1, s_2, \ldots, s_n$ can be computed routinely (lines (28)-(31)) and $T$ can be obtained easily (line (33)).
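Figs. 2 and 3 themselves are not reproduced in this text, so the following Python sketch gives one possible rendering of the two-level bisection described above for the idle-speed model; it is our own illustration, not the authors' code. It approximates $\partial T_i/\partial \lambda_i$ by a central difference instead of the closed-form derivative of Section 5, assumes $\alpha = 3$, and uses fixed iteration counts in place of the tolerance tests of the original figures; the constant-speed model would follow the same structure with $s_i = (\lambda_i/(\beta m_i))^{1/\alpha}$ and the left-hand side of (7).

```python
import math

alpha = 3.0   # power exponent; alpha = 3 is our assumption for this sketch

def t_i(lam_i, m_i, s_i, r_bar):
    """M/M/m mean response time T_i from Section 4."""
    rho = (lam_i / m_i) * (r_bar / s_i)
    if rho >= 1.0:
        return math.inf
    p0 = 1.0 / (sum((m_i * rho) ** k / math.factorial(k) for k in range(m_i))
                + (m_i * rho) ** m_i / (math.factorial(m_i) * (1.0 - rho)))
    return (r_bar / s_i) * (1.0 + p0 * (m_i ** (m_i - 1) / math.factorial(m_i))
                            * rho ** m_i / (1.0 - rho) ** 2)

def lhs4(lam_i, m_i, s, r_bar, lam):
    """Left-hand side of (4); dT_i/dlam_i is approximated by a central difference."""
    h = 1e-6
    dT = (t_i(lam_i + h, m_i, s, r_bar) - t_i(max(lam_i - h, 0.0), m_i, s, r_bar)) / (2.0 * h)
    return (t_i(lam_i, m_i, s, r_bar) + lam_i * dT) / lam

def find_lambda_i(m_i, r_bar, phi, s, lam):
    """Fig. 2 (idle-speed case): bisect lam_i until the LHS of (4) reaches (1 - 1/alpha)*phi."""
    target = (1.0 - 1.0 / alpha) * phi
    lb, ub = 0.0, m_i * s / r_bar            # lam_i < m_i * s / r_bar keeps rho_i < 1
    for _ in range(60):
        mid = (lb + ub) / 2.0
        if lhs4(mid, m_i, s, r_bar, lam) < target:
            lb = mid
        else:
            ub = mid
    return (lb + ub) / 2.0

def calculate_t(m, p_star, r_bar, lam, p_tilde):
    """Fig. 3 (idle-speed case): bisect phi so that the lam_i sum to lam."""
    budget = p_tilde - sum(mi * pi for mi, pi in zip(m, p_star))
    s = (budget / (lam * r_bar)) ** (1.0 / (alpha - 1.0))    # Theorem 1
    total = lambda phi: sum(find_lambda_i(mi, r_bar, phi, s, lam) for mi in m)
    lb, ub = 0.0, (1.0 / lam) * (r_bar / s)
    while total(ub) < lam:                                    # double phi until feasible
        lb, ub = ub, 2.0 * ub
    for _ in range(60):                                       # bisection on phi
        phi = (lb + ub) / 2.0
        lb, ub = (phi, ub) if total(phi) < lam else (lb, phi)
    phi = (lb + ub) / 2.0
    lams = [find_lambda_i(mi, r_bar, phi, s, lam) for mi in m]
    T = sum(li / lam * t_i(li, mi, s, r_bar) for li, mi in zip(lams, m))
    return lams, s, T

# Example 1 input from Section 9: n = 7, m_i = 2i, P_i* = 2 W, r_bar = 1, lam = 55, P~ = 200 W.
lams, s, T = calculate_t([2, 4, 6, 8, 10, 12, 14], [2.0] * 7, 1.0, 55.0, 200.0)
```

The final line runs the sketch on the Example 1 input of Section 9; the constant-speed branch of Figs. 2 and 3 would additionally recompute $s_i$ from each trial $\lambda_i$ via Theorem 2.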

8 THE EQUAL-POWER (EP) METHOD

In this section, we analyze a baseline method, i.e., the equal-power method. The baseline method is a simple method without optimization applied.

8.1 The Analysis

In the equal-power method, all cores of all servers consume the same amount of power. Let $M = m_1 + m_2 + \cdots + m_n$ be the total number of cores of the $n$ multicore servers.

8.1.1 The Idle-Speed Model

The following result gives the server speed setting for the idle-speed model.

Theorem 3. For the idle-speed model, we have $s_i = c_i/\lambda_i^{1/(\alpha - 1)}$, where

$$c_i = \left( \frac{m_i}{\bar{r}} \left( \frac{\tilde{P}}{M} - P_i^* \right) \right)^{1/(\alpha - 1)},$$

for all $1 \le i \le n$.

Proof. Since a core in server $S_i$ consumes power $\rho_i s_i^{\alpha} + P_i^*$, and all cores consume the same amount of power, we have

$$\rho_i s_i^{\alpha} + P_i^* = \frac{\lambda_i \bar{r}}{m_i} s_i^{\alpha - 1} + P_i^* = \frac{\tilde{P}}{M},$$

which gives

$$s_i = \left( \frac{m_i}{\lambda_i \bar{r}} \left( \frac{\tilde{P}}{M} - P_i^* \right) \right)^{1/(\alpha - 1)},$$

for all $1 \le i \le n$. □

By the above theorem, we immediately get

$$P_i = \lambda_i \bar{r} s_i^{\alpha - 1} + m_i P_i^* = \lambda_i \bar{r} \cdot \frac{m_i}{\lambda_i \bar{r}} \left( \frac{\tilde{P}}{M} - P_i^* \right) + m_i P_i^* = \frac{m_i}{M} \tilde{P},$$

for all $1 \le i \le n$.

Since $s_i$ is a function of $\lambda_i$ and no longer an independent variable, our optimal power allocation and load distribution problem for multiple heterogeneous multicore server processors is modified as follows: given the number of multicore servers $n$, the sizes of the servers $m_1, m_2, \ldots, m_n$, the base power supply $P_1^*, P_2^*, \ldots, P_n^*$, the average task execution requirement $\bar{r}$, the task arrival rate $\lambda$, and the available power $\tilde{P}$, find the task arrival rates on the servers $\lambda_1, \lambda_2, \ldots, \lambda_n$, such that the average task response time $T(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is minimized, subject to the constraint

$$F(\lambda_1, \lambda_2, \ldots, \lambda_n) = \lambda_1 + \lambda_2 + \cdots + \lambda_n = \lambda,$$

and $\rho_i < 1$, for all $1 \le i \le n$.

By using the method of Lagrange multiplier, namely,

$$\nabla T(\lambda_1, \lambda_2, \ldots, \lambda_n) = \phi \nabla F(\lambda_1, \lambda_2, \ldots, \lambda_n),$$

that is,

$$\frac{\partial T}{\partial \lambda_i} = \phi \frac{\partial F}{\partial \lambda_i},$$

for all $1 \le i \le n$, where $\phi$ is a Lagrange multiplier, we get

$$\frac{1}{\lambda} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) = \phi.$$

Notice that

$$\frac{\partial s_i}{\partial \lambda_i} = -\frac{1}{\alpha - 1} \cdot c_i \lambda_i^{-\alpha/(\alpha - 1)} = -\frac{1}{\alpha - 1} \cdot \frac{s_i}{\lambda_i},$$

and

$$\frac{\partial \rho_i}{\partial \lambda_i} = \frac{\bar{r}}{m_i s_i} - \frac{\lambda_i \bar{r}}{m_i s_i^2} \cdot \frac{\partial s_i}{\partial \lambda_i} = \frac{\rho_i}{\lambda_i} - \frac{\rho_i}{s_i} \cdot \frac{\partial s_i}{\partial \lambda_i} = \rho_i \left( \frac{1}{\lambda_i} + \frac{1}{\alpha - 1} \cdot \frac{1}{\lambda_i} \right) = \frac{\alpha}{\alpha - 1} \cdot \frac{\rho_i}{\lambda_i}.$$

Hence, we get

$$\frac{\partial T_i}{\partial \lambda_i} = -\frac{\bar{r}}{s_i^2} \cdot \frac{\partial s_i}{\partial \lambda_i} \cdot \frac{T_i s_i}{\bar{r}} + \frac{m_i^{m_i - 1}}{m_i!} \cdot \frac{\bar{r}}{s_i} \left( \frac{\partial p_{i,0}}{\partial \rho_i} \cdot \frac{\rho_i^{m_i}}{(1 - \rho_i)^2} + p_{i,0} \frac{\rho_i^{m_i - 1}(m_i - (m_i - 2)\rho_i)}{(1 - \rho_i)^3} \right) \frac{\partial \rho_i}{\partial \lambda_i}$$
$$= \frac{1}{\alpha - 1} \cdot \frac{T_i}{\lambda_i} + \frac{\alpha}{\alpha - 1} \cdot \frac{m_i^{m_i - 1}}{m_i!} \cdot \frac{\rho_i}{\lambda_i} \cdot \frac{\bar{r}}{s_i} \left( \frac{\partial p_{i,0}}{\partial \rho_i} \cdot \frac{\rho_i^{m_i}}{(1 - \rho_i)^2} + p_{i,0} \frac{\rho_i^{m_i - 1}(m_i - (m_i - 2)\rho_i)}{(1 - \rho_i)^3} \right),$$

where

$$\frac{\partial p_{i,0}}{\partial \rho_i} = -p_{i,0}^2 \left( \sum_{k=1}^{m_i - 1} \frac{m_i^k \rho_i^{k-1}}{(k-1)!} + \frac{m_i^{m_i}}{m_i!} \cdot \frac{\rho_i^{m_i - 1}(m_i - (m_i - 1)\rho_i)}{(1 - \rho_i)^2} \right),$$

for all $1 \le i \le n$. Consequently, we have

$$\frac{\alpha}{\alpha - 1} \cdot \frac{1}{\lambda} \left( T_i + \frac{m_i^{m_i - 1}}{m_i!} \rho_i \cdot \frac{\bar{r}}{s_i} \left( \frac{\partial p_{i,0}}{\partial \rho_i} \cdot \frac{\rho_i^{m_i}}{(1 - \rho_i)^2} + p_{i,0} \frac{\rho_i^{m_i - 1}(m_i - (m_i - 2)\rho_i)}{(1 - \rho_i)^3} \right) \right) = \phi, \qquad (8)$$

for all $1 \le i \le n$. The above equation looks like

$$\frac{1}{\lambda} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) = \left( 1 - \frac{1}{\alpha} \right) \phi,$$

for all $1 \le i \le n$.

8.1.2 The Constant-Speed Model

The following result gives the server speed setting for the constant-speed model.

Theorem 4. For the constant-speed model, we have $s_i = d_i^{1/\alpha}$, where $d_i = \tilde{P}/M - P_i^*$, for all $1 \le i \le n$.

Proof. Since a core in server $S_i$ consumes power $s_i^{\alpha} + P_i^*$, and all cores consume the same amount of power, we have $s_i^{\alpha} + P_i^* = \tilde{P}/M$, which gives $s_i = (\tilde{P}/M - P_i^*)^{1/\alpha}$, for all $1 \le i \le n$. □

By the above theorem, we immediately get the power allocation as

$$P_i = m_i \left( s_i^{\alpha} + P_i^* \right) = \frac{m_i}{M} \tilde{P},$$

for all $1 \le i \le n$.

Notice that

$$\frac{\partial s_i}{\partial \lambda_i} = 0, \qquad \text{and} \qquad \frac{\partial \rho_i}{\partial \lambda_i} = \frac{\bar{r}}{m_i s_i} - \frac{\lambda_i \bar{r}}{m_i s_i^2} \cdot \frac{\partial s_i}{\partial \lambda_i} = \frac{\rho_i}{\lambda_i}.$$

Hence, we get

$$\frac{\partial T_i}{\partial \lambda_i} = -\frac{\bar{r}}{s_i^2} \cdot \frac{\partial s_i}{\partial \lambda_i} \cdot \frac{T_i s_i}{\bar{r}} + \frac{m_i^{m_i - 1}}{m_i!} \cdot \frac{\bar{r}}{s_i} \left( \frac{\partial p_{i,0}}{\partial \rho_i} \cdot \frac{\rho_i^{m_i}}{(1 - \rho_i)^2} + p_{i,0} \frac{\rho_i^{m_i - 1}(m_i - (m_i - 2)\rho_i)}{(1 - \rho_i)^3} \right) \frac{\partial \rho_i}{\partial \lambda_i}$$
$$= \frac{m_i^{m_i - 1}}{m_i!} \cdot \frac{\rho_i}{\lambda_i} \cdot \frac{\bar{r}}{s_i} \left( \frac{\partial p_{i,0}}{\partial \rho_i} \cdot \frac{\rho_i^{m_i}}{(1 - \rho_i)^2} + p_{i,0} \frac{\rho_i^{m_i - 1}(m_i - (m_i - 2)\rho_i)}{(1 - \rho_i)^3} \right).$$

Consequently, we have

$$\frac{1}{\lambda} \left( T_i + \frac{m_i^{m_i - 1}}{m_i!} \rho_i \cdot \frac{\bar{r}}{s_i} \left( \frac{\partial p_{i,0}}{\partial \rho_i} \cdot \frac{\rho_i^{m_i}}{(1 - \rho_i)^2} + p_{i,0} \frac{\rho_i^{m_i - 1}(m_i - (m_i - 2)\rho_i)}{(1 - \rho_i)^3} \right) \right) = \phi, \qquad (9)$$

for all $1 \le i \le n$. The above equation looks like

$$\frac{1}{\lambda} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) = \phi,$$

for all $1 \le i \le n$.

8.2 The Algorithm

The left-hand sides of (8) and (9) are increasing functions of $\lambda_i$. Therefore, we can adapt our algorithm in Section 7 to find an optimal equal-power solution.

The Find_$\lambda_i(m_i, \bar{r}, \alpha, idle, \phi, c_i, d_i)$ algorithm has a change in the input parameters, i.e., $s$ and $\beta$ are replaced by $c_i$ and $d_i$. For the idle-speed model, since

$$\rho_i = \frac{\lambda_i}{m_i} \cdot \frac{\bar{r} \lambda_i^{1/(\alpha - 1)}}{c_i} = \frac{1}{c_i} \cdot \frac{\bar{r}}{m_i} \lambda_i^{\alpha/(\alpha - 1)} < 1,$$

we need

$$\lambda_i < \left( c_i \frac{m_i}{\bar{r}} \right)^{(\alpha - 1)/\alpha},$$

for all $1 \le i \le n$. For the constant-speed model, since

$$\rho_i = \frac{\lambda_i}{m_i} \cdot \frac{\bar{r}}{s_i} < 1,$$

we need

$$\lambda_i < \frac{m_i}{\bar{r}} s_i = \frac{m_i}{\bar{r}} d_i^{1/\alpha},$$

for all $1 \le i \le n$. Line (3) is, therefore, changed to

$$ub \leftarrow idle\ ?\ \left( c_i \frac{m_i}{\bar{r}} \right)^{(\alpha - 1)/\alpha} : \frac{m_i}{\bar{r}} d_i^{1/\alpha}.$$

According to Theorems 3 and 4, line (8) is changed to

$$s_i \leftarrow idle\ ?\ c_i/\lambda_i^{1/(\alpha - 1)} : d_i^{1/\alpha}.$$

According to (8) and (9), line (9) is changed to

$$\textbf{if}\ \left( \frac{1}{\lambda} \left( T_i + \lambda_i \frac{\partial T_i}{\partial \lambda_i} \right) < (idle\ ?\ (1 - 1/\alpha) : 1)\,\phi \right).$$

In the Calculate_$T$ algorithm, the output parameters are changed to $\phi$, $\lambda_1, \lambda_2, \ldots, \lambda_n$, and $T$. Lines (1) and (2) are changed to

$$c_i \leftarrow \left( \frac{m_i}{\bar{r}} \left( \frac{\tilde{P}}{M} - P_i^* \right) \right)^{1/(\alpha - 1)}, \quad \text{for all } 1 \le i \le n; \qquad d_i \leftarrow \frac{\tilde{P}}{M} - P_i^*, \quad \text{for all } 1 \le i \le n.$$

In lines (8), (18), and (29), the parameters in the calls to the Find_$\lambda_i(m_i, \bar{r}, \alpha, idle, \phi, c_i, d_i)$ algorithm are changed accordingly. Line (30) should be identical to line (8) in Find_$\lambda_i$.
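A minimal sketch of the changed quantities (our own code; $\alpha = 3$ and the inputs in the example call are the Example 3/4 setting of Section 9): under the equal-power assumption, the constants $c_i$ and $d_i$ and the new upper bounds on $\lambda_i$ follow directly from Theorems 3 and 4.

```python
# Equal-power (EP) baseline: every core of every server receives P~ / M watts.
def ep_constants(m, P_star, P_tilde, alpha, r_bar):
    M = sum(m)
    c = [((mi / r_bar) * (P_tilde / M - pi)) ** (1.0 / (alpha - 1.0))
         for mi, pi in zip(m, P_star)]                               # Theorem 3 (idle-speed)
    d = [P_tilde / M - pi for pi in P_star]                          # Theorem 4 (constant-speed)
    ub_idle = [(ci * mi / r_bar) ** ((alpha - 1.0) / alpha) for ci, mi in zip(c, m)]
    ub_const = [(mi / r_bar) * di ** (1.0 / alpha) for mi, di in zip(m, d)]
    return c, d, ub_idle, ub_const

# Example 3/4 inputs: m_i = 2i, P_i* = 2 W, P~ = 200 W, alpha = 3 (assumed), r_bar = 1.
c, d, ub_idle, ub_const = ep_constants([2, 4, 6, 8, 10, 12, 14], [2.0] * 7, 200.0, 3.0, 1.0)
```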

9 NUMERICAL EXAMPLES

In this section, we demonstrate some numerical examples. Notice that all parameters in our examples are for illustration only. They can be changed to any other values in any real clouds.

9.1 The Optimal Method

First, we show two numerical examples of the optimal method.

Example 1. Let us consider a group of $n = 7$ multicore servers $S_1, S_2, \ldots, S_7$. The server sizes are $m_i = 2i$, where $1 \le i \le n$. The base power consumption is $P_i^* = 2$ Watts, for all $1 \le i \le n$. The average task execution requirement is $\bar{r} = 1$ (giga instructions). The task arrival rate is $\lambda = 55$ per second. Assume that we are given $\tilde{P} = 200$ Watts. In Table 1, we show the optimal load distribution $\lambda_1, \lambda_2, \ldots, \lambda_n$, the optimal server speeds $s_1, s_2, \ldots, s_n$, the optimal power allocation $P_1, P_2, \ldots, P_n$, the server utilizations $\rho_1, \rho_2, \ldots, \rho_n$, and the average task response times $T_1, T_2, \ldots, T_n$ on the $n$ servers, for the idle-speed model. The overall average task response time of the $n$ multicore servers is $T = 0.94682$ seconds.


Example 2. In Table 2, we demonstrate the same data with the same input as in Example 1 for the constant-speed model. It is observed that slightly more load and power are shifted from servers $S_1$-$S_3$ to servers $S_4$-$S_7$. Due to constant power consumption, server speeds are reduced, which results in increased server utilizations and increased task response times. The overall average task response time of the $n$ multicore servers is $T = 1.20203$ seconds, which is greater than that of the idle-speed model.

9.2 The Equal-Power Method

Now, we show two numerical examples of the equal-power method.

Example 3. Let us consider the same group of multicore servers in Example 1 with the same parameter setting. In Table 3, we show the optimal load distribution $\lambda_1, \lambda_2, \ldots, \lambda_n$, the optimal server speeds $s_1, s_2, \ldots, s_n$, the optimal power allocation $P_1, P_2, \ldots, P_n$, the server utilizations $\rho_1, \rho_2, \ldots, \rho_n$, and the average task response times $T_1, T_2, \ldots, T_n$ on the $n$ servers, for the idle-speed model, produced by the equal-power method. It is observed that compared with the optimal solution in Example 1, workloads are slightly shifted from servers $S_5$-$S_7$ to servers $S_1$-$S_4$. Furthermore, server speeds are now different. Compared with the optimal solution in Example 1, servers $S_5$-$S_7$ have reduced workload, reduced speeds, reduced power consumption, reduced utilization, and increased response time. The overall average task response time of the $n$ multicore servers is $T = 0.94887$ seconds, which is slightly greater than that of the optimal solution.

Example 4. In Table 4, we demonstrate the same data with the same input as in Example 3 for the constant-speed model. Notice that the server speeds happen to be the same, because $P_1^* = P_2^* = \cdots = P_n^*$. Again, compared with the optimal solution in Example 2, servers $S_5$-$S_7$ have reduced workload, reduced speeds, reduced power consumption, reduced utilization, and increased response time. The overall average task response time of the $n$ multicore servers is $T = 1.20508$ seconds, which is slightly greater than that of the optimal solution.

10 PERFORMANCE COMPARISON

In this section, we compare our optimal solutions with the solutions produced by the equal-power method.

10.1 The Optimal Method

To plot the minimized average task response time $T$ as a function of $\lambda$, we need to know the maximum task arrival rate $\lambda_{\max}$. For the idle-speed model, since $\lambda_i < m_i(s_i/\bar{r}) = m_i(s/\bar{r})$, $\lambda_{\max}$ satisfies $\lambda_{\max} < M(s/\bar{r})$, where $M = \sum_{i=1}^{n} m_i$. By Theorem 1, we have

$$s = \left( \frac{1}{\lambda_{\max} \bar{r}} \left( \tilde{P} - (m_1 P_1^* + m_2 P_2^* + \cdots + m_n P_n^*) \right) \right)^{1/(\alpha - 1)},$$

that is,

$$\lambda_{\max}^{\alpha - 1} < \frac{M^{\alpha - 1}}{\bar{r}^{\alpha - 1}} \cdot \frac{1}{\lambda_{\max} \bar{r}} \left( \tilde{P} - (m_1 P_1^* + m_2 P_2^* + \cdots + m_n P_n^*) \right),$$

or,

$$\lambda_{\max}^{\alpha} < \frac{M^{\alpha - 1}}{\bar{r}^{\alpha}} \left( \tilde{P} - (m_1 P_1^* + m_2 P_2^* + \cdots + m_n P_n^*) \right),$$

which implies that

$$\lambda_{\max} < \frac{1}{\bar{r}} M^{(\alpha - 1)/\alpha} \left( \tilde{P} - (m_1 P_1^* + m_2 P_2^* + \cdots + m_n P_n^*) \right)^{1/\alpha}.$$

TABLE 1: Numerical Data in Example 1

TABLE 2: Numerical Data in Example 2

TABLE 3: Numerical Data in Example 3

TABLE 4: Numerical Data in Example 4

For the constant-speed model, by Theorem 2, we have

$$\lambda_i < \frac{m_i s_i}{\bar{r}} = \frac{m_i}{\bar{r}} \left( \frac{\lambda_i}{\beta m_i} \right)^{1/\alpha},$$

that is,

$$\lambda_i < \frac{m_i}{\beta^{1/(\alpha - 1)} \bar{r}^{\alpha/(\alpha - 1)}},$$

where

$$\beta = \frac{\lambda_{\max}}{\tilde{P} - (m_1 P_1^* + m_2 P_2^* + \cdots + m_n P_n^*)}.$$

Consequently, $\lambda_{\max}$ satisfies

$$\lambda_{\max} < \frac{M}{\beta^{1/(\alpha - 1)} \bar{r}^{\alpha/(\alpha - 1)}},$$

that is,

$$\lambda_{\max} \beta^{1/(\alpha - 1)} < \frac{M}{\bar{r}^{\alpha/(\alpha - 1)}},$$

or,

$$\lambda_{\max}^{\alpha - 1} \beta < \frac{M^{\alpha - 1}}{\bar{r}^{\alpha}},$$

or,

$$\frac{\lambda_{\max}^{\alpha}}{\tilde{P} - (m_1 P_1^* + m_2 P_2^* + \cdots + m_n P_n^*)} < \frac{M^{\alpha - 1}}{\bar{r}^{\alpha}},$$

which implies that

$$\lambda_{\max} < \frac{1}{\bar{r}} M^{(\alpha - 1)/\alpha} \left( \tilde{P} - (m_1 P_1^* + m_2 P_2^* + \cdots + m_n P_n^*) \right)^{1/\alpha}.$$

Hence, both models have the same $\lambda_{\max}$.

For the same servers in Examples 1 and 2, we display the minimized average task response time $T$ as a function of $\lambda$ in Figs. 4 and 5 for the two core speed models, respectively, where $\tilde{P} = 150, 175, 200, 225, 250$ Watts. It is clear that the constant-speed model yields a longer average task response time than the idle-speed model.

10.2 The Baseline Method

To plot the average task response time $T$ obtained by the equal-power method as a function of $\lambda$, we need to know the maximum task arrival rate $\lambda_{\max}$. For the idle-speed model, by Theorem 3, since $\lambda_i < m_i(s_i/\bar{r}) = (m_i/\bar{r})(c_i/\lambda_i^{1/(\alpha-1)})$, we get $\lambda_i < (c_i(m_i/\bar{r}))^{(\alpha-1)/\alpha}$, and $\lambda_{\max}$ satisfies

$$\lambda_{\max} < \sum_{i=1}^{n} \left( c_i \frac{m_i}{\bar{r}} \right)^{(\alpha - 1)/\alpha}.$$

For the constant-speed model, by Theorem 4, since

$$\lambda_i < \frac{m_i s_i}{\bar{r}} = \frac{m_i}{\bar{r}} d_i^{1/\alpha},$$

we get

$$\lambda_{\max} < \sum_{i=1}^{n} \frac{m_i}{\bar{r}} d_i^{1/\alpha}.$$

It can be verified that both models have the same $\lambda_{\max}$, i.e.,

$$\lambda_{\max} < \sum_{i=1}^{n} \frac{m_i}{\bar{r}} \left( \frac{\tilde{P}}{M} - P_i^* \right)^{1/\alpha}.$$
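The two saturation points are easy to compare numerically. The sketch below is our illustration; $\alpha = 3$ is assumed and the unequal base powers are hypothetical, chosen only so that the inequality discussed next is strict.

```python
def lambda_max_optimal(m, P_star, P_tilde, alpha, r_bar):
    # Optimal method (both speed models): (1/r_bar) * M**((alpha-1)/alpha) * (P~ - sum m_i P_i*)**(1/alpha)
    M = sum(m)
    budget = P_tilde - sum(mi * pi for mi, pi in zip(m, P_star))
    return (1.0 / r_bar) * M ** ((alpha - 1.0) / alpha) * budget ** (1.0 / alpha)

def lambda_max_equal_power(m, P_star, P_tilde, alpha, r_bar):
    # Equal-power method: sum_i (m_i / r_bar) * (P~/M - P_i*)**(1/alpha)
    M = sum(m)
    return sum((mi / r_bar) * (P_tilde / M - pi) ** (1.0 / alpha) for mi, pi in zip(m, P_star))

m = [2, 4, 6, 8, 10, 12, 14]
P_star = [1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 3.0]   # unequal base powers (hypothetical)
print(lambda_max_optimal(m, P_star, 200.0, 3.0, 1.0))       # larger saturation point
print(lambda_max_equal_power(m, P_star, 200.0, 3.0, 1.0))   # smaller, by concavity of f(z)
```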

Furthermore, we have

$$\sum_{i=1}^{n} \frac{m_i}{\bar{r}} \left( \frac{\tilde{P}}{M} - P_i^* \right)^{1/\alpha} \le \frac{1}{\bar{r}} M^{(\alpha - 1)/\alpha} \left( \tilde{P} - \sum_{i=1}^{n} m_i P_i^* \right)^{1/\alpha},$$

where the equality holds only when $P_1^* = P_2^* = \cdots = P_n^*$. To show this, we notice that the inequality is actually

$$\sum_{i=1}^{n} \frac{m_i}{M} \left( \frac{\tilde{P}}{M} - P_i^* \right)^{1/\alpha} \le \left( \frac{\tilde{P}}{M} - \sum_{i=1}^{n} \frac{m_i}{M} P_i^* \right)^{1/\alpha}.$$

Let us consider the function $f(z) = (\tilde{P}/M - z)^{1/\alpha}$. It is easy to verify that $f(z)$ is a concave function (i.e., $f''(z) < 0$), which implies that

$$\sum_{i=1}^{n} \frac{m_i}{M} f(z_i) \le f\left( \sum_{i=1}^{n} \frac{m_i}{M} z_i \right),$$

where the equality holds only when $z_1 = z_2 = \cdots = z_n$, which is equivalent to $P_1^* = P_2^* = \cdots = P_n^*$, if $z_i = P_i^*$, for all $1 \le i \le n$. The above inequality means that the equal-power method has a smaller saturation point than the optimal solution, and thus results in a longer average task response time.

Fig. 4. Average task response time $T$ versus $\lambda$ and $\tilde{P}$ (idle-speed model).

Fig. 5. Average task response time $T$ versus $\lambda$ and $\tilde{P}$ (constant-speed model).

For the same servers in Examples 1-4, we display the average task response time of the equal-power method and the optimal (OPT) average task response time as a function of $\lambda$ in Tables 5 and 6 for the idle-speed model and the constant-speed model, respectively. The task arrival rate is $\lambda = y\lambda_{\max}$, where $0.80 \le y \le 0.99$, and $\lambda_{\max}$ is the saturation point of the equal-power method. The power constraint is $\tilde{P} = 150, 175, 200, 225, 250$ Watts. It is easily observed that the equal-power method yields a noticeably longer average task response time than the optimal average task response time, especially when $\lambda$ is close to $\lambda_{\max}$ and $\tilde{P}$ is not large.

11 PERFORMANCE CONSTRAINED POWER OPTIMIZATION

Due to space limitation, this section is moved to the supplementary file, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TC.2013.122.

12 CONCLUDING REMARKS

We have mentioned the importance and significance of performance optimization and power reduction in data centers for cloud computing. We demonstrated the feasibility of studying the power-performance tradeoff for a cloud of clouds with heterogeneous servers in an analytical way. We have described a queuing model for a group of heterogeneous multicore servers with different sizes and speeds. We formulated two optimization problems for optimal power allocation and load distribution on multiple heterogeneous multicore server processors in a cloud computing environment. We proved the optimal server speed setting for two different core speed models, i.e., the idle-speed model and the constant-speed model. We also developed our algorithms to solve the optimization problems and demonstrated some numerical examples. Our work makes an initial contribution to optimal load distribution with power and performance constraints for multiple queuing systems with multiple servers. Our models, results, and algorithms in this paper are also applicable to other server systems and computing environments.

TABLE 5: Comparison of Equal-Power Solutions and Optimal Solutions (Idle-Speed Model)

TABLE 6: Comparison of Equal-Power Solutions and Optimal Solutions (Constant-Speed Model)


ACKNOWLEDGMENTS

The authors would like to express their gratitude to the reviewers for their comments on improving the presentation and structure of the paper. This work was partially supported by the Ministry of Science and Technology of China under National 973 Basic Research Grants No. 2011CB302805, No. 2013CB228206, and No. 2011CB302505, National Natural Science Foundation of China Grant No. 61233016, and the Academic Exchange Foundation of Tsinghua National Laboratory for Information Science and Technology. The work of I. Stojmenovic was supported by the Government of China for the Tsinghua 1000-Plan Distinguished Professorship (2012-2015) and by an NSERC Canada Discovery grant. Part of the work was performed while K. Li was visiting Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, in the winter of 2011 and 2013 and the summer of 2012 as an Intellectual Ventures endowed visiting chair professor (2011-2013).

REFERENCES

[1] http://en.wikipedia.org/wiki/CMOS, 2013.
[2] http://searchdatacenter.techtarget.com/definition/server-consolidation, 2013.
[3] S. Albers, "Energy-Efficient Algorithms," Comm. ACM, vol. 53, no. 5, pp. 86-96, 2010.
[4] A. Beloglazov, R. Buyya, Y.C. Lee, and A. Zomaya, "A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems," Advances in Computers, vol. 82, pp. 47-111, 2011.
[5] L. Benini, A. Bogliolo, and G. De Micheli, "A Survey of Design Techniques for System-Level Dynamic Power Management," IEEE Trans. Very Large Scale Integration Systems, vol. 8, no. 3, pp. 299-316, June 2000.
[6] A. Berl, E. Gelenbe, M.D. Girolamo, G. Giuliani, H.D. Meer, M.Q. Dang, and K. Pentikousis, "Energy-Efficient Cloud Computing," The Computer J., vol. 53, pp. 1045-1051, 2009.
[7] F. Bonomi and A. Kumar, "Adaptive Optimal Load Balancing in a Nonhomogeneous Multiserver System with a Central Job Scheduler," IEEE Trans. Computers, vol. 39, no. 10, pp. 1232-1250, Oct. 1990.
[8] A.P. Chandrakasan, S. Sheng, and R.W. Brodersen, "Low-Power CMOS Digital Design," IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473-484, Apr. 1992.
[9] J.S. Chase, D.C. Anderson, P.N. Thakar, A.M. Vahdat, and R.P. Doyle, "Managing Energy and Server Resources in Hosting Centers," Proc. 18th ACM Symp. Operating Systems Principles, pp. 103-116, 2001.
[10] G. Chen, W. He, J. Liu, S. Nath, L. Rigas, L. Xiao, and F. Zhao, "Energy-Aware Server Provisioning and Load Dispatching for Connection-Intensive Internet Services," Proc. Fifth USENIX Symp. Networked Systems Design and Implementation, pp. 337-350, 2008.
[11] Y. Chen, A. Das, W. Qin, A. Sivasubramaniam, Q. Wang, and N. Gautam, "Managing Server Energy and Operational Costs in Hosting Centers," Proc. ACM SIGMETRICS Int'l Conf. Measurement and Modeling of Computer Systems, pp. 303-314, 2005.
[12] A. Gandhi, V. Gupta, M. Harchol-Balter, and M.A. Kozuch, "Optimality Analysis of Energy-Performance Trade-off for Server Farm Management," Performance Evaluation, vol. 67, no. 11, pp. 1155-1171, 2010.
[13] B. Guenter, N. Jain, and C. Williams, "Managing Cost, Performance, and Reliability Tradeoffs for Energy-Aware Server Provisioning," Proc. IEEE INFOCOM, pp. 1332-1340, 2011.
[14] L. He, S.A. Jarvis, D.P. Spooner, H. Jiang, D.N. Dillenberger, and G.R. Nudd, "Allocating Non-Real-Time and Soft Real-Time Jobs in Multiclusters," IEEE Trans. Parallel and Distributed Systems, vol. 17, no. 2, pp. 99-112, Feb. 2006.
[15] IBM, "The Benefits of Cloud Computing - A New Era of Responsiveness, Effectiveness and Efficiency in IT Service Delivery," Dynamic Infrastructure, July 2009.
[16] H. Kameda, J. Li, C. Kim, and Y. Zhang, Optimal Load Balancing in Distributed Computer Systems. Springer-Verlag, 1997.
[17] B. Khargharia, S. Hariri, F. Szidarovszky, M. Houri, H. El-Rewini, S. Khan, I. Ahmad, and M.S. Yousif, "Autonomic Power and Performance Management for Large-Scale Data Centers," NSF Next Generation Software Program, 2007.
[18] L. Kleinrock, Queuing Systems: Theory, vol. 1. John Wiley & Sons, 1975.
[19] K. Li, "Minimizing Mean Response Time in Heterogeneous Multiple Computer Systems with a Central Stochastic Job Dispatcher," Int'l J. Computers and Applications, vol. 20, no. 1, pp. 32-39, 1998.
[20] K. Li, "Optimizing Average Job Response Time via Decentralized Probabilistic Job Dispatching in Heterogeneous Multiple Computer Systems," The Computer J., vol. 41, no. 4, pp. 223-230, 1998.
[21] K. Li, "Minimizing the Probability of Load Imbalance in Heterogeneous Distributed Computer Systems," Math. and Computer Modelling, vol. 36, nos. 9/10, pp. 1075-1084, 2002.
[22] K. Li, "Optimal Load Distribution in Nondedicated Heterogeneous Cluster and Grid Computing Environments," J. Systems Architecture, vol. 54, nos. 1/2, pp. 111-123, 2008.
[23] K. Li, "Optimal Power Allocation among Multiple Heterogeneous Servers in a Data Center," Sustainable Computing: Informatics and Systems, vol. 2, pp. 13-22, 2012.
[24] K. Li, "Optimal Load Distribution for Multiple Heterogeneous Blade Servers in a Cloud Computing Environment," J. Grid Computing, vol. 11, no. 1, pp. 27-46, 2013.
[25] C.G. Rommel, "The Probability of Load Balancing Success in a Homogeneous Network," IEEE Trans. Software Eng., vol. 17, no. 9, pp. 922-933, Sept. 1991.
[26] K.W. Ross and D.D. Yao, "Optimal Load Balancing and Scheduling in a Distributed Computer System," J. ACM, vol. 38, no. 3, pp. 676-690, 1991.
[27] Scheduling and Load Balancing in Parallel and Distributed Systems, B.A. Shirazi, A.R. Hurson, and K.M. Kavi, eds. IEEE CS Press, 1995.
[28] M.R. Stan and K. Skadron, "Guest Editors' Introduction: Power-Aware Computing," Computer, vol. 36, no. 12, pp. 35-38, Dec. 2003.
[29] X. Tang and S.T. Chanson, "Optimizing Static Job Scheduling in a Network of Heterogeneous Computers," Proc. Int'l Conf. Parallel Processing, pp. 373-382, Aug. 2000.
[30] A.N. Tantawi and D. Towsley, "Optimal Static Load Balancing in Distributed Computer Systems," J. ACM, vol. 32, no. 2, pp. 445-465, 1985.
[31] O.S. Unsal and I. Koren, "System-Level Power-Aware Design Techniques in Real-Time Systems," Proc. IEEE, vol. 91, no. 7, pp. 1055-1069, July 2003.
[32] V. Venkatachalam and M. Franz, "Power Reduction Techniques for Microprocessor Systems," ACM Computing Surveys, vol. 37, no. 3, pp. 195-237, 2005.
[33] X. Wang, M. Chen, C. Lefurgy, and T.W. Keller, "SHIP: Scalable Hierarchical Power Control for Large-Scale Data Centers," Proc. 18th Int'l Conf. Parallel Architectures and Compilation Techniques, pp. 91-100, 2009.
[34] X. Wang and Y. Wang, "Coordinating Power Control and Performance Management for Virtualized Server Clusters," IEEE Trans. Parallel and Distributed Systems, vol. 22, no. 2, Feb. 2011.
[35] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, "Theoretical and Practical Limits of Dynamic Voltage Scaling," Proc. 41st Design Automation Conf., pp. 868-873, 2004.
[36] X. Zheng and Y. Cai, "Achieving Energy Proportionality in Server Clusters," Int'l J. Computer Networks, vol. 1, no. 2, pp. 21-35, 2010.


Junwei Cao received the bachelor's and master's degrees in control theories and engineering in 1996 and 1998, respectively, both from Tsinghua University, Beijing, China, and the PhD degree in computer science from the University of Warwick, Coventry, United Kingdom, in 2001. He is currently a professor and the vice director of the Research Institute of Information Technology, Tsinghua University, Beijing, China. He is also the director of the Common Platform and Technology Division, Tsinghua National Laboratory for Information Science and Technology. Before joining Tsinghua University in 2006, he was a research scientist at the MIT LIGO Laboratory and NEC Laboratories Europe for about five years. He has published more than 130 papers, which have been cited by international scholars more than 3,000 times. He is the book editor of Cyberinfrastructure Technologies and Applications, published by Nova Science in 2009. His research is focused on advanced computing technologies and applications. He is a senior member of the IEEE and the IEEE Computer Society, and a member of the ACM and CCF.

Keqin Li is a SUNY distinguished professor of computer science and an Intellectual Ventures endowed visiting chair professor at the National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China. His research interests include design and analysis of algorithms, parallel and distributed computing, and computer networking. He has contributed extensively to processor allocation and resource management; design and analysis of sequential/parallel, deterministic/probabilistic, and approximation algorithms; parallel and distributed computing systems performance analysis, prediction, and evaluation; job scheduling, task dispatching, and load balancing in heterogeneous distributed systems; dynamic tree embedding and randomized load distribution in static networks; parallel computing using optical interconnections; dynamic location management in wireless communication networks; routing and wavelength assignment in optical networks; and energy-efficient computing and communication. He has published more than 270 journal articles, book chapters, and research papers in refereed international conference proceedings. He received several Best Paper Awards for his highest quality work. He is currently serving or has served on the editorial boards of IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Computers, Journal of Parallel and Distributed Computing, International Journal of Parallel, Emergent and Distributed Systems, International Journal of High Performance Computing and Networking, and Optimization Letters. He is a senior member of the IEEE.

Ivan Stojmenovic received the PhD degree in mathematics. He is a full professor at the University of Ottawa, Canada. He has held regular and visiting positions in Serbia, Japan, USA, Canada, France, Mexico, Spain, United Kingdom (as a chair in applied computing at the University of Birmingham), Hong Kong, Brazil, Taiwan, China, and Australia. He has published more than 300 different papers, and edited seven books on wireless, ad hoc, sensor and actuator networks and applied algorithms with Wiley. He is an editor of over a dozen journals (including IEEE Network), the editor-in-chief of IEEE Transactions on Parallel and Distributed Systems (2010-13), and the founder and editor-in-chief of three journals (MVLSC, IJPEDS, and AHSWN). He is one of about 250 computer science researchers with an h-index of at least 50, has the top h-index in Canada for mathematics and statistics, and has more than 12,000 citations. He received four best paper awards and the Fast Breaking Paper award for October 2003 from Thomson ISI ESI. He is a recipient of the Royal Society Research Merit Award, United Kingdom. He is a Tsinghua 1000-Plan Distinguished Professor (2012-2015). He was an IEEE CS Distinguished Visitor 2010-11 and received the 2012 Distinguished Service Award from the IEEE ComSoc Communications Software TC. He received the Excellence in Research Award of the University of Ottawa in 2009. He chaired and/or organized more than 60 workshops and conferences, and served on more than 200 program committees. He was a program cochair of IEEE PIMRC 2008, IEEE AINA-07, and IEEE MASS-04 and 07, founded several workshop series, and is/was workshop chair at IEEE ICDCS 2013, IEEE INFOCOM 2011, IEEE MASS-09, and ACM Mobihoc-07 and 08. He is a fellow of the IEEE (Communications Society, class 2008) and the Canadian Academy of Engineering (since 2012).
