SUBMITTED TO IEEE TRANSACTIONS ON COMPUTERS 1

Heterogeneity and Interference-Aware Virtual Machine Provisioning for Predictable Performance in the Cloud

Fei Xu, Fangming Liu, Member, IEEE, Hai Jin, Senior Member, IEEE

Abstract—Infrastructure-as-a-Service (IaaS) cloud providers offer tenants elastic computing resources in the form of virtual machine (VM) instances to run their jobs. Providing predictable performance (i.e., performance guarantees) for tenant applications is becoming increasingly compelling in IaaS clouds. However, hardware heterogeneity and performance interference across the same type of cloud VM instances can bring substantial performance variation to tenant applications, which inevitably deters tenants from moving their performance-sensitive applications to the IaaS cloud. To tackle this issue, this paper proposes Heifer, a Heterogeneity and interference-aware VM provisioning framework for tenant applications, focusing on MapReduce as a representative cloud application. Heifer predicts the performance of MapReduce applications with a lightweight performance model that uses online-measured resource utilization and captures VM interference. Based on this model, Heifer provisions VM instances of the good-performing hardware type (i.e., the hardware that achieves the best application performance) by explicitly exploiting hardware heterogeneity and capturing VM interference, thereby achieving predictable performance for tenant applications. With extensive prototype experiments in our local private cloud and a real-world public cloud (i.e., Microsoft Azure), as well as complementary large-scale simulations, we demonstrate that Heifer can guarantee job performance while saving the job budget for tenants. Moreover, our evaluation results show that Heifer can improve the job throughput of cloud datacenters, such that the revenue of cloud providers is increased, thereby achieving a win-win situation between providers and tenants.

Index Terms—Cloud computing, hardware heterogeneity, performance interference, predictable performance, VM provisioning.


1 INTRODUCTION

Infrastructure-as-a-Service (IaaS) clouds provide tenants with on-demand computing resources in the form of virtual machine (VM) instances. Most commercial IaaS cloud providers, such as Amazon EC2 [1], Microsoft Azure [2] and RackSpace [3], expose a simple resource-centric interface by allowing tenants to choose the type and the number of VM instances they require. To facilitate VM provisioning and guarantee job performance, a recent trend in IaaS clouds is to provide predictable performance for tenant jobs by offering a job-centric interface [4], [5]. Such an interface only requires tenants to specify their expected goals on the performance (e.g., completion time) or budget of jobs. To meet such goals, cloud providers offer tenants an appropriate VM provisioning plan to run their jobs.

However, there exist two critical barriers to delivering

• Fei Xu is with the Shanghai Key Laboratory of Multidimensional Information Processing, Department of Computer Science and Technology, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China. Email: [email protected]. Part of this work was done at the Services Computing Technology and System Lab, Cluster and Grid Computing Lab in the School of Computer Science and Technology, Huazhong University of Science and Technology.

• Fangming Liu and Hai Jin are with the Services Computing Technology and System Lab, Cluster and Grid Computing Lab in the School of Computer Science and Technology, Huazhong University of Science and Technology, 1037 Luoyu Road, Wuhan 430074, China. Email: {fmliu, hjin}@hust.edu.cn. The corresponding author is Fangming Liu.

Manuscript received February 24, 2015; revised July XX, 2015.

predictable application performance to tenants in IaaS clouds. First, the hardware heterogeneity within the same VM instance type can cause severe variation of application performance in IaaS clouds [6], [7]. As evidenced by our motivation experiments in Microsoft Azure in Sec. 2.2, the application performance on A1 VM instances can vary by 92.1% due to the hardware heterogeneity. Such performance variation can even reach up to 280% for the m1 class instances in Amazon EC2 [8]. Second, the performance interference across VM instances also brings substantial performance variation to tenant applications [9]. Recent measurement studies on Amazon EC2 have shown that the disk I/O bandwidth of small instances can vary by 50% [10], and the network I/O bandwidth of medium instances can vary by 66% [11], due to contention on shared resources including CPU cores, cache space and I/O bandwidth. As a result, such severe performance variation of VMs [12] undoubtedly hinders tenants from moving their performance-sensitive applications to the IaaS cloud, as tenants are increasingly concerned about the high-level performance goals and monetary cost of their jobs [4].

Recently, a number of works have been devoted to providing predictable performance for tenant applications in IaaS clouds. They mainly focus on answering how many and what type of (i.e., small, medium, or large) cloud VM instances tenants should lease, in order to achieve their goals on job performance or monetary cost [4], [13]–[15]. Nevertheless, these solutions


[Fig. 1 depicts the Heifer workflow: a tenant's job and goals feed job profiling and performance prediction, which use the measured resource utilization and hardware type to produce candidate numbers of VMs; VM instance provisioning then combines these candidates with the cloud status and periodical goals checking to output a cost-effective VM provisioning plan, while historical information of job execution is fed back to refine the prediction model.]

Fig. 1: Overview of Heifer.

are designed under the assumption that the hardware of a given VM instance type is identical, which has been shown to be unrealistic in real-world cloud datacenters [6]. Moreover, several existing solutions (e.g., [14], [16]) are oblivious to the performance interference of VMs, and simply assume that VM instances can provide guaranteed VM performance, such as network I/O bandwidth [17] and disk I/O bandwidth [4]. As a result, there has been scant research on achieving predictable application performance with a particular focus on alleviating the impact of the performance variation incurred by hardware heterogeneity and VM interference.

To address the performance issue above, in this paper, we present Heifer, a hardware Heterogeneity and interference-aware VM provisioning framework, in order to deliver predictable performance to tenant applications in IaaS clouds. We illustrate Heifer by focusing on MapReduce as a representative cloud application. To predict the application performance, we devise a lightweight MapReduce performance model, using the online-measured multi-dimensional VM resource utilization and capturing VM interference. Based on such a model, we design a VM provisioning strategy to provide predictable performance for MapReduce applications. By explicitly exploring the hardware heterogeneity and capturing VM interference, Heifer provisions an appropriate number of VM instances of the good-performing hardware type to tenant jobs. In particular, we define the good-performing hardware type as the type of hardware that achieves the best application performance¹.

Specifically, the basic procedure of Heifer is shown in Fig. 1: First, tenants submit their job programs with input data sets as well as the job performance goals (e.g., the expected completion time, budget, or throughput) to the IaaS cloud. By profiling the submitted jobs on a sampled data set, Heifer then analyzes the job specifications and outputs the candidate number of VM instances for each hardware type based on our performance prediction model. Finally, Heifer selects the minimum of the candidate numbers of VMs as the cost-effective provisioning plan (i.e., the number of VMs required and the corresponding hardware type). In particular, Heifer periodically checks whether the performance goals can be achieved for long-running jobs, and dynamically adds an appropriate number of VM instances or reduces the

1. For example, given three types of hardware (i.e., h1, h2, h3) and one tenant application a1, h1 is considered as the good-performing hardware type for a1 if a1 obtains the best application performance on h1 compared with that on h2 and h3.

TABLE 1: Heterogeneous hardware configuration of VM instances in Amazon EC2 [1], Microsoft Azure [2], and our Xen-based private cloud, obtained from the recent literature [7], [18] and our measurement study in Sec. 2, respectively.

Cloud Provider          CPU type        Frequency   Cache
Amazon EC2              Intel E5430     2.66 GHz    12 MB
m1 class [7], [18]      Intel E5507     2.26 GHz    4 MB
                        Intel E5645     2.4 GHz     12 MB
                        Intel E5-2650   2.0 GHz     20 MB
Microsoft Azure         AMD 4171 HE     2.1 GHz     0.5 MB
Standard A1             Intel E5-2660   2.2 GHz     20 MB
Our Private             Intel E7420     2.13 GHz    8 MB
Cloud                   Intel E5620     2.4 GHz     12 MB

number of provisioned VM instances. Moreover, the historical information of job execution can be collected to refine Heifer's performance prediction model.
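The plan-selection step above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and `predict_completion_time` merely stands in for the performance model developed in Sec. 3.

```python
# Sketch of Heifer's cost-effective plan selection (hypothetical code):
# for each hardware type, find the smallest number of VMs whose predicted
# completion time meets the tenant's goal, then pick the smallest candidate.

def candidate_vms(predict_completion_time, hw_type, goal_seconds, max_vms=64):
    """Smallest VM count on hw_type predicted to meet the deadline."""
    for n in range(1, max_vms + 1):
        if predict_completion_time(hw_type, n) <= goal_seconds:
            return n
    return None  # goal unreachable within max_vms

def provisioning_plan(predict_completion_time, hw_types, goal_seconds):
    """Return (hardware type, VM count) with the fewest VMs meeting the goal."""
    candidates = {}
    for hw in hw_types:
        n = candidate_vms(predict_completion_time, hw, goal_seconds)
        if n is not None:
            candidates[hw] = n
    if not candidates:
        return None
    best_hw = min(candidates, key=candidates.get)
    return best_hw, candidates[best_hw]

# Toy predictor: assume the job needs 600 s of work on E5620 and 900 s on
# E7420, split perfectly across n VMs (made-up numbers, cf. Fig. 3(a)).
toy = lambda hw, n: {"E5620": 600, "E7420": 900}[hw] / n
print(provisioning_plan(toy, ["E5620", "E7420"], goal_seconds=250))  # -> ('E5620', 3)
```

For long-running jobs, the same routine would simply be re-invoked by the periodical goals checking to add or remove instances.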

The rest of this paper is organized as follows. Sec. 2 conducts an experimental analysis of the performance variation caused by hardware heterogeneity and VM interference in both public and private clouds. Such performance analysis enables the design of Heifer: a performance prediction model of MapReduce and a VM instance provisioning strategy in Sec. 3 and Sec. 4, respectively. Sec. 5 evaluates the effectiveness, benefits and overhead of Heifer through prototype experiments in both public and private clouds and large-scale simulations. Sec. 6 discusses our contribution in the context of related work. Finally, we conclude this paper in Sec. 7.

2 PERFORMANCE VARIATION IN CLOUDS: AN EXPERIMENTAL STUDY IN MICROSOFT AZURE

In this section, we seek to experimentally understand the following questions: how severe could the performance variation of tenant applications be in a real-world public cloud, e.g., Microsoft Azure [2], and what are the key factors that essentially impact such performance variation in the cloud?

2.1 Configuration of Cloud VM Instances

IaaS cloud providers such as Amazon EC2 [1] and Microsoft Azure [2] allow tenants to lease multiple types of VM instances on demand. These VM instances typically vary in size, ranging from small, medium and large to extra-large instances, offering tenants different amounts of computing resources in terms of CPU cores, memory, and disk capacity. Taking the standard instance type in Microsoft Azure as an example, the small (A1) instances are equipped with 1 VCPU core and 1.75 GB memory, while the medium (A2) instances are equipped with 2 VCPU cores and 3.5 GB memory. In this study, we mainly focus on the standard A1 VM instance; the experimental results can be extended to other instance types.

Although the VM instances of the same type are equal in the amount of computing resources, the hardware within the same instance type in an IaaS cloud can be heterogeneous, mainly due to the upgrading of the hardware infrastructure over time [19]. Specifically, we launched 20 standard A1 VM instances, running the


[Fig. 2 plots the normalized performance (with error bars) of each benchmark on the AMD 4171 HE and Intel E5-2660 hardware types: (a) CPU: sysbench cpu, SPECCPU gobmk; (b) Memory and cache: sysbench memory, SPECCPU mcf; (c) Network and disk I/O: sysbench fileio, Sort, WordCount.]

Fig. 2: Performance comparison of benchmark applications running on two different hardware types of A1 VM instances in Microsoft Azure.

guest OS of Ubuntu Server 14.04 LTS with the Linux 3.13.0-40-generic kernel, in the China North region of Microsoft Azure². We have identified that the A1 VM instances are actually equipped with two types of hardware, as listed in Table 1: 8 instances are equipped with the Intel Xeon E5-2660, while the remaining 12 instances are equipped with the AMD Opteron 4171 HE. This observation is consistent with recent measurement studies [6], [7] on other public clouds, such as Amazon EC2 and Google Compute Engine [20].
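A tenant can identify the hardware backing a VM by parsing the CPU model string that the guest OS exposes. The helper below is an illustrative sketch of this idea, not the exact tooling used in our study:

```python
# Sketch: identify the hardware type backing a VM from inside the guest,
# by parsing the "model name" field of /proc/cpuinfo. On Azure A1 instances
# this distinguishes, e.g., an AMD Opteron 4171 HE from an Intel Xeon E5-2660.

def cpu_model(cpuinfo_text):
    """Return the first 'model name' value found in /proc/cpuinfo contents."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("model name"):
            return line.split(":", 1)[1].strip()
    return "unknown"

# Typical usage inside a VM:
#   with open("/proc/cpuinfo") as f:
#       print(cpu_model(f.read()))
```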

As business analytical applications on big data sets are moving to the cloud [21], MapReduce is increasingly becoming a representative application in the IaaS cloud [22]. To facilitate the execution of MapReduce applications in the cloud, Amazon Web Services and Microsoft Azure have launched Elastic MapReduce [23] and HDInsight [24] for cloud users, respectively. Typically, MapReduce allows a master node to coordinate a number of slave nodes to complete a tenant job. Specifically, the master node divides the job into a number of map tasks and assigns them to the slave nodes. Then, these slave nodes execute the assigned map tasks in parallel, and the task outputs (i.e., intermediate data) are finally combined together by reduce tasks as the job result [25].

2.2 Severity of VM Performance Variation in Clouds

To examine how severe the performance variation of VMs could be in an IaaS cloud, we run a set of representative benchmark applications listed in Table 2 on the 20 A1 VM instances launched in Microsoft Azure [2]. Specifically, these benchmarks (i.e., sysbench 0.4.12 [26], Hadoop 1.2.1 [27], SPECCPU 2006 [28]) mainly examine the multi-dimensional VM resources, including CPU, cache and memory bandwidth, and network and disk I/O bandwidth. In particular, the Hadoop application runs on a micro-cluster of 8 VM instances with the default Hadoop configuration; the other benchmark applications are executed on a single VM instance. To particularly examine the performance interference across co-located VMs in clouds, each benchmark application is executed five times on the VM instances, at different times during a day, i.e., 9:00 am, 3:00 pm and 9:00 pm.

Fig. 2 compares the normalized benchmark performance, with error bars of standard deviations, on the two different hardware types of A1 VM instances. We observe that the performance of benchmark applications varies

2. These 20 A1 VM instances were launched on December 26, 2014.

TABLE 2: Benchmarks for the measurement study in Microsoft Azure.

Benchmark       Application   Description
sysbench        cpu           measure time to check 20,000 natural
0.4.12 [26]                   numbers for primeness
                memory        measure memory bandwidth for
                              transferring a 4 GB file
                fileio        measure average I/O bandwidth to
                              randomly read/write a 1 GB file
Hadoop          WordCount,    measure time to count/sort the numbers in
1.2.1 [27]      Sort          a 5 GB file generated by Hadoop TeraGen
SPECCPU         gobmk,        measure time to complete the
2006 [28]       mcf           gobmk/mcf applications

significantly, from 5.5% to 92.1%, across the CPU, cache and memory, and network and disk I/O resources. Specifically, Fig. 2(a) shows that the AMD architecture achieves better CPU performance than the Intel architecture by 27%-32.1%. The performance on the I/O resources is similarly variable to the CPU performance, as depicted in Fig. 2(c). The performance on the cache resource varies much more widely (i.e., a 92.1% performance improvement on Intel over AMD for SPECCPU mcf [28]) than the performance on the CPU and I/O resources, as illustrated in Fig. 2(b). The rationale is that the Intel architecture is equipped with a much larger cache than the AMD architecture for A1 VM instances, as shown in Table 1. In contrast, the memory bandwidth is much less variable across the two hardware types of VM instances.

In more detail, the benchmark performance achieved within the same hardware type can also vary moderately, from 0.7% to 29.1%. For example, the mcf application running on the AMD architecture achieves a wide performance range from 77.4% to 122.6%, as illustrated in Fig. 2(b). This is because the performance interference across co-located VMs (i.e., noisy VM neighbors [12]) can negatively affect the performance of benchmark applications [9]. In particular, the performance variation on the cache and (network and disk) I/O resources is much more severe than that on the CPU and memory resources, as shown in Fig. 2. This is because the cache space and I/O bandwidth can hardly be isolated across co-located VMs in existing hypervisors [9].

Interestingly, we also observe from Fig. 2 that different kinds of applications "prefer" different types of hardware. For example, the SPECCPU mcf application running on the Intel architecture achieves a much larger performance improvement (i.e., 92.1%) than when running on the AMD architecture, while the other applications achieve better performance on AMD over


Intel. Accordingly, provisioning VM instances of the "preferred" hardware type for tenant applications can dramatically improve the application performance.

Summary: The experimental results above demonstrate substantial performance variation within the same VM instance type in a public IaaS cloud. Such performance variation is mainly caused by the hardware heterogeneity and performance interference of cloud VM instances. As a result, potential performance benefits can be achieved in clouds by (1) provisioning VM instances of the suitable ("preferred") hardware type for the requirements of tenant applications, and (2) alleviating the performance interference across VM instances.

2.3 Essential Factors for VM Performance Variation

To identify the key factors that essentially impact the performance variation of VM instances, we further investigate the relationship between the application performance and the VM resource consumption, as well as the number of provisioned VMs co-located on a PM. In this experiment, we run representative cloud applications (e.g., MapReduce applications) in our Xen-based private cloud, which consists of 12 enterprise-class physical machines (PMs).

Experimental Setup. Each PM is equipped with two quad-core processors and 24 GB memory, running the CentOS 5.5 distribution and the Linux 2.6.18.8-Xen kernel patched with Xen 4.1.1. All PMs are connected with a 1 Gbps switch, forming a one-tier tree network topology. The VMs run CentOS 5.5 with the Linux 2.6.18.8 kernel. In accordance with the standard A1 VM instances in Microsoft Azure [2], each VM instance in our cloud datacenter is configured with 1 VCPU core and 1.75 GB memory capacity. The domain-0 is allocated 2 CPU cores and 3 GB memory. In particular, the VMs are hosted on two types of hardware, i.e., Intel E7420 and Intel E5620, which are comparable to the hardware configuration of VM instances in public IaaS clouds like Amazon EC2 and Microsoft Azure, as listed in Table 1.

Specifically, we run two MapReduce applications (i.e., the Hadoop Grep and Sort applications [27]) on the two different hardware types of VM instances in our private cloud. Each application processes 5 GB of input data generated by the Hadoop TeraGen application. Each VM instance runs the stable version of Hadoop 1.2.1 [27] with the default Hadoop configuration. Each application is executed five times on a micro-cluster of 2-12 VMs co-located on a PM, and the performance results are illustrated with error bars of standard deviations.

As observed from Fig. 3, the performance of MapReduce applications running on the same type of VMs yet with two types of hardware varies substantially, from 6.96% to 37.5%. Such an observation is consistent with the experimental results obtained in a public IaaS cloud (i.e., Microsoft Azure) in Sec. 2.2. In particular, the E5620 achieves better performance for Grep than the E7420, while the Sort application achieves less job completion

[Fig. 3 plots the job completion time (in seconds, with error bars) against the number of co-located VMs (2-12) on Intel E5620 and Intel E7420, annotating the performance variation between the two hardware types and the over-subscription region: (a) Grep; (b) Sort.]

Fig. 3: Performance variation of Hadoop Grep and Sort applications on different numbers of co-located VMs with two types of hardware.

TABLE 3: Average multi-dimensional resource utilization of a slave VM instance running MapReduce applications on two types of hardware.

Application        Sort            Grep
Hardware           E5620   E7420   E5620   E7420
CPU (%)            26.76   35.04   56.07   71.56
Memory (%)         49.96   51.02   56.79   54.03
Disk I/O (MB/s)    6.87    8.73    2.55    2.86
Net I/O (MB/s)     3.36    4.16    1.65    2.02

time on E7420 than on E5620. This also validates our analysis in Sec. 2.2 that cloud applications always have a "preferred" type of hardware architecture to run on.

To figure out the key factors that essentially affect the VM performance variation observed above, we first examine the multi-dimensional VM resource utilization, including the CPU, memory, network and disk I/O utilization, of a slave [25] VM instance. As shown in Table 3, the E5620 achieves lower average CPU utilization than the E7420 by 21.6%-26.5%, as the computational ability of the E5620 is more powerful than that of the E7420. Also, the average I/O utilization on the E7420 is higher than that on the E5620 by 16.2%-26.1%, which implies that the E7420 has more powerful I/O processing ability than the E5620. In contrast, the memory utilization on the two hardware types is comparable. Accordingly, Grep achieves better performance on E5620 than on E7420, while Sort achieves worse performance on E5620 than on E7420, as the Grep and Sort applications are mainly constrained by the VM computational capability and the VM I/O capability, respectively.

Key Factor I. According to the detailed analysis above, the first key factor in the performance variation of cloud applications is identified as the multi-dimensional resource utilization of provisioned VMs, including CPU utilization and I/O bandwidth (i.e., disk and network I/O).

Next, we proceed to examine whether the number of provisioned VMs co-located on the same PM can impact the performance variation of VM instances. As shown in Fig. 3, when varying the number of co-located VMs from 2 to 8, neither the Sort nor the Grep application achieves an ideal linear performance improvement, even with each VM allocated a physical CPU core. The rationale is that the VM contention on the shared network and disk I/O resources still exists, which consequently brings substantial performance interference across co-located VMs [29]. Such VM interference becomes even more severe as the PM provisions 8-12 active VMs (i.e., VM over-subscription [9]), which causes a certain performance


[Fig. 4 plots the aggregated CPU utilization (%) and aggregated I/O throughput (MB/s) of co-located VMs against the number of co-located VMs (2-12) on E5620 and E7420, marking the over-subscription region: (a) Grep; (b) Sort.]

Fig. 4: Aggregated CPU and I/O utilization of co-located VMs running Hadoop Grep and Sort applications on two types of hardware.

TABLE 4: Average execution time (measured in seconds) of map tasks of Hadoop Grep and Sort applications running on two types of hardware.

                          Number of Co-located VMs
Application   Hardware    2    4    6    8    10   12
Grep          E5620       13   14   16   21   31   36
              E7420       20   22   25   29   38   44
Sort          E5620       16   22   38   59   69   78
              E7420       21   28   43   65   76   85

degradation to MapReduce applications, as shown in Fig. 3.

In more detail, as the number of provisioned VMs varies from 2 to 6, both the Grep and Sort performance show a roughly linear increase. This indicates that the aggregated CPU and I/O resource demand of the VMs can be satisfied by the available PM resources. As observed in Fig. 4, the aggregated CPU utilization for Grep and the aggregated I/O utilization for Sort are roughly linear in the number of co-located VMs. However, as the PM provisions over 6 VMs, the (network) I/O bandwidth for Grep in Fig. 4(a) and the (disk and network) I/O bandwidth for Sort in Fig. 4(b) become saturated, which implies that the available PM resources cannot satisfy the increasing resource demand of the VMs. This explains the poor performance of the Grep and Sort applications running on 8-12 co-located VMs. Also, when the aggregated CPU demand of co-located VMs exceeds the maximum CPU capacity of a PM (i.e., 800%), there is significant performance interference on the CPU resource [30].

Key Factor II: The analysis above indicates that the second key factor in the performance variation of VM instances is the aggregated resource utilization of the provisioned VMs co-located on the PM. Essentially, the main cause of the performance interference of VM instances is the mismatch between the available resources on the PM and the aggregated resource utilization of the co-located VMs.
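The supply-demand mismatch above can be expressed as a simple degradation-factor sketch. It is consistent with the notation of Table 5 (S for supply, D for demand, f_d for the degradation factor) but is only an illustration, not the exact form fitted in our model:

```python
# Sketch: an interference degradation factor f_d for a VM, taken as the worst
# demand-to-supply ratio across the CPU and I/O dimensions of its hosting PM.
# f_d = 1.0 means no contention; f_d > 1.0 means the aggregated demand of
# co-located VMs exceeds the PM's supply and tasks slow down accordingly.

def degradation_factor(d_cpu, s_cpu, d_io, s_io):
    """max(1, D/S) over the contended resource dimensions."""
    return max(1.0, d_cpu / s_cpu, d_io / s_io)

# Example: 10 co-located VMs each demanding 90% CPU on an 8-core PM
# (supply 800%), with 60 MB/s aggregated I/O demand against 90 MB/s supply:
print(degradation_factor(d_cpu=900, s_cpu=800, d_io=60, s_io=90))  # -> 1.125
```

A predicted task time could then be scaled by this factor, mirroring how over-subscription inflates the map times in Table 4.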

The execution time of map tasks shown in Table 4 further validates our analysis above. Specifically, the average execution time of Grep map tasks varies slightly as the number of co-located VMs increases from 2 to 6. The map time of Grep begins to increase when the PM hosts over 6 VMs, because of the VM interference on the I/O resources. In contrast, the performance of Sort map tasks is heavily impacted by the number of co-located VMs. This is because the VM interference on the shared I/O resources for Sort becomes severe as the number of VMs increases, while the VM interference on the CPU resource is slight as long as the aggregated VM CPU utilization does not exceed the CPU capacity of the PM.

A Win-Win Opportunity for Providers and Tenants.

As observed from Fig. 3, different numbers of provisioned VM instances can achieve similar application performance (job completion time). For example, as shown in Fig. 3(a), two VMs of E5620 achieve almost the same Grep performance as three VMs of E7420. A similar insight can be obtained from Fig. 3(b): two VMs of E7420 achieve similar (±5%) Sort performance to three VMs of E5620. Accordingly, the monetary cost of tenants can be reduced by provisioning good-performing (and fewer) VM instances for their jobs. The datacenter throughput can also be improved by provisioning more jobs with the same amount of physical resources. This opens up an opportunity to improve the job throughput of datacenters for cloud providers and save the job budget for tenants, thereby achieving a win-win situation.

3 HEIFER: PREDICTING APPLICATION PERFORMANCE BY VM RESOURCE UTILIZATION

Based on the experimental understanding of VM performance variation above, how can we effectively predict the performance of tenant applications in the cloud? As evidenced by Sec. 2, the resource utilization of VMs is the key factor that determines the application performance achieved on the heterogeneous hardware within the same instance type. In response, we design a multi-resource model in Heifer to predict the performance of representative cloud applications (e.g., MapReduce applications), by explicitly considering hardware heterogeneity and capturing VM interference. Table 5 summarizes the important notations used in our model.

TABLE 5: Key notations in our prediction model.

Notation              Definition
I                     Amount of the input data set of a job
b                     Size of a data block in the distributed file system
n_m, n_r              Number of map/reduce tasks of the job
n, s_m, s_r           Number of provisioned VMs, map/reduce slots
t_m, t_r              Execution time of a map/reduce task
t_s1, t_s             Data transfer time of the non-overlapping first shuffle wave and of the other shuffle waves
P_m, P_s, P_r         Data processing rate of map, shuffle, reduce tasks of a tenant job
P_m^i, P_s^i, P_r^i   The corresponding rates in a VM i
P̄_m, P̄_r              Data processing rate of map/reduce tasks running alone in VMs within the same instance type
B_d^i, B_n^i          Disk I/O and network I/O bandwidth of a VM i
r_m, r_r              Ratio of the amount of intermediate/reduce-input data to that of map-input/intermediate data
f_d, f_d^i            Performance degradation factor of VM interference, and the performance interference in a VM i
S_cpu^i, S_io^i       Available CPU and I/O resources (supply) on the hosting PM of the VM i
D_cpu^i, D_io^i       Aggregated CPU and I/O resource utilization (demand) of the co-located VMs of the VM i

Fig. 5: Task execution in MapReduce.

In the following, we first build a basic model based on [14] to calculate the completion time of a MapReduce application. As shown in Fig. 5, the completion time of a MapReduce job is the total running time of the map phase, shuffle phase and reduce phase. The execution times of a map task and a reduce task are denoted by t_m and t_r, respectively. As the shuffle phase of the first reduce wave overlaps with the whole map phase, the data transfer of the non-overlapping first shuffle wave and that of the remaining shuffle waves are treated separately, and denoted as t_s1 and t_s, respectively. As a result, the completion time T of a MapReduce job can be formulated in Eq. (1) as below,

    T = ⌈n_m / s_m⌉ · t_m + ⌈n_r / s_r⌉ · (t_s + t_r) + (t_s1 − t_s),   (1)

where n_m and n_r denote the numbers of map and reduce tasks of the MapReduce job, respectively. s_m and s_r represent the numbers of map and reduce slots allocated to the job, respectively; both can be calculated as the product of the number of provisioned VM instances n and the number of map (reduce) slots per VM instance. In particular, n_m can be calculated as the input data size I divided by the data block size b in the distributed file system, i.e., n_m = I / b, and n_r is empirically configured by users [27].
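As a concrete illustration, Eq. (1) can be evaluated directly once the per-task times are known. The sketch below is ours (the function and variable names are not from the paper) and simply transcribes the wave-based formula:

```python
import math

def basic_completion_time(n_m, n_r, s_m, s_r, t_m, t_r, t_s, t_s1):
    """Eq. (1): map waves * t_m + reduce waves * (t_s + t_r),
    plus the extra time of the non-overlapping first shuffle wave."""
    map_phase = math.ceil(n_m / s_m) * t_m
    reduce_phase = math.ceil(n_r / s_r) * (t_s + t_r)
    return map_phase + reduce_phase + (t_s1 - t_s)

# A 10 GB input with 64 MB blocks gives n_m = 160 map tasks; with
# s_m = 16 map slots these run in ceil(160/16) = 10 waves.
```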

3.1 Modeling Task Execution Time by Resource Utilization

As analyzed in Sec. 2, the task execution times t_m, t_r and the data transfer times t_s, t_s1 highly depend on the amount of available CPU and I/O resources as well as the performance interference among provisioned VM instances. As shown in Fig. 6, map tasks need to read data blocks of size b from local disks and then process them. Accordingly, we first calculate t_m as

    t_m = b / P_m,   (2)

where P_m denotes the average data processing rate of map tasks of a tenant job. As map tasks run in the VM instances in parallel (i.e., the task scheduler [31] assigns more tasks to fast nodes than to slow nodes), P_m can be formulated as

    P_m = (Σ_i P_m^i · f_d^i) / n,  ∀i ∈ [1, n],   (3)

where P_m^i denotes the data processing rate of map tasks running in a VM instance i hosted alone on a PM, and f_d^i ∈ (0, 1] denotes the performance interference in a VM instance i. As this study mainly considers the heterogeneous hardware within the same instance type (i.e., we do not lease VMs of mixed instance types), we have P_m^i = P̄_m, ∀i. Accordingly, P_m can be simplified as

    P_m = P̄_m · f_d,   (4)

where f_d denotes the average performance degradation factor of VM interference, which is calculated as f_d = (Σ_i f_d^i) / n, ∀i ∈ [1, n]. The process of obtaining f_d and P̄_m will be elaborated in Sec. 3.2.

Fig. 6: Data processing in MapReduce. (The figure shows the data flow: map tasks read map-input data from disk and produce intermediate, i.e., map-output, data; the intermediate data is shuffled over the network and merged on disk into reduce-input data; reduce tasks then produce the output results.)

Furthermore, the data transfer time of a shuffle wave t_s and the execution time of a reduce task t_r can be formulated as the amount of intermediate data (i.e., map-output materialized data) and reduce-input data divided by the data processing rate of shuffle and reduce tasks, respectively. We assume for simplicity that the intermediate data is evenly distributed across reduce tasks. As a result, the amount of data shuffled to each reduce task (i.e., intermediate data) is given by I · r_m / n_r, where r_m is defined as the ratio of the amount of intermediate data to that of map-input data. Hence, we calculate t_s as

    t_s = (I · r_m) / (n_r · P_s),   (5)

where P_s denotes the average data processing rate of shuffle tasks of a tenant job. As the last reduce task cannot execute before the intermediate data is shuffled to its hosting VM instance [25], P_s can be formulated as

    P_s = min_i P_s^i,  ∀i ∈ [1, n],   (6)

where P_s^i is the data transfer rate in a VM instance i. As illustrated in Fig. 6, the intermediate data needs to be transferred over the network and then stored on the local disk where the reduce task runs [25], so P_s^i is dictated by the smaller of the disk I/O bandwidth B_d^i and the network I/O bandwidth B_n^i of a VM instance i, i.e., P_s^i = min(B_d^i, B_n^i).
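Eq. (6) says the job-wide shuffle rate is bottlenecked by the slowest VM, whose own rate is capped by the smaller of its disk and network bandwidth. A minimal sketch of this min-of-mins (the names are ours):

```python
def shuffle_rate(vm_bandwidths):
    """Eq. (6): P_s = min over VMs of min(disk I/O bw, network I/O bw).
    vm_bandwidths is a list of (B_d, B_n) pairs, one per provisioned VM."""
    return min(min(b_d, b_n) for b_d, b_n in vm_bandwidths)
```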

In particular, as shown in Fig. 5, the shuffle phase can start after the first map wave is finished [25]. Accordingly, the non-overlapping part of the first shuffle wave mainly transfers the map-output materialized data generated by the last map wave, which can be estimated as s_m · b · r_m. Correspondingly, the amount of input data of the first non-overlapping shuffle wave is given by s_m · b · r_m / n_r. According to Eq. (5), t_s1 can be formulated as

    t_s1 = (s_m · b · r_m) / (n_r · P_s).   (7)

As shown in Fig. 6, the intermediate data needs to be merged into the reduce-input data by a scale factor r_r before the execution of reduce tasks [25]; hence, we calculate the amount of reduce-input data as I · r_m · r_r / n_r. Similar to the map task, the average data processing rate of reduce tasks is formulated as P_r = P̄_r · f_d. Accordingly, the execution time t_r of a reduce task can be given by

    t_r = (I · r_m · r_r) / (n_r · P_r) = (I · r_m · r_r) / (n_r · P̄_r · f_d),   (8)

where P̄_r is the data processing rate of reduce tasks running in a VM instance hosted alone on a PM.


By incorporating Eqs. (2), (4), (5), (7) and (8) into Eq. (1), we formulate the performance model of MapReduce applications using VM resource utilization as follows,

    T = ⌈n_m / s_m⌉ · b / (P̄_m · f_d) + ⌈n_r / s_r⌉ · (I · r_m / n_r) · (1 / P_s + r_r / (P̄_r · f_d)) + (r_m / (n_r · P_s)) · (s_m · b − I).   (9)

In general, s_m / s_r = α, where α is the ratio of the number of map slots to the number of reduce slots configured in a VM instance, and it can be considered as a constant. In addition, we have x ≤ ⌈x⌉ ≤ (x + 1), where x is a variable. Accordingly, we can obtain a lower bound of the job completion time T_low as

    T_low = I / (s_m · P̄_m · f_d) + (I · r_m · α / s_m) · (1 / P_s + r_r / (P̄_r · f_d)) + (r_m / (n_r · P_s)) · (s_m · b − I).   (10)

The only remaining variable in Eq. (10) is s_m; the other terms are determined by the model parameters, which are obtained in the following section. In particular, to meet the performance goals of tenant jobs in an ideal case (i.e., ⌈x⌉ = x), we can provision VM instances by setting the lower bound of the job completion time T_low to the application performance goal.
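To make the use of Eq. (10) concrete, the sketch below evaluates T_low as a function of s_m and scans for the smallest s_m meeting a goal T_g. The linear scan is our simplification of "solving Eq. (10) by letting T_low = T_g", and all names are ours; P_m and P_r here stand for the profiled alone-on-PM rates:

```python
def t_low(s_m, I, b, n_r, alpha, P_m, P_r, P_s, r_m, r_r, f_d):
    """Eq. (10); P_m and P_r are the rates of a VM running alone on a PM."""
    map_part = I / (s_m * P_m * f_d)
    shuffle_reduce = (I * r_m * alpha / s_m) * (1.0 / P_s + r_r / (P_r * f_d))
    first_wave = (r_m / (n_r * P_s)) * (s_m * b - I)
    return map_part + shuffle_reduce + first_wave

def base_map_slots(T_g, max_slots, **params):
    """Smallest s_m with t_low(s_m) <= T_g, found by a simple scan
    (our stand-in for solving T_low = T_g analytically)."""
    for s_m in range(1, max_slots + 1):
        if t_low(s_m, **params) <= T_g:
            return s_m
    return None
```

Since T_low decreases in s_m while the per-slot terms dominate, the first feasible s_m found by the scan is the base number of map slots used by Algorithm 1.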

3.2 Determining Model Parameters

As analyzed above, there are six key parameters that affect the performance of MapReduce jobs, namely P̄_m, P̄_r, P_s, f_d, r_m and r_r, as shown in Eq. (9). The first four parameters P̄_m, P̄_r, P_s and f_d depend on the hardware type and performance interference of the provisioned VM instances. The last two parameters r_m and r_r depend on the job program and the input data set. These six parameters can be obtained by profiling jobs with sampled input data [4], [32] and by online monitoring of the resource utilization of VM instances in the IaaS cloud.

Profiling Model Parameters. Given a job program with its sampled input data set, we profile the job in a single VM instance running alone on a PM for each type of hardware. The profiling overhead can be well constrained because: (1) the number of hardware types is limited in an IaaS cloud, e.g., four types of hardware for m1-class VM instances in Amazon EC2 [18] and two hardware types of A1 VM instances in Microsoft Azure, as listed in Table 1; and (2) the job profiling needs to be executed only once for each type of job program. The profile information includes the average execution time of map tasks and reduce tasks, the amount of data consumed and generated in each phase (recorded in the FileSystem Counters of Hadoop logs [27]), the application resource utilization (demand) in VM instances, etc.

The profile information above can be used to generate the four parameters P̄_m, P̄_r, r_m and r_r. P_s can be obtained by online tracing of the VM disk I/O bandwidth B_d^i and network I/O bandwidth B_n^i in Eq. (6). f_d can be captured using the online-measured CPU and I/O resource "supply" as well as the profiled application resource "demand", which will be elaborated in the following paragraph. In particular, the multi-dimensional resource utilization of VM instances can be acquired periodically (e.g., at intervals ranging from 10 seconds to minutes) in the cloud, using the resource tracing tool developed in our prior work [9]. Specifically, it acquires the multi-dimensional resource utilization (including CPU, network I/O and disk I/O) of VMs by reading the Linux 'proc' filesystem in the guest domain, i.e., the '/proc/stat', '/proc/net/dev' and '/proc/diskstats' files. Also, our resource tracing tool measures the statistics of network interrupts inside the VMs using the Linux tool 'vmstat'.
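As an illustration of this guest-side tracing, the sketch below parses a /proc/net/dev snapshot (standard Linux field layout) and derives per-interface throughput from two snapshots taken an interval apart. The helper names are ours, and this covers only one of the three files the tool reads:

```python
def parse_proc_net_dev(text):
    """Parse a /proc/net/dev snapshot into {iface: (rx_bytes, tx_bytes)}.
    The first two lines are column headers and are skipped."""
    stats = {}
    for line in text.splitlines()[2:]:
        iface, data = line.split(":", 1)
        fields = data.split()
        # rx_bytes is the 1st receive field, tx_bytes the 1st transmit field
        stats[iface.strip()] = (int(fields[0]), int(fields[8]))
    return stats

def net_throughput(snap0, snap1, interval_s, iface="eth0"):
    """Receive/transmit rates (bytes/s) on iface between two snapshots."""
    rx = (snap1[iface][0] - snap0[iface][0]) / interval_s
    tx = (snap1[iface][1] - snap0[iface][1]) / interval_s
    return rx, tx
```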

Capturing VM Interference. As analyzed in Sec. 2.3, VM interference can be considered as the mismatch between the resource "supply" provided by the hosting PM and the resource "demand" of co-located VMs, i.e., whether the resource "demand" can be satisfied by the resource "supply". Accordingly, we devise a simple yet effective supply-demand ratio [9] model to capture the performance interference of a VM instance i as below,

    f_d^i = c_0 + c_1 · (S_cpu^i / D_cpu^i) + c_2 · (S_io^i / D_io^i),   (11)

where c_0, c_1 and c_2 are the coefficients of our statistical linear model. For each VM instance i, D_cpu^i and D_io^i denote the aggregated CPU utilization and (disk and network) I/O utilization of its co-located VMs, respectively. S_cpu^i and S_io^i denote the available physical CPU resources (cores) and I/O resources on its hosting PM, respectively. In particular, a smaller ratio f_d^i ∈ (0, 1] indicates severer performance interference in a VM instance i.

In more detail, we use the I/O bandwidth and the number of I/O interrupts to measure the I/O resources in VMs and PMs. The supply-demand ratio S_io^i / D_io^i in Eq. (11) is calculated as the sum of two ratios, i.e., S_band^i / D_band^i + S_inter^i / D_inter^i. For each VM instance i, D_band^i and D_inter^i denote the aggregated I/O throughput and virtual I/O interrupts of its co-located VMs, respectively. S_band^i and S_inter^i denote the available I/O bandwidth and the maximum I/O interrupts obtained in domain-0 of the hosting PM, respectively. In practice, we calculate f_d^i only once for the co-located VMs on a PM, as they share the same value of f_d^i on the hosting PM. Moreover, the performance degradation factor of VM interference f_d is formulated as the average interference value over all the provisioned VMs, which is given by

    f_d = (Σ_i f_d^i) / n,  ∀i ∈ [1, n],   (12)

as we have analyzed in Sec. 3.1.

The model coefficients c_0, c_1 and c_2 in Eq. (11) can be trained using the least squares method on the historical resource utilization information of various applications executed in the datacenter. Our model is initially constructed by offline running a number of applications on multiple VM instances hosted on a single PM. It then keeps updating and refining the model coefficients online for calibration [29], as new applications run and complete in the datacenter, as shown in Fig. 1. In particular, our demand-supply model of VM interference in Eq. (11) can be extended to other resource dimensions, such as the cache and memory resources. Also, the Heifer performance model can be extended to support mixed types of VM instances (e.g., small and large instances) by acquiring the three key parameters P̄_m, P̄_r and P_s per instance type.
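To make the supply-demand model concrete, the sketch below computes f_d^i from Eq. (11) (with the bandwidth + interrupt split of the I/O term) and fits c_0, c_1, c_2 by ordinary least squares via the normal equations. All names are ours; clipping f_d^i into (0, 1] is our addition to match the range the model defines:

```python
def interference_factor(c, s_cpu, d_cpu, s_band, d_band, s_intr, d_intr):
    """Eq. (11): f_d^i = c0 + c1 * (S_cpu/D_cpu) + c2 * (S_io/D_io),
    where S_io/D_io = S_band/D_band + S_inter/D_inter."""
    r_cpu = s_cpu / d_cpu
    r_io = s_band / d_band + s_intr / d_intr
    f = c[0] + c[1] * r_cpu + c[2] * r_io
    return min(max(f, 1e-6), 1.0)  # clip into (0, 1] (our addition)

def fit_interference_model(samples):
    """Least-squares fit of (c0, c1, c2) via the normal equations
    X^T X c = X^T y; samples are (cpu_ratio, io_ratio, observed_fd)."""
    X = [(1.0, r_cpu, r_io) for r_cpu, r_io, _ in samples]
    y = [fd for _, _, fd in samples]
    A = [[sum(x[i] * x[j] for x in X) for j in range(3)] for i in range(3)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(3)]
    for col in range(3):                       # Gaussian elimination
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, 3):
            m = A[row][col] / A[col][col]
            for k in range(col, 3):
                A[row][k] -= m * A[col][k]
            b[row] -= m * b[col]
    c = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):                      # back substitution
        c[row] = (b[row] - sum(A[row][k] * c[k]
                               for k in range(row + 1, 3))) / A[row][row]
    return c
```

In an online deployment, the sample list would simply keep growing with each completed job, and the fit would be re-run periodically for calibration.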

4 HEIFER: HARDWARE HETEROGENEITY AND INTERFERENCE-AWARE VM INSTANCE PROVISIONING STRATEGY

Based on the application performance model developed in Sec. 3, we further present Heifer in Algorithm 1, with the aim of provisioning the VM instances of the good-performing hardware type to tenant applications while achieving application performance goals, by exploring the hardware heterogeneity and capturing performance interference across the VMs. In particular, the performance variation caused by hardware heterogeneity can be mitigated, while VM interference is captured by the Heifer performance model in Sec. 3, so as to deliver predictable performance to tenant applications.

The Rationale for Deploying Heifer. To provide predictable performance for tenant applications, the cloud provider has the flexibility to recommend (or allocate) either a large number of VM instances with poor application performance or a few VM instances with good application performance to tenant jobs, as elaborated in Sec. 2.3. Fortunately, the cloud provider has the following two incentives to deploy Heifer, so as to provision the VM instances with good application performance to tenant jobs. First, it attracts performance-sensitive customers to deploy their applications in IaaS clouds, as Heifer helps to cut down the job budget while guaranteeing job performance for tenants. Second, the provider can complete a tenant job with fewer VM instances than existing VM provisioning approaches, such as ARIA [14]. Accordingly, the provider has more available instances to support other tenant jobs, thereby improving the job throughput of the datacenter.

How Does Heifer Work? Given a tenant job and its performance goal T_g, Heifer first obtains and initializes the model parameters through job profiling and Eqs. (6) and (12), for each type of instance hardware t. Then, it calculates the base number of map slots s_m^base by solving Eq. (10) with the ideal job completion time T_low set to T_g. Starting from ⌊s_m^base / β⌋, Heifer iteratively examines each possible number of provisioned VM instances n up to the upper limit min(max(⌈n_m / β⌉, ⌈α · n_r / β⌉), n_avail^t). For each number of VM instances n, it updates the model parameters P_s and f_d, and calculates the job completion time T using Eq. (9). Only when the job performance goal T_g is achieved (i.e., T ≤ T_g) can n be considered as a candidate number of provisioned VM instances. Finally, Heifer identifies the minimum n_ins among the candidate numbers of provisioned VM instances over all hardware types, together with the corresponding selected hardware type t_ins.

Algorithm 1: Heifer: heterogeneity and interference-aware VM instance provisioning strategy.

Input: Job performance goal T_g, job input data set, available VM instances n_avail^t for each hardware type t, the number of map slots β configured in a VM instance.
Output: VM provisioning plan (i.e., the number and hardware type of provisioned VM instances).

1:  Initialize n_ins ← ∞, t_ins ← null;
2:  for each hardware type t of VM instances do
3:    Obtain parameters P̄_m, P̄_r, r_m, r_r by job profiling;
4:    Initialize P_s ← Eq. (6), f_d ← Eq. (12) by letting n = n_avail^t;
5:    s_m^base ← solving Eq. (10) by letting T_low = T_g;
6:    for all n ∈ (⌊s_m^base / β⌋, min(max(⌈n_m / β⌉, ⌈α · n_r / β⌉), n_avail^t)] do
7:      s_m ← β · n;  s_r ← (β / α) · n;
8:      Update model parameters P_s, f_d by Eqs. (6), (12);
9:      Calculate job completion time T ← Eq. (9);
10:     if T ≤ T_g && n < n_ins then
11:       n_ins ← n;  t_ins ← t;
12:       break out of the inner loop;
13:     end if
14:   end for
15: end for
16: return the number of provisioned VM instances n_ins and the corresponding hardware type t_ins.

Algorithm 1 can be executed periodically to check whether the performance goals can be achieved for long-running jobs, and to update the VM provisioning plan accordingly, as illustrated in Fig. 1. Specifically, if the predicted application performance cannot meet the job performance goal, Heifer will add an appropriate number of VMs to the tenant job in the next period. If the predicted application performance exceeds the performance goal by a certain threshold (e.g., 20%), Heifer will reduce the number of leased VM instances to cut down the monetary cost of tenants, while still meeting the job performance goals. The checking period can be adjusted by the cloud provider, ranging from minutes to one hour. In particular, to achieve good job performance, Heifer is designed to provision VM instances of the same hardware type to execute a single tenant job.

Given the notations in Algorithm 1, the complexity of the Heifer provisioning strategy is in the order of O(k · p), where k = min(max(⌈n_m / β⌉, ⌈α · n_r / β⌉), n_avail^t) − ⌊s_m^base / β⌋ denotes the cardinality of the VM provisioning space for each hardware type t, and p is the number of hardware types of VM instances. In general, a cloud datacenter has a limited number of hardware types within the same instance type, e.g., four types in Amazon EC2. Accordingly, p can be considered as a constant, and the complexity of Heifer reduces to O(k). In particular, an algorithm result of n_ins = ∞ or t_ins = null indicates that the job performance goal cannot be satisfied by the available cloud resources. Such jobs wait in a queue until enough resources are available.
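The control flow of Algorithm 1 can be sketched as below. Here predict_T(t, n) and solve_base_slots(t, T_g) stand in for Eqs. (9) and (10) (including the per-n updates of P_s and f_d), and all names are our own rather than the paper's implementation:

```python
import math

def heifer_provision(T_g, hardware_types, predict_T, solve_base_slots,
                     n_m, n_r, alpha, beta):
    """Sketch of Algorithm 1. hardware_types maps type t -> available VMs.
    Returns (n_ins, t_ins), or None if no plan meets the goal T_g."""
    best_n, best_t = math.inf, None
    for t, n_avail in hardware_types.items():
        lo = solve_base_slots(t, T_g) // beta          # floor(s_m^base / beta)
        hi = min(max(math.ceil(n_m / beta), math.ceil(alpha * n_r / beta)),
                 n_avail)
        for n in range(max(lo + 1, 1), hi + 1):        # scan (lo, hi]
            if predict_T(t, n) <= T_g and n < best_n:
                best_n, best_t = n, t                  # smallest feasible n
                break
    return (best_n, best_t) if best_t is not None else None
```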


Fig. 7: Comparison of the predicted and observed completion time for (a) Sort, (b) Grep and (c) WordCount MapReduce applications. (Each panel plots completion time in seconds against the number of VMs, from 2 to 18, with observed and predicted curves for E5620 and E7420.)

Fig. 8: Comparison of phase completion time for Sort with 4, 8 and 16 provisioned VM instances of E5620. (Observed and predicted times are broken down into the map phase, the non-overlapped shuffle, and the reduce phase.)

5 EXPERIMENTAL EVALUATION

In this section, we evaluate Heifer by focusing on (1) the accuracy of the Heifer performance model, and (2) the effectiveness and benefits of the Heifer VM provisioning strategy. Specifically, we demonstrate the performance gain of Heifer for both tenants and providers, as well as the strategy computational overhead, compared to the random, ARIA [14] and U-CHAMPION [16] VM provisioning strategies, through prototype experiments in both public and private clouds and large-scale simulations.

5.1 Accuracy of Heifer Performance Model

We first evaluate the Heifer performance prediction model in our Xen-based private cloud. The detailed configurations of our cloud datacenter, VMs, PMs, and Hadoop [27] are elaborated in Sec. 2.2. We run each of the four representative MapReduce workloads listed in Table 6 on each type of instance hardware five times, so as to examine the accuracy and overhead of the Heifer performance model.

TABLE 6: Representative MapReduce workloads in IaaS clouds.

Workload      Input Data Set
Sort          10 GB generated by TeraGen in Hadoop [27]
Grep          20 GB of Wikipedia articles [33]
WordCount     20 GB of Wikipedia articles
TF-IDF [34]   10 GB of Wikipedia articles

To generate the model parameters in Eq. (9), we profile each application workload with its sampled input data on a VM instance running alone on a PM, as elaborated in Sec. 3.2. To improve the accuracy of the Heifer performance model, we use a resampling method (i.e., bootstrap aggregating) [32]: we sample the application input data set repeatedly (five times in our experiment) for job profiling, and then average the multiple profiling outputs to obtain the model parameters P̄_m, P̄_r, r_m and r_r, as summarized in Table 7. In particular, the size of the sampled input data needs to be sufficient, and can be empirically set as the maximum of 10% of the application input data and the heap size of Hadoop. To obtain the model parameter f_d, we train the VM interference model in Eq. (11) by running the workloads listed in Table 6 ten times on 2 to 12 VMs co-located on a PM.
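The averaging step of this bootstrap-aggregating profiling can be sketched as follows (the dictionary keys and function name are ours):

```python
from statistics import mean

def aggregate_profiles(runs):
    """Average each profiled model parameter over several profiling runs
    on resampled inputs (bootstrap aggregating for job profiling).
    Each run is a dict of parameter name -> measured value."""
    return {key: mean(run[key] for run in runs) for key in runs[0]}
```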

TABLE 7: Model parameters obtained from job profiling. The job profiling duration is measured in seconds. P̄_m and P̄_r are measured in MB/s.

Workload    Hardware   Duration   r_m    r_r    P̄_m    P̄_r
Sort        E5620      216        1.00   1.00   5.56   2.69
            E7420      188                      4.55   3.59
Grep        E5620      203        0      0      6.96   N/A
            E7420      240                      5.33   N/A
WordCount   E5620      416        0.4    0.85   2.81   1.72
            E7420      586                      1.94   1.89
TF-IDF      E5620      468        1.95   1.42   1.06   1.78
            E7420      304                      1.51   2.46

Prediction of Job Completion Time. The Heifer performance model is built using the model parameters obtained above. Fig. 7 compares the predicted and observed completion time of the MapReduce workloads with error bars of standard deviations. We observe that the predicted completion time basically fits the observed completion time of the MapReduce applications as the number of VMs varies from 2 to 18. The average prediction error for Sort, Grep and WordCount is 9.5%, 6.8% and 5.8%, respectively. Accordingly, the model predicts the performance of Grep and WordCount slightly better than that of Sort. The rationale is that both Grep and WordCount (CPU-intensive) are less affected by the I/O bandwidth resource than the Sort (I/O-intensive) application; meanwhile, the VM I/O utilization tends to be unstable compared to the CPU utilization of VMs, which is likely to cause a larger prediction error for VM I/O utilization than for VM CPU utilization, as we have analyzed in Sec. 2.2.

Prediction of Phase Completion Time. We further evaluate the accuracy of the Heifer performance model in predicting the completion time of each MapReduce phase. As the map phase overlaps with the shuffle phase, we examine the non-overlapped part of the shuffle phase. As observed in Fig. 8, the predicted completion time of the map phase is more accurate than that of the other two phases, while the observed completion time of the shuffle phase is larger than the predicted completion time. This is because imbalance of the intermediate data (i.e., map-output materialized data) does exist in practice, while our model assumes the intermediate data is evenly distributed among reduce tasks for fast prediction.

Discussion on Performance Prediction Error. As shown in Fig. 7 and Fig. 8, we observe certain prediction errors (i.e., within 10%) of the completion time for jobs and phases, which mainly stem from the task failures of MapReduce. By examining the logs of Hadoop jobs, we find that a number of task failures occur, which prolongs job completion and causes the prediction error of job performance. Accordingly, to account for such prediction errors, we simply add an empirical error-tolerance value (i.e., 5%) to the performance goals in our prototype experiments. We will incorporate the factor of task failures into the Heifer performance model to improve its prediction accuracy in our future work.

Fig. 9: Comparison of ARIA, U-CHAMPION and Heifer VM provisioning strategies in terms of: (a) the number of provisioned VM instances with a given completion time goal, and (b) whether the job completion time goals can be achieved.

Fig. 10: Effectiveness of Heifer in our local private cloud: (a) the numbers of VM instances provisioned to MapReduce applications with different performance goals, and (b) whether the job completion time goals can be achieved.

Heifer Prediction Overhead. By processing a sampled data set for job profiling, the prediction overhead of MapReduce job performance is well constrained. As shown in Table 7, the average profiling duration for each application workload is within 10 minutes. Moreover, a job application needs to be profiled only once for its subsequent executions, as discussed in Sec. 3.2. In addition, the historical information of MapReduce job executions can be used as the input of job profiling, to refine the performance prediction model and reduce the job profiling overhead, as shown in Fig. 1.

5.2 Effectiveness and Overhead of Heifer

We further evaluate the effectiveness and overhead of the Heifer VM provisioning strategy, compared with two recently proposed strategies, ARIA [14] and U-CHAMPION [16]. Specifically: (1) ARIA [14] models the application performance based on job profiling. It iteratively finds the feasible numbers of VM instances through the entire possible range of VM allocations, so as to meet the performance goals of tenant jobs. (2) U-CHAMPION [16] predicts the application performance using a Support Vector Machine (SVM) regression model and online job profiling on different hardware types. It provisions good-performing VM instances to tenant jobs by optimizing the monetary cost of tenants.

By running representative cloud applications in our local 72-VM private cloud3 and a commonly used public cloud (i.e., Microsoft Azure), we examine the performance of each VM provisioning strategy, in terms of the number of provisioned VM instances, whether the job performance goals can be achieved, and the strategy computation time. We run the jobs five times for each provisioning plan, and illustrate the job completion time with error bars of standard deviations.

5.2.1 Validating Effectiveness of Heifer in Local Private Cloud

We first run the MapReduce workloads listed in Table 6 in our local private cloud, to validate the effectiveness

3. Our private cloud has 12 PMs, and each PM hosts a maximum of 6 VM instances to prevent the severe VM interference shown in Fig. 3.

and computational overhead of Heifer. In particular, the model parameters P̄_m, P̄_r, r_m and r_r for each application are set as the profiling statistics listed in Table 7. The other model parameters P_s and f_d are trained and measured online during the execution of the applications, as elaborated in Sec. 3.2 and Sec. 5.1.

Heifer vs. ARIA & U-CHAMPION. To illustrate the effectiveness of Heifer, we compare the numbers of provisioned VM instances and the actual job completion time achieved by the Heifer, ARIA [14] and U-CHAMPION [16] strategies. We set the completion time goals for the Sort and Grep applications as 1,000 seconds, and the time goals for the WordCount and TF-IDF applications as 2,000 seconds. As shown in Fig. 9(a), Heifer reduces the number of VM instances provisioned to MapReduce jobs by 12.5%-60% compared to ARIA, while increasing the number of provisioned VMs by 11.1%-33.3% compared with U-CHAMPION. Nevertheless, all the jobs with U-CHAMPION violate the completion time goals, while ARIA and Heifer can basically meet the given time goals of the jobs, as shown in Fig. 9(b).

The rationale for the observations above is that: (1) ARIA is a heterogeneity-oblivious and interference-unaware VM provisioning strategy. It uses the profiled application-level model parameters (e.g., t_m, t_r) to predict the job performance by Eq. (1), under the assumption that all the VMs are identical and fully isolated. Accordingly, to meet the job completion time goals, ARIA needs to provision more VMs than Heifer and U-CHAMPION. (2) U-CHAMPION explicitly considers the hardware heterogeneity within the same instance type and uses job profiling and the SVM regression model to predict job performance. Hence, U-CHAMPION can always provision the good-performing VM instances to tenant jobs, as shown in Fig. 9(a). Yet, it is still an interference-unaware strategy for provisioning cloud VMs, which consequently leads to violations of the job performance goals. In contrast, Heifer predicts the performance of tenant jobs by online tracing of the multi-dimensional VM resource utilization and capturing VM interference. Moreover, Heifer provisions the good-performing VMs to tenant jobs by explicitly taking the hardware heterogeneity and VM interference into account.

Benefits of Heifer. As Heifer finishes the jobs ahead of the expected completion time goals with fewer provisioned VM instances than the ARIA and U-CHAMPION strategies, it brings the following benefits to both tenants and providers: (1) it can save the job budget for tenants, as the monetary cost of jobs is linear in the number of provisioned VMs of the same instance type; (2) the cloud datacenter can support more jobs at a time, which increases the job throughput of the datacenter for providers. The benefits of Heifer will be quantitatively validated in large-scale simulations in Sec. 5.3.

Fig. 11: Computation time of the Heifer, ARIA and U-CHAMPION VM provisioning strategies running on a commodity server, with the representative Hadoop Sort workload.

Fig. 12: Effectiveness of Heifer in Microsoft Azure: (a) the numbers of A1 VM instances provisioned to MapReduce applications with different performance goals, and (b) whether the job completion time goals can be achieved.

Does Heifer Provide Predictable Performance for Tenant Jobs? We examine whether Heifer can guarantee the job performance goals by provisioning an appropriate number of VM instances. Specifically, the completion time goals for the Sort and Grep applications are set as 500, 750 and 1,000 seconds, while the time goals for the WordCount and TF-IDF applications are set as 2,000, 2,500 and 3,000 seconds. Fig. 10(a) compares the numbers of provisioned VM instances for each job with different time goals. We observe that Heifer provisions a decreasing number of VM instances to MapReduce jobs as the completion time goal increases. With the provisioned VMs, Fig. 10(b) shows that the actual completion time of these MapReduce jobs running in our private cloud is basically below the time goals of the jobs, which demonstrates that Heifer can guarantee the performance goals of MapReduce jobs for tenants through our VM provisioning strategy. This observation further validates the accuracy of the Heifer performance prediction model.

Heifer Computational Overhead. To evaluate the computational overhead of Heifer, we compare the strategy computation time of Heifer, ARIA and U-CHAMPION, running on a commodity server (one quad-core Intel Xeon E5620 2.40 GHz processor). The amount of input data for the Sort application is varied from 2 GB to 40 GB. As shown in Fig. 11, the computation time of all three strategies is less than 0.7 seconds, while the strategy computation time of Heifer is only around 0.2 seconds as the input data set grows to 40 GB. Such a computational overhead is negligible in practice, and can be further reduced by running Heifer on a more powerful server.

In more detail, the computation time of the Heifer and ARIA strategies increases roughly linearly with the amount of input data, and the strategy computation time of ARIA grows much faster than that of Heifer. The rationale is that, to find the VM provisioning plan, ARIA iterates over the entire range of map slot allocations from the maximum number of map

tasks to one [14], while Heifer optimizes the iteration process by starting from a base number of map slots and scanning up to the maximum number of map tasks, as elaborated in Sec. 4. In contrast, the strategy computation time of U-CHAMPION stays around 0.6 seconds, which is mainly attributed to the classification process of the SVM regression model in U-CHAMPION, implemented with the SVM toolbox in Matlab in our prototype experiments.
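The search-space difference between the two strategies can be sketched as follows. This is our illustration, not the authors' code: `predict_time()` is a hypothetical stand-in for the actual performance models, using a toy wave-based cost model of our own.

```python
def predict_time(map_slots, total_map_tasks, time_per_wave=50.0):
    """Toy cost model: map tasks run in ceil(total/slots) waves."""
    waves = -(-total_map_tasks // map_slots)  # ceiling division
    return waves * time_per_wave

def aria_style_search(total_map_tasks, goal):
    # ARIA-style: iterate over the entire range of map-slot allocations,
    # from the maximum number of map tasks down toward one [14].
    best, evaluations = None, 0
    for slots in range(total_map_tasks, 0, -1):
        evaluations += 1
        if predict_time(slots, total_map_tasks) <= goal:
            best = slots   # smallest allocation meeting the goal so far
        else:
            break          # times only grow as slots shrink further
    return best, evaluations

def heifer_style_search(total_map_tasks, goal, time_per_wave=50.0):
    # Heifer-style: start from a base number of map slots implied by the
    # goal, then scan upward toward the maximum number of map tasks.
    max_waves = max(1, int(goal // time_per_wave))
    base = -(-total_map_tasks // max_waves)
    evaluations = 0
    for slots in range(base, total_map_tasks + 1):
        evaluations += 1
        if predict_time(slots, total_map_tasks) <= goal:
            return slots, evaluations
    return None, evaluations
```

With 100 map tasks and a 250-second goal, both searches return the same 20-slot plan, but the Heifer-style search evaluates the model far fewer times, which is consistent with the flatter curve observed for Heifer.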

5.2.2 Validating Effectiveness of Heifer in Microsoft Azure

Moreover, to examine the effectiveness of Heifer in a public IaaS cloud, we implement Heifer upon the 20 A1 VM instances in Microsoft Azure, which are actually equipped with two types of hardware, as elaborated in Sec. 2.1. In this experiment, we run two representative cloud applications, i.e., Hadoop WordCount and Sort, as listed in Table 2. Each VM runs Hadoop v1.2.1 [27] with the default Hadoop configuration. It should be noted that we have no access to the host domain in public IaaS clouds, so we cannot measure the available resource capacity of PMs. Hence, we empirically set the performance degradation factor of VM interference fd as the constant value 1/(1 + 0.291), according to the maximum value (29.1%) of performance interference measured in Sec. 2.2. The performance goals for the WordCount and Sort applications are set as 500, 1,000 and 1,500 seconds.
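The fixed degradation factor and its effect on a predicted completion time can be sketched as follows; the time-scaling rule (stretching time by 1/fd) is our reading of how such a factor is applied, not the paper's exact model.

```python
# fd = 1 / (1 + 0.291), where 29.1% is the maximum interference
# measured in Sec. 2.2 of the paper.
MAX_INTERFERENCE = 0.291
f_d = 1.0 / (1.0 + MAX_INTERFERENCE)

def degraded_time(ideal_time_s, factor=f_d):
    # Effective capacity shrinks by `factor`, so the predicted
    # completion time stretches by 1/factor.
    return ideal_time_s / factor

print(round(f_d, 3))                 # ~0.775
print(round(degraded_time(400), 1))  # a 400 s ideal run stretches to ~516.4 s
```

Because fd is pinned to the worst measured case, every prediction assumes maximum interference, which explains the over-provisioning seen in the Azure results below.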

Fig. 12(a) depicts the numbers of A1 VM instances provisioned to the WordCount and Sort applications with the different completion time goals above. Similar to the experimental results obtained in our local private cloud, Heifer provisions a decreasing number of VM instances as the completion time goals increase. As shown in Fig. 12(b), the actual completion time of the two applications is far below the time goals. Such an observation indicates that Heifer can guarantee the job performance goals, though it over-provisions the A1 VM instances for jobs in Microsoft Azure. The rationale for the VM over-provisioning is that, to guarantee the job completion time goals in the worst case, the performance degradation factor fd is statically set as a conservatively large empirical value in the Heifer performance model, since we cannot measure the resource capacity of PMs in public clouds.

5.3 Benefits of Heifer to Both Tenants and Providers

To evaluate the effectiveness and benefits of Heifer at scale and obtain complementary insights, we also conduct large-scale simulations driven by real-world job traces from Facebook Inc. [35]. Specifically, we compare Heifer with the random and ARIA [14] strategies by running 5,894 MapReduce jobs in the traces. We set up a 600-node datacenter configured with the two hardware types (i.e., Intel E5620 and E7420) shown in Table 1 in our simulator [36]. Each node is configured with 4 map and 4 reduce slots. The size of a data block is set as 128 MB. For simplicity, we only consider hardware heterogeneity in the simulations, and set the model parameters rm, rr, Pm and Pr as the job statistics for Sort in Table 7. Another parameter, Ps, is set as the statistics of the average VM resource utilization for Sort in Table 3. The completion time goal of jobs is statically set as the average execution time (i.e., 167.12 seconds) of jobs in the traces [35].

Fig. 13: Performance comparison of MapReduce jobs with the Heifer, ARIA and random VM provisioning strategies in terms of (a) the number of provisioned VM instances and (b) job completion time. [Figure omitted; the CDFs annotate 72%, 80% and 97% of jobs meeting their time goals with the random, ARIA and Heifer strategies, respectively.]

Fig. 14: Benefits of Heifer to both tenants and providers in terms of the tenant cost to run all the jobs in the traces and the provider revenue in one month. [Figure omitted.]

As shown in Fig. 13(a), Heifer can reduce the number of provisioned VM instances by 26.5%-64.7% compared with the random and ARIA strategies. Specifically, to complete jobs within the time goals, the random, ARIA and Heifer strategies provision the jobs with 191, 151 and 116 VM instances on average, respectively. With the provisioned VM instances, Fig. 13(b) compares the average completion time of each MapReduce job achieved by the three strategies. We observe that the random and ARIA strategies complete each MapReduce job in 310 seconds and 198 seconds on average, respectively, and they guarantee the completion time goals for only 72% and 80% of the jobs. In contrast, Heifer reduces the average job completion time to 166 seconds and completes 97% of the jobs within their time goals.

Analysis of Simulation Results: (1) As the random strategy is likely to over-provision VMs to tenant jobs compared with Heifer and ARIA, it can make a number of jobs wait in the job queue due to the constraint on the available VM instances in the cloud, thereby prolonging the job completion time. Our simulation logs reveal that 28.2% of jobs joined the queue with the random strategy, incurring an average waiting time of 12.6 seconds per job. Furthermore, the random strategy cannot provide predictable performance for tenant jobs, which causes 28% of jobs to miss their time goals. (2) As discussed in Sec. 5.2.1, ARIA predicts the job performance without considering VM interference, which makes 20% of jobs violate their time goals in our simulations. Meanwhile, ARIA is likely to provision inappropriate (i.e., bad-performing) VMs to tenant jobs, as it is oblivious to the hardware heterogeneity of cloud

VMs. As a result, ARIA provisions more VM instances and causes longer job completion time than Heifer.

Job Budget of Tenants. Given the VM instance price per minute4 price, the number of provisioned VM instances nodes_num, and the job execution time in minutes time, we compare the total budget total_cost of executing the jobs in the Facebook traces [35] with the random, ARIA and Heifer strategies as follows.

total_cost = price · nodes_num · time.   (13)

The price of VM instances is set as 0.087 US dollars per hour (i.e., 0.00145 US dollars per minute), according to the price of A1 VM instances in Microsoft Azure [2]. By Eq. (13), the total job budget is calculated as 8,433, 4,258 and 2,742 US dollars with the random, ARIA and Heifer strategies, respectively, as shown in Fig. 14. Accordingly, Heifer can save the job budget by at least 55.2% compared to the random and ARIA strategies.
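The budget totals above follow directly from Eq. (13) applied per job; a minimal sketch, plugging in the A1 price, the 5,894-job trace count, and the average VM counts and per-job times reported for this simulation (small differences from the paper's totals are rounding):

```python
PRICE_PER_MIN = 0.087 / 60        # A1 instance: 0.00145 USD per VM-minute
NUM_JOBS = 5894                   # MapReduce jobs in the Facebook traces [35]

def total_cost(nodes_num, time_seconds, price=PRICE_PER_MIN):
    # Eq. (13), price * nodes_num * time (in minutes), summed over all jobs.
    return NUM_JOBS * price * nodes_num * (time_seconds / 60.0)

for name, vms, secs in (("random", 191, 310), ("ARIA", 151, 198), ("Heifer", 116, 166)):
    print(name, round(total_cost(vms, secs)))
```

This reproduces the reported totals of roughly 8,433, 4,258 and 2,742 US dollars to within a dollar of rounding error.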

Revenue of Providers. We next examine the benefits of Heifer from the provider's perspective. The first metric is the job throughput of the cloud datacenter, calculated as the number of seconds in an hour (i.e., 3,600) divided by the average job completion time. Accordingly, the datacenter throughput is 21 jobs per hour with Heifer, compared to 18 jobs per hour with ARIA and 11 jobs per hour with the random strategy. The second metric for providers is the cloud revenue. Given a fixed time frame and incoming tenant jobs, the revenue of the provider is linear in the job throughput of the datacenter. We assume that the provider's income for running a job is 0.5 US dollars. Accordingly, the revenue of the provider in one month sums to 7,560, 6,480 and 3,960 US dollars with the Heifer, ARIA and random strategies, respectively. As shown in Fig. 14, Heifer achieves an increase of over 16.7% in the provider's revenue compared to the random and ARIA strategies. As a result, deploying Heifer brings benefits to both tenants and providers.
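The throughput and revenue figures can be reproduced with the arithmetic below; the 30-day month is our assumption, chosen because it matches the reported monthly totals.

```python
INCOME_PER_JOB = 0.5          # assumed provider income per job, USD (Sec. 5.3)
HOURS_PER_MONTH = 24 * 30     # assumed 30-day month

def throughput(avg_completion_s):
    # Jobs per hour: seconds in an hour over the average completion time.
    return 3600 // avg_completion_s

def monthly_revenue(avg_completion_s):
    return INCOME_PER_JOB * throughput(avg_completion_s) * HOURS_PER_MONTH

for name, secs in (("Heifer", 166), ("ARIA", 198), ("random", 310)):
    print(name, throughput(secs), monthly_revenue(secs))
```

This yields 21, 18 and 11 jobs per hour and 7,560, 6,480 and 3,960 US dollars per month for Heifer, ARIA and the random strategy, matching the figures in the text.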

6 RELATED WORK

There have been a number of works that offer evidence of hardware heterogeneity within the same type of VM instances in IaaS clouds. For instance, [6], [8] identified that two classes of CPU hardware (i.e., Intel Xeon and AMD Opteron processors) are provisioned to

4. Microsoft Azure [2] and Google Compute Engine [20] charge tenants for the resources they rent by the minute.


EC2 m1 class instances. As Amazon datacenters have been growing, several more recent studies [7], [18], [37] reported that there exist four types of Intel Xeon CPU processors for EC2 m1 class VM instances, as listed in Table 1. Complementary to the prior studies, Sec. 2.1 has confirmed the prevalence of hardware heterogeneity in another commonly used public IaaS cloud (i.e., Microsoft Azure) by identifying (at least) two types of hardware configuration for A1 VM instances.

To mitigate the performance variation caused by such hardware heterogeneity on behalf of tenants, a simple heterogeneity-aware workload placement strategy [8], [38] is designed to greedily place tenant jobs onto the good-performing cloud VMs. Heifer differs from such a strategy in two aspects: first, [8], [38] only mitigate the performance impact caused by hardware heterogeneity, while Heifer aims to provide predictable performance for tenant jobs by considering hardware heterogeneity. Second, to identify the VM instances with good running performance, [38] uses resource correlation analysis between a reference application and tenant workloads on each hardware type, while [8] relies on opportunistic workload placement that simply drops the VMs with poor performance. In contrast, Heifer devises a performance prediction model based on VM resource utilization to seek the good-performing VM instances for tenant jobs.

As cloud tenants increasingly care about their high-level job performance goals (e.g., job completion time) [4], several works have recently emerged on provisioning cloud VMs to provide predictable performance for tenant jobs [39]. For example, Elastisizer [13] leveraged job execution profiles and what-if analysis to determine the MapReduce job configuration and the instance type and number of provisioned VMs. Both ARIA [14] and Bazaar [4] developed a MapReduce performance model using the historical information of jobs, which can further be used to estimate the resources to provision for MapReduce jobs. To save the job budget while guaranteeing the job completion time, Matrix [15] provisioned the VMs of the cost-effective instance type by predicting the application performance using clustering models, while RHIC [40] designed a black-box performance model using system-level metrics, so as to provision user workloads with an appropriate number of dedicated and volunteer VMs in a hybrid cluster. Conductor [5] focused on optimizing the deployment cost of high-level cloud resources, including computation, storage and data transfer, for MapReduce applications.

While we share the ideas of job profiling and performance modeling of MapReduce applications with the prior works above, the key difference in Heifer is that we further take a step to explicitly incorporate the hardware heterogeneity and VM interference into the performance model, and to provision the VM instances of the good-performing hardware type to tenant applications.

A more recent work [16] designed a VM provisioning middleware for MapReduce jobs named U-CHAMPION to mitigate the performance variation caused by hardware heterogeneity. However, it does not consider the performance interference of provisioned VMs, which cannot be overlooked in IaaS clouds [12], [30]. Heifer further differs from U-CHAMPION in that U-CHAMPION predicts the MapReduce performance using an SVM regression model, while Heifer devises a detailed MapReduce performance model using the online-measured VM resource utilization. Another recent work [41] focused on analyzing the task-level interference between CPU-intensive (i.e., map or reduce) and I/O-intensive (i.e., shuffle) tasks within a VM. Given a job budget, it provisions tenant jobs with suitable types of VM instances to shorten the job completion time. Orthogonal to such a prior work, Heifer focuses on the VM-level performance interference, and provisions jobs with good-performing VMs within the same instance type. In addition, energy efficiency [42] is considered as a future extension.

7 CONCLUSION

To facilitate predictable application performance for tenants in IaaS clouds, this paper presents the design of Heifer, a hardware heterogeneity and interference-aware VM provisioning framework. By focusing on MapReduce as a representative cloud application, Heifer predicts the performance of MapReduce applications using the online-measured resource utilization of VM instances and by capturing VM interference. Based on the prediction of application performance, Heifer further provisions the VM instances of the good-performing hardware type to tenant jobs, so as to achieve job performance goals while saving the monetary cost for tenants. With extensive prototype experiments in both public and private clouds, as well as large-scale trace-driven simulations, we have demonstrated that Heifer can guarantee the completion time goals of MapReduce jobs and save at least 55.2% of the job budget for tenants, in comparison to recently proposed VM provisioning strategies (e.g., ARIA) in clouds. Moreover, Heifer is able to improve the throughput of MapReduce jobs in the cloud datacenter, thereby increasing the revenue of cloud providers.

ACKNOWLEDGMENT

The research was supported by grants from the National Natural Science Foundation of China (NSFC) under grants No. 61502172, No. 61520106005, No. 61370232 and No. 61133006, and by a grant from the Science and Technology Commission of Shanghai Municipality under grant No. 14DZ2260800. The corresponding author is Fangming Liu.

REFERENCES

[1] Amazon Elastic Compute Cloud (Amazon EC2). [Online]. Available: http://aws.amazon.com/ec2/

[2] Microsoft Azure. [Online]. Available: http://azure.microsoft.com/

[3] The Rackspace Cloud. [Online]. Available: http://www.rackspace.com/


[4] V. Jalaparti, H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron, “Bridging the Tenant-Provider Gap in Cloud Services,” in Proc. of SOCC, Oct. 2012.

[5] A. Wieder, P. Bhatotia, A. Post, and R. Rodrigues, “Orchestrating the Deployment of Computations in the Cloud with Conductor,” in Proc. of NSDI, Apr. 2012.

[6] Z. Ou, H. Zhuang, A. Lukyanenko, J. K. Nurminen, P. Hui, V. Mazalov, and A. Yla-Jaaski, “Is the Same Instance Type Created Equal? Exploiting Heterogeneity of Public Clouds,” IEEE Transactions on Cloud Computing, vol. 1, no. 2, 2013.

[7] P. Leitner and J. Cito, “Patterns in the Chaos: A Study of Performance Variation and Predictability in Public IaaS Clouds,” in Proc. of WWW, May 2015.

[8] B. Farley, V. Varadarajan, K. D. Bowers, A. Juels, T. Ristenpart, and M. M. Swift, “More for Your Money: Exploiting Performance Heterogeneity in Public Clouds,” in Proc. of SOCC, Oct. 2012.

[9] F. Xu, F. Liu, L. Liu, H. Jin, B. Li, and B. Li, “iAware: Making Live Migration of Virtual Machines Interference-Aware in the Cloud,” IEEE Transactions on Computers, vol. 63, no. 12, 2014.

[10] S. K. Barker and P. Shenoy, “Empirical Evaluation of Latency-sensitive Application Performance in the Cloud,” in Proc. of MMSys, Feb. 2010.

[11] A. Shieh, S. Kandula, A. Greenberg, C. Kim, and B. Saha, “Sharing the Data Center Network,” in Proc. of NSDI, Mar. 2011.

[12] F. Xu, F. Liu, H. Jin, and A. V. Vasilakos, “Managing Performance Overhead of Virtual Machines in Cloud Computing: A Survey, State of Art and Future Directions,” Proceedings of the IEEE, vol. 102, no. 1, 2014.

[13] H. Herodotou, F. Dong, and S. Babu, “No One (Cluster) Size Fits All: Automatic Cluster Sizing for Data-intensive Analytics,” in Proc. of SOCC, Oct. 2011.

[14] A. Verma, L. Cherkasova, and R. H. Campbell, “Resource Provisioning Framework for MapReduce Jobs with Performance Goals,” in Proc. of Middleware, Dec. 2011.

[15] R. C. Chiang, J. Hwang, H. H. Huang, and T. Wood, “Matrix: Achieving Predictable Virtual Machine Performance in the Clouds,” in Proc. of ICAC, Jun. 2014.

[16] E. Pettijohn, Y. Guo, P. Lama, and X. Zhou, “User-Centric Heterogeneity-Aware MapReduce Job Provisioning in the Public Cloud,” in Proc. of ICAC, Jun. 2014.

[17] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron, “Towards Predictable Datacenter Networks,” in Proc. of SIGCOMM, Aug. 2011.

[18] M. Hajjat, R. Liu, Y. Chang, T. S. E. Ng, and S. Rao, “Application-Specific Configuration Selection in the Cloud: Impact of Provider Policy and Potential of Systematic Testing,” in Proc. of Infocom, Apr. 2015.

[19] C. Reiss, A. Tumanov, G. R. Ganger, R. H. Katz, and M. A. Kozuch, “Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis,” in Proc. of SOCC, Oct. 2012.

[20] Google Compute Engine. [Online]. Available: https://cloud.google.com/compute/

[21] X. Yi, F. Liu, J. Liu, and H. Jin, “Building a Network Highway for Big Data: Architecture and Challenges,” IEEE Network Magazine, vol. 28, no. 4, pp. 5–13, 2014.

[22] V. Jalaparti, P. Bodik, I. Menache, S. Rao, K. Makarychev, and M. Caesar, “Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can,” in Proc. of SIGCOMM, Aug. 2015.

[23] Amazon Elastic MapReduce (Amazon EMR). [Online]. Available: http://aws.amazon.com/elasticmapreduce/

[24] HDInsight. [Online]. Available: http://azure.microsoft.com/en-us/services/hdinsight/

[25] J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Communications of the ACM, vol. 51, no. 1, 2008.

[26] sysbench. [Online]. Available: https://launchpad.net/sysbench

[27] Hadoop. [Online]. Available: http://hadoop.apache.org/

[28] SPEC CPU2006. [Online]. Available: http://www.spec.org/cpu2006/

[29] X. Bu, J. Rao, and C.-Z. Xu, “Interference and Locality-Aware Task Scheduling for MapReduce Applications in Virtual Clusters,” in Proc. of HPDC, Jun. 2013.

[30] W. Zhang, S. Rajasekaran, S. Duan, T. Wood, and M. Zhu, “Minimizing Interference and Maximizing Progress for Hadoop Virtual Machines,” ACM SIGMETRICS Performance Evaluation Review, vol. 42, no. 4, 2015.

[31] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, “Improving MapReduce Performance in Heterogeneous Environments,” in Proc. of OSDI, Dec. 2008.

[32] P. I. Good, Ed., Resampling Methods: A Practical Guide to Data Analysis, 3rd ed. Boston: Birkhauser, 2006.

[33] Wikipedia: Database download. [Online]. Available: http://en.wikipedia.org/wiki/Wikipedia:Database_download

[34] Running MapReduce TF-IDF. [Online]. Available: http://code.google.com/p/hadoop-clusternet/wiki/RunningMapReduceExampleTFIDF

[35] Y. Chen, A. Ganapathi, R. Griffith, and R. Katz, “The Case for Evaluating MapReduce Performance Using Workload Suites,” in Proc. of MASCOTS, Jul. 2011.

[36] F. Xu, F. Liu, P. Yin, and H. Jin, “Network-Aware Task Assignment for MapReduce Applications in Shared Clusters,” Journal of Internet Technology, vol. 16, no. 2, 2015.

[37] V. Gupta, M. Lee, and K. Schwan, “HeteroVisor: Exploiting Resource Heterogeneity to Enhance the Elasticity of Cloud Platforms,” in Proc. of VEE, Mar. 2015.

[38] D. Jiang, G. Pierre, and C.-H. Chi, “Resource Provisioning of Web Applications in Heterogeneous Clouds,” in Proc. of NetApp, Jun. 2011.

[39] J. Guo, F. Liu, C. Lui, and H. Jin, “Fair Network Bandwidth Allocation in IaaS Datacenters via a Cooperative Game Approach,” IEEE/ACM Transactions on Networking, 2015.

[40] R. B. Clay, Z. Shen, and X. Ma, “Accelerating Batch Analytics with Residual Resources from Interactive Clouds,” in Proc. of MASCOTS, Aug. 2013.

[41] Y. Yuan, H. Wang, D. Wang, and J. Liu, “On Interference-aware Provisioning for Cloud-based Big Data Processing,” in Proc. of IWQoS, Jun. 2013.

[42] W. Deng, F. Liu, H. Jin, C. Wu, and X. Liu, “Multigreen: Cost-Minimizing Multi-Source Datacenter Power Supply with Online Control,” in ACM e-Energy, 2013.

Fei Xu received the PhD degree in Computer Science and Engineering from Huazhong University of Science and Technology, Wuhan, China, in 2014. He is currently an assistant professor in the Department of Computer Science and Technology, East China Normal University, Shanghai, China. His research interests focus on cloud computing and virtualization technology.

Fangming Liu received the PhD degree in computer science and engineering from Hong Kong University of Science and Technology, Hong Kong, in 2011. He is a full professor at Huazhong University of Science and Technology, Wuhan, China. His research interests include cloud computing and datacenters, mobile cloud, green computing, SDN and virtualization. He was selected into the National Youth Top Talent Support Program of the National High-level Personnel of Special Support Program (the “Thousands-of-Talents Scheme”) issued by the Central Organization Department of the CPC. He is a Youth Scientist of the National 973 Basic Research Program Project on SDN-based Cloud Datacenter Networks. He was a StarTrack Visiting Faculty at Microsoft Research Asia in 2012-2013. He has been the Editor-in-Chief of EAI Endorsed Transactions on Collaborative Computing, a guest editor for IEEE Network Magazine, an associate editor for Frontiers of Computer Science, and has served on the TPCs of ACM Multimedia 2014, IEEE INFOCOM 2013–2016, ICNP 2014 and ICDCS 2015.

Hai Jin received the PhD degree in computer engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1994. He is a Cheung Kung Scholars chair professor of computer science and engineering at HUST. His research interests include computer architecture, virtualization technology, cluster computing, and grid computing. He was awarded the Excellent Youth Award from the National Science Foundation of China in 2001.