
Cloud Capability Estimation and Recommendation in Black-Box Environments Using Benchmark-Based Approximation

Gueyoung Jung, Naveen Sharma, Frank Goetz

Xerox Research Center Webster

Webster, USA

{gueyoung.jung, naveen.sharma, frank.goetz}@xerox.com

Tridib Mukherjee

Xerox Research Center India

Bangalore, India

{tridib.mukherjee2}@xerox.com

Abstract—As cloud computing has become popular and the number of cloud providers has proliferated over time, the first barrier for cloud users is how to accurately estimate the performance capabilities of many different clouds and then select the right one for a given complex workload based on those estimates. Such cloud capability estimation and selection can be a big challenge, since most clouds can be considered black-boxes to cloud users because they abstract the underlying infrastructures and technologies. This paper describes a cloud recommender system that recommends an optimal cloud configuration to users based on accurate estimates. To achieve this, our system generates a capability vector that consists of relative performance scores of resource types (e.g., CPU, memory, and disk) estimated for a given user workload using benchmarks. A search algorithm then identifies an optimal cloud configuration based on these collected capability vectors. Experiments show that our approach accurately estimates performance capability (less than 10% error) while remaining scalable in a large search space.

Keywords-estimation; recommendation; benchmarking

I. INTRODUCTION

As cloud computing has become popular, more than 100

cloud providers are in the market nowadays [1]. Enterprises

(i.e., cloud users) have started to deploy their complex work-

loads such as multi-tier web transactions and parallel data

mining into these clouds for cost-efficiency. In this trend,

the first question given to a cloud user will be identifying

a right cloud configuration for her workload to achieve her

performance goal (e.g., the maximum throughput) and figuring

out how much cost-saving she can achieve from her choice.

Meanwhile, the first question given to a cloud provider or

a recommender system will be how to efficiently estimate

performance capabilities of many different clouds and then,

identify an optimal configuration to be offered to his cloud

users at competitive price, while meeting users’ performance

goals. We define the cloud configuration for given workload

as combined virtual resources with required software systems

(e.g., operating system) to deploy the workload in a cloud.

It can be a big challenge to identify such cloud configura-

tion, since most cloud providers in the market hide or abstract

their underlying infrastructure configuration details such as

resource availability, the structure of physical resources (e.g.,

servers, storages, and networks), and how to manage their

virtual machines (VMs) [2], [3]. They only show a list of pre-

determined VM types and corresponding prices. Additionally,

they continually integrate new HW and SW artifacts into

clouds, and a cloud user can be overwhelmed by a number

of those technologies when exploring clouds for her workload

deployment. Hence, we consider such clouds as black-boxes.

A user may provision a large number of high-end VMs to

avoid the risk of failing to meet her throughput goal. This can

typically lead to over-provisioning and unnecessarily high cost.

The user may further find an optimal cloud configuration by

blindly exploring different cloud configurations and evaluating

them whether they meet her throughput goal. However, this

will be very expensive and time consuming, since she will

find a lot of different cloud configuration options, different

workloads and configurations have different performance char-

acteristics [4], and the workload deployment is typically very

complicated [5]. Although a user can figure out which VM

has the fastest CPU, disk I/O, or memory separately, this is

not sufficient for the user to deploy complex workloads. This

is because resource types are usually inter-dependent while

dealing with workloads. This is also because a resource can

be a bottleneck for a certain amount of load, and the bottleneck can migrate between resources as the load changes [6].

This paper describes an approach to estimate performance

capabilities of cloud configurations obtained from black-box

clouds, and identify a near optimal cloud configuration for a

given complex workload in an efficient way. Specifically, our

contributions are as follows.

• Estimating cloud capability requirement for workload and throughput goal: We develop an approach to char-

acterize the performance of given complex workload and

then, estimate the required capability of each resource

type to achieve the target throughput. These performance

capabilities are encoded into a capability vector to repre-

sent the requirement for the throughput goal.

• Estimating performance capability of target cloud for given workload: We develop an approach to characterize

the performance of a target cloud configuration (e.g., a

VM), using a benchmark suite that is much simpler to deploy than the target application itself, and then, encode

it into a representative capability vector.


Fig. 1. An example result of cloud comparison (a table comparing price and performance across four cloud configurations).

• Identifying optimal cloud configuration in an efficient way: We develop a search algorithm to identify a near

optimal cloud configuration in black-box clouds. Using

capability vectors, we cast the search problem into the

Knapsack problem [7], and develop a heuristic search

algorithm to identify a cloud configuration with the best

price. With the algorithm, our approach can reduce the

search space and speed up the search procedure.

We have evaluated our approach on a public cloud with

intensive web workloads. Experiment results show our approach accurately estimates performance capabilities of cloud

configurations using the benchmark-based approximation. To

evaluate the scalability of our search algorithm, we perform

an extensive simulation with capability vectors computed with

benchmark results.

The remainder of this paper is structured as follows: Section

II outlines our recommendation procedure; Section III and IV

describe the estimation and the search algorithm respectively

in detail; Section V shows our evaluation results; Section VI

reviews related work; Section VII concludes the paper.

II. SYSTEM OVERVIEW

It is critical for both cloud providers and users to identify

an optimal cloud configuration for a given workload. In this

regard, we are developing a cloud recommender system, re-

ferred to as CloudAdvisor, that aims to estimate the throughput

of a given workload in a target cloud. To provide a concrete

Recommendation-as-a-Service, CloudAdvisor compares offer-

ings made by different cloud providers for any given workload.

When one requests a comparison, it shows a comparison table

as depicted in Fig. 1.

The table in Fig. 1 shows a price comparison among four

different cloud configurations including an in-house cloud,

when those cloud configurations have similar throughputs for

a workload deployment. Then, a user can identify which

configuration offers a best price while it meets her throughput

goal. Similarly, CloudAdvisor can provide a throughput com-

parison, where prices are similar. Note that those prices and

throughputs can dynamically change over time due to the dynamic environment of the cloud market.

Fig. 2 outlines the approach for providing such comparisons.

The overall approach is 1) characterizing the performance of a

given workload in terms of resource usage pattern in the white-

box test-bed, 2) based on the resource usage pattern, estimating

a capability vector, each element of which represents the

Fig. 2. Approach of our recommender system (steps 1-5 of the recommendation workflow).

required performance capability of each resource type (e.g.,

CPU, memory, disk I/O, and network bandwidth) to meet a

target throughput, 3) capturing the performance characteristics

of target clouds using a configurable benchmark suite, referred

to as CloudMeter developed as a component of CloudAdvisor,

and estimating a set of capability vectors, each of which

represents a specific configuration (e.g., a single VM), 4)

with these capability vectors, searching an optimal cloud

configuration (i.e., combined VMs to run the target workload)

until there is no more chance to minimize the price, and finally,

5) providing a comparison table using search results. Note

that the step 3 can be done as offline batch processes, and

each batch process will be scheduled periodically, since those

benchmarking results can change dynamically over time.

To achieve such comparisons, we develop two key compo-

nents, capability estimation and search for an optimal cloud

configuration, to be described in the following sections.

III. BENCHMARK-BASED CAPABILITY ESTIMATION

In a recommender system, one of the most important parts is

accurately estimating the performance capability of each cloud

configuration for a given workload while exploring various

different cloud configurations. Here, the performance capa-

bility of cloud configuration is defined as the approximated

maximum throughput that can be achieved using a cloud

configuration for the workload.

To estimate performance capabilities of cloud configura-

tions, our approach first builds an abstract performance model

based on the resource usage pattern of the workload mea-

sured in an in-house test-bed (i.e., a white-box environment).

Second, using CloudMeter, it computes relative performance scores of many different cloud configurations against the in-

house cloud. Finally, it applies the collected performance

scores into the abstract performance model to estimate per-

formance capabilities of those cloud configurations. This ap-

proach incurs little cost because we do not need to deploy the


complex workload itself into all possible cloud configurations

to evaluate their performance capabilities.

A. Performance Characterization of Workload

For a given workload, CloudAdvisor first characterizes

the workload in the context of its resource usage pattern

by deploying the workload into an in-house test-bed and

computing the correlation of resource usages to load change.

In our approach, we capture the usage change rates (i.e., slope)

of each resource type and the throughput change rate until the

capability is reached, while load increases. These usage change

rates can approximately indicate the degree of contribution of

each resource type to the performance capability (see Fig. 4

in Section V showing these change rates of an application).

A workload simulator has been developed for the per-

formance characterization. It generates synthetic loads with

various data access patterns (e.g., the ratio of database write

over read transactions and the ratio of business logic over I/O).

If a historical load is available, the workload simulator can sort

and re-play the workload to give systematic stress to the target

system. We also consider that the test-bed is highly capable of running any type of workload such as CPU-intensive, memory-

intensive, I/O intensive, or network intensive.

B. Building Abstract Performance Model

Based on the resource usage pattern, our approach defines

quantitative models for resource types. As shown in Fig. 4,

throughput increases until the performance capability is reached, and the point of the performance capability is determined by the resource types that consume most of their available capacities (i.e., the bottlenecked ones). Hence, our system defines a

quantitative model for each individual resource type to identify

its correlation to the performance capability. Specifically, for

each resource type j, a quantitative model can be defined as,

Tj = f(Uj |(Cj = c, ∃j ∈ R) ∧ (Cr′ =∞, ∀r′ ∈ R))

, where Tj is the throughput to be achieved with Uj that is

the normalized usage rate over given capacity (i.e., Cj = c) of a resource type j. R is a set of resource types, and r′ is a resource type in R, where r′ ≠ j. We consider that r′ has unlimited

capacities to compute the correlation of only j to Tj .

To compute Tj using f , our system takes 4 steps. First, the

system figures out the relation of load to the usage rate of the

resource type. The relation can be defined as a linear function

or, generally, as a function that has a logarithmic curve for a

resource type j. Usage rates we consider in this paper are the

total CPU that consists of user and system CPU usages, cache,

memory, disk I/O, and network usages. More specifically, the

function can be as follows,

Uj = si,j(αj(2L − L^p) + γj) (1)

, where L is the amount of load, p is used to minimize the

square error (a linear function is a special case when p is 1).

αj is the increase rate (e.g., the slope in a linear function), and γj is the initial resource consumption of the current configuration.

αj , γj and p can be obtained by calibrating the function to fit

into actual curve. In this fitting, we use the portion of low load

in the curve (i.e., the portion before the knee of the curve in

Fig. 4 (b) and (c)). Then, si,j, the relative performance score of the cloud configuration i, is computed. This will be

described in the next section. In the white-box, si,j is 1.

Second, the relation of L to throughput T (see Fig. 4 (a))

is defined as,

T = β(2L − L^q) (2)

, where β is the increase rate, and q is used to minimize the

square error (a linear function when q is 1). Similarly, we can obtain

β and q by calibrating the function to fit into actual curve.

Third, we compute the capability based on the correlation

of j to L. We can obtain a theoretical amount of load, when

j reaches the full usage point using Eq. 1 (i.e., theoretically

extending the curve beyond the knee point until Uj is 1). Then,

the obtained amount of load is applied to Eq. 2.

Finally, the capability of the cloud configuration can be

Tmax = min(T1, T2, ..., Tr), where Tj is the throughput com-

puted from Eq. 2 for each j. In other words, Tj is a maximum

throughput when j is fully consumed while other resources

are still available (because we consider other resources are

unlimited). The rationale for using the min() function to compute Tmax is that only the bottlenecked resource types are fully consumed at the capability point, while the other resources still have unused capacity.
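To make the model concrete, the following sketch shows one way to fit Eq. 1 and 2 and compute Tmax from white-box measurements. It assumes SciPy for curve fitting, and the function and variable names (usage_model, estimate_capability, etc.) are illustrative, not part of CloudAdvisor.

import numpy as np
from scipy.optimize import curve_fit

def usage_model(L, alpha, gamma, p):
    # Eq. 1 with s_{i,j} = 1 (white-box): U_j = alpha_j * (2L - L^p) + gamma_j
    return alpha * (2 * L - L ** p) + gamma

def throughput_model(L, beta, q):
    # Eq. 2: T = beta * (2L - L^q)
    return beta * (2 * L - L ** q)

def estimate_capability(loads, usages_by_resource, throughputs):
    # loads (np.array), throughputs, and each usage series should cover only the
    # low-load (pre-knee) region, as in the paper; usages are normalized to [0, 1].
    (beta, q), _ = curve_fit(throughput_model, loads, throughputs, p0=[1.0, 1.0])
    t_per_resource = {}
    for j, usages in usages_by_resource.items():
        (alpha, gamma, p), _ = curve_fit(usage_model, loads, usages, p0=[0.01, 0.0, 1.0])
        # extend the fitted usage curve beyond the knee to find the load at which
        # resource j is fully used (U_j = 1), then map that load through Eq. 2
        grid = np.linspace(loads.min(), loads.max() * 10, 10000)
        full_load = grid[np.argmin(np.abs(usage_model(grid, alpha, gamma, p) - 1.0))]
        t_per_resource[j] = throughput_model(full_load, beta, q)
    # the capability is bounded by the first resource to saturate
    return min(t_per_resource.values()), t_per_resource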

C. Computing Relative Performance Scores

Even for the same workload, target cloud configurations may perform differently from each other. To complete the

abstract performance model (i.e., Eq. 1), we have to capture

the performance characteristics of each target configuration i in terms of relative performance score si,j for each resource

type j. Using CloudMeter, our approach collects relative

performance scores based on resource capability measure-

ments. These scores can be reused for any different workload

later. In current implementation, CloudMeter contains a set

of micro-benchmark applications including Dhrystone, Whetstone, System calls, and Context switch that are integrated

into UnixBench for CPU benchmarking, CacheBench for

memory benchmarking, IOZone for disk I/O benchmarking,

and our own network benchmark application. CloudMeter is

very useful when historical workload trace of an application

in the target cloud is not available. Once it is available after

the application is deployed, the application itself can be a

benchmark of CloudMeter and then, the historical data can

be used to compute si,j of a new workload that has similar

performance characteristics.

Using measurements, our approach computes si,j as si,j = (bi,j/bj)(aj/ai,j), where aj and bj are a given resource

allocation (e.g., the number of vCPUs or memory size) and the

benchmarking measurement for j respectively, in the white-

box cloud configuration. ai,j and bi,j are those in the target

cloud configuration i.


By applying si,j to Eq. 1, we can obtain the performance

capability of i. When we deal with CPU for si,cpu, the system

CPU usage for si,sys and user level CPU usage for si,user must

be considered separately in the total CPU usage as shown in

Fig. 4 (b) and (c), since the system CPU usage is related to

context switch and system call used for interrupts, allocate/free

memory, and communicating with file system that can be

different among cloud configurations. Thus, we extend si,j for CPU as si,cpu = (si,user αuser + si,sys αsys)/αcpu, where

αuser is the increase rate of user CPU usage and αsys is the

increase rate of system CPU usage while αcpu is the increase

rate of the total CPU usage. These rates are captured from the

fitting model step in Eq. 1.
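As a sketch of the scoring above (the variable names are illustrative; the benchmark values bi,j would come from CloudMeter runs such as UnixBench, CacheBench, or IOZone):

def relative_score(b_target, b_white, a_target, a_white):
    # s_{i,j} = (b_{i,j} / b_j) * (a_j / a_{i,j}): benchmark ratio scaled by the
    # inverse of the resource allocation ratio
    return (b_target / b_white) * (a_white / a_target)

def cpu_score(s_user, s_sys, alpha_user, alpha_sys, alpha_cpu):
    # s_{i,cpu} = (s_{i,user} * alpha_user + s_{i,sys} * alpha_sys) / alpha_cpu,
    # weighting user and system CPU scores by their fitted increase rates from Eq. 1
    return (s_user * alpha_user + s_sys * alpha_sys) / alpha_cpu

For example, a target VM that doubles the white-box UnixBench score while using twice as many vCPUs would get a user-CPU score of 1.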

IV. SEARCH FOR OPTIMAL CLOUD CONFIGURATION

Based on a target throughput and the performance model,

CloudAdvisor identifies a near optimal cloud configuration

for the given workload. In particular, we focus on parallel

workloads in this paper. To do this, our approach first generates

capability vectors, each of which represents the performance of

a cloud configuration. Second, we encode the search problem

into Knapsack Problem [7] and then, we develop a search

algorithm to solve the problem in an efficient way.

A. Generating Capability Vectors for Given Workload

The first step to identify an optimal cloud configuration is

generating a required capability vector using resource usage

model in the white-box test-bed. Each element of the required

capability vector represents the performance capability value

of each resource type to meet a target throughput.

Specifically, the required performance capability value, denoted as c∗j, of a resource type j (i.e., the usage rate Uj

required to achieve the target throughput T ∗), can be given as

c∗j = (αj/β)T∗ + γj , when we consider a linear function in

Eq. 1 and 2. Here, (αj/β) indicates the normalized increase

rate of the resource usage to increase a unit of throughput.

Thus, the equation indicates how much resource capability

is required to meet T ∗. Note that if c∗j is more than 1, it

indicates that more resource capability is required to meet T ∗

than currently available in the test-bed. With this equation,

the required capability vector for the target workload and

throughput is a set of such required performance capability

values and is represented in the form V∗ = [c∗1 c∗2 ... c∗r].

We may obtain the required capability vector of a target

cloud configuration i by directly deploying the given workload

into i and performing the aforementioned measurements. However, this is an expensive and time-consuming task since there are many

different cloud configurations to be evaluated, and the work-

load deployment is typically very complicated. Hence, our

approach instead captures the relative performance capability value, denoted as ci,j, of i, by using the benchmarking

measurements that are used to compute si,j for j. It is much

simpler than deploying the complex workload into the target

cloud configuration. Moreover, benchmarking measurements

can be reused for other workloads once these are continually

updated for any change of cloud configuration.

Fig. 3. A capability vector is filled with a VM's capability vector (required vs. filled performance capability values for CPU, memory capacity, disk I/O, and network bandwidth).

Specifically, the relative performance capability value ci,j = (bi,j/bj)(ai,j/aj)cj, where (bi,j/bj) is the performance ratio

based on benchmarking, (ai,j/aj) is resource allocation ratio,

and cj is the maximum resource usage rate for given capacity

of j (typically, cj = 1). Then, the capability vector of i can

be represented in the form Vi = [ci,1 ci,2 ... ci,r]. Finally, our

approach computes all target capability vectors from clouds of

interest. Note that Vi is just the relative capability vector of i against the test-bed, while V∗ is the capability vector required

to achieve T ∗ and computed in the test-bed.
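A minimal sketch of both vectors, assuming the linear case of Eq. 1 and 2 and dictionaries keyed by resource type (all names here are illustrative):

def required_capability(alpha, gamma, beta, t_target):
    # c*_j = (alpha_j / beta) * T* + gamma_j for each resource type j
    return {j: (alpha[j] / beta) * t_target + gamma[j] for j in alpha}

def relative_capability(b_target, b_white, a_target, a_white, c_max=None):
    # c_{i,j} = (b_{i,j}/b_j) * (a_{i,j}/a_j) * c_j, with c_j = 1 by default
    c_max = c_max or {j: 1.0 for j in b_target}
    return {j: (b_target[j] / b_white[j]) * (a_target[j] / a_white[j]) * c_max[j]
            for j in b_target}

V∗ = required_capability(...) then plays the role of the containers in Fig. 3, and each VM type i gets one Vi = relative_capability(...).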

B. Search Algorithm

Using capability vectors, CloudAdvisor explores various

cloud configurations offered from each cloud provider to iden-

tify a cloud configuration with a best price. If we can obtain

the price per unit of resource usage, we can easily compute

the total price to meet V ∗. However, most cloud providers have

a small set of pre-determined cloud configuration types with specific prices in the form of VM types that have different CPU, memory,

disk and network capacities [8]. Some cloud providers such

as Amazon Web Services use a dynamic pricing scheme for

their virtual resources as well. In this paper, we assume a

static pricing scheme that is used by most cloud providers. In

our current implementation of the search algorithm, we also

assume a parallel workload such as clustered parallel database

transactions and a parallel data mining using MapReduce

[9]. For parallel workloads, the cloud configuration can have

multiple heterogeneous VMs to handle loads in parallel with

a load balancing technique such as one introduced in [10].

The relative capability vector Vi of each specific VM type

can be computed as described in the prior section. We can also

capture V ∗ that represents a resource capacity requirement

to meet a target throughput T ∗. Then, the search procedure

is to fit those numerical capability values of pre-determined

VM types into numerical capability values defined in V ∗.

In particular, the search algorithm identifies a configuration with

a minimum price when fitting into V ∗.

As shown in Fig. 3, we illustrate V ∗ with a set of containers,

each of which has a specific size (i.e., the numerical capability

c∗j of j as described in Section IV-A). Then, we can convert the

problem into Knapsack problem [7], since the search algorithm

tries to fill containers with items (i.e., Vis of VMs) that have

different values (i.e., prices) while aiming at a minimal total


price. Fig. 3 shows a snapshot of the search procedure, when

containers are filled with capabilities of a VM’s Vi.

The problem we tackle in this paper is finding such VMs to

meet a target throughput while minimizing the cumulated price

of VMs. More specifically, the optimal cloud configuration

will have 0 distance between V ∗ and cumulated capabilities of

VMs (i.e., ∑_{i=1}^{m′} ci,j, where m′ is the number of VM instances

in a configuration) while having the smallest cumulated price

of VMs. The distance can be computed as follows,

D = ∑_{j=1}^{r} max(c∗j − ∑_{i=1}^{m′} ci,j, 0) (3)

If c∗j − ∑_{i=1}^{m′} ci,j < 0, its distance is considered as 0 since

it indicates enough capability. Otherwise, containers have to

be filled with more VMs since the target throughput is not

met. Note that all containers must be fully filled even if a

container’s size is very small (i.e., the required capability is

relatively small such as network bandwidth in Fig. 3). This is

because the bottleneck can occur in any resource that does not have enough capability, even if the resource type is not critical to

achieve a target throughput.
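Eq. 3 translates directly into a few lines (a sketch; the vectors are plain dictionaries keyed by resource type, and the names are illustrative):

def distance(v_star, selected_vms):
    # D = sum_j max(c*_j - sum_i c_{i,j}, 0); 0 means every container is filled
    return sum(max(v_star[j] - sum(vm[j] for vm in selected_vms), 0.0)
               for j in v_star)

A candidate combination of VM instances is feasible exactly when distance(...) returns 0.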

Identifying such cloud configuration is not trivial, since

there are many different combinations of VMs that can meet

the target throughput. We can formulate the problem into

Integer Linear Programming (ILP) as follows,

Minimize ∑_{i=1}^{m} ni pi

Subject to ∑_{i=1}^{m} ni ci,j ≥ c∗j, ∀j ∈ R; ni ≥ 0, pi ≥ 0, and ci,j ≥ 0

, where m is the number of VM types, ni is the required

number of VM instances of each VM type i, and pi is the

price of i. Then, to solve this ILP, we can use generic solutions

such as Tabu search [11], Simulated annealing [12], and Hill

climbing [13].
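For reference, the ILP can also be handed to an off-the-shelf solver; the sketch below uses the PuLP package (our choice for illustration, not something the paper prescribes), with prices, cap, and c_star as illustrative inputs.

import pulp

def solve_ilp(prices, cap, c_star):
    # prices[i]: price of VM type i; cap[i][j]: c_{i,j}; c_star[j]: c*_j
    m = len(prices)
    prob = pulp.LpProblem("cloud_configuration", pulp.LpMinimize)
    n = [pulp.LpVariable(f"n_{i}", lowBound=0, cat="Integer") for i in range(m)]
    prob += pulp.lpSum(n[i] * prices[i] for i in range(m))      # minimize total price
    for j in c_star:                                            # meet c*_j for every resource
        prob += pulp.lpSum(n[i] * cap[i][j] for i in range(m)) >= c_star[j]
    prob.solve()
    return [int(v.value()) for v in n], pulp.value(prob.objective)

We do not rely on this exact formulation in practice because the number of candidate combinations grows quickly with the target throughput, which motivates the heuristic search below.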

In our approach, we design an efficient search algorithm

by extending the best-first search algorithm based on our

observations, rather than blindly exploring the search space.

Basically, our search algorithm uses the cheapest configuration

(i.e., the cheapest combination of VMs) first to fill V ∗ among a

set of candidates. It keeps generating candidates and trying to

fit each candidate combination into V ∗ iteratively until it finds

one successfully fitted into V ∗. The overhead of the search

procedure is caused by too many candidates to be generated

(i.e., a large search space). Hence, our algorithm is designed to

significantly reduce the search space. To do this, our algorithm

first checks the gradient of each resource capability in the

current cheapest candidate against a previously selected one.

Then, it chooses only those candidates that have any resource

type still required to be filled into V∗ (i.e., c∗j − ∑_{i=1}^{m′} ci,j > 0) and that reduce the gap between c∗j and ∑_{i=1}^{m′} ci,j more than the previously selected candidate does. We call this approach

Conservative Gradient Search (CG).

Second, our algorithm explores the search space along with

biased paths in the search space that have more remaining

gaps (i.e., focusing on a resource type to be potentially

bottlenecked), and chooses some number of candidates that

can fill the gap of the resource with less price. Note that our

algorithm checks remaining gaps of resource types at every

iteration since the potential bottleneck keeps migrating while

adding resources. The number of candidates can be determined

empirically or dynamically over the iteration (i.e., reducing the

number as the gap decreases). In our experiments, we use a

static parameter. We call this approach Bottleneck-aware Proactive Search (BP). The algorithm with two inputs, V (a

set of VM types) and V ∗, is outlined in Algorithm 1.

Algorithm 1 Bottleneck-aware proactive search

Input: V∗, V
Output: V̂curr  // the cheapest combination of VM instances

Q ← null;  // the pool of candidate combinations
V̂curr ← ø; V̂prior ← ø;
for each Mi ∈ V do
    Q ← Q ∪ vmi;  // where vmi is an instance of Mi
end for
while forever do
    Q ← SortByPrice(Q);
    while forever do
        V̂curr ← Q.RemoveFirst();
        if |V̂curr| < |V̂prior| then
            break;  // where |V̂| is the number of instances
        end if
        if (V̂curr.cj < V∗.cj) ∧ (V̂curr.cj ≥ V̂prior.cj), ∃j ∈ R then break;
    end while
    V̂prior ← V̂curr;
    if V̂curr.cj ≥ V∗.c∗j, ∀j ∈ R then
        return V̂curr;  // i.e., compute D as described in Eq. 3
    end if
    (j∗, g∗) ← max{V∗.c∗j − V̂curr.cj, ∀j ∈ R};  // where j∗ is bottlenecked and g∗ is the remaining gap
    Qtemp ← SelectCandidateTypes(j∗, g∗, k, V);  // where k is the number of candidates newly generated
    for each Mi ∈ Qtemp do
        V̂′.p ← V̂curr.p + vmi.p;  // where p is the price
        V̂′.cj ← V̂curr.cj + vmi.cj, ∀j ∈ R;
        if V̂′ ∉ Q then Q ← Q ∪ V̂′;
    end for
end while

Using this algorithm, we reduce the search space by pruning

out some candidates that can achieve the target throughput, but

are eventually more expensive than the resulting combination.

Hence, we can compute the near optimal cloud configuration

much more efficiently than blind exploration.
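To make Algorithm 1 easier to experiment with, here is a simplified executable reading of the BP loop: it keeps a price-ordered pool of candidates, always expands the cheapest one, and generates only k successors that cheaply fill the currently most bottlenecked resource. It is a sketch under those assumptions (the conservative-gradient filter is folded into the bottleneck-driven expansion), not the exact CloudAdvisor implementation, and all names are illustrative.

import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class VmType:
    name: str
    price: float   # price per hour
    cap: tuple     # relative capability (c_{i,1}, ..., c_{i,r})

def bp_search(v_star, vm_types, k=3):
    # v_star: required capabilities (c*_1, ..., c*_r); k: successors per iteration
    r = len(v_star)
    start = (0.0, (0,) * len(vm_types))          # (total price, count of each VM type)
    heap, seen = [start], {start[1]}
    while heap:
        price, counts = heapq.heappop(heap)      # cheapest candidate first
        filled = [sum(c * t.cap[j] for c, t in zip(counts, vm_types)) for j in range(r)]
        gaps = [v_star[j] - filled[j] for j in range(r)]
        if max(gaps) <= 0:
            return counts, price                 # every container in V* is filled
        j_star = gaps.index(max(gaps))           # the potential bottleneck
        # expand with the k types that are cheapest per unit of the bottlenecked capability
        ranked = sorted(range(len(vm_types)),
                        key=lambda i: vm_types[i].price / max(vm_types[i].cap[j_star], 1e-9))
        for i in ranked[:k]:
            succ = tuple(c + (1 if idx == i else 0) for idx, c in enumerate(counts))
            if succ not in seen:
                seen.add(succ)
                heapq.heappush(heap, (price + vm_types[i].price, succ))
    return None, float("inf")

For instance, with the VM types of a provider encoded as VmType entries, bp_search(v_star, vm_types) returns the count of each type and the hourly price of the recommended configuration.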

V. EXPERIMENTAL EVALUATION

A. Experimental Setup

To evaluate our approach, we have used an online auction

web application, RUBiS [14], that is deployed on a servlet server (Tomcat) and a back-end database server (MySQL). The workload pro-

vided by RUBiS package consists of 26 different transactions.

In our experiments, we focus on clustering of the database


Fig. 4. Throughputs and resource usage patterns: (a) throughputs (#responses per 3 min) of the three configurations B-1-2-K, B-4-2-X, and W-2-4-K vs. load (#concurrent users); (b) resource usage (%) of the light-write workload; (c) resource usage (%) of the heavy-write workload (CPU_user, CPU_sys, CPU_total, Disk_I/O, NW_I/O, Mem).

server, so we have modified the default workload to intentionally stress the database server by reducing simple HTML transactions and increasing the rate of DB query transactions. Then, we have created two different workloads by changing the rate of write transactions in the workload. The light-write workload has 5% write transactions, while the heavy-write workload has 50% write transactions over all transactions.

We have prepared a VM type in our white-box test-bed that

is configured with 2 vCPUs and 4 GB memory, 40 GB disk,

and runs with Ubuntu 10.04 operating system. Instances of

this VM type are deployed into our Intel blade with KVM

virtualization. We call the instance of this VM type W-2-4-K. We have prepared other VM types obtained from Rackspace, which is one of the well-known IaaS providers. There are various VM types in Rackspace configured with different numbers of

vCPUs, memory size, and disk size. We have used 6 VM types,

where the number of vCPUs varies from 1 to 8, memory from

1 to 30 GB, disk 40 GB to 1.2 TB, and network 30 to 300

Mbps. Their prices are from $0.06 to $1.20 per hour. Ubuntu

10.04 has been installed in VMs. These VMs have been used

in our experiments for the search algorithm (Section V-C).

For evaluating the capability estimation (Section V-B), we

have selected a VM type from Rackspace that has 4 vCPUs, 2

GB memory, and 80 GB disk space running on AMD server

with Xen. We call its instance B-4-2-X. We have selected a small VM type to compare with W-2-4-K. The small VM type has 1 vCPU, 2 GB memory, and 40 GB disk, and runs on an Intel server with KVM. We call its instance B-1-2-K.

B. Performance Capability Estimation

To evaluate the capability estimation, we have deployed the

application into W-2-4-K and built the abstract performance

model (i.e., Eq. 1 and 2). As shown in Fig. 4, our workload

generator records throughput (i.e., the number of responses per

3 minutes) while increasing the number of concurrent users by

50 every 3 minutes using the aforementioned workloads.

Fig. 4 (a) shows the throughput change and the maximum

throughput of W-2-4-K. It also shows those of the other two con-

figurations, B-1-2-K and B-4-2-X. Our estimation approach

will estimate those maximum throughputs shown in Fig. 4

(a). We note that change rates of all three configurations

are almost identical in low load. This is because they have

TABLE I
COEFFICIENTS (PARAMETERS) OF ABSTRACT PERFORMANCE MODELS

Resources   αj     p     γj
CPU         0.08   1.01  8
User CPU    0.05   1.02  5
Sys. CPU    0.03   0.98  3
Memory      0.01   1.01  21

Fig. 5. Performance scores of resource types (relative performance scores of total CPU, system CPU, user CPU, memory, disk I/O, and network I/O for W-2-4-K, B-1-2-K, and B-4-2-X).

enough capability to deal with such low load. However, their

maximum throughputs are different due to their different

capabilities. To figure out such capabilities, the resource usage

pattern is plotted as shown in Fig. 4 (b) and (c).

Fig. 4 (b) and (c) show usage changes of 6 resource types.

As observed, CPU and memory mainly affect the throughput

in all configurations. We note here that both workloads seem

having similar usage patterns, but user CPU and system

CPU usages are different between workloads. Hence, we

should carefully deal with CPU usage as described in Section

III-C when modeling the performance. We have captured the

abstract performance model of the heavy-write workload into

Table I. We have also captured β = 6 and q = 0.98 used in Eq.

2, as the throughput increase rate and the fitting exponent, respectively.

To compute performance scores of B-1-2-K and B-4-2-

X, we have deployed CloudMeter into these configurations

and performed benchmarking of resource types. Fig. 5 shows

performance scores of 6 different resource types. Note that

scores of B-4-2-X are quite different from other two VMs

since the ratio of CPU and memory allocations is different,

and B-4-2-X is configured with a different type of architecture


TABLE II
THROUGHPUT ESTIMATES (THE NUMBER OF TRANSACTIONS PER MINUTE)

            W-2-4-K    B-1-2-K       B-4-2-X
Measured T  6050.32    3552.12       7063.33
Tcpu        -          3241.02       13997.38
Tmem        -          5029.91       7659.32
Tdisk       -          4178010.09    3483818.44
Tnw         -          2097002.99    1715384.62

(i.e., AMD) and virtualization (i.e., Xen).

By applying these scores to the abstract performance model,

we can estimate capabilities of B-1-2-K and B-4-2-X as shown

in Table II. Based on our approach in Section III-B, CPU is

bottlenecked in B-1-2-K similar with W-2-4-K as shown in

Fig. 4 (c) (i.e., Tcpu is the minimum). In B-4-2-X, it turns out

memory is bottlenecked (different from other two VMs). This

is because it has enough CPU allocation, but the bottleneck

is migrated to memory. Compared to the measured maximum

throughputs, the error rate is less than 10% (for B-1-2-K, it

is 8.26%, and for B-4-2-X, 8.04%). This is even better in the

light-write workload (around 6%).

Experiment results indicate that our estimation approach is

accurate enough to apply for the recommender system. When

we see the resource usage pattern in Fig. 4 (b) and (c), it seems

that CPU is bottlenecked in W-2-4-K, but it turns out that

memory can be bottlenecked in different configurations. Ad-

ditionally, using B-4-2-X instances can lead to over-provisioning for only a small throughput increase, when compared to W-2-

4-K. Finally, we can also find that the bottleneck can migrate between resources (e.g., Tcpu and Tmem are similar in B-1-2-K) as more allocation is added to bottlenecked resources.

Hence, the recommender system must accurately capture such

resource usage pattern and bottleneck migrations for workload.

C. Scalability of The Search Algorithm

We have obtained all capability vectors of 6 VM types

of Rackspace and then, evaluated the accuracy of our search

algorithm using the light-write workload. The resulting config-

uration computed by our search algorithm (i.e., BP) has been

compared with the configuration computed by a brute-force

search. We have used a relatively low throughput goal (i.e.,

20K) so that the resulting configuration consists of 5 low-end

VM instances. We have deployed a simple load balancer in

a separate VM that forwards user requests to the VM cluster

based on their performance capabilities. We can see that BP

returns exactly the same configuration (i.e., 5 low-end VM

instances) as the brute-force search in this small setup.

Although the VM cluster has been a little over-provisioned

against the goal (i.e., 22.3K), the error is still around 10%.

We have conducted an extensive simulation to evaluate the

potential scalability and optimality of our search algorithm.

Currently, we plan to deploy a large-scale MapReduce cluster

with a parallel data mining workload that may need up to several hundred VMs in the cluster. Thus, the search

algorithm has to be scalable to deal with such large workload.

In experiments, we have run our search algorithm with all

capability vectors of the 6 VM types and W-2-4-K to compute the

configuration and its total price. We have increased the target

Fig. 6. Scalability comparison of three search algorithms (search time in seconds vs. target throughput from 60K to 240K for NBF, CG, and BP).

Fig. 7. Price comparison of three search algorithms (total price per hour in $ vs. target throughput from 60K to 240K for NBF, CG, and BP).

throughput from 60K to 240K while measuring the duration

of our search algorithm and the total price. Three different

search algorithms have been compared for the evaluation.

• Naive Best-First Search (NBF): uses the basic best-

first search algorithm. This algorithm is not so scalable

since it explores the cheapest candidate step by step while

generating all possible candidates in iterations. We have

used this algorithm as a baseline.

• Conservative-Gradient Search (CG): integrates the

conservative gradient search (described in Section IV) on

top of NBF.

• Bottleneck-aware Proactive Search (BP): integrates the

bottleneck-aware search to prune the search space on top

of CG and NBF as shown in Algorithm 1.

Experiment results show that our BP algorithm is scalable

(Fig. 6) while having reasonable optimality (Fig. 7). NBF in

Fig. 6 shows an obvious exponential increase as the target

throughput increases. Although CG has better scalability than

NBF, it still shows the exponential increase. This is mainly

because capabilities per unit price of given VM types are not

so different so that the pruning rate is very low. BP algorithm

aggressively prunes the search space based on potential bot-

tlenecks, instead of capability per unit price. Hence, we can

achieve a good scalability. Meanwhile, BP shows only a little

loss of optimality. As shown in Fig. 7, BP returns a price at most 4% higher than the price computed by NBF. CG is almost

identical to NBF since it explores most of the candidates that

are explored by NBF.

VI. RELATED WORK

Evaluation of applications and clouds has gathered pace

in recent times as most enterprises are moving toward agile


hosting of services and applications over private, public, and

hybrid clouds. In this regard, researchers have focused on

three different directions: 1) updating application architecture

to move from legacy systems and adapt to the capabilities

of clouds [15]; 2) evaluating and comparing different clouds'

functional and non-functional attributes to enable informed

decision on which cloud to host applications [4], [16], [17],

[5], [18]; and 3) efficient orchestration of virtual appliances

in a cloud [19], which may also include negotiations with

users [20]. This paper complements these different directions

by enabling recommendation of cloud configuration to suit

application requirements. In this regard, a cloud capability esti-

mation methodology is developed using benchmark-based ex-

perimental profiling. Previous work has principally focused on

comparing rudimentary cloud capabilities using benchmarks

[4], [16] and automated performance testing [5]. This paper

focuses on detailed characterization of cloud capabilities for

given user workload in various target clouds.

Recently, many cloud providers, such as [17], [18] have

further offered testing paradigms to enable evaluation of vari-

ous workload and application models on different cloud and

resource management models. However, scaling such offer-

ings (to evaluate different applications on different clouds)

requires identifying various cloud capability models [21]. This

paper fills the gap by developing a generic methodology to

characterize and model cloud capabilities. The methodology

is applicable to different clouds since the modeling of cloud

capabilities is based on experiments on clouds' externally observable characteristics (treating clouds as black-boxes). The

model is used to estimate performance of applications in target

clouds and then such estimates are used to efficiently search

(and subsequently recommend) cloud configuration.

Building performance models and diagnosing performance

bottlenecks for Hadoop clusters have been explored in [22].

This paper further develops methodology to recommend cloud

configurations to meet user demands. Orchestration of virtual

appliances in a cloud [19] and efficient allocation of instances

in cloud infrastructure [20] have been addressed previously

to meet user demands. However, such approaches are typi-

cally non-transparent to users and any cost and performance

trade-off remains unknown to users. This paper, on the other

hand, makes such tradeoff transparent allowing users to make

choices accordingly. Although negotiations with users have

been explored in [20], this paper enables configuration rec-

ommendations to the users based on capability estimation.

VII. CONCLUSION

This paper has aimed to address one major barrier for a recommender system that identifies an optimal cloud configuration for a given complex user workload, using benchmark-based approximation. In particular, we have shown that such

estimation and recommendation can be done even under black-

box environments. To achieve this, our system generates the

capability vector that consists of relative performance scores of

resource types and then, our search algorithm identifies a near

optimal cloud configuration based on these capability vectors.

Our experiments show our approach estimates the near optimal

cloud configuration for a given workload within 10% error, and

our search algorithm is scalable enough to apply for a large-

scale workload deployment. We are currently working on applying

our approach to a large-scale MapReduce job by extending

the current approach. In this case, we are considering the time

factor (e.g., the number of hours to be used for each VM)

into the Integer Linear Programming problem, defined in Section

IV-B, for so-called bag-of-tasks as introduced in [23].

REFERENCES

[1] CloudHarmony, 2013. [Online]. Available: http://cloudharmony.com/
[2] R. Buyya, J. Broberg, and A. Goscinski, Cloud Computing: Principles and Paradigms, Chapter 1. Wiley, 2011.
[3] McKinsey, "Clearing the air on cloud computing," Tech. Report, 2009.
[4] D. Jayasinghe, S. Malkowski, Q. Wang, J. Li, P. Xiong, and C. Pu, "Variations in performance and scalability when migrating n-tier applications to different clouds," in Proc. Int. Conf. on Cloud Computing, 2011, pp. 73–80.
[5] D. Jayasinghe, G. Swint, S. Malkowski, Q. Wang, J. Li, and C. Pu, "Expertus: A generator approach to automate performance testing in IaaS," in Proc. Int. Conf. on Cloud Computing, 2012, pp. 115–122.
[6] S. Malkowski, M. Hedwig, and C. Pu, "Experimental evaluation of n-tier systems: Observation and analysis of multi-bottlenecks," in Proc. Int. Symp. on Workload Characterization, 2009, pp. 118–127.
[7] H. Kellerer, U. Pferschy, and D. Pisinger, Knapsack Problems. Springer, 2004.
[8] M. Cardosa, A. Singh, H. Pucha, and A. Chandra, "Exploiting spatio-temporal tradeoffs for energy-aware MapReduce in the cloud," in Proc. Int. Conf. on Cloud Computing, 2011, pp. 251–258.
[9] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," in Proc. Symposium on Operating Systems Design and Implementation (OSDI'04), 2004, pp. 10–19.
[10] G. Jung, N. Gnanasambandam, and T. Mukherjee, "Synchronous parallel processing of big-data analytics to optimize performance in federated clouds," in Proc. Int. Conf. on Cloud Computing, 2012, pp. 811–818.
[11] F. Glover, "Tabu search-part II," ORSA, vol. 1, pp. 4–32, 1989.
[12] V. Granville, M. Krivanek, and J.-P. Rasson, "Simulated annealing: A proof of convergence," Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 652–656, 1994.
[13] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, pp. 111–114, 2003.
[14] RUBiS. [Online]. Available: http://rubis.ow2.org/
[15] M. A. Chauhan and M. A. Babar, "Migrating service-oriented system to cloud computing: An experience report," in Proc. International Conference on Cloud Computing, 2011, pp. 404–411.
[16] S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang, "The HiBench benchmark suite: Characterization of the MapReduce-based data analysis," in Proc. International Conference on Data Engineering Workshops, Washington, DC, USA, 2010, pp. 41–51.
[17] R. Calheiros, R. Ranjan, A. Beloglazov, C. De Rose, and R. Buyya, "CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Software: Practice and Experience, vol. 41, pp. 23–50, 2011.
[18] J. Li, Q. Wang, Y. Kanemasa, D. Jayasinghe, S. Malkowski, P. Xiong, M. Kawaba, and C. Pu, "Profit-based experimental analysis of IaaS cloud performance: Impact of software resource allocation," in Proc. International Conference on Services Computing, 2012, pp. 344–351.
[19] K. Keahey and T. Freeman, "Contextualization: Providing one-click virtual clusters," in Proc. Int. Conf. on eScience, 2008, pp. 301–308.
[20] S. Venugopal, J. Broberg, and R. Buyya, "OpenPEX: An open provisioning and execution system for virtual machines," in Proc. International Conference on Advanced Computing and Communications, 2009.
[21] J. Gao, X. Bai, and W. T. Tsai, "Cloud testing: issues, challenges, needs and practice," Int. Journal on Software Engineering, vol. 1, 2011.
[22] S. Gupta, C. Fritz, J. de Kleer, and C. Witteveen, "Diagnosing heterogeneous Hadoop clusters," Workshop on Principles of Diagnosis, 2012.
[23] J. O. Gutierrez-Garcia and K. M. Sim, "GA-based cloud resource estimation for agent-based execution of bag-of-tasks applications," Information Systems Frontiers, vol. 14, pp. 925–951, 2012.
