Cloud Capability Estimation and Recommendation in Black-Box Environments Using Benchmark-Based Approximation
Gueyoung Jung, Naveen Sharma, Frank Goetz
Xerox Research Center Webster
Webster, USA
{gueyoung.jung, naveen.sharma, frank.goetz}@xerox.com
Tridib Mukherjee
Xerox Research Center India
Bangalore, India
{tridib.mukherjee2}@xerox.com
Abstract—As cloud computing has become popular and the number of cloud providers has proliferated over time, the first barrier to cloud users is how to accurately estimate the performance capabilities of many different clouds and then select the right one for a given complex workload based on those estimates. Such cloud capability estimation and selection is a significant challenge, since most clouds can be considered black-boxes to cloud users, abstracting away the underlying infrastructures and technologies. This paper describes a cloud recommender system that recommends an optimal cloud configuration to users based on accurate estimates. To achieve this, our system generates a capability vector that consists of relative performance scores of resource types (e.g., CPU, memory, and disk) estimated for a given user workload using benchmarks. A search algorithm then identifies an optimal cloud configuration based on these collected capability vectors. Experiments show our approach accurately estimates performance capability (less than 10% error) while remaining scalable in a large search space.
Keywords-estimation; recommendation; benchmarking
I. INTRODUCTION
As cloud computing has become popular, more than 100
cloud providers are in the market nowadays [1]. Enterprises
(i.e., cloud users) have started to deploy their complex work-
loads such as multi-tier web transactions and parallel data
mining into these clouds for cost-efficiency. In this trend,
the first question for a cloud user is how to identify
the right cloud configuration for her workload to achieve her
performance goal (e.g., the maximum throughput) and to figure
out how much cost saving she can achieve from her choice.
Meanwhile, the first question given to a cloud provider or
a recommender system will be how to efficiently estimate
performance capabilities of many different clouds and then,
identify an optimal configuration to be offered to his cloud
users at a competitive price, while meeting users’ performance
goals. We define the cloud configuration for a given workload
as combined virtual resources with required software systems
(e.g., operating system) to deploy the workload in a cloud.
It can be a big challenge to identify such cloud configura-
tion, since most cloud providers in the market hide or abstract
their underlying infrastructure configuration details such as
resource availability, the structure of physical resources (e.g.,
servers, storages, and networks), and how to manage their
virtual machines (VMs) [2], [3]. They only show a list of pre-
determined VM types and corresponding prices. Additionally,
they continually integrate new HW and SW artifacts into
clouds, and a cloud user can be overwhelmed by a number
of those technologies when exploring clouds for her workload
deployment. Hence, we consider such clouds as black-boxes.

A user may provision a large number of high-end VMs to
avoid the risk of failing to meet her throughput goal. This can
typically lead to over-provisioning and unnecessarily high cost.
The user may further find an optimal cloud configuration by
blindly exploring different cloud configurations and evaluating
whether they meet her throughput goal. However, this
will be very expensive and time consuming, since she will
find a lot of different cloud configuration options, different
workloads and configurations have different performance char-
acteristics [4], and the workload deployment is typically very
complicated [5]. Although a user can figure out which VM
has the fastest CPU, disk I/O, or memory separately, this is
not sufficient for the user to deploy complex workloads. This
is because resource types are usually inter-dependent while
dealing with workloads. This is also because a resource can
become a bottleneck at a certain amount of load, and the bottleneck
can migrate between resources as the load changes [6].

This paper describes an approach to estimate the performance
capabilities of cloud configurations obtained from black-box
clouds, and identify a near optimal cloud configuration for a
given complex workload in an efficient way. Specifically, our
contributions are as follows.
• Estimating cloud capability requirement for workload and throughput goal: We develop an approach to char-
acterize the performance of given complex workload and
then, estimate the required capability of each resource
type to achieve the target throughput. These performance
capabilities are encoded into a capability vector to repre-
sent the requirement for the throughput goal.
• Estimating performance capability of target cloud for given workload: We develop an approach to characterize
the performance of a target cloud configuration (e.g., a
VM), using a benchmark suite that is much simpler to deploy than the target application itself, and then, encode
it into a representative capability vector.
2013 IEEE Sixth International Conference on Cloud Computing
978-0-7695-5028-2/13 $26.00 © 2013 IEEE
DOI 10.1109/CLOUD.2013.45
[Fig. 1 shows a table comparing the price and performance of several cloud configurations, including an in-house cloud; the table contents are not recoverable from this extraction.]
Fig. 1. An example result of cloud comparison
• Identifying optimal cloud configuration in an efficient way: We develop a search algorithm to identify a near
optimal cloud configuration in black-box clouds. Using
capability vectors, we cast the search problem into the
Knapsack problem [7], and develop a heuristic search
algorithm to identify a cloud configuration with the best
price. With the algorithm, our approach can reduce the
search space and speed up the search procedure.
We have evaluated our approach on a public cloud with
intensive web workloads. Experiment results show our ap-
proach accurately estimates performance capabilities of cloud
configurations using the benchmark-based approximation. To
evaluate the scalability of our search algorithm, we perform
an extensive simulation with capability vectors computed with
benchmark results.
The remainder of this paper is structured as follows: Section
II outlines our recommendation procedure; Section III and IV
describe the estimation and the search algorithm respectively
in detail; Section V shows our evaluation results; Section VI
reviews related work; Section VII concludes the paper.
II. SYSTEM OVERVIEW
It is critical for both cloud providers and users to identify
an optimal cloud configuration for a given workload. In this
regard, we are developing a cloud recommender system, re-
ferred to as CloudAdvisor, that aims to estimate the throughput
of a given workload in a target cloud. To provide a concrete
Recommendation-as-a-Service, CloudAdvisor compares offer-
ings made by different cloud providers for any given workload.
When one requests a comparison, it shows a comparison table
as depicted in Fig. 1.
The table in Fig. 1 shows a price comparison among four
different cloud configurations including an in-house cloud,
when those cloud configurations have similar throughputs for
a workload deployment. Then, a user can identify which
configuration offers a best price while it meets her throughput
goal. Similarly, CloudAdvisor can provide a throughput com-
parison, where prices are similar. Note that those prices and
throughputs can change dynamically over time due to the dynamic
environment of the cloud market.
Fig. 2 outlines the approach for providing such comparisons.
The overall approach is 1) characterizing the performance of a
given workload in terms of resource usage pattern in the white-
box test-bed, 2) based on the resource usage pattern, estimating
a capability vector, each element of which represents the
[Fig. 2 is a diagram of the five numbered steps of the recommendation procedure; its labels are not recoverable from this extraction.]
Fig. 2. Approach of our recommender system
required performance capability of each resource type (e.g.,
CPU, memory, disk I/O, and network bandwidth) to meet a
target throughput, 3) capturing the performance characteristics
of target clouds using a configurable benchmark suite, referred
to as CloudMeter developed as a component of CloudAdvisor,
and estimating a set of capability vectors, each of which
represents a specific configuration (e.g., a single VM), 4)
with these capability vectors, searching an optimal cloud
configuration (i.e., combined VMs to run the target workload)
until there is no more chance to minimize the price, and finally,
5) providing a comparison table using search results. Note
that step 3 can be done as offline batch processes, and
each batch process is scheduled periodically, since the
benchmarking results can change over time.
To achieve such comparisons, we develop two key compo-
nents, capability estimation and search for an optimal cloud
configuration, to be described in the following sections.
III. BENCHMARK-BASED CAPABILITY ESTIMATION
In a recommender system, one of the most important parts is
accurately estimating the performance capability of each cloud
configuration for a given workload while exploring various
different cloud configurations. Here, the performance capa-
bility of cloud configuration is defined as the approximated
maximum throughput that can be achieved using a cloud
configuration for the workload.
To estimate performance capabilities of cloud configura-
tions, our approach first builds an abstract performance model
based on the resource usage pattern of the workload mea-
sured in an in-house test-bed (i.e., a white-box environment).
Second, using CloudMeter, it computes relative performance
scores of many different cloud configurations against the in-
house cloud. Finally, it applies the collected performance
scores into the abstract performance model to estimate per-
formance capabilities of those cloud configurations. This ap-
proach incurs little cost because we do not need to deploy the
complex workload itself into all possible cloud configurations
to evaluate their performance capabilities.
A. Performance Characterization of Workload
For a given workload, CloudAdvisor first characterizes
the workload in the context of its resource usage pattern
by deploying the workload into an in-house test-bed and
computing the correlation of resource usages to load change.
In our approach, we capture the usage change rates (i.e., slope)
of each resource type and the throughput change rate until the
capability is reached, while load increases. These usage change
rates can approximately indicate the degree of contribution of
each resource type to the performance capability (see Fig. 4
in Section V showing these change rates of an application).
A workload simulator has been developed for the per-
formance characterization. It generates synthetic loads with
various data access patterns (e.g., the ratio of database write
over read transactions and the ratio of business logic over I/O).
If a historical load is available, the workload simulator can sort
and re-play the workload to give systematic stress to the target
system. We also assume that the test-bed is capable of running
any type of workload, whether CPU-intensive, memory-
intensive, I/O-intensive, or network-intensive.
B. Building Abstract Performance Model
Based on the resource usage pattern, our approach defines
quantitative models for resource types. As shown in Fig. 4,
throughput increases until the performance capability is reached,
and the point of the performance capability is determined by
the resource types that consume most of their available
capacities (i.e., are bottlenecked). Hence, our system defines a
quantitative model for each individual resource type to identify
its correlation to the performance capability. Specifically, for
each resource type j, a quantitative model can be defined as,
Tj = f(Uj | (Cj = c, ∃j ∈ R) ∧ (Cr′ = ∞, ∀r′ ∈ R, r′ ≠ j))

where Tj is the throughput to be achieved with Uj, the
normalized usage rate over the given capacity (i.e., Cj = c)
of a resource type j. R is the set of resource types, and r′ is a
resource type in R with r′ ≠ j. We consider r′ to have unlimited
capacity so as to compute the correlation of only j to Tj.
To compute Tj using f , our system takes 4 steps. First, the
system figures out the relation of load to the usage rate of the
resource type. The relation can be defined as a linear function
or, generally, as a function that has a logarithmic curve for a
resource type j. Usage rates we consider in this paper are the
total CPU that consists of user and system CPU usages, cache,
memory, disk I/O, and network usages. More specifically, the
function can be as follows,

Uj = si,j (αj (2L − L^p) + γj)    (1)

where L is the amount of load and p is chosen to minimize the
squared error (a linear function is the special case where p = 1).
αj is the increase rate (e.g., the slope in the linear case), and γj
is the initial resource consumption of the current configuration.
αj, γj, and p can be obtained by calibrating the function to fit
the actual curve. In this fitting, we use the low-load portion
of the curve (i.e., the portion before the knee of the curve in
Fig. 4 (b) and (c)). Then, si,j, the relative performance
score of the cloud configuration i, is computed as described
in the next section. In the white-box, si,j is 1.
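As a concrete illustration, the calibration of Eq. 1 on the low-load portion of the curve can be done by least squares; the sketch below (function and variable names are ours, for the white-box case si,j = 1) grid-searches the exponent p and solves for αj and γj in closed form:

```python
def fit_usage_model(loads, usages, p_grid=None):
    """Fit U(L) = alpha*(2L - L^p) + gamma on the pre-knee portion of the curve."""
    if p_grid is None:
        p_grid = [k / 100 for k in range(90, 111)]   # candidate exponents p
    best = None
    n = len(loads)
    for p in p_grid:
        # For a fixed p the model is linear in (alpha, gamma): U = alpha*x + gamma
        xs = [2 * L - L ** p for L in loads]
        mx, mu = sum(xs) / n, sum(usages) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        alpha = sum((x - mx) * (u - mu) for x, u in zip(xs, usages)) / sxx
        gamma = mu - alpha * mx
        sse = sum((alpha * x + gamma - u) ** 2 for x, u in zip(xs, usages))
        if best is None or sse < best[0]:
            best = (sse, alpha, gamma, p)
    return best[1], best[2], best[3]                 # alpha, gamma, p
```

For a fixed p, the model is linear in (αj, γj), so an ordinary least-squares line fit suffices at each grid point; the grid range for p is an assumption on our part.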
Second, the relation of L to throughput T (see Fig. 4 (a))
is defined as,
T = β (2L − L^q)    (2)

where β is the increase rate, and q is chosen to minimize the
squared error (linear when q = 1). Similarly, we can obtain
β and q by calibrating the function to fit the actual curve.
Third, we compute the capability based on the correlation
of j to L. We can obtain a theoretical amount of load, when
j reaches the full usage point using Eq. 1 (i.e., theoretically
extending the curve beyond the knee point until Uj is 1). Then,
the obtained amount of load is applied to Eq. 2.
Finally, the capability of the cloud configuration can be
Tmax = min(T1, T2, ..., Tr), where Tj is the throughput com-
puted from Eq. 2 for each j. In other words, Tj is a maximum
throughput when j is fully consumed while other resources
are still available (because we consider other resources are
unlimited). The rationale for using the min() function to compute
Tmax is that, at the capability point, only the bottlenecked
resources are fully consumed, while the other resources still
have available capacity.
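Putting the four steps together, a minimal sketch of the capability computation might look as follows (helper names are ours; we assume Uj is monotone in L over the range of interest, which holds for p near 1):

```python
def load_at_full_usage(alpha, gamma, p, hi=1e7):
    """Bisect for the load L where alpha*(2L - L^p) + gamma reaches 1 (Eq. 1, s = 1)."""
    lo = 0.0
    for _ in range(200):                  # bisection; assumes monotone usage in L
        mid = (lo + hi) / 2
        if alpha * (2 * mid - mid ** p) + gamma < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def capability(resources, beta, q):
    """resources: {name: (alpha, gamma, p)} -> (Tmax, per-resource throughputs)."""
    t = {}
    for name, (alpha, gamma, p) in resources.items():
        full_load = load_at_full_usage(alpha, gamma, p)
        t[name] = beta * (2 * full_load - full_load ** q)   # Eq. 2 at saturation
    return min(t.values()), t
```

The min() over the per-resource throughputs identifies the bottlenecked resource, exactly as in the Tmax definition above.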
C. Computing Relative Performance Scores
Even for the same workload, target cloud configurations
may perform differently from one another. To complete the
abstract performance model (i.e., Eq. 1), we have to capture
the performance characteristics of each target configuration i
in terms of the relative performance score si,j for each resource
type j. Using CloudMeter, our approach collects relative
performance scores based on resource capability measure-
ments. These scores can be reused for any different workload
later. In current implementation, CloudMeter contains a set
of micro-benchmark applications including Dhrystone, Whet-
stone, System calls, and Context switch that are integrated
into UnixBench for CPU benchmarking, CacheBench for
memory benchmarking, IOZone for disk I/O benchmarking,
and our own network benchmark application. CloudMeter is
very useful when historical workload trace of an application
in the target cloud is not available. Once it is available after
the application is deployed, the application itself can be a
benchmark of CloudMeter and then, the historical data can
be used to compute si,j of a new workload that has similar
performance characteristics.
Using these measurements, our approach computes si,j as
si,j = (bi,j / bj)(aj / ai,j), where aj and bj are a given resource
allocation (e.g., the number of vCPUs or memory size) and the
benchmarking measurement for j, respectively, in the white-
box cloud configuration. ai,j and bi,j are those in the target
cloud configuration i.
By applying si,j to Eq. 1, we can obtain the performance
capability of i. When we deal with CPU for si,cpu, the system
CPU usage for si,sys and user level CPU usage for si,user must
be considered separately in the total CPU usage as shown in
Fig. 4 (b) and (c), since the system CPU usage is related to
context switch and system call used for interrupts, allocate/free
memory, and communicating with file system that can be
different among cloud configurations. Thus, we extend si,j
for CPU as si,cpu = (si,user αuser + si,sys αsys) / αcpu, where
αuser is the increase rate of user CPU usage and αsys is the
increase rate of system CPU usage while αcpu is the increase
rate of the total CPU usage. These rates are captured from the
fitting model step in Eq. 1.
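The score formulas above reduce to two one-liners; the sketch below uses our own names for the benchmark measurements (b) and allocations (a):

```python
def rel_score(b_target, b_white, a_white, a_target):
    """s_ij = (b_ij / b_j) * (a_j / a_ij): per-unit performance vs. the white-box."""
    return (b_target / b_white) * (a_white / a_target)

def cpu_score(s_user, s_sys, alpha_user, alpha_sys, alpha_cpu):
    """Blend user/system CPU sub-scores by their fitted usage increase rates."""
    return (s_user * alpha_user + s_sys * alpha_sys) / alpha_cpu
```

For example, a target VM that scores twice the white-box benchmark result with twice the allocation gets si,j = 1, i.e., the same per-unit capability.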
IV. SEARCH FOR OPTIMAL CLOUD CONFIGURATION
Based on a target throughput and the performance model,
CloudAdvisor identifies a near optimal cloud configuration
for the given workload. In particular, we focus on parallel
workload in this paper. To do this, our approach first generates
capability vectors, each of which represents the performance of
a cloud configuration. Second, we encode the search problem
into Knapsack Problem [7] and then, we develop a search
algorithm to solve the problem in an efficient way.
A. Generating Capability Vectors for Given Workload
The first step to identify an optimal cloud configuration is
generating a required capability vector using resource usage
model in the white-box test-bed. Each element of the required
capability vector represents the performance capability value
of each resource type to meet a target throughput.
Specifically, the required performance capability value, de-
noted c∗j, of a resource type j (i.e., the usage rate Uj
required to achieve the target throughput T∗) can be given as
c∗j = (αj/β)T∗ + γj , when we consider a linear function in
Eq. 1 and 2. Here, (αj/β) indicates the normalized increase
rate of the resource usage to increase a unit of throughput.
Thus, the equation indicates how much resource capability
is required to meet T ∗. Note that if c∗j is more than 1, it
indicates that more resource capability is required to meet T ∗
than currently available in the test-bed. With this equation,
the required capability vector for the target workload and
throughput is a set of such required performance capability
values and represented in the form of V∗ = [c∗1 c∗2 ... c∗r].

We may obtain the capability vector of a target
cloud configuration i by directly deploying the given workload
into i and performing the aforementioned measurements. However,
it is an expensive and time-consuming task since there are many
different cloud configurations to be evaluated, and the work-
load deployment is typically very complicated. Hence, our
approach instead captures the relative performance capability
value, denoted ci,j, of i, by using the benchmarking
measurements that are used to compute si,j for j. It is much
simpler than deploying the complex workload into the target
cloud configuration. Moreover, the benchmarking measurements
can be reused for other workloads, as long as they are continually
updated for any change of cloud configuration.
[Fig. 3 is a bar chart of performance capability values for the elements of a capability vector (CPU, memory capacity, disk I/O, and network bandwidth), contrasting the required capability with the capability filled so far.]
Fig. 3. A capability vector is filled with a VM’s capability vector
Specifically, the relative performance capability value ci,j =
(bi,j / bj)(ai,j / aj) cj, where (bi,j / bj) is the performance ratio
based on benchmarking, (ai,j / aj) is the resource allocation ratio,
and cj is the maximum resource usage rate for the given capacity
of j (typically, cj = 1). Then, the capability vector of i can
be represented in the form Vi = [ci,1 ci,2 ... ci,r]. Finally, our
approach computes all target capability vectors from the clouds of
interest. Note that Vi is just the relative capability vector of i
against the test-bed, while V∗ is the capability vector required
to achieve T∗ and computed in the test-bed.
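For the linear case of Eq. 1 and 2, both vectors can be computed directly; a sketch with hypothetical helper names, representing vectors as dictionaries keyed by resource type:

```python
def required_vector(alphas, gammas, beta, t_star):
    """V*: c*_j = (alpha_j / beta) * T* + gamma_j (linear case of Eq. 1 and 2)."""
    return {j: (alphas[j] / beta) * t_star + gammas[j] for j in alphas}

def relative_vector(bench_ratio, alloc_ratio, c_max=1.0):
    """V_i: c_ij = (b_ij / b_j) * (a_ij / a_j) * c_j, with c_j typically 1."""
    return {j: bench_ratio[j] * alloc_ratio[j] * c_max for j in bench_ratio}
```

An element of V∗ greater than 1 then signals that more capability than the test-bed currently offers is needed for T∗, as noted above.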
B. Search Algorithm
Using capability vectors, CloudAdvisor explores various
cloud configurations offered from each cloud provider to iden-
tify a cloud configuration with the best price. If we can obtain
the price per unit of resource usage, we can easily compute
the total price to meet V∗. However, most cloud providers offer
a small set of pre-determined cloud configuration types with specific
prices in the form of VM types that have different CPU, memory,
disk and network capacities [8]. Some cloud providers such
as Amazon Web Services use a dynamic pricing scheme for
their virtual resources as well. In this paper, we assume a
static pricing scheme that is used by most cloud providers. In
our current implementation of the search algorithm, we also
assume parallel workloads such as clustered parallel database
transactions and parallel data mining using MapReduce
[9]. For parallel workloads, the cloud configuration can have
multiple heterogeneous VMs to handle loads in parallel with
a load balancing technique such as one introduced in [10].
The relative capability vector Vi of each specific VM type
can be computed as described in the prior section. We can also
capture V ∗ that represents a resource capacity requirement
to meet a target throughput T ∗. Then, the search procedure
is to fit those numerical capability values of pre-determined
VM types into numerical capability values defined in V ∗.
Especially, the search algorithm identifies a configuration with
a minimum price when fitting into V ∗.
As shown in Fig. 3, we illustrate V ∗ with a set of containers,
each of which has a specific size (i.e., the numerical capability
c∗j of j as described in Section IV-A). Then, we can convert the
problem into Knapsack problem [7], since the search algorithm
tries to fill the containers with items (i.e., the Vi's of VMs) that have
different values (i.e., prices) while aiming at a minimal total
price. Fig. 3 shows a snapshot of the search procedure, when
containers are filled with capabilities of a VM’s Vi.
The problem we tackle in this paper is finding VMs that
meet a target throughput while minimizing the cumulative price
of the VMs. More specifically, the optimal cloud configuration
has zero distance between V∗ and the cumulative capabilities of its
VMs (i.e., Σ_{i=1..m′} ci,j, where m′ is the number of VM instances
in a configuration) while having the smallest cumulative price
of VMs. The distance can be computed as follows,

D = Σ_{j=1..r} max(c∗j − Σ_{i=1..m′} ci,j, 0)    (3)
If c∗j − Σ_{i=1..m′} ci,j < 0, its distance is considered 0 since
it indicates enough capability. Otherwise, the containers have to
be filled with more VMs since the target throughput is not yet
met. Note that all containers must be fully filled even if a
container's size is very small (i.e., the required capability is
relatively small, such as the network bandwidth in Fig. 3). This is
because a bottleneck can occur in any resource that does not
have enough capability, even if that resource type is not critical
to achieving the target throughput.
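Eq. 3 translates directly into code; a sketch with capability vectors represented as dictionaries keyed by resource type (our own representation):

```python
def distance(v_star, vms):
    """Eq. 3: D = sum_j max(c*_j - sum_i c_ij, 0); zero means the target is met."""
    return sum(max(v_star[j] - sum(vm[j] for vm in vms), 0.0) for j in v_star)
```

A surplus in one resource (negative gap) contributes nothing, so D reaches zero only when every container is fully filled.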
Identifying such cloud configuration is not trivial, since
there are many different combinations of VMs that can meet
the target throughput. We can formulate the problem into
Integer Linear Programming (ILP) as follows,

Minimize    Σ_{i=1..m} ni pi
Subject to  Σ_{i=1..m} ni ci,j ≥ c∗j, ∀j ∈ R;  ni ≥ 0;  pi ≥ 0;  ci,j ≥ 0

where m is the number of VM types, ni is the required
number of VM instances of each VM type i, and pi is the
price of i. To solve this ILP, we can use generic approaches
such as Tabu search [11], Simulated annealing [12], and Hill
climbing [13].
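For intuition, the ILP can be solved exactly by brute-force enumeration when the number of VM types and instance counts is small; the toy solver below (not the paper's heuristic, names ours) makes the objective and constraints concrete:

```python
from itertools import product

def solve_ilp(vm_types, v_star, max_instances=5):
    """vm_types: list of (price, {resource: c_ij}); returns (min cost, counts)."""
    best = None
    for counts in product(range(max_instances + 1), repeat=len(vm_types)):
        feasible = all(
            sum(n * vm[1][j] for n, vm in zip(counts, vm_types)) >= v_star[j]
            for j in v_star)                        # capability constraints
        if feasible:
            cost = sum(n * vm[0] for n, vm in zip(counts, vm_types))
            if best is None or cost < best[0]:      # minimize total price
                best = (cost, counts)
    return best
```

The search space is (max_instances + 1)^m, which is exactly the blow-up the heuristic in the next section is designed to avoid.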
In our approach, we design an efficient search algorithm
by extending the best-first search algorithm based on our
observations, rather than blindly exploring the search space.
Basically, our search algorithm uses the cheapest configuration
(i.e., the cheapest combination of VMs) first to fill V ∗ among a
set of candidates. It keeps generating candidates and trying to
fit each candidate combination into V ∗ iteratively until it finds
one successfully fitted into V ∗. The overhead of the search
procedure is caused by too many candidates to be generated
(i.e., a large search space). Hence, our algorithm is designed to
significantly reduce the search space. To do this, our algorithm
first checks the gradient of each resource capability in the
current cheapest candidate against a previously selected one.
Then, it chooses only those candidates that have some resource
type still required to be filled into V∗ (i.e., c∗j − Σ_{i=1..m′} ci,j > 0)
and that reduce the gap between c∗j and Σ_{i=1..m′} ci,j more than the
previously selected candidate does. We call this approach
Conservative Gradient Search (CG).
Second, our algorithm explores the search space along with
biased paths in the search space that have more remaining
gaps (i.e., focusing on a resource type to be potentially
bottlenecked), and chooses some number of candidates that
can fill the gap of the resource with less price. Note that our
algorithm checks remaining gaps of resource types at every
iteration since the potential bottleneck keeps migrating while
adding resources. The number of candidates can be determined
empirically or dynamically over the iterations (i.e., reducing the
number as the gap decreases). In our experiments, we use a
static parameter. We call this approach Bottleneck-aware
Proactive Search (BP). The algorithm, with two inputs V (a
set of VM types) and V∗, is outlined in Algorithm 1.
Algorithm 1 Bottleneck-aware proactive search
Input: V∗, V
Output: V̂curr  // the cheapest combination of VM instances
Q ← ∅;  // the pool of candidate combinations
V̂curr ← ∅; V̂prior ← ∅;
for each Mi ∈ V do
  Q ← Q ∪ {vmi};  // where vmi is an instance of Mi
end for
while forever do
  Q ← SortByPrice(Q);
  while forever do
    V̂curr ← Q.RemoveFirst();
    if |V̂curr| < |V̂prior| then
      break;  // where |V̂| is the number of instances
    end if
    if (V̂curr.cj < V∗.cj) ∧ (V̂curr.cj ≥ V̂prior.cj), ∃j ∈ R then break;
  end while
  V̂prior ← V̂curr;
  if V̂curr.cj ≥ V∗.c∗j, ∀j ∈ R then
    return V̂curr;  // i.e., compute D as described in Eq. 3
  end if
  (j∗, g∗) ← max{V∗.c∗j − V̂curr.cj, ∀j ∈ R};
  // where j∗ is bottlenecked and g∗ is the remaining gap
  Qtemp ← SelectCandidateTypes(j∗, g∗, k, V);
  // where k is the number of candidates newly generated
  for each Mi ∈ Qtemp do
    V̂′.p ← V̂curr.p + vmi.p;  // where p is the price
    V̂′.cj ← V̂curr.cj + vmi.cj, ∀j ∈ R;
    if V̂′ ∉ Q then Q ← Q ∪ {V̂′};
  end for
end while
Using this algorithm, we reduce the search space by pruning
out some candidates that can achieve the target throughput, but
are eventually more expensive than the resulting combination.
Hence, we can compute the near optimal cloud configuration
much more efficiently than blind exploration.
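A simplified, single-pool rendition of Algorithm 1 is sketched below. It keeps the two key ideas, cheapest-first expansion and extending along the bottlenecked resource with the k most cost-effective VM types, but omits the conservative-gradient pruning; all names are ours, and we assume every capability value is positive:

```python
import heapq

def bp_search(vm_types, v_star, k=2, max_steps=200):
    """vm_types: list of (price, {resource: capability}); returns (price, combo)."""
    heap = [(vm_types[i][0], (i,)) for i in range(len(vm_types))]
    heapq.heapify(heap)                               # cheapest candidate first
    seen = set()
    for _ in range(max_steps):
        if not heap:
            break
        price, combo = heapq.heappop(heap)
        filled = {j: sum(vm_types[i][1][j] for i in combo) for j in v_star}
        gaps = {j: v_star[j] - filled[j] for j in v_star}
        if all(g <= 0 for g in gaps.values()):
            return price, combo                       # all containers filled
        j_star = max(gaps, key=gaps.get)              # bottlenecked resource j*
        # extend with the k types that fill j* at the lowest price per unit (BP)
        ranked = sorted(range(len(vm_types)),
                        key=lambda i: vm_types[i][0] / vm_types[i][1][j_star])
        for i in ranked[:k]:
            nxt = tuple(sorted(combo + (i,)))
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, (price + vm_types[i][0], nxt))
    return None
```

Because candidates are popped in price order, the first combination that covers V∗ is the cheapest among those generated, mirroring the best-first character of the full algorithm.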
V. EXPERIMENTAL EVALUATION
A. Experimental Setup
To evaluate our approach, we have used an online auction
web application, RUBiS [14], deployed on a servlet server (Tomcat)
and a back-end database server (MySQL). The workload pro-
vided by RUBiS package consists of 26 different transactions.
In our experiments, we focus on clustering of the database
[Fig. 4. Throughputs and resource usage patterns: (a) throughput (#responses per 3 min) vs. load (#concurrent users) for configurations B-1-2-K, B-4-2-X, and W-2-4-K; (b) resource usage (%) of the light-write workload; (c) resource usage (%) of the heavy-write workload, each plotting CPU_user, CPU_sys, CPU_total, Disk_I/O, NW_I/O, and Mem against load.]
server, so we have modified the default workload to
intentionally stress the database server by reducing simple
HTML transactions and increasing the rate of DB query
transactions. Then, we have created two different workloads by
changing the rate of write transactions in the workload. The light-
write workload has 5% write transactions, while the heavy-
write workload has 50% of all transactions.

We have prepared a VM type in our white-box test-bed that
is configured with 2 vCPUs and 4 GB memory, 40 GB disk,
and runs with Ubuntu 10.04 operating system. Instances of
this VM type are deployed into our Intel blade with KVM
virtualization. We call instances of this VM type W-2-4-
K. We have prepared other VM types obtained from Rackspace,
one of the well-known IaaS providers. There are various
VM types in Rackspace configured with different numbers of
vCPUs, memory sizes, and disk sizes. We have used 6 VM types,
where their variation in the number of vCPU is 1 to 8, memory
1 to 30 GB, disk 40 GB to 1.2 TB, and network 30 to 300
Mbps. Their prices are from $0.06 to $1.20 per hour. Ubuntu
10.04 has been installed in VMs. These VMs have been used
in our experiments for the search algorithm (Section V-C).

For evaluating the capability estimation (Section V-B), we
have selected a VM type from Rackspace that has 4 vCPUs, 2
GB memory, and 80 GB disk space running on AMD server
with Xen. We call its instances B-4-2-X. We have also selected
a small VM type to compare with W-2-4-K. The small VM
type has 1 vCPU, 2 GB memory, and 40 GB disk, and runs on
an Intel server with KVM. We call its instances B-1-2-K.
B. Performance Capability Estimation

To evaluate the capability estimation, we have deployed the
application into W-2-4-K and built the abstract performance
model (i.e., Eq. 1 and 2). As shown in Fig. 4, our workload
generator records throughput (i.e., the number of responses per
3 minutes) while increasing the number of concurrent users by
50 every 3 minutes, using the aforementioned workloads.

Fig. 4 (a) shows the throughput change and the maximum
throughput of W-2-4-K. It also shows those of the other two
configurations, B-1-2-K and B-4-2-X. Our estimation approach
will estimate those maximum throughputs shown in Fig. 4
(a). We note that change rates of all three configurations
are almost identical in low load. This is because they have
TABLE I
COEFFICIENTS (PARAMETERS) OF ABSTRACT PERFORMANCE MODELS

Resources   αj     p      γj
CPU         0.08   1.01    8
User CPU    0.05   1.02    5
Sys. CPU    0.03   0.98    3
Memory      0.01   1.01   21
[Fig. 5 is a bar chart of relative performance scores (0 to 2.5) for total CPU, system CPU, user CPU, memory, disk I/O, and network I/O, comparing W-2-4-K, B-1-2-K, and B-4-2-X.]
Fig. 5. Performance scores of resource types
enough capability to deal with such low load. However, their
maximum throughputs are different due to their different
capabilities. To figure out such capabilities, the resource usage
pattern is plotted as shown in Fig. 4 (b) and (c).
Fig. 4 (b) and (c) show usage changes of 6 resource types.
As observed, CPU and memory mainly affect the throughput
in all configurations. We note here that both workloads seem
to have similar usage patterns, but the user CPU and system
CPU usages differ between the workloads. Hence, we
should carefully deal with CPU usage as described in Section
III-C when modeling the performance. We have captured the
abstract performance model of the heavy-write workload into
Table I. We have also captured β = 6 and q = 0.98 used in Eq.
2, as the throughput increase rate and square error respectively.
To compute performance scores of B-1-2-K and B-4-2-
X, we have deployed CloudMeter into these configurations
and performed benchmarking of resource types. Fig. 5 shows
performance scores of 6 different resource types. Note that
the scores of B-4-2-X are quite different from those of the
other two VMs, since its ratio of CPU to memory allocation
is different, and B-4-2-X is configured with a different type of architecture
TABLE II
THROUGHPUT ESTIMATES (THE NUMBER OF TRANSACTIONS PER MINUTE)

           | W-2-4-K | B-1-2-K    | B-4-2-X
Measured T | 6050.32 | 3552.12    | 7063.33
Tcpu       |         | 3241.02    | 13997.38
Tmem       |         | 5029.91    | 7659.32
Tdisk      |         | 4178010.09 | 3483818.44
Tnw        |         | 2097002.99 | 1715384.62
(i.e., AMD) and virtualization (i.e., Xen).
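The relative scores plotted in Fig. 5 are, in essence, ratios of benchmark results between a target VM and the baseline. The following is a minimal sketch of that computation; the raw benchmark numbers and the assumption that every metric is higher-is-better are illustrative placeholders, not values from our measurements.

```python
# Sketch: build a capability vector of relative performance scores.
# The raw benchmark numbers below are illustrative placeholders; the
# baseline VM scores 1.0 on every resource type by definition.

def capability_vector(target, baseline):
    """Relative score per resource type: target metric / baseline metric.
    Assumes every benchmark metric is higher-is-better (e.g., ops/sec)."""
    return {res: target[res] / baseline[res] for res in baseline}

w_2_4_k = {"cpu": 1200.0, "memory": 900.0, "disk_io": 150.0, "net_io": 110.0}
b_1_2_k = {"cpu": 650.0, "memory": 760.0, "disk_io": 149.0, "net_io": 90.0}

scores = capability_vector(b_1_2_k, w_2_4_k)
print(scores)
```

Scoring every VM against the same baseline makes the vectors directly comparable, which is what allows the abstract performance model built on W-2-4-K to be rescaled for other configurations.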
By applying these scores to the abstract performance model,
we can estimate capabilities of B-1-2-K and B-4-2-X as shown
in Table II. Based on our approach in Section III-B, CPU is
bottlenecked in B-1-2-K, similar to W-2-4-K, as shown in
Fig. 4 (c) (i.e., Tcpu is the minimum). In B-4-2-X, it turns out
that memory is bottlenecked (different from the other two
VMs): it has enough CPU allocation, so the bottleneck
migrates to memory. Compared to the measured maximum
throughputs, the error rate is less than 10% (8.26% for
B-1-2-K and 8.04% for B-4-2-X). It is even lower for the
light-write workload (around 6%).
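With the per-resource estimates of Table II in hand, the bottleneck identification of Section III-B reduces to taking the minimum over the per-resource throughput estimates. The sketch below applies that min rule to the Table II numbers; the printed error figures may differ slightly from those reported in the text depending on the baseline used.

```python
# Bottleneck rule (Section III-B): the estimated maximum throughput of a
# configuration is the minimum of its per-resource throughput estimates,
# and the minimizing resource is the predicted bottleneck.
# Per-resource estimates and measured values are taken from Table II.

def estimate_capability(per_resource):
    bottleneck = min(per_resource, key=per_resource.get)
    return bottleneck, per_resource[bottleneck]

table_ii = {
    "B-1-2-K": ({"cpu": 3241.02, "mem": 5029.91,
                 "disk": 4178010.09, "nw": 2097002.99}, 3552.12),
    "B-4-2-X": ({"cpu": 13997.38, "mem": 7659.32,
                 "disk": 3483818.44, "nw": 1715384.62}, 7063.33),
}

for vm, (estimates, measured) in table_ii.items():
    bottleneck, t = estimate_capability(estimates)
    error = abs(t - measured) / measured
    print(f"{vm}: bottleneck={bottleneck}, T={t:.2f}, error={error:.1%}")
```

With these numbers, CPU is the predicted bottleneck for B-1-2-K and memory for B-4-2-X, and both estimates fall within 10% of the measured maximum throughput.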
Experiment results indicate that our estimation approach is
accurate enough to be applied in the recommender system.
Looking at the resource usage patterns in Fig. 4 (b) and (c),
CPU appears to be bottlenecked in W-2-4-K, but it turns out
that memory can become the bottleneck in different
configurations. Additionally, using B-4-2-X instances can
amount to over-provisioning for only a small throughput
increase compared to W-2-4-K. Finally, we can also see that
the bottleneck can migrate between resources (e.g., Tcpu and
Tmem are similar in B-1-2-K) as more allocation is added to
the bottlenecked resource. Hence, the recommender system
must accurately capture such resource usage patterns and
bottleneck migrations for each workload.
C. Scalability of the Search Algorithm
We have obtained all capability vectors of the 6 Rackspace
VM types and then evaluated the accuracy of our search
algorithm using the light-write workload. The resulting
configuration computed by our search algorithm (i.e., BP) has
been compared with the configuration computed by a
brute-force search. We have used a relatively low throughput
goal (i.e., 20K) so that the resulting configuration consists of
5 low-end VM instances. We have deployed a simple load
balancer in a separate VM that forwards user requests to the
VMs in the cluster based on their performance capabilities.
BP returns exactly the same configuration (i.e., 5 low-end VM
instances) as the brute-force search in this small setup.
Although the VM cluster has been slightly over-provisioned
against the goal (i.e., 22.3K), the error is still around 10%.
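The paper does not detail the load balancer's forwarding policy; one simple capability-aware scheme consistent with the description above is a weighted round-robin, where each VM receives requests in proportion to its estimated maximum throughput. The sketch below is a hypothetical illustration — the VM names, throughput values, and weighting scheme are our assumptions.

```python
import itertools

def weighted_round_robin(capacities):
    """Cycle over VM names in proportion to integer weights derived from
    their estimated maximum throughputs (hypothetical weighting scheme)."""
    scale = min(capacities.values())
    weights = {vm: max(1, round(t / scale)) for vm, t in capacities.items()}
    schedule = [vm for vm, w in weights.items() for _ in range(w)]
    return itertools.cycle(schedule)

# Illustrative cluster: two low-end VMs and one VM with roughly twice
# the estimated throughput, which therefore appears twice per cycle.
cluster = {"vm-a": 3552.12, "vm-b": 3552.12, "vm-c": 7063.33}
dispatch = weighted_round_robin(cluster)
one_cycle = [next(dispatch) for _ in range(4)]
print(one_cycle)
```

Weighting by estimated throughput rather than distributing requests uniformly keeps a heterogeneous cluster from being throttled by its weakest instance.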
We have conducted an extensive simulation to evaluate the
potential scalability and optimality of our search algorithm.
Currently, we plan to deploy a large-scale MapReduce cluster
with a parallel data mining workload that may need up to
several hundred VMs. Thus, the search algorithm has to be
scalable enough to deal with such a large workload.
In experiments, we have run our search algorithm with all
capability vectors of 6 VMs and W-1-2-K to compute the
configuration and its total price. We have increased the target
Fig. 6. Scalability comparison of three search algorithms (search time in seconds vs. target throughput from 60K to 240K for NBF, CG, and BP)
Fig. 7. Price comparison of three search algorithms (total price per hour in $ vs. target throughput from 60K to 240K for NBF, CG, and BP)
throughput from 60K to 240K while measuring the duration
of our search algorithm and the total price. Three different
search algorithms have been compared for the evaluation.
• Naive Best-First Search (NBF): uses the basic best-first
search algorithm. This algorithm is not scalable since it
expands the cheapest candidate step by step while
generating all possible candidates at each iteration. We
use this algorithm as a baseline.
• Conservative-Gradient Search (CG): integrates the
conservative gradient search (described in Section IV) on
top of NBF.
• Bottleneck-aware Proactive Search (BP): integrates the
bottleneck-aware search to prune the search space on top
of CG and NBF as shown in Algorithm 1.
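To make the contrast between these strategies concrete, the sketch below implements a toy best-first configuration search with an optional bottleneck-aware prune. It is a simplified illustration of the idea, not the paper's Algorithm 1: the VM types, prices, and per-resource throughputs are hypothetical, and the prune simply skips VM types that are not the strongest on the cluster's current bottleneck resource.

```python
import heapq

# Toy best-first search over cluster configurations. A state is a tuple of
# per-type VM counts; states are expanded cheapest-first (as in NBF), and
# the optional prune (BP-like) only adds VM types that are strongest on
# the cluster's current bottleneck resource. All numbers are hypothetical.

VM_TYPES = [  # (name, hourly price, per-resource throughput estimates)
    ("small", 0.06, {"cpu": 3200.0, "mem": 5000.0}),
    ("large", 0.24, {"cpu": 14000.0, "mem": 7600.0}),
]
RESOURCES = ("cpu", "mem")

def totals(counts):
    # Aggregate per-resource capacity across all VMs in the configuration.
    return {r: sum(n * caps[r] for n, (_, _, caps) in zip(counts, VM_TYPES))
            for r in RESOURCES}

def throughput(counts):
    # The bottleneck resource limits the whole cluster (min rule).
    return min(totals(counts).values())

def search(target, prune=False):
    # Best-first: always expand the cheapest configuration found so far.
    start = (0,) * len(VM_TYPES)
    frontier, seen = [(0.0, start)], {start}
    while frontier:
        price, counts = heapq.heappop(frontier)
        if throughput(counts) >= target:
            return price, counts
        if prune and counts != start:
            t = totals(counts)
            bottleneck = min(t, key=t.get)
            best = max(caps[bottleneck] for _, _, caps in VM_TYPES)
        for i, (_, p, caps) in enumerate(VM_TYPES):
            if prune and counts != start and caps[bottleneck] < best:
                continue  # this type cannot relieve the bottleneck: prune
            nxt = tuple(n + (1 if j == i else 0) for j, n in enumerate(counts))
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (round(price + p, 10), nxt))
    return None

exhaustive = search(20000)          # NBF-style exhaustive best-first
pruned = search(20000, prune=True)  # BP-style pruned search
```

In this toy instance the pruned search pushes far fewer states onto the frontier but may return a somewhat pricier configuration than the exhaustive best-first search, mirroring the qualitative trade-off between BP and NBF.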
Experiment results show that our BP algorithm is scalable
(Fig. 6) while having reasonable optimality (Fig. 7). NBF in
Fig. 6 shows an obvious exponential increase as the target
throughput increases. Although CG has better scalability than
NBF, it still shows the exponential increase. This is mainly
because capabilities per unit price of given VM types are not
so different so that the pruning rate is very low. BP algorithm
aggressively prunes the search space based on potential bot-
tlenecks, instead of capability per unit price. Hence, we can
achieve a good scalability. Meanwhile, BP shows only a little
loss of optimality. As shown in Fig. 7, BP returns at most 4
% more price than the price computed by NBF. CG is almost
identical to NBF since it explores the most of candidates that
are explored by NBF.
VI. RELATED WORK
Evaluation of applications and clouds has gathered pace
in recent times as most enterprises are moving toward agile
hosting of services and applications over private, public, and
hybrid clouds. In this regard, researchers have focused on
three different directions: 1) updating application architecture
to move from legacy systems and adapt to the capabilities
of clouds [15]; 2) evaluating and comparing different clouds'
functional and non-functional attributes to enable an informed
decision on which cloud should host applications [4], [16],
[17], [5], [18]; and 3) efficient orchestration of virtual
appliances in a cloud [19], which may also include
negotiations with users [20]. This paper complements these
directions by enabling recommendation of a cloud
configuration to suit application requirements. To this end, a
cloud capability estimation methodology is developed using
benchmark-based experimental profiling. Previous work has
principally focused on
comparing rudimentary cloud capabilities using benchmarks
[4], [16] and automated performance testing [5]. This paper
focuses on detailed characterization of cloud capabilities for
given user workload in various target clouds.
Recently, many cloud providers, such as those in [17], [18],
have further offered testing paradigms to enable evaluation of
various workload and application models on different cloud
and resource management models. However, scaling such
offerings (to evaluate different applications on different
clouds) requires identifying various cloud capability models
[21]. This paper fills the gap by developing a generic
methodology to characterize and model cloud capabilities.
The methodology is applicable to different clouds since the
modeling of cloud capabilities is based on experiments over
clouds' externally observable characteristics (treating clouds
as black boxes). The model is used to estimate the
performance of applications in target clouds, and such
estimates are then used to efficiently search for (and
subsequently recommend) a cloud configuration.
Building performance models and diagnosing performance
bottlenecks for Hadoop clusters have been explored in [22].
This paper further develops a methodology to recommend
cloud configurations that meet user demands. Orchestration of
virtual appliances in a cloud [19] and efficient allocation of
instances in cloud infrastructure [20] have been addressed
previously to meet user demands. However, such approaches
are typically non-transparent to users, and any cost-performance
trade-off remains unknown to them. This paper, on the other
hand, makes such trade-offs transparent, allowing users to
make choices accordingly. Although negotiations with users
have been explored in [20], this paper enables configuration
recommendations to users based on capability estimation.
VII. CONCLUSION
This paper has addressed a major barrier facing cloud users
by developing a recommender system that identifies a
near-optimal cloud configuration for a given complex user
workload based on benchmark-based approximation. In
particular, we have shown that such estimation and
recommendation can be done even in black-box environments.
To achieve this, our system generates a capability vector that
consists of relative performance scores of resource types, and
our search algorithm then identifies a near-optimal cloud
configuration based on these capability vectors. Our
experiments show that our approach estimates the performance
capability of a given workload within 10% error, and that our
search algorithm is scalable enough to apply to large-scale
workload deployments. We are currently extending our
approach to large-scale MapReduce jobs by incorporating a
time factor (e.g., the number of hours each VM is used) into
the Integer Linear Programming problem defined in Section
IV-B, for so-called bag-of-tasks applications as introduced
in [23].
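As a sketch of this planned extension, the ILP could take the following generic form. The symbols here are our illustrative assumptions rather than the exact formulation of Section IV-B: $x_v$ is the number of instances of VM type $v$, $h_v$ the number of hours each such instance is rented, $p_v$ the hourly price, $T_v$ the estimated throughput of type $v$, and $W$ the total work of the bag-of-tasks.

\[
\min \sum_{v} p_v \, x_v \, h_v
\quad \text{s.t.} \quad \sum_{v} T_v \, x_v \, h_v \ge W,
\qquad x_v, h_v \in \mathbb{Z}_{\ge 0}
\]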
REFERENCES
[1] (2013) CloudHarmony. [Online]. Available: http://cloudharmony.com/
[2] R. Buyya, J. Broberg, and A. Goscinski, Cloud Computing: Principles and Paradigms, Chapter 1. Wiley, 2011.
[3] McKinsey, "Clearing the air on cloud computing," Tech. Report, 2009.
[4] D. Jayasinghe, S. Malkowski, Q. Wang, J. Li, P. Xiong, and C. Pu, "Variations in performance and scalability when migrating n-tier applications to different clouds," in Proc. Int. Conf. on Cloud Computing, 2011, pp. 73-80.
[5] D. Jayasinghe, G. Swint, S. Malkowski, Q. Wang, J. Li, and C. Pu, "Expertus: A generator approach to automate performance testing in IaaS," in Proc. Int. Conf. on Cloud Computing, 2012, pp. 115-122.
[6] S. Malkowski, M. Hedwig, and C. Pu, "Experimental evaluation of n-tier systems: Observation and analysis of multi-bottlenecks," in Proc. Int. Symp. on Workload Characterization, 2009, pp. 118-127.
[7] H. Kellerer, U. Pferschy, and D. Pisinger, Knapsack Problems. Springer, 2004.
[8] M. Cardosa, A. Singh, H. Pucha, and A. Chandra, "Exploiting spatio-temporal tradeoffs for energy-aware MapReduce in the cloud," in Proc. Int. Conf. on Cloud Computing, 2011, pp. 251-258.
[9] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," in Proc. Symp. on Operating Systems Design and Implementation (OSDI'04), 2004, pp. 10-19.
[10] G. Jung, N. Gnanasambandam, and T. Mukherjee, "Synchronous parallel processing of big-data analytics to optimize performance in federated clouds," in Proc. Int. Conf. on Cloud Computing, 2012, pp. 811-818.
[11] F. Glover, "Tabu search - part II," ORSA, vol. 1, pp. 4-32, 1989.
[12] V. Granville, M. Krivanek, and J.-P. Rasson, "Simulated annealing: A proof of convergence," Trans. on Pattern Analysis and Machine Intelligence, vol. 16, pp. 652-656, 1994.
[13] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 2003, pp. 111-114.
[14] RUBiS. [Online]. Available: http://rubis.ow2.org/
[15] M. A. Chauhan and M. A. Babar, "Migrating service-oriented system to cloud computing: An experience report," in Proc. Int. Conf. on Cloud Computing, 2011, pp. 404-411.
[16] S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang, "The HiBench benchmark suite: Characterization of the MapReduce-based data analysis," in Proc. Int. Conf. on Data Engineering Workshops, 2010, pp. 41-51.
[17] R. Calheiros, R. Ranjan, A. Beloglazov, C. De Rose, and R. Buyya, "CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Software: Practice and Experience, vol. 41, pp. 23-50, 2011.
[18] J. Li, Q. Wang, Y. Kanemasa, D. Jayasinghe, S. Malkowski, P. Xiong, M. Kawaba, and C. Pu, "Profit-based experimental analysis of IaaS cloud performance: Impact of software resource allocation," in Proc. Int. Conf. on Services Computing, 2012, pp. 344-351.
[19] K. Keahey and T. Freeman, "Contextualization: Providing one-click virtual clusters," in Proc. Int. Conf. on eScience, 2008, pp. 301-308.
[20] S. Venugopal, J. Broberg, and R. Buyya, "OpenPEX: An open provisioning and execution system for virtual machines," in Proc. Int. Conf. on Advanced Computing and Communications, 2009.
[21] J. Gao, X. Bai, and W. T. Tsai, "Cloud testing: Issues, challenges, needs and practice," Int. Journal on Software Engineering, vol. 1, 2011.
[22] S. Gupta, C. Fritz, J. de Kleer, and C. Witteveen, "Diagnosing heterogeneous Hadoop clusters," in Workshop on Principles of Diagnosis, 2012.
[23] J. O. Gutierrez-Garcia and K. M. Sim, "GA-based cloud resource estimation for agent-based execution of bag-of-tasks applications," Information Systems Frontiers, vol. 14, pp. 925-951, 2012.