An ACO-inspired algorithm for minimizing weighted flowtime in cloud-based parameter sweep experiments

Cristian Mateos a,b,*, Elina Pacini c, Carlos García Garino c

a ISISTAN Research Institute, UNICEN University, Campus Universitario, Tandil B7001BBO, Buenos Aires, Argentina
b Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
c ITIC, UNCuyo University, Mendoza, Argentina

Advances in Engineering Software 56 (2013) 38-50. http://dx.doi.org/10.1016/j.advengsoft.2012.11.011
0965-9978/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.

* Corresponding author at: ISISTAN Research Institute, UNICEN University, Campus Universitario, Tandil B7001BBO, Buenos Aires, Argentina. Tel.: +54 (249) 4439682x35; fax: +54 (249) 4439681.
E-mail addresses: [email protected] (C. Mateos), [email protected] (E. Pacini), [email protected] (C.G. Garino).

Article history: Received 15 April 2012; Received in revised form 6 September 2012; Accepted 12 November 2012; Available online 17 December 2012.

Abstract

Parameter Sweep Experiments (PSEs) allow scientists and engineers to conduct experiments by running the same program code against different input data. This usually results in many jobs with high computational requirements. Thus, distributed environments, particularly Clouds, can be employed to fulfill these demands. However, job scheduling is challenging as it is an NP-complete problem. Recently, Cloud schedulers based on bio-inspired techniques, which work well in approximating problems with little input information, have been proposed. Unfortunately, existing proposals ignore job priorities, which is a very important aspect in PSEs since it allows accelerating PSE results processing and visualization in scientific Clouds. We present a new Cloud scheduler based on Ant Colony Optimization, the most popular bio-inspired technique, which also exploits well-known notions from operating systems theory. Simulated experiments performed with real PSE job data and other Cloud scheduling policies indicate that our proposal allows for a more agile job handling while reducing PSE completion time.

© 2012 Elsevier Ltd. All rights reserved.

Keywords: Parameter sweep experiments; Cloud Computing; Job scheduling; Swarm Intelligence; Ant Colony Optimization; Weighted flowtime

    1. Introduction

Parameter Sweep Experiments (PSEs) are a very popular way of conducting simulation-based experiments, used by scientists and engineers, through which the same application code is run several times with different input parameters, resulting in different output data [48]. Running PSEs involves managing many independent jobs [40], since the experiments are executed under multiple initial configurations (input parameter values) many times, to locate a particular point in the parameter space that satisfies certain user criteria. In addition, different PSEs have a different number of parameters, because they model different scientific or engineering problems. Currently, PSEs find their application in diverse scientific areas such as Bioinformatics [42], Earth Sciences [16], High-Energy Physics [3] and Molecular Science [46].

A concrete example of a PSE is the one presented by Careglio et al. [7], which consists in analyzing the influence of the size and type of geometric imperfections on the response of a simple tensile test on steel bars subject to large deformations. To conduct the study, the authors numerically simulate the test by varying some parameters of interest, namely using different sizes and types of geometric imperfections. By varying these parameters, several study cases were obtained, which it was necessary to analyze and run on different machines in parallel.

Users relying on PSEs need a computing environment that delivers large amounts of computational power over a long period of time. Such an environment is called a High-Throughput Computing (HTC) environment. In HTC, jobs are dispatched to run independently on multiple machines at the same time. A distributed paradigm that is growing is Cloud Computing [5], which offers the means for building the next generation of parallel computing infrastructures along with ease of use. Although the use of Clouds finds its roots in IT environments, the idea is gradually entering scientific and academic ones [32].

Executing PSEs on Cloud Computing environments (or Clouds for short) is not free from the well-known scheduling problem, i.e., it is necessary to develop efficient scheduling strategies to appropriately allocate the workload and reduce the associated computation times. Scheduling refers to the way jobs are assigned to run on the available CPUs of a distributed environment, since there are typically many more jobs running than available CPUs. However, scheduling is an NP-hard problem and therefore it is not trivial from an algorithmic complexity standpoint.

In the last ten years or so, Swarm Intelligence (SI) has received increasing attention in the research community. SI refers to the collective behavior that emerges from social insect swarms [4]. Social insect swarms solve complex problems collectively through intelligent methods. These problems are beyond the capabilities of each individual insect, and the cooperation among them is largely self-organized without any central supervision. After studying social insect swarm behaviors such as those of ant colonies, researchers have proposed algorithms and theories for solving combinatorial optimization problems. Moreover, job scheduling in Clouds is also a combinatorial optimization problem, and several schedulers in this line exploiting SI have been proposed.

As discussed in [31,30], existing SI schedulers completely ignore job priorities. Particularly, for running PSEs, this is a very important aspect. When a PSE is designed as a set of N jobs, every job in the set is associated with a particular value for the i-th model variable being varied and studied by the PSE. In this case, running times can be very different from job to job. This is due to the fact that running the same code or solver (i.e., job) against many input values usually yields very dissimilar execution times as well. This situation is very undesirable since, unless the scheduler knows some job information, the user cannot process/visualize the outputs of the whole PSE until all jobs have finished. Thus, in principle, giving higher (or lower) priority to jobs that are supposed to take longer to finish may help in improving output processing.

In this paper, we propose a new scheduler based on Ant Colony Optimization (ACO), the most popular SI technique, for executing PSEs in Clouds while taking into account job priorities. Specifically, we formulate our problem as minimizing the weighted flowtime of a set of jobs, i.e., the weighted sum of job finish times minus job start times, while also minimizing makespan, i.e., the total execution time of all jobs. Our scheduler essentially includes a Cloud-wide VM (Virtual Machine) allocation policy based on ACO to map VMs to physical hosts, plus a VM-level policy for taking into account individual job priorities that is based on the inverse of the well-known convoy effect from operating systems theory. Experiments performed using the CloudSim simulation toolkit [6], together with job data extracted from real-world PSEs and alternative scheduling policies for Clouds, show that our scheduler effectively reduces weighted flowtime and achieves better makespan levels than other existing methods.

The rest of the paper is structured as follows. Section 2 gives some background necessary to understand the concepts of the approach presented in this paper. Then, Section 3 surveys relevant related works. Section 4 presents our proposal. Section 5 presents detailed experiments that show the viability of the approach. Finally, Section 6 concludes the paper and delineates future research opportunities.

    2. Background

Certainly, Cloud Computing [5,44] is the current emerging trend in delivering IT services. By means of virtualization technologies, Cloud Computing offers to end-users a variety of services covering the entire computing stack, from the hardware to the application level. This makes the spectrum of configuration options available to scientists, and particularly to PSE users, wide enough to cover any specific need from their research. Another important feature, from which scientists can benefit, is the ability to scale the computing infrastructure up and down according to PSE resource requirements. By using Clouds, scientists can have easy access to large distributed infrastructures and are allowed to completely customize their execution environment, thus deploying the most appropriate setup for their experiments.

2.1. Cloud Computing basics

The concept of virtualization is central to Cloud Computing, i.e., the capability of a software system of emulating various operating systems. By means of this support, users exploit Clouds by requesting from them machine images, or virtual machines that emulate any operating system, on top of several physical machines, which in turn run a host operating system. Usually, Clouds are established using the machines of a datacenter for executing user applications while they are idle. In other words, a scientific user application can co-allocate machine images, upload input data, execute, and download output (result) data for further analysis. Finally, to offer on-demand, shared access to their underlying physical machines, Clouds have the ability to dynamically allocate and deallocate machine images. Besides, Clouds can co-allocate N machine images on M physical machines, with N ⩾ M, thus ensuring concurrent user-wide resource sharing. These relationships are depicted in Fig. 1.

Fig. 1. Cloud Computing: High-level view.

With everything mentioned so far, there is broad consensus on the fact that, from the perspective of domain scientists, the complexity of traditional distributed and parallel computing environments such as clusters and particularly Grids should be hidden, so that domain scientists can focus on their main concern, which is performing their experiments [45,27,26]. The value of Cloud Computing as a tool to execute complex scientific applications in general [45,44] and parametric studies in particular [25] has already been recognized within the scientific community.

While Cloud Computing helps scientific users to run complex applications, job management and particularly scheduling is a key concern that must be addressed. Broadly, job scheduling is a mechanism that maps jobs to appropriate executing machines, and the delivered efficiency directly affects the performance of the whole distributed environment. Particularly, distributed scheduling algorithms have the goal of processing a single application that is composed of several jobs by submitting the latter to many machines, while maximizing resource utilization and minimizing the total execution time (i.e., makespan) of the jobs.

    2.2. Job and Cloud scheduling basics

According to the well-known taxonomy of scheduling in distributed computing systems by Casavant and Kuhl [8], from the point of view of the quality of the solutions a scheduler is able to build, any scheduling algorithm can be classified as optimal or sub-optimal. The former characterizes scheduling algorithms that, based on complete information regarding the state of the distributed environment (e.g., hardware capabilities and load) as well as resource needs (e.g., the time required by each job on each computing resource), carry out optimal job-resource mappings. When this information is not available, or the time to compute a solution is unfeasible, sub-optimal algorithms are used instead.

Sub-optimal algorithms are further classified into heuristic and approximate. Heuristic algorithms are those that make as few assumptions as possible (or even no assumptions) about resource load or job duration prior to performing scheduling. Approximate schedulers, on the other hand, are based on the same input information and formal computational model as optimal schedulers, but they try to reduce the solution space so as to cope with the NP-completeness of optimal scheduling algorithms. However, having this information again presents problems in practice. Most of the time, for example, it is difficult to estimate job duration accurately, since the runtime behavior of a job is unpredictable beforehand because of conditional code constructs or network congestion.

In this sense, Clouds, as any other distributed computing environment, are not free from the problem of accurately estimating aspects such as job duration. Another aspect that makes this problem more difficult is multi-tenancy, a distinguishing feature of Clouds by which several users, and hence their (potentially heterogeneous) jobs, are served at the same time via the illusion of several logical infrastructures that run on the same physical hardware. This also poses challenges when estimating resource load. Indeed, optimal and sub-optimal-approximate algorithms, such as those based on graph or queue theory, need accurate information beforehand to perform correctly, and therefore in general heuristic algorithms are preferred. In the context of our work, we are dealing with highly multi-tenant Clouds where jobs come from different Cloud users (people performing multi-domain PSE experiments) and where job duration cannot be predicted.

It is true, however, that several heuristic techniques for distributed scheduling have been developed [18]. One of the aspects that particularly makes SI techniques interesting for distributed scheduling is that they perform well in approximating optimization problems without requiring too much information on the problem beforehand. From the scheduling perspective, SI-based job schedulers can be conceptually viewed as hybrid scheduling algorithms, i.e., heuristic schedulers that partially behave as approximate ones. Precisely, this characteristic has raised a lot of interest in the scientific community, particularly for ACO [11].

    2.3. Bio-inspired techniques for Cloud scheduling

Broadly, bio-inspired techniques have been shown to be useful in optimization problems. The advantage of these techniques derives from their ability to explore solutions in large search spaces in a very efficient way, along with little initial information. Therefore, the use of this kind of heuristics is an interesting approach to cope in practice with the NP-completeness of job scheduling, and to handle application scenarios in which, for example, job execution time cannot be accurately predicted. Particularly, the existing literature shows that they are good candidates to optimize job execution time and load balancing in both Grids and Clouds [31,30]. In particular, the good performance of ant algorithms for scheduling problems was first shown in [29]. Interestingly, several authors [22,28,36,12,17,20,39,23,41] have complementarily shown in their experimental results that by using ACO-based techniques jobs can be allocated more efficiently and more effectively than by using other traditional heuristic scheduling algorithms such as GA (Genetic Algorithms) [36], Min-Min [20,36] and FCFS [22].

One of the most popular and versatile bio-inspired techniques is Ant Colony Optimization (ACO), which was introduced by Marco Dorigo in his doctoral thesis [10]. ACO was inspired by the observation of real ant colonies. An interesting behavior is how ants can find the shortest paths between food sources and their nest. Real ants initially wander randomly, and upon finding food they return to their colony while laying down pheromone trails. If other ants find the same lucky path, they are likely not to keep traveling at random, but to instead follow the trail, returning and reinforcing it if they eventually find more food. When one ant finds a good (i.e., short) path from the colony to a food source, other ants are more likely to follow that path, and positive feedback eventually leaves all the ants following a single path. Conceptually, ACO then mimics this behavior with simulated ants walking around the graph representing the problem to solve.

Over time, however, pheromone trails start to evaporate, thus reducing their attractive strength. The more time it takes for an ant to travel down the path and back again, the less frequently the pheromone trail is reinforced. A short path, by comparison, gets marched over faster, and thus the pheromone density remains high, as it is laid on the path as fast as it can evaporate. From an algorithmic standpoint, the pheromone evaporation process also has the advantage of avoiding convergence to a locally good solution. If there were no evaporation at all, the paths chosen by the first ants would tend to be excessively attractive to the following ones. In that case, the exploration of the solution space would be constrained.

Fig. 2 shows two possible paths from the nest to the food source, one of them longer than the other. Ants start moving randomly to explore the ground and choose one of the two ways, as can be seen in (a). The ants taking the shorter path will reach the food source before the others and leave behind them a trail of pheromones. After reaching the food, the ants will turn back and try to find the nest. The ants that go and return faster will strengthen the pheromone amount on the shorter path more quickly, as shown in (b). The ants that took the long way will have a higher probability of coming back using the shortest way, and after some time, they will converge toward using it. Consequently, the ants will find the shortest path by themselves, without having a global view of the ground. In time, most ants will choose the left path, as shown in (c).

Fig. 2. Adaptive behavior of ants.

When applied to optimization problems, ACO uses a colony of artificial ants that behave as cooperative agents in a solution space where they are allowed to search and reinforce paths (solutions) in order to find the feasible ones. A solution that satisfies the problem constraints is feasible. After initialization of the pheromone trails, ants construct incomplete feasible solutions, starting from random nodes, and then the pheromone trails are updated. A node is an abstraction for the location of an ant, i.e., a nest or a food source. At each execution step, ants compute a set of possible moves and select the best one (according to some probabilistic rules) to carry out the rest of the tour. The transition probability is based on the heuristic information and the pheromone trail level of the move. The higher the value of the pheromone and the heuristic information, the more profitable it is to select this move and resume the search. In the beginning, the initial pheromone level is set to a small positive constant value $\tau_0$, and then ants update this value after completing the construction stage. All ACO algorithms adapt the specific algorithm scheme shown in Algorithm 1.

    Algorithm 1. Pseudo-code of the canonical ACO algorithm

Procedure ACO
Begin
  Initialize the pheromone
  While (stopping criterion not satisfied) do
    Position each ant in a starting node
    Repeat
      For each ant do
        Choose next node by applying the state transition rate
      End for
    Until every ant has built a solution
    Update the pheromone
  End while
End

After initializing the pheromone trails and control parameters, a main loop is repeated until the stopping criterion is met. The stopping criterion can be, for example, a certain number of iterations or a given time limit without improving the result. In the main loop, ants construct feasible solutions and update the associated pheromone trails. More precisely, partial problem solutions are seen as nodes: each ant starts from a random node and moves from a node i to another node j of the partial solution. At each step, the ant k computes a set of feasible expansions of its current node and moves to one of these, according to a probability distribution. For an ant k, the probability $p^{k}_{ij}$ of moving from node i to node j depends on the combination of two values, namely the pheromone level $\tau_{ij}$ and the heuristic information $\eta_{ij}$ of the move:

$$
p^{k}_{ij} =
\begin{cases}
\dfrac{\tau_{ij}\,\eta_{ij}}{\sum_{q \in allowed_k} \tau_{iq}\,\eta_{iq}} & \text{if } j \in allowed_k \\
0 & \text{otherwise}
\end{cases}
$$

where $allowed_k$ is the set of nodes that ant k may still visit. If $j \notin allowed_k$, the probability of choosing that next state is $p^{k}_{ij} = 0$ and the search process would stop from the beginning. Furthermore, the pheromone level of the solution elements is changed by applying the update rule $\tau_{ij} \leftarrow \rho\,\tau_{ij} + \Delta\tau_{ij}$, where $0 < \rho < 1$ models pheromone evaporation and $\Delta\tau_{ij}$ represents additional added pheromone. Usually, the quantity of added pheromone depends on the desired quality of the solution.

In practice, to solve distributed job scheduling problems, the ACO algorithm assigns jobs to the available physical machines. Here, each job is carried out by an ant. Ants then cooperatively search for, for example, the less loaded machines with sufficient available computing power, and transfer the jobs to these machines.
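As an illustration of the state transition rule and the pheromone update just described, the following Java sketch shows one possible implementation. It is ours, not the authors' code: the class name AcoRules, the matrix layout and the roulette-wheel selection are assumptions made for the example.

    import java.util.List;
    import java.util.Random;

    // Minimal sketch of the canonical ACO rules discussed above.
    // tau[i][j] holds pheromone levels, eta[i][j] heuristic information.
    class AcoRules {
        private final double[][] tau;
        private final double[][] eta;
        private final double rho;      // evaporation factor, 0 < rho < 1
        private final Random random = new Random();

        AcoRules(double[][] tau, double[][] eta, double rho) {
            this.tau = tau;
            this.eta = eta;
            this.rho = rho;
        }

        // Picks the next node j for an ant at node i with probability
        // p_ij = (tau_ij * eta_ij) / sum over allowed q of (tau_iq * eta_iq).
        int selectNextNode(int i, List<Integer> allowed) {
            double total = 0.0;
            for (int q : allowed) {
                total += tau[i][q] * eta[i][q];
            }
            double r = random.nextDouble() * total;
            double cumulative = 0.0;
            for (int j : allowed) {
                cumulative += tau[i][j] * eta[i][j];
                if (r <= cumulative) {
                    return j;
                }
            }
            return allowed.get(allowed.size() - 1);  // numerical fallback
        }

        // Applies tau_ij <- rho * tau_ij + delta once a solution is built;
        // delta depends on the quality of that solution.
        void updatePheromone(int i, int j, double delta) {
            tau[i][j] = rho * tau[i][j] + delta;
        }
    }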

    3. Related work

Indeed, the last decade has witnessed an astonishing amount of research on improving bio-inspired techniques, especially ACO [34]. As shown in recent surveys [47,24], the enhanced techniques have been increasingly applied to solve the problem of distributed job scheduling. However, with regard to job scheduling in Cloud environments, very few works can be found to date [31].

Concretely, the works in [2,49] propose ACO-based Cloud schedulers for minimizing makespan and maximizing load balancing, respectively. An interesting aspect of [2] is that it was evaluated in a real Cloud using the Google App Engine [9] and Microsoft Live Mesh (http://explore.live.com/windows-live-mesh-devices-sync-upgrade-ui/), whereas the other effort [49] was evaluated through simulations. However, during the experiments, [2] used only 25 jobs and a Cloud comprising 5 machines. Moreover, like our proposal, these two efforts support true dynamic resource allocation, i.e., the scheduler does not need to initially know the running time of the jobs to allocate or the details of the available resources. On the downside, both works ignore flowtime, rendering difficult their applicability to execute PSEs in semi-interactive scientific Cloud environments.

Another relevant approach, based on Particle Swarm Optimization (PSO), is proposed in [33]. PSO is a bio-inspired technique that mimics the behavior of bird flocks, bee swarms and fish schools. Contrary to [2,49], the approach is based on static resource allocation, which forces users to feed the scheduler with the estimated running times of jobs on the set of Cloud resources to be used. As we will show in Section 4, even when some user-supplied information is required by our algorithm, it only needs as input a qualitative indication of which jobs may take longer than others when run on any physical machine. Besides, [33] is targeted at paid Clouds, i.e., those that bill users for CPU, storage and network usage. As a consequence, the work only minimizes monetary cost, and does not consider either makespan or flowtime minimization.

Finally, the work in [19] addresses the problem of job scheduling in Clouds by employing PSO, while reducing energy consumption. Indeed, energy consumption has become a crucial problem [21], on one hand because it has started to limit further performance growth due to expensive electricity bills, and on the other hand because of the environmental impact, in terms of carbon dioxide (CO2) emissions, caused by high energy consumption. This problem has in fact given birth to a new field called Green Computing [21]. As such, [19] does not pay attention to flowtime either, but interestingly it also achieves competitive makespan, as evidenced by experiments performed via CloudSim [6], which is also used in this paper.

The next section focuses on explaining our approach to bio-inspired Cloud scheduling in detail.

    4. Approach overview

The goal of our scheduler is to minimize the weighted flowtime of a set of PSE jobs, while also minimizing makespan when using a Cloud. One novelty of the proposed scheduler is the association of a qualitative priority, represented as an integer value, with each one of the jobs of a PSE. Priorities are assigned in terms of the relative estimated execution time required by each PSE job. Moreover, priorities are provided by the user, who, because of his experience, arguably has extensive knowledge about the problem to be solved from a modeling perspective, and therefore can estimate the individual time requirements of each job with respect to the rest of the jobs of his/her PSE. In other words, the user can identify which individual experiments within a PSE require more time to obtain a solution.

The estimated execution time of each experiment depends on the particular problem at hand and the chosen PSE variable values. Once the user has identified the experiments that may require more time to be executed, a simple tagging strategy is applied to assign a category (number) to each job. These categories represent the priority degree that a job will have with respect to the others in the same PSE, e.g., high priority, medium priority or low priority, but other categories could be defined. Although jobs can demand different times to execute and the user can identify those that take longer, the number of categories should be limited for usability reasons.

Conceptually, the scheduling problem to tackle can be formulated as follows. A PSE is formally defined as a set of N = {1, 2, ..., n} independent jobs, where each job corresponds to a particular value for a variable of the model being studied by the PSE. The jobs are executed on m Cloud machines. Each job j has an associated priority value, i.e., a degree of importance, which is represented by a weight wj. This priority value is taken into account by the scheduler to determine the order in which jobs will be executed at the VM level. The scheduler processes the jobs with higher priority (or heavier jobs) first and then the remaining jobs. The larger the estimated size of a job in terms of execution time, the higher the priority weight the user should associate with the job. This is the opposite criterion to the well-known Shortest Job First (SJF) scheduler from operating systems, whose performance is ideal. SJF deals with the convoy effect precisely by prioritizing shorter jobs over heavier ones. Finally, in our scheduler, jobs having similar estimated execution times should be assigned the same priority.

Our scheduler is designed to deliver the shortest total weighted flowtime (i.e., the sum of the weighted completion times of all jobs) along with minimum makespan (i.e., the completion time of the last job finished). The flowtime of a job (also known as its response time) is the total time that a job spends in the system; thus it is the sum of the times the job is waiting in queues plus its effective processing time. When jobs have different degrees of importance, indicated by the weight of the job, the total weighted flowtime is one of the simplest and most natural metrics that measures the throughput offered by a schedule S, and it is calculated as $\sum_{j=1}^{n} (C_j(S) - A_j(S)) \cdot w_j$, where $C_j(S)$ is the completion time of job j, $A_j(S)$ is the arrival time of job j, and $w_j$ is the weight associated with job j. Furthermore, since the completion time of job j in schedule S is denoted by $C_j(S)$, the makespan is $C_{max}(S) = \max_j C_j(S)$.
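To make the two metrics concrete, the following Java fragment (ours, purely illustrative) computes the weighted flowtime and the makespan of a finished schedule from each job's arrival time, completion time and weight.

    // Illustrative only: the two metrics defined above for a finished schedule.
    // Arrays are indexed by job; any consistent time unit works.
    class ScheduleMetrics {
        // Weighted flowtime: sum over j of (C_j - A_j) * w_j
        static double weightedFlowtime(double[] completion, double[] arrival, double[] weight) {
            double sum = 0.0;
            for (int j = 0; j < completion.length; j++) {
                sum += (completion[j] - arrival[j]) * weight[j];
            }
            return sum;
        }

        // Makespan: max over j of C_j
        static double makespan(double[] completion) {
            double max = Double.NEGATIVE_INFINITY;
            for (double c : completion) {
                max = Math.max(max, c);
            }
            return max;
        }
    }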

Fig. 3 conceptually illustrates the sequence of actions from the time VMs are created for executing a PSE within a Cloud until the jobs are processed and executed. The User entity represents both the creator of a virtual Cloud (i.e., VMs on top of physical hosts) and the disciplinary user that submits his/her experiments for execution. The Datacenter entity manages a number of Host entities, i.e., a number of physical resources. The VMs are allocated to hosts through an AllocationPolicy, which implements the SI-based part of the scheduler proposed in this paper. Finally, after setting up the virtual infrastructure, the user can send his/her experiments to be executed. The jobs will be handled through a JobPriorityPolicy that will take into account the priorities of jobs at the moment they are sent to one of the available VMs already issued by the user and allocated to a host. As such, our scheduler operates at two levels: the Cloud-wide or Datacenter level, where SI techniques (currently ACO) are used to allocate user VMs to resources, and the VM level, where the jobs assigned to a VM are handled according to their priority.

Fig. 3. Sequence diagram of scheduling actions within a Private Cloud.

    4.1. Algorithm implementation

To implement the Cloud-level logic of the scheduler, AntZ, the algorithm proposed in [23] to solve the problem of load balancing in Grid environments, has been adapted to be employed in Clouds (see Algorithm 2). AntZ combines the idea of how ants cluster objects with their ability to leave pheromone trails on their paths so that they can serve as a guide for other ants passing by.

In our adapted algorithm, each ant works independently and represents a VM looking for the best host to which it can be allocated. The main procedure performed by an ant is shown in Algorithm 2. When a VM is created, an ant is initialized and starts to work. A master table containing information on the load of each host is initialized (initializeLoadTable ()). Subsequently, if an ant associated with the VM that is executing the algorithm already exists, the ant is obtained from a pool of ants through the getAntPool (vm) method. If the VM does not exist in the ant pool, then a new ant is created. To do this, first a list of all suitable hosts in which the VM can be allocated is obtained. A host is suitable if it has an amount of processing power, memory and bandwidth greater than or equal to that required by the wandering VM.

Then, the working ant with the associated VM is added to the ant pool (antPool.add (vm,ant)) and the ACO-specific logic starts to operate (see Algorithm 3). In each iteration of the subalgorithm, the ant collects the load information of the host it is visiting through the getHostLoadInformation () operation and adds this information to its private load history. The ant then updates a load information table of visited hosts (localLoadTable.update ()), which is maintained in each host. This table contains information on the ant's own load, as well as load information of other hosts, which were added to the table when other ants visited the host. Here, load refers to the total CPU utilization within a host.

When an ant moves from one host to another, it has two choices. One choice is to move to a random host using a constant probability or mutation rate. The other choice is to use the load table information of the current host (chooseNextStep ()). The mutation rate decreases with a decay rate factor as time passes; thus, the ant becomes more dependent on load information than on random choice. This process is repeated until the finishing criterion is met. The completion criterion is equal to a predefined number of steps (maxSteps). Finally, the ant delivers its VM to the current host and finishes its task. When the ant has not completed its work, i.e., the ant cannot allocate its associated VM to a host, then Algorithm 2 is repeated with the same ant until the ant finally achieves the finished state. Prior to repetition, an exponential back-off strategy is performed to wait for a while.

Every time an ant visits a host, it updates the host's load information table with the information of other hosts, but at the same time the ant collects the information already provided by the table of that host, if any. The load information table acts as a pheromone trail that an ant leaves while it is moving, in order to guide other ants to choose better paths rather than wandering randomly in the Cloud. The entries of each local table are the hosts that ants have visited on their way to deliver their VMs, together with their load information.

Algorithm 2. ACO-based allocation algorithm for individual VMs

Procedure ACOallocationPolicy (vm, hostList)
Begin
  initializeLoadTable ()
  ant = getAntPool (vm)
  if (ant == null) then
    suitableHosts = getSuitableHostsForVm (hostList, vm)
    ant = new Ant (vm, suitableHosts)
    antPool.add (vm, ant)
  end if
  repeat
    ant.AntAlgorithm ()
  until ant.isFinish ()
  allocatedHost = hostList.get (ant.getHost ())
  allocatedHost.allocateVM (ant.getVM ())
End

Algorithm 3. ACO-specific logic

Procedure AntAlgorithm ()
Begin
  step = 1
  initialize ()
  While (step < maxSteps) do
    currentLoad = getHostLoadInformation ()
    AntHistory.add (currentLoad)
    localLoadTable.update ()
    if (random () < mutationRate) then
      nextHost = randomlyChooseNextStep ()
    else
      nextHost = chooseNextStep ()
    end if
    mutationRate = mutationRate - decayRate
    step = step + 1
    moveTo (nextHost)
  end while
  deliverVMtoHost ()
End
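To complement the pseudocode, the sketch below restates the ant's behavior of Algorithms 2-4 in plain Java. It is a simplified reading, not the actual implementation: Host and LoadTable are hypothetical stand-ins for the entities of the real scheduler, and ant pooling and the exponential back-off are omitted.

    import java.util.List;
    import java.util.Random;

    // Simplified Java reading of Algorithms 2-4: one ant carries one VM and
    // wanders over hosts, guided by per-host load tables, for maxSteps steps.
    class VmAllocationAnt {
        private final Random random = new Random();
        private double mutationRate;
        private final double decayRate;
        private final int maxSteps;

        VmAllocationAnt(double mutationRate, double decayRate, int maxSteps) {
            this.mutationRate = mutationRate;
            this.decayRate = decayRate;
            this.maxSteps = maxSteps;
        }

        // Returns the host the VM is finally delivered to.
        Host run(Host start, List<Host> suitableHosts) {
            Host current = start;
            for (int step = 1; step < maxSteps; step++) {
                double currentLoad = current.cpuUtilization();
                // Leave a "pheromone": merge what this ant knows into the host's table.
                current.loadTable().update(current, currentLoad);
                Host next;
                if (random.nextDouble() < mutationRate) {
                    // Random move keeps the colony exploring.
                    next = suitableHosts.get(random.nextInt(suitableHosts.size()));
                } else {
                    // Exploit the load table: pick the lightest known host,
                    // breaking ties with equal probability (Algorithm 4).
                    next = chooseNextStep(current, currentLoad);
                }
                mutationRate -= decayRate;  // rely less on randomness over time
                current = next;
            }
            return current;                 // deliverVMtoHost()
        }

        private Host chooseNextStep(Host current, double currentLoad) {
            Host best = current;
            double bestLoad = currentLoad;
            for (LoadTable.Entry entry : current.loadTable().entries()) {
                if (entry.load() < bestLoad) {
                    best = entry.host();
                    bestLoad = entry.load();
                } else if (entry.load() == bestLoad && random.nextBoolean()) {
                    best = entry.host();
                }
            }
            return best;
        }

        // Hypothetical minimal interfaces, just enough to make the sketch compile.
        interface Host {
            double cpuUtilization();
            LoadTable loadTable();
        }

        interface LoadTable {
            Iterable<Entry> entries();
            void update(Host visited, double load);
            interface Entry { Host host(); double load(); }
        }
    }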

Algorithm 4. ACO-specific logic: The ChooseNextStep procedure

Procedure ChooseNextStep ()
Begin
  bestHost = currentHost
  bestLoad = currentLoad
  for each entry in hostList
    if (entry.load < bestLoad) then
      bestHost = entry.host
    else if (entry.load = bestLoad) then
      if (random.next < probability) then
        bestHost = entry.host
      end if
    end if
  end for
End

When an ant reads the information in the load table of each host and chooses a direction via Algorithm 4, the ant chooses the most lightly loaded host in the table, i.e., each entry of the load information table is evaluated and compared with the current load of the visited host. If the load of the visited host is smaller than that of any other host provided in the load information table, the ant chooses the host with the smallest load, and in case of a tie the ant chooses one of them with equal probability.

To calculate the load, the original AntZ algorithm receives the number of jobs that are executing in the resource for which the load is being calculated. In the proposed algorithm, the load is calculated on each host taking into account the CPU utilization made by all the VMs that are executing on that host. This metric is useful for an ant to choose the least loaded host to allocate its VM.
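The difference between the two load metrics can be stated in a few lines of Java; this fragment is illustrative only, and VmInfo is a hypothetical record rather than a type of the actual implementation.

    import java.util.List;

    // Contrast between the original AntZ load metric and the adapted one.
    class HostLoad {
        record VmInfo(int runningJobs, double cpuUtilization) {}

        // Original AntZ: load = number of jobs executing in the resource.
        static int antZLoad(List<VmInfo> vms) {
            return vms.stream().mapToInt(VmInfo::runningJobs).sum();
        }

        // Adapted algorithm: load = total CPU utilization of the VMs on the host.
        static double cpuUtilizationLoad(List<VmInfo> vms) {
            return vms.stream().mapToDouble(VmInfo::cpuUtilization).sum();
        }
    }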

Once the VMs have been created and allocated to physical resources, the scheduler proceeds to assign the jobs to the user VMs. To do this, a strategy is used where the jobs with higher priority values are assigned first to the VMs (see Algorithm 5). This represents the second scheduling level of the scheduler proposed as a whole.

    Algorithm 5. The SubmitJobsToVMsByPriority procedure

Procedure SubmitJobsToVMsByPriority (jobList)
Begin
  vmIndex = 0
  while (jobList.size () > 0)
    job = jobList.getJobByPriority ()
    vm = getVMsList (vmIndex)
    vm.scheduleJobToVM (job)
    vmIndex = Mod (vmIndex + 1, getVMsList ().size ())
    jobList.remove (job)
  end while
End

To carry out the assignment of jobs to VMs, this subalgorithm uses two lists: one containing the jobs that have been sent by the user, and another containing all the user VMs that are already allocated to a physical resource and hence are ready to execute jobs. The procedure starts iterating the list of all jobs (jobList) and, through the getJobByPriority () method, retrieves jobs according to their priority value, that is, jobs with the highest priority first, then jobs with medium priority, and finally jobs with low priority. Each time a job is obtained from the jobList, it is submitted to be executed in a VM in a round-robin fashion. The VM where the job is executed is obtained through the method getVMsList (vmIndex). Internally, the algorithm maintains a queue for each VM that contains the list of jobs that have been assigned to it for execution. The procedure is repeated until all jobs have been submitted for execution, i.e., until the jobList is empty.
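The behavior just described can be summarized by the following Java sketch, which drains a priority-ordered job list into the already-allocated VMs in round-robin fashion. Job and VirtualMachine are illustrative stand-ins, not types of the simulation toolkit.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Sketch of the VM-level policy (Algorithm 5): heaviest jobs are
    // dispatched first, spread over the user's VMs in round-robin order.
    class PriorityRoundRobinSubmitter {

        interface Job { int weight(); }                      // e.g. 1 = low, 2 = medium, 3 = high
        interface VirtualMachine { void enqueue(Job job); }  // per-VM waiting queue

        static void submitJobsToVMsByPriority(List<Job> jobs, List<VirtualMachine> vms) {
            // getJobByPriority(): always pick the heaviest remaining job.
            List<Job> ordered = new ArrayList<>(jobs);
            ordered.sort(Comparator.comparingInt(Job::weight).reversed());

            int vmIndex = 0;
            for (Job job : ordered) {
                vms.get(vmIndex).enqueue(job);         // scheduleJobToVM(job)
                vmIndex = (vmIndex + 1) % vms.size();  // round robin over VMs
            }
        }
    }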

    5. Evaluation

In order to assess the effectiveness of our proposal for executing PSEs on Clouds, we have processed a real case study by solving a very well-known benchmark problem proposed in the literature, see [1] for instance. Broadly, the experimental methodology involved two steps. First, we executed the problem on a single machine by varying an individual problem parameter, using a finite element software package, which allowed us to gather real job data, i.e., processing times and input/output file data sizes (see Section 5.1). By means of the generated job data, we instantiated the CloudSim simulation toolkit, which is explained in Section 5.2. Lastly, the obtained results regarding the performance of our proposal compared with some Cloud scheduling alternatives are reported in Section 5.3.

    5.1. Real job data gathering

The problem explained in [1] involves studying a plane strain plate with a central circular hole. The dimensions of the plate were 18 × 10 m, with R = 5 m. The material constants considered were E = 2.1 × 10^5 MPa, ν = 0.3, σy = 240 MPa and H = 0. A linear Perzyna viscoplastic model with m = 1 and n = 1 was considered. Unlike previous studies of our own [7], in which a geometry parameter (particularly, imperfection) was chosen to generate the PSE jobs, in this case a material parameter was selected as the variation parameter. Then, 25 different viscosity values for the η parameter were considered, namely 1 × 10^4, 2 × 10^4, 3 × 10^4, 4 × 10^4, 5 × 10^4, 7 × 10^4, 1 × 10^5, 2 × 10^5, 3 × 10^5, 4 × 10^5, 5 × 10^5, 7 × 10^5, 1 × 10^6, 2 × 10^6, 3 × 10^6, 4 × 10^6, 5 × 10^6, 7 × 10^6, 1 × 10^7, 2 × 10^7, 3 × 10^7, 4 × 10^7, 5 × 10^7, 7 × 10^7 and 1 × 10^8 MPa. Useful and introductory details on viscoplastic theory and numerical implementation can be found in [35,14].

The two different finite element meshes displayed in Fig. 4 were tested. In both cases, Q1/P0 elements were chosen. Imposed displacements (at y = 18 m) were applied until a final displacement of 20.00 mm was reached, in 400 equal time steps of 0.05 mm each. It is worth noting that d = 1 has been set for all the time steps.

After establishing the problem parameters, we employed a single real machine to run the parameter sweep experiment by varying the viscosity parameter η as indicated above and measuring the execution time for the 25 different experiments, which resulted in 25 input files with different input configurations and 25 output files for either mesh. The tests were solved using the SOGDE finite element solver software [13]. Furthermore, the characteristics of the real machine on which the executions were carried out are shown in Table 1. The machine was an AMD Athlon(tm) 64 X2 Dual Core Processor 3600+, equipped with the Linux operating system (specifically the Ubuntu 11.04 distribution) running the generic kernel version 2.6.38-8. It is worth noting that only one core was used during the experiments, since individual jobs did not exploit multicore capabilities.

The information regarding machine processing power was obtained from the native benchmarking support of Linux and as such is expressed in bogoMIPS [43]. BogoMIPS (from "bogus" and MIPS) is a metric that indicates how fast a machine processor runs. Since the real tests were performed on a machine running the Linux operating system, we considered using the bogoMIPS measure, which is, as mentioned, the one used by this operating system to approximate CPU power.

Then, the simulations were carried out by taking into account the bogoMIPS metric for measuring the simulated jobs' CPU instructions. Once the execution times had been obtained from the real machine, we approximated for each experiment the number of executed instructions by the following formula:

$$NI_j = bogomips_{CPU} \times T_j$$

where $NI_j$ is the number of million instructions to be executed by, or associated to, a job j; $bogomips_{CPU}$ is the processing power of the CPU of our real machine measured in bogoMIPS; and $T_j$ is the time it took to run job j on the real machine.

Next is an example of how to calculate the number of instructions of a job corresponding to the mesh of 1152 elements that took 117 s to be executed. The machine where the experiment was executed had a processing power of 4008.64 bogoMIPS. Then, the approximate number of instructions for the job was 469,011 MI (Million Instructions). Details about all job execution times and lengths can be seen in Table 2.
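The conversion from measured runtimes to simulated job lengths is a one-liner; the following snippet (ours) just reproduces the worked example above.

    // Approximates a job's length in million instructions, NI_j = bogomips_CPU * T_j.
    class JobLengthEstimator {
        static long millionInstructions(double bogomipsCpu, double runtimeSeconds) {
            return Math.round(bogomipsCpu * runtimeSeconds);
        }

        public static void main(String[] args) {
            // Worked example from the text: 4008.64 bogoMIPS * 117 s ~ 469,011 MI.
            System.out.println(millionInstructions(4008.64, 117));
        }
    }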

Fig. 4. Considered input data meshes: (a) 288 elements, (b) 1,152 elements.

Table 1. Machine used to execute the real PSE: characteristics.

  Feature          Value
  CPU power        4008.64 bogoMIPS
  Number of CPUs   2
  RAM memory       2 GBytes
  Storage size     400 GBytes
  Bandwidth        100 Mbps

    5.2. CloudSim instantiation

First, the CloudSim simulator [6] was configured with a datacenter composed of a single machine (a host, in CloudSim terminology) with the same characteristics as the real machine where the experiments were performed. The characteristics of the configured host are shown in Table 3 (left). Here, processing power is expressed in MIPS (Million Instructions Per Second), RAM memory and Storage capacity are in MBytes, bandwidth is expressed in Mbps, and finally, PEs is the number of processing elements (cores) of the host. Each PE has the same processing power.

Once configured, we checked that the execution times obtained by the simulation coincided with, or were close to, the real times for each independent job performed on the real machine. The results were successful in the sense that one experiment (i.e., a variation in the value of η) took 117 s to be solved on the real machine, while on the simulated machine the elapsed time was 117.02 s. Once the execution times had been validated for a single machine on CloudSim, a new simulation scenario was set up. This new scenario consisted of a datacenter with 10 hosts, where each host has the same hardware capabilities as the real single machine, and 40 VMs, each with the characteristics specified in Table 3 (right). This is a moderately-sized, homogeneous datacenter that can be found in many real scenarios.

For each mesh, we evaluated the performance of the associated jobs in the simulated Cloud as we increased the number of jobs to be performed, i.e., 25 × i jobs with i = 10, 20, ..., 100. That is, the base job set comprising the 25 jobs obtained by varying the value of η was cloned to obtain larger sets.

Each job, called a cloudlet in CloudSim terminology, had the characteristics shown in Table 4 (left), where the Length parameter is the number of instructions to be executed by a cloudlet, which varied between 52,112 and 104,225 MI for the mesh of 288 elements and between 244,527 and 469,011 MI for the mesh of 1152 elements (see Table 2). Moreover, PEs is the number of processing elements (cores) required to perform each individual job. Input size and Output size are the input file size and output file size in bytes, respectively. As shown in Table 4 (left), the experiments corresponding to the mesh of 288 elements had input files of 40,038 bytes, and the experiments corresponding to the mesh of 1152 elements had input files of 93,082 bytes. A similar distinction applies to the output file sizes. Finally, Table 4 (right) shows the priorities assigned to cloudlets. For the sake of realism, the base job set comprised 7, 11 and 7 jobs with low, medium and high priority, respectively. That is, usually most PSE jobs take similar execution times, except for fewer cases where smaller and higher execution times are registered.

Table 2. Real jobs execution times and lengths: meshes of 288 and 1152 elements.

  Parameter η    Mesh of 288 elements                    Mesh of 1152 elements
                 Execution time (s-min)   Length (MI)    Execution time (s-min)   Length (MI)
  1 × 10^4       24-0.40                  96,207         90-1.50                  360,778
  2 × 10^4       25-0.41                  100,216        88-1.47                  352,760
  3 × 10^4       24-0.40                  96,207         87-1.45                  348,752
  4 × 10^4       24-0.40                  96,207         87-1.45                  348,752
  5 × 10^4       25-0.41                  100,216        89-1.48                  356,769
  7 × 10^4       25-0.41                  100,216        88-1.47                  352,760
  1 × 10^5       25-0.41                  100,216        88-1.47                  352,760
  2 × 10^5       25-0.41                  100,216        85-1.42                  340,734
  3 × 10^5       26-0.43                  104,225        102-1.70                 408,881
  4 × 10^5       22-0.36                  88,190         117-1.95                 469,011
  5 × 10^5       20-0.33                  80,173         70-1.17                  280,605
  7 × 10^5       20-0.33                  80,173         73-1.22                  292,631
  1 × 10^6       20-0.33                  80,173         74-1.23                  296,639
  2 × 10^6       20-0.33                  80,173         73-1.22                  292,631
  3 × 10^6       20-0.33                  80,173         72-1.20                  288,622
  4 × 10^6       20-0.33                  80,173         70-1.17                  280,605
  5 × 10^6       20-0.33                  80,173         62-1.03                  248,536
  7 × 10^6       20-0.33                  80,173         61-1.02                  244,527
  1 × 10^7       20-0.33                  80,173         62-1.03                  248,536
  2 × 10^7       17-0.28                  68,147         65-1.08                  260,562
  3 × 10^7       13-0.21                  52,112         65-1.08                  260,562
  4 × 10^7       13-0.21                  52,112         68-1.13                  272,588
  5 × 10^7       13-0.21                  52,112         68-1.13                  272,588
  7 × 10^7       13-0.21                  52,112         70-1.17                  280,605
  1 × 10^8       13-0.21                  52,112         71-1.18                  280,613

Table 3. Simulated Cloud machines characteristics: host parameters (left) and VM parameters (right).

  Host parameters
    Processing power                 4008 bogoMIPS
    RAM                              4 GBytes
    Storage                          409,600 MBytes
    Bandwidth                        100 Mbps
    PEs                              2
  VM parameters
    MIPS                             4008 bogoMIPS
    RAM                              1 GByte
    Machine image size               102,400 MBytes
    Bandwidth                        25 Mbps
    PEs                              1
    VMM (Virtual Machine Monitor)    Xen

Table 4. Cloudlet configuration used in the experiments: CloudSim built-in parameters (left) and job priorities (right).

                           Mesh of 288 elements    Mesh of 1152 elements
  Cloudlet parameters
    Length (MI)            52,112-104,225          244,527-469,011
    PEs                    1                       1
    Input size (bytes)     40,038                  93,082
    Output size (bytes)    722,432                 2,202,010
  Cloudlet priority
    Low (wj = 1)           52,112-68,147           244,527-272,588
    Medium (wj = 2)        80,173-96,207           280,605-348,752
    High (wj = 3)          100,216-104,225         352,760-469,011

In CloudSim, the amount of hardware resources available to each VM is constrained by the total processing power and the available system bandwidth within the associated host. Thus, scheduling policies must be applied in order to appropriately assign cloudlets (and hence VMs) to hosts and achieve an efficient use of resources. On the other hand, cloudlets can be configured to be scheduled according to some scheduling policy that can both determine how to indirectly assign them to hosts and in which order to process the list of cloudlets within a host. This allows us to experiment with custom Cloud schedulers, such as the one proposed in this paper, which was in fact compared against CloudSim built-in schedulers. The next section explains the obtained results in detail.
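As a bridge between Table 4 (right) and the scheduler input, the helper below assigns the priority weight wj to a job from its length, using the ranges of the 288-element mesh. The thresholds come straight from Table 4; the helper itself is ours and purely illustrative.

    // Maps a cloudlet length (MI) to the priority weight w_j, using the
    // 288-element mesh ranges of Table 4: low (w=1) 52,112-68,147,
    // medium (w=2) 80,173-96,207, high (w=3) 100,216-104,225.
    class PriorityTagger {
        static int weightFor288ElementJob(long lengthMi) {
            if (lengthMi >= 100_216) return 3;  // high priority
            if (lengthMi >= 80_173)  return 2;  // medium priority
            return 1;                           // low priority
        }
    }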

    5.3. Experiments

In this subsection we report the results obtained when executing the PSE in the simulated Cloud using our two-level scheduler and two classical Cloud scheduling policies for assigning VMs to hosts and handling jobs. Due to their high CPU requirements, and the fact that each VM requires only one PE, we assumed a 1-1 job-VM execution model, i.e., although each VM can have many jobs in its waiting queue, they are executed one at a time. Concretely, apart from our proposal, we employed a random policy, and a best effort (CloudSim built-in) policy, which upon executing a job assigns the corresponding VMs to the Cloud machine that has the highest number of free PEs. In our scheduler, we set the ACO-specific parameters, i.e., mutation rate, decay rate and maximum steps, as suggested in [23]. For simplicity, from now on, we will refer to the term weighted flowtime simply as flowtime. In addition, although our scheduler is independent of the SI technique exploited at the Cloud level, it will be referred to as ACO.

In all cases, the competing policies were also complemented with the VM-level policy for handling the jobs within VMs, or the VMs allocated to a single host. As shown in Table 5, regardless of the VM allocation policy used or the experiment size (i.e., mesh), considering job priority information yielded important gains with respect to accumulated flowtime, i.e., the weighted flowtime resulting from simulating the execution of 25 × i jobs with i = 10, 20, ..., 100 of various priorities. For example, for the mesh of 288 elements, gains in the range of 13.63-15.07% compared to not considering priorities were obtained, whereas for the mesh of 1152 elements the gains were in the range of 15.14-15.62%. Interestingly, except for a few cases where some overhead was obtained (the cells with negative percentage values), the accumulated makespan was not significantly affected. This means that taking into account job priority information at the VM level for scheduling jobs improves weighted flowtime without compromising makespan.

Table 5. Using the VM-level priority policy: effects on flowtime and makespan.

  Scheduler                Mesh of 288 elements               Mesh of 1152 elements
                           Flowtime (min)   Makespan (min)    Flowtime (min)   Makespan (min)
  Random                   153707.56        366.20            569335.78        1373.44
  Random (priority)        130539.85        361.33            482562.31        1373.19
  Gain (0-100%)            15.07            1.32              15.24            0.02
  Best effort              137378.79        283.89            513783.00        1084.60
  Best effort (priority)   118646.36        283.20            435985.39        1086.24
  Gain (0-100%)            13.63            0.24              15.14            -0.15
  ACO                      127270.74        233.50            476230.77        894.69
  ACO (priority)           109477.35        234.13            401806.35        896.54
  Gain (0-100%)            13.98            -0.27             15.62            -0.20
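The gain rows are the relative reduction achieved by the priority-aware variant of each policy (a check we added; the formula is not stated explicitly in the text). For instance, the Random/288-elements flowtime entry follows from

$$\text{gain} = \frac{153707.56 - 130539.85}{153707.56} \times 100 \approx 15.07\%.$$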

Fig. 5. Results as the number of jobs increases: Mesh of 288 elements. (a) Flowtime (minutes, less is better). (b) Makespan (minutes, less is better). Series: Random, Best effort, ACO.

Fig. 6. Results as the number of jobs increases: Mesh of 1152 elements. (a) Flowtime (minutes, less is better). (b) Makespan (minutes, less is better). Series: Random, Best effort, ACO.

Therefore, the rest of the experiments were performed by using the variants exploiting job priority information.

Furthermore, Figs. 5 and 6 illustrate the flowtime and makespan obtained by the three schedulers using the VM-level priority policy for both meshes. Graphically, it can be seen that, irrespective of the mesh, flowtime and makespan presented exponential and linear tendencies, respectively. All in all, our algorithm performed rather well compared to its competitors regarding the two performance metrics considered. Table 6 shows the reductions or gains obtained by ACO with respect to best effort. An observation is that, for the mesh of 288 elements, the highest flowtime gains of ACO compared to the best effort policy were achieved for 250-750 jobs, with gains of up to 16%. For 1000 jobs and beyond, the flowtime gains converged at around 7-8%. This is sound since adding more jobs to the environment under test ends up saturating its execution capacity, and thus scheduling decisions have less impact on the outcome. For the mesh of 1152 elements, on the other hand, a similar behavior was observed when executing 1000 or more jobs. For 750 or fewer jobs, flowtime gains were 10-17%. Despite the fact that heavier jobs (from a computational standpoint) make computations stay longer within the Cloud and may impact flowtime, ACO maintained the gain levels.

Table 6. Results as the number of jobs increases: percentage gains of ACO with respect to best effort.

  #Jobs   Mesh of 288 elements       Mesh of 1152 elements
          Flowtime    Makespan       Flowtime    Makespan
  250     16.43       22.45          17.00       19.87
  500     12.03       20.55          11.94       20.76
  750     9.98        16.92          9.96        18.15
  1000    8.37        16.60          8.45        16.64
  1250    8.42        17.76          8.63        17.33
  1500    8.11        17.81          8.19        18.06
  1750    7.84        16.63          7.86        17.28
  2000    7.26        16.48          7.37        16.65
  2250    7.41        17.17          7.58        17.03
  2500    7.30        17.27          7.42        17.50

Fig. 7. Results as the number of hosts increases: Mesh of 288 elements. (a) Flowtime (minutes, less is better). (b) Makespan (minutes, less is better). Series: Random, Best effort, ACO.

Fig. 8. Results as the number of hosts increases: Mesh of 1152 elements. (a) Flowtime (minutes, less is better). (b) Makespan (minutes, less is better). Series: Random, Best effort, ACO.

Although our aim is to reduce flowtime while being competitive in terms of makespan, from Table 6 it can be seen that important makespan gains were obtained as well. However, compared to flowtime, the gains were more uniform. Concretely, for the mesh of 288 elements, ACO outperformed best effort by 22.45% and 20.55% when executing 250 and 500 jobs, respectively. When running more jobs, the gains were in the range of 16-17%. For the mesh of 1152 elements, the gains were approximately in the range of 17-19%, irrespective of the number of jobs used.

In a second round of experiments, we measured the effects of varying the number of hosts while keeping the number of jobs to execute fixed. The aim of this experiment is to assess the effects of increasing the number of hosts, which is commonly known as horizontal scalability or scaling out, on the performance of the algorithms. Particularly, we evaluated the three policy-aware scheduling alternatives by using 50 × i hosts with i = 1, 5, 9, 13, 17, 21. These values were chosen to be consistent with the ones employed in the horizontal scalability analysis performed on the original AntZ algorithm [23]. The host and VM characteristics were again the ones shown in Table 3. Besides, the number of VMs in each case increased accordingly. On the other hand, the number of jobs was set at the maximum used in the previous experiment (2500), which means 100 PSEs of 25 jobs each. Again, within each PSE, there were 7, 11 and 7 jobs tagged with low, medium and high priority.

Fig. 7 shows the obtained results for the mesh of 288 elements. As illustrated, ACO performed very well. In terms of flowtime, it can be seen that best effort (gray bar) could not exploit resources from 650 hosts onwards, as the delivered flowtime stays around the same value. Moreover, a similar situation occurs with makespan. As one might expect, arbitrarily growing the size of the used Cloud benefited the random policy, which obtained gains in terms of the tested performance metrics up to 850 hosts. However, the best curves were obtained with ACO, which, although marginally, also obtained performance gains when using 1050 hosts. It is worth noting that the goal of this experiment is not to study the number of hosts at which the flowtime and makespan curves converge, which is in fact not useful as it would depend on the parameters established for the jobs and the simulated hardware used. In contrast, the experiment shows that ACO cleanly supports scaling out compared to the analyzed alternatives.

    Complementarily, Fig. 8 shows the resulting flowtimes and makespans for the mesh of 1152 elements. Although the curves are very similar to those of the mesh of 288 elements, some differences in the average makespan were obtained. Concretely, the average gain of ACO with respect to best effort, taken as

    $\frac{1}{6}\sum_{j \in \{50,250,450,650,850,1050\}} \left[ \frac{makespan_{j}^{bestEffort} - makespan_{j}^{ACO}}{makespan_{j}^{bestEffort}} \right]$,

    yielded 30.43% and 26.79% for the meshes of 1152 and 288 elements, respectively. The same applies to ACO versus random (41.33% and 37.90%). This is because our support considers host CPU utilization, and not just hardware capabilities, to make scheduling decisions, and hence it is able to better exploit the computational power of Cloud PEs. It is worth noting that CPU utilization is different from regular CPU load [27]. Within a single host, the former metric provides trend information of CPU usage, whereas the latter only reflects the length of the queue of jobs (or VMs) waiting to take possession of the PEs. Larger jobs make more extensive use of resources, and thus CPU utilization is close to 100% most of the time. ACO is therefore able to quickly schedule VMs to hosts whose CPU utilization is below such levels, increasing efficiency.
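    As a worked example of the averaging scheme above, the sketch below computes the mean relative makespan gain over the six sampled host counts. The makespan arrays are hypothetical placeholders, since the per-host-count values behind the reported 30.43% and 26.79% averages are not given in the text; the class AverageMakespanGain is illustrative only.

    // Minimal sketch of the average relative makespan gain of one policy over another.
    // The makespan values below are hypothetical; only the averaging scheme mirrors
    // the formula given in the text.
    public class AverageMakespanGain {

        // Mean of (base_j - improved_j) / base_j over all sampled host counts j.
        static double averageRelativeGain(double[] base, double[] improved) {
            double sum = 0.0;
            for (int j = 0; j < base.length; j++) {
                sum += (base[j] - improved[j]) / base[j];
            }
            return sum / base.length;
        }

        public static void main(String[] args) {
            // Hypothetical makespans (minutes) for 50, 250, 450, 650, 850 and 1050 hosts.
            double[] bestEffort = {52.0, 30.0, 24.0, 21.0, 20.0, 20.0};
            double[] aco        = {38.0, 21.0, 17.0, 15.0, 14.0, 14.0};

            System.out.printf("ACO vs. best effort: %.2f%% average makespan gain%n",
                    100.0 * averageRelativeGain(bestEffort, aco));
        }
    }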

    Although our aim is to reduce flowtime while being competitive in terms of makespan, the experiments show that ACO delivered gains with respect to both metrics.

    6. Conclusions

    Parameter Sweep Experiments (PSEs) are a type of numerical simulation that involves running a large number of independent jobs and typically requires a lot of computing power. These jobs must be efficiently processed in the different computing resources of a distributed environment such as the ones provided by Clouds. Consequently, job scheduling in this context plays a fundamental role.

    In the last ten years or so, bio-inspired computing has received increasing attention in the research community. Bio-inspired computing (also known as Swarm Intelligence) refers to the collective behavior that emerges from swarms of social insects. Social insect colonies solve complex problems collectively through intelligent methods. These problems are beyond the capabilities of each individual insect, and the cooperation among them is largely self-organized, without any supervision. By studying the behavior of social insect colonies, such as ant colonies, researchers have proposed algorithms and theories for combinatorial optimization problems. Job scheduling in Clouds is also a combinatorial optimization problem, and some bio-inspired schedulers have been proposed. Basically, researchers have introduced changes to traditional bio-inspired techniques to achieve different Cloud scheduling goals, i.e., minimizing makespan, maximizing load balancing, minimizing monetary cost or minimizing energy consumption.

    However, existing efforts fail at effectively handling PSEs in scientific Cloud environments since, to the best of our knowledge, no effort aimed at minimizing flowtime exists. Achieving low flowtime is important since it enables a more agile human processing of PSE job results. Therefore, we have presented a new Cloud scheduler, based on SI and particularly on Ant Colony Optimization, that pays special attention to this aspect. Simulated experiments performed with the help of the well-established CloudSim toolkit and real PSE job data show that our scheduler can handle a large number of jobs effectively, achieving interesting gains in terms of flowtime and makespan compared to classical Cloud scheduling policies.

    We are extending this work in several directions. We will explore the ideas presented in this paper in the context of other bio-inspired techniques, particularly Particle Swarm Optimization, which is also extensively used to solve combinatorial optimization problems. As a starting point, we will implement the first scheduling level based on an adaptation of the ParticleZ [23] Grid scheduling algorithm so as to target Clouds. Eventually, we will materialize the resulting job schedulers on top of a real (i.e., non-simulated) Cloud platform, such as Emotive Cloud [15], which is designed for extensibility.

    Another issue concerns energy consumption. Clearly, simpler scheduling policies (e.g., random or best effort) require considerably less CPU usage, memory accesses and network transfers than more complex policies such as our algorithm. For example, we need to maintain local and remote host load information for the ants, which consumes these resources. Therefore, when running many jobs, the accumulated resource usage overhead may arguably become significant, resulting in higher energy demands. We will therefore study the flowtime/makespan versus energy consumption trade-off in order to address this problem.

    Finally, there is a large and increasing number of available mobile devices such as smartphones. Nowadays, mobile devices have a remarkable amount of computational resources, which allows them to execute complex applications, such as 3D games, and to store large amounts of data. Recent work has experimentally shown the feasibility of using smartphones for running CPU-bound scientific codes [37]. Due to these advances, emergent research lines have aimed at integrating smartphones and other kinds of mobile devices into traditional distributed computational environments, like clusters and Grids [38], to play the role of job-executing machines. It is not surprising that this research could span scientific Clouds as well. However, job scheduling in these highly heterogeneous environments will intuitively be more challenging, since mobile devices rely on unreliable wireless connections and batteries, which must be taken into account at the scheduling level. This will open the door to excellent research opportunities for new schedulers based both on traditional techniques and, particularly, on SI-based algorithms.

    Acknowledgments

    We thank the anonymous referees for their helpful comments, which improved the theoretical background and quality of this paper. We also acknowledge the financial support provided by ANPCyT through Grants PAE-PICT 2007-02311 and PAE-PICT 2007-02312. The second author acknowledges her Ph.D. fellowship granted by the PRH-UNCuyo Project.

    References

    [1] Alfano G, Angelis FD, Rosati L. General solution procedures in elasto-viscoplasticity. Comput Methods Appl Mech Eng 2001;190(39):5123-47.

    [2] Banerjee S, Mukherjee I, Mahanti P. Cloud Computing initiative using modified ant colony framework. World Acad Sci Eng Technol 2009:221-4.

    [3] Basney J, Livny M, Mazzanti P. Harnessing the capacity of computational Grids for high energy physics. In: International conference on computing in high energy and nuclear physics (CHEP 2000); 2000.

    [4] Bonabeau E, Dorigo M, Theraulaz G. Swarm intelligence: from natural to artificial systems. Oxford University Press; 1999.

    [5] Buyya R, Yeo CS, Venugopal S, Broberg J, Brandic I. Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Generation Comput Syst 2009;25(6):599-616.

    [6] Calheiros RN, Ranjan R, Beloglazov A, De Rose CAF, Buyya R. CloudSim: a toolkit for modeling and simulation of Cloud Computing environments and evaluation of resource provisioning algorithms. Software: Pract Exp 2011;41(1):23-50.

    [7] Careglio C, Monge D, Pacini E, Mateos C, Mirasso A, García Garino C. Sensibilidad de resultados del ensayo de tracción simple frente a diferentes tamaños y tipos de imperfecciones. Mec Comput XXIX 2010(41):4181-97.

    [8] Casavant TL, Kuhl JG. A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Trans Software Eng 1988;14(2):141-54.

    [9] de Jonge A. Essential App Engine: building high-performance Java apps with Google App Engine. Addison-Wesley Professional; 2011.

    [10] Dorigo M. Optimization, learning and natural algorithms. Ph.D. thesis. Politecnico di Milano, Italy; 1992.

    [11] Dorigo M, Stützle T. The ant colony optimization metaheuristic: algorithms, applications, and advances. In: Glover F, Kochenberger G, editors. Handbook of metaheuristics. International series in operations research & management science, vol. 57. New York: Springer; 2003. p. 250-85.

    [12] Fidanova S, Durchova M. Ant algorithm for Grid scheduling problem. In: 5th International conference on large-scale scientific computing. Springer; 2005. p. 405-12.

    [13] García Garino C, Gabaldón F, Goicolea JM. Finite element simulation of the simple tension test in metals. Finite Elem Anal Des 2006;42(13):1187-97.

    [14] García Garino C, Ribero Vairo M, Andía Fagés S, Mirasso A, Ponthot JP. Numerical simulation of finite strain viscoplastic problems. In: Hogge M, et al., editors. Fifth international conference on advanced computational methods in engineering (ACOMEN 2011). University of Liege; 2011. p. 1-10.

    [15] Goiri I, Guitart J, Torres J. Elastic management of tasks in virtualized environments. In: XX Jornadas de Paralelismo (JP 2009); 2009. p. 671-6.

    [16] Gulamali M, McGough A, Newhouse S, Darlington J. Using ICENI to run parameter sweep applications across multiple Grid resources. In: Global Grid forum 10, case studies on Grid applications workshop, GGF10; 2004.

    [17] Hui Y, Xue-Qin S, Xing L, Ming-Hui W. An improved ant algorithm for job scheduling in Grid computing. In: International conference on machine learning and cybernetics, vol. 5. IEEE Computer Society; 2005. p. 2957-61.

    [18] Izakian H, Abraham A, Snasel V. Comparison of heuristics for scheduling independent tasks on heterogeneous distributed environments. In: 2009 International joint conference on computational sciences and optimization (CSO '09), vol. 1. Washington, DC, USA: IEEE Computer Society; 2009. p. 8-12.

    [19] Jeyarani R, Nagaveni N, Vasanth Ram R. Design and implementation of adaptive power-aware virtual machine provisioner (APA-VMP) using swarm intelligence. Future Generation Comput Syst 2012;28(5):811-21.

    [20] Kousalya K, Balasubramanie P. To improve ant algorithm's Grid scheduling using local search. Int J Intell Inform Technol Appl 2009;2(2):71-9.

    [21] Liu Y, Zhu H. A survey of the research on power management techniques for high-performance systems. Software: Pract Exp 2010;40(11):943-64.

    [22] Lorpunmanee S, Sap M, Abdullah A, Chompooinwai C. An ant colony optimization for dynamic job scheduling in Grid environment. In: World academy of science, engineering & technology; 2007. p. 314-21.

    [23] Ludwig S, Moallem A. Swarm intelligence approaches for Grid load balancing. J Grid Comput 2011;9(3):279-301.

    [24] Ma T, Qiaoqiao Yan WL, Guan D, Lee S. Grid task scheduling: algorithm review. IETE Tech Rev 2011;28(2):158-67.

    [25] Malawski M, Kuzniar M, Wojcik P, Bubak M. How to use Google App Engine for free computing. IEEE Int Comput; in press.

    [26] Mateos C, Zunino A, Hirsch M, Fernández M. Enhancing the BYG gridification tool with state-of-the-art Grid scheduling mechanisms and explicit tuning support. Adv Eng Software 2012;43(1):27-43.

    [27] Mateos C, Zunino A, Hirsch M, Fernández M, Campo M. A software tool for semi-automatic gridification of resource-intensive Java bytecodes and its application to ray tracing and sequence alignment. Adv Eng Software 2011;42(4):172-86.

    [28] Mathiyalagan P, Suriya S, Sivan S. Modified ant colony algorithm for Grid scheduling. Int J Comput Sci Eng 2010;2:132-9.

    [29] Merkle D, Middendorf M, Schmeck H. Ant colony optimization for resource-constrained project scheduling. Evol Comput 2002;6(4):333-46.

    [30] Pacini E, Mateos C, García Garino C. Planificadores basados en inteligencia colectiva para experimentos de simulación numérica en entornos distribuidos. In: Sexta Edición del Encuentro de Investigadores y Docentes de Ingeniería; 2011.

    [31] Pacini E, Mateos C, García Garino C. Schedulers based on ant colony optimization for parameter sweep experiments in distributed environments. In: Handbook of research on computational intelligence for engineering, science, and business. IGI Global; 2012. p. 410-48.

    [32] Pacini E, Ribero M, Mateos C, Mirasso A, García Garino C. Simulation on Cloud Computing infrastructures of parametric studies of nonlinear solids problems. In: Cipolla-Ficarra FV, editor. Advances in new technologies, interactive interfaces and communicability (ADNTIIC 2011). Lecture notes in computer science. Springer; in press. p. 56-68.

    [33] Pandey S, Wu L, Guru S, Buyya R. A particle swarm optimization-based heuristic for scheduling workflow applications in Cloud Computing environments. In: International conference on advanced information networking and applications (AINA 2010). IEEE Computer Society; 2010. p. 400-7.

    [34] Pedemonte M, Nesmachnow S, Cancela H. A survey on parallel ant colony optimization. Appl Soft Comput 2011;11(8):5181-97.

    [35] Ponthot J-P, García Garino C, Mirasso A. Large strain viscoplastic constitutive model. Theory and numerical scheme. Mec Comput XXIV 2005:441-54.

    [36] Ritchie G, Levine J. A hybrid ant algorithm for scheduling independent jobs in heterogeneous computing environments. In: Proceedings of the 23rd workshop of the UK planning and scheduling special interest group; 2004.

    [37] Rodriguez JM, Mateos C, Zunino A. Are smartphones really useful for scientific computing? In: Cipolla-Ficarra FV, et al., editors. Advances in new technologies, interactive interfaces and communicability (ADNTIIC 2011). Lecture notes in computer science. Huerta Grande, Córdoba, Argentina: Springer; 2011. p. 35-44.

    [38] Rodriguez JM, Zunino A, Campo M. Introducing mobile devices into Grid systems: a survey. Int J Web Grid Serv 2011;7(1):1-40.

    [39] Ruay-Shiung C, Jih-Sheng C, Po-Sheng L. An ant algorithm for balanced job scheduling in Grids. Future Generation Comput Syst 2009;25:20-7.

    [40] Samples M, Daida J, Byom M, Pizzimenti M. Parameter sweeps for exploring GP parameters. In: Conference on genetic and evolutionary computation (GECCO '05). New York, NY, USA: ACM Press; 2005. p. 212-9.

    [41] Sathish K, Reddy ARM. Enhanced ant algorithm based load balanced task scheduling in Grid computing. IJCSNS Int J Comput Sci Network Security 2008;8(10):219-23.

    [42] Sun C, Kim B, Yi G, Park H. A model of problem solving environment for integrated bioinformatics solution on Grid by using Condor. In: Grid and cooperative computing (GCC 2004). Lecture notes in computer science. Springer; 2004. p. 935-8.

    [43] Van Dorst W. Bogomips mini-howto; 2006.

    [44] Wang L, Kunze M, Tao J, von Laszewski G. Towards building a Cloud for scientific applications. Adv Eng Software 2011;42(9):714-22.

    [45] Wang L, Tao J, Kunze M, Castellanos AC, Kramer D, Karl W. Scientific cloud computing: early definition and experience. In: 10th IEEE international conference on high performance computing and communications (HPCC 2008). Washington, DC, USA: IEEE Computer Society; 2008. p. 825-30.

    [46] Wozniak J, Striegel A, Salyers D, Izaguirre J. GIPSE: streamlining the management of simulation on the Grid. In: 38th Annual simulation symposium (ANSS '05). IEEE Computer Society; 2005. p. 130-7.

    [47] Xhafa F, Abraham A. Computational models and heuristic methods for Grid scheduling problems. Future Generation Comput Syst 2010;26(4):608-21.

    [48] Youn C, Kaiser T. Management of a parameter sweep for scientific applications on cluster environments. Concurr Comput: Pract Exp 2010;22(18):2381-400.

    [49] Zehua Z, Xuejie Z. A load balancing mechanism based on ant colony and complex network theory in open Cloud Computing federation. In: 2nd International conference on industrial mechatronics and automation (ICIMA 2010). IEEE Computer Society; 2010. p. 240-3.
