
Project Partners: Aicas, Bosch, CNRS, Rheon Media, The Open Group, University of Stuttgart, University of York

Every effort has been made to ensure that all statements and information contained herein are accurate, however

the Project Partners accept no liability for any error or omission in the same.

© 2014 Copyright in this document remains vested in the DreamCloud Project Partners.

Project Number 611411

D2.2 – Soft Real-Time Dynamic Resource Allocation

Version 1.0

8 December 2014 Final

Public Distribution

University of York


PROJECT PARTNER CONTACT INFORMATION

Aicas

Fridtjof Siebert

Haid-und-Neue Strasse 18

76131 Karlsruhe

Germany

Tel: +49 721 663 96823

E-mail: [email protected]

Bosch

Jochen Härdtlein

Robert-Bosch-Strasse 2

71701 Schwieberdingen

Germany

Tel: +49 711 811 24517

E-mail: [email protected]

CNRS

Gilles Sassatelli

Rue Ada 161

34392 Montpellier

France

Tel: +33 4 674 18690

E-mail: [email protected]

Rheon Media

Raj Patel

20 Leighton Avenue

Pinner Middlesex HA5 3BW

United Kingdom

Tel: +44 7547 162920

E-mail: [email protected]

The Open Group

Scott Hansen

Avenue du Parc de Woluwe 56

1160 Brussels, Belgium

Tel: +32 2 675 1136

E-mail: [email protected]

University of Stuttgart

Bastian Koller

Nobelstrasse 19

70569 Stuttgart

Germany

Tel: +49 711 68565891

E-mail: [email protected]

University of York

Leandro Soares Indrusiak

Deramore Lane

York YO10 5GH

United Kingdom

Tel: +44 1904 325571

E-mail: [email protected]


DOCUMENT CONTROL

Version Status Date

0.1 First overview of soft real-time dynamic resource allocation. 30 September 2014

0.2 Description of proposed resource allocation heuristics. 24 October 2014

0.3 Results and description of market-inspired heuristics. 20 November 2014

0.4 Results and description of bio-inspired heuristic. 21 November 2014

0.5 Modifications by reviewing editors. 1 December 2014

0.6 Modifications after internal partner review. 6 December 2014

1.0 Final review and QA. 8 December 2014


TABLE OF CONTENTS

1. Introduction

1.1 Structure of this document

2. Dynamic Resource Allocation Techniques for Systems with Soft Real-Time Constraints

2.1 Market-inspired Dynamic Resource Allocation

2.2 Bio-inspired Dynamic Resource Allocation

3. Techniques for Market-inspired Dynamic Resource Allocation

3.1 Review of Existing Techniques

3.2 Proposed Market-inspired Techniques

3.2.1 Hybrid techniques for bidding

3.2.2 On-the-fly techniques for bidding

4. Techniques for Bio-inspired Resource Allocation

4.1 Review of Existing Techniques

4.1.1 Pheromone signalling (PS)

4.1.2 SymbioticSphere

4.1.3 Biological Task Mapping and Scheduling

4.2 Proposed Bio-inspired Technique

4.2.1 Preliminary experimental work

5. Compliance with the Dynamic Resource Allocation Requirements

References


EXECUTIVE SUMMARY

This deliverable describes soft real-time dynamic resource allocation techniques. First, existing techniques are analysed to identify their bottlenecks for DreamCloud applications such as video processing and high performance computing. To overcome these bottlenecks, several techniques are proposed that perform well in different dynamic scenarios. The proposed techniques fall into two categories: market-inspired and bio-inspired.

The market-inspired heuristics introduce the notion of value for a task or job and employ market concepts in the resource allocation process. Market concepts help to choose the most valuable jobs, while discarding the less valuable ones in system overload situations, in order to maximize the overall system utility, which has also been referred to as the value (profit) achieved by the system. The proposed heuristics take both computation and communication overheads into account and provide better allocation decisions than the commonly used existing market-inspired heuristic.

The bio-inspired heuristics exploit concepts from biological systems. Specifically, a pheromone-signalling-based load balancing algorithm is proposed that copes easily with changing workload dynamics and incurs low computation and communication overheads.

Finally, compliance with the dynamic resource allocation requirements, enumerated and described in Deliverable D1.2, has been analysed.


1. INTRODUCTION

Building on the earlier progress reported in deliverable D1.1, this deliverable addresses dynamic resource allocation for soft real-time systems from market-inspired and bio-inspired points of view.

Market-inspired approaches apply market concepts by associating a notion of value with each application (or job) that needs to be allocated onto the multi-core platform. The value associated with each application helps to decide which application to choose so that utility can be maximized, which is particularly desirable in platform overload situations. Further, associating different values with different completion times of the same application raises the interesting question of exactly what value to bid for. Market concepts are well suited to such propositions.

Bio-inspired approaches can be developed from several biological ideas. The concept of pheromone signalling has been identified as a promising candidate for resource allocation purposes.

1.1 STRUCTURE OF THIS DOCUMENT

This deliverable is structured as follows:

Section 2 briefly reviews the state of the art in dynamic resource allocation for systems with soft real-time constraints, leading towards the market-inspired and bio-inspired mechanisms proposed for DreamCloud applications.

Section 3 reviews existing market-inspired resource allocation techniques to identify their limitations and proposes new techniques to overcome the shortcomings. A variety of techniques are proposed to perform well for different kinds of end-user needs.

Section 4 reviews existing bio-inspired techniques and proposes the pheromone-signalling-based load balancing technique.

Section 5 describes compliance of the techniques described in the previous sections with the requirements from deliverable D1.2.

Sources for additional information are footnoted throughout the document whenever required.


2. DYNAMIC RESOURCE ALLOCATION TECHNIQUES FOR SYSTEMS WITH SOFT REAL-TIME CONSTRAINTS

Resource allocation is well known to be one of the most complex problems in large many-core and distributed systems, and in general it is considered NP-hard [1]. A well-tuned search algorithm therefore needs to evaluate hundreds of thousands of distinct allocations before it finds one solution that meets the system's performance requirements. Since such evaluation can take a long time, possibly hours to days, it cannot be used to find a solution quickly, which is what dynamic resource allocation requires. Further, search algorithms normally assume a static workload and thus cannot handle dynamic workload scenarios. These requirements can only be fulfilled by dynamic resource allocation that finds a performance-satisfying solution quickly under dynamic workloads.

The deliverables from the previous DreamCloud work packages (e.g., [2]) concluded that resource allocation has a significant impact on performance and energy dissipation in both grid-based and many-core-based systems. A variety of existing resource allocation techniques targeting performance guarantees and energy efficiency were introduced [3] [4]. It has been observed that state-of-the-art techniques employ search-based static allocation approaches in order to achieve performance guarantees [5] [6] [7]. These techniques do not take into account the overheads of dynamically allocating and migrating tasks (i.e. context saving and transferring). Further, most of them assume that tasks are independent and thus do not explicitly consider communication overheads [8].

There have been efforts to consider multiple dependent tasks in the resource allocation process [7] [9] [10] [11] [12] [13]. They employ various heuristics such as:

1) Load balancing: balances the load across the cores.

2) Critical path method: the tasks on the critical path, i.e. the longest path in the execution flow, which governs overall performance, are allocated first.

3) Dependency-based priority assignment: a task connected to many other tasks is assigned a high priority, as it determines the execution of its dependent child tasks by processing the input received from its dependent parent tasks.

4) Hybrid of critical path method and weighted total execution time: the priority assignment of tasks for their allocation sequence considers both the critical path and a task weight determined by the total execution time of successor tasks.

5) Upward and downward ranks: tasks are assigned upward and downward ranks by weighting each task according to the longest path from the task to its latest sink or earliest source, respectively.

6) Longest Remaining Time First (LRTF): sorts the tasks in decreasing order of upward rank to perform the allocation.

7) Shortest Remaining Time First (SRTF): orders the tasks in increasing order of upward rank.
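As an illustration of the upward rank and LRTF ordering mentioned above, the following minimal Python sketch (the example DAG and execution times are hypothetical, not taken from the deliverable) computes upward ranks and sorts tasks by decreasing rank:

```python
# Illustrative sketch: upward ranks on a small task DAG. The upward rank of a
# task is its execution time plus the largest upward rank among its successors;
# sink tasks simply get their own execution time. Sorting by decreasing rank
# yields an LRTF-like allocation order.

exec_time = {"A": 4, "B": 3, "C": 2, "D": 5}               # hypothetical task costs
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}  # hypothetical DAG edges

def upward_rank(task):
    # Longest path (in execution time) from this task to the latest sink.
    tail = max((upward_rank(s) for s in succ[task]), default=0)
    return exec_time[task] + tail

lrtf_order = sorted(exec_time, key=upward_rank, reverse=True)
print(lrtf_order)  # tasks with the longest remaining path come first
```

SRTF is obtained from the same ranks by sorting in increasing order instead.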

The aforementioned heuristics fall into the category of best-effort heuristics and try to optimize overall compute performance in terms of response time (total execution time). In order to optimize for both performance and energy consumption, several heuristics have been reported in the literature [4] [14]. These heuristics employ a number of fundamental optimization procedures, such as iterative hierarchical allocation to reduce energy consumption while satisfying the required Quality of Service (QoS) [14], incremental dynamic allocation for finding a contiguous area to map an application [15], hybrid mapping that performs intensive computations at design time and uses the design-time analysis results at run time for efficient online allocation [16] [17], and neighbourhood-metric-inspired allocation that places communicating tasks on neighbouring cores [18] [19].

Drawbacks of Existing Heuristics

Although several efforts have been made to devise heuristics for dynamic resource allocation, they suffer from the following main bottlenecks:

• They can lead to starvation, missed deadlines and reduced throughput in overload situations, where demand for the available resources is higher than the supply. In such situations, it becomes difficult to decide which applications to discard and which to admit into the system when resources become available upon completion of other tasks.

• Most existing heuristics do not take into account any notion of the value of tasks to users. Further, industrial workloads do not currently have values assigned to tasks/jobs. If such notions can be associated, the system can better decide which tasks to consider in the allocation process and which to discard in order to maximize the overall system utility in overload situations.

• They cannot efficiently deal with industrial applications containing dependent tasks. For this, industrial applications need to be modelled as a set of tasks connected by a set of pre-defined edges, e.g. as a Directed Acyclic Graph (DAG) or a Synchronous Dataflow Graph (SDFG).

• Research in energy-efficient allocation for high performance computing (HPC) and cloud systems is still incipient, with existing works addressing only the time and space fragmentation of resource utilisation at a very large granularity (server level), aiming to minimise energy by rearranging the load and freeing servers that are then turned off, or slowed down by employing DVFS [20].

The aforementioned drawbacks can be addressed by the soft real-time dynamic resource allocation heuristics proposed within the framework of the DreamCloud project. In order to be applicable to a variety of platforms, from embedded to HPC systems, a number of heuristics need to be proposed so that they suit different systems and applications. Further, their applicability needs to be well defined so that the most suitable heuristic can be chosen for a given scenario. The dynamic resource allocation heuristics explored in the DreamCloud project are based on market and biology concepts, and are introduced subsequently.

2.1 MARKET-INSPIRED DYNAMIC RESOURCE ALLOCATION

Market-based resource allocation mechanisms have proven to provide promising allocation decisions for various computing platforms such as clusters, distributed databases, grids, parallel and distributed systems, the World Wide Web, etc. [21]. Market-based mechanisms use available platform capacity, measured by low-level heuristics, as bids within an auction-like allocation process in order to find an allocation that can guarantee the required level of QoS and maximize the overall system utility (profit). Figure 1 illustrates the process of market-based dynamic resource allocation, where different tasks need to be allocated onto the multi-core platform. To incorporate market concepts in the allocation process, tasks are assigned values, and bids from resources (agents) are placed with the allocation engine (Manager Processor) in order to maximize the value returned by the multi-core system.

Figure 1: Market-based Dynamic Resource Allocation: DreamCloud Perspective

The value of tasks or applications (represented as task graphs) can change over time to reflect the impact of the computation on business processes. For example, finishing a computation earlier can enable fast business decisions, which in turn may result in increased earnings for a specific product, whereas a late finish may result in low earnings.

The value forecast related to the completion of a given computation can be plotted over time, resulting in what is called a value curve. For each task, the value curve is a function giving the value of the task to the user depending on the task's completion time. Thus, the value curve assigns the appropriate benefit (profit) to task completion at a particular time. This is similar to the value curves used in several works reported in the literature, e.g. [30] [31] [32]. It has been shown that using value curves instead of fixed values for tasks gives greater market efficiency in the long run [29].

A value curve normally has value (profit) on one axis and completion time on the other. For example, Figure 2 shows the value curve for an HPC application. The figure shows different profit values at different completion times (ranging from t_i to t_f). The value of the application (job) trends towards zero with increasing completion time and becomes zero at t_f. The DreamCloud applications are assumed to be associated with similar value curves. Such a value curve is normally obtained from the business unit by following an economic model.
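As an illustration (not part of any DreamCloud implementation), a value curve of the kind shown in Figure 2 can be modelled as a function from completion time to profit. The linear decay and all numbers below are hypothetical simplifications; a real curve would follow the business unit's economic model:

```python
# Illustrative value curve in the spirit of Figure 2: profit (in £) as a
# decreasing function of completion time, v_max at t_i falling to zero at t_f.
# A simple linear decay stands in for the real, business-defined curve.

def value_curve(completion_time, t_i=0.0, t_f=100.0, v_max=500.0):
    """Linear decay from v_max at t_i to 0 at (and after) t_f."""
    if completion_time <= t_i:
        return v_max
    if completion_time >= t_f:
        return 0.0
    return v_max * (t_f - completion_time) / (t_f - t_i)

print(value_curve(0))    # full value for the earliest completion
print(value_curve(50))   # half the value midway through the accepted range
print(value_curve(120))  # no value after t_f
```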

Value curves facilitate bidding based on the available processing capacity (bids) of different processors, with each processor aiming to maximize its profit. For example, a processor with high available processing capacity can bid for the maximum value (£500 in Figure 2) to maximize its profit. Bidding high also helps to finish the job as soon as possible, enabling faster release of the occupied processor for future incoming jobs and thus better resource availability. However, the bidding process needs to ensure that the bid (desired) price can be achieved before starting to provide the service, i.e. a processor should bid only when the desired profit can be achieved. In case the application requirements are not satisfied, a penalty on the overall profit can be imposed. Such a bidding process can be helpful in devising light-weight market-based bidding heuristics. This fits the platforms considered in the DreamCloud project, e.g. embedded and HPC systems, which need to process dynamic workloads using light-weight bidding. The approach is especially attractive in the case of dynamic workloads.
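The bid-only-when-profitable rule can be sketched as follows. This is illustrative only: the linear value curve, the simple capacity model and all numbers are hypothetical assumptions, not the project's actual bidding logic:

```python
# Illustrative bidding check: a processor bids for a job only when, given its
# currently available capacity, the predicted completion time still yields the
# desired profit on the job's (here hypothetical, linear) value curve.

def predicted_finish(job_cycles, free_capacity_cycles_per_ms, now_ms):
    # Naive completion-time estimate from remaining work and free capacity.
    return now_ms + job_cycles / free_capacity_cycles_per_ms

def value_at(t_ms, t_i=0.0, t_f=100.0, v_max=500.0):
    # Linear value curve clamped to [0, v_max].
    return max(0.0, min(v_max, v_max * (t_f - t_ms) / (t_f - t_i)))

def make_bid(job_cycles, free_capacity, now_ms, desired_profit):
    finish = predicted_finish(job_cycles, free_capacity, now_ms)
    achievable = value_at(finish)
    # Bid only when the desired profit can actually be achieved.
    return achievable if achievable >= desired_profit else None

print(make_bid(1000, 50, 0, 300))  # fast processor: profitable, so it bids
print(make_bid(1000, 20, 0, 300))  # slow processor: declines to bid (None)
```

A processor that cannot reach the desired profit simply abstains, which is one way to avoid the penalty mentioned above.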

Figure 2: Value curve for HPC application processing

Bidding-based resource allocation has clear potential, but it suffers from the following issues:

• A weak understanding of how much value each task should be assigned for industrial workloads.

• Greater complexity when dealing with dependent tasks, as it requires an approach for building value curves for individual tasks from the value curve of the whole application.

Despite these issues, it has several advantages over other kinds of resource allocation approaches, for example:

• Good handling of overload situations: In overload situations, where demand for resources exceeds their supply, bidding-based allocation helps to find an allocation that maximizes the overall system profit. To this end, bidding-based allocation can employ several basic principles such as:



o Least valuable tasks can be postponed or discarded.

o “Survival of the fittest” can be employed, letting some jobs starve so that the majority are able to run to completion.

• Operation in both centralised and decentralised (distributed) ways: The centralized approach achieves good performance because the central manager has a better view of the system resources, but there is always a bottleneck around the manager due to the heavy traffic it attracts and the extensive computation within it. The distributed approach overcomes this bottleneck, but incurs higher network traffic overhead for communicating all the bids and offers.

The overhead (in terms of time and energy) of dynamic allocation can be quite high for large problems, i.e. with a high number of tasks in the job and of resources (e.g., processors) in the platform. The overhead increases polynomially with the number of tasks and grows further with the number of processors due to the increased allocation options for each task. This necessitates light-weight bidding heuristics to allocate the tasks efficiently.

2.2 BIO-INSPIRED DYNAMIC RESOURCE ALLOCATION

Bio-inspired heuristics imitate a particular biological system. Among bio-inspired algorithms, a class drawing inspiration from swarm intelligence can be singled out. Multiple agents in these algorithms follow a number of relatively simple rules, which gives rise to their collective behaviour. These heuristics are worth analysing in the project since they are (usually) distributed and self-organising, while the algorithms implemented in each agent have low computational complexity and can be easily parallelised. To date, the project partners have analysed one particular algorithm of this kind, namely the Pheromone-Signalling-Based Load Balancing Algorithm. Although it is difficult to predict the final system's parameters, and in particular to guarantee that any constraints are met, this algorithm behaves promisingly in numerous situations according to the experiments conducted so far (some results are presented in [43]). It is characterised by low computation and communication overheads, yet copes easily with changing workload dynamics. Since each node uses only locally available information, the algorithm scales well and avoids generating hot-spots.
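To convey the flavour of such swarm heuristics, the following fragment is a deliberately simplified, hypothetical sketch: it is not the pheromone-signalling algorithm of [43], whose queen/worker mechanics are omitted entirely. Here each node advertises a load-dependent pheromone level and sheds one task towards the neighbour with the weakest signal, using only local information:

```python
# Hypothetical sketch of pheromone-style load balancing (NOT the algorithm
# from [43]). Each node sees only its own load and its neighbours' advertised
# pheromone levels, and migrates one task towards the weakest signal.

def pheromone(load, decay=0.9):
    # Advertised signal grows with local load; decay models evaporation.
    return decay * load

def balance_step(loads, neighbours):
    new_loads = dict(loads)
    for node, load in loads.items():
        signals = {n: pheromone(loads[n]) for n in neighbours[node]}
        target = min(signals, key=signals.get)
        # Migrate one task if a neighbour is strictly less loaded.
        if load > 0 and loads[target] < load - 1:
            new_loads[node] -= 1
            new_loads[target] += 1
    return new_loads

loads = {"n0": 6, "n1": 0, "n2": 2}
neighbours = {"n0": ["n1", "n2"], "n1": ["n0", "n2"], "n2": ["n0", "n1"]}
print(balance_step(loads, neighbours))  # load drifts towards the idle node
```

Each step uses purely local information, which is what gives this class of algorithm its scalability and freedom from hot-spots.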


3. TECHNIQUES FOR MARKET-INSPIRED DYNAMIC RESOURCE ALLOCATION

3.1 REVIEW OF EXISTING TECHNIQUES

Many cloud-based and grid-based HPC systems use allocation and scheduling heuristics that take into account not only the timing constraints of tasks but also their value (economic or otherwise). This problem has been well studied under the model of Deadline and Budget Constraints (DBC) [23], where each task or task-flow has a fixed deadline and a fixed budget. State-of-the-art allocation and scheduling techniques target objectives such as maximising the number of tasks completed within deadline and/or budget [24], maximising utility (profit) for the platform provider [25], or minimising cost to users [26] while still ensuring deadlines. Several approaches to the DBC problem use market-inspired techniques to balance the rewards between platform providers and users [27]. A comprehensive survey reviews several market-based allocation techniques supporting homogeneous or heterogeneous platforms, some of them supporting applications with dependent tasks modelled as DAGs [21].

Allocation techniques using a bidding-based feedback approach are proposed in [28]. During allocation, each core (or group of cores) computes its bid independently and sends it to the allocation (optimization) engine, as shown in Figure 3. The bids are considered as feedback from the system, and the tasks/processes as auctioned items. At regular time intervals, the engine receives the list of tasks and bids and decides the allocation of the pending tasks/processes. The cluster/core with the highest bid receives the task with the highest execution time.

Figure 3: Bidding-based system feedback

The bid calculation in [28] is performed as follows:

b(Ci) = b0(Ci) − w(Ci) − d(Ci)

where b(Ci) represents the estimated workload that the cluster Ci can handle, and b0(Ci) is the total CPU instruction queue space available in the core (or cluster). Since an ideal execution time of one instruction per cycle is considered (e.g. a RISC processor), the number of instructions is assumed to be equal to the number of cycles. w(Ci) is the remaining workload of Ci, i.e. the number of instructions in the queue awaiting execution, computed using dedicated hardware counters. d(Ci) is the network delay for the bidding packet to reach the optimization engine, which also represents the estimated time for a new process to reach Ci.
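Following the definitions of b0(Ci), w(Ci) and d(Ci), the bid computation can be sketched minimally as follows. This is our reading of [28]: the bid is taken as the free instruction-queue space net of the pending workload and the delay expressed in cycles, and all names and numbers are illustrative:

```python
# Minimal sketch of the cluster bid from [28], under the stated one-
# instruction-per-cycle assumption: free instruction-queue space b0(Ci),
# net of pending workload w(Ci) and network delay d(Ci) in cycles.
# The exact form is our reading of the cited work.

def cluster_bid(b0_queue_space, w_pending_instructions, d_delay_cycles):
    return b0_queue_space - w_pending_instructions - d_delay_cycles

# A cluster with 10000 queue slots, 6000 queued instructions and 500 cycles
# of network delay can absorb an estimated 3500 further instructions.
print(cluster_bid(10000, 6000, 500))
```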

Using the bids computed as described above, the authors propose two bidding-based heuristics with a slight variation between them, termed Necessary Resorting (NRS) and Dynamic Resorting (DRS). The NRS approach binds the highest bidder to the largest process, the second-highest bidder to the second-largest process, and so on. If the number of processes is greater than the number of cores (clusters), the allocation process updates the bids based on the allocation done so far and then repeats the same procedure to bind the remaining processes, until all processes have been bound or no clusters with positive bids remain. DRS follows the same rationale as NRS, with the distinction that a cluster's bid is recalculated every time a process is bound to it, and the list of available clusters is re-sorted to reflect the allocation.
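The NRS binding loop described above can be sketched as follows. This is illustrative only; in particular the bid-update rule between rounds (subtracting the size of the bound process) is our simplifying assumption:

```python
# Illustrative sketch of the NRS (Necessary Resorting) binding rule: highest
# bidder takes the largest process, second highest the second largest, and so
# on; bids are refreshed between rounds until all processes are bound or no
# positive bids remain. The bid-update rule is an assumption.

def nrs_allocate(bids, process_sizes):
    bids = dict(bids)  # work on a copy; callers keep their original bids
    pending = sorted(process_sizes, key=process_sizes.get, reverse=True)
    binding = {}
    while pending and any(b > 0 for b in bids.values()):
        clusters = sorted((c for c, b in bids.items() if b > 0),
                          key=bids.get, reverse=True)
        for cluster in clusters:
            if not pending:
                break
            proc = pending.pop(0)
            binding[proc] = cluster
            bids[cluster] -= process_sizes[proc]  # assumed bid update
    return binding

bids = {"C0": 900, "C1": 400}
sizes = {"p0": 500, "p1": 300, "p2": 200}
print(nrs_allocate(bids, sizes))
```

DRS would differ by re-sorting the cluster list after every single binding rather than once per round.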

It has been shown that using value curves instead of the fixed task values employed in the above-mentioned approaches gives greater market efficiency in the long run [29]. Based on the value curve concept, Irwin et al. [30] have considered a model that assumes task values decaying linearly with waiting time. As a variation, a linear decrease in value followed by an exponential phase has also been considered [31]. Further, rising and falling value curves have been proposed to capture situations in real-time systems where early completion of work can be considered as bad as late completion [32]. In industrial scenarios, earlier completion is always valued, so value curves can be assumed to be decreasing. Once values are used, scheduling naturally lends itself to market-based approaches [29].

A simple and straightforward resource allocation process may bid based on the highest value [28]. The allocation process keeps track of unallocated tasks by putting them in a queue and serves the tasks with the highest value first in order to maximize the system utility (profit). The actual value (price) is less important than the order it imposes on jobs in the queue; the 'sale value' of a position in the queue is the bid value, following the English style of auction. These intermediate values do not need to be stored, because the actual value paid to the platform operator depends not on the completion times of individual components of the task, but on the completion time of the whole task according to its value curve. Bidding based on the highest value may achieve a large value, but it may also require a large amount of computational resources; it may instead be possible to achieve high value by running several small tasks that take fewer execution resources, thereby maximizing the overall system utility.

To overcome the problem associated with bidding based on the highest value, bidding based on value density was introduced [33], where a task's value divided by the amount of required computational resources is taken as its value density. In a dynamic system, for situations where the workload is schedulable by Earliest Deadline First (EDF), it has been shown to be optimal to order the tasks by decreasing value density. However, in overload situations, EDF shows rapid degradation in performance, so the optimality proof does not necessarily hold. In [34], variants of the value-density-based allocation approach are presented. Bansal et al. [35] introduced a similar approach termed highest density first. These approaches normally have limitations such as: tasks must have exactly known execution times and a fixed value for completion, and all tasks must execute on a uniprocessor.

An allocation approach that bids based on the value density squared has been introduced in [36], where the tasks with the highest squared value density are run first. This gives a more extreme separation between valuable and less valuable tasks; the intent is that jobs should never start if they are never going to finish. This helps to reduce the execution penalty and favours the most valuable jobs finishing their execution. However, this work has the drawback of assuming pre-emptive tasks, which might not be suitable for industrial workloads.

Bidding based on value density has been refined slightly to use a task's value divided by its upward rank instead of its value divided by the amount of required computational resources. This refined approach is referred to as value critical path density; the tasks with the highest value critical path density are run first. This may be a more useful measure in large clusters, where responsiveness is important. Another heuristic, termed minimum value remaining, has been proposed to ensure that the job that will lose its value soonest, i.e. has the minimum remaining value, is allocated first [37]. The remaining value is calculated as the area under the value curve from the current time to the time when the value reaches zero. However, this and the aforementioned approaches might not guarantee higher overall profit than bidding based on the highest value, as higher-value jobs might be postponed due to their low value density or high minimum remaining value. It is also important to note that the performance of these heuristics can depend a great deal on the exact parameters of the workload and platform. Further, these approaches do not use design-time profiling results and lack the concept of holding back low-value executing jobs in order to allocate the freed resources to newly arrived high-value jobs.
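The ordering heuristics reviewed in this subsection differ mainly in the key used to sort the pending queue. The sketch below is illustrative only: the job parameters are hypothetical, and a linear value curve stands in for each job's real curve when computing remaining value:

```python
# Sketch of the queue-ordering keys reviewed above. Each job carries a value,
# a resource requirement, and a (hypothetical) time until its value reaches
# zero; each heuristic sorts the pending queue by a different key.

jobs = {
    #       value  resources  time-to-zero-value
    "j0": (500.0, 100.0, 80.0),
    "j1": (300.0,  20.0, 10.0),
    "j2": (400.0,  50.0, 40.0),
}

def highest_value(j):
    return jobs[j][0]

def value_density(j):
    return jobs[j][0] / jobs[j][1]

def value_density_squared(j):
    # Squaring keeps the order but widens the gap between jobs.
    return (jobs[j][0] / jobs[j][1]) ** 2

def remaining_value(j):
    # Area under an assumed linear curve decaying from value to zero.
    value, _, t_zero = jobs[j]
    return 0.5 * value * t_zero

print(sorted(jobs, key=highest_value, reverse=True))  # most valuable first
print(sorted(jobs, key=value_density, reverse=True))  # densest first
print(sorted(jobs, key=remaining_value))              # least remaining value first
```

The three keys can yield very different queue orders for the same jobs, which is exactly why the workload and platform parameters matter so much when picking a heuristic.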

The aforementioned heuristics fall within the context of the DreamCloud project and have therefore been reviewed. There have been several other market-inspired heuristics in various contexts, such as grid and distributed systems; however, their drawbacks mean they cannot simply be applied to fulfil the aims of the DreamCloud project. The major drawbacks of existing market-inspired heuristics are discussed subsequently.

Drawbacks of Existing Market-inspired Heuristics

The existing market-inspired heuristics suffer from the following drawbacks:

- Most of them do not consider tasks/processes having dependencies and thus cannot simply be extended to DreamCloud applications such as embedded and high performance computing (HPC) applications, as these typically contain multiple dependent tasks. There have been some efforts to consider dependencies in the context of market-inspired resource allocation [32], but static value-scheduling has been adopted. Therefore, they cannot be applied to dynamic workloads.


- The values of tasks may vary with time, but the execution times of tasks are not well-known in advance, and thus expected execution times (e.g. WCET, ACET) are considered.
- Most of the approaches consider a single value for a task.
- A centralized management is employed in the bidding process, which might not be scalable.

The DreamCloud project proposes market-inspired heuristics that try to overcome most of these drawbacks.

3.2 PROPOSED MARKET-INSPIRED TECHNIQUES

The DreamCloud project aims to propose a number of heuristics that can provide different levels of performance guarantees and can cope with different levels of dynamism in the application workload when applied to different systems. Further, the heuristics should perform well in different execution scenarios and under different kinds of requirements. To perform an early analysis, some heuristics have been developed and analysed while trying to satisfy the performance requirement. However, they are expected to be refined during further project development, especially during the progress of WP3 (Time and Energy Predictability in High Performance and Embedded Cloud Systems), in order to consider both performance and energy consumption requirements.

The proposed market-inspired heuristics try to achieve the aims defined in the

DreamCloud project by considering the following aspects:

- The heuristics can handle dynamic workloads.
- The application model considers dependent tasks, and thus the heuristics need to handle communication issues.
- The considered platforms are multi-/many-core.
- The heuristics can be applied in both centralized and decentralized ways.
- For a given QoS requirement, a value curve is considered instead of a single value, in contrast to most of the existing approaches.

Towards devising heuristics, two execution scenarios have been considered. In one, it is assumed that historical data, in terms of the jobs executed over the last year or few months, is known. The historical jobs can be analysed in advance so that an incoming job can be efficiently allocated to the system resources at runtime if it belongs to the historical jobs. In case the incoming job does not belong to the historical jobs, i.e. advance analysis results are not available for it, the analysis step can be employed followed by the run-time resource allocation step. The other execution scenario assumes that no prior information about the jobs is available before their arrival. Therefore, all the processing has to be performed at runtime (i.e., on-the-fly) after the job has arrived and been received by the system. In both scenarios, the value of a job is only realised once all of its components have been completed. A partially-completed job is considered to be


effectively worthless. The proposed heuristics for these two execution scenarios follow various principles and are as follows:
- Hybrid techniques for bidding
- On-the-fly techniques for bidding

3.2.1 Hybrid techniques for bidding

Hybrid techniques perform optimizations both at design-time and run-time, and have been an emerging trend for efficient resource allocation [4]. However, their potential has not been exploited to facilitate bidding-based resource allocation. The proposed hybrid techniques employ design-time (off-line) profiling and run-time (on-line) resource allocation for the jobs, considering their arrival times and profiled results, as shown in Figure 4. The platform resource manager is invoked to perform the resource allocation process for the arrived jobs.


Figure 4: Hybrid resource allocation

Design-time Profiling

At design-time, for a given job, the performance (makespan) is estimated when different numbers of cores (proportional to the amount of computing power) are utilized. The makespan of a job can also be referred to as the response time or completion time of the job after it has been allocated for execution. A job contains a set of workflows (applications), where each workflow contains a set of tasks with predefined connections amongst them, as shown in Figure 5. The same job model has also been described in deliverable D3.1 [38]. Different workflows (applications) might have different task graph structures.


Figure 5: DreamCloud application components

For each job that might need to be allocated onto the system, we use Interval Algebra (D5.1 – Analytical Platform Model [39]) to estimate the performance (makespan) that can be achieved when using different numbers of cores (computing power). The makespan values are computed by assuming worst-case execution times of the tasks in the job, so that the most pessimistic run-time system behaviour can be taken into account. The resource allocation decisions at the different numbers of used cores follow a genetic algorithm approach in order to obtain an efficient allocation. Further details can be found in deliverable D5.1 – Analytical Platform Model [39]. These resource allocation decisions are kept to be used during the run-time resource allocation.
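A minimal sketch of how such stored profiling results might be organised follows; the job name, makespans and allocation maps are assumptions for illustration, not actual profiling output:

```python
# Hedged sketch of a design-time profiling store: for each job and each
# candidate core count, keep the worst-case makespan and the off-line
# allocation decision (e.g. as found by the genetic algorithm of D5.1).
profiling_results = {
    # (job_id, n_cores): {"makespan": ..., "allocation": {task: core_index}}
    ("job_A", 2): {"makespan": 43,
                   "allocation": {"t0": 0, "t1": 1, "t2": 0, "t3": 1}},
    ("job_A", 4): {"makespan": 23,
                   "allocation": {"t0": 0, "t1": 1, "t2": 2, "t3": 3}},
}

def lookup(job_id, n_cores):
    """Fetch the stored makespan/allocation for a job at a given core count."""
    return profiling_results.get((job_id, n_cores))
```

At run-time the resource manager only needs such lookups, since the expensive allocation search has already been done off-line.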

The makespan values (for different numbers of used cores) can represent the time axis, and values (profits) can be assigned for achieving the corresponding makespan values in order to obtain the value-time curve. For example, Figure 6 shows the values (in blue on the left vertical axis) that can be achieved by completing a job at different moments of time (makespan values). Such a value-time curve is obtained from the business unit and is normally based on the computation and communication overhead of the tasks within the job. We assume the value curve is given and has similar properties to those of the value curves reported in the literature. The main properties are: 1) the value decreases with time, 2) the value is highest at the minimum possible time, and 3) the value becomes zero at a particular time. The considered value-time curve complies with the required properties and thus can be considered for evaluation. The profiling output (in red on the right vertical axis) is plotted along with the given value curve (in blue), which provides enriched information for the job to perform efficient run-time resource allocation. The right vertical axis (# Cores) represents the number of homogeneous cores required to obtain the corresponding makespan and value.
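The three properties above can be captured by a non-increasing step curve over completion time; the breakpoints below are illustrative assumptions, not the actual Figure 6 data:

```python
# Hedged sketch of a value-time curve with the three stated properties:
# non-increasing, highest at the minimum possible time, zero after a cut-off.

def make_value_curve(points):
    """points: (time, value) pairs sorted by time, with non-increasing
    values ending at zero. Returns value as a function of completion time."""
    def value_at(t):
        for time, value in points:
            if t <= time:
                return value  # first breakpoint not yet passed
        return 0.0  # past the final breakpoint: the job is worthless
    return value_at

# Illustrative breakpoints (makespan, value in $).
curve = make_value_curve([(15, 500.0), (29, 300.0), (86, 100.0), (90, 0.0)])
```

Completing the job by time 15 realises the full value, later completions realise progressively less, and any completion after time 90 is worthless.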


Figure 6: Value-time curve for a Job

Similar profiling is performed for all the jobs in the workload. For each job, this step associates information about the required computing power (# Cores) to achieve a certain value by executing the job over a fixed amount of time. This information, along with the allocation decisions at the different numbers of used cores, is stored (Profiling Results & Value Curves, as shown in Figure 4) to be used for performing efficient run-time resource allocation.

Run-time resource allocation for Jobs

At run-time, the jobs arrive at different moments of time and the platform resources need to be allocated to them based on the available computing power (cores). The platform contains a set of nodes (Node 1, …, Node N), where each node contains a set of homogeneous cores (PEs). The bottom part of Figure 7 shows an example HPC platform. The communication amongst the cores of a node is established by employing dedicated connections. The platform nodes are assumed to be connected by a shared bus. A platform (global) resource manager is used to manage the platform resources and perform resource allocation for the arrived jobs. During system operation, the manager keeps an up-to-date status of the platform resources, i.e., which resources are busy and which are idle, such that accurate and efficient allocations can be made. In our case, the platform status is maintained as the number of available (idle) cores at the different nodes, and resource allocation is also referred to as core allocation.


Figure 7: Run-time job arrival and resource allocation

The run-time resource allocation process can be understood by referring to Figure 7. It can be seen that the jobs that arrived up to the current time have already been allocated onto different computing nodes, and jobs arriving in the future (after the current time) need to be allocated. In order to allocate platform resources to the incoming jobs at run-time, the platform resource manager is invoked to find an allocation. The manager takes the profiling results of the jobs from the storage, along with their value curves and arrival times, as input, and identifies a profit-maximizing allocation for each job based on the number of available cores collected as bids from the different nodes in the platform. This helps to achieve a high overall profit by servicing (completing) the different jobs. Bidding, in this context, offers several inherent benefits. Bid computation is distributed inside the nodes, eliminating unnecessary traffic. For each job, it is assumed that all of its tasks will be allocated to one node in the platform, i.e., the tasks of a job cannot be allocated to more than one node, in order to avoid a huge communication delay between two nodes. In the case of a newly arrived job for which no profiling result is available, the profiling step is employed followed by the run-time resource allocation step based on the available number of cores.

The proposed resource allocation heuristics followed by the manager are as follows:
- Simple Job Queuing (SJQ)
- Simple Job Queuing with Holding (SJQH)
- Maximum Value Queued Job with Holding (maxVH)
- Maximum Value Density Queued Job with Holding (maxVDH)
- Minimum Value Remaining Queued Job with Holding (minVRH)

3.2.1.1 Simple Job Queuing (SJQ)

The proposed SJQ heuristic is presented in Heuristic 1. The heuristic takes the incoming jobs with their arrival times, the profiling results (value-time curves of the jobs with computing power and allocation decisions), and the platform availability as input, and tries to find an efficient allocation for each job. At each time step, the heuristic checks two decision points: 1) whether any already allocated job(s) finish execution (line 2), and 2) whether any job(s) arrive into the platform (line 14). Decision point 1) is checked first, followed by decision point 2).

If any job(s) arrive at the current time step and there are available resources in the platform, then the job(s) are allocated onto the platform; otherwise they are put into a queue for later allocation, when resources become available through the completion of already allocated job(s) (line 21). The incoming (arrived) jobs are handled in the same order as they are perceived by the platform. For each incoming job, the platform node with the highest bid, i.e. the maximum available resources, is initially selected (line 16). In case more than one platform node has the same maximum available resources (bids), any of them is chosen. Choosing such a node helps to achieve better load balancing amongst nodes, and profit maximization by allocating the job to a large number of cores. Then, the exact number of cores to be used by the incoming job is found (line 17): if the number of available cores is greater than the number of cores needed to achieve the maximum profit, the latter is chosen as the exact number of cores to be used by the incoming job; otherwise the former. Thereafter, the incoming job is allocated on the chosen number of cores based on the design-time computed allocation, and the platform resources are updated (line 18). The same process is applied for all the incoming job(s) in order to allocate them onto the platform resources.


Heuristic 1: Simple Job Queuing (SJQ)

Input: Incoming Jobs with arrival times, Value-time curves of Jobs with computing power and allocation decision information, Platform nodes with available cores
Output: Resource Allocation for Incoming Jobs

1:  for each time_step
2:    if allocated job(s) finish execution
3:      Update platform resources;
4:      if JobQueue contains jobs
5:        for each queued job
6:          if queued job has value at time_step && resources available
7:            Choose the maximum capacity node;
8:            Find number of cores to be used by queued job to maximize profit;
9:            Allocate resources to the queued job and update platform resources;
10:         end
11:       end
12:     end
13:   end
14:   if Job(s) arrive && resources are available in nodes
15:     for each Job
16:       Choose the maximum capacity node;
17:       Find number of cores to be used by Job to maximize profit;
18:       Allocate cores to Job and update platform resources;
19:     end
20:   else
21:     Put the Job(s) in JobQueue for late allocation;
22:   end
23: end

In case any earlier allocated job(s) complete execution at the current time step, an attempt is made to allocate the queued jobs onto the freed resources (line 2). On the jobs' completion, the platform resources are updated. Then, for each queued job that still has a positive value at the current time step (line 6), resource allocation is performed in a similar way as described earlier: first, the node with the highest bid, i.e. the maximum number of available cores, is chosen; then the exact number of cores to be used in the chosen node is found from the profit maximization point of view; and finally the queued job is allocated on the found number of cores by using the design-time allocation decision, and the platform resources are updated. The same process is repeated for the next queued job if that job has a positive value at the current time step and some platform resources are available. If no platform resource is available, the current and earlier queued jobs remain in the queue, and resource allocation for them is performed at later time steps when some resources become available. Further, if a queued job has zero value at the current time step, it is dropped from the queue, as no profit can be made out of it.


The same process is repeated for all the incoming jobs until all of them are allocated or dropped due to having zero value at their late allocation points. This approach tries to allocate most of the jobs while achieving some profit for each of them. However, some of them might not achieve any profit due to the late availability of the platform resources freed by the already allocated jobs.
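The per-job SJQ step (lines 16–18 and 21 of Heuristic 1) can be sketched as follows; the node and job structures are simplified assumptions:

```python
# Hedged sketch of one SJQ allocation step (lines 16-18 / 21 of Heuristic 1).

def allocate_sjq(job, nodes, job_queue):
    """nodes: {name: free_cores}. job["best_cores"] is the profit-maximising
    core count from design-time profiling. Returns (node, cores_used), or
    None if the job had to be queued."""
    node = max(nodes, key=nodes.get)   # highest bid = most free cores
    if nodes[node] == 0:
        job_queue.append(job)          # no resources: queue for later
        return None
    # Use the profit-maximising core count if enough cores are free,
    # otherwise all the free cores of the chosen node.
    cores = min(nodes[node], job["best_cores"])
    nodes[node] -= cores               # update platform resources
    return node, cores
```

Picking the node with the most free cores implements the highest-bid selection, and the `min` captures the "latter/former" core-count rule from the text.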

3.2.1.2 Simple Job Queuing with Holding (SJQH)

This technique is similar to SJQ, with the extra capability of holding already executing low value jobs during the run-time resource allocation process whenever it is profitable. The run-time resource allocation process of the SJQH approach is presented in Heuristic 2. The heuristic takes the same input as Heuristic 1 and follows most of the same steps. Here, before queuing an incoming job at a particular arrival time due to there being no available resources in the platform, it is checked whether any profit can be made by holding low value executing jobs that are expected to lead to a small amount of profit, and allocating the incoming job on the freed resources. The same check is applied for all the incoming jobs with the same arrival time.

In order to identify the executing jobs to be held, first, the executing jobs of each node are found and sorted based on the start times of their execution (lines 9 and 10). Then, the net profit made by holding the executing jobs in each node is computed as follows. The profit is calculated by allocating the incoming job on the freed cores, and the loss is the earlier profit achieved by the executing jobs to be put on hold. The allocation uses either all the freed cores or some of them: if the number of freed cores is higher than the number of cores required by the job to make the maximum profit, the latter is chosen as the number of cores to be used; otherwise the former. Sorting the executing jobs based on their start times helps us to choose, first, the job with the latest start time, then the jobs with the latest and second latest start times, then the jobs with the latest, second latest and third latest start times, and so on. Such consideration helps to identify and hold the jobs that have started recently, and avoids holding jobs that have been executing for a long time. This process tries to identify the most profitable instance in terms of jobs to hold. For example, holding only the latest start time job might not be profitable, but it might be profitable to hold the latest and second latest start time jobs together. In such cases, the profitable instance arises only when both the latest and second latest start time jobs are put on hold. Further, this also avoids considering all possible job combinations, which might be quite numerous for a large number of executing jobs in a node.
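This prefix search over the start-time-sorted executing jobs can be sketched as below; the job tuples and the incoming job's profit function are illustrative assumptions, chosen to mirror the Node 1 column of Figure 8:

```python
# Hedged sketch of the per-node holding search: try holding the
# latest-started job, then the two latest, and so on, keeping the best
# net profit over all prefixes.

def best_holding_prefix(executing_jobs, incoming_profit):
    """executing_jobs: [(job, start_time, expected_profit)] of one node.
    incoming_profit(held): profit of the incoming job on the cores freed
    by the held jobs. Returns (best_net_profit, jobs_to_hold)."""
    by_start = sorted(executing_jobs, key=lambda j: j[1], reverse=True)
    best = (0.0, [])            # holding nothing: net profit 0
    held, lost = [], 0.0
    for job, _, profit in by_start:
        held.append(job)
        lost += profit          # profit forfeited by the held jobs
        net = incoming_profit(list(held)) - lost
        if net > best[0]:
            best = (net, list(held))
    return best

# Node 1 of Figure 8: j1, j4, j7 with assumed start times and profits.
node1 = [("j1", 0, 200.0), ("j4", 5, 30.0), ("j7", 9, 20.0)]
# Assumed profit of incoming job j8 as a function of how many jobs are held.
j8_profit = lambda held: {1: 60.0, 2: 150.0, 3: 100.0}[len(held)]
```

With these assumed figures the three prefixes yield net profits of 40$, 100$ and -150$, so j7 and j4 are the jobs to hold on this node.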

After profit calculation for each instance in each node, the maximum net profit instance is selected, and the corresponding node and its jobs to be put on hold are chosen as max_profitable_node and jobs_to_hold, respectively (line 16 of Heuristic 2). Thus, the holding instance that leads to the maximum profit is identified. Figure 8 demonstrates the holding process, where three platform nodes are executing different sets of jobs at the


Heuristic 2: Simple Job Queuing with Holding (SJQH)

1:  for each time_step
2:    if allocated job(s) finish execution
3:      Perform steps 3 to 12 of Heuristic 1 (SJQ approach);
4:    end
5:    if Job(s) arrive && resources are available in nodes
6:      Perform steps 15 to 19 of Heuristic 1 (SJQ approach);
7:    else // first perform holding if profitable; else put into job queue
8:      for each arrived Job
9:        Find executing_jobs in each platform node;
10:       Sort executing_jobs in each node based on their start time;
11:       for each node c of platform nodes
12:         for each executing_job (∈ executing_jobs) of c
13:           Find net_profit to hold the executing_job and allocate Job on the freed resources;
14:         end
15:       end
16:       Choose the max profitable holding instance in terms of max_profitable_node and its jobs_to_hold;
17:       if (net_profit at max profitable holding instance > 0)
18:         Hold the jobs jobs_to_hold of max_profitable_node and update resources;
19:         Find number of cores to be used by Job to maximize profit;
20:         Allocate cores to the Job and update platform resources;
21:       else
22:         Put the Job(s) in JobQueue for late allocation;
23:       end
24:     end
25:   end
26: end

current time; e.g., Node 1 is executing jobs j1, j4 and j7. The executing jobs started at different moments of time. At the current time, job j8 has arrived and no resource is available in the platform; therefore, the holding process tries to identify the set of jobs to be put on hold. The table on the right hand side shows the jobs to hold in the various nodes and the corresponding net profit from allocating the freed resources to job j8. It can be seen that the net profit (in $) from holding two jobs may sometimes be higher than that from holding one, e.g., 100$ by holding j7 and j4, whereas only 40$ by holding j7 alone. The net profit can also be negative (e.g., -80$ by holding j5), representing a loss if the jobs are put on hold, i.e., the achieved profit is less than the loss. The most profitable holding instance is to hold jobs j5 and j2 of Node 2, which results in a net profit of 200$. The holding process will choose this instance.


[Figure 8 shows jobs j1–j7 executing on Nodes 1–3 when job j8 arrives and no cores are available. The net profit of each candidate holding instance, obtained by allocating j8 on the freed cores, is:

Node    Jobs_to_hold    Net_profit
Node 1  j7              40$
Node 1  j7, j4          100$
Node 1  j7, j4, j1      -150$
Node 2  j5              -80$
Node 2  j5, j2          200$  (most profitable holding instance)
Node 3  j6              -40$
Node 3  j6, j3          -140$]

Figure 8: Holding Demonstration

If the net profit at the most profitable holding instance is greater than zero, the jobs jobs_to_hold in node max_profitable_node are held and the resources are updated. Then, the number of resources to be used by the incoming job is identified based on the available resources and the resources required to achieve the maximum profit, as described earlier (line 19). Thereafter, the incoming job is allocated to the cores based on the design-time profiling allocation decision, and the resources are updated.

In case there is no profit at any holding instance, the incoming job is put into the queue, to be allocated at later time steps when some resources become available through the completion of executing job(s). The holding process helps to achieve a higher profit for some jobs, whereas the profit for the held jobs becomes zero. In contrast to SJQ, this approach tends to provide more overall profit, as only profitable holdings are allowed.

3.2.1.3 Maximum Value Queued Job with Holding (maxVH)

The earlier two approaches have the following shortcomings: 1) they process the queued jobs in queuing order, 2) they choose the queued jobs randomly for the holding operation, and 3) they do not resume the held jobs when resources become available. The maxVH approach addresses these shortcomings and is presented in Heuristic 3. At each time step, the heuristic checks for three events: 1) any already allocated job(s) finishing execution, upon which the platform resources are updated; 2) any job(s) arriving into the platform, which are put into a job queue; and 3) the job queue containing job(s) with non-zero values at the current time step, for which resource allocation is performed.

To perform resource allocation for all valuable queued jobs (i.e., jobs having positive values), an attempt is made to allocate all of them (counter = 0 to JobQueue.size()) onto the platform resources as long as any resource is available or profit can be made by holding some executing jobs. First, bids (in terms of the number of available cores) from the different platform nodes are collected; then the maximum bid (maxBid) and the corresponding node are selected. Choosing such a node helps to achieve better load balancing amongst nodes and thus better resource utilization. In case more than one node has the same bid, any of them is chosen. If the estimate of maxBid is greater than


Heuristic 3: Maximum Value Queued Job with Holding (maxVH)

Input: Incoming Jobs with arrival times, Value-time curves of Jobs with profiling results, Platform nodes with available cores
Output: Resource Allocation for Incoming Jobs

1:  for each time_step
2:    if allocated job(s) finish execution
3:      Update platform resources;
4:    end
5:    if job(s) arrive
6:      Put the job(s) in JobQueue;
7:    end
8:    if JobQueue contains job(s) having positive values
9:      counter = 0;
10:     do
11:       Collect bids from different nodes and select maxBid;
12:       if maxBid > 0
13:         Compute profits of jobs by utilizing maxBid resources;
14:         Select maxProfitableJob and its profit;
15:         if profit > 0
16:           Allocate resources of maxBid node to maxProfitableJob;
17:           Update platform resources;
18:         end
19:       else
20:         Find executing jobs_to_hold in the best_suitable_node for recently arrived maxProfitJob (from JobQueue) and max_hold_profit by the Holding heuristic;
21:         if max_hold_profit > 0
22:           Hold jobs jobs_to_hold and put them in JobQueue for later allocation;
23:           Allocate resources of best_suitable_node to maxProfitJob;
24:           Update platform resources;
25:         end
26:       end
27:       counter++;
28:     while (counter != JobQueue.size());
29:   end
30: end

zero (maxBid > 0), i.e., at least one resource is available in the platform, the profits of the jobs utilizing the maxBid resources are computed, and the job leading to the maximum profit is selected (maxProfitableJob) to be allocated to the resources of the node holding the maxBid resources, provided the maximum profit is positive (profit > 0). The profit computation for each job considers the exact number of cores to be used by the job (computed as described earlier) and its value at the current time step. The resource allocation on the exact number of cores of the node containing the maxBid cores is done based on the allocation achieved on the same number of cores during design-time profiling. The allocation process allocates the tasks within a job to the cores (PEs) of a node. The platform resources are updated after each allocation process, so that up-to-date resource availability information is available for the next allocation instance. Such information helps to achieve an accurate and efficient allocation.
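One maxVH selection round can be sketched as follows. The node and job structures are simplified assumptions, and the profit here is just the job's value at the current time; the real computation also depends on the makespan achieved with the chosen core count:

```python
# Hedged sketch of one maxVH round: collect bids, pick the maxBid node,
# then pick the most profitable queued job for it.

def max_vh_step(nodes, queued_jobs, now):
    """nodes: {name: free_cores}. Each queued job has a value(t) curve and
    a best_cores count. Returns (node, job_name, profit), or None if
    maxBid = 0 (the heuristic then falls back to the holding step)."""
    node = max(nodes, key=nodes.get)       # bids = free cores per node
    max_bid = nodes[node]
    if max_bid == 0:
        return None                        # no resources: try holding instead
    best = None
    for job in queued_jobs:
        cores = min(max_bid, job["best_cores"])  # exact core count to use
        profit = job["value"](now)               # simplified profit estimate
        if cores > 0 and profit > 0 and (best is None or profit > best[2]):
            best = (node, job["name"], profit)
    return best
```

Unlike SJQ, the queue is scanned for the most profitable job at each round rather than being served in arrival order.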

In case no resource is available in the platform, i.e. maxBid = 0, it is checked whether any profit can be made by holding low value executing jobs that are expected to lead to a small amount of profit. For a profitable holding, max_hold_profit is greater than zero (max_hold_profit > 0). The job holding logic is presented in Heuristic 4, which provides the executing jobs to hold in the best suitable node, and the maximum profitable queued job (maxProfitJob) along with the achieved profit (max_hold_profit) from utilizing the freed cores of the held jobs. The holding process is carried out only for the recently arrived jobs (i.e., those arriving at the current time step), to avoid repeating the process for all the queued jobs at each time step.

Heuristic 4: Job Holding

1:  max_hold_profit = 0;
2:  for each recently arrived job j of JobQueue
3:    Find executing jobs in each platform node;
4:    Sort executing jobs in each node in ascending order based on their start times;
5:    for each node n of platform
6:      for each executing_job of n
7:        Find net_profit (as in earlier approaches) by holding executing_job;
8:        if net_profit > max_hold_profit
9:          max_hold_profit = net_profit;
10:         maxProfitJob = j;
11:         best_suitable_node = n;
12:         Add executing_job to list jobs_to_hold;
13:       end
14:     end
15:   end
16: end

If holding is profitable (max_hold_profit > 0), the jobs jobs_to_hold of node best_suitable_node are put on hold for later allocation and the used cores are released. Then, the incoming job is allocated to the freed cores based on the profiling allocation decisions, and the resources are updated. The holding process helps to achieve a higher profit for some jobs, whereas the jobs on hold achieve lower profits due to allocation at later time steps with decreased job values. In case holding is not profitable, the recently arrived jobs remain in the job queue and resource allocation for them is performed later, when resources become available through the completion of executing job(s). The allocation process also ensures that a queued job having zero value at the current time step is dropped from the queue, as no profit can be made out of it. Further, the allocation for a queued job that was put on hold starts from the hold point (i.e., it is resumed), to ensure that only the fraction of the job left after holding is executed, and not the whole job from the beginning. The allocation process continues until all the arrived jobs are allocated or dropped due to having zero value while waiting in the job queue.

3.2.1.4 Maximum Value Density Queued Job with Holding (maxVDH)

This approach is similar to the maxVH approach, with the distinction that the queued jobs are processed based on their value density when utilizing the available (or required) computing power (number of cores) of the highest bid node (maxBid node). The value density of a job is computed by dividing the achieved value by the number of used cores. The queued job with the maximum value density is chosen first. This means that a job providing a high value while requiring fewer cores is allocated first, which also leaves cores for later arriving jobs.
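The maxVDH ordering can be sketched in a few lines; the job names, values and core counts are illustrative assumptions:

```python
# Hedged sketch of maxVDH ordering: value density = achieved value divided
# by the number of used cores; the densest queued job is processed first.

def value_density(value, used_cores):
    return value / used_cores

# Queued jobs as (name, achieved value, used cores) -- invented figures.
queued = [("jA", 400.0, 8), ("jB", 120.0, 2), ("jC", 90.0, 3)]

# High value with few cores wins, leaving cores for later arrivals.
order = sorted(queued, key=lambda j: value_density(j[1], j[2]), reverse=True)
```

Here jB (density 60) beats jA (density 50) despite jA's higher absolute value, because jA would consume four times as many cores.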

3.2.1.5 Minimum Value Remaining Queued Job with Holding (minVRH)

This approach is also similar to the maxVH approach, with the distinction that the queued jobs are processed based on their remaining value. The remaining value of a job is calculated as the remaining area under the value-time curve from the current time to the time when the value becomes zero. The job with the minimum remaining value is chosen first. This means that the job that is going to lose its value soonest, i.e., has the minimum remaining value, is chosen first, on the assumption that most of the jobs will then be serviced and some value will be achieved from them. However, in doing so, a high value job might have a very low value by the time resources become available to perform the allocation process for it. Therefore, this approach might result in a low overall profit.

3.2.1.6 Results and comparison

To evaluate the quality of the heuristics, historical data from the High Performance Computing Center Stuttgart (HLRS) has been considered as the workload. The workload contains a set of jobs with varying arrival times. Each job contains a set of tasks that have known worst-case execution times (WCETs). The considered HPC platform model contains a set of 3 nodes, where each node consists of 9 cores. However, any number of nodes, and of cores within them, can be considered, subject to the physical limitations of hardware integration.

The platform manager employs a heuristic to find an allocation for each job of the workload by considering its arrival time, its given value curve, and profiled information representing the computing power (number of cores used) and the corresponding allocation decision needed to achieve a certain value by executing over a fixed amount of time. The profiling information is obtained by employing the design-time profiling step. For each job set, the profiling is performed in advance in order to associate profiling results with the given value-time curves. The profiling information includes the required computing power (number of cores) and resource allocation decisions to achieve different makespans along the time axis.
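How such profiling results might be combined with a value-time curve can be sketched as below. The profile table, curve and numbers are hypothetical, chosen only to illustrate the lookup.

```python
# Illustrative sketch: each profiled configuration maps a core count to a
# makespan, and the value achieved is the value-time curve evaluated at the
# resulting completion time. All names and numbers are assumptions.

def best_configuration(profile, value_curve, arrival):
    """profile: {cores: makespan}; value_curve: callable time -> value."""
    # pick the core count whose completion time yields the highest value
    return max(profile, key=lambda c: value_curve(arrival + profile[c]))

profile = {2: 40.0, 4: 22.0, 8: 12.0}              # cores -> profiled makespan
value_curve = lambda t: max(0.0, 100.0 - 2.0 * t)  # value decays over time
print(best_configuration(profile, value_curve, arrival=0.0))  # -> 8
```

With this decaying curve, the 8-core configuration wins because its shorter makespan preserves more of the job's value.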

Overall profit by executing different numbers of jobs: Figure 9 shows the overall profit obtained by employing the proposed approaches for varying job sets, which are derived by choosing different numbers of jobs from the workload. A job set with a small number of jobs reflects the execution of jobs in the HPC centre


for a fixed, small amount of time, e.g. a few minutes or hours, rather than for the whole period covered by the historical data. A couple of observations can be made from the figure. 1) The

profit obtained by the approaches employing the holding process (e.g. maxVDH and maxVH) is always higher than that of the corresponding approaches without holding. This improvement is achieved by holding low-value executing jobs and allocating the freed resources to high-value incoming jobs. Since holding is performed only when it is profitable, the approaches employing holding achieve higher overall profit. 2) The overall profit increases with the number of jobs in a workload, as profit is made from a higher number of jobs. 3) The maxVH approach achieves the maximum overall profit for the considered job sets. This is due to the fact that choosing the maximum-value queued job leads to more favourable situations for maximizing the profit by completing different jobs. On average, maxVH achieves 8% higher profit than maxV, which can be a significant margin when serving (completing) a large number of jobs.

Figure 9: Overall profit for executing different job sets

Analysing the holding effect on profit for different jobs: Figure 10 shows the profit obtained by different jobs when the approaches SJQ and SJQH are employed for the job set containing 30 jobs. The interesting observations that can be made from the figure are as follows. 1) SJQ makes a profit for most of the jobs; the jobs achieving zero profit are those that were queued and whose value decreased to zero before resources became available. An example of such a zero-profit job is job 27. 2) SJQH makes a profit for fewer jobs than SJQ, as SJQH holds some jobs and makes zero profit from some of them. Examples of such jobs are job 20 and job 24. It should be noted that some zero-profit jobs under SJQH could be due to the same reason as under SJQ, i.e. queued jobs for which profitable holding was not possible on arrival and whose value became zero by the time resources became available. Similar results are obtained for other job sets.

[Figure 9 plot: Overall Profit (£) against job sets containing 30, 40 and 50 jobs; series SJQ, MinVR, MaxVD, MaxV, SJQH, MinVRH, MaxVDH, MaxVH]


Figure 10: Profit by different jobs

Overall profit with varying job holding penalty: We have also evaluated the overall achieved profit in the case where there is some penalty for holding low-value executing jobs. This is a more favourable situation for the customers, as they know that their submitted jobs will most likely be serviced with the initially promised quality; otherwise the HPC centre has to pay them back some penalty. We have assumed that if a job is put on hold, there is a penalty of some percentage of the maximum value that could be achieved for the job. However, for the queued jobs that can lead to zero profit, we have not considered any penalty, since such a job is not put on hold but passes its profit-making point due to resource unavailability. The holding penalty percentage has been varied to evaluate its impact on the overall profit achieved by the most promising approach, maxVH.
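The penalty-aware holding decision described above can be sketched as follows; the function signature and the example figures are invented for illustration, not taken from the evaluation.

```python
# Hedged sketch of a penalty-aware holding decision: a running low-value job
# is held only if the value gained from the incoming job exceeds the held
# job's remaining value plus a penalty, taken as a percentage of the held
# job's maximum achievable value.

def profitable_to_hold(incoming_value, held_remaining_value,
                       held_max_value, penalty_pct):
    penalty = penalty_pct / 100.0 * held_max_value
    return incoming_value > held_remaining_value + penalty

print(profitable_to_hold(500.0, 450.0, 300.0, penalty_pct=10))  # -> True
print(profitable_to_hold(500.0, 450.0, 300.0, penalty_pct=70))  # -> False
```

As the penalty percentage grows, fewer holds remain profitable, matching the saturation behaviour discussed for maxVH.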

Figure 11 shows the overall profit obtained by employing maxVH when the holding penalty percentage is varied from 0% to 70% for the job set containing 30 jobs. The overall profits obtained by employing maxVH with no holding penalty (maxVH-0%) and by maxV have also been plotted for comparison purposes. It can be observed that the overall profit of maxVH decreases as the holding penalty percentage increases, and saturates after a particular holding penalty percentage. The profit at lower penalties remains the same, as the penalty is not high enough to affect the allocation decisions for the jobs. The decreasing trend arises because fewer holdings are performed as the penalty increases, yielding lower overall profit. The later constant-profit region indicates that no further holdings are performed due to the high cost (penalty). It should be noted that the holding penalty also influences the holding decision, and thus different jobs are put on hold as the penalty changes. Further, it can be observed that the overall profit at higher holding penalties is the same as that of maxV, since maxVH then performs resource allocation in exactly the same manner as maxV, i.e. no jobs are put on hold due to the high penalty, and each incoming job is put into the job queue for later allocation when resources become free.

[Figure 10 plot: Profit ($) against job index 1-30; series SJQ, SJQH]


Figure 11: Profit by maxVH with varying holding penalty and by maxV

Profit by different jobs at varying job holding penalty: Figure 12 shows the profit obtained by different jobs when the approach SJQH is employed with different job holding penalties for a job set containing 20 jobs. It can be observed that the number of held jobs making zero profit is higher when the holding penalty is low. For example, at a 10% holding penalty, jobs 7, 8, 9 and 18 are held, which results in zero profit for them. The number of held jobs reduces as the holding penalty increases. For example, at a 30% holding penalty, only job 12 is held. It should be noted that the holding penalty also influences the holding decision, and thus different jobs are held as the penalty changes. This can be clearly observed from the above two examples, which show that jobs 7, 8, 9 and 18 are held if the penalty is 10%, whereas job 12 is held if the penalty is 30%. Further, it can also be observed that the holding of jobs stops at higher penalties, e.g. at 40%. It should also be noted that zero profit can also be due to job queuing and late allocation; however, for Figure 12 there is no such job, which has been verified against the allocation results.

Figure 12: Profit by different jobs

[Figure 11 plot: Total Profit ($) against resource allocation approaches]

[Figure 12 plot: Profit ($) against job index 1-20; series SJQH-10%, SJQH-20%, SJQH-30%, SJQH-40%]


Discussions: The SJQ approach is good from the user's (customer's) point of view, as it tries to perform resource allocation for most of the jobs. Similarly, minVR tries to service most of the jobs. In contrast, the approaches employing the holding operation hold jobs to maximize the overall profit, and thus favour the HPC platform suppliers, i.e. companies, rather than the users. Further, these approaches perform allocations similar to those of the approaches without holding when the job holding penalty is quite high.

The results indicate that maxVH can be employed if the holding penalty is not too high and the HPC platform supplier wants to maximize the overall profit. However, users might not wish to submit their jobs to HPC centres where their jobs might go unserved due to holding. Therefore, from the user's point of view, SJQ and minVR can be employed, which try to serve all the jobs and make a profit from most of them. Further, in SJQ, when no resource is available for an incoming job, the client can be informed early that the service for the job cannot start immediately but will start once resources become available. This helps the client to withdraw the job if they are unhappy and able to find a better HPC supplier in the meantime. Such a possibility is not feasible in the approaches employing the holding operation, as a job can be held at any time.

3.2.2 On-the-fly techniques for bidding

The approaches described in Section 3.2.1 consider whether employing a holding process for low-value executing jobs and resuming them later (i.e. employing pre-emption) can lead to a higher level of achieved value. Further, they utilize design-time profiled results of the jobs to perform an efficient allocation towards maximizing the overall profit. The on-the-fly approaches perform all the processing at runtime after a job has arrived and do not use any design-time profiled results. Further, they do not consider pre-emption. These approaches are suitable for the scenario where customers are expected to submit different kinds of jobs at different moments in time and historical data cannot be made available by HPC centres.

The on-the-fly approaches have no overhead for managing job queues anywhere other than inside an HPC platform, as no prior profiling information is used. The allocations found at runtime may be suboptimal, as valuable jobs may starve on a very busy node, even if there is free capacity on other nodes, simply because of the allocation made on the job's arrival. Similarly to the earlier approaches, a centralised auctioneer managing the queue of jobs that are yet to be allocated is considered. The overhead of managing the queue can be compensated by the higher value achieved. HPC platforms tend to use a submission host machine, and the auction could be run on it without needing yet another machine, though it would place more load on the submission host.

As before, auctions take place in order to determine which job to run next. The jobs and platform nodes place bids, where a node's bid is its number of free cores. A job is allocated to the node with the most free cores first, achieving load balancing across the HPC platform. The highest-bidding job is allocated to the highest-bidding node until there are no more jobs or free cores left.
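The auction loop just described can be sketched as below; the data structures and the tuple layout are assumptions introduced for this example.

```python
# Hedged sketch of the centralised auction: jobs bid with a value (or value
# density) and each node bids with its number of free cores. The highest-
# bidding job goes to the node with the most free cores, and so on.

def run_auction(jobs, free_cores):
    """jobs: list of (name, bid, cores_needed); free_cores: {node: cores}."""
    allocations = []
    for name, bid, need in sorted(jobs, key=lambda j: j[1], reverse=True):
        node = max(free_cores, key=free_cores.get)  # highest-bidding node
        if free_cores[node] < need:
            continue        # no node can host this job: it stays queued
        free_cores[node] -= need
        allocations.append((name, node))
    return allocations

print(run_auction([("a", 10, 4), ("b", 30, 2)], {"n1": 3, "n2": 5}))
# -> [('b', 'n2')]   job "a" stays queued: no node has 4 free cores left
```

Allocating the most valuable jobs to the least loaded nodes first is what yields the load-balancing effect noted above.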


3.2.2.1 Techniques

The evaluated on-the-fly techniques are similar to those proposed in Section 3.2.1, but they do not use any design-time profiling results and do not employ the holding process (pre-emption). Without the profiling information and the holding process, the techniques are referred to as: SJQ, which queues jobs when no resource is available and processes the queued jobs in queuing order; maxV, which chooses the maximum-value queued job first for allocation; maxVD, which chooses the maximum value density queued job first; maxVD2, which chooses the maximum value-density-squared job first; and minVR, which chooses the minimum remaining-value job first. It is important to note, however, that the performance of the heuristics can depend a great deal on the exact parameters of the workload and platform.

3.2.2.2 Results and comparison

The workload size is designed to correspond to a year in the life of an HPC centre, to ensure that sufficient weekly cycles of load are present. Two simulation scenarios have been evaluated: a small-scale scenario and a large-scale scenario. In the small-scale scenario, a total of 1000 jobs to be executed over an HPC platform of two nodes with 20 execution threads each (representing 10 hyper-threaded cores) is considered. The large-scale scenario considers a total of 10,000 jobs, composed of more than 100,000 tasks, to be executed on a platform containing 4 nodes with 1000 cores each. The large-scale scenario has been included to simulate platforms that might be encountered in the future.

The arrival rates of the jobs are varied to correspond to a normal working week, with peaks of work arrival during normal working hours and quiet periods overnight and at weekends. These peaks are intended to be high enough to oversaturate the system, in order to evaluate the techniques' ability to appropriately prioritise jobs to achieve the highest value. Further, a range of load levels is examined. If the load is perfectly divisible across all cores, 100% load would represent all the cores being fully utilised (a state called saturation). Above 100% load, some jobs must starve, whereas below 100% load there is some slack time available.
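The load metric used here can be sketched as offered demand over available capacity; the function and its inputs are illustrative assumptions, not the simulator's actual code.

```python
# Rough sketch of the load metric: offered load is total requested core-time
# divided by available core-time over a horizon; 1.0 (100%) is saturation.

def offered_load(jobs, cores, horizon):
    """jobs: list of (wcet_seconds, cores_needed) arriving within `horizon`."""
    demand = sum(wcet * need for wcet, need in jobs)
    return demand / (cores * horizon)

print(offered_load([(100.0, 10), (50.0, 20)], cores=40, horizon=100.0))  # -> 0.5
```

A result above 1.0 means some jobs must starve no matter how the allocator orders them.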

It is to be expected that as load rises, the proportion of the maximum value achievable

will fall. Especially beyond saturation, some work will have to starve and so will never

realise any value. In these results, no penalties are applied for jobs that do not run; they

just do not return any value.

Overall profit at varying load: Figure 13 shows the proportion of overall profit at varying loads when the different techniques are employed for the small-scale simulation scenario. In these results, the maxVD2 policy attains the highest value across the spectrum of load (Figures 13 (a) and (b)). This is because squaring the density gives a better separation between the most and least valuable tasks, and this simulation scenario favours it.


(a) Normal View (b) Zoomed-in View

Figure 13: Value against load for the small-scale platform

Figure 14 shows the proportion of overall profit at varying loads when the different techniques are employed for the large-scale simulation scenario. This scenario is designed to represent a large industrial HPC deployment, or a platform owned by an organisation selling its capacity as cloud computing. In the large-scale results (Figures 14 (a) and (b)), maxVD dominates all the other policies across the spectrum of load and is significantly better at load levels above 100%; however, maxVD2 dominates in the load range from 80% to 100%.

(a) Normal View (b) Zoomed-in View

Figure 14: Value against load for the large-scale platform

Number of starved (uncompleted) jobs: Figure 15 shows the number of starved jobs (workflows) at varying loads for the small- and large-scale platforms. It is remarkable that, although maxVD2 and maxVD achieve the higher overall value for the small-scale and large-scale platforms respectively (Figure 13 and Figure 14), the number of jobs that each technique starves is very different indeed. For the small-scale system, maxVD2 achieves a higher overall value and a lower number of starved jobs; therefore, it should be chosen to maximize both the overall profit and user satisfaction. For


the large-scale system, even though maxVD2 has a somewhat lower overall value (Figure 14), it may be preferable, since it starves fewer jobs. Lower levels of starvation are likely to maintain higher user satisfaction.

(a) Small-scale platform (b) Large-scale platform

Figure 15: Number of starved (incomplete) jobs against load


4. TECHNIQUES FOR BIO-INSPIRED RESOURCE ALLOCATION

4.1 REVIEW OF EXISTING TECHNIQUES

Biologically inspired heuristics are processes which draw inspiration from nature and

apply the observed characteristics when solving specific computational problems. They

are often based on the characteristics of self-organizing biological systems where global

patterns emerge from interactions at a lower-level in the system [40].

Bio-inspired heuristics have been well explored to balance communication load in

networks and distributed systems [41] [42]. In this review, however, we concentrate on the problem of allocating both computation and communication loads.

4.1.1 Pheromone signalling (PS)

A bio-inspired approach to load balancing in wireless sensor networks (WSNs) is

presented by Caliskanelli et al. [43]. It is based on the behaviour of honey bees when

choosing a queen for their colony. The authors identify the similarities between the

assignments of responsibilities to members of a beehive with the distribution of a

workload across a WSN. The goal of the heuristic is to optimise the trade-off between

energy efficiency and service availability in a WSN. The proposed scheme applies a

dynamic load balancing approach based on the idea of certain nodes propagating a

virtual „pheromone‟ to make other nearby nodes aware of its presence and

responsibilities. The idea is to provide these optimisations at runtime in a completely

decentralised and self-organising fashion.

The heuristic can be summarised as follows: all nodes in the network are capable of collecting sensor readings, executing tasks and communicating with other nearby nodes. Nodes are differentiated into one of two roles at any given time: queen nodes and worker nodes. Queen nodes (QNs) are those responsible for the mapping and execution of tasks; all other nodes are worker nodes (WNs). QNs are the only nodes which will voluntarily execute tasks; WNs will only perform execution if explicitly told to do so. The level of virtual pheromone seen by a node determines whether it can differentiate itself into a QN, and there may be multiple QNs in the network.

QNs periodically propagate pheromone to their neighbours in the network. The level of

pheromone decays with each hop from the QN. For example, a node which is a direct

neighbour of the QN will experience a higher pheromone level from that particular QN

compared to one which is two hops from the QN. It is possible that a node receives

doses of pheromone from multiple QNs. These pheromone doses are accumulated into

an individual pheromone level by each node. If the pheromone level experienced by a

node drops below a pre-defined threshold, the node will become a QN. The pheromone

level of a node also decays over time, ensuring that if no pheromone doses are received within a certain amount of time, the node will become a QN. This makes the heuristic resilient to the failure of QNs and to nodes being too many hops away from any QN. It

also means that the network will have an appropriate density of QNs so as to maintain a

high level of service availability. However, it is unspecified how the pheromone level

affects nodes which have already differentiated themselves as QNs.


While parameter-rich and difficult to tune to different scenarios, PS provides a

completely decentralised approach to load balancing, as each node dynamically makes

resource allocation decisions using only information that is locally available (i.e.

pheromone level), and its computation and communication overhead is minimal: simple

periodic and event-triggered computation tasks, and lightweight communication

(pheromone packets can be only one byte long).

4.1.2 SymbioticSphere

This approach models the resource allocation problem as a bio-inspired process involving agents that mimic the lifecycle, feeding, reproduction, evolution and death of living beings [44]. Agents autonomously decide to follow so-called synergetic behaviours that allow them to migrate towards runtime platforms that are more resource-rich, or to duplicate themselves. The approach has been applied to cloud systems, aiming to optimise service availability (referred to as throughput in [44]), response time and resource efficiency (the amount of computational work performed by a given agent divided by the amount of platform resources it used to do so). The choice of synergetic behaviours is based on game-theoretical formulations that aim for stability of the agent populations, rather than optimisation of the metrics of interest. The evolution of the agents, on the other hand, is driven by a genetic heuristic. Both of these mechanisms are computationally heavy, which makes this approach suitable for large-scale systems only, and questionable for systems with timing constraints.

4.1.3 Biological Task Mapping and Scheduling

Hamouda and Phillips [45] present a bio-inspired task mapping and scheduling heuristic, "Biological Task Mapping and Scheduling" (BTMS), aiming to improve energy efficiency and performance in wireless sensor networks. The heuristic is inspired by the zygote, the human embryonic cell from which all others derive. Such cells exhibit a behaviour called differentiation, whereby cells begin to specialise to perform different functions. The work attempts to recreate this behaviour in network nodes to achieve the aforementioned goals.

BTMS uses an application model based on a directed acyclic graph (DAG) to represent the tasks and the dependencies between them. It orders tasks according to their dependencies and divides them into execution stages (or levels), which are used to guide the mapping. It assumes a homogeneous network, where all nodes begin with the same energy level. The heuristic is divided into three phases: group discovery, service provisioning and group management.

Group discovery is based on the concept of a target sensor node, which influences other nodes based on their proximity to it. A request to participate in a processing activity is broadcast from the target node to its nearby neighbours, who can choose whether to participate in the activity. Each neighbour makes its decision based on two factors. Firstly, if its energy level is below a pre-set threshold value, a node will prefer to remain a relay node to preserve its energy. The authors do not explicitly define the responsibility of relay nodes, but it can be assumed that they participate in communication activities and not in computation activities. The second decision factor


is whether the node identifies that it has sufficient neighbours to relay the data in the network. If both of these factors hold for a node, it will participate in the activity. The target sensor then performs a local election heuristic to elect a Main Node (MN), whose role is to perform the task mapping. The MN role is assigned to the node that maximises a fitness function combining two terms. The first term refers to the remaining energy of the node, where ESi is the remaining energy of the node and Emax is the maximum energy of the node. The second term refers to how close the node is to the centre of the group participating in the activity; its summation takes the total of the distances from the node to all of its neighbours. The a term is a parameter between 0 and 1 which can be tuned to place more precedence on the node's remaining energy or on its centrality to the group.

This fitness function looks for a node with a high energy level which is at the centre of the group. Intuitively, these nodes are the most appropriate for the role of MN, and this is reflected in the fitness function. The CPU power of the MN is not considered in the fitness function because the network is assumed to be homogeneous. If the heuristic were applied to a heterogeneous network, a term factoring in the processing power of the node might have to be incorporated into the fitness function; otherwise the role of MN could be given to a node that is not fast enough to perform the BTMS heuristic. A similarity can be drawn here between main nodes in BTMS and queen nodes in PS, reviewed in the previous subsection, where certain nodes are elected to have responsibility over a number of nodes in a given area. The additional responsibility often comes with an increased drain on energy, meaning an imbalance in the rate of battery drain between nodes.
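The exact fitness formula from [45] is not reproduced in the text; the sketch below assumes a convex combination of normalised remaining energy and inverse total distance to neighbours, matching the two-term description above. Treat the precise form as a hypothetical reconstruction.

```python
# Hedged sketch of a BTMS-style fitness function: `a` in [0,1] trades off
# remaining energy against centrality. The inverse-distance centrality term
# is an assumption; [45] may use a different normalisation.

def fitness(e_si, e_max, dists, a):
    energy_term = e_si / e_max          # normalised remaining energy
    centrality_term = 1.0 / sum(dists)  # closer to the group centre -> larger
    return a * energy_term + (1.0 - a) * centrality_term

# a node with more energy and smaller total distance scores higher
print(fitness(80.0, 100.0, [1.0, 1.0], a=0.5) >
      fitness(40.0, 100.0, [2.0, 2.0], a=0.5))  # -> True
```

Raising `a` towards 1 makes remaining energy dominate the election; lowering it favours centrally placed nodes.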

4.2 PROPOSED BIO-INSPIRED TECHNIQUE

Pheromone Signalling for Manycores (PSIGMA) is the DreamCloud extension to the

PS heuristic presented in [43]. It uses the pheromone signalling mechanism to allow

nodes to advertise and procure the availability of resources over a manycore platform. Cores that have high availability of resources differentiate themselves into queens (QNs) and propagate pheromones to suppress the differentiation of neighbouring cores, thereby achieving efficient load balancing.

Similarly to PS, PSIGMA has three distinct phases which are executed on every

processing core: two of them are time-triggered (differentiation cycle and decay of

pheromone) and one of them is event-triggered (propagation of received pheromone).

The first time-triggered phase, referred to as the differentiation cycle, is executed by

every node of the network every TQN time units. On each execution, the core checks its

current pheromone level hi against a predefined level thresholdQN. The core will

differentiate itself into QN (or maintain its QN status) if hi < thresholdQN; otherwise it

will become (or remain) a regular core and will handle only its individual workload. If


the node is a QN, it then transmits pheromone to its network neighbourhood to make its

presence felt. Each pheromone dose hd is represented as a two-position vector. The first

element of the vector denotes the distance in network hops to the QN that has produced

it. The second element is the actual dosage of the pheromone that will be absorbed by

the neighbours.

The event-triggered part of the heuristic deals with the propagation of the pheromone

released by QNs (as described previously in the differentiation cycle) and received at

neighbouring nodes. The purpose of propagation is to extend the influence of QNs to

nodes other than their directly connected neighbours. Propagation is not a periodic

activity and happens every time a node receives a pheromone dose. Upon receiving a

pheromone dose, a node checks whether the QN that has produced it is sufficiently near

for the pheromone to be effective. It does that by comparing the first element of hd with

a predefined thresholdhopcount. If the hd has travelled more hops than the threshold, the

core simply discards it. If not, it adds the received dosage of the pheromone to its own

pheromone level hi and propagates the pheromone to its neighbourhood. Before

forwarding it, the core updates the hd vector element by incrementing the hop count and

by multiplying the dosage by a decay factor Khopdecay. This represents pheromone

transmission decaying with distance from the source.
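The event-triggered propagation step can be sketched as below. Parameter names follow the text; the node representation and the handler shape are illustrative assumptions.

```python
# Hedged sketch of pheromone propagation: a received dose hd = [hops, dosage]
# is discarded if it has travelled more hops than the threshold; otherwise it
# is absorbed into the local pheromone level and forwarded with the hop count
# incremented and the dosage multiplied by Khopdecay.

K_HOP_DECAY = 0.25       # values taken from the parameter list in Section 4.2.1
THRESHOLD_HOPCOUNT = 2

def on_pheromone(node, hd):
    hops, dosage = hd
    if hops > THRESHOLD_HOPCOUNT:
        return None                         # producing QN too far: discard
    node["h"] += dosage                     # absorb into own pheromone level hi
    return [hops + 1, dosage * K_HOP_DECAY] # dose forwarded to neighbours

node = {"h": 0.0}
print(on_pheromone(node, [0, 14.0]))  # -> [1, 3.5]
print(node["h"])                      # -> 14.0
```

Each forwarding hop thus shrinks the dosage geometrically, which is what limits a QN's influence to its network neighbourhood.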

The second time-triggered part of the heuristic proposed in [43] is a simple periodic

decay of the pheromone level of each node. Every Tdecay time units, hi is multiplied by a

decay factor Ktimedecay. In PSIGMA, we take into account the dynamic load scenarios

that DreamCloud systems are likely to encounter, and tune the Ktimedecay factor to reflect

the current availability of resources of each core. Following the PS principles, this is

done in a completely decentralised way and using only information available locally to

each core. Each core monitors the slack of the tasks and communication activities it

performs: how early do they finish with regard to their timing constraints (i.e. soft real-

time deadlines). If slacks are high, it means that the core is underloaded, since most

tasks and communications are processed well ahead of their deadlines. In that case, the

Ktimedecay factor is increased, aiming to accelerate the process of differentiating this core

into a QN. Conversely, if slacks are low or negative, it means that the node is

overloaded and therefore should not be differentiated into QN, so Ktimedecay is decreased.

Finally, we introduce another event-triggered part to the heuristic, allowing individual

cores to tune their thresholdQN according to the local availability of resources. It uses

the same slack monitors described in the previous paragraph: if the slacks are large and

increasing, the value of thresholdQN is also increased, in order to increase the likelihood

of a differentiation into QN. Conversely, low or negative slacks will result in a decrease

of thresholdQN, which could potentially force a differentiation of a core from QN back

to a regular node. The differentiation of a QN back into a regular node due to overload

is a particularly desirable behaviour in the case of DreamCloud applications, but this

was not achievable under the baseline PS algorithm, or by tuning Ktimedecay as described

above.
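The two slack-driven adaptations just described can be sketched together as follows; the adjustment step sizes and the core representation are invented for illustration, and the direction of each adjustment follows the text.

```python
# Hedged sketch of PSIGMA's slack-driven tuning: large positive slacks mark an
# underloaded core (raise Ktimedecay and thresholdQN, per the text, so that it
# differentiates into a QN sooner); low or negative slacks do the opposite,
# potentially forcing a QN back to a regular core. Step sizes are assumptions.

STEP_DECAY = 0.05
STEP_THRESHOLD = 0.5

def tune(core, slack):
    if slack > 0:   # underloaded: encourage differentiation into QN
        core["k_timedecay"] += STEP_DECAY
        core["threshold_qn"] += STEP_THRESHOLD
    else:           # overloaded: discourage, possibly revoking QN status
        core["k_timedecay"] -= STEP_DECAY
        core["threshold_qn"] -= STEP_THRESHOLD

core = {"k_timedecay": 0.7, "threshold_qn": 9.0}   # initial values from 4.2.1
tune(core, slack=2.0)
print(core["threshold_qn"])  # -> 9.5
```

In a real deployment the adjustments would be bounded and smoothed; the point here is only that both knobs move together, driven purely by locally observed slack.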


Heuristic 3: Pheromone Signalling for Manycores (PSIGMA)

differentiation
1 every TQN do
2   if hi < thresholdQN
3     QN = true
4     broadcast hd = {0, hQN}
5   else
6     QN = false

propagation
1 when hd is received
2   if hd[hops] <= thresholdhopcount
3     hi = hi + hd[dose]
4     broadcast hd' = {hd[hops] + 1, hd[dose] × Khopdecay}
5   else
6     drop hd

decay
1 every Tdecay do
2   hi = hi × Ktimedecay
3   adjust Ktimedecay according to monitored slack

threshold tuning
1 every TQN do
2   adjust thresholdQN according to monitored slack

4.2.1 Preliminary experimental work

PSIGMA was validated within a simulated multi-stream video processing scenario,

which is similar to one of the DreamCloud case studies in task T6.3 of WP6.

We assume an open system where several video streams can be initiated by end-users,

each with distinct frame-rates, resolutions and QoS requirements. Each video stream is divided into MPEG groups of pictures (GoPs), which are periodically processed by a chain of intercommunicating tasks according to the stream's frame-rate. The proposed

PSIGMA resource allocation mechanism tries to improve the timeliness of the video processing against a baseline mechanism that allocates tasks to the cores with the lowest utilisation value U = Σi (WCETi / Ti), where WCETi is the worst-case execution time of task i and Ti is its minimum inter-arrival interval (i.e. derived from the


stream’s frame-rate). Instead, PSIGMA allocates tasks to cores that have differentiated

themselves as QN. Note that we do not consider task migration in this case study. Once

allocated, a task will be executed by the core it was assigned to, even if it is not QN

anymore by the time the task is scheduled to run. In that case, however, future jobs of

such a task (i.e. processing the next GoP of a stream) will be reallocated to a QN core.
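The baseline allocator against which PSIGMA is compared can be sketched as below; the data structures are illustrative, not the simulator's actual code.

```python
# Sketch of the lowest-utilisation baseline: each core's utilisation is
# U = sum(WCET_i / T_i) over its allocated tasks, and each new task goes to
# the core whose current U is lowest.

def utilisation(tasks):
    return sum(wcet / period for wcet, period in tasks)

def allocate(cores, task):
    """cores: {name: [(wcet, period), ...]}; task: (wcet, period)."""
    target = min(cores, key=lambda c: utilisation(cores[c]))
    cores[target].append(task)
    return target

cores = {"c0": [(2.0, 10.0)], "c1": [(1.0, 10.0)]}   # U = 0.2 vs 0.1
print(allocate(cores, (3.0, 30.0)))  # -> c1
```

PSIGMA replaces the `min`-utilisation choice with selection among the cores currently differentiated as QNs.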

The following parameters were used to tune and implement PSIGMA:

TQN = 0.072 s

Tdecay = 0.036 s

thresholdQN = 9

thresholdhopcount = 2

Khopdecay = 0.25

Ktimedecay = 0.7

hd = 14

pheromone packet size = 32 bytes

Thirty-five different scenarios were simulated, each identified by its seed number, which was used to define the computation costs of the video processing tasks, the video stream arrival times and the video resolutions. By covering a wide array of scenarios, we can

show the robustness of the proposed resource allocation mechanism, i.e. show that it

performs well for different types of workloads.

Figure 16 below shows the distribution of GoP lateness, measured in seconds, in each of

the 35 scenarios with the baseline and with the PSIGMA resource allocation. There are

clearly noticeable reductions in lateness when using PSIGMA, for example in seeds 88117, 5558, 42198 and 18065. However, degradations can be seen in seeds 74076 and 83660. Overall, the approach performs better for both average-case and worst-case GoP

lateness. Figure 17 shows boxplots of the percentage improvement in mean and maximum GoP lateness across all 35 scenarios. PSIGMA reduces the mean GoP lateness of streams processed by the system by about 8% on average, and by up to 22%. Likewise, the overall average improvement in maximum lateness is of the order of 5%.
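The percentage improvements reported above are computed per scenario from the lateness samples of the two allocators. The sketch below illustrates the calculation on made-up lateness values, not the actual simulation data.

```python
# Percent reduction of mean and maximum GoP lateness, PSIGMA vs baseline.
# The lateness samples here are illustrative, not the deliverable's data.

def pct_reduction(baseline, psigma, stat):
    """Percentage by which PSIGMA reduces a statistic of the lateness samples."""
    return 100.0 * (stat(baseline) - stat(psigma)) / stat(baseline)

baseline = [0.20, 0.35, 0.50, 0.90]   # GoP lateness (s) under the baseline
psigma   = [0.18, 0.30, 0.44, 0.85]   # GoP lateness (s) under PSIGMA

mean = lambda xs: sum(xs) / len(xs)
print(round(pct_reduction(baseline, psigma, mean), 1))  # 9.2 (% mean-lateness reduction)
print(round(pct_reduction(baseline, psigma, max), 1))   # 5.6 (% max-lateness reduction)
```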


Figure 16: MPEG GoP lateness (in seconds): baseline vs PSIGMA

Figure 17: PSIGMA MPEG GoP lateness percent reduction against baseline


5. COMPLIANCE WITH THE DYNAMIC RESOURCE ALLOCATION REQUIREMENTS

The heuristics proposed in Section 3 and Section 4 should comply with the dynamic resource allocation requirements reported in deliverable D1.2 [2], ensuring that their development remains aligned with the DreamCloud objectives. The requirements are listed in the leftmost column of Table 1, and the proposed market-inspired and bio-inspired heuristics for soft real-time dynamic resource allocation are assessed against each of them. In Table 1, a '+' sign indicates that the corresponding requirement has been fulfilled, whereas a '-' sign indicates that it has not. Superscript numbers next to a sign refer to the footnotes below the table, which mainly explain cases of partial or missing fulfilment that are to be addressed in current or future deliverables. The fulfilment status is based on the preliminary analysis and results; the analysis of some requirements is outside the scope of this deliverable and will be performed in depth in future deliverables.

Most of the requirements are fulfilled by the proposed soft real-time dynamic resource allocation heuristics, and the initial analysis and results indicate that they are promising candidates for further extension and investigation.

Table 1. Dynamic resource allocation requirement fulfilment by the proposed heuristics

Requirement                                                          SJQH   maxVH  maxVDH  minVRH  PSIGMA
Objectives of dynamic resource management should be
configurable                                                         +1,4   +1,4   +1,4    +1,4    +1,4
Dynamic resource allocation shall be used to provide
different levels of performance guarantees                           +4     +4     +4      +4      +4
The average latency of jobs shall be minimised                       +      +      +       +       +
The total energy dissipation of jobs shall be minimised              -1     -1     -1      -1      -1
Communication overhead parameters shall be predictable               +      +      +       +       +
Dynamic resource allocation overhead shall be predictable
and bounded                                                          +      +      +       +       +
The dynamic resource allocation mechanisms shall cope
with dynamic workload                                                -2     -2     -2      -2      +
The dynamic resource allocation mechanisms shall not
limit hardware scaling                                               +      +      +       +       +
The dynamic resource allocation mechanisms shall cope
with limited information about the state of the overall
system                                                               +3     +3     +3      +3      +3
The dynamic resource allocation mechanisms shall respect
mapping constraints that restrict the allowed
computational unit                                                   +4     +4     +4      +4      +4
The dynamic resource allocation mechanisms shall consider
cost, runtime and power efficiency for the different
types of resources available to a multi-typed job                    +5     +5     +5      +5      +5

1 Outside the scope of this deliverable; it will be covered in deliverable D2.3 through possible extensions of the proposed heuristics.
2 In hybrid techniques, the workload is analysed at design time under the assumption that the same workload pattern will appear at run time; the workload to be executed at different times must therefore be known in advance.
3 The heuristics use limited information to make resource allocation decisions, especially for distributed resource management.
4 Not covered explicitly by the proposed heuristics, but extending them to consider multiple objectives/constraints is straightforward.
5 Currently only one type of resource has been considered; future extensions will consider different types of resources.

REFERENCES

1. S. H. Bokhari. On the mapping problem. IEEE Transactions on Computers, 30(3):207–214, 1981

2. University of York, University of Stuttgart, D1.2 – Dynamic Resource Allocation Requirements,

DreamCloud, 2014.

3. R. I. Davis and A. Burns. A survey of hard real-time scheduling for multiprocessor systems.

ACM Comput. Surv., 43(4):35:1–35:44, October 2011

4. A. K. Singh, M. Shafique, A. Kumar and J. Henkel. Mapping on multi/many-core systems: sur-

vey of current and emerging trends. In 50th Annual Design Automation Conference, 1:1 – 1:10,

2013

5. G. Ascia, V. Catania, and M. Palesi. A multi-objective genetic approach to mapping problem on

Network-on-Chip. Journal of Universal Computer Science, 12(4):370–394, 2006

6. A. Racu and L. S. Indrusiak. Using genetic algorithms to map hard real-time NoC-based sys-

tems. In 7th International Workshop on Reconfigurable Communication-centric Systems-on-

Chip (ReCoSoC), 2012

7. S. Stuijk, T. Basten, M.C.W. Geilen, and H. Corporaal. Multiprocessor resource allocation for

throughput-constrained synchronous dataflow graphs. In 44th ACM/IEEE Design Automation

Conference, DAC ’07, pp. 777 –782, 2007

8. O. Moreira, F. Valente, and M. Bekooij. Scheduling multiple independent hard-real-time jobs on

a heterogeneous multiprocessor. In Proceedings of the 7th ACM & IEEE International Confer-

ence on Embedded Software, pp. 57–66, 2007

9. J. E. Kelley, Jr. Critical-path planning and scheduling: Mathematical basis. Operations Research,

9(3):296–320, 1961


10. T.C.E. Cheng and Q. Ding. Scheduling start time dependent tasks with deadlines and identical

initial processing times on a single machine. Computers and Operations Research, 30(1):51 – 62,

2003

11. B. Shirazi, M. Wang, and G. Pathak. Analysis and evaluation of heuristic methods for static task

scheduling. Journal of Parallel and Distributed Computing, 10(3):222 – 232, 1990

12. H. Topcuoglu, S. Hariri, and M. Wu. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems,

13(3):260–274, March 2002

13. E. Saule, D. Bozdağ, and U. V. Catalyurek. A moldable online scheduling algorithm and its

application to parallel short sequence mapping. In Eitan Frachtenberg and Uwe Schwiegelshohn,

editors, Job Scheduling Strategies for Parallel Processing, Lecture Notes in Computer Science,

vol. 6253, pp. 93–109, 2010

14. L.T. Smit, J.L. Hurink, and G.J.M. Smit. Run-time mapping of applications to a heterogeneous

SoC. In International Symposium on System-on-Chip, pp. 78 –81, 2005

15. C.-L. Chou and R. Marculescu. Incremental run-time application mapping for homogeneous

NoCs with multiple voltage levels. 5th IEEE/ACM/IFIP International Conference on Hard-

ware/Software Codesign and System Synthesis (CODES+ISSS), pp. 161 –166, 2007

16. A. Schranzhofer, J.-J. Chen, and L. Thiele. Dynamic power-aware mapping of applications onto

heterogeneous MPSoC platforms. IEEE Transactions on Industrial Informatics, 6(4):692 –707,

2010

17. A. K. Singh, A. Kumar, and T. Srikanthan. A hybrid strategy for mapping multiple throughput-

constrained applications on MPSoCs. In Proceedings of the 14th international conference on

Compilers, architectures and synthesis for embedded systems (CASES), 2011

18. A. K. Singh, T. Srikanthan, A. Kumar, and W. Jigang. Communication-aware heuristics for run-

time task mapping on NoC-based MPSoC platforms. J. Syst. Archit., 56(7):242–255, 2010

19. S.-S. Lu, C.-H. Lu, and P.-A. Hsiung. Congestion- and energy-aware run-time mapping for tile-

based network-on-chip architecture. In International Conference on Frontier Computing. Theory,

Technologies and Applications, pp. 300 –305, 2010

20. A. Beloglazov and R. Buyya. Energy efficient allocation of virtual machines in cloud data cen-

ters. In 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing

(CCGrid), pp. 577 –578, 2010

21. C. S. Yeo and R. Buyya. A taxonomy of market-based resource management systems for utility-

driven cluster computing. Software: Practice and Experience, 36(13):1381–1419, 2006

22. I. Caliskanelli, J. Harbin, L. S. Indrusiak, P. Mitchell, D. Chesmore, F. Polack. Bio-inspired load

balancing in large-scale WSNs using pheromone signalling. International Journal of Distributed

Sensor Networks, vol. 2013, Article ID 172012, 14 pages, 2013

23. R. Buyya and M. Murshed. A deadline and budget constrained cost-time optimisation algorithm

for scheduling task farming applications on global grids. arXiv:cs/0203020, March 2002. Tech-

nical Report, Monash University, 2002

24. Y. Tao and X. Yu. Classified optimization scheduling algorithm driven by multi-QoS attributes

in economical grid. In International Conference on Computer Science and Software Engineering,

volume 3, pp. 70–73, 2008

25. C. Li and L. Li. Multi-level scheduling for global optimization in grid computing. Computers &

Electrical Engineering, 34(3):202–221, 2008

26. O.O. Sonmez and A. Gursoy. A novel economic-based scheduling heuristic for computational

grids. International Journal of High Performance Computing Applications, 21(1):21–29, 2007

27. L. Xiao, Y. Zhu, L.M. Ni, and Z. Xu. Incentive-based scheduling for market-like computational

grids. IEEE Transactions on Parallel and Distributed Systems, 19(7):903–913, 2008


28. T. Theocharides, M. K. Michael, M. Polycarpou, and A. Dingankar. Hardware-enabled dynamic

resource allocation for manycore systems using bidding-based system feedback. EURASIP J.

Embedded Syst., Article 3, 21 pages, 2010

29. K. Lai. Markets are dead, long live markets. ACM SIGecom Exchanges, 5(4): 1–10, 2005

30. D. E. Irwin, L. E. Grit, and J. S. Chase. Balancing risk and reward in a market-based task ser-

vice. In Proceedings of the 13th IEEE International Symposium on High Performance Distribut-

ed Computing, pp. 160–169, 2004

31. E. D. Jensen, C. D. Locke, and H. Tokuda. A time-driven scheduling model for real-time operat-

ing systems. In IEEE Real-Time Systems Symposium, pp. 112–122, 1985

32. K. Chen and P. Muhlethaler. A scheduling algorithm for tasks described by time value function.

Real-Time Systems, 10(3):293–312, 1996

33. C. D. Locke. Best-effort decision-making for real-time scheduling. PhD thesis, Pittsburgh, PA,

USA, 1986

34. P. Li and B. Ravindran. Fast, best-effort real-time scheduling algorithms. IEEE Transactions on

Computers, 53(9):1159–1175, 2004

35. N. Bansal and K. R. Pruhs. Server scheduling to balance priorities, fairness, and average quality

of service. SIAM Journal on Computing, 39(7):3311–3335, 2010

36. S. A. Aldarmi and A. Burns. Dynamic value-density for scheduling real-time systems. In The

11th Euromicro Conference on Real-Time Systems, 1999

37. A. M. Burkimsher. Fair, responsive scheduling of engineering workflows on computing grids.

Ph.D. dissertation, UK, 2014

38. University of Stuttgart, D3.1 – Cloud Communications Patterns Analysis, DreamCloud, 2014.

39. University of York, D5.1 – Analytical Platform Model, DreamCloud, 2014.

40. S. Camazine, J.-L. Deneubourg, N. R. Franks, J. Sneyd, G. Theraulaz, and E. Bonabeau. Self-

Organization in Biological Systems. Princeton University Press, 2003

41. T. Nishitha and P. C. Reddy. Performance Evaluation of AntHocNet Routing Algorithm in Ad

Hoc Networks. In 2012 International Conference on Computing Sciences (ICCS), pp. 207–211,

2012

42. A. da Silva Rego, J. Celestino, A. dos Santos, E. C. Cerqueira, A. Patel, and M. Taghavi. BEE-

C: A bio-inspired energy efficient cluster-based algorithm for data continuous dissemination in

Wireless Sensor Networks. In 2012 18th IEEE International Conference on Networks (ICON),

pp. 405–410, 2012

43. I. Caliskanelli, J. Harbin, L. S. Indrusiak, P. Mitchell, F. Polack, and D. Chesmore. Bioinspired

Load Balancing in Large-Scale WSNs Using Pheromone Signalling. Int. J. Distrib. Sens. Netw.,

vol. 2013, p. 14, May 2013

44. P. Champrasert, J. Suzuki, and C. Lee. Exploring self-optimization and self-stabilization proper-

ties in bio-inspired autonomic cloud applications. Concurrency Computat.: Pract. Exper.,

24(9):1015-1034, 2012

45. Y. E. M. Hamouda and C. Phillips. Biological Task Mapping and Scheduling in Wireless Sensor

Networks. In Proc. IEEE Int. Conf. Communications Technology and Applications (ICCTA), pp.

914–919, 2009