Adaptive job scheduling with load balancing for workflow application

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367 (Print),
ISSN 0976 – 6375 (Online), Volume 2, Number 1, Dec - Jan (2011), pp. 09-21
© IAEME, http://www.iaeme.com/ijcet.html

ADAPTIVE JOB SCHEDULING WITH LOAD BALANCING

FOR WORKFLOW APPLICATION IN GRID PLATFORM

D.Daniel

PG Scholar

Karunya University

E-Mail: [email protected]

Mrs.S.P.Jeno Lovesum M.E

Asst.Professor

Karunya University


D.Asir

PG Scholar

Karunya University


A.Catherine Esther Karunya

PG Scholar

Karunya University


ABSTRACT

Grid computing serves as a globally connected system that performs high-performance
computing in many practical applications. Scheduling plays a key role in the
performance of grid workflow applications. Various scheduling strategies have been proposed,
including static strategies, which map jobs to resources before execution time,
and dynamic alternatives, which schedule an individual job only when it is ready to execute.
Both kinds of strategy incur a significantly high scheduling cost and may not
produce a good-quality schedule at low cost. This paper proposes a novel semidynamic
algorithm with load balancing, which allows the schedule to adapt to changes in the
dynamic grid environment. The proposed algorithm first schedules the jobs statically and
then continues with dynamic scheduling to cope with the dynamic nature of the grid.
Makespan and resource usage are the two main objectives of this scheduling algorithm.
When resource and performance fluctuations occur in the grid environment, they affect the
processing of the jobs and delay the job completion time. The algorithm incorporates load
balancing to handle such situations after the jobs have been dispatched to their
respective hosts. When resource fluctuation, caused by the dynamic nature of the grid or by
the overloading of a processor, delays the makespan, load balancing is performed to keep
the jobs executing and to achieve the desired makespan.

Index Terms: DAG, Tasks, Makespan, Resource usage, Semidynamic scheduling.

I. INTRODUCTION

Grids are geographically distributed computing systems in which a variety of resources,
often dispersed geographically, are interconnected and shared to address scientific and
engineering challenges; the majority of such applications fall into the interdependent
task model. Due to the growing popularity of grid computing systems, many applications
have been attempting to take advantage of these computing environments. Such applications
are generally constructed by interweaving interdependent jobs and are known as workflow
applications [4]. Workflow applications are essentially the same as typical parallel
programs, with one exception: a workflow application consists of a set of interdependent
applications (not partitioned tasks of a parallel program). Like conventional parallel
programs, workflow applications can be represented by a directed acyclic graph (DAG). A
DAG, G = (V, E), consists of a set V of v nodes and a set E of e edges. A DAG is also known
as a task graph or macro dataflow graph. The nodes usually represent jobs of a workflow
application, and the edges usually represent precedence constraints. An edge (i, j) ∈ E
between job n_i and job n_j represents the inter-job communication. Specifically, the
outputs of job n_i must be transmitted to job n_j for job n_j to start its execution. A job
with no predecessors is called an entry job, n_entry; an exit job, n_exit, is one that has
no successors. Among the predecessors of a job n_i, the predecessor that completes its
communication at the latest time is the most influential parent (MIP) of the job, denoted
MIP(n_i). A job is called a ready job if all of its predecessors have been completed. The
longest path of a task graph is the critical path (CP) [1].
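
The definitions above can be illustrated with a small sketch. The five-job graph, its
cost values, and the helper names (ready_jobs, critical_path_length) are hypothetical and
chosen only for illustration; the path length is assumed here to include both computation
and communication costs.

```python
from collections import defaultdict

# Hypothetical workflow: job -> computation cost (illustrative values only).
cost = {"n1": 4, "n2": 3, "n3": 5, "n4": 2, "n5": 6}
# Edge (i, j) -> cost of transmitting the output of n_i to n_j.
edges = {("n1", "n2"): 1, ("n1", "n3"): 2, ("n2", "n4"): 1,
         ("n3", "n4"): 3, ("n4", "n5"): 1}

# Build predecessor and successor lists from the edge set E.
preds, succs = defaultdict(list), defaultdict(list)
for (i, j) in edges:
    succs[i].append(j)
    preds[j].append(i)

entry_jobs = [n for n in cost if not preds[n]]   # jobs with no predecessors
exit_jobs = [n for n in cost if not succs[n]]    # jobs with no successors


def ready_jobs(completed):
    """Jobs not yet run whose predecessors have all completed."""
    return [n for n in cost
            if n not in completed and all(p in completed for p in preds[n])]


def critical_path_length(job):
    """Cost of the longest path from `job` to an exit job."""
    if not succs[job]:
        return cost[job]
    return cost[job] + max(edges[(job, s)] + critical_path_length(s)
                           for s in succs[job])
```

In this graph n1 is the only entry job and n5 the only exit job; once n1 has completed,
n2 and n3 become ready, while n4 must wait for both of them.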

Workflow applications can take advantage of a grid computing platform;
however, these applications, in addition to resource heterogeneity and dynamism,
impose a great burden on scheduling. In some systems, workflow scheduling is left
to manual dispatch by users, while other systems employ automated workflow
management platforms (WMPs) [1]. These WMPs tend to focus on minimizing
the application’s completion time. However, there are other important performance
considerations for WMPs, such as resource usage, load balancing, and fault tolerance.
Although some WMPs have facilities to deal with these considerations, they often lack
the capability of explicit resource usage control. Rather, for the sake of fault tolerance,
resources are overused (for example, through task duplication).

Job scheduling is closely related to load balancing. Given a set of jobs and
resources, load balancing can be performed in two ways: prediction based and
non-prediction based. The prediction-based approach collects in advance the number of jobs
it has to schedule and the amount of resources, i.e., processors, available. Job
scheduling then takes the availability of the resources into account: the load is
distributed evenly across all resources according to their status, and the scheduled jobs
are dispatched to the hosts [11]. The non-prediction-based approach has no prior
information about the resources or the jobs. It schedules the jobs according to its
scheduling strategy and dispatches them to the hosts; as the availability of the resources
changes dynamically, the load is migrated and the jobs are executed. In this case, the load
is balanced dynamically.

The rest of the paper is organized as follows. Section II explains the related works,
Section III gives a detailed presentation of adaptive scheduling, Section IV gives the
system model, Section V describes the proposed scheduling with load balancing, and
Section VI gives the comparison and evaluation of the proposed scheduling, followed by
the conclusion and future work in Section VII.

II. RELATED WORKS

Since workflow scheduling in grids is in many respects similar to the
conventional task scheduling problem in tightly coupled heterogeneous computing
systems (e.g., clusters), some well-known task scheduling algorithms (e.g., HEFT) have
been adopted and modified for grid workflow scheduling. Most of these algorithms are
designed to cope with the dynamic nature of the grid. More specifically,
rescheduling and advance reservation, among other techniques, are often used to deal with
uncertainties in resource performance. Most job scheduling approaches adapted from
traditional task scheduling algorithms fall into two categories: look-ahead and
just-in-time. The major difference between these two categories is whether
scheduling decisions are made before the actual job dispatch or at the time any ready jobs
are identified, i.e., when their predecessors have all been completed. Clearly, for
look-ahead approaches, the acquisition of accurate performance information on resources
plays a critical role in decision making [9]. One major drawback of just-in-time approaches
is the loss of timely data transfers. For example, if a job has three
predecessors that complete at different times, the data transfers from these
predecessors to the job start only when the last predecessor completes its execution.
The time between the completion of each of the first two predecessors and the completion
of the last predecessor is therefore wasted.
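
The three-predecessor example can be put into numbers with a short sketch. The finish
times below are hypothetical, and the names pred_finish, transfer_start, and idle are
introduced only for this illustration.

```python
# Hypothetical finish times of a job's three predecessors (time units).
pred_finish = {"p1": 10, "p2": 14, "p3": 25}

# Under a just-in-time policy as described above, no data transfer starts
# before the last predecessor has finished.
transfer_start = max(pred_finish.values())

# Time each predecessor's output waits before its transfer begins.
idle = {p: transfer_start - t for p, t in pred_finish.items()}
```

With these values the outputs of p1 and p2 wait 15 and 11 time units respectively,
time that a look-ahead scheduler could have used to start their transfers early.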

The challenge of scheduling grid workflow applications with a static strategy is
discussed in many studies, but few research efforts address it. Rescheduling is
implemented in GrADS, where it is normally activated by a contract violation.
However, those efforts are all conducted for iterative applications, allowing the system
to make rescheduling decisions at each iteration. The plan switching approach
constructs a family of activity graphs and investigates the means of switching from one
member of the family to another when the execution of one activity graph fails. Its
major drawback is that all the plans are made without knowledge of future
environment changes, since the grid does not ensure a stable computing environment [5].
Another rescheduling policy considers rescheduling at a few
carefully selected points during the execution. That research tackles one of the
shortcomings of static scheduling, namely that it always assumes accurate prediction of job
performance. After the initial schedule is made, it selectively reschedules some jobs if the
run-time performance variance exceeds a predefined threshold. However, this approach
deals only with inaccurate estimation and does not consider changes in the resource
pool [10].

Since the majority of the tasks that grid computing handles are interdependent and
most of them are workflow applications, scheduling must concentrate on resource
usage. To use resources in a well-organized way, the scheduler must know detailed
information about each resource, not just the amount of resources. At a minimum, the
scheduler needs the amount of resources available in the grid to schedule the n
jobs. To produce an effective schedule, the scheduler also needs the status of each
resource: its processor speed, how much time is left before it starts executing another
job, how many jobs it can handle in a given amount of time, etc. The adaptive scheduling
strategy is framed on this information: when the schedule does not produce optimal
performance, or when the availability of the resources changes dynamically, the
scheduler adapts to the situation and reschedules the jobs so that they complete execution.

III. ADAPTIVE SCHEDULING STRATEGY

Even though static and dynamic scheduling can perform near optimally, their
effectiveness in a dynamic grid environment is questionable. The proposed semidynamic
strategy is a novel adaptive job scheduling algorithm with load balancing, by which the
workflow scheduler can adapt to grid dynamics and realize its strengths in practice.

A. Issues with Traditional Scheduling

Planning is a one-time activity in traditional static scheduling. Static
scheduling does not consider future changes in the grid environment after the resource
mapping is made. Rescheduling in the execution phase has been proposed, but it is
mainly used to support fault tolerance. Overall, the issues with traditional static
scheduling are: (1) accuracy of the estimation of communication and computation costs,
(2) adaptation to a dynamic environment, and (3) separation of the workflow scheduler
from the executor. Fundamentally, the first two issues are related to the lack of
collaboration between the workflow scheduler and the executor. With collaboration, a
scheduler is aware of changes in the grid environment, including job performance variance
and resource availability, and is able to adaptively reschedule based on increasingly
accurate estimations. This approach can both continuously improve performance by
considering new resources and minimize the impact caused by unexpected resource downgrade
or unavailability [9]. The main issue with dynamic scheduling in workflow applications is
the execution procedure of the interdependent tasks: the output of one job can be the
input of another job. In dynamic scheduling, jobs may be executed in any feasible order,
with scheduling decisions driven by the available resources.

Figure 3.1 Classification of Static scheduling

B. Adaptive Scheduling

The basic idea of adaptive scheduling is as follows: for a given DAG and a set of
currently available resources, the scheduler makes the initial resource mapping as any
traditional static approach does [5]. In addition, the scheduler receives updates from the
executor about resource information, such as:

Resource Pool Change: If a new resource is discovered after the current plan is
made, rescheduling may reduce the makespan of a DAG by taking the added resource into
account. When a resource fails, the fault-tolerance mechanism is triggered and the failure
is handled by the Executor. However, if the failure is predictable, rescheduling can
minimize its impact on overall performance.

Resource Performance Variance: The accuracy of performance estimation depends
largely on historical data, and inaccurate estimation leads to a bad schedule. If
the run-time Performance Monitor notifies the scheduler of any significant
performance variance, the scheduler, along with the predictor, evaluates its impact and
reschedules if necessary.

The scheduler reacts to an event by evaluating whether the makespan can be reduced by
rescheduling. For example, if a new resource becomes available, the scheduler
evaluates whether a new schedule that takes the extra resource into consideration can
produce a smaller makespan [7]. If so, the scheduler replaces the current schedule with
the new one by submitting it to the Executor.
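
A minimal sketch of this reaction to a resource-addition event is given below. It is not
the scheduler described in [7]: the greedy earliest-finish planner, the job and host
names, and the functions plan and on_resource_added are assumptions made for illustration,
and precedence constraints between jobs are ignored to keep the example short.

```python
def plan(jobs, speeds):
    """Greedy earliest-finish mapping; returns (mapping, makespan)."""
    free_at = {h: 0.0 for h in speeds}   # time at which each host becomes free
    mapping = {}
    # Place the largest jobs first, each on the host that finishes it earliest.
    for job, work in sorted(jobs.items(), key=lambda kv: -kv[1]):
        host = min(free_at, key=lambda h: free_at[h] + work / speeds[h])
        free_at[host] += work / speeds[host]
        mapping[job] = host
    return mapping, max(free_at.values())


def on_resource_added(jobs, speeds, current):
    """Re-plan with the enlarged pool; keep the old plan unless makespan shrinks."""
    candidate = plan(jobs, speeds)
    return candidate if candidate[1] < current[1] else current


jobs = {"a": 8, "b": 6, "c": 4}                 # hypothetical remaining work per job
current = plan(jobs, {"h1": 1.0, "h2": 1.0})    # initial plan on two hosts
# A faster host h3 is discovered after the initial plan was made.
updated = on_resource_added(jobs, {"h1": 1.0, "h2": 1.0, "h3": 2.0}, current)
```

Here the initial plan has a makespan of 10 time units; with the extra host the candidate
plan finishes in 6, so it replaces the current one. Had the candidate been no better, the
current plan would have been kept and nothing resubmitted.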

IV. SYSTEM MODEL

This paper proposes an adaptive rescheduling approach that can both continuously
improve performance by considering new resources and minimize the impact caused
by unexpected resource downgrade or unavailability.




A grid consists of a number of sites. Each site is autonomous; it has its own
users and global users, and its hosts are time shared and resource shared. The hosts of
the same group or organization are clustered together and termed a site. The resources of
a cluster or site can be accessed by the hosts of the same site as their own. Complexity
arises when a resource has to be accessed from another site. An administrator is
allocated for every site and deploys access policies, access rights, processor
allocation, etc.; these differ for every organization. A workflow
application has n interrelated jobs in one task. The time from the start of the first job
of the task to the finish of the nth job is termed the makespan. These jobs are
represented by a DAG (directed acyclic graph): each node is a job and the edges denote the
relations between jobs. Each cluster has a load scheduler, which performs load balancing
and also reports the status of the resources to the job scheduler; the job scheduler
maintains this information in a history repository for future job scheduling.

Figure 4.1 System Design

V. ADAPTIVE SCHEDULING WITH LOAD BALANCING FOR WORKFLOW APPLICATIONS

While the task scheduling problem in heterogeneous computing systems remains
very difficult even with perfectly accurate performance information on resources and
applications, uncertainties in resource performance and the lack of control over grid
resources make workflow scheduling even more complex. Unlike many other workflow
scheduling schemes, we consider makespan and resource usage to be equally
important and take both into account in our scheduling model. Efficient resource usage is
crucial in grid scheduling because 1) a grid consists of multiple sites administered by
different entities that use their own resources for other tasks besides the grid jobs and
2) due to the fluctuations and uncertainty surrounding sites in a grid system, lower
resource usage (not necessarily the minimization of the number of resources used, but
rather the minimization of resource time) means lower overall variance in the expected
completion time (makespan) of an application [1].

A. Job Scheduling

To start with, a static schedule is built from the predefined tasks and their
execution procedures. The tasks are scheduled statically based on their priority; the
priority of each task is set to its upward rank value, rank_u, which is based on the mean
computation and communication costs. The task list is created by sorting the tasks in
decreasing order of rank_u. Tie-breaking is done randomly. There can be alternative
policies for tie-breaking, such as selecting the task whose immediate successor tasks have
higher upward ranks. Since these alternative policies increase time complexity, the random
selection strategy is preferred [7]. The task list built from these priorities is taken
as S* (the current schedule).
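
The priority computation can be sketched as follows, using the upward rank as defined
for HEFT [5]: the rank of a job is its mean computation cost plus the largest sum of
communication cost and upward rank over its immediate successors. The four-job graph and
all cost values are hypothetical, and the list is ordered by non-increasing rank_u as in
HEFT, so that every job precedes its successors.

```python
from functools import lru_cache
from statistics import mean

# Hypothetical computation cost of each job on each of two hosts.
comp = {"n1": [4, 6], "n2": [3, 5], "n3": [6, 4], "n4": [2, 2]}
# Hypothetical mean communication cost on each edge (i, j).
comm = {("n1", "n2"): 2, ("n1", "n3"): 1, ("n2", "n4"): 3, ("n3", "n4"): 4}

# Immediate successors of each job, derived from the edge set.
succs = {n: [j for (i, j) in comm if i == n] for n in comp}


@lru_cache(maxsize=None)
def rank_u(job):
    """Upward rank: mean cost of `job` plus the costliest path to an exit job."""
    tail = max((comm[(job, s)] + rank_u(s) for s in succs[job]), default=0)
    return mean(comp[job]) + tail


# Highest rank first; ties (none in this example) would be broken at random.
task_list = sorted(comp, key=rank_u, reverse=True)
```

For these values the ranks are 17, 9, 11, and 2 for n1 to n4, giving the task list
n1, n3, n2, n4.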

At the end of each iteration, mutation is considered if no improvement on S* has been
made during the current iteration. The scheduler randomly chooses a mutation method
between point and swap mutations and mutates each job in S* with a probability of 0.5,
which is sufficient to generate substantially different schedules. The mutated schedule is
then used as the current schedule (S*). If there has been some improvement on S* in the
current iteration, S* is passed on to the next iteration for further improvement. This
schedule manipulation process repeats for a predefined number of iterations. Then, jobs in
the current schedule S* are dispatched to their assigned hosts as they become ready, i.e.,
when their predecessor jobs have finished [1].
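
The mutation step might look like the sketch below. The text does not define the two
operators, so their meaning here is an assumption: a point mutation reassigns a job to a
randomly chosen host, and a swap mutation exchanges the host assignments of two jobs. The
function name mutate and the job-to-host dictionary representation of S* are likewise
hypothetical.

```python
import random


def mutate(schedule, hosts, rng, p=0.5):
    """Return a mutated copy of `schedule` (a job -> host mapping)."""
    new = dict(schedule)                       # leave the input schedule untouched
    jobs = list(new)
    method = rng.choice(["point", "swap"])     # one method per mutation pass
    for job in jobs:
        if rng.random() >= p:                  # each job mutates with probability p
            continue
        if method == "point":
            new[job] = rng.choice(hosts)       # assumed: move job to a random host
        else:
            other = rng.choice(jobs)           # assumed: exchange hosts of two jobs
            new[job], new[other] = new[other], new[job]
    return new


rng = random.Random(0)                         # seeded only for repeatability
s_star = {"n1": "h1", "n2": "h2", "n3": "h1", "n4": "h3"}
mutated = mutate(s_star, ["h1", "h2", "h3"], rng)
```

The mutated copy would then be evaluated and, if adopted, used as the current schedule in
the next iteration.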

B. Cluster Based Dynamic Load Balancing

In grid computing, resources are globally distributed, and resources located together
geographically are termed clusters. A cluster is a group of
processors in an organization or a LAN (Local Area Network). A number of clusters group
together and perform the computation. The user can access any number of resources from
anywhere at any time. Each cluster has a scheduler that holds information about its
resources or processors: their current computation, the number of jobs they have executed,
the amount of resources they hold, etc. The scheduler also has the details of the
neighboring clusters. The clusters communicate with one another to balance the load
dynamically and to execute the jobs effectively [11].

Once the scheduled jobs have been dispatched to the hosts, the next step is load
balancing. When the execution of a job is delayed, the Actual Latest Finish Time (ALFT) of
the job is calculated. When the delay is less than the ALFT, the load is balanced. If the
delay is greater than the ALFT, the scheduler communicates with the neighboring cluster,
and the load is dynamically allocated to the processor that is best suited to execute the
remaining job. Migration is used to move the job from one resource to another.
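
One possible reading of this decision rule is sketched below. It rests on assumptions
that the text does not state: the comparison is taken to be between the job's projected
finish time (delay included) and its ALFT, and the best-suited processor is taken to be
the neighboring host with the earliest estimated finish time. The function name, the
return values, and the host names are hypothetical.

```python
def balance(projected_finish, alft, neighbor_estimates):
    """Decide how to handle a delayed job.

    projected_finish: estimated finish time on the current host, delay included.
    alft: the job's Actual Latest Finish Time.
    neighbor_estimates: host -> estimated finish time if the job migrated there.
    """
    if projected_finish <= alft:
        # The delay is tolerable: rebalance without contacting other clusters.
        return ("rebalance", None)
    # The delay exceeds the ALFT: migrate to the neighboring host that is
    # estimated to finish the remaining job earliest.
    target = min(neighbor_estimates, key=neighbor_estimates.get)
    return ("migrate", target)


neighbors = {"c2-h1": 55, "c2-h2": 48}   # hypothetical estimates from a neighbor
on_time = balance(40, 50, neighbors)
late = balance(60, 50, neighbors)
```

With an ALFT of 50, a projected finish of 40 needs no migration, while a projected finish
of 60 moves the job to c2-h2, the host with the earliest estimate.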

VI. EVALUATION AND COMPARISON

The HEFT-based adaptive rescheduling algorithm (AHEFT), which follows the adaptive
scheduling strategy, has the advantage that it continuously improves performance by
considering new resources and minimizes the impact caused by unexpected resource
downgrade or unavailability [7]. The drawbacks of the adaptive rescheduling technique are
that rescheduling the jobs takes time and that, to implement the collaboration model,
rescheduling has to be integrated with advance resource reservation and a resource
availability prediction model. GridFlow offers the advantage of a grid performance service
that combines performance prediction capability with a new application response
measurement technique [2], which can be used to enable prediction-based scheduling as well
as response-based scheduling. Its disadvantage is that the process of a grid workflow
encompasses multiple administrative domains (organizations) [7]. The lack of central
ownership and control results in incomplete information, and computational and networking
capabilities can vary significantly over time in the grid environment. Application
performance prediction becomes difficult and real-time resource information updates within
a large-scale global grid become impossible, which lowers its performance.




Figure 5.1 The structure of adaptive semidynamic scheduling strategy with Load

Balancing

The Critical-Path-on-a-Processor (CPOP) algorithm and the Heterogeneous Earliest Finish
Time (HEFT) algorithm give more or less the same performance: high
performance and fast scheduling time. However, both have the slight disadvantage of high
scheduling cost [4]. The Duplication-based Bottom-Up Scheduling (DBUS) algorithm uses both
task insertion and task duplication, and it helps minimize the schedule length
[6]. It does not impose any restriction on the number of task duplications, and task
duplication mainly causes an increase in resource usage, which is a disadvantage of the
algorithm. The ADOS algorithm has the highest potential for reducing the makespan of the
task, which is the total amount of time required from the start of the first job to the
end of the last job. It also reduces resource usage. Its disadvantage is that the algorithm
itself is complicated, which makes each iteration more complex when selecting the best
schedule. The adaptive scheduling with load balancing strategy provides the best
performance on both parameters: it reduces the makespan of the jobs and
it makes effective use of resources. The concept provides a less complex static and dynamic
scheduling strategy and a clear condition for when load balancing occurs, which
lets the scheduling have less scheduling time.

VII. CONCLUSION

In this paper, the scheduling of workflow applications in grids is addressed.
Unlike many previous scheduling approaches for such a class of applications, the
semidynamic scheduling strategy takes into account both makespan and resource usage.
The schedule achieves the two objectives by effectively combining a static heuristic
scheduling scheme with a dynamic scheduling technique with load balancing. Based on
the research and study conducted, the resource-usage-conscious
scheduling scheme significantly improves resource utilization without sacrificing too
much makespan. The load balancing strategy incorporated into scheduling helps
ensure the quality of schedules against performance fluctuations of grid resources.

Future work will be the implementation of the proposed adaptive
scheduling with load balancing concept for workflow applications on the grid
environment. The results will be collected and compared, and a complete performance
evaluation study will be conducted to determine the performance of the
schedule on the grid platform.

REFERENCES

1. Y.C. Lee, R. Subrata, and A.Y. Zomaya, “On the Performance of a Dual-Objective
Optimization Model for Workflow Applications on Grid Platforms,” IEEE Trans.
Parallel and Distributed Systems, vol. 20, no. 9, Sept. 2009.

2. J. Cao, S.A. Jarvis, S. Saini, and G.R. Nudd, “GridFlow: Workflow Management
for Grid Computing,” Proc. Third IEEE/ACM Int’l Symp. Cluster Computing and
the Grid (CCGrid ’03), pp. 198-205, 2003.

3. H. Casanova, “Simgrid: A Toolkit for the Simulation of Application Scheduling,”
Proc. First IEEE/ACM Int’l Symp. Cluster Computing and the Grid (CCGrid ’01),
pp. 430-437, 2001.

4. R. Wolski, “Dynamically Forecasting Network Performance Using the Network
Weather Service,” Proc. Sixth IEEE Int’l Symp. High Performance Distributed
Computing (HPDC ’97), pp. 316-325, 1997.

5. H. Topcuoglu, S. Hariri, and M. Wu, “Performance-Effective and Low-Complexity
Task Scheduling for Heterogeneous Computing,” IEEE Trans.
Parallel and Distributed Systems, vol. 13, no. 3, pp. 260-274, Mar. 2002.

6. D. Bozdag, U. Catalyurek, and F. Ozguner, “A Task Duplication Based Bottom-Up
Scheduling Algorithm for Heterogeneous Environments,” Proc. 19th Int’l
Parallel and Distributed Processing Symp. (IPDPS ’05), Apr. 2005.

7. Z. Yu and W. Shi, “An Adaptive Rescheduling Strategy for Grid Workflow
Applications,” Proc. 21st Int’l Parallel and Distributed Processing Symp.
(IPDPS), 2007.

8. Y. Gil, V. Ratnakar, E. Deelman, G. Mehta, and J. Kim, “Wings for Pegasus:
Creating Large-Scale Scientific Applications Using Semantic Representations of
Computational Workflows,” Proc. 19th Conf. Innovative Applications of Artificial
Intelligence (IAAI ’07), pp. 1767-1774, 2007.

9. M. Wieczorek, R. Prodan, and T. Fahringer, “Scheduling of Scientific Workflows
in the ASKALON Grid Environment,” ACM SIGMOD Record, vol. 34, no. 3, pp.
56-62, Sept. 2005.

10. G. Singh, E. Deelman, G. Mehta, K. Vahi, M.-H. Su, G.B. Berriman, J. Good, J.C.
Jacob, D.S. Katz, A. Lazzarini, K. Blackburn, and S. Koranda, “The Pegasus
Portal: Web Based Grid Computing,” Proc. 20th Ann. ACM Symp. Applied
Computing (SAC ’05), pp. 680-686, 2005.

11. A. Mondal, K. Goda, and M. Kitsuregawa, “Effective Load Balancing via
Migration and Replication in Spatial Grids,” LNCS 2736, pp. 201-211, 2003.

12. B.A. Shirazi, A.R. Hurson, and K.M. Kavi, Scheduling and Load Balancing in
Parallel and Distributed Systems. IEEE CS Press, 1995.

13. A.M. Dobber, G.M. Koole, and R.D. van der Mei, “Dynamic Load Balancing
Experiments in a Grid,” Proc. Fifth IEEE Int’l Symp. Cluster Computing and the
Grid (CCGrid ’05), pp. 123-130, 2005.




D. Daniel received the B.E. degree in Information Technology from
Karunya University in 2009. He is currently pursuing his postgraduate degree at Karunya
University, where he works on a project in parallel and distributed systems and continues
research on adaptive scheduling techniques in grid computing.

D. Asir received the B.E. degree in Information Technology from
Anna University in 2009. He is currently pursuing his postgraduate degree at Karunya
University, where he works on a project in parallel and distributed systems and continues
research on dynamic load balancing techniques in grid computing.

Mrs. S.P. Jeno Lovesum, Asst. Professor, completed her Master of
Engineering (CSE) at Annamalai University, Chidambaram, and is doing her research in
Cloud Computing.

A. Catherine Esther Karunya received the B.E. degree in
Information Technology from Karunya University in 2009. She is currently pursuing her
postgraduate degree at Karunya University, where she works on a project in computing
security and continues her research in networking.
