2008 4th International Conference on Information and Automation for Sustainability (ICIAFS), Colombo, Sri Lanka, 12-14 December 2008

978-1-4244-2900-4/08/$25.00 ©2008 IEEE ICIAFS08

Adaptive Distributed Job Scheduler for Java Compute Servers

W.D.I.A. Kalawila and D. N. Ranasinghe

University of Colombo School of Computing, University of Colombo, Sri Lanka

Abstract – Federating a shared-memory-space style of distributed system with the basic command pattern yields a very powerful distributed systems architecture known as the Java compute server. Java compute servers combine the strengths of shared-memory-space based distributed systems with the ability of any processing node in the system to perform any activity or computation. Within this distributed environment is a set of processes known as workers, whose task is to perform the work available in the shared memory space. Workers read entries from the shared memory space at random, perform the computation, and write the results back into the space. The results are then picked up by the master process, which aggregates them into a single result if required and delivers it to the source of the task. This random selection of tasks from the shared memory space by the worker processes can lead to poor performance in the distributed system if computationally intensive tasks are taken up by slow workers on slower nodes. The solution to this problem is to implement a job scheduling framework for Java compute servers. This paper proposes an adaptive job scheduling framework which takes into account the processing capabilities of nodes and the processing requirements of tasks when scheduling. To evaluate the framework in a practical environment, the proposed scheduling framework was implemented for an image processing application.

I. INTRODUCTION

With the phenomenal growth of the internet and related network technologies, distributed processing has become more popular over the years. Introduced in 1985 by Gelernter [1], the concept of the tuple space and its associated coordination language has grown in popularity with the introduction of technologies such as Jini and Java-based implementations of the tuple space such as JavaSpaces. In tuple-based systems such as JavaSpaces [2], a process known as the master writes the task to be performed

to the shared memory space as an entry in the space. A worker that can process the entry picks it up using a mechanism known as template matching. When the worker has finished processing, it writes the results back into the shared memory space, from which the master picks them up and hands them back to the originating source. This concept has the added advantage of a built-in load balancer, reducing the need to implement a job scheduler for the system. One drawback of JavaSpaces is that only specialized types of workers can perform specific tasks, under-utilizing the resources within the distributed system. This is overcome by developing Java compute servers as an added layer on top of JavaSpaces. Compute servers leverage the command pattern, whereby the executing code is included in the task entry itself, so any worker picking up the task can perform the computation. This increases the resource utilization of the system. Because compute servers are built on top of JavaSpaces technology, they inherently lack a job scheduling component. Since every worker in the system can perform any task submitted to the space, not having a scheduler can be disadvantageous when a task requiring heavy processing is picked up by a worker with low processing capability. This issue is aggravated by the leasing mechanism used by JavaSpaces to detect worker failures. The leasing mechanism assigns each process a time limit to complete; during this period the task is not visible to other workers in the system. If execution does not complete within the specified time, it is assumed that the worker has failed during execution and the task is placed in the shared space again to be picked up

by another worker, making the earlier execution redundant. The lease time can be set programmatically for each process, but this is not guaranteed: if the system is overloaded with tasks, the master process decides the actual lease time.
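
The lease behavior described above can be illustrated with a minimal, self-contained sketch. All names here (`LeasedTask`, `LeaseMonitor`, the method names) are hypothetical illustrations, not the paper's or the JavaSpaces API's actual types; the sketch only shows how an expired lease makes a task visible again while the original worker's computation becomes redundant:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical record of a task taken from the space under a lease.
class LeasedTask {
    final String taskId;
    final long leaseExpiry;
    LeasedTask(String taskId, long leaseExpiry) { this.taskId = taskId; this.leaseExpiry = leaseExpiry; }
}

class LeaseMonitor {
    private final Deque<LeasedTask> inProgress = new ArrayDeque<>();
    private final Deque<String> space = new ArrayDeque<>();

    // A worker takes a task; it stays invisible until the lease expires.
    void take(String taskId, long now, long leaseMillis) {
        inProgress.add(new LeasedTask(taskId, now + leaseMillis));
    }

    // Expired leases put the task back into the space for another worker;
    // the slow worker's eventual result then becomes redundant work.
    int reclaimExpired(long now) {
        int reclaimed = 0;
        for (var it = inProgress.iterator(); it.hasNext(); ) {
            LeasedTask t = it.next();
            if (t.leaseExpiry <= now) {
                it.remove();
                space.add(t.taskId);
                reclaimed++;
            }
        }
        return reclaimed;
    }

    int pendingInSpace() { return space.size(); }
}
```

With an unrealistically short lease, a still-running task is reclaimed and re-queued, which is exactly the throughput loss the proposed scheduler tries to avoid.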

When unrealistic lease periods are set by the programmer or the system, system throughput is decreased through the re-computation of tasks which have already been partially computed by another worker and

that has not yet finished processing and written the result back to the space within the specified time. In order to improve the performance of Java compute servers, this issue has to be overcome. The solution is therefore to develop a job scheduler which assigns tasks with high computing requirements to workers with higher processing capability. The processing requirements of tasks can be represented by a priority schema, which the scheduler uses when distributing tasks among the workers. The rest of this paper is organized as follows. Section II describes related work on job scheduling, followed by a detailed description of the proposed job scheduling mechanism in Section III. Section IV presents the experimental evaluation of the proposed scheduler. Section V presents the conclusions and future work.

II. RELATED WORK

Job scheduling algorithms can be broadly classified by how the resources within the distributed system are identified and monitored: static scheduling and dynamic scheduling.

In static scheduling, variables such as the processors and the communication lines between collaborators are assumed to be constant. These scheduling schemes have their roots in multiprocessor environments [3], where jobs needed to be scheduled among several processors and the number of processors and the connectivity among them were dependable enough to be assumed constant.

One of the simplest static job scheduling approaches is the First-Come-First-Served (FCFS) policy, under which jobs are processed in the order in which they arrive. This model has gained popularity due to its simplicity of implementation and its built-in fairness for scheduled jobs [4]. The main problem with this type of scheduling is that application and system performance are not taken into consideration when scheduling jobs, leading to poor system performance [5]. Backfilling was proposed to overcome the low system throughput of FCFS-style scheduling [6]. Backfilling works by identifying "holes" in the two-dimensional processor-time space and moving smaller jobs forward to fill those holes. There are two common variations: conservative and aggressive. In conservative backfilling, every job is given a reservation when it enters the system, and smaller jobs are moved forward in the queue as long as they do not delay any previously queued job. In aggressive backfilling, only the job at the head of the queue has a reservation; a small job is allowed to leap

forward as long as it does not delay the job at the head of the queue. Under FCFS, the priority of a job is its wait time. Research into job scheduling has concluded that job scheduling is an NP-complete problem. To overcome this, two broad approaches have been developed: one is to base the scheduling on a heuristic, and the other is to use knowledge gained about the system beforehand to decide on the best scheduling scheme.
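
The aggressive-backfilling rule above can be sketched in a few lines. The `Job` class and the parameter names are hypothetical simplifications (user-estimated runtimes, a single pool of free cores, and a known reserved start time for the head job); a waiting job may jump the queue only if it both fits in the currently free cores and is guaranteed to finish before the head job's reserved start:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical job descriptor for the sketch.
class Job {
    final String id;
    final int cores;     // processors requested
    final long runtime;  // user-estimated runtime
    Job(String id, int cores, long runtime) { this.id = id; this.cores = cores; this.runtime = runtime; }
}

class AggressiveBackfill {
    // Only the head of the queue holds a reservation (at headStart).
    // A waiting job leaps forward iff it fits in the free cores AND
    // is guaranteed to finish before the head job's reserved start.
    static List<Job> backfill(List<Job> queue, int freeCores, long now, long headStart) {
        List<Job> started = new ArrayList<>();
        for (Job j : queue.subList(1, queue.size())) { // skip the head job
            boolean fits = j.cores <= freeCores;
            boolean noDelay = now + j.runtime <= headStart;
            if (fits && noDelay) {
                started.add(j);
                freeCores -= j.cores;
            }
        }
        return started;
    }
}
```

Conservative backfilling would instead check the candidate against the reservations of every queued job, not just the head.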

In heuristic-based scheduling, a heuristic is usually built from a combination of one or more environmental factors that affect job execution. Heuristics used in existing schedulers include load balancing, minimum execution time, and minimum completion time [7]. These schedulers do not take into consideration the communication delays apparent in distributed networks; some schedulers have taken this into account and have proposed heuristics that consider communication costs as well [8]. According to research carried out by Wang and Morris [9], load balancing plays a critical role in the overall quality of service of a distributed system. This has been the main motivation for developing dynamic scheduling schemes in which balancing the load within the system is the main criterion when scheduling jobs.
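
To make the minimum-completion-time (MCT) heuristic mentioned above concrete, here is a minimal sketch (the class and parameter names are my own, not from [7]): each arriving job is assigned to the machine that would finish it earliest, given that machine's current ready time and the job's estimated run time on it.

```java
// Sketch of the Minimum Completion Time (MCT) heuristic.
class MctScheduler {
    // readyTimes[m] = when machine m next becomes free;
    // runTimes[m]   = estimated run time of this job on machine m.
    // Returns the chosen machine index and updates readyTimes in place.
    static int assign(long[] readyTimes, long[] runTimes) {
        int best = 0;
        long bestFinish = Long.MAX_VALUE;
        for (int m = 0; m < readyTimes.length; m++) {
            long finish = readyTimes[m] + runTimes[m];
            if (finish < bestFinish) {
                bestFinish = finish;
                best = m;
            }
        }
        readyTimes[best] = bestFinish; // the machine is now busy until then
        return best;
    }
}
```

Note that MCT, unlike minimum execution time, can prefer a slower machine that is free over a faster machine that is busy, because it minimizes finish time rather than run time.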

Dynamic scheduling approaches have a load balancing algorithm, a local scheduling policy component, an information policy component, and placement and distribution policies [10]. The distribution policy determines how programs are placed on remote machines; the information it requires is supplied by the information policy of the load balancing component. Usually this information is requested from the participating nodes and supplied to the centralized scheduling system.

In dynamic scheduling, the main criterion for allocating a job to a processor is whether the allocated processor can perform the task with adequate quality of service. Most dynamic scheduling systems also contain a transfer policy, used when a higher quality of service can be rendered to the system if the job completes on another processor. In these schedulers, to distribute the load effectively within the distributed system, the scheduler requires a great deal of information about each individual processor, and this information must be collected frequently to achieve proper results. Thus the main disadvantage of such systems is the huge amount of information that must be exchanged between the processors and the scheduler. Some proposed solutions attempt to reduce this exchange by using gossip-based schemes such as Zhou's algorithm [11]. Even with such gossip-based systems, the amount of information passed between


the processors and the scheduler, while less than in traditional load balancing, still remains very high.

Other proposed approaches dynamically balance the load without using global information, instead considering the load only on neighboring processors [12]. This approach is not very successful, however, since it does not consider the state of the entire system when scheduling jobs.

III. PROPOSED SCHEDULER

Since the scheduler is implemented as an additional layer on top of the workings of Java compute servers, it must not hinder their capabilities. Thus the scheduler must achieve the following goals:

• The scheduler must not be centralized.
• Larger tasks must be assigned to workers with high-speed processors.

In order to achieve the above goals, a solution is proposed based on a scheduling mechanism in which compute-intensive tasks are scheduled on workers with high processing capability. The computing requirements of individual tasks are derived from information about prior executions of similar tasks. The scheduler itself is implemented as another task. The key feature of this task is that it remains a task to be computed throughout the entire lifespan of the compute server. This specialized task is executed when a given set of criteria has been fulfilled by the compute

server. This approach decouples the scheduler from the master: the scheduling of tasks can be performed by any worker within the system as well as by the master, which lends itself to a distributed scheduler. The mechanism used to implement the scheduler is encapsulated within three modules: the Master, the Worker and the Scheduler. A schematic overview of the framework architecture is depicted in Figure 1.

Master module

The master module represents a typical master process in a general-purpose compute server. It is responsible for breaking the high-level task down into smaller individual tasks and placing them in the shared memory space. All interaction between the master and worker processes occurs in the form of task and result entries exchanged through a single JavaSpace. The JavaSpace registers as a Jini service and relies on Jini for remote lookup during the service discovery phase; it also inherently handles all the low-level communication issues. One of the major activities carried out by the master in the proposed system is to assign every task a priority. The same priority is set for all tasks belonging to the same main task. The priority set for a task is proportional to its computational complexity, with highly compute-intensive tasks receiving high priority. The computational requirement of a task is derived from previous executions of similar tasks. If no prior knowledge of such executions is available within the system, the task is assigned a high default priority, and after the task executes, its computational requirement is stored for future reference.
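
The master's priority-assignment rule can be sketched as follows. The class name, the history map, and the default value of 1000 are all assumptions for illustration; the paper only specifies that priority is proportional to previously measured complexity and that unseen task types default to a high priority:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the master's priority assignment.
class PriorityAssigner {
    static final int DEFAULT_HIGH_PRIORITY = 1000; // assumed default for unseen task types

    // taskType -> complexity measured on a previous execution
    private final Map<String, Integer> history = new HashMap<>();

    // Priority is proportional to previously measured complexity;
    // unseen task types get a high default, as described in the text.
    int priorityFor(String taskType) {
        return history.getOrDefault(taskType, DEFAULT_HIGH_PRIORITY);
    }

    // Called after a task completes, so later submissions are informed.
    void record(String taskType, int measuredComplexity) {
        history.put(taskType, measuredComplexity);
    }
}
```

A high default is the conservative choice: an unknown task is treated as expensive until a measurement proves otherwise, which matches the paper's observation that only the first run pays a scheduling penalty.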

[Figure 1 High-level architecture of the proposed job scheduler: the Master module's master thread writes sub-tasks to the JavaSpace; the Scheduler module's scheduler thread matches WorkerSpace entries (WorkerID, Speed, e.g. ID001, 1300) against TaskSpace entries (TaskID, Priority, e.g. TID100, 200), consulting HistoryData (e.g. TaskX, 1000), and writes WorkOrder entries (WorkerID, TaskID); the Worker module's worker thread then executes the assigned sub-task.]


Worker Module

The worker module represents an actual computation unit for a task. Since the workers are generic, any worker within the distributed system can perform any computational task. When a worker initially joins the space, as well as whenever it completes a task, it records its processing capability, in the form of its processor speed, in a special entry named the WorkerSpace. This entry is a singleton within the space and contains information such as each worker's identity and processing speed; it thus holds the information for all workers that are idle and awaiting job scheduling within the system. After updating the WorkerSpace, each worker looks for a special entry carrying its worker identity, called a WorkOrder. The WorkOrder entry contains the identity of the task the worker has been assigned. The worker extracts this task identity and searches the space for a task matching it. After retrieving the task from the space, the worker performs the task-specific computation contained within the entry and writes the result back to the space. The worker then derives the complexity of the task, taking into account its processor speed and the duration of the task execution, and saves the derived complexity in a singleton entry within the space named the HistoryData. This entry effectively records the types of jobs executed within the space together with their complexity values.

Scheduler module

The scheduler logic is contained within two task entries which can be executed by either the master or a worker process within the system: the WorkerSpace entry and the TaskSpace entry. The scheduling logic in either entry is kicked off when the number of tasks awaiting processing, or the number of workers within the system, reaches a set ceiling.
When scheduling is kicked off, the scheduler takes in all the worker entries, containing worker identities and processing speeds, from the WorkerSpace singleton entry, retrieves all the tasks and their priorities from the TaskSpace singleton, and assigns the high-priority tasks to the workers with high-speed processors. When scheduling is complete, the scheduler writes an entry for each scheduled task. This entry contains two parameters, the worker identity and the task identity, and is picked up by the corresponding worker to determine which task has been scheduled on it.
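
The matching step can be sketched as a simple rank-matching pass. The paper does not give the exact pairing algorithm, so this is an assumed interpretation of "assigns the high-priority tasks to the workers with high-speed processors": sort idle workers by speed and pending tasks by priority, both descending, and pair them off into WorkOrder mappings. The class and field names mirror the WorkerSpace/TaskSpace rows of Figure 1 but are otherwise hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rows mirroring the WorkerSpace and TaskSpace singletons.
class WorkerInfo {
    final String workerId; final int speed;   // e.g. ("ID001", 1300)
    WorkerInfo(String id, int speed) { this.workerId = id; this.speed = speed; }
}
class TaskInfo {
    final String taskId; final int priority;  // e.g. ("TID100", 200)
    TaskInfo(String id, int priority) { this.taskId = id; this.priority = priority; }
}

class Scheduler {
    // Rank-matching sketch: the fastest idle worker gets the
    // highest-priority task, producing one WorkOrder (workerId -> taskId)
    // per scheduled task. Leftover tasks wait for the next round.
    static Map<String, String> schedule(List<WorkerInfo> workers, List<TaskInfo> tasks) {
        List<WorkerInfo> w = new ArrayList<>(workers);
        List<TaskInfo> t = new ArrayList<>(tasks);
        w.sort(Comparator.comparingInt((WorkerInfo x) -> x.speed).reversed());
        t.sort(Comparator.comparingInt((TaskInfo x) -> x.priority).reversed());
        Map<String, String> workOrders = new LinkedHashMap<>();
        for (int i = 0; i < Math.min(w.size(), t.size()); i++) {
            workOrders.put(w.get(i).workerId, t.get(i).taskId);
        }
        return workOrders;
    }
}
```

Because the scheduler logic itself travels as a task entry, any node that takes the entry could run this pass, which is what keeps the scheme decentralized.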

IV. EVALUATION OF FRAMEWORK

For the evaluation of the proposed system, two Java compute servers were implemented: one with the proposed scheduling framework and the other as a standard Java

compute server implementation. Several image processing algorithms were used to evaluate the effectiveness of the proposed solution, among them basic ray tracing and the Gaussian transformation. Figure 2 shows the task throughput analysis results of performing ray tracing on different frames with varying frame sizes.

Figure 2 Results of throughput analysis using ray tracing

As can be seen from Figure 2, the initial run (run 1) of each task on the scheduling-based Java compute server is expensive. This is because there is no previous information regarding the complexity of the task; on the second run of a similar task, throughput is increased due to scheduling. Run N depicts the results of running similar tasks with different data sets after several runs. The standard Java compute server without scheduling (W/O Scheduling) on average performs poorly due to under-utilization of resources. Figure 3 depicts the throughput analysis of running ray tracing and the Gaussian transformation in parallel on varying images with different data set sizes. As with the previous results, the initial run (run 1) has a low throughput; it increases in the second run (run 2) with the same data set, and by the Nth run task throughput is increased even with differing data sets.


Figure 3 Results of throughput analysis using different algorithms

Figure 4 Throughput improvement for different datasets and different algorithms

Figure 4 illustrates the throughput improvement of the proposed scheduler as a percentage relative to the throughput of the standard Java compute server. As can be seen from the figure, throughput is greatly improved in all scenarios. From the experimental results above it is evident that the introduction of the job scheduler does not degrade the system drastically even in the worst-case scenario, and that when information regarding previous executions is available, task throughput is increased.

V. CONCLUSION

This paper presented the design, implementation and evaluation of a distributed adaptive job scheduler for Java compute servers. The proposed job scheduler is a lightweight process which enhances the capabilities of the underlying Java compute server. The scheduler takes into account the computational requirements of individual tasks when scheduling jobs: jobs with high computational requirements are scheduled on the workers with the fastest processors in order to increase the throughput of the application.

The experimental evaluation of the proposed job scheduler shows that application throughput is increased when prior information is available regarding the computational requirements of specific jobs. The experimental results further confirm that, without prior information, application throughput is not degraded drastically by the introduction of the job scheduler. As future work, the job scheduler can be improved by introducing a more comprehensive mechanism to calculate the computational requirements of jobs, taking into account the communication costs associated with the processing.

REFERENCES

[1] D. Gelernter, "Generative communication in Linda," ACM Transactions on Programming Languages and Systems, 7(1):80-112, 1985.

[2] Sun Microsystems, JavaSpaces, www.sun.com/software/jini/, 2007.

[3] S. Jain and S. Meeran, "Deterministic job-shop scheduling: past, present and future," Department of Applied Physics, Electronic and Mechanical Engineering, University of Dundee, 1998.

[4] U. Schwiegelshohn and R. Yahyapour, "Analysis of First Come First Serve parallel job scheduling."

[5] Ernemann, Hamscher, Schwiegelshohn and Yahyapour, "On advantages of Grid computing for parallel job scheduling," Computer Engineering Institute, University of Dortmund.

[6] Srinivasan, Kettimuthu, Subramani and Sadayappan, "Characterization of backfilling strategies for parallel job scheduling," Department of Computer and Information Science, The Ohio State University.

[7] Abraham, Buyya and Nath, "Nature's heuristics for scheduling jobs on computational Grids," School of Computing and Information Technology, Monash University, Gippsland Campus.

[8] Giersch, Robert and Vivien, "Scheduling tasks sharing files on heterogeneous master-slave platforms."

[9] Y. Wang and R. Morris, "Load balancing in distributed systems," IEEE Transactions on Computers, C-34(4):204-217, March 1985.

[10] S. Zhou, "A trace-driven simulation study of dynamic load balancing," Computer Science Division, University of California, Berkeley.

[11] S. Zhou, "A trace-driven simulation study of dynamic load balancing," IEEE Transactions on Software Engineering, 14(9):1327-1341, Sep. 1988.

[12] M. Smith, "A survey of process migration mechanisms," ACM Operating Systems Review, 22(3):28-40, Jul. 1988.
