Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
ADAPTIVE INTELLIGENT GRID SCHEDULING SYSTEM
SIRILUCK LORPUNMANEE
A thesis submitted in fulfilment of the
requirements for the award of the degree of
Doctor of Philosophy (Computer Science)
Faculty of Computer Science and Information System
Universiti Teknologi Malaysia
JUNE 2010
iii
Dedicated to my beloved mother Peksim Sae-Khow,
and father Liang Kuang Sae-Lor
iv
ACKNOWLEDGEMENTS
This thesis could not be successfully completed without the kindness of both of
my supervisors. First and foremost to my late supervisor, Associate Professor Dr.
Mohd Noor Md Sap for introducing me to data mining research area. I want to
express my deep gratitude to my supervisor for his support and guidance throughout
my research. I am very grateful for the inspirational discussions that led to this thesis
to success. The special thanks to my co-supervisor, Professor Dr. Abdul Hanan
Abdullah, who has a good guidance during my experimentation period, and for his
appreciated suggestions. Additionally, I thank him for helping me in correcting my
thesis.
I would also like to thank all the staff in the Faculty of Computer Science and
Information Systems at Universiti Teknologi Malaysia for their support specially in
providing facilities and materials required for my research study.
I must express my sincerest and heartiest thanks to Miss Chaupaga
Panpakdeevong for the best wishes, continuous love, encouraging, and for the
confidence she gave me throughout my study. In addition, I would like to thank Suan
Dusit Rajabhat University for the financial support during my study. Without them,
this thesis could not have been written.
Finally, my graduation would not be achieved without best wishes from my
mother Peksim Sae-Khow, and father Liang Kuang Sae-Lor specifically for the
wonderful life they gave me, and greatest love and willpower which I will always
remember until the day I meet them. Last but not least my special gratitude and
respect to my relatives and friends for their help and encouragement.
v
ABSTRACT
Grid technologies are established to share the large-scale heterogeneous
resources over multiple administrative domains for processing the application. In
these technologies, the grid scheduling problem is crucial that must be solved in
order to achieve multiple objectives within different stakeholders (end-users, owner
resources and administrators) preferences. The aim of this research is to design and
implement the Adaptive Intelligent Grid Scheduling System (AIGSS) in order to
achieve multiple objectives named Makespan Time, Grid Efficiency and Total
delayed jobs. The popular meta-heuristic algorithms, namely Ant Colony
Optimization (ACO) and Tabu Search (TS) algorithms are proposed and developed
to maintain the selecting appropriate grid resource to execute each job within the
different job inter-arrival times and grid resources. Additionally, the clustering
technique named Fuzzy C-Means (FCM) algorithm is proposed for clustering the
groups of grid resources as well as jobs based on the degree of characteristic
similarity. Moreover, a popular discrete event simulation tool, namely, GridSim
toolkit and Alea simulation, is extended by developing the service modules on top of
it. Therefore, the experiment is simulated as realistic grid environment in order to
measure the proposed system. The experimental results show that the AIGSS
provides reasonable multiple objectives to stakeholders within different job inter-
arrival times and machines in grid system. In addition to the experimental results, the
proposed system performs better than the other algorithms for different goals of each
stakeholder. The performance of AIGSS is compared with the common and heuristic
algorithms such as First-Come-First-Serve (FCFS) with Optimization, Earliest
Deadline First (EDF), Minimum Tardiness Earliest Due Date (MTEDD), Minimum
Completion Time (MCT), Opportunistic Load Balancing (OLB), MIN-MIN, Hill
Climbing, EASY Backfilling, Simulated Annealing (SA), and Tabu Searching (TS).
vi
ABSTRAK
Teknologi grid dibangunkan untuk membolehkan sumber seperti organisasi
maya berkongsi sumber heterogen yang berskala besar ke atas pelbagai domain
pentadbiran untuk memproses aplikasi. Melalui teknologi ini, masalah penjadualan
grid adalah kritikal dan mesti diselesaikan bagi mencapai pelbagai objektif oleh
pemegang amanah yang berbeza (pengguna akhir, pemilik sumber dan pentadbir).
Matlamat kajian ini adalah untuk mereka bentuk dan melaksanakan Sistem
Penjadualan Grid Pintar yang Adaptif (AIGSS) bagi mencapai pelbagai objektif iaitu
Makespan Time, Grid Efficiency dan Total delayed jobs. Algoritma meta-heuristik
yang terkenal seperti algoritma Ant Colony Optimization (ACO) dan Tabu Search
(TS) dicadangkan dan dibangunkan untuk menyelenggara sumber grid yang dipilih
untuk melaksanakan setiap kerja dalam sela masa ketibaan dan sumber grid yang
berbeza. Tambahan lagi, teknik pengelompokan iaitu algoritma Fuzzy C-Means
(FCM) dicadangkan untuk mengelompokkan sumber grid dan juga kerja berdasarkan
kepada persamaan ciri-cirinya. Selain itu, penambahan ke atas peralatan simulasi
peristiwa diskrit yang popular iaitu GridSim toolkit dan Alea simulation turut
dilaksanakan dengan membangunkan modul perkhidmatan pada kedua-dua peralatan
simulasi tersebut. Oleh itu, ujikaji yang dilaksanakan adalah simulasi sebenar
persekitaran grid bagi menguji sistem yang dicadangkan. Hasil ujikaji menunjukkan
bahawa Sistem Penjadualan Grid Pintar yang Adaptif (AIGSS) mampu mencapai
pelbagai objektif yang ditetapkan oleh pemegang amanah dalam sela masa ketibaan
dan mesin yang berbeza dalam sistem grid. Hasil ujikaji itu juga menunjukkan sistem
yang dicadangkan mampu dilaksanakan dengan lebih baik berbanding algoritma lain
bagi matlamat yang berbeza untuk setiap pemegang saham. Prestasi AIGSS telah
dibandingkan dengan algoritma biasa dan algoritma heuristik seperti First-Come-
First-Serve (FCFS) dengan pengoptimuman, Earliest Deadline First (EDF),
Minimum Tardiness Earliest Due Date (MTEDD), Minimum Completion Time
(MCT), Opportunistic Load Balancing (OLB), MIN-MIN, Hill Climbing, EASY
Backfilling, Simulated Annealing (SA), dan Tabu Searching (TS).
vii
TABLE OF CONTENTS
CHAPTER TITLE PAGE
DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENTS iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF TABLES xii
LIST OF FIGURES xiii
LIST OF SYMBOLS AND ABBREVATIONS xvii
LIST OF APPENDICES xx
1 INTRODUCTION 1
1.1 Preliminary 1
1.2 Grid Computing Environments: An Overview 2
1.3 Problem Background 3
1.3.1 The Grid Scheduling Problem 4
1.3.2 The Grid Scheduling Algorithms 7
1.4 Problem Statement 9
1.5 Objectives of Research 11
1.6 Assumptions and Scope of the Research 12
1.7 Research Contributions 13
1.8 Thesis Organization 14
viii
2 LITERATURE REVIEW 16
2.1 Introduction 16
2.2 Grid Scheduling Schemes 17
2.2.1 Scheduler Organization 19
2.2.2 Scheduling Policies 19
2.3 Grid scheduling algorithms 20
2.3.1 Local scheduling versus Global scheduling 23
2.3.2 Static scheduling versus Dynamic scheduling 23
algorithms
2.3.3 Adaptive scheduling versus Non-Adaptive 25
scheduling algorithms
2.3.4 Approximation versus Heuristic algorithms 28
2.4 Ant Colony Optimization (ACO) algorithms Meta-Heuristic 38
2.4.1 General Frameworks for Ant Colony Optimization 38
(ACO) Algorithm: Ant Colony System (ACS)
2.4.2 Ant Colony Optimization (ACO) for Grid scheduling 43
problem
2.5 The other meta-heuristic algorithms: Tabu search (TS) 51
2.6 Clustering technique 53
2.6.1 Fuzzy C-Means algorithm 54
2.7 Grid Information Service (GIS) 57
2.8 The Simulation Tools for Grid scheduling Systems 59
2.9 Chapter Summary 63
3 RESEARCH METHODOLOGY 66
3.1 Introduction 66
3.2 Research Operational Framework 67
3.3 Problem Formulation (phase 1) 68
3.3.1 The Grid Scheduling Problem Identification 69
3.4 System Design and Development Process (phase 2) 71
3.4.1 Analysis, Design and Definitions of the parameters, 72
the performance evaluation metrics and the
architecture of the model
ix
3.4.1.1 Job Model 72
3.4.1.2 Grid Resource Model 73
3.4.1.3 Grid Scheduling Constraints 74
3.4.1.4 Performance evaluation metrics 76
3.4.1.5 The Architecture of AIGSS 78
3.4.2 Analysis, Design and Development The Job 81
Information and Knowledge Discovery
3.4.2.1 The Intelligent Job Characteristics Clustering 81
Groups Module
3.4.3 Analysis, Design and Development Resource 82
Knowledge Discovery and Scheduling System
3.4.3.1 The Resource Information Management and 82
Knowledge Discovery Module
3.4.3.2 Adaptive Intelligent Grid Scheduling 83
Algorithm Module
3.4.4 Analysis, Design and Development Job Execution 84
3.5 Implementation (phase 3) 85
3.6 Chapter summary 85
4 ADAPTIVE INTELLIGENT GRID SCHEDULING SYSTEM 87
(AIGSS) ARCHITECTURE
4.1 Introduction 87
4.2 The Intelligent Job Characteristics Clustering Groups Module 87
4.2.1 Job Information and Knowledge Discovery Method 88
4.3 The Resource Information Management and Knowledge 91
Discovery Module
4.3.1 Evaluation Resource Performance Method 92
4.3.2 Resource Information and Knowledge Discovery 96
Method
4.4 The Adaptive Intelligent Grid Scheduling Algorithm Module 98
4.4.1 Ant Colony Optimization (ACO) Algorithm 100
4.4.2 Tabu Search (TS) Algorithm 110
4.5 Chapter summary 114
x
5 ADAPTIVE INTELLIGENT GRID SCHEDULING SYSTEM 115
(AIGSS) BASED ON GRIDSIM SIMULATION TOOLKIT
5.1 Introduction 115
5.2 Grid Simulation and Adaptive Intelligent Grid Scheduling 117
System (AIGSS) Based on GridSim Toolkit
5.2.1 Grid Simulation Architecture Model 118
5.2.2 An event diagram of Grid Simulation Model 120
5.3 Design and Implementation of the Adaptive Intelligent Grid 123
Scheduling System (AIGSS) Module
5.3.1 UML diagram of Grid Simulation Model 124
5.3.2 The Intelligent Job Characteristics Clustering Groups 134
Module Based on Fuzzy C-Means (FCM) algorithm
5.3.3 The Resource Information Management and 136
Knowledge Discovery Module Based on
Fuzzy C-Means (FCM) algorithm
5.3.4 The Adaptive Intelligent Grid Scheduling Algorithm 138
Module Based on hybrid Ant Colony Optimization
(ACO) and Tabu Search algorithms
5.4 Chapter summary 140
6 EXPERIMENTS AND RESULTS 141
6.1 Introduction 141
6.2 Experiments design 142
6.3 The parameter value settings for ACO, Tabu search, 146
Fuzzy C-Means and the other algorithms
6.4 Experiments 1 (Different job inter-arrival times): 147
Preliminary Results and Discussion
6.5 Experiments 2 (Different job inter-arrival times): 155
Results and Discussion
6.6 Experiments 3 (Different number of machines): 167
Results and Discussion
6.7 Chapter summary 176
xi
7 CONCLUSIONS AND FUTURE WORK 177
7.1 Introduction 177
7.2 Conclusions 178
7.3 Future Work 180
REFERENCES 182
APPENDICES A – E 201 - 225
xii
LIST OF TABLES
TABLE NO. TITLE PAGE
2.1 Summary of the grid scheduling algorithms 33
2.2 Summary of the ACO algorithm in grid scheduling 49
4.1 Parameters of Data set of job characteristics 88
4.2 Parameters of Data set of machine characteristics 94
4.3 Clustering groups of machines 97
6.1 The grid resource’s attributes 143
6.2 Jobs workload’s attributes 144
6.3 Summary of the average evaluation index value in different 166
job inter-arrival times
6.4 Summary of the evaluation index value in the different 175
machines case for one data set of job inter-arrival time
(data set 1)
xiii
LIST OF FIGURES
FIGURE NO. TITLE PAGE
2.1 Scheduling algorithms taxonomy 22
2.2 Ant Colony System Algorithm for TSP instance 42
2.3 The basic structure of the Tabu Search flowchart 52
2.4 The simple pseudo-code of Tabu Search algorithm 52
2.5 Fuzzy C-Means (FCM) algorithm 56
2.6 Fishbone Diagram of Grid Scheduling problem 65
3.1 Research operational framework 67
3.2 Adaptive Intelligent Grid Scheduling System Architecture 80
4.1 Intelligent Job Characteristics Clustering Groups Module 89
4.2 Clustering groups of jobs 90
4.3 Resource Information Management and Knowledge 91
Discovery Module
4.4 The space in CPUs of each machine 94
4.5 An Adaptive Intelligent Grid Scheduling Algorithm Module 100
4.6 Construction the graph of the expected time to complete 101
of job on machine
4.7 Expected makespan time of online job into each 102
queue grid resource (machine)
4.8 The gap in PEs (CPUs) of machine 104
4.9 The pseudo-code of ACO algorithm in implementation 106
4.10 The pseudo-code of Pheromone initialization algorithm in 107
implementation
4.11 The pseudo-code of State Transition Rule algorithm in 107
implementation
xiv
4.12 The pseudo-code of the Local Update Rule algorithm in 109
implementation
4.13 The pseudo-code of the Global Update Rule algorithm in 109
implementation
4.14 Illustration of Tabu search algorithm to reschedule 110
the waiting jobs in grid queue
4.15 The pseudo-code of Tabu Search algorithm in implementation 112
5.1 The layers of Grid Simulation architecture 117
5.2 An event diagram for interaction among the jobs, AIGSS and the 122
resources entities
5.3 A class diagram showing the relationship in UML among 125
GridSim, AdaptiveIntelligentGridSchedulingSystem and
other entities
5.4 A local queue model for each Computational Node is 127
implemented by AdvancedPolicy class in GridResource class
5.5 Communication scheme the Grid user, the Adaptive 133
Intelligent Grid Scheduling System and the Grid Resource
5.6 Flowchart diagram for clustering the groups of a new 135
arrival job in the grid system
5.7 Flowchart diagram for clustering the groups of machines 137
in the grid system
5.8 Flowchart diagram for selecting the appropriate grid resource 139
(machine) of grid system in order to execute the job
6.1 System workload through the simulations 145
6.2 The comparisons of makespan time between Adaptive 150
Intelligent Grid Scheduling System and the other algorithms
within five job inter-arrival times
6.3 The comparisons of Grid efficiency between Adaptive 151
Intelligent Grid Scheduling System and the other algorithms
within five job inter-arrival times
6.4 The comparisons of delayed jobs between Adaptive 153
Intelligent Grid Scheduling System algorithm and the other
algorithms within five job inter-arrival times
6.5 The comparisons of total tardiness time of jobs between 154
xv
Adaptive Intelligent Grid Scheduling System and the other
algorithms within five job inter-arrival times
6.6 The comparisons of makespan time between Adaptive 158
intelligent Scheduling System and the other algorithms
within five job interarrival times
6.7 The comparisons of Grid efficiency between Adaptive 160
Intelligent Grid Scheduling System and the other algorithms
within five job inter-arrival times
6.8 The comparisons of delayed jobs between Adaptive 162
Intelligent Grid Scheduling System and the other algorithms
within five job inter-arrival times
6.9 The comparisons of total tardiness time of jobs between 163
Adaptive Intelligent Grid Scheduling System and the other
algorithms within five job inter-arrival times
6.10 The comparisons of average job slowdown between Adaptive 165
Intelligent Grid Scheduling System and the other algorithms
within five job inter-arrival times
6.11 The comparisons of makespan time between Adaptive 169
Intelligent Grid Scheduling System and the other algorithms
within different machines
6.12 The comparisons of Grid efficiency between Adaptive 170
Intelligent Grid Scheduling System and the other algorithms
within different machines
6.13 The comparisons of delayed jobs between Adaptive 172
Intelligent Grid Scheduling System and the other algorithms
within different machines
6.14 The comparisons of total tardiness time between Adaptive 173
Intelligent Grid Scheduling System and the other algorithms
within different machines
A.1 The implementation of OLB algorithm 202
A.2 The implementation of MET algorithm 203
A.3 The implementation of MCT algorithm 204
A.4 The implementation of MIN-MIN algorithm 206
B.1 The implementation of FCFS with Optimization algorithm 208
xvi
B.2 The implementation of EDF algorithm 209
B.3 The implementation of MTEDD algorithm 210
C.1 The implementation of EASY BackFilling algorithm 213
C.2 The implementation of Resource (machine) with minimum 215
expected makespan time algorithm
C.3 The implementation of Hill Climbing algorithm 216
C.4 The implementation of Simulated Annealing algorithm 217
C.5 The implementation of Tabu Search algorithm 219
C.6 The implementation of moveToOtherResource 220
xvii
LIST OF SYMBOLS AND ABBREVATIONS
k
ijp - The probability with an ant k currently at city i chooses to go
to an unvisited city j
1ij
ijd - A heuristic information value, where ijd is the distance
between cities i and j
ij - The pheromone trail value between cities i and j
, - Two parameters which determine the relative influence of
heuristic information and pheromone trail
k
iN - The feasible neighborhood connected to city i with respect to
ant k
- The pheromone evaporation rate
bs
ij - An amount of pheromone that only one ant constructed the
shortest tour since the beginning of the trail
0 - An initial amount of pheromone
bsL - The length of the globally best tour from the beginning of the
trail
q - A random variable uniformly distributed in [0 … 1]
q0 - A parameter and its value is 00 1q
J - The city ( )k
iJ N
nxxxX ,...,, 21 - A set of given data
xi - The ith of d-dimensional measured data
xviii
vj - The d-dimension center of the cluster
C - Fuzzy cluster
CvvvV ,...,, 21 - The cluster centers
NxCijU - The degree of membership of vector xi in the cluster jth
dij = ||xi – vj|| - The Euclidean distance (dij) between xi and vj
- Termination threshold
MPPs - Massively Parallel Processors computer
VOs - Virtual organizations
QoS - Quality of Service
SMP - Symmetric Multiple Process computer
COW - Cluster Of Workstations
GIS - Grid Information Service
GRIS - Grid Resource Information Server
GIIS - Grid Index Information Server
MDS - Globus Monitoring and Discovery System
NWS - Network Weather Service
MCT - Minimum Completion Time
MET - Minimum Execution Time
KPB - K-Percent Best
OLB - Opportunistic Load Balancing
BLBD - Based on Load Balancing and Demand
UDA - User Directed Assignment
MECT - Minimum Execution Completion Time
ACO - Ant Colony Optimization
AS - Ant System
EAS - Elitist strategy for Ant System
MMAS - Max-Min AS
xix
ACS - Ant Colony System
BACO - Balanced Ant Colony Optimization
TS - Tabu Search
SA - Simulated Annealing
GA - Genetic Algorithm
FCM - Fuzzy C-Means Algorithm
TSP - Travelling Salesman Problem
ETC - Expected Time to Complete
FCFS - First Come First Served
EDD - Earliest Due Date
EDF - Earliest Deadline First
ERD - Earliest Release Date
MTEDD - Minimum Tardiness Earliest Due Date
EASY - Extensible Argonne Scheduling sYstem
SDSC - San Diego Super Computing Center
MIPS - Million Instructions Per Second
MI - Million Instructions
PEs - Processing Elements or CPUs
SPEC - Standard Performance Evaluation Corporation
OS - Operating System
UML - Unified Modeling Language
DAG - Directed Acyclic Graphs
xx
LIST OF APPENDICES
APPENDIX TITLE PAGE
A On-line and Batch mode Algorithms 201
B The other common algorithms 207
C The Heuristic algorithms 212
D The data set structure of job and machine 221
E List of Publications 223
CHAPTER 1
INTRODUCTION
1.1 Preliminary
Generally, computing systems have developed starting from a single
computing system and have eventually grown into Massively Parallel Processors
(MPPs), computing network of workstations, distributed computing systems and
now, grid computing systems. In fact, the development of computation systems
always improves the ability of the systems to solve large complex problems and
process a large amount of data. The high performance systems are able to solve large
complexity problems and process a large amount of data. But they are restricted to
costly systems available and privileged locations such as super computing centres.
Therefore, the resources are often provided in a single organization. Recently, the
explosion of the Internet and high-speed networks, mainly as a result of faster
hardware and more sophisticated software, helps workstations to connect with one
another. According to these technologies, the distributed computing systems are
formed as a single system, unified computing resources, which lead to the popular
heterogeneous distributed computing systems known as grid computing systems
(Foster and Kesselman, 1999). The term grid is usually known as an electrical power
grid which provides dependable, consistent, pervasive, and also transparent access to
electricity but not irrespective of its resources.
Grid computing systems are the latest heterogeneous distributed computing
systems concept and technology. According to Foster and Kesselman (1999); Foster
2
et al. (2002); Mark et al. (2002), grids are recently proposed to enable the selection
and sharing of large resources, and integration of various resources including
supercomputers, workstations, storage systems, data sources, networks systems,
software, specialized devices and even people. These technologies therefore have led
to geographically distributed computing systems and multiple domain spanning
resources in solving the large-scale computational and data intensive problems in
science, engineering and business. Virtual organizations (VOs) are initially defined
by Foster et al. (2001), where the researchers had tried to address and summarize the
grid computing systems according to the current technologies and their framework.
1.2 Grid Computing Environments: An Overview
Most Grid computing systems, in the research and academic context,
emphasize increasing the efficiency as well as leveraging high performance
computing systems. Currently, the increasing performance of networking
technologies and personal computers have produced powerful distributed computing
systems by connecting PCs and the other devices together on the different networks.
With the large gathering of connected distributed computing systems with its various
resources sharing, the grid computing systems are the next evolutionary level of the
heterogeneous distributed computing systems. It usually comprises of geographically
heterogeneous distributed resources with each local management system such as
Operating System (OS), queuing systems and resource owners (CPU, I/O, memory,
etc.), policies and applications (engineering, scientific and commercial) with
different requirements (CPU, memory, storage, etc.) connected with the network
intensively. Consequently, it is geographically distributed with different time zones
and heterogeneous in nature. The grids are made of a virtual organization (VO)
(Foster et al., 2001), who is the multiple stakeholder (resource owner and end users)
aiming for different goals, objectives, strategies and demand patterns (Buyya et al.,
2000).
3
The project to link geographically dispersed supercomputers, distributed
computing systems and to connect other devices, leads to the start of conceptual grid
computing environments. Currently, the fast growing of the Internet has taken us far
beyond its original intention. Wide-area distributed supercomputers and workstation
computers linked with other devices have been a popular infrastructure of grid
computing environments, which a number of applications can take advantage of it.
Nevertheless, the advantages of the grid infrastructure are not only in its application
but also the collaborative engineering, huge data exploration, high throughput
computing and high performance computing systems.
The grid can be viewed as an open system that integrates computational and
collaborative environment. The users interacting with grid computing systems can
potentially use the enormous number of services that are available in the grid. It is
similar to the way that electrical devices can draw power from electrical power grid.
The different users across a virtual organization (VO) can get the opportunity to use
grid computing system for solving a large application problem.
1.3 Problem Background
The grid computing systems are one of the heterogeneous distributed
computing systems that have been mentioned before. A common requirement in
these systems is to utilize resources located within different administrative domains
and provide services for the requesting application. The utilization of grid resources
also provides an opportunity for collaborative computing to solve large complex
problems and process a large amount of data. The complicated computing needed by
the grid resource consumer (or grid application) and the resource owner lead to
complicated resources sharing. Therefore, the application needs to take into account
of the resources behaviour that often demands assurances of the level and type of
service provided by the resources.
4
According to the dynamic nature of grid, the system workload of each grid
resources cannot be determined in advance and also it is impossible for grid users
and grid system administrators to manually find and specify all the needed grid
resources during the arrival of applications. To achieve the goals and potentials of
computational grids, robust mechanisms for allocating resources to applications, an
efficient and effective scheduling system is fundamentally important.
The problem of scheduling jobs assigned to resources has been extensively
studied as a basic problem in traditional parallel and distributed systems. For
example, symmetric multiple process machine (SMP), massively parallel processors
computers (MPP), cluster of workstations (COW) and even the new conceptual
heterogeneous distributed computing systems such as grid computing system have
been studied. The different objectives in scheduling algorithm are designed to meet
these aims. Unfortunately, all of these scheduling algorithms such as FCFS (First
Come First Served) and STF (Shortest Task First) (Karatza, 2000) in traditional
parallel and distributed systems do not usually perform well in the new
circumstances such as grid system environment in particular (Berman, 1998).
1.3.1 The Grid Scheduling Problem
According to the characteristics of grid environment which are currently
presented, a number of applications are turning to grid environment to achieve their
computational resources and share large data storage. Existing scheduling systems
are now based on distributed computing systems. For example in Casavant and Kuhl
(1988), the detailed description of different categories of scheduling classified a large
number of distributed scheduling approaches according to the taxonomy.
Additionally, the researches of Gehring and Reinefeld (1996); Weissman (1996);
Weissman (1999); Xing and Xiaodong (1997) of scheduling systems for distributed
were focused on maintaining load balancing, throughput of the systems, and
turnaround times.
5
From the point of view of these scheduling, it is well known that the
complexity of a general scheduling problem is NP-Complete problem (El-Rewini et
al., 1994; Fernandez-Baca, 1989; Tracy D. Braun et al., 2001) which has been
presented by several researches. Besides the grid scheduling problem issues have
been studied by many researchers as referred in Nabrzyski et al. (2003). In Fibich et
al. (2005), these researchers have studied and proposed the model of grid scheduling
by extending from Graham’s classification of scheduling problems (R.L. Graham,
1979). In brief, they made an effort to propose the model of grid scheduling
problem, its model may consist of environment, characteristics, constraints and the
objective. Therefore, the different job and machine characteristics in the natural grid
are the issue in order to achieve the performance grid.
Grid scheduling system is usually defined as the process of making
scheduling decisions that involves resources across multiple administrative domains.
This process can also include searching module that gathers the different resources in
multiple administrative domains. Its performance assigns or schedules an application
or a single job to use multiple resources at a single site or multiple sites. For an
application, to a set of application such as a parameter sweep, a single job can be
anything that needs resources to execute, for example CPU request, network
bandwidth as well as data storage request. For the resource, it is anything that can be
scheduled, for instance, a machine, CPU speed, CPU requirement, disk space, QoS
network, and so forth. The difference between grid scheduler and local resource
scheduler is typically the owners control over the resources. Since the functional
local resource schedule is usually responsible for scheduling and also managing
resources at a single site or perhaps only for a single cluster or resource. These are,
therefore, the lowest-level and lower-level scheduling as discussed within Uwe and
Ramin (2004). Particularly, grid scheduler has no control over a set of resources and
a set of job assigned to them, even the state of their information. In fact, the dynamic
resource systems and grid users in the natural grid are willing to deal with each other
in order to allocate the coordination of the resources for their application or job. As a
result, this lacks of ownership, control over the resources and also to use manually
find, reserve, and allocate all the resources needed to execute an application or a job.
These issues lead to address the challenges of problem.
6
The grid account of a high diversity system in general, in which comprises of
various applications, middleware component, and resources. The point of view of
functional scheduling involves three main phases (Jennifer, 2004), which comprises
of resources discovery, resource selecting and scheduling, and job execution. By
these phases, to lack of information of the status of available resources, especially the
dynamic heterogeneous distributed computing systems such as the nature of grid, can
lead to inefficient and ineffective scheduling. Accordingly, grid information service
(GIS) is an indispensable component in the nature grid. The main functional Grid
Information Service (GIS) is usually discovering, gathering and providing
information to grid scheduler system. The information within GIS may be comprised
of CPU capacities, memory size, storage size, network bandwidth, software
availabilities, work load of each resource in particular period and so forth. Currently,
Globus Toolkit (http://www.globus.org/toolkit/) is open source for building Grid
systems and applications. It also provides Globus Monitoring and Discovery System
(MDS) (Czajkowski et al., 2001) for providing information service; likewise,
Network Weather Service (NWS) (Wolski et al., 1999) also provides the information
service. These are two mature examples of GIS.
Therefore, the information of grid system plays highly important role in
helping the grid scheduler system for selecting, making scheduling decision along
with criteria estimation. The lack of detailed information causes the poor
performance of the scheduler system. Because of the diversity of job and resource
(machines) characteristics in the natural grid, the job consists of arrival time, a
number of CPUs requirement, size of job and its due date, as well as the machine
consists of CPU speed, a number of CPU and its workload, involve the making
scheduling decision in order to achieve the objective. Also, a number of machine and
job, the different objectives of the stakeholders (administrators, resource owners, end
users) and the types of job cause the different scheduling decision making. Similarly,
scheduling algorithm within grid scheduler system is also an indispensable module.
The successful grid scheduler system needs intelligent algorithm and accommodates
all of the information for selecting, making scheduling decision that allocates jobs or
applications to proper resources belong to criteria estimation. The intelligence
algorithm usually needs correct information so that it can make scheduling decision
7
well. In order to achieve the objectives and potentials of computational grids, the
requirement is not only the precise provided information but also the intelligent
algorithm. The successful grid scheduler system therefore should consider and
accommodate all of these issues.
1.3.2 The Grid Scheduling Algorithms
The grid scheduling system must make resources selection decisions in an
environment where it does not have fully control over local resources on grid sites in
nature. Additionally, it has significantly to deal with high rapidly changing, regularly
unpredictable input data, causing ambiguity and incompleteness of information.
Furthermore, the performance and the availability characteristics of the individual
resources within the resources pool and also the status of grid applications may
change from time to time.
The diverse characteristics of grid applications and grid resources have been
therefore considered to significantly impact on the design of scheduling algorithm
system. To achieve the computational grid of performance goals, the grid
information service plays a highly important role. Moreover, the intelligent
scheduling algorithm should be considered and accommodated all of these issues.
From the definition of grid system in nature (Mark et al., 2002), they showed
the necessary components to form grid system and the services of grid were also
discussed by details. The theory, algorithm of scheduling (Pinedo, 2002)
characterized the classification of scheduling problem in term of a number of job and
machines, their characteristics and optimization criterion. By these characteristics,
they were described by several types, for example the arrival time of job, the
processing time of job, the due date of job, machines in parallel with different
speeds, unrelated machines, and so on. For optimization criterion, the makespan
time, average complete time and maximum lateness, etc. were shown by details.
These constraints almost satisfy in traditional scheduling problems while grid
8
scheduling is significantly a new scheduling problem which differs from the
traditional scheduling in the diverse characteristics of circumstance. However, grid
scheduling problem can get benefits from several traditional scheduling
methodologies. These methodologies have achieved successful results in a wide
variety of scheduling problems.
One of the first taxonomies of scheduling algorithm in distributed computing
systems is categorized by Casavant and Kuhl (1988). They mainly categorized the
problem domain into static and dynamic scheduling. They also presented the groups
of algorithm which related to the kind of domain problem, optimal, sub-optimal,
approximate, and heuristic for example. However, due to the grids scheduling, which
is a special kind of scheduling system, scheduling algorithm in the grids therefore is
subset of this taxonomy. Additionally, another research in Tracy D. Braun et al.
(1998) made effort to present taxonomy of a matching and scheduling heuristics. The
major different parts of their taxonomy for classifying scheduling heuristic have
shown in three parts; (1) the models used for applications and communication
request, (2) the models used for target hardware platforms and (3) the characteristics
of mapping heuristics.
Dynamic heuristics for a class of independent tasks presented in reference
(Muthucumaru Maheswaran et al., 1999) mentioned that two types of different
mapping heuristics of their research within different heuristic algorithms. And there
were five different heuristic algorithms in online mode (immediate mode); Minimum
Completion Time (MCT), Minimum Execution Time (MET), Switching Algorithm
(SA), K-Percent Best (KPB) and Opportunistic load balancing (OLB). While the
other heuristic algorithms such as Min-Min, Max-min and Sufferage were presented
in batch mode. The results of their experiment were the most appropriate for a given
situation. However, the degree of different machine characteristics, for example
number of machine, number of CPU, load of CPU and also job characteristics such
as deadline time, finish time, arrival time, requirement of CPU, etc. were not
considered. Furthermore, their work made a common assumption to solve the
problem. The performance evaluation of grid system such as make-span time was not
9
adequate because grid computing systems usually play with many user parties, for
example end-users, administrators.
Currently, the new approaches of the heuristic algorithms have been inspired
by natural phenomena. These have been proposed to address the challenge of the grid
scheduling algorithm. Several algorithms from natural phenomena have been
introduced to form powerful heuristics and those have been proven to be highly
successful. The meta-heuristic algorithms for grid scheduling, named Simulated
Annealing (SA), Tabu Search (TS), and Ant Colony Optimization (ACO) have been
applied by using each standard algorithm.
1.4 Problem Statement
In recent years, the dynamic scheduler system problems in grid environment
have grown in the number of the studies. Many common goals of grid scheduler
system such as to minimize the makespan time of a job set, tardiness time, as well as
to maximize resource utilization, non-delayed jobs, and so on have been highly
considered. Additionally, several criteria of scheduling problems, for example,
minimizing the cost, communication delay, giving priority to certain users' processes,
needing specialized hardware devices, the different resources and job characteristics,
are included. Moreover, the soft constraints and hard constraints (see 3.4.1.3 in
Chapter 3) play a highly important role. Unfortunately, scheduling algorithms in
traditional parallel and distributed computing systems, which normally operate on
homogeneous and dedicated resources such as computer clusters, supercomputer,
cannot promise good potential for the grid environment because scheduling in grid
environment is significantly complicated by the heterogeneous and dynamics nature
of the grid.
In recent years also, the dynamic heterogeneous distributed computing
systems such as grid environment and its service such as Grid Information Service
(GIS) are proposed to address the challenges of grid scheduling system. The
10
discussion in problem background has shown the techniques with some limitations in
handling grid scheduling problems. Especially meta-heuristic algorithms as referred
in Tracy D. Braun et al. (1999); Young et al. (2003); Zhihong et al. (2003), which
have been growing number of studies. According to these studies, the researchers
proposed the different meta-heuristics by several objectives named the makespan
time, the response time, the sum of the price per unit time and resource average
utilization. However, the several requirement goals in order to satisfy different
stakeholders such as high performance, decreasing delayed jobs and increasing grid
efficiency were not taken into consideration as multiple objectives in assigning the
job to available machine in dynamic grid system.
After the study of the problem background, there are several issues that
should be addressed. Hence, this research is carried out in order to answer the
following questions:
The main question of the Grid Scheduling System is to share the large
resources in grid environment within the soft and hard constraints (see 3.4.1.3 in
Chapter 3) that leads to the exploration of the method in achieving the multiple
objectives of the grids. This question also deals with an algorithm for providing
adaptive, realistic estimation of execution cost of the jobs and current system
information status. Therefore, the questions are as follow:
i) How does the Grid Scheduling System improve the performance of
the grid system based on the soft and hard constraints?
ii) How to design and implement an efficient and effective Grid
Scheduler System so that the grid system can achieve the multiple
objectives?
iii) How to discover and allocate the optimal available grid resource for
each job?
Secondly, accommodating the different job characteristics and grid resources
(machines) within the rapidly changing system information are also considered in
11
order to improve the high performance, high throughput and high system usage of
the overall system. Therefore, the questions are as follow:
i) How to cluster the groups of the jobs based on the degree of similarity
of job characteristics that should be assigned to the suitable grid
resources?
ii) How to cluster the groups of the machines based on the degree of
similarity of machine characteristics before executing the job?
iii) Do Grid Information Service (GIS) and the current system status
information help to improve the performance of grid scheduling
system?
Finally, exploring the algorithm and method for solving grid scheduling
problems will be of further interest. Therefore, the questions are as follow:
i) Which scheduling algorithms give better results on grid environment?
ii) How to apply clustering technique and meta-heuristic algorithms to
solve grid scheduling problems within the complexity of its
environment?
1.5 Objectives of Research
The objectives of this research is to improve the performance, maintain
system efficiency, and also accommodate several jobs into the grid computing
systems with the use of the Adaptive Intelligent Grid Scheduling System (AIGSS).
To deal with the multiple objectives of high performance, decreasing delayed job and
increasing grid efficiency, multi-criteria plays a highly important role. The criteria
can distinguish the two general groups that are related to particular stakeholders (end
users and resources owners). For end users, time criteria such as makespan time,
tardiness time, and due date are considered in this research, while grid efficiency and
12
resource utilization are also taken into account. Therefore, the multiple objectives of
this research are as follows:
i) To study, design and develop the Adaptive Intelligence Grid
Scheduling System (AIGSS) architecture.
ii) To develop and apply the meta-heuristic algorithms in searching the
appropriate grid resource to execute the jobs in the grid.
iii) To develop and apply data mining technique for clustering the jobs
and grid resources based on the degree of similarity of their
characteristics.
iv) To evaluate and compare the proposed solution with the other
algorithms based on the performance, delayed jobs and efficiency of
grid in this research so that the strength and weakness of them can be
find.
1.6 Assumptions and Scope of the Research
The design, development and experiment involved in this research have a few
limitations that are worth mentioning.
i) Grid Scheduler system is responsible for making scheduling decision
based on the different jobs characteristics and a set of grid resources
within their characteristics. In this approach, the local resource
scheduling policies of the different grid resources (machines) are
responsible for scheduling, managing and controlling over the
resources at lower-level scheduling instance (Jennifer, 2004; Uwe and
Ramin, 2004) that are only considered in space sharing fashion.
Additionally, the set of grid resources across multiple administration
domains is not considered because this may be a drawback when each
implementation of its own local resource scheduling policies within
different administrative domains. This research uses simulation based
13
on GridSim Toolkit (Buyya and Manzur, 2002; Klus´aˇcek et al.,
2007), Alea simulation (Klus´aˇcek et al., 2007) and also extends
internal entities to more complex requirement. Moreover, the discrete-
event simulation plays a highly important role in this research.
ii) In this research, the independent jobs are mainly concentrated and are
also scheduled in non-pre-emptive manner.
iii) In view of the independent jobs concept, the network bandwidth of
the grid system is not considered.
iv) The machine is only considered as grid resource in this research.
v) Finally, this research does not consider the grid resources failure, job
loss, job migration and the precedence constraints for workflow jobs
(Directed Acyclic Graph (DAG) applications (Yongcai et al., 2007)).
1.7 Research Contributions
In this research, the artificial intelligence, knowledge discovery and data
mining techniques in dynamic environment are mainly used to build Adaptive
Intelligence Grid Scheduling System (AIGSS). This research work is different from
many relevant areas of existing research which are described as follows:
i) The research builds the Adaptive Intelligent Grid Scheduling System
(AIGSS) that provides a model consisting of knowledge information
discovery, clustering technique and adaptive grid scheduling
algorithm. Most of the existing systems cannot meet all modules.
ii) The research also designs and implements the Adaptive Intelligence
Grid Scheduling Algorithm that helps make accurate decisions in
assigning jobs to available grid resources within dynamic grid
environment by using intelligent Ant Colony Optimization and Tabu
Search algorithms in order to achieve the multiple objectives.
14
iii) To apply data mining technique for clustering of job and grid resource
based on the degree of their characteristics similarity by using Fuzzy
C-Means algorithm.
iv) Lastly, the proposed system known as the Adaptive Intelligence Grid
Scheduling System (AIGSS) attempts to deal with different
characteristics of the independent jobs and dynamic grid resources.
Whole results are depended on the diversity of grid systems such as
the different job inter-arrival times, grid resources as well as the
fluctuation in CPU load.
1.8 Thesis Organization
The thesis has been organized to elaborate the major aspects of this complex
multidisciplinary study. Following paragraphs give a brief overview of the thesis.
Chapter 2 presents a brief overview of taxonomies for grid scheduling
schemes including scheduler organization and scheduling policies, and grid
scheduling algorithms. Additionally, description of previous works that used the
different techniques including the Meta-heuristic algorithms such as Ant Colony
Optimization (ACO), Tabu Search (TS) and Simulated Annealing (SA) algorithms
will also be mentioned. Moreover, Grid Information Service (GIS), the simulation
tools for grid scheduling systems, and the modelling including architecture of
GridSim are presented in this thesis.
Chapter 3 discusses the definitions of the grid scheduling problems. Also, the
grid scheduling constraints, the definitions of job and grid resource characteristics,
and the performance evaluation metrics are discussed and presented. Additionally,
the demonstration of general description of the system design and development
process in order to achieve objectives of this study is also discussed.
15
Chapter 4 presents the details of Adaptive Intelligent Grid Scheduling System
(AIGSS) architecture. The main components consist of the Intelligent Job
Characteristics Clustering Groups module, Adaptive Intelligent Grid Scheduling
Algorithm module, and Resource Information Management and Knowledge
Discovery module.
Chapter 5 presents the architecture model of extended simulation with all
important classes based on GridSim toolkit and Alea Simulation. The UML diagram
is provided to describe the details of all classes in the simulation. Additionally, the
flowchart diagrams of the important modules consisting of the Intelligent Job
Characteristics Clustering Groups module, the Resource Information Management
and Knowledge Discovery module, and the Adaptive Intelligent Grid Scheduling
Algorithm module are presented.
Chapter 6 presents the experiments and results that are shown by the different
grid scenarios, for example different job inter-arrival times and a number of
machines including different CPU speed and a number of its CPUs. The comparison
with other algorithms will show that the proposed Adaptive Intelligent Grid
Scheduling System (AIGSS) can achieve the multiple objectives of different
requirements of each stakeholder.
The thesis concludes in Chapter 7 by expressing the experiments and results.
It is strongly clear that the different scenarios in grid system can cause the different
results within the other algorithms. However, the proposed system can produce the
good results in comparison. The challenges and emerging trends identified for future
research are also suggested in this chapter.