Integrating Computing Resources on Multiple Grid-enabled Job Scheduling Systems Through a Grid RPC System
Yoshihiro Nakajima, Mitsuhisa Sato, Yoshiaki Aida, Taisuke Boku
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid, 2006
Reporter: Tung-Yen Haieh
Outline
Introduction
Design of Grid RPC System Integrating Computing Resources on a Multiple Grid-enabled Job Scheduling System
Experimental Results
Conclusion
Introduction
As the demand for high-throughput computing increases, several grid-enabled job scheduling systems (GJSSs) that support high-throughput computing have appeared, such as XtremWeb, Condor, and CyberGRIP.
However, each GJSS has its own user interface, and the management policy for a GJSS may also differ from site to site.
They propose a framework for integrating and utilizing computing resources managed by a GJSS in different organizations by using Grid RPC style programming.
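To make "Grid RPC style programming" concrete, the following is a minimal sketch of the master/worker pattern it implies: the client issues many asynchronous calls and then waits for all of them. It is not the OmniRPC API; `rpc_call_async`, `simulate`, and `run_parameter_sweep` are hypothetical names, and a local thread pool stands in for the remote computing resources.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(parameter):
    """Stand-in for a remote procedure; a real system would run it on a worker node."""
    return parameter * parameter


def rpc_call_async(pool, func, *args):
    """Hypothetical asynchronous call: returns a handle (a future) immediately."""
    return pool.submit(func, *args)


def run_parameter_sweep(parameters, max_workers=4):
    # The thread pool is an assumption standing in for remote resources.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        handles = [rpc_call_async(pool, simulate, p) for p in parameters]
        # Collecting every handle plays the role of a "wait for all calls" step.
        return [handle.result() for handle in handles]


if __name__ == "__main__":
    print(run_parameter_sweep([1, 2, 3, 4]))
```

The point of the pattern is that the client code only sees calls and handles; where each call actually runs is decided behind the call interface.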
Design of Grid RPC System Integrating Computing Resources on a Multiple GJSS
The proposed system realizes the following objectives:
A uniform and parallel programming model by remote procedure call on the grid-enabled job scheduling system.
A fault-tolerant Grid RPC system on the computing resource side.
Simultaneous exploitation of massive computing resources provided on sites that are managed by different organizations.
An easy-to-use execution environment from a cluster to Grid-enabled Job Scheduling Systems without any change in the application source program.
General APIs to absorb differences between GJSSs.
General APIs to adapt to new GJSSs.
Automatic deployment of execution programs on remote computing resources.
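The objectives about general APIs that absorb differences between GJSSs can be illustrated with an adapter interface. This is only a sketch under stated assumptions: `SchedulerBackend`, `InProcessBackend`, and `call_on` are hypothetical names, not part of the proposed system, and the toy backend runs jobs in the current process rather than on any real scheduler.

```python
import itertools
import time
from abc import ABC, abstractmethod


class SchedulerBackend(ABC):
    """Hypothetical common interface that each GJSS adapter would implement."""

    @abstractmethod
    def submit(self, program, args):
        """Submit a job and return an opaque job id."""

    @abstractmethod
    def is_done(self, job_id):
        """Return True once the job has finished."""

    @abstractmethod
    def fetch_result(self, job_id):
        """Return the result of a finished job."""


class InProcessBackend(SchedulerBackend):
    """Toy backend (an assumption): runs the job immediately in this process."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._results = {}

    def submit(self, program, args):
        job_id = next(self._ids)
        self._results[job_id] = program(*args)
        return job_id

    def is_done(self, job_id):
        return job_id in self._results

    def fetch_result(self, job_id):
        return self._results.pop(job_id)


def call_on(backend, program, *args):
    """Client-side call that only depends on the common interface."""
    job_id = backend.submit(program, args)
    while not backend.is_done(job_id):
        time.sleep(0.01)  # poll until the scheduler reports completion
    return backend.fetch_result(job_id)
```

Under this design, supporting a new GJSS would mean writing one more subclass, while `call_on` and the application code stay unchanged.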
They have extended OmniRPC for the proposed system as follows:
An OmniRPC agent process was added to handle protocol conversion between the OmniRPC client program and each GJSS server.
The remote executable module of OmniRPC can handle I/O data through files.
Alternative methods are available to manage the information of the remote function.
Easy-to-use APIs by which the proposed system can adapt to new GJSSs are provided.
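The extension that lets the remote executable exchange I/O data through files can be sketched as follows. The file names, the JSON layout, and the functions `worker_main` and `call_via_files` are assumptions made for illustration; the slides do not specify the actual file format used by OmniRPC.

```python
import json
import tempfile
from pathlib import Path


def worker_main(input_path, output_path):
    """Stand-in for the remote executable: reads arguments from one file
    and writes the result to another."""
    request = json.loads(Path(input_path).read_text())
    result = sum(request["args"])
    Path(output_path).write_text(json.dumps({"result": result}))


def call_via_files(args):
    """Marshal the arguments to a file, run the worker, read the result file."""
    with tempfile.TemporaryDirectory() as workdir:
        input_path = Path(workdir) / "input.json"
        output_path = Path(workdir) / "output.json"
        input_path.write_text(json.dumps({"function": "sum", "args": args}))
        # A GJSS would stage these files and run the worker on a remote node;
        # here the worker runs in-process to keep the sketch self-contained.
        worker_main(input_path, output_path)
        return json.loads(output_path.read_text())["result"]
```

Passing data through files fits job schedulers that stage input and output files for batch jobs instead of keeping a live connection to the worker.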
Experimental Results
The GJSSs used as backends of OmniRPC are XtremWeb version 1.5, CyberGRIP version 2.2 (CyberGRIP uses JTX), Condor version 7.10.7, and Open Source Grid Engine version 6.0u6.
Conclusion
They have presented a framework for a parallel programming model by remote procedure calls bridging between large-scale computing resource pools managed by multiple GJSSs.
They found that the proposed system can achieve approximately the same performance as using OmniRPC and can handle interruptions in worker programs on remote nodes.
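The slides do not say how interruptions in worker programs are handled. One common approach, shown here purely as an assumed illustration, is to resubmit an interrupted job a bounded number of times. `WorkerInterrupted`, `call_with_resubmission`, and `flaky_job` are hypothetical names.

```python
class WorkerInterrupted(Exception):
    """Hypothetical signal that a worker program was interrupted."""


def call_with_resubmission(job, max_attempts=3):
    """Run job(attempt) and resubmit it if the worker is interrupted."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job(attempt)
        except WorkerInterrupted as error:
            last_error = error  # remember the failure and try again
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error


def flaky_job(attempt):
    """Toy job that is interrupted on its first two attempts."""
    if attempt < 3:
        raise WorkerInterrupted(f"worker lost on attempt {attempt}")
    return "done"
```

In a real deployment the resubmission may well be performed by the GJSS itself rather than by the client; this sketch only shows the control flow.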