AService-OrientedWSRF-basedArchitectureforMetaschedulingon ComputationalGrids · 2020. 2. 4. · 1. Introduction GridComputingtechnologies[1] ... tion through a server typically listening

A Service-Oriented WSRF-based Architecture for Metascheduling on

Computational Grids ?

G. Molto ∗, V. Hernandez, J.M. AlonsoDepartamento de Sistemas Informaticos y Computacion. Universidad Politecnica de Valencia. Camino de Vera S/N, 46022 Valencia, Spain

Abstract

This paper describes a Grid service developed over the Globus Toolkit 4, which provides multi-user resource brokering on compu-

tational Grids. Both the architecture and implementation details are covered, emphasizing the usage of the WSRF specifications

and GSI-based security to create a generic, secure and interoperable metascheduler. In addition, we also describe the development

of a graphical client-side application that provides an ubiquitous access to the metascheduler service. This work is currently being

employed in production for the execution of a biomedical application that simulates the electrical activity of cardiac tissues.

Key words: Grid computing, Metascheduling, Globus Toolkit, WSRF, Service-Oriented Architectures

1. Introduction

Grid Computing technologies [1] have emerged as a solu-tion for the computational problems of Virtual Organisa-tions (VOs), enabling the collaborative usage of remote re-sources to satisfy the computational requirements of time-consuming tasks.

Among all the available Grid middlewares, the GlobusToolkit (GT) [2,3] is broadly accepted to be the currentde facto standard for deploying computational Grids. How-ever, GT only provides the basic services and capabilities todeploy and use Grid infrastructures. Performing remote ex-ecutions of scientific applications on a computational Gridtypically requires the usage of metascheduling technologies[5], that provide all the functionality required for task man-agement.

Metascheduling is the process that efficiently allocates aset of tasks on the available computational resources of thedifferent organisations that compose a Grid infrastructure.We envisage Grid metascheduling technologies as a set ofinteroperable components which abstract all the underlyingcomplexity of the Grid middleware, which offers the execu-

? The authors wish to thank the financial support received from theSpanish Ministry of Science and Technology to develop the projectGRID-IT (TIC2003-01318). This work has been partially supportedby the Structural Funds of the European Regional DevelopmentFund (ERDF).∗ Corresponding author: Tel. +34963877356. Fax.: +3496877359

Email address: [email protected] (G. Molto).

tion support. Only if these Grid technologies get closer tothe end user, they will be adopted in many scientific fields.

In this paper, we go one step beyond the traditional usageof metascheduling and we propose a metascheduler Gridservice that can be accessed throughout the network byusers interested in task allocation on computational Grids.It has been developed by exposing a Grid service interfaceto GMarte [6,7], a previously developed metascheduler.

With the proposed approach, the user just focuses ondefining the computational tasks to be executed, and del-egates to this Grid service their efficient execution on theavailable Grid infrastructure. As opposed to traditionalmetaschedulers, which typically require an installationand configuration on the client machine, and are typicallybounded to a given platform (usually Unix-based), we pro-pose a Service-Oriented Architecture (SOA) that can beused from different architectures and operating systems.This approach enables to dramatically simplify the usageof computational Grids for job executions, even accessiblefor Windows-based desktop PCs, thus paving the way forthe widespread adoption of Grid technologies by the leastexperienced users.

In addition, creating a generic and interoperable com-ponent based on standard technologies contributes to theecosystem of tools employed to interact with computationalGrids.

The remainder of the paper is structured as follows. First,section 2 introduces the metascheduling framework and theabstraction environment provided by GMarte, which is the

Preprint submitted to Elsevier 29 May 2007

Access to the Information System

Remote Execution of a Single Task

MetaScheduling of Multiple Tasks

MDS2 MDS4 GLUE

GRAM WS-GRAM CE

POLICIES FAULT TOLERANCE

GRIDFTP

PBS, TORQUE SGE

RESOURCE DISCOVERY

Fig. 1. Layered diagram with the abstraction framework providedby GMarte.

starting point of this work. Then, section 3 explains the ar-chitecture and functionality of the GMarte Grid Service aswell as the development of a graphical client to provide aneasy interaction. Later, section 4 describes the utilizationof the Grid service to offer transparent execution servicesto a biomedical research group that simulates the cardiacelectrical activity. Section 5 compares the approach sug-gested in this paper with others found in the literature andfinally, section 6 summarises the paper.

2. The GMarte Framework

GMarte [6,7] is a software framework, developed by ourresearch group, aimed at simplifying the process of execut-ing scientific applications on computational Grids based onthe standard Globus Toolkit [2,3]. Behind this simple de-scription, there is a tremendous effort to enable the usersto rapidly deploy their applications for execution on a com-putational Grid. GMarte has been developed as a Java li-brary which exposes a high level API (Application Pro-gramming Interface) to the user. The GMarte frameworkis a layered software (see Figure 1) that first introducesan homogeneous interface to access the Information Sys-tems of the heterogeneous computational resources. Overthis common API, a layer which enables remote task exe-cution on Globus-based resources was implemented. Later,we extended the capabilities of GMarte by to introducemetascheduling functionality for the allocation of multipletasks on computational Grids.

Therefore, the GMarte framework is not only ametascheduler but an abstraction environment that en-ables to interact with computational Grids at several levels.In the following subsections, we overview the fundamentalaspects of GMarte concerning each layer mentioned.

2.1. Information System Management for Computational

Resources

The computational resources belonging to a Globus-based deployment expose very valuable information com-monly employed for activities related with the metaschedul-ing process. For example, in order to decide the most suit-able machine for the execution of a given task, a resource

selection must be performed. This procedure obtains thedynamic information of each machine, gathering the avail-able number of processors, the available amount of RAM,etc.

However, this information is exposed in a different man-ner, depending both on the Globus Toolkit version andthe LRMS (Local Resource Management System), that is,the job manager, if any, on the remote machine. For ex-ample, in the Globus Toolkit 2.4.3, the MDS2 (Monitoringand Discovery Service) offers the computational informa-tion through a server typically listening in the port 2135,accessible via LDAP (Lightweight Directory Access Pro-tocol) interfaces. However, the Globus Toolkit 4 exposesthe computational information via the DefaultIndexService

(MDS4), which is just one of the multiple services deployedin the Grid services container. In this case, the informa-tion is published in XML (eXtensible Markup Language)format and typically accessed via XPath (XML Path Lan-guage) queries.

In order to hide all this complexity of protocols and dataformats from the end user, GMarte provides an abstractionlayer to access the computational information. For that, weintroduced a high level object, called GridResource, whichoffers a common API with user-level methods (getAvail-ableProcessors, getAvailableRAM, etc.) that gather the re-quested information, in a transparent way for the user. Thisis achieved by using the appropriate protocols dependingon the underlying Grid middleware of the remote machine.

This feature is of paramount importance for the enduser, which no longer has to deal with the low-level im-plementation details and data formats to access the com-putational information. Additionally, GMarte implementsa transparent caching component that distinguishes be-tween static information, which it is assumed to remain un-changed (i.e., operating system name, CPU architecture,etc.) and dynamic information (i.e, available number ofprocessors, available RAM, CPU load, etc.). The queries tostatic information are automatically cached so that subse-quent ones do not produce access to the remote informationsystem.

At the moment, the features described within GMarte areavailable for resources with the Globus Toolkit 2.X and 4.X,as well as the Computing Elements in LCG-2 [4] deploy-ments. The supported LRMS are PBS/Torque (PortableBatch Systems), and SGE (Sun Grid Engine).

2

2.2. Abstracting the Process of Remote Task Execution

GMarte enables to abstract the process of remote taskexecution by performing fault-tolerant executions with co-ordinated data staging, even for parallel applications usingthe MPI libraries.

First of all, the application dependent input files aretransferred via GridFTP from the client to the remote ma-chine (a process called stage-in). Then, the application isstarted in the remote machine via the GRAM (or WS-GRAM) service provided by the Globus Toolkit and peri-odically monitored to detect failures. When the executionfinishes, the output files generated are transferred to theclient machine via GridFTP (a phase called stage-out).

In this layer, GMarte implements a multi-level fault tol-erance scheme which allows to recover itself from failuresboth during the data transfers (which are retried a cer-tain amount of times before giving up), as well as dur-ing task execution, where failed tasks are marked so thatthe metascheduler decides the appropriate action to betaken. Failures in remote computational resources cause are-scheduling of the tasks on other available machines.

In addition, GMarte supports the definition of application-dependent checkpoint files which are periodically trans-ferred to the metascheduler machine so that failures in thetasks or resources are handled to retake the execution fromthe latest checkpoint. For long-lasting execution this cansave a considerable amount of time.

2.3. Metascheduling Approach

In brief, the metascheduling procedure typically involvesseveral phases [5] in order to efficiently solve the problemof executing a set of tasks on the available remote resources(see Figure 2). In GMarte, these phases are handled asfollows:

First of all, the Resource Discovery phase involves thediscovery of a set of candidate execution machines fromeither a GIIS (Grid Index Information Service) or a BDII(Berkeley Database Information Index). These componentsaggregate resource information and are queried to providea list of machines that satisfy some computational featuresspecified by the user.

Once a list of potential execution machines have beenobtained, the Resource Filtering phase is in charge of dis-carding those resources that either are not accessible withthe user credentials or do not satisfy the computational re-quirements of the task to be executed.

When both previous phases finish, a set of computationalresources on which to perform the execution of the tasksis available. In order to decide which resource is going torun each task, a Resource Selection stage must take placefor each one, which chooses the current best machine onwhich to execute it, according to some criteria. GMarteimplements different resource selection policies. The mostcommonly used involves a performance model that tries to

minimise the execution time of the application, consider-ing both the number of available processors (for parallelexecutions) and the data transfer cost between the sub-mission machine and the computational resource, which isdynamically updated upon each data transfer. In addition,being an abstraction framework, the user can easily createnew policies by providing different score functions for com-putational resources. As no resource selection can be con-sidered the best for all kind of applications, this allows toextend the functionality via additional, user-programmedmodules. This procedure involves creating a new Java classwhich ranks the resources according to the new criteria. Byusing the high level API to access the computational infor-mation of resources, this can be a simple task.

GMarte implements a multi-threaded metaschedulerthat enables to concurrently perform the distinct phasesinvolved in the remote task execution of different tasks(resource selection, stage-in, execution and stage-out). Forexample, the user can specify the number of threads in-volved in the stage-in phase. If 3 threads are for examplespecified, then up to 3 tasks will be simultaneously handled.

This approach enables to introduce a considerable speed-up in the whole task allocation process, but specially in thestart-up time, where the resource selection phase typicallyrequires a time linear with the number of computationalresources. If a large testbed is used, this selection phasemay become a time-consuming process due to the retrievalof dynamic information of each machine. Using multiplethreads for resource selection, combined with a client-sideprocessor reservation stage to avoid selecting the same re-source more than once, offers important improvements inspeed compared with a single-threaded approach.

Finally, it should be mentioned that the functionality de-scribed can be used via a GMarte Java application as well asthrough XML documents, which are automatically mappedto the corresponding abstraction objects. Being based onJava, GMarte can be seamlessly run both on Windows andUnix machines.

Currently the GMarte framework has been successfullyapplied for the execution of a biomedical application thatsimulates cardiac electrical activity both on local and large-scale Grid infrastructures [8]. Its functionality has also beenintroduced as part of a Grid service that performs the struc-tural analysis of buildings [9]. This has enabled to performexecutions of scientific applications in a transparent man-ner for the user.

3. GMarteGS: A Grid Service for Metascheduling

In order to create an interoperable metascheduler, wedecided to migrate the GMarte framework to a SOA basedon standard technologies provided by Grid services.

3

Stage Out

Grid Application

Target Application

Resource Discovery

Resource Filtering

Resource Selection

Selects

Remote Machine

Stage InExecution

Client Machine

GMarte Abstraction

Layer

Refers Uses

Performs

XMLDefinition

Develops

Writes

User

Generates

XML Adapter

Fig. 2. Overview of the principal phases involved in the GMarte framework.

3.1. Grid Services, OGSA and WSRF: Globus Toolkit 4

The evolution of the Grid Computing field indicatesthat there is an unquestionable trend in producing loosely-coupled, interoperable and generic components.

The Globus Toolkit (GT) has been developed since thelate 1990s to support service-oriented distributed com-puting applications and infrastructures [2]. Initially, theearly versions of GT were based on services that interactedamong them using proprietary interfaces, although com-monly implemented using standard protocols like LDAPor FTP (File Transfer Protocol). With the release of theGT version 3 (GT3), a step towards Web services was per-formed, trying to expose common and standard interfacesfor the interaction with the different services. GT version4 (GT4) represented an important advance in terms offunctionality, conformance to standards and usability [3].

GT4 provides the implementation of a set of Grid ser-vices which conform to OGSA (Open Grid services Ar-chitecture) [10]. OGSA represents an evolution towards aGrid system architecture based on the concepts and tech-nologies provided by Web services. Being developed by theOpen Grid Forum (OGF), OGSA defines a common, stan-dard and open architecture for all the services that can befound in a Grid system (job management, resource man-agement, security, etc) [11], so that tools of different ven-dors can cooperate together using standard interfaces. Asa result, OGSA defines its underlying architecture to bebased on special Web services, which maintain their statefrom one invocation to another, something that traditionalWeb services are not able to. This is where WSRF (WebServices Resource Framework) [12] comes into play, speci-fying how Web services can be stateful. This is achieved bycoupling a data container (WS-Resource), which stores thestateful data, to a web service. OGSA uses this new conceptto specify the underlying architecture of the Grid services.Therefore, GT4 includes an implementation of WSRF aswell as a set of Grid services developed on top of WSRF,which are compliant with OGSA requirements.

Migrating GMarte to a SOA means moving the resourcebrokering functionality from the client machine to a spe-cialised machine which now performs the task allocationfunctionality for multiple clients by means of a Grid service.

This represents a two-fold strategy. On the one hand,lighter clients can be developed since they no longer carry

hp w or ks t at io n

xw 4 0 0 0

GMarte Grid Service

Local Organizational

Resources

h p w o rk st a tio n

x w 50 0 0

Organization

(a) Enterprise Grid

h pw or k st a

ti on

xw 4 0 0 0

h p

w or k st ati on

xw 4 0 0 0

GMarte Grid Service

h p w o rk st a tio n x w 50 0 0

Organization 2 Organization 3

Organization 1

(b) Global Grid

Fig. 3. Different scenarios on which the GMarte Grid Service ap-proach can be applied.

out the metascheduling process. This implies decouplingthe task allocation (run in the Grid service) with the statevisualisation (done in the client), thus conforming to aloosely-coupled architecture.

On the other hand, this scenario opens new possibilitiessuch as those depicted in Figure 3. The GMarte Grid Ser-vice (GMarteGS) can be deployed within an organisationas an entry point to its local computational resources, thuscreating an Enterprise Grid (Figure 3.a). This enables anorganisation to provide secure and trusted computationalaccess to multiple clients willing to execute their tasks inthe organisation resources. Alternatively, GMarteGS canbe deployed within an organisation to enable performingremote task executions on computational resources fromother institutions (Figure 3.b). This way, the Grid servicewould represent a gateway among different machines andthe clients that want to access distributed computationalinfrastructure.

4

(1) Create Metascheduling

Session

Client Machine

h p w o rk s ta tio n x w 4 00 0h p w o rk s ta tio n x w 4 00 0

GT4 Java WS Core

Computational Grid

(2) File Transfer

(3) Execution of the Metascheduling

Session via GMarte

(5) Output Data Retrieval via

GMarte

(6) Output Data Retrieval

(7) Destroy Metascheduling

Session

GridFTP

GMarte Grid Service

(4) Task State Notifications

Fig. 4. General functionality provided by the metascheduling Grid service.

3.2. Main Functionality and Interaction

The aim of GMarteGS is to provide multi-user trans-parent metascheduling functionality to perform executionson computational Grids. Therefore, we introduce the con-cept of a metascheduling session and the service providesdifferent operations to manage them. A metaschedulingsession is defined by the computational tasks to be exe-cuted. Additionally, the user can also specify the compu-tational resources to be employed and the configuration ofthe metascheduler. This way the user can select the Gridinfrastructure or let the Grid service decide.

All the information exchanged between the client and theservice is performed in XML language. The main reasonis that XML enables to structure information in a mannerthat is both easily understood by humans and machines. Inaddition, being a text-based information can be obtainedas the result of a method invocation, thus easing commu-nication between the client and the service.

The principal interface of the Grid service covers a set ofmethods that allow the client to:

(i) Create a Session. Creates a new metascheduling ses-sion.

(ii) Initialise the Session. This operation enables to spec-ify the XML documents (in the shape of a String asmethod arguments) that describe the tasks and, op-tionally, the computational resources and the config-uration of the metascheduler. The user may also de-cide to rely on the resource discovery functionality togather a list of computational resources from an IndexService. In addition, user credentials are delegated tothe service so that authentication against computa-tional resources is performed by the Grid service onbehalf of the user. Notice that it would also be possi-ble to let the Grid service use some generic credentialsto interact with the computational Grid, regardless ofthose provided by the user. This can provide a com-putational gateway between different Grids accessedwith different credentials.

(iii) Gather Session Information. Retrieves informationabout the session, such as its name, the data folderin the Grid service machine that will host the client

data, the time spent on the execution of the session,etc.

(iv) Gather Session State. This method obtains the cur-rent state of the metascheduling session. This docu-ment specifies, for each task, the machine hosting theexecution, the number of processors employed in caseof a parallel execution, and its state according to thelife cycle of tasks.

(v) Gather Testbed State. This method obtains the cur-rent state of the testbed. It basically details somecomputational information about the machines beingemployed, such as the number of available processors.This allows the user to have an overview of the stateof the Grid infrastructure.

(vi) Destroy the Metascheduling Session. It instructs theGrid service to stop the task allocation process, thuscancelling all the on-going simulations, as well as eras-ing any data related to the metascheduling session.

Figure 4 summarises the main interaction between aclient and the Grid service. First of all, the client createsa new metascheduling session and specifies the XML defi-nition of tasks. Then, all the data required for the execu-tion of the tasks is transferred via GridFTP to the Gridservice machine. Next, GMarteGS performs the executionof the session delegating on the functionality implementedby GMarte. At the same time, the user is notified aboutchanges in the state of the tasks. Once the session has fin-ished, the user can retrieve the output data to the clientmachine. Notice that it is also possible to retrieve the out-put results of individual tasks as long as they have finished.

3.3. GMarteGS Architecture and Implementation

The principal architecture of the Grid service developedis shown in Figure 5. The Grid service has been split into aFactory Service (FactoryGMarteGS) and an Instance Ser-

vice (GMarteGS), following recommendations of [11]. Onthe one hand, the Factory service is in charge of creatingnew WS-Resources, which store the stateful data, that is,a MetaschedulingSession object. Notice that this creationis actually performed by the ResourceHome, which storesand manages WS-Resources. On the other hand, the In-

5

stance service (GMarteGS) represents the core of the Gridservice, and it implements the operations that affect a givenmetascheduling session. Although it is not necessary to splitthe functionality in two different services, this approach en-ables to introduce new scenarios (not covered in this paper)such as having a distributed federation of metaschedulingGrid services, where the Factory service diverts the clientrequests of creating new sessions to the most appropriatemetascheduler Grid service, based on availability, priorityor any other parameter.

The interaction between the client machine and themetascheduler Grid service is also covered in Figure 5.First of all, the client needs to know the URI (Uniform Re-source Identifier) of the FactoryGMarteGS Grid service,which represents a unique address to identify this service.

Next, a new metascheduling session is requested to becreated (step 1). This causes (step 2) the ResourceHometo create a new WS-Resource (step 3), which contains theMetaschedulingSession object that hosts the session infor-mation, returning a unique identifier (Resource Key) of thenewly created WS-Resource (step 4). Using this ResourceKey plus the URI of the GMarteGS service, an endpointreference (EPR) is created and returned to the client (step5, 6). An EPR, in this context, unambiguously identifiesboth the GMarteGS service and the metascheduling ses-sion (WS-Resource) that will be affected by the operationsinvoked.

Later, the client invokes any of the methods publishedby GMarteGS (step 7). Then, using the ResourceHomethis service can automatically locate the WS-Resource themethod will work with, from the EPR (step 8) and theoperation is performed (step 10), only affecting the corre-spondent MetaschedulingSession object. Finally, the resultof the method invocation is returned to the client (step 11).

3.4. WSRF-based Specifications

GMarteGS uses several WSRF specifications given thatthe adoption of standards enables to create interoperablecomponents.

First of all, the WS-Resource specification describes therelation between a service and a resource as well as the ac-cess mechanisms to resources through interfaces based onWeb services. In GMarteGS, each metascheduling sessioncorresponds to a WS-Resource. This enables to isolate ses-sions of different users. The fact that each WS-Resource hasa unique identifier enables to automatically know the ses-sion on which a certain operation has to be invoked, givenan EPR.

Secondly, the WS-ResourceProperties specification stan-dardizes the mechanism to define the properties of a re-source, which represent their state and how they can be de-clared as part of the service interface. In GMarteGS, eachWS-Resource has a resource property represented by theXML summary of the state of the metascheduling session,thus capturing the state of the WS-Resource.

The WS-ResourceLifeTime specification defines mecha-nisms to destroy a WS-Resource. Basically, it specifies twodifferent strategies: immediate and lease-based destruction.GMarteGS enables to destroy a metascheduling session atany moment cancelling the execution of its tasks. In addi-tion, it implements an automatic mechanism, based on thisspecification, to destroy finished sessions that have beenidle (no operation has been performed with the session) af-ter 15 days. This enables clients to use the Grid service ma-chine also as a temporary data repository for the generatedoutput data from executions.

Finally, we have used some WS-Notifications specifica-tions, particularly WS-Topics and WS-BaseNotification,although they are not part of WSRF but available in GT4.They allow part of a system to be notified when an eventoccurs in another point of it. Thus, a publish-subscriptionmethod based on topics is introduced where a client sub-scribes to a topic in order to receive notifications relatedto it. This feature has been incorporated to GMarteGS sothat changes in the state of the tasks are automatically no-tified to the client by means of XML data describing thecurrent state of the tasks. As initial and final phases ofmetascheduling cause many task state changes we have im-plemented basic contention mechanisms to aggregate mul-tiple notifications that occur within the same time intervalin order to reduce the communication overhead.

The usage of a notification mechanism reduces the work-load in the service as the multiple users no longer require aperiodical query to track the state of the tasks. Only whenchanges are detected a communication is performed. How-ever, the usage of notifications relies on the installation ofat least the Java WS Core of GT4, in the client machine,to be able to receive the notifications. This is a handicapwhen creating lightweight components to interact with theGrid service.

3.5. Security Infrastructure

Security in a multi-user environment accessible from theInternet is a crucial aspect. Therefore, it is important thatthe service guarantees the principal features of a securecommunication: privacy, integrity, authentication and au-thorization [13]. To address this problems we have usedstandard mechanisms based on GSI (Grid Security Infras-tructure) provided by the Globus Toolkit 4.

Figure 6 shows the interaction diagram between theclient and the service detailing the security aspects in-volved. The client must have available a user certifi-cate signed by a Certification Authority (CA) on whichGMarteGS trusts. Otherwise, it is impossible to perform acommunication.

The FactoryGMarteGS service has been configured torequire GSI Secure Conversation to establish a dialog. Thismechanism forces to automatically perform a mutual au-thentication so that both the client and the service ensurethat they are talking to the expected partner. In addition,

6

(7) Invoke a GMarteGS Method (on the WS-Resource)

(1) Create New Metascheduling

SessionFactoryGMarteGS

GMarteGS

ResourceHome

MetaschedulingSession

(WS-Resource)


(WS-Resource)

...

GMarte Grid Service(GMarteGS )

(2) Create New WS-Resource

(3) Create

(4) ResourceKey(6) ObtainedEPR

(8) Get WS-Resource from EPR

(9) Obtained MetaschedulingSession

(10) Invoke MetaschedulingSession

Method

(11) Get the Result of the Invocation

(5) EPR = GMarteGS URI + ResourceKey

Client Machine

Fig. 5. Architecture of the metascheduler Grid service as well as client-service information flow.

Client Machine

User

/C=ES/O=UPV/OU=GRyCAP/CN=John Doe FactoryGMarteGS

GMarteGS


WS-Resource, Owner = Other

User)


WS-Resource, Owner = John Doe

ResourceHome

Authorized Users

Check Authorization G

SI Secure Conversation

Authentication GSI Secure Conversation

Authentication

(2) Create New WS-Resource

(Using Credential Delegation)

(3) Create

(1) Create New Metascheduling

Session

(4) Invoke a GMarteGS Method

(on the WS-Resource)

(5) Get WS-Resource from EPR

(6) Obtained MetaschedulingSession

(7) Invoke the Method on the MetaschedulingSession

(Authorization Error if the Client is not the WS-Resource Owner)

GMarte Grid Service(GMarteGS )

Fig. 6. Diagram of the client-service interaction with the security mechanisms employed.

the channel is configured to support privacy and data in-tegrity. Once the connection has been established, the Fac-toryGMarteGS service queries a database of users, thusgranting access only to authorized ones.

The current security scheme forces the client to use acredential delegation mechanism that enables the serviceto create a new WS-Resource (metascheduling session) set-ting its owner to the user credentials. This feature is of fun-damental importance as all the accesses to a WS-Resourcecan only be performed by the creator of the session. Thissecurity approach is more secure than a simple access con-trol to authorized users, as access policies to WS-Resourcesare controlled by themselves.

The security mechanisms implemented achieve the fol-lowing goals:

(i) All the communications between the client and theservice are private and can not be altered by a thirdparty without the receiver to notice it.

(ii) An user which has not been registered as an au-thorized user of the Grid service can not createmetascheduling sessions.

(iii) Only the creator and owner of a session can invokean operation on it. Any other user authorized by thesystem has no access to this session.

3.6. The GMarteGS Client GUI

GMarteGS uses standard protocols like SOAP (SimpleObject Access Protocol) and XML for data exchange, andWSDL (Web Services Definition Language) for the inter-face specification. Therefore, a client using such protocolscould programmatically interact the service, no matter theprogramming language employed. Additionally, a client li-brary implemented in Java has been developed for that pur-pose. First of all, this library implements methods for datamanagement between the client and the service. It enablesto automatically perform data transfers from the XML de-scription of tasks provided by the user. In addition, it ex-poses high level methods to interact with GMarteGS. Thedevelopment of this client library enables to incorporatethe service functionality within applications that, using theAPI provided, can delegate the execution of tasks to theGrid service.

Alternatively, we have developed a commodity graphicalclient that provides easy-to-use interaction with the servicefor those users that just want to perform remote task exe-cution but do not require a full API (see Figure. 7). Thisapplication allows to:

(i) Perform data transfers between the client and theservice, both for the input and output data of the

7

tasks.(ii) Specify the different XML documents that are re-

quired to create a new metascheduling session. It ismandatory to provide at least the definition of tasks.

(iii) Show the current state of the metascheduling proce-dure by means of populating tables that summarisethe process.

(iv) Know the current state of the testbed, describingthe main features of the computational resources in-volved.

To enable an easy access to GMarteGS, we have reliedon the Java Web Start technology (JWS) to create a self-contained client that can be accessed by only means of aJava-enabled web browser. This feature provides ubiqui-tous access functionality to our client, which can be ac-cessed from every platform that supports Java.

JWS enables to deploy a full Java application to a webserver. This technology provides all the support to au-tomatically retrieve the latest version of the applicationand starts the application in a sandbox in the client ma-chine.The usage of a cache mechanism enables to save band-width from the client-side, while ensuring that the latestversion of the application will always be deployed to theclient machine.

Before working with the application, the user must haveavailable in the client machine all the components requiredfor security based on GSI. This information typically in-volves:

(i) An user certificate signed by a CA, which certifiesthat the public key included in the certificate belongsto the user. This way, the user can be recognised assuch by the computational resources in the Grid thattrust this CA.

(ii) A private key, only readable by the user, which en-ables to decipher messages addressed to the userwhich were ciphered with his/her public key. In GSIsecurity, the private key is also protected by a pass-word to introduce an additional level of security.

(iii) A certificate of the CA that signed the host certificateof the Grid service machine.

The security configuration on the client machine is a one-time process, provided that any certificate involved doesnot expire. In order to ease this procedure, we have alsodeployed under JWS the configuration setup of the JavaCommodity Grid Kit [14], a step-by-step GUI that enablesto configure the client machine for GSI security. In addi-tion, proper CA certificates, which signed the computa-tional resources certificates, must reside in the Grid servicemachine to access the machines during the task allocationprocedure.

Once the GSI configuration has been performed, theclient will always interact with the computational resourcesthrough the Grid service, via an user proxy, which imple-ments a single sign-on strategy that prevents the user fromtyping the password each time the private key is accessed.The proxy is just a small file that holds the user credentialfor a limited amount of time.

After the security configuration procedure, the user cannow access the Web site that hosts the GMarteGS ClientGUI component. This process automatically starts in theclient machine the JWS support to download the requestedcomponent. This involves retrieving the GMarteGS ClientGUI software itself and all its dependent libraries (a set ofJAR files), which broadly cover the following areas:

(i) GMarte libraries employed for reliable data transfers,accessing XML documents, etc.

(ii) Cryptographic libraries required for the GSI-enabledsecurity infrastructure.

(iii) Logging components that trace the execution ofGMarte and other Java modules.

(iv) Support libraries to access the WSRF-based Grid ser-vices, which includes components such as AXIS mod-ules for handling Web services and WSDL modulesfor the description of the service interface.

All this information represents a total 8 MBytes to bedownloaded to the client machine. Considering the cachethat implements JWS, a lightweight access to the clientGUI is provided.

Before the application is started in the client machine,the user is notified that the software demands full accessto his/her machine (in order to read certificates, performGridFTP data transfers, create directories, etc.). In addi-tion, the certificate information that signed the applicationis shown. Upon acceptance, the GUI component is launchedfor the user to interact with the Grid service.

To ensure client multiplatform, successful tests have beenperformed in several architectures (Pentium IV, Intel Xeon,AMD Opteron and UltraSPARC-III) and operating sys-tems (Linux Fedora Core, Windows XP and Solaris 10).

3.7. Fault-Tolerance

Using a service-oriented approach where client and serverreside in different machines introduces new problems whichmust be managed.

First of all, failures in the client should not affect the ex-ecutions being carried out in the remote computational re-sources. The strategy implemented in GMarteGS relies onthe fact that an EPR uniquely identifies both the Grid ser-vice and the WS-Resource that represents the metaschedul-ing session.

Therefore, when a new metascheduling session is created,its EPR is automatically stored. Therefore, a failure in theclient can be managed by using a new one that, using thestored EPR, accesses the session. This functionality enablesto decouple the client and the service so that a user cancreate a session, submit the executions and switch off thecomputer. Later, the client will be able to reconnect to thesession to track the state of the metascheduling procedure.

Failures occurred in both the computational resourcesor the tasks are managed by the GMarte metascheduler,which already provided this functionality. Therefore, theservice is only aware of changes in the state of the tasks.

8

(a) (b)

Fig. 7. Snapshots of the graphical client developed to interact with GMarteGS.

In order to handle failures in the service itself, wehave implemented a persistent approach. By default, WS-Resources reside in the main memory, so a failure in GT4container or in the service causes those resources to disap-pear. Therefore, we have employed the Java serializationmechanism to store a copy of the metascheduling sessionin secondary storage. In the case of failure, it is possible torestart the GT4 container and any operation performed ina session stored in disk will cause its automatic loading toprincipal memory. This enables to retake the metaschedul-ing session for those tasks that still had not finished.

4. An Application of the GMarte Grid Service:

Computational Gateway

Nowadays, GMarteGS is being used to provide computa-tional access to the Grid resources located at our researchgroup for the study of cardiac pathologies. The understand-ing of cardiac electrical activity is fundamental for the de-velopment of new therapies to address diseases such as ven-tricular fibrillation, the most mortal of arrhythmias. Forseveral years now our research group has a collaborationwith the Center for Research and Innovation on Bioengi-neering (Ci2B), at the Universidad Politecnica de Valen-cia. We have developed a High Performance Computing-based cardiac simulator which enables to reduce the exe-cution time of cardiac simulations in a cluster of PCs [15].This is an MPI-based application which solves the reaction-diffusion problem of action potential propagation on car-diac tissues. The simulator employs computationally inten-sive cardiac cellular models and requires the resolution of alarge sparse linear equation system in each simulation timestep. The usage of MPI-2 I/O enables to collaboratively

store on disk the large volume of biomedical data by thedifferent processors in a parallel execution.

However, there are many cardiac studies that require toperform multiple simulations. For example, studies of vul-nerable window in ischemia need to vary the time inter-val between two consecutive electrical stimulus in order todetect the range of values which provoke a reentry, a phe-nomenon that can derive into heart fibrillation. Besides, toevaluate the influence of certain medicines, it is necessaryto modify the concentration of drugs to study how they af-fect the electrical activity of the tissue. These cardiac casestudies are composed of multiple independent simulationswhich are ideal for their execution on a Grid infrastructure.

As part of this collaboration, GMarteGS has been de-ployed as the entry point to a Grid infrastructure com-posed by several clusters of PCs within our research group.The Client GUI application enables the experts to focus onthe definition of the cardiac case studies, while GMarteGSprovides a transparent support for the parallel executionof the different simulations on the machines that composethe Grid infrastructure. As a result, biomedical experts caneasily enlarge their computational power, using a Grid in-frastructure.

Figure 8 summarises the interaction with the Grid ser-vice by the biomedical experts. Notice that many clientscan concurrently work with the service. In this particularsituation, the Client GUI application has been restrictedso that users need only specify the tasks to be executed.

9

(4) The MetaScheduling

Simulation is Started

(2) The Client GUI is Automatically Deployed

(1) The User Accesses the Graphical Client

BiomedicalExpert

ClientMachine

Web Server

Computational Grid

(3) The User Specifies the Tasks

XML Definition of Tasks

(5) The Execution is Performed

Through GMarte

(6) The Output Results are Retrieved

(7) The User Retrieves the Output Data

and Destroys the Session

h p w o r ks ta ti on xw 4 0 00

Grid Service Machine (GMarte)

GMarteGS Client GUI

Fig. 8. Usage of GMarteGS to provide biomedical experts with trans-parent access to a computing infrastructure.

4.1. A Computational Case Study: Analysis of Ischemia

on the EGEE Testbed

The implemented Grid service supports the capability todelegate executions on the EGEE testbed [17], the largestproduction Grid infrastructure all over the world. Thisfunctionality enables to transparently access hundreds ofcomputers, via the high level interfaces in GMarte, for theexecution of computationally intensive applications.

In order to assess the benefits of using a large Grid infras-tructure, we have instrumented the execution of a typicalcardiac case study on the EGEE testbed, via the SOA de-veloped. The case study analyses the effects of myocardialischemia, a condition of the tissue caused by oxygen depri-vation to the heart muscle, in the electrical propagation.The generation of ventricular tachycardia and fibrillationcan be studied using a model of a regionally ischemic tissue[16]. This study introduces the conditions that appear dur-ing the first 10 minutes of myocardial ischemia, on a centralzone of 50x50 cells in a 250x250 cells bidimensional virtualtissue. Using a time increment of 0.5 minutes, 21 indepen-dent parametric executions are required. Each simulationreflects the conditions of the tissue on the specified min-utes after the onset of ischemia. Each execution simulates80 ms and stores a snapshot of the membrane potential ofall the cells of the tissue every 0.2 ms in the interval [60,80]ms, generating a total amount of data of 47 MBytes persimulation.

For the execution, we have relied on the resources ofthe Biomed Virtual Organisation (VO), within the EGEEtestbed. As MPI executions are currently not supportedby LCG-2, we have disabled parallel executions on this in-frastructure. The involved LCG-2 services are the StorageElement (SE), for storing both the input and output data

of a simulation, the Resource Broker (RB), for submittingthe execution, the Computing Element (CE) and its corre-sponding Worker Nodes (WN), where the executions takeplace, the Logging & Bookkeeping (LB), for monitoringthe state of the tasks, and the LFC (LHC File Catalog),to manage file replicas on the Grid. All these services aretransparently orchestrated, in GMarte, in order to achievethe data management, the delegation of execution and themonitoring of tasks. This VO has currently more than 1800queues available for execution.

Concerning the execution results, the metaschedulingsession lasted for 1.71 hours, since the task allocationstarted until the output data of the last task were retrieved.The resource selection policy of the RB was to choosethe machine with the largest number of free CPUs. As aconsequence, all the executions were allocated to the samemachine, which had 160 available CPUs out of a total 190WNs. A sequential execution of the whole case study on aPentium Xeon at 2.8 GHz with 2 GByte of RAM, required29.4 hours. Therefore, the Grid approach was 17.19 timesfaster than the sequential approach, enabling to producemore results per time unit. All the executions were concur-rently executed in the Grid infrastructure. Therefore, thedelays in the Grid approach included the data transfers re-quired between GMarteGS and the SE as well as betweenthe WNs and the SE. It should be mentioned that LCG-2introduces a substantial delay since a job request is sentuntil the application gets running on a remote resource.In order to alleviate this problem, a multi-threaded ap-proach is employed, in GMarte, to perform concurrent jobsubmission of different tasks.

Currently, we are planning to use the developed systemas a computational gateway to a two-layer Grid infrastruc-ture. The first one would be composed by clusters of PCsfrom our research group and the second one would be theEGEE testbed. This would enable firstly to use our lo-cal resources, which produce shorter execution times (dueto the overhead of LCG-2), until they become exhausted.Then, executions would be transparently delegated to anEGEE-based production Grid. This approach would offer asustained quality of service to the biomedical users. Otherexamples of execution results on local and regional Grid in-frastructures based on the Globus Toolkit, using GMarte,can be found in [8].

Decoupling the scientific activities from the computa-tional infrastructure has allowed biomedical experts to con-centrate on the scientific side, while the executions aretransparently performed by GMarteGS. Scientists are nowable to access a Grid infrastructure with easy-to-use toolsthat hide the complexity of the Grid. By enabling biomed-ical experts to enlarge their computational infrastructure,research productivity is easily increased, as more simula-tions per time unit can now carried out.

10

5. Related Work

This work addresses the topic of multiplatform metasche-duling combined with a WSRF-based approach to create amulti-user metascheduler accessible through standard in-terfaces and an easy-to-use graphical client.

In the field of resource brokering we find GridWay [18]which is one of the principal metaschedulers available forGlobus platforms. GridWay is an open source metasched-uler that performs unattended, reliable and efficient exe-cution of jobs on computational resources based both onGT2 and GT4. However, GridWay only runs on UNIX-likeoperating systems. Moving to a SOA, we find the eNanosGrid Resource Broker [19], a metascheduler implementedin GT3 to manage remote application executions.

To the best of our knowledge, there is little work focus-ing on metaschedulers exposed via WSRF-based Grid ser-vices. In [20], a WSRF-based metascheduler is designed,along with a client application, which performs task allo-cation to resources. Its Grid service receives single task re-quests, which cause a resource discovery to occur prior toexecution. In GMarteGS, we generalise this approach byintroducing the concept of a metascheduling session, en-abling whole case studies to be submitted for execution,which only involves one resource discovery per study. In ad-dition, we provide a security infrastructure and emphasizeon the usage of high level client-side interfaces which easethe process for the end users.

As another Grid service metascheduler, we can also findthe Community Scheduler Framework (CSF) [21], whichwas originally developed in GT3 and later adapted to GT4.While GMarteGS is able to manage different metaschedu-ling sessions, which can involve different resources, CSF4receives individual jobs specified in RSL (Resource Speci-fication Language). In addition, CSF4 does not coordinatedata staging for the application data. Also, GMarteGS usesstandard security mechanism between sessions while no re-lated documentation is found about CSF4. This is the caseof the the Gridbus Broker [22], which features a WSRF-based service interface but no security approach appears tohave introduced.

6. Conclusions

In this paper, the GMarte metascheduler has been in-tegrated into a Service-Oriented Architecture based onWSRF Grid services, implemented via the Globus Toolkit4. Using standard functionalities provided by WSRFspecifications have enabled to implement a multi-usermetascheduler which is generic and interoperable withother services in the Grid ecosystem. In addition, standardsecurity mechanisms based on GSI have been implementedto ensure data privacy and integrity.

The usage of GMarteGS has been described as a com-putational gateway for the study of the electrical activityof the heart, enabling biomedical experts to transparently

use a Grid infrastructure via high-level graphical tools. Wehave also provided bindings to access the EGEE infrastruc-ture, within the same interfaces, thus enlarging the compu-tational capabilities accessible. This way, a case study thatanalysis different ischemic conditions has been executed onthe resources of the Biomed VO, using GMarteGS, obtain-ing a considerable speedup.

The widespread adoption of Grid Computing technolo-gies require high level components that reduce the gapfrom the complexity of the underlying Grid middlewareto the user. The development of a system, such as theone described in the paper, paves the way to interoperablemetaschedulers which may cooperate via standard inter-faces to provide resource brokering functionality on com-putational Grids.

References

[1] I. Foster, C. Kesselman, The Grid: Blueprint for a NewComputing Infrastructure, Morgan Kaufmann, 2004.

[2] The Globus project: a status report, Future GenerationComputer Systems, 15 (5-6), 607-621.

[3] I. Foster, Globus Toolkit version 4: Software for service-orientedsystems., in: LNCS (Ed.), IFIP International Conference onNetwork and Parallel Computing, 3779, 2005, 2-13.

[4] LCG - LHC Computing Grid, URL: htp://www.cern.ch/lcg[5] J. M. Schopf, Ten Actions When Superscheduling, SchedWD

8.5, Scheduling Working Group (2001).[6] J. M. Alonso, V. Hernandez, G. Molto, An object-oriented

view of Grid computing technologies to abstract remote taskexecution, in: Proceedings of the Euromicro 2005 InternationalConference, 2005, 235-242.

[7] J. M. Alonso, V. Hernandez, G. Molto, GMarte: Gridmiddleware to abstract remote task execution, Concurrency andComputation: Practice and Experience 18 (15) (2006) 2021-2036.

[8] J. M. Alonso, J. M. Ferrero (Jr.), V. Hernandez, G. Molto, M.Monserrat, J. Saiz, Three-dimensional cardiac clectrical activitysimulation on cluster and Grid platforms, Lecture Notes inComputer Science 3402 (2005) 219-232.

[9] J. M. Alonso, V. Hernandez, R. Lopez, G. Molto, A serviceoriented system for on demand dynamic structural analysis overcomputational Grids, Lecture Notes in Computer Science 4395(2007) 13-26.

[10] I. Foster, et al., The Open Grid Services Architecture, version1.5 (March 2006).

[11] B. Sotomayor, L. Childer, Globus Toolkit 4: Programming JavaServices, Morgan Kaufmann, 2005.

[12] WSRF - The WS-Resource Framework, URL:http://www.globus.org/wsrf.

[13] N. Ferguson, B. Schneier, Practical Cryptography, John Wiley& Sons, 2003.

[14] G. von Laszewski, I. Foster, J. Gawor, P. Lane, A Javacommodity Grid kit, Concurrency and Computation: Practice& Experience 13 (8-9) (2001) 645-662.

[15] J. M. Alonso, J. M. Ferrero (Jr.), V. Hernandez, G. Molto,M. Monserrat, J. Saiz, Computer simulation of action potentialpropagation on cardiac tissues: An efficient and scalableparallel approach, Parallel Computing: Software Technology,Algorithms, Architectures and Applications, included in series:Advances in Parallel Computing 13 (2004) 339-346.

[16] B. Rodriguez, N. Trayanova, and D. Noble, Modeling cardiacischemia, Ann. N.Y. Acad. Sci., vol. 1080 (2006) 395-414.

[17] EGEE - Enabling Grids for E-sciencE, URL: http://public.eu-egee.org.

11

[18] E. Huedo, R. S. Montero, I. M. Llorente, A modular meta-scheduling architecture for interfacing with pre-WS and WS Gridresource management services. Future Generation ComputerSystems 23 (2) 252-261

[19] I. Rodero, J. Corbalan, R. Badia, J. Labarta, eNANOS Gridresource broker, Advances in Grid Computing - Egc 2005 3470

(2005) 111-121.[20] E. Elmroth, J. Tordsson, An interoperable standards-based Grid

resource broker and job submission service, in: e-Science 2005.First IEEE Conference on e-Science and Grid Computing, 2005,212-220.

[21] W. Xiaohui, D. Zhaohui, Y. Shutao, H. Chang, L. Huizhen,CSF4: A WSRF compliant meta-scheduler, The 2006 WorldCongress in Computer Science, Computer Engineering, andApplied Computing (GCA’06), 2006, pp. 61-67.

[22] S. Venugopal, R. Buyya, L. Winton, A grid service brokerfor scheduling e-science applications on global data grids,Concurrency and Computation: Practice & Experience 18 (6)(2006) 685-699.

12

Documents

AService-OrientedWSRF-basedArchitectureforMetaschedulingon ComputationalGrids · 2020. 2. 4. · 1. Introduction GridComputingtechnologies[1] ... tion through a server typically listening