
Deliverable D3.5 Report on the results of the WP3 2nd and 3rd prototype

DELIVERABLE D3.5

REPORT ON THE RESULTS OF THE WP3 2ND AND 3RD PROTOTYPE

WP3 New Grid Services and Tools

Document Filename: CG3.0-D3.5-v1.2-PSNC010-Proto2Status

Work package: WP3 New Grid Services and Tools

Partner(s): ALGO, CSIC, CYFRONET, DATAMAT, ICM, PSNC, TCD, UAB, UCY

Lead Partner: PSNC

Config ID: CG3.0-D3.5-v1.2-PSNC010-Proto2Status

Document classification: PUBLIC

Abstract: This report introduces WP3 deliverable D3.5, which is the result of the work done after deliverable D3.4 (June 2003). In that period we held two integration meetings, each producing a prototype version. The second integration meeting was held in Poznań (PSNC, Poland) at the end of July 2003, the third one in Nicosia (Cyprus) at the end of January 2004. The main objectives of these meetings were to reach the threshold at which a tool can be described as 'grid enabled', and to make the tools and services available in the entire testbed and useful to other applications. The report describes the state of development up to the end of January 2004, when we released the 3rd WP3 prototype.


Delivery Slip

              Name            Partner                                                    Date        Signature
From          Norbert Meyer   ALGO, CSIC, CYFRONET, DATAMAT, ICM, PSNC, TCD, UAB, UCY    5.02.2004
Verified by   Andres Gomez,   CESGA,
              Victor M. Gulias  University of Corunna                                    27.02.2004
Approved by

Document Log

Version  Date           Summary of changes
0.8      December 2003  Early draft version as a base for discussions before the Technical Board meeting
0.9      18.01.2004     Includes plans and the state of implementation to be done during the integration meeting in Cyprus
1.0      4.02.2004      Last corrections done after the integration meeting in Cyprus, standardization and linking into one document
1.1      5.02.2004      Editorial changes
1.2      28.02.2004     Changes done after the internal review process, see CG3-CAF-v1.0-PSNC010-D35.doc

Authors:
Executive Summary: N. Meyer
Task 3.1: Mirosław Kupczyk, Bartosz Palak, Marcin Płociennik, Norbert Meyer, Paweł Wolniewicz, Miltos Kokkosoulis, Stefano Beco, Marco Sottilaro
Task 3.2: Miquel A. Senar, Álvaro Fernández, Elisa Heymann, Marco Sottilaro, Enol Fernandez, Antonio Hervas, Karol Wawrzyniak, Adam Padee, Krzysztof Nawrocki
Task 3.3: Bartosz Baliś, Brian Coghlan, Stuart Kenny, Marcin Radecki, Tomasz Szepieniec, Kazimierz Bałos
Task 3.4: Łukasz Dutka, Jacek Kitowski, Renata Słota
Task 3.5: Santiago González de la Hoz

CG3.0-D3.5-v1.2-PSNC010-Proto2Status.doc PUBLIC 2 / 358

Deliverable D3.5 Report on the results of the WP3 2nd and 3rd prototype

Contents

1 EXECUTIVE SUMMARY
2 DEFINITIONS, ABBREVIATIONS AND ACRONYMS
3 REFERENCES
4 STATUS AND PROGRESS OF WP3 PROTOTYPE
APPENDIX: DETAILED STATUS REPORT OF WP3 TOOLS AND SERVICES
5 PORTALS AND ROAMING ACCESS
  5.1 INTRODUCTION
  5.2 REFERENCES
  5.3 IMPLEMENTATION STRUCTURE
  5.4 PROTOTYPE FUNCTIONALITY
  5.5 USER MANUAL
  5.6 INTERFACE DESCRIPTION
  5.7 INTERNAL TESTS
  5.8 ISSUES
  5.9 SUMMARY AND FUTURE PLANS
6 GRID RESOURCE MANAGEMENT
  6.1 INTRODUCTION
  6.2 REFERENCES
  6.3 SCHEDULING AGENT
  6.4 POSTPROCESSING MONITORING SYSTEM
  6.5 ISSUES
  6.6 SUMMARY AND FUTURE PLANS
7 GRID MONITORING
  7.1 INTRODUCTION
  7.2 REFERENCES
  7.3 OCM-G
  7.4 SANTA-G
  7.5 JIMS
  7.6 SUMMARY AND FUTURE PLANS
8 OPTIMISATION OF DATA ACCESS
  8.1 INTRODUCTION
  8.2 REFERENCES
  8.3 IMPLEMENTATION STRUCTURE
  8.4 PROTOTYPE FUNCTIONALITY
  8.5 USER MANUAL
  8.6 INTERNAL TESTS
  8.7 ISSUES
  8.8 SUMMARY AND FUTURE PLANS
9 TESTS AND INTEGRATION
  9.1 INTRODUCTION
  9.2 REFERENCES
  9.3 IMPLEMENTATION STRUCTURE
  9.4 PROTOTYPE FUNCTIONALITY
  9.5 INTERNAL TESTS
  9.6 SUMMARY AND FUTURE PLANS


1 EXECUTIVE SUMMARY

Deliverable D3.5 is released after 8 months of development as a public report describing the second and third prototypes of workpackage WP3.

The main objective of workpackage WP3 (New Grid Services and Tools) is to develop the Grid services and software infrastructure required to support the Grid users, applications and tools defined in workpackages WP1 and WP2. This workpackage includes a set of tools and services which, together with the results of WP2, define the middleware layer of the CrossGrid project. Additionally, the workpackage includes extra tasks concerning tests and integration as well as co-ordination. The formal list of tasks in WP3 [Annex], [BMM, 2002]:

Task 3.0 Co-ordination and management
Task 3.1 Portals and roaming access
Task 3.2 Grid resource management
Task 3.3 Grid monitoring
Task 3.4 Optimisation of data access
Task 3.5 Tests and integration

The former deliverables focused on the following issues:

Detailed planning for all the tools and services, including use cases for WP3: D3.1 [Deliverable D3.1], month 3

The first deliverable was released in month 3 as a report [Annex] on detailed planning for all the tools and services, including use cases covering:

o Requirements of the end user
o Definition of the monitoring environment functionality and of interfaces to the Task 2.4 specification of GRM agents
o Review of the state of the art and current related techniques.

The deliverable encompassed the Software Requirements Specification (SRS) prepared for each task (3.1 to 3.5), enhanced later in D3.2 and D3.3.

Internal progress report of WP3: D3.2 [Deliverable D3.2], month 6

The aim of this report was to give a comprehensive design for each task, including a proposal of the test procedures. The security issue has been treated as a cross-sectional problem and therefore delivered as an extra report.

The second deliverable:
o Defined the architecture of each tool and service
o Described the submodules and dependencies between them
o Defined the list of interface functions for the outside world
o Described the technology planned to be used during the development
o Described the use cases
o Delivered the static and dynamic diagrams of each tool and, finally, gave procedures for testing the product during the test phase and before the release.

First WP3 software release: D3.3 [Deliverable D3.3], month 12

The third deliverable announced the first software release of the middleware tools and services developed in WP3. The main issues of D3.3 were the following:

o Test scenarios, evaluation suite
o Documentation of design, implementation, and interfaces.

The report described in detail the functionality of the 1st prototype (a guide for the end user) and a set of installation steps (publicly available for each module independently, and as a general list of installation steps for the whole WP3, used mainly within the CrossGrid consortium). Additionally, the D3.3 report described the structure of the WP3 software. The fourth deliverable (D3.4), month 15, was an internal progress report on WP3 software evaluation and testing:

o Feedback from applications, new requirements
o Detailed integration plan [Deliverable D3.4].

The short overview of past deliverables gives a reference to all information currently available. Finally, deliverable D3.5 presents the current state of development of all WP3 tools after launching two prototypes:

- Migrating Desktop and portal systems
- Grid resource management with the postprocessing analysis module, supporting the scheduler system
- Infrastructure and application monitoring systems (SANTA-G, JIMS and OCM-G)
- Data access optimization module.

Additionally, the report sums up the effort of tests and integration (task 3.5). The report should be treated as a nutshell guide for end users who are going to use the tools (inside the project and/or external ones). Therefore we provide all the necessary information, such as:

- Contacts to developers, references to source codes
- Software structure
- Interface description
- Installation procedures
- Detailed functionality description (user manual)
- List of known bugs.

For internal project purposes D3.5 includes additional sections describing:

- Differences with respect to past prototypes
- List of dependencies and known errors
- Future plans
- Internal test results.

Each tool is described in a similar way, which allows the potential user to reach the necessary information easily, just by taking the appropriate chapter with all reference materials (user manuals, installation guides, source codes or binaries).

The deliverable summarizes the effort and experiences of the last 7-8 months. The major milestones were the integration meetings in Poznań (Poland, July 2003) and Nicosia (Cyprus, January 2004). The remaining time before and after the meetings was used to produce more robust versions with enhanced functionality.


Work Package 3 delivered the second prototype version of its software, integrated with some applications (WP1). The state of functionality for each task can be found under the [Poznan] link.

After the meeting in Poznań we continued work to obtain more stable versions of the software and to develop the functionality, which might have changed slightly because of the re-evaluation of demands. We also verified the detailed plan for the coming months for each task, as described in deliverable D3.4.

The Cyprus meeting (January 2004) focused mainly on the wider integration of applications with the tools, services and programming environment, on running the tools in the entire testbed, and on automating the process of generating executables (taking the source code from CVS and installing it at any location within the CrossGrid testbed).

From the formal point of view, we reshaped the structure of work package WP3: the postprocessing analysis, located so far in Grid monitoring (task 3.3), has been moved to Grid resource management (task 3.2) to emphasize the main project objectives.

The current implementation progress is described in chapter 4 and in more detail in the Appendix:

Task 3.1
The MD and portal are used by almost all applications in WP1 and by some tools from WP2 and WP3. They are available as a service in the testbed, with the main installation in Poznań and backup installations in Nicosia and at Cyfronet.

Task 3.2
The Scheduler is installed at LIP. It serves all the local clusters in the testbed. The integration with the postprocessing has been fixed at the interface level; the process is in the integration and deployment phase. The postprocessing is at the phase where sensors are installed on every testbed node.

Task 3.3
The SANTA-G and OCM-G modules are available on the local cluster, and JIMS in a few locations in the X# testbed. The process of installing those tools on all the testbed clusters was started after the integration in Nicosia. The OCM-G has been integrated with the postprocessing analysis, and JIMS and SANTA-G with GridBench. The integration of JIMS and SANTA-G with the postprocessing analysis is currently in progress.

Task 3.4
Six months after the Poznań meeting, during the integration meeting in Cyprus, task 3.4 presented the second, more advanced prototype, which was fully integrated with EDG Optor/Reptor. The presented solution is able to work for both projects and makes the mutual integration more seamless. During the meeting, measurements of data access estimation quality were performed and presented. The meeting was also a good opportunity to spread our software around sites in the CrossGrid testbed. At the end, the final demonstration was done based on four sites: Cyfronet Kraków, SAS Bratislava, LIP Lisbon and FZK Karlsruhe.


To sum up, the D3.5 report describes all the tools developed in WP3 until January 2004, integrated with other X# components and tested in the testbed [Nicosia].

2 DEFINITIONS, ABBREVIATIONS AND ACRONYMS

AP          Application Portal
API         Application Programming Interface
CE          Computing Element
ClassAd     Classified Advertisements
Condor-G    Condor component that interfaces with Globus
CPU         Central Processing Unit
CrossGrid   The EU CrossGrid Project IST-2001-32243
CVS         Concurrent Versioning System
DataGrid    The EU DataGrid Project IST-2000-25182
EDG         European DataGrid abbreviation
GC          Grid Console
Globus      Grid middleware
HPC         High Performance Computing
HTC         High Throughput Computing
JDL         Job Description Language
JIMS        JMX-based Infrastructure Monitoring System
JIRO        Sun JIRO, an implementation of the FMA
JSS         Job Submission Service
LCG/LCG-1   LHC Computing Grid (version 1)
LDIF        LDAP Data Interchange Format
MD          Migrating Desktop
MDS         Meta Directory System
MPICH       Implementation of the Message Passing Interface library
OCM-G       Grid-enabled OMIS-compliant Monitoring System
OCM         OMIS-Compliant Monitor
PBS         Portable Batch System
RAS         Roaming Access Server
RB          Resource Broker
RPMS        Red Hat Package Management (packages)
SANTA       System Area Networks Trace Analysis
SANTA-G     Grid-enabled System Area Networks Trace Analysis
SE          Storage Element
UI          User Interface
UDAL        Unified Data Access Layer
UWM         University of Wisconsin-Madison (UW-Madison)
XML         Extensible Markup Language


Prototype ver. 1 - first release, launched after the integration meeting in Santiago de Compostela (Spain), February 2003
Prototype ver. 2 - second release, launched after the integration meeting in Poznań (Poland), August 2003
Prototype ver. 3 - third release, launched after the integration meeting in Nicosia (Cyprus), February 2004

3 REFERENCES

All references mentioned in the main document:

[Annex]             http://www.eu-crossgrid.org
                    http://www.cyf-kr.edu.pl/crossgrid/CrossGridAnnex1_v31.pdf

[BMM, 2002]         M. Bubak, J. Marco, H. Marten, N. Meyer, M. Noga, P.M.A. Sloot, M. Turała, "CrossGrid - development of grid environment for interactive applications", PIONIER 2002 conference proceedings, Poznań, April 2002, pages 97-112

[MBM, 2003]         N. Meyer, M. Bubak, J. Marco, H. Marten, M. Noga, P.M.A. Sloot, M. Turała, "First prototype version of CrossGrid tools and services", PIONIER 2003 conference proceedings, Poznań, April 2003

[CVS]               http://savannah.fzk.de/cgi-bin/viewcvs/crossgrid/crossgrid/wp3

[Deliverable D3.4]  http://wp3.crossgrid.org/pages/intranet/d34

[Deliverable D3.3]  http://wp3.crossgrid.org/pages/intranet/d33-24-02-2003.html
                    http://www.eu-crossgrid.org/M12deliverables.htm

[Deliverable D3.2]  http://wp3.crossgrid.org/pages/intranet/dd-04-09-2002.html

[Deliverable D3.1]  http://wp3.crossgrid.org/pages/intranet/srs-03-06-2002.html

[Nicosia]           https://savannah.fzk.de/websites/crossgrid/iteam/

[Poznan]            http://gridportal.fzk.de/websites/crossgrid/iteam/presentations or
                    http://wp3.crossgrid.org/doc/poznan

[Santiago]          Presentations, minutes, results from discussions in Santiago (February 2003)
                    http://wp3.crossgrid.org/pages/intranet/santiago_presentation.html


4 STATUS AND PROGRESS OF WP3 PROTOTYPE

This chapter gives the current state of development reached in January 2004. In fact, the second prototype version was treated as a temporary one, with no major significance for further use. Therefore we recommend using the latest, 3rd prototype (January 2004 revision), which is more robust and stable and has enhanced functionality. All prototype versions can be found in the CVS repository [CVS]. This chapter shortly describes the current state of development; more detailed information on each tool can be found in the Appendix. Fig. 4-1 depicts an overview of the tools and services, the dependencies between them, and the connections to other work packages.


[Fig. 4-1 is a diagram; its recoverable labels show the WP3 components and their connections: the Web browser, Desktop Portal Server, Application Portal Server, RAS and JSS (task 3.1); the Scheduling Agent, Resource Broker, Condor-G, EDG LB/RB server and the Postprocessing Grid Monitoring Data Analysis with the Grid Traffic Monitoring Tool (Ganglia, IP tables) (task 3.2); JIMS, SANTA-G and OCM-G (task 3.3); the Data Access Estimator and Component Expert (task 3.4); together with the applications, GridBench, the Replica Manager, CEs and SEs.]

Fig. 4-1 General structure of WP3 tools


The Migrating Desktop and Portal developers (task 3.1) focused on improving the existing features. The functionality was also enlarged by a number of integrated applications. The current 3rd prototype is a continuation of the software released after the integration meeting in Poznań in July 2003. The major aim was to achieve better stability of the software and to support developers of any grid application, especially the GUI part of each application, which is crucial for faster and better introduction of newly developed computational problems. The developers also put a lot of effort into the integration of such applications into the CrossGrid framework using the Roaming Access Server and Migrating Desktop. Without these tools the user would not be able to reproduce their working environment anywhere else in the grid. The list below shows all the new and enhanced functionality delivered in prototype ver. 2.0:

- Submission of batch jobs
- Example of submission of semi-interactive jobs
- Grid Explorer / Grid Commander tools
- Integrated mechanism of Application Plugin and Application Container
- Integrated mechanism of Tool Plugin and Tool Container
- Private Storage support
- Operations on the Virtual Directory
- Text file editor
- Basic graphical format support (JPEG, SVG)
- Extended information available in the Job Monitoring Dialog
- Graphical interface improvements.

Additional effort was put into the development phase before releasing prototype ver. 3.0, including new functionality:

o Unified view and operations on different file systems (local, GridFTP, ftp, virtual directory, private storage)

o Integrated mechanism of JobSubmissionTool Plugin and JobSubmissionTool Container

o Semi-interactive jobs support
o MPI jobs support
o Online Help.

Optimization of the following features:

o Migration to a newer version of web services: Axis version 1.1
o Migration to a newer version of Java CoG: v1.1
o Static/dynamic code review
o Improved stability
o Improved speed of launching the Migrating Desktop
o Improved Grid Explorer / Grid Commander tools.

Also, important changes have been made since the first prototype regarding the Application Portal (AP). The most important one is the abandonment of the PHP-Nuke technology and the fact that the portal developers moved from Apache's Jakarta Project to the Jetspeed framework.


Jetspeed is an open source implementation of an enterprise portal system, based on the Java programming language and XML. In contrast, other technologies have remained in use since the first prototype, such as the Roaming Access Server (RAS). The Application Portal is based on the RAS machine, which provides the web services. Those web services (e.g. the Job Submission Services) allow the user to submit jobs and perform other tasks on the CrossGrid testbed, such as obtaining the status of the jobs that have been submitted, or obtaining information about the nodes available in the testbed. The Job Submission Services (JSS) (Tasks 3.1 and 3.2) provide a set of services that allow performing job submission and monitoring. They are provided as Web Services. The CrossGrid Migrating Desktop (MD) and the CrossGrid Portal invoke these services to submit and monitor the user jobs. On the other side, they are connected to the Scheduling Agent (SA), integrated into the EDG Resource Broker, and to the EDG Logging & Bookkeeping (L&B). The interface towards these two components has been implemented by re-using (and adapting) part of the EDG software contained in the Workload package. The Web Services are implemented using the tools provided by the Axis distribution (http://ws.apache.org/axis/). In more detail, the following work was done during the second project year:

- migration to the EDG 2.0.18 software from the old version 1.x;
- migration to newer versions of some required libraries;
- providing the tools for submitting interactive jobs;
- after providing basic functionality in the first prototype, we focused on performance measurements and improvements for the second prototype.

For the second prototype we did not need to change the design of the JSS. We only changed the implementation, in order to provide new services (like interactivity) and improve the services of the first prototype. For the first prototype the CrossGrid testbed was based on the DataGrid 1.x software, so the first prototype of the JSS was also based on this version of EDG. As all components of the testbed have migrated to version 2.0 (included in the LCG-1 distribution), the JSS has been updated as well. We have also moved to a newer distribution of Axis (version 1.1), which is more stable than the old one. We use the Axis libraries:

- at development time, to automatically generate the Java stub and skeleton code of the Web Services and to deploy them to a Tomcat web server;
- at runtime, to serve the incoming requests.

Work in the second prototype has been focused on providing tools for interactivity: tools that allow users to submit their jobs and interact with them while they are running on a Computing Element (for instance, receiving their output while they run). The submission service for interactive jobs has not been provided as web services yet. A mechanism based on the VNC tools has been put in place between the Migrating Desktop and the RAS machine, in order to allow the user to access this service, which is installed on the RAS.
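To make the Web Service mechanism described above more concrete, the sketch below shows how a client such as the Migrating Desktop could call an Axis 1.1 service with the dynamic invocation API. It is a minimal illustration only: the endpoint URL, namespace, operation name and return type are assumptions, not the documented JSS interface, and in practice the client stubs would be generated from the service WSDL with Axis' WSDL2Java tool.

```java
import javax.xml.namespace.QName;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

public class JssClientSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical RAS/JSS endpoint; the real URL comes from the deployment.
        String endpoint = "http://ras.example.org:8080/axis/services/JobSubmissionService";

        // Axis 1.1 dynamic invocation: create a Call without pre-generated stubs.
        Service service = new Service();
        Call call = (Call) service.createCall();
        call.setTargetEndpointAddress(new java.net.URL(endpoint));

        // Operation and namespace are illustrative, not the documented JSS API.
        call.setOperationName(new QName("urn:jss", "submitJob"));

        // A minimal JDL description passed as a plain string parameter.
        String jdl = "[ Executable = \"/bin/hostname\"; StdOutput = \"out.txt\"; ]";

        // The service is assumed to return a job identifier as a string.
        String jobId = (String) call.invoke(new Object[] { jdl });
        System.out.println("Submitted job: " + jobId);
    }
}
```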


During the integration meeting in Cyprus, a live demo was successfully shown with an interactive job provided by the HEP application (WP1.3). Integration with the CrossGrid testbed:

- all code is in the CVS repository;
- as for the first prototype, the clients of the JSS are the Migrating Desktop and the Web Portal (WP3.1);
- the JSS supports the submission of MPI jobs to the Scheduling Agent.

Task 3.2 deals with the development of a resource management system for scheduling parallel applications on the Grid, and of a postprocessing monitoring system delivering information for better tuning of the scheduler. The resource management system is commonly referred to as the Scheduling Agent and is based on the EDG Resource Broker, which has been extended and modified in order to provide support for parallel applications written with the MPI library. In particular, two scheduling systems have been developed during the second year of the project. The first one (which we refer to as the second release) was based on EDG middleware release 1.4. This system provides full support for running MPI applications in a single remote cluster. More specifically, the following set of services was implemented and tested:

- A Resource Selector that returns a list of sites where the MPI application can be executed, according to the requirements provided by the user.
- A Scheduling Service that selects one of the sites provided by the Resource Selector and passes it to the application launcher.
- An Application Launcher for MPI jobs that run in a single cluster (MPICH-p4 jobs). The Application Launcher is responsible for carrying out all the necessary actions to prepare the job and submit it to the remote Grid site.
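For illustration only, a request for such an MPI job is typically expressed in JDL (see the DataGrid JDL attributes document listed in section 5.2.1). The small Java helper below just assembles such a description as a string; the executable, sandbox files and requirement expression are made-up values, not taken from this deliverable.

```java
public class MpiJdlExample {
    /** Builds an illustrative JDL description for an MPICH job (values are placeholders). */
    public static String buildJdl(int nodes) {
        return "[\n"
             + "  JobType       = \"MPICH\";\n"                  // parallel job handled by the MPI launcher
             + "  NodeNumber    = " + nodes + ";\n"              // number of CPUs requested
             + "  Executable    = \"mpi_app\";\n"
             + "  InputSandbox  = { \"mpi_app\", \"input.dat\" };\n"
             + "  OutputSandbox = { \"out.txt\", \"err.txt\" };\n"
             + "  StdOutput     = \"out.txt\";\n"
             + "  StdError      = \"err.txt\";\n"
             + "  Requirements  = other.GlueCEInfoLRMSType == \"PBS\";\n"
             + "]";
    }

    public static void main(String[] args) {
        System.out.println(buildJdl(4));
    }
}
```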

The above services were integrated with the Roaming Access Server, and this was demonstrated during the meeting held in Poznań in July 2003 with the first prototypes of the applications from tasks 1.2 and 1.3. Additionally, a successful alpha test was carried out by members of LIP, and a document summarizing the results of that test was produced. The second system (which we refer to as the third release) was developed to guarantee compatibility with the new middleware release adopted in the CrossGrid testbed, known as LCG-1. LCG-1 is mainly based on EDG middleware release 2.0 and is incompatible with EDG middleware release 1.4. Existing services from the second release were migrated and adapted as needed to the new Resource Broker of LCG-1. Furthermore, the following services were included in our third release:

- The Resource Selector includes a set-matching algorithm that was developed during the first year of the project. It was adapted to the new Glue Schema of the Information Index available in LCG-1. The algorithm was modified taking into account the default distribution of resources into queues adopted in the LCG-1 testbed, and edg-job-list-match was also enhanced to improve the comprehensibility of the command output.
- The Scheduling Service includes a scheduling policy that selects one site among the different choices provided by the Resource Selector and passes it to the corresponding application launcher.


The selection criteria are based on user preferences and the number of different sites in each group of resources.

- The launcher for MPICH-p4 jobs was extended to deal with clusters of dual-CPU machines.
- A new Application Launcher has been developed for MPI jobs that can run over multiple sites (MPICH-G2 jobs). It carries out file staging and remote node setup, and is also responsible for co-allocating the different sub-jobs belonging to the parallel application, following a two-step commit protocol (a minimal sketch of such a protocol is given below).
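The following sketch illustrates the general idea of a two-step commit for co-allocation: no sub-job is released to start until every sub-job has reported that it is ready. The SubJob interface and its methods are hypothetical; the real MPICH-G2 launcher built with Condor-G is considerably more involved.

```java
import java.util.List;

public class CoAllocationSketch {

    /** Hypothetical handle on one sub-job of an MPICH-G2 application. */
    interface SubJob {
        boolean prepare();   // stage files, set up the remote node, report readiness
        void commit();       // release the barrier so the sub-job starts computing
        void abort();        // clean up if co-allocation fails
    }

    /** Two-step commit: either all sub-jobs start, or none of them does. */
    static boolean coAllocate(List<SubJob> subJobs) {
        // Step 1: ask every sub-job to prepare and wait for all answers.
        for (SubJob job : subJobs) {
            if (!job.prepare()) {
                // One site failed: abort every sub-job and report failure.
                for (SubJob j : subJobs) {
                    j.abort();
                }
                return false;
            }
        }
        // Step 2: all sub-jobs are ready, so commit (start) all of them.
        for (SubJob job : subJobs) {
            job.commit();
        }
        return true;
    }
}
```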

The above services have also been integrated with the Roaming Access Server and the rest of the LCG-1 testbed middleware. Three additional mechanisms have been included in the third release in order to improve the adaptability of the different modules to the dynamic changes of the Grid. These mechanisms are the following:

- A temporal reservation mechanism has been included in the Resource Selector module to guarantee time-limited and exclusive access to a set of resources for any given job submitted to the Grid. The mechanism is intended to alleviate the differences that exist between the Information Index and the Resource Broker regarding the status of the Grid.
- A rejection mechanism for faulty sites. This adaptability mechanism is included in the Scheduling Service and is used to discard sites that have recently failed when they have been used to run an MPICH-G2 job on a group of various sites (a simple sketch of such a blacklist follows this list).
- A reliability mechanism for MPI job co-allocation. This mechanism is included in the MPICH-G2 Application Launcher in order to detect all possible situations that prevent a certain MPICH-G2 application from running properly. It is intended to improve the adaptability of the overall scheduling system to Grid failures.
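As a purely illustrative sketch of the faulty-site rejection idea (not the actual Scheduling Service code), the class below remembers recent failures and filters those sites out of the candidate list for a configurable period; the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative blacklist of sites that recently failed an MPICH-G2 co-allocation. */
public class FaultySiteFilter {
    private final Map<String, Long> lastFailure = new HashMap<>();
    private final long banMillis;

    public FaultySiteFilter(long banMillis) {
        this.banMillis = banMillis;
    }

    /** Record that a site has just failed. */
    public void reportFailure(String site) {
        lastFailure.put(site, System.currentTimeMillis());
    }

    /** Drop sites whose last failure is more recent than the ban period. */
    public List<String> filter(List<String> candidates) {
        long now = System.currentTimeMillis();
        List<String> usable = new ArrayList<>();
        for (String site : candidates) {
            Long failedAt = lastFailure.get(site);
            if (failedAt == null || now - failedAt > banMillis) {
                usable.add(site);
            }
        }
        return usable;
    }
}
```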

Moreover, some additional work has been carried out in order to anticipate the needs for the new functionality that will be included in the third release. This work includes, on the one hand, a glide-in launcher that will constitute the basis for a pre-emption mechanism. Job pre-emption is planned to be used to schedule interactive applications if non-interactive applications don't leave enough free resources for interactive jobs. On the other hand, theoretical work on a new scheduling policy that takes into account network latency and bandwidth has also been carried out. This policy will take advantage of the network metrics provided by the postprocessing monitoring tool in the near future. It is also worth mentioning that a research cooperation agreement was signed with the Condor Team. Joint research was carried out in order to design and implement a robust and reliable Application Launcher for MPICH-G2 jobs that extends the robustness and reliability features already provided by Condor-G for sequential jobs.

The main objectives of the third prototype of the postprocessing monitoring system were:

- To build a system for gathering monitoring data from the grid, such as cluster load and data transfers between clusters.
- To build a central monitoring service to analyze the above data and provide it in a format suitable for the scheduler, which will be the task's main data consumer.


- To use the predict tool, which was built for the second prototype, to forecast the collected Grid parameters.

The third prototype's functionality is much closer to the planned final functionality of the postprocessing task than the second prototype's functionality was. The main improvement is its potential ability to work on the whole CrossGrid testbed and to provide real data to end users. The launched prototype consists of four main layers:

- The part responsible for collecting the data from the CrossGrid clusters.
- The part responsible for gathering the collected data on the central monitoring host.
- The part that analyzes the gathered data, processes it and prepares summary data for final users.
- The part that provides two user interfaces: a GUI that uses the WWW, and another one based on SOAP technology.

The future development of the task will concentrate on the following topics:

- Incorporating data provided by the CrossGrid monitoring systems SANTA-G and Jiro into the postprocessing system, for example the very interesting metrics provided by Jiro about data traffic on routers and available bandwidth.
- Adding new metrics, both on individual hosts and for clusters, for example the available bandwidth between clusters.
- Providing more data analysis tools on the central monitoring host.

The Grid monitoring system (task 3.3) consists of 3 modules. Two of them belong to infrastructure monitoring: the SANTA-G (task 3.3.2) and JIMS (task 3.3.3) systems. OCM-G (task 3.3.1) monitors parameters on the application level. The purpose of the OCM-G (Grid-enabled OMIS-compliant Monitoring System) is to be an intermediate layer between tools for application development support and applications running on the Grid. The OCM-G enables tools to collect information about, and manipulate, the applications. The report describes the implementation and functionality of the current prototype of the OCM-G. We also provide a description of the installation procedure, and a user manual in which we explain how to use this version of the OCM-G. This version of the OCM-G features, among others, the following extensions in functionality in comparison with the previous version. Firstly, full GSI-based security has been introduced: the user now needs a proper certificate to use the monitoring system. Secondly, configuration facilities have been implemented: Local Monitors are now able to discover Service Managers from an external information service (currently a configuration file), so a shared file system is no longer required by the OCM-G. Thirdly, a new service returning the list of functions used by a program has been added, which enables the restriction of defined measurements to the level of functions.

The SANTA-G system provides a generic template for ad-hoc, non-invasive monitoring with external instruments. This document describes the implementation structure and the functionality of the second prototype of these services.


The first prototype had an incomplete schema of the monitoring data available, a limited set of supported SQL, incomplete functionality of the Sensor component of the SANTA-G system, and inadequate security access. In this prototype the schema of the available information is more or less complete. The SQL supported by the SANTA-G QueryEngine component is now the same subset as that supported by the DataGrid R-GMA, and the functionality of the Sensor component is now complete. Thus all the issues except security have been fully overcome, as planned. Security will be addressed from now until the end of the CrossGrid project, again as planned.

JIMS is an infrastructure monitoring system for exposing operating system parameters (CPU statistics, number of processes, memory used, filesystem statistics) and network infrastructure parameters (SNMP attributes) to external monitoring applications through Web Services, as specified in the OGSA/OGSI specifications. The second prototype of JIMS has been completely redesigned, because support for the JIRO technology, on which the first prototype was based, was withdrawn by Sun. As stated, the prototype of JIMS presented in Poznań was based on the JIRO technology. It used a special tool called BeanShell for manipulating monitoring stations as Java objects and instrumenting them administratively (manually). This concept required the use of graphical tools during the deployment of the monitoring system. The current version of JIMS has been redesigned to overcome these inconveniences. Its name has been changed from the JIRO Infrastructure Monitoring System to the JMX Infrastructure Monitoring System; in fact, it is now based on a pure JMX reference implementation and uses no graphical interface during deployment, which allows the process to be automated using shell scripts. The main part of the work involved was a complete refactoring of the source code and a redesign of the parts in which JIRO was used. The dynamic discovery and heartbeat facilities were implemented, a special script for starting all instances of JIMS on each monitored node was designed, and RPM specifications and Ant scripts for RPM building were prepared. The work planned for after the third prototype release is the complete autobuild integration, dynamic loading of monitoring agents, and the implementation of a security mechanism for Web Service authentication and authorization.
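Since JIMS is now based on plain JMX, the generic sketch below shows the standard JMX pattern it relies on: a standard MBean exposing a couple of host metrics is registered with an MBeanServer, from which a connector (and ultimately a Web Service wrapper) can read them. The MBean name, attributes and values are illustrative only and do not reproduce the actual JIMS instrumentation.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JimsStyleMBeanSketch {

    /** Standard MBean interface: getters define the attributes exposed via JMX. */
    public interface HostMonitorMBean {
        int getProcessCount();
        long getFreeMemoryBytes();
    }

    /** Trivial implementation; a real agent would read /proc, SNMP, etc. */
    public static class HostMonitor implements HostMonitorMBean {
        public int getProcessCount() {
            return 42; // placeholder value
        }
        public long getFreeMemoryBytes() {
            return Runtime.getRuntime().freeMemory();
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // The domain and key properties of this ObjectName are made up for the example.
        ObjectName name = new ObjectName("example.jims:type=HostMonitor");
        server.registerMBean(new HostMonitor(), name);

        // Local read-back of one attribute, standing in for a remote connector call.
        System.out.println("Processes: " + server.getAttribute(name, "ProcessCount"));
    }
}
```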


The most powerful factor of the grid environment is the unification of heterogeneity. Task 3.4, which is mainly involved in data access optimization for interactive applications, also copes with storage heterogeneity. The proposed Unified Data Access Layer (UDAL) is supposed to simplify access to grid storage and make it simpler and more efficient. During the integration meeting in Poznań, task 3.4 provided the first working prototype of UDAL, working as a universal data access cost estimator. As a result of cooperation with EDG WP2, the presented solution was partially integrated with EDG Optor. Six months after the Poznań meeting, during the integration meeting in Cyprus, task 3.4 presented the second, more advanced prototype, which was fully integrated with EDG Optor/Reptor. The presented solution is able to work for both projects and makes the mutual integration more seamless. During the meeting, measurements of data access estimation quality were performed and presented. The meeting was also a good opportunity to spread our software around sites in the CrossGrid testbed. At the end, the final demonstration was done based on four sites: Cyfronet Kraków, SAS Bratislava, LIP Lisbon and FZK Karlsruhe. Due to the packaging of the code into RPMs, the installation at these sites was extremely simple, and future upgrades can be done automatically.

Finally, the tests and integration task (3.5) supported the process of distributing and testing all WP3 tools all over the CrossGrid testbed. New release versions of WP3 were delivered in August 2003 (Poznań) and January 2004 (Nicosia). The last release includes a set of tools and services which run on the international CrossGrid testbed and co-operate with the applications. The integration team tested all tools and services of WP3 which are used by the first application prototype before releasing it in the CrossGrid testbed. Task 3.5 has checked that the code and scripts are in the CVS repository. The integration of tools and middleware on the testbed supporting the first application prototypes has been achieved and correlated with WP4, WP2 and WP1 people and their time schedules. The next production CrossGrid testbed has been based on the LCG-1 middleware, and this has been taken into account from the integration point of view. The Flooding (WP1.2), Meteorological (WP1.4) and HEP applications are running without problems on the testbed (WP4) with the Migrating Desktop, the Portal (WP3.1) and the Scheduling Agent (WP3.2). The High Energy Physics (WP1.3) application was integrated in Poznań on the testbed. The jobs can be submitted from the User Interface to the new Resource Broker. The interactivity service was presented with this application during the integration meeting in Nicosia by WP3.2 and WP3.1. The Scheduling Agent (WP3.2), Roaming Access Server (RAS, WP3.1) and Migrating Desktop (MD, WP3.1) are fully integrated and ready to work with MPICH-P4. The MD presents MPI jobs in the distributed environment. WP3.2 launched the Scheduling Agent under LCG-1, and it works with MPICH-G2. Task 3.4 exploited its functionality with EDG 2.0 (mainly available under LCG-1). The data access optimisation software was successfully installed at the following sites: Cyfronet, LIP, SAS and FZK, using their storage systems. Integration with the MD (WP3.1) is being achieved. In general, application integration on the testbed with the WP3 services and tools is progressing, in particular for the flooding, meteorological and HEP applications. Tools and services from WP3.3 are running on local clusters. Data management and storage functionality is being added to the CrossGrid testbed. The Resource Broker with scheduling agents submits jobs (HPC and HTC) all over the testbed.

The following Appendix describes all WP3 tools in more detail: the Portals and Roaming Access task (chapter 5), the Grid resource management task (chapter 6), the Grid monitoring task (chapter 7), and Optimisation of data access (chapter 8). Chapter 9 describes the integration procedures and the results achieved during the Poznań and Nicosia meetings.


Appendix

Detailed status report of WP3 tools and services


5 PORTALS AND ROAMING ACCESS

5.1 INTRODUCTION

This chapter presents the current status and the progress over the last 12 months of CrossGrid WP3.1. The prototype consists of a few components: the Roaming Access Server, the Migrating Desktop, the Application Portal and other necessary services developed for the purposes of the CrossGrid applications.

5.1.1 PURPOSE

In the past 12 months the Migrating Desktop was significantly improved; some internal interfaces were corrected or optimized. Particular effort was put into optimizing file transfer between local and grid machines. The proposed solution, Grid Commander, improves the management of different types of storage and gives the user a common abstract view of all handled file systems (local file system, FTP/GridFTP, Virtual Directory, and any other in the future). To create a uniform Application and Tool environment, some internal structures of the JSS Wizard have been improved; there is now a common user interface (dialogs) for grid applications and tools alike. As the first working interactive application, the MD could launch a VNC stream and migrate the remote graphical steering to the local workstation. In the meantime, the CoG library version 1.1 has been linked into the MD framework. Also, the Axis server and the corresponding libraries have been moved from version 1.0b3 to 1.1. This work had to be performed to keep compatibility with the corresponding services developed by other CrossGrid partners. It is worth mentioning that installation packages (RPMs) have been created, with additional auto-deploy scripts for the installation machines. New features of the 3rd Prototype:

- Unified view and operations on different filesystems (local, gridftp, ftp, virtual directory, private storage).

- Integrated mechanism of JobSubmissionTool Plugin and JobSubmissionTool Container.

- Semi-interactive job support.
- MPI job support.
- Online Help.

Optimisation:
- Migration to the newer version of web services: Axis version 1.1.
- Migration to the newer version of CoG: v1.1.
- Static / dynamic code review.
- Improved stability.
- Improved speed of launching the Migrating Desktop.
- Improved Grid Explorer / Grid Commander tools.
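To illustrate the kind of abstraction behind the "unified view and operations on different file systems" feature listed above, the sketch below defines a minimal common interface that local, FTP, GridFTP or Virtual Directory back-ends could implement, together with a trivial local back-end. The interface and class are assumptions for illustration; they are not the actual Grid Commander API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical common view over different storage types (local, FTP, GridFTP, ...). */
interface VirtualFileSystem {
    List<String> list(String directory) throws Exception;
    void upload(File localFile, String remotePath) throws Exception;
    void download(String remotePath, File localFile) throws Exception;
}

/** Trivial back-end for the local file system; other protocols would plug in the same way. */
class LocalFileSystem implements VirtualFileSystem {
    public List<String> list(String directory) {
        List<String> names = new ArrayList<>();
        File[] entries = new File(directory).listFiles();
        if (entries != null) {
            for (File f : entries) {
                names.add(f.getName());
            }
        }
        return names;
    }

    public void upload(File localFile, String remotePath) throws Exception {
        java.nio.file.Files.copy(localFile.toPath(), new File(remotePath).toPath());
    }

    public void download(String remotePath, File localFile) throws Exception {
        java.nio.file.Files.copy(new File(remotePath).toPath(), localFile.toPath());
    }
}
```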

The enhancements that have taken place during the last 12 months in the CG Portal are the following:


- Two generic portlets have been added (dg portlets): Job-List-Match and Job-Log-Info.
- An ANT installation bundle for the whole portal and for the database of submitted jobs has just been prepared.
- A programmer's guide, which will be incorporated in the installation bundle, is currently being written as well.
- The Javadoc documentation of the portal's Java code has been improved.
- Some database enhancements have also taken place (a Job_Name attribute has been added).
- A new job submission wizard is being written.
- We have started rewriting and enhancing the code for the CG applications and we are still in this phase.

5.1.2 DEFINITIONS, ABBREVIATIONS, ACRONYMS

API         Application Programming Interface
APS         Application Portal Server
CA          Console Agent
CoG         Commodity Grid Toolkit
CS          Console Shadow
GUI-C       Graphical User Interface Container
HEP         High Energy Physics
HTTP        HyperText Transfer Protocol
HTTP-S      HTTP Server
HTTPS       HyperText Transfer Protocol Secure
JDL         Job Description Language
JNI         Java Native Interface
JS          Job Shadow
JSP         Java Server Pages
JSS / JSSs  Job Submission Services
JSS-Client  Job Submission Services - Client
JSS-Server  Job Submission Services - Server
L&B         Logging and Bookkeeping
LB          Logging and Bookkeeping
LDAP        Lightweight Directory Access Protocol
LRMS        Local Resource Management System
MD          Migrating Desktop
MPI         Message Passing Interface
PDA         Personal Digital Assistant
RA          Roaming Access
RAS         Roaming Access Server
SA          Scheduling Agent
SM          Session Manager
SOAP        Simple Object Access Protocol
SOM         Self Organising Map
SSL         Secure Sockets Layer


SVG         Scalable Vector Graphics
TBD         To Be Defined
VO          Virtual Organisation
WB          Web Browser
WMS         Workload Management System
WSDL        Web Services Description Language
XML         Extensible Markup Language

5.2 REFERENCES

5.2.1 Documents

1. http://xml.apache.org/axis/
2. http://jakarta.apache.org/tomcat/
3. http://eu-datagrid.web.cern.ch/eu-datagrid/
4. http://savannah.fzk.de/cgi-bin/viewcvs.cgi/crossgrid/crossgrid/wp3/wp3_2scheduling/src/workload/
5. http://java.sun.com/docs/books/tutorial/native1.1/index.html
6. http://www.cs.wisc.edu/condor/bypass/
7. http://gridportal.fzk.de/cgi-bin/viewcvs.cgi/crossgrid/crossgrid/wp3/wp3_1-portals/
8. http://ras.man.poznan.pl/crossgrid
9. Design Document, http://www.eu-crossgrid.org/M6deliverables.htm - CG3.1-D3.2-v1.2-PSNC021-RoamingAccess&Portals.pdf
10. Task 3.1 SRS WP3 New Grid Services and Tools (CG-3.1-SRS-0017)
11. Task 3.1 Design Document WP3.1 (CG3.1-D3.2-v1.3-PSNC022-RoamingAccess&Portals)
12. Task 3.2 SRS WP3 New Grid Services and Tools (CG-3.1-SRS-0010)
13. CrossGrid Scheduling Agent Design Document CG Scheduling Agent (CG3.2-D3.2-v1.0-UAB020-SchedulingAgentsDesign)
14. DataGrid JDL attributes - DataGrid-01-TEN-0142-0_2
15. Report on the results of the WP3 2nd and 3rd prototype (CG3.0-D3.5-v1.0-PSNC010-Task3-2)

5.2.2 SOURCE CODE

The source code of the MD, RAS and necessary services is available at the official address:
http://savannah.fzk.de/cgi-bin/viewcvs.cgi/crossgrid/crossgrid/wp3/wp3_1-portals/src/
The structure of the src/ tree is presented in Fig. 5-1:


Fig. 5-1 The structure of the src/ tree

The source code of the Application Portal (Fig. 5-2) and the necessary services is available at the official address:


http://savannah.fzk.de/cgi-bin/viewcvs.cgi/crossgrid/crossgrid/wp3/wp3_1-portals/portal_jetspeed/

Fig. 5-2 The structure of the Application Portal directory tree

Detailed Java class descriptions are generated by Javadoc and are available at the address:
http://savannah.fzk.de/cgi-bin/viewcvs.cgi/crossgrid/crossgrid/wp3/wp3_1-portals/doc/
The complete source code of the Portal is available at the following address:
http://savannah.fzk.de/cgi-bin/viewcvs.cgi/crossgrid/crossgrid/wp3/wp3_1-portals/portal_jetspeed/
The prototype code name is: 3rd Prototype.

5.2.3 CONTACT INFORMATION

The contact information for the partners responsible for the prototype and integration:

Marcin Płociennik   [email protected]   PSNC


Paweł Wolniewicz    [email protected]   PSNC

MD and RAS:
Bartek Palak        [email protected]   PSNC
Mirosław Kupczyk    [email protected]   PSNC
Rafał Lichwała      [email protected]   PSNC
Marco Sottilaro     [email protected]   DATAMAT

Application Portal:
Yannis Perros       [email protected]   ALGO
Miltos Kokkosoulis  [email protected]   ALGO

Benchmarks:
Wei Xing            [email protected]
George Tsouloupas   [email protected]

5.3 IMPLEMENTATION STRUCTURE

5.3.1 System decomposition

The main components of the proposed architecture of Task 3.1 and their position in the layered structure are shown in Fig. 5-3 and Fig. 5-4.

[Fig. 5-3 is a diagram; its recoverable labels show a layered view: a User Level (Web Browser), an Application Level (Desktop Portal Server and Application Portal Server accessed over HTTP), a Roaming Access Server Level (authentication and authorisation, job submission, user profile management), Higher Level Services (DataGrid Job Submission, Grid Resource Management, Distributed Data Access, Grid Monitoring) and Lower Level Services (LDAP, GridFTP, Globus services such as GASS, MDS, GIS).]

Fig. 5-3 Layered view of Task 3.1 architecture


[Fig. 5-4 is a diagram; its recoverable labels show a component view: Web Browsers and the Web Client connected to the Desktop Portal Server, Application Portal Server and Roaming Access Server, which in turn interacts with the LDAP database, Replica Manager, Scheduling Agent, Grid Benchmarks & Metrics Web Server and Logging & Bookkeeping.]

Fig. 5-4 Component view of Task 3.1

The main components of the proposed architecture were:

- Web Browser: a simple web browser displaying pages generated by the HTTP Server or handling the Migrating Desktop applet;
- Application Portal Server: a service that provides the information the HTTP Server needs to create web pages. It keeps information about user sessions and provides first parameter verification (see description below);
- Desktop Portal Server (Migrating Desktop): a service that extends the functionality of the Application Portal Server by providing a specialised advanced graphical user interface and a sharing mechanism that allows the user to make files stored on his machine available from other locations/machines;
- Roaming Access Server: a network server responsible for managing user profiles, authorisation and authentication, job submission, file transfer and grid monitoring (see description below).

5.3.2 Implementation state

- Web Browser: not a subject of WP3.1 work; any web browser available on the market that supports the Java plugin (e.g. Netscape, MS Internet Explorer, etc.) can be used for accessing the portal pages or the Migrating Desktop applet.
- Command Line Interface: the functionality previously available through a command-line Job Submission Wizard application. This way of accessing the functionality from a text-based terminal has been replaced by the Job Submission Wizard feature embedded in the MD.
- Application Portal Server: the Portal.
- Desktop Portal Server: at the moment it serves only the Migrating Desktop applet and all corresponding Java archives; the mechanism for serving the specialized advanced graphical user interface has been moved to the applet (web client) module. This is an implementation difference compared to the assumptions stated in the SRS documentation.


- Roaming Access Server: the main and most important module of the Task 3.1 system. Several services (based on the web services mechanism) have been implemented:
  - user profile management: services responsible for storing/restoring the user configuration;
  - an authentication mechanism that gives simple access control (this issue still needs improvement);
  - a job submission service: allows submitting grid applications, including interactive jobs and MPI jobs;
  - a Virtual User Directory service: allows access to all user files stored in different physical locations as if they were stored in one logical location, and manages file transfer;
  - job monitoring: gives information about the current job state to the user and allows cancelling a job;
  - support for semi-interactive applications and for text-based applications.
- Web Client (MD): a Java applet loaded from a web location, implemented as a graphical user interface to the services offered by the Roaming Access Server.
- Application Portal Server: it is located at the web address

http://kentauros.rtd.algo.com.gr:8080/jetspeed

The machine that is hosting the portal server is also hosting a MyProxy server. The user can use this one, or any other MyProxy server from the CG testbed, in order to delegate his credentials, which will give him the ability to have access to the various portlets inside the portal.

5.3.3 Implementation model vs. the design model

The implementation of the Migrating Desktop modules proceeds according to the design model without remarkable discrepancies; please refer to the Design Document [9]. The main classes (Fig. 5-5) used by the desktop graphical interface are:

- MigratingDesktop - derived from the Java Swing JApplet class; the main application class;
- MigratingDesktopFrame - derived from the Java Swing JFrame class; represents the frame of the main application window;
- MigratingDesktopMenuBar - derived from the Java Swing JMenuBar class; represents the main application window menu bar;
- MigratingDesktopToolBar - derived from the Java Swing JToolBar class; represents the main application window tool bar;
- MigratingDesktopTool - base class for all tools available from the toolbar (such as desktop configuration, the GridFTP graphical client, etc.);
- MigratingDesktopSplitPane - derived from the Java Swing JSplitPane class; represents the main application window splitter;
- MigratingDesktopStatusBar - derived from the Java Swing JLabel class; represents the main application window status bar;


- MigratingDesktopPane - derived from the Java Swing JDesktopPane class; represents the content of the main application window;
- GridDesktopWindow - derived from the Java Swing JInternalWindow class; represents a single Desktop Window;
- GridDesktopPane - derived from the Java Swing JDesktopPane class; represents the content of a Desktop Window;
- GridDesktopItem - base class for all menu items (icons, etc.).

A minimal sketch of how these classes fit together is given after Fig. 5-5.

Fig. 5-5 Structure of Migrating Desktop graphical interface classes
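The following skeleton is a hypothetical illustration of how the listed classes could be assembled into a Swing applet; the constructors, layout details and event handling are invented here for readability and differ from the actual Migrating Desktop code (in particular, JInternalFrame stands in for the "JInternalWindow" class named above).

import javax.swing.*;

// Illustrative skeleton only; method bodies, listeners and the remaining tools are omitted.
public class MigratingDesktop extends JApplet {
    private MigratingDesktopFrame frame;

    @Override
    public void init() {
        // The applet opens the main Migrating Desktop window.
        frame = new MigratingDesktopFrame();
        frame.setVisible(true);
    }
}

class MigratingDesktopFrame extends JFrame {
    MigratingDesktopFrame() {
        super("Migrating Desktop");
        setJMenuBar(new MigratingDesktopMenuBar());
        getContentPane().add(new MigratingDesktopToolBar(), java.awt.BorderLayout.NORTH);
        getContentPane().add(new MigratingDesktopSplitPane(), java.awt.BorderLayout.CENTER);
        getContentPane().add(new MigratingDesktopStatusBar(), java.awt.BorderLayout.SOUTH);
        setSize(800, 600);
    }
}

class MigratingDesktopMenuBar extends JMenuBar { /* main window menu bar */ }
class MigratingDesktopToolBar extends JToolBar { /* toolbar hosting the MigratingDesktopTool entries */ }
class MigratingDesktopStatusBar extends JLabel { /* main window status bar */ }

class MigratingDesktopSplitPane extends JSplitPane {
    MigratingDesktopSplitPane() {
        // Splits the main window; the right side holds the desktop pane with desktop windows.
        super(JSplitPane.HORIZONTAL_SPLIT, new JPanel(), new MigratingDesktopPane());
    }
}

class MigratingDesktopPane extends JDesktopPane {
    MigratingDesktopPane() {
        // Content of the main window: one or more internal desktop windows.
        add(new GridDesktopWindow());
    }
}

class GridDesktopWindow extends JInternalFrame {
    GridDesktopWindow() {
        super("Desktop", true, true, true, true);
        setContentPane(new GridDesktopPane());
        setSize(400, 300);
        setVisible(true);
    }
}

class GridDesktopPane extends JDesktopPane { /* content of a Desktop Window, holds GridDesktopItem icons */ }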

Detailed Java class descriptions are generated by JavaDoc and are available at the following address:


http://savannah.fzk.de/cgi-bin/viewcvs.cgi/crossgrid/crossgrid/wp3/wp3_1-portals/doc/

Management of user profiles is realized by a set of classes inside the ldapmanager.* and ldapmanager.structures.* packages (see Fig. 5-6).

Fig. 5-6 Class Diagram for ldapmanager.* package

5.4 PROTOTYPE FUNCTIONALITY

5.4.1 Migrating Desktop and Roaming Access Server

5.4.1.1 File transfer

Each user of the CrossGrid environment has his own virtual Home Directory, where he can store files during his work with grid resources and applications. The Migrating Desktop supports the User Virtual Directory Browser (similar to a local one), GridFTP connections, FTP connections and Private Storage access.

The user can perform the following file operations:

- saving (uploading) a local file into his file space (Virtual Directory, Private Storage, remote FTP, GridFTP);


- getting (downloading) a remote file from the remote storage services and storing it on the local file system;
- creating links to local files;
- creating links to remote files stored in the User Virtual Directory;
- creating links to a local application.

MD delivers two kinds of directory browsers: Grid Explorer and Grid Commander (see Fig. 5-7), the latter being the more enhanced tool. It is similar in look and functionality to well-known directory managers (UNIX: mc; MS Windows: Total Commander, Far, etc.).

Choosing the Grid Commander icon from the main window toolbar starts this user-friendly file management tool.

Fig. 5-7 Grid Commander main window

Grid Commander is used for transferring files between local and remote locations, as well as between two remote locations. A remote grid location can be a physical Private Storage, a Storage Element, an FTP or GridFTP server, or the user Virtual Directory. Logically, the set of files and directories is presented as a hierarchical tree of directories containing files (with filenames visible). During a transfer, a hidden filename-to-location resolution mechanism is performed, transparently to the user. The basic operations (View, Edit, Copy, Move, MakeDir, Delete, Rename) are available for both local and remote resources. In the case of remote files, the content is downloaded to the local workstation using streaming technology.

A new Private Storage (Fig. 5-8) is added via an option in the GC main menu (Connections -> Private Storage Manager). The user should know the address of the storage.


Fig. 5-8 Private Storage management

5.4.1.2 Job Submission

The main functionality of the Migrating Desktop is the possibility of submitting Grid application jobs, which are executed on high computing power elements in the Grid environment.

The user can start the Application Wizard (see Fig. 5-9) by choosing the Job Wizard icon, from the pop-up menu (New Job Icon), or by pressing Alt-W. This dialog allows the user to choose the proper application for execution; the application description helps to find the proper one.


Fig. 5-9 Application Wizard

After the user chooses the Grid application, a Job Submission dialog appears. It is used to define all the application parameters required before the Grid application starts. This dialog contains the following tabs:

- Arguments
- Description
- Resources
- Files
- Environment
- Tools

The Arguments tab of the Job Submission dialog (see Fig. 5-10) allows the user to define the arguments and parameters that are sent directly to the application. This tab (its content and look) is unique to the application and can differ between Grid applications. In most cases, application developers deliver the Arguments tab. MD offers extra functionality for interpreting the parameter and layout description, so every application developer can prepare the layout of arguments and automatically deliver it to the MD Job Wizard.


Fig. 5-10 Job Submission dialog - example of arguments tab

The Description tab of the Job Submission dialog (see Fig. 5-11) allows the user to define the name and description of the job that is sent to the application. These parameters help the user find the status of the job execution in the Job Monitoring tool (see section 5.4.1.3).

Fig. 5-11 Job Submission Dialog - description tab


The Resources tab of the Job Submission dialog (see Fig. 5-12) allows the user to define limits on the resources used during the execution of the job. The resource limitations relate to the CPU, system memory, operating system, etc.

Fig. 5-12 Job Submission Dialog - resources tab

Limits:

- hostname - the machine on which the job should be run
- cpuCount - number of requested processors
- nodeNumber - number of requested nodes
- maxMem - maximum requested amount of memory acquired during computations
- minMem - minimum requested amount of memory acquired during computations
- maxTime - maximum estimated time of job execution
- minTime - minimum estimated time of job execution
- osType - required operating system type (the meaning is the same as in the JDL file)
- osName - required operating system (e.g. Linux)
- osVersion - required version of the operating system
- cpuSpeed - speed of the CPU clock
- application - name of the application
- JobType - mpi; the default is single; in the future also interactive

These limits are illustrated by the sketch below.
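As an illustration only, the limits above can be thought of as a simple data holder filled in by the Resources tab. The class name, field types and units in this sketch are assumptions made for readability and do not reflect the actual Migrating Desktop classes.

// Hypothetical container for the Resources tab limits; the real MD implementation may differ.
public class ResourceLimits {
    String hostname;     // machine on which the job should run
    int    cpuCount;     // number of requested processors
    int    nodeNumber;   // number of requested nodes
    int    maxMem;       // max requested amount of memory (unit assumed: MB)
    int    minMem;       // min requested amount of memory (unit assumed: MB)
    int    maxTime;      // max estimated job execution time (unit assumed: minutes)
    int    minTime;      // min estimated job execution time (unit assumed: minutes)
    String osType;       // required operating system type (JDL meaning)
    String osName;       // required operating system, e.g. "Linux"
    String osVersion;    // required operating system version
    int    cpuSpeed;     // required CPU clock speed (unit assumed: MHz)
    String application;  // name of the application
    String jobType = "single";  // "mpi", "single" (default), later also "interactive"
}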

The Files tab of the Job Submission dialog (see Fig. 5-13) allows the user to define the set of files on which the Grid application operates. There can be many such files, and the Add and Remove buttons help to manage them. There are three possible types of files: in, out and in-out; the type defines whether a given file is an input file, an output file or both. If you want to run your own application, e.g. a.out, fill the additional Name field with a.out and


give the appropriate path to that executable. The Path column of the table contains a browse button, which calls the Virtual Directory Browser to specify the location of the file in the Grid. To keep a file (e.g. the output) up to date, you may specify a number of seconds as the period for refreshing the data on the SE from the WN. This is strictly connected with the visualization tool, which shows the current progress of execution (semi-interactivity).

Fig. 5-13 Job Submission Dialog - files tab

The Virtual Directory Browser (see Fig. 5-14) helps to choose a file. It is a simplified version of Grid Commander and consists of two panels: a directory tree and the content of the selected directory. Before submitting any application, you have to create an empty file using this utility.


Fig. 5-14 Virtual Directory Browser

The Environment tab of the Job Submission dialog (see Fig. 5-15) allows the user to define a set of environment variable names and their values (strings) required for the execution of the Grid application. There can be many such variables, and the Add and Remove buttons help to manage them.

Fig. 5-15 Job Submission Dialog - environment tab


Fig. 5-16 Job Submission Dialog - Tools tab and OCMG configurator

The last tab, Tools (Fig. 5-16), invokes a CrossGrid tool which can be associated with a running application launched using this Job Wizard. For details regarding the usage of a tool, please refer to the appropriate tool manual delivered by the CrossGrid WP2 team.

After the user defines all job parameters and submits the job, all job information is sent to the job submission service. The execution of the job is started and the user can trace its status in the Job Monitoring tool (see the next section).

5.4.1.3 Job Monitoring

The Job Monitoring dialog (see Fig. 5-17) is a useful tool for tracing the status and viewing the details, logs and parameters of previously submitted jobs. This dialog contains all the information about the submitted jobs (their full description and status embedded in the jobs table) and provides the following simple functionality:

- Delete - deletes the selected jobs (stops them and removes them from the computing element of the Grid);
- Cancel - cancels the selected jobs (stops executing them but does not remove them);


- Refresh - refreshes the status of the selected jobs;
- Refresh All - refreshes the status of all jobs belonging to the user;
- Details - shows detailed information (see below);
- Visualize - launches the visualization tool for the particular application, which may be delivered by the application developers;
- Close - closes the Job Monitoring dialog.

Fig. 5-17 Job Monitoring Dialog

The Job Details dialog (see Fig. 5-18) is launched from the Job Monitoring tool. It presents the parameters and description of a job and, if it exists, a visualization of the job output. In general, it contains the job parameters that were defined during the submission procedure. Additionally, the Job Log and an extended status log are available. The logs contain data about all job phases and the job status during the internal processing of job placement by the CrossGrid infrastructure.


Fig. 5-18 Job Monitoring Details dialog

An excerpt of the log of a successfully scheduled and executed job (RB states) is presented in Fig. 5-19.


Fig. 5-19 Excerpt of job log


An excerpt of the Logging & Bookkeeping history of a successfully scheduled and executed job is shown in Fig. 5-20.

Fig. 5-20 Excerpt of job log

5.4.2 Application Portal

The Application Portal is also a GUI for grid resources, but it offers limited functionality. It is best suited to organisations where Java is forbidden. The starting page of the Portal is shown in Fig. 5-21.


Fig. 5-21 Starting page of Portal

5.4.2.1 File transfer

The GridFTP mechanism is used for transferring files after the execution of a job on the Grid. The job can come from any of the application portlets implemented in the portal. There is also another way to get the output of a simple job submitted via the generic Job Submission portlet: the user sees two HTML links in the portlet area, one called output.txt and the other error.txt. Both link to the corresponding file and present the output and/or error, if any, of the specific job submission. A description of the various portlets of the CrossGrid Application Portal follows, together with some pictures of them.

5.4.2.2 Job Submission

A specific portlet (Fig. 5-22) is used for the submission of simple jobs to the CG testbed; it provides the user with text fields that correspond to various parameters. Some of those parameters are necessary and others are optional. The full list is:

- Job name [optional]
- Executable [necessary]
- Arguments [optional]
- StdOutput [necessary]
- StdError [necessary]
- InputSandbox [optional]
- OutputSandbox [necessary]
- Requirements [optional]
- Resource Broker (drop-down box) [necessary]


- Port (default value: 7772) [necessary]

There are also two buttons, Submit and Reset, for either submitting the job with the values that have been filled in or clearing all the text fields.

Fig. 5-22 Job submission Portlet

Apart from that, there is another way of submitting jobs to the CrossGrid testbed: the user can submit a JDL file (Fig. 5-23) with all the necessary parameters included in it.

Fig. 5-23 JDL submission portlet

An example JDL file is the following:

Executable = "cg-job-starter.sh";
Arguments = " -fo gsiftp://kentauros.rtd.algo.com.gr:2811/tmp/o111.txt StdOutput -fo gsiftp://kentauros.rtd.algo.com.gr:2811/tmp/e.txt StdError -c /bin/hostname";
StdOutput = "StdOutput";
StdError = "StdError";
InputSandbox = { "/opt/cg/bin/cg-job-starter.sh" };
VirtualOrganisation = "cg";
requirements = other.GlueCEStateStatus == "Production";
rank = other.GlueCEStateEstimatedResponseTime;


Successful submission returns the job identifier, which is unique across the whole grid infrastructure (Fig. 5-24).

Fig. 5-24 Job identifier example

5.4.2.3 Job Status

This portlet is used in conjunction with the job submission portlet: when a user has submitted a job, he can then check its status (Fig. 5-25). There is a variety of possible status values (OUTPUTREADY, SCHEDULED, etc.).

Fig. 5-25 Job Status portlet

5.4.2.4 Job List Match

Through this portlet (Fig. 5-26) a user can find out which available Computing Elements in the testbed can execute the particular job the user has specified.


Fig. 5-26 Job List Match Portlet

5.4.2.5 Job Log Info

This portlet can be used to obtain information about previously submitted jobs and their status (Fig. 5-27). The user can also find out the exact course of a particular job, from the start of the submission until the final outcome.


Fig. 5-27 Job Log Info Portlet

5.4.2.6 Job Get Output

This is an important function of the CG portal. With this portlet (Fig. 5-28), the user is able to retrieve the output and/or error of a submitted job. The results can be retrieved via HTML links to the output and error files, which are shown inside the portlet area. The user can then choose either of the HTML links to view its content.

Fig. 5-28 job GetOutput Portlet

5.4.3 Job Submission Services

The Job Submission Services (JSS) provide the Migrating Desktop Server and the Portal Server with a set of services that allow performing job submission and monitoring. According to the SRS Document [10] and the Design Document [11], the Job Submission Services are provided as Web Services (Fig. 5-29). These services are network accessible through standardized XML messaging and are described using the standard Web Services Description Language (WSDL).


[Figure: high level view showing the Migrating Desktop Server & Portal Server (JSS Client) communicating with the Job Submission Services (JSS-Server) on the Roaming Access Server, which interacts with the Resource Broker and Logging & Bookkeeping services in the GRID.]

Fig. 5-29 High level view of the Job Submission Services components

From the architectural point of view, they are composed of:

- the JSS-Server, on the Roaming Access Server machine;
- the JSS-Client, implemented on the Migrating Desktop Server and the Portal Server.

The JSS-Server acts as a Service Provider: it implements the offered services, publishes their description and waits for customers' requests.

The JSS-Client acts as a Service User: it invokes the published services according to their description. The communication between the two components is based on messaging over the SOAP/HTTP protocols. In order to process a service request coming from the JSS-Client, the JSS-Server contacts the Resource Broker (RB, also called Scheduling Agent [12], [13]) and the Logging & Bookkeeping (L&B) service specified in the request parameters. The communication between the JSS-Server and the RB / L&B is based on TCP/IP sockets.

The JSSs allow Migrating Desktop and Portal users to:

- submit their jobs;
- monitor the status of their submitted jobs;
- retrieve information on the trace of events concerning a submitted job;
- cancel a submitted job;
- request a list of available resources;
- interact with jobs running on a remote Computing Element (CE).


The JSS-Client:

- receives the user requests;
- invokes the appropriate services on the RAS to process them;
- receives the results from the invoked services;
- returns the results of the requests (coming from the RAS).

The JSS-Server:

- receives the service requests coming from the JSS-Client;
- contacts the specified RB (SA [12], [13]) and/or a Logging & Bookkeeping service;
- receives from the RB and L&B the results of the requested services (or the error messages in case of failure);
- sends these results back to the JSS-Client.

Fig. 5-30 depicts the JSS components in the RAS:

- an HTTP server and a SOAP engine;
- the JSS-WebServices-core;
- the WMS-Client.

The web services are published on the HTTP server, and the SOAP engine handles the SOAP messages exchanged between the JSS-Client and the JSS-Server. The WebServices-core is the module that implements the web services and contains the components that handle the communication with the client. These services are deployed with the Apache Axis [1] tool. Apache Axis provides a SOAP engine that plugs into the Apache Tomcat servlet engine [2]. Both the WebServices-core and the JSS-Client are based on a set of Java libraries provided by Apache Axis. The WMS-Client provides the web services implementation with a set of API functions that implement the interface towards the Workload Management System (the RBs, the L&Bs and other components). This component is based on software from the EU-DataGrid project [3] (version 2.0.18) and is under the control of CrossGrid Work Package 3.2 [4] ([12], [13] and [15]).
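As described with Fig. 5-30 below, the Java web services reach this C/C++ WMS-Client through a Java Native Interface (JNI) layer. The sketch below only illustrates what such a binding could look like; the class, method and library names are hypothetical and do not reproduce the actual WP3.2 interface.

// Hypothetical JNI wrapper around the C/C++ WMS-Client; all names are illustrative only.
public class WmsClientBridge {
    static {
        // Loads the native library that wraps the WMS-Client (e.g. libwmsbridge.so).
        System.loadLibrary("wmsbridge");
    }

    // Submits a JDL description to the given Resource Broker and returns the job identifier.
    public native String jobSubmit(String jdl, String rbHost, int rbPort);

    // Asks the Logging & Bookkeeping service for the current status of a job.
    public native String jobStatus(String jobId);

    // Cancels a previously submitted job.
    public native void jobCancel(String jobId);
}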


[Figure: the JSS-Server modules inside the Roaming Access Server: an HTTP server with the SOAP engine (Axis) and the WebServices-core (WP3.1), connected through a JNI interface (WP3.2) to the WMS-Client (WP3.2); the JSS-Client communicates with the JSS-Server over SOAP/HTTP, and the WMS-Client communicates with the WMS servers (RB / L&B) over TCP/IP.]

Fig. 5-30 The JSS-Server modules

The web services in the JSS-WebServices-core (Fig. 5-30) are implemented in the Java language, whereas the C and C++ languages have been used for the implementation of the JSS-WMS interface. The interface between these two components has been implemented using the Java Native Interface (JNI) technology [5]; this interface is also under the control of WP3.2. The published services are the following:

- Job-Submit (allows users to submit a job - simple, MPI or interactive - for execution on some remote resources of the computational grid);
- Job-Cancel (allows users to cancel one or more submitted jobs);
- Job-Status (retrieves the bookkeeping information on the submitted jobs);
- Get-Logging-Info (retrieves logging information on the submitted jobs);
- Job-List-Match (returns a list of resources fulfilling the user job requirements);
- GetUserJobs (retrieves the identifiers of the jobs recently submitted by the user).

Besides these, a particular service allows users to interact with their jobs, or just to see some partial output results, while they are running on the grid resources (see section 5.4.4). A sketch of invoking one of the published services from a client is given below.
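The following sketch shows how a client could call one of the services above using the Apache Axis 1.x dynamic invocation API, which the JSS-Client is based on. The endpoint URL, the namespace, the operation name and the parameter list are placeholders chosen for illustration; the real names are defined by the JSS WSDL and deployment, not here.

import javax.xml.namespace.QName;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

// Illustrative Axis 1.x client for a job-submission style web service.
public class JssClientSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; the actual JSS-Server address is configured in the RAS deployment.
        String endpoint = "http://ras.example.org:8080/axis/services/JobSubmissionService";
        String jdl = "Executable = \"cg-job-starter.sh\"; VirtualOrganisation = \"cg\";";

        Service service = new Service();
        Call call = (Call) service.createCall();
        call.setTargetEndpointAddress(new java.net.URL(endpoint));
        call.setOperationName(new QName("urn:jss", "jobSubmit"));   // placeholder namespace/operation

        // Assumed parameters: the JDL string plus the Resource Broker host and port;
        // the service is assumed to return the job identifier as a string.
        String jobId = (String) call.invoke(new Object[] { jdl, "rb.example.org", Integer.valueOf(7772) });
        System.out.println("Submitted job: " + jobId);
    }
}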

5.4.3.1 The job-submission

The job-submission service allows running a job on one or several remote resources. The characteristics and requirements of the job are expressed by means of a Condor ClassAd (the Job Description Language, JDL [14]).


The service checks the correctness of the job description. There is a small subset of class-ad attributes that are compulsory, i.e. that have to be present in a job class-ad before it is sent to the Resource Broker, in order to make the matchmaking process possible.

Mandatory attributes (no default value):
- Executable
- VirtualOrganisation

Mandatory attributes with a default value:
- Requirements (default: other.GlueCEStateStatus == "Production")
- Rank (default: other.GlueCEStateEstimatedResponseTime)

The JobType attribute specifies the type of the job described by the JDL, e.g.:

JobType = Interactive;

or

JobType = {Checkpointable, MPICH};

The values which are currently supported are:

- Normal (the default value)
- Interactive
- MPICH
- MPICH-G2

For MPI jobs (when JobType is either MPICH or MPICH-G2) it is necessary to specify the number of CPUs of the grid resource with the NodeNumber JDL attribute, whose value is an integer greater than 1. An example of the JDL setting is provided hereafter:

JobType = MPICH;

NodeNumber = 5;

The RB uses this attribute during the matchmaking to select those CEs having a number of CPUs equal to or greater than the one specified in NodeNumber. Each submitted job is assigned a unique string identifier, whose format is: https://:// It is necessary to specify this identifier in order to perform a specific operation on this job (i.e. monitoring, cancelling, ...). A sketch of assembling such an MPICH job description follows.
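The sketch below only consolidates the rules described above into a single MPICH job description built as a plain string; the helper class and the concrete values are examples, and string values are quoted as in the full JDL example shown with Fig. 5-23.

// Builds an example JDL string for an MPICH job following the rules described above.
public class JdlExample {
    public static void main(String[] args) {
        StringBuilder jdl = new StringBuilder();
        jdl.append("Executable = \"cg-job-starter.sh\";\n");                       // mandatory
        jdl.append("VirtualOrganisation = \"cg\";\n");                             // mandatory
        jdl.append("JobType = \"MPICH\";\n");                                      // MPI job
        jdl.append("NodeNumber = 5;\n");                                           // required for MPICH/MPICH-G2
        jdl.append("Requirements = other.GlueCEStateStatus == \"Production\";\n"); // default value
        jdl.append("Rank = other.GlueCEStateEstimatedResponseTime;\n");            // default value
        System.out.println(jdl);
    }
}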

5.4.3.2 The JobStatus and the JobStatusAll

This service retrieves the status of a previously submitted job. The job status request is sent to the L&B, which provides the requested information. The JobStatusAll service can monitor one or more jobs which a user has previously submitted.


The possible state values of jobs are listed below (an illustrative enumeration follows the list):

- UNDEF (indicates an invalid, i.e. uninitialized, instance);
- SUBMITTED (just submitted);
- WAITING (accepted by the WMS, waiting for resource allocation);
- READY (matching resources found);
- SCHEDULED (accepted by the LRMS queue);
- RUNNING (the executable is running on the CE);
- DONE (execution finished, output not yet available);
- CLEARED (output transferred back to the user);
- ABORTED (aborted by the system, at any stage);
- CANCELLED (cancelled by the user);
- UNKNOWN (the status cannot be determined).
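For illustration only, the listed states map naturally onto a small enumeration; this is a sketch, not the actual L&B client types.

// Illustrative enumeration of the job states listed above.
public enum JobState {
    UNDEF,      // invalid, i.e. uninitialized, instance
    SUBMITTED,  // just submitted
    WAITING,    // accepted by the WMS, waiting for resource allocation
    READY,      // matching resources found
    SCHEDULED,  // accepted by the LRMS queue
    RUNNING,    // the executable is running on the CE
    DONE,       // execution finished, output not yet available
    CLEARED,    // output transferred back to the user
    ABORTED,    // aborted by the system, at any stage
    CANCELLED,  // cancelled by the user
    UNKNOWN     // the status cannot be determined
}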

Some other fields of the job information (bookkeeping information) are:

- jobId
- CertificateSubject
- Executable
- InputData
- OutputData
- ResourceId (if the job has already been scheduled)
- submissionTime (when the job has been submitted from the UI; SUBMITTED status)
- scheduledTime (when the job has been submitted to the resource; SCHEDULED status)
- startRunningTime (when the job has started its execution; RUNNING status)
- StopRunningTime (when the job has completed its execution; DONE or ABORTED status)

5.4.3.3 The job-cancel

This command cancels a previously submitted job. The cancel request is sent to the Resource Broker.

5.4.3.4 The job-list-match

This service provides the list of identifiers of the resources that are accessible by the user and satisfy the job requirements included in the job description file. The rules for constructing the job description are the same as those used for job submission. The Resource Broker is only contacted to find job-compatible resources; the job is never submitted.


5.4.4 Interactivity

When users want to submit an interactive job, they need the allocation of grid resources throughout an interactive session. During the whole session a bi-directional channel is opened between the user client and the application program running on a remote machine (Fig. 5-31). All the job input and output streams are exchanged with the user client via this channel: the users can send their input data to the job and receive output results and error messages.

[Figure: interactive job management. The user's Job Shadow and the Job Submission Services on the Roaming Access Server interact with Job Planning & Control and Logging & Bookkeeping; the job (with its JDL, RSL, Input/Output Sandboxes and shadow host/port information) reaches a Computing Element (Gatekeeper, LRMS, Worker Nodes W1..Wn), where the Cushion Process and Console Agent exchange StdIn, StdOut and StdErr with the Job Shadow, alongside the usual job submission and file transfer flows.]

Fig. 5-31 Interactive application job management


The interactive sessions are handled by the Grid Console (GC), a system provided by Condor for getting mostly-continuous input/output from remote programs running on an unreliable network. A GC is a split execution system composed of two software components: an agent (Console Agent, CA) and a shadow (Console Shadow, CS, or Job Shadow, JS).

A split execution system is a special case of an interposition agent. An interposition agent transforms a program's operation by placing itself between the program and the operating system: when the program attempts certain system calls, the agent grabs control and manipulates the results. In a split execution system an interposition agent (CA) traps some of the procedure calls of an application and forwards them (via RPC) to a shadow process (CS) on another machine. Under this arrangement, a program can run on any networked machine and yet execute exactly as if it were running on the same machine as the shadow. All the network communications are GSI-enabled.

The Console Agent runs on a Worker Node; it is a shared library that intercepts reading and writing operations on stdin, stdout, and stderr of the running job. When possible, the CA sends the output back to the CS (Fig. 5-32). The shadow manages the input and output files according to the requests of the agent. If sending the output fails, the CA instead writes it on the local disk (Fig. 5-33). It doesn't matter why the input/output