Upload
giles-little
View
216
Download
0
Tags:
Embed Size (px)
Citation preview
Porting applications to EU-IndiaGrid:
EGEE
Marco Verlato (@pd.infn.it)
EU-IndiaGrid Workshop 16-18 April 2007 Bangalore, India
2
Outline
• Concepts• Job categories• Levels of Gridification• Software remote installation• Applications invocation
3
Grid Applications Development
Where computer science meets the application communities!
VO-specific developments built on higher-level tools and core services
Makes Grid services useable by non-specialists
Grids provide the compute and data storage resources
Basic Grid services:AA, job submission, info, …
Higher-level grid services (brokering,…)
Application toolkits, …..
Application
Production grids provide these core services.
4
Job concept• gLite middleware follows the job submission
concept for application execution and resource sharing
• A job is a self contained entity that packages and conveys all the required information and artifacts for the successful remote execution of an application−Executable files−Input/Output data−Parameters−Environment−Infrastructure Requirements−Workflows
5
Type of applications
• The work done to port the application depends on its characteristics, complexity and the level of interactions with external grid services
• Size of input/output data• Metadata request• Access to external software
(database, libraries...)
6
Applications categories
• Simulation• Bulk Processing• Responsive Apps.• Workflow• Parallel Jobs
7
Simulation
• Examples– LHC Monte Carlo simulation– Fusion
• Characteristics– Jobs are CPU-intensive– Large number of independent jobs– Run by few (expert) users– Small input; large output
• Needs– Batch-system services– Minimal data management for storage of results
ATLAS
ITER
8
Bulk Processing
• Examples– HEP processing of raw data, analysis– Earth observation data processing
• Characteristics– Widely-distributed input data– Significant amount of input and output data
• Needs– Job management tools (workload
management)– Meta-data services– More sophisticated data management
9
Responsive Apps. (I)
• Examples–Prototyping new applications–Monitoring grid operations
• Characteristics–Small amounts of input and output data–Not CPU-intensive–Short response time (few minutes)
• Needs–Configuration which allows “immediate”
execution (QoS)–Services must treat jobs with minimum
latency
10
Responsive Apps. (II)
• Grid as a backend infrastructure:– gPTM3D: interactive analysis of medical images– GPS@: bioinformatics via web portal– GATE: radiotherapy planning– DILIGENT: digital libraries
• Characteristics– Rapid response: a human waiting for the result!– Many small but CPU-intensive tasks– User is not aware of “grid”!
• Needs– Interfacing (data & computing) with non-grid application
or portal– User and rights management between front-end and grid
11
Workflow
• Examples–“Bronze Standard”: medical images registration–Flood prediction
• Characteristics–Use of grid and non-grid services–Complex set of algorithms for the analysis–Complex dependencies between
individual tasks
• Needs–Tools for managing the workflow itself–Standard interfaces for services (I.e. web-
services)
12
Parallel Jobs
• Examples– Climate modeling– Earthquake analysis– Computational chemistry
• Characteristics– Many interdependent, communicating tasks– Many CPUs needed simultaneously– Use of MPI libraries
• Needs– Configuration of resources for flexible use of
MPI– Pre-installation of optimized MPI libraries
13
Universal Needs• Security
–Ability to control access to services and to data
•Fine-grained access control listsEncryption & logging for more demanding
disciplines•Access control consistently implemented over
all services
• VO Management–Management of users, groups, and roles–Changing the priority of jobs for different
users, groups, roles–Quota management for users, groups, roles–Definition and access to special resources
•Application-level services•Responsive queues (guaranteed, low-latency
execution)
14
Levels of “Gridification”• No development: wrap existing applications as
jobs. No source code modification is required
• Minor modifications: the application exposes minimal interaction with the grid services (e.g. Data Managements)
• Major modifications: a wide portion of the code is rewritten to adopt to the new environment (e.g. parallelization, metadata, information)
• Pure grid applications: developed from scratch. Extensively exploit existing grid services to provide new capabilities customized for a specific domain (e.g. metadata, job management, credential management)
15
No development
• It is the simplest case : the application has not external dependencies, just needs computational power
• Being sure that the application runs on the Worker Node, the executable, and the input files it may need are carried in the Input Sandbox and therefore executed
• Although the application can be directly run, it’s recommended to invoke it from a supplied script which prepare the environment before the execution
16
Minor modifications• The application has minimal
interactions with grid service : it is self consistent but...– e.g. takes input files from an SE– output files are stored on a SE– Perform I/O from a metadata catalog– the application itself can be
downloaded from a SE • User provides an adapter script
which executes needed tasks, set the environment and run the application
17
Major modifications
• A part of the application software is rewritten taking care of the new environment : e.g. because of parallelization, interaction at run time with grid services...– Use of API – Web Service invoked – CLI wrap
• Source code recompiled taking into account this
18
Pure grid application
• Entirely developed from scratch taking into account grid capabilities
• Better exploits grid facilities• Needs a deep knowledge of grid
environment and possibilities in order to engineer entirely the application logic over a grid environment
• Very rare to meet...
19
Additional software installed
• Independently from the integration level, applications may need external software (libraries) that, as application itself, can be provided in several ways– at run time
• carrying out it within InputSandbox• downloading it from a Storage Element
– permanent install on the CE• as tar.gz installed by the VO software
manager • as rpm being part of the gLite release
20
• The Experiment Software Manager (ESM) is the member of the experiment VO entitled to install Application Software in the different sites.
• The ESM is generally acknowledged through a VOMS role (SoftwareManager)
• The ESM can manage (install, validate, remove...) Experiment Software on a site at any time through a normal Grid job, without previous communication to the site administrators.
• Such job has in general no scheduling priorities and will be treated as any other job of the same VO in the same queue.
• lcg-asis is a good generator for such installation jobs
Software installation on CE
21
Software installation on CE
• The site provides a dedicated space where each supported VO can install or remove software. The amount of available space must be negotiated between the VO and the site
• An environmental variable holds the path for the location of a such space. Its format is the following:
VO_<name_of_VO>_SW_DIR
22
Software validation
• Once the software is installed, the ESM can perform the validation
• The validation is a process verifying that the software was properly installed and works as expected
• It can be performed in the same job, after installation (validation on the fly) or later on, via a dedicated job.
23
Software validation• If the ESM judges that the software was
properly installed he publishes in the Information System (IS) the tag which identifies unequivocally the software,
e.g.: VO-cms-CMSSW_1_2_2
• Such a tag is added to the GlueHostApplicationSoftwareRunTimeEnvironment attribute of the IS, using the GRIS running in the CE
• Jobs requesting for a particular piece of software can be directed to the appropriate CE just by setting special requirements on the JDL, e.g.: Requirements=Member("VO-cms-CMSSW_1_2_2",
other.GlueHostApplicationSoftwareRunTimeEnvironment);
24
Invoke Application
• From the UI−Command Line Interfaces / Scripts−APIs−Higher level tools
• From portals−Accessible from any browser−Tailored to applications
25
References (I)
https://grid.ct.infn.it/twiki/bin/view/GILDA/APIUsage
26
References (II)
https://glite-demo.ct.infn.it/
27
Acknowledgements
• Thanks to Mike Mineter (NeSC), Valeria Ardizzone and Emidio Giorgio of the INFN GILDA team