EGEE-III INFSO-RI-222667, Enabling Grids for E-sciencE, www.eu-egee.org
Middleware Overview
Emidio Giorgio, INFN Catania
ISSGC’09, Nice-Sophia Antipolis, Friday 10 July 2009

Session 23 - gLite Overview



Outline

• General overview
• Security System
  – VOMS server
  – LCAS, LCMAPS
• Information Service
  – Berkeley DB Information Index (BDII)
• Workload Management System
  – WMS mechanism
  – JDL
  – Computing Element
  – Logging and bookkeeping
• Questions


gLite Middleware overview


EGEE Project and gLite

• Enabling Grids for E-sciencE (EGEE) is a large multi-disciplinary grid infrastructure
  – Brings together more than 120 European organisations
  – Consists of ~300 sites in 48 countries and more than 68,000 CPUs
  – Is available to some 10,000 users 24 hours a day, 7 days a week
  – Processes more than 150,000 jobs per day from different scientific domains
• gLite is the middleware powering the EGEE infrastructure and many other related projects
  – Is an integrated set of components designed to enable resource sharing among different institutions
  – Pulls together contributions from many other projects, including LCG and VDT
  – Provides users with a large set of services


Additional Infrastructures: GILDA

• EGEE provides a training infrastructure: GILDA (Grid INFN Laboratory for Dissemination Activities)
  – Runs the entire gLite stack
  – Used to demonstrate EGEE grid technology
  – Supports beginner and expert training courses on gLite
• Adopted by several Grid projects worldwide
• Has its own Certification Authority
• Available to everyone, 365 days a year
• Used in the ISSGC school series
• Since 2007, middleware other than gLite has also been tested on GILDA


gLite in the Grid “ecosystem”

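[Figure: gLite in the Grid ecosystem. Labels recoverable from the diagram: Globus, Condor, MyProxy, SRM, VDT, EDG, DataTAG, CrossGrid, LCG, EGEE, OSG, NextGrid, DEISA, GridCC, "interactive", "Future grids"; regions USA and EU; relation "Used in".]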


The Middleware structure

• Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
• Higher-Level Grid Services are meant to help users build their computing infrastructure, but should not be mandatory
• Foundation Grid Middleware is what is actually developed in EGEE
  – Must be complete and robust
  – Should allow interoperation with other major grid infrastructures
  – Should not assume the use of Higher-Level Grid Services


gLite Services Decomposition

[Figure: gLite services decomposition. Service groups and their components:]
• Access: API, CLI
• Security Services: Authentication, Authorization, Auditing
• Information & Monitoring Services: Information & Monitoring, Service Discovery, Network Monitoring
• Data Services: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement
• Job Mgmt. Services: Computing Element, Workload Management, Accounting, Job Provenance, Package Manager


gLite infrastructure

[Figure: gLite infrastructure, highlighting the Workload Management System (WMS) and Data Management.]


Security System


gLite Security

• Authentication is based on an X.509 public key infrastructure (PKI)
  – Certificate Authorities (CAs) issue long-lived certificates identifying individuals (much like a passport)
  – Trust between CAs and sites is established offline
  – To reduce vulnerability, Grid users are identified by short-lived proxies of their certificates
• Proxies can
  – Be delegated to a service so that it can act on the user’s behalf
  – Include additional attributes (such as VO information, via the VO Membership Service, VOMS)
  – Be stored in an external proxy store (MyProxy)
  – Be renewed (when they are about to expire)


X.509 Proxy Certificate

• Proxy: a GSI extension to X.509 identity certificates
  – Signed by the normal end-entity certificate (or by another proxy)
• It enables single sign-on
• It supports some important features:
  – Delegation
  – Mutual authentication
• It has a limited lifetime (minimizing the risk of “compromised credentials”)
• It is created by the voms-proxy-init command
  – Options for voms-proxy-init:
      -hours <lifetime of credential>
      -bits <length of key>


GRID Security: Components

[Figure: the Grid at the centre, connecting three parties:]
• Users
  – Large and dynamic population
  – Different accounts at different sites
  – Personal and confidential data
  – Heterogeneous privileges (roles)
  – Desire for single sign-on
• “Groups”
  – “Group” data
  – Access patterns
  – Membership
• Sites
  – Heterogeneous resources
  – Access patterns
  – Local policies
  – Membership


VOMS: concepts

Virtual Organization Membership Service:
  – Extends the proxy with information on VO membership, groups and roles
  – Fully compatible with GSI
  – Each VO has a database containing group membership, role and capability information for each user
  – The user contacts the VOMS server to request their authorization info
  – The server sends the authorization info to the client, which includes it in a proxy certificate

[Figure: the client sends a query with an authentication request to the VOMS server; the server checks its Auth DB, answers OK and returns a VOMS AC, which the client embeds in the proxy, e.g. C=IT/O=INFN/L=CNAF/CN=Pinco Palla/CN=proxy.]


FQAN and AC

• VOMS uses the Fully Qualified Attribute Name (FQAN) to express membership and other authorization info
• Group membership, roles and capabilities can be expressed in a format that binds them together
  – <group>/Role=[<role>][/Capability=<capability>]
• FQANs are included in an Attribute Certificate (AC)
• Attribute Certificates bind a set of attributes (such as membership, roles, authorization info) to an identity
• ACs are digitally signed
• VOMS uses an AC to include the attributes of a user in a proxy certificate
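The FQAN format above can be illustrated with a small parser. This is only a sketch in Python, not part of VOMS: the names `FQAN` and `parse_fqan` are made up for illustration, the role `VO-Admin` is a hypothetical example, and treating the literal `NULL` as "not set" is an assumption based on how such values are usually displayed.

```python
import re
from typing import NamedTuple, Optional


class FQAN(NamedTuple):
    """Components of <group>/Role=[<role>][/Capability=<capability>]."""
    group: str
    role: Optional[str]
    capability: Optional[str]


# The group is everything before the optional /Role= and /Capability= parts.
_FQAN_RE = re.compile(
    r"^(?P<group>/[^=]+?)"
    r"(?:/Role=(?P<role>[^/]*))?"
    r"(?:/Capability=(?P<capability>[^/]*))?$"
)


def _clean(value: Optional[str]) -> Optional[str]:
    # Assumption: an empty value or the literal NULL means "not set".
    return None if value in (None, "", "NULL") else value


def parse_fqan(text: str) -> FQAN:
    match = _FQAN_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid FQAN: {text!r}")
    return FQAN(match["group"], _clean(match["role"]), _clean(match["capability"]))


if __name__ == "__main__":
    print(parse_fqan("/gilda/grelc/Role=NULL/Capability=NULL"))
    print(parse_fqan("/gilda/Role=VO-Admin"))  # hypothetical role name
```

Splitting the group from the role this way is what later allows a site to apply different local policies per group and per role.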


VOMS Certificate

• The AC is included by the client in a well-defined, non-critical extension, ensuring compatibility with GT-based mechanisms

asli@levrek:~$ voms-proxy-init --voms gilda
Your identity: /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco Fargetta/[email protected]
Enter GRID pass phrase:
Creating temporary proxy .................................... Done
Contacting voms.ct.infn.it:15001 [/C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it] "gilda" Done
Creating proxy .................................. Done
Your proxy is valid until Tue Jun 26 03:16
asli@levrek:~$


VOMS Certificate

asli@levrek:~$ voms-proxy-info -all
subject : /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco Fargetta/CN=proxy
issuer : /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco Fargetta
identity : /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco Fargetta
type : proxy
strength : 512 bits
path : /tmp/x509up_u18948
timeleft : 11:57:20
=== VO gilda extension information ===
VO : gilda
subject : /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco Fargetta
issuer : /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it
attribute : /gilda/Role=NULL/Capability=NULL
attribute : /gilda/grelc/Role=NULL/Capability=NULL
attribute : /gilda/grelc/das/Role=NULL/Capability=NULL
attribute : /gilda/grelc/das/grelc.unile.it/Role=NULL/Capability=NULL
attribute : /gilda/grelc/das/grelc.unile.it/sakila/Role=NULL/Capability=NULL
attribute : /gilda/grelc/das/grelc02.unile.it/Role=NULL/Capability=NULL
attribute : /gilda/grelc/das/grelc02.unile.it/sakila/Role=NULL/Capability=NULL
timeleft : 11:57:48
asli@levrek:~$

Attributes: the attribute lines list the FQANs carried by the proxy.
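A script that wraps the grid commands often needs the attributes and the remaining lifetime from this kind of output. The following Python sketch parses text shaped like the listing above; `parse_proxy_info` and `needs_renewal` are hypothetical helper names, the one-hour renewal threshold is an arbitrary assumption, and the sample text is abridged from the slide.

```python
from datetime import timedelta

# Abridged sample of the output shown on the slide.
SAMPLE = """\
subject : /C=IT/O=GILDA/OU=Personal Certificate/L=INFN/CN=Marco Fargetta/CN=proxy
type : proxy
timeleft : 11:57:20
=== VO gilda extension information ===
VO : gilda
attribute : /gilda/Role=NULL/Capability=NULL
attribute : /gilda/grelc/Role=NULL/Capability=NULL
timeleft : 11:57:48
"""


def parse_proxy_info(output: str) -> dict:
    """Collect the FQAN attributes and the proxy's own time left."""
    info = {"attributes": [], "timeleft": None}
    for line in output.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # separator lines such as "=== VO ... ==="
        key, value = key.strip(), value.strip()
        if key == "attribute":
            info["attributes"].append(value)
        elif key == "timeleft" and info["timeleft"] is None:
            # The first timeleft belongs to the proxy, the second to the AC.
            hours, minutes, seconds = (int(part) for part in value.split(":"))
            info["timeleft"] = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return info


def needs_renewal(info: dict, threshold: timedelta = timedelta(hours=1)) -> bool:
    """True when no lifetime was found or it is below the (assumed) threshold."""
    return info["timeleft"] is None or info["timeleft"] < threshold


if __name__ == "__main__":
    parsed = parse_proxy_info(SAMPLE)
    print(parsed["attributes"], parsed["timeleft"], needs_renewal(parsed))
```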


LCAS & LCMAPS

• At the resource level, authorization info is extracted from the proxy and processed by LCAS and LCMAPS
• Local Centre Authorization Service (LCAS)
  – Checks whether the user is authorized
  – Checks whether the user is banned at the site
• Local Credential Mapping Service (LCMAPS)
  – Maps remote credentials to local credentials (e.g. different UNIX uid/gid)
  – Also maps VOMS groups and roles (full support of FQAN)
    • This enables privilege separation
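The two-step decision described above (authorize first, then map to a local account) can be sketched as follows. This is a toy model in Python and not the real LCAS/LCMAPS plug-in interface: the policy tables, the role `SoftwareManager`, and the account names `gildasgm` and `gilda001` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LocalAccount:
    user: str
    group: str


# Hypothetical site policy, for illustration only.
BANNED_SUBJECTS = {"/C=IT/O=EXAMPLE/CN=Banned User"}
ALLOWED_VOS = {"gilda"}

# FQAN prefixes, most specific first, each mapped to a local UNIX account.
FQAN_TO_ACCOUNT = [
    ("/gilda/Role=SoftwareManager", LocalAccount("gildasgm", "gildasgm")),
    ("/gilda", LocalAccount("gilda001", "gilda")),
]


def lcas_authorize(subject: str, vo: str) -> bool:
    """LCAS-like step: the user must not be banned and the VO must be allowed."""
    return subject not in BANNED_SUBJECTS and vo in ALLOWED_VOS


def lcmaps_map(fqans: List[str]) -> Optional[LocalAccount]:
    """LCMAPS-like step: the first FQAN that matches a prefix decides the account."""
    for fqan in fqans:
        for prefix, account in FQAN_TO_ACCOUNT:
            if fqan == prefix or fqan.startswith(prefix + "/"):
                return account
    return None


if __name__ == "__main__":
    subject = "/C=IT/O=GILDA/CN=Some User"
    if lcas_authorize(subject, "gilda"):
        print(lcmaps_map(["/gilda/grelc/Role=NULL/Capability=NULL"]))
```

Mapping a role to a separate account is one way to picture the privilege separation mentioned in the last bullet.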


VOMS enabled Grid

• A user can be in multiple VOs
  – Aggregate rights
• A VO can have groups
  – Different rights for each
    • Different groups of experimentalists …
  – Nested groups
• A VO has roles
  – Assigned for specific purposes
    • E.g. system admin
    • Applies when the user assumes this role
• The proxy certificate carries the additional attributes


Information Service


• What?
  – A system to collect information on the state of resources
• Why?
  – To discover the resources of the grid and their nature
  – To check the health status of resources
  – To provide data in order to manage the workload more efficiently
• How?
  – Monitoring and publishing fresh data on the state of resources
  – Adopting a well-known data model
• Who?
  – Users searching for specific resources for their activity
  – Workload Management System
  – Other monitoring systems


Information Service Systems

• The gLite data model is based on the Grid Laboratory Uniform Environment (GLUE) Schema
• The IS architecture used in gLite is the Berkeley DB Information Index (BDII)
  – It has been adopted in the LCG middleware as the Information System provider
  – It is an evolution of the Globus Monitoring and Discovery Service (MDS)
  – It is based on Lightweight Directory Access Protocol (LDAP) servers


GLUE Schema

• Describes the Grid resource information stored in the IS
• Independent of the underlying technology
• The current release is mapped onto
  – LDAP
  – XML
  – ClassAd (Condor matchmaking language)
• The entities of the GLUE Schema are organised hierarchically
  – They include the concepts of Site, Cluster, Computing Element, Storage Element, and an abstraction of service


GLUE Schema Structure

[Figure: GLUE Schema structure. Entities are linked by one-to-many relations (multiplicities 1 and * in the diagram):]
• Site: collection of resources owned by a single organisation. Contains info on the location, the administrator, web page and so on
• Service: description of a deployed service
• StorageElement
• Cluster: set of heterogeneous resources. Contains info on the shared directory
• Sub-Cluster: set of homogeneous resources. Contains the size of the set
• Host: contains details of hardware (features and performance) and software
• ComputingElement, with Job, VOView, State and PolicyInfo
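The containment part of this hierarchy (Site, Cluster, Sub-Cluster, Host) can be pictured with a few nested records. The Python sketch below is only a mental model: the field names and the sample site are invented for illustration and are not the actual GLUE attribute names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str
    cpu_count: int  # stands in for the hardware details published per host


@dataclass
class SubCluster:
    """Set of homogeneous resources; exposes the size of the set."""
    name: str
    hosts: List[Host] = field(default_factory=list)

    @property
    def size(self) -> int:
        return len(self.hosts)


@dataclass
class Cluster:
    """Set of heterogeneous resources, grouped into homogeneous sub-clusters."""
    name: str
    subclusters: List[SubCluster] = field(default_factory=list)


@dataclass
class Site:
    """Resources owned by a single organisation."""
    name: str
    location: str
    clusters: List[Cluster] = field(default_factory=list)

    def total_cpus(self) -> int:
        return sum(
            host.cpu_count
            for cluster in self.clusters
            for sub in cluster.subclusters
            for host in sub.hosts
        )


# Hypothetical site used only to exercise the model.
site = Site("EXAMPLE-SITE", "Catania", [
    Cluster("cluster-1", [
        SubCluster("sub-1", [Host("wn01", 4), Host("wn02", 4)]),
        SubCluster("sub-2", [Host("wn03", 8)]),
    ]),
])

if __name__ == "__main__":
    print(site.total_cpus(), site.clusters[0].subclusters[0].size)
```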


GRISs, local BDII and BDII

Abbreviations:
  – BDII: Berkeley DataBase Information Index
  – GIIS: Grid Index Information Server
  – GRIS: Grid Resource Information Server

• Each site can run a BDII. It collects the information given by the local BDIIs
• At each site, a *local* BDII collects the information given by the GRISes
• Local GRISes run on CEs and SEs at each site and report dynamic and static information


The IS in gLite

[Figure: at each of three sites (Site 1, Site 2, Site 3), local GRISes running on CEs, SEs and the RB report to a site BDII hosted on a CE; the site BDIIs feed the BDIIs labelled BDII-A, BDII-B and BDII-C.]


BDII

• Users and other Grid services (such as the WMS) can query BDIIs to get information about the Grid status.
• Each BDII collects information from the site GIISes (or local BDIIs) defined in a configuration file, which it accesses through a web interface.
• Every two minutes, a cron job runs a script that collects information (pull model) from all the GIISes (local BDIIs) listed in the configuration file
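Because a BDII is an LDAP server, it can be queried with a standard LDAP client. The Python sketch below only builds an `ldapsearch` command line and does not contact any server; the host `bdii.example.org` is a placeholder, and port 2170 with search base `o=grid` are the values conventionally used by BDII deployments, stated here as assumptions. `GlueCE` and `GlueCEStateFreeCPUs` are GLUE names as they appear in the LDAP rendering of the schema.

```python
import shlex
from typing import List, Sequence


def bdii_query_command(
    host: str,
    filter_expr: str = "(objectClass=GlueCE)",
    attributes: Sequence[str] = (),
    port: int = 2170,        # assumption: conventional BDII port
    base: str = "o=grid",    # assumption: conventional search base
) -> List[str]:
    """Build the argv for an anonymous LDAP search against a BDII."""
    argv = [
        "ldapsearch",
        "-x",      # simple (anonymous) bind
        "-LLL",    # plain LDIF output without comments
        "-H", f"ldap://{host}:{port}",
        "-b", base,
        filter_expr,
    ]
    argv.extend(attributes)
    return argv


if __name__ == "__main__":
    command = bdii_query_command(
        "bdii.example.org",  # placeholder host name
        attributes=["GlueCEUniqueID", "GlueCEStateFreeCPUs"],
    )
    print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` on a machine that has the OpenLDAP client tools installed.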


Summary

• The security system of gLite is based on X.509 certificates
  – Users are identified by certificates
  – VOMS servers link users to VOs, groups and roles by adding attributes to the proxy certificate
  – LCAS and LCMAPS control local access to resources by checking the user certificates
• The Information System provided by gLite is the BDII
  – The information is organised following the GLUE Schema
  – The current implementation uses only BDII to check the state of the resources
    • The user can contact the top BDII in the hierarchy to get information on all the resources


gLite Workload Management System


Outline

• gLite Overview
• Workload Management System
  – WMS Architecture
  – Job state machine
  – Job Description Language Overview
• Security overview


gLite services

[Figure: gLite services and their interactions. Components: User Interface, Workload Management, Logging & Bookkeeping, Information System, File and Replica Catalogs, Authorization Service and, at Site X, a Computing Element and a Storage Element. Arrows are labelled: submit, query, retrieve, publish state, update credential, discover services.]


WMS Objectives

• The Workload Management System (WMS) comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources.
• The purpose of the Workload Manager (WM) is to accept and satisfy requests for job management coming from its clients
  – Submitting a request means passing responsibility for the job to the WM
  – The WM passes the job to an appropriate CE for execution, taking into account the requirements and preferences expressed in the job description file
• The decision of which resource should be used is the outcome of a matchmaking process.
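The matchmaking idea can be reduced to two steps: discard the resources that do not satisfy the job's requirements, then pick the one with the best rank. The Python sketch below shows only that idea and is not the WMS implementation; the `matchmake` function, the CE names and the `tags` field are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional

Resource = Dict[str, object]


def matchmake(
    resources: Iterable[Resource],
    requirements: Callable[[Resource], bool],
    rank: Callable[[Resource], float],
) -> Optional[Resource]:
    """Return the highest-ranked resource that satisfies the requirements."""
    candidates = [res for res in resources if requirements(res)]
    if not candidates:
        return None  # nothing suitable right now; the request would have to wait
    return max(candidates, key=rank)


# Hypothetical snapshot of three Computing Elements.
CES = [
    {"GlueCEUniqueID": "ce-a.example.org", "GlueCEStateFreeCPUs": 4, "tags": ["GLITE-3_1_0"]},
    {"GlueCEUniqueID": "ce-b.example.org", "GlueCEStateFreeCPUs": 20, "tags": []},
    {"GlueCEUniqueID": "ce-c.example.org", "GlueCEStateFreeCPUs": 9, "tags": ["GLITE-3_1_0"]},
]

best = matchmake(
    CES,
    requirements=lambda ce: "GLITE-3_1_0" in ce["tags"],
    rank=lambda ce: ce["GlueCEStateFreeCPUs"],
)

if __name__ == "__main__":
    print(best["GlueCEUniqueID"] if best else "no match")
```

Note that ce-b has the most free CPUs but is filtered out before ranking, which is why requirements and rank are kept as separate steps.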


WMS Architecture

[Figure: WMS architecture diagram, built up over several slides. The annotations on its components read:]
• Job management requests (submission, cancellation) are expressed via a Job Description Language (JDL)
• One component keeps submission requests: requests are kept for a while if no resources are immediately available
• One component finds an appropriate CE for each submission request, taking into account job requests and preferences, Grid status, and utilization policies on resources
• A repository of resource information is available to the matchmaker; it is updated via notifications and/or active polling on resources
• One component performs the actual job submission and monitoring

Job Description Language


• In gLite, the Job Description Language (JDL) is used to describe jobs for execution on the Grid.
• The JDL adopted within the gLite middleware is based upon Condor’s CLASSified Advertisement language (ClassAd)
  – A ClassAd is a record-like structure composed of a finite number of attributes separated by semicolons (;)
  – A ClassAd is highly flexible and can be used to represent arbitrary services
• The JDL is used in gLite to specify the job’s characteristics and constraints, which are used during the matchmaking process to select the best resources that satisfy the job’s requirements.


Job Description Language (cont.)

The JDL syntax consists of statements like:
Attribute = value;

Comments must be preceded by a hash character ( # ) or follow the C++ syntax

WARNING: The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.
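The warning about trailing blanks is easy to check mechanically before submitting a job. The Python sketch below is a hypothetical helper, not a gLite tool: it reports the line numbers where a space or tab follows the closing semicolon.

```python
from typing import List


def find_trailing_whitespace(jdl_text: str) -> List[int]:
    """Return the 1-based numbers of lines with blanks or tabs after the final ';'."""
    problems = []
    for number, line in enumerate(jdl_text.splitlines(), start=1):
        stripped = line.rstrip(" \t")
        if stripped != line and stripped.endswith(";"):
            problems.append(number)
    return problems


# Line 2 ends with a space and line 3 with a tab; line 4 is a comment.
EXAMPLE_JDL = 'Type = "Job";\nExecutable = "a.sh"; \nStdOutput = "out.txt";\t\n# a comment \n'

if __name__ == "__main__":
    for number in find_trailing_whitespace(EXAMPLE_JDL):
        print(f"line {number}: whitespace after ';'")
```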


JDL: an example

Type = "Job";
JobType = "Normal";
Executable = "startGen4.sh";
Environment = {"CLASSPATH=./gfal.jar:./gint.jar","LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH","LCG_GFAL_VO=gilda","LCG_RFIO_TYPE=dpm"};
Arguments = " 0 0 10 4 10000 aliserv6.ct.infn.it lfn:/grid/gilda/valeria/2000pillar.dat /gilda/issgc07/";
StdOutput = "sample.out";
StdError = "sample.err";
InputSandbox = {"startGen4.sh","gint.jar","gfal.jar","libGFalFile.so"};
OutputSandbox = {"sample.err","sample.out","res.txt"};
Requirements = Member("GLITE-3_1_0",other.GlueHostApplicationSoftwareRunTimeEnvironment);

• Executable indicates which file will be executed remotely
• Environment specifies environment variables that will be set at run time
• Arguments is a string passed as arguments to Executable
• StdOutput is the remote file to which standard output will be redirected
• StdError is the remote file to which standard error will be redirected
• InputSandbox defines the set of local files to be staged to the remote resource for execution
• OutputSandbox defines the set of remote files to be retrieved after execution
• Requirements specifies the characteristics (hardware or software) that the resource must have


Other relevant JDL attributes

• If your job needs a file stored somewhere, you can specify its LFN:
  – The file will not be copied; instead, your job is scheduled to a CE near the SE holding that file
  – This is crucial when dealing with large files

DataRequirements = {
  [
    InputData = {"lfn:/grid/gilda/emidio/test.txt"};
    DataCatalogType = "DLI";
    DataCatalog = "http://lfc-gilda.ct.infn.it:8085";
  ]
};
DataAccessProtocol = {"rfio","gsiftp"};


Other relevant JDL attributes

• Rank: overrides the UI’s default for the fitness function by which resources are ranked
• RetryCount: overrides the default number of times a job is resubmitted after the first failure
• Requirements: a wide set of attributes, as published by the BDII, can be required. Regular expressions can also be used, and conditions can be combined with logical operators ( ||, &&, ! )

Rank = ( other.GlueCEStateWaitingJobs == 0 ? other.GlueCEStateFreeCPUs : -other.GlueCEStateWaitingJobs);

RetryCount = 7;

Requirements = (RegExp("pd.infn.it",other.GlueCEUniqueID));

Rank = ( other.GlueCEInfoTotalCPUs);
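To make the first Rank expression and the RegExp requirement concrete, here is a literal transcription into Python. It is only an illustration of what the expressions compute for one CE record, not how the WMS evaluates ClassAds; the sample CE identifiers are hypothetical, and the dots in the pattern are escaped here, whereas the JDL example leaves them unescaped.

```python
import re
from typing import Dict


def rank(ce: Dict[str, object]) -> int:
    """Free CPUs when nothing is waiting, otherwise minus the queue length."""
    waiting = ce["GlueCEStateWaitingJobs"]
    return ce["GlueCEStateFreeCPUs"] if waiting == 0 else -waiting


def requirements(ce: Dict[str, object]) -> bool:
    """True when the CE identifier contains pd.infn.it."""
    return re.search(r"pd\.infn\.it", ce["GlueCEUniqueID"]) is not None


# Hypothetical CE records.
idle_ce = {"GlueCEUniqueID": "ce01.pd.infn.it:2119/queue", "GlueCEStateWaitingJobs": 0, "GlueCEStateFreeCPUs": 12}
busy_ce = {"GlueCEUniqueID": "ce02.example.org:2119/queue", "GlueCEStateWaitingJobs": 5, "GlueCEStateFreeCPUs": 0}

if __name__ == "__main__":
    for ce in (idle_ce, busy_ce):
        print(ce["GlueCEUniqueID"], requirements(ce), rank(ce))
```

With this rank, any CE with an empty queue scores higher than any CE with waiting jobs, and among busy CEs the shortest queue wins.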


Workflows of jobs

• With a single request, multiple jobs can be generated and executed.
• A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs.
• A Collection is a group of jobs with no dependencies
  – basically a collection of JDLs
• A Parametric job is a job with one or more JDL attributes whose values vary according to parameters.
• Compound jobs allow one-shot submission of a (possibly very large, up to thousands) group of jobs
  – Reduced submission time
    • Single call to the WMProxy server / single authentication and authorization process
    • Sharing of files between jobs
  – Availability of both a single job ID to manage the group as a whole and a job ID for each single job in the group

[Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD and nodeE]
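The idea behind a parametric job, one description expanded into many jobs, can be sketched as follows. This is a simplified model, not the WMS expansion logic: the `expand_parametric` helper, the template values and the use of `_PARAM_` as the placeholder string are assumptions made for illustration.

```python
def expand_parametric(template, start, stop, step=1, placeholder="_PARAM_"):
    """Return one attribute dict per parameter value in range(start, stop, step)."""
    jobs = []
    for value in range(start, stop, step):
        # Substitute the current parameter value into every attribute of the template.
        job = {key: text.replace(placeholder, str(value))
               for key, text in template.items()}
        jobs.append(job)
    return jobs


# Hypothetical job template; only the placeholder changes between generated jobs.
template = {
    "Executable": "/bin/sh",
    "Arguments": "run.sh _PARAM_",
    "StdOutput": "out_PARAM_.txt",
}

for job in expand_parametric(template, 1, 4):
    print(job["Arguments"], job["StdOutput"])
```

A single template therefore yields three independent jobs here, which is what makes one-shot submission of very large groups practical.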


DAG example

[
  Type = "dag";
  InputSandbox = {"son.sh"};
  nodes = [
    son1 = [
      description = [
        JobType = "Normal";
        Executable = "/bin/sh";
        InputSandbox = {root.InputSandbox};
        Arguments = "son.sh 1";
        StdOutput = "son1.output";
        StdError = "son1.error";
        OutputSandbox = {"final1.input","son1.output","son1.error"};
      ];
    ];
    final = [
      description = [
        JobType = "Normal";
        Executable = "/bin/sh";
        InputSandbox = {"final.sh", root.nodes.son1.description.OutputSandbox[0]};
        Arguments = "final.sh";
        StdOutput = "dag.out";
        StdError = "dag.err";
        OutputSandbox = {"dag.out","dag.err"};
      ];
    ];
    dependencies = { {son1,final} };
  ];
]


Single submission, single job ID
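The `dependencies` attribute of the DAG example lists (parent, child) pairs, and a valid execution order can be derived from it with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the `execution_order` helper is a hypothetical name, and the real WMS schedules nodes as their parents finish rather than computing one fixed order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Nodes and dependency pairs taken from the DAG example: son1 must finish before final.
nodes = ["son1", "final"]
dependencies = [("son1", "final")]


def execution_order(nodes, dependencies):
    """Return the nodes in an order where every parent precedes its children."""
    sorter = TopologicalSorter({node: set() for node in nodes})
    for parent, child in dependencies:
        # graphlib expects each node together with its predecessors.
        sorter.add(child, parent)
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(sorter.static_order())


print(execution_order(nodes, dependencies))
```

The cycle check is the practical meaning of "acyclic": a set of dependencies that loops back on itself has no valid execution order.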


[issgc59@issgc-ui ~]$ glite-wms-job-submit -d emidio -o jobid-file sfk-explorer.jdl

Connecting to the service https://gilda-wms-01.ct.infn.it:7443/glite_wms_wmproxy_server

====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://gilda-lb-01.ct.infn.it:9000/4OaQng0PdA1nZJZHMcilqA

The job identifier has been saved in the following file:
/home/issgc59/jid

=====================================================================
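The job identifier returned above is a URL, so it can be taken apart with standard tools. The sketch below splits it into host, port and unique string; `split_job_id` is a hypothetical helper, and reading the host as the Logging and Bookkeeping server is an interpretation based on the host name in this example.

```python
from urllib.parse import urlparse


def split_job_id(job_id):
    """Split a job identifier URL into (host, port, unique job string)."""
    parts = urlparse(job_id)
    return parts.hostname, parts.port, parts.path.lstrip("/")


host, port, unique = split_job_id(
    "https://gilda-lb-01.ct.infn.it:9000/4OaQng0PdA1nZJZHMcilqA"
)
print(host, port, unique)
```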


Jobs State Machine

• Submitted: the job has been entered by the user at the User Interface but not yet transferred to the Network Server for processing.
• Waiting: the job has been accepted by the WMS and is waiting for Workload Manager processing or is being processed by WMHelper modules.
• Ready: the job has been processed by the WM but not yet transferred to the CE (local batch system queue).
• Scheduled: the job is waiting in the queue on the CE.
• Running: the job is running.
• Done: the job has exited or is considered to be in a terminal state by CondorC (e.g., submission to the CE has failed in an unrecoverable way).
• Aborted: job processing was aborted by the WMS (waiting in the WM queue or on the CE for too long, expiration of user credentials).
• Cancelled: the job has been successfully cancelled on user request.
• Cleared: the output sandbox was transferred to the user or removed due to the timeout.
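The nine states can be written down as a small state machine. The state names come from the slides; the transition rules do not, because the diagram itself is not reproduced in the text. Both rules in the sketch below (the normal path follows the order of presentation, and a job can be aborted or cancelled at any point before it is Done) are assumptions, and `allowed_next` is a hypothetical helper.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "Submitted"
    WAITING = "Waiting"
    READY = "Ready"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    DONE = "Done"
    ABORTED = "Aborted"
    CANCELLED = "Cancelled"
    CLEARED = "Cleared"


# ASSUMPTION: the normal path follows the order in which the states are presented.
NORMAL_PATH = [
    JobState.SUBMITTED, JobState.WAITING, JobState.READY, JobState.SCHEDULED,
    JobState.RUNNING, JobState.DONE, JobState.CLEARED,
]
# ASSUMPTION: a job can be aborted or cancelled at any point before it is Done.
INTERRUPTIBLE = set(NORMAL_PATH[:5])


def allowed_next(state):
    """Return the states reachable in one step under the assumptions above."""
    following = set()
    if state in NORMAL_PATH[:-1]:
        following.add(NORMAL_PATH[NORMAL_PATH.index(state) + 1])
    if state in INTERRUPTIBLE:
        following |= {JobState.ABORTED, JobState.CANCELLED}
    return following


print(sorted(s.value for s in allowed_next(JobState.RUNNING)))
```

Aborted, Cancelled and Cleared have no successors in this model, so they are the points at which a job's life cycle ends.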


Logging and Bookkeeping

• Every step of the job life cycle is logged on a service called Logging and Bookkeeping (LB)
• It is useful for users who want to know the status of their jobs
  – When a job is submitted, the UI logs it on the LB
  – As a result of submission, a job identifier is returned
  – The WMS logs each step of scheduling
  – The CE logs when it receives a job (Scheduled), when the job is running and when it is done
  – Users can query the LB for the job status by providing the job ID
• Asynchronous updates...


Example job identifier: https://gilda-lb-01.ct.infn.it:9000/fw4Ua8b_7Z8Vd8oJC74NCw
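The bookkeeping pattern described on this slide (several components append events, users ask for the current status by job ID) can be modelled in a few lines. This is a toy in-memory sketch, not the LB service or its API: the `BookkeepingLog` class and its methods are hypothetical.

```python
from collections import defaultdict


class BookkeepingLog:
    """Toy append-only event log keyed by job identifier."""

    def __init__(self):
        self._events = defaultdict(list)

    def log(self, job_id, source, state):
        """Record that a component (UI, WMS, CE) observed the job in a state."""
        self._events[job_id].append((source, state))

    def status(self, job_id):
        """Return the most recently logged state, or None for an unknown job."""
        events = self._events.get(job_id)
        return events[-1][1] if events else None

    def history(self, job_id):
        """Return a copy of all events logged for the job, oldest first."""
        return list(self._events.get(job_id, []))


lb = BookkeepingLog()
job = "https://gilda-lb-01.ct.infn.it:9000/fw4Ua8b_7Z8Vd8oJC74NCw"
lb.log(job, "UI", "Submitted")
lb.log(job, "WMS", "Waiting")
lb.log(job, "CE", "Scheduled")
print(lb.status(job))
```

Because each component reports on its own schedule, the status a user reads is only as fresh as the last event that reached the log.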


The Computing Element

• The CE is the front-end machine (master node) to a local batch system
  – Supported batch systems are PBS (Torque/MAUI), LSF and Condor
• The WMS “pushes” job execution requests to the CE using Condor-G
  – When a CE receives a job, the job is placed in a queue
  – The job is then executed on the first available of its Worker Nodes (where the batch system clients run)
  – When execution is complete, output files are copied to the CE using scp
• If the job is executed successfully, output files are copied back to the WMS using globus-url-copy
• By querying the LB, users know when a job is done and can retrieve the output
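The queue-then-dispatch behaviour of the CE can be sketched as a first-in, first-out queue feeding a pool of free worker nodes. Real batch systems such as PBS or LSF apply their own scheduling policies, so the FIFO order, the `ComputingElement` class and the node names here are simplifying assumptions.

```python
from collections import deque


class ComputingElement:
    """Toy CE: a FIFO job queue in front of a pool of worker nodes."""

    def __init__(self, worker_nodes):
        self.queue = deque()
        self.free_nodes = deque(worker_nodes)
        self.running = {}

    def receive(self, job):
        """A job pushed by the WMS is placed in the queue."""
        self.queue.append(job)

    def dispatch(self):
        """Start queued jobs on free worker nodes; return (job, node) pairs."""
        started = []
        while self.queue and self.free_nodes:
            job = self.queue.popleft()
            node = self.free_nodes.popleft()
            self.running[job] = node
            started.append((job, node))
        return started

    def complete(self, job):
        """Mark a job as finished and release its worker node."""
        node = self.running.pop(job)
        self.free_nodes.append(node)
        return node


ce = ComputingElement(["wn1", "wn2"])
for name in ("job1", "job2", "job3"):
    ce.receive(name)
first = ce.dispatch()   # two nodes available, so job3 stays queued
ce.complete("job1")     # wn1 becomes free again
second = ce.dispatch()
print(first, second)
```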


Summary

• The WMS receives users’ requests for job execution
• Requests are expressed through JDL
  – JDL allows users to specify requirements that the selected resources must meet
• The WMS processes the request and chooses (matchmaking) a Computing Element for the actual execution
  – The status of resources is known to the WMS through queries to the BDII
• The CE tries to execute the job and copies the output files back to the WMS
  – The status of execution is logged on the LB
• Users query the LB, discover that their job is done, and download the output files from the WMS


References

• gLite
  – http://www.glite.org
• GILDA Infrastructure
  – https://gilda.ct.infn.it/
• VOMS
  – http://infnforge.cnaf.infn.it/projects/voms
• GGF Security
  – http://www.gridforum.org/security
• GLUE Schema
  – http://glueschema.forge.cnaf.infn.it/
• EGEE
  – http://www.eu-egee.org


www.glite.org

Questions?
