DataGrid Prototype 1
A. Ghiselli
with the contribution of F. Donno, L. Gaido, F. Prelz, M. Sgaravatto
INFN, Italy
TNC 2002, Limerick
5 June 2002
2
Outline
The HEP Environment and the GRID scenario
The DataGrid components: Authentication and Authorization (Virtual Organizations, one-time login, VO servers)
DataGrid Resources: the DataGrid fabric, Grid Information Service
DataGrid Services: grid scheduling and workload management, data management
DataGrid Release 1 description
Testbed1
Conclusions
3
The DataGrid Environment and the GRID scenario
Research communities: High Energy Physics (HEP), Earth Observation (EO), Bio-Informatics
HEP experiment characteristics: a huge number of users, i.e. thousands of physicists from hundreds of institutions, labs and universities for a single experiment at CERN
Distributed resources: pools of computing, storage and network resources to analyze petabytes of data (10^15 bytes)
Natural data parallelism based on the experiment events (High Throughput Computing, HTC)
Computing systems based on clusters with a high number of PCs (farms)
4
A very large distributed community
CMS: 1800 physicists, 150 institutes, 32 countries
Just as an example
5
On-line System
• Large variety of triggers and thresholds
• Multi-level trigger
• Online reduction by a factor of 10^7
• Keep highly selected events
[Diagram: the trigger chain. Level 1 (special hardware): 40 MHz (equivalent to 1000 TB/sec). Level 2 (embedded processors): 75 kHz (75 GB/sec), fully digitised. Level 3 (farm of commodity CPUs): 5 kHz (5 GB/sec). Data recording and offline analysis: 100 Hz (100 MB/sec). For comparison, a digital telephone line carries 1-2 KB/sec.]
6
HEP computing, key parameters
All LHC experiments at CERN: 10 Petabytes/yr data storage; disk: 2 Petabytes
Multi-experiment Tier 1: 3 Petabytes/yr; disk: 0.5 Petabytes
Tier 0 & 1 at CERN: 2 M SI95 (a PC today ~ 20 SI95)
Multi-experiment Tier 1: 0.9 M SI95
Networking Tier 0 -> Tier 1: 622 Mbps (4 Gbps) (dark fibre: 1 Tbps today)
7
Regional Centres - a Multi-Tier Model
[Diagram: CERN hosts Tier 0 (and a Tier 1); Tier 1 regional centres such as FNAL, INFN and IN2P3, Tier 2 sites (site a, b, c, ... n) and Tier 3 desktops are interconnected by links ranging from 622 Mbps to 2.5 Gbps. Organising software ("middleware") provides "transparent" user access to applications and all data.]
8
GRID: an extension of the WEB concept
On-demand creation of powerful virtual computing and data systems
Web: uniform access to information (http://)
Grid: flexible and high-performance access to all kinds of resources: sensor nets, data stores, computers, software catalogs, colleagues
[Bar chart (scale 0-300) by country: Austria, Czech Republic, Denmark, Finland, France, Germany, Greece, Italy, Netherlands, Norway, Poland, Portugal, Slovak Republic, Spain, Sweden, Switzerland, United Kingdom, CERN, Armenia, Australia, Azerbaijan Republic, Republic of Belarus, Brazil, Canada, China PR, Republic of Georgia, Israel, Japan, Morocco, Romania, Russia, JINR Dubna, Slovenia, Taiwan, Turkey, United States.]
9
DataGrid Authentication & Authorization
“Grid” refers to both the technologies and the infrastructure that enable coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations, such as the ones considered in the DataGrid project.
The Grid does not prevent local access to computing resources.
The dynamic nature of VOs creates the need for specific VO services to authenticate and authorize users accessing the Grid resources, to manage grid access policies, etc.
10
One-time login based on Identity Certificate
Users do not have a login name and password with which they can log in to the grid computers; instead they must own an X.509 Identity Certificate issued by a “Certification Authority” (CA) trusted by the DataGrid project (each CA has to adopt the agreed set of rules).
CAs publish their certificates in LDAP Directories (if a CA does not have or want one, a batch alternative is possible).
Trusted CAs:
CERN CA: http://globus.home.cern.ch/globus/ca/
DOESG CA: http://pki1.doesciencegrid.org/
INFN CA: ldap://security.fi.infn.it
DutchGrid and NIKHEF CA: ldap://certificate.nikhef.nl
UK HEP Testbed CA: http://www.gridpp.ac.uk/ca/
CNRS DataGrid-Fr: http://marianne.in2p3.fr/datagrid/wp6-fr/ca/ca2-fr.html
Spanish DataGrid CA: http://www.ifca.unican.es/datagrid/ca/
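A user can check the subject, issuer and validity dates of such a certificate with standard OpenSSL tooling; a minimal sketch (the path ~/.globus/usercert.pem is the conventional Globus location, assumed here rather than stated on the slides):
# inspect an X.509 identity certificate
openssl x509 -in ~/.globus/usercert.pem -noout -subject -issuer -dates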
11
One-time login / authentication process
DataGrid security is based on the Globus Security Infrastructure (GSI).
GSI implements the standard Generic Security Service Application Program Interface (GSS-API) based on RFC 2078/2743.
GSS-API requires the ability to pass user/host/service authentication information to a remote site so that further authenticated connections can be established (one-time login).
The proxy is the entity empowered to act as the user at the remote site. The remote end verifies the proxy certificate by descending the certificate signature chain and thus authenticates the certificate signer.
The signer’s identity is established by trusting the CA (in DataGrid the CA must be one of the trusted CAs).
The proxy certificate has a short expiration time (default value 1 day).
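In practice the one-time login step corresponds to creating a proxy certificate with the Globus command-line tools; a minimal sketch, with the 24-hour lifetime matching the default mentioned above:
# create a short-lived proxy certificate (prompts once for the private-key pass phrase)
grid-proxy-init -hours 24
# check the proxy subject and remaining lifetime
grid-proxy-info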
12
One-time login / authorization
User authorization is the last remaining step. User access is granted by checking the proxy certificate subject (X.500 Distinguished Name) and looking it up in a list (the so-called grid-mapfile) maintained at the remote site.
The grid-mapfile links a DN to a local resource username, so that the requesting user inherits all the rights of the local user. Many DNs can be linked to the same local user.
Real authorization is specifically granted by the VO the user belongs to, in agreement with the owners of the grid resources.
This calls for a VO authorization management server and tools.
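The grid-mapfile itself is a plain text file with one mapping per line, in the standard Globus format "certificate subject" local_account; a short sketch in which the DNs and the local account name cms001 are purely illustrative:
"/O=Grid/O=CERN/OU=cern.ch/CN=John Smith" cms001
"/C=IT/O=INFN/OU=Personal Certificate/CN=Mario Rossi" cms001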
13
Basic elements of the GSI one-time-login model.
14
Authorization Structure
Each VO manages an LDAP Directory:
members (ou=People), which contain:
the URI of the certificate on the CA LDAP Directory;
the Subject of the user’s certificate (to speed up grid-mapfile generation);
groups (e.g. ou=tb1): every user must belong to at least one group.
Available VOs:
Alice: ldap://grid-vo.nikhef.nl/o=alice,dc=eu-datagrid,dc=org
Atlas: ldap://grid-vo.nikhef.nl/o=atlas,dc=eu-datagrid,dc=org
CMS: ldap://grid-vo.nikhef.nl/o=cms,dc=eu-datagrid,dc=org
CDF: ldap://ldap-vo.cnaf.infn.it/o=cdf,dc=eu-datagrid,dc=org
Grid-mapfiles are generated from the VO Directories.
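Since the VO servers are ordinary LDAP directories, their membership can be listed with a standard ldapsearch; a minimal sketch against the Alice VO server listed above (the ou=People branch comes from the slide, while the exact attribute layout of the entries is not specified there):
# list the members registered under ou=People of the Alice VO
ldapsearch -x -H ldap://grid-vo.nikhef.nl -b "ou=People,o=alice,dc=eu-datagrid,dc=org"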
15
grid-mapfile generation
[Diagram: the mkgridmap tool builds the grid-mapfile from the VO Directory (e.g. o=testbed,dc=edg,dc=org and o=xyz,dc=edg,dc=org, with branches ou=People, ou=tb1, ou=Admin and entries such as CN=Franz Elmer, CN=John Smith, CN=Mario Rossi), the users’ authentication certificates, an “Authorization Directory” and a local ban list of users.]
16
The DataGrid fabric
The DataGrid fabric consists of a farm of centrally managed machines with multiple functionalities: Computing Elements (CE), Worker Nodes (WN), Storage Elements (SE), Information Service (IS), network links, ...
A site consists of one or more “software and profile servers” and a number of “clients”. Both clients and servers can be automatically configured by getting the appropriate profile from one of the profile servers. The first profile server needs to be installed and configured manually.
[Diagram: an LCFG server with RPM and profile repositories configures the CE/WN PC cluster and the SE (GDMP).]
17
DataGrid resources and Information Service
Computing Element (CE) identifies large computing farms of commodity PCs
Access to the grid resources is based on the Globus Resource Allocation Manager (GRAM) service. It is responsible for operating a set of resources under the same site-specific allocation policy, implemented by a Local Resource Access Management (LRAM) system such as LSF, PBS or Condor.
Storage Element (SE) identifies any storage system with the ability to provide direct file access and transfer via FTP, NFS or other protocols using the grid security mechanism.
18
Grid Information Service
The Information Service plays a fundamental role in the DataGrid environment since resource discovery and decision making is based upon the information service infrastructure. Basically an IS is needed to collect and organize, in a coherent manner, information about grid resources and status and make them available to the consumer entities.
EDG Release 1 adopted the Globus Information Service (MDS), which is based on the LDAP directory service.
Advantages:
a well-defined data model and a standard, consolidated way to describe data
a standardized API to access data in the directory servers
a distributed topological model that allows data distribution and delegation of access policies among institutions
Disadvantages:
not designed to store dynamic information such as the status of computing, storage and network resources.
For these reasons other mechanisms are under test.
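Because MDS is an LDAP service, the published resource information can be browsed with a standard ldapsearch; a hedged sketch (the host name is illustrative, while port 2135 and the base DN "mds-vo-name=local, o=grid" are the usual Globus MDS-2 defaults assumed here):
# query the GRIS on a Computing Element for its published resource information
ldapsearch -x -H ldap://ce01.example.org:2135 -b "mds-vo-name=local, o=grid"
# the entries follow the EDG schemas and carry attributes such as those referenced
# later in the JDL examples (e.g. FreeCPUs, AverageSI00, LRMSType)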
19
GRAM Architecture
[Diagram: GRAM architecture with EDG-specific information providers, an EDG-specific schema and an EDG-specific LRAM.]
20
Resources and service schemas
The schema (data structure describing the grid resources and their status) represents what makes data valuable to Grid tools and applications. DataGrid defined and implemented its own CE, SE and NE schemas.
The EDG CE schema describes a queue, since this is the most suitable data structure to model a cluster of homogeneous PCs locally managed by schedulers such as PBS, LSF and Condor.
The definition and standardization of the information about grid resources is largely work in progress within the Global Grid Forum.
Work on a common schema for CE, SE and NE is in progress between EDT/EDG, iVDGL and Globus. This will allow both EU and US users to access US and EU grids.
21
Scheduling on the grid
One of the most important aspects of the Project is the workload management of applications dealing with large amounts of data and clusters with a high number of machines.
The scheduler is one of the most critical components of the resource management systems, since it has the responsibility of assigning resources to jobs in such a way that the application requirements are met, and of ensuring that the resource usage limits granted to the user are not exceeded. Although scheduling is a traditional area of computer science research, the particular characteristics of the DataGrid project, and of the computational grids in general, make traditional schedulers inappropriate.
22
Which properties for the EDG scheduler?
Distributed organization. Given that several user communities (also called virtual organizations) have to co-exist in DataGrid, it is reasonable to assume that each of them will want to use a scheduler that best fits its particular needs (community scheduler). However, when a number of independent schedulers operate simultaneously, a lack of coordination among their actions may result in conflicting and performance-hampering decisions. The need for coordination among these peer, independent schedulers naturally calls for a distributed organization.
Predictive state estimation, in order to deliver adequate performance even in the face of dynamic variations of the resource status.
Ability to interact with the resource information system. At the moment, all the existing schedulers require that the user specifies the list of the machines that (s)he has permission to use. However, a fully functional grid scheduler should be able to autonomously find this information by interacting with the grid-wide information service.
23
Cont.
Ability to optimize both system and application performance, depending on the needs of DataGrid users. As a matter of fact, DataGrid users needing high throughput for batches of independent jobs (such as the HEP community) have to co-exist with users requiring low response times for individual applications (e.g. the bio-medical community). In this case, neither a system-oriented nor an application-oriented scheduling policy would be sufficient.
Submission reliability: Grids are characterized by extreme resource volatility, that is, the set of available resources may dynamically change during the lifetime of an application. The scheduler should be able to resubmit, without requiring user intervention, an application whose execution cannot continue as a consequence of the failure or unavailability of the machine(s) on which it is running.
Allocation fairness. In a realistic system different users will have different priorities that determine the amount of Grid resources allocated to their applications.
24
WMS components
The EDG-WMS has been designed with the above properties for a grid scheduler in mind.
The Resource Broker is the core component. Its tasks are:
to find a CE that best matches the requirements and preferences of a submitted job, also considering the current distribution of load on the grid;
once a suitable computing element is found, to pass the job to the job submission service for the actual submission.
These tasks include interacting with the DataGrid Data Management Services to resolve logical data set names as well as to find a preliminary set of sites where the required data are stored.
25
Other WMS components
The logging and bookkeeping service is responsible for storing and managing the logging and bookkeeping information generated by the various components of the WMS. It collects information about the scheduling system and about active jobs.
A user can submit jobs and retrieve the output through the User Interface. The description of a job is expressed in the Job Description Language (JDL), which is based on the classified advertisement scheme developed by the Condor project.
It is a semi-structured data model: no specific schema is required
Symmetry: all entities in the grid, in particular applications and computing resources, can be expressed in the same language.
Simplicity for both syntax and semantics.
26
Dg-job-submit
dg-job-submit jobad6.jdl -o jobs_list -n [email protected]
#
Executable = "WP1testC";
StdInput = "sim.dat";
StdOutput = "sim.out";
StdError = "sim.err";
InputSandbox = {"/home/wp1/HandsOn-0409/WP1testC","/home/wp1/HandsOn-0409/file*”,
"/home/wp1/DATA/*"};
OutputSandbox = {"sim.err","test.out","sim.out"};
Rank = other.AverageSI00;
Requirements = (other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2") &&
(other.RunTimeEnvironment == "CMS3.2");
InputData = "LF:test10096-0009";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
27
Dg-job-submit
Executable = "WP1testF";
StdOutput = "sim.out";
StdError = "sim.err";
InputSandbox = {"/home/datamat/sim.exe", "/home/datamat/DATA/*"};
OutputSandbox = {"sim.err","sim.err","testD.out"};
Rank = other.TotalCPUs * other.AverageSI00;
Requirements = other.LRMSType == "PBS" && (other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2") && self.Rank > 10 && other.FreeCPUs > 1;
RetryCount = 2;
Arguments = "file1";
InputData = "LF:test10099-1001";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
OutputSE = "grid001.cnaf.infn.it";
28
UI commands
Submission of a job for execution on a remote Computing Element, including:
automatic resource discovery and selection
staging of the application and data (input sandbox)
Selection of a list of suitable resources for a specific job
Cancellation of one or more submitted jobs
Retrieval of the output file(s) produced by a completed job (output sandbox)
Retrieval and display of bookkeeping information about submitted jobs
Retrieval and display of logging information about jobs.
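Taken together, a typical session with the UI might look like the sketch below. Only dg-job-submit and its options appear on the earlier slides; the other command names (dg-job-list-match, dg-job-status, dg-job-get-output, dg-job-cancel) and the -i option for reading job identifiers from a file are assumptions made for illustration:
# see which Computing Elements match the job requirements
dg-job-list-match jobad6.jdl
# submit the job, recording its identifier in jobs_list
dg-job-submit jobad6.jdl -o jobs_list
# follow the job through the logging and bookkeeping service
dg-job-status -i jobs_list
# retrieve the output sandbox once the job is done, or cancel the job
dg-job-get-output -i jobs_list
dg-job-cancel -i jobs_list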
29
A Job Submission Example
[Diagram: the user submits a JDL job from the UI to the Resource Broker (authentication and authorization, job submit, job query, job status). The RB consults the Information Service and the Data Management Services (LFN->PFN), then passes the job with its input sandbox and Brokerinfo to the Job Submission Service, which runs it on a Computing Element near a suitable Storage Element; the output sandbox and the job status are returned to the user through the Logging & Bookkeeping service.]
30
Data management
In EDG the technique adopted for optimizing data access and providing fault tolerance is data replication.
The data management architecture is focused on file replication services and the main objectives include optimized data access, caching, file replication and file migration. The most important tasks are:
Management of a universal namespace for files (using replica catalogues)
Secure and efficient data transfer between sites
Synchronization of remote copies
(Optimized) wide-area data access/caching
Management of meta-data like indices and file meta-data
Interface to mass storage systems
31
The building blocks of the DM architecture
The replica catalogue is a fundamental building block in data grids. It addresses the common need to keep track of multiple copies of a single logical file by maintaining a mapping from logical file names to physical locations.
It is imported from Globus and based on OpenLDAP v2 with an EDG schema, but is migrating to a more distributed architecture with a relational database as backend.
32
The building blocks of the DM architecture
The Replica Manager: its main tasks are to securely and efficiently copy files between two Storage Elements and to update the replica catalogue when the copy process has successfully terminated.
The File Copier (also called Data Mover) is an efficient and secure file transfer service that has to be available on each Storage Element. Initially this service will be based on the GridFTP protocol.
The Consistency Service has to guarantee the synchronization of the file replicas when an update occurs. This service is provided on top of the replica manager.
The first implementation is based on GDMP (Grid Data Mirroring Package) which is a file replication tool that implements most of the replica manager functionalities.
The access to the replica files shall be optimized by using some performance information such as the current network throughput and latency and the load on the Storage Elements. The Replica Optimizer’s duty is to select the “best” replicas.
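At the File Copier level, a replica transfer between two Storage Elements corresponds to a GridFTP copy, which the Globus tools expose through globus-url-copy; a minimal sketch in which the host names and the file path are purely illustrative:
# third-party GridFTP copy of one physical file between two Storage Elements
globus-url-copy gsiftp://se1.example.org/flatfiles/cms/test10096-0009 \
                gsiftp://se2.example.org/flatfiles/cms/test10096-0009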
33
The Storage Element
Data Management in DataGrid has to deal with the heterogeneity of storage systems and thus of Storage Elements (SE). The interface to an SE has to be uniform regardless of the underlying storage technology. The SE is the basic storage resource in DataGrid and also defines the smallest granularity of a compound storage system. A Storage Element can be either a large disk pool or a Mass Storage System (MSS) with its own internal disk pool. The current storage system implementations include MSS like HPSS and Castor as well as distributed file systems like AFS or NFS.
34
Architecture
35
DataGrid Release 1 summary
UI User Interface: lightweight component for accessing the workload management system.
RB Resource Broker: The core component of the workload management system, able to find a resource matching the user’s requirements (including data location).
LB Logging and Bookkeeping: Repository for events occurring in the lifespan of a job.
JSS Job Submission Services: It is the result of the integration of Condor-G and Globus services to achieve reliable submission of jobs via the Globus GRAM protocol.
II Information Index: Caching information index (based on Globus MDS-2) directly connected to the RB, to achieve control of the cache times and to prevent blocking failures when accessing the information space.
Replica Manager /GDMP: To consistently copy (replicate) files from one Storage Element to another and register replicas.
36
Cont.
Replica Catalog: it stores information about the physical files on all the Grid Storage Elements. A centralized replica catalog has been chosen for prototype 1.
MDS (GRIS and Info.Providers): Grid Information System used by the Resource Broker. The first implementation is based on the Globus MDS system, with resource schema and information providers defined and implemented by DataGrid.
Authentication and Authorization Services: VO directory configuration and tools to periodically generate authorization lists
Automatic Installation and Configuration management: Large Scale Linux Configuration (LCFG) tool for very large computing fabrics (CE)
37
EDG Release1 services
38
Definition of the EDG release 1/2
The hardest task of the Integration Team, a working group in charge of integrating the different pieces of software, has been to collect all the EDG software packages together with Globus and Condor, to study their functionality and interdependencies and the requirements for a correct operation of the testbed, and to come up with a topology and precise installation and configuration instructions for deployment.
Grid Elements: in order to achieve these goals it has been necessary to construct a node-centric view of the testbed deployment specifying the profile of each node type (grid elements).
RPM: the EDG release has been packaged via Linux RPMs (Red Hat Package Manager), because of the requirements coming with the installation and distribution tools provided for the farms. In particular the Globus main subcomponents were packaged in separate RPMs that could be installed and configured independently.
Each Grid Element is defined as a list of software RPMs and configuration RPMs. During this work it was necessary to specify detailed configuration instructions and requirements that allow these elements to interoperate.
39
Definition of the EDG release 2/2
LCFG for automatic installation and configuration: for each of the grid elements an LCFG template provides:
the RPM lists of the packages required for a specific element
a typical LCFG configuration that needs to be customized for a specific testbed site.
a specific set of instructions about configuring the grid element.
Once the definition of the profile for a grid element has been optimized, a small test suite has been used to verify that all the functionalities required were present and working.
All the above constitutes the content of the edg-release package available in the CVS repository of the DataGrid Release.
40
Middleware integration
[Diagram: middleware integration. The EDG services (Grid Scheduler, RM/GDMP, Globus-EDG GIS, Replica Catalogue, II) are built on the basic grid services (Condor, Globus) and are deployed on the Grid Elements (UI, RB, L&B, CE, SE, WN).]
41
Testbed 1
The first prototype of the EDG (European DataGrid) grid infrastructure (Testbed1) was deployed in December 2001 using the official, tagged EDG software Release 1, based on Globus 2 beta 21.
Testbed1 was initially made up of a limited number of grid elements in 5 European countries (CERN, FR, UK, IT, NL), but it is currently being extended to about 30 sites all over Europe, including other countries such as the Czech Republic, ES, PT, DE and the Nordic Countries.
The “common” grid elements (User Interfaces, Computing Elements, Worker Nodes and Storage Elements) have been installed and configured at each site while the grid elements devoted to the central grid services (Resource Brokers, Information Indexes and Logging and Bookkeeping servers) have been set up at CERN and INFN-CNAF (IT). Some other dedicated servers (the Virtual Organization LDAP servers and the VO Replica Catalogues) have been hosted at NIKHEF (NL) and INFN-CNAF.
42
The prototype DataGrid testbed
[Map: testbed sites including CERN, NIKHEF, London, RAL, CC-Lyon, Paris, Prague, Barcelona, Madrid, Lisbon and the Italian sites Catania, Bologna/CNAF, Padova/LNL, Torino, Cagliari, Roma and Milano, with links to the USA and to Russia/Japan.]
43
Validation of Release1
It is essential for the success of the Project that the developed middleware satisfies the users’ initial requirements. For this reason, a software validation phase performed by the user communities using real applications in a large-scale environment is crucial within the project.
The validation activity will be progressively done during the whole project lifetime, in order to test each middleware release.
Short term use cases have been used at the beginning and real applications are going to be used later on.
44
DataTAG project
[Map: a transatlantic link connecting the European research networks (SURFnet in NL, SuperJANET4 in UK, GARR-B in IT) and GEANT at CERN with Abilene, ESNET and MREN in the US via New York, STAR-TAP and STAR-LIGHT.]
Two main focuses:
Grid-applied network research; a 2.5 Gbps lambda to Star-Light for network research
Interoperability between Grids in EU and US; US partnership: the iVDGL project
Main partners: CERN, INFN, UvA (NL), PPARC (UK), INRIA (FR)
45
Conclusions
The first DataGrid prototype (Release 1) is in place in Europe as a result of the collaboration of all the actors of a distributed computing environment: resource owners, middleware developers, scientific application programmers and scientific application users.
The testing phase demonstrated the power of the grid, whose basic functionalities make it possible to select the most appropriate CE-SE pair just by specifying the job characteristics and the related input/output data in a high-level language.
The major project release, foreseen for September 2002, will introduce new important services like support for dependent, parallel, and partitionable jobs, resource co-allocation, advance reservation and accounting, as well as more efficient information and monitoring services.
DataGrid documents can be found at http://eu-datagrid.web.cern.ch/