E. Ronchieri – n° 1 EDG release 2 Elisabetta Ronchieri INFN CNAF - DataGrid WP1 – Workload Management System elisabetta.ronchieri@cnaf.infn.it

E. Ronchieri – n° 1

EDG release 2

Elisabetta Ronchieri INFN CNAF - DataGrid WP1 – Workload Management System

elisabetta.ronchieri@cnaf.infn.it

E. Ronchieri – n° 2

Outline

What is Grid?

Grid Projects – Focus on EU Data Grid Project

Selected Areas + Technologies
- Security
- Information and Monitoring Services
- Storage Management
- Data Management
- Workload Management

Installation

E. Ronchieri – n° 3

Grid Vision

Researchers, Grid middleware, scientific instruments and experiments, and resources are the major players

- Researchers interact with colleagues, and share and access data
- Grid middleware provides part of the software infrastructure
- Experiments provide huge amounts of data

The Grid is a special form of distributed computing

- Computing and storage resources are distributed over several sites
- Sites are typically connected via wide-area network links

It is best suited to applications with the following features:

- Distributed user community
- Need for lots of computing power (Computational Grid)
- Need for lots of storage capacity (Data Grid)

Currently, it is applied mainly in computing sciences

E. Ronchieri – n° 4

Grid Today

Many steps remain (especially to make the Grid accessible to conventional users)

Considerable expertise is still required (especially to use Grid technology efficiently)

There is no single Grid (several projects, ...)

Grids need to work together towards standardization

Global Grid Forum (GGF, http://www.ggf.org)

- Its mission is to promote and develop Grid technologies and applications
- There are many working groups in several different areas (Scheduling and Resource Management, Security, ...)

E. Ronchieri – n° 6

Major US & European Grid Projects, many with strong HEP participation

[Figure: US projects and European projects]

Many national and regional Grid projects: GridPP (UK), INFN-Grid (I), NorduGrid, Dutch Grid, ...

The Virtual Data Toolkit (VDT)

The DataGrid Toolkit

E. Ronchieri – n° 8

EDG Globus-based middleware architecture

EDG is built on the emerging Grid technology

Start: Jan 1, 2001 End: Dec 31, 2003

Current EDG architectural functional blocks:

- Basic Services provided by Globus 2.2.x (such as authentication, authorization, info providers, replica catalog, secure file transfers) and Condor (such as job submission, effective job cancellation, event monitoring, and support for monitoring)
- Higher Level EDG Middleware developed within EDG
- Applications (such as HEP, BIO, and EO)

[Figure: layered architecture, from top to bottom: Specific application layer (ALICE, ATLAS, CMS, LHCb, other apps); VO common application layer (LHC, other apps); High level Grid middleware; Basic Services (Globus 2.2.x and Condor); OS & Net services]

E. Ronchieri – n° 10

Selected Areas for Grid Technologies in EU DataGrid (and partly Globus)

Security

- All access to and interaction with Grid resources must be done in a secure way
- Major technologies: PKI (Public Key Infrastructure) and GSS

Information and Monitoring Services

- Before you start using the Grid, you need to know what resources are there and what you can use
- Major technologies: LDAP-based or Web Service approach

Data Management

- Main focus of a Data Grid
- Major technologies: LDAP-based or Web Service approach

Workload Management

- Submit your application to the Grid, where it is executed

E. Ronchieri – n° 12

Security in EDG

Why:

- User jobs might access several remote resources
- Users need to be authenticated (Who am I?) and authorized (What can I do?)

Mainly uses:

- The security infrastructure provided by Globus
- Based on PKI (Public Key Infrastructure) and GSS

E. Ronchieri – n° 13

Grid Security Requirements

User View

1) Easy to use
2) Single sign-on
3) Run applications

Resource Owner View

1) Specify local access control
2) Auditing, accounting, etc.
3) Integration with local systems: Kerberos, AFS, license manager

E. Ronchieri – n° 14

Grid Security Infrastructure (GSI)

Extensions to existing standard protocols & APIs

- Standards: SSL/TLS, X.509 & CA, GSS
- Extensions for single sign-on and delegation

Globus Toolkit reference implementation of GSI

- SSLeay/OpenSSL + GSS-API + delegation + single sign-on

E. Ronchieri – n° 15

Example of GSI usage

[Figure: GSI usage across Site A (Unix), Site B and Site N (Unix). Labels: User, Proxy Credential, Computer, Storage system, GridFTP Server, Grid Service, Remote file access request, Restricted Proxy]

E. Ronchieri – n° 16

VO-LDAP Architecture

[Figure: VO-LDAP architecture. Labels: mkgridmap, grid-mapfile, VO Directory, o=xyz,dc=eu-datagrid,dc=org, ou=People, ou=Testbed1, ou=???, CN=Mario Rossi, CN=Franz Elmer, CN=John Smith, Authentication Certificate (one per CN entry), local users, ban list]

Adopted by:

- DataGrid Testbed0 (2001/02)
- DataGrid Testbed1 (2003)
- DataTAG Testbed (2003)
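
The role of the grid-mapfile in this picture can be illustrated with a small sketch. It assumes the usual grid-mapfile layout of a quoted certificate subject followed by a local account name. The function name, the subject strings and the "testbed1" account are hypothetical, and this is not the actual mkgridmap logic, only an illustration of combining VO membership with a site ban list.

```python
# Sketch only: build grid-mapfile lines from VO members, skipping banned users.
# All subjects and account names below are hypothetical examples.

def build_gridmap(vo_members, ban_list, local_account):
    """Return grid-mapfile text mapping each allowed subject to a local account.

    vo_members    -- iterable of certificate subject strings from the VO directory
    ban_list      -- subjects that must never be mapped at this site
    local_account -- local Unix account (hypothetical) assigned to the VO
    """
    banned = set(ban_list)
    lines = []
    for subject in sorted(set(vo_members)):
        if subject in banned:
            continue  # the site-local ban list overrides VO membership
        # One entry per line: quoted subject, then the local user name.
        lines.append(f'"{subject}" {local_account}')
    return "\n".join(lines) + ("\n" if lines else "")


members = [
    "/O=Grid/O=xyz/CN=Mario Rossi",
    "/O=Grid/O=xyz/CN=Franz Elmer",
    "/O=Grid/O=xyz/CN=John Smith",
]
gridmap = build_gridmap(
    members,
    ban_list=["/O=Grid/O=xyz/CN=John Smith"],
    local_account="testbed1",
)
print(gridmap, end="")
```

In this sketch the ban list always wins over VO membership, which matches the idea that the resource owner keeps local control over who is mapped.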

E. Ronchieri – n° 18

Grid Information and Monitoring Services

                      MDS 2.x                                    R-GMA
Data model            LDAP (hierarchical)                        Relational
Communication         LDAP                                       HTTP
Information storage   LDAP-based backends re-written by Globus   Relational database
Queries               LDAP queries                               SQL queries
Components            GRIS (SE), GRIS (CE), GIIS, WNs            Producer, Consumer, Registry

Example LDAP query:

    ldapsearch -x -H ldap://lxshare0225.cern.ch:2135 \
        -b 'Mds-Vo-name=datagrid,o=grid' \
        'objectclass=StorageElement' \
        seId SEsize

Example SQL query:

    SELECT * FROM StorageElement
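
To make the relational side of the comparison concrete, here is a minimal sketch that runs the SQL query from the slide against an in-memory table. It uses Python's built-in sqlite3 module purely as a stand-in and says nothing about how R-GMA itself is accessed. The table and column names (StorageElement, seId, SEsize) are taken from the queries above; the rows and the helper function are hypothetical.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StorageElement (seId TEXT PRIMARY KEY, SEsize INTEGER)")
conn.executemany(
    "INSERT INTO StorageElement (seId, SEsize) VALUES (?, ?)",
    [("se01.example.org", 500), ("se02.example.org", 1200)],
)


def storage_elements(connection, min_size=0):
    """Return (seId, SEsize) rows with SEsize >= min_size, ordered by seId."""
    cursor = connection.execute(
        "SELECT seId, SEsize FROM StorageElement WHERE SEsize >= ? ORDER BY seId",
        (min_size,),
    )
    return cursor.fetchall()


# The query from the slide, followed by a filtered variant.
all_rows = conn.execute("SELECT * FROM StorageElement").fetchall()
large = storage_elements(conn, min_size=1000)
print(all_rows)
print(large)
```

The filtered variant shows what the relational model buys: selection criteria are part of the query rather than part of a directory hierarchy.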

E. Ronchieri – n° 19

Grid Information and Monitoring Services in EDG

EDG release 1.x is totally based on MDS 2.x

- Due to stability problems with this component, we have recently been deploying a pure LDAP server in front of a top-level GIIS

EDG release 2.x is based on both MDS 2.x and R-GMA

- Since the GIS is a vital service for the WM, the Broker will rely on MDS 2.x until R-GMA proves to be reliable

E. Ronchieri – n° 21

Interfaces to SE

First release of the SE control system

The three interfaces to the outside world are:

- Data transfer: GridFTP will be used to transfer files over the WAN, and the files will be available to local nodes via NFS
- Information: existing MDS information providers will be extended to provide the extra information in the GLUE storage schema
- Control: functions such as reservation for reading and writing, metadata modification, and access via GridFTP

It is an implementation of the Storage Resource Management (SRM) specification (http://sdm.lbl.gov/srm-wg)

The SE control interface to a generic MSS has already been tailored for CERN and RAL

Work is under way with IN2P3, WP10 and WP9 to adapt it to their MSS

E. Ronchieri – n° 23

Naming Schemes

GUID – Global Unique Identifier
    guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6

LFN – Logical File Name
    lfn://event20030612

SFN – Storage File Name (host + path + filename)
    sfn://ibm139.cnaf.infn.it/edg/storageelement/dev/wpsix/pippo

[Figure: one GUID linked to several LFNs (LFN1, LFN2, LFN3) and several SFNs (SFN1, SFN2, SFN3)]
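
As an illustration of the naming schemes, the sketch below splits the SFN example into host, path and filename, and checks that the GUID example is a well-formed UUID. It relies only on the Python standard library; the function names parse_sfn and parse_guid are hypothetical and are not part of any EDG tool.

```python
import posixpath
import uuid
from urllib.parse import urlsplit


def parse_sfn(sfn):
    """Split an SFN into (host, directory path, file name)."""
    parts = urlsplit(sfn)
    if parts.scheme != "sfn" or not parts.netloc:
        raise ValueError(f"not an SFN: {sfn!r}")
    directory, filename = posixpath.split(parts.path)
    return parts.netloc, directory, filename


def parse_guid(guid):
    """Validate a 'guid:<uuid>' string and return the uuid.UUID object."""
    prefix, _, value = guid.partition(":")
    if prefix != "guid":
        raise ValueError(f"not a GUID: {guid!r}")
    return uuid.UUID(value)  # raises ValueError if the UUID is malformed


host, directory, filename = parse_sfn(
    "sfn://ibm139.cnaf.infn.it/edg/storageelement/dev/wpsix/pippo"
)
guid = parse_guid("guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
print(host, directory, filename)
print(guid)
```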

E. Ronchieri – n° 24

Replica Manager

Replica Metadata Catalog

Replica Location Service

File Transfer

Optimization Client

RLS

RMC

GridFTP

edg-replica-manager

Replication Services: EDG Replica Manager

Used for querying and assigning LFNs

Used for locating replicas and assigning SFNs

Used for transferring file

E. Ronchieri – n° 25

Replication Services Architecture

[Figure: two VOs spanning two Sites. Each Site: Replica Manager, Storage Element, Computing Element, Optimiser. Other labels: User Interface, Resource Broker, Replica Metadata Catalog (LFNs -> GUID), Replica Location Service with Local Replica Catalog (GUID -> SFNs)]
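
The two-step lookup (LFNs -> GUID, GUID -> SFNs) can be sketched with two in-memory dictionaries. This is only a toy model of the mappings, not the EDG catalog services or their APIs; the class and method names, the second SFN host and the alias LFN are all hypothetical.

```python
class ReplicaCatalogs:
    """In-memory stand-in for the two catalogs (not the EDG services)."""

    def __init__(self):
        self.lfn_to_guid = {}   # role played by the Replica Metadata Catalog
        self.guid_to_sfns = {}  # role played by the Replica Location Service

    def register(self, guid, sfn, lfn=None):
        """Record a replica location and, optionally, a logical name for it."""
        self.guid_to_sfns.setdefault(guid, []).append(sfn)
        if lfn is not None:
            self.lfn_to_guid[lfn] = guid

    def add_alias(self, lfn, guid):
        """Attach an additional LFN to an already registered GUID."""
        if guid not in self.guid_to_sfns:
            raise KeyError(f"unknown GUID: {guid}")
        self.lfn_to_guid[lfn] = guid

    def replicas(self, lfn):
        """Resolve LFN -> GUID -> list of SFNs."""
        guid = self.lfn_to_guid[lfn]
        return list(self.guid_to_sfns[guid])


GUID = "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
catalogs = ReplicaCatalogs()
catalogs.register(
    GUID,
    "sfn://ibm139.cnaf.infn.it/edg/storageelement/dev/wpsix/pippo",
    lfn="lfn://event20030612",
)
# Second replica on a hypothetical host, and a hypothetical second LFN.
catalogs.register(GUID, "sfn://se.example.org/data/pippo")
catalogs.add_alias("lfn://event-alias", GUID)
print(catalogs.replicas("lfn://event-alias"))
```

Because both logical names resolve to the same GUID, they return the same list of replicas, which is the point of keeping the GUID as the stable identifier in the middle.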

E. Ronchieri – n° 27

Review of WMS architecture

The WMS architecture was reviewed in order to:

- Apply the "lessons" learned and address the shortcomings that emerged with the first release of the software
- Address the scalability problems
- Increase the reliability of the system
- Favor interoperability with other Grid frameworks, by allowing WP1 modules (e.g. RB) to be used also "outside" the EDG WMS

E. Ronchieri – n° 28

WMS Revised Architecture

[Figure: revised WMS architecture. Labels: UI, Replica Manager, Information Service (CE characteristics & status, SE characteristics & status), RB node, Network Server, Workload Manager, Match-Maker/Broker, Job Adapter, Job Controller - CondorG, Log Monitor, RB storage, Logging & Bookkeeping]

E. Ronchieri – n° 29

Improvements

Duplication of persistent job-related information avoided

- LB is the only repository of job information
- Possible to have multiple LB servers per RB (to avoid bottlenecks)

Techniques to quickly recover from failures

- E.g. communication among WMS components is much more reliable (done via persistent queues in the file system)
- Also less exposed to memory leaks (coming not only from EDG software)

Flexibility and interoperability increased

- E.g. RB-Matchmaker as a pluggable module
- GLUE Schema compliance

Other enhancements in design and implementation
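
The idea of a persistent queue in the file system can be sketched as follows. This is a generic illustration under simple assumptions (one producer, one consumer, one JSON message per line), not the EDG WMS implementation; the FileQueue class and the message fields are hypothetical.

```python
import json
import os
import tempfile


class FileQueue:
    """Append-only queue stored in a file, with a separate read-offset file."""

    def __init__(self, path):
        self.path = path
        self.offset_path = path + ".offset"
        open(self.path, "ab").close()  # create the queue file if missing

    def put(self, message):
        # One JSON document per line; fsync so the entry survives a crash.
        with open(self.path, "ab") as f:
            f.write(json.dumps(message).encode("utf-8") + b"\n")
            f.flush()
            os.fsync(f.fileno())

    def _read_offset(self):
        try:
            with open(self.offset_path) as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def get(self):
        """Return the next unread message, or None if the queue is drained."""
        offset = self._read_offset()
        with open(self.path, "rb") as f:
            f.seek(offset)
            line = f.readline()
            if not line.endswith(b"\n"):
                return None  # nothing new, or a partially written entry
            new_offset = f.tell()
        # Write the new offset to a temp file, then rename it into place.
        tmp = self.offset_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(new_offset))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.offset_path)
        return json.loads(line)


workdir = tempfile.mkdtemp()
queue_path = os.path.join(workdir, "jobs.queue")
FileQueue(queue_path).put({"job": "job-1", "action": "submit"})
FileQueue(queue_path).put({"job": "job-2", "action": "cancel"})

# A fresh object each time stands in for a restarted consumer process.
first = FileQueue(queue_path).get()
second = FileQueue(queue_path).get()
third = FileQueue(queue_path).get()
print(first, second, third)
```

Since both the messages and the read position live on disk, a component that restarts can pick up where it left off instead of losing requests held in memory. A real system would also have to decide when it is safe to advance the offset relative to processing the message.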

E. Ronchieri – n° 30

New functionalities

User APIs

- Including a Java GUI

Trivial job check-pointing service

- The user can save the state of the job (as defined by the application) from time to time
- A job can be restarted from an intermediate (i.e. previously saved) job state

Gang-matching

- Allows both CE and SE information to be taken into account in the matchmaking
- For example, to require a job to run on a CE close to an SE with enough space

Support for parallel MPI jobs

Support for interactive jobs

- Jobs running on some CE worker node where a channel to the submitting (UI) node is available for the standard streams (by integrating the Condor Bypass software)
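
The gang-matching example (a CE close to an SE with enough space) can be sketched as a joint filter over both resource types. This is a simplified illustration in Python, not the matchmaking used by the EDG broker; the data classes, attribute names and sample resources are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageElement:
    name: str
    free_space_gb: int


@dataclass
class ComputingElement:
    name: str
    free_cpus: int
    close_ses: tuple  # names of SEs considered "close" to this CE


def gang_match(ces, ses, min_cpus, min_space_gb):
    """Return (CE name, SE name) pairs satisfying both requirements at once."""
    se_by_name = {se.name: se for se in ses}
    matches = []
    for ce in ces:
        if ce.free_cpus < min_cpus:
            continue  # the CE itself does not qualify
        for se_name in ce.close_ses:
            se = se_by_name.get(se_name)
            if se is not None and se.free_space_gb >= min_space_gb:
                matches.append((ce.name, se.name))
    return matches


ses = [StorageElement("se-a", 50), StorageElement("se-b", 800)]
ces = [
    ComputingElement("ce-1", free_cpus=10, close_ses=("se-a",)),
    ComputingElement("ce-2", free_cpus=4, close_ses=("se-b",)),
    ComputingElement("ce-3", free_cpus=0, close_ses=("se-b",)),
]
pairs = gang_match(ces, ses, min_cpus=1, min_space_gb=100)
print(pairs)
```

Here ce-1 has free CPUs but its close SE is too full, and ce-3 is close to a suitable SE but has no free CPUs, so only the ce-2/se-b pair survives. Matching on the CE alone would have accepted ce-1.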

E. Ronchieri – n° 32

Installation

EDG SW:

- Is delivered via RPMs
- Is maintained in a CVS repository

Globus + Condor SW:

- Are provided via VDT (delivered as RPMs)
- Upgraded to Globus 2.2.4 and Condor 6.5.1

LCFGng:

- Is an automatic installation tool based on RPMs
- Is also used for the configuration of the middleware components
- Works for RH 6.2 and RH 7.3

Sites:

- Development testbed

E. Ronchieri – n° 33

EDG Deploying

R-GMA, RM, RLS, ROS, RMC, and WMS + GLUE schema

EDG release 2.0

- A temporary tag contains the functionalities for EDG 2.0 (deployed at CERN, NIKHEF, CNAF, and RAL)
- It will not be officially tagged as EDG 2.0 until the basic functionalities work (e.g. job submission, data transfers, etc.)
- Hopefully the first EDG 2.0 tag will be made at the end of this week

The move to gcc 3.2.2 for all software is planned for this September

The integration of more functionalities is entirely at the mercy of LCG

E. Ronchieri – n° 34

Conclusion

Many improvements and many new functionalities

Preliminary results encouraging

More comprehensive evaluation with real tests performed by real users on the large scale testbed