The EU DataGrid Architecture The European DataGrid Project Team http://www.eu-datagrid.org [email protected]


The EU DataGrid Architecture

The European DataGrid Project Team

http://www.eu-datagrid.org

[email protected]

The EDG Architecture Tutorial - n° 2

Contents

Middleware architecture overview

EDG structure

Job scheduling

Fabric management

Data Management

Monitoring

Storage

Networking

Summary

The EDG Architecture Tutorial - n° 3

EDG middleware architecture: Globus hourglass

Current EDG architectural functional blocks: Basic Services (authentication, authorization, Replica Catalog, secure file transfer, Info Providers) rely on Globus 2.0 (GSI, GRIS/GIIS, GRAM, MDS).

[Figure: hourglass of layers, listed here from top to bottom]

Specific application layer: ALICE, ATLAS, CMS, LHCb, other apps

LHC VO common application layer, other apps

High level GRID middleware

Basic Services

OS & Net services

Side labels: GRID middleware, GLOBUS 2.0

The EDG Architecture Tutorial - n° 4

DataGrid Architecture

[Figure: layered architecture diagram. Side labels: Local Computing, Grid, Fabric]

Local Computing: Local Application, Local Database

Grid Application Layer: Data Management, Job Management, Metadata Management, Object to File Mapping

Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler

Underlying Grid Services: Computing Element Services, Authorization Authentication & Accounting, Replica Catalog, Storage Element Services, Database Services

Fabric services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management

Also shown: Logging & Book-keeping

The EDG Architecture Tutorial - n° 5

EDG middleware architecture: EDG interfaces

[Figure: the DataGrid architecture diagram (Local Application and Local Database; Grid Application Layer; Collective Services; Underlying Grid Services, here with SQL Database Services; Fabric services) surrounded by the external entities it interfaces with]

External entities: Computing Elements, System Managers, Scientists, Operating Systems, File Systems, Storage Elements, Mass Storage Systems (HPSS, Castor), User Accounts, Certificate Authorities, Application Developers, Batch Systems

The EDG Architecture Tutorial - n° 6

EDG middleware architecture: The Workload Management System (WP1)

WP1 is responsible for the Workload Management System (WMS).

The WMS is currently composed of the following parts:

User Interface (UI): access point to the GRID for the user (using JDL)

Resource Broker (RB): the broker of GRID resources, performing matchmaking

Job Submission System (JSS): Condor-G, interfacing to batch systems

Information Index (II): an LDAP server used as a filter to select resources

Logging and Bookkeeping services (LB): MySQL databases that store job information
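
The matchmaking role of the Resource Broker can be illustrated with a minimal sketch. It is not the EDG implementation: the `Resource` fields, the host names and the requirement and rank functions are all hypothetical, and they only stand in for what a JDL file and the Information Index would supply in practice.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical view of one Computing Element as seen by a broker."""
    name: str
    free_cpus: int
    max_wall_minutes: int
    batch_system: str


def matchmake(resources, requirements, rank):
    """Keep the resources that satisfy `requirements`, best rank first."""
    candidates = [r for r in resources if requirements(r)]
    return sorted(candidates, key=rank, reverse=True)


# Assumed snapshot of resource information (invented values).
RESOURCES = [
    Resource("ce-a.example.org", free_cpus=4, max_wall_minutes=720, batch_system="PBS"),
    Resource("ce-b.example.org", free_cpus=0, max_wall_minutes=1440, batch_system="LSF"),
    Resource("ce-c.example.org", free_cpus=12, max_wall_minutes=900, batch_system="PBS"),
]


def job_requirements(resource):
    # The job needs a free CPU and at least 600 minutes of wall time.
    return resource.free_cpus > 0 and resource.max_wall_minutes >= 600


def job_rank(resource):
    # Prefer the resource with the most free CPUs.
    return resource.free_cpus


best = matchmake(RESOURCES, job_requirements, job_rank)
print([r.name for r in best])
```

Separating the requirement test from the rank keeps the two decisions independent: the first filters out unusable resources, the second orders whatever remains.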

The EDG Architecture Tutorial - n° 7

WP1: Work Load Management

Components:
Job Description Language
Resource Broker
Job Submission Service
Information Index
User Interface
Logging & Bookkeeping Service

Implementation:
UI: Python (LB client: C++)
RB: C++
JSS: C++, Python
II: LDAP server
LB: MySQL, C++
Input/Output Sandboxes: GridFTP

WMS main interfaces:
Globus Gatekeeper
WP2 Replica Catalog APIs
WP3 Information Systems
WP7 network monitoring info providers
End User (using JDL files, on the UI)

[Figure: the DataGrid architecture diagram (Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services), repeated from slide n° 4]

The EDG Architecture Tutorial - n° 8

EDG middleware architecture: WP1 (WMS)

The EDG Architecture Tutorial - n° 10

File Management (slides n° 10 to 18 build up one diagram step by step; the final state is shown)

[Figure: Site A with Storage Element A (File A, File B, File X, File Y) and Site B with Storage Element B (File A, File B, File C, File D), linked by File Transfer]

Replica Catalog: map logical to site files

File Transfer

Replica Selection: get the ‘best’ file

Pre-/Post-processing: prepare files for transfer; validate files after transfer

Replication Automation: data source subscription

Load balancing: replicate based on usage

Replica Manager: ‘atomic’ replication operation, single client interface, orchestrator

Metadata: LFN metadata, transaction information, access patterns
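
The relation between the Replica Catalog (logical to site file mapping) and Replica Selection (getting the ‘best’ file) can be sketched as follows. This is an illustration only, not the EDG code: the class, the `lfn:` naming, the host names and the cost table are assumptions, and the cost function stands in for whatever network or load information a real optimiser would use.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaCatalog:
    """Toy catalog mapping a logical file name to its physical replicas."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def add(self, lfn, pfn):
        self._replicas[lfn].add(pfn)

    def lookup(self, lfn):
        # Sorted so that the result does not depend on set ordering.
        return sorted(self._replicas.get(lfn, ()))


def select_best(pfns, cost):
    """Return the replica with the lowest cost according to `cost`."""
    if not pfns:
        raise LookupError("no replica registered")
    return min(pfns, key=cost)


catalog = ReplicaCatalog()
catalog.add("lfn:fileA", "gsiftp://se-a.example.org/data/fileA")
catalog.add("lfn:fileA", "gsiftp://se-b.example.org/data/fileA")

# Assumed transfer cost per Storage Element host (invented numbers).
COST = {"se-a.example.org": 40.0, "se-b.example.org": 5.0}


def transfer_cost(pfn):
    return COST[urlparse(pfn).hostname]


best = select_best(catalog.lookup("lfn:fileA"), transfer_cost)
print(best)
```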

The EDG Architecture Tutorial - n° 19

Current State

File Transfer: use GridFTP (deployed)
  Close collaboration with Globus NetLogger (Brian Tierney and John Bresnahan)

Replication: GDMP (deployed)
  Wrapper around Globus ReplicaCatalog
  All functionality in one integrated package
  Using Globus 2
  Uses GridFTP for transferring files

Replication: edg-replica-manager (deployed)

Replication: Replica Location Service, Giggle (in testing)
  Distributed Replica Catalog

Replication: Replica Manager, Reptor (in testing)

Optimization: Replica Selection, OptorSim (in simulation)

Metadata Storage: SQL Database Service, Spitfire (deployed)
  Servlets on HTTP(S) with XML (XSQL)
  GSI enabled access + extensions

GSI interface to CASTOR (delivered)

The EDG Architecture Tutorial - n° 20

WP2: Data Management

Deployed Components:
GridFTP
Replica Manager: edg-replica-manager
Replica Catalog: globus-replica-catalog
GDMP
Spitfire

Implementation:
RM: C++ classes (under development)
RC: Globus Replica Catalog wrapper
GDMP: C++
Spitfire: Java, Web Services

WP2 main interfaces:
The GRID Storage Element
WP1 Resource Broker APIs
WP3 GRID Info services
WP7 network monitoring info providers
End User (using GDMP)

[Figure: the DataGrid architecture diagram (Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services), repeated from slide n° 4]

The EDG Architecture Tutorial - n° 21

Example of Data Management by LHCb

Copy data file to storage element:

    globus-url-copy file:///${chemin}/L69999 gsiftp://lxshare0219.cern.ch/flatfiles/SE1/lhcb/L69999

Register stored data in the catalog:

    /opt/globus/bin/globus-job-run lxshare0219.cern.ch /bin/bash -c "export GDMP_CONFIG_FILE=/opt/edg/lhcb/etc/gdmp.conf; /opt/edg/bin/gdmp_register_local_file -d /flatfiles/SE1/lhcb"

Publish catalog:

    /opt/globus/bin/globus-job-run lxshare0219.cern.ch /bin/bash -c "export GDMP_CONFIG_FILE=/opt/edg/lhcb/etc/gdmp.conf; /opt/edg/bin/gdmp_publish_catalogue -n"

Copy output to MSS:

    rfcp L1600061 /castor/cern.ch/lhcb/mc/L1600061

The EDG Architecture Tutorial - n° 22

The Replica Manager APIs

[Figure: components shown are a Client, a Replica Catalogue, and two sets of Replica Manager, Replica Optimiser, SE and CE. Legend: physical file transfer, communication]

The EDG Architecture Tutorial - n° 23

The Replica Manager APIs

RM.copy(PhysicalFileName source, PhysicalFileName destination, String protocol): Status

allows for third-party transfer

transfer between: two StorageElements, or a ComputingElement and a StorageElement

Space management policies under development

The EDG Architecture Tutorial - n° 24

The Replica Manager APIs

RM.add/deletePhysicalFileName(LogicalFileName lfn, PhysicalFileName pfn)

Replica Catalogue operations only, no file transfer

RM.copyAndAddPhysicalFile(PhysicalFileName source, PhysicalFileName destination, LogicalFileName lfn, String protocol): Status

third-party transfer, but files can only be registered in the Replica Catalogue if the destination PFN contains a valid SE (i.e. the SE needs to be registered in the RC)!

RM.deletePhysicalFile(LogicalFileName lfn, PhysicalFileName pfn)
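
How these calls might fit together can be sketched in a few lines. The class below is a hypothetical stand-in, not the EDG Replica Manager: the method names are Python renderings of the signatures on the slide, the catalogue is an in-memory dictionary, and the transfer is delegated to a caller-supplied function instead of GridFTP. Rejecting an unregistered SE before any data moves is an assumption of this sketch; the slide only states that registration requires a valid SE.

```python
from urllib.parse import urlparse


class ReplicaManagerSketch:
    """Hypothetical stand-in for the Replica Manager calls named on the slide."""

    def __init__(self, registered_ses, transfer):
        self.registered_ses = set(registered_ses)  # SE hosts known to the catalogue
        self.transfer = transfer                   # callable(source, destination, protocol)
        self.catalogue = {}                        # lfn -> set of pfns

    def copy(self, source, destination, protocol="gsiftp"):
        # Third-party transfer: the manager only orchestrates the copy.
        self.transfer(source, destination, protocol)
        return "OK"

    def add_physical_file_name(self, lfn, pfn):
        # Catalogue operation only, no file transfer.
        self.catalogue.setdefault(lfn, set()).add(pfn)

    def delete_physical_file_name(self, lfn, pfn):
        # Catalogue operation only, no file is removed from storage.
        self.catalogue.get(lfn, set()).discard(pfn)

    def copy_and_add_physical_file(self, source, destination, lfn, protocol="gsiftp"):
        host = urlparse(destination).hostname
        if host not in self.registered_ses:
            # Assumption: refuse up front when the destination SE is unknown.
            raise ValueError(f"destination SE {host!r} is not registered")
        status = self.copy(source, destination, protocol)
        self.add_physical_file_name(lfn, destination)
        return status


transfers = []
rm = ReplicaManagerSketch(
    registered_ses={"se-b.example.org"},
    transfer=lambda src, dst, proto: transfers.append((src, dst, proto)),
)
status = rm.copy_and_add_physical_file(
    "gsiftp://ce-a.example.org/tmp/out.dat",
    "gsiftp://se-b.example.org/data/out.dat",
    "lfn:out.dat",
)
print(status, rm.catalogue)
```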

The EDG Architecture Tutorial - n° 25

WP2 next generation Replication Services

[Figure: a Client talks to the Replica Manager, which coordinates the services below]

Services: Replica Metadata, Replica Location, File Transfer, Optimization, Transaction, Consistency, Preprocessing, Postprocessing, Subscription

Component names shown: Reptor, Giggle, RepMeC, Optor, GDMP

The EDG Architecture Tutorial - n° 26

Replication Services Architecture

[Figure: two sites, each with a Replica Manager, Local Replica Catalog, Optimiser, Pre-/Post-processing, Storage Element and Computing Element. APIs shown: Core API, Optimisation API, Processing API. Also shown: Replica Location Index nodes, the Replica Metadata Catalog, the Resource Broker and the User Interface]

The EDG Architecture Tutorial - n° 27

Metadata Management and Security

Project Spitfire

'Simple' Grid Persistency: Grid Metadata, Application Metadata, unified Grid enabled front end to relational databases.

Metadata Replication and Consistency

Publish information on the metadata service

Secure Grid Services

Grid authentication, authorization and access control mechanisms enabled in Spitfire

Modular design, reusable by other Grid Services

The EDG Architecture Tutorial - n° 28

Spitfire Architecture

Atomic RDBMS is always consistent

No local replication of data

Role-based authorization

XSQL Servlet as one access mode, for ‘simple’ web access

Web/Grid Services Paradigm

SOAP interfaces

JDBC interface to RDBMS

Pluggability and extensibility

[Figure: Local Spitfire Layer (OracleLayer, DB2Layer, PGLayer, MyLayer) on top of Oracle, DB2, PostgreSQL and MySQL; Connecting Layer; Global Spitfire Layer; the layers are linked by SOAP]

The EDG Architecture Tutorial - n° 29

WP3: GRID monitoring and Info Providers

WP3's task is to provide information about:

The Grid itself: this includes information about resources (ComputingElements, StorageElements and the Network), for which the Globus MDS is a common solution, and job status information (as implemented by WP1's Logging and Bookkeeping).

Grid applications: this is information published by user jobs, used for performance monitoring.

The EDG Architecture Tutorial - n° 30

WP3: GRID monitoring and Info Providers

Main WP3 components:

MDS v2.1: the Globus Monitoring and Discovery Services, based on Soft State Registration protocols and LDAP aggregate directory services

FTree: EDG-developed directory service based on OpenLDAP plus caching, which addresses shortcomings in MDS v1 and optimizes data access performance

R-GMA: Relational GMA, an implementation of the GGF Grid Monitoring Architecture (Consumers, Producers and Directory Services) which makes information from producers available to consumers as relations (tables). It also uses relations to handle the registration of producers. R-GMA is consistent with GMA principles.

GRM / PROVE: application monitoring and visualization tools of the P-GRADE graphical parallel programming environment, modified for application monitoring in the DataGrid. The instrumentation library of GRM is generalized for a flexible trace event specification. The components of GRM will be connected to R-GMA using its Producer and Consumer APIs.

The EDG Architecture Tutorial - n° 31

R-GMA

Use the GMA from GGF

A relational implementation

Applied to both information and monitoring

Creates the impression that you have one RDBMS per VO

[Figure: Producer, Consumer and Registry, with arrows labelled "subscribe" and "lookup"]

The EDG Architecture Tutorial - n° 32

Relational Approach

Producers
  announce: SQL “CREATE TABLE”
  publish: SQL “INSERT”

Consumers
  collect: SQL “SELECT”
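
The three SQL verbs can be tried out on an ordinary database. The sketch below uses Python's built-in `sqlite3` module as a stand-in; this is an assumption made for illustration, since R-GMA itself only creates the impression of a single RDBMS and does not hold the data in one. The table and column names follow the CPULoad example used later in the tutorial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Producer announces: CREATE TABLE
conn.execute(
    "CREATE TABLE cpuLoad ("
    "country TEXT, site TEXT, facility TEXT, load REAL, timestamp TEXT)"
)

# Producer publishes: INSERT
rows = [
    ("UK", "RAL", "CDF", 0.3, "19055711022002"),
    ("UK", "RAL", "ATLAS", 1.6, "19055611022002"),
]
conn.executemany("INSERT INTO cpuLoad VALUES (?, ?, ?, ?, ?)", rows)

# Consumer collects: SELECT
result = conn.execute(
    "SELECT facility, load FROM cpuLoad "
    "WHERE country = ? AND site = ? ORDER BY facility",
    ("UK", "RAL"),
).fetchall()
print(result)
conn.close()
```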

The EDG Architecture Tutorial - n° 33

R-GMA

API to Servlet communication: HTTP(S) in, XML back

[Figure: Sensor Code uses the Producer API, which talks to the Producer Servlet; Application Code uses the Consumer API, which talks to the Consumer Servlet. Registry API with Registry Servlet and Schema API with Schema Servlet are also shown]

The EDG Architecture Tutorial - n° 34

Schema & Contributions

CPULoad (Global Schema)

Country  Site  Facility  Load  Timestamp
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
CH       CERN  ALICE     0.9   19055611022002
CH       CERN  CDF       0.6   19055511022002

CPULoad (Producer 1)

UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002

CPULoad (Producer 2)

UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002

CPULoad (Producer 3)

CH       CERN  ATLAS     1.6   19055611022002
CH       CERN  CDF       0.6   19055511022002

The EDG Architecture Tutorial - n° 35

Contributions are Views

CPULoad (Producer 1)

UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002

    SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'

CPULoad (Producer 2)

UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002

    SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'GLA'
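
Treating each producer's contribution as a view suggests how a query could be routed: only producers whose fixed predicate is compatible with the consumer's WHERE clause need to be contacted. The sketch below illustrates that idea under stated assumptions. The dictionary layout, the equality-only predicates and the function names are hypothetical and are not the R-GMA registry or mediator API; the rows are taken from the Producer 1 and Producer 2 tables above.

```python
# Each producer declares the fixed column values that define its view,
# plus the rows it currently publishes.
PRODUCERS = {
    "Producer 1": {
        "predicate": {"country": "UK", "site": "RAL"},
        "rows": [
            {"country": "UK", "site": "RAL", "facility": "CDF", "load": 0.3},
            {"country": "UK", "site": "RAL", "facility": "ATLAS", "load": 1.6},
        ],
    },
    "Producer 2": {
        "predicate": {"country": "UK", "site": "GLA"},
        "rows": [
            {"country": "UK", "site": "GLA", "facility": "CDF", "load": 0.4},
            {"country": "UK", "site": "GLA", "facility": "ALICE", "load": 0.5},
        ],
    },
}


def relevant(predicate, where):
    """A producer is relevant unless its predicate contradicts the query."""
    return all(predicate.get(col, val) == val for col, val in where.items())


def select(where):
    """Return (names of producers contacted, rows matching `where`)."""
    contacted, rows = [], []
    for name, producer in PRODUCERS.items():
        if relevant(producer["predicate"], where):
            contacted.append(name)
            rows.extend(
                row for row in producer["rows"]
                if all(row[col] == val for col, val in where.items())
            )
    return contacted, rows


contacted, rows = select({"country": "UK", "site": "GLA"})
print(contacted, [row["facility"] for row in rows])
```

A query on `facility` alone contradicts neither predicate, so both producers are contacted and the rows are filtered afterwards.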

The EDG Architecture Tutorial - n° 36

WP3: GRID Monitoring

Components:
MDS / FTree
R-GMA
GRM / PROVE

Implementation:
MDS: LDAP, Globus GRIS, GIIS
FTree: OpenLDAP, caching
R-GMA: Java, C++, MySQL, Tomcat
GRM / PROVE: P-GRADE

WP3 main interfaces:
WP1 Resource Broker (InfoIndex)
WP2 RM optimizer
all GRID services producing info (SE, CE, ...)
WP7 network monitoring

[Figure: the DataGrid architecture diagram (Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services), repeated from slide n° 4]

The EDG Architecture Tutorial - n° 37

EDG middleware architecture: WP4: Fabric Management

WP4 is responsible for delivering a computing fabric comprising all the tools necessary to manage a center providing grid services on clusters of thousands of nodes. The computing fabric is called the Computing Element in EDG.

Components:

User Job Control and Management (Grid and local jobs) on fabric batch and/or interactive CPU services
  Gridification: Grid interface to fabric resources
  Resource Management: manage underlying batch services

Automated System Administration for Computing Fabric Elements. These subsystems are reserved for system administrators and operators performing system maintenance.
  Configuration Management
  Installation Management
  Fabric Monitoring

The EDG Architecture Tutorial - n° 38

WP4 Architecture logical overview (slides n° 38 to 43 repeat one diagram and describe one subsystem each)

[Figure: the WP4 subsystems (Fabric Gridification, Resource Management, Installation & Node Mgmt, Configuration Management, Monitoring & Fault Tolerance) sit between the Grid User and Local User and two farms, Farm A (LSF) and Farm B (PBS), with mass storage and disk pools. Other WPs shown: Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3), Grid Data Storage (WP5)]

Fabric Gridification:
- Interface between Grid-wide services and local fabric
- Provides local authentication, authorization and mapping of grid credentials

Resource Management:
- Provides transparent access (both job and admin) to different cluster batch systems
- Enhanced capabilities (extended scheduling policies, advanced reservation, local accounting)

Installation & Node Management:
- Provides the tools to install and manage all software running on the fabric nodes
- Agent to install, upgrade, remove and configure software packages on the nodes
- Bootstrap services and software repositories

Configuration Management:
- Provides central storage and management of all fabric configuration information
- Compiles HLD templates to LLD node profiles
- Central DB and set of protocols and APIs to store and retrieve information

Monitoring & Fault Tolerance:
- Provides the tools for gathering monitoring information on fabric nodes
- Central measurement repository stores all monitoring information
- Fault tolerance correlation engines detect failures and trigger recovery actions

The EDG Architecture Tutorial - n° 44

User job management (Grid and local) (slides n° 44 to 49 repeat one diagram and add one step each)

[Figure: Grid User and Local User; WP4 subsystems Fabric Gridification, Resource Management and Monitoring; Farm A (LSF) and Farm B (PBS) with mass storage and disk pools. Other WPs shown: Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3), Grid Data Storage (WP5)]

Steps, in the order the slides present them:
- Submit job
- Publish resource and accounting information
- Optimized selection of site
- Authorize
- Map grid credentials to local credentials
- Select an optimal batch queue and submit
- Return job status and output
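
The "authorize" and "map credentials" steps can be illustrated with a short sketch. It assumes a Globus-style grid-mapfile, in which each line pairs a quoted certificate subject with a local account; the slides do not show the file EDG actually used, and the subjects and account names below are invented.

```python
import shlex

# Assumed grid-mapfile content: "<certificate subject>" <local account>
GRIDMAP_TEXT = '''
# test-VO members (hypothetical entries)
"/C=CH/O=Example/CN=Alice Example" lhcb001
"/C=UK/O=Example/CN=Bob Example" atlas007
'''


def parse_gridmap(text):
    """Parse grid-mapfile lines into a {subject: local_account} dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        subject, account = shlex.split(line)  # shlex honours the quotes
        mapping[subject] = account
    return mapping


def authorize(subject, mapping):
    """Return the local account for a subject, or refuse the request."""
    if subject not in mapping:
        raise PermissionError(f"no local account mapped for {subject}")
    return mapping[subject]


gridmap = parse_gridmap(GRIDMAP_TEXT)
print(authorize("/C=CH/O=Example/CN=Alice Example", gridmap))
```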

The EDG Architecture Tutorial - n° 50

Automated management of large clusters (slides n° 50 to 58 repeat one diagram and add one step each)

[Figure: WP4 subsystems Installation & Node Mgmt, Configuration Management, Monitoring & Fault Tolerance and Resource Management above Farm A (LSF) and Farm B (PBS). Arrow legend: Information, Invocation]

Steps, in the order the slides present them:
- Node malfunction detected
- Remove node from queue
- Wait for running jobs(?)
- Update configuration templates
- Trigger repair
- Repair (e.g. restart, reboot, reconfigure, …)
- Node OK detected
- Put node back in queue

Automation
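
The recovery sequence on these slides can be written down as a small loop. This is a sketch under stated assumptions, not WP4 code: the `Node` class, the set used as a batch queue and the list of repair actions are hypothetical, and the "wait for running jobs" and "update configuration templates" steps are left out for brevity.

```python
class Node:
    """Hypothetical node that becomes healthy after one specific repair."""

    def __init__(self, name, fixed_by):
        self.name = name
        self.fixed_by = fixed_by
        self.healthy = False

    def repair(self, action):
        if action == self.fixed_by:
            self.healthy = True


def recover(node, queue, actions=("restart", "reboot", "reconfigure")):
    """Take a failed node out of the queue, repair it, and put it back."""
    events = ["malfunction detected"]
    queue.discard(node.name)
    events.append("removed from queue")
    for action in actions:
        node.repair(action)
        events.append(f"repair: {action}")
        if node.healthy:
            events.append("node OK detected")
            queue.add(node.name)
            events.append("put back in queue")
            break
    return events


queue = {"node01", "node02"}
events = recover(Node("node02", fixed_by="reboot"), queue)
print(events)
```

A node that no action repairs simply stays out of the queue, which is the safe default for an unattended loop.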

The EDG Architecture Tutorial - n° 59

LCFG (Local ConFiGuration system)

Widely used fabric tool whose purpose is to handle automated installation and configuration in a very diverse and evolving environment.

Mechanism:

Abstract configuration parameters are stored in a central repository located on the LCFG server.

Scripts on the host machine (LCFG client) read these configuration parameters and either generate traditional configuration files or directly manipulate various services.
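
The mechanism can be sketched as a client-side script that turns abstract parameters into a traditional configuration file. Everything below is hypothetical: the repository is a plain dictionary rather than an LCFG server, the parameter names are invented, and the output is an NTP-style file chosen only because its `server` and `driftfile` directives are widely known. It does not reproduce real LCFG resource syntax.

```python
# Assumed central repository: abstract parameters per node (invented names).
REPOSITORY = {
    "node01.example.org": {
        "ntp.server": "time.example.org",
        "ntp.driftfile": "/var/lib/ntp/drift",
    },
}


def render_ntp_conf(params):
    """Generate a traditional configuration file from abstract parameters."""
    lines = [
        f"server {params['ntp.server']}",
        f"driftfile {params['ntp.driftfile']}",
    ]
    return "\n".join(lines) + "\n"


config_text = render_ntp_conf(REPOSITORY["node01.example.org"])
print(config_text, end="")
```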

The EDG Architecture Tutorial - n° 60

WP4: Fabric Management

Components:
LCFG
Fabric Monitoring
PBS & LSF info providers
Image installation
Config. Cache Mgr

Implementation:
LCFG: C++, XML, HTTP

WP4 main interfaces:
WP1 Resource Broker (InfoIndex)
WP2 Data management
WP5 Storage Element
WP3 GRID Info Services

[Figure: the DataGrid architecture diagram (Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services), repeated from slide n° 4]

The EDG Architecture Tutorial - n° 61

WP5: Mass Storage Management

WP5 delivers the Grid interface to storage.

Its service, the Storage Element (SE), interfaces to underlying Mass Storage Systems or simple storage services.

The EDG Architecture Tutorial - n° 62

The SE architecture

[Figure: the Storage Element in three layers]

Clients: RB, JSS, RM, GDMP, Info Services (WP3), user applications running on CEs, CLIs

Top layer: Interface1, Interface2, Interface3

Core: Message Queue, Session Manager, System Log, House Keeping, MetaData

Bottom layer: one MSS Interface for each of MSS1 and MSS2

The EDG Architecture Tutorial - n° 63

SE Interactions

[Figure: Client, Replica Manager/Catalog, SE and Storage, with arrows numbered 1 to 6]

1. The client asks a catalog to provide the location of a file
2. The catalog responds with the name of an SE
3. The client asks the SE for the file
4. The SE asks the storage system to provide the file
5. The storage system sends the file to the client through the SE, or
6. directly
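
Steps 1 to 5 can be traced in a few lines of code. The sketch is an illustration only: the classes, the dictionary used as a catalog and the in-memory storage back end are hypothetical stand-ins for the Replica Catalog, the SE and a mass storage system, and the direct path of step 6 is not modelled.

```python
class InMemoryStorage:
    """Hypothetical stand-in for a mass storage system behind an SE."""

    def __init__(self, files):
        self._files = dict(files)

    def fetch(self, path):
        return self._files[path]


class StorageElement:
    """Uniform front end that hides which storage system holds the file."""

    def __init__(self, name, storage):
        self.name = name
        self._storage = storage

    def get(self, path):
        # Step 4: the SE asks the storage system for the file.
        # Step 5: the data comes back to the caller through the SE.
        return self._storage.fetch(path)


def read_file(lfn, catalog, storage_elements):
    se_name, path = catalog[lfn]                 # steps 1 and 2: ask the catalog
    return storage_elements[se_name].get(path)   # step 3: ask the named SE


catalog = {"lfn:run42": ("se-a", "/flatfiles/run42")}
storage_elements = {
    "se-a": StorageElement("se-a", InMemoryStorage({"/flatfiles/run42": b"event data"})),
}
data = read_file("lfn:run42", catalog, storage_elements)
print(data)
```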

The EDG Architecture Tutorial - n° 64

WP5: Mass Storage Management

Achievements:
Definition of Architecture and Design for the DataGrid Storage Element
Collaboration with Globus on GridFTP/RFIO
Collaboration with PPDG on control API
Staging from/to CASTOR at CERN successfully implemented and tested
Successfully interfaced to GDMP

Supported Storage Systems:
UNIX disk systems
HPSS (High Performance Storage System)
CASTOR (through RFIO)
GridFTP servers
DMF
Enstore

WP5 (SE) main interfaces:
WP1 Resource Broker & JSS
WP2 RM, RC
WP7 for GridFTP monitoring
WP3 GRID Info Services

[Figure: the DataGrid architecture diagram (Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services), repeated from slide n° 4]

The EDG Architecture Tutorial - n° 65

WP6: TestBed Integration and demonstrators

WP6 goals: the EDG testbed
Integration of EDG software releases (currently 1.2) and deployment all over the EDG testbed: the integration team
Working implementation of multiple VOs & basic security infrastructure
Definition of acceptable usage contracts and creation of the Certification Authorities group
Set-up of the Authorization Working Group to manage authorization policies on the testbed

Components:
Support for test-VO, mkgridmap tools
Globus packaging & EDG config
Build tools, CVS central s/w repository
End-user documents

[Figure: the DataGrid architecture diagram (Grid Application Layer, Collective Services, Underlying Grid Services, Fabric services), repeated from slide n° 4]

The EDG Architecture Tutorial - n° 66

Further Information

DataGrid Dx.2 Deliverables: x=1..5

DataGrid D12.4 Deliverable