DataGrid Prototype 1
A. Ghiselli
with the contribution of F. Donno, L. Gaido, F. Prelz, M. Sgaravatto
INFN, Italy
TNC 2002, Limerick
5 June 2002
2
Outline
The HEP Environment and the GRID scenario
The DataGrid components: Authentication and Authorization (Virtual Organizations, one-time login, VO servers)
DataGrid Resources: the DataGrid fabric, Grid Information Service
DataGrid Services: grid scheduling and workload management, data management
DataGrid Release 1 description
Testbed1
Conclusions
3
The DataGrid Environment and the GRID scenario
Research communities: High Energy Physics (HEP), Earth Observation (EO), Bio-Informatics
HEP experiment characteristics: a huge number of users, i.e. thousands of physicists from hundreds of institutions, labs and universities for a single experiment at CERN
Distributed resources: pools of computing, storage and network resources to analyze petabytes of data (10^15 bytes)
Natural data parallelism based on the experiment events (High Throughput Computing, HTC)
Computing systems based on clusters with a high number of PCs (farms)
4
A very large distributed community
CMS: 1800 physicists, 150 institutes, 32 countries
Just as an example
5
On-line System
• Large variety of triggers and thresholds
• Multi-level trigger
• Online reduction by a factor of 10^7
• Keep highly selected events
[Diagram: the trigger chain. Level 1 (special hardware): 40 MHz (equivalent to 1000 TB/sec). Level 2 (embedded processors): 75 kHz (75 GB/sec), fully digitised. Level 3 (farm of commodity CPUs): 5 kHz (5 GB/sec). Data recording and offline analysis: 100 Hz (100 MB/sec). For comparison, a digital telephone line carries 1-2 KB/sec.]
6
HEP computing, key parameters
All LHC experiments at CERN: 10 Petabytes/yr data storage; disk: 2 Petabytes
Multi-experiment Tier 1: 3 Petabytes/yr; disk: 0.5 Petabytes
Tier 0 & 1 at CERN: 2 M SI95 (a PC today ~ 20 SI95)
Multi-experiment Tier 1: 0.9 M SI95
Networking Tier 0 -> Tier 1: 622 Mbps (4 Gbps) (dark fibre: 1 Tbps today)
7
Regional Centres - a Multi-Tier Model
[Diagram: CERN hosts Tier 0 (and a Tier 1); Tier 1 regional centres such as FNAL, INFN and IN2P3, Tier 2 sites (site a, b, c, ... n) and Tier 3 desktops are interconnected by links ranging from 622 Mbps to 2.5 Gbps. Organising software ("middleware") provides "transparent" user access to applications and all data.]
8
GRID: an extension of the WEB concept
On-demand creation of powerful virtual computing and data systems
Web: uniform access to information (http://)
Grid: flexible and high-performance access to all kinds of resources: sensor nets, data stores, computers, software catalogs, colleagues
[Bar chart (scale 0-300) by country: Austria, Czech Republic, Denmark, Finland, France, Germany, Greece, Italy, Netherlands, Norway, Poland, Portugal, Slovak Republic, Spain, Sweden, Switzerland, United Kingdom, CERN, Armenia, Australia, Azerbaijan Republic, Republic of Belarus, Brazil, Canada, China PR, Republic of Georgia, Israel, Japan, Morocco, Romania, Russia, JINR Dubna, Slovenia, Taiwan, Turkey, United States.]
9
DataGrid Authentication & Authorization
“Grid” refers to both the technologies and the infrastructure that enable coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations, such as the ones considered in the DataGrid project.
The Grid does not prevent local access to computing resources.
The dynamic nature of VOs creates the need for specific VO services to authenticate and authorize users accessing the Grid resources, to manage grid access policies, etc.
10
One-time login based on Identity Certificate
Users do not have a login name and password with which they can log in to the grid computers; instead they must own an X.509 Identity Certificate issued by a “Certification Authority” (CA) trusted by the DataGrid project (each CA has to adopt the agreed set of rules).
CAs publish their certificates in LDAP Directories (if a CA does not have or want one, a batch alternative is possible).
Trusted CAs:
CERN CA: http://globus.home.cern.ch/globus/ca/
DOESG CA: http://pki1.doesciencegrid.org/
INFN CA: ldap://security.fi.infn.it
DutchGrid and NIKHEF CA: ldap://certificate.nikhef.nl
UK HEP Testbed CA: http://www.gridpp.ac.uk/ca/
CNRS DataGrid-Fr: http://marianne.in2p3.fr/datagrid/wp6-fr/ca/ca2-fr.html
Spanish DataGrid CA: http://www.ifca.unican.es/datagrid/ca/
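A user can check the subject, issuer and validity dates of such a certificate with standard OpenSSL tooling; a minimal sketch (the path ~/.globus/usercert.pem is the conventional Globus location, assumed here rather than stated on the slides):
# inspect an X.509 identity certificate
openssl x509 -in ~/.globus/usercert.pem -noout -subject -issuer -dates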
11
One-time login / authentication process
DataGrid security is based on the Globus Security Infrastructure (GSI).
GSI implements the standard Generic Security Service Application Program Interface (GSS-API) based on RFC 2078/2743.
GSS-API requires the ability to pass user/host/service authentication information to a remote site so that further authenticated connections can be established (one-time login).
The proxy is the entity empowered to act as the user at the remote site. The remote end verifies the proxy certificate by descending the certificate signature chain and thus authenticates the certificate signer.
The signer’s identity is established by trusting the CA (in DataGrid the CA must be one of the trusted CAs).
The proxy certificate has a short expiration time (default value 1 day).
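In practice the one-time login step corresponds to creating a proxy certificate with the Globus command-line tools; a minimal sketch, with the 24-hour lifetime matching the default mentioned above:
# create a short-lived proxy certificate (prompts once for the private-key pass phrase)
grid-proxy-init -hours 24
# check the proxy subject and remaining lifetime
grid-proxy-info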
12
One-time login / authorization
User authorization is the last remaining step. User access is granted by checking the proxy certificate subject (X.500 Distinguished Name) and looking it up in a list (the so-called grid-mapfile) maintained at the remote site.
The grid-mapfile links a DN to a local resource username, so that the requesting user inherits all the rights of the local user. Many DNs can be linked to the same local user.
Real authorization is specifically granted by the VO the user belongs to, in agreement with the owners of the grid resources.
This calls for a VO authorization management server and tools.
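The grid-mapfile itself is a plain text file with one mapping per line, in the standard Globus format "certificate subject" local_account; a short sketch in which the DNs and the local account name cms001 are purely illustrative:
"/O=Grid/O=CERN/OU=cern.ch/CN=John Smith" cms001
"/C=IT/O=INFN/OU=Personal Certificate/CN=Mario Rossi" cms001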
13
Basic elements of the GSI one-time-login model.
14
Authorization Structure
Each VO manages an LDAP Directory:
members (ou=People), which contain:
the URI of the certificate on the CA LDAP Directory;
the Subject of the user’s certificate (to speed up grid-mapfile generation);
groups (e.g. ou=tb1): every user must belong to at least one group.
Available VOs:
Alice: ldap://grid-vo.nikhef.nl/o=alice,dc=eu-datagrid,dc=org
Atlas: ldap://grid-vo.nikhef.nl/o=atlas,dc=eu-datagrid,dc=org
CMS: ldap://grid-vo.nikhef.nl/o=cms,dc=eu-datagrid,dc=org
CDF: ldap://ldap-vo.cnaf.infn.it/o=cdf,dc=eu-datagrid,dc=org
Grid-mapfiles are generated from the VO Directories.
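Since the VO servers are ordinary LDAP directories, their membership can be listed with a standard ldapsearch; a minimal sketch against the Alice VO server listed above (the ou=People branch comes from the slide, while the exact attribute layout of the entries is not specified there):
# list the members registered under ou=People of the Alice VO
ldapsearch -x -H ldap://grid-vo.nikhef.nl -b "ou=People,o=alice,dc=eu-datagrid,dc=org"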
15
grid-mapfile generation
[Diagram: the mkgridmap tool builds the grid-mapfile from the VO Directory (e.g. o=testbed,dc=edg,dc=org and o=xyz,dc=edg,dc=org, with branches ou=People, ou=tb1, ou=Admin and entries such as CN=Franz Elmer, CN=John Smith, CN=Mario Rossi), the users’ authentication certificates, an “Authorization Directory” and a local ban list of users.]
16
The DataGrid fabric
The DataGrid fabric consists of a farm of centrally managed machines with multiple functionalities: Computing Elements (CE), Worker Nodes (WN), Storage Elements (SE), Information Service (IS), network links, ...
A site consists of one or more “software and profile servers” and a number of “clients”. Both clients and servers can be automatically configured by getting the appropriate profile from one of the profile servers. The first profile server needs to be installed and configured manually.
[Diagram: an LCFG server with RPM and profile repositories configures the CE/WN PC cluster and the SE (GDMP).]
17
DataGrid resources and Information Service
Computing Element (CE) identifies large computing farms of commodity PCs
Access to the grid resources is based on the Globus Resource Allocation Manager (GRAM) service. It is responsible for operating a set of resources under the same site-specific allocation policy, implemented by a Local Resource Access Management (LRAM) system such as LSF, PBS or Condor.
Storage Element (SE) identifies any storage system with the ability to provide direct file access and transfer via FTP, NFS or other protocols using the grid security mechanism.
18
Grid Information Service
The Information Service plays a fundamental role in the DataGrid environment since resource discovery and decision making is based upon the information service infrastructure. Basically an IS is needed to collect and organize, in a coherent manner, information about grid resources and status and make them available to the consumer entities.
EDG Release 1 adopted the Globus Information Service (MDS), which is based on the LDAP directory service.
Advantages:
a well-defined data model and a standard, consolidated way to describe data
a standardized API to access data in the directory servers
a distributed topological model that allows data distribution and delegation of access policies among institutions
Disadvantages:
not designed to store dynamic information such as the status of computing, storage and network resources.
For these reasons other mechanisms are under test.
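Because MDS is an LDAP service, the published resource information can be browsed with a standard ldapsearch; a hedged sketch (the host name is illustrative, while port 2135 and the base DN "mds-vo-name=local, o=grid" are the usual Globus MDS-2 defaults assumed here):
# query the GRIS on a Computing Element for its published resource information
ldapsearch -x -H ldap://ce01.example.org:2135 -b "mds-vo-name=local, o=grid"
# the entries follow the EDG schemas and carry attributes such as those referenced
# later in the JDL examples (e.g. FreeCPUs, AverageSI00, LRMSType)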
19
GRAM Architecture
[Diagram: GRAM architecture with EDG-specific information providers, an EDG-specific schema and an EDG-specific LRAM.]
20
Resources and service schemas
The schema (data structure describing the grid resources and their status) represents what makes data valuable to Grid tools and applications. DataGrid defined and implemented its own CE, SE and NE schemas.
The EDG CE schema describes a queue, since this is the most suitable data structure to model a cluster of homogeneous PCs locally managed by schedulers such as PBS, LSF and Condor.
The definition and standardization of the information about grid resources is largely work in progress within the Global Grid Forum.
Work on a common schema for CE, SE and NE is in progress between EDT/EDG, iVDGL and Globus. This will allow both EU and US users to access US and EU grids.
21
Scheduling on the grid
One of the most important aspects of the Project is the workload management of applications dealing with large amounts of data and clusters with a high number of machines.
The scheduler is one of the most critical components of the resource management systems, since it has the responsibility of assigning resources to jobs in such a way that the application requirements are met, and of ensuring that the resource usage limits granted to the user are not exceeded. Although scheduling is a traditional area of computer science research, the particular characteristics of the DataGrid project, and of the computational grids in general, make traditional schedulers inappropriate.
22
Which properties for the EDG scheduler?
Distributed organization. Given that several user communities (also called virtual organizations) have to co-exist in DataGrid, it is reasonable to assume that each of them will want to use a scheduler that best fits its particular needs (community scheduler). However, when a number of independent schedulers operate simultaneously, a lack of coordination among their actions may result in conflicting and performance-hampering decisions. The need for coordination among these peer, independent schedulers naturally calls for a distributed organization.
Predictive state estimation, in order to deliver adequate performance even in the face of dynamic variations of the resource status.
Ability to interact with the resource information system. At the moment, all the existing schedulers require that the user specifies the list of the machines that (s)he has permission to use. However, a fully functional grid scheduler should be able to autonomously find this information by interacting with the grid-wide information service.
23
Cont.
Ability to optimize both system and application performance, depending on the needs of DataGrid users. As a matter of fact, DataGrid users needing high throughput for batches of independent jobs (such as the HEP community) have to co-exist with users requiring low response times for individual applications (e.g. the bio-medical community). In this case, neither a system-oriented nor an application-oriented scheduling policy would be sufficient.
Submission reliability: Grids are characterized by extreme resource volatility, that is, the set of available resources may dynamically change during the lifetime of an application. The scheduler should be able to resubmit, without requiring user intervention, an application whose execution cannot continue as a consequence of the failure or unavailability of the machine(s) on which it is running.
Allocation fairness. In a realistic system different users will have different priorities that determine the amount of Grid resources allocated to their applications.
24
WMS components
The EDG-WMS has been designed with the above properties for a grid scheduler in mind.
The Resource Broker is the core component. Its tasks are:
to find a CE that best matches the requirements and preferences of a submitted job, also considering the current distribution of load on the grid;
once a suitable computing element is found, to pass the job to the job submission service for the actual submission.
These tasks include interacting with the DataGrid Data Management Services to resolve logical data set names as well as to find a preliminary set of sites where the required data are stored.
25
Other WMS components
The logging and bookkeeping service is responsible for storing and managing the logging and bookkeeping information generated by the various components of the WMS. It collects information about the scheduling system and about active jobs.
A user can submit jobs and retrieve the output through the User Interface. The description of a job is expressed in the Job Description Language (JDL), which is based on the classified advertisement scheme developed by the Condor project.
It is a semi-structured data model: no specific schema is required
Symmetry: all entities in the grid, in particular applications and computing resources, can be expressed in the same language.
Simplicity for both syntax and semantics.
26
Dg-job-submit
dg-job-submit jobad6.jdl -o jobs_list -n [email protected]
#
Executable = "WP1testC";
StdInput = "sim.dat";
StdOutput = "sim.out";
StdError = "sim.err";
InputSandbox = {"/home/wp1/HandsOn-0409/WP1testC","/home/wp1/HandsOn-0409/file*”,
"/home/wp1/DATA/*"};
OutputSandbox = {"sim.err","test.out","sim.out"};
Rank = other.AverageSI00;
Requirements = (other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2") &&
(other.RunTimeEnvironment == "CMS3.2");
InputData = "LF:test10096-0009";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
27
Dg-job-submit
Executable = "WP1testF";
StdOutput = "sim.out";
StdError = "sim.err";
InputSandbox = {"/home/datamat/sim.exe", "/home/datamat/DATA/*"};
OutputSandbox = {"sim.err","sim.err","testD.out"};
Rank = other.TotalCPUs * other.AverageSI00;
Requirements = other.LRMSType == "PBS" && (other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2") && self.Rank > 10 && other.FreeCPUs > 1;
RetryCount = 2;
Arguments = "file1";
InputData = "LF:test10099-1001";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
OutputSE = "grid001.cnaf.infn.it";
28
UI commands
Submission of a job for execution on a remote Computing Element, including:
automatic resource discovery and selection
staging of the application and data (input sandbox)
Selection of a list of suitable resources for a specific job
Cancellation of one or more submitted jobs
Retrieval of the output file(s) produced by a completed job (output sandbox)
Retrieval and display of bookkeeping information about submitted jobs
Retrieval and display of logging information about jobs.
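Taken together, a typical session with the UI might look like the sketch below. Only dg-job-submit and its options appear on the earlier slides; the other command names (dg-job-list-match, dg-job-status, dg-job-get-output, dg-job-cancel) and the -i option for reading job identifiers from a file are assumptions made for illustration:
# see which Computing Elements match the job requirements
dg-job-list-match jobad6.jdl
# submit the job, recording its identifier in jobs_list
dg-job-submit jobad6.jdl -o jobs_list
# follow the job through the logging and bookkeeping service
dg-job-status -i jobs_list
# retrieve the output sandbox once the job is done, or cancel the job
dg-job-get-output -i jobs_list
dg-job-cancel -i jobs_list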
29
A Job Submission Example
[Diagram: the user submits a JDL job from the UI to the Resource Broker (authentication and authorization, job submit, job query, job status). The RB consults the Information Service and the Data Management Services (LFN->PFN), then passes the job with its input sandbox and Brokerinfo to the Job Submission Service, which runs it on a Computing Element near a suitable Storage Element; the output sandbox and the job status are returned to the user through the Logging & Bookkeeping service.]
30
Data management
In EDG the technique adopted for optimizing data access and providing fault tolerance is data replication.
The data management architecture is focused on file replication services and the main objectives include optimized data access, caching, file replication and file migration. The most important tasks are:
Management of a universal namespace for files (using replica catalogues)
Secure and efficient data transfer between sites
Synchronization of remote copies
(Optimized) wide-area data access/caching
Management of meta-data like indices and file meta-data
Interface to mass storage systems
31
The building blocks of the DM architecture
The replica catalogue is a fundamental building block in data grids. It addresses the common need to keep track of multiple copies of a single logical file by maintaining a mapping from logical file names to physical locations.
It is imported from Globus and based on OpenLDAP v2 with an EDG schema, but is migrating to a more distributed architecture with a relational database as backend.
32
The building blocks of the DM architecture
The Replica Manager: its main tasks are to securely and efficiently copy files between two Storage Elements and to update the replica catalogue when the copy process has successfully terminated.
The File Copier (also called Data Mover) is an efficient and secure file transfer service that has to be available on each Storage Element. Initially this service will be based on the GridFTP protocol.
The Consistency Service has to guarantee the synchronization of the file replicas when an update occurs. This service is provided on top of the replica manager.
The first implementation is based on GDMP (Grid Data Mirroring Package) which is a file replication tool that implements most of the replica manager functionalities.
The access to the replica files shall be optimized by using some performance information such as the current network throughput and latency and the load on the Storage Elements. The Replica Optimizer’s duty is to select the “best” replicas.
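At the File Copier level, a replica transfer between two Storage Elements corresponds to a GridFTP copy, which the Globus tools expose through globus-url-copy; a minimal sketch in which the host names and the file path are purely illustrative:
# third-party GridFTP copy of one physical file between two Storage Elements
globus-url-copy gsiftp://se1.example.org/flatfiles/cms/test10096-0009 \
                gsiftp://se2.example.org/flatfiles/cms/test10096-0009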
33
The Storage Element
Data Management in DataGrid has to deal with the heterogeneity of storage systems and thus of Storage Elements (SE). The interface to an SE has to be uniform regardless of the underlying storage technology. The SE is the basic storage resource in DataGrid and also defines the smallest granularity of a compound storage system. A Storage Element can be either a large disk pool or a Mass Storage System (MSS) with its own internal disk pool. The current storage system implementations include MSS like HPSS and Castor as well as distributed file systems like AFS or NFS.
34
Architecture
35
DataGrid Release 1 summary
UI User Interface: lightweight component for accessing the workload management system.
RB Resource Broker: The core component of the workload management system, able to find a resource matching the user’s requirements (including data location).
LB Logging and Bookkeeping: Repository for events occurring in the lifespan of a job.
JSS Job Submission Services: It is the result of the integration of Condor-G and Globus services to achieve reliable submission of jobs via the Globus GRAM protocol.
II Information Index: Caching information index (based on Globus MDS-2) directly connected to the RB, to achieve control of the cache times and to prevent blocking failures when accessing the information space.
Replica Manager /GDMP: To consistently copy (replicate) files from one Storage Element to another and register replicas.
36
Cont.
Replica Catalog: it stores information about the physical files on all the Grid Storage Elements. A centralized replica catalog has been chosen for prototype 1.
MDS (GRIS and Info.Providers): Grid Information System used by the Resource Broker. The first implementation is based on the Globus MDS system, with resource schema and information providers defined and implemented by DataGrid.
Authentication and Authorization Services: VO directory configuration and tools to periodically generate authorization lists
Automatic Installation and Configuration management: Large Scale Linux Configuration (LCFG) tool for very large computing fabrics (CE)
37
EDG Release1 services
38
Definition of the EDG release 1/2
The hardest task of the Integration Team, a working group in charge of integrating the different pieces of software, has been to collect all the EDG software packages together with Globus and Condor, to study their functionality and interdependencies and the requirements for a correct operation of the testbed, and to come up with a topology and precise installation and configuration instructions for deployment.
Grid Elements: in order to achieve these goals it has been necessary to construct a node-centric view of the testbed deployment specifying the profile of each node type (grid elements).
RPM: the EDG release has been packaged via Linux RPMs (Red Hat Package Manager), because of the requirements coming with the installation and distribution tools provided for the farms. In particular the Globus main subcomponents were packaged in separate RPMs that could be installed and configured independently.
Each Grid Element is defined as a list of software RPMs and configuration RPMs. During this work it was necessary to specify detailed configuration instructions and requirements that allow these elements to interoperate.
39
Definition of the EDG release 2/2
LCFG for automatic installation and configuration: for each of the grid elements an LCFG template provides:
the RPM lists of the packages required for a specific element
a typical LCFG configuration that needs to be customized for a specific testbed site.
a specific set of instructions about configuring the grid element.
Once the definition of the profile for a grid element has been optimized, a small test suite has been used to verify that all the functionalities required were present and working.
All the above constitutes the content of the edg-release package available in the CVS repository of the DataGrid Release.
40
Middleware integration
[Diagram: middleware integration. The EDG services (Grid Scheduler, RM/GDMP, Globus-EDG GIS, Replica Catalogue, II) are built on the basic grid services (Condor, Globus) and are deployed on the Grid Elements (UI, RB, L&B, CE, SE, WN).]
41
Testbed 1
The first prototype of the EDG (European DataGrid) grid infrastructure (Testbed1) was deployed in December 2001 using the official, tagged EDG software Release 1, based on Globus 2 beta 21.
Testbed1 was initially made up of a limited number of grid elements in 5 European countries (CERN, FR, UK, IT, NL), but it is currently being extended to about 30 sites all over Europe, including other countries such as the Czech Republic, ES, PT, DE and the Nordic Countries.
The “common” grid elements (User Interfaces, Computing Elements, Worker Nodes and Storage Elements) have been installed and configured at each site while the grid elements devoted to the central grid services (Resource Brokers, Information Indexes and Logging and Bookkeeping servers) have been set up at CERN and INFN-CNAF (IT). Some other dedicated servers (the Virtual Organization LDAP servers and the VO Replica Catalogues) have been hosted at NIKHEF (NL) and INFN-CNAF.
42
The prototype DataGrid testbed
[Map: testbed sites including CERN, NIKHEF, London, RAL, CC-Lyon, Paris, Prague, Barcelona, Madrid, Lisbon and the Italian sites Catania, Bologna/CNAF, Padova/LNL, Torino, Cagliari, Roma and Milano, with links to the USA and to Russia/Japan.]
43
Validation of Release1
It is essential for the success of the Project that the developed middleware satisfies the users’ initial requirements. For this reason, a software validation phase performed by the user communities using real applications in a large-scale environment is crucial within the project.
The validation activity will be progressively done during the whole project lifetime, in order to test each middleware release.
Short term use cases have been used at the beginning and real applications are going to be used later on.
44
DataTAG project
[Map: a transatlantic link connecting the European research networks (SURFnet in NL, SuperJANET4 in UK, GARR-B in IT) and GEANT at CERN with Abilene, ESNET and MREN in the US via New York, STAR-TAP and STAR-LIGHT.]
Two main focuses:
Grid-applied network research; a 2.5 Gbps lambda to Star-Light for network research
Interoperability between Grids in EU and US; US partnership: the iVDGL project
Main partners: CERN, INFN, UvA (NL), PPARC (UK), INRIA (FR)
45
Conclusions
The first DataGrid prototype (Release 1) is in place in Europe as a result of the collaboration of all the actors of a distributed computing environment: resource owners, middleware developers, scientific application programmers and scientific application users.
The testing phase demonstrated the power of the grid, whose basic functionalities make it possible to select the most appropriate CE-SE pair just by specifying the job characteristics and the related input/output data in a high-level language.
The major project release, foreseen for September 2002, will introduce new important services like support for dependent, parallel, and partitionable jobs, resource co-allocation, advance reservation and accounting, as well as more efficient information and monitoring services.
DataGrid documents can be found at http://eu-datagrid.web.cern.ch/