Enabling Grids for E-sciencE www.eu-egee.org gLite for ATLAS Production Simone Campana, CERN/INFN ATLAS production meeting May 2, 2005



Outline

• Contents of gLite release 1.0
– Components, architecture, and service interplay
– Major differences from LCG-2
– Major open issues
– Future plans

• Deployment plan
– LCG-2 vs gLite
– gLite/LCG-2 coexistence
– Current status of certification
– Preproduction service


WMS

• Reengineering of the LCG-2 Workload Management System
– Supports partitioned jobs and jobs with dependencies
  Might be considered as a setup for the production machinery. Would help solve problems such as pre-staging. Side effects should be investigated and considered as well.
– Task Queue: persistent queue for submitted jobs
– Information Supermarket: read-only information-system cache, updated by:
  • the information system (CE in push mode)
  • CEMon (CE in pull mode)
  • a combination of both
  This allows the WMS to work in both push and pull mode.
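The interplay of Task Queue and Information Supermarket can be sketched as follows. This is an illustrative Python model, not gLite code; all class and method names are invented:

```python
# Illustrative sketch (not gLite code): a persistent task queue plus a
# read-only resource cache, matched against each other so the WMS can
# work in both push and pull mode.
from collections import deque

class InformationSupermarket:
    """Read-only cache of CE state, updated by the information system
    (push mode) or by CEMon notifications (pull mode)."""
    def __init__(self):
        self._ce_state = {}

    def update(self, ce_id, free_slots):
        self._ce_state[ce_id] = free_slots

    def matching_ces(self, min_slots):
        return [ce for ce, free in self._ce_state.items() if free >= min_slots]

class TaskQueue:
    """Persistent queue: jobs wait here until a matching CE appears."""
    def __init__(self, supermarket):
        self.supermarket = supermarket
        self.pending = deque()

    def submit(self, job_id, min_slots):
        self.pending.append((job_id, min_slots))

    def match(self):
        """One matchmaking pass; unmatched jobs stay queued."""
        matched, still_pending = [], deque()
        while self.pending:
            job_id, min_slots = self.pending.popleft()
            ces = self.supermarket.matching_ces(min_slots)
            if ces:
                matched.append((job_id, ces[0]))
            else:
                still_pending.append((job_id, min_slots))
        self.pending = still_pending
        return matched

ism = InformationSupermarket()
tq = TaskQueue(ism)
tq.submit("job1", min_slots=4)
ism.update("ce.cern.ch", 8)      # push-mode update from the info system
print(tq.match())                # [('job1', 'ce.cern.ch')]
```

The point of the persistent queue is visible here: a job submitted before any matching CE is known simply stays pending until a later cache update makes a match possible.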


WMS Cont’d

– Interface to data management: EDG RLS, StorageIndex, DLI
– Condor-C: job-submission mechanism between the WM and the CE; improved reliability with respect to Globus GRAM
– CE moving towards a VO-based scheduler

• Future:
– Web services
– Bulk submission


WMS Cont’d

• Major problems
– Failure rate ~12% with retrycount = 0; with retries, 100% success
  Several causes are being investigated (e.g. race conditions). Shallow resubmission (i.e. retry of the submission, not of the execution) might help.
– Matchmaking sometimes blocks; a fix is provided in release 1.1 (end of April)
– Condor as backend not yet working
– Not yet the final architecture of the CE:
  One schedd per local user id; needs setuid services and head-node monitoring (Globus + JRA3)
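"Shallow" resubmission retries only the submission step, never a job that may already have started executing. A minimal sketch of the idea in Python (the failure model and function names are invented for illustration):

```python
# Sketch of shallow resubmission: retry only while the failure happened
# before the job reached a worker node, so a job is never executed twice.
class SubmissionError(Exception):
    """Failure before the job started running (safe to retry)."""

def submit_with_shallow_retry(submit, job, max_retries=3):
    """Retry `submit(job)` on pre-execution failures only."""
    for attempt in range(max_retries + 1):
        try:
            return submit(job)
        except SubmissionError:
            continue   # job never started: resubmission is safe
    raise RuntimeError(f"job {job!r} failed {max_retries + 1} submissions")

# Toy submit function that fails twice before succeeding.
attempts = {"n": 0}
def flaky_submit(job):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SubmissionError("race condition in match/submit")
    return f"{job} accepted on attempt {attempts['n']}"

print(submit_with_shallow_retry(flaky_submit, "job42"))
# job42 accepted on attempt 3
```

Deep resubmission, by contrast, would also retry after a runtime failure, which is exactly the side effect (possible double execution) that shallow retry avoids.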


Data Management Services

• Mostly new developments, based on (and re-using some) AliEn services

• Storage Element
– Storage Resource Manager: relies on existing implementations
– POSIX I/O: gLite I/O
– Access protocols: gsiftp, gsidcap, rfio, …

• Catalogs
– File Catalog
– Replica Catalog
– File Authorization Service
– Metadata Catalog
(gLite FiReMan catalog, on MySQL and Oracle; gLite Metadata Catalog)

• File Transfer
– Data Scheduler: planned for release 2
– File Transfer Service: gLite FTS and glite-url-copy
– File Placement Service: gLite FPS


Data Management Cont’d

• Addresses shortcomings of LCG-2 data management:
– RLS performance
– Lack of consistent grid-storage interfaces
– Unreliable data-transfer layer

• FiReMan catalog
– Hierarchical namespace
– Bulk operations
– ACLs
– Web-services interface
– POOL interface
– Performance/scalability

• gLite I/O
– Support for ACLs
– Support for the FiReMan catalog in addition to RLS

• File Transfer Service
– Did not exist in LCG-2; to be evaluated in SC3


Catalogs

• Currently: a global catalog
• Future: could consider local file catalogs
– A central location index (Storage Index) is then necessary
– Lightweight and therefore more scalable
– Yet to be demonstrated

• Comparison of FiReMan and the LFC
– Single files: the LFC is faster
– FiReMan offers bulk capabilities; the LFC does not

• “Within the LHC experiments there is yet no decision - they really want to test both LFC and Fireman and select based on their needs. It is likely that different applications will choose a different catalogue. Since these are application dependent we will probably be in a situation where we have different VOs using different catalogues.”
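Why bulk capability matters can be illustrated with a toy catalogue in plain Python (purely illustrative; FiReMan's real interface is a web service, and the cost model here is an assumption):

```python
# Toy catalogue illustrating why bulk registration matters: if each call
# costs one network round trip, registering N files costs N calls with a
# single-entry API but only one call with a bulk API.
class ToyCatalog:
    def __init__(self):
        self.entries = {}
        self.round_trips = 0

    def register(self, lfn, surl):       # single-entry operation
        self.round_trips += 1
        self.entries[lfn] = surl

    def register_bulk(self, mapping):    # bulk operation
        self.round_trips += 1
        self.entries.update(mapping)

files = {f"/grid/atlas/file{i}": f"srm://se.example.org/f{i}" for i in range(100)}

single = ToyCatalog()
for lfn, surl in files.items():
    single.register(lfn, surl)

bulk = ToyCatalog()
bulk.register_bulk(files)

print(single.round_trips, bulk.round_trips)   # 100 1
```

This is also why the single-file comparison above favours the LFC while production-scale registration may favour bulk operations: the two measurements stress different parts of the interface.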


Information and Monitoring Services

• R-GMA (Relational Grid Monitoring Architecture)
– Implements the GGF GMA standard
– Development started in EDG; deployed on the production infrastructure for accounting

[Diagram: R-GMA architecture. A producer application publishes tuples (SQL “CREATE TABLE”, “INSERT”) through the Producer service; a consumer application sends queries (SQL “SELECT”) through the Consumer service and receives tuples back. Producers register with the Registry service, the Mediator locates matching producers for a query, and table definitions are held in the Schema service.]
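R-GMA presents monitoring data as one virtual relational database: producers INSERT tuples, consumers SELECT them. The flavour of that interface can be mimicked with SQLite (purely illustrative; R-GMA is a distributed system, while this is a single local table, and the table and column names are invented):

```python
# Mimic of the R-GMA usage pattern with SQLite: a producer publishes
# tuples via SQL INSERT, a consumer retrieves them via SQL SELECT.
import sqlite3

db = sqlite3.connect(":memory:")

# Schema-service role: declare the table once.
db.execute("CREATE TABLE jobstatus (jobid TEXT, site TEXT, status TEXT)")

# Producer role: publish tuples.
db.execute("INSERT INTO jobstatus VALUES ('job1', 'CERN', 'Running')")
db.execute("INSERT INTO jobstatus VALUES ('job2', 'CNAF', 'Done')")

# Consumer role: query across all published tuples.
rows = db.execute(
    "SELECT jobid FROM jobstatus WHERE status = 'Running'").fetchall()
print(rows)   # [('job1',)]
```

In real R-GMA the Registry and Mediator hide which producers actually hold the tuples; the consumer only ever sees the relational view.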


Job Monitoring

• Currently ATLAS relies on the ProdDB and the GridICE server.

• R-GMA and GridICE can coexist
– At the moment, information from the GridICE sensors can be published by R-GMA
– Monitoring tools (Magoo) can therefore interface to R-GMA directly
– The GridICE server is still quite useful for quickly visualizing the status of resources

• R-GMA can be used for application monitoring
• R-GMA has already been used for several months for accounting and operations.
• Don’t forget Laurence Field’s mini-tutorial on R-GMA architecture and usage
– End of May (final date not yet fixed)


VOMS

• Derived from DataTAG and EDG
• Used for VO management
– VOMS certificates are understood by the WMS and by data management (as of release 1.1)

• RFC compliance

• Major problems
– Incompatibility with previous VOMS versions, due to the RFC compliance


Future Plans - WMS

• Move CE to final architecture– One scheduler per VO (can be provided by VO)– Head-node monitor and fork/set-uid service

• WS interface to WMS– With better support for bulk job-submission

• Support for pilot jobs

• Integration of network information (JRA4)

• Closer integration with Data Mgmt– Common job and data transfer DAGs– Data matchmaking for ranking

• “shallow” job-resubmission

• CE history used for ranking

• Use information from R-GMA in the information supermarket
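Several of these ranking hooks surface to users through the JDL Rank expression, evaluated during matchmaking; a sketch of LCG-2-era usage (the executable and output names are invented; Requirements, Rank, RetryCount and the Glue attributes are standard JDL/Glue-schema names):

```
Executable   = "atlas_sim.sh";
StdOutput    = "sim.out";
Requirements = other.GlueCEStateStatus == "Production";
Rank         = -other.GlueCEStateEstimatedResponseTime;
RetryCount   = 0;
```

Data-driven matchmaking and CE-history ranking would feed additional information into exactly this Rank evaluation.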


Future Plans - DM

• Security in the DM chain
– Delegation
– Work out the security model (local vs. Grid)
– Support SRMs with native ACL support
– VOMS roles for ACLs

• Distributed/Partitioned Catalogs– Need to define model

• Integration of network information (JRA4)

• Explore XROOTD

• Harmonize metadata interface (ARDA, PTF)

• Data Scheduler (equivalent to WMS for data transfer requests)


Outline

• Contents of gLite release 1.0
– Components, architecture, and service interplay
– Major differences from LCG-2
– Major open issues
– Future plans

• Deployment plan
– LCG-2 vs gLite
– gLite/LCG-2 coexistence
– Current status of certification
– Preproduction service


LCG-2 Services

[Diagram: LCG-2 service deployment. Per-VO services: VOMS, VO (LDAP), RLS, LFC. CIC services (replicated): MyProxy, BDII, RB. Global services: R-GMA registries, monitoring, SFT, FCR, APEL. Site services: CE, R-GMA MonBox, classic SE and SRM SEs backed by CASTOR, dCache or DPM, LCG worker nodes. Client side: LCG UIs and client libraries.]


gLite 1

[Diagram: gLite 1 service deployment. Per-VO services: VOMS, FiReMan. CIC services (replicated): MyProxy, WLM. Global services: R-GMA registries; accounting via DGAS still uncertain. Site services: CE in push/pull mode, R-GMA MonBox, SRM SEs backed by CASTOR, dCache or DPM with gLite I/O in front, gLite worker nodes. Client side: gLite UIs and client libraries.]

Services are generic, but file ownership limits access to the gLite I/O service.

Link between CEs and SEs via CEMon (later by R-GMA, temporarily the BDII).


gLite - LCG2 Coexistence

[Diagram: gLite and LCG-2 deployed side by side. The gLite stack (per-VO VOMS and FiReMan; CIC MyProxy and WLM; CE in push/pull mode; R-GMA MonBox; gLite I/O) and the LCG-2 stack (per-VO VOMS, VO LDAP, RLS, LFC; CIC MyProxy, BDII, RB; CE; monitoring, SFT, FCR, APEL) share the SRM SEs (CASTOR, dCache, DPM) and combined gLite/LCG-2 worker nodes, with gLite and LCG UIs on the client side and R-GMA registries and DGAS as global services.]

Services are generic, but file ownership limits access to the gLite I/O service; this could be worked around.

gLite and LCG-2 services can be hosted on the same node.

The WLM can get information on CEs and SEs through CEMon or the BDII.

The WLM can interface via DLI to the LFC, but gLite I/O cannot.


Model

• Pros: the real gLite experience
– No mixing, except for the WNs and UIs
– No intense interoperability testing needed
– New versions can be released more quickly after they become available
– LCG-2 can be evolved independently

• Cons:
– Many additional services, with more to come
– Not a valid model for small sites
– Complex trickery needed to allow the two systems access to new/old data
– LCG-2 can be evolved independently


gLite - LCG2 step 1

[Diagram: migration step 1. The full LCG-2 deployment (per-VO VOMS, VO LDAP, RLS, LFC; CIC MyProxy, BDII, RB; global R-GMA registries, monitoring, SFT, FCR, APEL; classic SE and SRM SEs on CASTOR, dCache, DPM; LCG and gLite UIs) is extended with the gLite CE in push/pull mode and the WLM among the CIC services; worker nodes carry both the gLite and the LCG-2 middleware.]


gLite - LCG2 step 2

[Diagram: migration step 2. The VO LDAP servers, RLS and RBs are dropped: CIC services are now MyProxy, BDII and WLM; per-VO services are VOMS and the LFC; global services remain R-GMA registries, monitoring, SFT, FCR and APEL; SRM SEs on CASTOR, dCache and DPM; gLite UIs; combined gLite/LCG-2 worker nodes.]


gLite - LCG2 step 2

[Diagram: further migration. CIC services: MyProxy, WLM and possibly a BDII; one LFC is replaced by FiReMan, which contains the former LFC data; accounting moves to DGAS, and FCR becomes uncertain; worker nodes run the gLite middleware plus LCG-utils and GFAL; the SRM SEs (CASTOR, dCache, DPM) are fronted by gLite I/O, with file ownership adjusted for access via gLite I/O.]

Users can decide to use different (multiple) catalogues, as long as they provide a DLI interface.

Users can opt for direct access to the storage via the LCG tools; however, there will be limitations concerning the accessibility of the data.


Finer Points

• Interoperation with other grids
– Mechanism to select between different client libraries on the WN (done)
– Need to keep an information system that can be used by LCG-3 and OSG (Grid3)

• Operations
– We have to adapt our operations services to the gLite production system
  Substantial effort; needed for the preproduction service
– Monitoring has to be ported
  Partially done already

• Time?
– Should be driven by the VOs and the experience gained
– Fixed schedules tend not to be workable


To be clear on what is being certified

• Release 1.0 of gLite: http://glite.web.cern.ch/glite/packages/R1.0/R20050331/default.asp

• The information system is a combination of the BDII and CEMon
– Locations of SEs can only be stored in the BDII; this data is entered manually into the BDII and is static
– In this release no gLite services use R-GMA as the information system, although the R-GMA services are in the release and will be tested

• No File Placement / File Transfer service
– Using gridFTP on the certification test bed

• No accounting


When will gLite 1.0 be certified?

• Criteria for certification still need to be agreed, but will probably include items such as:
– Meeting SA1’s requirements on “deployability”
– Successful completion of the gLite certification test suite
– Job failure rate ≤ the LCG-2 job failure rate
– Acceptable stability of the system
– An acceptable number of outstanding critical/major bugs

• This is the first attempt at certifying gLite, so it is impossible to predict when certification will be finished.


Pre-production Service

• Pre-production is planned in two phases:
– Phase I
  Started when deployment of the certification testbed was successfully completed (middle of last week).
  Sites: CNAF, NIKHEF, PIC(?), CESGA and CERN.
  These sites will install all gLite site components + some gLite core service(s) (for resilience) + LCG-2 site components (for investigating migration).
  The CESGA site is already being used by ARDA CMS!
– Phase II
  Will start when the phase I sites are fully operational.
  >12 sites in phase II.
  All sites will install gLite; some will also install LCG-2.


INFN ECGI Activity

• ECGI = Experiment Computing Grid Integration

• Working group created to
– help the LHC experiments understand and experiment with the functionalities and features offered by the new gLite middleware
– provide adequate user documentation

• Will work in close collaboration and coordination with
– the Experiment Integration Support (EIS) team at CERN
– the developers of the EGEE/LCG grid middleware

• Whenever possible and needed, the group will also
– create system-administration documentation
– use dedicated resources for testing purposes

• http://infn-ecgi.pi.infn.it/index.html