Page 1: 16th February, 2009

Markus Schulz, LCG Deployment

WLCG Middleware Status Report

16th February, 2009

Page 2: 16th February, 2009


Overview
The three WLCG middleware stacks:
• ARC (NDGF): most sites in northern Europe, ~10 % of WLCG CPUs
• OSG: most North American sites, >25 % of WLCG CPUs
• gLite: used by the EGEE infrastructure
Summary and Issues

s have been added by me

Page 3: 16th February, 2009

ARC middleware status

Michael Grønager, Project Director, NDGF
LCG-LHCC Mini Review

CERN, Geneva, February 16th 2009

Page 4: 16th February, 2009


WLCG sites with ARC
• Tier-1: NDGF
• Tier-2s: Finnish, Norwegian, Slovenian, Swedish
• Tier-3s: Danish, Norwegian, Swedish, Swiss

Page 5: 16th February, 2009


ARC Status – Current Version

Current stable release 0.6.5 - “Earth Quake” (December)

• Improved cache scalability added: ARC supports caching of files used by several jobs. This boosts performance for e.g. analysis, but scalability issues were detected on large clusters. ARC 0.6.5 makes it possible to split this load across several file servers (see the sketch after this slide).

• Optional patch for replacing Globus MDS with a new solution, EGIIS, which includes BDII. This is deployed at most NDGF-related sites.

• Minor issue with LFC fixed.
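The cache-splitting item above is essentially about distributing cached input files over several file servers so that no single server becomes a bottleneck. The following is a minimal illustrative sketch, not ARC code: the directory names and the hashing scheme are hypothetical, chosen only to show how files can be mapped deterministically onto multiple cache areas so that jobs still share downloads.

```python
import hashlib
import os

# Hypothetical cache areas, each mounted from a different file server.
CACHE_DIRS = ["/cache1", "/cache2", "/cache3"]

def cache_path(file_url: str) -> str:
    """Map a file URL deterministically onto one of the cache areas.

    Every job computes the same hash for the same URL and therefore looks
    in the same place, so a file downloaded once can be reused by later
    jobs while the total load is spread over several file servers.
    """
    digest = hashlib.sha1(file_url.encode("utf-8")).hexdigest()
    cache_dir = CACHE_DIRS[int(digest, 16) % len(CACHE_DIRS)]
    return os.path.join(cache_dir, digest[:2], digest)

if __name__ == "__main__":
    url = "srm://example.org/atlas/data/file001.root"  # placeholder URL
    print(cache_path(url))
```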

Page 6: 16th February, 2009


ARC Status – Next Version
Next stable release: “Fastelavn”* (February)
Further scalability improvements included:
• Support for sharing the load across multiple file system servers
• Support for distributing multiple up- and downloaders across multiple machines
These new features make ARC ready for running production on large, 5000+ core machines.
• MDS fully replaced by EGIIS and BDII
• Optional publishing of GLUE 1.3 along with the ARC schema (currently in testing, e.g. at NDGF-BENEDICT-T3); see the query sketch after this slide
• KnowARC features starting to appear: optional module for OGF BES submission, based on a new and more modular code base

* aka Mardi Gras
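For context on the GLUE 1.3 / BDII item, a BDII is an LDAP server that can be queried directly. The sketch below uses the python-ldap module; the host name is a placeholder, and the port (2170) and search base (o=grid) are the usual BDII conventions rather than anything stated on these slides, so treat them as assumptions.

```python
import ldap  # python-ldap

# Placeholder BDII endpoint; site/top-level BDIIs conventionally listen
# on port 2170 with search base "o=grid" (assumption, not from the slides).
BDII_URL = "ldap://bdii.example.org:2170"
BASE_DN = "o=grid"

def list_compute_elements():
    """Yield (CE id, total CPUs) pairs published in the GLUE 1.3 schema."""
    con = ldap.initialize(BDII_URL)
    con.simple_bind_s()  # BDIIs allow anonymous read access
    results = con.search_s(
        BASE_DN,
        ldap.SCOPE_SUBTREE,
        "(objectClass=GlueCE)",
        ["GlueCEUniqueID", "GlueCEInfoTotalCPUs"],
    )
    for _dn, attrs in results:
        yield (
            attrs.get("GlueCEUniqueID", [b"?"])[0].decode(),
            attrs.get("GlueCEInfoTotalCPUs", [b"?"])[0].decode(),
        )

if __name__ == "__main__":
    for ce, cpus in list_compute_elements():
        print(ce, cpus)
```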

Page 7: 16th February, 2009


ARC Future
The production release of ARC (sometimes called ARC classic) will continue to evolve.
More and more components will be integrated from e.g. the KnowARC project.
The KnowARC development adds new service interfaces that adhere to standards like GLUE2, BES and JSDL. These will be incorporated into the production release of ARC.
There will be no “migrations”, but a gradual incorporation of the novel components into the stable branch, like OGF BES in “Fastelavn”.
ARC components will be included in UMD, and ARC now supports building on ETICS.

Page 8: 16th February, 2009

Status of OSG Middleware for WLCG

Ruth Pordes, OSG Executive Director
Alain Roy, OSG Software Coordinator

LHCC Mini-Review, Feb 16th 2009

Page 9: 16th February, 2009

OSG Middleware Scope & Status

• OSG provides packages for Compute Elements, Storage Elements, VO managers, Worker-Node Client and User Client.

• OSG middleware is tested to allow Applications to interoperate across OSG and EGEE (and NDGF).

• Thus WLCG users are able to transparently use the multiple grids.

• OSG V1.0 was stable during data taking, cosmic runs, and ramped-up simulation production and analysis in the second half of 2008.


Page 10: 16th February, 2009

Progress over last 6 months
• Bestman/xrootd Storage Elements now installed at several Tier-2s. Bestman (+ nfs/lustre/hadoop) installed on Tier-3s and a couple of Tier-2s.
• Addition of the WLCG client utilities (LFC, lcg_utils) enables use of the OSG Client with no need to install both the OSG and EGEE client packages (see the sketch after this list).
• Roll-out of joint gLite/VO services/Globus common interfaces and protocols in security components. Significant testing effort across the projects, including SCAS/LCAS, glexec, GUMS.
• EGEE packages continue to be included in the OSG s/w stack: VOMS/VOMS-Admin, glexec, edg-mkgridmap
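As a rough illustration of what shipping LFC and lcg_utils with the OSG client buys a user, the sketch below drives the command-line tools from a Python script. The LFC host, catalogue path and VO name are placeholders, and the exact lcg-cp option set varies between releases, so take this as an assumption-laden example rather than documented usage.

```python
import os
import subprocess

# Placeholder catalogue endpoint; replace with your VO's actual LFC host.
os.environ["LFC_HOST"] = "lfc.example.org"

def list_catalogue(path="/grid/myvo/data"):
    """List a directory in the LFC file catalogue (hypothetical path)."""
    return subprocess.run(["lfc-ls", "-l", path], check=True)

def copy_to_local(lfn, local_file, vo="myvo"):
    """Copy a catalogued file to local disk with lcg-cp.

    The --vo option and the lfn:/file: URL forms follow conventional
    lcg_utils usage; exact flags may differ between versions (assumption).
    """
    return subprocess.run(
        ["lcg-cp", "--vo", vo, f"lfn:{lfn}", f"file:{local_file}"],
        check=True,
    )

if __name__ == "__main__":
    list_catalogue()
    copy_to_local("/grid/myvo/data/file001.root", "/tmp/file001.root")
```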


Page 11: 16th February, 2009

Software Tools Group
• Part of the new OSG project structure in FY09. Led by Alain Roy and Mine Altunay.
• Central hub for all software projects/plans.
• Aims to ensure stakeholders' needs are met from planning to deployment.
• Single point of contact for software providers.
• Inputs: user/VO/site requirements; software providers' timelines/plans
• Outputs: plans for software stack evolution
• Point of contact with the EGEE EMT and gLite.


Page 12: 16th February, 2009

External Software Provision
OSG, US ATLAS and US CMS are working closely with software development groups on:
• Timely deployment of new versions of dCache and Bestman for WLCG needs.
• Evolution of the identity systems (looking at backends to Shib, Kerberos) and compatibility.
• Condor changes to support scalability in the number of jobs.
• Internet2/ESnet for deployment of perfSONAR network monitoring tools.
• Gratia accounting, OIM operations database & tools.
• Use of xrootd.
OSG & US ATLAS are working on generalization of PanDA for other users.
OSG and US CMS are working on generalization of the glidein WMS for other users.


Page 13: 16th February, 2009

OSG support for gLite underpinnings

• We continue to supply a subset of the VDT as RPMs: Condor, Globus, MyProxy, GSI OpenSSH, GPT


Page 14: 16th February, 2009

Current Work
• Major focus is on better support for incremental upgrades, roll-back and forward compatibility. Includes a redesign of the packaging to improve native packaging.
• Debian 5 support for LIGO.
• Software upgrades only if really needed. Not looking yet at Globus 4.2.
• Interoperability: testing compatibility of CREAM with the OSG Client stack; ensure availability, reliability, installed capacity and accounting software and services all report correctly from OSG to EGEE and to WLCG.


Page 15: 16th February, 2009

Currently Supported Platforms
• Linux (32 & 64 bit): RHEL 3, RHEL 4, RHEL 5, Debian 4, ROCKS 3, SuSE Linux 9 (just 64-bit), Scientific Linux 3, Scientific Linux 4
• Mac OS 10.4 (client only)
• AIX 5.3 (limited support)


Page 16: 16th February, 2009

Concerns (nothing new)

Need to continue to ensure modularity/separation of EGEE services and WLCG, to enable OSG to effectively contribute and peer. Need WLCG to work with OSG middleware activities as closely as with the EGEE middleware activities. We are all trying hard here!

Interoperability activities will become more challenging in an EGI era where the number of independent s/w stacks may grow or diverge. OSG committed to work with EGI partners in these areas.

OSG is pleased to contribute to the Infrastructure Policy Group. These are pragmatic activities for understanding commonalities and differences. OSG remains nervous about the potential of OGF standards being really successful.


Page 17: 16th February, 2009


gLite
The current release is gLite 3.1. It is updated almost every week (30+ updates/year). Its purpose is to provide a stable platform for production grid usage.
It covers: Data Management, Workload Management, Information System, AAA
Distributed lifecycle: tools and formal processes
• Link teams and tasks
• Monitor progress
Large code base (~1.6 million lines of code)

[Diagram: illustration of updates in a component-based release process. Components A, B and C pass through build, integration and certification, and are released at a regular release interval as Update 1 to Update 4.]

Page 18: 16th February, 2009


Most Active Areas
Workload management (access to computing resources)
• Support for multiuser pilot jobs, used by the experiments' frameworks: DIRAC, PanDA, AliEn
Move to the next OS platform: SL5
Continuous evolution of other components: FTS, DPM, LFC, ...

Page 19: 16th February, 2009


Workload management
• LCG-RB has been phased out
• WMS 3.1 SL4 major update (accumulates patches from more than 8 months)
  - Certified; will be released to production in the next weeks
  - Can handle >30K jobs/day
  - Better support for bulk submission
  - Almost ready to support CREAM-CE: ICE integrated, but needs more testing
• Support for multiuser pilot jobs (see the sketch after this slide)
  - SCAS and glexec on WNs are late; now under stress testing
  - Still issues with memory management: fails at a 0.03% rate, not good enough for an authorization system
  - Scales to >10 Hz (OK for most sites)
  - Will start a pilot service during the next week
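To make the multiuser pilot-job item concrete: a pilot fetches a real user's payload plus proxy and asks glexec to run it under that user's identity, with the site's authorization service (SCAS/LCAS or GUMS) deciding the mapping. The sketch below is a simplified, hypothetical pilot loop; handing the user proxy to glexec via GLEXEC_CLIENT_CERT follows the commonly documented convention, but treat the details as assumptions rather than the experiments' actual framework code.

```python
import os
import subprocess

def run_payload_as_user(user_proxy: str, payload_cmd: list):
    """Run one user's payload under glexec (simplified pilot-job step).

    glexec consults the site authorization service (e.g. SCAS/LCAS or
    GUMS) to map the proxy onto a local account, then executes the
    payload under that identity.
    """
    env = os.environ.copy()
    # glexec conventionally reads the user's credentials from this
    # variable (assumption: exact variable names may differ by version).
    env["GLEXEC_CLIENT_CERT"] = user_proxy
    return subprocess.run(["glexec"] + payload_cmd, env=env, check=True)

def pilot_loop(fetch_next_job):
    """Hypothetical pilot main loop.

    fetch_next_job() stands in for the experiment framework (DIRAC,
    PanDA, AliEn, ...) and returns (user_proxy_path, command_list) or
    None when the queue is empty.
    """
    while True:
        job = fetch_next_job()
        if job is None:
            break
        user_proxy, command = job
        run_payload_as_user(user_proxy, command)
```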

Page 20: 16th February, 2009


Computing Resource Access (CE)
LCG-CE: in production at all EGEE sites
• Legacy service, introduced end of 2002
• Has been improved over the years to handle 50 users and 4K jobs
• This is good enough for production use, but might be problematic for analysis tasks
CREAM-CE: new architecture (see the submission sketch after this slide)
• Web Service interface, supports the BES standard
• Parameter passing to batch systems
• Scalability!
• First version was released to production 8 months ago: 13 instances in production + 13 in PPS, used by ALICE
• New version with many bug fixes in the final certification stage
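As an illustration of what submission through the CREAM-CE looks like from the user side, the sketch below writes a minimal JDL description and hands it to the glite-ce-job-submit client. The CE endpoint and queue name are placeholders, and the client options are recalled from typical gLite 3.1 usage, so treat them as assumptions.

```python
import subprocess
import tempfile

# Minimal JDL (Job Description Language) for a trivial payload.
JDL = """[
  Executable    = "/bin/hostname";
  StdOutput     = "out.txt";
  StdError      = "err.txt";
  OutputSandbox = {"out.txt", "err.txt"};
]"""

def submit_to_cream(ce_endpoint="cream.example.org:8443/cream-pbs-myqueue"):
    """Submit the JDL to a CREAM CE (placeholder endpoint and queue).

    -a delegates the proxy automatically, -r names the CE resource;
    these are the conventional gLite 3.1 option names (assumption).
    """
    with tempfile.NamedTemporaryFile("w", suffix=".jdl", delete=False) as f:
        f.write(JDL)
        jdl_path = f.name
    subprocess.run(
        ["glite-ce-job-submit", "-a", "-r", ce_endpoint, jdl_path],
        check=True,
    )

if __name__ == "__main__":
    submit_to_cream()
```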

Page 21: 16th February, 2009


Scientific Linux 5
The SL5 Worker Node pilot phase has come to an end
• Experiments encountered no major problems
• A new formal release is being prepared and will arrive in production soon
Other activities: multi-compiler support, support for multiple versions, improved rollback support
Long term: support for the new information system schema (GLUE-2), introduction of the first components of the new EGEE Authorization Framework (policy management system)

Page 22: 16th February, 2009


Issues and Outlook
EGEE-III ends early 2010
• The new environment for middleware support is under discussion
• Less CERN involvement in integration and release management
• Will the new entities be up and running in time?
gLite Consortium
• Discussions on a formal agreement are taking place
• Required to organize support for the gLite middleware
Unified Middleware Distribution is forming: ARC + gLite + UNICORE
• Move towards standards-based middleware
WLCG has a wider scope
• Maintaining interoperability might become more difficult

Page 23: 16th February, 2009


Summary
• All 3 middleware stacks provide stable production environments, are aware of scalability issues, and have addressed most of them
• All 3 stacks interoperate with each other and work on improving interoperability and interoperation
• OSG actively supports pilot jobs (glexec/GUMS); gLite will soon (glexec/SCAS)
• Middleware stacks still evolve: major changes have been successfully introduced into the production system without interrupting the service
• The transition from EGEE-III to EGI, UMD and the gLite consortium will be challenging