Ian Bird LCG Deployment Manager EGEE Operations Manager LCG - The Worldwide LHC Computing Grid Building a Service for LHC Data Analysis 22 September 2006

LCG - The Worldwide LHC Computing Grid

Building a Service for LHC Data Analysis

22 September 2006

October 7, 2005

The LHC Accelerator

The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments’ detectors

LHC DATA

This is reduced by online computers that filter out a few hundred “good” events per second, which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec, or ~15 PetaBytes per year for all four experiments.
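
As a rough cross-check of the figures above, the following sketch converts ~15 PetaBytes per year into an average recording rate. It assumes decimal units (1 PB = 10^15 bytes) and recording spread evenly over a 365-day year; both are simplifying assumptions for illustration, not figures from the talk.

```python
# Rough consistency check: ~15 PB/year expressed as an average rate in MB/s.
# Assumptions (not from the slides): decimal units, and data recorded
# evenly over a full 365-day year.
PB = 10**15
MB = 10**6
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Return the average recording rate in MB/s for a yearly data volume."""
    return petabytes_per_year * PB / SECONDS_PER_YEAR / MB


rate = average_rate_mb_per_s(15)
print(f"Average rate for 15 PB/year: {rate:.0f} MB/s")
```

Under these assumptions the average comes out a little under 500 MB/s, which sits inside the quoted 100-1,000 MegaBytes/sec range.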

The Worldwide LHC Computing Grid

Purpose

Develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments

Ensure the computing service … and common application libraries and tools

Phase I – 2002-05 - Development & planning

Phase II – 2006-2008 – Deployment & commissioning of the initial services

WLCG Collaboration

The Collaboration – still growing
– ~130 computing centres
– 12 large centres (Tier-0, Tier-1)
– 40-50 federations of smaller “Tier-2” centres
– 29 countries

Memorandum of Understanding
– Agreed in October 2005, now being signed

Purpose
– Focuses on the needs of the four LHC experiments
– Commits resources: each October for the coming year, with a 5-year forward look
– Agrees on standards and procedures

LCG Service Hierarchy

Tier-0 – the accelerator centre
– Data acquisition & initial processing
– Long-term data curation
– Distribution of data to Tier-1 centres

Tier-1 centres
– Canada – TRIUMF (Vancouver)
– France – IN2P3 (Lyon)
– Germany – Forschungszentrum Karlsruhe
– Italy – CNAF (Bologna)
– Netherlands – NIKHEF/SARA (Amsterdam)
– Nordic countries – distributed Tier-1
– Spain – PIC (Barcelona)
– Taiwan – Academia Sinica (Taipei)
– UK – CLRC (Oxford)
– US – FermiLab (Illinois), Brookhaven (NY)

Tier-1 – “online” to the data acquisition process, so high availability
– Managed Mass Storage – grid-enabled data service
– Data-heavy analysis
– National, regional support

Tier-2 – ~120 centres in ~29 countries
– Simulation
– End-user analysis – batch and interactive
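
The hierarchy above can be written down as a small data structure. This is only an illustrative sketch: the `Tier` class, the `HIERARCHY` list and the `tiers_with_role` helper are hypothetical names introduced here, and the role strings simply restate the slide.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """One level of the service hierarchy and the roles listed for it."""
    name: str
    roles: list[str]


# Roles restated from the slide; the structure itself is an assumption.
HIERARCHY = [
    Tier("Tier-0", [
        "data acquisition & initial processing",
        "long-term data curation",
        "distribution of data to Tier-1 centres",
    ]),
    Tier("Tier-1", [
        "managed mass storage (grid-enabled data service)",
        "data-heavy analysis",
        "national, regional support",
    ]),
    Tier("Tier-2", [
        "simulation",
        "end-user analysis (batch and interactive)",
    ]),
]


def tiers_with_role(keyword: str) -> list[str]:
    """Return the names of tiers having at least one role mentioning keyword."""
    return [t.name for t in HIERARCHY if any(keyword in r for r in t.roles)]


print(tiers_with_role("analysis"))
```

Looking up "analysis" this way shows that, on this slide, analysis work is assigned to Tier-1 and Tier-2 rather than to Tier-0.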

LHC EGEE Grid

High Energy Physics: a new computing infrastructure for science

1999 – Monarc Project
– Early discussions on how to organise distributed computing for LHC

2000 – growing interest in grid technology
– HEP community was the driver in launching the DataGrid project

2001-2004 – EU DataGrid project
– middleware & testbed for an operational grid

2002-2005 – LHC Computing Grid – LCG
– deploying the results of DataGrid to provide a production facility for LHC experiments

2004-2006 – EU EGEE project phase 1
– starts from the LCG grid
– shared production infrastructure
– expanding to other communities and sciences

LCG depends on two major science grid infrastructures
– EGEE – Enabling Grids for E-Science
– OSG – US Open Science Grid

Production Grids for LHC

EGEE Grid ~50K jobs/day

~14K simultaneous jobs during prolonged periods

[Figure: Jobs/day on the EGEE Grid, in K jobs/day (axis 0 to 60), by month from Jun-05 to Aug-06, broken down by alice, atlas, cms, lhcb, geant4, dteam and non-LHC]

[Figure: Running jobs for the whole Grid over the last month (lhcb, cms, atlas, alice), with the 14K level marked]
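
The two headline numbers for the EGEE Grid (~50K jobs/day and ~14K simultaneous jobs) can be related through Little's law, L = λW. The sketch below is a back-of-the-envelope estimate that assumes a steady state and treats both slide figures as exact; the function name is hypothetical and the result is not a measurement from the talk.

```python
def mean_job_hours(jobs_per_day: float, simultaneous_jobs: float) -> float:
    """Estimate mean job duration in hours via Little's law (W = L / lambda).

    Assumes a steady state: jobs complete at the same rate they are submitted.
    """
    jobs_per_hour = jobs_per_day / 24
    return simultaneous_jobs / jobs_per_hour


# Figures quoted on the slide for the EGEE Grid.
egee_hours = mean_job_hours(jobs_per_day=50_000, simultaneous_jobs=14_000)
print(f"Implied mean job duration: {egee_hours:.1f} hours")
```

Under those assumptions the implied average job runs for several hours, which is consistent with batch-style workloads rather than short interactive tasks.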

OSG Production for LHC

OSG: ~15K jobs/day. The 3 big users are ATLAS, CDF and CMS.

~3K simultaneous jobs; at the moment usage is quite spiky.

[Figures: OSG-CMS Data Distribution, past 3 months; OSG-ATLAS Running Jobs, past 3 months; Jobs/day on the OSG Grid]

Data Distribution

Pre-SC4 April tests, CERN to Tier-1s: the SC4 target of 1.6 GB/s was reached, but only for one day

But experiment-driven transfers (ATLAS and CMS) sustained 50% of the target under much more realistic conditions
– CMS transferred a steady 1 PByte/month between Tier-1s & Tier-2s during a 90-day period
– ATLAS distributed 1.25 PBytes from CERN during a 6-week period

[Figure: data distribution rate, with the 1.6 GBytes/sec and 0.8 GBytes/sec levels marked]
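
To put the volumes on this slide on the same scale as the 1.6 GB/s target, the sketch below converts them into average rates. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and a 30-day month; these are averages over the whole period and need not match the sustained rates shown in the plot.

```python
# Convert quoted transfer volumes into average rates in GB/s.
# Assumptions (not from the slides): decimal units and a 30-day month.
PB = 10**15
GB = 10**9
SECONDS_PER_DAY = 24 * 3600


def average_gb_per_s(petabytes: float, days: float) -> float:
    """Average transfer rate in GB/s for a volume moved over a number of days."""
    return petabytes * PB / (days * SECONDS_PER_DAY) / GB


TARGET_GB_PER_S = 1.6  # SC4 target quoted on the slide

cms_rate = average_gb_per_s(1.0, 30)        # 1 PByte/month
atlas_rate = average_gb_per_s(1.25, 6 * 7)  # 1.25 PBytes in 6 weeks

for name, rate in [("CMS", cms_rate), ("ATLAS", atlas_rate)]:
    share = rate / TARGET_GB_PER_S
    print(f"{name}: {rate:.2f} GB/s average ({share:.0%} of the 1.6 GB/s target)")
```

Note that the CMS figure covers Tier-1 to Tier-2 traffic, so it is not directly comparable with the CERN to Tier-1 target; it is included only to give a sense of scale.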

Interoperation between Grid Infrastructures

Good progress on EGEE-OSG interoperability
– Cross job submission – in use by CMS
– Integrating basic operation – series of workshops

Early technical studies on integration with Nordic countries and NAREGI in Japan

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688

Collaborating Infrastructures

Potential for linking ~80 countries by 2008

[Map labels: KnowARC, DEISA, TeraGrid]

Applications on EGEE

• More than 25 applications from an increasing number of domains
– Astrophysics
– Computational Chemistry
– Earth Sciences
– Financial Simulation
– Fusion
– Geophysics
– High Energy Physics
– Life Sciences
– Multimedia
– Material Sciences
– …..

Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-005.pdf

Example: EGEE Attacks Avian Flu

• EGEE used to analyse 300,000 potential drug compounds against the bird flu virus, H5N1.

• 2000 computers at 60 computer centres in Europe, Russia, Asia and the Middle East ran for four weeks in April, the equivalent of 100 years on a single computer.

• Potential drug compounds now being identified and ranked.
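
The "100 years on a single computer" figure can be compared with the raw capacity of 2000 computers over four weeks. The sketch below assumes four weeks means 28 days, a 365-day year, and that the 100 years refers to computing time actually consumed; the implied utilisation is an inference under those assumptions, not a number from the talk.

```python
# Compare the quoted "100 years on a single computer" with the total
# machine time available. Assumptions (not from the slides): 28-day run,
# 365-day year, every computer counted as one unit of capacity.
COMPUTERS = 2000
RUN_DAYS = 4 * 7
DAYS_PER_YEAR = 365
QUOTED_SINGLE_COMPUTER_YEARS = 100

available_years = COMPUTERS * RUN_DAYS / DAYS_PER_YEAR
implied_utilisation = QUOTED_SINGLE_COMPUTER_YEARS / available_years

print(f"Machine time available: {available_years:.0f} computer-years")
print(f"Implied average utilisation: {implied_utilisation:.0%}")
```

On these assumptions the quoted figure is of the same order as the available machine time, with roughly two thirds of it used.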

Neuraminidase, one of the two major surface proteins of influenza viruses, facilitates the release of virions from infected cells. Image courtesy of Ying-Ta Wu, Academia Sinica.

ITU

• International Telecommunication Union

– ITU/BR: Radio-communication Sector, responsible for management of the radio-frequency spectrum and satellite orbits for fixed, mobile, broadcasting and other communication services

• RRC-06 (15 May–16 June 2006)

– 120 countries negotiate the new frequency plan

– introduction of digital broadcasting in UHF (470-862 MHz) & VHF (174-230 MHz)

– Demanding computing problem with short deadlines

– Using the EGEE grid, they were able to complete a cycle in less than 1 hour

Grid management: structure

• Operations Coordination Centre (OCC)

– management, oversight of all operational and support activities

• Regional Operations Centres (ROC)

– providing the core of the support infrastructure, each supporting a number of resource centres within its region

– Grid manager on Duty (COD)

• Resource centres

– providing resources (computing, storage, network, etc.)

• Grid User Support (GGUS)

– At FZK, coordination and management of user support, single point of contact for users

Security & Policy

Collaborative policy development

– Many policy aspects are collaborative works; e.g.:

• Joint Security Policy Group

• Certification Authorities

– EUGridPMA IGTF, etc.

• Grid Acceptable Use Policy (AUP)

– common, general and simple AUP

– for all VO members using many Grid infrastructures: EGEE, OSG, SEE-GRID, DEISA, national Grids…

• Incident Handling and Response

– defines basic communications paths

– defines requirements (MUSTs) for IR

– not to replace or interfere with local response plans

[Diagram: Security & Availability Policy, with related documents: Usage Rules, Certification Authorities, Audit Requirements, Incident Response, User Registration & VO Management, Application Development & Network Admin Guide, VO Security]

Sustainability: Beyond EGEE-II

• Need to prepare for permanent Grid infrastructure

– Maintain Europe’s leading position in global science Grids

– Ensure a reliable and adaptive support for all sciences

– Independent of short project funding cycles

– Modelled on success of GÉANT

Infrastructure managed in collaboration with national grid initiatives

Conclusions

LCG will depend on
– ~130 computer centres
– two major science grid infrastructures – EGEE and OSG
– excellent global research networking

Grids are now operational
– >200 sites between EGEE and OSG
– Grid operations centres running for well over a year
– >40K jobs per day, 20K simultaneous jobs with the right load and job mix
– Demonstrated target data distribution rates from CERN to Tier-1s

EGEE is a large multi-disciplinary grid
– Although HEP is a driving force, must remain broader to ensure the long term

Planning for a long-term sustainable infrastructure now