les robertson - cern-it-1 last update: 15/03/22 10:43 LCG LHC Computing Grid Project: Creating a Global Virtual Computing Centre for Particle Physics. ACAT’2002, 27 June 2002. Les Robertson, IT Division, CERN, [email protected]

LHC Computing Grid Project



LCG LHC Computing Grid Project

Creating a Global Virtual Computing Centre for Particle Physics

ACAT’2002

27 June 2002

Les Robertson

IT Division, CERN

[email protected]

LCG Summary

LCG – The LHC Computing Grid Project
- requirements, funding, creating a Grid
- areas of work
    - grid technology
    - computing fabrics
    - deployment
    - operating a grid

Plan for the LCG Global Grid Service

A few remarks

LCG

Summary of Computing Capacity Required for all LHC Experiments in 2007

source: CERN/LHCC/2001-004 - Report of the LHC Computing Review - 20 February 2001
(ATLAS with 270Hz trigger)

                       ---------- CERN ----------   Regional    Grand
                       Tier 0    Tier 1     Total    Centres    Total
Processing (K SI95)     1,727       832     2,559      4,974    7,533
Disk (PB)                 1.2       1.2       2.4        8.7     11.1
Magnetic tape (PB)       16.3       1.2      17.6       20.3     37.9
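The totals in the capacity table can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the numbers are copied from the table, while the dictionary keys, function names and the rounding tolerance are assumptions made for this example.

```python
# Capacity figures copied from the table (2007 requirement, all LHC experiments).
# Keys, function names and the tolerance are illustrative assumptions.
capacity = {
    "Processing (K SI95)": {"tier0": 1727, "tier1": 832, "cern": 2559,
                            "regional": 4974, "grand": 7533},
    "Disk (PB)":           {"tier0": 1.2, "tier1": 1.2, "cern": 2.4,
                            "regional": 8.7, "grand": 11.1},
    "Magnetic tape (PB)":  {"tier0": 16.3, "tier1": 1.2, "cern": 17.6,
                            "regional": 20.3, "grand": 37.9},
}


def cern_share(row):
    """Fraction of the grand total that is located at CERN (Tier 0 + Tier 1)."""
    return row["cern"] / row["grand"]


def is_consistent(row, tol=0.11):
    """True if each total equals the sum of its parts, allowing for rounding
    of the published figures to one decimal place."""
    return (abs(row["tier0"] + row["tier1"] - row["cern"]) <= tol
            and abs(row["cern"] + row["regional"] - row["grand"]) <= tol)


for name, row in capacity.items():
    print(f"{name}: CERN share {cern_share(row):.0%}, "
          f"consistent={is_consistent(row)}")
```

With these figures the CERN share comes out at roughly a third of the processing, about a fifth of the disk and a little under half of the tape, which matches the point that most of the capacity has to sit in the regional centres.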

Funding dictates – Worldwide distributed computing system
- Small fraction of the analysis at CERN
- Batch analysis – using 12-20 large regional centres
    - how to use the resources efficiently
    - establishing and maintaining a uniform physics environment
- Data exchange and interactive analysis involving tens of smaller regional centres, universities, labs

LCG Summary - Project Goals

applications - tools, frameworks, environment, persistency

computing system → global grid service
- cluster → automated fabric
- collaborating computer centres → grid
- CERN-centric analysis → global analysis environment

Goal – Prepare and deploy the LHC computing environment

This is not another grid technology project – it is a grid deployment project

LCG Two Phases

The first phase of the project – 2002-2005
preparing the prototype computing environment, including
- support for applications – libraries, tools, frameworks, common developments, …
- global grid computing service

funded by Regional Centres, CERN, special contributions to CERN by member and observer states, middleware developments by national and regional Grid projects
- manpower OK
- hardware at CERN - ~40% funded

Phase 2 – construction and operation of the initial LHC Computing Service – 2005-2007
- at CERN – missing funding of ~80M CHF

LCG Funding

Funding agencies have little enthusiasm for investing more in particle physics

HEP seen as a ground-breaker in computing
- initiator of the Web
- track record of exploiting leading edge computing
- effective global collaborations
- real need – for data as well as computation
- one of the few application areas with real cross-border data needs

LHC in sync with
- emergence of Grid technology
- explosion of network bandwidth

We must deliver on Phase 1 for LHC - and show the relevance for other sciences

LCG Building a Grid

[Figure: Computing Centre Cluster – application servers, data cache, mass storage, WAN]

LCG Cluster Fabric

automated management: installation, configuration, maintenance, monitoring, error recovery, …
- reliability
- cost containment

autonomic computing
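One common way to automate installation, configuration and error recovery is to compare each node's observed state with a declared desired state and derive the corrective actions from the difference. The sketch below illustrates that idea only. It is not the fabric management software used by the project, and every name in it (`NodeState`, `plan_actions`, `reconcile`, the version and package strings) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeState:
    """Hypothetical description of one cluster node."""
    os_version: str
    packages: frozenset
    healthy: bool = True


def plan_actions(desired: NodeState, observed: Optional[NodeState]) -> list:
    """Return the corrective actions that bring one node to the desired state."""
    if observed is None:                      # node has never been installed
        return ["install"]
    actions = []
    if observed.os_version != desired.os_version:
        actions.append(f"upgrade os to {desired.os_version}")
    for pkg in sorted(desired.packages - observed.packages):
        actions.append(f"add package {pkg}")
    for pkg in sorted(observed.packages - desired.packages):
        actions.append(f"remove package {pkg}")
    if not observed.healthy:                  # simple error recovery step
        actions.append("restart services")
    return actions


def reconcile(desired: NodeState, cluster: dict) -> dict:
    """Map each node that needs attention to its list of actions."""
    plans = {node: plan_actions(desired, state) for node, state in cluster.items()}
    return {node: acts for node, acts in plans.items() if acts}


# Invented example data: one node in the desired state, one drifted, one new.
desired = NodeState("v2", frozenset({"batch-client", "monitor-agent"}))
cluster = {
    "node01": desired,
    "node02": NodeState("v1", frozenset({"monitor-agent", "legacy-tool"}), healthy=False),
    "node03": None,
}
print(reconcile(desired, cluster))
```

Nodes already in the desired state produce no actions, so the same loop can run repeatedly over thousands of nodes without operator intervention, which is where the reliability and cost containment come from.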

LCG The MONARC Multi-Tier Model (1999)

MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html

[Figure: tier hierarchy, top to bottom]
CERN: Tier 0 - recording, reconstruction
Tier 1 – full service: FNAL, RAL, IN2P3
Tier 2: Lab a, Uni b, Lab c, Uni n
Department
Desktop

[email protected]
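The tier hierarchy in the figure can be written down as a small data structure. This is a minimal sketch: the level names, roles and example sites are those shown on the slide, while the list layout and the helper functions `level_of` and `upstream` are assumptions made for illustration. The slide does not say which Tier 2 site attaches to which Tier 1 centre, so no such links are modelled.

```python
# Levels as drawn on the slide, top to bottom: (name, role, example sites).
# The structure and helper names are illustrative assumptions.
MONARC_LEVELS = [
    ("Tier 0", "recording, reconstruction", ["CERN"]),
    ("Tier 1", "full service", ["FNAL", "RAL", "IN2P3"]),
    ("Tier 2", "", ["Lab a", "Uni b", "Lab c", "Uni n"]),
    ("Department", "", []),
    ("Desktop", "", []),
]


def level_of(site):
    """Return the index of the level a site belongs to, or None if unknown."""
    for depth, (_, _, sites) in enumerate(MONARC_LEVELS):
        if site in sites:
            return depth
    return None


def upstream(site):
    """Names of the levels above a site, nearest first."""
    depth = level_of(site)
    if depth is None:
        raise KeyError(site)
    return [name for name, _, _ in reversed(MONARC_LEVELS[:depth])]


print(upstream("Uni b"))
```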

LCG Building a Grid

Collaborating Computer Centres

Grid

The virtual LHC Computing Centre

Alice VO

CMS VO

LCG Virtual Computing Centre

The user
- sees the image of a single cluster
- does not need to know
    - where the data is
    - where the processing capacity is
    - how things are interconnected
    - the details of the different hardware
- and is not concerned by the conflicting policies of the equipment owners and managers
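"Does not need to know where the data is" usually means that user code refers to data by a logical name and a catalogue maps that name to a physical copy. The sketch below shows the idea under stated assumptions: the catalogue contents, the `lfn:` naming, the host names and the `resolve` function are all hypothetical and do not describe any particular grid middleware.

```python
from urllib.parse import urlparse

# Hypothetical replica catalogue: logical file name -> physical copies.
# Host names and paths are invented for this example.
CATALOGUE = {
    "lfn:/lhc/example/events.aod": [
        "gsiftp://site-a.example.org/data/events.aod",
        "gsiftp://site-b.example.org/store/events.aod",
    ],
}


def resolve(lfn, catalogue=CATALOGUE, prefer_host=None):
    """Map a logical name to one physical replica.

    The caller only names the data; the optional prefer_host stands in for
    whatever locality information the system itself would supply.
    """
    replicas = catalogue.get(lfn)
    if not replicas:
        raise FileNotFoundError(lfn)
    if prefer_host is not None:
        for url in replicas:
            if urlparse(url).hostname == prefer_host:
                return url
    return replicas[0]


print(resolve("lfn:/lhc/example/events.aod"))
```

The user's analysis code only ever contains the logical name, so replicas can be added, moved or removed without changing it.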

LCG Project Implementation Organisation

Four areas

Applications (see Matthias Kasemann’s presentation)

Grid Technology

Fabrics

Grid deployment

LCG Grid Technology Area: Leveraging Grid R&D Projects

US projects

European projects

Many national, regional Grid projects -- GridPP (UK), INFN-grid (I), NorduGrid, Dutch Grid, …

• significant R&D funding for Grid middleware

• risk of divergence, and is that good or bad?

• global grids need standards

• useful grids need stability

• hard to do this in the current state of maturity

• will we recognise and be willing to migrate to the winning solutions?


LCG Grid Technology Area

Ensuring that the appropriate middleware is available

Supplied and maintained by the “Grid projects”

It is proving hard to get the first “production” data intensive grids going as user services

Can the grid projects provide long-term support and maintenance?

Trade-off between new functionality and stability

LCG The Trans-Atlantic Issue

Bridging the ATLANTIC is essential for the project

HICB – High Energy and Nuclear Physics Intergrid Collaboration Board

GLUE – Grid Laboratory Uniform Environment
- compatible middleware and infrastructure
- funded by DataTAG and iVDGL
- Certificates - OK
- Schemas – under way, working with the wider Globus world, getting complicated – probably OK
- Middleware components – not yet clear – but close collaboration on
    - File replication
    - Job scheduling

LCG Collaboration with Grid Projects

LCG must deploy a GLOBAL GRID
- essential to have compatible middleware & grid infrastructure
- better – have identical middleware

We are banking on GLUE

But we have to make some choices towards the end of the year

Services are about stability, support, maintenance

Can the R&D grid projects commit to long-term maintenance of their middleware?


LCG Scope of Fabric Area

Tier 1,2 centre collaboration

Grid-Fabric integration middleware (DataGrid WP4)

Automated systems management package

Technology assessment (PASTA III) started

CERN Tier 0+1 centre


LCG Grid Deployment Area

The aim is to build a general computing service for a very large user population of independently-minded scientists using a large number of independently managed sites

This is NOT a collection of sites providing pre-defined services

- it is the user’s job that defines the service
- it is current research interests that define the workload
- it is the workload that defines the data distribution

DEMAND - Unpredictable & Chaotic

But the SERVICE had better be Available & Reliable

LCG Grid Deployment – current status

Experiments can do (and are doing) their event production using distributed resources with a variety of solutions
- classic distributed production – send jobs to specific sites, simple bookkeeping
- some use of Globus, and some of the HEP Grid tools
- other integrated solutions (ALIEN)

The hard problem for distributed computing is data analysis – ESD and AOD
- chaotic workload
- unpredictable data access patterns

this is where new Grid technology is needed
- resource broker, replica management, ..

this is the problem that the LCG has to solve
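To make the role of a resource broker concrete, here is a toy version of the decision it has to take for each analysis job: given where replicas of the requested dataset are and how much capacity each site has free, pick a site. This is a sketch under simplifying assumptions, not a description of any existing broker. The function name, the ranking rule and all the numbers are invented, and only the site names are borrowed from the earlier slides.

```python
def choose_site(dataset, replicas, free_slots):
    """Pick a site for one job.

    Sites that already hold a replica of the dataset are preferred; among
    equals, the site with the most free job slots wins. Returns None when no
    site has free capacity.
    """
    holders = replicas.get(dataset, set())

    def rank(site):
        # Tuples compare element by element: data locality first, then capacity.
        return (site in holders, free_slots[site])

    candidates = [site for site, slots in free_slots.items() if slots > 0]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Invented example: replica locations and free slots per site.
replicas = {"aod-sample": {"RAL", "IN2P3"}}
free_slots = {"CERN": 50, "RAL": 5, "IN2P3": 20, "FNAL": 0}

print(choose_site("aod-sample", replicas, free_slots))
print(choose_site("esd-sample", replicas, free_slots))
```

In the first call the job goes to a site that already holds the data even though CERN has more free slots. In the second no replica exists anywhere, so the choice falls back to free capacity alone and a replica would have to be created, which is the part that replica management has to handle when access patterns are unpredictable.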

LCG Grid Operation

[Figure: grid operation, with the following components]
User
Call Centre
Grid Operations Centre
Network Operations Centre
Local site: local operation, local user support
Grid operations
Grid information service
Virtual Organisation
Grid logging & bookkeeping
queries, monitoring & alarms, corrective actions

LCG Grid Operation

We do not know how to do this

Probably nobody knows
- looks like network operation, but there are many more variables to be watched and adjusted
- looks like multi-national commercial systems, but we have no central ownership, control

A 24 hour service is needed – round the clock and round the world

LCG Setting up the LHC Global Grid Service

First data is in 2007

LCG must learn from current solutions, leverage the tools coming from the grid projects, show that grids are useful, but set realistic targets

short term (this year):
- use current solutions for physics data challenges (event productions)
- consolidate (stabilise, maintain) middleware
- learn what a “production grid” really means by working with DataGrid and VDT

medium term (next year):
- Set up a reliable global grid service – initially only a few larger centres, but on three continents
- Stabilise it
- Several times the capacity of the CERN facility and as easy to use

LCG

Having stabilised this base service – showing that we can run a solid service for the experiments

then – progressive evolution –
- integrate all of the Regional Centre resources provided for LHC
- improve quality, reliability, predictability
- integrate new middleware functionality – possibly once per year
- migrate to de facto standards as soon as they emerge


LCG Final comments

It is not just about distributing computation, it is also about managing distributed data (lots of it!) and maintaining a single view of the environment

All these parallel developments, rapidly changing technology .. may be good in the long term, but we must deploy a global grid service next year

A dependable, reliable 24 X 7 service is essential and not so easy to do with all these sites and all that data

Service Quality is the Key to Acceptance of Grids

Reliable OPERATION will be the factor that limits the size of practical Grids

We are getting funding because of the relevance for other sciences, engineering, business -- keeping things general, main-line must remain a high priority