Project Status Report Ian Bird Computing Resource Review Board 20th April, 2010 CERN-RRB-2010-033

Page 1

Project Status Report

Ian Bird

Computing Resource Review Board

20th April, 2010

CERN-RRB-2010-033

Page 2

Project status report
• Overall status – experience with data
• Planning and milestones
• Status of planning for new Tier 0
• Brief summary of EGEE → EGI transition
• Resource planning for 2010, 2011, 2012

Page 3

Sergio Bertolucci, CERN 3

... And now at 7 TeV

Page 4

Today WLCG is:
• Running increasingly high workloads:
– Jobs in excess of 650k / day; anticipate millions / day soon
– CPU equivalent ~100k cores
• Workloads are:
– Real data processing
– Simulations
– Analysis – more and more (new) users
• Data transfers at unprecedented rates (see next slide)

Figure: e.g. CMS: number of users doing analysis

Page 5

Data transfers
Figure annotations: final readiness test (STEP’09); preparation for LHC startup; LHC physics data; nearly 1 petabyte/week
2009: STEP’09 + preparation for data
Real data – from 30/3
Castor traffic last week:
• > 4 GB/s input
• > 13 GB/s served

Page 6

WLCG uses EGEE & OSG

85k CPU-days/day

30k CPU-days/day

30k CPU-days/day

Page 7

Readiness of the computing
• Has meant very rapid data distribution and analysis
– Data is processed and available at Tier 2s within hours!

Figure panels: CMS, ATLAS, LHCb

Page 8

More and more users
• >200 users
• ~500 jobs on average over 3 months

ATLAS: number of distinct users accessing various data types
• Many hundreds of users accessed grid data

CMS

Page 9

And physics output ...

Page 10

Fibre cut during STEP’09: redundancy meant no interruption

Page 11

Reliabilities
• This is not the full picture:
• Experiment-specific measures give complementary view
• Need to be used together with some understanding of underlying issues

Page 12

Site availability seen by experiments
• Site readiness as seen by the experiments
– Left-hand side: week before data taking; right-hand side: 1st week of data

Page 13

WLCG timeline 2010-2012
Figure: monthly timeline from January 2010 to February 2012. Run schedule in each of 2010 and 2011: SU, pp running, HI. Markers: 2010 capacity commissioned; 2011 capacity commissioned; EGEE-III ends; EGI & NGIs. Project bars: EGI, HEP – SSC, EMI, (SA3).

Page 14

Now full report each month

Glexec + SCAS services available; deployment discussion / policy ongoing

Not all sites yet publishing; information validation in progress

Page 16

Future milestones
• Actually very few formal milestones now
– Moved from set up to regular operations
• Not all problems solved – and more will certainly arise
– These can be subject to specific milestones
• However, in general we must move from tracking milestones to tracking metrics for
– Performance
– Reliability
– Scalability
• Today we have some – but we need to propose a set of useful metrics that we track
– Accounting, reliability/availability, throughputs, are published on-line
– Operational metrics reviewed weekly
– A lot of information in different places (SLS, dashboards, etc.)

Page 17

STATUS OF PLANS FOR TIER 0

Page 18

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
Frédéric Hemmer

Revised Tier 0 strategy
• The power situation has evolved
– Aggressive replacement of old equipment
– Technology evolution
– Refined estimates of needs in next few years
– 400 kW additional power made available (2.5 → 2.9 MW)
– But situation for backed-up (Diesel) power is more critical – close to the limit and lack of redundancy
• Revised strategy
– Hosting agreement for 100 kW of backed-up power in Geneva area
– Consolidate existing CC critical power situation
– Investigate container solution for incremental capacity addition
– Investigate (far) remote hosting possibilities

Page 19

Tier-0 Power needs estimates

NB: Real limit is closer to 2.7 MW than 2.9 assumed so far

Page 20

March 2010 situation
• Additional 400 kW in building 513
– The power capacity has been made available
• Critical power consolidation in 513
– Various solutions are being studied
  • Requiring additional UPS & cooling capacity
  • Should provide ~600 kW of backed-up power; hopefully as an addition to the 2.9 MW
– Will not be available before mid-2011
• External hosting of 100 kW in Geneva
– Hosting company identified & contract being signed
  • Target implementation: summer 2010
– Will allow for initial experience of remote operations
• Containers
– Initial technology assessment done & market survey launched
– Location: Prévessin close to building 931
  • Will require civil engineering to host electrical power distribution
  • Cannot be available before end 2011
• (Far) remote hosting proposals
– No concrete financial proposals yet from Norway
  • Although technical pre-proposal fairly clear
– Likelihood that a similar offer will come from Finland
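The power figures quoted on the Tier 0 slides can be tallied in a few lines. This is only an illustrative sketch: it assumes that the ~600 kW from the critical power consolidation really does come on top of the 2.9 MW (the slide says only "hopefully") and that externally hosted power is counted in the same total. The container and remote-hosting options are left out because no capacity is quoted for them. The names `BASELINE_KW`, `ADDITIONS_KW` and `cumulative_capacity` are made up for this example.

```python
# Power figures in kW, taken from the Tier 0 slides.
# Assumption: all additions are cumulative (the slides only say "hopefully"
# for the 600 kW), so this is an optimistic tally, not a plan.
BASELINE_KW = 2500  # capacity before the additional power (2.5 MW)

ADDITIONS_KW = [
    ("Additional power in building 513 (made available)", 400),
    ("External hosting in Geneva (target: summer 2010)", 100),
    ("Critical power consolidation in 513 (not before mid-2011)", 600),
]


def cumulative_capacity(baseline_kw, additions):
    """Return a list of (label, running total in kW) after each addition."""
    total = baseline_kw
    steps = []
    for label, kw in additions:
        total += kw
        steps.append((label, total))
    return steps


steps = cumulative_capacity(BASELINE_KW, ADDITIONS_KW)
for label, total_kw in steps:
    print(f"{total_kw / 1000:.1f} MW after: {label}")
```

The first step reproduces the 2.5 → 2.9 MW change quoted on the slides; the later totals hold only under the assumptions stated above.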

Page 21

Summary
• Current estimates predict that the Computer Centre will now run out of power ~ 2013
– Within the current requirements of the experiments
– Within the limits of the technology evolution
• IT has started to prepare several stop gap solutions to be able to cope with changing conditions as well as alternative options
– But costs are significant
• Decisions for the medium term should be taken in 2010 in light of experience of data taking and once alternative options can be evaluated

Page 22

EGI: Status of project submissions
There were 3 different (sub-) calls
1) EGI itself (project named EGI-Inspire); includes an activity (SA3) specifically focussed on support for existing large communities
• This project was invited to a hearing; likely to receive requested funding
2) Middleware (project named EMI); includes support for all gLite software required by WLCG (FTS, LFC, dCache, etc.)
• This project was invited to a hearing; asked to make a 900k€ cut
3) Virtual Research Communities (ex-SSC); there were several EGEE-derived proposals, including one (ROSCOE) that contained a VRC for HEP
• These will NOT be funded.

• Project funding expected to start only in June (may be back dated to May)

Page 23

EGEE-EGI: Risk for WLCG?
• This situation does not represent a major risk for WLCG
– EGEE → EGI transition is well planned by EGEE, and is well advanced
– Countries representing the majority of the resources have NGIs and the Tier 1s are well placed
– Important operational tools (GGUS, monitoring, etc.) are assured even if project funding does not appear
  • WLCG operational procedures are well tested and are mostly independent of the existence of EGEE or EGI
  • SA3 activity contains Dashboards, Ganga, & specific tasks for each experiment (~2 FTE each); VRC had integration/analysis support
– EMI contains essential middleware support and “harmonisation” gLite/ARC/Unicore (long term development was not included)
• No funding for HEP VRC means that work with other application communities will significantly reduce at CERN
• Should now consider strategy for longer term of middleware

Page 24

Status of non-European states
• Concern expressed at last RRB over status in EGI of some non-EC states
• The situation has evolved:
– EGI.eu: introduced Associate member status
– EGI-Inspire project: full partners

Page 25

RESOURCE PLANNING

Baseline assumptions used by all experiments for requirements analysis

Page 26

Present understanding of schedule for both 2010 and 2011
• 2010 + 2011
– Running from mid-Feb – end Nov
– Pb-Pb in November
– In principle stop after 1 fb⁻¹; plan to run 2 years
  • (0.2 in 2010, remainder in 2011)
• 2012: shutdown of accelerator (but not computing)

Page 27

Assumptions and guidance: 2010, 11, 12
Assumptions:
• The agreed RRB year is April - March (i.e. resources for a given year available by April)
– In 2010 exceptionally delayed this until June 1st (based on the schedules understood at that time)
– Of course some Tier 1s have already installed some fraction of their 2010 pledges.
• Also agreed that in 2011 revert to the April installation deadline.
• 2010 pledges or the installation schedules cannot be changed:
– nominal 2009 resources must satisfy the needs until end of May;
– 2010 resources should cover the time from June to March 2011,
– and the 2011 resources from April 2011 onwards.

Live time:
• 30 days/month = 720 hours
• folding in efficiencies 720 x 0.7 x 0.4 = ~200 effective hours/month
1) Availability of machine for physics = 0.7
• The rest is technical stop + recovery from technical stop + dedicated MD
2) Efficiency for physics = time with colliding beams/time that machine is available = 0.4
• The rest is turnaround time + faults + access
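The live-time estimate above is a simple product of three factors, and can be reproduced as a minimal sketch. The two efficiency factors and the 30-day month are the values quoted on the slide; the function and constant names are illustrative only.

```python
# Live-time estimate, using only the factors quoted on the slide.
HOURS_PER_MONTH = 30 * 24      # 30 days/month = 720 hours
MACHINE_AVAILABILITY = 0.7     # fraction of time the machine is available for physics
PHYSICS_EFFICIENCY = 0.4       # time with colliding beams / time machine is available


def effective_hours_per_month(hours=HOURS_PER_MONTH,
                              availability=MACHINE_AVAILABILITY,
                              efficiency=PHYSICS_EFFICIENCY):
    """Return the expected hours of colliding beams in one month."""
    return hours * availability * efficiency


live_hours = effective_hours_per_month()
print(f"{live_hours:.1f} effective hours/month")
```

The product is 201.6 hours, which the slide rounds to ~200 effective hours/month. Keeping the factors as parameters makes it easy to see how sensitive the estimate is to either efficiency.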

Page 28

Summary of requirements

Totals      2010     2010 pledge   2011     2012
CERN CPU    233.4    233.4         263.3    219.7
CERN disk   14.79    14.8          19.7     22.8
CERN tape   31.7     31.7          48.8     49.7
T1 CPU      394.1    412           543.5    584
T1 disk     49.39    44.5          66.3     68.9
T1 tape     56.2     51.4          111.07   131.72
T2 CPU      562.6    511.1         730.2    787
T2 disk     46.62    39.6          75.42    78.42

Old 2010 request + 2010 pledges are as presented at the Autumn 2009 RRB
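As a rough illustration of how the 2010 column compares with the 2010 pledges, the sketch below computes the difference per resource. The numbers are copied from the summary-of-requirements table; the units are not stated on the slide and are left as-is. The helper name `pledge_balance` is hypothetical, and reading the difference as a "shortfall" is an interpretation for this example, not something the slide asserts.

```python
# (2010 requirement, 2010 pledge) per resource, copied from the table.
# Units are whatever the table uses; the slide does not state them.
REQUIREMENTS_2010 = {
    "CERN CPU": (233.4, 233.4),
    "CERN disk": (14.79, 14.8),
    "CERN tape": (31.7, 31.7),
    "T1 CPU": (394.1, 412.0),
    "T1 disk": (49.39, 44.5),
    "T1 tape": (56.2, 51.4),
    "T2 CPU": (562.6, 511.1),
    "T2 disk": (46.62, 39.6),
}


def pledge_balance(required, pledged):
    """Return (absolute, relative) difference; negative means pledge < requirement."""
    diff = pledged - required
    return diff, diff / required


balances = {
    name: pledge_balance(required, pledged)
    for name, (required, pledged) in REQUIREMENTS_2010.items()
}

for name, (diff, rel) in balances.items():
    print(f"{name:10s} {diff:+8.2f} ({rel:+.1%})")
```

Under this reading, the CERN rows are essentially balanced, T1 CPU is pledged above the requirement, and the remaining T1 and T2 rows are pledged below it.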

Page 29

Concerns
• Budget cut in France: -40%
– Notified after the last RRB
– Proposed impact for 2010 somewhat less with planning and management
– Risk for 2011?
• Concerns over some Tier 1s
– Recent experience is good, hope this is sustainable in the long term
• Level of effort available in EMI for middleware support
– Including release process etc.
– May be at the limit
• Data access for analysis
– Early discussions on how to address this – 2 year timescale

Page 30

Summary
• First experience with data has been positive from the WLCG point of view
– Thanks to the huge efforts invested in recent years in testing
– All Tier 0, Tier 1 and Tier 2 staff must take the credit for this
• Resource planning for coming years is a concern
• Still to see what effect many more non-expert users will have
• Transition from EGEE to EGI is now
– It is (hopefully!) not a major risk for WLCG
• Must start to address long term sustainability of the system we have