RAL Tier1: 2001 to 2011 James Thorne GridPP 19 30th August 2007

RAL Tier1: 2001 to 2011

James Thorne

GridPP 19

30th August 2007

2001 to 2007

• Sorry GridPP, I’m afraid I can’t do that!

Result of GridPP3 for Tier1

• Good result:
– Effort increases from 16.5 to 20.4 FTE
– £6.8M hardware budget (cf. £2.3M in GridPP2)

• Extra fault management/hardware staff as size of farm increases

• A good result but team remains thinly stretched; hardware is just sufficient to meet experiments’ requirements.
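
To put the GridPP3 figures above in proportion, here is a minimal sketch that works out the relative changes from the numbers quoted on the slide. The variable names are illustrative only and nothing here goes beyond simple arithmetic on those four figures. It gives an effort increase of roughly 24% and a hardware budget about three times that of GridPP2.

```python
# Figures quoted on the slide; variable names are illustrative only.
fte_before, fte_after = 16.5, 20.4        # Tier1 effort in FTE
hw_gridpp2_m, hw_gridpp3_m = 2.3, 6.8     # hardware budget in millions of pounds

# Relative increase in staff effort, as a percentage.
fte_increase_pct = (fte_after - fte_before) / fte_before * 100

# Ratio of the GridPP3 hardware budget to the GridPP2 one.
hw_budget_ratio = hw_gridpp3_m / hw_gridpp2_m

print(f"Effort increase: {fte_increase_pct:.1f}%")
print(f"Hardware budget ratio: {hw_budget_ratio:.2f}x")
```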

Planned Tier1 Storage Capacity (TiB)

[Figure: planned storage capacity in TiB (axis 0 to 20000) at April of each year, 2008 to 2011; series: Tape, Disk.]

Planned Tier1 CPU Capacity (KSI2K)

[Figure: planned CPU capacity in KSI2K (axis 0 to 16000) at April of each year, 2008 to 2011.]

Estimated Rack Count

[Figure: estimated number of racks (axis 0 to 120) for 2006 to 2011; series: Disk, CPU.]

Estimated number of Disk Servers

[Figure: estimated number of disk servers (axis 0 to 500) for 2006 to 2011.]

Estimated number of Spinning Drives

[Figure: estimated number of spinning drives (axis 0 to 12000) for 2006 to 2011.]

Approximate Hardware Value Allocated to Experiments in 2008

• Alice: 4%
• Atlas: 53%
• Babar: 3%
• CMS: 31%
• LHCb: 8%
• Other: 1%

Hardware

• CPU
• Disk
• Tape
• Further procurements in FY08, FY09 and FY10

New Machine Room

• Order placed and contractor has started work
• 800 m² can accommodate 300 racks + 5 robots
• 2.3 MW power/cooling capacity (some UPS)
• Office accommodation for all E-Science staff
• Scheduled to be available for September 2008

Staffing

• Lex Holt left Tier1
• James Adams is moving from hardware support to Fabric Team system admin
• Plan to recruit:
– Replacement hardware repair position
– Two experiment support posts: one ATLAS, one CMS
– Raja Nandakumar as honorary team member from LHCb
– Will also shortly commence GridPP3 recruitment

CASTOR

• Operational issues mentioned at GridPP 18 were the tip of the iceberg, and the CASTOR 2.1.2 service was found to be inoperable.

• Massive amount of re-engineering carried out since March with much effort from CASTOR team.
– Huge progress
– Areas of concern

• We are optimistic that CASTOR will be a success

SL4

• 20% of batch farm now running SL4
• Negotiating with LHC experiments to agree the move of their capacity from SL3 to SL4.
• Once LHC migration is completed, remaining capacity will follow within a few weeks.
• Depends on the experiments, but should expect termination of SL3 service in September

Reliability

• March: invested a lot of effort without much gain

• Continue to prioritise reliability, and are making progress

• Recently exceeded target, now must maintain

• Start “Sysadmin On Duty” in September
• Start on call later this year

RAL-LCG2 Availability/Reliability

[Figure: RAL-LCG2 availability and reliability (axis 0% to 120%); series: Available, Old Reliability, New Reliability, Target, Average, Best 8.]

CPU Efficiencies

• CPU efficiency much improved
• August fall still being investigated
• March minimum when CASTOR was broken

Termination of GridPP use of ADS Service

• GridPP funding and use of the legacy Atlas Datastore (ADS) service is scheduled to end at the end of March 2008.

• RAL will continue to operate ADS service and experiments are free to purchase capacity directly from ADS Team.

dCache Closure

• dCache still supported and working
• We will give 6 months notice before terminating dCache service
• No notice of termination yet
• Aiming to end service by end of GridPP2 (March 2008). Also cannot terminate ADS service until dCache ceases.

Grid Only

• Move to Grid only access postponed until December 2007

• No new local accounts
• In January 2008:
– Batch job submission through RB/CE only (no qsub, some exceptions)
– No local login to UIs (some exceptions)
– AFS Service will end

Conclusions

• Positioning ourselves for LHC production.
• A lot of good progress with CASTOR and expect to meet the needs of the ATLAS M4 run and CMS’s CSA07.

• Reliability has finally improved.