ATLAS: Heavier than Heaven? Roger Jones Lancaster University GridPP19 Ambleside 28 August 2007

RWL Jones 29 Aug. 2007 Ambleside

Overview
• Commissioning Plans
  • Cosmics running: M3-M6
  • Dummy Data:
  • T0/T1
  • Full Dress Rehearsals
• Resources in the UK
• Data Distribution
  • CPU, Disk, Mass Storage, Policies
• Operational Issues and Hot Topics

Commissioning Plans
• We are now getting real data, at realistic rates
  • M3 (mid-July)
    • Cosmics produced about 100 TB in 2 weeks
    • Also surprised the offline by running at 4 times the nominal rate (32 samples in the LAr calorimeter)
  • M4 now underway: August 23 to early September
    • Expected total data volume: RAW = 66 TB, ESD + AOD = 6 TB
    • 20 TB RAW data and 6 TB ESD at RAL
    • 2 TB ESD at 5 Tier 2 sites
    • Data distribution as for real data
    • Currently writing at 200 MB/s, half nominal
    • RAW and ESD now appearing at RAL
  • M5 will be similar
  • M6 will run from December until real data
    • Will run close to nominal rate
    • Expect ~420 TB by start of run, plus Monte Carlo
    • T1 should treat this as valuable data, but it may only live for about a year
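
The volumes and rates quoted for the commissioning runs can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and continuous writing, and the function names are hypothetical rather than part of any ATLAS tool.

```python
SECONDS_PER_DAY = 86_400


def average_rate_mb_per_s(volume_tb: float, days: float) -> float:
    """Average rate in MB/s needed to write volume_tb in the given number of days."""
    return volume_tb * 1e12 / (days * SECONDS_PER_DAY) / 1e6


def days_to_write(volume_tb: float, rate_mb_per_s: float) -> float:
    """Days of continuous writing needed for volume_tb at rate_mb_per_s."""
    return volume_tb * 1e12 / (rate_mb_per_s * 1e6) / SECONDS_PER_DAY


# M3: about 100 TB of cosmics in 2 weeks.
m3_rate = average_rate_mb_per_s(100, 14)

# M4: writing at 200 MB/s, quoted as half nominal, so nominal is twice that.
nominal_rate = 2 * 200

# Time to write the expected 66 TB of M4 RAW at the current 200 MB/s.
m4_raw_days = days_to_write(66, 200)

print(f"M3 average rate: {m3_rate:.1f} MB/s")
print(f"Implied nominal rate: {nominal_rate} MB/s")
print(f"M4 RAW at 200 MB/s: {m4_raw_days:.1f} days of continuous writing")
```

Under these assumptions the M3 average is roughly 83 MB/s, the implied nominal rate is 400 MB/s, and 66 TB of RAW corresponds to a little under 4 days of continuous writing at 200 MB/s.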

Full Dress Rehearsal
• First in October
  • T0 running as per real data
  • Data movement to Tier 1s
  • Shipping onward to Tier 2s
  • Some Tier 1 RAW data reprocessing and shipping of ESD, AOD to other Tier 1s
  • This is an important step; it is a large part of operations
  • It also prepares the group analysis activity
  • Calibration processes
  • These things may not all be in parallel
• Main FDR in February
  • Running at nominal rate
  • More processes in parallel

Resource Plan

New T1 Evolution
                     2007          2008          2009          2010          2011          2012
Total Disk (TB)      2090.24       10725.33659   20100.24616   39528.81444   56231.83987   72934.8653
Total Tape (TB)      1246.026667   8067.068427   15498.64876   29423.0892    45830.74975   64721.63041
Total CPU (kSI2k)    3173          18124.42353   28426.02353   49576.22353   70726.42353   91876.62353

New T2 Evolution
                     2007          2008          2009          2010          2011          2012
Disk (TB)            1339.93551    8602.406837   13889.17849   22909.44708   31868.59425   40828.06284
CPU (kSI2k)          2336          17506.89811   26972.75589   51557.13737   69140.91886   86724.70034

New CAF Evolution
                     2007          2008          2009          2010          2011          2012
Total Disk (TB)      218.9205102   1145.579516   1780.331401   3127.598482   4811.362354   6311.016939
Total Tape (TB)      43.90035714   370.9648871   645.3581497   1043.271812   1374.785475   1706.299137
Total CPU (kSI2k)    821.4117647   2081.470067   2562.343154   4663.81529    6598.454092   8533.092895

New T0 Evolution
                     2007          2008          2009          2010          2011          2012
Total Disk (TB)      62.85714286   152.4621429   265.0335714   472.3528571   472.3528571   472.3528571
Total Tape (TB)      400           2449.077      5562.274      11893.871     18225.468     24557.065
Total CPU (kSI2k)    1910.411765   3705.823529   4058.823529   6105.823529   6105.823529   6105.823529
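
The shape of the resource ramp is easier to read as year-on-year changes. As a minimal sketch, the snippet below takes the Tier 1 values from the "New T1 Evolution" chart data and prints the yearly increment and growth factor for each resource; the helper names are illustrative, not taken from the talk.

```python
YEARS = [2007, 2008, 2009, 2010, 2011, 2012]

# Tier 1 values as listed in the "New T1 Evolution" chart data.
T1 = {
    "Total Disk (TB)": [2090.24, 10725.33659, 20100.24616,
                        39528.81444, 56231.83987, 72934.8653],
    "Total Tape (TB)": [1246.026667, 8067.068427, 15498.64876,
                        29423.0892, 45830.74975, 64721.63041],
    "Total CPU (kSI2k)": [3173, 18124.42353, 28426.02353,
                          49576.22353, 70726.42353, 91876.62353],
}


def yearly_increments(values):
    """Absolute change from each year to the next."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def growth_factors(values):
    """Ratio of each year's value to the previous year's."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


for name, values in T1.items():
    print(name)
    rows = zip(YEARS[1:], yearly_increments(values), growth_factors(values))
    for year, increment, factor in rows:
        print(f"  {year}: +{increment:,.0f} (x{factor:.2f} on previous year)")
```

With these numbers the Tier 1 disk grows by a factor of about 5.1 from 2007 to 2008, and the disk and CPU increments for 2011 and 2012 are equal, which suggests a fixed amount is added per year at the end of the plan.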

Operations
• We have a weekly UK Tier 1 operations meeting
  • The list of attendees is growing
  • For a while, we may add Tier 2 attendance as required
  • Time will tell if this requires two meetings
• We have more effort on the ATLAS side for operations
  • The 0.5 FTE at RAL from GridPP will be vital
  • We are still seeking extra effort from ATLAS, t.b.c.
• ATLAS is engaged in a series of reviews with Tier 1 sites
  • RAL was ‘done’ in July
  • Constructive discussion, problems faced openly
  • Very useful ‘live’ document of data classes/disk servers/tape servers/endpoints continuing to evolve

ATLAS T0-T1 Exports, Mon/Tuesday 28/29 May 2007
[Plot: T0-T1 export rates, in Mbytes/second]
• Despite RAL problems, ATLAS was having some success
• RAL is now in, and showing good rates

Data Storage
• The disk problems + Castor problems have meant that RAL was effectively ‘off’ for half a year
  • This has a knock-on effect
  • Data was not flowing properly to the Tier 2s
  • This means the analysis usage at the Tier 2s is severely restricted
  • Ad hoc workarounds for the Tier 2s were only partially effective
  • This will take a long time to work out of the system
• Disk only storage now going to dCache
  • This means that we will have a large migration issue and may need extra disk for a period
• Castor 2.1.3 seems a big improvement; we hope for a quick and stable 2.1.4
  • Good interactions with the Tier 1 team
• More generally, we still need to be able to apply quotas based on VOMS roles etc.
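
As an illustration of what quotas based on VOMS roles could look like, the sketch below resolves a quota from a VOMS FQAN string of the usual form /vo/group/Role=name. It is a toy model under stated assumptions: the quota table, its groups, roles and sizes, and the function names are all invented for the example, and nothing here reflects how Castor or dCache handle quotas.

```python
# Hypothetical quota table in TB, keyed by VOMS group or group plus role.
# The groups, roles and sizes are invented for illustration.
QUOTAS_TB = {
    "/atlas": 10.0,
    "/atlas/Role=production": 200.0,
    "/atlas/uk": 50.0,
}


def parse_fqan(fqan):
    """Split an FQAN into its group path and role (None if absent or NULL)."""
    group_parts, role = [], None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # capabilities are ignored in this sketch
        else:
            group_parts.append(part)
    return "/" + "/".join(group_parts), role


def quota_for(fqan, quotas=QUOTAS_TB):
    """Return the quota of the most specific matching entry, or 0 if none."""
    group, role = parse_fqan(fqan)
    candidates = []
    if role is not None:
        candidates.append(f"{group}/Role={role}")
    while group:  # then the group itself, then each parent group
        candidates.append(group)
        group = group.rsplit("/", 1)[0]
    for key in candidates:
        if key in quotas:
            return quotas[key]
    return 0.0


def can_write(fqan, used_tb, request_tb):
    """True if the request fits inside the quota resolved for this FQAN."""
    return used_tb + request_tb <= quota_for(fqan)


print(quota_for("/atlas/Role=production"))
print(quota_for("/atlas/uk/Role=NULL/Capability=NULL"))
print(can_write("/atlas", used_tb=9.0, request_tb=2.0))
```

The lookup prefers a role-specific entry and otherwise falls back through parent groups, so a user with no special role inherits the quota of the nearest enclosing group.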

Data Placement
• The data movement and placement system is Don Quijote 2 (DQ2)
  • A new major version was rolled out in the late spring
  • More robust transfers
  • Rate throttling etc.
• The effectiveness depends on the tools folded with the policy
  • ATLAS works with datasets
  • Subscriptions only become active when a dataset is complete
  • We were letting datasets stay open to grow: now fixed
  • Only transfer to Tier 2 when the Tier 1 has the full dataset
  • This is not fixed: many sets come from BNL, which has low effective outward bandwidth
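
The two placement rules on this slide can be written down as a small decision function. This is a toy model of the policy only, not the DQ2 API: the class names, fields and the example dataset and site names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    files: set
    closed: bool = False  # a dataset is complete once it stops growing


@dataclass
class Site:
    name: str
    tier: int
    replicas: dict = field(default_factory=dict)  # dataset name -> files held

    def has_full(self, dataset):
        """True if this site holds every file of the dataset."""
        return self.replicas.get(dataset.name, set()) >= dataset.files


def subscription_active(dataset, destination, source_tier1):
    """Apply the two policy rules from the slide to one subscription."""
    if not dataset.closed:
        return False  # rule 1: only complete datasets are transferred
    if destination.tier == 2 and not source_tier1.has_full(dataset):
        return False  # rule 2: a Tier 2 waits for the full copy at its Tier 1
    return True


ds = Dataset("example.dataset", {"f1", "f2", "f3"}, closed=True)
tier1 = Site("example-tier1", 1, {"example.dataset": {"f1", "f2"}})
tier2 = Site("example-tier2", 2)

before = subscription_active(ds, tier2, tier1)  # Tier 1 still lacks f3
tier1.replicas["example.dataset"].add("f3")
after = subscription_active(ds, tier2, tier1)
print(before, after)
```

In this model a slow upstream source delays every dependent Tier 2 subscription, because the Tier 1 copy must be complete before rule 2 lets the onward transfer start.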

Advice for Tier 3s
• Working definition: ‘Tier 3’ facilities are for local use, not for all of ATLAS
  • Need a Grid interface
  • There are some common requirements
• ATLAS Tier 3 task force is starting to give recommendations and to describe possible solutions
  • Aim is to help sites
  • This will be advisory, not prescriptive!
• They can come in different forms, many ideas
  • Dedicated CPU and disk racks
  • Fraction of fabric for Tier 2s
  • Desktop clusters

ATLAS Requirements (start 2008, 2010)

                         CPU (MSI2k)     Disk (PB)       Tape (PB)
                         2008    2010    2008    2010    2008    2010
Tier-0                   3.7     6.1     0.15    0.5     2.4     11.4
CERN Analysis Facility   2.1     4.6     1.0     2.8     0.4     1.0
Sum of Tier-1s           18.1    50      10      40      7.7     28.7
Sum of Tier-2s           17.5    51.5    7.7     22.1
Total                    41.4    112.2   18.9    65.4    10.5    41.1

• Note the high ratio of disk to CPU in the Tier 2s
  • Not yet realised
  • May require adjustments
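
The requirements figures can be checked for internal consistency, and the disk-to-CPU ratio mentioned in the note can be computed directly. The sketch below is a hedged illustration: the values are those of the ATLAS Requirements table, while the dictionary layout and function names are assumptions made for the example.

```python
# Requirements from the ATLAS Requirements table, as (2008, 2010) pairs.
# CPU in MSI2k, disk and tape in PB; no tape figure is listed for Tier 2s.
REQUIREMENTS = {
    "Tier-0": {"cpu": (3.7, 6.1), "disk": (0.15, 0.5), "tape": (2.4, 11.4)},
    "CERN Analysis Facility": {"cpu": (2.1, 4.6), "disk": (1.0, 2.8),
                               "tape": (0.4, 1.0)},
    "Sum of Tier-1s": {"cpu": (18.1, 50.0), "disk": (10.0, 40.0),
                       "tape": (7.7, 28.7)},
    "Sum of Tier-2s": {"cpu": (17.5, 51.5), "disk": (7.7, 22.1)},
}
QUOTED_TOTALS = {"cpu": (41.4, 112.2), "disk": (18.9, 65.4),
                 "tape": (10.5, 41.1)}
YEARS = (2008, 2010)


def column_total(resource, index):
    """Sum one resource over all rows that list it, for one year index."""
    return sum(row[resource][index]
               for row in REQUIREMENTS.values() if resource in row)


def disk_per_cpu(tier, index):
    """Disk requirement in PB per MSI2k of CPU for one row and year index."""
    row = REQUIREMENTS[tier]
    return row["disk"][index] / row["cpu"][index]


for resource, quoted in QUOTED_TOTALS.items():
    for index, year in enumerate(YEARS):
        total = column_total(resource, index)
        print(f"{resource} {year}: rows sum to {total:.2f}, "
              f"quoted {quoted[index]}")

for tier in REQUIREMENTS:
    ratios = ", ".join(f"{year}: {disk_per_cpu(tier, i):.2f}"
                       for i, year in enumerate(YEARS))
    print(f"{tier} disk per CPU (PB/MSI2k) -> {ratios}")
```

The rows reproduce the quoted totals to within rounding (the 2008 disk rows sum to 18.85 PB against a quoted 18.9 PB), and the Tier 2 requirement works out at about 0.44 PB of disk per MSI2k in 2008.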

Summary
• ATLAS is now doing exercises with real data at realistic rates
• After a very bad 6 months, the UK is now in ATLAS exercises and looking quite good
  • Good relations with T1
  • Still concern over the storage solutions
  • Migration from dCache will be painful and take extra resources
• The FDRs are important tests and we have to make them work
• We have used the Tier 2s surprisingly well considering the problems with the data flow
• The next year will be ‘interesting’ (but rewarding)