ATLAS: Heavier than Heaven?
Roger Jones
Lancaster University
GridPP19
Ambleside 28 August 2007
RWL Jones 29 Aug. 2007 Ambleside
Overview
• Commissioning Plans
• Cosmics running: M3-M6
• Dummy Data:
• T0/T1
• Full Dress Rehearsals
• Resources in the UK
• Data Distribution
• CPU, Disk, Mass Storage, Policies
• Operational Issues and Hot Topics
Commissioning Plans
• We are now getting real data, at realistic rates
• M3 (mid-July)
• Cosmics produced about 100TB in 2 weeks
• Also surprised the offline side by running at 4 times the nominal rate (32 samples in the LAr calorimeter)
• M4 now underway - August 23 - early September
• Expect about:
• Total data volume: RAW = 66 TB, ESD + AOD = 6 TB
• 20TB RAW data and 6TB ESD at RAL
• 2TB ESD at 5 Tier 2 sites
• Data distribution as for real data
• Currently writing at 200MB/sec, half nominal
• RAW and ESD now appearing at RAL
• M5 will be similar
• M6 will run from December until real data
• Will run close to nominal rate
• Expect ~420TB by start of run, plus Monte Carlo
• T1 should treat this as valuable data, but may only live for about a year
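The rates and volumes quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming decimal units (1 TB = 1,000,000 MB) and continuous running; the function names are illustrative and not part of any ATLAS tool. The 400 MB/sec figure is only an inference from "200MB/sec, half nominal".

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # assumption: decimal units


def tb_per_day(rate_mb_per_s: float) -> float:
    """Volume written in one day at a sustained rate, in TB."""
    return rate_mb_per_s * SECONDS_PER_DAY / MB_PER_TB


def average_rate_mb_per_s(volume_tb: float, days: float) -> float:
    """Average rate needed to write a volume over a number of days, in MB/sec."""
    return volume_tb * MB_PER_TB / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    # M4: currently writing at 200 MB/sec
    print(f"200 MB/sec sustained: {tb_per_day(200):.2f} TB/day")
    # Inferred nominal rate (twice the current M4 rate)
    print(f"400 MB/sec sustained: {tb_per_day(400):.2f} TB/day")
    # M3: about 100 TB in 2 weeks, averaged over the whole period
    print(f"M3 average: {average_rate_mb_per_s(100, 14):.1f} MB/sec")
```

The M3 figure is an average over the full two weeks, so the peak rate while cosmics were actually being recorded would have been higher.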
Full Dress Rehearsal
• First in October
• T0 running as per real data
• Data movement to Tier 1s
• Shipping onward to Tier 2s
• Some Tier 1 RAW data reprocessing and shipping of ESD, AOD to other Tier 1s
• This is an important step; it is a large part of operations
• It also prepares the group analysis activity
• Calibration processes
• These things may not all be in parallel
• Main FDR in February
• Running at nominal rate
• More processes in parallel
Resource Plan

New T1 Evolution
                    2007         2008         2009         2010         2011         2012
Total Disk (TB)     2090.24      10725.33659  20100.24616  39528.81444  56231.83987  72934.8653
Total Tape (TB)     1246.026667  8067.068427  15498.64876  29423.0892   45830.74975  64721.63041
Total CPU (kSI2k)   3173         18124.42353  28426.02353  49576.22353  70726.42353  91876.62353

New T2 Evolution
                    2007         2008         2009         2010         2011         2012
Disk (TB)           1339.93551   8602.406837  13889.17849  22909.44708  31868.59425  40828.06284
CPU (kSI2k)         2336         17506.89811  26972.75589  51557.13737  69140.91886  86724.70034

New CAF Evolution
                    2007         2008         2009         2010         2011         2012
Total Disk (TB)     218.9205102  1145.579516  1780.331401  3127.598482  4811.362354  6311.016939
Total Tape (TB)     43.90035714  370.9648871  645.3581497  1043.271812  1374.785475  1706.299137
Total CPU (kSI2k)   821.4117647  2081.470067  2562.343154  4663.81529   6598.454092  8533.092895

New T0 Evolution
                    2007         2008         2009         2010         2011         2012
Total Disk (TB)     62.85714286  152.4621429  265.0335714  472.3528571  472.3528571  472.3528571
Total Tape (TB)     400          2449.077     5562.274     11893.871    18225.468    24557.065
Total CPU (kSI2k)   1910.411765  3705.823529  4058.823529  6105.823529  6105.823529  6105.823529
Operations
• We have a weekly UK Tier 1 operations meeting
• The list of attendees is growing
• For a while, we may add Tier 2 attendance as required
• Time will tell if this requires two meetings
• We have more effort on the ATLAS side for operations
• The 0.5 FTE at RAL from GridPP will be vital
• We are still seeking extra effort from ATLAS, t.b.c.
• ATLAS is engaged in a series of reviews with Tier 1 sites
• RAL was ‘done’ in July
• Constructive discussion, problems faced openly
• Very useful ‘live’ document of data classes/disk servers/tape servers/endpoints continuing to evolve
ATLAS T0-T1 Exports
Mon/Tuesday 28/29 May 2007
[Plot: export rates in Mbytes/second]
• Despite RAL problems, ATLAS was having some success
• RAL is now in, and showing good rates
Data Storage
• The disk problems + Castor problems have meant that RAL was effectively ‘off’ for half a year
• This has a knock-on effect
• Data was not flowing properly to the Tier 2s
• This means the analysis usage at the Tier 2s is severely restricted
• Ad hoc workarounds for the Tier 2s were only partially effective
• This will take a long time to work out of the system
• Disk-only storage now going to dCache
• This means that we will have a large migration issue and may need extra disk for a period
• Castor 2.1.3 seems a big improvement; we hope for a quick and stable 2.1.4
• Good interactions with the Tier 1 team
• More generally, we still need to be able to apply quotas based on VOMS roles etc.
Data Placement
• The data movement and placement system is Don Quijote 2 (DQ2)
• A new major version was rolled out in the late spring
• More robust transfers
• Rate throttling etc.
• The effectiveness depends on the tools folded with the policy
• ATLAS works with datasets
• Subscriptions only become active when a dataset is complete
• We were letting datasets stay open to grow - fixed
• Only transfer to Tier 2 when the Tier 1 has the full dataset
• This is not fixed - many sets from BNL, which has low effective outward bandwidth
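The two placement rules above (a subscription activates only once the dataset is complete, and a Tier 2 is served only once the Tier 1 holds the full dataset) can be sketched as a simple predicate. This is an illustration only and does not use or reproduce the DQ2 API; the `Dataset` and `Site` classes and the `may_transfer_to_tier2` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset record: a name, a closed flag and its file list."""
    name: str
    closed: bool          # True once no more files will be added
    files: frozenset      # names of all files in the dataset


@dataclass
class Site:
    """Hypothetical site record: which files of each dataset are present."""
    name: str
    held: dict = field(default_factory=dict)  # dataset name -> set of file names

    def has_complete(self, ds: Dataset) -> bool:
        # The site is complete only if it holds every file of the dataset.
        return self.held.get(ds.name, set()) >= ds.files


def may_transfer_to_tier2(ds: Dataset, tier1: Site) -> bool:
    """Apply both rules: dataset closed, and Tier 1 holds the full dataset."""
    return ds.closed and tier1.has_complete(ds)


if __name__ == "__main__":
    ds = Dataset("example.dataset", closed=True, files=frozenset({"f1", "f2"}))
    tier1 = Site("EXAMPLE-T1", {"example.dataset": {"f1"}})
    print(may_transfer_to_tier2(ds, tier1))  # Tier 1 is still missing f2
    tier1.held["example.dataset"].add("f2")
    print(may_transfer_to_tier2(ds, tier1))  # Tier 1 now holds the full dataset
```

Under such a rule, a slow source for even one file delays the whole dataset reaching the Tier 2s, which is one way to read the bandwidth concern in the last bullet.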
Advice for Tier 3s
• Working definition: ‘Tier 3’ facilities are for local use, not for all of ATLAS
• Need a Grid interface
• There are some common requirements
• ATLAS Tier 3 task force is starting to give recommendations and to describe possible solutions
• Aim is to help sites
• This will be advisory, not prescriptive!
• They can come in different forms, many ideas
• Dedicated cpu and disk racks
• Fraction of fabric for Tier 2s
• Desktop clusters
ATLAS Requirements start 2008, 2010
                         CPU (MSi2k)     Disk (PB)       Tape (PB)
                         2008    2010    2008    2010    2008    2010
Tier-0                   3.7     6.1     0.15    0.5     2.4     11.4
CERN Analysis Facility   2.1     4.6     1.0     2.8     0.4     1.0
Sum of Tier-1s           18.1    50      10      40      7.7     28.7
Sum of Tier-2s           17.5    51.5    7.7     22.1
Total                    41.4    112.2   18.9    65.4    10.5    41.1
• Note the high ratio of disk to cpu in the Tier 2s
• Not yet realised
• May require adjustments
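As a cross-check of the requirements table, the per-tier figures can be summed and compared with the Total row, and the disk-to-CPU ratio mentioned in the note can be computed per tier. This is a minimal sketch using only the numbers in the table; the dictionary layout and function names are illustrative, and the Tier-2 tape entry is omitted because the table gives none.

```python
# (2008, 2010) values per resource; CPU in MSi2k, disk and tape in PB.
REQUIREMENTS = {
    "Tier-0": {"cpu": (3.7, 6.1), "disk": (0.15, 0.5), "tape": (2.4, 11.4)},
    "CERN Analysis Facility": {"cpu": (2.1, 4.6), "disk": (1.0, 2.8), "tape": (0.4, 1.0)},
    "Sum of Tier-1s": {"cpu": (18.1, 50.0), "disk": (10.0, 40.0), "tape": (7.7, 28.7)},
    "Sum of Tier-2s": {"cpu": (17.5, 51.5), "disk": (7.7, 22.1)},  # no tape listed
}


def totals(resource: str) -> tuple:
    """Sum one resource over all tiers, returning (2008 total, 2010 total)."""
    rows = [tier[resource] for tier in REQUIREMENTS.values() if resource in tier]
    return (sum(r[0] for r in rows), sum(r[1] for r in rows))


def disk_per_cpu(tier: str, year_index: int) -> float:
    """Disk-to-CPU ratio in PB per MSi2k (year_index 0 = 2008, 1 = 2010)."""
    row = REQUIREMENTS[tier]
    return row["disk"][year_index] / row["cpu"][year_index]


if __name__ == "__main__":
    for resource in ("cpu", "disk", "tape"):
        t2008, t2010 = totals(resource)
        print(f"{resource}: 2008 = {t2008:.2f}, 2010 = {t2010:.2f}")
    for tier in REQUIREMENTS:
        print(f"{tier}: {disk_per_cpu(tier, 0):.2f} PB per MSi2k in 2008")
```

The sums agree with the Total row to within rounding (the 2008 disk sum is 18.85 PB against the quoted 18.9 PB).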
Summary
• ATLAS is now doing exercises with real data at realistic rates
• After a very bad 6 months, the UK is now in ATLAS exercises and looking quite good
• Good relations with T1
• Still concern over the storage solutions
• Migration from dCache will be painful and take extra resources
• The FDRs are important tests and we have to make them work
• We have used the Tier 2s surprisingly well considering the problems with the data flow
• The next year will be ‘interesting’ (but rewarding)