Quarterly report
Southern Tier-2, Quarter 3 2005
P.D. Gronbech
September 2005 Quarterly report: SouthGrid
Current site status data
| Site | Service nodes | Worker nodes | Local network connectivity (1) | Site connectivity | SRM | Days SFT failed (2) | Days in scheduled maintenance | Security incidents this quarter impacting the Grid |
|------|---------------|--------------|--------------------------------|-------------------|-----|---------------------|-------------------------------|----------------------------------------------------|
| Birmingham | SL304, LCG 2.6.0 | SL304, LCG 2.6.0 | 100 Mb/s | 1 Gb/s | No | 3 | 2 | 0 |
| Bristol | SL304, LCG 2.6.0 | SL304, LCG 2.6.0 | 100 Mb/s | 1 Gb/s | No | 18 | 19 | 0 |
| Cambridge | SL305, LCG 2.6.0 | SL305, LCG 2.6.0 | 100 Mb/s | 2.5 Gb/s | No | 7 | 5 | 0 |
| Oxford | SL304, LCG 2.6.0 | SL304, LCG 2.6.0 | 100 Mb/s | 2.5 Gb/s | No | 7 | 3 | 0 |
| RAL PPD | SL303, LCG 2.6.0 | SL304, LCG 2.6.0 | 1 Gb/s | 2 Gb/s | dCache | 3 | 0 | 0 |
1) Local network connectivity is that to the site SE.
2) It is understood that SFT failures do not always result from site problems, but they are the best measure currently available. Results are based on the old SFT page, as it retains history; a site is deemed to have failed only if it had no good set of results for a particular day.
All GridPP Resources
| Site | Promised: integrated kSI2K hours until this quarter | Promised: CPU (kSI2K) | Promised: storage (TB) | Actual: integrated kSI2K hours until this quarter | Actual: CPU (kSI2K) | Actual: storage (TB) |
|------|------------------------------------------------------|-----------------------|------------------------|---------------------------------------------------|---------------------|----------------------|
| Birmingham | 1716960 | 196 | 9.3 | 1952604 | 222.9 * | 9.3 |
| Bristol | 332880 | 38 | 1.9 | 104244 | 11.9 * | 1.9 |
| Cambridge | 280320 | 32 | 4.4 | 350400 | 40 | 4.4 |
| Oxford | 2067360 | 236 | 18.5 | 1042440 | 119 ** | 18.5 |
| RAL PPD | 1322760 | 151 | 11.8 | 858480 | 98 *** | 5.8 |
| Total | 5720280 | 653 | 45.9 | 4308168 | 491.8 | 39.9 |
1) The GridPP Tier-2 MoUs made reference to integrated CPU over the 3 years of GridPP2. The "Promised: integrated kSI2K hours until this quarter" column is an estimate of what the Tier-2 would have expected to provide up to this quarter on the basis of planned installations. "Static kSI2K" shows what would currently be expected if all purchases planned to this quarter had been made and implemented. The "Actual" columns show what has been delivered.
2) RAL PPD delayed purchases due to lack of use earlier in the year.
* The Bristol BaBar cluster transferred to Birmingham.
** Delayed purchasing due to lack of computer-room space.
*** Delayed purchasing due to earlier lack of use.
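The "integrated kSI2K hours" columns are simply installed capacity multiplied by wall-clock hours. A minimal sketch of that arithmetic; the one-year window is inferred from the table itself (e.g. Cambridge's promised 32 kSI2K over a full year reproduces its 280,320 figure):

```python
HOURS_PER_YEAR = 24 * 365  # 8760 wall-clock hours

def integrated_ksi2k_hours(cpu_ksi2k, hours=HOURS_PER_YEAR):
    """Integrated capacity: installed kSI2K multiplied by hours available."""
    return cpu_ksi2k * hours

# Promised figures from the table above (one full year of running):
assert integrated_ksi2k_hours(32) == 280320    # Cambridge
assert integrated_ksi2k_hours(196) == 1716960  # Birmingham
```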
LCG resources
| Site | Estimated for LCG: total job slots | Estimated: CPU (kSI2K) | Estimated: storage (TB) | Currently delivering: total job slots | Delivering: CPU (kSI2K) | Delivering: storage (TB) |
|------|------------------------------------|------------------------|-------------------------|---------------------------------------|-------------------------|--------------------------|
| Birmingham | 54 ** | 58.5 | 2 | 40 | 28 | 1.92 |
| Bristol | 26 | 11.9 | 1.1 | 2 | 0.8 | 0.16 |
| Cambridge | 40 | 28.4 | 4.2 | 44 | 39.6 | 1.97 |
| Oxford | 174 | 156.6 | 9 | 74 | 66.6 | 3.2 * |
| RAL PPD | 144 | 151 | 11.8 | 82 | 71.75 | 6.4 |
| Total *** | 438 | 406.4 | 28.1 | 242 | 206.75 | 11.73 |
1) The estimated figures are those projected for LCG planning purposes: http://lcg-computing-fabric.web.cern.ch/LCG-Computing-Fabric/GDB_resource_infos/Summary_Institutes_2004_2005_v11.htm
2) Current total job slots are those reported by the EGEE/LCG gstat page.
* This figure also includes the older SE, which is not currently reported on the gstat pages.
** 50% of the ATLAS farm is available to LCG.
*** The estimated total above comes from the MoU spreadsheet; the total from the LCG projected-planning spreadsheet was 200 kSI2K and 7 TB. The shortfall is due to delayed purchasing at RAL and Oxford.
VOs supported by site
| Site | ALICE | ATLAS | BABAR | BIOMED | CDF | CMS | DTEAM | DZERO | HONE | ILC | LHCB | NA48 | PHENO | SIXT | ZEUS | Total |
|------|-------|-------|-------|--------|-----|-----|-------|-------|------|-----|------|------|-------|------|------|-------|
| Birmingham | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 9 |
| Bristol | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 6 |
| Cambridge | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 7 |
| Oxford | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 10 |
| RAL PPD | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 12 |
| Total | 5 | 5 | 2 | 4 | 1 | 5 | 5 | 1 | 2 | 1 | 5 | 1 | 2 | 2 | 3 | |

0 = not supported, 1 = supported.
Resources used per VO over quarter (kSI2K hours)

| Site | ALICE | ATLAS | BABAR | BIOMED | CMS | DTEAM | DZERO | HONE | ILC | LHCB | PHENO | ZEUS | Total |
|------|-------|-------|-------|--------|-----|-------|-------|------|-----|------|-------|------|-------|
| Birmingham | 0 | 121 | 1876 | 8906 | 1208 | 30 | 0 | 2885 | 0 | 6509 | 0 | 0 | 21535 |
| Bristol | 0 | 26 | 0 | 0 | 1 | 6 | 0 | 0 | 0 | 79 | 0 | 0 | 112 |
| Cambridge | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Oxford | 0 | 14036 | 270 | 20933 | 1722 | 7 | 0 | 0 | 0 | 21578 | 0 | 1702 | 60248 |
| RAL PPD | 0 | 23512 | 23735 | 5354 | 7287 | 312 | 0 | 8497 | 0 | 18078 | 1101 | 3212 | 91088 |
| Total | 0 | 37695 | 25881 | 35193 | 10218 | 355 | 0 | 11382 | 0 | 46244 | 1101 | 4914 | 172983 |
1) Information currently available from APEL: http://goc.grid-support.ac.uk/gridsite/accounting/tree/gridpp_view.php - please note these pages are still under development. NB: this could be automated with an SQL/R-GMA query.
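The SQL/R-GMA automation mentioned above could look roughly like the sketch below. The table and column names (`LcgRecords`, `Site`, `VOName`, `EventDate`, `NormSumCPU`) are illustrative assumptions, not the real APEL schema; here the query runs against an in-memory SQLite database seeded with sample figures:

```python
import sqlite3

# Hypothetical accounting table; the schema is assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE LcgRecords
                (Site TEXT, VOName TEXT, EventDate TEXT, NormSumCPU REAL)""")
conn.executemany(
    "INSERT INTO LcgRecords VALUES (?, ?, ?, ?)",
    [("Oxford", "atlas", "2005-07-14", 9000.0),
     ("Oxford", "atlas", "2005-08-20", 5036.0),
     ("Oxford", "lhcb",  "2005-09-02", 21578.0)])

# Sum normalised CPU per site and VO over the quarter.
rows = conn.execute("""
    SELECT Site, VOName, SUM(NormSumCPU)
    FROM LcgRecords
    WHERE EventDate BETWEEN '2005-07-01' AND '2005-09-30'
    GROUP BY Site, VOName
    ORDER BY Site, VOName""").fetchall()
# rows → [('Oxford', 'atlas', 14036.0), ('Oxford', 'lhcb', 21578.0)]
```

The sample numbers are chosen so the aggregate matches the Oxford ATLAS and LHCb entries in the table above.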
Usage by VO for Tier-2
| Jobs | July 2005 | Aug 2005 | Sep 2005 |
|------|-----------|----------|----------|
| alice | 0 | 4 | 2 |
| atlas | 16024 | 5549 | 4879 |
| babar | 2500 | 2048 | 6509 |
| biomed | 1700 | 2551 | 1689 |
| cms | 572 | 191 | 527 |
| dteam | 4726 | 4334 | 6600 |
| dzero | 7 | 4 | 5 |
| hone | 720 | 389 | 705 |
| ilc | 0 | 2 | 0 |
| lhcb | 1752 | 2820 | 6995 |
| na48 | 0 | 0 | 0 |
| pheno | 0 | 9 | 337 |
| zeus | 2809 | 633 | 1939 |
Data taken from the GOC accounting pages; NormSumCPU is CPU hours normalised to 1k SpecInt2000.
| CPU | July 2005 | Aug 2005 | Sep 2005 |
|-----|-----------|----------|----------|
| alice | 0 | 0 | 0 |
| atlas | 9185 | 9147 | 19363 |
| babar | 6562 | 4977 | 14342 |
| biomed | 13075 | 21728 | 390 |
| cms | 6916 | 1226 | 2076 |
| dteam | 290 | 54 | 11 |
| dzero | 0 | 0 | 0 |
| hone | 4298 | 3037 | 4047 |
| ilc | 0 | 0 | 0 |
| lhcb | 2084 | 34652 | 9508 |
| na48 | 0 | 0 | 0 |
| pheno | 0 | 0 | 1101 |
| zeus | 2920 | 623 | 1371 |
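The normalisation used in the CPU table scales raw CPU time by the worker node's SpecInt2000 rating relative to the 1000 (1 kSI2K) reference. A minimal sketch, with an illustrative rating rather than any site's real benchmark:

```python
def normalised_cpu_hours(raw_cpu_hours, si2k_rating):
    """Convert raw CPU hours to kSI2K hours: scale by the node's
    SpecInt2000 rating divided by 1000 (the 1 kSI2K reference)."""
    return raw_cpu_hours * si2k_rating / 1000.0

# Illustrative only: 1000 raw CPU hours on a 1400-SI2K worker node
# count as 1400 normalised kSI2K hours.
assert normalised_cpu_hours(1000, 1400) == 1400.0
```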
Storage resources in use per VO (TB)
| Site | ALICE | ATLAS | BABAR | BIOMED | CMS | DTEAM | DZERO | HONE | ILC | LHCB | NA48 | PHENO | ZEUS | Total |
|------|-------|-------|-------|--------|-----|-------|-------|------|-----|------|------|-------|------|-------|
| Birmingham | 0 | 0.598 | 0 | 0 | 0 | 0.0000006 | 0 | 0.0027 | 0 | 0 | 0 | 0 | 0 | 0.6007006 |
| Bristol | 0 | 0 | 0 | 0 | 0 | 0.0000004 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0000004 |
| Cambridge | 0 | 0.241 | 0 | 0.000206 | 0 | 0.0000009 | 0 | 0 | 0 | 0.000004 | 0 | 0 | 0 | 0.241214 |
| Oxford | 0 | 0.094 | 0 | 0.0011 | 0 | 0.0000021 | 0 | 0 | 0 | 0.0000011 | 0 | 0 | 0.000302 | 0.095405 |
| RAL PPD | 0 | 0.0756 | 0.157 | 0.0004205 | 0 | 0.001268 | 0 | 0.10006 | 0 | 0.0000328 | 0 | 0.000707 | 0.000728 | 0.335816 |
| Total | 0 | 1.0086 | 0.157 | 0.0017265 | 0 | 0.001272 | 0 | 0.10276 | 0 | 0.0000379 | 0 | 0.000707 | 0.00103 | 1.273136 |
It is difficult to provide this for the whole period, but we can at least show *current* usage. Numbers need to be provided by site admins (e.g. `du -sh`), but this will change under dCache.
Usage by VO (CPU)
Usage by VO (jobs)
NB: these charts can be extracted from APEL.
Progress over last quarter
Birmingham
Successes:
• Upgraded to 2.6.0
• Took part in pre-release testing of 2.6.0
• Installed the BaBar clusters formerly at Bristol

Bristol
Successes:
• Site installed, connected on 5th July, and now maintained by Y. Coppens and P. Gronbech
• Finalising the HP-funded post
• Recruited a sys admin on the Rolling Grant
Problems/Issues:
• Lack of manpower has hindered progress

Cambridge
Successes:
• Upgraded to 2.6.0
• Integrated the LCG cluster into the CamGrid Condor cluster
Problems/Issues:
• Still some ownership problems with respect to Condor users vs. LCG pool UIDs
• APEL accounting does not yet support Condor

Oxford
Successes:
• Upgraded to SL304
• Installed LCG 2.6.0
• New sys admin started
Problems/Issues:
• The new computer room is some way off and some nodes are overheating. SRIF funding has been secured to build it; the upgrade of the LCG cluster can commence as soon as the room is ready, as funds are reserved.

RAL PPD
Successes:
• Upgraded to 2.6.0
• Installed dCache
Tier-2 risks
General risks
• Lack of use by casual users; feedback from jobs that go astray is not user-friendly.

Mitigating actions
• Need better training for users and more usable software.
• UIs have been integrated with local clusters for ease of use.

Institute-specific risks
• Lack of adequate computer-room space at Oxford is still a problem; several nodes are running hot.
• Slow progress on building the new computer room will delay the upgrade of Oxford's resources.
• Use of Condor at Cambridge may prevent the gathering of monitoring statistics.
• Bristol manpower.
• Lack of Bristol resources.

Mitigating actions
• Condor support to be added to APEL.
• New Bristol sys admin due to start October 1st.
• The local Bristol cluster will be used; the SRIF-funded eScience cluster will follow, with luck.
Tier-2 planning for next quarter
• Set up and purchase an integration test bed for SouthGrid use
• Coordinate use of this cluster within the UK Testzone
• Install LCG 2.7.0
• Install SRM at all sites
• Support some non-LHC VOs
• Investigate DPM at Birmingham, then install either dCache or DPM at the other sites depending on the results of testing
• Possible new hardware purchases
• Start some inter-site performance tests
• Prepare for SC4 involvement
Objectives and deliverables for last quarter
| Objective/deliverable | Due date | Metric/output |
|-----------------------|----------|---------------|
| Install LCG 2.6.0 at all sites | Late July 2005 | Completed |
| Testing of disk-to-disk transfers in preparation for Service Challenge 4 | 30th September 2005 | Not yet started |
| First sites to install SRM | 30th September 2005 | RAL installed dCache; Birmingham started installing DPM |
| Support non-LHC VOs | 30th September 2005 | Birmingham, Cambridge, Oxford and RAL supported BIOMED and were the largest UK T2 contributor |
Objectives and deliverables for next quarter
| Objective/deliverable | Due date | Metric/output |
|-----------------------|----------|---------------|
| Install LCG 2.7.0 at all sites | Late October 2005 | |
| Testing of disk-to-disk transfers in preparation for Service Challenge 4 | 31st December 2005 | |
| All sites to install SRM | 31st December 2005 | |
| Continue to support non-LHC VOs | 31st December 2005 | |
Meetings, papers & effort
Tier-2 coordinator effort: 3 months

For the Tier-2 coordinator:

| Area | Description |
|------|-------------|
| Talks | |
| Conferences | GridPP 14, Birmingham |
| Publications | |
Summary & outlook
• The SouthGrid technical meeting held in August continued to focus sites on rapid upgrades. All sites are running the latest release and will start installing SRMs in preparation for SC4.
• There are continuing manpower issues at Bristol, but these will ease shortly as the new systems administrator starts in October. The part-HP-funded post has also been finalised.
• Oxford will be able to expand resources once the new computer room is built; SRIF funding for the room has been obtained.
• Oxford's new sys admin is in place, allowing the T2C more time to coordinate.
• Yves Coppens is providing valuable help across SouthGrid.
• Bristol is now online, and it is hoped to expand their cluster once the local sys admin is in place.