[Map: OSG sites across the United States, including NERSC, BU, UNM, SDSC, UTA, OU, FNAL, ANL, U Wisc, BNL, Vanderbilt, PSU, UVA, Caltech, Iowa State, Purdue, IU, Buffalo, TTU, Cornell, Albany, UMich, Indiana/IUPUI, Stanford, UWM, UNL, UFL, UNI, WSU, MSU, LTU, LSU, Clemson, UMiss, UIUC, UCR, UCLA, Lehigh, NSF, ORNL, Harvard, UIC, SMU, UChicago, MIT, RENCI, LBL, Georgetown, UIowa, UC Davis, ND]
Contributing >30% of the throughput to ATLAS and CMS in the Worldwide LHC Computing Grid.
Reliant on production and advanced networking from ESNET, LHCNET, and Internet2.
Virtual Data Toolkit: Common software developed between Computer Science & applications used by OSG and others.
OSG Today
OSG Job Throughput
29 VOs
~75 sites (19 SE & 82 CE)
~400,000 wall clock hours per day (peaks over 500,000)
25-30% opportunistic use
~15% is non-physics
>20,000 cores used per day
>43,000 cores accessible
US-CMS, US-ATLAS and OSG ready for LHC startup
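As an illustrative cross-check of the throughput figures above (this calculation is not from the slides), the quoted wall-clock hours per day imply an average number of busy cores:

```python
# Rough cross-check: ~400,000 wall-clock hours delivered per day
# corresponds to this many cores kept busy around the clock.
wall_clock_hours_per_day = 400_000
avg_busy_cores = wall_clock_hours_per_day / 24  # hours in a day

print(round(avg_busy_cores))  # → 16667
```

An average of roughly 16,700 busy cores is consistent with ">20,000 cores used per day", since not every core that runs a job is busy all day.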
OSG Data Throughput
Petabytes a month distributed from CERN to Tier-1s, between Tier-1s, and to/from Tier-2s.
Transfers bursts of >10Gb/sec.
Relies on ESNET, LHCNet and Internet2 in the US.
Are These Estimates Realistic? Yes.
Slide courtesy ESNET:
FNAL outbound CMS traffic for 4 months, to Sept. 1, 2007. Max = 8.9 Gb/s (1064 MBy/s of data); average = 4.1 Gb/s (493 MBy/s of data).
[Chart: FNAL outbound traffic by destination; left axis in gigabits/sec of network traffic (0-10), right axis in megabytes/sec of data traffic]
Known LHC Tier 2+3 sites drive many of the ESnet peering point location and design decisions.
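The MBy/s figures quoted for the FNAL traffic follow from the Gb/s values when a megabyte is taken as binary (2^20 bytes); a small sketch of the conversion (the slide's exact averages may use slightly different rounding):

```python
def gbps_to_mibps(gbps: float) -> float:
    """Convert network gigabits/second to binary megabytes/second (MiB/s)."""
    return gbps * 1e9 / 8 / 2**20  # bits -> bytes -> MiB

print(round(gbps_to_mibps(8.9)))  # → 1061, close to the quoted 1064 MBy/s peak
print(round(gbps_to_mibps(4.1)))  # → 489, close to the quoted 493 MBy/s average
```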
Slide Courtesy Internet2:
OSG Platform for the US-LHC Collaborations
Software/Middleware
a) Support the movement, storage, and management of the petabyte LHC data sets.
b) Support job workflow, scheduling, and execution at the Tier-1, Tier-2, and Tier-3 sites, with transparent access across the European and US grids.

Services
a) Information, accounting, and monitoring services publishing to the WLCG.
b) Reliability and availability monitoring, used by the experiments to determine the availability of sites and the WLCG, to match to the MOU.

Support
a) Security monitoring, incident response, notification, and mitigation.
b) Operational support, including centralized ticket handling with automated bi-directional communication between the systems in Europe and the USA.
c) Collaboration with ESNET and Internet2 network projects for the integration and monitoring of the underlying network fabric.
d) Site coordination and common support for Tier-3 sites (>8 now on OSG).
e) End-to-end support for simulation, production, analysis, and focused data challenges, enabling US-LHC readiness for real data taking.
OSG Reporting to WLCG on behalf of US-LHC (Example)
US LHC Tier-2 Activity for September 2008

| Site | Reliability | Availability | Wallclock hours (owner VO) | CPU efficiency (owner VO) | CPU hours (owner VO) | MoU pledge* | Wallclock hours delivered to all OSG VOs |
|---|---|---|---|---|---|---|---|
| ATLAS T2 Federations | | | | | | | |
| US-AGLT2 | 96% | 96% | 444,517 | 90% | 401,400 | 416,880 | 462,449 |
| US-MWT2 | 100% | 100% | 882,569 | 98% | 863,373 | 480,384 | 975,449 |
| US-NET2 | 99% | 99% | 308,304 | 94% | 290,869 | 287,280 | 308,304 |
| US-SWT2 | 100% | 100% | 463,413 | 95% | 435,971 | 598,752 | 686,350 |
| US-WT2 | 88% | 90% | 399,035 | 81% | 324,410 | 354,240 | 399,035 |
| CMS T2s | | | | | | | |
| T2_US_Caltech | 83% | 86% | 419,135 | 78% | 327,857 | 432,000 | 451,886 |
| T2_US_Florida | 96% | 97% | 450,173 | 76% | 344,198 | 432,000 | 623,556 |
| T2_US_MIT | 92% | 93% | 568,596 | 87% | 493,368 | 432,000 | 949,936 |
| T2_US_Nebraska | 91% | 93% | 378,784 | 62% | 235,869 | 432,000 | 661,090 |
| T2_US_Purdue | 98% | 98% | 2,098,491 | 65% | 1,370,777 | 432,000 | 2,484,099 |
| T2_US_UCSD | 99% | 99% | 1,411,529 | 39% | 554,206 | 432,000 | 1,737,658 |
| T2_US_Wisconsin | 100% | 100% | 605,646 | 79% | 480,678 | 432,000 | 610,905 |
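The CPU-efficiency column is consistent with CPU hours divided by wall-clock hours; a quick check using the US-AGLT2 row (figures taken from the table above):

```python
# Verify the CPU-efficiency column for one row of the table.
wallclock_hours = 444_517  # US-AGLT2 wall-clock hours for the owner VO
cpu_hours = 401_400        # US-AGLT2 CPU hours for the owner VO

efficiency = cpu_hours / wallclock_hours
print(f"{efficiency:.0%}")  # → 90%, matching the reported efficiency
```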
It has been a long path to success, and fragility remains in the end-to-end process.
US-ATLAS Production on OSG
ATLAS Operations on the OSG, April through September 2008
[Six charts, each spanning Apr-08 to Sept-08: Wall Clock Hours (axis to 3,000,000 per month); Six Month Sum of Wall Clock Hours (axis to 14,000,000); Number of Jobs (axis to 5,000,000); Usage Type at ATLAS Sites (Owned vs. Opportunistic shares, 0-100%); Average Number of CPUs Delivered (axis to 4,000); Petabytes Moved (axis to 0.350 per month)]
US-CMS Production on OSG
CMS Operations on the OSG, April through September 2008
[Six charts, each spanning Apr-08 to Sept-08: Wall Clock Hours (axis to 5,000,000 per month); Six Month Sum of Wall Clock Hours (axis to 20,000,000); Number of Jobs (axis to 1,200,000); Usage Type at CMS Sites (Owned vs. Opportunistic shares, 0-100%); Average Number of CPUs Delivered (axis to 7,000); Petabytes Moved, Out and In (axis to 2 per month)]
US-LHC Benefits from OSG
Common to US-ATLAS and US-CMS:
1. Serves as integration and delivery point for core middleware components, including compute and storage elements (VDT)
2. Cyber security operations support within OSG and across grids (e.g. WLCG) in case of security incidents
3. Cyber security infrastructure, including a site-level authorization service and an operational service for updating certificates and revocation lists
4. Service availability monitoring of critical site infrastructure services, i.e. Computing and Storage Elements (RSV)
5. Service availability monitoring and forwarding of results to WLCG
6. Site-level accounting services and forwarding of accumulated results to WLCG
7. Consolidation of Grid client utilities, including incorporation of the LCG client suite and resolution of Globus library inconsistencies
8. dCache packaging through VDT and support through OSG-Storage
9. Integration testbed for new releases of the OSG software; pre-production deployment testing
10. Continuous support of the distributed computing facility and production services through the weekly OSG facility phone meetings
US-LHC Benefits from OSG (continued)
Specific to US-ATLAS:
1. LCG File Catalog (LFC) server and client packaging, needed in support of the ATLAS global Distributed Data Management system (DDM)
2. Bestman and xrootd: SRM and file system support for Tier-2 and Tier-3 facilities
3. Support for integration and extension of security services in the PanDA workload management system and the GUMS grid identity mapping service, for compliance with OSG security policies and requirements

Specific to US-CMS:
1. Bestman: SRM support for Tier-3 facilities
2. lcg-utils tools for data management
3. Scalability testing of OSG services, including BDII, CE, and SE, and work with developers to improve the underlying middleware