Global Lambdas and Grids for Particle Physics in the LHC Era
Harvey B. Newman, California Institute of Technology
SC2005, Seattle, November 14-18, 2005
Beyond the SM: Great Questions of Particle Physics and Cosmology
1. Where does the pattern of particle families and masses come from?
2. Where are the Higgs particles; what is the mysterious Higgs field?
3. Why do neutrinos and quarks oscillate?
4. Is Nature supersymmetric?
5. Why is any matter left in the universe?
6. Why is gravity so weak?
7. Are there extra space-time dimensions?
You Are Here.
We do not know what makes up 95% of the universe.
Large Hadron Collider, CERN, Geneva: 2007 Start
27 km tunnel in Switzerland & France; pp collisions at √s = 14 TeV, L = 10^34 cm^-2 s^-1
Experiments: CMS and TOTEM (pp, general purpose; heavy ions), ATLAS (pp, general purpose; heavy ions), LHCb (B-physics), ALICE (heavy ions)
Physics: Higgs, SUSY, Extra Dimensions, CP Violation, QG Plasma, … the Unexpected
5000+ Physicists, 250+ Institutes, 60+ Countries
LHC Data Grid Hierarchy
[Diagram: tiered grid. Experiment and Online System (~PByte/sec) feed the Tier 0 +1 CERN Center (PBs of disk; tape robot) at ~150-1500 MBytes/sec; Tier 1 centers (FNAL, IN2P3, INFN, RAL) linked at ~10 to 40 Gbps; Tier 2 centers at ~1-10 Gbps; Tier 3 institutes with physics data caches at 1 to 10 Gbps; Tier 4 workstations]
Challenges: analyze petabytes of complex data cooperatively; harness global computing, data & network resources
CERN/Outside resource ratio ~1:2; Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1
Tens of Petabytes by 2007-8. An Exabyte ~5-7 years later.
Emerging Vision: A Richly Structured, Global Dynamic System
Long Term Trends in Network Traffic Volumes: 300-1000X/10Yrs
[Chart: ESnet Monthly Accepted Traffic through May 2005, in TBytes/month (0-600), Feb 1990 to Apr 2005]
ESnet Accepted Traffic 1990-2005: exponential growth of +82%/year for the last 15 years; 400X per decade (W. Johnston)
[Chart: SLAC traffic, Terabytes per month, growing in steps toward the 10 Gbit/s range]
SLAC Traffic Growth in Steps: ~10X/4 years; progress in steps (R. Cottrell)
"Summer" '05: 2x10 Gbps links, one for production, one for R&D; projected ~2 Terabits/s by ~2014
Internet2 Land Speed Record (LSR)
IPv4 multi-stream record with FAST TCP: 6.86 Gbps X 27 kkm, Nov. 2004
IPv6 record: 5.11 Gbps between Geneva and Starlight, Jan. 2005
Disk-to-disk marks: 536 MBytes/sec (Windows); 500 MBytes/sec (Linux)
End system issues: PCI-X bus, Linux kernel, NIC drivers, CPU
NB: Manufacturers’ Roadmaps for 2006: One Server Pair to One 10G Link
[Chart: Internet2 Land Speed Records over time (blue = HEP), throughput in Gbps and distance-bandwidth product in Petabit-m/sec: 0.4 Gbps x 12272 km; 0.9 Gbps x 10978 km; 2.5 Gbps x 10037 km; 5.4 Gbps x 7067 km; 5.6 Gbps x 10949 km; 4.2 Gbps x 16343 km; 6.6 Gbps x 16500 km (Nov. 2004 record); current single IPv4 TCP stream record 7.21 Gbps x 20675 km (7.2G X 20.7 kkm)]
HENP Bandwidth Roadmap for Major Links (in Gbps)

Year  Production             Experimental             Remarks
2001  0.155                  0.622-2.5                SONET/SDH
2002  0.622                  2.5                      SONET/SDH; DWDM; GigE Integ.
2003  2.5                    10                       DWDM; 1 + 10 GigE Integration
2005  10                     2-4 X 10                 Switch; Provisioning
2007  2-4 X 10               ~10 X 10; 40 Gbps        1st Gen. Grids
2009  ~10 X 10 or 1-2 X 40   ~5 X 40 or ~20-50 X 10   40 Gbps Switching
2011  ~5 X 40 or ~20 X 10    ~25 X 40 or ~100 X 10    2nd Gen. Grids; Terabit Networks
2013  ~Terabit               ~MultiTbps               ~Fill One Fiber

Continuing trend: ~1000 times bandwidth growth per decade; HEP: co-developer as well as application driver of global nets
LHCNet, ESnet Plan 2006-2009: 20-80 Gbps US-CERN, ESnet MANs, IRNC
[Map: ESnet/LHCNet topology with hubs at SEA, SNV, SDG, ALB, ELP, DEN, CHI, NYC, ATL and DC; metropolitan area rings; major DOE Office of Science sites; high-speed cross connects with Internet2/Abilene; new and existing ESnet hubs; links to Europe (GEANT2, SURFnet, IN2P3, CERN), Japan, Australia and AsiaPac at 10 Gb/s, 30 Gb/s and 2 x 10 Gb/s]
Production IP ESnet core: 10 Gbps enterprise IP traffic; ESnet IP core ≥10 Gbps
Science Data Network (ESnet 2nd core): 30-50 Gbps, 40-60 Gbps circuit transport; lab-supplied and major international links
LHCNet Data Network: 2 to 8 x 10 Gbps US-CERN
LHCNet US-CERN wavelength triangle: 10/05: 10G CHI + 10G NYC; 2007: 20G + 20G; 2009: ~40G + 40G
ESnet MANs to FNAL & BNL; dark fiber (60 Gbps) to FNAL
NSF/IRNC circuit: GVA-AMS connection via SURFnet or GEANT2
Global Lambdas for Particle Physics: Caltech/CACR and FNAL/SLAC Booths
Preview global-scale data analysis of the LHC Era (2007-2020+), using next-generation networks and intelligent grid systems
Using state of the art WAN infrastructure and Grid-based Web service frameworks, based on the LHC Tiered Data Grid Architecture
Using a realistic mixture of streams: organized transfer of multi-TB event datasets, plus numerous smaller flows of physics data that absorb the remaining capacity.
The analysis software suites are based on the Grid-enabled Analysis Environment (GAE) developed at Caltech and U. Florida, as well as Xrootd from SLAC and dCache from FNAL
Monitored by Caltech’s MonALISA global monitoring and control system
Global Lambdas for Particle Physics: Caltech/CACR and FNAL/SLAC Booths
We used twenty-two [*] 10 Gbps waves to carry bidirectional traffic between Fermilab, Caltech, SLAC, BNL, CERN and other partner Grid Service sites, including Michigan, Florida, Manchester, Rio de Janeiro (UERJ) and Sao Paulo (UNESP) in Brazil, Korea (KNU), and Japan (KEK)
Results: 151 Gbps peak, 100+ Gbps of throughput sustained for hours
475 Terabytes of physics data transported in < 24 hours; 131 Gbps measured by the SCinet BWC team on 17 of our waves
Using real physics applications and production as well as test systems for data access, transport and analysis: bbcp, Xrootd, dCache, and GridFTP, plus grid analysis tool suites
Optimized Linux kernel for TCP-based protocols, including Caltech's FAST
Far surpassing our previous SC2004 BWC record of 101 Gbps
[*] 15 waves at the Caltech/CACR booth and 7 at the FNAL/SLAC booth
Monitoring NLR, Abilene/HOPI, LHCNet, USNet, TeraGrid, PWave, SCinet, Gloriad, JGN2, WHREN, other Int'l R&E nets, and 14,000+ Grid nodes simultaneously
I. Legrand
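As an illustration of how end systems feed measurements into a MonALISA repository, here is a minimal sketch using the ApMon Python client; the repository host, cluster name, node name and metric names are hypothetical, not taken from the SC|05 setup.

```python
# Minimal sketch (not from the talk): pushing host metrics into MonALISA
# via the ApMon Python client. Assumes the apmon module is installed and
# that a MonALISA service listens at the hypothetical host/port below.
import apmon

apm = apmon.ApMon(['monalisa.example.org:8884'])   # hypothetical repository
apm.sendParameters('SC05-BWC',                     # cluster name (illustrative)
                   'node01.example.org',           # node name (illustrative)
                   {'net_out_mbps': 9400.0,        # observed TX rate
                    'cpu_load': 0.85})             # 1-minute load average
apm.free()                                         # flush and stop background threads
```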
Switch and Server Interconnections at the Caltech Booth (#428)
15 10G waves
72 nodes with 280+ cores
64 10G switch ports: 2 fully populated Cisco 6509Es
45 Neterion 10 GbE NICs
200 SATA disks
40 Gbps (20 HBAs) to StorCloud
Thursday – Sunday setup
http://monalisa-ul.caltech.edu:8080/stats?page=nodeinfo_sys
Fermilab
Our BWC data sources are the production storage systems and file servers used by: CDF, DØ, US CMS Tier 1, and the Sloan Digital Sky Survey
Each of these produces, stores and moves multi-TB to PB-scale data: tens of TB per day
~600 GridFTP servers (of 1000s) directly involved
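For context, a wide-area transfer of the kind these GridFTP servers performed can be driven with globus-url-copy; the sketch below is illustrative only (endpoints and paths are hypothetical) and is wrapped in Python for consistency with the other examples.

```python
# Illustrative only: driving a parallel GridFTP transfer with globus-url-copy.
# Endpoints and paths are hypothetical; the flags shown are standard:
#   -p       number of parallel TCP streams
#   -tcp-bs  TCP buffer size per stream (bytes)
import subprocess

subprocess.run([
    'globus-url-copy',
    '-p', '8',                      # 8 parallel streams
    '-tcp-bs', '16777216',          # 16 MB TCP buffers for long fat pipes
    'gsiftp://fnal-gridftp.example.org/pnfs/cms/dataset.root',
    'gsiftp://caltech-se.example.org/data/cms/dataset.root',
], check=True)
```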
bbcp RAM disk to RAM disk transfer (CERN to Chicago): 3 TBytes of physics data transferred in 2 hours
[Chart: bbcp memory-to-memory throughput vs. time (units of 5 seconds), roughly 370,000-440,000 kBytes/sec; 16 MB window, 2 streams]
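To make the quoted transfer parameters concrete, here is a hedged sketch of a bbcp invocation with a 16 MB window and 2 streams between RAM disks; the host and file names are hypothetical, and this is only an illustration of the tool's standard options, not the exact command used for the record.

```python
# Illustrative only: a bbcp memory-to-memory transfer roughly matching the
# settings quoted above (2 streams, 16 MB TCP window). Hosts and files are
# hypothetical.
import subprocess

subprocess.run([
    'bbcp',
    '-s', '2',          # 2 parallel streams
    '-w', '16m',        # 16 MB TCP window per stream
    '-P', '5',          # progress report every 5 seconds
    '/dev/shm/events.bin',                          # source on RAM disk (CERN side)
    'user@uschi.example.org:/dev/shm/events.bin',   # destination RAM disk (Chicago side)
], check=True)
```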
Xrootd Server Performance
[Chart: Single Server Linear Scaling; network I/O in MB/sec, percent CPU remaining, and events/sec processed (0-40,000) vs. number of concurrent jobs (50-400)]
Scientific results: ad hoc analysis of multi-TByte archives; immediate exploration spurs novel discovery approaches
Linear scaling: hardware performance; deterministic sizing
High capacity: thousands of clients; hundreds of parallel streams
Very low latency: 12 µs + transfer cost; device + NIC limited; excellent across WANs
A. Hanushevsky
Xrootd Clustering
[Diagram: a Client opens file X by contacting the Redirector (head node), which asks its data servers A, B, C "Who has file X?"; the server that answers "I have" (here C) is returned to the client as "go to C". A Supervisor (sub-redirector) repeats the same dialogue for its own data servers D, E, F. The lookup logic is sketched in code below.]
Client sees all servers as xrootd data servers
Unbounded clustering: self-organizing
Total fault tolerance: automatic real-time reorganization
Result: minimum admin overhead; better client CPU utilization; more results in less time at less cost
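The redirection dialogue in the diagram above can be summarized in a few lines of Python; this is a schematic sketch of the lookup logic under the assumptions of the diagram, not the actual xrootd implementation.

```python
# Schematic sketch (not the actual xrootd code) of the redirection dialogue:
# a redirector asks its subscribers "Who has file X?" and directs the client
# to the first server (or sub-redirector subtree) that answers "I have".

def locate(node, path):
    """Return the name of the data server holding `path`, descending the cluster tree."""
    if not node.get('children'):          # a leaf: an actual data server
        return node['name'] if path in node['files'] else None
    for child in node['children']:        # redirector/supervisor: query subscribers
        found = locate(child, path)       # supervisors repeat the same protocol
        if found:
            return found                  # the client is told "go to <found>"
    return None

# A toy cluster: redirector -> {A, B, C, supervisor -> {F}}
cluster = {'name': 'redirector', 'children': [
    {'name': 'A', 'files': set()},
    {'name': 'B', 'files': set()},
    {'name': 'C', 'files': {'/store/fileX'}},
    {'name': 'supervisor', 'children': [{'name': 'F', 'files': {'/store/fileY'}}]},
]}

print(locate(cluster, '/store/fileX'))    # -> 'C'
print(locate(cluster, '/store/fileY'))    # -> 'F'
```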
[Diagram: GAE Services and ROOT Analysis running at remote sites: Caltech, UFL, Brazil, …]
Authenticated users automatically discover and initiate multiple transfers of physics datasets (ROOT files) through secure Clarens-based GAE services.
Transfers are monitored through MonALISA.
Once data arrives at the target (remote) sites, authenticated users can start analysis using the ROOT framework.
Using the Clarens ROOT viewer or the COJAC event viewer, remote data can be presented transparently to the user.
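Because Clarens services are reachable over standard RPC protocols, the discover/transfer/analyze flow described above can be sketched with Python's XML-RPC client; the server URL and method names below are hypothetical illustrations, not the actual Clarens API.

```python
# Hedged sketch: contacting a Clarens-based GAE service over XML-RPC.
# The URL and method names are hypothetical; they only illustrate the
# discover -> transfer -> analyze flow described above.
import xmlrpc.client

server = xmlrpc.client.ServerProxy('https://gae.example.edu:8443/clarens')

datasets = server.catalog.find('dataset like "Zmumu*"')           # discover ROOT files
job_id = server.transfer.start(datasets, 'tier2.example.edu')     # initiate transfers
print(server.transfer.status(job_id))                             # progress, monitored via MonALISA
```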
GLORIAD: 10 Gbps Optical Ring Around the Globe by March 2007
GLORIAD Circuits Today
10 Gbps Hong Kong-Daejon-Seattle
10 Gbps Seattle-Chicago-NYC (CANARIE contribution to GLORIAD)
622 Mbps Moscow-AMS-NYC
2.5 Gbps Moscow-AMS
155 Mbps Beijing-Khabarovsk-Moscow
2.5 Gbps Beijing-Hong Kong
1 GbE NYC-Chicago (CANARIE)
China, Russia, Korea, Japan, US, Netherlands Partnership
US: NSF IRNC Program
KNU (Korea) Main Goals
Uses the 10 Gbps GLORIAD link from Korea to the US, called BIG-GLORIAD, also part of UltraLight
Try to saturate the BIG-GLORIAD link with servers and cluster storage connected at 10 Gbps
Korea is planning to be a Tier-1 site for LHC experiments
[Diagram: Korea – BIG-GLORIAD – U.S.]
KEK (Japan) at SC05: 10GE switches on the KEK-JGN2-StarLight path
JGN2: 10G network research testbed
• Operational since 4/04
• 10 Gbps L2 between Tsukuba and Tokyo Otemachi
• 10 Gbps IP to StarLight since August 2004
• 10 Gbps L2 to StarLight since September 2005
Otemachi–Chicago OC192 link replaced by 10GE WANPHY in September 2005
Brazil HEPGrid: Rio de Janeiro (UERJ) and Sao Paulo (UNESP)
“Global Lambdas for Particle Physics”: A Worldwide Network & Grid Experiment
We have previewed the IT challenges of next-generation science at the high energy frontier (for the LHC and other major programs):
Petabyte-scale datasets
Tens of national and transoceanic links at 10 Gbps (and up)
100+ Gbps aggregate data transport sustained for hours; we reached a Petabyte/day transport rate for real physics data
We set the scale and learned to gauge the difficulty of the global networks and transport systems required for the LHC mission. But we set up, shook down and successfully ran the system in < 1 week.
We have substantive take-aways from this marathon exercise:
An optimized Linux (2.6.12 + FAST + NFSv4) kernel for data transport, after 7 full kernel-build cycles in 4 days (a generic TCP tuning sketch follows this list)
A newly optimized application-level copy program, bbcp, that matches the performance of iperf under some conditions
Extension of Xrootd, an optimized low-latency file access application for clusters, across the wide area
Understanding of the limits of 10 Gbps-capable systems under stress
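One practical ingredient behind sustained 10 Gbps flows is enlarging the host TCP buffers for long fat pipes. The sketch below is a generic illustration with example values written to standard Linux sysctls; it is not the specific 2.6.12 + FAST + NFSv4 configuration built for the challenge.

```python
# Generic illustration of host TCP tuning for high bandwidth-delay paths
# (values are examples, not the settings used in the challenge). Run as root.
TUNABLES = {
    'net/core/rmem_max': '134217728',             # max receive buffer: 128 MB
    'net/core/wmem_max': '134217728',             # max send buffer: 128 MB
    'net/ipv4/tcp_rmem': '4096 87380 134217728',  # min/default/max receive buffer
    'net/ipv4/tcp_wmem': '4096 65536 134217728',  # min/default/max send buffer
}

for key, value in TUNABLES.items():
    with open('/proc/sys/' + key, 'w') as f:      # equivalent to `sysctl -w key=value`
        f.write(value)
```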
“Global Lambdas for Particle Physics”: A Worldwide Network & Grid Experiment
We are grateful to our many network partners: SCInet, LHCNet, Starlight, NLR, Internet2’s Abilene and HOPI, ESnet, UltraScience Net, MiLR, FLR, CENIC, Pacific Wave, UKLight, TeraGrid, Gloriad, AMPATH, RNP, ANSP, CANARIE and JGN2.
And to our partner projects: US CMS, US ATLAS, D0, CDF, BaBar, US LHCNet, UltraLight, LambdaStation, Terapaths, PPDG, GriPhyN/iVDGL, LHCNet, StorCloud, SLAC IEPM, ICFA/SCIC and Open Science Grid
Our supporting agencies: DOE and NSF
And for the generosity of our vendor supporters, especially Cisco Systems, Neterion, HP, IBM, and many others, who have made this possible
And the Hudson Bay Fan Company…
Global Lambdas for Particle Physics Analysis: SC|05 Bandwidth Challenge Entry
Caltech, CERN, Fermilab, Florida, Manchester, Michigan, SLAC, Vanderbilt, Brazil, Korea, Japan, et al.
CERN's Large Hadron Collider experiments: Data/Compute/Network Intensive
Discovering the Higgs, Supersymmetry, or Extra Space Dimensions with a Global Grid
Worldwide Collaborations of Physicists Working Together; while
Developing Next-generation Global Network and Grid Systems
[Diagram: Clarens architecture. A Client connects over http/https to a Web server running Clarens (ACL, X509, Discovery), which fronts Services and 3rd-party applications, plus a Catalog, Storage, and an Analysis Sandbox; supported RPC protocols: XML-RPC, SOAP, Java RMI, JSON-RPC. Workflow: select dataset, move datasets over the network, start (remote) analysis.]
Authentication. Access control on Web Services.
Remote file access (and access control on files).
Discovery of Web Services and software.
Shell service: shell-like access to remote machines (managed by access control lists).
Proxy certificate functionality.
Virtual Organization management and role management.
User's point of access to a Grid system. Provides an environment where the user can:
Access Grid resources and services.
Execute and monitor Grid applications.
Collaborate with other users.
A one-stop shop for Grid needs.
Portals can lower the barrier for users to access Web Services and use Grid-enabled applications.