Transcript

Global Lambdas and Grids for Particle Physics in the LHC Era

Harvey B. Newman, California Institute of Technology

SC2005, Seattle, November 14-18, 2005

Beyond the SM: Great Questions of Particle Physics and Cosmology

1. Where does the pattern of particle families and masses come from?

2. Where are the Higgs particles; what is the mysterious Higgs field?

3. Why do neutrinos and quarks oscillate?

4. Is Nature supersymmetric?

5. Why is any matter left in the universe?

6. Why is gravity so weak?

7. Are there extra space-time dimensions?

You Are Here.

We do not know what makes up 95% of the universe.

Large Hadron Collider, CERN, Geneva: 2007 Start

pp, √s = 14 TeV, L = 10^34 cm^-2 s^-1

27 km Tunnel in Switzerland & France

ATLAS and CMS: pp, general purpose; HI

LHCb: B-physics

ALICE: HI

TOTEM

Higgs, SUSY, Extra Dimensions, CP Violation, QG Plasma, … the Unexpected

5000+ Physicists, 250+ Institutes, 60+ Countries

Challenges: Analyze petabytes of complex data cooperatively; harness global computing, data & network resources

LHC Data Grid Hierarchy

[Diagram: the Experiment's Online System receives ~PByte/sec from the detector and sends ~150-1500 MBytes/sec to the Tier 0 +1 CERN Center (PBs of disk; tape robot). CERN connects to the Tier 1 centers (FNAL, IN2P3, INFN, RAL) over ~10 Gbps and 10-40 Gbps links; Tier 1 centers connect to Tier 2 centers at ~1-10 Gbps; Tier 3 (institutes) and Tier 4 (workstations, physics data caches) attach at 1 to 10 Gbps.]

CERN/Outside Resource Ratio ~1:2; Tier0 : (Sum of Tier1s) : (Sum of Tier2s) ~ 1:1:1

Tens of Petabytes by 2007-8. An Exabyte ~5-7 Years later.

Emerging Vision: A Richly Structured, Global Dynamic System
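As a rough guide to what the link speeds in the tier diagram imply, here is a minimal Python sketch; the nominal bandwidths are illustrative figures read off the diagram (not a specification), and the 80% efficiency factor is an assumption.

```python
# Illustrative sketch only: nominal link speeds read off the tier diagram above.
# The efficiency factor and the "10 TB dataset" are assumptions for a rough estimate.

TIER_LINKS_GBPS = {
    "Online -> Tier 0 (CERN)": 1500 * 8 / 1000,   # ~150-1500 MBytes/sec, upper end ~12 Gbps
    "Tier 0 <-> Tier 1":        10,               # ~10 Gbps (10-40 Gbps range in the diagram)
    "Tier 1 <-> Tier 2":        10,               # ~1-10 Gbps, upper end
    "Tier 2 -> Tier 3/4":       1,                # 1 to 10 Gbps, lower end
}

def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours to move dataset_tb terabytes over a link_gbps link at the given efficiency."""
    bits = dataset_tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600

if __name__ == "__main__":
    for link, gbps in TIER_LINKS_GBPS.items():
        print(f"{link}: 10 TB in ~{transfer_hours(10, gbps):.1f} h at {gbps:.0f} Gbps")
```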

[Chart: ESnet Monthly Accepted Traffic through May 2005, in TBytes/Month (scale 0-600), plotted from February 1990 to April 2005.]

Long Term Trends in Network Traffic Volumes: 300-1000X/10Yrs

SLAC Traffic Growth in Steps: ~10X/4 Years. Projected: ~2 Terabits/s by ~2014. “Summer” ‘05: 2x10 Gbps links: one for production, one for R&D

W. Johnston

R. Cottrell

Progress in Steps

10 Gbit/s

[Chart: accepted traffic in TBytes per month (scale 100-600).]

ESnet Accepted Traffic 1990 – 2005: Exponential Growth of +82%/Year for the Last 15 Years; 400X Per Decade
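The quoted growth rates are easy to verify; a minimal sketch, assuming simple annual compounding of the figures on this slide:

```python
# Quick consistency check of the growth figures quoted above (compounding assumed).
annual_growth = 0.82                       # +82% per year (ESnet, last 15 years)
per_decade = (1 + annual_growth) ** 10
print(f"(1.82)^10 ~= {per_decade:.0f}x per decade")   # ~400x, as stated

# The broader 300-1000x-per-decade trend corresponds to roughly 77-100% annual growth:
for factor in (300, 1000):
    rate = factor ** (1 / 10) - 1
    print(f"{factor}x/decade ~= {rate:.0%}/year")
```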

Internet2 Land Speed Record (LSR)

IPv4 multi-stream record with FAST TCP: 6.86 Gbps X 27 kkm, Nov. 2004

IPv6 record: 5.11 Gbps between Geneva and Starlight, Jan. 2005

Disk-to-disk marks: 536 MBytes/sec (Windows); 500 MBytes/sec (Linux)

End-system issues: PCI-X bus, Linux kernel, NIC drivers, CPU

NB: Manufacturers’ Roadmaps for 2006: One Server Pair to One 10G Link

Nov. 2004 Record Network

[Chart: Internet2 Land Speed Records, throughput x distance in Petabit-meters/sec (scale 0-160); blue = HEP entries. Marks include 0.4 Gbps x 12272 km, 0.9 Gbps x 10978 km, 2.5 Gbps x 10037 km, 5.4 Gbps x 7067 km, 4.2 Gbps x 16343 km, 5.6 Gbps x 10949 km, and 6.6 Gbps x 16500 km (Nov. 2004 record network).]

Internet2 LSR, single IPv4 TCP stream: 7.21 Gbps x 20675 km (7.2G X 20.7 kkm)
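The LSR metric is throughput times distance; a short sketch converting the marks read off the chart above into Petabit-meters/sec:

```python
# Convert the (throughput, distance) marks listed above into the LSR metric.
marks = [
    (0.4, 12272), (0.9, 10978), (2.5, 10037), (5.4, 7067),
    (5.6, 10949), (4.2, 16343), (6.6, 16500), (7.21, 20675),
]
for gbps, km in marks:
    pbit_m_per_s = gbps * 1e9 * km * 1e3 / 1e15
    print(f"{gbps:>5.2f} Gbps x {km:>6d} km = {pbit_m_per_s:6.1f} Pbit-m/s")
# The 7.21 Gbps x 20675 km single-stream record is ~149 Pbit-m/s,
# consistent with the top of the chart's 0-160 scale.
```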

HENP Bandwidth Roadmap for Major Links (in Gbps)

Year | Production           | Experimental            | Remarks
2001 | 0.155                | 0.622-2.5               | SONET/SDH
2002 | 0.622                | 2.5                     | SONET/SDH; DWDM; GigE Integ.
2003 | 2.5                  | 10                      | DWDM; 1 + 10 GigE Integration
2005 | 10                   | 2-4 X 10                | λ Switch; λ Provisioning
2007 | 2-4 X 10             | ~10 X 10; 40 Gbps       | 1st Gen. λ Grids
2009 | ~10 X 10 or 1-2 X 40 | ~5 X 40 or ~20-50 X 10  | 40 Gbps λ Switching
2011 | ~5 X 40 or ~20 X 10  | ~25 X 40 or ~100 X 10   | 2nd Gen λ Grids; Terabit Networks
2013 | ~Terabit             | ~MultiTbps              | ~Fill One Fiber

Continuing Trend: ~1000 Times Bandwidth Growth Per Decade; HEP: Co-Developer as well as Application Driver of Global Nets

LHCNet, ESnet Plan 2006-2009: 20-80 Gbps US-CERN, ESnet MANs, IRNC

[Map: ESnet hubs and new ESnet hubs (SEA, SNV, SDG, ELP, ALB, DEN, ATL, NYC, CHI, DC); Metropolitan Area Rings; high-speed cross connects with Internet2/Abilene; major DOE Office of Science sites; international links to Europe (GEANT2, SURFNet, IN2P3), Japan, Australia and AsiaPac.

Legend: Production IP ESnet core, 10 Gbps enterprise IP traffic; ESnet IP core ≥10 Gbps; ESnet 2nd core (Science Data Network), 30-50G, 40-60 Gbps circuit transport; lab-supplied and major international links; metro rings at 10 Gb/s, 2 x 10 Gb/s and 30 Gb/s; LHCNet Data Network, 2 to 8 x 10 Gbps US-CERN; NSF/IRNC circuit, GVA-AMS connection via SURFnet or GEANT2.]

LHCNet US-CERN Wavelength Triangle (CERN, FNAL, BNL): 10/05: 10G CHI + 10G NY; 2007: 20G + 20G; 2009: ~40G + 40G

ESnet MANs to FNAL & BNL; dark fiber (60 Gbps) to FNAL

IRNC Links

Global Lambdas for Particle Physics: Caltech/CACR and FNAL/SLAC Booths

Preview global-scale data analysis of the LHC Era (2007-2020+), using next-generation networks and intelligent grid systems

Using state of the art WAN infrastructure and Grid-based Web service frameworks, based on the LHC Tiered Data Grid Architecture

Using a realistic mixture of streams: organized transfer of multi-TB event datasets, plus numerous smaller flows of physics data that absorb the remaining capacity.

The analysis software suites are based on the Grid-enabled Analysis Environment (GAE) developed at Caltech and U. Florida, as well as Xrootd from SLAC and dCache from FNAL.

Monitored by Caltech’s MonALISA global monitoring and control system

Global Lambdas for Particle Physics: Caltech/CACR and FNAL/SLAC Booths

We used Twenty Two [*] 10 Gbps waves to carry bidirectional traffic between Fermilab, Caltech, SLAC, BNL, CERN and other partner Grid Service sites including: Michigan, Florida, Manchester, Rio de Janeiro (UERJ) and Sao Paulo (UNESP) in Brazil, Korea (KNU), and Japan (KEK)

Results: 151 Gbps peak, 100+ Gbps of throughput sustained for hours:

475 Terabytes of physics data transported in < 24 hours; 131 Gbps measured by the SCInet BWC team on 17 of our waves

Using real physics applications and production as well as test systems for data access, transport and analysis: bbcp, xrootd, dCache, and gridftp; and grid analysis tool suites

Linux kernel for TCP-based protocols, including Caltech's FAST

Far surpassing our previous SC2004 BWC record of 101 Gbps

[*] 15 waves at the Caltech/CACR booth and 7 at the FNAL/SLAC booth

Monitoring NLR, Abilene/HOPI, LHCNet, USNet,TeraGrid, PWave, SCInet, Gloriad, JGN2, WHREN, other Int’l R&E Nets, and 14000+ Grid Nodes Simultaneously

I. Legrand
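A back-of-the-envelope check of the headline numbers above (a minimal sketch; decimal units assumed):

```python
# Rough consistency check of the Bandwidth Challenge figures quoted above.
SECONDS_PER_DAY = 86400

# 100 Gbps sustained corresponds to roughly a petabyte per day:
pb_per_day = 100e9 * SECONDS_PER_DAY / 8 / 1e15
print(f"100 Gbps sustained ~= {pb_per_day:.2f} PB/day")        # ~1.08 PB/day

# 475 TB moved in under 24 hours implies an average of at least ~44 Gbps:
avg_gbps = 475e12 * 8 / SECONDS_PER_DAY / 1e9
print(f"475 TB in 24 h ~= {avg_gbps:.0f} Gbps average")
```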

Switch and Server Interconnections at the Caltech Booth (#428)

15 10G Waves

72 nodes with 280+ Cores

64 10G Switch Ports: 2 Fully Populated Cisco 6509Es

45 Neterion 10 GbE NICs

200 SATA Disks

40 Gbps (20 HBAs) to StorCloud

Thursday – Sunday Setup

http://monalisa-ul.caltech.edu:8080/stats?page=nodeinfo_sys

Fermilab: Our BWC data sources are the Production Storage Systems and File Servers used by CDF, DØ, US CMS Tier 1, and the Sloan Digital Sky Survey.

Each of these produces, stores and moves Multi-TB to PB-scale data: Tens of TB per day

~600 gridftp servers (of 1000s) directly involved

bbcp ramdisk to ramdisk transfer (CERN to Chicago): 3 TBytes of Physics Data transferred in 2 Hours

[Chart: throughput in kBytes/sec (~370,000-440,000) vs. time in units of 5 seconds (samples 1-1027); 16 MB window, 2 streams.]
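A quick check that the quoted transfer volume and the plotted rate agree (a minimal sketch, decimal units assumed):

```python
# Consistency check for the bbcp ramdisk-to-ramdisk transfer quoted above.
tb_moved = 3.0          # TBytes of physics data
hours = 2.0
avg_mb_s = tb_moved * 1e12 / (hours * 3600) / 1e6
print(f"Average rate: ~{avg_mb_s:.0f} MBytes/sec (~{avg_mb_s * 8 / 1000:.1f} Gbps)")
# ~417 MBytes/sec, consistent with the ~370,000-440,000 kBytes/sec band in the plot.
```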

Xrootd Server Performance

[Chart: Single Server Linear Scaling. Network I/O in MB/sec and percent CPU remaining (0-100), and events/sec processed (0-40,000), vs. number of concurrent jobs (50-400).]

Scientific Results: Ad hoc analysis of Multi-TByte archives; immediate exploration spurs novel discovery approaches

Linear Scaling: Hardware performance; deterministic sizing

High Capacity: Thousands of clients; hundreds of parallel streams

Very Low Latency: 12 µs + transfer cost; device + NIC limited; excellent across WANs

A. Hanushevsky
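To make the "12 µs + transfer cost" figure concrete, here is a small sketch; the object sizes and the 10 Gbps link speed are assumptions chosen for illustration, not measurements from the booth:

```python
# Illustrative only: per-request time = fixed xrootd latency (~12 us) + transfer cost.
# Object sizes and the 10 Gbps link are assumptions for the example.
FIXED_LATENCY_S = 12e-6
LINK_BYTES_PER_S = 10e9 / 8      # 10 Gbps NIC, ignoring protocol overhead

for obj_kb in (1, 64, 1024):
    transfer = obj_kb * 1e3 / LINK_BYTES_PER_S
    total = FIXED_LATENCY_S + transfer
    print(f"{obj_kb:>5} kB object: {total * 1e6:7.1f} us/request, "
          f"~{1 / total:,.0f} requests/s per stream")
```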

Xrootd Clustering

[Diagram: a Client issues "open file X" to the Redirector (head node); the Redirector asks its cluster of Data Servers (A, B, C) "Who has file X?"; a server answers "I have" and the client is told "go to C". A Supervisor (sub-redirector) repeats the same "Who has file X?" query over its own data servers (D, E, F) and answers for them ("I have", "go to F"), so clusters can be nested. The client sees all servers as xrootd data servers.]

Unbounded Clustering: Self-organizing

Total Fault Tolerance: Automatic real-time reorganization

Result: Minimum admin overhead; better client CPU utilization; more results in less time at less cost
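The redirect logic in the diagram can be illustrated with a toy model. This is not the xrootd implementation, just a minimal Python sketch of the query/redirect flow; the server names and file placements are made up:

```python
# Toy model of the xrootd redirect logic shown above -- not the real implementation.
# A redirector polls its members ("Who has file X?") and redirects the client;
# a supervisor (sub-redirector) answers on behalf of its own cluster.

class DataServer:
    def __init__(self, name, files):
        self.name, self.files = name, set(files)

    def locate(self, path):
        # Returns itself if it holds the file, else None.
        return self if path in self.files else None

class Redirector:
    """Head node or supervisor: fans the query out to its members."""
    def __init__(self, name, members):
        self.name, self.members = name, members

    def locate(self, path):
        for member in self.members:
            hit = member.locate(path)       # member is a data server or a nested supervisor
            if hit is not None:
                return hit
        return None

# Hypothetical layout: servers A-C under the head node, D-F under a supervisor.
supervisor = Redirector("supervisor", [DataServer(n, f) for n, f in
                                       [("D", []), ("E", []), ("F", ["fileX"])]])
head = Redirector("redirector", [DataServer("A", []), DataServer("B", []),
                                 DataServer("C", ["fileY"]), supervisor])

for path in ("fileX", "fileY"):
    server = head.locate(path)
    print(f"open {path} -> go to {server.name}")   # client is redirected, then opens there
```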

Remote Sites: Caltech, UFL, Brazil…..

GAE Services

[Diagram: GAE Services and ROOT Analysis instances at each remote site.]

Authenticated users automatically discover, and initiate multiple transfers of, physics datasets (ROOT files) through secure Clarens-based GAE services.

Transfers are monitored through MonALISA.

Once data arrives at the target (remote) sites, analysis can be started by authenticated users using the ROOT analysis framework.

Using the Clarens ROOT viewer or the COJAC event viewer, data from remote sites can be presented transparently to the user.
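Clarens exposes its services over XML-RPC/HTTPS (among other protocols) with certificate-based authentication. The sketch below shows how a client might call such services; the endpoint URL, proxy path, and method names (catalog.list, transfer.start, transfer.status) are placeholders for illustration, not the actual Clarens API.

```python
# Minimal sketch of a client talking to a Clarens-style XML-RPC service over HTTPS.
# Endpoint URL, certificate path, and method names below are hypothetical.
import ssl
import xmlrpc.client

ctx = ssl.create_default_context()
ctx.load_cert_chain("/tmp/x509up_u1000")          # hypothetical grid proxy / X.509 cert

server = xmlrpc.client.ServerProxy(
    "https://gae.example.edu:8443/clarens/",      # hypothetical Clarens endpoint
    context=ctx,
)

# Discover available datasets, request a transfer, then poll its status
# (method names are illustrative only).
datasets = server.catalog.list("/store/physics/*.root")
job_id = server.transfer.start(datasets[0], "se.remote-site.example.org")
print("transfer status:", server.transfer.status(job_id))
```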

SC|05 Abilene and HOPI Waves

GLORIAD: 10 Gbps Optical Ring Around the Globe by March 2007

GLORIAD Circuits Today

10 Gbps Hong Kong-Daejon-Seattle

10 Gbps Seattle-Chicago-NYC (CANARIE contribution to GLORIAD)

622 Mbps Moscow-AMS-NYC

2.5 Gbps Moscow-AMS

155 Mbps Beijing-Khabarovsk-Moscow

2.5 Gbps Beijing-Hong Kong

1 GbE NYC-Chicago (CANARIE)

China, Russia, Korea, Japan, US, Netherlands Partnership

US: NSF IRNC Program

ESLEA/UKLight SC|05 Network Diagram

[Diagram: OC-192 and 6 X 1 GE links.]

KNU (Korea) Main Goals

Uses 10Gbps GLORIAD link from Korea to US, which is called BIG-GLORIAD, also part of UltraLight

Try to saturate this BIG-GLORIAD link with servers and cluster storage connected at 10 Gbps

Korea is planning to be a Tier-1 site for LHC experiments

Korea – U.S.: BIG-GLORIAD

KEK (Japan) at SC05: 10GE switches on the KEK-JGN2-StarLight path

JGN2: 10G Network Research Testbed
• Operational since 4/04
• 10 Gbps L2 between Tsukuba and Tokyo Otemachi
• 10 Gbps IP to StarLight since August 2004
• 10 Gbps L2 to StarLight since September 2005

Otemachi–Chicago OC-192 link replaced by 10GE WANPHY in September 2005

Brazil HEPGrid: Rio de Janeiro (UERJ) and Sao Paulo (UNESP)

“Global Lambdas for Particle Physics”: A Worldwide Network & Grid Experiment

We have previewed the IT challenges of next-generation science at the high energy frontier (for the LHC and other major programs): Petabyte-scale datasets; tens of national and transoceanic links at 10 Gbps (and up); 100+ Gbps of aggregate data transport sustained for hours. We reached a Petabyte/day transport rate for real physics data.

We set the scale and learned to gauge the difficulty of the global networks and transport systems required for the LHC mission. But we set up, shook down and successfully ran the system in < 1 week.

We have substantive take-aways from this marathon exercise:

An optimized Linux (2.6.12 + FAST + NFSv4) kernel for data transport, after 7 full kernel-build cycles in 4 days

A newly optimized application-level copy program, bbcp, that matches the performance of iperf under some conditions

Extension of Xrootd, an optimized low-latency file access application for clusters, across the wide area

Understanding of the limits of 10 Gbps-capable systems under stress

“Global Lambdas for Particle Physics”: A Worldwide Network & Grid Experiment

We are grateful to our many network partners: SCInet, LHCNet, Starlight, NLR, Internet2’s Abilene and HOPI, ESnet, UltraScience Net, MiLR, FLR, CENIC, Pacific Wave, UKLight, TeraGrid, Gloriad, AMPATH, RNP, ANSP, CANARIE and JGN2.

And to our partner projects: US CMS, US ATLAS, D0, CDF, BaBar, US LHCNet, UltraLight, LambdaStation, Terapaths, PPDG, GriPhyN/iVDGL, LHCNet, StorCloud, SLAC IEPM, ICFA/SCIC and Open Science Grid

Our Supporting Agencies: DOE and NSF. And for the generosity of our vendor supporters, especially Cisco Systems, Neterion, HP, IBM, and many others, who have made this possible

And the Hudson Bay Fan Company…

Extra Slides Follow

Global Lambdas for Particle Physics Analysis: SC|05 Bandwidth Challenge Entry

Caltech, CERN, Fermilab, Florida, Manchester, Michigan, SLAC, Vanderbilt, Brazil, Korea, Japan, et al

CERN's Large Hadron Collider experiments: Data/Compute/Network Intensive

Discovering the Higgs, SuperSymmetry, or Extra Space-Dimensions - with a Global Grid

Worldwide Collaborations of Physicists Working Together; while

Developing Next-generation Global Network and Grid Systems

[Diagram: a Client connects via http/https to a Web server; the Clarens layer (ACL, X509, Discovery) exposes Services and 3rd-party applications over XML-RPC, SOAP, Java RMI and JSON-RPC.]

[Diagram: the user selects a dataset from the Catalog; datasets move from Storage across the Network; (remote) analysis starts in the Analysis Sandbox.]

Authentication.

Access control on Web Services.

Remote file access (and access control on files).

Discovery of Web Services and Software.

Shell service: shell-like access to remote machines (managed by access control lists).

Proxy certificate functionality.

Virtual Organization management and role management.

User's point of access to a Grid system. Provides an environment where the user can:

Access Grid resources and services. Execute and monitor Grid applications. Collaborate with other users.

A one-stop shop for Grid needs. Portals can lower the barrier for users to access Web Services and use Grid-enabled applications.

