Transcript

Present and Future Networks: an HENP Perspective

Harvey B. Newman, Caltech

HENP WG Meeting, Internet2 Headquarters, Ann Arbor

October 26, 2001
http://l3www.cern.ch/~newman/HENPWG_Oct262001.ppt

Next Generation Networks for Experiments

Major experiments require rapid access to event samples and subsets from massive data stores: up to ~500 Terabytes in 2001, Petabytes by 2002, ~100 PB by 2007, to ~1 Exabyte by ~2012. Across an ensemble of networks of varying capability.

Network backbones are advancing rapidly to the 10 Gbps range: Gbps end-to-end requirements for data flows will follow.

Advanced integrated applications, such as Data Grids, rely on seamless “transparent” operation of our LANs and WANs, with reliable, quantifiable (monitored), high performance. They depend in turn on in-depth, widespread knowledge of expected throughput.

Networks are among the Grid’s basic building blocks: where Grids interact by sharing common resources, networks are to be treated explicitly, as an active part of the Grid design.

Grids are interactive, based on a variety of networked apps: Grid-enabled user interfaces; Collaboratories.

LHC Computing Model Data Grid Hierarchy (Ca. 2005)

[Diagram: LHC Data Grid hierarchy. The Experiment's Online System feeds the Offline Farm at the CERN Computer Ctr (Tier 0 +1, ~25 TIPS) at ~100 MBytes/sec, with ~PByte/sec coming off the detector. Tier 1 centers (FNAL, IN2P3, INFN, RAL) connect at ~2.5 Gbps; Tier 2 centers connect at ~2.5 Gbps; Tier 3 institutes (~0.25 TIPS each) connect at 100 - 1000 Mbits/sec; Tier 4 workstations hold physics data caches. Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels. CERN/Outside Resource Ratio ~1:2; Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1.]

Baseline BW for the US-CERN Transatlantic Link: TAN-WG (DOE+NSF)

[Chart: Link Bandwidth (Mbps) by fiscal year]
FY2001: 310    FY2002: 622    FY2003: 1250    FY2004: 2500    FY2005: 5000    FY2006: 10000

Plan: Reach OC12 Baseline in Spring 2002; then 2X Per Year

Transatlantic Net WG (HN, L. Price) Bandwidth Requirements (Mbps) [*]

                 2001     2002     2003     2004     2005     2006
CMS               100      200      300      600      800     2500
ATLAS              50      100      300      600      800     2500
BaBar             300      600     1100     1600     2300     3000
CDF               100      300      400     2000     3000     6000
D0                400     1600     2400     3200     6400     8000
BTeV               20       40      100      200      300      500
DESY              100      180      210      240      270      300
CERN BW       155-310      622     1250     2500     5000    10000

[*] Installed BW. Maximum Link Occupancy 50% Assumed

The Network Challenge is Shared by Both Next- and Present Generation Experiments

Total U.S. Internet Traffic (Source: Roberts et al., 2001)

[Chart: U.S. Internet traffic, 1970-2010, on a log scale from ~10 bps to ~100 Pbps. ARPA & NSF data to '96, then new measurements; growth ~4X/year (~2.8X/year in the new measurements), projected forward at 4X/year toward the limit of the same % of GDP as voice. Voice crossover: August 2000.]

AMS-IX Internet Exchange Throughput: Accelerated Growth in Europe (NL)

[Charts: Hourly traffic reached 3.0 Gbps on 8/23/01; monthly traffic shows 4X growth from 2000 to 2001.]

GriPhyN iVDGL Map Circa 2002-2003 US, UK, Italy, France, Japan, Australia

[Map legend: Tier0/1, Tier2 and Tier3 facilities; 10 Gbps, 2.5 Gbps, 622 Mbps and other links]

International Virtual-Data Grid Laboratory: conduct Data Grid tests “at scale”; develop common Grid infrastructure; national and international scale Data Grid tests, leading to managed ops (GGOC).

Components: Tier1, selected Tier2 and Tier3 sites; Distributed Terascale Facility (DTF); 0.6 - 10 Gbps networks: US, Europe, transoceanic.

Possible new partners: Brazil T1, Russia T1, Pakistan T2, China T2, …

Abilene and Other Backbone Futures

Abilene partnership with Qwest extended through 2006. Backbone to be upgraded to 10 Gbps in three phases:

Complete by October 2003; detailed design being completed now; GigaPoP upgrade to start in February 2002

Capability for flexible provisioning in support of future experimentation in optical networking, in a multi-λ infrastructure

Overall approach to the new technical design and business plan is for an incremental, non-disruptive transition

Also: GEANT in Europe; Super-SINET in Japan; advanced European national networks (DE, NL, etc.)

TEN-155 and GEANT: European A&R Networks 2001-2002

GEANT: from 9/01, 10 & 2.5 Gbps
TEN-155: OC12 core
Project: 2000 - 2004

European A&R Networks are Advancing Rapidly

National Research Networks in Japan: SuperSINET

Start of operation January 2002. Support for 5 important areas:

HEP, Genetics, Nano Technology, Space/Astronomy, GRIDs

Provides 10 Gbps IP connection; direct inter-site GbE links; some connections to 10 GbE in JFY2002

HEPnet-J: will be re-constructed with MPLS-VPN in SuperSINET

IMnet: will be merged into SINET/SuperSINET

[Map: SuperSINET topology, with IP routers and optical cross-connects (OXC) at Tokyo, Osaka and Nagoya, and WDM paths / IP links to sites including Osaka U, Kyoto U, ICR Kyoto-U, Nagoya U, NIFS, NIG, KEK, Tohoku U, IMS, U Tokyo, NAO, NII Hitotsubashi, NII Chiba, ISAS, and the Internet.]

STARLIGHT: The Next Generation Optical STARTAP

StarLight, the Optical STAR TAP, is an advanced optical infrastructure and proving ground for network services optimized for high-performance applications. In partnership with CANARIE (Canada), SURFnet (Netherlands), and soon CERN.

Started this Summer. Existing fiber: Ameritech, AT&T, Qwest; MFN, Teleglobe, Global Crossing and others.

Main distinguishing features: neutral location (Northwestern University); 40 racks for co-location; 1/10 Gigabit Ethernet based; optical switches for advanced experiments (GMPLS, OBGP); 2*622 Mbps ATM connections to the STAR TAP.

Developed by EVL at UIC, iCAIR at NWU, and ANL/MCS Div.

[Diagram: STARLIGHT and STAR-TAP interconnections, including SURFnet (NL), SuperJANET4 (UK), GARR-B (It), Renater (Fr), GEANT, Abilene, ESnet, MREN, Geneva and New York.]

DataTAG Project

EU-Solicited Project: CERN, PPARC (UK), Amsterdam (NL), and INFN (IT). Main Aims:

Ensure maximum interoperability between US and EU Grid Projects; transatlantic testbed for advanced network research

2.5 Gbps wavelength-based US-CERN Link 7/2002 (higher in 2003)

Daily, Weekly, Monthly and Yearly Statistics on 155 Mbps US-CERN Link

20 - 60 Mbps used routinely; BW upgrades quickly followed by upgraded production use

Throughput changes with time: link and route upgrades, factors of 3-16 in 12 months; improvements in steps at times of upgrades. 8/01: 105 Mbps reached with 30 streams, SLAC-IN2P3. 9/1/01: 102 Mbps reached in one stream, Caltech-CERN.

See http://www-iepm.slac.stanford.edu/monitoring/bulk/. Also see the Internet2 E2E Initiative: http://www.internet2.edu/e2e

Caltech to SLAC on CALREN2: A Shared Production OC12 Network

SLAC: 4-CPU Sun; Caltech: 1 GHz PIII; GigE interfaces

Need large windows; multiple streams help. Bottleneck bandwidth ~320 Mbps; RTT 25 msec; window > 1 MB needed for a single stream. Results vary by a factor of up to 5 over time, sharing with campus traffic.
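As a quick cross-check of these figures (an illustrative sketch, not code from the measurements), the required window is simply the bandwidth-delay product of the path:

    # Back-of-envelope check of the Caltech-SLAC numbers quoted above: the TCP
    # window needed to keep the path full is the bandwidth-delay product (BDP).
    bottleneck_bps = 320e6   # ~320 Mbps bottleneck, from the measurements above
    rtt_s = 0.025            # ~25 msec RTT, from the measurements above
    bdp_bytes = bottleneck_bps * rtt_s / 8
    print(f"BDP ~ {bdp_bytes / 1e6:.2f} MBytes per stream")   # ~1.0 MB, hence "window > 1 MB"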


Max. Packet Loss Rates for Given Throughput [Mathis: BW < MSS/(RTT*Loss^0.5)]

LA-Boston, RTT 70 msec:
Throughput (Mbps)    Loss (MSS 1500 Bytes)    Loss (MSS 9128 Bytes)
10                   4E-04                    2E-02
30                   5E-05                    2E-03
100                  4E-06                    2E-04
300                  5E-07                    2E-05
1,000                4E-08                    2E-06
3,000                5E-09                    2E-07
10,000               4E-10                    2E-08

LA-CERN, RTT 170 msec:
Throughput (Mbps)    Loss (MSS 1500 Bytes)    Loss (MSS 9128 Bytes)
10                   7E-05                    3E-03
30                   8E-06                    3E-04
100                  7E-07                    3E-05
300                  8E-08                    3E-06
1,000                7E-09                    3E-07
3,000                8E-10                    3E-08
10,000               7E-11                    3E-09
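The loss rates in these tables come from inverting the Mathis bound for a target throughput. A minimal sketch (not code from the talk; including the conventional constant of ~1.22 reproduces the table entries to within rounding):

    def max_loss_rate(throughput_bps, rtt_s, mss_bytes, c=1.22):
        """Invert the Mathis bound BW ~ C*MSS/(RTT*sqrt(loss)) for the max tolerable loss.
        C ~ 1.22 is the conventional constant; use c=1.0 for the bare form quoted above."""
        mss_bits = 8 * mss_bytes
        return (c * mss_bits / (rtt_s * throughput_bps)) ** 2

    if __name__ == "__main__":
        # LA-CERN (170 msec RTT), standard 1500-Byte segments
        for gbps in (1, 10):
            loss = max_loss_rate(gbps * 1e9, 0.170, 1500)
            print(f"{gbps} Gbps: max loss ~ {loss:.0e}")   # ~7e-09 and ~7e-11, as in the table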

1 Gbps LA-CERN throughput means extremely low packet loss: ~1E-8 with standard packet size. According to the equation, a single stream with 10 Gbps throughput requires a packet loss rate of 7 x 1E-11 with standard size packets: 1 packet lost per 5 hours! LARGE windows are needed:

2.5 Gbps Caltech-CERN implies a ~53 MByte window.

Effects of a packet drop (link error) on a 10 Gbps link (MDAI): the rate is halved, to 5 Gbps, and it will take ~4 minutes for TCP to ramp back up to 10 Gbps. Large segment sizes (Jumbo Frames) could help, where supported. Motivation for exploring TCP variants and other protocols.
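The “~4 minutes” figure follows from standard TCP congestion avoidance: after a loss the congestion window is halved, and it then grows back by roughly one segment per round trip. A rough sketch under those assumptions (the RTT and segment sizes below are illustrative; ~4 minutes corresponds to roughly a 25 ms RTT with 1500-Byte segments):

    def ramp_up_time_s(link_bps, rtt_s, mss_bytes):
        """Time for additive increase (~1 MSS per RTT) to rebuild the halved
        congestion window after a single drop on an otherwise loss-free link."""
        full_window_segments = link_bps * rtt_s / (8 * mss_bytes)
        rtts_needed = full_window_segments / 2      # regain the half lost to the drop
        return rtts_needed * rtt_s

    if __name__ == "__main__":
        print(f"10 Gbps, RTT 25 ms, 1500 B: ~{ramp_up_time_s(10e9, 0.025, 1500) / 60:.1f} min")
        print(f"10 Gbps, RTT 170 ms, 9000 B: ~{ramp_up_time_s(10e9, 0.170, 9000) / 60:.0f} min")

Even with jumbo frames, a 170 ms transatlantic path takes tens of minutes to recover from a single drop under this simple model, which is part of the motivation for the TCP variants mentioned above.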

Key Network Issues & Challenges

Net Infrastructure Requirements for High Throughput:

Careful router configuration; monitoring. Enough router “horsepower” (CPUs, buffer space). Server and client CPU, I/O and NIC throughput sufficient. Packet loss must be ~zero (well below 0.1%), i.e. no “commodity” networks. No local infrastructure bottlenecks: Gigabit Ethernet “clear path” between selected host pairs, moving to 10 Gbps Ethernet by ~2003.

TCP/IP stack configuration and tuning is absolutely required: large windows, multiple streams (a sizing sketch follows below). End-to-end monitoring and tracking of performance. Close collaboration with local and “regional” network engineering staffs (e.g. router and switch configuration).
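A minimal sketch of the “large windows, multiple streams” point (the endpoint, buffer cap and stream count below are hypothetical, not settings from the talk): size each stream's socket buffers toward the bandwidth-delay product, and use several parallel streams when per-stream windows are capped:

    import math
    import socket

    def streams_needed(target_bps, rtt_s, per_stream_window_bytes):
        """Parallel TCP streams needed when each stream's window is capped below the BDP."""
        bdp = target_bps * rtt_s / 8
        return max(1, math.ceil(bdp / per_stream_window_bytes))

    if __name__ == "__main__":
        window = 1 << 20                              # 1 MB per-stream cap (illustrative)
        n = streams_needed(1e9, 0.170, window)        # 1 Gbps over a 170 ms path
        print(f"~{n} parallel streams")               # ~21 in this example

        # Request large per-stream buffers; kernels of the era also needed their
        # global limits raised (e.g. net.core.rmem_max on Linux) for this to stick.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, window)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window)
        # s.connect(("transfer.example.org", 5001))   # hypothetical endpoint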

Key Network Issues & Challenges

None of this scales from 0.08 Gbps to 10 Gbps: new (expensive) hardware; the last mile, and tenth-mile, problem; firewall performance; security issues.

Concerns: the “Wizard Gap” (ref: Matt Mathis; Jason Lee); RFC 2914 and the Network Police; “clever” firewalls; net infrastructure providers (local, regional, national, int’l) who may or may not want (or feel able) to accommodate HENP “bleeding edge” users.

New TCP/IP developments (or TCP alternatives) are required for multiuser Gbps links [UDP/RTP ?]

Internet2 HENP WG [*]

To help ensure that the required national and international network infrastructures (end-to-end), standardized tools and facilities for high performance and end-to-end monitoring and tracking, and collaborative systems are developed and deployed in a timely manner, and used effectively to meet the needs of the US LHC and other major HENP programs, as well as the general needs of our scientific community.

To carry out these developments in a way that is broadly applicable across many fields.

Forming an Internet2 WG as a suitable framework.

[*] Co-Chairs: S. McKee (Michigan), H. Newman (Caltech); Sec’y J. Williams (Indiana); with thanks to Rob Gardner (Indiana)

http://www.usatlas.bnl.gov/computing/mgmt/lhccp/henpnet/

Network-Related Hard Problems

“Query Estimation”: reliable estimate of performance. Throughput monitoring, and also modeling. Source and destination host & TCP-stack behavior.

Policy versus technical capability intersection. Strategies (new algorithms): authentication, authorization, priorities and quotas across sites. Metrics of performance; metrics of conformance to policy.

Key role of simulation (for Grids as a whole): “Now Casting”?

US CMS Remote Control Room for LHC

US CMS will use the CDF/KEK remote control room concept for Fermilab Run II as a starting point. However, we will (1) expand the scope to encompass a US based physics group and US LHC accelerator tasks, and (2) extend the concept to a Global Collaboratory for realtime data acquisition + analysis

Networks, Grids and HENP

Next generation 10 Gbps network backbones are almost here: in the US, Europe and Japan, with first stages arriving in 6-12 months. Major international links at 2.5 - 10 Gbps in 0-12 months. There are problems to be addressed in other world regions: regional, last mile and network bottlenecks and quality are all on the critical path.

High (reliable) Grid performance across networks means: end-to-end monitoring (including source/destination host software); getting high-performance toolkits into users’ hands; working with Internet2 E2E, the HENP WG and DataTAG to get this done; and iVDGL as an inter-regional effort, with a GGOC.

Among the first to face and address these issues.

Agent-Based Distributed System: JINI Prototype (Caltech/NUST)

Includes “Station Servers” (static) that host mobile “Dynamic Services”

Servers are interconnected dynamically to form a fabric in which mobile agents can travel with a payload of physics analysis tasks

Prototype is highly flexible and robust against network outages

Amenable to deployment on leading edge and future portable devices (WAP, iAppliances, etc.): “The” system for the travelling physicist.

Studies with this prototype use the MONARC Simulator, and build on the SONN study

See http://home.cern.ch/clegrand/lia/

[Diagram: Station Servers and Lookup Services interconnected; proxy exchange, registration, service listener, lookup discovery service, and remote notification.]

6800 Hosts; 36 (7 I2) Reflectors; Users in 56 Countries; Annual Growth 250%

