10 June 2004 Protocols for Long-Distance Networks Terena Networking Conference 2004 Rhodes

2

Overview

The PFLDnet research area

The PFLDnet Workshop series

Selected results from PFLDnet'04

Reflections

http://www-didc.lbl.gov/PFLDnet2004/

3

The PFLDnet Research Area

Protocols for Fat Long-Distance Nets
• Sustaining high-speed flows over wide areas is:
– Difficult
– Important

• Difficult due to difficulty of managing large numbers of in-flight packets

• Important due to need for scientists around the world to share information

After a period of relative neglect, PFLDnet is now a vibrant research area

4

A little more on why it's hard

In Van Jacobson's 1988 paper: “...insensitive to [noncongestive] loss until the loss rate is on the order of one packet per window.”

Then: a window was 8 packets.

Now: a window is about 83,000 packets (10,000 km at 10 Gb/s with 1500-byte packets)

So noncongestive packet loss must be less than 0.0012%
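The window and loss-rate figures above can be reproduced with a short calculation. The sketch below assumes a propagation speed in fibre of about 2 × 10^8 m/s, so a 10,000 km path has a round-trip time near 100 ms. That speed and the function names are illustrative assumptions, not figures from the talk.

```python
def window_packets(path_km, rate_bps, packet_bytes, fibre_speed_m_s=2.0e8):
    """Packets in flight needed to fill the path (bandwidth-delay product)."""
    rtt_s = 2 * path_km * 1000 / fibre_speed_m_s   # out and back (assumed speed)
    bits_in_flight = rate_bps * rtt_s
    return bits_in_flight / (packet_bytes * 8)


def loss_bound_percent(window):
    """One loss per window, expressed as a percentage of packets sent."""
    return 100.0 / window


window = window_packets(path_km=10_000, rate_bps=10e9, packet_bytes=1500)
print(round(window))                          # about 83,000 packets
print(f"{loss_bound_percent(window):.4f}%")   # the loss-rate bound quoted above
```

Changing `packet_bytes` or `path_km` shows how quickly the tolerable loss rate shrinks as the path gets longer or faster.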

5

A little more on why it's important

Many international scientific research collaborations need to transmit data at several multiples of 10 Gb/s over distances at/above 10,000 km.

• High-energy physics
• Radio astronomy
• Biomedical informatics

How to support these applications in a scalable sustainable way is a key challenge for our community.

6

The PFLDnet Workshop Series

CERN
Geneva -- Switzerland
February 3-4, 2003

Argonne National Laboratory
Chicago, Illinois -- USA
February 16-17, 2004

Early planning for spring 2005 in Europe

7


Improved algorithms for TCP
• FAST: Caltech
• H-TCP: Hamilton Institute, Ireland
• HSTCP-LP: Rice University and SLAC
• Also: HS-TCP, BiC-TCP, and S-TCP

Non-TCP but in shared IP context

Testing and evaluation

Exploring non-shared contexts

8

Critique of 'standard' AIMD TCP

Too cautious:
• only increases cwnd by one packet per RTT
• interprets every loss as congestion
• hence takes several tens of minutes to recover in a PFLDnet environment
• hence cannot fully utilize the bottleneck link

Too brutal:
• keeps growing cwnd until the queue in the bottleneck router overflows
• hence massive queues rise and fall in routers
• not good for other jitter-intolerant traffic
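The "several tens of minutes" recovery time in the critique above follows from simple arithmetic. A minimal sketch, assuming the 83,000-packet window from the earlier slide and a 100 ms round-trip time (the RTT is an assumption for a 10,000 km path, not a figure from the talk):

```python
def aimd_recovery_minutes(window_packets, rtt_s):
    """Time for standard AIMD to regrow cwnd after a single loss.

    The loss halves cwnd; additive increase then adds one packet per RTT,
    so regaining the lost half takes window_packets / 2 round trips.
    """
    rtts_needed = window_packets / 2
    return rtts_needed * rtt_s / 60


minutes = aimd_recovery_minutes(window_packets=83_000, rtt_s=0.1)
print(round(minutes))   # roughly 69 minutes for one lost packet
```

During that whole period the flow runs below the link rate, which is why a single stray loss is enough to keep the bottleneck underused.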

9

FAST: Delay-based Algorithms

Steven Low, Cheng Jin, et al. at Caltech

Consider TCP as a control system
• TCP sender injects a data rate signal
• Network provides delay and loss feedback

Uses measured delay effectively to maintain a moderate-sized queue

• hence better for other applications
• and keeps the bottleneck link fully utilized

Careful attention to stability / fairness
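The slide does not give a formula. As a rough illustration of the delay-based idea, the sketch below uses the window update rule from the published FAST TCP description, in which the window moves towards a target that keeps about `alpha` packets queued at the bottleneck. The single-flow queue model, the parameter values, and all names are assumptions made for illustration only.

```python
def fast_window_update(w, base_rtt, rtt, alpha, gamma=0.5):
    """One periodic FAST-style update: move w towards base_rtt/rtt * w + alpha."""
    target = (base_rtt / rtt) * w + alpha
    return min(2 * w, (1 - gamma) * w + gamma * target)


def simulate(capacity_pps, base_rtt, alpha, steps=200, w=10.0):
    """Idealised single flow over one bottleneck (an assumed toy model)."""
    for _ in range(steps):
        queue = max(0.0, w - capacity_pps * base_rtt)   # packets beyond the pipe
        rtt = base_rtt + queue / capacity_pps           # queueing adds delay
        w = fast_window_update(w, base_rtt, rtt, alpha)
    return w, max(0.0, w - capacity_pps * base_rtt)


w, queue = simulate(capacity_pps=1000, base_rtt=0.1, alpha=20)
print(round(w), round(queue))   # window settles at pipe size + alpha, queue at alpha
```

In this toy model the window settles where the measured queueing delay corresponds to `alpha` queued packets, so the link stays full without the queue growing until it overflows.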

10

H-TCP: Rapid recovery of cwnd

DJ Leith and RN Shorten at Hamilton Inst

Focus on the AI part of AIMD in high-speed regimes: use a quadratic function of time since last loss instead of a constant as the increase in cwnd

Consistent with standard AIMD in other regimes

Careful study of synchronization issues
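The slide gives only the shape of the H-TCP increase function. A minimal sketch of that shape follows. The one-second threshold and the coefficients are taken from the published H-TCP proposal as commonly stated and should be treated as assumptions here, as should the function name.

```python
def htcp_increase(seconds_since_loss, threshold_s=1.0):
    """Packets added to cwnd per RTT as a function of time since the last loss.

    Below the threshold the flow behaves like standard AIMD (one packet per
    RTT); beyond it the increase grows quadratically with elapsed time.
    """
    if seconds_since_loss <= threshold_s:
        return 1.0
    t = seconds_since_loss - threshold_s
    return 1.0 + 10.0 * t + (t / 2.0) ** 2


for elapsed in (0.5, 3.0, 11.0):
    print(elapsed, htcp_increase(elapsed))
```

Short-RTT, low-speed flows usually see losses within the threshold, so they never leave the standard regime, which is what makes the scheme consistent with AIMD elsewhere.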

11

HSTCP-LP: Combining High-speed and Low-priority

A Kuzmanovic and E Knightly at Rice, with L Cottrell at SLAC

Builds on earlier TCP-LP work
• AIMD but defer to other traffic [Infocom 03]

Builds on Floyd's HSTCP

Careful use of one-way delay measurements via TCP timestamp option

Effectively uses bottleneck link, but defers to other TCP traffic

12

Other Work on TCP Algorithms

HSTCP: Floyd of ICIR
• conservative improvement on AIMD

BiC: Rhee of North Carolina State
• binary search for the right cwnd value

Scalable TCP: Kelly of Cambridge
• an aggressive MIMD approach
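To see why an MIMD scheme counts as aggressive, the sketch below counts the round trips needed to regain the pre-loss window. It assumes the constants usually quoted for Scalable TCP (cwnd grows by 0.01 packet per ACK, roughly 1% per RTT, and shrinks by one eighth on loss). Those constants and the function name are assumptions, not taken from the slide.

```python
def mimd_recovery_rtts(window, growth_per_rtt=1.01, decrease=0.125):
    """Round trips for multiplicative increase to undo one multiplicative decrease."""
    cwnd = window * (1 - decrease)    # cut applied on loss
    rtts = 0
    while cwnd < window:
        cwnd *= growth_per_rtt        # about +1% per RTT (0.01 packet per ACK)
        rtts += 1
    return rtts


print(mimd_recovery_rtts(100))       # small window
print(mimd_recovery_rtts(83_000))    # PFLDnet-sized window: same count
```

Because both the cut and the growth are proportional to cwnd, recovery time does not depend on window size, unlike standard AIMD where it grows linearly with the window.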

13


Improved algorithms for TCP

Non-TCP but in shared IP context
• UDT: Univ Illinois Chicago
• XCP: MIT and USC-ISI
• eVLBI-specific: MIT



14

UDT: Congestion Control over UDP

Y Gu and R Grossman at UI-Chicago

Observation: even once a new TCP stack is created, deployment is hard

Idea: implement a good congestion control algorithm within a subroutine library using UDP kernel services

Also, rate-based algorithms with estimates of available bandwidth

15

XCP: Leveraging future router cooperation

D Katabi at MIT, with A Falk et al. at USC-ISI

Posit advanced cooperation by the bottleneck router

• hence stable moderate-sized queues
• and full use of bottleneck link
• with very rapid convergence

This will take time to get right and then deploy, but clearly a compelling idea

16

eVLBI-specific work

J Wroclawski, D Lapsley, and A Whinery at MIT (CS and Haystack Observatory)

eVLBI: two or more physically separated radio telescopes correlating data from deep-space objects in real time (very cool !!)

Needs: consistent high data rates, but can tolerate some packet loss

Edge Guided Adaptive Endpoint: innovative application-specific algorithms to optimize eVLBI efficacy

17




Testing and evaluation
• Techniques: Lawrence Berkeley Lab
• Evaluations: SLAC, Internet2, Manchester, UCL


18

Techniques to strengthen testing

B Tierney and J Lee at LBL

Make use of techniques that allow:
• testing of multiple paths on multiple days
• use of well-considered statistics
• controlled experiments

Network Tool Analysis Framework

19

Evaluations

L Cottrell at SLAC, R Hughes-Jones at Manchester, and H Bullot at EPFL

Tested many TCP stacks
• throughput
• sensitivity to distance
• stability and fairness

Several shown to be promising
• including BiC, FAST, HSTCP-LP

20

Evaluations

S Shalunov of Internet2

Tested FAST within Internet2 context
• showed three 1-Gb/s paths easily saturating the OC-48 circuit from Abilene to Georgia Tech
• in the presence of production Internet2 traffic
• the high-speed FAST flows do not disrupt conventional traffic

21





Exploring non-shared contexts
• Group Transport Protocol: UC San Diego
• VBTP: Univ Virginia
• IP-QoS for TCP: Univ College London

22

Group Transport Protocol: Rate-based protocols for Grids

R Wu and A Chien at UCSD

Emphasis on Multipoint-to-Point support in a lambda-grid environment

Dynamic lambdas over the wide area

Need for flows from several sources to converge at the site of a grid computation

Rate-based protocols the best approach in this environment

23

VBTP: Scheduling file transfers on dynamic optical networks

Veeraraghavan and Zhang at Univ Virginia, Feng at Los Alamos, Lee at Polytechnic, and Chong and Li at Colorado State Univ

Circuit-switched networks may make it difficult to fully utilize available capacity for a given task

VBTP designed as a rate-based scheme to schedule circuit resources effectively in support of file transfers

24

IP-QoS for TCP

Donato, Li, Saka, and Clarke at Univ College London

Idea: Use IP-QoS as a means of combining dependability of TCP bulk flow rates with protection of interactive traffic from over-aggressive TCP flows

Even with this help, transport protocols will need to be improved for PFLDnet environments

25

Reflections

Making effective use of high-speed wide-area networks is crucial for international collaborative research

Current TCP algorithms were not designed to support anything like the current 10,000 km 10-Gb/s combinations we now face

26

There is now renewed vitality in the PFLDnet research area

This will lead to (at least) two key benefits
• enable dramatic improvements in the effective use of high-speed wide-area network infrastructure
• clarify the boundary of applicability of shared packet-switched vs dedicated circuit-switched networks

27

Closing reference to Internet2's Land Speed Record

• rewards heroism in wide-area high-speed TCP flows
• figure of merit: product of b/s rate times distance

Single-stream IPv4 TCP record
• current: 4.2 Gb/s over 16,343 km
• previous: 5.6 Gb/s over 10,000 km
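The figure of merit explains why the current record stands even though its raw rate is lower than the previous one. A small sketch using the two entries above (the function name and the choice of terabit-metres per second as the unit are illustrative assumptions):

```python
def merit_tb_m_per_s(rate_gbps, distance_km):
    """Rate times distance.

    Gb/s x km = 1e9 b/s x 1e3 m = 1e12 bit-metres per second,
    so the plain product is already in terabit-metres per second.
    """
    return rate_gbps * distance_km


current = merit_tb_m_per_s(4.2, 16_343)
previous = merit_tb_m_per_s(5.6, 10_000)
print(f"current:  {current:,.0f} Tb·m/s")
print(f"previous: {previous:,.0f} Tb·m/s")
print(current > previous)
```

The longer path more than offsets the lower rate, so the product is what a challenger has to beat.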

Can we make these performance levels normative in high-end networks?

28
