
FAST TCP in Linux

Cheng Jin David Wei

http://netlab.caltech.edu/FAST/WANinLab/nsfvisit

netlab.caltech.edu

Outline

Overview of FAST TCP
Implementation Details
SC2002 Experiment Results
FAST Evaluation and WAN-in-Lab


FAST vs. Linux TCP

Distance = 10,037 km; Delay = 180 ms; MTU = 1500 B; Duration = 3600 s.
Linux TCP experiments: Jan 28-29, 2003. FAST experiments: Nov 19, 2002.

Flows  Protocol                  Throughput (Mbps)  Bmps (Peta)  Transfer (GB)
1      FAST                      925                9.28         387
2      FAST                      1,797              18.03        753
1      Linux TCP (txqlen=100)    185                1.86         78
2      Linux TCP (txqlen=100)    317                3.18         133
1      Linux TCP (txqlen=10000)  266                2.67         111
2      Linux TCP (txqlen=10000)  931                9.35         390
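The bit-meters-per-second (Bmps) figures above are simply throughput multiplied by distance, the metric used for Internet2 Land Speed Records. As a sanity check on the table, a small illustrative helper (the function name is ours, not part of the FAST code):

```c
#include <math.h>

/* Petabit-meters per second = throughput (bit/s) x distance (m) / 1e15.
 * Illustrative helper for checking the table, not part of FAST TCP. */
double pbmps(double throughput_mbps, double distance_km)
{
    return throughput_mbps * 1e6 * distance_km * 1e3 / 1e15;
}
```

For the single-flow FAST row, pbmps(925, 10037) gives about 9.28, matching the table.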


Aggregate Throughput

[Bar charts: FAST vs. Linux TCP, standard MTU; utilization averaged over 1 hr]
1 flow (1 Gbps link): Linux TCP txq=100: 19%; Linux TCP txq=10000: 27%; FAST: 92%
2 flows (2 Gbps): Linux TCP txq=100: 16%; Linux TCP txq=10000: 48%; FAST: 95%


Summary of Changes

RTT estimation: fine-grained timer.
Fast convergence to equilibrium.
Delay monitoring in equilibrium.
Pacing: reducing burstiness.


FAST TCP Flow Chart

Slow Start

Fast Convergence

Equilibrium

Loss Recovery

Normal Recovery

Time-out


RTT Estimation

Measure queueing delay.
Kernel timestamps with μs resolution.
Use SACK to increase the number of RTT samples during recovery.
Exponential averaging of RTT samples to increase robustness.
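The exponential averaging described above can be sketched in a few lines; the state variable, gain of 1/8, and function name here are illustrative assumptions, not the kernel code:

```c
#include <stdint.h>

/* Smoothed RTT estimate in microseconds (sketch; not the FAST kernel code). */
static uint32_t srtt_us;

/* Fold one RTT sample into an exponentially weighted moving average.
 * The gain of 1/8 mirrors classic TCP srtt smoothing; FAST's actual
 * constants may differ. */
uint32_t rtt_update(uint32_t sample_us)
{
    if (srtt_us == 0) {
        srtt_us = sample_us;               /* first sample seeds the estimate */
    } else {
        int32_t err = (int32_t)sample_us - (int32_t)srtt_us;
        srtt_us += err / 8;                /* move 1/8 of the way to the sample */
    }
    return srtt_us;
}
```

Each new sample moves the estimate only a fraction of the way, so a single delayed ack cannot swing the queueing-delay measurement.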


Fast Convergence

Rapidly increase or decrease cwnd toward equilibrium.

Monitor the per-ack queueing delay to avoid overshoot.


Equilibrium

Vegas-like cwnd adjustment on a large time scale -- once per RTT.

Small step-size to maintain stability in equilibrium.

Per-ack delay monitoring to enable timely detection of changes in equilibrium.


Pacing

What do we pace? The increment to cwnd.
Time-driven vs. event-driven: a trade-off between complexity and performance.
Timer resolution is important.


Time-Based Pacing

cwnd increments are scheduled at fixed intervals.

[Timeline: data and ack transmissions with cwnd increments scheduled between them at fixed intervals]
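Time-driven pacing can be sketched as follows: rather than applying the whole per-RTT window increment at once, spread it in unit steps across the RTT on a fixed-interval timer. All names and the struct layout are illustrative, not FAST's actual code:

```c
#include <stdint.h>

/* Sketch of a time-driven pacer for cwnd increments. */
typedef struct {
    uint32_t cwnd;          /* congestion window, in packets */
    uint32_t pending_inc;   /* increment still to be applied this RTT */
    uint32_t interval_us;   /* timer interval between unit increments */
} pacer;

/* Arm the pacer: spread `inc` unit increments evenly over one RTT. */
void pacer_start(pacer *p, uint32_t cwnd, uint32_t inc, uint32_t rtt_us)
{
    p->cwnd        = cwnd;
    p->pending_inc = inc;
    p->interval_us = inc ? rtt_us / inc : rtt_us;
}

/* Timer callback, fired every interval_us: apply one unit increment. */
void pacer_tick(pacer *p)
{
    if (p->pending_inc > 0) {
        p->cwnd++;
        p->pending_inc--;
    }
}
```

The cost is timer overhead: the smaller the interval, the smoother the pacing, which is why the slide stresses timer resolution.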


Event-Based Pacing

Detect sufficiently large gap between consecutive bursts and delay cwnd increment until the end of each such burst.
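This burst-boundary detection can be sketched by watching ack inter-arrival gaps; the gap threshold, struct, and names below are illustrative assumptions, not FAST's implementation:

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of an event-driven pacer: hold back the cwnd increment and
 * apply it only when an ack arrives after a large enough gap, i.e. at
 * a burst boundary. */
typedef struct {
    uint32_t cwnd;           /* congestion window, in packets */
    uint32_t deferred_inc;   /* increment withheld until the burst ends */
    uint64_t last_ack_us;    /* arrival time of the previous ack (0 = none) */
    uint32_t gap_thresh_us;  /* gap that marks a burst boundary */
} epacer;

/* Called on every ack. Returns true if this ack ended a burst and the
 * deferred increment was applied. */
bool epacer_on_ack(epacer *p, uint64_t now_us)
{
    bool burst_end = p->last_ack_us != 0 &&
                     now_us - p->last_ack_us > p->gap_thresh_us;
    p->last_ack_us = now_us;
    if (burst_end && p->deferred_inc) {
        p->cwnd += p->deferred_inc;
        p->deferred_inc = 0;
    }
    return burst_end;
}
```

Unlike the time-driven variant, this needs no extra timer, trading timer overhead for sensitivity to how cleanly bursts are separated on the path.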

SCinet Caltech-SLAC experiments

netlab.caltech.edu/FAST

SC2002 Baltimore, Nov 2002

Experiment

[Map: experiment paths linking Sunnyvale, Chicago, Baltimore, and Geneva; roughly 3000 km, 1000 km, and 7000 km segments]

C. Jin, D. Wei, S. Low, FAST Team and Partners

Theory

[Block diagram: the Internet as a distributed feedback system -- TCP sources with rates x and AQM links with prices p, coupled through the forward and backward routing matrices Rf(s) and Rb'(s)]

Highlights

FAST TCP, standard MTU; peak window = 14,255 pkts; throughput averaged over > 1 hr:
925 Mbps with a single flow/GE card; 9.28 petabit-meter/sec; 1.89 times the LSR
8.6 Gbps with 10 flows; 34.0 petabit-meter/sec; 6.32 times the LSR
21 TB transferred in 6 hours with 10 flows

Implementation: sender-side modification, delay-based.

[Chart: FAST throughput vs. the Internet2 Land Speed Record (I2 LSR) on the Geneva-Sunnyvale and Baltimore-Sunnyvale paths, for 1, 2, 7, 9, and 10 flows, on a scale up to 10 Gbps]


Network

(Sylvain Ravot, Caltech/CERN)


FAST BMPS

[Chart: FAST vs. the Internet2 Land Speed Record, bit-meters per second on the Geneva-Sunnyvale and Baltimore-Sunnyvale paths, for 1, 2, 7, 9, and 10 flows]

FAST, standard MTU; throughput averaged over > 1 hr.


Aggregate Throughput

FAST, standard MTU; utilization averaged over > 1 hr:
1 flow: 95% (1 hr)
2 flows: 92% (1 hr)
7 flows: 90% (6 hr)
9 flows: 90% (1.1 hr)
10 flows: 88% (6 hr)


Caltech-SLAC Entry

[Throughput trace annotations: rapid recovery after a possible hardware glitch; power glitch and reboot; 100-200 Mbps of ACK traffic]


Acknowledgments

Prototype: C. Jin, D. Wei
Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
Experiment/facilities:
Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
CERN: O. Martin, P. Moroni
Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
DataTAG: E. Martelli, J. P. Martin-Flatin
Internet2: G. Almes, S. Corbato
Level(3): P. Fernes, R. Struble
SCinet: G. Goddard, J. Patton
SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
StarLight: T. deFanti, L. Winkler
TeraGrid: L. Winkler

Major sponsors: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF


Evaluating FAST

End-to-end monitoring doesn't tell the whole story.

Existing network emulation (dummynet) is not always enough.

Better optimization if we can look inside and understand the real network.


Dummynet and Real Testbed


Dummynet Issues

Not running on a real-time OS -- imprecise timing.

Lack of priority scheduling of dummynet events.

Bandwidth fluctuates significantly with workload.

Much work needed to customize dummynet for protocol testing.


10 GbE Experiment

Long-distance testing of Intel 10GbE cards.

Sylvain Ravot (Caltech) achieved 2.3 Gbps using single stream with jumbo frame and stock Linux TCP.

Tested HSTCP, Scalable TCP, FAST, and stock TCP under Linux.

1500B MTU: 1.3 Gbps SNV -> CHI; 9000B MTU: 2.3 Gbps SNV -> GVA


TCP Loss Mystery

Frequent packet loss with 1500-byte MTU. None with larger MTUs.

Packet loss even when cwnd is capped at 300 - 500 packets.

Routers have a large queue size of 4,000 packets.

Packets captured at both sender and receiver using tcpdump.


How Did the Loss Happen?

[tcpdump sequence trace at the point where loss is detected]


How Can WAN-in-Lab Help?

We will know exactly where packets are lost.

We will also know the sequence of events (packet arrivals) that lead to loss.

We can either fix the problem in the network, if any, or improve the protocol.


Conclusion

FAST improves the end-to-end performance of TCP.

Many issues are still to be understood and resolved.

WAN-in-Lab can help make FAST a better protocol.