TCP transfers over high latency/bandwidth networks

Internet2 Member Meeting, HENP working group session
April 9-11, 2003, Arlington

T. Kelly, University of Cambridge
J.P. Martin-Flatin, O. Martin, CERN
S. Low, Caltech
L. Cottrell, SLAC
S. Ravot, Caltech
[email protected]
Context

High Energy Physics (HEP):
- The LHC model shows that data at the experiment will be stored at a rate of 100-1500 Mbytes/s throughout the year.
- Many Petabytes per year of stored and processed binary data will be accessed and processed repeatedly by the worldwide collaborations.
- New backbone capacities are advancing rapidly to the 10 Gbps range.

Outline:
- TCP limitation: the additive increase, multiplicative decrease policy
- TCP fairness: effect of the MTU, effect of the RTT
- New TCP implementations: Grid DT, Scalable TCP, FAST TCP, High-Speed TCP
- Internet2 Land Speed Record
- Time to recover from a single loss
[Figure: TCP throughput CERN-Chicago over the 622 Mbit/s link; throughput (Mbit/s) vs time (s)]
TCP reactivity

- The time to increase the throughput by 120 Mbit/s is larger than 6 min for a connection between Chicago and CERN.
- A single loss is disastrous:
  - A TCP connection reduces its bandwidth use by half after a loss is detected (multiplicative decrease).
  - A TCP connection increases its bandwidth use only slowly (additive increase).
- TCP throughput is much more sensitive to packet loss in WANs than in LANs.
Responsiveness (I)

The responsiveness ρ measures how quickly we go back to using the network link at full capacity after experiencing a loss, if we assume that the congestion window size is equal to the bandwidth-delay product when the packet is lost:

    ρ = (C · RTT²) / (2 · MSS)

where C is the capacity of the link and C and MSS are expressed in consistent units (e.g. bit/s and bits).
[Figure: TCP responsiveness; recovery time (s) vs RTT (ms) for C = 622 Mbit/s, C = 2.5 Gbit/s and C = 10 Gbit/s]
Responsiveness (II)

Case                                  C          RTT (ms)         MSS (bytes)           Responsiveness
Typical LAN in 1988                   10 Mb/s    [2; 20]          1460                  [1.7 ms; 171 ms]
Typical LAN today                     1 Gb/s     2 (worst case)   1460                  96 ms
Future LAN                            10 Gb/s    2 (worst case)   1460                  1.7 s
WAN Geneva <-> Chicago                1 Gb/s     120              1460                  10 min
WAN Geneva <-> Sunnyvale              1 Gb/s     180              1460                  23 min
WAN Geneva <-> Tokyo                  1 Gb/s     300              1460                  1 h 04 min
WAN Geneva <-> Sunnyvale              2.5 Gb/s   180              1460                  58 min
Future WAN CERN <-> Starlight         10 Gb/s    120              1460                  1 h 32 min
Future WAN link CERN <-> Starlight    10 Gb/s    120              8960 (Jumbo Frame)    15 min

The Linux kernel 2.4.x implements delayed acknowledgment. Due to delayed acknowledgments, the responsiveness is multiplied by two. Therefore, the values above have to be multiplied by two!
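The WAN entries in the table can be reproduced from the formula on the previous slide. A minimal sketch (the function name and argument units are ours, not from the slides):

```python
def responsiveness(capacity_bps, rtt_s, mss_bytes, delayed_ack=False):
    """Time to return to full link utilization after a single loss,
    assuming cwnd equals the bandwidth-delay product when the loss occurs:
    AIMD halves cwnd, then grows it by one MSS per RTT."""
    # Segments needed to grow cwnd back from BDP/2 to BDP
    segments = capacity_bps * rtt_s / (2 * mss_bytes * 8)
    rho = segments * rtt_s                  # rho = C * RTT^2 / (2 * MSS)
    return 2 * rho if delayed_ack else rho  # delayed ACKs halve the growth rate

# WAN Geneva <-> Chicago: C = 1 Gb/s, RTT = 120 ms, MSS = 1460 bytes
print(round(responsiveness(1e9, 0.120, 1460) / 60, 1))  # -> 10.3 (about 10 min)
```

The same call with `delayed_ack=True` doubles the result, matching the Linux 2.4.x remark above.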
Effect of the MTU on the responsiveness

[Figure: Throughput (Mb/s) vs time (s) for MTU = 1498, 3998 and 8988 bytes on a CERN (GVA) <-> Starlight (Chi) transfer]

Effect of the MTU on a transfer between CERN and Starlight (RTT = 117 ms, bandwidth = 1 Gb/s):
- A larger MTU improves the TCP responsiveness because cwnd is increased by one MSS each RTT.
- We couldn't reach wire speed with the standard MTU.
- A larger MTU reduces the per-frame overhead (saves CPU cycles, reduces the number of packets).
MTU and Fairness

Setup: two hosts at CERN (GVA) connect through a GbE switch and a POS 2.5 Gbps link to two hosts at Starlight (Chi); all hosts use 1 GE interfaces and the 1 Gb/s segment is the bottleneck.

Two TCP streams share a 1 Gb/s bottleneck, RTT = 117 ms:
- MTU = 3000 bytes; avg. throughput over a period of 7000 s = 243 Mb/s
- MTU = 9000 bytes; avg. throughput over a period of 7000 s = 464 Mb/s
- Link utilization: 70.7%

[Figure: Throughput of two streams with different MTU sizes sharing a 1 Gbps bottleneck; per-stream throughput (Mbps) and averages over the life of the connections vs time (s)]
RTT and Fairness

Setup: hosts at CERN (GVA) connect through a GbE switch to a POS 2.5 Gb/s link towards Starlight (Chi) and a POS 10 Gb/s link (10GE) towards Sunnyvale; the 1 GE interfaces form the 1 Gb/s bottleneck.

Two TCP streams share a 1 Gb/s bottleneck:
- CERN <-> Sunnyvale: RTT = 181 ms; avg. throughput over a period of 7000 s = 202 Mb/s
- CERN <-> Starlight: RTT = 117 ms; avg. throughput over a period of 7000 s = 514 Mb/s
- MTU = 9000 bytes
- Link utilization: 71.6%
[Figure: Throughput of two streams with different RTT sharing a 1 Gbps bottleneck; per-stream throughput (Mbps) and averages over the life of the connections vs time (s)]

[Figure: RTT (ms) of a TCP connection between CERN and Starlight vs time (s)]

[Figure: Throughput (Mb/s) of a TCP connection between CERN and Starlight vs time (s)]
Starlight (Chi)Starlight (Chi)CERN (GVA)CERN (GVA)
Effect of buffering on End-hostsEffect of buffering on End-hosts
SetupSetup
RTT = 117 msRTT = 117 ms Jumbo FramesJumbo Frames Transmit queue of the network Transmit queue of the network
device = 100 packets (i.e 900 device = 100 packets (i.e 900 kBytes)kBytes)
Area #1Area #1 Cwnd < BDP =>Cwnd < BDP =>
Throughput < BandwidthThroughput < Bandwidth RTT constantRTT constant Throughput = Cwnd / RTTThroughput = Cwnd / RTT
Area #2Area #2 Cwnd > BDP => Cwnd > BDP =>
Throughput = BandwidthThroughput = Bandwidth RTT increase (proportional to Cwnd)RTT increase (proportional to Cwnd)
Link utilization larger than 75%Link utilization larger than 75%
Area #2Area #2Area #1Area #1
RR RRHost Host GVAGVA
Host Host CHICHI
POS 2.5POS 2.5 Gb/sGb/s1 GE1 GE 1 GE1 GE
Buffering space on End-hosts

Link utilization is near 100% if:
- there is no congestion in the network;
- there are no transmission errors;
- buffering space = bandwidth-delay product;
- TCP buffer size = 2 * bandwidth-delay product, so that the congestion window size is always larger than the bandwidth-delay product.

[Figure: Effect of the buffering on the throughput; throughput (Mb/s) vs time (s) for txqueuelen = 100, 500, 1000 and 1500 packets]

[Figure: Effect of the buffering on the RTT; RTT (ms) vs time (s) for txqueuelen = 100, 500, 1000 and 1500 packets]

txqueuelen is the transmit queue length of the network device.
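The sizing rule above is easy to evaluate for a concrete path. A small sketch (function name ours) that computes the bandwidth-delay product and the recommended socket buffer size:

```python
def tcp_buffer_settings(capacity_bps, rtt_s):
    """Sizing rule from the slide: buffering space = BDP, and
    TCP socket buffers = 2 * BDP, so cwnd can always exceed the BDP."""
    bdp = capacity_bps * rtt_s / 8  # bandwidth-delay product in bytes
    return int(bdp), int(2 * bdp)

# CERN <-> Chicago path: 1 Gb/s, RTT = 117 ms
bdp, buf = tcp_buffer_settings(1e9, 0.117)
print(bdp, buf)  # 14625000 29250000 -> ~14.6 MB BDP, ~29.3 MB socket buffers
```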
Linux Patch “GRID DT”

Parameter tuning:
- A new parameter to better start a TCP transfer: set the value of the initial SSTHRESH.

Modifications of the TCP algorithms (RFC 2001):
- Modification of the well-known congestion avoidance algorithm: during congestion avoidance, for every acknowledgement received, cwnd increases by A * (segment size) * (segment size) / cwnd. This is equivalent to increasing cwnd by A segments each RTT; A is called the additive increment.
- Modification of the slow start algorithm: during slow start, for every acknowledgement received, cwnd increases by M segments; M is called the multiplicative increment.
- Note: A = 1 and M = 1 in TCP Reno.
- Smaller backoff: reduce the strong penalty imposed by a loss.
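The window updates described above can be sketched as follows. This is our illustrative model of the Grid DT rules, not the actual kernel patch; function names and the byte-based bookkeeping are assumptions:

```python
def on_ack(cwnd, ssthresh, mss, A=1, M=1):
    """Grid DT per-ACK window update (cwnd in bytes).
    With A = M = 1 this reduces to standard TCP Reno behavior."""
    if cwnd < ssthresh:
        return cwnd + M * mss               # slow start: +M segments per ACK
    return cwnd + A * mss * mss / cwnd      # congestion avoidance: +A segments per RTT

def on_loss(cwnd, backoff=0.5):
    """Multiplicative decrease; Grid DT allows a smaller backoff than Reno's 1/2."""
    return cwnd * (1 - backoff)

print(on_ack(1000, 50000, 1460, M=2))   # slow start: 1000 + 2*1460 = 3920
print(on_loss(100000))                  # Reno backoff: 50000.0
print(on_loss(100000, backoff=0.125))   # smaller backoff: 87500.0
```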
Grid DT

- Only the sender's TCP stack has to be modified.
- Very simple modifications to the TCP/IP stack.
- An alternative to multi-stream TCP transfers. A single stream vs. multiple streams:
  - it is simpler;
  - startup/shutdown are faster;
  - fewer keys to manage (if it is secure).
- Virtual increase of the MTU: compensates for the effect of delayed ACKs.
- Can improve "fairness":
  - between flows with different RTT;
  - between flows with different MTU.
Effect of the RTT on the fairness

- Objective: improve fairness between two TCP streams with different RTT and the same MTU.
- We can adapt the model proposed by Matt Mathis by taking into account a higher additive increment.
- Assumptions:
  - Approximate a packet loss probability p by assuming that each flow delivers 1/p consecutive packets followed by one drop.
  - Under these assumptions, the congestion windows of the flows oscillate with a period T0.
  - If the receiver acknowledges every packet, then the congestion window size opens by x (additive increment) packets each RTT.

[Figure: CWND evolution under periodic loss — a sawtooth oscillating between W/2 and W with period T0]

Relation between t and t' (A and B are the additive increments of the two streams):

    dt / dt' = RTT_A / RTT_B

Number of packets delivered by each stream in one period:

    Nb_packets_Stream_A = ∫[0,T0] W_A(t) / RTT_A dt
    Nb_packets_Stream_B = ∫[0,T0] W_B(t') / RTT_B dt'

By modifying the congestion increment dynamically according to the RTT, we can guarantee fairness among TCP connections:

    A / B = (RTT_A / RTT_B)²
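The fairness rule A / B = (RTT_A / RTT_B)² can be checked numerically. The following is our simplified sawtooth model (not a packet-level simulation): each flow loses one packet per 1/p packets delivered, grows its window by `inc` packets per RTT, and halves it on loss. With increments 7 and 3 for RTTs 181 ms and 117 ms, the delivery rates come out roughly equal:

```python
def avg_rate(rtt, inc, p=1e-4, duration=600.0):
    """Average packet rate of an AIMD flow seeing one drop per 1/p packets."""
    w, t, sent, since_drop = 10.0, 0.0, 0.0, 0.0
    while t < duration:
        sent += w                  # w packets sent during this RTT
        since_drop += w
        if since_drop >= 1.0 / p:  # periodic loss: multiplicative decrease
            w, since_drop = w / 2, 0.0
        else:
            w += inc               # additive increase per RTT
        t += rtt
    return sent / duration         # packets per second

slow = avg_rate(rtt=0.181, inc=7)  # A = 7
fast = avg_rate(rtt=0.117, inc=3)  # B = 3
print(round(slow / fast, 2))       # close to 1.0 => comparable throughput
```

With equal increments (plain Reno, A = B = 1) the same model gives the longer-RTT flow a markedly lower rate, as on the earlier measurement slide.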
[Figure: Throughput of two streams with different RTT sharing a 1 Gbps bottleneck under Grid DT (A = 7 for RTT = 181 ms; B = 3 for RTT = 117 ms); per-stream throughput (Mbps) and averages over the life of the connections vs time (s)]
Effect of the RTT on the fairness

TCP Reno performance (see the "RTT and Fairness" slide):
- First stream GVA <-> Sunnyvale: RTT = 181 ms; avg. throughput over a period of 7000 s = 202 Mb/s
- Second stream GVA <-> CHI: RTT = 117 ms; avg. throughput over a period of 7000 s = 514 Mb/s
- Link utilization: 71.6%

Grid DT tuning in order to improve fairness between two TCP streams with different RTT:
- First stream GVA <-> Sunnyvale: RTT = 181 ms, additive increment A = 7; average throughput = 330 Mb/s
- Second stream GVA <-> CHI: RTT = 117 ms, additive increment B = 3; average throughput = 388 Mb/s
- Link utilization: 71.8%

The chosen increments approximately satisfy the fairness rule:

    (RTT_A / RTT_B)² = (181 / 117)² ≈ 2.39    ≈    A / B = 7 / 3 ≈ 2.33

Setup: same testbed as before — hosts at CERN (GVA) behind a GbE switch, a POS 2.5 Gb/s link to Starlight (CHI), a POS 10 Gb/s link (10GE) to Sunnyvale, and the 1 Gb/s segment as the bottleneck.
Effect of the MTU

[Figure: Throughput of two streams with different MTU sizes sharing a 1 Gbps bottleneck under Grid DT; per-stream throughput (Mbps) and averages over the life of the connections vs time (s)]

Two TCP streams share a 1 Gb/s bottleneck between CERN (GVA) and Starlight (Chi), RTT = 117 ms:
- MTU = 3000 bytes; additive increment = 3; avg. throughput over a period of 6000 s = 310 Mb/s
- MTU = 9000 bytes; additive increment = 1; avg. throughput over a period of 6000 s = 325 Mb/s
- Link utilization: 61.5%

Setup: same testbed — two hosts on each side of a GbE switch and a POS 2.5 Gb/s link, with the 1 Gb/s segment as the bottleneck.
Next Work

Taking into account the value of the MTU in the evaluation of the additive increment. Define a reference:

    A(MTU) = 1                  if MTU >= MTU_REF
    A(MTU) = MTU_REF / MTU      if MTU < MTU_REF

For example, with the reference MTU = 9000 bytes => add. increment = 1:
- MTU = 1500 bytes => add. increment = 6
- MTU = 3000 bytes => add. increment = 3

Taking into account the square of the RTT in the evaluation of the additive increment. Define a reference:

    A(RTT) = 1                  if RTT <= RTT_REF
    A(RTT) = (RTT / RTT_REF)²   if RTT > RTT_REF

For example, with the reference RTT = 10 ms => add. increment = 1:
- RTT = 100 ms => add. increment = 100
- RTT = 200 ms => add. increment = 400

Combining the two formulas above:

    Additive Increment = f(RTT², MTU)

- Periodic evaluation of the RTT and the MTU.
- How to define the references?
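A sketch of the proposed rule. The slides leave the exact combination f(RTT², MTU) open; taking the product of the two factors is one plausible reading, and the function and constant names are ours:

```python
RTT_REF_MS = 10  # reference RTT in ms  (=> increment 1)
MTU_REF = 9000   # reference MTU in bytes (=> increment 1)

def additive_increment(rtt_ms, mtu_bytes):
    """Combine the RTT^2 and MTU scalings; assumption: multiply the factors."""
    rtt_factor = max(1.0, (rtt_ms / RTT_REF_MS) ** 2)
    mtu_factor = max(1.0, MTU_REF / mtu_bytes)
    return rtt_factor * mtu_factor

print(additive_increment(200, 9000))  # 400.0 (matches the RTT example)
print(additive_increment(10, 1500))   # 6.0 (matches the MTU example)
```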
Scalable TCP

For cwnd > lwnd, replace AIMD with a new algorithm:
- for each ACK in an RTT without loss:
      cwnd_{i+1} = cwnd_i + a
- for each window experiencing loss:
      cwnd_{i+1} = cwnd_i - (b * cwnd_i)

Kelly's proposal during his internship at CERN: (lwnd, a, b) = (16, 0.01, 0.125)
- a trade-off between fairness, stability, variance and convergence

Advantages:
- responsiveness improves dramatically for gigabit networks
- responsiveness is independent of capacity
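The capacity-independence claim follows directly from the update rule: receiving one ACK per packet, cwnd gains a per ACK and there are cwnd ACKs per RTT, so cwnd grows by the factor (1 + a) each RTT regardless of its size. A short check with Kelly's constants:

```python
import math

A, B = 0.01, 0.125  # Kelly's (a, b); lwnd is irrelevant at large windows

def recovery_time(rtt_s):
    """Time to regain the window lost to one backoff: cwnd drops by (1 - b),
    then grows by the factor (1 + a) per RTT -- independent of capacity."""
    rtts = math.log(1 / (1 - B)) / math.log(1 + A)
    return rtts * rtt_s

print(round(recovery_time(0.200), 1))  # -> 2.7 s at RTT = 200 ms
```

This reproduces the 2.7 s figure quoted on the benchmarking slide below, whereas AIMD recovery time grows linearly with link capacity.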
Scalable TCP: Responsiveness Independent of Capacity
Scalable TCP vs. TCP NewReno: Benchmarking

Bulk throughput tests with C = 2.5 Gbit/s; flows transfer 2 Gbytes and start again, for 1200 s:

Number of flows    2.4.19 TCP    2.4.19 TCP + new dev driver    Scalable TCP
1                  7             16                             44
2                  14            39                             93
4                  27            60                             135
8                  47            86                             140
16                 66            106                            142

Responsiveness for RTT = 200 ms and MSS = 1460 bytes:
- Scalable TCP: 2.7 s
- TCP NewReno (AIMD):
  - ~3 min at 100 Mbit/s
  - ~1 h 10 min at 2.5 Gbit/s
  - ~4 h 45 min at 10 Gbit/s

For details, see the paper and code at: http://www-lce.eng.cam.ac.uk/~ctk21/scalable/
Fast TCP

Equilibrium properties:
- uses end-to-end delay and loss
- achieves any desired fairness, expressed by a utility function
- very high utilization (99% in theory)

Stability properties:
- stability for arbitrary delay, capacity, routing & load
- robust to heterogeneity, evolution, ...
- good performance:
  - negligible queueing delay & loss (with ECN)
  - fast response
FAST TCP performance

FAST, standard MTU; utilization averaged over more than 1 hr:

                       1 flow    2 flows    7 flows    9 flows    10 flows
Averaging period       1 hr      1 hr       6 hr       1.1 hr     6 hr
Average utilization    95%       92%        90%        90%        88%
FAST TCP performance

FAST vs. Linux TCP, standard MTU; utilization averaged over 1 hr:

                       Linux TCP (txq=100)    Linux TCP (txq=10000)    FAST
2 Gb/s bottleneck      19%                    27%                      92%
1 Gb/s bottleneck      16%                    48%                      95%
Internet2 Land Speed Record

- On February 27-28, 2003, over a Terabyte of data was transferred in less than an hour between the Level(3) Gateway in Sunnyvale, near SLAC, and CERN.
- The data passed through the TeraGrid Router at StarLight from memory to memory as a single TCP/IP stream at an average rate of 2.38 Gbit/s (using large windows and 9 KByte "jumbo frames").
- This beat the former record by a factor of approximately 2.5 and used the US-CERN link at 99% efficiency.
Internet2 LSR testbed
Conclusion

To achieve high throughput over high latency/bandwidth networks, we need to:
- Set the initial slow start threshold (ssthresh) to an appropriate value for the delay and bandwidth of the link.
- Avoid loss, by limiting the max cwnd size.
- Recover fast if loss occurs:
  - larger cwnd increment
  - smaller window reduction after a loss
  - larger packet size (Jumbo Frame)

Open questions:
- Is the standard MTU the largest bottleneck?
- How to define fairness?
  - taking into account the MTU
  - taking into account the RTT
- Which is the best new TCP implementation?