
Modeling TCP Throughput and Delay

M. Veeraraghavan, April 3, 2004

This writeup describes the models from two papers [1] and [2]. The model in [1] uses the throughput model of [2] for the congestion avoidance phase. There are three phases: connection establishment, Slow Start, and congestion avoidance. This is a model of the congestion control procedures implemented in the Reno version of TCP.

1 Connection establishment

We model the total delay to transfer a file of size $f$ on an end-to-end path consisting of $k$ routers in the network, with an end-to-end round-trip propagation delay $p$. The total delay consists of the time to open a connection and then the time to transfer the file. The time to open a TCP connection is

$$T_{pkt}^{open} = 3\left(\frac{h_a}{r} \times 2 + \frac{h_n}{r} \times (k-1)\right) + \frac{3}{2}\,p \qquad (1)$$

where $h_a$ and $h_n$ are the SYN (Synchronize) packet lengths¹ on the access network and the core network, respectively, and $r$ is the data rate of the links. The SYN packet lengths could differ on the access and the core if different data-link layer protocols are used, e.g., Ethernet on the access network and PPP in the core.

2 Low load (zero packet loss) delay

In Slow Start, the congestion window (cwnd) is increased exponentially with every received acknowledgement using Allman’s proposal [3], in which the increase is equal to the number of bytes acknowledged. We choose ssthresh to be at least equal to the window size at which the system reaches the streaming state. We assume that the receive window size (rwnd) does not pose a limit, i.e., it is very large. This makes the allowed window size depend only on cwnd. We further assume that an ACK is sent for every $b$ data segments (where $b$ is typically 2).

We model data transfer delays using the concept of “rounds,” as illustrated in Fig. 1. One “round” is the time from when the first segment of cwnd segments is emitted until the first ACK is received. Given our assumption that every other segment is acknowledged, this ACK will be for the first two segments of the round, for all rounds except the first. The congestion window size at which the TCP connection reaches a streaming state is given by:

$$W_{str} = RTT \times r + B \qquad (2)$$

where $RTT$ is the round trip time, and $B$ is the buffer size feeding the bottleneck link [4]. When $cwnd = W_{str}$, segments will effectively be streamed continuously by the sender without waiting for ACKs (under no loss conditions). In other words, the ACK for the first two segments sent in a round arrives just in time, as the sender completes inserting $W_{str}$ bytes onto the link. Therefore the sender can immediately start the next round, and no time is wasted waiting for an ACK. At this point, we can simply count emission delays of packets onto links and ignore round-trip propagation delays. We refer to this as the “streaming state.”

¹ These are also the packet header lengths for data packets.
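The sketch below gives a rough numerical illustration of (1) and (2) by evaluating both formulas directly. It assumes consistent units chosen only for this example: lengths in bytes, $r$ in bytes/s, and delays in seconds. The function names are hypothetical.

```python
def connection_open_time(h_a: float, h_n: float, r: float, k: int, p: float) -> float:
    """Eq. (1): time to open a TCP connection on a path with k routers.

    h_a, h_n: SYN lengths on access and core links (bytes)
    r: link data rate (bytes/s); p: round-trip propagation delay (s)
    """
    # Emission delay of one SYN-sized packet over 2 access links and k - 1 core links.
    per_packet = 2 * h_a / r + (k - 1) * h_n / r
    # SYN, SYN-ACK and ACK each pay that delay; propagation adds 1.5 round trips.
    return 3 * per_packet + 1.5 * p


def streaming_window(rtt: float, r: float, buffer_bytes: float) -> float:
    """Eq. (2): window (bytes) at which the sender never waits for an ACK."""
    return rtt * r + buffer_bytes
```

For example, `connection_open_time(60, 50, 125_000, 5, 0.01)` is about 0.0227 s. Most of that is the 1.5 round trips of propagation, because header emission times are tiny at this rate.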

In round $i$, $2^{i-1} \times m$ bytes are sent. The system reaches the streaming state in round $n_w$, where

$$n_w = \left\lceil \log_2 \frac{W_{str}}{m} \right\rceil + 1 \qquad (3)$$

and $m$ is the maximum segment size. For example, if $W_{str}/m$ is 500, then $n_w = 10$. The concept of rounds only applies until this round is reached. Beyond this point, the file is simply streamed, with acknowledgements arriving before the whole congestion window can be sent.

The file size corresponding to $n_w$ is

$$f_{str} = \left(2^{n_w} - 1\right) m. \qquad (4)$$

[Fig. 1 Visualizing TCP transfer delays: client/server timeline showing the SYN, SYN-ACK and ACK exchange, followed by data segments sent in rounds of increasing size; the round durations $T_{round1}$ and $T_{round2}$ are marked.]
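The worked example ($W_{str}/m = 500$ gives $n_w = 10$) implies that the logarithm in (3) is rounded up, because $\log_2 500 \approx 8.97$. The sketch below implements (3) and (4) on that reading. It assumes $W_{str}$ and $m$ are in the same units, and the function names are hypothetical.

```python
import math


def streaming_round(w_str: float, m: float) -> int:
    """Eq. (3): first Slow Start round n_w whose window 2**(n_w - 1) * m reaches W_str."""
    return math.ceil(math.log2(w_str / m)) + 1


def streaming_file_size(n_w: int, m: float) -> float:
    """Eq. (4): bytes sent in rounds 1..n_w, i.e. m * (1 + 2 + ... + 2**(n_w - 1))."""
    return (2 ** n_w - 1) * m
```

With $m = 1000$ bytes and $W_{str} = 500{,}000$ bytes, `streaming_round` returns 10 and `streaming_file_size(10, 1000)` returns 1,023,000 bytes.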

For file sizes $f \le f_{str}$, the TCP connection does not reach the streaming state. For such files, we determine the number of rounds needed as follows:

$$N = \left\lceil \log_2\left(\frac{f}{m} + 1\right) \right\rceil \qquad (5)$$

For $3m \le f \le f_{str}$, the total file transfer time is:

$$T_{pkt}(f) = T_{pkt}^{open} + T_{pkt}^{1}(m) + T_{pkt}^{1}(0) + p + (N-2)\,T_{round2} + T_{stream}\!\left(f - \left(2^{N-1} - 1\right) \times m\right) \qquad (6)$$

where $T_{pkt}^{open}$ is given by (1), and $T_{pkt}^{1}(m)$ is the low-load delay to send one segment end-to-end through $k$ routers, as given below:

$$T_{pkt}^{1}(q) = \frac{q + h_a}{r} \times 2 + \frac{q + h_n}{r} \times (k-1), \quad \text{for } q \le m. \qquad (7)$$

$T_{pkt}^{1}(0)$ in (6) is the time to receive an ACK for the first segment (an ACK carries no payload). To this is added a round-trip propagation delay $p$ for the first segment and its ACK. $T_{round2}$ is as depicted in Fig. 1. It includes the pipelining effect assumed for the two packets sent back-to-back, $T_{pkt}^{2}(2m)$, as indicated in (9), along with an ACK and a round-trip propagation delay:

$$T_{round2} = T_{pkt}^{2}(2 \times m) + T_{pkt}^{1}(0) + p, \qquad (8)$$

$$T_{pkt}^{2}(q) = T_{pkt}^{1}(m) + \frac{(q - m) + h_a}{r}, \quad \text{for } m \le q \le 2m. \qquad (9)$$

The multiplicative factor $N-2$ is needed for $T_{round2}$ because the first and last rounds are separated out in (6). $T_{stream}$ represents the time to send the segments in the last round. This round could carry fewer than $2^{N-1} m$ bytes if the file size in segments is not of the form $2^N - 1$. There is no wait for ACKs during this streaming, and hence we only add half the round-trip propagation delay in (10):

$$T_{stream}(q) = \begin{cases} T_{pkt}^{1}(q) + \dfrac{p}{2}, & q \le m \\[2ex] T_{pkt}^{1}(m) + \dfrac{m + h_a}{r}\left(\left\lfloor \dfrac{q}{m} \right\rfloor - 1\right) + \dfrac{q - \lfloor q/m \rfloor\, m + h_a}{r} + \dfrac{p}{2}, & q > m \end{cases} \qquad (10)$$

Equation (6) holds for file sizes $3m \le f \le f_{str}$, where $n_w$ is given by (3). For files smaller than $3m$, the total delay can be readily determined using (7) and (9).

If $f > f_{str}$, the server will send data in two phases. The first phase is Slow Start until $W_{str}$ is reached. In the second phase, data packets are streamed. The congestion window will keep increasing according to Slow Start or congestion avoidance rules, depending on the value of ssthresh. But the value of cwnd does not matter once the streaming state is reached, because packets are sent continuously without waiting for ACKs.

$$T_{pkt}(f) = T_{pkt}^{open} + T_{pkt}^{1}(m) + T_{pkt}^{1}(0) + p + n_w\,T_{round2} + T_{stream}\!\left(f - \left(2^{n_w+1} - 1\right) \times m\right), \quad \text{for } f > f_{str} \qquad (11)$$
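The sketch below strings (1) and (5) through (11) together to compute the low-load transfer time for one file size. It is a minimal illustration, not the author's code, and rests on these assumptions:

- All sizes are in bytes, $r$ is in bytes/s, and delays are in seconds.
- The SYN lengths equal the header lengths, as the footnote states.
- Equation (10) uses the floor brackets and the $T_{pkt}^{1}(m)$ first term from the reconstruction above.
- The `PathParams` container and the function names are hypothetical.

The sketch rejects inputs for which the streamed remainder in (6) or (11) would not be positive, since the equations are not meant for that range.

```python
import math
from dataclasses import dataclass


@dataclass
class PathParams:
    """Hypothetical container for the model parameters (bytes, bytes/s, seconds)."""
    m: float    # maximum segment size
    h_a: float  # header/SYN length on access links
    h_n: float  # header/SYN length on core links
    r: float    # link data rate
    k: int      # number of routers on the path
    p: float    # round-trip propagation delay


def t_open(P: PathParams) -> float:                   # Eq. (1)
    return 3 * (2 * P.h_a / P.r + (P.k - 1) * P.h_n / P.r) + 1.5 * P.p


def t_pkt1(q: float, P: PathParams) -> float:         # Eq. (7), valid for q <= m
    return 2 * (q + P.h_a) / P.r + (P.k - 1) * (q + P.h_n) / P.r


def t_pkt2(q: float, P: PathParams) -> float:         # Eq. (9), valid for m <= q <= 2m
    return t_pkt1(P.m, P) + ((q - P.m) + P.h_a) / P.r


def t_round2(P: PathParams) -> float:                 # Eq. (8)
    return t_pkt2(2 * P.m, P) + t_pkt1(0, P) + P.p


def t_stream(q: float, P: PathParams) -> float:       # Eq. (10)
    if q <= 0:
        raise ValueError("streamed remainder must be positive")
    if q <= P.m:
        return t_pkt1(q, P) + P.p / 2
    full = math.floor(q / P.m)                        # whole segments in the last burst
    return (t_pkt1(P.m, P)                            # first segment crosses the whole path
            + (P.m + P.h_a) / P.r * (full - 1)        # further whole segments, pipelined
            + (q - full * P.m + P.h_a) / P.r          # trailing partial segment
            + P.p / 2)                                # half a round trip of propagation


def transfer_time(f: float, P: PathParams, n_w: int) -> float:
    """Eq. (6) for 3m <= f <= f_str and Eq. (11) for f > f_str."""
    if f < 3 * P.m:
        raise ValueError("for f < 3m use (7) and (9) directly")
    f_str = (2 ** n_w - 1) * P.m                      # Eq. (4)
    head = t_open(P) + t_pkt1(P.m, P) + t_pkt1(0, P) + P.p
    if f <= f_str:
        n = math.ceil(math.log2(f / P.m + 1))         # Eq. (5)
        return head + (n - 2) * t_round2(P) + t_stream(f - (2 ** (n - 1) - 1) * P.m, P)
    return head + n_w * t_round2(P) + t_stream(f - (2 ** (n_w + 1) - 1) * P.m, P)
```

Example: with `PathParams(m=1000, h_a=40, h_n=40, r=1e6, k=3, p=0.01)`, a 3000-byte file takes 0.04004 s. About half of that is connection setup and the first segment/ACK exchange.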

3 Congestion avoidance [1]

Assumptions:

1. A packet is lost in a round independently of any packets lost in other rounds. This is because, when periodic UDP packets are sent, it has been observed that packets separated by even 40 ms suffer independent losses.

2. On the other hand, we assume that if a packet within a round is lost, all subsequent packets in that round are considered lost. This is because of the drop-tail queueing behavior implemented in many routers. Drop-tail queueing simply means that all remaining packets are dropped when the buffer is full. Losses are indicated either by an RTO (retransmission time-out) or by a TD (triple-duplicate ACK). A TD indication requires segments to be successfully delivered after the lost one. The assumption that a loss in a round implies the loss of the rest of that round does not rule this out. Indeed, the TD won’t be detected in that round, but it will be detected in the next round. See the details of the model for how packets are sent in the next round.

3. All $W$ segments are sent before the first ACK arrives; in other words, $W < W_{str}$. [1] cites two other papers where this has been observed. We will pay attention to the impact of this assumption on the model.

4. The duration of a round is independent of the window size and is equal to the round-trip time.


3.1 All losses are detected with a triple-duplicate ACK

First consider the case when all loss indications are by TD only, and rwnd (the receiver window) is not a bottleneck. Let $N_t$ be the number of segments transmitted in $[0, t)$. Let $B_t = N_t/t$. $B_t$ is the throughput rather than the goodput, since it does not consider whether a packet was received or not. Let the long-term steady-state TCP throughput be:

$$B = \lim_{t \to \infty} B_t = \lim_{t \to \infty} \frac{N_t}{t} \qquad \text{EQ(1)}$$

Let $p$ be the probability that a packet is lost, given that it is the first packet in a round or the preceding packet in its round is not lost. (This conditioning is needed because we assumed that if a packet is lost in a round, all subsequent packets in that round are lost.) The goal is to find $B(p)$, i.e., the throughput as a function of $p$ as defined above.

Define a TD period (TDP) as the period between two TD loss indications, as shown in Fig. 2. Let $Y_i$ be the number of packets sent in the $i$th TDP, $A_i$ the duration of the period, and $W_i$ the window size at the end of the period. Consider $\{W_i\}_i$ to be a Markov regenerative process with rewards $\{Y_i\}_i$. It can then be shown that $B$, the long-term steady-state TCP throughput, is:

$$B = \frac{E[Y]}{E[A]} \qquad \text{EQ(2)}$$

[Fig. 2 Evolution of window size over time when loss indications are all by TD ACKs: window size versus time $t$ over successive periods TDP1, TDP2, TDP3 of durations $A_1$, $A_2$, $A_3$, ending with window sizes $W_1$, $W_2$, $W_3$.]

[Fig. 3 Packets sent during a TDP: each column is a round. TDP$_i$ starts with a window of $W_{i-1}/2$, which for $b = 2$ increases by 1 only after every two rounds. $\alpha_i$ is the first packet lost, in the penultimate round $X_i$. $\beta_i$ packets are sent in the last round, after which a TD occurs and the TDP ends with window $W_i$.]

Why cwnd is halved at the end of a TDP:

There are two types of congestion control algorithms: congestion avoidance, and fast retransmit and fast recovery (note that the latter is first a fast retransmit and then a fast recovery). See Stevens’ book [5] for a description of the fast retransmit/recovery procedure, summarized as follows:

1. ssthresh = min(cwnd, receiver’s advertised window)/2

2. cwnd = ssthresh + 3 × segsize; then retransmit the missing segment.

   Reason: the TCP receiver has to issue an ACK every time it receives a new segment, even an out-of-order one. Therefore, when the sender receives 3 duplicate ACKs, it knows that three segments got through the network successfully, so it inflates cwnd by 3 segments.

3. For each additional duplicate ACK received: set cwnd = cwnd + segsize, and transmit a segment if allowed by the new value of cwnd.

4. When an ACK arrives that acknowledges new data, set cwnd = ssthresh (this should be the ACK for the retransmitted segment). Additionally, it will ACK the intermediate segments between the lost packet and the receipt of the third duplicate ACK, so set cwnd = cwnd + segsize. The connection is now in the CA phase.

In step 2 above, cwnd is set to ssthresh plus 3 segments. But in [1], it is assumed that when a TD occurs, cwnd is dropped in half. This is because of step 4 above. The details of fast recovery are not modeled, as noted at the top of the second column on page 304 of [1]. So while cwnd is inflated by 3 segments in the intermediate steps, in the final step it is set equal to ssthresh, which was set in step 1 to half of the previous cwnd.

See Stevens’ book [5], Figures 21.9 and 21.7. First note that in the congestion avoidance algorithm, ssthresh is reduced to half of cwnd when a loss indication occurs. In fact, this is common to both a TO and a TD. At first I thought this means ssthresh only keeps decreasing. For example, in Fig. 21.9, ssthresh starts at 65K but drops to 512 after the loss because cwnd was 256. Half of 256 is 128, but ssthresh has to be at least two segments long, which makes it 512. This means that in the next round, slow start completes very quickly because cwnd reaches ssthresh quickly. However, cwnd can grow in CA. When a loss then happens, ssthresh becomes half of the new, increased cwnd, which may make ssthresh increase relative to its last value.

Another point to note is in Fig. 21.7. Here, after the loss of segment 6657 (check number), other segments get through, which is why we keep getting duplicate ACKs. Remember that in this phase delayed ACK is not used: every time a new segment is received, a duplicate ACK is generated. In this interim period, cwnd increases. Note that in fast retransmit, a quick retransmission occurs when the third duplicate ACK (i.e., the fourth ACK) is received. The reason it is “fast recovery” is that cwnd does not drop down to 1. Instead, after setting ssthresh to half of cwnd, cwnd itself is set to ssthresh + 3 segments. However, at this point cwnd is not big enough to send any more data, because many segments are already outstanding (sent but unacknowledged, e.g., 6913–8705).

See Fig. 21.11 [5] for the cwnd corresponding to the interesting section of Fig. 21.7. At segment 62, ssthresh is set to half the cwnd value, i.e., 1024, and cwnd is set to ssthresh + 3 segments. At segment 66, cwnd becomes 2560, i.e., 10 segments. At this point there are only 9 outstanding segments (6913, 7169, 7425, 7681, 7937, 8193, 8449, 8705, and the retransmitted 6657), hence it sends 8961. For every duplicate ACK received, cwnd is increased as per the fast retransmit/recovery procedure. Thus cwnd keeps increasing; when it reaches a value large enough to send a new segment, it does so, which is segment 8961 in Fig. 21.7. Finally, when an ACK for a new segment is received, cwnd is dropped down to the ssthresh value. This means that on that last segment (9473), cwnd should be 1024. Instead it is 1280, which means one segment size of 256 is added. But if cwnd is equal to ssthresh, the connection is in CA, which means only 1/W (in segments) should be added.

Applying the congestion-avoidance increase of EQ(4) below with segsize = 256 would instead give

$$cwnd \leftarrow cwnd + \frac{segsize \times segsize}{cwnd} + \frac{segsize}{8} = 1024 + \frac{256 \times 256}{1024} + \frac{256}{8} = 1120 \qquad \text{EQ(3)}$$

Anyway, ssthresh dropping to half of cwnd, and cwnd eventually (after fast recovery) being set equal to ssthresh, explains why the paper assumes the window size halves after a TD loss indication and recovery. See Fig. 2, where the starting cwnd after TDP$_{i-1}$ is $W_{i-1}/2$.

How cwnd changes round-to-round:

A TDP starts right after a loss indication by a TD. The cwnd size is dropped in half when a loss is indicated by a TD.

When the first ACK arrives, cwnd becomes $W + 1/W$. This is because in CA, cwnd increases as follows:

$$cwnd \leftarrow cwnd + \frac{segsize \times segsize}{cwnd} + \frac{segsize}{8} \qquad \text{EQ(4)}$$

Ignoring the last term and expressing cwnd as $W$ segments (note: cwnd is in bytes), we can rewrite (4) as

$$\frac{cwnd}{segsize} \leftarrow \frac{cwnd}{segsize} + \frac{segsize}{cwnd} \quad \text{or} \quad W \leftarrow W + \frac{1}{W} \qquad \text{EQ(5)}$$

A round is defined in [1] as “starting with back-to-back transmissions of $W$ packets and ending when the first ACK is received.” This means that at the end of the first round, $W$ increases by $1/W$. Officially, the second round can start as soon as the first ACK is received, so we could say the cwnd size at the start of the second round is only $W + 1/W$. But as a model, we can assume that by the time the second round’s packets are sent, the remaining $W/b - 1$ ACKs have arrived. These ACKs cover the $W - b$ segments of the first round not acknowledged by the first ACK. A total of $W/b$ ACKs will therefore be received for the $W$ packets sent in the first round before the first ACK of the second round arrives. This is modeled as if the second round starts with

$$cwnd \leftarrow W + \frac{1}{W} \times \frac{W}{b} = W + \frac{1}{b}$$

rather than a cwnd of $W + 1/W$. This is why there is a statement in [1] that the cwnd at the “beginning” of the second round is $W + 1/b$.

[Fig. 4 cwnd size increase within a TDP: within a round, cwnd grows from $W$ to $W + 1/W$, $W + 2/W$, … as successive ACKs arrive.]
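The sketch below makes EQ(3) through EQ(5) concrete. It applies the per-ACK update of EQ(4) in bytes, where $256/8 = 32$, so EQ(3) evaluates to 1120. It also applies the per-round approximation in segments, holding $W$ fixed within a round as the model does. This is a minimal illustration with hypothetical function names; exact fractions keep the $1/b$ steps exact.

```python
from fractions import Fraction


def ca_ack_update(cwnd: float, segsize: float) -> float:
    """EQ(4): congestion-avoidance increase applied on each new ACK (bytes)."""
    return cwnd + segsize * segsize / cwnd + segsize / 8


def window_after_round(w: Fraction, b: int) -> Fraction:
    """One CA round in segments: W/b ACKs arrive, each adding 1/W (EQ(5)),
    with W held fixed during the round as in the model, giving W + 1/b."""
    acks = w / b
    return w + acks * (1 / w)


def windows_over_rounds(w_start: int, b: int, rounds: int) -> list[Fraction]:
    """Window (in segments) at the start of each of `rounds` consecutive rounds."""
    ws = [Fraction(w_start)]
    for _ in range(rounds - 1):
        ws.append(window_after_round(ws[-1], b))
    return ws
```

With $b = 2$ and a starting window of 10, the windows are 10, 10.5, 11, 11.5, 12. The whole number of segments sent per round is therefore 10, 10, 11, 11, 12, which matches the “equal height in pairs of rounds” pattern of Fig. 3.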

Expression for E[Y], average number of packets sent in a TDP:

In Fig. 3, let $\alpha_i$ denote the first packet lost in TDP$_i$, and $\beta_i$ the number of packets sent in the last round. Explanation for $Y_i = \alpha_i + W_i - 1$: at the bottom of page 304 of [1], it says that the number of packets sent after packet $\alpha_i$ is $W_i - 1$, as Fig. 3 shows. In the penultimate round, the window size is $W_i$, and $\alpha_i - 1$ packets are sent successfully in this round. This means $(\alpha_i - 1)/b$ ACKs will be received. The first of these ACKs falls in the penultimate round and the remaining ACKs in the last round. For each received ACK, cwnd increases by $1/W_i$, so cwnd will be $W_i + (\alpha_i - 1)/(b W_i)$. Since $\alpha_i - 1 < W_i$, this increase is less than one segment, so the window size in the next round will also be $W_i$. In the penultimate round, $W_i - (\alpha_i - 1)$ packets are sent but not acknowledged. Therefore only $\alpha_i - 1$ packets can be sent in the last round. This means that after the $\alpha_i$th packet, $W_i - \alpha_i + (\alpha_i - 1) = W_i - 1$ packets are sent. Therefore the total number of packets sent in a TDP is $Y_i = \alpha_i + W_i - 1$, and:

$$E[Y] = E[\alpha] + E[W] - 1. \qquad \text{EQ(6)}$$

Next, derive $E[\alpha]$, remembering that $\alpha_i$ is the number of packets sent in TDP$_i$ up to and including the first packet loss. Since we assume that packet loss is independent from round to round, $\{\alpha_i\}_i$ is a sequence of iid random variables. The probability that $\alpha = k$ is the probability that exactly $k - 1$ packets are successfully transmitted before the first loss:

$$P\{\alpha = k\} = (1-p)^{k-1} p, \quad k = 1, 2, \ldots \qquad \text{EQ(7)}$$

$$E[\alpha] = \sum_{k=1}^{\infty} (1-p)^{k-1} p\,k = -p \sum_{k=1}^{\infty} \frac{d}{dp}(1-p)^k = -p\,\frac{d}{dp} \sum_{k=1}^{\infty} (1-p)^k \qquad \text{EQ(8)}$$

$$E[\alpha] = -p\,\frac{d}{dp}\left(\frac{1}{1-(1-p)} - 1\right) = -p\,\frac{d}{dp}\,\frac{1-p}{p} = p \cdot \frac{1}{p^2} = \frac{1}{p} \qquad \text{EQ(9)}$$

$$E[Y] = \frac{1-p}{p} + E[W] \quad \text{from (6)} \qquad \text{EQ(10)}$$

Derivation for E[A], the average duration of a TDP:

To derive $E[A]$, let the duration of TDP$_i$ be $A_i = \sum_{j=1}^{X_i+1} r_{ij}$, where $r_{ij}$ is the duration of the $j$th round in TDP$_i$. Since we assumed that $r_{ij}$ is independent of the round,

$$E[A] = (E[X] + 1)\,E[r], \qquad \text{EQ(11)}$$

where $E[r]$ is the mean RTT. (11) holds because $X$ is the penultimate round, so there is one more round. Therefore $A$ is $(X + 1)$ multiplied by the RTT.

Derivation for E[X], mean number of rounds in a TDP, and mean window size E[W]:

$$W_i = \frac{W_{i-1}}{2} + \frac{X_i}{b} \qquad \text{EQ(12)}$$

(assuming both terms on the right-hand side are integers)

The above equation holds because the window size increases by $1/b$ in each round, as already shown in the section titled “How cwnd changes round-to-round.” Each column of Fig. 3 is a round. In the second column, cwnd is $W_{i-1}/2 + 1/b$, and after each round the window size increases by another $1/b$. If $b$ is 2, the increase is only 0.5, so the second round also sends the same number of segments as round 1. In the third round, the total increase reaches 1, so one extra segment can be sent. This is why in Fig. 3 rounds 1 and 2 (the first two columns) are equal in height, as are rounds 3 and 4, and so on. Therefore after round $X_i$, cwnd becomes

$$W_i = \frac{W_{i-1}}{2} + \frac{X_i}{b} \qquad \text{EQ(13)}$$

(Eqns. 8–10 of [1]): In each of the first $b$ rounds, $W_{i-1}/2$ packets are sent; in each of the next $b$ rounds, $W_{i-1}/2 + 1$ packets are sent; and so on. The penultimate round is $X_i$, and $\beta_i$ packets are sent in the last round, as shown in Fig. 3.

$$Y_i = \frac{W_{i-1}}{2}\,b + \left(\frac{W_{i-1}}{2} + 1\right) b + \left(\frac{W_{i-1}}{2} + 2\right) b + \cdots + \left(\frac{W_{i-1}}{2} + \frac{X_i}{b} - 1\right) b + \beta_i \qquad \text{EQ(14)}$$

Adding the first terms ($W_{i-1}/2$) of all the summands, and then the second terms:

$$Y_i = \frac{W_{i-1}}{2}\,b\,\frac{X_i}{b} + b\left(1 + 2 + \cdots + \left(\frac{X_i}{b} - 1\right)\right) + \beta_i \qquad \text{EQ(15)}$$

$$Y_i = \frac{W_{i-1}}{2}\,X_i + \frac{X_i}{b}\left(\frac{X_i}{b} - 1\right)\frac{b}{2} + \beta_i \qquad \text{EQ(16)}$$

Replacing $X_i/b$ using (12):

$$Y_i = \frac{X_i}{2}\left(W_{i-1} + W_i - \frac{W_{i-1}}{2} - 1\right) + \beta_i \qquad \text{EQ(17)}$$

$$Y_i = \frac{X_i}{2}\left(\frac{W_{i-1}}{2} + W_i - 1\right) + \beta_i \qquad \text{EQ(18)}$$

Taking expectations of both sides of (12), and assuming $\{X_i\}$ and $\{W_i\}$ are mutually independent sequences of iid random variables, we get:

$$E[W] = \frac{2}{b}\,E[X] \qquad \text{EQ(19)}$$
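The sketch below checks two steps of the algebra above. First, a truncated sum of EQ(8) approaches $1/p$, as EQ(9) states. Second, the round-by-round count of EQ(14) matches the closed form EQ(18) when $W_i$ comes from EQ(12). It is a minimal sketch with hypothetical function names. It uses exact fractions for the second check and assumes $X_i$ is a multiple of $b$, as EQ(14) implicitly requires.

```python
from fractions import Fraction


def mean_alpha_truncated(p: float, terms: int = 100_000) -> float:
    """Truncated EQ(8): sum over k of k * (1 - p)**(k - 1) * p."""
    total, tail = 0.0, 1.0           # tail holds (1 - p)**(k - 1)
    for k in range(1, terms + 1):
        total += k * tail * p
        tail *= 1 - p
    return total


def packets_by_rounds(w_prev: int, x: int, b: int, beta: int) -> Fraction:
    """EQ(14): x/b groups of b rounds; group j sends W_{i-1}/2 + j packets per round."""
    total = Fraction(0)
    for j in range(x // b):
        total += (Fraction(w_prev, 2) + j) * b
    return total + beta


def packets_closed_form(w_prev: int, x: int, b: int, beta: int) -> Fraction:
    """EQ(18), with W_i = W_{i-1}/2 + X_i/b taken from EQ(12)."""
    w_i = Fraction(w_prev, 2) + Fraction(x, b)
    return Fraction(x, 2) * (Fraction(w_prev, 2) + w_i - 1) + beta
```

For instance, with $W_{i-1} = 20$, $X_i = 6$, $b = 2$ and $\beta_i = 3$, both functions give 69 packets: rounds of 10, 10, 11, 11, 12, 12 plus 3 in the last round.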

Taking stock: our goal was to find the throughput $B$, which means we needed $E[Y]$ and $E[A]$. $Y$ is now expressed in terms of $X$ and $W$ in (18). Eqn (19) gives another relation between $X$ and $W$, so $Y$ can be expressed in terms of $W$ alone. Similarly, eq. (11) expresses $A$ in terms of $X$, so we can express $A$ in terms of $W$ using (19). Therefore both $Y$ and $A$ are now expressed in terms of $W$. $E[Y]$ is also expressed independently in terms of $E[W]$ by (10). So we can set the $E[Y]$ obtained from (10) equal to the one obtained from (18), which is what is done below:

$$\frac{1-p}{p} + E[W] = \frac{E[X]}{2}\left(\frac{E[W]}{2} + E[W] - 1\right) + E[\beta] \qquad \text{EQ(20)}$$

Solve for $E[W]$ by setting $s = E[W]$:

$$\frac{1-p}{p} + s = \frac{bs}{4}\left(\frac{3}{2}s - 1\right) + \frac{s}{2} \qquad \text{EQ(21)}$$

This uses (19) and assumes $E[\beta] = E[W]/2$. The reasoning is that the number of packets sent in the last round is uniformly distributed between 1 and $W_i$, because the first lost packet $\alpha_i$ in the penultimate round could be any one of the $W_i$ packets.

$$\frac{3b}{8}s^2 - \left(\frac{b}{4} + \frac{1}{2}\right)s - \frac{1-p}{p} = 0 \;\Longleftrightarrow\; 3bs^2 - 2(2+b)s - 8\,\frac{1-p}{p} = 0 \qquad \text{EQ(22)}$$

$$s = \frac{2(2+b) \pm \sqrt{4(2+b)^2 + 32 \times 3b \times \frac{1-p}{p}}}{6b} \qquad \text{EQ(23)}$$

$$s = E[W] = \frac{2+b}{3b} + \sqrt{\frac{8(1-p)}{3bp} + \left(\frac{2+b}{3b}\right)^2} \qquad \text{EQ(24)}$$

We drop the − sign because the square-root term is larger than the first term, so the − sign would make the mean congestion window negative. The above is eq. 13 of the paper [1].

If $b$ is 2, then $(2+b)/(3b)$ is only 2/3, while the first term under the square root can be quite large if $p$ is very small. So we ignore the two $(2+b)/(3b)$ terms (inside and outside the square root), which gives the approximation of eq. 14 of [1]:

$$E[W] = \sqrt{\frac{8}{3bp}} + o\!\left(\frac{1}{\sqrt{p}}\right) \qquad \text{EQ(25)}$$

The $o(1/\sqrt{p})$ term arises because the expression inside the square root can be expanded as a Taylor series. The $o$ term says that the remainder of this series grows more slowly than $1/\sqrt{p}$ as $p \to 0$. For small $p$,

$$E[W] \approx \sqrt{\frac{8}{3bp}} \qquad \text{EQ(26)}$$
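The sketch below shows how quickly the approximation EQ(26) approaches the exact root EQ(24). It evaluates both for a few loss probabilities with delayed ACKs ($b = 2$, an illustrative choice). It is a minimal sketch; the function names are hypothetical.

```python
import math


def mean_window(p: float, b: int) -> float:
    """EQ(24): positive root of 3b s^2 - 2(2+b) s - 8(1-p)/p = 0."""
    c = (2 + b) / (3 * b)
    return c + math.sqrt(8 * (1 - p) / (3 * b * p) + c * c)


def mean_window_approx(p: float, b: int) -> float:
    """EQ(26): small-p approximation sqrt(8 / (3 b p))."""
    return math.sqrt(8 / (3 * b * p))


for p in (0.1, 0.01, 0.001, 0.0001):
    exact, approx = mean_window(p, b=2), mean_window_approx(p, b=2)
    print(f"p={p}: E[W]={exact:.2f} approx={approx:.2f} ratio={exact / approx:.3f}")
```

At $p = 0.1$ the two differ by roughly 15%. As $p$ shrinks, the ratio tends to 1, which is the behavior the $o(1/\sqrt{p})$ term describes.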

From (19) and (24):

$$E[X] = \frac{2+b}{6} + \sqrt{\frac{2b(1-p)}{3p} + \left(\frac{2+b}{6}\right)^2} \qquad \text{EQ(27)}$$

From (11), the mean duration of a TDP is:

$$E[A] = RTT\left(\frac{2+b}{6} + \sqrt{\frac{2b(1-p)}{3p} + \left(\frac{2+b}{6}\right)^2} + 1\right) \qquad \text{EQ(28)}$$

From (25), the mean number of rounds in a TDP is:

$$E[X] = \sqrt{\frac{2b}{3p}} + o\!\left(\frac{1}{\sqrt{p}}\right) \qquad \text{EQ(29)}$$

From (2) and (10):

$$B = \frac{E[Y]}{E[A]} = \frac{\frac{1-p}{p} + E[W]}{E[A]} \qquad \text{EQ(30)}$$

which reduces to

$$B = \frac{\dfrac{1-p}{p} + \dfrac{2+b}{3b} + \sqrt{\dfrac{8(1-p)}{3bp} + \left(\dfrac{2+b}{3b}\right)^2}}{RTT\left(\dfrac{2+b}{6} + \sqrt{\dfrac{2b(1-p)}{3p} + \left(\dfrac{2+b}{6}\right)^2} + 1\right)} \qquad \text{EQ(31)}$$

For small $p$, the numerator is dominated by $(1-p)/p \approx 1/p$ and the denominator by $RTT\sqrt{2b/(3p)}$, so

$$B(p) = \frac{1}{RTT}\sqrt{\frac{3}{2bp}} + o\!\left(\frac{1}{\sqrt{p}}\right) \qquad \text{EQ(32)}$$

Compare (32) with the Matt Mathis equation $B \sim \frac{MSS}{RTT}\sqrt{\frac{3}{2p}}$. Why does MSS not appear in (32)? Because $Y$ counts packets, $B$ in (32) is a rate in packets per second. Multiplying by MSS converts it to bytes per second, and with $b = 1$ the two expressions then coincide.

Suppose we model the TCP data transfer phase as consisting only of TDPs, i.e., once Slow Start is over and CA is reached, the transfer always stays in CA. Then the delay can be estimated as follows:

$$D = \frac{f}{m \times E[Y]} \times E[A] \qquad \text{EQ(33)}$$

Here $f$ is the file size and $m$ is the MSS, so $f/m$ is the number of packets. On average, $E[Y]$ packets are sent per TDP, and the average duration of a TDP is $E[A]$. Therefore the delay is given by $D$. If all losses were detected through TD ACKs, the system would always stay in the CA phase and we would only have TDPs. But this is not a good assumption. It is not clear whether TOs or TDs are more frequent. Table 2 of [1] shows that more TOs occur; but then, the fast retransmit/recovery algorithm would not have been created if TDs were not more common. Apparently TOs happen because, after one packet is lost in a round, it is likely that other packets in the round are also lost (because of congested routers). Loss of these subsequent packets is often detected via TO.
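The sketch below gives a minimal numerical version of EQ(27) through EQ(33), assuming TD-only losses as in this subsection. Throughput is in packets per second; multiply by the MSS for bytes per second. The function names are hypothetical, and the delay function ignores connection setup and Slow Start, just as EQ(33) does.

```python
import math


def tdp_means(p: float, b: int, rtt: float) -> tuple[float, float]:
    """Return (E[Y], E[A]) for a TD-only loss process."""
    c_w = (2 + b) / (3 * b)
    e_w = c_w + math.sqrt(8 * (1 - p) / (3 * b * p) + c_w ** 2)   # EQ(24)
    c_x = (2 + b) / 6
    e_x = c_x + math.sqrt(2 * b * (1 - p) / (3 * p) + c_x ** 2)   # EQ(27)
    e_y = (1 - p) / p + e_w                                        # EQ(10)
    e_a = rtt * (e_x + 1)                                          # EQ(28)
    return e_y, e_a


def throughput_td(p: float, b: int, rtt: float) -> float:
    """EQ(31): packets per second."""
    e_y, e_a = tdp_means(p, b, rtt)
    return e_y / e_a


def throughput_td_approx(p: float, b: int, rtt: float) -> float:
    """Leading term of EQ(32): packets per second."""
    return math.sqrt(3 / (2 * b * p)) / rtt


def transfer_delay_ca(f: float, m: float, p: float, b: int, rtt: float) -> float:
    """EQ(33): f/m packets, E[Y] packets per TDP, TDPs of mean length E[A]."""
    e_y, e_a = tdp_means(p, b, rtt)
    return f / (m * e_y) * e_a
```

EQ(33) is the same as dividing the packet count $f/m$ by the throughput of EQ(31). The sketch keeps it in the EQ(33) form so that each factor matches the text.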

Reference [6] motivates the need for selective ACKs (instead of cumulative ACKs only) as follows: “TCP may experience poor performance when multiple packets are lost from one window of data. With the limited information available from cumulative acknowledgments, a TCP sender can only learn about a single lost packet per round trip time. An aggressive sender could choose to retransmit packets early, but such retransmitted segments may have already been successfully received. A Selective Acknowledgment (SACK) mechanism, combined with a selective repeat retransmission policy, can help to overcome these limitations. The receiving TCP sends back SACK packets to the sender informing the sender of data that has been received. The sender can then retransmit only the missing data segments.” In other words, suppose multiple packets are lost within a round. The first loss would be detected by three duplicate ACKs. A second loss is likely, because the routers were congested when this round was sent, and it will probably be detected only through a time-out, since three other segments may not get through to generate duplicate ACKs. This has the effect of cwnd falling to 1 segment and delaying recovery. Hence the SACK option was proposed. With this option, TCP’s error control scheme becomes truly Selective Repeat.

3.2 Loss indications are with TD ACKs and TOs

When a timeout occurs, cwnd drops to 1. If another loss occurs before a success, the retransmission timer is doubled before the packet lost in the first timeout is retransmitted. Here, $M$ is used instead of $Y$, and $S$ instead of $A$. $S_i$ is a duration made up of two parts. The first is a time-out sequence $Z_i^{TO}$, which includes the time for the first retransmit at $T_0$, the next at $2T_0$, and so on until $64T_0$ is reached. The second is the time interval between two time-out sequences, $Z_i^{TD}$. In this latter interval, there can be many TDPs. $Z_i^{TO}$ is the duration of a time-out sequence. See figure 3 of [1].

$$S_i = Z_i^{TO} + Z_i^{TD} \qquad \text{EQ(34)}$$

Using $M$ and $S$ for the number of packets and the duration, one gets the throughput $B = E[M]/E[S]$.

$Z_i^{TD}$ is the duration between two consecutive TO sequences, so many TDPs could occur in it. The number of TDPs within this interval is $n_i$.

$$M_i = \sum_{j=1}^{n_i} Y_{ij} + R_i \quad \text{and} \quad S_i = \sum_{j=1}^{n_i} A_{ij} + Z_i^{TO} \qquad \text{EQ(35)}$$

Here:

- $Y_{ij}$ is the number of packets sent in the $j$th TD period of interval $Z_i^{TD}$.
- $R_i$ is the number of packets sent during the time-out sequence $Z_i^{TO}$.
- $A_{ij}$ is the duration of the $j$th TD period of interval $Z_i^{TD}$.
- $X_{ij}$ is the number of rounds in that TDP.

$$E[M] = E\left[\sum_{j=1}^{n_i} Y_{ij}\right] + E[R] \quad \text{and} \quad E[S] = E\left[\sum_{j=1}^{n_i} A_{ij}\right] + E[Z^{TO}] \qquad \text{EQ(36)}$$

If we assume $\{n_i\}_i$ to be an iid sequence, independent of $\{Y_{ij}\}$ and $\{A_{ij}\}$, then

$$E\left[\sum_{j=1}^{n_i} Y_{ij}\right] = E[n]\,E[Y] \quad \text{and} \quad E\left[\sum_{j=1}^{n_i} A_{ij}\right] = E[n]\,E[A] \qquad \text{EQ(37)}$$

Set $Q = 1/E[n]$, where $Q$ is the probability that a “loss indication ending a TDP is a TO.” Shouldn’t it be simply the probability that a loss indication is a TO? If there are $n_i$ TDPs within a $Z_i^{TD}$, then $n_i - 1$ losses were detected by TD ACKs and 1 loss was detected by a TO.

$$B = \frac{E[Y] + Q \cdot E[R]}{E[A] + Q \cdot E[Z^{TO}]} \qquad \text{EQ(38)}$$

On p. 307 of [1], it says that if $k$ packets are sent successfully in the penultimate round, then another $k$ packets will be sent in the next round. The same reasoning as in the derivation of $Y_i = \alpha_i + W_i - 1$ in Section 3.1 applies here. If $W$ is the window size, then $W - k$ packets are lost in the penultimate round, so they count as sent but unacknowledged. The window size will then still be $W$ in the last round, because the increase $(k/b)(1/W)$ will not reach 1 unless $k = W$ and $b = 1$. And if $k = W$, then no loss occurred in the penultimate round. Therefore $k$ packets will be sent in the last round, at which point there will be $W$ outstanding unacknowledged segments.

Let $A(w, k)$ be the probability that the first $k$ packets are ACKed in a round of $w$ packets, given there is a sequence of one or more losses in the round. Then

$$A(w,k) = \frac{(1-p)^k\,p}{1 - (1-p)^w} \qquad \text{EQ(39)}$$

This follows from conditional probability:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad \text{EQ(40)}$$

The probability that the first $k$ packets are ACKed in a round of $w$ packets and the $(k+1)$th packet is lost is $(1-p)^k p$. The probability that there is at least one loss is $1 - (1-p)^w$ (this is event $B$). Therefore the probability that the first $k$ packets are ACKed, given that there is at least one loss in a round of $w$ packets, is $A(w, k)$.

Let $C(n, m)$ be the probability that $m$ packets are ACKed in sequence in the last round (where $n$ packets are sent) and the rest of the packets in the round, if any, are lost:

$$C(n,m) = \begin{cases} (1-p)^m\,p, & m \le n-1 \\ (1-p)^n, & m = n \end{cases} \tag{41}$$

where $C(n,m)$ is the probability that the first $m$ of the $n$ packets sent in the last round are delivered in sequence.

Set $\hat{Q}(w)$ to be the probability that a loss in a window of size $w$ is detected by a TO:

$$\hat{Q}(w) = \begin{cases} 1, & w \le 3 \\ \displaystyle\sum_{k=0}^{2} A(w,k) + \sum_{k=3}^{w} A(w,k) \sum_{m=0}^{2} C(k,m), & \text{otherwise} \end{cases} \tag{42}$$

To explain $\hat{Q}(w)$: if $w = 3$, the question is whether the loss indication will always be a TO.

The ACK generated when segment 6 is received asks for segment 5, so even though segment 6 was received successfully, the sender does not know this. It therefore assumes that the 3 segments it has already sent (5, 6, 7) are unacknowledged, and $W = 3$. But when segment 7 gets through and the 2nd duplicate ACK for segment 5 is received, $W$ becomes 4, so the sender can now send segment 8. This means a third duplicate ACK can be received. But the round ends when the 1st duplicate ACK is received, which means that in this last round there will be no loss indication by TD. Is something wrong here in the model?

[Figure: sender/receiver timeline for $w = 3$, with the window growing to $3+1/3$, $3+2/3$ and 4. Segments 3 to 8 are sent; the ACK for segment 5 is followed by its 1st, 2nd and 3rd duplicates. Annotations mark where the penultimate round ends, where the last round ends, and the loss indication by TD.]

To explain the case $w > 3$: the first term covers $k \le 2$. This means that in the last round only 2 or fewer packets are sent, so 2 or fewer duplicate ACKs are received; 3 duplicate ACKs cannot be received, and hence the loss indication is a TO.

The second term covers $k \ge 3$, where $k$ is the number of packets successfully sent in the penultimate round. But if only 0, 1 or 2 of the $k$ packets sent in the last round are successful, then fewer than three packets get through in the last round, and 3 duplicate ACKs will not be received. Therefore the indication is by a TO. The term $A(w,k)\,C(k,m)$ is the probability that there is a loss at $f_{k+1}$ in the penultimate round and a loss at $s_{m+1}$ in the last round. See Figure 4 of [1].
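Definition (42) can also be evaluated term by term. The following is a minimal Python sketch, assuming an integer window $w$ and keeping the summation limits exactly as written in (42) ($k$ from 3 to $w$). The function names `a_wk`, `c_nm` and `qhat_sum` are hypothetical.

```python
def a_wk(w: int, k: int, p: float) -> float:
    # A(w,k): first k of w packets delivered, packet k+1 lost, given a loss.
    return (1 - p) ** k * p / (1 - (1 - p) ** w)


def c_nm(n: int, m: int, p: float) -> float:
    # C(n,m) from (41): first m of the n packets in the last round delivered.
    return (1 - p) ** m * p if m <= n - 1 else (1 - p) ** n


def qhat_sum(w: int, p: float) -> float:
    """Q-hat(w) from (42): probability that a loss in window w ends in a TO."""
    if w <= 3:
        return 1.0
    first = sum(a_wk(w, k, p) for k in range(3))            # k = 0, 1, 2
    second = sum(a_wk(w, k, p) * sum(c_nm(k, m, p) for m in range(3))
                 for k in range(3, w + 1))                   # k = 3, ..., w
    return first + second


for w in (3, 4, 8, 16):
    print(w, round(qhat_sum(w, 0.01), 4))
```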

To get eqn 23 of [1]:

$$\sum_{k=0}^{2} A(w,k) = A(w,0) + A(w,1) + A(w,2) \tag{43}$$

$$\sum_{k=0}^{2} A(w,k) = \frac{p + p(1-p) + p(1-p)^2}{1-(1-p)^w} \tag{44}$$

$$A(w,3)\sum_{m=0}^{2} C(3,m) = \frac{p(1-p)^3}{1-(1-p)^w}\,\big[p + p(1-p) + p(1-p)^2\big] \tag{45}$$

Since $m \le 2 \le n-1$ for every $n \ge 3$, only the first case of (41) applies, so

$$\sum_{m=0}^{2} C(3,m) = \sum_{m=0}^{2} C(4,m) = \cdots = \sum_{m=0}^{2} C(w,m) = p + p(1-p) + p(1-p)^2 \tag{46}$$

Therefore the second term in the second line of (42) evaluates to

$$\sum_{k=3}^{w} A(w,k)\sum_{m=0}^{2} C(k,m) = \frac{p + p(1-p) + p(1-p)^2}{1-(1-p)^w}\,\big[p(1-p)^3 + p(1-p)^4 + \cdots + p(1-p)^w\big] \tag{47}$$

Adding (44) to the above term yields

$$\hat{Q}(w) = \frac{p + p(1-p) + p(1-p)^2}{1-(1-p)^w}\,\big[1 + p(1-p)^3 + p(1-p)^4 + \cdots + p(1-p)^w\big] \tag{48}$$

Using the geometric series

$$S = a + ar + ar^2 + \cdots + ar^{n-1} = \frac{a(1-r^n)}{1-r} \tag{49}$$

we first factor $p(1-p)^3$ out of the bracket in (48):

$$\hat{Q}(w) = \frac{p + p(1-p) + p(1-p)^2}{1-(1-p)^w}\,\Big[1 + p(1-p)^3\big[1 + (1-p) + (1-p)^2 + \cdots + (1-p)^{w-3}\big]\Big] \tag{50}$$

and then apply (49) with $a = 1$, $r = 1-p$ and $n = w-2$ terms:

$$\hat{Q}(w) = \frac{p + p(1-p) + p(1-p)^2}{1-(1-p)^w}\,\left[1 + p(1-p)^3\,\frac{1-(1-p)^{w-2}}{p}\right] \tag{51}$$

Because

$$p + p(1-p) + p(1-p)^2 = 3p - 3p^2 + p^3 = 1-(1-p)^3 \tag{52}$$

we obtain

$$\hat{Q}(w) = \frac{1-(1-p)^3}{1-(1-p)^w}\,\Big[1 + (1-p)^3\big(1-(1-p)^{w-2}\big)\Big] \tag{53}$$
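The closed form (53) can be checked against the term-by-term sum. The sketch below (Python; function names hypothetical) evaluates both over a grid of integer windows $w \ge 4$ and loss rates $p$ and reports the largest absolute difference, which should be at floating-point rounding level if the geometric-series step (49)-(51) is right.

```python
def qhat_sum(w: int, p: float) -> float:
    """Second line of (42), evaluated term by term for integer w >= 4."""
    denom = 1 - (1 - p) ** w

    def a_wk(k: int) -> float:  # A(w,k)
        return (1 - p) ** k * p / denom

    # For k >= 3 and m <= 2 we have m <= k-1, so C(k,m) = (1-p)^m p by (41).
    inner = sum((1 - p) ** m * p for m in range(3))
    return sum(a_wk(k) for k in range(3)) + sum(a_wk(k) * inner for k in range(3, w + 1))


def qhat_closed(w: float, p: float) -> float:
    # (53): closed form obtained with the geometric sum (49).
    return ((1 - (1 - p) ** 3) / (1 - (1 - p) ** w)
            * (1 + (1 - p) ** 3 * (1 - (1 - p) ** (w - 2))))


worst = max(abs(qhat_sum(w, p) - qhat_closed(w, p))
            for w in range(4, 40) for p in (1e-3, 1e-2, 0.1, 0.3))
print(f"largest difference between (42) and (53): {worst:.2e}")
```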

Rewriting (42):

$$\hat{Q}(w) = \min\left(1,\ \frac{\big(1-(1-p)^3\big)\big[1 + (1-p)^3\big(1-(1-p)^{w-2}\big)\big]}{1-(1-p)^w}\right) \tag{54}$$

This shows that the index in eqn 23 of [1] is off by 1: it should be $w-2$, not $w-3$. The min with 1 comes from the $w \le 3$ case.

Both the numerator and the denominator of (54) vanish at $p = 0$, so apply L'Hospital's rule to find $\lim_{p \to 0} \hat{Q}(w)$ (see the M/G/1 lecture for a recap of L'Hospital's rule). The derivative of the numerator is

$$\frac{d}{dp}\Big\{\big(1-(1-p)^3\big)\big[1 + (1-p)^3\big(1-(1-p)^{w-2}\big)\big]\Big\} = 3(1-p)^2\big[1 + (1-p)^3\big(1-(1-p)^{w-2}\big)\big] + \big(1-(1-p)^3\big)\big[1 + (1-p)^3\big(1-(1-p)^{w-2}\big)\big]' \tag{55}$$

But $1-(1-p)^3$ approaches 0 as $p$ approaches 0. Therefore the second term is 0. The first term approaches 3, since $(1-p)^{w-2} \to 1$.

Taking the derivative of the denominator yields

$$\frac{d}{dp}\big(1-(1-p)^w\big) = w(1-p)^{w-1} \tag{56}$$

This evaluates to $w$ as $p$ approaches 0. Therefore

$$\lim_{p \to 0} \hat{Q}(w) = \frac{3}{w} \tag{57}$$

Hence:

$$\hat{Q}(w) \approx \min\left(1, \frac{3}{w}\right) \tag{58}$$

Therefore in (38), the probability $Q$ that a loss indication is a TO is determined by averaging the probability $\hat{Q}(w)$ that the loss is detected by a TO when the window size is $w$, i.e., by finding the average of $\hat{Q}(w)$:

$$Q = \sum_{w=1}^{\infty} \hat{Q}(w)\,P[W = w] = E[\hat{Q}], \tag{59}$$

which is approximated as

$$Q \approx \hat{Q}(E[W]), \tag{60}$$

where $E[W]$ is given by (24).

Now to derive the remaining terms in (38). Find $E[R]$, the average number of packets sent during a TO sequence. $R$ is geometric (in a sequence of TOs):

$$P[R = k] = p^{k-1}(1-p), \quad k = 1, 2, \ldots \tag{61}$$
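The limit (57) is easy to confirm numerically: evaluating (54) at decreasing $p$ should approach $3/w$, the value used in the approximation (58). A minimal Python sketch; the function names and the grid of $w$ and $p$ values are arbitrary, illustrative choices.

```python
def qhat(w: float, p: float) -> float:
    # (54): closed form, capped at 1 (the w <= 3 case).
    num = (1 - (1 - p) ** 3) * (1 + (1 - p) ** 3 * (1 - (1 - p) ** (w - 2)))
    return min(1.0, num / (1 - (1 - p) ** w))


def qhat_approx(w: float) -> float:
    # (58): the p -> 0 limit 3/w, capped at 1.
    return min(1.0, 3.0 / w)


for w in (4, 8, 16, 32):
    row = [qhat(w, p) for p in (1e-2, 1e-4, 1e-6)]
    print(w, [round(q, 5) for q in row], round(qhat_approx(w), 5))
```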

$$E[R] = \sum_{k=1}^{\infty} k\,P[R = k] = \frac{1}{1-p} \tag{62}$$

To get the average duration of a TO sequence, $E[Z^{TO}]$, excluding retransmissions (i.e., ignoring the actual time for emitting the packet on each retransmission?), note that the duration of a sequence with $k$ time-outs is

$$L_k = \begin{cases} (2^k - 1)\,T_0, & k \le 6 \\ \big(63 + 64(k-6)\big)\,T_0, & k \ge 7 \end{cases} \tag{63}$$

To compute $E[Z^{TO}]$:

$$E[Z^{TO}] = \sum_{k=1}^{\infty} L_k\,P[R = k] \tag{64}$$

Should the number of timeouts in a TO sequence really go to $\infty$? Isn't it usually capped at 12 or some such number? Anyway, following their model,

$$E[Z^{TO}] = \left[\sum_{k=1}^{6} (2^k-1)\,p^{k-1}(1-p) + \sum_{k=7}^{\infty} 64(k-6)\,p^{k-1}(1-p) + \sum_{k=7}^{\infty} 63\,p^{k-1}(1-p)\right] T_0 \tag{65}$$

Set

$$E[Z^{TO}] = (A + B + C)\,T_0 \tag{66}$$

where $A$, $B$ and $C$ are the three sums in (65). For $A$:

$$A = (1-p)\,(1 + 3p + 7p^2 + 15p^3 + 31p^4 + 63p^5)$$

Let $S = 1 + 3p + 7p^2 + 15p^3 + 31p^4 + 63p^5$. Then

$$\begin{aligned}
pS &= p + 3p^2 + 7p^3 + 15p^4 + 31p^5 + 63p^6 \\
S(1-p) &= 1 + 2p + 4p^2 + 8p^3 + 16p^4 + 32p^5 - 63p^6 = \frac{1-(2p)^6}{1-2p} - 63p^6
\end{aligned}$$

so

$$A = \frac{1 - 64p^6}{1-2p} - 63p^6$$

For $B$:

$$B = \sum_{k=7}^{\infty} 64(k-6)\,p^{k-1}(1-p) = 64(1-p)\,(p^6 + 2p^7 + 3p^8 + \cdots)$$
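The series manipulations for $E[R]$ and for $A$ can be spot-checked by brute force. The Python sketch below (names hypothetical) truncates the infinite sums at `kmax` terms, an assumption that is harmless here because $p^k$ decays geometrically, and compares them with the closed forms $1/(1-p)$ and $(1-64p^6)/(1-2p) - 63p^6$. The value $p = 0.2$ is an arbitrary choice that avoids $p = 1/2$, where the closed form for $A$ is $0/0$.

```python
def p_r(k: int, p: float) -> float:
    # (61): P[R = k] = p^(k-1) (1-p), geometric on k = 1, 2, ...
    return p ** (k - 1) * (1 - p)


def l_k(k: int, t0: float) -> float:
    # (63): duration of k back-to-back timeouts; the timer doubles up to 64*T0.
    return (2 ** k - 1) * t0 if k <= 6 else (63 + 64 * (k - 6)) * t0


p, kmax = 0.2, 200  # kmax truncates the infinite sums (assumption)
e_r = sum(k * p_r(k, p) for k in range(1, kmax + 1))               # (62)
a_direct = sum((2 ** k - 1) * p_r(k, p) for k in range(1, 7))      # A in (66)
a_closed = (1 - 64 * p ** 6) / (1 - 2 * p) - 63 * p ** 6
print(e_r, 1 / (1 - p))
print(a_direct, a_closed)
```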

To sum the series in $B$, let $S = p^6 + 2p^7 + 3p^8 + \cdots$. Then

$$\begin{aligned}
pS &= p^7 + 2p^8 + 3p^9 + \cdots \\
(1-p)S &= p^6 + p^7 + p^8 + \cdots = \frac{p^6}{1-p}
\end{aligned}$$

so that

$$B = 64(1-p)S = \frac{64p^6}{1-p} \tag{67}$$

$$C = 63(1-p)\,(p^6 + p^7 + p^8 + \cdots) = 63p^6 \tag{68}$$

Combining EQ(66), EQ(67) and EQ(68) according to EQ(65), and using the factorization $1 - 64p^6 = \big(1-(2p)^3\big)\big(1+(2p)^3\big) = (1-2p)(1+2p+4p^2)(1+8p^3)$:

$$\begin{aligned}
E[Z^{TO}] &= \left[\frac{1-64p^6}{1-2p} - 63p^6 + \frac{64p^6}{1-p} + 63p^6\right] T_0 \\
&= \left[(1+2p+4p^2)(1+8p^3) + \frac{64p^6}{1-p}\right] T_0 \\
&= \frac{(1-p)(1+2p+4p^2+8p^3+16p^4+32p^5) + 64p^6}{1-p}\,T_0 \\
&= \frac{1+p+2p^2+4p^3+8p^4+16p^5+32p^6}{1-p}\,T_0 = \frac{f(p)}{1-p}\,T_0
\end{aligned} \tag{69}$$

This matches the result in the paper.

With our formulations for $Q$, $E[R]$ and $E[Z^{TO}]$:

$$B(p) = \frac{E[Y] + Q \cdot E[R]}{E[A] + Q \cdot E[Z^{TO}]} = \frac{\dfrac{1-p}{p} + E[W] + \hat{Q}(E[W])\,\dfrac{1}{1-p}}{RTT\,(E[X]+1) + \hat{Q}(E[W])\,T_0\,\dfrac{f(p)}{1-p}} \tag{70}$$

Use (54) for $\hat{Q}$, (24) for $E[W]$ and (27) for $E[X]$ to obtain $B(p)$ from (70). For an approximate answer, use (58) for $\hat{Q}$, (25) for $E[W]$ and (29) for $E[X]$:

$$B(p) \approx \frac{1}{RTT\sqrt{\dfrac{2bp}{3}} + T_0 \min\!\left(1,\ 3\sqrt{\dfrac{3bp}{8}}\right) p\,(1 + 32p^2)} \tag{71}$$
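Putting the pieces together, a minimal Python sketch of (69), (70) and (71) follows. It checks the closed form $f(p)T_0/(1-p)$ against a truncated direct sum of (64) and evaluates the approximation (71). In `throughput_full`, $E[W]$ and $E[X]$ are taken as inputs because (24) and (27) are not reproduced in this section. $b = 2$ is assumed by default, and with RTT and $T_0$ in seconds the result is in packets per second. All function names are hypothetical.

```python
import math


def f(p: float) -> float:
    # f(p) from (69).
    return 1 + p + 2 * p**2 + 4 * p**3 + 8 * p**4 + 16 * p**5 + 32 * p**6


def ez_direct(p: float, t0: float, kmax: int = 200) -> float:
    # (64) summed directly with L_k from (63), truncated at kmax timeouts.
    total = 0.0
    for k in range(1, kmax + 1):
        lk = (2**k - 1) * t0 if k <= 6 else (63 + 64 * (k - 6)) * t0
        total += lk * p ** (k - 1) * (1 - p)
    return total


def qhat(w: float, p: float) -> float:
    # (54), capped at 1.
    num = (1 - (1 - p) ** 3) * (1 + (1 - p) ** 3 * (1 - (1 - p) ** (w - 2)))
    return min(1.0, num / (1 - (1 - p) ** w))


def throughput_full(p: float, rtt: float, t0: float, ew: float, ex: float) -> float:
    """(70); E[W] (ew) and E[X] (ex) must be supplied from (24) and (27)."""
    q = qhat(ew, p)
    num = (1 - p) / p + ew + q / (1 - p)
    den = rtt * (ex + 1) + q * t0 * f(p) / (1 - p)
    return num / den


def throughput_approx(p: float, rtt: float, t0: float, b: int = 2) -> float:
    """(71): closed-form approximation."""
    td_term = rtt * math.sqrt(2 * b * p / 3)
    to_term = t0 * min(1.0, 3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p**2)
    return 1.0 / (td_term + to_term)


print(ez_direct(0.1, 1.0), f(0.1) / (1 - 0.1) * 1.0)
for p in (0.001, 0.01, 0.05, 0.1):
    print(p, round(throughput_approx(p, rtt=0.1, t0=1.0), 2))
```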

3.3 If the receiver window becomes the bottleneck

Let $W_{max}$ be the receiver window size. This sets a bound beyond which any increase in the congestion window size does not matter, because the number of outstanding segments is $\min(cwnd, AW)$, where AW is the advertised window, aka the receiver window. Find $E[W_u]$, the average unconstrained window size at the sender. This is given by (24):

$$E[W_u] = \frac{2+b}{3b} + \sqrt{\frac{8(1-p)}{3bp} + \left(\frac{2+b}{3b}\right)^2} \tag{72}$$

If $E[W_u] < W_{max}$, then the long-term average of the TCP throughput is given by (70), because the receiver window does not have an impact. We approximate $E[W] \approx E[W_u]$. But if $E[W_u] \ge W_{max}$, then we approximate $E[W] \approx W_{max}$. In this case, the equations leading up to the throughput equation have to be recomputed. The window grows to $W_{max}$ in $U_1$ rounds and remains at this level for $V_1$ rounds before a TD occurs; $U_1 + V_1 = X$, where $X$ is the total number of rounds in a TDP. When a TD occurs, the window size drops to half, i.e., to $W_{max}/2$. In $U_2$ rounds it reaches $W_{max}$ again. Therefore

$$W_{max} = \frac{W_{max}}{2} + \frac{U_i}{b}, \quad \forall\, i \ge 2 \tag{73}$$

$$\frac{W_{max}}{2} = \frac{E[U]}{b} \tag{74}$$

The number of packets sent in TDP $i$ is

$$Y_i = \frac{W_{max}}{2}\,b + \left(\frac{W_{max}}{2} + 1\right) b + \left(\frac{W_{max}}{2} + 2\right) b + \cdots + \left(\frac{W_{max}}{2} + \frac{U}{b}\right) b + V_i W_{max} \tag{75}$$

Since the window reaches $W_{max}$ in the $U$th round, in the $(U-1)$th round the window size is $\frac{W_{max}}{2} + \frac{U-1}{b}$. Collecting terms,

$$Y_i = \frac{W_{max}}{2}\left(\frac{U}{b} + 1\right) b + b\left[1 + 2 + \cdots + \frac{U}{b}\right] + V_i W_{max} \tag{76}$$

and, using $1 + 2 + \cdots + n = n(n+1)/2$,

$$Y_i = \frac{W_{max}\,U}{2} + \frac{b\,W_{max}}{2} + \frac{U}{2}\left(\frac{U}{b} + 1\right) + V_i W_{max} \tag{77}$$

Since $U/b = W_{max}/2$ (see EQ(74)),

$$Y_i = \frac{U}{2}\left(\frac{W_{max}}{2} + W_{max}\right) + b\left(\frac{W_{max}}{2} + \frac{U}{2b}\right) + V_i W_{max} \tag{78}$$
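Because the discrepancy with the paper comes down to bookkeeping in (75)-(78), exact rational arithmetic is a convenient way to check it. The Python sketch below (function names hypothetical) sums (75) directly, compares it with (77) and (78), and shows that the paper's expression (79) differs by exactly $bW_{max}/2 + U/2$. It assumes $U$ is a multiple of $b$ so that $U/b$ counts whole window increments; the parameter values are illustrative.

```python
from fractions import Fraction


def y_sum(wmax: int, u: int, v: int, b: int) -> Fraction:
    # (75): window Wmax/2 + j held for b rounds, j = 0..U/b, then V rounds at Wmax.
    assert u % b == 0, "assumes U is a multiple of b"
    return sum((Fraction(wmax, 2) + j) * b for j in range(u // b + 1)) + v * wmax


def y_eq77(wmax: int, u: int, v: int, b: int) -> Fraction:
    w, uf = Fraction(wmax), Fraction(u)
    return w * uf / 2 + b * w / 2 + (uf / 2) * (uf / b + 1) + v * w


def y_eq78(wmax: int, u: int, v: int, b: int) -> Fraction:
    # (78), valid when U/b = Wmax/2 as in (74).
    w, uf = Fraction(wmax), Fraction(u)
    return (uf / 2) * (w / 2 + w) + b * (w / 2 + uf / (2 * b)) + v * w


def y_eq79(wmax: int, u: int, v: int) -> Fraction:
    # (79), the expression given in the paper.
    w, uf = Fraction(wmax), Fraction(u)
    return (uf / 2) * (w / 2 + w) + v * w


wmax, b, v = 20, 2, 5
u = b * wmax // 2  # so that U/b = Wmax/2
print(y_sum(wmax, u, v, b), y_eq77(wmax, u, v, b),
      y_eq78(wmax, u, v, b), y_eq79(wmax, u, v))  # prints: 430 430 430 400
```

Here $bW_{max}/2 + U/2 = 20 + 10 = 30$, which is exactly the gap between (78) and (79).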

This is not the same as in the paper! The paper says:

$$Y_i = \frac{U}{2}\left(\frac{W_{max}}{2} + W_{max}\right) + V_i W_{max} \tag{79}$$

$$E[Y] = \frac{3}{4} W_{max} E[U] + W_{max} E[V] = \frac{3b}{8} W_{max}^2 + W_{max} E[V] \tag{80}$$

Using (10) for $E[Y]$,

$$E[Y] = \frac{1-p}{p} + E[W] = \frac{1-p}{p} + W_{max} = \frac{3b}{8} W_{max}^2 + W_{max} E[V] \tag{81}$$

$$E[V] = \frac{1-p}{p\,W_{max}} + 1 - \frac{3b}{8} W_{max} \tag{82}$$

Since $X_i = U_i + V_i$ (the number of rounds in TDP $i$),

$$E[X] = E[U] + E[V] = \frac{b}{8} W_{max} + \frac{1-p}{p\,W_{max}} + 1 \tag{83}$$

Using this value of $E[X]$ and setting $E[W] = W_{max}$ in (70),

$$B(p) = \frac{\dfrac{1-p}{p} + W_{max} + \hat{Q}(W_{max})\,\dfrac{1}{1-p}}{RTT\left(\dfrac{b}{8} W_{max} + \dfrac{1-p}{p\,W_{max}} + 2\right) + \hat{Q}(W_{max})\,T_0\,\dfrac{f(p)}{1-p}} \tag{84}$$

Combining (70) and (84):

$$B(p) = \begin{cases} \dfrac{\dfrac{1-p}{p} + E[W] + \hat{Q}(E[W])\,\dfrac{1}{1-p}}{RTT\,(E[X]+1) + \hat{Q}(E[W])\,T_0\,\dfrac{f(p)}{1-p}}, & \text{if } E[W_u] < W_{max} \\[4ex] \dfrac{\dfrac{1-p}{p} + W_{max} + \hat{Q}(W_{max})\,\dfrac{1}{1-p}}{RTT\left(\dfrac{b}{8} W_{max} + \dfrac{1-p}{p\,W_{max}} + 2\right) + \hat{Q}(W_{max})\,T_0\,\dfrac{f(p)}{1-p}}, & \text{otherwise} \end{cases} \tag{85}$$

An approximate model combines (71) with the receiver-window limit:

$$B(p) \approx \min\left(\frac{W_{max}}{RTT},\ \frac{1}{RTT\sqrt{\dfrac{2bp}{3}} + T_0 \min\!\left(1,\ 3\sqrt{\dfrac{3bp}{8}}\right) p\,(1 + 32p^2)}\right) \tag{86}$$

The first term comes about because if $W_{max}$ is the constraint, then the throughput is simply $W_{max}/RTT$. This is an approximation (it sounds more like an upper bound to me). If $W_{max}$ is small, then $E[W]$, the average cwnd, is set equal to $W_{max}$. If the opposite is true, then $E[W]$ is assumed to be the unconstrained window size.
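A minimal Python sketch of the receiver-window-limited case follows. It computes $E[W_u]$ from (72) to decide which branch of (85) applies, and evaluates the second branch, (84), together with the approximation (86). Assumptions: $b = 2$ by default, RTT and $T_0$ in seconds (so throughput is in packets per second), and illustrative parameter values. The first branch of (85) would additionally need $E[X]$ from (27), which is not reproduced here. Function names are hypothetical.

```python
import math


def f(p: float) -> float:
    # f(p) from (69).
    return 1 + p + 2 * p**2 + 4 * p**3 + 8 * p**4 + 16 * p**5 + 32 * p**6


def qhat(w: float, p: float) -> float:
    # (54), capped at 1.
    num = (1 - (1 - p) ** 3) * (1 + (1 - p) ** 3 * (1 - (1 - p) ** (w - 2)))
    return min(1.0, num / (1 - (1 - p) ** w))


def expected_wu(p: float, b: int = 2) -> float:
    # (72): average unconstrained window.
    c = (2 + b) / (3 * b)
    return c + math.sqrt(8 * (1 - p) / (3 * b * p) + c * c)


def throughput_rwnd_limited(p: float, rtt: float, t0: float, wmax: float, b: int = 2) -> float:
    # (84): E[W] = Wmax and E[X] + 1 taken from (83).
    q = qhat(wmax, p)
    num = (1 - p) / p + wmax + q / (1 - p)
    den = rtt * (b * wmax / 8 + (1 - p) / (p * wmax) + 2) + q * t0 * f(p) / (1 - p)
    return num / den


def throughput_approx_rwnd(p: float, rtt: float, t0: float, wmax: float, b: int = 2) -> float:
    # (86): approximation (71) capped by Wmax/RTT.
    td_term = rtt * math.sqrt(2 * b * p / 3)
    to_term = t0 * min(1.0, 3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p**2)
    return min(wmax / rtt, 1.0 / (td_term + to_term))


p, rtt, t0, wmax = 0.001, 0.1, 1.0, 8
print("E[Wu] =", round(expected_wu(p), 2), "vs Wmax =", wmax)
print("(84):", round(throughput_rwnd_limited(p, rtt, t0, wmax), 2))
print("(86):", round(throughput_approx_rwnd(p, rtt, t0, wmax), 2))
print("Wmax/RTT =", wmax / rtt)
```

With these values $E[W_u]$ exceeds $W_{max}$, so (84) is the relevant branch; (86) returns the cap $W_{max}/RTT$, while (84) comes out somewhat below it, in line with the remark that $W_{max}/RTT$ behaves more like an upper bound.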

4 Summary of contribution from [2]

In the discussion section in [1], the authors state that the time spent in slow start is negligible compared to the length of their traces (file sizes are large). The introduction in [2] states that if the probability of loss is low (i.e., load is low), slow start behavior will dominate. In [1], there are no measurements with low p. The model predicts something in the graphs, but these predictions are probably not accurate. Our numbers from plotting the various components of the total delay formula from [2] bore this out: when p was small, SS dominated; at large p, CA dominated. Reference [2] says slow start after RTOs can be ignored, because for large p's CA has the same throughput as slow start after RTOs. Why? Also, they warn that for small p's, even if RTOs are rare because failures are rare, their durations can be long when they do occur.

Neither paper speaks of ssthresh. Why? Is this a settable parameter in sockets programming? What is ssthresh typically? If it is large, it seems to me that slow start would become important. ssthresh probably does not matter because, whether loss detection is through TD or TO, ssthresh gets set to cwnd/2, but no less than 2 segments. This makes ssthresh very small, so after a loss the system quickly enters CA. Therefore ssthresh and the impact of slow start are probably only seen in the very first ramp-up before the first loss (which is what [2] has in its model). See Fig. 21.9 of Stevens [5]: ssthresh drops from 65K to 512. It is 512 because it must be at least two segments. Strictly speaking, it is not true, as [2] assumes, that SS ends after the first loss; SS could be reentered after a TO, but even if it is, ssthresh is so small that the connection quickly gets to CA. ssthresh can increase if cwnd becomes very large.

Another point to note: the details of the fast recovery procedure are not modeled. This is not important, since the measurements validate the model.

References

[1] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, "Modeling TCP Throughput: A Simple Model and its Empirical Validation," Proc. of ACM SIGCOMM 98, Vancouver, Canada, Aug. 31 - Sep. 4, 1998, pp. 303-314.

[2] N. Cardwell, S. Savage, and T. Anderson, "Modeling TCP Latency," Proc. of IEEE Infocom, Tel-Aviv, Israel, Mar. 26-30, 2000, pp. 1742-1751.

[3] M. Allman, "On the Generation and Use of TCP Acknowledgments," ACM Computer Communication Review, vol. 28, no. 5, Oct. 1998.

[4] T. V. Lakshman and U. Madhow, "The Performance of TCP/IP for Networks with High Bandwidth-Delay Products and Random Loss," IEEE/ACM Transactions on Networking, vol. 5, no. 3, June 1997.

[5] W. R. Stevens, "TCP/IP Illustrated, Vol. I."

[6] M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow, "TCP Selective Acknowledgment Options," IETF RFC 2018, October 1996.
