Jiachen Chen
WINLAB, Dept. of Computer Science
Rutgers University
Transport Layer ECE544: Communication Networks-II, Spring 2018
Includes teaching materials from L. Peterson, Sumathi Gopal and Sumit Rangwala, D. Raychaudhuri, Mike Freedman, Tam Vu
OSI Protocol Stack: Key Abstractions
Problem: Network Layer (IP) provides only best-effort communication services
Key abstractions: best-effort local packet delivery, best-effort global packet delivery, reliable streams, messages, applications
Layers and example protocols:
Application: File share, Virtual terminal, …
Presentation: HTML, XML, JSON, CSS, …
Session: RPC, SCP, NFS, …
Transport: TCP, UDP, …
Network: IP
Data link: 802.2, MPLS, …
Physical: ISDN, USB, DSL, DOCSIS, …
Application requirements vs. network layer limitations
Guarantee message delivery, but the network may drop messages.
Deliver messages in the same order they are sent, but messages may be reordered in the network and may incur long delays.
Deliver at most one copy of each message, but messages may be duplicated in the network.
Support arbitrarily large messages, but the network may limit message size.
Support synchronization between sender and receiver
Allow the receiver to apply flow control to the sender
Support multiple application processes on each host, but the network only supports communication between hosts
Many more
IP Protocol Stack: Key Abstractions
Transport layer:
Provide applications with good abstractions
Without support or feedback from the network
Is the lowest layer in the network stack that is an end-to-end protocol
Layers and the abstractions they provide:
Application: applications
Transport: reliable streams, messages
Network: best-effort global packet delivery
Link: best-effort local packet delivery
Transport Protocols
Logical communication between processes
Sender divides a message into segments
Receiver reassembles segments into message
Transport services
(De)multiplexing packets
Detecting corrupted data
Optionally: reliable delivery, flow control, …
Two Basic Transport Features
Demultiplexing: port numbers
Example: a client host sends a service request for 128.2.194.242:80 (i.e., the Web server); the server host's OS hands it to the Web server on port 80, not to the Echo server on port 7
Error detection: checksums over the IP payload detect corruption
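A minimal sketch of port-based demultiplexing, assuming a hypothetical lookup table `LISTENERS` that maps a (destination IP, destination port) pair to a listening process. Real stacks hold socket objects rather than names, and for TCP they key on the full 4-tuple.

```python
from typing import Dict, Optional, Tuple

# Hypothetical listener table: (local IP, local port) -> process name
LISTENERS: Dict[Tuple[str, int], str] = {
    ("128.2.194.242", 80): "web server",
    ("128.2.194.242", 7): "echo server",
}

def demultiplex(dst_ip: str, dst_port: int) -> Optional[str]:
    """Return the process that should receive the packet, or None if nobody listens."""
    return LISTENERS.get((dst_ip, dst_port))

print(demultiplex("128.2.194.242", 80))    # web server
print(demultiplex("128.2.194.242", 8080))  # None
```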
Most Popular Transport Protocols
User Datagram Protocol (UDP)
Supports multiple application processes on each host
Option to check messages for correctness with a checksum
Transmission Control Protocol (TCP)
Ensures reliable delivery of packets between source and destination processes
Ensures in-order delivery of packets to the destination process
Other services
Datagram Congestion Control Protocol (DCCP)
Message-oriented protocol
Reliable connection setup and teardown, ECN, congestion control, feature negotiation
User Datagram Protocol (UDP)
Service: support for multiple processes on each host to communicate
Issue: IP only provides communication between hosts (IP addresses)
Solution
Add port numbers and associate a process with a port number
4-tuple unique connection identifier: [SrcPort, SrcIPAddr, DestPort, DestIPAddr]
Lightweight communication between processes
Send and receive messages
Avoid overhead of ordered, reliable delivery
No connection setup delay, no in-kernel connection state
Used by popular apps
Query/response for DNS
Real-time data in VoIP
UDP header layout (32-bit rows, bits 0–15 | 16–31):
SrcPort | DestPort
Length | Checksum
Payload
User Datagram Protocol (UDP): Error Detection
Service: ensure message correctness
Issue: packet corruption in transit
Solution: use a checksum
Computed over the UDP header, the payload, and a pseudo header
Pseudo header: protocol number, source IP address, destination IP address, and UDP length
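A sketch of the UDP checksum for IPv4, assuming the standard Internet checksum (16-bit ones' complement sum with end-around carry) and the pseudo header listed above; the function names are illustrative, not a library API. The receiver-side check at the end relies on the property that summing the data together with a correct checksum yields all ones.

```python
import struct

def ones_complement_sum16(data: bytes) -> int:
    """16-bit ones' complement sum of big-endian words, with end-around carry."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def udp_checksum(src_ip: bytes, dst_ip: bytes, src_port: int,
                 dst_port: int, payload: bytes) -> int:
    """Checksum over pseudo header + UDP header + payload (IPv4)."""
    length = 8 + len(payload)                     # UDP header is 8 bytes
    # Pseudo header: source IP, destination IP, zero byte, protocol 17 (UDP), UDP length
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, length)
    # UDP header with the checksum field zeroed while computing
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = ~ones_complement_sum16(pseudo + header + payload) & 0xFFFF
    return csum or 0xFFFF                         # a computed 0 is sent as all ones

src, dst = bytes([192, 168, 0, 1]), bytes([192, 168, 0, 2])
c = udp_checksum(src, dst, 1234, 53, b"hello")
# Receiver check: summing everything, checksum included, gives 0xFFFF
pseudo = src + dst + struct.pack("!BBH", 0, 17, 13)
header = struct.pack("!HHHH", 1234, 53, 13, c)
print(ones_complement_sum16(pseudo + header + b"hello") == 0xFFFF)  # True
```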
Advantages of UDP
Fine-grain control
UDP sends as soon as the application writes
No connection set-up delay
UDP sends without establishing a connection
No connection state
No buffers, parameters, sequence #s, etc.
Small header overhead
UDP header is only eight bytes long
Transmitting a stream of bytes?
Stream-of-bytes service
Sends and receives a stream of bytes
Reliable, in-order delivery
Corruption: checksums
Detect loss/reordering: sequence numbers
Reliable delivery: acknowledgments and retransmissions
Connection oriented
Explicit set-up and tear-down of TCP connection
Flow control
Prevent overflow of the receiver’s buffer space
Congestion control
Adapt to network congestion for the greater good
Transmission Control Protocol (TCP)
First proposed by Vinton Cerf and Robert Kahn, 1974
TCP/IP enabled computers of all sizes, from different vendors, different OSs, to communicate with each other.
Used by 80% of all traffic on the Internet
Reliable, in-order delivery, connection-oriented, byte-stream service
Starting and Ending a Connection: TCP Handshakes
Establishing a TCP Connection
Three-way handshake to establish connection
Host A sends a SYN (open) to the host B
Host B returns a SYN acknowledgment (SYN ACK)
Host A sends an ACK to acknowledge the SYN ACK
Each host tells its Initial Sequence Number (ISN) to the other host.
TCP Header (32-bit rows)
Source port | Destination port
Sequence number
Acknowledgment
HdrLen | 0 | Flags | Advertised window
Checksum | Urgent pointer
Options (variable)
Data
Flags: SYN, FIN, RST, PSH, URG, ACK
Step 1: A’s Initial SYN Packet
A’s port | B’s port
A’s Initial Sequence Number
Acknowledgment
HdrLen = 20 | 0 | Flags | Advertised window
Checksum | Urgent pointer
Options (variable)
Flags: SYN set (FIN, RST, PSH, URG, ACK clear)
A tells B it wants to open a connection…
Step 2: B’s SYN-ACK Packet
B’s port | A’s port
B’s Initial Sequence Number
Acknowledgment: A’s ISN plus 1
HdrLen = 20 | 0 | Flags | Advertised window
Checksum | Urgent pointer
Options (variable)
Flags: SYN and ACK set
B tells A it accepts, and is ready to hear the next byte…
… upon receiving this packet, A can start sending data
Step 3: A’s ACK of the SYN-ACK
A’s port | B’s port
Sequence number
Acknowledgment: B’s ISN plus 1
HdrLen = 20 | 0 | Flags | Advertised window
Checksum | Urgent pointer
Options (variable)
Flags: ACK set
A tells B it is okay to start sending
… upon receiving this packet, B can start sending data
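The sequence and acknowledgment fields of the three handshake steps can be sketched as below. This is an illustration only: the ISN values are arbitrary, and setting A’s step-3 sequence number to its ISN plus 1 assumes the rule that a SYN consumes one sequence number.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around

@dataclass(frozen=True)
class Segment:
    seq: int
    ack: Optional[int]       # None when the ACK flag is not set
    flags: FrozenSet[str]

def three_way_handshake(isn_a: int, isn_b: int) -> List[Segment]:
    """Header fields of the three handshake segments (no data, no options)."""
    syn = Segment(isn_a, None, frozenset({"SYN"}))
    # The SYN occupies one sequence number, so B acknowledges A's ISN plus 1
    syn_ack = Segment(isn_b, (isn_a + 1) % SEQ_SPACE, frozenset({"SYN", "ACK"}))
    ack = Segment((isn_a + 1) % SEQ_SPACE, (isn_b + 1) % SEQ_SPACE, frozenset({"ACK"}))
    return [syn, syn_ack, ack]

for seg in three_way_handshake(isn_a=1000, isn_b=5000):
    print(sorted(seg.flags), seg.seq, seg.ack)
```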
SYN Loss and Web Downloads
Upon sending SYN, sender sets a timer
If SYN lost, timer expires before SYN-ACK received
Sender retransmits SYN
How should the TCP sender set the timer?
No idea how far away the receiver is
Some TCPs use default of 3 or 6 seconds
Implications for web download
User gets impatient and hits reload
… User aborts connection, initiates new socket
Essentially, forces a fast send of a new SYN!
Tearing Down the Connection
Closing (each end of) the connection
Finish (FIN) to close and receive remaining bytes
The other host sends an ACK (FIN ACK) to acknowledge it
Reset (RST) to close and not receive remaining bytes
Sending/Receiving the FIN Packet
Sending a FIN: close()
Process is done sending data via socket
Process invokes “close()”
Once TCP has sent all the outstanding bytes…
… then TCP sends a FIN
Receiving a FIN: EOF
Process is reading data from socket
Eventually, read call returns an EOF
Data transmission
TCP: Byte-stream
Service: byte stream
Application reads or writes a stream of bytes to the transport
Issue: IP is packet-oriented
Solution: TCP maintains a local buffer
Chop the stream into packets and transmit (sender)
Coalesce data from packets to form a stream (receiver)
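TCP “Stream of Bytes” Service
(Figure: Host A writes a stream of bytes that Host B reads.)
…Emulated Using TCP “Segments”
(Figure: Host A’s byte stream is carried to Host B as a series of TCP Data segments.)
A segment is sent when:
1. The segment is full (Max Segment Size),
2. It is not full, but times out, or
3. It is “pushed” by the application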
TCP Segment
IP packet
No bigger than Maximum Transmission Unit (MTU)
E.g., up to 1500 bytes on an Ethernet link
TCP packet
IP packet with a TCP header and data inside
TCP header is typically 20 bytes long
TCP segment
No more than Maximum Segment Size (MSS) bytes
E.g., up to 1460 consecutive bytes from the stream
Nesting: [IP Hdr | IP Data], where IP Data = [TCP Hdr | TCP Data (segment)]
Sequence Number
(Figure: Host A sends TCP Data segments to Host B; the byte stream starts at the ISN (initial sequence number).)
Sequence number = number of the 1st byte in the segment
Initial Sequence Number (ISN)
Sequence number for the very first byte
Why not a de facto ISN of 0?
Practical issue: reuse of port numbers
Port numbers must (eventually) get used again
… and an old packet may still be in flight
… and associated with the new connection
So, TCP must change the ISN over time
Set from a 32-bit clock that ticks every 4 microsec
… which wraps around once every 2^32 × 4 µs ≈ 4.77 hours (RFC 793 quotes approximately 4.55 hours)
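The clock-driven ISN and its wrap-around period can be checked with a short sketch. `clock_isn` is a hypothetical helper that simply counts 4 µs ticks since some reference time, modulo 2^32; it uses integer microseconds to avoid floating-point rounding.

```python
TICK_US = 4                 # counter ticks once every 4 microseconds
SEQ_SPACE = 2 ** 32         # 32-bit sequence number space

def clock_isn(elapsed_us: int) -> int:
    """Hypothetical clock-driven ISN: ticks elapsed since a reference time, mod 2^32."""
    return (elapsed_us // TICK_US) % SEQ_SPACE

wrap_hours = SEQ_SPACE * TICK_US / 1e6 / 3600
print(f"ISN clock wraps every {wrap_hours:.2f} hours")  # 4.77
```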
Reliable Delivery on a Lossy Channel With Bit Errors
Challenges of Reliable Data Transfer
Over a perfectly reliable channel: Done
Over a channel with bit errors
Receiver detects errors and requests retransmission
Over a lossy channel with bit errors
Some data missing, others corrupted
Receiver cannot easily detect loss
Over a channel that may reorder packets
Receiver cannot easily distinguish loss vs. out-of-order
An Analogy
Alice and Bob are talking
What if Alice couldn’t understand Bob?
Alice asks Bob to repeat what he said
What if Bob hasn’t heard Alice for a while?
Is Alice just being quiet? Has she lost reception?
How long should Bob just keep on talking?
Maybe Alice should periodically say “uh huh”
… or Bob should ask “Can you hear me now?”
Take-Aways from the Example
Acknowledgments from receiver
Positive: “okay” or “uh huh” or “ACK”
Negative: “please repeat that” or “NACK”
Retransmission by the sender
After not receiving an “ACK”
After receiving a “NACK”
Timeout by the sender (“stop and wait”)
Don’t wait forever without some acknowledgment
TCP Support for Reliable Delivery
Detect bit errors: checksum
Used to detect corrupted data at the receiver
…leading the receiver to drop the packet
Detect missing data: sequence number
Used to detect a gap in the stream of bytes
... and for putting the data back in order
Recover from lost data: retransmission
Sender retransmits lost or corrupted data
Two main ways to detect lost packets
TCP Acknowledgments
(Figure: Host A sends TCP Data segments to Host B, which returns ACKs.)
ISN (initial sequence number)
Sequence number = 1st byte
ACK sequence number = next expected byte
Automatic Repeat reQuest (ARQ)
ACK and timeouts
Receiver sends ACK when it receives a packet
Sender waits for ACK and times out
Simplest ARQ protocol: stop and wait
Send a packet, stop and wait until the ACK arrives
(Figure: stop-and-wait timeline between Sender and Receiver, with a timeout.)
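A stop-and-wait simulation, sketched under two assumptions not stated on the slide: a 1-bit (alternating-bit) sequence number so the receiver can discard duplicates, and a caller-supplied `lost()` function standing in for the lossy channel. A lost data packet or a lost ACK both end in a timeout and a retransmission.

```python
from typing import Callable, List, Tuple

def stop_and_wait(packets: List[str], lost: Callable[[], bool]) -> Tuple[List[str], int]:
    """Simulate stop-and-wait ARQ; lost() returns True when a transmission is dropped.

    Returns (data delivered to the receiving application, number of transmissions).
    """
    delivered: List[str] = []
    transmissions = 0
    expected = 0                      # receiver's next expected 1-bit sequence number
    for i, data in enumerate(packets):
        seq = i % 2                   # alternating-bit sequence number
        while True:
            transmissions += 1
            if lost():                # data lost: no ACK, timer expires, resend
                continue
            if seq == expected:       # new data: deliver it exactly once
                delivered.append(data)
                expected ^= 1
            # else: duplicate caused by a lost ACK; re-ACK but do not deliver again
            if lost():                # ACK lost: timer expires, resend
                continue
            break                     # ACK arrived: send the next packet
    return delivered, transmissions

# First ACK is lost, then the second data packet is lost once
pattern = iter([False, True, False, False, True, False, False])
print(stop_and_wait(["A", "B"], lambda: next(pattern)))  # (['A', 'B'], 4)
```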
Quick TCP Math
• Initial Seq No = 501. Sender sends 4500 bytes, successfully acknowledged. Next sequence number to send is:
(A) 4501 (B) 5000 (C) 5001 (D) 5002
• Next 1000-byte TCP segment received. Receiver acknowledges with ACK number:
(A) 5001 (B) 6000 (C) 6001
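The arithmetic behind both questions, sketched with hypothetical helper names. Like the question, it treats the ISN as the number of the first data byte; in real TCP the SYN consumes the ISN, so data would start at ISN + 1.

```python
SEQ_SPACE = 2 ** 32

def next_seq(first_seq: int, nbytes: int) -> int:
    """Bytes occupy first_seq .. first_seq + nbytes - 1, so the next one is first_seq + nbytes."""
    return (first_seq + nbytes) % SEQ_SPACE

def ack_for(segment_seq: int, segment_len: int) -> int:
    """Cumulative ACK = next expected byte after an in-order segment."""
    return (segment_seq + segment_len) % SEQ_SPACE

nxt = next_seq(501, 4500)     # first question
ack = ack_for(nxt, 1000)      # second question
print(nxt, ack)               # 5001 6001
```

Both answers are therefore (C).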
Flow Control: TCP Sliding Window
Sliding Window: Motivation
Stop-and-wait is inefficient
Only one TCP segment is “in flight” at a time
Consider: 1.5 Mbps link with 50 ms round-trip-time (RTT)
Assume segment size of 1 KB (8 Kbits)
8 Kbits/segment at 50 msec/segment → 160 Kbps
That’s 11% of the capacity of the 1.5 Mbps link
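The slide’s numbers, recomputed in a short sketch. The last two lines go one small step further using the bandwidth-delay product (link rate × RTT), a standard quantity the slide does not name, to estimate how many segments must be in flight to keep this link busy.

```python
LINK_BPS = 1.5e6        # 1.5 Mbps link
RTT_S = 0.050           # 50 ms round-trip time
SEGMENT_BITS = 8_000    # 1 KB segment, counted as 8 Kbits as on the slide

stop_and_wait_bps = SEGMENT_BITS / RTT_S       # one segment per RTT
utilization = stop_and_wait_bps / LINK_BPS
# Bandwidth-delay product: bits that must be in flight to keep the link busy
bdp_bits = LINK_BPS * RTT_S
window_segments = bdp_bits / SEGMENT_BITS

print(f"stop-and-wait: {stop_and_wait_bps / 1e3:.0f} Kbps ({utilization:.1%})")
print(f"window to fill the pipe: {window_segments:.1f} segments")
```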
Sliding Window
Allow a larger amount of data “in flight”
Allow sender to get ahead of the receiver
… though not too far ahead
Sender side (sending process → TCP): Last byte written, Last byte sent, Last byte ACKed
Receiver side (TCP → receiving process): Last byte received, Next byte expected, Last byte read
Receiver Buffering
Receive window size
Amount that can be sent without acknowledgment
Receiver must be able to store this amount of data
Receiver tells the sender the window
Tells the sender the amount of free space left
(Figure: the sender’s byte sequence split into Data ACK’d, Outstanding un-ACK’d data, Data OK to send, and Data not OK to send yet; the window size spans the outstanding and OK-to-send regions.)
TCP: Flow Control
Flow control: “Prevent sender from overrunning the capacity (buffer) of the receiver”
Solution: use an adaptive receiver window size
Goal: keep LastByteRcvd (C) – LastByteRead (A) ≤ MaxRcvBuffer
Every packet carries ACK and AdvertisedWindow
(Figure: sending application → TCP with pointers LastByteAcked (J), LastByteSent (K), LastByteWritten (I); TCP → receiving application with LastByteRead (A), NextByteExpected (B), LastByteRcvd (C).)
AdvertisedWindow = MaxRcvBuffer – ((NextByteExpected – 1) – LastByteRead)
LastByteSent (K) – LastByteAcked (J) ≤ AdvertisedWindow
EffectiveWindow = AdvertisedWindow – (LastByteSent – LastByteAcked)
LastByteWritten – LastByteAcked ≤ MaxSendBuffer
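The receiver and sender formulas above, as a minimal sketch with made-up buffer sizes and byte positions. Clamping the effective window at zero is a choice made here so a shrinking window never yields a negative allowance.

```python
def advertised_window(max_rcv_buffer: int, next_byte_expected: int, last_byte_read: int) -> int:
    """Free receive buffer: bytes received in order but not yet read are subtracted."""
    buffered = (next_byte_expected - 1) - last_byte_read
    return max_rcv_buffer - buffered

def effective_window(adv_window: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """How many more bytes the sender may transmit now (clamped at 0)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, adv_window - in_flight)

# Hypothetical numbers: 4 KB receive buffer, application has read up to byte 1000,
# bytes up to 3000 arrived in order (so 3001 is next expected).
adv = advertised_window(4096, next_byte_expected=3001, last_byte_read=1000)
eff = effective_window(adv, last_byte_sent=3500, last_byte_acked=3000)
print(adv, eff)  # 2096 1596
```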
Optimizing Retransmissions
Reasons for Retransmission
(Figure: three timeout scenarios. Packet lost: the sender retransmits after the timeout. ACK lost: the retransmission arrives as a duplicate packet. Early timeout: duplicate packets arrive.)
How Long Should Sender Wait?
Sender sets a timeout to wait for an ACK
Too short: wasted retransmissions
Too long: excessive delays when packet lost
TCP sets timeout as a function of the RTT
Expect ACK to arrive after a “round-trip time”
… plus a fudge factor to account for queuing
But, how does the sender know the RTT?
Running average of delay to receive an ACK
TCP Timeout
Issue: RTT in a wide area network varies substantially
Solution: Adaptive Timeout
Original algorithm:
EstimatedRTT = α × EstimatedRTT + (1 – α) × SampleRTT, with 0 < α < 1
Timeout = β × EstimatedRTT (β = 2)
Problems:
Does not distinguish whether the ACK is for the original transmission or a retransmission
A constant β is not good: it assumes constant variance
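The original estimator, sketched as a function over a list of RTT samples. Seeding with the first sample and α = 0.875 are assumptions for illustration; the slide fixes only β = 2.

```python
from typing import Iterable, Tuple

def original_timeout(samples: Iterable[float], alpha: float = 0.875,
                     beta: float = 2.0) -> Tuple[float, float]:
    """Original EWMA estimator; returns (EstimatedRTT, Timeout)."""
    it = iter(samples)
    est = next(it)                      # seed with the first sample (assumption)
    for sample in it:
        est = alpha * est + (1 - alpha) * sample
    return est, beta * est

print(original_timeout([100.0, 200.0]))  # (112.5, 225.0)
```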
TCP Timeout: Karn/Partridge Algorithm
Whenever TCP retransmits a segment, it stops taking samples of the RTT: only measure SampleRTT for segments that have been sent only once
Each time TCP retransmits, set the next timeout to twice the last timeout; this relieves congestion
Jacobson/Karels Algorithm: adaptive variance (uses mean deviation)
Difference = SampleRTT – EstimatedRTT
EstimatedRTT = EstimatedRTT + δ × Difference (same as the original, with δ = 1 – α)
Deviation = Deviation + δ × (|Difference| – Deviation)
Timeout = μ × EstimatedRTT + φ × Deviation
(default: μ = 1 and φ = 4), with 0 < δ < 1
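Both algorithms combined in one small estimator, as a sketch. δ = 0.125 and the initial Deviation of half the first sample are assumed values (the slide gives only μ = 1 and φ = 4). Karn/Partridge shows up twice: retransmitted samples are skipped, and every timeout doubles the current timeout.

```python
class RttEstimator:
    """Jacobson/Karels timeout with the Karn/Partridge sampling rule and backoff."""

    def __init__(self, first_sample: float, delta: float = 0.125,
                 mu: float = 1.0, phi: float = 4.0) -> None:
        self.est = first_sample
        self.dev = first_sample / 2      # assumed initial Deviation
        self.delta, self.mu, self.phi = delta, mu, phi
        self.timeout = self._from_estimates()

    def _from_estimates(self) -> float:
        return self.mu * self.est + self.phi * self.dev

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> float:
        if was_retransmitted:            # Karn/Partridge: ambiguous sample, skip it
            return self.timeout
        diff = sample_rtt - self.est
        self.est += self.delta * diff
        self.dev += self.delta * (abs(diff) - self.dev)
        self.timeout = self._from_estimates()
        return self.timeout

    def on_timeout(self) -> float:
        self.timeout *= 2                # Karn/Partridge: double after each retransmission
        return self.timeout

r = RttEstimator(100.0)
print(r.timeout)                  # 300.0
print(r.on_ack(100.0, False))     # 275.0
print(r.on_ack(500.0, True))      # 275.0 (sample ignored)
print(r.on_timeout())             # 550.0
```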
TCP Deadlock
If the receiver advertises a window size of 0, the sender stops sending data
If the receiver’s later window-size update is lost, sender and receiver wait on each other forever
To solve it: the sender starts the persist timer when AdvertisedWindow = 0
When the persist timer expires, the sender sends a small packet (a window probe) to elicit a fresh window advertisement
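A sketch of the sender’s decision around a zero window. `sender_step` is a hypothetical helper, and the 1-byte probe size is an assumption (the slide only says “a small packet”); the ACK the probe elicits carries the receiver’s current window, which breaks the deadlock.

```python
from typing import Tuple

def sender_step(adv_window: int, unsent: int, persist_expired: bool) -> Tuple[str, int]:
    """Decide what a sender does next; returns (action, bytes)."""
    if adv_window > 0:
        return ("send", min(adv_window, unsent))   # normal flow-controlled send
    if persist_expired:
        return ("probe", 1)                        # small window probe (1 byte assumed)
    return ("wait", 0)                             # zero window: persist timer keeps running

print(sender_step(0, 300, persist_expired=True))   # ('probe', 1)
```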
Triggering Transmission
When to transmit a segment (small segments are subject to large header overhead):
When it reaches the maximum segment size (MSS): the size of the largest segment TCP can send without causing the local IP to fragment, MSS = local MTU – IP & TCP headers
When the sending process explicitly asks TCP to transmit (“push”)
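The MSS formula and the three sending triggers listed earlier (full segment, timeout, push), as a small sketch. The 20-byte IP and TCP headers assume no options; `mss` and `should_send` are illustrative names.

```python
def mss(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """MSS = local MTU - IP header - TCP header (20-byte headers without options assumed)."""
    return mtu - ip_header - tcp_header

def should_send(buffered: int, mss_bytes: int, timer_expired: bool, pushed: bool) -> bool:
    """Send a full segment right away, or a partial one on timeout or push."""
    if buffered >= mss_bytes:
        return True
    return buffered > 0 and (timer_expired or pushed)

print(mss(1500))  # 1460 on an Ethernet link
```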
Congestion
When the network cannot support the sender’s rate
Queues at the network elements overflow
(Figure: Source1, Source2, and Source3 sending through a shared network to Dest1 and Dest2.)
Even with flow control, packets might not reach the destination
Congestion Control vs. Flow Control
Congestion control
Mechanism to prevent sender from overrunning the capacity of the network
When the network is the bottleneck
Flow control
Mechanism to prevent sender from overrunning the capacity of the receiver
When the receiver is the bottleneck
Congestion Control: Design Approach
Maintain another window at the sender called CongestionWindow (cwnd)
CongestionWindow is the max number of packets allowed in the network
i.e., the number of unACKed packets at the sender
Key: how to calculate the congestion window (cwnd)? Various approaches are possible
TCP estimates it based on observed packet losses
Assumes packet loss is an indication of congestion
Since we don’t know whether the network or the receiver is the bottleneck:
MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
EffectiveWindow = MaxWindow – (LastByteSent – LastByteAcked)
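The combined window rule above, as a sketch (`sender_window` is an illustrative name, and the zero clamp is added here). The two calls show each window in turn being the binding limit.

```python
def sender_window(cwnd: int, adv_window: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """EffectiveWindow = MIN(cwnd, AdvertisedWindow) - bytes in flight, clamped at 0."""
    max_window = min(cwnd, adv_window)        # whichever bottleneck is tighter
    in_flight = last_byte_sent - last_byte_acked
    return max(0, max_window - in_flight)

print(sender_window(8_000, 20_000, last_byte_sent=15_000, last_byte_acked=10_000))  # 3000
print(sender_window(30_000, 6_000, last_byte_sent=15_000, last_byte_acked=10_000))  # 1000
```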
TCP Congestion Control
TCP sends packets into the network without reservation
Tries to use network resources (bandwidth, buffer) as much as it can
As congestion occurs, scales back
Strategy:
Conservatively increase the packet sending rate (cwnd) if there is no congestion
Quickly reduce the sending rate (cwnd) when congestion is detected (packet loss)
Congestion Avoidance (AIMD)
If no congestion in the network (increase conservatively): increase the congestion window additively every RTT
If congestion in the network (decrease aggressively): decrease the congestion window multiplicatively, immediately
How is congestion detected? Estimated (more later)
Additive increase, every RTT: w = w + 1 (w = cwnd in segments)
Every ACK reception: w = w + 1/w (w = cwnd in segments)
Every ACK reception: cwnd = cwnd + MSS × (MSS / cwnd) (cwnd in bytes)
Multiplicative decrease: cwnd = cwnd / 2 (cwnd in bytes)
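The byte-based AIMD rules, sketched for one RTT of ACKs followed by a loss. The one-MSS floor in the decrease and the 1000-byte MSS are assumptions for illustration. Because cwnd grows slightly with each ACK, one window’s worth of ACKs adds a little under one MSS.

```python
def additive_increase(cwnd: float, mss: int) -> float:
    """Per-ACK update in bytes: about +1 MSS per RTT when a full window is ACKed."""
    return cwnd + mss * (mss / cwnd)

def multiplicative_decrease(cwnd: float, mss: int) -> float:
    """Halve on congestion; never below one MSS (floor is an assumption)."""
    return max(cwnd / 2, mss)

cwnd = 10_000.0                 # 10 segments of 1000 bytes
for _ in range(10):             # one RTT's worth of ACKs
    cwnd = additive_increase(cwnd, 1000)
after_rtt = cwnd
after_loss = multiplicative_decrease(after_rtt, 1000)
print(round(after_rtt), round(after_loss))
```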
Congestion Avoidance (AIMD): TCP’s sawtooth pattern
Issues with additive increase: it takes too long to ramp up a connection from the beginning
The entire advertised window may be reopened when a lost packet is retransmitted and a single cumulative ACK is received by the sender
(Figure: CongestionWindow size vs. time, a sawtooth with a slow ramp-up during the startup time.)
TCP “Slow Start”: To start quickly!
Maintain another variable, the slow start threshold (ssthresh): the last known stable rate
If (cwnd > ssthresh) State = congestion avoidance
Else State = slow start
In slow start, increase the congestion window exponentially every RTT
Key: how is ssthresh calculated?
Every ACK reception: w = w + 1 (w = cwnd in segments)
Every ACK reception: cwnd = cwnd + MSS (cwnd in bytes)
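A sketch of how cwnd evolves per RTT with the slow start / congestion avoidance switch above, in segments. It assumes no losses, an initial window of one segment, and exactly one ACK per segment sent (no delayed ACKs); those are modeling assumptions, not part of the slide.

```python
from typing import List

def on_ack(cwnd: float, ssthresh: float) -> float:
    """Per-ACK window update with cwnd and ssthresh in segments."""
    if cwnd > ssthresh:
        return cwnd + 1 / cwnd   # congestion avoidance: about +1 segment per RTT
    return cwnd + 1              # slow start: +1 per ACK, i.e. doubling per RTT

def cwnd_per_rtt(rtts: int, ssthresh: float) -> List[float]:
    """cwnd at the start of each RTT, assuming no loss and one ACK per segment."""
    cwnd, history = 1.0, [1.0]
    for _ in range(rtts):
        for _ in range(int(cwnd)):   # ACKs for the segments sent this RTT
            cwnd = on_ack(cwnd, ssthresh)
        history.append(cwnd)
    return history

print([round(w, 2) for w in cwnd_per_rtt(5, ssthresh=8)])
```

The first entries double (1, 2, 4, 8); once cwnd exceeds ssthresh, growth slows to roughly one segment per RTT.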
TCP: Congestion Detection and Retransmit
Loss of packet indicates congestion
Timer timeouts (no ACK), set according to the Jacobson/Karels algorithm
On timer timeout: ssthresh = max(2*MSS, effwin/2); cwnd = MSS
Notice this will cause TCP to go into slow start
Issue: takes a long time to detect a packet loss
Affects throughput
Any other quicker way of detecting a packet loss?
Fast Retransmit
Observation: A series of duplicate ACKs might mean a packet loss
Solution: every time the receiver receives an out-of-order packet, it sends a duplicate ACK
Sender retransmits the missing packet after it receives some number of duplicate ACKs (e.g., 3 duplicate ACKs)
Fast Retransmit does not replace timeouts
Issue: reduces latency (early retransmit) but still incurs a loss in throughput (slow start after packet loss)
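(Figure: sender sends PKT 1 to PKT 6 and PKT 3 is lost. Receiver replies ACK 1, ACK 2, then ACK 2 again for each of PKT 4, 5, and 6. After three duplicate ACK 2s the sender retransmits PKT 3, and the receiver replies ACK 6.)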
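Duplicate-ACK counting and the timeout reaction from the earlier slide, sketched together. ACKs here are packet-numbered as in the example (ACK n means packets up to n arrived in order), unlike real TCP’s byte-numbered next-expected ACKs; `fast_retransmit` and `on_timeout` are illustrative names.

```python
from typing import List, Tuple

def fast_retransmit(acks: List[int], dup_threshold: int = 3) -> List[Tuple[str, int]]:
    """Return the retransmissions triggered by duplicate ACKs (packet-numbered ACKs)."""
    actions: List[Tuple[str, int]] = []
    last_ack, dups = None, 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dup_threshold:            # e.g. the third duplicate
                actions.append(("retransmit", ack + 1))
        else:
            last_ack, dups = ack, 0              # new cumulative ACK resets the count
    return actions

def on_timeout(eff_win: float, mss: int) -> Tuple[float, float]:
    """Timeout reaction: returns (ssthresh, cwnd); cwnd = 1 MSS means slow start."""
    return max(2 * mss, eff_win / 2), mss

print(fast_retransmit([1, 2, 2, 2, 2, 6]))  # [('retransmit', 3)]
print(on_timeout(16_000, 1000))             # (8000.0, 1000)
```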
Fast Recovery
Transmit a packet for every ACK received until the retransmitted packet is ACK’d
On the third duplicate ACK: ssthresh = max(2*MSS, cwnd/2); cwnd = ssthresh + 3 (segments, i.e., + 3*MSS in bytes)
On every further duplicate ACK until the retransmitted packet is ACK’d: cwnd = cwnd + 1
On reception of the ACK of the retransmitted packet: start congestion avoidance instead of slow start, cwnd = ssthresh
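One fast-recovery episode traced in segments, as a sketch of the three rules above. `extra_dup_acks` is a hypothetical parameter counting the duplicate ACKs that arrive after the third one.

```python
from typing import Tuple

def fast_recovery(cwnd: float, extra_dup_acks: int) -> Tuple[float, float, float]:
    """Window trace (in segments) for one fast-recovery episode.

    Returns (ssthresh, peak cwnd during recovery, cwnd once the retransmission is ACKed).
    """
    ssthresh = max(2, cwnd / 2)        # on the third duplicate ACK
    cwnd = ssthresh + 3                # inflate for the 3 duplicates already seen
    for _ in range(extra_dup_acks):
        cwnd += 1                      # each further duplicate: a packet left the network
    peak = cwnd
    cwnd = ssthresh                    # ACK of the retransmission: deflate, then congestion avoidance
    return ssthresh, peak, cwnd

print(fast_recovery(16, extra_dup_acks=2))  # (8.0, 13.0, 8.0)
```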
Homework (3rd ed. and 4th ed.): 5.13, 5.16, 5.28, 5.34, 5.39
Due 4/5