CS194-3/CS16x Introduction to Systems Lecture 22 TCP congestion control, Two-Phase Commit November 14, 2007 Prof. Anthony D. Joseph http://www.cs.berkeley.edu/~adj/cs16x


CS194-3/CS16x Introduction to Systems

Lecture 22

TCP congestion control, Two-Phase Commit

November 14, 2007

Prof. Anthony D. Joseph

http://www.cs.berkeley.edu/~adj/cs16x


Lec 22.2 11/14/07 Joseph CS194-3/16x ©UCB Fall 2007

Review: Networking Definitions

• Layering: building complex services from simpler ones
• Datagram: independent, self-contained message whose arrival, arrival time, and content are not guaranteed
• Performance metrics
  – Overhead: CPU time to put packet on wire
  – Throughput: Maximum number of bytes per second
  – Latency: time until first bit of packet arrives at receiver
• Arbitrary Sized messages:
  – Fragment into multiple packets; reassemble at destination
• Ordered messages:
  – Use sequence numbers and reorder at destination
• Reliable messages:
  – Use Acknowledgements
  – Want a window larger than 1 in order to increase throughput
• TCP: Reliable byte stream between two processes on different machines over Internet (read, write, flush)


Goals for Today

• TCP congestion avoidance and flow control
• TCP sockets
• Distributed applications
• Two-phase commit

Note: Some slides and/or pictures in the following are adapted from slides ©2005 Silberschatz, Galvin, and Gagne. Slides courtesy of Kubiatowicz, AJ Shankar, George Necula, Alex Aiken, Eric Brewer, Ras Bodik, Ion Stoica, Doug Tygar, and David Wagner.


TCP Timing Diagram

[Figure: message timeline between two hosts, in three stages.]
• Open connection (3-way handshake): SYN k; SYN n, ACK k+1; DATA k+1, ACK n+1
• Transfer: data exchange (ACK k+n+1)
• Close connection: FIN; FIN ACK (½ close); FIN; FIN ACK (½ close)


Sliding Window

• window = set of adjacent sequence numbers
  – Size of set is window size or n
• Can fully utilize a link, provided the window size is large enough
  – Throughput is ~ (n/RTT)
  – Stop & Wait is like n = 1
• Sender has to buffer all unacknowledged packets, because they may require retransmission
• Receiver may be able to accept out-of-order packets, but only up to its buffer limits
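The relation throughput ~ n/RTT can be checked with a little arithmetic. The following is a minimal sketch, assuming fixed-size packets and a loss-free path; the function names and the sample numbers (1500-byte packets, 100 ms RTT, 100 Mbps link) are illustrative assumptions, not values from the lecture.

```python
import math

def window_throughput_bps(window_pkts, pkt_bytes, rtt_s):
    """Approximate sliding-window throughput: n packets delivered per RTT."""
    return window_pkts * pkt_bytes * 8 / rtt_s

def window_to_fill_link(link_bps, pkt_bytes, rtt_s):
    """Smallest window (in packets) whose n/RTT rate reaches the link rate."""
    return math.ceil(link_bps * rtt_s / (pkt_bytes * 8))

# Hypothetical link: 100 Mbps, 100 ms RTT, 1500-byte packets.
stop_and_wait = window_throughput_bps(1, 1500, 0.1)   # n = 1
needed = window_to_fill_link(100e6, 1500, 0.1)
print(f"stop-and-wait: about {stop_and_wait / 1e3:.0f} kbps")
print(f"window needed to fill the link: {needed} packets")
```

With these assumed numbers, stop-and-wait uses only a tiny fraction of the link, which is why a window larger than 1 matters.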


How Fast to Send?

• Send too slow: link is not fully utilized
  – Wastes time
• Send too fast: link is fully utilized but...
  – Queue builds up in router buffer (delay)
  – Overflow buffers in routers
  – Overflow buffers in receiving host (ignore)

[Figure: sending hosts A1, A2, A3 feed a buffer in a router, which forwards to receiving hosts B1, B2, B3; a link is labeled 100 Mbps.]


Three Congestion Control Problems

• Adjusting to bottleneck bandwidth
  – Without any a priori knowledge
  – Could be gigabit link, could be a modem
• Adjusting to variations in bandwidth
  – Assuming you have rough idea of bandwidth
• Sharing bandwidth between flows
  – Two Issues:
    » Adjust total sending rate to match bandwidth
    » Allocation of bandwidth between flows


Reality

Congestion control is a resource allocation problem involving many flows, many links, and complicated global dynamics


What’s Really Happening? View from a Single Flow

• Knee – point after which
  – Throughput increases very slowly
  – Delay increases fast
• Cliff – point after which
  – Throughput starts to decrease very fast to zero (congestion collapse)
  – Delay approaches infinity

[Figure: two plots against Load, Throughput vs. Load and Delay vs. Load, with the knee and the cliff marked; labels: packet loss, congestion collapse.]


General Approaches

• Send without care
  – Many packet drops
  – Not as stupid as it seems – UDP
• Dynamic Adjustment
  – Probe network to test level of congestion
  – Speed up when no congestion
  – Slow down when congestion
  – Suboptimal, messy dynamics, simple to implement
• For TCP, dynamic adjustment is the most appropriate
  – Due to pricing structure, traffic characteristics, and good citizenship (“TCP-Friendly”)


Administrivia

• Project 3 design due Monday 11/19

• Midterm 2 Exam
  • Avg 64.8
  • Median 63
  • Std Dev 9.5


TCP Congestion Control

• TCP connection has window
  – Controls number of unacknowledged packets
  – Limits how much data can be in transit
  – Implemented as # of bytes
  – Described as # packets in this lecture
• Sending rate: ~Window/RTT
  – Vary window size to control sending rate
• Two basic components:
  – Detecting congestion
  – Rate adjustment algorithm
    » Three sub-problems within adjustment problem
      • Finding fixed bandwidth
      • Adjusting to bandwidth variations
      • Sharing bandwidth


Two Basic Components

• Detecting congestion
  – Packet dropping is best sign of congestion
    » Delay-based methods are hard and risky
  – How do you detect packet drops? No ACKs
    » TCP uses ACKs to signal receipt of data
    » No ACK after certain time interval: time-out
• Rate adjustment – basic structure:
  – Upon receipt of ACK (of new data): increase rate
  – Upon detection of loss: decrease rate


Problem #1: Single Flow, Fixed BW

• Want to get a first-order estimate of the available bandwidth
  – Assume bandwidth is fixed
  – Ignore presence of other flows
• Want to start slow, but rapidly increase rate until packet drop occurs (“slow-start”)
• Adjustment:
  – cwnd initially set to 1
  – cwnd++ upon receipt of ACK


Slow-Start

• cwnd increases exponentially: cwnd doubles every time a full cwnd of packets has been sent
  – Each ACK releases two packets
  – Slow-start is called “slow” because of starting point

[Figure: sender/receiver timeline. cwnd = 1: segment 1 is sent. cwnd = 2: segments 2 and 3 are sent. As their ACKs arrive, cwnd = 3 and then cwnd = 4: segments 4, 5, 6, and 7 are sent. After those ACKs, cwnd = 8.]
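The doubling follows directly from the cwnd++ per ACK rule. Here is a minimal sketch, assuming no losses and that every packet in the window is ACKed within one RTT; the function name `slow_start` is hypothetical.

```python
def slow_start(rtts, initial_cwnd=1):
    """Return cwnd (in packets) at the start of each RTT under slow-start."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd              # assumption: every packet sent this RTT is ACKed
        for _ in range(acks):
            cwnd += 1            # cwnd++ upon receipt of ACK
    return history

print(slow_start(4))  # [1, 2, 4, 8]
```

Each ACK adds one packet to the window, so a full window of ACKs doubles it.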


Problems with Slow-Start

• Slow-start can result in many losses
  – Roughly the size of cwnd ~ BW*RTT
• Example:
  – At some point, cwnd is enough to fill “pipe”
  – After another RTT, cwnd is double its previous value
  – All the excess packets are dropped!
• Need a more gentle adjustment algorithm once have rough estimate of bandwidth
  – Switch after first loss
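The size of the overshoot can be estimated with a small sketch. It assumes the pipe holds a fixed number of packets (BW*RTT), that routers have no extra buffering, and that cwnd simply doubles every RTT; `slow_start_overshoot` and the pipe sizes are illustrative assumptions.

```python
def slow_start_overshoot(pipe_pkts):
    """Double cwnd each RTT until it exceeds the pipe; return (cwnd, excess packets)."""
    cwnd = 1
    while cwnd <= pipe_pkts:
        cwnd *= 2                # one more RTT of slow-start
    return cwnd, cwnd - pipe_pkts

print(slow_start_overshoot(64))   # (128, 64): a full pipe's worth of excess
print(slow_start_overshoot(100))  # (128, 28)
```

In the worst case (cwnd exactly fills the pipe just before doubling) the excess is as large as the pipe itself, matching the "roughly the size of cwnd" estimate.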


Problems #2 and #3

• Problem #2: Single Flow, Varying BW
  – Want to be able to track available bandwidth, oscillating around its current value
  – Ideal variation: (in terms of RTTs)
    » Additive increase: cwnd ← cwnd + 1/cwnd (increase cwnd by 1 only if all segments in cwnd are ACK’d)
    » Multiplicative decrease: cwnd ← ½*cwnd
    » AIMD: gentle increase, drastic decrease
• Problem #3: Multiple Flows
  – Want steady state to be “fair”
  – Many notions of fairness, but here all we require is that two identical flows end up with same BW
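The two AIMD rules can be written as per-event updates. This is a minimal sketch, not a real TCP implementation: `on_ack` and `on_loss` are hypothetical names, cwnd is kept as a float in units of packets, and the floor of 1 packet on decrease is an added assumption.

```python
def on_ack(cwnd):
    """Additive increase: each ACK adds 1/cwnd, so about +1 packet per RTT."""
    return cwnd + 1.0 / cwnd

def on_loss(cwnd):
    """Multiplicative decrease: halve the window (floor of 1 is an assumption)."""
    return max(1.0, cwnd / 2.0)

cwnd = 10.0
for _ in range(10):              # one RTT's worth of ACKs for a window of 10
    cwnd = on_ack(cwnd)
print(f"after one RTT of ACKs: {cwnd:.2f}")   # slightly under 11
print(f"after a loss: {on_loss(cwnd):.2f}")
```

The per-ACK increment shrinks as cwnd grows within the RTT, so the window gains slightly less than one full packet per RTT.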


AIMD Sharing Dynamics

[Figure: two flows, A→B (rate x) and D→E (rate y), share a link; plot of the two rates (axis 0 to 60) over time (axis 1 to 487).]

• No congestion → rate increases by one packet/RTT every RTT
• Congestion → decrease rate by factor 2
• Rates equalize → fair share
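Why the rates equalize can be seen in a small simulation. This sketch assumes two flows that see congestion at the same time whenever their combined rate exceeds a fixed capacity; the function name, the starting rates, and the capacity are illustrative assumptions.

```python
def aimd_two_flows(x, y, capacity, rtts):
    """Simulate two AIMD flows sharing one link; return their final rates."""
    for _ in range(rtts):
        if x + y > capacity:
            x, y = x / 2, y / 2      # congestion: both decrease rate by factor 2
        else:
            x, y = x + 1, y + 1      # no congestion: +1 packet/RTT every RTT
    return x, y

x, y = aimd_two_flows(1.0, 40.0, capacity=50, rtts=500)
print(f"x = {x:.3f}, y = {y:.3f}, gap = {abs(x - y):.6f}")
```

Additive increase leaves the gap between the two rates unchanged, while every multiplicative decrease halves it, so the gap shrinks toward zero and the flows approach the fair share.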


TCP Congestion Control Summary

• Measure available bandwidth
  – Slow start: fast, hard on network
  – AIMD: slow, gentle on network
• Detect congestion using timeout based on RTT
  – Robust, causes low throughput


BREAK


Use of TCP: Sockets

• Socket: an abstraction of a network I/O queue
  – Embodies one side of a communication channel
    » Same interface regardless of location of other end
    » Could be local machine (called “UNIX socket”) or remote machine (called “network socket”)
  – First introduced in 4.2 BSD UNIX: big innovation at time
    » Now most operating systems provide some notion of socket
• Using Sockets for Client-Server (C/C++ interface):
  – On server: set up “server-socket”
    » Create socket, bind to protocol (TCP), local address, port
    » Call listen(): tells server socket to accept incoming requests
    » Perform multiple accept() calls on socket to accept incoming connection requests
    » Each successful accept() returns a new socket for a new connection; can pass this off to handler thread
  – On client:
    » Create socket, bind to protocol (TCP), remote address, port
    » Perform connect() on socket to make connection
    » If connect() successful, have socket connected to server


Socket Example (Java)

server:
    import java.io.*;
    import java.net.*;

    // Makes socket, binds addr/port, calls listen()
    ServerSocket sock = new ServerSocket(6013);
    while (true) {
        // Blocks until a client connects; returns a new socket for that connection
        Socket client = sock.accept();
        // 'true' enables auto-flush on println()
        PrintWriter pout = new PrintWriter(client.getOutputStream(), true);
        pout.println("Here is data sent to client!");
        …
        client.close();
    }

client:
    import java.io.*;
    import java.net.*;

    // Makes socket, binds addr/port, calls connect()
    Socket sock = new Socket("169.229.60.38", 6013);
    BufferedReader bin =
        new BufferedReader(new InputStreamReader(sock.getInputStream()));
    String line;
    // readLine() returns null once the server closes the connection
    while ((line = bin.readLine()) != null)
        System.out.println(line);
    sock.close();


Distributed Applications

• How do you actually program a distributed application?
  – Need to synchronize multiple threads, running on different machines
    » No shared memory, so cannot use test&set
  – One Abstraction: send/receive messages
    » Already atomic: no receiver gets portion of a message and two receivers cannot get same message
• Interface:
  – Mailbox (mbox): temporary holding area for messages
    » Includes both destination location and queue
  – Send(message,mbox)
    » Send message to remote mailbox identified by mbox
  – Receive(buffer,mbox)
    » Wait until mbox has message, copy into buffer, and return
    » If threads sleeping on this mbox, wake up one of them

[Figure: Send and Receive endpoints connected through the Network.]


Using Messages: Send/Receive behavior

• When should send(message,mbox) return?
  – When receiver gets message? (i.e. ack received)
  – When message is safely buffered on destination?
  – Right away, if message is buffered on source node?
• Actually two questions here:
  – When can the sender be sure that the receiver actually received the message?
  – When can sender reuse the memory containing message?
• Mailbox provides 1-way communication from T1→T2
  – T1→buffer→T2
  – Very similar to producer/consumer
    » Send = V, Receive = P
    » However, can’t tell if sender/receiver is local or not!


Messaging for Producer-Consumer Style

• Using send/receive for producer-consumer style:

Producer:
    int msg1[1000];
    while(1) {
        /* prepare message */
        send(msg1, mbox);
    }

Consumer:
    int buffer[1000];
    while(1) {
        receive(buffer, mbox);
        /* process message */
    }

• No need for producer/consumer to keep track of space in mailbox: handled by send/receive
  – One of the roles of the window in TCP – window is bounded by size of buffer on far end:
    window = min(cwnd, available receiver buffer)
  – Restricts sender to forward only what will fit in buffer

[Figure: producer labeled “Send Message”, consumer labeled “Receive Message”.]
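The producer-consumer pattern with a mailbox can be tried out locally. This is a minimal sketch, assuming both threads run in one process and the mailbox is a bounded in-memory queue; the `Mailbox` class and `run_demo` are hypothetical stand-ins for the lecture's send/receive interface, not a networked implementation. Here send returns as soon as the message is buffered, which is only one of the possible answers to "when should send return?".

```python
import queue
import threading

class Mailbox:
    """Local stand-in for a mailbox: a bounded FIFO queue of messages."""
    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def send(self, message):
        self._q.put(message)       # blocks while the mailbox is full

    def receive(self):
        return self._q.get()       # blocks until a message is available

def run_demo(n_messages=5, capacity=2):
    mbox = Mailbox(capacity)
    received = []

    def producer():
        for i in range(n_messages):
            mbox.send(f"msg {i}")              # prepare message; send

    def consumer():
        for _ in range(n_messages):
            received.append(mbox.receive())    # receive; process message

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

print(run_demo())
```

Neither thread tracks free space itself: the bounded queue blocks the producer when the mailbox is full, which is the role the receiver-buffer limit plays in the TCP window.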


Messaging for Request/Response communication

• What about two-way communication?
  – Request/Response
    » Read a file stored on a remote machine
    » Request a web page from a remote web server
  – Also called: client-server
    » Client = requester, Server = responder
    » Server provides “service” (file storage) to the client
• Example: File service

Client: (requesting the file)
    char response[1000];

    send("read rutabaga", server_mbox);
    receive(response, client_mbox);

Server: (responding with the file)
    char command[1000], answer[1000];

    receive(command, server_mbox);
    /* decode command */
    /* read file into answer */
    send(answer, client_mbox);

[Figure: client steps “Request File” and “Get Response”; server steps “Receive Request” and “Send Response”.]
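The file-service exchange uses one mailbox per direction. The sketch below assumes both sides run as threads in one process, with `queue.Queue` objects standing in for `server_mbox` and `client_mbox`; the `FILES` dictionary and the function names are hypothetical, and a real service would read from disk over a network.

```python
import queue
import threading

FILES = {"rutabaga": "contents of rutabaga"}   # hypothetical in-memory file store

def server(server_mbox, client_mbox):
    command = server_mbox.get()                # receive(command, server_mbox)
    op, name = command.split(" ", 1)           # decode command
    answer = FILES.get(name, "") if op == "read" else ""   # read file into answer
    client_mbox.put(answer)                    # send(answer, client_mbox)

def client_read(name):
    server_mbox, client_mbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=server, args=(server_mbox, client_mbox))
    t.start()
    server_mbox.put(f"read {name}")            # send("read ...", server_mbox)
    response = client_mbox.get()               # receive(response, client_mbox)
    t.join()
    return response

print(client_read("rutabaga"))  # contents of rutabaga
```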


General’s Paradox

• General’s paradox:
  – Constraints of problem:
    » Two generals, on separate mountains
    » Can only communicate via messengers
    » Messengers can be captured
  – Problem: need to coordinate attack
    » If they attack at different times, they all die
    » If they attack at same time, they win
  – Named after Custer, who died at Little Big Horn because he arrived a couple of days too early


General’s Paradox

• Can messages over an unreliable network be used to guarantee two entities do something simultaneously?
  – Remarkably, “no”, even if all messages get through
• No way to be sure last message gets through!

[Figure: message exchange between the two generals: “11 am ok?” / “Yes, 11 works” / “So, 11 it is?” / “Yeah, but what if you don’t get this ack?”]


Two-Phase Commit

• Since we can’t solve the General’s Paradox (i.e. simultaneous action), let’s solve a related problem
  – Distributed transaction: Two machines agree to do something, or not do it, atomically
• Two-Phase Commit protocol does this
  – Use a persistent, stable log on each machine to keep track of whether commit has happened
    » If a machine crashes, when it wakes up it first checks its log to recover state of world at time of crash
  – Prepare Phase:
    » The global coordinator requests that all participants promise to commit or roll back the transaction
    » Participants record promise in log, then acknowledge
    » If anyone votes to abort, coordinator writes “abort” in its log and tells everyone to abort; each records “abort” in log
  – Commit Phase:
    » After all participants respond that they are prepared, the coordinator writes “commit” to its log
    » Then asks all nodes to commit; they respond with ack
    » After receiving acks, coordinator writes “got commit” to log
  – Log can be used to complete this process such that all machines either commit or don’t commit
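The two phases can be traced in a small simulation. This is a minimal sketch, assuming no crashes or lost messages and using in-memory lists in place of the persistent, stable logs; the `Node` class and `two_phase_commit` function are hypothetical names.

```python
class Node:
    """A participant; self.log stands in for its persistent, stable log."""
    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.log = []

    def prepare(self):
        # Record the promise (or the refusal) before answering the coordinator.
        self.log.append("prepared" if self.vote_yes else "abort")
        return self.vote_yes

    def commit(self):
        self.log.append("commit")
        return True                      # ack

    def abort(self):
        if self.log[-1:] != ["abort"]:   # avoid logging "abort" twice
            self.log.append("abort")

def two_phase_commit(participants):
    coord_log = []
    # Prepare phase: collect a promise from every participant.
    votes = [p.prepare() for p in participants]
    if not all(votes):
        coord_log.append("abort")
        for p in participants:
            p.abort()
        return "abort", coord_log
    # Commit phase: log the decision first, then tell everyone.
    coord_log.append("commit")
    acks = [p.commit() for p in participants]
    if all(acks):
        coord_log.append("got commit")
    return "commit", coord_log

nodes = [Node("B1"), Node("B2")]
print(two_phase_commit(nodes))           # ('commit', ['commit', 'got commit'])
print([n.log for n in nodes])            # [['prepared', 'commit'], ['prepared', 'commit']]
```

The coordinator writes its decision to the log before announcing it, which is what lets a restarted machine finish the protocol consistently.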


Two phase commit example

• Simple Example: A = ATM machine, B = The Bank
  – Phase 1:
    » A writes “Begin transaction” to log
      A→B: OK to transfer funds to me?
    » Not enough funds:
      B→A: transaction aborted; A writes “Abort” to log
    » Enough funds:
      B: Write new account balance to log
      B→A: OK, I can commit
  – Phase 2: A can decide for both whether they will commit
    » A: write new account balance to log
    » Write “commit” to log
    » Send message to B that commit occurred; wait for ack
    » Write “Got Commit” to log
• What if B crashes at beginning?
  – Wakes up, does nothing; A will time out, abort and retry
• What if A crashes at beginning of phase 2?
  – Wakes up, sees transaction in progress; sends “abort” to B
• What if B crashes at beginning of phase 2?
  – B comes back up, looks at log; when A sends it “Commit” message, it will say, oh, ok, commit
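The crash cases for A amount to a decision based on the last thing in its log. The following is a rough sketch of that idea: the function name and the returned action strings are hypothetical, log entries are written in lowercase, and the "resend" cases are assumptions that follow from using the log to complete the protocol rather than steps stated in the example.

```python
def coordinator_recovery_action(log):
    """Decide what coordinator A should do after a restart, given its log entries."""
    if "got commit" in log:
        return "nothing to do"       # transaction fully finished
    if "commit" in log:
        return "resend commit"       # assumption: decision was logged, ack not yet seen
    if "abort" in log:
        return "resend abort"        # assumption: make sure B also learns of the abort
    if "begin transaction" in log:
        return "send abort"          # in progress with no decision logged: abort
    return "nothing to do"           # no transaction was started

print(coordinator_recovery_action(["begin transaction"]))            # send abort
print(coordinator_recovery_action(["begin transaction", "commit"]))  # resend commit
```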


Distributed Decision Making Discussion

• Two-Phase Commit: Blocking
  – A site can get stuck in a situation where it cannot continue until some other site (usually the coordinator) recovers.
  – Example of how this could happen:
    » Participant site B writes a “prepared to commit” record to its log, sends a “yes” vote to the coordinator (site A) and crashes
    » Site A crashes
    » Site B wakes up, checks its log, and realizes that it has voted “yes” on the update. It sends a message to site A asking what happened. At this point, B cannot change its mind and decide to abort, because update may have committed
    » B is blocked until A comes back
  – Blocking is problematic because a blocked site must hold resources (locks on updated items, pages pinned in memory, etc.) until it learns fate of update
• Alternative: There are alternatives such as “Three Phase Commit” which don’t have this blocking problem