Upload
brody-gupton
View
214
Download
0
Tags:
Embed Size (px)
Citation preview
CS 640Introduction to Computer Networks
Fall 1999Professor Landweber
Copyright 1999, Lawrence H. Landweber
Copyright 1996, Larry Peterson (slides marked with@ in lower left corner and those with white backgroundwith the exception of the OSPF and TCP dialog slideswhich are from IETF RFCs)
Computer Communications Network (network, computer network)
Distributed system with:
• Transmission media (coaxial cable, twisted pair (cat3, cat5), glass fibre, wireless-ether)
• Switching elements (switch, router, node, hub, bridge, gateway …)
• Hosts (computer, server, workstation end-system, node)
• Protocols (physical, link, network, transport, session, presentation, application)
Communications Medium
• Connects switching elements to each other and to hosts
• Transports digital data stream
• Link– point-to-point - connects single host to switching
elt or switching elt to another switching elt– multipoint or broadcast - connects one or more
hosts and switching elts to each other
Units - Orders of Magnitude• kilo (thousand), mega (million), giga (billion),
tera (trillion… thousand billion), peta (10**15), exa (10**18)
• Till early 1980s high bandwidth meant 56Kbps - today Gbps is high speed
• A commercial fibre can transmit 800Gbps (80 data streams at 10Gbps - DWDM - Dense Wave Division Multiplexing)
• Research state of the art is 3.2Tbps (80 data streams at 40Gbps) for a single fibre
Switching Elements
• Responsible for the transport of data across a network - routing
• May support congestion handling or quality of service mechanisms
• Examples– routers - IP– switches - ATM or local area networks (e.g.,
Ethernet-802.3, token ring)– bridges, hubs - local area networks
Network Types - Direct Connectivity
• point-to-point
• mulitpoint /broadcast
Ethernet
Token Ring Also satellite/radio
Network Types - Indirect Connectivity
• switched networks
• internetworksbackbone
link
host
switchingelement
Protocols
A protocol is a set of conventions that specifies the rules or parameters for communication between two entities (hosts, processes, switching elts, devices, etc).
Protocols specify conventions for:
• The type, semantics, and format of information to be conveyed between entities
• Establishing/closing connections or sessions• Routing across a network or internetwork• Flow control: speed matching between
entities• Reliability: error detection, handling,
correction
Protocols also specify
• Data and object types and their representation
• Session characteristics (e.g., checkpointing, encryption-security, compression)
• Congestion detection and control
• Quality of service parameters and handling
The Basic Idea: Packet Switching
Data networks use packet switching instead of circuit switching, the method traditionally used in the telephone system. Packet switching was first invented in the 1960s as a method for building a survivable communications system in a battlefield environment.
Circuit Switching - Telephone System Model
• Dedicated bandwidth (capacity) reserved along fixed path from caller to called for duration of conversation (signalling -setup)
• Several conversations can share a link…but each is allocated fixed fraction of link bw
• Bandwidth not shared so during periods of silence, it is wasted
• If dedicated bandwidth not available for a call, then the call not completed (blocking)
Circuit Switching
Circuit Switching
Circuit Switching
Circuit Switching
Path is fixed - all data follows this pathBandwidth / capacity reserved on all linksData flows continuously - NO PACKETSBW unavailable --> BLOCKING
Packet Switching - Data Network Model
Packet Header Payload
1234
5
• Each data stream divided into chunks, called packets; each packet sent separately into the network; packets may follow diff paths from source to dest
• Each packet has a header and possibly a trailer that contain information as to its source and destination
Packet Switching - Data Network Model• Bandwidth not reserved but is shared between
data streams
• Complete packet arrives at sw elt before it is forwarded to next sw elt; store / forward
• Store and forward systems must deal with allocation of buffers in sw elts and with congestion that may result from too many packets arriving in too short a time
• If a buffer not available when a packet arrives at sw elt, a packet is discarded (may not be one that just arrived)
Packet Switching - Datagrams and Virtual Circuits
• Examples of packet switching ( i.e., S&F, buffers, congestion, no link bw reserved)
• Virtual Circuit– must set up path - vc (conn estab - signalling)– path fixed for all packets of a data stream– after setup, header need only specify vc/path
• Datagram– packets for data stream may travel on diff paths– headers must include full address of dest
Multiplexing
• Share network resources (sw elt and links) among multiple users
• TDM - time div multiplexing (Alohanet, satellite)
• FDM - frequency div multiplexing (CATV)
@
TDM
• Slots can be statically or dynamically assigned; assignment decision can be centralized or distributed
• Circuit switching can use TDM - e.g., T1
• Statistical TDM - packet switching– packets from different sources interleaved– buffer contending packets– order of service may be FIFO or otherA B C A A C B
TDM - Circuit Sw. - T1
24 calls/conversations in parallel
125 microseconds - each with 8 bits (digital) for
8 bits7 data1 parity
Each call gets 8 * 8000 bits per second or 64KbpsTotal BW is 64Kbps * 24 + some admin bits or about 1.53MbpsT3 or DS3 = 28 T1 or ~45MbpsE1= 2Mbps
What Can Go Wrong?
• Signal errors - noise - leads to garbled bit(s)– CRC, checksum
• Packet loss due to congestion– sequence numbers, acks, timers, retransmission
(voice vs. file transfer)
• Packets delayed, garbled, duplicated
• Link or sw elt failure
• Depending on application, protocols may have to compensate for or correct the above
@
Performance - Bandwidth
• Amount of data that can be transmitted per unit time (Kbps, Gbps, etc)
• Related to bit width / coding• BW of link vs. end-end throughput• Throughput determined by
– bw of links on path– sw elts proc speed– congestion-queuing time – transmission rate of sender
@
Performance - Latency Delay
• Time to send message or single bit from A to B• Components
– propagation - signal speed in medium - length of links
– transmission / processing time at sender and sw elts
– queuing time at sw elts and at entrance to network
• Speed of light– 3.0 * (10**8) meters / second in a vacuum or ~1 foot
per nanosecond (billionth of a second)
– 2.0 to 2.7 * (10**8) meters per second in various media
– latency in LAN vs. terrestrial WAN vs. satellite
@
Latency vs. Bandwidth
• Independent
• Bandwidth - Delay Product– amount of data that the network can store between
source and destination– if RTT used for delay, then amount that can be sent
before sender can receive an ack– example: 100ms RTT and 45Mbps: bw-delay
product is 560Kbytes
• Jitter - difference in time between arrival of successive packets in a data stream
Protocol Architecture - Layering
• Use layers to localize protocol functionality
• Protocol that implements a layer has interfaces to:– higher level protocol and lower layer protocol
• services to / from (service interface)
• encapsulation / decapsulation
• *** connection oriented or connectionless
– peer protocol• messages exchanged with peer protocol entity
• control information in header / trailer
Interfaces
serviceinterface
serviceinterface
peer-to-peer interface
Protocol Entity
ProtocolEntity
Connection-oriented Service Interface (CO)
• Users of peer protocol entities can assume there a direct physical connection or wire between them (connection setup required)
• Data sent by one user is received by the peer user in same order as it was sent by first user
• Definition of CO Service Interface is indep of its implementation in network (vc/dg)
• May or may not be reliable (TCP vs. ATM)
Connectionless Service Interface (CL)
• No notion of a connection between users of peer protocol entities
• CL service interface promises best effort delivery of data from one user to the other, data may or may not reach the peer user and may or may not be delivered in order sent
• AGAIN NOTE that this says nothing about the implementation method within the network
More on Service Interfaces
• CO, CL refer to service interfaces NOT to the protocols used to implement the service
• The spec of a service interface does not imply anything about how the interface is implemented by a protocol
• It also does not imply anything about how data is transported across the network
• THIS DISTINCTION IS OFTEN BLURRED
OSI Open Systems
Interconnection Model7 Layers
application
presentation
session
transport
network
link
physical
end system end system
sw elts in network
OSI Model
• Model of protocol functionality (CCITT and ISO - Int. Org. for Standardization)
• Just a model … does not specify details of protocols and interfaces or their implementation
• Different protocols may conform to the model but not interoperate
• Useful as a way of thinking about protocols - a taxonomy of protocol functionality
Layers
• Physical - what happens on wire - coding of bits, electrical or optical characteristics, config of pins, connectors and other hardware components
• (Data) Link - handles transfer of data between two or more entities connected by a single link or hop. Service interface may be CO or CL. May or may not be reliable. (X.25 Level 2, IEEE 802.x)
Layers
• Network - Transfer of data across a network. Includes switching (dg, vc), routing, congestion avoidance and control, network wide addressing. (IP addresses, IP/ICMP, X.25 Level 3, ATM, BGP, OSPF)
• Transport - Process to process transfer of data. Connectionless (UDP) or connection-oriented service (TCP) interface.
Layers
• Session - conventions for sessions. E.g., checkpointing, half or full duplex, security.
• Presentation - syntactic conventions for communication between apps, encoding rules and file formats.
• Application - includes session and presentation in the Internet protocol architecture
FTP
TCP
IP
UDP
TelnetSMTP HTTP OSPF
RTP
X.25 802.2
802.5802.3
Internet Protocol Architecture (IETF)
Link Layer
• Two nodes (hosts, sw elts) connected by a single link
• Framing and packet format
• CO or CL service interface
• Reliability
• Flow control
Link Layer - Framing
• Link layer packet = frame
• Problem: how to recognize beginning and end of frame?
• Three methods– byte counting (DDCMP)– byte stuffing/oriented (BISYNCH, IMP-IMP)– bit stuffing/oriented (X.25 Level 2, 802.x)
Byte Counting
• Include count field giving number of bytes in frame
• Problem: what if byte field is corrupted?
• Might be able to catch with CRC… but only if max length assumed
SYN…Count Header Body CRC
@
Byte Stuffing/Oriented
• Special characters used for control
• Must distinguish special characters when used for control from when in data
• Use special data link escape character, DLE
• E.g., STX = start of text… use DLE STX when STX is in data (stuff DLE before STX if it appears in data (also for SOH, ETX,...)
• If DLE in data, send DLE DLE
Byte Stuffing Problems
• Dependence on fixed character set
• Must examine every byte of data on sending and receiving (insert / remove DLE)
• Was used widely in IBM bisynch (at 9600bps)
Bit Stuffing/Oriented
• Frame beginning and end marked by special bit string …. usually 01111110
• If 5 1’s in data to be sent, sender inserts 0
• If receiver sees 5 1’s check next bit(s)– if 0, remove it (stuffed bit)– if 10, end of frame marker (01111110)– if 11, error (7 1’s cannot be in data)
• Can be done fast in hardware
FLAG Header Body … CRC FLAG
Error Detection Cyclic Redundancy Check (CRC)
• Bitwise binary arithmetic0 + 0 = 1 + 1 = 0
0 + 1 = 1 + 0 = 1
• Addition and subtraction are same operation
• Example 11011000
+/- 10001001
01010001
CRC
• Represent n bit message as an n-1 degree polynomial
• Message M = 101101101
M(x) = x**8 + x**6 + x**5 + x**3 + x**2 + 1
• Special generator polynomial, C(x), degree k
• C(x) chosen to have special properties - this is key to the method working well
CRC - Method
• Pad M with k 0s yielding M’ (k = deg C(x))• Represent M’ as M’(x)• R(x) = rem ( M’(x) / C(x)) • P = M’ + R (P(x) = M’(x) + R(x))• Transmit P (P(x))• At dest, receive P(x) + E(x)• Compute rem (P(x) + E(x)) / C(x)
– If 0, either message rec ok or C(x) divides E(x)– If not 0, then error in transmission
Shift Register
• Consider C(x) = x**3 + x**2 + 1
• M(x) = X**7 + x**4 + x**3 + x
• M’ = 10011010000
1 0 0 1+ +
X**3 X**2 X**1 1
1010000
The shift register performs long division using bitwise operations - M’ / C
Shift Register
• Consider C(x) = x**6 + x**5 + x**3 + x +1
• M’ = 10110110101000000
1 0+ 1 1+
X**6 X**5 X**4 X**3
0101000000
The shift register performs long division using bitwise operations - M’ / C
0 1+
X**2 x
1
1
+
Properties of C(x): Should not divide E(x) for common errors
• Single bit errors – non-zero x**k, x**0 terms
• Double (2 consecutive) bit errors – C(x) has factor with at least 3 terms
• Odd number errors: contains factor (x+1) – consider E(x) with odd no. terms - so E(1) = 1– but if (x+1) is factor of E, then E(1) = 0– so if odd number of errors, E cannot have (x+1)
factor
More on C(x)
• Burst error - at least first and last bits of burst are in error– if length of burst is <= k, the degree of C(x),
then burst error is detected– C(x) with degree k cannot divide E(x) with
degree less than k
• High probability of detecting burst errors of length greater than k… but not 100%
Reliable Transmission
• When CRC detects error, frame discarded
• Corrected by link layer retransmission - ARQ (Automatic Repeat reQuest)– mechanisms include: frame sequence numbers,
acknowledgements (acks), negative acks (naks), selective acks (sacks), timers / timeout
– often implemented by sliding window
• Also used by transport layer - but link impl simpler - wire connects protocol entities
Sliding Window - Functions
• Put data in order for delivery to user of service interface
• Recovery from losses– retransmission
• Flow control - prevent sender from overrunning receiver’s buffers – link - SWS usually fixed, RWS often = 1– credits - signal from receiver to sender - used with
transport protocols
Timer / Retransmission Strategies
• Timers for link layer– 1 timer for each frame – 1 timer for link
• Retransmission on timeout– retransmit frame whose timer goes off – retransmit earliest unacked frame– retransmit all frames from earliest unacked frame - go
back n
• Retransmit on nak or sack
Sliding Window Examples
• PrincipleSWS + RWS <= MaxSeqNum (size seq # space)
• MaxSeqNum = 8, SWS = 4, RWS = 4
Sender Receiver0,…,3 0,…,3 accepted
ack 0,…,3, NFE = 4lost
resend old 0,…,3 all rejected-no overlapwith expected frames
Problem occurs if resent frame overlaps with expected new frame. Not possible if above principle observed
Sliding Window Examples
Sender Receiver0,…,4 0,…,3 accepted, disc 4
ack 0,…,3, NFE = 4lost
resend old 0,…,4 acc 4, disc 0,1,2,3
Problem occurs because resent frame overlaps with expected new frame.
ack 4, will acc 5,6,7,0, NFE = 4
lost
resend old 0,…,4 acc 0, as new but is retrans
Bit Oriented Data Link Protocols
• ANSI Standard - ADCCP, Advanced Data Communications Control Procedures
• Choices of functionality and operation
• Designed to operate – over point-point or multi-point links
– in 2-way alternate or 2-way simultaneous mode
– over terrestrial and satellite links
– when communication is between logical equals or non-equals
Station Types
• Primary– flow control– error recovery (other than retransmission)– initialize and terminate link
• Secondary– simpler than primary
primary secondary
commands
responses
Data Transfer Modes
• NRM - Normal Response Mode– one primary, multiple secondary stations– polled multipoint, secondary often a device
• ARM - Asynchronous Response Mode– one primary, one secondary station– secondary can send at will as responses– point-point - not good for multipoint
• ABM - Asynchronous Balanced Mode– two combined stations (both issue commands and responses)
Frame Format
F A C Info FCS F
F = 01111110 (flag sequence)A = address of
receiver if commandsender if responseall 1s = broadcast
C = 8 or 16 control bits - type of frame + functionsFCS = CRC
Control Fields
• All can be commands or responses • Information transfer
– 0 N(S) P/F N(R)
• Supervisory (4)– 10 SS P/F N(R)
• Unnumbered (32)– 11 MM P/F MMM
• N(S) and N(R) are 3 or 7 bits; P/F is 1 bit
P/F Bit
• NRM - P=1, poll secondary, secondary sets F=1 in last response frame
• ARM/ABM - P=1 is sent by primary asking secondary to set F=1 in next response
• P/F bit used for synchronization
The Sliding WindowN(R), N(S)
• N(S) is the sequence number of the information frame that contains it
• N(R) acks all info frames with seq numbers < N(R); i.e., N(R) = NFE
• Sequence number space is 7 (3 bits) or 127 (7 bits)
• With X.25 L2, receive window is always 1 so frames only accepted in order
Miscellaneous
• To abort a frame (so it will be recognized as not correct)– append incorrect FCS or– transmit 7 1s
• 15 ones transmitted means that station is giving up control of the link
Supervisory Frames (4)
• Contains N(R), ack to N(R)-1
• RR - receive ready - ready to rec I frame– clear busy condition (RNR)
• RNR - receive not ready - busy condition– received frames may be discarded
• REJ - reject - ask retrans from N(R)– clear busy condition– cleared when frame with N(S) = N(R) of rejected frame is
received
• SREJ - selective reject - ask retrans of N(R)
Selected Unnumbered Frames
• SABM - set asynchronous balanced mode - link establishment (also SARM, SNRM)
• DISC - disconnect
• UA - unnumbered ack, ack unnumbered command
• DM - disconnect mode - request set mode command, reject set mode
• FRMR - frame reject - bad control field or bad I frame length or invalid N(R)
Link Initialization
SABM
UA (yes)
P=!
F=1Or DM (no)
DM F=0
SABM
DTE DCE
Timers / Constants
• T1 = frame retransmission timer• T2 = may delay timer• T3 = idle channel timer - inform protocol user • N1 = Max I frame size with headers• N2 = Max # retransmission attempts• K = max number of outstanding frames (<= 7
or <= 127) - pipelining
Commands Responses Primary Secondary I I RR RR RNR RNR SREJ * SREJ * SNRM UA DISC DM
FRMR UI UI
Unbalanced Normal Class (NRM)UNC 3,4
*Receive window can be > 1
B, SNRM,P B,UA,FB,RR0,P B,RR0,F
B,I00B,I10B,I20,P B,I03
B,I13B,I23,F
B,RNR3B,I33,P
B,I34B,I44,F
B,RNR5,P
B,DISC,P
B,RR4,F
B,UA,F
Primary Secondary
Balanced Asynchronous Class (ABM)BAC 2,8
Commands Responses Primary Secondary I RR RR RNR RNR REJ * REJ * SABME UA DISC DM
FRMR
*Receive window = 1, only receive in order,REJ requests resend of ALL sent, beginning with lost one
Local Area Networks
• Standardized by IEEE as 802.x• A variety of Media Access Control (MAC)
protocols and associated topologies– 802.3, Ethernet, CSMA/CD (3u, 3z)– 802.4, Token Bus– 802.5, Token Ring– 802.11, Wireless– DQDB, etc
• With 802.2, comprises link layer
802.x Model
LLC Logical Link Control - 802.2MAC Media Access Control - 802.3, 802.4, ...
PHY Physical Layer - 802.3,802.4,...
medium
Link Layer
IP or other network protocol
Token
Token RingFDDI
Ethernet
Hub
Computer
Collision Domain
802.2 - Logical Link Control
• PDU - Protocol Data Unit
DSAP SSAP Control Information
Addresses are 8 bits - all 1s = broadcastDSAP = Destination Service Access Point, address
leftmost bit for Group or IndividualSSAP = Source Address
leftmost bit for Command or Response (a la X.25 L2)Control = 16 bits if seq no included, else 8 bits
LLC Types / Classes
• There are two types of operation– Type 1 - provides connectionless service
• no flow control or error control• no connection setup
– Type 2 - provides a connection-oriented service• similar to X.25 L2 (BAC 2,8)• I frames can be responses
• There are also 2 classes of procedure– Class 1 - includes Type 1 only
– Class 2 - includes Type 1 and Type 2
Commands/Responses
Commands Responses UI XID XID TEST TEST
I I RR RR RNR RNR REJ REJ SABME UA DISC DM
FRMR
Type 1
Type 2
Details - Type 1
• Every LLC on a LAN supports Type 1
• UI frames used for data transfer
• Does not use XID
• TEST used by one entity to see if other still is “alive”
Details - Type 2
• I frames used for data transfer
• Seq number space is 7 bits
• Receive window is1, frames only accepted in order
• Exchange Information (XID) is used by receiver to tell sender the max amount, k, that the sender can send beyond the most most recently received ack (N(R)). This bounds pipelining.
Ethernet - 802.3
• Ethernet developed by Xerox in mid 1970s
• Basic ideas from AlohaNet packet radio project
• Ethernet standardized by Xerox, DEC, Intel in 1978
• IEEE later standardized as 802.3 - at MAC layer differs in one header field from Ethernet
• 10, 100, 1000 Mbps
CSMA/CD
• Carrier sense - stations sense whether there is traffic on the wire
• Multiple access - more than one station can simultaneously be connected and contend for the right to send
• Collision detection - individual stations detect when there has been a collision (transmissions from two stations interfere with each other) and react appropriately
Classical Ethernet (Thick-Net)
• Standardized as 802.3, 10Base5
• Coaxial cable for medium
• Parameters - max– segment length = 500m, 2 repeaters on path– length = 1500m, 1024 nodes
• Also – 10Base2 (thin-net, 200m)– 10BaseT (twisted-pair, star config, 100m)– 10BaseF (fiber, 1-2KM, FP,FB, FL)
Repeater
Repeater
RepeaterMulti-PortBridge
Repeater=HubBridge=SwitchCD=Collision Domain
CD 1 CD 2 CD 3
CD 4
Each CD includesa MP Bridge Port
802.3 Frame Format
Preamble-7 StartDel-1 DstAddr-6 SrcAddr-6 Length-2 Payload FCS-4
StartDelimiter = 10101011Dst and Src Addr = 6 bytes from flat address space
first bit is individual/groupLength = length of LLC packet in payload
max is 1500 bytesEthernet uses this field to specify upper layer protocol,
values are over 1500*802.3 uses 802.2 to select upper level protocol
Payload is LLC packet plus PAD (min payload is 46 bytes)so min packet length is 64 bytes (without preamble,and start delimiter
TransDataEncaps RecDataDecaps
Framing
Addressing (SRC, DST)
Error detection (FCS)
TransMediaAccess RecMediaAccess
Management Management
Medium allocation (collision avoidance)
Contention Resolution (collision handling)
TransDataEncoding RecDataDecoding
Bit transmission/reception
Collision detection
Carrier sense
Preamble generation/detection
PLS
MAC
LLC
Transmission Without Contention
• Client layer (LLC) sends transmission request to MAC
• Construct frame, compute FCS
• Monitor carrier sense signal– if channel clear, initiate frane transmission after
interframe delay (9.6 us) - send bits to PLS
• Transmit encoded preamble, encode and transmit bits
• Listen for collision
Reception Without Contention
• Synchronize with preamble, convert signal to binary
• Set carrier sense while receiving bits
• Check address - if for this station or station is in promiscuous mode, unpack frame, pass data to client
• Compute FCS and send status code to client
Collision Handling on Sending
• If collision detect signal turned on while transmitting
• Stop sending frame and transmit jam signal (32 bits, special pattern) to insure collision is detected by all stations
• Schedule retransmission after back-off time
Collision Handling on Reciving
• Smaller than minimum size packet is detected and discarded (jam detected)
• Terminated transmitted frame plus jam must be smaller than minimum size packet
Retransmission
• When station detects collision on transmitting, backoff/retransmission is scheduled as follows
• For n th retransmission, randomly choose r in the range– 0 < = r < 2**k, where k=min(10,n)
– wait r slot times before retransmission where a slot time is 512 bit times (64 * 8) or 51.2us for 10mbps
• Truncated exponential backoff
• To insure collision detection, enough bits must be sent so that if other sender is maximally distant, its first byte arrives while first sender is still sending
• Consider 10Base5– max length = 1500m – sender must continue sending for number of bit times necessary for data
to travel 1500 * 2 = 3000m– at 10**7 bits/sec, assuming 2/3 speed of light in medium, each bit
occupies 20m so 150 bits must be sent to deal with propagation time, plus additional bits to deal with other delays, but min packet size =512 bits
– the jam is a special 32 bit pattern that allows all stations to learn that that there has been a collision
Collision Detection
Higher Bandwidth 802.3
• 802.3u, 802.3z standardize 100, 1000Mbps• Bridges (switches) can be connected to collision
domains with different bandwidths• To insure collision detection must maintain
relationship between– 802.3 bandwidth
– minimum frame size
– length of combined media
– e.g., higher bandwidth, same min frame size, must have smaller length
Parameters
Parameters 10Mbps 100Mbps 1000Mbps
slotTime 512 bT 512 bT 4096 bTinterFrameGap 9.6us 9.6us .096usjamSize 32 32 32maxFrameSize 1518 B 1518 B 1518 BminFrameSize 64 B 64 B 64 BextendSize 0 B 0 B 448 B
Repeater
Repeater
RepeaterMulti-PortBridge
CD 1(shared 100Mbps)
CD 3
CD 4 (shared 10Mbps)
CD2(dedicated 100Mbps)
Multi-PortBridge
Multi-PortBridge
Multi-PortBridge
1Gbps (dedicated)CD2
1Gbps (dedicated)CD1
Repeater100Mbps (shared)CD3
Repeater
100Mbps (shared)CD4
DSL - Digital Subscriber Line
• Digital copper loop transmission technology and associated equipment
• Connection between subscriber and telco central office or concentrator - local loop access network
• Achieves broadband speed (up to Mbps) over twisted pair
Terminology
• ATU - ADSL (Asynchronous Digital Subscriber Line) Transceiver Unit
• CSU - Channel Service Unit
• DSU - Digital Service Unit
• CPE - Customer Premise Equipment
Current Telco Infrastructure
• Terminology – ILEC - Incumbent Local Exchange Carrier– CLEC - Competitive Local Exchange Carrier– PTO - Public Telephone Operator– CO - Central Office– DLC/RT - Digital Loop Carrier/Remote Term– MDF - Main Distribution Frame - where local loops
terminate– DSLAM - DSL Access Multiplexer
The Access Network
• Typically, cable bundles with up to thousands of twisted wire pairs each of which terminates at a customer premise
• Long loops– signals dissipate/weaken - lower signal to noise– solutions for loops > 18K feet
• loading coils - every 6K feet, amplify signal, not compatible with DSL
• DLC - concentrator, terminate twisted pair and backhaul to CO with T1/E1, fiber or copper, terminate DSL which needs end-end twisted pair
Transmission Frequency
• Higher frequency enables greater BW• POTS - 0 to 3600 Hertz (cycles per sec)
– can also support modem up to 56Kbps (analog)
• DSL uses higher frequencies • Problems
– attenuation - dissipation of signal as it traverses copper wire
– crosstalk - interference between signals on different copper wires
Attenuation
• Higher frequencies attenuate energy faster - so shorter loop distance possible
• Solution– use lower resistance wire - thicker gauge– but telcos have used thinnest possible wire
because thicker wire more expensive to buy/install
Crosstalk
• Electrical energy transmitted on wire radiates energy to adjacent wires in bundle
• Interferes more with similar frequencies• Less of a problem for lower frequencies, e.g., those used
with POTS• Solutions for DSL
– two pair instead of one pair so lower freq can be used to get desired BW
– use different frequencies upstream/downstream, lower upstream because more crosstalk at CO end - many lines converge
Some DSL Types
• HDSL - High Bit Rate DSL, 2 pair, T1/E1, symmetric
• SDSL - Symmetric DSL, 1 pair, multiple data rates to T1/E1
• ADSL - Asymmetric DSL, 1 pair, up to 6Mbps
• RADSL - Rate Adaptive DSL, 1 pair, up to 12Mbps, asymmetric or symmetric
Issues
• Frequencies to use for up/down stream
• Desired max distance - subscriber/DLC, about 18K feet is usual upper bound
• Coding of bits
• NON ISSUE - compatibility with other DSL systems - DSL is point-point and single organization controls both ends
Network Layer
A variety of different types of protocols• Access / service interface
– connection-oriented (X.25 Level 3, ATM) or connectionless (IP)
• Within network – switching method - DG (IP), VC (ATM)
• Routing - OSPF / BGP (IP)• Congestion detection / handling - RED (IP)• QoS - TM4.0 (ATM), RSVP-DiffServ (IP)
Routing
• Problem: Find path from a source to a destination through a network (internet)
• Possible path criteria– correctness - packets sent to correct destination– simplicity– stability - converge to equilibrium)– robustness - operates in presence of failures– fairness - serves all users - can prioritize– optimality - minimize cost of path– QoS - paths depend on app requirements
Cost of Link
• Capacity / bandwidth
• Type - satellite / land
• Error rate
• Delay
• Security
Cost of path = sum of costs of links on path
Some Issues for Routing Protocols
• Who decides route– centralized - Routing or Network Control Center
(RCC or NCC)– distributed - individual sw elts
• When are routes changed– fixed or static or non-adaptive - at system startup
time by administrator– adaptive or dynamic - with each DG, with each VC,
weekly, ??
More Issues for Routing Protocols
• Changes based on – measurements or estimate of traffic over some
period– network topology - links, sw elts up/down,
added/removed
• Algorithm used – distance vector (centralized or distributed)– link state (distributed)
Process for Changing Routes
• Determine relevant parameters - topology, delay, queue lengths, traffic intensity
• Send information to place(s) making decisions (sw elts or RCC/NCC)
• Place receiving info computes routing tables
• Routing tables distributed to sw elts (if NCC)
Distance Vector Routing
• Each sw elt maintains set of triples– (Desination, Cost, NextHop)
• Each sw elt sends updates to (receives updates from) directly connected neighbors– periodically (on the order of several seconds)
– when its table changes (triggered update)
• Update is list of pairs - (Destination, Cost)
• Update local table if receive “better” route
• Refresh existing routes; delete if time out
Database at
A B C D E F
A -- 3A -- 1A -- --B 3B -- 8B 1B -- --C -- 8C -- -- 4C --D 1D 1D -- -- -- 2DE -- -- 4E -- -- 1EF -- -- -- 2F 1F --
Destination
F E C
B
A
D
1
1
1
2
3
4
8
Distributed algorithm, each column at different sw eltCentralized algorithm, entire table at RCC
Database at
A B C D E F
A -- 2D -- 1A -- --B 3B -- 8B 1B 12C --C -- 8C -- -- 4C --D 1D 1D -- -- 3F 2DE -- 12C 4E -- -- 1EF -- 3D -- 2F 1F --
Destination
F E C
B
A
D
1
1
1
2
3
4
8
Changes in B, E tables after one exchange of routing infomation
Example
A
E
C
B
F G
D
Routing table at node B
Dest Cost NextHopA 1 AC 1 CD 2 CE 2 AF 2 AG 3 A
@
A
E
C
B
F G
D
F detects that link to G has failedF sets distance to G to inf, sends update to AA sets distance to G to inf since it uses F to reach GA receives update from C with 2 hop path to GA sets distance to G to 3 and sends update to FF decides it can reach G in 4 hops via A
Example - Recovery from Link Failure
@
A
E
C
B
F G
D
Link from A to E failsA advertises distance of infinity to EB and C advertise distance of 2 to EB decides it can reach E in3 hops, advertises this to AA decides it can reach E in 4 hops, advertises this toCC decides it can reach E in 5 hopsA, B, C form a routing loop
Looping Problem
@
1. Destination D2. Cost from
A 3B (cost to D from A is 3, first hop is B)B 2CC 1D
3. C-D fails (second column of 4)4. Successive table entries for destination D from nodeA entries 3B 3B 3B 5B 5B 7B 7B… B entries 2C 2C 4A 4A 6A 6A 8A… C entries 1D inf 3B 5B 5B 7B 7B…
Eventually, algorithm converges but only after “count to infinity”Problem: path to D from B uses B but B is not aware of this
A B C D
1 1 1
Solution - Split Horizon
• If sw elt W learned route to X from Y, W does not tell Y about its route to X
• Modification: split horizon with poisoned reverse - use metric infinity in update for routes that have ceased to exist– advantage - convergence can be faster
• Can also speed up convergence by triggering update to neighbors when a sw elt’s routing database changes
With Split Horizon
A 3B 3B 3B infB 2C 2C inf infC 1D inf inf inf
A B C D
Cost From
Split Horizon: do not report route to destination to the neighbor from which the route was learned
Consider
1. Destination E2. Cost from
A 3B (cost to D is 3, first hop is B)B 2DC 3B
3. B-D fails4. Successive table entries for destination E from node A entries 3B 3B 4C 5C 6C… B entries 2D inf 4C 5C 6C… C entries 3B 3B 4A 5A 6A… Eventually, algorithm converges - after “count to infinity”Split horizon does not help!
1
1 1A B D E
C
A B D E
C
Split horizon does not help
Successive table entries for destination E from nodes
A 3B 3B 4C inf 6BB 2D inf inf 5A infC 3B 3B 4A inf inf
Solution - Hold Down
• When a route to a destination is lost by a sw elt, it does not accept a new route to that destination for a certain amount of time - time based on max possible path loop
• Difficulty is determining hold down period. Too short may lead to (incorrect) acceptance of new routes. Too long may lead to loss of data enroute to desination
A B D E
C
With sufficient hold down
Successive table entries for destination E from nodes
A 3B 3B infB 2D inf infC 3B 3B inf
Link State
• Each sw elt sends information on directly connected links to ALL sw elts in the AD
• Contrast with distance vector - entire routing table sent just to neighbors
• Link state packet - packet used to carry info• Reliable flooding - hop-hop reliable (optional)• Shortest Path First Algorithm (Dijkstra) used to
compute routes
Link State Packet - LSP
• Routing protocol packet - on Internet encapsulated in UDP DG or TCP segment
• Includes– ID of sw elt that created LSP– cost of link to each directly connected neighbor– sequence number (SEQNO)– time-to-live (TTL) for LSP - max hop count or
max number of time units, as defined by protocol
@
LSP Handling
• Each sw elt periodically generates new LSP with increased SEQNO (set to 0 on reboot)
• LSP forwarded on all links except one on which it arrived
• Each sw elt stores newest LSP from each source sw elt
• TTL decremented before being forwarded
• TTL of stored LSP may also be periodically decremented
@
Dijkstra’s Shortest Path First Algorithm
Goal - find shortest path from source node s to each node in a graph
• N = set of nodes in the graph
• l(i, j)= cost (weight) for edge (i, j) (>= 0)
• sN = the source node
• M = set of nodes incorporated so far
• C(n) = cost of path from s to n
@
M = {s}for each n in N - {s}
C(n) = l(s,n) (inf if s not dir conn to n)while (N -union {w}: where C(w) is the
minimum for all w in (N - M)-for each n directly conn to a w
C(n) = MIN (C(n), C(w) + l(w,n))
Shortest Path First Algorithm
@
Assume for all nodes in M, the min distance and correct first hopfrom S have been found - consider node X in N-M having min distfrom S among all nodes in N-M. Is there shorter path to X? NO, WHY?
M
w added
s
Min dist from S X
Consider shorter path to X. If node before X is not in M, then it would have been chosen instead of X. So X is first node outside of M on this shorter path. But then dist to X would already be this shorter value ...
Route Table Calculation
• Each sw elt contains two lists: Tentative and Confirmed
• Each list contains a set of triples(Destination, Cost, NextHop)
• Note that “NextHop” is the first sw elt on the path from the source S to Destination
@
D
S C
A
B
15
3
10 11 2
3
Confirmed Tentative1. (S,0,-) 2. (S,0.-) (A,15,A)
(C,3,C) (D,11,D)
3. (S,0,-) (A.15,A) (C,3,C) (D,11,D)4. (S,0,-) (A.13,C) (C,3,C) (D,5,C)5. (S,0,-) (A,13,C) (C,3,C) (D,5,C)6. (S,0,-) (A,6,C) (C,3,C) (D,,5,C)
1
7. (S,0,-) (C,3,C) (D,,5,C) (A,6,C)8. (S,0,-) (B,9,C) (C,3,C) (D,,5,C) (A,6,C)
Distance Vector vs. Link State
• Distance vector– does not reveal net topology to sw elt– looping behavior with ad hoc solutions– slow to converge
• Link state– scales well– can secure by authenticating origin of LSPs– each sw elt knows topology of entire network
The Transport Layer
• Implemented in end-systems (hosts) not sw elts
• Uses network layer services
• Issues– addressing– CO vs. CL, reliable vs. unreliable– connection establishment– flow control – connection termination
CO vs. CL
• Transport layer services may be CO or CL , reliable or unreliable
• Consider CO transport service over CL network service
• Consider reliable transport service over unreliable network service
Connection Establishment 2-way Handshake
• Initiator sends connection request (w. id)
• Responder sends accept or reject (w. id)
• Responder assumes connection exists when it sends the ack
• Initiator assumes connection exists when it receives the ack
• OK if network service reliable and CO
• If not? (consider lost, old conn reqs/ack)
Deal with Old Connection Requests
• Keep track of obsolete connection ids– problem: lose state info on crash
• Limit packet lifetime with max hop count– problem: queuing times
• If can bound time in network for request and ack, then can use initial seq number based on clock to distinguish requests– done in TCP, problem: queuing times
• 3-way handshake
3-way Handshake
• Initiator sends request with ID (ISN in TCP)• Responder sends ack of initiator’s ID (ISN)
plus its own ID (ISN)• Initiator sends ack of responder’s ID (ISN)
and includes its ID• Solves old connection request problem
because only establishes connection when both sides ack the other’s ID
Flow Control
• Similar mechanism (e.g., sliding window) to link layer flow control
• Link layer receive window is usually fixed
• Transport protocols may include a “credit” parameter so receiver can inform sender of its current receive window (variable receive window), i.e., how much it will accept beyond what is being acked
• Consider credit of zero!!
Connection TerminationAbort
• User indicates it will not send any more data nor will it accept any more data
• Transport protocol immediately closes connection
• Application must insure that all data sent has been received before issuing abort
Connection TerminationGraceful Close
• User indicate it will send no more data but will continue to accept data
• Transport protocol does not close connection until both users have issued graceful close
• Application can count on transport protocol to insure all data sent is delivered
When is a segment accepted by a receiver?
Segment Receive TestLength Window
-------- -------- --------------------------------------------------- 0 0 SEG.SEQ = = RCV.NXT
0 >0 RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND
>0 0 not acceptable
>0 >0 RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND or RCV.NXT <= SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND
Basic 3-way Handshake for Connection Synchronization
TCP A TCP B SEQ ACK CTL
1. CLOSED LISTEN
2. SYN-SENT <100> <SYN> SYN-RECEIVED
3. ESTABLISHED <300> <101> <SYN,ACK> SYN-RECEIVED
4. ESTABLISHED <101> <301> <ACK> ESTABLISHED
5. ESTABLISHED <101> <301> <ACK> DATA ESTABLISHED
Simultaneous Connection Synchronization
TCP A TCP B SEQ ACK CTL1. CLOSED CLOSED
2. SYN-SENT <100> <SYN> . . .
3. SYN-RECEIVED <300> <SYN> SYN-SENT
4. ... <100> <SYN> SYN-RECEIVED
5. SYN-RECEIVED <100> <301> <SYN,ACK> ...
6. ESTABLISHED <300> <101> <SYN,ACK> SYN-RECEIVED
7. ESTABLISHED <101> <301> <ACK> ESTABLISHED
Recovery from Old Duplicate SYN TCP A TCP B
SEQ ACK CTL1. CLOSED LISTEN
2. SYN-SENT <100> <SYN> ...
3. (duplicate) ... <90> <SYN> SYN-RECEIVED
4. SYN-SENT <300> <91> <SYN,ACK> SYN-RECEIVED
5. SYN-SENT <91> <RST> LISTEN
6. ... <100> <SYN> SYN-RECEIVED
7. SYN-SENT <400> <101> <SYN,ACK> SYN-RECEIVED
8. ESTABLISHED <101> <401> <ACK> ESTABLISHED
Half-Open Connection Discovery TCP A TCP B
SEQ ACK CTL
1. (Crash) (Send 300, receive 100)
2. CLOSED ESTABLISHED
3. SYN-SENT <400> <SYN> (??)
4. (!!) <300> <100> <ACK> ESTABLISHED
5. SYN-SENT <100> <RST> (Abort!!)
6. SYN-SENT CLOSED
7. SYN-SENT <400> <SYN>
Old Duplicate SYN Initiates a Reset on Two Passive Sockets
TCP A TCP B SEQ ACK CTL
1. LISTEN LISTEN
2. ... < Z > <SYN> SYN-RECEIVED
3. (??) < X > <Z+1> <SYN,ACK> SYN-RECEIVED
4. <Z+1> <RST> (return to LISTEN)
5. LISTEN LISTEN
Normal Close Sequence
TCP A TCP B SEQ ACK CTL
1. ESTABLISHED ESTABLISHED
2. (Close) FIN-WAIT-1 <100> <300> <FIN,ACK> CLOSE-WAIT
3. FIN-WAIT-2 <300> <101> <ACK> CLOSE-WAIT
4. (Close) TIME-WAIT <300> <101> <FIN,ACK> LAST-ACK
5. TIME-WAIT <101> <301> <ACK> CLOSED
6. (2 MSL) CLOSED
Simultaneous Close Sequence
TCP A TCP B SEQ ACK CTL
1. ESTABLISHED ESTABLISHED
2. (Close) (Close) FIN-WAIT-1 <100> <300> <FIN,ACK> ... FIN-WAIT-1
<300> <100> <FIN,ACK> ... <100> <300> <FIN,ACK>
3. CLOSING <101> <301> <ACK> ... CLOSING <301> <101> <ACK> ... <101> <301> <ACK>
4. TIME-WAIT TIME-WAIT (2 MSL) (2 MSL) CLOSED CLOSED
Service Interface (to IP)
• Net Send– identifier– source, destination IP address– protocol (TCP)– data, data length– type of service (precision, reliability, delay,
throughput)– don’t fragment– options (IP source routing, time stamp, etc)
Service Interface (from IP)
• Net Deliver– source, destination IP address– protocol (TCP)– data, data length– type of service– options
Service Interface (from application)
• Open– unspecified passive - source port– fully specified passive - +dest port, address– active, active w. data - +data length, data, psh, urg+
• Send - lcn, data length, data, psh, urg+
• Allocate (accept) - lcn, data length, data, urg+
• Close (graceful) - lcn
• Abort - lcn
• Status - lcn
Service Interface (to application)
• Open ID - local connection name, source port, dest port and address (if known)
• Open failure - lcn
• Open success - lcn
• Deliver - lcn, data, data length, urg+
• Closing - lcn
• Terminate - lcn
• Status response and error
When Bad Segments Arrive
• Send RST (stay in same state)– closed state and anything arrives except RST– listen, syn-sent, syn-received states and
incoming acks something not sent
• Send ACK - established and other synchronized states
• Use seq, ack numbers that other side will believe (e.g., from incoming ack or data)
TCP Throughput - Error Free Environment
• Max TCP window/credit = 64KB
• Max TCP throughput– cross US - 100ms RTT - 5.2Mbps– metropolitan - 1ms RTT - 520Mbps– local - .1ms RTT - 5.2Gbps
• Problem is the max credit of 64KB
TP4 Throughput
• Max TP4 window/credit– (8KB per TPDU) * (32K TPDUs)– 256MB = 2Gb
• Max TP4 Throughput– cross US - 100ms RTT - 20Gbps– metropolitan - 1ms RTT - 2Tbps– local - .1ms RTT - 20Tbps
TCP Solution - Window Scaling
• Add window scale option
• Both ends of TCP connection must agree at connection setup time to use scale factors
• Each end tells the other what scale factor it will accept (0 if willing to use scaling for other end but not accept scaled window)
• ActualReceiveWindow =
SegmentReceiveWindow * 2**shift-cnt
Window Scaling
• Max shift-cnt = 14
• Max window/credit = 64KB * 2**14 = ~8Gb
• Max TCP througput across US = 80Gbps
TCP vs. TP4 - Common Features
• Connection-oriented reliable service interface for client processes
• Sliding windows with cumulative acks and credits
• Timers to trigger retransmission • Checksum to insure data integrity• Three-way handshake for connection
establishment
TCP vs. TP4 - Differences
• Sequence number space– TCP - 32 bits indexing BYTES– TP4 - 31 bits indexing TPDUs
• Maximum packet size– TCP - 64KB– TP4 - 8KB
• Credit - maximum receive window size– 64KB– 32K TPDUs, each with max.size 8KB
TCP vs. TP4 - Differences
• Packet types– TCP - one type with flags (segment)
• SYN, FIN, ACK, RST, PSH, URG
– TP4 - multiple packet types (TPDU)• CR, CC, DT, AK, DR, DC, ED, EA
• Out of band data– TCP - URG, PSH– TP4 expedited data
• Termination– TCP - graceful close or abort– TP4 - abort
TCP vs. TP4 - Differences
• Checksum– TCP - one addition for 16 bytes of data– TP4 - two additions for 1 byte of data
• Opening connections– TCP - passive or active open, listen– TP4 - active open, no listen
• Connections– TCP - one between address/port pairs– TP4 - multiple between TSAPs with diff ref #s
The Internet Protocol - IPVersion 4
Internet Model
RR
R
R
R
Hub
R
R
TCP
IP IP
TCP
IP
R
@
Packet Delivery Model
• Connectionless service interface– datagrams may not be delivered to destination
user in the same order as sent by source user
• Potentially unreliable– datagrams sent may never be delivered
• Datagrams are used within the network– unbounded delays, reordering, loss, duplication
within network are possible
IP Header
• Version (4) - currently version 4
• HLEN (4) - number of 32 bit words in header
• TOS (8) - Type of Service (not widely used)
• Length (16) - number of bytes in this DG (max 64K)
• Ident (16) - used by fragmentation/reassemlby
• Flags/Offset (16) - used by frag/reass (3 +13)
• TTL (8) - Time to Live - number of hops before discard this DG
• Protocol (8) - TCP = 6, UDP = 17
• Checksum (16) - of header only
• Destination / Source IP address (32 bits each)
@
Fragmentation and Reassembly
• Each network has some MTU
• Strategy– fragment when necessary (MTU < DG size)– avoid fragmentation at source host– regragmentation is possible– fragments are self contained datagrams– delay reassembly until destination host– do not recover from lost fragments
@
Indent = x
Indent = x
Indent = x
Indent = x Offset=0
512
1024
512 bytes
512 bytes
376 bytes
0
1
1
Flag = more fragments
Indent = x Offset=0
1400 bytes
0
@
Global Addresses• Properties
– globally unique– hierarchical - network + host– each network interface has an IP address
• Four classes - A, B, C, D (multicast-111…)
0
10 host
host
host110 network
net
net
16
8
247
14
21
12.5.120.9
138.108.12.240
205.23.198.12
@
Datagram Forwarding
• Strategy– every DG contains destination’s address– if directly connected destination network, then
forward to host– otherwise, forward to some router– forwarding table maps network number into next
hop– each host has a default router– each router maintains a forwarding table
@
Address Translation
• Map IP addresses into physical addresses (e.g., 48 bit IEEE)
• Address Resolution Protocol (ARP)– table of IP to physical address bindings– sender broadcasts request if destination IP address
not in its table– target machine responds with its physical address
@
How do Hosts/Routers Build/Maintain ARP Caches
• Table entries discarded if not refreshed (~10 minutes)
• Update your table with source when you are the target
• Update your table if you already have an entry and see info pass by on net
• Do not add new table entry based on info that passes on net
ICMPInternet Control Message Protocol
• Uses IP but is a separate protocol in the network layer
ICMP HEADER
IP HEADERPROTOCOL = 1
TYPE CODE CHECKSUM
REMAINDER OF ICMP MESSAGE (FORMAT IS TYPESPECIFIC)
IP HEADER
IP DATA
ICMP Types
• Echo - Echo Reply (ping)• Redirect (use other router)• Destination Unreachable (prot, port, host)• Time Exceeded• Source Quench• Checksum Failed• Reassembly Failed• Cannot Fragment
@
Echo and Echo Reply
TYPE CODE CHECKSUM
IDENTIFIER SEQUENCE #
DATA ….
TYPE = 8 = ECHO; 0 = ECHO REPLYCODE = 0
IDENTIFIERAn identifier to aid in matching echoes and replies
SEQUENCE #Same use as for IDENTIFIER
UNIX “ping” uses echo/echo reply
Redirect
TYPE CODE CHECKSUM
NEW ROUTER ADDRESS
IP HEADER + 64 bits data
from original DG
TYPE = 5CODE =
0 = Network redirect1 = Host redirect2 = Network redirect for specific TOS3 = Host redirect for specific TOS
Destination Unreachable
TYPE CODE CHECKSUM
UNUSED
IP HEADER + 64 bits data from original DG
TYPE = 3CODE
0 = Net unreachable1 = Host unreachable2= Protocol unreachable3 = Port unreachable4 = Fragmentation needed but DF set5 = Source route failed
Time Exceeded
TYPE CODE CHECKSUM
UNUSED
IP HEADER + 64 bits data from original DG
TYPE = 11CODE
0 = Time to live exceeded in transit1 = Fragment reassembly time exceeded
Source Quench
TYPE CODE CHECKSUM
UNUSED
IP HEADER + 64 bits data from original DG
TYPE = 4; CODE = 0
Indicates that a router has dropped the original DG or may indicate that a router is approaching its capacity limit.
Correct behavior for source host is not defined.
Traceroute
• UNIX utility - displays router used to get to a specified Internet Host
• Operation – router sends ICMP Time Exceeded message to source if TTL is
decremented to 0– if TTL starts at 5, source host will receive Time Exceeded message from
router that is 5 hopes away
• Traceroute sends a series of probes with different TTL values… and records the source address of the ICMP Time Exceeded message for each
• Probes are formatted to that the destination host will send an ICMP Port Unreachable message
Scaling Issues
• Problems– Internet addresses have only 32 bits and only 2 levels
of hierarchy– original addresses only allowed 256 networks– routing protocols do not scale
• IPv6 solves both of these problems• Classes (A, B, C) provided for more networks• Subnetting adds a third level of hierarchy• CIDR makes routing more efficient
Subnetting
• Add another level to address/routing hierarchy: subnet
• Subnet masks define variable partition of host part of Class A and B addresses
• Subnets are visible only within the local site
Network Number SubnetID HostID
Subnetted Address
Class B Address
Subnet Mask (255.255.255.0)00000000111111111111111111111111
H3R2
H2
H1 R1
128.96.33.14 128,96.33.1
128.96.34.139
1228.96.34.130
128.96.34.129
128.96.34.1129.96.34.15
Subnet Mask: 255.255.255.128Subnet number: 128.96.34.0
Subnet Mask: 255.255.255.128Subnet number: 128.96.34.128
Subnet Mask: 255.255.255.0Subnet number: 128.96.33.0
Subnet Subnet NextNumber Mask Hop128.96.34.0 255.255.255.128 int0128.96.34.128 255.255.255.128 int1128.96.33.0 255.255.255.0 R2
@
Forwarding Algorithm
D = destination IP address
for each entry (SubnetNum, SubnetMask, NextHop)
D1 = SubnetMask & D
if D1 = SubnetNum
if NextHop is an interface
deliver datagram directly to destination
else
deliver datagram to NextHop (a router)
@
Notes
• Would use a default router if nothing matches
• Not necessary for all ones in subnet mask to be contiguous
• Can put multiple subnets on one physical network
• Subnets not visible from rest of the Internet
@
Supernetting
• Assign block of contiguous network numbers to near-by networks
• Called CIDR: Classless Inter-Domain Routing• Represent blocks with a single pair
– (first _network_address, count)
• Restrict block sizes to powers of 2• Use bit mask (CIDR mask) to identify block size• All routers must understand CIDR addressing
@
Route Propagation
Idea: Impose a second hierarchy on the network that limits what routers talk to each other. (The first hierarchy is the address hierarchy that governs packet forwarding.)
• Autonomous Systems (AS)– corresponds to administrative domain– example: company backbone, ISP backbone– assign each AS a 16 bit number
Routing Protocol Types
• Two-level route propagation hierarchy
• Interior gateway protocol– inside an AS– each AS selects its own– routing decisions based on metric
• Exterior gateway protocol– between ASs – Internet-wide standard)– routing decisions based on administrative issues
Popular Interior Gateway Protocols
• RIP: Route Information Protocol– developed for XNS, distributed with UNIX– distance vector algorithm, metric is hop count
• OSPF: Open Shortest Path First– recent Internet standard– link state algorithm– routes changed when topology changes– supports load balancing– 2-level internal hierarchy
OSPF
OSPF - Open Shortest Path First Intra-AS Routing Protocol
• Used within a single AS
• Calculates routes to destination IP network address
• Link state protocol - SPF/flooding
• Route changes based on topology not traffic
OSPF
• Supports different routes depending on TOS required
• Supports multiple equal cost paths
• Exchange of routing updates is reliable between adjacent routers
• Link state advertisement (LSA) contains info on state of router’s interfaces and adjacencies, LSAs are flooded
Areas
• Two level routing hierarchy
• Divide AS into areas
• SPF is used to determine routes within each area and on the backbone that routes between the areas
• The backbone is also an area
Area 1Area 2
Area 3
Area 4
Packet Types
• Hello - acquire neighbors
• Database description
• Link state request
• Link state update, Link state ack– reliable hop-hop transfer of link state
advertisements (LSA)– may aggregate LSAs for several sources
Link State Advertisements
• Router link - originated by all routers, flooded in area
• Network link - originated by designated router, flooded in area
• Summary link - originated by area border routers, flooded into area, describes route to destination outside the area but in the AS
• AS external link - originated by an AS border router, flooded throughout AS, describes route to dest in another AS
Types of OSPF Routers
• Internal router - within an area
• Area border router – attached to multiple areas
– runs a copy of SPF for each attached area
– relays topological info on attached areas to backbone
• Backbone router - includes area border routers
• AS boundary routers
Popular Exterior Gateway Protocols
• EGP - Exterior Gateway Protocol– original protocol in this class– based on reachability tree– no metric
• BGP4 - Border Gateway Protocol\– recent internet standard
Routing Hierarchy Summary
• AS to destination AS (BGP4)
• Within AS (OSPF)– 2 levels– to destination network
• Within network – to subnet, host
Border Routers
• Each AS has – one or more border routers– one or more border routers that serve as the speakers for the AS– border routers forward datagrams between ASs
• Speakers communicate with border routers in adjacent ASs, using TCP
• Speakers advertise and receive routing info and determine which routes to use
• Message types: OPEN, KEEPALIVE, UPDATE, NOTIFICATION (errors)
UPDATE MESSAGES
• Routes are advertised and withdrawn in UPDATE messages
• A route includes a sequence or set of ASs on the path to the destination IP networks
unfeasible routes length (2B)
withdrawn routes (destination IP addresses)
total path attribute length (2B)
path attributes (routes)
network layer reachability info (destination IP addresses)
Routes
• When a BGP speaker advertises a route, it adds itself to the route
• A BGP speaker only advertises routes that it uses itself
• Policy considerations are used to determine route; routes are not optimal
• An advertised route (sequence or set of ASs) may change -- downstream ASs may not discover in a timely fashion
Update Handling
• Receive an update with routes
• Decision process– D (attributes of route) ------> non neg integer– D specifies the degree of preference for a route– route to dest with highest preference used
• Note that D does not depend on the existence or non-existence of other routes
• D is based on policy considerations, not necessarily on the cost of routes