3rd Edition: Chapter 4

CSCI 547 Network Layer 4-1

Chapter 4Network Layer

Read the paper (IP Addressing) http://www.3com.com/other/pdfs/infra/corpinfo/en_US/501302.pdf#search=%22understanding%20ip%20addressing%20everything%22

A excellent source for IP addressing and subnetting & routing

Also read Chapter 3 of http://www.redbooks.ibm.com/redbooks/pdfs/gg243376.pdf

Computer Networking: A Top Down Approach ,4th edition. Jim Kurose, Keith RossAddison-Wesley, July 2007.

http://www.3com.com/other/pdfs/infra/corpinfo/en_US/501302.pdf#search=%22understanding%20ip%20addressing%20everything%22




Chapter 4: Network Layer

Chapter goals: understand principles behind network

layer services: network layer service models forwarding versus routing how a router works routing (path selection) dealing with scale advanced topics: IPv6, mobility

instantiation, implementation in the Internet


Chapter 4: Network Layer 4. 1 Introduction 4.2 Virtual circuit and

datagram networks 4.3 What’s inside a

router 4.4 IP: Internet Protocol

Datagram format IPv4 addressing ICMP IPv6

4.5 Routing algorithms Link state Distance Vector Hierarchical routing

4.6 Routing in the Internet RIP OSPF BGP

4.7 Broadcast and multicast routing


Network layer: Network layer functions

networkdata linkphysical








application

transportnetworkdata linkphysical

application


Application-layer protocolsdefine when and howmessages are sent Transport-layer protocolsdeliver data between processes on different end-systems

o Transport protocols execute only on end systems

Network-layer protocolsdeliver data from one end-system to another

oHop-to-hop rather than end-to-endoNetwork layer protocols execute on every end-systems and routers

Virtual end-to-end transport


The Network Layer: Network layer functions(Packet switching)

The network-layer provides four important functions: Addressing : the means by which end systems identify each other Path determination (routing) : the route taken by packets from source to destination Switching (forwarding) : the movement of packets from an input interface to an appropriate output interface Call setup & termination : The establishment of a virtual circuit from sender to receiver in case of a connection-oriented service-- ATM, frame relay, X.25


Interplay between routing and forwarding

1

23

0111

value in arrivingpacket’s header

routing algorithm

local forwarding tableheader value output link

0100010101111001

3221

Routing algorithm updates the forwarding table

Routing = ?

Forwarding = ?

Act of updating routing table usually via talking to other routers

Act of looking up routing table and sending(forwarding) to another router


Connection setup

Connection-oriented network architectures: ATM, frame relay, X.25

before datagrams flow, two end hosts and intervening routers establish virtual connection routers get involved in setting up the connection

network vs transport layer connection service: network: between two hosts (may also involve

intervening routers in case of VCs) transport: between two processes


Network service model

What service model should be provided for transporting packets from sender to receiver? In other words, what aspects should we consider in designing network layer?

Some possible elements of a service model: Guaranteed bandwidth Guaranteed delay Preservation of inter-packet timing (guaranteed maximum jitter)—end-to-end—geared toward time-sensitive traffic Loss-free delivery In-order delivery Congestion feedback to senderDoes IP protocol provide any of the above? NO !


Network layer service models:

NetworkArchitecture

Internet(IP)

ATM

ATM

ATM

ATM

ServiceModel

best effort

CBR

VBR

ABR

UBR

Bandwidth

none

constantrateguaranteedrateguaranteed minimumnone

Loss

no

yes

yes

no

no

Order

no

yes

yes

yes

yes

Timing

no

yes

yes

no

no

Congestionfeedback

no (inferredvia loss)nocongestionnocongestionyes

no

Guarantees ?

Connection-oriented protocols can do all of these!

Constant Bit Rate (CBR)Variable Bit Rate (VBR)Available Bit Rate (ABR)Unspecified Bit Rate (UBR)










Connection-Oriented or Connectionless ?

Datagram(DG) network provides network-layer connectionless service--IP

Virtual Circuit(VC) network provides network-layer connection-oriented service—Frame Relay, X.25

analogous to the transport-layer services, but: service: should provide host-to-host—same

with transport layer in this respect no choice: network either provides or not—

transport layer has choice between TCP or UDP

implementation: in network core


Connection-oriented: Virtual Circuits

call setup for each call before data can flow, and teardown when done

each packet carries VC identifier (not destination host address)

every router on source-dest path maintains “state” for each passing connection—connection management—flow control, windowing, etc.

link, router resources (bandwidth, buffers) may be allocated to VC (dedicated resources = predictable service) QOS

“source-to-dest path behaves much like telephone circuit” performance-wise network actions along source-to-dest path


VC implementation

a VC consists of:1. path from source to destination2. VC numbers, one number for each link along

path—number changes hop-to-hop3. entries in forwarding tables in routers along

path—an entry fixed until VC teardown packet belonging to VC carries VC number

(rather than full destination address) VC number can be changed on each link.

VC number changed according to the entry in forwarding table


ATM cell format: Virtual Circuit identifier

GFC--Generic flow control (000=uncontrolled access). VPI--Virtual path identifier. VCI--Virtual channel identifier. Together, the VPI and VCI comprise the VPCI. These fields represent the routing information within the ATM cell.PTI--Payload Type Indication. CLP--Cell Loss Priority.HEC--Header Error Control.

GFC VPI

8 bits

VPI VCI

VCI

VCI PTI(3 bits) CLP

HEC

48 BYTESPAYLOAD

VC identifier


Forwarding table

12 22 32

1 23

VC number

interfacenumber

Incoming interface Incoming VC # Outgoing interface Outgoing VC #

1 12 3 222 63 1 18 3 7 2 171 97 3 87… … … …

Forwarding table innorthwest router:

Routers maintain connection state information for each Virtual circuit and also do windowing—large overhead!


Virtual circuits: signaling protocols

used to setup, maintain, teardown VC using control packets used in ATM, frame-relay, X.25 VC & signaling protocol are used only along parts of Internet

Used primarily near the network core where a connection-oriented service (e.g. ATM) is used

The signaling protocols are not part of IP

application


application


1. Initiate call 2. incoming call

3. Accept call4. Call connected5. Data flow begins 6. Receive data7. Discconnect request 8. Discconnect confirm

What happens when a packet is hit by noise

on a link?


Example ATM Backbone on Internet

http://www.nthelp.com/maps.htm


Datagram networks--connectionless

no call setup at network layer routers: no state about hop-to-hop or end-to-end

connections no network-level concept of “connection”

packets forwarded using destination host address Packets routed independently each other packets between same source-dest pair may take different

paths

application


application


1. Send data 2. Receive data

Advantage: Much less overhead than VC Disadvantages?unreliable, no

flow control, no error control


Forwarding table

Destination Address Range Link Interface

11001000 00010111 00010000 00000000 through 0 11001000 00010111 00010111 11111111

11001000 00010111 00011000 00000000 through 1 11001000 00010111 00011000 11111111

11001000 00010111 00011001 00000000 through 2 11001000 00010111 00011111 11111111

otherwise 3

For 32 bit addresses, 4 billion possible entries to have an entry for every address for every router, so better way is

Above will have the same effect as


Longest prefix matching: the idea used in Internet(glimpse of how routing works)

Prefix Match Link Interface 11001000 00010111 00010 0 11001000 00010111 00011000 1 11001000 00010111 00011 2 otherwise 3

DA: 11001000 00010111 00011000 10101010

Examples

DA: 11001000 00010111 00010110 10100001 Which interface?

Which interface?


Datagram or VC network: why?

Internet (datagram) data exchange among

computers “elastic” service, no

strict timing required “smart” end systems

(computers) can adapt, perform

control, error recovery simple inside network,

complexity at “edge” many link types

different characteristics uniform service difficult

ATM (VC) evolved from telephony--

BISDN human conversation:

strict timing, reliability requirements

need for guaranteed service

“dumb” end systems telephones complexity inside

network


Connection-oriented(VC) vs Conncectionless(DG)

ISSUES Connection-oriented Connectionless-------------------------------------------------------------------------------------------------------------------Initial setup Required Noand termination

Routing Routing only done on Each packet routedinitial VC setup independently

Connection state Routers keep state info. Router do not hold state info.

for each connection

Need for Needed during initial setup Full address neededFull address Afterwards only VC# always

needed

Packet Guaranteed Not guaranteedSequencing Error recovery Handles error conditions Left to a higher layer


ISSUES Connection-oriented Connectionless-------------------------------------------------------------------------------------------------------------Congestion control Easy Difficult

QOS Easy Difficult

Flow Control Handles Not done

Overhead High Low

Examples: TCP UDP, IP, IPX, ISO-IP

Connection-oriented(VC) vs Conncectionless(DG)—Cont’d


Chapter 4: Network Layer

4. 1 Introduction 4.2 Virtual circuit and








Router Architecture Overview

Two key router functions:

run routing algorithms/protocol (RIP, OSPF, BGP) forwarding datagrams from incoming to outgoing link

Explore this first


Input Port Functions: Centralized switching

Centralized switching: Input port forward to Routing processor Routing processor perform forwarding

table lookup Forwards the packet to output port This approach used for low end routers,

workstations or servers acting as routers

Physical layer:bit-level reception

Data link layer:e.g., Ethernetsee chapter 5

Routing processor

Routing Table

Output port


Input Port Functions: Decentralized switching

Decentralized switching: given datagram’s dest., input port

processor lookup output port using forwarding table(a shadow copy in each input port) in input port memory—some routers use Content Addressable Memory for faster lookups—e.g. Cisco 8500 has 64K CAM for each input port

goal: complete input port processing at ‘line speed’—needs parallel processors

queuing: if datagrams arrive faster than forwarding rate into switch fabric

Physical layer:bit-level reception

Data link layer:e.g., Ethernetsee chapter 5


Three types of switching fabrics

1: 2:

3:


Switching Via Memory

First generation routers: earlier routers (often just a computer) with switching under direct control of CPU

packet copied to system’s memory speed limited by memory bandwidth (2 system bus crossings per datagram)

many modern routers still use but in a shared memory multiprocessor mode—a processor for each input port

InputPort

OutputPort

Memory

System Bus

e.g. CISCO Catalyst 8500


Switching Via a Bus

datagram from input port memory

to output port memory via a shared bus—one packet at a time to output port

bus contention: switching speed limited by bus bandwidth

1 Gbps bus, Cisco 1900: sufficient speed for access and enterprise routers (not fast enough for regional or backbone routers)


Switching Via An Interconnection Network

overcome bus bandwidth limitations Banyan networks & other

interconnection nets initially developed to connect processors in multiprocessor

Advanced design: fragmenting datagram into fixed length cells, switch cells through the fabric and reassemble at receiving end

Fastest, most expensive Cisco 12000 series: switches up to 1.28

tera bps through the interconnection network

Horizontal bus—no contention

Vertical bus—contention

. ..

.

.

.

See this

http://en.wikipedia.org/wiki/Cisco_12000

http://en.wikipedia.org/wiki/Cisco_12000

http://www.cisco.com/en/US/products/hw/routers/ps167/prod_models_comparison.html


Cisco 12000 Crossbar switch

http://www.cisco.com/en/US/products/hw/routers/ps167/products_data_sheet0900aecd800f4147.html


Output Ports

Buffering required when datagrams arrive from switching fabric faster than the transmission rate

Scheduling discipline chooses among queued datagrams for transmission—depends upon protocols;

If IP, then First-In-First-OutIf ATM, then depending on the QOS

Constant Bit Rate (CBR)Variable Bit Rate (VBR)Available Bit Rate (ABR)Unspecified Bit Rate (UBR)


Output port queueing

buffering when arrival rate via switch exceeds output line speed queueing (delay) and loss due to output port buffer overflow!—

Assuming Switch operates 3 times the speed of line speed


Input Port Queuing

Fabric slower than input ports combined -> queueing may occur at input queues

Head-of-the-Line (HOL) blocking: queued datagram at front of queue prevents others in queue from moving forward

queueing delay and loss due to input buffer overflow!










TCP/IP Protocol Suite

Defined in T

CP/IP Protocol S

uiteU

nd

efine

d

7

6

5

4

3

2

1

OS

I l

ay

ers

TC

P/I

P l

ay

ers

Ap

plic

atio

n

DomainName

System

DNSDDNS

FileSharing

NFS

BOOTP

DHCP

HostConfigura

-tion

NetworkManage-

ment

RMON

SNMP

FileTransfer

FTP

TFTP

E-Mail &News

MIME

SMTP

POP/IMAP

NNTP

WWW &Gopher

Interactiveaccess

HTTPHTTPS

Telnet

“r” Com-mands(rlogin,rsh, ,,)

Gopher

IRC

Transport TCP(Transmission Control Protocol)

UDP(User Datagram Protocol)

RARP(Reverse Address Resolution Protocol)

ARP(Address Resolution Protocol)

InternetIPv4IPv6

IP NAT

IP Sec

Mobile IP

IP RoutingProtocols

ICMPv4ICMPv6

ND(NeighborDiscovery)

IP supportProtocols

RIP, OSPF, GGP,HELLO, IGRP, EIGRP,

BGP, EGP

NetworkInterface

(Data Link)

SLIP(Serial Line

Interface Protocol)

PPP(Point-to-Point

Protocol)

LAN/WLAN/WANHardware Drivers

(Ethernet, FDDI, Wireless,ATM, Frame Relay, …)

PhysicalMedium

UTP/Coax/Wireless/Infrared/Fiber Optics/Microwave/Satellite


IP

3 InternetIPv4IPv6

IP NAT

IP Sec

Mobile IP

IP RoutingProtocols

ICMPv4ICMPv6


IP supportProtocols


BGP, EGP


The IP (Internet Protocol) RFC 791

routingtable

Network layer functions of hosts & routers :

Routing protocols•path selection•RIP, OSPF, BGP

IP protocol•addressing conventions•datagram format•packet handling conventions

ICMP protocol•error reporting•router “signaling”

Transport layer: TCP, UDP

Link layer

physical layer

Networklayer

ftp://ftp.rfc-editor.org/in-notes/rfc791.txt










IP datagram format: RFC 791

ver length

32 bits

data (variable length,typically a TCP

or UDP segment)

16-bit identifier

header checksum

time tolive

32 bit source IP address

IP protocol versionNumber: IPv4

header length (32 bit words)

max numberremaining hops

(decremented at each router)

Used forfragmentation/reassembly

total datagramlength (bytes)

upper layer protocolto deliver payload to

head.len

type ofservice

“type” of data flgsfragment

offsetProto-

col

32 bit destination IP address

Options (if any) E.g. timestamp,record routetaken, specifylist of routers to visit.

how much overhead with TCP?

20 bytes of TCP 20 bytes of IP = 40 bytes + app

layer overhead

44 8

16

13

8816

For error checking


http://www.ietf.org/rfc/rfc0790.txt



Table from IANA: Assigned Numbers

The Protocol numbers are Service Access Points on IP layer

http://www.iana.org/assignments/protocol-numbers


Service Access Points

ICMP

1

TCP

6

UDP

17

IGMP

2

IP

Protocol numbers

x0800

ARP

x0806Ether Types

23

Telnet

80

http

53

DNS

Port numbers

Ethernet



http://www.iana.org/assignments/ethernet-numbers

http://www.iana.org/assignments/ethernet-numbers

http://www.iana.org/assignments/port-numbers

http://www.iana.org/assignments/port-numbers


MTU (Maximum Transmission Unit)

network links have MTU (Maximum Transmission Unit) - largest possible link-level frame. different link types different MTUs

Given 1500 Byte MTU(Ethernet), what is MSS(Maximum Segment Size) for TCP? 1500 – 20(TCP) -20(IP) = 1460

IP datagram

Maximum Transmission UnitLink layer header Link layer trailer

1500 bytes for Ethernet

Transport layer(TCP/UDP) has MSS—IP layer has MTU(determined by Link layer)

headers


Link layer protocol MTU (bytes)

Internet Path MTU At least 576

Hyperchannel 65,535

Token ring(16Mbps) 17,914

FDDI 4,500

Ethernet 1,500

Dial-up, X.25 576

PPPoE 1,492

ATM 48

802.11 2272

MTU (Maximum Transmission Unit)


IP Fragmentation & Reassembly

large IP datagram divided (“fragmented”) within net By a router one datagram

becomes several datagrams

“reassembled” only at final destination host—why?

IP header bits used to identify, reorder related fragments

fragmentation: in: one large datagramout: 3 smaller datagrams

reassembly


Fragmentation example:

C:\Documents and Settings\Administrator>ping www.google.com -f -l 1400

Pinging www.l.google.com [66.102.7.104] with 1400 bytes of data:

Packet needs to be fragmented but DF set.Packet needs to be fragmented but DF set.Packet needs to be fragmented but DF set.Packet needs to be fragmented but DF set.

Ping statistics for 66.102.7.104: Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

Ping from a Windows workstation

-f == Set Don't Fragment flag

-l == Send buffer size.Screen captured on a

Windows system


ver length

32 bits

data (variable length,typically a TCP

or UDP segment)

16-bit identifier header

checksumtime to

live

32 bit source IP address

head.len

type ofservice

flgsfragment

offsetProto-

col

32 bit destination IP address

Options (if any)

Fields needed for fragmentation

R DF MFFlags—3 bits

R: Reserved—not used

DF: Don’t Fragment

MF: More Fragment

Fragmentation


IP Fragmentation and Reassembly

ID=x

offset=0

moreflag=0

length=4000

ID=x

offset=0

moreflag=1

length=1500

ID=x

offset=185

moreflag=1

length=1500

ID=x

offset=370

moreflag=0

length=1040

One large datagram becomesseveral smaller datagrams

Example 4000 byte

datagram MTU = 1500 bytes

1480 bytes in data field + 20bytesof IP header

offset =1480/8

Fragments are counted in units of 8 octets(bytes).


Internal modules of IP Layer: for routers & hosts

Header-composing module

Reassembly module

Routingmodule

Processing module

Fragmentation module

Reassembly table

Routing table

MTU table

To upper layer protocol

DataData & dest. addr.

To data link layer

IP packet, next hop addr.

From data link layer

IP packet

IP packet

IP packet, next hop addr., interface

IP packet

IP packet IP

From upper layer protocol










IP packet format—IPv4


IP packet format—IPv4—from RFC791


IP Addressing: introduction

IP address: 32-bit identifier for host, router interface

interface: connection between host/router and physical link router’s typically have

multiple interfaces host typically has one

interface IP addresses

associated with each interface

223.1.1.1

223.1.1.2

223.1.1.3

223.1.1.4 223.1.2.9

223.1.2.2

223.1.2.1

223.1.3.2223.1.3.1

223.1.3.27

223.1.1.1 = 11011111 00000001 00000001 00000001

223 1 11


IP Addressing

Classful Addressing

CIDR

VLSM

NAT

1992

Inefficient division into 5 classes—A, B, C, D, EAddress space running out

IPv6—128 bitsIPv4—32bits

Huge address space

More streamlined for efficiency

More Auto-configuration

Accommodates QOS

When?

interim solutions


IP Addressing: Classful or Classless

Older IP addressing (and routing) called “Classful IP addressing” which fixes the size of a block to one of classes (A, B, C, D, E)

Out of 5 classes, only 3 classes are assignable to computers as IP addresses—only 3 sizes to fit all organizations of the world?

More elastic (size-wise) scheme is called “CIDR(Classless Inter Domain Routing)”—IP address blocks can be any size using prefix subnet masks—later

Both used but Classless mostly used by ISP level or above


IP Addressing: Classful Addressing

Byte 1 Byte 2 Byte 3 Byte 4

Class A

Class B

Class C

Class D

Class E

0

1 0

1 1 0

1 1 1 0

1 1 1 1

Multicast address

Reserved for future addressing modes

netid

netid

netid

hostid

hostid

hostid

7

1614

24

21 8


IP Addressing: Dotted Decimal Notation

32 bit IP addresses are often represented in the human-friendly 4 groups of 8 bits and written in decimal number with dots separating them “dotted decimal notation”

1 0 0 0 0 1 0 0

1 1 1 1 0 0 0 1

0 0 1 0 0 0 1 11 0 0 1 1 1 1 0

128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1

132 241 158 35. . .


Private IP addresses

The Internet Assigned Numbers Authority (IANA) under ICANN has reserved the following three blocks of the IP address space for private networks: Read here

10.0.0.0 - 10.255.255.255 172.16.0.0 - 172.31.255.255 192.168.0.0 - 192.168.255.255

Also, IP addresses in the range of 169.254.0.0 -169.254.255.255 are reserved for Automatic Private IP Addressing (Zero Configuration Networking).Used for ad hoc or isolated networks

The above IP addresses should not be used on the Internet. Internet routers will not route packets with those addresses—only usable for private networks or Intranets

http://www.iana.org/

http://www.icann.org/

Seung Bae Im

From Wikipedia, the free encyclopediaZeroconf or Zero Configuration Networking is a set of techniques that automatically create a usable IP network without configuration or special servers. Also known as Automatic Private IP Addressing or APIPA, this allows unknowledgeable users to connect computers, networked printers, and other items together and expect them to work. Without Zeroconf or something similar, a knowledgeable user must either set up special servers, like DHCP and DNS, or set up each computer by hand.Zeroconf was pioneered by Apple Computer employee Stuart Cheshire in the company's move from AppleTalk to IP.


Address ranges

Class Start End Default Subnet Mask

CIDR prefix notation

A 1.0.0.0 127. 255.255.255 255.0.0.0 /8

B 128.0.0.0 191.255. 255.255 255.255.0.0 /16

C 192.0.0.0 223.255.255.255 255.255.255.0 /24

D 224.0.0.0 239.255.255.255 NA NA

E 240.0.0.0 255.255.255.255 NA NA


Special Addresses

Some parts of classes A, B, C are used for special addresses:

-----------------------------------------------------------------------------------------------------------------Special Addresses Netid Hostid Source or Destination-----------------------------------------------------------------------------------------------------------------Network address Specific All 0s None(e.g. 132.241.0.0 is the network address for the CSU, Chico LAN)-----------------------------------------------------------------------------------------------------------------Direct broadcast address Specific All 1s Destination(This is used by a router to send a packet to all hosts in a subnet, e.g. 132.241.255.255)-----------------------------------------------------------------------------------------------------------------Limited broadcast address All 1s All 1s Destination(A broadcast address for a subnet--It is used when a host wants to send a message toall the hosts in the local subnet--routers will not pass this to other subnets)-----------------------------------------------------------------------------------------------------------------This host on this network All 0s All 0s Source(All 0s designate “this host on this network”--used by a host when it does not knowit’s own IP address)-----------------------------------------------------------------------------------------------------------------Specific host on this network All 0s Specific Destination(It is used by a host to send a message to another host on the same subnet-- Routers willnot process this kind)-----------------------------------------------------------------------------------------------------------------Loopback address 127 Any Destination(It is used to test the health of TCP/IP protocol on a host, e.g. ping 127.0.0.1)


The Needs for subnetting

Given a chunk of addresses (e.g. a class B), an organization usually need to sub-divide the address space in a hierarchical fashion

Just as an organization is structured hierarchically, IP addresses are divided as needed

From http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/ip.htm “Subnetting provides the network administrator with several benefits, including extra flexibility, more efficient use of network addresses, and the capability to contain broadcast traffic (a broadcast will not cross a router). Subnets are under control of local administration. As such, the outside world sees an organization as a single network and has no detailed knowledge of the organization's internal structure. “

Also read http://www.support.psi.net/support/common/routers/files/SUBNET-Desc.html

http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/ip.htm

http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/ip.htm

http://www.support.psi.net/support/common/routers/files/SUBNET-Desc.html

http://www.support.psi.net/support/common/routers/files/SUBNET-Desc.html


Without Subnetting?

Analogous to: One person in mailroom delivering all mails of the organization

Without subnetting, The entire network (e.g. 132.241.0.0) is connected as one LAN--All workstations should be connected directly to the router—either directly to the router ports or the LAN is connected only through hubs and switches—this is not feasible except for a very small network—less than 100 computers


Without Subnetting?

.

.

.

. . .

To Internet

This router should be very fast, should have a large number of ports. Also long cables needed-- Not feasible except for very small network—less than 100 computers?!


Subnets

IP address divided: subnetid part (high

order bits) hostid part (low order

bits)

What’s a subnet ? device interfaces with

same subnet part of IP address

Hosts within a subnet can physically reach each other without intervening router

223.1.1.1

223.1.1.2

223.1.1.3

223.1.1.4 223.1.2.9

223.1.2.2

223.1.2.1

223.1.3.2223.1.3.1

223.1.3.27

network consisting of 3 subnets

subnet

subnetid hostid


Subnets223.1.1.0/24

223.1.2.0/24

223.1.3.0/24

Recipe To determine the

subnets, detach each interface from its host or router, creating islands of isolated networks. Each isolated network is called a subnet.

The host are usually connected to a hub or a switch

Subnet mask: /24


Subnets

Decisions to make: How many subnets? How big each

subnet? How many levels of

subnetting?

All depends upon the organizational structure and requirements

223.1.1.1

223.1.1.3

223.1.1.4

223.1.2.2223.1.2.1

223.1.2.6

223.1.3.2223.1.3.1

223.1.3.27

223.1.1.2

223.1.7.0

223.1.7.1223.1.8.0223.1.8.1

223.1.9.1

223.1.9.2


Subnetting After acquiring a block of network addresses, e.g. a Class B

address, divide it according to needs Let’s say, we need 6 large divisions first, then each division

may be divided as needed Each division must be equal sizes

1st division

2nd 3rd


Subnetting: Classful--example

Given class B address of 132.241.0.0, we need to divide the networks into 255 equal-sized subnets—variable sizing covered later (VLSM)

With a class B address, we are given the last 16 bits to play with(divide)

132 . 241 . 0 . 0

1 0 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0

128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1

132 241 0 0. . .netid hostid

Fixed Can be subnetted



132 . 241 . 0 . 0

1 0 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0

128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1

Fixed Can be subnetted. . .

Where should we put the divider?

# of subnets? # of hosts in a subnet?

The decision should be based upon:

# of subnets needed

# of hosts on each subnets

Future needs

Routing protocol (RIPv1, RIPv2, or OSPF)



132 . 241 . 0 . 0

1 0 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0

128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1 128

64

32

16

8 4 2 1

Fixed Can be subnetted. . .

Where should we put the divider?

# of subnets? # of hosts in a subnet?

The division is indicated by “subnet mask”—done by putting 1’s until the division point

For example: Let’s put it after 19th bit

1 0 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 01 1 1 0 0 0 0 0

Then the subnet mask should be

255 255 ? 0224


Subnetting: Classful

1 0 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 01 1 1 0 0 0 0 0


255 255 0224

00241132

netid subnetid hostid

To be precise, we have netid, subnetid, hostid

But the (netid + subnetid) is often called as subnetid

Notation for subnet mask; 2 ways

1) Dotted decimal notation; e.g. 255.255.224.0

2) Prefix notation; e.g. /19 –called slash notation also--19bits are subnet mask

subnetid hostid

subnet mask


Example for classful subnetting

Given a class B address of 132.241.0.0/16 Let’s say we decided to divide using next 8

bits1 0 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 01 1 1 1 1 1 1 1


00241132

subnetid hostid

subnet mask

255 255 255 0

In prefix notation ? 132.241.0.0/24


Example for classful subnetting, cont’d Restrictions: legacy routers following rfc 950, do

not recognize all zero’s and all one’s subnet 132.241.0.0/24 & 132.241.255.0/24 --wasted space

In rfc 1878, all zero’s and all one’s subnets are allowed by defaultCan be turned off by no ip subnet-zero command for CISCO routersRead http://www.cisco.com/en/US/tech/tk648/tk361/technologies_tech_note09186a0080093f18.shtml

In hostid portion, all zero’s and all one’s are not allowed—they are reserved for special purpose—all zero’s represent the subnetid & all one’s represent subnet broadcast address; e.g. 132.241.0.0 is the subnetid for subnet 132.241.0.0 and 132.241.0.255 is the broadcast address for the 132.241.0.0 subnet—also see slide 4.61



http://www.cisco.com/en/US/tech/tk648/tk361/technologies_tech_note09186a0080093f18.shtml

http://www.cisco.com/en/US/tech/tk648/tk361/technologies_tech_note09186a0080093f18.shtml


Example for classful subnetting, cont’dLet’s write down all the subnet addresses and host addresses

Subnet addresses Host addresses

132.241.0.0/24 132.241.0.1 -- 132.241.0.254(note that 132.241.0.0 is subnet’s address and 132.241.0.255 is subnet broadcast address)

132.241.1.0/24 132.241.1.1 -- 132.241.1.254

. . . . . .132.241.254.0/24 132.241.254.1 -- 132.241.254.254

132.241.255.0/24 132.241.255.1 -- 132.241.255.254


Another example for classful subnetting:Let’s subnet a subnet— Mini Lab

A class B subnet 132.241.0.0 was divided using 8 bit(/24) division as previous example

Now, we are assigned one of the subnets; 132.241.158.0—we want to subnet this subnet

Requirements: We need at least 4 subnets and each subnet should accommodate at least 20 hosts


Problems with Classful addressing & subnetting

Only 3 classes useable (A, B, C) Only 3 sizes to satisfy all organizations Address spaces are depleted—not much left

—especially class B (most comfortable fit) Some predictions say the address space will

be exhausted—one predicts in 2008 and the other in 2018—to see current assignments, see http://bgp.potaroo.net/index-ale.html

In classful addressing, the assignment of class C addresses result in a large number of entries in routing table for Internet backbone routers--- http://bgp.potaroo.net/

http://bgp.potaroo.net/index-ale.html

http://bgp.potaroo.net/


IP adddress assignmentFrom: http://bgp.potaroo.net/ipv4-stats/allocated-all.html

http://bgp.potaroo.net/ipv4-stats/allocated-all.html


Routing table size of Internet backbone routers


Solutions for IPv4 address depletion

Short term (interim) solutions: CIDR(Classless InterDomain Routing)—by not

sticking to 3 classes (A,B,C) and their fixed sizes, we can accommodate better fittings to different size organizations – a.b.c.d/x

NAT(Network Address Translation) —Small block of addresses can be timeshared by large number of connections

VLSM(Variable Length Subnet Mask) —Allows intranets use variable sizes for distributing address spaces(rather than the fixed size divisions we saw in the classful subnetting examples)

Long term solution: IPv6 (128 bit IP address)


IP Addressing

Classful Addressing

CIDR

VLSM

NAT

1992

Inefficient division into 5 classes—A, B, C, D, EAddress space running out

IPv6—128 bitsIPv4—32bits

Huge address space

More streamlined for efficiency

More Auto-configuration

Accommodates QOS

When?

interim solutions


CIDR: Motivation

Observation: Many organizations need larger address than one class C(254), but less than 1000 (<< class B)—they need multiple class C addresses but not class B(remember class B space is depleted)

Assigning multiple class C addresses as a block(aggregation) helps to reduce the effects of the explosion of Internet Backbone routers

Therefore, eliminate the restriction of classes!

RFC 1517, 1518, 1519, 1817






IP addressing: CIDR

CIDR: Classless InterDomain Routing Around 1993, CIDR replaced Classful addressing “CIDR is principally a bitwise, prefix-based standard for the

interpretation of IP addresses.”--wikipedia IP address space can have many different sizes—not only 3

sizes! Uses address format: a.b.c.d/x, where x is # bits in subnet

portion of address—prefix notation Class has no meaning!

11001000 00010111 00010000 00000000

subnetpart

hostpart

200.23.16.0/23

subnetpart

hostpart

200.23.18.0/23

11001000 00010111 00010010 00000000

Is 200.23.17.0/23 possible?

http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing


Commonly used CIDR prefixes

CIDR Prefix # of former class C nets

/28 1/16 (of class C)

/27 1/8

/26 1/4

/25 1/2

/24 1 (class C)

/23 2

/22 4

/21 8

/20 16

/19 32

/18 64

/17 128

/16 256 = 1 class B

/15 512

/14 1024

/13 2048

/12 4096


IP addressing: CIDR

CIDR is also called as “supernetting” since we aggregate classful addresses—mostly class B and class C addresses

Example: Let’s say we have the following 4 class C addresses:

• 200.168.4.0 11001000 10101000 00000100 00000000• 200.168.5.0 11001000 10101000 00000101 00000000• 200.168.6.0 11001000 10101000 00000110 00000000• 200.168.7.0 11001000 10101000 00000111 00000000With 255.255.255.0 as subnet mask = prefix ?

Let’s compare with CIDR address of• 200.168.4.0/22

What is the difference? List Class C address blocks given 200.168.8.0/21


IP addresses: how to get one?

Q: How does a host get IP address?

Static addressing: hand-coded by system admin in a file Wintel: control-panel->network->configuration-

>tcp/ip->properties UNIX: /etc/rc.config

Dynamic addressing: DHCP: Dynamic Host Configuration Protocol: dynamically get address from as a DHCP server “plug-and-play”

(more in next chapter)


IP addresses: how to get one? Using CIDR

Q: How does a network get subnet part of IP addr?

A: gets allocated portion of its provider ISP’s address space

ISP's block 11001000 00010111 00010000 00000000 200.23.16.0/20

Organization 0 11001000 00010111 00010000 00000000 200.23.16.0/23 Organization 1 11001000 00010111 00010010 00000000 200.23.18.0/23 Organization 2 11001000 00010111 00010100 00000000 200.23.20.0/23 ... ….. …. ….

Organization 7 11001000 00010111 00011110 00000000 200.23.30.0/23


Hierarchical addressing: route aggregation:using CIDR

“Send me anythingwith addresses beginning 200.23.16.0/20”

200.23.16.0/23

200.23.18.0/23

200.23.30.0/23

Fly-By-Night-ISP

Organization 0

Organization 7Internet

Organization 1

ISPs-R-Us“Send me anythingwith addresses beginning 199.31.0.0/16”

200.23.20.0/23Organization 2

...

...

Hierarchical addressing allows efficient advertisement of routing information:


route aggregation using CIDR:Reduces size of routing tables for Internet Backbone routers

Growth of BGP table 1994 to present

http://bgp.potaroo.net/


Hierarchical addressing: more specific routes

ISPs-R-Us has a more specific route to Organization 1:This may happen when Organization 1 used to subscribe to Fly-By-Night-ISP but now moved to ISPs-R-Us. Organization 1 wants to keep 200.23.18.0/23! – this is one possible scenarioIn our text, you see another possible scenario—page 337

“Send me anythingwith addresses beginning 200.23.16.0/20”

200.23.16.0/23

200.23.18.0/23

200.23.30.0/23

Fly-By-Night-ISP

Organization 0

Organization 7 Internet

Organization 1

ISPs-R-Us“Send me anythingwith addresses beginning 199.31.0.0/16or 200.23.18.0/23”

200.23.20.0/23Organization 2

...

...

199.31.0.0/16


IP addressing: the last word...

Q: How does an ISP get a block of addresses?A: From ICANN: Internet Corporation for Assigned

Names and Numbers allocates addresses manages DNS assigns domain names, resolves disputes

Also see http://www.pch.net/resources/data/WoN/itu-seminar-20040211-1.ppt#480,1,Internet Addressing and the RIR system

http://www.icann.org/

http://www.pch.net/resources/data/WoN/itu-seminar-20040211-1.ppt#Slide%201

http://www.pch.net/resources/data/WoN/itu-seminar-20040211-1.ppt#Slide%201


Solutions to IP address depletion

Short term (interim) solutions: CIDR NAT(Network Address Translation) VLSM(Variable Length Subnet Mask)

Long term solution: IPv6 (128 bits)


NAT: Network Address Translation

10.0.0.1

10.0.0.2

10.0.0.3

10.0.0.4

138.76.29.7

local network(e.g., home network)

10.0.0/24

Intranet Datagrams use10.0.0/24 address

Internet Datagrams aresent to the router

( 10.0.0.4 )

All datagrams leaving localnetwork have same single source

NAT IP address: 138.76.29.7,But different source port

numbers

Internet

NAT table here



Motivation: local network uses just one IP address as far as outside world is concerned: range of addresses not needed from ISP: just

one IP address for all devices can change addresses of devices in local network

without notifying outside world can change ISP without changing addresses of

devices in local network devices inside local net not explicitly

addressable/ visible by outside world (a security plus).

popularly used by home networks connections to Internet



Implementation: NAT router must:

outgoing datagrams: replace (source IP address, port #) of every outgoing datagram to (NAT IP address, new port #). . . remote clients/servers will respond using

(NAT IP address, new port #) as destination addr.

remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair

incoming datagrams: replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table



10.0.0.1

10.0.0.2

10.0.0.3

S: 10.0.0.1, 3345D: 128.119.40.186, 80

1

10.0.0.4

138.76.29.7

1: host 10.0.0.1 sends datagram to 128.119.40.186, 80

NAT translation tableWAN side addr LAN side addr

138.76.29.7, 5001 10.0.0.1, 3345…… ……

S: 128.119.40.186, 80 D: 10.0.0.1, 3345

4

S: 138.76.29.7, 5001D: 128.119.40.186, 80

2

2: NAT routerchanges datagramsource addr from10.0.0.1, 3345 to138.76.29.7, 5001,updates table

S: 128.119.40.186, 80 D: 138.76.29.7, 5001

3

3: Reply arrives dest. address: 138.76.29.7, 5001

4: NAT routerchanges datagramdest addr from138.76.29.7, 5001 to 10.0.0.1, 3345



16-bit port-number field: 60,000 simultaneous connections with a single

LAN-side address! Plenty! NAT is controversial:

routers should only process up to layer 3 violates end-to-end argument

• NAT possibility must be taken into account by app designers, eg, P2P applications

address shortage should instead be solved by IPv6


NAT: Another form

The NAT described so far is also called as NAPT (Network Address Port Translation)

Another form is “Basic NAT” or “Static NAT”—involves only IP address translation--not ports Router is configured with a pool of IP addresses When a computer having private IP address

wants to connect to Internet, router assigns an IP address from the pool until disconnected

Example usage: An ISP with 1000 users and only maximum 20% are on-line at a time. ISP uses NAT with a class C address(254 IP addresses) to serve all users


NAT: Another form

10.0.0.1

10.0.0.2

10.0.0.3

S: 10.0.0.1, 3345D: 128.119.40.186, 80

1

10.0.0.4

138.76.29.7

1: host 10.0.0.1 sends datagram to 128.119.40.186, 80

NAT translation tableWAN side addr LAN side addr

138.76.29.7 10.0.0.1…… ……

S: 128.119.40.186, 80 D: 10.0.0.1, 3345

4

S: 138.76.29.7, 3345D: 128.119.40.186, 80

2

2: NAT routerchanges datagramsource addr from10.0.0.1 to138.76.29.7,updates table

S: 128.119.40.186, 80 D: 138.76.29.7, 3345

3

3: Reply arrives dest. address: 138.76.29.7, 5001

4: NAT routerchanges datagramdest addr from138.76.29.7 to 10.0.0.1Only IP addresses are changed!

Notice that only IP addresses are changed by router


Solutions to IP address depletion

Short term (interim) solutions: CIDR NAT(Network Address Translation) VLSM(Variable Length Subnet Mask)

Long term solution: IPv6 (128 bits)


VLSM(Variable Length Subnet Mask): rfc 1817

Classful subnetting divides a network into equal sizes at a given level=“one size fits all”

With CIDR, the VLSM was introduced—we can divide a network into different sizes at a given level—more flexible & save addresses

1st division

2nd 3rd



VLSM: an example

Uses all 0’s & all 1’s subnets to fully utilze address space

A company is assigned a class C address space and needs the following:

# of subnets subnet size purpose

2 60+ hosts For 2 main offices

4 10+ hosts For 2 branch offices & 2 server farms

The rest 2 hosts PPP (for telecommuting)

1st subnet mask255.255.255.192

/26

X.Y.Z.0/2662 hosts

X.Y.Z.64/2662 hosts

2nd subnet mask255.255.255.240

/28

X.Y.Z.128/2814 hosts

X.Y.Z.144/2814 hosts . .

.

3rd subnet mask255.255.255.252

/30

X.Y.Z.192/302 hosts

X.Y.Z.196/302 hosts

X.Y.Z.252/302 hosts. .

.

X.Y.Z.176/2814 hosts

For another example, visit here

http://www.tcpipguide.com/free/t_IPVariableLengthSubnetMaskingVLSM-3.htm










ICMP: Internet Control Message Protocol

used by hosts & routers to communicate network-level control information error reporting:

unreachable host, network, port, protocol

echo request/reply (used by ping)

network-layer but “above” IP: ICMP msgs carried in IP

datagrams—”horizontal layering”

ICMP message: type, code plus first 8 bytes of IP datagram causing error

Type Code description0 0 echo reply (ping)3 0 dest. network unreachable3 1 dest host unreachable3 2 dest protocol unreachable3 3 dest port unreachable3 6 dest network unknown3 7 dest host unknown4 0 source quench (congestion control - not used)8 0 echo request (ping)9 0 route advertisement10 0 router discovery11 0 TTL expired12 0 bad IP header


ICMP

3 InternetIPv4IPv6

IP NAT

IP Sec

Mobile IP

IP RoutingProtocols

ICMPv4ICMPv6


IP supportProtocols


BGP, EGP

ICMP uses the service of IP to send a control message


Traceroute and ICMP: an example usage of ICMP

Source sends series of UDP segments (in Unix) to dest First has TTL =1 Second has TTL=2, etc. Unlikely port number(33434

and up) in unix implementations

Windows systems use ICMP Echo request not UDP

When nth datagram arrives to nth router: Router discards datagram And sends to source an ICMP

message (type 11, code 0) Message includes name of

router & IP address

When ICMP message arrives, source calculates RTT

Traceroute does this 3 times

Stopping criterion UDP segment eventually

arrives at destination host Destination returns ICMP

“host unreachable” packet (type 3, code 3)

When source gets this ICMP, stops.

http://kb.pert.switch.ch/cgi-bin/twiki/view/PERTKB/VanJacobsonTraceroute

http://kb.pert.switch.ch/cgi-bin/twiki/view/PERTKB/VanJacobsonTraceroute


C:\Documents and Settings\Administrator> tracert www.csuchico.edu

Tracing route to calypso.csuchico.edu [132.241.82.62]over a maximum of 30 hops:

1 6 ms 5 ms 11 ms 208-53-80-5.chico.ca.digitalpath.net [208.53.80.5] 2 7 ms 4 ms 4 ms 198-69-248-1.chico.ca.digitalpath.net [198.69.248.1] 3 9 ms 8 ms 8 ms sl-gw26-stk-5-0-TS9.sprintlink.net [144.232.195.169] 4 8 ms 10 ms 10 ms sl-bb20-stk-8-0.sprintlink.net [144.232.4.114] 5 13 ms 10 ms 9 ms sl-bb20-sj-9-0.sprintlink.net [144.232.20.99] 6 11 ms 9 ms 10 ms sl-bb21-sj-15-0.sprintlink.net [144.232.3.158] 7 10 ms 12 ms 10 ms sl-st20-sj-13-0.sprintlink.net [144.232.9.58] 8 118 ms 11 ms 14 ms so-7-1.car4.SanJose1.Level3.net [209.245.146.245] 9 13 ms 10 ms 12 ms ge-11-0.ipcolo3.SanJose1.Level3.net [4.68.123.43] 10 12 ms 14 ms 10 ms 4.79.44.6 11 14 ms 18 ms 14 ms dc-svl-dc1--isp-1-ge.cenic.net [137.164.22.58] 12 21 ms 22 ms 14 ms dc-oak-dc1--svl-dc1-10ge.cenic.net [137.164.22.31] 13 23 ms 22 ms 24 ms dc-csac-dc1--oak-dc1-ge.cenic.net [137.164.22.111] 14 26 ms 24 ms 24 ms dc-cor-dc1--sac-dc1-ge.cenic.net [137.164.22.153] 15 31 ms 26 ms 25 ms dc-cor-dc2--cor-dc1-df-iconn-1.cenic.net [137.164.22.199] 16 29 ms 27 ms 26 ms dc-csuchico-egm--cor-dc2.cenic.net [137.164.41.26] 17 32 ms 29 ms 35 ms chi-mocha-ge0-0-132.net.CSUChico.EDU [132.241.95.74] 18 40 ms 46 ms 46 ms calypso.CSUChico.EDU [132.241.82.62]

Trace complete.

Result of “tracert www.csuchico.edu” on a Windows host

Round trip time of 3 probes










IPv6 (previously known as IPng (IP next generation))RFC 2460

Initial motivation: 32-bit address space soon to be completely allocated.

Additional motivation: header format helps speed

processing/forwarding header changes to facilitate QoS IPv6 datagram format: fixed-length 40 byte header no fragmentation allowed


Seung Bae Im

From http://news.com.com/2009-1082_3-233721.html NEWS.COM: If you could rebuild the Net from scratch, what would you do differently? Vinton Cerf: One thing I'd definitely do is pick a bigger address space than 32 bits. I have a rationale for having ended up with that number, but looking at it in 1997, it doesn't hold too much water. I can only be forgiven that this was a decision made 20 years ago when I didn't have adequate foresight to realize what an explosion [the Internet] would be.


IPv6 Header (Cont)

Priority: identify priority among datagrams in flowFlow Label: identify datagrams in same “flow.” (concept of“flow” not well defined).Next header: identify upper layer protocol for data

With 128 bits, you can assign over 3.7x10**21 addresses per square inch of the earth's surface.


Other Changes from IPv4

Checksum: removed entirely to reduce processing time at each hop

Options: allowed, but outside of header, indicated by “Next Header” field

ICMPv6: new version of ICMP additional message types, e.g. “Packet Too

Big” multicast group management functions


Differences Between IPv4 and IPv6

Category IPv4 IPv6

Address length 32 bits 128 bits

Header size 20-60 bytes 40 bytes—fixed size

IPSec support Optional Required

QoS support Limited Better

Fragmentation Done by hosts and routers Done by hosts only

Is a header checksum present? Yes No

Does the header include options? Yes No

Link-layer address resolution Broadcast ARP frames Multicast Neighbor Solicitation

messages

Error reporting and diagnostic protocol ICMP (for IPv4) ICMPv6

Multicast group membership protocol IGMP MLD

Router discovery support Optional Required

Network layer broadcast addresses? Yes No

Host configuration DHCP or manual Automatic, DHCP, or manual

DNS record type for name resolution A record AAAA record

DNS record type and location for reverse name resolution

PTR records in IN-ADDR.ARPA domain

PTR records in IP6.INT domain

http://en.wikipedia.org/wiki/IPv6

http://en.wikipedia.org/wiki/IPSec

http://en.wikipedia.org/wiki/Qos

http://en.wikipedia.org/wiki/ICMPv6

http://en.wikipedia.org/wiki/IGMP

http://en.wikipedia.org/wiki/Multicast_Listener_Discovery

Seung Bae Im

From Wikipedai---IPsec (IP security) is a standardized framework for securing Internet Protocol (IP) communications by encrypting and/or authenticating each IP packet in a data stream.


Transition From IPv4 To IPv6

Not all routers can be upgraded simultaneous no “flag days” feasible How will the network operate with mixed IPv4

and IPv6 routers? Two main approaches:

Dual Stack: A host or router implements both IPv4 and IPv6.

Tunneling: IPv6 carried as payload in IPv4 datagram among IPv4 routers


Tunneling

A B E F

IPv6 IPv6 IPv6 IPv6

tunnelLogical view:

Physical view:A B E F

IPv6 IPv6 IPv6 IPv6IPv4 IPv4


Tunneling

A B E F

IPv6 IPv6 IPv6 IPv6

tunnelLogical view:

Physical view:A B E F

IPv6 IPv6 IPv6 IPv6

C D

IPv4 IPv4

Flow: XSrc: ADest: F

data


data


data

Src:BDest: E


data

Src:BDest: E

A-to-B:IPv6

E-to-F:IPv6

B-to-C:IPv6 inside

IPv4

B-to-C:IPv6 inside

IPv4


Tunneling: another view

From http://www.cisco.com/univercd/cc/td/doc/product/software/ios123/123cgcr/ipv6_c/sa_tunv6.htm

http://www.cisco.com/univercd/cc/td/doc/product/software/ios123/123cgcr/ipv6_c/sa_tunv6.htm

http://www.cisco.com/univercd/cc/td/doc/product/software/ios123/123cgcr/ipv6_c/sa_tunv6.htm


IPv6 Deployment?

To see the current deployment, visit

http://bgp.potaroo.net/index-

v6.html

Not likely to happen in the foreseeable future










1

23

0111

value in arrivingpacket’s header

routing algorithm

local forwarding tableheader value output link

0100010101111001

3221

Interplay between routing, forwarding

Difference?


u

yx

wv

z2

2

13

1

1

2

53

5

Graph: G = (N,E)

N = set of routers = { u, v, w, x, y, z }

E = set of links ={ (u,v), (u,x), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }

Graph abstraction—routing can be analyzed as a graph problem

Remark: Graph abstraction is useful in other network contexts

Example: P2P, where N is set of peers and E is set of TCP connections

nodes edges


Graph abstraction: costs

u

yx

wv

z2

2

13

1

1

2

53

5

• c(x,x’) = cost of link (x,x’) - e.g., c(w,z) = 5• cost could always be set to 1(all links has same cost—hop count), or inversely related to bandwidth,or inversely related to Congestion, or …Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)

Question: What’s the least-cost path between u and z ?

Routing algorithm: algorithm that finds least-cost path


Routing algorithm design

u

yx

wv

z

2

2

13

1

1

2

5

3

5

Assuming that we can decide the cost of the links

How would routers learn about the weights of the links other than the directly connected links?

How often routers advertise the weights?

The scope of advertisement?


Routing Algorithm classification

Global or decentralized information?

Global: all routers have complete

topology, link cost info “link state” algorithmsDecentralized: router knows physically-

connected neighbors, link costs to neighbors

iterative process of computation, exchange of info with neighbors

“distance vector” algorithms

Static or dynamic?Static: routes change slowly

over timeDynamic: routes change more

quickly periodic update in response to link

cost changes


IGP(Intra AS) vs EGP(Inter AS)

A group of networks and routers under the authority of a single administration

AS: Autonomous System

To see AS numbers, visit here


AS numbers from http://bgp.potaroo.net/cidr/autnums.html

AS3895 AMEDD-EUR - DoD Network Information Center AS3896 AMEDD-EUR - DoD Network Information Center AS3897 AMEDD-EUR - DoD Network Information Center AS3898 UCSF-HISD - University of Calif. S.F. - Hospital Info Sys AS3899 CHICO-NET - California State University, Chico AS3900 TEXASNET-ASN - Yokubaitis Holding Corporation AS3901 ARRAKIS - Higher Technology Services AS3902 GLAXOCA-1 - Glaxo Canada Inc. AS3903 NAG-AS - Network Ananlysis Group AS3904 ASTHOUGHTPRT - ThoughtPort inc.

http://bgp.potaroo.net/cgi-bin/as-report?as=AS3895&view=(null)





Popular Routing Algorithms

Distance vector

Link state Path vector










Routing Algorithms

Interior Routing(within one AS) vs Exterior Routing (between AS's)

Current Routing AlgorithmsTwo common Routing Algorithms(Both are adaptive(dynamic) algorithms)

(a) "Link State Routing"-Each router shares its knowledge about its links to the neighbors(link state) with every routers in the area.-Example: OSPF***LINK STATES to EVERYBODY*** (b) "Distance Vector Routing"--Each router shares its knowledge(distance to destination networks) about the entire network(in an AS) with its neighbors-Example: RIPv2***EVERYTHING to NEIGHBORS ONLY***


A Link-State Routing Algorithm

Dijkstra’s algorithm net topology, link costs

known to all nodes accomplished via “link

state broadcast” all nodes have same info

computes least cost paths from one node (‘source”) to all other nodes gives forwarding table

for that node iterative: after k iterations,

know least cost path to k dest.’s

Notation: c(x,y): link cost from

node x to y; = ∞ if not direct neighbors

D(v): current value of cost of path from source to dest. v

p(v): predecessor node along path from source to v

N': set of nodes whose least cost path definitively known


Dijsktra’s Algorithm

1 Initialization: 2 N' = {u} 3 for all nodes v 4 if v adjacent to u 5 then D(v) = c(u,v) 6 else D(v) = ∞ 7 8 Loop 9 find w not in N' such that D(w) is a minimum 10 add w to N' 11 update D(v) for all v adjacent to w and not in N' : 12 D(v) = min( D(v), D(w) + c(w,v) ) 13 /* new cost to v is either old cost to v or known 14 shortest path cost to w plus cost from w to v */ 15 until all nodes in N'


Dijkstra’s algorithm: example

Step012345

N'u

uxuxy

uxyvuxyvw

uxyvwz

D(v),p(v)2,u2,u2,u

D(w),p(w)5,u4,x3,y3,y

D(x),p(x)1,u

D(y),p(y)∞

2,x

D(z),p(z)∞ ∞

4,y4,y4,y

u

yx

wv

z2

2

13

1

1

2

53

5


Dijkstra’s algorithm: example (2)

u

yx

wv

z

Resulting shortest-path tree from u:

vx

y

w

z

(u,v)(u,x)

(u,x)

(u,x)

(u,x)

destination link

Resulting forwarding table in u:


Dijkstra’s algorithm, discussion

Algorithm complexity: n nodes each iteration: need to check all nodes, w, not in N n(n+1)/2 comparisons: O(n2) more efficient implementations possible: O(nlogn)

Oscillations possible: e.g., link cost = amount of carried traffic

A

D

C

B1 1+e

e0

e

1 1

0 0

A

D

C

B2+e 0

001+e1

A

D

C

B0 2+e

1+e10 0

A

D

C

B2+e 0

e01+e1

initially… recompute

routing… recompute … recompute










Link State

Link State Routing--Example: OSPF(Open Shortest Path First) http://www.cisco.com/univercd/cc/td/doc/cisi twk/ito_doc/ospf.htm

OSPF is an IGP(Interior Gateway Protocol) developed by IETF Currently Version 2--RFC 1583--also has a version for IPv6— RFC 2740Open(spec. on public domain)OSPF has features not in RIP and is suitable for a large networkBased on Dijkstra's algorithmOSPF is an intra-AS protocol

http://www.cisco.com/univercd/cc/td/doc/cisi%20twk/ito_doc/ospf.htm





http://www.rfc-editor.org/rfc/rfc1583.txt



OSPF

Autonomous System: A group of networks and routers under the authority of a single administration


-Autonomous System(A group of networks and routers under the authority of a single administration) is divided into Areas(A collection of networks, hosts, and routers all contained within an autonomous system). An AS is made up of several Areas.-3 kinds of routers in OSPF--(1) Intra-Area routers (2) Area Border routers (3) AS Boundary routers-Routers inside an Area flood the Area with routing information--information is exchanged to all routers in an Area by Flooding(actually multicasting)--all routers in an area maintain the same topology database-This area concept limits the size of the topology database that must be held by all routers in the area-At the border of an area, special routers called Area border routers collect and summarize the information about each area it is attached to and send the summary to other areas.

OSPF


-One of the areas into which all areas have a connection is called Backbone(all OSPF networks must contain at least one area, the Backbone) which is assigned an area identifier of 0.0.0.0(meaning Area 0--not an IP address)-Routers inside a backbone are called Backbone routers--Backbone routers operate identically to other Intra-Area routers and maintain full topology databases for the backbone area-Autonomous Systems are connected via AS Boundary Routers--they exchange reachability information with routers in other ASs using an Exterior Gateway Protocol.-OSPF is "Link State Routing"-OSPF protocol allows the administrator to assign a cost(metric) to each route:

OSPF


OSPF Cost--from http://www.cisco.com/warp/public/104/2.html#1.0 "The cost (also called metric) of an interface in OSPF is an indication of the overhead required to send packets across a certain interface. The cost of an interface is inversely proportional to the bandwidth of that interface. A higher bandwidth indicates a lower cost. There is more overhead (higher cost) and time delays involved in crossing a 56k serial line than crossing a 10M ethernet line. The formula used to calculate the cost is:cost= 10000 0000/bandwith in bps For example, it will cost 10 EXP8/10 EXP7 = 10 to cross a 10Mbps Ethernet line and will cost 10 EXP8/1544000 = 64 to cross a T1 line.By default, the cost of an interface is calculated based on the bandwidth; you can force the cost of an interface by using the ip ospf cost <value> interface sub-command."-The cost can also be based on a type of service(minimum delay, maximum throughput, …)

OSPF

http://www.cisco.com/warp/public/104/2.html#1.0


OSPF Cost calculation


Distance vector

Link state Path vector


Distance Vector: RIP

RIP(Rourting Information Protocol)- An IGP(Interior Gateway Protocol) developed by XEROX(XNS-Xerox Network Systems)- RIP widely adopted in Internet with 1982 4BSD UNIX even before the standard RFC 1058(1988)-later revised with RFC1388(RIP V.2)- Popular in small networks- Also popular for PC networking- AppleTalk's routing = version of RIP- Novell, 3COM, Banyan- RIP is a Distance Vector Routing---Hop Count is used as the metric--How many hops(each router is a hop) to the destination network?


Operation- Each router initializes with a distance vector table containing zero(0) for itself, one(1) for directly attached networks, and infinity for every other destination.- Each router periodically (typically every 30 seconds) transmits its distance vector table to each of its neighbors. A router sends it's knowledge about the entire Autonomous system(=it's entire routing table(Netid, Hop Count entries)) to its neighboring routers( only to its neighbors). It can also transmit the table when a link first comes up or when the table changes.- Each router saves the most recent table it receives from each neighbor and uses the information to calculate its own distance vector table. (=Each router maintains the distance from itself to every known destination network in a "distance vector table")- Hop count is used as the metric--distance=how many hops

RIP


RIP routing table


RIP Updating AlgorithmReceive: A RIP message from a neighbor1. Add one hop to the hop count for each advertised destination2. Repeat the following steps for each advertised destination: 1. If(destination not in the routing table), then Add the advertised information to the table 2. Else 1. If(next-hop field is the same as the sender's address), then Replace entry in the table with the advertised one 2. Else 1. If(advertised hop count smaller than one in the table), then Add it to the routing table 2. Else do nothing3. Return

C

F

B

AE


Distance Vector Algorithm

Bellman-Ford Equation (dynamic programming)

Definedx(y) := cost of least-cost path from x to y

Then

dx(y) = min {c(x,v) + dv(y) }

where min is taken over all neighbors v of x

v


Bellman-Ford example

u

yx

wv

z2

2

13

1

1

2

53

5Clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3

du(z) = min { c(u,v) + dv(z), c(u,x) + dx(z), c(u,w) + dw(z) } = min {2 + 5, 1 + 3, 5 + 3} = 4

Node that achieves minimum is nexthop in shortest path ➜ forwarding table

B-F equation says:


Distance Vector Algorithm

Dx(y) = estimate of least cost from x to y Node x knows cost to each neighbor v: c(x,v) Node x maintains distance vector Dx =

[Dx(y): y є N ] Node x also maintains its neighbors’

distance vectors For each neighbor v, x maintains

Dv = [Dv(y): y є N ]


Distance vector algorithm (4)

Basic idea: Each node periodically sends its own

distance vector estimate to neighbors When a node x receives new DV estimate

from neighbor, it updates its own DV using B-F equation:

Dx(y) ← minv{c(x,v) + Dv(y)} for each node y ∊ N

Under minor, natural conditions, the estimate Dx(y) converge to the actual least cost dx(y)


Distance Vector Algorithm (5)

Iterative, asynchronous: each local iteration caused by:

local link cost change DV update message from

neighbor

Distributed: each node notifies

neighbors only when its DV changes neighbors then notify

their neighbors if necessary

wait for (change in local link cost or msg from neighbor)

recompute estimates

if DV to any dest has

changed, notify neighbors

Each node:


x y z

xyz

0 2 7

∞ ∞ ∞∞ ∞ ∞

from

cost to

from

from

x y z

xyz

0

from

cost to

x y z

xyz

∞ ∞

∞ ∞ ∞

cost to

x y z

xyz

∞ ∞ ∞7 1 0

cost to

∞2 0 1

∞ ∞ ∞

2 0 17 1 0

time

x z12

7

y

node x table

node y table

node z table

Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2+0 , 7+1} = 2

Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)} = min{2+1 , 7+0} = 3

32


x y z

xyz

0 2 7

∞ ∞ ∞∞ ∞ ∞

from

cost to

from

from

x y z

xyz

0 2 3

from

cost tox y z

xyz

0 2 3

from

cost to

x y z

xyz

∞ ∞

∞ ∞ ∞

cost tox y z

xyz

0 2 7

from

cost to

x y z

xyz

0 2 3

from

cost to

x y z

xyz

0 2 3

from

cost tox y z

xyz

0 2 7

from

cost to

x y z

xyz

∞ ∞ ∞7 1 0

cost to

∞2 0 1

∞ ∞ ∞

2 0 17 1 0

2 0 17 1 0

2 0 13 1 0

2 0 13 1 0

2 0 1

3 1 0

2 0 1

3 1 0

time

x z12

7

y

node x table

node y table

node z table

Dx(y) = min{c(x,y) + Dy(y), c(x,z) + Dz(y)} = min{2+0 , 7+1} = 2

Dx(z) = min{c(x,y) + Dy(z), c(x,z) + Dz(z)} = min{2+1 , 7+0} = 3


Advantages of RIP-----Simplicity=low overhead

Disadvantages of RIP:(a) The limit to the size of a network imposed by maximum hop

count(15).

(b) Slow Convergence problemChanges in routes propagates slowly.For example, suppose there is a change in local LAN to which a router is connected. The router updates itself immediately. Then it will advertise the change in 30 seconds, so the next neighboring routers will know, then they will advertise within 30 seconds, ……Assuming a remote router is n hops away, then it could take 30*n seconds. This is why RIP limits the maximum hop count to 15. RIP is designed for small networks

(c) Instability problem(Also called "count to infinity problem")Counting to infinity occurs when a network becomes unreachable, but erroneous routes to that network persist because of the time for the distance vector tables to converge.


Distance Vector: link cost changes

To solve the Slow Convergence problem, whe link cost changes notify neighbors ASAP:

node detects local link cost change updates routing info, recalculates distance

vector if DV changes, notify neighbors immediately

“goodnews travelsfast”

x z14

50

y1

At time t0, y detects the link-cost change, updates its DV, and informs its neighbors.

At time t1, z receives the update from y and updates its table. It computes a new least cost to x and sends its neighbors its DV.

At time t2, y receives z’s update and updates its distance table. y’s least costs do not change and hence y does not send any message to z.

But this does not solve Instability problem


Instability problem for RIP(Also called as "count to infinity problem")

Net1 goes down

When B sends its table, A thinks that B can deliver to Net1

When A sends its table, B updates the hop count( thru A ! )

1

2

3

B finally learns that Net1 is down hop count 16 means the net is down


Comparison of LS and DV algorithms

Message complexity LS: with n nodes, E links,

O(nE) msgs sent DV: exchange between

neighbors only convergence time varies

Speed of Convergence LS: O(n2) algorithm requires

O(nE) msgs may have oscillations

DV: convergence time varies may be routing loops count-to-infinity problem

Robustness: what happens if router malfunctions?

LS: node can advertise

incorrect link cost each node computes only

its own table

DV: DV node can advertise

incorrect path cost each node’s table used by

others • error propagate thru

network










Hierarchical Routing

scale: with 200 million destinations:

can’t store all dest’s in routing tables!

routing table exchange would swamp links!

administrative autonomy internet = network of

networks each network admin may

want to control routing in its own network

Our routing study thus far - idealization all routers identical network “flat”… not true in practice


Hierarchical Routing

aggregate routers into regions, “autonomous systems” (AS)

routers in same AS run same routing protocol “intra-AS” routing

protocol routers in different AS

can run different intra-AS routing protocol

Gateway router Direct link to router in

another AS


3b

1d

3a

1c2aAS3

AS1

AS21a

2c2b

1b

Intra-ASRouting algorithm

Inter-ASRouting algorithm

Forwardingtable

3c

Interconnected ASes

Forwarding table is configured by both intra- and inter-AS routing algorithm Intra-AS sets entries

for internal dests Inter-AS & Intra-AS

sets entries for external dests


AS numbers from http://bgp.potaroo.net/cidr/autnums.html

AS3895 AMEDD-EUR - DoD Network Information Center AS3896 AMEDD-EUR - DoD Network Information Center AS3897 AMEDD-EUR - DoD Network Information Center AS3898 UCSF-HISD - University of Calif. S.F. - Hospital Info Sys AS3899 CHICO-NET - California State University, Chico AS3900 TEXASNET-ASN - Yokubaitis Holding Corporation AS3901 ARRAKIS - Higher Technology Services AS3902 GLAXOCA-1 - Glaxo Canada Inc. AS3903 NAG-AS - Network Ananlysis Group AS3904 ASTHOUGHTPRT - ThoughtPort inc.


3b

1d

3a

1c2aAS3

AS1

AS21a

2c2b

1b

3c

Inter-AS tasks

Suppose a router in AS1 receives datagram for which dest is outside of AS1 Router should forward

packet towards one of the gateway routers, but which one?

AS1 needs:1. to learn which dests

are reachable through AS2 and which through AS3

2. to propagate this reachability info to all routers in AS1

Job of inter-AS routing!Gateway routers


Example: Setting forwarding table in router 1d

Suppose AS1 learns (via inter-AS protocol) that subnet x is reachable via AS3 (gateway 1c) but not via AS2.

Inter-AS protocol propagates reachability info to all internal routers.

Router 1d determines from intra-AS routing info that its interface I is on the least cost path to 1c.

Puts in forwarding table entry (x,I).

3b

1d

3a

1c2aAS3

AS1AS2

1a

2c2b

1b

3c


Example: Choosing among multiple ASes

Now suppose AS1 learns from the inter-AS protocol that subnet x is reachable via AS3 and via AS2.

To configure forwarding table, router 1d must determine towards which gateway it should forward packets for dest x.

This is also the job on inter-AS routing protocol!

3b

1d

3a

1c2aAS3

AS1

AS21a

2c2b

1b

3c


Learn from inter-AS protocol that subnet x is reachable via multiple gateways

Use routing infofrom intra-AS

protocol to determine

costs of least-cost paths to each

of the gateways

Hot potato routing:Choose the

gatewaythat has the

smallest least cost

Determine fromforwarding table the interface I that leads

to least-cost gateway. Enter (x,I) in

forwarding table

Example: Choosing among multiple ASes

Now suppose AS1 learns from the inter-AS protocol that subnet x is reachable from AS3 and from AS2.

To configure forwarding table, router 1d must determine towards which gateway it should forward packets for dest x.

This is also the job on inter-AS routing protocol! Hot potato routing: send packet towards closest

of two routers.










Intra-AS Routing

Also known as Interior Gateway Protocols (IGP) Most common Intra-AS routing protocols:

RIP: Routing Information Protocol

OSPF: Open Shortest Path First

IGRP: Interior Gateway Routing Protocol (Cisco proprietary—modified version of RIP)










RIP ( Routing Information Protocol)

Distance vector algorithm Included in BSD-UNIX Distribution in 1982 Distance metric: # of hops (max = 15 hops)

DC

BA

u v

w

x

yz

destination hops u 1 v 2 w 2 x 3 y 3 z 2

From router A to subsets:


RIP advertisements

Distance vectors: exchanged among neighbors every 30 sec via Response Message (also called advertisement)

Each advertisement: list of up to 25 destination nets within AS


RIP: Example

Destination Network Next Router Num. of hops to dest. w A 2

y B 2 z B 7

x -- 1…. …. ....

w x y

z

A

C

D B

Routing table in D


RIP: Example

Destination Network Next Router Num. of hops to dest. w A 2

y B 2 z B A 7 5

x -- 1…. …. ....Routing table in D

w x y

z

A

C

D B

Dest Next hops w - 1 x - 1 z C 4 …. … ...

Advertisementfrom A to D


RIP: Link Failure and Recovery

If no advertisement heard after 180 sec --> neighbor/link declared dead routes via neighbor invalidated new advertisements sent to neighbors neighbors in turn send out new advertisements

(if tables changed) link failure info quickly (?) propagates to entire

net poison reverse used to prevent ping-pong

loops (infinite distance = 16 hops)


RIP Table processing

RIP routing tables managed by application-level process called route-d (daemon)

advertisements sent in UDP packets, periodically repeated

physical

link

network forwarding (IP) table

Transprt (UDP)

routed

physical

link

network (IP)

Transprt (UDP)

routed

forwardingtable










OSPF (Open Shortest Path First)

“open”: publicly available Uses Link State algorithm

LS packet dissemination Topology map at each node Route computation using Dijkstra’s algorithm

OSPF advertisement carries one entry for each connection to neighbor router

Advertisements disseminated to entire AS (via flooding) Carried in OSPF messages directly over IP (rather than

TCP or UDP) via IP protocol number 89.


OSPF “advanced” features (not in RIP)

Security: all OSPF messages authenticated (to prevent malicious intrusion) using MD5 (Message-

Digest algorithm 5) Multiple same-cost paths allowed (only one path

in RIP) For each link, multiple cost metrics for different

TOS (e.g., satellite link cost set “low” for best effort; high for real time)

Integrated uni- and multicast support: Multicast OSPF (MOSPF) uses same topology

data base as OSPF Hierarchical OSPF in large domains.


Hierarchical OSPF


Hierarchical OSPF

Two-level hierarchy: local area, backbone. Link-state advertisements only in area each nodes has detailed area topology; only know

direction (shortest path) to nets in other areas. Area border routers: “summarize” distances to

nets in own area, advertise to other Area Border routers.

Backbone routers: run OSPF routing limited to backbone.

Boundary routers: connect to other AS’s.










Internet inter-AS routing: BGP

BGP (Border Gateway Protocol): the de facto (Latin: “In practice) standard

BGP provides each AS a means to:1. Obtain subnet reachability information from

neighboring ASs.2. Propagate reachability information to all AS-

internal routers.3. Determine “good” routes to subnets based on

reachability information and policy. allows subnet to advertise its existence to

rest of Internet: “I am here”


BGP basics Pairs of routers (BGP peers) exchange routing info

over semi-permanent TCP connections: BGP sessions BGP sessions need not correspond to physical links.

When AS2 advertises a prefix to AS1, AS2 is promising it will forward any datagrams destined to that prefix towards the prefix. AS2 can aggregate prefixes in its advertisement

3b

1d

3a

1c2aAS3

AS1

AS21a

2c

2b

1b

3c

eBGP session

iBGP session


Distributing reachability info With eBGP session between 3a and 1c, AS3 sends prefix

reachability info to AS1. 1c can then use iBGP to distribute this new prefix

reachability info to all routers in AS1 1b can then re-advertise new reachability info to AS2

over 1b-to-2a eBGP session When router learns of new prefix, creates entry for prefix

in its forwarding table.

3b

1d

3a

1c2aAS3

AS1

AS21a

2c

2b

1b

3c

eBGP session

iBGP session


Path attributes & BGP routes

When advertising a prefix, includes BGP attributes. prefix + attributes = “route”

Two important attributes: AS-PATH: contains ASs through which prefix

advertisement has passed: AS 67 AS 17 –- “path vector” algorithm

NEXT-HOP: Indicates specific internal-AS router to next-hop AS. (There may be multiple links from current AS to next-hop-AS.)

When gateway router receives route advertisement, uses import policy to accept/decline.


BGP route selection

Router may learn about more than 1 route to some prefix. Router must select route.

Elimination rules:1. Local preference value attribute: policy decision2. Shortest AS-PATH 3. Closest NEXT-HOP router: hot potato routing4. Additional criteria


BGP messages

BGP messages exchanged using TCP. BGP messages:

OPEN: opens TCP connection to peer and authenticates sender

UPDATE: advertises new path (or withdraws old)

KEEPALIVE keeps connection alive in absence of UPDATES; also ACKs OPEN request

NOTIFICATION: reports errors in previous msg; also used to close connection


BGP routing policy

Figure 4.5-BGPnew: a simple BGP scenario

A

B

C

W X

Y

legend:

customer network:

provider network

A,B,C are provider networks X,W,Y are customer (of provider networks) X is dual-homed: attached to two networks

X does not want to route from B via X to C .. so X will not advertise to B a route to C


BGP routing policy (2)

Figure 4.5-BGPnew: a simple BGP scenario

A

B

C

W X

Y

legend:

customer network:

provider network

A advertises to B the path AW B advertises to X the path BAW Should B advertise to C the path BAW?

No way! B gets no “revenue” for routing CBAW since neither W nor C are B’s customers

B wants to force C to route to w via A B wants to route only to/from its customers!


Why different Intra- and Inter-AS routing ?

Policy: Inter-AS: admin wants control over how its traffic

routed, who routes through its net. Intra-AS: single admin, so no policy decisions

needed

Scale: hierarchical routing saves table size, reduced

update trafficPerformance: Intra-AS: can focus on performance Inter-AS: policy may dominate over performance










R1

R2

R3 R4

sourceduplication

R1

R2

R3 R4

in-networkduplication

duplicatecreation/transmissionduplicate

duplicate

Broadcast Routing

Deliver packets from source to all other nodes Source duplication is inefficient:

Source duplication: how does source determine recipient addresses?


In-network duplication

Flooding: when node receives brdcst pckt, sends copy to all neighbors Problems: cycles & broadcast storm

Controlled flooding: node only brdcsts pkt if it hasn’t brdcst same packet before Node keeps track of pckt ids already brdcsted Or reverse path forwarding (RPF): only forward

pckt if it arrived on shortest path between node and source

Spanning tree No redundant packets received by any node


A

B

G

DE

c

F

A

B

G

DE

c

F

(a) Broadcast initiated at A (b) Broadcast initiated at D

Spanning Tree

First construct a spanning tree Nodes forward copies only along spanning

tree


A

B

G

DE

c

F1

2

3

4

5

(a) Stepwise construction of spanning tree

A

B

G

DE

c

F

(b) Constructed spanning tree

Spanning Tree: Creation—one of many algorithms

Center node is predefined—node E in this example Each node sends unicast join message to center

node Message forwarded until it arrives at a node already

belonging to spanning tree Once the spanning tree is made, any node can broadcast


Modes of message transfer

anycast

broadcast

multicast

unicast

Added in IPv6


Multicasting on Internet

IP Multicast Internet Relay Chat NNTP Network News Transfer Protocol PSYC Protocol for SYnchronous Conferencing WWCP World Wide Conferencing Network XCAST Explicit Multi-Unicast Streaming mediaInteractive games

Send to group members only


Two main problems in multicasting

How to identify group members?What address should be used?Class Start End Default Subnet

MaskCIDR prefix notation

A 1.0.0.0 127. 255.255.255 255.0.0.0 /8

B 128.0.0.0 191.255. 255.255 255.255.0.0 /16

C 192.0.0.0 223.255.255.255 255.255.255.0 /24

D 224.0.0.0 239.255.255.255 NA NA

E 240.0.0.0 255.255.255.255 NA NA


mcast


Multicast Routing: sending to a Group

Goal: find a tree (or trees) connecting routers having local mcast group members (Tree is ideal for this: not all paths between routers used)

Two approaches: source-based: different tree from each sender to rcvrs shared-tree: same tree used by all group members

Shared tree Source-based trees


Approaches for building mcast trees

Approaches: source-based tree: one tree per source—

same as spanning-tree algorithm shortest path trees reverse path forwarding

group-shared tree: group uses one tree minimal spanning (Steiner) center-based trees

…we first look at basic approaches, then specific protocols adopting these approaches


Shortest Path Tree

mcast forwarding tree: tree of shortest path routes from source to all receivers Dijkstra’s algorithm

R1

R2

R3

R4

R5

R6 R7

21

6

3 4

5

i

router with attachedgroup member

router with no attachedgroup member

link used for forwarding,i indicates order linkadded by algorithm

LEGENDS: source


Reverse Path Forwarding

if (mcast datagram received on incoming link on shortest path back to center)

then flood datagram onto all outgoing links else ignore datagram

rely on router’s knowledge of unicast shortest path from it to sender

each router has simple forwarding behavior:


Reverse Path Forwarding: example

• result is a source-specific reverse SPT– may be a bad choice with asymmetric links

R1

R2

R3

R4

R5

R6 R7



datagram will be forwarded

LEGENDS: source

datagram will not be forwarded


Reverse Path Forwarding: pruning

forwarding tree contains subtrees with no mcast group members no need to forward datagrams down subtree “prune” msgs sent upstream by router with no

downstream group members

R1

R2

R3

R4

R5

R6 R7



prune message

LEGENDS: source

links with multicastforwarding

P

P

P


Shared-Tree: Steiner Tree

Steiner Tree: minimum cost tree connecting all routers with attached group members

problem is NP-complete("non-deterministic polynomial

time") excellent heuristics exists not used in practice:

computational complexity information about entire network needed monolithic: rerun whenever a router needs to

join/leave


Center-based trees

single delivery tree shared by all one router identified as “center” of tree to join:

edge router sends unicast join-msg addressed to center router

join-msg “processed” by intermediate routers and forwarded towards center

join-msg either hits existing tree branch for this center, or arrives at center

path taken by join-msg becomes new branch of tree for this router


Center-based trees: an example

Suppose R6 chosen as center:

R1

R2

R3

R4

R5

R6 R7



path order in which join messages generated

LEGEND

21

3

1


Internet Multicasting Routing: DVMRP

DVMRP: Distance Vector Multicast Routing Protocol, RFC1075

flood and prune: reverse path forwarding, source-based tree RPF tree based on DVMRP’s own routing tables

constructed by communicating DVMRP routers no assumptions about underlying unicast initial datagram to mcast group flooded

everywhere via RPF routers not wanting group: send upstream prune

msgs


DVMRP: continued…

soft state: DVMRP router periodically (1 min.) “forgets” branches are pruned: mcast data again flows down unpruned branch downstream router: reprune or else continue

to receive data routers can quickly regraft to tree

following IGMP join at leaf odds and ends

commonly implemented in commercial routers Mbone routing done using DVMRP


Tunneling

Q: How to connect “islands” of multicast routers in a “sea” of unicast routers?

mcast datagram encapsulated inside “normal” (non-multicast-addressed) datagram

normal IP datagram sent thru “tunnel” via regular IP unicast to receiving mcast router

receiving mcast router unencapsulates to get mcast datagram

physical topology logical topology


PIM: Protocol Independent Multicast

PIM is a family of multicast routing protocol

not dependent on any specific underlying unicast routing algorithm (works with all)

two different multicast distribution scenarios :

Dense: group members

densely packed, in “close” proximity.

bandwidth more plentiful

Sparse: # networks with group

members small wrt # interconnected networks

group members “widely dispersed”

bandwidth not plentiful


Consequences of Sparse-Dense Dichotomy:

Dense group membership by

routers assumed until routers explicitly prune

data-driven construction on mcast tree (e.g., RPF)

bandwidth and non-group-router processing profligate

Sparse: no membership until

routers explicitly join receiver- driven

construction of mcast tree (e.g., center-based)

bandwidth and non-group-router processing conservative


PIM- Dense Mode

flood-and-prune RPF, similar to DVMRP but

underlying unicast protocol provides RPF info for incoming datagram

less complicated (less efficient) downstream flood than DVMRP reduces reliance on underlying routing algorithm

has protocol mechanism for router to detect it is a leaf-node router


PIM - Sparse Mode center-based approach router sends join msg

to rendezvous point (RP) intermediate routers

update state and forward join

after joining via RP, router can switch to source-specific tree increased performance:

less concentration, shorter paths

R1

R2

R3

R4

R5

R6R7

join

join

join

all data multicastfrom rendezvouspoint

rendezvouspoint


PIM - Sparse Mode

sender(s): unicast data to RP,

which distributes down RP-rooted tree

RP can extend mcast tree upstream to source

RP can send stop msg if no attached receivers “no one is listening!”

R1

R2

R3

R4

R5

R6R7

join

join

join

all data multicastfrom rendezvouspoint

rendezvouspoint


Chapter 4: summary 4. 1 Introduction 4.2 Virtual circuit and







For more info. for routing, visit here