Routing

Jinyang Li

Administrivia

• Hand in PS1 to Hui Zhang before you leave!

• Email me your project team list by Oct 1

Routing basics

• Model the network as a graph
• Goal: Find a (best) path from A to B
• Your favorite path-finding algorithm?
– Breadth-first search
– Bellman-Ford
– Dijkstra
– Floyd-Warshall

• Routing protocols must be decentralized

Challenges

• Network topology is dynamic
– Links go up and down
– Nodes go up and down
– Link costs (metrics) change

• Nodes might have stale information
• Nodes might have different information

Basic decentralized routing algorithms

• Distance-vector (DV)
• Link state (LS)

Distance vector routing

• Based on Bellman-Ford
• Nodes only keep path metrics to all destinations
• Neighboring nodes exchange path metrics

Distance vector routing

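[Figure: nodes a, b, c, d with links a-b (cost 1), b-c (cost 1), a-c (cost 10), c-d (cost 1)]

Initial routing tables (destination: next hop, metric)

Node a: a: a, 0; b: b, 1; c: c, 10
Node b: a: a, 1; b: b, 0; c: c, 1
Node c: a: a, 10; b: b, 1; c: c, 0; d: d, 1
Node d: c: c, 1; d: d, 0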

DV: routing table update

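[Figure: nodes a, b, c, d with links a-b (cost 1), b-c (cost 1), a-c (cost 10), c-d (cost 1)]

Routing tables before the exchange (destination: next hop, metric)

Node a: a: a, 0; b: b, 1; c: c, 10
Node b: a: a, 1; b: b, 0; c: c, 1
Node c: a: a, 10; b: b, 1; c: c, 0; d: d, 1
Node d: c: c, 1; d: d, 0

Node a receives b's metrics (a: 1, b: 0, c: 1) and adds the a-b link cost:
Node a becomes: a: a, 0; b: b, 1; c: b, 2

Node b receives c's metrics (a: 10, b: 1, c: 0, d: 1) and adds the b-c link cost:
Node b becomes: a: a, 1; b: b, 0; c: c, 1; d: c, 2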
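The two table merges on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a protocol implementation; the function name `dv_merge` and the dictionary layout are assumptions made for the example.

```python
INF = float("inf")

def dv_merge(table, neighbor, link_cost, neighbor_vector):
    """Return a copy of `table` after merging a neighbor's advertised metrics.

    table:           {destination: (next_hop, metric)}
    neighbor_vector: {destination: metric} as advertised by `neighbor`
    """
    updated = dict(table)
    for dest, metric in neighbor_vector.items():
        candidate = link_cost + metric
        _, current = updated.get(dest, (None, INF))
        if candidate < current:          # keep the old route on ties
            updated[dest] = (neighbor, candidate)
    return updated

# Node a merges b's vector over the a-b link (cost 1).
a_table = {"a": ("a", 0), "b": ("b", 1), "c": ("c", 10)}
a_after = dv_merge(a_table, "b", 1, {"a": 1, "b": 0, "c": 1})

# Node b merges c's vector over the b-c link (cost 1).
b_table = {"a": ("a", 1), "b": ("b", 0), "c": ("c", 1)}
b_after = dv_merge(b_table, "c", 1, {"a": 10, "b": 1, "c": 0, "d": 1})
```

Node a's route to c switches from the direct cost-10 link to the two-hop path via b, and node b learns a route to d via c.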

DV: routing table update

• When does DV find best paths?
• Static topology, synchronous exchange
– A node learns the best path ≤ x hops after x rounds of exchange
– DV converges after n rounds if the longest shortest path is n hops
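To make the round-counting claim concrete, here is a small simulation sketch of synchronous DV on the four-node example topology. It assumes every node starts knowing only itself, so that round 1 discovers the direct neighbors; the helper name `dv_rounds` is hypothetical.

```python
INF = float("inf")

def dv_rounds(links):
    """Run synchronous DV exchanges until nothing changes.

    Returns (tables, number of rounds in which some table changed).
    """
    cost = {}
    for (u, v), c in links.items():
        cost[(u, v)] = cost[(v, u)] = c
    nodes = {n for pair in cost for n in pair}
    tables = {n: {n: (n, 0)} for n in nodes}   # assumption: only self is known at start
    rounds = 0
    while True:
        # Snapshot the metrics so every node merges last round's vectors.
        vectors = {n: {d: m for d, (_, m) in t.items()} for n, t in tables.items()}
        changed = False
        for (u, v), c in cost.items():          # u merges the vector sent by v
            for dest, metric in vectors[v].items():
                if c + metric < tables[u].get(dest, (None, INF))[1]:
                    tables[u][dest] = (v, c + metric)
                    changed = True
        if not changed:
            return tables, rounds
        rounds += 1

links = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 10, ("c", "d"): 1}
tables, rounds = dv_rounds(links)
```

With this topology the longest shortest path (a to d via b and c) has 3 hops, and the simulation stops changing after 3 rounds.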

DV under dynamics

• DV update rule
– Reduce the path metric if a neighbor advertises a better one

• Always correct w/ static topology
• Might be wrong when topology changes
– My path metric is based on new topology
– Neighbor’s path metric could be for old topology

DV: count-to-infinity

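[Figure: nodes a, b, c, d with links a-b (cost 1), b-c (cost 1), a-c (cost 10), c-d (cost 1); the b-c link fails]

Tables before the failure (destination: next hop, metric)

Node a: a: a, 0; b: b, 1; c: b, 2; d: b, 3
Node b: a: a, 1; b: b, 0; c: c, 1; d: c, 2
Node c: a: b, 2; b: b, 1; c: c, 0; d: d, 1
Node d: a: c, 3; b: c, 2; c: c, 1; d: d, 0

Tables right after the failure

Node b: a: a, 1; b: b, 0; c: inf; d: inf
Node c: a: a, 10; b: inf; c: c, 0; d: d, 1

Node b receives a's metrics (a: 0, b: 1, c: 2, d: 3):
Node b becomes: a: a, 1; b: b, 0; c: a, 3; d: a, 4

Incorrect update of path metric based on old topology

Node a receives b's metrics (a: 1, b: 0, c: 3, d: 4):
Node a becomes: a: a, 0; b: b, 1; c: b, 4; d: b, 5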
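The two bad updates above can be replayed with a short sketch. It models only the a-b exchange shown on the slide (a's direct link to c and the other nodes are left out), and it assumes the usual DV convention that a node always adopts a changed metric from its current next hop, even when it is worse. The name `merge` is illustrative.

```python
INF = float("inf")

def merge(table, neighbor, link_cost, vector):
    """Merge a neighbor's metrics into a copy of `table`."""
    updated = dict(table)
    for dest, metric in vector.items():
        candidate = link_cost + metric
        next_hop, current = updated.get(dest, (None, INF))
        # Take a strictly better metric, or any changed metric
        # reported by the neighbor we already route through.
        if candidate < current or (next_hop == neighbor and candidate != current):
            updated[dest] = (neighbor, candidate)
    return updated

def vector_of(table):
    return {dest: metric for dest, (_, metric) in table.items()}

# State right after the b-c link fails: b knows, a does not.
a_table = {"a": ("a", 0), "b": ("b", 1), "c": ("b", 2), "d": ("b", 3)}
b_table = {"a": ("a", 1), "b": ("b", 0), "c": (None, INF), "d": (None, INF)}

b_table = merge(b_table, "a", 1, vector_of(a_table))   # b trusts a's stale metrics
a_table = merge(a_table, "b", 1, vector_of(b_table))   # a follows b's new metrics
```

Repeating the last two lines keeps raising both metrics by 2 per exchange, which is the count-to-infinity behaviour.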

(Partial) solutions to count-to-infinity

• Make infinity a finite number (e.g. 64)
• Split-horizon
– Do not advertise routes you learnt from neighbor N back to N

• Split-horizon with poison reverse
– Advertise routes you learnt from neighbor N back to N with an infinite metric
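A rough sketch of how the two split-horizon variants change what a node advertises to one neighbor. The function `advertise` and its `poison_reverse` flag are illustrative names, not taken from any particular routing daemon.

```python
INF = float("inf")

def advertise(table, neighbor, poison_reverse=False):
    """Build the metric vector sent to `neighbor`.

    table: {destination: (next_hop, metric)}
    """
    vector = {}
    for dest, (next_hop, metric) in table.items():
        if next_hop == neighbor:
            if poison_reverse:
                vector[dest] = INF      # advertise the route as unreachable
            # plain split horizon: leave the route out entirely
        else:
            vector[dest] = metric
    return vector

# Node a routes to b, c, and d through b, so none of those go back to b as usable.
a_table = {"a": ("a", 0), "b": ("b", 1), "c": ("b", 2), "d": ("b", 3)}
split = advertise(a_table, "b")
poisoned = advertise(a_table, "b", poison_reverse=True)
```

With either variant, b can no longer pick up a's stale routes to c and d after the b-c link fails.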

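Path vector: no more count-to-infinity

[Figure: nodes a, b, c, d with links a-b (cost 1), b-c (cost 1), a-c (cost 10), c-d (cost 1); the b-c link fails]

Tables before the failure (destination: path, metric)

Node a: a: a, 0; b: b, 1; c: b,c, 2; d: b,c,d, 3
Node b: a: a, 1; b: b, 0; c: c, 1; d: c,d, 2
Node c: a: b,a, 2; b: b, 1; c: c, 0; d: d, 1
Node d: a: c,b,a, 3; b: c,b, 2; c: c, 1; d: d, 0

Tables right after the failure

Node b: a: a, 1; b: b, 0; c: inf; d: inf
Node c: a: a, 10; b: inf; c: c, 0; d: d, 1

Node b receives a's advertisement (a: a, 0; b: b, 1; c: b,c, 2; d: b,c,d, 3):
Node b stays at: a: a, 1; b: b, 0; c: inf; d: inf

Discard old info based on path vector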
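The discard rule can be sketched as a simple membership check: a node ignores any advertised path that already contains itself. The names below are illustrative.

```python
def usable_paths(me, advertised):
    """Keep only advertised paths that do not already pass through `me`.

    advertised: {destination: [hops along the advertised path]}
    """
    return {dest: path for dest, path in advertised.items() if me not in path}

# What a advertises to b after the b-c link has failed (a's view is stale).
a_advert = {"a": ["a"], "b": ["b"], "c": ["b", "c"], "d": ["b", "c", "d"]}
kept = usable_paths("b", a_advert)
```

Only the route to a survives, so b keeps c and d at infinity instead of bouncing traffic back through a.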

DV Summary

• Periodic exchange among neighbors

• Each update has O(N) size, N is the # of nodes (routable prefixes)

• Convergence delays
– Explicit path vector info speeds up convergence

Link State Routing

• In DV, topology is implicit in the routing tables
– Convergence is delayed when using old topology for updates

• LSR: make topology explicit!

Link State Routing

• Each node keeps link state information (complete topology)
• Each node computes paths based on topology using a centralized algorithm

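[Figure: nodes a, b, c, d with links a-b (cost 1), b-c (cost 1), a-c (cost 10), c-d (cost 1)]

Node a runs Dijkstra on the complete topology and obtains (destination: next hop, metric):
a: a, 0; b: b, 1; c: b, 2; d: b, 3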
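As a sketch of the per-node computation, the following runs Dijkstra over the example link-state database and records the first hop for each destination. The function name `dijkstra_table` and the link dictionary format are assumptions for illustration.

```python
import heapq

def dijkstra_table(links, source):
    """Compute {destination: (next_hop, metric)} from a full link list."""
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost

    table = {}
    heap = [(0, source, source)]            # (metric, node, first hop from source)
    while heap:
        metric, node, first_hop = heapq.heappop(heap)
        if node in table:                   # already settled with a shorter path
            continue
        table[node] = (first_hop, metric)
        for nbr, cost in graph[node].items():
            if nbr not in table:
                hop = nbr if node == source else first_hop
                heapq.heappush(heap, (metric + cost, nbr, hop))
    return table

links = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 10, ("c", "d"): 1}
a_table = dijkstra_table(links, "a")
```

The result matches the table in the figure: every destination other than a itself is reached through b.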

Link state updates

• Both ends of a link flood link state to the entire network
– Immediately upon change
– Periodically with a long period

• LS seq # distinguishes old LS from new ones

• Old LS times out eventually
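A minimal sketch of the sequence-number check a node might apply when a flooded link-state update arrives. The record layout (`link`, `seq`, `up`) and the function name are hypothetical, and timeouts are left out.

```python
def receive_ls(lsdb, update):
    """Install `update` if it is newer than the stored copy.

    Returns True if the update was installed and should be re-flooded.
    """
    held = lsdb.get(update["link"])
    if held is not None and held["seq"] >= update["seq"]:
        return False                      # old or duplicate: drop, do not re-flood
    lsdb[update["link"]] = update
    return True

lsdb = {}
first = receive_ls(lsdb, {"link": ("b", "c"), "seq": 2, "up": False})
stale = receive_ls(lsdb, {"link": ("b", "c"), "seq": 1, "up": True})
again = receive_ls(lsdb, {"link": ("b", "c"), "seq": 2, "up": False})
```

Dropping duplicates is also what stops a flood from circulating forever.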

Link State vs. DV

• Routing state
– LS: O(E) to keep complete topology
– DV: O(N) to keep path metrics to all nodes

• Routing message overhead
– LS: O(E*E), floods each LS to entire network
– DV: O(E*N) to exchange routing tables on all links

• LS converges faster than DV
• Does LS guarantee loop-free forwarding?

Common link metrics

• What’s the “cost” of different links?
– 1
– Latency
– Bandwidth
– Queue length
– …

Other routing algorithms?

• LS/DV find optimal paths
• Both incur substantial message overhead
• Trade off path optimality for lower overhead?
– Compact routing: O(√N) state, 3 times longer paths in the worst case
– Geographic routing: constant state

Routing on the Internet

-- from algorithms to protocols

Intra-domain and Inter-domain routing

Intra-domain routing

• Goal:
– Find best paths between all intra-AS networks
– Traffic engineering to load balance different paths

• Popular IGP (interior gateway) protocols:
– OSPF (LS)
– IS-IS (LS)
– RIP (DV)

Inter-domain routing

• Goal:
– Provide reachability for different ASes
– Comply with policies of different ASes

• BGP: path vector based on ASes

BGP

• Routing policies
• Protocol operations
• Disseminating BGP routes within an AS
• BGP challenges
– Policy interactions
– Multihoming
– Security

Inter-AS topology is not simply a graph

[Figure: AT&T and another ISP; a small ISP that is a customer of AT&T; NYU, a customer of the small ISP]

• Small ISP pays $$ to AT&T
• NYU pays $ to small ISP
• Free for traffic between customers of two peering ISPs only

BGP export policy: what to reveal to neighbors?

• If you tell N about A --> you agree to forward traffic from N to A
– If you do not want to forward traffic to A, don’t tell others about it

• Always advertise customer routes
– Carrying traffic for customer brings $$$

• Advertise non-customer routes to customers only
– If you advertise non-customer routes to another provider/peer, you are carrying traffic for nothing!
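The export rules above reduce to a two-line predicate. This is a sketch with made-up relationship labels, not configuration syntax from any router.

```python
def should_export(learned_from, export_to):
    """Both arguments are one of 'customer', 'peer', or 'provider'."""
    if learned_from == "customer":
        return True                       # customer routes go to everyone
    return export_to == "customer"        # other routes go to customers only

# Enumerate every (learned_from, export_to) pair that is allowed.
kinds = ("customer", "peer", "provider")
allowed = {(src, dst) for src in kinds for dst in kinds if should_export(src, dst)}
```

Of the nine combinations, five are allowed; the four that are blocked are exactly the cases where the AS would carry traffic between two non-customers.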

BGP import policies: which route to use?

• Not simply shortest path!
• Different preferences for routes from different ASes
• Customer > peer > provider
– Customers pay for their traffic
– Avoid paying providers by using peer routes

Example BGP routes

>show ip bgp 216.165.108.8
BGP routing table entry for 216.165.0.0/17, version 221058
Paths: (41 available, best #39, table Default-IP-Routing-Table)
  Not advertised to any peer
  4513 701 7018 12 12, (aggregated by 12 192.76.177.66)
    209.10.12.125 from 209.10.12.125 (209.10.12.125)
      Origin IGP, metric 4103, localpref 100, valid, external, atomic-aggregate
….

Slide callouts:
• Longest matching prefix: 216.165.0.0/17
• AS path: 4513 701 7018 12 12
• Next hop: 209.10.12.125
• localpref: high values are better

Route selection based on attributes

Local Pref
• Used to prefer customer > peer > provider
• High values are better

ASPATH
• Prefer paths with lowest # of ASes

MED
• Tell others to choose one exit point over another
• Low values are better

IGP path cost
• Lower values are better
• Leads to “hot potato” routing

Router ID
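The attribute order above can be sketched as a sort key. This is a simplification under stated assumptions: real BGP compares MED only among routes from the same neighboring AS, the lowest router ID is assumed to win the final tie-break, and all route values below are made up.

```python
def selection_key(route):
    """Smaller key means more preferred, attribute by attribute."""
    return (
        -route["local_pref"],     # higher local pref wins
        len(route["as_path"]),    # fewer ASes wins
        route["med"],             # lower MED wins
        route["igp_cost"],        # lower IGP cost wins (hot potato)
        route["router_id"],       # assumed tie-break: lowest router ID
    )

def best_route(routes):
    return min(routes, key=selection_key)

routes = [
    {"name": "via_provider", "local_pref": 80, "as_path": [701, 12],
     "med": 0, "igp_cost": 5, "router_id": 1},
    {"name": "via_customer_near", "local_pref": 100, "as_path": [4513, 701, 7018, 12],
     "med": 0, "igp_cost": 9, "router_id": 2},
    {"name": "via_customer_far", "local_pref": 100, "as_path": [4513, 701, 7018, 12],
     "med": 0, "igp_cost": 20, "router_id": 3},
]
chosen = best_route(routes)["name"]
```

The provider route has the shortest AS path but loses on local pref; the two customer routes tie until IGP cost, where the nearer exit wins.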

Hot potato routing

• All ASes want to get rid of external traffic asap
• Hot potato routing causes asymmetric traffic

[Figure: hot potato routing between two ASes; labels: MED=100, MED=500, “Blue AS’ preferred route”]

BGP operations

• A router establishes a BGP session with its neighbors over TCP

• Neighbors might be many hops away
• Two neighbors exchange
– UPDATE (announcements, withdrawal)
– KEEPALIVE

Disseminating routes within an AS

Routers establish eBGP sessions between different ASes

Routers inside an AS establish iBGP sessions to learn external routes

Challenges of route dissemination

• Loop free
– Routers should not disagree on how to route

• Complete
– Each router chooses route as if it knows all external routes from all eBGP sessions

• Scalable

A strawman that works: full mesh dissemination

• Each router establishes an iBGP session with all eBGP-speaking routers
• Complete: all routers know all routes
• Loop free: all routers know the same set of routes
• Not scalable: requires e(e-1)/2 + ei iBGP sessions among e eBGP routers and i non-eBGP routers
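A quick worked example of the session count, with hypothetical router counts chosen only to show how fast the number grows.

```python
def full_mesh_sessions(e, i):
    """iBGP sessions when every router peers with all e eBGP-speaking routers.

    e*(e-1)/2 sessions among the eBGP routers, plus e sessions
    for each of the i non-eBGP routers.
    """
    return e * (e - 1) // 2 + e * i

small = full_mesh_sessions(4, 6)      # 6 + 24
large = full_mesh_sessions(40, 60)    # 780 + 2400
```

Scaling the AS by 10x multiplies the session count by roughly 100x.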

A simple route reflector setup

• Requires e+i BGP sessions
• Clients and the reflector exchange less traffic
• All loads are on one router
• Not all clients get best routes if there are multiple egress routers

[Figure: a route reflector (RR) with reflector clients]

• RR learns routes from eBGP sessions
• RR tells clients best route for each prefix over iBGP

A problematic RR topology setup

[Figure: two route reflectors, RR1 and RR2, each with a reflector client; link labels: 3, 13, 2, 1]

• RR1 learns best route to prefix D from eBGP
• RR2 learns equally good route to prefix D from eBGP
• RR1 tells clients its best route to D, next hop RR1
• RR2 tells clients its best route to D, next hop RR2

BGP


• BGP challenges
– Policy interactions
– Multi-homing
– Security

When policy goes against shortest path…

• Each AS prefers two-hop route via its clockwise neighbor

[Figure: AS0, AS1, AS2, AS3]

Shortest path routing always converges

• Why?
• Shortest paths form a DAG (directed acyclic graph) from all nodes to a destination.
• When policies override shortest path, there’s danger…
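The danger can be checked by brute force for the clockwise-preference example. The sketch below assumes AS1, AS2, and AS3 each have a direct route to destination AS0 and that the clockwise order is 1 → 2 → 3 → 1; both are assumptions about the figure. A two-hop route through a neighbor exists only while that neighbor itself uses its direct route.

```python
from itertools import product

NEXT = {1: 2, 2: 3, 3: 1}   # assumed clockwise neighbor of each AS

def best_choice(asn, choice):
    """Return the route `asn` would pick given everyone else's current choice."""
    # The preferred two-hop route is on offer only if the clockwise
    # neighbor is currently routing directly to AS0.
    return "via" if choice[NEXT[asn]] == "direct" else "direct"

def stable_states():
    """Enumerate assignments in which no AS wants to change its route."""
    states = []
    for combo in product(["direct", "via"], repeat=3):
        choice = dict(zip((1, 2, 3), combo))
        if all(choice[a] == best_choice(a, choice) for a in choice):
            states.append(choice)
    return states

stable = stable_states()
```

None of the eight assignments is stable: whichever routes are chosen, at least one AS prefers to switch, so the protocol keeps oscillating.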

Ensuring convergence

• Global policy check
– Each AS submits its policy & neighbors to a global registry
– Centrally check for bad policy interactions
– Drawbacks: checking is NP-complete; topology might change

• Gao/Rexford (today’s paper)
– AS graphs are hierarchical
– Restrict the set of allowed policies

Gao’s observation

• AS graphs are not just any graph
• Provider-subscriber relationships form a DAG

[Figure legend: provider-subscriber link; peering link]

Gao’s rule for convergence

• Do not go against DAG edges
– Customer routes > provider, peer routes

• If peering links do not cause cycles…
– Customer, peer routes > provider routes

[Figure: a peering link that will not cause a cycle; a peering link that might cause a cycle]

Gao’s rule for convergence

• Gao’s rule matches ISPs’ incentives
– ISP incentives: customer > peer > provider
– Gao’s: customer > peer, provider

BGP


• BGP challenges
– Policy interactions
– Multi-homing
– Security

BGP and multi-homing

• “stub” AS uses 2 links to the same ISP
• “stub” AS uses 2 links to different ISPs
• Transit AS uses 2 providers & peers

Stub AS with a single ISP

• Resilient to a single link failure – announce d/19 on both links

• Balance load between two links – split prefix, announce sub-prefix on different links

• No need for a public AS number for stub

[Figure: stub AS with prefix d/19, shown in two configurations; AS labels: AS 5, AS 6]

• Announce one route to d/19
• Announce d1/20 on one link and d2/20 on the other
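Splitting a /19 into two /20 sub-prefixes can be illustrated with Python's standard `ipaddress` module. The concrete block `10.0.0.0/19` is a hypothetical stand-in for d/19.

```python
import ipaddress

block = ipaddress.ip_network("10.0.0.0/19")      # hypothetical d/19
d1, d2 = block.subnets(prefixlen_diff=1)         # the two /20 halves

# An address in the upper half matches both d/19 and d2/20;
# longest-prefix match sends it over the link where d2/20 is announced.
addr = ipaddress.ip_address("10.0.20.1")
matches = [net for net in (block, d1, d2) if addr in net]
longest = max(matches, key=lambda net: net.prefixlen)
```

Announcing the covering /19 alongside the two /20s would keep both halves reachable if one link fails, since the less specific route still matches.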

Stub AS with multiple ISPs

• Resilient to one ISP failure – announce prefix over both links in primary/backup setup

• Balance load between two ISPs– split prefix and announce sub-prefix on each link

• Need a public AS number

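[Figure: stub AS 12 with prefix d/19, connected to AS 5 and AS 6 in a primary/backup setup]

• Stub AS 12 announces d/19 with ASPATH 12 on one link
• Stub AS 12 announces d/19 with ASPATH 12 12 12 on the other link
• AS 5 announces d/19 with ASPATH 5 12
• AS 6 announces d/19 with ASPATH 6 12 12 12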
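The effect of the padded AS path can be sketched as follows, assuming a remote AS that sees both announcements with equal local preference and therefore falls back to AS path length. The dictionary layout is illustrative.

```python
def preferred(announcements):
    """Pick the announcement with the shortest AS path (local pref assumed equal)."""
    return min(announcements, key=lambda ann: len(ann["as_path"]))

via_as5 = {"neighbor": 5, "as_path": [5, 12]}
via_as6 = {"neighbor": 6, "as_path": [6, 12, 12, 12]}   # AS 12 prepended itself twice

primary = preferred([via_as5, via_as6])
backup = preferred([via_as6])          # what remains if the AS 5 route is withdrawn
```

Prepending only influences ASes that get as far as comparing path lengths; an AS whose local pref favours AS 6 will still use the padded route.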

Service providers multi-home

• Load balance transit traffic on many prefixes (inter-domain traffic engineering)
– Control both outbound and inbound traffic

• Redundancy
– Primary/backup etc.

• Challenge: scalability and predictability
