Internet Routing (COS 598A)
Today: Interdomain Routing Convergence
Jennifer Rexford
http://www.cs.princeton.edu/~jrex/teaching/spring2005
Tuesdays/Thursdays 11:00am-12:20pm

Outline

• BGP convergence
  – Causes of routing changes
  – Detecting session failures
  – BGP path exploration

• Route-flap damping
  – Damping persistent flapping
  – Interaction with path exploration

• Stability of popular destinations
  – Are things really all that bad?

• Reducing convergence delay
  – Avoiding complete path exploration
  – Why this is harder than it looks

Causes of BGP Routing Changes

• Topology changes
  – Equipment going up or down
  – Deployment of new routers or sessions

• BGP session failures
  – Due to equipment failures, maintenance, etc.
  – Or, due to congestion on the physical path

• Changes in routing policy
  – Reconfiguration of preferences
  – Reconfiguration of route filters

• Persistent protocol oscillation
  – More on this next week!

BGP Session Operation

• Establish session on TCP port 179
• Exchange all active routes
• Exchange incremental updates
  – While the connection is ALIVE, exchange route UPDATE messages

[Figure: BGP session between AS1 and AS2]

BGP Session Failure

• BGP runs over TCP
  – BGP only sends updates when changes occur
  – TCP doesn’t detect lost connectivity on its own

• Detecting a failure
  – Keep-alive: 60 seconds
  – Hold timer: 180 seconds

• Reacting to a failure
  – Discard all routes learned from the neighbor
  – Send new updates for any routes that change
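A minimal sketch of the detection and reaction rules above, assuming the 60-second keep-alive and 180-second hold timer from the slide. `HoldTimer` and `routes_to_discard` are hypothetical helpers, not any router's API: any message from the neighbor restarts the timer, and a full hold time of silence declares the session down.

```python
from dataclasses import dataclass
from typing import Dict, Set

KEEPALIVE_S = 60.0   # keep-alive interval from the slide
HOLD_TIME_S = 180.0  # hold timer from the slide (three missed keep-alives)


@dataclass
class HoldTimer:
    """Tracks when the last BGP message arrived from one neighbor (hypothetical helper)."""
    hold_time: float = HOLD_TIME_S
    last_heard: float = 0.0

    def message_received(self, now: float) -> None:
        # Any KEEPALIVE or UPDATE from the neighbor restarts the hold timer.
        self.last_heard = now

    def expired(self, now: float) -> bool:
        # The session is declared down only after a full hold time of silence.
        return now - self.last_heard >= self.hold_time


def routes_to_discard(rib_in: Dict[str, Set[str]], neighbor: str) -> Set[str]:
    """On session failure, drop every prefix learned from that neighbor."""
    return rib_in.pop(neighbor, set())
```

If the neighbor falls silent right after its keep-alive at t = 120 s, the session is not declared down until t = 300 s, so detection can take the full three minutes.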

Routing Change: Before and After

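[Figure: ASes 1 and 2 each connect to destination AS 0, to each other, and to AS 3. Before: AS 1 uses (1,0), AS 2 uses (2,0), AS 3 uses (3,1,0). After AS 1 loses its route (1,0): AS 2 uses (2,0), AS 1 uses (1,2,0), AS 3 uses (3,2,0).]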

Routing Change: Path Exploration

• AS 1
  – Delete the route (1,0)
  – Switch to next route (1,2,0)
  – Send route (1,2,0) to AS 3

• AS 3
  – Sees (1,2,0) replace (1,0)
  – Compares to route (2,0)
  – Switches to using AS 2

[Figure: final routes (2,0) at AS 2, (1,2,0) at AS 1, and (3,2,0) at AS 3]

Routing Change: Path Exploration

• Initial situation
  – Destination 0 is alive
  – All ASes use direct path

• When destination dies
  – All ASes lose direct path
  – All switch to longer paths
  – Eventually withdrawn

• E.g., AS 2
  – (2,0) → (2,1,0)
  – (2,1,0) → (2,3,0)
  – (2,3,0) → (2,1,3,0)
  – (2,1,3,0) → null

[Figure: ASes 1, 2, and 3 fully connected, each also linked to destination AS 0. Candidate routes: AS 1 (1,0), (1,2,0), (1,3,0); AS 2 (2,0), (2,1,0), (2,3,0), (2,1,3,0); AS 3 (3,0), (3,1,0), (3,2,0).]
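The toy simulation below sketches path exploration on the four-AS clique from the figure. Its assumptions are not from the slide: every AS prefers the shortest AS path and breaks ties by the lowest next-hop AS number, all ASes update in lock-step rounds, and there is no MRAI timer. Because real exploration order depends on message timing, the sequence this model produces for AS 2 differs from the one on the slide, but it shows the same pattern of progressively longer paths ending in a withdrawal.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[int, ...]
Topology = Dict[int, List[int]]
History = Dict[int, List[Optional[Path]]]

# Destination AS 0 plus ASes 1, 2, 3, all pairwise connected (a clique).
CLIQUE: Topology = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}


def best_path(asn: int, topo: Topology, announced: Dict[int, Optional[Path]]) -> Optional[Path]:
    candidates = []
    for nbr in topo[asn]:
        path = announced[nbr]
        # BGP loop detection: ignore any path that already contains this AS.
        if path is not None and asn not in path:
            candidates.append((asn,) + path)
    # Assumed policy: shortest AS path, ties broken by lowest next-hop AS.
    return min(candidates, key=lambda p: (len(p), p[1]), default=None)


def run_rounds(topo: Topology, dest: int, dest_alive: bool,
               announced: Dict[int, Optional[Path]],
               history: Optional[History] = None, max_rounds: int = 50):
    """Lock-step rounds: every AS recomputes from its neighbors' previous announcements."""
    for _ in range(max_rounds):
        new = {asn: best_path(asn, topo, announced) for asn in topo if asn != dest}
        new[dest] = (dest,) if dest_alive else None
        if new == announced:
            return announced  # converged
        if history is not None:
            for asn, path in new.items():
                if asn != dest and path != announced[asn]:
                    history[asn].append(path)
        announced = new
    raise RuntimeError("no convergence within max_rounds")


def explore_after_withdrawal(topo: Topology, dest: int) -> History:
    # Converge while the destination is alive, then withdraw it and record every change.
    start: Dict[int, Optional[Path]] = {asn: None for asn in topo}
    stable = run_rounds(topo, dest, True, start)
    history: History = {asn: [stable[asn]] for asn in topo if asn != dest}
    run_rounds(topo, dest, False, stable, history)
    return history


if __name__ == "__main__":
    for asn, paths in explore_after_withdrawal(CLIQUE, 0).items():
        print(asn, " -> ".join(str(p) for p in paths))
```

Under these assumptions AS 2 moves (2,0) → (2,1,0) → (2,3,1,0) → withdrawn. A different tie-breaking rule or message timing yields a different order, such as the one on the slide.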

Convergence Overhead and Delay

• Path exploration is expensive
  – Large number of possible paths
  – Might have to explore (nearly) all of them

• Minimum Route Advertisement Interval (MRAI)
  – Minimum time between advertisements of routes for a given destination to a given neighbor
  – Rate limit on BGP update messages
  – … and allows combining multiple messages in one
  – Typical value of 30 seconds

• Convergence delay
  – (30 seconds) × (# of paths)
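A minimal sketch of an MRAI rate limiter, assuming the 30-second value above and one timer per (neighbor, prefix) pair. An update that arrives while the timer runs is held rather than sent, and a newer update replaces the held one, which is how several changes collapse into a single message. `MraiLimiter` and its methods are hypothetical names, not any implementation's API.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (neighbor, prefix)


class MraiLimiter:
    """Hypothetical per-(neighbor, prefix) MRAI rate limiter."""

    def __init__(self, interval_s: float = 30.0) -> None:
        self.interval_s = interval_s
        self.last_sent: Dict[Key, float] = {}
        self.pending: Dict[Key, object] = {}
        self.sent: List[Tuple[float, str, str, object]] = []  # (time, neighbor, prefix, route)

    def offer(self, now: float, neighbor: str, prefix: str, route: object) -> None:
        """A new best route for prefix should be advertised to neighbor."""
        key = (neighbor, prefix)
        last = self.last_sent.get(key)
        if last is None or now - last >= self.interval_s:
            self._send(now, key, route)
        else:
            # Timer still running: hold the update, replacing any older held one.
            self.pending[key] = route

    def tick(self, now: float) -> None:
        """Send held updates whose MRAI timer has expired."""
        for key in list(self.pending):
            if now - self.last_sent[key] >= self.interval_s:
                self._send(now, key, self.pending[key])

    def _send(self, now: float, key: Key, route: object) -> None:
        self.last_sent[key] = now
        self.pending.pop(key, None)
        self.sent.append((now, key[0], key[1], route))
```

Three changes within 10 seconds reach the neighbor as two messages, the last one held until t = 30 s. Because each step of path exploration can wait out one interval, the delay scales as (30 seconds) × (# of paths).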

Four Kinds of BGP Routing Changes

• Destination becomes reachable
  – Switch from no path to a new path

• Better path becomes available
  – Switch from old path to new, better path

• Best path becomes unavailable
  – Switch from old path to new, worse path

• Destination becomes unreachable
  – Switch from old path to no path at all

[Figure annotation: the first two kinds of change have lower convergence delay; the last two, higher delay]
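The four cases depend only on whether a route exists before and after the change and, when both exist, which one is preferred. A small sketch, assuming fewer AS hops as a stand-in for the full BGP decision process; `classify_change` is a hypothetical name.

```python
from typing import Optional, Tuple

Path = Tuple[int, ...]


def classify_change(old: Optional[Path], new: Optional[Path]) -> str:
    if old == new:
        raise ValueError("no routing change")
    if old is None:
        return "destination becomes reachable"
    if new is None:
        return "destination becomes unreachable"
    # Assumed preference: fewer AS hops is better; ties go to the lower path tuple.
    if (len(new), new) < (len(old), old):
        return "better path becomes available"
    return "best path becomes unavailable"
```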

Questions About Convergence Delay

• Reduce the MRAI timer?
  – High message overhead on the router?
  – Delays from overloading the CPU?
  – What is the right value?

• Dependence on topology?
  – Worst-case: n!
    • Fully-connected graph (i.e., a clique)
    • No filtering of advertisements
    • Shortest-path routing
    • Destination dies completely
  – Typical case?
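The factorial worst case comes from counting loop-free paths. A sketch, assuming a clique of n ASes where AS 0 is the destination and AS 1 the observer: a path visits any ordered subset of the other n - 2 ASes, so there are sum over k of P(n - 2, k) paths, which equals floor(e · (n - 2)!) for n ≥ 3 and grows factorially. Multiplying by the 30-second MRAI gives an upper bound on delay under the slide's assumption that every path is explored. The function names are hypothetical.

```python
from math import perm

MRAI_S = 30  # typical MRAI value from the earlier slide


def clique_paths_closed_form(n: int) -> int:
    """Loop-free paths from AS 1 to AS 0 in an n-AS clique: choose and order k intermediate ASes."""
    others = n - 2
    return sum(perm(others, k) for k in range(others + 1))


def clique_paths_dfs(n: int, src: int = 1, dst: int = 0) -> int:
    """Brute-force cross-check by depth-first enumeration of simple paths."""
    def dfs(node: int, visited: frozenset) -> int:
        if node == dst:
            return 1
        return sum(dfs(nxt, visited | {nxt}) for nxt in range(n) if nxt not in visited)
    return dfs(src, frozenset({src}))


if __name__ == "__main__":
    for n in range(3, 9):
        paths = clique_paths_closed_form(n)
        print(f"n={n}: {paths} paths, worst-case delay ~ {paths * MRAI_S} s")
```

For the four-AS example this gives 5 loop-free paths, or about 150 s; at n = 8 it is already 1957 paths.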

Route Flap Damping

Persistent Routing Changes

• Causes
  – Link with intermittent connectivity
  – Congestion causing repeated session resets
  – Persistent oscillation due to policy conflicts

• Effects
  – Lots of BGP update messages
  – Disruptions to data traffic
  – High overhead on routers

• Solution
  – Suppress paths that go up/down repeatedly
  – … to avoid updates and prefer stable paths

Route Flap Damping

• BGP-speaking router
  – One or more BGP neighbors
  – Keep an “RIB-in” per neighbor
  – Select single best route per destination prefix

• Route-flap damping
  – Penalty counter per (peer, prefix) pair
  – Increment penalty when peer changes route
  – Decrease penalty over time when route is stable

• Designed and deployed in the mid-1990s
  – Widely viewed as helping improve stability
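A minimal sketch of the per-(peer, prefix) penalty counter, assuming exponential decay expressed as a half-life. The numbers (penalty 1000 per change, suppress above 2000, reuse below 750, 15-minute half-life) resemble commonly cited vendor defaults but are used here only as illustrative assumptions; `FlapDamper` is a hypothetical class.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (peer, prefix)


class FlapDamper:
    """Hypothetical per-(peer, prefix) route-flap damping state."""

    def __init__(self, penalty: float = 1000.0, suppress: float = 2000.0,
                 reuse: float = 750.0, half_life_s: float = 900.0) -> None:
        self.penalty = penalty          # added on every routing change
        self.suppress = suppress        # suppression threshold
        self.reuse = reuse              # reuse threshold (below suppress: hysteresis)
        self.half_life_s = half_life_s  # penalty halves every half_life_s while stable
        # key -> (penalty at last change, time of last change, suppressed flag)
        self.state: Dict[Key, Tuple[float, float, bool]] = {}

    def _current(self, key: Key, now: float) -> Tuple[float, bool]:
        p, t, suppressed = self.state.get(key, (0.0, now, False))
        p *= 2.0 ** (-(now - t) / self.half_life_s)  # exponential decay since last change
        if suppressed and p < self.reuse:
            suppressed = False  # decayed below the reuse threshold: usable again
        return p, suppressed

    def record_change(self, now: float, peer: str, prefix: str) -> bool:
        """Apply one routing change from peer for prefix; return whether it is now suppressed."""
        key = (peer, prefix)
        p, suppressed = self._current(key, now)
        p += self.penalty
        if p > self.suppress:
            suppressed = True
        self.state[key] = (p, now, suppressed)
        return suppressed

    def is_suppressed(self, now: float, peer: str, prefix: str) -> bool:
        return self._current((peer, prefix), now)[1]
```

With these values, three changes 30 s apart push the penalty to about 2932, and the route stays suppressed for roughly half an hour until the penalty decays below 750.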

Example: Why Damping Is Good

• Consider AS 3
  – Path #1: (3,1,0)
  – Path #2: (3,2,0)

• If link (1,0) fails
  – AS 3 switches routes

• If link (1,0) is restored
  – AS 3 switches routes

• If this happens a lot
  – Better for AS 3 to stick with (3,2,0)

[Figure: AS 3 reaches destination AS 0 via AS 1 over link (1,0) or via AS 2 over link (2,0)]

Damping Penalty Function

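[Figure: penalty vs. time. Each routing change raises the penalty, which decays while the route is stable; the route is suppressed when the penalty crosses the suppression threshold and reused once it decays below the lower reuse threshold.]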
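The curve can be written down under one common model, assumed here: the penalty p decays exponentially with half-life H (one way to express the decay exponent), each routing change adds a fixed amount P, the route is suppressed once p exceeds the suppression threshold S, and a route suppressed at time t_s becomes usable again when p falls below the reuse threshold R.

```latex
\begin{aligned}
p(t) &= p(t_0)\, 2^{-(t - t_0)/H} &&\text{decay while the route is stable}\\
p &\leftarrow p + P &&\text{at each routing change; suppress if } p > S\\
T_{\text{reuse}} &= H \log_2 \frac{p(t_s)}{R} &&\text{solving } p(t_s)\, 2^{-T/H} = R \text{ for } T
\end{aligned}
```

For example, assuming H = 15 minutes and R = 750, a route suppressed at a penalty of 3000 stays unusable for 15 × log2(3000/750) = 30 minutes.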

Configurable Damping Parameters

• Penalty for a routing change
  – May vary with the type of update message
  – Advertisement vs. withdrawal? Attribute change?

• Decay in the absence of a change
  – Exponent in the exponential decay

• Suppression threshold
  – Trigger for damping the route
  – Determines how many updates are tolerated

• Reuse threshold
  – Trigger for considering the route again
  – Determines how long the route remains unusable

Best Common Practices for Damping

• Different parameters for different prefixes
  – More aggressive with small address blocks
  – Disable damping on certain prefixes (e.g., those corresponding to the DNS root servers)

• Avoid suppressing stable routes
  – Tolerate at least four routing changes

• Suppress unstable routes for quite a while
  – Values ranging from 10 minutes to 1 hour
  – Values of 30 minutes are not uncommon

Interaction with Path Exploration

• BGP routing convergence
  – Explore one or more alternate paths
  – Number of alternate paths may be quite high
  – Time between steps is small (e.g., 30 seconds)

• Triggering route-flap damping
  – Increasing penalty with each step
  – Only small amount of decay between steps

• Convergence may trigger route-flap damping
  – Convergence may involve more than 4 changes
  – A routing change may cause lost connectivity!
  – Confirmed by recent active measurement studies
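A short calculation, assuming a per-change penalty of 1000, a 15-minute half-life, and updates 30 s apart (one MRAI) during path exploration. Between steps the penalty keeps a fraction 2^(-30/900) ≈ 0.977 of its value, so it accumulates almost linearly. The function names are hypothetical, and the thresholds are illustrative: 2000, and 4500 (chosen here so that four rapid changes are tolerated, per the best-practice guideline).

```python
from typing import List, Optional


def exploration_penalties(n_changes: int, gap_s: float = 30.0, penalty: float = 1000.0,
                          half_life_s: float = 900.0) -> List[float]:
    """Penalty right after each of n_changes updates spaced gap_s seconds apart."""
    decay = 2.0 ** (-gap_s / half_life_s)  # fraction kept between consecutive steps
    p, out = 0.0, []
    for _ in range(n_changes):
        p = p * decay + penalty
        out.append(p)
    return out


def first_suppressed_step(penalties: List[float], suppress: float) -> Optional[int]:
    """1-based index of the update that pushes the penalty over the threshold, if any."""
    for i, p in enumerate(penalties, start=1):
        if p > suppress:
            return i
    return None


if __name__ == "__main__":
    ps = exploration_penalties(6)
    print([round(p) for p in ps])
    print(first_suppressed_step(ps, 2000.0), first_suppressed_step(ps, 4500.0))
```

Even the threshold that tolerates four quick changes suppresses the route at the fifth, so a convergence that explores five or more paths can cut off a destination that is actually reachable.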

Effects of Damping are Confusing

• AS 0 is a stable network

• Link (1,3) fails a lot
  – AS 3 switches routes back and forth a lot
  – Sends new BGP updates to its customers
  – Suppose AS 3 does not apply route-flap damping

• AS 3’s customers
  – Eventually dampen route
  – Causes lost reachability to destination in AS 0

• How can AS 0 diagnose this problem, and fix it?

Open Questions

• Want to suppress unstable routes
  – Otherwise, lots of update messages
  – … and lots of transient disruptions

• Yet, want to tolerate path exploration
  – Otherwise, you suppress stable routes
  – … and black-hole otherwise reachable destinations

• How to reconcile?
  – Better flap-damping parameters?
  – More information in update messages?
  – Something more gentle than suppression?

BGP Stability of Popular Destinations (http://www.cs.princeton.edu/~jrex/papers/imw02.pdf)

BGP Routing and Traffic Popularity

• A possible saving grace…
  – Most BGP updates due to few prefixes
  – … and, most traffic due to few prefixes
  – … but, hopefully not the same prefixes

• Popularity vs. BGP stability
  – Do popular prefixes have stable routes?
    • Yes, for ~10 days at a stretch!
  – Does most traffic travel on stable routes?
    • A resounding yes!
  – Direct correlation of popularity and stability?
    • Well, no, not exactly…

BGP Updates

• BGP updates for March 2002
  – AT&T route reflector
  – RouteViews and RIPE-NCC

• Data preprocessing
  – Filter duplicate BGP updates
  – Filter resets of monitor sessions
  – Removes 7-30% of updates

• Grouping updates into “events”
  – Updates for the same prefix
  – Close together in time (45 sec)
  – Reduces sensitivity to timing

Confirmed: few prefixes responsible for most events
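A sketch of the preprocessing described above, under assumptions the slide does not spell out: updates are (time, peer, prefix, attributes) tuples; a duplicate is an update identical to the previous one from the same peer for the same prefix; and an event is a maximal run of updates for one prefix in which consecutive updates are at most 45 s apart. The function names are hypothetical, and filtering of monitor-session resets is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Update = Tuple[float, str, str, str]  # (time_s, peer, prefix, attributes)


def drop_duplicates(updates: List[Update]) -> List[Update]:
    last: Dict[Tuple[str, str], str] = {}
    kept = []
    for u in sorted(updates):
        _, peer, prefix, attrs = u
        if last.get((peer, prefix)) != attrs:  # keep only real changes
            kept.append(u)
        last[(peer, prefix)] = attrs
    return kept


def group_events(updates: List[Update], gap_s: float = 45.0) -> Dict[str, List[List[Update]]]:
    by_prefix: Dict[str, List[Update]] = defaultdict(list)
    for u in sorted(updates):
        by_prefix[u[2]].append(u)
    events: Dict[str, List[List[Update]]] = {}
    for prefix, us in by_prefix.items():
        groups = [[us[0]]]
        for u in us[1:]:
            if u[0] - groups[-1][-1][0] <= gap_s:
                groups[-1].append(u)  # close in time: same event
            else:
                groups.append([u])  # quiet gap: start a new event
        events[prefix] = groups
    return events
```

Grouping by inter-arrival gap counts a burst of path-exploration updates as one event, which is what reduces sensitivity to timing.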

Two Views of Prefix Popularity

• AT&T traffic data
  – Netflow data on peering links
  – Aggregated to the prefix level
  – Outbound from AT&T customers
  – Inbound to AT&T customers

• NetRatings Web sites
  – NetRatings top-25 list
  – Convert to site names
  – DNS to get IP addresses
  – Clustered into 33 prefixes

Example: Amazon → www.amazon.com → 207.171.182.16 → 207.171.176.0/20

[Figure: inbound and outbound traffic between AT&T and the rest of the Internet]
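The last step of the NetRatings pipeline maps each resolved address to the routing prefix that covers it. A minimal sketch using Python's standard `ipaddress` module and longest-prefix match; the DNS step is left out, `covering_prefix` is a hypothetical name, and every table entry except the Amazon prefix from the example is hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def covering_prefix(addr: str, prefixes: Iterable[str]) -> Optional[str]:
    """Longest-prefix match: the most specific prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    nets = [ipaddress.ip_network(p) for p in prefixes]
    matches = [n for n in nets if n.version == ip.version and ip in n]
    if not matches:
        return None
    return str(max(matches, key=lambda n: n.prefixlen))


if __name__ == "__main__":
    # Only 207.171.176.0/20 comes from the slide; the other entries are hypothetical.
    table = ["207.171.0.0/16", "207.171.176.0/20", "192.0.2.0/24"]
    print(covering_prefix("207.171.182.16", table))
```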

Traffic Volume vs. BGP Events (CDF)

[Figure: CDF of traffic volume (%) vs. BGP events (%), with inbound and outbound curves]

• 50% of events: 1.4% of traffic (4.5% of prefixes)
• 50% of traffic: 0.1% of events (0.3% of prefixes)

Update Events/Day (CCDF, log-log plot)

[Figure: CCDF (log-log) of percentage vs. update events per day, for all, inbound, outbound, and NetRatings prefixes]

• 1% had > 5 events per day
• No “popular” prefix had > 3 events per day
• Most “popular” prefixes had < 0.2 events/day and just 1 update/event

An Interpretation of the Results

• Popular → stable
  – Well-managed
  – Few failures and fast recovery
  – Single-update events to alternate routes

• Unstable → unpopular
  – Persistent flaps: hard to reach
  – Frequent flaps: poorly-managed sites

• Unpopular does not imply unstable
  – Most prefixes are quite stable
  – Well-managed, simple configurations
  – Managed by upstream provider

Avoiding Path Exploration

Reducing Path Exploration By Tagging

• When AS 1 sees (1,0) fail
  – Switches to (1,2,0)
  – Why not say “because the link (1,0) has failed”?
  – Allow ASes to discard all paths that use edge (1,0)

• Should reduce exploration
  – E.g., AS 3 should not consider (3,2,1,0)
  – E.g., AS 2 should not consider (2,3,1,0)

• Seems appealing, but…

[Figure: four-AS clique; candidate routes AS 1 (1,0), (1,2,0), (1,3,0); AS 2 (2,0), (2,1,0), (2,3,0); AS 3 (3,0), (3,1,0), (3,2,0)]
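The proposed hint amounts to a filter over candidate paths. A sketch, assuming the update carries the failed edge as an unordered AS pair (the slide proposes the idea, not an encoding); `uses_edge` and `prune` are hypothetical names.

```python
from typing import Iterable, List, Tuple

Path = Tuple[int, ...]


def uses_edge(path: Path, edge: Tuple[int, int]) -> bool:
    """True if consecutive ASes in the path traverse the (undirected) edge."""
    a, b = edge
    return any({x, y} == {a, b} for x, y in zip(path, path[1:]))


def prune(candidates: Iterable[Path], failed: Tuple[int, int]) -> List[Path]:
    """Discard every candidate that relies on the failed edge."""
    return [p for p in candidates if not uses_edge(p, failed)]
```

This treats each AS pair as a single edge, which is exactly the abstraction questioned under Problem #2.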

Problem #1: Timing of Information

• How long should the ASes believe the info?
  – What if the link (1,0) comes back up?
  – What if the info about the failure is still propagating?

• Do the ASes need to remember the old paths?
  – E.g., should AS 2 remember (2,3,1,0) in case it learns later that (1,0) has come back up?
  – BGP is an incremental protocol, so forgetting information may be risky unless you will get it back again

• But, these issues are probably surmountable
  – … with some attention to the details

Problem #2: AS With Multiple Routers/Links

• BGP introduces abstraction
  – Treats each AS as a single node
  – Doesn’t distinguish between links

• Example: one link fails
  – Should AS 1 tell others?
  – Need to identify which link?
  – Does it introduce more updates?

• Internal BGP details matter
  – Some AS 1 routers don’t know about both paths through AS 0…

[Figure: AS 1 connected to AS 0 by more than one link, with destination d in AS 0]

Internal BGP Convergence

Briefly, the border router has no route at all!

Questions

• Can we reduce path exploration?
  – Hints in the BGP update messages
  – To avoid exploring a set of related paths

• Handling the challenges
  – Timing details
  – Multiple routers and links per AS
  – … without excessive overhead

• Can we change the problem?
  – Server per AS that stores all candidate routes
  – Exchanging information about the root cause

Next Time: Protocol Divergence

• Two papers
  – “The Stable Paths Problem and Interdomain Routing”
  – “Stable Interdomain Routing Without Global Coordination”

• Review of the first paper only
  – Summary
  – Why accept
  – Why reject
  – Future work

• Optional NANOG video on “BGP Wedgies”