

Page 1: The Impact of Policy and Topology on Internet Routing Convergence

NANOG 20
October 23, 2000

Abha Ahuja
InterNap
ahuja@umich.edu

Craig Labovitz
Microsoft Research

*In collaboration with Roger Wattenhofer, Srinivasan Venkatachary, Madan Musuvathi

Page 2: Background

In NANOG 19, we showed BGP exhibits poor convergence behavior:

1) Measured convergence times of up to 20 minutes for BGP path changes/failures

2) Factorial (N!) theoretic upper bound on BGP convergence complexity (explore all paths of all possible lengths)

Open question: In practice, what topological and policy factors impact convergence delay?
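To give a feel for the factorial bound in point 2, here is a minimal sketch that counts the loop-free paths between two fixed nodes. It assumes a full mesh of n nodes, a worst-case topology chosen for illustration that is not stated on the slide. The function name `loop_free_path_count` is hypothetical.

```python
from math import factorial


def loop_free_path_count(n: int) -> int:
    """Count loop-free paths between two fixed nodes in a full mesh of n nodes.

    Assumption (not from the slide): every node is connected to every other
    node, which is the worst case for path exploration.
    """
    if n < 2:
        raise ValueError("need at least a source and a destination")
    m = n - 2  # nodes available as intermediate hops
    # An ordered choice of k distinct intermediate hops gives m! / (m - k)! paths.
    return sum(factorial(m) // factorial(m - k) for k in range(m + 1))


# Growth of the path count with the number of nodes.
for n in (3, 5, 10, 15):
    print(n, loop_free_path_count(n))
```

Even at ten nodes the count exceeds one hundred thousand, which is why the later slides ask how much of this space policy and topology actually leave reachable.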

Page 3: This Talk

Goal: Understand BGP convergence behavior under real topologies/policies

– Given a physical topology and ISP policies, can we estimate the time required for convergence?

– Do convergence behaviors of ISPs differ?

– How does steady-state topology compare to paths explored during failure?

– Can we change policies/topology to improve BGP convergence times?

Page 4: Experiments

• Analyzed secondary paths between 20 source/destination AS pairs
  – Inject and monitor BGP faults
  – Survey providers to determine policies behind paths

• To provide intuition, we will focus on faults injected into three ISPs at Mae-West
  – Observed faults via fourth ISP (in Japan)
  – Three ISPs roughly map onto tier1, tier2, tier3 providers
  – Results from these three ISPs representative of all data

Page 5: Comparing ISP Convergence Latencies

• CDF of faults injected into three Mae-West providers and observed at Japanese ISP

• Significant variations between providers

• Not related to geography

Page 6: Observed Fault Injection Topologies

• In steady state, the topologies of ISP1, ISP2, and ISP3 are similar: all are direct BGP peers of ISP4. This does not explain the variation on the previous slide.

[Figure: fault injection topology. ISP 1 (R1), ISP 2 (R2), and ISP 3 (R3) at MAE-WEST each have a steady-state path to ISP 4; a fault is marked at each of R1, R2, and R3.]

Page 7: Factors Impacting BGP Propagation

• Topology and policy shape the propagation graph (usually a DAG)

• Each AS router adds between 0 and 45 seconds of MinRouteAdver delay

• iBGP/Route Reflector

• MinRouteAdver and path race conditions affect which routes are chosen as backup routes

[Figure: routers A, B, C, and D, with an iBGP link]
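As a rough worked example of the per-router MinRouteAdver figure above, the sketch below computes the range of delay an update could accumulate across several routers. It assumes the per-router delays simply add along a path, which the slide does not state explicitly. The function name and the `max_delay_s` parameter are hypothetical.

```python
from typing import Tuple


def propagation_delay_bounds(num_routers: int, max_delay_s: float = 45.0) -> Tuple[float, float]:
    """Return (lower, upper) bounds in seconds on accumulated MinRouteAdver delay.

    Assumption: each router independently adds between 0 and max_delay_s
    seconds, and the delays add up along the path.
    """
    if num_routers < 0:
        raise ValueError("num_routers must be non-negative")
    return (0.0, num_routers * max_delay_s)


# Example: an update that crosses three routers.
low, high = propagation_delay_bounds(3)
print(f"between {low:.0f} and {high:.0f} seconds")
```

Under these assumptions, three routers alone can account for up to 135 seconds, the same order of magnitude as the failover times reported on the following slides.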

Page 8: ISP1-ISP4 Paths During Failure

• Only one backup path (length 3)

[Figure: ISP 1 (R1, fault injected) and ISP 4 joined by the steady-state path; a single backup path labeled P2 runs via ISP 5.]

96% Average: 92 (min/max 63/140) seconds

Announce AS4 AS5 AS1 (44 seconds)

Withdraw (92 seconds)

4% Average: 32 (min/max 27/38) seconds

Withdraw (32 seconds)

Page 9: ISP2-ISP4 Paths During Failure

[Figure: ISP 2 (R2, fault injected) and ISP 4 joined by the steady-state path; backup paths labeled P2, P3, and P4 involve ISP 5, ISP 6, and ISP 10 through ISP 13; one label reads "Vagabond".]

63% Average: 79 (min/max 44/208) seconds

Announce AS4 AS5 AS2 (35 seconds)

Withdraw (79 seconds)

7% Average: 88 (min/max 80/94) seconds

Announce AS4 AS5 AS2 (33 seconds)

Announce AS4 AS6 AS5 AS2 (61 seconds)

Withdraw (88 seconds)

7% Average: 54 (min/max 29/9) seconds

Withdraw (54 seconds)

23% Other

Page 10: ISP3-ISP4 Paths During Failure

[Figure: ISP 3 (R3, fault injected) and ISP 4 joined by the steady-state path; backup paths labeled P2 through P7 involve ISP 1, ISP 5, ISP 7, ISP 8, and ISP 9.]

36% Average: 110 (min/max 78/135) seconds

Announce AS4 AS5 AS (52 seconds)

Withdraw (110 seconds)

35% Average: 107 (min/max 91/133) seconds

Announce AS4 AS1 AS3 (39 seconds)

Announce AS4 AS5 AS3 (68 seconds)

Withdraw (107 seconds)

2% Average: 140 (min/max 120/142) seconds

Announce AS4 AS5 AS8 AS7 AS3 (27 seconds)

Announce AS4 AS5 AS9 AS8 AS7 AS3 (86 seconds)

Withdraw (140 seconds)

27% Other

Page 11: Why the Different Levels of Complexity?

• Provider relationship taxonomy
  – Transit relationships
    • customer/provider
    • customer sends their customer routes
    • provider sends default-free routing info (or default)
  – Peer relationships
    • Bilateral exchange of customer routes
  – Back-up transit
    • peer relationship becomes transit relationship based on failure

• These relationships constrain topology (no N! states) and determine number of possible backup paths
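The transit and peer rules above can be sketched as a route export filter. This is a minimal illustration of the taxonomy as listed, not any provider's actual configuration. The names `Rel` and `should_export` are hypothetical, an AS's own routes are assumed to be treated like customer routes, and back-up transit is not modeled.

```python
from enum import Enum


class Rel(Enum):
    """Relationship of a neighbor AS, as seen from the local AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def should_export(learned_from: Rel, neighbor: Rel) -> bool:
    """Decide whether a route learned from one neighbor is sent to another.

    learned_from: relationship of the neighbor the route was learned from.
    neighbor:     relationship of the neighbor the route would be sent to.
    """
    if neighbor is Rel.CUSTOMER:
        # Providers send customers default-free routing info: everything.
        return True
    # Peers and providers only receive customer routes.
    return learned_from is Rel.CUSTOMER


# A route learned from a peer is not passed on to another peer or a provider,
# which is how these relationships remove candidate backup paths.
print(should_export(Rel.PEER, Rel.PEER))
print(should_export(Rel.CUSTOMER, Rel.PROVIDER))
```

Every `False` result removes an edge that path exploration could otherwise use, which is the sense in which the relationships constrain the topology.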

Page 12: Convergence in the Real World

[Figure: five-node topology (nodes 1 to 5) with customer and peer links; destination X]

Longest path: 3 4 5 2 1

Possible paths for node 3:
2 1 x
4 2 1 x
(4 5 2 1 x)

Possible paths for node 4:
2 1 x
3 2 1 x
5 2 1 x

Page 13: Convergence in the Real World

[Figure: five-node topology (nodes 1 to 5) with customer and peer links; destination X; one label reads "Tier 1?"]

Longest path: 3 4 5 2 1

Possible paths for node 3:
2 1 x
4 5 2 1 x

Possible paths for node 4:
3 2 1 x
5 2 1 x

Hierarchy eliminates some states
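The path lists on the two "Convergence in the Real World" slides can be reproduced by enumerating loop-free paths. The sketch below is illustrative only: the figure did not survive, so the edge list is an assumption inferred from the listed paths, and the hierarchical case is modeled simply as the 2–4 adjacency being unusable. All names (`EDGES`, `simple_paths`) are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed adjacencies, inferred from the listed paths (not from the figure).
EDGES = [("x", "1"), ("1", "2"), ("2", "3"), ("2", "4"),
         ("2", "5"), ("3", "4"), ("4", "5")]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def simple_paths(adj: Dict[str, Set[str]], src: str, dst: str) -> List[List[str]]:
    """Enumerate all loop-free paths from src to dst by depth-first search."""
    found: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == dst:
            found.append(path)
            return
        for nxt in sorted(adj[node]):
            if nxt not in path:  # never revisit a node: loop-free paths only
                walk(nxt, path + [nxt])

    walk(src, [src])
    return found


unconstrained = build_adjacency(EDGES)
# Assumption: hierarchy makes the 2-4 adjacency unusable for these routes.
hierarchical = build_adjacency(e for e in EDGES if e != ("2", "4"))

for label, adj in (("unconstrained", unconstrained), ("hierarchical", hierarchical)):
    for node in ("3", "4"):
        paths = [" ".join(p[1:]) for p in simple_paths(adj, node, "x")]
        print(label, "node", node, paths)
```

Under these assumptions the unconstrained graph gives three candidate paths each for nodes 3 and 4, and the hierarchical variant gives two each, matching the lists on the slides.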

Page 14: Policy and Convergence

• Strict hierarchical relationships eliminate exploring some extra states
  – Policy controls the number of possible paths to explore.
  – But it turns out the number of paths does not matter…

Page 15: Relationship Between Backup Paths and Convergence

• Convergence is related to the length of the longest possible backup ASPath between two nodes

[Figure: plot with axis label "Longest Observed ASPath Between AS Pair"]

Page 16: So, what does all of this mean for convergence time?

• Convergence time is related to the length of the longest path that needs to be explored
  – Before fail-over, need to withdraw all alternative paths
  – This is bounded O(n) by length of the longest alternative path in the system
  – This longest path is related to policy
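The relationship can be checked informally against the numbers on the three per-ISP slides. The sketch below pairs, for each ISP, the longest backup ASPath announced during failover with the average time until the final withdraw in that update sequence. The values are copied from those slides; the names `OBSERVED`, `aspath_length`, and `summarize` are hypothetical.

```python
from typing import List, Tuple

# (ISP, longest backup ASPath announced, average seconds until final withdraw),
# copied from the per-ISP "Paths During Failure" slides.
OBSERVED = [
    ("ISP1", "AS4 AS5 AS1", 92),
    ("ISP2", "AS4 AS6 AS5 AS2", 88),
    ("ISP3", "AS4 AS5 AS9 AS8 AS7 AS3", 140),
]


def aspath_length(aspath: str) -> int:
    """Number of ASes in a space-separated ASPath string."""
    return len(aspath.split())


def summarize(observed: List[Tuple[str, str, int]]) -> List[Tuple[str, int, int]]:
    """Return (ISP, ASPath length, seconds) rows ordered by ASPath length."""
    rows = [(isp, aspath_length(path), secs) for isp, path, secs in observed]
    return sorted(rows, key=lambda row: row[1])


for isp, hops, seconds in summarize(OBSERVED):
    print(f"{isp}: longest backup ASPath = {hops} ASes, withdraw after {seconds} s")
```

Three data points only illustrate the trend: the ISP with the longest backup path takes the longest to converge, while the two shorter cases are close to each other (88 and 92 seconds).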

Page 17: Towards Millisecond BGP Convergence

Three possible solutions

1) Entirely new protocol

2) Turn off MinRouteAdver timer

3) “Tag” BGP updates

– Provide hint so nodes can detect bogus state information

Page 18: Further Information

C. Labovitz, R. Wattenhofer, A. Ahuja, S. Venkatachary, “The Impact of Topology and Policy on Delayed Internet Routing Convergence.” MSR Technical Report (number pending). June, 2000.

C. Labovitz, A. Ahuja, A. Bose, F. Jahanian, “Internet Delayed Routing Convergence.” To appear in Proceedings of ACM SIGCOMM. August, 2000.

Contact the authors by email for more information or to participate in the policy survey