Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala, Megan Elmore, Santosh Vempala


Path Splicing

Nick Feamster, Georgia Tech

Joint work with Murtaza Motiwala, Megan Elmore, Santosh Vempala


Internet Availability

Stanford University Clean-Slate Design for the Internet:

“It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. … It should be:

1. Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today.

2. …”

OK for email and the Web, but what about:

• E911 service
• Air traffic control
• …


Work to do…

• Various studies (Paxson, Andersen, etc.) show the Internet is at about 2.5 “nines”
• More “critical” (or at least availability-centric) applications on the Internet
• At the same time, the Internet is getting more difficult to debug
  – Scale, complexity, disconnection, etc.


Threats to Availability

• Natural disasters
• Physical failures (node, link)
• Router software bugs
• Misconfiguration
• Mis-coordination
• Denial-of-service (DoS) attacks
• Changes in traffic patterns (e.g., flash crowd)
• …


Idea: Backup/Multipath

• For intradomain routing
  – IP and MPLS fast re-route
  – Packet deflections [Yang 2006]
  – ECMP, NotVia, Loop-Free Alternates [Cisco]
• For interdomain routing
  – MIRO [Rexford 2006]
• Problem
  – Scale: Protecting against arbitrary failures requires storing lots of state, exchanging lots of messages
  – Control: End systems can’t signal when they think a path has “failed”


Backup Paths: Promise and Problems

• Bad: If a link fails on each of the two paths, s is disconnected from t

• Want: End systems remain connected unless the underlying graph has a cut

[Figure: example network between source s and destination t]


Path Splicing: Main Idea

• Step 1 (Generate slices): Run multiple instances of the routing protocol, each with slightly perturbed versions of the configuration

• Step 2 (Splice end-to-end paths): Allow traffic to switch between instances at any node in the protocol

[Figure: example network between source s and destination t]

Compute multiple forwarding trees per destination.
Allow packets to switch slices midstream.


Outline

• Path Splicing for Intradomain Routing
  – Generating slices
  – Constructing paths
  – Forwarding
  – Recovery
• Evaluation
  – Reliability and recovery
  – Stretch
  – Effects on traffic
• Path Splicing for Interdomain Routing
• Ongoing: Prototype and Deployment Paths


Generating Slices

• Goal: Each instance provides different paths
• Mechanism: Each edge is given a weight that is a slightly perturbed version of the original weight
  – Two schemes: Uniform and degree-based

[Figure: “Base” graph between s and t with link weights of 3, next to a perturbed graph with link weights 3.5, 4, 5, 1.5, 1.5, and 1.25]


How to Perturb the Link Weights?

• Uniform: Perturbation is a function of the initial weight of the link
• Degree-based: Perturbation is a linear function of the degrees of the incident nodes
  – Intuition: Deflect traffic away from nodes where traffic might tend to pass through by default
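
The two schemes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the slide says only that the uniform perturbation depends on the link's initial weight and that the degree-based one is linear in the endpoint degrees. The additive form `w + factor * U(0, w)`, the parameter names `scale`, `lo`, and `hi`, and the sample topology are all hypothetical.

```python
import random

def node_degrees(weights):
    """Count the links touching each node; keys of `weights` are (u, v) pairs."""
    deg = {}
    for u, v in weights:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def perturb_uniform(weights, scale, rng):
    """Assumed form: add up to `scale` times the original weight to every link."""
    return {e: w + scale * rng.uniform(0, w) for e, w in weights.items()}

def perturb_degree_based(weights, lo, hi, rng):
    """Assumed form: the perturbation factor grows linearly from `lo` to `hi`
    with the sum of the degrees of the link's two endpoints."""
    deg = node_degrees(weights)
    sums = {e: deg[e[0]] + deg[e[1]] for e in weights}
    smin, smax = min(sums.values()), max(sums.values())
    span = (smax - smin) or 1  # avoid division by zero on regular graphs
    out = {}
    for e, w in weights.items():
        factor = lo + (hi - lo) * (sums[e] - smin) / span
        out[e] = w + factor * rng.uniform(0, w)
    return out

# Hypothetical five-node topology with every base weight equal to 3.
base = {("s", "b"): 3.0, ("b", "a"): 3.0, ("b", "c"): 3.0,
        ("a", "t"): 3.0, ("c", "t"): 3.0}
rng = random.Random(1)
# Slice 0 keeps the original weights; the other slices are perturbed copies.
slices = [base] + [perturb_degree_based(base, 0.0, 3.0, rng) for _ in range(2)]
```

In this sketch, links adjacent to the high-degree node b receive the largest perturbations, which matches the stated intuition of deflecting traffic away from nodes it would pass through by default.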


Constructing Paths

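• Goal: Allow multiple instances to co-exist
• Mechanism: Virtual forwarding tables

[Figure: example network with nodes s, a, b, c, and t, showing one node's virtual forwarding tables (dst → next-hop). Slice 1: t → a. Slice 2: t → c.]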
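
One way to picture the virtual forwarding tables is to run a shortest-path computation once per slice and keep one next-hop table per slice. The sketch below assumes undirected links, uses Dijkstra's algorithm from the destination, and invents a topology (s–b, b–a, b–c, a–t, c–t) and weights that reproduce the two entries in the figure. The function names and the `tables[slice][dst][node]` layout are hypothetical.

```python
import heapq

def next_hops_to(dst, weights):
    """Return {node: next_hop} for shortest paths toward `dst`.

    Dijkstra is run from `dst` over undirected links, so the parent of a
    node in the resulting tree is that node's next hop toward `dst`.
    """
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {dst: 0.0}
    next_hop = {}
    heap = [(0.0, dst)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                next_hop[v] = u
                heapq.heappush(heap, (nd, v))
    return next_hop

def build_tables(slice_weights, destinations):
    """One virtual forwarding table per slice: tables[slice][dst][node]."""
    return [{d: next_hops_to(d, w) for d in destinations} for w in slice_weights]

# Hypothetical weights: slice 1 favors the path through a, slice 2 through c.
slice1 = {("s", "b"): 1, ("b", "a"): 1, ("a", "t"): 1, ("b", "c"): 2, ("c", "t"): 2}
slice2 = {("s", "b"): 1, ("b", "a"): 2, ("a", "t"): 2, ("b", "c"): 1, ("c", "t"): 1}
tables = build_tables([slice1, slice2], ["t"])
print(tables[0]["t"]["b"], tables[1]["t"]["b"])  # prints: a c
```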


Forwarding Traffic

• One approach: shim header with forwarding bits

• Routers use lg(k) bits to index forwarding tables
  – Shift bits after inspection

• To access different (or multiple) paths, end systems simply change the forwarding bits
  – Incremental deployment is trivial
  – Persistent loops cannot occur

• Other variations are possible
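
The shim-header approach can be sketched as follows. The header is modeled as an integer whose low-order bits are read first, each router consumes lg(k) bits and shifts them away, and an exhausted header falls back to slice 0. The bit order, the fallback, the function name `forward`, and the hand-written tables are assumptions made for illustration.

```python
def forward(src, dst, header_bits, k, tables, max_hops=64):
    """Walk a packet from src to dst, consuming lg(k) header bits per hop."""
    bits_per_hop = max(1, (k - 1).bit_length())  # ceil(lg k), at least 1
    mask = (1 << bits_per_hop) - 1
    node, path = src, [src]
    while node != dst and len(path) <= max_hops:
        slice_id = (header_bits & mask) % k  # inspect this hop's bits
        header_bits >>= bits_per_hop         # shift bits after inspection
        node = tables[slice_id][dst][node]
        path.append(node)
    return path

# Hypothetical tables for destination t: tables[slice][dst][node] = next hop.
tables = [
    {"t": {"s": "b", "b": "a", "a": "t", "c": "t"}},  # slice 0
    {"t": {"s": "b", "b": "c", "a": "t", "c": "t"}},  # slice 1
]
print(forward("s", "t", 0b00, 2, tables))  # ['s', 'b', 'a', 't']
print(forward("s", "t", 0b10, 2, tables))  # ['s', 'b', 'c', 't']
```

Because the header only ever shrinks, every packet in this sketch eventually follows a single slice's tree, which is one way to see why persistent loops cannot occur.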


Alternate Approach

• Use fewer bits per packet
• Each router along the path uses the same set of random bits as an input to select the next hop
• Advantages
  – Less per-packet overhead
• Disadvantages
  – Less direct control over path
  – No explicit prevention of loops
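
A minimal sketch of this variant: every router feeds the same packet bits, plus its own identifier, into a deterministic function that yields a slice index. The slides do not say which function is used. Hashing with SHA-256 and mixing in a router identifier are assumptions, as is the name `pick_slice`.

```python
import hashlib

def pick_slice(packet_bits, router_id, k):
    """Map the packet's fixed random bits and this router's id to a slice.

    The same packet_bits value is reused at every hop; only the router id
    differs, so no bits are consumed or shifted in the header.
    """
    data = f"{packet_bits}:{router_id}".encode()
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") % k

packet_bits = 0b1011  # hypothetical value chosen by the end system
choices = {router: pick_slice(packet_bits, router, 4) for router in ("s", "a", "b", "c")}
print(choices)
```

The end system can still change the path by changing `packet_bits`, but it cannot predict or dictate the slice used at a given hop, which is the loss of direct control noted in the list above.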


Recovery Mechanisms

• End-system recovery
  – Switch slices at every hop with probability 0.5
• Network-based recovery
  – Router switches to a random slice if next hop is unreachable
  – Continue for a fixed number of hops until destination is reached
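
Both recovery styles can be simulated on a toy network. In this sketch the end system draws a fresh per-hop slice sequence for each trial, switching slices at each hop with probability 0.5, while the network-based variant has a router pick a random slice whenever its current next hop is unreachable. The tables, the failed link, and all function names are hypothetical.

```python
import random

def walk(src, dst, slice_seq, tables, failed, max_hops=32):
    """Follow per-hop slice choices; return the path, or None if it hits a failed link."""
    node, path = src, [src]
    for hop in range(max_hops):
        if node == dst:
            return path
        s = slice_seq[hop] if hop < len(slice_seq) else 0
        nxt = tables[s][dst].get(node)
        if nxt is None or frozenset((node, nxt)) in failed:
            return None
        node = nxt
        path.append(node)
    return path if node == dst else None

def random_slice_sequence(k, length, rng, p_switch=0.5):
    """Start in slice 0 and switch to another slice at each hop with probability p_switch."""
    seq, cur = [], 0
    for _ in range(length):
        if k > 1 and rng.random() < p_switch:
            cur = rng.choice([s for s in range(k) if s != cur])
        seq.append(cur)
    return seq

def end_system_recovery(src, dst, tables, k, failed, rng, trials=5, hops=8):
    """Try up to `trials` random slice sequences; return (trial number, path) or None."""
    for attempt in range(1, trials + 1):
        path = walk(src, dst, random_slice_sequence(k, hops, rng), tables, failed)
        if path is not None:
            return attempt, path
    return None

def network_recovery(src, dst, tables, k, failed, rng, max_hops=32):
    """Routers pick a random slice when the next hop is unreachable, within a step budget."""
    node, path, cur = src, [src], 0
    for _ in range(max_hops):
        if node == dst:
            return path
        nxt = tables[cur][dst].get(node)
        if nxt is None or frozenset((node, nxt)) in failed:
            cur = rng.randrange(k)  # retry from the same node in a random slice
            continue
        node = nxt
        path.append(node)
    return path if node == dst else None

tables = [
    {"t": {"s": "b", "b": "a", "a": "t", "c": "t"}},  # slice 0
    {"t": {"s": "b", "b": "c", "a": "t", "c": "t"}},  # slice 1
]
failed = {frozenset(("b", "a"))}  # the default path s-b-a-t is broken
print(end_system_recovery("s", "t", tables, 2, failed, random.Random(7)))
print(network_recovery("s", "t", tables, 2, failed, random.Random(7)))
```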


Availability Evaluation: Two Aspects

• Reliability: Connectivity in the routing tables should approach that of the underlying graph
  – If two nodes s and t remain connected in the underlying graph, there is some sequence of hops in the routing tables that will result in traffic reaching t
• Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path
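
The reliability notion above can be checked mechanically: treat every (node → next hop) entry in every slice as a directed edge toward the destination, drop entries that use a failed link, and ask which nodes can still reach t. The sketch below does this with a reverse search from t. The tables and the failed link are hypothetical.

```python
def can_reach(dst, tables, failed):
    """Nodes that can reach dst using any slice at any hop, avoiding failed links."""
    # Reverse index: next hop -> nodes that forward to it in some slice.
    rev = {}
    for table in tables:
        for node, nxt in table[dst].items():
            if frozenset((node, nxt)) not in failed:
                rev.setdefault(nxt, set()).add(node)
    seen, stack = {dst}, [dst]
    while stack:
        u = stack.pop()
        for v in rev.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

tables = [
    {"t": {"s": "b", "b": "a", "a": "t", "c": "t"}},  # slice 0
    {"t": {"s": "b", "b": "c", "a": "t", "c": "t"}},  # slice 1
]
failed = {frozenset(("b", "a"))}
print(sorted(can_reach("t", tables[:1], failed)))  # ['a', 'c', 't']
print(sorted(can_reach("t", tables, failed)))      # ['a', 'b', 'c', 's', 't']
```

With a single slice, s and b are cut off by the failed link even though the underlying graph still connects them to t. Adding the second slice restores the connectivity of the underlying graph.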


Availability Evaluation

• A definition for reliability

• Does path splicing improve reliability?
  – How close can splicing get to the best possible reliability (i.e., that of the underlying graph)?
• Can path splicing enable fast recovery?
  – Can end systems (or intermediate nodes) find alternate paths fast enough?


Reliability Definition

• Reliability: the probability that, upon failing each edge with probability p, the graph remains connected

• Reliability curve: the fraction of source-destination pairs that remain connected for various link failure probabilities p

• The underlying graph has an underlying reliability (and reliability curve)
  – Goal: Reliability of routing system should approach that of the underlying graph.
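
A reliability curve for a graph can be estimated by Monte Carlo simulation: fail each edge independently with probability p, count the source-destination pairs that are no longer connected, and average over many trials. The sketch below uses a union-find structure for connectivity. The sample graph, trial count, and function names are hypothetical.

```python
import random
from itertools import combinations

def components(nodes, edges):
    """Union-find: map each node to a representative of its connected component."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return {n: find(n) for n in nodes}

def fraction_disconnected(nodes, edges, p, trials, rng):
    """Average fraction of node pairs disconnected when each edge fails with probability p."""
    pairs = list(combinations(nodes, 2))
    total = 0
    for _ in range(trials):
        alive = [e for e in edges if rng.random() >= p]
        comp = components(nodes, alive)
        total += sum(comp[a] != comp[b] for a, b in pairs)
    return total / (trials * len(pairs))

# Hypothetical graph: a four-node ring with one chord.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
rng = random.Random(0)
for p in (0.0, 0.05, 0.1, 0.2):
    print(p, fraction_disconnected(nodes, edges, p, 1000, rng))
```

Running the same estimate on only the edges that the routing tables can actually use gives the routing system's curve, which can then be compared against the curve of the underlying graph.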


Reliability Curve: Illustration

[Figure: reliability curve. x-axis: probability of link failure (p). y-axis: fraction of source-dest pairs disconnected. Annotation: “Better reliability”.]

More edges available to end systems -> Better reliability


Experimental Setup

• Evaluation on two topologies
  – GEANT (Real) and Sprint (Rocketfuel)

• Compute base graph by taking the union of k perturbed graphs

• Remove an edge from the base graph with probability p

• Compute number of pairs that could reach one another (average over 1,000 trials)


Reliability Approaches Optimal

• Sprint (Rocketfuel) topology
• 1,000 trials
• p indicates probability edge was removed from base graph

[Figure: reliability curves for the Sprint topology with degree-based perturbations. Annotations: “Reliability approaches optimal”; “Average stretch is only 1.3”.]


Simple Recovery Strategies Work Well

• Which paths can be recovered within 5 trials?
  – Sequential trials: 5 round-trip times
  – …but trials could also be made in parallel

Recovery approaches maximum possible

Adding a few more slices improves recovery beyond best possible reliability with fewer slices.


Significant Novelty for Modest Stretch

• Novelty: difference in nodes in a perturbed shortest path from the original shortest path

Example

[Figure: a short path and a long path between s and d]

Novelty: 1 – (1/3) = 2/3, where 1/3 is the fraction of edges on the short path shared with the long path
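
The example's arithmetic can be reproduced with a short function. It follows the edge-based form used in the example (one minus the fraction of the short path's edges that also appear on the long path). The node names and the two paths are invented to match the 1/3 overlap.

```python
def novelty(short_path, long_path):
    """1 minus the fraction of edges on short_path that are shared with long_path."""
    def edge_set(path):
        # Undirected edges between consecutive nodes on the path.
        return {frozenset(pair) for pair in zip(path, path[1:])}

    short_edges = edge_set(short_path)
    shared = short_edges & edge_set(long_path)
    return 1 - len(shared) / len(short_edges)

# Hypothetical paths: the short path has three edges and only s-a is shared.
short_path = ["s", "a", "b", "d"]
long_path = ["s", "a", "c", "e", "d"]
print(novelty(short_path, long_path))  # approximately 0.667
```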


Evaluation Summary: Splicing Can Improve Availability

• Reliability: Connectivity in the routing tables should approach that of the underlying graph
  – Approach: Overlay trees generated using random link-weight perturbations. Allow traffic to switch between them
  – Result: Splicing ~10 trees achieves near-optimal reliability
• Recovery: In case of failure, nodes should quickly be able to discover a new path
  – Approach: End nodes randomly select new bits
  – Result: Recovery within 5 trials approaches best possible.


Does Splicing Create Loops?

• Persistent loops are avoidable
  – In the simple scheme, path bits are exhausted from the header
  – Never switching back to the same
• Transient loops can still be a problem because they increase end-to-end delay (“stretch”)
  – Longer end-to-end paths
  – Wasted capacity
  – Two-hop loops do occur (around 1 in 100 trials for k=2, more for higher values of k), but can be avoided with the mechanisms above


Interactions with Traffic

Maximum utilization unaffected


Path Splicing for Interdomain Routing

• Observation: Many routers already learn multiple alternate routes to each destination.
• Idea: Use the bits to index into these alternate routes at an AS’s ingress and egress routers.

Required new functionality

• Storing multiple entries per prefix
• Indexing into them based on packet headers
• Selecting the “best” k routes for each destination

[Figure: default and alternate routes toward destination d. Caption: Splice paths at ingress and egress routers.]
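
The three pieces of new functionality can be illustrated with a toy forwarding table that keeps up to k routes per prefix and uses the packet's splicing bits to choose among them. Ranking candidates by AS-path length is a simplifying assumption, not the slides' selection rule. The prefix, AS numbers, and function names are hypothetical (taken from documentation ranges).

```python
def build_spliced_fib(candidates, k):
    """Keep the k highest-ranked next hops per prefix (ranked here by AS-path length)."""
    fib = {}
    for prefix, routes in candidates.items():
        ranked = sorted(routes, key=lambda r: len(r["as_path"]))
        fib[prefix] = [r["next_hop"] for r in ranked[:k]]
    return fib

def lookup(fib, prefix, splicing_bits):
    """Index into the stored alternates; entry 0 is the default route."""
    entries = fib[prefix]
    return entries[splicing_bits % len(entries)]

candidates = {
    "203.0.113.0/24": [
        {"next_hop": "AS64502", "as_path": [64502, 64505, 64510]},
        {"next_hop": "AS64501", "as_path": [64501, 64510]},
        {"next_hop": "AS64503", "as_path": [64503, 64504, 64505, 64510]},
    ]
}
fib = build_spliced_fib(candidates, k=2)
print(lookup(fib, "203.0.113.0/24", 0))  # AS64501 (default)
print(lookup(fib, "203.0.113.0/24", 1))  # AS64502 (alternate)
```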


Interdomain Splicing Header

• Intradomain bits function as before
• Interdomain: Three sections
  – Ingress and egress
  – Policy: restrict “illegal” entries in the forwarding table


Experimental Setup

• 2,500-node policy-annotated AS graph
• Use C-BGP to compute routes on base graph
• Remove each inter-AS edge with probability p
• Test connectivity between a random subset of AS pairs
• Compute base reliability without policy restrictions


Interdomain Splicing: Reliability

2-slice deployment approaches best possible


Incremental Deployment

Partial deployment provides some gains


Ongoing Work

• Software implementation
  – Click Element
  – PlanetLab/VINI deployment
• Open questions
  – What API should the network layer provide?
  – How to perform monitoring/failure detection?
• Extension to Cisco Multi-Topology Routing
  – In-progress IETF draft


Open Questions and Ongoing Work

• How does splicing interact with traffic engineering? Sources controlling traffic?

• What are the best mechanisms for generating slices and recovering paths?

• Can splicing eliminate dynamic routing?


Conclusion

• Simple: Forwarding bits provide access to different paths through the network

• Scalable: Exponential increase in available paths, linear increase in state

• Stable: Fast recovery does not require fast routing protocols
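
The scalability claim can be made concrete with a small calculation. If each of the hops on a path may use any of k slices, there are at most k^hops slice sequences, while each router stores only k tables. The figure is an upper bound, because slices often share next hops and so many sequences yield the same path. The function names and sample sizes are hypothetical.

```python
def spliced_path_upper_bound(k, hops):
    """Upper bound on distinct spliced paths: one of k slices chosen at each hop."""
    return k ** hops

def forwarding_entries(k, destinations):
    """Per-router state: one next-hop entry per destination in each of k slices."""
    return k * destinations

for k in (1, 2, 5, 10):
    print(k, forwarding_entries(k, 100), spliced_path_upper_bound(k, 4))
```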

http://www.cc.gatech.edu/~feamster/papers/splicing.pdf