25
Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

Embed Size (px)

Citation preview

Page 1: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

Improving Internet Availabilitywith Path Splicing

Nick FeamsterGeorgia Tech

Joint work with Murtaza Motiwala and Santosh Vempala

Page 2: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

2

“It is not difficult to create a list of desired characteristics for a new Internet. Deciding how to design and deploy a network that achieves these goals is much harder. Over time, our list will evolve. It should be:

1. Robust and available. The network should be as robust, fault-tolerant and available as the wire-line telephone network is today.

2. …

Can the Internet be “Always On”?

• Various studies (Paxson, etc.) show the Internet is at about 2.5 “nines”

• More “critical” (or at least availability-centric) applications on the Internet

• At the same time, the Internet is getting more difficult to debug– Scale, complexity, disconnection, etc.

Page 3: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

3

Availability of Other Services

• Carrier Airlines (2002 FAA Fact Book)– 41 accidents, 6.7M departures– 99.9993% availability

• 911 Phone service (1993 NRIC report +)– 29 minutes per year per line– 99.994% availability

• Std. Phone service (various sources)– 53+ minutes per line per year– 99.99+% availability

Page 4: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

4

Threats to Availability

• Natural disasters• Physical failures (node, link)• Implementation bugs• Misconfiguration• Mis-coordination• Denial-of-service (DoS) attacks• Changes in traffic patterns (e.g., flash crowd)• …

Page 5: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

5

Availability: Two Aspects

• Reliability: Connectivity in the routing tables should approach the that of the underlying graph– If two nodes s and t remain connected in the

underlying graph, there is some sequence of hops in the routing tables that will result in traffic

• Recovery: In case of failure (i.e., link or node removal), nodes should quickly be able to discover a new path

Page 6: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

6

Multipath: Promise and Problems

• Bad: If any link fails on both paths, s is disconnected from t

• Want: End systems remain connected unless the underlying graph has a cut

ts

Page 7: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

7

Path Splicing: Main Idea

• Step 1 (Perturbations): Run multiple instances of the routing protocol, each with slightly perturbed versions of the configuration

• Step 2 (Slicing): Allow traffic to switch between instances at any node in the protocol

ts

Compute multiple forwarding trees per destination.Allow packets to switch slices midstream.

Page 8: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

8

Generating Slices

• Goal: Each instance provides different paths• Mechanism: Each edge is given a weight that is

a slightly perturbed version of the original weight– Two schemes: Uniform and degree-based

ts

3

3

3

“Base” Graph

ts

3.5

4

5 1.5

1.5

1.25

Perturbed Graph

Page 9: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

9

Constructing Paths

• Goal: Allow multiple instances to co-exist• Mechanism: Virtual forwarding tables

a

t

c

s b

t a

t c

Slice 1

Slice 2

dst next-hop

Page 10: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

10

Forwarding Traffic

• Packet has shim header with forwarding bits

• Routers use lg(k) bits to index forwarding tables– Shift bits after inspection

• To access different (or multiple) paths, end systems simply change the forwarding bits– Incremental deployment is trivial– Persistent loops cannot occur

• Various optimizations are possible

Page 11: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

11

Evaluation

• Defining reliability

• Does path splicing improve reliability?– How close can splicing get to the best possible

reliability (i.e., that of the underlying graph)?

• Can path splicing enable fast recovery?– Can end systems (or intermediate nodes) find

alternate paths fast enough?

Page 12: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

12

Defining Reliability

• Reliability: the probability that, upon failing each edge with probability p, the graph remains connected

• Reliability curve: the fraction of source-destination pairs that remain connected for various link failure probabilities p

• The underlying graph has an underlying reliability (and reliability curve)– Goal: Reliability of routing system should approach that of the underlying graph.

Page 13: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

13

Reliability Curve: Illustration

Probability of link failure (p)

Fraction of source-dest pairs disconnected

Better reliability

More edges available to end systems -> Better reliability

Page 14: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

14

Reliability Approaches Optimal• Sprint (Rocketfuel) topology• 1,000 trials• p indicates probability edge was removed from base graph

Reliability approaches optimal

Average stretch is only 1.3

Sprint topology,degree-based perturbations

Page 15: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

15

Recovery: Two Mechanisms

• End-system recovery– Switch slices at every hop with probability 0.5

• Network-based recovery– Router switches to a random slice if next hop is

unreachable– Continue for a fixed number of hops till

destination is reached

15

Page 16: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

16

Simple Recovery Strategies Work Well

• Which paths can be recovered within 5 trials?– Sequential trials: 5 round-trip times– …but trials could also be made in parallel

Recovery approaches maximum possible

Adding a few more slices improves recovery beyond best possible reliability with fewer slices.

Page 17: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

17

Splicing Improves Availability

• Reliability: Connectivity in the routing tables should approach the that of the underlying graph– Approach: Overlay trees generated using random link-

weight perturbations. Allow traffic to switch between them

– Result: Splicing ~ 10 trees achieves near-optimal reliability

• Recovery: In case of failure, nodes should quickly be able to discover a new path– Approach: End nodes randomly select new bits– Result: Recovery within 5 trials approaches best possible

Page 18: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

18

Open Questions

• How does splicing interact with traffic engineering? Sources controlling traffic?

• What are the best mechanisms for generating slices and recovering paths?

• Can splicing eliminate dynamic routing?

Page 19: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

19

Conclusion• Simple: Forwarding bits provide access to

different paths through the network

• Scalable: Exponential increase in available paths, linear increase in state

• Stable: Fast recovery does not require fast routing protocols

http://www.cc.gatech.edu/~feamster/papers/splicing-hotnets.pdf

Page 20: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

20

Page 21: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

21

How to Perturb the Link Weights?

• Uniform: Perturbation is a function of the initial weight of the link

• Degree-based: Perturbation is a linear function of the degrees of the incident nodes– Intuition: Deflect traffic away from nodes where traffic

might tend to pass through by default

Page 22: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

22

Putting It Together

• End system sets forwarding bits in packet header• Forwarding bits specify slice to be used at any hop• Router: examines/shifts forwarding bits, and forwards

ts

Page 23: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

23

What About Loops?

• Persistent loops are avoidable– In the simple scheme, path bits are exhausted from

the header– Never switching back to the same

• Transient loops can still be a problem because they increase end-to-end delay (“stretch”)– Longer end-to-end paths– Wasted capacity

Page 24: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

24

Significant Novelty for Modest Stretch

• Novelty: difference in nodes in a perturbed shortest path from the original shortest path

Example

s d

Novelty: 1 – (1/3) = 2/3

Fraction of edges on short path shared with long path

Page 25: Improving Internet Availability with Path Splicing Nick Feamster Georgia Tech Joint work with Murtaza Motiwala and Santosh Vempala

25

Extension: Interdomain Paths• Observation: Many routers already learn multiple

alternate routes to each destination.• Idea: Use the forwarding bits to index into these

alternate routes at an AS’s ingress and egress routers.

• Storing multiple entries per prefix • Indexing into them based on packet headers• Selecting the “best” k routes for each destination

Required new functionality

ddefault

alternate

Splice paths at ingress and egress routers