Virtual ROuters On the Move (VROOM): Live Router Migration as a Network-Management Primitive Yi Wang, Eric Keller, Brian Biskeborn, Kobus van der Merwe, Jennifer Rexford



Page 1: Yi Wang, Eric Keller, Brian Biskeborn, Kobus van der Merwe, Jennifer Rexford

Virtual ROuters On the Move (VROOM): Live Router Migration as a Network-Management Primitive

Yi Wang, Eric Keller, Brian Biskeborn, Kobus van der Merwe, Jennifer Rexford

Page 2

Virtual ROuters On the Move (VROOM)

• Key idea
– Routers should be free to roam around

• Useful for many different applications
– Simplify network maintenance
– Simplify service deployment and evolution
– Reduce power consumption
– …

• Feasible in practice
– No performance impact on data traffic
– No visible impact on control-plane protocols

Page 3

The Two Notions of “Router”

• The IP-layer logical functionality, and the physical equipment

[Figure: the logical (IP-layer) router and the physical equipment]

Page 4

The Tight Coupling of Physical & Logical

• Root of many network-management challenges (and “point solutions”)

[Figure: the logical (IP-layer) router tightly coupled to the physical equipment]

Page 5

VROOM: Breaking the Coupling

• Re-mapping the logical node to another physical node

[Figure: the logical (IP-layer) router re-mapped to a different physical node]

VROOM enables this re-mapping of logical to physical through virtual router migration.

Page 6

Case 1: Planned Maintenance

• NO reconfiguration of VRs, NO reconvergence

[Figure: virtual router VR-1 moves from physical node A to physical node B]

Page 9

Case 2: Service Deployment & Evolution

• Move a (logical) router to more powerful hardware


Page 10

Case 2: Service Deployment & Evolution

• VROOM guarantees seamless service to existing customers during the migration


Page 11

Case 3: Power Savings

• Electricity bills of hundreds of millions of dollars per year

Page 12

Case 3: Power Savings


• Contract and expand the physical network according to the traffic volume

Page 15

Virtual Router Migration: the Challenges

1. Migrate an entire virtual router instance
• All control-plane & data-plane processes / states

Page 16

Virtual Router Migration: the Challenges

1. Migrate an entire virtual router instance
2. Minimize disruption
• Data plane: millions of packets/second on a 10 Gbps link
• Control plane: less strict (with routing-message retransmission)

Page 17

Virtual Router Migration: the Challenges

1. Migrate an entire virtual router instance
2. Minimize disruption
3. Link migration

Page 19

VROOM Architecture


Dynamic Interface Binding

Data-Plane Hypervisor

Page 20

• Key idea: separate the migration of the control and data planes

1. Migrate the control plane
2. Clone the data plane
3. Migrate the links

VROOM’s Migration Process
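The three phases above can be sketched as a simple orchestration plan. This is an illustrative sketch only, assuming hypothetical step names and a toy virtual-router record; it is not the prototype's actual control code.

```python
# Sketch of VROOM's three-phase migration plan (illustrative names only).

def migrate_virtual_router(vr, src, dst):
    """Build the ordered step list for migrating virtual router `vr`
    from physical router `src` to physical router `dst`."""
    steps = []

    # 1. Move the control plane (routing processes, config, memory).
    steps.append(("migrate-control-plane", vr["name"], src, dst))

    # 2. Clone the data plane: repopulate the FIB on `dst` while the
    #    old data plane on `src` keeps forwarding.
    steps.append(("clone-data-plane", vr["name"], dst))

    # 3. Migrate links one by one; with both data planes ready, the
    #    links can move asynchronously.
    for link in vr["links"]:
        steps.append(("migrate-link", link, src, dst))

    return steps

vr = {"name": "VR-1", "links": ["A", "B"]}
plan = migrate_virtual_router(vr, "router-A", "router-B")
```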

Page 21

• Leverage virtual server migration techniques
• Router image
– Binaries, configuration files, etc.

Control-Plane Migration

Page 22

• Leverage virtual server migration techniques
• Router image
• Memory
– 1st stage: iterative pre-copy
– 2nd stage: stall-and-copy (when the control plane is “frozen”)

Control-Plane Migration
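The two-stage memory copy above can be sketched as a loop: copy while the control plane runs, re-copying whatever got dirtied, until the remaining dirty set is small enough to stall-and-copy. This is a toy model; the `dirtied` callback and the threshold are illustrative assumptions, not the OpenVZ implementation.

```python
# Toy sketch of iterative pre-copy followed by stall-and-copy.
# `dirtied(r)` is a hypothetical tracker returning the pages that were
# re-dirtied while round r was being copied.

def migrate_memory(pages, dirtied, max_rounds=30, threshold=50):
    """Stage 1: iterative pre-copy while the control plane keeps running.
    Returns (rounds, remaining); the remaining pages are copied in
    stage 2, with the control plane frozen."""
    dirty = set(pages)                 # the first round copies everything
    rounds = 0
    while len(dirty) > threshold and rounds < max_rounds:
        dirty = set(dirtied(rounds))   # pages touched during that copy
        rounds += 1
    return rounds, len(dirty)

# Example: the dirty set roughly halves every round.
rounds, remaining = migrate_memory(range(1000), lambda r: range(1000 >> (r + 1)))
```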

Page 23

• Leverage virtual server migration techniques
• Router image
• Memory

[Figure: the control plane (CP) migrates from physical router A to physical router B; the data plane (DP) remains on A]

Page 24

• Clone the data plane by repopulation
– Enable migration across different data planes
– Eliminate synchronization issues between the control & data planes

Data-Plane Cloning

[Figure: the control plane (CP), now on physical router B, repopulates DP-new while DP-old on physical router A keeps forwarding]
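The repopulation approach above can be sketched in a few lines: rather than copying FIB state directly, the control plane re-announces every route and a platform-specific driver installs it, which is what lets the clone land on a different kind of data plane. The `install` callback here is an illustrative stand-in for a real driver.

```python
# Sketch of data-plane cloning by repopulation (illustrative).

def clone_data_plane(rib, install):
    """Build a fresh FIB from the RIB via a platform-specific `install`
    callback (e.g. a Linux kernel route add, or a NetFPGA table write)."""
    new_fib = {}
    for prefix, next_hop in rib.items():
        install(new_fib, prefix, next_hop)
    return new_fib

def generic_install(fib, prefix, next_hop):
    fib[prefix] = next_hop   # stands in for a platform-specific driver

rib = {"10.0.0.0/8": "if0", "192.168.0.0/16": "if1"}
fib = clone_data_plane(rib, generic_install)
```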

Page 25

• Data-plane cloning takes time
– Installing 250k routes takes over 20 seconds*
• The control & old data planes need to be kept “online”
• Solution: redirect routing messages through tunnels

Remote Control Plane

*: P. Francois, et al., “Achieving sub-second IGP convergence in large IP networks,” ACM SIGCOMM CCR, vol. 35, no. 3, 2005.

[Figure: routing messages arriving at physical router A are redirected through a tunnel to the remote control plane (CP) on physical router B]
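The redirection logic above amounts to a simple dispatch: once the control plane has moved, control messages that still arrive at the old router are tunneled to it. A toy sketch, with `deliver` and the tunnel list as illustrative names rather than the prototype's API:

```python
# Toy sketch of the remote-control-plane redirection.

def deliver(msg, control_plane_migrated, tunnel):
    """Hand a routing message to the local control plane, or tunnel it
    to the migrated control plane on the new physical router."""
    if control_plane_migrated:
        tunnel.append(msg)          # redirect to the CP on router B
        return "tunneled-to-B"
    return "delivered-locally"

tunnel = []
r1 = deliver("OSPF-LSA", control_plane_migrated=True, tunnel=tunnel)
r2 = deliver("BGP-UPDATE", control_plane_migrated=False, tunnel=tunnel)
```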

Page 28

• At the end of data-plane cloning, both data planes are ready to forward traffic

Double Data Planes

[Figure: both DP-old and DP-new attached to the single control plane (CP)]

Page 29

• With the double data planes, links can be migrated independently

Asynchronous Link Migration

[Figure: links from neighbors A and B are moved one at a time from DP-old to DP-new]
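Because both data planes can forward, the per-link switchover can happen in any order and on any schedule. The sketch below (illustrative, not the prototype's code) records the ownership snapshot after each switchover; every intermediate state is a valid forwarding configuration.

```python
# Sketch of asynchronous, per-link migration between the two data planes.

def migrate_links(links):
    """Switch each link from DP-old to DP-new independently, returning
    the ownership snapshot after every step."""
    owner = {link: "DP-old" for link in links}
    snapshots = []
    for link in links:
        owner[link] = "DP-new"        # this link now forwards via DP-new;
        snapshots.append(dict(owner)) # the rest still forward via DP-old
    return snapshots

snaps = migrate_links(["A", "B"])
```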

Page 30

• Control plane: OpenVZ + Quagga
• Data plane: two prototypes
– Software-based data plane (SD): Linux kernel
– Hardware-based data plane (HD): NetFPGA

• Why two prototypes?
– To validate the data-plane hypervisor design (e.g., migration between SD and HD)

Prototype Implementation

Page 31

• Performance of individual migration steps
• Impact on data traffic
• Impact on routing protocols

• Experiments on Emulab

Evaluation

Page 33

• The diamond testbed

Impact on Data Traffic

[Figure: the diamond testbed — nodes n0 through n3, with the virtual router VR]

Page 34

• SD router w/ separate migration bandwidth
– Slight delay increase due to CPU contention
• HD router w/ separate migration bandwidth
– No delay increase or packet loss

Impact on Data Traffic

Page 35

• The Abilene-topology testbed


Impact on Routing Protocols

Page 36

• Introduce LSAs by flapping link VR2–VR3
– Miss at most one LSA
– Get retransmission 5 seconds later (the default LSA retransmission timer)
– Can use a smaller LSA retransmission interval (e.g., 1 second)

Core Router Migration: OSPF Only

Page 37

• Average control-plane downtime: 3.56 seconds
– Performance lower bound
• OSPF and BGP adjacencies stay up
• Default timer values
– OSPF hello interval: 10 seconds
– BGP keep-alive interval: 60 seconds

Edge Router Migration: OSPF + BGP
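The timer arithmetic behind "adjacencies stay up" can be made explicit: the measured downtime is far below the intervals after which a neighbor declares the session dead. The multipliers below are common defaults (OSPF dead interval = 4x hello; a BGP hold time of 3x keepalive) stated as assumptions, not measurements from the slides.

```python
# Back-of-the-envelope check: does a 3.56 s control-plane outage
# stay below the protocols' dead/hold intervals?

OSPF_HELLO = 10       # seconds; OSPF dead interval defaults to 4x hello
BGP_KEEPALIVE = 60    # seconds; BGP hold time is commonly 3x keepalive

downtime = 3.56       # measured average control-plane downtime (seconds)

ospf_ok = downtime < 4 * OSPF_HELLO      # 3.56 s < 40 s
bgp_ok = downtime < 3 * BGP_KEEPALIVE    # 3.56 s < 180 s
```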

Page 38

Where To Migrate

• Physical constraints
– Latency
• E.g., NYC to Washington D.C.: 2 msec
– Link capacity
• Enough remaining capacity for the extra traffic
– Platform compatibility
• Routers from different vendors
– Router capability
• E.g., number of access control lists (ACLs) supported
• The constraints simplify the placement problem
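The constraints above can be applied as a simple feasibility filter over candidate physical routers; whatever survives the filter is the (much smaller) search space for placement. All field names here are hypothetical, chosen only to mirror the four constraints listed.

```python
# Sketch of the placement feasibility check (hypothetical field names).

def feasible_targets(vr, candidates, max_latency_ms=2.0):
    """Keep only candidate physical routers that satisfy every
    constraint: latency, spare capacity, platform, and capability."""
    return [c["name"] for c in candidates
            if c["latency_ms"] <= max_latency_ms
            and c["spare_gbps"] >= vr["traffic_gbps"]
            and c["platform"] in vr["platforms"]
            and c["max_acls"] >= vr["acls"]]

vr = {"traffic_gbps": 4, "platforms": {"SD", "HD"}, "acls": 100}
candidates = [
    {"name": "dc1", "latency_ms": 2.0, "spare_gbps": 10,
     "platform": "HD", "max_acls": 500},
    {"name": "chi", "latency_ms": 9.0, "spare_gbps": 10,
     "platform": "HD", "max_acls": 500},   # too far away
]
targets = feasible_targets(vr, candidates)
```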

Page 39

Conclusions & Future Work

• VROOM: a useful network-management primitive
– Breaks the tight coupling between the physical and the logical
– Simplifies network management, enables new applications
– No data-plane or control-plane disruption

• Future work
– Migration scheduling as an optimization problem
– Other applications of router migration
• Handling unplanned failures
• Traffic engineering

Page 40

Thanks!
Questions & Comments?

[email protected]


Page 41

Packet-aware Access Network


Page 42

Packet-aware Access Network

• Pseudo-wires (virtual circuits) from CE to PE

[Figure: CE and PE connected across the access network]

P/G-MSS: Packet-aware/Gateway Multi-Service Switch
MSE: Multi-Service Edge

Page 43

Events During Migration

• Network failure during migration
– The old VR image is not deleted until the migration is confirmed successful

• Routing messages arrive during the migration of the control plane
– BGP: TCP retransmission
– OSPF: LSA retransmission

Page 45

Requirements & Enabling Technologies

3. Migrate links affixed to the virtual routers
• Enabled by: programmable transport networks
– Long-haul links are reconfigurable
• Layer-3 point-to-point links are multi-hop at layer 1/2

[Figure: a programmable transport network spanning Chicago, New York, and Washington D.C.; nodes are multi-service optical switches (e.g., Ciena CoreDirector)]

Page 46

Requirements & Enabling Technologies

4. Enable edge router migration
• Enabled by: packet-aware access networks
– Access links are becoming inherently virtualized
• Customers connect to provider edge (PE) routers via pseudo-wires (virtual circuits)
• Physical interfaces on PE routers can be shared by multiple customers

[Figure: from a dedicated physical interface per customer to a shared physical interface]

Page 47

• With programmable transport networks, long-haul links are reconfigurable
– IP-layer point-to-point links are multi-hop at the transport layer

• VROOM leverages this capability in a new way to enable link migration

Link Migration in Transport Networks


Page 48

2. With packet-aware transport networks
– Logical links share the same physical port
• Packet-aware access network (pseudo-wires)
• Packet-aware IP transport network (tunnels)

Link Migration in Flexible Transport Networks


Page 49

The Out-of-the-box OpenVZ Approach

• Packets are forwarded inside each VE
• When a VE is being migrated, packets are dropped

Page 50

Putting It All Together: Realizing Migration

1. The migration program notifies shadowd about the completion of the control plane migration

Page 51

Putting It All Together: Realizing Migration

2. shadowd requests zebra to resend all the routes, and pushes them down to virtd

Page 52

Putting It All Together: Realizing Migration

3. virtd installs the routes into the new FIB, while continuing to update the old FIB

Page 53

Putting It All Together: Realizing Migration

4. virtd notifies the migration program to start link migration after it finishes populating the new FIB

5. After link migration is completed, the migration program notifies virtd to stop updating the old FIB
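The dual-FIB behavior in steps 3 and 5 can be sketched as a tiny state machine: virtd mirrors every route into both FIBs until link migration completes, then freezes the old one. A toy model with illustrative method names, not the prototype's actual virtd interface:

```python
# Sketch of virtd's dual-FIB update logic (steps 3-5, illustrative).

class Virtd:
    def __init__(self):
        self.old_fib, self.new_fib = {}, {}
        self.update_old = True

    def install(self, prefix, next_hop):
        self.new_fib[prefix] = next_hop
        if self.update_old:            # step 3: keep DP-old consistent too
            self.old_fib[prefix] = next_hop

    def links_migrated(self):
        self.update_old = False        # step 5: stop updating the old FIB

v = Virtd()
v.install("10.0.0.0/8", "if0")         # mirrored into both FIBs
v.links_migrated()
v.install("10.1.0.0/16", "if1")        # lands only in the new FIB
```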

Page 54

Power Consumption of Routers

Vendor  | Model | Power (watts)
Cisco   | CRS-1 | 10,920
Cisco   | 12416 | 4,212
Cisco   | 7613  | 4,000
Juniper | T1600 | 9,100
Juniper | T640  | 6,500
Juniper | M320  | 3,150

A synthetic large tier-1 ISP backbone
• 50 POPs (Points of Presence)
• 20 major POPs, each with:
– 6 backbone routers, 6 peering routers, 30 access routers
• 30 smaller POPs, each with:
– 6 access routers

Page 55


Future Work
• Algorithms that solve the constrained optimization problems
• Control-plane hypervisor to enable cross-vendor migration

Page 56


Performance of Migration Steps

• Memory copy time, with different numbers of routes (dump-file sizes)

[Plot: time (0–6 seconds) vs. number of routes (0 to 500k); stacked components: suspend + dump, copy dump file, undump + resume, bridging setup]

Page 57


Performance of Migration Steps

• FIB population time
– Grows linearly w.r.t. the number of route entries
– Installing a FIB entry into NetFPGA: 7.4 microseconds
– Installing a FIB entry into the Linux kernel: 1.94 milliseconds

• FIB update time: time for virtd to install entries into the FIB
• Total time: FIB update time + time for shadowd to send routes to virtd
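Since FIB population grows linearly, the measured per-entry costs extrapolate directly. A naive estimate (our extrapolation, ignoring batching and message overhead):

```python
# Naive linear extrapolation from the measured per-entry install costs.

NETFPGA_S_PER_ENTRY = 7.4e-6   # seconds per FIB entry (NetFPGA)
LINUX_S_PER_ENTRY = 1.94e-3    # seconds per FIB entry (Linux kernel)

def fib_population_time(n_routes, per_entry_s):
    """Estimated time to install n_routes entries, assuming pure
    linear scaling with no batching or per-message overhead."""
    return n_routes * per_entry_s

netfpga_250k = fib_population_time(250_000, NETFPGA_S_PER_ENTRY)  # ~1.85 s
```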

Page 58


The Importance of Separate Migration Bandwidth

The dumbbell testbed

250k routes in the RIB

Page 59


Separate Migration Bandwidth is Important

Throughput of the migration traffic

Page 60


Separate Migration Bandwidth is Important

Delay increase of the data traffic

Page 61


Separate Migration Bandwidth is Important

Loss rate of the data traffic