MPLS in DC and inter-DC networks: the unified forwarding mechanism for network programmability at scale

Dmitry Afanasiev

Daniel Ginsburg

Network Architects

About Us

What is Yandex

• Founded in 1993
• NASDAQ: YNDX, market cap ~$12.5B
• One of Europe's largest internet companies and the leading search provider in Russia
• Over 60% of the local search market
• Monthly user audience of over 90 million worldwide
• Services: search, music, video, cloud storage, news, weather, maps, traffic, email, ads ...

Our Infrastructure

• We're a rather typical MS-DC
• Several DCs in Russia and abroad, plus an MPLS backbone to connect them
• About 100k servers and growing fast
• Mostly IPv6 internally, but we need to serve external IPv4
• Network architecture is a bit outdated and needs rethinking

In Search of New Arch

• It needs to be:
  – Scalable
  – Flexible
  – Programmable
• Lots of approaches out there, some get many things right…
• But not one combines all the right pieces in the right way
• It's really surprising, because the right combination seems almost inevitable.

Ideas to Build Upon

• Many of the ideas have been around for years (or even decades)
• Interconnection network topology – folded Clos
• Let the edge handle complexity
• Core just delivers packets edge to edge
• Overlay/underlay logical split
• Control: mix of centralized and distributed. Needs a nice way to combine both
• Simple commodity network elements
• Hierarchy and automation to scale the network

What’s missing

• All these ideas are well known, understood and almost universally accepted in the industry
• People are trying to implement them using a wild mix of data plane mechanisms
• And that introduces enormous complexity
• What's missing? A unified forwarding mechanism

Missing… or overlooked?

• Life is much easier when we don't have to deal with a multitude of data planes and forwarding mechanisms.
• Fortunately, there is already a well-known, well-understood, standardized forwarding plane mechanism upon which we can implement all those ideas without compromising their value.
• It has a well-defined and standardized mapping to many other popular forwarding planes.
• It's known as MPLS.

Unified Forwarding: Why and How

Flexibility

• Different data plane mechanisms – different features
• The unified data plane should be able to support all useful features and produce their combinations
• MPLS is very flexible:
  – forwarding over a pre-signalled virtual circuit à la ATM: this is what RSVP-TE does
  – source routing over a previously discovered topology à la Token Ring networks: see the Segment Routing proposal
  – hierarchical LPM à la IP: just split the address over several labels and allow routers to act on the topmost one (not that we suggest it is practical, but it is definitely possible)
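The source-routing case can be sketched in a few lines of Python. This is only an illustration of the idea, not Segment Routing itself: the topology, the node names and the label values are hypothetical, and each label is assumed to mean "forward over this particular link", so that every hop acts on the topmost label only and then pops it.

```python
# Hypothetical per-node label tables: label -> neighbour reached over that link.
ADJACENCY_LABELS = {
    "A": {101: "B", 102: "C"},
    "B": {201: "D"},
    "C": {301: "D"},
    "D": {},
}

def source_route(ingress, label_stack):
    """Walk a packet through the network, consuming one label per hop.

    label_stack[0] is the topmost label. Returns the list of visited nodes.
    """
    node, path = ingress, [ingress]
    for label in label_stack:
        node = ADJACENCY_LABELS[node][label]  # each hop looks at the topmost label only
        path.append(node)
    return path

print(source_route("A", [101, 201]))  # the ingress chose the path via B
print(source_route("A", [102, 301]))  # same destination, path via C
```

The path is decided entirely by whoever builds the stack; transit nodes hold only their own small label table.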

An Abstract Note on Semantics

• The best way to implement arbitrary semantics is to strip all semantics from the protocol headers and assign it externally
• Hardware works with protocol headers
• Control software defines the semantics

Combining Data Planes

• Why combine? To have the right features in the right place, or to produce useful combinations of features
• There are basically two ways to combine different data planes: stitch (interwork) them, or overlay them on top of each other

Stitching Data Planes

• It’s a pain
• Might be done only for a subset of protocol features
• Need to translate between protocols (complex, never perfect, loses information)
• Need to provision interworking points: fragile, an operational nightmare, and they create bottlenecks
• Seems nobody really does this anymore… Or maybe we still have to sometimes?

Overlaying Data Planes

• Overlay to: scale, virtualize, augment one data plane with properties of another
• Overlaying is building hierarchy
• But with multiple data planes it is limited and ad hoc
• Often ugly: IP over Ethernet over VXLAN over IP over Ethernet
• MPLS is intrinsically hierarchical (overlayable, if you will)

Hierarchy is your friend

• Many hierarchical structures are already in the network: topology, addressing, management and control
• Hierarchy is the most important and the most reliable way to scale things

Overlay/underlay split is a metaphor

• The ability to implement hierarchy natively lets us ditch the notion of a hard overlay/underlay boundary.
• In a stack of DC-label, ToR-label, port-label, slice-label, vm-label, where's the overlay/underlay boundary? Not in the packet
• Placement of the boundary depends only on how you structure your control

FEC is hierarchical

• Can be as granular or coarse-grained as one wishes. There's no network-imposed limitation
• Easy behavior aggregation: just add an extra label on top
• Easy behavior disaggregation: one can expose additional granularity by adding an extra label at the bottom
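A small Python sketch of aggregation by pushing a label on top. The label values and the [port label, VM label] layout are hypothetical; the point is only that a label-switching core keys on the topmost label, so one extra label hides everything beneath it.

```python
def aggregate(stacks, outer_label):
    """Push one common label on top of every stack (topmost label first)."""
    return [[outer_label] + stack for stack in stacks]

def core_state(stacks):
    """Distinct topmost labels, i.e. the entries a label-switching core has to hold."""
    return {stack[0] for stack in stacks}

# Three hypothetical per-VM stacks behind the same ToR: [port_label, vm_label].
per_vm = [[11, 501], [11, 502], [12, 503]]
tunneled = aggregate(per_vm, outer_label=9000)  # 9000: hypothetical ToR label

print(core_state(per_vm))    # what the core would need if it saw port labels
print(core_state(tunneled))  # what it needs once a ToR label sits on top
```

Disaggregation is the mirror image: appending a label at the bottom adds detail that only the final consumer of the stack ever looks at.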

How to Control MPLS

MPLS is complex?

• The MPLS control plane is notoriously complex
• Good news: you don’t have to use all of it, you can pick the good parts
• Classical distributed control is OK for transport
• Centralized control seems better for higher-level artifacts on the edge, sometimes called services
• Both styles can (and should) be combined

Enabling combinability

• The device has to be a bit smarter than in OpenFlow (OF)
• Gets parts of the label stack from different control plane components
• Assembles the full stack from those parts, using local logic to follow assembly instructions provided by the control plane
• Assembly instructions come in the form of references by “name”
• Assembly uses late binding

Enabling combinability – example

• MPLS VPN (abstraction A) refers to MPLS tunnels (abstraction B) using next-hop resolution.
• The resolution happens on the device itself, and the two control plane entities are loosely coupled: MPLS tunnels can change their paths, their assigned labels, etc., without MP-BGP caring about it
• The VPN abstraction refers to the tunnel abstraction using next-hops. The next-hop is the name by which one control plane abstraction refers to another
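The loose coupling in this example can be shown with two independent tables and a lookup that joins them only at resolution time. This is a minimal sketch, not a model of any real router: the prefix, the next-hop name "pe2" and all label values are hypothetical.

```python
# Written by the "service" control plane (e.g. MP-BGP):
# prefix -> (VPN label, next-hop name).
vpn_routes = {"10.1.0.0/16": (3001, "pe2")}

# Written independently by the "transport" control plane:
# next-hop name -> tunnel label currently in use.
tunnels = {"pe2": 17}

def label_stack(prefix):
    """Resolve the next-hop name at lookup time (late binding); topmost label first."""
    vpn_label, next_hop = vpn_routes[prefix]
    return [tunnels[next_hop], vpn_label]

before = label_stack("10.1.0.0/16")
tunnels["pe2"] = 42  # the tunnel is re-signalled and gets a new label
after = label_stack("10.1.0.0/16")
print(before, after)
```

The VPN table is never touched when the tunnel changes; only the name "pe2" ties the two abstractions together.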

Enabling Combinability – BGP 3107

• Recursive next-hop resolution with labeled routes (RFC 3107) is a powerful way to overlay one control plane abstraction over another
• Able to express almost anything we currently want. Still, a more expressive way is desired
• BGP 3107 is the way to interact with all-classically-controlled MPLS networks
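Recursive resolution can be illustrated with a single table in which every entry names its own next hop. The sketch below imitates only the resolution logic on the device, not the BGP protocol; the entry names, the labels and the `max_depth` guard are assumptions made for the example.

```python
# Each entry: name -> (label to push or None, next-hop name or None if directly connected).
# All names and labels are hypothetical.
ROUTES = {
    "vm-7": (501, "port-3"),       # could come from a controller
    "port-3": (11, "tor-12"),      # could come from a controller
    "tor-12": (9000, "uplink-1"),  # could come from the fabric
    "uplink-1": (None, None),      # directly connected interface
}

def resolve(name, routes=ROUTES, max_depth=8):
    """Follow next hops recursively; return (label stack, outgoing interface)."""
    labels = []
    for _ in range(max_depth):
        label, next_hop = routes[name]
        if next_hop is None:
            # Labels were collected innermost first; reverse to get topmost first.
            return labels[::-1], name
        labels.append(label)
        name = next_hop
    raise RuntimeError("next-hop resolution loop or chain too deep")

print(resolve("vm-7"))
```

Each layer of the stack can be owned by a different control plane component, as long as they agree on the next-hop names.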

Static Configuration

• If you can ensure that the labels at some point of the network always stay the same (because you assigned them to be so), you can use static configuration on the other side
• The way to go when one wants to avoid any signaling dependencies
• Static configuration can be calculated and disseminated automatically
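One possible way to "calculate" static mappings is to derive each label from an identifier that never changes. The sketch below assumes a hypothetical inventory with permanent numeric switch IDs and a hypothetical reserved label range; neither comes from the slides.

```python
LABEL_BASE = 100000  # hypothetical start of a statically reserved label range

# Hypothetical inventory: switch name -> permanent numeric ID.
INVENTORY = {"tor-1": 1, "tor-2": 2, "spine-1": 101}

def switch_label(switch_id):
    """A label that depends only on the permanent ID, so it never changes."""
    return LABEL_BASE + switch_id

def static_mappings(inventory):
    """Compute the label -> switch table that every edge device can be given."""
    return {switch_label(sid): name for name, sid in inventory.items()}

for label, name in sorted(static_mappings(INVENTORY).items()):
    print(label, name)
```

Because the mapping is a pure function of the inventory, adding a switch extends the table without changing any existing label, so the other side can be configured without signaling.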

Where should MPLS start?

• On the host! Or even right from the application
• Hypervisor switch is the easiest point. SW only, very flexible.
• Naturally fits centralized control
• Helps to scale. Lots of RAM, each element keeps only needed state
• Modern CPUs can forward 10s of Gbps without breaking a sweat

Looks SDNish

• A simple forwarding plane (3 simple ops)
• A simple software agent on the device (receives parts of the label stack from different control plane components, assembles the full stack, and programs the HW)
• Centralized and distributed control, or anything in between
• Combinability of different control plane components with late binding via names, which the device resolves
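Assuming the three operations meant here are the standard MPLS push, swap and pop, the whole forwarding plane fits in a few lines of Python. The label stack is modelled as a plain list with the topmost label first, and the label values are hypothetical.

```python
def push(stack, label):
    """Impose a new topmost label."""
    return [label] + stack

def swap(stack, label):
    """Replace the topmost label."""
    return [label] + stack[1:]

def pop(stack):
    """Remove the topmost label."""
    return stack[1:]

packet = push(push([], 501), 17)  # ingress: VM label first, then transport label on top
packet = swap(packet, 18)         # a transit LSR rewrites the transport label
packet = pop(packet)              # the last transit hop removes it
print(packet)
```

Everything else in the design is a question of who computes the labels and when they are bound, not of what the hardware has to do.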

What SDN is

• “Modularity based on abstraction is the way things get done” --Liskov
• “SDN ...Not a revolutionary technology... ...just a way of organizing network functionality” --Shenker
• “SDN is merely set of abstractions for control plane, not a specific set of mechanisms.” --Shenker
• “Most lasting legacy of SDN is not better datacenters - But better ways of reasoning about network control” --Shenker

How Ideas Map to MPLS

• Let the edge handle complexity – do it on the host
• Core just delivers packets edge to edge – hierarchy enables the devices to be agnostic to changes on the edge
• Overlay/underlay logical split – just a way to implement hierarchy
• Control: mix of centralized and distributed. Needs a nice way to combine both – yeah!
• Simple commodity network elements – cheap MPLS-capable silicon is finally there

NFV and Seamless MPLS

• The key point of Seamless MPLS (S-MPLS) was to extend MPLS to the access network and to separate transport and service in the MPLS network
• NFV describes how to host service nodes in the DC. If you don’t have MPLS in the DC, it’s no longer seamless
• The fix is obvious – extend MPLS into the DC
• Labels can carry additional metadata if one wants them to

Case Study: New Yandex DC

What do we need?

• Cheap and abundant bandwidth
• Scalable forwarding with minimal state
• Multitenancy (=> network virtualization)
• Efficient resource pooling
• Inter-DC traffic engineering
• Function chaining: load balancing, FW, etc.
• Interconnection with existing infrastructure
• Means to integrate all of the above
• Local response to some events, e.g. failures
• Automation at scale

What we don’t need

We are trying to keep the design really simple. We don’t need many functions often perceived as desirable:

• L2 (neither real nor emulated)
• VM mobility
  – In scale-out applications, nodes coming and going is the norm; there is no need to move them around while preserving state and identity
  – VM mobility increases complexity, as it depends on other features
• Multicast
• We don't have too many changes in topology

Components

• Host with vLER (MPLS-capable vRouter)
• Fabric switching elements – LSRs
• Centralized controller
• Legacy routers. Need to interwork with fabric LSRs and controller. BGP 3107 is the tool

Forwarding model

• 3-label stack: topmost for egress switch, next for egress port, bottom for VM
• vRouter uses {dst prefix, VRF} to impose the label stack
• Bottom label is processed by the destination vLER
• Expected state on a fabric switch: #switches_in_the_fabric + #local_access_ports
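The imposition step and the state estimate can be sketched as follows. This is an illustration under stated assumptions, not the vRouter's actual data structure: the VRF name, the prefixes (taken from the IPv6 documentation range), the label values, and the function names `impose` and `fabric_switch_state` are all hypothetical.

```python
import ipaddress

# Hypothetical vRouter table: (VRF, prefix) -> (switch label, port label, VM label).
FIB = {
    ("tenant-a", ipaddress.ip_network("2001:db8:a::/48")): (9000, 11, 500),
    ("tenant-a", ipaddress.ip_network("2001:db8:a:1::/64")): (9001, 12, 501),
}

def impose(vrf, dst):
    """Longest-prefix match within the VRF; returns the stack, topmost label first."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, stack) for (v, net), stack in FIB.items()
               if v == vrf and addr in net]
    if not matches:
        raise LookupError(f"no route to {dst} in {vrf}")
    return list(max(matches, key=lambda m: m[0])[1])

def fabric_switch_state(switches_in_fabric, local_access_ports):
    """Expected label entries on one fabric switch, per the formula above."""
    return switches_in_fabric + local_access_ports

print(impose("tenant-a", "2001:db8:a:1::5"))
print(fabric_switch_state(200, 48))
```

Note that the per-switch state does not depend on the number of VMs or tenants: those live only in the bottom label and in the vRouter tables.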

Control plane

• iBGP 3107 (in-path RR w/ NHS) inside the fabric for reachability and label distribution (draft-lapukhov…, but with iBGP and labels)
• iBGP 3107 to interwork with legacy routers
  – Session with the connected network element, with NHS, for the switch label
  – Session with the controller for the remaining labels, which bind to the switch label via the next hop
• Label mappings on the edge of the fabric are stable and can be provisioned rather than signaled
• Internal fabric failures are handled locally
• Label mappings on vRouters are distributed centrally

Why Now and What’s Next?

Why Now?

“The world is changed… I smell it in the air”

• A lot of similar ideas in the industry
• Seems that thinking converges on something
• But ... a lot of ugly ad hoc solutions are popping up here and there
• Better to implement a good solution before bad ones become entrenched
• It would be a shame and a missed opportunity to stick with VXLAN/… for years when we could get MPLS instead

What’s Ready?

• Merchant silicon is finally MPLS-capable. And the price is almost right.
• Modern CPUs can process tens of Mpps in SW, making host-based switching feasible.
• Several open source MPLS data plane implementations are emerging
• Several "classical" MPLS control plane components, such as BGP 3107, are very useful and have been there for quite a long time.

Gaps

• All RFC 3107 implementations are broken (multiple labels). Talk to your vendor
• Silicon is not perfect. Talk to your vendor
• A more expressive way to control late binding of control plane artifacts than BGP 3107
• Perception of MPLS as a complex technology. It's the current MPLS control plane that is complex
• Perception of MPLS as a WAN or metro technology

Thank you!

Questions?