Neutron L2 and L3 agents: How They Work and How Kilo Improves Them
Carl Baldwin, Rossella Sblendido / May 18, 2014

L2 and L3 agent restructure



Typical OpenStack Deployment


L2 Agent


L2 Agent

• Runs on compute node

• Configures the local virtual bridges (br-int, br-tun)

• Wires new devices

• Applies Security Group Rules

• Communicates with the Neutron server over RPC


When a VM is created...


Agent loop events

• OVSDB monitor has updates

• Neutron server messages
  – Security groups change (rule updated, member added, provider rule updated)
  – Port update

• OVS restarted


Detect port changes

• OVSDB monitor signals if something has changed on the host

• OVS agent scans all the ports in the machine

• It keeps track of the ports it has already processed using an internal dict (registered_ports)

• Diff registered_ports with the result of the scanning → infer devices added and deleted
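The diff step above can be sketched in a few lines. This is only an illustration, not the agent's actual code: it models registered_ports and the scan result as plain Python sets, and the port names and the `diff_ports` helper are hypothetical.

```python
def diff_ports(registered_ports, scanned_ports):
    """Return (added, removed) by comparing the last known ports with a new scan."""
    added = scanned_ports - registered_ports      # seen now, not seen before
    removed = registered_ports - scanned_ports    # seen before, gone now
    return added, removed


# Hypothetical port names, for illustration only.
registered_ports = {"tap-a", "tap-b"}
scanned_ports = {"tap-b", "tap-c"}

added, removed = diff_ports(registered_ports, scanned_ports)
print(sorted(added), sorted(removed))

# After processing, the scan result becomes the new baseline.
registered_ports = set(scanned_ports)
```

Note that the cost of this approach grows with the number of ports on the host, because every iteration rescans everything.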


Process network ports – Port added

• request the device details

• provision local VLAN and install proper flows

• set up port filters

• update_device_up


Process network ports – Port deleted

• Remove filters

• update_device_down

• Reclaim the local VLAN if it was the last device using it
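The provision and reclaim steps for local VLANs can be pictured as simple bookkeeping. The sketch below assumes one local VLAN per network and uses a hypothetical `LocalVlanManager` class; it leaves out flows, filters, and the RPC status updates entirely.

```python
class LocalVlanManager:
    """Toy bookkeeping: one local VLAN per network, reclaimed with its last port."""

    def __init__(self, vlan_ids):
        # Sorted in reverse so that pop() hands out the lowest free id first.
        self.available = sorted(vlan_ids, reverse=True)
        self.by_network = {}  # network id -> (vlan id, set of port ids)

    def port_added(self, net_id, port_id):
        if net_id not in self.by_network:
            # First port on this network: provision a local VLAN.
            self.by_network[net_id] = (self.available.pop(), set())
        vlan, ports = self.by_network[net_id]
        ports.add(port_id)
        return vlan

    def port_deleted(self, net_id, port_id):
        vlan, ports = self.by_network[net_id]
        ports.discard(port_id)
        if not ports:
            # Last port is gone: give the VLAN back.
            del self.by_network[net_id]
            self.available.append(vlan)


mgr = LocalVlanManager(range(1, 4))
vlan_p1 = mgr.port_added("net-1", "p1")
vlan_p2 = mgr.port_added("net-1", "p2")
mgr.port_deleted("net-1", "p1")
still_provisioned = "net-1" in mgr.by_network
mgr.port_deleted("net-1", "p2")
```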


Processing Neutron server messages

• Updated port: same process as added ports

• Security group changes: filters are reapplied for all affected devices


OVS restarted

• Detected using a canary flow

• Reconfigure bridges

• registered_ports is cleared, all ports are reprocessed
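The canary idea can be shown with an in-memory stand-in for the switch. Everything here is an assumption made for illustration: `FakeBridge`, the table number, and the helper names are hypothetical, and a real agent would query OVS rather than a Python set.

```python
CANARY_TABLE = 23  # illustrative table number, chosen for this sketch


class FakeBridge:
    """In-memory stand-in for an OVS bridge; a restart wipes all flows."""

    def __init__(self):
        self.flow_tables = set()

    def add_flow(self, table):
        self.flow_tables.add(table)

    def has_flow(self, table):
        return table in self.flow_tables

    def restart(self):
        self.flow_tables.clear()


def install_canary(bridge):
    bridge.add_flow(CANARY_TABLE)


def ovs_restarted(bridge):
    # If the canary flow is missing, the switch lost its flows: treat as a restart.
    return not bridge.has_flow(CANARY_TABLE)


bridge = FakeBridge()
install_canary(bridge)
before = ovs_restarted(bridge)
bridge.restart()
after = ovs_restarted(bridge)
```

Once a restart is detected, the agent would reconfigure the bridges, reinstall the canary, and reprocess every port.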


What if an exception is thrown?

• registered_ports is cleared, all the ports are reprocessed

• Full resync!
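A minimal sketch of this behavior, assuming a hypothetical `process_new_ports` helper and a handler that fails for one port: a single failure discards all known state, so the next iteration reprocesses every port, including those that were fine.

```python
def process_new_ports(registered_ports, scanned_ports, handle_port):
    """Process unseen ports; on any error forget everything (full resync)."""
    try:
        for port in sorted(scanned_ports - registered_ports):
            handle_port(port)
            registered_ports.add(port)
    except Exception:
        # One failure throws away all state, not just the failing port.
        registered_ports.clear()
        return False
    return True


def flaky_handler(port):
    # Hypothetical failure, for illustration only.
    if port == "tap-c":
        raise RuntimeError("wiring failed")


registered = {"tap-a"}
scanned = {"tap-a", "tap-b", "tap-c"}

ok = process_new_ports(registered, scanned, flaky_handler)
state_after_failure = set(registered)

# The next iteration has to reprocess every port, even the healthy ones.
processed = []
ok_retry = process_new_ports(registered, scanned, processed.append)
```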


© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The OpenStack TM attribution statement should used: The OpenStack wordmark and the Square O Design, together or part, are trademarks or registered trademarks of OpenStack Foundation in the United States and other countries, and are used with the OpenStack Foundation’s permission.

L3 Agent


Deployment

• Network Hosts
  – Legacy with 1 Agent
  – HA with more than 1 Agent
  – DVR
    • Centralized part is like Legacy
  – API Available to manage association

• Compute Hosts
  – DVR
    • Distributed part bound to multiple hypervisors


L3 Agent

• Receives update notifications for routers

• Router Processing Queue
  – Prioritize user actions so agent is responsive
  – Less priority to full sync

• Sends status updates

Router                                  Status
51812f4e-e0a8-479a-a116-f588cb020b91    Processing…
5b80e13e-cd2d-40d6-aaea-856bcc4242f6    Processing…
d95effe5-11ca-4450-ba45-615e40d159c6    Processing…
e50750d2-42e3-4e34-888f-cef236a993f7    Processing…
be19c28c-6789-44ce-bb29-8dd4a9944deb    Waiting…
6f81708c-404e-4738-a21c-73eb2b8c2599    Waiting…
4206b114-2e97-4963-9a5d-140cfec95977    Waiting…
…                                       …
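The prioritization described for the router processing queue can be sketched with the standard library's `queue.PriorityQueue`. The class name, the priority constants, and the router ids below are hypothetical; the point is only that user-triggered updates are served before full-sync work while arrival order is kept within each priority.

```python
import itertools
import queue

PRIORITY_USER_ACTION = 0   # served first
PRIORITY_FULL_SYNC = 1     # served when nothing more urgent is waiting


class RouterProcessingQueue:
    """Serve router updates by priority, then by arrival order."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order

    def add(self, router_id, priority):
        self._queue.put((priority, next(self._arrival), router_id))

    def drain(self):
        order = []
        while not self._queue.empty():
            _priority, _arrival, router_id = self._queue.get_nowait()
            order.append(router_id)
        return order


rpq = RouterProcessingQueue()
rpq.add("router-sync-1", PRIORITY_FULL_SYNC)
rpq.add("router-user", PRIORITY_USER_ACTION)
rpq.add("router-sync-2", PRIORITY_FULL_SYNC)
order = rpq.drain()
print(order)
```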


Router Internals

• Network namespaces (ip netns)

• L2 Interfaces moved into namespace
  – OVS port
  – Veth pair (virtual cables)

• IP address configured on interfaces

• Simple routing and extra routes

• Iptables for NAT and metadata

• Proxy for metadata access

• External access for instances without floating IP

• Advanced Services
  – FWaaS
  – VPNaaS


Compute Host L3 Agent

• DVR only
  – Floating IPs for north/south IPv4 routing
  – East/west IPv4 routing

• FWaaS Integrated (partially)

[Diagram: VM1-1 and VM2-1, each shown with QRouter-X, br-int, patch-tun, and eth0]


L2 Agent Restructuring


Restructuring work

• Get more info from OVSDB monitor

• Improve RPC calls

• Improve resync


Get events from the OVSDB monitor

• Improve OvsdbMonitor so that it can pass the agent the devices that were added or deleted

• The agent consumes the events instead of scanning all the ports every time
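A rough sketch of the event-driven approach, under the assumption that the monitor delivers (action, port) pairs. The event format, the `consume_events` helper, and the port names are hypothetical, not the OvsdbMonitor interface.

```python
def consume_events(registered_ports, events):
    """Fold a batch of ("added" | "removed", port) events into the known state."""
    added, removed = set(), set()
    for action, port in events:
        if action == "added":
            added.add(port)
            removed.discard(port)
        elif action == "removed":
            removed.add(port)
            added.discard(port)

    added -= registered_ports     # ignore ports the agent already knows
    removed &= registered_ports   # ignore ports the agent never processed
    registered_ports |= added
    registered_ports -= removed
    return added, removed


registered = {"tap-a"}
events = [
    ("added", "tap-b"),
    ("added", "tap-c"),
    ("removed", "tap-c"),   # appeared and vanished within one batch
    ("removed", "tap-a"),
]
added, removed = consume_events(registered, events)
print(sorted(added), sorted(removed), sorted(registered))
```

The work done per iteration now depends on the number of events, not on the number of ports on the host.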


Improve RPC calls

• Use a bulk call to update the status (up/down) of several devices

• Add a parameter: failed devices

• Don't refresh all the devices when security_groups_provider_updated is received, only those affected

• Add the attributes modified in port update so that the L2 agent can decide if reprocessing is needed
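The last point can be illustrated with a small predicate. Which attributes actually require reprocessing is not stated in the slides, so the set below is purely an assumption, as is the `needs_reprocessing` name.

```python
# Hypothetical set of attributes whose change affects wiring or filtering.
REPROCESS_ATTRS = frozenset({"admin_state_up", "security_groups"})


def needs_reprocessing(updated_attrs):
    """True if any attribute in the port update matters to the L2 agent."""
    return not REPROCESS_ATTRS.isdisjoint(updated_attrs)


rename_only = needs_reprocessing(["name"])
groups_changed = needs_reprocessing(["name", "security_groups"])
```

With the modified attributes included in the message, a port rename no longer has to trigger the same work as a security group change.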


Improve resync

• Don't resync all the devices if an error occurs

• Add a parameter in the RPC calls that collects the devices that caused an error

• The OVS agent can resync only the devices that failed

• The operation can be retried or the failure ignored
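The bulk call and the targeted resync fit together roughly as follows. This is a sketch under stated assumptions: the function name, the result keys, and the `set_status` callback are illustrative and do not reproduce the real RPC signature.

```python
def update_device_list(devices_up, devices_down, set_status):
    """Update many devices in one call and report the ones that failed."""
    result = {"failed_devices_up": [], "failed_devices_down": []}
    for status, devices in (("up", devices_up), ("down", devices_down)):
        for device in devices:
            try:
                set_status(device, status)
            except Exception:
                result["failed_devices_" + status].append(device)
    return result


def set_status(device, status):
    # Hypothetical server-side failure for one device.
    if device == "tap-bad":
        raise RuntimeError("port not found")


result = update_device_list(["tap-a", "tap-bad"], ["tap-old"], set_status)

# Only the failed devices are scheduled for another attempt.
devices_to_resync = set(result["failed_devices_up"]) | set(result["failed_devices_down"])
print(sorted(devices_to_resync))
```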


Did this improve the situation? Let's test!

• VM running Devstack

• Rally scenario

    "args": {
        "flavor": {
            "name": "m1.tiny"
        },
        "image": {
            "name": "cirros-0.3.4-x86_64-uec"
        }
    },
    "runner": {
        "concurrency": 2,
        "times": 20,
        "type": "constant"
    }


Results before


Results after


It worked!

• Min time 0.6% better

• Avg time 4% better

• 95th percentile 5.9% better


There's still work to do...

• Use the OVS Python library for the OVSDB monitor instead of the command line

• Create a queue of events to be processed so that multiple workers can be introduced

• Add priority to events so that higher priority events can be processed first

• Improve state convergence between agent and the server (resilience in case of failure)


L3 Agent Restructuring


Handyman Model

• One big file, one object: the agent

• Jack of all trades
  – Worse: it was a bit forgetful


Contractor Model

• Time to move to a contractor model
  – Agent is the contractor
  – Calls in specialists to do the work
  – One contractor for network node, other for hypervisor


Specialists

• New specialist for each type of router
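The contractor and specialist split can be pictured as an agent that delegates to one class per router type. The class names, the type strings, and the step descriptions below are hypothetical placeholders chosen for this sketch, not the names used in the Neutron tree.

```python
class RouterInfo:
    """Base specialist: work common to every router type."""

    def __init__(self, router_id):
        self.router_id = router_id

    def process(self):
        return ["common router setup"]


class HaRouter(RouterInfo):
    def process(self):
        return super().process() + ["HA-specific setup"]


class DvrRouter(RouterInfo):
    def process(self):
        return super().process() + ["DVR-specific setup"]


class L3Agent:
    """The contractor: picks a specialist and delegates the work to it."""

    specialists = {"legacy": RouterInfo, "ha": HaRouter, "dvr": DvrRouter}

    def __init__(self):
        self.routers = {}

    def router_updated(self, router_id, router_type):
        if router_id not in self.routers:
            self.routers[router_id] = self.specialists[router_type](router_id)
        return self.routers[router_id].process()


agent = L3Agent()
steps = agent.router_updated("r1", "dvr")
print(steps)
```

In this arrangement the agent no longer needs to know the details of each router type; supporting a new type means adding a specialist rather than growing the agent.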


More Specialized


Future Work

• Eliminate full sync on router

• Too much internal state

• Simplify DVR

• L3 VPN

• Eliminate IPv4 waste

• DVR for IPv6


Thank you!