Evolving New Configuration Tools for IOV Network Devices

Mitch Williams

Intel Corporation

Abstract:

I/O Virtualization (IOV) technology for network devices is still in its infancy. While the devices are

readily available, and drivers have been pushed into the kernel, configuration tools are few and far

between. Kernel maintainers and network administrators are still coming to terms with what types of

tools are required to make IOV network devices usable in the real world.

This paper describes the current state of these configuration tools, shows some use cases, and provides

an overview of future development work. It describes what works today, what's still missing, and what

can be done to address these issues.

Introduction

In 2009, Intel's LAN Access Division introduced two SR-IOV-capable devices: the Gigabit 82576, and

the 10 Gigabit 82599. Drivers for the 82576 (both Physical Function and Virtual Function) appeared in

the 2.6.30 Linux kernel. Drivers for the 82599 are currently under review and should be available in the

2.6.34 kernel.

Naturally, a driver alone does not make for a completely usable solution. So, also in 2009, both Citrix

and Red Hat shipped virtualization software with rudimentary support for SR-IOV and the 82576

drivers. These releases allowed average network administrators to experiment with SR-IOV, effectively

moving the research out of development laboratories and into the data center.

Problems soon arose, revealing numerous broken and missing features. The SR-IOV devices performed

well, easily beating emulated devices in both throughput and CPU utilization. However, they required a

lot of coddling to be usable. Proper configuration was time-consuming, and some common

functionality (e.g. VM migration) did not work at all with SR-IOV.

The remainder of this paper describes a number of common issues that have been seen, discusses the

impact of these issues, and describes what efforts are taking place to address them.

MAC Addresses

By far, the most significant issue for administrators is proper MAC address management. (This is true

even if said administrators don’t realize it!) Without proper management of MAC addresses by the VM

management tools, the network will never operate correctly.

In general, the PF driver assigns MAC addresses to each VF device upon PF driver load. For the Intel

devices (with which the author is most familiar), these addresses are generated randomly each time the

PF driver loads. Other devices may use fixed addresses, which lessens the severity of the problem, but

the issue still remains.

Naturally, when the MAC address is assigned randomly, it changes each time. This becomes

problematic for the OS running inside the VM, because the MAC address is what is used to uniquely

identify each network interface. Thus, to the OS running in the VM, it appears that the VF device is

constantly being replaced by a new device.
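
To make this behavior concrete, the following Python sketch generates VF addresses the way a driver might on each load: six random bytes, with the multicast bit cleared and the locally administered bit set in the first octet. The VF addresses in this paper's sample session (for example ee:d4:38:30:91:14) are consistent with that pattern. This is an illustration only, not the igb driver's code; the function names and the use of a seeded generator to stand in for one "driver load" are assumptions.

```python
import random
from typing import List


def random_vf_mac(rng: random.Random) -> str:
    """Return a random unicast, locally administered MAC address."""
    octets = [rng.randrange(256) for _ in range(6)]
    # Clear bit 0 (multicast) and set bit 1 (locally administered)
    # in the first octet, so the address is valid for a single interface.
    octets[0] = (octets[0] & 0xFE) | 0x02
    return ":".join(f"{o:02x}" for o in octets)


def simulate_driver_load(seed: int, num_vfs: int = 4) -> List[str]:
    """Hypothetical stand-in for one PF driver load: one fresh MAC per VF."""
    rng = random.Random(seed)
    return [random_vf_mac(rng) for _ in range(num_vfs)]


if __name__ == "__main__":
    first_boot = simulate_driver_load(seed=1)
    second_boot = simulate_driver_load(seed=2)
    # The guest sees a different address on VF 0 after every host reboot.
    print(first_boot[0], "->", second_boot[0])
```

Because nothing ties the new address to the previous one, the guest has no way to recognize the VF as the interface it configured earlier.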

For a network configured to use DHCP, this is an annoyance. After 10 boots, the hardware manager

will show 9 inactive NIC devices (never to be used again), and one active. The network will, in general,

continue to work because by default new interfaces are assumed to be configured with DHCP.

However, on a network using statically-assigned IP addresses, this proliferation of NIC devices is fatal.

Each time the host is rebooted, all of the VMs will find their “new” network interfaces and default to

DHCP. The statically-assigned IP address is effectively lost, and the new interface must be configured by

hand with the correct address. This process has to happen with each VM, each time the host is booted.

If the host is a headless server, often the only way to get to the VM is over the network, which is, at this

point, inoperable. Things rapidly become ugly.

Obviously, this situation can be avoided by giving each possible VF device its own assigned MAC

address, which some vendors may choose to do. However, this wastes a lot of the vendor’s assigned

MAC address space for virtual devices that may or may not ever be used.

Even this scheme eventually runs into problems, as soon as VM migration is introduced. Ideally, the

MAC address would migrate with the VM. If it does not, then the OS will see yet another “new”

interface, resulting in the same issues that we see with random MAC addresses. Furthermore, the issue

of duplicate MAC addresses now rears its ugly head. As soon as the host from which the VM has

migrated is rebooted, the original MAC address is now available for use again - but the VM that has

migrated is still using it. The inevitable loss of connectivity and the resulting confusion are highly

undesirable.

The solution is simple: the VM management tools need to manage MAC addresses for VF devices, just

like they do for emulated devices. This is the only way to reliably solve the problem.
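
As a rough illustration of what such management might look like, the Python sketch below keeps a mapping from VM name to MAC address, refuses duplicate addresses, and can be serialized so that a VM keeps its address across host reboots or takes it along on migration. The class VfMacRegistry, its methods, and the JSON format are all hypothetical; no existing management tool is implied.

```python
import json


class VfMacRegistry:
    """Hypothetical store mapping VM names to their assigned MAC addresses."""

    def __init__(self, assignments=None):
        self._by_vm = dict(assignments or {})

    def assign(self, vm_name: str, mac: str) -> None:
        """Record a MAC for a VM, refusing addresses already in use."""
        mac = mac.lower()
        for other_vm, other_mac in self._by_vm.items():
            if other_mac == mac and other_vm != vm_name:
                raise ValueError(f"{mac} is already assigned to {other_vm}")
        self._by_vm[vm_name] = mac

    def lookup(self, vm_name: str) -> str:
        """Return the MAC to program into whichever VF the VM is given."""
        return self._by_vm[vm_name]

    def dumps(self) -> str:
        """Serialize so the mapping survives a host reboot or follows a VM."""
        return json.dumps(self._by_vm, sort_keys=True)

    @classmethod
    def loads(cls, text: str) -> "VfMacRegistry":
        """Rebuild a registry from the output of dumps()."""
        return cls(json.loads(text))


if __name__ == "__main__":
    registry = VfMacRegistry()
    registry.assign("vm-web", "00:11:22:33:44:55")
    saved = registry.dumps()            # written out before the host reboots
    restored = VfMacRegistry.loads(saved)
    print(restored.lookup("vm-web"))    # same address after the reboot
```

The point of the design is that the address belongs to the VM rather than to the VF it happens to be attached to, which is how emulated devices are already handled.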

Toward this goal, the author has submitted a set of patches that modify the “ip” command to allow for

setting the VF MAC addresses at runtime. These patches have been accepted into the 2.6.34 kernel.
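
Assuming the syntax shown in this paper's sample session (“ip link set dev eth1 vf 0 mac 00:11:22:33:44:55”), a management script could build and run the command roughly as follows. The helper names are hypothetical, and actually executing the command requires root privileges, an SR-IOV-capable PF, and a kernel and iproute2 that include this support.

```python
import re
import subprocess
from typing import List

_MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def vf_mac_command(pf_dev: str, vf_index: int, mac: str) -> List[str]:
    """Build the argument list for setting one VF's MAC address."""
    if vf_index < 0:
        raise ValueError("VF index must be non-negative")
    if not _MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ["ip", "link", "set", "dev", pf_dev,
            "vf", str(vf_index), "mac", mac.lower()]


def set_vf_mac(pf_dev: str, vf_index: int, mac: str) -> None:
    """Run the command; raises CalledProcessError if 'ip' reports failure."""
    subprocess.run(vf_mac_command(pf_dev, vf_index, mac), check=True)


if __name__ == "__main__":
    # Print rather than execute, so the sketch is safe to run anywhere.
    print(" ".join(vf_mac_command("eth1", 0, "00:11:22:33:44:55")))
```

Passing an argument list rather than a shell string avoids quoting problems if a device name ever contains unexpected characters.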

VLAN Assignment

One common feature of most mature virtualization solutions is the ability to assign a VM to a given

VLAN. Traffic originating from this VM is tagged with the given VLAN ID before being sent on the

wire, and traffic sent to this VM is stripped of VLAN tags before being passed to the VM.

With an emulated device, the solution is simple – the emulation software (likely QEMU) adds and

strips the tags for each packet. It’s a simple solution, but can be CPU intensive, especially in high-

bandwidth environments. If configured properly, a VF device could do this work with no additional

CPU load.
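
The per-packet work involved can be sketched in a few lines. The Python below inserts and removes an IEEE 802.1Q tag (the TPID 0x8100 followed by a 16-bit TCI whose low 12 bits carry the VLAN ID) on a raw Ethernet frame. It illustrates the operation only and is not QEMU's implementation; the function names are made up for this example.

```python
import struct
from typing import Optional, Tuple

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MACs."""
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    tci = (priority << 13) | vlan_id  # PCP (3 bits), DEI = 0, VID (12 bits)
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def strip_vlan_tag(frame: bytes) -> Tuple[bytes, Optional[int]]:
    """Remove the tag if present; return (frame, VLAN ID or None)."""
    if len(frame) >= 16:
        tpid, tci = struct.unpack("!HH", frame[12:16])
        if tpid == TPID_8021Q:
            return frame[:12] + frame[16:], tci & 0x0FFF
    return frame, None


if __name__ == "__main__":
    dst = bytes.fromhex("ffffffffffff")
    src = bytes.fromhex("001122334455")
    untagged = dst + src + struct.pack("!H", 0x0800) + b"payload"
    tagged = add_vlan_tag(untagged, vlan_id=10)
    restored, vid = strip_vlan_tag(tagged)
    print(vid, restored == untagged)
```

Each call copies the frame, which hints at why doing this in software for every packet becomes expensive at high bandwidth.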

To put it simply, this feature is completely missing from current SR-IOV implementations. Even if the

hardware is able to perform this functionality, there is no way for the VM management software to

enable or configure the feature.

As with VF MAC address assignment, support for configuring a VF's VLAN from the host is present in the Linux 2.6.34 kernel.
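
In the sample session shown in this paper, a VF configured with a VLAN is reported by “ip link show” as “vf 0: MAC 00:11:22:33:44:55, vlan 10”. A management tool could read the setting back with a parser along the following lines. The regular expression targets only that line format, which is an assumption: other iproute2 versions may print VF lines differently, and parse_vf_line is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "vf 0: MAC 00:11:22:33:44:55, vlan 10";
# the ", vlan N" part is optional.
_VF_LINE = re.compile(
    r"^\s*vf (?P<index>\d+): MAC (?P<mac>[0-9a-f:]{17})"
    r"(?:, vlan (?P<vlan>\d+))?\s*$"
)


def parse_vf_line(line: str) -> Optional[Tuple[int, str, Optional[int]]]:
    """Return (VF index, MAC, VLAN ID or None), or None for other lines."""
    match = _VF_LINE.match(line)
    if match is None:
        return None
    vlan = match.group("vlan")
    return (int(match.group("index")), match.group("mac"),
            int(vlan) if vlan is not None else None)


if __name__ == "__main__":
    for text in ("    vf 0: MAC 00:11:22:33:44:55, vlan 10",
                 "    vf 1: MAC f2:ba:58:df:20:eb"):
        print(parse_vf_line(text))
```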

Switch Issues

To be at all functional, any network device implementing IOV needs to include some sort of on-board

network switch. Such a switch can be very simple, since it only needs to route packets between the

various VFs and the PF within the NIC. However, there are also a number of advanced (and quite

desirable) features available on these switches, for example anti-spoofing, rate limiting, storm control,

and port mirroring, among others.

One urgently-needed feature is support for adding additional MAC address filters to the on-board

switch. This will enable VF-enabled VMs to communicate with VMs not using VF devices. See Figure

1.

Figure 1. A typical virtualized networking layout for use with both SR-IOV and emulated

devices. Unless the NIC’s on-board switch is configured correctly, the VMs on the left

side will not be able to communicate with the VMs on the right side.
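
A simplified model makes the problem in Figure 1 easier to see. In the Python sketch below, the on-board switch delivers a frame to a VF or to the PF only if the destination MAC matches one of its filters; anything else goes out the physical port. The class, the port names, the forwarding rule, and the example address of the emulated VM are assumptions made for illustration and do not describe any particular NIC.

```python
class OnboardSwitch:
    """Toy model of a NIC's internal switch: exact-match MAC filters only."""

    WIRE = "wire"  # the external physical port

    def __init__(self):
        self._filters = {}  # destination MAC -> internal port name

    def add_filter(self, mac: str, port: str) -> None:
        """Steer frames addressed to this MAC to the given internal port."""
        self._filters[mac.lower()] = port

    def forward(self, dst_mac: str) -> str:
        """Return the port a frame with this destination is delivered to."""
        return self._filters.get(dst_mac.lower(), self.WIRE)


if __name__ == "__main__":
    switch = OnboardSwitch()
    switch.add_filter("00:11:22:33:44:55", "vf0")   # VM using a VF
    switch.add_filter("00:30:48:cc:6d:ef", "pf")    # the PF's own address

    emulated_vm = "52:54:00:12:34:56"  # VM behind the software bridge
    # Without a filter, the frame leaves the NIC instead of reaching the PF.
    print(switch.forward(emulated_vm))
    # An additional filter steers that MAC to the PF and on to the bridge.
    switch.add_filter(emulated_vm, "pf")
    print(switch.forward(emulated_vm))
```

In this model, a frame from a VF-attached VM to an emulated VM is sent to the wire until the extra filter is added, which is the failure the caption describes.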

It may also be desirable to completely disable the on-board switch in order to enable external loopback

through a switch (a “real” external switch) that supports this operation.

Currently, there is no way to configure the on-board switch in an IOV-enabled NIC. Work on this is in

the planning stages.

Current Solutions

In late 2009, the author (who was using fixed IP addresses on his lab systems, and was therefore

annoyed) began working on the MAC address issue, as it was apparent that this was the most pressing

issue at the time. After a flurry of emails, and a few face-to-face meetings with members of the Linux

community, work began on modifying the “ip” tool to add this support.

The “ip” tool is part of the Iproute2 package, which is included with every modern Linux distribution.

Despite the name, “ip” is used to configure many more settings than just IP addresses. In particular, the

“ip link” command can be used to set the MAC address (among other things) of a network interface. It

was natural, then, to extend this interface to allow for VF configuration.

[Figure 1 diagram. Labels recovered from the original: Physical NIC, PF Device, four VF Devices, On-board switch, Software Bridge, four Emulated Devices, eight Virtual Machines, The Real World.]

There are three pieces that must all be present for this to work: the driver modifications, the new kernel

interface, and the updated “ip” utility. Once this is done, an administrator can easily view or modify VF

settings from the command line:

    root@avec:~# ip link show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
        link/ether 00:30:48:cc:6d:ee brd ff:ff:ff:ff:ff:ff
        vf 0: MAC ee:d4:38:30:91:14
        vf 1: MAC 2a:da:93:d7:7d:ec
        vf 2: MAC aa:c6:38:a5:8b:a5
        vf 3: MAC 4a:d2:64:7c:a8:17
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
        link/ether 00:30:48:cc:6d:ef brd ff:ff:ff:ff:ff:ff
        vf 0: MAC 82:bc:b8:fa:bf:2c
        vf 1: MAC f2:ba:58:df:20:eb
        vf 2: MAC 9e:27:b3:ea:e4:9f
        vf 3: MAC 7e:06:c8:ba:27:39
    root@avec:~# ip link set dev eth1 vf 0 mac 00:11:22:33:44:55
    root@avec:~# ip link set dev eth1 vf 0 vlan 10
    root@avec:~# ip link show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
        link/ether 00:30:48:cc:6d:ee brd ff:ff:ff:ff:ff:ff
        vf 0: MAC ee:d4:38:30:91:14
        vf 1: MAC 2a:da:93:d7:7d:ec
        vf 2: MAC aa:c6:38:a5:8b:a5
        vf 3: MAC 4a:d2:64:7c:a8:17
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
        link/ether 00:30:48:cc:6d:ef brd ff:ff:ff:ff:ff:ff
        vf 0: MAC 00:11:22:33:44:55, vlan 10
        vf 1: MAC f2:ba:58:df:20:eb
        vf 2: MAC 9e:27:b3:ea:e4:9f
        vf 3: MAC 7e:06:c8:ba:27:39
    root@avec:~#

Currently, this functionality is supported in the igb driver for the 82576. Once the kernel is released,

similar support will be added to the ixgbe driver for the 82599.

Future Development

Work is actively progressing on the issues discussed above. Once the first set of patches (described

above) is upstream, work will begin to solve the switch configuration issues. It is likely that this work

will be done in conjunction with the Open vSwitch project (http://openvswitch.org/).

These tools, while functional and necessary, are still not a complete solution. Support for these low-

level tools needs to be added to the higher-level VM management tools for both KVM and Xen. When

that work is done, we will finally have a full top-to-bottom solution for IOV NIC management.

As is customary in the Open Source community, comments (and help) are always appreciated.

References

Intel Wired Ethernet project:

http://sourceforge.net/projects/e1000/ (also has datasheets for the 82576 and 82599 parts)

Iproute2 information and links:

http://www.linuxfoundation.org/collaborate/workgroups/networking/iproute2

Original kernel patches accepted into the 2.6.34 tree:

http://marc.info/?l=git-commits-head&m=126756050804468&w=2

http://marc.info/?l=git-commits-head&m=126756050904471&w=2

http://marc.info/?l=git-commits-head&m=126756051704516&w=2

http://marc.info/?l=git-commits-head&m=126756051704518&w=2

http://marc.info/?l=git-commits-head&m=126756051204484&w=2

iproute2 patches as posted to linux-netdev mailing list:

http://marc.info/?l=linux-netdev&m=126580241203138&w=2

http://marc.info/?l=linux-netdev&m=126580243203209&w=2

http://marc.info/?l=linux-netdev&m=126580245403264&w=2

http://marc.info/?l=linux-netdev&m=126580247303306&w=2

Xen Cloud Platform, including basic SR-IOV support:

http://www.xen.org/products/cloudxen.html

Red Hat Technical Overview of RHEL 5.4, including SR-IOV support:

http://www.redhat.com/f/pdf/rhel/RHEL5_4-technical-overview.pdf