OPENSTACK COMPUTE 101
Libvirt/KVM Driver Update
Stephen Gordon (@xsgordon)
Sr. Technical Product Manager
OpenStack Compute
● Execution and management of compute workloads
● Relatively technology-agnostic (VMs, bare metal, containers)
● Pluggable virtualization/container backends:
○ Libvirt (KVM, LXC, Parallels CT, Parallels VM, QEMU, Xen), Ironic, Hyper-V, VMware vCenter, XenServer, etc.
○ http://docs.openstack.org/developer/nova/support-matrix.html
Components
● RESTful nova-api interface exposed on TCP port 8774.
● AMQP message queue used for RPC communications.
● nova-scheduler handles hypervisor selection for instance placement.
● nova-conductor handles database access.
Components (cont.)
● nova-compute acts as the Compute agent, interacting with the relevant hypervisor APIs to launch/manage guests.
Libvirt/KVM
● Driver used for 85% of production OpenStack deployments. [1]
● Free and Open Source Software end-to-end stack:
○ Libvirt - Abstraction layer providing an API for hypervisor and virtual machine lifecycle management. Supports many hypervisors and architectures.
○ QEMU - Machine emulator that can run guests using dynamic binary translation or, with hypervisor assistance (e.g. KVM), hardware-assisted virtualization.
○ KVM - Kernel-based Virtual Machine, a kernel module providing full virtualization for the Linux kernel.
● Why Libvirt instead of speaking straight to QEMU?
[1] http://superuser.openstack.org/articles/openstack-users-share-how-their-deployments-stack-up
Why Libvirt?
The QEMU command line that Libvirt generated for a single Nova guest:
$ LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name instance-00000007 -S -machine pc-i440fx-rhel7.1.0,accel=tcg,usb=off \
-m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -object memory-backend-ram,size=2048M,id=ram-node0,host-nodes=1,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -uuid 57d7852e-0286-4913-bd7e-f897c5197d21 -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Nova,version=2014.2.2-19.el7ost,serial=c3758f33-342b-4350-adf0-a67798b56209,uuid=57d7852e-0286-4913-bd7e-f897c5197d21' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000007.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/57d7852e-0286-4913-bd7e-f897c5197d21/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=25,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:45:de:c3,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/57d7852e-0286-4913-bd7e-f897c5197d21/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
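To illustrate the alternative, here is a minimal sketch that describes a guest declaratively as Libvirt domain XML and leaves generating the QEMU command line to Libvirt. It uses only Python's xml.etree.ElementTree; the helper name build_domain_xml and the chosen elements are illustrative assumptions, and the result is far smaller than what Nova's driver actually generates.

```python
import xml.etree.ElementTree as ET


def build_domain_xml(name, memory_mib, vcpus, disk_path):
    """Build a minimal Libvirt domain definition for a KVM guest.

    Hypothetical helper for illustration; Nova's real driver emits a much
    richer document (NICs, consoles, SMBIOS, NUMA tuning, ...).
    """
    domain = ET.Element("domain", type="kvm")
    ET.SubElement(domain, "name").text = name
    ET.SubElement(domain, "memory", unit="MiB").text = str(memory_mib)
    ET.SubElement(domain, "vcpu").text = str(vcpus)

    os_el = ET.SubElement(domain, "os")
    ET.SubElement(os_el, "type").text = "hvm"  # fully virtualized guest

    devices = ET.SubElement(domain, "devices")
    disk = ET.SubElement(devices, "disk", type="file", device="disk")
    # Same image format and cache mode as the -drive option above.
    ET.SubElement(disk, "driver", name="qemu", type="qcow2", cache="none")
    ET.SubElement(disk, "source", file=disk_path)
    ET.SubElement(disk, "target", dev="vda", bus="virtio")
    return ET.tostring(domain, encoding="unicode")


print(build_domain_xml("instance-00000007", 2048, 2,
                       "/var/lib/nova/instances/example/disk"))
```

With the libvirt-python bindings, a document like this would typically be passed to conn.defineXML() and the resulting domain started with dom.create(); Libvirt then works out the long QEMU invocation itself.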
Libvirt/KVM Guest Configuration
● CPU
● NIC
● Disks
● PCI devices
● Serial consoles
● SMBIOS info
● CPU pinning
● VNC or SPICE
● QEMU + SPICE agents
● Clock (PIT, RTC) parameters
● Scheduler, disk, and network tunables
Supporting Tool Highlights
● virsh - CLI for interacting with Libvirt.
● virt-rescue - Run a rescue shell on a virtual machine (using libguestfs).
● virt-sysprep - Reset a virtual machine so that clones can be made. Removes SSH host keys, udev rules, etc.
● virt-v2v - Convert guests from other platforms (VMware, Xen, Hyper-V) to run on KVM.
● virt-sparsify - Make a disk image thin-provisioned (sparse).
Libvirt/KVM
● nova-compute agent communicates with Libvirt.
● Libvirt launches and manages a QEMU process for each guest.
● KVM uses the Linux kernel for direct hardware access as needed.
Guest Enhancements
● VirtIO drivers provide paravirtualized devices to virtual machines, improving speed over emulation.
○ Built into modern enterprise Linux guest operating systems.
○ Available for Windows.
● QEMU guest agent optionally runs inside guests and facilitates external interaction by users and/or management platforms, including OpenStack.
● sVirt (SELinux and AppArmor security drivers supported) confines each guest's QEMU process, mitigating guest-escape vulnerabilities such as VENOM.
Virtual Interface Drivers
● Responsible for plugging/unplugging guest interfaces.
● Different interface types = different Libvirt XML definitions.
● Simplified LibvirtGenericVIFDriver implementation supports a wide range of VIF types.
● Not easily pluggable by out-of-tree implementations.
○ Live in nova/virt/libvirt/vif.py
○ More on this later...
Virtual Interface Drivers Example
● passthrough:
<interface type="direct">
<mac address="DE:AD:BE:EF:CA:FE"/>
<model type="virtio"/>
<source dev="eth0" mode="passthrough"/>
</interface>
● vhost-user:
<interface type="vhostuser">
<mac address="DE:AD:BE:EF:CA:FE"/>
<model type="virtio"/>
<source type="unix" mode="server" path="/vhost-user/test.sock"/>
</interface>
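Since each VIF type maps to a different Libvirt XML definition, the two examples above can be produced from one function that branches on the type. A minimal sketch; build_interface_xml is a hypothetical helper, not Nova's API (LibvirtGenericVIFDriver builds these through Nova's own config objects).

```python
import xml.etree.ElementTree as ET


def build_interface_xml(vif_type, mac, **params):
    """Return Libvirt <interface> XML for the two VIF types shown above.

    Hypothetical helper for illustration only.
    """
    if vif_type == "passthrough":
        # macvtap in passthrough mode: guest NIC attached directly to a host device.
        iface = ET.Element("interface", type="direct")
        source_attrs = {"dev": params["dev"], "mode": "passthrough"}
    elif vif_type == "vhostuser":
        # vhost-user: QEMU exchanges packets with a userspace vswitch over a UNIX socket.
        iface = ET.Element("interface", type="vhostuser")
        source_attrs = {"type": "unix",
                        "mode": params.get("mode", "server"),
                        "path": params["path"]}
    else:
        raise ValueError(f"unsupported VIF type: {vif_type}")

    ET.SubElement(iface, "mac", address=mac)
    ET.SubElement(iface, "model", type="virtio")
    ET.SubElement(iface, "source", attrib=source_attrs)
    return ET.tostring(iface, encoding="unicode")


print(build_interface_xml("vhostuser", "DE:AD:BE:EF:CA:FE",
                          path="/vhost-user/test.sock"))
```

The branch-per-type structure is what makes new VIF types cheap to add in-tree but, as noted above, hard to plug in from outside.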
Volume Drivers
● Conceptually similar to VIF drivers, albeit with no “generic” driver.
● volume_drivers=iscsi=nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver,iser=nova.virt.libvirt.volume.LibvirtISERVolumeDriver,local=nova.virt.libvirt.volume.LibvirtVolumeDriver...etc.
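Each volume_drivers entry maps a volume type to the import path of the Libvirt volume driver class that handles it. A minimal sketch of splitting such a value into a lookup table; parse_volume_drivers is a hypothetical helper rather than Nova's own loading code, and the sample uses only the three entries listed (the rest of the list is elided in the original).

```python
def parse_volume_drivers(option):
    """Split a 'type=module.Class,type=module.Class' value into a dict.

    Hypothetical helper for illustration only.
    """
    drivers = {}
    for entry in option.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first '=' only: left is the volume type, right the class path.
        vol_type, _, class_path = entry.partition("=")
        drivers[vol_type.strip()] = class_path.strip()
    return drivers


value = ("iscsi=nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver,"
         "iser=nova.virt.libvirt.volume.LibvirtISERVolumeDriver,"
         "local=nova.virt.libvirt.volume.LibvirtVolumeDriver")
for vol_type, class_path in parse_volume_drivers(value).items():
    print(vol_type, "->", class_path)
```

A driver would then be chosen by looking up the type of the volume being attached in this table; with no generic fallback, an unknown type simply has no entry.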
Performance Features
● CPU Pinning
● Huge Pages
● NUMA-aware scheduling (cont.)
○ Memory binding
○ I/O device locality awareness
CPU Pinning
● Extends NUMATopologyFilter added in Juno:
○ Adds concept of a “dedicated resource” guest.
○ Implicitly pins vCPUs and emulator threads to pCPU cores for increased performance, trading off the ability to overcommit.
● Combine with existing techniques for isolating cores for maximum benefit.
Example - Hardware Layout
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 8191 MB
node 0 free: 6435 MB
node 1 cpus: 4 5 6 7
node 1 size: 8192 MB
node 1 free: 6634 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
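The numactl output above can be turned into a per-node summary of CPUs and memory with a couple of regular expressions. A minimal sketch; parse_numactl is a hypothetical helper, and it assumes the line formats shown in this example.

```python
import re

CPUS_RE = re.compile(r"node (\d+) cpus:(.*)")
MEM_RE = re.compile(r"node (\d+) (size|free): (\d+) MB")


def parse_numactl(output):
    """Map each NUMA node id to its CPU list and size/free memory in MB."""
    nodes = {}
    for line in output.splitlines():
        m = CPUS_RE.match(line)
        if m:
            cpus = [int(c) for c in m.group(2).split()]
            nodes.setdefault(int(m.group(1)), {})["cpus"] = cpus
            continue
        m = MEM_RE.match(line)
        if m:
            # Stored as 'size_mb' or 'free_mb'; the distance table lines match neither regex.
            nodes.setdefault(int(m.group(1)), {})[f"{m.group(2)}_mb"] = int(m.group(3))
    return nodes


SAMPLE = """available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 8191 MB
node 0 free: 6435 MB
node 1 cpus: 4 5 6 7
node 1 size: 8192 MB
node 1 free: 6634 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10"""

print(parse_numactl(SAMPLE))
```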
Example - Hardware Layout
[Diagram: two NUMA nodes. Node 0 holds cores 0-3 and its local RAM; Node 1 holds cores 4-7 and its local RAM.]
Example - Virsh Capabilities
<cells num='2'>
<cell id='0'>
<memory unit='KiB'>8387744</memory>
<pages unit='KiB' size='4'>2096936</pages>
<pages unit='KiB' size='2048'>0</pages>
<distances>
<sibling id='0' value='10'/>
<sibling id='1' value='20'/>
</distances>
<cpus num='4'>
<cpu id='0' socket_id='0' core_id='0' siblings='0'/>
<cpu id='1' socket_id='0' core_id='1' siblings='1'/>
...
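The same topology as seen by Libvirt can be read from the <cells> element of virsh capabilities. A minimal sketch; parse_cells is a hypothetical helper, and the sample is the cell 0 excerpt shown above, closed off so that it parses (the remaining CPUs and cell 1 are omitted).

```python
import xml.etree.ElementTree as ET


def parse_cells(cells_xml):
    """Summarize each NUMA <cell> in a `virsh capabilities` <cells> element."""
    cells = {}
    for cell in ET.fromstring(cells_xml.strip()).findall("cell"):
        cells[int(cell.get("id"))] = {
            "memory_kib": int(cell.find("memory").text),
            # Page size in KiB -> number of pages of that size in this cell.
            "pages": {int(p.get("size")): int(p.text) for p in cell.findall("pages")},
            # Sibling cell id -> relative access distance.
            "distances": {int(s.get("id")): int(s.get("value"))
                          for s in cell.findall("distances/sibling")},
            "cpus": [int(c.get("id")) for c in cell.findall("cpus/cpu")],
        }
    return cells


SAMPLE_CELLS = """
<cells num='2'>
  <cell id='0'>
    <memory unit='KiB'>8387744</memory>
    <pages unit='KiB' size='4'>2096936</pages>
    <pages unit='KiB' size='2048'>0</pages>
    <distances>
      <sibling id='0' value='10'/>
      <sibling id='1' value='20'/>
    </distances>
    <cpus num='4'>
      <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
      <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
    </cpus>
  </cell>
</cells>
"""

print(parse_cells(SAMPLE_CELLS))
```

The distances (10 local, 20 remote) mirror the numactl table, and with no huge pages reserved all of the cell's memory is in 4 KiB pages.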
Example - Configuration
● Scheduler:
○ Enable NUMATopologyFilter and AggregateInstanceExtraSpecsFilter.
● Compute Node(s):
○ Alter kernel boot parameters to add isolcpus=2,3,6,7
○ Set vcpu_pin_set=2,3,6,7 in /etc/nova/nova.conf
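The two settings above should agree: the CPUs Nova may pin guests to (vcpu_pin_set) ought to be among those isolated from the host scheduler (isolcpus), and whatever remains is left for host processes. A minimal sketch of that check; parse_cpuset is a hypothetical helper that handles the comma, range and ^exclusion syntax, and the 0-7 CPU range is taken from the numactl output.

```python
def parse_cpuset(spec):
    """Expand a cpuset string such as '2,3,6,7' or '0-5,^2' into a set of ints."""
    include, exclude = set(), set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        target = include
        if part.startswith("^"):  # '^N' removes CPUs from the set
            target, part = exclude, part[1:]
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            target.update(range(start, end + 1))
        else:
            target.add(int(part))
    return include - exclude


HOST_CPUS = set(range(8))            # cpus 0-7 per numactl --hardware
isolated = parse_cpuset("2,3,6,7")   # isolcpus= kernel boot argument
pin_set = parse_cpuset("2,3,6,7")    # vcpu_pin_set in nova.conf

assert pin_set <= isolated, "guests could be pinned to non-isolated CPUs"
host_only = HOST_CPUS - isolated
print(sorted(host_only))
```

On this host, two cores per NUMA node (0, 1 and 4, 5) stay with the host, so each node still has two dedicated cores available for guests.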
Example - Hardware Layout
[Diagram: the same two-node layout, with cores 0, 1, 4 and 5 left for host processes and cores 2, 3, 6 and 7 isolated for guests.]
Example - Configuration
● Flavor:
○ Add the hw:cpu_policy=dedicated extra specification:
$ nova flavor-key m1.small.performance set hw:cpu_policy=dedicated
● Instance:
$ nova boot --image rhel-guest-image-7.1-20150224 \
--flavor m1.small.performance test-instance
Example - Resultant Libvirt XML
● vCPU placement is static, with a 1:1 vCPU:pCPU relationship:
<vcpu placement='static'>2</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='2'/>
<vcpupin vcpu='1' cpuset='3'/>
<emulatorpin cpuset='2-3'/>
</cputune>
● Memory is strictly aligned to the NUMA node:
<numatune>
<memory mode='strict' nodeset='0'/>
<memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>
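A simplified sketch of a placement that yields the XML above: take the first host NUMA cell with enough free pinnable pCPUs, pin vCPU i to the i-th free pCPU, and pin the emulator threads to the union of those pCPUs. The helpers pin_dedicated and format_cpuset and the free-CPU lists (vcpu_pin_set=2,3,6,7 split by node) are assumptions for illustration; Nova's actual fitting logic considers more factors, such as memory and CPUs already pinned by other guests.

```python
def pin_dedicated(vcpus, host_cells):
    """Place a 'dedicated' guest on the first NUMA cell with enough free pCPUs.

    host_cells maps cell id -> list of free pCPUs. Returns
    (cell_id, {vcpu: pcpu}) or None when no single cell can hold the guest.
    """
    for cell_id in sorted(host_cells):
        free = sorted(host_cells[cell_id])
        if len(free) >= vcpus:
            return cell_id, {vcpu: free[vcpu] for vcpu in range(vcpus)}
    return None


def format_cpuset(cpus):
    """Render CPU ids in Libvirt's compact cpuset syntax, e.g. {2, 3, 6} -> '2-3,6'."""
    ordered = sorted(cpus)
    parts, i = [], 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1  # extend the current run of consecutive CPUs
        parts.append(str(ordered[i]) if i == j else f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ",".join(parts)


cell, mapping = pin_dedicated(2, {0: [2, 3], 1: [6, 7]})
for vcpu, pcpu in mapping.items():
    print(f"<vcpupin vcpu='{vcpu}' cpuset='{pcpu}'/>")
print(f"<emulatorpin cpuset='{format_cpuset(mapping.values())}'/>")
print(f"<memory mode='strict' nodeset='{cell}'/>")
```

Keeping the whole guest inside one cell is what allows the strict numatune binding to node 0.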
Huge Pages
● Huge pages allow the use of larger page sizes (2 MB, 1 GB), increasing CPU TLB efficiency.
○ Backing guest memory with huge pages allows predictable memory access, at the expense of the ability to overcommit.
○ Different workloads extract different performance characteristics from different page sizes - bigger is not always better!
● The administrator reserves huge pages during compute node setup and creates flavors to match:
○ hw:mem_page_size=large|small|any|2048|1048576
● Users request a page size using flavor or image properties.
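The numeric hw:mem_page_size values are page sizes in KiB (2048 = 2 MB, 1048576 = 1 GB). A quick calculation shows why larger pages help the TLB: the same guest memory is mapped by far fewer pages. The 2048 MiB guest size is an assumption borrowed from the -m 2048 in the earlier QEMU command line, and pages_needed is a hypothetical helper.

```python
KIB_PER_MIB = 1024


def pages_needed(guest_ram_mib, page_size_kib):
    """Number of pages of the given size needed to back all of the guest's RAM."""
    ram_kib = guest_ram_mib * KIB_PER_MIB
    if ram_kib % page_size_kib:
        # A partial page cannot be handed to the guest.
        raise ValueError("guest RAM is not a multiple of the page size")
    return ram_kib // page_size_kib


for size_kib in (4, 2048, 1048576):
    print(f"{size_kib:>8} KiB pages: {pages_needed(2048, size_kib)}")
```

A 2 GiB guest needs 524288 pages of 4 KiB, 1024 pages of 2 MiB, or just 2 pages of 1 GiB.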
Example - Host Configuration
# grubby --update-kernel=ALL --args="hugepagesz=2M hugepages=2048"
# grub2-install /dev/sda
# shutdown -r now
# cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
1024
# cat /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
1024
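The per-node sysfs check above can be scripted. A minimal sketch, assuming the standard /sys/devices/system/node layout shown in the example; hugepages_per_node is a hypothetical helper, and the sysfs root is a parameter so that it can be pointed at a test directory.

```python
import glob
import os


def hugepages_per_node(sysfs_root="/sys/devices/system/node", size_kib=2048):
    """Return {node_id: nr_hugepages} for huge pages of the given size."""
    pattern = os.path.join(sysfs_root, "node[0-9]*", "hugepages",
                           f"hugepages-{size_kib}kB", "nr_hugepages")
    counts = {}
    for path in glob.glob(pattern):
        node_dir = path.split(os.sep)[-4]  # e.g. 'node0'
        with open(path) as f:
            counts[int(node_dir[len("node"):])] = int(f.read().strip())
    return counts


if __name__ == "__main__":
    counts = hugepages_per_node()
    total_mib = sum(counts.values()) * 2  # each 2048 kB page is 2 MiB
    print(counts, f"{total_mib} MiB reserved")
```

On the example host this would report 1024 pages on each node: the hugepages=2048 boot argument split across the two NUMA nodes, 4 GiB of 2 MiB pages in total.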
Example - Virsh Capabilities
<topology>
<cells num='2'>
<cell id='0'>
<memory unit='KiB'>4193780</memory>
<pages unit='KiB' size='4'>524157</pages>
<pages unit='KiB' size='2048'>1024</pages>
...
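As a sanity check on this capabilities output, the two <pages> counts for cell 0 account for the cell's entire <memory> value, and the 1024 huge pages match the nr_hugepages reported for node0 (2 GiB reserved on that node):

```latex
\begin{aligned}
\underbrace{524157 \times 4\,\text{KiB}}_{\text{4 KiB pages}}
  + \underbrace{1024 \times 2048\,\text{KiB}}_{\text{2 MiB pages}}
  &= 2096628\,\text{KiB} + 2097152\,\text{KiB} \\
  &= 4193780\,\text{KiB}
\end{aligned}
```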
Example - Flavor Configuration
$ nova flavor-key m1.small.performance set hw:mem_page_size=2048
$ nova boot --flavor=m1.small.performance \
--image=rhel-guest-image-7.1-20150224 \
numa-lp-test
Example - Result
$ virsh dumpxml instance-00000001
...
<memoryBacking>
<hugepages>
<page size='2048' unit='KiB' nodeset='0'/>
</hugepages>
</memoryBacking>
...
Example - Hardware Layout w/ PCIe
[Diagram: the same two-node layout, with a PCIe bus attached locally to each NUMA node.]
I/O-based NUMA Scheduling
● Extends the PciDevice model to include the NUMA node the device is associated with.
● Extends NUMATopologyFilter to make use of this information when scheduling.
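A simplified sketch of the placement check described above: a guest that needs a PCI device is only placed on a host NUMA cell that both has enough free pCPUs and is local to that device. HostCell, pick_cell and the PCI addresses are hypothetical; Nova's real filter works on its own NUMA topology and PCI request objects and weighs more factors.

```python
from dataclasses import dataclass, field


@dataclass
class HostCell:
    cell_id: int
    free_pcpus: list
    pci_devices: set = field(default_factory=set)  # PCI addresses local to this cell


def pick_cell(cells, vcpus, wanted_device):
    """Return the first cell with enough free pCPUs that is local to wanted_device.

    Returns None rather than falling back to a cell remote from the device.
    """
    for cell in cells:
        if len(cell.free_pcpus) >= vcpus and wanted_device in cell.pci_devices:
            return cell.cell_id
    return None


cells = [
    HostCell(0, [2, 3], {"0000:03:00.0"}),
    HostCell(1, [6, 7], {"0000:81:00.0"}),
]
print(pick_cell(cells, 2, "0000:81:00.0"))
```

This sketch always insists on locality; whether a real deployment should instead accept a remote device when no local cell fits is a policy choice outside the scope of this example.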
Quiesce Guest Filesystem
● Libvirt 1.2.5 and later supports an fsFreeze/fsThaw API.
● Freezes/thaws guest filesystem(s) using the QEMU guest agent.
● Ensures consistent snapshots.
● To enable:
○ hw_qemu_guest_agent image property must be set to yes.
○ hw_require_fsfreeze image property must be set to yes.
○ QEMU guest agent must be installed inside the guest.
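The important property of a freeze/thaw pair is that the thaw must happen even if the snapshot fails, otherwise the guest is left with frozen filesystems. A minimal sketch using a context manager; it assumes a domain object exposing fsFreeze() and fsThaw(), as libvirt-python's virDomain does, and quiesce_enabled and quiesced are hypothetical helpers. The property check only accepts the literal "yes" used on the slide.

```python
from contextlib import contextmanager


def quiesce_enabled(image_props):
    """True when both image properties listed above are set to 'yes'."""
    return (image_props.get("hw_qemu_guest_agent") == "yes"
            and image_props.get("hw_require_fsfreeze") == "yes")


@contextmanager
def quiesced(domain):
    """Freeze guest filesystems for the duration of the block, always thawing afterwards.

    `domain` is assumed to expose fsFreeze()/fsThaw(), which ask the
    QEMU guest agent inside the guest to freeze and thaw filesystems.
    """
    domain.fsFreeze()
    try:
        yield
    finally:
        domain.fsThaw()  # runs even if the snapshot raised
```

Usage would look like `with quiesced(dom): take_snapshot()`, where take_snapshot stands in for whatever snapshot operation is being protected.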
Hyper-V Enlightenment
● Windows guests support several additional paravirtualized features when running on Hyper-V (similar to virtio, kvmclock, etc. on KVM).
● Helps avoid BSODs in guests on heavily loaded hosts and enhances performance.
● QEMU/KVM is able to support several of these natively.
● Expands the behavior of the os_type="windows" image property.
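Libvirt exposes these enlightenments through a <hyperv> element inside the domain's <features>. A minimal sketch of adding them only for Windows guests; hyperv_features is a hypothetical helper, and the particular flags chosen (relaxed timing, virtual APIC, spinlock notification with 8191 retries) are an illustrative assumption, not a statement of exactly what Nova enables.

```python
import xml.etree.ElementTree as ET


def hyperv_features(os_type):
    """Return a Libvirt <features> element, with Hyper-V enlightenments for Windows guests.

    Hypothetical helper; the enabled flags are an illustrative assumption.
    """
    features = ET.Element("features")
    ET.SubElement(features, "acpi")
    ET.SubElement(features, "apic")
    if os_type == "windows":
        hyperv = ET.SubElement(features, "hyperv")
        ET.SubElement(hyperv, "relaxed", state="on")   # tolerate delayed timer ticks
        ET.SubElement(hyperv, "vapic", state="on")     # paravirtualized APIC access
        ET.SubElement(hyperv, "spinlocks", state="on", retries="8191")
    return features


print(ET.tostring(hyperv_features("windows"), encoding="unicode"))
```

Keying the features off os_type keeps Linux guests unchanged while Windows guests pick up the extra paravirtualization.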
vhost-user support
● VIF driver for a new type of network interface implemented in QEMU/Libvirt.
● Intended to provide a more efficient path between a guest and userspace vswitches.
Liberty Predictions/Speculation
● Libvirt hardware policy from libosinfo (approved)
● Post-plug VIF scripts (under review)
● Further work around SR-IOV, including:
○ Interface attach/detach (under review)
○ Live migration when using macvtap (under review)
● Ability to select guest CPU model and/or features (under review)
● VM HA (under review)
● VirtIO network performance enhancements (under review)
● Hot resize (under review)
Questions?