novm: Hypervisor Rebooted
Adin Scannell
What is this talk about?
1. Rethinking the hypervisor
2. A new VMM for Linux (novm)
Who am I?
● Adin Scannell
○ Systems software developer
● Where do I work?
○ Formerly CTO @ Gridcentric Inc.
○ Now Software Engineer @ Google
● How can you reach me?
○ [email protected]
Virtualization is amazing!
● Powers massive compute infrastructures
● Makes maintaining legacy systems easier
○ (and developing and testing on new systems)
● Enables high-availability, backup, live-migration, etc.
Why is everyone excited about containers?
Some people, when confronted with a problem managing their server,
think "I know, I'll use virtualization."
Now, they have $(virsh list | wc -l) problems.
Virtualization pain points
● Legacy devices, legacy BIOS, etc.
● Performance problems
● Dealing with disk images
DOCKERMANIA
Containers are amazing!
● Lightweight runtime (Linux)
● “App store” distribution (registry)
● Simple software stack (tarballs and files)
Containers aren’t perfect
● Host kernel dependency limits...
○ Portability: SO_REUSEPORT? Everything must be >= 3.9!
○ Isolation: Security is tough (CVE-2013-1858)
● Shared kernel state is complex and difficult to isolate
○ Migration, suspend & resume are much harder
How can we make containers more like VMs?
How can we make VMs more like containers?
What do I want? (usage)
● Support docker-style deployment:
○ novm run --docker_image ubuntu:14.04 grep -v '^#' /etc/apt/sources.list
● Map in different filesystem trees easily:
○ novm run --read /var/log=>/prod/foo/log log_analyzer.py
● Support different kernels per “container”:
○ novm run --kernel linux-3.9 nodejs so_reuseport.js
● Also: live migration, suspend & resume, etc.
What’s novm?
A lightweight VMM, written in Go.
Designed to run applications, not systems.
Containers
[Diagram: Hardware at the bottom, the OS above it, and containers on top, each holding one or more apps. Labels: syscall between the apps and the OS; cgroups + namespaces on the containers.]
Virtual machines
[Diagram: Hardware at the bottom, the Hypervisor above it, then four OS instances, each running an app. Labels: x86 + vmcalls, vmx / svm.]
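Virtual machines on Linux
[Diagram: Hardware at the bottom, then the Linux Kernel with KVM. Above the kernel sit VMM (qemu) processes, each hosting a guest OS and its apps, plus an app outside the VMMs.]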
Dimensions
[Chart: each dimension below is drawn as a spectrum running from Virtualization to Containers.]
● Lifecycle
● Performance
● Isolation & Security
● Portability
Containers
[Diagram: columns labelled Host and Untrusted. The app runs in Ring 3 and enters the Host Kernel in Ring 0 via syscall.]
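Virtual machines
[Diagram: Host and Guest columns, split into Ring 3 and Ring 0. Host Ring 3 holds the VMM User Code and its Devices; host Ring 0 holds the Host Kernel (KVM). Guest Ring 3 holds the applications; guest Ring 0 holds the Guest Kernel. Transitions: vmexit between guest and host, syscall within each side.]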
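novm
[Diagram: Host and Guest columns, split into Ring 3 and Ring 0. Host Ring 3 holds novm and its Devices; host Ring 0 holds the Host Kernel (KVM). Guest Ring 3 holds the application [1]; guest Ring 0 holds the Guest Kernel. A proxy on each side is connected by virtio rpc. Transitions: vmexit between guest and host, syscall within each side.]
[1] process interactions (stdin, stdout, signals, etc.)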
Creating a “novm” (< 1s)
1. Create a KVM VM
a. (Management layer creates tap devices, etc.)
2. Lay out kernel and initrd payload
a. (Build page-tables and use protected entry point)
3. Run guest kernel
a. initrd mounts two 9p filesystems: sysroot & noguest
b. switch_root to noguest as init, / is sysroot
c. noguest opens virtio console, starts RPC server
d. noguest sets up IP configuration, etc.
4. Talk to noguest to run process
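Step 4 can be pictured as an ordinary RPC over a byte stream. The sketch below is only an illustration: it uses Go's standard net/rpc package over an in-memory net.Pipe standing in for the virtio console, and the Guest service, its Start method, and the StartArgs/StartReply types are hypothetical names, not novm's actual protocol.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// StartArgs and StartReply are hypothetical message types for this sketch.
type StartArgs struct {
	Command []string
}

type StartReply struct {
	Pid int
}

// Guest stands in for the in-guest agent (noguest in the talk).
type Guest struct {
	nextPid int
}

// Start pretends to launch a process and hands back a fake pid.
func (g *Guest) Start(args *StartArgs, reply *StartReply) error {
	if len(args.Command) == 0 {
		return fmt.Errorf("empty command")
	}
	g.nextPid++
	reply.Pid = g.nextPid
	return nil
}

// runInGuest wires a client and a server together over an in-memory pipe
// (standing in for the virtio console) and issues a single Start call.
func runInGuest(command []string) (int, error) {
	hostEnd, guestEnd := net.Pipe()
	server := rpc.NewServer()
	if err := server.Register(&Guest{}); err != nil {
		return 0, err
	}
	go server.ServeConn(guestEnd)

	client := rpc.NewClient(hostEnd)
	defer client.Close()

	var reply StartReply
	if err := client.Call("Guest.Start", &StartArgs{Command: command}, &reply); err != nil {
		return 0, err
	}
	return reply.Pid, nil
}

func main() {
	pid, err := runInGuest([]string{"grep", "-v", "^#", "/etc/apt/sources.list"})
	fmt.Println(pid, err)
}
```

The host side never touches the guest's process table directly; everything it knows about the process arrives in the reply.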
Dimensions
[Chart: each dimension below is drawn as a spectrum running from Virtualization to Containers. novm now appears on the chart, annotated "process-like".]
● Lifecycle
● Performance
● Isolation & Security
● Portability
Go is great for a VMM!
● Built-in scalability and async tasks
● Better error protection
○ Garbage collection
○ Bounds checking, type checking
● Built-in serialization and reflection
○ Eliminates bookkeeping for S&R
VirtIO Channels == Go Channels ?
for buf := range vchannel.incoming {
    header := buf.Map(0, VirtioNetHeaderSize)
    pktStart := VirtioNetHeaderSize - device.Vnet
    pktEnd := buf.Length() - pktStart
    // Read a packet from the tap device.
    buf.Read(device.Fd, pktStart, pktEnd)
    vchannel.outgoing <- buf
}
Asynchronous I/O
func (fs *VirtioFsDevice) process(buf *VirtioBuffer) {
    fs.Handle(buf)
    fs.VirtioDevice.Channels[0].outgoing <- buf
}
func (fs *VirtioFsDevice) run() error {
    for {
        buf := <-fs.VirtioDevice.Channels[0].incoming
        go fs.process(buf)
    }
}
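The same pattern can be tried outside a VMM. This is a minimal, self-contained sketch in which the buffer and queue types are hypothetical stand-ins for a virtio buffer and a virtqueue: each request is handled in its own goroutine, so a slow request does not hold up the ones queued behind it, and completions may come back in any order.

```go
package main

import (
	"fmt"
	"strings"
)

// buffer and queue are hypothetical stand-ins for a virtio buffer and a virtqueue.
type buffer struct {
	data string
}

type queue struct {
	incoming chan *buffer // buffers made available by the guest
	outgoing chan *buffer // buffers handed back once they have been used
}

// process handles one request, then returns the buffer to the guest.
func process(q *queue, buf *buffer) {
	buf.data = strings.ToUpper(buf.data)
	q.outgoing <- buf
}

// run drains the incoming channel, handling each buffer in its own goroutine.
func run(q *queue) {
	for buf := range q.incoming {
		go process(q, buf)
	}
}

// roundTrip submits the payloads and collects one completion per payload.
// A set is returned because completion order is not guaranteed.
func roundTrip(payloads []string) map[string]bool {
	q := &queue{incoming: make(chan *buffer), outgoing: make(chan *buffer)}
	go run(q)
	go func() {
		for _, p := range payloads {
			q.incoming <- &buffer{data: p}
		}
		close(q.incoming)
	}()
	done := make(map[string]bool)
	for range payloads {
		done[(<-q.outgoing).data] = true
	}
	return done
}

func main() {
	fmt.Println(len(roundTrip([]string{"read", "write", "stat"})))
}
```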
Closures
efd := vm.NewBoundEventFd(addr, ioevent.Size(), ioevent.Data())
go func(ioevent IoEvent) {
    for {
        // Wait for the next event.
        efd.Wait()
        // Resubmit the ioevent; no need to lookup the device.
        handler.Submit(ioevent, offset)
    }
}(ioevent)
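A runnable toy version of the same idea, assuming a plain Go channel in place of the eventfd. The ioEvent type and the bind and fire helpers are hypothetical; the point is only that the goroutine carries its event with it, so nothing has to be looked up when the notification arrives.

```go
package main

import (
	"fmt"
	"sync"
)

// ioEvent is a hypothetical description of a guest write that triggers a handler.
type ioEvent struct {
	port  uint16
	value uint32
}

// bind starts a goroutine that waits on notify (standing in for an eventfd)
// and resubmits the captured event each time it fires. The goroutine exits
// when notify is closed.
func bind(ev ioEvent, notify <-chan struct{}, submit func(ioEvent), wg *sync.WaitGroup) {
	wg.Add(1)
	go func(ev ioEvent) {
		defer wg.Done()
		for range notify {
			submit(ev) // ev was captured at bind time; no device lookup needed
		}
	}(ev)
}

// fire binds one event, triggers it n times, and returns what the handler saw.
func fire(ev ioEvent, n int) []ioEvent {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		seen []ioEvent
	)
	notify := make(chan struct{})
	bind(ev, notify, func(e ioEvent) {
		mu.Lock()
		seen = append(seen, e)
		mu.Unlock()
	}, &wg)
	for i := 0; i < n; i++ {
		notify <- struct{}{}
	}
	close(notify)
	wg.Wait()
	return seen
}

func main() {
	fmt.Println(len(fire(ioEvent{port: 0xc050, value: 1}, 3)))
}
```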
Dimensions
[Chart: each dimension below is drawn as a spectrum running from Virtualization to Containers. novm is marked on the chart with the annotations "virtio only" and "process-like".]
● Lifecycle
● Performance
● Isolation & Security
● Portability
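Filesystem Mapper
[Diagram: Host and Guest columns, split into Ring 3 and Ring 0. In the guest, the application issues syscalls to the Linux Guest Kernel, which speaks 9p over virtio to the File mapper among the Devices in host Ring 3, above the Host Kernel (KVM). Callout: not in kernel space!]
"read": {
    "/": "/"
},
"write": {
    "/": "/tmp/vm",
    "/var/mysql": "/proddb"
}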
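One way to read such a mapping is as a longest-prefix translation table. The sketch below is an assumption-heavy illustration, not novm's implementation: it treats the keys as guest paths and the values as host paths, expects keys to be clean absolute paths, and the resolve function is a hypothetical name.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolve translates guestPath using the mapping entry whose key is the
// longest matching path prefix. Treating keys as guest paths and values as
// host paths is an assumption of this sketch.
func resolve(mapping map[string]string, guestPath string) (string, bool) {
	guestPath = path.Clean("/" + guestPath)
	best, found := "", false
	for prefix := range mapping {
		// Match whole path components only, so "/var/mysql" does not
		// capture "/var/mysqlx".
		matches := prefix == "/" || guestPath == prefix ||
			strings.HasPrefix(guestPath, prefix+"/")
		if matches && len(prefix) > len(best) {
			best, found = prefix, true
		}
	}
	if !found {
		return "", false
	}
	return path.Join(mapping[best], strings.TrimPrefix(guestPath, best)), true
}

func main() {
	write := map[string]string{"/": "/tmp/vm", "/var/mysql": "/proddb"}
	for _, p := range []string{"/var/mysql/ibdata1", "/etc/hosts"} {
		host, _ := resolve(write, p)
		fmt.Println(p, "=>", host)
	}
}
```

With the write map from the slide, a path under /var/mysql lands in /proddb and everything else lands under /tmp/vm.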
Dimensions
[Chart: each dimension below is drawn as a spectrum running from Virtualization to Containers. novm is marked on the chart with the annotations "virtio only", "file-based, not disk-based" and "process-like".]
● Lifecycle
● Performance
● Isolation & Security
● Portability
Status
● What works?
○ Legacy devices: ACPI, UART, PCI, RTC, PIT, etc.
○ Virtio devices: Net, Block, FS, Console
○ 100% zero copy backends
○ Zero downtime restart and upgrades
● TBD: Live migration, suspend & resume
● Performance
What was great?
● Working with KVM!
int kvm_fd = open("/dev/kvm", O_RDWR);
int kvm_vm = ioctl(kvm_fd, KVM_CREATE_VM, 0);
int kvm_vcpu = ioctl(kvm_vm, KVM_CREATE_VCPU, 0);
int r = ioctl(kvm_vcpu, KVM_RUN);
● Go is amazing!
What was tricky?
● Legacy free? Hardly.
○ Device trees? Nope. Virtio-mmio? Nope.
○ Virtio devices: PCI w/ MSI-X interrupts (& eventfds)
● VCPUs are goroutines
○ How do you interrupt a goroutine?
○ Performance analysis will be tricky
Thanks! Questions?
● Code available: https://github.com/google/novm
● Email: [email protected]
How does a traditional VMM work?
[Diagram built up over several slides. Labels, in order of appearance: VMM, BIOS, H/W, disk image, tap device, boot loader, real mode OS, OS, app.]
How do you build a VMM? (part 1)
int kvm_fd = open("/dev/kvm", O_RDWR);              /* (1) */
int kvm_vm = ioctl(kvm_fd, KVM_CREATE_VM, 0);       /* (2) */
int kvm_vcpu = ioctl(kvm_vm, KVM_CREATE_VCPU, 0);   /* (3) */
int r = ioctl(kvm_vcpu, KVM_RUN);                   /* crash */
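How do you build a VMM? (part 2)
/* KVM requires page-aligned backing memory, so use mmap rather than malloc. */
void *memory_alloc = mmap(NULL, 100 * 1024 * 1024, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
struct kvm_userspace_memory_region m = {
    .slot = 0,
    .flags = 0,
    .guest_phys_addr = 0,
    .memory_size = 100 * 1024 * 1024,
    .userspace_addr = (__u64)memory_alloc,
};
/* Memory regions belong to the VM, not to a single VCPU. */
int r = ioctl(kvm_vm, KVM_SET_USER_MEMORY_REGION, &m);   /* (4) */
r = ioctl(kvm_vcpu, KVM_RUN);                            /* crash */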
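How do you build a VMM? (part 3)
/* The kvm_run structure is shared with the kernel by mapping the VCPU fd. */
int run_size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
struct kvm_run *kvm = mmap(NULL, run_size, PROT_READ | PROT_WRITE,
                           MAP_SHARED, kvm_vcpu, 0);
int r = ioctl(kvm_vcpu, KVM_RUN);
if (kvm->exit_reason == KVM_EXIT_IO &&
    kvm->io.port == 0xCF8)
{
    /* Pretend to be a PCI bus! */
}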
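Port 0xCF8 is the legacy PCI CONFIG_ADDRESS register, so "pretending to be a PCI bus" starts with decoding the 32-bit value the guest writes there. A small sketch in Go follows; the pciAddress type and decodeConfigAddress function are hypothetical names, while the bit layout is the standard one: bit 31 enable, bits 23-16 bus, bits 15-11 device, bits 10-8 function, bits 7-2 register.

```go
package main

import "fmt"

// pciAddress is the decoded form of a 32-bit write to I/O port 0xCF8.
type pciAddress struct {
	enabled  bool  // bit 31: configuration cycle enabled
	bus      uint8 // bits 23-16
	device   uint8 // bits 15-11
	function uint8 // bits 10-8
	register uint8 // bits 7-2, kept as a dword-aligned byte offset
}

// decodeConfigAddress splits a CONFIG_ADDRESS value into its fields.
func decodeConfigAddress(v uint32) pciAddress {
	return pciAddress{
		enabled:  v&(1<<31) != 0,
		bus:      uint8(v >> 16),
		device:   uint8(v>>11) & 0x1f,
		function: uint8(v>>8) & 0x07,
		register: uint8(v) & 0xfc,
	}
}

func main() {
	// Bus 0, device 3, function 0, register offset 0x08, enable bit set.
	fmt.Printf("%+v\n", decodeConfigAddress(0x80001808))
}
```

The guest then reads or writes the selected register through the data port at 0xCFC, which the VMM sees as a second I/O exit.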