novm: Hypervisor Rebooted Adin Scannell


What is this talk about?

1. Rethinking the hypervisor

2. A new VMM for Linux (novm)


Who am I?

● Adin Scannell
  ○ Systems software developer

● Where do I work?
  ○ Formerly CTO @ Gridcentric Inc.
  ○ Now Software Engineer @ Google

● How can you reach me?
  ○ [email protected]


Virtualization is amazing!

● Powers massive compute infrastructures

● Makes maintaining legacy systems easier

○ (and developing and testing on new systems)

● Enables high-availability, backup, live-migration, etc.


Why is everyone excited about containers?


Some people, when confronted with a problem managing their server,

think "I know, I'll use virtualization."

Now, they have $(virsh list | wc -l) problems.


Virtualization pain points

● Legacy devices, legacy BIOS, etc.

● Performance problems

● Dealing with disk images


DOCKERMANIA

● Lightweight runtime (Linux)

● “App store” distribution (registry)

● Simple software stack (tarballs and files)


Containers are amazing!


Containers aren’t perfect

● Host kernel dependency limits...
  ○ Portability: SO_REUSEPORT? Everything must be >= 3.9!
  ○ Isolation: Security is tough (CVE-2013-1858)

● Shared kernel state is complex and difficult to isolate
  ○ Migration, suspend & resume are much harder


How can we make containers more like VMs?

How can we make VMs more like containers?


What do I want? (usage)

● Support docker-style deployment:
  ○ novm run --docker_image ubuntu:14.04 grep -v '^#' /etc/apt/sources.list

● Map in different filesystem trees easily:
  ○ novm run --read /var/log=>/prod/foo/log log_analyzer.py

● Support different kernels per “container”:
  ○ novm run --kernel linux-3.9 nodejs so_reuseport.js

● Also: live migration, suspend & resume, etc.


What’s novm?

A lightweight VMM, written in Go.

Designed to run applications, not systems.


Containers

[Diagram: apps, grouped into containers, run directly on the host OS, which runs on the hardware. Apps reach the OS via syscalls; the containers are built from cgroups + namespaces.]


Virtual machines

[Diagram: each app runs on its own guest OS; the guest OSes run on a hypervisor, which runs on the hardware. Labels: x86 + vmcalls, vmx / svm.]


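Virtual machines on Linux

[Diagram: the Linux kernel, with KVM, runs on the hardware. Each VM is a VMM process (qemu) hosting a guest OS and its apps, next to ordinary apps running on the host.]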


Dimensions

● Lifecycle

● Performance

● Isolation & Security

● Portability

[Diagram: each dimension is drawn as a spectrum running from Virtualization to Containers.]


Containers

[Diagram: an untrusted app in ring 3 calls straight into the host kernel in ring 0 via syscalls.]


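Virtual machines

[Diagram: on the host side, VMM user code and its devices run in ring 3 above the host kernel (KVM) in ring 0. On the guest side, applications run in ring 3 above the guest kernel in ring 0. Applications make syscalls into the guest kernel; the guest reaches the host through vmexits.]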


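novm

[Diagram: novm and its devices run in host ring 3 above the host kernel (KVM) in ring 0. The guest kernel runs the application [1] in guest ring 3. A proxy on the host side and a proxy in the guest are connected over virtio RPC; guest exits reach novm as vmexits.]

[1] process interactions (stdin, stdout, signals, etc.)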


Creating a “novm” (< 1s)

1. Create a KVM VM
   a. (Management layer creates tap devices, etc.)

2. Lay out kernel and initrd payload
   a. (Build page-tables and use protected entry point)

3. Run guest kernel
   a. initrd mounts two 9p filesystems: sysroot & noguest
   b. switch_root to noguest as init, / is sysroot
   c. noguest opens virtio console, starts RPC server
   d. noguest sets up IP configuration, etc.

4. Talk to noguest to run process


Dimensions

● Lifecycle

● Performance

● Isolation & Security

● Portability

[Diagram: each dimension is a spectrum from Virtualization to Containers. novm is marked on one of them, labeled “process-like”.]


Go is great for a VMM!

● Built-in scalability and async tasks

● Better error protection
  ○ Garbage collection
  ○ Bounds checking, type checking

● Built-in serialization and reflection
  ○ Eliminates bookkeeping for S&R


VirtIO Channels == Go Channels ?

    for buf := range vchannel.incoming {
        header := buf.Map(0, VirtioNetHeaderSize)
        pktStart := VirtioNetHeaderSize - device.Vnet
        pktEnd := buf.Length() - pktStart

        // Read a packet from the tap device.
        buf.Read(device.Fd, pktStart, pktEnd)
        vchannel.outgoing <- buf
    }
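The loop above leans on one idea: a virtio queue behaves much like a pair of Go channels, with empty buffers arriving on one and filled buffers leaving on the other. The following is a minimal, self-contained sketch of that pattern only. The `buffer` and `channel` types and the `fill` and `roundTrip` functions are hypothetical stand-ins, not novm's types, and a fixed byte slice plays the role of the tap device.

```go
package main

import "fmt"

// buffer is a hypothetical stand-in for a guest-provided virtio buffer.
type buffer struct {
	data []byte
}

// channel mimics the incoming/outgoing pair used on the slide.
type channel struct {
	incoming chan *buffer
	outgoing chan *buffer
}

// fill copies a "packet" into every incoming buffer and hands the buffer
// back, the way the slide's loop reads from the tap device.
func fill(c *channel, packet []byte) {
	for buf := range c.incoming {
		n := copy(buf.data, packet)
		buf.data = buf.data[:n]
		c.outgoing <- buf
	}
	// No more buffers will arrive, so signal the consumer.
	close(c.outgoing)
}

// roundTrip pushes count empty buffers through fill and returns what came back.
func roundTrip(count int, packet string) []string {
	c := &channel{
		incoming: make(chan *buffer, count),
		outgoing: make(chan *buffer, count),
	}
	for i := 0; i < count; i++ {
		c.incoming <- &buffer{data: make([]byte, 64)}
	}
	close(c.incoming)

	go fill(c, []byte(packet))

	var out []string
	for buf := range c.outgoing {
		out = append(out, string(buf.data))
	}
	return out
}

func main() {
	fmt.Println(roundTrip(3, "ping"))
}
```

Closing `incoming` ends the `range` loop, which is a convenient way to model device teardown in a sketch like this.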


Asynchronous I/O

    func (fs *VirtioFsDevice) process(buf *VirtioBuffer) {
        fs.Handle(buf)
        fs.VirtioDevice.Channels[0].outgoing <- buf
    }

    func (fs *VirtioFsDevice) run() error {
        for {
            buf := <-fs.VirtioDevice.Channels[0].incoming
            go fs.process(buf)
        }
    }
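To see why one goroutine per buffer gives asynchronous I/O almost for free, here is a small runnable sketch of the same shape. Everything in it is an assumption made for illustration: the `request` and `device` types, the `serve` helper, and upper-casing a string as the "work" are not taken from novm. Requests may complete in any order, so each carries an id that lets the caller put the results back in place.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// request is a hypothetical unit of work; id lets the caller reorder results.
type request struct {
	id      int
	payload string
}

// device is a hypothetical stand-in for a virtio device with one channel pair.
type device struct {
	incoming chan request
	outgoing chan request
}

// process handles one request and returns it on the outgoing channel.
func (d *device) process(req request, wg *sync.WaitGroup) {
	defer wg.Done()
	req.payload = strings.ToUpper(req.payload)
	d.outgoing <- req
}

// run starts one goroutine per incoming request, as on the slide, and
// closes outgoing once every request has been answered.
func (d *device) run() {
	var wg sync.WaitGroup
	for req := range d.incoming {
		wg.Add(1)
		go d.process(req, &wg)
	}
	wg.Wait()
	close(d.outgoing)
}

// serve submits all payloads and collects the results by request id.
func serve(payloads []string) []string {
	d := &device{
		incoming: make(chan request),
		outgoing: make(chan request, len(payloads)),
	}
	go d.run()
	for i, p := range payloads {
		d.incoming <- request{id: i, payload: p}
	}
	close(d.incoming)

	out := make([]string, len(payloads))
	for req := range d.outgoing {
		out[req.id] = req.payload
	}
	return out
}

func main() {
	fmt.Println(serve([]string{"read", "write", "stat"}))
}
```

Unlike the slide's endless loop, this sketch shuts down cleanly with a `sync.WaitGroup` so that it terminates when run as a program.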


Closures

    efd := vm.NewBoundEventFd(addr, ioevent.Size(), ioevent.Data())

    go func(ioevent IoEvent) {
        for {
            // Wait for the next event.
            efd.Wait()

            // Resubmit the ioevent; no need to lookup the device.
            handler.Submit(ioevent, offset)
        }
    }(ioevent)


Dimensions

● Lifecycle

● Performance

● Isolation & Security

● Portability

[Diagram: each dimension is a spectrum from Virtualization to Containers. novm is now marked on three of them; the labels shown are “process-like” and “virtio only”.]


Filesystem Mapper

[Diagram: the application makes syscalls into the Linux guest kernel, which speaks 9p over virtio to the file mapper. The file mapper sits with novm's devices in host ring 3, above the host kernel (KVM).]

    "read": {
        "/": "/",
    },
    "write": {
        "/": "/tmp/vm",
        "/var/mysql": "/proddb"
    }

not in kernel space!
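One way to read the map above is as a prefix table from guest paths to host directories. The sketch below assumes that reading, and it also assumes the most specific (longest) matching prefix wins; the slides do not spell out novm's actual lookup rules. The `resolve` function is a hypothetical name used only for this illustration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolve maps a guest path to a host path using the longest matching
// prefix in m. Longest-prefix matching is an assumption of this sketch.
func resolve(m map[string]string, guestPath string) (string, bool) {
	guestPath = path.Clean("/" + guestPath)

	bestKey, bestLen := "", -1
	for key := range m {
		p := path.Clean(key)
		// A prefix matches on whole path components only, so
		// "/var/mysql" does not capture "/var/mysqlx".
		matches := p == "/" || guestPath == p || strings.HasPrefix(guestPath, p+"/")
		if matches && len(p) > bestLen {
			bestKey, bestLen = key, len(p)
		}
	}
	if bestLen < 0 {
		return "", false
	}

	rest := strings.TrimPrefix(guestPath, path.Clean(bestKey))
	return path.Join(m[bestKey], rest), true
}

func main() {
	// The "write" map from the slide.
	write := map[string]string{
		"/":          "/tmp/vm",
		"/var/mysql": "/proddb",
	}
	for _, p := range []string{"/var/mysql/data.db", "/etc/hosts"} {
		host, ok := resolve(write, p)
		fmt.Println(p, "=>", host, ok)
	}
}
```

Under these assumptions a write to `/var/mysql/data.db` in the guest would land in `/proddb/data.db` on the host, while anything else falls through to `/tmp/vm`.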


Dimensions

● Lifecycle

● Performance

● Isolation & Security

● Portability

[Diagram: each dimension is a spectrum from Virtualization to Containers. novm is now marked on all four; the labels shown are “process-like”, “virtio only” and “file-based, not disk-based”.]


Status

● What works?
  ○ Legacy devices: ACPI, UART, PCI, RTC, PIT, etc.
  ○ Virtio devices: Net, Block, FS, Console
  ○ 100% zero copy backends
  ○ Zero downtime restart and upgrades

● TBD: Live migration, suspend & resume

● Performance


What was great?

● Working with KVM!

    int kvm_fd = open("/dev/kvm", O_RDWR);
    int kvm_vm = ioctl(kvm_fd, KVM_CREATE_VM, 0);
    int kvm_vcpu = ioctl(kvm_vm, KVM_CREATE_VCPU, 0);
    int r = ioctl(kvm_vcpu, KVM_RUN);

● Go is amazing!


What was tricky?

● Legacy free? Hardly.
  ○ Device trees? Nope. Virtio-mmio? Nope.
  ○ Virtio devices: PCI w/ MSI-X interrupts (& eventfds)

● VCPUs are goroutines
  ○ How do you interrupt a goroutine?
  ○ Performance analysis will be tricky


Thanks! Questions?

● Code available: https://github.com/google/novm

● Email: [email protected]


How does a traditional VMM work?

[Diagram sequence, built up over nine slides: the VMM; then a BIOS inside it; then emulated hardware (H/W) backed by a disk image and a tap device; then a boot loader; then a real mode OS; then the full OS; and finally apps running on that OS.]


How do you build a VMM? (part 1)

    int kvm_fd = open("/dev/kvm", O_RDWR);              /* (1) */
    int kvm_vm = ioctl(kvm_fd, KVM_CREATE_VM, 0);       /* (2) */
    int kvm_vcpu = ioctl(kvm_vm, KVM_CREATE_VCPU, 0);   /* (3) */
    int r = ioctl(kvm_vcpu, KVM_RUN);                   /* crash */


How do you build a VMM? (part 2)

    void* memory_alloc = malloc(100 * 1024 * 1024);
    struct kvm_userspace_memory_region m = {
        .slot = 0,
        .flags = 0,
        .guest_phys_addr = 0,
        .memory_size = 100 * 1024 * 1024,
        .userspace_addr = (__u64)memory_alloc,
    };

    /* Memory regions are set on the VM fd, not the VCPU fd. */
    int r = ioctl(kvm_vm, KVM_SET_USER_MEMORY_REGION, &m);   /* (4) */
    r = ioctl(kvm_vcpu, KVM_RUN);                            /* crash */


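How do you build a VMM? (part 3)

    /* Map the kvm_run structure shared with the kernel for this VCPU. */
    int run_size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
    struct kvm_run *kvm = mmap(NULL, run_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, kvm_vcpu, 0);

    int r = ioctl(kvm_vcpu, KVM_RUN);
    if (kvm->exit_reason == KVM_EXIT_IO &&
        kvm->io.port == 0xCF8)
    {
        /* Pretend to be a PCI bus! */
    }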
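Port 0xCF8 is the PCI configuration address register, so "pretending to be a PCI bus" starts with decoding what the guest writes there. The sketch below decodes that 32-bit value following the standard PCI configuration mechanism layout (enable bit 31, bus in bits 23-16, device in bits 15-11, function in bits 10-8, register in bits 7-2). The `pciAddress` type and `decodeConfigAddress` function are hypothetical names for this illustration and are not taken from novm.

```go
package main

import "fmt"

// pciAddress is a hypothetical decoded form of a write to port 0xCF8.
type pciAddress struct {
	Enabled  bool
	Bus      uint8
	Device   uint8
	Function uint8
	Register uint8
}

// decodeConfigAddress splits a 32-bit CONFIG_ADDRESS value into its fields.
func decodeConfigAddress(v uint32) pciAddress {
	return pciAddress{
		Enabled:  v&(1<<31) != 0,      // bit 31: enable configuration access
		Bus:      uint8(v >> 16),      // bits 23-16
		Device:   uint8(v>>11) & 0x1f, // bits 15-11
		Function: uint8(v>>8) & 0x07,  // bits 10-8
		Register: uint8(v) & 0xfc,     // bits 7-2, dword aligned
	}
}

func main() {
	// Bus 0, device 3, function 1, register 0x10, with the enable bit set.
	addr := decodeConfigAddress(0x80001910)
	fmt.Printf("%+v\n", addr)
}
```

A VMM would typically remember the decoded address and use it to answer the guest's next access to the data port, 0xCFC.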