Faculty of Computer Science, Institute for System Architecture, Operating Systems Group
Virtualization
Dresden, 2009-12-01
Henning Schild
TU Dresden, 2009-12-01 MOS - Virtualization Slide 2 of 58
So Far ...
● Basics
  ● Introduction
  ● Threads & synchronization
  ● Memory
● Real-time
● Resource Management
● Device Drivers
Today: Virtualization
● Introduction
  ● Motivation & classification, flavors
● L4Linux: Para-virtualization on top of L4
  ● Architecture
  ● Address space layout
  ● Scenarios
● NOVA – a μ-hypervisor
● KVM on FiascoOC
One possible definition ...
● Introduction of layers of abstraction between physical resources and users/applications
  ● partitioning of resources
  ● aggregation of resources
  ● combinations
Virtualization flavours
● Multitasking
  ● OS as layer of abstraction
  ● machine partitioning, virtual memory and time slices
● Application level
  ● Unix chroot
  ● FreeBSD Jails, Solaris Zones, Linux Vserver
  ● Wine
  ● …
● Multiple OSs on one machine
  ● VMware, QEMU, VirtualBox
  ● UML, Xen, L4Linux
Virtualization – a hype
● A lot of interest in the research community in recent years, e.g.:
  ● SOSP 03: Xen and the Art of Virtualization
  ● EuroSys 07: a whole session on virtualization
● Many virtualization products:
  ● VMware, QEMU, VirtualBox, KVM, Hyper-V
● x86 hardware support
● Further increasing demand:
  ● VMware: from 240 to 6300 employees within the last few years
Virtualization - a new idea?
● Originates in IBM's CP/CMS series used on System/3xx mainframes (starting ~1964)
  ● Control Program - VMM
  ● Cambridge Monitor System - guest OS
● Memory protection
● SIE instruction (VM mode)
  ● CP encodes much of the guest privileged state in a hardware-defined format
● IBM's first virtual memory system
Motivation
Virtualization - Motivation
● Optimize utilization
  ● server consolidation
● Isolation
  ● security reasons
  ● incompatibility
● Reusing legacy software
  ● e.g. Windows on Linux
● Development
  ● virtual test machines
Virtualization - Buzzwords
TCO
Migration
Consolidation
Availability
Utilization
Efficiency
Security
Flexibility
Manageability
Virtual Appliance
Maintainability
Virtualization
Formal Requirements
● Equivalence
  ● guest behaviour should match the real machine
● Isolation
  ● host controls resource access
  ● guests are isolated from the host and from each other
● Efficiency
  ● guest code should be executed natively
See paper reading 2010-01-12: “Formal requirements for virtualizable third generation architectures”
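The paper named above turns these requirements into a simple test: a VMM based on trap-and-emulate can be built if every sensitive instruction (one that reads or changes privileged machine state) is also privileged (traps outside the most privileged mode). A minimal sketch of that check; the `Instruction` record and the small x86 table are illustrative assumptions, not a complete instruction list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    name: str
    sensitive: bool   # reads or changes privileged machine state
    privileged: bool  # traps when executed outside the most privileged mode


def non_virtualizable(isa):
    """Return sensitive instructions that do NOT trap.

    Trap-and-emulate is possible only if this list is empty, i.e. the set of
    sensitive instructions is a subset of the privileged instructions.
    """
    return [i.name for i in isa if i.sensitive and not i.privileged]


# Illustrative subset of 32-bit x86 before VT-x/SVM (not a complete table).
X86_SUBSET = [
    Instruction("hlt", sensitive=True, privileged=True),
    Instruction("lgdt", sensitive=True, privileged=True),
    Instruction("mov to cr3", sensitive=True, privileged=True),
    Instruction("popf", sensitive=True, privileged=False),  # IF silently unchanged
    Instruction("sgdt", sensitive=True, privileged=False),  # reveals the real GDT base
    Instruction("add", sensitive=False, privileged=False),
]
```

For this subset, `non_virtualizable(X86_SUBSET)` yields `['popf', 'sgdt']`, which is why classic x86 needs binary translation, paravirtualization or hardware support.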
Classification help
● Virtualization - an overloaded term
● Some classification criteria:
  ● Objective target: hardware, OS API or ABI?
  ● Emulation vs. virtualization: do we have to interpret some or all instructions? Binary vs. byte code interpretation (e.g. JVM)
  ● Can we modify the target software? (e.g. using para-virtualization techniques)
Reimplementation of the OS interface
● Used to integrate existing software into other or newly created OSes
● When copying the API of an OS, the target software needs to be re-linked
● In contrast, ABI emulation can run unmodified binaries, e.g. Wine
● Disadvantages of both approaches:
  ● huge effort
  ● shooting at a moving target
Virtualize the hardware
● Instead of emulating the OS API or ABI, virtualize the underlying platform
  ● common to many OSs
● Emulation
  ● interpret/translate guest code
● Virtualization
  ● native execution of guest code
  ● with or without HW support
● Paravirtualization
  ● modification of the guest
Emulation
● Binary translation/interpretation of guest code
  ● no native execution
  ● contradicts the efficiency requirement
  ● applicable to a lot of architectures
  ● often used for peripheral devices
● Examples: QEMU, Bochs
  ● QEMU emulates x86, ARM, SPARC, PowerPC ...
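To make "no native execution" concrete: an emulator fetches, decodes and executes every guest instruction in software. A minimal interpreter sketch for a made-up toy ISA (`li`, `add`, `addi`, `jnz`, `halt` are hypothetical); it is not how QEMU works internally (QEMU translates guest code dynamically) but shows the basic loop an interpreter such as Bochs is built around.

```python
def run(program, regs=None, max_steps=10_000):
    """Interpret a toy guest program until 'halt'.

    program: list of tuples such as ("li", "r0", 5) or ("jnz", "r1", 0).
    Returns the final register file as a dict.
    """
    regs = dict(regs or {})
    pc = 0
    for _ in range(max_steps):
        op, *args = program[pc]                  # fetch + decode
        if op == "halt":
            return regs
        if op == "li":                           # load immediate
            reg, value = args
            regs[reg] = value
        elif op == "add":                        # dst += src
            dst, src = args
            regs[dst] = regs.get(dst, 0) + regs.get(src, 0)
        elif op == "addi":                       # dst += immediate
            dst, imm = args
            regs[dst] = regs.get(dst, 0) + imm
        elif op == "jnz":                        # jump if register != 0
            reg, target = args
            if regs.get(reg, 0) != 0:
                pc = target
                continue
        else:
            raise ValueError(f"illegal instruction {op!r} at pc={pc}")
        pc += 1
    raise RuntimeError("step limit exceeded")


# Guest program: sum 5 + 4 + 3 + 2 + 1 into 'acc'.
SUM_PROGRAM = [
    ("li", "acc", 0),
    ("li", "n", 5),
    ("add", "acc", "n"),
    ("addi", "n", -1),
    ("jnz", "n", 2),
    ("halt",),
]
```

Every guest instruction costs many host instructions here, which is exactly the efficiency problem noted above.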
Platform virtualization in software
● Guest OS runs natively in a less privileged mode
  ● privileged instructions fail and are handled by the VMM (trap-and-emulate)
  ● VMM derives and manages shadow structures from the guest's primary structures, e.g. shadow page tables
● JIT binary translation
● Examples: VMware, KQEMU, VirtualBox
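A minimal sketch of the trap-and-emulate loop described above, assuming a simulated CPU in which `cli`, `sti`, `mov_cr3` and `hlt` fault when executed deprivileged (for `cli`/`sti` this assumes IOPL 0). `VirtualCPU`, `execute_deprivileged` and `vmm_emulate` are hypothetical names; the point is that the VMM applies each privileged instruction's effect to the guest's virtual state only.

```python
class PrivilegedInstruction(Exception):
    """Raised by the simulated CPU when a deprivileged guest executes a privileged op."""

    def __init__(self, op, operand=None):
        super().__init__(op)
        self.op, self.operand = op, operand


class VirtualCPU:
    """Guest-visible privileged state kept by the VMM (hypothetical layout)."""

    def __init__(self):
        self.interrupts_enabled = True
        self.cr3 = 0
        self.halted = False


def execute_deprivileged(op, operand=None):
    """Simulated hardware: privileged ops fault when not in ring 0 (IOPL 0 assumed)."""
    if op in ("cli", "sti", "mov_cr3", "hlt"):
        raise PrivilegedInstruction(op, operand)
    return f"native:{op}"                      # unprivileged ops run directly


def vmm_emulate(vcpu, trap):
    """Trap handler: apply the instruction's effect to the virtual CPU only."""
    if trap.op == "cli":
        vcpu.interrupts_enabled = False
    elif trap.op == "sti":
        vcpu.interrupts_enabled = True
    elif trap.op == "mov_cr3":
        vcpu.cr3 = trap.operand                # a real VMM would also switch shadow page tables
    elif trap.op == "hlt":
        vcpu.halted = True                     # a real VMM would deschedule the vCPU


def run_guest(vcpu, instructions):
    """Execute guest instructions natively; emulate the ones that trap."""
    log = []
    for op, *operand in instructions:
        try:
            log.append(execute_deprivileged(op, *operand))
        except PrivilegedInstruction as trap:
            vmm_emulate(vcpu, trap)
            log.append(f"emulated:{op}")
    return log
```

Only the trapping instructions pay for a round trip into the VMM; everything else runs at native speed.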
x86 Virtualization
Problems with x86 virtualization
● Ring-alias problem
  ● guest OS runs in privilege level > 0
● Address space compression
  ● part of the guest OS's address space is used by the VMM (e.g. IDT, GDT)
● Some instructions do not trap, e.g.:
  ● popf: pops the stack into the EFLAGS register; causes interrupt handling problems (IF is not updated in user mode)
● Faulting implies performance loss
  ● kernel entry/exit -> doubled context switch
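The popf problem can be sketched with the protected-mode rule for the interrupt flag: POPF updates IF only if CPL <= IOPL; otherwise IF is silently left unchanged and no fault is raised. A deprivileged guest kernel that restores its interrupt state with popf therefore never traps, and the VMM cannot notice. This models only the IF bit (IOPL and the other flags are simplified away); `popf` here is a hypothetical helper, not a real API.

```python
IF_BIT = 1 << 9  # interrupt flag in EFLAGS


def popf(eflags, popped, cpl, iopl=0):
    """Simulated protected-mode POPF, IF bit only (other bits copied as-is).

    If CPL > IOPL the IF bit keeps its old value and no exception occurs.
    """
    if cpl <= iopl:
        return popped                                  # IF taken from the stack
    return (popped & ~IF_BIT) | (eflags & IF_BIT)      # IF silently unchanged
```

For example, a guest kernel at ring 1 trying to re-enable interrupts with `popf(0, IF_BIT | 1, cpl=1)` gets back `1`: the carry flag changed, IF did not, and nothing trapped.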
Hardware enabled virtualization
● Example: Intel VT
  ● root and non-root mode, VM entry and VM exit
  ● the Virtual Machine Control Structure (VMCS) in physical memory holds guest and host state plus additional control information
  ● VMCS control fields define the VM exit conditions, e.g. whether the guest traps when masking or unmasking interrupts; on a VM exit, the VMCS records the exit reason
● AMD SVM is similar
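A toy model of the VMCS idea: guest state, host state, control bits selecting which guest events cause a VM exit, and an exit-reason field the VMM inspects afterwards. The class, field and event names are hypothetical and do not follow Intel's actual VMCS layout or encodings.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyVMCS:
    """Toy stand-in for a VM control structure (all names hypothetical)."""
    guest_state: dict = field(default_factory=dict)   # e.g. rip, rsp, cr3
    host_state: dict = field(default_factory=dict)    # where the VMM resumes
    exit_controls: set = field(default_factory=set)   # events that cause a VM exit
    exit_reason: Optional[str] = None                 # filled in on VM exit


def guest_event(vmcs, event, guest_rip):
    """Simulated hardware in non-root mode: does this event leave the guest?"""
    vmcs.guest_state["rip"] = guest_rip
    if event in vmcs.exit_controls:
        vmcs.exit_reason = event          # saved for the VMM to investigate
        return "vm_exit"
    return "handled_in_guest"
```

The design point is that the VMM configures, once per VM, which events it wants to see; everything else stays in non-root mode without any software involvement.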
Hardware enabled virtualization
● Problematic instructions trap
● Reduced software complexity
● Examples: KVM, VirtualBox, Xen, Hyper-V, Windows 7 XP Mode, Parallels ...
MMU Virtualization
Shadow page tables
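● Memory tracing of the page tables
● Decode and emulate the guest's pagefaults
[Figure: the guest page table maps guest virtual memory to guest physical memory (= host virtual memory); the host page table maps host virtual memory to host physical memory; the shadow page table maps guest virtual memory directly to host physical memory]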
Shadow page tables
1) pagefault in guest (GVA)
2) caught by hypervisor/VMM
3) parse guest page tables (GVA → GPA)
4) maybe inject pagefault into guest and parse again
5) translate guest pt entry to shadow pt entry (GPA → HVA → HPA)
6) create mapping in shadow pt and resume
→ costly, recent x86 processors come with hardware support
GVA: guest virtual address, GPA: guest physical address, HVA: host virtual address, HPA: host physical address
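Steps 2 to 6 can be sketched as one fault handler. Assumptions: all tables are flat dicts from page number to page number (real x86 page tables are multi-level), and `gpa_to_hva` stands for however the VMM maps guest memory into its own address space. `handle_shadow_fault` and `GuestPageFault` are hypothetical names.

```python
PAGE = 4096


class GuestPageFault(Exception):
    """The guest itself has no mapping: the fault must be injected (step 4)."""


def handle_shadow_fault(gva, guest_pt, gpa_to_hva, host_pt, shadow_pt):
    """Resolve a shadow page table fault for gva and return the HPA.

    guest_pt:   GVA page -> GPA page (guest's own page table)
    gpa_to_hva: GPA page -> HVA page (VMM's mapping of guest memory)
    host_pt:    HVA page -> HPA page (host page table)
    shadow_pt:  GVA page -> HPA page (the table the hardware really uses)
    """
    vpn, offset = divmod(gva, PAGE)
    gpn = guest_pt.get(vpn)                 # 3) parse guest page table: GVA -> GPA
    if gpn is None:
        raise GuestPageFault(hex(gva))      # 4) genuine guest fault: inject it
    hvpn = gpa_to_hva[gpn]                  # 5) GPA -> HVA
    hpn = host_pt[hvpn]                     #    HVA -> HPA
    shadow_pt[vpn] = hpn                    # 6) install shadow entry, resume guest
    return hpn * PAGE + offset
```

Every guest page table update must also be traced so that stale shadow entries are removed, which is where much of the cost mentioned above comes from.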
MMU Virtualization with HW support
● Hardware can walk two levels of page tables
  ● the VM (nested) page table, constructed by the VMM, maps GPA to HPA
  ● the guest manages its own GVA to GPA tables
  ● no shadow paging in software required
  ● guest pagefaults can be resolved without mode switching
● AMD: nested paging, Intel: EPT
→ significant performance increase for VMs
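A minimal sketch of the two-stage translation the hardware performs with nested paging / EPT. Simplifications (assumptions): both tables are flat dicts of page numbers, and the fact that real hardware must also translate the guest's own page-table pages through the nested table is ignored. `nested_translate` is a hypothetical name.

```python
PAGE = 4096


def nested_translate(gva, guest_pt, nested_pt):
    """Two-stage translation: GVA -> GPA (guest table) -> HPA (nested table).

    guest_pt:  owned and updated by the guest, no VMM involvement
    nested_pt: built by the VMM (AMD nested paging / Intel EPT)
    Returns (hpa, fault): fault is 'guest_pf' or 'nested_pf' on failure.
    """
    vpn, offset = divmod(gva, PAGE)
    gpn = guest_pt.get(vpn)
    if gpn is None:
        return None, "guest_pf"      # delivered to the guest, no VM exit needed
    hpn = nested_pt.get(gpn)
    if hpn is None:
        return None, "nested_pf"     # VM exit: the VMM must back this GPA
    return hpn * PAGE + offset, None
```

Compared with shadow paging, only `nested_pf` involves the VMM, and the VMM no longer has to trace writes to guest page tables.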
Paravirtualization
Paravirtualization
● Modify the guest OS to integrate it into the runtime environment of another OS
● Advantages:
  ● no hardware support required
  ● cooperation from guests possible
● Disadvantages:
  ● source code required
  ● high development cost
● Examples: L4Linux, Xen, User Mode Linux, coLinux
● Afterburner (Karlsruhe): modify binary code
● Paravirtualized drivers: VMware, KVM (virtio)
XEN
Examples from TUDOS group
L4Linux
L4Linux: history
● Presented at SOSP '97
  ● based on x86 Linux 2.0 on top of the first L4 kernel
● (L4)Linux has evolved over the years
  ● 2.2 supported MIPS and x86
  ● 2.4 first version to run on L4Env
  ● 2.6 uses 'paravirtualization' L4 kernel features
● Recently
  ● latest L4Linux release 2.6.31
  ● x86 and ARM support
  ● SMP
Linux Architecture
[Figure: Linux architecture. Applications (user mode) enter the Linux kernel through the system-call interface. The kernel consists of an architecture-independent part (processes and scheduling, IPC, memory management with page allocation, address spaces and swapping, file systems with VFS and file system implementations, networking with sockets and protocols, device drivers) and an architecture-dependent part (system-call interface, hardware access) on top of the hardware (CPU, memory, PCI, devices).]
Linux Architecture
● Architecture-dependent part
  ● small: for x86 about 2% of the kernel
  ● System call interface: kernel entry, signal delivery, copy from/to user space
  ● Hardware access: CPU state and features, MMU, interrupts, memory-mapped I/O, I/O ports
● The architecture-dependent part implements a generic interface used by the architecture-independent part
L4Linux Architecture
[Figure: L4Linux architecture. The Linux kernel (same internal structure as in native Linux) runs as an L4 task in user mode, next to other L4 tasks such as L4IO, console, moe and sigma0. Each Linux application runs in its own L4 task. Only the FiascoOC microkernel runs in kernel mode on the hardware.]
L4Linux Architecture
● The Linux kernel and each Linux user process run in their own L4 task
● The L4/L4RE-specific part is implemented as a separate architecture: arch/l4, include/asm-l4
● The L4/L4RE architecture-dependent part is itself divided into an x86- and an ARM-specific part
● Most code is reused from the x86 or ARM architecture code, respectively
Linux address space layout
● 0x0 – TASK_SIZE
  ● user part
  ● changes on every context switch
● TASK_SIZE – 0xF...
  ● kernel part
  ● constant in all address spaces
● Physical memory mapped beginning at PAGE_OFFSET
[Figure: 32-bit address space from 0x00000000 to 0xFFFFFFFF. The user address space (application, libraries, …) runs from 0x00000000 up to TASK_SIZE; the kernel address space starts at PAGE_OFFSET (0xC0000000) with the kernel image and the mapping of physical memory, followed by vmalloc, kmap, …]
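A minimal sketch of this split, assuming the common 32-bit x86 configuration in which TASK_SIZE = PAGE_OFFSET = 0xC0000000 and the linear mapping of physical memory ("low memory") covers 896 MiB. The helper names are hypothetical; they mirror what Linux's `__va`/`__pa` do for the direct mapping, but unlike the real macros they check their range.

```python
PAGE_OFFSET = 0xC000_0000        # default 3G/1G split on 32-bit x86
TASK_SIZE = PAGE_OFFSET          # user part ends where the kernel part begins
LOWMEM_SIZE = 896 * 1024 * 1024  # typical size of the direct mapping


def is_user_address(vaddr):
    """True for addresses in the per-process user part (0x0 .. TASK_SIZE)."""
    return 0 <= vaddr < TASK_SIZE


def phys_to_virt(paddr):
    """Kernel virtual address of a low-memory physical address (cf. __va)."""
    if not 0 <= paddr < LOWMEM_SIZE:
        raise ValueError("physical address not covered by the direct mapping")
    return paddr + PAGE_OFFSET


def virt_to_phys(vaddr):
    """Inverse for direct-mapped kernel addresses only (cf. __pa)."""
    if not PAGE_OFFSET <= vaddr < PAGE_OFFSET + LOWMEM_SIZE:
        raise ValueError("not a direct-mapped address (e.g. vmalloc/kmap area)")
    return vaddr - PAGE_OFFSET
```

Because the kernel part is identical in every address space, a system call needs no address space switch in native Linux; this is exactly what L4Linux loses when the kernel moves into its own task.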
L4Linux address space layout
[Figure: native Linux layout (as on the previous slide) compared with L4Linux. The L4Linux server and each L4Linux user process have their own address space from 0x00000000 to 0xFFFFFFFF, with the FiascoOC microkernel above 0xC0000000. The L4Linux server contains the kernel image, guest-physical memory and the vmalloc/kmap areas, starting at PAGE_OFFSET; the L4Linux user process contains the application and libraries.]
L4Linux: problems to be solved
● L4Linux server has to:
  ● have some basic resources (memory, I/O)
  ● manage page tables of its user processes
  ● handle exceptions from user processes
  ● schedule its tasks
● L4Linux user processes have to:
  ● 'enter' the L4Linux kernel (now in a different address space)
● Kernel needs information from user processes that was formerly accessible in the same address space, e.g. syscall arguments
Linux address space management
● Architecture-independent part:
  ● general page table management
  ● implements allocator strategies
  ● page replacement strategies
  ● assumes a 4-level page table provided by the architecture-dependent part
● Architecture-dependent part:
  ● set, remove and test entries
  ● TLB handling
  ● Linux for 32-bit x86 (without PAE) uses 2-level page tables
[Figure: an application on top of the Linux kernel; memory management (page allocation, address spaces, swapping) uses the architecture-dependent part (i386, thread_info), which sits on the hardware]
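How the 2-level x86 table fits the generic 4-level interface: on 32-bit x86 without PAE, a virtual address splits into a 10-bit page directory index, a 10-bit page table index and a 12-bit offset, and the two middle levels of the generic code are folded away. A minimal sketch with dict-based tables (`split_vaddr` and `walk` are hypothetical helpers, not kernel functions).

```python
def split_vaddr(vaddr):
    """Split a 32-bit x86 virtual address (4 KiB pages, no PAE).

    Returns (page directory index, page table index, offset).
    """
    pgd_index = (vaddr >> 22) & 0x3FF   # bits 31..22
    pte_index = (vaddr >> 12) & 0x3FF   # bits 21..12
    offset = vaddr & 0xFFF              # bits 11..0
    return pgd_index, pte_index, offset


def walk(pgd, vaddr):
    """Two-level walk; None means 'not present', i.e. a pagefault."""
    pgd_index, pte_index, offset = split_vaddr(vaddr)
    page_table = pgd.get(pgd_index)
    if page_table is None:
        return None
    frame = page_table.get(pte_index)
    if frame is None:
        return None
    return (frame << 12) | offset
```

For example, `split_vaddr(0xC0101234)` gives `(0x300, 0x101, 0x234)`: kernel addresses from 0xC0000000 upward start at page directory entry 768.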
L4Linux address space management
● L4Linux user processes are actually L4 tasks
● L4Linux server is the pager
  ● hardware page tables are managed by the L4 kernel
  ● L4Linux page tables are mirrored
● L4Linux uses map/unmap operations
● Page table entries are added lazily (when a pagefault occurs)
[Figure: as on the previous slide, with the Fiasco kernel between the Linux kernel's architecture-dependent part and the hardware]
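A minimal model of the lazy mirroring described above: Linux keeps its own page tables, and the L4Linux server (as pager) only establishes an L4 mapping when the user process actually faults on a page. All class and method names are hypothetical; real L4 map/unmap operations are IPC-based and carry flexpages, which this sketch reduces to dict updates.

```python
class L4TaskModel:
    """Stand-in for an L4 task's hardware mappings (managed by the microkernel)."""

    def __init__(self):
        self.mappings = {}            # user page -> server page


class L4LinuxPagerSketch:
    """Hypothetical model of the L4Linux server acting as pager."""

    def __init__(self):
        self.linux_pt = {}            # Linux's own page table: user page -> server page

    def set_pte(self, page, frame):
        """Linux creates a PTE; no L4 mapping is made yet (lazy)."""
        self.linux_pt[page] = frame

    def clear_pte(self, task, page):
        """Linux removes a PTE; the mirrored L4 mapping must be unmapped too."""
        self.linux_pt.pop(page, None)
        task.mappings.pop(page, None)             # models an L4 unmap

    def pagefault(self, task, page):
        """Handle a pagefault IPC from a user process."""
        frame = self.linux_pt.get(page)
        if frame is None:
            return "linux_fault"      # real server: run Linux's fault handling first
        task.mappings[page] = frame               # models the map reply
        return "mapped"
```

Deferring the map until the first access avoids mirroring PTEs that the process never touches.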
General exception handling
● If an L4 task raises an exception, the kernel sends an exception IPC to its exception handler (feature of FiascoOC and L4.X2)
● The exception IPC contains the CPU state of the client
● The exception handler can reply with a new state, for instance another instruction pointer
● Exception IPC can be used to recognize Linux system calls:
  ● INT 0x80 will trigger an exception
  ● the L4Linux server acts as exception handler for its user processes
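A minimal sketch of the server side of this protocol, assuming a hypothetical message layout: the exception IPC carries the faulting process's CPU state as a dict, and the reply is a complete new state. The i386 details are real (syscall number in `eax`, INT 0x80 encoded as the two bytes CD 80, getpid is syscall 20, ENOSYS is 38); the assumption that the reported `eip` points at the INT instruction, and the one-entry syscall table, are illustrative.

```python
INT80_LEN = 2          # INT 0x80 is encoded as the two bytes CD 80
ENOSYS = 38            # Linux error number: function not implemented

# Hypothetical syscall table: only getpid (i386 syscall 20) is modelled.
SYSCALLS = {20: lambda state: 4711}


def handle_exception_ipc(state):
    """Exception handler in the style of the L4Linux server (hypothetical layout).

    state: CPU state of the faulting user process from the exception IPC.
    Returns the new CPU state sent back in the reply.
    """
    new_state = dict(state)                    # reply carries a full new state
    if state["exception"] == "int 0x80":
        nr = state["eax"]                      # i386 Linux: syscall number in eax
        handler = SYSCALLS.get(nr)
        new_state["eax"] = handler(state) if handler else -ENOSYS
        new_state["eip"] = state["eip"] + INT80_LEN   # assumes eip points at the INT
    else:
        new_state["pending_signal"] = True     # other exceptions become Linux signals
    return new_state
```

Each such system call costs the two kernel entries/exits and two address space switches listed on the next slide.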
L4Linux kernel entry
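● System call costs:
  ● 2x kernel entry/exit (exception and reply)
  ● 2x address space switch
[Figure: system call path in four numbered steps: the L4Linux user process executes INT 0x80; the Fiasco microkernel delivers the exception to the L4Linux server (architecture-dependent part, then architecture-independent part); the reply resumes the user process]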
Interrupt handling
● Interrupt messages are received in separate threads
● Interrupt threads run at a higher priority than other Linux threads (Linux semantics)
● Interrupt threads wake up the idle thread or force the running user process to enter the Linux server
● Plain Linux disables interrupts for synchronization
● L4Linux uses a lock instead of CLI/STI
[Figure: hardware interrupts reach the interrupt threads of the L4Linux server via the Fiasco kernel and L4IO; the interrupt threads call the device driver handlers registered with request_irq(irq_no, handler, …); the server's main thread is separate]
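A minimal sketch of "lock instead of CLI/STI", assuming a hypothetical class in which a `threading.Lock` plays the role of the interrupt flag: the interrupt thread takes the lock around its handler, and kernel code that would normally execute cli/sti takes the same lock. Python threads have no priorities, so the higher priority of real interrupt threads is not modelled.

```python
import threading


class L4LinuxIrqSketch:
    """Hypothetical model: a lock replaces cli/sti around shared kernel state."""

    def __init__(self):
        self._irq_lock = threading.Lock()   # stands in for the interrupt flag
        self.pending_packets = []

    def local_irq_disable(self):            # instead of 'cli'
        self._irq_lock.acquire()

    def local_irq_enable(self):             # instead of 'sti'
        self._irq_lock.release()

    def interrupt_thread(self, irq_messages):
        """Runs in its own thread; each message stands for one interrupt IPC."""
        for packet in irq_messages:
            with self._irq_lock:            # handler runs 'with interrupts disabled'
                self.pending_packets.append(packet)

    def drain(self):
        """Kernel code that must not race with the interrupt handler."""
        self.local_irq_disable()
        try:
            packets, self.pending_packets = self.pending_packets, []
        finally:
            self.local_irq_enable()
        return packets
```

A lock works here because the "interrupt" is just another thread in the server; disabling real hardware interrupts would neither be allowed for a user-level task nor stop the interrupt thread.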
not covered in detail here ...
● Linux kernel needs to access the address space of user processes (e.g. syscall arguments)
  ● walk the page tables of the user process
● Security problems with DMA
  ● move device drivers out of L4Linux
  ● I/O MMU
● L4Linux scheduling
  ● only one L4Linux process is active at a time
  ● other processes are waiting in IPC (exception or pagefault)
Hybrid applications
● Linux applications that are 'L4 aware'
  ● need to be detected by the Linux server
  ● the Linux server puts them in UNINTERRUPTIBLE state in its own data structures
  ● will not disturb ongoing IPC in the hybrid task
● L4Linux user processes run as aliens
  ● special alien flag used when creating a task
  ● aliens trap when invoking L4 system calls
  ● the exception handler monitors system calls
  ● Fiasco-only feature
L4Linux Use Cases
Real-time video player
● L4Linux user processes might use L4 services
[Figure: loader, roottask, moe, DOpE, an RT-MPEG player and L4Linux run on the FiascoOC kernel; an MPlayer frontend in L4Linux controls the RT-MPEG player]
Multiple L4Linux instances
● Using multiple instances concurrently, e.g. one per security domain
● Devices need to be multiplexed (see resource management lesson: ORe, nitpicker, windhoek)
● Communication through network, special IPC monitors ...
[Figure: loader, roottask, console, moe and a virtualization infrastructure run on the FiascoOC kernel; two L4Linux servers each host their own applications]
Use L4Linux as a toolbox
● L4Linux instances can provide access to various complex software stacks, e.g.:
  ● network stacks
  ● drivers
  ● filesystems
[Figure: loader, roottask, moe, an L4 application and L4Linux run on the Fiasco kernel; an alien filesystem wrapper connects the L4 application with L4Linux]
Faithful Virtualization
NOVA – μ hypervisor approach
● NOVA OS Virtualization Architecture
● Separate hypervisor and VMM(s)
[Figure: the hypervisor runs in kernel mode (root); servers and one VMM per virtual machine run in user mode (root); each guest OS runs in non-root mode on top of its VMM]
NOVA
● Hypervisor manages protection domains:
  ● address spaces and virtual machines
● Each virtual machine has an associated virtualization handler -> the VMM (codename: Vancouver)
● VMMs handle virtualization faults and implement virtual devices
● Split functionality of hypervisor and VMM
  ➔ reduced complexity of the hypervisor, on which security-sensitive applications run beside the VMs
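A minimal sketch of the split: the hypervisor only knows which VMM handles which VM and forwards each virtualization fault to it, while device emulation lives entirely in the user-level VMM. In NOVA the forwarding is an IPC; here it is a plain method call. The class names, exit format and the serial-port device (I/O port 0x3F8, the PC's COM1) are illustrative assumptions.

```python
class VirtualSerialPort:
    """Hypothetical virtual device implemented entirely in the user-level VMM."""
    PORT = 0x3F8

    def __init__(self):
        self.output = []

    def io_write(self, value):
        self.output.append(chr(value))


class VMM:
    """User-level VMM: handles the virtualization faults of exactly one VM."""

    def __init__(self):
        self.devices = {VirtualSerialPort.PORT: VirtualSerialPort()}

    def handle_exit(self, exit_info):
        if exit_info["reason"] == "io_out":
            self.devices[exit_info["port"]].io_write(exit_info["value"])
            return "resume"
        return "kill_vm"                  # unhandled exit in this sketch


class Hypervisor:
    """Kernel part: no device models, only VM-to-handler bookkeeping."""

    def __init__(self):
        self.handlers = {}

    def create_vm(self, vm_id, vmm):
        self.handlers[vm_id] = vmm

    def vm_exit(self, vm_id, exit_info):
        return self.handlers[vm_id].handle_exit(exit_info)   # forward to user level
```

A bug in a device model then compromises only one VMM and its VM, not the hypervisor or the applications running beside the VMs.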
FiascoOC and KVM-L4
● FiascoOC provides AMD SVM support
● KVM can be reused with little modification
[Figure: the Fiasco kernel (host, kernel mode) runs loader, roottask, DMPhys, names and the L4Linux server containing KVM-L4; qemu-kvm runs as an L4Linux user process; the guest OS runs in guest mode]
FiascoOC and KVM-L4
● FiascoOC supports AMD SVM
  ● memory is mapped to VMs using the map/unmap mechanism
  ● invoke the VM capability to enter guest mode
● Existing VMM can be reused
  ● KVM with little modification
  ● low development cost
● Virtual machines next to secure applications
Summary
● Virtualization flavours
  ● API or ABI emulation
  ● Emulation
  ● Full virtualization
  ● Hardware (especially x86) or OS
  ● Paravirtualization
● L4Linux: paravirtualization in detail
  ● Address space layout & management
  ● Taming Linux (interrupts, I/O memory)
● Faithful virtualization
  ● NOVA: minimal hypervisor + VMM from scratch
  ● KVM-L4: reusing a VMM
References
● Tom Van Vleck: 'The IBM 360/67 and CP/CMS' http://www.multicians.org/thvv/360-67.html
● Keith Adams and Ole Agesen: 'A Comparison of Software and Hardware Techniques for x86 Virtualization' ASPLOS 2006 http://www.vmware.com/pdf/asplos235_adams.pdf
● Intel Virtualization Technology http://www.intel.com/technology/itj/2006/v10i3/1-hardware/1-abstract.htm
● H. Härtig, M. Roitzsch, A. Lackorzynski, B. Döbel and A. Böttcher: 'L4 – Virtualization and Beyond'
References
● Udo Steinberg: 'NOVA Hypervisor Architecture Whitepaper' Internal Report 2007
● L4Linux Webpage http://os.inf.tu-dresden.de/L4/LinuxOnL4
● Adam Lackorzynski: 'L4Linux Porting Optimizations' Diploma Thesis 2004 http://os.inf.tu-dresden.de/papers_ps/adam-diplom.pdf
Outlook
● Now, paper reading:
  ● Singularity - Rethinking the Software Stack
● Next weeks:
  ● legacy containers
  ● OS personalities