View
4
Download
0
Category
Preview:
Citation preview
T.J. Watson Research Center
Hardware Virtualization Trends 6/14/2006 © 2006 IBM Corporation
Hardware Virtualization Trends
Leendert van Doorn
T.J. Watson Research Center
© 2006 IBM Corporation2 Hardware Virtualization Trends 6/14/2006
T.J. Watson Research Center
© 2006 IBM Corporation3 Hardware Virtualization Trends 6/14/2006
Outline
Virtualization 101The world is changingProcessor virtualization (Intel VT-x, VT-x2, AMD SVM)Security enhancements: LT & PresidioParavirtualization (software isolation approach)I/O Virtualization (AMD, Intel VT-d)Hypervisor Landscape
Talk is based on publicly available information
T.J. Watson Research Center
© 2006 IBM Corporation4 Hardware Virtualization Trends 6/14/2006
Virtualization In Servers
Reduce total cost of ownership (TCO)– Increased systems utilization (current
servers have less than 10% utilization)– Reduce hardware (25% of the TCO)– Space, electricity, cooling (50% of the
operating cost of a data center)Increase server utilizationManagement simplification – Dynamic provisioning– Workload management/isolation– Virtual machine migration– ReconfigurationBetter securityLegacy compatibilityVirtualization protects IT investment
Virtualization Layer
Virtual Machines
Physical
Machine
T.J. Watson Research Center
© 2006 IBM Corporation5 Hardware Virtualization Trends 6/14/2006
Virtualization is not a Panacea
Increasing utilization through consolidation decreases the reliability– Need better hardware reliability (increased MTBF), error reporting, and fault
tolerance– Need better software fault isolation
Virtualization Layer
3 independent systems
3 dependent systems
T.J. Watson Research Center
© 2006 IBM Corporation6 Hardware Virtualization Trends 6/14/2006
Virtual Machine Monitor Approaches
Hardware
Host OS
VMM
Guest OS 1 Guest OS 2
App App
Hardware
Host OS VMM
Guest OS 1 Guest OS 2
App App
Hardware
VMM
Guest OS 1 Guest OS 2
App App
Type 2 VMM Type 1 VMMHybrid VMM
JVM
CLR
VMware Workstation
MS Virtual Server
VMware ESX
Xen
MS Viridian
T.J. Watson Research Center
© 2006 IBM Corporation7 Hardware Virtualization Trends 6/14/2006
Example: Xen System Architecture
VMM and Microkernel are akin
Control, I/O
(domain 0)
Hardware
Xen
Domain 0
Application
Application
Application
XenoLinux
Application
Application
Application
Windows
Application
Application
Application
Guest
domain
Guest
domain
T.J. Watson Research Center
© 2006 IBM Corporation8 Hardware Virtualization Trends 6/14/2006
VMM needs to intercept the privileged instructions that are executed in the guest OS
Traditional x86 architecture is not virtualizable– POPF: pop value from stack, set flags
– If in privileged mode: IF is set
– If in non-privileged mode: IF is not set, no exception is raised
x86 Virtualization Problem
eflags
T.J. Watson Research Center
© 2006 IBM Corporation9 Hardware Virtualization Trends 6/14/2006
Virtualization Approaches
Full virtualization– Binary rewriting
• Inspect each basic block, rewrite privileged instructions• VMware, Virtual PC, qemu
– Hardware assist (Intel VT-x, AMD SVM)• Conceptually, introduce a new CPU mode• Xen, VMware, MS Viridian
Paravirtualization– Modify guest OS to cooperate with the VMM
– Xen, VMware, L4, Denali
T.J. Watson Research Center
© 2006 IBM Corporation10 Hardware Virtualization Trends 6/14/2006
The World is Changing
Hardware virtualization support will become ubiquitous– Every CPU will have (some) kind of virtualization support– Intel VT-x, AMD SVM (Pacifica), PowerPC, Cell64-bit is here to stayProcessor security features– Dynamic root of trust (AMD SVM, Intel LT)Massive multi/many core– Computing cycles will become really cheap and the vendors cannot keep up– What do you do with 64 cores on a single socket? Reliability? Health monitoring?I/O MMU– Intel, AMD IOMMU, IBM Summit/Hurricane allow secure direct device access to
partitions– PCI- Express will include virtualization support
T.J. Watson Research Center
© 2006 IBM Corporation11 Hardware Virtualization Trends 6/14/2006
Processor Virtualization Features
Both Intel and AMD defined processor extensions for their CPU architectures
Intel: Vanderpool Technology (VT-x, VT-x2)
AMD: Secure Virtual Machine (SVM, Pacifica), Rev F, Rev G, …
From 10,000 ft. both look very similar:– Container model (similar to mainframe SIE, start
interpretive execution)
T.J. Watson Research Center
© 2006 IBM Corporation12 Hardware Virtualization Trends 6/14/2006
Intel Vanderpool Technology (VT)
VT-x adds new instructions such as VMXON, VMXOFF, VMLAUNCH, VMRESUME, VMCALL, …VM entry is caused by a VMLAUNCH or a VMRESUMEEach guest has a VMCS (VM control segment) for its state
Guest OS 1 Guest OS 2
VMM
VM entryVM exit
VM entry
VM exit
VMXON VMXOFF
Instruction stream
T.J. Watson Research Center
© 2006 IBM Corporation13 Hardware Virtualization Trends 6/14/2006
AMD Secure Virtual Machine (SVM) Technology
SVM adds the following new instructions: VMRUN, VMMCALL, …VM entry is caused by a VMRUN [rax]Each guest has a VMCB (VM control block) for its stateAlso known as Pacifica
Guest OS 1 Guest OS 2
VMM
VM entryVM entry
VM exitVM exit
Instruction stream
T.J. Watson Research Center
© 2006 IBM Corporation14 Hardware Virtualization Trends 6/14/2006
Intercepts and Exits
A guest runs until– it performs an action that causes an exit
– it executes a VMCALL/VMMCALL
Exit conditions are specified per guest:– Exceptions (e.g., page faults) and interrupts
– Instruction intercepts (CLTS, HLT, IN, OUT, INVLPG, MONITOR,MOV CR/DR, MWAIT, PAUSE, RDTSC …)
Intel VT-x has shadow registers
T.J. Watson Research Center
© 2006 IBM Corporation15 Hardware Virtualization Trends 6/14/2006
Example: Full Virtualization Support for Xen
Most device emulation is implemented in ioemu (PCI, VGA, IDE, NE2100, …)High performance drivers, such as ioapic, lapic, vpit are implemented in XenDeveloped by Intel, AMD and IBM
HVM domain
Hardware
Xen
RHEL3_U5
Application
Application
Application
Domain 0A
pplication
Application
ioemu
exit
T.J. Watson Research Center
© 2006 IBM Corporation16 Hardware Virtualization Trends 6/14/2006
Example: Xen Implementation Statistics
Lines of Code (C, assembly, headers):– Xen: Intel VT-x specific code: 3718 (3.7%)
– Xen: AMD SVM specific code: 5721 (5.6%)
– Xen: Common HVM code: 5794 (5.7%)
– Tools: Common HVM code: 86313 (85%)
Xen 3.0.2 contains both Intel VT-x and AMD SVM support
T.J. Watson Research Center
© 2006 IBM Corporation17 Hardware Virtualization Trends 6/14/2006
The Cost Of VM Entry And Exit
VM entry and exits are very heavy weight and expensive operationsVT-x specification has 11 pages of conditions that need to be checked just on a single VM entry!It is paramount to reduce number of exits– Shadow page tables– Direct device assignment
T.J. Watson Research Center
© 2006 IBM Corporation18 Hardware Virtualization Trends 6/14/2006
Traditional Virtual Memory Map
Virtual to physical translation (page table) is maintained by the OS
The CPU walks the page tables automatically
Page faults when page is not present or access violation
CPU uses Translation-Lookaside-Buffer (TLB) to cache lookups
0
1GB
Virtual
Address space
0
4GB
Physical
Address space
T.J. Watson Research Center
© 2006 IBM Corporation19 Hardware Virtualization Trends 6/14/2006
Virtualized Memory Map
0
1GB
0
4GB
Guest Virtual
Address space
Guest Physical
Address space
Host Physical
Address space
0
4GB
T.J. Watson Research Center
© 2006 IBM Corporation20 Hardware Virtualization Trends 6/14/2006
Shadow Page Tables
VMM maintains a shadow copy of the guest page table to translate from guest virtual to host physicalHardware only sees the shadow copy
0
1GB
Guest Virtual
Address space
0
4GB
Guest Physical
Address space
0
1GB
Guest Virtual
Address space
0
4GB
Host Physical
Address space
GUEST VMM
cr3
T.J. Watson Research Center
© 2006 IBM Corporation21 Hardware Virtualization Trends 6/14/2006
Shadow Page Table Issues
Managing the Shadow Page Table is expensive– All page faults are handled by the VMM, it has to walk the
guest page tables, and instantiate a shadow entry– VMM needs to propagate access and modify bits
• A&M bits are used by the demand paging algorithms• The hardware modifies the shadow page table entry• VMM needs to emulate A&M behavior for the guest• May take up to 3 actual page faults per one guest page fault
Next generation virtualization extension will do this in hardware (Intel VT-x2, AMD SVM)
T.J. Watson Research Center
© 2006 IBM Corporation22 Hardware Virtualization Trends 6/14/2006
Double (Page Table) Walker
AMD Nested Page Table support
Intel Extended Page Table (EPT)
0
1GB
0
4GB
Guest Virtual
Address space
Guest Physical
Address space
Host Physical
Address space
0
4GB
Hardware
T.J. Watson Research Center
© 2006 IBM Corporation23 Hardware Virtualization Trends 6/14/2006
Processor Security Features
Both Intel and AMD are working on hardware security constructs to enhance their virtualization offeringsPrimarily client focused and driven by Microsoft’s NGSCB designsThese enhancements include processor modifications to support– Isolation (VMM)– Trusted computing (TCG)– Trusted keyboard/graphics I/OIntel Lagrande TechnologyAMD Presidio Technology
T.J. Watson Research Center
© 2006 IBM Corporation24 Hardware Virtualization Trends 6/14/2006
AMD SVM Dynamic Root of Trust
New instruction SKINIT that essentially reboots the CPU into a known state– Start a 64KB secure loader– Interrupts disabled and other processors idled– Inhibit DMA to the secure loader memory area– Measurement of the secure loader is stored in the TCG
trusted platform module (a secure passive storage device)
Measurement can be attested to by a remote party
T.J. Watson Research Center
© 2006 IBM Corporation25 Hardware Virtualization Trends 6/14/2006
Intel VT-x / AMD SVM Comparison
Intel VT-x (2005)VMENTER, VMRESUME, VMREAD, VMWRITEVMCS – VM control segment
Intel VT-x2 (200?)Extended Page Tables (EPT)
Intel LT (200?)SENTER (security)DMA exclusion vector (security)
AMD SVM (2006)VMRUNVMCB – VM control blockASID tagged TLB (performance)Paged realmodeSKINIT (security)DMA exclusion vector (security)
AMD SVM (2007)Nested paging (performance)
T.J. Watson Research Center
© 2006 IBM Corporation26 Hardware Virtualization Trends 6/14/2006
Paravirtualization
What do you do when you don’t have hardware support?Modify guest kernel to cooperate with the hypervisor– Introduce hypercalls– All privileged instructions become hypercalls– such as: mov-to-cr3, sti, cli, etc.All guest applications & libraries are unmodifiedThis is the approach IBM i/p-Series took, but also Xen, Microsoft’s Viridian, L4, Denali and Virtual Iron– VMware has now proposed a Virtual Machine Interface (VMI)Paravirtualization has the potential to obviate the need for all the CPU virtualization extensions– Not quite, certain features such as Thread Local Storage are hard to do in a
paravirtualized environment
T.J. Watson Research Center
© 2006 IBM Corporation27 Hardware Virtualization Trends 6/14/2006
Example: Thread Local Storage
Thread local storage (TLS) is used by glibc to store per thread dataIn Linux, TLS resides at the top 64MB, where the Xen VMM residesHere Xen breaks compatibility at the application interface layer– Xen uses a modified glibc
4GB
0
3GB
0
Xen VMM
Linux
App
App
TLS-64M
3GB
T.J. Watson Research Center
© 2006 IBM Corporation28 Hardware Virtualization Trends 6/14/2006
Example: Xen Writable Page Tables
0
4GB
Guest Domain
Application
Application
Application
Guest Domain
Application
Application
Application
Globally Shared
Page Table
Xen VMM
readonly
readonly
writeswritesinsert after vetted by Xen
T.J. Watson Research Center
© 2006 IBM Corporation29 Hardware Virtualization Trends 6/14/2006
VMware’s Virtual Machine Interface
CPU state Interupt state, interrupt mask, halt, reboot, …
CPU descriptors Gdt, idt, ldt, tr, …CPU control Read/write MSR, CR0..4, …CPU information Cpuid, read/write tsc, pmc, …
Stack/privilege transitions
Update kernel stack, iret, sysexit
I/O In/out, delay, IOPL, invdAPIC Read/writeTimer Get wall clock, frequency, cycles,
set alarm, …MMU Set linear mapping, flush TLB,
invalidate, set PTE, swap PTE, …
T.J. Watson Research Center
© 2006 IBM Corporation30 Hardware Virtualization Trends 6/14/2006
CPU Virtualization Techniques Comparison
Performance Legacy guest support
VMM complexity
Binary rewriting medium yes high
paravirtualization high no medium
Hardware assist (current gen)
low yes medium-low
Hardware assist (next gen)
medium yes medium-low
low medium high
T.J. Watson Research Center
© 2006 IBM Corporation31 Hardware Virtualization Trends 6/14/2006
Typical System Architecture
NetworkController
Videocontroller
Diskcontroller
CPUtext
RAMPCI
Bridge/IOMMU
texttext
Virtual CPU
CPU
PCIbus
Virtual I/O AddressPhysical Address
T.J. Watson Research Center
© 2006 IBM Corporation32 Hardware Virtualization Trends 6/14/2006
I/O Memory Management
I/O MMU translates I/O virtual addresses to host physical addressesMain functions– Memory isolation– Address remapping (guest PA == I/O VA for devices assigned to
guest)– Eliminate bounce buffers (32-bit devices into 64-bit PCI space)Protect system against BUSMASTER devicesAMD disclosed their design in January 2006, Intel followed in March 2006 (VT-d)– Again, from a 10,000 ft. view, both are very similarIBM has the same Summit IOMMU in p/i/xSeries
T.J. Watson Research Center
© 2006 IBM Corporation33 Hardware Virtualization Trends 6/14/2006
I/O Hosting Partition
Kernel <-> Hypervisor Interface
Hardware Platform
Hypervisor
Logical Partition Logical Partition Logical Partition Logical Partition
Hardware <-> Hypervisor Interface
• With I/O hosting domain/partition, all real drivers extracted from DomU.
• Can have multiple Device Domains to support different devices.
• Reasonable performance possible through batching, (page flipping)…(“Unmodified device driver reuse via virtual machines”OSDI04…)
• But performance is just not good enough to get rid of all native devices.
T.J. Watson Research Center
© 2006 IBM Corporation34 Hardware Virtualization Trends 6/14/2006
Direct Device Assignment
Kernel <-> Hypervisor Interface
Hardware Platform
Hypervisor
Logical Partition Logical Partition Logical Partition Logical Partition
Hardware <-> Hypervisor Interface
• With IOMMU can directly give partitions control over Bus/Dev/Func.
• Same HW results in improved reliability.
• With right HW can support fully virtualized OS (e.g., windows).
• Clean support for legacy OSes and for highest performance devices (majority probably still virtualized)
• Migration becomes impossible.
• DD in each OS
T.J. Watson Research Center
© 2006 IBM Corporation35 Hardware Virtualization Trends 6/14/2006
Self Virtualizing Devices
Kernel <-> Hypervisor Interface
Hardware Platform
Hypervisor
Logical Partition Logical Partition Logical Partition Logical Partition
Hardware <-> Hypervisor Interface
• Self virtualizing devices allow direct access by partition, e.g., infiniband.
• No overhead (throughput, latency, serialization) to context switch to device domain.
• Is exception for high performance devices… no migration.
• DD in each OS
T.J. Watson Research Center
© 2006 IBM Corporation36 Hardware Virtualization Trends 6/14/2006
AMD I/O MMU
AMD IOMMU has a distributed I/O TLBDevices themselves can do the translation of I/O virtual to physical
NetworkController
Videocontroller
Diskcontroller
PCIBridge/
IOMMU
PCIbus
Virtual I/O AddressPhysical Address
…
Ethernet
IOTLB
IOTLB
Physical address Virtual I/O address
T.J. Watson Research Center
© 2006 IBM Corporation37 Hardware Virtualization Trends 6/14/2006
Partition Migration
IOMMU provides isolation and remapping for direct device assignment
The current generation of IOMMU proposals do not handle partition migration– Problem: PCI has no ability to restart a request and does
not notify failed PCI requests in all cases
PCI SIG is working on adding support for this
T.J. Watson Research Center
© 2006 IBM Corporation38 Hardware Virtualization Trends 6/14/2006
Hypervisor Landscape
VMware is the undisputed leader in the x86 virtualization space– Its binary translation technology is superior– Only uses VT-x in x86-64 because unlike AMD, Intel does not provide
segment limits– Very mature product
Xen is an open source hypervisor shipped as part of RedHat and Suse Linux– Uses paravirtualization for Linux– VT-x/SVM for unmodified guest OS support– Immature product
And then there is the big unknown …
T.J. Watson Research Center
© 2006 IBM Corporation39 Hardware Virtualization Trends 6/14/2006
Microsoft Viridian
Will be released after LongHorn Server (end 2007)Will run Windows and LinuxUses Intel VT-x, AMD SVM, and paravirtualization (enlightments)
HardwareHypervisor
Guest Applications
VM WorkerVM WorkerVM Worker
WMI
VMService
Virtualization stack
Windows WindowsVSPs VSCs
vmbus
kernel kernel
enlightments
T.J. Watson Research Center
© 2006 IBM Corporation40 Hardware Virtualization Trends 6/14/2006
Summary
Technology Trends– Hardware virtualization
support will become ubiquitous
– 64-bit is here to stay
– Processor security features
– Massive multi/many core
– I/O MMU
Challenges– First generation virtual-
ization support is just a technology preview
– Increase system reliability• Hardware fault tolerance• Software robustness
– Management of virtual environments
T.J. Watson Research Center
© 2006 IBM Corporation41 Hardware Virtualization Trends 6/14/2006
Sources
AMD SVM specification, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24594.pdfAMD SVM availability of Nested Paging, http://www.amd.com/us-en/assets/content_type/DownloadableAssets/JoeMenardAMDAnalystDayv2.pdfAMD IOMMU specification, http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/34434.pdfIntel VT-x and VT-d specification, http://www.intel.com/technology/computing/vptech/Intel VT-d release date (2007), http://www.virtualization.info/2006/03/intel-working-on-new-virtualization.htmlIntel LT specification, http://www.intel.com/technology/security/Merom, Conroe details, http://en.wikipedia.org/wiki/List_of_Intel_Core_2_microprocessorsMicrosoft Viridian, http://h40132.www4.hp.com/upload/ch/de/MicrosoftVS2005R2HPEvent.pdf, http://download.microsoft.com/download/4/1/e/41e56f34-8f90-405e-9daf-f8aeea249935/InTrack14dec_Presentatie_clean.ppt#35VMWARE and CPU virtualization technology, http://download3.vmware.com/vmworld/2005/pac346.pdfVMWARE Virtual machine Interface, http://www.vmware.com/pdf/vmi_specs.pdf
T.J. Watson Research Center
© 2006 IBM Corporation42 Hardware Virtualization Trends 6/14/2006
BACKUP
T.J. Watson Research Center
© 2006 IBM Corporation43 Hardware Virtualization Trends 6/14/2006
Code Word Soup
Intel VT-x, VT-x2 Vanderpool (processor virtualization) Technology for x86 architecture
Intel VT-i VT for IPF (Itanium)
Intel VT-d Intel’s IOMMU implementation
Intel LT Intel’s Lagrande (security) Technology
AMD SVM (Pacifica) AMD’s Secure Virtual Machine CPU extensions
AMD Presidio AMD’s Lagrande equivalent
T.J. Watson Research Center
© 2006 IBM Corporation44 Hardware Virtualization Trends 6/14/2006
TCG Static Root of Trust
Trusted Boot
CRTMCRTM
GRUBStage1(MBR)
GRUBStage1(MBR)
LinuxKernelLinuxKernel
PCR01-07
POSTPOST
ROT
BIOS Bootloader
GRUBStage1.5GRUB
Stage1.5
PCR04-05TPM
/bin/ls
Operating System
/bin/ls
GRUBStage2
GRUBStage2
PCR08
/usr/sbin/httpd/usr/sbin/httpd
PCR10
Recommended