8/14/2019 Virtualization
http://slidepdf.com/reader/full/vir-tualization 1/67

Virtualization
Ian Pratt
XenSource Inc. and University of Cambridge
Keir Fraser, Steve Hand, Christian
Limpach and many others…
Outline
Virtualization Overview
Xen Architecture
New Features in Xen 3.0
VM Relocation
Xen Roadmap
Questions
Virtualization Overview
Single OS image: OpenVZ, Vservers, Zones
Group user processes into resource containers
Hard to get strong isolation
Full virtualization: VMware, VirtualPC, QEMU
Run multiple unmodified guest OSes
Hard to efficiently virtualize x86
Para-virtualization: Xen
Run multiple guest OSes ported to special arch
Arch Xen/x86 is very close to normal x86
Virtualization in the Enterprise
Consolidate under-utilized servers
Avoid downtime with VM Relocation
Dynamically re-balance workload to guarantee application SLAs
Enforce security policy
Xen 2.0 (5 Nov 2005)
Secure isolation between VMs
Resource control and QoS
Only guest kernel needs to be ported
User-level apps and libraries run unmodified
Linux 2.4/2.6, NetBSD, FreeBSD, Plan9, Solaris
Execution performance close to native
Broad x86 hardware support
Live Relocation of VMs between Xen nodes
Para-Virtualization in Xen
Xen extensions to x86 arch
Like x86, but Xen invoked for privileged ops
Avoids binary rewriting
Minimize number of privilege transitions into Xen
Modifications relatively simple and self-contained
Modify kernel to understand virtualised env.
Wall-clock time vs. virtual processor time
• Desire both types of alarm timer
Expose real resource availability
• Enables OS to optimise its own behaviour
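The wall-clock vs. virtual-processor-time distinction above can be illustrated in plain user space: a minimal sketch (not Xen code) where Python's `time.monotonic()` plays the wall-clock timer and `time.process_time()` the virtual (CPU-time) timer — time spent descheduled shows up on the former but not the latter.

```python
import time

def burn_cpu(n: int) -> int:
    # Busy loop standing in for guest work.
    total = 0
    for i in range(n):
        total += i
    return total

start_wall = time.monotonic()      # wall-clock time: advances even while descheduled
start_virt = time.process_time()   # virtual (CPU) time: advances only while running

burn_cpu(200_000)
time.sleep(0.05)                   # "descheduled": wall clock advances, CPU time barely does

wall = time.monotonic() - start_wall
virt = time.process_time() - start_virt
assert wall >= 0.05                # the sleep is visible on the wall clock
assert virt < wall                 # but (mostly) absent from consumed CPU time
```

An OS wanting accurate timeouts needs the first clock; one wanting fair internal scheduling needs the second — hence the slide's "desire both types of alarm timer".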
Xen 2.0 Architecture
[Diagram: the Xen Virtual Machine Monitor (control IF, safe HW IF, event channels, virtual CPU, virtual MMU) runs directly on the hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE). VM0 runs a XenLinux guest with the native device drivers plus the device manager and control software; VM1 and VM2 run XenLinux guests and VM3 a Solaris guest, each with unmodified user software on front-end device drivers paired with back-ends in VM0.]
Xen 3.0 Architecture
[Diagram: as Xen 2.0 — the Xen Virtual Machine Monitor (control IF, safe HW IF, event channels, virtual CPU, virtual MMU) on the hardware; VM0 runs XenLinux with native device drivers, back-ends, and the device manager and control software; VM1 and VM2 run XenLinux with front-end device drivers; VM3 now runs an unmodified guest OS (WinXP) via VT-x. Labels: x86_32, x86_64, IA64; AGP, ACPI, PCI, SMP.]
I/O Architecture
Xen IO-Spaces delegate guest OSes protected access to specified h/w devices
Virtual PCI configuration space
Virtual interrupts
(Need IOMMU for full DMA protection)
Devices are virtualised and exported to other VMs via Device Channels
Safe asynchronous shared memory transport
‘Backend’ drivers export to ‘frontend’ drivers
Net: use normal bridging, routing, iptables
Block: export any blk dev e.g. sda4, loop0, vg3
(Infiniband / “Smart NICs” for direct guest IO)
System Performance
[Chart: benchmark suite running on Linux (L), Xen (X), VMware Workstation (V), and UML (U) — SPEC INT2000 score, Linux build time (s), OSDB-OLTP throughput (tup/s), and SPEC WEB99 score, normalized to native Linux (1.0).]
TCP results
[Chart: TCP bandwidth on Linux (L), Xen (X), VMware Workstation (V), and UML (U) — Tx and Rx at MTU 1500 and MTU 500, normalized to native Linux (1.0).]
Scalability
[Chart: aggregate score for 2, 4, 8, and 16 simultaneous SPEC WEB99 instances on Linux (L) and Xen (X).]
x86_32

Xen reserves top of VA space
Segmentation protects Xen from kernel
System call speed unchanged
Xen 3 now supports PAE for >4GB mem

[Diagram: the 4GB address space — user (U, ring 3) from 0GB to 3GB, the kernel (S, ring 1) above 3GB, and Xen (S, ring 0) reserved at the top below 4GB.]
x86_64
Large VA space makes life a lot easier, but:
No segment limit support
Need to use page-level protection to protect hypervisor

[Diagram: the 2^64 address space — user (U) from 0 up to 2^47, the kernel (U) at the top above 2^64−2^47, and Xen (S) inside the reserved region between them.]
x86_64
Run user-space and kernel in ring 3 using different pagetables
Two PGDs (PML4s): one with user entries; one with user plus kernel entries
System calls require an additional syscall/ret via Xen
Per-CPU trampoline to avoid needing GS in Xen

[Diagram: Xen (S) in ring 0; kernel (U) and user (U) both in ring 3, with syscall/sysret transitions passing through Xen.]
x86 CPU virtualization
Xen runs in ring 0 (most privileged)
Ring 1/2 for guest OS, 3 for user-space
GPF if guest attempts to use privileged instr
Xen lives in top 64MB of linear addr space
Segmentation used to protect Xen as switching page tables too slow on standard x86
Hypercalls jump to Xen in ring 0
Guest OS may install ‘fast trap’ handler
Direct user-space to guest OS system calls
MMU virtualisation: shadow vs. direct-mode
MMU Virtualization : Direct-Mode
[Diagram: direct mode — the guest OS reads its virtual → machine page tables directly; guest writes are trapped and validated by the Xen VMM before the hardware MMU uses them.]
Para-Virtualizing the MMU
Guest OSes allocate and manage own PTs
Hypercall to change PT base
Xen must validate PT updates before use
Allows incremental updates, avoids revalidation
Validation rules applied to each PTE:
1. Guest may only map pages it owns*
2. Pagetable pages may only be mapped RO
Xen traps PTE updates and emulates, or ‘unhooks’ PTE page for bulk updates
Writeable Page Tables : 1 – Write fault
[Diagram: the first guest write to a page-table page raises a page fault into the Xen VMM; guest reads still go directly to the virtual → machine tables.]
Writeable Page Tables : 2 – Emulate?
[Diagram: on the write fault Xen decides whether to emulate — if yes, it applies the single validated PTE update on the guest’s behalf.]
Writeable Page Tables : 3 - Unhook
[Diagram: alternatively Xen ‘unhooks’ the page-table page from the active virtual → machine tables, letting the guest write it freely without further traps.]
Writeable Page Tables : 4 - First Use
[Diagram: a later access through the unhooked portion of the address space page-faults into Xen.]
Writeable Page Tables : 5 – Re-hook
[Diagram: Xen validates the modified page-table page and re-hooks it into the virtual → machine tables; normal reads and validated writes resume.]
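The five slides above describe a small state machine: write fault → unhook → batched trap-free writes → fault on first use → validate and re-hook. A sketch of that control flow (not real Xen code; the class is invented):

```python
class WritablePT:
    """Toy state machine for one page-table page under writeable page tables."""

    def __init__(self):
        self.state = "hooked"
        self.validations = 0

    def guest_write(self):
        if self.state == "hooked":
            self.state = "unhooked"   # first write faults -> unhook for bulk updates
        # while unhooked, further writes proceed with no trap at all

    def guest_use(self):
        # Guest walks through this PT page again (first use after the batch).
        if self.state == "unhooked":
            self.validations += 1      # fault -> validate every PTE on the page
            self.state = "hooked"      # then re-hook it

pt = WritablePT()
pt.guest_write()
pt.guest_write()
pt.guest_write()                       # three writes, but only one unhook
assert pt.state == "unhooked"
pt.guest_use()
assert pt.state == "hooked"
assert pt.validations == 1             # one validation pass covers the whole batch
```

The payoff is amortization: a fork that rewrites thousands of PTEs pays one fault and one validation pass per page-table page instead of one trap per PTE.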
MMU Micro-Benchmarks
[Chart: lmbench results on Linux (L), Xen (X), VMware Workstation (V), and UML (U) — page fault and process fork latency (µs), normalized to native Linux (1.0).]
SMP Guest Kernels
Xen extended to support multiple VCPUs
Virtual IPI’s sent via Xen event channels
Currently up to 32 VCPUs supported
Simple hotplug/unplug of VCPUs
From within VM or via control tools
Optimize one active VCPU case by binary patching spinlocks
NB: Many applications exhibit poor SMP scalability – often better off running multiple instances each in their own OS
SMP Guest Kernels
Takes great care to get good SMP performance while remaining secure
Requires extra TLB synchronization IPIs
SMP scheduling is a tricky problem
Wish to run all VCPUs at the same time
But, strict gang scheduling is not work conserving
Opportunity for a hybrid approach
Paravirtualized approach enables several important benefits
Avoids many virtual IPIs
Allows ‘bad preemption’ avoidance
Auto hot plug/unplug of CPUs
Driver Domains
[Diagram: like the Xen 2.0 architecture, but a dedicated driver domain (VM1, XenLinux) hosts a native device driver and its back-end; VM0 keeps the other native drivers plus the device manager and control software, while guests (XenLinux in VM2, XenBSD in VM3) with unmodified user software connect through front-end device drivers across the Xen Virtual Machine Monitor’s safe hardware interface.]
Device Channel Interface
Isolated Driver VMs
Run device drivers in separate domains
Detect failure e.g.
Illegal access
Timeout
Kill domain, restart
E.g. 275ms outage from failed Ethernet driver

[Chart: throughput over a 40 s run, showing recovery with a ~275 ms outage after the failed Ethernet driver domain is killed and restarted.]
VT-x / Pacifica : hvm
Enable Guest OSes to be run without modification
E.g. legacy Linux, Windows XP/2003
CPU provides vmexits for certain privileged instrs
Shadow page tables used to virtualize MMU
Xen provides simple platform emulation
BIOS, apic, ioapic, rtc, Net (pcnet32), IDE emulation
Install paravirtualized drivers after booting for high-performance IO
Possibility for CPU and memory paravirtualization
Non-invasive hypervisor hints from OS
[Diagram: hvm architecture — Domain 0 (Linux xen64, 0P/0D) hosts the control panel (xm/xend), native device drivers, back-end virtual drivers, device models, and guest BIOS; unmodified OSes run in VMX guest VMs (32-bit at 1/3P, 64-bit at 3P/3D) with front-end virtual drivers, entering the Xen hypervisor via VMExit and callback/hypercall paths; the hypervisor provides the control interface, hypercalls, event channels, scheduler, processor and memory management, and I/O handling for PIT, APIC, PIC, IOAPIC.]
[Diagram: variant of the hvm architecture with per-guest IO emulation — VMExits from the unmodified 32-bit and 64-bit VMX guests are handled by IO emulation instances, while Domain 0 (Linux xen64) retains the control panel (xm/xend), native device drivers, back-end virtual drivers, and guest BIOS, and the hypervisor provides the control interface, hypercalls, event channels, scheduler, and PIT/APIC/PIC/IOAPIC handling.]
MMU Virtualization : Shadow-Mode

[Diagram: the guest OS reads and writes its own virtual → pseudo-physical page tables; the VMM propagates updates into shadow virtual → machine tables used by the hardware MMU, folding accessed & dirty bits back into the guest’s tables.]
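Shadow mode is essentially the composition of two mappings: the guest's virtual → pseudo-physical tables with the VMM's pseudo-physical → machine map. A toy sketch, with dictionaries standing in for page tables (purely illustrative — real shadows are built lazily, per-entry, on faults):

```python
def build_shadow(guest_pt: dict, p2m: dict) -> dict:
    """Compose guest (virtual -> pseudo-physical) with VMM (pseudo-physical
    -> machine) to get the shadow (virtual -> machine) table the hardware
    MMU actually walks."""
    shadow = {}
    for virt, pseudo in guest_pt.items():
        if pseudo in p2m:               # only map pseudo-frames the VMM backs
            shadow[virt] = p2m[pseudo]
    return shadow

guest_pt = {0x1000: 0, 0x2000: 1, 0x3000: 7}   # virtual -> pseudo-physical
p2m      = {0: 0x9A, 1: 0x41}                  # pseudo-physical -> machine
shadow = build_shadow(guest_pt, p2m)
assert shadow == {0x1000: 0x9A, 0x2000: 0x41}  # 0x3000 dropped: frame 7 unbacked
```

Because the guest only ever sees pseudo-physical frame numbers, it needs no modification — the cost, relative to direct mode, is keeping the shadow in sync with every guest table update.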
Xen Tools
[Diagram: in dom0, clients (xm via xmlib; web services / CIM) drive libxc and xenstore; libxc provides domain builder, control, and save/restore functions, entering Xen through the privcmd back-end (dom0_op hypercalls), while the xenbus back-end in dom0 pairs with the xenbus front-end in dom1.]
VM Relocation : Motivation
VM relocation enables:
High-availability
• Machine maintenance
Load balancing
• Statistical multiplexing gain

[Diagram: a VM relocating between two Xen hosts.]
Assumptions
Networked storage
NAS: NFS, CIFS
SAN: Fibre Channel
iSCSI, network block dev
drbd network RAID
Good connectivity
common L2 network
L3 re-routeing

[Diagram: two Xen hosts attached to shared networked storage.]
Challenges
VMs have lots of state in memory
Some VMs have soft real-time requirements
E.g. web servers, databases, game servers
May be members of a cluster quorum
Minimize down-time
Performing relocation requires resources
Bound and control resources used
Relocation Strategy

Stage 0: pre-migration — VM active on host A; destination host selected (block devices mirrored)
Stage 1: reservation — initialize container on target host
Stage 2: iterative pre-copy — copy dirty pages in successive rounds
Stage 3: stop-and-copy — suspend VM on host A; redirect network traffic; synch remaining state
Stage 4: commitment — activate on host B; VM state on host A released
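The iterative pre-copy stage can be simulated in a few lines: each round re-sends the pages dirtied during the previous round, and when the dirty set is small enough (or rounds run out) the remainder is sent during stop-and-copy. A sketch only — the thresholds and random dirtying model are invented for illustration.

```python
import random

def migrate(total_pages: int, dirty_rate: float, max_rounds: int = 30,
            stop_threshold: int = 50, seed: int = 1):
    rng = random.Random(seed)
    to_send = total_pages                  # round 0: send every page
    sent = 0
    for _round in range(max_rounds):
        sent += to_send
        # pages dirtied while this round was being copied
        to_send = sum(1 for _ in range(to_send) if rng.random() < dirty_rate)
        if to_send <= stop_threshold:
            break                          # dirty set small: switch to stop-and-copy
    downtime_pages = to_send               # copied while the VM is suspended
    return sent, downtime_pages

sent, downtime_pages = migrate(total_pages=100_000, dirty_rate=0.1)
assert downtime_pages <= 50                # downtime bounded by the threshold
assert sent >= 100_000                     # pre-copy re-sends some pages as rent
```

With a modest dirtying rate the dirty set shrinks geometrically, which is why a handful of rounds suffices; the next slides show what happens when it does not.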
Pre-Copy Migration: Round 1
Pre-Copy Migration: Round 2
Pre-Copy Migration: Final
Writable Working Set
Pages that are dirtied must be re-sent
Super hot pages
• e.g. process stacks; top of page free list
Buffer cache
Network receive / disk buffers
Dirtying rate determines VM down-time
Shorter iterations → less dirtying → …
Rate Limited Relocation
Dynamically adjust resources committed to performing page transfer
Dirty logging costs VM ~2-3%
CPU and network usage closely linked
E.g. first copy iteration at 100Mb/s, then increase based on observed dirtying rate
Minimize impact of relocation on server while minimizing down-time
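The rate policy above — start the first copy round slow, then track the observed dirtying rate — can be sketched as a small controller. The constants (headroom, floor, ceiling) are invented; the slide describes the policy only qualitatively.

```python
def next_rate_mbps(observed_dirty_mbps: float, headroom_mbps: float = 50.0,
                   floor_mbps: float = 100.0, ceiling_mbps: float = 1000.0) -> float:
    """Pick the bandwidth cap for the next pre-copy round."""
    # Must send faster than pages are dirtied, or the round never converges.
    wanted = observed_dirty_mbps + headroom_mbps
    return max(floor_mbps, min(wanted, ceiling_mbps))

assert next_rate_mbps(0.0) == 100.0      # idle VM: stay at the gentle floor rate
assert next_rate_mbps(300.0) == 350.0    # track the dirtying rate plus headroom
assert next_rate_mbps(5000.0) == 1000.0  # clamp runaway dirtiers at the ceiling
```

Hitting the ceiling is the signal that pre-copy will not converge for this workload, and the sensible move is to cut over to stop-and-copy rather than keep burning bandwidth.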
Web Server Relocation
Iterative Progress: SPECWeb
[Chart: per-round transfer progress during a SPECweb relocation; the annotated run takes 52s.]
Iterative Progress: Quake3
Quake 3 Server relocation
Xen Optimizer Functions
Cluster load balancing / optimization
Application-level resource monitoring
Performance prediction
Pre-migration analysis to predict down-time
Optimization over relatively coarse timescale
Evacuating nodes for maintenance
Move easy-to-migrate VMs first
Storage-system support for VM clusters
Decentralized, data replication, copy-on-write
Adapt to network constraints
Configure VLANs, routeing, create tunnels etc
Current Status
[Table: support for Privileged Domains, Guest Domains, SMP Guests, Save/Restore/Migrate, >4GB memory, VT, and Driver Domains across the x86_32, x86_32p, x86_64, IA64, and Power ports; the per-cell status marks are not recoverable.]
3.1 Roadmap
Improved full-virtualization support
Pacifica / VT-x abstraction
Enhanced IO emulation
Enhanced control tools
Performance tuning and optimization
Less reliance on manual configuration
NUMA optimizations
Virtual bitmap framebuffer and OpenGL
Infiniband / “Smart NIC” support
IO Virtualization
IO virtualization in s/w incurs overhead
Latency vs. overhead tradeoff
• More of an issue for network than storage
Can burn 10-30% more CPU
Solution is well understood
Direct h/w access from VMs
• Multiplexing and protection implemented in h/w
Smart NICs / HCAs
• Infiniband, Level-5, Aarohi etc
• Will become commodity before too long
Research Roadmap
Whole-system debugging
Lightweight checkpointing and replay
Cluster/distributed system debugging
Software implemented h/w fault tolerance
Exploit deterministic replay
Multi-level secure systems with Xen
VM forking
Lightweight service replication, isolation
Parallax
Managing storage in VM clusters.
Virtualizes storage, fast snapshots
Access optimized storage
[Diagram: two tree roots, Root A and Root B, where Root B is a snapshot of Root A; L1 and L2 metadata nodes and the data blocks beneath them are shared between the roots until written.]
V2E : Taint tracking
[Diagram: the VMM maintains a taint pagemap; a control VM runs the disk and net drivers (DD, ND) with I/O taint extensions, serving a protected VM’s virtual net and disk (VN, VD); a modified Qemu stands by to run the protected VM in emulation.]

1. Inbound pages are marked as tainted. Fine-grained taint details in extension, page-granularity bitmap in VMM.
2. VM traps on access to a tainted page. Tainted pages marked not-present. Throw VM to emulation.
3. VM runs in emulation, tracking tainted data. Qemu microcode modified to reflect tainting across data movement.
4. Taint markings are propagated to disk. Disk extension marks tainted data, and re-taints memory on read.
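The four numbered steps can be mirrored by a small bookkeeping structure. A toy model only — the class and methods are invented to follow the slide's steps, not taken from the real V2E implementation.

```python
class TaintTracker:
    """Page-granularity taint map: inbound data taints pages; taint follows
    writes to disk blocks and re-taints memory when those blocks are read."""

    def __init__(self):
        self.tainted_pages: set = set()
        self.tainted_blocks: set = set()

    def inbound(self, page: int):
        self.tainted_pages.add(page)          # step 1: mark inbound pages

    def needs_emulation(self, page: int) -> bool:
        return page in self.tainted_pages     # steps 2-3: tainted access -> emulate

    def write_to_disk(self, page: int, block: int):
        if page in self.tainted_pages:
            self.tainted_blocks.add(block)    # step 4: taint follows data to disk

    def read_from_disk(self, block: int, page: int):
        if block in self.tainted_blocks:
            self.tainted_pages.add(page)      # step 4: re-taint memory on read

t = TaintTracker()
t.inbound(3)
assert t.needs_emulation(3) and not t.needs_emulation(4)
t.write_to_disk(3, block=7)
t.read_from_disk(7, page=9)
assert t.needs_emulation(9)                   # taint survived the disk round-trip
```

Keeping the map at page/block granularity is what lets the fast path stay fast: untainted pages run natively, and only tainted accesses pay for emulation.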
Xen Supporters
[Slide of supporter logos grouped under Hardware Systems, Platforms & I/O, and Operating System and Systems Management. * Logos are registered trademarks of their owners.]
Conclusions
Xen is a complete and robust hypervisor
Outstanding performance and scalability
Excellent resource control and protection
Vibrant development community
Strong vendor support
Try the demo CD to find out more!
(or Fedora 4/5, Suse 10.x)
http://xensource.com/community