
Page 1: Virtual  Machine

1

Virtual Machine

Computer System Engineering (2013 Spring)

Page 2: Virtual  Machine

2

REVIEW

Page 3: Virtual  Machine

3

Layering Threads

Page 4: Virtual  Machine

4

Review

• Virtualization
  – OS kernel helps run multiple programs on a single computer
  – Provides virtual memory to protect programs from each other
  – Provides system calls so that programs can interact, e.g., send/receive, file system, ...
  – Provides threads to run multiple programs on a few CPUs
• Can we rely on the OS kernel to work correctly?

Page 5: Virtual  Machine

5

VIRTUAL MACHINE

Page 6: Virtual  Machine

6

Outline

• Why Virtual Machine?
  – Microkernel?
  – Emulation?
• Virtual Machine
  – Memory: shadow paging, H/W support
  – CPU: para-virtualization, binary translation, H/W support
  – I/O Device: driver domain

Page 7: Virtual  Machine

7

Kernel Complexity

Page 8: Virtual  Machine

8

Monolithic Kernel

• Linux is monolithic
  – No enforced modularity
  – Many good software engineering techniques are used in practice
  – But ultimately a bug in the kernel can affect the entire system
• How can we deal with the complexity of such systems?
  – One result of all this complexity: bugs!

Page 9: Virtual  Machine

9

Linux Fails

• A kernel bug can
  – Cause the whole Linux system to fail
• Is it a good thing that Linux lasted this long?
  – Problems can be hard to detect, even though they may still do damage
  – E.g., maybe my files are now corrupted, but the system didn't crash
  – Worse: an adversary can exploit a bug to gain unauthorized access to the system

Page 10: Virtual  Machine

10

Linux (2002-2009): 2+ bugs fixed per day!

Page 11: Virtual  Machine

11

Microkernel Design

• Enforced modularity
  – Subsystems run in "user" programs
  – E.g., file server, device driver, graphics, networking are programs

Page 12: Virtual  Machine

12

Why not Microkernel Style?

• Many dependencies between components
  – E.g., balance between memory used for file cache vs. process memory
• Moving large components to microkernel-style programs does not win much
  – If the entire file system service crashes, we might lose all data
• Better designs are possible, but require a lot of careful design work
  – E.g., could create one file server program per file system; mount points tell the FS client to talk to another FS server

Page 13: Virtual  Machine

13

Why not Microkernel Style?

• Trade-off: spend a year on new features vs. microkernel rewrite
  – Microkernel design may make it more difficult to change interfaces later
  – Some parts of Linux have microkernel design aspects: X server, some USB drivers, FUSE, printing, SQL database, DNS, SSH, ...
  – Can restart these components without crashing the kernel or other programs

Page 14: Virtual  Machine

14

One Solution

• Problem:
  – Deal with Linux kernel bugs without redesigning Linux from scratch
• One idea: run different programs on different computers
  – Each computer has its own Linux kernel; if one crashes, the others are not affected
  – Strong isolation (all interactions are client-server), but often impractical
  – Can't afford so many physical computers: hardware cost, power, space, ...

Page 15: Virtual  Machine

15

Run multiple Linux on a Single Computer?

• Virtualization + Abstractions
  – New constraint: compatibility, because we want to run the existing Linux kernel
  – The Linux kernel is written to run on regular hardware
  – No abstractions, pure virtualization
• The approach is called "virtual machines" (VM):
  – Each virtual machine is often called a guest
  – The equivalent of a kernel is called a "virtual machine monitor" (VMM)
  – The VMM is often called the host

Page 16: Virtual  Machine

16

Why Virtual Machine?

• Consolidation
  – Run several different OSes on a single machine
• Isolation
  – Keep the VMs separated as error containers
  – Fault tolerance
• Maintenance
  – Easy to deploy, back up, clone, migrate
• Security
  – VM introspection
  – Antivirus outside the OS

Page 17: Virtual  Machine

17

Naive Approach

• Emulate every single instruction from the guest OS
  – This is sometimes done in practice, but it is often too slow
  – Want to run most instructions directly on the CPU, like a regular OS kernel

Page 18: Virtual  Machine

18

MEMORY VIRTUALIZATION

Page 19: Virtual  Machine

19

Virtualizing Memory

• VMM constructs a page table that maps guest addresses to host physical addresses
  – E.g., if a guest VM has 1GB of memory, it can access memory addresses 0~1GB
  – Each guest VM has its own mapping for memory address 0, etc.
  – Different host physical addresses are used to store data for those memory locations
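To make the mapping concrete, here is a minimal C sketch (the struct vm record and the one-contiguous-region layout are hypothetical simplifications): each guest's physical address range is backed by its own region of host physical memory.

#include <stdint.h>

/* Hypothetical per-VM record: guest physical memory is backed by one
 * contiguous region of host physical memory (real VMMs use finer-grained
 * mappings, but the principle is the same). */
struct vm {
    uint64_t hpa_base;   /* where this guest's "RAM" starts in host memory */
    uint64_t mem_size;   /* e.g., 1 GB */
};

/* Translate a guest physical address to a host physical address.
 * Returns (uint64_t)-1 if the guest address is out of range. */
static uint64_t gpa_to_hpa(const struct vm *vm, uint64_t gpa)
{
    if (gpa >= vm->mem_size)
        return (uint64_t)-1;          /* guest touched memory it doesn't own */
    return vm->hpa_base + gpa;        /* same offset, different host region  */
}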

Page 20: Virtual  Machine

20

Virtualizing the PTP register / page tables

• Terminology: 3 levels of addressing now
  – Guest virtual, guest physical, host physical
  – The guest VM's page table contains guest physical addresses
  – The hardware page table must point to host physical addresses (actual DRAM locations)
• Setting the hardware PTP register to point to the guest page table would not work
  – Processes in the guest VM might access host physical addresses 0~1GB
  – But those host physical addresses might not belong to that guest VM

Page 21: Virtual  Machine

21

One Solution: Shadow Pages

• Shadow paging
  – VMM intercepts the guest OS setting the virtual PTP register
  – VMM iterates over the guest page table and constructs a corresponding shadow PT
  – In the shadow PT, every guest physical address is translated into a host physical address
  – Finally, the VMM loads the host physical address of the shadow PT
  – The shadow PT is per process
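A rough C illustration of the construction (the single-level table, the PTE layout, and the gfn_to_hfn policy are toy assumptions; real x86 page tables are multi-level):

#include <stdint.h>
#include <stddef.h>

#define NPTE           1024       /* entries in this toy single-level table */
#define PTE_PRESENT    0x1u
#define GUEST_BASE_HFN 0x40000u   /* toy policy: guest RAM is one contiguous
                                     region of host memory starting here    */

typedef uint64_t pte_t;           /* [frame number << 12 | flag bits]       */

/* Hypothetical VMM helper: guest physical frame -> host physical frame. */
static uint64_t gfn_to_hfn(uint64_t gfn)
{
    return GUEST_BASE_HFN + gfn;
}

/* Build a shadow page table from a guest page table: every guest physical
 * frame number is rewritten as a host physical frame number; permission
 * bits are copied unchanged. */
void build_shadow_pt(const pte_t *guest_pt, pte_t *shadow_pt)
{
    for (size_t i = 0; i < NPTE; i++) {
        pte_t g = guest_pt[i];
        if (!(g & PTE_PRESENT)) {
            shadow_pt[i] = 0;                         /* not mapped in guest */
            continue;
        }
        uint64_t gfn = g >> 12;                       /* guest frame number  */
        uint64_t hfn = gfn_to_hfn(gfn);               /* host frame number   */
        shadow_pt[i] = (hfn << 12) | (g & 0xFFFu);    /* keep the flag bits  */
    }
    /* The VMM then loads the *host* physical address of shadow_pt into the
     * hardware PTP register, not the guest's own table. */
}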

Page 22: Virtual  Machine

22

Shadow Page Table

Page 23: Virtual  Machine

(Figure: shadow page table. In the guest VM, the guest page table (GPT) maps GVA to GPA; the VMM's shadow page table (SPT), installed as the hardware page table (HPT), maps GVA directly to HPA.)

Page 24: Virtual  Machine

24

What if guest OS modifies its page table?

• Real hardware would start using the new page table's mappings
  – But the virtual machine monitor has a separate shadow page table
• Goal:
  – VMM needs to intercept when the guest OS modifies its page table, and update the shadow page table accordingly
• Technique:
  – Use the read/write bit in the PTE to mark the guest's page-table pages read-only
  – If the guest OS tries to modify them, the hardware triggers a page fault
  – The page fault is handled by the VMM: update the shadow page table & restart the guest
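A sketch of that fault-handling path in C; the struct vm type and the helper functions are hypothetical placeholders for what a real VMM would provide:

#include <stdint.h>

struct vm;                      /* opaque per-guest VMM state (hypothetical) */

struct fault {
    uint64_t gva;               /* guest virtual address that faulted        */
    int      is_write;
};

/* Hypothetical VMM helpers. */
int  gva_is_guest_page_table(struct vm *vm, uint64_t gva);
void emulate_guest_pte_write(struct vm *vm, uint64_t gva);
void update_shadow_pt(struct vm *vm, uint64_t gva);
void inject_page_fault_into_guest(struct vm *vm, const struct fault *f);
void resume_guest(struct vm *vm);

/* Page faults land in the VMM first.  Writes to pages that hold the guest's
 * own page table were marked read-only in the shadow table, so they trap
 * here and are emulated; all other faults are reflected back to the guest. */
void vmm_page_fault_handler(struct vm *vm, const struct fault *f)
{
    if (f->is_write && gva_is_guest_page_table(vm, f->gva)) {
        emulate_guest_pte_write(vm, f->gva);   /* apply the guest's change   */
        update_shadow_pt(vm, f->gva);          /* keep the shadow PT in sync */
    } else {
        inject_page_fault_into_guest(vm, f);   /* a "real" guest page fault  */
    }
    resume_guest(vm);
}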

Page 25: Virtual  Machine

25

Another Solution: Hardware Support

• Hardware support for memory virtualization
  – Intel’s EPT (Extended Page Table)
  – AMD’s NPT (Nested Page Table)
• Another table
  – EPT for translation from GPA to HPA
  – EPT is controlled by the hypervisor
  – EPT is per-VM
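A toy two-stage walk in C, flattening both tables to a single level just to show where the EPT sits in the translation (the table layout is an assumption, not the real multi-level format):

#include <stdint.h>

#define PTE_PRESENT 0x1u
typedef uint64_t pte_t;

/* Stage 1: guest page table (maintained by the guest OS), GVA -> GPA.
 * Stage 2: EPT (maintained by the VMM), GPA -> HPA. */
static uint64_t walk(const pte_t *table, uint64_t addr)
{
    pte_t e = table[addr >> 12];
    if (!(e & PTE_PRESENT))
        return (uint64_t)-1;
    return (e & ~0xFFFull) | (addr & 0xFFFull);
}

uint64_t translate_gva(const pte_t *guest_pt, const pte_t *ept, uint64_t gva)
{
    uint64_t gpa = walk(guest_pt, gva);   /* what the guest thinks   */
    if (gpa == (uint64_t)-1)
        return gpa;                       /* guest page fault        */
    return walk(ept, gpa);                /* what the hardware uses  */
}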

Page 26: Virtual  Machine

(Figure: nested paging. In non-root mode the guest VM's page table (PT) translates GVA (VA) to GPA (PA); in root mode the VMM's EPT translates GPA to HPA.)

Page 27: Virtual  Machine

27

CPU VIRTUALIZATION

Page 28: Virtual  Machine

28

Virtualizing the U/K bit

• Hardware U/K bit should be set to 'U':
  – Otherwise guest OS can do anything
• Behavior affected by the U/K bit:
  – 1. Whether privileged instructions can be executed (e.g., load PTP)
  – 2. Whether pages marked "kernel-only" in the page table can be accessed

Page 29: Virtual  Machine

29

Virtual U/K Bit

• Implementing a virtual U/K bit:
  – VMM stores the current state of the guest's U/K bit (in some memory location)
  – Privileged instructions in guest code will cause an exception (the guest runs in hardware U mode)
    • The VMM exception handler's job is to emulate these instructions
  – The approach is called "trap and emulate"
    • E.g., if a load-PTP instruction runs in virtual K mode, load the shadow PT; otherwise, send an exception to the guest OS so it can handle it
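A C sketch of trap-and-emulate for the load-PTP case; all helper names below are hypothetical stand-ins for real VMM internals:

#include <stdint.h>

struct vm;  /* opaque per-guest state (hypothetical) */

/* Hypothetical accessors for the guest's *virtual* U/K bit and helpers. */
int      guest_in_virtual_kernel_mode(struct vm *vm);
void     load_hw_ptp(uint64_t shadow_pt_hpa);
uint64_t shadow_pt_for(struct vm *vm, uint64_t guest_pt_gpa);
void     inject_illegal_instruction(struct vm *vm);
void     skip_instruction(struct vm *vm);

/* The guest always runs in hardware U mode, so a privileged instruction
 * such as "load PTP" traps into the VMM.  The handler emulates it only
 * if the guest believes it is in kernel mode. */
void handle_load_ptp_trap(struct vm *vm, uint64_t guest_pt_gpa)
{
    if (guest_in_virtual_kernel_mode(vm)) {
        /* Emulate: install the shadow table corresponding to the guest
         * page table the guest asked for. */
        load_hw_ptp(shadow_pt_for(vm, guest_pt_gpa));
        skip_instruction(vm);          /* advance past the trapped insn */
    } else {
        /* A guest *user* program tried it: forward the fault so the
         * guest OS can deal with its own misbehaving process. */
        inject_illegal_instruction(vm);
    }
}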

Page 30: Virtual  Machine

30

PTP Exception

Page 31: Virtual  Machine

31

Virtual U/K Bit

Page 32: Virtual  Machine

32

Protect Kernel-only Pages

• How do we selectively allow / deny access to kernel-only pages in the guest PT?
  – Hardware doesn't know about our virtual U/K bit
• Idea:
  – Generate two shadow page tables, one for U, one for K
  – When the guest OS switches to U mode, the VMM must invoke set_ptp(current, 0)

Page 33: Virtual  Machine

33

Non-virtualizable Instruction

• Hard: what if some instructions don't cause an exception?
  – E.g., on x86, code can always read the current U/K bit value
  – The guest OS kernel might observe it's running in U mode while expecting K
  – Similarly, code can jump to U mode from both U & K modes
  – How do we intercept the switch to U mode to replace the shadow page table?

Page 34: Virtual  Machine

34

Approach 1: "Para-Virtualization"

• Modify the OS slightly to account for these differences
  – Compromises on our goal of compatibility
  – Recall: the goal was to run multiple instances of the OS on a single computer
  – Not perfect virtualization, but relatively easy to change Linux
  – May be difficult to do with Windows, without Microsoft's help
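As a sketch of what "modify the OS slightly" means: a privileged page-table load is rewritten as an explicit request to the VMM. The hypercall name below is hypothetical; real interfaces (e.g., Xen's) differ in detail.

#include <stdint.h>

/* Hypothetical hypercall interface; real systems define their own
 * hypercall numbers and calling conventions. */
long hypercall_set_page_table(uint64_t pt_gpa);

/* Unmodified kernel: loads the page-table pointer register directly.
 * This is a privileged instruction and cannot run in user mode. */
void set_ptp_native(uint64_t pt_pa)
{
    /* asm volatile("mov %0, %%cr3" :: "r"(pt_pa) : "memory");  (x86) */
    (void)pt_pa;
}

/* Para-virtualized kernel: the same operation is rewritten as an
 * explicit request to the VMM, avoiding the non-virtualizable path. */
void set_ptp_paravirt(uint64_t pt_gpa)
{
    hypercall_set_page_table(pt_gpa);  /* VMM validates and installs it */
}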

Page 35: Virtual  Machine

35

Approach 2: Binary Translation

• VMM analyzes code from the guest VM
  – Replaces problematic instructions with an instruction that traps
  – Once we can generate a trap, we can emulate the desired behavior
  – Original versions of VMware did this
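A very rough C sketch of such a pass; the instruction decoder and patch bookkeeping are hypothetical, and real binary translators (e.g., VMware's) translate into a code cache rather than patching in place:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical decoder: returns the length in bytes of the instruction at
 * `p`, and sets *sensitive if it is one of the non-virtualizable ones
 * (e.g., it reads the U/K bit without trapping). */
size_t decode_insn(const uint8_t *p, int *sensitive);

/* Hypothetical bookkeeping: remember the original bytes at `site` so the
 * VMM can emulate the instruction when the trap fires. */
void record_patch_site(uint8_t *site, size_t len);

/* Scan a block of guest code and overwrite each sensitive instruction with
 * a one-byte trapping instruction (0xCC is INT3 on x86), so control returns
 * to the VMM, which then emulates the original instruction. */
void translate_block(uint8_t *code, size_t len)
{
    size_t off = 0;
    while (off < len) {
        int sensitive = 0;
        size_t ilen = decode_insn(code + off, &sensitive);
        if (sensitive) {
            record_patch_site(code + off, ilen);
            code[off] = 0xCC;          /* trap into the VMM at runtime */
        }
        off += ilen;
    }
}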

Page 36: Virtual  Machine

36

Approach 3: Hardware Support

• Both Intel and AMD:
  – Special "VMM" operating mode, in addition to the U/K bit
  – Two levels of page tables, implemented in hardware
• Makes the VMM's job much easier:
  – Let the guest VM manipulate the U/K bit, as long as the VMM bit is cleared
  – Let the guest VM manipulate the guest PT, as long as the host PT is set
  – Many modern VMMs take this approach

Page 37: Virtual  Machine

37

DEVICE VIRTUALIZATION

Page 38: Virtual  Machine

38

Virtualizing Devices

• I/O virtualization
  – OS kernel accesses the disk by issuing special instructions to access I/O memory
  – These instructions are only accessible in K mode, and raise an exception
  – VMM handles the exception by simulating a disk operation
  – Actual disk contents are typically stored in a file in the host file system
• Case: Xen’s Domain-0 architecture
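For the disk case, the emulation itself can be as simple as reading the right offset of a backing file. A minimal sketch (the vdisk structure is hypothetical; only the POSIX pread call is standard):

#include <stdint.h>
#include <unistd.h>

#define SECTOR_SIZE 512

struct vdisk {
    int backing_fd;     /* host file that holds the guest's disk image */
};

/* Emulate a guest "read sector" request that trapped into the VMM: fetch
 * the sector from the backing file and copy it into the guest's buffer. */
int vdisk_read_sector(struct vdisk *d, uint64_t sector, void *guest_buf)
{
    off_t off = (off_t)sector * SECTOR_SIZE;
    ssize_t n = pread(d->backing_fd, guest_buf, SECTOR_SIZE, off);
    return (n == SECTOR_SIZE) ? 0 : -1;
}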

Page 39: Virtual  Machine

39

Xen’s Domain-0 (Para-Virtualization)

• Split driver into backend and frontend
  – Frontend driver in the guest VM
    • Appears as a new type of NIC
  – Backend driver in domain-0
    • One backend instance for each VM
  – Communication via an I/O ring
    • A shared page used for I/O
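A simplified single-producer/single-consumer ring in C, as a sketch of the shared-page idea; real Xen I/O rings also carry responses and use event channels for notification, so this is only illustrative:

#include <stdint.h>

#define RING_SIZE 32          /* power of two, fits in one shared page */

struct io_request {
    uint64_t sector;
    uint32_t nsect;
    uint32_t write;
};

/* Simplified I/O ring living in a page shared between the frontend (guest)
 * and the backend (domain-0). */
struct io_ring {
    volatile uint32_t prod;   /* written by frontend */
    volatile uint32_t cons;   /* written by backend  */
    struct io_request req[RING_SIZE];
};

/* Frontend side: publish a request if there is room. */
int ring_push(struct io_ring *r, const struct io_request *rq)
{
    if (r->prod - r->cons == RING_SIZE)
        return -1;                        /* ring is full */
    r->req[r->prod % RING_SIZE] = *rq;
    __sync_synchronize();                 /* request visible before index */
    r->prod++;
    return 0;
}

/* Backend side: consume the next request if one is pending. */
int ring_pop(struct io_ring *r, struct io_request *out)
{
    if (r->cons == r->prod)
        return -1;                        /* ring is empty */
    *out = r->req[r->cons % RING_SIZE];
    __sync_synchronize();
    r->cons++;
    return 0;
}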

Page 40: Virtual  Machine

(Figure: split device driver. In the guest domain, the application's traffic goes through the TCP/IP stack to the frontend device driver; in domain-0, the backend device driver hands it to the real device driver, which accesses the physical device. Xen sits between the domains and the hardware.)

Page 41: Virtual  Machine

41

Xen’s Domain-0 (Full-Virtualization)

• Qemu for device emulation
  – User-level application running in domain-0
  – Implements a NIC (e.g., 8139) in software
  – Each VM has its own Qemu instance running
• I/O request redirection
  – Guest VM's I/O requests are sent to domain-0
  – Then domain-0 sends them to Qemu
  – Eventually, Qemu uses domain-0's NIC driver

Page 42: Virtual  Machine

(Figure: full virtualization with Qemu. The guest domain runs an unmodified device driver for an emulated NIC (8139); its I/O is redirected to a user-level Qemu instance in domain-0, which uses domain-0's real device driver and TCP/IP stack to reach the physical device via Xen and the hardware.)

Page 43: Virtual  Machine

43

Hardware Virtualization

• Device support
  – SR-IOV: Single-Root I/O Virtualization
  – By PCI-SIG and Intel
  – Typically used on NIC devices
• VF: Virtual Function
  – One VF for each VM
  – Limited number of VFs, e.g., 8 per NIC
• PF: Physical Function
  – One PF for each device
  – Used only for configuration, not on the critical path

Page 44: Virtual  Machine

(Figure: SR-IOV. Each guest domain runs a VF device driver bound to its own virtual function on the physical device; domain-0 runs the PF device driver and a user-level configuration tool. The physical device exposes several VFs and one PF below Xen.)

Page 45: Virtual  Machine

45

Benefits of Virtual Machine

• Can share hardware between unrelated services, with enforced modularity
• Can run different OSes (Linux, Windows, etc.)
• Level-of-indirection tricks: can move a VM from one physical machine to another

Page 46: Virtual  Machine

46

How do VMs relate to microkernels?

• Solving somewhat orthogonal problems
  – Microkernel: splitting up a monolithic kernel into client/server components
  – VMs: how to run many instances of an existing OS kernel
    • E.g., can use a VMM to run one network file server VM and one application VM
  – Hard problem still present
    • Does the entire file server fail at once?
    • Or, a more complex design with many file servers