Page 1: Codesigned  Virtual Machines

Codesigned Virtual Machines

Kim, Dukhwan (김덕환) ([email protected])

2005.10.31

System Synthesis Lab.

Contents

  • What is the Codesigned VM?
  • Memory & Register State Mapping
  • Self-Modifying Code
  • Support for Code Caching
  • Implementing Precise Traps
  • Input/Output
  • Applying Codesigned VM
  • Case Study: next class

What is the Codesigned VM? (1)

[Figure: three system organizations, layers listed top to bottom.
  • Conventional HW/SW interface: Application Binary / Native ISA / Hardware.
  • Conventional virtual machine: Application Binary / Virtual ISA / Virtual Machine / Native ISA / Hardware.
  • HW/SW codesigned virtual machine: Application Binary / Virtual ISA / VM Software / Implementation ISA / VM Hardware.]

What is the Codesigned VM? (2)

  • The host architecture (target ISA) is designed concurrently with the VM software.
      • The software becomes part of the hardware platform.
      • The implementation can be divided optimally between hardware and software.

What is the Codesigned VM? (3)

  • Codesigned VM vs. system VM
      • A system VM supports an entire system (OS + applications).
      • A codesigned VM is a form of system VM.
      • But a codesigned VM is:
          • not intended to virtualize hardware resources;
          • not intended to support multiple VM environments.
      • Its goals include performance, power efficiency, and design simplicity.

What is the Codesigned VM? (4)

  • Codesigned VM as a system VM
      • We refer to the VM software as a VM monitor (VMM).

[Figure: layered stack, top to bottom: Application and OS (Source ISA: IA32), VMM containing the Translator and the Code Cache (Target ISA: Crusoe), Hardware.]

What is the Codesigned VM? (5)

  • Codesigned VM vs. process VM
      • Similarity: both emulate the source ISA using dynamic translation and a code cache.
      • But in a codesigned VM:
          • Compatibility is intrinsic at the ISA level (not the ABI level), so both user-level and system-level instructions must be emulated.
          • The motivation is improved performance, power efficiency, and design simplicity. Compatibility is just a requirement, not a motivation.

What is the Codesigned VM? (6)

  • Codesigned VM vs. superscalar processor
      • Similarity: both perform translation from a source ISA (IA32) to a target ISA (P4, Crusoe).
      • But in a codesigned VM the translation is done in software:
          • lower cost, smaller size, design simplicity, and many more optimization opportunities;
          • inter-instruction optimization is possible.

Code Translation Methods

[Figure: two translation schemes, each mapping a source column to a target column.
  • Code translation by HW: source instructions (instr. 1 ... instr. n) are translated into target micro-ops (micro-op a ... micro-op r).
  • Code translation by SW: source instructions (instr. 1 ... instr. n) are translated into target instructions (instr. A ... instr. M).]

Register State Mapping

  • Register state mapping is easy.
  • Host register files can be made larger than the guest's.

PowerPC        Daisy host
r0-r31         R0-R31
counter        R32
linkreg        R33
MQ             R34
const. 0       R35
               R36-R63 (scratch, speculative results, constants, pointers)

Memory State Mapping

  • Concealed memory
      • A reserved region for the VMM, the code cache, and other data used by the VMM.
      • Never visible to guest software.
          • This is possible because the VMM takes control from the boot process.
      • Fixed size, normally diskless (to simplify the system design).
      • The VMM may be stored in ROM.

Concealed Memory (1)

[Figure: memory system in a codesigned VM. Concealed memory holds the code cache, VMM code, and VMM data; conventional memory holds source ISA code and source ISA data. The processor core accesses memory through an I-cache and a D-cache.]

  • The I-cache holds only target ISA instructions.

Concealed Memory (2)

  • Memory mapping for concealed memory
      1. Concealed logical memory shares an address space with the guest.
          • The host address space must be enlarged.

[Figure: concealed logical addresses and conventional logical addresses in one address space; the concealed memory mapping leads to concealed real memory, and the guest memory mapping leads to conventional real memory.]

Concealed Memory (3)

  • Memory mapping for concealed memory
      2. Two separate logical address spaces.
          • Each load/store must select the mapping.
          • This can be controlled by the VMM.

[Figure: a concealed logical address space mapped by the concealed memory mapping to concealed real memory, and a separate conventional logical address space mapped by the guest memory mapping to conventional real memory.]

Concealed Memory (4)

  • Memory mapping for concealed memory
      3. Use real addressing for concealed memory.
          • A special case of option 2.
          • Needs a separate set of load/store instructions, or a mode bit.

[Figure: a concealed real address space refers directly to concealed real memory; the conventional logical address space is mapped by the guest memory mapping to conventional real memory.]
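The difference between the guest path and the concealed path can be sketched in a few lines of Python. This is only an illustration of option 3 under stated assumptions: the 4 KiB page size, the dictionary standing in for the guest page table, the `concealed` flag standing in for the mode bit (or special load/store opcode), and all function and parameter names are hypothetical and do not come from the slides.

```python
PAGE_SIZE = 4096  # assumed page size; the slides do not specify one


def translate(addr, concealed, guest_page_table, concealed_range):
    """Return the real address used by a load/store.

    concealed=True models a VMM-only access selected by a mode bit or a
    special load/store opcode; the address is then already a real address.
    """
    if concealed:
        low, high = concealed_range
        if not low <= addr < high:
            raise ValueError("address is outside concealed real memory")
        return addr  # real addressing: no page-table lookup
    # Guest access: go through the guest's virtual-to-real mapping.
    vpn, offset = divmod(addr, PAGE_SIZE)
    if vpn not in guest_page_table:
        raise LookupError("guest page fault")
    return guest_page_table[vpn] * PAGE_SIZE + offset
```

Because the guest path can only produce addresses that the guest mapping contains, concealed memory stays invisible to guest software as long as the VMM never maps a guest page into the concealed range.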

Self-Modifying Code (1)

  • Basically the same technique as in Ch. 3 is used.
      • Keep the guest OS's virtual-to-real page mapping.
      • All load/store addresses are mapped to the source memory region.
      • Write-protect the guest code region: any attempt to write into that region causes a trap, which the VM can then handle.
  • But we cannot use a system call to write-protect, because we should not modify the guest's page state.

Self-Modifying Code (2)

  • Solution: use the TLB for write-protection.
      • The TLB is managed by the VMM and has an additional bit indicating “write-protect”.
      • The VMM sets the write-protect bit whenever an entry for a code page is loaded into the TLB.
      • The VMM should maintain a table of all the guest virtual pages that hold translated code.
  • Moreover, because this is a codesigned VM, we can get …

Self-Modifying Code (3)

  • Special hardware support
      • In the Transmeta Crusoe, a special hardware structure is added to speed up fine-grained write-protection checking.
      • Goal: find out whether a store really writes into the translated code region.
      • Virtual address -> (TLB) -> real address -> (filtered by write-protect table) -> write fault or not

Self-Modifying Code (4)

[Figure: fine-grained write-protect checking. The virtual page number of the source address is looked up in the TLB, whose entries hold a virtual address, a physical address, and a write-protect (WP) bit; the WP bits produce the page-level write-protect fault. The physical page number is looked up in the Write-Protect Table (hit/miss), whose entries hold bit masks. Comparison logic combines the selected WP bit mask with the page offset bits to produce the source code write fault.]
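The two-level check in the figure can be sketched in Python. This is a rough model under several assumptions that are not on the slides: a 4 KiB page, 128-byte protection regions (so one 32-bit mask per page), dictionaries standing in for the TLB and the write-protect table, and the reading that a table miss on a write-protected page is what raises the page-level fault so the VMM can load the mask. All names are hypothetical.

```python
PAGE_SIZE = 4096    # assumed page size
REGION_SIZE = 128   # assumed fine-grain region size (32 regions per page)


def check_store(vaddr, tlb, wp_table):
    """Classify a store.

    tlb:      virtual page number -> (physical page number, wp_bit)
    wp_table: physical page number -> bit mask, one bit per region
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn, wp_bit = tlb[vpn]
    if not wp_bit:
        return "ok"  # page holds no translated source code
    mask = wp_table.get(ppn)
    if mask is None:
        # Table miss: fall back to the coarse, page-level fault.
        return "page-level write-protect fault"
    region = offset // REGION_SIZE
    if mask & (1 << region):
        return "source code write fault"  # store hits translated code
    return "ok"  # same page, but a region without translated code
```

The point of the mask is that data sharing a page with code can still be written without trapping; only stores into regions that actually back translated code reach the VMM.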

Self-Modifying Code (5)

  • I/O writes to guest code memory must be caught.
      • For translated code in the code cache, keep track of all the real guest pages.
      • Maintain a hardware table for I/O writes, with entries for all the real pages that hold guest code.
      • A store to any of these pages causes an interrupt to the VMM, which then flushes the translated code.

Support for Code Caching (1)

  • Code cache performance is the most important.
      • SPC -> (hash) -> TPC -> (if hit) access the code cache.
      • This involves multiple memory accesses plus an indirect jump.
  • Superblock chaining may help: it eliminates the table lookup for direct jumps and branches.
  • But how about indirect jumps?

Support for Code Caching (2)

  • To reduce table lookup overhead, use SW-based jump target prediction:

        if (Rx == #addr_1) goto #target_1
        else if (Rx == #addr_2) goto #target_2
        else map_lookup(Rx)

  • But:
      • If the SW prediction is incorrect, time is wasted.
      • Many indirect jumps are difficult to predict (e.g. returns).
  • Again, for a codesigned VM, we can get…
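The compare-and-branch chain used for software jump target prediction can be modelled in Python as follows. This is a sketch only: `make_dispatch`, the list of predicted pairs, and the dictionary used as the SPC-to-TPC map table are illustrative names, and returning `None` for an untranslated target is an assumption about what the slow path does.

```python
def make_dispatch(predictions, map_table):
    """Build the dispatch code for one indirect jump.

    predictions: (spc, tpc) pairs inlined as compare-and-branch tests
    map_table:   SPC -> TPC map for everything in the code cache
    """
    def dispatch(spc):
        # Inlined predictions: cheap when one matches, wasted work otherwise.
        for predicted_spc, predicted_tpc in predictions:
            if spc == predicted_spc:
                return predicted_tpc, "predicted"
        # Fall back to the full map table lookup (the slow path).
        return map_table.get(spc), "map_lookup"
    return dispatch


dispatch = make_dispatch([(0x1000, 0x9000), (0x1040, 0x9100)],
                         {0x1000: 0x9000, 0x1040: 0x9100, 0x2000: 0xA000})
```

Every failed comparison adds work in front of the map lookup, which is why poorly predictable jumps such as returns gain little from this scheme.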

Support for Code Caching (3)

  • Hardware support for code caching
      • JTLB (Jump Translation Lookaside Buffer)
      • D-RAS (Dual-address Return Address Stack)

JTLB (1)

  • “a specially designed HW cache of map table entries”

[Figure: JTLB organization. The SPC is hashed to select (tag, TPC) entries; tag compare logic produces hit or miss and drives the select input of a MUX that outputs the matching TPC.]

JTLB (2)

  • JTLB_Lookup instruction

        JTLB_Lookup Ri, Rj, Rk
        Jump Ri, Rj == 0
        Jump map_lookup

    (operand annotations in the figure: SPC, hit/miss, TPC)

  • Lookup_Jump instruction and prediction
      • Predict using the BTB and fetch.
      • JTLB hit and prediction correct -> happy.
      • JTLB hit but misprediction -> redirect fetch to the jump target TPC from the JTLB.
      • JTLB miss -> redirect fetch to the fall-through address.
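A software model of the JTLB lookup may make the hit and miss paths concrete. This is a minimal sketch, assuming a small set-associative organization indexed by the low bits of the SPC with oldest-first eviction, and assuming that the software `map_lookup` path refills the JTLB after a miss; the slides specify none of these details, and the class and function names are hypothetical.

```python
class JTLB:
    """Toy set-associative cache of SPC -> TPC map table entries."""

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]  # entries are (tag, tpc)

    def _index_tag(self, spc):
        return spc % self.num_sets, spc // self.num_sets

    def lookup(self, spc):
        index, tag = self._index_tag(spc)
        for entry_tag, tpc in self.sets[index]:
            if entry_tag == tag:
                return True, tpc   # hit: TPC available without map lookup
        return False, None         # miss: fall through to software

    def insert(self, spc, tpc):
        index, tag = self._index_tag(spc)
        entries = [e for e in self.sets[index] if e[0] != tag]
        entries.append((tag, tpc))
        self.sets[index] = entries[-self.ways:]  # evict the oldest entry


def indirect_jump(spc, jtlb, map_table):
    """Return the TPC for an indirect jump, using the JTLB first."""
    hit, tpc = jtlb.lookup(spc)
    if hit:
        return tpc
    tpc = map_table[spc]    # slow path: software map_lookup
    jtlb.insert(spc, tpc)   # assumed refill policy
    return tpc
```

The fall-through path after a miss corresponds to the `Jump map_lookup` line in the instruction sequence above.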

D-RAS (1)

  • The RAS (return address stack) helps solve the return-jump problem.
  • In a codesigned VM:
      • We need the TPC (not the SPC).
      • If the procedure call is at the end of a translated superblock, the return address may not be correct.

[Figure: Translation Block A calls Translation Block X; the target of the return from X is unknown (???).]

D-RAS (2)

  • A specialized dual-address RAS is used.

[Figure: dual-address return address stack. A Push_DRAS instruction (Opcode, SPC, TPC) pushes an (SPC, TPC) pair; a Return instruction (Opcode, SPC) pops the top pair as the predicted SPC and the predicted TPC.]
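The push and pop behaviour of the dual-address stack can be sketched in Python. The sketch assumes that a return compares its actual source return address against the predicted SPC and uses the predicted TPC only on a match, and that a full stack drops its oldest entry; both are assumptions for illustration, as are the class name and the depth parameter.

```python
class DualAddressRAS:
    """Toy dual-address return address stack holding (SPC, TPC) pairs."""

    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []

    def push(self, spc, tpc):
        """Model of Push_DRAS: record both return addresses at a call."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # assumed overflow policy: drop the oldest
        self.stack.append((spc, tpc))

    def predict_return(self, actual_spc):
        """Model of Return: pop a pair and validate it against the SPC.

        Returns the predicted TPC if the predicted SPC matches, else None
        (the caller then needs the slower SPC -> TPC lookup).
        """
        if not self.stack:
            return None
        predicted_spc, predicted_tpc = self.stack.pop()
        return predicted_tpc if predicted_spc == actual_spc else None
```

Keeping the SPC next to the TPC is what lets the prediction be checked: the guest's architected return address is a source address, while the fetch unit needs a target address.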

Implementing Precise Traps

  • Similar techniques to Chapters 3 and 4:
      • Maintain SW checkpoints.
      • Code motion with extended register live ranges.
      • When a trap occurs, interpretation begins at the checkpoint to establish the correct state.
  • In a codesigned VM:
      • There are enough registers, so live ranges can be extended with less register pressure.
      • Restrictions on code motion are relaxed. Why?

HW Support for Checkpoints (1)

  • Use HW to set a checkpoint when each translation block is entered.

[Figure: translation blocks A, B, C, ..., N, each with a "set checkpoint" at its entry.]

HW Support for Checkpoints (2)

  • If a trap occurs, HW restores the state at the beginning of the block.
  • Then interpretation is used to provide the precise exception state.

[Figure: translation blocks A and B; on a trap, the checkpoint is restored and the corresponding source code is interpreted.]

HW Support for Checkpoints (3)

  • When a new translation block is entered:
      • The state from the previous block is “committed”.
      • A new checkpoint is set.
  • Setting a register checkpoint: use shadow copies of the registers.
      • When a checkpoint is set, the registers are copied to shadow registers.
      • When a trap occurs, the shadow registers are copied back to the working registers.
      • These copies are done very fast.

HW Support for Checkpoints (4)

  • Checkpointing memory: store operations are buffered
      • until the current translation block is exited (committed).
      • If an exception occurs, the buffered stores are flushed (discarded).
      • This is done by HW.
  • Restrictions on code motion are relaxed.
  • The fixed size of the store buffer constrains the translation block size.
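Register and memory checkpointing can be modelled together in a short Python sketch. It assumes shadow registers that are copied at commit, a bounded store buffer whose contents reach memory only at commit, and loads that see the newest buffered store; the register count, buffer size, and all names are hypothetical and chosen only for illustration.

```python
class CheckpointedState:
    """Toy model of shadow registers plus a gated store buffer."""

    def __init__(self, num_regs=4, store_buffer_size=8):
        self.regs = [0] * num_regs
        self.shadow = list(self.regs)   # checkpointed register values
        self.memory = {}                # committed memory state
        self.store_buffer = []          # uncommitted (addr, value) stores
        self.store_buffer_size = store_buffer_size

    def store(self, addr, value):
        if len(self.store_buffer) == self.store_buffer_size:
            # This is why the buffer size bounds the translation block size.
            raise OverflowError("translation block exceeds the store buffer")
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # The newest buffered store to this address wins over memory.
        for buffered_addr, value in reversed(self.store_buffer):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def commit(self):
        """Block exit: release stores to memory and set a new checkpoint."""
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()
        self.shadow = list(self.regs)

    def rollback(self):
        """Trap: discard buffered stores and restore the registers."""
        self.store_buffer.clear()
        self.regs = list(self.shadow)
```

After `rollback()`, the state is exactly what it was at block entry, so the interpreter can re-execute the source instructions one at a time and stop at the precise trapping instruction.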

Page Fault Compatibility (1)

  • The guest OS must observe exactly the same page faults as on a native platform.
  • If the guest OS manages conventional memory (with memory mapping still done by the host):
      • Page faults for the data region are detected naturally.
      • During interpretation, page faults for the code region are also detected.
      • But executing translated code does not fetch any code from guest memory.

Page Fault Compatibility (2)

  • When a translated instruction is fetched from the code cache, we trigger a page fault if the corresponding guest instruction would have caused a page fault on a native platform.
  • How to do this?
      • Active approach
      • Lazy approach

Active Page Fault Detection (1)

  • Monitor potential page replacement by the guest OS.
      • Assuming an architected page table, the VMM can identify the memory region of the page table.
      • The VMM monitors the guest OS's modifications to the architected page table.
      • By write-protecting the page table, the VMM can monitor any change of a virtual page mapping.
  • The VMM keeps a table recording which virtual pages contain each source instruction.

Active Page Fault Detection (2)

  • If the page table is modified, the VMM flushes all the translations in the code cache derived from that (modified) page.
      • Table 1: for each source page, all the translation blocks derived from it (the must-be-flushed blocks).
      • Table 2: keeps track of link back-pointers.
          • Links into blocks of removed pages are changed to point to the VMM emulation manager.
          • The emulation process will then detect the instruction page fault.
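The bookkeeping behind the two tables can be sketched in Python. This is an illustration under assumptions: blocks are identified by strings, a link is a (predecessor, block) pair, and the string "VMM" stands in for the entry point of the VMM emulation manager. None of these representations come from the slides.

```python
from collections import defaultdict


class ActiveDetector:
    """Toy model of the VMM tables used for active page fault detection."""

    def __init__(self):
        self.blocks_by_page = defaultdict(set)  # Table 1: page -> blocks
        self.back_links = defaultdict(set)      # Table 2: block -> predecessors
        self.link_target = {}                   # (pred, block) -> jump target

    def add_block(self, block, pages):
        for page in pages:
            self.blocks_by_page[page].add(block)

    def add_link(self, pred, block):
        self.back_links[block].add(pred)
        self.link_target[(pred, block)] = block  # chained directly

    def on_page_table_write(self, page):
        """Called when the write-protected guest page table changes `page`."""
        flushed = self.blocks_by_page.pop(page, set())
        for block in flushed:
            # A block may span pages; forget it everywhere.
            for blocks in self.blocks_by_page.values():
                blocks.discard(block)
            # Redirect every chained predecessor to the VMM.
            for pred in self.back_links.pop(block, set()):
                self.link_target[(pred, block)] = "VMM"
        return flushed
```

Once a link points at the VMM, the next attempt to reach the flushed code enters emulation, which fetches the guest instruction and observes the instruction page fault just as native hardware would.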

Lazy Page Fault Detection (1)

  • Code cache flushing is postponed until actual use of the replaced code.
      • Every time the translated code crosses a source page boundary, check the page table.
  • At each boundary crossing, a Verify_Translation instruction is inserted. It checks the page mapping:
      • page mapped correctly -> proceed
      • page not mapped -> page fault

Lazy Page Fault Detection (2)

[Figure: guest pages and code cache. Source blocks A through L are spread over several guest pages and are translated into the code cache; a Verify_Translation instruction is placed where the translated code crosses a source page boundary. It probes the page table: if the page is correctly mapped, execution continues; if not, control jumps to the VMM.]
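The check performed at each source page crossing can be sketched in Python. The sketch assumes that each translated segment remembers the virtual page it came from and the real page that page was mapped to at translation time, and that the page table is a simple dictionary; the function names and the tuple layout are hypothetical.

```python
def verify_translation(guest_page_table, vpn, expected_ppn):
    """Model of Verify_Translation: is the source page still mapped
    the way it was when the code was translated?"""
    return guest_page_table.get(vpn) == expected_ppn


def run_superblock(segments, guest_page_table):
    """Execute translated segments, verifying at every page crossing.

    segments: (vpn, expected_ppn, name) tuples in execution order, one
    per stretch of translated code that came from a single source page.
    """
    executed = []
    for vpn, expected_ppn, name in segments:
        if not verify_translation(guest_page_table, vpn, expected_ppn):
            return executed, f"jump to VMM: page {vpn} not mapped as translated"
        executed.append(name)
    return executed, "continue execution"
```

Compared with the active approach, nothing is flushed when the guest OS edits its page table; the cost moves to a probe on every page crossing in the translated code.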

Input/Output (1)

  • If the VMM does not use any I/O:
      • All the guest device drivers can be run as is.
      • Any I/O instructions or memory-mapped I/O are simply passed through.
  • Volatile memory inhibits optimization, so we need to identify accesses to volatile memory.
      • Use an access-protect bit: a load/store to that page traps, and the code is deoptimized to restore the correct access sequence.
      • Use special volatile versions of load/store.

Input/Output (2)

  • Using a disk in the VMM
      • For a disk-based code cache approach: a large, persistent code cache.
      • Requires relaxed transparency.
      • “Concealed secondary storage”
      • VMM-aware special disk driver

[Figure: Guest OS with a special disk driver, the VMM, and a concealed disk region.]

Applying Codesigned VM (1)

  • Advantages at the macro level:
      • New ISAs can be implemented.
      • Efficiency in instruction-level parallelism (Crusoe).
      • Simplified instruction issue logic.
      • High-level object-oriented source ISA (IBM AS/400).

Applying Codesigned VM (2)

  • Advantages at the micro level:
      • A codesigned VM permits the implementation of specific performance enhancements.
      • Implementation-dependent profiling HW can be built in for use by the dynamic translating/optimizing SW.

References

  • http://www.research.ibm.com/vliw/pubs.html
  • Smith, J.E. et al., “Achieving High Performance via Co-Designed Virtual Machines”, IWIA 1999
  • Dehnert, J.C. et al., “The Transmeta Code Morphing Software: Using Speculation, Recovery, and Adaptive Retranslation to Address Real-Life Challenges”, CGO 2003

Thank You !!