Codesigned Virtual Machines
Kim, Dukhwan (김덕환), 2005.10.31
System Synthesis Lab.
Contents
- What is the Codesigned VM?
- Memory & Register State Mapping
- Self-Modifying Code
- Support for Code Caching
- Implementing Precise Traps
- Input/Output
- Applying Codesigned VM
- Case Study: next class
What is the Codesigned VM? (1)
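[Figure: three HW/SW interfaces compared. (a) Conventional HW/SW interface: the application binary runs directly on hardware through the native ISA. (b) Conventional virtual machine: the application binary runs on a virtual machine through a virtual ISA, and the VM runs on hardware through the native ISA. (c) HW/SW codesigned virtual machine: the application binary sees a virtual ISA implemented jointly by VM software and VM hardware, which communicate through an implementation ISA.]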
What is the Codesigned VM? (2)
The host architecture (target ISA) is designed concurrently with the VM software.
The software becomes part of the hardware platform. As a result, the implementation can be divided between hardware and software in whatever way works best.
What is the Codesigned VM? (3)
Codesigned VM vs. system VM
- A system VM supports an entire system (OS + applications), and a codesigned VM is a form of system VM.
- But a codesigned VM:
  - is not intended to virtualize hardware resources;
  - is not intended to support multiple VM environments;
  - has goals that include performance, power efficiency, and design simplicity.
What is the Codesigned VM? (4)
[Figure: the application and OS run on the source ISA (IA32). Beneath them, the VMM, consisting of a translator and a code cache, runs on the hardware's target ISA (Crusoe).]
Codesigned VM as a system VM: we refer to the VM software as the VM monitor (VMM).
What is the Codesigned VM? (5)
Codesigned VM vs. process VM
- Similarity: both emulate a source ISA using dynamic translation and a code cache.
- But in a codesigned VM:
  - compatibility is intrinsic at the ISA level (not the ABI level), so both user-level and system-level instructions must be emulated;
  - the goals are improved performance, power efficiency, and design simplicity. Compatibility is just a requirement, not the motivation.
What is the Codesigned VM? (6)
Codesigned VM vs. superscalar processor
- Similarity: both translate a source ISA (IA32) into a target ISA (P4 micro-ops, Crusoe instructions).
- But in a codesigned VM, the translation is done in software. This gives lower cost, smaller hardware, simpler design, and many more optimization opportunities, because optimizations can span multiple instructions (inter-instruction optimization).
Code Translation Methods
[Figure: two code translation methods. Code translation by HW: each source instruction (instr. 1 ... instr. n) is individually converted into target micro-ops (micro-op a ... micro-op r). Code translation by SW: the sequence of source instructions (instr. 1 ... instr. n) is translated as a whole into target instructions (instr. A ... instr. M).]
Register State Mapping
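Register state mapping is easy: the host register file can be made larger than the guest's. Every guest register therefore gets its own host register, and extra registers remain for the translator.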
Example: PowerPC guest registers mapped onto the Daisy host
- r0-r31 → R0-R31
- counter → R32
- link reg → R33
- MQ → R34
- const. 0 → R35
- R36-R63: scratch, speculative results, constants, pointers
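The following is a minimal Python sketch of the mapping above. It assumes a simple dictionary-based translator. The names `GUEST_TO_HOST`, `SCRATCH`, `map_operands` and `alloc_scratch` are hypothetical, and `const0` stands in for the "const. 0" entry. The point is only that, with a larger host register file, every guest register gets a fixed home and the leftover registers are free for the translator.

```python
# Hypothetical guest -> host register map following the PowerPC/Daisy example.
GUEST_TO_HOST = {f"r{i}": f"R{i}" for i in range(32)}
GUEST_TO_HOST.update({"counter": "R32", "linkreg": "R33", "MQ": "R34", "const0": "R35"})

# R36-R63 have no guest counterpart: scratch, speculative results, constants, pointers.
SCRATCH = [f"R{i}" for i in range(36, 64)]


def map_operands(operands):
    """Rename guest registers in an operand list; immediates pass through unchanged."""
    return [GUEST_TO_HOST.get(op, op) for op in operands]


def alloc_scratch(in_use):
    """Return the first free host-only register, e.g. to hold a speculative result."""
    for reg in SCRATCH:
        if reg not in in_use:
            return reg
    raise RuntimeError("no free scratch register")


print(map_operands(["r3", "linkreg", "16"]))  # ['R3', 'R33', '16']
print(alloc_scratch({"R36"}))                  # R37
```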
Memory State Mapping
Concealed memory: a reserved region for the VMM, the code cache, and other data used by the VMM.
- It is never visible to guest software. This is possible because the VMM takes control from the start of the boot process.
- It has a fixed size and is normally diskless (to simplify the system design).
- The VMM may be stored in ROM.
Concealed Memory (1)
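[Figure: memory system in a codesigned VM. Concealed memory holds the code cache, VMM code, and VMM data. Conventional memory holds source ISA code and source ISA data. The processor core fetches instructions through an I-cache and accesses data through a D-cache.]
The I-cache holds only target ISA instructions.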
Concealed Memory (2)
Memory mapping for concealed memory, option 1: concealed logical memory shares an address space with the guest.
- The host address space must be enlarged to hold both.
[Figure: within one logical address space, concealed logical addresses map through the concealed memory mapping to concealed real memory, and conventional logical addresses map through the guest memory mapping to conventional real memory.]
Concealed Memory (3)
Memory mapping for concealed memory, option 2: two separate logical address spaces.
- Each load/store must select which mapping to use; this can be controlled by the VMM.
[Figure: concealed logical addresses map through the concealed memory mapping to concealed real memory. Conventional logical addresses, in a separate address space, map through the guest memory mapping to conventional real memory.]
Concealed Memory (4)
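Memory mapping for concealed memory, option 3: use real addressing for concealed memory.
- This is a special case of option 2.
- The mapping is selected either by a separate set of load/store instructions or by a mode bit.
[Figure: the concealed real address space addresses concealed real memory directly, with no mapping. Conventional logical addresses map through the guest memory mapping to conventional real memory.]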
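The sketch below models option 3 in Python. It assumes 4 KB pages and uses dictionaries in place of real memories. A hypothetical `concealed` flag plays the role of the mode bit (or the separate load/store opcodes). When the flag is set, the address is used as a concealed real address with no translation. Otherwise the address goes through the guest's own page mapping. `ToyMemory` and its methods are illustrative names only.

```python
PAGE_SIZE = 4096  # assumed page size


class ToyMemory:
    """Option 3: concealed memory is real-addressed; guest memory goes through its mapping."""

    def __init__(self, guest_page_table):
        self.guest_page_table = guest_page_table  # guest virtual page -> conventional real page
        self.conventional = {}                    # conventional real address -> value
        self.concealed = {}                       # concealed real address -> value

    def _guest_real(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.guest_page_table:
            raise LookupError(f"guest page fault at {vaddr:#x}")
        return self.guest_page_table[vpn] * PAGE_SIZE + offset

    def store(self, addr, value, concealed=False):
        # `concealed` models the mode bit (or a separate concealed store opcode).
        if concealed:
            self.concealed[addr] = value          # real address: no translation at all
        else:
            self.conventional[self._guest_real(addr)] = value

    def load(self, addr, concealed=False):
        if concealed:
            return self.concealed[addr]
        return self.conventional[self._guest_real(addr)]
```

The same numeric address can therefore name two different locations. For example, `store(0x10, 42)` goes through the guest mapping, while `store(0x10, 99, concealed=True)` lands in concealed memory and can never be reached by guest code.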
Self-Modifying Code (1)
Basically, we use the same technique as in Chapter 3:
- Keep the guest OS's virtual-to-real page mapping, so all load/store addresses are mapped to the source memory region.
- Write-protect the guest code region. Any attempt to write into that region causes a trap, which the VM then handles.
But, unlike a process VM, we cannot write-protect the pages through a system call, because we must not modify the guest's page state.
Self-Modifying Code (2)
Solution: use the TLB for write protection.
- The TLB is managed by the VMM and has an additional bit indicating “write-protect”.
- The VMM sets the write-protect bit whenever an entry for a code page is loaded into the TLB.
- The VMM must therefore maintain a table of all guest virtual pages from which code has been translated.
Moreover, because this is a codesigned VM, we can also get…
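Here is a toy Python model of the TLB-based scheme. It assumes a software-managed TLB that the VMM refills. `VMMTlb`, `fill` and the `on_smc_trap` callback are hypothetical names. The guest page table is only read, never written. The write-protect bit lives solely in the VMM's TLB entry and is derived from the VMM's table of pages that have translated code.

```python
from dataclasses import dataclass


@dataclass
class TLBEntry:
    vpn: int
    ppn: int
    write_protect: bool  # extra bit, visible only to the VMM


class VMMTlb:
    """Toy VMM-managed TLB with a write-protect bit for translated-code pages."""

    def __init__(self, guest_page_table, translated_code_pages):
        self.guest_page_table = guest_page_table            # guest's mapping: read, never written
        self.translated_code_pages = translated_code_pages  # VMM table of guest virtual code pages
        self.entries = {}

    def fill(self, vpn):
        """TLB refill by the VMM: copy the guest mapping, set WP for code pages."""
        entry = TLBEntry(vpn, self.guest_page_table[vpn],
                         vpn in self.translated_code_pages)
        self.entries[vpn] = entry
        return entry

    def store(self, vpn, on_smc_trap):
        """Return True if the store may proceed; otherwise trap to the VMM."""
        entry = self.entries.get(vpn) or self.fill(vpn)
        if entry.write_protect:
            on_smc_trap(vpn)   # the VMM would invalidate translations from this page
            return False
        return True
```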
Self-Modifying Code (3)
Special hardware support: the Transmeta Crusoe adds a special hardware structure to speed up fine-grained write-protection checking.
- Goal: determine whether a store really writes to a region of source code that has been translated.
- Virtual address → (TLB) → real address → (filtered by the write-protect table) → write fault or not
Self-Modifying Code (4)
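[Figure: fine-grained write-protection hardware. The source address is split into a virtual page number and page offset bits. The TLB maps the virtual page number to a physical page number and holds a write-protect (WP) bit per entry. A write-protect table, looked up by page (hit/miss), supplies a WP bit mask, and comparison logic combines the mask with the page offset bits. The two possible outcomes are a page-level write-protect fault and a source code write fault.]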
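The check in the figure can be sketched as follows. The sketch assumes 4 KB pages divided into 128-byte chunks, with one mask bit per chunk. The granularity, the table layout, and the choice to raise a page-level fault on a write-protect table miss are all assumptions for illustration, not Crusoe's documented design. The benefit shows up in the last branch: a store to data that merely shares a page with translated code does not trap.

```python
PAGE_SIZE = 4096
CHUNK = 128  # assumed protection granularity: 32 chunks per page


def check_store(vaddr, tlb, wp_table):
    """Return 'ok', 'page_wp_fault', or 'code_write_fault' for a store to vaddr.

    tlb:      vpn -> (ppn, wp_bit)
    wp_table: ppn -> bit mask; bit i set if chunk i holds translated source code
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn, wp_bit = tlb[vpn]
    if not wp_bit:
        return "ok"                      # page holds no translated code
    mask = wp_table.get(ppn)
    if mask is None:
        return "page_wp_fault"           # table miss: fall back to a page-level trap
    chunk = offset // CHUNK
    if (mask >> chunk) & 1:
        return "code_write_fault"        # store really hits translated source code
    return "ok"                          # same page, but only data: no trap
```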
Self-Modifying Code (5)
I/O writes (e.g., DMA) to guest code memory must also be caught.
- For the translated code in the code cache, keep track of all the real guest pages it was translated from.
- Maintain a hardware table for I/O writes, with entries for all real pages that hold guest code.
- A store to any of these pages causes an interrupt to the VMM, which then flushes the corresponding translated code.
Support for Code Caching (1)
Code cache performance is critical. Entering the code cache requires hashing the SPC (source PC) to find the TPC (target PC) and, on a hit, jumping into the code cache. This involves multiple memory accesses plus an indirect jump.
- Superblock chaining helps: it eliminates the table lookup for direct jumps and branches.
- But what about indirect jumps?
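Below is a small Python model of the lookup and chaining just described. A dictionary stands in for the hashed SPC→TPC map table, and `CodeCache`, `chain` and `follow_exit` are hypothetical names. Once a direct exit is chained, later executions bypass the map table. An indirect jump, whose target SPC changes at run time, cannot be chained this way and keeps paying for `lookup`.

```python
class CodeCache:
    """Toy code cache: map-table lookup plus chaining of direct exits."""

    def __init__(self):
        self.map_table = {}  # SPC -> TPC (stands in for the hashed map table)
        self.links = {}      # exit id -> chained target TPC
        self.lookups = 0     # counts slow-path map-table probes

    def add_translation(self, spc, tpc):
        self.map_table[spc] = tpc

    def lookup(self, spc):
        """Slow path: probe the map table; None means 'translate first'."""
        self.lookups += 1
        return self.map_table.get(spc)

    def chain(self, exit_id, target_spc):
        """Patch a direct-branch exit to jump straight to its target translation."""
        tpc = self.map_table.get(target_spc)
        if tpc is not None:
            self.links[exit_id] = tpc
        return tpc

    def follow_exit(self, exit_id, target_spc):
        """Taken exit: use the chained link if present, else fall back to lookup."""
        if exit_id in self.links:
            return self.links[exit_id]
        return self.lookup(target_spc)
```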
Support for Code Caching (2)
To reduce table lookup overhead, use software-based jump target prediction:
    if (Rx == #addr_1) goto #target_1
    else if (Rx == #addr_2) goto #target_2
    else map_lookup(Rx)
But:
- If the software prediction is incorrect, the time spent on the comparisons is wasted.
- Many indirect jumps are difficult to predict (e.g., returns).
Again, because this is a codesigned VM, we can get…
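The prediction chain above can be made runnable as a Python closure. This assumes the predicted (SPC, TPC) pairs were collected by profiling. `make_indirect_jump` and its parameters are hypothetical. Each prediction costs a compare, so a target that matches none of them pays for all the compares and then for the full `map_lookup`.

```python
def make_indirect_jump(predictions, map_lookup):
    """Build a translated indirect jump with inlined target predictions.

    predictions: list of (source_addr, target_tpc) pairs tested in order, like
                 'if (Rx == #addr_1) goto #target_1 else if ...'
    map_lookup:  slow-path function from SPC to TPC
    """
    def jump(rx):
        for addr, target in predictions:
            if rx == addr:
                return target      # prediction hit: direct jump, no table access
        return map_lookup(rx)      # all predictions missed: full map-table lookup
    return jump
```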
Support for Code Caching (3)
Hardware support for code caching:
- JTLB (Jump Translation Lookaside Buffer)
- D-RAS (Dual-address Return Address Stack)
JTLB (1)
“a specially designed HW cache of map table entries”
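[Figure: JTLB organization. The SPC is hashed to select a set of entries, each holding a tag and a TPC. Tag-compare logic checks the stored tags against the SPC and produces hit or miss. On a hit, a MUX selects the matching entry's TPC.]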
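This is a toy set-associative JTLB in Python. It assumes the hash is simply "low SPC bits select the set" and that replacement within a set is LRU. `JTLB`, `lookup`, `fill` and the default sizes are hypothetical. Real hardware would do the tag compares in parallel rather than in a loop.

```python
class JTLB:
    """Toy set-associative cache of map-table entries (SPC -> TPC) with LRU ways."""

    def __init__(self, num_sets=64, ways=2):
        self.num_sets, self.ways = num_sets, ways
        self.sets = [[] for _ in range(num_sets)]  # each set: [(tag, tpc), ...], MRU first

    def _index_tag(self, spc):
        # Assumed hash: low bits select the set, the remaining bits are the tag.
        return spc % self.num_sets, spc // self.num_sets

    def lookup(self, spc):
        """Return (hit, tpc); on a miss the caller falls back to the software map lookup."""
        index, tag = self._index_tag(spc)
        entries = self.sets[index]
        for way, (stored_tag, tpc) in enumerate(entries):
            if stored_tag == tag:                     # tag compare
                entries.insert(0, entries.pop(way))   # mark most recently used
                return True, tpc                      # MUX selects this way's TPC
        return False, None

    def fill(self, spc, tpc):
        """Install an entry after a software map lookup, evicting the LRU way if needed."""
        index, tag = self._index_tag(spc)
        entries = [e for e in self.sets[index] if e[0] != tag]
        entries.insert(0, (tag, tpc))
        self.sets[index] = entries[:self.ways]
```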
JTLB (2)
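- JTLB_Lookup instruction: takes an SPC and returns hit/miss and the TPC from the JTLB.
- Lookup_Jump instruction and prediction: the front end predicts the target using the BTB and fetches from it. Then:
  - JTLB hit and prediction correct: happy, nothing more to do.
  - JTLB hit but misprediction: redirect fetch to the jump target TPC supplied by the JTLB.
  - JTLB miss: redirect fetch to the fall-through address (the jump to map_lookup).
Code sequence:
    JTLB_Lookup Ri, Rj, Rk
    Jump Ri, Rj == 0
    Jump map_lookup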
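The three outcomes listed above can be written as a small decision function. This assumes the BTB guess is already known and uses a dictionary in place of the JTLB. `resolve_lookup_jump` and the outcome strings are hypothetical labels.

```python
def resolve_lookup_jump(spc, btb_prediction, jtlb, fall_through):
    """Return (fetch_tpc, outcome) for a Lookup_Jump.

    btb_prediction: TPC the front end already fetched from (BTB guess), or None
    jtlb:           dict SPC -> TPC standing in for the JTLB
    fall_through:   TPC of the next instruction (the jump to map_lookup)
    """
    tpc = jtlb.get(spc)
    if tpc is None:
        return fall_through, "jtlb_miss"      # continue into the software map lookup
    if tpc == btb_prediction:
        return tpc, "hit_correct"             # no redirect needed
    return tpc, "hit_mispredict"              # redirect fetch to the TPC from the JTLB
```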
D-RAS (1)
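The RAS (return address stack) helps solve the return-jump prediction problem. But in a codesigned VM:
- the return target we need is a TPC, not an SPC;
- if the procedure call is at the end of a translated superblock, the code following the call in the code cache is not the translated return point, so the pushed return address may not be correct.
[Figure: translation block A ends with a call into translation block X. When X returns, it is unclear (???) where in the code cache execution should resume.]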
D-RAS (2)
A specialized dual-address RAS is used.
[Figure: dual-address return address stack. A Push_DRAS instruction (opcode, SPC, TPC) pushes an (SPC, TPC) pair onto the stack. A return instruction (opcode, SPC) pops the top entry, producing a predicted SPC and a predicted TPC.]
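Below is a Python sketch of the D-RAS. It assumes a fixed depth that discards the oldest entry on overflow. It also assumes a return sequence that checks the predicted SPC against the actual return SPC (for example, the value in the guest's link register). `DualAddressRAS`, `push` and `predict_return` are hypothetical names, and `push` corresponds to Push_DRAS. Keeping the SPC alongside the TPC lets a misprediction be detected in source-ISA terms, while the TPC is what fetch actually needs.

```python
class DualAddressRAS:
    """Toy dual-address return address stack holding (SPC, TPC) pairs."""

    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []

    def push(self, return_spc, return_tpc):
        """Push_DRAS: executed at a translated call site."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)                 # overflow: drop the oldest entry
        self.stack.append((return_spc, return_tpc))

    def predict_return(self, actual_spc):
        """Pop on a return; give the predicted TPC only if the predicted SPC matches.

        None means: fall back to a JTLB/map-table lookup on actual_spc.
        """
        if not self.stack:
            return None
        predicted_spc, predicted_tpc = self.stack.pop()
        return predicted_tpc if predicted_spc == actual_spc else None
```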
Implementing Precise Traps
Techniques similar to those in Chapters 3 and 4 apply:
- Maintain software checkpoints, and allow code motion by extending register live ranges.
- When a trap occurs, interpret from the checkpoint to establish the correct state.
In a codesigned VM:
- There are enough registers that live ranges can be extended with less register pressure.
- Restrictions on code motion are relaxed. Why?
HW Support for Checkpoints (1)
Use HW to set a checkpoint when each translation block is entered.
[Figure: translation blocks A, B, C, ..., N, with a checkpoint set on entry to each block.]
HW Support for Checkpoints (2)
If a trap occurs, the hardware restores the state at the beginning of the block. Interpretation of the source code is then used to provide the precise exception state.
[Figure: a trap in translation block B restores the checkpoint taken at B's entry. The corresponding source code is then interpreted.]
HW Support for Checkpoints (3)
When a new translation block is entered, the state from the previous block is “committed” and a new checkpoint is set.
Setting a register checkpoint uses shadow copies of the registers:
- When a checkpoint is set, the registers are copied to the shadow registers.
- When a trap occurs, the shadow registers are copied back to the working registers.
- These copies are done very quickly.
HW Support for Checkpoints (4)
Checkpointing memory: store operations are buffered until the current translation block is exited (committed).
- If an exception occurs, the buffered stores are discarded. This is done by hardware.
- Restrictions on code motion are therefore relaxed.
- The fixed size of the store buffer constrains the translation block size.
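The register and memory checkpointing from the last three slides can be combined into one toy model. It assumes a Python `ArithmeticError` stands in for a guest trap and that dictionaries stand in for registers and memory. `CheckpointedCore` and `run_block` are hypothetical. Loads forward from the store buffer so a block sees its own stores. The interpretation step that follows a rollback is not modeled.

```python
class CheckpointedCore:
    """Toy model: shadow registers + gated store buffer, committed per translation block."""

    def __init__(self, regs, memory, store_buffer_size=8):
        self.regs = dict(regs)
        self.shadow = dict(regs)        # checkpoint copy of the registers
        self.memory = memory            # dict address -> value (architected memory)
        self.store_buffer = []          # stores held back until the block commits
        self.store_buffer_size = store_buffer_size

    def store(self, addr, value):
        if len(self.store_buffer) == self.store_buffer_size:
            # A fixed-size buffer is what limits translation block size.
            raise RuntimeError("store buffer full: translation block too large")
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # A block must see its own buffered stores (newest first).
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def commit(self):
        """Block exited normally: release buffered stores and set a new checkpoint."""
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()
        self.shadow = dict(self.regs)

    def rollback(self):
        """Trap inside the block: discard stores, restore registers from the shadow copy."""
        self.store_buffer.clear()
        self.regs = dict(self.shadow)


def run_block(core, block):
    """Run a block (a function of core); return False if it trapped and was rolled back."""
    try:
        block(core)
    except ArithmeticError:             # stand-in for any guest trap
        core.rollback()                 # the VMM would now interpret from the block entry
        return False
    core.commit()
    return True
```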
Page Fault Compatibility (1)
The guest OS must observe exactly the same page faults as on a native platform.
If the guest OS manages conventional memory (with the memory mapping still performed by the host):
- page faults for the data region are detected naturally;
- during interpretation, page faults for the code region are also detected;
- but executing translated code does not fetch any code from guest memory, so instruction page faults would be missed.
Page Fault Compatibility (2)
When a translated instruction is fetched from the code cache, we must trigger a page fault if the corresponding guest instruction would have caused a page fault on a native platform.
How can this be done? Two approaches:
- Active approach
- Lazy approach
Active Page Fault Detection (1)
Monitor potential page replacements by the guest OS.
- Assuming an architected page table, the VMM can identify the memory region holding the page table.
- The VMM monitors the guest OS's modifications to the architected page table. By write-protecting the page table, the VMM can observe any change to a virtual page mapping.
- The VMM keeps a table recording which virtual page each source instruction is contained in.
Active Page Fault Detection (2)
If the page table is modified, the VMM flushes all translations in the code cache derived from the modified page.
- Table 1: for each source page, all translation blocks derived from it (the blocks that must be flushed).
- Table 2: backpointers for all links into those blocks. Links into removed pages are changed to point to the VMM emulation manager.
- The emulation process then detects the instruction page fault.
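Here is a sketch of the two tables. It assumes translation blocks are identified by name and that a link is keyed by (block, exit number). `ActiveInvalidator` and its methods are hypothetical. `on_page_table_write` would run from the write-protect trap on the guest page table. It flushes every block derived from the page and redirects surviving links into those blocks to the VMM emulation manager, represented here by the string "VMM".

```python
from collections import defaultdict


class ActiveInvalidator:
    """Toy model of Table 1 (page -> blocks) and Table 2 (link backpointers)."""

    def __init__(self):
        self.blocks_by_page = defaultdict(set)  # Table 1: source page -> translation blocks
        self.incoming = defaultdict(set)        # Table 2: block -> links (from_block, exit) into it
        self.links = {}                         # (from_block, exit) -> target block or "VMM"

    def add_block(self, block, source_pages):
        for page in source_pages:
            self.blocks_by_page[page].add(block)

    def link(self, from_block, exit_no, to_block):
        self.links[(from_block, exit_no)] = to_block
        self.incoming[to_block].add((from_block, exit_no))

    def on_page_table_write(self, page):
        """Flush all blocks derived from `page`; return the set of flushed blocks."""
        victims = self.blocks_by_page.pop(page, set())
        for block in victims:
            for blocks in self.blocks_by_page.values():
                blocks.discard(block)           # a block may span several source pages
            for src in self.incoming.pop(block, set()):
                if src[0] not in victims:
                    self.links[src] = "VMM"     # surviving link now enters the emulation manager
        # Remove links that start inside flushed blocks, and their backpointers.
        for key in [k for k in self.links if k[0] in victims]:
            target = self.links.pop(key)
            if target in self.incoming:
                self.incoming[target].discard(key)
        return victims
```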
Lazy Page Fault Detection (1)
Code cache flushing is postponed until the replaced code is actually used.
- Every time translated code crosses a source page boundary, it checks the page table.
- At each boundary crossing, a Verify_Translation instruction is inserted.
- It checks the page mapping: if the page is mapped correctly, execution proceeds; if the page is not mapped, a page fault is raised.
Lazy Page Fault Detection (2)
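[Figure: guest pages hold source code ABC, DE, FG, HI, J, K, and L. The code cache holds translations ABC, DE, FG, HIJ, and KL. Where a translation crosses a guest page boundary (e.g., HIJ and KL), a Verify_Translation instruction probes the page table. If the page is correctly mapped, execution continues; otherwise it jumps to the VMM.]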
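Below is a minimal model of the lazy check. It assumes a translation records, for each page crossing, the real page the source code came from. It also assumes that Verify_Translation compares the current guest mapping against that recorded page. `verify_translation` and `execute_block` are hypothetical, and the block is a list of tagged items rather than real code.

```python
PAGE_SIZE = 4096  # assumed page size


def verify_translation(source_vaddr, expected_ppn, guest_page_table):
    """Probe the guest page table at a page crossing.

    True if the source page is still mapped to the real page the translation was
    made from; otherwise the caller jumps to the VMM.
    """
    vpn = source_vaddr // PAGE_SIZE
    return guest_page_table.get(vpn) == expected_ppn


def execute_block(block, guest_page_table):
    """block: list of ('code', name) or ('verify', source_vaddr, expected_ppn) items."""
    trace = []
    for item in block:
        if item[0] == "verify":
            if not verify_translation(item[1], item[2], guest_page_table):
                trace.append("jump_to_VMM")   # VMM raises the page fault or retranslates
                break
        else:
            trace.append(item[1])
    return trace
```

For the HIJ translation in the figure, a `("verify", ...)` item would sit between I and J, where the source code crosses into the next guest page.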
Input/Output (1)
If the VMM does not use any I/O itself, all the guest device drivers can run as is.
- Any I/O instructions or memory-mapped I/O accesses are simply passed through.
- Volatile memory inhibits optimization, so accesses to volatile memory must be identified:
  - use an access-protect bit: a load/store to such a page traps, and the code is deoptimized to preserve the correct access sequence; or
  - use a special volatile version of load/store.
Input/Output (2)
Using a disk in the VMM, for a disk-based code cache approach:
- gives a large, persistent code cache;
- requires relaxed transparency;
- uses “concealed secondary storage”;
- needs a VMM-aware special disk driver.
[Figure: guest OS, VMM, a special disk driver, and a concealed disk region.]
Applying Codesigned VM (1)
Advantages at the macro level: new ISAs can be implemented, for example:
- an implementation ISA designed for efficient instruction-level parallelism, which simplifies instruction issue logic (Crusoe);
- a high-level, object-oriented source ISA (IBM AS/400).
Applying Codesigned VM (2)
Advantages at the micro level: a codesigned VM permits the implementation of specific performance enhancements.
- For example, implementation-dependent profiling hardware can be built in for use by the dynamic translation/optimization software.
References
- IBM Research VLIW publications: http://www.research.ibm.com/vliw/pubs.html
- Smith, J. E., et al., "Achieving High Performance via Co-Designed Virtual Machines," IWIA 1999.
- Dehnert, J. C., et al., "The Transmeta Code Morphing Software: Using Speculation, Recovery, and Adaptive Retranslation to Address Real-Life Challenges," CGO 2003.
Thank You !!