© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar
EECS 470
Lecture 17
Virtual Memory
Fall 2007
Prof. Thomas Wenisch
http://www.eecs.umich.edu/courses/eecs470
Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, and Vijaykumar of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.
EECS 470 Lecture 17 Slide 1
Announcements
HW #5 (due 11/16): will be posted by Wednesday
Milestone 2 (due 11/14)
Readings
For today: H&P Appendix C.4-C.6; Jacob & Mudge, "Virtual Memory in Contemporary Processors"
For Monday: H&P 5.3
Improving Cache Performance: Summary

Miss rate
- large block size
- higher associativity
- victim caches
- skewed-/pseudo-associativity
- hardware/software prefetching
- compiler optimizations

Hit time (difficult?)
- small and simple caches
- avoiding translation during L1 indexing (later)
- pipelining writes for fast write hits
- subblock placement for fast write hits in write-through

Miss penalty
- give priority to read misses over writes/writebacks
- subblock placement
- early restart and critical word first
- non-blocking caches
- multi-level caches
2 Parts to Modern VM

VM provides each process with the illusion of a large, private, uniform memory

Part A: Protection
- each process sees a large, contiguous memory segment without holes
- each process's memory space is private, i.e., protected from access by other processes

Part B: Demand Paging
- capacity of secondary memory (swap space on disk) at the speed of primary memory (DRAM)

Based on a common HW mechanism: address translation
- user process operates on "virtual" or "effective" addresses
- HW translates from virtual to physical on each reference
- controls which physical locations can be named by a process
- allows dynamic relocation of physical backing store (DRAM vs. HD)
- VM HW and memory management policies controlled by the OS
Evolution of Protection Mechanisms

Earliest machines had no concept of protection and address translation
- no need: single process, single user
- automatically "private and uniform" (but not very large)
- programs operated on physical addresses directly
- no multitasking protection, no dynamic relocation (at least not very easily)
Base and Bound Registers

In a multi-tasking system:
- Each process is given a non-overlapping, contiguous physical memory region; everything belonging to a process must fit in that region
- When a process is swapped in, the OS sets base to the start of the process's memory region and bound to the end of the region
- HW translation and protection check (on each memory reference):
  PA = EA + base, provided (PA < bound), else violation
⇒ Each process sees a private and uniform address space (0 .. max)

[Figure: physical memory with the active process's region delimited by the privileged Base and Bound control registers; another process's region shown separately. Bound can also be formulated as a range.]
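The translation and check above can be sketched in a few lines of Python; the base/bound values below are hypothetical, chosen only for illustration.

```python
def base_and_bound_translate(ea, base, bound):
    """Translate an effective address under base-and-bound protection.

    PA = EA + base, valid only if PA < bound; otherwise the access is a
    protection violation (modeled here as an exception).
    """
    pa = ea + base
    if pa >= bound:
        raise MemoryError("protection violation: PA 0x%x >= bound 0x%x" % (pa, bound))
    return pa

# Hypothetical process region: physical addresses [0x40000, 0x60000)
print(hex(base_and_bound_translate(0x1234, base=0x40000, bound=0x60000)))  # 0x41234
```

Note that the process itself never sees base or bound; both are privileged state that only the OS may change.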
Segmented Address Space
- segment == a base and bound pair
- segmented addressing gives each process multiple segments
  - initially, separate code and data segments: 2 sets of base-and-bound registers for instruction and data fetch; allowed sharing of code segments
  - became more and more elaborate: code, data, stack, etc.
  - also (ab)used as a way for an ISA with a small EA space to address a larger physical memory space

[Figure: the EA is split into a SEG # and an offset; the SEG # indexes a segment table holding base & bound for each segment; PA = base + offset, checked against bound (okay?). Segment tables must be 1. privileged data structures and 2. private/unique to each process.]
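A minimal sketch of segmented translation, assuming a hypothetical per-process segment table (the segment numbers and ranges are made up for the example):

```python
# Hypothetical per-process segment table: seg# -> (base, bound).
SEGMENTS = {
    0: (0x10000, 0x14000),  # code segment
    1: (0x40000, 0x48000),  # data segment
}

def segmented_translate(seg, offset):
    """PA = segment base + offset, checked against that segment's bound."""
    base, bound = SEGMENTS[seg]
    pa = base + offset
    if pa >= bound:
        raise MemoryError("segment bound violation")
    return pa
```

Sharing a code segment between two processes is then just two segment tables whose entry 0 holds the same (base, bound) pair.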
Paged Address Space
Segmented addressing creates fragmentation problems: a system may have plenty of unallocated memory locations, but they are useless if they do not form a contiguous region of a sufficient size.

In a Paged Memory System:
- PA space is divided into fixed size segments (e.g., 4 Kbyte), more commonly known as "page frames"
- EA is interpreted as page number and page offset

[Figure: the EA's Page No. indexes a page table that supplies the page frame base; PA = page frame base + page offset (okay?). Page tables must be 1. privileged data structures and 2. private/unique to each process.]
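The page-number/page-offset split can be illustrated as follows (4 KB pages; the page-table contents are made up for the example):

```python
PAGE_SHIFT = 12              # 4 KB pages
PAGE_SIZE = 1 << PAGE_SHIFT

# Hypothetical page table: virtual page number -> physical frame number.
PAGE_TABLE = {0x00000: 0x2A, 0x00001: 0x13}

def paged_translate(ea):
    """Split EA into (VPN, offset), map VPN to a frame, rebuild the PA."""
    vpn = ea >> PAGE_SHIFT
    offset = ea & (PAGE_SIZE - 1)
    ppn = PAGE_TABLE[vpn]            # a missing key models a page fault
    return (ppn << PAGE_SHIFT) | offset
```

Because frames are fixed-size and interchangeable, any free frame can back any virtual page, which is exactly what eliminates the contiguity problem of segments.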
Demand Paging

Main memory and disk as automatically managed levels in the memory hierarchy
- analogous to cache vs. main memory
- drastically different size and time scales ⇒ very different design decisions

Early attempts
- von Neumann already described manual memory hierarchies
- Brookner's interpretive coding, 1960: a software interpreter that managed paging between a 40 Kb main memory and a 640 Kb drum
- Atlas, 1962: hardware demand paging between a 32-page (512 word/page) main memory and 192 pages on drums; the user program believes it has 192 pages
Demand Paging vs. Caching: 2003

                  L1 Cache       Demand Paging
  capacity        32KB~MB ??     100MB~10GB ??
  block size      16~128 Byte    4K to 64K Byte
  hit time        1~3 cyc        50~150 cyc
  miss penalty    5~150 cyc      1M to 10M cyc
  miss rate       0.1~10%        0.00001~0.001%
  hit handling    hw             hw
  miss handling   hw             sw
Page-Based Virtual Memory

[Figure: a 64-bit virtual address is split into a virtual page number (52-bit) and page offset (12-bit); the VPN is decoded through translation memory (the page table, ~8 bytes per entry) into a physical page number, forming a 40-bit physical address into main memory pages (1~10 GBytes).]

Where to hold this translation memory, and how much translation memory do we need? (10~100 GBytes)
Hierarchical Page Table
[Figure: the effective address is split into p1 (10-bit), p2 (10-bit), and page offset (12-bit). A privileged register holds the base of the "page table of the page table"; p1 indexes it to select a page of the page table, and p2 indexes that page to reach the data page. Pages of the page table, like data pages, may be in main memory, in the swap disk, or not exist.]

Storage overhead of translation should be proportional to the size of physical memory and not the virtual address space
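The two-level walk above can be sketched like this; the field widths come from the slide, nested dicts stand in for page-table pages, and a missing key models a page fault or an unallocated table page (which is where the proportional-storage property comes from):

```python
# Field widths from the slide: p1 (10 bits), p2 (10 bits), offset (12 bits).
P2_SHIFT, P1_SHIFT = 12, 22

def two_level_walk(ea, root):
    """Walk a two-level page table.

    `root` maps p1 -> second-level table; each second-level table maps
    p2 -> physical frame number. Untouched regions of the 32-bit space
    simply have no entries, so table storage tracks what is actually used.
    """
    p1 = (ea >> P1_SHIFT) & 0x3FF
    p2 = (ea >> P2_SHIFT) & 0x3FF
    offset = ea & 0xFFF
    second_level = root[p1]   # may itself be paged out / absent
    frame = second_level[p2]
    return (frame << 12) | offset

# Only the touched corner of the address space needs table storage:
root = {3: {5: 0x77}}
```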
Inverted or Hashed Page Tables

[Figure: the VPN and PID are hashed to form a table offset; added to the base of the inverted page table, this gives the PA of an IPTE. Each inverted page table entry holds a VPN, PID, and PTE mapping into physical memory.]

- Size of the inverted page table only needs to be proportional to the size of the physical memory
- Each VPN can only be mapped to a small set of entries according to a hash function
- To translate a VPN, check all allowed table entries for a matching VPN and PID

How many memory lookups per translation?
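A toy sketch of the idea, with Python's built-in hash standing in for the hardware hash function, one entry per slot, and no overflow chaining (a real design would probe a small group of entries):

```python
TABLE_SIZE = 8  # proportional to physical memory (# of frames), not VA space

# Each slot holds (vpn, pid, ppn) or None.
ipt = [None] * TABLE_SIZE

def slot(vpn, pid):
    """Stand-in hash function mapping (VPN, PID) to one allowed slot."""
    return hash((vpn, pid)) % TABLE_SIZE

def ipt_insert(vpn, pid, ppn):
    ipt[slot(vpn, pid)] = (vpn, pid, ppn)

def ipt_lookup(vpn, pid):
    """Check the allowed entry for a matching VPN and PID."""
    entry = ipt[slot(vpn, pid)]
    if entry and entry[0] == vpn and entry[1] == pid:
        return entry[2]
    raise KeyError("not in IPT -> fall back to a slower handler")
```

The table is indexed by a hash of the virtual address rather than by the virtual address itself, which is why its size can track physical memory.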
Translation Look-aside Buffer (TLB)

Essentially a cache of recent address translations
- avoids going to the page table on every reference
- indexed by lower bits of the VPN (virtual page #)
- tag = unused bits of VPN + process ID
- data = a page-table entry, i.e., PPN (physical page #) and access permission
- status = valid, dirty

The usual cache design choices (placement, replacement policy, multi-level, etc.) apply here too.

[Figure: the virtual address's VPN is split into tag and index; on a tag match, the TLB supplies the physical page no., which is concatenated with the page offset to form the physical address.]

What should be the relative sizes of the ITLB and I-cache?
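A direct-mapped TLB along these lines might look like the sketch below; the entry count and field packing are arbitrary choices for illustration, and the ASID plays the role of the process ID in the tag:

```python
class TLB:
    """Direct-mapped TLB sketch: index = low VPN bits,
    tag = remaining VPN bits + address-space ID (ASID)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.array = [None] * entries  # each slot: (tag, ppn) or None

    def _split(self, vpn, asid):
        index = vpn % self.entries
        tag = (vpn // self.entries, asid)
        return index, tag

    def fill(self, vpn, asid, ppn):
        self.array[self._split(vpn, asid)[0]] = (self._split(vpn, asid)[1], ppn)

    def lookup(self, vpn, asid):
        index, tag = self._split(vpn, asid)
        entry = self.array[index]
        if entry and entry[0] == tag:
            return entry[1]        # hit: return the PPN
        return None                # miss: must walk the page table
```

Tagging with the ASID is what lets the TLB survive a context switch without being flushed.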
Virtual to Physical Address Translation

[Flow: Effective Address → TLB Lookup (≤ 1 pclk). On a hit → Protection Check (≤ 1 pclk): if permitted → Physical Address to cache; if denied → Protection Fault (OS, 10000's pclk). On a miss → Page Table Walk by HW or SW (100's pclk): if it succeeds → Update TLB and retry; if it fails → Page Fault (OS table walk, 10000's pclk).]
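The flow can be condensed into pseudocode-like Python, with the rough cycle costs from the slide as comments and both fault cases modeled as exceptions (the dict-based TLB and page table are stand-ins):

```python
def translate(ea, tlb, page_table, perms_ok):
    """TLB lookup, page-table walk on a miss, then protection check."""
    vpn, offset = ea >> 12, ea & 0xFFF
    pte = tlb.get(vpn)                            # TLB lookup: <= 1 pclk
    if pte is None:
        if vpn not in page_table:                 # walk fails
            raise LookupError("page fault -> OS") # 10000's pclk
        pte = page_table[vpn]                     # HW or SW walk: 100's pclk
        tlb[vpn] = pte                            # update TLB, then retry
    if not perms_ok(pte):                         # protection check: <= 1 pclk
        raise PermissionError("protection fault -> OS")
    return (pte["ppn"] << 12) | offset            # physical address to cache
```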
Cache Placement and Address Translation

Physical Cache (most systems): CPU → (VA) → MMU → (PA) → Physical Cache → Physical Memory
- longer hit time: translation sits on the fetch critical path

Virtual Cache (SPARC2's): CPU → (VA) → Virtual Cache → MMU → (PA) → Physical Memory
- translation off the fetch critical path
- aliasing problem
- cold start after context switch

Virtual caches are not popular anymore because the MMU and CPU can be integrated on one chip
Physically Indexed Cache

[Figure: the virtual address (n = v+g bits) is split into the virtual page no. (VPN, v bits) and page offset (PO, g bits); the TLB translates the VPN to the phy. page no. (PPN, p bits), producing the physical address (m = p+g bits), which is then split into tag, index, and block offset (BO) to access the D-cache and read out the data.]
Virtually Indexed Cache

Parallel access to TLB and cache arrays

[Figure: the virtual address is split into virtual page no. (VPN) and page offset; the index and block-offset bits, drawn entirely from the page offset, access the D-cache in parallel with the TLB translating the VPN to the PPN; the PPN is then compared against the cache tag to produce Hit/Miss and the data.]

How large can a virtually indexed cache get?
Large Virtually Indexed Cache

[Figure: same parallel TLB/D-cache access, but the index now extends a bits above the page offset into the VPN; the PPN from the TLB is compared against the cache tag for Hit/Miss.]

If two VPNs differ in a, but both map to the same PPN, then there is an aliasing problem
Virtual Address Synonyms

Two virtual pages that map to the same physical page
- within the same virtual address space
- across address spaces

[Figure: VA1 and VA2 both map to the same PA.]

Using VA bits as index, PA data may reside in different sets in the cache!!
Synonym (or Aliasing)

When VPN bits are used in indexing, two virtual addresses that map to the same physical address can end up sitting in two cache lines. In other words, two copies of the same physical memory location may exist in the cache ⇒ a modification to one copy won't be visible in the other.

[Figure: the index uses a bits of the VPN; the PPN from the TLB is compared against the cache tag for Hit/Miss.]

If the two VPNs do not differ in a, then there is no aliasing problem
Synonym Solutions

- Limit cache size to page size times associativity
  - get the index entirely from the page offset
- Search all sets in parallel
  - 64K 4-way cache, 4K pages: search 4 sets (16 entries). Slow!
- Restrict page placement in the OS
  - make sure index(VA) = index(PA)
- Eliminate by OS convention
  - single virtual space
  - restrictive sharing model
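The first solution's limit follows from counting index bits. A small helper (the function name is mine) computes how many index bits spill into the VPN, i.e., the "a" of the preceding slides; a result of 0 means the cache is synonym-free:

```python
import math

def vpn_index_bits(capacity, associativity, block_size, page_size):
    """Number of cache index bits drawn from the VPN for a virtually
    indexed cache. 0 means the index fits in the page offset, which is
    exactly the capacity <= page_size * associativity condition."""
    sets = capacity // (associativity * block_size)
    index_plus_bo = int(math.log2(sets)) + int(math.log2(block_size))
    page_offset = int(math.log2(page_size))
    return max(0, index_plus_bo - page_offset)
```

For example, a 64KB 4-way cache with 64B blocks and 4K pages has 2 VPN index bits (hence the 4 sets to search in the slide), while capping capacity at 4 KB x 4 ways brings it back to 0.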
R10000's Virtually Indexed Caches

32KB 2-way virtually-indexed L1
- needs 10 bits of index and 4 bits of block offset
- page offset is only 12 bits ⇒ 2 bits of the index are VPN[1:0]

Direct-mapped physical L2; L2 is inclusive of L1
- VPN[1:0] is appended to the "tag" of L2

Given two virtual addresses VA and VB that differ in a and both map to the same physical address PA: suppose VA is accessed first, so blocks are allocated in L1 & L2. What happens when VB is referenced?
1. VB indexes to a different block in L1 and misses
2. VB translates to PA and goes to the same block as VA in L2
3. Tag comparison fails (VA[1:0] ≠ VB[1:0])
4. L2 detects that a synonym is cached in L1 ⇒ VA's entry in L1 is ejected before VB is allowed to be refilled in L1
MIPS R10K

64-bit virtual address
- top 2 bits select kernel/supervisor/user mode
- additional bits set cache and translation behavior
- bits 61-40 not translated at all (holes in the VA??)

8-bit ASID (address space ID) distinguishes between processes

40-bit physical address

Translation: "64"-bit VA + 8-bit ASID → 40-bit PA

[Figure: simplified example from R2000 in 32-bit VA. Top to bottom: 0.5 GB mapped; 0.5 GB mapped (ksseg); 0.5 GB unmapped, uncached; 0.5 GB unmapped, cached; bottom 2 GB mapped (normal).]
MIPS TLB

64-entry fully associative unified TLB
- paired: each entry maps 2 consecutive VPNs to 2 different PPNs
- software managed
  - 7-instruction page table walk in the best case
  - TLB Write Random: chooses a random entry for TLB replacement
  - OS can exclude some number of TLB entries (a low range) from the random selection, to hold translations that cannot miss or should not miss

R2000 TLB entry: VPN(20) ASID(6) 0(6) | PPN(20) n d v g 0(8)
- N: noncacheable
- D: dirty (actually a write-enable bit)
- V: valid
- G: global entry, i.e., ignore ASID matching
MIPS Bottom-Up Hierarchical Table

Page table organization is not part of the ISA. The reference design is optimized for software TLB miss handling.

[Figure: on a TLB miss, HW automatically generates the VA of the missing PTE: PTEBase concatenated with the faulting VPN and zeros. A load through this VA (in which address space?) fetches the PTE (PPN + status) from memory.]

Can this load miss? What happens if it misses?
MIPS: User TLB Miss
Instruction Sequence

    mfc0 k0,tlbcxt    # move the contents of TLB
                      # context register into k0
    mfc0 k1,epc       # move PC of faulting load
                      # instruction into k1
    lw   k0,0(k0)     # load thru address that was
                      # in TLB context register
    mtc0 k0,entry_lo  # move the loaded value
                      # into the EntryLo register
    tlbwr             # write entry into the TLB
                      # at a random slot number
    j    k1           # jump to PC of faulting
                      # load instruction to retry
    rfe               # RESTORE FROM EXCEPTION
SPARC V9

64-bit virtual address
- an implementation can choose not to map the high-order bits (must be sign-extended from the highest mapped bit)
- e.g., UltraSPARC 1 maps only the lower 44 bits

Physical address space size set by the implementation

64-entry fully associative I-TLB and D-TLB

TLB entry:
- Tag (64 bits): context(13), g, VA<63:13>(51)
- Data (64 bits): v, size, nfo, IE, Soft, diag, PA<40:13>, Soft, L, CP, CV, e, p, w, g
TLB Miss Handling

SPARC V8 (32-bit) defines a 3-level hierarchical page table for HW MMU page-table walk.

[Figure: the context table (indexed by context) points to the L1 table (256 descriptors, 1024 bytes, indexed by VA[31:24]), which points to the L2 table (64 descriptors, 256 bytes, indexed by VA[23:18]), which points to the L3 table (64 PTEs, 256 bytes, indexed by VA[17:12]).]

SPARC V9 (64-bit) defines a Translation Storage Buffer (TSB)
- a software-managed, direct-mapped cache of PTEs (think inverted/hashed page table)
- HW-assisted address generation on a TLB miss, e.g., for 8-k pages:
  {TSBbase[63:21], Logic(TSBbase[20:13], VA[32:22], size, split?), VA[21:13], 0000}
- the TLB miss handler searches the TSB; if the TSB misses, a slower TSB handler takes over
IBM PowerPC (32-bit)

[Figure: the 32-bit EA is split into seg#(4), seg offset(16), and page offset(12). For protection, the seg# indexes a 16-entry segment table (segments are 256MB regions) to produce a 24-bit seg ID, forming a 52-bit VA = seg ID(24), seg offset(16), page offset(12). Demand paging then translates the VA through 128-entry 2-way ITLB and DTLB into PPN(20) + page offset(12), the physical address.]

64-bit PowerPC: 64-bit EA → 80-bit VA → 64-bit PA. How many segments in the EA?
IBM PowerPC Hashed Page Table

[Figure: the VPN(40) goes through a hash function; table base + hash selects a PTE group (8 PTEs per group) in the hashed page table.]

- **must hold at least N PTEs for a system with 2N physical pages
- HW table walk
  - VPN hashes into a PTE group of 8
  - 8 PTEs searched sequentially for a tag match
  - if not found in the first PTE group, search a second PTE group
  - if not found in the 2nd PTE group, trap to a software handler
- Hashed table structure also used for EA to VA mapping in 64-bit implementations
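A two-probe lookup in this spirit can be sketched as follows; the table geometry and the secondary hash below are stand-ins for illustration, not the actual PowerPC hash functions:

```python
NGROUPS = 16          # toy table: 16 groups of up to 8 PTEs each
GROUP_SIZE = 8

def primary_hash(vpn):
    return vpn % NGROUPS

def secondary_hash(vpn):
    return (~vpn) % NGROUPS    # stand-in for the complemented second hash

def hpt_lookup(vpn, hpt):
    """hpt is a list of NGROUPS groups, each a list of (vpn, ppn) PTEs.
    Search the primary group sequentially, then the secondary group;
    otherwise trap to the software handler."""
    for h in (primary_hash(vpn), secondary_hash(vpn)):
        for tag, ppn in hpt[h][:GROUP_SIZE]:
            if tag == vpn:
                return ppn
    raise LookupError("trap to software handler")
```

At most 16 PTEs are examined per translation, regardless of how sparse the 40-bit VPN space is.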
HP PA-RISC: PID and AID

2-level translation: 64-bit EA → 96-bit VA (global) → 64-bit PA
- variable sized segmented EA to VA

A different twist on protection
- everyone else: limit what can be named by a process
  - in PowerPC, the OS controls what VA can be reached by a process by controlling what's in the segment registers
- PA-RISC: rights-based access control
  - user controls the segment registers, i.e., a user can generate any VA it wants
  - each virtual page has an access ID (AID, not related to ownership by processes) assigned by the OS
  - each process has 8 active protection IDs (PIDs) in special HW registers controlled by the OS
  - a process can only access a page if it has the key (PID) that fits the lock (AID)