© Wenisch 2007 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar
EECS 470
Lecture 17
Virtual Memory
Fall 2007
Prof. Thomas Wenisch
http://www.eecs.umich.edu/courses/eecs470
Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, and Vijaykumar of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.
EECS 470 Lecture 17 Slide 1
Announcements
HW #5 (due 11/16): will be posted by Wednesday
Milestone 2 (due 11/14)
Readings
For today: H&P Appendix C.4-C.6; Jacob & Mudge, "Virtual Memory in Contemporary Processors"
For Monday: H&P 5.3
Improving Cache Performance: Summary

Miss rate
- large block size
- higher associativity
- victim caches
- skewed-/pseudo-associativity
- hardware/software prefetching
- compiler optimizations

Hit time (difficult?)
- small and simple caches
- avoiding translation during L1 indexing (later)
- pipelining writes for fast write hits
- subblock placement for fast write hits in write-through

Miss penalty
- give priority to read misses over writes/writebacks
- subblock placement
- early restart and critical word first
- non-blocking caches
- multi-level caches
2 Parts to Modern VM

VM provides each process with the illusion of a large, private, uniform memory

Part A: Protection
- each process sees a large, contiguous memory segment without holes
- each process's memory space is private, i.e., protected from access by other processes

Part B: Demand Paging
- capacity of secondary memory (swap space on disk) at the speed of primary memory (DRAM)

Based on a common HW mechanism: address translation
- user process operates on "virtual" or "effective" addresses
- HW translates from virtual to physical on each reference
- controls which physical locations can be named by a process
- allows dynamic relocation of physical backing store (DRAM vs. HD)
- VM HW and memory management policies controlled by the OS
Evolution of Protection Mechanisms

Earliest machines had no concept of protection and address translation
- no need: single process, single user
- automatically "private and uniform" (but not very large)
- programs operated on physical addresses directly
- no multitasking protection, no dynamic relocation (at least not very easily)
Base and Bound Registers

In a multi-tasking system:
- Each process is given a non-overlapping, contiguous physical memory region; everything belonging to a process must fit in that region
- When a process is swapped in, the OS sets base to the start of the process's memory region and bound to the end of the region
- HW translation and protection check (on each memory reference):
  PA = EA + base, provided (PA < bound), else violation
⇒ Each process sees a private and uniform address space (0 .. max)

[Figure: physical memory with the active process's region delimited by the privileged Base and Bound control registers; another process's region shown separately. Bound can also be formulated as a range.]
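The translation and check above can be sketched in a few lines of Python; the base/bound values below are hypothetical, chosen only for illustration.

```python
def base_and_bound_translate(ea, base, bound):
    """Translate an effective address under base-and-bound protection.

    PA = EA + base, valid only if PA < bound; otherwise the access is a
    protection violation (modeled here as an exception).
    """
    pa = ea + base
    if pa >= bound:
        raise MemoryError("protection violation: PA 0x%x >= bound 0x%x" % (pa, bound))
    return pa

# Hypothetical process region: physical addresses [0x40000, 0x60000)
print(hex(base_and_bound_translate(0x1234, base=0x40000, bound=0x60000)))  # 0x41234
```

Note that the process itself never sees base or bound; both are privileged state that only the OS may change.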
Segmented Address Space
- segment == a base and bound pair
- segmented addressing gives each process multiple segments
  - initially, separate code and data segments: 2 sets of base-and-bound registers for instruction and data fetch; allowed sharing of code segments
  - became more and more elaborate: code, data, stack, etc.
  - also (ab)used as a way for an ISA with a small EA space to address a larger physical memory space

[Figure: the EA is split into a SEG # and an offset; the SEG # indexes a segment table holding base & bound for each segment; PA = base + offset, checked against bound (okay?). Segment tables must be 1. privileged data structures and 2. private/unique to each process.]
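A minimal sketch of segmented translation, assuming a hypothetical per-process segment table (the segment numbers and ranges are made up for the example):

```python
# Hypothetical per-process segment table: seg# -> (base, bound).
SEGMENTS = {
    0: (0x10000, 0x14000),  # code segment
    1: (0x40000, 0x48000),  # data segment
}

def segmented_translate(seg, offset):
    """PA = segment base + offset, checked against that segment's bound."""
    base, bound = SEGMENTS[seg]
    pa = base + offset
    if pa >= bound:
        raise MemoryError("segment bound violation")
    return pa
```

Sharing a code segment between two processes is then just two segment tables whose entry 0 holds the same (base, bound) pair.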
Paged Address Space
Segmented addressing creates fragmentation problems: a system may have plenty of unallocated memory locations, but they are useless if they do not form a contiguous region of a sufficient size.

In a Paged Memory System:
- PA space is divided into fixed size segments (e.g., 4 Kbyte), more commonly known as "page frames"
- EA is interpreted as page number and page offset

[Figure: the EA's Page No. indexes a page table that supplies the page frame base; PA = page frame base + page offset (okay?). Page tables must be 1. privileged data structures and 2. private/unique to each process.]
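The page-number/page-offset split can be illustrated as follows (4 KB pages; the page-table contents are made up for the example):

```python
PAGE_SHIFT = 12              # 4 KB pages
PAGE_SIZE = 1 << PAGE_SHIFT

# Hypothetical page table: virtual page number -> physical frame number.
PAGE_TABLE = {0x00000: 0x2A, 0x00001: 0x13}

def paged_translate(ea):
    """Split EA into (VPN, offset), map VPN to a frame, rebuild the PA."""
    vpn = ea >> PAGE_SHIFT
    offset = ea & (PAGE_SIZE - 1)
    ppn = PAGE_TABLE[vpn]            # a missing key models a page fault
    return (ppn << PAGE_SHIFT) | offset
```

Because frames are fixed-size and interchangeable, any free frame can back any virtual page, which is exactly what eliminates the contiguity problem of segments.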
Demand Paging

Main memory and disk as automatically managed levels in the memory hierarchy
- analogous to cache vs. main memory
- drastically different size and time scales ⇒ very different design decisions

Early attempts
- von Neumann already described manual memory hierarchies
- Brookner's interpretive coding, 1960: a software interpreter that managed paging between a 40 Kb main memory and a 640 Kb drum
- Atlas, 1962: hardware demand paging between a 32-page (512 word/page) main memory and 192 pages on drums; the user program believes it has 192 pages
Demand Paging vs. Caching: 2003

                  L1 Cache       Demand Paging
  capacity        32KB~MB ??     100MB~10GB ??
  block size      16~128 Byte    4K to 64K Byte
  hit time        1~3 cyc        50~150 cyc
  miss penalty    5~150 cyc      1M to 10M cyc
  miss rate       0.1~10%        0.00001~0.001%
  hit handling    hw             hw
  miss handling   hw             sw
Page-Based Virtual Memory

[Figure: a 64-bit virtual address is split into a virtual page number (52-bit) and page offset (12-bit); the VPN is decoded through translation memory (the page table, ~8 bytes per entry) into a physical page number, forming a 40-bit physical address into main memory pages (1~10 GBytes).]

Where to hold this translation memory, and how much translation memory do we need? (10~100 GBytes)
Hierarchical Page Table
[Figure: the effective address is split into p1 (10-bit), p2 (10-bit), and page offset (12-bit). A privileged register holds the base of the "page table of the page table"; p1 indexes it to select a page of the page table, and p2 indexes that page to reach the data page. Pages of the page table, like data pages, may be in main memory, in the swap disk, or not exist.]

Storage overhead of translation should be proportional to the size of physical memory and not the virtual address space
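The two-level walk above can be sketched like this; the field widths come from the slide, nested dicts stand in for page-table pages, and a missing key models a page fault or an unallocated table page (which is where the proportional-storage property comes from):

```python
# Field widths from the slide: p1 (10 bits), p2 (10 bits), offset (12 bits).
P2_SHIFT, P1_SHIFT = 12, 22

def two_level_walk(ea, root):
    """Walk a two-level page table.

    `root` maps p1 -> second-level table; each second-level table maps
    p2 -> physical frame number. Untouched regions of the 32-bit space
    simply have no entries, so table storage tracks what is actually used.
    """
    p1 = (ea >> P1_SHIFT) & 0x3FF
    p2 = (ea >> P2_SHIFT) & 0x3FF
    offset = ea & 0xFFF
    second_level = root[p1]   # may itself be paged out / absent
    frame = second_level[p2]
    return (frame << 12) | offset

# Only the touched corner of the address space needs table storage:
root = {3: {5: 0x77}}
```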
Inverted or Hashed Page Tables

[Figure: the VPN and PID are hashed to form a table offset; added to the base of the inverted page table, this gives the PA of an IPTE. Each inverted page table entry holds a VPN, PID, and PTE mapping into physical memory.]

- Size of the inverted page table only needs to be proportional to the size of the physical memory
- Each VPN can only be mapped to a small set of entries according to a hash function
- To translate a VPN, check all allowed table entries for a matching VPN and PID

How many memory lookups per translation?
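A toy sketch of the idea, with Python's built-in hash standing in for the hardware hash function, one entry per slot, and no overflow chaining (a real design would probe a small group of entries):

```python
TABLE_SIZE = 8  # proportional to physical memory (# of frames), not VA space

# Each slot holds (vpn, pid, ppn) or None.
ipt = [None] * TABLE_SIZE

def slot(vpn, pid):
    """Stand-in hash function mapping (VPN, PID) to one allowed slot."""
    return hash((vpn, pid)) % TABLE_SIZE

def ipt_insert(vpn, pid, ppn):
    ipt[slot(vpn, pid)] = (vpn, pid, ppn)

def ipt_lookup(vpn, pid):
    """Check the allowed entry for a matching VPN and PID."""
    entry = ipt[slot(vpn, pid)]
    if entry and entry[0] == vpn and entry[1] == pid:
        return entry[2]
    raise KeyError("not in IPT -> fall back to a slower handler")
```

The table is indexed by a hash of the virtual address rather than by the virtual address itself, which is why its size can track physical memory.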
Translation Look-aside Buffer (TLB)

Essentially a cache of recent address translations
- avoids going to the page table on every reference
- indexed by lower bits of the VPN (virtual page #)
- tag = unused bits of VPN + process ID
- data = a page-table entry, i.e., PPN (physical page #) and access permission
- status = valid, dirty

The usual cache design choices (placement, replacement policy, multi-level, etc.) apply here too.

[Figure: the virtual address's VPN is split into tag and index; on a tag match, the TLB supplies the physical page no., which is concatenated with the page offset to form the physical address.]

What should be the relative sizes of the ITLB and I-cache?
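A direct-mapped TLB along these lines might look like the sketch below; the entry count and field packing are arbitrary choices for illustration, and the ASID plays the role of the process ID in the tag:

```python
class TLB:
    """Direct-mapped TLB sketch: index = low VPN bits,
    tag = remaining VPN bits + address-space ID (ASID)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.array = [None] * entries  # each slot: (tag, ppn) or None

    def _split(self, vpn, asid):
        index = vpn % self.entries
        tag = (vpn // self.entries, asid)
        return index, tag

    def fill(self, vpn, asid, ppn):
        self.array[self._split(vpn, asid)[0]] = (self._split(vpn, asid)[1], ppn)

    def lookup(self, vpn, asid):
        index, tag = self._split(vpn, asid)
        entry = self.array[index]
        if entry and entry[0] == tag:
            return entry[1]        # hit: return the PPN
        return None                # miss: must walk the page table
```

Tagging with the ASID is what lets the TLB survive a context switch without being flushed.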
Virtual to Physical Address Translation

[Flow: Effective Address → TLB Lookup (≤ 1 pclk). On a hit → Protection Check (≤ 1 pclk): if permitted → Physical Address to cache; if denied → Protection Fault (OS, 10000's pclk). On a miss → Page Table Walk by HW or SW (100's pclk): if it succeeds → Update TLB and retry; if it fails → Page Fault (OS table walk, 10000's pclk).]
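The flow can be condensed into pseudocode-like Python, with the rough cycle costs from the slide as comments and both fault cases modeled as exceptions (the dict-based TLB and page table are stand-ins):

```python
def translate(ea, tlb, page_table, perms_ok):
    """TLB lookup, page-table walk on a miss, then protection check."""
    vpn, offset = ea >> 12, ea & 0xFFF
    pte = tlb.get(vpn)                            # TLB lookup: <= 1 pclk
    if pte is None:
        if vpn not in page_table:                 # walk fails
            raise LookupError("page fault -> OS") # 10000's pclk
        pte = page_table[vpn]                     # HW or SW walk: 100's pclk
        tlb[vpn] = pte                            # update TLB, then retry
    if not perms_ok(pte):                         # protection check: <= 1 pclk
        raise PermissionError("protection fault -> OS")
    return (pte["ppn"] << 12) | offset            # physical address to cache
```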
Cache Placement and Address Translation

Physical Cache (most systems): CPU → (VA) → MMU → (PA) → Physical Cache → Physical Memory
- longer hit time: translation sits on the fetch critical path

Virtual Cache (SPARC2's): CPU → (VA) → Virtual Cache → MMU → (PA) → Physical Memory
- translation off the fetch critical path
- aliasing problem
- cold start after context switch

Virtual caches are not popular anymore because the MMU and CPU can be integrated on one chip
Physically Indexed Cache

[Figure: the virtual address (n = v+g bits) is split into the virtual page no. (VPN, v bits) and page offset (PO, g bits); the TLB translates the VPN to the phy. page no. (PPN, p bits), producing the physical address (m = p+g bits), which is then split into tag, index, and block offset (BO) to access the D-cache and read out the data.]
Virtually Indexed Cache

Parallel access to TLB and cache arrays

[Figure: the virtual address is split into virtual page no. (VPN) and page offset; the index and block-offset bits, drawn entirely from the page offset, access the D-cache in parallel with the TLB translating the VPN to the PPN; the PPN is then compared against the cache tag to produce Hit/Miss and the data.]

How large can a virtually indexed cache get?
Large Virtually Indexed Cache

[Figure: same parallel TLB/D-cache access, but the index now extends a bits above the page offset into the VPN; the PPN from the TLB is compared against the cache tag for Hit/Miss.]

If two VPNs differ in a, but both map to the same PPN, then there is an aliasing problem
Virtual Address Synonyms

Two virtual pages that map to the same physical page
- within the same virtual address space
- across address spaces

[Figure: VA1 and VA2 both map to the same PA.]

Using VA bits as index, PA data may reside in different sets in the cache!!
Synonym (or Aliasing)

When VPN bits are used in indexing, two virtual addresses that map to the same physical address can end up sitting in two cache lines. In other words, two copies of the same physical memory location may exist in the cache ⇒ a modification to one copy won't be visible in the other.

[Figure: the index uses a bits of the VPN; the PPN from the TLB is compared against the cache tag for Hit/Miss.]

If the two VPNs do not differ in a, then there is no aliasing problem
Synonym Solutions

- Limit cache size to page size times associativity
  - get the index entirely from the page offset
- Search all sets in parallel
  - 64K 4-way cache, 4K pages: search 4 sets (16 entries). Slow!
- Restrict page placement in the OS
  - make sure index(VA) = index(PA)
- Eliminate by OS convention
  - single virtual space
  - restrictive sharing model
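The first solution's limit follows from counting index bits. A small helper (the function name is mine) computes how many index bits spill into the VPN, i.e., the "a" of the preceding slides; a result of 0 means the cache is synonym-free:

```python
import math

def vpn_index_bits(capacity, associativity, block_size, page_size):
    """Number of cache index bits drawn from the VPN for a virtually
    indexed cache. 0 means the index fits in the page offset, which is
    exactly the capacity <= page_size * associativity condition."""
    sets = capacity // (associativity * block_size)
    index_plus_bo = int(math.log2(sets)) + int(math.log2(block_size))
    page_offset = int(math.log2(page_size))
    return max(0, index_plus_bo - page_offset)
```

For example, a 64KB 4-way cache with 64B blocks and 4K pages has 2 VPN index bits (hence the 4 sets to search in the slide), while capping capacity at 4 KB x 4 ways brings it back to 0.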
R10000's Virtually Indexed Caches

32KB 2-way virtually-indexed L1
- needs 10 bits of index and 4 bits of block offset
- page offset is only 12 bits ⇒ 2 bits of the index are VPN[1:0]

Direct-mapped physical L2; L2 is inclusive of L1
- VPN[1:0] is appended to the "tag" of L2

Given two virtual addresses VA and VB that differ in a and both map to the same physical address PA: suppose VA is accessed first, so blocks are allocated in L1 & L2. What happens when VB is referenced?
1. VB indexes to a different block in L1 and misses
2. VB translates to PA and goes to the same block as VA in L2
3. Tag comparison fails (VA[1:0] ≠ VB[1:0])
4. L2 detects that a synonym is cached in L1 ⇒ VA's entry in L1 is ejected before VB is allowed to be refilled in L1
MIPS R10K

64-bit virtual address
- top 2 bits select kernel/supervisor/user mode
- additional bits set cache and translation behavior
- bits 61-40 not translated at all (holes in the VA??)

8-bit ASID (address space ID) distinguishes between processes

40-bit physical address

Translation: "64"-bit VA + 8-bit ASID → 40-bit PA

[Figure: simplified example from R2000 in 32-bit VA. Top to bottom: 0.5 GB mapped; 0.5 GB mapped (ksseg); 0.5 GB unmapped, uncached; 0.5 GB unmapped, cached; bottom 2 GB mapped (normal).]
MIPS TLB

64-entry fully associative unified TLB
- paired: each entry maps 2 consecutive VPNs to 2 different PPNs
- software managed
  - 7-instruction page table walk in the best case
  - TLB Write Random: chooses a random entry for TLB replacement
  - OS can exclude some number of TLB entries (a low range) from the random selection, to hold translations that cannot miss or should not miss

R2000 TLB entry: VPN(20) ASID(6) 0(6) | PPN(20) n d v g 0(8)
- N: noncacheable
- D: dirty (actually a write-enable bit)
- V: valid
- G: global entry, i.e., ignore ASID matching
MIPS Bottom-Up Hierarchical Table

Page table organization is not part of the ISA. The reference design is optimized for software TLB miss handling.

[Figure: on a TLB miss, HW automatically generates the VA of the missing PTE: PTEBase concatenated with the faulting VPN and zeros. A load through this VA (in which address space?) fetches the PTE (PPN + status) from memory.]

Can this load miss? What happens if it misses?
MIPS: User TLB Miss
Instruction Sequence

    mfc0 k0,tlbcxt    # move the contents of TLB
                      # context register into k0
    mfc0 k1,epc       # move PC of faulting load
                      # instruction into k1
    lw   k0,0(k0)     # load thru address that was
                      # in TLB context register
    mtc0 k0,entry_lo  # move the loaded value
                      # into the EntryLo register
    tlbwr             # write entry into the TLB
                      # at a random slot number
    j    k1           # jump to PC of faulting
                      # load instruction to retry
    rfe               # RESTORE FROM EXCEPTION
SPARC V9

64-bit virtual address
- an implementation can choose not to map the high-order bits (must be sign-extended from the highest mapped bit)
- e.g., UltraSPARC 1 maps only the lower 44 bits

Physical address space size set by the implementation

64-entry fully associative I-TLB and D-TLB

TLB entry:
- Tag (64 bits): context(13), g, VA<63:13>(51)
- Data (64 bits): v, size, nfo, IE, Soft, diag, PA<40:13>, Soft, L, CP, CV, e, p, w, g
TLB Miss Handling

SPARC V8 (32-bit) defines a 3-level hierarchical page table for HW MMU page-table walk.

[Figure: the context table (indexed by context) points to the L1 table (256 descriptors, 1024 bytes, indexed by VA[31:24]), which points to the L2 table (64 descriptors, 256 bytes, indexed by VA[23:18]), which points to the L3 table (64 PTEs, 256 bytes, indexed by VA[17:12]).]

SPARC V9 (64-bit) defines a Translation Storage Buffer (TSB)
- a software-managed, direct-mapped cache of PTEs (think inverted/hashed page table)
- HW-assisted address generation on a TLB miss, e.g., for 8-k pages:
  {TSBbase[63:21], Logic(TSBbase[20:13], VA[32:22], size, split?), VA[21:13], 0000}
- the TLB miss handler searches the TSB; if the TSB misses, a slower TSB handler takes over
IBM PowerPC (32-bit)

[Figure: the 32-bit EA is split into seg#(4), seg offset(16), and page offset(12). For protection, the seg# indexes a 16-entry segment table (segments are 256MB regions) to produce a 24-bit seg ID, forming a 52-bit VA = seg ID(24), seg offset(16), page offset(12). Demand paging then translates the VA through 128-entry 2-way ITLB and DTLB into PPN(20) + page offset(12), the physical address.]

64-bit PowerPC: 64-bit EA → 80-bit VA → 64-bit PA. How many segments in the EA?
IBM PowerPC Hashed Page Table

[Figure: the VPN(40) goes through a hash function; table base + hash selects a PTE group (8 PTEs per group) in the hashed page table.]

- **must hold at least N PTEs for a system with 2N physical pages
- HW table walk
  - VPN hashes into a PTE group of 8
  - 8 PTEs searched sequentially for a tag match
  - if not found in the first PTE group, search a second PTE group
  - if not found in the 2nd PTE group, trap to a software handler
- Hashed table structure also used for EA to VA mapping in 64-bit implementations
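A two-probe lookup in this spirit can be sketched as follows; the table geometry and the secondary hash below are stand-ins for illustration, not the actual PowerPC hash functions:

```python
NGROUPS = 16          # toy table: 16 groups of up to 8 PTEs each
GROUP_SIZE = 8

def primary_hash(vpn):
    return vpn % NGROUPS

def secondary_hash(vpn):
    return (~vpn) % NGROUPS    # stand-in for the complemented second hash

def hpt_lookup(vpn, hpt):
    """hpt is a list of NGROUPS groups, each a list of (vpn, ppn) PTEs.
    Search the primary group sequentially, then the secondary group;
    otherwise trap to the software handler."""
    for h in (primary_hash(vpn), secondary_hash(vpn)):
        for tag, ppn in hpt[h][:GROUP_SIZE]:
            if tag == vpn:
                return ppn
    raise LookupError("trap to software handler")
```

At most 16 PTEs are examined per translation, regardless of how sparse the 40-bit VPN space is.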
HP PA-RISC: PID and AID

2-level translation: 64-bit EA → 96-bit VA (global) → 64-bit PA
- variable sized segmented EA to VA

A different twist on protection
- everyone else: limit what can be named by a process
  - in PowerPC, the OS controls what VA can be reached by a process by controlling what's in the segment registers
- PA-RISC: rights-based access control
  - user controls the segment registers, i.e., a user can generate any VA it wants
  - each virtual page has an access ID (AID, not related to ownership by processes) assigned by the OS
  - each process has 8 active protection IDs (PIDs) in special HW registers controlled by the OS
  - a process can only access a page if it has the key (PID) that fits the lock (AID)