
Operating Systems: Paging/1


Paging

Segmentation helps deal with:

– modularity, sharability, protection, relocation

Paging can help with:

– insufficient memory, external fragmentation, efficiency of memory use

•Physical memory divided into fixed size page-frames e.g. 4096 bytes

•Virtual pages mapped into page-frames:

[Figure: the CPU issues a virtual address (page, offset); the page table maps the page number to a frame number, and (frame, offset) addresses physical memory]

•Virtual address :

–total virtual address space of 2^(p+d) bytes : 2^p pages, each of 2^d bytes

»e.g. p = 20, d = 12 : 2^32 bytes virtual space : 2^20 pages of 4096 bytes each

–virtual space usually very much larger than amount of physical memory

»4Gb physical memory soon possible and common?

»in the near future, 64-bit addressing to extend virtual address space: Intel Itanium, AMD Clawhammer, DEC Alpha etc.
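The split of a virtual address into page number and offset can be sketched in Python as follows. This is a minimal illustration assuming the p = 20, d = 12 example; the function names and the small dictionary used as a page table are hypothetical, not part of any real MMU interface.

```python
P_BITS = 20                 # bits of page number (p)
D_BITS = 12                 # bits of page offset (d)
PAGE_SIZE = 1 << D_BITS     # 4096 bytes


def split_virtual_address(vaddr):
    """Return (page number, offset) for a virtual address of p + d bits."""
    page = vaddr >> D_BITS
    offset = vaddr & (PAGE_SIZE - 1)
    return page, offset


def translate(vaddr, page_table):
    """Map a virtual address to a physical one via a page -> frame table."""
    page, offset = split_virtual_address(vaddr)
    frame = page_table[page]      # a missing key stands in for a fault here
    return (frame << D_BITS) | offset


# Hypothetical page table: virtual pages 0..3 held in frames 1, 4, 3, 6.
page_table = {0: 1, 1: 4, 2: 3, 3: 6}
physical = translate(0x1ABC, page_table)   # page 1, offset 0xABC
```

The offset passes through unchanged; only the page number is replaced by the frame number.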

Virtual address layout : page number (p bits) | page offset (d bits)

[Figure: virtual pages 0 to 3 mapped through a page table (entries 1, 4, 3, 6) into physical page-frames 0 to 7 : page 0 in frame 1, page 1 in frame 4, page 2 in frame 3, page 3 in frame 6]

•Page table the same length as the number of virtual pages

–can get extremely long

»e.g. 2^20 pages, each page table entry typically 4 bytes = 4 MB

»2^52 pages in a 64-bit virtual memory = 2^54 bytes! bigger pages a possible solution?

e.g. 1 MB pages still give a page table length of 2^44 entries

–usually have a page table limit hardware register :

»hardware checks page number within the limit for each translation

»limits the length of the page table

»but reduces the size of virtual space – undesirable!
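The page table sizes quoted above can be checked with a few lines of Python. This is only an arithmetic sketch; `page_table_bytes` is a hypothetical helper and the 4-byte entry size is the slide's assumption.

```python
def page_table_bytes(virtual_bits, page_bits, entry_bytes):
    """Size in bytes of a single-level page table."""
    entries = 1 << (virtual_bits - page_bits)   # one entry per virtual page
    return entries * entry_bytes


size_32 = page_table_bytes(32, 12, 4)    # 2^20 entries of 4 bytes
size_64 = page_table_bytes(64, 12, 4)    # 2^52 entries of 4 bytes
entries_64_1mb = 1 << (64 - 20)          # 1 MB pages: 2^44 entries
```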

•Each process has its own virtual space

–so each needs its own page table

–may be hundreds of processes around even on a single user workstation

»e.g. scheduling, device driver, comms system processes etc.

»though many of these will be small and not need a large page table

•Need a better system – multi-level page tables described later



•Not all virtual pages can be mapped into physical memory at once

–a Presence (or Valid) bit is part of each page table entry

»P=1 : page present in memory

»P=0 : page not present in memory – page-frame number invalid

page may be temporarily stored on hard disc

page may not even exist – no problem with gaps in virtual space!

–if the page table entry corresponding to a presented virtual address has P=0

»a page-fault interrupt generated by address translation hardware

[Figure: the same mapping with a presence bit P in each page table entry : pages 0, 2 and 3 have P=1 and occupy frames 1, 3 and 6; page 1 has P=0 and is not in physical memory]

•Demand Paging :

–invoked when a page-fault interrupt occurs

–a policy in which :

»the operating system automatically retrieves the page from hard disc

»allows the process to continue with just a short hiatus

»transparent to the user

–steps:

»check that the missing page actually exists in virtual space (not in a gap)

»find a spare page-frame in memory

may involve removing an existing page according to some page replacement policy

»initiate transfer of missing page from hard disc into spare page-frame

»wait until transfer complete

schedule another process in the meantime

»update the page table entry with page-frame number

»set P = 1 in the page table entry

»put process back on the Run Queue

»when process dispatched onto the CPU, instruction causing page-fault is retried

this time no page-fault interrupt will occur
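The demand paging steps above can be sketched as a small Python simulation. It is a toy model under stated assumptions: the class and function names are hypothetical, the disc transfer and rescheduling are only marked by a comment, and FIFO is used as the replacement policy purely for illustration (the slides do not prescribe one here).

```python
class PageTableEntry:
    def __init__(self):
        self.present = False   # the P bit
        self.frame = None      # page-frame number, valid only if present


def handle_page_fault(page, page_table, free_frames, resident, valid_pages):
    """Bring `page` into memory and return the frame it now occupies."""
    # Check the page actually exists in the virtual space (not in a gap).
    if page not in valid_pages:
        raise MemoryError(f"page {page} is not part of the address space")
    # Find a spare frame, evicting the oldest resident page if none is free.
    if free_frames:
        frame = free_frames.pop()
    else:
        victim = resident.pop(0)          # FIFO replacement (assumed policy)
        frame = page_table[victim].frame
        page_table[victim].present = False
    # The disc transfer would happen here while another process runs.
    # Update the page table entry and set P = 1.
    page_table[page].frame = frame
    page_table[page].present = True
    resident.append(page)
    return frame


def access(page, page_table, free_frames, resident, valid_pages):
    """Return the frame for `page`, faulting it in first if P = 0."""
    entry = page_table[page]
    if not entry.present:
        handle_page_fault(page, page_table, free_frames, resident, valid_pages)
    return entry.frame


valid_pages = {0, 1, 2}
page_table = {p: PageTableEntry() for p in valid_pages}
free_frames = [5, 7]      # two spare page-frames
resident = []             # pages in memory, oldest first
frames = [access(p, page_table, free_frames, resident, valid_pages)
          for p in (0, 1, 2)]
```

With only two frames, the third access evicts page 0 and reuses its frame, so page 0 would fault again if touched later.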



•Multiple processes in physical memory :

[Figure: processes A and B each have their own virtual memory (pages A0 to A3, B0 to B3) and their own page table with presence bits; resident pages A0, A2, A3, B0, B1 and B3 are interleaved in one physical memory of 12 page-frames (0 to 11), while A1 and B2 are not present]

•Pages shared in memory

–e.g. a text editor used by two processes – each with their own data

[Figure: processes A and B each map editor0, editor1 and editor2 to the same physical page-frames (1, 6 and 10), so only one copy of the editor is in memory; data A (frame 3) and data B (frame 8) are private to each process]

•Benefits of Paging :

–avoids external fragmentation – no unusable holes in memory

–some internal fragmentation

»the last part of the last page in a sequence may be wasted

»more waste the larger the page size

–small page size :

»less internal fragmentation – more efficient memory utilisation

»larger page tables and other kernel tables e.g. free page list

»higher kernel overheads – dealing with more individual pages

–larger page size :

»longer disc transfer times

»hidden internal fragmentation - less of each page may actually be used

»smaller Translation Lookaside Buffer needed

–small pages often grouped together by OSes to make larger effective pages

»saves some overheads

–e.g. Sun SPARC: 4 KB pages, Intel x86: 4 KB pages (with 4 MB option)
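The effect of page size on internal fragmentation can be illustrated with a short calculation. This is a sketch only; `internal_fragmentation` is a hypothetical helper and the 10,000-byte region is an arbitrary example, not a figure from the slides.

```python
def internal_fragmentation(region_bytes, page_size):
    """Bytes wasted in the last page when a region is rounded up to whole pages."""
    pages = -(-region_bytes // page_size)      # ceiling division
    return pages * page_size - region_bytes


waste_4k = internal_fragmentation(10_000, 4 * 1024)          # needs 3 pages
waste_4m = internal_fragmentation(10_000, 4 * 1024 * 1024)   # needs 1 page
```

The same region wastes a couple of kilobytes with 4 KB pages but almost the whole page with 4 MB pages.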



•Benefits of Paging :

–gets around lack of physical memory

–a large virtual address space mapped into whatever physical memory is available

–up to the OS to achieve acceptable performance with demand paging

»the more page-frames allocated to a process the fewer page-faults and consequent demand page-in delays

»page-frames must be shared equitably between the processes demanding memory space – memory management

•Page sharing :

–usually more convenient to share at the module level

»make the whole module sharable, rather than individual pages

•Protection :

–better organised at module level

»give the whole module the same protection rather than for individual pages



•Allocation of Page-frames :

–a free list of unused page-frames

–any page-frame is as good as any other when allocating

»though may wish to avoid using a page-frame if there is a chance its contents may be needed again in near future

»recapture using memory as a large cache of pages (Windows 2000)

–page-frames put back on the free list as they are released from use

»except perhaps :

pages known to be finished with go on the front of the free list

e.g. stack pages from a terminated process or thread

pages which might possibly be used again go on the end of the free list

maximises the chance of them still being un-reused when needed again
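The front/end placement on the free list can be sketched with a double-ended queue. This is an illustrative model, not how any particular kernel implements it; `release`, `allocate` and the frame numbers are hypothetical.

```python
from collections import deque

free_list = deque([2, 5])       # page-frames currently unused


def release(frame, may_be_reused):
    """Return a page-frame to the free list."""
    if may_be_reused:
        free_list.append(frame)       # end of list: reallocated last
    else:
        free_list.appendleft(frame)   # front of list: reallocated first


def allocate():
    """Take the page-frame at the front of the free list."""
    return free_list.popleft()


release(9, may_be_reused=True)    # e.g. a page that might be wanted again
release(7, may_be_reused=False)   # e.g. stack page of a terminated process
first = allocate()
```

Frame 9 stays at the end of the list, so its contents survive longest and have the best chance of being recaptured.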

•Consequence of large page-tables :

–need to be stored in main memory (far too long to be held in CPU registers)

–every virtual address access requires two memory accesses?

»one to access page table + one to access required physical memory location



•Translation Lookaside Buffer (TLB)

–a fast associative memory close to the CPU

–stores a set of translations from virtual page number to page-frame number

–each TLB entry compared concurrently

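[Figure: the page number of the CPU's virtual address is compared with all (page num, frame num) pairs in the TLB; on a TLB hit the frame number comes straight from the TLB, otherwise from the page table in memory; (frame, offset) then addresses physical memory]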

–TLB translations much faster than going via a page table in memory

–aim is to achieve as high a hit ratio as possible

•can get an effective access time using a weighted average :

–for:

m = main memory access time

t = TLB access time

h = hit ratio

– effective access time = ( h*t + (1-h)*(t+m) ) + m

–e.g. h = 0.95, m = 100ns, t = 10ns : e.a.t. = 115ns

h = 0.99, m = 25ns, t = 2ns : e.a.t. = 27.25ns

–the larger the TLB, the higher the likely hit ratio

–examples:

»Motorola 68030 : 22 entries

»Intel 486 : 32 entries (claimed 98% hit ratio)

»Intel Pentium 4 : 128 entry instruction, 64 entry data (4-way associative)

»PowerPC 601 : 256 entry (2-way associative)

»AMD Athlon : 512 entry (2-level)
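The effective access time formula above can be checked with a small function. This is a direct transcription of the weighted average for a single-level page table; the function name is hypothetical.

```python
def effective_access_time(h, t, m):
    """Weighted average access time with a TLB and a single-level page table.

    h: TLB hit ratio, t: TLB access time, m: main memory access time.
    """
    translation = h * t + (1 - h) * (t + m)   # a TLB miss adds one table access
    return translation + m                    # then the data access itself


eat_slow = effective_access_time(0.95, 10, 100)
eat_fast = effective_access_time(0.99, 2, 25)
```

Both worked examples from the slide (115ns and 27.25ns) come out of this function, up to floating-point rounding.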



•hit ratio optimised by keeping most recently used translations in the TLB

–pages just accessed likely to be accessed again soon

•each entry has a tag which is updated at every translation

–e.g. Sun UltraSparc : 64 entries with 6 bit tag field

–scheme: if entry with tag value n matched :

tag for this entry set to 0

entries with tag < n : incremented by 1

entries with tag > n : unaffected

–lowest value tags are most recently used translations

–highest tags are least recently used (LRU) and can be discarded first
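The tag update scheme above can be sketched in Python. This is a software model of what the hardware does in parallel; a 4-entry TLB is assumed for brevity and the function names are hypothetical.

```python
def touch(tags, hit_index):
    """Update LRU tags in place after a TLB hit on entry `hit_index`."""
    n = tags[hit_index]
    for i, tag in enumerate(tags):
        if tag < n:
            tags[i] = tag + 1      # more recent than the hit entry: age by one
    tags[hit_index] = 0            # the hit entry becomes most recently used


def lru_victim(tags):
    """Index of the entry with the highest tag (least recently used)."""
    return tags.index(max(tags))


tags = [0, 1, 2, 3]     # entry 0 most recently used, entry 3 least
touch(tags, 2)          # hit on the entry whose tag is 2
after_first = list(tags)
victim = lru_victim(tags)
```

The tags always remain a permutation of 0..N-1, which is why a 6-bit tag suffices for 64 entries.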

[Figure: TLB with 64 entries (0 to 63), each holding a virtual page number, a tag and a page-frame number]

•TLB entries must be invalidated when:

–a page table entry is modified to change a virtual to physical mapping

–the running process is changed

»same virtual address in different processes means different physical address

–privileged instructions usually provided to invalidate one entry or all entries

»used by OS kernel dispatcher and memory manager

–a process ID number could be prepended to the virtual page number to disambiguate the same virtual page numbers in different processes

»Sun SPARC

»effectively creates a larger virtual address space in which all processes live

•Changing the running process can cause significant performance loss

–first translations to be made are not available in the TLB yet

–need to go to page tables in memory

»almost doubles the normal access time

–kernel dispatchers must try to limit frequency of changing processes



•Caches

–high speed memory closer to the processor than main memory

»Level 1 cache small, closest to processor, on same chip as CPU, highest access speed

»Level 2 cache larger, between level 1 cache and memory, usually on same chip also now

–when data from a memory location is needed:

»level 1 cache searched first for that location

»if missing, level 2 cache searched; transfer data to level 1 cache if found

»if missing from level 2 cache, fetch data from main memory to level 2 and level 1 caches

–blocks of contiguous memory cached e.g. 16 byte ‘lines’

–most recently used lines maintained in caches by hardware

»some architectures allow a preload cache line by software

–effective memory access time reduced

»similar calculation as for TLB


–various strategies for writing data to memory

»write-through – data written back to main memory immediately

useful for multi-processor systems

»write-back – data only written back to main memory when block discarded from cache

less memory writing potentially

»strategy can often be selected by software

–bus-snooping may be necessary for multi-processor systems:

»each cache snoops on bus addresses from other CPUs and either:

invalidates its own copy of any lines being written to main memory

or captures the line (or data within a line) and updates its own cache

[Figure: several CPUs, each with its own cache, connected by a shared bus to main memory]

•N-way Set Associative Caching :

–address :

–N lines (ways) per row index

»tag compared associatively with all tags at that row index

»e.g. Pentium 4 Level 1 data cache : 8Kb, 4-way associative, 64 byte lines

32 rows of 4*64 bytes : 5 bits index, 6 bits within-line offset

»Pentium 4 Level 2 unified cache : 256Kb, 8-way associative, 64 byte lines

512 rows of 8*64 bytes

»Ref. IA-32 Intel Architecture Software Developer’s Manual, Vol 3
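The row counts and bit widths quoted above follow from the cache size, associativity and line size. The sketch below recomputes them and splits an address into tag, index and within-line offset; the function names and the example address are hypothetical.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (rows, index bits, within-line offset bits) for a cache."""
    rows = size_bytes // (ways * line_bytes)
    index_bits = rows.bit_length() - 1          # log2, rows is a power of two
    offset_bits = line_bytes.bit_length() - 1   # log2 of the line size
    return rows, index_bits, offset_bits


def split_address(addr, index_bits, offset_bits):
    """Split an address into (tag, index, within-line offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


l1 = cache_geometry(8 * 1024, 4, 64)      # Pentium 4 L1 data cache figures
l2 = cache_geometry(256 * 1024, 8, 64)    # Pentium 4 L2 unified cache figures
fields = split_address(0x12345, l1[1], l1[2])
```

The index selects a row, and the tag is then compared against all N tags stored in that row.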

[Figure: address fields : tag | index | within-line; the index selects one row of four (tag, data block) pairs, and the address tag is compared with all four stored tags]

•most often, physical addresses are used for cache line matching

–i.e. virtual address translated to physical first :

–some overlapping of translation and cache matching possible:

»virtual address offset can be used to start indexing into the level 1 cache whilst TLB translation taking place

because the offset is not altered - just concatenated to the page-frame number

requires index + within-line bits <= page offset bits

OK for Pentium 4 level 1 data cache : 5 + 6 = 11 bits index + within-line <= 12 bits page offset

»tag matching can only take place after translation completed

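[Figure: the CPU's virtual address (page, offset) is translated to (frame, offset) by the TLB on a hit, or via the page table otherwise; the physical address is then looked up in the level 1 cache, the level 2 cache and finally main memory to obtain the data value]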

•Caching using virtual addresses also possible e.g. Sun SPARC

–virtual addresses matched before translation

–translation can be started concurrently with cache match

»aborted if cache match found

–need to have process ID bits prepended to the virtual address before match

»otherwise only one process's data lines could be in the cache at once

»and whole cache would need to be invalidated on every process change

not necessary with physical address caching

–possible snag:

»one process could, in theory, use two different virtual addresses for the same physical address

which line is updated on write?

•Very significant performance penalty when process on CPU changed

–probably none of the data in the caches relates to the new process

–much more significant than loss of TLB context

»most of the high performance of recent processors comes from caching

–kernel scheduler and dispatcher need to avoid process changing too often



•Flag Bits – in each page table entry

–Presence bit : when set to 1, this page is present in memory

»page-frame number is valid

»page-fault interrupt caused when page accessed with this bit 0

»sometimes set to 0 even though page is present in memory

acts as a guard page

causes an artificial page fault interrupt into the kernel when page accessed

–Referenced bit : set whenever the page is accessed, read or write

»cleared to 0 by paging manager at start of a process's residence in memory

and at successive time intervals thereafter

»indicates which pages have been accessed during the previous interval

sometimes called strobing

»used by the kernel’s memory manager when deciding which pages can be removed from memory when space is needed for something else

useful to know what is currently being used

various page replacement schemes possible e.g. LRU



–Modified (or Dirty) bit : set whenever a page is written to

»set value maintained throughout a memory residence through successive time intervals

»pages with this bit set must be saved somewhere e.g. hard disc, to avoid information loss when the space they occupy is needed for something else

»pages with this bit not set can just be discarded assuming a copy of the original exists on hard disc somewhere

–Cache Disable bit : do not cache information in this page

»useful for memory-mapped I/O addresses

data needs to go straight to output device

»needed for semaphores in multi-processor system

semaphore stored in main memory

must not be held up in a local cache

–Cache Write-through : write data straight through cache to memory

»instead of waiting until cache line re-used

»depends on particular architecture – possible on Intel Pentium
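The flag bits described above are typically packed into each page table entry as a bit mask. The sketch below models how they are set and tested; the bit positions and function names are illustrative assumptions and do not correspond to any real architecture.

```python
# Bit positions are illustrative only, not those of any real architecture.
PRESENT = 1 << 0
REFERENCED = 1 << 1
MODIFIED = 1 << 2
CACHE_DISABLE = 1 << 3
WRITE_THROUGH = 1 << 4


def on_access(flags, is_write):
    """Flag updates made when a present page is read or written."""
    flags |= REFERENCED
    if is_write:
        flags |= MODIFIED
    return flags


def clear_referenced(flags):
    """Periodic reset of the Referenced bit; the Modified bit is kept."""
    return flags & ~REFERENCED


def can_discard(flags):
    """A clean page can be dropped; a dirty one must be saved to disc first."""
    return not (flags & MODIFIED)


entry = PRESENT
entry = on_access(entry, is_write=True)
entry = clear_referenced(entry)
```

After the interval reset the entry still records that the page is dirty, which is the asymmetry between the Referenced and Modified bits.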



•Demand Paging Flow :

[Flowchart, reconstructed as steps]

Hardware :

–virtual address presented : page in TLB?

»yes : generate physical address

»no : access page table : Presence bit set?

yes : update TLB, then generate physical address

no : initiate page-fault interrupt (handled by software)

–data in cache?

»yes : use data

»no : access main memory for data, update cache, use data

Software :

–a page-frame free?

»no : select page for replacement : Dirty bit set?

yes : start transfer of page to disc, wait until page transfer completed

no : page-frame can be reused without saving

–start transfer of page from disc, wait until page transfer completed

»run other processes whilst waiting

–update page table and set Presence bit

•Multi-level Page Tables

–very large virtual address spaces becoming common

»e.g. 64-bit addressing on DEC Alpha, Intel Itanium, AMD Clawhammer

–single-level page tables get excessively long

»4 KB pages = 2^52 entry page tables

»1 MB pages = 2^44 entry page tables

»and a page table for every process!

–whole page table cannot be held in physical memory at once

»needs to be sub-divided and somehow paged in and out as required

–Two-level Paging

»level 1 page table entries point to one of many level 2 page tables

»level 2 page table entries contain page-frame numbers

Virtual Address : index into level 1 page table | index into level 2 page table | offset within page

[Figure: a level 1 page table entry points to one of the level 2 page tables, whose entry points to a page-frame in main memory]

–level 2 page tables can be paged in and out

»presence bit in level 1 page table entries indicates whether level 2 page table present or not

»virtual address partition usually organised to fit a single page table into a page

e.g. for a 32-bit machine, partitions of 10 bits, 10 bits and 12 bits:

index partition sizes of 10 bits with 4 bytes per entry give a page table size of 4 KB
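A two-level translation with the 10/10/12 partition can be sketched as follows. Nested dictionaries stand in for the page tables, and a missing key plays the role of a cleared presence bit; all names and table contents are hypothetical.

```python
L1_BITS, L2_BITS, OFFSET_BITS = 10, 10, 12


def split(vaddr):
    """Split a 32-bit virtual address into (level 1 index, level 2 index, offset)."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    l2 = (vaddr >> OFFSET_BITS) & ((1 << L2_BITS) - 1)
    l1 = vaddr >> (OFFSET_BITS + L2_BITS)
    return l1, l2, offset


def translate(vaddr, level1):
    """Walk both levels; a missing entry models a presence bit of 0."""
    l1, l2, offset = split(vaddr)
    level2 = level1.get(l1)
    if level2 is None:
        raise LookupError("level 2 page table not present")
    frame = level2.get(l2)
    if frame is None:
        raise LookupError("page not present")
    return (frame << OFFSET_BITS) | offset


# Hypothetical tables: level 1 entry 1 points to a level 2 table in which
# entry 2 holds page-frame number 7.
level1 = {1: {2: 7}}
vaddr = (1 << 22) | (2 << 12) | 0x345
paddr = translate(vaddr, level1)
table_bytes = (1 << L2_BITS) * 4     # 1024 entries of 4 bytes each
```

Only the level 2 tables that are actually in use need to exist, which is where the space saving over a single 4 MB table comes from.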


–effective access time = ( h*t + (1-h)*(t+2*m) ) + m

–e.g. h = 0.95, m = 100ns, t = 10ns : e.a.t. = 120ns (up from 115ns)

h = 0.99, m = 25ns, t = 2ns : e.a.t. = 27.5ns (up from 27.25ns)

–Three and more level Page Tables

»more partitions of the virtual address space:

»likely to be necessary with 64-bit virtual addressing

»page tables at any intermediate level can be paged in and out using presence bits at each level

Virtual Address : 1st index | 2nd index | 3rd index | 4th index | offset
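The two-level effective access time formula above suggests a general form in which a TLB miss costs one memory access per page table level. The function below is a sketch under that assumption (the slides give only the one-level and two-level cases); its name and the `levels` parameter are hypothetical.

```python
def effective_access_time(h, t, m, levels):
    """Access time when a TLB miss costs one memory access per table level.

    h: TLB hit ratio, t: TLB access time, m: main memory access time.
    """
    return h * t + (1 - h) * (t + levels * m) + m


two_level = effective_access_time(0.95, 10, 100, levels=2)
four_level = effective_access_time(0.95, 10, 100, levels=4)
```

With a high hit ratio each extra level adds only (1-h)*m to the average, which is why deep page tables remain practical.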



•Inverted Page Tables

–a page table with one entry per page-frame of physical memory

–each entry contains the virtual page number of the page in that frame

»plus a process ID to disambiguate virtual page numbers

–memory manager inserts data into table when allocating page-frames

–table has to be searched for a PID/virtual page number combination

»index of matching position is the page-frame number

–if PID/virtual page number not found, page-fault interrupt triggered

»memory manager may also keep traditional page tables for its own use

[Figure: inverted page table with one (PID, virtual page number) entry per page-frame; the position of an entry in the table is its page-frame number]

–linear table searching entry by entry will be slow and inefficient

–use Hash Table searching by hardware :

–hashing clashes dealt with by chaining entries with same hash index together

»hash index calculation, PID/page no. comparison and chaining done by hardware

–a TLB is still required

»searched associatively before inverted page table consulted
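The hashed lookup with chaining can be modelled in software as below. This is a sketch of the idea only: real hardware uses its own hash function, whereas Python's built-in `hash` is used here for convenience, and the class and method names are hypothetical. Removing entries when a frame is reallocated is not handled.

```python
class InvertedPageTable:
    def __init__(self, num_frames):
        # One entry per page-frame: (pid, virtual page) or None if free.
        self.entries = [None] * num_frames
        # next_in_chain[f] links frames whose keys share a hash index.
        self.next_in_chain = [None] * num_frames
        # hash_table[i] is the first frame in the chain for hash index i.
        self.hash_table = [None] * num_frames

    def _hash(self, pid, vpage):
        return hash((pid, vpage)) % len(self.hash_table)

    def insert(self, pid, vpage, frame):
        """Record that `frame` now holds (pid, vpage)."""
        index = self._hash(pid, vpage)
        self.entries[frame] = (pid, vpage)
        self.next_in_chain[frame] = self.hash_table[index]
        self.hash_table[index] = frame

    def lookup(self, pid, vpage):
        """Return the page-frame number, or None to signal a page fault."""
        frame = self.hash_table[self._hash(pid, vpage)]
        while frame is not None:
            if self.entries[frame] == (pid, vpage):
                return frame
            frame = self.next_in_chain[frame]
        return None


ipt = InvertedPageTable(num_frames=8)
ipt.insert(pid=1, vpage=0, frame=3)
ipt.insert(pid=2, vpage=0, frame=5)   # same virtual page, different process
```

The table size depends on the number of page-frames rather than on the size of the virtual address space, and the PID keeps identical virtual page numbers from different processes apart.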

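[Figure: the virtual page number of the virtual address is hashed to a hash index into the hash table, which points to a chain of inverted page table entries; the position of the matching entry is the page-frame number, which is concatenated with the offset to form the physical address]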
