Memory Management
• Basic memory management: mono- and multiprogramming, fixed and variable memory partitioning
• Swapping
• Paging
• Segmentation
• Virtual memory (paging)
• Page replacement algorithms
• Design issues for paging systems
• Implementation issues
Vera Goebel Department of Informatics
University of Oslo
2
Motivation
• In the project assignments so far
– Program code is linked to the kernel
– Physical addresses are well-known
– Not realistic
• In the real world
– Programs are loaded dynamically
– The physical addresses a program will get are not known to the program
– Program size at run-time is not known to the kernel
3
Memory Management
• Ideally programmers want memory that is
– large
– fast
– non-volatile
• Memory hierarchy
– small amount of fast, expensive memory: cache
– some medium-speed, medium-price main memory
– gigabytes of slow, cheap disk storage
• Memory manager handles the memory hierarchy
4
Computer Hardware Review
• Typical memory hierarchy – numbers shown are rough approximations
5
Memory Management for Monoprogramming
• Only one user program loaded – Program is entirely in memory – No swapping or paging
• Three simple ways of organizing memory
[Figure: three memory layouts, each spanning addresses 0x0 to 0x…fff: (a) OS in RAM below the user program (old mainframes and minicomputers); (b) user program below the OS in ROM (C64, ZX80, …; some PDAs, embedded systems); (c) OS in RAM at the bottom, user program in the middle, device drivers in ROM at the top (MS-DOS, …)]
6
Multiprogramming
• Processes have to wait for I/O
• Goal
– Do other work while a process waits
– Give the CPU to another process
• Several processes may be ready concurrently
• So
– If the I/O waiting probability of each process is p
– the probable CPU utilization can be estimated as
CPU utilization = 1 - p^n
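The estimate above can be checked with a few lines of Python (a sketch, not from the slides); with 80% I/O wait it reproduces the utilization figures used in the example that follows.

```python
# Estimated CPU utilization with n processes, each waiting
# for I/O with independent probability p: 1 - p^n.
def cpu_utilization(p, n):
    return 1 - p ** n

# With 80% I/O wait:
for n in range(1, 5):
    print(n, round(cpu_utilization(0.8, n) * 100))  # 20, 36, 49, 59 (%)
```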
7
Multiprogramming
• Arrival and work requirements of 4 jobs; CPU utilization for 1-4 jobs with 80% I/O wait

Job# | Arrival time | CPU use time | I/O wait time
1    | 10:00        | 4            | 16
2    | 10:10        | 3            | 12
3    | 10:15        | 2            | 8
4    | 10:20        | 2            | 8

# processes         | 1  | 2  | 3  | 4
CPU idle (%)        | 80 | 64 | 51 | 41
CPU busy (%)        | 20 | 36 | 49 | 59
CPU per process (%) | 20 | 18 | 16 | 15

• Sequence of events as jobs arrive and finish
– Note: the numbers show the amount of CPU time each job gets in each interval
[Figure: timeline from 10:00 showing the remaining CPU time of jobs 1-4 as each arrives and finishes; the jobs complete at roughly 22.0, 27.6, 28.2 and 31.7 minutes]
8
Multiprogramming
• CPU utilization as a function of the number of processes in memory
[Figure: CPU utilization versus the degree of multiprogramming]
9
Multiprogramming • Several programs
– Concurrently loaded into memory – OS must arrange memory sharing – Memory partitioning
• Memory – Needed for different tasks within a process – Shared among processes – Process memory demand may change over time
• Use of secondary storage – Move (parts of) blocking processes from memory – Higher degree of multiprogramming possible – Makes sense if processes block for long times
10
Memory Management for Multiprogramming
• Process may not be entirely in memory • Reasons
– Other processes use memory • Their turn • Higher priority • Process is waiting for I/O
– Too big • For its share • For entire available memory
• Approaches – Swapping – Paging – Overlays
[Figure: memory hierarchy of registers, cache(s), DRAM and disk, with rough relative access times of 2x, 100x and 10^9x; paging, swapping and overlays move data between DRAM and disk]
11
Memory Management for Multiprogramming
• Swapping – Remove a process from memory
• With all of its state and data • Store it on a secondary medium
– Disk, Flash RAM, other slow RAM, historically also Tape
• Paging
– Remove part of a process from memory
• Store it on a secondary medium
• The sizes of such parts are fixed: the page size
• Overlays – Manually replace parts of code and data
• Programmer’s rather than OS’s work • Only for very old and memory-scarce systems
(How to use these with virtual memory is covered later)
12
Memory Management Techniques • Before details about moving processes out
– Assign memory to processes
• Memory partitioning – Fixed partitioning – Dynamic partitioning – Simple paging – Simple segmentation – Virtual memory paging – Virtual memory segmentation
13
Multiprogramming with Fixed Partitions
• Fixed memory partitions – separate input queues for each partition – single input queue
14
Fixed Partitioning
• Divide memory – Into static partitions – At system initialization time (boot or earlier)
• Advantages – Very easy to implement – Can support swapping process in and out
15
Fixed Partitioning
• Two fixed partitioning schemes – Equal-size partitions – Unequal-size partitions
• Equal-size partitions
– Big programs cannot be executed
• Unless program parts are loaded from disk
– Small programs use an entire partition
• A problem called “internal fragmentation”
[Figure: memory from 0x0 to 0x…fff divided into an 8 MB operating-system area and seven equal 8 MB partitions]
16
Fixed Partitioning • Two fixed partitioning
schemes – Equal-size partitions – Unequal-size partitions
• Unequal-size partitions
– Bigger programs can be loaded at once
– Smaller programs can lead to less internal fragmentation
– The advantages require assignment of jobs to partitions
[Figure: the equal-size scheme (OS plus seven 8 MB partitions) next to an unequal-size scheme (OS 8 MB; partitions of 2, 4, 6, 8, 8, 12 and 16 MB)]
17
Fixed Partitioning • Approach
– Has been used in mainframes
– Uses the term “job” for a running program
– Jobs run as batch jobs
– Jobs are taken from a queue of pending jobs
• Problem with unequal partitions
– Choosing a job for a partition
18
Fixed Partitioning
• One queue per partition
– Internal fragmentation is minimal
– Jobs wait although sufficiently large partitions are available
19
Fixed Partitioning
• Single queue
– Jobs are put into the next sufficiently large partition
– Waiting time is reduced
– Internal fragmentation is bigger
– A swapping mechanism can reduce internal fragmentation
• Move a job to another partition
20
Problems: Relocation and Protection
• Cannot be sure where a program will be loaded in memory
– address locations of variables and code routines cannot be absolute
– must keep a program out of other processes’ partitions
• Use base and limit values
– address locations are added to the base value to map to a physical address
– address locations larger than the limit value are an error
2 Registers: Base and Bound
• Built into the Cray-1
• A program can only access physical memory in [base, base+bound]
• On a context switch: save/restore the base and bound registers
• Pros: simple
• Cons: fragmentation, hard to share, and difficult to use disks
[Figure: the virtual address is compared against the bound register (error if larger) and added to the base register to form the physical address]
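The base-and-bound check can be sketched in a few lines of Python (illustrative, not from the slides; the bound register is assumed to hold the size of the program's region):

```python
# Base-and-bound translation: check the limit first, then relocate.
def translate(virtual_addr, base, bound):
    if virtual_addr >= bound:       # out of the program's region
        raise MemoryError("address out of bounds")
    return base + virtual_addr      # relocation by the base register

print(hex(translate(0x0042, base=0x20000, bound=0x1000)))  # 0x20042
```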
22
Swapping (1)
Memory allocation changes as processes come into and leave memory
Shaded regions are unused memory
23
Dynamic Partitioning
• Divide memory
– Partitions are created dynamically for jobs
– Removed after jobs are finished
• External fragmentation
– The problem increases with system running time
– Occurs with swapping as well
– Addresses of process 2 changed after swap-in
[Figure: a 64 MB memory over time; the OS takes 8 MB, leaving 56 MB free. Processes 1 (20 MB), 2 (14 MB) and 3 (18 MB) are loaded; process 2 is swapped out and later swapped back into a different hole, and processes 4 (8 MB) and 5 (14 MB) leave scattered 4-6 MB holes: external fragmentation]
(Solutions to the address-change problem: address translation)
24
Dynamic Partitioning
• Reduce external fragmentation
– Compaction
• Compaction
– Takes time
– Consumes processing resources
• Reduce the need for compaction
– Placement algorithms
[Figure: after compaction, the scattered 4-6 MB holes around processes 2, 3 and 4 are collected into one 16 MB free block]
25
Dynamic Partitioning: Placement Algorithms
• Use the most suitable partition for a process
• Typical algorithms
– First fit
– Next fit
– Best fit
[Figure: a 128 MB memory with free blocks of 16, 4, 8, 6, 8, 16 and 32 MB, showing where First fit, Next fit and Best fit place the same sequence of requests]
26
Dynamic Partitioning: Placement Algorithms
• Use the most suitable partition for a process
• Typical algorithms
– First fit
– Next fit
– Best fit
[Figure: continuation of the example; after a 12 MB and a 10 MB request, First fit and Best fit leave different patterns of free blocks]
27
Dynamic Partitioning: Placement Algorithms
• Comparison of First fit, Next fit and Best fit • Example is naturally artificial
– First fit • Simplest, fastest of the three • Typically the best of the three
– Next fit • Typically slightly worse than first fit • Problems with large segments
– Best fit • Slowest • Creates lots of small free blocks • Therefore typically worst
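The three placement algorithms can be sketched over a simple list of free holes (a Python sketch, not from the slides; the `(start, size)` representation is illustrative):

```python
# Each hole is (start, size); the functions return the index of the
# chosen hole, or None if nothing fits.
def first_fit(holes, size):
    for i, (_, hsize) in enumerate(holes):
        if hsize >= size:
            return i                     # first hole large enough
    return None

def next_fit(holes, size, last):
    n = len(holes)
    for k in range(n):                   # resume scanning after the last hit
        i = (last + 1 + k) % n
        if holes[i][1] >= size:
            return i
    return None

def best_fit(holes, size):
    fits = [i for i, (_, hsize) in enumerate(holes) if hsize >= size]
    return min(fits, key=lambda i: holes[i][1]) if fits else None

holes = [(0, 16), (40, 4), (60, 8), (100, 6)]
print(first_fit(holes, 6))   # 0: the first hole large enough
print(best_fit(holes, 6))    # 3: the tightest fit, leaves no fragment
```

Note how best fit picks the 6 MB hole exactly, which is why it tends to leave many tiny, unusable free blocks elsewhere.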
28
Memory Management with Bit Maps
• Part of memory with 5 processes, 3 holes – tick marks show allocation units – shaded regions are free
• Corresponding bit map • Same information as a list
29
Memory Management with Linked Lists
Four neighbor combinations for the terminating process X
30
Buddy System
• Mix of fixed and dynamic partitioning
– Partitions have sizes 2^k, L ≤ k ≤ U
• Maintain a list of holes for each size
• Assign a process
– Find the smallest k so that the process fits into 2^k
– Find a hole of size 2^k
– If not available, split the smallest hole larger than 2^k
• Split recursively into halves until two holes have size 2^k
[Figure: a 1 MB block is split into 512 kB, 256 kB and 128 kB buddies as processes of 128 kB, 256 kB and 32 kB are allocated and freed]
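The allocation side of the buddy system can be sketched as follows (a Python sketch under the slide's rules; the free-list-per-size dictionary is an illustrative representation):

```python
# free_lists maps block size -> list of free block start addresses.
def buddy_alloc(free_lists, request):
    k = 1
    while 2 ** k < request:          # smallest k with 2^k >= request
        k += 1
    size = 2 ** k
    s = size
    while s not in free_lists or not free_lists[s]:
        s *= 2                       # look for a larger block to split
        if s > max(free_lists):
            return None              # nothing big enough
    block = free_lists[s].pop()
    while s > size:                  # split into halves ("buddies")
        s //= 2
        free_lists.setdefault(s, []).append(block + s)  # upper buddy stays free
    return block

free = {1024 * 1024: [0]}            # one free 1 MB block at address 0
print(buddy_alloc(free, 100 * 1024)) # 0: a 128 kB block split out of the 1 MB
print(free[128 * 1024])              # [131072]: its free buddy
```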
31
Swapping (2)
• Allocating space for growing data segment • Allocating space for growing stack & data segment
32
Memory use within a process
• Memory needs of known size
– Program code
– Global variables
• Memory needs of unknown size
– Dynamically allocated memory
– Stack
• Several stacks in multithreaded programs
[Figure: a program with initialized global variables (data) and uninitialized globals, and the resulting process layout of PCB, program, data and stack, possibly with one stack per thread]
33
Memory Addressing
• Addressing in memory
– Addressing needs are determined during programming
– Must work independently of position in memory
– Actual physical addresses are not known
35
Memory Management
• Addressing
– Covered in the address translation and virtual memory lectures
• Important now
– Translation is necessary
– It therefore becomes possible to have several parts
• Pages
• Segments
36
Paging
• Pages
– Equal lengths
– Determined by the processor
– One page is moved into one memory frame
• A process is loaded into several frames
– Not necessarily consecutive
• No external fragmentation
• Little internal fragmentation
– Depends on frame size
[Figure: physical memory frames holding pages of processes 1-5, not necessarily consecutively]
Paging
• Use a page table to translate
• Various bits in each entry
• Context switch: similar to the segmentation scheme
• What should the page size be?
• Pros: simple allocation, easy to share
• Cons: big page table, and cannot deal with holes easily
[Figure: the virtual address (VPage #, offset) indexes the page table; the VPage # is checked against the page table size (error if larger) and replaced by the PPage # to form the physical address]
38
Segmentation
• Segments
– Different lengths
– Determined by the programmer
– Mapped onto memory frames
• The programmer (or compiler toolchain) organizes the program in parts
– More control
– Needs awareness of possible segment size limits
• Pros and cons
– Principle as in dynamic partitioning
– No internal fragmentation
– Less external fragmentation, because segments are on average smaller than whole processes
Segmentation
• Have a table of (seg, size)
• Protection: each entry has access bits (nil, read, write)
• On a context switch: save/restore the table, or a pointer to the table in kernel memory
• Pros: efficient, easy to share
• Cons: complex management, and fragmentation within a segment
[Figure: the virtual address (segment, offset) indexes the (seg, size) table; the offset is checked against the segment size (error if larger) and added to the segment base to form the physical address]
40
Paging and Segmentation
• Typical for paging and segmentation
– Address translation
– At execution time
– With processor support
• Simple paging and segmentation
– Without virtual memory and protection
– Can be implemented
• by address rewriting at load time
• by jump tables set up at load time
[Figure: simplified address translation; an address (“part 2”, offset in part 2) is resolved through a lookup table to the base of code part 2 plus the offset]
Segmentation with Paging
[Figure: the virtual address (Vseg #, VPage #, offset) first indexes a (seg, size) table (error if the segment part is too large); the segment’s page table then maps VPage # to PPage #, which is combined with the offset to form the physical address]
42
Other needs (protection)
• Protection of a process from itself
– (the stack grows into the heap)
• Protection of processes from each other
– (writes into another process)
(Solutions to protection: address translation)
43
Summary: Memory Management
• Algorithms
– Paging and segmentation
• Extended in the address translation and virtual memory lectures
– Placement algorithms for partitioning strategies
• Mostly obsolete for system memory management, since hardware address translation is available
• But still necessary for managing
– kernel memory
– memory within a process
– memory of specialized systems (esp. database systems)
• Address translation
– Solves addressing in a loaded program
• Hardware address translation
– Supports protection from data access
– Supports a new physical memory position after swapping in
• Virtual memory
– Provides a larger logical (virtual) than physical memory
– Selects a process, page or segment for removal from physical memory
44
Why Virtual Memory?
• Use secondary storage
– Extend expensive DRAM with reasonable performance
• Protection
– Programs do not step over each other; communication between them requires explicit IPC operations
• Convenience
– Flat address space; programs have the same view of the world
45
Virtual Memory Paging (1)
The position and function of the MMU
Translation Overview
• The actual translation is done in hardware (MMU)
• Controlled in software
• CPU view
– what the program sees: virtual memory
• Memory view
– physical memory
[Figure: the CPU issues virtual addresses; the MMU translates them into physical addresses for physical memory and I/O devices]
47
Goals of Translation
• Implicit translation for each memory reference
• A hit should be very fast
• Trigger an exception on a miss
• Protected from user’s faults
[Figure: memory hierarchy of registers, cache(s) (10x), DRAM (100x) and disk (10Mx), with paging between DRAM and disk]
48
Paging (2)
The relation between virtual addresses and physical memory addresses, given by the page table
49
Page Tables (1)
Internal operation of the MMU with 16 pages of 4 KB
50
Memory Lookup
[Figure: incoming 16-bit virtual address 0x2004 (8196); the 4-bit index (virtual page 0010 = 2) selects page table entry 2, which holds frame 110 (6) with present bit 1; the 12-bit offset (0x004) is copied unchanged, giving outgoing physical address 0x6004 (24580)]
51
Memory Lookup
[Figure: the same lookup for virtual address 0x2004, but now page table entry 2 has present bit 0, so the reference causes a PAGE FAULT]
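The lookup in the two figures can be simulated directly (a Python sketch; the page table contents are taken from the figure, everything else is illustrative):

```python
# 16-bit virtual addresses, 16 pages of 4 KB: top 4 bits index the
# page table, low 12 bits are the offset. vpage -> (frame, present).
page_table = {0: (2, 1), 1: (1, 1), 2: (6, 1), 3: (0, 1),
              4: (4, 1), 5: (3, 1), 9: (5, 1), 11: (7, 1)}

def mmu(vaddr):
    vpage, offset = vaddr >> 12, vaddr & 0xFFF
    frame, present = page_table.get(vpage, (0, 0))
    if not present:
        raise RuntimeError("PAGE FAULT")   # as in the second figure
    return (frame << 12) | offset

print(hex(mmu(0x2004)))  # 0x6004 (24580): virtual page 2 maps to frame 6
```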
52
Page Fault Handling
1. Hardware traps to the kernel, saving the program counter and process state information
2. Save general registers and other volatile information
3. OS discovers the page fault and tries to determine which virtual page is requested
4. OS checks if the virtual page is valid and if the protection is consistent with the access
5. Select a page to be replaced
6. Check if the selected page frame is ”dirty”, i.e., updated
7. When the selected page frame is ready, the OS finds the disk address where the needed data is located and schedules a disk operation to bring it into memory
8. A disk interrupt is executed indicating that the disk I/O operation is finished; the page tables are updated, and the page frame is marked ”normal state”
9. The faulting instruction is backed up and the program counter is reset
10. The faulting process is scheduled, and the OS returns to the routine that trapped to the kernel
11. The registers and other volatile information are restored, and control is returned to user space to continue execution as if no page fault had occurred
53
Instruction Backup
An instruction causing a page fault
54
Page Tables (2)
• 32-bit address with 2 page table fields
• Two-level page tables
[Figure: a top-level page table whose entries point to second-level page tables]
55
Page Tables (3)
Typical page table entry
Multiple-Level Page Tables
[Figure: the virtual address is split into (dir, table, offset); the dir field indexes a directory, whose entry points to a page table, whose entry (pte) gives the page frame]
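Splitting the virtual address into the (dir, table, offset) fields is just bit shifting and masking (a Python sketch; the common 10+10+12 split for 32-bit addresses with 4 KB pages is assumed):

```python
# 10-bit directory index, 10-bit table index, 12-bit offset.
def split(vaddr):
    directory = (vaddr >> 22) & 0x3FF
    table     = (vaddr >> 12) & 0x3FF
    offset    = vaddr & 0xFFF
    return directory, table, offset

print(split(0x00403004))  # (1, 3, 4)
```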
57
TLBs – Translation Lookaside Buffers
A TLB to speed up paging
Translation Look-aside Buffer (TLB)
[Figure: the virtual address (VPage #, offset) is matched against the VPage # fields of all TLB entries in parallel; on a hit the stored PPage # forms the physical address, on a miss the real page table is consulted]
Bits in a TLB Entry
• Common (necessary) bits – Virtual page number: match with the virtual address – Physical page number: translated address – Valid – Access bits: kernel and user (nil, read, write)
• Optional (useful) bits – Process tag – Reference – Modify – Cacheable
Hardware-Controlled TLB
• On a TLB miss
– Hardware loads the PTE into the TLB
• Needs to write an entry back if there is no free entry
– Generates a fault if the page containing the PTE is invalid
– VM software performs fault handling
– Restart the CPU
• On a TLB hit, hardware checks the valid bit
– If valid, pointer to the page frame in memory
– If invalid, the hardware generates a page fault
• Perform page fault handling
• Restart the faulting instruction
Software-Controlled TLB • On a miss in TLB
– Write back if there is no free entry – Check if the page containing the PTE is in memory – If no, perform page fault handling – Load the PTE into the TLB – Restart the faulting instruction
• On a hit in TLB, the hardware checks valid bit – If valid, pointer to page frame in memory – If invalid, the hardware generates a page fault
• Perform page fault handling • Restart the faulting instruction
62
Hardware vs. Software Controlled
• Hardware approach – Efficient – Inflexible – Need more space for page table
• Software approach – Flexible – Software can do mappings by hashing
• PP# → (Pid, VP#) • (Pid, VP#) → PP#
– Can deal with large virtual address space
63
How Many PTEs Do We Need?
• Worst case for a 32-bit address machine
– # of processes × 2^20 (if the page size is 4096 bytes)
• What about a 64-bit address machine?
– # of processes × 2^52
Inverted Page Tables
• Main idea
– One PTE for each physical page frame
– Hash (Vpage, pid) to PPage#
• Pros
– Small page table for a large address space
• Cons
– Lookup is difficult
– Overhead of managing hash chains, etc.
[Figure: the virtual address (pid, vpage, offset) is hashed into the inverted page table of n entries; the index k of the matching entry, combined with the offset, forms the physical address]
65
Inverted Page Tables
Comparison of a traditional page table with an inverted page table
66
Page Replacement Algorithms
• Page fault → the OS has to select a page for replacement
– Modified page → write back to disk
– Unmodified page → just overwrite with new data
• How do we decide which page to replace?
→ determined by the page replacement algorithm
→ several algorithms exist:
• Random
• Other algorithms take into account usage, age, etc. (e.g., FIFO, not recently used, least recently used, second chance, clock, …)
• Which is best?
67
Optimal
• Best possible page replacement algorithm:
• When a page fault occurs, all pages in memory are labeled with the number of instructions that will be executed before each page is used again
• The page with the most instructions before reuse is replaced
• Easy to describe, but impossible to implement (the OS cannot look into the future)
• Estimate by logging page usage on previous runs of the process
• Useful to evaluate other page replacement algorithms
68
Not Recently Used (NRU)
• Two status bits are associated with each page:
R → page referenced (read or written)
M → page modified (written)
• Pages belong to one of four classes according to the status bits:
• Class 0: not referenced, not modified (R=0, M=0)
• Class 1: not referenced, modified (R=0, M=1)
• Class 2: referenced, not modified (R=1, M=0)
• Class 3: referenced, modified (R=1, M=1)
• NRU removes a page at random from the lowest-numbered, non-empty class
• Low overhead
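The class number is simply 2·R + M, so NRU can be sketched in a few lines (Python sketch; the page dictionary is an illustrative representation):

```python
import random

# pages: {name: (R, M)}. Evict a random page from the
# lowest-numbered non-empty class (class = 2*R + M).
def nru_victim(pages):
    classes = {0: [], 1: [], 2: [], 3: []}
    for name, (r, m) in pages.items():
        classes[2 * r + m].append(name)
    for c in range(4):
        if classes[c]:
            return random.choice(classes[c])

pages = {"A": (1, 1), "B": (0, 1), "C": (1, 0)}
print(nru_victim(pages))  # B: class 1 beats classes 2 and 3
```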
69
First In First Out (FIFO)
• All pages in memory are maintained in a list sorted by age
• FIFO replaces the oldest page, i.e., the first in the list
• Low overhead
• FIFO is rarely used in its pure form
[Figure: the FIFO chain for reference string A B C D A E F G H I A J; pages enter at the “most recently loaded” end, and the reference to A while it is in memory leaves the chain unchanged. Once the buffer holds 8 pages (H G F E D C B A), the next page fault results in a replacement of the page loaded first]
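FIFO replacement over the reference string from the figure can be simulated in a few lines (a Python sketch; the 8-frame buffer matches the figure):

```python
from collections import deque

def fifo(refs, nframes):
    frames, faults = deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.popleft()        # evict the page loaded first
            frames.append(page)
    return faults

print(fifo("ABCDAEFGHIAJ", 8))  # 11 faults: only the second reference to A is a hit
```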
70
Second Chance
• Modification of FIFO
• R bit: when a page is referenced again, the R bit is set, and the page will be treated as a newly loaded page
Reference string: A B C D A E F G H I
[Figure: the FIFO chain with an R-bit per page. The second reference to A sets A’s R-bit. When the buffer is full (H G F E D C B A with A’s R-bit = 1) and page I must be inserted, the first-loaded page is examined: if its R-bit = 0 it is replaced; if its R-bit = 1 the bit is cleared, the page is moved last, and the new first page is examined. Here A’s R-bit = 1, so A is moved last with its R-bit cleared; B’s R-bit = 0, so B is paged out, the chain is shifted, and I is inserted last]
• Second chance is a reasonable algorithm, but inefficient because it is moving pages around the list
71
Clock
• A more efficient way to implement Second Chance
• Circular list in the form of a clock
• Pointer to the oldest page:
– R-bit = 0 → replace and advance the pointer
– R-bit = 1 → set the R-bit to 0, advance the pointer until a page with R-bit = 0 is found, replace it and advance the pointer
Reference string: A B C D A E F G H I
[Figure: eight pages arranged in a circle with their R-bits; A’s R-bit is 1, so when page I must be loaded, A gets a second chance (its R-bit is cleared) and B, with R-bit 0, is replaced]
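A single clock replacement step can be sketched as follows (Python sketch; the circular buffer of `(page, R)` pairs is an illustrative representation):

```python
# frames: list of (page, R) treated as a circle; hand: index of the
# oldest page. Returns the new hand position.
def clock_replace(frames, hand, new_page):
    while True:
        page, r = frames[hand]
        if r == 0:
            frames[hand] = (new_page, 0)     # replace here
            return (hand + 1) % len(frames)  # advance past it
        frames[hand] = (page, 0)             # clear R: second chance in place
        hand = (hand + 1) % len(frames)

frames = [("A", 1), ("B", 0), ("C", 0), ("D", 0)]
hand = clock_replace(frames, 0, "I")
print(frames)  # [('A', 0), ('I', 0), ('C', 0), ('D', 0)]
print(hand)    # 2
```

Note that unlike Second Chance, no pages are moved around; only the hand advances.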
72
Least Recently Used (LRU)
• Replace the page that has gone the longest time since its last reference
• Based on the observation that pages that were heavily used in the last few instructions will probably be used again in the next few instructions
• There are several ways to implement this algorithm
73
Least Recently Used (LRU)
• LRU as a linked list:
[Figure: for reference string A B C D A E F G H A C I, the chain is kept ordered from most recently used to least recently used. Each reference to a page already in memory (A, then A again, then C) moves it to the “most recently used” end; when the buffer is full (H G F E A D C B) the next page fault replaces the LRU page B with I]
• Expensive - maintaining an ordered list of all pages in memory:
• most recently used at the front, least at the rear
• the list must be updated on every memory reference!
74
Least Recently Used (LRU)
• LRU by using aging:
– a ”reference counter” for each page
– after a clock tick:
• shift the bits in the reference counter to the right (the rightmost bit is deleted)
• add the page’s reference bit at the front (left) of the reference counter
– the page with the lowest counter is replaced

R-bits per tick: tick 0: 1 0 1 0 1 1; tick 1: 1 1 0 1 0 0; tick 2: 1 1 0 1 0 1; tick 3: 1 0 0 0 1 0; tick 4: 0 1 1 0 0 0

Page | tick 0   | tick 1   | tick 2   | tick 3   | tick 4
1    | 10000000 | 11000000 | 11100000 | 11110000 | 01111000
2    | 00000000 | 10000000 | 11000000 | 01100000 | 10110000
3    | 10000000 | 01000000 | 00100000 | 00010000 | 10001000
4    | 00000000 | 10000000 | 11000000 | 01100000 | 00110000
5    | 10000000 | 01000000 | 00100000 | 10010000 | 01001000
6    | 10000000 | 01000000 | 10100000 | 01010000 | 00101000
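One aging tick is a shift-and-insert per page; the sketch below (Python, not from the slides) reproduces the counter values after tick 2 in the table:

```python
# Shift each 8-bit counter right and put the page's R bit leftmost.
def tick(counters, r_bits):
    return [(c >> 1) | (r << 7) for c, r in zip(counters, r_bits)]

counters = [0] * 6
for r_bits in ([1, 0, 1, 0, 1, 1], [1, 1, 0, 1, 0, 0], [1, 1, 0, 1, 0, 1]):
    counters = tick(counters, r_bits)

print([format(c, "08b") for c in counters])
# ['11100000', '11000000', '00100000', '11000000', '00100000', '10100000']
```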
75
Least Recently Used (LRU)
• LRU as a matrix:
– N pages → N × N matrix
– Page k is referenced → row k is set (1), then column k is cleared (0)
– Replace the page with the lowest row value
[Figure: the 4 × 4 matrix after each reference in the page-frame string 1 2 3 4 3 2 1 4; e.g., after 1 2 3 4 the rows read 0000, 1000, 1100, 1110, so page 1 is least recently used]
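The matrix update rule is two lines of code (Python sketch; pages are 0-indexed here, unlike the figure):

```python
# On a reference to page k: set row k to ones, then clear column k.
def reference(matrix, k):
    n = len(matrix)
    matrix[k] = [1] * n
    for row in matrix:
        row[k] = 0

n = 4
matrix = [[0] * n for _ in range(n)]
for page in [0, 1, 2, 3]:          # the figure's references 1 2 3 4
    reference(matrix, page)

rows = [int("".join(map(str, row)), 2) for row in matrix]
print(rows)                        # [0, 8, 12, 14]
print(rows.index(min(rows)))       # 0: page 1 of the figure is LRU
```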
76
Counting Algorithms
• LRU by using a reference counter
– clear the counter when the page is referenced (counter = 0)
– increase all counters each clock tick
– replace the page with the highest counter
• Not/Least Frequently Used (N/LFU)
– counter initially 0
– increase a page’s counter only if it has been referenced during this clock tick
– replace the page with the lowest counter
• Most Frequently Used (MFU)
– counter as in LFU
– replace the page with the highest counter (assuming low counters mean new, fresh pages)
77
LRU-K & 2Q
• LRU-K: bases page replacement on the last K references to a page [O’Neil et al. 93]
• 2Q: uses 3 queues to hold much-referenced and popular pages in memory [Johnson et al. 94]
• 2 FIFO queues for seldom-referenced pages
• 1 LRU queue for much-referenced pages
[Figure: pages retrieved from disk enter the first FIFO queue; if reused they move to the LRU queue, otherwise they move to the second FIFO queue and are eventually paged out. Pages reused while in the LRU queue are re-arranged there; pages reused from the second FIFO queue move back to the LRU queue]
78
Working Set Model
• Working set: the set of pages that a process is currently using
• Working set model: the paging system tries to keep track of each process’ working set and makes sure that these pages are in memory before letting the process run → reduces the page fault rate (prepaging)
• Defining the working set:
– the set of pages used in the last k memory references (must count backwards)
– an approximation is to use all references in the last XX instructions
79
The Working Set Page Replacement Algorithm (1)
• The working set is the set of pages used by the k most recent memory references
• w(k, t) is the size of the working set at time t
[Figure: w(k, t) plotted against k]
80
Working Set Page Replacement Algorithm
• τ: the time period over which to calculate the working set
• age = virtual time - time of last reference
if all pages have R == 1
    select one page randomly
• Expensive - must search the whole page table
81
WSClock Page Replacement Algorithm
• Organize the page table entries as a clock
• As with clock, the page pointed to is examined first
– R = 1: clear the bit, set the virtual time, continue (b)
– R = 0: (c)
• age < τ: continue to the next page
• age > τ:
– if the page is clean, replace it (d)
– otherwise, schedule a write to disk and continue to the next page
• If the pointer comes all the way back to the start
– writes are scheduled: go around again and replace the first clean page found
– no writes scheduled (all pages in the working set): several options
• remove the first clean page
• remove the oldest page
• ...
82
Review of Page Replacement Algorithms
83
Demand Paging Versus Prepaging
• Demand paging: pages are loaded on demand, i.e., after a process needs them
• Should be used if we have no knowledge about future references
• Each page is loaded separately from disk, i.e., results in many disk accesses
• Prepaging: prefetching data in advance, i.e., before use
• Should be used if we have knowledge about future references
• # page faults is reduced, i.e., the page is in memory when needed by a process
• # disk accesses can be reduced by loading several pages in one I/O operation
84
Allocation Policies
• How should memory be allocated among the competing runnable processes?
• Equal allocation: all processes get the same number of page frames
• Proportional allocation: the number of page frames depends on process size
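The two policies can be sketched for a pool of m page frames (Python sketch; the dictionaries and integer rounding are illustrative choices):

```python
def equal_allocation(m, processes):
    return {p: m // len(processes) for p in processes}

def proportional_allocation(m, sizes):   # sizes: {process: virtual size}
    total = sum(sizes.values())
    return {p: m * s // total for p, s in sizes.items()}

sizes = {"A": 10, "B": 30, "C": 60}
print(equal_allocation(60, sizes))         # {'A': 20, 'B': 20, 'C': 20}
print(proportional_allocation(60, sizes))  # {'A': 6, 'B': 18, 'C': 36}
```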
85
Allocation Policies
• Local page replacement: consider only the process’s own pages when replacing a page
• corresponds to equal allocation
• can cause thrashing
• multiple, identical pages in memory
• Global page replacement: consider all pages in memory when replacing a page
• corresponds to proportional allocation
• better performance in general
• monitoring of working set size and aging bits
• data sharing
86
Allocation Policies
• Example: local versus global replacement - insert page A5 using age-based replacement

Original configuration (page: age): A1 10, A2 7, A3 4, A4 11, B1 6, B2 12, B3 1, B4 3, C1 8, C2 2, C3 9, C4 5

Local replacement: replace the oldest of A’s pages → A3 (lowest age value, 4) is replaced by A5 (age 13)
Global replacement: replace the oldest page in memory → B3 (lowest age value, 1) is replaced by A5 (age 13)
87
Allocation Policies
• Page fault frequency (PFF):
Usually, more page frames → fewer page faults
[Figure: PFF (page faults/sec) falling as the number of page frames assigned grows]
• PFF unacceptably high → the process needs more memory
• PFF very low → the process may have too much memory
• Possible solution: reduce the number of processes competing for memory
• reassign a page frame
• swap one or more processes to disk, divide up the pages they held
• reconsider the degree of multiprogramming
88
Page Size
• Determining the optimum page size requires balancing several competing factors:
• Data segment size ≠ n × page size → internal fragmentation (favors small pages)
• Keep in memory only data that is (currently) used (favors small pages)
• Fewer disk operations (favors large pages)
• Page table size: access/load time and space requirements (favors large pages)
• Page replacement algorithm: operations per page (favors large pages)
• Usual page sizes are 4 KB - 8 KB, but up to 64 KB has been suggested for systems supporting ”new” applications managing high-data-rate streams like video and audio
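A classic back-of-the-envelope model (not from these slides) makes the trade-off concrete: with process size s, page size p and e bytes per page table entry, the overhead is roughly s·e/p (page table) plus p/2 (internal fragmentation in the last page), minimized at p = sqrt(2·s·e).

```python
from math import sqrt

# overhead = page table space + expected internal fragmentation
def overhead(s, p, e=8):
    return s * e / p + p / 2

s = 1 << 20                           # a 1 MB process, 8-byte PTEs
print(round(sqrt(2 * s * 8)))         # 4096: optimum near a 4 KB page
for p in (1024, 4096, 16384):
    print(p, round(overhead(s, p)))   # 8704, 4096, 8704 bytes
```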
89
Locking & Sharing
• Locking pages in memory:
– I/O and context switches
– Much-used pages
– …
• Shared pages: users running the same program at the same time, e.g., an editor or compiler
– Problem 1: not all pages are shareable
– Problem 2: process swapping or termination
– …
90
Separate Instruction and Data Spaces
• One address space • Separate I and D
spaces
91
Cleaning Policy
• Need for a background process, the paging daemon
– periodically inspects the state of memory
• When too few frames are free
– selects pages to evict using a replacement algorithm
• It can use the same circular list (clock) as the regular page replacement algorithm, but with a different pointer
92
Implementation Issues
Operating System Involvement with Paging
Four times when the OS is involved with paging:
1. Process creation
- determine program size
- create page table
2. Process execution
- MMU reset for the new process
- TLB flushed
3. Page fault time
- determine the virtual address causing the fault
- swap the target page out, the needed page in
4. Process termination time
- release page table, pages
93
Page Fault Handling (1)
1. Hardware traps to the kernel
2. General registers saved
3. OS determines which virtual page is needed
4. OS checks the validity of the address, seeks a page frame
5. If the selected frame is dirty, write it to disk
Page Fault Handling (2)
6. OS schedules the new page to be brought in from disk
7. Page tables updated
8. Faulting instruction backed up to where it began
9. Faulting process scheduled
10. Registers restored
11. Program continues
48!
95
Instruction Backup
An instruction causing a page fault
96
Locking Pages in Memory
• Virtual memory and I/O occasionally interact • Proc issues call for read from device into buffer
– while waiting for I/O, another process starts up and has a page fault
– the buffer for the first process may be chosen to be paged out
• Need to specify some pages locked – exempted from being target pages
49!
97
Backing Store
(a) Paging to static swap area (b) Backing up pages dynamically
98
Segmentation (1)
• One-dimensional address space with growing tables • One table may bump into another
50!
99
Segmentation (2)
Allows each table to grow or shrink, independently
100
Segmentation (3)
Comparison of paging and segmentation
51!
101
Implementation of Pure Segmentation
(a)-(d) Development of checkerboarding (e) Removal of the checkerboarding by compaction
102
Segmentation with Paging: MULTICS (1)
• Descriptor segment points to page tables • Segment descriptor – numbers are field lengths
52!
103
Segmentation with Paging: MULTICS (2)
A 34-bit MULTICS virtual address
104
Segmentation with Paging: MULTICS (3)
Conversion of a 2-part MULTICS address into a main memory address
53!
105
Segmentation with Paging: MULTICS (4)
• Simplified version of the MULTICS TLB • Existence of 2 page sizes makes actual TLB more complicated
106
Segmentation with Paging: Pentium (1)
A Pentium selector
54!
107
Segmentation with Paging: Pentium (2)
• Pentium code segment descriptor • Data segments differ slightly
108
Segmentation with Paging: Pentium (3)
Conversion of a (selector, offset) pair to a linear address
55!
109
Segmentation with Paging: Pentium (4)
Mapping of a linear address onto a physical address
110
Segmentation with Paging: Pentium (5)
Protection on the Pentium
56!
111
Paging on Pentium • In protected mode, the currently executing process
has a 4 GB (2³²) address space – viewed as 1 M 4 KB pages – The 4 GB address space is divided into 1 K page groups
(1st level – page directory) – Each page group has 1 K 4 KB pages
(2nd level – page table)
• Mass storage space is also divided into 4 KB blocks of information
• Uses control registers for paging information
112
Control Registers used for Paging on Pentium
• Control register 0 (CR0) – bit layout: PG (31), CD (30), NW (29), WP (16), PE (0)
– Protected Mode Enable: if CR0[PE] = 1, the processor is in protected mode
– Paging Enable: the OS enables paging by setting CR0[PG] = 1
– Write-Protect: if CR0[WP] = 1, only the OS may write to read-only pages
– Not-Write-Through and Cache Disable: used to control the internal cache
• Control register 1 (CR1)
– does not exist, returns only zero
• Control register 2 (CR2) – holds the page fault linear address (bits 31–0)
– only used if CR0[PG]=1 & CR0[PE]=1
57!
113
Control Registers used for Paging on Pentium
• Control register 3 (CR3) – page directory base address
– only used if CR0[PG]=1 & CR0[PE]=1
– bits 31–12: the 4 KB-aligned physical base address of the page directory
– PCD (bit 4), Page Cache Disable: if CR3[PCD] = 1, caching is turned off
– PWT (bit 3), Page Write-Through: if CR3[PWT] = 1, use write-through updating
• Control register 4 (CR4)
– PSE (bit 4), Page Size Extension: if CR4[PSE] = 1, the OS designer may designate some pages as 4 MB
114
Pentium Memory Lookup
• Incoming virtual address 0x1802038 (25174072), binary 0000 0001 1000 0000 0010 0000 0011 1000
– bits 31–22 index the page directory, bits 21–12 index the page table, bits 11–0 are the page offset
• Page directory entry fields:
– bits 31–12: physical base address of the page table
– PS: page size
– A: accessed
– U: user access allowed
– W: allowed to write
– P: present
58!
115
Pentium Memory Lookup
• Incoming virtual address 0x1802038; bits 31–22 give the index into the page directory (0x6)
• CR3 supplies the page directory base address; the indexed directory entry has P = 0, so the page table itself is not in memory
• Page table page fault:
1. Save pointer to faulting instruction
2. Move linear address to CR2
3. Generate a PF exception – jump to handler
4. Fault handler reads the CR2 address
5. Upper 10 CR2 bits identify the needed page table
6. The page directory entry really holds a mass storage address
7. Allocate a new page – write back a victim if dirty
8. Read the page table from the storage device
9. Insert the new page table base address into the page directory entry
10. Return and restore the faulting instruction
11. Resume operation, reading the same page directory entry again – now P = 1
116
Pentium Memory Lookup
• Incoming virtual address 0x1802038; page directory index 0x6, page table index 0x2
• The page directory entry now has P = 1 and points to the page table, but the indexed page table entry has P = 0
• Page frame page fault:
1. Save pointer to faulting instruction
2. Move linear address to CR2
3. Generate a PF exception – jump to handler
4. Fault handler reads the CR2 address
5. Upper 10 CR2 bits identify the needed page table
6. Use the middle 10 CR2 bits to determine the entry in the page table – it holds a mass storage address
7. Allocate a new page – write back a victim if dirty
8. Read the page from the storage device
9. Insert the new page frame base address into the page table entry
10. Return and restore the faulting instruction
11. Resume operation, reading the same page directory entry and page table entry again – both now P = 1
59!
117
Pentium Memory Lookup
• Incoming virtual address 0x1802038; page directory index 0x6, page table index 0x2, page offset 0x38 (56)
• Both the page directory entry and the page table entry now have P = 1
• The page frame base address from the page table entry, combined with the page offset, addresses the requested data
118
Page Fault Causes
• Page directory entry’s P-bit = 0: the page group’s page table is not in memory
• Page table entry’s P-bit = 0: the requested page is not in memory
• Attempt to write to a read-only page
• Insufficient page-level privilege to access a page table or frame
• One of the reserved bits is set in the page directory or page table entry