CS 2510
OS Basics, cont’d
Dynamic Memory Allocation
• How does the system manage memory of a single process?
– View: Each process has a contiguous logical address space
Dynamic Storage Management
• Static (compile-time) allocation is not possible for all data
– Recursive procedures
• Even regular procedures are hard to predict (data dependencies)
– Complex data structures
• Storage is used inefficiently when reserved statically
– Must reserve enough to handle the worst case
• Dynamic interface:
– ptr = allocate(x bytes)
– free(ptr)
• Dynamic allocation can be handled in 2 ways
– Stack allocation
• Restricted, but simple and efficient
– Heap allocation
• More general, but less efficient
• Harder to implement
Stack Organization
• Definition: Memory is freed in the opposite order from allocation
– Alloc(A), Alloc(B), Alloc(C)
– Free(C), Free(B), Free(A)
• When is it useful?
– Memory allocation and freeing are partially predictable
– Allocation is hierarchical
– Examples
• Procedure call frames
• Tree traversal, expression evaluation, parsing
Stack Implementation
• Advance a pointer dividing allocated and free space
– Allocate: Increment pointer
– Free: Decrement pointer
– x86: Special ‘stack pointer’ register
• ‘SP’ (16-bit), ‘ESP’ (32-bit), ‘RSP’ (64-bit)
• Where does this register point to?
• How does the x86 allocate and free?
• Stack grows down
• Advantage– Keeps all the free space contiguous– Simple and efficient to implement
• Disadvantage: Not appropriate for all data structures
Heap Organization
• Definition: Allocate from arbitrary locations
– Memory consists of allocated areas and free areas (or holes)
• When is it useful?
– Allocation and release are unpredictable
– Arbitrary list structures, complex data organizations
• Examples: new in C++, malloc() in C
• Advantage: Works with arbitrary allocation and free patterns
• Disadvantage: Ends up with small chunks of free space
[Diagram: heap layout alternating Free (16 bytes), Alloc (32 bytes), Free (16 bytes), Alloc (12 bytes)]
• How to allocate 24 bytes?
Fragmentation
• Definition: Free memory that is too small to be usefully allocated
– External: Visible to system
– Internal: Visible to process (e.g. if allocation occurs at some granularity)
• Goal
– Keep the number of holes small
– Keep the size of holes large
• Stack allocation
– All free space is contiguous in one large region
• How do we implement heap allocation?
Heap Implementation
• Data Structure: Linked list of free blocks
– Free list tracks storage not in use
• Allocation
– Choose a block large enough for the request
• According to policy criteria!
– Update pointers and size variable
• Free
– Add block to free list
– Merge adjacent free blocks:
if (addr of new block == prev_addr + prev_size)
    combine blocks
Project 2!!!
x86 and Linux
• Where is the heap managed?
– User space or kernel?
• brk() system call
– Expands or contracts the heap
• A lot like a stack
– Heap grows up
– Dedicated virtual address area
• Allocated space is then managed by the heap allocator
– Backed by page tables
Best vs. First vs. Worst
• Best fit
– Search the whole list on each allocation
– Choose the block that most closely matches the size of the request
– Can stop searching on an exact match
• First fit
– Allocate the first block that is large enough
– Rotating first fit: Start with the next free block each time
• Worst fit
– Allocate the largest block to the request (most leftover space)
• Which is best?
Examples
• Best algorithm: Depends on the sequence of requests
• Example: Memory contains 2 free blocks of size 20 and 15 bytes
– Allocation requests: 10 then 20 (best fit succeeds; first fit leaves holes of 10 and 15, so the 20 fails)
– Allocation requests: 8, 12, then 12 (first fit succeeds; best fit leaves holes of 7 and 8, so the last 12 fails)
Buddy Allocation
• Fast, simple allocation for blocks of 2^n bytes (Knuth 1968)
• Allocation Restrictions
– Block sizes 2^n
– Represent memory units (2^min_order) with a bitmap
• Allocation strategy for k bytes
– Round allocation request up to the nearest 2^n
– Search free list for appropriate size
• Recursively divide larger blocks until a block of the correct size is reached
• “Buddy” blocks remain free
• Free strategy
– Recursively coalesce block with buddy if buddy is free
– May coalesce lazily to avoid overhead
Example
• 1MB of memory
– Allocate: 70KB, 35KB, 80KB
– Free: 70KB, 35KB
Comparison of Allocation Strategies
• Best fit– Tends to leave some very large holes, some very small ones– Disadvantage: Very small holes can’t be used easily
• First fit– Tends to leave “average” size holes– Advantage: Faster than best fit
• Buddy– Organizes memory to minimize external fragmentation
• Leaves large chunks of free space• Faster to find hole of appropriate size
– Disadvantage: Internal fragmentation when the request is not a power of 2
Memory allocation in practice
• malloc() in C:
– Calls sbrk() to request more contiguous memory
– Adds a small header to each block of memory
• Pointer to next free block, or
• Size of block
– Where must the header be placed?
– Combination of two data structures
• Separate free list for each popular size
– Allocation is fast, no fragmentation
– Inefficient if some lists are empty while others have lots of free blocks
• First fit on a list of irregular free blocks
– Combine blocks and shuffle blocks between lists
Reclaiming Free Memory
• When can dynamically allocated memory be freed?
– Easy when a chunk is only used in one place
• Explicitly call free()
– Hard when information is shared
• Can’t be recycled until all sharers are finished
• Sharing is indicated by the presence of pointers to the data
– Without a pointer, can’t access (can’t find) the data
• Two possible problems
– Dangling pointers: Recycle storage while it’s still being used
– Memory leaks: Forget to free storage even when it can’t be used again
• Not a problem for short-lived user processes
• Issue for OS and long-running applications
Reference Counts
• Idea– Keep track of the number of references to each chunk of memory– When reference count reaches zero, free memory
• Example– Files and hard links in Unix– Smalltalk– Objects in distributed systems– Linux Kernel
• Disadvantages– Circular data structures -> memory leaks
Garbage Collection
• Idea– Storage isn’t freed explicitly (i.e. no free() operation)– Storage freed implicitly when no longer referenced
• Approach
– When the system needs storage, examine and collect free memory
• Advantages– Works with circular data structures– Makes life easier on the application programmer
Mark and Sweep Garbage Collection
• Requirements– Must be able to find all objects– Must be able to find all pointers to objects
• Compiler must cooperate by marking type of data in memory. Why?
• Two Passes– Pass 1: Mark
• Start with all statically allocated (where?) and procedure local variables (where?)
• Mark each object• Recursively mark all objects reachable via pointer
– Pass 2: Sweep• Go through all objects, free those not marked
Garbage Collection in Practice
• Disadvantages– Garbage collection is often expensive: 20% or more of CPU time– Difficult to implement
• Languages with garbage collection– LISP (emacs)– Java/C#– Scripting languages
• Conservative Garbage Collection– Idea: Treat all memory as pointers (what does this mean?)– Can be used for C and C++
I/O Devices
• Two primary aspects of a computer system
– Processing (CPU + Memory)
– Input/Output
• Role of Operating System
– Provide a consistent interface
• Simplify access to hardware devices
• Implement mechanisms for interacting with devices
– Allocate and manage resources
• Protection
• Fairness
– Obtain efficient performance
• Understand performance characteristics of the device
• Develop policies
I/O Subsystem
[Diagram: layered I/O subsystem — User Process → Kernel (Kernel I/O Subsystem → Device Drivers) in software; Device Controllers → Devices (SCSI bus, keyboard, mouse, PCI bus, GPU, hard disk) in hardware]
User View of I/O
• User processes cannot have direct access to devices
– Manage resources fairly
– Protect data from access-control violations
– Protect the system from crashing
• OS exports higher-level functions
– User process performs system calls (e.g. read() and write())
• Blocking vs. Nonblocking I/O
– Blocking: Suspends execution of the process until I/O completes
• Simple and easy to understand
• Inefficient
– Nonblocking: Returns from system calls immediately
• Process is notified when I/O completes
• Complex, but better performance
User View: Types of devices
• Character-stream
– Transfer one byte (character) at a time
– Interface: get() or put()
• Implemented as restricted forms of read()/write()
– Examples: keyboard, mouse, modem, console
• Block
– Transfer blocks of bytes as a unit (defined by hardware)
– Interface: read() and write()
• Random access: seek() specifies which bytes to transfer next
– Examples: disks and tapes
Kernel I/O Subsystem
• I/O scheduled from a pool of requests
– Requests rearranged to optimize efficiency
• Example: Disk requests are reordered to reduce head seeks
• Buffering
– Deals with different transfer rates
– Adjustable transfer sizes
• Fragmentation and reassembly
– Copy semantics
• Can the calling process reuse the buffer immediately?
• Caching: Avoid device accesses as much as possible
– I/O is SLOW
– Block devices can read ahead
Device Drivers
• Encapsulate details of the device
– Wide variety of I/O devices (different manufacturers and features)
– Kernel I/O subsystem not aware of hardware details
• Load at boot time or on demand
• IOCTLs: Special UNIX system call (I/O control)
– Alternative to adding a new system call
– Interface between user processes and device drivers
• Device-specific operations
• Looks like a system call, but also takes a file descriptor argument
– Why?
Device Driver: Device Configuration
• Interacts directly with the device controller
• Special instructions
– Valid only in kernel mode
• x86: in/out instructions
– No longer popular
• Memory-mapped I/O
– Read and write operations on special memory regions
• How are memory operations delivered to the controller?
– OS protects interfaces by not mapping the memory into user processes
– Some devices can map subsets of their I/O space to processes
• Buffer queues (e.g. network cards)
Interacting with Device Controllers
• How to know when I/O is complete?
• Polling
– Disadvantage: Busy waiting
• CPU cycles wasted when I/O is slow
• Often need to be careful with timing
• Interrupts
– Goal: Enable asynchronous events
– Device signals CPU by asserting the interrupt request line
– CPU automatically jumps to the Interrupt Service Routine
• Interrupt vector: Table of ISR addresses
• Indexed by interrupt number
– Lower-priority interrupts postponed until higher-priority ones finish
• Interrupts can nest
– Disadvantage: Interrupts “interrupt” processing
• Interrupt storms
Device Driver: Data transfer
• Programmed I/O (PIO)
– Initiate the operation and read in every byte/word of data
• Direct Memory Access (DMA)
– Offload data transfer work to a special-purpose processor
– CPU configures the DMA transfer
• Writes DMA command block into main memory
– Target addresses and transfer sizes
• Gives command block address to the DMA engine
– DMA engine transfers data from device to the memory specified in the command block
– DMA engine raises an interrupt when the entire transfer is complete
– Virtual or physical address?