CS 2510
OS Basics, cont’d
Dynamic Memory Allocation
• How does the system manage memory of a single process?
– View: Each process has a contiguous logical address space
Dynamic Storage Management
• Static (compile-time) allocation is not possible for all data
– Recursive procedures
• Even regular procedures are hard to predict (data dependencies)
– Complex data structures
• Storage is used inefficiently when reserved statically
– Must reserve enough to handle the worst case
• Dynamic interface:
– ptr = allocate(x bytes)
– free(ptr)
• Dynamic allocation can be handled in 2 ways
– Stack allocation
• Restricted, but simple and efficient
– Heap allocation
• More general, but less efficient
• Harder to implement
Stack Organization
• Definition: Memory is freed in the opposite order from allocation
– Alloc(A), Alloc(B), Alloc(C)
– Free(C), Free(B), Free(A)
• When is it useful?
– Memory allocation and freeing are partially predictable
– Allocation is hierarchical
– Examples
• Procedure call frames
• Tree traversal, expression evaluation, parsing
Stack Implementation
• Advance a pointer dividing allocated and free space
– Allocate: Increment pointer
– Free: Decrement pointer
– x86: Special ‘stack pointer’ register
• ‘SP’ (16-bit), ‘ESP’ (32-bit), ‘RSP’ (64-bit)
• Where does this register point to?
• How does the x86 allocate and free?
• Stack grows down
• Advantage– Keeps all the free space contiguous– Simple and efficient to implement
• Disadvantage: Not appropriate for all data structures
Heap Organization
• Definition: Allocate from arbitrary locations
– Memory consists of allocated areas and free areas (or holes)
• When is it useful?
– Allocation and release are unpredictable
– Arbitrary list structures, complex data organizations
• Examples: new in C++, malloc() in C
• Advantage: Works with arbitrary allocation and free patterns
• Disadvantage: Ends up with small chunks of free space
[Diagram: heap layout alternating Free (16 bytes), Alloc (32 bytes), Free (16 bytes), Alloc (12 bytes)]
• How to allocate 24 bytes?
Fragmentation
• Definition: Free memory that is too small to be usefully allocated
– External: Visible to system
– Internal: Visible to process (e.g. if allocation occurs at some granularity)
• Goal
– Keep the number of holes small
– Keep the size of holes large
• Stack allocation
– All free space is contiguous in one large region
• How do we implement heap allocation?
Heap Implementation
• Data Structure: Linked list of free blocks
– Free list tracks storage not in use
• Allocation
– Choose a block large enough for the request
• According to policy criteria!
– Update pointers and size variable
• Free
– Add block to free list
– Merge adjacent free blocks:
if (addr of new block == prev_addr + prev_size)
    combine blocks
Project 2!!!
x86 and Linux
• Where is the heap managed?
– User space or kernel?
• brk() system call
– Expands or contracts the heap
• A lot like a stack
– Heap grows up
– Dedicated virtual address area
• Allocated space is then managed by the heap allocator
– Backed by page tables
Best vs. First vs. Worst
• Best fit
– Search the whole list on each allocation
– Choose the block that most closely matches the size of the request
– Can stop searching on an exact match
• First fit
– Allocate the first block that is large enough
– Rotating first fit: Start with the next free block each time
• Worst fit
– Allocate the largest block to the request (most leftover space)
• Which is best?
Examples
• Best algorithm: Depends on the sequence of requests
• Example: Memory contains 2 free blocks of size 20 and 15 bytes
– Allocation requests: 10 then 20 (best fit succeeds; first fit leaves holes of 10 and 15, so the 20 fails)
– Allocation requests: 8, 12, then 12 (first fit succeeds; best fit leaves holes of 7 and 8, so the last 12 fails)
Buddy Allocation
• Fast, simple allocation for blocks of 2^n bytes (Knuth 1968)
• Allocation Restrictions
– Block sizes 2^n
– Represent memory units (2^min_order) with a bitmap
• Allocation strategy for k bytes
– Round allocation request up to the nearest 2^n
– Search free list for appropriate size
• Recursively divide larger blocks until a block of the correct size is reached
• “Buddy” blocks remain free
• Free strategy
– Recursively coalesce block with buddy if buddy is free
– May coalesce lazily to avoid overhead
Example
• 1MB of memory
– Allocate: 70KB, 35KB, 80KB
– Free: 70KB, 35KB
Comparison of Allocation Strategies
• Best fit– Tends to leave some very large holes, some very small ones– Disadvantage: Very small holes can’t be used easily
• First fit– Tends to leave “average” size holes– Advantage: Faster than best fit
• Buddy– Organizes memory to minimize external fragmentation
• Leaves large chunks of free space• Faster to find hole of appropriate size
– Disadvantage: Internal fragmentation when the request is not a power of 2
Memory allocation in practice
• malloc() in C:
– Calls sbrk() to request more contiguous memory
– Adds a small header to each block of memory
• Pointer to next free block, or
• Size of block
– Where must the header be placed?
– Combination of two data structures
• Separate free list for each popular size
– Allocation is fast, no fragmentation
– Inefficient if some lists are empty while others have lots of free blocks
• First fit on a list of irregular free blocks
– Combine blocks and shuffle blocks between lists
Reclaiming Free Memory
• When can dynamically allocated memory be freed?
– Easy when a chunk is only used in one place
• Explicitly call free()
– Hard when information is shared
• Can’t be recycled until all sharers are finished
• Sharing is indicated by the presence of pointers to the data
– Without a pointer, can’t access (can’t find) the data
• Two possible problems
– Dangling pointers: Recycle storage while it’s still being used
– Memory leaks: Forget to free storage even when it can’t be used again
• Not a problem for short-lived user processes
• Issue for OS and long-running applications
Reference Counts
• Idea– Keep track of the number of references to each chunk of memory– When reference count reaches zero, free memory
• Example– Files and hard links in Unix– Smalltalk– Objects in distributed systems– Linux Kernel
• Disadvantages– Circular data structures -> memory leaks
Garbage Collection
• Idea– Storage isn’t freed explicitly (i.e. no free() operation)– Storage freed implicitly when no longer referenced
• Approach
– When the system needs storage, examine and collect free memory
• Advantages– Works with circular data structures– Makes life easier on the application programmer
Mark and Sweep Garbage Collection
• Requirements– Must be able to find all objects– Must be able to find all pointers to objects
• Compiler must cooperate by marking type of data in memory. Why?
• Two Passes– Pass 1: Mark
• Start with all statically allocated (where?) and procedure local variables (where?)
• Mark each object• Recursively mark all objects reachable via pointer
– Pass 2: Sweep• Go through all objects, free those not marked
Garbage Collection in Practice
• Disadvantages– Garbage collection is often expensive: 20% or more of CPU time– Difficult to implement
• Languages with garbage collection– LISP (emacs)– Java/C#– Scripting languages
• Conservative Garbage Collection– Idea: Treat all memory as pointers (what does this mean?)– Can be used for C and C++
I/O Devices
• Two primary aspects of a computer system
– Processing (CPU + Memory)
– Input/Output
• Role of Operating System
– Provide a consistent interface
• Simplify access to hardware devices
• Implement mechanisms for interacting with devices
– Allocate and manage resources
• Protection
• Fairness
– Obtain efficient performance
• Understand performance characteristics of the device
• Develop policies
I/O Subsystem
[Diagram: layered I/O subsystem — User Process → Kernel (Kernel I/O Subsystem → Device Drivers) in software; Device Controllers → Devices (SCSI bus, keyboard, mouse, PCI bus, GPU, hard disk) in hardware]
User View of I/O
• User processes cannot have direct access to devices
– Manage resources fairly
– Protect data from access-control violations
– Protect the system from crashing
• OS exports higher-level functions
– User process performs system calls (e.g. read() and write())
• Blocking vs. Nonblocking I/O
– Blocking: Suspends execution of the process until I/O completes
• Simple and easy to understand
• Inefficient
– Nonblocking: Returns from system calls immediately
• Process is notified when I/O completes
• Complex, but better performance
User View: Types of devices
• Character-stream
– Transfer one byte (character) at a time
– Interface: get() or put()
• Implemented as restricted forms of read()/write()
– Examples: keyboard, mouse, modem, console
• Block
– Transfer blocks of bytes as a unit (defined by hardware)
– Interface: read() and write()
• Random access: seek() specifies which bytes to transfer next
– Examples: disks and tapes
Kernel I/O Subsystem
• I/O scheduled from a pool of requests
– Requests rearranged to optimize efficiency
• Example: Disk requests are reordered to reduce head seeks
• Buffering
– Deals with different transfer rates
– Adjustable transfer sizes
• Fragmentation and reassembly
– Copy semantics
• Can the calling process reuse the buffer immediately?
• Caching: Avoid device accesses as much as possible
– I/O is SLOW
– Block devices can read ahead
Device Drivers
• Encapsulate details of the device
– Wide variety of I/O devices (different manufacturers and features)
– Kernel I/O subsystem not aware of hardware details
• Load at boot time or on demand
• IOCTLs: Special UNIX system call (I/O control)
– Alternative to adding a new system call
– Interface between user processes and device drivers
• Device-specific operations
• Looks like a system call, but also takes a file descriptor argument
– Why?
Device Driver: Device Configuration
• Interacts directly with the device controller
• Special instructions
– Valid only in kernel mode
• x86: in/out instructions
– No longer popular
• Memory-mapped I/O
– Read and write operations on special memory regions
• How are memory operations delivered to the controller?
– OS protects interfaces by not mapping the memory into user processes
– Some devices can map subsets of their I/O space to processes
• Buffer queues (e.g. network cards)
Interacting with Device Controllers
• How to know when I/O is complete?
• Polling
– Disadvantage: Busy waiting
• CPU cycles wasted when I/O is slow
• Often need to be careful with timing
• Interrupts
– Goal: Enable asynchronous events
– Device signals CPU by asserting the interrupt request line
– CPU automatically jumps to the Interrupt Service Routine
• Interrupt vector: Table of ISR addresses
• Indexed by interrupt number
– Lower-priority interrupts postponed until higher-priority ones finish
• Interrupts can nest
– Disadvantage: Interrupts “interrupt” processing
• Interrupt storms
Device Driver: Data transfer
• Programmed I/O (PIO)
– Initiate the operation and read in every byte/word of data
• Direct Memory Access (DMA)
– Offload data transfer work to a special-purpose processor
– CPU configures the DMA transfer
• Writes DMA command block into main memory
– Target addresses and transfer sizes
• Gives command block address to the DMA engine
– DMA engine transfers data from device to the memory specified in the command block
– DMA engine raises an interrupt when the entire transfer is complete
– Virtual or physical address?