Operating System Organization (Part 1)
Distributed File Systems (Part 2)
Jay [email protected]
CS 543 Operating Systems
Operating System Design Overview
• OS Characteristics
• Types of Kernels
• Monolithic Kernels
• Microkernels
• Hybrid Kernels
• Examples
• Mach
• Amoeba
• Plan 9
• Windows NT
Operating System Organization
• What is the best way to design an operating system?
• Put another way, what are the important software characteristics of an OS?
• Decide on those, then design to match them
Important OS Software Characteristics
• Correctness and simplicity
• Performance
• Extensibility and portability
• Suitability for distributed and parallel systems
• Compatibility with existing systems
• Security and fault tolerance
Kernel OS Designs
Similar to layers, but only two OS layers
Kernel OS services
Non-kernel OS services
Move certain functionality outside kernel
file systems, libraries
Unlike virtual machines, kernel doesn’t stand alone
Examples - Most modern Unix systems
Pros/Cons of Kernel OS Organization
+ Many advantages of layering, without disadvantage of too many layers
+ Easier to demonstrate correctness
– Not as general as layering
– Offers no organizing principle for other parts of OS, user services
– Kernels tend to grow to monoliths
Monolithic Kernel Design
Build tightly coupled OS (originally in single module)
Hopefully using data abstraction, compartmentalized function, etc.
Provides a virtual interface over computer hardware with primitives for system services in one or more modules
All modules run in same address space -- issues? advantages?
Kernel and device drivers are in single space in kernel mode
Examples
DOS (DR-DOS, MS-DOS)
*nix systems (FreeBSD, NetBSD)
OpenVMS
Mac OS (up to 8.6)
Windows 9x (95, 98, 98SE, Me)
Pros/Cons of Monolithic Design
• Pros
• Speed
• Simplicity of design
• Cons
• Potential stability issues
• Can become huge - Linux 2.6 has 7.0 million lines of code
• Potentially difficult to maintain
Microkernel OS Design
Like kernels, only less so
Try to include only small set of required services in the microkernel
Moves even more out of innermost OS part
Like parts of VM, IPC, paging, etc.
Examples - Mach, Amoeba, Plan 9, Windows NT, Chorus
Pros/Cons of Microkernel Organization
+ Those of kernels, plus:
+ Minimizes code for most important OS services
+ Offers model for entire system
– Microkernels tend to grow into kernels
– Requires very careful initial design choices
– Serious danger of bad performance
Object-Oriented OS Design
Design internals of OS as set of privileged objects, using OO methods
Sometimes extended into application space
Tends to lead to client/server style of computing
Examples
Mach (internally)
Spring (totally)
Pros/Cons of Object Oriented OS Organization
+ Offers organizational model for entire system
+ Easily divides system into pieces
+ Good hooks for security
– Can be a limiting model
– Must watch for performance problems
Some Important Microkernel Designs
Mach
Amoeba
Plan 9
Windows NT
Micro-ness is in the eye of the beholder
Mach
Mach didn’t start life as a microkernel
Became one in Mach 3.0
Object-oriented internally
Doesn’t force OO at higher levels
Microkernel focus is on communications facilities
Much concern with parallel/distributed systems
Mach Model
[Figure] User space holds a software emulation layer (4.3BSD, SysV, HP/UX, and other emulators) running above user processes; the microkernel sits beneath them in kernel space.
What’s In the Mach Microkernel?
Tasks & Threads
Ports and Port Sets
Messages
Memory Objects
Device Support
Multiprocessor/Distributed Support
Mach Tasks
An execution environment providing basic unit of resource allocation
Contains
Virtual address space
Port set
One or more threads
Mach Task Model
[Figure] A task in user space: an address space containing threads, connected to the kernel through its process port, bootstrap port, exception port, and registered ports.
Mach Threads
Basic unit of Mach execution
Runs in context of one task
All threads in one task share its resources
Unix process similar to Mach task with single thread
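The task/thread relationship above can be sketched with ordinary Python threads. This is only an illustration of the model (a task as shared state, threads as the units of execution); the names are invented and Mach's actual interface differs.

```python
# A minimal sketch of the task/thread split: the "task" is just shared state
# (standing in for its address space); threads are the units of execution.
import threading

def run_task(n_threads=4, increments=1000):
    """All threads run in the context of one task and share its resources."""
    counter = {"value": 0}            # resource shared by every thread in the task
    lock = threading.Lock()           # shared state needs synchronization
    def worker():
        for _ in range(increments):
            with lock:
                counter["value"] += 1
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

print(run_task())  # 4 threads x 1000 increments each = 4000
```

A Unix process then corresponds to running such a task with `n_threads=1`.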
Task and Thread Scheduling
Very flexible
Controllable by kernel or user-level programs
Threads of single task can execute in parallel
On single processor
Multiple processors
User-level scheduling can extend to multiprocessor scheduling
Mach Ports
Basic Mach object reference mechanism
Kernel-protected communication channel
Tasks communicate by sending messages to ports
Threads in receiving tasks pull messages off a queue
Ports are location independent
Port queues protected by kernel; bounded
Port Rights
• mechanism by which tasks control who may talk to their ports
• Kernel prevents messages being sent to a port unless the sender has its port rights
• Port rights also control which single task receives on a port
Port Sets
• A group of ports sharing a common message queue
• A thread can receive messages from a port set
• Thus servicing multiple ports
• Messages are tagged with the actual port
• A port can be a member of at most one port set
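The queueing behavior described above can be sketched as follows. Class and method names are assumptions made for the example, not the Mach interface; the point is the shared, bounded queue and the port tag on each message.

```python
# Illustrative sketch of ports and port sets, not the Mach API.
from collections import deque

class PortSet:
    """Member ports share one message queue; messages carry their port's name."""
    def __init__(self):
        self.queue = deque()
        self.members = set()
    def add(self, port):
        assert port.port_set is None, "a port belongs to at most one port set"
        port.port_set = self
        self.members.add(port)
    def receive(self):
        # One thread can service every member port from this single queue.
        return self.queue.popleft() if self.queue else None

class Port:
    def __init__(self, name, capacity=8):
        self.name, self.capacity = name, capacity
        self.port_set = None
        self.queue = deque()
    def send(self, message):
        target = self.port_set.queue if self.port_set else self.queue
        if len(target) >= self.capacity:
            raise BufferError("port queues are bounded")  # kernel would block/refuse
        target.append((self.name, message))               # tag with the actual port

ctl, data, pset = Port("ctl"), Port("data"), PortSet()
pset.add(ctl)
pset.add(data)
data.send("block 0")
ctl.send("reset")
print(pset.receive())  # ('data', 'block 0') -- FIFO across the shared queue
```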
Mach Messages
Typed collection of data objects
Unlimited size
Sent to particular port
May contain actual data or pointer to data
Port rights may be passed in a message
Kernel inspects messages for particular data types (like port rights)
Mach Memory Objects
A source of memory accessible by tasks
May be managed by user-mode external memory manager
a file managed by a file server
Accessed by messages through a port
Kernel manages physical memory as cache of contents of memory objects
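The cache relationship on this slide can be sketched as below. This is a toy model, not Mach's interface: the pager callable stands in for an external memory manager, and only page-in misses reach it.

```python
# Sketch: physical memory treated as a cache over a pager-backed memory object.
class MemoryObject:
    def __init__(self, pager):
        self.pager = pager        # external memory manager, e.g. a file server
        self.resident = {}        # page number -> data currently "in memory"
    def read_page(self, n):
        if n not in self.resident:
            self.resident[n] = self.pager(n)   # miss: page-in request to the manager
        return self.resident[n]
    def evict(self, n):
        self.resident.pop(n, None)             # kernel may reclaim cached pages

calls = []
mo = MemoryObject(lambda n: calls.append(n) or f"<page {n}>")
mo.read_page(3)
mo.read_page(3)
print(calls)  # [3] -- the second read hit the cache, no pager traffic
```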
Mach Device Support
Devices represented by ports
Messages control the device and its data transfer
Actual device driver outside the kernel in an external object
Mach Multiprocessor and Distributed System Support
Messages and ports can extend across processor/machine boundaries
Location transparent entities
Kernel manages distributed hardware
Per-processor data structures, but also structures shared across the processors
Intermachine messages handled by a server that knows about network details
Mach’s NetMsgServer
• User-level capability-based networking daemon
• Handles naming and transport for messages
• Provides world-wide name service for ports
• Messages sent to off-node ports go through this server
NetMsgServer in Action
[Figure] A sending user process passes its message through the kernel to the local NetMsgServer, which forwards it across the network to the receiving machine's NetMsgServer for delivery to the receiving user process.
Mach and User Interfaces
Mach was built for the UNIX community
UNIX programs don’t know about ports, messages, threads, and tasks
How do UNIX programs run under Mach?
Mach typically runs a user-level server that offers UNIX emulation
Either provides UNIX system call semantics internally or translates them to Mach primitives
Amoeba
Amoeba presents transparent distributed computing environment (a la timesharing)
Major components
processor pools
server machines
X-terminals
gateway servers for off-LAN communications
Amoeba Software Model
[Figure] User space holds processes, each an address space containing threads; the kernel beneath provides process management, memory management, communications, and I/O.
Amoeba Processes
Similar to Mach processes
Process has multiple threads
But each thread has a dedicated portion of a shared address space
Thread scheduling by microkernel
Amoeba Memory Management
Amoeba microkernel supports concept of segments
To avoid the heavy cost of fork across machine boundaries
A segment is a set of memory blocks
Segments can be mapped in/out of address spaces
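Amoeba segments are not POSIX objects, but the idea of mapping a set of memory blocks into and out of an address space resembles an anonymous `mmap`, which can stand in for it here:

```python
# Rough analogue of segment map-in/map-out using an anonymous memory mapping.
import mmap

seg = mmap.mmap(-1, 4096)        # "create segment": 4 KB of anonymous memory
seg[0:5] = b"hello"              # write through the mapping
data = bytes(seg[0:5])           # read it back
seg.close()                      # "unmap segment" from the address space
print(data)  # b'hello'
```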
Remote Procedure Call
Fundamental Amoeba IPC mechanism
Amoeba RPC is thread-to-thread
Microkernel handles on/off machine invocation of RPC
Plan 9
Everything in Plan 9 is a file system (almost)
Processes
Files
IPC
Devices
Only a few operations are required for files
Text-based interface
File Systems in Plan 9
File systems consist of a hierarchical tree
Can be persistent or temporary
Can represent simple or complex entities
Can be implemented
In the kernel as a driver
As a user level process
By remote servers
Sample Plan 9 File Systems
Device file systems - Directory containing data and ctl file
Process file systems - Directory containing files for memory, text, control, etc.
Network interface file systems
Plan 9 Channels and Mounting
A channel is a file descriptor
Since a file can be anything, a channel is a general pointer to anything
Plan 9 provides 9 primitives on channels
Mounting is used to bring resources into a user’s name space
Users start with minimal name space, build it up as they go along
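The build-it-up-by-mounting idea can be sketched with a toy name space. Names and structures here are invented for illustration and do not follow the real Plan 9 or 9P interfaces; a "channel" is reduced to whatever object the name resolves to.

```python
# Toy sketch of a per-user name space assembled by mounting resources into it.
class Namespace:
    def __init__(self):
        self.mounts = {}                 # mount point -> file server (a dict here)
    def mount(self, point, server):
        self.mounts[point] = server      # graft a resource into the name space
    def open(self, path):
        """Return a 'channel': here, just the object the name resolves to."""
        for point, server in self.mounts.items():
            if path.startswith(point + "/"):
                name = path[len(point) + 1:]
                if name in server:
                    return server[name]
        raise FileNotFoundError(path)

ns = Namespace()                          # user starts with a minimal name space
ns.mount("/proc", {"ctl": "process control", "mem": "process memory"})
ns.mount("/net", {"tcp": "tcp interface"})
print(ns.open("/proc/ctl"))  # process control
```

Because processes, devices, and network interfaces all appear as file servers, the same `open` path works for any of them.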
Typical User Operation in Plan 9
User logs in to a terminal
Provides bitmap display and input
Minimal name space is set up on login
Mounts used to build space
Pooled CPU servers used for compute tasks
Substantial caching used to make required files local
Windows NT
More layered than some microkernel designs
NT Microkernel provides base services
Executive builds on base services via modules to provide user-level services
User-level services used by
privileged subsystems (parts of OS)
true user programs
Windows NT Diagram
[Figure] Layers from bottom up: hardware, then the microkernel and the Executive in kernel mode; protected subsystems (Win32, POSIX) and user processes in user mode.
NT Microkernel
Thread scheduling
Process switching
Exception and interrupt handling
Multiprocessor synchronization
Only NT part not preemptible or pageable
All other NT components run in threads
NT Executive
Higher level services than microkernel
Runs in kernel mode
but separate from the microkernel itself
ease of change and expansion
Built of independent modules
all preemptible and pageable
NT Executive Modules
Object manager
Security reference monitor
Process manager
Local procedure call facility (a la RPC)
Virtual memory manager
I/O manager
Windows NT Threads
Executable entity running in an address space
Scheduled by kernel
Handled by kernel’s dispatcher
Kernel works with stripped-down view of thread - kernel thread object
Multiple process threads can execute on distinct processors--even Executive ones
Microkernel Process Objects
A microkernel proxy for the real process
Microkernel’s interface to the real process
Contains pointers to the various resources owned by the process
e.g., threads and address spaces
Alterable only by microkernel calls
Microkernel Thread Objects
As microkernel process objects are proxies for the real object, microkernel thread objects are proxies for the real thread
One per thread
Contains minimal information about thread
Priorities, dispatching state
Used by the microkernel for dispatching
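The stripped-down thread object can be sketched as a small record plus a dispatcher. Field names are illustrative, not the actual NT structures; the point is that the proxy carries only what dispatching needs.

```python
# Sketch of a stripped-down kernel thread object and a priority dispatcher.
from dataclasses import dataclass

@dataclass
class KernelThread:
    tid: int
    priority: int          # minimal information: what dispatching needs
    state: str = "ready"   # dispatching state

def dispatch(threads):
    """Pick the highest-priority ready thread, if any."""
    ready = [t for t in threads if t.state == "ready"]
    return max(ready, key=lambda t: t.priority) if ready else None

ts = [KernelThread(1, 8), KernelThread(2, 24, "waiting"), KernelThread(3, 13)]
print(dispatch(ts).tid)  # 3 -- thread 2 has higher priority but is not ready
```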
Basic Distributed FS Concepts
• You are here, the file’s there, what do you do about it?
• Important questions
• What files can I access?
• How do I name them?
• How do I get the data?
• How do I synchronize with others?
What files can be accessed?
• Several possible choices
• Every file in the world
• Every file stored in this kind of system
• Every file in my local installation
• Selected volumes
• Selected individual files
What dictates the proper choice?
• Why not make every file available?
• Naming issues
• Scaling issues
• Local autonomy
• Security
• Network traffic
Naming Files in a Distributed System
• How much transparency?
• Does every user/machine/sub-network need its own namespace?
• How do I find a site that stores the file that I name? Is it implicit in the name?
• Can my naming scheme scale?
• Must everyone agree on my scheme?
How do I get data for non-local files?
• Fetch it over the network?
• How much caching?
• Replication?
• What security is required for data transport?
Synchronization and Consistency
• Will there be trouble if multiple sites want to update a file?
• Can I get any guarantee that I always see consistent versions of data?
• i.e., will I ever see old data after new?
• How soon do I see new data?
The Andrew File System
• A different approach to remote file access
• Meant to service a large organization
• Such as a university campus
• Scaling is a major goal
Basic Andrew Model
• Files are stored permanently at file server machines
• Users work from workstation machines
• With their own private namespace
• Andrew provides mechanisms to cache a user's files from the shared namespace
User Model of AFS Use
• Sit down at any AFS workstation anywhere
• Log in and authenticate who I am
• Access all files without regard to which workstation I’m using
The Local Namespace
• Each workstation stores a few files
• Mostly systems programs and configuration files
• Workstations are treated as generic, interchangeable entities
Virtue and Vice
• Vice is the system run by the file servers
• Distributed system
• Virtue is the protocol client workstations use to communicate to Vice
Overall Architecture
• System is viewed as a WAN composed of LANs
• Each LAN has a Vice cluster server
• Which stores local files
• But Vice makes all files available to all clients
Caching the User Files
• Goal is to offload work from servers to clients
• When must servers do work?
• To answer requests
• To move data
• Whole files cached at clients
Why Whole-File Caching?
• Minimizes communications with server
• Most files used in entirety, anyway
• Easier cache management problem
• Requires substantial free disk space on workstations
- Doesn’t address huge file problems
The Shared Namespace
• An Andrew installation has global shared namespace
• All clients see the same files in the namespace, with the same names
• High degree of name and location transparency
How do servers provide the namespace?
• Files are organized into volumes
• Volumes are grafted together into overall namespace
• Each file has globally unique ID
• Volumes are stored at individual servers
• But a volume can be moved from server to server
Finding a File
• At high level, files have names
• Directory translates name to unique ID
• If client knows where the volume is, it simply sends unique ID to appropriate server
Finding a Volume
• What if you enter a new volume?
• How do you find which server stores the volume?
• Volume-location database stored on each server
• Once information on volume is known, client caches it
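The two-step resolution on the last two slides can be sketched as follows. The data structures are invented for the example (real AFS fids and the volume-location database are richer); what matters is that only a cache miss costs a server query.

```python
# Sketch: name -> unique ID via a directory, volume -> server via a cached
# copy of the volume-location database.
class Client:
    def __init__(self, location_db):
        self.location_db = location_db    # stands in for any server's replicated DB
        self.volume_cache = {}            # client-cached volume -> server info
        self.queries = 0
    def locate(self, volume):
        if volume not in self.volume_cache:
            self.queries += 1             # only a cache miss touches a server
            self.volume_cache[volume] = self.location_db[volume]
        return self.volume_cache[volume]
    def lookup(self, directory, name):
        volume, fileid = directory[name]  # directory maps name -> unique ID
        return self.locate(volume), (volume, fileid)

c = Client({"vol.user": "server-a"})
home = {"notes.txt": ("vol.user", 42)}
print(c.lookup(home, "notes.txt"))  # ('server-a', ('vol.user', 42))
print(c.queries)                    # 1 -- later lookups in vol.user hit the cache
```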
Moving a Volume
• When a volume moves from server to server, update database
• Heavyweight distributed operation
• What about clients with cached information?
• Old server maintains forwarding info
• Also eases server update
Handling Cached Files
• Client can cache all or part of a file
• Files fetched transparently when needed
• File system traps opens
• Sends them to local Venus process
The Venus Daemon
• Responsible for handling single client cache
• Caches files on open
• Writes modified versions back on close
• Cached files saved locally after close
• Cache directory entry translations, too
Consistency for AFS
• If my workstation has a locally cached copy of a file, what if someone else changes it?
• Callbacks used to invalidate my copy
• Requires servers to keep info on who caches files
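The callback bookkeeping can be sketched like this. Class and method names are invented for illustration; the essential behavior is that the server remembers who caches each file and breaks the other holders' callbacks on an update.

```python
# Sketch of AFS-style callbacks: server-side tracking, client-side invalidation.
class CacheClient:
    def __init__(self):
        self.cached = set()
    def invalidate(self, f):
        self.cached.discard(f)      # callback broken: refetch on next open

class Server:
    def __init__(self):
        self.callbacks = {}         # file -> clients holding a callback promise
    def fetch(self, f, client):
        client.cached.add(f)
        self.callbacks.setdefault(f, set()).add(client)
    def store(self, f, writer):
        """A write breaks every other client's callback on the file."""
        for c in self.callbacks.get(f, set()) - {writer}:
            c.invalidate(f)
        self.callbacks[f] = {writer}

a, b, s = CacheClient(), CacheClient(), Server()
s.fetch("paper.tex", a)
s.fetch("paper.tex", b)
s.store("paper.tex", a)
print("paper.tex" in b.cached)  # False -- b's copy was invalidated
```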
Write Consistency in AFS
• What if I write to my cached copy of a file?
• Need to get write permission from server
• Which invalidates anyone else’s callback
• Permission obtained on open for write
• Need to obtain new data at this point
Write Consistency in AFS, Con’t
• Initially, written only to local copy
• On close, Venus sends update to server
• Server will invalidate callbacks for other copies
• Extra mechanism to handle failures
Storage of Andrew Files
• Stored in UNIX file systems
• Client cache is a directory on local machine
• Low-level names do not match Andrew names
Venus Cache Management
• Venus keeps two caches
• Status
• Data
• Status cache kept in virtual memory
• For fast attribute lookup
• Data cache kept on disk
Venus Process Architecture
• Venus is single user process
• But multithreaded
• Uses RPC to talk to server
• RPC is built on low level datagram service
AFS Security
• Only server/Vice are trusted here
• Client machines might be corrupted
• No client programs run on Vice machines
• Clients must authenticate themselves to servers
• Encryption used to protect transmissions
AFS File Protection
• AFS supports access control lists
• Each file has list of users who can access it
• And permitted modes of access
• Maintained by Vice
• Used to mimic UNIX access control
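An access-control-list check like the one Vice performs can be sketched in a few lines. The entry format here is an assumption made for the example, not the real Vice representation.

```python
# Sketch of an ACL check: each entry pairs a user with permitted access modes.
def can_access(acl, user, mode):
    """acl: list of (user, modes) pairs, e.g. ('alice', 'rw')."""
    return any(u == user and mode in modes for u, modes in acl)

acl = [("alice", "rw"), ("bob", "r")]
print(can_access(acl, "bob", "w"))    # False -- bob may read but not write
print(can_access(acl, "alice", "w"))  # True
```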