
Unix Project


History of UNIX:

In order to define UNIX, it helps to look at its history. In 1969, Ken Thompson, Dennis Ritchie and others started work on what was to become UNIX on a "little-used PDP-7 in a corner" at AT&T Bell Labs. For ten years, the development of UNIX proceeded at AT&T in numbered versions. V4 (1974) was rewritten in C, a major milestone for the operating system's portability among different systems. V6 (1975) was the first version to become available outside Bell Labs; it became the basis of the first version of UNIX developed at the University of California, Berkeley.

Bell Labs continued work on UNIX into the 1980s, culminating in the release of System V (as in "five," not the letter) in 1983 and System V, Release 4 (abbreviated SVR4) in 1989. Meanwhile, programmers at the University of California hacked mightily on the source code AT&T had released, leading to many a master's thesis. The Berkeley Software Distribution (BSD) became a second major variant of UNIX. It was widely deployed in both university and corporate computing environments starting with the release of BSD 4.2 in 1984. Some of its features were incorporated into SVR4.

As the 1990s opened, AT&T's source code licensing had created a flourishing market for hundreds of UNIX variants by different manufacturers. AT&T sold its UNIX business to Novell in 1993, and Novell sold it to the Santa Cruz Operation two years later. In the meantime, the UNIX trademark had been passed to the X/Open consortium, which eventually merged to form The Open Group.

While the stewardship of UNIX was passing from entity to entity, several long-running development efforts started bearing fruit. Traditionally, in order to get a BSD system working, you needed a source code license from AT&T. But by the early 1990s, Berkeley hackers had done so much work on BSD that most of the original AT&T source code was long gone. A succession of programmers, starting with William and Lynne Jolitz, worked on the Net distribution of BSD, leading to the release of 386BSD version 0.1 on Bastille Day, 1992. This original "free source" BSD was spun out into three major distributions, each of which has a dedicated following: NetBSD, FreeBSD, and OpenBSD, all of which are based on BSD 4.4.

BSD wasn't the first attempt at a "free" UNIX. In 1984, programmer Richard Stallman started work on a free UNIX clone known as GNU (GNU's Not UNIX). By the early 1990s, the GNU Project had achieved several programming milestones, including the release of the GNU C library and the Bourne Again SHell (bash).


The whole system was basically finished, except for one critical element: a working kernel.

Enter Linus Torvalds, a student at the University of Helsinki in Finland. Linus looked at a small UNIX system called Minix and decided he could do better. In the fall of 1991, he released the source code for a freeware kernel called "Linux" (a combination of his first name and Minix, pronounced lynn-nucks). By 1994, Linus and a far-flung team of kernel hackers were able to release version 1.0 of Linux. Linus and friends had a free kernel; Stallman and friends had the rest of a free UNIX clone system: people could then put the Linux kernel together with GNU to make a complete free system. This system is known as "Linux," though Stallman prefers the appellation "GNU/Linux system."

There are several distinct GNU/Linux distributions: some are available with commercial support from companies like Red Hat, Caldera Systems, and S.U.S.E.; others, like Debian GNU/Linux, are more closely aligned with the original free software concept. The spread of Linux, now up to kernel version 2.2, has been a startling phenomenon. Linux runs on several different chip architectures and has been adopted or supported to varying extents by several old-line UNIX vendors like Hewlett-Packard, Silicon Graphics, and Sun Microsystems, by PC vendors like Compaq and Dell, and by major software vendors like Oracle and IBM. Perhaps the most delicious irony has been the response of Microsoft, which acknowledges the competitive threat of ubiquitous free software but seems unwilling or unable to respond with open-source software of its own.

Microsoft has, however, struck blows with Windows NT (Windows 2000). During the late 1990s, vendor after vendor abandoned the UNIX server platform in favor of Windows NT or wavered in their support. Silicon Graphics Inc., for example, decided that Intel hardware and NT were the graphics platform of the future.

The phenomenon of old-line UNIX vendors jumping ship and the concurrent rush to Linux by vendors large and small brings us back to the question at the top of this section: What is UNIX? While one can abide by the legal definition as embodied in the trademark, I believe that this does a major disservice to the industry. As the base software of the Internet, UNIX technology is one of the significant achievements of 20th century civilization. To restrict it to a narrow legal or technical definition, as formulated by some of the vendors now abandoning it, is to deny its ongoing relevance and importance, which is most evident in the amazing popularity and strength of UNIX-like clones such as GNU/Linux and BSD.


Unix Processes:

When you execute a program on your UNIX system, the system creates a special environment for that program. This environment contains everything needed for the system to run the program as if no other program were running on the system.

Whenever you issue a command in UNIX, it creates, or starts, a new process. When you tried out the ls command to list directory contents, you started a process. A process, in simple terms, is an instance of a running program.

The operating system tracks processes through a five-digit ID number known as the pid, or process ID. Each process in the system has a unique pid. Pids eventually repeat, because all the possible numbers are used up and the next pid rolls over, or starts over. At any one time, no two processes with the same pid exist in the system, because it is the pid that UNIX uses to track each process.

A process is an instance of running a program. If, for example, three people are running the same program simultaneously, there are three processes, not just one. In fact, we might have more than one process running even with only one person executing the program, because the program can "split into two," making two processes out of one.

Starting a Process:

When you start a process (run a command), there are two ways you can run it:

• Foreground Processes
• Background Processes

Types of Processes:

Foreground Processes:

By default, every process that you start runs in the foreground. It gets its input from the keyboard and sends its output to the screen. You can see this happen with the ls command. If I want to list all the files in my current directory, I can use the following command:

    $ ls ch*.doc

The process runs in the foreground, the output is directed to my screen, and if the ls command wants any input (which it does not), it waits for it from the keyboard. While a program is running in the foreground and taking a long time, we cannot run any other commands (start any other processes), because the prompt is not available until the program finishes its processing and exits.

Background Processes:

A background process runs without being connected to your keyboard. If the background process requires any keyboard input, it waits. The advantage of running a process in the background is that you can run other commands; you do not have to wait until it completes before starting another! The simplest way to start a background process is to add an ampersand (&) at the end of the command:

    $ ls ch*.doc &

Here, if the ls command wants any input (which it does not), it goes into a stop state until I move it into the foreground and give it the data from the keyboard. If you press the Enter key now, you see the following:

    [1] + Done ls ch*.doc &
    $

The first line tells you that the ls command background process finished successfully. The second is a prompt for another command.
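A shell runs a foreground command by forking a child and waiting for it, and a background command by simply not waiting. A minimal C sketch of that mechanism, using fork(), execlp(), and waitpid(); the "ls -l" command line is just an example:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Run "ls -l" either in the foreground (wait for it) or in the
       background (return to the "prompt" immediately). */
    static void run_ls(int background)
    {
        pid_t pid = fork();                 /* create a new process */
        if (pid < 0) {
            perror("fork");
            exit(1);
        }
        if (pid == 0) {                     /* child: become the ls program */
            execlp("ls", "ls", "-l", (char *)NULL);
            perror("execlp");               /* reached only if exec fails */
            _exit(127);
        }
        if (background)
            printf("[1] %d\n", (int)pid);   /* shell-style job notice */
        else
            waitpid(pid, NULL, 0);          /* foreground: block until done */
    }

    int main(void)
    {
        run_ls(0);   /* foreground: output finishes before we continue */
        run_ls(1);   /* background: we get "the prompt back" at once */
        return 0;
    }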


Listing Running Processes:

It is easy to see your own processes by running the ps (process status) command. One of the most commonly used flags for ps is the -f (f for full) option, which provides more information, as shown in the following example:

    $ ps -f

Here is the description of all the fields displayed by the ps -f command:

    UID      The user ID that this process belongs to
    PID      The process ID
    PPID     The parent process ID
    C        CPU utilization of the process
    STIME    Process start time
    TTY      The terminal type associated with the process
    TIME     The CPU time taken by the process
    CMD      The command that started this process

There are other options which can be used along with the ps command.

Stopping Processes:

Ending a process can be done in several different ways. Often, from a console-based command, sending a CTRL+C keystroke (the default interrupt character) will exit the command. This works when the process is running in foreground mode.


If a process is running in background mode, you first need to get its process ID using the ps command, and then use the kill command to kill the process as follows:

Here the kill command would terminate the first_one process. If a process ignores a regular kill command, you can use kill -9 followed by the process ID, as follows:
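The same escalation, a polite kill followed by kill -9, can also be performed from C through the kill(2) system call. A minimal sketch; the PID is a made-up example standing in for one found with ps:

    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t victim = 6738;               /* hypothetical PID taken from ps */

        if (kill(victim, SIGTERM) == -1)   /* polite request to terminate */
            perror("kill(SIGTERM)");

        sleep(2);                          /* give it a moment to clean up */

        if (kill(victim, 0) == 0)          /* signal 0 just tests existence */
            kill(victim, SIGKILL);         /* kill -9: cannot be ignored */
        return 0;
    }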

Parent and Child Processes:

Each UNIX process has two ID numbers assigned to it: the process ID (pid) and the parent process ID (ppid). Each user process in the system has a parent process. Most of the commands that you run have the shell as their parent. Check the ps -f example above, where the command listed both the process ID and the parent process ID.

Zombie and Orphan Processes:

Normally, when a child process is killed, the parent process is told via a SIGCHLD signal. Then the parent can do some other task or restart a new child as needed. However, sometimes the parent process is killed before its child. In this case, the "parent of all processes," the init process, becomes the new parent (its PID becomes the child's new PPID). Such processes are called orphan processes.

When a process is killed, a ps listing may still show the process with a Z state. This is a zombie, or defunct, process. The process is dead and not being used. Zombies are different from orphan processes: they are processes that have completed execution but still have an entry in the process table.
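Zombies are easy to observe with a few lines of C. An illustrative sketch (assuming a POSIX system): while the parent sleeps, the exited child shows up in ps with state Z; the wait() call then reaps it and removes its process-table entry.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0) {                       /* child */
            printf("child: pid=%d ppid=%d\n",
                   (int)getpid(), (int)getppid());
            _exit(0);                         /* child dies first... */
        }
        sleep(5);   /* ...so for these 5 seconds it is a zombie:
                       run "ps -l" in another terminal to see state Z */
        wait(NULL); /* parent reaps the child; the zombie disappears */
        return 0;
    }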


Daemon Processes:

A daemon is a long-running background process that answers requests for services. (The name is sometimes expanded as Disk and Execution Monitor.) The term originated with UNIX, but most operating systems use daemons in some form or another. In UNIX, the names of daemons conventionally end in "d". Some examples include inetd, httpd, nfsd, sshd, named, and lpd.

Daemons are system-related background processes that often run with the permissions of root and service requests from other processes. A daemon process has no controlling terminal and cannot open /dev/tty; if you do a "ps -ef" and look at the TTY field, all daemons have a "?" for the tty. More plainly, a daemon is just a process that runs in the background, usually waiting for something to happen that it is capable of working with: a printer daemon, for example, waits for print commands. If you have a program which needs to do long processing, it is worth making it a daemon and running it in the background.
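The classic way a program turns itself into a daemon is to fork, let the parent exit, and start a new session so that it loses its controlling terminal. A simplified C sketch of that sequence (error handling omitted; real daemons usually fork twice and set the umask as well):

    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        if (fork() > 0)         /* parent exits; child is adopted by init */
            exit(0);
        setsid();               /* new session: no controlling terminal,
                                   so ps -ef shows '?' in the TTY field */
        chdir("/");             /* do not pin any mounted filesystem */

        int fd = open("/dev/null", O_RDWR);   /* detach the std streams */
        dup2(fd, 0);
        dup2(fd, 1);
        dup2(fd, 2);

        for (;;) {              /* the long-running service loop */
            /* wait for and service requests here */
            sleep(60);
        }
    }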

The top Command:

The top command is a very useful tool for quickly showing processes sorted by various criteria. It is an interactive diagnostic tool that updates frequently and shows information about physical and virtual memory, CPU usage, load averages, and your busy processes. The simple syntax to run the top command and see the statistics of CPU utilization by different processes is:

    $ top

A Five-State Process Model (Review):

The not-running state in the two-state model has now been split into a ready state and a blocked state:

• New — just been created
• Ready — prepared to execute
• Running — currently being executed
• Blocked — waiting for some event to occur (for an I/O operation to complete, or a resource to become available, etc.)
• Exit — just been terminated

State transition diagram:

Context Switching


Stopping one process and starting another is called a context switch:

• When the OS stops a process, it stores the hardware registers (PC, SP, etc.) and any other state information in that process's PCB.
• When the OS is ready to execute a waiting process, it loads the hardware registers (PC, SP, etc.) with the values stored in the new process's PCB, and restores any other state information.
• Performing a context switch is a relatively expensive operation; however, time-sharing systems may do 100 to 1,000 context switches a second.

Unix – Signals and Traps:

Signals are software interrupts sent to a program to indicate that an important event has occurred. The events can vary from user requests to illegal memory access errors. Some signals, such as the interrupt signal, indicate that a user has asked the program to do something that is not in the usual flow of control. The following are some of the more common signals you might encounter and want to use in your programs: SIGHUP (1, hangup), SIGINT (2, interrupt from the keyboard), SIGQUIT (3, quit from the keyboard), SIGKILL (9, kill, which cannot be caught or ignored), SIGTERM (15, software termination), and SIGSTOP (stop, which cannot be caught or ignored).

List of Signals:

There is an easy way to list all the signals supported by your system: just issue the kill -l command, and it displays all the supported signals:

    $ kill -l

The actual list of signals varies between Solaris, HP-UX, and Linux.
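Catching a signal from a program means installing a handler for it. A minimal C sketch using sigaction(2) to catch the interrupt signal (CTRL+C); the handler only sets a flag, which is the conventional safe pattern:

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_sigint = 0;

    static void on_sigint(int sig)
    {
        (void)sig;
        got_sigint = 1;          /* just record it; act on it in main */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sigint;
        sigaction(SIGINT, &sa, NULL);   /* catch CTRL+C */

        while (!got_sigint)
            pause();                    /* sleep until a signal arrives */
        printf("caught SIGINT, exiting cleanly\n");
        return 0;
    }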


Threads:

The notion of a process is characterized by a unit of execution (a unit of dispatching) together with a collection of resources with which that unit of execution is associated. A thread is the abstraction of a unit of execution; it is also referred to as a light-weight process (LWP). As a basic unit of CPU utilization, a thread consists of an instruction pointer (also referred to as the PC or instruction counter), a CPU register set, and a stack. A thread shares its code and data, as well as system resources and other OS-related information, with its peer group (the other threads of the same process).

Threads: an example

A good example of an application that could make use of threads is a file server on a local area network (LAN).


A "controller" thread accepts file service requests and spawns a "worker" thread for each request, and can therefore handle many requests concurrently. When a worker thread finishes servicing a request, it is destroyed.
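A sketch of this controller/worker pattern with POSIX threads; the fixed request count and the sleep standing in for real file I/O are illustrative assumptions (compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Worker: service one (simulated) file request, then vanish. */
    static void *worker(void *arg)
    {
        int request_id = (int)(long)arg;
        printf("worker %d: servicing request\n", request_id);
        sleep(1);                       /* stands in for the real file I/O */
        return NULL;                    /* thread is destroyed on return */
    }

    int main(void)
    {
        /* Controller: accept requests and spawn a worker for each. */
        for (long i = 1; i <= 3; i++) { /* pretend three requests arrived */
            pthread_t tid;
            pthread_create(&tid, NULL, worker, (void *)i);
            pthread_detach(tid);        /* no join: fire and forget */
        }
        sleep(2);                       /* let the workers finish */
        return 0;
    }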

Threads Models:

User-level threads (ULTs): implemented through a threads library in the address space of a process, these are invisible to the operating system. User-level threads are the interface for application parallelism.
– Benefits:
  • no modifications required to the kernel
  • flexible and low cost
– Drawbacks:
  • cannot block without blocking the entire process
  • no parallelism (not recognized by the kernel)

Kernel-level threads: implemented as system calls; can be scheduled directly by the OS; independent operation of threads in a single process; more expensive (thread) operations. Kernel-level threads are directly supported by the kernel, and the thread is the basic scheduling entity.
– Examples:
  • Windows 95/98/NT/2000, Solaris, Tru64 UNIX, BeOS, Linux
– Benefits:
  • coordination between scheduling and synchronization
  • suitable for parallel applications
– Drawbacks:
  • more expensive than user-level threads


UNIX Process Model

• Start in Created, then go to either:
  • Ready to Run, in Memory
  • or Ready to Run, Swapped (Out), if there isn't room in memory for the new process
• Ready to Run, in Memory is basically the same state as Preempted (dotted line)


• Preempted means the process was returning to user mode, but the kernel switched to another process instead
• When scheduled, go to either:
  • User Running (if in user mode)
  • or Kernel Running (if in kernel mode)
• Go from User Running to Kernel Running via a system call
• Go to Asleep in Memory when waiting for some event, and back to Ready to Run in Memory when it occurs
• Go to Sleep, Swapped if swapped out

Scheduling:

A schedule, or timetable, is a basic time management tool consisting of a list of times at which possible tasks, events, or actions are intended to take place, or a sequence of events in the chronological order in which such things are intended to take place. The process of creating a schedule (deciding how to order these tasks and how to commit resources between the variety of possible tasks) is called scheduling. To arrange or plan an event to take place at a particular time is also called scheduling.

Scheduling and System Performance:

The scheduler determines when and for how long processes run. Therefore, the scheduler's behavior strongly affects a system's performance. By default, all user processes are time-sharing processes. A process changes class only by a priocntl(2) (process scheduler control) system call.


All real-time process priorities are higher than any time-sharing process priority. Time-sharing processes or system processes cannot run while any real-time process is runnable. A real-time application that occasionally fails to relinquish control of the CPU can completely lock out other users and essential kernel housekeeping.

Besides controlling process class and priorities, a real-time application must also control other factors that affect its performance. The most important factors in performance are CPU power, amount of primary memory, and I/O throughput. These factors interact in complex ways. The sar(1) command has options for reporting on all performance factors.

Process State Transition:

Applications that have strict real-time constraints might need to prevent processes from being swapped or paged out to secondary memory. A simplified overview of UNIX process states and the transitions between states is shown in the following figure.

An active process is normally in one of the five states in the diagram. The arrows show how the process changes states.

A process is running if the process is assigned to a CPU. A process is removed from the running state by the scheduler if a process with a higher priority becomes runnable. A process is also pre-empted if a process of equal priority is runnable when the original process consumes its entire time slice.

A process is runnable in memory if the process is in primary memory and ready to run, but is not assigned to a CPU.

A process is sleeping in memory if the process is in primary memory but is waiting for a specific event before continuing execution. For example, a process sleeps while waiting for an I/O operation to complete, for a locked resource to be unlocked, or for a timer to expire.


When the event occurs, a wakeup call is sent to the process. If the reason for its sleep is gone, the process becomes runnable.

When a process' address space has been written to secondary memory, and that process is not waiting for a specific event, the process is runnable and swapped.

If a process is waiting for a specific event and has had its whole address space written to secondary memory, the process is sleeping and swapped.

If a machine does not have enough primary memory to hold all its active processes, that machine must page or swap some address space to secondary memory.

When the system is short of primary memory, the system writes individual pages of some processes to secondary memory but leaves those processes runnable. When a running process accesses those pages, the process sleeps while the pages are read back into primary memory.

Both paging and swapping cause delay when a process is ready to run again. For processes that have strict timing requirements, this delay can be unacceptable. To avoid swapping delays, real-time processes are never swapped, though parts of such processes can be paged. A program can prevent paging and swapping by locking its text and data into primary memory.

Process Scheduling in Unix:

When a process is created, the system assigns a lightweight process (LWP) to it. If the process is multithreaded, more LWPs might be assigned. An LWP is the object that is scheduled by the UNIX system scheduler, which determines when processes run. The scheduler maintains process priorities that are based on configuration parameters, process behavior, and user requests, and uses these priorities to determine which process runs next.

Two-Level Scheduling:

The low-level (CPU) scheduler uses multiple queues to select the next process, out of the processes in memory, to receive a time quantum.
• The low-level scheduler keeps a queue for each priority
• Processes in user mode have positive priorities
• Processes in kernel mode have negative priorities (lower is higher)

The high-level (memory) scheduler moves processes from memory to disk and back, to give all processes their share of CPU time.


Unix priority queues

Unix Low-Level Scheduling Algorithm:

Pick a process from the highest-priority non-empty queue and run it for one quantum (usually 100 ms), or until it blocks. Increment the CPU usage count on every clock tick. Every second, recalculate priorities:

• Divide cpu_usage by 2
• New priority = base + cpu_usage + nice
• Base is negative if the process is released from waiting in kernel mode

Use round robin within each queue (separately). Blocked processes are removed from their queue, but when the blocking event occurs, they are placed in a high-priority queue. The negative priorities are meant to release processes quickly from the kernel; they are hardwired in the system (for example, -5 for disk I/O gives high priority to a process released from disk I/O). Interactive processes get good service, while CPU-bound processes get whatever service is left.
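The decay-and-recompute rule fits in a few lines of C. A sketch, with made-up field names for the per-process bookkeeping:

    struct proc {
        int cpu_usage;   /* incremented on each clock tick while running */
        int nice;        /* user-settable niceness */
        int base;        /* negative while returning from a kernel wait */
        int priority;    /* recomputed once per second; lower runs first */
    };

    /* Called once per second for every process. */
    void recompute_priority(struct proc *p)
    {
        p->cpu_usage /= 2;                            /* decay usage history */
        p->priority = p->base + p->cpu_usage + p->nice;
    }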


The six priority classes are:
• Time-Sharing Class
• System Class
• Real-time Class
• Interactive Class
• Fair-Share Class
• Fixed-Priority Class

Time-Sharing Class:

The goal of the time-sharing policy is to provide good response time to interactive processes and good throughput to CPU-bound processes. The scheduler switches CPU allocation often enough to provide good response time, but not so often that the system spends too much time on switching. Time slices are typically a few hundred milliseconds.

The time-sharing policy changes priorities dynamically and assigns time slices of different lengths. The scheduler raises the priority of a process that sleeps after only a little CPU use. For example, a process sleeps when it starts an I/O operation such as a terminal read or a disk read. Frequent sleeps are characteristic of interactive tasks such as editing and running simple shell commands. The time-sharing policy lowers the priority of a process that uses the CPU for long periods without sleeping.

The default time-sharing policy gives larger time slices to processes with lower priorities. A process with a low priority is likely to be CPU-bound. Other processes get the CPU first, but when a low-priority process finally gets the CPU, it gets a larger time slice. If a higher-priority process becomes runnable during a time slice, however, the higher-priority process pre-empts the running process.

Global process priorities and user-supplied priorities are in ascending order: higher priorities run first. The user priority runs from the negative of a configuration-dependent maximum to the positive of that maximum. A process inherits its user priority. Zero is the default initial user priority.

The "user priority limit" is the configuration-dependent maximum value of the user priority. You can set a user priority to any value lower than the user priority limit. With appropriate permission, you can raise the user priority limit. Zero is the user priority limit by default.

An administrator configures the maximum user priority independent of global time-sharing priorities.


For example, in the default configuration a user can set a user priority in the -20 to +20 range, while 60 time-sharing global priorities are configured. The scheduler manages time-sharing processes by using configurable parameters in the time-sharing parameter table, ts_dptbl(4) (the time-sharing dispatcher parameter table file format). This table contains information specific to the time-sharing class.

System Class:

The system class uses a fixed-priority policy to run kernel processes such as servers and housekeeping processes like the paging daemon. The system class is reserved to the kernel; users can neither add a process to nor remove a process from the system class. Priorities for system class processes are set up in the kernel code, and the priorities of system processes do not change once established. User processes that run in kernel mode are not in the system class.

Real-time Class:

The real-time class uses a scheduling policy with fixed priorities so that critical processes run in predetermined order. Real-time priorities never change except when a user requests a change. Privileged users can use the priocntl(1) command (which displays or sets the scheduling parameters of specified processes) to assign real-time priorities. The scheduler manages real-time processes by using configurable parameters in the real-time parameter table, rt_dptbl(4) (the real-time dispatcher parameter table file format). This table contains information specific to the real-time class.

Interactive Class:

The IA class is very similar to the TS class. When used in conjunction with a windowing system, processes have a higher priority while running in a window with the input focus. The IA class is the default class while the system runs a windowing system. The IA class is otherwise identical to the TS class, and the two classes share the same ts_dptbl(4) (time-sharing dispatcher parameter table).

Fair-Share Class:

The FSS class is used by the Fair-Share Scheduler, FSS(7), to manage application performance by explicitly allocating shares of CPU resources to projects. A share indicates a project's entitlement to available CPU resources. The system tracks resource usage over time, reducing entitlement when usage is heavy and increasing it when usage is light. The FSS schedules CPU time among processes according to their owners' entitlements, independent of the number of processes each project owns.


The FSS class uses the same priority range as the TS and IA classes. See the FSS man page for more details.

Fixed-Priority Class:

The FX class provides a fixed-priority pre-emptive scheduling policy. This policy is used by processes that require user or application control of scheduling priorities that are not dynamically adjusted by the system. By default, the FX class has the same priority range as the TS, IA, and FSS classes. The FX class allows user or application control of scheduling priorities through user priority values assigned to processes within the class. These user priority values determine the scheduling priority of a fixed-priority process relative to other processes within its class. The scheduler manages fixed-priority processes by using configurable parameters in the fixed-priority dispatch parameter table, fx_dptbl(4) (the fixed-priority dispatcher parameter table file format). This table contains information specific to the fixed-priority class.

Commands and Interfaces:

The following figure illustrates the default process priorities.

A process priority has meaning only in the context of a scheduler class. You specify a process priority by specifying a class and a class-specific priority value. The class and class-specific value are mapped by the system into a global priority that the system uses to schedule processes.


The ps(1) command with the -cel options reports global priorities for all active processes. The priocntl(1) command reports the class-specific priorities that users and programmers use. The priocntl(1) command and the priocntl(2) and priocntlset(2) interfaces are used to set or retrieve scheduler parameters for processes. Setting priorities generally follows the same sequence for the command and both interfaces:

1. Specify the target processes.
2. Specify the scheduler parameters that you want for those processes.
3. Execute the command or interface to set the parameters for the processes.

Process IDs are basic properties of UNIX processes. The class ID is the scheduler class of the process. priocntl(2) works only for the time-sharing and the real-time classes, not for the system class.

priocntl Usage:

The priocntl(1) utility performs four different control functions on the scheduling of a process:

    priocntl -l    Displays configuration information
    priocntl -d    Displays the scheduling parameters of processes
    priocntl -s    Sets the scheduling parameters of processes
    priocntl -e    Executes a command with the specified scheduling parameters

The following examples demonstrate the use of priocntl(1). The -l option for the default configuration produces the following output:

    $ priocntl -l
    CONFIGURED CLASSES
    ==================
    SYS (System Class)
    TS (Time Sharing)
        Configured TS User Priority Range: -60 through 60
    RT (Real Time)
        Maximum Configured RT Priority: 59


To display information on all processes, do the following:

    $ priocntl -d -i all

To display information on all time-sharing processes, do the following:

    $ priocntl -d -i class TS

Kernel Processes:

The kernel's daemon and housekeeping processes are members of the system scheduler class. Users can neither add processes to nor remove processes from this class, nor can they change the priorities of these processes. The command ps -cel lists the scheduler class of all processes; a SYS entry in the CLS column identifies processes in the system class.

The Deadlock Problem:

A law passed by the Kansas Legislature in the early 20th century reads: "When two trains approach each other at a crossing, both shall come to a full stop and neither shall start upon again until the other has gone."

Deadlock or Deadly Embrace:

Permanent blocking of a set of processes that either compete for system resources or communicate with each other:
– Several processes may compete for a finite set of resources
– Processes request resources and, if a resource is not available, enter a wait state
– Requested resources may be held by other waiting processes
– Require divine intervention to get out of this problem

A significant problem in real systems, because there is no efficient solution in the general case

The deadlock problem is becoming more important because of the increasing use of multiprocessing systems (such as real-time, life support, vehicle monitoring, multicore utilization, and grid processing).

Deadlocks can occur with:

– Serially reusable (SR) resources: printer, tape drive, memory
  • A finite set of identical units, with the number of units constant
  • Can be used safely by only one process at a time and are not depleted by that use
  • Units are acquired by processes, used, and released later for use by other processes
  • A process may release a unit only if it has previously acquired it
  • Examples include processors, memory, devices, files, databases, and semaphores

– Consumable resources: messages
  • The resource is created dynamically and may be destroyed after use
  • Typically there is no limit on the number of consumable resources of a specific type
  • Examples are messages, signals, interrupts, and information in I/O buffers

Examples of Deadlocks in Computer Systems:

Reusable resources – file sharing. Consider two processes p1 and p2. They update a file F and require a scratch tape during the updating, and only one tape drive T is available. T and F are serially reusable resources, and can be used only with exclusive access. p2 needs T immediately prior to updating.

The request operation:
  • Blocks the process requesting the resource
  • Puts the process on the wait queue
  • The process remains blocked until the requested resource is available
  • If the resource is available, the process is granted exclusive access to it

The release operation:
  • Returns the resource being released to the system
  • Wakes up the process waiting for the resource

p1 and p2 may run as follows:

    p1: request(F);        p2: request(T);
        request(T);            request(F);
        ...                    ...
        release(T);            release(F);
        release(F);            release(T);

p1 can block on T while holding F, while p2 can block on F while holding T.
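The same hold-and-wait pattern can be reproduced with two mutexes standing in for F and T. A C/pthreads sketch that reliably deadlocks; the sleep calls only make the fatal interleaving deterministic (compile with -pthread):

    #include <pthread.h>
    #include <unistd.h>

    static pthread_mutex_t F = PTHREAD_MUTEX_INITIALIZER; /* the file */
    static pthread_mutex_t T = PTHREAD_MUTEX_INITIALIZER; /* the tape */

    static void *proc1(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&F);       /* request(F) */
        sleep(1);                     /* meanwhile proc2 acquires T */
        pthread_mutex_lock(&T);       /* request(T): blocks forever */
        pthread_mutex_unlock(&T);
        pthread_mutex_unlock(&F);
        return NULL;
    }

    static void *proc2(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&T);       /* request(T) */
        sleep(1);
        pthread_mutex_lock(&F);       /* request(F): blocks forever */
        pthread_mutex_unlock(&F);
        pthread_mutex_unlock(&T);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, proc1, NULL);
        pthread_create(&b, NULL, proc2, NULL);
        pthread_join(a, NULL);        /* never returns: deadlock */
        pthread_join(b, NULL);
        return 0;
    }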

Consumable resources – deadlock with messages. Consider a pair of processes p1 and p2, where each process receives a message from the other process and then sends a message to the other process:

    p1()                   p2()
    ...                    ...
    receive(p2)            receive(p1)
    send(p2, m1)           send(p1, m2)

With a blocking receive, both processes wait forever: deadlock.

Locking in Database Systems:

Locking is required to preserve the integrity and consistency of databases, given random request patterns. A problem arises when two records to be updated by two different processes are locked in opposite order.

Effective Deadlocks


A milder form: indefinite postponement of processes competing for a resource, exemplified by Shortest Job Next scheduling.

Deadlocks in Unix:
– A possible deadlock condition that cannot be detected
– The number of processes is limited by the number of available entries in the process table
– If the process table is full, the fork system call fails
– A process can wait for a random amount of time before forking again
– Deadlocks can also occur due to open files or swap space
– Another cause of deadlock can be the inode table becoming full in the filesystem
– Example: 10 processes each creating 12 children, with 100 entries in the process table. Once each process has created 9 children, there is no more space in the process table: deadlock.

Characterization of the deadlock problem:
– Deadlock detection: process resource graphs
– Deadlock recovery: the "best" ways of recovering from a deadlock
– Deadlock prevention: not allowing a deadlock to happen

A Systems Model:
– A finite number of resources in the system is to be distributed among a number of competing processes.
– Partition the resources into several classes; identical resources are assigned to the same class (CPU cycles, memory space, files, tape drives, printers).
– Allocation of any instance of a resource from a class will satisfy the request.
– The state of the OS is the allocation status of the various resources, and can be changed only by process actions.
– Process actions: request a resource, acquire/use a resource, release a resource.
– Resources are acquired and used only through system calls.

Deadlock Characterization:

Necessary and sufficient conditions for deadlock; all four conditions must hold simultaneously:

1) Mutual exclusion
   – Only one process may use a resource at a time
   – At least one resource must be held in a non-sharable mode
2) Hold and wait
   – There exists a process holding at least one resource and waiting to acquire additional resources currently held by other processes
3) No preemption
   – Resources cannot be preempted by the system
4) Circular wait
   – Processes wait for resources held by other waiting processes

Deadlock Detection:

– Do not restrict process actions or limit resource access (if resources are available to satisfy requests)
– Periodically detect the circular wait condition using a deadlock detection algorithm
– Simulate the most favored execution of each unblocked process:
  • An unblocked process may acquire all the needed resources
  • It runs and then releases all the acquired resources
  • It remains dormant thereafter
  • Released resources may wake up some previously blocked process
  • Continue the above steps as long as possible

Recovery from Deadlock:

Recovery by process termination:
– Abort all deadlocked processes
– Back up each deadlocked process to some previously defined checkpoint and restart all of them (this needs rollback and restart mechanisms built into the system)
– Terminate deadlocked processes in a systematic way:
  • When enough processes have been terminated to recover from the deadlock, stop the terminations
  • Perform deadlock detection at each process's termination
– Processes should be terminated based on some criterion/policy:
  • Priority of the process
  • CPU time used and expected usage before completion
  • Number of resources needed for completion
  • Number of processes that need to be terminated
  • Whether the processes are interactive or batch
– Minimum-cost recovery, based on:
  • the cost of destroying a process
  • the cost of recovery from the next process state

Recovery by resource preemption:
– Enough resources are preempted from processes and made available to deadlocked processes to resolve the deadlock
– Selecting a victim
– Rollback

Deadlock Prevention:

Uses a conservative resource allocation policy; undercommits resources. Each process requests and acquires all the needed resources at the same time.
– Works well for processes that perform a single burst of activity
– No preemption necessary
– Grossly inefficient
– May delay process initialization
– Processes must identify all future resource requirements in advance

Prevention works by denying one of the required conditions for a deadlock:

Mutual exclusion
  • Cannot be denied for non-sharable resources (like printers)
  • Sharable resources (such as read-only files) do not require mutually exclusive access, and so cannot be involved in a deadlock
  • Mutual exclusion cannot be denied in general, as some resources are inherently non-sharable


Hold and wait
  • Processes can request and acquire all their resources at one time
  • Or request resources only when holding none
  • Disadvantages: low resource utilization (resources may be allocated but not used for a long time) and the possibility of starvation on popular resources

No preemption
  • If a process holding resources requests another resource that cannot be immediately allocated, all currently held resources are preempted
  • The process is restarted only when it regains all its resources
  • Suitable for resources whose state can be easily saved, such as CPU registers and memory

Circular wait
  • Impose a total ordering on all resource types
  • Each process requests resources in an increasing order of enumeration
  • If several instances of a resource are required, a single request must be issued for all of them

Deadlock Prevention based on Maximum Claims:

Also called deadlock avoidance:
– Requires a priori knowledge of the maximum possible claims for each process
– Dynamically examine the resource allocation status to ensure that no circular wait condition can exist
– The resource allocation state is defined by the number of available and allocated resources, and the maximum demands of the processes
– A state is safe if the system can allocate resources to each process (up to its maximum) in some order and still avoid a deadlock
– Not all unsafe states are deadlock states, but an unsafe state may lead to a deadlock

Deadlock Avoidance:
– Requires a process to declare the maximum instances of each resource type needed


– Upon request, the system must determine whether the allocation will leave the system in a safe state
– Number of processes in the system: n
– Number of resource classes: m
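The safe-state test at the heart of this avoidance scheme (the banker's algorithm) can be sketched in C. Here n and m are fixed as small constants, and the array conventions are the usual textbook ones:

    #include <stdbool.h>
    #include <string.h>

    #define N 5   /* number of processes (n) */
    #define M 3   /* number of resource classes (m) */

    /* Returns true if the state is safe: every process can finish in
       some order, each running with what is available plus what the
       processes before it released. */
    bool is_safe(int available[M], int alloc[N][M], int max[N][M])
    {
        int work[M];
        bool finished[N] = { false };
        memcpy(work, available, sizeof work);

        bool progressed = true;
        while (progressed) {
            progressed = false;
            for (int i = 0; i < N; i++) {
                if (finished[i])
                    continue;
                bool can_run = true;      /* need[i] = max[i] - alloc[i] */
                for (int j = 0; j < M; j++)
                    if (max[i][j] - alloc[i][j] > work[j])
                        can_run = false;
                if (can_run) {            /* i runs, then releases all */
                    for (int j = 0; j < M; j++)
                        work[j] += alloc[i][j];
                    finished[i] = true;
                    progressed = true;
                }
            }
        }
        for (int i = 0; i < N; i++)
            if (!finished[i])
                return false;             /* someone can never finish */
        return true;
    }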

UNIX Memory Management:

Memory management is responsible for allocating portions of memory to new processes, keeping track of which parts of memory are in use, deallocating parts of memory when they are unused, and managing swapping between main memory and disk, as well as demand paging, when main memory is not large enough to hold all the processes.

Evolution of Memory Management:

In a single-process operating system, only one process at a time can be running, so only one program shares the memory with the operating system. The operating system may be located at the lower-addressed space of the memory and the user program at the rest. Thus, memory management is quite simple: it just handles how to load the program into the user memory space from the disk when a program name is typed in by a user, and leaves program execution to process management. When a new program name is typed in after the first one finishes, memory management loads it into the same space, overwriting the first program.

In multiprocessing operating systems, many processes representing different programs execute simultaneously, and they must be put in different areas of memory. Multiprogramming increases CPU utilization, but needs complex schemes to divide and manage the memory space for several processes, in order to keep executing processes from interfering with each other and to make their execution look just like single-process execution. The simplest scheme is to divide the physical memory into several fixed areas of different sizes. When a task arrives, the memory management should allocate it the smallest area that is large enough to hold it and mark this area as used; when the task finishes, it should deallocate the area and mark it as free for later tasks. A data structure is necessary to hold the information for each fixed-size area, including its size, location, and use state.


Since a program has a starting address when it is executed and the initial addresses of the various areas differ, the memory management should also have an address transformation mechanism to handle this issue. The two biggest disadvantages of this scheme are that the fixed sizes cannot accommodate a growing number of simultaneous tasks or the growing size of application programs. The former problem can be handled by swapping and the latter by paging.

Sometimes processes wait for I/O devices in memory without doing anything, while other processes are ready to run but there is not enough memory to hold them. Operating system developers therefore considered whether the memory management could swap the waiting-for-I/O processes out of memory and put them on the disk temporarily, to save memory space for other processes that are ready to run. When memory becomes available for the processes swapped out on the disk, the system checks which swapped-out processes are ready to run and swaps the ready processes into memory again. This memory management strategy is called swapping. Memory is allocated to processes dynamically.

As long as the swapping is fast enough that the user does not notice the delay, and the system can handle more processes, the performance of the whole system becomes better. Since the program is kept in a contiguous memory space, swapping is done on a whole process.

When the size of an application program becomes too big to load into memory as a whole, memory paging is needed. The paging technique divides main memory into small portions of the same size, called pages, whose size can be 512 or 1024 bytes. When a long program is executed, the addresses accessed over any short period of time fall within an area around a locality. That is, only a number of pages of the process need to be in main memory over a short period of time. When some page of the program is needed and not yet in memory, it is acceptable if that page is loaded into main memory fast enough when demanded, which is called demand paging. In this situation, pages are brought into or out of memory, rather than a whole process. Demand paging, combined with page replacement and swapping, implements the virtual memory management in 4.2BSD and UNIX System V.

At the very beginning of UNIX development, in the early 1970s, UNIX System versions adopted swapping as the memory management strategy. It was designed to transfer entire processes between primary memory and the disk. Swapping was quite suitable for the hardware systems of that time, which had small memories, such as the PDP-11 (whose total physical main memory was about 256 Kbytes).


With swapping as the memory management scheme, the size of the physical memory space restricts the size of the processes that can run in the system. However, swapping is easy to implement and its system overhead is quite small.

Around the second half of the 1970s, with the advent of the VAX, which had 512-byte pages and gigabytes of virtual address space, the BSD variants of UNIX first implemented demand paging in memory management. Demand paging transfers pages instead of a whole process between main memory and the disk. To start executing, a process need not be loaded into memory as a whole, only the several pages of its initial segment. During execution, when the process references pages that are not in memory, the kernel loads them on demand. Demand paging allows a big process to run in a small physical memory space, and more processes to execute simultaneously than swapping alone allows. Even so, a demand paging implementation still needs the swapping technique to replace pages. From UNIX System V on, UNIX System versions also support demand paging.

Another approach used in memory management is segmentation, which divides the user program into logical segments that correspond to the natural units of programming and have unequal sizes from one segment to another.

Memory Allocation Algorithms in Swapping:

With swapping, memory is assigned to processes dynamically when a new process is created or an existing process has to be swapped in from the disk, and the memory management system must handle it. A common method used to keep track of memory usage is linked lists. The system can maintain the allocated and free memory segments with one linked list or two separate lists. The allocated memory segments hold processes that currently reside in memory, and the free memory segments are empty holes between two allocated segments. With one list, the segment list can be sorted by address, and each entry in the list can be marked as allocated or empty with a marker bit. With two separate lists for allocated and free memory segments, each entry in a list holds the address and size of the segment, plus two pointers: one points to the next segment in the list and the other to the previous segment. Given these lists of memory segments, the following algorithms can be used to allocate memory for a new process or an existing process that has to be swapped in.


With one mixed list, the algorithms search that single list; with two separate lists, they scan the list of free memory segments (the free list), and after allocation the chosen segment is transferred from the free list to the allocated list.

• First-fit algorithm. It scans the list of memory segments from the beginning until it finds the first free segment that is big enough to hold the process. The chosen free segment is then split into two pieces, one just big enough for the process. If the remaining piece is greater than or equal to the minimum size of a free segment, one piece is assigned to the process and the other remains free; if the remaining piece is smaller than that minimum, the whole segment is assigned to the process without being divided. The list entries are updated.

• Next-fit algorithm. It works almost the same way as first fit, except that the next time it is called to look for a free segment, it starts scanning from the place where it stopped last time. Thus, it has to record the place where it finds a free segment each time, and when the search reaches the end of the list, it goes back to the beginning and continues.

• Best-fit algorithm. It searches the whole list, takes the smallest free segment that is big enough to hold the process, assigns it to the process, and updates the list entries.

• Quick-fit algorithm. It keeps separate lists of memory segments, putting segments of the same size class into one list. For example, it may have a table with several entries, in which the first entry is a pointer to a list of 8 Kbyte segments (segments whose sizes are less than or equal to 8 Kbytes); the second entry is a pointer to a list of 16 Kbyte segments (segments whose sizes are greater than 8 Kbytes and less than or equal to 16 Kbytes); and so on. When called, it just searches the list of segments whose size is close to the requested one. After allocation, the list entries are updated.

The first-fit algorithm is simple and fast. When the processes do not occupy all the physical memory, the next-fit algorithm is faster than the first-fit algorithm, because first fit always searches from the beginning of the list. When the processes fill up physical memory and some of them are swapped out to disk, next fit does not necessarily surpass first fit. The best-fit algorithm is slower than the first-fit and next-fit algorithms because it must search the entire list every time. The quick-fit algorithm can find a required free segment faster than the other algorithms.
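A sketch of the first-fit search over a singly linked free list in C; the segment structure and the minimum-split threshold are illustrative assumptions:

    #include <stddef.h>

    /* One free segment: its address, its size, and the next segment. */
    struct segment {
        size_t addr, size;
        struct segment *next;
    };

    #define MIN_SEG 64   /* smallest leftover worth keeping as free */

    /* First fit: take the first free segment big enough for `need`.
       Returns the allocated address, or (size_t)-1 on failure. */
    size_t first_fit(struct segment **free_list, size_t need)
    {
        for (struct segment **pp = free_list; *pp; pp = &(*pp)->next) {
            struct segment *s = *pp;
            if (s->size < need)
                continue;                    /* too small: keep scanning */
            size_t addr = s->addr;
            if (s->size - need >= MIN_SEG) { /* split into two pieces */
                s->addr += need;
                s->size -= need;
            } else {                         /* remainder too small: */
                *pp = s->next;               /* hand over the whole segment */
            }
            return addr;
        }
        return (size_t)-1;                   /* no free segment fits */
    }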


When deallocating memory, the memory management system has to merge free segments to keep memory from splitting into a large number of small fragments. That is, if the neighbors of the newly deallocated segment are also free, they are merged into one bigger segment by revising the list entries. Two separate lists for allocated and free segments can speed up all the algorithms, but they make the algorithms more complicated, and merging free segments on deallocation more costly, especially for the quick-fit algorithm. The early UNIX System versions adopted the first-fit algorithm to carry out allocation of both main memory and swap space.

Virtual memory management

Page Replacement Algorithms in Demand Paging:

There are several important replacement algorithms. When considering how good a page replacement algorithm is, we can examine how frequently thrashing happens when the algorithm is used. Thrashing is the phenomenon in which a page that has just been removed from memory is referenced again and has to be brought back into memory.

• The optimal page replacement algorithm. It is an ideal algorithm, of no use in real systems, but traditionally it serves as a reference point for realistic algorithms. It removes the page that will be referenced the furthest in the future among the pages currently in memory. But it is difficult and costly to identify this page.

• The first-in-first-out (FIFO) page replacement algorithm. It removes the page that has been in memory the longest among all the pages currently residing in memory. The memory management system can use a list to maintain all pages currently in memory, putting the page that arrived most recently at the end of the list. When a page fault happens, the first page in the list, the earliest arrival, is removed. The new page, being the most recent arrival, is put at the end of the list.

• The least recently used (LRU) page replacement algorithm. It assumes that pages that have not been used for a long time in the past will remain unused for a long time in the future. Thus, when a page fault occurs, the page unused for the longest time in the past is removed. To implement LRU paging, it is necessary to keep a linked list of all pages in memory. When a page is referenced, it is moved to the end of the list.


Thus, after a while, the head of the list is the least recently used page, which is chosen for removal when a page fault happens. The implementation can also be done in hardware; one way is to equip each page in memory with a shift register.

• The clock page replacement algorithm. It puts all the pages in memory in a circular list and maintains the list in clock-hand order. Usually, the virtual memory in a computer system has a status bit associated with each page, for example R, which is set when the page is referenced. If a page fault occurs, the system begins scanning pages along the clock-like list. If a page's R bit is zero, that page is chosen for removal and replaced with the new page; the searching pointer, which works like a clock hand, then moves to the next page in the list and stops there until the next page fault. If R is one, the system clears the R bit and moves the searching pointer to the next page in the list. This motion is repeated until a page with a zero R bit is found, and then the system does the page replacement.
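The clock algorithm's eviction scan is compact in C. A sketch, assuming the hardware-set R bit is mirrored in a referenced flag on each frame:

    #include <stdbool.h>

    #define NFRAMES 8

    struct frame {
        int  page;        /* which virtual page occupies this frame */
        bool referenced;  /* the R bit, set on each access to the page */
    };

    static struct frame frames[NFRAMES];
    static int hand = 0;  /* the clock hand */

    /* Pick a frame to evict: sweep past frames with R = 1, clearing
       the bit, until a frame with R = 0 is found. */
    int clock_evict(void)
    {
        for (;;) {
            if (!frames[hand].referenced) {
                int victim = hand;
                hand = (hand + 1) % NFRAMES;  /* hand rests on next frame */
                return victim;                /* caller loads new page here */
            }
            frames[hand].referenced = false;  /* give it a second chance */
            hand = (hand + 1) % NFRAMES;
        }
    }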

Process Swapping in UNIX:

As UNIX memory management started with swapping whole processes out of and into memory, in this section we first discuss how a process as a whole is swapped out of or into memory.

Swapped Content:

We know that swapping moves a whole process between memory and the swap space on the disk. What does the swapped content of a process consist of?


In UNIX, since processes can execute in user mode or kernel mode, the major data associated with a process typically consist of the instruction segment, the user data segment, and the system data segment. Besides private code, the instruction segment may include shared code, which can be used by several processes. The user data segment includes user data and the user stack. The system data segment is composed of kernel data and the kernel stack. Either data or stack, in both the user and system segments, can grow while the process executes.

Sharable code does not need to be swapped, because it is read-only, and there is no need to read in a piece of shared code for each process if the kernel has already brought it into memory for some process. Moreover, multiple processes using the same code saves memory space. In UNIX, shared code segments are treated with an extra mechanism. Apart from shared code, all the other segments, including the private code of the instruction segment, the user data segment, and the system data segment, can be swapped if necessary. To swap easily and quickly, all the segments have to be kept in a contiguous area of memory. Contiguous placement of a process can cause serious external fragmentation of memory; with demand paging, however, it is not necessary to put a process in a contiguous area of memory.

Timing of Swapping:

In the UNIX kernel, the swapper is responsible for swapping processes between memory and the swap area on the disk. The swapper is awakened at least once in a set slice of time (for example, 4 seconds) to check whether there are processes to be swapped in or out. The swapper examines the process control table to search for a process that has been swapped out and is ready to run. If free memory space is available, the kernel allocates main memory for the process, copies its segments into memory, changes its state from ready-swapped to ready-in-memory, and puts it in the proper priority queue to compete for the CPU with the other processes that are also ready in memory. If the kernel finds that the system does not have enough memory for the process to be swapped in, the swapper examines the process control table to find a process that sleeps in memory waiting for some event to happen, and puts it in the swap space on the disk. Then the swapper goes back to searching for a process to swap in, and the freed memory space is allocated to that process.


Besides the case in which there is not enough room in memory for all the existing processes, swapping out can also happen in two situations: one is when some segment of a process grows and its old area can no longer accommodate the process; the other is when a parent process creates a child process with the fork system call. As noted, both the user and system data segments may grow beyond their original extent during process execution. If there is enough memory to allocate a new memory space for the process, the allocation is done directly through the brk system call, which sets the highest address of a process's data segment, and the old area is freed by the kernel (see the sketch after this passage). If not, the kernel performs an expansion swap of the process, which includes: allocating swap space on the disk for the new size of the process, modifying the address mapping in the process control table according to the new size, swapping the process out onto the swap space, initializing the newly expanded space on the disk, and setting the process state to "ready, swapped". When the swapper is invoked again, the process is swapped into memory and finally resumes execution in a new, larger memory space. When a child process is created via the fork system call, the kernel should allocate main memory space for it (see Figure 4.4). However, if there is no room available in memory, the kernel swaps the child process out onto the swap space without freeing any memory, because none has been allocated to the child yet, and sets the child to "ready, swapped". Later on, the swapper swaps it into memory.
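For reference, the user-level face of this mechanism is the brk/sbrk interface. The following sketch grows the calling process's data segment with sbrk(), the traditional library wrapper over the brk system call; the 64 KB figure is an arbitrary example.

/* Grow the data segment with sbrk(), the traditional wrapper over brk.
   If the kernel cannot extend the process in place, System V performs
   the expansion swap described above. */
#define _DEFAULT_SOURCE   /* expose sbrk() on glibc */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    void *old_break = sbrk(0);            /* current top of the data segment */
    if (sbrk(64 * 1024) == (void *)-1) {  /* ask the kernel for 64 KB more */
        perror("sbrk");
        return 1;
    }
    printf("break moved from %p to %p\n", old_break, sbrk(0));
    return 0;
}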


Allocation Algorithm
As mentioned before, the first-fit algorithm is adopted in UNIX to allocate memory, or swap space on the disk, to processes. In UNIX, the swap space on the disk is allocated to a process in sequential blocks. This is done for several reasons: first, the use of the swap space on the disk is temporary; second, the speed of transferring processes between memory and the swap space is crucial; third, for I/O operations, one transfer of several contiguous blocks is faster than several transfers of separate blocks. The kernel keeps a mapping array (a swap map) to manage the swap space. Each entry of the map holds the address and the number of blocks of one free area. At first, the map has only one entry, covering the total number of blocks the swap space can have; after several swaps in and out, the map can come to hold many entries. The kernel's internal malloc routine is used to allocate swap space to the process being swapped out. With the first-fit algorithm, the kernel scans the map for the first entry that can make the process fit. If the process needs all the blocks of the entry, all of them are allocated to it and the entry is removed from the map. If the process cannot use all the blocks, the kernel breaks them into two sequential groups: the blocks sufficient for the process are allocated to it, and the rest become an entry with modified address and block count in the map. To free swap space, the kernel performs the same kind of merging it does when deallocating memory. If one or both of the front and back neighbors of the newly freed area are free, it is merged with them; the address or the block count of the resulting entry is adjusted, and entries may be deleted as the situation requires. If the newly freed area adjoins no free area, the kernel adds one entry at the appropriate position in the map and fills in its address and block count. The sketch below illustrates the allocation side of this scheme.
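A compact C illustration of first-fit allocation over such a map. The map layout and the helper are simplified: the real kernel routine also removes exhausted entries and updates the caller's process structures, while here a zero-length entry is simply skipped.

/* First-fit allocation from a swap map: each entry records the first
   block address and the length of one free area.  A simplified sketch. */
#include <stdio.h>

#define MAPSIZE 8

struct mapent { int addr; int nblocks; };               /* one free area */
static struct mapent swapmap[MAPSIZE] = { { 0, 1000 } }; /* initially one area */

/* Allocate n contiguous blocks; returns the first block number or -1. */
static int swap_alloc(int n)
{
    for (int i = 0; i < MAPSIZE; i++) {
        if (swapmap[i].nblocks < n)
            continue;                 /* entry too small: keep scanning */
        int addr = swapmap[i].addr;
        swapmap[i].addr += n;         /* carve n blocks off the front */
        swapmap[i].nblocks -= n;      /* an emptied entry is skipped later */
        return addr;
    }
    return -1;                        /* no free area is large enough */
}

int main(void)
{
    printf("first allocation at block %d\n", swap_alloc(100));
    printf("second allocation at block %d\n", swap_alloc(250));
    return 0;
}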

Selection Principle of Swapped Processes
The selection principles for the processes to be swapped out and in are slightly different. To decide which process should be swapped in before the others, the swapper uses two rules:

• It examines only processes that have been swapped out and are ready to run.
• It considers how long a process has stayed on the disk: the longer a process has waited compared with the others, the earlier it is swapped in.

To choose a process to swap out, the swapper applies the corresponding rules:

• It examines only the processes that are sleeping in memory waiting for some event to occur.
• It checks how long a process has stayed in memory: the process resident for the longest time is swapped out first.

Swapper
The swapper, or Process 0, is a kernel process that enters an infinite loop after the system is booted. It tries to swap processes into memory from the swap space on the disk, or to swap processes out of memory onto the swap space if necessary, and it goes to sleep if no process is suitable or necessary to swap. The kernel schedules it periodically like the other processes in the system. When the swapper is scheduled, it examines all processes whose state is "ready, swapped" and chooses the one that has been out on the disk for the longest time. If there is free memory space available and it is enough for the chosen process, the swapper swaps the process in.


If the swap-in succeeds, the swapper continues to look for other processes in the "ready, swapped" state and swaps them in one by one, until either no process on the swap space remains "ready, swapped" or there is no room in memory for the next process. If there is no room in memory for the process to be swapped in, the swapper enters the swapping-out search: it checks the processes that are sleeping in memory and chooses the one that has been in memory for the longest time to swap out onto the swap space of the disk. Then the infinite loop goes back to looking for processes to swap in. If there is no process to choose, the swapper goes to sleep. The sketch below renders this loop schematically.
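A schematic, user-space rendering of the swapper's main loop. The process table, the states, and the tick counts are invented for the illustration; the real swapper works on kernel structures and sleeps rather than exiting.

/* A toy simulation of the swapper loop described above. */
#include <stdio.h>

enum pstate {
    READY_IN_MEMORY, READY_SWAPPED, SLEEPING_IN_MEMORY, SLEEPING_SWAPPED, EMPTY
};

struct proc {
    enum pstate state;
    int time_out;   /* ticks spent on the swap device */
    int time_in;    /* ticks spent resident in memory */
};

#define NPROC 5
static struct proc ptable[NPROC] = {
    { READY_SWAPPED,      7, 0 },
    { SLEEPING_IN_MEMORY, 0, 9 },
    { READY_SWAPPED,      3, 0 },
    { SLEEPING_IN_MEMORY, 0, 2 },
    { EMPTY,              0, 0 },
};

static int free_slots = 1;      /* pretend memory has room for one more process */

static int pick_swap_in(void)   /* longest time on the disk wins */
{
    int best = -1;
    for (int i = 0; i < NPROC; i++)
        if (ptable[i].state == READY_SWAPPED &&
            (best < 0 || ptable[i].time_out > ptable[best].time_out))
            best = i;
    return best;
}

static int pick_swap_out(void)  /* longest-resident sleeper goes out */
{
    int best = -1;
    for (int i = 0; i < NPROC; i++)
        if (ptable[i].state == SLEEPING_IN_MEMORY &&
            (best < 0 || ptable[i].time_in > ptable[best].time_in))
            best = i;
    return best;
}

int main(void)
{
    for (;;) {
        int in = pick_swap_in();
        if (in < 0)
            break;                          /* nothing to swap in: the swapper would sleep */
        if (free_slots > 0) {
            ptable[in].state = READY_IN_MEMORY;
            free_slots--;
            printf("swapped in process %d\n", in);
        } else {
            int out = pick_swap_out();
            if (out < 0)
                break;                      /* no candidate: the swapper would sleep */
            ptable[out].state = SLEEPING_SWAPPED;
            free_slots++;
            printf("swapped out process %d\n", out);
        }
    }
    return 0;
}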


The procedure of the swapper also shows that swapping has to handle three parts: swap device allocation, swapping processes into memory, and swapping them out of memory.

Swapping Effect
Swapping is simple and appropriate for systems with a small main memory, but it has some flaws. As the swap space is on the disk, swapping can seriously intensify file system traffic and increase disk usage. When the kernel wants to swap in a process but there is no free memory space for it, it has to swap out a process. If the size of the process to be swapped out is much smaller than that of the process to be swapped in, one swap-out cannot make the swap-in succeed, which can delay the swapping considerably, especially since I/O operations are involved. Furthermore, there is an extreme case for swapping. If some of the swapped-out processes become ready to run, the swapper is invoked to try to swap them in; but if there is not enough room in memory, the swapper has to swap out some process in memory. If there is no room left in the swap space on the disk either, and at the same time some new process is created, a stalemate can occur; and if the processes in memory are long-running, the deadlock can persist much longer.

Demand Paging in UNIX
Since the advent of computer systems with virtual memory, such as the VAX in the late 1970s, operating systems have managed memory in terms of a virtual address space that is distinct from the physical memory space and, better still, expands the memory into a larger virtual scope and allows more processes to execute in the system simultaneously. Demand paging in virtual memory further enhances system throughput and makes the concurrent execution of multiple processes work well even on a uniprocessor system. The locality principle was first articulated by Peter J. Denning in 1968 (Denning 1983): the working set of pages that a process references in a short period of time is limited to a few pages. Because the working set is a small, dynamically changing part of a process, if the memory management tracks this dynamic behavior promptly, the system can potentially increase its throughput and allow more processes to execute concurrently. We also know that swapping has to transfer whole processes and may aggravate disk I/O traffic. Demand paging transfers only some pages of a process between memory and the space on the disk, and it has some mechanisms to reduce the I/O traffic, which will be discussed in this section.


Demand paging usually cooperates with page replacement in virtual memory management. The former determines how and when to bring one or more pages of a process into memory; the latter handles how to swap out some pages of a process periodically in order to let new pages enter memory. When a process references a page that is not in memory, it causes a page fault, which invokes demand paging. The kernel puts the process to sleep until the needed page has been read into memory and is accessible to the process; when the page is loaded, the process resumes the execution interrupted by the page fault. As there is always some new page to be brought into memory, page replacement must dynamically move some pages out onto the swap space.

Demand Paging
In the virtual memory address space, the pages of a process are indexed by the logical page number, which indicates their logical order in the process. The physical memory address space of the system is also divided into pages. To avoid confusion between these two kinds of pages, the pages in the virtual memory space are still called pages, while the pages of physical memory are called frames. When a page of a process is put in a frame, the page is actually allocated in physical memory. Three data structures in the UNIX kernel support demand paging: the page table, the frame table, and the swap table; and two handlers are used to accomplish demand paging in different situations.

Page Table
Entries of the page table are indexed by the page number of a process, with one entry per page. Each entry of the page table has several fields: the physical address of the page, protection bits, a valid bit, reference bits, a modify bit, a copy-on-write bit, age bits, the address on the disk, and the disk type.


Here is an explanation of the fields (a structure sketch follows the list):

• Physical address is the address of the page in physical memory, that is, the address of the frame the page occupies.
• Protection bits are the access privileges for processes to read, write, or execute the page.
• Valid bit indicates whether or not the content of the page is valid.
• Reference bits indicate how many processes reference the page.
• Modify bit shows whether or not the page was recently modified by a process.
• Copy-on-write bit is used by the fork system call when a child process is created.
• Age bits show how long the page has been in memory and are used for page replacement.
• Disk address gives the address of the page on the disk, including the logical device number and block number, whether the page lives in the file system or in the swap space.
• Disk type is one of four kinds: file, swap, demand zero, and demand fill. If the page is in an executable file, its disk type is marked as file, and its disk address is the logical device number and block number of the page in that file on the file system. If the page is on the swap space, its disk type is marked as swap, and its disk address is the logical device number and block number of the page in the swap space on the disk. If the page is marked "demand zero", meaning it belongs to the bss segment (block started by symbol), which contains data not initialized at compile time, the kernel clears the page when it assigns the page to the process. If the page is marked "demand fill", it contains data initialized at compile time, and the kernel leaves the page to be overwritten with the content of the process when allocating the frame to the process.
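Gathered into one C structure, such an entry might look like the sketch below. The field names and widths are invented for illustration; real System V implementations pack these fields differently on each architecture.

/* An illustrative page table entry carrying the fields listed above. */
#include <stdint.h>

enum disk_type { DSK_FILE, DSK_SWAP, DSK_DEMAND_ZERO, DSK_DEMAND_FILL };

struct pte {
    uint32_t frame_addr;       /* physical address: the frame holding the page */
    unsigned prot_read   : 1;  /* protection bits */
    unsigned prot_write  : 1;
    unsigned prot_exec   : 1;
    unsigned valid       : 1;  /* content of the page is valid */
    unsigned modified    : 1;  /* page was recently written */
    unsigned cow         : 1;  /* copy-on-write bit, set during fork */
    unsigned age         : 4;  /* age bits for page replacement */
    uint16_t refcount;         /* reference bits: processes using the page */
    uint32_t disk_dev;         /* logical device number of the disk copy */
    uint32_t disk_block;       /* block number of the disk copy */
    enum disk_type dtype;      /* file, swap, demand zero, or demand fill */
};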


Frame Table
The frame table is used for physical memory management. Each frame of physical memory has an entry in the frame table, and the table is indexed by the frame number. Entries in the frame table can be arranged on one of two lists: a free frame list or a hash frame queue. The frames on the free frame list are reclaimable: when a frame is put at the end of the free frame list, it will eventually be allocated to a new page of some process if no process references it again within a period of time. A process may, however, fault on a page that is still on the free frame list, in which case the kernel can reclaim the frame directly and save one I/O operation of reading from the swap space on the disk. The hash frame table is indexed by a key, the disk address (including the logical device number and block number). Each entry of the hash frame table corresponds to a hash queue with one unique key value, so with the key value the kernel can find a page in the hash frame table quickly. When the kernel allocates a free frame to a page of a process, it removes an entry from the front of the free frame list, modifies its disk address, and inserts the frame into a hash frame queue according to that address. To support frame allocation and deallocation, each entry in the frame table has several fields (sketched in the structure below):

• Frame state, which can be one of several situations: reclaimable, on the swap space, in an executable file, in the process of being read into memory, or accessible.
• The number of referencing processes, showing how many processes access the page.
• Disk address, where the page is stored in the file system or the swap space on the disk, including the logical device number and block number.
• Pointers to the forward and backward neighbor frames on the free frame list or a hash frame queue.
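As with the page table entry, one frame table entry can be rendered in C along the following lines; the names are again invented for the sketch.

/* An illustrative frame table entry with the fields described above. */
#include <stdint.h>

enum frame_state {
    FR_RECLAIMABLE, FR_ON_SWAP, FR_IN_FILE, FR_READING, FR_ACCESSIBLE
};

struct frame_entry {
    enum frame_state state;    /* current status of the frame */
    int refcount;              /* number of processes referencing the page */
    uint32_t disk_dev;         /* logical device number of the disk copy */
    uint32_t disk_block;       /* block number of the disk copy */
    struct frame_entry *next;  /* neighbors on the free list or a hash queue */
    struct frame_entry *prev;
};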

Swap Table
The swap table is used by page replacement and swapping. Each page on the swap space has an entry in the swap table. The entry holds a reference field that indicates how many page table entries point to the page.

Page Fault


We have seen that when a process references a page that is not in memory, it causes a page fault that invokes demand paging. In fact, demand paging is performed by a handler similar to general interrupt handlers, except that the demand paging handler can go to sleep while interrupt handlers cannot. Because the demand paging handler is invoked from a running process and returns to that process, its execution has to take place in the context of the running process; thus the handler can sleep while the I/O operation that reads or swaps the page into memory is carried out. Since page faults can occur in different situations during a process's execution, the UNIX virtual memory management distinguishes two kinds of page faults: protection page faults and validity page faults. Accordingly, there are two demand paging handlers, the protection handler and the validity handler, which handle protection page faults and validity page faults, respectively. Protection page faults are often caused by the fork system call. Validity page faults can result from several situations, depending on the stage of execution of a process, and are mostly related to the execve system call. We discuss the two in turn.

Protection page fault
In UNIX System V, the kernel manages processes with the per-process region table, which is usually part of the process control block. Each of its entries represents a region of the process and holds a pointer to the starting virtual address of the region; the region contains the page table of the region and a reference field that indicates how many processes reference the region. The per-process region table consists of the shared regions and the private regions of the process: the former hold the part of the process that can be shared by several processes; the latter contain the part that is protected from other processes' references. When the demand paging handler is invoked during the fork system call, the kernel increments the region reference field of the shared regions for the child process. For each private region of the child process, the kernel allocates a new region table entry and page table, and then examines each entry in the page table of the parent process. If a page is valid, the kernel increments the reference count in its frame table entry, indicating the number of processes that share the page via different regions, rather than through a shared region, so that the parent and child processes can go separate ways after the execve system call. Similarly, if the page exists on the swap space, the kernel increments the reference field of the swap table entry for the page.


Now the page can be referenced through both regions, which share it until one of the parent or child processes writes to it. At that point the kernel copies the page so that each region has a private version. To arrange this, the kernel turns on the copy-on-write bit of each page table entry in the private regions of the parent and child processes during the fork system call. If either process then writes the page, it causes a protection page fault that invokes the protection handler. We can now see that the copy-on-write bit in a page table entry is designed to separate the creation of a child process from its physical memory allocation: via the protection page fault, the memory allocation can be postponed until it is needed. A protection page fault can be caused in two situations: one is when a process references a valid page whose permission bits do not allow the process access; the other is when a process tries to write a page whose copy-on-write bit was set by the fork system call. The kernel first checks whether permission is denied, in order to decide what to do next: signal an error or invoke the protection handler. In the latter case, the protection handler is invoked. When the protection handler runs, the kernel finds the appropriate region and page table entry and locks the region so that the page cannot be swapped out while the handler operates on it. If the page is shared with other processes, the kernel allocates a new frame and copies the contents of the old page into it; the other processes still reference the old page. After copying the page and updating the page table entry with the new frame number, the kernel decrements the process reference count of the old frame table entry. If the copy-on-write bit of the page is set but the page is not shared with other processes, the kernel lets the process keep the old frame. The kernel then separates the page from its disk copy, because the process will write the page in memory while other processes may still use the disk copy: it decrements the reference field of the swap table entry for the page and, if the reference count becomes 0, frees the swap space. It clears the copy-on-write bit and updates the page table entry. It then recalculates the process priority, because the process was raised to a kernel-level priority when it invoked the demand paging handler so that the demand paging could proceed smoothly. Finally, before returning to user mode, it checks for signals that were received while the demand paging was being handled.
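The heart of the protection handler can be condensed into a short sketch. Everything here, the structures, the names, and the allocator, is a simplified illustration of the steps above, not the actual System V code.

/* A schematic copy-on-write fault handler. */
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

struct frame { int refcount; char data[PAGE_SIZE]; };

struct pte {
    struct frame *frame;
    int cow;        /* copy-on-write bit set by fork */
    int writable;
};

static struct frame *alloc_frame(void)
{
    return calloc(1, sizeof(struct frame));  /* stand-in for the kernel allocator */
}

void protection_fault(struct pte *pte)
{
    if (!pte->cow)
        return;     /* a genuine permission violation: the kernel signals an error */

    if (pte->frame->refcount > 1) {
        /* Page is shared: copy it into a new frame; the others keep the old one. */
        struct frame *copy = alloc_frame();
        memcpy(copy->data, pte->frame->data, PAGE_SIZE);
        copy->refcount = 1;
        pte->frame->refcount--;
        pte->frame = copy;
    }
    /* Shared or not, the process now owns a private, writable copy. */
    pte->cow = 0;
    pte->writable = 1;
}

int main(void)
{
    struct frame *shared = alloc_frame();
    shared->refcount = 2;                 /* parent and child share the frame */
    struct pte child = { shared, 1, 0 };  /* copy-on-write, read-only */
    protection_fault(&child);             /* child writes: gets a private copy */
    return child.writable ? 0 : 1;
}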


Through the processing above, we can see that the page copying for the child process is deferred until the process needs it and causes a protection page fault, rather than done when the child is created. BSD systems used demand paging before System V did, and they had their own solution for separating memory allocation from child process creation. In BSD there are two versions of the fork system call: one is the regular fork; the other is the vfork system call, which does not perform physical memory allocation for the child process. The fork system call makes a physical copy of the pages of the parent process, a wasteful operation if it is closely followed by an execve system call. The vfork system call, which assumes that the child process will immediately invoke execve after returning from the vfork call, does not copy page tables, so it is faster than the System V fork. The potential risk of vfork is that a programmer who uses it incorrectly can endanger the system: after the vfork call, the child process uses the physical memory address space of the parent until execve or exit is called, and it can accidentally ruin the parent's data and stack, leaving the parent unable to return to its working context.
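A minimal sketch of the vfork idiom just described; /bin/ls is only an example target. The child must exec (here via execl) or _exit immediately, because until then it runs in the parent's address space.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = vfork();
    if (pid < 0) {
        perror("vfork");
        exit(1);
    }
    if (pid == 0) {
        /* Child: exec at once; touching the parent's data here is unsafe. */
        execl("/bin/ls", "ls", "-l", (char *)NULL);
        _exit(127);          /* reached only if execl fails; use _exit, not exit */
    }
    waitpid(pid, NULL, 0);   /* parent resumes after the child execs or exits */
    return 0;
}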

Page Replacement
As noted, demand paging has to cooperate with page replacement to implement virtual memory management. When a process executes, its working pages change dynamically: some of its pages in memory must be swapped out and replaced by new pages so that the process can keep executing until its work finishes. Page replacement is similar to swapping, except that it swaps out pages of a process rather than the whole process. In UNIX there are two solutions to page replacement: one is the page stealer of System V; the other is the page daemon of 4.2BSD.

UNIX security
UNIX systems are designed to encourage user interaction, which can make them more difficult to secure. UNIX systems are intended to be open systems; their specifications and source code are widely available. The UNIX password file is encrypted. When a user enters a password, it is encrypted and compared to the encrypted password file.

Thus, passwords are unrecoverable even by the system administrator. UNIX systems use salting when encrypting passwords. The salt is a two-character string selected randomly as a function of the time and the process ID.


Twelve bits of the salt then modify the encryption algorithm. Thus, users who choose the same password (by coincidence or intentionally) will, with high likelihood, have different encrypted passwords. Some installations modify the password program to prevent users from choosing weak passwords. The password file must be readable by any user because it contains other crucial information (usernames, user IDs, and the like) that is required by many UNIX tools. For example, because directories employ user IDs to record file ownership, ls (the tool that lists directory contents and file ownership) needs to read the password file to determine usernames from user IDs. If crackers obtain the password file, they could potentially break the password encryption. To address this issue, UNIX protects the password file from crackers by storing everything except the encrypted passwords in the normal password file and keeping the encrypted passwords in a shadow password file that can be accessed only by users with root privileges.
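The salted scheme is visible in the traditional crypt() library routine, as the sketch below shows. The password and the two salts are arbitrary examples; depending on the system, crypt() may instead be declared in <crypt.h> and require linking with -lcrypt.

#define _XOPEN_SOURCE 700   /* expose crypt() in <unistd.h> on many systems */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *password = "example";
    /* Two different two-character salts: the same password produces
       two different encrypted forms, as the text describes. */
    printf("%s\n", crypt(password, "ab"));
    printf("%s\n", crypt(password, "zx"));
    return 0;
}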

With the UNIX setuid permission feature, a program may be executed by one user with the privileges of another user. This powerful feature has security pitfalls, particularly when the resulting privilege is that of the "superuser" (who has access to all files in a UNIX system). For example, if a regular user is able to execute a shell belonging to the superuser for which the setuid bit has been set, the regular user essentially becomes the superuser. Clearly, setuid should be employed carefully. Users, including those with superuser privileges, should periodically examine their directories to confirm the presence of setuid files and detect any that should not be setuid. A relatively simple means of compromising security in UNIX systems (and other operating systems) is to install a program that prints out the login prompt, copies what the user then types, fakes an invalid login, and lets the user try again. The user has unwittingly given away his or her password! One defense: if you are confident you typed the password correctly the first time, log in at a different terminal and choose a new password immediately.

UNIX systems include the crypt command, which lets a user enter a key and plaintext and outputs ciphertext; the transformation can be reversed trivially with the same key. One problem with this is that users tend to use the same key repeatedly; once the key is discovered, all the other files encrypted with it can be read. Users also sometimes forget to delete their plaintext files after producing encrypted versions, which makes discovering the key much easier. Often, too many people are given superuser privileges.


Restricting superuser privileges can reduce the risk of attackers gaining control of a system due to errors made by inexperienced users. UNIX systems provide a substitute user identity (su) command to let users execute shells with a different user's credentials. All su activity should be logged; this command lets any user who types the correct password of another user assume that user's identity, possibly even acquiring superuser privileges. A popular Trojan horse technique is to install a fake su program that obtains the user's password, e-mails it to the attacker, and restores the regular su program. Never allow others to have write permission for your files, and especially for your directories; if you do, you are inviting someone to install a Trojan horse. UNIX systems contain a feature called password aging, in which the administrator determines how long passwords are valid; when a password expires, the user receives a message and is asked to enter a new one. There are several problems with this feature:

1. Users often supply easy-to-crack passwords.
2. The system often prevents a user from resetting the password to the old one (or any other) for a week, so the user cannot strengthen a weak password.
3. Users often switch between only two passwords.

Passwords should be changed frequently. A user can keep track of all login dates and times to determine whether an unauthorized user has logged in (which would mean that his or her password has been learned). Logs of unsuccessful login attempts often store passwords, because users sometimes accidentally type their password when they mean to type their username. Some systems disable accounts after a small number of unsuccessful login attempts, as a defense against an intruder who tries all possible passwords. An intruder who has penetrated the system can, however, use this feature to disable the account or accounts of users, including the system administrator, who might attempt to detect the intrusion.

An attacker who temporarily gains superuser privileges can install a trap-door program with undocumented features. For example, someone with access to the source code could rewrite the login program to accept a particular login name and grant that user superuser privileges without a password even being typed. It is also possible for individual users to "grab" the system, preventing other users from gaining access. A user could accomplish this by spawning thousands of processes, each of which opens hundreds of files, thereby filling all the slots in the open-file table.


Installations can guard against this by setting reasonable limits on the number of processes a parent can spawn and the number of files a process can open at once, but this in turn can hinder legitimate users who need the additional resources.
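Such per-process limits are exposed through the standard getrlimit/setrlimit interface, as in the sketch below. The numeric limits are arbitrary examples; a process-count limit (RLIMIT_NPROC) is available on many but not all systems.

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* Cap the number of files this process may hold open at once. */
    rl.rlim_cur = 64;    /* soft limit, enforced by the kernel */
    rl.rlim_max = 128;   /* hard limit, the ceiling for the soft limit */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        perror("setrlimit(RLIMIT_NOFILE)");

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("open files: soft %llu, hard %llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
    return 0;
}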

Kernel of UNIX

Overview of Operating Systems and Kernels
Because of the ever-growing feature set and ill design of some modern commercial operating systems, the notion of what precisely defines an operating system is vague. Many users consider whatever they see on the screen to be the operating system. Technically speaking, and in this book, the operating system is considered to be the parts of the system responsible for basic use and administration. This includes the kernel and device drivers, the boot loader, the command shell or other user interface, and the basic file and system utilities. It is the stuff you need, not a web browser or a music player. The term system, in turn, refers to the operating system and all the applications running on top of it. Of course, the topic of this book is the kernel. Whereas the user interface is the outermost portion of the operating system, the kernel is the innermost. It is the core internals: the software that provides basic services for all other parts of the system, manages hardware, and distributes system resources. The kernel is sometimes referred to as the supervisor, core, or internals of the operating system. Typical components of a kernel are interrupt handlers to service interrupt requests, a scheduler to share processor time among multiple processes, a memory management system to manage process address spaces, and system services such as networking and inter-process communication. On modern systems with protected memory management units, the kernel typically resides in an elevated system state compared to normal user applications, with a protected memory space and full access to the hardware. This system state and memory space is collectively referred to as kernel-space. Conversely, user applications execute in user-space: they see a subset of the machine's available resources and are unable to perform certain system functions, directly access hardware, or otherwise misbehave (without consequences, such as their death, anyhow). When executing kernel code, the system is in kernel-space executing in kernel mode, as opposed to normal user execution in user-space executing in user mode. Applications running on the system communicate with the kernel via system calls. An application typically calls functions in a library, for example the C library, that in turn rely on the system call interface to instruct the kernel to carry out tasks on their behalf. Some library calls provide many features not found in the system call, so calling into the kernel is just one step in an otherwise large function. For example, consider the familiar printf() function: it provides formatting and buffering of the data and only eventually calls write() to write the data to the console.


Conversely, some library calls have a one-to-one relationship with the kernel. For example, the open() library function does nothing except call the open() system call. Still other C library functions, such as strcpy(), should (you hope) make no use of the kernel at all. When an application executes a system call, it is said that the kernel is executing on behalf of the application. Furthermore, the application is said to be executing a system call in kernel-space, and the kernel is running in process context. This relationship, that applications call into the kernel via the system call interface, is the fundamental manner in which applications get work done.

Figure 1.1 Relationship between applications, the kernel, and hardware
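The library-versus-system-call distinction is easy to see in code. In this sketch, the first line goes through the C library's formatting and buffering; the second issues the write() system call directly.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Through the C library: formatting and buffering, then write(2). */
    printf("pid %d says hello via printf\n", (int)getpid());

    /* Straight at the system call interface: no formatting, no buffering. */
    const char msg[] = "hello via write\n";
    write(STDOUT_FILENO, msg, strlen(msg));
    return 0;
}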

The kernel also manages the system's hardware. Nearly all architectures, including all systems that Linux or UNIX supports, provide the concept of interrupts. When hardware wants to communicate with the system, it issues an interrupt that asynchronously interrupts the kernel. Interrupts are identified by a number. The kernel uses the number to execute a specific interrupt handler to process and respond to the interrupt. For example, as you type, the keyboard controller issues an interrupt to let the system know that there is new data in the keyboard buffer.


The kernel notes the interrupt number being issued and executes the correct interrupt handler. The interrupt handler processes the keyboard data and lets the keyboard controller know it is ready for more data. To provide synchronization, the kernel can usually disable interrupts, either all interrupts or just one specific interrupt number. In many operating systems, including Linux and UNIX, the interrupt handlers do not run in a process context. Instead, they run in a special interrupt context that is not associated with any process. This special context exists solely to let an interrupt handler quickly respond to an interrupt and then exit. These contexts represent the breadth of the kernel's activities. In fact, in UNIX, we can generalize that each processor is doing one of three things at any given moment:

• In kernel-space, in process context, executing on behalf of a specific process
• In kernel-space, in interrupt context, not associated with a process, handling an interrupt
• In user-space, executing user code in a process

This list is inclusive. Even corner cases fit into one of these three activities: for example, when idle, it turns out that the kernel is executing an idle process in process context in the kernel.

Linux versus Classic UNIX Kernels
Owing to their common ancestry and the same API, modern UNIX kernels share various design traits. With few exceptions, a UNIX kernel is typically a monolithic static binary; that is, it exists as a large single executable image that runs in a single address space. UNIX systems typically require a system with a paged memory-management unit; this hardware enables the system to enforce memory protection and to provide a unique virtual address space to each process.

Monolithic Kernel versus Microkernel Designs
Operating system kernels can be divided into two main design camps: the monolithic kernel and the microkernel. (A third camp, the exokernel, is found primarily in research systems but is gaining ground in real-world use.) Monolithic kernels are the simpler design of the two, and all kernels were designed in this manner until the 1980s. A monolithic kernel is implemented entirely as a single large process running in a single address space. Consequently, such kernels typically exist on disk as single static binaries. All kernel services exist and execute in the large kernel address space. Communication within the kernel is trivial because everything runs in kernel mode in the same address space: the kernel can invoke functions directly, as a user-space application might. Proponents of this model cite the simplicity and performance of the monolithic approach. Most UNIX systems are monolithic in design.


A microkernel, on the other hand, is not implemented as a single large process. Instead, the functionality of the kernel is broken down into separate processes, usually called servers. Ideally, only the servers that absolutely require such capabilities run in a privileged execution mode; the rest run in user-space. All the servers, though, are kept separate and run in different address spaces, so direct function invocation as in monolithic kernels is not possible. Instead, communication in a microkernel is handled via message passing: an inter-process communication (IPC) mechanism is built into the system, and the various servers communicate with and invoke "services" from each other by sending messages over the IPC mechanism. The separation of the various servers prevents a failure in one server from bringing down another. Likewise, the modularity of the system allows one server to be swapped out for another. Because the IPC mechanism involves quite a bit more overhead than a trivial function call, however, and because a context switch between kernel-space and user-space may be involved, message passing carries a latency and throughput hit not seen in monolithic kernels with their simple function invocation. Consequently, all practical microkernel-based systems now place most or all of their servers in kernel-space, to remove the overhead of frequent context switches and potentially to allow direct function invocation. The Windows NT kernel and Mach (on which part of Mac OS X is based) are examples of microkernels. Neither Windows NT nor Mac OS X runs any microkernel servers in user-space in its latest versions, defeating the primary purpose of microkernel designs altogether.


Bibliography:
• http://www.tutorialspoint.com/unix/index.htm
• http://gmarik.info/blog/2012/08/15/orphan-vs-zombie-vs-daemon-processes
• www.cs.kent.edu/~farrell/osf03/oldnotes/L06.pdf
• http://ocamlunix.forge.ocamlcore.org/threads.html
• https://kb.iu.edu/d/aiau
• This version of events is captured by the History & Timeline that can be found at http://www.UNIX-systems.org/what_is_unix/history_timeline.html
• For more information about the *BSD family, see the FAQ at http://www.faqs.org/faqs/386bsd-faq/part1/
• http://www.linuxmall.com/Allann/lxtm.001.html
• See http://www.gnu.org/gnu/linux-and-gnu.html
• See The Halloween Documents at http://www.opensource.org/halloween.html
• UNIX OS by Yukun Liu and Yong Yue (PDF book)