OPERATING SYSTEMS

Prepared by Dr. MVB





Computer System and Operating System Overview: overview of computer operating systems, operating system functions, protection and security, distributed systems, special-purpose systems, operating system structures and system calls, operating system generation.

Objectives:

• To provide a grand tour of the major operating-system components.

• To provide coverage of basic computer system organization.

• To give a historical view of the development of operating systems.

COMPUTER SYSTEM AND OPERATING SYSTEM OVERVIEW

OVERVIEW OF COMPUTER OPERATING SYSTEMS

An operating system is a program that acts as an intermediary between the user of a computer and the computer hardware.

Operating system goals:

• Execute user programs and make solving user problems easier

• Make the computer system convenient to use

• Use the computer hardware in an efficient manner

What does an operating system do?

A computer system is broadly classified into four components:

a) Hardware

b) Operating system

c) Application programs

d) Users

Hardware: the CPU, the memory, and the I/O devices provide the basic computing resources for the system.

Application programs: such as word processors, spreadsheets, and compilers, define the ways in which these resources are used to solve the users' computing problems.

Operating system: controls and coordinates the use of the hardware among the various application programs for the various users.

Fig: abstract view of a computer system

From the above diagram, an operating system is defined as the interface between the user of a program and the computer hardware. It provides a platform for application programs to run on.


It controls the execution of application programs, and it manages the computer hardware. The OS provides an environment in which the user can execute programs in a convenient and efficient manner.

Operating system as resource allocator

A computer system has many resources available to solve a problem: CPU time, memory space, I/O devices, and so on. The OS must decide how to allocate these resources to specific programs and users; allocation and deallocation are therefore functions of the operating system, and this forms the basis for sharing resources among users.

Operating system as control program

An operating system is a control program. A control program manages the execution of user programs to prevent errors and improper use of the computer. It controls the entire system, and it protects the system, for example by requiring passwords.

We can explore the operating system from two viewpoints:

a) User viewpoint: what the user expects from the operating system in terms of convenience and efficiency.

b) System viewpoint: what the operating system expects of the user, i.e., whether the resources are being used in an efficient manner or not.


Operating system goals

Convenience: the operating system should provide an environment that is simple and easy to use.

Efficiency: the operating system should allow the computer system's resources to be used in an efficient manner.

Ability to evolve: an operating system should be developed in a way that provides flexibility and maintainability.

Various computer operating systems

Mainframe systems: these are the old systems, belonging to the 1960s. These systems were very large, and punched cards were used instead of a keyboard. They had serious disadvantages: they always performed a single task, and the CPU stayed idle much of the time, since one task executed until it completed.

Multiprogramming systems: in monoprogramming, memory contains only one user program at any point of time, whereas a multiprogramming system keeps more than one user program in memory. In monoprogramming, when the CPU is executing a program and an I/O operation is encountered, the program moves to the I/O devices, and during that time the CPU sits idle, so CPU utilization is poor. In multiprogramming, when a user program performs I/O, the CPU switches to the next user program; hence the CPU is kept busy at all times.

Timesharing systems: each user has a separate program in memory. Each program in a time-sharing system is given a certain time slot, i.e., the OS allocates the CPU to a process for a given time period. This time period is known as the time quantum (or time slice).


Desktop systems: a desktop system runs a control program on the user's own machine; it is also called a client operating system. It concentrates on convenience and efficiency for the user.

Multiprocessor systems: these are also called parallel systems or tightly coupled systems. These systems have two or more processors, which increases throughput. The processors share the computer bus and sometimes the clock, memory, and peripheral devices.

Distributed systems: a distributed system is a collection of physically separate, possibly heterogeneous computer systems that are networked together to provide the users with access to the various resources that the system maintains.

Clustered systems: these are composed of two or more systems grouped together which share information. Cluster technology is evolving rapidly; some cluster products support dozens of systems in a cluster, as well as clustered nodes that are separated by miles.

Real-time systems: a real-time system has well-defined, fixed time constraints. Processing must be done within the defined constraints, or the system will fail. A real-time system functions correctly only if it returns the correct result within its time constraints.

Handheld systems: these are the systems used in mobile phones. They have a limited size, small display screens, and slow processors. To include faster processors, a larger battery would be required, which would take more space and need to be recharged frequently; they also have limited I/O.


OPERATING SYSTEM FUNCTIONS/SERVICES

User interface: the operating system should provide an interface through which a user can interact. Earlier operating systems provided a command-line interface, which uses text commands; all users were expected to type their commands at the keyboard. Some use a batch interface, which reads commands from files. Nowadays a graphical user interface is used, where a window displays a list of commands to be chosen by the user.

Program creation and execution: the operating system should support various utilities such as editors, compilers, and debuggers, in order to give the user the facility to write and execute programs.

I/O device support: there are many I/O devices, and each of them has its own instructions and control signals used during its operation. The OS should take care of all these internal device details and should provide users with simple read() and write() functions for using those devices.

File system management: user data is stored in files. The OS should manage all those files and should provide various functions like create, open, close, etc., additionally protecting the files from unauthorized access.

Interprocess communication: communication takes place between two processes, which exchange data among themselves. This can be implemented by operating system techniques like message passing and shared memory.

Error detection: the OS is responsible for handling the various errors that occur in the CPU. Whenever an error occurs, the OS takes appropriate steps and may provide debugging facilities.


Accounting: the OS records statistical information that can be used to improve system performance; it also stores information about how many times each user accessed which resources.

Resource allocation: it is the responsibility of the operating system to allocate resources such as CPU time and I/O devices effectively. For example, various scheduling algorithms are used for allocating the CPU.

Protection and security: protection is a mechanism for controlling the access of processes to resources defined by a computer system; it provides a means of specifying the controls to be imposed. Security is protecting the system from external or internal attacks, such as viruses, worms, etc.

PROTECTION AND SECURITY

Protection: any mechanism for controlling access of processes or users to resources defined by the OS.

Security: defense of the system against internal and external attacks.

• Attacks cover a huge range, including denial-of-service, worms, viruses, identity theft, and theft of service.

• Systems generally first distinguish among users, to determine who can do what.

• User identities (user IDs, security IDs) include a name and an associated number, one per user.

• The user ID is then associated with all files and processes of that user to determine access control.

• A group identifier (group ID) allows a set of users to be defined and controls to be managed for them; it too is associated with each process and file.

• Privilege escalation allows a user to change to an effective ID with more rights.


DISTRIBUTED SYSTEMS

A distributed system is a collection of independent, heterogeneous systems that are networked together to provide users with access to the various resources that the system maintains. Access to a shared resource increases computation speed, functionality, data availability, and reliability. Some operating systems generalize network access as a form of file access.

Some operating systems provide generic functions/methods for accessing files over a network; they handle the networking details in the network interface's device driver. Other operating systems have separate functions for network access and local access.

There are various types of networks available for connecting computers together. These networks are distinguished by the protocols they use; examples are ATM and TCP/IP. TCP/IP is the most popular network protocol, and many operating systems support it.

A network is a communication path between two or more systems, and distributed systems depend upon networking for their functionality. Networks are characterized into three types: a local area network (LAN) connects computers within a room or building; a wide area network (WAN) connects cities or countries; a metropolitan area network (MAN) connects buildings within a city. A network operating system is an operating system that provides features such as file sharing across the network, including a communication scheme that allows different processes on different computers to exchange messages. A network operating system acts autonomously from all other computers on the network, whereas a distributed operating system provides a less autonomous environment.


SPECIAL-PURPOSE SYSTEMS

Computer systems can be classified by their speed, usage, and hardware. The following are some special-purpose systems built for specific applications.

Real-time embedded systems: a real-time system is used when rigid time requirements have been placed on the operation of a processor or the flow of data; thus it is often used as a control device in a dedicated application. A real-time system has well-defined, fixed time constraints. Processing must be done within the defined constraints, or the system will fail; a real-time system functions correctly only if it returns the correct result within its time constraints.

Multimedia systems: many operating systems handle conventional data, which includes text files, programs, word-processing documents, and spreadsheets. Nowadays multimedia has gained popularity; multimedia is a combination of audio, video, and conventional files. Examples of multimedia embedded systems are MP3 players, DVD players, and video conferencing.

Handheld systems: these have a limited size, small display screens, and slow processors. To include faster processors, a larger battery would be required, which would take more space and need to be recharged frequently; they also have limited I/O. Examples of handheld systems are mobile phones.


OPERATING SYSTEM STRUCTURES

An operating system is a very large and complex program that must be designed and modified carefully. It is usually preferable to build the system from several components rather than as a single monolithic program. Each component has a well-defined job, and all the components are joined together to form the operating system. The following are the various approaches to operating system design.

Simple structure

There are several commercial operating systems that are very simple but do not have well-defined structures. Usually these systems started small, simple, and with limited functionality, but their popularity grew beyond their original scope. MS-DOS is an example of such a system. It always faced the threat of vulnerable and malicious programs that could damage the entire system, due to the improper separation of interfaces and functionality: an application program could access the hardware directly by using the basic I/O routines, and there were no hardware protection and security mechanisms in the era of the 8088 processor.

Early versions of UNIX also fall into this category. UNIX divides the system into two parts: the kernel (the heart of UNIX) and the system programs. The kernel contains several device drivers which interact with the hardware directly. Problems arose in the kernel because it became larger and harder to implement as more functionality was added.


Fig: structure of MS-DOS

Layered approach

In the layered approach, the operating system is divided into layers. The highest layer corresponds to the application layer, while the lowest layer is the hardware. Each layer consists of data structures and operations which can be invoked by the higher layers, and each layer provides services to the layer above it. The advantage of this approach is that construction and debugging are simple: since the lowest layer is nothing but hardware, layer 1 can be debugged on its own, and if any bug is found there it is fixed there. The limitation of this approach is that careful advance planning is needed, because each layer may use only the layers below it.

Another major problem with the layered approach is its inefficiency, as it increases the overall overhead of the operating system. For example, when a user executes an I/O system call, it is first caught in the I/O layer, which calls the memory-management layer, which ultimately calls the CPU-scheduling layer. At each of these calls, additional system-call and processing time accumulates.

Module-based approach

This is currently a favored approach to operating system design, and it uses object-oriented programming techniques. In this approach, various components of the operating system are implemented as dynamically loadable modules. The kernel consists of core components plus dynamic links for loading the other modules, either at boot time or at run time.


Operating systems such as newer versions of UNIX, Solaris, and Linux all use the module-based approach. The Solaris operating system, for example, is organized around a core kernel with seven types of loadable kernel modules.

Fig: loadable modules of the Solaris operating system

The kernel provides the core services, while other services, like device drivers for particular hardware and support for various file systems, can be added to the kernel dynamically when needed. This approach resembles the layered approach in that each module has well-defined interfaces for communicating with the other modules, but it is more flexible.


SYSTEM CALLS

The operating system provides a wide range of services and functionality, and these services are accessed through system calls. System calls act as the interface between user applications and the operating system. They are available as built-in functions and routines in almost all high-level languages like C and C++.

For example, all operating systems have a routine for creating a directory; if we want to execute such an OS routine from a program, we must make a system call. A system call is thus an interface between a process and the operating system, and making one looks much like calling a predefined function or subroutine.

Consider copying the contents of one file to another. To handle this task, several system calls are used, from prompting the user to enter the names of the source and destination files, to copying the contents, to closing both files.

In GUI systems, users can select the source and destination files with the mouse; this too is done using I/O system calls. The next step is to open the specified files, done with the open system call. Errors are possible here: the source file may not exist, or the system may fail to open a file due to access restrictions, in which case the operation is aborted and the user notified.

When both files are open, read() is used to read the contents of the source file and write() is used to write those contents to the destination file. This is repeated until the end of the file is reached. Finally, both files are closed, and the user is typically notified by a message on screen, using yet another system call.


Fig: how system calls are used

As we can see, even simple programs make heavy use of the operating system; systems frequently execute thousands of system calls per second. Most programmers never see this level of detail, however: application developers design programs according to an application programming interface (API).

An application programming interface specifies a set of functions, including the parameters passed to each function and its return values. Three common APIs are:

1) the Win32 API for Windows systems

2) the POSIX API for POSIX-based systems

3) the Java API for programs that run on the JVM


The relationship between the API, the system call interface, and the operating system:

• The system call interface serves as the link to the system calls within the operating system.

• It intercepts function calls made through the API and invokes the necessary system calls within the operating system.

• A number is associated with each system call, and the system call interface maintains a table indexed according to these numbers.

• The details of the operating system interface are hidden from the programmer by the API and are managed by the run-time support library.

Three methods are used to pass parameters to the operating system:

Registers: the processor registers can be used to store the parameters; the system call receives those parameters by reading the register values.


Stack: parameters can also be pushed onto the stack, and the system call receives them by copying them from the stack. The advantage is that this does not limit the number or length of the parameters being passed.

Block of memory: if the number of parameters is more than the available processor registers, the parameters are stored in a block of memory, and the address of that block is passed in a processor register.

TYPES OF SYSTEM CALLS

System calls can be grouped into five categories: process control, file manipulation, device manipulation, information maintenance, and communication.

Process control

• end, abort

• load, execute

• create process, terminate process

• get process attributes, set process attributes


• wait for time

• wait event, signal event

• allocate and free memory

Often two or more processes share data. To ensure the integrity of the shared data, operating systems often provide system calls allowing a process to lock the shared data, preventing other processes from accessing it while it is locked. Such system calls are acquire lock and release lock.

Process control is best illustrated by contrasting a single-tasking system with a multitasking one. MS-DOS is an example of a single-tasking system: as soon as the computer is started, its command interpreter is invoked.

Fig: MS-DOS (a) at system startup, (b) running a program

To run a program, MS-DOS uses a simple method and does not create a new process. It loads the program into memory, writing over most of itself to give the program as much memory as possible, and then sets the instruction pointer to the first instruction of the program.


UNIX is an example of a multitasking system. When a user logs on to the system, the shell (command interpreter) of the user's choice is run. This shell is similar to the MS-DOS shell in that it accepts commands and executes programs at the user's request.

Fig: UNIX running multiple programs

To start a new program, the shell creates a new process with fork(); the selected program is loaded into memory through exec(), and the program is then executed.

File manipulation

• create file, delete file

• open, close

• read, write, reposition

• get file attributes, set file attributes

We need to create and delete files, using the create and delete system calls. Once a file is created, we can open it using open(); we can read, write, or reposition within the file using read(), write(), and reposition(); and when the file is no longer in use, we close() it. We may also need to get and set file attributes, such as usage counts and accounting information.


Device management

• request device, release device

• read, write, reposition

• get device attributes, set device attributes

• logically attach or detach devices

To perform an operation on a device, a process must first ask for it with request(); when it is finished, it gives the device back with release(). While it holds the device, it can perform read(), write(), and reposition() operations, and it can also get and set device attributes.

Information maintenance

• get time/date, set time/date

• get system data, set system data

• get process, file, or device attributes

• set process, file, or device attributes

To return the current date and time, we use get time/date(). Other system calls may return information about the system, such as the number of current users, the version number of the operating system, the amount of free memory or disk space, and so on.

Communication

• create, delete communication connection

• send, receive message

• transfer status information

• attach/detach remote devices

These calls create and delete communication connections between processes or devices, using create() and delete(); send and receive messages using send() and receive(); transfer status information; and attach or detach remote devices. There are two communication models: message passing and shared memory. Shared memory allows maximum speed of communication; message passing is typically used for exchanging smaller amounts of data.

Fig: the message-passing and shared-memory models


OPERATING SYSTEM GENERATION

Operating system generation is the process of configuring an operating system for the specific machine or site where it is to be deployed and run. This process is often called SYSGEN. The OS is usually distributed on disk, CD-ROM, or DVD. A SYSGEN program is used to generate the operating system: it prompts the system operator for information about the hardware configuration, or reads the same information from a file, and verifies what configuration is available.

The following information has to be determined:

• the type of CPU/processor used and the options it has, including the instruction set and the floating-point arithmetic it supports; if there are multiple CPUs, each of them has to be described

• the amount of memory available

• the various devices the system supports and how it addresses them; some systems maintain a device number, interrupt number, type, model, and special characteristics for each device

• finally, the various options the user wants from the operating system, such as buffer sizes, the CPU-scheduling algorithm to be used, and the maximum number of processes supported at a time (modern operating systems often decide the scheduling algorithm, buffer sizes, etc. themselves, changing them dynamically depending on system performance)

After the above information is gathered, the operating system source is compiled, with data declarations, constants, and initializations set to produce a version of the operating system tailored to the particular system or machine.


Alternatively, based on the system description, modules may be selected from a precompiled library and linked together to generate the operating system as a whole. At the other extreme, if the system is completely table driven, then selection of options is done at execution time rather than at compile time; this avoids recompilation but produces a more general, and hence slower, system.

OPERATING-SYSTEM SERVICES

An operating system provides an environment for the execution of programs. It provides certain services to programs and to the users of those programs. The specific services provided, of course, differ from one operating system to another, but we can identify common classes.

One set of operating-system services provides functions that are helpful to the user.

• User interface. Almost all operating systems have a user interface (UI). This interface can take several forms. One is a command-line interface (CLI), which uses text commands and a method for entering them (say, a program to allow entering and editing of commands). Another is a batch interface, in which commands and directives to control those commands are entered into files, and those files are executed. Most commonly, a graphical user interface (GUI) is used. Here, the interface is a window system with a pointing device to direct I/O, choose from menus, and make selections, and a keyboard to enter text. Some systems provide two or all three of these variations.


• Program execution. The system must be able to load a program into memory and to run that program. The program must be able to end its execution, either normally or abnormally (indicating error).

• I/O operations. A running program may require I/O, which may involve a file or an I/O device. For specific devices, special functions may be desired (such as recording to a CD or DVD drive or blanking a CRT screen). For efficiency and protection, users usually cannot control I/O devices directly. Therefore, the operating system must provide a means to do I/O.

• File-system manipulation. The file system is of particular interest. Obviously, programs need to read and write files and directories. They also need to create and delete them by name, search for a given file, and list file information. Finally, some programs include permissions management to allow or deny access to files or directories based on file ownership.

• Communications. There are many circumstances in which one process needs to exchange information with another process. Such communication may occur between processes that are executing on the same computer or between processes that are executing on different computer systems tied together by a computer network. Communications may be implemented via shared memory or through message passing, in which packets of information are moved between processes by the operating system.

• Error detection. The operating system needs to be constantly aware of possible errors. Errors may occur in the CPU and memory hardware (such as a memory error or a power failure), in I/O devices (such as a parity error on tape, a connection failure on a network, or lack of paper in the printer), and in the user program (such as an arithmetic overflow, an attempt to access an illegal memory location, or a too-great use of CPU time). For each type of error, the operating system should take the appropriate action to ensure correct and consistent computing. Debugging facilities can greatly enhance the user's and programmer's abilities to use the system efficiently.

Another set of operating-system functions exists not for helping the user but rather for ensuring the efficient operation of the system itself.

• Resource allocation. When there are multiple users or multiple jobs running at the same time, resources must be allocated to each of them. Many different types of resources are managed by the operating system. Some (such as CPU cycles, main memory, and file storage) may have special allocation code, whereas others (such as I/O devices) may have much more general request and release code.

• Accounting. We want to keep track of which users use how much and what kinds of computer resources. This record keeping may be used for accounting (so that users can be billed) or simply for accumulating usage statistics. Usage statistics may be a valuable tool for researchers who wish to reconfigure the system to improve computing services.

• Protection and security. The owners of information stored in a multiuser or networked computer system may want to control use of that information. When several separate processes execute concurrently, it should not be possible for one process to interfere with the others or with the operating system itself. Protection involves ensuring that all access to system resources is controlled. Security of the system from outsiders is also important. Such security starts with requiring each user to authenticate himself or herself to the system, usually by means of a password, to gain access to system resources.

SYSTEM CALLS

System calls provide an interface to the services made available by an operating system.

Application developers often do not have direct access to the system calls, but can access them through an application programming interface (API). The functions that are included in the API invoke the actual system calls. By using the API, certain benefits can be gained:

• Portability: as long as a system supports an API, any program using that API can compile and run.
• Ease of use: using the API can be significantly easier than using the actual system call.

Let's first use an example to illustrate how system calls are used: writing a simple program to read data from one file and copy it to another file.

To do this work, we first need to take the two file names as input. Though the task itself is simple, it requires many system calls.
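As a sketch, the copy program above maps onto POSIX system calls roughly as follows (the buffer size, file names, and error handling are illustrative choices, not from the original text):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* copy src to dst using raw system calls; returns 0 on success, -1 on error */
int copy_file(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);                         /* open input file */
    if (in < 0) { perror("open src"); return -1; }
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644); /* create output */
    if (out < 0) { perror("open dst"); close(in); return -1; }

    char buf[4096];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0)           /* read a chunk */
        if (write(out, buf, (size_t)n) != n) {            /* write it out */
            n = -1;
            break;
        }
    close(in);                                            /* close both files */
    close(out);
    return n < 0 ? -1 : 0;
}
```

A caller would invoke it as copy_file("input.txt", "output.txt"); each open, read, write, and close here is one of the many system calls the text refers to.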

Three general methods exist for passing parameters to the OS:

1. Parameters can be passed in registers.
2. When there are more parameters than registers, parameters can be stored in a block, and the block address can be passed as a parameter in a register.
3. Parameters can also be pushed onto, and popped off, the stack by the program and the operating system.


TYPES OF SYSTEM CALLS

System calls can be grouped roughly into five major categories: process control, file manipulation, device manipulation, information maintenance, and communications.

• Process control
- end, abort
- load, execute
- create process, terminate process
- get process attributes, set process attributes
- wait for time
- wait event, signal event
- allocate and free memory

• File management
- create file, delete file
- open, close
- read, write, reposition
- get file attributes, set file attributes

• Device management
- request device, release device
- read, write, reposition
- get device attributes, set device attributes
- logically attach or detach devices

• Information maintenance
- get time or date, set time or date
- get system data, set system data
- get process, file, or device attributes
- set process, file, or device attributes

• Communications
- create, delete communication connection
- send, receive messages
- transfer status information
- attach or detach remote devices

PROCESS CONTROL

A running program needs to be able to halt its execution either normally (end) or abnormally (abort). When execution is stopped abnormally, often a dump of memory is taken and can be examined with a debugger.

FILE MANAGEMENT

Some common system calls are create, delete, open, close, read, write, and reposition. There is also a need to determine file attributes: get and set file attributes. Many times the OS provides an API to make these system calls.
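Assuming a POSIX system, these file-management calls can be exercised in one short sketch (the path and the 6-byte payload are arbitrary):

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* create, write, reposition, read, get attributes, close, and delete one file;
   returns the file size reported by fstat(), or -1 on error */
long file_ops_demo(const char *path) {
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644); /* create file */
    if (fd < 0)
        return -1;
    write(fd, "hello!", 6);               /* write */
    lseek(fd, 0, SEEK_SET);               /* reposition to the start */
    char buf[8];
    read(fd, buf, sizeof buf);            /* read back */
    struct stat st;
    fstat(fd, &st);                       /* get file attributes */
    close(fd);                            /* close */
    unlink(path);                         /* delete file */
    return (long)st.st_size;
}
```

For the 6-byte payload above, file_ops_demo() returns 6, the size recorded in the file's attributes.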

DEVICE MANAGEMENT

A process usually requires several resources to execute; if these resources are available, they will be granted and control returned to the user process. These resources can also be thought of as devices. Some are physical, such as a video card, and others are abstract, such as a file.

User programs request the device and, when finished, release the device. Similar to files, we can read, write, and reposition the device.

INFORMATION MAINTENANCE

Some system calls exist purely for transferring information between the user program and the operating system. An example of this is the time or date.

The OS also keeps information about all its processes and provides system calls to report this information.

Ex: a system call to return the current time and date.
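On a POSIX system, such information-maintenance calls might be used as in this small sketch (the exact output depends on when and where it runs):

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* print the current date and this process's identifiers, all obtained
   through information-maintenance system calls */
void show_info(void) {
    time_t now = time(NULL);                 /* get time/date */
    printf("now: %s", ctime(&now));          /* human-readable form */
    printf("pid=%ld ppid=%ld\n",             /* process attributes */
           (long)getpid(), (long)getppid());
}
```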

COMMUNICATION

There are two models of interprocess communication: the message-passing model and the shared-memory model.

• Message passing uses a common mailbox to pass messages between processes.
• Shared memory uses certain system calls to create and gain access to regions of memory owned by other processes. The two processes exchange information by reading and writing in the shared data.
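As one illustration of the message-passing model, a UNIX pipe can carry a message from a parent to a child; this sketch assumes POSIX fork() and pipe() and uses an arbitrary 5-byte message:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* one message-passing round trip over a pipe; returns 0 on success */
int demo_pipe(void) {
    int fd[2];
    if (pipe(fd) == -1)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                        /* child: the receiver */
        close(fd[1]);                      /* close unused write end */
        char buf[32];
        ssize_t n = read(fd[0], buf, sizeof buf);
        close(fd[0]);
        _exit(n == 5 ? 0 : 1);             /* expect the 5-byte message */
    }
    close(fd[0]);                          /* parent: the sender */
    write(fd[1], "hello", 5);              /* "send" the message */
    close(fd[1]);
    int status;
    waitpid(pid, &status, 0);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The pipe plays the role of the mailbox: the kernel moves the packet of data from the sender's address space to the receiver's.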

OPERATING-SYSTEM STRUCTURE

A system as large and complex as a modern operating system must be engineered carefully if it is to function properly and be modified easily. A common approach is to partition the task into small components rather than have one monolithic system. Each of these modules should be a well-defined portion of the system, with carefully defined inputs, outputs, and functions.

Simple Structure

Many commercial systems do not have well-defined structures. Frequently, such operating systems started as small, simple, and limited systems and then grew beyond their original scope.

MS-DOS was written to provide the most functionality in the least space. It is not divided into modules. Although MS-DOS has some structure, its interfaces and levels of functionality are not well separated. For instance, application programs are able to access the basic I/O routines to write directly to the display and disk drives. Such freedom leaves MS-DOS vulnerable to errant (or malicious) programs, causing entire-system crashes when user programs fail.

UNIX: limited by hardware functionality, the original UNIX OS had limited structuring. The UNIX OS consists of two separate parts:

• System programs
• The kernel
- Consists of everything below the system-call interface and above the hardware.
- Provides the file system, CPU scheduling, memory management, and other OS functions; a large number of functions at one level.


LAYERED APPROACH

With proper hardware support, operating systems can be broken into pieces that are smaller and more appropriate than those allowed by the original MS-DOS or UNIX systems. The operating system can then retain much greater control over the computer and over the applications that make use of that computer.

A system can be made modular in many ways. One method is the layered approach, in which the operating system is broken up into a number of layers (levels). The bottom layer (layer 0) is the hardware; the highest (layer N) is the user interface.

An operating-system layer is an implementation of an abstract object made up of data and the operations that can manipulate those data. A typical operating-system layer—say, layer M—consists of data structures and a set of routines that can be invoked by higher-level layers. Layer M, in turn, can invoke operations on lower-level layers.

The main advantage of the layered approach is simplicity of construction and debugging. The layers are selected so that each uses functions (operations) and services of only lower-level layers. This approach simplifies debugging and system verification. The first layer can be debugged without any concern for the rest of the system, because, by definition, it uses only the basic hardware (which is assumed correct) to implement its functions. Once the first layer is debugged, its correct functioning can be assumed while the second layer is debugged, and so on.

Each layer is implemented with only those operations provided by lower-level layers.
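The bottom-up debugging idea can be caricatured in code: each layer's initializer calls only the layer below it, so layers can be brought up, and checked, in order. The layer names and init functions here are entirely hypothetical:

```c
/* hypothetical layers; each initializer uses only the layer below it */
typedef struct {
    const char *name;
    int (*init)(void);
} layer;

static int hw_init(void)  { return 0; }           /* layer 0: bare hardware */
static int mem_init(void) { return hw_init(); }   /* layer 1: memory management */
static int fs_init(void)  { return mem_init(); }  /* layer 2: file system */
static int ui_init(void)  { return fs_init(); }   /* layer N: user interface */

/* bring the layers up bottom-to-top; a failure pinpoints the first bad layer */
int boot(void) {
    layer layers[] = { { "hardware", hw_init },    { "memory", mem_init },
                       { "file system", fs_init }, { "user interface", ui_init } };
    for (unsigned i = 0; i < sizeof layers / sizeof layers[0]; i++)
        if (layers[i].init() != 0)
            return -1;   /* debug this layer before going any higher */
    return 0;
}
```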

MICROKERNELS

Researchers at Carnegie Mellon University developed an operating system called Mach that modularized the kernel using the microkernel approach. This method structures the operating system by removing all nonessential components from the kernel and implementing them as system-level and user-level programs. The result is a smaller kernel.

The main function of the microkernel is to provide a communication facility between the client program and the various services that are also running in user space. Communication is provided by message passing. One benefit of the microkernel approach is ease of extending the operating system.

The microkernel approach moves as much as possible from the kernel into user space. Communication takes place between user modules using message passing.

Benefits:
- Easier to extend a microkernel.
- Easier to port the OS to new architectures.
- More reliable (less code runs in kernel mode).
- More secure.

Detriments:
- Performance overhead of user-space-to-kernel-space communication.

Fig: Mac OS architecture.

MODULES:

Perhaps the best current methodology for operating-system design involves using object-oriented programming techniques to create a modular kernel. Here, the kernel has a set of core components and dynamically links in additional services either during boot time or during run time.

Each core component is separate. Each talks to the others over known interfaces. Each is loadable as needed within the kernel.

This model is similar to the layered model but more flexible. A Solaris core kernel has seven types of loadable kernel modules:


• File systems • Scheduling classes • Loadable system calls • Executable formats • STREAMS modules • Miscellaneous • Device and bus drivers

Such a design allows the kernel to provide core services yet also allows certain features to be implemented dynamically.

OPERATING SYSTEM GENERATION

• Operating systems are designed to run on any of a class of machines; the system must be configured for each specific computer site.
• A SYSGEN program obtains information concerning the specific configuration of the hardware system.
• Booting: starting a computer by loading the kernel.
• Bootstrap program: code stored in ROM that is able to locate the kernel, load it into memory, and start its execution.

SYSTEM BOOT

The OS must be made available to the hardware so the hardware can start it.

• A small piece of code, the bootstrap loader, locates the kernel, loads it into memory, and starts it.
• Sometimes a two-step process is used, where a boot block at a fixed location loads the bootstrap loader.
• When power is initialized on the system, execution starts at a fixed memory location.
- Firmware is used to hold the initial boot code.


IMPORTANT QUESTIONS

• An operating system can be viewed as a resource allocator. Explain.

• List five services provided by an operating system that are designed to make it more convenient for users to use the computer system. In what cases would it be impossible for user-level programs to provide these services? Explain.

• Pictorially depict the location of the operating system in the computer system and explain its functioning.

• Explain the following terms: file-system manipulation, error detection, resource allocation, accounting.

• Define the following terms: operating system, distributed system, network operating system.

• Explain the concept of system calls. (IMP: repeated many times)

• What is a multitasking system? Explain the concept. Advantages?

• Explain the following terms: multitasking, real-time embedded system, system calls.

• Explain multiprogramming, multitasking, and multiprocessing.

• What is the kernel of an operating system? What are the normal functions of a kernel?

• What is a system call? What is the purpose of a system call? Describe the various types of system calls with suitable examples.

• What is a time-sharing system? How does a time-sharing system better utilize the resources?

• What is an operating system, and what is the structure of an operating system?

• What are the various types of system calls?

• Explain how an operating system acts as a resource manager.

• Explain the following terms: real-time system, operating-system services.

• Describe the functions of an operating system.

• Explain about distributed systems.

• Explain the purpose of an operating system and its goals.

• How does an operating system make system calls available?

• Protection can improve reliability. Explain how.

• Explain the various memory hierarchies.

• Explain the various objectives and functions of an operating system.


Process Management: Process concept, process scheduling, operations, interprocess communication. Multithreaded programming models. Process-scheduling criteria and algorithms and their evaluation.

Objectives:

• To describe the services an operating system provides to users, processes, and other systems.
• To discuss the various ways of structuring an operating system.
• To explain how operating systems are installed and customized and how they boot.

PROCESS MANAGEMENT

PROCESS CONCEPT

A process is simply a "unit of work" or a "program in execution".

• A process is more than the program code, which is sometimes known as the text section. An operating system consists of a collection of processes.

• OS processes execute system code, and user processes execute user code.

• It also includes the current activity, as represented by the value of the program counter and the contents of the processor's registers.

• A process generally also includes the process stack, which contains temporary data (such as function parameters, return addresses, and local variables), and a data section, which contains global variables.

• A process may also include a heap, which is memory that is dynamically allocated during process run time.


• The structure of a process in memory is shown in the figure below.

Process in Memory

Process State:

As a process executes, it changes state. The state of a process is defined in part by the current activity of that process. Each process may be in one of the following states:

There are five states in a process's execution life. They are:

1. New

2. Ready

3. Running

4. Waiting

5. Terminated

• New- The process is being created.

• Running- Instructions are being executed.


• Waiting- The process is waiting for some event to occur (such as an I/O completion or reception of a signal).

• Ready- The process is waiting to be assigned to a processor.

• Terminated- The process has finished execution.

It is important to realize that only one process can be running on any processor at any instant. Many processes may be ready and waiting, however. The state diagram corresponding to these states is presented in the figure.

State changes:

Null -> New
New -> Ready
Ready -> Running
Running -> Terminated
Running -> Ready
Running -> Waiting
Waiting -> Ready
Ready -> Terminated
Waiting -> Terminated
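The transition list above can be captured as a small validity check; the enum names are illustrative, and the Null state (before creation) is omitted since it has no representation inside the OS:

```c
#include <stdbool.h>

typedef enum { NEW, READY, RUNNING, WAITING, TERMINATED } pstate;

/* true if "from -> to" appears in the state-change list above */
bool valid_transition(pstate from, pstate to) {
    switch (from) {
    case NEW:     return to == READY;
    case READY:   return to == RUNNING || to == TERMINATED;
    case RUNNING: return to == READY || to == WAITING || to == TERMINATED;
    case WAITING: return to == READY || to == TERMINATED;
    default:      return false;            /* nothing leaves Terminated */
    }
}
```

Note, for example, that a waiting process can never move straight to Running; it must pass through Ready first.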

Process Control Block (PCB) or Task Control Block (TCB):

Each process is represented in the operating system by a process control block (PCB).

• It is also called a task control block.
• It contains many pieces of information associated with a specific process, including these:

- process state
- process number
- program counter
- registers
- memory limits
- list of open files

Fig: Process Control Block (PCB).

Process state: The state may be new, ready, running, waiting, halted, and so on.


Program counter: The counter indicates the address of the next instruction to be executed for this process.

CPU registers: The registers vary in number and type, depending on the computer architecture. They include condition-code information, stack pointers, data registers, and so on. Along with the program counter, this state information must be saved when an interrupt occurs, to allow the process to be continued correctly afterward.

CPU-scheduling information: This information includes a process priority, pointers to scheduling queues, and any other scheduling parameters.

Memory-management information: This information may include the values of the base and limit registers, the page tables, or the segment tables, depending on the memory system used by the operating system.

Accounting information: This information includes the amount of CPU and real time used, time limits, account numbers, job or process numbers, and so on.

I/O status information: This information includes the list of I/O devices allocated to the process, a list of open files, and so on.
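Putting the fields above together, a PCB might be declared as the following toy struct; the field widths, the register count, and MAX_OPEN_FILES are invented for illustration (a real PCB, such as Linux's task_struct, holds far more):

```c
#include <stdint.h>

#define MAX_OPEN_FILES 16

typedef enum { P_NEW, P_READY, P_RUNNING, P_WAITING, P_TERMINATED } proc_state;

/* a toy PCB holding the fields listed above */
typedef struct pcb {
    int        pid;                          /* process number */
    proc_state state;                        /* process state */
    int        priority;                     /* CPU-scheduling information */
    uintptr_t  program_counter;              /* next instruction to execute */
    uintptr_t  registers[16];                /* saved CPU registers */
    uintptr_t  mem_base, mem_limit;          /* memory-management information */
    long       cpu_time_used;                /* accounting information */
    int        open_files[MAX_OPEN_FILES];   /* I/O status information */
    struct pcb *next;                        /* link for a scheduling queue */
} pcb;
```

The next pointer hints at how the same PCB is threaded onto the ready queue or a device queue, which the scheduling-queue discussion below relies on.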

CPU switch from process to process:

Along with the program counter, the state information must be saved when an interrupt occurs, to allow the process to be continued correctly afterward.


Fig: context switch

The remaining processes will have to wait until the CPU is free and can be rescheduled. Scheduling is responsible for deciding which process to execute and when.

Scheduling Queues:

Job queue: As processes enter the system, they are put into a job queue, which consists of all processes in the system.


Ready queue: The processes that are residing in main memory and are ready and waiting to execute are kept on a list called the ready queue.

Device queue: the set of all processes waiting for an I/O device.

Each device has its own device queue. A process migrates between the various queues during its lifetime.

Fig:The ready queue and various I/O device queues.

A common representation of process scheduling is a queueing diagram, such as the one in the figure below.


Fig: Queueing-diagram representation of process scheduling.

• Each rectangular box represents a queue.
• Two types of queues are present: the ready queue and a set of device queues.
• The circles represent the resources that serve the queues, and the arrows indicate the flow of processes in the system.
• A new process is initially put in the ready queue. It waits there until it is selected for execution, or dispatched.
• Once the process is allocated the CPU and is executing, one of several events could occur:
- The process could issue an I/O request and then be placed in an I/O queue.
- The process could create a new subprocess and wait for the subprocess's termination.
- The process could be removed forcibly from the CPU, as a result of an interrupt, and be put back in the ready queue.
• In the first two cases, the process eventually switches from the waiting state to the ready state and is then put back in the ready queue.
• A process continues this cycle until it terminates, at which time it is removed from all queues and has its PCB and resources deallocated.


SCHEDULERS:

• A process migrates between the various scheduling queues throughout its lifetime.

• The operating system must select processes from these queues in some fashion.

• The selection process is carried out by the appropriate scheduler.

• Schedulers are generally of three types:

- long-term scheduler, or job scheduler
- short-term scheduler, or CPU scheduler
- medium-term scheduler

Long-term scheduler (job scheduler): selects which processes should be brought into the ready queue.

Short-term scheduler (CPU scheduler): selects which process should be executed next and allocates the CPU.

• The short-term scheduler is invoked very frequently (milliseconds), so it must be fast.

• The long-term scheduler is invoked very infrequently (seconds, minutes), so it may be slow.

• The long-term scheduler controls the degree of multiprogramming.

Medium-term scheduling:

• When main memory is full, a process may need to be swapped out to disk.

• The medium-term scheduler decides which process to swap out, and when to swap it back in.


Fig: Addition of medium-term scheduling to the queueing diagram.

Types of processes:

I/O-bound process: spends more time doing I/O than computations; many short CPU bursts.

CPU-bound process: spends more time doing computations; few very long CPU bursts.

Context Switch:

• A context switch is the task of switching the CPU to another process by saving the state of the old process and loading the saved state for the new process.

• Context-switch time is pure overhead, because the system does no useful work while switching.

• The time is dependent on hardware support, typically 1 to 1000 microseconds.

Operations on process:

The processes in most systems can execute concurrently, and they may be created and deleted dynamically. Thus, these systems must provide a mechanism for process creation and termination. In this section, we explore the mechanisms involved in creating processes and illustrate process creation on UNIX and Windows systems. Two operations can be done on a process:


1. Process creation
2. Process termination

Process creation:
• A system call is used to create a process.
• A unique process ID is assigned to each process.
• Address space is allocated.
• The PCB is initialized.
• The creating process is called the "parent process".
• A parent process creates a "child process", which in turn creates other processes, forming a tree of processes.
• Most operating systems (including UNIX and the Windows family of operating systems) identify processes according to a unique process identifier.
• A process needs certain resources to accomplish its task:
- CPU time, memory, files, I/O devices.
• When a process creates a new process, the resource-sharing possibilities are:
- the children share a subset of the parent's resources, or
- the parent and child share no resources.
• The execution possibilities are:
- the parent and children execute concurrently, or
- the parent waits until the child terminates.


Fig: tree of processes on a typical Solaris system

Process creation with UNIX example:

- A new process is created by the fork() system call.
- The new process consists of a copy of the address space of the original process.
- This mechanism allows the parent process to communicate easily with its child process.
- Both processes (the parent and the child) continue execution at the instruction after the fork(), with one difference:
- The return code for the fork() is zero for the new (child) process, whereas the (nonzero) process identifier of the child is returned to the parent.
- The exec() system call is used after a fork() system call by one of the two processes to replace the process's memory space with a new program.

With Windows NT:

• The parent's address space can be duplicated, or

• The parent can specify the name of a program for the OS to load into the address space of the new process.

UNIX: fork() system call:

• fork() is used to create a process; it takes no arguments and returns a process ID.

• fork() creates a new process, which becomes the child process of the caller.

• After the new process is created, both processes execute the next instruction following the fork() system call.

• To distinguish the parent from the child, we check the return value of fork():

- 0: returned in the new child process.
- negative value: the creation was unsuccessful.
- positive value: the child's process ID, returned to the parent.

• A process ID is of type pid_t, defined in sys/types.h.
• getpid() can be used to retrieve the process ID.
• The new process consists of a copy of the address space of the original process.

#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 64
#define MAX_COUNT 5

int main(void)
{
    pid_t pid;
    int i;
    char buf[BUF_SIZE];

    fork();               /* both parent and child run the loop below */
    pid = getpid();
    for (i = 1; i <= MAX_COUNT; i++) {
        sprintf(buf, "this line is from pid %d, value = %d\n", pid, i);
        write(1, buf, strlen(buf));   /* write to standard output */
    }
    return 0;
}

Process Termination:


• A process terminates when it finishes executing its final statement and asks the operating system to delete it by using the exit() system call.

• A parent may terminate the execution of one of its children for a variety of reasons, such as these:
- The child has exceeded its usage of some of the resources that it has been allocated.
- The task assigned to the child is no longer required.
- The parent is exiting, and the operating system does not allow a child to continue if its parent terminates.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid;

    /* fork a child process */
    pid = fork();
    if (pid < 0) {            /* error occurred */
        fprintf(stderr, "Fork Failed\n");
        exit(1);
    } else if (pid == 0) {    /* child process */
        execlp("/bin/ls", "ls", NULL);
    } else {                  /* parent process */
        wait(NULL);           /* parent waits for the child to complete */
        printf("Child Complete\n");
        exit(0);
    }
}

INTER-PROCESS COMMUNICATION (IPC):

Processes executing concurrently in the operating system may be either independent processes or cooperating processes.


• A process is independent if it cannot affect or be affected by the other processes executing in the system.

• Any process that does not share data with any other process is independent.

• A process is co-operating if it can affect or be affected by the other processes executing in the system.

Reasons for process cooperation:

There are several reasons for providing an environment that allows process cooperation:

Information sharing: Since several users may be interested in the same piece of information (for instance, a shared file), we must provide an environment that allows concurrent access to such information.

Computation speedup: If we want a particular task to run faster, we must break it into subtasks, each of which will execute in parallel with the others. Notice that such a speedup can be achieved only if the computer has multiple processing elements (such as CPUs or I/O channels).

Modularity: We may want to construct the system in a modular fashion, dividing the system functions into separate processes or threads.

Convenience: Even an individual user may work on many tasks at the same time. For instance, a user may be editing, printing, and compiling in parallel.

There are two fundamental models of interprocess communication: (1) shared memory and (2) message passing.

Shared memory: In the shared-memory model, a region of memory that is shared by cooperating processes is established. Processes can then exchange information by reading and writing data to the shared region.

Message passing:

In the message passing model, communication takes place by means of messages exchanged between the cooperating processes.

• Message passing is useful for exchanging smaller amounts of data, because no conflicts need be avoided.

• Message passing is also easier to implement.
• Shared memory allows maximum speed and convenience of communication, as it can be done at memory speeds within a computer.


• Shared memory is faster than message passing, as message-passing systems are typically implemented using system calls

Fig: message passing and shared memory

Producer consumer problem

The concept of cooperating processes can be explained using the producer-consumer problem.

• A producer process produces information that is consumed by a consumer process.

• For example, a print program produces characters that are consumed by the printer driver.

• A compiler may produce assembly code, which is consumed by an assembler. The assembler, in turn, may produce object modules, which are consumed by the loader.

• To allow producer and consumer processes to run concurrently, we must have available a buffer of items that can be filled by the producer and emptied by the consumer.

• A producer can produce one item while the consumer is consuming another item. The producer and consumer must be synchronized, so that


the consumer does not try to consume an item that has not yet been produced.

• Two types of buffers can be used. The unbounded buffer places no practical limit on the size of the buffer. The consumer may have to wait for new items, but the producer can always produce new items.

• The bounded buffer assumes a fixed buffer size. In this case, the consumer must wait if the buffer is empty, and the producer must wait if the buffer is full

The following variables reside in a region of memory shared by the producer and consumer processes:

#define BUFFER_SIZE 10

typedef struct {
    /* ... */
} item;

item buffer[BUFFER_SIZE];
int in = 0;
int out = 0;

• The shared buffer is implemented as a circular array with two logical pointers: "in" and "out".
• The variable "in" points to the next free position in the buffer.
• "out" points to the first full position in the buffer.
• The buffer is empty when in == out.
• The buffer is full when ((in + 1) % BUFFER_SIZE) == out.
• The code for the producer and consumer processes is shown below.

item nextProduced;

while (true) {
    /* produce an item in nextProduced */
    while (((in + 1) % BUFFER_SIZE) == out)
        ;  /* do nothing */
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
}

Fig: The producer process


nextProduced: holds the new item to be produced.

item nextConsumed;

while (true) {
    while (in == out)
        ;  /* do nothing */
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    /* consume the item in nextConsumed */
}

Fig: The consumer process

nextConsumed: holds the item to be consumed.

Message-Passing Systems:

• Cooperating processes can communicate in a shared-memory environment or via message passing.
• For example, a chat program used on the World Wide Web

could be designed so chat participants communicate with one another by exchanging messages.

• A message-passing facility provides at least two operations: send(message) and receive(message).
• Messages sent by a process can be of either fixed or variable size.
• Fixed-size messages allow a straightforward implementation, whereas variable-size messages require more complex system-level support.
• If processes P and Q want to communicate, they must send messages to and receive messages from each other; a communication link must exist between them. This link can be implemented in a variety of ways.

• We are concerned here not with the link's physical implementation (such as shared memory or a hardware bus) but rather with its logical implementation.

• Here are several methods for logically implementing a link and the send()/receive() operations:

o Direct or indirect communication
o Synchronous or asynchronous communication


o Automatic or explicit buffering

Naming:

Processes that want to communicate must have a way to refer to each other. They can use either direct or indirect communication:
• Direct communication
• Indirect communication

Direct communication:

In direct communication, each process that wants to communicate must explicitly name the recipient or sender of the communication. In this scheme, the send() and receive() primitives are defined as:
• send(P, message) — Send a message to process P.
• receive(Q, message) — Receive a message from process Q.

A communication link in this scheme has the following properties:
• A link is established automatically between every pair of processes that want to communicate. The processes need to know only each other's identity to communicate.

• A link is associated with exactly two processes.
• Between each pair of processes, there exists exactly one link.

Indirect communication:

The messages are sent to and received from mailboxes, or ports. A mailbox can be viewed abstractly as an object into which messages can be placed by processes and from which messages can be removed.

• Each mailbox has a unique identification.

The send () and receive () primitives are defined as follows:

send(A, message) — Send a message to mailbox A.
receive(A, message) — Receive a message from mailbox A.

• In this scheme, a communication link has the following properties:

• A link is established between a pair of processes only if both members of the pair have a shared mailbox.

• A link may be associated with more than two processes.


• Between each pair of communicating processes, there may be a number of different links, with each link corresponding to one mailbox.

The operating system must provide a mechanism that allows a process to do the following:

• Create a new mailbox.
• Send and receive messages through the mailbox.
• Delete a mailbox.

Synchronization:

Message passing may be blocking or nonblocking — also known as synchronous and asynchronous.

Buffering:

Whether communication is direct or indirect, messages exchanged by communicating processes reside in a temporary queue. Such queues can be implemented in three ways:

Zero capacity - The queue has a maximum length of zero; thus, the link cannot have any messages waiting in it. In this case, the sender must block until the recipient receives the message.

Bounded capacity - The queue has finite length n; thus, at most n messages can reside in it. If the queue is not full when a new message is sent, the message is placed in the queue (either the message is copied or a pointer to the message is kept), and the sender can continue execution without waiting. The link's capacity is finite, however: if the link is full, the sender must block until space is available in the queue.

Unbounded capacity - The queue's length is potentially infinite; thus, any number of messages can wait in it. The sender never blocks.

THREAD:

• A thread is a basic unit of CPU utilization.
• Each thread shares the code, data segment, and open files of its process with the other threads.
• Each thread has its own register set and stack.
• Consider an example of a word-processing application:
• It has 2 important threads executing in it.


• One thread reads keystrokes from the keyboard while another runs a spelling checker in the background.

Advantages:
• Creating a thread is faster than creating a process.
• Thread termination is faster than process termination.
• Switching between threads of a single process is faster.
• Threads are more efficient than processes as they share memory.
• Thread creation and termination are cheaper because no allocation or deallocation of a new address space is required.

User and kernel Level threads:

Kernel-level threads: In kernel-level threads, the OS kernel is responsible for the management of threads.

• Hence no thread-management code is needed in the application area.
• No run-time system is required for kernel-level threads.


• The kernel maintains a thread table that holds information about all threads present in the system.
• In addition to the thread table, the kernel also maintains a process table for keeping track of all processes.
• Threads are created and managed using system calls provided by the kernel.

User-level threads: In user-level threads, the management of threads is done by the application, using a thread library in user space.

• The kernel is not aware of the existence of these threads.

Single- and multi-threaded processes:

A relationship exists between user threads and kernel threads, which can take one of three forms:

The many-to-one model maps many user-level threads to one kernel thread.

• Thread management is done by the thread library in user space, so it is efficient; but the entire process will block if a thread makes a blocking system call.

• Green threads — a thread library available for Solaris — uses this model, as does GNU Portable Threads.

One-to-One Model

• The one-to-one model maps each user thread to a kernel thread.


• It provides more concurrency than the many-to-one model by allowing another thread to run when a thread makes a blocking system call; it also allows multiple threads to run in parallel on multiprocessors.

• The only drawback to this model is that creating a user thread requires creating the corresponding kernel thread. Because the overhead of creating kernel threads can burden the performance of an application, most implementations of this model restrict the number of threads supported by the system. Linux, along with the family of Windows operating systems—including Windows 95, 98, NT, 2000, and XP— implement the one-to-one model.

Many-to-Many Model

• The many-to-many model (Figure 4.4) multiplexes many user-level threads to a smaller or equal number of kernel threads.

• The number of kernel threads may be specific to either a particular application or a particular machine.
• A variation, sometimes referred to as the two-level model, also allows a user-level thread to be bound to a kernel thread; it is supported by operating systems such as IRIX, HP-UX, and Tru64 UNIX. The Solaris operating system supported the two-level model in releases prior to Solaris 9.


Scheduling Criteria

Different CPU scheduling algorithms have different properties, and the choice of a particular algorithm may favor one class of processes over another. In choosing which algorithm to use in a particular situation, we must consider the properties of the various algorithms.

• Many criteria have been suggested for comparing CPU scheduling algorithms.
• The criteria include the following:

CPU utilization - We want to keep the CPU as busy as possible. Conceptually, CPU utilization can range from 0 to 100 percent.

Throughput - If the CPU is busy executing processes, then work is being done. One measure of work is the number of processes that are completed per time unit, called throughput.

Turnaround time - From the point of view of a particular process, the important criterion is how long it takes to execute that process. The interval from the time of submission of a process to the time of completion is the turnaround time.

Waiting time - The CPU scheduling algorithm does not affect the amount of time during which a process executes or does I/O; it affects only the amount of time that a process spends waiting in the ready queue. Waiting time is the sum of the periods spent waiting in the ready queue.


Response time - In an interactive system, turnaround time may not be the best criterion. Often, a process can produce some output fairly early and can continue computing new results while previous results are being output. Response time is the time from the submission of a request until the first response is produced.

Scheduling Algorithms

CPU scheduling deals with the problem of deciding which of the processes in the ready queue is to be allocated the CPU. There are many different CPU scheduling algorithms. In this section, we describe several of them.

First-Come First-Served Scheduling(FCFS)

• By far the simplest CPU-scheduling algorithm is the first-come, first- served (FCFS) scheduling algorithm. With this scheme, the process that requests the CPU first is allocated the CPU first.

• The implementation of the FCFS policy is easily managed with a FIFO queue. When a process enters the ready queue, its PCB is linked onto the tail of the queue. When the CPU is free, it is allocated to the process at the head of the queue.

• The running process is then removed from the queue. The code for FCFS scheduling is simple to write and understand.

• The average waiting time under the FCFS policy, however, is often quite long. Consider the following set of processes that arrive at time 0, with the length of the CPU burst given in milliseconds:

Process   Burst Time
P1        24
P2        3
P3        3

• If the processes arrive in the order P1, P2, P3, and are served in FCFS order, we get the result shown in the following Gantt chart:



• The waiting time is 0 milliseconds for process P1, 24 milliseconds for process P2, and 27 milliseconds for process P3. Thus, the average waiting time is (0 + 24 + 27)/3 = 17 milliseconds. If the processes arrive in the order P2, P3, P1, however, the waiting times are 0, 3, and 6 milliseconds, and the average drops to (6 + 0 + 3)/3 = 3 milliseconds.

Shortest-Job-First Scheduling(SJF)

• This algorithm associates with each process the length of the process's next CPU burst. When the CPU is available, it is assigned to the process that has the smallest next CPU burst.

• If the next CPU bursts of two processes are the same, FCFS scheduling is used to break the tie.

• As an example of SJF scheduling, consider the following set of processes, with the length of the CPU burst given in milliseconds:

Process Burst Time

P1        6
P2        8
P3        7
P4        3

• we would schedule these processes according to the following Gantt chart:

• The waiting time is 3 milliseconds for process P1, 16 milliseconds for process P2, 9 milliseconds for process P3, and 0 milliseconds for process P4. Thus, the average waiting time is (3 + 16 + 9 + 0)/4 = 7 milliseconds. By comparison, if we were using the FCFS scheduling scheme, the average waiting time would be 10.25 milliseconds.


Priority Scheduling

• A priority is associated with each process, and the CPU is allocated to the process with the highest priority. Equal-priority processes are scheduled in FCFS order.

• An SJF algorithm is simply a priority algorithm where the priority (p) is the inverse of the (predicted) next CPU burst. The larger the CPU burst, the lower the priority, and vice versa.

• A major problem with priority scheduling algorithms is indefinite blocking or starvation.

• A solution to the problem of indefinite blockage of low-priority processes is aging.

• Aging is a technique of gradually increasing the priority of processes that wait in the system for a long time.

Process   BT   Priority
P1        10   3
P2        1    1
P3        2    3
P4        1    4
P5        5    2

• Using priority scheduling, we would schedule these processes according to the following Gantt chart.


The average waiting time is 8.2 milliseconds.

Round-Robin Scheduling (RR)

• A small unit of time, called a time quantum or time slice, is defined. A time quantum is generally from 10 to 100 milliseconds.

• The ready queue is treated as a circular queue. The CPU scheduler goes around the ready queue, allocating the CPU to each process for a time interval of up to 1 time quantum.

For example, with the earlier processes P1 (burst 24), P2 (3), and P3 (3) and a time quantum q = 4, the Gantt chart is P1 (0-4), P2 (4-7), P3 (7-10), P1 (10-24); the waiting times are 6, 4, and 7 ms, so the average is 17/3 ≈ 5.66 milliseconds.


Multilevel Queue Scheduling

A multilevel queue scheduling algorithm partitions the ready queue into several separate queues. The processes are permanently assigned to one queue, generally based on some property of the process, such as memory size, process priority, or process type.

• Each queue has its own scheduling algorithm.
• For example, separate queues might be used for foreground and background processes. The foreground queue might be scheduled by an RR algorithm, while the background queue is scheduled by an FCFS algorithm.

• In addition, there must be scheduling among the queues, which is commonly implemented as fixed-priority preemptive scheduling. For example, the foreground queue may have absolute priority over the background queue.


Multilevel Feedback-Queue Scheduling

The multilevel feedback-queue scheduling algorithm, in contrast, allows a process to move between queues. The idea is to separate processes according to the characteristics of their CPU bursts.

• If a process uses too much CPU time, it will be moved to a lower-priority queue. This scheme leaves I/O-bound and interactive processes in the higher-priority queues.

• In addition, a process that waits too long in a lower-priority queue may be moved to a higher-priority queue.

• This form of aging prevents starvation. For example, consider a multilevel feedback-queue scheduler with three queues, numbered from 0 to 2.


• The scheduler first executes all processes in queue 0. Only when queue 0 is empty will it execute processes in queue 1.

• Similarly, processes in queue 2 will only be executed if queues 0 and 1 are empty. A process that arrives for queue 1 will preempt a process in queue 2.

• A process in queue 1 will in turn be preempted by a process arriving for queue 0.

IMPORTANT QUESTIONS:

• What is a process?
• What are the process states?
• Explain about the PCB.
• Define context switch and scheduler.
• Explain about cooperating and independent processes.
• Explain the various types of schedulers.
• Explain scheduling queues.
• Explain the operations on processes.
• Explain cooperating processes.
• Explain inter-process communication, or message systems.
• Explain about threads.
• Explain preemptive and non-preemptive scheduling.
• Discuss scheduling criteria.


Concurrency: Process synchronization, the critical-section problem, Peterson's Solution, synchronization hardware, semaphores, classic problems of synchronization, monitors, synchronization examples.

Process synchronization

There are several situations where different processes need to interact with each other to achieve a common goal. The interaction can be done either by sharing data or by sending messages. When data is shared by several independent processes, there is a chance of the data becoming inconsistent.

For example, let us consider two processes p1 and p2, both sharing a variable called 'counter'. Suppose p1 and p2 execute simultaneously, with p1 incrementing the counter while p2 decrements it. At the end, the value of this variable may be one that neither p1 nor p2 expects, because it has become inconsistent. This is often called a race condition. Hence, a synchronization mechanism is needed to avoid inconsistency among several processes.

Basically process synchronization is implemented by making a process to wait until another process performs an appropriate action on shared data. It is also called signaling where one process waits for notification of an event that will occur in another process.

Synchronization: Communication between processes takes place through calls to send () and receive () primitives. Message passing may be either blocking or non blocking – also known as synchronous and asynchronous.

Blocking send: The sending process is blocked until the message is received by the receiving process or the mailbox.
Nonblocking send: The sending process sends the message and resumes operation.
Blocking receive: The receiver blocks until a message is available.
Nonblocking receive: The receiver retrieves either a valid message or a null.

When both send() and receive() are blocking, we have a rendezvous between the sender and the receiver.


Buffering: Whether the communication is direct or indirect, messages exchanged by communicating processes reside in a temporary queue. Such queues can be implemented in three ways:

Zero capacity - The queue has a maximum length of zero; thus the link cannot have any messages waiting in it.

Bounded capacity - The queue has finite length n; thus, at most n messages can reside in it.

Unbounded capacity - The queue's length is infinite; thus any number of messages can wait in it. The sender never blocks.

The zero capacity buffer is sometimes referred to as a message system with no buffering; the other cases are referred to as systems with automatic buffering.

A cooperating process is one that can affect or be affected by other processes executing in the system. Cooperating processes can either directly share a logical address space or be allowed to share data only through files or messages. The former is achieved through the use of lightweight processes, or threads. Concurrent access to shared data may result in data inconsistency.

A bounded buffer can be used to enable processes to share memory. The solution to the producer-consumer problem shown earlier allows at most BUFFER_SIZE - 1 items in the buffer. One way to remedy this deficiency is to add an integer variable counter, initialized to 0. Counter is incremented every time a new item is added to the buffer and decremented every time an item is removed from the buffer.

When several processes access and manipulate the same data concurrently and the outcome of the execution depends on the particular order in which the access takes place, we have a race condition. To guard against the race condition, we need to ensure that only one process at a time can be manipulating the variable counter. Hence processes must be synchronized.

Process competition for resources:

Conflicts arise among concurrently executing processes when they compete for the same resource.

Let us consider a situation in which two or more processes want to access a resource. Each of these concurrently executing processes is unaware of the presence of the others, and the execution of one process does not affect the execution of another. Hence, the state of the resources used remains unaffected.


No information exchange occurs between the competing processes, yet the execution of one process can significantly affect the behavior of the other: if two processes want to access a single resource, the OS grants access to one process and makes the other wait. As a result, the blocked process may never get the resource and may terminate in an inappropriate manner.

Three problems dominate in the case of competing processes:

i. The need for mutual exclusion ii. The occurrence of deadlock and

iii. The problem of starvation

(i) Need for mutual Exclusion

Consider a situation in which two or more processes need access to a single non-sharable resource (e.g., a printer). During execution, each process sends commands to the I/O device and sends and receives status information, etc. The portion of each program that uses the resource is its critical section. An important point here is that only one program is permitted to be in its critical section at any time.

(ii)Occurrence of deadlocks

The major cause of deadlock is the imposition of mutual exclusion.

Consider the granting of two resources, R1 and R2, to two processes P1 and P2. Suppose each process needs both resources in order to perform some function. A situation may occur in which the operating system assigns resource R1 to process P2 and resource R2 to process P1. Each process is then waiting for one of the resources and will not release the resource it already holds until it acquires the other, since it needs both in order to proceed. This leads to deadlock.


(iii) The problem of starvation

Let P1, P2, and P3 be three processes, each of which requires periodic access to resource R. If access to R is granted to process P1, the other two processes P2 and P3 are delayed, waiting for R. Now let access be granted to P3, and suppose P1 again needs R before P3 completes its critical section. If the OS grants R to P1 after P3 has finished, and continues to alternate access between P1 and P3, then P2 may be blocked indefinitely: it starves.

This competition among processes can be controlled by the OS, which is responsible for allocating resources to all processes in the system.

THE CRITICAL-SECTION PROBLEM:

A critical section is a segment of code in a process that may be modifying or accessing common variables or a shared data item.

The most important requirement the system must enforce is that when one process is executing in its critical section, no other process is allowed to execute in its critical section.

The critical-section problem is to design a protocol that the processes can use to cooperate. Each process must request permission to enter its critical section; the section of code implementing this request is the entry section. After that, the process executes its critical section and comes out of it through the exit section. It then executes the remaining code, called the remainder section. The general structure of a process Pi can be shown below.
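A sketch of that general structure (the standard textbook skeleton, with the four sections marked):

```
do {

        entry section        /* request permission to enter */

            critical section

        exit section         /* announce that the section is free */

            remainder section

} while (TRUE);
```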


A critical section is a piece of code that only one thread can execute at a time. If multiple threads try to enter a critical section, only one can run and the others will sleep. Imagine you have three threads that all want to enter a critical section.

Only one thread can enter the critical section; the other two have to sleep. When a thread sleeps, its execution is paused and the OS will run some other thread.

Once the thread in the critical section exits, another thread is woken up and allowed to enter the critical section.


A solution to the critical-section problem must satisfy the following three requirements.

Mutual exclusion - If a process is executing in its critical section, then no other process can be executing in its critical section.

Progress - If no process is executing in its critical section and some processes wish to enter their critical sections, then only those processes that are not executing in their remainder sections can participate in the decision of which will enter its critical section next, and this selection cannot be postponed indefinitely.

Bounded waiting - There exists a bound, or limit, on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted.

Two general approaches are used to handle critical sections in operating systems:

Preemptive kernels: allow a process to be preempted while it is running in kernel mode.

Non-preemptive kernels: do not allow a process running in kernel mode to be preempted; a kernel-mode process will run until it exits kernel mode, blocks, or voluntarily yields control of the CPU. A non-preemptive kernel is free from race conditions on kernel data structures, as only one process is active in the kernel at a time.

PETERSON'S SOLUTION:

A classic software-based solution to the critical-section problem is known as Peterson's solution. It provides a good algorithmic description of solving the critical-section problem and illustrates some of the complexities involved in designing software that addresses the requirements of mutual exclusion, progress, and bounded waiting. Peterson's solution is restricted to two processes that alternate execution between their critical sections and remainder sections. It requires two data items to be shared between the two processes:

int turn;

boolean flag[2];


We use the notation Pi and Pj for the two processes, where j = 1 - i (so if i = 0, then j = 1).

The variable turn indicates whose turn it is to enter the critical section.

For example, if turn == j, then process Pj is allowed to enter its critical section and execute. The flag array indicates whether a process is ready to enter its critical section.

For example, if flag[j] is true, then process Pj is ready to enter its critical section.

The algorithm for Peterson's solution is as follows. The array flag[2] is initialized so that flag[0] = flag[1] = false; setting flag[i] = true indicates that Pi is ready to enter its critical section.

do {
    flag[i] = true;
    turn = j;
    while (flag[j] && turn == j)
        ;                  /* busy wait */
    /* critical section */
    flag[i] = false;
    /* remainder section */
} while (1);

Process Pi sets flag[i] to true to indicate that it is ready to enter its critical section, and then offers the turn to Pj by assigning turn = j. The while loop spins, doing nothing, as long as flag[j] is true and turn == j; as soon as either condition fails, Pi exits the loop and enters its critical section. On leaving the critical section, Pi sets flag[i] to false, and then executes


the remainder section.

Mutual exclusion is preserved, because Pi can enter its critical section only if either flag[j] == false or turn == i. If neither condition holds, Pi remains blocked in the while loop.

The progress and bounded-waiting requirements are satisfied through the loop condition flag[j] == true && turn == j. If process Pj is not ready to enter its critical section, then flag[j] is false and Pi may enter immediately. If Pj is ready, then which process enters first depends on the value of turn, and each process waits at most one turn.
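As a rough illustration, the algorithm above can be exercised with two Python threads. This is a sketch only: CPython's interpreter lock happens to provide the sequentially consistent memory ordering the algorithm assumes, which real hardware does not guarantee without memory barriers.

```python
import sys
import threading

sys.setswitchinterval(0.0005)  # switch threads often so the spin loops resolve quickly

flag = [False, False]  # flag[i]: process i is ready to enter its critical section
turn = 0               # whose turn it is to defer to
counter = 0            # shared data guarded by the critical section

def worker(i, iterations):
    global turn, counter
    j = 1 - i
    for _ in range(iterations):
        flag[i] = True                 # announce intent to enter
        turn = j                       # yield priority to the other process
        while flag[j] and turn == j:
            pass                       # busy wait
        counter += 1                   # critical section
        flag[i] = False                # exit section

threads = [threading.Thread(target=worker, args=(i, 500)) for i in (0, 1)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 1000: no increment was lost, so mutual exclusion held
```

If mutual exclusion failed, some of the unprotected `counter += 1` updates would be lost and the final count would fall short of 1000.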

Synchronization Hardware:

Any solution to the critical-section problem requires a simple tool called a lock. Race conditions are prevented by requiring that critical regions be protected by locks; that is, a process must acquire a lock before entering a critical section and release the lock when it exits the critical section.

But the design of such locks can be sophisticated. Hardware features can make any programming task easier and improve system efficiency. The critical-section problem could be solved simply in a uniprocessor environment if we could prevent interrupts from occurring while a shared variable is being modified. Then the current sequence of instructions would be allowed to execute in order without preemption; no other instructions would run, so no unexpected modifications could be made to the shared variable. This solution


is not feasible in a multiprocessor environment. Disabling interrupts on a multiprocessor can be time consuming, as the message must be passed to all the processors.

This message passing delays entry into each critical section, and system efficiency decreases. Many modern computer systems therefore provide special hardware instructions that allow us either to test and modify the content of a word or to swap the contents of two words atomically, that is, as one uninterruptible unit. The TestAndSet() instruction atomically returns the old value of its operand and sets the operand to true. Because the instruction executes atomically, if two TestAndSet() instructions are executed simultaneously, each on a different CPU, they will be executed sequentially in some arbitrary order.

The Swap () instruction operates on the contents of two words –

Definition of Swap Instruction


Mutual Exclusion implementation with the Swap() function

This instruction, too, is executed atomically. If the machine supports the Swap() instruction, then mutual exclusion can be provided by using a global Boolean variable lock, initialized to false. Each process also has a local Boolean variable key.
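A hypothetical Python rendering of the Swap()-based lock just described, again emulating the instruction's atomicity with a hidden lock:

```python
import threading

_atomic = threading.Lock()   # stands in for the hardware's atomicity

def swap(a, b):
    # Atomically exchange the contents of two one-element cells.
    with _atomic:
        a[0], b[0] = b[0], a[0]

lock = [False]   # global lock word, initialized to false (free)
counter = 0

def worker(n):
    global counter
    key = [True]                 # each process has a local variable key
    for _ in range(n):
        key[0] = True
        while key[0]:
            swap(lock, key)      # spin until we swap in a False (lock was free)
        counter += 1             # critical section
        lock[0] = False          # release the lock

threads = [threading.Thread(target=worker, args=(500,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 2000
```

The swap succeeds in acquiring the lock exactly when it moves a False into key, which can happen for only one process at a time.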

However, these algorithms do not satisfy the bounded-waiting requirement. An algorithm that satisfies all the critical-section requirements uses the following common data structures:

boolean waiting[n];
boolean lock;

Both data structures are initialized to false. To prove that the mutual-exclusion requirement is met, note that process Pi can enter its critical section only if either waiting[i] == false or key == false, and the value of key can become false only if TestAndSet() is executed.


Semaphores:

The various hardware based solutions to the critical section problem are complicated for application programmers to use. To overcome this difficulty, we can use a synchronization tool called a semaphore.

A semaphore S is an integer variable that is accessed only through two standard atomic operations: wait() and signal().

wait(S) {
    while (S <= 0)
        ;        /* busy wait */
    S--;
}

signal(S) {
    S++;
}

Modifications to the integer value of the semaphore in the wait() and signal() operations must be executed indivisibly. When one process modifies the semaphore value, no other process can simultaneously modify that same semaphore value.


Usage: Operating systems distinguish between counting and binary semaphores. The value of a counting semaphore can range over an unrestricted domain; the value of a binary semaphore can range only between 0 and 1. Binary semaphores are also known as mutex locks, as they are locks that provide mutual exclusion.

Binary semaphores are used to deal with the critical section problem for multiple processes. Counting semaphores can be used to control access to a given resource consisting of a finite number of instances. The semaphore is initialized to the number of resources available. Each process that wishes to use a resource performs a wait() operation on the semaphore. When a process releases a resource, it performs a signal() operation.

When the count reaches 0, all resources are being used, and subsequent wait() operations block until the count becomes greater than 0.

Semaphores can be used to solve various synchronization problems. Simple binary semaphore example with two processes: Two processes are both vying for the single semaphore. Process A performs the acquire first and therefore is provided with the semaphore. The period in which the semaphore is owned by the process is commonly called a critical section. The critical section can be performed by only one process, therefore the need for the coordination provided by the semaphore. While Process A has the semaphore, Process B is not permitted to perform its critical section. Note that while Process A is in its critical section, Process B attempts to acquire the semaphore. As the semaphore has already been acquired by Process A, Process B is placed into a blocked state. When Process A finally releases the semaphore, it is then granted to Process B, which is allowed to enter its critical section. Process B at a later time releases the semaphore, making it available for acquisition.


In the counting-semaphore example, each process requires two resources before it can perform its desired activity. Because the value of the counting semaphore is 3 and each process needs two resources, only one process at a time can acquire both of its resources and fully operate. Process A acquires its two resources first, so Process B blocks until Process A releases at least one of its resources.

Implementation: The main disadvantage of the semaphore definition given above is that it requires busy waiting. While a process is in its critical section, any other process that tries to enter its critical section must loop continuously in the entry code. Busy waiting wastes CPU


cycles that some other process might be able to use productively. This type of semaphore is called a spinlock, because the process spins while waiting for the lock. To overcome the need for busy waiting, the definitions of the wait() and signal() operations can be modified. When a process executes the wait() operation and finds that the semaphore value is not positive, it must wait; but rather than engaging in busy waiting, the process can block itself. The block operation places the process into a waiting queue associated with the semaphore, and the state of the process is switched to the waiting state. Control is then transferred to the CPU scheduler, which selects another process to execute. A process that is blocked waiting on a semaphore S should be restarted when some other process executes a signal() operation. The process is restarted by a wakeup() operation, which changes the process from the waiting state to the ready state; the process is then placed in the ready queue.

Each semaphore has an integer value and a list of waiting processes. When a process must wait on a semaphore, it is added to that list; a signal() operation removes one process from the list of waiting processes and awakens it. The block() operation suspends the process that invokes it, and the wakeup(P) operation resumes the execution of a blocked process P. These two operations are provided by the OS as basic system calls.

With this implementation, semaphore values may be negative; the magnitude of a negative value is the number of waiting processes. The list of waiting processes can be implemented by a link field in each PCB: each semaphore contains an integer value and a pointer to a list of PCBs. One way to add and remove processes from the list that ensures bounded waiting is to use a FIFO queue, where the semaphore holds head and tail pointers to the queue.
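The scheme above can be sketched in Python. Here threading.Event plays the role of block()/wakeup(), an internal lock supplies the atomicity that wait() and signal() require, and the FIFO queue of waiters gives bounded waiting; the class and attribute names are illustrative, not from the text.

```python
import threading
from collections import deque

class Semaphore:
    def __init__(self, value):
        self.value = value              # may go negative: |value| = number of waiters
        self.waiting = deque()          # FIFO list of waiting processes
        self._guard = threading.Lock()  # makes wait()/signal() atomic

    def wait(self):
        self._guard.acquire()
        self.value -= 1
        if self.value < 0:
            ev = threading.Event()      # block(): suspend this process
            self.waiting.append(ev)
            self._guard.release()
            ev.wait()                   # no busy waiting: the thread sleeps
        else:
            self._guard.release()

    def signal(self):
        with self._guard:
            self.value += 1
            if self.value <= 0:
                self.waiting.popleft().set()   # wakeup(P): resume one waiter

sem = Semaphore(1)
counter = 0

def worker(n):
    global counter
    for _ in range(n):
        sem.wait()
        counter += 1        # critical section
        sem.signal()

threads = [threading.Thread(target=worker, args=(500,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 2000
```

Note that a signal() hands the permit directly to the woken waiter, so the waiter does not re-check the value, exactly as in the textbook scheme.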

The critical aspect of semaphores is that wait() and signal() be executed atomically: no two processes can execute wait() and signal() operations on the same semaphore at the same time. This is itself a critical-section problem; in a single-CPU environment it can be solved by simply inhibiting interrupts while the wait() and signal() operations are executing. But in a multiprocessor


environment, interrupts must be disabled on every processor. Disabling interrupts on every processor can be a difficult task and diminishes performance. Hence SMP systems must provide alternative locking techniques, such as spinlocks, to ensure that wait() and signal() are performed atomically.

Deadlocks and Starvation: The implementation of a semaphore with a waiting queue may result in a situation where two or more processes are waiting indefinitely for an event (the execution of a signal() operation) that can be caused only by one of the waiting processes. When such a state is reached, these processes are said to be deadlocked.

A set of processes is in a deadlock state when every process in the set is waiting for an event that can be caused only by another process in the set. These events are resource acquisition and release.


Another problem related to deadlocks is indefinite blocking or starvation, a situation in which processes wait indefinitely within the semaphore. Indefinite blocking may occur if we add and remove processes from the list associated with a semaphore in LIFO order.

Classic Problems of Synchronization:

These synchronization problems are examples of a large class of concurrency-control problems. In the solutions to these problems, we use semaphores for synchronization.

The Bounded Buffer problem:

Here the pool consists of n buffers, each capable of holding one item. The mutex semaphore provides mutual exclusion for accesses to the buffer pool and is initialized to the value 1. The empty and full semaphores count the number of empty and full buffers. The semaphore empty is initialized to the value n, the semaphore full is initialized to value 0. The code below can be interpreted as the producer producing full buffers for the consumer or as the consumer producing empty buffers for the producer.


The structure of the producer process

The structure of the consumer process
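The producer and consumer structures above can be sketched as runnable Python using the standard threading.Semaphore; the buffer size n = 5 and the number of items are arbitrary choices for the illustration.

```python
import threading
from collections import deque

N = 5
buffer = deque()
empty = threading.Semaphore(N)   # counts empty slots, initialized to n
full = threading.Semaphore(0)    # counts full slots, initialized to 0
mutex = threading.Lock()         # mutual exclusion on the buffer itself
consumed = []

def producer(items):
    for item in items:
        empty.acquire()          # wait(empty): claim an empty slot
        with mutex:              # wait(mutex) ... signal(mutex)
            buffer.append(item)
        full.release()           # signal(full): one more full slot

def consumer(count):
    for _ in range(count):
        full.acquire()           # wait(full): claim a full slot
        with mutex:
            item = buffer.popleft()
        empty.release()          # signal(empty): one more empty slot
        consumed.append(item)

p = threading.Thread(target=producer, args=(list(range(20)),))
c = threading.Thread(target=consumer, args=(20,))
p.start(); c.start()
p.join(); c.join()
print(consumed)  # items arrive in order: [0, 1, ..., 19]
```

With a single producer and a single consumer, the FIFO buffer preserves item order even though the producer may run up to five items ahead.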

The Readers – Writers Problem

A database is to be shared among several concurrent processes. Some of these processes may want only to read the database (readers), whereas others may want to update it (writers). If two readers access the shared data simultaneously, no adverse effects result; but if a writer and some other process access the database simultaneously, problems can occur.

To prevent these problems, writers have exclusive access to the shared database. This synchronization problem is referred to as readers – writers problem.


The simplest readers-writers problem requires that no reader be kept waiting unless a writer has already obtained permission to use the shared object; that is, no reader should wait for other readers to finish simply because a writer is waiting.

The second readers-writers problem requires that once a writer is ready, it performs its write as soon as possible; that is, if a writer is waiting to access the object, no new readers may start reading.

In the solution to the first readers – writers problem, the reader processes share the following data structures:

semaphore mutex, wrt;
int readcount;


Semaphores mutex and wrt are initialized to 1, readcount is initialized to 0. Semaphore wrt is common to both reader and writer processes. The mutex semaphore is used to ensure mutual exclusion when the variable readcount is updated. The readcount variable keeps track of how many processes are currently reading the object. The semaphore wrt functions as a mutual exclusion semaphore for the writers. It is also used by the first or last reader that enters or exits the critical section. It is not used by readers who enter or exit while other readers are in their critical section.
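The first readers-writers solution can be sketched as follows; the shared "database" is just a one-element list in this illustration.

```python
import threading

mutex = threading.Semaphore(1)   # guards readcount
wrt = threading.Semaphore(1)     # writers' mutual exclusion
readcount = 0
shared = [0]                     # the "database"
observed = []

def reader():
    global readcount
    mutex.acquire()
    readcount += 1
    if readcount == 1:
        wrt.acquire()            # first reader locks writers out
    mutex.release()
    observed.append(shared[0])   # reading is performed (concurrently)
    mutex.acquire()
    readcount -= 1
    if readcount == 0:
        wrt.release()            # last reader lets writers back in
    mutex.release()

def writer(v):
    wrt.acquire()                # writers have exclusive access
    shared[0] = v
    wrt.release()

threads = [threading.Thread(target=writer, args=(v,)) for v in (1, 2)]
threads += [threading.Thread(target=reader) for _ in range(3)]
for t in threads: t.start()
for t in threads: t.join()
print(shared[0], len(observed))  # one of the writes wins; all three readers read
```

Notice that only the first and last readers touch wrt, exactly as described in the text; readers in the middle enter and leave without ever blocking a fellow reader.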

The readers – writers problem and its solutions has been generalized to provide reader – writer locks on some systems. Acquiring the reader – writer lock requires specifying the mode of the lock, either read or write access. When a process only wishes to read shared data, it requests the reader – writer lock in read mode; a process wishing to modify the shared data must request the lock in write mode. Multiple processes are permitted to concurrently acquire a reader – writer lock in read mode, only one process may acquire the lock for writing as exclusive access is required for writers.

Reader – writer locks are useful in the following situations:


- In applications where it is easy to identify which processes only read shared data and which threads only write shared data.

- In applications that have more readers than writers.

Dining Philosophers Problem

Consider five philosophers who spend their lives thinking and eating. The philosophers share a circular table surrounded by five chairs, each belonging to one philosopher. In the center of the table is a bowl of rice, and the table is laid with five single chopsticks. When a philosopher thinks, she does not interact with her colleagues. From time to time, a philosopher gets hungry and tries to pick up the two chopsticks that are closest to her. A philosopher may pick up only one chopstick at a time, and she cannot pick up a chopstick that is already in the hand of a neighbor. When a hungry philosopher has both her chopsticks at the same time, she eats without releasing them. When she is finished eating, she puts down both of her chopsticks and starts thinking again.

The dining-philosophers problem is considered a classic synchronization problem because it is an example of a large class of concurrency-control problems.

One simple solution is to represent each chopstick with a semaphore. A philosopher tries to grab a chopstick by executing a wait() operation on that semaphore; she releases her chopsticks by executing the signal() operation on the appropriate semaphores. Thus, the shared data are

semaphore chopstick[5];

where all elements of chopstick are initialized to 1. This solution must be rejected, because it could create a deadlock: suppose that all five philosophers become hungry simultaneously and each grabs her left chopstick. All the elements of chopstick will now be equal to 0, and when each philosopher tries to grab her right chopstick, she will be delayed forever.


Solutions to the dining-philosophers problem:
- Allow at most four philosophers to be sitting simultaneously at the table.
- Allow a philosopher to pick up her chopsticks only if both chopsticks are available.
- Use an asymmetric solution; that is, an odd philosopher picks up first her left chopstick and then her right chopstick, whereas an even philosopher picks up her right chopstick first and then her left chopstick.
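The asymmetric solution in the last bullet can be sketched directly with one semaphore per chopstick. Because neighboring philosophers acquire their chopsticks in opposite orders, no circular wait (and hence no deadlock) can form.

```python
import threading

N = 5
chopstick = [threading.Semaphore(1) for _ in range(N)]
meals = [0] * N

def philosopher(i, rounds):
    left, right = i, (i + 1) % N
    # Odd philosophers pick up left first; even ones pick up right first.
    first, second = (left, right) if i % 2 else (right, left)
    for _ in range(rounds):
        chopstick[first].acquire()
        chopstick[second].acquire()
        meals[i] += 1                  # eating
        chopstick[second].release()
        chopstick[first].release()

threads = [threading.Thread(target=philosopher, args=(i, 100)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
print(meals)  # every philosopher eats 100 times; the run terminates, so no deadlock
```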

Monitors

Although semaphores provide a convenient and effective mechanism for process synchronization, using them incorrectly can result in timing errors that are difficult to detect, since these errors happen only if particular execution sequences take place, and those sequences do not always occur.

* Suppose that a process interchanges the order in which the wait() and signal() operations on the semaphore mutex are executed:

signal(mutex);
...
critical section
...
wait(mutex);

Here several processes may be executing in their critical sections simultaneously, violating the mutual-exclusion requirement.

* Suppose that a process replaces signal(mutex) with wait(mutex); that is, it executes

wait(mutex);
...
critical section
...
wait(mutex);

Here a deadlock will occur.

* Suppose that a process omits the wait(mutex), the signal(mutex), or both. Here either mutual exclusion is violated or a deadlock occurs. To deal with such errors, a fundamental high-level synchronization construct called the monitor type is used.


Schematic view of a monitor

Usage: A monitor type presents a set of programmer-defined operations that are provided mutual exclusion within the monitor. The monitor type also contains the declarations of variables whose values define the state of an instance of that type, together with the bodies of the procedures or functions that operate on those variables. The representation of a monitor type cannot be used directly by the various processes: a procedure defined within a monitor can access only those variables declared locally within the monitor and its formal parameters, and the local variables of a monitor can be accessed only by the local procedures.

monitor monitor_name {
    /* local (shared) variable declarations */

    procedure P1(...) { ... }
    procedure P2(...) { ... }
    procedure Pn(...) { ... }

    /* initialization of local data */
}

The monitor construct ensures that only one process at a time can be active within the monitor. But the monitor construct as described so far is not powerful enough for modeling some synchronization schemes. For this we need additional synchronization mechanisms, which are provided by the condition construct. The only operations that can be invoked on a condition variable are wait() and signal().

The operation x.wait() means that the process invoking this operation is suspended until another process invokes x.signal(). The x.signal() operation resumes exactly one suspended process.

When the x.signal() operation is invoked by a process P and there is a suspended process Q associated with condition x, then if Q is allowed to resume its execution, the signaling process P must wait; otherwise, both P and Q would be active simultaneously within the monitor.

Conceptually, however, both processes can continue with their execution. Two possibilities exist:


1. Signal and wait: P either waits until Q leaves the monitor or waits for another condition.
2. Signal and continue: Q either waits until P leaves the monitor or waits for another condition.

Dining Philosophers Solution Using Monitors

This solution imposes the restriction that a philosopher may pick up her chopsticks only if both of them are available. To code this solution, we need to distinguish among three states in which we may find a philosopher. For this purpose, we use the data structure

enum {thinking, hungry, eating} state[5];

Philosopher i can set the variable state[i] = eating only if her two neighbors are not eating: (state[(i+4) % 5] != eating) and (state[(i+1) % 5] != eating).

We also declare condition self[5];

where philosopher i can delay herself when she is hungry but unable to obtain the chopsticks she needs. The distribution of the chopsticks is controlled by the monitor dp. Before starting to eat, each philosopher must invoke the operation pickup(); this may result in the suspension of the philosopher process. After the successful completion of the operation, the philosopher may eat. Afterwards, the philosopher invokes the putdown() operation. Thus philosopher i must invoke the operations pickup() and putdown() in the following sequence:

dp.pickup(i);
...
eat
...


dp.putdown(i);

This solution ensures that no two neighbors are eating simultaneously and that no deadlocks will occur.


Monitor solution to Dining Philosophers problem

Implementing a Monitor Using Semaphores

For each monitor, a semaphore mutex initialized to 1 is provided. A process must execute wait(mutex) before entering the monitor and must execute signal(mutex) after leaving the monitor.

Since a signaling process must wait until the resumed process either leaves or waits, an additional semaphore next, initialized to 0, is introduced, on which the signaling processes may suspend themselves. An integer variable next_count counts the number of processes suspended on next. Thus, each external procedure P is replaced by:

wait(mutex);
    /* body of P */
if (next_count > 0)
    signal(next);
else
    signal(mutex);

Mutual exclusion within a monitor is thus ensured.

Implementing condition variables: For each condition x, we introduce a semaphore x_sem and an integer variable x_count, both initialized to 0. The operations x.wait() and x.signal() can then be implemented as follows.

x.wait() is implemented as:

x_count++;
if (next_count > 0)
    signal(next);
else
    signal(mutex);
wait(x_sem);
x_count--;

x.signal() is implemented as:

if (x_count > 0) {
    next_count++;
    signal(x_sem);
    wait(next);
    next_count--;
}

Note that x.signal() must release x_sem to resume one waiting process; the signaler then suspends itself on next.
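Putting the pieces together, the whole construction can be sketched in Python: a Monitor class holds mutex, next, and next_count, and a Condition class holds its x_sem and x_count. The one-slot buffer at the end is an invented usage example, not from the text; because signaling follows signal-and-wait (Hoare) semantics, the awakened waiter may rely on the condition with a plain if.

```python
import threading

class Monitor:
    def __init__(self):
        self.mutex = threading.Semaphore(1)  # monitor entry
        self.next = threading.Semaphore(0)   # suspended signalers wait here
        self.next_count = 0                  # number of processes on 'next'

    def enter(self):
        self.mutex.acquire()                 # wait(mutex) before the body

    def leave(self):                         # the code appended after each body
        if self.next_count > 0:
            self.next.release()              # resume a suspended signaler
        else:
            self.mutex.release()

class Condition:
    def __init__(self, mon):
        self.mon = mon
        self.sem = threading.Semaphore(0)    # x_sem
        self.count = 0                       # x_count

    def wait(self):                          # x.wait()
        self.count += 1
        if self.mon.next_count > 0:
            self.mon.next.release()
        else:
            self.mon.mutex.release()
        self.sem.acquire()                   # suspend until x.signal()
        self.count -= 1

    def signal(self):                        # x.signal()
        if self.count > 0:
            self.mon.next_count += 1
            self.sem.release()               # resume one waiter
            self.mon.next.acquire()          # signaler suspends (signal-and-wait)
            self.mon.next_count -= 1

# Invented usage example: a one-slot buffer built as a monitor.
class Slot(Monitor):
    def __init__(self):
        super().__init__()
        self.item = None
        self.nonempty = Condition(self)
        self.nonfull = Condition(self)

    def put(self, v):
        self.enter()
        if self.item is not None:
            self.nonfull.wait()
        self.item = v
        self.nonempty.signal()
        self.leave()

    def get(self):
        self.enter()
        if self.item is None:
            self.nonempty.wait()
        v, self.item = self.item, None
        self.nonfull.signal()
        self.leave()
        return v

slot, got = Slot(), []
producer = threading.Thread(target=lambda: [slot.put(k) for k in range(10)])
consumer = threading.Thread(target=lambda: got.extend(slot.get() for _ in range(10)))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(got)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```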

Resuming Processes Within a Monitor

If several processes are suspended on condition x and an x.signal() operation is executed by some process, how do we determine which suspended process should be resumed next? One simple answer is FCFS ordering, so that the process that has waited longest is resumed first. Alternatively, the conditional-wait construct can be used, as in

x.wait(c);

where c is an integer expression that is evaluated when the wait() operation is executed. The value of c, called a priority number, is then stored with the name of the suspended process. When x.signal() is executed, the process with the smallest associated priority number is resumed next.


The ResourceAllocator monitor controls the allocation of a single resource among competing processes. Each process, when requesting an allocation of this resource, specifies the maximum time it plans to use the resource. The monitor allocates the resource to the process that has the shortest time-allocation request. A process that needs access to the resource must follow this sequence:

R.acquire(t);
...
access the resource;
...
R.release();

where R is an instance of type ResourceAllocator.

But the following problems can occur –

• A process might access a resource without first gaining access permission to the resource

• A process might never release a resource once it has been granted access to the resource

• A process might attempt to release a resource that it never requested
• A process might request the same resource twice


One possible solution to these problems is to include the resource-access operations within the ResourceAllocator monitor itself.

Synchronization Examples

• Solaris
• Windows XP
• Linux
• Pthreads

Solaris Synchronization

• Implements a variety of locks to support multitasking, multithreading (including real-time threads), and multiprocessing
• Uses adaptive mutexes for efficiency when protecting data from short code segments
• Uses condition variables and readers-writers locks when longer sections of code need access to data
• Uses turnstiles to order the list of threads waiting to acquire either an adaptive mutex or reader-writer lock

Windows XP Synchronization

• Uses interrupt masks to protect access to global resources on uniprocessor systems
• Uses spinlocks on multiprocessor systems
• Also provides dispatcher objects, which may act as either mutexes or semaphores
• Dispatcher objects may also provide events
• An event acts much like a condition variable

Linux Synchronization

• Prior to kernel version 2.6, Linux disabled interrupts to implement short critical sections
• Version 2.6 and later is fully preemptive
• Linux provides semaphores and spin locks


Pthreads Synchronization

• The Pthreads API is OS-independent
• It provides mutex locks and condition variables
• Non-portable extensions include read-write locks and spin locks

Atomic Transactions

• System Model
• Log-based Recovery
• Checkpoints
• Concurrent Atomic Transactions

System Model

• Assures that operations happen as a single logical unit of work, in its entirety or not at all
• Related to the field of database systems
• The challenge is assuring atomicity despite computer system failures
• Transaction: a collection of instructions or operations that performs a single logical function
• Here we are concerned with changes to stable storage (disk)
• A transaction is a series of read and write operations
• Terminated by a commit (transaction successful) or abort (transaction failed) operation
• An aborted transaction must be rolled back to undo any changes it performed

Types of Storage Media

• Volatile storage: information stored here does not survive system crashes (examples: main memory, cache)
• Nonvolatile storage: information usually survives crashes (examples: disk and tape)
• Stable storage: information is never lost; not actually attainable, so approximated via replication or RAID on devices with independent failure modes


• Goal is to assure transaction atomicity where failures cause loss of information on volatile storage

Log-Based Recovery

• Record to stable storage information about all modifications made by a transaction
• The most common technique is write-ahead logging
• The log resides on stable storage; each log record describes a single transaction write operation, including:
  - Transaction name
  - Data item name
  - Old value
  - New value
• <Ti starts> is written to the log when transaction Ti starts
• <Ti commits> is written when Ti commits
• A log entry must reach stable storage before the operation on the data occurs

Log-Based Recovery Algorithm

Using the log, the system can recover from any loss of information on volatile storage:

• Undo(Ti) restores the values of all data updated by Ti
• Redo(Ti) sets the values of all data in transaction Ti to the new values
• Undo(Ti) and redo(Ti) must be idempotent: multiple executions must have the same result as one execution
• If the system fails, restore the state of all updated data via the log
• If the log contains <Ti starts> without <Ti commits>, undo(Ti)
• If the log contains both <Ti starts> and <Ti commits>, redo(Ti)
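The recovery rule above can be sketched as a toy routine; the tuple format of the log records is invented for the illustration, mirroring the four fields a write record carries besides the operation type.

```python
# Each 'write' record is (kind, txn, item, old_value, new_value).
def recover(log, data):
    started = {r[1] for r in log if r[0] == 'start'}
    committed = {r[1] for r in log if r[0] == 'commit'}
    # redo(Ti): apply the new values of committed transactions
    for r in log:
        if r[0] == 'write' and r[1] in committed:
            data[r[2]] = r[4]
    # undo(Ti): scan backwards, restoring old values of transactions
    # that have <Ti starts> but no <Ti commits>
    for r in reversed(log):
        if r[0] == 'write' and r[1] in started - committed:
            data[r[2]] = r[3]
    return data

log = [('start', 'T0'), ('write', 'T0', 'A', 100, 50), ('commit', 'T0'),
       ('start', 'T1'), ('write', 'T1', 'B', 200, 150)]   # T1 never commits
print(recover(log, {'A': 100, 'B': 150}))  # {'A': 50, 'B': 200}
```

Both passes are idempotent: running recover() again over the same log produces the same state, which is exactly the property the text requires of undo and redo.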

Checkpoints

The log could become long, and recovery could take a long time


Checkpoints shorten the log and recovery time. Checkpoint scheme:

1. Output all log records currently in volatile storage to stable storage
2. Output all modified data from volatile to stable storage
3. Output a log record <checkpoint> to the log on stable storage

Now recovery only needs to consider transactions Ti that started executing before the most recent checkpoint, and all transactions after Ti. All other transactions are already on stable storage.

Concurrent Transactions

• Must be equivalent to serial execution – serializability
• Could perform all transactions in a critical section
  • Inefficient, too restrictive
• Concurrency-control algorithms provide serializability

Serializability

• Consider two data items A and B
• Consider transactions T0 and T1
• Execute T0, T1 atomically
• An execution sequence is called a schedule
• An atomically executed transaction order is called a serial schedule
• For N transactions, there are N! valid serial schedules

Schedule 1: T0 then T1


• A nonserial schedule allows overlapped execution
  • The resulting execution is not necessarily incorrect
• Consider schedule S, operations Oi, Oj
  • Oi and Oj conflict if they access the same data item, with at least one of them a write
• If Oi, Oj are consecutive operations of different transactions and Oi and Oj don't conflict
  • Then S' with the swapped order Oj, Oi is equivalent to S
• If S can become S' via swapping nonconflicting operations
  • S is conflict serializable
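The conflict rule above is mechanical enough to state as a small predicate. This is a hypothetical helper for illustration; the operation representation is an assumption.

```python
# Two operations conflict iff they belong to different transactions, access
# the same data item, and at least one of them is a write.
# Each operation is assumed to be a tuple: (transaction, kind, item).
def conflicts(op_i, op_j):
    return (op_i[2] == op_j[2] and          # same data item
            op_i[0] != op_j[0] and          # different transactions
            'write' in (op_i[1], op_j[1]))  # at least one write

# Consecutive non-conflicting operations of different transactions may be
# swapped; if repeated swaps turn schedule S into a serial schedule,
# S is conflict serializable.
```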

Schedule 2: Concurrent Serializable Schedule

Locking Protocol

• Ensure serializability by associating a lock with each data item
• Follow a locking protocol for access control
• Locks:
  • Shared – if Ti has a shared-mode lock (S) on item Q, Ti can read Q but not write Q
  • Exclusive – if Ti has an exclusive-mode lock (X) on Q, Ti can read and write Q


• Require every transaction on item Q to acquire the appropriate lock
• If the lock is already held, a new request may have to wait
• Similar to the readers–writers algorithm

Two-phase Locking Protocol

• Generally ensures conflict serializability
• Each transaction issues lock and unlock requests in two phases:
  • Growing – obtaining locks
  • Shrinking – releasing locks
• Does not prevent deadlock
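The two-phase discipline for a single transaction can be sketched as below. This is only an illustration of the growing/shrinking rule; lock queues and shared-vs-exclusive modes are deliberately omitted.

```python
# Sketch of the two-phase locking discipline for one transaction:
# once any lock has been released, no new lock may be acquired.
class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()
        self.shrinking = False   # set once we enter the shrinking phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock acquired after an unlock")
        self.locks.add(item)     # growing phase: acquiring locks

    def unlock(self, item):
        self.shrinking = True    # shrinking phase begins at the first release
        self.locks.discard(item)
```

Note that this enforces the two phases but, as the bullet above says, does nothing to prevent deadlock between transactions.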

Timestamp-based Protocols

• Select an order among transactions in advance – timestamp ordering
• Transaction Ti is associated with timestamp TS(Ti) before Ti starts
  • TS(Ti) < TS(Tj) if Ti entered the system before Tj
  • TS can be generated from the system clock or as a logical counter incremented at each entry of a transaction
• Timestamps determine the serializability order
  • If TS(Ti) < TS(Tj), the system must ensure the produced schedule is equivalent to a serial schedule where Ti appears before Tj

Timestamp-based Protocol Implementation

• Data item Q gets two timestamps:
  • W-timestamp(Q) – largest timestamp of any transaction that executed write(Q) successfully
  • R-timestamp(Q) – largest timestamp of a successful read(Q)
  • Updated whenever read(Q) or write(Q) is executed
• The timestamp-ordering protocol assures that any conflicting read and write operations are executed in timestamp order

Suppose Ti executes read(Q):

• If TS(Ti) < W-timestamp(Q), Ti needs to read a value of Q that was already overwritten
  • The read operation is rejected and Ti is rolled back
• If TS(Ti) ≥ W-timestamp(Q)
  • The read is executed, and R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti))

Timestamp-ordering Protocol


Suppose Ti executes write(Q):

• If TS(Ti) < R-timestamp(Q), the value of Q produced by Ti was needed previously, and Ti assumed it would never be produced
  • The write operation is rejected and Ti is rolled back
• If TS(Ti) < W-timestamp(Q), Ti is attempting to write an obsolete value of Q
  • The write operation is rejected and Ti is rolled back
• Otherwise, the write is executed
• Any rolled-back transaction Ti is assigned a new timestamp and restarted
• The algorithm ensures conflict serializability and freedom from deadlock
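The read and write checks above can be sketched for a single data item. In this sketch, rejection (rollback) is modeled simply as returning False; restart with a new timestamp is left out by assumption.

```python
# Sketch of the timestamp-ordering checks for one data item Q.
class Item:
    def __init__(self):
        self.r_ts = 0      # R-timestamp(Q): largest successful read timestamp
        self.w_ts = 0      # W-timestamp(Q): largest successful write timestamp
        self.value = None

def ts_read(ts, q):
    if ts < q.w_ts:                 # value already overwritten: reject, roll back
        return False
    q.r_ts = max(q.r_ts, ts)        # record the successful read
    return True

def ts_write(ts, q, value):
    if ts < q.r_ts or ts < q.w_ts:  # needed earlier, or obsolete write: reject
        return False
    q.w_ts = ts
    q.value = value
    return True
```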

Schedule Possible Under Timestamp Protocol


Memory Management: Swapping, contiguous memory allocation, paging, structure of the Page table, segmentation

Objectives:

1. Maintenance of memory.
2. What paging is, and its advantages and disadvantages.
3. Explanation of segmentation and swapping.
4. Differences between paging and segmentation.

Memory Management

• To provide a detailed description of various ways of organizing memory hardware
• To discuss various memory-management techniques, including paging and segmentation
• To provide a detailed description of the Intel Pentium, which supports both pure segmentation and segmentation with paging
• A program must be brought (from disk) into memory and placed within a process for it to be run
• Main memory and registers are the only storage the CPU can access directly
  • Register access takes one CPU clock cycle (or less)
  • Main memory can take many cycles
  • Cache sits between main memory and CPU registers
• Protection of memory is required to ensure correct operation

Base and Limit Registers

A pair of base and limit registers define the logical address space


Binding of Instructions and Data to Memory

• Address binding of instructions and data to memory addresses can happen at three different stages

• Compile time: If memory location known a priori, absolute code can be generated; must recompile code if starting location changes

• Load time: Must generate relocatable code if memory location is not known at compile time

• Execution time: Binding delayed until run time if the process can be moved during its execution from one memory segment to another. Need hardware support for address maps (e.g., base and limit registers)

Multistep Processing of a User Program


Logical vs. Physical Address Space

• The concept of a logical address space that is bound to a separate physical address space is central to proper memory management

• Logical address – generated by the CPU; also referred to as a virtual address
• Physical address – address seen by the memory unit
• Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in the execution-time address-binding scheme

Memory-Management Unit (MMU)

• Hardware device that maps virtual to physical address

• In MMU scheme, the value in the relocation register is added to every address generated by a user process at the time it is sent to memory

• The user program deals with logical addresses; it never sees the real physical addresses

Dynamic relocation using a relocation register

Dynamic Loading

• Routine is not loaded until it is called
• Better memory-space utilization; an unused routine is never loaded
• Useful when large amounts of code are needed to handle infrequently occurring cases
• No special support from the operating system is required; implemented through program design

Dynamic Linking

• Linking postponed until execution time
• A small piece of code, the stub, is used to locate the appropriate memory-resident library routine
• The stub replaces itself with the address of the routine, and executes the routine
• The operating system is needed to check whether the routine is in the process's memory address space
• Dynamic linking is particularly useful for libraries
• The system is also known as shared libraries


Contiguous allocation:

When each process is allocated a single contiguous section of memory, this is called contiguous allocation.

Fixed partitioning:

In fixed partitioning, main memory is divided into a number of fixed-size blocks called partitions.

When a process has to be loaded into main memory, it is allocated a free partition, which it releases on termination; that partition then becomes free for use by other processes.

The major drawback of this scheme is internal fragmentation. It arises when the memory allocated to a process is not fully utilized.

Consider a system with fixed partitions of size 100 KB. If a process of 50 KB is allocated to one such partition, it uses only 50 KB and the remaining 50 KB is wasted; this is called internal fragmentation.

Dynamic partitioning

• In dynamic partitioning, partitions are created dynamically depending on the size of the process being loaded.
• The partition size is exactly equal to the size of the process being loaded.
• For example, initially the memory is empty. Then a process of 200 KB is loaded into memory and allocated 200 KB; then another process of 500 KB; likewise, processes of 300 KB and 100 KB are loaded.

• Suppose that P1 and P3 complete and terminate, releasing the memory occupied by them; the memory status is then as follows.

Now suppose a new process P5 of size 500 KB is to be loaded. This cannot be done because there is no single contiguous 500 KB free block; although 500 KB is free in total, it is fragmented and not contiguous, so it cannot be allocated. This problem is called external fragmentation, which is a drawback of dynamic partitioning.

The solution to external fragmentation is compaction.

In compaction, the kernel shuffles the memory contents in order to place all free memory partitions together and form a single large block.

Memory mapping and protection:

• This is one of the important issues that arise during contiguous memory allocation.


• It refers to protecting a process from addressing an unspecified location.
• It is done using a limit register and a relocation register.
• The limit register contains the range of valid logical addresses, and the relocation register contains the starting physical address.
• If the logical address generated by the user is less than the value of the limit register, it is a valid address and is added to the relocation register to map to a physical location.
• If it is greater than the limit, an addressing error is raised.
• The following figure shows memory mapping and protection.
• This procedure is a particular instance of the general dynamic storage-allocation problem, which concerns how to satisfy a request of size n from a list of free holes.
• There are many solutions to this problem. The first-fit, best-fit, and worst-fit strategies are the ones most commonly used to select a free hole from the set of available holes.

Swapping

• A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution
• Backing store – fast disk large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images
• Roll out, roll in – swapping variant used for priority-based scheduling algorithms; a lower-priority process is swapped out so a higher-priority process can be loaded and executed
• Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped
• Modified versions of swapping are found on many systems (e.g., UNIX, Linux, and Windows)

System maintains a ready queue of ready-to-run processes which have memory images on disk

Schematic View of Swapping

Contiguous Allocation

• Main memory is usually divided into two partitions:
  • Resident operating system, usually held in low memory with the interrupt vector
  • User processes, held in high memory
• Relocation registers are used to protect user processes from each other, and from changing operating-system code and data
  • Base register contains the value of the smallest physical address
  • Limit register contains the range of logical addresses – each logical address must be less than the limit register


• MMU maps logical address dynamically

Hardware Support for Relocation and Limit Registers

• Multiple-partition allocation
• Hole – block of available memory; holes of various sizes are scattered throughout memory
• When a process arrives, it is allocated memory from a hole large enough to accommodate it
• Operating system maintains information about: a) allocated partitions, b) free partitions (holes)


Storage-Allocation Problem

• First-fit: Allocate the first hole that is big enough
• Best-fit: Allocate the smallest hole that is big enough; must search the entire list, unless it is ordered by size
  • Produces the smallest leftover hole
• Worst-fit: Allocate the largest hole; must also search the entire list
  • Produces the largest leftover hole
• First-fit and best-fit are better than worst-fit in terms of speed and storage utilization
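The three strategies can be sketched over a simple list of free-hole sizes. The list-of-integers representation (sizes in KB) is an assumption for illustration; each function returns the index of the chosen hole, or None if no hole is large enough.

```python
# Sketch of the three placement strategies over a free-hole list.
def first_fit(holes, n):
    for i, h in enumerate(holes):
        if h >= n:
            return i                 # first adequate hole
    return None

def best_fit(holes, n):
    candidates = [(h, i) for i, h in enumerate(holes) if h >= n]
    return min(candidates)[1] if candidates else None  # smallest adequate hole

def worst_fit(holes, n):
    candidates = [(h, i) for i, h in enumerate(holes) if h >= n]
    return max(candidates)[1] if candidates else None  # largest hole
```

For holes of [100, 500, 200, 300, 600] KB and a 212 KB request, first-fit picks the 500 KB hole, best-fit the 300 KB hole (smallest leftover), and worst-fit the 600 KB hole (largest leftover).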

Fragmentation

• External Fragmentation – total memory space exists to satisfy a request, but it is not contiguous

• Internal Fragmentation – allocated memory may be slightly larger than requested memory; this size difference is memory internal to a partition, but not being used

• Reduce external fragmentation by compaction
  • Shuffle memory contents to place all free memory together in one large block
  • Compaction is possible only if relocation is dynamic, and it is done at execution time

• I/O problem
  • Latch the job in memory while it is involved in I/O
  • Do I/O only into OS buffers

Paging

• Logical address space of a process can be noncontiguous; process is allocated physical memory whenever the latter is available

• Divide physical memory into fixed-sized blocks called frames (size is a power of 2, between 512 bytes and 8,192 bytes)
• Divide logical memory into blocks of the same size called pages
• Keep track of all free frames
• To run a program of size n pages, need to find n free frames and load the program
• Set up a page table to translate logical to physical addresses
• Internal fragmentation can occur

Address Translation Scheme

• Address generated by CPU is divided into

• Page number (p) – used as an index into a page table which contains base address of each page in physical memory

    page number | page offset
         p      |      d
      (m − n)   |     (n)

• Page offset (d) – combined with the base address to define the physical memory address that is sent to the memory unit
• For a given logical address space of size 2^m and page size 2^n, the page number takes the high-order m − n bits and the page offset the low-order n bits
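The split and the table lookup can be sketched with bit operations. This is an illustration only; the page table is assumed to be a simple list mapping page number to frame number.

```python
# Sketch of paging address translation for page size 2^n bytes.
def split_address(addr, n):
    p = addr >> n                # page number: high-order m - n bits
    d = addr & ((1 << n) - 1)    # offset: low-order n bits
    return p, d

def translate(addr, page_table, n):
    p, d = split_address(addr, n)
    frame = page_table[p]        # page table maps page number -> frame number
    return (frame << n) | d      # physical address = frame base + offset
```

With the 32-byte-memory, 4-byte-page example below (n = 2) and a page table [5, 6, 1, 2], logical address 4 (page 1, offset 0) maps to physical address 6 × 4 = 24.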

Paging Hardware

Paging Model of Logical and Physical Memory


Free Frames

Paging Example

32-byte memory and 4-byte pages


Implementation of Page Table

• Page table is kept in main memory
• Page-table base register (PTBR) points to the page table
• Page-table length register (PTLR) indicates the size of the page table
• In this scheme every data/instruction access requires two memory accesses: one for the page table and one for the data/instruction
• The two-memory-access problem can be solved by use of a special fast-lookup hardware cache called associative memory, or translation look-aside buffer (TLB)
• Some TLBs store address-space identifiers (ASIDs) in each TLB entry – an ASID uniquely identifies each process, providing address-space protection for that process

Associative Memory

• Associative memory – parallel search
• Address translation for (p, d):
  • If p is in an associative register, get the frame # out
  • Otherwise get the frame # from the page table in memory

Paging Hardware With TLB


Effective Access Time

• Associative lookup = e time units
• Assume the memory cycle time is 1 microsecond
• Hit ratio (a) – percentage of times that a page number is found in the associative registers; the ratio is related to the number of associative registers
• Effective Access Time (EAT):

EAT = (1 + e)a + (2 + e)(1 – a)
    = 2 + e – a
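As a quick numeric check of the formula, assume a lookup time e = 0.2 microseconds and a hit ratio a = 0.8 (both values chosen only for illustration):

```python
# EAT = (1 + e)a + (2 + e)(1 - a), which simplifies to 2 + e - a.
def eat(e, a):
    return (1 + e) * a + (2 + e) * (1 - a)

# With e = 0.2 and a = 0.8: 1.2 * 0.8 + 2.2 * 0.2 = 0.96 + 0.44 = 1.4 microseconds,
# which equals 2 + 0.2 - 0.8 from the simplified form.
assert abs(eat(0.2, 0.8) - (2 + 0.2 - 0.8)) < 1e-9
```

A higher hit ratio drives EAT toward the single-access time of 1 + e, since the second memory access (for the page table) is avoided on a TLB hit.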

Memory Protection

• Memory protection implemented by associating protection bit with each frame

• Valid–invalid bit attached to each entry in the page table:
  • "valid" indicates that the associated page is in the process's logical address space, and is thus a legal page
  • "invalid" indicates that the page is not in the process's logical address space

Valid (v) or Invalid (i) Bit In A Page Table

Shared Pages


Shared code

• One copy of read-only (reentrant) code shared among processes (e.g., text editors, compilers, window systems)
• Shared code must appear in the same location in the logical address space of all processes

Private code and data

• Each process keeps a separate copy of the code and data
• The pages for the private code and data can appear anywhere in the logical address space

Shared Pages Example


Structure of the Page Table

• Hierarchical Paging
• Hashed Page Tables
• Inverted Page Tables

Hierarchical Page Tables

• Break up the logical address space into multiple page tables
• A simple technique is a two-level page table

Two-Level Page-Table Scheme

    page number     | page offset
      p1   |   p2   |      d

Two-Level Paging Example

• A logical address (on a 32-bit machine with 1K page size) is divided into:
  • a page number consisting of 22 bits
  • a page offset consisting of 10 bits
• Since the page table is paged, the page number is further divided into:
  • a 12-bit outer page number (p1)
  • a 10-bit inner page number (p2)
• Thus the logical address is divided as p1 (12 bits), p2 (10 bits), d (10 bits), where p1 is an index into the outer page table, and p2 is the displacement within the page of the outer page table
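The 12/10/10 split from this example can be sketched with masks and shifts; the field widths below are hard-coded to match the example:

```python
# Split a 32-bit logical address into the two-level paging fields:
# 12-bit outer index p1, 10-bit inner index p2, 10-bit offset d (1K pages).
def split_two_level(addr):
    d  = addr & 0x3FF            # low 10 bits: page offset
    p2 = (addr >> 10) & 0x3FF    # next 10 bits: index within outer-table page
    p1 = addr >> 20              # high 12 bits: index into outer page table
    return p1, p2, d
```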

Address-Translation Scheme


Three-level Paging Scheme

Hashed Page Tables

• Common in address spaces > 32 bits
• The virtual page number is hashed into a page table
• This page table contains a chain of elements hashing to the same location
• Virtual page numbers are compared in this chain, searching for a match
• If a match is found, the corresponding physical frame is extracted
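The lookup described above can be sketched as follows. The table layout (a dict of buckets, each holding a chain of (virtual page, frame) pairs) is an assumption for illustration.

```python
# Sketch of hashed page-table lookup: hash the virtual page number to a
# bucket, then walk the chain at that bucket looking for a match.
def hashed_lookup(table, size, vpn):
    bucket = vpn % size                       # hash function: modulo table size
    for v, frame in table.get(bucket, []):    # walk the collision chain
        if v == vpn:
            return frame                      # match found: extract the frame
    return None                               # no match: page fault
```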

Hashed Page Table


Inverted Page Table

• One entry for each real page of memory
• Entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page
• Decreases the memory needed to store each page table, but increases the time needed to search the table when a page reference occurs
• Use a hash table to limit the search to one, or at most a few, page-table entries

Inverted Page Table Architecture

Segmentation

Memory-management scheme that supports user view of memory

• A program is a collection of segments
• A segment is a logical unit such as:
  • main program
  • procedure, function
  • method
  • object
  • local variables, global variables
  • common block
  • stack
  • symbol table
  • arrays

User's View of a Program

Logical View of Segmentation (segments in user space mapped into physical memory)

Segmentation Architecture

• Logical address consists of a two-tuple: <segment-number, offset>
• Segment table – maps two-dimensional user-defined addresses into one-dimensional physical addresses; each table entry has:
  • base – contains the starting physical address where the segment resides in memory
  • limit – specifies the length of the segment
• Segment-table base register (STBR) points to the segment table's location in memory
• Segment-table length register (STLR) indicates the number of segments used by a program; segment number s is legal if s < STLR
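The translation with its two bounds checks can be sketched as below; representing the segment table as a list of (base, limit) pairs is an assumption for illustration.

```python
# Sketch of segmentation address translation with bounds checks.
def seg_translate(s, offset, seg_table, stlr):
    if s >= stlr:
        raise ValueError("invalid segment number")      # s must be < STLR
    base, limit = seg_table[s]
    if offset >= limit:
        raise ValueError("segment offset out of range") # trap: addressing error
    return base + offset                                # physical address
```

For a table [(1400, 1000), (6300, 400)], the address <1, 100> maps to 6300 + 100 = 6400, while <0, 1100> traps because the offset exceeds segment 0's limit of 1000.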

• Protection: with each entry in the segment table associate:
  • validation bit = 0 ⇒ illegal segment
  • read/write/execute privileges
• Protection bits are associated with segments; code sharing occurs at the segment level
• Since segments vary in length, memory allocation is a dynamic storage-allocation problem
• A segmentation example is shown in the following diagram

Segmentation Hardware


Example of Segmentation

Example: The Intel Pentium

• Supports both pure segmentation and segmentation with paging
• The CPU generates a logical address, which is given to the segmentation unit
  • The segmentation unit produces a linear address
• The linear address is given to the paging unit
  • The paging unit generates the physical address in main memory
• The segmentation and paging units together form the equivalent of the MMU


Logical to Physical Address Translation in Pentium

Intel Pentium Segmentation


Pentium Paging Architecture

Linear Address in Linux

Three-level Paging in Linux


Virtual Memory Management: virtual memory, demand paging, page-replacement algorithms, allocation of frames, thrashing

OBJECTIVES

• To describe the benefits of a virtual memory system.

• To explain the concepts of demand paging, page-replacement algorithms, and allocation of page frames.

• To discuss the principles of the working-set model.

Virtual Memory

All the memory-management strategies have the same goal: to keep many processes in memory simultaneously to allow multiprogramming. However, they tend to require that an entire process be in memory before it can execute. Virtual memory is a technique that allows the execution of processes that are not completely in memory. One major advantage of this scheme is that programs can be larger than physical memory.

Virtual memory also allows processes to share files easily and to implement shared memory. It provides an efficient mechanism for process creation. Virtual memory is not easy to implement, however, and may substantially decrease performance if it is used carelessly.

Background: The requirement that instructions must be in physical memory to be executed seems both necessary and reasonable; but it is also unfortunate, since it limits the size of a program to the size of physical memory. In reality, however, the entire program is rarely needed. Consider the following:

• Programs often have code to handle unusual error conditions. Since these errors seldom, if ever, occur in practice, this code is almost never executed.
• Arrays, lists, and tables are often allocated more memory than they actually need. An array may be declared 100 by 100 elements, even though it is seldom larger than 10 by 10 elements. An assembler symbol table may have room for 3,000 symbols, although the average program has fewer than 200 symbols.
• Certain options and features of a program may be used rarely. For instance, the routines on U.S. government computers that balance the budget are only rarely used.


The ability to execute a program that is only partially in memory would confer many benefits:

• A program would no longer be constrained by the amount of physical memory that is available. Users would be able to write programs for an extremely large virtual address space, simplifying the programming task.

• Because each user program could take less physical memory, more programs could be run at the same time, with a corresponding increase in CPU utilization and throughput but with no increase in response time or turnaround time.

• Less I/O would be needed to load or swap each user program into memory, so each user program would run faster.

Thus, running a program that is not entirely in memory would benefit both the system and the user.

Virtual memory involves the separation of logical memory as perceived by users from physical memory. This separation allows an extremely large virtual memory to be provided for programmers when only a smaller physical memory is available.

Virtual memory thus makes the task of programming much easier.

The virtual address space of a process refers to the logical (or virtual) view of how a process is stored in memory. Typically, this view is that a process begins at a certain logical address (say, address 0) and exists in contiguous memory.


Note from the figure above that we allow the heap to grow upward in memory as it is used for dynamic memory allocation. Similarly, we allow the stack to grow downward in memory through successive function calls. The large blank space (or hole) between the heap and the stack is part of the virtual address space but will require actual physical pages only if the heap or stack grows.

Virtual address spaces that include holes are known as sparse address spaces. Using a sparse address space is beneficial because the holes can be filled as the stack or heap segments grow or if we wish to dynamically link libraries (or possibly other shared objects) during program execution. Virtual memory also allows files and memory to be shared by two or more processes through page sharing. This leads to the following benefits:

• System libraries can be shared by several processes through mapping of the shared object into a virtual address space. A library is mapped read-only into the space of each process that is linked with it.
• Similarly, virtual memory enables processes to share memory. Virtual memory allows one process to create a region of memory that it can share with another process. Processes sharing this region consider it part of their virtual address space.
• Virtual memory can allow pages to be shared during process creation with the fork() system call, thus speeding up process creation.


Demand Paging:

Consider how an executable program might be loaded from disk into memory. One option is to load the entire program into physical memory at program execution time. An alternative strategy is to initially load pages only as they are needed.

This technique is known as demand paging and is commonly used in virtual memory systems. With demand-paged virtual memory, pages are only loaded when they are demanded during program execution; pages that are never accessed are thus never loaded into physical memory.

A demand-paging system is similar to a paging system with swapping where processes reside in secondary memory (usually a disk).

When we want to execute a process, we swap it into memory. Rather than swapping the

entire process into memory, however, we use a lazy swapper. A lazy swapper never swaps a page into memory unless that page will be needed.

A swapper manipulates entire processes, whereas a pager is concerned with the individual pages of a process. We thus use pager, rather than swapper, in connection with demand paging.


Basic Concepts

When a process is to be swapped in, the pager guesses which pages will be used before the process is swapped out again. Instead of swapping in a whole process, the pager brings only those necessary pages into memory. Thus, it avoids reading into memory pages that will not be used anyway, decreasing the swap time and the amount of physical memory needed.

We need some form of hardware support to distinguish between the pages that are in memory and the pages that are on the disk.

The valid-invalid bit scheme can be used for this purpose. When this bit is set to "valid," the associated page is both legal and in memory. If the bit is set to "invalid," the page either is not valid (that is, not in the logical address space of the process) or is valid but is currently on the disk. The page-table entry for a page that is brought into memory is set as usual, but the page-table entry for a page that is not currently in memory is either simply marked invalid or contains the address of the page on disk.

Marking a page invalid will have no effect if the process never attempts to access that page.


What happens if the process tries to access a page that was not brought into memory? Access to a page marked invalid causes a page-fault trap. The paging hardware, in translating the address through the page table, notices that the invalid bit is set, causing a trap to the operating system. This trap is the result of the operating system's failure to bring the desired page into memory.

The procedure for handling this page fault is straightforward:

1. We check an internal table (usually kept with the process control block) for this process to determine whether the reference was a valid or an invalid memory access.

2. If the reference was invalid, we terminate the process. If it was valid, but we have not yet brought in that page, we now page it in.

3. We find a free frame (by taking one from the free-frame list, for example).

4. We schedule a disk operation to read the desired page into the newly allocated frame.

5. When the disk read is complete, we modify the internal table kept with the process and the page table to indicate that the page is now in memory.


6. We restart the instruction that was interrupted by the trap. The process can now access the page as though it had always been in memory.
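The six steps above can be sketched as a toy model in Python. This is not a real kernel interface; the names (page_table, free_frames, disk) are hypothetical structures invented for illustration.

```python
def handle_page_fault(page, valid_pages, page_table, free_frames, disk):
    # Steps 1-2: consult the process's internal table; an invalid
    # reference terminates the process (modeled here as an exception).
    if page not in valid_pages:
        raise MemoryError("invalid reference: terminate process")
    # Step 3: take a frame from the free-frame list.
    frame = free_frames.pop()
    # Step 4: schedule a disk read of the page into that frame.
    contents = disk[page]
    # Step 5: update the page table to show the page is now in memory.
    page_table[page] = {"frame": frame, "valid": True}
    # Step 6: the faulting instruction can now be restarted.
    return contents
```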

In the extreme case, we can start executing a process with no pages in memory. This scheme is pure demand paging: Never bring a page into memory until it is required.

The hardware to support demand paging is the same as the hardware for paging and swapping:

• Page table. This table has the ability to mark an entry invalid through a valid-invalid bit or a special value of the protection bits.

• Secondary memory. This memory holds those pages that are not present in main memory. The secondary memory is usually a high-speed disk. It is known as the swap device, and the section of disk used for this purpose is known as swap space.

A crucial requirement for demand paging is the need to be able to restart any instruction after a page fault. Because we save the state (registers, condition code, instruction counter) of the interrupted process when the page fault occurs, we must be able to restart the process in exactly the same place and state, except that the desired page is now in memory and is accessible. In most cases, this requirement is easy to meet. A page fault may occur at any memory reference.


Consider a three-address instruction such as ADD the content of A to B, placing the result in C. These are the steps to execute this instruction:

1. Fetch and decode the instruction (ADD).

2. Fetch A.

3. Fetch B.

4. Add A and B.

5. Store the sum in C.

If we fault when we try to store in C (because C is in a page not currently in memory), we will have to get the desired page, bring it in, correct the page table, and restart the instruction. The restart will require fetching the instruction again, decoding it again, fetching the two operands again, and then adding again. However, there is not much repeated work (less than one complete instruction), and the repetition is necessary only when a page fault occurs.

The major difficulty arises when one instruction may modify several different locations. Paging should be entirely transparent to the user process.

Performance of Demand Paging

Demand paging can significantly affect the performance of a computer system. To see why, let's compute the effective access time for a demand-paged memory. For most computer systems, the memory-access time, denoted ma, ranges from 10 to 200 nanoseconds. As long as we have no page faults, the effective access time is equal to the memory access time. If, however, a page fault occurs, we must first read the relevant page from disk and then access the desired word.

Let p be the probability of a page fault (0 ≤ p ≤ 1). We would expect p to be close to zero; that is, we would expect to have only a few page faults. The effective access time is then

effective access time = (1 − p) × ma + p × page-fault time.

To compute the effective access time, we must know how much time is needed to service a page fault. A page fault causes the following sequence to occur:

1. Trap to the operating system.

2. Save the user registers and process state.

3. Determine that the interrupt was a page fault.

4. Check that the page reference was legal and determine the location of the page on the disk.

5. Issue a read from the disk to a free frame:


a. Wait in a queue for this device until the read request is serviced.

b. Wait for the device seek and/or latency time.

c. Begin the transfer of the page to a free frame.

6. While waiting, allocate the CPU to some other user (CPU scheduling, optional).

7. Receive an interrupt from the disk I/O subsystem (I/O completed).

8. Save the registers and process state for the other user (if step 6 is executed).

9. Determine that the interrupt was from the disk.

10. Correct the page table and other tables to show that the desired page is now in memory.

11. Wait for the CPU to be allocated to this process again.

12. Restore the user registers, process state, and new page table, and then resume the interrupted instruction.
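Folding all twelve steps into a single page-fault service time, the effective-access-time formula can be evaluated directly. The timings below (200 ns memory access, 8 ms fault service) are illustrative assumptions, not values from the text.

```python
def effective_access_time(p, ma_ns, fault_ns):
    # effective access time = (1 - p) * ma + p * page-fault time
    return (1 - p) * ma_ns + p * fault_ns

ma = 200                  # memory-access time, nanoseconds
fault = 8_000_000         # page-fault service time, nanoseconds (8 ms)
eat = effective_access_time(1 / 1000, ma, fault)   # ~8199.8 ns
```

Even one fault per 1,000 accesses slows effective memory access by a factor of about 40 under these assumptions.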

In any case, we are faced with three major components of the page-fault service time:

1. Service the page-fault interrupt.

2. Read in the page.

3. Restart the process.

Page Replacement

If a process of ten pages actually uses only half of them, then demand paging saves the I/O necessary to load the five pages that are never used.

Consider that system memory is not used only for holding program pages. Buffers for I/O also consume a significant amount of memory. This use can increase the strain on memory-placement algorithms. Deciding how much memory to allocate to I/O and how much to program pages is a significant challenge.

While a user process is executing, a page fault occurs. The operating system determines where the desired page is residing on the disk but then finds that there are no free frames on the free-frame list; all memory is in use. The operating system has several options at this point. It could terminate the user process. However, demand paging is the operating system's attempt to improve the computer system's utilization and throughput. Users should not be aware that their processes are running on a paged system; paging should be logically transparent to the user. So this option is not the best choice.

The operating system could instead swap out a process, freeing all its frames and reducing the level of multiprogramming. This option is a good one in certain circumstances. The most common solution, however, is page replacement.

Basic Page Replacement

Page replacement takes the following approach. If no frame is free, we find one that is not currently being used and free it. We can free a frame by writing its contents to swap space and changing the page table (and all other tables) to indicate that the page is no longer in memory (Figure 9.10). We can now use the freed frame to hold the page for which the process faulted. We modify the page-fault service routine to include page replacement:

1. Find the location of the desired page on the disk.

2. Find a free frame:

a. If there is a free frame, use it.

b. If there is no free frame, use a page-replacement algorithm to select a victim frame.

c. Write the victim frame to the disk; change the page and frame tables accordingly.

3. Read the desired page into the newly freed frame; change the page and frame tables.


4. Restart the user process.
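The four steps can be sketched with the same kind of toy dict-based model; victim selection is left to a page-replacement algorithm, and all names here are illustrative, not a real API.

```python
def replace_page(victim, desired, page_table, disk, memory):
    frame = page_table[victim]["frame"]
    disk[victim] = memory[frame]           # step 2c: write the victim out
    page_table[victim] = {"valid": False}  # victim is no longer in memory
    memory[frame] = disk[desired]          # step 3: read the desired page in
    page_table[desired] = {"frame": frame, "valid": True}
    return frame                           # step 4: restart the user process
```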

If no frames are free, two page transfers (one out and one in) are required. This situation effectively doubles the page-fault service time and increases the effective access time accordingly.

We can reduce this overhead by using a modify bit (or dirty bit). When this scheme is used,

each page or frame has a modify bit associated with it in the hardware. The modify bit for a page is set by the hardware whenever any word or byte in the page is written into, indicating that the page has been modified. When we select a page for replacement, we examine its modify bit.

If the bit is set, we must write that page to the disk. If the modify bit is not set, however, then we need not write the memory page to the disk: it is already there.

FIFO Page Replacement

The simplest page-replacement algorithm is a first-in, first-out (FIFO) algorithm. A FIFO replacement algorithm associates with each page the time when that page was brought into memory. When a page must be replaced, the oldest page is chosen.

create a FIFO queue to hold all pages in memory. We replace the page at the head of the queue. When a page is brought into memory, we insert it at the tail of the queue.


The FIFO page-replacement algorithm is easy to understand and program. However, its performance is not always good. To illustrate the problems that are possible with a FIFO page-replacement algorithm, we consider the following reference string:

1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

Notice that the number of faults for four frames (ten) is greater than the number of faults for three frames (nine)! This most unexpected result is known as Belady's anomaly: for some page-replacement algorithms, the page-fault rate may increase as the number of allocated frames increases.
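A short simulation (a sketch, not from the text) reproduces Belady's anomaly on this reference string:

```python
from collections import deque

def fifo_faults(refs, nframes):
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:         # no free frame: evict the oldest
                frames.discard(order.popleft())
            frames.add(page)
            order.append(page)                 # newest page goes to the tail
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
three = fifo_faults(refs, 3)   # 9 faults
four = fifo_faults(refs, 4)    # 10 faults: more frames, more faults
```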

Optimal Page Replacement

One result of the discovery of Belady's anomaly was the search for an optimal page-replacement algorithm. An optimal page-replacement algorithm has the lowest page-fault rate of all algorithms and will never suffer from Belady's anomaly. Such an algorithm does exist and has been called OPT or MIN. It is simply this:

Replace the page that will not be used for the longest period of time.


LRU Page Replacement

If the optimal algorithm is not feasible, perhaps an approximation of the optimal algorithm is possible. The key distinction between the FIFO and OPT algorithms (other than looking backward versus forward in time) is that the FIFO algorithm uses the time when a page was brought into memory, whereas the OPT algorithm uses the time when a page is to be used. If we use the recent past as an approximation of the near future, then we can replace the page that has not been used for the longest period of time. This approach is the least-recently-used (LRU) algorithm.

Like optimal replacement, LRU replacement does not suffer from Belady's anomaly. Both belong to a class of page-replacement algorithms, called stack algorithms, that can never exhibit Belady's anomaly.
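An LRU simulation (again a sketch, not from the text) on the same reference string illustrates the stack property: adding frames never increases the fault count.

```python
def lru_faults(refs, nframes):
    frames, faults = [], 0       # ordered least- to most-recently used
    for page in refs:
        if page in frames:
            frames.remove(page)  # a hit refreshes the page's recency
        else:
            faults += 1
            if len(frames) == nframes:
                frames.pop(0)    # evict the least recently used page
        frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
```

On this string, LRU with three frames incurs ten faults and with four frames only eight, never more.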

Second-Chance Algorithm

The basic algorithm of second-chance replacement is a FIFO replacement algorithm. When a page has been selected, however, we inspect its reference bit. If the value is 0, we proceed to replace this page; but if the reference bit is set to 1, we give the page a second chance and move on to select the next FIFO page. When a page gets a second chance, its reference bit is cleared, and its arrival time is reset to the current time. Thus, a page that is given a second chance will not be replaced until all other pages have been replaced (or given second chances). In addition, if a page is used often enough to keep its reference bit set, it will never be replaced.

One way to implement the second-chance algorithm (sometimes referred to as the clock algorithm) is as a circular queue. A pointer (that is, a hand on the clock) indicates which page is to be replaced next. When a frame is needed, the pointer advances until it finds a page with a 0 reference bit. As it advances, it clears the reference bits. Once a victim page is found, the page is replaced, and the new page is inserted in the circular queue in that position.
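The hand's advance can be sketched as follows; representing the circular queue as a list of reference bits is an assumption of this toy model.

```python
def clock_victim(ref_bits, hand):
    # Advance the hand, clearing reference bits, until a 0 bit is found.
    while ref_bits[hand]:
        ref_bits[hand] = 0                # second chance: clear and move on
        hand = (hand + 1) % len(ref_bits)
    return hand                           # frame index of the victim

ref_bits = [1, 0, 1, 1]
victim = clock_victim(ref_bits, 0)        # frame 0 is spared; frame 1 is chosen
```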

LFU Page Replacement


The least frequently used (LFU) page-replacement algorithm requires that the page with the smallest count be replaced. The reason for this selection is that an actively used page should have a large reference count. A problem arises, however, when a page is used heavily during the initial phase of a process but then is never used again. Since it was used heavily, it has a large count and remains in memory even though it is no longer needed. One solution is to shift the counts right by 1 bit at regular intervals, forming an exponentially decaying average usage count.
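The right-shift decay can be sketched directly; the page names and counts below are invented for illustration, and no new references arrive between aging steps.

```python
def age_counts(counts):
    # Halve every reference count; applied periodically, this forms an
    # exponentially decaying average usage count.
    return {page: c >> 1 for page, c in counts.items()}

counts = {"hot_once": 64, "steady": 5}
for _ in range(4):                # four aging intervals, no new references
    counts = age_counts(counts)
# the once-hot page's count has decayed from 64 to 4
```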

Page-Buffering Algorithms

Systems commonly keep a pool of free frames. When a page fault occurs, a victim frame is chosen as before. However, the desired page is read into a free frame from the pool before the victim is written out. This procedure allows the process to restart as soon as possible, without waiting for the victim page to be written out. When the victim is later written out, its frame is added to the free-frame pool.

An expansion of this idea is to maintain a list of modified pages. Whenever the paging device is idle, a modified page is selected and is written to the disk. Its modify bit is then reset. This scheme increases the probability that a page will be clean when it is selected for replacement and will not need to be written out.

Another modification is to keep a pool of free frames but to remember which page was in each frame. Since the frame contents are not modified when a frame is written to the disk, the old page can be reused directly from the free-frame pool if it is needed before that frame is reused. No I/O is needed in this case. When a page fault occurs, we first check whether the desired page is in the free-frame pool. If it is not, we must select a free frame and read the desired page into it.

Allocation of Frames

How do we allocate the fixed amount of free memory among the various processes? If we have 93 free frames and two processes, how many frames does each process get?

The user process is allocated any free frame.

We cannot, for example, allocate more than the total number of available frames (unless there is page sharing). We must also allocate at least a minimum number of frames. One reason for allocating at least a minimum number of frames involves performance. Obviously, as the number of frames allocated to each process decreases, the page-fault rate increases, slowing process execution. For example, consider a machine in which all memory-reference instructions have only one memory address. In this case, we need at least one frame for the instruction and one frame for the memory reference. In addition, if one-level indirect addressing is allowed (for example, a load instruction on page 16 can refer to an address on page 0, which is an indirect reference to page 23), then paging requires at least three frames per process. The minimum number of frames is defined by the computer architecture. For example, the move instruction for the PDP-11 includes more than one word for some addressing modes, and thus the instruction itself may straddle two pages. In addition, each of its two operands may be indirect references, for a total of six frames.

The minimum number of frames per process is defined by the architecture; the maximum number is defined by the amount of available physical memory.

Allocation Algorithms

The easiest way to split m frames among n processes is to give everyone an equal share, m/n frames. For instance, if there are 93 frames and five processes, each process will get 18 frames. The leftover three frames can be used as a free-frame buffer pool. This scheme is called equal allocation.

An alternative is to recognize that various processes will need differing amounts of memory.

To solve this problem, we can use proportional allocation, in which we allocate available memory to each process according to its size. Let the size of the virtual memory for process p_i be s_i, and define S = sum of all s_i.

Then, if the total number of available frames is m, we allocate a_i frames to process p_i, where a_i is approximately

a_i = s_i / S × m.
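The proportional split can be checked with a small sketch; truncation leaves a few frames over, which could go to a free-frame pool. The process sizes below are illustrative.

```python
def proportional_allocation(sizes, m):
    S = sum(sizes)                       # total virtual-memory demand
    return [s * m // S for s in sizes]   # a_i ~ s_i / S * m, truncated

# two processes of sizes 10 and 127 pages sharing 62 frames
alloc = proportional_allocation([10, 127], 62)   # [4, 57]
```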

Global versus Local Allocation

With multiple processes competing for frames, we can classify page-replacement algorithms into two broad categories: global replacement and local replacement. Global replacement allows a process to select a replacement frame from the set of all frames, even if that frame is currently allocated to some other process; that is, one process can take a frame from another. Local replacement requires that each process select from only its own set of allocated frames. One problem with a global replacement algorithm is that a process cannot control its own page-fault rate.

Thrashing

If the number of frames allocated to a low-priority process falls below the minimum number required by the computer architecture, we must suspend that process's execution. We should then page out its remaining pages, freeing all its allocated frames. This provision introduces a swap-in, swap-out level of intermediate CPU scheduling.

In fact, look at any process that does not have "enough" frames. If the process does not have the number of frames it needs to support pages in active use, it will quickly page-fault. At this point, it must replace some page. However, since all its pages are in active use, it must replace a page that will be needed again right away. Consequently, it quickly faults again, and again, and again, replacing pages that it must bring back in immediately.

This high paging activity is called thrashing. A process is thrashing if it is spending more time paging than executing.

Cause of Thrashing

Thrashing results in severe performance problems.

The operating system monitors CPU utilization. If CPU utilization is too low, we increase the degree of multiprogramming by introducing a new process to the system. A global page-replacement algorithm is used; it replaces pages without regard to the process to which they belong. As processes begin faulting and taking frames from one another, they queue up for the paging device, the ready queue empties, and CPU utilization falls. The CPU scheduler sees the decreasing CPU utilization and increases the degree of multiprogramming as a result. The new processes fault as well, so CPU utilization drops even further, and the CPU scheduler tries to increase the degree of multiprogramming even more. Thrashing has occurred, and system throughput plunges.


We can limit the effects of thrashing by using a local replacement algorithm (or priority replacement algorithm). With local replacement, if one process starts thrashing, it cannot steal frames from another process and cause the latter to thrash as well.

To prevent thrashing, we must provide a process with as many frames as it needs. But how do we know how many frames it "needs"? There are several techniques.

The working-set strategy (Section 9.6.2) starts by looking at how many frames a process is actually using. This approach defines the locality model of process execution.

The locality model states that, as a process executes, it moves from locality to locality. A locality is a set of pages that are actively used together (Figure 9.19). A program is generally composed of several different localities, which may overlap.

Working-Set Model

The working-set model is based on the assumption of locality. This model uses a parameter, Δ, to define the working-set window. The idea is to examine the most recent Δ page references. The set of pages in the most recent Δ page references is the working set (Figure 9.20). If a page is in active use, it will be in the working set. If it is no longer being used, it will drop from the working set Δ time units after its last reference. Thus, the working set is an approximation of the program's locality.

For example, given the sequence of memory references shown in Figure 9.20, if Δ = 10 memory references, then the working set at time t1 is {1, 2, 5, 6, 7}. By time t2, the working set has changed to {3, 4}. The accuracy of the working set depends on the selection of Δ. If Δ is too small, it will not encompass the entire locality; if Δ is too large, it may overlap several localities. In the extreme, if Δ is infinite, the working set is the set of pages touched during the process execution.

The most important property of the working set, then, is its size. If we compute the working-set size, WSS_i, for each process in the system, we can then consider D = sum of all WSS_i, where D is the total demand for frames. Each process is actively using the pages in its working set. Thus, process i needs WSS_i frames. If the total demand is greater than the total number of available frames (D > m), thrashing will occur, because some processes will not have enough frames.
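The working set at a given time can be computed with a sliding window; the reference string below is hypothetical, not the one in Figure 9.20.

```python
def working_set(refs, t, delta):
    # Pages referenced in the window of the most recent `delta`
    # references ending at time t (0-indexed).
    return set(refs[max(0, t - delta + 1): t + 1])

refs = [1, 2, 1, 5, 6, 2, 1, 7, 6, 7]   # hypothetical reference string
ws = working_set(refs, 9, 10)           # {1, 2, 5, 6, 7}
```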


Page-Fault Frequency

The working-set model is successful, and knowledge of the working set can be useful for prepaging (Section 9.9.1), but it seems a clumsy way to control thrashing.

A strategy that uses the page-fault frequency (PFF) takes a more direct approach. Thrashing has a high page-fault rate. Thus, we want to control the page-fault rate. When it is too high, we know that the process needs more frames. Conversely, if the page-fault rate is too low, then the process may have too many frames. We can establish upper and lower bounds on the desired page-fault rate (Figure 9.21). If the actual page-fault rate exceeds the upper limit, we allocate the process another frame; if the page-fault rate falls below the lower limit, we remove a frame from the process. Thus, we can directly measure and control the page-fault rate to prevent thrashing.
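The PFF policy can be sketched as a tiny controller; the bounds below (2% and 10% faults per reference) are illustrative assumptions, not values from the text.

```python
def adjust_frames(frames, fault_rate, lower=0.02, upper=0.10):
    if fault_rate > upper:        # too many faults: give the process a frame
        return frames + 1
    if fault_rate < lower and frames > 1:
        return frames - 1         # very few faults: reclaim a frame
    return frames
```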

Deadlocks

• To develop a description of deadlocks, which prevent sets of concurrent processes from completing their tasks.

• To present a number of different methods for preventing or avoiding deadlocks in a computer system.

In a multiprogramming environment, several processes may compete for a finite number of resources. A process requests resources; and if the resources are not available at that time, the process enters a waiting state. Sometimes, a waiting process is never again able to change state, because the resources it has requested are held by other waiting processes. This situation is called a deadlock.

"When two trains approach each other at a crossing, both shall come to a full stop and

neither shall start up again until the other has gone.'" System Model

A system consists of a finite number of resources to be distributed among a number of competing processes. The resources are partitioned into several types, each consisting of some number of identical instances. If a process requests an instance of a resource type, the allocation of any instance of the type will satisfy the request.


A process must request a resource before using it and must release the resource after using it. A process may request as many resources as it requires to carry out its designated task. The number of resources requested may not exceed the total number of resources available in the system.

Under the normal mode of operation, a process may utilize a resource in only the following

sequence:

1. Request- If the request cannot be granted immediately (for example, if the resource is being used by another process), then the requesting process must wait until it can acquire the resource.

2. Use- The process can operate on the resource (for example, if the resource is a printer, the process can print on the printer).

3. Release- The process releases the resource.

The request and release of resources are system calls.

A set of processes is in a deadlock state when every process in the set is waiting for an event that can be caused only by another process in the set. Deadlocks may also involve different resource types.

Deadlock Characterization

In a deadlock, processes never finish executing, and system resources are tied up, preventing other jobs from starting.

Necessary Conditions

A deadlock situation can arise if the following four conditions hold simultaneously in a system:

1. Mutual exclusion- At least one resource must be held in a non-sharable mode; that is, only one process at a time can use the resource. If another process requests that resource, the requesting process must be delayed until the resource has been released.


2. Hold and wait- A process must be holding at least one resource and waiting to acquire additional resources that are currently being held by other processes.

3. No preemption- Resources cannot be preempted; that is, a resource can be released only voluntarily by the process holding it, after that process has completed its task.

4. Circular wait- A set {P0, P1, ..., Pn} of waiting processes must exist such that P0 is waiting for a resource held by P1, P1 is waiting for a resource held by P2, ..., Pn-1 is waiting for a resource held by Pn, and Pn is waiting for a resource held by P0.

It means that all four conditions must hold for a deadlock to occur. The circular-wait condition implies the hold-and-wait condition, so the four conditions are not completely independent.

Resource-Allocation Graph

Deadlocks can be described more precisely in terms of a directed graph called a system resource-allocation graph. This graph consists of a set of vertices V and a set of edges E. The set of vertices V is partitioned into two different types of nodes: P = {P1, P2, ..., Pn}, the set consisting of all the active processes in the system, and R = {R1, R2, ..., Rm}, the set consisting of all resource types in the system.

A directed edge from process Pi to resource type Rj is denoted by Pi -> Rj; it signifies that process Pi has requested an instance of resource type Rj and is currently waiting for that resource. A directed edge from resource type Rj to process Pi is denoted by Rj -> Pi; it signifies that an instance of resource type Rj has been allocated to process Pi. A directed edge Pi -> Rj is called a request edge; a directed edge Rj -> Pi is called an assignment edge.

Pictorially, we represent each process Pi as a circle and each resource type Rj as a rectangle. Since resource type Rj may have more than one instance, we represent each such instance as a dot within the rectangle.


If the graph contains no cycles, then no process in the system is deadlocked. If the graph does contain a cycle, then a deadlock may exist. If each resource type has exactly one instance, then a cycle implies that a deadlock has occurred.

If each resource type has several instances, then a cycle does not necessarily imply that a deadlock has occurred. In this case, a cycle in the graph is a necessary but not a sufficient condition for the existence of deadlock.

If there is a cycle, then the system may or may not be in a deadlocked state.
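In the single-instance case, then, detecting deadlock reduces to detecting a cycle in the graph, which can be sketched with a depth-first search; the adjacency-list encoding of processes and resources below is an assumption of this sketch.

```python
def has_cycle(graph):
    # graph: {vertex: [successors]} over process and resource nodes.
    visiting, done = set(), set()

    def dfs(u):
        visiting.add(u)
        for v in graph.get(u, []):
            if v in visiting or (v not in done and dfs(v)):
                return True       # back edge found: a cycle exists
        visiting.discard(u)
        done.add(u)
        return False

    return any(dfs(n) for n in graph if n not in done)

# request edge P1 -> R1, assignment R1 -> P2, request P2 -> R2, assignment R2 -> P1
g = {"P1": ["R1"], "R1": ["P2"], "P2": ["R2"], "R2": ["P1"]}
```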


Methods for Handling Deadlocks

We can deal with the deadlock problem in one of three ways:

• We can use a protocol to prevent or avoid deadlocks, ensuring that the system will never enter a deadlock state.

• We can allow the system to enter a deadlock state, detect it, and recover.

• We can ignore the problem altogether and pretend that deadlocks never occur in the system.

The third solution is the one used by most operating systems.

To ensure that deadlocks never occur, the system can use either a deadlock-prevention or a deadlock-avoidance scheme.

Deadlock prevention provides a set of methods for ensuring that at least one of the necessary conditions cannot hold.

Deadlock avoidance requires that the operating system be given in advance additional information concerning which resources a process will request and use during its lifetime. With this additional knowledge, it can decide for each request whether or not the process should wait.

Deadlock Prevention

We can prevent the occurrence of a deadlock by ensuring that at least one of these conditions cannot hold.

Mutual Exclusion


The mutual-exclusion condition must hold for non-sharable resources. We cannot prevent deadlocks by denying the mutual-exclusion condition, because some resources are intrinsically non-sharable.

Hold and Wait

To ensure that the hold-and-wait condition never occurs in the system, we must guarantee that, whenever a process requests a resource, it does not hold any other resources.

One protocol that can be used requires each process to request and be allocated all its resources before it begins execution.

An alternative protocol allows a process to request resources only when it has none. A process may request some resources and use them. Before it can request any additional resources, however, it must release all the resources that it is currently allocated.

Both these protocols have two main disadvantages. First, resource utilization may be low; second, starvation is possible.
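As a rough sketch of the first protocol, the allocator below (the class name and resource set are invented for illustration) grants a process everything it asks for in one atomic step, so a process never holds some resources while waiting for others:

```python
import threading

class AllOrNothingAllocator:
    """Grants all requested resources atomically, or blocks while the
    process holds nothing, so hold-and-wait cannot arise."""

    def __init__(self, available):
        self.available = dict(available)     # resource name -> free count
        self.cond = threading.Condition()

    def acquire_all(self, request):
        """Block until every resource in `request` can be granted together."""
        with self.cond:
            while any(self.available[r] < n for r, n in request.items()):
                self.cond.wait()             # waiting, but holding nothing
            for r, n in request.items():
                self.available[r] -= n

    def release_all(self, request):
        with self.cond:
            for r, n in request.items():
                self.available[r] += n
            self.cond.notify_all()           # wake processes waiting to acquire

alloc = AllOrNothingAllocator({"tape": 2, "printer": 1})
alloc.acquire_all({"tape": 1, "printer": 1})   # granted in one step
alloc.release_all({"tape": 1, "printer": 1})
```

Requesting everything up front is exactly why utilization can be low: the printer above is tied up for the whole run even if it is needed only at the end.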

No Preemption

The third necessary condition for deadlocks is that there be no preemption of resources that have already been allocated. To ensure that this condition does not hold, we can use the following protocol. If a process is holding some resources and requests another resource that cannot be immediately allocated to it (that is, the process must wait), then all resources currently being held are preempted.

Alternatively, if a process requests some resources, we first check whether they are available. If they are, we allocate them. If they are not, we check whether they are allocated to some other process that is waiting for additional resources. If so, we preempt the desired resources from the waiting process and allocate them to the requesting process. If the resources are neither available nor held by a waiting process, the requesting process must wait.

Circular Wait

The fourth and final condition for deadlocks is the circular-wait condition. One way to ensure that this condition never holds is to impose a total ordering of all resource types and to require that each process requests resources in an increasing order of enumeration.

To illustrate, we let R = {R1, R2, ..., Rm} be the set of resource types. We assign to each resource type a unique integer number, which allows us to compare two resources and to determine whether one precedes another in our ordering. Formally, we define a one-to-one function F: R → N, where N is the set of natural numbers. For example, if the set of resource types R includes tape drives, disk drives, and printers, then the function F might be defined as follows:

F(tape drive) = 1

F(disk drive) = 5

F(printer) = 12

We can now consider the following protocol to prevent deadlocks: Each process can request resources only in an increasing order of enumeration. That is, a process can initially request any number of instances of a resource type, say Ri. After that, the process can request instances of resource type Rj if and only if F(Rj) > F(Ri). If several instances of the same resource type are needed, a single request for all of them must be issued.

If these two protocols are used, then the circular-wait condition cannot hold. We can demonstrate this fact by assuming that a circular wait exists. Let the set of processes involved in the circular wait be {P0, P1, ..., Pn}, where Pi is waiting for a resource Ri, which is held by process Pi+1. (Modulo arithmetic is used on the indexes, so that Pn is waiting for a resource Rn held by P0.) Then, since process Pi+1 is holding resource Ri while requesting resource Ri+1, we must have F(Ri) < F(Ri+1) for all i. But this condition means that F(R0) < F(R1) < ... < F(Rn) < F(R0). By transitivity, F(R0) < F(R0), which is impossible. Therefore, there can be no circular wait.
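A minimal sketch of the ordering protocol, using the F values from the example above (the helper names are invented): every process that acquires through this helper takes its locks in increasing F, so no cycle of waits can form.

```python
import threading

F = {"tape_drive": 1, "disk_drive": 5, "printer": 12}   # the text's numbering
locks = {name: threading.Lock() for name in F}

def acquire_in_order(*names):
    """Acquire the requested resources in increasing order of F."""
    for name in sorted(names, key=lambda n: F[n]):
        locks[name].acquire()

def release_all(*names):
    for name in names:
        locks[name].release()

# Both callers end up locking tape_drive before printer, whatever
# order they name the resources in, so they cannot deadlock each other.
acquire_in_order("printer", "tape_drive")
release_all("printer", "tape_drive")
```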

DEADLOCK AVOIDANCE

An alternative method for avoiding deadlocks is to require additional information about how resources are to be requested. For each request, in deciding whether it can be granted, the system considers the resources currently available, the resources currently allocated to each process, and the future requests and releases of each process.

The various algorithms that use this approach differ in the amount and type of information required. The simplest and most useful model requires that each process declare the maximum number of resources of each type that it may need.

A deadlock-avoidance algorithm dynamically examines the resource-allocation state to ensure that a circular-wait condition can never exist. The resource-allocation state is defined by the number of available and allocated resources and the maximum demands of the processes.


SAFE STATE

A state is safe if the system can allocate resources to each process (up to its maximum) in some order and still avoid a deadlock. More formally, a system is in a safe state only if there exists a safe sequence. A sequence of processes <P1, P2, ..., Pn> is a safe sequence for the current allocation state if, for each Pi, the resource requests that Pi can still make can be satisfied by the currently available resources plus the resources held by all Pj, with j < i. In this situation, if the resources that Pi needs are not immediately available, then Pi can wait until all Pj have finished. When they have finished, Pi can obtain all of its needed resources, complete its designated task, return its allocated resources, and terminate. When Pi terminates, Pi+1 can obtain its needed resources, and so on. If no such sequence exists, then the system state is said to be unsafe.

A safe state is not a deadlocked state. Conversely, a deadlocked state is an unsafe state. Not all unsafe states are deadlocks, however; an unsafe state may lead to a deadlock. As long as the state is safe, the operating system can avoid unsafe (and deadlocked) states. In an unsafe state, the operating system cannot prevent processes from requesting resources in such a way that a deadlock occurs: the behavior of the processes controls unsafe states.

We consider a system with 12 magnetic tape drives and three processes: P0, P1, and P2. Process P0 requires 10 tape drives, process P1 may need as many as 4 tape drives, and process P2 may need up to 9 tape drives.

Suppose that, at time t0, process P0 is holding 5 tape drives, process P1 is holding 2 tape drives, and process P2 is holding 2 tape drives. (Thus, there are 3 free tape drives.)


At time t0, the system is in a safe state. The sequence <P1, P0, P2> satisfies the safety condition. Process P1 can immediately be allocated all its tape drives and then return them (the system will then have 5 available tape drives); then process P0 can get all its tape drives and return them (the system will then have 10 available tape drives); and finally process P2 can get all its tape drives and return them (the system will then have all 12 tape drives available). A system can go from a safe state to an unsafe state.

Suppose that, at time t1, process P2 requests and is allocated one more tape drive. The system is no longer in a safe state. At this point, only process P1 can be allocated all its tape drives. When it returns them, the system will have only 4 available tape drives. Since process P0 is allocated 5 tape drives but has a maximum of 10, it may request 5 more tape drives. Since they are unavailable, process P0 must wait. Similarly, process P2 may request an additional 6 tape drives and have to wait, resulting in a deadlock. The idea is simply to ensure that the system will always remain in a safe state. Initially, the system is in a safe state.

Whenever a process requests a resource that is currently available, the system must decide whether the resource can be allocated immediately or whether the process must wait. The request is granted only if the allocation leaves the system in a safe state.
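The tape-drive example can be checked mechanically. The sketch below (an illustration for a single resource type, not code from the text) repeatedly lets any process finish whose remaining need fits in the free pool:

```python
def is_safe(total, alloc, max_need):
    """Return (safe?, completion order) for one resource type."""
    free = total - sum(alloc)
    need = [m - a for m, a in zip(max_need, alloc)]
    finished = [False] * len(alloc)
    order = []
    progress = True
    while progress:
        progress = False
        for i in range(len(alloc)):
            if not finished[i] and need[i] <= free:
                free += alloc[i]        # Pi runs to completion, returns drives
                finished[i] = True
                order.append(i)
                progress = True
    return all(finished), order

# 12 drives, maximums (10, 4, 9), holdings (5, 2, 2): safe via P1, P0, P2.
print(is_safe(12, [5, 2, 2], [10, 4, 9]))   # (True, [1, 0, 2])
# After P2 is granted one more drive, only P1 can finish: unsafe.
print(is_safe(12, [5, 2, 3], [10, 4, 9]))   # (False, [1])
```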

RESOURCE-ALLOCATION-GRAPH ALGORITHM

If we have a resource-allocation system with only one instance of each resource type, a variant of the resource-allocation graph defined before can be used for deadlock avoidance.

In addition to the request and assignment edges already described, we introduce a new type of edge, called a claim edge.

A claim edge Pi → Rj indicates that process Pi may request resource Rj at some time in the future. This edge resembles a request edge in direction but is represented in the graph by a dashed line. When process Pi requests resource Rj, the claim edge Pi → Rj is converted to a request edge. Similarly, when a resource Rj is released by Pi, the assignment edge Rj → Pi is reconverted to a claim edge Pi → Rj.

Suppose that process Pi requests resource Rj. The request can be granted only if converting the request edge Pi → Rj to an assignment edge Rj → Pi does not result in the formation of a cycle in the resource-allocation graph. Note that we check for safety by using a cycle-detection algorithm. An algorithm for detecting a cycle in this graph requires an order of n² operations, where n is the number of processes in the system.


If no cycle exists, then the allocation of the resource will leave the system in a safe state.
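The cycle check itself is an ordinary depth-first search over a directed graph. A sketch follows; the small graph at the bottom is a made-up illustration, not an example from the text:

```python
def has_cycle(edges, nodes):
    """DFS cycle detection over a directed graph given as (u, v) edges."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def dfs(u):
        color[u] = GREY                  # u is on the current DFS path
        for v in adj[u]:
            if color[v] == GREY:         # back edge closes a cycle
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in nodes)

nodes = ["P1", "P2", "R1", "R2"]
# R1 is assigned to P1, and P2 requests R1: no cycle yet.
safe_edges = [("R1", "P1"), ("P2", "R1")]
# Granting R2 to P2 while P1 requests R2 closes P1 -> R2 -> P2 -> R1 -> P1.
risky_edges = safe_edges + [("P1", "R2"), ("R2", "P2")]
print(has_cycle(safe_edges, nodes), has_cycle(risky_edges, nodes))   # False True
```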

BANKER'S ALGORITHM

The resource-allocation-graph algorithm is not applicable to a resource-allocation system with multiple instances of each resource type. The deadlock-avoidance algorithm that handles this case is commonly known as the banker's algorithm. The name was chosen because the algorithm could be used in a banking system to ensure that the bank never allocated its available cash in such a way that it could no longer satisfy the needs of all its customers. The banker's algorithm is applicable to such a system but is less efficient than the resource-allocation-graph scheme.

When a new process enters the system, it must declare the maximum number of instances of each resource type that it may need. This number may not exceed the total number of resources in the system. When a user requests a set of resources, the system must determine whether the allocation of these resources will leave the system in a safe state. If it will, the resources are allocated; otherwise, the process must wait until some other process releases enough resources.


We need several data structures. These data structures encode the state of the resource-allocation system. Let n be the number of processes in the system and m be the number of resource types. We need the following data structures:

• Available. A vector of length m indicates the number of available resources of each type. If Available[j] equals k, there are k instances of resource type Rj available.

• Max. An n × m matrix defines the maximum demand of each process. If Max[i][j] equals k, then process Pi may request at most k instances of resource type Rj.

• Allocation. An n × m matrix defines the number of resources of each type currently allocated to each process. If Allocation[i][j] equals k, then process Pi is currently allocated k instances of resource type Rj.

• Need. An n × m matrix indicates the remaining resource need of each process. If Need[i][j] equals k, then process Pi may need k more instances of resource type Rj to complete its task. Note that Need[i][j] equals Max[i][j] - Allocation[i][j].

These data structures vary over time in both size and value.

We can treat each row in the matrices Allocation and Need as vectors and refer to them as Allocationi and Needi. The vector Allocationi specifies the resources currently allocated to process Pi; the vector Needi specifies the additional resources that process Pi may still request to complete its task.

Safety Algorithm

We can now present the algorithm for finding out whether or not a system is in a safe state.

The algorithm can be described as follows:

1. Let Work and Finish be vectors of length m and n, respectively. Initialize Work = Available and Finish[i] = false for i = 0, 1, ..., n-1.

2. Find an i such that both


a. Finish[i] == false

b. Needi <= Work

If no such i exists, go to step 4.

3. Work = Work + Allocationi; Finish[i] = true. Go to step 2.

4. If Finish[i] == true for all i, then the system is in a safe state.

This algorithm may require an order of m × n² operations to determine whether a state is safe.

Resource-Request Algorithm

We now describe the algorithm that determines whether requests can be safely granted.

Let Requesti be the request vector for process Pi. If Requesti[j] == k, then process Pi wants k instances of resource type Rj. When a request for resources is made by process Pi, the following actions are taken:

1. If Requesti <= Needi, go to step 2. Otherwise, raise an error condition, since the process has exceeded its maximum claim.

2. If Requesti <= Available, go to step 3. Otherwise, Pi must wait, since the resources are not available.

3. Have the system pretend to have allocated the requested resources to process Pi by modifying the state as follows:

Available = Available - Requesti; Allocationi = Allocationi + Requesti; Needi = Needi - Requesti

If the resulting resource-allocation state is safe, the transaction is completed, and process Pi is allocated its resources. However, if the new state is unsafe, then Pi must wait for Requesti, and the old resource-allocation state is restored.

EXAMPLE

Finally, to illustrate the use of the banker's algorithm, consider a system with five processes P0 through P4 and three resource types A, B, and C. Resource type A has 10 instances, resource type B has 5 instances, and resource type C has 7 instances. Suppose that, at time T0, the following snapshot of the system has been taken:

The content of the matrix Need is defined to be Max - Allocation and is as follows:

We claim that the system is currently in a safe state. Indeed, the sequence <P1, P3, P4, P2, P0> satisfies the safety criteria. Suppose now that process P1 requests one additional instance of resource type A and two instances of resource type C, so Request1 = (1,0,2). To decide whether this request can be immediately granted, we first check that Request1 <= Available, that is, that (1,0,2) <= (3,3,2), which is true. We then pretend that this request has been fulfilled, and we arrive at the following new state:

We must determine whether this new system state is safe. To do so, we execute our safety algorithm and find that the sequence <P1, P3, P4, P0, P2> satisfies the safety requirement. Hence, we can immediately grant the request of process P1.

You should be able to see, however, that when the system is in this state, a request for (3,3,0) by P4 cannot be granted, since the resources are not available. Furthermore, a request for (0,2,0) by P0 cannot be granted, even though the resources are available, since the resulting state is unsafe.
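The snapshot tables themselves did not survive in this copy of the notes, so the sketch below fills in the matrices this example is conventionally stated with (assumed values) and implements the safety and resource-request algorithms described above:

```python
def is_safe(available, allocation, need):
    """Safety algorithm: return a safe sequence of process indices, or None."""
    work = list(available)
    n = len(allocation)
    finish = [False] * n
    seq = []
    while len(seq) < n:
        for i in range(n):
            if not finish[i] and all(nd <= w for nd, w in zip(need[i], work)):
                work = [w + a for w, a in zip(work, allocation[i])]  # Pi finishes
                finish[i] = True
                seq.append(i)
                break
        else:
            return None                    # no runnable candidate: unsafe
    return seq

def request_resources(i, request, available, allocation, need):
    """Resource-request algorithm: pretend to allocate, keep only if safe."""
    if any(r > nd for r, nd in zip(request, need[i])):
        raise ValueError("process exceeded its maximum claim")
    if any(r > av for r, av in zip(request, available)):
        return False                       # step 2: Pi must wait
    new_avail = [av - r for av, r in zip(available, request)]
    new_alloc = [row[:] for row in allocation]
    new_need = [row[:] for row in need]
    new_alloc[i] = [a + r for a, r in zip(new_alloc[i], request)]
    new_need[i] = [nd - r for nd, r in zip(new_need[i], request)]
    if is_safe(new_avail, new_alloc, new_need) is None:
        return False                       # unsafe: keep old state, Pi waits
    available[:], allocation[:], need[:] = new_avail, new_alloc, new_need
    return True

# Assumed snapshot: 5 processes, resources A=10, B=5, C=7.
allocation = [[0,1,0], [2,0,0], [3,0,2], [2,1,1], [0,0,2]]
maximum    = [[7,5,3], [3,2,2], [9,0,2], [2,2,2], [4,3,3]]
available  = [3, 3, 2]
need = [[m - a for m, a in zip(mr, ar)] for mr, ar in zip(maximum, allocation)]

print(is_safe(available, allocation, need))                          # [1, 2, 3, 0, 4]
print(request_resources(1, [1, 0, 2], available, allocation, need))  # True: granted
print(request_resources(0, [0, 2, 0], available, allocation, need))  # False: unsafe
```

The lowest-index-first scan finds [1, 2, 3, 0, 4]; the text's <P1, P3, P4, P2, P0> is an equally valid safe sequence, since safe sequences need not be unique.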

DEADLOCK DETECTION

If a system does not employ either a deadlock-prevention or a deadlock-avoidance algorithm, then a deadlock situation may occur. In this environment, the system must provide:

• An algorithm that examines the state of the system to determine whether a deadlock has occurred

• An algorithm to recover from the deadlock.


A detection-and-recovery scheme requires overhead that includes not only the run-time costs of maintaining the necessary information and executing the detection algorithm but also the potential losses inherent in recovering from a deadlock.

SINGLE INSTANCE OF EACH RESOURCE TYPE

If all resources have only a single instance, then we can define a deadlock-detection algorithm that uses a variant of the resource-allocation graph, called a wait-for graph. We obtain this graph from the resource-allocation graph by removing the resource nodes and collapsing the appropriate edges. More precisely, an edge from Pi to Pj in a wait-for graph implies that process Pi is waiting for process Pj to release a resource that Pi needs. An edge Pi → Pj exists in a wait-for graph if and only if the corresponding resource-allocation graph contains two edges Pi → Rq and Rq → Pj.

A deadlock exists in the system if and only if the wait-for graph contains a cycle. To detect deadlocks, the system needs to maintain the wait-for graph and periodically invoke an algorithm that searches for a cycle in the graph.

An algorithm to detect a cycle in a graph requires an order of n² operations, where n is the number of vertices in the graph.
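Constructing the wait-for graph is just edge collapsing. A small sketch with an invented two-process graph:

```python
def wait_for_graph(request_edges, assignment_edges):
    """Collapse a single-instance resource-allocation graph: Pi -> Pj
    whenever Pi requests a resource Rq currently assigned to Pj."""
    holder = {r: p for r, p in assignment_edges}       # Rq -> holding process
    return [(p, holder[r]) for p, r in request_edges if r in holder]

# P1 holds R2 and requests R1; P2 holds R1 and requests R2.
requests    = [("P1", "R1"), ("P2", "R2")]
assignments = [("R1", "P2"), ("R2", "P1")]
print(wait_for_graph(requests, assignments))   # [('P1', 'P2'), ('P2', 'P1')]
```

The two resulting edges form the cycle P1 → P2 → P1, so any standard cycle-detection algorithm run over this graph reports the deadlock.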


SEVERAL INSTANCES OF A RESOURCE TYPE

The wait-for graph scheme is not applicable to a resource-allocation system with multiple instances of each resource type.

The algorithm employs several time-varying data structures that are similar to those used in the banker's algorithm:

• Available. A vector of length m indicates the number of available resources of each type.

• Allocation. An n × m matrix defines the number of resources of each type currently allocated to each process.

• Request. An n × m matrix indicates the current request of each process. If Request[i][j] equals k, then process Pi is requesting k more instances of resource type Rj.

The detection algorithm described here simply investigates every possible allocation sequence for the processes that remain to be completed.

1. Let Work and Finish be vectors of length m and n, respectively. Initialize Work = Available. For i = 0, 1, ..., n-1, if Allocationi != 0, then Finish[i] = false; otherwise, Finish[i] = true.

2. Find an index i such that both

a. Finish[i] == false

b. Requesti <= Work

If no such i exists, go to step 4.

3. Work = Work + Allocationi; Finish[i] = true. Go to step 2.

4. If Finish[i] == false for some i, 0 <= i < n, then the system is in a deadlocked state. Moreover, if Finish[i] == false, then process Pi is deadlocked.

This algorithm requires an order of m × n² operations to detect whether the system is in a deadlocked state.

Consider a system with five processes P0 through P4 and three resource types A, B, and C. Resource type A has seven instances, resource type B has two instances, and resource type C has six instances. Suppose that, at time T0, we have the following resource-allocation state:


We claim that the system is not in a deadlocked state. Indeed, if we execute our algorithm, we will find that the sequence <P0, P2, P3, P1, P4> results in Finish[i] == true for all i.

Suppose now that process P2 makes one additional request for an instance of type C. The Request matrix is modified as follows:

We claim that the system is now deadlocked.
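The Allocation and Request tables for this example are likewise missing from this copy; the sketch below uses the values the example is conventionally stated with (assumed) and implements the detection algorithm above:

```python
def detect_deadlock(available, allocation, request):
    """Return the list of deadlocked process indices (empty if none)."""
    n = len(allocation)
    work = list(available)
    # A process that holds nothing cannot be part of a deadlock.
    finish = [all(a == 0 for a in allocation[i]) for i in range(n)]
    progress = True
    while progress:
        progress = False
        for i in range(n):
            if not finish[i] and all(r <= w for r, w in zip(request[i], work)):
                work = [w + a for w, a in zip(work, allocation[i])]
                finish[i] = True
                progress = True
    return [i for i in range(n) if not finish[i]]

# Assumed snapshot: 5 processes, resources A=7, B=2, C=6, Available = (0,0,0).
allocation = [[0,1,0], [2,0,0], [3,0,3], [2,1,1], [0,0,2]]
request    = [[0,0,0], [2,0,2], [0,0,0], [1,0,0], [0,0,2]]
available  = [0, 0, 0]

print(detect_deadlock(available, allocation, request))   # []: not deadlocked

request[2] = [0, 0, 1]           # P2 asks for one more instance of C
print(detect_deadlock(available, allocation, request))   # [1, 2, 3, 4]: deadlock
```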

DETECTION-ALGORITHM USAGE

When should we invoke the detection algorithm? The answer depends on two factors:

1. How often is a deadlock likely to occur?

2. How many processes will be affected by deadlock when it happens?

If deadlocks occur frequently, then the detection algorithm should be invoked frequently. Resources allocated to deadlocked processes will be idle until the deadlock can be broken. In addition, the number of processes involved in the deadlock cycle may grow. In the extreme, we can invoke the deadlock detection algorithm every time a request for allocation cannot be granted immediately.

Recovery From Deadlock

When a detection algorithm determines that a deadlock exists, several alternatives are available. One possibility is to inform the operator that a deadlock has occurred and to let the operator deal with the deadlock manually. Another possibility is to let the system recover from the deadlock automatically. There are two options for breaking a deadlock:

• One is simply to abort one or more processes to break the circular wait.

• The other is to preempt some resources from one or more of the deadlocked processes.

PROCESS TERMINATION

To eliminate deadlocks by aborting a process, we use one of two methods. In both methods, the system reclaims all resources allocated to the terminated processes.

Abort all deadlocked processes. This method clearly will break the deadlock cycle, but at great expense; the deadlocked processes may have computed for a long time, and the results of these partial computations must be discarded and probably will have to be recomputed later.

Abort one process at a time until the deadlock cycle is eliminated. This method incurs considerable overhead, since, after each process is aborted, a deadlock-detection algorithm must be invoked to determine whether any processes are still deadlocked. Aborting a process may not be easy. If the process was in the midst of updating a file, terminating it will leave that file in an incorrect state.

If the partial termination method is used, then we must determine which deadlocked process (or processes) should be terminated. This determination is a policy decision, similar to CPU-scheduling decisions. The question is basically an economic one; we should abort those processes whose termination will incur the minimum cost.

Many factors may affect which process is chosen, including:

• What the priority of the process is

• How long the process has computed and how much longer the process will compute before completing its designated task

• How many and what type of resources the process has used (for example, whether the resources are simple to preempt)

• How many more resources the process needs in order to complete

• How many processes will need to be terminated

• Whether the process is interactive or batch


RESOURCE PREEMPTION

To eliminate deadlocks using resource preemption, we successively preempt some resources from processes and give these resources to other processes until the deadlock cycle is broken.

If preemption is required to deal with deadlocks, then three issues need to be addressed:

• Selecting a victim. Which resources and which processes are to be preempted? As in process termination, we must determine the order of preemption to minimize cost. Cost factors may include such parameters as the number of resources a deadlocked process is holding and the amount of time the process has thus far consumed during its execution.

• Rollback. If we preempt a resource from a process, what should be done with that process? Clearly, it cannot continue with its normal execution; it is missing some needed resource. We must roll back the process to some safe state and restart it from that state. Since, in general, it is difficult to determine what a safe state is, the simplest solution is a total rollback: abort the process and then restart it. Although it is more effective to roll back the process only as far as necessary to break the deadlock, this method requires the system to keep more information about the state of all running processes.

• Starvation. How do we ensure that starvation will not occur? That is, how can we guarantee that resources will not always be preempted from the same process? In a system where victim selection is based primarily on cost factors, the same process may always be picked; a common solution is to include the number of rollbacks in the cost factor.

File System Interface: the concept of a file, access methods, directory structure, file-system mounting, file sharing, protection. File System Implementation: file-system structure, file-system implementation, directory implementation, allocation methods, free-space management.

OBJECTIVES:

• To explain the function of file systems.

• To describe the interfaces to file systems.

• To discuss file-system design tradeoffs, including access methods, file sharing, file locking, and directory structures.

• To explore file-system protection.

Introduction:

The file system is the most visible aspect of an operating system. It provides the mechanism for on-line storage of and access to both data and programs of the operating system and all the users of the computer system. The file system consists of two distinct parts: a collection of files, each storing related data, and a directory structure, which organizes and provides information about all the files in the system.

File Concept

Computers can store information on various storage media, such as magnetic disks, magnetic tapes, and optical disks. So that the computer system will be convenient to use, the operating system provides a uniform logical view of information storage. The smallest logical storage unit is the file. Files are mapped by the operating system onto physical devices. These storage devices are usually nonvolatile, so the contents are persistent through power failures and system reboots.


A file is a named collection of related information that is recorded on secondary storage. Data cannot be written to secondary storage unless they are within a file. Files represent programs (both source and object forms) and data. Data files may be numeric, alphabetic, alphanumeric, or binary. A file is a sequence of bits, bytes, lines, or records, the meaning of which is defined by the file's creator and user.

The information in a file is defined by its creator. Many different types of information may be stored in a file: source programs, object programs, executable programs, numeric data, text, payroll records, graphic images, sound recordings, and so on.

A file has a certain defined structure, which depends on its type. A text file is a sequence of characters organized into lines (and possibly pages). A source file is a sequence of subroutines and functions, each of which is further organized as declarations followed by executable statements.

File Attributes

A file is named, for the convenience of its human users, and is referred to by its name. A name is usually a string of characters, such as example.c. Some systems differentiate between uppercase and lowercase characters in names, whereas other systems do not. When a file is named, it becomes independent of the process, the user, and even the system that created it. For instance, one user might create the file example.c, and another user might edit that file by specifying its name. The file's owner might write the file to a floppy disk, send it in an e-mail, or copy it across a network, and it could still be called example.c on the destination system.

A file's attributes vary from one operating system to another but typically consist of these:

• Name. The symbolic file name is the only information kept in human-readable form.

• Identifier. This unique tag, usually a number, identifies the file within the file system; it is the non-human-readable name for the file.

• Type. This information is needed for systems that support different types of files.

• Location. This information is a pointer to a device and to the location of the file on that device.

• Size. The current size of the file (in bytes, words, or blocks) and possibly the maximum allowed size are included in this attribute.

• Protection. Access-control information determines who can do reading, writing, executing, and so on.

• Time, date, and user identification. This information may be kept for creation, last modification, and last use.

The information about all files is kept in the directory structure, which also resides on secondary storage.

File Operations

A file is an abstract data type. The operating system can provide system calls to create, write, read, reposition, delete, and truncate files.

• Creating a file. Two steps are necessary to create a file. First, space in the file system must be found for the file. Second, an entry for the new file must be made in the directory.

• Writing a file. To write a file, we make a system call specifying both the name of the file and the information to be written to the file. Given the name of the file, the system searches the directory to find the file's location. The system must keep a write pointer to the location in the file where the next write is to take place. The write pointer must be updated whenever a write occurs.

• Reading a file. To read from a file, we use a system call that specifies the name of the file and where (in memory) the next block of the file should be put. Again, the directory is searched for the associated entry, and the system needs to keep a read pointer to the location in the file where the next read is to take place. Once the read has taken place, the read pointer is updated. Because a process is usually either reading from or writing to a file, the current operation location can be kept as a per-process current-file-position pointer. Both the read and write operations use this same pointer, saving space and reducing system complexity.


• Repositioning within a file. The directory is searched for the appropriate entry, and the current-file-position pointer is repositioned to a given value. Repositioning within a file need not involve any actual I/O. This file operation is also known as a file seek.

• Deleting a file. To delete a file, we search the directory for the named file. Having found the associated directory entry, we release all file space, so that it can be reused by other files, and erase the directory entry.

• Truncating a file. The user may want to erase the contents of a file but keep its attributes. Rather than forcing the user to delete the file and then re-create it, this function allows all attributes to remain unchanged except for file length; the file is reset to length zero, and its file space is released.

These six basic operations comprise the minimal set of required file operations. Other common operations include appending new information to the end of an existing file and renaming an existing file.
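Seek, truncate, and delete map directly onto POSIX-style system calls, which Python exposes through the os module; this sketch exercises them on a throwaway file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

fd = os.open(path, os.O_CREAT | os.O_RDWR)  # create + open
os.write(fd, b"hello world")                # write at the write pointer
os.lseek(fd, 0, os.SEEK_SET)                # reposition (a file seek); no I/O
first = os.read(fd, 5)                      # read advances the same pointer
os.ftruncate(fd, 0)                         # truncate: attributes kept, length 0
os.close(fd)

size_after_truncate = os.path.getsize(path)
os.unlink(path)                             # delete: erase the directory entry
```

After the truncate the file still exists with its attributes, but its length is zero; only the unlink removes the directory entry itself.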

Most of the file operations mentioned involve searching the directory for the entry associated with the named file. To avoid this constant searching, many systems require that an open() system call be made before a file is first used actively. The operating system keeps a small table, called the open-file table, containing information about all open files. When a file operation is requested, the file is specified via an index into this table, so no searching is required. Some systems implicitly open a file when the first reference to it is made. The file is automatically closed when the job or program that opened the file terminates.

The open() operation takes a file name and searches the directory, copying the directory entry into the open-file table. The open() call can also accept access-mode information: create, read-only, read-write, append-only, and so on. This mode is checked against the file's permissions. If the requested mode is allowed, the file is opened for the process. The open() system call typically returns a pointer to the entry in the open-file table. This pointer, not the actual file name, is used in all I/O operations, avoiding any further searching and simplifying the system-call interface.
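A minimal sketch of the open-file-table idea (class and field names are hypothetical): open() copies the directory entry into the table once, checks the requested mode against the file's permissions, and returns an index so that later operations skip the directory search entirely:

```python
class OpenFileTable:
    def __init__(self, directory):
        self.directory = directory  # name -> on-disk entry (location, perms, ...)
        self.table = []             # open-file table: index -> copied entry

    def open(self, name, mode="r"):
        entry = dict(self.directory[name])      # copy the directory entry once
        if mode not in entry["allowed_modes"]:  # check mode against permissions
            raise PermissionError(name)
        self.table.append(entry)
        return len(self.table) - 1              # index used in place of the name

    def attrs(self, fd):
        return self.table[fd]                   # no directory search needed
```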

Several pieces of information are associated with an open file:

File pointer. On systems that do not include a file offset as part of the read() and write() system calls, the system must track the last read-write location as a current-file-position pointer. This pointer is unique to each process operating on the file and therefore must be kept separate from the on-disk file attributes.

File-open count. As files are closed, the operating system must reuse its open-file table entries, or it could run out of space in the table. Because multiple processes may have opened a file, the system must wait for the last file to close before removing the open-file table entry. The file-open counter tracks the number of opens and closes and reaches zero on the last close. The system can then remove the entry.

Disk location of the file. Most file operations require the system to modify data within the file. The information needed to locate the file on disk is kept in memory so that the system does not have to read it from disk for each operation.

Access rights. Each process opens a file in an access mode. This information is stored in the per-process table so the operating system can allow or deny subsequent I/O requests.

File Types

When designing a file system, we must decide whether the operating system should recognize and support file types.

A common technique for implementing file types is to include the type as part of the file name. The name is split into two parts, a name and an extension, usually separated by a period character. File name examples include resume.doc, Server.java, and ReaderThread.c. Only a file with a .com, .exe, or .bat extension can be executed, for instance. The .com and .exe files are two forms of binary executable files, whereas a .bat file is a batch file containing, in ASCII format, commands to the operating system.
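The extension convention can be demonstrated with os.path.splitext, which splits a name at its final period (the file_kind helper and the executable-extension set are illustrative assumptions drawn from the example above):

```python
import os.path

EXECUTABLE_EXTENSIONS = {".com", ".exe", ".bat"}  # per the example above

def file_kind(name):
    # Split "server.exe" into ("server", ".exe"); classify by extension.
    _, ext = os.path.splitext(name)
    return "executable" if ext.lower() in EXECUTABLE_EXTENSIONS else "data"
```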

COMMON FILE TYPES

File Structure

File types also can be used to indicate the internal structure of the file. The operating system requires that an executable file have a specific structure so that it can determine where in memory to load the file and what the location of the first instruction is. One of the disadvantages of having the operating system support multiple file structures is that the resulting size of the operating system is cumbersome. If the operating system defines five different file structures, it needs to contain the code to support all of these file structures.

Internal File Structure

Internally, locating an offset within a file can be complicated for the operating system. Disk systems typically have a well-defined block size determined by the size of a sector. All disk I/O is performed in units of one block (physical record), and all blocks are the same size.


ACCESS METHODS

Files store information. When it is used, this information must be accessed and read into computer memory. The information in the file can be accessed in several ways.

Sequential Access

The simplest access method is sequential access. Information in the file is processed in order, one record after the other. This is the most common mode; for example, editors and compilers usually access files in this fashion. Reads and writes make up the bulk of the operations on a file. A read operation (read next) reads the next portion of the file and automatically advances a file pointer, which tracks the I/O location. Similarly, the write operation (write next) appends to the end of the file and advances to the end of the newly written material (the new end of file).

Direct Access

A file is made up of fixed-length logical records that allow programs to read and write records rapidly in no particular order. The direct-access method is based on a disk model of a file, since disks allow random access to any file block. For direct access, the file is viewed as a numbered sequence of blocks or records. Direct-access files are of great use for immediate access to large amounts of information. Databases are often of this type. For the direct-access method, the file operations must be modified to include the block number as a parameter. Thus, we have read n, where n is the block number, rather than read next, and write n rather than write next. The block number provided by the user to the operating system is normally a relative block number. A relative block number is an index relative to the beginning of the file. Thus, the first relative block of the file is 0, the next is 1, and so on.
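Direct access reduces to arithmetic on relative block numbers: block n begins at byte offset n times the block size. A sketch using an in-memory buffer in place of a disk (BLOCK_SIZE here is an artificially small demo value; real disk blocks are 512 bytes or more):

```python
import io

BLOCK_SIZE = 4  # tiny blocks for the demo

def read_block(f, n):
    f.seek(n * BLOCK_SIZE)        # relative block number -> byte offset
    return f.read(BLOCK_SIZE)     # "read n": one whole block

def write_block(f, n, data):
    assert len(data) == BLOCK_SIZE
    f.seek(n * BLOCK_SIZE)
    f.write(data)                 # "write n": rewrite block n in place
```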


Other Access Methods

Other access methods can be built on top of a direct-access method. These methods generally involve the construction of an index for the file. The index, like an index in the back of a book, contains pointers to the various blocks. To find a record in the file, we first search the index and then use the pointer to access the file directly and to find the desired record.
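The index-then-direct-access pattern can be sketched as follows (helper names and the key/block layout are hypothetical):

```python
def build_index(records_by_block):
    # records_by_block: block number -> record key stored in that block.
    # The index inverts this: key -> block number, like a book's back index.
    return {key: block for block, key in records_by_block.items()}

def find_record(index, blocks, key):
    block_no = index[key]   # first search the index...
    return blocks[block_no] # ...then one direct access to the target block
```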

DIRECTORY STRUCTURE

In reality, systems may have zero or more file systems, and the file systems may be of varying types. The file systems of computers, then, can be extensive. Some systems store millions of files on terabytes of disk. To manage all these data, we need to organize them. This organization involves the use of directories.

Storage Structure

A disk (or any storage device that is large enough) can be used in its entirety for a file system. Sometimes, though, it is desirable to place multiple file systems on a disk or to use parts of a disk for a file system and other parts for other things, such as swap space or unformatted (raw) disk space. These parts are known variously as partitions, slices, or (in the IBM world) minidisks. The parts can also be combined to form larger structures known as volumes, and file systems can be created on these as well. Each volume can be thought of as a virtual disk. Volumes can also store multiple operating systems, allowing a system to boot and run more than one. Each volume that contains a file system must also contain information about the files in the system. This information is kept in entries in a device directory or volume table of contents.

Directory Overview

The directory can be viewed as a symbol table that translates file names into their directory entries. The operations that are to be performed on a directory:

Search for a file. We need to be able to search a directory structure to find the entry for a particular file. Since files have symbolic names and similar names may indicate a relationship between files, we may want to be able to find all files whose names match a particular pattern.

Create a file. New files need to be created and added to the directory.

Delete a file. When a file is no longer needed, we want to be able to remove it from the directory.


List a directory. We need to be able to list the files in a directory and the contents of the directory entry for each file in the list.

Rename a file. Because the name of a file represents its contents to its users, we must be able to change the name when the contents or use of the file changes. Renaming a file may also allow its position within the directory structure to be changed.

Traverse the file system. We may wish to access every directory and every file within a directory structure. If a file is no longer in use, the file can be copied to tape and the disk space of that file released for reuse by another file.

Single-Level Directory

The simplest directory structure is the single-level directory. All files are contained in the same directory, which is easy to support and understand. A single-level directory has significant limitations, however, when the number of files increases or when the system has more than one user. Since all files are in the same directory, they must have unique names. If two users call their data file test, then the unique-name rule is violated.

Even a single user on a single-level directory may find it difficult to remember the names of all the files as the number of files increases. It is not uncommon for a user to have hundreds of files on one computer system and an equal number of additional files on another system. Keeping track of so many files is a daunting task.

Two-Level Directory

A single-level directory often leads to confusion of file names among different users. The standard solution is to create a separate directory for each user. In the two-level directory structure, each user has his own user file directory (UFD). The UFDs have similar structures, but each lists only the files of a single user. When a user job starts or a user logs in, the system's master file directory (MFD) is searched. The MFD is indexed by user name or account number, and each entry points to the UFD for that user. When a user refers to a particular file, only his own UFD is searched. The user directories themselves must be created and deleted as necessary.

A special system program is run with the appropriate user name and account information. The program creates a new UFD and adds an entry for it to the MFD. This structure still has disadvantages: it effectively isolates one user from another. Isolation is an advantage when the users are completely independent but is a disadvantage when the users want to cooperate on some task and to access one another's files.
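A two-level directory can be modeled as a dictionary of dictionaries (names here are hypothetical); note that both users may own a file called test without conflict, because a lookup searches only the requesting user's UFD:

```python
# MFD maps user names to UFDs; each UFD maps file names to file data.
mfd = {
    "alice": {"test": "alice's data"},  # alice's UFD
    "bob":   {"test": "bob's data"},    # bob's UFD
}

def lookup(user, filename):
    ufd = mfd[user]        # MFD is indexed by user name
    return ufd[filename]   # only this user's UFD is searched
```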


Tree-Structured Directories

The natural generalization is to extend the directory structure to a tree of arbitrary height. This generalization allows users to create their own subdirectories and to organize their files accordingly. A tree is the most common directory structure. The tree has a root directory, and every file in the system has a unique path name. A directory (or subdirectory) contains a set of files or subdirectories. A directory is simply another file, but it is treated in a special way. All directories have the same internal format.

Acyclic-Graph Directories

Consider two programmers who are working on a joint project. The files associated with that project can be stored in a subdirectory, separating them from other projects and files of the two programmers. But since both programmers are equally responsible for the project, both want the subdirectory to be in their own directories. The common subdirectory should be shared. A shared directory or file will exist in the file system in two (or more) places at once.

A tree structure prohibits the sharing of files or directories. An acyclic graph (that is, a graph with no cycles) allows directories to share subdirectories and files. It is important to note that a shared file (or directory) is not the same as two copies of the file. With two copies, each programmer can view the copy rather than the original, but if one programmer changes the file, the changes will not appear in the other's copy. With a shared file, only one actual file exists, so any changes made by one person are immediately visible to the other.

General Graph Directory

A serious problem with using an acyclic-graph structure is ensuring that there are no cycles. If we start with a two-level directory and allow users to create subdirectories, a tree-structured directory results. It should be fairly easy to see that simply adding new files and subdirectories to an existing tree-structured directory preserves the tree-structured nature. However, when we add links to an existing tree-structured directory, the tree structure is destroyed. The primary advantage of an acyclic graph is the relative simplicity of the algorithms to traverse the graph and to determine when there are no more references to a file. We want to avoid traversing shared sections of an acyclic graph twice, mainly for performance reasons.

A similar problem exists when we are trying to determine when a file can be deleted. With acyclic-graph directory structures, a value of 0 in the reference count means that there are no more references to the file or directory, and the file can be deleted.
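The reference-count rule can be sketched as follows (class and method names are hypothetical): each link to a shared file increments the count, and the file's space is freed only when the count drops to zero.

```python
class SharedFile:
    def __init__(self):
        self.refcount = 0
        self.deleted = False

    def link(self):
        self.refcount += 1       # a new directory entry references this file

    def unlink(self):
        self.refcount -= 1       # one directory entry removed
        if self.refcount == 0:
            self.deleted = True  # no more references: space can be reused
```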


File-System Mounting

Just as a file must be opened before it is used, a file system must be mounted before it can be available to processes on the system. The directory structure can be built out of multiple volumes, which must be mounted to make them available within the file-system name space.

The mount procedure is straightforward. The operating system is given the name of the device and the mount point, the location within the file structure where the file system is to be attached. Typically, a mount point is an empty directory. For instance, on a UNIX system, a file system containing a user's home directories might be mounted as /home. Next, the operating system verifies that the device contains a valid file system. It does so by asking the device driver to read the device directory and verifying that the directory has the expected format. Finally, the operating system notes in its directory structure that a file system is mounted at the specified mount point.

(In the accompanying figure, the triangles represent subtrees of directories that are of interest.) Before the mount, only the files on the existing file system can be accessed.


File Sharing

File sharing is very desirable for users who want to collaborate and to reduce the effort required to achieve a computing goal. Therefore, user-oriented operating systems must accommodate the need to share files in spite of the inherent difficulties.

When an operating system accommodates multiple users, the issues of file sharing, file naming, and file protection become preeminent. Given a directory structure that allows files to be shared by users, the system must mediate the file sharing. The system can either allow a user to access the files of other users by default or require that a user specifically grant access to the files. These are the issues of access control and protection. To implement sharing and protection, the system must maintain more file and directory attributes than are needed on a single-user system. Although many approaches have been taken to this requirement, most systems have evolved to use the concepts of file (or directory) owner (or user) and group. The owner is the user who can change attributes and grant access and who has the most control over the file. The group attribute defines a subset of users who can share access to the file. For example, the owner of a file on a UNIX system can issue all operations on a file, while members of the file's group can execute one subset of those operations, and all other users can execute another subset of operations. The owner and group IDs of a given file (or directory) are stored with the other file attributes. When a user requests an operation on a file, the user ID can be compared with the owner attribute to determine if the requesting user is the owner of the file. Many systems have multiple local file systems, including volumes of a single disk or multiple volumes on multiple attached disks.

Remote File Systems

With the advent of networks, communication among remote computers became possible. Networking allows the sharing of resources spread across a campus or even around the world. One obvious resource to share is data in the form of files. Through the evolution of network and file technology, remote file-sharing methods have changed. The first implemented method involves manually transferring files between machines via programs like ftp. The second major method uses a distributed file system (DFS) in which remote directories are visible from a local machine. In some ways, the third method, the World Wide Web, is a reversion to the first. ftp is used for both anonymous and authenticated access. Anonymous access allows a user to transfer files without having an account on the remote system. The World Wide Web uses anonymous file exchange almost exclusively.

The Client-Server Model

Remote file systems allow a computer to mount one or more file systems from one or more remote machines. In this case, the machine containing the files is the server, and the machine seeking access to the files is the client. The client-server relationship is common with networked machines. Generally, the server declares that a resource is available to clients and specifies exactly which resource (in this case, which files) and exactly which clients. A server can serve multiple clients, and a client can use multiple servers, depending on the implementation details of a given client-server facility. The server usually specifies the available files on a volume or directory level. Client identification is more difficult. A client can be specified by a network name or other identifier, such as an IP address, but these can be spoofed, or imitated.

Distributed Information Systems

To make client-server systems easier to manage, distributed information systems, also known as distributed naming services, provide unified access to the information needed for remote computing. The domain name system (DNS) provides host-name-to-network-address translations for the entire Internet (including the World Wide Web). Before DNS became widespread, files containing the same information were sent via e-mail or ftp between all networked hosts. This methodology was not scalable. Other distributed information systems provide user name/password/user ID/group ID space for a distributed facility. UNIX systems have employed a wide variety of distributed-information methods. Sun Microsystems introduced yellow pages (since renamed network information service, or NIS), and most of the industry adopted its use. It centralizes storage of user names, host names, printer information, and the like.

Failure Modes

Local file systems can fail for a variety of reasons, including failure of the disk containing the file system, corruption of the directory structure or other disk-management information (collectively called metadata), disk-controller failure, cable failure, and host-adapter failure. Remote file systems have even more failure modes. Because of the complexity of network systems and the required interactions between remote machines, many more problems can interfere with the proper operation of remote file systems. In the case of networks, the network can be interrupted between two hosts. Such interruptions can result from hardware failure, poor hardware configuration, or networking implementation issues. Although some networks have built-in resiliency, including multiple paths between hosts, many do not. Any single failure can thus interrupt the flow of DFS commands.

Protection

When information is stored in a computer system, we want to keep it safe from physical damage (reliability) and improper access (protection). Reliability is generally provided by duplicate copies of files. Many computers have systems programs that automatically (or through computer-operator intervention) copy disk files to tape at regular intervals (once per day or week or month) to maintain a copy should a file system be accidentally destroyed.

File systems can be damaged by hardware problems (such as errors in reading or writing), power surges or failures, head crashes, dirt, temperature extremes, and vandalism. Files may be deleted accidentally. Bugs in the file-system software can also cause file contents to be lost.

Protection can be provided in many ways. For a small single-user system, we might provide protection by physically removing the floppy disks and locking them in a desk drawer or file cabinet. In a multiuser system, however, other mechanisms are needed.

Types of Access

The need to protect files is a direct result of the ability to access files. Systems that do not permit access to the files of other users do not need protection. Thus, we could provide complete protection by prohibiting access. Alternatively, we could provide free access with no protection. Both approaches are too extreme for general use. What is needed is controlled access.

Protection mechanisms provide controlled access by limiting the types of file access that can be made. Access is permitted or denied depending on several factors, one of which is the type of access requested. Several different types of operations may be controlled:

• Read. Read from the file.
• Write. Write or rewrite the file.
• Execute. Load the file into memory and execute it.
• Append. Write new information at the end of the file.
• Delete. Delete the file and free its space for possible reuse.
• List. List the name and attributes of the file.
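Controlled access can be sketched as a check of the requested operation against a per-file set of permitted operations (the file name, table, and helper here are hypothetical):

```python
# Per-file permitted operations, drawn from the types listed above.
PERMITTED = {"report.txt": {"read", "append", "list"}}

def check_access(name, op):
    # Permit or deny based on the type of access requested.
    if op not in PERMITTED.get(name, set()):
        raise PermissionError(f"{op} on {name} denied")
    return True
```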


Access Control

The most common approach to the protection problem is to make access dependent on the identity of the user. Different users may need different types of access to a file or directory. The most general scheme to implement identity-dependent access is to associate with each file and directory an access-control list (ACL) specifying user names and the types of access allowed for each user.

When a user requests access to a particular file, the operating system checks the access list associated with that file. If that user is listed for the requested access, the access is allowed. Otherwise, a protection violation occurs, and the user job is denied access to the file. This approach has the advantage of enabling complex access methodologies. The main problem with access lists is their length. This technique has two undesirable consequences:

• Constructing such a list may be a tedious and unrewarding task, especially if we do not know in advance the list of users in the system.
• The directory entry, previously of fixed size, now needs to be of variable size, resulting in more complicated space management.

These problems can be resolved by use of a condensed version of the access list. To condense the length of the access-control list, many systems recognize three classifications of users in connection with each file:

o Owner. The user who created the file is the owner.
o Group. A set of users who are sharing the file and need similar access is a group, or work group.
o Universe. All other users in the system constitute the universe.

The most common recent approach is to combine access-control lists with the more general (and easier to implement) owner, group, and universe access-control scheme.
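The owner/group/universe scheme can be sketched as follows (helper names are hypothetical): the user's classification selects which permission set applies, much like UNIX rwxr-x--- modes.

```python
def classify(user, owner, group_members):
    # Decide which of the three classes the requesting user falls into.
    if user == owner:
        return "owner"
    if user in group_members:
        return "group"
    return "universe"

def allowed(user, op, owner, group_members, perms):
    # perms: class name -> set of allowed operations for that class.
    return op in perms[classify(user, owner, group_members)]
```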

Other Protection Approaches

Another approach to the protection problem is to associate a password with each file. Just as access to the computer system is often controlled by a password, access to each file can be controlled in the same way. If the passwords are chosen randomly and changed often, this scheme may be effective in limiting access to a file. The use of passwords has a few disadvantages, however. First, the number of passwords that a user needs to remember may become large, making the scheme impractical. Second, if only one password is used for all the files, then once it is discovered, all files are accessible.


FILE SYSTEM IMPLEMENTATION

OBJECTIVES

• To describe the details of implementing local file systems and directory structures.
• To describe the implementation of remote file systems.
• To discuss block allocation and free-block algorithms and trade-offs.

The file system provides the mechanism for on-line storage and access to file contents, including data and programs. The file system resides permanently on secondary storage, which is designed to hold a large amount of data permanently.

Here we discuss issues surrounding file storage and access on the most common secondary-storage medium, the disk.

File-System Structure

Disks provide the bulk of secondary storage on which a file system is maintained. They have two characteristics that make them a convenient medium for storing multiple files:

1. A disk can be rewritten in place; it is possible to read a block from the disk, modify the block, and write it back into the same place.
2. A disk can access directly any given block of information it contains. Thus, it is simple to access any file either sequentially or randomly, and switching from one file to another requires only moving the read-write heads and waiting for the disk to rotate.

To improve I/O efficiency, I/O transfers between memory and disk are performed in units of blocks rather than a byte at a time. Each block has one or more sectors.

To provide efficient and convenient access to the disk, the operating system imposes one or more file systems to allow the data to be stored, located, and retrieved easily. A file system poses two quite different design problems. The first problem is defining how the file system should look to the user. This task involves defining a file and its attributes, the operations allowed on a file, and the directory structure for organizing files. The second problem is creating algorithms and data structures to map the logical file system onto the physical secondary-storage devices.

The file system itself is generally composed of many different levels. Each level in the design uses the features of lower levels to create new features for use by higher levels.

The lowest level, the I/O control, consists of device drivers and interrupt handlers to transfer information between the main memory and the disk system. A device driver can be thought of as a translator. Its input consists of high-level commands such as "retrieve block 123." Its output consists of low-level, hardware-specific instructions that are used by the hardware controller, which interfaces the I/O device to the rest of the system. The device driver usually writes specific bit patterns to special locations in the I/O controller's memory to tell the controller which device location to act on and what actions to take. The basic file system needs only to issue generic commands to the appropriate device driver to read and write physical blocks on the disk.


The file-organization module knows about files and their logical blocks, as well as physical blocks. By knowing the type of file allocation used and the location of the file, the file-organization module can translate logical block addresses to physical block addresses for the basic file system to transfer. Each file's logical blocks are numbered from 0 (or 1) through N. Since the physical blocks containing the data usually do not match the logical numbers, a translation is needed to locate each block. The file-organization module also includes the free-space manager.

The logical file system manages metadata information. Metadata includes all of the file-system structure except the actual data (or contents of the files). The logical file system manages the directory structure to provide the file-organization module with the information the latter needs, given a symbolic file name. It maintains file structure via file-control blocks. A file-control block (FCB) contains information about the file, including ownership, permissions, and location of the file contents. The logical file system is also responsible for protection and security. Many file systems are in use today. Most operating systems support more than one.
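The translation the file-organization module performs can be sketched with a per-file allocation map (the file name and block numbers here are hypothetical): logical block i of a file is simply the i-th entry in its list of physical blocks.

```python
# name -> ordered list of physical blocks; entry i holds logical block i.
allocation_map = {"a.txt": [17, 4, 92]}

def logical_to_physical(name, logical_block):
    # Logical blocks are numbered from 0; physical numbers need not match.
    return allocation_map[name][logical_block]
```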

File-System Implementation

Operating systems implement open() and close() system calls for processes to request access to file contents. Several on-disk and in-memory structures are used to implement a file system. These structures vary depending on the operating system and the file system, but some general principles apply.

On disk, the file system may contain information about how to boot an operating system stored there, the total number of blocks, the number and location of free blocks, the directory structure, and individual files. Many of these structures are detailed throughout the remainder of this chapter; here we describe them briefly:

• A boot control block (per volume) can contain information needed by the system to boot an operating system from that volume. If the disk does not contain an operating system, this block can be empty. It is typically the first block of a volume. In UFS, it is called the boot block; in NTFS, it is the partition boot sector.
• A volume control block (per volume) contains volume (or partition) details, such as the number of blocks in the partition, size of the blocks, free-block count and free-block pointers, and free FCB count and FCB pointers. In UFS, this is called a superblock; in NTFS, it is stored in the master file table.

• A directory structure per file system is used to organize the files. In UFS, this includes file names and associated inode numbers. In NTFS, it is stored in the master file table.

• A per-file FCB contains many details about the file, including file permissions, ownership, size, and location of the data blocks. In UFS, this is called the inode. In NTFS, this information is actually stored within the master file table, which uses a relational database structure, with a row per file.

The in-memory information is used for both file-system management and performance improvement via caching. The data are loaded at mount time and discarded at dismount. The structures may include the ones described below:

• An in-memory mount table contains information about each mounted volume.

• An in-memory directory-structure cache holds the directory information of recently accessed directories. (For directories at which volumes are mounted, it can contain a pointer to the volume table.)

• The system-wide open-file table contains a copy of the FCB of each open file, as well as other information.

• The per-process open-file table contains a pointer to the appropriate entry in the system-wide open-file table, as well as other information.

To create a new file, an application program calls the logical file system. The logical file system knows the format of the directory structures. To create a new file, it allocates a new FCB. The system then reads the appropriate directory into memory, updates it with the new file name and FCB, and writes it back to the disk.

Partitions and Mounting

The layout of a disk can have many variations, depending on the operating system. A disk can be sliced into multiple partitions, or a volume can span multiple partitions on multiple disks. Each partition can be either "raw," containing no file system, or "cooked," containing a file system. Raw disk is used where no file system is appropriate; UNIX swap space, for example, can use a raw partition. Boot information can be stored in a separate partition. Again, it has its own format, because at boot time the system does not have file-system device drivers loaded and therefore cannot interpret the file-system format.

The disk can have multiple partitions, each containing a different type of file system and a different operating system. The root partition, which contains the operating-system kernel and sometimes other system files, is mounted at boot time.

Virtual File Systems

The previous section makes it clear that modern operating systems must concurrently support multiple types of file systems. But how does an operating system allow multiple types of file systems to be integrated into a directory structure? And how can users seamlessly move between file-system types as they navigate the file-system space? An obvious but suboptimal method of implementing multiple types of file systems is to write directory and file routines for each type.

Directory Implementation

The selection of directory-allocation and directory-management algorithms significantly affects the efficiency, performance, and reliability of the file system.

Linear List

The simplest method of implementing a directory is to use a linear list of file names with pointers to the data blocks. This method is simple to program but time-consuming to execute.

To create a new file, we must first search the directory to be sure that no existing file has the same name. Then, we add a new entry at the end of the directory.

To delete a file, we search the directory for the named file, then release the space allocated to it. To reuse the directory entry, we can do one of several things. We can mark the entry as unused (by assigning it a special name, such as an all-blank name, or with a used-unused bit in each entry), or we can attach it to a list of free directory entries. A third alternative is to copy the last entry in the directory into the freed location and to decrease the length of the directory. A linked list can also be used to decrease the time required to delete a file. The real disadvantage of a linear list of directory entries is that finding a file requires a linear search.
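The linear-list operations described above can be sketched as follows; the class and method names are illustrative. Deletion uses the third reuse alternative, copying the last entry into the freed slot:

```python
# Minimal sketch of a linear-list directory: each entry pairs a file name
# with (for simplicity) the first data block of the file.
class LinearDirectory:
    def __init__(self):
        self.entries = []                    # list of (name, first_block) tuples

    def create(self, name, first_block):
        # Linear search to reject duplicate names, then append at the end.
        if any(n == name for n, _ in self.entries):
            raise FileExistsError(name)
        self.entries.append((name, first_block))

    def lookup(self, name):
        for n, block in self.entries:        # O(n) linear search
            if n == name:
                return block
        raise FileNotFoundError(name)

    def delete(self, name):
        # Copy the last entry into the freed location and shrink the list.
        for i, (n, _) in enumerate(self.entries):
            if n == name:
                self.entries[i] = self.entries[-1]
                self.entries.pop()
                return
        raise FileNotFoundError(name)
```

Every operation begins with the same O(n) scan, which is exactly the weakness the hash-table variant below addresses.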

Hash Table

Another data structure used for a file directory is a hash table. With this method, a linear list stores the directory entries, but a hash data structure is also used. The hash table takes a value computed from the file name and returns a pointer to the file name in the linear list. Therefore, it can greatly decrease the directory search time. Insertion and deletion are also fairly straightforward, although some provision must be made for collisions—situations in which two file names hash to the same location. The major difficulties with a hash table are its generally fixed size and the dependence of the hash function on that size.
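A sketch of the idea, with a fixed-size table and chaining to resolve collisions (the names are illustrative; a real implementation would hash into the on-disk linear list rather than hold entries in memory):

```python
# Fixed-size hash table over directory entries; each slot is a chain of
# (name, fcb) pairs so that colliding names can coexist.
class HashDirectory:
    def __init__(self, size=64):
        self.size = size                     # the fixed table size the text warns about
        self.slots = [[] for _ in range(size)]

    def _slot(self, name):
        return hash(name) % self.size        # hash function depends on the size

    def insert(self, name, fcb):
        self.slots[self._slot(name)].append((name, fcb))

    def lookup(self, name):
        # Only the (short) collision chain is scanned, not the whole directory.
        for n, fcb in self.slots[self._slot(name)]:
            if n == name:
                return fcb
        raise FileNotFoundError(name)
```

Growing the table requires rehashing every entry, which is why the fixed size is listed as the major difficulty.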

Allocation Methods

The direct-access nature of disks allows us flexibility in the implementation of files. In almost every case, many files are stored on the same disk. The main problem is how to allocate space to these files so that disk space is utilized effectively and files can be accessed quickly. Three major methods of allocating disk space are in wide use: contiguous, linked, and indexed.

Contiguous Allocation

Contiguous allocation requires that each file occupy a set of contiguous blocks on the disk. Disk addresses define a linear ordering on the disk. With this ordering, assuming that only one job is accessing the disk, accessing block b + 1 after block b normally requires no head movement. When head movement is needed (from the last sector of one cylinder to the first sector of the next cylinder), the head need only move from one track to the next. Thus, the number of disk seeks required for accessing contiguously allocated files is minimal.

Contiguous allocation of a file is defined by the disk address and length (in block units) of the first block. If the file is n blocks long and starts at location b, then it occupies blocks b, b + 1, b + 2, ..., b + n - 1. The directory entry for each file indicates the address of the starting block and the length of the area allocated for this file.

Accessing a file that has been allocated contiguously is easy. For sequential access, the file system remembers the disk address of the last block referenced and, when necessary, reads the next block. For direct access to block i of a file that starts at block b, we can immediately access block b + i. Thus, both sequential and direct access can be supported by contiguous allocation.
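The address arithmetic above is a one-line computation, which is precisely why direct access is cheap under contiguous allocation:

```python
# Contiguous allocation: a file starting at block b of length n occupies
# blocks b .. b + n - 1, so block i of the file is simply b + i.
def contiguous_block(start, length, i):
    if not 0 <= i < length:
        raise IndexError("block index outside file")
    return start + i
```

No pointer chasing or index lookup is needed; the physical address follows directly from the starting block.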

Contiguous allocation has some problems, however. One difficulty is finding space for a new file. The system chosen to manage free space determines how this task is accomplished. The contiguous-allocation problem can be seen as a particular application of the general dynamic storage-allocation problem, which involves how to satisfy a request of size n from a list of free holes. First fit and best fit are the most common strategies used to select a free hole from the set of available holes. Both, however, suffer from external fragmentation: as files are allocated and deleted, the free disk space is broken into small pieces. One strategy for preventing this loss of space is to copy the entire file system onto another disk or tape and then copy it back, allocating the free space contiguously. This scheme effectively compacts all free space into one contiguous space, solving the fragmentation problem. The cost of this compaction is time. The time cost is particularly severe for large hard disks that use contiguous allocation, where compacting all the space may take hours and may be necessary on a weekly basis.

Some systems require that this function be done off-line, with the file system unmounted. During this down time, normal system operation generally cannot be permitted, so such compaction is avoided at all costs on production machines. Most modern systems that need defragmentation can perform it on-line during normal system operations, but the performance penalty can be substantial.

Another problem with contiguous allocation is determining how much space is needed for a file.

Linked Allocation

Linked allocation solves all problems of contiguous allocation. With linked allocation, each file is a linked list of disk blocks; the disk blocks may be scattered anywhere on the disk. The directory contains a pointer to the first and last blocks of the file. For example, a file of five blocks might start at block 9 and continue at block 16, then block 1, then block 10, and finally block 25. Each block contains a pointer to the next block. These pointers are not made available to the user. Thus, if each block is 512 bytes in size, and a disk address (the pointer) requires 4 bytes, then the user sees blocks of 508 bytes.

To create a new file, we simply create a new entry in the directory. With linked allocation, each directory entry has a pointer to the first disk block of the file. This pointer is initialized to nil (the end-of-list pointer value) to signify an empty file. The size field is also set to 0.

Linked allocation does have disadvantages, however. The major problem is that it can be used effectively only for sequential-access files. To find the ith block of a file, we must start at the beginning of that file and follow the pointers until we get to the ith block. Each access to a pointer requires a disk read, and some require a disk seek. Consequently, it is inefficient to support a direct-access capability for linked-allocation files. Another disadvantage is the space required for the pointers.
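The cost of direct access under linked allocation can be sketched with the five-block example file from the text (blocks 9 → 16 → 1 → 10 → 25); the dictionary stands in for the on-disk next-block pointers:

```python
# Linked allocation sketch: next_block maps each block to its successor
# (None marks end of file). Reaching block i means following i pointers,
# i.e. up to i disk reads -- why direct access is inefficient here.
def nth_block(first_block, next_block, i):
    block = first_block
    for _ in range(i):
        block = next_block[block]   # each step would cost one disk read
    return block

# The example file: start at block 9, then 16, 1, 10, and finally 25.
chain = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}
```

Compare this with `contiguous_block` above: the same request that was one addition now costs a read per hop.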

Indexed Allocation

Linked allocation solves the external-fragmentation and size-declaration problems of contiguous allocation. However, in the absence of a FAT, linked allocation cannot support efficient direct access, since the pointers to the blocks are scattered with the blocks themselves all over the disk and must be retrieved in order. Indexed allocation solves this problem by bringing all the pointers together into one location: the index block.

Each file has its own index block, which is an array of disk-block addresses. The ith entry in the index block points to the ith block of the file. The directory contains the address of the index block. To find and read the ith block, we use the pointer in the ith index-block entry. This scheme is similar to the paging scheme.

When the file is created, all pointers in the index block are set to nil. When the ith block is first written, a block is obtained from the free-space manager, and its address is put in the ith index-block entry. Indexed allocation supports direct access, without suffering from external fragmentation, because any free block on the disk can satisfy a request for more space. Indexed allocation does suffer from wasted space, however. The pointer overhead of the index block is generally greater than the pointer overhead of linked allocation.
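A sketch of that life cycle, with illustrative names and a plain list standing in for the free-space manager:

```python
# Indexed allocation sketch: the index block is an array of disk-block
# addresses, all nil (None) at creation. First write to block i grabs a
# free block; any later read of block i is a single table lookup.
class IndexedFile:
    def __init__(self, max_blocks=16):
        self.index = [None] * max_blocks     # all pointers nil at creation

    def write_block(self, i, free_list):
        if self.index[i] is None:            # first write to this block
            self.index[i] = free_list.pop()  # any free block will do
        return self.index[i]

    def read_block(self, i):
        if self.index[i] is None:
            raise ValueError("block never written")
        return self.index[i]                 # direct access: one lookup
```

Because any free block satisfies the request, there is no external fragmentation; the wasted space is the index block itself, allocated even for tiny files.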

Mechanisms for handling large files with indexed allocation include the following:

• Linked scheme. An index block is normally one disk block. Thus, it can be read and written directly by itself. To allow for large files, we can link together several index blocks. For example, an index block might contain a small header giving the name of the file and a set of the first 100 disk-block addresses. The next address (the last word in the index block) is nil (for a small file) or is a pointer to another index block (for a large file).

• Multilevel index. A variant of the linked representation is to use a first-level index block to point to a set of second-level index blocks, which in turn point to the file blocks. To access a block, the operating system uses the first-level index to find a second-level index block and then uses that block to find the desired data block. This approach could be continued to a third or fourth level, depending on the desired maximum file size. With 4,096-byte blocks, we could store 1,024 4-byte pointers in an index block. Two levels of indexes allow 1,048,576 data blocks and a file size of up to 4 GB.

• Combined scheme. Another alternative, used in UFS, is to keep the first, say, 15 pointers of the index block in the file's inode. The first 12 of these pointers point to direct blocks; that is, they contain addresses of blocks that contain data of the file. Thus, the data for small files (of no more than 12 blocks) do not need a separate index block. If the block size is 4 KB, then up to 48 KB of data can be accessed directly. The next three pointers point to indirect blocks. The first points to a single indirect block, which is an index block containing not data but the addresses of blocks that do contain data. The second points to a double indirect block, which contains the address of a block that contains the addresses of blocks that contain pointers to the actual data blocks. The last pointer contains the address of a triple indirect block.

Under this method, the number of blocks that can be allocated to a file exceeds the amount of space addressable by the 4-byte file pointers used by many operating systems. A 32-bit file pointer reaches only 2^32 bytes, or 4 GB. Many UNIX implementations, including Solaris and IBM's AIX, now support up to 64-bit file pointers. Pointers of this size allow files and file systems to be terabytes in size.
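The sizes quoted above follow from simple arithmetic, which can be checked directly:

```python
# Verifying the multilevel-index and combined-scheme figures in the text:
# 4,096-byte blocks and 4-byte pointers give 1,024 pointers per index block.
block_size = 4096
ptr_size = 4
ptrs_per_block = block_size // ptr_size      # 1,024 pointers per index block

# Two levels of indexes: each first-level pointer reaches a full second-level
# index block, so 1,024 * 1,024 data blocks are addressable.
two_level_blocks = ptrs_per_block ** 2       # 1,048,576 data blocks
max_file_size = two_level_blocks * block_size  # 4 GB maximum file size

# Combined (UFS) scheme: 12 direct pointers at 4 KB per block.
direct_bytes = 12 * 4096                     # 48 KB accessed with no index block
```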

Performance

The allocation methods that we have discussed vary in their storage efficiency and data-block access times. Both are important criteria in selecting the proper method or methods for an operating system to implement. Before selecting an allocation method, we need to determine how the systems will be used. A system with mostly sequential access should not use the same method as a system with mostly random access.

For any type of access, contiguous allocation requires only one access to get a disk block. Since we can easily keep the initial address of the file in memory, we can calculate immediately the disk address of the ith block (or the next block) and read it directly. For linked allocation, we can also keep the address of the next block in memory and read it directly. This method is fine for sequential access; for direct access, however, an access to the ith block might require i disk reads. This problem indicates why linked allocation should not be used for an application requiring direct access. Indexed allocation is more complex. If the index block is already in memory, then the access can be made directly. Some systems combine contiguous allocation with indexed allocation by using contiguous allocation for small files (up to three or four blocks) and automatically switching to an indexed allocation if the file grows large.

Free-Space Management

Since disk space is limited, we need to reuse the space from deleted files for new files, if possible. (Write-once optical disks allow only one write to any given sector, and thus such reuse is not physically possible.) To keep track of free disk space, the system maintains a free-space list. The free-space list records all free disk blocks—those not allocated to some file or directory. To create a file, we search the free-space list for the required amount of space and allocate that space to the new file. This space is then removed from the free-space list. When a file is deleted, its disk space is added to the free-space list. The free-space list, despite its name, might not be implemented as a list.

Bit Vector

Frequently, the free-space list is implemented as a bit map or bit vector. Each block is represented by 1 bit. If the block is free, the bit is 1; if the block is allocated, the bit is 0.

For example, consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, and 27 are free and the rest of the blocks are allocated. The free-space bit map would be 001111001111110001100000011100000 ... The main advantage of this approach is its relative simplicity and its efficiency in finding the first free block or n consecutive free blocks on the disk. Indeed, many computers supply bit-manipulation instructions that can be used effectively for that purpose. For example, the Intel family starting with the 80386 and the Motorola family starting with the 68020 (processors that have powered PCs and Macintosh systems, respectively) have instructions that return the offset in a word of the first bit with the value 1. One technique for finding the first free block on a system that uses a bit vector to allocate disk space is to sequentially check each word in the bit map to see whether that value is not 0, since a 0-valued word has all 0 bits and represents a set of allocated blocks. The first non-0 word is scanned for the first 1 bit, which is the location of the first free block. The calculation of the block number is (number of bits per word) × (number of 0-value words) + offset of first 1 bit.
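The search and the block-number formula can be sketched as follows, assuming 32-bit words with block 0 stored in the word's highest-order bit:

```python
# First-free-block search over a bit vector, following the formula above:
# (bits per word) * (number of 0-valued words) + offset of first 1 bit.
WORD_BITS = 32

def first_free(words):
    for idx, w in enumerate(words):
        if w != 0:                           # skip fully-allocated words
            # Offset of the first (highest-order) 1 bit within this word,
            # counting bit 0 as the word's most significant bit.
            offset = WORD_BITS - w.bit_length()
            return WORD_BITS * idx + offset
    return None                              # no free block on the disk
```

On the example map above, the first word is 00111100111111000110000001110000 in binary, so the first free block is block 2.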

Again, we see hardware features driving software functionality. Unfortunately, bit vectors are inefficient unless the entire vector is kept in main memory (and is written to disk occasionally for recovery needs). Keeping it in main memory is possible for smaller disks but not necessarily for larger ones. A 1.3-GB disk with 512-byte blocks would need a bit map of over 332 KB to track its free blocks, although clustering the blocks in groups of four reduces this number to over 83 KB per disk. A 40-GB disk with 1-KB blocks requires over 5 MB to store its bit map.

Linked List

Another approach to free-space management is to link together all the free disk blocks, keeping a pointer to the first free block in a special location on the disk and caching it in memory. This first block contains a pointer to the next free disk block, and so on. In our earlier example, we would keep a pointer to block 2 as the first free block. Block 2 would contain a pointer to block 3, which would point to block 4, which would point to block 5, which would point to block 8, and so on. However, this scheme is not efficient; to traverse the list, we must read each block, which requires

substantial I/O time. Fortunately, traversing the free list is not a frequent action. Usually, the operating system simply needs a free block so that it can allocate that block to a file, so the first block in the free list is used. The FAT method incorporates free-block accounting into the allocation data structure. No separate method is needed.

Grouping

A modification of the free-list approach is to store the addresses of n free blocks in the first free block. The first n - 1 of these blocks are actually free. The last block contains the addresses of another n free blocks, and so on. The addresses of a large number of free blocks can now be found quickly, unlike the situation when the standard linked-list approach is used.

Counting

Another approach is to take advantage of the fact that, generally, several contiguous blocks may be allocated or freed simultaneously, particularly when space is allocated with the contiguous-allocation algorithm or through clustering. Thus, rather than keeping a list of n free disk addresses, we can keep the address of the first free block and the number n of free contiguous blocks that follow the first block. Each entry in the free-space list then consists of a disk address and a count. Although each entry requires more space than would a simple disk address, the overall list will be shorter, as long as the count is generally greater than 1.
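Applied to the earlier example disk, the counting scheme compresses fifteen free-block addresses into four (address, count) runs:

```python
# Counting sketch: the free-space list is kept as (first_block, count) runs.
# These four entries describe the example disk used in the Bit Vector section.
free_runs = [(2, 4), (8, 6), (17, 2), (25, 3)]

def free_blocks(runs):
    # Expanding the runs recovers the individual free block numbers.
    blocks = []
    for start, count in runs:
        blocks.extend(range(start, start + count))
    return blocks
```

Four entries instead of fifteen addresses illustrates why the list shortens whenever counts exceed 1.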

MASS-STORAGE STRUCTURE
Overview of mass-storage structure, disk structure, disk attachment, disk scheduling, swap-space management.

CHAPTER OBJECTIVES

• Describe the physical structure of secondary and tertiary storage devices and the resulting effects on the uses of the devices.

• Explain the performance characteristics of mass-storage devices.

Overview of Mass-Storage Structure

Magnetic Disks

Magnetic disks provide the bulk of secondary storage for modern computer systems. Conceptually, disks are relatively simple (Figure 12.1). Each disk platter has a flat circular shape, like a CD. Common platter diameters range from 1.8 to 5.25 inches. The two surfaces of a platter are covered with a magnetic material. We store information by recording it magnetically on the platters.

A read-write head "flies" just above each surface of every platter. The heads are attached to a disk arm that moves all the heads as a unit. The surface of a platter is logically divided into circular tracks, which are subdivided into sectors. The set of tracks that are at one arm position makes up a cylinder.

There may be thousands of concentric cylinders in a disk drive, and each track may contain hundreds of sectors. The storage capacity of common disk drives is measured in gigabytes.

When the disk is in use, a drive motor spins it at high speed. Most drives rotate 60 to 200 times per second. Disk speed has two parts. The transfer rate is the rate at which data flow between the drive and the computer. The positioning time, sometimes called the random-access time, consists of the time to move the disk arm to the desired cylinder, called the seek time, and the time for the desired sector to rotate to the disk head, called the rotational latency.

Sometimes the head will damage the magnetic surface. This accident is called a head crash.

A head crash normally cannot be repaired; the entire disk must be replaced. A disk can be removable, allowing different disks to be mounted as needed. Removable magnetic disks generally consist of one platter, held in a plastic case to prevent damage while not in the disk drive. Floppy disks are inexpensive removable magnetic disks that have a soft plastic case containing a flexible platter. The head of a floppy-disk drive generally sits directly on the disk surface, so the drive is designed to rotate more slowly than a hard-disk drive.

A disk drive is attached to a computer by a set of wires called an I/O bus. Several kinds of buses are available, including enhanced integrated drive electronics (EIDE), advanced technology attachment (ATA), serial ATA (SATA), universal serial bus (USB), fiber channel (FC), and SCSI buses. The data transfers on a bus are carried out by special electronic processors called controllers. The host controller is the controller at the computer end of the bus. A disk controller is built into each disk drive.

Magnetic Tapes

Magnetic tape was used as an early secondary-storage medium. Although it is relatively permanent and can hold large quantities of data, its access time is slow compared with that of main memory and magnetic disk.

Random access to magnetic tape is about a thousand times slower than random access to magnetic disk, so tapes are not very useful for secondary storage. Tapes are used mainly for backup, for storage of infrequently used information, and as a medium for transferring information from one system to another. A tape is kept in a spool and is wound or

rewound past a read-write head. Moving to the correct spot on a tape can take minutes, but once positioned, tape drives can write data at speeds comparable to disk drives.

Disk Structure

Modern disk drives are addressed as large one-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer. The size of a logical block is usually 512 bytes, although some disks can be low-level formatted to have a different logical block size, such as 1,024 bytes. The one-dimensional array of logical blocks is mapped onto the sectors of the disk sequentially. Sector 0 is the first sector of the first track on the outermost cylinder. The mapping proceeds in order through that track, then through the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost.

By using this mapping, we can—at least in theory—convert a logical block number into an old-style disk address that consists of a cylinder number, a track number within that cylinder, and a sector number within that track. In practice, it is difficult to perform this translation, for two reasons. First, most disks have some defective sectors, but the mapping hides this by substituting spare sectors from elsewhere on the disk. Second, the number of sectors per track is not a constant on some drives. Let's look more closely at the second reason. On media that use constant linear velocity (CLV), the density of bits per track is uniform. The farther a track is from the center of the disk, the greater its length, so the more sectors it can hold. As we move from outer zones to inner zones, the number of sectors per track decreases. Tracks in the outermost zone typically hold 40 percent more sectors than do tracks in the innermost zone. The drive increases its rotation speed as the head moves from the outer to the inner tracks to keep the same rate of data moving under the head. This method is used in CD-ROM and DVD-ROM drives. Alternatively, the disk rotation speed can stay constant, and the density of bits decreases from inner tracks to outer tracks to keep the data rate constant.
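Under the idealized assumptions the text mentions (a constant number of sectors per track and no defective sectors), the in-theory translation is plain integer arithmetic:

```python
# Idealized logical-block-to-CHS translation. Real drives break these
# assumptions (spared sectors, zoned recording), as the text explains.
def lba_to_chs(lba, tracks_per_cyl, sectors_per_track):
    sectors_per_cyl = tracks_per_cyl * sectors_per_track
    cylinder, rem = divmod(lba, sectors_per_cyl)   # outermost cylinder first
    track, sector = divmod(rem, sectors_per_track) # then track, then sector
    return cylinder, track, sector
```

For a toy disk with 2 tracks per cylinder and 100 sectors per track, logical block 205 lands at cylinder 1, track 0, sector 5.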

This method is used in hard disks and is known as constant angular velocity (CAV). The number of sectors per track has been increasing as disk technology improves, and the outer zone of a disk usually has several hundred sectors per track. Similarly, the number of cylinders per disk has been increasing; large disks have tens of thousands of cylinders.

Disk Attachment

Computers access disk storage in two ways. One way is via I/O ports (or host-attached storage); this is common on small systems. The other way is via a remote host in a distributed file system; this is referred to as network-attached storage.

Host-Attached Storage

Host-attached storage is storage accessed through local I/O ports. These ports use several technologies. The typical desktop PC uses an I/O bus architecture called IDE or ATA. This architecture supports a maximum of two drives per I/O bus. A newer, similar protocol that has simplified cabling is SATA. High-end workstations and servers generally use more sophisticated I/O architectures, such as SCSI and fiber channel (FC).

SCSI is a bus architecture. Its physical medium is usually a ribbon cable having a large number of conductors (typically 50 or 68). The SCSI protocol supports a maximum of 16 devices on the bus. Generally, the devices include one controller card in the host (the SCSI initiator) and up to 15 storage devices (the SCSI targets). A SCSI disk is a common SCSI target, but the protocol provides the ability to address up to 8 logical units in each SCSI target. A typical use of logical unit addressing is to direct commands to components of a RAID array or components of a removable media library (such as a CD jukebox sending commands to the media-changer mechanism or to one of the drives).

FC is a high-speed serial architecture that can operate over optical fiber or over a four-conductor copper cable. It has two variants. One is a large switched fabric having a 24-bit address space. This variant is expected to dominate in the future and is the basis of storage-area networks (SANs). Because of the large address space and the switched nature of the communication, multiple hosts and storage devices can attach to the fabric, allowing great flexibility in I/O communication. The other FC variant is an arbitrated loop (FC-AL) that can address 126 devices (drives and controllers). A wide variety of storage devices are suitable for use as host-attached storage. Among these are hard disk drives, RAID arrays, and CD, DVD, and tape drives. The I/O commands that initiate data transfers to a host-attached storage device are reads and writes of logical data blocks directed to specifically identified storage units (such as bus ID, SCSI ID, and target logical unit).

Network-Attached Storage

A network-attached storage (NAS) device is a special-purpose storage system that is accessed remotely over a data network. Clients access network-attached storage via a remote-procedure-call interface such as NFS for UNIX systems or CIFS for Windows machines. The remote procedure calls (RPCs) are carried via TCP or UDP over an IP network—usually the same local-area network (LAN) that carries all data traffic to the clients. The network-attached storage unit is usually implemented as a RAID array with software that implements the RPC interface. It is easiest to think of NAS as simply another storage-access protocol. For example, rather than using a SCSI device driver and SCSI protocols to access storage, a system using NAS would use RPC over TCP/IP.

Network-attached storage provides a convenient way for all the computers on a LAN to share a pool of storage with the same ease of naming and access enjoyed with local host-attached storage. However, it tends to be less efficient and have lower performance than some direct-attached storage options. iSCSI is the latest network-attached storage protocol. In essence, it uses the IP network protocol to carry the SCSI protocol. Thus, networks rather than SCSI cables can be used as the interconnects between hosts and their storage. As a result, hosts can treat their storage as if it were directly attached, but the storage can be distant from the host.

Storage-Area Network

One drawback of network-attached storage systems is that the storage I/O operations consume bandwidth on the data network, thereby increasing the latency of network communication. This problem can be particularly acute in large client-server installations—the communication between servers and clients competes for bandwidth with the communication among servers and storage devices.

A storage-area network (SAN) is a private network (using storage protocols rather than networking protocols) connecting servers and storage units. The power of a SAN lies in its flexibility. Multiple hosts and multiple storage arrays can attach to the same SAN, and storage can be dynamically allocated to hosts. A SAN switch allows or prohibits access between the hosts and the storage. As one example, if a host is running low on disk space, the SAN can be configured to allocate

Prepared by Dr.MVB

Page 181: Systems by mvb.pdf · OPERATING SYSTEMS Computer System and Operating System Overview: Overview of computer operating Systems, operating systems functions, protection and security,

OPERATING SYSTEMS more storage to that host. SANs make it possible for clusters of servers to share the same storage and for storage arrays to include multiple direct host connections. SANs typically have more ports, and less expensive ports, than storage arrays. FC is the most common. SAN interconnect.

An emerging alternative is a special-purpose bus architecture named InfiniBand, which provides hardware and software support for high-speed interconnection networks for servers and storage units.

Disk Scheduling

One of the responsibilities of the operating system is to use the hardware efficiently. For the disk drives, meeting this responsibility entails having fast access time and large disk bandwidth. The access time has two major components. The seek time is the time for the disk arm to move the heads to the cylinder containing the desired sector. The rotational latency is the additional time for the disk to rotate the desired sector to the disk head. The disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer. We can improve both the access time and the bandwidth by scheduling the servicing of disk I/O requests in a good order.
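The two components of access time, and the bandwidth definition, can be made concrete with a small back-of-the-envelope calculation (the drive parameters below are illustrative assumptions, not figures from the text):

```python
# Illustrative arithmetic with assumed drive parameters:
# access time = seek time + rotational latency;
# bandwidth = bytes transferred / total service time.
seek_ms = 5.0                                # assumed average seek time
rpm = 7200
rotational_latency_ms = (60_000 / rpm) / 2   # half a revolution on average
access_ms = seek_ms + rotational_latency_ms

bytes_transferred = 4 * 1024 * 1024          # assume 4 MB of requests serviced
total_time_s = 0.25                          # first request to last completion
bandwidth = bytes_transferred / total_time_s # bytes per second
print(round(access_ms, 2), round(bandwidth))
```

Good scheduling cannot change the drive's physical parameters, but it shrinks the seek component of each access, which raises the achieved bandwidth.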

Whenever a process needs I/O to or from the disk, it issues a system call to the operating system. The request specifies several pieces of information:

• Whether this operation is input or output
• What the disk address for the transfer is
• What the memory address for the transfer is
• What the number of sectors to be transferred is

If the desired disk drive and controller are available, the request can be serviced immediately. If the drive or controller is busy, any new requests for service will be placed in the queue of pending requests for that drive. For a multiprogramming system with many processes, the disk queue may often have several pending requests. Thus, when one request is completed, the operating system chooses which pending request to service next. How does the operating system make this choice? Any one of several disk-scheduling algorithms can be used, and we discuss them next.
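The pieces of information listed above can be sketched as a simple request record held in a per-drive queue (the field names are illustrative, not taken from any particular kernel):

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class DiskRequest:
    is_write: bool        # whether the operation is input or output
    disk_address: int     # disk address (here, a cylinder number) for the transfer
    memory_address: int   # memory address for the transfer
    sector_count: int     # number of sectors to be transferred

# Queue of pending requests for one busy drive.
pending = deque()
pending.append(DiskRequest(False, 98, 0x1000, 8))
pending.append(DiskRequest(True, 183, 0x2000, 4))
print(len(pending))  # 2 requests waiting for the drive
```

Each scheduling algorithm below is simply a different policy for choosing which entry of such a queue to service next.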

FCFS Scheduling

The simplest form of disk scheduling is, of course, the first-come, first-served (FCFS) algorithm. This algorithm is intrinsically fair, but it generally does not provide the fastest service. Consider, for example, a disk queue with requests for I/O to blocks on cylinders 98, 183, 37, 122, 14, 124, 65, 67, in that order. If the disk head is initially at cylinder 53, it will first move from 53 to 98, then to 183, 37, 122, 14, 124, 65, and finally to 67, for a total head movement of 640 cylinders. The wild swing from 122 to 14 and then back to 124 illustrates the problem with this schedule. If the requests for cylinders 37 and 14 could be serviced together, before or after the requests at 122 and 124, the total head movement could be decreased substantially, and performance would thereby be improved.
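A short simulation confirms the 640-cylinder figure for this queue:

```python
# Total head movement for FCFS servicing of the example queue.
# Queue order and initial head position are taken from the text.
def fcfs_head_movement(queue, head):
    total = 0
    for cylinder in queue:          # service strictly in arrival order
        total += abs(cylinder - head)
        head = cylinder
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_head_movement(queue, 53))  # 640 cylinders, as in the text
```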


SSTF Scheduling

It seems reasonable to service all the requests close to the current head position before moving the head far away to service other requests. This assumption is the basis for the shortest-seek-time-first (SSTF) algorithm. The SSTF algorithm selects the request with the minimum seek time from the current head position. Since seek time increases with the number of cylinders traversed by the head, SSTF chooses the pending request closest to the current head position.

For our example request queue, the closest request to the initial head position (53) is at cylinder 65. Once we are at cylinder 65, the next closest request is at cylinder 67. From there, the request at cylinder 37 is closer than the one at 98, so 37 is served next. Continuing, we service the request at cylinder 14, then 98, 122, 124, and finally 183.

This scheduling method results in a total head movement of only 236 cylinders—little more than one-third of the distance needed for FCFS scheduling of this request queue. This algorithm gives a substantial improvement in performance.

SSTF scheduling is essentially a form of shortest-job-first (SJF) scheduling; and like SJF scheduling, it may cause starvation of some requests. Remember that requests may arrive at any time. Suppose that we have two requests in the queue, for cylinders 14 and 186, and while servicing the request from 14, a new request near 14 arrives. This new request will be serviced next, making the request at 186 wait. While this request is being serviced, another request close to 14 could arrive. In theory, a continual stream of requests near one another could arrive, causing the request for cylinder 186 to wait indefinitely.
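The SSTF selection rule is easy to simulate; for the example queue it reproduces the 236-cylinder total and the service order 65, 67, 37, 14, 98, 122, 124, 183 given in the text:

```python
# SSTF: repeatedly pick the pending request closest to the current head.
def sstf_head_movement(queue, head):
    pending = list(queue)
    total = 0
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total

print(sstf_head_movement([98, 183, 37, 122, 14, 124, 65, 67], 53))  # 236
```

Note that the greedy choice is recomputed after every service; a new arrival near the head can keep jumping the queue, which is exactly the starvation scenario described above.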

SCAN Scheduling

In the SCAN algorithm, the disk arm starts at one end of the disk and moves toward the other end, servicing requests as it reaches each cylinder, until it gets to the other end of the disk. At the other end, the direction of head movement is reversed, and servicing continues. The head continuously scans back and forth across the disk. The SCAN algorithm is sometimes called the elevator algorithm, since the disk arm behaves just like an elevator in a building, first servicing all the requests going up and then reversing to service requests the other way.

Before applying SCAN to schedule the requests on cylinders 98, 183, 37, 122, 14, 124, 65, and 67, we need to know the direction of head movement in addition to the head's current position (53). If the disk arm is moving toward 0, the head will service 37 and then 14. At cylinder 0, the arm will reverse and will move toward the other end of the disk, servicing the requests at 65, 67, 98, 122, 124, and 183 (Figure 12.6). If a request arrives in the queue just in front of the head, it will be serviced almost immediately; a request arriving just behind the head will have to wait until the arm moves to the end of the disk, reverses direction, and comes back.

Assuming a uniform distribution of requests for cylinders, consider the density of requests when the head reaches one end and reverses direction. At this point, relatively few requests are immediately in front of the head, since these cylinders have recently been serviced. The heaviest density of requests is at the other end of the disk. These requests have also waited the longest, so why not go there first? That is the idea of the next algorithm.
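For the same queue, a sketch of SCAN with the arm initially moving toward cylinder 0 yields 236 cylinders of head movement (53 down to 0, then back up to 183):

```python
# SCAN: service requests in the current direction until the edge of the
# disk, then reverse. Here the arm starts at `head` moving toward 0.
def scan_head_movement(queue, head):
    down = sorted((c for c in queue if c <= head), reverse=True)
    up = sorted(c for c in queue if c > head)
    total = 0
    pos = head
    for c in down:          # sweep toward lower cylinders
        total += pos - c
        pos = c
    if down:                # continue all the way to cylinder 0 before reversing
        total += pos
        pos = 0
    for c in up:            # reversed sweep toward higher cylinders
        total += c - pos
        pos = c
    return total

print(scan_head_movement([98, 183, 37, 122, 14, 124, 65, 67], 53))  # 236
```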

C-SCAN Scheduling Circular SCAN (C-SCAN) scheduling is a variant of SCAN designed to provide a more

uniform wait time. Like SCAN, C-SCAN moves the head from one end of the disk to the other, servicing requests along the way. When the head reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip (Figure 12.7). The C-SCAN scheduling algorithm essentially treats the cylinders as a circular list that wraps around from the final cylinder to the first one.
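A sketch of C-SCAN for the same queue, assuming a 200-cylinder disk (0–199) and counting the return sweep as head movement:

```python
# C-SCAN: sweep from the head toward the high end of the disk, jump back
# to cylinder 0 without servicing anything, then continue upward.
def cscan_head_movement(queue, head, max_cylinder=199):
    up = sorted(c for c in queue if c >= head)
    wrapped = sorted(c for c in queue if c < head)   # serviced after the jump
    total = 0
    pos = head
    for c in up:
        total += c - pos
        pos = c
    if wrapped:
        # travel to the far edge, then the full-width return to cylinder 0
        total += (max_cylinder - pos) + max_cylinder
        pos = 0
        for c in wrapped:
            total += c - pos
            pos = c
    return total

print(cscan_head_movement([98, 183, 37, 122, 14, 124, 65, 67], 53))
```

Whether the unproductive return sweep is counted in the total varies by presentation; the sketch above counts it, giving 382 cylinders for this queue.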

LOOK Scheduling

Both SCAN and C-SCAN move the disk arm across the full width of the disk. In practice, neither algorithm is often implemented this way. More commonly, the arm goes only as far as the final request in each direction. Then, it reverses direction immediately, without going all the way to the end of the disk. Versions of SCAN and C-SCAN that follow this pattern are called LOOK and C-LOOK scheduling, because they look for a request before continuing to move in a given direction.
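A LOOK sketch differs from SCAN only in stopping at the last request in each direction; for the example queue, the arm reverses at cylinder 14 instead of cylinder 0, saving 28 cylinders of movement:

```python
# LOOK: like SCAN, but the arm reverses at the last pending request in
# each direction instead of traveling to the edge of the disk.
def look_head_movement(queue, head):
    down = sorted((c for c in queue if c <= head), reverse=True)
    up = sorted(c for c in queue if c > head)
    total = 0
    pos = head
    for c in down:          # sweep toward lower cylinders, stop at 14
        total += pos - c
        pos = c
    for c in up:            # reverse immediately; no trip to cylinder 0
        total += c - pos
        pos = c
    return total

print(look_head_movement([98, 183, 37, 122, 14, 124, 65, 67], 53))  # 208
```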


Selection of a Disk-Scheduling Algorithm

Given so many disk-scheduling algorithms, how do we choose the best one? SSTF is common and has a natural appeal because it increases performance over FCFS. SCAN and C-SCAN perform better for systems that place a heavy load on the disk, because they are less likely to cause a starvation problem.

Either SSTF or LOOK is a reasonable choice for the default algorithm.

Swap-Space Management


Swapping is moving entire processes between disk and main memory. Very few modern operating systems implement swapping in this fashion. Rather, systems now combine swapping with virtual memory techniques and swap pages, not necessarily entire processes. In fact, some systems now use the terms swapping and paging interchangeably, reflecting the merging of these two concepts.

Swap-space management is another low-level task of the operating system. Virtual memory uses disk space as an extension of main memory. Since disk access is much slower than memory access, using swap space significantly decreases system performance.

The main goal for the design and implementation of swap space is to provide the best throughput for the virtual memory system.

Swap-Space Use

Swap space is used in various ways by different operating systems, depending on the memory-management algorithms in use. For instance, systems that implement swapping may use swap space to hold an entire process image, including the code and data segments.

Paging systems may simply store pages that have been pushed out of main memory. The amount of swap space needed on a system can therefore vary depending on the amount of physical memory, the amount of virtual memory it is backing, and the way in which the virtual memory is used.

It can range from a few megabytes of disk space to gigabytes. Some operating systems—including Linux—allow the use of multiple swap spaces. These swap spaces are usually put on separate disks so the load placed on the I/O system by paging and swapping can be spread over the system's I/O devices.

Swap-Space Location

A swap space can reside in one of two places: it can be carved out of the normal file system, or it can be in a separate disk partition. If the swap space is simply a large file within the file system, normal file-system routines can be used to create it, name it, and allocate its space. This approach is easy to implement, but inefficient.

External fragmentation can greatly increase swapping times by forcing multiple seeks during reading or writing of a process image. We can improve performance by caching the block location information in physical memory and by using special tools to allocate physically contiguous blocks for the swap file, but the cost of traversing the file-system data structures still remains.

Alternatively, swap space can be created in a separate raw partition, as no file system or directory structure is placed in this space. Rather, a separate swap-space storage manager is used to allocate and deallocate the blocks from the raw partition. This manager uses algorithms optimized for speed rather than for storage efficiency, because swap space is accessed much more frequently than file systems (when it is used).

Internal fragmentation may increase, but this trade-off is acceptable because the life of data in the swap space generally is much shorter than that of files in the file system. Swap space is reinitialized at boot time, so any fragmentation is short-lived. This approach creates a fixed amount of swap space during disk partitioning. Adding more swap space requires repartitioning the disk (which involves moving the other file-system partitions or destroying them and restoring them from backup) or adding another swap space elsewhere.

Some operating systems are flexible and can swap both in raw partitions and in file-system space. Linux is an example.

Swap-Space Management: An Example

We can illustrate how swap space is used by following the evolution of swapping and paging in various UNIX systems. The traditional UNIX kernel started with an implementation of swapping that copied entire processes between contiguous disk regions and memory. UNIX later evolved to a combination of swapping and paging as paging hardware became available.


In Solaris 1 (SunOS), the designers changed standard UNIX methods to improve efficiency and reflect technological changes. When a process executes, text-segment pages containing code are brought in from the file system, accessed in main memory, and thrown away if selected for pageout. It is more efficient to reread a page from the file system than to write it to swap space and then reread it from there. Swap space is only used as a backing store for pages of anonymous memory, which includes memory allocated for the stack, heap, and uninitialized data of a process.

Linux is similar to Solaris in that swap space is only used for anonymous memory or for regions of memory shared by several processes. Linux allows one or more swap areas to be established. A swap area may be in either a swap file on a regular file system or a raw swap partition. Each swap area consists of a series of 4-KB page slots, which are used to hold swapped pages. Associated with each swap area is a swap map—an array of integer counters, each corresponding to a page slot in the swap area. If the value of a counter is 0, the corresponding page slot is available. Values greater than 0 indicate that the page slot is occupied by a swapped page. The value of the counter indicates the number of mappings to the swapped page; for example, a value of 3 indicates that the swapped page is mapped to three different processes (which can occur if the swapped page is storing a region of memory shared by three processes).
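The swap-map counters described above can be modeled in a few lines (a minimal sketch; the slot count and helper names are invented for illustration):

```python
# A Linux-style swap map: one counter per 4-KB page slot.
# 0 means the slot is free; n > 0 means the slot holds a swapped page
# currently mapped by n processes.
SLOTS = 8
swap_map = [0] * SLOTS

def swap_out(slot):
    swap_map[slot] = 1        # page written to the slot; one mapping

def share(slot):
    swap_map[slot] += 1       # another process maps the same swapped page

def free_slot():
    return swap_map.index(0)  # first available page slot

swap_out(2)
share(2)
share(2)                      # three processes now share the page in slot 2
print(swap_map[2], free_slot())
```

A counter value of 3 for slot 2 matches the shared-region example in the text: the slot cannot be reclaimed until all three mappings are released.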
