Theoretical Concept of Unix Operating System

Unit-4

BASIC FEATURES OF UNIX OPERATING SYSTEM

It is written in a high-level language, C, which makes it easy to port to different configurations.

It is a good operating system, especially for programmers. The UNIX programming environment is unusually rich and productive. It provides features that allow complex programs to be built from simpler programs.

It uses a hierarchical file system that allows easy maintenance and efficient implementation.

It uses a consistent format for files, the byte stream, making application programs easier to write.

It is a multi-user, multiprocess system. Each user can execute several processes simultaneously.

It hides the machine architecture from the user, making it easier to write programs that run on different hardware implementations.

FILE STRUCTURE

A file in UNIX is a sequence of bytes. Different programs expect various levels of structure, but the kernel does not impose any structure on files, and no meaning is attached to their contents: the meaning of the bytes depends solely on the programs that interpret the file. This is true not just of disk files but of peripheral devices as well. Magnetic tapes, mail messages, characters typed on the keyboard, line printer output, data flowing in pipes: each of these is just a sequence of bytes as far as the system and the programs in it are concerned.

Files are organized in tree-structured directories. Directories are themselves files that contain information on how to find other files. A path name to a file is a text string that identifies a file by specifying a path through the directory structure to the file. Syntactically it consists of individual file name elements separated by the slash character. For example, in /usr/Akshay/data, the first slash indicates the root of the directory tree, called the root directory. The next element, usr, is a subdirectory of the root, Akshay is a subdirectory of usr, and data is a file or a directory in the directory Akshay.

ACME/Gulshan Soni

Figure 2 shows a typical UNIX file system. The file system is organised as a tree with a single root node called root (written "/"); every non-leaf node of the file system structure is a directory of files, and files at the leaf nodes of the tree are either directories, regular files or special device files. /dev contains device files, such as /dev/console, /dev/lp0, /dev/mt0 and so on; /bin contains the binaries of essential UNIX system programs.

Figure 2 : UNIX file system

Creat, open, read, write, close, unlink and trunc are system calls which are used for basic file manipulation. The creat system call, given a path name, creates an empty file (or truncates an existing one). An existing file is opened by the open system call, which takes a path name and a mode (such as read, write or read-write) and returns a small integer called a file descriptor, which may then be passed to a read or write system call (along with a buffer address and a number of bytes to transfer) to perform data transfer to or from the file.

A file descriptor is an index into a small table of open files for this process. Descriptors start at 0 and seldom get higher than 6 or 7 for typical programs, depending on the maximum number of simultaneously open files.

Each read or write updates the current offset into the file, which is associated with the file table entry and is used to determine the position in the file for the next read or write.
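The descriptor and offset behaviour described above can be tried from Python, whose os module provides thin wrappers around the same system calls. This is only a minimal sketch: the file name demo.txt and the function name demo_offsets are made up for illustration, and it assumes that a temporary directory can be created.

```python
import os
import tempfile


def demo_offsets():
    """Create a file, then show that each read advances the file offset."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "demo.txt")  # hypothetical file name

    # O_CREAT | O_TRUNC behaves like creat: make an empty file,
    # or truncate an existing one.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.write(fd, b"hello, world")
    os.close(fd)

    fd = os.open(path, os.O_RDONLY)          # open returns a small integer
    first = os.read(fd, 5)                   # reads bytes 0 to 4
    offset_after_first = os.lseek(fd, 0, os.SEEK_CUR)
    second = os.read(fd, 7)                  # continues where the first read stopped
    offset_after_second = os.lseek(fd, 0, os.SEEK_CUR)
    os.close(fd)

    os.unlink(path)                          # remove the directory entry
    os.rmdir(workdir)
    return fd, first, offset_after_first, second, offset_after_second
```

Calling os.lseek with an offset of 0 relative to the current position moves nothing; it simply reports where the next read or write would start.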

CPU SCHEDULING

CPU scheduling in UNIX is designed to benefit interactive processes. Processes are given small CPU time slices by a priority algorithm that reduces to round-robin scheduling for CPU-bound jobs.

The scheduler on a UNIX system belongs to the general class of operating system schedulers known as round robin with multilevel feedback, which means that the kernel allocates the CPU to a process for a small time slice, preempts a process that exceeds its time slice and feeds it back into one of several priority queues. A process may need many iterations through the "feedback loop" before it finishes. When the kernel does a context switch and restores the context of a process, the process resumes execution from the point where it had been suspended.

Each process table entry contains a priority field that is used for process scheduling. The priority of a process is lower if it has recently used the CPU, and higher if it has not.

The more CPU time a process accumulates, the lower (more positive) its priority becomes, and vice versa, so there is negative feedback in CPU scheduling and it is difficult for a single process to take all the CPU time. Process aging is employed to prevent starvation.
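The negative feedback described above can be illustrated with a toy simulation. The formula (priority value equal to half the recent CPU usage) and the decay interval are assumptions chosen for illustration, not the formula used by any real UNIX kernel; the name simulate is likewise hypothetical.

```python
def simulate(num_procs=3, ticks=30, decay_every=10):
    """Toy model of negative feedback: a larger value means lower priority."""
    cpu = [0] * num_procs          # recent CPU usage per process
    ran = [0] * num_procs          # total ticks each process has run
    for tick in range(1, ticks + 1):
        # Priority value grows with recent CPU usage (hypothetical formula).
        prio = [usage // 2 for usage in cpu]
        # Run the process with the smallest value; ties go to the lowest index.
        current = min(range(num_procs), key=lambda p: (prio[p], p))
        cpu[current] += 1
        ran[current] += 1
        if tick % decay_every == 0:
            # Periodic recomputation: forget half of the accumulated usage.
            cpu = [usage // 2 for usage in cpu]
    return ran
```

Because running raises a process's priority value, the CPU-bound processes in this model take turns, which is the reduction to round-robin behaviour mentioned earlier.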

Older UNIX systems used a 1-second quantum for the round-robin scheduling. 4.3BSD reschedules processes every 0.1 second and recomputes priorities every second. The round-robin scheduling is accomplished by the time-out mechanism, which tells the clock interrupt driver to call a kernel subroutine after a specified interval; the subroutine to be called in this case causes the rescheduling and then resubmits a time-out to call itself again. The priority recomputation is also timed by a subroutine that resubmits a time-out for itself.

A process that gives up the CPU to wait for something goes to sleep on an event. The kernel primitive used for this purpose is called sleep (not to be confused with the user-level library routine of the same name). It takes an argument, which is by convention the address of a kernel data structure related to an event that the process wants to occur before that process is awakened. When the event occurs, the system process that knows about it calls wakeup with the address corresponding to the event, and all processes that had done a sleep on the same address are put in the ready queue to be run.
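The sleep and wakeup mechanism can be sketched as a table keyed by event address. The class ToyKernel and its attribute names are hypothetical, and the sketch only does the bookkeeping; a real kernel also saves and restores process context.

```python
from collections import defaultdict, deque


class ToyKernel:
    """Toy model of sleep and wakeup; not real kernel code."""

    def __init__(self):
        self.sleeping = defaultdict(list)  # event address -> sleeping pids
        self.ready = deque()               # ready queue of pids

    def sleep(self, pid, chan):
        """Record that pid is blocked until a wakeup on the same address."""
        self.sleeping[chan].append(pid)

    def wakeup(self, chan):
        """Move every process sleeping on chan to the ready queue."""
        for pid in self.sleeping.pop(chan, []):
            self.ready.append(pid)
```

Note that wakeup readies all processes sleeping on the address, not just one, which matches the description above.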

MEMORY MANAGEMENT

The CPU scheduling is strongly influenced by memory management schemes. At least part of a process must be contained in primary memory to run; a process cannot be executed by a CPU if it exists entirely in secondary storage. It is also not always possible to keep all active processes in main memory. For example, a 4MB main memory will not be able to provide space for a 5MB process. It is the job of the memory management module to decide which processes should reside (at least partially) in main memory, and to manage the parts of the virtual address space of a process which reside on secondary storage devices. It monitors the amount of available physical memory and provides swapping of processes between physical memory and secondary storage devices.

Swapping

Early UNIX systems transferred entire processes between primary memory and secondary storage devices but did not transfer parts of a process independently, except for shared text. Such a memory management policy is called swapping. UNIX was first implemented on the PDP-11, where the total physical memory was limited to 256K bytes. The total memory resources were insufficient to justify or support complex memory management algorithms. Thus, UNIX swapped entire process memory images.

Allocation of both main memory and swap space is done first-fit. When the size of a process memory image increases (due to either stack expansion or data expansion), a new piece of memory big enough for the whole image is allocated. The memory image is copied, the old memory is freed, and the appropriate tables are updated. (An attempt is made in some systems to find memory contiguous to the end of the current piece, to avoid some copying.) If no single piece of main memory is large enough, the process is swapped out such that it will be swapped back in with the new size.
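First-fit allocation can be sketched in a few lines. The representation of free memory as a list of (start, length) holes and the function name first_fit are assumptions made for this illustration, not the kernel's actual data structures.

```python
def first_fit(free_list, size):
    """Allocate size units from the first hole that is large enough.

    free_list is a list of (start, length) holes in address order.
    Returns the start address, or None if no single hole is big enough.
    """
    for i, (start, length) in enumerate(free_list):
        if length >= size:
            if length == size:
                del free_list[i]                               # hole used up exactly
            else:
                free_list[i] = (start + size, length - size)   # shrink the hole
            return start
    return None
```

A return value of None corresponds to the case in the text where no single piece of main memory is large enough and the process has to be swapped out.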

There is no need to swap out a sharable text segment, because it is read-only, and there is no need to read in a sharable text segment for a process when another instance is already in memory. That is one of the main reasons for keeping track of sharable text segments: less swap traffic. The other reason is the reduced amount of main memory required for multiple processes using the same text segment.

Decisions regarding which processes to swap in or swap out are made by the scheduler process (also known as the swapper). The scheduler wakes up at least once every 4 seconds to check for processes to be swapped in or out. A process is more likely to be swapped out if it is idle, has been in main memory for a long time, or is large; if no obvious candidates are found, other processes are picked by age. A process is more likely to be swapped in if it has been swapped out for a long time, or is small. There are checks to prevent thrashing, basically by not letting a process be swapped out if it has not been in memory for a certain amount of time.

If jobs do not need to be swapped out, the process table is searched for a process deserving to be brought in (determined by how small the process is and how long it has been swapped out). If there is not enough memory available, processes are swapped out until there is.

Many UNIX systems still use the swapping scheme just described. All Berkeley UNIX systems, on the other hand, depend primarily on paging for memory-contention management, and depend only secondarily on swapping. A scheme similar in outline to the traditional one is used to determine which processes get swapped in or out, but the details differ and the influence of swapping is less.

Demand Paging

Berkeley introduced demand paging to UNIX with BSD (Berkeley Software Distribution), which transferred memory pages instead of processes to and from a secondary device; recent releases of the UNIX system also support demand paging. Demand paging is done in a straightforward manner. When a process needs a page and the page is not there, a page fault to the kernel occurs, a frame of main memory is allocated, and then the required page is loaded into the frame by the kernel.
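The page fault sequence above can be modelled with a small page table. This is a sketch under simplifying assumptions: the class name ToyPager is hypothetical, physical memory is treated as unlimited so no page is ever evicted, and the disk read is only indicated by a comment.

```python
class ToyPager:
    """Toy demand pager: a page is loaded only when it is first touched."""

    def __init__(self):
        self.page_table = {}   # virtual page number -> frame number
        self.next_frame = 0    # next unused frame (memory assumed unlimited)
        self.faults = 0

    def access(self, page):
        """Return the frame holding page, loading it on a page fault."""
        if page not in self.page_table:
            self.faults += 1                          # page fault traps to the kernel
            self.page_table[page] = self.next_frame   # allocate a frame
            self.next_frame += 1                      # the page would be read from disk here
        return self.page_table[page]
```

Only the first touch of each page causes a fault; later accesses to the same page find it already resident.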

The advantage of a demand paging policy is that it permits greater flexibility in mapping the virtual address space of a process into the physical memory of a machine, usually allowing the size of a process to be greater than the amount of available physical memory and allowing more processes to fit into main memory. The advantage of a swapping policy is that it is easier to implement and results in less system overhead.

Blocks and Fragments

Most of the file system is taken up by data blocks, which contain whatever the users have put in their files. Let us consider how these data blocks are stored on the disk.

The hardware disk sector is usually 512 bytes. A block size larger than 512 bytes is desirable for speed. However, because UNIX file systems usually contain a very large number of small files, much larger blocks would cause excessive internal fragmentation. That is why the earlier 4.1BSD file system was limited to a 1024-byte (1K) block.

The 4.2BSD solution is to use two block sizes for files which have no indirect blocks: all the blocks of a file are of a large block size (such as 8K), except the last. The last block is an appropriate multiple of a smaller fragment size (for example, 1024) to fill out the file. Thus, a file of size 18,000 bytes would have two 8K blocks and one 2K fragment (which would not be filled completely).
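The 18,000-byte example can be checked with a short calculation. The function name layout is hypothetical, and the rule that a tail needing every fragment of a block is stored as a full block is an assumption of this sketch.

```python
def layout(file_size, block_size=8192, frag_size=1024):
    """Return (full_blocks, tail_fragments) for a file with no indirect blocks."""
    full_blocks, tail = divmod(file_size, block_size)
    fragments = -(-tail // frag_size)            # ceiling division
    if fragments == block_size // frag_size:
        # Assumption: a tail that needs every fragment is stored as a full block.
        full_blocks, fragments = full_blocks + 1, 0
    return full_blocks, fragments
```

For 18,000 bytes the remainder after two 8K blocks is 1,616 bytes, which needs two 1K fragments, that is, the 2K fragment mentioned in the text.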

The block and fragment sizes are set during file-system creation according to the intended use of the file system: if many small files are expected, the fragment size should be small; if repeated transfers of large files are expected, the basic block size should be large. Implementation details force a maximum block-to-fragment ratio of 8:1, and a minimum block size of 4K, so typical choices are 4096:512 for the former case and 8192:1024 for the latter.

Suppose data are written to a file in transfer sizes of 1K bytes, and the block and fragment sizes of the file system are 4K and 512 bytes. The file system will allocate a 1K fragment to contain the data from the first transfer. The next transfer will cause a new 2K fragment to be allocated. The data from the original fragment must be copied into this new fragment, followed by the second 1K transfer. The allocation routines do attempt to find the required space on the disk immediately following the existing fragment so that no copying is necessary, but, if they cannot do so, up to seven copies may be required before the fragment becomes a block. Provisions have been made for programs to discover the block size for a file so that transfers of that size can be made, to avoid fragment recopying.

Inodes

Associated with each file in UNIX is a little table (on disk) called an i-node. An inode is a record that describes the attributes of a file, including the layout of its data on disk. Inodes exist in a static form on disk; the kernel reads them into main memory and manipulates them there. Disk inodes consist of the following fields:

File owner identifier - File ownership is divided between an individual owner and a group owner and defines the set of users who have access rights to a file. The superuser has access rights to all files in the system.

File type - Files may be of type regular, directory, character or block special, or pipe.

File access permission - The system protects files according to three classes: the owner of the file, the group owner of the file, and other users; each class has access rights to read, write and execute the file, which can be set individually. Although a directory is a file, it cannot be executed; execute permission for a directory gives the right to search the directory for a file name.

File access times - The times at which the file was last modified and last accessed.
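Most of the fields listed above can be inspected through the stat system call, which Python exposes as os.stat. The sketch below is illustrative only: the function names describe and demo and the file name notes.txt are made up, and the values returned depend on the system where it runs.

```python
import os
import stat
import tempfile


def describe(path):
    """Return a few inode fields for path, as reported by stat."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,
        "owner": info.st_uid,
        "group": info.st_gid,
        "is_dir": stat.S_ISDIR(info.st_mode),
        "is_regular": stat.S_ISREG(info.st_mode),
        "permissions": stat.filemode(info.st_mode),  # type letter plus rwx triples
        "modified": info.st_mtime,
        "accessed": info.st_atime,
    }


def demo():
    """Describe a freshly created file and the directory that holds it."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "notes.txt")   # hypothetical file name
    with open(path, "w") as handle:
        handle.write("some bytes")
    file_info, dir_info = describe(path), describe(workdir)
    os.unlink(path)
    os.rmdir(workdir)
    return file_info, dir_info
```

The permission string has one character for the file type followed by three characters each for the owner, the group and other users.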

In addition, the inode contains 15 pointers to the disk blocks containing the data contents of the file. The first 12 of these pointers (as shown in figure 3) point to direct blocks; that is, they contain addresses of blocks that contain data of the file. Thus, the data for small files (no more than 12 blocks) can be referenced immediately, because a copy of the inode is kept in main memory while a file is open. If the block size is 4K, then up to 48K of data may be accessed directly from the inode.

Figure 3 : Direct and indirect blocks of inode

The next three pointers in the inode point to indirect blocks. If the file is large enough to use indirect blocks, the indirect blocks are each of the major block size; the fragment size applies to only data blocks. The first indirect block pointer is the address of a single indirect block. The single indirect block is an index block, containing not data, but rather the addresses of blocks that do contain data. Then, there is a double-indirect-block pointer, the address of a block that contains the addresses of blocks that contain pointers to the actual data blocks.

The last pointer would contain the address of a triple indirect block; however, there is no need for it. The minimum block size for a file system in 4.2BSD is 4K, so files with as many as 2^32 bytes will use only double, not triple, indirection. That is, as each block pointer takes 4 bytes, a 4K block holds 1,024 pointers, and we have 49,152 (4K x 12) bytes accessible in direct blocks, 4,194,304 bytes accessible by a single indirection, and 4,294,967,296 bytes reachable through double indirection, for a total of 4,299,210,752 bytes, which is larger than 2^32 bytes.
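The figures in this paragraph can be reproduced with a short calculation. The function name reach and its parameter names are chosen for illustration; the default values are the ones given in the text.

```python
def reach(block_size=4096, pointer_size=4, direct_pointers=12):
    """Bytes addressable through direct, single and double indirect pointers."""
    per_block = block_size // pointer_size       # pointers held by one index block
    direct = direct_pointers * block_size
    single = per_block * block_size
    double = per_block ** 2 * block_size
    return direct, single, double, direct + single + double
```

With the defaults, each index block holds 1,024 pointers, so double indirection alone already covers 2^32 bytes.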

The number 2^32 is significant because the file offset in the file structure in main memory is kept in a 32-bit word. Files therefore cannot be larger than 2^32 bytes. Since file pointers are signed integers (for seeking backward and forward in a file), the actual maximum file size is 2^(32-1) = 2^31 bytes. Two gigabytes is large enough for most purposes.

Directory Structure

Before a file can be read, it must be opened. When a file is opened, the operating system uses the path name supplied by the user to locate the disk blocks, so that it can read and write the file later. Mapping path names onto i-nodes (or the equivalent) brings us to the subject of how directory systems are organized. These vary from quite simple to reasonably sophisticated.

Now let us consider some examples of systems with hierarchical directory trees. Figure 4 shows an MS-DOS directory entry. It is 32 bytes long and contains the file name and the first block number, among other items. The first block number can be used as an index into the FAT, to find the second block number, and so on. In this way all the blocks can be found for a given file. Except for the root directory, which is of fixed size (112 entries for a 360K disk), MS-DOS directories are files and may contain an arbitrary number of entries.

Figure 4 : The MS-DOS directory entry
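Following a file's blocks through the FAT amounts to walking a linked list stored in a table. In the sketch below the FAT is modelled as a mapping from a block number to the next block number; the end-of-file marker -1 and the function name fat_chain are assumptions for illustration, not the values used on a real MS-DOS disk.

```python
def fat_chain(fat, first_block, end_marker=-1):
    """Follow a FAT chain and return the file's block numbers in order."""
    blocks = []
    block = first_block
    while block != end_marker:
        if len(blocks) > len(fat):
            raise ValueError("FAT chain contains a loop")
        blocks.append(block)
        block = fat[block]          # the FAT entry names the next block
    return blocks
```

The directory entry supplies only first_block; every later block number comes from the table itself.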

The directory structure used in UNIX is extremely simple, as shown in figure 5. Each entry contains just a file name and its i-node number. All the information about the type, size, times, ownership, and disk blocks is contained in the i-node (see figure 3). All directories in UNIX are files, and may contain arbitrarily many of these entries.

Figure 5 : A Unix directory entry

When a file is opened, the file system must take the file name supplied and locate its disk blocks. Let us consider how the path name /usr/ast/mbox is looked up. We will use UNIX as an example, but the algorithm is basically the same for all hierarchical directory systems. First the file system locates the root directory. In UNIX its i-node is located at a fixed place on the disk.

Then it looks up the first component of the path, usr, in the root directory to find the i-node of /usr. From this i-node, the system locates the directory for /usr and looks up the next component, ast, in it. When it has found the entry for ast, it has the i-node for the directory /usr/ast. From this i-node it can find the directory itself and look up mbox. The i-node for this file is then read into memory and kept there until the file is closed. The lookup process is illustrated in figure 6.

Figure 6 : The steps in looking up /usr/ast/mbox

Relative path names are looked up the same way as absolute ones, only starting from the working directory instead of starting from the root directory. Every directory has entries for . and .. which are put there when the directory is created. The entry . has the i-node number of the directory itself, and the entry .. has the i-node number of the parent directory. A lookup of a path that begins with .. therefore finds the i-node number of the parent directory and continues the search in that directory. No special mechanism is needed to handle these names. As far as the directory system is concerned, they are just ordinary ASCII strings.
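The component-by-component lookup can be sketched with directories modelled as tables from names to i-node numbers. Everything here is illustrative: the function name lookup, the choice of 1 as the root i-node number and the i-node numbers in the usage example are made up, and a real file system reads each directory from disk rather than from a Python dictionary.

```python
def lookup(path, directories, root_inode=1):
    """Resolve an absolute path to an i-node number, one component at a time.

    directories maps a directory's i-node number to its entries,
    each entry being a name mapped to an i-node number.
    """
    inode = root_inode
    for name in path.split("/"):
        if not name:
            continue                      # skip the empty parts around slashes
        entries = directories.get(inode)
        if entries is None:
            raise NotADirectoryError(name)
        if name not in entries:
            raise FileNotFoundError(path)
        inode = entries[name]
    return inode
```

A small usage example, with hypothetical i-node numbers:

```python
dirs = {
    1: {".": 1, "..": 1, "usr": 6},
    6: {".": 6, "..": 1, "ast": 26},
    26: {".": 26, "..": 6, "mbox": 60},
}
mbox = lookup("/usr/ast/mbox", dirs)          # walks 1, then 6, then 26
same = lookup("/usr/ast/../ast/mbox", dirs)   # ".." is just another entry
```

The second call shows that . and .. need no special handling: they are found in the directory table like any other name.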

User to user communication:

INTRODUCTION

UNIX provides commands for easy communication between users. Each of the commands has its own advantages and is appropriate in different situations. You should learn to use these commands and actually put them into practice.

Unlike the other commands which you can learn by solitary effort, the communication features are best mastered by working with a partner to whom you can send practice messages. Of course this is not an essential requirement, and you could, if necessary, learn even the communication commands all by yourself.

OBJECTIVES

This unit will take you further down the road to exploring UNIX. By now you have had a feel of what UNIX is like. In this unit you will learn about some more of the strong points of UNIX. By the end of this unit you should be able to:

Communicate on-line with other users on your machine using

1. write

2. wall

Communicate off-line with other users with the help of mail and news
