
  • CHAPTER-1

    UNIX OPERATING SYSTEM OVERVIEW

    This lecture covers:

The concept of an operating system.
    The internal architecture of an operating system.
    The evolution of the UNIX operating system into two broad schools (BSD and SYSV) and the development of Linux, a popular open source operating system.
    The architecture of the Linux operating system in more detail.
    How to log into (and out of) UNIX and change your password.
    The general format of UNIX commands.

    What Is an Operating System?

    An operating system (OS) is a resource manager. It takes the form of a set of software routines that allow users and application programs to access system resources (e.g. the CPU, memory, disks, modems, printers, network cards, etc.) in a safe, efficient and abstract way.

    For example, an OS ensures safe access to a printer by allowing only one application program to send data directly to the printer at any one time. An OS encourages efficient use of the CPU by suspending programs that are waiting for I/O operations to complete to make way for programs that can use the CPU more productively. An OS also provides convenient abstractions (such as files rather than disk locations) which isolate application programmers and users from the details of the underlying hardware.


  • Fig. 1.1: General operating system architecture

    Fig. 1.1 presents the architecture of a typical operating system and shows how an OS succeeds in presenting users and application programs with a uniform interface without regard to the details of the underlying hardware. We see that:

    The operating system kernel is in direct control of the underlying hardware. The kernel provides low-level device, memory and processor management functions (e.g. dealing with interrupts from hardware devices, sharing the processor among multiple programs, allocating memory for programs etc.)

    Basic hardware-independent kernel services are exposed to higher-level programs through a library of system calls (e.g. services to create a file, begin execution of a program, or open a logical network connection to another computer).

    Application programs (e.g. word processors, spreadsheets) and system utility programs (simple but useful application programs that come with the operating system, e.g. programs which find text inside a group of files) make use of system calls. Applications and system utilities are launched using a shell (a textual command line interface) or a graphical user interface that provides direct user interaction.
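    For a concrete (Linux-specific) glimpse of this layering, the strace utility - assuming it is installed on your system - can summarise the system calls an ordinary utility makes while it runs:

    $ strace -c ls > /dev/null    # run ls, discard its output, and print a per-system-call summary

    The summary (written to standard error) typically lists calls such as open, read and write - exactly the kind of kernel services described above.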

    Operating systems (and different flavours of the same operating system) can be distinguished from one another by the system calls, system utilities and user interface they provide, as well as by the resource scheduling policies implemented by the kernel.

    A BRIEF HISTORY OF UNIX

    UNIX has been a popular OS for more than two decades because of its multi-user, multi-tasking environment, stability, portability and powerful networking capabilities. What follows here is a simplified history of how UNIX has developed.

    Fig. 1.2: Simplified UNIX FamilyTree

  • In the late 1960s, researchers from General Electric, MIT and Bell Labs launched a joint project to develop an ambitious multi-user, multi-tasking OS for mainframe computers known as MULTICS (Multiplexed Information and Computing System). MULTICS failed (for some MULTICS enthusiasts "failed" is perhaps too strong a word to use here), but it did inspire Ken Thompson, who was a researcher at Bell Labs, to have a go at writing a simpler operating system himself. He wrote a simpler version of MULTICS on a PDP7 in assembler and called his attempt UNICS (Uniplexed Information and Computing System). Because memory and CPU power were at a premium in those days, UNICS (eventually shortened to UNIX) used short commands to minimize the space needed to store them and the time needed to decode them - hence the tradition of short UNIX commands we use today, e.g. ls, cp, rm, mv etc.

    Ken Thompson then teamed up with Dennis Ritchie, the author of the first C compiler. In 1973 they rewrote the UNIX kernel in C - a big step forward in terms of the system's portability - and released the Fifth Edition of UNIX to universities in 1974. The Seventh Edition, released in 1978, marked a split in UNIX development into two main branches: SYSV (System 5) and BSD (Berkeley Software Distribution). BSD arose from the University of California at Berkeley where Ken Thompson spent a sabbatical year. Its development was continued by students at Berkeley and other research institutions. SYSV was developed by AT&T and other commercial companies. UNIX flavours based on SYSV have traditionally been more conservative, but better supported than BSD-based flavours.

    The latest incarnations of SYSV (SVR4 or System 5 Release 4) and BSD Unix are actually very similar. Some minor differences are to be found in file system structure, system utility names and options and system call libraries as shown in Fig 1.3.

    Feature                     Typical SYSV            Typical BSD
    kernel name                 /unix                   /vmunix
    boot init                   /etc/rc.d directories   /etc/rc.* files
    mounted FS                  /etc/mnttab             /etc/mtab
    default shell               sh, ksh                 csh, tcsh
    FS block size               512 bytes->2K           4K->8K
    print subsystem             lp, lpstat, cancel      lpr, lpq, lprm
    echo command (no new line)  echo "\c"               echo -n
    ps command                  ps -fae                 ps -aux
    multiple wait syscalls      poll                    select
    memory access syscalls      memset, memcpy          bzero, bcopy

    Fig. 1.3: Differences between SYSV and BSD

    Linux is a free open source UNIX OS for PCs that was originally developed in 1991 by Linus Torvalds, a Finnish undergraduate student. Linux is neither pure SYSV nor pure BSD. Instead, it incorporates some features from each (e.g. SYSV-style startup files but BSD-style file system layout) and aims to conform with a set of IEEE standards called POSIX (Portable Operating System Interface). To maximise code portability, it typically supports SYSV, BSD and POSIX system calls (e.g. poll, select, memset, memcpy, bzero and bcopy are all supported).
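    As a small illustration of the table in Fig. 1.3, the echo and ps differences can be seen directly at the command line (which form works depends on the particular system and shell):

    $ echo "no trailing newline\c"    # SYSV-style echo: \c suppresses the newline
    $ echo -n "no trailing newline"   # BSD-style echo: -n suppresses the newline
    $ ps -fae                         # typical SYSV full process listing
    $ ps -aux                         # typical BSD equivalent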

    The open source nature of Linux means that the source code for the Linux kernel is freely available, so that anyone can add features and correct deficiencies. This approach has been very successful; what started as one person's project has now turned into a collaboration of hundreds of volunteer developers from around the globe. The open source approach has not just been applied successfully to kernel code, but also to application programs for Linux.

    As Linux has become more popular, several different development streams or distributions have emerged, e.g. Redhat, Slackware, Mandrake, Debian, and Caldera. A distribution comprises a prepackaged kernel, system utilities, GUI interfaces and application programs.

    Redhat is the most popular distribution because it has been ported to a large number of hardware platforms (including Intel, Alpha, and SPARC), it is easy to install and use, and it comes with a comprehensive set of utilities and applications, including the X Window System, the GNOME and KDE GUI environments, and the StarOffice suite (an open source MS-Office clone for Linux).

    ARCHITECTURE OF THE LINUX OPERATING SYSTEM

    Linux has all of the components of a typical OS (at this point you might like to refer back to Fig 1.1):

    Kernel The Linux kernel includes device driver support for a large number of PC hardware devices (graphics cards, network cards, hard disks etc.), advanced

  • processor and memory management features, and support for many different types of filesystems (including DOS floppies and the ISO9660 standard for CDROMs). In terms of the services that it provides to application programs and system utilities, the kernel implements most BSD and SYSV system calls, as well as the system calls described in the POSIX.1 specification.

    The kernel (in raw binary form that is loaded directly into memory at system startup time) is typically found in the file /boot/vmlinuz, while the source files can usually be found in /usr/src/linux.

    Shells and GUIs Linux supports two forms of command input: through textual command line shells similar to those found on most UNIX systems (e.g. sh - the Bourne shell, bash - the Bourne again shell and csh - the C shell) and through graphical interfaces (GUIs) such as the KDE and GNOME window managers. If you are connecting remotely to a server your access will typically be through a command line shell.

    System Utilities Virtually every system utility that you would expect to find on standard implementations of UNIX (including every system utility described in the POSIX.2 specification) has been ported to Linux. This includes commands such as ls, cp, grep, awk, sed, bc, wc, more, and so on. These system utilities are designed to be powerful tools that do a single task extremely well (e.g. grep finds text inside files while wc counts the number of words, lines and bytes inside a file). Users can often solve problems by interconnecting these tools instead of writing a large monolithic application program.
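    For example, a pipeline such as the following (the file names are hypothetical) combines two single-purpose tools to answer a question neither solves alone:

    $ grep -l "main" *.c | wc -l    # how many C files in this directory mention "main"?

    grep -l lists the names of the matching files, and wc -l simply counts those lines.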

    Like other UNIX flavours, Linux's system utilities also include server programs called daemons which provide remote network and administration services (e.g. telnetd and sshd provide remote login facilities, lpd provides printing services, httpd serves web pages, crond runs regular system administration tasks automatically). A daemon (probably derived from the Greek word for a beneficent spirit who watches over someone, or perhaps short for "Disk And Execution MONitor") is usually spawned automatically at system startup and spends most of its time lying dormant (lurking?) waiting for some event to occur.

  • Application programs Linux distributions typically come with several useful application programs as standard. Examples include the emacs editor, xv (an image viewer), gcc (a C compiler), g++ (a C++ compiler), xfig (a drawing package), latex (a powerful typesetting language) and soffice (StarOffice, which is an MS-Office style clone that can read and write Word, Excel and PowerPoint files).

    Redhat Linux also comes with rpm, the Redhat Package Manager which makes it easy to install and uninstall application programs.

    LOGGING INTO (AND OUT OF) UNIX SYSTEMS

    Text-based (TTY) terminals:

    When you connect to a UNIX computer remotely (using telnet) or when you log in locally using a text-only terminal, you will see the prompt:

    login:

    At this prompt, type in your username and press the enter/return key. Remember that UNIX is case sensitive (i.e. Will, WILL and will are all different logins). You should then be prompted for your password:

    login: will
    password:

    Type your password in at the prompt and press the enter/return key. Note that your password will not be displayed on the screen as you type it in.

    If you mistype your username or password you will get an appropriate message from the computer and you will be presented with the login: prompt again. Otherwise you should be presented with a shell prompt which looks something like this:

    $

    To log out of a text-based UNIX shell, type "exit" at the shell prompt (or if that doesn't work try "logout"; if that doesn't work press ctrl-d).

    Graphical terminals:

    If you're logging into a UNIX computer locally, or if you are using a remote login facility that supports graphics, you might instead be presented with a graphical prompt with login and password fields. Enter your user name and password in the same way as above (N.B. you may need to press the TAB key to move between fields).

    Once you are logged in, you should be presented with a graphical window manager that looks similar to the Microsoft Windows interface. To bring up a window containing a shell prompt look for menus or icons which mention the words "shell", "xterm", "console" or "terminal emulator".

    To log out of a graphical window manager, look for menu options similar to "Log out" or "Exit".

    CHANGING YOUR PASSWORD

    One of the things you should do when you log in for the first time is to change your password.

    The UNIX command to change your password is passwd:

    $ passwd

    The system will prompt you for your old password, then for your new password. To guard against typing errors, it will ask you to confirm your new password by typing it again.
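    A typical session looks something like this (the exact prompts vary from system to system; the username shown is just the earlier example, will):

    $ passwd
    Changing password for will
    Old password:
    New password:
    Retype new password: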

    Remember the following points when choosing your password:

    o Avoid characters which might not appear on all keyboards.

    o The weakest link in most computer security is user passwords, so keep your password a secret, don't write it down and don't tell it to anyone else. Also avoid dictionary words or words related to your personal details (e.g. your boyfriend or girlfriend's name or your login).

    o Make it at least 7 or 8 characters long and try to use a mix of letters, numbers and punctuation.

    GENERAL FORMAT OF UNIX COMMANDS

    A UNIX command line consists of the name of a UNIX command (actually the "command" is the name of a built-in shell command, a system utility or an application program) followed by its "arguments" (options and the target filenames and/or expressions). The general syntax for a UNIX command is

    $ command -options targets

    Here command can be thought of as a verb, options as an adverb and targets as the direct objects of the verb. If the user wishes to specify several options, these need not always be listed separately (the options can often be combined after a single dash), as in the example below.
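    For instance, with the ls command (the /tmp directory here is just an example target):

    $ ls -l -a /tmp    # two options given separately
    $ ls -la /tmp      # the same two options combined after a single dash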

    An operating system (OS) is a computer program that manages the hardware and software resources of a computer. At the foundation of all system software, the OS performs basic tasks such as controlling and allocating memory, prioritizing system requests, controlling input and output devices, facilitating networking, and managing files. It also may provide a graphical user interface for higher level functions.

    Introduction

    A typical vision of a computer architecture as a series of abstraction layers: hardware, firmware, assembler, kernel, operating system and applications (see also Tanenbaum 79).

    Modern general-purpose computers, including personal computers and mainframes, have an operating system (a general purpose operating system) to run other programs, such as application software. Examples of operating systems for personal computers include Microsoft Windows, GNU/Linux, and Mac OS.

    The lowest level of any operating system is its kernel. This is the first layer of software loaded into memory when a system boots or starts up. The kernel provides access to various common core services to all other system and application programs. These

  • services include, but are not limited to: task scheduling, memory management, disk access, and access to hardware devices.

    An example of MS-DOS's command-line interface, this one showing that the current directory is the root of drive C

    Apart from the kernel, an operating system is often distributed with system software that manages a graphical user interface (GUI) (although Windows and Macintosh have integrated these programs into the operating system), as well as utility programs for tasks such as managing files and configuring the operating system. Operating systems are also often distributed with application software that does not directly relate to the operating system's core function, but which the operating system distributor finds advantageous to supply with the operating system.

    Delineating between the operating system and application software is not a completely precise activity, and is occasionally subject to controversy. From commercial or legal points of view, the delineation can depend on the contexts of the interests involved. For example, one of the key questions in the United States v. Microsoft antitrust trial was whether Microsoft's web browser was part of its operating system, or whether it was a separable piece of application software.

    Like the term "operating system" itself, the question of what exactly the "kernel" should manage is subject to some controversy, with debates over whether things like file systems should be included in the kernel. Various camps advocate microkernels, monolithic kernels, and so on.

    Operating systems are used on most, but not all, computer systems. The simplest computers, including the smallest embedded systems and many of the first computers did not have operating systems. Instead, they relied on the application programs to manage the minimal hardware themselves, perhaps with the aid of libraries developed for the purpose. Commercially-supplied operating systems are present on virtually all modern devices described as computers, from personal computers to mainframes, as well as mobile computers such as PDAs and mobile phones.

    Services

  • Process management

    Every action on a computer, be it a background service or an application, runs inside a process. As long as a von Neumann architecture is used to build computers, only one process per CPU can run at a time. Older OSes such as DOS did not attempt to work around this limit, and only one process could be run under them (although DOS itself featured TSRs as a very partial and not very easy to use solution). Modern operating systems are able to simulate execution of many processes at once (multitasking) even with one CPU. Process management is an operating system's way of dealing with running multiple processes. Since most computers contain one processor with one core, multitasking is done by simply switching processes quickly. As more processes run, each timeshare becomes smaller. On many systems this can eventually lead to problems such as skipping audio or jittery mouse movement (and, in the extreme, to thrashing, a state in which OS-related activity becomes the only thing a computer does). Process management involves computing and distributing these "timeshares". Most OSes allow a process to be assigned a priority which affects its timeshare. Interactive OSes also employ some level of feedback, in which the task the user is working with receives higher priority.
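    On a UNIX or Linux system you can watch and influence this scheduling from the shell; a brief sketch (the program name is hypothetical, and option syntax varies slightly between systems):

    $ ps -e                     # list the processes currently being managed (SYSV style; BSD-style systems use ps aux)
    $ nice -n 10 ./long_job &   # start a hypothetical program in the background at lower priority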

    Memory management

    According to Parkinson's law, "Programs expand to fill the memory available to hold them", so programmers would ideally like memory of unlimited size and speed. In practice, a computer's memory is arranged hierarchically, from the fastest registers through cache and RAM down to disk storage. The memory manager in an OS coordinates these memories by tracking which parts are available, which are to be allocated or deallocated, and how to swap data between main memory and secondary storage. This activity, usually referred to as virtual memory management, greatly increases the amount of memory available to a process (typically 4GB on a 32-bit system, even if the physical RAM available is less). This comes at a speed penalty which is usually low, but which can become very high in extreme cases and, again, lead to thrashing.

    Another important part of memory management activity is managing virtual addresses, with help from the CPU. If multiple processes are in memory at once, they must be prevented from interfering with each other's memory (unless there is an explicit request to share a limited amount of memory, in controlled ways). This is achieved by having separate address spaces. Each process sees the whole virtual address space (typically from address 0 up to the maximum size of virtual memory) as uniquely assigned to it (ignoring the fact that some areas are OS reserved). The CPU stores some tables to match virtual addresses to physical addresses.

  • By creating a separate address space for each process, it is also simple for the operating system to free all of the memory that was used by a particular process: if a process fails to free some memory, this does not matter, because all of its memory is released when the process ends.
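    On Linux systems the effect of this memory management can be observed from the shell (free is Linux-specific, and the ps -o keywords vary slightly between systems):

    $ free -m                        # physical and swap memory usage, in megabytes
    $ ps -o pid,vsz,rss,comm -p $$   # virtual (vsz) versus resident (rss) size of the current shell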

    Disk and file systems

    Operating systems support a variety of native file systems. Linux supports a particularly wide range of native file systems: ext2, ext3, ReiserFS, Reiser4, GFS, GFS2, OCFS, OCFS2, NILFS and the Google File System. Linux also has full support for XFS and JFS, along with the FAT file systems and NTFS. Windows, on the other hand, has limited file system support, covering only FAT12, FAT16, FAT32, and NTFS. NTFS is the most efficient and reliable of the four Windows file systems. All the FAT systems are older than NTFS and have limitations on partition and file size that can cause a variety of problems.

    Most of these file systems can be used in one of two ways: journaled or non-journaled. Journaling is the safer alternative when a system recovery is needed: if a system comes to an abrupt stop, as in a system crash, a non-journaled file system will need to be examined by the file system check utilities, whereas recovery of a journaled file system is automatic. Microsoft's NTFS is journaled, as are most Linux file systems (including ext3, ReiserFS and JFS, but not ext2).

    Every file system is organised into directories and subdirectories in a similar way, but there are subtle differences between operating systems. Microsoft separates directory components with a backslash and its file names are not case sensitive, whereas Unix-derived operating systems (including Linux) use the forward slash and their file names generally are case sensitive.
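    The case-sensitivity difference is easy to demonstrate on a UNIX system (in an otherwise empty directory):

    $ touch Readme readme    # creates two distinct files on a case-sensitive file system
    $ ls
    Readme  readme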

    Networking

    Most current operating systems are capable of using the now-universal TCP/IP networking protocols. This means that one system can appear on a network alongside others and share resources such as files, printers, and scanners.

    Many operating systems also support one or more vendor-specific legacy networking protocols, for example SNA on IBM systems, DECnet on systems from Digital Equipment Corporation, and Microsoft-specific protocols on Windows. Specific protocols for specific tasks may also be supported, such as NFS for file access.
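    As a sketch of NFS in use (the server name, export and mount point are hypothetical; the command needs root privileges and NFS client support):

    $ mount -t nfs fileserver:/export/home /mnt/home   # make a remote directory appear as part of the local file tree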

  • Security

    Many operating systems include some level of security. Security is based on the two ideas that:

    The operating system provides access to a number of resources, directly or indirectly, such as files on a local disk, privileged system calls, personal information about users, and the services offered by the programs running on the system;

    The operating system is capable of distinguishing between some requestors of these resources who are authorized (allowed) to access the resource, and others who are not authorized (forbidden). While some systems may simply distinguish between "privileged" and "non-privileged", systems commonly have a form of requestor identity, such as a user name. Requestors in turn divide into two categories:

    Internal security: an already running program. On some systems, a program once it is running has no limitations, but commonly the program has an identity which it keeps and is used to check all of its requests for resources.

    External security: a new request from outside the computer, such as a login at a connected console or some kind of network connection. To establish identity there may be a process of authentication. Often a username must be quoted, and each username may have a password. Other methods of authentication such as magnetic cards or biometric data might be used instead. In some cases, especially connections from the network, resources may be accessed with no authentication at all.

    In addition to the allow/disallow model of security, a system with a high level of security will also offer auditing options. These would allow tracking of requests for access to resources (such as "who has been reading this file?").

    Security of operating systems has long been a concern because of highly sensitive data held on computers, both of a commercial and military nature. The United States Government Department of Defense (DoD) created the Trusted Computer System Evaluation Criteria (TCSEC), which is a standard that sets basic requirements for assessing the effectiveness of security. This became of vital importance to operating system makers, because the TCSEC was used to evaluate, classify and select computer systems being considered for the processing, storage and retrieval of sensitive or classified information.

    Internal security

    Internal security can be conceptualized as protecting the computer's resources from the programs concurrently running on the system. Most operating systems run programs natively on the computer's processor, so the problem arises of how to stop these programs from doing the same things, with the same privileges, as the operating system (which is, after all, just a program too). Processors used for general purpose operating systems generally have a hardware concept of privilege: less privileged programs are automatically blocked from using certain hardware instructions, such as those that read or write external devices like disks. Instead, they have to ask the privileged program (the operating system) to read or write on their behalf. The operating system therefore gets the chance to check the program's identity and allow or refuse the request.

    An alternative strategy, and the only strategy available where the operating system and user programs have the same hardware privilege, is for the operating system not to run user programs as native code, but instead to either emulate a processor or provide a host for a p-code based system such as Java.

    Internal security is especially relevant for multi-user systems; it allows each user of the system to have private files that the other users cannot tamper with or read. Internal security is also vital if auditing is to be of any use, since a program can potentially bypass the operating system, inclusive of bypassing auditing.

    External security

    Typically an operating system offers (hosts) various services to other network computers and users. These services are usually provided through ports, numbered access points beyond the operating system's network address. Typical services include file sharing, print services, email, web sites, and file transfer protocols. At the front line of security are hardware devices known as firewalls. At the operating system level there are various software firewalls. A software firewall is configured to allow or deny traffic to a service running on top of the operating system. Therefore one can install and run an insecure service, such as Telnet or FTP, without being threatened by a security breach, because the firewall denies all traffic trying to connect to the service on that port.
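    A minimal Linux-specific sketch using iptables (assuming the iptables tool is available and you have root privileges; firewall syntax varies between systems and versions):

    $ iptables -A INPUT -p tcp --dport 23 -j DROP    # silently drop incoming telnet (port 23) connections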

    Graphical user interfaces

    Today, most modern operating systems contain Graphical User Interfaces (GUIs, pronounced g-oo-ey-s). A few older operating systems tightly integrated the GUI into the kernel, for example the original implementations of Windows and Mac OS. More modern operating systems are modular, separating the graphics subsystem from the kernel (as is now done in Linux and Mac OS X, and to a limited extent Windows).

  • Many operating systems allow the user to install or create any user interface they desire. The X Window System in conjunction with GNOME or KDE is a commonly found setup on most Unix and Unix derivative (BSD, Linux, Minix) systems.

    GUIs tend to change with time. For example, Windows has modified its GUI every time a new major version of Windows is released and the Mac OS GUI changed dramatically with the introduction of Mac OS X.

    Device drivers

    A device driver is a specific type of computer software developed to allow interaction with hardware devices. Typically this constitutes an interface for communicating with the device, through the specific computer bus or communications subsystem that the hardware is connected to, providing commands to and/or receiving data from the device, and, on the other end, the requisite interfaces to the operating system and software applications. It is a specialized, hardware-dependent and operating-system-specific program that enables another program - typically the operating system, an applications package, or a program running under the operating system kernel - to interact transparently with a hardware device. It usually provides the interrupt handling required for asynchronous, time-dependent hardware interfacing.

    The key design goal of device drivers is abstraction. Every model of hardware (even within the same class of device) is different. Manufacturers also release newer models that provide more reliable or better performance, and these newer models are often controlled differently. Computers and their operating systems cannot be expected to know how to control every device, both now and in the future. To solve this problem, OSes essentially dictate how every type of device should be controlled. The function of the device driver is then to translate these OS-mandated function calls into device-specific calls. In theory a new device, which is controlled in a new manner, should function correctly if a suitable driver is available. This new driver will ensure that the device appears to operate as usual from the operating system's point of view.

    History

    The first computers did not have operating systems. However, software tools for managing the system and simplifying the use of hardware appeared very quickly afterwards, and gradually expanded in scope. By the early 1960s, commercial computer vendors were supplying quite extensive tools for streamlining the development, scheduling, and execution of jobs on batch processing systems. Examples were produced by UNIVAC and Control Data Corporation, amongst others.

  • Through the 1960s, several major concepts were developed, driving the development of operating systems. The development of the IBM System/360 produced a family of mainframe computers available in widely differing capacities and price points, for which a single operating system OS/360 was planned (rather than developing ad-hoc programs for every individual model). This concept of a single OS spanning an entire product line was crucial for the success of System/360 and, in fact, IBM's current mainframe operating systems are distant descendants of this original system; applications written for the OS/360 can still be run on modern machines. OS/360 also contained another important advance: the development of the hard disk permanent storage device (which IBM called DASD). Another key development was the concept of time-sharing: the idea of sharing the resources of expensive computers amongst multiple computer users interacting in real time with the system. Time sharing allowed all of the users to have the illusion of having exclusive access to the machine; the Multics timesharing system was the most famous of a number of new operating systems developed to take advantage of the concept.

    Multics, particularly, was an inspiration to a number of operating systems developed in the 1970s, notably Unix. Another commercially-popular minicomputer operating system was VMS.

    The first microcomputers did not have the capacity or need for the elaborate operating systems that had been developed for mainframes and minis; minimalistic operating systems were developed, often loaded from ROM and known as Monitors. One notable early disk-based operating system was CP/M, which was supported on many early microcomputers and was largely cloned in creating MS-DOS, which became wildly popular as the operating system chosen for the IBM PC (IBM's version of it was called IBM-DOS or PC-DOS), its successors making Microsoft one of the world's most profitable companies. The major alternative throughout the 1980s in the microcomputer market was Mac OS, tied intimately to the Apple Macintosh computer.

    By the 1990s, the microcomputer had evolved to the point where, as well as extensive GUI facilities, the robustness and flexibility of operating systems of larger computers became increasingly desirable. Microsoft's response to this change was the development of Windows NT, which served as the basis for Microsoft's entire operating system line starting in 1999. Apple rebuilt their operating system on top of a Unix core as Mac OS X, released in 2001. Hobbyist-developed reimplementations of Unix, assembled with the tools from the GNU Project, also became popular; versions based on the Linux kernel are by far the most popular, with the BSD derived UNIXes holding a small portion of the server market.

    The growing complexity of embedded devices has led to increasing use of embedded operating systems.

  • Today

    MacOS X in action

    Modern operating systems have a graphical user interface which uses a pointing device such as a mouse or stylus for input, in addition to the keyboard. Older models, and operating systems not designed for direct human interaction (such as web servers), typically use a command line interface (or CLI), typically with only the keyboard for input. Both models are centered around a "shell" which accepts and processes commands from the user (e.g. clicking on a button, or a typed command at a prompt). The choice of OS may depend on the hardware architecture, specifically the CPU, with only Linux and BSD running on almost any CPU. Windows NT has been ported to a few other CPUs (DEC Alpha and MIPS Magnum). Since the early 1990s the choice for personal computers has largely been limited to the Microsoft Windows family and the Unix-like family, of which Linux and Mac OS X are becoming the major alternatives. Mainframe computers and embedded systems use a variety of different operating systems, many with no direct connection to Windows or Unix, but typically more similar to Unix than to Windows.

    Personal computers

    IBM PC compatible - Microsoft Windows and smaller Unix-variants (like Linux and BSD)

    Apple Macintosh - Mac OS X, Windows, Linux and BSD

    Mainframes - A number of unique operating systems; sometimes Linux and other Unix variants.

    Embedded systems - a variety of dedicated operating systems and limited versions of Linux or other operating systems

  • Unix-like

    A customized Linux desktop.

    The Unix-like family is a diverse group of operating systems, with several major sub-categories including System V, BSD, and Linux. The name "Unix" is a trademark of The Open Group which licenses it for use with any operating system that has been shown to conform to their definitions. "Unix-like" is commonly used to refer to the large set of operating systems which resemble the original Unix.

    Unix systems run on a wide variety of machine architectures. They are used heavily as server systems in business, as well as workstations in academic and engineering environments. Free software Unix variants, such as Linux and BSD, are increasingly popular. They are used in the desktop market as well, for example Ubuntu, but mostly by hobbyists.

    Some Unix variants like HP's HP-UX and IBM's AIX are designed to run only on that vendor's proprietary hardware. Others, such as Solaris, can run on both proprietary hardware and on commodity x86 PCs. Apple's Mac OS X, a microkernel BSD variant derived from NeXTSTEP, Mach, and FreeBSD, has replaced Apple's earlier (non-Unix) Mac OS. Over the past several years, free Unix systems have supplanted proprietary ones in most instances. For instance, scientific modeling and computer animation were once the province of SGI's IRIX. Today, they are dominated by Linux-based or Plan 9 clusters.

    The team at Bell Labs that designed and developed Unix went on to develop Plan 9 and Inferno, which were designed for modern distributed environments. They had graphics built in, unlike Unix counterparts that added graphics to the design later. Plan 9 did not become popular because, unlike many Unix distributions, it was not originally free. It has since been released under the free software and open source Lucent Public License, and has an expanding community of developers. Inferno was sold to Vita Nuova and has been released under a GPL/MIT license.

  • Microsoft Windows

    A typical Windows XP desktop.

    The Microsoft Windows family of operating systems originated as a graphical layer on top of the older MS-DOS environment for the IBM PC. Modern versions are based on the newer Windows NT core that first took shape in OS/2 and borrowed from OpenVMS. Windows runs on 32-bit and 64-bit Intel and AMD computers, although earlier versions also ran on the DEC Alpha, MIPS, and PowerPC architectures (some work was done to port it to the SPARC architecture).

    As of 2004, Windows held a near-monopoly of around 90% of the worldwide desktop market share, although this is thought to be dwindling due to increasing interest in open source operating systems. [1] It is also used on low-end and mid-range servers, supporting applications such as web servers and database servers. In recent years, Microsoft has spent significant marketing and R&D money to demonstrate that Windows is capable of running any enterprise application (see the TPC article).

    The most recent addition to the Microsoft Windows family is Microsoft Windows XP, released on October 25, 2001. The latest stable release is Windows XP Service Pack 2, released on August 6, 2004.

    Microsoft is currently developing the next generation of the Windows platform, named Windows Vista (formerly code-named "Longhorn"), which has yet to be released and boasts some impressive new functionality, particularly in security and network administration. The as-yet-unreleased software also boasts a completely new front end known as Aero Glass.

    Mainframe operating systems, such as IBM's z/OS, and embedded operating systems such as VxWorks, eCos, and Palm OS, are usually unrelated to Unix and Windows, except for Windows CE, Windows NT Embedded 4.0 and Windows XP Embedded, which are descendants of Windows, and several *BSDs and Linux distributions tailored for embedded systems. OpenVMS from Hewlett-Packard (formerly DEC) is still under active development.

    Older operating systems which are still used in niche markets include OS/2 from IBM; Mac OS, the non-Unix precursor to Apple's Mac OS X; BeOS; XTS-300.

    Popular prior to the dot-com era, operating systems such as AmigaOS and RISC OS continue to be developed as minority platforms for enthusiast communities and specialist applications.

    Research and development of new operating systems continues. GNU Hurd is designed to be backwards compatible with Unix, but with enhanced functionality and a microkernel architecture. Microsoft Singularity is a research project to develop an operating system with better memory protection based on the .Net managed code model.

    UNIX File System

    The UNIX file system (UFS) is a file system used by many Unix and Unix-like operating systems. It is also called the Berkeley Fast File System, the BSD Fast File System or FFS. It is a distant descendant of the original filesystem used by Version 7 Unix.

    Design

    A UFS volume is composed of the following parts:

    a few blocks at the beginning of the partition reserved for boot blocks (which must be initialized separately from the filesystem)

    a superblock, containing a magic number identifying this as a UFS filesystem, and some other vital numbers describing this filesystem's geometry and statistics and behavioral tuning parameters

    a collection of cylinder groups. Each cylinder group has the following components:

    a backup copy of the superblock

    a cylinder group header, with statistics, free lists, etc, about this cylinder group, similar to those in the superblock

  • a number of inodes, each containing file attributes

    a number of data blocks

    Inodes are numbered sequentially. The first several inodes are reserved for historical reasons, followed by the inode for the root directory.

    Directory files contain only the list of filenames in the directory and the inode associated with each file. All file metadata is kept in the inode.
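    You can see this split between names and inodes from the shell:

    $ ls -i /etc/passwd    # print the inode number stored in the directory entry alongside the name
    $ ls -l /etc/passwd    # the permissions, owner, size and timestamps shown here are kept in the inode itself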

    History and evolution

    Early versions of Unix used filesystems referred to simply as FS. FS only included the boot block, superblock, a clump of inodes, and the data blocks. This worked well for the small disks early Unixes were designed for, but as technology advanced and disks got larger, moving the head back and forth between the clump of inodes and the data blocks they referred to caused thrashing. BSD optimized this in FFS by inventing cylinder groups, breaking the disk up into smaller chunks, each with its own inode clump and data blocks.

    The intent of BSD FFS is to try to localize associated data blocks and metadata in the same cylinder group, and ideally, all of the contents of a directory (both data and metadata for all the files) in the same or nearby cylinder group, thus reducing fragmentation caused by scattering a directory's contents over a whole disk.

    Some of the performance parameters in the superblock included number of tracks and sectors, disk rotation speed, head speed, and alignment of the sectors between tracks. In a fully optimized system, the head could be moved between close tracks to read scattered sectors off of alternating tracks while waiting for the platter to spin around.

    As disks grew larger and larger, sector-level optimization became obsolete (especially with disks that used linear sector numbering and variable sectors per track). With larger disks and larger files, fragmented reads became more of a problem. To combat this, BSD originally increased the filesystem block size from one sector to 1K in 4.0BSD, and, in FFS, increased it from 1K to 8K. This has several effects. The chances of a file's sectors being contiguous are much greater. The amount of overhead needed to list the file's blocks is reduced. And the number of blocks representable in a fixed-width block number is increased (allowing for larger disks).

    With larger block sizes, disks with many small files would waste a lot of space, so BSD added block level fragmentation, where the last partial block of data from several files may be stored in a single "fragment" block instead of multiple mostly empty blocks.

  • Implementations

    Vendors of some commercial Unix systems, such as SunOS/Solaris, System V Release 4, HP-UX, and Tru64 UNIX, have adopted UFS. Most of them adapted UFS to their own uses, adding proprietary extensions that may not be recognized by other vendors' versions of Unix. Surprisingly, many have continued to use the same block size and data field widths as the original UFS, so some degree of (read) compatibility remains across platforms.

    As of Solaris 7, Sun Microsystems included UFS Logging, which brought filesystem journaling to UFS. Solaris UFS also has extensions for large files and large disks and other features.

    In 4.4BSD and BSD Unix systems derived from it, such as FreeBSD, NetBSD, OpenBSD, and DragonFlyBSD, the implementation of UFS1 and UFS2 is split into two layers - an upper layer that provides the directory structure and supports metadata (permissions, ownership, etc.) in the inode structure, and lower layers that provide data containers implemented as inodes. This was done to support both the traditional FFS and the LFS log-structured file system with common code for common functions. The upper layer is called "UFS", and the lower layers are called "FFS" and "LFS". In some of those systems, the term "FFS" is used for the combination of the FFS lower layer and the UFS upper layer, and the term "LFS" is used for the combination of the LFS lower layer and the UFS upper layer.

    FreeBSD extended the FFS and UFS layers to support a new variant, called UFS2, which adds 64-bit block pointers (allowing volumes larger than 1TB), variable-sized blocks (similar to extents), extended flag fields and extended attribute support. FreeBSD also introduced soft updates and the ability to make file system snapshots for both UFS1 and UFS2. These have since been ported to NetBSD. Note that OpenBSD does not currently support UFS2, but it does support soft updates.

    Linux includes a UFS implementation for binary compatibility at the read level with other Unixes, but since there is no standard implementation for the vendor extensions to UFS, Linux does not have full support for writing to UFS. The native Linux ext2 filesystem is inspired by UFS. (In fact, in some 4.4BSD-derived systems, the UFS layer can use an ext2 layer as a container layer, just as it can use FFS and LFS.) NeXTStep, which was BSD-derived, also used a version of UFS. In Mac OS X it is available as an alternative to HFS+.

  • CHAPTER-2

    BASIC UNIX COMMANDS

    Note: not all of these are actually part of UNIX itself, and you may not find them on all UNIX machines. But they can all be used on turing in essentially the same way, by typing the command and hitting return. Note that some of these commands are different on non-Solaris machines - see SunOS differences. If you've made a typo, the easiest thing to do is hit CTRL-u to cancel the whole line. But you can also edit the command line. UNIX is case-sensitive.

    Files

    ls --- lists your files

    ls -l --- lists your files in 'long format', which contains lots of useful information, e.g. the exact size of the file, who owns the file and who has the right to look at it, and when it was last modified.

    ls -a --- lists all files, including the ones whose filenames begin in a dot, which you do not always want to see.

    There are many more options, for example to list files by size, by date, recursively etc.

    more filename --- shows the first part of a file, just as much as will fit on one screen. Just hit the space bar to see more or q to quit. You can use /pattern to search for a pattern.

    emacs filename --- is an editor that lets you create and edit a file.

    mv filename1 filename2 --- moves a file (i.e. gives it a different name, or moves it into a different directory (see below))

    cp filename1 filename2 --- copies a file

  • rm filename --- removes a file. It is wise to use the option rm -i, which will ask you for confirmation before actually deleting anything. You can make this your default by making an alias in your .cshrc file.

    diff filename1 filename2 --- compares files, and shows where they differ

    wc filename --- tells you how many lines, words, and characters there are in a file

    chmod options filename --- lets you change the read, write, and execute permissions on your files. The default is that only you can look at them and change them, but you may sometimes want to change these permissions. For example, chmod o+r filename will make the file readable for everyone, and chmod o-r filename will make it unreadable for others again. Note that for someone to be able to actually look at the file, the directories it is in need to be at least executable. (A couple of further examples follow below.)
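    A couple of further chmod examples (notes.txt and myscript are hypothetical files); the numeric form sets all the permission bits at once:

    $ chmod u+x myscript     # make a script executable by its owner
    $ chmod 644 notes.txt    # numeric form: owner read/write, group and others read-only
    $ ls -l notes.txt        # check the resulting permissions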

    File Compression

    o gzip filename --- compresses files, so that they take up much less space. Usually text files compress to about half their original size, but it depends very much on the size of the file and the nature of the contents. There are other tools for this purpose, too (e.g. compress), but gzip usually gives the highest compression rate. Gzip produces files with the ending '.gz' appended to the original filename.

    o gunzip filename --- uncompresses files compressed by gzip.

    o gzcat filename --- lets you look at a gzipped file without actually having to gunzip it (same as gunzip -c). You can even print it directly, using gzcat filename | lpr

    Printing

    o lpr filename --- print. Use the -P option to specify the printer name if you want to use a printer other than your default printer. For example, if you want to print double-sided, use 'lpr -Pvalkyr-d', or if you're at CSLI, you may want to use 'lpr -Pcord115-d'. See 'help printers' for more information about printers and their locations.

    o lpq --- check out the printer queue, e.g. to get the number needed for removal, or to see how many other files will be printed before yours will come out

    o lprm jobnumber --- remove something from the printer queue. You can find the job number by using lpq. Theoretically you also have to specify a printer name, but this isn't necessary as long as you use your default printer in the department.

    o genscript --- converts plain text files into postscript for printing, and gives you some options for formatting. Consider making an alias like alias ecop 'genscript -2 -r \!* | lpr -h -Pvalkyr' to print two pages on one piece of paper.

    o dvips filename --- print .dvi files (i.e. files produced by LaTeX). You can use dviselect to print only selected pages

    Directories

    Directories, like folders on a Macintosh, are used to group files together in a hierarchical structure.

    mkdir dirname --- make a new directory

    cd dirname --- change directory. You basically 'go' to another directory, and you will see the files in that directory when you do 'ls'. You always start out in your 'home directory', and you can get back there by typing 'cd' without arguments. 'cd ..' will get you one level up from your current position. You don't have to walk along step by step - you can make big leaps or avoid walking around by specifying pathnames.

    pwd --- tells you where you currently are.

  • Finding things

    ff --- find files anywhere on the system. This can be extremely useful if you've forgotten in which directory you put a file, but do remember the name. In fact, if you use ff -p you don't even need the full name, just the beginning. This can also be useful for finding other things on the system, e.g. documentation.

    grep string filename(s) --- looks for the string in the files. This can be useful for a lot of purposes, e.g. finding the right file among many, figuring out which is the right version of something, and even doing serious corpus work. grep comes in several varieties (grep, egrep, and fgrep) and has a lot of very flexible options; some examples are given below. Check out the man pages if this sounds good to you.
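    Some simple grep examples (notes.txt and the .log files are hypothetical):

    $ grep -i unix notes.txt    # case-insensitive search for "unix"
    $ grep -c error *.log       # count matching lines in each .log file
    $ grep unix *.txt | more    # page through the matches found across several files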

    About other people

    w --- tells you who's logged in, and what they're doing. Especially useful: the 'idle' part. This allows you to see whether they're actually sitting there typing away at their keyboards right at the moment.

    who --- tells you who's logged on, and where they're coming from. Useful if you're looking for someone who's actually physically in the same building as you, or in some other particular location.

    finger username --- gives you lots of information about that user, e.g. when they last read their mail and whether they're logged in. Often people put other practical information, such as phone numbers and addresses, in a file called .plan. This information is also displayed by 'finger'.

    last -1 username --- tells you when the user last logged on and off and from where. Without any options, last will give you a list of everyone's logins.

    talk username --- lets you have a (typed) conversation with another user

    write username --- lets you exchange one-line messages with another user

  • elm --- lets you send e-mail messages to people around the world (and, of course, read them). It's not the only mailer you can use, but the one we recommend.

    About your (electronic) self

    whoami --- returns your username. Sounds useless, but isn't. You may need to find out who it is who forgot to log out somewhere, and make sure *you* have logged out.

    finger & .plan files --- of course you can finger yourself, too. That can be useful e.g. as a quick check whether you got new mail. Try to create a useful .plan file soon. Look at other people's .plan files for ideas. The file needs to be readable for everyone in order to be visible through 'finger'. Do 'chmod a+r .plan' if necessary. You should realize that this information is accessible from anywhere in the world, not just to other people on turing.

    passwd --- lets you change your password, which you should do regularly (at least once a year).

    ps -u yourusername --- lists your processes. Contains lots of information about them, including the process ID, which you need if you have to kill a process. Normally, when you have been kicked out of a dialin session or have otherwise managed to get yourself disconnected abruptly, this list will contain the processes you need to kill. Those may include the shell (tcsh or whatever you're using), and anything you were running, for example emacs or elm. Be careful not to kill your current shell - the one with the number closer to the one of the ps command you're currently running. But if it happens, don't panic. Just try again :) If you're using an X-display you may have to kill some X processes before you can start them again. These will show only when you use ps -efl, because they're root processes.

  • kill PID --- kills (ends) the processes with the ID you gave. This works only for your own processes, of course. Get the ID by using ps. If the process doesn't 'die' properly, use the option -9. But attempt without that option first, because it doesn't give the process a chance to finish possibly important business before dying. You may need to kill processes for example if your modem connection was interrupted and you didn't get logged out properly, which sometimes happens.

    quota -v --- show what your disk quota is (i.e. how much space you have to store files), how much you're actually using, and in case you've exceeded your quota (which you'll be given an automatic warning about by the system) how much time you have left to sort them out (by deleting or gzipping some, or moving them to your own computer).

    du filename --- shows the disk usage of the files and directories in filename (without argument the current directory is used). du -s gives only a total.

    last yourusername --- lists your last logins. Can be a useful memory aid for when you were where, how long you've been working for, and keeping track of your phonebill if you're making a non-local phonecall for dialling in.
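    Returning to ps and kill from the list above, a typical clean-up after a broken dial-in session might look like this (the username and PID are hypothetical):

    $ ps -u will      # list will's processes and their PIDs
    $ kill 2345       # ask the leftover process to terminate
    $ kill -9 2345    # only if it refuses to die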

    Connecting to the outside world

    nn --- allows you to read news. It will first let you read the news local to turing, and then the remote news. If you want to read only the local or remote news, you can use nnl or nnr, respectively. To learn more about nn type nn, then ':man', then '=.*', then 'Z', then hit the space bar to step through the manual.

    rlogin hostname --- lets you connect to a remote host

    telnet hostname --- also lets you connect to a remote host. Use rlogin whenever possible.

  • ftp hostname --- lets you download files from a remote host which is set up as an ftp-server. This is a common method for exchanging academic papers and drafts. If you need to make a paper of yours available in this way, you can (temporarily) put a copy in /user/ftp/pub/TMP. For more permanent solutions, ask Emma. The most important commands within ftp are get for getting files from the remote machine, and put for putting them there (mget and mput let you specify more than one file at once). Sounds straightforward, but be sure not to confuse the two, especially when your physical location doesn't correspond to the direction of the ftp connection you're making. ftp just overwrites files with the same filename. If you're transferring anything other than ASCII text, use binary mode.

    lynx --- lets you browse the web from an ordinary terminal. Of course you can see only the text, not the pictures. You can type any URL as an argument to the G command. When you're doing this from any Stanford host you can leave out the .stanford.edu part of the URL when connecting to Stanford URLs. Type H at any time to learn more about lynx, and Q to exit.

    Miscellaneous tools

    webster word --- looks up the word in an electronic version of Webster's dictionary and returns the definition(s).

    date --- shows the current date and time.

    cal --- shows a calendar of the current month. Use e.g. 'cal 10 1995' to get that for October 95, or 'cal 1995' to get the whole year.

    You can find out more about these commands by looking up their man pages:

    man commandname --- shows you the manual page for the command.

    Starting and Ending

    login: `Logging in'
    telnet: Connect to another machine
    logout: `Logging out'

    File Management

    emacs: `Using the emacs text editor'
    mkdir: `Creating a directory'
    cd: `Changing your current working directory'
    ls: `Finding out what files you have'
    cp: `Making a copy of a file'
    mv: `Changing the name of a file'
    rm: `Getting rid of unwanted files'
    chmod: `Controlling access to your files'
    cmp: Comparing two files
    wc: Word, line, and character count
    compress: Compress a file

    Communication

    e-mail: `Sending and receiving electronic mail'
    talk: Talk to another user
    write: Write messages to another user
    ftp: `Transferring files with ftp'

    Information

    man: Manual pages
    quota -v: Finding out your available disk space quota
    ical: `Using the Ical personal organizer'
    finger: Getting information about a user
    passwd: Changing your password
    who: Finding out who's logged on

    Printing

    lpr: `Printing'
    lprm: Removing a print job
    lpq: Checking the print queues

    Job control

    ps: `Finding your processes'
    kill: `Killing a process'
    nohup: Continuing a job after logout
    nice: Changing the priority of a job
    &: `What is a background process?'
    Ctrl-z: Suspending a process
    fg: `Resuming a suspended process'

    SYNOPSIS

    wc [OPTION]... [FILE]...

    DESCRIPTION

    Print byte, word, and newline counts for each FILE, and a total line if more than one FILE is specified. With no FILE, or when FILE is -, read standard input.

    -c, --bytes            print the byte counts
    -m, --chars            print the character counts
    -l, --lines            print the newline counts
    -L, --max-line-length  print the length of the longest line
    -w, --words            print the word counts
    --help                 display this help and exit
    --version              output version information and exit
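    A few illustrative invocations (the file name is hypothetical):

    wc -l notes.txt
    wc -w -m notes.txt
    who | wc -l

    The first prints the number of lines in notes.txt, the second its word and character counts, and the third counts how many users are currently logged in.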

    ls: `Finding out what files you have'

    Once you log in to your account, you can see what files you have by using the ls command:

    % ls

    This command will give you a list of all the names of your files. If you just got your account, you won't see any names until you actually create some files.

    Here's an example. Suppose you type an ls command and it produces this output:

    % ls
    groceries   schedule   todo

    cp: `Making a copy of a file'

    If you want to make an exact copy of the contents of a file, figure out what you want to name the copy, and then type the command

    % cp oldname newname

    where oldname is the name of the existing file you want to copy. A new file will be created with the name newname and the same contents as the original.
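    Continuing the earlier listing, a session along these lines might look as follows (the name todo.backup is just an illustration):

    % cp todo todo.backup
    % ls
    groceries   schedule   todo   todo.backup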

    Grep

    Introduction -grep

    grep searches the input files for lines containing a match to a given pattern list. When it finds a match in a line, it copies the line to standard output (by default), or does whatever other sort of output you have requested with options.

    Though grep expects to do the matching on text, it has no limits on input line length other than available memory, and it can match arbitrary characters within a line. If the final byte of an input file is not a newline, grep silently supplies one. Since newline is also a separator for the list of patterns, there is no way to match newline characters in a text.

    Invoking grep

    grep comes with a rich set of options from POSIX.2 and GNU extensions.

    `-c' `--count'
        Suppress normal output; instead print a count of matching lines for each input file. With the `-v', `--invert-match' option, count non-matching lines.

    `-e pattern' `--regexp=pattern'
        Use pattern as the pattern; useful to protect patterns beginning with a `-'.

    `-f file' `--file=file'
        Obtain patterns from file, one per line. The empty file contains zero patterns, and therefore matches nothing.

    `-i' `--ignore-case'
        Ignore case distinctions in both the pattern and the input files.

    `-l' `--files-with-matches'
        Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning of every file will stop on the first match.

    `-n' `--line-number'
        Prefix each line of output with the line number within its input file.

    `-o' `--only-matching'
        Print only the part of matching lines that actually matches the pattern.

    `-q' `--quiet' `--silent'
        Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected. Also see the `-s' or `--no-messages' option.

    `-s' `--no-messages'
        Suppress error messages about nonexistent or unreadable files. Portability note: unlike GNU grep, traditional grep did not conform to POSIX.2, because traditional grep lacked a `-q' option and its `-s' option behaved like GNU grep's `-q' option. Shell scripts intended to be portable to traditional grep should avoid both `-q' and `-s' and should redirect output to `/dev/null' instead.

    `-v' `--invert-match'
        Invert the sense of matching, to select non-matching lines.

    `-x' `--line-regexp'
        Select only those matches that exactly match the whole line.
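    A few examples of these options in combination; the file names are hypothetical:

    grep -in 'todo' notes.txt
    grep -c 'main' prog.c
    grep -vc 'main' prog.c
    grep -l 'printf' *.c

    The first prints the matching lines of notes.txt with their line numbers, ignoring case; the second counts the lines of prog.c that mention main; the third counts the lines that do not; and the last prints only the names of the C files that contain printf.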

    grep programs

    grep searches the named input files (or standard input if no files are named, or the file name `-' is given) for lines containing a match to the given pattern. By default, grep prints the matching lines. There are four major variants of grep, controlled by the following options.

    `-G' `--basic-regexp'
        Interpret the pattern as a basic regular expression. This is the default.

    `-E' `--extended-regexp'
        Interpret the pattern as an extended regular expression.

    `-F' `--fixed-strings'
        Interpret the pattern as a list of fixed strings, separated by newlines, any of which is to be matched.

    `-P' `--perl-regexp'
        Interpret the pattern as a Perl regular expression.

    In addition, two variant programs egrep and fgrep are available. egrep is the same as `grep -E'. fgrep is the same as `grep -F'.
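    To illustrate the difference between the variants, consider these commands on some hypothetical files:

    grep 'a.c' data.txt
    grep -F 'a.c' data.txt
    grep -E '(cat|dog)s?' pets.txt

    In the first command the `.' is a metacharacter, so lines containing `abc' or `a-c' match; in the second the pattern is a fixed string, so only the literal text `a.c' matches; the third uses extended syntax, where grouping, alternation and `?' can be written without backslashes.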

    Regular Expressions

    A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions. grep understands two different versions of regular expression syntax: "basic" (BRE) and "extended" (ERE). In GNU grep, there is no difference in available functionality using either syntax. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards.

    The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash.

    The period `.' matches any single character.

    A regular expression may be followed by one of several repetition operators:

    `?'      The preceding item is optional and will be matched at most once.
    `*'      The preceding item will be matched zero or more times.
    `+'      The preceding item will be matched one or more times.
    `{n}'    The preceding item is matched exactly n times.
    `{n,}'   The preceding item is matched n or more times.
    `{n,m}'  The preceding item is matched at least n times, but not more than m times.

    Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated subexpressions.

    Two regular expressions may be joined by the infix operator `|'; the resulting regular expression matches any string matching either subexpression.

    Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole subexpression may be enclosed in parentheses to override these precedence rules.
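    Putting the operators together, some hedged examples (the file names are invented):

    grep -E 'colou?r' doc.txt
    grep -E '[0-9]{3}-[0-9]{4}' numbers.txt
    grep -E '(ab)+c' data.txt

    The first matches both `color' and `colour'; the second matches three digits, a dash and four digits; the third matches one or more repetitions of `ab' followed by `c'.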

    Usage

    Here is an example shell command that invokes GNU grep:

    grep -i 'hello.*world' menu.h main.c

    This lists all lines in the files `menu.h' and `main.c' that contain the string `hello' followed by the string `world'; this is because `.*' matches zero or more characters within a line. See the section on Regular Expressions. The `-i' option causes grep to ignore case, causing it to match the line `Hello, world!', which it would not otherwise match. See Invoking grep for more details about how to invoke grep.

    Here are some common questions and answers about grep usage.

    1. How can I list just the names of matching files?

    grep -l 'main' *.c

    lists the names of all C files in the current directory whose contents mention `main'.

    2. How do I search directories recursively?

    grep -r 'hello' /home/gigi

    searches for `hello' in all files under the directory `/home/gigi'. For more control over which files are searched, use find, grep and xargs. For example, the following command searches only C files:

    find /home/gigi -name '*.c' -print | xargs grep 'hello' /dev/null

    This differs from the command:

    grep -r 'hello' *.c

    which merely looks for `hello' in all files in the current directory whose names end in `.c'. Here the `-r' is probably unnecessary, as recursion occurs only in the unlikely event that one of the `.c' files is a directory.

    3. What if a pattern has a leading `-'?

    grep -e '--cut here--' *

    searches for all lines matching `--cut here--'. Without `-e', grep would attempt to parse `--cut here--' as a list of options.

    4. Suppose I want to search for a whole word, not a part of a word?

    grep -w 'hello' *

    searches only for instances of `hello' that are entire words; it does not match `Othello'. For more control, use `\<' and `\>' to match the start and end of words. For example:

    grep 'hello\>' *

    searches only for words ending in `hello', so it matches the word `Othello'.

    5. How do I output context around the matching lines?

    grep -C 2 'hello' *

    prints two lines of context around each matching line.

    6. How do I force grep to print the name of the file?

    Append `/dev/null':

    grep 'eli' /etc/passwd /dev/null

    gets you:

    /etc/passwd:eli:DNGUTF58.IMe.:98:11:Eli Smith:/home/do/eli:/bin/bash

    7. Why do people use strange regular expressions on ps output?

    ps -ef | grep '[c]ron'

    If the pattern had been written without the square brackets, it would have matched not only the ps output line for cron, but also the ps output line for grep itself. Note that on some platforms ps limits its output to the width of the screen; grep has no limit on the length of a line other than the available memory.

    8. Why does grep report "Binary file matches"?

    If grep listed all matching "lines" from a binary file, it would probably generate output that is not useful, and it might even muck up your display. So GNU grep suppresses output from files that appear to be binary files. To force GNU grep to output lines even from files that appear to be binary, use the `-a' or `--binary-files=text' option. To eliminate the "Binary file matches" messages, use the `-I' or `--binary-files=without-match' option.

    9. Why doesn't `grep -lv' print nonmatching file names?

    `grep -lv' lists the names of all files containing one or more lines that do not match. To list the names of all files that contain no matching lines, use the `-L' or `--files-without-match' option.

    10. I can do OR with `|', but what about AND?

    grep 'paul' /etc/motd | grep 'franc,ois'

    finds all lines that contain both `paul' and `franc,ois'.

    11. How can I search in both standard input and in files?

    Use the special file name `-':

    cat /etc/passwd | grep 'alain' - /etc/motd

    12. How do I express palindromes in a regular expression?

    It can be done using back references; for example, a palindrome of 5 characters can be written as a BRE:

    grep -w -e '\(.\)\(.\).\2\1' file

    It matches the word "radar" or "civic".

    Guglielmo Bondioni proposed a single RE that finds all the palindromes up to 19 characters long.

    egrep -e '^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?).?\9\8\7\6\5\4\3\2\1$' file

    Note that this relies on GNU ERE extensions (back references in an ERE), so it might not be portable to other greps.

    13. Why does my expression with the vertical bar fail?

    /bin/echo "ba" | egrep '(a)\1|(b)\1'

    The first alternate branch fails, so the first group does not participate in the match; this makes the back reference in the second alternate branch fail as well. By contrast, "aaba" will match: the first group participates in the match and can be reused in the second branch.

    14. What do grep, fgrep, and egrep stand for?

    grep comes from the way line editing was done on Unix. For example, ed uses this syntax to print a list of matching lines on the screen.

    global/regular expression/print g/re/p

    fgrep stands for Fixed grep, egrep Extended grep.

    CHAPTER-3

    Fundamentals of UNIX shell programming

    The Shell's Responsibilities

    Now you know that the shell analyzes each line you type in and initiates execution of the selected program. But the shell also has other responsibilities, which are described in the sections that follow.

    Program Execution

    The shell is responsible for the execution of all programs that you request from your terminal.

    Each time you type in a line to the shell, the shell analyzes the line and then determines what to do. As far as the shell is concerned, each line follows the same basic format:

    program-name arguments

    The line that is typed to the shell is known more formally as the command line. The shell scans this command line and determines the name of the program to be executed and what arguments to pass to the program.

    Figure 3.8 The shell's responsibilities.

    The shell uses special characters to determine where the program name starts and ends, and where each argument starts and ends. These characters are collectively called whitespace characters, and are the space character, the horizontal tab character, and the end-of-line character, known more formally as the newline character. Multiple occurrences of whitespace characters are simply ignored by the shell. When you type the command

    mv tmp/mazewars games

    the shell scans the command line and takes everything from the start of the line to the first whitespace character as the name of the program to execute: mv. The set of characters up to the next whitespace character is the first argument to mv: tmp/mazewars. The set of characters up to the next whitespace character (known as a word to the shell), in this case the newline, is the second argument to mv: games. After analyzing the command line, the shell then proceeds to execute the mv command, giving it the two arguments tmp/mazewars and games (see Figure 3.9).

    Figure 3.9 Execution of mv with two arguments.

    As mentioned, multiple occurrences of whitespace characters are ignored by the shell. This means that when the shell processes this command line:

    echo when do we eat?

    it passes four arguments to the echo program: when, do, we, and eat? (see Figure 3.10).

    Figure 3.10 Execution of echo with four arguments.

    Because echo takes its arguments and simply displays them at the terminal, separating each by a space character, the output from the following becomes easy to understand:

    $ echo when do we eat?
    when do we eat?
    $

    The fact is that the echo command never sees those blank spaces; they have been "gobbled up" by the shell. When we discuss quotes in Chapter 6, "Can I Quote You on That?," you'll see how you can include blank spaces in arguments to programs.

    We mentioned earlier that the shell searches the disk until it finds the program you want to execute and then asks the Unix kernel to initiate its execution. This is true most of the time. However, there are some commands that the shell knows how to execute itself. These built-in commands include cd, pwd, and echo. So before the shell goes searching the disk for a command, the shell first determines whether it's a built-in command, and if it is, the shell executes the command directly.

    Variable and Filename Substitution

    Like any other programming language, the shell lets you assign values to variables. Whenever you specify one of these variables on the command line, preceded by a dollar sign, the shell substitutes the value assigned to the variable at that point. This topic is covered in complete detail in Chapter 5, "And Away We Go."

    The shell also performs filename substitution on the command line. In fact, the shell scans the command line looking for filename substitution characters *, ?, or [...] before determining the name of the program to execute and its arguments. Suppose that your current directory contains the files as shown:

    $ ls
    mrs.todd   prog1   shortcut   sweeney
    $

    Now let's use filename substitution for the echo command:

    $ echo *                              List all files
    mrs.todd prog1 shortcut sweeney
    $

    How many arguments do you think were passed to the echo program, one or four? Because we said that the shell is the one that performs the filename substitution, the answer is four. When the shell analyzes the line

    echo *

    it recognizes the special character * and substitutes on the command line the names of all files in the current directory (it even alphabetizes them for you):

    echo mrs.todd prog1 shortcut sweeney

    Then the shell determines the arguments to be passed to the command. So echo never sees the asterisk. As far as it's concerned, four arguments were typed on the command line (see Figure 3.11).

    Figure 3.11 Execution of echo.

    I/O Redirection

    It is the shell's responsibility to take care of input and output redirection on the command line. It scans the command line for the occurrence of the special redirection characters <, >, or >>. When it finds the output redirection character >, the shell takes the next word on the command line as the name of the file that the output is to be redirected to; say that word is reminder. If reminder already exists and you have write access to it, the previous contents are lost (if you don't have write access to it, the shell gives you an error message).

    Before the shell starts execution of the desired program, it redirects the standard output of the program to the indicated file. As far as the program is concerned, it never knows that its output is being redirected. It just goes about its merry way writing to standard output (which is normally your terminal, you'll recall), unaware that the shell has redirected it to a file.
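    As a small sketch of output redirection (the file name userlist is invented):

    who > userlist
    date >> userlist
    wc -l < userlist

    The first command creates (or overwrites) userlist with the output of who, the second appends the current date to it because >> appends rather than overwrites, and the third makes wc read its standard input from the file.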

    Let's take another look at two nearly identical commands:

    $ wc -l users
          5 users
    $ wc -l < users
          5
    $

    In the first case, the shell analyzes the command line and determines that the name of the program to execute is wc and it is to be passed two arguments: -l and users (see Figure 3.12).

    Figure 3.12 Execution of wc -l users.

    When wc begins execution, it sees that it was passed two arguments. The first argument, -l, tells it to count the number of lines. The second argument specifies the name of the file whose lines are to be counted. So wc opens the file users, counts its lines, and then prints the count together with the filename at the terminal.

    Operation of wc in the second case is slightly different. The shell spots the input redirection character < when it scans the command line. The word that follows on the command line is the name of the file input is to be redirected from. Having "gobbled up" the < users from the command line, the shell then starts execution of the wc program, redirecting its standard input from the file users and passing it the single argument -l (see Figure 3.13).

    Figure 3.13 Execution of wc -l < users.

    When wc begins execution this time, it sees that it was passed the single argument -l. Because no filename was specified, wc takes this as an indication that the number of lines appearing on standard input is to be counted. So wc counts the number of lines on standard input, unaware that it's actually counting the number of lines in the file users. The final tally is displayed at the terminal, without the name of a file, because wc wasn't given one.

    The difference in execution of the two commands is important for you to understand. If you're still unclear on this point, review the preceding section.

    Pipeline Hookup

    Just as the shell scans the command line looking for redirection characters, it also looks for the pipe character |. For each such character that it finds, it connects the standard output from the command preceding the | to the standard input of the one following the |. It then initiates execution of both programs. So when you type

    who | wc -l

    the shell finds the pipe symbol separating the commands who and wc. It connects the standard output of the former command to the standard input of the latter, and then initiates execution of both commands. When the who command executes, it makes a list of who's logged in and writes the results to standard output, unaware that this is not going to the terminal but to another command instead.

    When the wc command executes, it recognizes that no filename was specified and counts the lines on standard input, unaware that standard input is not coming from the terminal but from the output of the who command.
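    The same counting idea works with other commands; a couple of hedged examples:

    ls | wc -l
    ps -u yourusername | wc -l

    The first counts the entries in the current directory; the second gives a rough count of your processes (rough because most versions of ps print a header line that gets counted too).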

    Environment Control

    The shell provides certain commands that let you customize your environment. Your environment includes your home directory, the characters that the shell displays to prompt you to type in a command, and a list of the directories to be searched whenever you request that a program be executed. You'll learn more about this in Chapter 11, "Your Environment."

    Interpreted Programming Language

  • The shell has its own built-in programming language. This language is interpreted, meaning that the shell analyzes each statement in the language one line at a time and then executes it. This differs from programming languages such as C and FORTRAN, in which the programming statements are typically compiled into a machine-executable form before they are executed.

    Programs developed in interpreted programming languages are typically easier to debug and modify than compiled ones. However, they usually take much longer to execute than their compiled equivalents.

    The shell programming language provides features you'd find in most other programming languages. It has looping constructs, decision-making statements, variables, and functions, and is procedure-oriented. Modern shells based on the IEEE POSIX standard have many other features including arrays, data typing, and built-in arithmetic operations.

    NAME

    test - check file types and compare values

    SYNOPSIS

    test EXPRESSION
    [ EXPRESSION ]
    test OPTION

    DESCRIPTION

    Exit with the status determined by EXPRESSION.

    --help display this help and exit

    --version output version information and exit

    EXPRESSION is true or false and sets exit status. It is one of:

    ( EXPRESSION ) EXPRESSION is true

    ! EXPRESSION EXPRESSION is false

    EXPRESSION1 -a EXPRESSION2 both EXPRESSION1 and EXPRESSION2 are true

    EXPRESSION1 -o EXPRESSION2 either EXPRESSION1 or EXPRESSION2 is true

    [-n] STRING the length of STRING is nonzero

    -z STRING the length of STRING is zero

    STRING1 = STRING2 the strings are equal

    STRING1 != STRING2 the strings are not equal

    INTEGER1 -eq INTEGER2 INTEGER1 is equal to INTEGER2

    INTEGER1 -ge INTEGER2 INTEGER1 is greater than or equal to INTEGER2

    INTEGER1 -gt INTEGER2 INTEGER1 is greater than INTEGER2

    INTEGER1 -le INTEGER2 INTEGER1 is less than or equal to INTEGER2

    INTEGER1 -lt INTEGER2 INTEGER1 is less than INTEGER2

    INTEGER1 -ne INTEGER2 INTEGER1 is not equal to INTEGER2

    FILE1 -ef FILE2 FILE1 and FILE2 have the same device and inode numbers

    FILE1 -nt FILE2 FILE1 is newer (modification date) than FILE2

    FILE1 -ot FILE2 FILE1 is older than FILE2

    -b FILE FILE exists and is block special

    -c FILE FILE exists and is character special

    -d FILE FILE exists and is a directory

    -e FILE FILE exists

    -f FILE FILE exists and is a regular file

    -g FILE FILE exists and is set-group-ID

    -h FILE FILE exists and is a symbolic link (same as -L)

    -G FILE FILE exists and is owned by the effective group ID

    -k FILE FILE exists and has its sticky bit set

    -L FILE FILE exists and is a symbolic link (same as -h)

    -O FILE FILE exists and is owned by the effective user ID

    -p FILE FILE exists and is a named pipe

    -r FILE FILE exists and is readable

    -s FILE FILE exists and has a size greater than zero

    -S FILE FILE exists and is a socket

    -t [FD] file descriptor FD (stdout by default) is opened on a terminal

    -u FILE FILE exists and its set-user-ID bit is set

    -w FILE FILE exists and is writable

    -x FILE FILE exists and is executable

    Beware that parentheses need to be escaped (e.g., by backslashes) for shells. INTEGER may also be -l STRING, which evaluates to the length of STRING.
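    A brief sketch of how test is typically used from a Bourne-style shell (the file names and messages are made up):

    test -f notes.txt && echo "notes.txt is a regular file"
    [ -d backups ] && echo "backups is a directory"

    test itself prints nothing; it only sets the exit status, which is why it is normally combined with && or used as the condition of an if statement. The `[ EXPRESSION ]' form is equivalent to writing test EXPRESSION.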

    1 Introduction

    In previous discussions we have talked about many of the facilities of the C shell, such as command aliasing, job control, etc. In addition, any collection of csh commands may be stored in a file, and csh can be invoked to execute the commands in that file. Such a file is known as a shell script file. The language used in that file is called shell script language. Like other programming languages it has variables and flow control statements (e.g. if-then-else, while, for, goto).

    In Unix there are several shells that can be used, the most commonly used being the C shell (csh and its extension, the T C shell tcsh) and the Bourne shell (sh and its extensions, the Bourne Again Shell bash and the highly programmable Korn shell ksh).

  • Note that you can run any shell simply by typing its name. For example, if I am now running csh and wish to switch to ksh, I simply type ksh, and a Korn shell will start up for me. All my commands from that point on will be read and processed by the Korn shell (though when I eventually want to log off, exiting the Korn shell will still leave me in the C shell, so I will have to exit from it too).

    2 Invoking Shell Scripts

    There are two ways to invoke a shell script file.

    2.1 Direct Interpretation

    In direct interpretation, the command

    csh filename [arg ...]

    invokes the program csh to interpret the script contained in the file `filename'.

    2.2 Indirect Interpretation

    In indirect interpretation, we must insert as the first line of the file

    #! /bin/csh

    or

    #! /bin/csh -f

    (there are situations in which this is not necessary, but it won't hurt to have it), and the file must be made executable using chmod (see previous discussion). Then it can be invoked in the same way as any other command, i.e., by typing the script file name on the command line.

    The -f option says that we want fast startup, which the shell achieves by not reading or executing the commands in .cshrc. Thus, for example, we won't have the `set' values for shell variables or the aliases from that file, but if we don't need them, this will be much faster (if we need a few of them, we can simply add them to the script file itself).
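    For example, assuming a csh script file called myscript (a hypothetical name) in a directory that is on your search path:

    csh myscript arg1 arg2
    chmod u+x myscript
    myscript arg1 arg2

    The first line runs the script by direct interpretation; the chmod needs to be done only once, after which the script can be invoked by name like any other command.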

    3 Shell Variables

  • Like other programming languages the csh language has variables. Some variables are used to control the operation of the shell, such as $path and $history, which we discussed earlier. Other variables can be created and used to control the operation of a shell script file.

    3.1 Setting Variables

    Values of shell variables are all character-based: A value is formally defined to be a list of zero or more elements, and an element is formally defined to be a character string. In other words, a shell variable consists of an array of strings.

    For example,

    set X

    will set the variable $X to have an empty list as its value. The command

    set V = abc

    will set V to have the string `abc' as its value. The command

    set V = (123 def ghi)

    will set V to a list of three elements, which are the strings `123', `def' and `ghi'.

    The several elements of a list can be treated like array elements. Thus for V in the last example above, $V[2] is the string `def'. We could change it, say to `abc', by the command

    set V[2] = abc

    3.2 Referencing and Testing Shell Variables

    The value of a shell variable can be referenced by placing a $ before the name of the variable. The command

    echo $path

  • will output the value of the variable $path. Or you can access the variable by enclosing the variable name in curly brace characters, and then prefixing it with a $. The command

    echo ${path}

    would have the same result as the last example. The second method is used when something is to be appended to the contents of the variable. For example, consider the commands

    set fname = prog1
    rm ${fname}.c

    These would delete the file `prog1.c'.

    To see how many elements are in a variable's list, we prefix the variable's name with `$#'. The command

    echo $#V

    above would print 3 to the screen, while

    echo $#path

    would reveal the number of directories in your search path.

    The @ command can be used for computations. For example, if you have shell variables $X and $Y, you can set a third variable $Z to their sum by

    @ Z = $X + $Y
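    A short sketch of @ in action (note the space after the @, which is required):

    % set X = 3
    % set Y = 5
    % @ Z = $X + $Y
    % echo $Z
    8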

    4 Command Arguments

  • Most commands have arguments (parameters), and these are accessible via the shell variable $argv. The first parameter will be $argv[1], the second $argv[2], and so on. You can also refer to them as $1, $2, etc. The number of such arguments (analogous to argc in the C language) is $#argv.

    For example, consider the following script file, say named Swap:

    #! /bin/csh -f

    # save a copy of the first file in a uniquely named temporary file ($$ is the process ID)
    set tmp = /tmp/swap$$
    cp $argv[1] $tmp
    cp $argv[2] $argv[1]
    cp $tmp $argv[2]
    rm $tmp

    This would do what its name implies, i.e. swap two files. If, say, I have files x and y, and I type

    Swap x y

    then the new contents of x would be what used to be y, and the new contents of y would be what used to be x.

    5 Language Constructs

    The shell script language, like other programming languages, has constructs for conditional execution (if-then-else; while), iterative execution (for loop), a switch statement, and a goto statement:

    1. i