Duke Systems

CPS 210: Unix and All That

Jeff Chase, Duke University

http://www.cs.duke.edu/~chase/cps210


Unix: A lasting achievement?

“Perhaps the most important achievement of Unix is to demonstrate that a powerful operating system for interactive use need not be expensive…it can run on hardware costing as little as $40,000.”

The UNIX Time-Sharing System, D. M. Ritchie and K. Thompson, 1974

DEC PDP-11/24

http://histoire.info.online.fr/pdp11.html

Let’s pause a moment to reflect...

[Figure: core rate (SPECint), plotted on a log scale. From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006.]

Today Unix runs embedded in devices costing < $100.

Small is beautiful?

The UNIX Time-Sharing System, D. M. Ritchie and K. Thompson, 1974

[RT74]: historical hardware details

• [Ritchie/Thompson74] is the classic reference on Unix.

• In 1974, the advances we take for granted were in the future.

• They had to prove it on the hardware they had at the time.

• Many specific implementation choices have changed.

– 14-character file names

– assembly language C

– 7 protection bits on files

– i-numbers and i-list

– 512-byte blocks

– ppt is “paper tape”???

– vowel embargo

The UNIX Time-Sharing System, D. M. Ritchie and K. Thompson, 1974

Some lessons of history

• At the time it was created, Unix was the “simplest multi-user OS people could imagine.”

– It’s in the name: Unix vs. Multics

• Simple abstractions can deliver a lot of power.

– Many people have been inspired by the power of Unix.

• The community spent four decades making Unix complex again... but the essence is unchanged.

• Unix is a simple context in which to study core issues of classical OS design. “It’s in there.”

• Unix variants continue to be in wide use.

• They serve as a foundation for advances.

Abstraction

The UNIX Time-Sharing System, D. M. Ritchie and K. Thompson, 1974

Innovation

Simple?

• users

• files

• processes

• pipes (which “look like” files)

Users and files persist across reboots. They have symbolic names (you choose them) and internal IDs (the system chooses them).

Processes and pipes exist only within a running system, and they are transient: they disappear on a crash or reboot. They have internal IDs.

Unix supports dynamic creation and destruction of these objects. It manages the various name spaces. It has system calls to access these objects. It checks permissions.

Unix: some key concepts

• Names and namespaces

– directories and pathnames

– name tree and subtree grafting (mount)

– root directory and current directory

– path prefix list

– resolution

– links (aliases) and reference counting

• Access control by tags and labels

– inheritance of tags and labels

• Context manipulation

– fork vs. exec
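Links (aliases) and reference counting can be sketched as follows, assuming a POSIX system with a writable /tmp. A second name for the same file raises the inode's link count; unlink drops it, and the file is freed only when the last name is gone. The function name `link_count_demo` and the temporary paths are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the link count (st_nlink) of path, or -1 on error. */
static long link_count(const char *path) {
    struct stat sb;
    if (stat(path, &sb) == -1)
        return -1;
    return (long)sb.st_nlink;
}

/* Hypothetical demo: create a file, add an alias with link(), then unlink
   the original name. Records the link count after each step in counts[0..2].
   Returns 0 on success, -1 on failure. */
int link_count_demo(long counts[3]) {
    char orig[] = "/tmp/linkdemoXXXXXX";   /* mkstemp replaces the X's */
    char alias[64];
    int fd = mkstemp(orig);
    if (fd == -1)
        return -1;
    close(fd);
    snprintf(alias, sizeof alias, "%s.alias", orig);

    counts[0] = link_count(orig);          /* one name: count is 1 */
    if (link(orig, alias) == -1) {
        unlink(orig);
        return -1;
    }
    counts[1] = link_count(orig);          /* two names, one inode: count is 2 */
    unlink(orig);                          /* remove the original name */
    counts[2] = link_count(alias);         /* file survives under its alias: 1 */
    unlink(alias);                         /* last name gone: file is freed */
    return 0;
}
```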

Files: hierarchical name space

[Figure: a directory tree showing the root directory, user home directories, applications, etc., and a mount point where an external media volume or network storage is grafted into the tree.]

“Everything is a file”

[Figure: Venn diagram of “files”: regular files, directories, and special files.]

The UNIX Time-Sharing System, D. M. Ritchie and K. Thompson, 1974

File I/O

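#include <fcntl.h>    /* open, O_TRUNC, O_RDWR */
#include <stdio.h>    /* perror */
#include <stdlib.h>   /* exit */
#include <unistd.h>   /* read, write */
#define BUFSIZE 512   /* any buffer size works */

int main(void) {
    char buf[BUFSIZE]; int fd; ssize_t n;
    if ((fd = open("../zot", O_TRUNC | O_RDWR)) == -1) {  /* parenthesize the assignment */
        perror("open failed"); exit(1);
    }
    while ((n = read(0, buf, BUFSIZE)) > 0) {   /* stop at EOF (0) or error (-1) */
        if (write(fd, buf, n) != n) {           /* write only the n bytes read */
            perror("write failed"); exit(1);
        }
    }
    return n == 0 ? 0 : 1;   /* exit status: 0 = reached EOF, 1 = read error */
}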

Open files are named within the process by an integer file descriptor.

Pathnames may be relative to process current directory.

Process passes status back to parent on exit, to report success/failure.

Process does not specify current file offset: the system remembers it.

Standard descriptors (0, 1, 2) for input, output, error messages (stdin, stdout, stderr).
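The point that the system, not the program, remembers the current file offset can be sketched like this, assuming a POSIX system with a writable /tmp: each write advances the offset, and lseek(fd, 0, SEEK_CUR) asks the kernel where it stands. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: write two chunks without ever passing an offset, then
   ask the kernel where the offset is. Returns that offset, or -1 on error. */
off_t offset_after_writes(void) {
    char path[] = "/tmp/offsetdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                        /* name not needed; the fd keeps the file alive */

    off_t pos = -1;
    if (write(fd, "hello ", 6) == 6 &&   /* offset 0 -> 6 */
        write(fd, "world\n", 6) == 6)    /* offset 6 -> 12 */
        pos = lseek(fd, 0, SEEK_CUR);    /* query without moving: 12 */
    close(fd);
    return pos;
}
```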

“Components in context”

[Figure: a program executes on a thread within a context (domain).]

For our purposes, an operating system is a platform that supports protection and isolation: every component runs within a context. Program, context, and thread are OS abstractions.

A context defines an isolated sandbox for a running program, so that it can use only the data and resources that the OS grants it.

Running a program

“Unix Classic” simplifications

• Context == process == (1 VAS + 1 thread + ...)

• Each process runs exactly one program/component instance (at a time).

• IPC channels are pipes.

• All I/O is based on a simple common abstraction: file / stream.

[Figure: a program, with its code, constants, initialized data, imports/exports, symbols, and types/interfaces, running with its data in a context.]

The theater analogy

[Figure: the program is the script, the context (address space) is the stage, and threads perform the script on it.] [lpcox]

Running a program is like performing a play.

Processes and the kernel

Programs run as independent processes, each with its own data.

Threads enter the protected OS kernel through protected system calls to obtain OS services; the kernel reaches back into a process through upcalls (e.g., signals).

The protected OS kernel mediates access to shared resources.

Each process has a private virtual address space and one thread.

The kernel is a separate component/context with enforced modularity. The kernel syscall interface supports processes, files, pipes, and signals.
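An upcall can be sketched with a POSIX signal handler: the process registers a handler with sigaction, and when the signal is delivered the kernel diverts the thread into the handler. To stay self-contained, this sketch has the process signal itself with raise(SIGUSR1); the flag and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; volatile sig_atomic_t is the type C guarantees
   can be safely written from a signal handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_usr1(int signo) {
    (void)signo;
    got_signal = 1;              /* keep handlers tiny: just record the event */
}

/* Hypothetical demo: install the handler, then deliver SIGUSR1 to ourselves.
   Returns 1 if the upcall ran, 0 if not, -1 on error. */
int signal_upcall_demo(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;
    got_signal = 0;
    if (raise(SIGUSR1) != 0)     /* raise does not return until the handler has run */
        return -1;
    return got_signal;
}
```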

Enforced modularity

[Figure: two components in separate contexts, connected by a pipe (or other channel).]

An important theme from Monday’s class: by putting each component instance in a separate context, we can enforce modularity boundaries among components. Each component runs in a sandbox: they can interact only through pipes. Neither can access the internals of the other.
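Two contexts that interact only through a pipe can be sketched with POSIX pipe and fork: the parent creates the pipe, forks, and the child writes a message that the parent reads. Neither process touches the other's memory. The function name and buffer handling are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child sends msg through a pipe; the parent reads it
   into buf (NUL-terminated). Returns the byte count read, or -1 on error. */
ssize_t pipe_demo(const char *msg, char *buf, size_t bufsize) {
    int fds[2];                          /* fds[0] = read end, fds[1] = write end */
    if (bufsize == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: the writer */
        close(fds[0]);
        size_t len = strlen(msg);
        ssize_t w = write(fds[1], msg, len);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    close(fds[1]);                       /* parent: the reader; drop unused write end */
    size_t total = 0;
    ssize_t n;
    while (total < bufsize - 1 &&
           (n = read(fds[0], buf + total, bufsize - 1 - total)) > 0)
        total += (size_t)n;              /* read returns 0 (EOF) once the child exits */
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);               /* reap the child */
    return (ssize_t)total;
}
```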

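[Figure: layered view of a Unix system: the hardware at the center, surrounded by the kernel, surrounded by programs such as sh, who, a.out, date, wc, grep, ed, vi, ld, as, comp, cc, cpp, and nroff, and other application programs.]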

Unix defines uniform, modular ways to combine programs to build up more complex functionality.

A key idea: Unix pipes

[http://www.bell-labs.com/history/unix/philosophy.html]

Unix programming environment

[Figure: a program reads from stdin and writes to stdout.]

Standard unix programs read a byte stream from standard input (fd==0).

They write their output to standard output (fd==1).

That style makes it easy to combine simple programs using pipes or files.

If the parent sets it up, the program doesn’t even have to know.

Stdin or stdout might be bound to a file, pipe, device, or network socket.
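The idea that the parent sets up the bindings and the program doesn't even have to know can be sketched with dup2: the parent makes the child's fd 1 refer to a pipe, and the child just writes to fd 1. A real shell would call an exec* function at that point to run a program such as ls; here the child calls a local function instead to keep the sketch self-contained. All names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The "program": it knows only about fd 1, not where fd 1 leads. */
static void tool(void) {
    const char out[] = "written to fd 1\n";
    write(1, out, sizeof out - 1);
}

/* Hypothetical demo: run tool() in a child whose stdout is a pipe, and
   capture what it wrote into buf. Returns bytes captured, or -1 on error. */
ssize_t capture_stdout(char *buf, size_t bufsize) {
    int fds[2];
    if (bufsize == 0 || pipe(fds) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        dup2(fds[1], 1);       /* fd 1 now refers to the pipe's write end */
        close(fds[0]);
        close(fds[1]);         /* fd 1 still holds the pipe open */
        tool();                /* a shell would exec a program here instead */
        _exit(0);
    }
    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < bufsize - 1 &&
           (n = read(fds[0], buf + total, bufsize - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```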

Unix fork/exec/exit/wait syscalls

[Figure: the parent forks a child; the child context is initialized, the child execs a program and eventually exits; the parent waits and reaps the child’s exit status.]

int pid = fork(); Create a new process that is a clone of its parent.

exec*("program" [, argvp, envp]); Overlay the calling process with a new program, and transfer control to it.

exit(status); Exit with status, destroying the process. Note: this is not the only way for a process to exit!

int pid = wait*(&status); Wait for exit (or other status change) of a child, and “reap” its exit status. Note: the child may have exited before the parent calls wait!
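The fork/exec/exit/wait cycle can be sketched end to end, assuming /bin/sh exists: the child overlays itself with sh, which runs a command that exits with some status, and the parent waits and reaps that status. The function name `run_and_wait` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that execs "/bin/sh -c <cmd>", wait for it,
   and return its exit status (0..255), or -1 on error. */
int run_and_wait(const char *cmd) {
    pid_t pid = fork();                  /* clone the calling process */
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);  /* replace the child's program */
        _exit(127);                      /* reached only if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)  /* sleep until the child exits, then reap it */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_and_wait("exit 3") should return 3, the status the child's program passed to exit.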

Unix: users and their namespaces

• A unix system has a set of user accounts.

– identities, principals

– often correspond to real users, but not always

• Each account has a username.

– a human-readable character string: “chase”

– also called a symbolic name

• Each account has a userID.

– a number for internal use

• These namespaces are flat.

• The system keeps a bidirectional map:

– f(username) = userID, and the inverse map from userID to username
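The two directions of the username/userID map can be sketched with the POSIX lookups getpwnam and getpwuid. The example assumes the account database contains a root account with userID 0, as on conventional Unix systems; the wrapper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pwd.h>
#include <string.h>
#include <sys/types.h>

/* username -> userID; returns -1 if the name is unknown. */
long uid_of(const char *username) {
    struct passwd *pw = getpwnam(username);
    return pw ? (long)pw->pw_uid : -1;
}

/* userID -> username, copied into buf; returns 0 on success, -1 if unknown
   or if buf is too small. */
int name_of(uid_t uid, char *buf, size_t bufsize) {
    struct passwd *pw = getpwuid(uid);
    if (pw == NULL || strlen(pw->pw_name) >= bufsize)
        return -1;
    strcpy(buf, pw->pw_name);   /* copy out: pw may point at static storage */
    return 0;
}
```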

Principles of Computer System Design, Saltzer & Kaashoek, 2009

Protection Systems 101

[Figure: a reference monitor (example: the Unix kernel) guarding an isolation boundary.]

Labels and access control

[Figure: Alice logs in: login does fork, setuid(“alice”), exec to start her shell (uid=“alice”); the shell does fork/exec to run a tool, which calls creat(“foo”), write, close, so foo has owner=“alice”. Bob logs in the same way with setuid(“bob”); a tool run from his shell (uid=“bob”) calls open(“foo”) and read.]

Every file and every process is labeled/tagged with a user ID.

A process inherits its userID from its parent process.

A file inherits its owner userID from its creating process.

A privileged process may set its user ID.

Labels and access control

[Figure: a tool in Alice’s session (uid=“alice”) created foo with creat, write, close, so foo has owner=“alice”; a tool in Bob’s session (uid=“bob”) tries open(“foo”) and read.]

Should processes running with Bob’s userID be permitted to open file foo?

Every system defines rules for assigning security labels to subjects (e.g., Bob’s process) and objects (e.g., file foo).

Every system defines rules to compare the security labels to authorize attempted accesses.

Post-note

• We talked about access policy in vanilla Unix.

• The owner of a Unix file may tag it with additional status specifying access rights for subjects.

– Access types = {read, write, execute} [3 bits]

– Subject types = {owner, group, other/anyone} [one set of the 3 access bits for each]

– Whether, if the file is executed, the system should setuid the process to the userID of the file’s owner [1 bit]

– 10 bits total: (3x3)+1. Usually given in octal: e.g., “777” means all 9 access bits are set: anyone can r/w/x the file, but no setuid.

– It is a very simple form of an access control list (ACL). Later systems like AFS have richer ACLs.

• Unix provides a syscall and a shell command (chmod) for the owner to set the permission bits on each file (inode).

• “Group” was added later and is a little more complicated: a user may belong to multiple groups.
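The permission bits can be sketched with the POSIX chmod and stat calls, plus a small hypothetical helper that renders the nine access bits in ls style and a check for the setuid bit (S_ISUID). The temporary file in /tmp is an assumption of the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Render the 9 access bits as "rwxrwxrwx" (owner, group, other) into
   out[10]. Hypothetical helper name. */
void mode_string(mode_t mode, char out[10]) {
    const mode_t bits[9] = { S_IRUSR, S_IWUSR, S_IXUSR,
                             S_IRGRP, S_IWGRP, S_IXGRP,
                             S_IROTH, S_IWOTH, S_IXOTH };
    const char *letters = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? letters[i] : '-';
    out[9] = '\0';
}

/* 1 if executing the file would setuid to the owner's userID, else 0. */
int is_setuid(mode_t mode) {
    return (mode & S_ISUID) != 0;
}

/* Hypothetical demo: chmod a fresh file, then read the bits back with stat.
   Returns mode & 07777 (file-type bits dropped), or (mode_t)-1 on error. */
mode_t chmod_roundtrip(mode_t perms) {
    char path[] = "/tmp/permdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return (mode_t)-1;
    close(fd);
    struct stat sb;
    mode_t result = (mode_t)-1;
    if (chmod(path, perms) == 0 && stat(path, &sb) == 0)
        result = sb.st_mode & 07777;
    unlink(path);
    return result;
}
```

With these helpers, mode 0754 renders as rwxr-xr--: 7 = rwx for the owner, 5 = r-x for the group, 4 = r-- for others.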

Init and Descendants

The kernel “handcrafts” the initial process to run the “init” program.

Other processes descend from init, and also run as root, including user login guards.

After a user authenticates, login runs the user’s shell in a child process, using the setuid system call to give that process the user’s identity.

Children of the user shell inherit the user’s identity (uid).
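Inheritance of identity can be sketched with fork: the child checks that it runs with the same userID as its parent and that getppid names the process that forked it, and reports the result through its exit status. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a forked child has the parent's user ID and
   sees the parent as its parent process, 0 if not, -1 on error. */
int child_inherits_identity(void) {
    uid_t parent_uid = getuid();
    pid_t parent_pid = getpid();
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: same uid label, and its parent is the process that forked it. */
        int ok = (getuid() == parent_uid) && (getppid() == parent_pid);
        _exit(ok ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```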

Processes: A Closer Look

[Figure: a process descriptor (PCB) holding the user ID, process ID, parent PID, sibling links, children, and resources, linked to the process’s virtual address space and to its thread, whose stack lives in that address space.]

Each process has a thread bound to the VAS.

The thread has a stack addressable through the VAS.

The kernel can suspend/restart the thread wherever and whenever it wants.

The OS maintains some state for each process in the kernel’s internal data structures: a file descriptor table, links to maintain the process tree, and a place to store the exit status.
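The per-process kernel state listed above can be pictured as a C struct. This is a hypothetical, much-simplified sketch, not any real kernel's layout: the field names, the fixed-size descriptor table, and the child/sibling list representation are all assumptions.

```c
#include <stddef.h>

#define PROC_NOFILE 16   /* hypothetical per-process limit on open files */

struct file;             /* opaque: the kernel's open-file object */

/* Hypothetical, simplified process descriptor (PCB). */
struct proc {
    int pid;                             /* process ID */
    int uid;                             /* user ID label */
    struct proc *parent;                 /* parent PID is parent->pid */
    struct proc *children;               /* first child */
    struct proc *sibling;                /* next child of the same parent */
    struct file *fd_table[PROC_NOFILE];  /* file descriptor table: fd -> open file */
    int exit_status;                     /* kept here until the parent waits */
};

/* Link child into parent's list of children; the child inherits the uid. */
void proc_add_child(struct proc *parent, struct proc *child) {
    child->parent = parent;
    child->uid = parent->uid;            /* identity is inherited on fork */
    child->sibling = parent->children;   /* push onto the front of the list */
    parent->children = child;
}

/* Count the children of p by walking the sibling links. */
int proc_count_children(const struct proc *p) {
    int n = 0;
    for (const struct proc *c = p->children; c != NULL; c = c->sibling)
        n++;
    return n;
}
```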

The address space is a private name space for a set of memory segments used by the process.

The kernel must initialize the process memory for the program to run.

VAS example (32-bit)

[Figure: a 32-bit virtual address space from 0x0 to 0x7fffffff, holding text (code), static data, dynamic data (heap/BSS), and the stack, plus a reserved region.]

• An addressable array of bytes…

• Containing every instruction the process thread can execute…

• And every piece of data those instructions can read/write…

– i.e., read/write == load/store

• Partitioned into logical segments with distinct purpose and use.

• Every memory reference by a thread is interpreted in its VAS context.

– resolved to a location in machine memory

• A given address in different VAS may resolve to different locations.
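That last point can be sketched with fork, which gives the child a copy of the parent's VAS: both processes see a global variable at the same virtual address, yet a store in the child does not change the parent's copy. The variable and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 1;   /* same virtual address in parent and child */

/* Hypothetical demo. Returns the parent's value of counter after the child
   has stored 42 into its own copy, or -1 on error. Sets *same_addr to 1 if
   the child saw counter at the same virtual address as the parent. */
int vas_demo(int *same_addr) {
    uintptr_t parent_addr = (uintptr_t)&counter;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        counter = 42;                                   /* store into the child's VAS */
        _exit((uintptr_t)&counter == parent_addr ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    *same_addr = (WEXITSTATUS(status) == 0);
    return counter;   /* still 1: the child's store went to a different location */
}
```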

64 bytes: 3 ways

[Figure: the same region of memory starting at p, viewed as char p[] / char *p, as int p[] / int* p, and as char* p[] / char** p.]

Pointers (addresses) are 8 bytes on a 64-bit machine.
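The three views can be sketched as three pointer types aimed at the same memory, so that element i sits i × (element size) bytes past p. The counts in the comments assume a typical 64-bit (LP64) platform where int is 4 bytes and pointers are 8 bytes; other platforms give other numbers. Function names are hypothetical.

```c
#include <stddef.h>

#define BLOCK_BYTES 64

/* How many elements of each type fit in the same 64 bytes. */
size_t as_chars(void)    { return BLOCK_BYTES / sizeof(char); }    /* always 64 */
size_t as_ints(void)     { return BLOCK_BYTES / sizeof(int); }     /* 16 if int is 4 bytes */
size_t as_pointers(void) { return BLOCK_BYTES / sizeof(char *); }  /* 8 if pointers are 8 bytes */

/* The same memory, three views: p[1] lands at a different byte in each. */
void three_views(void *block, char **cp, int **ip, char ***pp) {
    *cp = (char *)block;    /* char p[]  / char *p  */
    *ip = (int *)block;     /* int p[]   / int *p   */
    *pp = (char **)block;   /* char *p[] / char **p */
}
```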

Alignment

[Figure: the same three views of memory at p, marking values that are and are not placed on aligned boundaries.]

Many machines require that an n-byte value be aligned on an n-byte boundary, where $n = 2^i$.
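Alignment as C exposes it can be sketched with C11 alignof and offsetof: the compiler pads a struct so each member starts at a multiple of its alignment, and for $n = 2^i$ an address is n-aligned exactly when its low i bits are zero. The struct and helper are hypothetical, and the offsets in the comments are typical values, not guarantees.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record mixing 1-, 4-, and 8-byte members. */
struct rec {
    char    c;   /* offset 0 */
    int32_t i;   /* starts at a multiple of alignof(int32_t), typically offset 4 */
    double  d;   /* starts at a multiple of alignof(double), typically offset 8 */
};

/* 1 if addr is a multiple of n, where n is a power of two (n = 2^i). */
int is_aligned(uintptr_t addr, size_t n) {
    return (addr & (n - 1)) == 0;   /* the low i bits must be zero */
}
```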

Heap allocation

[Figure: allocated heap blocks for structs or objects, placed on aligned boundaries (“Align!”).]

A contiguous chunk of memory is obtained from the OS kernel, e.g., with the Unix sbrk() system call.

A runtime library obtains the block and manages it as a “heap” for use by the programming language environment, to store dynamic objects, e.g., with the Unix malloc and free library calls.
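A runtime library carving a chunk into aligned heap blocks can be sketched as a bump allocator, under loud simplifying assumptions: a static array stands in for the chunk a real library would obtain from the kernel (e.g., via sbrk), every block is rounded up to a 16-byte alignment, and there is no free. Real malloc/free implementations are far more elaborate; all names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 4096
#define ALIGN 16                       /* power of two; covers common scalar types */

/* Stand-in for a chunk obtained from the kernel. */
static _Alignas(ALIGN) unsigned char chunk[CHUNK_SIZE];
static size_t next_free = 0;           /* offset of the first unused byte */

/* Round n up to a multiple of ALIGN. */
static size_t round_up(size_t n) {
    return (n + (ALIGN - 1)) & ~(size_t)(ALIGN - 1);
}

/* Hypothetical bump allocator: returns an ALIGN-aligned block of at least
   n bytes, or NULL if the chunk is exhausted. Blocks are never freed. */
void *bump_alloc(size_t n) {
    if (n == 0)
        n = 1;                         /* hand out a distinct block even for 0 */
    if (n > CHUNK_SIZE - next_free)    /* also keeps round_up from overflowing */
        return NULL;
    size_t size = round_up(n);
    if (size > CHUNK_SIZE - next_free)
        return NULL;
    void *p = &chunk[next_free];       /* next_free is always a multiple of ALIGN */
    next_free += size;
    return p;
}
```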

Alternative: block maps

[Figure: a map assembling one storage object from scattered fixed-size slots.]

The storage in a heap block is contiguous in the VAS. C and other PL environments require this.

That complicates the heap manager because the heap blocks may be different sizes.

Idea: use a level of indirection through a map to assemble a storage object from “scraps” of storage in different locations.

The “scraps” can be fixed-size slots: that makes allocation easy because they are interchangeable.

Example: page tables that implement a VAS.
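A block map can be sketched with tiny, hypothetical sizes: an object is assembled from fixed-size slots scattered through a pool, and a per-object map translates a logical byte offset into (slot, offset within slot), much as a page table translates a virtual address. All names and sizes are assumptions.

```c
#include <stddef.h>

#define SLOT_SIZE 8       /* hypothetical: tiny slots keep the example small */
#define NSLOTS 16
#define MAX_BLOCKS 4      /* maximum slots per object */

char pool[NSLOTS][SLOT_SIZE];          /* the storage "scraps" */

struct mapped_obj {
    int nblocks;                       /* how many slots the object uses */
    int map[MAX_BLOCKS];               /* logical block number -> physical slot */
};

/* Translate a logical byte offset in obj to the address of that byte,
   or NULL if the offset lies beyond the object's mapped blocks. */
char *obj_addr(const struct mapped_obj *obj, size_t off) {
    size_t blk = off / SLOT_SIZE;      /* which logical block */
    size_t within = off % SLOT_SIZE;   /* offset inside that block */
    if (blk >= (size_t)obj->nblocks)
        return NULL;
    return &pool[obj->map[blk]][within];
}
```

For instance, an object mapped to slots {5, 2} is logically contiguous (offsets 0 to 15) even though its storage is not.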

Indirection

Variable Partitioning

Variable partitioning is the strategy of parking differently sized cars along a street with no marked parking space dividers.

[Figure: differently sized cars (1, 2, 3) parked along the street, with unusable gaps between them.]

Wasted space: external fragmentation

Fixed Partitioning

Wasted space: internal fragmentation
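Internal fragmentation can be made concrete with made-up numbers: if storage comes only in fixed 64-byte slots, a 100-byte request needs ⌈100/64⌉ = 2 slots (128 bytes), so 28 bytes are wasted inside the allocation. The helper below does that arithmetic; its name and parameters are hypothetical.

```c
#include <stddef.h>

/* Bytes wasted inside the slots when a request of req bytes is served from
   fixed-size slots of slot bytes each (slot must be > 0). */
size_t internal_waste(size_t req, size_t slot) {
    size_t nslots = (req + slot - 1) / slot;   /* ceiling division */
    return nslots * slot - req;
}
```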

“Classic Linux Address Space”

http://duartes.org/gustavo/blog/category/linux


What’s in an Object File or Executable?

#include <unistd.h>   /* write */
int j = 327;
char* s = "hello\n";
char sbuf[512];
int p() { int k = 0; j = write(1, s, 6); return(j); }

• header: a “magic number” indicates the type of image.

• section table: an array of (offset, len, startVA) entries describing the program sections.

• text: program instructions (p)

• idata: immutable data (constants): “hello\n”

• wdata: writable global/static data: j, s

• symbol table: j, s, p, sbuf

• relocation records

The symbol table and relocation records are used by the linker; they may be removed after the final link step and strip.

A Peek Inside a Running Program

[Figure: the CPU, with registers R0 … Rn, PC, and SP, runs your program in an address space (virtual or physical) that spans from 0 to high addresses and holds the code library (common runtime), your program, your data, the heap, and the stack.]

Process Creation in Unix

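#include <stdio.h>      /* perror */
#include <stdlib.h>     /* exit */
#include <sys/wait.h>   /* wait */
#include <unistd.h>     /* fork */
pid_t pid; int status = 0;

if ((pid = fork()) == -1) {         /* fork failed: no child was created */
    perror("fork"); exit(1);
} else if (pid != 0) {              /* parent: fork returned the child's pid */
    /* ... */ pid = wait(&status);
} else {                            /* child: fork returned 0 */
    /* ... */ exit(status);
}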

Parent uses wait to sleep until the child exits; wait returns child pid and status.

Wait variants allow wait on a specific child, or notification of stops and other signals.

The fork syscall returns twice: it returns a zero to the child and the child process ID (pid) to the parent.
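The wait variants can be sketched with waitpid, which waits for one specific child, together with WIFEXITED/WEXITSTATUS to decode the status word. The parent starts two children with different exit codes and reaps the second one first; the first child may well have exited already, and its status simply waits to be reaped. Function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with code; return its pid, or -1. */
static pid_t spawn_exit(int code) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);          /* child: fork returned 0 */
    return pid;               /* parent: the child's pid (or -1 on failure) */
}

/* Hypothetical demo: start two children, reap the second one first by pid,
   then the first. Stores their exit statuses in codes[0] and codes[1].
   Returns 0 on success, -1 on error. */
int wait_specific(int codes[2]) {
    pid_t a = spawn_exit(7);
    pid_t b = spawn_exit(9);
    if (a == -1 || b == -1)
        return -1;
    int status;
    if (waitpid(b, &status, 0) != b || !WIFEXITED(status))  /* wait for b only */
        return -1;
    codes[1] = WEXITSTATUS(status);
    if (waitpid(a, &status, 0) != a || !WIFEXITED(status))  /* a may have exited earlier */
        return -1;
    codes[0] = WEXITSTATUS(status);
    return 0;
}
```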

The Shell

• Users may select from a range of interpreter programs available

– or even write their own (to add to the confusion)

– csh, sh, ksh, tcsh, bash: choose your flavor…

• Shells execute commands composed of program filenames, args, and I/O redirection symbols.

– Shells can run files of commands (scripts) for more complex tasks, e.g., by redirecting the shell’s stdin.

– A shell’s behavior is guided by environment variables.

– E.g., $PATH
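How a shell might use $PATH to find a command can be sketched like this: split the colon-separated list and take the first directory holding an executable file of that name, checked with access(..., X_OK). The function name is hypothetical, and real shells also handle empty entries, command hashing, and builtins, which this sketch ignores.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search the colon-separated directory list path for an executable named cmd.
   On success, writes "dir/cmd" into out and returns 0; otherwise returns -1. */
int path_lookup(const char *path, const char *cmd, char *out, size_t outsize) {
    const char *dir = path;
    while (*dir != '\0') {
        size_t len = strcspn(dir, ":");              /* length of this entry */
        int n = snprintf(out, outsize, "%.*s/%s", (int)len, dir, cmd);
        if (n > 0 && (size_t)n < outsize && access(out, X_OK) == 0)
            return 0;                                /* first match wins */
        dir += len;
        if (*dir == ':')
            dir++;                                   /* skip the separator */
    }
    return -1;
}
```

A shell-like caller would pass getenv("PATH") as the first argument; the order of directories in $PATH decides which program runs when two directories contain the same name.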

Using the shell

• Commands: ls, cat, and all that

• Current directory: cd and pwd

• Arguments: echo

• Signals: ctrl-c

• Job control, foreground, and background: &, ctrl-z, bg, fg

• Environment variables: printenv and setenv

• Most commands are programs: which, $PATH, and /bin

• Shells are commands: sh, csh, ksh, tcsh, bash

• Pipes and redirection: ls | grep a

• Files and I/O: open, read, write, lseek, close

• stdin, stdout, stderr

• Users and groups: whoami, sudo, groups