COSC 3407: Operating Systems Lecture 5: Independent Vs. Cooperating Threads

This lecture…
– Thread Creation
– Why do we need to handle cooperating threads?
– Atomic operations

Thread Creation
Thread “fork” – create a new thread, three arguments:
– Pointer to application routine to execute (fcnPtr)
– Pointer to argument record (fcnArgPtr)
– Size of stack to allocate
Fork for threads is not the same as fork for processes.

Fork(fcnPtr, fcnArgPtr, StackSize)

[Figure: fcnPtr points to the application routine to run; fcnArgPtr points to its args]
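As a rough analogy for the Fork(fcnPtr, fcnArgPtr, StackSize) interface, the sketch below uses Python's standard threading module. The names fork, routine, and arg_record are hypothetical. Python sets the stack size for subsequently created threads rather than per call, so this only approximates the three-argument form described on the slide.

```python
import threading

def fork(routine, arg_record, stack_size=0):
    """Hypothetical analogue of Fork(fcnPtr, fcnArgPtr, StackSize)."""
    # stack_size=0 means "use the platform default"; other values must be
    # at least 32 KiB and may need to be a multiple of the page size.
    previous = threading.stack_size()
    threading.stack_size(stack_size)
    try:
        thread = threading.Thread(target=routine, args=(arg_record,))
        thread.start()   # returns at once: the caller does not wait
    finally:
        threading.stack_size(previous)   # restore the earlier setting
    return thread
```

The caller gets the thread object back immediately, which is what makes fork behave like an asynchronous procedure call.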

Thread Creation
Thread fork implementation:
– Sanity check arguments: check that the stack size is not infinite and that fcnPtr points to something well formed, i.e., valid code.
– Enter kernel mode (sanity check again).
– Allocate a stack and a new TCB, and initialize its register fields.
  » In particular, the stack pointer is made to point at the stack, the PC return address is made to point at an OS (assembler) routine ThreadRoot, and two of the registers are initialized to fcnPtr and fcnArgPtr.
– Put the newly allocated TCB on the ready list (Runnable). This will cause it to eventually be dispatched by run_new_thread and start running the routine ThreadRoot.

Thread Creation
ThreadRoot:
– Do start-up housekeeping (e.g., record start time, accounting information).
– Return to user mode.
– Call fcnPtr(fcnArgPtr).
– Do thread finish-up: call ThreadFinish.

[Figure: stack with ThreadRoot, fcnPtr, A, B]
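The ThreadRoot steps can be sketched at user level as a wrapper around the forked routine. This is only an illustration: the log list and thread_finish stub are hypothetical, and the "return to user mode" step has no counterpart in a user-level Python program.

```python
import threading
import time

log = []   # hypothetical accounting record

def thread_finish(name):
    # Stand-in for ThreadFinish: here it only records the event.
    log.append((name, "finished"))

def thread_root(fcn, fcn_arg):
    name = threading.current_thread().name
    # Start-up housekeeping: record the start time.
    log.append((name, "started", time.monotonic()))
    # (A kernel would return to user mode here.)
    fcn(fcn_arg)          # Call fcnPtr(fcnArgPtr)
    thread_finish(name)   # Thread finish-up
```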

Thread Finish
ThreadFinish:
– Put any threads waiting on the termination of this thread on the ready list.
– Can’t deallocate thread yet, since we’re still running on its stack. Record thread as “waitingToBeDestroyed”.
– Call run_new_thread to run another thread. ThreadHouseKeeping will examine waitingToBeDestroyed and deallocate the finished thread’s TCB and stack.

run_new_thread() {
    newThread = PickNewThread();
    switch(curThread, newThread);
    ThreadHouseKeeping();
}

[Figure: stack with ThreadRoot and ThreadFinish]

Additional Details
Thread fork is not the same thing as UNIX “fork”.

UNIX fork creates a new process, so it has to create a new address space, in addition to a new thread.

For now, don’t worry about how switching between different processes’ address spaces is done.

Thread fork is very much like an asynchronous procedure call – it means, go do this work, where the calling thread does not wait for the callee to complete.

Parent-Child relationship
Every thread (and/or process) has a parentage:
– A “parent” is a thread that creates another thread
– A child of a parent was created by that parent

[Figure: typical process tree for a Solaris system]

Thread Join
What if a thread wants to exit early?
– ThreadFinish() and exit() are essentially the same procedure entered at user level
What if the calling thread needs to wait?
– Thread Join – wait for a forked thread to finish.
– Calling thread will be taken off run queue and placed on waiting queue

ThreadJoin() system call
Where is a logical place to store this wait queue?
– On queue inside the TCB
Similar to wait() system call in UNIX
– Lets parents wait for child processes

[Figure: TCB for thread tid with a termination wait queue (Head, Tail) linking TCB9, TCB6, and TCB16; each TCB holds Other State, Link, and Registers]
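A toy model of the termination wait queue kept inside the TCB might look like the following. It is a sketch under assumptions: the TCB class, the state strings other than “waitingToBeDestroyed”, and the global ready_list are illustrative names, and no real scheduling takes place.

```python
from collections import deque

class TCB:
    def __init__(self, tid):
        self.tid = tid
        self.state = "ready"
        # Threads blocked in join, waiting for this thread to terminate.
        self.termination_waiters = deque()

ready_list = deque()

def thread_join(caller, target):
    """The running thread `caller` waits for `target` to finish."""
    if target.state == "waitingToBeDestroyed":
        return                       # already finished: nothing to wait for
    caller.state = "waiting"         # taken off the run queue
    target.termination_waiters.append(caller)

def thread_finish(thread):
    """Wake every joiner, then mark the thread for later deallocation."""
    while thread.termination_waiters:
        waiter = thread.termination_waiters.popleft()
        waiter.state = "ready"
        ready_list.append(waiter)
    thread.state = "waitingToBeDestroyed"
```

Keeping the queue in the target's TCB means ThreadFinish can find its waiters without searching a global structure.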

Use of Join for Traditional Procedure Call
Thus, a traditional procedure call is logically equivalent to doing a fork then immediately doing a join.

This is a normal procedure call (synchronous):

A() { B(); }
B() { }

The procedure A can also be implemented as:

A’() {
    Thread t = new Thread;
    t->Fork(B);
    t->Join();
}
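The same equivalence can be tried out with Python's threading module, where start() plays the role of Fork and join() the role of Join. The out list is a hypothetical device for observing the order of events.

```python
import threading

def B(out):
    out.append("B ran")

def A(out):
    B(out)                        # synchronous procedure call
    out.append("A continues")

def A_prime(out):
    t = threading.Thread(target=B, args=(out,))
    t.start()                     # Fork: B runs in a new thread
    t.join()                      # Join: wait until B has finished
    out.append("A continues")
```

Both versions leave the same trace in out, because the join forces A_prime to wait exactly where the ordinary call would have returned.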

Synchronous/Asynchronous Procedure calls
Why not replace synchronous with asynchronous procedure calls everywhere?
– Overhead: allocate TCB, allocate stack, startup overhead, context switch
– Crashing of a thread kills the process

Multithreading
Multiple activities within the same process
– Web server, database server
No protection!
Need coordination between threads
– Web crawler: multiple threads follow different links; make sure that different threads do not fetch the same page, as that wastes bandwidth and resources
– File system: make sure that different threads don’t grant the same lock to two different users

Multiprocessing vs. Multiprogramming
Multiprocessing = multiple CPUs
Multiprogramming = multiple jobs or processes
Definition of “run concurrently” – scheduler is free to run threads in any order (e.g., FIFO, random, etc.)

Multiprocessing vs. Multiprogramming
Dispatcher can choose to run each thread to completion, or time-slice in big chunks, or time-slice so that each thread executes only one instruction at a time (simulating a multiprocessor, where each CPU operates in lockstep).

If the dispatcher can do any of the above, programs must work under all cases, for all interleavings.

So how can you know if your concurrent program works? Whether all interleavings will work?

Definitions
Independent threads:
– No state shared with other threads
– Deterministic – input state determines result
– Reproducible – input state (I/O, memory, …) can be recreated
– Scheduling order doesn’t matter
Cooperating threads:
– Shared state
– Non-deterministic
– Non-reproducible
Non-reproducibility and non-determinism mean that bugs can be intermittent.
This makes debugging really hard!

Interactions Complicate Debugging
Is any program truly independent?
– Every process shares the file system, OS resources, network, etc.
– Extreme example: buggy device driver causes thread A to crash “independent thread” B
You probably don’t realize how much you depend on reproducibility:
– Example: Evil C compiler
  » Modifies files behind your back by inserting errors into C program unless you insert debugging code
– Example: Debugging statements can overrun stack
Non-deterministic errors are really difficult to find
– Example: Memory layout of kernel+user programs
  » depends on scheduling, which depends on timer/other things
  » Original UNIX had a bunch of non-deterministic errors
– Example: Something which does interesting I/O
  » User typing of letters used to help generate secure keys

Why allow cooperating threads?
People cooperate, and computers model people’s behavior, so computers at some level have to cooperate!
1. Share resources/information
– One computer, many users
– One bank balance, many ATMs
– Embedded systems (ex: robot control)
2. Speedup
– Overlap I/O and computation
– UNIX file system does read ahead
– Multiprocessors – chop up program into smaller pieces
3. Modularity
– Chop large problem up into simpler pieces. For example, to compile: gcc: cpp | cc1 | cc2 | as | ld
– This makes the system easier to extend; you can replace the assembler without changing the loader.

Some simple concurrent programs
Most of the time, threads are working on separate data, so scheduling order doesn’t matter.

Thread A        Thread B
x = 1           y = 2

What about? Initially, y = 12

Thread A        Thread B
x = 1           y = 2
x = y + 1       y = y * 2

What are the possible values for x after the above?

What are the possible values of x below?

Thread A        Thread B
x = 1           x = 2

Can’t say anything useful about a concurrent program without knowing what the underlying indivisible operations are!
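One way to explore these questions is to enumerate every interleaving mechanically. The sketch below assumes each statement is an indivisible step, which is exactly the kind of assumption the slide says must be stated. The helper names are hypothetical.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def final_x_values(thread_a, thread_b, initial):
    """Run every schedule on a fresh copy of the state; collect final x."""
    results = set()
    for schedule in interleavings(thread_a, thread_b):
        state = dict(initial)
        for step in schedule:
            step(state)
        results.add(state["x"])
    return results

# Each function is one statement, assumed to execute atomically.
def a1(s): s["x"] = 1
def a2(s): s["x"] = s["y"] + 1
def b1(s): s["y"] = 2
def b2(s): s["y"] = s["y"] * 2
def bx(s): s["x"] = 2

print(sorted(final_x_values([a1, a2], [b1, b2], {"y": 12})))  # [3, 5, 13]
print(sorted(final_x_values([a1], [bx], {})))                 # [1, 2]
```

Under that assumption the second program can leave x at 13, 3, or 5, depending on whether Thread A reads y before, between, or after Thread B's two assignments.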

Atomic operations
What we want is some way of allowing a thread to perform a task without having other threads interfere with the task.

Atomic operation: an operation that always runs to completion, or not at all.
– It is indivisible: it can’t be stopped in the middle, and its state can’t be modified by someone else during the operation.

On most machines, memory reference and assignment (i.e., load and store) of words are atomic.

Many instructions are not atomic.
– For example, on most 32-bit architectures, double precision floating point store is not atomic; it involves two separate memory operations.
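To see why a two-part store matters, the following sketch simulates a 64-bit double held in two 32-bit words and lets a reader observe memory between the two word stores. It is a simulation only: the memory list and helper names are hypothetical, and the little-endian word order is an assumption.

```python
import struct

def split_double(value):
    """Return the (low, high) 32-bit words of a 64-bit double."""
    return struct.unpack("<II", struct.pack("<d", value))

def join_words(low, high):
    """Reassemble a double from its two 32-bit words."""
    return struct.unpack("<d", struct.pack("<II", low, high))[0]

memory = list(split_double(1.0))    # two words currently holding 1.0

new_low, new_high = split_double(0.1)
memory[0] = new_low                 # first memory operation of the store
torn = join_words(*memory)          # a reader scheduled right here
memory[1] = new_high                # second memory operation of the store
final = join_words(*memory)

print(torn, final)
```

The reader sees a value that is neither the old 1.0 nor the new 0.1, because it combines the low word of one with the high word of the other.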

High-level Example: Web Server
Server must handle many requests
Non-cooperating version:

serverLoop() {
    con = AcceptCon();
    ProcessFork(ServiceWebPage, con);
}

What are some disadvantages of this technique?

Threaded Web Server
Now, use a single process
Multithreaded (cooperating) version:

serverLoop() {
    connection = AcceptCon();
    ThreadFork(ServiceWebPage, connection);
}

Looks almost the same, but has many advantages:
– Can share file caches kept in memory, results of CGI scripts, other things
– Threads are much cheaper to create than processes, so this has a lower per-request overhead
Question: would a user-level (say many-to-one) thread package make sense here?
– When one request blocks on disk, all block…
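A minimal sketch of the multithreaded version, with no real networking, is shown below. Connections are simulated as (path, responses) pairs, load_from_disk is a hypothetical stand-in for file I/O, and the lock guarding the shared cache is an assumption added here because the threads share state.

```python
import threading

file_cache = {}                  # shared by every thread in the process
cache_lock = threading.Lock()    # assumption: guards the shared state

def load_from_disk(path):
    # Hypothetical stand-in for reading the page from disk.
    return f"<html>{path}</html>"

def service_web_page(connection):
    path, responses = connection
    with cache_lock:
        if path not in file_cache:
            file_cache[path] = load_from_disk(path)
        responses.append(file_cache[path])

def server_loop(connections):
    threads = []
    for con in connections:      # stands in for repeated AcceptCon()
        t = threading.Thread(target=service_web_page, args=(con,))
        t.start()                # ThreadFork(ServiceWebPage, connection)
        threads.append(t)
    for t in threads:
        t.join()                 # wait so the caller can inspect the results
```

A second request for the same path is served from file_cache, which is the sharing that separate processes would not get for free.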

Thread Pools
Problem with previous version: Unbounded Threads
– When web-site becomes too popular – throughput sinks
Instead, allocate a bounded “pool” of threads, representing the maximum level of multiprogramming

master() {
    allocThreads(slave, queue);
    while (TRUE) {
        con = AcceptCon();
        Enqueue(queue, con);
        wakeUp(queue);
    }
}

slave(queue) {
    while (TRUE) {
        con = Dequeue(queue);
        if (con == null)
            sleepOn(queue);
        else
            ServiceWebPage(con);
    }
}

[Figure: Master Thread, queue, Thread Pool, Documents]
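The master/slave structure maps fairly directly onto Python's queue.Queue, whose blocking get() combines Dequeue and sleepOn, and whose put() combines Enqueue and wakeUp. In this sketch the None sentinel used to shut the pool down, the served queue, and the pool_size parameter are additions for illustration and are not part of the slide's pseudocode.

```python
import queue
import threading

def service_web_page(con, served):
    served.put(con)    # hypothetical stand-in for real request handling

def slave(work_queue, served):
    while True:
        con = work_queue.get()    # blocks while the queue is empty
        if con is None:           # sentinel: shut this worker down
            break
        service_web_page(con, served)

def master(connections, pool_size=3):
    work_queue = queue.Queue()
    served = queue.Queue()
    # Bounded pool: at most pool_size requests are in service at once.
    pool = [threading.Thread(target=slave, args=(work_queue, served))
            for _ in range(pool_size)]
    for t in pool:
        t.start()
    for con in connections:       # stands in for repeated AcceptCon()
        work_queue.put(con)       # Enqueue + wakeUp in one call
    for _ in pool:
        work_queue.put(None)      # one sentinel per worker
    for t in pool:
        t.join()
    results = []
    while not served.empty():
        results.append(served.get())
    return sorted(results)
```

For example, master(range(10)) services ten simulated connections with only three threads, however many requests arrive.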
