
Page 1: Multi-processor Scheduling

Multi-processor Scheduling

Two implementation choices:

  • Single, global ready queue
  • Per-processor run queue

Which is better?

Page 2: Multi-processor Scheduling

Queue-per-processor

Advantages of queue per processor:

  • Promotes processor affinity (better cache locality)
  • Removes a centralized bottleneck (which runs in global memory)

Supported by default in Linux 2.6

Java 1.6 support: a double-ended queue (java.util.Deque)

  • Use a bounded buffer per consumer
  • If nothing is in a consumer’s queue, steal work from somebody else
  • If too much is in the queue, push work somewhere else
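The stealing policy above can be sketched in plain Java. This is a single-threaded illustration of the policy only, with invented class and method names; a real scheduler would need a concurrent deque and proper synchronization.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Single-threaded sketch of the work-stealing policy described above.
// Worker B drains its own deque from the head; when it is empty, it
// steals from the tail of worker A's deque.
public class WorkStealingDemo {
    static Deque<Runnable> queueA = new ArrayDeque<>();  // worker A's queue
    static Deque<Runnable> queueB = new ArrayDeque<>();  // worker B's queue

    // Owner takes work from the head of its own queue.
    static Runnable take(Deque<Runnable> own) {
        return own.pollFirst();
    }

    // An idle worker steals from the tail of somebody else's queue,
    // staying out of the owner's way.
    static Runnable steal(Deque<Runnable> victim) {
        return victim.pollLast();
    }

    // Worker B runs until no work is left anywhere; returns tasks run.
    static int drain() {
        int executed = 0;
        while (true) {
            Runnable task = take(queueB);
            if (task == null) task = steal(queueA);  // nothing local: steal
            if (task == null) break;                 // no work anywhere
            task.run();
            executed++;
        }
        return executed;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) queueA.addLast(() -> { });
        System.out.println("worker B ran " + drain() + " tasks");
    }
}
```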

Page 3: Multi-processor Scheduling

Thread Implementation Issues

Andrew Whitaker

Page 4: Multi-processor Scheduling

Where do Threads Come From?

A few choices:

  • The operating system
  • A user-mode library
  • Some combination of the two…

Page 5: Multi-processor Scheduling

Option #1: Kernel Threads

Threads implemented inside the OS

  • Thread operations (creation, deletion, yield) are system calls
  • Scheduling handled by the OS scheduler

Described as “one-to-one”

  • One user thread mapped to one kernel thread
  • Every invocation of Thread.start() creates a kernel thread

[Figure: a process whose user threads map one-to-one onto OS threads]

Page 6: Multi-processor Scheduling

Option #2: User threads

Implemented as a library inside a process

  • All operations (creation, destruction, yield) are normal procedure calls

Described as “many-to-one”

  • Many user-perceived threads map to a single OS process/thread

[Figure: a process with many user threads multiplexed onto a single OS thread]

Page 7: Multi-processor Scheduling

Process Address Space Review

Every process has a user stack and a program counter

In addition, each process has a kernel stack and program counter (not shown here)

[Figure: process address space, from bottom to top: code (text segment), static data (data segment), heap (dynamically allocated memory), and stack; SP points into the stack and PC into the code]

Page 8: Multi-processor Scheduling

Threaded Address Space

Every thread always has its own user stack and program counter

  • For both user and kernel threads

For user threads, there is only a single kernel stack, program counter, PCB, etc.

[Figure: one user address space (for both user and kernel threads) containing code (text segment), static data (data segment), heap, and three per-thread stacks, with separate SP and PC values for threads T1, T2, and T3]

Page 9: Multi-processor Scheduling

User Threads vs. Kernel Threads

User threads are faster

  • Operations do not pass through the OS

But, user threads suffer from:

  • Lack of physical parallelism: they only run on a single processor!
  • Poor performance with I/O: a single blocking operation stalls the entire application

For these reasons, most (all?) major OSes provide some form of kernel threads

Page 10: Multi-processor Scheduling

When Would User Threads Be Useful?

  • The calculator?
  • The web server?
  • The Fibonacci GUI?

Page 11: Multi-processor Scheduling

Option #3: Two-level Model

OS supports native multi-threading

And, a user library maps multiple user threads to a single kernel thread

Described as “many-to-many”

  • Potentially captures the best of both worlds: cheap thread operations and parallelism

[Figure: a process with many user threads multiplexed onto several OS threads]

Page 12: Multi-processor Scheduling

Problems with Many-to-Many Threads

Lack of coordination between user and kernel schedulers: “the left hand not talking to the right”

Specific problems:

  • Poor performance, e.g., the OS preempts a thread holding a crucial lock
  • Deadlock: given K kernel threads, at most K user threads can block
    • Other runnable threads are starved out!

Page 13: Multi-processor Scheduling

Scheduler Activations, UW 1991

Add a layer of communication between kernel and user schedulers

Examples:

  • Kernel tells user mode that a task has blocked; the user scheduler can re-use this execution context
  • Kernel tells user mode that a task is ready to resume

Allows the user scheduler to alter the user-thread/kernel-thread mapping

Supported by the newest release of NetBSD

Page 14: Multi-processor Scheduling

Implementation Spilling Over into the Interface

In practice, programmers have learned to live with expensive kernel threads

For example, thread pools:

  • Re-use a static set of threads throughout the lifetime of the program
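As a sketch of the idea, Java's java.util.concurrent executors implement exactly this pattern: a fixed set of threads is created once and re-used for many tasks. The class name and counting helper below are invented for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A fixed pool of 4 kernel threads is created once and re-used for
// many short tasks. distinctThreads() reports how many worker threads
// actually ran the tasks; it can never exceed the pool size.
public class PoolDemo {
    static int distinctThreads(int tasks) throws InterruptedException {
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> names.add(Thread.currentThread().getName()));
        }
        pool.shutdown();                           // accept no new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return names.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("100 tasks ran on " + distinctThreads(100) + " threads");
    }
}
```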

Page 15: Multi-processor Scheduling

Locks

Used for implementing critical sections

Modern languages (Java, C#) implicitly acquire and release locks

interface Lock {
    public void acquire();  // only one thread allowed between an
                            // acquire and a release
    public void release();
}

Page 16: Multi-processor Scheduling

Two Varieties of Locks

Spin locks

  • Threads busy-wait until the lock is freed
  • The thread stays in the ready/running state

Blocking locks

  • Threads yield the processor until the lock is freed
  • The thread transitions to the blocked state

Page 17: Multi-processor Scheduling

Why Use Spin Locks?

Spin locks can be faster

  • No context switching required

Sometimes, blocking is not an option

  • For example, in the kernel scheduler implementation

Spin locks are never used on a uniprocessor

Page 18: Multi-processor Scheduling

Bogus Spin Lock Implementation

class SpinLock implements Lock {
    private volatile boolean isLocked = false;

    public void acquire() {
        while (isLocked) { ; }  // busy wait
        isLocked = true;        // not atomic with the test above!
    }

    public void release() {
        isLocked = false;
    }
}

Multiple threads can acquire this lock! A thread can be preempted between observing isLocked == false and setting it to true.

Page 19: Multi-processor Scheduling

Hardware Support for Locking

Problem: lack of atomicity in testing and setting the isLocked flag

Solution: hardware-supported atomic instructions, e.g., atomic test-and-set

Java conveniently abstracts these primitives (AtomicInteger and friends)
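A minimal illustration of the semantics Java exposes: AtomicBoolean.getAndSet returns the previous value and installs the new one in a single indivisible step, which is exactly what the broken lock was missing. The demo class name is invented.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// getAndSet = atomic test-and-set: return the old value and store the
// new one as one indivisible operation.
public class TestAndSetDemo {
    public static void main(String[] args) {
        AtomicBoolean flag = new AtomicBoolean(false);
        boolean first  = flag.getAndSet(true);  // false: this caller "won"
        boolean second = flag.getAndSet(true);  // true: already taken
        System.out.println(first + " " + second);  // prints "false true"
    }
}
```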

Page 20: Multi-processor Scheduling

Corrected Spin Lock

class SpinLock implements Lock {
    private final AtomicBoolean isLocked = new AtomicBoolean(false);

    public void acquire() {
        // get the old value, set a new value, in one atomic step
        while (isLocked.getAndSet(true)) { ; }
    }

    public void release() {
        assert (isLocked.get() == true);
        isLocked.set(false);
    }
}
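To see the corrected lock in action, the hypothetical harness below (the lock is reproduced inline, minus the Lock interface, so the file is self-contained) uses it to protect a shared counter across two threads:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Two threads each increment a shared counter under the corrected spin
// lock; the final count is exact because increments never interleave.
public class SpinLockExample {
    static class SpinLock {
        private final AtomicBoolean isLocked = new AtomicBoolean(false);
        void acquire() { while (isLocked.getAndSet(true)) { ; } }
        void release() { isLocked.set(false); }
    }

    static final SpinLock lock = new SpinLock();
    static int counter = 0;

    static int run(int perThread) throws InterruptedException {
        counter = 0;
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                lock.acquire();
                counter++;          // critical section
                lock.release();
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("counter = " + run(100_000)); // always 200000
    }
}
```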

Page 21: Multi-processor Scheduling

Blocking Locks: Acquire Implementation

Atomically test-and-set the locked status

If the lock is already held:

  • Set the thread state to blocked
  • Add the PCB (task_struct) to a wait queue
  • Invoke the scheduler

Problem: must ensure thread-safe access to the wait queue!

Page 22: Multi-processor Scheduling

Disabling Interrupts

Prevents the processor from being interrupted

  • Serves as a coarse-grained lock

Must be used with extreme care

  • No I/O or timers can be processed while interrupts are disabled

Page 23: Multi-processor Scheduling

Thread-safe Blocking Locks

Atomically test-and-set the locked status

If the lock is already held:

  • Set the thread state to blocked
  • Disable interrupts
  • Add the PCB (task_struct) to a wait queue
  • Invoke the scheduler
  • The next task re-enables interrupts

Page 24: Multi-processor Scheduling

Disabling Interrupts on a Multiprocessor

Disabling interrupts can be done locally or globally (for all processors)

  • Global disabling is extremely heavyweight

Linux: spin_lock_irq

  • Disables interrupts on the local processor
  • Grabs a spin lock to lock out other processors

Page 25: Multi-processor Scheduling

Preview For Next Week

public class Example extends Thread {
    private static int x = 1;
    private static int y = 1;
    private static boolean ready = false;

    public static void main(String[] args) {
        Thread t = new Example();
        t.start();

        x = 2;
        y = 2;
        ready = true;
    }

    public void run() {
        while (!ready)
            Thread.yield();  // give up the processor
        System.out.println("x= " + x + " y= " + y);
    }
}

Page 26: Multi-processor Scheduling

What Does This Program Print?

Answer: it’s a race condition. Many different outputs are possible:

  • x=2, y=2
  • x=1, y=2
  • x=2, y=1
  • x=1, y=1
  • Or, the program may print nothing! The ready loop runs forever
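A hedged sketch of the standard remedy (presumably where the next lecture is headed): declaring the flag volatile makes the writes visible, so the loop is guaranteed to terminate, and by the Java Memory Model's happens-before rule for volatile the reader is also guaranteed to see x = 2. The class and method names below are invented for this sketch.

```java
// With `ready` declared volatile, the spawned thread must eventually
// see ready == true (loop terminates), and the volatile write to ready
// happens-before the read that exits the loop, publishing x = 2.
public class VolatileDemo {
    static volatile boolean ready = false;
    static int x = 1;
    static int seen = 0;

    static int observe() throws InterruptedException {
        ready = false;
        x = 1;
        Thread t = new Thread(() -> {
            while (!ready) Thread.yield();  // volatile read: must terminate
            seen = x;                       // guaranteed to see x == 2
        });
        t.start();
        x = 2;
        ready = true;                       // volatile write publishes x
        t.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("thread saw x = " + observe());
    }
}
```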