MULTITHREADING ALGORITHMS Juan Mendivelso

SERIAL ALGORITHMS & PARALLEL ALGORITHMS

Serial algorithms: suitable for running on a uniprocessor computer, in which only one instruction executes at a time.

Parallel algorithms: run on a multiprocessor computer that permits multiple instructions to execute concurrently.

PARALLEL COMPUTERS

Computers with multiple processing units. They can be:

Chip multiprocessors: inexpensive laptops and desktops. They contain a single multicore integrated circuit that houses multiple processor "cores," each of which is a full-fledged processor with access to a common memory.

Clusters: built from individual computers with a dedicated network interconnecting them. Intermediate price/performance.

Supercomputers: combinations of custom architectures and custom networks that deliver the highest performance (instructions per second). High price.

MODELS FOR PARALLEL COMPUTING

Although the random-access machine model was accepted early on for serial computing, no single model has been established for parallel computing.

A major reason is that vendors have not agreed on a single architectural model for parallel computers.

MODELS FOR PARALLEL COMPUTING

For example, some parallel computers feature shared memory, where all processors can access any location of memory.

Others employ distributed memory, where each processor has a private memory.

However, the trend appears to be toward shared-memory multiprocessors.

STATIC THREADING

Shared-memory parallel computers commonly use static threading: a software abstraction of "virtual processors," or threads, sharing a common memory.

Each thread can execute code independently.

For most applications, threads persist for the duration of a computation.

PROBLEMS OF STATIC THREADING

Programming a shared-memory parallel computer directly using static threads is difficult and error prone.

Dynamically partitioning the work among the threads so that each thread receives approximately the same load turns out to be complicated.

PROBLEMS OF STATIC THREADING

The programmer must use complex communication protocols to implement a scheduler to load-balance the work.

This has led to the creation of concurrency platforms. They provide a layer of software that coordinates, schedules and manages the parallel-computing resources.

DYNAMIC MULTITHREADING

A class of concurrency platform. It allows programmers to specify parallelism in applications without worrying about communication protocols, load balancing, and so on.

The concurrency platform contains a scheduler that load-balances the computation automatically.

DYNAMIC MULTITHREADING

It supports:

Nested parallelism: allows a subroutine to be spawned, so that the caller can proceed while the spawned subroutine is computing its result.

Parallel loops: regular for loops, except that the iterations can be executed concurrently.

ADVANTAGES OF DYNAMIC MULTITHREADING

The user only specifies the logical parallelism.

Simple extension of the serial model with the keywords parallel, spawn, and sync.

Clean way to quantify parallelism.

Many multithreaded algorithms involving nested parallelism follow naturally from the divide-and-conquer paradigm.

BASICS OF MULTITHREADING

Fibonacci example. The serial algorithm Fib(n) performs repeated work: it recomputes the same subproblems, which dominates its complexity. However, the recursive calls are independent, which leads to the parallel algorithm P-Fib(n).
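The recursion can be sketched in Python, emulating spawn with one thread per recursive call and sync with join. This only illustrates the logical parallelism; a real concurrency platform would schedule strands onto a fixed set of processors instead of creating a thread per spawn.

```python
import threading

def fib(n):
    # Serial version: the two recursive calls run one after the other,
    # and the same subproblems are recomputed over and over.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

def p_fib(n):
    # Parallel version: the recursive calls are independent, so the
    # first one is "spawned" while the caller computes the second.
    if n < 2:
        return n
    result = {}
    child = threading.Thread(target=lambda: result.update(x=p_fib(n - 1)))
    child.start()      # spawn: child may run concurrently with the parent
    y = p_fib(n - 2)
    child.join()       # sync: wait for the spawned child before using x
    return result['x'] + y

print(fib(10))    # 55
print(p_fib(10))  # 55
```

Deleting the thread bookkeeping (start/join) turns p_fib back into fib, which is exactly the serialization discussed next.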

SERIALIZATION

Concurrency keywords: spawn, sync and parallel

The serialization of a multithreaded algorithm is the serial algorithm that results from deleting the concurrency keywords.

NESTED PARALLELISM

It occurs when the keyword spawn precedes a procedure call.

It differs from an ordinary procedure call in that the procedure instance that executes the spawn (the parent) may continue to execute in parallel with the spawned subroutine (its child), instead of waiting for the child to complete.

KEYWORD SPAWN

It doesn’t say that a procedure must execute concurrently with its spawned children; only that it may!

The concurrency keywords express the logical parallelism of the computation.

At runtime, it is up to the scheduler to determine which subcomputations actually run concurrently by assigning them to processors.

KEYWORD SYNC

A procedure cannot safely use the values returned by its spawned children until after it executes a sync statement.

The keyword sync indicates that the procedure must wait until all its spawned children have been completed before proceeding to the statement after the sync.

Every procedure executes a sync implicitly before it returns.

COMPUTATIONAL DAG

We can see a multithread computation as a directed acyclic graph G=(V,E) called a computational dag.

The vertices are instructions and the edges represent dependencies between instructions, where (u,v) ∈ E means that instruction u must execute before instruction v.

COMPUTATIONAL DAG

If a chain of instructions contains no parallel control (no spawn, sync, or return), we may group them into a single strand; each strand thus represents one or more instructions.

Instructions involving parallel control are not included in strands, but are represented in the structure of the dag.

COMPUTATIONAL DAG

For example, if a strand has two successors, one of them must have been spawned, and a strand with multiple predecessors indicates the predecessors joined because of a sync.

Thus, in the general case, the set V forms the set of strands, and the set E of directed edges represents dependencies between strands induced by parallel control.

COMPUTATIONAL DAG

If G has a directed path from strand u to strand v, we say that the two strands are (logically) in series. Otherwise, strands u and v are (logically) in parallel.

We can picture a multithreaded computation as a dag of strands embedded in a tree of procedure instances.

Example!

COMPUTATIONAL DAG

We can classify the edges:

Continuation edges: connect a strand u to its successor u' within the same procedure instance.

Call edges: represent normal procedure calls.

Return edges: connect a strand u, when it returns to its calling procedure, to the strand x immediately following the next sync in that procedure.

A computation starts with an initial strand and ends with a single final strand.

IDEAL PARALLEL COMPUTER

A parallel computer that consists of a set of processors and a sequentially consistent shared memory.

Sequentially consistent means that the shared memory behaves as if the multithreaded computation's instructions were interleaved to produce a linear order that preserves the partial order of the computational dag.

IDEAL PARALLEL COMPUTER

Depending on scheduling, the ordering could differ from one run of the program to another.

The ideal-parallel-computer model makes some performance assumptions: each processor in the machine has equal computing power, and the cost of scheduling is ignored.

PERFORMANCE MEASURES

Work: total time to execute the entire computation on one processor.

It is the sum of the times taken by each of the strands.

In the computational dag, it is the number of strands (assuming each strand takes one unit of time).

PERFORMANCE MEASURES

Span: longest time to execute the strands along any path in the dag.

The span equals the number of vertices on a longest, or critical, path.

Example!

PERFORMANCE MEASURES

The actual running time of a multithreaded computation depends also on how many processors are available and how the scheduler allocates strands to processors.

Running time on P processors: TP

Work: T1

Span: T∞ (unlimited number of processors)

PERFORMANCE MEASURES

The work and span provide lower bounds on the running time TP of a multithreaded computation on P processors:

Work law: TP ≥ T1/P

Span law: TP ≥ T∞

PERFORMANCE MEASURES

Speedup: the speedup of a computation on P processors is the ratio T1/TP.

It measures how many times faster the computation is on P processors than on one processor.

By the work law, it is at most P.

Linear speedup: T1/TP = Θ(P)

Perfect linear speedup: T1/TP = P

PERFORMANCE MEASURES

Parallelism: T1/T∞

The average amount of work that can be performed in parallel for each step along the critical path.

As an upper bound, the parallelism gives the maximum possible speedup that can be achieved on any number of processors.

The parallelism provides a limit on the possibility of attaining perfect linear speedup.
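Under the unit-cost assumption above (each strand takes one time step), the work and span of the P-Fib example can be computed directly from its recursion. A minimal sketch, counting one unit per call:

```python
def work(n):
    # T1 for P-Fib: both recursive calls are executed, plus one unit
    # for the call itself, so work grows like Fibonacci (exponentially).
    if n < 2:
        return 1
    return work(n - 1) + work(n - 2) + 1

def span(n):
    # T_infinity for P-Fib: the spawned call overlaps the other call,
    # so only the longer chain counts; span grows only linearly.
    if n < 2:
        return 1
    return max(span(n - 1), span(n - 2)) + 1

t1, tinf = work(10), span(10)
print(t1, tinf, t1 / tinf)  # 177 10 17.7
```

The parallelism T1/T∞ = 17.7 is the maximum speedup any number of processors could achieve on P-Fib(10) in this model.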

SCHEDULING

Good performance depends on more than minimizing the span and work.

The strands must also be scheduled efficiently onto the processors of the parallel machine.

Our multithreaded programming model provides no way to specify which strands execute on which processors. Instead, we rely on the concurrency platform's scheduler.

SCHEDULING

A multithreaded scheduler must schedule the computation with no advance knowledge of when strands will be spawned or when they will complete—it must operate on-line.

Moreover, a good scheduler operates in a distributed fashion, where the threads implementing the scheduler cooperate to load-balance the computation.

SCHEDULING

To keep the analysis simple, we shall consider an on-line centralized scheduler, which knows the global state of the computation at any given time.

In particular, we shall consider greedy schedulers, which assign as many strands to processors as possible in each time step.

SCHEDULING

If at least P strands are ready to execute during a time step, we say that the step is a complete step, and a greedy scheduler assigns any P of the ready strands to processors.

Otherwise, fewer than P strands are ready to execute, in which case we say that the step is an incomplete step, and the scheduler assigns each ready strand to its own processor.

SCHEDULING

A greedy scheduler executes a multithreaded computation in time TP ≤ T1/P + T∞.

Greedy scheduling is provably good because it achieves the sum of the two lower bounds as an upper bound.

Moreover, it is within a factor of 2 of optimal.
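A greedy scheduler can be simulated on a small dag of unit-time strands to check the bound. The dag below (strands a through f) is a hypothetical example, not one from the slides:

```python
# Toy dag of unit-time strands: an edge (u, v) means strand u must
# execute before strand v.
edges = {('a', 'b'), ('a', 'c'), ('a', 'f'),
         ('b', 'd'), ('c', 'd'), ('d', 'e'), ('f', 'e')}
strands = {x for e in edges for x in e}
preds = {v: {u for (u, w) in edges if w == v} for v in strands}

def greedy_schedule(P):
    # Each time step, run as many ready strands as possible:
    # all of them on an incomplete step, any P on a complete step.
    done, steps = set(), 0
    while len(done) < len(strands):
        ready = sorted(v for v in strands
                       if v not in done and preds[v] <= done)
        done |= set(ready[:P])
        steps += 1
    return steps

def critical_path(v):
    # Number of strands on the longest path ending at v.
    return 1 + max((critical_path(u) for u in preds[v]), default=0)

T1 = len(strands)                              # work: one unit per strand
Tinf = max(critical_path(v) for v in strands)  # span: critical path
for P in (1, 2, 3):
    TP = greedy_schedule(P)
    assert TP <= T1 / P + Tinf                 # the greedy bound holds
    print(P, TP)
```

Here T1 = 6 and T∞ = 4, so the bound reads TP ≤ 6/P + 4; every simulated greedy schedule stays within it, and with P = 1 the schedule takes exactly T1 steps, matching the work law with equality.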