
Parallel Programming Models

Jihad El-Sana

These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National Laboratory


Overview

• Parallel programming models in common use:
  – Shared Memory
  – Threads
  – Message Passing
  – Data Parallel
  – Hybrid

• Parallel programming models are abstractions above hardware and memory architectures.


Shared Memory Model

• Tasks share a common address space, which they read and write asynchronously.

• Various mechanisms such as locks / semaphores may be used to control access to the shared memory.

• An advantage of this model, from the programmer's point of view, is that there is no notion of data "ownership", so there is no need to explicitly specify the communication of data between tasks.

• Program development can often be simplified.
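As a rough illustration (not from the original slides), the sketch below uses OpenMP in C to show tasks reading and writing a shared address space, with a critical section playing the role of the lock/semaphore mentioned above; it assumes a compiler with OpenMP support (e.g. the -fopenmp flag).

```c
/* Minimal shared-memory sketch with OpenMP (compile with -fopenmp).
   All threads read and write the same array asynchronously; the critical
   section acts as a lock so only one thread updates the shared total at a time. */
#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    double data[N];          /* shared address space: visible to every thread */
    double total = 0.0;

    for (int i = 0; i < N; i++) data[i] = i * 0.5;

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        double x = data[i] * data[i];   /* independent work on shared data */
        #pragma omp critical            /* lock-like mechanism around the shared write */
        total += x;
    }

    printf("total = %f\n", total);
    return 0;
}
```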


Disadvantage

• It is difficult to understand and manage data locality.
  – Keeping data local to the processor that works on it conserves memory accesses, cache refreshes and bus traffic that occur when multiple processors use the same data.
  – Unfortunately, controlling data locality is hard to understand and may be beyond the control of the average user.
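A hypothetical sketch of the locality issue: the two OpenMP loops below touch the same array, but the blocked version gives each thread a contiguous region, while the interleaved version spreads each thread's accesses across cache lines that other threads also touch; the array size and thread count are arbitrary.

```c
#include <stdio.h>
#include <omp.h>

#define N (1 << 20)
static double a[N];

int main(void) {
    int nthreads = 4;               /* arbitrary choice for illustration */

    /* Blocked: each thread owns one contiguous block of the array (good locality). */
    #pragma omp parallel num_threads(nthreads)
    {
        int t = omp_get_thread_num();
        int chunk = N / nthreads;
        for (int i = t * chunk; i < (t + 1) * chunk; i++)
            a[i] += 1.0;
    }

    /* Interleaved: threads touch alternating elements, so neighbouring elements
       (often in the same cache line) are used by different threads, which
       increases cache refreshes and bus traffic. */
    #pragma omp parallel num_threads(nthreads)
    {
        int t = omp_get_thread_num();
        for (int i = t; i < N; i += nthreads)
            a[i] += 1.0;
    }

    printf("a[0] = %f\n", a[0]);
    return 0;
}
```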


Implementations

• The native compilers translate user program variables into actual memory addresses, which are global.

• No common distributed memory platform implementations currently exist.

• Some implementations do provide a shared memory view of data even though the physical memory of the machine is distributed; this is often described as virtual shared memory.


Threads Model

• A single process can have multiple, concurrent execution paths.

• The main program loads and acquires all of the necessary system and user resources.

• It performs some serial work, and then creates a number of tasks (threads) that run concurrently.
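A minimal sketch of this structure using POSIX threads (names such as worker and NTHREADS are illustrative): the main program does some serial work and then creates threads that run concurrently until joined.

```c
/* Threads model sketch with pthreads (link with -lpthread). */
#include <stdio.h>
#include <pthread.h>

#define NTHREADS 4

void *worker(void *arg) {
    long id = (long)arg;                 /* each thread has its own local data */
    printf("thread %ld doing its share of the work\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];

    printf("main: serial setup work\n"); /* main acquires resources, does serial work */

    for (long t = 0; t < NTHREADS; t++)  /* create concurrent execution paths */
        pthread_create(&threads[t], NULL, worker, (void *)t);

    for (long t = 0; t < NTHREADS; t++)  /* main thread remains until all threads finish */
        pthread_join(threads[t], NULL);

    return 0;
}
```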


Threads Cont.

• The work of a thread can best be described as a subroutine within the main program.
• All threads share the process's memory space.
• Each thread also has its own local data.
• Threads save the overhead of replicating a program's resources.
• Threads communicate with each other through global memory.
• Threads require synchronization constructs to ensure that no more than one thread is updating the same global address at any time.
• Threads can come and go, but the main thread remains present to provide the necessary shared resources until the application has completed.
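The sketch below illustrates the synchronization point above, assuming POSIX threads: several threads update one global counter, and a mutex ensures that only one of them writes the shared address at a time.

```c
/* Threads communicating through global memory, protected by a mutex. */
#include <stdio.h>
#include <pthread.h>

#define NTHREADS 4
#define ITERS 100000

long shared_counter = 0;                          /* global memory shared by all threads */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *add(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&lock);                /* synchronization construct */
        shared_counter++;                         /* safe update of the same global address */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, add, NULL);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);
    printf("counter = %ld (expected %d)\n", shared_counter, NTHREADS * ITERS);
    return 0;
}
```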


Message Passing Model

• The message passing model uses a set of tasks that use their own local memory during computation.

• Multiple tasks can reside on the same physical machine as well as across an arbitrary number of machines.


Message Passing Model Cont.

• Tasks exchange data through communications by sending and receiving messages.

• Data transfer usually requires cooperative operations to be performed by each process.

• The communicating processes may exist on the same machine or on different machines.
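A minimal message-passing sketch using MPI (assuming an MPI installation; run with, e.g., mpirun -np 2): each task keeps data in its own local memory, and the transfer only happens when a send on one task is matched by a receive on the other.

```c
/* Message passing model sketch: explicit, cooperative send/receive with MPI. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                          /* data in task 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* cooperative send ...          */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                         /* ... matched by a receive      */
        printf("task 1 received %d from task 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```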


Data Parallel Model

• Most of the parallel work focuses on performing operations on a data set.

• The data set is typically organized into a common structure.

• A set of tasks works collectively on the same data structure; however, each task works on a different partition of that data structure.


Data Parallel Model Cont.

• Tasks perform the same operation on their partition of work.

• On shared memory architectures, all tasks may have access to the data structure through global memory. On distributed memory architectures the data structure is split up and resides as "chunks" in the local memory of each task.
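A small data-parallel sketch in OpenMP (shared-memory variant): every thread performs the same operation, each on its own partition of one array; on a distributed memory machine the array would instead be split into per-task chunks.

```c
/* Data-parallel sketch: same operation, different partitions of one array. */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];

    for (int i = 0; i < N; i++) a[i] = i;

    /* The runtime divides the iteration space among threads; each thread
       applies the same operation to its own partition of the array. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        b[i] = 2.0 * a[i] + 1.0;

    printf("b[N-1] = %f\n", b[N - 1]);
    return 0;
}
```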


Designing Parallel Algorithms

• The programmer is typically responsible for both identifying and actually implementing parallelism.

• Manually developing parallel codes is a time consuming, complex, error-prone and iterative process.

• Currently, the most common type of tool used to automatically parallelize a serial program is a parallelizing compiler or pre-processor.


A Parallelizing Compiler

• Fully Automatic
  – The compiler analyzes the source code and identifies opportunities for parallelism.
  – The analysis includes identifying inhibitors to parallelism and possibly a cost weighting on whether or not the parallelism would actually improve performance.
  – Loops (do, for) are the most frequent target for automatic parallelization.

• Programmer Directed
  – Using "compiler directives" or possibly compiler flags, the programmer explicitly tells the compiler how to parallelize the code (see the sketch below).
  – May be used in conjunction with some degree of automatic parallelization.
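A sketch of the programmer-directed approach, assuming OpenMP: the directive tells the compiler to parallelize the loop and how to combine the partial sums; when the enabling flag (e.g. -fopenmp) is absent, the directive is ignored and the loop simply runs serially.

```c
/* Compiler-directive sketch: the programmer tells the compiler how to
   parallelize this loop; the reduction clause combines per-thread sums safely. */
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) x[i] = 0.001 * i;

    /* Directive: parallelize this loop and reduce the partial sums into `sum`. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("sum = %f\n", sum);
    return 0;
}
```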


Automatic Parallelization Limitations

• Wrong results may be produced.
• Performance may actually degrade.
• Much less flexible than manual parallelization.
• Limited to a subset (mostly loops) of code.
• May actually not parallelize code if the analysis suggests there are inhibitors or the code is too complex.


The Problem & The Program

• Determine whether or not the problem is one that can actually be parallelized.
• Identify the program's hotspots:
  – Know where most of the real work is being done.
  – Profilers and performance analysis tools can help here.
  – Focus on parallelizing the hotspots and ignore those sections of the program that account for little CPU usage.
• Identify bottlenecks in the program:
  – Identify areas where the program is slow or bounded.
  – It may be possible to restructure the program or use a different algorithm to reduce or eliminate unnecessary slow areas.
• Identify inhibitors to parallelism. One common class of inhibitor is data dependence, as demonstrated by the Fibonacci sequence (see the sketch after this list).
• Investigate other algorithms if possible. This may be the single most important consideration when designing a parallel application.
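As a sketch of the Fibonacci inhibitor mentioned above: each iteration of the loop below reads values produced by the two previous iterations, so the iterations cannot be executed independently without changing the algorithm.

```c
/* Data dependence that inhibits parallelism: a loop-carried dependence. */
#include <stdio.h>

#define N 40

int main(void) {
    long fib[N];
    fib[0] = 0;
    fib[1] = 1;

    /* Iteration i reads the results of iterations i-1 and i-2,
       so the iterations must execute in order. */
    for (int i = 2; i < N; i++)
        fib[i] = fib[i - 1] + fib[i - 2];

    printf("fib[%d] = %ld\n", N - 1, fib[N - 1]);
    return 0;
}
```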


Partitioning

• Break the problem into discrete "chunks" of work that can be distributed to multiple tasks:
  – Domain decomposition
  – Functional decomposition


Domain Decomposition

• The data associated with a problem is decomposed.

• Each parallel task then works on a portion of the data.

• This partitioning can be done in different ways: rows, columns, blocks, cyclic, etc. (see the sketch below).
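A hypothetical sketch of two of these partitionings for a one-dimensional array: block decomposition assigns each task a contiguous range, while cyclic decomposition deals elements out round-robin.

```c
/* Block vs. cyclic decomposition of an array of size N among ntasks tasks. */
#include <stdio.h>

#define N 16

int main(void) {
    int ntasks = 4;

    /* Block decomposition: task t owns elements [t*chunk, (t+1)*chunk). */
    int chunk = N / ntasks;
    for (int t = 0; t < ntasks; t++)
        printf("block : task %d owns [%d, %d)\n", t, t * chunk, (t + 1) * chunk);

    /* Cyclic decomposition: element i belongs to task i %% ntasks. */
    for (int t = 0; t < ntasks; t++) {
        printf("cyclic: task %d owns", t);
        for (int i = t; i < N; i += ntasks)
            printf(" %d", i);
        printf("\n");
    }
    return 0;
}
```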


Functional Decomposition

• The problem is decomposed according to the work that must be done. Each task then performs a portion of the overall work.
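One possible sketch, using OpenMP sections with placeholder functions: the work, rather than the data, is divided, and each section runs a different part of the overall job concurrently (here the three functions are assumed to be independent).

```c
/* Functional decomposition sketch: each section performs a different portion of the work. */
#include <stdio.h>
#include <omp.h>

void read_input(void)    { printf("reading input\n"); }
void compute_model(void) { printf("computing model\n"); }
void write_output(void)  { printf("writing output\n"); }

int main(void) {
    #pragma omp parallel sections
    {
        #pragma omp section
        read_input();
        #pragma omp section
        compute_model();
        #pragma omp section
        write_output();
    }
    return 0;
}
```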


Communications

• Cost of communications
• Latency vs. bandwidth
• Visibility of communications
• Synchronous vs. asynchronous communications
• Scope of communications
  – Point-to-point
  – Collective
• Efficiency of communications
• Overhead and complexity


Synchronization

• Barrier
• Lock / semaphore
• Synchronous communication operations
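A sketch of barrier synchronization using POSIX threads (pthread_barrier_t, available on most Linux systems): no thread begins its second phase until every thread has finished the first.

```c
/* Barrier sketch: all threads wait at the barrier before continuing. */
#include <stdio.h>
#include <pthread.h>

#define NTHREADS 4

pthread_barrier_t barrier;

void *phase_worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld: phase 1\n", id);
    pthread_barrier_wait(&barrier);      /* no thread starts phase 2 early */
    printf("thread %ld: phase 2\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, phase_worker, (void *)t);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```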


Data Dependencies

• A dependence exists between program statements when the order of statement execution affects the results of the program.

• A data dependence results from multiple use of the same location(s) in storage by different tasks.

• Dependencies are important to parallel programming because they are one of the primary inhibitors to parallelism.
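A small illustration (not from the slides) of a dependence through storage: two tasks both read and write the same variable, so the final value depends on the order in which they execute; without extra synchronization the outcome is non-deterministic.

```c
/* Data dependence between tasks: both statements use the same storage location. */
#include <stdio.h>
#include <pthread.h>

int x = 5;                                                      /* shared location */

void *task_a(void *arg) { (void)arg; x = x * 2; return NULL; }  /* reads then writes x */
void *task_b(void *arg) { (void)arg; x = x + 3; return NULL; }  /* reads then writes x */

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, task_a, NULL);
    pthread_create(&b, NULL, task_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* (5*2)+3 = 13 or (5+3)*2 = 16 (or a lost update), depending on execution order */
    printf("x = %d\n", x);
    return 0;
}
```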


Load Balancing

• Load balancing refers to the practice of distributing work among tasks so that all tasks are kept busy all of the time. It can be considered a minimization of task idle time.

• Load balancing is important to parallel programs for performance reasons. For example, if all tasks are subject to a barrier synchronization point, the slowest task will determine the overall performance.
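As one hedged illustration of load balancing on shared memory, the OpenMP sketch below uses dynamic scheduling so that iterations of very uneven cost are handed out on demand; with a static division, the thread holding the heaviest iterations would determine the overall runtime, as noted above. The work function is a placeholder.

```c
/* Load-balancing sketch: dynamic scheduling of iterations with uneven cost. */
#include <stdio.h>
#include <omp.h>

#define N 1000

double work(int i) {             /* uneven cost: later iterations are heavier */
    double s = 0.0;
    for (int k = 0; k < i * 100; k++) s += k * 1e-9;
    return s;
}

int main(void) {
    double total = 0.0;

    /* schedule(dynamic) hands out iterations on demand, keeping all threads busy. */
    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (int i = 0; i < N; i++)
        total += work(i);

    printf("total = %f\n", total);
    return 0;
}
```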