
Page 1: Intel parallel programming

Dharmendra Savaliya

{ Subham Yadav

Bhaumik Patel

Darshak Shah}

Page 2: Intel parallel programming

(Using Open MP & MPI)

Page 3: Intel parallel programming

How do we keep processors always busy?

The fork/join model for OpenMP

What is MPI?

Page 4: Intel parallel programming

STUDY PROBLEM, SEQUENTIAL PROGRAM

LOOK FOR OPPORTUNITIES FOR PARALLELISM

TRY TO KEEP ALL PROCESSORS BUSY DOING USEFUL WORK

Page 5: Intel parallel programming

DOMAIN DECOMPOSITION

TASK DECOMPOSITION

Dependence Graph

Page 6: Intel parallel programming

First, decide how the data elements should be divided among processors. Second, decide which tasks each processor should be doing.
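A minimal sketch of this idea (the values of n = 12 elements and p = 4 processors are ours, for illustration): each processor owns one contiguous block of the array.

#include <stdio.h>

/* Block domain decomposition: processor `rank` owns the contiguous
   index range [lo, hi) of an n-element array. */
int main(void)
{
    int n = 12, p = 4;
    for (int rank = 0; rank < p; rank++) {
        int lo = rank * n / p;        /* first index owned by this rank */
        int hi = (rank + 1) * n / p;  /* one past the last index owned  */
        printf("processor %d owns indices [%d, %d)\n", rank, lo, hi);
    }
    return 0;
}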

Intel white paper: www.intel.com/pressroom/archive/reference/

Page 7: Intel parallel programming

First, divide the tasks among processors. Second, decide which data elements are going to be accessed (read and/or written) by which processors.

Page 8: Intel parallel programming
Page 9: Intel parallel programming

Special kind of task decomposition

Page 10: Intel parallel programming
Page 11: Intel parallel programming

1. for (i = 0; i < 3; i++) a[i] = b[i] / 2.0;

Page 12: Intel parallel programming

2. for (i = 1; i < 4; i++) a[i] = a[i-1] * b[i];

Page 13: Intel parallel programming

3. a = f(x, y, z); b = g(w, x); t = a + b; c = h(z); s = t / c;
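For reference: in example 1 every iteration is independent; in example 2 each iteration reads the a[i-1] written by the previous one (a loop-carried dependence); example 3 forms the dependence graph sketched below, with hypothetical stand-ins for f, g and h since the real functions are not given.

#include <stdio.h>

/* Hypothetical stand-ins for f, g and h; only the dependence
   structure matters here. */
static double f(double x, double y, double z) { return x + y + z; }
static double g(double w, double x)           { return w * x; }
static double h(double z)                     { return z + 1.0; }

int main(void)
{
    double w = 1.0, x = 2.0, y = 3.0, z = 4.0;

    /* Level 1: a, b and c depend only on the inputs,
       so they could be computed in parallel. */
    double a = f(x, y, z);
    double b = g(w, x);
    double c = h(z);

    /* Level 2: t must wait for a and b. */
    double t = a + b;

    /* Level 3: s must wait for t and c. */
    double s = t / c;

    printf("s = %f\n", s);
    return 0;
}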

Page 14: Intel parallel programming
Page 15: Intel parallel programming

OpenMP is an API for parallel programming

First developed by the OpenMP Architecture Review Board in 1997; a standard designed for shared-memory multiprocessors

Set of compiler directives, library functions, and environment variables, but not a language

Can be used with C, C++, or Fortran

Based on fork/join model of threads

Page 16: Intel parallel programming

Strengths

Well suited for domain decomposition

Available on Unix and Windows

Weaknesses

Not well suited for task decomposition

Race conditions due to data dependences (see the sketch below)
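A minimal sketch of that weakness (our example, not from the slides): naively parallelizing the loop from example 2, which has a loop-carried dependence, can give a wrong answer.

#include <stdio.h>

/* a[i] needs a[i-1]; if iterations are split across threads, a thread
   may read a[i-1] before it has been written (a race condition). */
int main(void)
{
    double a[4] = {1.0, 0.0, 0.0, 0.0};
    double b[4] = {1.0, 2.0, 3.0, 4.0};

    #pragma omp parallel for   /* unsafe here: iterations are not independent */
    for (int i = 1; i < 4; i++)
        a[i] = a[i-1] * b[i];

    printf("a[3] = %f (the sequential answer is 24)\n", a[3]);
    return 0;
}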

Implementing Domain Decompositions

Page 17: Intel parallel programming

The compiler directive

#pragma omp parallel for

tells the compiler that the for loop that immediately follows can be executed in parallel.

The number of loop iterations must be computable at run time, before the loop executes.

The loop must not contain a break, return, or exit.

The loop must not contain a goto to a label outside the loop.
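A minimal sketch of such a loop (the array names are ours): the iteration count is known before the loop starts and no iteration depends on another, so the directive applies.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    double a[1000], b[1000];
    for (int i = 0; i < 1000; i++)
        b[i] = (double) i;

    /* Iterations are independent: each one touches only a[i] and b[i]. */
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++)
        a[i] = b[i] / 2.0;

    printf("a[999] = %f, max threads = %d\n", a[999], omp_get_max_threads());
    return 0;
}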

Page 18: Intel parallel programming

The Hello World OpenMP program

#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel
    printf("Hello World \n");
    return 0;
}

Since fork/join is a source of overhead, we want to maximize the amount of work done for each fork/join.

[Fork/join diagram: the master thread forks worker threads W0, W1, W2 and then joins them back into the master.]
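With GCC, for example, such a program is built with the -fopenmp flag (the file name here is hypothetical), and the thread count can be set through OMP_NUM_THREADS:

gcc -fopenmp hello_omp.c -o hello_omp
OMP_NUM_THREADS=4 ./hello_omp    # prints "Hello World" once per thread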

Page 19: Intel parallel programming
Page 20: Intel parallel programming

Message Passing Interface

Used on distributed-memory MIMD architectures.

MPI specifies the API for message passing (communication-related routines).

Distributed Memory

Processes have only local memory and must use some other mechanism (e.g., message passing) to exchange information.

Advantage: programmers have explicit control over data distribution and communication

How do we make different processes do different things (MIMD functionality)?

We need to know the execution environment: we can usually decide what to do based on the number of processes in this job and the process id.

How many processes are working on this problem? MPI_Comm_size

What is my id? MPI_Comm_rank

Rank is with respect to a communicator (the context of the communication). MPI_COMM_WORLD is a predefined communicator that includes all processes (already mapped to processors).
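A minimal sketch (ours) of MIMD-style branching on the process id and process count:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes in this job */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* my id within MPI_COMM_WORLD     */

    if (rank == 0)
        printf("I am the master of %d processes\n", size);
    else
        printf("I am worker %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}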

Page 21: Intel parallel programming

#include "mpi.h"

#include <stdio.h>

int main( int argc, char *argv[] )

{

MPI_Init( &argc, &argv );

printf( "Hello world\n" );

MPI_Finalize();

return 0;

}

• mpi.h contains MPI definitions and types.

• An MPI program must start with MPI_Init.

• An MPI program must exit with MPI_Finalize.

• MPI functions are just library routines that can be used on top of regular C and C++.
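With a typical MPI installation, a program like this is compiled with the wrapper compiler and launched with a chosen process count (the file name is hypothetical):

mpicc hello_mpi.c -o hello_mpi
mpirun -np 4 ./hello_mpi    # "Hello world" is printed by each of the 4 processes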

Page 22: Intel parallel programming

MPI_Send(start, count, datatype, dest, tag, comm)

MPI_Recv(start, count, datatype, source, tag, comm, status)

The simple MPI (the six functions that make most programs work):

MPI_INIT

MPI_FINALIZE

MPI_COMM_SIZE

MPI_COMM_RANK

MPI_SEND

MPI_RECV
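A minimal sketch (ours) of the send/receive pair above: rank 0 sends one integer to rank 1. Run with at least two processes.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;   /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}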

Page 23: Intel parallel programming

MPI                            OpenMP
Distributed-memory model       Shared-memory model
Runs on distributed networks   Runs on multi-core processors
Message based                  Directive based
Flexible and expressive        Easier to program and debug

Page 24: Intel parallel programming

We can use OpenMP and MPI together in a C program, but in this case the performance is lower than with MPI alone. So it is better to use OpenMP or MPI separately.

Page 25: Intel parallel programming

http://www.intel-software-academic-program.com/pages/courses

Page 26: Intel parallel programming

Any questions?