
Page 1: Intel parallel programming

Dharmendra Savaliya

{ Subham Yadav

Bhaumik Patel

Darshak Shah}

Page 2: Intel parallel programming

(Using Open MP & MPI)

Page 3: Intel parallel programming

How do we keep processors always busy?

The fork/join model for OpenMP

What is MPI?

Page 4: Intel parallel programming

STUDY PROBLEM, SEQUENTIAL PROGRAM

LOOK FOR OPPORTUNITIES FOR PARALLELISM

TRY TO KEEP ALL PROCESSORS BUSY DOING USEFUL WORK

Page 5: Intel parallel programming

DOMAIN DECOMPOSITION

TASK DECOMPOSITION

Dependence Graph

Page 6: Intel parallel programming

First, decide how the data elements should be divided among processors. Second, decide which tasks each processor should be doing.
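A minimal sketch of this idea (the values of n = 12 elements and p = 4 processors are ours, for illustration): each processor owns one contiguous block of the array.

#include <stdio.h>

/* Block domain decomposition: processor `rank` owns the contiguous
   index range [lo, hi) of an n-element array. */
int main(void)
{
    int n = 12, p = 4;
    for (int rank = 0; rank < p; rank++) {
        int lo = rank * n / p;        /* first index owned by this rank */
        int hi = (rank + 1) * n / p;  /* one past the last index owned  */
        printf("processor %d owns indices [%d, %d)\n", rank, lo, hi);
    }
    return 0;
}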

Intel white paper: www.intel.com/pressroom/archive/reference/

Page 7: Intel parallel programming

First, divide the tasks among processors. Second, decide which data elements are going to be accessed (read and/or written) by which processors.

Page 8: Intel parallel programming
Page 9: Intel parallel programming

Special kind of task decomposition

Page 10: Intel parallel programming
Page 11: Intel parallel programming

1. for (i = 0; i < 3; i++) a[i] = b[i] / 2.0;

Page 12: Intel parallel programming

2. for (i = 1; i < 4; i++) a[i] = a[i-1] * b[i];

Page 13: Intel parallel programming

3. a = f(x, y, z); b = g(w, x); t = a + b; c = h(z); s = t / c;
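For reference: in example 1 every iteration is independent; in example 2 each iteration reads the a[i-1] written by the previous one (a loop-carried dependence); example 3 forms the dependence graph sketched below, with hypothetical stand-ins for f, g and h since the real functions are not given.

#include <stdio.h>

/* Hypothetical stand-ins for f, g and h; only the dependence
   structure matters here. */
static double f(double x, double y, double z) { return x + y + z; }
static double g(double w, double x)           { return w * x; }
static double h(double z)                     { return z + 1.0; }

int main(void)
{
    double w = 1.0, x = 2.0, y = 3.0, z = 4.0;

    /* Level 1: a, b and c depend only on the inputs,
       so they could be computed in parallel. */
    double a = f(x, y, z);
    double b = g(w, x);
    double c = h(z);

    /* Level 2: t must wait for a and b. */
    double t = a + b;

    /* Level 3: s must wait for t and c. */
    double s = t / c;

    printf("s = %f\n", s);
    return 0;
}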

Page 14: Intel parallel programming
Page 15: Intel parallel programming

OpenMP is an API for parallel programming

First developed by the OpenMP Architecture Review Board in 1997; a standard designed for shared-memory multiprocessors

Set of compiler directives, library functions, and environment variables, but not a language

Can be used with C, C++, or Fortran

Based on fork/join model of threads

Page 16: Intel parallel programming

Strengths

Well suited for domain decomposition

Available on Unix and Windows

Weaknesses

Not well suited for task decomposition

Race conditions due to data dependences (see the sketch below)
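A minimal sketch of that weakness (our example, not from the slides): naively parallelizing the loop from example 2, which has a loop-carried dependence, can give a wrong answer.

#include <stdio.h>

/* a[i] needs a[i-1]; if iterations are split across threads, a thread
   may read a[i-1] before it has been written (a race condition). */
int main(void)
{
    double a[4] = {1.0, 0.0, 0.0, 0.0};
    double b[4] = {1.0, 2.0, 3.0, 4.0};

    #pragma omp parallel for   /* unsafe here: iterations are not independent */
    for (int i = 1; i < 4; i++)
        a[i] = a[i-1] * b[i];

    printf("a[3] = %f (the sequential answer is 24)\n", a[3]);
    return 0;
}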

Implementing Domain Decompositions

Page 17: Intel parallel programming

The compiler directive

#pragma omp parallel for

tells the compiler that the for loop that immediately follows can be executed in parallel.

The number of loop iterations must be computable at run time, before the loop executes.

The loop must not contain a break, return, or exit.

The loop must not contain a goto to a label outside the loop.
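A minimal sketch of such a loop (the array names are ours): the iteration count is known before the loop starts and no iteration depends on another, so the directive applies.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    double a[1000], b[1000];
    for (int i = 0; i < 1000; i++)
        b[i] = (double) i;

    /* Iterations are independent: each one touches only a[i] and b[i]. */
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++)
        a[i] = b[i] / 2.0;

    printf("a[999] = %f, max threads = %d\n", a[999], omp_get_max_threads());
    return 0;
}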

Page 18: Intel parallel programming

The Hello World OpenMP program

#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel
    printf("Hello World \n");
    return 0;
}

Since fork/join is a source of overhead, we want to maximize the amount of work done for each fork/join.

[Fork/join diagram: the master thread forks worker threads W0, W1, W2 and then joins them back into the master.]
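With GCC, for example, such a program is built with the -fopenmp flag (the file name here is hypothetical), and the thread count can be set through OMP_NUM_THREADS:

gcc -fopenmp hello_omp.c -o hello_omp
OMP_NUM_THREADS=4 ./hello_omp    # prints "Hello World" once per thread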

Page 19: Intel parallel programming
Page 20: Intel parallel programming

Message Passing Interface

Used on distributed-memory MIMD architectures.

MPI specifies the API for message passing (communication-related routines).

Distributed Memory

Processes have only local memory and must use some other mechanism (e.g., message passing) to exchange information.

Advantage: programmers have explicit control over data distribution and communication

How do we make different processes do different things (MIMD functionality)?

We need to know the execution environment: we can usually decide what to do based on the number of processes in this job and the process id.

How many processes are working on this problem? MPI_Comm_size

What is my id? MPI_Comm_rank

Rank is with respect to a communicator (the context of the communication). MPI_COMM_WORLD is a predefined communicator that includes all processes (already mapped to processors).
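A minimal sketch (ours) of MIMD-style branching on the process id and process count:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes in this job */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* my id within MPI_COMM_WORLD     */

    if (rank == 0)
        printf("I am the master of %d processes\n", size);
    else
        printf("I am worker %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}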

Page 21: Intel parallel programming

#include "mpi.h"

#include <stdio.h>

int main( int argc, char *argv[] )

{

MPI_Init( &argc, &argv );

printf( "Hello world\n" );

MPI_Finalize();

return 0;

}

• mpi.h contains MPI definitions and types.

• An MPI program must start with MPI_Init.

• An MPI program must exit with MPI_Finalize.

• MPI functions are just library routines that can be used on top of regular C and C++.
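With a typical MPI installation, a program like this is compiled with the wrapper compiler and launched with a chosen process count (the file name is hypothetical):

mpicc hello_mpi.c -o hello_mpi
mpirun -np 4 ./hello_mpi    # "Hello world" is printed by each of the 4 processes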

Page 22: Intel parallel programming

MPI_Send(start, count, datatype, dest, tag, comm)

MPI_Recv(start, count, datatype, source, tag, comm, status)

The simple MPI (the six functions that make most programs work):

MPI_INIT

MPI_FINALIZE

MPI_COMM_SIZE

MPI_COMM_RANK

MPI_SEND

MPI_RECV
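A minimal sketch (ours) of the send/receive pair above: rank 0 sends one integer to rank 1. Run with at least two processes.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;   /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}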

Page 23: Intel parallel programming

MPI                            OpenMP
Distributed-memory model       Shared-memory model
Runs on distributed networks   Runs on multi-core processors
Message based                  Directive based
Flexible and expressive        Easier to program and debug

Page 24: Intel parallel programming

We can use OpenMP and MPI together in a C program, but in this case the performance is lower than with MPI alone. So it is better to use OpenMP or MPI separately.

Page 25: Intel parallel programming

http://www.intel-software-academic-program.com/pages/courses

Page 26: Intel parallel programming

Any questions?