1
Mohsan Jameel
Department of Computing
NUST School of Electrical Engineering and Computer Science
2
Outline
I. Introduction to OpenMP
II. OpenMP Programming Model
III. OpenMP Directives
IV. OpenMP Clauses
V. Run-Time Library Routine
VI. Environment Variables
VII. Summary
3
What is OpenMP
An application program interface (API) that is used to explicitly direct multi-threaded, shared-memory parallelism.
Consists of:
- Compiler directives
- Run-time library routines
- Environment variables
• The specification is maintained by the OpenMP Architecture Review Board (http://www.openmp.org)
• Version 3.0 was released in May 2008
4
What OpenMP is Not
Not automatic parallelization:
- The user explicitly specifies parallel execution
- The compiler does not ignore user directives, even if they are wrong
Not just loop-level parallelism: it also provides functionality to enable coarse-grained parallelism
Not meant for distributed-memory parallel systems
Not necessarily implemented identically by all vendors
Not guaranteed to make the most efficient use of shared memory
5
History of OpenMP
In the early 90's, vendors of shared-memory machines supplied similar, directive-based Fortran programming extensions: the user would augment a serial Fortran program with directives specifying which loops were to be parallelized.
The first attempt at a standard was the draft for ANSI X3H5 in 1994. It was never adopted, largely due to waning interest as distributed-memory machines became popular.
The OpenMP standard specification started in the spring of 1997, taking over where ANSI X3H5 had left off, as newer shared-memory machine architectures started to become prevalent.
6
Goals of OpenMP
Standardization: provide a standard among a variety of shared-memory architectures/platforms
Lean and mean: establish a simple and limited set of directives for programming shared-memory machines
Ease of use:
- Provide the capability to incrementally parallelize a serial program
- Provide the capability to implement both coarse-grain and fine-grain parallelism
Portability: support Fortran (77, 90, and 95), C, and C++
7
Outline
I. Introduction to OpenMP
II. OpenMP Programming Model
III. OpenMP Directives
IV. OpenMP Clauses
V. Run-Time Library Routine
VI. Environment Variables
VII. Summary
8
OpenMP Programming Model
Thread-Based Parallelism
Explicit Parallelism
Compiler Directive Based
Dynamic Threads
Nested Parallelism Support
Task parallelism support (OpenMP specification 3.0)
9
Shared Memory Model
10
Execution Model
(Fork-join diagram: the master thread has ID=0; worker threads have ID=1,2,3,…,N-1)
11
Terminology
OpenMP team = master + workers
A parallel region is a block of code executed by all threads simultaneously:
- The master thread always has thread ID=0
- Thread adjustment is done before entering the parallel region
- An “if” clause can be used with the parallel construct; in case the condition evaluates to FALSE, the parallel region is skipped and the code runs serially
A work-sharing construct is responsible for dividing work among the threads in a parallel region
12
Example OpenMP Code Structure
13
Components of OpenMP
14
I. Introduction to OpenMP
II. OpenMP Programming Model
III. OpenMP Directives
IV. OpenMP Clauses
V. Run-Time Library Routine
VI. Environment Variables
VII. Summary
15
Go to helloworld.c
16
C/C++ Parallel Region Example

#pragma omp parallel
{
  printf("Hello world from thread = %d\n", omp_get_thread_num());
}

Hello world from thread = 0
Number of threads = 3
Hello world from thread = 1
Hello world from thread = 2

thread 0 thread 1 thread 2
17
OpenMP Directives
18
OpenMP Scoping
Static extent:
- The code textually enclosed between the beginning and end of a structured block
- The static extent does not span other routines
Orphaned directive: an OpenMP directive that appears independently, outside the static extent of a parallel region
Dynamic extent: includes both the static extent and orphaned directives
19
OpenMP Parallel Regions
A block of code that will be executed by multiple threads
Properties
- Fork-Join Model
- Number of threads won’t change inside a parallel region
- SPMD execution within region
- Enclosed block of code must be structured, no branching into or out of block
Format
#pragma omp parallel clause1 clause2 …
20
OpenMP Threads
How many threads?
- Use of the omp_set_num_threads() library function
- Setting of the OMP_NUM_THREADS environment variable
- Implementation default
Dynamic threads:
By default, the same number of threads is used to execute each parallel region. Two methods for enabling dynamic threads:
1. Use of the omp_set_dynamic() library function
2. Setting of the OMP_DYNAMIC environment variable
21
OpenMP Work-sharing constructs
- Data parallelism
- Functional parallelism
- Serialize a section
22
Example: Count3s in an array
Let's assume we have an array of N integers.
We want to find how many 3s are in the array.
We need a for loop, an if statement, and a count variable.
Let's look at its serial and parallel versions.
23
Serial: Count3s in an array

int count = 0, n = 100;
int array[n]; // initialize array

for (i = 0; i < n; i++) {
  if (array[i] == 3)
    count++;
}
24
Work-sharing construct: “for loop”
The “for loop” work-sharing construct can be thought of as a data-parallelism construct.
25
Parallelize 1st attempt: Count3s in an array
int count = 0, n = 100;
int array[n]; // initialize array

#pragma omp parallel for default(none) shared(n,array,count) private(i)
for (i = 0; i < n; i++) {
  if (array[i] == 3)
    count++;   // race condition: multiple threads update the shared count
}
26
Work-sharing construct: Example of “for loop”
#pragma omp parallel for default(none) shared(n,a,b,c) private(i)
for (i=0;i<n;i++)
{
c[i] = a[i] + b[i];
}
27
Work-sharing construct: “section”
The “sections” work-sharing construct can be thought of as a functional-parallelism construct.
28
Parallelize 2nd attempt: Count3s in an array
• Say we also want to count 4s in the same array.
• Now we have two different functions, i.e. count 3s and count 4s.

int count3 = 0, count4 = 0, n = 100;
int array[n]; // initialize array
#pragma omp parallel sections default(none) shared(n,array,count3,count4) private(i)
{
  #pragma omp section
  for (i = 0; i < n; i++) {
    if (array[i] == 3)
      count3++;
  }
  #pragma omp section
  for (i = 0; i < n; i++) {
    if (array[i] == 4)
      count4++;
  }
}

No data race condition in this example. WHY?
29
#pragma omp parallel sections default(none) shared(a,b,c,d,e,n) private(i)
{
  #pragma omp section
  {
    printf("Thread %d executes 1st loop\n", omp_get_thread_num());
    for (i = 0; i < n; i++)
      a[i] = 3 * b[i];
  }
  #pragma omp section
  {
    printf("Thread %d executes 2nd loop\n", omp_get_thread_num());
    for (i = 0; i < n; i++)
      e[i] = 2 * c[i] + d[i];
  }
}
final_sum = sum(a,n) + sum(e,n);
printf("FINAL_SUM is %d\n", final_sum);
Work-sharing construct: Example 1 of “section”
30
Work-sharing construct: Example 2 of “section” 1/2
31
Work-sharing construct: Example 2 of “section” 2/2
32
Work-sharing construct: Example of “single”
Inside a parallel region, a “single” block is used to specify that the block is executed by only one thread in the team.
Let's look at an example.
33
I. Introduction to OpenMP
II. OpenMP Programming Model
III. OpenMP Directives
IV. OpenMP Clauses
V. Run-Time Library Routine
VI. Environment Variables
VII. Summary
34
OpenMP Clauses: Data sharing 1/2
shared(list)
- The shared clause is used to specify which data are shared among threads.
- All threads can read and write these shared variables.
- By default, all variables are shared.
private(list)
- Private variables are local to each thread.
- A typical example of a private variable is a loop counter, since each thread has its own loop counter, initialized at the entry point.
35
A private variable is defined between the entry and exit points of the parallel region.
A private variable has no scope outside the parallel region.
The firstprivate and lastprivate clauses are used to extend the scope of a variable beyond the parallel region:
- firstprivate: all variables in the list are initialized with the value the original object had before entering the parallel region
- lastprivate: the thread that executes the last iteration or section updates the value of the object in the list
OpenMP Clauses: Data sharing 2/2
36
Example: firstprivate and lastprivate
int main()
{
  int i, n = 100, C, B, A = 10;

  /*--- Start of parallel region ---*/
  #pragma omp parallel for default(none) shared(n) firstprivate(A) \
      lastprivate(B) private(i)
  for (i = 0; i < n; i++) {
    …
    B = i + A;
    …
  }
  /*--- End of parallel region ---*/
  C = B;   // B holds the value from the last iteration
}
37
OpenMP Clauses: nowait
The nowait clause is used to avoid the implicit barrier synchronization at the end of a work-sharing construct.
38
OpenMP Clause: schedule
The schedule clause is supported on the loop construct only. It is used to control the manner in which loop iterations are distributed over the threads.
Syntax: schedule(kind[,chunk_size])
Kinds:
- static[,chunk]: distribute iterations in blocks of size “chunk” over the threads in a round-robin fashion
- dynamic[,chunk]: fixed portions of work; the size is controlled by the value of chunk; when a thread finishes its portion, it starts on the next one
- guided[,chunk]: same as “dynamic”, but the size of the portions of work decreases exponentially
- runtime: the iteration scheduling scheme is set at run time through the environment variable OMP_SCHEDULE
39
The Experiment with schedule clause
40
OpenMP Critical construct
int main()
{
  int i, sum = 0, n = 5;
  int a[5] = {1,2,3,4,5};
  /*--- Start of parallel region ---*/
  #pragma omp parallel for default(none) shared(sum,a,n) private(i)
  for (i = 0; i < n; i++) {
    sum += a[i];
  }
  /*--- End of parallel region ---*/
  printf("sum of vector a = %d", sum);
}

Example: summation of a vector
race condition: multiple threads update the shared variable sum
41
OpenMP Critical construct

int main()
{
  int i, sum = 0, local_sum, n = 5;
  int a[5] = {1,2,3,4,5};
  /*--- Start of parallel region ---*/
  #pragma omp parallel default(none) shared(sum,a,n) private(local_sum,i)
  {
    local_sum = 0;
    #pragma omp for
    for (i = 0; i < n; i++) {
      local_sum += a[i];
    }
    #pragma omp critical
    {
      sum += local_sum;
    }
  }
  /*--- End of parallel region ---*/
  printf("sum of vector a = %d", sum);
}
42
Parallelize 3rd attempt: Count3s in an array
int count = 0, n = 100;
int array[n]; // initialize array

#pragma omp parallel default(none) shared(n,array,count) private(i,local_count)
{
  local_count = 0;
  #pragma omp for
  for (i = 0; i < n; i++) {
    if (array[i] == 3)
      local_count++;
  }
  #pragma omp critical
  {
    count += local_count;
  }
} /*--- End of parallel region ---*/
43
OpenMP Clause: reduction
int main()
{
  int i, sum = 0, n = 5;
  int a[5] = {1,2,3,4,5};
  /*--- Start of parallel region ---*/
  #pragma omp parallel for default(none) shared(a,n) private(i) \
      reduction(+:sum)
  for (i = 0; i < n; i++)
  {
    sum += a[i];
  }
  /*--- End of parallel region ---*/
  printf("sum of vector a = %d", sum);
}
• OpenMP provides a reduction clause, which is used with the for-loop and sections directives.
• The reduction variable must be shared among threads.
• The race condition is avoided implicitly.
44
Parallelize 4th attempt: Count3s in an array
int count = 0, n = 100;
int array[n]; // initialize array

#pragma omp parallel for default(none) shared(n,array) private(i) \
    reduction(+:count)
for (i = 0; i < n; i++) {
  if (array[i] == 3)
    count++;
} /*--- End of parallel region ---*/
45
Tasking in OpenMP
In OpenMP 3.0 the concept of tasks was added to the OpenMP execution model.
The task model is useful in cases where the number of parallel pieces and the work involved in each piece varies and/or is unknown.
Before the inclusion of the task model, OpenMP was not well suited for unstructured problems.
Tasks are often set up within a single construct in a manager-worker model.
46
Task Parallelism Approach 1/2
Threads line up as workers, go through the queue of work to be done, and do a task.
Threads do not wait, as in loop parallelism; rather, they go back to the queue and do more tasks.
Each task is executed serially by the worker thread that encounters it in the queue.
Load balancing occurs, as short and long tasks are done as threads become available.
47
48
Task Parallelism Approach 2/2
49
Example: Task parallelism
50
Best Practices
Optimize barrier use
Avoid ordered construct
Avoid large critical regions
Maximize parallel regions
Avoid multiple use of parallel regions
Address poor load balance
51
I. Introduction to OpenMP
II. OpenMP Programming Model
III. OpenMP Directives
IV. OpenMP Clauses
V. Run-Time Library Routine
VI. Environment Variables
VII. Summary
52
List of run-time library routines
Run-time library routines are declared in the omp.h header file:
- void omp_set_num_threads(int num);
- int omp_get_num_threads();
- int omp_get_max_threads();
- int omp_get_thread_num();
- int omp_get_thread_limit();
- int omp_get_num_procs();
- double omp_get_wtime();
- int omp_in_parallel(); // returns 0 for false, non-zero for true
- A few more
53
More run-time library routines
These routines are new with OpenMP 3.0
54
I. Introduction to OpenMP
II. OpenMP Programming Model
III. OpenMP Directives
IV. OpenMP Clauses
V. Run-Time Library Routine
VI. Environment Variables
VII. Summary
55
Environment Variables
OMP_NUM_THREADS
OMP_DYNAMIC
OMP_THREAD_LIMIT
OMP_STACKSIZE
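A hypothetical shell session showing how these variables might be set before launching an OpenMP program (the binary name `./a.out` is a placeholder):

```shell
export OMP_NUM_THREADS=4    # team size for parallel regions
export OMP_DYNAMIC=false    # keep the team size fixed
export OMP_STACKSIZE=16M    # per-thread stack size (OpenMP 3.0)
# ./a.out                   # then run the OpenMP program
```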
56
I. Introduction to OpenMP
II. OpenMP Programming Model
III. OpenMP Directives
IV. OpenMP Clauses
V. Run-Time Library Routine
VI. Environment Variables
VII. Summary
57
Summary
OpenMP provides a small yet powerful programming model.
Compilers with OpenMP support are widely available.
OpenMP is a directive-based shared-memory programming model.
The OpenMP API is a general-purpose parallel programming API, with emphasis on the ability to parallelize existing programs.
Scalable parallel programs can be written by using parallel regions.
Work-sharing constructs enable efficient parallelization of the computationally intensive portions of a program.
58
Thank Youand
Exercise Session