Introduction to parallel programming via OpenMP
August 20, 2019
Introduction to parallel programming via OpenMP August 20, 2019 1 / 25
Contents
1 Basic facts about parallel algorithms
2 How things are stored in computer memory
3 OpenMP
"Parallel" comes from ...
Amdahl's law
Gives a theoretical speedup formula:

S_latency = 1 / (a + (1 - a)/p)

where

S_latency is the theoretical speedup of the algorithm,
a = (number of operations done sequentially) / (total number of operations) < 1,
p is the number of parallel workers.
Amdahl's law
It follows that

S_latency(p) <= 1/a,   lim_{p -> infinity} S_latency(p) = 1/a

Doesn't take into account many other factors: memory, various overheads, ...
But it can be used as a first estimate!
Example: dot product
Dot product
Given vectors x and y of length N, compute

c = x . y = sum_{i=1}^{N} x_i * y_i

where x_i, y_i are the coordinates of x and y.
Questions
What is the total number of operations?
What is the maximum speedup?
What about memory?
Amdahl's law (modified)
S_latency = 1 / (a + (1 - a)/p + c)

where:
S_latency is the theoretical speedup of the execution of the whole task,
a = (number of operations done sequentially) / (total number of operations) < 1,
p is the number of parallel workers,
c is the communication overhead (nonlinear!).
Main characteristics:
flops = floating point operations per second
memory access speed
memory latency = the amount of time needed to satisfy an individual memory request
memory bandwidth = how much data a memory channel can transfer at once
Memory access speed
Memory bandwidth
True story (for graphics chips at least):
You cannot optimize further once you have reached the maximum memory bandwidth!
Measuring memory bandwidth
Theoretical bandwidth: number of RAM sticks × bus width × memory clock frequency for one stick.
Experimental bandwidth (simplest variant):
void write_memory_loop(void *array, size_t size) {
    size_t *carray = (size_t *) array;
    size_t i;
    for (i = 0; i < size / sizeof(size_t); i++) {
        carray[i] = 1;
    }
}
Experimental bandwidth: accessed memory / time
Taken from:
1) http://codearcana.com/posts/2013/05/18/achieving-maximum-memory-bandwidth.html
2) https://github.com/awreece/memory-bandwidth-demo
OpenMP
Two main concepts of OpenMP:
the use of threads,
the fork/join model of parallelism.
Website: http://www.openmp.org/
Alternatives working on shared memory: TBB, Cilk (high-level), Pthreads (low-level).
OpenMP
Let's try "hello OpenMP world".
#pragma omp parallel private(nthreads, tid)
{
    tid = omp_get_thread_num();
    printf("Hello OpenMP world from thread = %d\n", tid);
    if (tid == 0) {
        nthreads = omp_get_num_threads();
        printf("Number of threads = %d\n", nthreads);
    }
}
Compiling:
GNU: -fopenmp
Intel: -openmp
Environment variable: export OMP_NUM_THREADS=N
OpenMP
Main tool: preprocessor directives. #pragma omp tells the compiler how to set up threads.
Syntax
#pragma omp directive-name [clause[ [,] clause]...] new-line
#pragma omp parallel
Memory specification
Default: variables declared outside a parallel region are shared; variables declared inside it are private.
OpenMP
Example: vector dot product
Implementation 1
sum = 0.0;
#pragma omp parallel for
for (int i = 0; i < N; ++i) {
    sum += a[i] * b[i];
}
Wrong result!
OpenMP
Why? It's a race condition. All threads are trying to update sum at the same time. Let's correct that!
Implementation 2
sum = 0.0;
#pragma omp parallel for
for (int i = 0; i < N; ++i) {
    #pragma omp atomic
    sum += a[i] * b[i];
}
OpenMP
If there is a special construct for the job, it is usually a good idea to use it!
Implementation 3
sum = 0.0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; ++i) {
    sum += a[i] * b[i];
}
OpenMP
Warning: don't use more program threads than your hardware allows!
Implementation 4
sum = 0.0;
#pragma omp parallel for reduction(+:sum) \
    num_threads(8)
for (int i = 0; i < N; ++i) {
    sum += a[i] * b[i];
}
OpenMP
Fewer "defaults", fewer errors! Specify data-sharing attributes explicitly.
Implementation 5
sum = 0.0;
#pragma omp parallel for reduction(+:sum) \
    shared(a, b, N)
for (int i = 0; i < N; ++i) {
    sum += a[i] * b[i];
}
OpenMP
Let's try to add more flexibility. Dynamic scheduling should be good. Or... should it?
Implementation 6
sum = 0.0;
#pragma omp parallel for reduction(+:sum) \
    shared(a, b) schedule(dynamic)
for (int i = 0; i < N; ++i) {
    sum += a[i] * b[i];
}
OpenMP
Think of how memory is read for a moment. Do you have any idea how to improve? Use blocks.
Implementation 7
sum = 0.0;
int M = 32;
#pragma omp parallel for reduction(+:sum) \
    shared(a, b)
for (int i = 0; i < N/M; i++) {
    for (int j = i*M; j < (i+1)*M; j++) {
        sum += a[j] * b[j];
    }
}
OpenMP
Just an alternative.
Implementation 8
#pragma omp parallel sections
{
    #pragma omp section
    {
        for (int i = 0; i < N/2; i++) {
            sum1 += a[i] * b[i];
        }
    }
    #pragma omp section
    {
        for (int i = N/2; i < N; i++) {
            sum2 += a[i] * b[i];
        }
    }
}
OpenMP
Things to learn:
#pragma omp atomic
#pragma omp flush
#pragma omp critical
#pragma omp barrier
#pragma omp single/master
KMP_AFFINITY
locks and mutexes
false sharing
OpenMP tasks
OpenMP in Python
Bad news (AFAIK)
Python does not play well with OpenMP because of the GIL = Global Interpreter Lock. Because of the GIL, only one thread is allowed to execute Python bytecode at a time.
Good news (AFAIK)
There are some Python extensions which can use OpenMP: SciPy (to some extent), Cython, weave, other C extensions...
Thanks for coming!