
Introduction to parallelprogramming via OpenMP

August 20, 2019

Introduction to parallel programming via OpenMP August 20, 2019 1 / 25


Contents

1 Basic facts about parallel algorithms

2 How things are stored in computer memory

3 OpenMP


“Parallel” comes from ...


Amdahl's law

Gives theoretical speedup formula:

Slatency = 1 / (a + (1 − a)/p)

where

Slatency is the theoretical speedup of the algorithm,

a = (number of operations done sequentially) / (total number of operations) < 1,

p is the number of parallel workers.


Amdahl's law

It follows that

Slatency ≤ 1/a and lim p→∞ Slatency = 1/a.

This doesn't take into account many other factors: memory, various overheads, ...

But it can be used as a first estimate!


Example: dot product

Dot product

Given vectors x and y of length N, compute

c = x · y = Σ_{i=1}^{N} x_i y_i

where x_i, y_i are the coordinates of x and y.

Questions

What is the total number of operations?

What is the maximum speedup?

What about memory?


Amdahl's law (modified)

Slatency = 1 / (a + (1 − a)/p + c)

where:

Slatency is the theoretical speedup of the execution of the whole task,

a = (number of operations done sequentially) / (total number of operations) < 1,

p is the number of parallel workers,

c is the communication overhead (nonlinear!).


Main characteristics:

flops = floating point operations per second

memory access speed

memory latency = amount of time to satisfy an individual memory request

memory bandwidth = how much data a memory channel can transfer at once


Memory access speed


Memory bandwidth

True story (for graphics chips at least)

You cannot optimize more after you've reached the maximum memory bandwidth!


Measuring memory bandwidth

Theoretical bandwidth: number of RAM sticks × bus width × memory clock frequency for one stick.

Experimental bandwidth (simplest variant):

void write_memory_loop(void *array, size_t size) {
    size_t *carray = (size_t *) array;
    size_t i;
    for (i = 0; i < size / sizeof(size_t); i++) {
        carray[i] = 1;
    }
}

Experimental bandwidth: accessed memory / time

Taken from:

1) http://codearcana.com/posts/2013/05/18/achieving-maximum-memory-bandwidth.html

2) https://github.com/awreece/memory-bandwidth-demo


OpenMP

Two main concepts of OpenMP:

the use of threads.

fork/join model of parallelism.

Website: http://www.openmp.org/

Alternatives working on shared memory: TBB, Cilk (high-level), Pthreads (low-level).


OpenMP

Let's try “hello OpenMP world”.

#pragma omp parallel private(nthreads, tid)
{
    tid = omp_get_thread_num();
    printf("Hello OpenMP world from thread = %d\n", tid);
    if (tid == 0) {
        nthreads = omp_get_num_threads();
        printf("Number of threads = %d\n", nthreads);
    }
}

Compiling:

GNU: -fopenmp

Intel: -qopenmp (older versions: -openmp)

Environment variable: export OMP_NUM_THREADS=N


OpenMP

Main tool: preprocessor directives. #pragma omp tells the compiler how to set up threads.

Syntax

#pragma omp directive-name [clause[ [,] clause]...] new-line

For example: #pragma omp parallel

Memory specification

Default: variables declared outside parallel regions are shared; variables declared inside are private.


OpenMP

Example: vector dot product

Implementation 1

sum = 0.0;
#pragma omp parallel for
for (int i = 0; i < N; ++i) {
    sum += a[i] * b[i];
}

Wrong result!


OpenMP

Why? It's a race condition: all threads are trying to update sum at the same time. Let's correct that!

Implementation 2

sum = 0.0;
#pragma omp parallel for
for (int i = 0; i < N; ++i) {
    #pragma omp atomic
    sum += a[i] * b[i];
}


OpenMP

If there is a special construct for the job, it might be a good idea to use it!

Implementation 3

sum = 0.0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; ++i) {
    sum += a[i] * b[i];
}


OpenMP

Warning: don't use more software threads than your hardware allows!

Implementation 4

sum = 0.0;
#pragma omp parallel for reduction(+:sum) \
    num_threads(8)
for (int i = 0; i < N; ++i) {
    sum += a[i] * b[i];
}


OpenMP

Fewer “defaults”, fewer errors! Specify variable sharing explicitly.

Implementation 5

sum = 0.0;
#pragma omp parallel for reduction(+:sum) \
    shared(a, b, N)
for (int i = 0; i < N; ++i) {
    sum += a[i] * b[i];
}


OpenMP

Let's try to add more flexibility. Dynamic scheduling should be good. Or...?

Implementation 6

sum = 0.0;
#pragma omp parallel for reduction(+:sum) \
    shared(a, b) schedule(dynamic)
for (int i = 0; i < N; ++i) {
    sum += a[i] * b[i];
}


OpenMP

Think for a moment about how memory is read. Any idea how to improve? Use blocks!

Implementation 7

sum = 0.0;
int M = 32;
#pragma omp parallel for reduction(+:sum) \
    shared(a, b)
for (int i = 0; i < N/M; i++) {
    for (int j = i*M; j < (i+1)*M; j++) {
        sum += a[j] * b[j];
    }
}


OpenMP

Just an alternative.

Implementation 8

#pragma omp parallel sections
{
    #pragma omp section
    {
        for (int i = 0; i < N/2; i++) {
            sum1 += a[i] * b[i];
        }
    }
    #pragma omp section
    {
        for (int i = N/2; i < N; i++) {
            sum2 += a[i] * b[i];
        }
    }
}


OpenMP

Things to learn:

#pragma omp atomic

#pragma omp flush

#pragma omp critical

#pragma omp barrier

#pragma omp single/master

KMP_AFFINITY

locks and mutexes

false sharing

OpenMP tasks


OpenMP in Python

Bad news (AFAIK)

Python is not friends with OpenMP because of the GIL = Global Interpreter Lock. Because of the GIL, only one thread is allowed to execute Python bytecode at a given time.

Good news (AFAIK)

There are some Python extensions that can use OpenMP: SciPy (to some extent), Cython, weave, other C extensions...


Thanks for coming!
