Parallel Computing Group - University of La Laguna

Towards Structured Parallel Programming

Antonio Dorta, Jesús A. González, Casiano Rodríguez and Francisco de Sande (fsande@ull.es)

University of La Laguna
Tenerife, Canary Islands, Spain

Rome, September 19 2002

Outline

• Skeletons
  ✓ Basic Skeletons
• Goals
• Related Work
• The OTOSP Model
  ✓ Examples
• The llCoMP compiler
• Example(s)
• Computational results
• Conclusions
• Future Work

Skeletons

• Skeletons are software components that reflect the common patterns of most parallel programs

• The goal of the skeleton approach is to develop a viable and formally well-founded methodology for parallel programming

• Programs will be based on a restricted set of structures, avoiding the send/receive mechanism

• An analogy in sequential programming is Dijkstra's structured programming, which adopted the for, while, repeat, etc. skeletons and rejected the use of unstructured gotos

Skeletons: Characteristics

• A good skeleton should be a piece of code that is:

  ✓ Carefully designed
  ✓ Reusable
  ✓ Parametrised
  ✓ Provided with pre-packaged implementations for different architectures

• These codes are named skeletons because they have structure but lack detail

Basic Skeletons

• Although there are many flavours of parallel skeletons, the importance of these is clear:

  ✓ FARM / WorkQueuing
  ✓ PIPE
  ✓ MAP / forall
  ✓ REDUCE, SCAN

• Susana Pelagatti. Structured Development of Parallel Programs. Taylor and Francis, 1997

The Farm / Workqueuing skeleton

✓ Models a set of identical workers computing in parallel a stream of independent tasks

[Figure: a Master holding a TaskList and a ResultList sends Tasks to a set of Workers and collects their Results]
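
A farm can be approximated in standard OpenMP with a dynamically scheduled loop. The sketch below is only an illustration under that assumption: `farm`, `process_task`, `task_list` and `result_list` are hypothetical names, and the code does not show the llc syntax for this skeleton, which these slides do not give.

```c
/* Hypothetical task body: any function of a single task would do here. */
static double process_task(double task) {
    return task * task;
}

/* Farm sketch: every iteration is an independent task. With OpenMP enabled,
   schedule(dynamic) lets idle threads (the workers) pick up the next pending
   task. Without OpenMP the pragma is ignored and the loop runs serially,
   producing the same results. */
void farm(const double *task_list, double *result_list, int n_tasks) {
    int t;
    #pragma omp parallel for schedule(dynamic)
    for (t = 0; t < n_tasks; t++) {
        result_list[t] = process_task(task_list[t]);
    }
}
```

Calling `farm(tasks, results, n)` fills `results[t]` with the outcome of task `t`, whatever order the workers happen to process the tasks in.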

The Pipe skeleton

✓ Exploits parallelism in the evaluation of a cascade of stages

[Figure: N data items flow through the stages Stage 0, Stage 1, ..., Stage Np-2, Stage Np-1]
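
The overlap between stages can be made visible with a small serial emulation. This is a sketch under assumptions: `pipe_run`, `run_stage` and `N_STAGES` are hypothetical names, and the stage function is arbitrary. At time step `step`, stage `k` works on item `step - k`, so within one step every stage handles a different item.

```c
#define N_STAGES 3   /* hypothetical number of stages for this sketch */

/* Hypothetical stage function: stage k adds k + 1 to its input. */
static int run_stage(int stage, int value) {
    return value + stage + 1;
}

/* Pipe sketch: item j is handled by stage k at time step j + k, so each item
   visits the stages in order. The iterations of the inner loop touch
   different items and are therefore independent; on a parallel machine each
   stage would run on its own processor (or set of processors). */
void pipe_run(int *items, int n_items) {
    int step, k;
    for (step = 0; step < n_items + N_STAGES - 1; step++) {
        for (k = 0; k < N_STAGES; k++) {
            int item = step - k;
            if (item >= 0 && item < n_items) {
                items[item] = run_stage(k, items[item]);
            }
        }
    }
}
```

The loop bound `n_items + N_STAGES - 1` is the usual pipeline length: the number of items plus the steps needed to fill and drain the stages.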

The Map / forall skeleton

✓ Models independent data-parallel computations in which the same function is applied to all the elements of a data array

[Figure: elements 0, 1, 2, ..., P-1 are mapped to f(0), f(1), f(2), ..., f(P-1)]
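
In OpenMP terms, a map corresponds to a `parallel for` loop whose iterations do not depend on each other. A minimal sketch, where `map_forall` and the element function `f` are hypothetical names chosen for illustration:

```c
/* Hypothetical element function applied to every array element. */
static double f(double x) {
    return 2.0 * x + 1.0;
}

/* Map sketch: out[i] depends only on in[i], so the iterations can be
   distributed freely among threads or processors. */
void map_forall(const double *in, double *out, int n) {
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        out[i] = f(in[i]);
    }
}
```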

The Reduce / Scan skeleton

✓ Implements parallel reduction and prefix computations over the elements of an array by means of an associative and commutative operator

[Figure: the values d[0], d[1], ..., d[i], ..., d[j], ..., d[Np-2], d[Np-1], held by processors 0, 1, ..., Np-2, Np-1, are combined with the operator @ for i = 1, Np]
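
The difference between the two operations can be sketched with addition as the operator. The names `reduce_sum` and `scan_sum` are hypothetical; the reduction uses the standard OpenMP `reduction` clause, while the scan is written serially only to show what result a parallel prefix computation must produce.

```c
/* Reduce sketch: combines all elements into a single value. */
double reduce_sum(const double *d, int n) {
    double total = 0.0;
    int i;
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += d[i];
    }
    return total;
}

/* Scan sketch (inclusive prefix sum): prefix[i] = d[0] + ... + d[i].
   Serial reference version; a parallel scan yields the same array. */
void scan_sum(const double *d, double *prefix, int n) {
    double running = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        running += d[i];
        prefix[i] = running;
    }
}
```

The last element of the scan equals the result of the reduction, which is why the two skeletons are usually presented together.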

Our goals

• We are working on the design and implementation of a language that supports these basic skeletons

• The language follows the OpenMP syntax wherever OpenMP provides one for a skeleton

• Our constructs extend the OpenMP directives with new annotations where necessary

• The language should allow the efficient nested combination of any of the basic skeletons

• We want the compiler to produce code for both shared and distributed memory architectures

Related Work

• Skeletons:

  ✓ The P3L project (Prof. S. Pelagatti): a “structured” parallel programming language

  ✓ Project eSkel (Prof. M. Cole): library-based approach

  ✓ Project COFFE (Prof. S. Gorlatch): intensive and pragmatic use of collective operations

  ✓ The skeleton library (Prof. H. Kuchen)

Related Work (continued)

• Nesting forall clauses:

  ✓ The NANOS Project. Ayguade E., Martorell X., Labarta J., Gonzalez M. and Navarro N. Exploiting Multiple Levels of Parallelism in OpenMP: A Case Study. Proc. of the 1999 International Conference on Parallel Processing, Aizu (Japan), September 1999

  ✓ The OMNI Project. Yoshizumi Tanaka, Kenjiro Taura, Mitsuhisa Sato and Akinori Yonezawa. Performance Evaluation of OpenMP Applications with Nested Parallelism. Languages, Compilers, and Run-Time Systems for Scalable Computers, pp. 100-112, 2000

• The FARM / Workqueuing skeleton:

  ✓ KAI-Intel Group (Shah, Petersen, Throop). Flexible Control Structures for Parallelism in OpenMP. EWOMP 1999

The OTOSP model

• The One Thread is One Set of Processors (OTOSP) model facilitates the interpretation of how we intend to map the skeletons onto distributed memory machines

An OTOSP computer

• Let’s consider an (ideal) OTOSP computer:

  ✓ It is composed of an infinite number of processors connected through a network

  ✓ Each processor is a RAM machine with its own private memory; the only difference among processors is an internal register containing an integer, the NAME (or number) of the processor

  ✓ The processors are organized in sets

  ✓ The initial set is composed of all the processors in the machine

  ✓ At any time, the memory state of all the processors in the same set is identical

  ✓ An OTOSP computation assumes that all the processors have the same input data and the same program in memory

Example of computation...

    #pragma omp parallel for
    #pragma llc result(ri + i, si[i]);
    for(i = 1; i <= 3; i++) {
      ...
      #pragma omp parallel for
      #pragma llc result(rj + j, sj[j]);
      for(j = 0; j <= i; j++) {
        rj[j] = J_function(i, j, &sj[j], ...);
      }
      ri[i] = I_function(i, &si[i], ...);
    }

[Figure: the initial set of processors 0, 1, 2, ..., 23, ... is divided among the three iterations of the outer loop: i=1 receives processors 0, 3, 6, 9, 12, 15, ...; i=2 receives 1, 4, 7, 10, 13, 16, ...; i=3 receives 2, 5, 8, 11, 14, 17, 20, ..., 23, ...]

Example

[Figure: the division continues in the inner loop. Within the set for i=1 (processors 0, 3, 6, 9, 12, 15, ...), iteration j=0 receives processors 0, 6, 12, ... and iteration j=1 receives processors 3, 9, 15, ...; the sets for i=2 and i=3 are subdivided among their j iterations in the same way]

The real situation

• In a real scenario, the number of processors is limited

• Two situations have to be considered:

  ✓ The number of tasks is larger than (or equal to) the number of processors available in the current set:

      nT ≥ nP

  ✓ The number of processors in the current set is larger than the number of tasks:

      nP > nT

nT ≥ nP

• In the case of more tasks than processors

[Figure: sixteen tasks, numbered 0 to 15, to be distributed among six processors P0 to P5]

nT ≥ nP

• Each processor has to compute several tasks

[Figure: the tasks 0 to 15 distributed among the processors P0 to P5]

• Each processor constitutes a different set

• At the end of the computation, each processor sends its results to its partner processors

    #pragma llc result
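
How many tasks each processor receives depends on the distribution policy, which the slide does not state. The sketch below assumes a cyclic distribution (task t runs on processor t mod nP); this is an assumption made for illustration, and `owner_of_task` and `tasks_of_processor` are hypothetical helper names.

```c
/* ASSUMPTION: cyclic distribution. Task `task` runs on processor
   task % nP. A block distribution would be an equally valid choice. */
int owner_of_task(int task, int nP) {
    return task % nP;
}

/* Number of tasks computed by processor p when nT tasks are distributed
   cyclically among nP processors (nT >= nP). The first nT % nP processors
   receive one extra task. */
int tasks_of_processor(int p, int nT, int nP) {
    int count = nT / nP;
    if (p < nT % nP) {
        count++;
    }
    return count;
}
```

With 16 tasks and 6 processors, as in the figure, this policy gives three tasks to P0 through P3 and two tasks to P4 and P5.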

nT < nP

• In the case of more processors than tasks

[Figure: four tasks, numbered 0 to 3, and six processors P0 to P5]

nT < nP

• Several processors replicate the computation of the same task

[Figure: tasks 0 to 3 assigned to processors P0 to P5; tasks 0 and 1 are each replicated on two processors, tasks 2 and 3 run on one processor each]

• All the processors replicating the same task are in the same set

• Each processor exchanges the corresponding results with its partners in the other sets
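
The division of a set into subsets can be sketched as follows. The cyclic rule used here (processor p joins the set of task p mod nT) appears consistent with the figures, where four tasks on six processors leave tasks 0 and 1 replicated twice, but it remains an assumption; `task_of_processor` and `set_size` are hypothetical names.

```c
/* ASSUMPTION: cyclic assignment. Processor p (numbered within the current
   set) replicates task p % nT, so processors p, p + nT, p + 2*nT, ...
   end up in the same new set. */
int task_of_processor(int p, int nT) {
    return p % nT;
}

/* Number of processors in the set that replicates `task` when nP processors
   are divided among nT tasks (nP > nT). The first nP % nT tasks receive one
   extra processor. */
int set_size(int task, int nT, int nP) {
    int size = nP / nT;
    if (task < nP % nT) {
        size++;
    }
    return size;
}
```

Because all processors in a set hold identical memory, any one of them can supply the results of its task to the partners in the other sets.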

llCoMP

• llCoMP is our prototype compiler for the OTOSP model (LL stands for La Laguna)

• The llCoMP(iler):

  ✓ Transforms annotated C code into C with MPI calls
  ✓ Is implemented using lex and yacc
  ✓ Portability (of both the compiler and the generated code) is a design concern

llCoMP

[Figure: compilation flow. Source C code with pragma annotations → C preprocessor → lexical analysis → syntax analysis → intermediate code (C + MPI functions) → C compiler, linked with the MPI library → binary MPI code]

Computing π

    double t, pi = 0.0, w;
    long i, n = 100000000;
    w = 1.0 / n;
    ...
    #pragma omp parallel for reduction(+:pi) private(t)
    #pragma llc reduction_type(double)
    for(i = 0; i < n; i++) {
      t = (i + 0.5) * w;
      pi += 4.0/(1.0 + t*t);
    }
    pi *= w;
    ...

\[
\pi = \int_0^1 \frac{4}{1+x^2}\,dx \approx \sum_{0 \le i < N} \frac{4}{N\left(1 + \left(\frac{i+0.5}{N}\right)^2\right)}
\]

Computing π

• When compiled with llCoMP, the loop iterations (tasks) are split among the processors

• The private clause is kept only for compatibility with OpenMP because, in the OTOSP model, all storage is private

• The OpenMP reduction clause implies a collective communication among all the processors in the set
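
The effect of splitting the iterations and then reducing can be emulated serially. This is a sketch under assumptions, not the code llCoMP generates: the block distribution is assumed, `partial_pi` and `compute_pi` are hypothetical names, and the loop over `name` stands in for the collective communication (in C with MPI, a call such as `MPI_Allreduce` could plausibly play that role).

```c
/* Partial sum that the processor with NAME `name` (0 <= name < nP) would
   compute. ASSUMPTION: the n iterations are divided into nP contiguous
   blocks; the slides do not state the actual distribution. */
double partial_pi(long n, int nP, int name) {
    double w = 1.0 / n, local = 0.0, t;
    long first = (n * name) / nP;
    long last = (n * (name + 1)) / nP;   /* exclusive upper bound */
    long i;
    for (i = first; i < last; i++) {
        t = (i + 0.5) * w;
        local += 4.0 / (1.0 + t * t);
    }
    return local;
}

/* Serial stand-in for the reduction: adds the partial sums of all the
   processors and applies the final scaling by w = 1/n. */
double compute_pi(long n, int nP) {
    double pi = 0.0;
    int name;
    for (name = 0; name < nP; name++) {
        pi += partial_pi(n, nP, name);
    }
    return pi / n;
}
```

Every processor runs the same code and differs only in its NAME, which selects the block of iterations it owns.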

Computing π

• Speedup computing π on a SunFire 6800, 10^9 iterations

Combining reduction and result clauses: Molecular Dynamics

    void compute(int np, int nd, double *box, vnd_t *pos, vnd_t *vel,
                 double mass, vnd_t *f, double *pot_p, double *kin_p) {
      double x, d, pot, kin;
      int i, j, k;
      vnd_t rij;

      pot = kin = 0.0;

      #pragma omp parallel for default(shared) private(i, j, k, rij, d) reduction(+: pot, kin)
      #pragma llc reduction_type (double, double)
      #pragma llc result(f[i], nd)
      for (i = 0; i < np; i++) { /* energies and forces */
        for (j = 0; j < nd; j++)
          f[i][j] = 0.0;
        for (j = 0; j < np; j++) {
          if (i != j) {
            d = dist(nd, box, pos[i], pos[j], rij);
            pot = pot + 0.5 * v(d);
            for (k = 0; k < nd; k++) {
              f[i][k] = f[i][k] - rij[k] * dv(d) / d;
            }
          }
        }
        kin = kin + dotr8(nd, vel[i], vel[i]); /* kinetic term of particle i */
      }
      kin = kin * 0.5 * mass;
      *pot_p = pot;
      *kin_p = kin;
    }

Molecular Dynamics

• SGI Origin 3800
  ✓ dim = 3
  ✓ 8192 particles
  ✓ simulation steps = 10

The Single Resource Allocation Problem (SRAP)

• M units of an indivisible resource and a set of N tasks

• f_n(r): benefit obtained when r units of resource are allocated to task n

\[
\max \sum_{n=1}^{N} f_n(r_n)
\quad \text{subject to} \quad
\sum_{n=1}^{N} r_n = M, \qquad r_n \ge 0 \ \text{integer}, \quad n = 1, \ldots, N
\]

\[
G[n][r] = \max \{\, G[n-1][r-i] + f_n(i) \;:\; 0 \le i \le r \,\}
\]

A sequential dynamic programming algorithm for the SRAP problem

    int srap(int N, int M, cost f, table G, table L) {
      int r, n, i, s, decision_i, temp, pos, chunksize, buffersize;

      for (n = 0; n < N; n++) {
        if (n == 0)
          for (r = 0; r <= M; r++) {
            G[0][r] = f(0, r); /* f is non decreasing */
          }
        else
          for (r = 0; r <= M; r++) {
            temp = G[n-1][r];
            pos = 0;
            for (i = 1; i <= r; i++) {
              decision_i = G[n-1][r-i] + f(n, i);
              if (decision_i > temp) {
                temp = decision_i;
                pos = i;
              }
            }
            G[n][r] = temp;
            L[n][r] = pos;
          }
      }
      return G[N-1][M];
    }
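
The table L records, for each task n and each amount r, how many units task n receives in the best solution, so the allocation itself can be recovered by walking the table backwards. The following is a sketch under assumptions: the `table` type, the bounds `MAX_N` and `MAX_M`, and the function `srap_allocation` are hypothetical, since the slides do not define them, and task 0 is assumed to take all remaining units because G[0][r] = f(0, r).

```c
#define MAX_N 8     /* hypothetical bounds, only for this sketch */
#define MAX_M 16

typedef int table[MAX_N][MAX_M + 1];   /* assumed layout of G and L */

/* Recovers the units assigned to each task from the decision table L.
   L[n][r] is taken to be the number of units given to task n when r units
   are shared among tasks 0..n. Row 0 of L is not consulted: task 0 simply
   receives whatever is left. */
void srap_allocation(int N, int M, table L, int *units) {
    int n, r = M;
    for (n = N - 1; n >= 1; n--) {
        units[n] = L[n][r];
        r -= units[n];
    }
    units[0] = r;
}
```

After a call to `srap`, `srap_allocation(N, M, L, units)` would fill `units[n]` with the share of each task, and the shares add up to M.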

Using the PIPE skeleton for the SRAP problem

    int srap(int N, int M, cost f, table G, table L) {
      int r, n, i, s, decision_i, temp, pos, chunksize, buffersize;

      #pragma llc pipeline schedule(chunksize, buffersize)
      #pragma llc result (&G[n][0], M) (&L[n][0], M)
      for (n = 0; n < N; n++) {
        if (n == 0)
          for (r = 0; r <= M; r++) {
            G[0][r] = f(0, r); /* f is non decreasing */
            #pragma llc send (&G[0][r], 1)
          }
        else
          for (r = 0; r <= M; r++) {
            #pragma llc receive (&G[n-1][r], &s)
            temp = G[n-1][r];
            pos = 0;
            for (i = 1; i <= r; i++) {
              decision_i = G[n-1][r-i] + f(n, i);
              if (decision_i > temp) {
                temp = decision_i;
                pos = i;
              }
            }
            G[n][r] = temp;
            #pragma llc send (&G[n][r], 1)
            L[n][r] = pos;
          }
      }
      return G[N-1][M];
    }

The Pipe skeleton

• Let’s see the organization of the processors for the case where nP > nT (more processors than stages in the pipe)

[Figure: data items flow through Stage 0, Stage 1, Stage 2, ..., Stage N-1. The processors 0, 1, 2, ..., P are distributed among the stages: Stage 0 receives processors 0, N, 2N, ...; Stage 1 receives 1, N+1, 2N+1, ...; Stage 2 receives 2, N+2, ..., P; Stage N-1 receives N-1, 2N-1, ...]

Dependencies

[Figure: processor n-1 holds the row G[n-1][·] and processor n holds the row G[n][·]; the entry G[n][r] depends on the entries G[n-1][0], ..., G[n-1][r]]

• In the case of a PIPE skeleton, the tasks are not independent: there is a specific relationship among them

\[
G[n][r] = \max \{\, G[n-1][r-i] + f_n(i) \;:\; 0 \le i \le r \,\}
\]

The Single Resource Allocation Problem

• Cray T3E
  ✓ Tasks: 350
  ✓ Resource units: 4000

Nesting the skeletons: the FFT algorithm

    void FFT(Complex *A, Complex *a, Complex *W, unsigned N,
             unsigned stride, Complex *D) {
      Complex *B, *C;
      Complex Aux, *pW;
      unsigned i, n;

      if(N == 1) {
        A[0].re = a[0].re;
        A[0].im = a[0].im;
      }
      else {
        n = (N >> 1);
        B = D;
        C = D + n;

        #pragma omp parallel for
        #pragma llc result(D+i*n, n)
        for(i = 0; i <= 1; i++)
          FFT(D+i*n, a+i*stride, W, n, stride<<1, A+i*n);

        for(i = 0, pW = W; i < n; i++, pW += stride) {
          Aux.re = pW->re * C[i].re - pW->im * C[i].im;
          Aux.im = pW->re * C[i].im + pW->im * C[i].re;
          A[i].re = B[i].re + Aux.re;
          A[i].im = B[i].im + Aux.im;
          A[i+n].re = B[i].re - Aux.re;
          A[i+n].im = B[i].im - Aux.im;
        }
      }
    }

The FFT algorithm

\[
A = (A[0], \ldots, A[N-1]) \in \mathbb{C}^N, \qquad
\mathrm{FFT}(A) = B = (B[0], \ldots, B[N-1]) \in \mathbb{C}^N
\]

\[
B[i] = \sum_{k=0}^{N-1} A[k]\, w^{ki}, \qquad
w = e^{2\pi \mathrm{i}/N} = \cos(2\pi/N) + \mathrm{i}\,\sin(2\pi/N)
\]

\[
B[i] = \sum_{k=0}^{N/2-1} A[2k]\, (w^2)^{ki} + w^{i} \sum_{k=0}^{N/2-1} A[2k+1]\, (w^2)^{ki}
\]

[Figure: recursion tree of the FFT calls. The root call FFT(A[0], A[1], ..., A[N-1]) spawns two calls, one on the even-indexed and one on the odd-indexed elements, and each of those spawns two further calls; the four calls at the lowest level shown are each labelled with a processor]

The Fast Fourier Transform

• Cray T3E
  ✓ Sizes: 64K, 120K, 256K, 512K, 1M

The Matrix product

    #pragma omp parallel for
    #pragma llc weight (1 << t)
    #pragma llc result(CC + t * m * m, m * m)
    for(t = 0; t <= tasks - 1; t++) {
      col = n << t;
      A = AA + m * n * ((1 << t) - 1);
      B = BB + m * n * ((1 << t) - 1);
      C = CC + m * m * t;
      #pragma omp parallel for
      #pragma llc result(C + i * m, m)
      for(i = 0; i <= m - 1; i++) {
        for(j = 0; j < m; j++)
          for(C[t][i][j] = 0.0, k = 0; k < n; k++)
            C[t][i][j] += A[t][i][k] * B[t][k][j];
      }
    }

Matrix products with different levels of parallelism

• SGI 3000
  ✓ Each task is a matrix product
  ✓ Exploiting different levels of parallelism

Current developments

• Algorithms:
  ✓ Molecular Dynamics
  ✓ Conjugate Gradient
  ✓ NAS Embarrassingly Parallel
  ✓ Mandelbrot Set
  ✓ Matrix product
  ✓ Quicksort
  ✓ Fast Fourier Transform
  ✓ Single Resource Allocation Problem
  ✓ Knapsack Problem

• Architectures:
  ✓ Cray T3E
  ✓ Beowulf-type PC cluster
  ✓ SGI Origin 3800
  ✓ SunFire 6800 SMP

Conclusions

• We have presented a proposal for a skeletal language that extends OpenMP with new constructs

• We have introduced the OTOSP abstract model as the basis for our implementation

• The model guarantees portability to any platform

• We have developed llCoMP, a prototype compiler for the language

• We have shown different examples of algorithms implemented following the ideas of the model

• We have presented computational results for these examples on different architectures, obtained with the llCoMPiler

• We think that research and development of tools oriented to the OTOSP model is worthwhile

Future Work

• Add new features to the language

  ✓ Improve the PIPE skeleton
    ▪ Introducing different assignment policies
    ▪ Controlling the number of processors assigned to each stage
    ▪ Using buffers to produce ‘tiled’ code
    ▪ Managing data sequences of unknown length
    ▪ Managing data sequences with varying sizes

  ✓ Work on the FARM skeleton

• Improve the prototype llCoMP compiler

  ✓ Including type analysis

• Implement the OTOSP model using threads

Acknowledgments

• Edinburgh Parallel Computing Centre (EPCC)
• Centre Europeu de Parallelisme de Barcelona (CEPBA)
• Centre de Supercomputació de Catalunya (CESCA)
• Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT)

• This research benefits from the support of Secretaría de Estado de Universidades e Investigación, SEUI, project MaLLBa, TIC1999-0754-C03

• It is also supported by the European Commission through grant number HPRI-CT-1999-00026

Towards Structured Parallel Programming

Antonio Dorta, Jesús A. González, Casiano Rodríguez and Francisco de Sande (fsande@ull.es)

http://nereida.deioc.ull.es/llCoMP/