What is required for "standard" distributed parallel programming model?
Mitsuhisa Sato, Taisuke Boku and Jinpil Lee
University of Tsukuba
My Background and Position

OpenMP
- A standard parallel programming model and API for shared memory multiprocessors
- Extends the base language (Fortran/C/C++) with directives or pragmas
- Incremental parallel programming: the sequential semantics are kept when the directives are ignored, which allows a range of programming styles
- For scientific applications: support for loop-based parallelism
- Target: small-scale (~16 processors) to medium-scale (~64 processors)
- The first draft was published in 1997, and the standard is now gaining acceptance in the multi-core era

Omni OpenMP compiler project (… now inactive)
- Done in the Real World Computing Partnership (RWCP, ~2002)
- Research objectives:
  - A portable implementation of OpenMP for SMPs
  - Design and implementation of a cluster-enabled OpenMP for PC/WS/SMP clusters, supporting seamless programming from SMPs to clusters, using a page-based software distributed shared memory system
- Free and open-source, released since 1998
Agenda
- OpenMPD: a directive-based programming model for distributed memory
- What is required for a "standard" distributed parallel programming model?
OpenMPD: a directive-based programming model for distributed memory

Objectives
- Provide a simple and "easy-to-understand" programming model for distributed memory; OpenMP is only for shared memory, not for distributed memory
- Support data parallelization and typical parallelization patterns by adding directives similar to OpenMP (inspired by OpenMP)
Features of OpenMPD
- A directive-based programming model for distributed memory systems
- C programming language (and Fortran) + directives
- Explicit communication and synchronization: every action is taken by a directive, which keeps performance tuning "easy to understand"
- Supports typical communication patterns: scatter/gather, reduction, neighbor communication, ...
- "Directives" describe typical data parallelization: array distribution, data synchronization, ...
- Highly portable implementation by translation to MPI: the compiler translates the directives into parallel code using MPI functions
Code Example

Only directives are added to the serial code: incremental parallelization.

int array[YMAX][XMAX];
#pragma ompd distvar(var=array; dim=2)            /* data distribution */

main(){
    int i, j, res;
    res = 0;
#pragma ompd for affinity(array) reduction(+:res) /* work sharing and data synchronization */
    for(i = 0; i < 10; i++)
        for(j = 0; j < 10; j++){
            array[i][j] = func(i, j);
            res += array[i][j];
        }
}
The same code written in MPI

int array[YMAX][XMAX];

main(int argc, char **argv){
    int i, j, res, temp_res, dx, llimit, ulimit, size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    dx = YMAX / size;
    llimit = rank * dx;
    if(rank != (size - 1)) ulimit = llimit + dx;
    else ulimit = YMAX;
    temp_res = 0;
    for(i = llimit; i < ulimit; i++)
        for(j = 0; j < 10; j++){
            array[i][j] = func(i, j);
            temp_res += array[i][j];
        }
    MPI_Allreduce(&temp_res, &res, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    MPI_Finalize();
}
Array data distribution

Each processor computes on a different region of the array:
    #pragma ompd distvar(var=list; dim=num; sleeve=size)

[Figure: array[] partitioned across CPU0-CPU3.]

A reference to an element assigned to another node requires synchronization on the data. Two kinds are provided: sync on the sleeve area only, or sync on the whole array; the programmer chooses which sync is required. In the current implementation, the whole array is replicated on each node.
Data synchronization of array (Gather)

A gather operation distributes the data to every node:
    #pragma ompd gather(var=list)
It executes the communication needed to obtain the data assigned to other nodes, and is the easiest way to synchronize.

[Figure: after the gather, array[] is fully replicated on CPU0-CPU3.]

Now every node can access correct data by local access alone. But the communication is expensive!
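To make the pattern concrete, here is a minimal sketch of a compute-then-gather step, using only the directives shown in these slides (func, YMAX, and XMAX are placeholders carried over from the earlier example):

int array[YMAX][XMAX];
#pragma ompd distvar(var=array; dim=2)

void compute_then_gather(){
    int i, j;
#pragma ompd for affinity(array)          /* each node fills only its own region */
    for(i = 0; i < YMAX; i++)
        for(j = 0; j < XMAX; j++)
            array[i][j] = func(i, j);
#pragma ompd gather(var=array)            /* replicate the whole array on every node */
    /* from here on, any element can be read by a plain local access */
}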
Data synchronization of array (Sleeve)

Exchanges data only on the "sleeve" region. If only neighboring data is required, communicating the sleeve area alone is sufficient, for example:
    b[i] = array[i-1] + array[i+1]

[Figure: array[] distributed across CPU0-CPU3, with sleeve regions exchanged between neighboring nodes.]

The programmer specifies the sleeve region, with its size, explicitly:
    #pragma ompd distvar(var=array; dim=1; sleeve=1)
and triggers the exchange with:
    #pragma ompd sync_sleeve(var=array)
Unlike the gather operation, communication on the sleeve is cheap.
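Putting these directives together, a minimal sketch of one step of a 1-D stencil (N and the loop bounds are illustrative assumptions; b is assumed to be distributed in the same way as array):

int array[N], b[N];
#pragma ompd distvar(var=array; dim=1; sleeve=1)
#pragma ompd distvar(var=b; dim=1; sleeve=1)

void stencil_step(){
    int i;
#pragma ompd sync_sleeve(var=array)   /* exchange one boundary element with each neighbor */
#pragma ompd for affinity(array)      /* each node computes only on its own region */
    for(i = 1; i < N-1; i++)
        b[i] = array[i-1] + array[i+1];
}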
Parallel Execution of "for" loop

A for loop is executed in parallel with affinity to the array distribution:
    #pragma ompd for affinity(array)
    for(i = 2; i <= 10; i++) ...

[Figure: the data region computed by the for loop is divided among CPU0-CPU3 according to the distribution of array[].]
Experimental Results

[Figure: speedup vs. number of nodes (1 to 8) for the EP, Himeno, and CG benchmarks, each in an OpenMPD and an MPI version. Annotations: "constant speed-up with moderate scalability"; "performance degraded by lack of multi-dim. array distribution".]
Related Work
- OpenMP: only for shared memory
- Unified Parallel C: a PGAS (Partitioned Global Address Space) language
- Co-Array Fortran: also PGAS
- The latter two provide alternative programming models to MPI for distributed memory
- OpenWP?
Future Work and Plan for OpenMPD
- Multi-dimensional array distribution and nested parallel loop execution
- Integration of PGAS features for more flexible communication patterns and data distribution; the current OpenMPD supports only typical cases
  - Remote memory access (one-sided communication)
  - Only the assigned part of the data should be allocated on each node, which requires address translation
- Supporting hybrid programming with OpenMP within the node on SMP/multicore clusters, even combined with MPI (see the sketch below)
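As a reference point for that last item, a minimal hybrid MPI + OpenMP sketch of the usual kind (standard MPI and OpenMP calls only; the work loop is a placeholder, not OpenMPD output):

#include <mpi.h>

int main(int argc, char **argv){
    int provided, rank, i;
    double local = 0.0, global;
    /* request threaded MPI; only the master thread will make MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* OpenMP parallelizes the loop across the cores within one node ... */
#pragma omp parallel for reduction(+:local)
    for(i = 0; i < 1000; i++)
        local += (double)(rank * 1000 + i);    /* placeholder work */
    /* ... while MPI combines the per-node partial results across nodes */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}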
Agenda
- OpenMPD: a directive-based programming model for distributed memory
- What is required for a "standard" distributed parallel programming model?
Message Passing Model (MPI)
- Message passing was the dominant programming model in the past. ... Yes.
- Message passing is the dominant programming model today. ... Unfortunately, yes...
- Will OpenMP be a programming model for future systems? ... I hope so, but it is not perfect: OpenMP is only a shared memory model, and (I think) some features needed for performance tuning are missing: data mapping, scalability, I/O, ...
For application programmers
- Are programmers satisfied with MPI? Yes...? Many programmers write MPI.
- Is MPI enough for parallelizing scientific programs?
- The application programmer's concern is to get their answers faster!!
- An automatic parallelizing compiler would be best, but ... many problems remain.
"Life is too short for MPI" (from the WOMPAT 2001 T-shirt message)

Simple N-body problem:

for(i = 0; i < n_particles; i++){
    p = &particles[i];
    ax = 0.0; ay = 0.0; az = 0.0;
    for(j = 0; j < n_particles; j++){
        if(i == j) continue;
        q = &particles[j];
        dx = p->x - q->x;
        dy = p->y - q->y;
        dz = p->z - q->z;
        X = dx * dx + dy * dy + dz * dz;
        if(X < b2){
            f = q->m * (X - a2) * (X - b2);
            ax += f * dx;
            ay += f * dy;
            az += f * dz;
        }
    }
    p->ax = ax; p->ay = ay; p->az = az;
}
for(i = 0; i < n_particles; i++){
    p = &particles[i];
    p->x  += p->vx * DT;
    p->y  += p->vy * DT;
    p->z  += p->vz * DT;
    p->vx += p->ax * DT;
    p->vy += p->ay * DT;
    p->vz += p->az * DT;
}

With MPI, parallelizing this means writing the data partitioning, scheduling, and communication (broadcast, reduction) by hand: it takes several hours. With OpenMP, you just put "#pragma omp parallel for" at the loops: it takes just a few tens of minutes!!! (see the sketch below)
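For the force loop above, that OpenMP change looks roughly like the sketch below; the particle struct and parameter list are assumptions reconstructed from the slide's code, and declaring the temporaries inside the loop makes them private to each thread:

typedef struct { double x, y, z, vx, vy, vz, ax, ay, az, m; } particle_t;

/* hypothetical wrapper around the slide's force loop, parallelized with one directive */
void compute_forces(particle_t *particles, int n_particles, double a2, double b2){
    int i;
#pragma omp parallel for schedule(static)
    for(i = 0; i < n_particles; i++){
        particle_t *p = &particles[i], *q;
        double ax = 0.0, ay = 0.0, az = 0.0, dx, dy, dz, X, f;
        int j;
        for(j = 0; j < n_particles; j++){
            if(i == j) continue;
            q = &particles[j];
            dx = p->x - q->x; dy = p->y - q->y; dz = p->z - q->z;
            X = dx * dx + dy * dy + dz * dz;
            if(X < b2){
                f = q->m * (X - a2) * (X - b2);
                ax += f * dx; ay += f * dy; az += f * dz;
            }
        }
        p->ax = ax; p->ay = ay; p->az = az;
    }
}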
Parallel programming languages

Programming language design reflects its model. So far, many parallel programming languages have been proposed by the computer science community: Jade, HPC++, MPC++, HPF, Linda, Mentat, Fortran M, Occam, APL, SAL, pC++, SISAL, NESL, Cilk, pHaskell, Prolog, Orca, mpC, C*, data-parallel C, Split-C, Fortran D, V, Charm++, CODE, ZPL, Fortran X3H5, ...

Are they actually used by application users? Where have they gone? What is missing in them?
Think about MPI, ...

Why was MPI accepted and so successful?
- Portability: most parallel computing platforms can run MPI programs (even SMPs), and much free, portable software such as MPICH exists.
- Education: the MPI standard has allowed many programmers to learn MPI parallel programming, in university and from books.
Discussion

The demand for parallel programming is increasing!!
- Low-cost PC clusters, SMPs in PC boxes, on-chip multiprocessors, ... multiprocessors even in PDAs, now!

Of course, a clear and excellent concept for the model, good performance, ... many factors are important! But standardization and education are important for widespread use:
- Standardization enables good education.
- The model must be available on many platforms.
Discussion (cont.)

The cost of parallelization is also important for acceptance by application programmers:
- It must be easy to move from an original sequential program.
- What application programmers need to learn must be small.

We have a plan to organize a group for a "standard" parallel programming language for petaflops systems:
- It will be supported by RIKEN; we will try to find funding for development; it should be international.

For a standard, the "agreement" process is more important than "advanced" ideas.

Standardization and Education