INVESTIGATING PARALLEL PROCESSING USING THE IBM eSERVER E1350 CLUSTER
Ayaz ul Hassan Khan (g201002860)
OBJECTIVES
- Explore the architecture of the IBM eServer E1350 cluster
- Parallel programming: OpenMP, MPI, MPI+OpenMP
- Analyze the effects of these programming models on speedup
- Identify overheads and optimize as much as possible
IBM E1350 CLUSTER
CLUSTER SYSTEM
- The cluster is unique in its dual-boot capability, running both Microsoft Windows HPC Server 2008 and Red Hat Enterprise Linux 5.
- 3 master nodes: one for Red Hat Linux, one for Windows HPC Server 2008, and one for cluster management.
- 128 compute nodes; each is a dual-processor x3550 with two 2.0 GHz quad-core Xeon E5405 processors, for 1024 cores in total.
- Each master node has 1 TB of disk and 8 GB of RAM; each compute node has 500 GB of disk and 4 GB of RAM.
- The interconnect is 10GBASE-SR.
EXPERIMENTAL ENVIRONMENT
Nodes: hpc081, hpc082, hpc083, hpc084
Compilers:
- icc: for sequential and OpenMP programs
- mpiicc: for MPI and MPI+OpenMP programs
Profiling tools:
- ompP: for OpenMP profiling
- mpiP: for MPI profiling
APPLICATIONS USED/IMPLEMENTED
Jacobi Iterative Method
- Max speedup = 7.1 (OpenMP, 8 threads)
- Max speedup = 3.7 (MPI, 4 nodes)
- Max speedup = 9.3 (MPI+OpenMP, 2 nodes, 8 threads)
Alternating Direction Integration (ADI)
- Max speedup = 5.0 (OpenMP, 8 threads)
- Max speedup = 0.8 (MPI, 1 node)
- Max speedup = 1.7 (MPI+OpenMP, 1 node, 8 threads)
JACOBI ITERATIVE METHOD: Solving Systems of Linear Equations

$$x_i^{(k+1)} = \frac{b_i - \sum_{j \neq i} a_{ij}\, x_j^{(k)}}{a_{ii}}$$
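As a tiny worked illustration (not from the slides), take a strictly diagonally dominant 2x2 system; starting from $x^{(0)} = (0, 0)$, one Jacobi sweep gives:

$$4x_1 + x_2 = 9, \qquad x_1 + 3x_2 = 7$$

$$x_1^{(1)} = \frac{9 - x_2^{(0)}}{4} = 2.25, \qquad x_2^{(1)} = \frac{7 - x_1^{(0)}}{3} \approx 2.33$$

Repeated sweeps converge to the exact solution $(20/11,\, 19/11)$ because the coefficient matrix is strictly diagonally dominant.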
JACOBI ITERATIVE METHOD: Sequential Code

/* initial guess: x = b */
for (i = 0; i < N; i++)
    x[i] = b[i];

/* MAX_ITER sweeps, as in the parallel versions */
for (k = 0; k < MAX_ITER; k++) {
    for (i = 0; i < N; i++) {
        sum = 0.0;
        for (j = 0; j < N; j++) {
            if (i != j)
                sum += a[i][j] * x[j];
        }
        new_x[i] = (b[i] - sum) / a[i][i];
    }
    for (i = 0; i < N; i++)
        x[i] = new_x[i];
}
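The code above runs a fixed number of sweeps with no stopping test. A convergence check is a natural refinement; the following is a minimal sketch (not in the original) that replaces the copy loop inside the iteration, assuming a user-chosen tolerance TOL:

#include <math.h>   /* for fabs() */

/* copy new_x into x, tracking the largest per-component change */
double diff = 0.0;
for (i = 0; i < N; i++) {
    double d = fabs(new_x[i] - x[i]);
    if (d > diff)
        diff = d;
    x[i] = new_x[i];
}
if (diff < TOL)
    break;   /* converged: exit the MAX_ITER loop early */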
[Figure: Jacobi sequential execution time (secs) vs. space size N, N = 128 to 768]
JACOBI ITERATIVE METHOD: OpenMP Code

#pragma omp parallel private(k, i, j, sum)
{
    for (k = 0; k < MAX_ITER; k++) {
        #pragma omp for
        for (i = 0; i < N; i++) {
            sum = 0.0;
            for (j = 0; j < N; j++) {
                if (i != j)
                    sum += a[i][j] * x[j];
            }
            new_x[i] = (b[i] - sum) / a[i][i];
        }
        #pragma omp for
        for (i = 0; i < N; i++)
            x[i] = new_x[i];
    }
}
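The performance plots that follow compare a "barrier" version (above, with the implicit barrier after each omp for) against a "nowait" version. The slides do not show the latter; a minimal sketch, assuming it simply adds nowait clauses, is below. With static scheduling each thread copies exactly the new_x entries it computed, but dropping the barriers lets a thread start the next sweep while others are still refreshing x, so this becomes an asynchronous Jacobi variant that may read slightly stale values.

#pragma omp parallel private(k, i, j, sum)
{
    for (k = 0; k < MAX_ITER; k++) {
        #pragma omp for nowait        /* no barrier after the sweep */
        for (i = 0; i < N; i++) {
            sum = 0.0;
            for (j = 0; j < N; j++)
                if (i != j)
                    sum += a[i][j] * x[j];
            new_x[i] = (b[i] - sum) / a[i][i];
        }
        #pragma omp for nowait        /* no barrier after the copy */
        for (i = 0; i < N; i++)
            x[i] = new_x[i];
    }
}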
JACOBI ITERATIVE METHOD: OpenMP Performance

[Figure: OpenMP (barrier) speedup vs. space size N for 2, 4, and 8 cores]
[Figure: OpenMP (nowait) speedup vs. space size N for 2, 4, and 8 cores]
[Figure: OpenMP (barrier) overhead vs. space size N for 2, 4, and 8 cores]
[Figure: OpenMP (nowait) overhead vs. space size N for 2, 4, and 8 cores]
JACOBI ITERATIVE METHOD: ompP Results (barrier)

R00002 jacobi_openmp.c (46-55) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
0     0.09   100    0.07   0.01      0.00
1     0.08   100    0.07   0.00      0.00
2     0.08   100    0.07   0.01      0.00
3     0.08   100    0.07   0.01      0.00
4     0.08   100    0.07   0.01      0.00
5     0.08   100    0.07   0.01      0.00
6     0.08   100    0.07   0.01      0.00
7     0.08   100    0.07   0.01      0.00
SUM   0.65   800    0.59   0.06      0.00

R00003 jacobi_openmp.c (56-58) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
0     0.00   100    0.00   0.00      0.00
1     0.00   100    0.00   0.00      0.00
2     0.00   100    0.00   0.00      0.00
3     0.00   100    0.00   0.00      0.00
4     0.00   100    0.00   0.00      0.00
5     0.00   100    0.00   0.00      0.00
6     0.00   100    0.00   0.00      0.00
7     0.00   100    0.00   0.00      0.00
SUM   0.01   800    0.00   0.01      0.00
JACOBI ITERATIVE METHOD: ompP Results (nowait)

R00002 jacobi_openmp.c (43-52) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
0     0.08   100    0.08   0.00      0.00
1     0.08   100    0.08   0.00      0.00
2     0.08   100    0.08   0.00      0.00
3     0.08   100    0.08   0.00      0.00
4     0.08   100    0.08   0.00      0.00
5     0.08   100    0.08   0.00      0.00
6     0.08   100    0.08   0.00      0.00
7     0.08   100    0.08   0.00      0.00
SUM   0.63   800    0.63   0.00      0.00

R00003 jacobi_openmp.c (53-55) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
0     0.00   100    0.00   0.00      0.00
1     0.00   100    0.00   0.00      0.00
2     0.00   100    0.00   0.00      0.00
3     0.00   100    0.00   0.00      0.00
4     0.00   100    0.00   0.00      0.00
5     0.00   100    0.00   0.00      0.00
6     0.00   100    0.00   0.00      0.00
7     0.00   100    0.00   0.00      0.00
SUM   0.00   800    0.00   0.00      0.00
JACOBI ITERATIVE METHOD: MPI Code

MPI_Scatter(a, N * N/P, MPI_DOUBLE, apart, N * N/P, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

/* x was initialized to b, so this extracts this rank's slice of b */
for (i = myrank * N/P, k = 0; k < N/P; i++, k++)
    bpart[k] = x[i];

for (k = 0; k < MAX_ITER; k++) {
    for (i = 0; i < N/P; i++) {
        sum = 0.0;
        index = i + (N/P) * myrank;   /* global row index */
        for (j = 0; j < N; j++) {
            if (index != j)
                sum += apart[i][j] * x[j];
        }
        new_x[i] = (bpart[i] - sum) / apart[i][index];
    }
    MPI_Allgather(new_x, N/P, MPI_DOUBLE, x, N/P, MPI_DOUBLE, MPI_COMM_WORLD);
}
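Here too the sweep count is fixed. A distributed stopping test is a natural extension; the sketch below (an assumption, not shown in the slides) would go just before the MPI_Allgather call, reducing each rank's local maximum change so that all ranks agree on when to stop:

/* hedged sketch: global convergence test with a user-chosen tolerance TOL */
double local_diff = 0.0, global_diff;
for (i = 0; i < N/P; i++) {
    double d = fabs(new_x[i] - x[i + myrank * (N/P)]);   /* old local slice of x */
    if (d > local_diff)
        local_diff = d;
}
MPI_Allreduce(&local_diff, &global_diff, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
if (global_diff < TOL)
    break;   /* every rank sees the same global_diff, so all exit together */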
JACOBI ITERATIVE METHOD: MPI Performance

[Figure: MPI speedup vs. space size N for 1, 2, and 4 nodes]
[Figure: maximum MPI-time-to-application-time ratio (%) vs. space size N for 1, 2, and 4 nodes]
JACOBI ITERATIVE METHOD: mpiP Results

@--- Aggregate Time (top twenty, descending, milliseconds) ---
Call       Site  Time    App%   MPI%   COV
Allgather  1     60.1    6.24   19.16  0.00
Allgather  2     58.8    6.11   18.77  0.00
Allgather  3     57.3    5.96   18.29  0.00
Scatter    4     34.6    3.59   11.03  0.00
Scatter    3     31.8    3.30   10.14  0.00
Scatter    1     30.1    3.13   9.61   0.00
Scatter    2     27      2.81   8.62   0.00
Bcast      2     7.05    0.73   2.25   0.00
Allgather  4     4.33    0.45   1.38   0.00
Bcast      3     2.25    0.23   0.72   0.00
Bcast      1     0.083   0.01   0.03   0.00
Bcast      4     0.029   0.00   0.01   0.00
JACOBI ITERATIVE METHOD: MPI+OpenMP Code

MPI_Scatter(a, N * N/P, MPI_DOUBLE, apart, N * N/P, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
for (i = myrank * N/P, k = 0; k < N/P; i++, k++)
    bpart[k] = x[i];

omp_set_num_threads(T);
#pragma omp parallel private(k, i, j, index, sum)   /* sum must be private too */
{
    for (k = 0; k < MAX_ITER; k++) {
        #pragma omp for
        for (i = 0; i < N/P; i++) {
            sum = 0.0;
            index = i + (N/P) * myrank;
            for (j = 0; j < N; j++) {
                if (index != j)
                    sum += apart[i][j] * x[j];
            }
            new_x[i] = (bpart[i] - sum) / apart[i][index];
        }
        #pragma omp master
        {
            MPI_Allgather(new_x, N/P, MPI_DOUBLE, x, N/P, MPI_DOUBLE, MPI_COMM_WORLD);
        }
        #pragma omp barrier   /* master has no implied barrier: x must be
                                 fully gathered before the next sweep */
    }
}
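Since only the master thread makes MPI calls inside the parallel region, the MPI library must be initialized with at least funneled thread support. A minimal sketch, assuming the usual argc/argv entry point:

#include <mpi.h>

int main(int argc, char *argv[])
{
    int provided;
    /* request FUNNELED: only the thread that called MPI_Init_thread
       (here, the eventual OpenMP master) will make MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) {
        /* the library cannot guarantee thread safety: abort */
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* ... hybrid Jacobi code from the slide above ... */
    MPI_Finalize();
    return 0;
}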
JACOBI ITERATIVE METHOD: MPI+OpenMP Performance

[Figure: MPI+OpenMP speedup vs. space size N for 1, 2, and 4 nodes]
[Figure: MPI+OpenMP overhead vs. space size N for 1, 2, and 4 nodes]
[Figure: maximum MPI-time-to-application-time ratio (%) vs. space size N for 1, 2, and 4 nodes]
JACOBI ITERATIVE METHOD: ompP Results (MPI+OpenMP)

R00002 jacobi_mpi_openmp.c (55-65) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
0     0.03   100    0.02   0.01      0.00
1     0.24   100    0.02   0.23      0.00
2     0.24   100    0.02   0.22      0.00
3     0.24   100    0.02   0.22      0.00
4     0.24   100    0.02   0.22      0.00
5     0.24   100    0.02   0.22      0.00
6     0.24   100    0.02   0.22      0.00
7     0.24   100    0.02   0.22      0.00
SUM   1.72   800    0.15   1.56      0.00

R00003 jacobi_mpi_openmp.c (67-70) MASTER
TID   execT  execC
0     0.22   100
SUM   0.22   100
JACOBI ITERATIVE METHOD: mpiP Results (MPI+OpenMP)

@--- Aggregate Time (top twenty, descending, milliseconds) ---
Call       Site  Time   App%  MPI%   COV
Scatter    8     34.7   9.62  14.11  0.00
Allgather  1     32.6   9.05  13.28  0.00
Scatter    6     31.3   8.70  12.76  0.00
Scatter    2     30.2   8.39  12.31  0.00
Allgather  3     29.9   8.30  12.18  0.00
Allgather  5     27.6   7.67  11.25  0.00
Scatter    4     27.1   7.51  11.02  0.00
Allgather  7     22.1   6.14  9.00   0.00
Bcast      4     7.12   1.98  2.90   0.00
Bcast      6     2.81   0.78  1.14   0.00
Bcast      2     0.09   0.02  0.04   0.00
Bcast      8     0.033  0.01  0.01   0.00
ADI: Alternating Direction Integration

Each sweep updates the solution and pivot arrays in place (as in the code on the following slides); along rows:

$$x_{i,j} \leftarrow x_{i,j} - x_{i,j-1}\,\frac{a_{i,j}}{b_{i,j-1}}, \qquad b_{i,j} \leftarrow b_{i,j} - \frac{a_{i,j}^2}{b_{i,j-1}}$$

followed by back-substitution:

$$x_{i,j} \leftarrow \frac{x_{i,j} - a_{i,j+1}\,x_{i,j+1}}{b_{i,j}}$$

The same pair of sweeps is then applied along columns.
ADI: Sequential Code

/* ADI forward & backward sweep along rows */
for (i = 0; i < N; i++) {
    for (j = 1; j < N; j++) {
        x[i][j] = x[i][j] - x[i][j-1] * a[i][j] / b[i][j-1];
        b[i][j] = b[i][j] - a[i][j] * a[i][j] / b[i][j-1];
    }
    x[i][N-1] = x[i][N-1] / b[i][N-1];
}
for (i = 0; i < N; i++)
    for (j = N-2; j > 1; j--)
        x[i][j] = (x[i][j] - a[i][j+1] * x[i][j+1]) / b[i][j];

/* ADI forward & backward sweep along columns */
for (j = 0; j < N; j++) {
    for (i = 1; i < N; i++) {
        x[i][j] = x[i][j] - x[i-1][j] * a[i][j] / b[i-1][j];
        b[i][j] = b[i][j] - a[i][j] * a[i][j] / b[i-1][j];
    }
    x[N-1][j] = x[N-1][j] / b[N-1][j];
}
for (j = 0; j < N; j++)
    for (i = N-2; i > 1; i--)
        x[i][j] = (x[i][j] - a[i+1][j] * x[i+1][j]) / b[i][j];
[Figure: ADI sequential execution time (secs) vs. space size N, N = 128 to 768]
ADI: OpenMP Code

#pragma omp parallel private(iter)
{
    for (iter = 1; iter <= MAXITER; iter++) {
        /* ADI forward & backward sweep along rows */
        /* nowait is safe here: with static scheduling each thread keeps
           the same rows i in the next loop, and rows are independent */
        #pragma omp for private(i,j) nowait
        for (i = 0; i < N; i++) {
            for (j = 1; j < N; j++) {
                x[i][j] = x[i][j] - x[i][j-1] * a[i][j] / b[i][j-1];
                b[i][j] = b[i][j] - a[i][j] * a[i][j] / b[i][j-1];
            }
            x[i][N-1] = x[i][N-1] / b[i][N-1];
        }
        #pragma omp for private(i,j)   /* implicit barrier before the column sweeps */
        for (i = 0; i < N; i++)
            for (j = N-2; j > 1; j--)
                x[i][j] = (x[i][j] - a[i][j+1] * x[i][j+1]) / b[i][j];

        /* ADI forward & backward sweep along columns */
        #pragma omp for private(i,j) nowait
        for (j = 0; j < N; j++) {
            for (i = 1; i < N; i++) {
                x[i][j] = x[i][j] - x[i-1][j] * a[i][j] / b[i-1][j];
                b[i][j] = b[i][j] - a[i][j] * a[i][j] / b[i-1][j];
            }
            x[N-1][j] = x[N-1][j] / b[N-1][j];
        }
        #pragma omp for private(i,j)
        for (j = 0; j < N; j++)
            for (i = N-2; i > 1; i--)
                x[i][j] = (x[i][j] - a[i+1][j] * x[i+1][j]) / b[i][j];
    }
}
ADI: OpenMP Performance

[Figure: ADI OpenMP speedup vs. space size N for 2, 4, and 8 cores]
[Figure: ADI OpenMP overhead vs. space size N for 2, 4, and 8 cores]
ADI: ompP Results (OpenMP)

R00002 adi_openmp.c (43-50) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
0     0.18   100    0.18   0.00      0.00
1     0.18   100    0.18   0.00      0.00
2     0.18   100    0.18   0.00      0.00
3     0.18   100    0.18   0.00      0.00
4     0.18   100    0.18   0.00      0.00
5     0.18   100    0.18   0.00      0.00
6     0.18   100    0.18   0.00      0.00
7     0.18   100    0.18   0.00      0.00
SUM   1.47   800    1.47   0.00      0.00

R00003 adi_openmp.c (52-57) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
0     0.11   100    0.10   0.01      0.00
1     0.11   100    0.10   0.01      0.00
2     0.11   100    0.10   0.01      0.00
3     0.10   100    0.10   0.00      0.00
4     0.11   100    0.10   0.01      0.00
5     0.10   100    0.10   0.01      0.00
6     0.10   100    0.10   0.01      0.00
7     0.10   100    0.10   0.00      0.00
SUM   0.84   800    0.78   0.06      0.00

R00004 adi_openmp.c (61-68) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
0     0.38   100    0.38   0.00      0.00
1     0.31   100    0.31   0.00      0.00
2     0.35   100    0.35   0.00      0.00
3     0.29   100    0.29   0.00      0.00
4     0.35   100    0.35   0.00      0.00
5     0.36   100    0.36   0.00      0.00
6     0.36   100    0.36   0.00      0.00
7     0.37   100    0.37   0.00      0.00
SUM   2.77   800    2.77   0.00      0.00

R00005 adi_openmp.c (70-75) LOOP
TID   execT  execC  bodyT  exitBarT  taskT
0     0.16   100    0.16   0.00      0.00
1     0.23   100    0.15   0.07      0.00
2     0.19   100    0.14   0.05      0.00
3     0.25   100    0.16   0.09      0.00
4     0.19   100    0.14   0.05      0.00
5     0.18   100    0.17   0.01      0.00
6     0.18   100    0.17   0.01      0.00
7     0.17   100    0.17   0.01      0.00
SUM   1.55   800    1.26   0.29      0.00
ADI: MPI Code

MPI_Bcast(a, N * N, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);

for (i = myrank * (N/P), k = 0; k < N/P; i++, k++)
    for (j = 0; j < N; j++)
        apart[k][j] = a[i][j];

/* 2*MAXITER passes: each pass sweeps one direction, with a transpose in between */
for (iter = 1; iter <= 2*MAXITER; iter++) {
    /* ADI forward & backward sweep along rows */
    for (i = 0; i < N/P; i++) {
        for (j = 1; j < N; j++) {
            xpart[i][j] = xpart[i][j] - xpart[i][j-1] * apart[i][j] / bpart[i][j-1];
            bpart[i][j] = bpart[i][j] - apart[i][j] * apart[i][j] / bpart[i][j-1];
        }
        xpart[i][N-1] = xpart[i][N-1] / bpart[i][N-1];
    }
    for (i = 0; i < N/P; i++)
        for (j = N-2; j > 1; j--)
            xpart[i][j] = (xpart[i][j] - apart[i][j+1] * xpart[i][j+1]) / bpart[i][j];
ADI: MPI Code (continued)

    MPI_Gather(xpart, N*N/P, MPI_FLOAT, x, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Gather(bpart, N*N/P, MPI_FLOAT, b, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);

    /* transpose matrices so the next pass sweeps the other direction */
    trans(x, N, N);
    trans(b, N, N);
    trans(a, N, N);

    MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);

    for (i = myrank * (N/P), k = 0; k < N/P; i++, k++)
        for (j = 0; j < N; j++)
            apart[k][j] = a[i][j];
}
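The trans() helper is called above but never shown in the slides; a minimal in-place square-transpose sketch, assuming N is a compile-time constant and the matrices are row-major float arrays as in the rest of the code:

/* hedged sketch of the unshown trans() helper: in-place transpose of a
   square row-major matrix (rows == cols == N for all calls in these slides) */
void trans(float m[N][N], int rows, int cols)
{
    int i, j;
    float tmp;
    for (i = 0; i < rows; i++) {
        for (j = i + 1; j < cols; j++) {
            tmp = m[i][j];
            m[i][j] = m[j][i];
            m[j][i] = tmp;
        }
    }
}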
ADI: MPI Performance

[Figure: ADI MPI speedup vs. space size N for 1, 2, and 4 nodes]
[Figure: maximum MPI-time-to-application-time ratio (%) vs. space size N for 1, 2, and 4 nodes]
ADI: mpiP Results (MPI)

@--- Aggregate Time (top twenty, descending, milliseconds) ---
Call     Site  Time      App%   MPI%   COV
Gather   1     8.63e+04  22.83  23.54  0.00
Gather   3     6.29e+04  16.63  17.15  0.00
Gather   2     6.08e+04  16.10  16.60  0.00
Gather   4     5.83e+04  15.43  15.91  0.00
Scatter  4     3.31e+04  8.76   9.03   0.00
Scatter  2     3.08e+04  8.14   8.39   0.00
Scatter  3     2.87e+04  7.58   7.81   0.00
Scatter  1     5.53e+03  1.46   1.51   0.00
Bcast    2     50.8      0.01   0.01   0.00
Bcast    4     50.8      0.01   0.01   0.00
Bcast    3     49.5      0.01   0.01   0.00
Bcast    1     40.4      0.01   0.01   0.00
Reduce   1     2.57      0.00   0.00   0.00
Reduce   3     0.259     0.00   0.00   0.00
Reduce   2     0.056     0.00   0.00   0.00
Reduce   4     0.052     0.00   0.00   0.00
ADI: MPI+OpenMP Code

MPI_Bcast(a, N * N, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);

omp_set_num_threads(T);

#pragma omp parallel private(iter)
{
    int id, sindex, eindex;
    int m, n;
    id = omp_get_thread_num();
    sindex = id * node_rows/T;        /* this thread's share of the local rows */
    eindex = sindex + node_rows/T;    /* (node_rows: the N/P rows owned by this rank) */
    int l = myrank * (N/P);           /* base global row of this rank's block */

    for (m = sindex; m < eindex; m++)
        for (n = 0; n < N; n++)
            apart[m][n] = a[l+m][n];  /* global row = base + local row */
ADI: MPI+OpenMP Code (continued)

    /* 2*MAXITER passes: each pass sweeps one direction, with a transpose in between */
    for (iter = 1; iter <= 2*MAXITER; iter++) {
        /* ADI forward & backward sweep along rows */
        #pragma omp for private(i,j) nowait
        for (i = 0; i < N/P; i++) {
            for (j = 1; j < N; j++) {
                xpart[i][j] = xpart[i][j] - xpart[i][j-1] * apart[i][j] / bpart[i][j-1];
                bpart[i][j] = bpart[i][j] - apart[i][j] * apart[i][j] / bpart[i][j-1];
            }
            xpart[i][N-1] = xpart[i][N-1] / bpart[i][N-1];
        }

        #pragma omp for private(i,j)
        for (i = 0; i < N/P; i++)
            for (j = N-2; j > 1; j--)
                xpart[i][j] = (xpart[i][j] - apart[i][j+1] * xpart[i][j+1]) / bpart[i][j];

        #pragma omp master
        {
            MPI_Gather(xpart, N*N/P, MPI_FLOAT, x, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
            MPI_Gather(bpart, N*N/P, MPI_FLOAT, b, N*N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
        }

        #pragma omp barrier
ADI: MPI+OpenMP Code (continued)

        #pragma omp sections
        {
            #pragma omp section
            { trans(x, N, N); }
            #pragma omp section
            { trans(b, N, N); }
            #pragma omp section
            { trans(a, N, N); }
        }
        #pragma omp barrier

        #pragma omp master
        {
            MPI_Scatter(x, N * N/P, MPI_FLOAT, xpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
            MPI_Scatter(b, N * N/P, MPI_FLOAT, bpart, N * N/P, MPI_FLOAT, 0, MPI_COMM_WORLD);
        }
        #pragma omp barrier   /* the scatters must complete before xpart/bpart
                                 are reused in the next pass */

        l = myrank * (N/P);
        for (m = sindex; m < eindex; m++)
            for (n = 0; n < N; n++)
                apart[m][n] = a[l+m][n];
    }
    #pragma omp barrier
}
ADI: MPI+OpenMP Performance

[Figure: ADI MPI+OpenMP speedup vs. space size N for 1, 2, and 4 nodes]
[Figure: ADI MPI+OpenMP overhead vs. space size N for 1, 2, and 4 nodes]
[Figure: maximum MPI-time-to-application-time ratio (%) vs. space size N for 1, 2, and 4 nodes]
ADI: ompP Results (MPI+OpenMP)

R00002 adi_mpi_scatter_openmp.c (89-96) LOOP
TID   execT   execC  bodyT  exitBarT  taskT
0     0.05    200    0.05   0.00      0.00
1     0.05    200    0.05   0.00      0.00
2     0.08    200    0.08   0.00      0.00
3     0.08    200    0.08   0.00      0.00
4     0.08    200    0.08   0.00      0.00
5     0.08    200    0.08   0.00      0.00
6     0.08    200    0.08   0.00      0.00
7     0.08    200    0.08   0.00      0.00
SUM   0.58    1600   0.58   0.00      0.00

R00003 adi_mpi_scatter_openmp.c (99-104) LOOP
TID   execT   execC  bodyT  exitBarT  taskT
0     0.06    200    0.05   0.01      0.00
1     34.23   200    0.05   34.18     0.00
2     34.22   200    0.05   34.17     0.00
3     34.22   200    0.05   34.17     0.00
4     34.21   200    0.05   34.16     0.00
5     34.20   200    0.05   34.15     0.00
6     34.21   200    0.05   34.16     0.00
7     34.20   200    0.05   34.15     0.00
SUM   239.54  1600   0.39   239.14    0.00
ADI: ompP Results (MPI+OpenMP, continued)

R00005 adi_mpi_scatter_openmp.c (113) BARRIER
TID   execT   execC  taskT
0     0.00    200    0.00
1     64.29   200    0.00
2     64.29   200    0.00
3     64.29   200    0.00
4     64.29   200    0.00
5     64.29   200    0.00
6     64.29   200    0.00
7     64.29   200    0.00
SUM   450.02  1600   0.00

R00004 adi_mpi_scatter_openmp.c (106-111) MASTER
TID   execT   execC
0     64.28   200
SUM   64.28   200

R00006 adi_mpi_scatter_openmp.c (116-130) SECTIONS
TID   execT  execC  sectT  sectC  exitBarT  mgmtT  taskT
0     0.85   200    0.85   200    0.00      0.00   0.00
1     0.85   200    0.83   200    0.02      0.00   0.00
2     0.85   200    0.44   200    0.41      0.00   0.00
3     0.85   200    0.00   0      0.85      0.00   0.00
4     0.85   200    0.00   0      0.85      0.00   0.00
5     0.85   200    0.00   0      0.85      0.00   0.00
6     0.85   200    0.00   0      0.85      0.00   0.00
7     0.85   200    0.00   0      0.85      0.00   0.00
SUM   6.80   1600   2.12   600    4.67      0.01   0.00
ADI: ompP Results (MPI+OpenMP, continued)

R00007 adi_mpi_scatter_openmp.c (132) BARRIER
TID   execT  execC  taskT
0     0.00   200    0.00
1     0.00   200    0.00
2     0.00   200    0.00
3     0.00   200    0.00
4     0.00   200    0.00
5     0.00   200    0.00
6     0.00   200    0.00
7     0.00   200    0.00
SUM   0.01   1600   0.00

R00008 adi_mpi_scatter_openmp.c (134-138) MASTER
TID   execT  execC
0     34.46  200
SUM   34.46  200

R00009 adi_mpi_scatter_openmp.c (149) BARRIER
TID   execT  execC  taskT
0     0.00   1      0.00
1     0.28   1      0.00
2     0.28   1      0.00
3     0.28   1      0.00
4     0.28   1      0.00
5     0.28   1      0.00
6     0.28   1      0.00
7     0.28   1      0.00
SUM   1.94   8      0.00
ADI: mpiP Results (MPI+OpenMP)

@--- Aggregate Time (top twenty, descending, milliseconds) ---
Call     Site  Time      App%   MPI%   COV
Gather   2     8.98e+04  23.32  23.52  0.00
Gather   6     6.57e+04  17.05  17.19  0.00
Gather   8     6.45e+04  16.74  16.89  0.00
Gather   4     6.17e+04  16.03  16.16  0.00
Scatter  4     3.39e+04  8.79   8.87   0.00
Scatter  8     3.1e+04   8.06   8.13   0.00
Scatter  6     2.96e+04  7.68   7.75   0.00
Scatter  2     5.4e+03   1.40   1.41   0.00
Bcast    7     49.5      0.01   0.01   0.00
Bcast    3     49.3      0.01   0.01   0.00
Bcast    5     47.8      0.01   0.01   0.00
Bcast    1     40        0.01   0.01   0.00
Scatter  1     30.5      0.01   0.01   0.00
Scatter  5     30.3      0.01   0.01   0.00
Scatter  7     30.3      0.01   0.01   0.00
Scatter  3     28.8      0.01   0.01   0.00
Reduce   1     1.8       0.00   0.00   0.00
Reduce   5     0.062     0.00   0.00   0.00
Reduce   3     0.049     0.00   0.00   0.00
Reduce   7     0.049     0.00   0.00   0.00
THANKS
Q & A: Any suggestions?