
Heterogeneous and Grid Computing


Programming systems

– For parallel computing
  » Traditional systems (MPI, HPF) do not address the extra challenges of heterogeneous parallel computing
  » mpC, HeteroMPI
– For high performance distributed computing
  » NetSolve/GridSolve


mpC

– An extension of ANSI C for programming parallel computations on networks of heterogeneous computers
– Supports efficient, portable and modular heterogeneous parallel programming
– Addresses the heterogeneity of both the processors and the communication network


mpC (ctd)

A parallel mpC program is a set of parallel processes interacting (that is, synchronizing their work and transferring data) by means of message passing

The mpC programmer cannot determine how many processes make up the program and which computers execute which processes

– This is specified by some means external to the mpC language

– The mpC source code only determines which process of the program performs which computations.


mpC (ctd)

The programmer describes the algorithm
– The number of processes executing the algorithm
– The total volume of computation to be performed by each process
  » A formula including the parameters of the algorithm
  » The volume is measured in computation units provided by the application programmer
     The very code that has been used to measure the speed of the processors


mpC (ctd)

The programmer describes the algorithm (ctd)
– The total volume of data transferred between each pair of processes
– How the processes perform the computations and communications and interact
  » In terms of traditional algorithmic patterns (for, while, parallel for, etc.)
  » Expressions in the statements specify not the computations and communications themselves but rather their amount
     Parameters of the algorithm and locally declared variables can be used


mpC (ctd)

The abstract processes of the algorithm are mapped to the real parallel processes of the program
– The mapping of the abstract processes should minimize the execution time of the program


mpC (ctd)

Example (see handouts for full code):

algorithm HeteroAlgorithm(int n, double v[n]) {
  coord I=n;
  node { I>=0: v[I]; };
};
…
int [*]main(int [host]argc, char **[host]argv)
{
  …
  {
    net HeteroAlgorithm(N, volumes) g;
    …
  }
}


mpC (ctd)

The program calculates the mass of a metallic construction welded from N heterogeneous rails
– It defines group g consisting of N abstract processes, each calculating the mass of one of the rails
– The calculation is performed by numerical 3D integration of the density function Density with a constant integration step
  » The volume of computation to calculate the mass of each rail is proportional to the volume of this rail
– The i-th element of array volumes contains the volume of the i-th rail
  » The program specifies that the volume of computation performed by each abstract process of g is proportional to the volume of its rail


mpC (ctd)

The library nodal function MPC_Wtime is used to measure the wall time elapsed to execute the calculations
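For illustration, a minimal timing sketch in the spirit of the handout code; ComputeRailMass is a hypothetical placeholder for the timed computation, while MPC_Wtime is the library function named above:

double t;
t = MPC_Wtime();       /* wall-clock time before the computation          */
ComputeRailMass();     /* hypothetical nodal computation being timed      */
t = MPC_Wtime() - t;   /* elapsed wall time on this process               */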

Mapping of abstract processes to real processes
– Based on information about the speed at which the real processes run on the physical processors of the executing network


mpC (ctd)

By default, the speed estimation obtained on initialization of the mpC system on the network is used
– The estimation is obtained by running a special test program

mpC allows the programmer to change the default estimation of processor speed at runtime by tuning it to the computations that will actually be executed
– The recon statement (see the sketch below)
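For illustration, a minimal sketch of the recon statement; the benchmark function SerialTest and its arguments are hypothetical, while the recon and net constructs follow the examples given later in these slides:

/* Re-estimate processor speeds by timing one call of a user-supplied benchmark
   (SerialTest, TestData and TestSize are hypothetical names)                    */
recon SerialTest(TestData, TestSize);
/* Networks created after this point are mapped using the refreshed estimation  */
{
  net HeteroAlgorithm(N, volumes) g;
  /* computations distributed over g */
}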


mpC (ctd)

An irregular problem
– Characterized by inherent coarse/large-grained structure
– This structure determines a natural decomposition of the problem into a small number of subtasks
  » Of different sizes
  » Can be solved in parallel


mpC (ctd)

The whole program solving the irregular problem
– A set of parallel processes
– Each process solves its subtask
  » As the sizes of the subtasks are different, the processes perform different volumes of computation
– The processes are interacting via message passing

Calculation of the mass of a metallic «hedgehog» is an example of an irregular problem


mpC (ctd)

A regular problem
– The most natural decomposition is a large number of small identical subtasks that can be solved in parallel
– As the subtasks are identical, they are of the same size

Multiplication of two n×n dense matrices is an example of a regular problem
– Naturally decomposed into n² identical subtasks
  » Computation of one element of the resulting matrix

How to efficiently solve a regular problem on a network of heterogeneous computers?


mpC (ctd)

Main idea
– Transform the problem into an irregular problem
  » Whose structure is determined by the structure of the executing network

The whole problem
– Decomposed into a set of relatively large subproblems
– Each subproblem is made of a number of small identical subtasks stuck together
– The size of each subproblem depends on the speed of the processor solving this subproblem (see the sketch below)
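For illustration, a minimal sketch (not from the handouts) of how n identical subtasks might be partitioned among p processors in proportion to their measured speeds; all names are hypothetical:

/* Distribute n identical subtasks over p processors in proportion to speed[i];
   remainders are handed out one by one so the counts sum exactly to n.        */
void partition(int n, int p, const double speed[], int count[])
{
    double total = 0.0;
    int i, assigned = 0;
    for (i = 0; i < p; i++) total += speed[i];
    for (i = 0; i < p; i++) {
        count[i] = (int)(n * speed[i] / total);   /* proportional share, rounded down */
        assigned += count[i];
    }
    for (i = 0; assigned < n; i = (i + 1) % p) {  /* spread the remainder */
        count[i]++;
        assigned++;
    }
}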


mpC (ctd)

The parallel program
– A set of parallel processes
– Each process solves one subproblem on a separate physical processor
  » The volume of computation performed by each of these processes should be proportional to its speed
– The processes are interacting via message passing


mpC (ctd)

Example. Parallel multiplication, on a heterogeneous network, of matrix A and the transposition of matrix B, where A and B are dense square n×n matrices.

[Figure: matrices A and B and the resulting matrix C = A×B^T]


mpC (ctd)

One step of parallel multiplication of matrices A and B^T. The pivot row of blocks of matrix B (shown slashed) is first broadcast to all processors. Then each processor, in parallel with the others, computes its part of the corresponding column of blocks of the resulting matrix C.

[Figure: matrices A, B and C at one step of the algorithm]


mpC (ctd)

See handouts for the mpC program implementing this algorithm
– The program first detects the number of physical processors
– It updates the estimation of the speeds of the processors with the code
  » Executed at each step of the main loop


mpC: inter-process communication

The basic subset of mpC is based on a performance model of the parallel algorithm that ignores communication operations
– It presumes that
  » the contribution of the communications to the total execution time of the algorithm is negligibly small compared to that of the computations
– It is acceptable for
  » Computing on heterogeneous clusters
  » Message-passing algorithms that do not frequently send short messages
– Not acceptable for “normal” algorithms running on common heterogeneous networks of computers


mpC: inter-process communication (ctd)

The compiler can optimally map parallel algorithms in which communication operations contribute substantially to the execution time only if the programmer can specify

– Absolute volumes of computation performed by processes

– Volumes of data transferred between the processes


mpC: inter-process communication (ctd)

Volume of communication
– Can be naturally measured in bytes

Volume of computation
– What is the natural unit of measurement?
  » To allow the compiler to accurately estimate the execution time
– In mpC, the unit is the very code which has been most recently used to estimate the speed of the physical processors
  » Normally specified as part of the recon statement


mpC: N-body problem

The system of bodies consists of large groups of bodies, with different groups at a good distance from each other. The bodies move under the influence of Newtonian gravitational attraction


mpC: N-body problem (ctd)

Parallel N-body algorithm
– There is a one-to-one mapping between groups of bodies and parallel processes of the algorithm
– Each process
  » Holds in its memory all data characterising the bodies of its group
     Masses, positions and velocities of the bodies (see the sketch below)
  » Is responsible for updating this data
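For illustration, a minimal sketch of the per-body data each process might hold; the type name Body appears in the performance model later in these slides, but the fields shown here are assumptions:

/* Hypothetical layout of the Body type referenced by sizeof(Body) below */
typedef struct {
    double mass;       /* mass of the body     */
    double pos[3];     /* position in 3D space */
    double vel[3];     /* velocity             */
} Body;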


mpC: N-body problem (ctd)

Parallel N-body algorithm (ctd)
– The effect of each remote group is approximated by a single equivalent body
  » To update its group, each process requires the total mass and the center of mass of all remote groups
     The total mass of each group of bodies is constant. It is calculated once. Each process receives from each of the other processes its calculated total mass, and stores all the masses.
     The center of mass of each group is a function of time. At each step of the simulation, each process computes its center and sends it to the other processes.


mpC: N-body problem (ctd)

Parallel N-body algorithm (ctd)
– At each step of the simulation the updated system of bodies is visualised
  » To do it, all groups of bodies are gathered to the process responsible for the visualisation, which is the host-process
– In general, different groups have different sizes
  » Different processes perform different volumes of computation
  » Different volumes of data are transferred between different pairs of processes


mpC: N-body problem (ctd)

Parallel N-body algorithm (ctd)
– From the point of view of each individual process, the system includes all bodies of its group, with each remote group approximated by a single equivalent body.


mpC: N-body problem (ctd)

Pseudocode of the N-body algorithm:

Initialise groups of bodies on the host-process
Visualize the groups of bodies
Scatter the groups across processes
Compute masses of the groups in parallel
Communicate to share the masses among processes
while(1) {
  Compute centers of mass in parallel
  Communicate the centers among processes
  Update the state of the groups in parallel
  Gather the groups to the host-process
  Visualize the groups of bodies
}


mpC N-body application

The core is the specification of the performance model of the algorithm:

algorithm Nbody(int m, int k, int n[m]) {
  coord I=m;
  node { I>=0: bench*((n[I]/k)*(n[I]/k)); };
  link { I>0: length*(n[I]*sizeof(Body)) [I]->[0]; };
  parent [0];
};


mpC N-body application (ctd)

The principal fragments of the rest of the code:

void [*] main(int [host]argc, char **[host]argv)
{
  ...
  // Make the test group consist of the first TGsize
  // bodies of the very first group of the system
  OldTestGroup[] = (*(pTestGroup)Groups[0])[];
  recon Update_group(TGsize, &OldTestGroup, &TestGroup,
                     1, NULL, NULL, 0);
  {
    net Nbody(NofGroups, TGsize, NofBodies) g;
    …
  }
}


mpC: algorithmic patterns

One more important feature of the parallel algorithm is still not reflected in the performance model

– The order of execution of computations and communications

As the model says nothing about how parallel processes interact during execution of the algorithm, the compiler assumes that

– First, all processes execute all their computations in parallel

– Then the processes execute all the communications in parallel

– There is a synchronisation barrier between execution of the computations and communications


mpC: algorithmic patterns (ctd)

These assumptions are unsatisfactory in the case of
– Data dependencies between computations performed by different processes
  » One process may need data computed by other processes in order to start its computations
  » This serialises some computations performed by different parallel processes ==> the real execution time of the algorithm will be longer
– Overlapping of computations and communications
  » The real execution time of the algorithm will be shorter


mpC: algorithmic patterns (ctd)

Thus, if the estimation is not based on the actual scenario of interaction of the parallel processes
– It may be inaccurate, which leads to a non-optimal mapping of the algorithm to the executing network

Example. An algorithm with fully serialised computations.
– Optimal mapping:
  » All the processes are assigned to the fastest physical processor
– Mapping based on the above assumptions:
  » Involves all available physical processors


mpC: algorithmic patterns (ctd)

mpC addresses the problem
– The programmer can specify the scenario of interaction of parallel processes during execution of the parallel algorithm
– That specification is a part of the network type definition
  » The scheme declaration


mpC: algorithmic patterns (ctd)

Example 1. N-body algorithm

algorithm Nbody(int m, int k, int n[m]) {
  coord I=m;
  node { I>=0: bench*((n[I]/k)*(n[I]/k)); };
  link { I>0: length*(n[I]*sizeof(Body)) [I]->[0]; };
  parent [0];
  scheme {
    int i;
    par (i=0; i<m; i++) 100%%[i];
    par (i=1; i<m; i++) 100%%[i]->[0];
  };
};


mpC: algorithmic patterns (ctd)

Example 2. Matrix multiplication.

algorithm ParallelAxBT(int p, int n, int r, int d[p]) {
  coord I=p;
  node { I>=0: bench*((d[I]*n)/(r*r)); };
  link (J=p) { I!=J: length*(d[I]*n*sizeof(double)) [J]->[I]; };
  parent [0];


mpC: algorithmic patterns (ctd)

Example 2. Matrix multiplication (ctd)

  scheme {
    int i, j, PivotProc=0, PivotRow=0;
    for(i=0; i<n/r; i++, PivotRow+=r) {
      if(PivotRow>=d[PivotProc]) {
        PivotProc++;
        PivotRow=0;
      }
      for(j=0; j<p; j++)
        if(j!=PivotProc)
          (100.*r/d[PivotProc])%%[PivotProc]->[j];
      par(j=0; j<p; j++)
        (100.*r/n)%%[j];
    }
  };
};


mpC: the timeof operator

Further modification of the matrix multiplication program:

[host]: {
  int m;
  struct {int p; double t;} min;
  double t;
  min.p = 0; min.t = DBL_MAX;
  for(m=1; m<=p; m++) {
    Partition(m, speeds, d, n, r);
    t = timeof(net ParallelAxBT(m, n, r, d) w);
    if(t<min.t) { min.p = m; min.t = t; }
  }
  p = min.p;
}


mpC: the timeof operator (ctd)

The timeof operator estimates the execution time of the parallel algorithm without actually executing it
– Its only operand specifies a fully specified network type
  » The values of all parameters of the network type must be specified
– The operator does not create an mpC network of this type
– Instead, it calculates the time of execution of the corresponding parallel algorithm on the executing network
  » Based on
     the provided performance model of the algorithm
     the most recent performance characteristics of the physical processors and communication links


mpC: mapping

The dispatcher maps abstract processes of the mpC network to the processes of the parallel program
– At runtime
– Trying to minimize the execution time

The mapping is based on
  » The model of the executing network of computers
  » A map of the processes of the parallel program
     The total number of processes running on each computer
     The number of free processes


mpC: mapping (ctd)

The mapping is based on (ctd)
  » The performance model of the parallel algorithm represented by this mpC network
     The number of parallel processes executing the algorithm
     The absolute volume of computations performed by each of the processes
     The absolute volume of data transferred between each pair of processes
     The scenario of interaction between the parallel processes during the algorithm execution


mpC: mapping (ctd)

Two main features:
– Estimation of each particular mapping
  » Based on
     Formulas for
       – each computation unit in the scheme declaration
       – each communication unit in the scheme declaration
     Rules for each sequential and parallel algorithmic pattern
       – for, if, par, etc.


HeteroMPI

– An extension of MPI
– The programmer can describe the performance model of the implemented algorithm
  » In a small model definition language shared with mpC
– Given this description
  » HeteroMPI tries to create a group of processes executing the algorithm faster than any other group


HeteroMPI (ctd)

Standard MPI approach to group creation
– Acceptable in homogeneous environments
  » If there is one process per processor
  » Any group will execute the algorithm with the same speed
– Not acceptable
  » In heterogeneous environments
  » If there is more than one process per processor

In HeteroMPI
– The programmer can describe the algorithm
– The description is translated into a set of functions
  » Making up an algorithm-specific part of the HeteroMPI run-time system


HeteroMPI (ctd)

A new operation to create a group of processes:

HMPI_Group_create(
  HMPI_Group* gid,
  const HMPI_Model* perf_model,
  const void* model_parameters)

Collective operation
– In the simplest case, called by all processes of HMPI_COMM_WORLD


HeteroMPI (ctd)

Dynamic update of the estimation of the processor speeds can be performed by

HMPI_Recon(
  HMPI_Benchmark_function func,
  const void* input_p,
  int num_of_parameters,
  const void* output_p)

Collective operation
– Called by all processes of HMPI_COMM_WORLD


HeteroMPI (ctd)

Prediction of the execution time of the algorithm:

HMPI_Timeof(
  HMPI_Model *perf_model,
  const void* model_parameters)

Local operation
– Can be called by any process


HeteroMPI (ctd)

Another collective operation to create a group of processes:

HMPI_Group_auto_create(
  HMPI_Group* gid,
  const HMPI_Model* perf_model,
  const void* model_parameters)

Used if the programmer wants HeteroMPI to find the optimal number of processes


HeteroMPI (ctd)

Other HMPI operations:

HMPI_Init()
HMPI_Finalize()
HMPI_Group_free()
HMPI_Group_rank()
HMPI_Group_size()
MPI_Comm *HMPI_Get_comm(HMPI_Group *gid)

HMPI_Get_comm
– Creates an MPI communicator with the group defined by gid (a combined usage sketch follows)
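For illustration, a minimal sketch of how these operations might be combined in a HeteroMPI program. The performance model Nbody_model, its parameters and the benchmark arguments are hypothetical, and the argument lists of HMPI_Init, HMPI_Group_free and HMPI_Finalize are assumptions (the slide lists them without arguments):

int main(int argc, char **argv)
{
    /* Nbody_model, model_params, bench_func, bench_in and bench_out are
       assumed to be provided elsewhere (compiled model definition, benchmark code) */
    HMPI_Group gid;
    MPI_Comm *comm;

    HMPI_Init(&argc, &argv);                       /* argument list assumed        */

    /* Optionally refresh the speed estimation with a user benchmark */
    HMPI_Recon(bench_func, bench_in, 1, bench_out);

    /* Create a group of processes selected for the given performance model */
    HMPI_Group_create(&gid, &Nbody_model, model_params);

    comm = HMPI_Get_comm(&gid);                    /* MPI communicator for the group */
    /* ... ordinary MPI code over *comm ... */

    HMPI_Group_free(&gid);                         /* argument list assumed        */
    HMPI_Finalize(0);                              /* argument list assumed        */
    return 0;
}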


Grid Computing vs Distributed Computing

Definitions of Grid computing are various and vague
– A new computing model for better use of many separate computers connected by a network
– => Grid computing targets heterogeneous networks

What is the difference between Grid-based heterogeneous platforms and traditional distributed heterogeneous platforms?
– A single login to a group of resources is the core
– Grid operating environment – services built on top of this
  » Different models of the GOE supported by different Grid middleware (Globus, Unicore)


GridRPC

High-performance Grid programming systems are based on GridRPC
– RPC – Remote Procedure Call
  » Task, input data, output data, remote computer
– GridRPC
  » Task, input data, output data
  » Remote computer is picked by the system


NetSolve

NetSolve
– Programming system for high-performance distributed computing on global networks
  » Based on the GridRPC mechanism
– Some components of the application are only available on remote computers

NetSolve application
– The user writes a client program
  » Any program (in C, Fortran, etc.) with calls to the NetSolve client interface
  » Each call specifies
     the remote task
     the location of the input data on the user’s computer
     the location of the output data (on the user’s computer)


NetSolve (ctd)

Execution of the NetSolve application
– A NetSolve call results in
  » A task to be executed on a remote computer
  » The NetSolve programming system
     Selects the remote computer
     Transfers input data to the remote computer
     Delivers output data to the user’s computer
– The mapping of the remote tasks to computers
  » The core operation having an impact on the performance of the application


NetSolve (ctd)

[Figure: NetSolve architecture – Client (with Proxy), Agent, Server A and Server B; the client calls netsl(“task”, in, out); steps: 1. Assign(“task”) (netslInfo()), 2. Upload(in), 3. Download(out) (netslX())]


NetSolve (ctd)

Mapping algorithm
– Each task is scheduled separately and independently of other tasks
  » A NetSolve application is seen as a sequence of independent tasks
– Based on two performance models (PMs)
  » The PM of the heterogeneous network of computers
  » The PM of a task


NetSolve (ctd)

Client interface
– User’s command-line interface
  » NS_problems, NS_probdesc
– C program interface
  » Blocking call (a usage sketch follows)
     int netsl(char *problem_name, …<argument_list>…)
  » Non-blocking call
     request = netslnb(…);
     info = netslpr(request);
     info = netslwt(request);
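For illustration, a minimal client sketch built around the blocking call above; the problem name "dgesv()" and the argument order are assumptions, since they depend on the problem description installed on the NetSolve servers, and the client header name is also assumed:

#define N 100
/* #include of the NetSolve client header (name assumed) declares netsl() */

double A[N*N], b[N];    /* input matrix and right-hand side on the user's computer */
int    info;

/* ... fill A and b locally ... */

/* Blocking call: returns after the output has been delivered back to the client;
   problem name and argument list are assumptions                                  */
info = netsl("dgesv()", N, A, N, b);
if (info != 0)
    /* handle the NetSolve error code */ ;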


NetSolve (ctd)

Network of computers
– A set of interconnected heterogeneous processors
  » Each processor is characterized by the execution time of the same serial code
     Matrix multiplication of two 200×200 matrices
     Obtained once on the installation of NetSolve and does not change
  » Communication links
     The same way as in NWS (latency + bandwidth)
     Dynamic (periodically updated)


NetSolve (ctd)

The performance model of a task
– Provided by the person installing the task on a remote computer
– A formula to calculate the execution time of the task by the solver
  » Uses parameters of the task and the execution time of the standard computation unit (matrix multiplication)
– The size of the input and output data
– The PM = a distributed set of performance models


NetSolve (ctd)

The mapping algorithm
– Performed by the agent
– Minimizes the total execution time, Ttotal
  » Ttotal = Tcomputation + Tcommunication
  » Tcomputation
     Uses the formulas of the PM of the task
  » Tcommunication = Tinput delivery + Toutput receive
     Uses characteristics of the communication link and the size of the input and output data
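For illustration, a worked example with made-up numbers: if the PM formula predicts Tcomputation = 2.0 s on a candidate server, the input is 8 MB and the output is 2 MB, and the link to that server offers 10 MB/s bandwidth with 5 ms latency each way, then Tcommunication ≈ 0.005 + 8/10 + 0.005 + 2/10 = 1.01 s, so Ttotal ≈ 3.01 s; the agent picks the server with the smallest such Ttotal.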


NetSolve (ctd)

Link to NetSolve software and documentation
– http://icl.cs.utk.edu/netsolve/