
Proceedings of the Second International Workshop on High-Level Parallel Programming Models and Supportive Environments, Geneva, Switzerland, 1 April 1997. IEEE Computer Society Press.


Parallel Programming through Configurable Interconnectable Objects

Enrique Vinicio Carrera E. (COPPE - UFRJ), Orlando Loques (CAA - UFF), Julius Leite (CAA - UFF)

COPPE - UFRJ: Caixa Postal 68511, Rio de Janeiro - RJ, 21945-970, Brazil
CAA - UFF: Rua Passo da Patria, 156, Niteroi - RJ, 24210-240, Brazil

vinicio@cos.ufrj.br, loques@caa.uff.br, julius@caa.uff.br

    Abstract

This paper presents P-RIO, a parallel programming environment that supports an object based software configuration methodology. It promotes a clear separation of the individual sequential computation components from the interconnection structure used for the interaction between these components. This makes the data and control interactions explicit, simplifying program visualization and understanding. P-RIO includes a graphical tool that helps to configure, monitor and debug parallel programs.

    1. Introduction

In most message passing based parallel programming environments, the interactions are specified through explicit language constructs embedded in the text of the program modules. As a consequence, when the interaction patterns are not trivial, the overall program structure is hidden, making its understanding and performance optimization difficult. In addition, there is little regard for properties such as reuse, modularity and software maintenance, which are of great concern in the software engineering area.

In this context, P-RIO (Parallel - Reconfigurable Interconnectable Objects) tries to offer high-level, but straightforward, concepts for parallel and distributed programming [1]. A simple software configuration methodology makes most of the useful properties of object based programming technology available, facilitating modularity and code reuse. This methodology promotes a clear separation of the individual sequential computation components from the interconnection structure used for the interaction between these components. Besides the language used for programming the sequential components, a configuration language is used to describe the program composition and its interconnection structure. This makes the data and control interactions explicit, simplifying program visualization and understanding. In principle, the methodology is independent of the programming language, the operating system and the communication architecture adopted.

P-RIO includes a graphical tool that provides system visualization features that help to configure, monitor and debug a parallel program. The support environment has a modular construction, is highly portable, and provides development tools and run-time support mechanisms for parallel programs in architectures composed of heterogeneous computing nodes. Our current implementation, based on PVM [3], is message passing oriented, which had a strong influence on the presentation of this paper. However, we also provide some insights considering a distributed shared memory architecture.

    Figure 1. A typical module icon

    2. P-RIO concepts

The P-RIO methodology hinges on the configuration paradigm [2], whereby a system can be assembled from externally interconnected modules obtained from a library. The concept of module (figure 1) has two facets: (i) a type or class, if it is used as a mould to create execution units of a parallel program; (ii) an execution unit, called a module or class instance. Primitive classes define only one thread of control, and instances created from them can execute concurrently. Composition caters for module reuse and encapsulation; that is to say, existing (primitive and composite) classes can be used to compose new classes.


Configuration comprises the selection and naming of class instances and the definition of their interconnections.

The points of interaction and interconnection between instances are defined by named ports, which are associated with their parent classes. Ports themselves are classes that can be reused in different module classes. A port instance is named in the context of its owner class. This provides configuration level naming independence, helping the reuse of port classes.

    We use the term transaction style to name a specific control and data interaction rule between ports. Besides the interaction rule, the style also defines the implementation mechanisms required for the support of the particular transaction.

Our basic transaction set includes a bi-directional synchronous (remote-procedure-call like) and a unidirectional asynchronous (datagram-like) transaction style. The synchronous transaction can be used in a relaxed fashion (deferred synchronous), without blocking, allowing remote calls to be overlapped with local computations. This basic transaction set can be extended or modified in order to fulfill different requirements of parallel programs. It was implemented using the standard PVM library. Currently, we are considering an MPI [4] based implementation. In this case, the different semantic modes (defined by the underlying protocols) associated with MPI point-to-point transactions could be selected at the configuration level. Other transaction styles, such as the highly optimized RPC mechanism based on active messages described in [5], could be included in our environment and selected for specific port connections.
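To make the deferred synchronous style concrete, the fragment below is a minimal sketch of how such a transaction could map onto the standard PVM calls used by our implementation; the function name, message tags and the way the server task id is obtained are illustrative assumptions, not P-RIO's actual port interface.

    #include <pvm3.h>

    #define TAG_REQ 1  /* illustrative request tag */
    #define TAG_REP 2  /* illustrative reply tag   */

    extern void do_local_work(void);  /* computation to overlap */

    /* Sketch: deferred synchronous (RPC-like) transaction over PVM.
       The request is sent without blocking; the caller overlaps local
       work and only later blocks to claim the reply. */
    double deferred_call(int server_tid, int args[2])
    {
        double result;

        pvm_initsend(PvmDataDefault);   /* fresh send buffer           */
        pvm_pkint(args, 2, 1);          /* pack the two int arguments  */
        pvm_send(server_tid, TAG_REQ);  /* issue the call, no blocking */

        do_local_work();                /* overlap with the remote call */

        pvm_recv(server_tid, TAG_REP);  /* block for the reply here    */
        pvm_upkdouble(&result, 1, 1);   /* unpack the double result    */
        return result;
    }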

    2.1. Configuration programming

Figure 2 shows a master-slave architecture typically found in parallel programs and its description using the configuration language. Data is a port class that defines an RPC-like transaction (sync); int[2] and double are the data types used for the in and out call arguments, respectively. Calc-Pi is a composite class that uses the primitive classes Master and Slave; it defines enclosed instances of these classes (master and slave) as well as their ports and connections. Note that the ports named input and output[i] are of the Data class. The last line specifies the creation of an instance of Calc-Pi, named calc-pi, with a variable number of slave instances. The declarative system description is submitted to a command interpreter that executes the low level calls required to create the specified parallel program. The interpreter queries the user to provide the number of replicated instances to be actually created in the system. The instances are automatically distributed to the available set of processors. However, the user can override this allocation by defining specific mappings at the configuration level. The configuration language interpreter uses a parser written in Tcl [6] and supports flow control instructions that facilitate the description of complex configurations [1]. Alternatively, as described in section 2.4, a graphical tool can be used to configure parallel programs.

    port-class Data { sync, int[2], double }

    class Master { N } {
        code C "master $N";
        port out Data output[$N];
    }

    class Slave { } {
        code C "slave";
        port in Data input;
    }

    class Calc-Pi { N } {
        Master master $N;
        Slave slave[$N];
        forall i $N {
            link master.output[$i] slave[$i].input;
        }
    }

    Calc-Pi calc-pi variable;

Figure 2. Example of the configuration language

    2.2. Group transactions

A group can be seen as an abstract module that receives inputs and distributes them according to a particular transaction style. In our model, groups are explicitly named for connection purposes, and several collective message passing transaction styles can be selected at the configuration level. Configuration level primitives are available to create and name groups, as well as to specify port to group connections. Our current implementation supports two multicast group transaction styles, unidirectional and bi-directional, for use with the asynchronous and synchronous transactions, respectively. It is noteworthy that the popular all-gather, scatter and gather, and all-to-all collective data moves can be implemented using the standard features of P-RIO.
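As an illustration, under the PVM-based implementation a unidirectional multicast group transaction could reduce to a data move of the following kind (a sketch only; the function name, the tag and the way the member task ids are resolved are assumptions, not P-RIO internals).

    #include <pvm3.h>

    /* Sketch: one-to-many (unidirectional multicast) group transaction.
       member_tids would be resolved by the configuration support layer
       from the group's name; the message tag is illustrative. */
    void group_cast(int *member_tids, int nmembers, double *data, int n)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkdouble(data, n, 1);            /* pack the payload         */
        pvm_mcast(member_tids, nmembers, 3); /* one send, many receivers */
    }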

Barrier synchronization mechanisms can also be implemented using a group transaction through ports. However, we chose to offer a specific programming level primitive for this function, in order not to burden the programmer with port concerns. For the same reason, barrier specifications appear in a simplified form at the configuration level.
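For reference, the sketch below shows how such a programming level primitive could be realized directly on PVM's group facilities; the function name barrier_sync and the group name "workers" are ours, used only for illustration.

    #include <pvm3.h>

    /* Sketch: a programming-level barrier wrapping PVM group calls.
       Each participant must have enrolled in the group (once, via
       pvm_joingroup) before reaching the barrier. */
    void barrier_sync(char *group, int nmembers)
    {
        pvm_barrier(group, nmembers);  /* block until nmembers arrive */
    }

    /* Typical use by each participating instance:
           pvm_joingroup("workers");      enroll once at start-up
           ...
           barrier_sync("workers", 4);    synchronize all four members */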

    2.3. Shared memory transactions

    In this section, we present one scheme that we adopt to introduce the distributed shared memory (DSM) paradigm in the context of P-RIO. This provides a hybrid environment, helping the programmer to mix the two paradigms in order to implement the best solution for a specific problem.

Figure 3. Distributed shared memory transaction: (a) configuration structure; (b) implementation structure (coherence protocol)

In figure 3-a, let us consider that module S can perform heavy processing on a data structure on behalf of any module C[i]. The latter concurrently compete to access these processing services through procedure call transactions associated with ports visible at the interface of S (for simplicity, only one port is presented in figure 3). As an implementation option, S and its encapsulated data structures can be replicated at each hardware node where there is a potential user of its services. The procedure code can then be executed locally, providing a simple computation and data migration policy. A procedure that performs read-only data operations does not need to execute any synchronization code. However, if the procedure performs updates to the data of S, they must be implemented using a shared memory coherence protocol.
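To give a flavor of this replication scheme, the sketch below separates the read-only path from the updating path of a procedure of S; coherence_acquire and coherence_release are hypothetical hooks into the coherence protocol of figure 3-b, not an existing P-RIO API.

    /* Sketch: a procedure of the replicated module S (figure 3).
       Each node holds a local replica; read-only procedures touch it
       directly, while updates go through the coherence protocol. */
    typedef struct { double values[1024]; } s_data_t;

    static s_data_t replica;             /* local copy of S's data       */

    void coherence_acquire(void *obj);   /* hypothetical protocol hooks  */
    void coherence_release(void *obj);

    double read_item(int i)              /* read-only: no sync code      */
    {
        return replica.values[i];
    }

    void update_item(int i, double v)    /* update: coherence needed     */
    {
        coherence_acquire(&replica);     /* gain write access            */
        replica.values[i] = v;
        coherence_release(&replica);     /* propagate the update         */
    }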

Configuration level analysis, similar to compile level analysis