ParoC++: A Requirement-driven Parallel Object-oriented Programming Language

Tuan-Anh Nguyen, Pierre Kuonen
University of Applied Sciences Western Switzerland
tuananh.nguyen@epfl.ch, [email protected]

Abstract

Adaptive utilization of resources in a highly heterogeneous computational environment such as the Grid is a difficult question. In this paper, we address an object-oriented approach to the solution using requirement-driven parallel objects. Each parallel object is a self-described, shareable and passive object that resides in a separate memory address space. The allocation of a parallel object is driven by the constraints on the resource on which the object will live. A new parallel programming paradigm is presented in the context of ParoC++, a new parallel object-oriented programming environment for high performance distributed computing. ParoC++ extends C++ to support requirement-driven parallel objects, and a runtime system provides the services needed to run ParoC++ programs in distributed environments. An industrial application on real-time image processing is used as a test case for the system. The experimental results show that the ParoC++ model is efficient and scalable and that it makes it easier to adapt parallel applications to dynamic environments.

1. Introduction

The emergence of the computational grid [7, 8] and the rapid growth of Internet technology have created new challenges for application programmers and system developers. Special-purpose massively parallel systems are being replaced by loosely coupled or distributed general-purpose multiprocessor systems with high-speed network connections. Due to the inherent difficulty of this new distributed environment, the methodologies and programming tools used before need to be rethought.

Many system-level toolkits such as Globus [6] and Legion [12] have been developed to manage the complexity of the distributed computational environment. They provide services such as resource allocation, information discovery and user authentication. However, since the user must deal directly with the computational environment, developing applications using such tools remains tricky and time consuming.

At the programming level, the question of achieving high performance computing (HPC) in a widely distributed computational environment remains open. Some effort has been spent on porting existing tools such as the Mentat Programming Language (MPL) [11] and MPI [5] to the computational grid environment. Nevertheless, support for adaptive usage of resources is still limited to specific services such as network bandwidth and real-time scheduling. MPICH-GQ [16], for example, uses quality of service (QoS) mechanisms to improve the performance of message passing. However, message passing is a rather low-level library in which the user has to explicitly specify the sends, receives and synchronization between processes; most parallelization tasks are left to the programmer.

The above difficulties lead to a quest for a new model for developing HPC applications in widely distributed environments. While traditional distributed HPC applications usually view performance as a function of processor and network resources, we address the question: how to tailor an application with a desired performance to the distributed computational environment.

We developed an object-oriented model that enables the user to express high-level resource requirements for each object. This model is implemented in a parallel object-oriented programming system for HPC called ParoC++. ParoC++ is both a programming language and a runtime system. We did not try to create a new language; rather, we extended C++ to support our model. The runtime system of ParoC++ is responsible for managing and monitoring the distributed computational environment and is partially written in ParoC++ itself. The current prototype runtime system supports mapping an arbitrary object onto a resource in a heterogeneous environment. We have modelled a wide-area environment as a dynamic graph of resources. Resource discovery during parallel object allocation takes place on this graph through a mechanism of request matching and forwarding.

In ParoC++, the user does not directly deal with processes. Instead, he handles so-called "parallel objects", which encapsulate processes. A parallel object is a self-described object that specifies its resource requirements during its lifetime. Parallel objects can be computational objects, data objects or both. Each parallel object resides in a separate memory address space. Similar to CORBA, parallel objects are passive objects that communicate via method invocations. The selection of a resource for a parallel object is driven by the object's requirements and is transparent to the user.

This paper focuses on the programming language aspect of ParoC++ and the requirement-driven parallel object. In section 2, we explain our requirement-driven parallel object model. The parallel object is the central concept in ParoC++, which we describe in section 3; we also present in that section some experimental results on the low-level performance of ParoC++. Next, in section 4, we demonstrate the use of ParoC++ in an industrial real-time application in the field of image processing. Some related works are discussed in section 5 before the conclusions in section 6.

2. Requirement-driven parallel object

2.1. A parallel object model

We envision the parallel object as a generalization of the traditional object, such as in C++. We share with CORBA the concept of transparent access to the object through an object interface, but we add more support for object parallelism. One important support is the transparent creation of parallel objects through dynamic assignment of suitable resources to objects. Another is various mechanisms of method concurrency: parallel, sequential and mutex.

A parallel object, in our definition, has all the properties of a traditional object plus the following ones:

• Parallel objects are shareable. References to parallel objects can be passed to any method regardless of where they are located (locally or remotely). This property is described in section 2.2.

• Syntactically, invocations on parallel objects are identical to invocations on traditional sequential objects. However, parallel object invocation supports various semantics. The invocation semantics are presented in section 2.3.

• Objects can be located on remote resources and in a separate address space. Parallel object allocation is transparent to the user. Object allocation is presented in section 2.4.

• Each parallel object has the ability to dynamically describe its resource requirements during its lifetime. This feature is discussed in detail in section 2.5.

It has to be mentioned that, like normal objects, parallel objects are passive objects: they only go into active mode when receiving a method invocation request. We believe that using passive objects is easier and more familiar within the traditional object-oriented programming paradigm. The passive object allows the user to fully control object execution, thus allowing better integration into other software components and making the maintenance of components simple.

2.2. Shareable parallel objects

All parallel objects are shareable. Shared objects with encapsulated data provide a means for the user to implement global data sharing in distributed environments. Shared objects can be useful in many cases. For example, computational parallel objects can synthesize output data simultaneously and automatically into a shared output parallel object.

2.3. Invocation semantics

Syntactically, invocations on parallel objects are identical to invocations on traditional sequential objects. However, parallel object invocation supports various semantics. The semantics are defined by two parameters:

1. Interface semantics:

• Synchronous invocation: the caller waits until the execution of the requested method on the server side has finished and returned its results. This corresponds to the traditional way of invoking methods.

• Asynchronous invocation: the invocation returns immediately after sending the request to the remote object. Asynchronous invocation is an important means of exploiting parallelism because it enables the overlapping of computation and communication. However, at the time the invocation returns, no computing result is available yet, so the invocation itself cannot produce results. The results can, however, be actively returned to the caller object if the callee knows the "call back" interface of the caller. This feature is well supported in our approach by the fact that an interface of a parallel object can be passed as an argument to other parallel objects during method invocation (a sketch of this pattern follows the list below).

2. Object-side semantics:

• Sequential invocation: the invocation is executed sequentially, and during its execution, other invocation requests on the same object are blocked until this sequential invocation finishes. Other concurrent methods that started earlier can still continue their normal work. The execution of sequential methods guarantees serializable consistency.

• Mutex invocation: the request is executed only if no other method instance is running. Otherwise, the current method is blocked until all the others (including concurrent methods) have terminated.

• Concurrent invocation: the method executes in a new thread (multithreading) if no sequential or mutex invocation is currently running. All invocation instances of the same object share the same object data attributes. Concurrent invocation is an important means of achieving parallelism inside each parallel object.

All invocation semantics are specified during the design phase of a parallel object.
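To illustrate the callback pattern mentioned under asynchronous invocation, here is a minimal sketch written with the parclass syntax that section 3.2 introduces. The class names Result and Worker, their methods and their ODs are our own illustration, not taken from the system itself:

    parclass Result {
    public:
        Result() @{host="localhost";};   // OD chosen for illustration only
        async void SetValue(int v);      // callee pushes the result back here
    protected:
        int value;
    };

    parclass Worker {
    public:
        Worker(int mflops) @{power>=mflops;};
        // The caller passes its Result interface; Compute returns immediately.
        async void Compute(int input, Result &out);
    };

    void Worker::Compute(int input, Result &out) {
        int r = input * input;   // some computation
        out.SetValue(r);         // actively return the result to the caller
    }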

2.4. Parallel object allocation

The allocation of a parallel object is transparent to the user and consists of two phases. In the first phase, we find a resource where the object will live. The second phase transmits the corresponding object code to that resource (if necessary), starts the object code and sets up the object interface. This transparency is important to relieve the developer from dealing with the complexity of distributed heterogeneous environments.

2.5. Requirement-driven parallel objects

Along with the changes in parallel and distributed processing toward web and global computing, there is the challenging question of how to exploit the high performance provided by highly heterogeneous and dynamic environments. We believe that for such environments, high performance can only be obtained if the two following conditions are satisfied:

• The application should be able to adapt to the environment.

• The programming environment should somehow enable application components to describe their resource requirements.

The first condition can be fulfilled by multi-level parallelism, dynamic utilization of resources or adaptive task-size partitioning. One solution is to create parallel objects dynamically on demand, as presented in section 3 where we describe ParoC++.

For the second condition, the requirements can be addressed in the form of the quality of service that components desire from the execution environment. A number of studies on quality of service (QoS) have been performed [9, 13, 10]. Most of them focus on low-level specific services such as network bandwidth reservation and real-time scheduling.

In our approach, the user requirement is integrated into parallel objects in the form of high-level resource descriptions. Each parallel object is associated with an object description (OD) that depicts the resources needed to execute the object. The resource requirements in an OD are expressed in terms of:

• Resource name (host name) (low level).

• The maximum computing power that the object needs (e.g. the number of MFlops needed).

• The amount of memory that the parallel object consumes.

• The communication bandwidth with its interfaces.

Each item in the OD is classified as one of two types: strict or non-strict. A strict item means that the designated resource must fully satisfy the requirement; if no satisfying resource is available, the allocation of the parallel object fails. A non-strict item, on the other hand, gives the system more freedom in selecting a resource: a resource that partially matches the requirement is acceptable, although a fully qualified resource is preferred. For example, the following OD:

"power>=150MFlops : 100MFlops; memory=128MB"

means that the object requires a preferred performance of 150 MFlops, although 100 MFlops is acceptable (non-strict item), and a memory storage of at least 128 MB (strict item).

The construction of the OD occurs during parallel object creation. The user can initialize the OD in each object constructor, and the OD can be parameterized by the input parameters of the constructor. This OD is then used by the runtime system to select an appropriate resource for the object.

It can occur that, due to changes in the object data or an increase in computation demand, the OD needs to be re-adjusted during the lifetime of the parallel object. If the new requirement exceeds some threshold, the adjustment can request an object migration. Object migration consists of three steps: first, allocating a new object of the same type with the current OD; then, transferring the current object data to the new object (assignment); and finally, redirecting and re-establishing the communication from the current object to the newly allocated object. The migration process should be handled by the system and be transparent to the user.

3. ParoC++ programming language

In this section, we present the main features of the ParoC++ programming system, focusing on the language aspect.

3.1. ParoC++ language

ParoC++ is an extension of C++ that supports parallel objects. We try to keep this extension as close to C++ as possible so that the programmer can easily learn ParoC++ and so that existing C++ libraries can be parallelized with ParoC++ without too much effort.

We claim that all C++ classes with the following restrictions can be implemented as parallel object classes without any change in semantics:

• All data attributes of the object are protected or private.
• The object does not access any global variable.
• There is no user-defined operator.
• There is no method that returns a memory address reference.

In other words, to some extent, ParoC++ is a superset of C++. This is important if we want to construct mechanisms for coarse-grain auto-parallelism. In many cases, the compiler can efficiently decide which objects are parallel and which are sequential, and thus automatically generate the code for each kind of object. Auto-parallelism is not yet implemented in ParoC++.

3.2. ParoC++ parallel class

Developing ParoC++ programs mainly consists of designing parallel classes. The declaration of a parallel class begins with the keyword parclass followed by the class name:

parclass myclass {...};

As with sequential classes, parallel classes contain methods and attributes. Method access can be public, protected or private, while attribute access must be protected or private. For each method, the user should define the invocation semantics. These semantics, described in section 2.3, are specified by two keywords:

1. Interface semantics:

sync: Synchronous invocation. This corresponds to the traditional way of invoking methods and is the default value. For example:
sync void method1();

async: Asynchronous invocation. For example:
async int method2();

2. Object-side semantics:

seq: Sequential invocation. This is the default value. For example:
seq void method1();

mutex: Mutex invocation. For example:
mutex int method2();

conc: Concurrent invocation. The invocation occurs in a new thread.

The combination of the interface and object-side semantics defines the overall semantics of a method. For instance, the following declaration defines an asynchronous concurrent method that returns an integer:

async conc int mymethod();

Two important properties of object-oriented programming, multiple inheritance and polymorphism, are supported in ParoC++. A parallel class can be a stand-alone class or it can be derived from other parallel classes. Some methods of a parallel class can be declared as overridable (virtual methods).

3.3. Object description

The object description is declared along with the parallel object constructor statement. Each constructor of a parallel object is associated with an OD that resides right after the argument declaration, between "{...}". An OD contains a set of expressions on the reserved keywords power (for the computing power), network (for the communication bandwidth between the object server and the interface), memory (for the memory) and host (user-specified resource). Expressions are separated by semi-colons (";") and have the following format:

[power | memory | network] [>= | =] <number expression 1> [":" <number expression 2>];
or host = <string expression>;

The <number expression 2> part is used only in non-strict OD items to describe the lower bound of the acceptable resource requirement. The presence of a host expression causes all other expressions to be ignored.

Object description information is used by the ParoC++ runtime system to find a suitable resource for the parallel object. Matching between an OD and the resources is carried out by a multi-layer filtering technique: first, each expression in the OD is evaluated and categorized (e.g., power, network, memory). The matching process then consists of several layers; each layer filters a single category within the OD and performs matching on that category. Finally, if the OD passes all filters, the object is assigned to that resource.
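As a sketch of how these expressions combine, consider the following constructors (the class name Compute and all numbers are ours; the OD syntax follows the grammar above and the example in Figure 1):

    parclass Compute {
    public:
        // Prefer 300 MFlops but accept 200 (non-strict); require the
        // amount of memory given by the constructor argument mem
        // (strict), i.e. the OD is parameterized by an input parameter.
        Compute(int mem) @{power>=300 : 200; memory=mem;};
        // host pins the object to a machine; all other items are ignored.
        Compute(char *machine) @{host=machine;};
    };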


3.4. Parallel object creation and destruction

In ParoC++, each parallel object has a counter that records the current number of references to the object. A counter value of 0 causes the object to be physically destroyed.

Syntactically, the creation and destruction of a parallel object are identical to those of C++. A parallel object can be implicitly created by declaring a variable of the parallel object type on the stack or by using the standard C++ new operator. When execution leaves the current stack frame or the delete operator is called, the reference counter of the corresponding object is decreased.

The object creation process consists of locating a resource satisfying the OD, transmitting the object code, remotely executing the object code, establishing communication, transmitting the arguments and invoking the object constructor. Failures during object creation raise an exception to the caller.
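As a minimal sketch of the two creation styles described above, reusing the Integer class from Figure 1 (the usage mirrors the main program shown there):

    int use() {
        try {
            Integer a(100, 80);                     // implicit creation on the stack
            Integer *b = new Integer("localhost");  // explicit creation with new
            a.Set(5);
            delete b;       // decreases the reference counter of *b
        } catch (int e) {   // a creation failure raises an exception
            return -1;
        }
        return 0;           // leaving the scope releases a
    }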

3.5. Inter-object communication: method invocation

The conventional way to communicate between distributed components in ParoC++ is through method invocations. The semantics of invocations are fixed during the class declaration. For standard C++ data types, data marshalling is performed automatically. For user-defined data types, the user should also specify a function to marshal the data via an optional descriptor [proc=<marshal function>]. If an argument of a method is an array, the user must also provide a hint on the number of elements via the expression [size=<global number expression>].

The current prototype of ParoC++ implements communication using TCP/IP sockets and Sun XDR as its data representation. All data transmitted over the network conforms to the XDR format.
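The descriptors are not shown in context above, so their placement in the following sketch is our assumption; the class Filter, the type Matrix and the function marshalMatrix are hypothetical:

    parclass Filter {
    public:
        Filter() @{power>=100;};
        // user-defined type: a marshalling function is supplied
        async void SetKernel([proc=marshalMatrix] Matrix &k);
        // array argument: a hint on the number of elements is required
        async void Process([size=n] double *data, int n);
    };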

3.6. Intra-object communication: shared data vs. event sub-system

In parallel objects, there are two ways for concurrent operations to communicate: using shared data attributes of the object, or via the event sub-system. Communication between operations using shared attributes is straightforward because all of the operation instances share the same memory address space. However, in this way the programmer has to manually verify and synchronize data access.

The other method is communication via the event sub-system. In ParoC++, each parallel object has its own event queue. Each event is a positive integer whose semantics are application dependent. A parallel object can raise, or wait for, an event in its queue. Waiting for an event checks the parallel object's event queue to see if the event is present; if not, the execution of the object is blocked until the event arrives in the queue. An event n can be raised by the operation eventraise(n), and a method can wait for it with eventwait(n). Raising an event in one parallel object does not affect the waiting for events in other parallel objects.

The event sub-system is a very powerful feature for dealing with signalling and synchronization problems in distributed environments. For instance, it can be used in conjunction with the shared data attributes to notify the status of data during concurrent invocations of read/write operations. It can also be used to inform other objects about the occurrence of failures or about changes in the environment.
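As an illustration of eventraise and eventwait (the buffer scenario, the method semantics and the event number 1 are our own; only the two event operations come from the description above):

    parclass Buffer {
    public:
        Buffer() @{power>=10;};
        seq async void Put(int v);  // producer side
        conc int Get();             // consumer side, runs in a new thread
    protected:
        int data;
    };

    void Buffer::Put(int v) {
        data = v;
        eventraise(1);   // event 1 means "data ready" in this example
    }

    int Buffer::Get() {
        eventwait(1);    // block until event 1 arrives in this object's queue
        return data;
    }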

3.7. Mutually exclusive execution

When concurrent invocations occur, some parts of the executions might access an attribute concurrently. To deal with these situations, it is necessary to provide a mutual exclusion mechanism. ParoC++ supports this feature with the keyword mutex: inside a given parallel object, all blocks of code starting with the keyword mutex are executed mutually exclusively.
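A sketch of such a block inside a concurrent method follows; the brace style of the mutex block is our reading of the description above, and the class Counter is illustrative:

    parclass Counter {
    public:
        Counter() @{power>=10;};
        conc void Work();
    protected:
        int total;
    };

    void Counter::Work() {
        // ... concurrent computation that does not touch shared state ...
        mutex {              // within one object, mutex blocks never overlap
            total = total + 1;
        }
    }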

3.8. Putting it together

Figure 1 shows a simple example of a ParoC++ program. The user defines a parallel class called Integer, starting with the keyword parclass. Each constructor of the parallel class Integer is associated with an OD that resides right after the argument declaration, between "{...}". The figure shows the different invocation semantics: concurrent (Integer::Get), sequential (Integer::Set) and mutually exclusive execution (Integer::Add). The main ParoC++ program looks exactly like a C++ program. Two parallel objects of type Integer, o1 and o2, are created. Object o1 requires a resource with a desired performance of 100 MFlops, although a minimum performance of 80 MFlops is acceptable. Object o2 explicitly specifies the resource name (the local host). After the object creations, invocations of the methods Set and Add are performed. It is interesting to note that the parallel object o2 can be passed from the main program to the remote method Integer::Add of the parallel object o1.

3.9. Communication cost

We use a ping-pong program to test the communication cost of invocations on parallel objects. We wrote a parallel program containing two parallel objects called "Ping" and "Pong" running on two different machines. Ping invokes a method of Pong with different arguments (size and type) and with two different invocation semantics: synchronous and asynchronous.


parclass Integer {
public:
    Integer(int wanted, int minp) @{power>=wanted : minp;};
    Integer(char *machine) @{host=machine;};
    seq async void Set(int val);
    conc int Get();
    mutex void Add(Integer &other);
protected:
    int data;
};

Integer::Integer(int wanted, int minp) {}
Integer::Integer(char *machine) {}
void Integer::Set(int val) { data = val; }
int Integer::Get() { return data; }
void Integer::Add(Integer &other) { data = other.Get(); }

int main(int argc, char **argv) {
    try {
        Integer o1(100, 80), o2("localhost");
        o1.Set(1); o2.Set(2);
        o1.Add(o2);
        cout << "Value=" << o1.Get();
    } catch (int e) {
        cout << "Creation fail";
        return -1;
    }
    return 0;
}

Figure 1. Example of ParoC++ program

For synchronous invocations, Pong replies with the same message for each call. For asynchronous invocations, Ping does not wait for the reply from Pong; in this case, after all requests have been sent out, Ping waits until all of them have been executed on Pong by calling a synchronous method on Pong, relying on the serializability property of the invocation semantics. The ping-pong exchanges are repeated many times and the total execution time is measured.

Figure 2 shows the invocation speed of parallel objects for 32-bit integer and 8-bit character messages of different sizes. There is quite a big difference between synchronous and asynchronous invocations, especially for small message sizes, since asynchronous invocation gives a better overlapping between invocations whereas there is no overlapping with synchronous invocations. System buffering also improves the throughput of small-message invocations by aggregating small messages into a single large message before sending.

Figure 2. Parallel object invocation cost (number of invocations per second vs. message size in DWORDs, for CHAR and INT messages with synchronous and asynchronous invocation semantics)

The latency for an asynchronous invocation is about 6.9 µs (MPICH: 43 µs) and for a synchronous one about 94 µs (MPICH: 123 µs).

Figure 3. ParoC++ communication bandwidth (Kbytes/s vs. message size in DWORDs, for ParoC++ asynchronous/synchronous invocations and MPICH one-way/two-way transfers)

The communication bandwidth during the invocations is presented in Fig. 3. Asynchronous invocations, thanks to the overlapping, utilize bandwidth better than synchronous invocations. This bandwidth is slightly better than that of the asynchronous (one-way) send of MPICH. The bandwidth of asynchronous calls almost reaches the limit of Fast Ethernet throughput (11.3 MB/s). For synchronous invocations, MPICH achieves somewhat better bandwidth in our experiment (15-20% better for large messages). This is due to the extra cost of multiplexing remote methods in ParoC++.

4. Example application

We present in this section the development of the Pattern and Defect Detection System (PDDS) using ParoC++. PDDS is part of the European project Forall-1¹ in textile manufacturing. The main function of PDDS is to analyze continuous tissue images to find pattern positions and to discover defects on the tissue. This process should run in real time with an analysis capacity of up to 3.3 Mpixels/s, or about 10 MBytes/s for 24-bit RGB images.

Figure 4. PDDS algorithm (the user-provided template is matched against the tissue; highlighted search areas restrict the search for patterns on the next row)

The idea behind the PDDS algorithm is to search over the whole tissue for the local maxima of similarity between the user-provided pattern template and the sub-image. Such positions are considered the start points of patterns. PDDS optimizes the algorithm by searching only in small areas (the highlighted areas in figure 4) for the patterns on the next row.

Figure 5. ParoC++ implementation of PDDS (the main program creates an ImageBuf, an OutputData and several Analyzer objects; the ImageBuf is fed by the image acquiring system; arrows denote object creation, synchronous invocations and asynchronous invocations)

Figure 5 shows the parallel object diagram of PDDS in ParoC++. The main program creates two parallel objects of type ImageBuf and OutputData and several parallel objects of type Analyzer. The ImageBuf and OutputData objects are shared among the Analyzer objects. The Analyzer objects access the ImageBuf object to get the images (synchronous invocation), analyze them and then store the results in the OutputData object (asynchronous invocation). The ImageBuf receives image frames from the image acquiring system, splits them into small images and stores these small images in an internal buffer from which the Analyzer objects can fetch and analyze them. The main program also plays the role of a monitoring agent: it watches over the ImageBuf to see whether the system can keep up with the real-time speed. If the main program detects that the system is overloaded, due to an increase in the computation demand or some external change to the resources, it can create additional Analyzer objects to solve the problem; a sketch of such a loop is given below. Hence, in PDDS we also deal with the adaptation of the application to the user requirement and to the dynamic state of the environment.

¹ European project E!1955/CTI 5130.1, financed by the Swiss Government within the Eureka program.
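A hedged sketch of the monitoring loop just described; the class names follow Figure 5, but the method names (Lag, Start), the decision rule and the variables are our own illustration:

    // Main program acting as a monitoring agent (illustrative only).
    while (running) {
        if (buf.Lag() > threshold) {            // falling behind real time
            Analyzer *a = new Analyzer(power);  // one more Analyzer; its OD
            a->Start(buf, out);                 // lets the runtime choose a
        }                                       // suitable resource
        sleep(interval);
    }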

Figure 6. Performance of PDDS/ParoC++ (speedup vs. number of processors, for the Sict2 and Monti tissues on a cluster and on a network of workstations)

We have performed two experiments. In the first, we run PDDS on homogeneous networks to measure the performance, the scalability and the efficiency in terms of the number of Analyzer objects. The second is done on a heterogeneous network where we take into account changes in the computation demand and in the environment.

The input for the first experiment consists of 100 frames, sent to PDDS frame by frame. Each frame has a size of 2048x2048 pixels; the ImageBuf splits it into several sub-images of size 512x512 pixels. Neither the adaptation to the environment nor the adaptation to an increase of the requirement is considered in this test. Figure 6 shows the speedup for two types of tissue, small patterns (Sict2) and big patterns (Monti), on a network of Sun SPARC workstations and on a cluster of Pentium 4 machines. In both environments, almost linear speedup is achieved: PDDS runs about 14 times faster on 16 processors.

Figure 7. Adaptation to the external changes (analysis speed in Mpixels/s over time in seconds; required speed vs. actual speed)

In the second experiment, PDDS runs in a heterogeneous environment of Solaris/SPARC and Linux/Intel machines with the adaptation part turned on. If PDDS discovers that the system is overloaded, due to the availability of resources or an increase of the required performance, it automatically adapts to the changes by allocating more Analyzer objects (involving more resources). Hence we consider the adaptation of the application to external changes. Figure 7 shows the analysis speed (in Mpixels/s) over time. The dashed line presents the required speed whereas the continuous line is the actual performance of PDDS. In the test, we dynamically change the required speed every 2 minutes. Due to these external changes, additional Analyzer objects (resources) are automatically allocated in order to satisfy the required performance. One interesting observation is that at a certain time (at second 220) the actual performance drops. The reason is that we changed the load of a machine used by PDDS (by launching other applications). The system reacts to this change and soon recovers to the normal speed. With this experiment we want to show two important points:

• ParoC++ applications can efficiently deal with computation on demand.

• ParoC++ can use heterogeneous resources adaptively and efficiently.

5. Related work

There are a number of studies on parallel and distributed object systems. The research focuses on two directions: developing object-oriented languages and constructing supporting tools for existing systems.

On the language side, Orca [1], MPL [11] and PO [3, 4] are some examples. Orca provides a new language based on shared objects. The programming model Orca uses is Distributed Shared Memory (DSM) for task parallelism. While Orca aims at using objects as a means to share data between processes, our approach combines the two concepts of shared data object and process into a single parallel object.

MPL, on the other hand, is an extension of C++ with so-called mentat classes for parallel execution. MPL follows the data-driven model; parallelism is achieved by concurrent invocations on mentat objects. The Mentat runtime system is responsible for instantiating mentat objects, invoking methods and keeping objects consistent. Parallel objects in our approach are more general than mentat objects: while a mentat object supports only asynchronous invocation and is not shareable, ParoC++ provides various invocation types (synchronous, asynchronous, concurrent, sequential, mutex) and the capacity to share objects. Moreover, neither Orca nor MPL allows specifying resource requirements within the object.

Our parallel object and PO share the object-oriented approach by both allowing inter-object and intra-object parallelism (concurrent methods). The difference lies in the object model: PO follows the active object model [2], with the capability of deciding when and which invocation requests to serve, while our parallel object uses a passive object model similar to C++. The Abstract Configuration Language (ACL) that PO uses to specify high-level directives for object allocation is similar to our object description (OD); however, ACL directives are expressed only at the class level and cannot be parameterized for specific instances, whereas our OD deals directly with each object instance. Therefore, our OD can be customized based on the actual input parameters of the object.

On the tool side, COBRA [14] and Parallel Data CORBA [15] extend the CORBA standard by encapsulating several distributed components (object parts) within an object and by implementing data parallelism based on data partitioning. Data input to an object is automatically split and distributed to several object parts that can reside in different memory address spaces. This differs from our approach, in which each parallel object resides in a single memory address space and parallelism is achieved by the concurrent interaction of objects and by concurrent invocations of methods on the same object. In addition, the specification of resource requirements is defined in neither Parallel Data CORBA nor COBRA.

6. Conclusions

Adaptive utilization of a highly heterogeneous computational environment for high performance computing is a difficult question that we have tried to answer in this paper. Such adaptation takes two forms: either the application components decompose dynamically based on the available resources of the environment, or the components allow the infrastructure to select suitable resources by providing descriptive information about their resource requirements.

We have addressed these two forms of adaptation by introducing our parallel object and ParoC++, a parallel object-oriented programming language. The integration of requirements, driven by the object description, into the shareable parallel object is a distinct feature of our approach. We have described ParoC++, which extends C++ to support the parallel object. ParoC++ also offers mechanisms such as the event sub-system, synchronization and mutually exclusive execution to support concurrency within the parallel object. Programming in ParoC++ is rather easy since ParoC++ is very similar to C++.

Some preliminary experiments on ParoC++ have been performed. Low-level tests on different types of method invocation show good latency and bandwidth compared to MPICH on the same architecture. An industrial application on real-time image analysis has also been demonstrated. The results show the efficiency, scalability, adaptability and ease of use of ParoC++ in dealing with the computation-on-demand of HPC applications in heterogeneous and distributed environments.

References

[1] H. E. Bal, M. F. Kaashoek, and A. S. Tanenbaum. Orca: A language for parallel programming of distributed systems. IEEE Transactions on Software Engineering, 18(3):190–205, March 1992.

[2] R. Chin and S. Chanson. Distributed object-based programming systems. ACM Computing Surveys, 23(1), 1991.

[3] A. Corradi, L. Leonardi, and F. Zambonelli. HPO: a programming environment for object-oriented metacomputing. In Proc. of the 23rd EUROMICRO Conference, 1997.

[4] A. Corradi, L. Leonardi, and F. Zambonelli. Parallel object allocation via user-specified directives: A case study in traffic simulation. Parallel Computing, 27:223–241, 2001.

[5] I. Foster and N. Karonis. A grid-enabled MPI: Message passing in heterogeneous distributed computing systems. In Proc. 1998 SC Conference, November 1998.

[6] I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. Intl. J. Supercomputer Applications, 11(2):115–128, 1997.

[7] I. Foster and C. Kesselman. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, 1998.

[8] I. Foster, C. Kesselman, and S. Tuecke. The anatomy of the grid: Enabling scalable virtual organizations. International J. Supercomputer Applications, 15(3), 2001.

[9] I. Foster, A. Roy, and V. Sander. A quality of service architecture that combines resource reservation and application adaptation. In The 8th International Workshop on Quality of Service, 2000.

[10] C. Gill, F. Kuhns, D. C. Schmidt, and R. Cytron. Empirical differences between COTS middleware scheduling paradigms. In The 8th IEEE Real-Time Technology and Applications Symposium, September 2002.

[11] A. Grimshaw, A. Ferrari, and E. West. Parallel Programming Using C++, pages 383–427. The MIT Press, Cambridge, Massachusetts, 1996.

[12] A. S. Grimshaw and W. A. Wulf. Legion — a view from 50,000 feet. In Proc. of the 5th IEEE International Symposium on High Performance Distributed Computing, August 1996.

[13] G. Hoo, W. Johnston, I. Foster, and A. Roy. QoS as middleware: Bandwidth reservation system design. In Proc. of the 8th IEEE Symposium on High Performance Distributed Computing, 1999.

[14] K. Keahey and D. Gannon. PARDIS: A parallel approach to CORBA. In The 6th IEEE International Symposium on High Performance Distributed Computing, August 1997.

[15] T. Priol and C. René. Cobra: A CORBA-compliant programming environment for high-performance computing. In Proc. of Europar'98, Southampton, UK, September 1998.

[16] A. Roy, I. Foster, W. Gropp, N. Karonis, V. Sander, and B. Toonen. MPICH-GQ: Quality-of-service for message passing programs. In Proc. of the IEEE/ACM SC2000 Conference, November 2000.
