
Parallel object-oriented descriptions of graph reduction machines



David Bolton *, Chris Hankin ** and Paul Kelly **

* Department of Computer Science, City University, Northampton Square, London EC1V 0HB, UK
** Department of Computing, Imperial College, 180 Queen's Gate, London SW7 2BZ, UK

Abstract machine descriptions of parallel computer architectures must capture communications and concurrency characteristics at a high level. Current design techniques and notations are weak in this respect. We present a layered method for refinement of a requirements specification through to a detailed systems architecture design.

This paper concentrates on the two highest layers, the logical model, which is a requirements statement, and the systems architecture, which specifies logical processes and explicit communications. While requirements are expressed in a language that matches the problem domain, we suggest that a parallel object-oriented notation is most appropriate for the systems architecture layer. Refinements within this layer reflect implementation details (e.g. structure sharing and distribution of work among processing elements). We introduce a parallel object-oriented notation based on rewriting concepts and use it to refine the design of a parallel graph reduction machine to execute functional programs.

The Paragon notation used is a natural extension of a graph rewriting language and the work forms the basis for a structured explication of parallel graph rewriting in which all communications are made explicit.

1. Introduction

In this paper we propose a formally-based methodology for the design of parallel computing systems. The method applies to the logical model which involves a requirements statement and its refinements, and the systems architecture which involves a specification of logical processes and explicit inter-process communication. The systems architecture can be further refined towards a silicon design, but in this paper we focus on the top two layers.

1.1. The logical model

The logical model begins with a requirements statement. This and any refinements of the logical model are expressed in a language which matches the problem domain.

Our demonstrator is a parallel combinator reduction machine, and in this particular example, the language of term and graph rewriting systems is the most appropriate. The requirements are expressed as a term rewriting system which describes the normal-order reduction of combinator expressions. The lowest level of the logical model layer uses an annotated graph rewriting system to describe a parallel, packet-based graph reduction mechanism. Verification of the refinements employs the well-understood theory of graph rewriting systems.

North-Holland Future Generations Computer Systems 6 (1990) 225-239

0376-5075/90/$03.50 © 1990 - Elsevier Science Publishers B.V. (North-Holland)


1.2. The systems architecture

The role of the systems architect is to identify logical processes within the architecture, and the control and data flows between them. Here we introduce a parallel object-oriented notation for use across all applications. The top level of the layer recasts the lowest level of the logical model in the object-oriented notation; for our example the objects will be packets and processing agents. Lower levels of this layer might incorporate processors as objects, allowing us to reflect scheduling and process placement design decisions, and we sketch an example refinement to model load balancing. The correctness proofs at this lower level are not treated in the paper.

The rest of this paper is structured to reflect the different layers of design activity. In the next section we present the requirements statement that the example design is meant to meet. This represents the top-level (most abstract) design within the logical model. In Section 3 we present a graph rewriting system which describes the lowest level of description within the logical model; we identify the correctness criterion for this description. Section 4 contains a description of Paragon and Section 5 illustrates its use in the description of the top levels of the systems architecture. We conclude with some suggestions for future work.

2. The requirements

The example architecture is the Cobweb machine [2,3]. This parallel architecture aims to execute functional programs using a large array of processing elements fabricated on a single wafer of silicon.

The arguments for using functional languages in parallel systems have been well-rehearsed elsewhere [9,18,13]. Functional programming systems must implement function application efficiently. During application, some representation of the argument (either a value or a closure/suspension) becomes associated with the formal parameters of the function. These associations are used in the evaluation of the body of the function. There are two standard approaches to functional language implementation: environment-based machines (e.g. the SECD machine) and combinator-based machines [9,13]. In environment-based machines the binding of actual to formal parameters is represented by an association list which is globally accessible and updateable. In contrast, in combinator-based machines the need for an association list is replaced by compiling the program into code which is variable-free and which directs the actual parameters to where they are used. This avoids a major potential bottleneck in parallel implementations.

We choose a particularly straightforward combinator-based implementation, in which the program is represented as director code [8,15]. To a first approximation, a functional expression can be viewed as a binary tree. Each node corresponds to an application, with the left subtree being the "function" and the right subtree being the "argument". The leaves are constants and variables. In the director approach every variable is replaced by a placeholder (here represented by I) and nodes (applications) are annotated by directors which indicate how actual parameters should be distributed through the tree. Since the nodes are binary, there are four possible directors per parameter:

^  parameter is used in both subtrees
/  parameter is used in left subtree only
\  parameter is used in right subtree only
-  parameter is used in neither subtree

For example:

[Figure: an example abstraction λx.λy. ... compiled into a director-annotated tree; the drawing itself has not survived extraction, but its root directors are discussed below.]


The two directors on the root node of the resultant tree, / and \, correspond to the original abstractors, λx and λy; since the variable y did not occur in the left subtree of the original tree, all directors on nodes in the final subtree refer to x. Details of the compilation process can be found in [11].

The "machine code" has the following format: direxp :: = variable I

constant I

P I K I I I Y I (dirs direxp direxp)

d i r s : : = n i l ld i r ::dirs

dir : := / 1 \ 1 ^ I - t # In the following we will use + as a prototypical constant operator and we will freely use a suggestive notation for atomic constants; brackets will be omitted when no ambiguity results and, as usual, application associates to the left; nil director strings are omitted. We use the usual shorthand notation: [e 1,e 2 . . . . . e . ] to represent the list: e I :: (02::... (e n :: nil)...) The P combinator and the # director are parallelism annotations (a richer set of annotations that take account of structured data may be found in [11]). As will be shown in the next section, they are associated with applications and allow for parallel evaluation of the argument and the application.

The requirements statement may be formalised as a simple applicative term rewriting system [16], TRS, which is defined on the machine code structure:

+ n m ⇒ n+m [+]
P f a ⇒ f a [P]
K x y ⇒ x [K]
Y f ⇒ f (Y f) [Y]
(^::d f a) x ⇒ d (f x) (a x) [^1]
(^::d I a) x ⇒ d x (a x) [^2]
(^::d f I) x ⇒ d (f x) x [^3]
(^::d I I) x ⇒ d x x [^4]
(/::d f a) x ⇒ d (f x) a [/1]
(/::d I a) x ⇒ d x a [/2]
(\::d f a) x ⇒ d f (a x) [\1]
(\::d f I) x ⇒ d f x [\2]
(-::d f a) x ⇒ d f a [-]
(#::d f a) ⇒ d f a [#]

(Note: I is not being used as the identity combinator but instead as a place marker for occurrences of actual parameters. Thus there is no rewrite rule for I. Underlining is used to represent literal values.)

A computation is represented by a pair; a term (program) and the set of rules. Execution proceeds by matching the program (or one of its subterms) against the left hand side of one of the rules and replacing it by the corresponding right hand side (matching will involve instantiation of (some of) the variables in the left hand side of the rule). For example:

[/,\] ([\] + ([^] ([\] * I) I)) I 2 3
⇒ [/1]  [\] (([\] + ([^] ([\] * I) I)) 2) I 3
⇒ [\2]  (([\] + ([^] ([\] * I) I)) 2) 3
⇒ [\1]  (+ (([^] ([\] * I) I) 2)) 3
⇒ [^3]  (+ ((([\] * I) 2) 2)) 3
⇒ [\2]  (+ (* 2 2)) 3
⇒ [*]   + 4 3
⇒ [+]   7

At each stage we have simplified the outermost term which matches. If more than one term had been outermost (for example, both arguments to +, *, etc.), we would have selected the leftmost. Leftmost outermost rewriting is sometimes called normal order; it is guaranteed to produce an answer if any strategy does. To introduce parallelism, compile-time analysis is used to deduce where a more eager order can be employed; the results of this analysis are reflected in the code by the P combinator and # annotations.
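The normal-order reduction above can be animated with a short Python sketch (our own term encoding, not the paper's formal TRS; only the director rules plus strict + and * are covered, and Y, K and P are omitted):

```python
# Hypothetical encoding: ("app", dirs, f, a) is a node; a non-empty dirs tuple
# makes it a director node. "I" is the parameter placeholder; ints are literals.

def app(f, x):
    return ("app", (), f, x)

def fire(d, rest, f, a, x):
    """One director rule: push argument x into the children selected by d."""
    fx = x if f == "I" else app(f, x)      # an I child absorbs x directly
    ax = x if a == "I" else app(a, x)
    if d == "/":
        return ("app", rest, fx, a)        # rules [/1], [/2]
    if d == "\\":
        return ("app", rest, f, ax)        # rules [\1], [\2]
    if d == "^":
        return ("app", rest, fx, ax)       # rules [^1]..[^4]
    return ("app", rest, f, a)             # "-": discard the argument

def whnf(t):
    """Leftmost-outermost reduction to weak head normal form."""
    while isinstance(t, tuple):
        _, ds, f, a = t
        if ds:
            return t                       # a director node awaits its argument
        f = whnf(f)
        if isinstance(f, tuple) and f[1]:  # ((d::ds) g b) x : a director redex
            t = fire(f[1][0], f[1][1:], f[2], f[3], a)
        elif isinstance(f, tuple) and not f[1] and f[2] in ("+", "*"):
            m, n = whnf(f[3]), whnf(a)     # strict arithmetic ((op m) n)
            t = m + n if f[2] == "+" else m * n
        else:
            return ("app", ds, f, a)       # partial application: already whnf
    return t

# the worked example: [/,\] ([\] + ([^] ([\] * I) I)) I  applied to 2 and 3
inner = ("app", ("\\",), "*", "I")
X = ("app", ("^",), inner, "I")
F = ("app", ("\\",), "+", X)
E = ("app", ("/", "\\"), F, "I")
assert whnf(app(app(E, 2), 3)) == 7
```

The loop in `whnf` always works on the head of the term first, which is exactly the leftmost-outermost discipline described above.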

3. The graph rewriting system

There is a variety of design decisions that can (and should) be made in the logical model. Amongst these are such things as graph versus string reduction and the extent to which parallelism annotations are acted upon. In order to be able to express such decisions it is necessary to extend our term rewriting system to be an annotated graph rewriting system [4,10]. The lowest level description in the logical model might be (called GRS below):

+ m n ⇒ m+n [g+1]
+ x n ⇒ $+ !x n [g+2]
+ m y ⇒ $+ m !y [g+3]
+ x y ⇒ $+ !x !y [g+4]
P f a ⇒ !(f !a) [gP]
K x y ⇒ !x [gK]
t = (Y f) ⇒ !(f t) [gY]
(^::d f a) t = x ⇒ !(d (f t) t1 = (a t)) [g^1]
(^::d I a) t = x ⇒ !(d t t1 = (a t)) [g^2]
(^::d f I) t = x ⇒ !(d (f t) t) [g^3]
(^::d I I) t = x ⇒ !(d t t) [g^4]
(/::d f a) x ⇒ !(d (f x) a) [g/1]
(/::d I a) x ⇒ !(d x a) [g/2]
(\::d f a) x ⇒ !(d f t = (a x)) [g\1]
(\::d f I) x ⇒ !(d f x) [g\2]
(-::d f a) x ⇒ !(d f a) [g-]
(#::d f a) ⇒ !(d f !a) [g#]
t a ⇒ $ !t a [gt]

The ! annotations indicate the redexes which are reducible next - this gives us fine control over the evaluation order. In most cases it is the leftmost redex that is to be evaluated but, for example, in the [g+4] rule, both operands to a strict operator can be reduced in parallel (an operator is strict in an argument if it requires the value of that argument for the overall result to be defined). The $ indicates that the process of reducing this redex should be suspended until the subordinate (inner) reductions have completed. The rules for Y, ^ and \ include named subgraphs (indicated by an identifier, t or t1, separated from the graph by an equals sign (=)); in each case the name is expected to be a new name. This notation introduces the possibility of sharing structure between different parts of the computation (copying pointers rather than structure) and is naturally implemented on a packet-based architecture [7,2,3]. Having described a parallel graph reduction process for a packet-based machine, we now turn to the question of correctness.
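The effect of a named subgraph can be sketched in Python (a hypothetical mutable-node model, not the paper's machine). Rule [gY], t = (Y f) ⇒ !(f t), overwrites the application node in place so that its own argument pointer refers back to itself - structure is shared by copying a pointer, not the graph:

```python
class Node:
    """A hypothetical packet/graph node: an application (fun arg)."""
    def __init__(self, fun, arg):
        self.fun, self.arg = fun, arg

def g_Y(node):
    """[gY]  t = (Y f)  =>  !(f t): overwrite (Y f) in place as (f t),
    where t is the very node being rewritten - a cyclic, shared graph."""
    f = node.arg                 # node.fun is "Y", node.arg is f
    node.fun, node.arg = f, node
    return node

t = Node("Y", "double")          # the redex Y double
g_Y(t)
assert t.fun == "double"
assert t.arg is t                # a pointer is copied, not the structure
```

Any part of the computation holding a pointer to this node sees the rewritten contents immediately, which is what makes in-place packet overwriting (and later, the wakeup protocol of Section 5) work.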

A computation is represented by a pair consisting of a program and a set of rewrite rules. The correctness of the graph rewriting system with respect to the term rewriting system has two parts: termination and equality of results. The rewriting process terminates when the term (graph) is in normal form, that is, when no more rewrite rules are applicable to the term (graph). A more formal statement of correctness is:

∀d ∈ direxp.
  (((d, GRS) has a normal form ⇒ ((d, TRS) has a normal form and (d, TRS) = (d, GRS)))
  and
  ((d, TRS) has a normal form ⇒ ((d, GRS) has a normal form and (d, TRS) = (d, GRS))))

Ignoring the ! annotations in GRS, it is straightforward to show the termination equivalence of the two systems, and the equivalence of normal forms follows because they are inter-reducible [5]. In this paper, we assume that the parallelism annotations have been generated by abstract interpretation [11] and that they therefore preserve the termination properties of the normal-order reduction system.

4. A parallel object-oriented rewrite notation: Paragon

The previous sections have been rather problem specific. For example, if one wanted to produce an architecture for signal processing, it is unlikely that graph rewriting systems would provide the best medium for the logical model. The object-oriented approach provides a natural means to describe the functional units, and the data and control flows between them, that are specified at the systems architecture level. In order to facilitate the transition from the rewrite system specification, we have developed a novel object-oriented programming notation, Paragon, which extends our graph rewriting notation to include message passing.

This section describes the language's syntactic structure, gives a simple example, and then provides an overview of the operational semantics.

Note that while Paragon does form a basis for a programming language, there are several issues - for example the absence of program structuring mechanisms - which would have to be resolved before a language design based on it would be complete.

4.1. Syntax

We follow POOL [1] in dealing with programs represented by a single Unit, which introduces a collection of Class definitions. Program execution begins with the generation of an instance of the final class in the unit. The syntax of a unit is:

unit ::= <Classname1 : def, ..., Classnamen : def>

where each Classname is represented by an identifier starting with an upper-case letter. Each class definition consists of some type information and a labelled, guarded transition system. The type information specifies the type of the tuple of objects which constitute the state of an instance of the class and is specified using an algebraic type scheme similar to that found in Miranda¹ [21]. The labels on rewrite rules are used to model message receipt. Informally, the most general form of a rule is:

state given message
  when pattern1 ? identifier1
  and ...
  and patternm ? identifierm
  with guard
→ state then actions
  where identifierm+1 = expression1
  and ...
  and identifierm+n = expressionn;

Such a rewrite rule will be selected for execution when all the conditions of the left-hand side (to the left of →) are satisfied: when the state of an object "matches" the left hand side of the rule, the message in the rule "matches" a message in the object's message buffer, each of the match clauses after the when succeeds, and the guard evaluates to true. The right hand side of the rule is evaluated in an environment which is defined as a result of the rule selection process and also includes any bindings resulting from the where clause. It defines what the new state of the object should be and may specify some further communications (actions). The integrity of the rewrite is guaranteed by appropriate locking mechanisms.

¹ Miranda is a trademark of Research Software Limited.

The actions component is formed from the primitive message transmission operators ! (for synchronous transmission) and !! (for asynchronous transmission). For example, d ! m(e1, ..., en) sends a message m, carrying parameters e1, ..., en, to the object identified by d. This synchronous transmission delays subsequent actions until the message is received and acted upon by the destination object. After an asynchronous transmission, d !! m(e1, ..., en), subsequent actions can proceed straight away, and message receipt may occur some time later. Primitive message transmissions can be combined sequentially (using the + operator), or in parallel using commas:

(d1 !! m1(e1, ..., en), d2 !! m2(e1, ..., en), ..., dm !! mm(e1, ..., en))
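A minimal Python sketch of the two transmission operators (our own class and method names; Paragon's locking, guards and fairness are not modelled): synchronous send hands the message over and resumes only once it has been handled, while asynchronous send merely appends it to the destination's message buffer:

```python
class Obj:
    """A hypothetical Paragon-like object with a message buffer."""
    def __init__(self, name):
        self.name, self.inbox, self.handled = name, [], []

    def receive(self, msg):
        self.handled.append(msg)           # stand-in for firing a labelled rule

    def send_sync(self, dest, msg):        # "!" : sender is delayed until
        dest.receive(msg)                  # the destination has acted on msg

    def send_async(self, dest, msg):       # "!!": subsequent actions proceed
        dest.inbox.append(msg)             # straight away; receipt happens later

    def drain(self):
        while self.inbox:                  # eventually process buffered messages
            self.receive(self.inbox.pop(0))

a, b = Obj("a"), Obj("b")
a.send_sync(b, "m1")                       # handled before a continues
a.send_async(b, "m2")                      # still sitting in b's buffer
assert b.handled == ["m1"] and b.inbox == ["m2"]
b.drain()
assert b.handled == ["m1", "m2"]
```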

As an example we show part of the class definition that might be employed in defining a simple vending machine that dispenses cups of coffee. The vending machine is an instance of the class of machines which have the following type:

class machine ::= (flag, integer, integer)
type flag = idle | busy

This specifies that the state of a machine object is represented by a triple containing the following information:

a flag which indicates whether the machine is busy
an integer that indicates how much coffee there is in the machine
an integer which represents how full the coin box is

Six sample rules are:

{idle,coffee,coins} given coin with (coffee ≥ 1 and coins < capacity)
  → {busy,coffee,coins}; [cm1]

{idle,coffee,coins} given coin with (coffee = 0 or coins = capacity)
  → {idle,coffee,coins} then customer!coin; [cm2]

{busy,_,_} given coin → {busy,_,_} then customer!coin; [cm3]

{busy,coffee,coins} given reject → {idle,coffee,coins} then customer!coin; [cm4]

{idle,_,_} given reject → {idle,_,_}; [cm5]

{busy,coffee,coins} → {idle,coffee-1,coins+1} then customer!cup; [cm6]

Leftmost in each rule is the object's state required before matching. The keyword given introduces the message which must be received before rewriting, and with introduces the guard which must be satisfied.

Rules [cm1] to [cm5] define state transitions that may occur on the receipt of an appropriate message; rule [cm6] may be selected whenever the state matches. At any time after rule [cm1] has been executed, rule [cm6] can be selected to dispense the coffee. Rule [cm2] deals with the machine being empty or the coin box being full. Rules [cm3] and [cm5] deal with abuse of the machine and rule [cm4] deals with the case where the reject button is pushed between inserting a coin and the dispensing of the coffee. In a fuller specification we would have additional rules for the machine and another class definition specifying customer objects.
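The six rules can be simulated with a small Python sketch (our own method names; the coin-box `CAPACITY` is an assumed constant, messages back to the customer are modelled as counters, and the unlabelled rule [cm6] is triggered by an explicit `tick`):

```python
class Machine:
    """Sketch of the coffee machine's transition rules [cm1]-[cm6]."""
    CAPACITY = 10                       # assumed coin-box capacity

    def __init__(self, coffee, coins):
        self.flag, self.coffee, self.coins = "idle", coffee, coins
        self.returned = 0               # coins sent back (customer!coin)
        self.dispensed = 0              # cups sent out (customer!cup)

    def coin(self):
        if self.flag == "idle" and self.coffee >= 1 and self.coins < self.CAPACITY:
            self.flag = "busy"                                  # [cm1]
        else:
            self.returned += 1                                  # [cm2], [cm3]

    def reject(self):
        if self.flag == "busy":
            self.flag = "idle"
            self.returned += 1                                  # [cm4]
        # [cm5]: reject while idle is simply ignored

    def tick(self):
        """The unlabelled rule [cm6]: may fire whenever the state matches."""
        if self.flag == "busy":
            self.flag = "idle"
            self.coffee -= 1
            self.coins += 1
            self.dispensed += 1                                 # customer!cup

m = Machine(coffee=2, coins=0)
m.coin(); m.tick()                      # [cm1] then [cm6]: one cup dispensed
assert (m.coffee, m.coins, m.dispensed) == (1, 1, 1)
m.coin(); m.reject()                    # [cm1] then [cm4]: coin returned
assert m.flag == "idle" and m.returned == 1
```

Note that between `coin` and `tick` the object sits in the busy state, which is exactly the window in which [cm4] (reject) can intervene.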

The reader is referred to section 5 for examples of non-primitive actions, when clauses and where clauses which were not required for this example.

More formally, the syntax for rewrite rules is defined as follows (square brackets are used to denote optional components):

def ::= (rules)
rules ::= rule | rule; rules
rule ::= state [given message] [when patterns] [with guard] → rhs
guard ::= Basic-exp


message ::= Id | Id(tuple)
tuple ::= element | element, tuple
element ::= _ | exp
exp ::= Id | Basic-exp | new(classname, state) | nil | self
patterns ::= pattern | pattern and patterns
pattern ::= state ? Id

We will not define Id or Basic-exp (which includes the usual arithmetic and boolean expressions) further. Elements of a message may not be _, which is used as a don't-care pattern below.

The main function of each rule is to perform an update of the state of an object. An object is an instance of a class; associated with each class is a set of instance variables, and the state of an object is a tuple which gives a binding for each of the instance variables for that object. The syntax for states is:

state ::= {tuple} | { }

In a left hand side, state elements are restricted to identifiers, literal basic expressions and _. The right hand side of a rule specifies how the state is to be updated and may, optionally, specify further actions which are to be performed. A where clause can also be used for auxiliary definitions:

rhs ::= state [then actions] [where auxdefs]
auxdefs ::= auxdef | auxdef and auxdefs
auxdef ::= Id = exp
actions ::= par_action | par_action + actions
par_action ::= action | (action_list)
action_list ::= action | action, action_list
action ::= exp ! message | exp !! message

An action involves the sending of a message, either synchronously (!) or asynchronously (!!); in either case the expression must evaluate to an object (of the appropriate class) or a list of objects.

4.2. Semantics

In this section, we sketch the operational semantics [19] of the language; a full operational semantics, based on POOL's [1], is presented in the companion paper [6]. The context-sensitive syntax which is described above can be formalised in the static semantics. This is routine and omitted here; instead we concentrate on the dynamic semantics.

4.2.1. Objects
Following the semantics of POOL [1], the set of objects includes standard objects and a set of programmer-defined objects, AObj:

Obj = AObj ∪ Z ∪ {tt, ff} ∪ {nil} ∪ {?}

where Z is the set of integers, tt and ff represent the truth values and ? is used to represent unknown values in the semantics of right hand side states with _ elements. Programmer-defined objects are generated in such a way that they can be assigned a unique integer identifier:

AObj = { n | n ∈ N }

4.2.2. Configurations
Following [19], the semantics is presented as a set of axioms and derivation rules which manipulate configurations which represent the state of the computation. The configurations used in the semantics are elements of the form:

⟨X, s, t, U⟩ ∈ (P_fin(Activity) × S × Type × Unit) ∪ {err}

where P_fin constructs the set of finite subsets of a given set.

Activity is a set of quadruples which, for each active object, gives information about the current activity of that object. The information specifies the identity of the object, either a script to be executed by the object or the value of an expression produced by the execution of part of a script, information about the source of the message that initiated the activity, and an environment containing bindings for variables involved in the pattern matching during rule selection. Thus each element, represented as (a,s,m,r), has one of the forms:

⟨Id, script, message_info, environment⟩
⟨Id, result, message_info, environment⟩

where

script ::= actions | actions; | actions where auxdefs; | exp | state | wait
result ::= Obj ∪ Obj*
message_info ::= ⟨message, Id⟩ | ⟨message, nil⟩ | ok | nil
environment ::= Id →_fin result

The second component in the message information is either an object - the return address for a synchronous message - or nil - which means that the message was asynchronous and there is no return. In the semantics ok is used as an acknowledgement for synchronous message passing and nil is used as a dummy message when the activity corresponds to an un-labelled transition (i.e. a transition which has no message component). The environment is a finite mapping from identifiers to results.

The state s ∈ S is represented by a pair of mappings:

(AObj →_fin Obj*) × (AObj →_fin (P_fin(Messages) ∪ {nil}))

The first function in the pair gives the current bindings of the instance variables (i.e. the current state) for each object and the second mapping gives the current message queue for each object. Messages is a set of messages which are waiting to be processed by this object; its elements are of the form:

⟨message, Id⟩ or ⟨message, nil⟩ or ok

We will use the notation ⟨s1, s2⟩ to represent the pair of functions which constitute the state.

Type is a type assignment function which, for each object, gives the classname that it was generated from:

AObj →_fin classname

Finally, err is an error value.

4.2.3. The meaning of a unit

The meaning of a unit, U, is the set of all (finite and infinite) sequences of configurations, (c1, c2, ...), that satisfy the following conditions:

(i) The initial configuration of the system will be ⟨∅, s, t, U⟩ where

U = ⟨C1 : d1; ... Cn : dn⟩
s1 = [1 → ⟨nil, ..., nil⟩]
s2 = [1 → nil]
t(1) = Cn

where ∅ denotes the empty set. The initial state only contains information about one object, 1, which is an instance of Cn. Notice that, by (iv) below, this is also a terminal configuration unless dn contains an unlabelled rule which is enabled by s1.

(ii) For all ci and ci+1 we have ci → ci+1, where → is the transition relation defined in [6].

(iii) The whole sequence is fair: when an object is infinitely often allowed to participate in a transition, it will eventually do so [1].

(iv) Terminal configurations are of the form:

c_term = ⟨∅, ⟨s1, λx.nil⟩, t, U⟩ such that ¬∃c ∈ Configurations. c_term → c

A terminal configuration is one in which there are no current activities, no messages are waiting for processing, and no un-labelled transitions are applicable to any object.


We now return to the intended application - the description of the systems architecture for the graph reduction machine of Section 3.

5. The systems architecture

The top level description within this level is a straightforward recasting of the graph rewriting system. However, in this next specification the designer is forced to make communication explicit - something that would be very difficult in the graph rewriting idiom.

5.1. The top level

There are three types of objects in our particular system: packets which correspond to nodes in the graph, agents which correspond to the execution mechanism and the loader which is the final class in the unit. The loader is program specific; it consists of a single unlabelled rule with a single action (a reduce message to a new agent that starts the evaluation of the top-level expression); the graph of the program is constructed in a where clause. We will not detail the loader further here.

Remark: We could have given a higher-level description which had only two kinds of objects: packets and the loader. This would give us a description very similar to GRS without the ! annotations, but it would not allow us to model communication adequately, so instead we start at a slightly lower level. It is a feature of our approach that each level of refinement involves adding new instance variables and messages to existing classes and adding new classes. With the higher-level description we would have the following sequence of refinements (but we only describe the last two):

(packets, loader) → (packets', agents, loader) → (packets'', agents', processors, loader)

The intention is that packets' (packets'') is a class of objects like packets but refined by the addition of further components in the state and further rules. Thus successive refinements are linked by a record subtype relation, with packets'' a subtype of packets' which is in turn a subtype of packets. Similar comments apply to the hierarchy of agent classes.

5.1.1. Packets
The packet class is introduced by the class definition:

class packet ::= (rator, rand, string, innf, act, list agent)
type rator = packet | basic-value
type rand = packet | basic-value
type innf = nf | notnf
type act = active | inactive

The components of a packet object are:

Operator - the left hand subgraph
Operand - the right hand subgraph
Annotation - the director string
Normal Form - indication of whether the packet is in normal form
Activity Marker - indication of whether or not the packet is being reduced
Pending - a list of agents that are awaiting the value of this packet

Members of the packet object class can respond to need, fire and rewrite messages:

5.1.1.1. Need This message indicates that the packet should be reduced to normal form (by a new agent) and the results should be returned to the requesting agent, which is a parameter to the message. The rewrites which deal with these messages are as follows:

{_,_,_,_,active,p} given need(agent) → {_,_,_,_,active,agent::p};

{_,_,_,nf,inactive,p} given need(agent) → {_,_,_,nf,inactive,p}
  then agent!!wakeup;

{_,_,_,notnf,inactive,p} given need(agent) → {_,_,_,notnf,active,agent::p}
  then new(Agent,{self,0})!!reduce(self);

The last rule corresponds to the receipt of a need message by an unevaluated packet which is not currently being reduced - its activity marker is set active, the requesting agent is added to the pending list and a new agent is created to reduce this packet (self).
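The three cases can be made concrete with a minimal executable sketch (plain Python, with invented helper names; message sends are modelled by appending to lists rather than by actual communication):

```python
class P:  # minimal stand-in for the relevant packet state
    def __init__(self, innf=False, active=False):
        self.innf, self.active, self.pending = innf, active, []

spawned, woken = [], []

def need(pkt, agent):
    if pkt.active:                 # rule 1: already being reduced, just queue
        pkt.pending.append(agent)
    elif pkt.innf:                 # rule 2: already in normal form
        woken.append(agent)        # models agent!!wakeup
    else:                          # rule 3: start a new reduction
        pkt.active = True
        pkt.pending.append(agent)
        spawned.append(pkt)        # models new(Agent,{self,0})!!reduce(self)

p = P()
need(p, "a1")   # rule 3: goes active, spawns a reducing agent
need(p, "a2")   # rule 1: a2 simply joins the pending list
assert p.pending == ["a1", "a2"] and spawned == [p]
```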

5.1.1.2. Fire

This message has no parameters; it indicates that the packet should be evaluated. The message will be sent by an agent that is executing a rewrite rule with parallelism annotations.

{_,_,_,_,active,p} given fire → {_,_,_,_,active,p};
{_,_,_,nf,inactive,p} given fire → {_,_,_,nf,inactive,p};
{_,_,_,notnf,inactive,p} given fire → {_,_,_,notnf,active,p}
    then new(Agent,{self,0})!!reduce(self);

5.1.1.3. Rewrite

A rewrite message is received when the node is to be overwritten. If the node is rewritten to normal form, all of the agents on the Pending list must be sent wakeup messages and the Activity Marker must be set inactive. The message has four parameters, corresponding to the fields of the node and the normal form flag.

{_,_,_,_,_,p} given rewrite(op,arg,ann,notnf) → {op,arg,ann,notnf,_,p};
{_,_,_,_,_,p} given rewrite(op,arg,ann,nf) → {op,arg,ann,nf,inactive,nil}
    then p!!wakeup;

Note that the message target in the second rule is a list.
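The two cases can be sketched in Python (illustrative names only; the broadcast to the pending list is modelled by extending a list of woken agents):

```python
class P:  # stand-in packet, already active with two agents pending
    def __init__(self):
        self.op = self.arg = None
        self.ann = []
        self.innf = False
        self.active = True
        self.pending = ["a1", "a2"]

woken = []

def rewrite(pkt, op, arg, ann, nf):
    pkt.op, pkt.arg, pkt.ann, pkt.innf = op, arg, ann, nf
    if nf:                          # second rule: normal form reached
        woken.extend(pkt.pending)   # p!!wakeup, where the target is a list
        pkt.pending = []
        pkt.active = False

p = P()
rewrite(p, 3, None, [], True)
assert woken == ["a1", "a2"] and p.pending == [] and not p.active
```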

5.1.2. Agents

Agents have the following type:

class agent ::= (packet, integer)

Thus agents have the following internal structure:

target_packet: the "address" of the packet that the agent is trying to reduce
count: a count of the number of values that the agent is waiting for

The following messages can be received by an agent.

5.1.2.1. Reduce

This message is sent to the agent when it is first created and subsequently when there is more reduction to be performed. The calls to reduce correspond to the ! annotations in the graph rewriting system and to the reactivation of a suspended reduction through a wakeup message. It has a single parameter: the address of the packet which the agent is meant to be reducing.

The rules presented below correspond to the rules in GRS; in the interests of brevity we have omitted some of the rules. The rule [onf] has no analogue in GRS and is included so that we can correctly model the use of the innf flag.

{_,0} given reduce(pack1)                                        [o+1]
    when {pack2,n,nil,_,_,_} ? pack1
    and  {+,m,nil,_,_,_} ? pack2
    with is_integer(m) and is_integer(n)
    → {_,0} then pack1!!rewrite(m+n,nil,nil,nf);

{_,0} given reduce(pack1)                                        [o+4]
    when {pack2,pack4,nil,_,_,_} ? pack1
    and  {+,pack3,nil,_,_,_} ? pack2
    and  {_,_,_,notnf,_,_} ? pack3
    and  {_,_,_,notnf,_,_} ? pack4
    → {_,2} then (pack3!!need(self), pack4!!need(self));

{_,0} given reduce(pack1)                                        [oP]
    when {pack2,pack3,nil,n,_,_} ? pack1
    and  {P,f,_,_,_,_} ? pack2
    with is_packet(pack3)
    → {pack1,0} then (pack3!!fire, pack1!!rewrite(f,pack3,nil,n)) + self!!reduce(pack1);

{_,0} given reduce(pack1)                                        [oY]
    when {Y,arg,nil,_,_,_} ? pack1
    → {pack1,0} then pack1!!rewrite(arg,pack1,nil,notnf) + self!!reduce(pack1);

{_,0} given reduce(pack1)                                        [o\1]
    when {pack2,arg1,nil,_,_,_} ? pack1
    and  {op,arg2,\ :: ann,_,_,_} ? pack2
    → {pack1,0} then pack1!!rewrite(op, new(Packet,{arg2,arg1,nil,notnf}), ann, notnf) + self!!reduce(pack1);

{_,0} given reduce(pack1)                                        [o\2]
    when {pack2,arg1,nil,_,_,_} ? pack1
    and  {op,l,\ :: ann,_,_,_} ? pack2
    → {pack1,0} then pack1!!rewrite(op,arg1,ann,notnf) + self!!reduce(pack1);

{_,0} given reduce(pack1)                                        [ot]
    when {pack2,_,nil,_,_,_} ? pack1
    and  {_,_,_,notnf,_,_} ? pack2
    → {pack1,1} then pack2!!need(self);

{_,0} given reduce(pack1)                                        [onf]
    when {op,arg,a :: x,_,_,_} ? pack1
    with a ≠ #
    → {pack1,0} then pack1!!rewrite(op,arg,a :: x,nf);

We have freely used predicates such as is_integer and is_packet to check an object's type.
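Rule [o+1] can be made concrete with a small Python sketch, assuming packets are simple (rator, rand) records with a normal-form flag; the in-place overwrite models the pack1!!rewrite(m+n,nil,nil,nf) send. All names are invented for the example.

```python
class P:
    def __init__(self, rator, rand):
        self.rator, self.rand, self.innf = rator, rand, False

def reduce_plus(pack1):
    """Fire rule [o+1] if pack1 matches its pattern: pack1 = (pack2, n)
    with pack2 = (+, m) and both m, n integer literals."""
    pack2 = pack1.rator
    if (isinstance(pack2, P) and pack2.rator == "+"
            and isinstance(pack2.rand, int) and isinstance(pack1.rand, int)):
        # pack1!!rewrite(m + n, nil, nil, nf)
        pack1.rator = pack2.rand + pack1.rand
        pack1.rand = None
        pack1.innf = True

expr = P(P("+", 2), 3)   # graph for the application (+ 2) 3
reduce_plus(expr)
assert expr.rator == 5 and expr.innf
```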

Although this rewrite system specifies a director-string reduction strategy, it would only require relatively minor modifications to define a strategy for any other graph-based language (e.g. supercombinators [18]).

5.1.2.2. Wakeup

A wakeup message is received whenever a packet that was activated by the agent is reduced to normal form. It has no parameters.

{_,c} given wakeup with c ≥ 2 → {_,c-1};
{pack,1} given wakeup → {pack,0} then self!!reduce(pack);
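As an executable gloss (illustrative names only), the agent's count behaves as a join counter: each wakeup decrements it, and the final one resumes reduction of the target packet.

```python
resumed = []

def wakeup(agent):
    """agent is a mutable [target_packet, count] pair."""
    if agent[1] >= 2:          # first rule: still waiting for more values
        agent[1] -= 1
    elif agent[1] == 1:        # second rule: last outstanding value arrived
        agent[1] = 0
        resumed.append(agent[0])   # models self!!reduce(pack)

a = ["pack", 2]
wakeup(a)       # count drops to 1, nothing resumes yet
wakeup(a)       # count reaches 0, reduction of "pack" resumes
assert a == ["pack", 0] and resumed == ["pack"]
```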

5.1.2.3. Unlabelled transitions

In the course of evaluating a program using the above rewrites, indirections will be introduced (e.g. in the second rule for +); these can be removed by adding the following unlabelled rules:

{pack1,0} when {pack2,a,ann,flag,_,_} ? pack1
    and {f,nil,nil,nf,_,_} ? pack2
    → {pack1,0} then pack1!!rewrite(f,a,ann,flag);

{pack1,0} when {f,pack2,ann,flag,_,_} ? pack1
    and {a,nil,nil,nf,_,_} ? pack2
    → {pack1,0} then pack1!!rewrite(f,a,ann,flag);


5.2. Correctness

We have to show that this specification is correct with respect to the preceding level, i.e. GRS. The statement of correctness is equivalent to that shown at the end of Section 3; we have to show that termination is preserved and that the results are equal. Rather than give a detailed correctness proof here, we will outline the approach.

First we need to demonstrate an isomorphism between the structures being manipulated by the two specifications. Graphs in the Paragon specification (PS) are represented by packet objects; these carry some extra information compared to graphs in GRS, but the additional items are initialised to appropriate null values when the packets are first loaded. Therefore, ignoring the additional structure, there is an obvious isomorphism between collections of packets in PS and the graph structures used in GRS.

Next we consider the reduction process. One obvious difference between PS and GRS is the distinction between agents and packets that appears in the former. Reduction is performed by agents, and the rules from the reduce method for agents are approximately a Paragon transliteration of the rewrite rules in GRS. The remaining rules in PS explicate the communications requirements implied in the control annotations of GRS. A detailed proof would proceed by induction over the forms of graphs; it is beyond the scope of this paper since it would require a detailed semantics for both Paragon and the graph rewriting notation.

5.3. Lower levels of specification

At the next lower level of the systems architecture, we introduce processing elements. Each packet will be associated with a particular processing element, and agents will execute on processing elements. Processing elements will be able to "steal" work from neighbouring elements when they have no work of their own. Once an agent has begun to execute it is "fixed" on its processor and cannot be stolen by a neighbour. To specify this level, we need to introduce a new class of processors and modify the two class definitions from Section 5.1.

5.3.1. Processors

Processors have the following type:

class processor ::= (set packet, set agent, set agent, set processor, free)
type free = free | locked

with the components playing the following roles:

Packets: the collection of packets resident on this processor
Waiting: the collection of agents currently waiting on this processor
Executed: the collection of agents executing (or which have been executed) on this processor
Neighbours: the list of neighbouring processing elements
free: a flag which controls the process of searching for new work

Processors respond to four messages: place, work?, move and no_work.

5.3.1.1. Place

This message is received from a packet which has created an agent to reduce itself. It has a single parameter which identifies the agent.

{_,A,_,_,_} given place(agent) → {_,A ∪ agent,_,_,free};

5.3.1.2. Work?

This message is received from another processor which is trying to steal work. It has a single parameter which identifies the processor that wants the work.

{_,A ∪ agent,_,_,_} given work?(proc) → {_,A,_,_,_}
    then (agent!!move(proc), proc!!place(agent));
{_,∅,_,_,_} given work?(proc) → {_,∅,_,_,_} then proc!!no_work;


5.3.1.3. Move

This message is received from an agent when it becomes active. It has a single parameter which identifies the agent.

{_,A ∪ agent',E,_,_} given move(agent) with agent = agent'
    → {_,A,E ∪ agent,_,_};

5.3.1.4. No_work

This message is received from a neighbouring processor which does not have any work to export in response to a work? message.

{_,_,_,n :: N,locked} given no_work → {_,_,_,N ++ [n],locked} then n!!work?(self);
{_,_,_,_,free} given no_work → {_,_,_,_,free};

The second rule allows for a processor being freed by one of its own packets after it has asked for work from a neighbour.

5.3.1.5. Unlabelled transitions

There is a single unlabelled transition which may fire at any time that the processor is idle, in order to find new work.

{_,∅,_,n :: N,free} → {_,_,_,N ++ [n],locked} then n!!work?(self);
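The idle-processor protocol above (lock, poll a neighbour with work?, rotate it to the back of the list, retry on no_work, unlock on place) can be sketched as a simple Python simulation. This is a hedged illustration, not the paper's semantics: message exchanges are collapsed into direct method calls, and all names are invented.

```python
class Proc:
    def __init__(self, waiting):
        self.waiting = list(waiting)   # agents waiting on this processor
        self.neighbours = []
        self.free = True

    def ask_for_work(self):
        """The unlabelled transition: lock, then poll neighbours in turn."""
        self.free = False                      # locked while searching
        for _ in range(len(self.neighbours)):
            n = self.neighbours.pop(0)
            self.neighbours.append(n)          # N ++ [n]: round-robin order
            if n.waiting:                      # n answers work? by exporting
                agent = n.waiting.pop(0)
                self.waiting.append(agent)     # n does proc!!place(agent)
                self.free = True
                return agent
            # otherwise n replies no_work and the next neighbour is asked
        return None

p, q, r = Proc([]), Proc([]), Proc(["a1"])
p.neighbours = [q, r]
assert p.ask_for_work() == "a1"        # q has nothing; work stolen from r
assert p.waiting == ["a1"] and r.waiting == [] and p.free
```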

5.3.2. Packets and agents

We need to make some minor changes to the two object classes defined in Section 5.1. Specifically, both packets and agents have an additional instance variable identifying the processor that they are allocated to:

class packet ::= (processor, rator, rand, string, boolean, boolean, list agent)
class agent ::= (activity, processor, packet, integer)
type activity = idle | exec

Agents also have an additional instance variable which indicates whether or not they have responded to a reduce message before.

The modified rules for packets are exemplified by:

{proc,_,_,_,notnf,inactive,p} given need(agent)
    → {_,_,_,_,notnf,active,agent :: p}
    then proc!!place(newagent) + newagent!!reduce(self)
    where newagent = new(Agent,{idle,proc,self,0});

{proc,_,_,_,notnf,inactive,p} given fire
    → {_,_,_,_,notnf,active,p}
    then proc!!place(newagent) + newagent!!reduce(self)
    where newagent = new(Agent,{idle,proc,self,0});

When an agent is spawned, it is placed on the packet's processor before reduction can begin. The other rules just require a modification to the tuples to reflect the new instance variable.

Two new labelled transitions are needed in the agent definition (modulo changing tuples to reflect the new state information); the first deals with the first reduce message and the second deals with the new message, move:

{idle,proc,_,_} given reduce(pack) → {exec,proc,pack,0}
    then self!!reduce(pack) + proc!!move(self);
{_,_,_,_} given move(proc) → {_,proc,_,_};

All of the other rules for the reduce message should be changed so that their left hand sides require the activity flag to be set to exec.


6. Refining the specification to a hardware description

Lower levels of the refinement process are under development and are only sketched here. While we have already introduced processors to model load-balancing, agents can still perform rewrites freely; we have not captured the requirement that each rewrite be executed on a physical hardware device. In a hardware implementation we will require that rewrites be scheduled for sequential execution within each processor. Thus, there can be at most one rewrite in progress per processor at any time.

We can achieve this by extending the processor method with a scheduling mechanism. Messages to agents for which a processor is responsible are kept in an invocation pool, which forms part of a processor's state. The processor selects messages from this agent-invocation pool one at a time, and forwards them to their target agent. The agent is modified to provide a message to the scheduler to signal its completion.
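The scheduling mechanism just described can be sketched as a small Python simulation, under stated assumptions: message delivery is modelled by direct method calls, the invocation pool is a FIFO queue, and a hypothetical done signal stands for the agent's completion message to the scheduler.

```python
from collections import deque

class Processor:
    def __init__(self):
        self.pool = deque()    # agent-invocation pool, part of processor state
        self.busy = False      # at most one method active at a time
        self.trace = []        # invocations actually forwarded to agents

    def send(self, agent, msg):
        """Queue a message for an agent this processor is responsible for."""
        self.pool.append((agent, msg))
        self._dispatch()

    def _dispatch(self):
        if not self.busy and self.pool:
            self.busy = True
            agent, msg = self.pool.popleft()
            self.trace.append((agent, msg))    # forward to the target agent

    def done(self):
        """The agent signals completion, releasing the next invocation."""
        self.busy = False
        self._dispatch()

p = Processor()
p.send("a1", "reduce")
p.send("a2", "reduce")                 # a2 must wait: one method per processor
assert p.trace == [("a1", "reduce")]
p.done()                               # completion releases a2's invocation
assert p.trace == [("a1", "reduce"), ("a2", "reduce")]
```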

This technique ensures that at most a single method is active for each processor at a time, and so could, in principle, be used to generate a hardware description directly. However, real architectures often multiplex the physical hardware between a limited number of virtual processors, to avoid idleness which might arise when a method involves a non-local communication or memory access. This is easily accommodated by modifying the processor to allow several agent invocations to be active at once.

We are currently using such a specification to construct a simulation of a simple parallel graph reduction machine written in the Occam programming language [12]. This constitutes a hardware description in the sense that the number of processes is fixed, and it could, in principle, be compiled automatically to a VLSI layout [17]. However, to make efficient use of VLSI technology a lower-level specification is required to detail the context-switching which allows each physical processor to be multiplexed between the virtual processors.

7. Conclusions

We have exhibited a simple notation which spans the description of parallel graph-rewriting computer architectures from the highest-level term-rewrite system specification down to the explicit distribution and scheduling of computation by physical processing elements. The notation integrates a graph-rewriting language with an object-oriented approach in which processes and message-passing are explicit. An operational semantics is given in a companion paper [6], and verification techniques are under development. Related work is reported by Kennaway and Sleep [14], and Schaefer and Schnoebelen [20].

We demonstrated the approach by describing the implementation of COBWEB, a simple parallel graph reduction machine. From a term-rewrite system we used standard techniques to derive a graph-rewriting implementation in which sharing and reduction order (including parallelism) are explicit. A message-passing implementation of this was given, and this was refined to include processor allocation and a simple work distribution strategy.

Further work will complete the integration of notations, including the term-rewrite system at one extreme and a hardware description language at the other. This will be supplemented by the development of the notation's theory to accommodate the refinement steps linking each level, such as firing, sharing, message-passing, work distribution and scheduling.

Acknowledgements

The work reported in this paper has benefited much from discussion with our colleagues in the Cobweb project. We would also like to thank the Alvey Directorate for their financial assistance and Alan Bagshaw in particular for his guidance and encouragement. Finally, we acknowledge the help and encouragement of our colleague Geoffrey Burn, formerly of GEC Hirst Research Centre.


References

[1] P. America, J. de Bakker, J.N. Kok and J. Rutten, Operational semantics of a parallel object-oriented language, Report CS-R8515, Centre for Mathematics and Computer Science, Amsterdam, The Netherlands, 1985.

[2] P. Anderson, C.L. Hankin, P.H.J. Kelly, P.E. Osmon and M.J. Shute, COBWEB-2: Structured specification of a wafer scale supercomputer, in PARLE, Vol. I, J.W. de Bakker, A.J. Nijman and P.C. Treleaven (eds.), LNCS 258, (Springer-Verlag, 1987) 51-67.

[3] P. Anderson, C.L. Hankin and P.H.J. Kelly, Parallel combinator reduction on a wafer, in IFIP 88: Network Information Processing Systems (North-Holland, Amsterdam) forthcoming.

[4] H.P. Barendregt, M.C.D.J. van Eeckelen and M.J. Plasmeijer, Specification of reduction strategies in term rewriting systems, reprint, Catholic University of Nijmegen, The Netherlands, 1986.

[5] H.P. Barendregt, M.C.D.J. van Eeckelen, J.R.W. Glauert, J.R. Kennaway, M.J. Plasmeijer and M.R. Sleep, Term graph rewriting, in PARLE, Vol. II, J.W. de Bakker, A.J. Nijman and P.C. Treleaven (eds.), LNCS 259 (Springer, Berlin, 1987) 159-176.

[6] D. Bolton, C.L. Hankin and P.H.J. Kelly, An operational semantics for Paragon: A design notation for parallel architecture, forthcoming.

[7] J. Darlington and M. Reeve, Alice - A multiprocessor reduction machine for the parallel evaluation of applicative languages, Proc. ACM Conf. Functional Languages and Computer Architecture (New Hampshire, 1981).

[8] E.W. Dijkstra, A mild variant of combinatory logic, EWD735, 1980.

[9] H.W. Glaser, C.L. Hankin and D.R. Till, Principles of Functional Programming (Prentice Hall, Englewood Cliffs, 1984).

[10] J.R.W. Glauert, J.R. Kennaway and M.R. Sleep, DACTL: A computational model and compiler target language based on graph reduction, Report SYS-C87-03, University of East Anglia, 1987.

[11] C.L. Hankin, G.L. Burn and S.L. Peyton Jones, A safe approach to parallel combinator reduction, Theoretical Comput. Sci. (March, 1988).

[12] Inmos Ltd., Occam 2 Reference Manual (Prentice-Hall International, Englewood Cliffs, 1988).

[13] P.H.J. Kelly, Functional Programming for Loosely-coupled Multiprocessors (Pitman/MIT Press, London, 1989).

[14] J.R. Kennaway and M.R. Sleep, Expressions and processes, Proc. ACM Symp. LISP and Functional Programming (1982).

[15] J.R. Kennaway and M.R. Sleep, Director strings as combinators, ACM Trans. Programming Languages and Systems.

[16] J.W. Klop, Term rewriting systems, notes for the 1985 Ustica workshop on reduction machines, to be published.

[17] D. May and C. Keane, Compiling Occam into Silicon, Technical Report 23, Inmos Ltd, 1000 Aztec West, Almondsbury, Bristol BS12 4SQ, UK.

[18] S.L. Peyton Jones, The Implementation of Functional Programming Languages (Prentice Hall, Englewood Cliffs, 1987).

[19] G.D. Plotkin, A structural approach to operational semantics, Technical Report FN-19, Aarhus, Denmark, 1981.

[20] P. Schaefer and P. Schnoebelen, Specification of a pipelined event driven simulator using FP2, in PARLE, Vol. I, J.W. de Bakker, A.J. Nijman and P.C. Treleaven (eds), LNCS 258 (Springer, Berlin, 1987).

[21] D.A. Turner, Miranda: A non-strict functional language with polymorphic types, in Functional Programming Languages and Computer Architecture, J.-P. Jouannaud (ed), LNCS 201 (Springer, Berlin, 1985).