A syntax-directed translation for the synthesis of delay-insensitive circuits

196 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 2, NO. 2, JUNE 1994

A Syntax-Directed Translation for the Synthesis of Delay-Insensitive Circuits

S. C. Leung and Hon F. Li

Abstract- A syntax-directed translation procedure for the synthesis of delay-insensitive circuits from graph-theoretic specifications is presented. No isochronic fork assumption is required for the correct operation of the synthesized circuits. The synthesized circuits are different from those obtained from Ebergen’s synthesis method [6]. In Ebergen’s circuits, the voltage levels of a set of wires are used to encode which input events are most recently received. Special circuit elements (the N-element or the RCEL element) and two-phase to four-phase converters are needed to change the voltage levels of the encoding wires when input events are received. In the circuits obtained from the method in this paper, the wires encoding which input events are most recently received are the outputs of the toggles. When input events are received, they are sent directly or via demultiplexers to the toggles to change the voltage levels at their outputs. Two-phase to four-phase converters are not needed. The synthesis method is compared with Ebergen’s synthesis method.

I. INTRODUCTION

A delay-insensitive (DI) circuit is a form of asynchronous circuit whose functional correctness is unaffected by the delays in its components and the delays in its interconnecting wires. It is commonly known as a self-timed circuit [16], and is the most robust form of self-timed circuits. Other forms of self-timed circuits require some delay assumptions, such as the use of equipotential regions, local clocks, or relative wire-delay assumptions. Speed-independent circuits are closely related to DI circuits in that they allow arbitrary component delays, but assume negligible wire delays. Research into self-timed circuits has been motivated by the problems encountered in large-scale synchronous design, such as the problem of distributing a global clock signal correctly as a result of clock skew. Work on asynchronous circuit design/synthesis has appeared in [1]-[4], [6], [7], [10], [12], [15], [17], [18].

Speed-independent circuits have been synthesized using Signal Transition Graphs (STG’s) [4], [12]. The STG is first modified to satisfy liveness and persistency requirements. Then the modified STG is transformed into a state graph and implemented as an asynchronous sequential circuit. Classical Boolean optimization can be used in translating the state graph into a circuit. However, it is sometimes necessary to change the original specification.

A procedure for translating programs into DI circuits was described by Ebergen [6]. In Ebergen’s circuits, the voltage levels of a set of wires are used to encode which input events are most recently received. Special circuit elements (the N-element or the RCEL element in [6]) and two-phase to four-phase converters are needed to change the voltage levels of the encoding wires when input events are received. In the circuits obtained from the method in this paper, the signals for encoding the most recent input events are the outputs of the toggles. When input events are received, they are sent directly or via demultiplexers to the toggles to change the voltage levels at their outputs. Two-phase to four-phase converters are not needed. A comparison of the methods will be presented in Section V.

Manuscript received December 11, 1992; revised August 10, 1993, November 22, 1993.

The authors are with the Department of Computer Science, Concordia University, Montreal, Quebec, Canada H3G 1M8.

IEEE Log Number 9400595.

Martin [11] showed that the class of DI circuits is very limited, but his result was only for a certain set of classical circuit elements such as AND, OR, SR latch, etc. In contrast, the method in this paper uses a different set of basic circuit elements to produce DI circuits. Similarly, Ebergen’s synthesis method achieved DI circuit implementations by using a set of basic circuit elements different from those considered in [11].

Other methods translate specifications based on CSP/occam into delay-insensitive/speed-independent circuits [1]-[3], [10]. Most have been based on syntax-directed translation into a set of building blocks. Methods which make use of isochronic forks are [3], [4], [10], [12], while methods which are delay-insensitive are [1], [2], [6].

The contribution of this paper is as follows. A syntax-directed translation procedure for the synthesis of delay-insensitive circuits from graph-theoretic specifications is presented. The translation procedure is proved correct. In the circuits obtained in this paper, two-phase to four-phase converters are not needed. The input to the synthesis method is a behavior machine [14]. In the synthesis of multiple-rule behavior machines, two rules, called the demultiplexing rule and the choice recording rule, are found to be useful. Finally, a comparison with Ebergen’s work is presented.

The synthesis procedure described in this paper is for control circuits. A separation is made between the data path and the control part of the circuit, with the control part being separately synthesized. Data paths can be attached to the control circuits using the bundled data convention, such as in [18].

The paper is organized as follows. The specification model in [14] is reviewed in Section II. This section describes the semantics of the behavior machine model which serves as the input to the synthesis method. A correctness definition for implementation is given in Section III. The correctness definition is necessary for judging the correctness of the translated circuits. The syntax-directed translation is described and proved correct in Section IV. Comparison with Ebergen’s method is presented in Section V. The paper concludes in Section VI.

1063-8210/94$04.00 © 1994 IEEE

II. SPECIFICATION MODEL

A DI hardware process (component) has input and output ports where signal transitions take place. An action of a process models a rising or falling signal transition at an input or output port of the process. Each occurrence of an action is called an event. No distinction is made between the semantics of rising and falling signal transitions. Initially, all the input and output ports are low. An execution of the process is described by a sequence of input/output events (i.e., a sequence of signal transitions at the input/output ports of the process). Given a process, the set of possible executions can be defined.

In each execution, there are necessary orderings as well as unnecessary orderings among the events. A necessary ordering between two events means that they must occur in a certain order. An unnecessary ordering between two events means that they can occur in either order or simultaneously. As a result, a binary relation can be defined between two executions: two executions are related if and only if they do not differ in the necessary orderings among the events in them. Clearly, the binary relation is an equivalence relation (i.e., it is reflexive, symmetric and transitive). This relation induces equivalence classes on the set of possible executions of a process. Each such equivalence class is called a behavior of the process. Such a behavior is represented by a partial order on the events, where the partial order represents the necessary ordering.
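As a minimal sketch of this equivalence, the following Python fragment enumerates the executions that respect a given necessary ordering. The event names a1, b1, and c1 (two inputs and one output of a C-element) are assumptions chosen for illustration, and simultaneous occurrences are not modeled.

```python
from itertools import permutations

def is_linearization(sequence, order):
    """True if `sequence` respects every (earlier, later) pair in `order`."""
    position = {event: index for index, event in enumerate(sequence)}
    return all(position[earlier] < position[later] for earlier, later in order)

def executions(events, order):
    """All interleavings of `events` that keep the necessary orderings."""
    return [seq for seq in permutations(events) if is_linearization(seq, order)]

# Hypothetical events: inputs a1, b1 and output c1 of a C-element.
events = ["a1", "b1", "c1"]
necessary = {("a1", "c1"), ("b1", "c1")}  # both inputs must precede the output
runs = executions(events, necessary)
print(runs)
```

The executions returned agree on every necessary ordering and differ only in the unnecessary ordering between a1 and b1, so they fall into the same equivalence class, i.e., the same behavior.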

The formal tool for such a representation is a pomset (partially ordered multiset) [13], [14], which is defined as follows. A labeled partial order is a 4-tuple (V, Σ, Γ, μ) consisting of 1) a finite set V of events, 2) an alphabet Σ of actions, 3) a partial order Γ ⊆ V × V, and 4) a labeling function μ : V → Σ assigning action labels to events. A pomset, written [V, Σ, Γ, μ], is the isomorphism class of a labeled partial order (V, Σ, Γ, μ). There is an arrow from event e to event f in pomset p = [V, Σ, Γ, μ] if (e, f) ∈ Γ. The partial order Γ is represented in transitively reduced form, i.e., if there are arrows from event e to event f and from event f to event g, then the arrow from event e to event g will not be represented explicitly.

A behavior of a process is represented by a pomset as follows. A labeled event in the pomset represents an event of the process. The partial order on the events of the pomset represents the necessary ordering on the events of the process. For example, Fig. 1(b) shows the behavior of the C-element in Fig. 1(a) when it is exercised to produce a rising and a falling transition at its output. Initially, all inputs and outputs are low. Output events are underlined, while input events are not. An arrow from one event to another means that the first must occur before the second. If the first is an input event and the second is an output event, then this is the ordering (represented by solid arrows) enforced by the process. The reverse, i.e., an arrow from an output event to an input event, is an ordering (represented by dashed arrows) enforced by the environment of the process (user of the hardware process). The C-element can produce a rising transition at its output after rising transitions at both a and b are received. Having observed the output event, the user can send in new events (falling transitions) at a and b, followed by the process (C-element) producing an event (a falling transition) at its output.

Fig. 1. A C-element and its behaviors. (a) A C-element. (b) Pomset example. (c) Complete behavior of the C-element.

Some terms are defined next. In Fig. 1(b), the second event of a or b should not be sent in by the environment before the first output event is produced. This is called the input safety requirement of the C-element. The C-element should not produce the first (second) output event until the first (second) pair of a and b is received. This is called the output safety requirement. When the first (second) pair of a and b is received, the C-element must eventually produce an event at its output. This is called the output progress requirement.

Given a behavior b = [V, Σ, Γ, μ], event e ∈ V is said to immediately precede event f ∈ V if (e, f) ∈ Γ. Event e is said to precede event f if e immediately precedes f or there exists an event g such that e immediately precedes g and g precedes f. The projection of b onto a set V′ (⊆ V) of events is the restriction of b to V′. Formally, the projection of b onto V′, written project(b, V′), is [V′, Σ, (Γ⁺ ∩ (V′ × V′))⁻, μ], where + and − denote transitive closure and transitive reduction respectively. A prefix (suffix) of a behavior is the projection of the behavior onto a set of events such that if e is in the set, then the events immediately preceding (following) e will also be in the set. A complete behavior of a process is a behavior of the process that is not a prefix of another behavior of the process. For example, the behavior in Fig. 1(b) is not a complete behavior as it is a prefix of the complete behavior, shown in Fig. 1(c), of the C-element. The specification of a process is the set of complete behaviors of the process. A determinate (nondeterminate) process is a process whose specification has only one (more than one) complete behavior.
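The projection operator can be sketched in Python as follows: take the transitive closure of the arrows, restrict it to the retained events, and reduce the result. The tuple representation of a behavior and the event names e1, e2, e3 are assumptions made for this illustration only.

```python
def transitive_closure(arrows):
    """Smallest transitive relation containing `arrows` (a set of pairs)."""
    closure = set(arrows)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

def transitive_reduction(closed):
    """Keep only the arrows of a closed strict order not implied by a two-step path."""
    events = {e for pair in closed for e in pair}
    return {(a, b) for (a, b) in closed
            if not any((a, m) in closed and (m, b) in closed for m in events)}

def project(events, arrows, labels, keep):
    """Projection of the behavior (events, arrows, labels) onto the events in `keep`."""
    keep = set(keep) & set(events)
    restricted = {(a, b) for (a, b) in transitive_closure(arrows)
                  if a in keep and b in keep}
    return keep, transitive_reduction(restricted), {e: labels[e] for e in keep}

# Hypothetical chain e1 -> e2 -> e3; dropping e2 must keep e1 before e3.
events = {"e1", "e2", "e3"}
arrows = {("e1", "e2"), ("e2", "e3")}
labels = {"e1": "a", "e2": "c", "e3": "a"}
kept, kept_arrows, kept_labels = project(events, arrows, labels, {"e1", "e3"})
print(kept_arrows)
```

Taking the closure before restricting is what preserves the ordering between e1 and e3 once the intermediate event e2 is removed.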

A given behavior b of a process is a prefix of some complete behaviors of the process. For each of these complete behaviors, a suffix can be defined as follows. Let B = [V, Σ, Γ, μ] be a complete behavior of which b is a prefix, and V′ ⊆ V. If the projection of B onto V′ is b (thus every event preceding an event in V′ is also in V′), then the suffix of b with respect to B will be the projection of B onto (V − V′). So, a behavior is associated with a set of suffixes (one suffix corresponds to one complete behavior of the process) that can follow it. A binary relation on the set of behaviors can be defined: two behaviors are related if and only if the sets of suffixes that can follow them are identical. This binary relation is obviously an equivalence relation and induces equivalence classes on the set of behaviors of the process. Each equivalence class is called a state of the process. For each distinct state, the set of suffixes that can follow is distinct.

A. Delay-Insensitivity Axioms

A pomset represents a DI behavior, a behavior of a DI process, if the following axioms, equivalent to those by Udding [19], are satisfied [14].

1) No auto-concurrency. In the behavior, all occurrences of an input/output action must be totally ordered. Thus a cannot happen concurrently with another a.

2) Alternating input/output. Input events can only immediately precede output events and output events can only immediately precede input events. Thus, in each (transitively reduced) pomset, a → b, where a and b are both input or both output events, is forbidden.

3) No cross-disabling. Suppose a process is specified by the set S of complete behaviors. For any b1, b2 ∈ S, let α be the maximal common prefix of b1 and b2, i.e., any prefix that is common to b1 and b2 is a prefix of α. If input action a is allowed by α according to b1 and output action g is allowed by α according to b2, then there exists a behavior b3 ∈ S such that α is a prefix of b3, and a and g are allowed by α according to b3.

All behaviors in this paper are assumed to satisfy axioms (1)-(3).
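Axioms 1) and 2) concern a single pomset and can be checked mechanically. The Python sketch below is one possible check, assuming a pomset given as a label map plus a set of transitively reduced arrows; the C-element event names (with output action c) are hypothetical. Axiom 3) involves the whole specification and is not covered here.

```python
def precedes(arrows):
    """Transitive closure of the immediate-precedence arrows."""
    closure = set(arrows)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra

def no_auto_concurrency(labels, arrows):
    """Axiom 1: any two events with the same action label are ordered."""
    before = precedes(arrows)
    events = sorted(labels)
    return all((e, f) in before or (f, e) in before
               for i, e in enumerate(events) for f in events[i + 1:]
               if labels[e] == labels[f])

def alternating_io(labels, arrows, inputs):
    """Axiom 2: every reduced arrow joins an input event and an output event."""
    return all((labels[e] in inputs) != (labels[f] in inputs) for e, f in arrows)

# Hypothetical two-cycle C-element behavior: inputs a, b and output c.
labels = {"a1": "a", "b1": "b", "c1": "c", "a2": "a", "b2": "b", "c2": "c"}
arrows = {("a1", "c1"), ("b1", "c1"), ("c1", "a2"), ("c1", "b2"),
          ("a2", "c2"), ("b2", "c2")}
ok = no_auto_concurrency(labels, arrows) and alternating_io(labels, arrows, {"a", "b"})
print(ok)
```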

B. Behavior Machine

To finitely represent the set of infinite complete behaviors of a process, a pomset generator, called behavior machine, is used. The behavior machine of a process P has a finite number of named states, one of which is the initial state. Each named state is a state of P (but not vice versa) and is represented by a set of slots. There are transitions, called rules, between pairs of named states. Thus, each rule has a named state preceding it and a named state following it. Each rule is a finite pomset denoting the behavior of P when there is a transition from the state preceding the rule to the state following the rule. A behavior of P is generated by concatenating rules of the behavior machine of P, starting from a rule following the initial state. A rule r can be concatenated to a behavior b if the state, say s, following the behavior b is the same as the state preceding the rule r. The set of slots representing state s is for expressing causalities between events from the behavior b and events from the rule r. If event e in b precedes a slot L and event f in r is preceded by slot L, then, after concatenation, event e will precede event f. Fig. 2 shows a behavior machine that has a named state, denoted by the set of slots {(0), (1)}, and four rules, rules 1 to 4. {(0), (1)} is the initial state.

In the special case of the behavior machine of a process consisting of only one rule, the rule models the cyclic behavior of the (determinate) process. The infinite behavior is generated by repeated concatenation of the rule with itself. That rule is called a recurrent pomset. A recurrent pomset is similar to a STG [4] without choice in that they both express the cyclic behavior of a process. However, it is different from a STG in that a recurrent pomset is subjected to the delay-insensitivity axioms. A STG does not have to satisfy the delay-insensitivity axioms, e.g., the specification of the self-timed controller of the A/D converter in [5]. Moreover, for a recurrent pomset, the directions of signal transitions are not required to be stated explicitly. Thus, recurrent pomsets have simpler representations. Fig. 3 shows a process specified by a recurrent pomset and by a STG.

Fig. 3. A recurrent pomset and a STG that specify the same process. (a) A recurrent pomset. (b) A STG.

For a nondeterminate process, the behavior machine representation may have more than one rule emanating from a named state. Such a named state is called a choice state. A rule ending at a choice state can concatenate with any one of the rules following the choice state. This property ensures that more than one behavior can be generated by the machine.

Variations of behaviors are limited to the conflicts between enabled input actions only. An input free choice is the selection of one out of several nonempty sets of input actions by the environment. Variations of behaviors due to the arbitration of concurrently enabled output actions can be converted into a determinate process and an arbiter process. Details can be found in [SI. When an input free choice is enabled, the environment flips a coin to decide which set of input actions is selected. Then, it sends in input actions in the selected set. To make the synthesis easier to understand, it is required that any two of the sets have empty intersection. Suppose a behavior α enables two actions a and b that are in conflict. The class of choice behaviors is further restricted so that the events in α immediately preceding a are identical to the events in α immediately preceding b.

A choice state is a state at which every input action enabled is in conflict with some enabled input action. A choice state is a named state of the behavior machine to be constructed. From the set of behaviors, it is possible to identify which input actions are in conflict with other input actions at a choice state. Suppose at a choice state, actions a and b are in conflict, and there is a rule r ending in that choice state. Rule r contains the events that immediately precede a or b. A slot L is created for the choice state so that there are dashed arrows from the events in r immediately preceding a to L. In the rule containing a, there is a dashed arrow from slot L to a. These are the conventions for creating a slot at a choice state. The semantics of a slot is that the environment is allowed to do a coin-flipping action. When all the events immediately preceding a slot L have occurred, an input free choice is enabled. The environment is allowed to make a decision to choose one of the several enabled sets of input actions. When the named states of a behavior machine are identified, the transition rules between pairs of named states can easily be derived.

A nondeterminate behavior machine is an atomic choice behavior machine if its named choice states are all atomic (necessarily reached), else it is a concurrent choice behavior machine. In a concurrent choice machine, a choice state consists of multiple choice slots, each enabling an input-free choice. The behavior machine for the demultiplexer in Fig. 4 is an atomic choice machine, while the behavior machine in Fig. 2 is a concurrent choice machine.

Fig. 4. The set of basic circuit elements: 1) wire, 2) inverter, 3) fork, 4) C-element, 5) toggle, 6) XOR (2 rules), 7) demultiplexer (6 rules).

More examples are shown in Fig. 4. They are the behavior machines of the basic circuit elements used in the synthesis in this paper. Initially, all the inputs and outputs of a circuit element are low. The C-element acts as a join, where there is a transition at the output of the C-element only when both of its inputs have received transitions. The toggle routes input signal transitions to the outputs alternately. No isochronic fork assumption is required for the fork element. The exclusive-or (XOR) is a nondeterminate process. It acts as a merge where the environment is allowed to send an event at one of its input ports after initialization and after it produces an output event. Concurrent input and output events are not allowed. The behavior machine consists of two rules r1 and r2. For concise representation, a square black dot is used to denote an XOR.

The 2-channel demultiplexer works like this: input s switches the state of the circuit element to state {(0)} (acknowledged by t), and similarly input u switches the state of the circuit element to state {(1)} (acknowledged by a corresponding output event). Once the element has been switched to either state {(0)} or state {(1)}, an event at input a leads to the production of an event at the output associated with that state. Input at s or u is allowed only if no pending output is enabled in the circuit element. At the initial state (state {(0)}), an event at input a leads to the production of an event at the output associated with state {(0)}. This is indicated by a dot at that output channel.
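The event-level behavior of some of these elements can be illustrated with a small Python sketch. It is not derived from the behavior machines in Fig. 4: all port names (a, b, c, y0, y1, s, u, ack_s, ack_u, x, y) are hypothetical, the toggle is assumed to start with its first output, and an event stands for a signal transition of either direction.

```python
class CElement:
    """Join: emits one output event once both inputs have received an event."""
    def __init__(self):
        self.received = set()

    def event(self, port):
        if port in self.received:
            # A second event on the same input before the output is produced.
            raise RuntimeError("input safety violated on port " + port)
        self.received.add(port)
        if self.received == {"a", "b"}:
            self.received.clear()
            return "c"
        return None

class Toggle:
    """Routes successive input events to its two outputs alternately."""
    def __init__(self):
        self.turn = 0  # assumption: the first event goes to output y0

    def event(self):
        output = ("y0", "y1")[self.turn]
        self.turn = 1 - self.turn
        return output

class Demultiplexer:
    """Two-channel demultiplexer: s and u select the state, a is routed by it."""
    def __init__(self):
        self.state = 0  # initial state {(0)}

    def event(self, port):
        if port == "s":
            self.state = 0
            return "ack_s"
        if port == "u":
            self.state = 1
            return "ack_u"
        if port == "a":
            return ("x", "y")[self.state]
        raise ValueError("unknown port " + port)

join, toggle, demux = CElement(), Toggle(), Demultiplexer()
join_trace = [join.event(p) for p in ("a", "b", "b", "a")]
toggle_trace = [toggle.event() for _ in range(4)]
demux_trace = [demux.event(p) for p in ("a", "u", "a", "s", "a")]
print(join_trace, toggle_trace, demux_trace)
```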

C. Well-Behavedness of Behavior Machine

Besides the delay-insensitivity axioms, two additional well-behavedness axioms are required for the behavior machines to be synthesized. The axioms are stated below.

(4) Finiteness axiom. Suppose a rule r is instantiated twice in the generation of a behavior b of a process. There are two events e1 and e2 in b that are instances of the same event e in r. If f1 is an event in b that is immediately preceded by e1, then f1 must precede e2. The finiteness axiom ensures that the given behavior machine has some property so that the synthesis method yields a correct circuit.

(5) Well-behavedness of choice. If a named choice state has two or more choice slots, then all combinations of choices following each slot must be included in the behavior machine. In other words, the choice made in one slot cannot disable any choice that can be made in a concurrent choice slot. For example, in Fig. 2, at the named choice state {(0), (1)}, slot (0) enables a choice between {a} and {b}, and slot (1) enables a choice between {c} and {d}. Thus, there is a rule following the choice state for each of the combinations {a, c}, {a, d}, {b, c}, and {b, d}.

All the behavior machines used in this paper are assumed to satisfy all delay-insensitivity axioms and well-behavedness axioms.
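The combinations required by axiom (5) are the Cartesian product of the choices offered by the slots. A minimal Python sketch, using the slots of the Fig. 2 example and a dictionary representation that is an assumption of this illustration:

```python
from itertools import product

def required_rules(slots):
    """One combined input set per way of picking a choice in every slot."""
    return [set().union(*combo) for combo in product(*slots.values())]

def missing_rules(slots, rules):
    """Combinations demanded by axiom (5) that `rules` does not provide."""
    return [combo for combo in required_rules(slots) if combo not in rules]

# Slots of the named choice state {(0), (1)} in the Fig. 2 example.
slots = {"(0)": [{"a"}, {"b"}], "(1)": [{"c"}, {"d"}]}
needed = required_rules(slots)
print(needed)
```

A machine that offered only the rules for {a, c} and {b, d} would fail the check, because the choice in slot (0) would then constrain the choice in slot (1).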

III. CORRECTNESS OF IMPLEMENTATION

The correctness criteria for an implementation [14] are presented. A behavior machine specifies the safety and progress requirements of a process. A network of processes implements a process P if and only if the safety and progress requirements of P are "satisfied". A formalization of the correctness definition in terms of the pomset model is given below.

Given two behaviors p and q, an event e of p corresponds to an event f of q if (1) the events have the same action label, and (2) the number of events of the same action label preceding e in p is equal to the number of events of the same action label preceding f in q.
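The correspondence test can be sketched in Python as below. Each behavior is assumed to be given as a label map together with an already transitively closed precedence relation; the event names are hypothetical.

```python
def same_label_predecessors(event, labels, precedes):
    """Number of events with the same action label that precede `event`."""
    return sum(1 for other in labels
               if labels[other] == labels[event] and (other, event) in precedes)

def corresponds(e, p, f, q):
    """True if event e of behavior p corresponds to event f of behavior q."""
    p_labels, p_precedes = p
    q_labels, q_precedes = q
    return (p_labels[e] == q_labels[f]
            and same_label_predecessors(e, p_labels, p_precedes)
            == same_label_predecessors(f, q_labels, q_precedes))

# Hypothetical behaviors; the precedence relations are transitively closed.
p = ({"e1": "a", "e2": "c", "e3": "a"},
     {("e1", "e2"), ("e2", "e3"), ("e1", "e3")})
q = ({"f1": "a", "f2": "a"}, {("f1", "f2")})
second_a_matches = corresponds("e3", p, "f2", q)
print(second_a_matches)
```

Because axiom 1) totally orders the occurrences of an action, the count of same-label predecessors is simply the occurrence index of the event, so the second a in p is matched with the second a in q.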

A closed system is formed by a set of processes if every output port of a process is in direct contact with an input port of some other process, every input port of a process is in direct contact with an output port of some other process, and ports in direct contact are of the same name.

A system behavior of a closed system formed by processes P0, ..., Pn is the behaviors of P0, ..., Pn (one behavior from each process) augmented with wavy arrows from output events to corresponding input events such that 1) every input (output) event in one behavior has a corresponding output (input) event in some behavior and 2) if there are cycles in the augmented behavior, then the events in the cycles and all the events preceded by them will be removed. Intuitively, a system behavior of a closed system is a maximal run of the closed system if the processes in the closed system satisfy all safety and progress requirements. Events deleted in 2) are those that cannot occur because of the causality constraints imposed.

The mirror of a process P, written mP, is a process which has a specification given by changing every input (output) event of every complete behavior in the specification of P to an output (input) event of the same label, and every dashed (solid) arrow to a solid (dashed) arrow. The mirror of a process can be regarded as a usage specification of the process.
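Mirroring a single behavior amounts to two swaps, as the Python sketch below shows. The representation (an event map and an arrow-style map) and the event identifiers are assumptions of this illustration.

```python
def mirror(behavior):
    """Swap input/output on every event and solid/dashed on every arrow."""
    events, arrows = behavior
    flip_kind = {"input": "output", "output": "input"}
    flip_style = {"solid": "dashed", "dashed": "solid"}
    mirrored_events = {e: (label, flip_kind[kind])
                       for e, (label, kind) in events.items()}
    mirrored_arrows = {pair: flip_style[style] for pair, style in arrows.items()}
    return mirrored_events, mirrored_arrows

# Hypothetical behavior of P: input a is followed by output x (solid arrow).
P = ({"e_a": ("a", "input"), "e_x": ("x", "output")},
     {("e_a", "e_x"): "solid"})
mP = mirror(P)
print(mP)
```

Applying the operation twice returns the original behavior, which is consistent with reading mP as the environment's view of P.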

A network of processes P1, ..., Pn implements a process P if and only if P1, ..., Pn and mP form a closed system and the following correctness properties hold: 1) in every system behavior of the closed system formed by P1, ..., Pn, and mP, the set of causalities represented by the dashed arrows is a subset of the set of causalities derivable from the solid arrows and the wavy arrows (all component input safety requirements are satisfied under the usage of P), and 2) the projections of the system behaviors onto the events of mP are the complete behaviors of mP (progress requirements of P are satisfied).

Fig. 5. Correctness of implementation. (a) A finite poset. (b) Two component processes P1 and P2. (c) Closed system of mP, P1, and P2. (d) A run of the closed system (system behavior). Label in figure: causality introduced by port-port contact.

Consider the example specification (P) shown in Fig. 5(a), involving a finite poset and a set of component processes, P1 (a fork) and P2 (a C-element) shown in Fig. 5(b). The behavior of the environment that is expected to use P can be constructed by changing input actions to output actions and vice versa in P and thus obtaining its mirror process mP. Mirror mP can be interpreted as the usage specification of P. P1, P2, and mP can now be hooked up to form a closed system: input a to P1 comes from output a in mP, output w of P1 goes to input w of P2, etc., as shown in Fig. 5(c). The processes forming the closed system are initialized and allowed to interact. Correctness is affirmed by requiring that 1) there is no safety violation in any of the processes: whenever an output event is produced by a process, the receiver process is already enabled to receive it, and 2) every input event of mP (thus output event required by P) will be produced by some process. The system behavior of mP, P1, and P2 is shown in Fig. 5(d). The causalities represented by dashed arrows are enforced by solid and wavy arrows, as indicated in the figure. Moreover, the projection of the system behavior onto the events of mP gives mP. Thus, it can be concluded that the network of P1 and P2 implements P.

Semantically, a wavy arrow from an output event to an input event of the same action label denotes the causality from an event at an output port to a corresponding event at an input port that is in direct contact with the output port.

Intuitively, whether a network N of processes implements a process P is checked by using the mirror of P as a driver to drive N and allowing the closed system (consisting of N and mP) to run. Validity of condition 1) means that the input safety requirements of every process in the closed system are not violated. That the input safety requirements in N are not violated amounts to the satisfaction of the input safety requirements of P. That the input safety requirements of mP are not violated amounts to the satisfaction of the output safety requirements of P. Validity of condition 2) means that mP is allowed to proceed as far as specified, which amounts to the satisfaction of the output progress requirements of P.

The following lemmas follow from the definitions. Proofs of the lemmas can be found in [9]. Lemma 1 states that if a process is implemented by a network, and each component Pi in the network is implemented by Qi, then the process will be implemented by the network consisting of Qi. This lemma is similar to Ebergen’s Substitution Theorem [6]. Lemma 2 states that if a system behavior satisfies correctness property 1) in Section III, and its events can be partitioned into (n + 1) sets (processes) satisfying certain properties, then a network of n processes can be identified to implement the mirror of the projection of the system behavior onto the remaining process. Use of the lemma is explained in the beginning of Section IV.

Lemma 1: If the network of processes P1, ..., Pn implements P, and Qi implements Pi (1 ≤ i ≤ n), then the network of processes Q1, ..., Qn will implement P.

Lemma 2: Given a pomset B with wavy arrows from every output event to its corresponding input event, and in B, the set of causalities represented by dashed arrows is a subset of the set of causalities represented by solid and wavy arrows, if the events in B can be partitioned into (n + 1) sets V0, ..., Vn such that (1) every wavy arrow connects two events from two different sets, (2) every solid arrow connects two events from the same set, (3) all the input events of the same action label belong to one of the sets, and (4) all the output events of the same action label belong to one of the sets, then the network of processes {project(B, Vi) : 1 ≤ i ≤ n} implements the process of which project(B, V0) is its mirror.
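The hypotheses of Lemma 2 can be checked mechanically on a small example. The Python sketch below tests the causality condition and the four partition conditions; the arrow triples, the block map, and the event names are assumptions made for illustration and are not taken from the proofs in [9].

```python
def reachable(start, goal, arrows):
    """True if `goal` is reachable from `start` along one or more (src, dst) arrows."""
    frontier, seen = [start], set()
    while frontier:
        node = frontier.pop()
        for src, dst in arrows:
            if src == node and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return goal in seen

def dashed_are_enforced(arrows):
    """Every dashed causality is derivable from solid and wavy arrows."""
    strong = [(s, d) for s, d, style in arrows if style in ("solid", "wavy")]
    return all(reachable(s, d, strong) for s, d, style in arrows if style == "dashed")

def partition_is_valid(events, arrows, block):
    """Conditions (1)-(4) of Lemma 2; `block` maps each event to its set index."""
    wavy_cross = all(block[s] != block[d] for s, d, style in arrows if style == "wavy")
    solid_within = all(block[s] == block[d] for s, d, style in arrows if style == "solid")
    homes = {}
    for e, (label, kind) in events.items():
        homes.setdefault((label, kind), set()).add(block[e])
    return wavy_cross and solid_within and all(len(b) == 1 for b in homes.values())

# Hypothetical system behavior: mP (set 0) drives a wire-like process (set 1).
events = {"m_a": ("a", "output"), "p_a": ("a", "input"),
          "p_x": ("x", "output"), "m_x": ("x", "input")}
arrows = [("m_a", "p_a", "wavy"), ("p_a", "p_x", "solid"),
          ("p_x", "m_x", "wavy"), ("m_a", "m_x", "dashed")]
block = {"m_a": 0, "m_x": 0, "p_a": 1, "p_x": 1}
hypotheses_hold = dashed_are_enforced(arrows) and partition_is_valid(events, arrows, block)
print(hypotheses_hold)
```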

IV. SYNTAX-DIRECTED TRANSLATION

The main idea of the syntax-directed translation is to apply a series of correctness-preserving transformations to the system behavior of the closed system formed by the given process P and mP. Thus, the transformed system behavior B satisfies the correctness criteria. From B, (n + 1) sets of events V0, ..., Vn, which satisfy the four conditions stated in Lemma 2, are identified. V0 contains the events in mP. Using Lemma 2, it can be concluded that P is implemented by the network of processes {project(B, Vi) : 1 ≤ i ≤ n}. Each of the processes in the network is, in turn, implemented by one of the basic circuit elements. By Lemma 1, the synthesized circuit implements P. Correctness of the synthesis procedures is mostly concerned with proving that the transformations are correctness-preserving, and with the identification of the sets of events V0, ..., Vn.

A. Syntax-Directed Translation for a Finite Poset

The problem of the synthesis of a finite poset is to find a network of basic circuit elements that implements the finite poset.

Some terms are defined. An N-structure is a pomset containing two input events e1 and e2 and two output events such that 1) e1 immediately precedes the first output event, 2) e1 immediately precedes the second output event, and 3) e2 immediately precedes the second output event. Fig. 5(a) is an N-structure. A basic poset piece q of a pomset p is a connected subgraph of the graph representing p such that 1) events in q are connected by solid arrows only, and 2) no event in q is connected by a solid arrow to an event in p but not in q. Fig. 6(c) contains three basic poset pieces, namely { a , g , z }, { b J , y L and {.,d,E}.
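Detecting N-structures amounts to finding an input event with two output successors, one of which is shared with a second input event. A minimal Python sketch follows, reading the three conditions literally; the event names a, b, x, y are hypothetical.

```python
def find_n_structures(arrows, inputs):
    """Return tuples (e1, e2, first_output, second_output) forming N-structures.

    `arrows` holds immediate-precedence pairs; `inputs` is the set of input events.
    """
    successors = {}
    for src, dst in arrows:
        if src in inputs and dst not in inputs:
            successors.setdefault(src, set()).add(dst)
    found = []
    for e1 in sorted(successors):
        for e2 in sorted(successors):
            if e1 == e2:
                continue
            # The second output event is preceded by both e1 and e2.
            for shared in sorted(successors[e1] & successors[e2]):
                # The first output event is another successor of e1.
                for other in sorted(successors[e1] - {shared}):
                    found.append((e1, e2, other, shared))
    return found

# Hypothetical poset: a precedes x and y, b precedes y.
n_structures = find_n_structures({("a", "x"), ("a", "y"), ("b", "y")}, {"a", "b"})
print(n_structures)
```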

Fig. 6. Synthesis of finite poset. (a) Transformation of N-structure. (b) A finite poset. (c) The transformed poset. (d) Syntax-directed translation circuit for the poset in (b).

In the translation of a finite poset P, a set of transformations (behavior refinement) is applied to the system behavior of the closed system formed by P and mP to get a system behavior of P1, ..., Pn and mP that satisfies the correctness properties stated in Section III. The network of processes P1, ..., Pn thus implements P. Each of the processes P1, ..., Pn is implemented by a wire, an inverter, a wire-fork, or a C-element. The network of circuit elements is the synthesized circuit.

The synthesis procedure is given below. The behavior of mP remains unchanged throughout the procedure. The N-structures and basic poset pieces referred to in the procedure do not include events from mP.

Construct the system behavior B of P and mP;
While (there is an N-structure in B)
    Apply the transformation in Fig. 6(a) to the N-structure;
For every basic poset piece G of B
    Case G is:
        an output event: instantiate an inverter;
        an arrow: instantiate a wire;
        a join: instantiate a C-element;
        a fork: instantiate a wire-fork;
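The second loop of the procedure can be sketched in Python as follows, assuming the N-structures have already been removed. Basic poset pieces are taken to be the connected components of the solid arrows, and each piece is classified by its numbers of input and output events. The event names and the example poset are hypothetical and do not reproduce Fig. 6.

```python
def basic_poset_pieces(events, solid_arrows):
    """Connected components of the graph whose edges are the solid arrows."""
    neighbours = {e: set() for e in events}
    for src, dst in solid_arrows:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    pieces, seen = [], set()
    for start in sorted(events):
        if start in seen:
            continue
        piece, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in piece:
                continue
            piece.add(node)
            stack.extend(neighbours[node] - piece)
        seen |= piece
        pieces.append(piece)
    return pieces

def element_for(piece, inputs):
    """Map a basic poset piece to a circuit element, following the case analysis."""
    ins = [e for e in piece if e in inputs]
    outs = [e for e in piece if e not in inputs]
    if not ins and len(outs) == 1:
        return "inverter"      # a lone output event
    if len(ins) == 1 and len(outs) == 1:
        return "wire"          # an arrow
    if len(ins) > 1 and len(outs) == 1:
        return "C-element"     # a join
    if len(ins) == 1 and len(outs) > 1:
        return "wire-fork"     # a fork
    raise ValueError("not a basic piece of an N-structure-free poset")

# Hypothetical transformed poset; u_out and u_in are linked by a wavy arrow only.
events = {"a", "b", "c", "d", "u_in", "u_out", "x", "y", "z"}
inputs = {"a", "b", "c", "u_in"}
solid = {("a", "x"), ("a", "u_out"), ("b", "y"), ("u_in", "y"), ("c", "d")}
elements = [element_for(p, inputs) for p in basic_poset_pieces(events, solid)]
print(elements)
```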

Fig. 6 demonstrates the synthesis procedure. Fig. 6(a) shows the transformation applied to the N-structures in the system behavior of P and mP. Fig. 6(b) shows a finite poset example which is transformed into the poset shown in Fig. 6(c). The basic poset pieces in Fig. 6(c) are then mapped into circuit elements which are interconnected to form the synthesized circuit shown in Fig. 6(d).

The correctness of the procedure is established through a series of lemmas.

Lemma 3: The correctness properties (stated in Section I11 hold before the while loop.

Proof: From the definition of mP, it follows that for every dashed arrow ( g , x) in SrnP, there is a solid arrow (0,. z) in P. From the definition of system behavior, in the system behavior B formed by P and m P , there are wavy arrows (a, a ) and (g3 x). Therefore, in B, the causality represented by dashed arrow ( g . : E ) is also represented by the wavy arrow (a. a ) , the solid arrow ( U . g), and the wavy arrow (g. :I;). This

instantiate an inverter;

is similar for every dashed arrow from P. Thus, in B, the set of causalities represented by dashed arrows is a subset of the set of causalities represented by solid and wavy arrows. If two events e and f in rnP are connected (not connected) by a solid or dashed arrow, then, in B, e and f' are connected (not connected) by a solid arrow or via events from P. Thus, the projection of the system behavior onto the events of m P still

The correctness properties are invariant with respect to the transformation in Fig. 6(a).

It can be checked that after the transformation, the causalities from every input event to every output event are still enforced, and no dashed arrows are added to the system behavior. Hence, the set of causalities represented by dashed arrows is still a subset of the set of causalities represented by solid and wavy arrows. On the other hand, the transformation does not add additional causalities between the events in the N-structure and other events originally in the system behavior. Hence, the projection of the transformed system behavior onto

H The while loop is executed a finite number of

times. Since there are only a finite number of events in

I?, there can only be a finite number of N-structures. An N - structure in a pomset is induced by an arrow, say ( a . y), if there are already arrows (a . g) and (h . g) in the pomset. The number of N-structures in B is monotonzally reduced by applying the transformation in Fig. 6(a), because the introduction of events U and 3 eliminates the N-structure induced by ( a , y) without introducing a new one. Consequently, the transformation is only applied to the system behavior a finite number of times. H

When the while loop is exited, there are no

The condition for entering the while loop is that there exists an N-structure in B. When the loop is exited, the condition is no longer valid. Thus, there are no N-structures in B. H

Lemma 7: In the transformed system behavior, each of the basic poset pieces that do not include events in SrnP is either an output event, an arrow from an input event to an output event, a fork, or a join.

Suppose otherwise. Then, there is a basic poset piece G that has r n ( > 2) input events and n(2 2) output events. Let el and e2 (e3 and ( J ~ ) be two of the input (output) events. Since G is connected, there exist solid arrows from el to g3, c1 to g,, and e2 to e,. Thus, there is an N-structure induced by the arrow (el . g , ) , which contradicts Lemma 6. H

For any finite poset P that satisfies delay- insensitivity axioms (1) and (2), the synthesis procedure gives a network of basic circuit elements that implements P.

By Lemmas 3, 4, and 5 , upon exit of the while loop, a transformed system behavior that satisfies the correctness properties is obtained. The events in the transformed system behavior can be partitioned into several sets of events that satisfy the conditions stated in Lemma 2. One of the sets contains the events in mP. The other sets contain the events in the basic poset pieces that do not include events from m P .

gives rnP. H Lemma 4:

Proof

events of ,rnP still gives mP. Lemma 5:

Proof

Lemma 6: ; N-structures in B.

Proof

Proof

Theorem I :

Proof


By Lemma 2, P is implemented by a network of,processes, each of which is specified by a basic poset piece. By Lemma 7, each of the basic poset pieces is of some form that can be implemented by an inverter, a wire, a wire-fork, or a C- element. By Lemma 1, the synthesized circuit implements P.
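The two graph checks used by the procedure, detecting an N-structure and mapping an N-free basic poset piece to a circuit element, can be illustrated with a minimal Python sketch. It assumes that a piece is given as a list of solid arrows, each running from an input event to an output event, and it uses the three-arrow pattern (a, x), (a, y), (b, y) from the proof of Lemma 5 as the N-structure test. The function names are hypothetical, and the transformation of Fig. 6(a) itself is not modeled.

```python
from collections import defaultdict


def find_n_structures(arrows):
    """Return every (a, x, y, b) such that arrows a->x, a->y and b->y exist.

    `arrows` is a list of (source, target) pairs. The arrow (a, y) is the
    one said to induce the N-structure.
    """
    succ = defaultdict(set)
    pred = defaultdict(set)
    for s, t in arrows:
        succ[s].add(t)
        pred[t].add(s)
    found = []
    for a in sorted(succ):
        for y in sorted(succ[a]):
            for b in sorted(pred[y] - {a}):
                for x in sorted(succ[a] - {y}):
                    found.append((a, x, y, b))
    return found


def classify_piece(arrows):
    """Map an N-free basic poset piece to the element named in the procedure.

    Assumption: every arrow goes from an input event to an output event, and
    an empty arrow list stands for a piece that is a single output event.
    """
    if not arrows:
        return "inverter"
    sources = {s for s, _ in arrows}
    targets = {t for _, t in arrows}
    if len(sources) == 1 and len(targets) == 1:
        return "wire"
    if len(sources) == 1:
        return "wire-fork"
    if len(targets) == 1:
        return "C-element"
    raise ValueError("piece still contains an N-structure")
```

For example, the piece with arrows a->x, a->y, b->y is reported as one N-structure and is rejected by `classify_piece`, in line with Lemma 7.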

B. Syntax-Directed Translation for a Finite Pomset

The problem of the synthesis of a finite pomset is to find a network of basic circuit elements that implements the finite pomset. The idea is to convert the finite pomset into a finite poset, which can then be synthesized using the method described in the previous section. Two basic circuit elements, toggle and XOR, are used for the conversion. A toggle of n output ports produces an event at the ith output after the (nk + i)th input event is received, where i ranges over the n outputs and k is a positive integer. It serves as a pomset to poset transformer for events of an input action. An XOR produces an output event when it receives an event at any one of its input ports. Note that due to the zero autoconcurrency axiom, at any one instant, only one of the inputs is receiving a signal transition. It serves as a poset to pomset transformer for events of an output action.

Fig. 7. Synthesis of finite pomset. (a) A finite pomset. (b) Transformed pomset. (c) Synthesized circuit.

Fig. 7 illustrates the synthesis. Fig. 7(a) is a pomset P, which is transformed into a poset and several pomsets in Fig. 7(b). The poset is synthesized by the method in the previous section, and the pomsets are implemented by toggles and XOR's. Fig. 7(c) is the translated circuit. The syntax-directed translation for a finite pomset P is given below.

    Construct the system behavior B of P and mP;
    For each input event a of P
        Apply the transformation in Fig. 8(a);
    For each output event x of P
        Apply the transformation in Fig. 8(b);
    For each pomset containing the events of an input action a
    and the events immediately preceded by the events of a
        Instantiate a toggle;
    For each pomset containing events of an output action x
    and the events immediately preceding the events of x
        Instantiate an XOR;
    Synthesize the finite poset containing events that are not from mP or from the pomsets synthesized above;

Fig. 8. Transformations for the synthesis of finite pomset. (a) Transformation for an input event. (b) Transformation for an output event. (c) A finite pomset. (d) A toggle that implements the pomset. (e) A finite pomset. (f) An XOR that implements the pomset.

Lemma 8: The process in Fig. 8(c) is implemented by the toggle in Fig. 8(d).

Proof: The proof is carried out by the model checking procedure in [14]. ∎

Lemma 9: The process in Fig. 8(e) is implemented by the XOR in Fig. 8(f).

Proof: The proof is carried out by the model checking procedure in [14]. ∎

Lemma 10: After the transformations of Fig. 8(a), for every input action a, the projection of the system behavior onto the events of a and those events immediately preceded by the events of a is of the form shown in Fig. 8(c).

Proof: The transformations ensure that the ith occurrence of a precedes a_i. In turn, a_i precedes the (i + 1)st occurrence of a. ∎

Lemma 11: After the transformations of Fig. 8(b), for every output action x, the projection of the system behavior onto the events of x and the events immediately preceding the events of x is of the form shown in Fig. 8(e).

Proof: Similar to the proof of Lemma 10. ∎

Theorem 2: For any finite pomset P that satisfies delay-insensitivity axioms (1) and (2), the synthesis procedure gives a network of basic circuit elements that implements P.

Proof: Similar to the proofs of Lemmas 3, 4, and 5 in Section IV-A, the correctness properties hold before and after the transformations shown in Fig. 8(a) and (b) are applied a finite number of times. The events in the transformed system behavior can be partitioned into several sets of events that satisfy the four conditions stated in Lemma 2. One of the sets contains the events in mP. The other sets contain the events of an input (output) action and the events immediately preceded by (preceding) them, and the events in the transformed system behavior that are not mentioned above. By Lemmas 10 and 11, the projections of the transformed system behavior onto the sets of events are pomsets of the forms shown in Fig. 8(c) and (e), and the finite poset P' amounts to a re-labeling of P. Thus, by Lemma 2, P is implemented by P' and several pomsets of the forms shown in Fig. 8(c) and (e). By Lemmas 8 and 9, the pomsets are implemented by toggles and XOR's. By Theorem 1, the syntax-directed translation of P' gives a network of basic circuit elements. Consequently, by Lemma 1, P is implemented by the synthesized circuit. ∎
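The roles of the toggle (one input action spread over distinct outputs) and the XOR (several inputs merged into one output action) can be illustrated with a small behavioral sketch in Python. This is only an event-level model under stated assumptions: outputs are numbered from 1, the first input event is taken to fire output 1, and XOR input events are given as hypothetical (time, port) pairs. It says nothing about the gate-level realization.

```python
class Toggle:
    """Behavioral model of a toggle with n outputs.

    Assumption: successive input events fire outputs 1, 2, ..., n, 1, 2, ...
    """

    def __init__(self, n):
        if n < 1:
            raise ValueError("a toggle needs at least one output")
        self.n = n
        self.count = 0  # number of input events received so far

    def receive(self):
        """Consume one input event and return the output index that fires."""
        port = self.count % self.n + 1
        self.count += 1
        return port


def xor_merge(events):
    """Merge input events of an XOR into one ordered stream of output events.

    `events` is a list of (time, port) pairs. By the zero autoconcurrency
    axiom no two events may arrive at the same instant, which is checked.
    Returns the list of times at which an output event is produced.
    """
    times = sorted(t for t, _ in events)
    if len(set(times)) != len(times):
        raise ValueError("two inputs received an event at the same instant")
    return times
```

With a two-output toggle, five input events fire outputs 1, 2, 1, 2, 1, so each output sees a poset-like sequence of distinct events. The XOR does the reverse: events arriving on different ports are folded back into occurrences of a single output action.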

C. Syntax-Directed Translation for a Recurrent Pomset

The problem of the synthesis of a recurrent pomset P is to find a network of basic circuit elements that implements P^n, the concatenation of n (≥ 1) copies of P. Since P satisfies the finiteness axiom (axiom (4) in Section II-C), the circuit that implements P^1 also implements P^n, ∀n ≥ 1. Note that P^1 is a single copy of P (without slots), while P is a generator that generates the infinite complete behavior. The circuit that implements P^1 can be obtained from the syntax-directed translation of P^1. This result assumes that no slot in P immediately precedes an output event. When the assumption does not hold, some adjustments are needed. The adjustments involve inserting inverters into the synthesized circuit and proving that dashed arrows connecting events of mP^{k+1} from Q_i to Q_{i+1}, 1 ≤ i ≤ k, are enforced by sequences of solid and wavy arrows. Details can be found in [9]. The proof of the claim is detailed below. Properties 1 and 2 are used in the proof of Theorem 3.

Property 1: In the system behavior formed by the syntax-directed translation circuit for P^1 and mP^1, if b is an input event of a component of type wire-fork, C-element, or wire, then there must exist a most recent output event, say a, in mP such that a precedes b and one of the input events in mP immediately following a is preceded by every output event in the component immediately following b.

Property 2: In the system behavior formed by the syntax-directed translation circuit for P^1 and mP^1, each action of a component of type wire-fork, C-element, or wire has only a single occurrence.

Properties 1 and 2 follow from the direct mapping of the forks and joins in P^1 into distinct circuit elements.

Theorem 3: If P is a recurrent pomset that satisfies the delay-insensitivity axioms (1) and (2), and the finiteness axiom, then the syntax-directed translation circuit for P^1 will implement P^n, ∀n ≥ 1.

Proof: The theorem is proved by induction.

Basis: By Theorem 2, it follows that the syntax-directed translation circuit for P^1 implements P^1.

Induction hypothesis: Assume the synthesized circuit for P^1 implements P^k, k ≥ 1.

Induction step: To show that the synthesized circuit for P^1 implements P^{k+1}, consider the system behavior formed by mP^{k+1} and the synthesized circuit for P^1, which is the concatenation of (k + 1) copies of Q, the system behavior formed by the synthesized circuit and mP^1. Denote it by Q_1 · Q_2 · ... · Q_{k+1}. As P satisfies the finiteness axiom, there are no arrows from an event in Q_i, 1 ≤ i ≤ k − 1, to an event in Q_{k+1}. Thus, there are arrows from Q_i to Q_{i+1} only. Because no slot in P immediately precedes an output event, all arrows from Q_i to Q_{i+1} that connect the events of mP^{k+1} are solid arrows. Thus, if the projection of Q onto the events of mP^1 is mP^1, then the projection of Q_1 · Q_2 · ... · Q_{k+1} onto the events of mP^{k+1} will also be mP^{k+1}. So, there is no progress violation.

Fig. 9. Situation where safety violations may occur.

Next, consider safety violations. Since the synthesized circuit for P^1 implements P^1, all dashed arrows in Q_i, 1 ≤ i ≤ k + 1, are enforced (by solid and wavy arrows). As P satisfies the finiteness axiom, there are no arrows from an event in Q_i, 1 ≤ i ≤ k − 1, to an event in Q_{k+1}. Thus, it is sufficient to consider the dashed arrows from Q_i to Q_{i+1}, 1 ≤ i ≤ k. Furthermore, all dashed arrows from Q_i to Q_{i+1} connect events of circuit elements in the synthesized circuit for P^1 (i.e., no dashed arrow from Q_i to Q_{i+1} connects events of mP^{k+1}).

Consider a dashed arrow from Q_i to Q_{i+1} that connects some events of a toggle, as shown in Fig. 9(a). The toggle has an input a and an output t. In Fig. 9(a), the event a in Q_i is the last occurrence of a, and that in Q_{i+1} is the first occurrence of a. These events have matching output events, also labeled a, in mP^{k+1}. For each input event x of mP^{k+1} that immediately follows a, because the dashed arrow (a, x) is enforced, there must exist a sequence of solid and wavy arrows from t to x, which is represented by a double arrow in the figure. One of the input events enabled by a in Q_i, say x, must precede the first occurrence of a in Q_{i+1}; otherwise, the two occurrences of a would be concurrent, violating the zero autoconcurrency axiom. This is represented by the double arrow from x in Q_i to a in Q_{i+1}. This implies that the dashed arrow (t, a) is enforced. Using the same reasoning, it can be proved that a dashed arrow from Q_i to Q_{i+1} that connects some events of an XOR is enforced, as shown in Fig. 9(b).

To complete the proof, consider a dashed arrow from Q_i to Q_{i+1} that connects events of a component of type wire-fork, C-element, or wire, as shown in Fig. 9(c). The component has an input b and an output y (in the case of a wire-fork, y is one of its outputs). By Property 1, there exist events a and x of mP^{k+1} such that there are sequences of solid and wavy arrows from a to b and from y to x, as indicated by the double arrows in Fig. 9(c). By Property 2, the occurrences of b in Q_i and Q_{i+1} are corresponding events. Thus, the two occurrences of a in Q_i and Q_{i+1} are also corresponding events. From the finiteness axiom, there exists a sequence of solid and dashed arrows from x in Q_i to a in Q_{i+1}. Each of the dashed arrows in the sequence connects events of mP^{k+1} and lies within Q_i or Q_{i+1}, and so they are enforced. Thus, there exists a sequence of solid and wavy arrows from x in Q_i to a in Q_{i+1}, as indicated by the double arrow. This in turn implies that the dashed arrow (y, b) is enforced. Therefore, all dashed arrows from Q_i to Q_{i+1} are enforced.

Fig. 10. Synthesized circuit for the recurrent pomset in Fig. 3(a).

Fig. 10 shows the synthesized circuit for the recurrent pomset in Fig. 3(a). Note that in the terminology of [4], the signal transitions y+ are not persistent. In order to use the method in [4] to synthesize the circuit, persistency constraints from y+ to x+ must be added. This is not desirable as the original specification is changed.

As a comparison, Fig. 11 shows the recurrent pomset and trace specification of the same process and the circuits synthesized using the syntax-directed translation in this paper and Ebergen's syntax-directed translation [6]. In Fig. 11(a), the voltage levels of the outputs of the toggle with input a encode which event of input a has just been received. If the voltage levels are different, then an odd occurrence, i.e., the 1st, 3rd, 5th, ... occurrence, of a has just been received. Otherwise, an even occurrence of a has just been received. The same holds for the toggle with input b. In Fig. 11(b), the voltage levels of q1 and q3 encode which event of input b has been received. If they are different, then an odd occurrence of b has just been received. Otherwise, an even occurrence of b has just been received. It can be observed that a considerable amount of circuitry is used to change the voltage levels of q1 and q3 when events of b are received. A similar amount of circuitry is used for changing the voltage levels of the wires encoding which event of a is received. The N-element is not a DI circuit element because it allows consecutive signal transitions at an input without an output event in between. An N-element with inputs c and d and output e has the specification pref*[(c?)^2 | (d?)^2 | (c? || d?; e!)]. The toggles and the XOR's constitute the two-phase to four-phase converters. More comparison is presented in Section V.
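The parity encoding described for Fig. 11(a) can be checked with a few lines of Python. The sketch below assumes a two-output toggle using transition signalling, with both outputs starting at the same level (taken as 0 here) and successive input events flipping the outputs alternately. The function names are hypothetical and the model is purely behavioral.

```python
def toggle_levels(num_events, n_outputs=2):
    """Voltage levels at the toggle outputs after `num_events` input events.

    Assumptions: all outputs start at level 0, and the j-th input event
    (counting from 0) flips output j mod n_outputs.
    """
    levels = [0] * n_outputs
    for j in range(num_events):
        levels[j % n_outputs] ^= 1
    return levels


def odd_occurrence_received(levels):
    """True when the two output levels differ, i.e. an odd occurrence."""
    return levels[0] != levels[1]
```

After 1, 2, 3, and 4 input events the levels are [1, 0], [1, 1], [0, 1], and [0, 0], so the two outputs differ exactly after an odd number of events, which is the property the text relies on.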

D. Syntax-Directed Translation for an Atomic Choice Machine

For nondeterminate processes, the behavior machines have more than one rule. The strategy is to map each rule of the behavior machine into a circuit block using the syntax-directed translation for a pomset. When input events in a rule (or region of a rule for the case of a concurrent choice behavior machine, as used in the next section) occur, they are forwarded to the circuit block for the rule. That circuit block will produce the appropriate output events. When execution of the process evolves from one rule to another rule, subsequent input events are forwarded to the circuit block for the latter rule. Thus, the synthesis problem is divided into three subproblems. The first one involves the mapping of the rules in a behavior machine into distinct circuit blocks using the syntax-directed translation for a pomset. The second subproblem involves the distribution of input events received to the circuit blocks for the rules. The third subproblem involves the merging of the output events produced by the circuit blocks for the rules.

Fig. 11. Comparison of synthesized circuits. (a) A recurrent pomset and its synthesized circuit. (b) A trace specification and its synthesized circuit.

The mapping of the rules into circuit blocks uses the syntax-directed translation for a pomset described in the previous sections. The merging of the events of an output action produced by the circuit blocks is done by the circuit element XOR. The events of an output action are produced sequentially (the zero autoconcurrency axiom) according to the specification. Thus, an XOR gate can be used to merge the events of an output action that are produced at the outputs of the circuit blocks for the rules. The distribution of input events to the circuit blocks involves the use of demultiplexers. These demultiplexers must be switched appropriately when the process moves from one rule to another rule. Two rules, the demultiplexing rule and the choice recording rule, are proposed to accomplish such a distribution. The demultiplexing rule is explained with the syntax-directed translation for a 2-rule atomic choice behavior machine. Generalization to a k-rule atomic choice behavior machine is immediate. The choice recording rule is required for the syntax-directed translation for a concurrent choice behavior machine and will be explained in the next section.

More precisely, the problem of the distribution of input events to the circuit blocks can be stated as follows. When an event of an input action a in rule ri is received, the input event is forwarded to the circuit block Ci for ri.


Fig. 12. The demultiplexing rule. (a) An atomic choice behavior machine. (b) The synthesized circuit for (a). Annotations in the figure: (1) Choice event a is forwarded to C1. (2) Choice event is used to switch the demultiplexers for input actions that have occurrences in R1 (dashed arrow). (3) Switching event is used to set the demultiplexers for input actions that are choice actions of the next choice state (double arrow).

Some terms are defined to present the demultiplexing rule. A choice event is an input event in one out of several mutually exclusive sets of events that can occur at a choice state. In Fig. 12(a), a and b are the choice events. A switching event is an input event that precedes a choice slot but not any other input event in the same rule. In Fig. 12(a), p and q are switching events. Note that an input event can be both a choice event and a switching event.

Suppose in the rule r being considered, a is an input action that has a choice occurrence, b is some other input action, and S is the named state that starts r . The demultiplexing rule consists of two parts. Consider a transition from rule r’ to rule r via named state S. (1) To switch the demultiplexer for a to the circuit block C for r , a switching event in r’ is identified. This switching event must not be concurrent with any occurrence of a in r’. (2) To switch the demultiplexer for input action b to C, a choice event of r is used. This choice event must not be concurrent with any occurrence of b in r .

For the example in Fig. 12(a), let C1 (C2) be the circuit block for r1 (r2). The switching event p (q) is used to switch the demultiplexer for b to C2 (a to C1) before the output event it enables is produced. Thus, when named state {(0)} is entered, if the environment sends in a (b), then that event of a (b) will be routed to the circuit block C1 (C2). Before the output event enabled by a is produced, the choice event a is used to switch the demultiplexers for the input actions (except that for a) that have occurrences in rule r1 to circuit block C1. A similar switching will be performed by the choice event b in rule r2.

Fig. 13(a) shows an atomic choice behavior machine whose syntax-directed translation circuit is in Fig. 13(b). Circuit blocks C1 and C2 are the syntax-directed translation circuits for the finite pomsets in rules 1 and 2. Execution of rule 1 is explained below; the execution of rule 2 can be reasoned about similarly. In C1, when the first occurrence of a' is received, an event of u is produced, which is used to switch the demultiplexers for b and c before the output event enabled by that occurrence is produced.

Fig. 13. Syntax-directed translation for an atomic choice machine. (a) An atomic choice behavior machine. (b) Syntax-directed translation circuit for the atomic choice machine in (a).

The event of u is produced for switching the demultiplexers for b and c to C1 (i.e., (2) of the demultiplexing rule). With the demultiplexers for b and c so switched, future events of b and c will be forwarded to C1.

The second occurrence of b in rule 1 is a switching event. When that event occurs, (1) of the demultiplexing rule should be accomplished. In C1, it can be observed that, before the event of y enabled by that occurrence of b is produced, a control event is generated to switch the demultiplexer for b to C2 so that, if the environment sends in b at the next choice state, then the choice event of b will be forwarded to C2. Upon initialization, the demultiplexers for a and b should be in the states indicated in Fig. 13(b), i.e., their upper channels are selected. The initial state of the demultiplexer for c does not matter.
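The routing discipline of the demultiplexing rule can be sketched as a small event-level simulation in Python. The sketch is illustrative only: the rule table, the action names, and the initial demultiplexer settings below describe a hypothetical 2-rule machine (not the machine of Fig. 13), and acknowledgement events and timing are not modeled. Part (2) of the rule is applied when a choice event arrives, and part (1) when a switching event arrives.

```python
class DemuxBank:
    """One demultiplexer per input action, each selecting one circuit block."""

    def __init__(self, initial):
        self.selected = dict(initial)

    def forward(self, action):
        """Return the circuit block that currently receives `action`."""
        return self.selected[action]

    def switch(self, action, block):
        self.selected[action] = block


def simulate(events, rules, initial):
    """Route each input event and apply the two parts of the rule.

    `rules` maps a circuit block name to a dict with (hypothetical) keys:
      "choice"      - the choice event of the rule,
      "inputs"      - input actions that occur in the rule,
      "switching"   - the switching event of the rule,
      "next_choice" - block for each choice action of the next choice state.
    """
    bank = DemuxBank(initial)
    trace = []
    for ev in events:
        block = bank.forward(ev)
        trace.append((ev, block))
        rule = rules[block]
        if ev == rule["choice"]:
            # Part (2): the choice event switches the demultiplexers of the
            # other input actions of this rule to this rule's block.
            for other in rule["inputs"]:
                if other != ev:
                    bank.switch(other, block)
        if ev == rule["switching"]:
            # Part (1): the switching event sets the demultiplexers of the
            # choice actions for the next choice state.
            for action, target in rule["next_choice"].items():
                bank.switch(action, target)
    return trace


# Hypothetical machine: rule 1 is "a then c", rule 2 is "b then c".
RULES = {
    "C1": {"choice": "a", "inputs": ["a", "c"], "switching": "c",
           "next_choice": {"a": "C1", "b": "C2"}},
    "C2": {"choice": "b", "inputs": ["b", "c"], "switching": "c",
           "next_choice": {"a": "C1", "b": "C2"}},
}
INITIAL = {"a": "C1", "b": "C2", "c": "C1"}
```

Running `simulate(list("acbcac"), RULES, INITIAL)` forwards the shared action c to C1, then C2, then C1 again, which shows how the choice event of the active rule steers later input events to the right block.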

Formally, the synthesis of a 2-rule atomic choice behavior machine for process P amounts to applying correctness-preserving transformations to the system behaviors formed by (r1)^1 and m(r1)^1, and by (r2)^1 and m(r2)^1, where r1 and r2 are the two rules of the behavior machine, and (r_i)^1 denotes one copy of rule r_i. The transformations applied to the system behavior formed by (r1)^1 and m(r1)^1 are shown in Fig. 14.

Fig. 14. Transformations for the synthesis of an atomic choice behavior machine. (a) Transformation for a choice event a. (b) Transformations for a switching event p. (c) Transformation for a nonchoice input event c. (d) Transformation for an output event x.

In Fig. 14(a), the events u_b, u_c, ... are the (control) events entering and exiting the demultiplexers implementing the switching by the choice event. A choice event can be chosen to switch the demultiplexer for an input action if all occurrences of the input action in r1 are preceded by the choice event. In Fig. 14(b), the events t_b, t_c, ... are the events entering and exiting the demultiplexers switched by the switching event. A switching event can be chosen to switch the demultiplexer for an input action that has a choice event in a rule that can follow r1 if all occurrences of the input action in r1 precede the switching event. Fig. 14(c) shows the transformation applied to an input event other than a choice/switching event. Fig. 14(d) shows the transformation for an output event in r1. The transformations applied to the system behavior formed by (r2)^1 and m(r2)^1 are similar, except that a' and x' are changed to a'' and x'', and the events for switching the demultiplexers, i.e., u_b, u_c, ... and t_b, t_c, ..., have different action labels. From the two transformed system behaviors, demultiplexers, XOR's, and the finite pomset specifications of circuit blocks for the rules can be identified. The finite pomset specifications of the circuit blocks can be synthesized using the syntax-directed translation for a finite pomset.

The correctness of the synthesized circuit is proved below. Let r1 and r2 be the two rules of the atomic choice behavior machine, let C be the synthesized circuit, and let (r_i)^n denote the concatenation of n copies of r_i.

Theorem 4: The synthesis method based on the syntax-directed translation of specification and the demultiplexing rule yields a correct circuit for any atomic choice behavior machine that satisfies the delay-insensitivity axioms.

Proof: The theorem is proved by showing that C implements α · r1 and α · r2, where α is the concatenation of any instances of r1 and r2. In the proof, it is assumed that α = (r1)^{m_1} (r2)^{n_1} (r1)^{m_2} (r2)^{n_2} ... (r1)^{m_k} (r2)^{n_k}, with m_i, n_i ≥ 0 and Σ_{i=1}^{k} (m_i + n_i) = l; the result holds for other ways of concatenating the l instances of the rules. Induction on l is used.

Basis: First, it is shown that C implements (r1)^1. The synthesized circuit C consists of demultiplexers, XOR's, and circuit blocks for r1 and r2. Some inputs and outputs of the demultiplexers and some inputs of the XOR's are connected to the circuit block for r2. When C and m(r1)^1 interact, no events are generated at those inputs and outputs. Thus, the system behavior of C and m(r1)^1 is still the transformed system behavior of (r1)^1 and m(r1)^1. It can easily be checked that the transformations listed in Fig. 14 preserve the correctness properties. Thus, the transformed system behavior of (r1)^1 and m(r1)^1, and hence the system behavior of C and m(r1)^1, satisfies the correctness properties. Similarly, C implements (r2)^1.

Induction hypothesis: Assume that C implements α = (r1)^{m_1} (r2)^{n_1} (r1)^{m_2} (r2)^{n_2} ... (r1)^{m_k} (r2)^{n_k}, where m_i, n_i ≥ 0.

Induction step: To show that C implements α · r1, consider the system behavior formed by the mirror of α · r1 and C. The system behavior is the concatenation (Q1)^{m_1} (Q2)^{n_1} (Q1)^{m_2} (Q2)^{n_2} ... (Q1)^{m_k} (Q2)^{n_k} · Q1, where Q_i is the system behavior formed by C and mr_i. Since no slots in the rules of the behavior machine immediately precede an output event, all arrows between two consecutive instances of Q1/Q2 that connect events of the mirror of α · r1 are solid arrows. Thus, if the projection of Q_i onto the events of mr_i is mr_i, then the projection of (Q1)^{m_1} (Q2)^{n_1} (Q1)^{m_2} (Q2)^{n_2} ... (Q1)^{m_k} (Q2)^{n_k} · Q1 onto the events of the mirror of α · r1 will be the mirror of α · r1. That is, there is no progress violation.

In considering safety violations, since C implements (r1)^1 and (r2)^1, all dashed arrows in Q1 and Q2 are enforced. As there are arrows between two successive instances of Q1/Q2 only, it is sufficient to consider the dashed arrows from Q_i to Q_j, i = 1, 2, j = 1, 2. Furthermore, all dashed arrows from Q_i to Q_j connect events of circuit elements in C.

Consider a dashed arrow from one instance of Q1 to the next instance of Q1 that connects the events of a demultiplexer, as shown in Fig. 15. The demultiplexer has an input b, an output b' which connects to the circuit block for r1, an input u_b by which the demultiplexer can be switched to the circuit block for r1, and an output that acknowledges the inputs at u_b.

The event b is the last occurrence in Q1. The event a is the choice event of r1. The event x is an input event of mr1 that immediately follows b. The transformation ensures that there is a sequence of solid and wavy arrows from a to u_b. Since all dashed arrows in Q1 are enforced, it follows that there must exist a sequence of solid and wavy arrows from b' to x. The atomicity of the choice state ensures that there exists a sequence of solid and dashed arrows from x to a. Each dashed arrow in the sequence connects events of mr1, and so they are enforced. Hence, it follows that the dashed arrow from b' to u_b is enforced, as shown in Fig. 15.


Fig. 16. A concurrent choice behavior machine.

Dashed arrows that connect events of a toggle, an XOR, a wire-fork, a C-element, or a wire are enforced by reasoning similar to that presented in Theorem 3. Similarly, dashed arrows from Q1 to Q2, from Q2 to Q1, and from Q2 to Q2 are enforced. Thus, C implements α · r1, and similarly, α · r2. ∎

E. Syntax-Directed Translation for a Concurrent Choice Machine

The synthesis of a concurrent choice machine with one named state is used to illustrate the choice recording rule in handling concurrent choices. The synthesis method, however, can be applied to the synthesis of general concurrent choice machines. The named state is assumed to have only two choice slots. Generalization to k ( > 2) choice slots is straightforward.

Generally, in a concurrent choice behavior machine, a rule that follows a choice state of two choice slots can be decomposed into three regions: R1, R2, and R3. All events in R1 are preceded by one of the choice slots and those in R2 are preceded by the other choice slot. An event in R3, however, is preceded by both choice slots. R1 and R2 are called the predecessor regions and R3 is called the successor region in that rule. The importance of this partitioning into regions is this: by the well-behavedness of choice, R1 is independent of choices made in the second slot, while R2 is independent of choices made in the first slot. The pomset in R3 can depend on what choice events occur in both slots.

For the concurrent choice machine in Fig. 2, the four rules can be decomposed into twelve regions, as shown in Fig. 16. The predecessor regions that are selected by the choice event a in rule 1 and rule 2 must be identical, according to the well-behavedness of choice. Hence, these predecessor regions can be mapped into the same circuit. They are given the same label R1. Other predecessor regions are labeled R2 to R4 following the same reasoning. The successor regions are labeled R5 to R8.

The syntax-directed translation for a concurrent choice machine can now be explained. The pomset representing each distinctly named region (R1 to R8) is translated into a circuit block (C1 to C8) using the syntax-directed translation for a pomset. Similar to the synthesis of an atomic choice machine, input events are distributed to the circuit blocks, and events produced at the outputs of the circuit blocks are merged. The merging of the events of an output action is done by an XOR gate. The problem of distributing input events to the circuit blocks is the same as for an atomic choice machine, namely, when an event of an input action a in region R_i is received, the input event is forwarded to the circuit block C_i for R_i.

Fig. 17. The choice recording rule. (a) A choice recorder. (b) Use of choice recorder.

Distribution of input events to circuit blocks for predecessor regions follows immediately from the demultiplexing rules of Section IV-D. Distribution of input events to circuit blocks for successor regions has to be dealt with differently.

Some terms are defined. A glb (greatest lower bound) of Ri is an input event in Ri that is not preceded by any other input event in Ri. In Fig. 16, a and c are glb's of R5. A switching event of a region is an input event of that region that is not followed by any other input event of the same region. In Fig. 16, a (which is also a glb of R1) is the switching event of R1; the second occurrence of a in R5 and the second occurrence of c in R5 are the switching events of R5.
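Both definitions can be written down directly. The sketch below assumes that the input events of a region are given as a set of names and that the precedence relation is a transitively closed set of pairs. The function names and the toy region (two unordered input events, each followed by a second occurrence) are hypothetical and do not reproduce Fig. 16.

```python
def glbs(inputs, precedes):
    """Input events of a region not preceded by any other input event of it."""
    return {e for e in inputs
            if not any((o, e) in precedes for o in inputs if o != e)}


def switching_events(inputs, precedes):
    """Input events of a region not followed by any other input event of it."""
    return {e for e in inputs
            if not any((e, o) in precedes for o in inputs if o != e)}


# Hypothetical region: a1 and c1 are unordered; a2 follows a1, c2 follows c1.
region_inputs = {"a1", "c1", "a2", "c2"}
order = {("a1", "a2"), ("c1", "c2")}

region_glbs = glbs(region_inputs, order)
region_switching = switching_events(region_inputs, order)
```

Here the first occurrences are the glb's and the second occurrences are the switching events. In a region with a single input event, that event is both a glb and a switching event.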

Since a glb of a successor region may correspond to the same input action, say e, as a glb of another successor region, it is in general not possible for the switching events in the predecessor regions to switch the demultiplexer for e to a particular circuit block. For example, in Fig. 16, the successor regions have glbs of action label a. The switching event in R1, for instance, cannot determine whether it should switch the demultiplexer for a to the circuit block for R5 or the circuit block for R6, because either R5 or R6 may follow R1. To ensure that the circuit block of the correct successor region, say R5 in Fig. 16, is selected in advancing on rule 1, the following choice recording rule is enforced. Intuitively, this rule ensures that C5 for R5 is selected if and only if the choice events in R1 and R3 did happen earlier, as recorded in a choice recorder.

Suppose e is the input action label of some glb in the successor regions R5 to R8. In particular, predecessor regions R1 and R3 precede R5, and e is a glb of R5. Three cases may arise: 1) e does not occur in R1 and R3, 2) e occurs in R1 but not R3, 3) e occurs in R3 but not R1. In case 1), a switching event that is not concurrent with any occurrence of e can be used to switch the demultiplexer for e so that when e occurs, it is used to interrogate a choice recorder, shown in Fig. 17(a). Depending on the input choices recorded earlier, the demultiplexer for e will be switched to the appropriate circuit block, say C5 for R5, and e will be forwarded to C5 (instead of circuit blocks for other successor regions), as shown in Fig. 17(b).

Fig. 18. Syntax-directed translation for a concurrent choice machine.

The choice recorder for the example is a device built with three demultiplexers, selected by a, b, c, and d so that input e is routed to one of the four output channels, depending on whether a and c, a and d, b and c, or b and d were received (and recorded) earlier. The choice recorder is set by the choice events.
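The routing behavior of such a choice recorder can be modeled at the behavioral level. The sketch below is not a circuit description: the class name `ChoiceRecorder`, its method names, and the numbering of the four output channels from 0 to 3 are assumptions made only for illustration.

```python
class ChoiceRecorder:
    """Behavioral model: remember one choice per slot, then route e."""

    # Output channel for each recorded pair of choices (numbering assumed).
    CHANNEL = {("a", "c"): 0, ("a", "d"): 1, ("b", "c"): 2, ("b", "d"): 3}

    def __init__(self):
        self.first = None    # choice recorded for the first slot: a or b
        self.second = None   # choice recorded for the second slot: c or d

    def record(self, choice):
        """Set the recorder when a choice event is received."""
        if choice in ("a", "b"):
            self.first = choice
        elif choice in ("c", "d"):
            self.second = choice
        else:
            raise ValueError("not a choice event: " + repr(choice))

    def interrogate(self):
        """Return the output channel on which e is forwarded."""
        if self.first is None or self.second is None:
            raise RuntimeError("both choices must be recorded before e arrives")
        return self.CHANNEL[(self.first, self.second)]


recorder = ChoiceRecorder()
recorder.record("b")
recorder.record("c")
channel = recorder.interrogate()
```

The model makes explicit why e must not arrive before both choice events have set the recorder: until then the output channel is undetermined.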

In case 2), the switching event can be chosen from R1. This switching event is guaranteed to be nonconcurrent with any occurrence of e in R1. Case 3) can be dealt with similarly.

For an input action, say b, that occurs in a successor region but is not the action label of a glb, a glb is chosen to switch the demultiplexer for b to the circuit block for that successor region. The chosen glb must precede the first occurrence of b in that successor region.

Fig. 18 illustrates how the demultiplexing rule and the choice recording rule are used to get a circuit for a concurrent choice behavior machine. The rules of the behavior machine in Fig. 2 are partitioned into eight regions R1 to R8, as shown in Fig. 16. Each of the successor regions R5 to R8 has a glb of action label a. R6 and R7 have a glb of action label d. Choice recorders, CRa and CRd, are needed. The switching events in R1 and R2 are picked to target the demultiplexer for a to choice recorder CRa. Similarly, the switching events in R3 and R4 are chosen to switch the demultiplexer for d to the choice recorder CRd.

When the glb of a comes, the choice recorder is properly set and an event will appear at one of a5 to a8. That event will be used to switch the demultiplexer for a so that


TABLE I (only the entries for specification 1 are recoverable)

Specification   Synthesized circuits       Ebergen’s circuits
                Size   Speed               Size   Speed
1               136    4  (2t + 2c)        764    17  (4t + 5c + 8x)

Fig. 19. Transformations for the synthesis of a concurrent choice behavior machine.

events and are the events setting the choice recorder for the glb (of action label e) in R5. In Fig. 19(b), the event switches the demultiplexers for e to the choice recorder. It is assumed that the switching event p is not concurrent with any occurrence of e. In Fig. 19(c), e’ is the output of the demultiplexer for e that is connected to the choice recorder, e5 is the output of the choice recorder at which an event will be generated if rule 1 is executed, U switches the demultiplexer for e to the circuit block for R5, and e51 is the first occurrence of e in R5. The transformations applied to the system behaviors for rules 2, 3, and 4 are similar. From the four transformed system behaviors, demultiplexers, XOR’s, choice recorders, and the finite pomset specifications of circuit blocks for the rules can be identified. The finite pomset specifications of the circuit blocks can be synthesized using the syntax-directed translation for a finite pomset.

Theorem 5: The synthesis method based on the syntax-directed translation of each region, the choice recording rule, and the demultiplexing rule yields a correct circuit for any concurrent choice behavior machine that satisfies the delay-insensitivity axioms and the well-behavedness axioms.

Proof: The proof follows reasoning similar to that for Theorems 3 and 4, and is omitted here. Details can be found in [9].

Fig. 20 shows how a dashed arrow connecting two events of a choice recorder is enforced.

V. COMPARISON WITH EBERGEN’S WORK

The synthesis method presented in this paper is compared with Ebergen’s synthesis method [6]. Seven examples are used in the comparison. The results are summarized in Table I.

The basic circuit elements mentioned in this paper are implemented as static CMOS transistor circuits. The number of transistors for each circuit element is known. The demultiplexer has 105 transistors, the toggle has 34 transistors, the C-element has 34 transistors, and the XOR has 24 transistors. Other circuit elements such as wire and wire-fork are assumed to have zero size. The size of a synthesized circuit is measured by the total number of transistors. The N-elements used in Ebergen’s circuits are assumed to have sizes equal to that of the C-element.
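The size measure amounts to a weighted count of circuit elements. The sketch below uses the transistor counts stated above; the function name `circuit_size` and the example netlist are hypothetical and do not correspond to any circuit in the comparison.

```python
# Transistor counts per element, as stated in the text. Wires and
# wire-forks are taken to have zero size; the N-element is taken to be
# the same size as the C-element.
TRANSISTORS = {
    "demultiplexer": 105,
    "toggle": 34,
    "c_element": 34,
    "n_element": 34,
    "xor": 24,
    "wire": 0,
    "fork": 0,
}


def circuit_size(netlist):
    """Total transistor count of a netlist given as {element kind: count}."""
    return sum(TRANSISTORS[kind] * count for kind, count in netlist.items())


# Hypothetical netlist, for illustration only.
example = {"demultiplexer": 1, "toggle": 2, "c_element": 1, "xor": 2, "fork": 3}
size = circuit_size(example)   # 105 + 2*34 + 34 + 2*24 + 0
```

Because a demultiplexer costs about three times as much as a toggle or a C-element, circuits that need many demultiplexers grow quickly under this measure.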

The speed of a synthesized circuit is measured by the longest cycle time. The cycle time of an event in the specification is the time between its successive occurrences. The longest cycle time is the largest of the cycle times of the events in the specification, assuming negligible delay in the environment. For example, in Fig. 11(a), the cycle time of the first event of U in the recurrent pomset is the time between successive instances of that event in the infinite behavior, i.e., the time between the kth and the (k + 2)nd instances, where k is an odd integer. This is also the longest cycle time. When there is nondeterminism in the specification, the longest cycle time is taken to be the cycle time of an event in the longest simple cycle of named states. For example, in specification 5, which is the sequence detector on p. 33 of [6], the longest simple cycle of named states is the shortest cycle that brings the component from state 0, through states 1, 2, and 3, and back to state 0. The delays for the demultiplexer, toggle, C-element, and XOR are d, t, c, and x, respectively. Actual figures, d = 3 and t = c = x = 1, are also provided to evaluate the cycle times.
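The evaluation of cycle times with the actual delay figures can be sketched as follows. The sketch assumes that the cycle of each event is described by the list of circuit elements it traverses and that the environment adds no delay. The function names and the two example cycles are hypothetical.

```python
# Actual delay figures from the text: demultiplexer 3; toggle, C-element,
# and XOR 1 each.
DELAY = {"demultiplexer": 3, "toggle": 1, "c_element": 1, "xor": 1}


def cycle_time(path):
    """Sum of the element delays along one cycle (environment delay neglected)."""
    return sum(DELAY[kind] for kind in path)


def longest_cycle_time(cycles):
    """Largest cycle time among the events of a specification."""
    return max(cycle_time(path) for path in cycles.values())


# Hypothetical cycles for two events, for illustration only.
cycles = {
    "event_1": ["toggle", "c_element", "toggle", "c_element"],
    "event_2": ["demultiplexer", "toggle", "xor"],
}
times = {name: cycle_time(path) for name, path in cycles.items()}
speed = longest_cycle_time(cycles)
```

A cycle through two toggles and two C-elements evaluates to 4 time units, while a shorter cycle through one demultiplexer is already slower, which shows how heavily the demultiplexer delay weighs in this measure.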

Specification 1 is the specification shown in Fig. 11. Specification 2 is the specification shown in Fig. 3. Specification 3 is a linear recurrent pomset (all events in the recurrent pomset are totally ordered) consisting of 10 events. Thus, specifications 1 to 3 are determinate processes. Specification 4 is the EO component in Section 6.1.3, p. 104 of Ebergen’s thesis (the synthesized circuit is shown in Fig. 6.1.2 on p. 105). Specification 5 is the sequence detector in Section 2.3.1, p. 33 of Ebergen’s thesis (the synthesized circuit is shown in Fig. 6.1.3 on p. 105). Specification 6 (7) is an atomic choice machine with two (four) rules, each of which is a linear pomset consisting of 8 events. The atomic choice machine in Fig. 13 has not been expressed in Ebergen’s DI grammar successfully. It is conjectured that it cannot be expressed in Ebergen’s DI grammar.

From the above comparison, it can be concluded that the synthesis method in this paper produces smaller and faster circuits for determinate processes, especially those with more concurrency. For nondeterminate processes, the outcome depends on the number of rules of the behavior machines and the sizes of the rules. Specifications 4 and 5 above are behavior machines consisting of many rules, each of which consists of two events only. In this situation, Ebergen’s method produces smaller and faster circuits. However, when the number of rules is small and the rules consist of many events, as in specification 6, the method in this paper produces smaller circuits. When the number of rules and the sizes of rules are large, as in specification 7, both methods produce circuits of comparable sizes and speeds. This suggests that the demultiplexing rule and the choice recording rule are not very efficient, and improvements in them are needed to produce more efficient circuits.

VI. CONCLUSION

A syntax-directed translation for the synthesis of DI circuits is presented and proved correct. By using circuit elements different from those considered in [11], this paper demonstrates that DI circuits are possible. The synthesized circuits do not use two-phase to four-phase converters, as opposed to those obtained by Ebergen’s method. The synthesis method is compared with Ebergen’s synthesis method. The comparison suggests that the demultiplexing rule and the choice recording rule are not efficient. The behavior machine model allows the specification of arbitrary partial orders of signal transitions that are subject to the delay-insensitivity and well-behavedness axioms. Thus, a large class of specifications can be compiled into DI circuits. The transistor circuits for the basic circuit elements have been designed and tested. The synthesis procedures described in this paper are automated. From a behavior machine input, the software produces a netlist of basic circuit elements which can be given to a standard place-and-route package to generate the layout of the synthesized circuit. Further research will focus on the development of optimization techniques so that more area-efficient circuits can be synthesized.

ACKNOWLEDGMENT

The authors would like to thank the referees for their comments, especially referee #3, whose comments greatly improved the quality and presentation of the paper.

REFERENCES

[1] C. H. van Berkel and R. Saeijs, “Compilation of communicating processes into delay-insensitive circuits,” Proc. Int. Conf. on Comput. Design, 1988, pp. 157-162.

[2] E. Brunvand and R. F. Sproull, “Translating concurrent programs into delay-insensitive circuits,” Proc. Int. Conf. on Computer-Aided Design, 1989, pp. 262-265.

[3] S. M. Burns and A. J. Martin, “Syntax-directed translation of concurrent programs into self-timed circuits,” Proc. Advanced Res. in VLSI, 1988, pp. 35-50.

[4] T.-A. Chu, “Synthesis of self-timed VLSI circuits from graph-theoretic specifications,” Proc. Int. Conf. on Comput. Design, 1987, pp. 220-223.

[5] T.-A. Chu and L. A. Glasser, “Synthesis of self-timed control circuits from graphs: An example,” Proc. Int. Conf. on Comput. Design, 1986, pp. 565-571.

[6] J. C. Ebergen, “Translation of programs into delay-insensitive circuits,” Ph.D. dissertation, Eindhoven Univ. of Technol., 1987.

[7] P. N. Lam and H. F. Li, “Hierarchical design of delay-insensitive systems,” IEE Proc., Part E: Comput. and Digital Tech., vol. 137, pp. 41-56, Jan. 1990.

[8] S. C. Leung, “Synthesis of delay-insensitive circuits from graph-theoretic specifications,” Ph.D. dissertation, Dept. of Comput. Sci., Concordia Univ., Montreal, Quebec, Canada, Sept. 1993.

[9] S. C. Leung and H. F. Li, “A syntax-directed translation for the synthesis of delay-insensitive circuits,” Tech. Rep. CS-VLSI-93-07, Dept. of Comput. Sci., Concordia Univ., Montreal, Quebec, Canada H3G 1M8.

[10] A. J. Martin, “Compiling communicating processes into delay-insensitive VLSI circuits,” J. Distrib. Computing, vol. 1, pp. 226-234, 1986.

[11] A. J. Martin, “The limitations to delay-insensitivity in asynchronous circuits,” Proc. Advanced Res. in VLSI, pp. 263-278, 1990.

[12] T. H.-Y. Meng, R. W. Brodersen, and D. G. Messerschmitt, “Automatic synthesis of asynchronous circuits from high-level specifications,” IEEE Trans. Computer-Aided Design, vol. 8, pp. 1185-1205, Nov. 1989.

[13] V. R. Pratt, “Modelling concurrency with partial orders,” Int. J. Parallel Programm., vol. 15, no. 1, pp. 33-71, 1986.

[14] D. K. Probst and H. F. Li, “Partial-order model checking: a guide for the perplexed,” in Proc. 3rd Int. Wkshp. on Comput. Aided Verificat., Aalborg, Denmark, K. G. Larsen and A. Skou, Eds., Lecture Notes in Computer Science vol. 575, Springer-Verlag, 1991, pp. 322-331.

[15] F. U. Rosenberger, C. E. Molnar, T. I. Chaney, and T.-P. Fang, “Q-Modules: Internally clocked delay-insensitive modules,” IEEE Trans. Comput., vol. 37, pp. 1005-1018, 1988.

[16] C. L. Seitz, “System timing,” in Mead & Conway, Introduction to VLSI Systems. Reading, MA: Addison-Wesley, 1980, ch. 7.

[17] J. Staunstrup and M. R. Greenstreet, “Designing delay-insensitive circuits using synchronized transitions,” IMEC-IFIP Int. Wkshp. on Applied Formal Methods for Correct VLSI Design, 1989, pp. 741-758.

[18] I. E. Sutherland, “Micropipelines,” Commun. ACM, vol. 32, no. 6, pp. 720-738, June 1989.

[19] J. T. Udding, “A formal model for defining and classifying delay-insensitive circuits and systems,” J. Distrib. Computing, vol. 1, pp. 197-204, 1986.

S. C. Leung received the B.Sc. (Eng.) degree (first class honors) in 1984 and the M.Phil. degree in electrical engineering in 1987 from the University of Hong Kong, Hong Kong. He also received the Ph.D. degree in computer science in 1993 from Concordia University, Montreal.

Upon graduation, he joined the Department of Computer Science of Concordia University and is now an Assistant Professor. His research interests include verification and synthesis of asynchronous systems.

Hon F. Li received the B.S. degree (with highest honors) in 1972, and the Ph.D. degree in the area of parallel pipelined architectures in 1975 from the University of California, Berkeley.

Subsequently he joined the University of Illinois, Urbana-Champaign, as an assistant professor with the Department of Electrical Engineering. In 1977 he went to the University of Hong Kong and served as a lecturer and later a senior lecturer with the Department of Electrical Engineering, where he developed both an undergraduate and a graduate program in computer engineering. He joined Concordia University in 1984 and is now a Professor of computer science. His recent research interests cover modeling, verification, and synthesis of asynchronous and distributed systems.
