

Computer Languages, Vol. 1, pp. 29--43. Pergamon Press, 1975. Printed in Northern Ireland

EXTENDING PL/I FOR STRUCTURED PROGRAMMING†

JOSEPH E. SULLIVAN The Mitre Corporation, Bedford, MA 01730, U.S.A.

(Received 5 November 1973 and in revised form 17 January 1974)

Abstract---General principles of structured programming, and practical aspects of applying those principles within the programming language PL/I, are discussed. Suitable extensions (and contractions) of that language are suggested.

Structured programming PL/I Hierarchical design Level of abstraction Program comprehension Programming language design GOTO-free coding Interrupts Error handling

1. WHAT IS STRUCTURED PROGRAMMING?

1.1. The basic idea

THE TERM "structured programming" embraces a number of related notions, most of them neither new nor surprising. The fundamental concept is that programs consists of layers, as it were, each layer implementing a characteristic level of abstraction built upon the layers below and supporting the layers above (e.g. Dijkstra [1]). Each element at a given level is a distinct unit with a clean and in some sense minimal interface to other units, and may be independently designed, implemented and tested. Thus far, this may seem to be very little more than the common concept of "modularity," a term now so shopworn that the phase "modular design" is something of a private joke among programmers. Where structured programming departs from platitudes is in prescribing two specific design and coding rules.

1.2. Top-down, hierarchical organization

First, design and coding are blended into a single, top-down process. That is, the entire system operation is expressed as a simple process in terms of other, probably as yet undefined, processes and data structures (e.g. "obtain user's query" or "look up string in directory"). The next step is to take one or more of these abstractions and define it, possibly in terms of still other abstractions, and so on until everything is resolved into processes and data structures already supported in the implementation environment. The trick is to keep each expansion minimal--some would even demand that it fit on one page [2]--so that the algorithm is easily designed and understood (cf. Dijkstra [3, 4]).
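To make this concrete, the topmost level of such a design can itself be written in PL/I, with every abstraction it names left as a call on a procedure that does not yet exist; the sketch below uses invented names (QUERY_SYSTEM, OBTAIN_QUERY, LOOK_UP, DISPLAY) purely for illustration.

   QUERY_SYSTEM: PROC OPTIONS (MAIN);
      DCL (QUERY, REPLY) CHAR (80) VARYING;
      DCL MORE_QUERIES BIT (1);
      MORE_QUERIES = '1'B;
      DO WHILE (MORE_QUERIES);
         CALL OBTAIN_QUERY (QUERY, MORE_QUERIES);   /* "obtain user's query" */
         IF MORE_QUERIES THEN DO;
            CALL LOOK_UP (QUERY, REPLY);            /* "look up string in directory" */
            CALL DISPLAY (REPLY);
         END;
      END;
   END QUERY_SYSTEM;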

Of course, the controversy surrounding this design method centers on the question of practicality. In fact, no one denies that the success of the method requires foresight during the topmost design phases, in order to retain elegance, not to mention efficiency, at the bottom. This is partly because one generally has a fixed goal in terms of the existing machine, operating system, language and subroutine packages towards which he must direct his design as a practical matter. In addition, poor choices for abstractions at the higher level show up in the lower levels as excessive, seemingly incoherent, interdependencies. In practice, good designing is typically an iterative process--starting from the top down, one

† This paper is an adaptation and updating of report ESD-TR-72-315 of the Air Force Electronics Systems Division. This work was performed under AF/ESD Contract F19628-71-C-0002.



realizes at some point that he has made a bad choice at some higher level, possibly even the very top, backs up to that point and starts down again. (Of course, there is nothing really wrong with retaining the work one did during the first descent, to the extent that it still applies, although this is probably where many errors are born.) In short, we are rarely all that sure of what we want before we see what it involves. On the other hand, good designs always look strictly top-down when finished; the proof of the structure is in the comprehension.

1.3. Canonical composition rules

The second rule is that the user restrict himself to one of a small number of "composition rules" in expressing the control logic of each unit. In the earliest and most basic formulation, the flow chart of a process is restricted to one of the three forms:

[Flow charts of the three basic forms: Sequence (P1 followed by P2), Alternation (a test t selecting P1 or P2), and Iteration (a process P repeated under a test t)]

The t's are elementary 2-branch tests. Each P may be null, an elementary process, or itself a process whose chart is of one of the three types, since, it will be noted, each form has but one external entry and one external exit. It is easy to see how this process of expansion is related to the concept of levels.

In relation to the programming language PL/I, these flow charts correspond to simple statement-to-statement sequence, IF-THEN-ELSE constructions, and DO-WHILE groups, respectively. Thus it is already possible to program in a structured manner in PL/I without resorting to GO-TO statements, which Dijkstra [5] would banish from programming languages because their presence invites composing non-structured programs. However, it must be recognized that as things stand, PL/I sometimes exacts what seems to be an unreasonably high price in elegance, efficiency, or both, for perfect avoidance of GO-TOs; this is the subject matter of the third section of this paper. Of course, even if the programmer should appeal to GO-TOs in such cases, he may still conform to the canonical composition rules.
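In PL/I terms the three forms might be sketched as follows, using the same kind of placeholders as the flow charts (nothing here goes beyond the language as it stands):

   /* Sequence */
   (process 1); (process 2);

   /* Alternation */
   IF (condition) THEN (process 1); ELSE (process 2);

   /* Iteration */
   DO WHILE (condition); (process); END;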

Böhm and Jacopini [6] have proved that any general flow chart may be transformed so as to involve composition of the three basic forms only, by splitting nodes (copying boxes) and introducing auxiliary variables where necessary.† Actually, their theorem was stronger than this, in that only two forms (sequence and iteration) are necessary if an operation amounting to "save result of alternative test" is available. In PL/I, a logical (BIT(1))

1" Knuth and Floyd [7], implicitly ruling out the possibility of introducing auxiliary variables and com- putations, exhibited a program which could not be rewritten without either GO-TOs or procedure calls. With auxiliary variables, however, the program is easily rewritten in canonical form.


assignment is such an operation; thus any alternation


IF (condition) THEN (action 1); ELSE (action 2);

could be rewritten

DCL (FLAGY, FLAGN) BIT (1);
FLAGY = (condition);
FLAGN = ¬FLAGY;
DO WHILE (FLAGY);
   (action 1)
   FLAGY = '0'B;
END;
DO WHILE (FLAGN);
   (action 2)
   FLAGN = '0'B;
END;

Although such a result might be of interest if one is searching for the minimum number of control primitives necessary for programming, this is not the point of structured programming. On the contrary, the goal should be to provide a rich lexicon of control structures, as long as each such structure is separately useful and well understood--i.e. theoretically reducible to the basic structures by a simple chain of reasoning. There are several such alternatives or generalizations for the canonical forms which do no violence to the concept of structure.

The extension to generalized sequence is immediate:

[Flow chart: P1 → P2 → ... → Pn in sequence]

Similarly, n-way or generalized alternation

[Flow chart: an n-way branch on a test T, whose values v1, ..., vn select among processes P1, ..., Pn]

is easily seen to be equivalent to

[Flow chart: nested two-way alternations testing T = v1, T = v2, ..., T = v(n-1) in turn, selecting P1, ..., P(n-1), with Pn taken when no test succeeds]


Finally, the generalized iteration†

I f " PI --"tl ~P2 ---'-t2

t " P5 . . . . . . . Pn "-''m~i'n'-''m''Pi'1"l'l

J

is reducible to (note the new boolean variable G):

In " - - " - Q I " - - ' - G ? - - - " PI

~,m2 .Q;___._ .... .,)~

2 2 "--..~ 02---"2 " I

where QI and Q2 are assignments of "yes" and "no," respectively, to G.
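A sketch of this reduction in PL/I, for a body with two exit tests T1 and T2 (all of the names are invented for the illustration), might read:

   DCL G BIT (1);
   G = '1'B;                      /* Q1: set G to "yes" */
   DO WHILE (G);
      (P1)
      IF (T1) THEN G = '0'B;      /* Q2: set G to "no" and fall out */
      ELSE DO;
         (P2)
         IF (T2) THEN G = '0'B;
      END;
   END;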

2. WHY STRUCTURED PROGRAMMING?

2.1. Understanding programs

The principal rationale for structured programming is an improvement in the ease of understanding the dynamic process represented by a program. This extends along three fronts: comprehension by a programmer examining the program text, proof of correctness or debugging of the process, and explanation of the system function to a user.

2.1.1. Comprehension by a programmer. Comprehension of the (static) program text, in relation to the dynamic process it evokes, is improved because everything that logically (dynamically) precedes a given program point will also precede it physically in the source text, except possibly for ongoing iterations whose extents are also highly visible in the source. Or, as Dijkstra [3, 5] put it, one can define any state of a structured process as a stack of pointers to the text and iteration counters. That is, all one needs to know at any point in time during execution of the procedure is: where control is in the current lowermost block, where that block was invoked by the one above it, and so on, plus a loop counter for each iterative block. At any given level, a single such textual index thus suffices to define the state of progress, where otherwise a complete record of all branches taken would be necessary. Consequently, it becomes much easier to be sure, when reading the static text of the program, what dynamic context is assumed, i.e. "what do the variables mean when I get here--and by what path did I get here anyhow?"

Otherwise, if GO-TOs and labels are used freely, it can easily become necessary for a reader to understand the whole program simultaneously in order to understand any part of it. In a sense, it is the labels and not their correlative GO-TOs that are mainly at fault in such a case.

That is, the "arrive here from someplace" operation implied by a label is more dangerous within its environment than the "depart" is within its (immediate) environment. This is

I" This chart, with Px null, is called omega, by Brhm and Jacopini [6]. Kosaraju ]8] proved that omega. cannot be reduced to a composition of the three basic forms plus omega,,, m < n without introducing new auxiliary variables.


because any block of code presupposes, on entry, certain consistency relationships in the data structure--what we might call the structure of the structure as opposed to the state of the structure. A segment of code which embarks upon updating the state will often temporarily disturb the structure. A branch-out from the middle of a block may apparently leave things inconsistent, but that, by definition, is the problem of the point that receives control. (It may not be a point that presumes much anyway--for example, the very end of the program. The definition of many programs is that they either do something, in which case all sorts of relationships apply, or else print an error message, in which case practically everything is undefined.) A label, however, raises the question as to whether the expected structural relationships (or midtransition conditions) apply no matter where control is received from. Determining this may very well turn out to be an ever-expanding task.

2.1.2. Debugging and proving correctness. Convincing oneself or others that a program is correct, whether by the relatively informal process known as debugging or by rigorous proof, is much easier when the program is structured. For, following the same lines of structure inherent in the program, one can construct a framework of reasoning where each layer supports the next. The fact that process A, invoked by process B, is known to be correct is used to establish that B is correct--and so on, with obvious analogy to the familiar process of theorem-proving in other branches of mathematics. The actual order of testing or proof may be top-down or bottom-up, with slight differences in technique but not in substance. Certain known patterns of proof are peculiar to each of the three canonical forms: "linear" deductive reasoning for sequence, case analysis for alternation, and mathematical induction for iteration [2, 9, 10].

As a concrete example of how debugging can be simplified by structured programming, consider the endless loop, that ancient haunt of the programmer on a tight budget. With structured programming, such loops can arise only from improper iterative DO specifications, e.g. a WHILE expression that cannot become false within the (textually localized) range of the DO. Loops simply cannot arise as an unintended by-product of an ill-conceived set of GO-TOs. This one fact greatly reduces the amount of reasoning or instrumentation that must be applied to find the cause of a loop. In fact, it makes it practical for the language or a separate tracer to detect excessive looping semi-automatically with the aid of maximum iteration estimates.
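By way of illustration only, such an estimate could be checked at the top of each iteration; the condition name LOOPCHECK and the bound below are invented, and MORE_WORK stands for any WHILE expression:

   DCL ITER FIXED BIN INIT (0);
   DO WHILE (MORE_WORK);
      ITER = ITER + 1;
      IF ITER > 1000 THEN SIGNAL CONDITION (LOOPCHECK);   /* estimate exceeded */
      (process one item)
   END;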

2.1.3. Comprehension by a user. It is generally assumed that what is easy to program will be hard to use and vice versa. Surely this conflict, to the extent that it exists, arises not from logical necessity but rather from poor programming techniques. For a user must, as a practical matter, come to understand the spirit, the motivation, the underlying rhythms and patterns of the device he is using before he can really be facile with it. No amount of "functional" (black-box oriented) documentation, however comprehensive, well-written and well-organized, can substitute for this insight into what's going on underneath. In fact, the more functional documentation there is, the more obviously hopeless this means of comprehension!

An unstructured program can be a formidable thing to explain to a user, and for him to grasp. Its rhythms are aperiodic, its patterns irregular. Ironically, the worst sources of difficulty may very well be functions--often patched in after basic design--conceived specifically for user convenience. A structured program, by its nature, is a regular process at every level. Its organization lends itself to comprehension of the dominant (top-level) pattern first, with gradual deepening of knowledge as the need arises--and with no danger that new knowledge will surprise the old.


2.2. Constructing and adapting programs

There is some concern that it may be a bit harder to organize a program in structured form initially [9]. True, getting rid of GO-TOs may seem like losing one's only means of transportation at first, but it will be recalled that the no-GO-TO rule is not the essence of structured programming. Undoubtedly, many programs written without regard to the canons of structured programming have a layered organization and, though they contain GO-TOs, are resolvable or nearly resolvable into the composition rules and so are structured. In such cases, getting used to different expressions for the primitives is a trivial matter. And in any case, the discipline of structured programming establishes a framework for developing one's algorithm which, like the outline for a composition, speeds the work. For all the same reasons that make structured programs more comprehensible, the technique keeps one's thought processes orderly and lends assurance that all the ground is being covered.

Adaptation of a structured program can take one of two forms. Lower-level redesign is a backing up to the lowest level unaffected by the desired adaptation, and top-down reworking from there. Upper-level redesign is really the construction of a new system, using modules from the old. Either way, independence of the levels assures the continued integrity of the retained part of the system.

3. PROBLEMS WITH STRUCTURE IN PL/I

3.1. Basis of remarks--A note on elegance and efficiency

It has already been noted that structured and GO-TO-free programming is perfectly possible in PL/I as it stands, but that occasionally the price for strict adherence may be unreasonably high. This price may be in terms of efficiency, elegance or both. It is not, as one might suppose, easy to separate out these considerations for discussion. For while we may agree that elegance is generally more important than efficiency and that efficiency is largely a matter of the compiler and not the language, in practice the programmer must keep both in mind and thread a common-sense line between extremes. Moreover, it is becoming increasingly apparent that automatic optimization is a very difficult, and still very young, technology. Consequently, it is often more realistic to consider certain inefficiencies as properties of the language, or at least as inevitable consequences of the level of abstraction represented by the language, rather than as purely compiler-dependent. For example, an unfortunately large timing overhead often attaches to the procedure call mechanism. This phenomenon has several causes, some of them not particularly related to machine architecture. (Ironically, one of them is the precautions that must be taken just in case the current procedure is bypassed by a "blast out" GO-TO from a lower procedure to a higher.)

3.2. Boolean expression evaluation

In a boolean expression, it is often a matter of choice, for evaluation purposes, whether or not all constituents are evaluated and in which order. For example, in the expression (A < B) & (C(X) < D), either side of the & could be evaluated first, and if that subexpression were found to be false, it would be unnecessary to evaluate the other side. Such choices are not necessarily unimportant, because function references, e.g. the C(X), could


have side effects. However, the definition of PL/I† leaves these choices purposely unresolved, which is as good a way as any to discourage poor programming practices such as surprise side effects. The problem with this definition is the semantic dilemma posed by such expressions as

IF (e "-1 = N U L L ) & (P--,-X= Y) THEN DO;

In the case where P is NULL, the reference P -> X is invalid and so evaluation of the right-hand expression may result in an addressing or data specification fault during execution. In this case, it is fairly easy, though less concise, to rewrite the test as

IF P "-1 = N U L L THEN DO;

IF P-+X-- Y THEN DO;

(The first "DO ;" can be omitted if one doesn't mind dealing with a "dangling else".) How- ever, handling an ELSE clause in this case--or, equivalently, the I F . . . "or" case

IF P = NULL | P -> X = Y THEN DO;

is much worse, requiring repetition of code or an artificial indicator to rewrite in GO-TO-free form:

FLAG = (P = NULL);

IF ¬FLAG THEN FLAG = (P -> X = Y);

IF FLAG THEN DO;

Furthermore, when such a boolean appears in a WHILE clause:

DO WHILE (P = NULL | P -> X = Y);

an artificial indicator must be used and the code to set it must be repeated, either literally or via macros:

FLAG = (P = NULL);
IF ¬FLAG THEN FLAG = (P -> X = Y);
DO WHILE (FLAG);

   FLAG = (P = NULL);
   IF ¬FLAG THEN FLAG = (P -> X = Y);

END;

The only other alternative would be to replace the argument of WHILE by a procedure call, a generally unattractive solution because it destroys locality, that is, the property of having things, such as the clear text of an expression, where they are expected and most needed for reference. This is a kind of logical overhead, and added to it is the textual verbosity and aforementioned timing overhead for procedures. What seems to be needed is either a change in the language definition, the introduction of expression notation (discussed later in this section), or a different form of expression--perhaps using different operators for controlling the order and circumstances under which an expression is evaluated.
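For completeness, the procedure-call alternative just dismissed would look something like the sketch below (using the P, X and Y of the earlier examples); note that the clear text of the test disappears from the WHILE clause:

   TEST: PROC RETURNS (BIT (1));
      IF P = NULL THEN RETURN ('1'B);
      RETURN (P -> X = Y);
   END TEST;

   DO WHILE (TEST());
      ...
   END;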

† Specifically, the evolving ANSI standard [11], (presumably) Multics [12] and the Optimizer [13]. The older F compiler [14] definition was that all subexpressions were always evaluated though the order was undefined; for purposes of this discussion, this was no better.


3.3. ON, REVERT, Block level and error exits

ON and REVERT are tied closely to block level in PL/I, in that:

(1) Executing an ON for a given condition pushes the condition action stack only on the first such execution in a block; subsequent ONs in the same block result in replacement of the top stack element;

(2) REVERTs in a given block affect only corresponding ONs executed in the same invocation of the same block;

(3) All ONs executed in a block and still active when the block is terminated are automatically REVERTed at that time.

The effect of (2) and (3) is to disallow procedures whose function is to perform some standard set of ONs or REVERTs for the benefit of the calling program. In structured programming, one should be able to collect together any set of functions into a new primitive function; from this point of view it would seem that the present ON/REVERT definition is inimical to structured programming.
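For example, one might wish to write a small procedure such as the following sketch (the name is invented) to establish a standard error action on behalf of its caller; by rule (3) the ON is reverted as soon as the procedure returns, so the call accomplishes nothing:

   SET_HANDLER: PROC;
      ON ERROR BEGIN;
         PUT SKIP LIST ('ERROR DETECTED');   /* intended standard action */
      END;
   END SET_HANDLER;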

The reason ONs and REVERTs work as they do seems to be related to the possibility of writing GO-TOs in the ON-unit. In the case where the ON-unit simply does some processing, not relying upon the environment of the procedure that executed the ON, and returns normally, i.e. returns to the point of interrupt or its immediate vicinity (we will call this the "polite interrupt" case), there doesn't seem to be any reason why ON and REVERT could not act in a simple push/pop fashion without regard to block level. After all, the ON-unit is a block in its own right, separate from the block which executes the ON, and thus can be invoked at any time. But if the ON-unit makes use of the scope rules to access or alter the environment of the block which executed the ON--in particular, if it exits to or through that environment by a GO-TO (we will call this a "rude interrupt")--then it is clear that the block must be still active if the designated condition is raised before the ON is REVERTed. As it is, this is assured by the implied REVERTs at block termination.

Furthermore, the fact that an ON-unit cannot RETURN or REVERT at the same level as the procedure which executed the ON statement often leaves the GO-TO as the only means of returning to that level. For example, it is desirable, on entry to a program, to specify that whenever an error occurs, certain variables are to be printed out and then whatever error action was in effect at the time of program entry (presumably set up by the calling program) is to be invoked, and so on back to the top-level procedure. All codings involve GO-TOs. Perhaps the best, because at least the GO-TOs can be masked by means of macros, is

ON ERROR SNAP GO TO A;

GO TO B;

A: REVERT ERROR;

(Print variables)

SIGNAL ERROR;

STOP; /* SHOULDN'T BE REACHED */

B: . . .


However, the coding

ON ERROR SNAP BEGIN;

REVERT ERROR;

(Print variables)

SIGNAL ERROR;

END;

would obviously be preferable, if only it would work as intended.

The question can fairly be raised whether ONs are not in themselves inimical to structured programming. If so, we have arrived at an impasse: the utility of interrupts, and ONs in particular, is well established. Clearly, rude interrupts to any arbitrary program point are unstructured, as are polite interrupts which affect data in ways unexpected by the process subject to interrupt. (In general, an interrupted process need not distinguish between a rude interrupt and a polite interrupt which provides a fix-up.) It is, perhaps, unreasonable to expect the language to exercise appropriate control over data access.

In summary, there seem to be several possible ways out of the present difficulty. One would be to disallow rude interrupts altogether and allow free push-popping of a condition-procedure stack, e.g.

STACK ERRFIX FOR ERROR;

UNSTACK FOR ERROR;

On the other hand, rude interrupts (such as error exits) are highly useful and, if constrained to block termination, are not unstructured. The flow chart resembles generalized iteration, with implied tests after each statement. But to allow these, conditions must be tied to block level (including DO-group level) even more closely and explicitly than at present. The possibility of making the ON part of the DO statement is further discussed under Continuous loop control below. It should also be noted that this solution is incompatible with the first, in order to maintain proper stack discipline, at least for the same conditions. If we rule out the possibility of separating the conditions into two disjoint classes according to allowed stacking procedure, then even the polite interrupt specification must be tied to block level, e.g.

DO ... INVOKING ERRFIX(X) FOR ERROR ...;

END;


3.4. Continuous loop control

The lack of a continuously-applied loop control mechanism or exit facility (as in BLISS [15-17] and PASCAL [18]) can mean an unnatural "diving" in block level as successive conditions arise which may terminate the loop. Repetition of code or tests or both may also be necessary to avoid GO-TOs. Consider, for example, the process "read and print cards until there are no more cards." There are a number of ways of coding this simple process in PL/I without using GO-TOs. Preceding declarations and the statements

ON ENDFILE(CARDS) INPUT_EXISTS = FALSE;

INPUT_EXISTS = TRUE;

are assumed in each of the following examples:

DO WHILE(INPUT_EXISTS); (1)

READ FILE(CARDS) INTO (CARD);

IF INPUT_EXISTS THEN DO;

PUT EDIT(CARD)(SKIP,A);

END;

END;

READ FILE(CARDS) INTO(CARD); (2)

DO WHILE(INPUT_EXISTS);

PUT EDIT(CARD)(SKIP,A);

READ FILE(CARDS) INTO(CARD);

END;

GOTACARD: PROC RETURNS (BIT(1)); (3)

READ FILE(CARDS) INTO(CARD);

RETURN(INPUT_EXISTS);

END;

DO WHILE(GOTACARD());

PUT EDIT(CARD)(SKIP,A);

END;

It should be borne in mind that the READ and PUT, while single statements in this example, are representative of any process, perhaps comprising many statements. This is the reason for the technically unnecessary DO-END around the PUT in example 1. On the whole, this first coding is probably the best available in PL/I, but there are at least two faults that can be found with it. The first is that the loop continuation criterion (INPUT_EXISTS) must be written twice. (The associated testing code is thus compiled


into two places in the program, and the test actually made twice each execution of the loop. Normally, these are insignificant inefficiencies, however.) This rewriting affords an extra opportunity for error and is basically inelegant (i.e. unduly complex) insofar as the simple, though somewhat imprecise, English definition of the process needed only one statement of the termination condition. The second fault is a result of the first: the PUT statement, while logically at the same level as the READ (again, refer to the English definition), must be relegated to a hierarchically lower block. This effect can continue to any depth. If, for example, there are several kinds of cards to be read, and thus several different READ statements in the loop, an IF INPUT_EXISTS and a drop in level would follow each one for the remainder of the loop.
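To make the point concrete, a version of coding 1 that must read two kinds of cards per cycle (the variable names are invented, and declarations are assumed) already descends two levels:

   DO WHILE (INPUT_EXISTS);
      READ FILE (CARDS) INTO (HEADER_CARD);
      IF INPUT_EXISTS THEN DO;
         READ FILE (CARDS) INTO (DETAIL_CARD);
         IF INPUT_EXISTS THEN DO;
            PUT EDIT (HEADER_CARD, DETAIL_CARD) (SKIP, A, SKIP, A);
         END;
      END;
   END;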

Coding 2 is also unsatisfying on two counts. The first is the repetition of the READ statement, objectionable on the same grounds as the repetition of the loop test. Secondly, the order of the PUT and READ in the loop is the reverse of the natural order--once again referring to the English definition of the process. In reading the program, one must come to understand that the first part of the loop goes either with the initialization or with the second half of the previous loop, as appropriate. Furthermore, in the previously mentioned case of several card types to be read, the method is not easily extended in any coherent way.

The third coding avoids any repetition. Nevertheless, this is probably the least satisfactory coding. First, there is the textual and timing overhead introduced by the procedure definition. Second, to make the READ part of the process a side-effect of the termination test is to obscure the essential characteristics of the process for anyone trying to read the code.

This problem is closely related to that of error exits, already mentioned. What we wish to write, of course, would be something like

DO TERMINATING ON ENDFILE (CARDS);

READ FILE (CARDS) INTO (CARD);

PUT EDIT (CARD)(SKIP,A);

END;

That is, a "rude" interrupt could be attached to the DO-group, such that termination occurs when the condition is raised. This is a practical possibility because the PL/I CONDITION data type has the right characteristics (boolean, reset to false after testing) and can be set only at points known to the compiler. A general continuous test, e.g. TERMINATING ON(X -t- l r < A), is probably not feasible, at least not with present hardware.

3.5. End-of-loop control

The lack of a control expression with the loop test at the end of the loop sometimes forces odd or at least less-than-concise and therefore error-prone code. One example† is the case where a routine R is to be called until it reports no errors in an error counter ERR:

ERR = 1; /*ILLOGICAL BUT NECESSARY TO FORCE FIRST CALL*/

DO WHILE (ERR>0);

CALL R;

END;

† Due to J. A. Clapp.


or perhaps:

GO = '1'B;

DO WHILE (GO);

CALL R;

GO = (ERR > 0);

END;

neither of which is as easy to follow as something like†

DO WHILE_END (ERR>0);

CALL R;

END;

3.6. Local procedure call

As noted previously, the execution time overhead for a procedure call in PL/I discourages the use of procedures for small functions, especially functions to be invoked in tight inner loops, or called only once--i.e. defined for formality only, for the sake of better program structure and clarity. A large part of this dilemma would disappear if the language would recognize the concept of a procedure that did not require the environment to be pushed down, but rather only a save-return-address. Implementing the concept of a COBOL "SECTION", that can be invoked by a "PERFORM sectionname", would be one way to accomplish this. Another approach is that taken by the Multics PL/I compiler, which automatically recognizes "quick" internal procedure or BEGIN blocks that need no new environment [19]. As a matter of aesthetics, it seems better to recognize explicitly, right in the language, the important difference between these "white box" procedures and those that are definitely "black box". On the other hand, the arguments are not unmixed: whiteness and blackness, after all, are but two extremes on the scale of grayness.
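As an illustration, a small internal procedure of the following sort, defined purely to give one step of an algorithm a name, is just the kind of "white box" for which a full environment push is wasted (the array A, the bound N and the outer loop variable are assumed to be declared in the containing procedure; all names are invented):

   EXCHANGE: PROC (I);                        /* a small "white box" helper */
      DCL I FIXED BIN;
      DCL T FIXED BIN;
      T = A(I);  A(I) = A(I + 1);  A(I + 1) = T;
   END EXCHANGE;

   DO I = 1 TO N - 1;                         /* tight inner loop */
      IF A(I) > A(I + 1) THEN CALL EXCHANGE (I);
   END;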

3.7. Expression notation

One further enhancement to structure would be to permit it within expressions. As realized in "expression" languages such as AED [20, 21] and BLISS [16, 17], this is a matter of defining a value for "statements" and allowing these in expressions. For example, one could write

R = IF X = A THEN -1;

ELSE IF X = B THEN 1;

ELSE 0;

† Perhaps the symbols WHILE and WHILE_END ought to be replaced by ?* and *?, respectively, the better to suggest the relative order of the body of the loop and the test. Maybe APL has the right idea after all!


which properly highlights the fact that R is being assigned and is thus more perspicuous than

IF X = A THEN R = -1;

ELSE IF X = B THEN R = 1;

ELSE R = 0;

One implication of such an extension would be that the assignment operator and equality comparison operator would have to be distinguished--something that is probably desirable anyway.

Depending on how it was realized, structure within expressions might eliminate the problem of boolean expression evaluation already mentioned.

3.8. CASE construction

The lack of a CASE statement means that

GO TO labelvariable(i);

must be used for efficiency in the situation where the program path is the result of a simple computation. In other CASE situations, PL/I already has the IF ... THEN ... ELSE construction, which, if THEN IFs are avoided (without loss of generality, as a DO may always be introduced after the THEN), can be used in the canonical sequence:

IF (condition) THEN (action);

ELSE IF (condition) THEN (action);   }
ELSE IF (condition) THEN (action);   }  0 or more of these

ELSE (action);                       }  0 or 1

This arrangement suggests the inherent parallelism of generalized alternation, but not as well as a CASE statement. A reasonable syntax might be

CASE [(exp0)];

   (exp1) stmt1;

   (exp2) stmt2;

   (expn) stmtn;

END;

[ELSE stmte;]


where exp0 defaults to '1'B, and execution proceeds as follows: each of the expi, i = 1, ..., n, is evaluated in turn until expi = exp0, whereupon stmti (which may be simple or a DO-END) is executed. No further expi are evaluated nor statements executed. If exp0 equals no expi, then stmte is executed if the ELSE is present; otherwise no statement is executed.
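Under this proposal, the assignment example of Section 3.7 might be written as below; this is purely a sketch of the suggested notation, not existing PL/I:

   CASE (X);
      (A) R = -1;
      (B) R = 1;
   END;
   ELSE R = 0;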

3.9. Coroutine linkage

Like most programming languages, PL/I does not recognize the concept of a coroutine.

This concept is, or should be, important to structured programming because an ideally structured program is independent not only of the internal characteristics of the primitives it calls upon (downward independence) but also of what the process is doing that calls upon it (upward independence). By extension, this would seem to include not even presuming what is above and what is below, but rather only what is input and what is output. For example, a lexical scanner, whose input is single characters and whose output is words or "tokens", is commonly written as a subroutine to be called by a parser whenever a new word is desired from the input. However, there is no special reason why the scanner should not be the driver, calling the parser whenever a word is ready to be processed. Despite this equivalence, a scanner as subroutine and a scanner as driver would be coded quite differently. In fact, the scanner and parser (and for that matter the basic I/O and semantic processing routines) are logically coroutines--each waiting for an input or inputs to be available, processing them to produce an output, waiting for the output to be taken away, and then repeating.
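A hedged sketch of the two shapes (all names are invented) shows how differently the same scanner reads as a subroutine and as the driver:

   /* Scanner as subroutine: the parser pulls a token when it wants one */
   NEXT_TOKEN: PROC RETURNS (CHAR (32) VARYING);
      DCL TOKEN CHAR (32) VARYING;
      /* assemble characters into TOKEN */
      RETURN (TOKEN);
   END NEXT_TOKEN;

   /* Scanner as driver: the scanner pushes each token at the parser */
   SCAN: PROC;
      DCL TOKEN CHAR (32) VARYING;
      DO WHILE (INPUT_EXISTS);
         /* assemble characters into TOKEN */
         CALL PARSE (TOKEN);
      END;
   END SCAN;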

At best, the language could be extended by tagging of variables, continuous WHILEs or some such construct which would activate a process as soon as its inputs were available. At second best, an explicit coroutine linkage could be devised. At third best, those aspects of code which relate to whether the procedure is called or is calling could be made small and centralized.

The first of these is the approach taken in SIMPL/I, an extension† to PL/I for process-oriented simulation [22]. Similar concepts appeared much earlier in SIMULA 67 [23, 24], an ALGOL derivative.

3.10. Language deletions

Lest we appear only to be adding more "buttons, switches and handles" to the cockpit of Dijkstra's [25] nightmare PL/I airplane, let us hasten to list those features that would be rendered unnecessary by the extensions discussed, and that should therefore be deleted in the name of structured programming. Needless to say, many other features of the language could and perhaps should be simplified or perhaps omitted, but that is another subject.

Let us delete, therefore, while we are young: (1) the GO TO statement; (2) the ON statement; (3) the REVERT statement; (4) the ENTRY statement and (5) the RETURN statement. Labels should be retained for PROC naming and other control point identification, but not to affect control nor for multiple-closure END statements. The END statement can be altered to perform the valued RETURN for a function procedure, i.e.

END (exp);

where exp is an expression.

† Currently implemented by IBM as a preprocessor to the Optimizer and Checkout Compilers.


Acknowledgements--The author has had the benefit of many discussions with individuals interested in the subject area, many of whom also reviewed early drafts of this paper. These included MITRE colleagues William Amory, Richard Bullen, Lorna Cheng, Judith Clapp, Stanley Cohen (now of Index Systems, Inc.), Carl Engelman, Robert Fleischer, Leonard LaPadula, Steven Lipner, Dr. Barbara Liskov, Dr. Jonathan Millen, David Miller, Dr. Leroy Smith, and Robert Yens; Dr. John Goodenough (now of SofTech, Inc.) and Major Roger Schell of ESD, the sponsoring agency; and Dr. Henry Ledgard and Amos Gileadi of the University of Massachusetts. In addition, Nancy Ansehuetz of MITRE's Department D-73 Research Center assisted greatly in the gathering of pertinent information, on a continuing basis.

REFERENCES
1. E. W. Dijkstra, The structure of the "THE"-multiprogramming system, Communs Ass. Comput. Mach. 11, 341-346 (1968).
2. H. D. Mills, Top down programming in large systems, Debugging Techniques in Large Systems, R. Rustin (Ed.), pp. 41-55, Prentice-Hall (1971).
3. E. W. Dijkstra, Notes on Structured Programming, Technische Hogeschool Eindhoven (1969).
4. E. W. Dijkstra, Structured programming, Software Engineering Techniques: Report on a Conference Sponsored by the NATO Science Committee, J. N. Buxton and B. Randell (Eds.), 84-88 (1970).
5. E. W. Dijkstra, GO TO statement considered harmful, Communs Ass. Comput. Mach. 11, 147-148 (1968).
6. C. Böhm and G. Jacopini, Flow diagrams, Turing machines and languages with only two formation rules, Communs Ass. Comput. Mach. 9, 366-371 (1966).
7. D. E. Knuth and R. W. Floyd, Notes on Avoiding "GO TO" Statements, Stanford University CS 148 (1970).
8. S. R. Kosaraju, Analysis of structured programs, Proc. of the 5th Annual ACM Symposium on the Theory of Computing (1973).
9. B. H. Liskov and E. Towster, The Proof of Correctness Approach to Reliable Systems, MITRE Corp. Tech. Rep. ESD-TR-71-222 (1971).
10. R. L. London, Certification of algorithm 245, Treesort 3, Communs Ass. Comput. Mach. 13, 371-373 (1970).
11. European Computer Manufacturers Association (ECMA) and American National Standards Institute, PL/I BASIS/1, ECMA.TC10/ANSI.X3J1 (to be published).
12. R. A. Freiburghouse, The Multics PL/I compiler, AFIPS Conf. Proc. 35, 187-199 (1969).
13. OS PL/I Checkout and Optimizing Compilers: Language Reference Manual, IBM Order No. SC33-0009, 3rd edn (1972).
14. IBM System/360 Operating System, PL/I (F) Language Reference Manual, IBM Order No. GC28-8201, 5th edn (1972).
15. W. A. Wulf, D. Russell, A. N. Habermann, C. Geschke, J. Apperson, D. Wile and R. Brender, BLISS Reference Manual, DECUS Program Library 10-118 (1971).
16. M. G. Manugian (Ed.), A Collection of Readings on the Subject of BLISS-10, Digital Equipment Corp., DECUS Program Library 10-118 Part II (1971).
17. W. A. Wulf, D. B. Russell and A. N. Habermann, BLISS, a language for systems programming, Communs Ass. Comput. Mach. 14, 780-790 (1971).
18. B. L. Clark and J. J. Horning, The system language for project SUE, SIGPLAN Notices 6, 79-87 (1971).
19. B. L. Wolman, Debugging PL/I programs in the Multics environment, AFIPS Conf. Proc. 41, 507-514 (1972).
20. D. T. Ross, Introduction to Software Engineering with the AED-0 Language, MIT Rep. ESL-R-405 (1969).
21. C. G. Feldmann (Ed.), D. T. Ross and J. E. Rodriguez, AED-0 Programmer's Guide, MIT Rep. ESL-R-406 (1970).
22. SIMPL/I (Simulation Language Based on PL/I) Program Reference Manual, IBM Program Product 5734-XXB (1973).
23. A. Wang and O. J. Dahl, Coroutine sequencing in a block structured environment, BIT 11, 425-449 (1971).
24. O. J. Dahl and K. Nygaard, SIMULA--an ALGOL-based simulation language, Communs Ass. Comput. Mach. 9, 671-678 (1966).
25. E. W. Dijkstra, The humble programmer, Communs Ass. Comput. Mach. 15, 859-866 (1972).