This article was downloaded by: [University of Toronto Libraries]
On: 31 October 2014, At: 16:56
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

International Journal of Computer Mathematics
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gcom20

AEPL: An extensible programming language
Jacob Katzenelson (a) & Elie Milgrom (a)
(a) Technion - Israel Institute of Technology, Haifa, Israel
Published online: 21 Dec 2010.

To cite this article: Jacob Katzenelson & Elie Milgrom (1975) AEPL: An extensible programming language, International Journal of Computer Mathematics, 5:1-4, 3-35, DOI: 10.1080/00207167508803100

To link to this article: http://dx.doi.org/10.1080/00207167508803100

Intern. J. Computer Maths. 1975, Vol. 5, Section A, pp. 3-35 © Gordon and Breach Science Publishers Ltd. Printed in Great Britain

AEPL: An Extensible Programming Language†
JACOB KATZENELSON and ELIE MILGROM‡
Technion-Israel Institute of Technology, Haifa, Israel

This paper presents an extensible programming language (AEPL) which has been designed as a tool for the implementation of a large class of languages for specific applications. AEPL includes a powerful data definition facility which enables one to define data structures and new types of data elements as well as new operators to manipulate both new and old data elements. A syntax driven parsing scheme derived from the Markov Algorithm makes it possible to control the syntax of the language dynamically, thus allowing one to define new language structures such as expressions and statements. As a result of the method used for syntactic definition, languages obtained by extension of AEPL belong to the general class of phrase structure languages, i.e. they are not restricted to the class of context-free languages.

AEPL was extended to produce a language for the manipulation of linear graphs. Our experience indicates that such extensions are fairly simple and, in many cases, straightforward.

Computing Reviews Categories: 4.12, 4.13, 4.20, 4.22.
Key words and phrases: extensible languages, Markov Algorithms, phrase structure languages.

1. INTRODUCTION

The extensible programming language AEPL has been designed as a tool for the implementation of a large class of problem-oriented languages, or languages for specific applications. The reason for such a goal is our belief that numerous areas of human interest generate problems which can be solved with the aid of a computer. We also believe that being able to approach these problems using languages which are close to the terminology and the methodology of the respective areas is a significant advantage: it enables a user to think in familiar terms and it liberates him from the burden of extraneous detail. This has been the reason for the uneconomic proliferation of a large number of programming languages, each more or less well adapted to the solution of a particular class of problems (see [20] for a survey of a number of problem-oriented languages). Extensible languages propose to cover wide areas of application at lesser cost and greater convenience. A detailed description of a large number of current extensible languages and systems can be found in [23], together with an extensive bibliography of the area. Many of the best known extensible language schemes are described in PI, [3], PI, [5], [7], PI, [10], [16] and P11.

†Based in part on a thesis submitted by the second author to the Faculty of Electrical Engineering of the TECHNION in partial fulfillment of the requirements for the degree of Doctor of Science in Technology.

‡Unité d'Informatique, Institut de Mathématique Pure et Appliquée, Université Catholique de Louvain, 2 Chemin du Cyclotron, B-1348 Louvain-La-Neuve, Belgium.

At the present time, we do not believe that many of the existing extensible languages can reasonably claim to replace all existing general-purpose and special purpose languages, mainly for reasons of efficiency. We concede therefore that the usefulness of AEPL will be greatest for application areas which do not warrant the cost of a specially written compiler and where the matter of efficiency is relatively unimportant. Another possible use for AEPL is during the design phase of a new application language: AEPL provides a rapid and cheap way to experiment with different versions of a proposed language.

We believe that the major innovations present in AEPL are the treatment of sets, used to create data structures and to define new data types, and the use of a powerful syntax description mechanism derived from the Markov Algorithm. We think also that most of the power of the system stems from its particular architecture and the concept of a special machine or processor which embodies the semantics of the language.

In this paper, we have chosen to describe the main concepts underlying the language rather than to explain its features in detail. We believe that these concepts are more important than their syntactic implementation, which consists of a large number of details that need to be known only to someone who is actually writing programs in AEPL. A full description of the language may be found in [16].

Section 2 of this article presents the general objectives on which the design of AEPL was based. Section 3 describes the overall model of the AEPL system; section 4 introduces the data structure concepts and the semantics of the data definition facility. These three sections have appeared in [17] in a somewhat different form; they are included here to make the paper a complete overview of the system. Besides, the understanding of these sections is a prerequisite to understanding the rest of the paper.

Section 5 describes the processor and the structure of the programs which drive the processor; section 6 presents the translator, the parsing algorithm and the structure of the grammar rules. Finally, a short example of the use of AEPL is discussed in section 7.

2. GENERAL DESIGN OBJECTIVES.

During the design phase of AEPL, we tried to remain consistent with a number of general concepts and ideas which we discuss in this section.

2.1. Extensibility

The three main aspects of extensibility which we set out to provide were the ability to define new types of data items, the ability to define new operations on old or new data items, and the possibility to modify extensively the syntactic frame of the language. The AEPL system was designed so as to present itself to the users as a language, sometimes called core or kernel language, which includes a number of basic data types, a number of operators for these data types and a syntactic frame within which one can describe sequences of operations on data, i.e. programs. The core language includes also the tools which enable one to modify these basic constituents and create "extended languages". Note, however, that the adjective "extended" does not necessarily imply addition of features to the core language: one can use the extension mechanisms to produce a language which is less rich than the kernel by deletion of undesired features.

2.2. Minimality

In the design of an extensible language, one is tempted to limit the number of primitive language features to the bare minimum and to rely on extensibility for the creation of useful languages from the original core language. While the precise definition of a minimum set of features is a problem in itself, it is clear that the emphasis on minimality leads to kernel languages which are so primitive and involuted that their use is difficult: they have to be drastically extended in order to be of any practical use.

The design of AEPL is a compromise between a desire to keep the number of features of the kernel as low as possible and the requirement that the language be a fairly convenient programming tool.

2.3. Generality and completeness

Rather than emphasizing minimality, our approach has been to try to limit the number of primitive concepts, not the number of built-in language features. For that purpose, we tried to isolate a few very general ideas regarding data structures and syntax and to implement them in a language which would respect the concept of completeness as expressed by Reynolds in [19]: any value or class of values which is permitted in some context of the language should be permissible in any other meaningful context. This makes the language very regular: the number of special cases and particular conventions is greatly reduced. We believe that this is an important feature for an extensible language, since it reduces the number of possible inadvertent violations of the language rules.

3. THE MODEL

The AEPL system is composed of three parts: a core language, a processor and a translator.

The AEPL core language is a relatively small language which resembles Algol 60 in the sense that it includes a number of basic expression and statement forms (including declarations) and that the name-scoping of its variables is governed by an Algol-like block-structure. It differs from Algol 60 in the following aspects :

the primitive data items manipulated in AEPL are not the integer numbers, real numbers, arrays, etc. of Algol 60, but so-called t-values and objects as described below,

the AEPL core language contains a data definition facility which enables the user to define and manipulate new data structures,

the AEPL core language includes a number of facilities for modifying its own translator, thereby allowing an extensive syntactic variability.

The AEPL processor is a machine which operates on data structures of a particular kind, namely executable data structures called programs. Programs may be created and operated upon by the user in the same way as any other data structures. Programs are distinguished only by the fact that if the AEPL processor is applied upon them or, more precisely, if control is transferred to a program data structure, a number of actions will be performed by the processor.

The AEPL processor recognizes 63 different kinds of programs, i.e., the processor is a machine with a repertoire of 63 different instructions. It is possible to combine a number of such programs into a compound data structure; control can then be transferred to this structure and the processor will then execute the different actions specified by the individual programs in a well-defined order.

The AEPL translator is a program for the AEPL processor whose purpose is to transform an input string of characters into another data structure according to the rules of an MMA grammar (see below). At certain points of the translation, control may be transferred from the translator program to certain parts of the generated structure, thereby yielding "execution" of the transformed text by the processor. The AEPL translator is composed of a lexical scan and a parsing phase. The parser consists of a parsing algorithm derived from the Markov Algorithm [13] which includes a modifiable grammar which "drives" the algorithm. The source text submitted by a user may contain statements whose execution affects the grammar by addition or deletion of rules. This feature is used to modify the syntax of the language: one may add new operators, new kinds of expressions, new types of statements dynamically; it is also possible to redefine (overload) or delete existing language structures.
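To make the idea of a grammar that "drives" the algorithm and can be changed at run time more concrete, here is a minimal Python sketch of the classical Markov algorithm: an ordered list of rewriting rules, where the first applicable rule rewrites the leftmost occurrence of its pattern and the scan then restarts from the first rule. This is only an illustration of the underlying idea, not the MMA grammar or the AEPL parser; the names `Rule` and `run_markov`, the step limit and the unary-arithmetic rules are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One rewriting rule: replace `pattern` by `replacement`."""
    pattern: str
    replacement: str
    terminal: bool = False  # a terminal rule stops the algorithm after it fires


def run_markov(rules, text, max_steps=10_000):
    """Apply an ordered rule list to `text` in the style of a Markov algorithm."""
    for _ in range(max_steps):
        for rule in rules:
            pos = text.find(rule.pattern)
            if pos >= 0:
                # Rewrite the leftmost occurrence of the first applicable rule.
                text = text[:pos] + rule.replacement + text[pos + len(rule.pattern):]
                if rule.terminal:
                    return text
                break  # restart the scan from the first rule
        else:
            return text  # no rule applies: the algorithm halts
    raise RuntimeError("step limit reached; the rule set may not terminate")


# The grammar is ordinary data, so it can be modified between runs.
grammar = [Rule("+", "")]                   # unary addition: drop the plus sign
total = run_markov(grammar, "||+|||")
unchanged = run_markov(grammar, "|||-||")   # no rule mentions "-" yet

# "Extending the language": add rules for unary subtraction at run time.
grammar += [Rule("|-|", "-"), Rule("-", "")]
difference = run_markov(grammar, "|||-||")
```

Because the rule list is an ordinary data structure, adding or deleting rules changes what the same driver accepts and produces, which is the property the translator relies on.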

In conclusion, one may view the AEPL system as consisting of a program (the translator) executed on a special machine (the processor). The translator transforms the input into several data structures. A certain number of those data structures can be interpreted as instructions for the processor and control can be transferred to them. If the input contains the appropriate commands the execution of the corresponding data structures by the processor will modify the translator: the language will have been extended.

The translator program is present in the memory of the processor together with the generated data structures, unless those have been deleted by specific commands. Thus, at any instant of time, the "run time environment" of a user's program consists of the whole AEPL system augmented by the programs which were executed in the past and the data structures resulting from the execution of those programs. This approach is similar to that of languages such as LISP and BALM [9].

Since the processor is implemented conceptually as a program executing on an existing computer, the AEPL system can be considered to be interpretative.

4. DATA STRUCTURES

4.1. Principles

One of our aims has been to create in AEPL a simple but powerful data definition and manipulation facility which would allow us to handle a wide class of data structures. This facility should be powerful in order to enable the user to define complex data organizations; it should however be simple enough to understand and to use. This last point requires that the data structure facility be based on a small number of well-chosen primitives.

Another design decision which has been made regarding AEPL is the total separation between data structures as conceptual organizations of data and storage structures or representations of data structures in memory. At present, the user is provided with a flexible data structure manipulation system, but he has no control over the way the structures are represented in memory.

It is clear that an algorithm can be specified and checked out for logical flaws without reference to memory representation. Indeed, when a complex algorithm is designed, it is common practice to clear the main issues and to avoid excessive detail by specifying the data structures first and postponing decisions regarding memory structures to a later stage. On the other hand, it is certain that the efficiency of any algorithm depends on memory representations of the data structures. Therefore, in its current form, AEPL is a tool which is useful in the first stage of the design of algorithms. Using AEPL, one can verify and debug an algorithm in terms of its logic rather than in terms of its storage structures. After the debugging phase, however, it may be necessary to modify the default storage structures in order to increase the efficiency of the algorithm. At this stage, it is certainly easier to experiment with new storage structures, since one is at least almost certain that the logic of the algorithm is correct.

A complete programming system such as the one we aimed at should also provide means for controlling and checking the memory representations. This requires an implementation specification language which would allow the specification of storage structures by addition of statements to a program rather than by the modification of the program. This idea is not new: it has been proposed by Balzer [1], Schwartz [22] and Earley [6] among others.

4.2. Basic data elements

There are two kinds of data elements in AEPL: t-values (terminal values) and objects. Both kinds are strongly interrelated.

t-values are entities which can be used as values (in the sense described below) of attributes of objects. Examples of t-values are integer numbers, character-strings, sets of integer numbers.

Objects are entities to which six t-values are associated in the following way: We say that an object possesses six attributes, named respectively:

name-, value-, mode-, type-, scope- and rule-attribute.

Each attribute may possess a value, which is necessarily a t-value. If an attribute of an object possesses no value at some point in time, it is said that its value is undefined. It is possible to enquire about the value of any attribute of any object, and to modify that value.

Another way of looking at this would be to say that one object describes particular relationships between the six t-values which are the values of its attributes. The nature of these relationships will be explained below.

T-values. The AEPL system provides the following kinds of t-values:

atomic t-values: integers, reals, character strings, labels and references;

compound t-values or, in our terminology, sets: explicit sets or E-sets, and conceptual sets (C-sets, R-sets, P-sets, U-sets, I-sets, F-sets and the primitive sets).

The primitive sets are: the set of all integer t-values, the set of all real t-values, the set of all character string t-values, the set of all label t-values, the set of all reference t-values.

Although the term "set" is used, the concept is not in every case the same as the one used in mathematics. Some of the AEPL sets are ordered and may contain the same element many times; other sets (e.g. the primitive sets) correspond precisely to the mathematical notion of set: an unordered collection of distinct elements.

Classes of t-values. The set of all integer t-values is sometimes called the class of all integer t-values; similarly, the other primitive sets are primitive classes. The term "class" is used for a set which specifies the "kind" or "type" (in the Algol 60 sense) of a t-value. The primitive classes are available in the kernel language; other classes can be formed by means of the extension facilities.

Among the five primitive classes, only the class of reference t-values needs further explanation.

References. A reference is a t-value which designates an object in a unique way. One of the ways to gain access to the attributes of an object is by using a reference to that object. We do not concern ourselves with the implementation of such references: the important fact is that for every reference t-value there exists one and only one object which is referred to by that t-value. The reference concept is a generalization of the pointer concept which does not imply any particular implementation.

Sets. As mentioned above, a set, in AEPL, is a collection of t-values which is itself a t-value. Sets are used:

to create aggregates of t-values,

to define new classes of t-values.

AEPL distinguishes between two kinds of sets: explicit sets and conceptual sets.

An explicit set is a finite ordered collection of t-values which are effectively present in the system. Such sets correspond to the usual programming concepts of vector, list or sequence. An example is the explicit set composed of the integer t-values one, two and three, in that order.

A conceptual set is a collection of t-values which is defined implicitly. It may be finite or infinite, ordered or unordered. Such a set is defined by a predicate P: it consists of all the t-values for which the predicate is true. In mathematical notation:

$\{\, x \mid P(x) \,\}$

Examples are the set of all integer t-values, the set of all character strings beginning with the letter A, and the set of all prime numbers smaller than 100.
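The distinction between the two kinds of sets can be sketched in Python as follows. This is only an illustrative model under our own assumptions, not AEPL notation: the class names `ExplicitSet` and `ConceptualSet` and the helper `is_prime` are hypothetical, an explicit set is modelled as a stored tuple, and a conceptual set is modelled as nothing more than a membership predicate.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ExplicitSet:
    """Finite, ordered collection whose components are actually stored."""
    components: tuple

    def __contains__(self, value):
        return value in self.components


@dataclass(frozen=True)
class ConceptualSet:
    """Set given only by a predicate; its members are never enumerated."""
    predicate: Callable[[Any], bool]

    def __contains__(self, value):
        return self.predicate(value)


def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


# The explicit set composed of one, two and three, in that order.
one_two_three = ExplicitSet((1, 2, 3))

# Two of the conceptual sets mentioned in the text.
strings_starting_with_a = ConceptualSet(
    lambda v: isinstance(v, str) and v.startswith("A"))
primes_below_100 = ConceptualSet(
    lambda v: isinstance(v, int) and v < 100 and is_prime(v))
```

Membership in the explicit set is decided by looking at stored components, whereas membership in a conceptual set is decided by evaluating the predicate, so an infinite set costs no more to represent than a finite one.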

Sets are described in greater detail in section 4.4.

Other classes of t-values. We shall defer the discussion of the classes of compound t-values until Section 4.4. There are, however, a number of other classes of t-values which are not primitive classes, but which are used within the translator program. Because of the model described above, the data structures of the translator are accessible to the user. Among other structures, the translator for the core AEPL uses a number of classes, called built-in classes, which define domains of t-values which may be of interest to the user: these classes are built in terms of the primitive classes in the same way that user-defined classes are constructed. Among these built-in classes is the class of all identifier t-values (character-strings beginning with a letter and containing only letters or digits) and the class of program t-values.

4.3. Objects and their attributes

Objects are entities to which six t-values are associated: one object describes specific relationships between these t-values, which are said to be the values of the attributes of that object. We describe here the roles of the attributes of an object.

4.3.1. The name-attribute. The value of the name-attribute of an object or, for short, the name of an object, is a t-value belonging to the class of identifiers: it may be used to refer to an object in the same way as a reference t-value. An identifier is thus associated with an object through the name-attribute of that object. Many objects may have the same identifier as the value of their name-attribute, but at every point in time a given identifier may be used to refer to only one of these objects. The choice of the object which is referred to by a given identifier is governed by the name scoping rules, which depend on the block structure of the text submitted to the translator (section 5.4).

4.3.2. The value-attribute. The value of the value-attribute of an object or, again for short, the value of an object is a t-value whose class is defined by the mode-attribute of the object (see 4.3.3 below). This attribute is closely related to the usual concept of value of a constant or of a variable in other programming languages.

4.3.3. The mode-attribute. The value of the mode-attribute of an object α is a reference to an object β whose value is a set of t-values to which the value of α belongs. The value of β thus defines the domain of the values of α, or their class. For short, we say that β is the mode of α or that object α possesses mode β.

The reason for the existence of the mode-attribute is simply to allow a meaning to be associated with the internal representation of the value of an object. The mode of an object α will indeed indicate whether the value of α is an integer t-value, a reference, a set, and so on. The corresponding Algol 60 concept is that of type of a variable; the name "mode" has been chosen because of the similarity with the Algol 68 idea.

4.3.4. The type-attribute. The type-attribute of an object can possess two values which indicate whether the object is a variable or a constant. An object is a variable if the set of possible values for that object contains more than one element; otherwise it is constant. Clearly, one could indicate that an object α is constant by having its mode be a reference to an object β whose value is a set with one element, namely the value of α. However, it is usually preferable not to use this device; it is more appropriate to distinguish between a variable object whose value is the integer t-value seven and a constant object whose value is the integer seven by means of the type-attribute. The mode-attributes of these two objects could then both be a reference to an object whose value is the set of all integers.

4.3.5. The scope-attribute. The scope-attribute of an object can possess three values denoted GLOBAL, LOCAL and DUMMY which define the scope of the relationship between the object and the identifier which is the value of its name-attribute (see sections 5.4 and 5.5).

4.3.6. The rule-attribute. The purpose of this attribute is discussed in Section 5.2. The value of the rule-attribute belongs to the built-in class of program t-values.

4.3.7. Primitive objects. To all primitive classes correspond built-in primitive objects. We have thus an object whose name is INT and whose value is the class of all integer t-values. The mode of this object has to indicate that its value is a primitive class: this is achieved by having the mode of object INT be a reference to a special object known to the system as the object whose name is PRIMITIVE and whose value is the set of all primitive classes. The value of the mode of PRIMITIVE is undefined. (The program which operates on the data structures recognizes the name PRIMITIVE).

4.3.8. An example. Figure 1 illustrates, through a schematic representation, the relationships between three objects.

FIGURE 1 The objects A, INT and PRIMITIVE.

The object A, i.e. the object whose name-attribute has the identifier A as value, has as other attributes:

the value is the integer t-value twenty-seven (the dotted line is used to indicate this),

the mode is a reference to the object INT,

the type is the t-value indicating that A is a variable,

the scope is the t-value indicating that the association of the identifier A with this object is global,

the rule is irrelevant (its value is either undefined or not important in this context).
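The three objects of this example can be modelled with a small Python sketch. It is an illustration under stated assumptions rather than the AEPL representation: the class name `AEPLObject`, the sentinel `UNDEFINED`, the string labels for the type and scope attributes, and the use of Python classes (`int`, `float`, `str`) to stand for primitive classes are all our own choices, and labels and references are left out of the set of primitive classes for brevity. A Python object reference plays the role of a reference t-value.

```python
from dataclasses import dataclass
from typing import Any

UNDEFINED = None  # stand-in for "this attribute currently has no value"


@dataclass(eq=False)
class AEPLObject:
    """An object with the six attributes named in section 4.2."""
    name: Any = UNDEFINED
    value: Any = UNDEFINED
    mode: Any = UNDEFINED    # reference to the object whose value is the class of `value`
    type: Any = UNDEFINED    # "VARIABLE" or "CONSTANT" (labels assumed here)
    scope: Any = UNDEFINED   # "GLOBAL", "LOCAL" or "DUMMY"
    rule: Any = UNDEFINED    # a program t-value; not modelled in this sketch


# PRIMITIVE: its value is the set of primitive classes, its mode is undefined.
PRIMITIVE = AEPLObject(name="PRIMITIVE", value=(int, float, str))

# INT: its value is the class of all integers, its mode refers to PRIMITIVE.
INT = AEPLObject(name="INT", value=int, mode=PRIMITIVE)

# A: a global variable whose value is twenty-seven and whose mode refers to INT.
A = AEPLObject(name="A", value=27, mode=INT, type="VARIABLE", scope="GLOBAL")


def value_belongs_to_mode(obj):
    """Check that the value of `obj` lies in the class given by its mode."""
    return isinstance(obj.value, obj.mode.value)
```

Following the mode references from A leads to INT and then to PRIMITIVE, whose own mode is undefined, which mirrors the chain shown in Figure 1.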

4.4. Sets: detailed description

4.4.1. Explicit sets. An explicit set or E-set is a finite ordered collection of t-values. It corresponds to the usual notions of vector, list or sequence. Every member of such a collection is called a component.

According to the functions of the attributes described above, if an object α has an E-set as value (i.e. as value of its value-attribute), then the mode of α should be a reference to an object whose value is a set of E-sets, namely the class to which the value of α belongs. This class may be one of the conceptual sets available in AEPL.

The basic operation on E-sets is the selection of a component. This can be done either by ordinal position or by name, if a name has been associated with the desired component (see C-sets).

4.4.2. Conceptual sets. A conceptual set is a set defined by a predicate. Such a set is not present in the system in the form of a collection of t-values: it is present purely by convention as the set (in the mathematical sense) of t-values for which the predicate is true. A conceptual set is thus a collection of t-values defined by a certain common property. These sets are represented in AEPL by descriptions of the properties of their elements rather than by a list of their elements. Since such a description is usually composed of several elements, AEPL represents a description by an E-set. The description of a conceptual set may be stored in the value-attribute of an object; the mode of that object will indicate that its value may be interpreted as the description of a conceptual set.

The primitive sets are conceptual sets corresponding to the primitive classes of AEPL: they exist in the system as the values of the primitive objects INT, REAL, CHAR, LABEL and REFERENCE. Other conceptual sets belong to one of the following categories:

C-sets, R-sets, P-sets, U-sets, I-sets and F-sets.

The reason why there is more than one kind of conceptual set besides the primitive sets is simply one of ease of programming: it is not always convenient to represent a set by a general predicate; certain particular cases deserve a special treatment.

4.4.2.1. C-sets. A C-set is a set of E-sets. Its description is composed of 5 components which are identified by the following names: number-type, number, component-type, component-class, names.

Number-type is a t-value which indicates whether the number of components of the E-sets which belong to this C-set is variable or constant.

Number is a t-value which is the number of components of the E-sets which belong to this C-set if this number is constant (examine number-type to find this out); otherwise, this t-value is a reference to a boolean function of two arguments: an integer t-value n and a reference to an object α whose value belongs to the class of E-sets described by this C-set. The function returns the value true if and only if the integer n is a permitted value for the number of components of the value of α.

Component-type is a t-value which indicates whether the E-sets belonging to this C-set are homogeneous or not. A homogeneous E-set is one whose components belong to the same class.

Component-class is a t-value which defines the class of every component of any E-set belonging to this C-set in the following way. If the E-sets are homogeneous (examine component-type to find this out), then component-class is a reference to an object whose value is the class to which all components of an E-set belong. If the E-sets are non-homogeneous, then this t-value is a reference to a function of two arguments: an integer t-value n and a reference to an object α whose value belongs to the class of E-sets described by this C-set. The result of this function is a reference to an object whose value is the class to which the n-th component of the value of α belongs.

Names is either undefined or a reference to a function of two arguments, an identifier id and a reference to an object α whose value belongs to the class of E-sets described by this C-set. The function returns an integer t-value n which is the ordinal position of the component of the value of α whose name is id. If no such component is found, then the function returns zero.

Figure 2 schematizes an example in which the object PAIR has, as its value, the set of all E-sets with two integer components which are unnamed. The values of number-type (constant), number (2), component-type (constant), component-class (a reference to INT) and of names (undefined) define this by convention. The value of object X (a pair of integers) belongs to the class defined by PAIR, so the mode of X is a reference to PAIR. We wish to point out here that the schematic representation of an E-set as shown in Fig. 2 does not imply any particular implementation.
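The five-component description of a C-set lends itself to a short Python sketch of the membership test and of component selection. Several simplifications are assumptions of this sketch and not part of AEPL: E-sets are modelled as tuples, the functions receive the E-set itself rather than a reference to an object holding it, Python classes stand for AEPL classes, and the string labels ("CONSTANT", "VARIABLE", "HOMOGENEOUS", "NON_HOMOGENEOUS") are our own. The class name `CSet` and the named example `POINT` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class CSet:
    """Five-component description of a class of E-sets (tuples here)."""
    number_type: str       # "CONSTANT" or "VARIABLE"
    number: Any            # an int, or a function (n, e_set) -> bool
    component_type: str    # "HOMOGENEOUS" or "NON_HOMOGENEOUS"
    component_class: Any   # a class, or a function (n, e_set) -> class
    names: Optional[Callable[[str, tuple], int]] = None  # (id, e_set) -> position, 0 if absent

    def __contains__(self, e_set):
        if not isinstance(e_set, tuple):
            return False
        n = len(e_set)
        # Check the number of components.
        if self.number_type == "CONSTANT":
            if n != self.number:
                return False
        elif not self.number(n, e_set):
            return False
        # Check the class of every component (positions are 1-based).
        for position, component in enumerate(e_set, start=1):
            if self.component_type == "HOMOGENEOUS":
                cls = self.component_class
            else:
                cls = self.component_class(position, e_set)
            if not isinstance(component, cls):
                return False
        return True

    def select(self, e_set, key):
        """Select a component by ordinal position (int) or by name (str)."""
        if isinstance(key, str):
            position = self.names(key, e_set) if self.names else 0
            if position == 0:
                raise KeyError(key)
            key = position
        return e_set[key - 1]


# PAIR: all E-sets with exactly two unnamed integer components.
PAIR = CSet("CONSTANT", 2, "HOMOGENEOUS", int)

# POINT: a hypothetical variant whose two components are named x and y.
POINT = CSet("CONSTANT", 2, "HOMOGENEOUS", int,
             names=lambda ident, e_set: {"x": 1, "y": 2}.get(ident, 0))
```

With these definitions, `(3, 4) in PAIR` holds while `(3, "four") in PAIR` does not, and `POINT.select((3, 4), "y")` selects by name through the names function, with zero signalling that no component carries the requested name.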

4.4.2.2. Other conceptual sets

In order to shorten this presentation, we will not give the precise definitions of the other conceptual sets; we limit ourselves to presenting them informally.

R-sets (Restriction-sets) R-sets are used to impose a restriction on the elements of any other set. This restriction takes the form of a Boolean function which specifies which t-values are members of the restricted set. Example: the set of all positive integers can be described by imposing the obvious restriction on the members of the set of all integers. In mathematical notation:

$\{\, x \mid x \in S \text{ and } P(x) \,\}$

P-sets (Property-sets) This kind of conceptual set defines a subset of a given set by distinguishing a specific property which, unlike in the case of R-sets, is not expressed as a predicate: the user must specify how the property is to be used to distinguish the elements of the subset by modifying the membership operation. P-sets are thus an escape mechanism enabling the user to design different kinds of conceptual sets.

FIGURE 2 A possible data structure for an object X whose value is a pair of integers; data structure defined by means of a C-set (figure not reproduced; the C-set fields shown are number-type, number, component-type, component-class and names).

U-sets and I-sets (Union-sets and Intersection-sets) These sets make it possible to define sets of t-values as unions or intersections of other sets.

Example: the set which is the union of the set of all negative integers and the set of all integers greater than one hundred.
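The informal descriptions of R-sets, U-sets and I-sets can be sketched in Python by treating a conceptual set as a membership predicate. This is an illustrative assumption, not the representation used by AEPL; the helper names `r_set`, `u_set` and `i_set` are hypothetical.

```python
from typing import Any, Callable

Membership = Callable[[Any], bool]


def is_integer(value: Any) -> bool:
    """Stand-in for the set of all integers."""
    return isinstance(value, int) and not isinstance(value, bool)


def r_set(base: Membership, restriction: Membership) -> Membership:
    """R-set: members of `base` that also satisfy the Boolean restriction."""
    return lambda v: base(v) and restriction(v)


def u_set(*members: Membership) -> Membership:
    """U-set: union of the given sets."""
    return lambda v: any(m(v) for m in members)


def i_set(*members: Membership) -> Membership:
    """I-set: intersection of the given sets."""
    return lambda v: all(m(v) for m in members)


# The examples of the text.
positive = r_set(is_integer, lambda x: x > 0)
negative = r_set(is_integer, lambda x: 0 > x)
above_hundred = r_set(is_integer, lambda x: x > 100)
negative_or_above_hundred = u_set(negative, above_hundred)

# Positive integers smaller than five hundred, as an intersection.
small_positive = i_set(positive, r_set(is_integer, lambda x: 500 > x))
```

With these definitions, `negative_or_above_hundred(-3)` and `negative_or_above_hundred(101)` hold, while `negative_or_above_hundred(50)` does not.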

F-sets (File-sets) These sets are used to define input-output files.

5. THE PROCESSOR

5.1. Program t-values

The AEPL processor is a machine which can be applied to a particular kind of data structure, called program t-values or programs, resulting in specific actions being performed on the universe of data items present in the system at the time of the application of the processor.

Program t-values belong to a built-in class which is the value of a built-in object named PROG. Without going into all the details, we can say here that a program t-value is an E-set usually composed of two (sometimes of three) components. Figure 3 gives a schematic representation of a program t-value which is the value of an object named P. The value of P has two components, the first of which is an operation code denoted here symbolically by "+", the second of which is an E-set with 3 components: references, respectively, to objects A, B and C.

FIGURE 3 A program t-value and its structure (figure not reproduced).

If the processor is applied (see below how this is done) to the value of object P, then, in the case of Fig. 3, the actions taken would result in the fetching of the integer values of objects A and B, their addition and the storage of the sum as the value of object C.

5.2. Grouping of program t-values

Program t-values are usually not isolated, but grouped together. It is possible to create sets of programs (compound programs) in the same way that one creates sets of integer t-values. If the processor is applied to such a compound program, it will execute the different members in sequential order, unless transfers of control are encountered. A program counter is used for that purpose. If one of the components of a compound program is itself a compound program (nesting), then the components of the latter are executed in sequence before execution of the former proceeds. The processor stacks and unstacks the program counter when going down or up in the nested structure of programs.
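The stacking of the program counter can be illustrated by a minimal Python sketch. It assumes, for illustration only, that a compound program is a Python list and that an elementary program is a callable; transfers of control are deliberately left out, and the names `run` and `record` are hypothetical.

```python
def run(program, trace):
    """Execute a (possibly nested) compound program in sequential order."""
    stack = [(program, 0)]            # pairs: (compound program, program counter)
    while stack:
        compound, pc = stack.pop()
        if pc >= len(compound):
            continue                  # this level is finished: resume the outer one
        stack.append((compound, pc + 1))   # stack the advanced program counter
        member = compound[pc]
        if isinstance(member, list):
            stack.append((member, 0))      # go down into the nested program
        else:
            member(trace)                  # elementary program: execute it


def record(label):
    """Build an elementary program that appends its label to the trace."""
    return lambda trace: trace.append(label)


trace = []
run([record("a"), [record("b"), record("c")], record("d")], trace)
```

In this sketch the members of the nested program (`b`, `c`) are executed in sequence before the outer program proceeds with `d`.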

Another way to group program t-values is through the use of the rule-attribute of objects such as A and B in Fig. 3. The following outlines the role of the rule-attribute in the execution of programs. This feature will play an important role in the translator program of Section 6.

The actions performed by the processor during the execution of a program t-value can be considered to be composed of a sequence of primitive operations (similar to micro-instructions for a conventional machine) such as fetching and storing of values and various arithmetic and logical operations. One of the primitive operations is very important and is worth describing in more detail: this operation involves a recursive call of the processor. Indeed, while the processor is executing a program t-value, it may be necessary to execute other program t-values in order to be able to proceed with the execution of the first program. This is made possible by the stack discipline imposed on the program counter. Such a recursive call of the processor occurs when the value of the rule-attribute of an object such as A or B in Fig. 3 is a program t-value (not undefined). In such a case, the execution of the value of P would produce the following actions:

1) if the rule-attribute of A is not undefined, then execute the value of the rule-attribute of A,

2) if the rule-attribute of B is not undefined, then execute the value of the rule-attribute of B,

3) fetch the value of A, fetch the value of B, add the values, store the sum as value of C.
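The three steps above can be sketched in Python as follows. The class `Obj`, the function `execute` and the operation code `"SET"` are hypothetical and introduced only for illustration; an undefined rule-attribute is modelled by `None`, and a program t-value by a pair (operation code, operands).

```python
class Obj:
    """Hypothetical object with three of its attributes: name, value, rule."""

    def __init__(self, name, value=None, rule=None):
        self.name = name
        self.value = value
        self.rule = rule          # a program t-value, or None for "undefined"


def execute(program, log):
    """Apply the processor to a program t-value (operation code, operands)."""
    opcode, operands = program
    if opcode == "+":
        a, b, c = operands
        for operand in (a, b):
            if operand.rule is not None:
                execute(operand.rule, log)   # recursive call of the processor
        c.value = a.value + b.value          # fetch, add, store
        log.append(f"{c.name} := {a.name} + {b.name}")
    elif opcode == "SET":                    # assumed helper operation
        target, constant = operands
        target.value = constant
        log.append(f"{target.name} := {constant}")
    else:
        raise ValueError(f"unknown operation code: {opcode!r}")


# A has a rule-attribute which computes its value; B has none.
A = Obj("A")
A.rule = ("SET", (A, 2))
B = Obj("B", value=5)
C = Obj("C")

log = []
execute(("+", (A, B, C)), log)
```

Here the rule-attribute of A is executed first (step 1), nothing is done for B (step 2), and the sum is then stored as the value of C (step 3).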

5.3. The different kinds of programs

Among the 63 different kinds of program t-values (instructions for the AEPL processor), one may distinguish between the following classes:

instructions for manipulation of objects (creation, deletion),
instructions for selection of an attribute of an object,
instructions for assignment of t-values as values of attributes of objects, either by copying or by sharing of t-values,
operations involving references to objects,
arithmetic and relational operations,
character-string operations,
mode-conversion operations,
operations on E-sets (number of elements, subset, membership, insertion, deletion, concatenation),
operations on files (I/O),
operations involving transformation rules and grammars (see below),
operations related to the block-structure (see below),
control instructions (transfers, conditionals, stop, repeat, procedure calls, block entry and exit).

5.4. Blocks

Lexicographically speaking, an AEPL program text is composed of a sequence of executable commands, some of which may be simple statements, others being compound statements or blocks.

A block is, at the language level, a statement (possibly compound) enclosed between block brackets BLOCK and END. What distinguishes a block from another statement is the fact that every object which is created within that block with a LOCAL scope-attribute can be accessed by name only within that block. Blocks may be nested as in Algol 60, and similar name scoping rules apply. Objects with scope-attribute GLOBAL are known at the outermost level, even if they are created within an inner block. The purpose of the scope-attribute DUMMY is explained below in Section 5.5.

The objects present in the system at any point are linked together by means of a tree-like structured E-set of references which reflects the overall block structure of the text. This structure plays a role similar to the symbol-table of other languages: it is the value of a built-in object named SYMBOL-TABLE.

In terms of the processor, a block is a particular program t-value which is composed of an executable part (another program t-value) and a part, called environment, which includes the section of the value of SYMBOL-TABLE which corresponds to the block. Within the program text, identifiers are used to refer to objects of the universe. As mentioned in 4.3.1, many objects may have the same name, but at every point in the program text, a given identifier will refer to only one object or to no object at all.

Whenever an identifier is encountered within a block, the internal structure allows the scanning of the universe of objects in order to find the corresponding object whose name is equal to the identifier, if such an object exists.
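The name lookup described above can be sketched in Python under simplifying assumptions: each block keeps a dictionary of the objects created in it, a GLOBAL object is recorded at the outermost level, and lookup proceeds from the current block outwards as in Algol 60. The class `Block` and its methods are hypothetical and do not reflect the actual structure of SYMBOL-TABLE.

```python
class Block:
    """Hypothetical node of the tree-like symbol table: one node per block."""

    def __init__(self, parent=None):
        self.parent = parent
        self.objects = {}         # name -> object created in this block

    def outermost(self):
        block = self
        while block.parent is not None:
            block = block.parent
        return block

    def create(self, name, obj, scope="LOCAL"):
        # GLOBAL objects are known at the outermost level,
        # even if they are created within an inner block.
        target = self.outermost() if scope == "GLOBAL" else self
        target.objects[name] = obj

    def lookup(self, name):
        """Return the object an identifier refers to here, or None."""
        block = self
        while block is not None:
            if name in block.objects:
                return block.objects[name]
            block = block.parent
        return None


outer = Block()
inner = Block(parent=outer)
outer.create("X", "outer X")
inner.create("X", "inner X")              # hides the outer X inside `inner`
inner.create("G", "global G", scope="GLOBAL")
inner.create("L", "local L")
```

In this sketch `inner.lookup("X")` yields the inner object, `outer.lookup("G")` finds the GLOBAL object created in the inner block, and `outer.lookup("L")` yields `None` because L is LOCAL to the inner block.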

5.5. Procedures

Procedures are blocks whose environments contain zero or more objects with the scope attribute DUMMY. These objects always appear at some fixed positions in these environments. A procedure with dummy objects cannot be executed as a regular block: the dummy objects are place holders for actual objects which must replace the dummy objects in some way before execution may proceed. This process is called binding of actual objects and is achieved through a procedure call. A procedure call is a program t-value which is composed of a block and of a set of references to the actual objects. Its execution yields binding of the dummy objects of the block to the actual objects (in the same order as they appear in both sets of references), execution of the block and breaking of the bindings.

When the processor encounters a dummy object α, it operates on the object β which is bound to α. If no such object is found, then an error condition is signalled; if β is itself a dummy object, then the processor operates on the object γ which is bound to β, and so on. At some point, the chain of bindings must terminate in an object whose scope is not DUMMY, otherwise an error has occurred.
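The binding of dummy objects and the following of a chain of bindings can be sketched in Python. The names `Obj`, `resolve`, `call` and `BindingError` are hypothetical; the check against a cyclic chain is an added assumption, since the text only requires that the chain end in an object whose scope is not DUMMY.

```python
class BindingError(Exception):
    """Raised when a dummy object cannot be resolved to an actual object."""


class Obj:
    def __init__(self, name, scope="LOCAL", value=None):
        self.name = name
        self.scope = scope        # "LOCAL", "GLOBAL" or "DUMMY"
        self.value = value
        self.bound_to = None      # set only while a procedure call is active


def resolve(obj):
    """Follow the chain of bindings until a non-dummy object is reached."""
    visited = set()
    while obj.scope == "DUMMY":
        if obj.bound_to is None or id(obj) in visited:
            raise BindingError(f"dummy object {obj.name} is not bound")
        visited.add(id(obj))
        obj = obj.bound_to
    return obj


def call(dummies, actuals, body):
    """Bind dummies to actuals in order, execute the body, break the bindings."""
    if len(dummies) != len(actuals):
        raise BindingError("wrong number of actual objects")
    for dummy, actual in zip(dummies, actuals):
        dummy.bound_to = actual
    try:
        return body()
    finally:
        for dummy in dummies:
            dummy.bound_to = None


n = Obj("N", value=10)
p = Obj("P", scope="DUMMY")       # dummy of an outer procedure
q = Obj("Q", scope="DUMMY")       # dummy of an inner procedure

# The outer call binds P to N; the inner call binds Q to P, so that
# operating on Q means operating on N.
result = call([p], [n], lambda: call([q], [p], lambda: resolve(q).value))
```

After both calls return, the bindings are broken again, so a later `resolve(q)` signals an error in this sketch.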

6. THE TRANSLATOR

6.1. Overall structure of the translator

The translator is (at least conceptually) a program for the processor described above. It is composed of three phases: a lexical scan, a syntax analysis and generation phase and an execution phase.

The lexical scan transforms an input string into a stream of symbols by recognizing identifiers, numbers, quoted strings and special compound characters. Then, this stream is transformed into a set of references to objects containing one reference per symbol of the stream. Figure 4 gives an idea of the result.

The syntactic analysis and generation phase uses a syntax-driven parsing algorithm to transform the output of the scanner into a tree-like structure. Section 6.2 describes the algorithm and the structure of the driving grammar; Fig. 5 illustrates the result of this process on the data of Fig. 4, where the grammar used is that of the kernel AEPL language.

The execution phase is triggered by the application of any rule of the grammar which recognizes the language structure denoting a request for execution. When such a rule is applied, a part of the tree-like structure produced up to that point is (in the absence of errors) a program t-value ("machine-code") translation of the AEPL source text. The result of the application of this grammar rule is that the processor is applied to that program t-value, which is thus executed. When this execution terminates, the processor resumes the execution of the translator program.

FIGURE 4 Output of the scanner for an input text A := B+C.35 (figure not reproduced; each object is shown with its name, value, mode, type, scope and rule attributes). Note: REF(X) stands for a reference to object X.

6.2. The syntactic analysis and generation mechanism

FIGURE 5 Result of the transformation of A := B+C.35 (figure not reproduced). Note: *G1, *G2 and *G3 are translator-generated temporary objects.

The syntax analysis and generation mechanism or, for short, the parser, is derived from the well-known Markov Algorithm [8], [13], [15] and from the PBNF notation of Bell [2]. The parser is driven by a recognitive modified Markov Algorithm (MMA) grammar which is an ordered set of grammar rules of the form:

$R_i : N_i \quad p_i \rightarrow q_i \; ; \; C_i \; ; \; P_i$

where $R_i$ is the name of the rule (optional),

$N_i$ is an integer called priority level of the rule,

$p_i$ and $q_i$ are patterns (see below),

$C_i$ is a predicate,

$P_i$ is a program.

The structure of a grammar rule is represented by an E-set with 6 components named respectively:

name, priority, input-pattern, output-pattern, predicate and action.

The rules are ordered in the grammar in order of decreasing priority levels. The order of the rules having identical priority levels is immaterial, but the following condition must be fulfilled in order to ensure the determinism of the algorithm: among the rules having the same priority level, there should not exist two rules whose input patterns are of the form

x

and

xu

where x and u are E-sets of references, xu denotes the concatenation of the E-sets x and u, and u may be empty.

The purpose of the parser is to transform the output of the scanner (a set of references to certain objects, called the input set) into another set of references to certain objects, called the output set (see Figs. 4 and 5). This transformation is done according to the rules of the grammar.

The transformation proceeds in steps by successive applications of MMA grammar rules. At every application of a rule, some subset of the set of references is replaced by another set of references in a manner governed by the input- and output-patterns of the rule. These patterns are themselves sets of references to objects; the input-pattern is used to define that part of the input set which will be transformed: this is done by a matching process in which the mode-attributes of the objects play an important role. The output-pattern of the rule is used to generate a replacement for the part of the input set which matched the input-pattern.

A precise definition of the parsing algorithm can be found in the appendix; the following is an intuitive explanation of the process. Let us first assume that the grammar does not contain more than one rule at every priority level. The parsing process consists in the scanning of the input set from left to right while searching for a match for the input pattern of a grammar rule. The process starts with the rule at the highest priority level and continues with the following ones until a match is found, say for a rule $R_i$. Note that matching involves both the verification of the presence in the input set of certain objects according to the structure of the input pattern and the satisfaction of the predicate of the rule. When the match for rule $R_i$ is found as described above, that part of the input set which matches the input pattern $p_i$ is replaced by a copy of the output pattern $q_i$ of the rule and the associated program $P_i$ is executed. The left-to-right scan then starts again from the leftmost end of the new input and with the rule of highest priority. Then the next rule is considered, and all the following ones, until a new match is discovered.

If no match is found for any of the rules of the grammar, then the process terminates or rather returns to the calling program, since we consider the parser to be implemented as a recursive procedure.

If the grammar contains rules with the same priority level, then, when the scan reaches that level, the search is made for a match of the input patterns of every rule at that level. If more than one such match is found, then the rule yielding the leftmost match is applied.
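The intuitive explanation above can be turned into a small Python sketch. It is not the precise algorithm of the appendix: tokens are simplified to pairs (mode, payload), a pattern only tests modes, the output pattern and the action are merged into one function which builds the replacement, and a step limit (an added safeguard) stops grammars which would not terminate. All names (`Rule`, `leftmost_match`, `parse`, the three example rules) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Token = Tuple[str, object]            # simplified object: (mode, payload)


@dataclass
class Rule:
    name: str
    priority: int
    pattern: List[str]                # modes to be matched, one per token
    predicate: Callable[[List[Token]], bool]
    action: Callable[[List[Token]], List[Token]]   # builds the replacement


def leftmost_match(rule: Rule, tokens: List[Token]) -> Optional[int]:
    """Scan from left to right; return the first position where the rule matches."""
    width = len(rule.pattern)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        modes_agree = all(tok[0] == mode for tok, mode in zip(window, rule.pattern))
        if modes_agree and rule.predicate(window):
            return start
    return None


def parse(rules: List[Rule], tokens: List[Token], max_steps: int = 1000) -> List[Token]:
    tokens = list(tokens)
    levels = sorted({rule.priority for rule in rules}, reverse=True)
    for _ in range(max_steps):
        applied = False
        for level in levels:          # highest priority level first
            candidates = []
            for rule in rules:
                if rule.priority == level:
                    position = leftmost_match(rule, tokens)
                    if position is not None:
                        candidates.append((position, rule))
            if candidates:            # apply the rule with the leftmost match
                position, rule = min(candidates, key=lambda c: c[0])
                end = position + len(rule.pattern)
                tokens[position:end] = rule.action(tokens[position:end])
                applied = True
                break                 # restart from the highest priority
        if not applied:
            return tokens             # no rule matches: return to the caller
    raise RuntimeError("step limit reached; the grammar may not terminate")


def always(window: List[Token]) -> bool:
    return True


def infix(window: List[Token]) -> List[Token]:
    left, operator, right = window
    return [("EXPR", (operator[1], left[1], right[1]))]


RULES = [
    Rule("INTEGER", 1900, ["INT"], always, lambda w: [("EXPR", w[0][1])]),
    Rule("PRODUCT", 1600, ["EXPR", "MULTOP", "EXPR"], always, infix),
    Rule("SUM", 1500, ["EXPR", "ADOP", "EXPR"], always, infix),
]

# Token stream standing for the text 2 + 3 * 4.
SOURCE = [("INT", 2), ("ADOP", "+"), ("INT", 3), ("MULTOP", "*"), ("INT", 4)]
TREE = parse(RULES, SOURCE)
```

Because the product rule has a higher priority than the sum rule, the multiplication is reduced first and `TREE` contains a single EXPR token whose payload is the nested tuple `("+", 2, ("*", 3, 4))`.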

This parsing algorithm differs from the standard Markov Algorithm in a number of respects:

1) The algorithm described below operates on E-sets of references to objects; the matching conditions may depend on all 6 attributes of every object. The Markov Algorithm operates on strings; its matching conditions depend only on identity of strings.

2) The rules of the grammar which drives the algorithm have two components which may affect the execution of the algorithm: the predicate and the action. These are absent from the Markov Algorithm.

3) The grammar which drives the algorithm may contain a number of rules with the same priority levels, but such rules must satisfy a particular condition which ensures the determinism of the algorithm.

The roles of the predicate and of the action of a rule are very important: the predicate allows one to specify conditions under which the rule may or may not be applied; the actions allow computation during syntax analysis or generation of the tree structure or of code.

Metalanguage for the syntax The rules of a grammar and the grammar itself are data structures describable entirely within the framework presented in Section 4. The kernel language, however, possesses built-in language structures and notational facilities which enable a user to create new grammar rules and modify grammars in a simple way. AEPL thus serves as a syntactic metalanguage to describe its own syntax.

Modification of a grammar Extension of the language is done by addition (or deletion) of rules to (from) the grammar which drives the translator.

The modification of a grammar is expressed by a sequence of AEPL statements. This modification takes effect from the moment the processor has executed the program t-value resulting from the translation of that sequence of AEPL statements. Since execution is controlled by the source program itself, it is possible to have the grammar modifications occur immediately after the translation, or to have them stored for later use.

6.3. Some notes about the parsing algorithm and the grammars

The parsing algorithm described above proceeds by successive transformations of parts of the given input. Since there is no restriction about the lengths of the replacements, it is impossible to guarantee that the algorithm will terminate for any grammar and any input. Indeed, the algorithm can easily enter an infinite loop or generate an infinitely increasing output stream. It is clear that some caution is needed when creating grammars to avoid such situations.

The algorithm is also inherently slow because of the repeated scanning of the same parts of the input by the same patterns. In [11], [12] and [16], methods are presented to obtain the same transformation of any input to the appropriate output whereby this repeated scan is avoided or at least minimized. This new algorithm uses two stacks and a set of markers which delimit portions of the input which need not be checked by certain grammar rules.

Concerning the choice of the MMA for the parser, we would like to point out several advantages that this choice has for the implementation of extensible languages. The first advantage is generality: a larger class of languages (the class of phrase-structured languages) can be described in MMA grammars than in, say, BNF. Second, and most important for extensibility, we believe that the specification of a language by an MMA grammar is easier to write than a BNF grammar. We have come across this phenomenon in our experimental work in which we wrote parsers for Algol and a language for graphs. The theoretical results and the examples reported in [12] indicate that this phenomenon is a result of the following properties of the MMA: (1) As a result of the use of the predicate and string variables, the number of rules used by the MMA to describe a language is much smaller than the number of BNF rules or production rules needed. (2) Under certain conditions (no overlap among the input patterns of rules) the priority of the rules of the MMA can be changed without affecting the result of the parsing. (3) Computer languages used today can be described as consisting of several sublanguages (e.g. arithmetic expressions, blocks, etc.). An MMA parser can be constructed for each sublanguage and a single parser can be constructed out of them. If certain overlap conditions are satisfied, this parser is simply constructed by appending the parsers for the sublanguages. We believe that the "no-overlap" condition is basically the mechanism by which a programmer recognizes and "parses" a program when he reads it. Therefore we expect computer languages composed by and for humans to have this no-overlap property; this, in turn, makes the writing of a parser using the MMA a relatively simple task.

7. AN EXAMPLE

In this section we wish to present an example of the use of AEPL in order to give the reader a more concrete feeling of the language. It is however rather difficult to find an example which is at once interesting, nontrivial and simple enough to be understood by someone who does not know all the fine details of the language. We did not attempt, therefore, to give here an example which illustrates all the capabilities of AEPL. The reader who wishes to see a more detailed example may find in [16] the description of a language for the creation and manipulation of linear graphs obtained by extension of AEPL, together with some graph-theoretical algorithms written in that language.

We present here the extensions needed to implement a very simple formula manipulation language which allows symbolic derivation of algebraic expressions. This language is derived in part from an extension of Algol 60 proposed in [18].

7.1. The goal

We wish to extend AEPL to include a new data class; values belonging to that class are used to represent algebraic expressions. We wish to be able to assign expressions as values of variables and to compute the symbolic derivation of an expression. Expressions are formed in the conventional way from atoms, integer constants, prefix operators (+, -, SIN, COS, LN, EXP), infix operators (+, -, *, /, ↑) and parentheses.

7.2. The proposed solution

We shall define the appropriate new classes of t-values. An expression will be represented by an E-set (a vector, list or sequence) with either one, two or three components, depending on whether the expression is an atom, a prefix expression or an infix expression. Expressions will be represented internally in tree form.

We shall then create the grammar rules which recognize expressions and form their internal representations in terms of E-sets. Finally, we shall write the recursive symbolic derivation procedure.

7.3. Some points about AEPL

The language we use as a starting point is not the kernel AEPL, but a language which has been extended from the kernel for reasons of convenience (see [16]).

Comments appear in the text between a /* and a */.

The exclamation mark (!) causes the execution of the statement which precedes it (this statement may be a block), similarly to the language Proteus [2].

A grammar rule has the syntactic form

TRANS name: priority input pattern → output pattern; predicate; action; TRANSEND

The name of a rule enables one to refer to the rule once it has been entered into a grammar.

The patterns of a rule are enclosed between square brackets. The phrase

[⟨A : INT⟩ + ⟨B : REAL⟩]

represents a pattern with three components:

a reference to an object of mode INT whose name is A,

a reference to the object whose value is +,

a reference to an object of mode REAL whose name is B.

The names A and B may be used in the predicate and the action of the grammar rule to refer to those objects of the input stream which correspond to objects A and B when the rule is applied.

The expression

A EL S

is a predicate which is true if the value of A is a member of the set which is the value of S.

The expression

# S

has as value the number of components of the E-set which is the value of S.

The operator REF, when applied to an object, yields a reference to that object. The operator UNREF, when applied to an object whose value is a reference, yields the object which is referred to by that reference.

A declaration statement such as DCL X INT creates an object whose name is X and whose mode is a reference to the object INT; the value of that object must therefore be an integer.

The expression S . N

denotes the N-th component of the value of S, provided this value is an E-set with at least N components.

7.4. The program

BEGIN

DCL EXPR CSET! /* the value of EXPR will be a C-set */
DCL ATOM SET! /* the value of ATOM is a set (in general) */

/* The value of EXPR will be the class of all E-sets which are used to represent expressions */

/* The value of ATOM will remain undefined: no further precision is needed than the fact that an object which is an atom has as mode a reference to ATOM */

EXPR := C{VARIABLE, /* number type */
          REF(B1), /* number */
          CONSTANT, /* component type */
          REFERENCE, /* component class */
          NIL}; /* names */

DEFINE BOOL FUNCTION B1 OF (N : INT) TO BE
    B1 := IF N > 0 AND N ≤ 3 THEN TRUE ELSE FALSE;

/* The two statements above define the value of EXPR to be the class of E-sets with a variable number of components (defined by the function B1 to be between 1 and 3) which are all of the same class, namely the class of references, and which are not accessed by name */

/* Now we introduce operators */
DCL OPERATOR CSET! /* the value of OPERATOR will be a C-set */

OPERATOR := C{VARIABLE, /* number type */
              TRUE, /* number */
              CONSTANT, /* component type */
              CHARACTER, /* component class */
              NIL}; /* names */

/* This causes the value of OPERATOR to be the class of all E-sets with a variable number of components which all belong to the class of character-strings */

DCL (PREFIX, ADOP, MULTOP) OPERATOR!

/* This declares PREFIX, ADOP, MULTOP to be objects whose values belong to the class of operators */

PREFIX := E{-, +, SIN, COS, LN, EXP};
ADOP := E{+, -};
MULTOP := E{*, /};

/* It should be noted that all finite explicit sets in AEPL are ordered. In the case of the classes of operators, the order is irrelevant and it seems annoying that we must represent these classes by ordered sets. We shall simply never make use of the fact that the operators are ordered in their sets */

END! /* This causes the previous block to be executed: the new classes are now part of the system */

/* Now we define the grammar rules to recognize expressions */
BEGIN

DCL (T1, T2, T3, T4, T5, T6, T7, T8) TRN!

/* This declares T1, T2, ..., T8 to be objects whose values will be grammar rules */

T1 := TRANS T1: 2000 [⟨A : ATOM⟩] → [⟨B : EXPR⟩];
      /* predicate */ TRUE;
      /* action */ B.1 := REF(A);
      TRANSEND;

T2 := TRANS T2: 1900 [⟨A : INT⟩] → [⟨B : EXPR⟩];
      /* predicate */ TRUE;
      /* action */ B.1 := REF(A);
      TRANSEND;

/* These two rules recognize atoms and integers; now the rules for prefix operators */

T3 := TRANS T3: 1800 [⟨A : PREFIX⟩⟨B : EXPR⟩] → [⟨C : EXPR⟩];
      /* predicate */ TRUE;
      /* action */ BEGIN
          C.1 := REF(A); C.2 := REF(B);
      END;
      TRANSEND;

/* A rule to take care of the unary plus */
T4 := TRANS T4: 1850 [+ ⟨A : EXPR⟩] → [⟨A : EXPR⟩];
      /* predicate */ TRUE;
      /* action */ NOOP; /* no action */
      TRANSEND;

/* The remaining rules handle parentheses and infix operators */
T5 := TRANS T5: 1700 [(⟨A : EXPR⟩)] → [⟨A : EXPR⟩];
      /* predicate */ TRUE;
      /* action */ NOOP;
      TRANSEND;

T6 := TRANS T6: 1650 [⟨A : EXPR⟩ ↑ ⟨B : EXPR⟩] → [⟨C : EXPR⟩];
      /* predicate */ TRUE;
      /* action */ BEGIN
          C.1 := REF(A); C.2 := REF(↑); C.3 := REF(B);
      END;
      TRANSEND;

T7 := TRANS T7: 1600 [⟨A : EXPR⟩⟨B : MULTOP⟩⟨C : EXPR⟩⟨X : ANY⟩] → [⟨D : EXPR⟩⟨X : ANY⟩];
      /* predicate */ IF X = ↑ THEN FALSE ELSE TRUE;
      /* action */ BEGIN
          D.1 := REF(A); D.2 := REF(B); D.3 := REF(C);
      END;
      TRANSEND;

T8 := TRANS T8: 1500 [⟨A : EXPR⟩⟨B : ADOP⟩⟨C : EXPR⟩⟨X : ANY⟩] → [⟨D : EXPR⟩⟨X : ANY⟩];
      /* predicate */ IF X = ↑ OR X EL MULTOP THEN FALSE ELSE TRUE;
      /* action */ BEGIN
          D.1 := REF(A); D.2 := REF(B); D.3 := REF(C);
      END;
      TRANSEND;

/* The predicates in rules T7 and T8 ensure the correct parsing of expressions such as X+X * (Y+Z) */

/* Now we add these rules to the grammar of AEPL */
ADD T1, T2, T3, T4, T5, T6, T7, T8 TO KERNEL;

END! /* this causes execution of the previous block of code */

/* From now on, the grammar rules T1, T2, ..., T8 are part of the grammar of the language */

/* The following block uses the new language */

BEGIN
DCL (X, A) ATOM!
DCL F EXPR!
F := (-X+A) / (X-1)*X;
END!

/* See Figure 6 for the result of the execution of this block in terms of internal data structures */

/* We now define the recursive function DERIVE which computes the derivative of an expression with respect to a variable */

DEFINE EXPR FUNCTION DERIVE OF (F : EXPR, X : ATOM) TO BE
BEGIN
DCL A REFERENCE!
DCL (B, C) EXPR!
IF # F = 1 THEN
    IF F.1 = REF(X) THEN DERIVE := 1;
    ELSE DERIVE := 0;
ELSE IF # F = 2 THEN /* a prefix expression */
    BEGIN
    A := F.1; B := UNREF(F.2);
    IF A = REF(-) THEN DERIVE := -DERIVE(B, X);
    ELSE IF A = REF(SIN) THEN DERIVE := COS(B)*DERIVE(B, X);
    ELSE IF A = REF(COS) THEN DERIVE := -SIN(B)*DERIVE(B, X);
    ELSE IF A = REF(LN) THEN DERIVE := DERIVE(B, X)/B;
    ELSE IF A = REF(EXP) THEN DERIVE := F*DERIVE(B, X);
    ELSE ERROR;
    END;
ELSE /* an infix expression */
    BEGIN
    B := UNREF(F.1); A := F.2; C := UNREF(F.3);
    IF A = REF(+) THEN DERIVE := DERIVE(B, X) + DERIVE(C, X);
    ELSE IF A = REF(-) THEN DERIVE := DERIVE(B, X) - DERIVE(C, X);
    ELSE IF A = REF(*) THEN DERIVE := B*DERIVE(C, X) + C*DERIVE(B, X);
    ELSE IF A = REF(/) THEN DERIVE := (C*DERIVE(B, X) - B*DERIVE(C, X))/C ↑ 2;
    ELSE IF A = REF(↑) THEN DERIVE :=
        IF TYPE(C) = CON THEN C*B ↑ (C-1)*DERIVE(B, X)
        ELSE (B ↑ C) * (LOG B*DERIVE(C, X) + C/B*DERIVE(B, X));
    ELSE ERROR;
    END;
END!
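For readers who wish to experiment, the logic of DERIVE can be transcribed into Python. This is only an illustrative analogue, not AEPL: expressions are modelled as tuples with one, two or three components (mirroring the E-sets of the text), the operator ↑ is written `"^"`, the logarithm in the power rule is written with the prefix operator LN, and no simplification of the result is attempted. The function name `derive` and the tuple encoding are assumptions.

```python
def derive(f, x):
    """Symbolic derivative of the expression tuple f with respect to atom x.

    Encoding (an assumption of this sketch):
      (atom,) or (integer,)      one component: atom or integer constant
      (operator, operand)        two components: prefix expression
      (left, operator, right)    three components: infix expression
    """
    if len(f) == 1:
        return (1,) if f[0] == x else (0,)
    if len(f) == 2:                           # a prefix expression
        op, b = f
        db = derive(b, x)
        if op == "-":
            return ("-", db)
        if op == "SIN":
            return (("COS", b), "*", db)
        if op == "COS":
            return (("-", ("SIN", b)), "*", db)
        if op == "LN":
            return (db, "/", b)
        if op == "EXP":
            return (f, "*", db)
        raise ValueError(f"unknown prefix operator: {op!r}")
    b, op, c = f                              # an infix expression
    db, dc = derive(b, x), derive(c, x)
    if op == "+":
        return (db, "+", dc)
    if op == "-":
        return (db, "-", dc)
    if op == "*":
        return ((b, "*", dc), "+", (c, "*", db))
    if op == "/":
        return (((c, "*", db), "-", (b, "*", dc)), "/", (c, "^", (2,)))
    if op == "^":
        if len(c) == 1 and isinstance(c[0], int):      # constant exponent
            return ((c, "*", (b, "^", (c[0] - 1,))), "*", db)
        return (f, "*", ((("LN", b), "*", dc), "+", ((c, "/", b), "*", db)))
    raise ValueError(f"unknown infix operator: {op!r}")


X = ("X",)
SQUARE_DERIVATIVE = derive((X, "*", X), "X")
```

In this encoding the derivative of X*X is returned unsimplified as `((X, "*", (1,)), "+", (X, "*", (1,)))`, the direct counterpart of B*DERIVE(C, X) + C*DERIVE(B, X).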

FIGURE 6 Result of the execution of the statement F := (-X+A)/(X-1)*X (figure not reproduced). Note: EXPR in a mode field stands for a reference to the object named EXPR, and similarly for INT, ADOP, MULTOP, etc.

8. CONCLUSION

We have described the main concepts underlying the design of the extensible language system AEPL. The system is composed of three parts: a core language with extension capabilities, a processor and a translator.

The core language is composed of an Algol 60-like base and a powerful data definition facility which enables one to build complex data structures and to create new data types. The data items are of two kinds: t-values and objects; objects are a generalization of the concept of variable in other languages. Data structures are built using the implementation-independent concept of a set of t-values. Sets can be of two kinds: explicit (corresponding to the notion of sequence, list or vector, i.e. a finite ordered collection of values) or conceptual, i.e. defined by a predicate. Conceptual sets are particularly useful to define new data types, e.g. the class of all positive integers smaller than five hundred.

The processor is a machine which operates on executable data structures called programs. Programs are created in the same way other data structures are created. They are distinguished only by the fact that the control of the processor can be transferred to them. The instruction repertoire of the processor ultimately defines the semantics of the operations of the language and of its extensions. Since, for instance, multiprocessing is not provided in the processor, such a mode of operation can only be simulated; it can never be executed directly.

The translator is a program for the AEPL processor; its purpose is to transform the input string into appropriate data structures, some of which may be programs executable by the processor. The transformation process is based on a parsing method derived from the Markov Algorithm, which is driven by a dynamically modifiable MMA grammar. This provides the user of the language with a powerful tool for the creation of new syntactic structures which are not restricted to the domain of context-free languages.

From our experience with this (as yet unimplemented) language, we conclude that the system is fairly easy to use and that extensions are convenient to express. This paper contains a short example of the implementation of a language allowing the symbolic differentiation of expressions; in [16], a language for the creation and manipulation of linear graphs is described; in [12], the MMA mechanism is used to define a subset of Algol 60. What remains to be done is a careful study of implementation schemes, including the addition of an implementation specification language for storage structures. It will almost certainly be necessary to sacrifice some of the generality of the language (a result of a commitment to completeness) in order to achieve a reasonable implementation. Other problems are common to all extensible languages: they include the problem of error handling, recovery and message generation, and the problem of the "freezing" of an extended language in order to improve the efficiency of its translator.

Acknowledgements

We would like to express our gratitude to the following people for the numerous and fruitful discussions we had with them: Prof. Y. Wallach and Dr. E. Kantorowitz of the Technion, Prof. J. Feldman of Stanford University, Profs. J. C. Boussard and M. Griffiths of the University of Grenoble and Messrs. S. Schuman and P. Jorrand of the IBM Scientific Center in Grenoble.

Manuscript received: October 1972; revised: February 1974.

References

1. Balzer, R. M., "Dataless Programming", Proc. AFIPS 1967 FJCC, 535-544.

2. Bell, J. R., The Design of a Minimal Expandable Computer Language, doctoral dissertation, Stanford University, 1968.

3. Cheatham, T. E. Jr., "The Introduction of Definitional Facilities into Higher Level Languages", Proc. AFIPS 1966 FJCC, 623-637.

4. Cheatham, T. E. Jr., Fischer, A. and Jorrand, P., "On the Basis for ELF-An Extensible Language Facility", Proc. AFIPS 1968 FJCC, 937-948.

5. Christensen, C. and Shaw, C. J. (eds.), "Proceedings of the Extensible Languages Symposium", SIGPLAN Notices 4, 8 (Aug. 1969), 1-62.

6. Earley, J., "Toward an Understanding of Data Structures", Comm. ACM 14, 10 (Oct. 1971), 617-627.

7. Galler, B. A. and Perlis, A. J., "A Proposal for Definitions in Algol", Comm. ACM 10, 4 (April 1967), 204-219.

8. Galler, B. A. and Perlis, A. J., A View of Programming Languages, Addison-Wesley Publ. Co., Reading, Mass., 1970.

9. Harrison, M. C., "BALM-An Extendable List-Processing Language", Proc. AFIPS 1970 SJCC, 507-511.

10. Irons, E. T., "Experience with an Extensible Language", Comm. ACM 13, 1 (Jan. 1970), 31-40.

11. Katzenelson, J., "The Markov Algorithm as a Language Parser-Linear Bounds", J. of Systems and Computer Sciences 6, 5 (October 1972), 465-478.

12. Katzenelson, J. and Milgrom, E., A Modified Markov Algorithm as a Language Parser, Memorandum ERL-M363, Electronics Research Laboratory, College of Engineering, U. of California, Berkeley.

13. Markov, A. A., Theory of Algorithms, Academy of Sciences of the USSR, 1954, English translation by Israel Program for Scientific Translations.

14. McIlroy, M. D., "Macro Instruction Extension of Compiler Languages", Comm. ACM 3, 4 (April 1960), 214-220.

15. Mendelson, E., Introduction to Mathematical Logic, Van Nostrand Co., Princeton, N.J., 1964.

16. Milgrom, E., Design of an Extensible Programming Language, doctoral dissertation, Technion-Israel Institute of Technology, 1971.

17. Milgrom, E. and Katzenelson, J., "Data Structures in the Extensible Programming Language AEPL", Proc. AFIPS 1972 FJCC, 515-523.


18. Perlis, A. J. and Iturriaga, R., "An Extension to Algol for Manipulating Formulae", Comm. ACM 7, 2 (Feb. 1964), 127-130.

19. Reynolds, J. C., "GEDANKEN-A Simple Typeless Language Based on the Principle of Completeness and the Reference Concept", Comm. ACM 13, 5 (May 1970), 308-318.

20. Sammet, J. E., Programming Languages: History and Fundamentals, Prentice-Hall Inc., Englewood Cliffs, N.J., 1969.

21. Schuman, S. A. (ed.), "Proceedings of the International Symposium on Extensible Programming Languages, Grenoble, 1971", SIGPLAN Notices 6, 12 (December 1971).

22. Schwartz, J. T., Abstract Algorithms and a Set-Theoretic Language for their Expression, Preliminary draft, Computer Science Dept., Courant Institute of Mathematical Sciences, New York University, New York, 1970-71.

23. Solntseff, N. and Yezersky, A., A Survey of Extensible Programming Languages, Computer Science Tech. Report No. 7117, McMaster University, Hamilton, Ontario, 1971.

APPENDIX

The MMA parsing algorithm

Let us assume the existence of a grammar G containing N rules G[1], G[2], . . ., G[N] and an input set p. The following variables are used in the algorithm:

P, PBEST are integer variables indicating the position of a match in p.

RBEST, RCURRENT are integer variables indicating rules in G. The algorithm is expressed in pseudo-Algol.

Step 1: [Initialize]: PBEST := 0; RCURRENT := 1;

Step 2: [Look for a match]: if MATCH (p, G(RCURRENT)) succeeds then goto step 4, otherwise proceed.

Step 3: [No match found]: if RCURRENT ≠ N then
    begin RCURRENT := RCURRENT + 1;
    goto step 2 end else terminate;

Step 4: [A match is found]: P := position of the match in p; if PBEST = 0 or P < PBEST then
    begin RBEST := RCURRENT; PBEST := P
    end

Step 5: [Check for more rules on this priority level]: if RCURRENT = N or PRIORITY (G(RCURRENT + 1)) ≠ PRIORITY (G(RCURRENT)) then goto step 7, otherwise goto step 6.

Step 6: [The next rule has the same priority]: RCURRENT := RCURRENT + 1; if MATCH (p, G(RCURRENT)) succeeds then goto step 4, otherwise goto step 5.

Step 7: [Apply the rule with the leftmost match]: APPLY (p, G(RBEST), PBEST); goto step 1.

end of the algorithm
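The control flow of the seven steps can be sketched in Python with the goto structure replaced by loops. This is an illustrative reading under stated assumptions, not the authors' implementation: MATCH and APPLY are passed in as callables, rules are assumed to be listed in priority order, `None` replaces the value 0 as the "no match yet" marker so that positions can be 0-based, and the toy string-rewriting grammar (`Rule`, `match_str`, `apply_str`, `GRAMMAR`) is entirely hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Rule:
    priority: int
    pattern: str
    replacement: str


def mma_parse(p: T, grammar: List[Rule],
              match: Callable[[T, Rule], Optional[int]],
              apply_rule: Callable[[T, Rule, int], T]) -> T:
    n = len(grammar)
    while True:                                   # step 1: restart from the first rule
        r, pos = 0, None
        while r < n:                              # steps 2-3: first rule that matches
            pos = match(p, grammar[r])
            if pos is not None:
                break
            r += 1
        if pos is None:                           # step 3: no rule matches, terminate
            return p
        r_best, p_best = r, pos                   # step 4: record the match
        # steps 5-6: try the remaining rules on the same priority level
        while r + 1 < n and grammar[r + 1].priority == grammar[r].priority:
            r += 1
            pos = match(p, grammar[r])
            if pos is not None and pos < p_best:  # step 4 again: keep the leftmost
                r_best, p_best = r, pos
        p = apply_rule(p, grammar[r_best], p_best)  # step 7


def match_str(p: str, rule: Rule) -> Optional[int]:
    i = p.find(rule.pattern)
    return None if i < 0 else i


def apply_str(p: str, rule: Rule, pos: int) -> str:
    return p[:pos] + rule.replacement + p[pos + len(rule.pattern):]


GRAMMAR = [Rule(1, "ab", "X"), Rule(1, "ba", "Y"), Rule(2, "XY", "Z")]
```

With this toy grammar, `mma_parse("baab", GRAMMAR, match_str, apply_str)` first rewrites the leftmost match among the two priority-1 rules (`"ba"` at position 0), which shows how position, not rule order, decides within one priority level.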

In order to complete the description of this algorithm, we must still define the auxiliary processes MATCH and APPLY.

The process MATCH (input set, rule) scans the input set from left to right and compares it with the input-pattern of the rule. For a subset s of the input set to match the input pattern i of the rule, a number of conditions must be fulfilled :

1) s and i must have the same number of components; this establishes a one-to-one correspondence between the objects referenced by the components of s and those referenced by the components of i.

2) Let α and β be two corresponding objects, α being in the input-pattern i, β in the input set s.

The values of the mode, type and value attributes of β must be equal to the values of the corresponding attributes of α, provided that the latter are defined.

3) If two components of i are references to the same object, the corresponding components of s must also refer to the same object.

4) The evaluation of the predicate of the rule for this attempted match must yield the value true. The predicate is evaluated as a procedure call where the objects of s are the actual objects bound to the corresponding dummy objects of i.
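The four conditions can be read as a checklist, as in the following Python sketch. It rests on assumptions of ours rather than on the paper: the subset s is taken to be a contiguous run of the input set, an undefined attribute in the pattern is represented by `None`, "the same object" is modelled by Python object identity, and the names `Obj`, `matches_at` and `match` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(eq=False)          # eq=False: objects are compared by identity only
class Obj:
    mode: Optional[str] = None
    type: Optional[str] = None
    value: Any = None


def matches_at(inp: List[Obj], pattern: List[Obj], pos: int,
               predicate: Callable[..., bool]) -> bool:
    s = inp[pos:pos + len(pattern)]
    if len(s) != len(pattern):                    # condition 1: same number of components
        return False
    for alpha, beta in zip(pattern, s):           # condition 2: defined attributes agree
        for attr in ("mode", "type", "value"):
            wanted = getattr(alpha, attr)
            if wanted is not None and getattr(beta, attr) != wanted:
                return False
    for j in range(len(pattern)):                 # condition 3: shared references preserved
        for k in range(j + 1, len(pattern)):
            if pattern[j] is pattern[k] and s[j] is not s[k]:
                return False
    return bool(predicate(*s))                    # condition 4: the rule's predicate holds


def match(inp: List[Obj], pattern: List[Obj],
          predicate: Callable[..., bool] = lambda *objs: True) -> Optional[int]:
    """Scan left to right; return the position of the first match, or None."""
    for pos in range(len(inp) - len(pattern) + 1):
        if matches_at(inp, pattern, pos, predicate):
            return pos
    return None


x = Obj("VARIABLE", "EXPR", "X")
y = Obj("VARIABLE", "EXPR", "Y")
plus = Obj("CONSTANT", "ADOP", "+")
one = Obj("CONSTANT", "INT", 1)

operand = Obj(type="EXPR")
same_both_sides = [operand, Obj(type="ADOP"), operand]      # one object referenced twice
any_two = [Obj(type="EXPR"), Obj(type="ADOP"), Obj(type="EXPR")]
```

The pattern `same_both_sides` matches `[x, plus, x]` but not `[x, plus, y]`, because condition 3 requires both operands to be the same object; `any_two` accepts either.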

The process APPLY (input set, rule, position) involves the replacement of the subset s of the input set which matches the input-pattern i of the rule by a replacement set r which is generated from the output-pattern o of the rule. Then, the action of the rule is executed as a procedure call where the objects of s and r are the actual objects bound to the corresponding dummy objects of i and o.
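A compact Python sketch of this replacement step follows. It is a simplification under our own assumptions: the input set is a plain list, the output-pattern is represented by a hypothetical function `build_replacement` that generates r from s, the rule's action is an optional callable receiving s and r, and all names (`apply_rule`, `build_sum`, `link_operands`) are illustrative.

```python
from typing import Any, Callable, List, Optional


def apply_rule(inp: List[Any], position: int, pattern_length: int,
               build_replacement: Callable[[List[Any]], List[Any]],
               action: Optional[Callable[[List[Any], List[Any]], None]] = None
               ) -> List[Any]:
    s = inp[position:position + pattern_length]   # the matched subset
    r = build_replacement(s)                      # replacement set from the output-pattern
    result = inp[:position] + r + inp[position + pattern_length:]
    if action is not None:                        # then execute the rule's action,
        action(s, r)                              # bound to the objects of s and r
    return result


def build_sum(s: List[Any]) -> List[Any]:
    # The output-pattern here is a single, initially empty EXPR object.
    return [{"type": "EXPR", "value": None}]


def link_operands(s: List[Any], r: List[Any]) -> None:
    # The action fills the new object with the structure it stands for.
    r[0]["value"] = (s[0], s[1], s[2])


tokens = ["A", "+", "B", ";"]
reduced = apply_rule(tokens, 0, 3, build_sum, link_operands)
```

The sketch returns a new list and leaves `tokens` untouched; whether AEPL replaces the subset in place is not something this example tries to model.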
