Executing PASCAL programs on a PROLOG architecture

systems


by G CHEN and M H WILLIAMS

Abstract: The decision by the Japanese to build their Fifth Generation Computer Project around systems based on logic programming rather than on conventional imperative languages is a significant departure from the style of computers developed in the past. The effect which this might have on computer systems in the future has led to concern about what might happen to the large base of existing software which is implemented in imperative languages. To allay fears on this score an investigation has been conducted into the feasibility of translating conventional languages like PASCAL into PROLOG. The results of this study are reported.

Keywords: programming languages, PASCAL, PROLOG.

Since its development in 1970 as a tool for research in artificial intelligence, the use of the programming language PROLOG [1,2] has grown relatively slowly. Nevertheless it has been successfully used to develop programs in a number of different application areas, including natural language processing [3], expert systems [4], database query languages [5] and CAD modellers [6].

Recently, the importance of the language has received a considerable boost as a result of the decision by the Japanese to establish their Fifth Generation Computer Project on logic programming languages [7]. One of the objectives of this project is to develop several different computer systems whose kernel languages are variants of PROLOG. Subsequently other national and international research programmes provoked by the Japanese undertaking have also recognized the importance of logic programming.

Although there has been considerable debate over the likelihood of success of these research programmes and their effects over the next decade or two, there has also been some concern regarding the large base of existing software which is implemented in imperative languages. In particular, there is concern about what would happen to this software if computer systems with these radically different architectures were to replace existing systems.

Department of Computer Science, Heriot-Watt University, Edinburgh, UK

Ideally, programs which have been developed in imperative languages should be able to be run, albeit inefficiently, on architectures designed for logic programming languages. To assess the feasibility of this, a study has been conducted into the possibility of translating programs in a conventional imperative language into equivalent programs in PROLOG. For the purpose of this exercise PASCAL was chosen as representative of the class of imperative languages, partly because of its simplicity and neat control and data structures and partly because it has a reasonably precise formal specification [8-10].

Because of the fundamentally different approach to control in the two languages (PASCAL and PROLOG), the goto statement is particularly difficult to translate. To handle this, the translation process has been divided into two stages. In the first stage all goto statements are removed from the PASCAL program. This is achieved by mapping it into an equivalent PASCAL program without goto statements. The second stage converts a structured PASCAL program into an equivalent PROLOG program. To simplify the problem, it was decided at this stage to concentrate on a subset of PASCAL, PASCAL-S [11]. This does not include pointer types, file types, set types, variant record fields or packed variables.

Besides the handling of goto statements, there are a number of major problems in the translation process. These include:

• differences in the nature of variables in the two languages,

• approach to parameters,

• difference in structured data types,

• handling of assignments,

• translation of procedure or function bodies.

This paper reviews briefly the approaches adopted in the two stages of the translation process and the problems currently under investigation.

vol 29 no 6 july/august 1987 0950-5849/87/060285-06$03.00 © 1987 Butterworth & Co (Publishers) Ltd. 285

Restructuring the PASCAL program

PROLOG is a logic programming language based on a subset of first order predicate calculus. The execution of a PROLOG program corresponds to the proof of some goal, i.e. that some particular relation is satisfied. This is done by recursively matching the goal against the relation definitions until either a match is found or the attempt fails. In sequential PROLOG the order of evaluation is strictly left to right and depth first. When more than one match can be made, the clauses are selected in top down order.

This approach to control is fundamentally different from that in imperative languages. In the latter case control constructs can be divided into two categories.

Simple structured constructs. These include simple sequence, single-entry single-exit conditions and loops, and procedure or function calls. These are relatively 'well-behaved' and can be mapped onto PROLOG without much difficulty.

Unstructured constructs. In the case of PASCAL, the only statement which leads to unstructured constructs is the goto statement. There is no equivalent statement in PROLOG, nor is it possible to produce the effect of one without great difficulty. Thus, when a PASCAL program is to be translated into PROLOG, the only practical way of handling goto statements is to remove them altogether by mapping the program into an equivalent one without goto statements.

In formulating an algorithm to remove all goto statements from a PASCAL program, it is convenient to regard a PASCAL program as a hierarchical structure. Thus a block is made up of a sequence of statements which are at the same level (one lower than the block itself) and will be executed sequentially. Likewise the body of a loop statement (for, repeat or while) consists of a sequence of statements which are at a level one below that of the loop statement and which will be executed sequentially. The same is true of if or case statements. Hence each program can be regarded as a hierarchical tree structure with simple statements (such as assignment, input-output or procedure call) as leaves and structured statements as branch nodes. By this reasoning a goto statement is treated as a leaf node in the tree.

Following Jensen and Wirth [12], in a valid PASCAL program:

• A goto statement which is outside a structured statement will not be directed to a label within the structured statement.

• A goto statement which is within one limb of a conditional statement will not be directed to a label within another limb of the same conditional statement.

Since this study is only concerned with correct PASCAL programs, neither of these conditions should be violated.

The process of restructuring is based on mapping unstructured constructs in the PASCAL program into equivalent ones until a structured program is arrived at. The transformations used can be divided into three categories. First, a branch to a point in the program at the same level as the goto statement. This will be transformed into a repeat or an if-then statement depending on whether it is a backward or forward jump respectively. In doing so this will create a new level corresponding to the body of the loop or limb of the conditional statement. Some labels may become further separated from the goto statements which refer to them by being embedded into the new repeat or if statement. Additional transformations are used to move labels out of such structures. Normally one Boolean variable is introduced as a flag for each label moved.

Second, a branch from within a structured statement to a point outside this statement. By using one flag, a goto statement can be moved out of the structured statement, thereby decreasing the level of that goto statement by one.

Third, a branch from within a procedure or function body to some point outside the body (abnormal exit). This can be replaced by assigning the value true to a global flag to indicate that the abnormal exit takes place in the body of the procedure or function, and inserting a jump instruction to the end of the body to terminate the execution. The abnormal exit is therefore translated into a normal jump instruction and an instruction setting the global flag. On the other hand, for each function or procedure call which may be affected by such an abnormal exit, an instruction must be inserted after the invocation of the function or procedure, which tests the global flag and branches to the corresponding label if the flag is true. Thus, any expression containing such functions must be split up so that the evaluation of a function is isolated and a dummy variable is introduced to keep the value of the function temporarily. After all functions have been evaluated, the value of the expression is calculated by substituting the values of the dummy variables in place of the function calls.
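The third group of transformations can be illustrated by analogy. The following Python sketch is not the authors' implementation: Python has no goto statement, so a plain return stands in for the jump to the end of the body, and the names abnormal_exit, check and process are hypothetical. It shows only the shape of the result, namely a flag assignment inside the body and a test of the flag inserted after the call.

```python
# Analogy only: `return` stands in for the "jump to the end of the body".
abnormal_exit = False  # global flag (hypothetical name)


def check(value):
    """Body that originally held a goto to a label outside the procedure."""
    global abnormal_exit
    if value < 0:
        abnormal_exit = True   # record that the abnormal exit was taken
        return                 # normal jump to the end of the body
    # ... the remainder of the body would run here ...


def process(values):
    """Caller: the flag is tested immediately after each invocation."""
    global abnormal_exit
    abnormal_exit = False
    handled = []
    for v in values:
        check(v)
        if abnormal_exit:      # test inserted after the call
            return handled, "label reached"
        handled.append(v)
    return handled, "finished normally"
```

In this sketch the caller branches as soon as the flag is set, which mirrors the inserted test-and-branch instruction described above.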

Based on these three groups of transformations, the algorithm to remove all goto statements operates briefly as follows. First, all abnormal exits in the procedure and function definitions need to be removed using the third group of transformations. These transformations are applied repeatedly until all the abnormal exits are translated into normal jump instructions which do not cross procedure body boundaries. Second, all remaining goto statements are removed by systematic application of the first two groups of transformations.
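The first group of transformations (a jump to a label at the same level) can be sketched as follows. This Python fragment is an illustrative assumption rather than the system described here: the statement encoding, with ("label", L) and ("ifgoto", cond, L) tuples, and the function name are hypothetical, and it handles only one conditional goto whose label lies at the same level and is referred to by no other goto.

```python
def remove_same_level_goto(stmts):
    """Rewrite the first conditional goto whose label is at the same level.

    A forward jump becomes an if statement guarding the skipped statements;
    a backward jump becomes a repeat loop around the repeated statements.
    """
    for i, s in enumerate(stmts):
        if isinstance(s, tuple) and s[0] == "ifgoto":
            _, cond, label = s
            j = stmts.index(("label", label))   # ValueError if not at this level
            if j > i:
                # Forward: run stmts[i+1:j] only when the jump is not taken.
                guarded = ("if", ("not", cond), stmts[i + 1:j])
                return stmts[:i] + [guarded] + stmts[j + 1:]
            # Backward: repeat stmts[j+1:i] until the jump condition is false.
            loop = ("repeat", stmts[j + 1:i], ("not", cond))
            return stmts[:j] + [loop] + stmts[i + 1:]
    return stmts
```

Each application creates a new level (the limb of the if or the body of the repeat), which is why further transformations may be needed for any labels that become embedded in it.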

A system for restructuring PASCAL programs in this way has been implemented in PROLOG. The only overhead incurred in restructuring in this way is the flags and the operations on them. Normally a single pass of the PASCAL source program is sufficient to achieve this.

286 information and software technology

Translating structured PASCAL-S to PROLOG

Because of the fundamental differences in data and control structures between the two languages concerned, the translation process is not straightforward. Some critical problems are discussed briefly in this section. For further details, see Williams and Chen [13].

The variables in these two languages are conceptually different. A PASCAL variable is a token of a piece of von Neumann storage which can be accessed and updated in the process of state transformations. By contrast a PROLOG variable stands for a single object and, once instantiated with a value, may not be altered, unless by backtracking the system returns to the point where the variable is instantiated and undoes the instantiation. Based on this conceptual difference the translation process creates a sequence of PROLOG variables to stand for a PASCAL variable at various points in a computation. Denotationally, whenever a PASCAL variable is updated, the translation process will create a new PROLOG variable to stand for this new value in the subsequent execution and abandon the old one.
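The renaming idea can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the translator itself: straight-line assignments are encoded as hypothetical (target, operands) pairs, and the generated names (X0, X1, ...) are merely one possible naming scheme for the sequence of PROLOG variables.

```python
def rename_assignments(statements):
    """Give every update of a variable a fresh, never-reassigned name.

    `statements` is a list of (target, operands) pairs, a hypothetical
    encoding of assignments such as x := x + y.
    """
    version = {}      # current version number of each PASCAL variable
    renamed = []

    def current(name):
        # A variable not yet assigned is read as version 0 (its input value).
        return f"{name.capitalize()}{version.get(name, 0)}"

    for target, operands in statements:
        reads = [current(v) for v in operands]        # read the old versions
        version[target] = version.get(target, 0) + 1  # abandon the old name
        renamed.append((current(target), reads))      # bind a fresh name
    return renamed
```

For x := x + y; x := x * 2; y := x the sketch yields X1 from X0 and Y0, then X2 from X1, then Y1 from X2, so no name is ever bound twice.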

The mapping of a single PASCAL variable into different PROLOG variables at different stages of state transformation is direct and has no side effect. However, two distinct PASCAL variables may refer to the same location, which is referred to as variable aliasing. This occurs, for example, when the same variable is used as the actual parameter for two different formal variable parameters, or when a variable used as an actual parameter is also referred to directly as a global variable within the procedure or function body. In such cases it is necessary to split the translation into two mappings, one from the domain of variables to that of locations and the other from locations to values. All access to or update of this kind of variable is then performed by two predicates which treat these special cases. An algorithm has been derived to detect possible variable aliasing at compile time.
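The two-stage mapping can be sketched in Python as follows. The names alias_access and alias_update, and the use of dictionaries for the two mappings, are assumptions made for illustration; the paper does not give the definitions of its two predicates.

```python
def alias_access(env, store, name):
    """Look a variable up in two steps: name -> location -> value."""
    return store[env[name]]


def alias_update(env, store, name, value):
    """Return a new store in which the location of `name` holds `value`.

    The old store is left untouched, in keeping with the absence of
    destructive assignment in PROLOG.
    """
    new_store = dict(store)
    new_store[env[name]] = value
    return new_store


# Hypothetical situation: formal parameter p is an alias of global g.
env = {"g": 0, "p": 0, "q": 1}      # variables -> locations
store = {0: 5, 1: 7}                # locations -> values
store_after = alias_update(env, store, "p", 9)
```

After the update through p, a read of g in store_after sees the new value because both names map to location 0, while the original store still holds the old value.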

A PASCAL block is translated as a rule to call each statement making up the block in turn. Each structured statement is translated as a rule which operates on a subset of the variables and yields another subset as the result, for example:

  if a > b
    then if a > c
      then max := a
      else max := c
    else if b > c
      then max := b
      else max := c

the rule corresponding to this if statement will operate on the input set {a, b, c} and output set {max}. Since a block or structured statement may in turn contain other blocks or structured statements, such a translation can be performed recursively until simple statements such as assignment, procedure calls or input-output statements are encountered. These can be coded directly. The necessary set of parameters for each rule can be determined by constructing a data flow analysis framework at compile time.
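The idea that each structured statement becomes a rule from an input set to an output set can be sketched as follows. This Python fragment is an analogy only, with hypothetical function names, and it takes the nested if statement as computing the maximum of a, b and c. Each function plays the part of one rule, its parameters are the input set and its result is the output set.

```python
def inner_when_a_greater(a, c):
    """Rule for the limb: if a > c then max := a else max := c."""
    return a if a > c else c


def inner_when_a_not_greater(b, c):
    """Rule for the limb: if b > c then max := b else max := c."""
    return b if b > c else c


def outer_if(a, b, c):
    """Rule for the whole statement: inputs {a, b, c}, output {max}."""
    if a > b:
        return inner_when_a_greater(a, c)
    return inner_when_a_not_greater(b, c)
```

Note that each inner rule needs only two of the three inputs, which is the kind of information the data flow analysis is used to determine.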

In the case of array variables, there is no direct equivalent of arrays in PROLOG. To deal with these, two predicates called 'update' and 'access' were defined to modify or retrieve the relevant value from an array, and various different representations were experimented with. The most suitable was found to be a height-balanced binary tree. Thus an assignment in PASCAL can be translated as an instantiation of a PROLOG variable in the sequence corresponding to the destination of the assignment. The same is true for the values passed to the parameters of procedure or function calls.

The fundamental differences in the approach for handling arrays proved to be a particular stumbling block to efficiency in the PROLOG analogues of PASCAL programs. Even with a binary tree representation, updating and accessing an array element requires O(log₂ N) time, which is very inefficient compared with the time to access an array element in PASCAL.
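The logarithmic cost can be seen in a small sketch of a tree-based array. The Python code below is an assumption about one possible representation, not the authors' PROLOG predicates: nodes are (index, value, left, right) tuples, the tree is balanced by construction over fixed bounds, and update rebuilds only the nodes on the search path while sharing the rest with the old tree.

```python
def build(lower, upper, default=0):
    """Build a balanced tree for indices lower..upper, all set to `default`."""
    if lower > upper:
        return None
    mid = (lower + upper) // 2
    return (mid, default,
            build(lower, mid - 1, default),
            build(mid + 1, upper, default))


def access(tree, index):
    """Return the element at `index`, descending one level per comparison."""
    while tree is not None:
        key, value, left, right = tree
        if index == key:
            return value
        tree = left if index < key else right
    raise IndexError(index)


def update(tree, index, new_value):
    """Return a new tree; only the nodes on the search path are rebuilt."""
    if tree is None:
        raise IndexError(index)
    key, value, left, right = tree
    if index == key:
        return (key, new_value, left, right)
    if index < key:
        return (key, value, update(left, index, new_value), right)
    return (key, value, left, update(right, index, new_value))
```

Both operations visit one node per level, so their cost grows with the height of the tree rather than being constant as in a PASCAL array.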

To assess the extent of this problem six simple examples were tested using the C-PROLOG system. Of these, three used an array as the main data structure. In each case variations of the program were produced using arrays of different sizes in the range 5 to 400 elements. The remaining three examples did not use arrays at all.

In Table 1 the execution times are compared by showing the ratio of the time taken to execute the PROLOG code which is generated by this approach (Code) to the time taken by the original PASCAL program running under the Unix PASCAL interpreter EMI (PASC), and the ratio of Code to the time taken to execute an alternative PROLOG program which was handwritten to produce the same effect (Prol). The results show clearly that, although not the only source of inefficiency, the representation of array variables is a major cause of inefficiency.

Table 1. Execution times

Example no.   Uses arrays   Code/PASC   Code/Prol
1             yes           10-25       11-19
2             yes           10-23       6-10
3             yes           8-18        3-5.5
4             no            0.9-1.6     1.5-1.9
5             no            1.4-2.4     1.3
6             no            1.3         1.4

For programs which do not involve arrays the execution times of the translated programs are much closer to the execution times of the PASCAL programs.

Adding an array facility to PROLOG

In the previous section the translation from PASCAL-S into PROLOG was seen to lead to very inefficient code when array variables are used in the PASCAL-S program. This was due to the fact that in PROLOG a list or binary tree structure was used to simulate an array, with the result that the time required to access an element depends either linearly or logarithmically on the size of the array concerned. This problem may be alleviated if an efficient array facility could be provided in PROLOG. To this end an extension of PROLOG has been implemented which provides such a facility and its performance has been evaluated.

Any array can be mapped onto a one-dimensional vector with lower and upper bounds which are non-negative numbers. Three basic operations have been provided to manipulate arrays in the implementation. The first is isarr(Lower,Upper,Array), which is used to create an array or to test whether an object is unified with an array. When it is used to create an array, the first two arguments specify the lower and upper bounds of the array and the third argument stands for the array created. The second is access(Array,Index,Element), which selects an element of an array. The third is update(OldArray,Index,NewValue,NewArray), which is used to modify an array element. A new array (fourth argument) is created from an existing array (first argument) in which the element at the specified index is given the value NewValue.

While the operations isarr and access are relatively straightforward to implement, the operation update can be handled in a number of different ways. Two such ways have been catered for. First, it can create a new copy of the array, assigning the selected element in this new array and copying the remainder of the values from the old array (update by copying). Second, it can update the array in situ, assigning the value to the selected element directly (selective update).

The second interpretation defines a destructive operator which will increase the efficiency considerably. However, to maintain the applicative character of PROLOG, it is important to preserve one of its main features: referential transparency. That is, an element of an array can be updated directly only if the array is not referred to in the subsequent execution. Otherwise the element must be updated by copying.

To decide when the more efficient selective update can be used, a data flow analysis framework for PROLOG programs has been implemented which will give the correlative referential states of the variables in each goal. Thus, for each array variable to be updated, the system can decide whether the update can be achieved destructively (i.e. if there is no further reference to this variable in the subsequent execution) or whether it must be done by copying.
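The choice between the two forms of update can be sketched as follows. In this Python illustration the flag old_is_dead is a hypothetical stand-in for the result of the data flow analysis, and a Python list stands in for the array; neither is taken from the implementation described here.

```python
def update(old_array, index, new_value, old_is_dead):
    """Return an array equal to `old_array` except at `index`.

    `old_is_dead` is True when the analysis has shown that `old_array`
    is never referred to again in the subsequent execution.
    """
    if old_is_dead:
        old_array[index] = new_value    # selective update, in situ
        return old_array
    new_array = list(old_array)         # update by copying
    new_array[index] = new_value
    return new_array
```

When the flag is False the old array remains valid for later goals, so referential transparency is preserved at the cost of a full copy; when it is True no later goal can observe the destructive change.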

The results of this investigation did not yield as significant an improvement as might have been expected, and are reported more fully in Chen and Williams' paper [14].

Parallel logic programming languages

Another area of interest is how this work may be extended to translate PASCAL programs into either Concurrent PROLOG [15] or PARLOG [16].

Concurrent PROLOG and PARLOG are logic programming languages similar to PROLOG except that they have the ability to execute rules and goals in parallel. Both languages provide 'and-parallelism' (to execute goals in parallel rather than in the left to right order in sequential PROLOG) and 'or-parallelism' (to execute alternative clauses in parallel rather than the top-to-bottom order of PROLOG). They incorporate features such as guards, stream communication via shared variables and committed choice non-determinism as the 'and-parallel' component. PARLOG uses mode declarations and producer and consumer annotations as the basic synchronization mechanism to achieve 'and-parallelism', whereas Concurrent PROLOG uses the read-only annotation to achieve this.

By performing a data flow analysis on a PASCAL program, it is possible to decide which parts of a program represented by different PROLOG goals can be executed in parallel. For example, in the case of translation to PARLOG, when a PASCAL program is analysed and each statement is represented as a PROLOG rule, it is natural to derive the mode declarations for this rule and to construct two sets of PROLOG variables corresponding to the set of PASCAL variables consumed in the goal and the set produced in the goal. If goal B follows goal A in the equivalent PROLOG program and the intersection of the variable set produced by A and that consumed by B is empty, goals A and B can be executed in parallel.
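The independence test stated above amounts to a set intersection. The Python sketch below uses hypothetical goal descriptions, each summarised by the sets of variables it consumes and produces, and checks only the condition given in the text.

```python
def can_run_in_parallel(goal_a, goal_b):
    """Test from the text: nothing produced by A is consumed by B."""
    return goal_a["produced"].isdisjoint(goal_b["consumed"])


# Hypothetical goals, as a data flow analysis might summarise them.
goal_a = {"consumed": {"x"}, "produced": {"y"}}
goal_b = {"consumed": {"x"}, "produced": {"z"}}
goal_c = {"consumed": {"y", "z"}, "produced": {"w"}}
```

Here goal_a and goal_b may run in parallel because goal_b consumes nothing that goal_a produces, whereas goal_c must wait for goal_a because it consumes y.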

One difficulty lies in the maximum parallelism achievable for an iterative statement by using 'divide and conquer' techniques. For example, the algorithm to calculate the inner product of two arrays (using a for statement) can be understood as halving the arrays each time, determining the inner products for each half and then adding them together. Thus the for statement can be translated into two calls to the same PROLOG rule with different parameters and executed in parallel.
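The halving scheme for the inner product can be sketched as follows. This Python version is sequential and purely illustrative, with a hypothetical function name and half-open index ranges chosen for convenience; it shows only that the two recursive calls share no intermediate results, which is what would allow them to be executed in parallel.

```python
def inner_product(xs, ys, lo, hi):
    """Inner product of xs[lo:hi] and ys[lo:hi], computed by halving."""
    if hi - lo == 0:
        return 0
    if hi - lo == 1:
        return xs[lo] * ys[lo]
    mid = (lo + hi) // 2
    # The two calls are independent of one another: neither consumes
    # anything the other produces.
    left = inner_product(xs, ys, lo, mid)
    right = inner_product(xs, ys, mid, hi)
    return left + right
```

The depth of the recursion is logarithmic in the length of the arrays, in contrast to the linear chain of additions performed by the original for statement.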


Thus with an extension to the method discussed previously, the system might generate code in Concurrent PROLOG or PARLOG which can be executed in parallel.

Conclusion

The study described in this paper has shown how a program in the conventional programming language PASCAL can be restructured and translated to execute on a PROLOG architecture. The translation discussed herein produced the following results.

First, a program in a conventional programming language, a realistic subset in the form of PASCAL-S, was translated as a set of PROLOG rules. For any PASCAL-S program, a PROLOG program was obtained by applying the translation algorithm to the PASCAL-S source program. From this the extension to full PASCAL has been looked at and no significant problems encountered other than ones of efficiency.

Second, the syntax-directed translation used associates each PASCAL BNF syntax definition with a Horn clause representation in a way which is natural both in respect of the translation process and the PROLOG representation of PASCAL programs.

Third, the translation algorithm produced a representation which takes account both of the basic program constructs and of conventional variables. The basic principles of the translation algorithm are suited to other conventional programming languages as well.

Fourth, the PASCAL source program was translated into a group of PROLOG procedures in which the interfaces between individual procedures relied solely on variables as parameters. This approach changes the dominance of control flow in the source program into that of data flow in the object program.

Several problems emerged from this study, the most serious among these being the efficiency of the object code. The object PROLOG program can run very efficiently in many cases, but in some applications the speed of the translated code was drastically reduced. It is expected that this might be improved in two respects. First, on the translation side, a refined analysis of the data flow and control flow of the source program might be required, not only to obtain information on the logical relationship between program objects, but also to explore the additional information needed to help the translation process to analyse, reorganize and optimize these relations in order to produce more efficient code. Second, further study of logic programming may be required to improve the performance in general.

Although the study of this translation process is primarily concerned with the problem of how a PASCAL program can be handled on a PROLOG architecture, a secondary interest lies in how PROLOG itself can be further developed to achieve its goal, as a high level system programming language and as a kernel language for the fifth generation computer systems. The study revealed some problems in this respect.

First, a powerful conventional structured data type, the array data type, cannot be handled efficiently in PROLOG through the mechanism of system predicates. Although the result proved that the introduction of an array facility cannot match the efficiency of a list in PROLOG without a fundamental change to the PROLOG interpreter, it revealed that in handling problems which depend on arrays (editing, searching, matrix multiplication), the performance of PROLOG is drastically inferior to that of conventional programming languages. PROLOG has already shown merit in many areas such as artificial intelligence, databases, natural language processing and knowledge representation, but it is still questionable whether it can achieve as much in numerical computation and system programming. Many proposals have been made to adopt PROLOG as a high level general purpose system programming language; however, the problems with the data structures available in PROLOG severely limit the effectiveness of PROLOG in achieving this purpose.

Second, the system overhead to achieve nondeterminism and backtracking is very high, although these features are not always necessary for certain applications. Most conventional computation algorithms do not require them, even when the algorithms can be implemented efficiently in PROLOG. For instance, most implementations of algorithms tested in this work such as searching, sorting and editing do not make use of the features of nondeterminism or backtracking; the same must be true of a wide range of numerical computation algorithms. This overhead greatly reduces the efficiency of the language when these classes of applications are required. One improvement which could assist the PROLOG system in reducing this overhead when nondeterminism or backtracking are not required is more general exploitation of mode declarations. This remains a significant problem for the language designer to overcome.

References

1 Clocksin, W F and Mellish, C S Programming in PROLOG Springer-Verlag (1981)

2 Roussel, P PROLOG: Manuel de reference et d'utilisation Groupe d'Intelligence Artificielle, Universite d'Aix-Marseille, Luminy, France (1975)

3 Pereira, F C N and Warren, D H D 'Definite clause grammars for language analysis: a survey of the formalism and a comparison with augmented transition networks' Artif. Intell. Vol 13 No 3 (May 1980) pp 231-278

4 Clark, K L and McCabe, F G 'PROLOG: A language for implementing expert systems' Research report Dept of Computing, Imperial College of Science and Technology, London, UK (November 1980)

5 Neves, J C, Anderson, S O and Williams, M H 'A PROLOG implementation of query-by-example' Proc. 7th Int. Comput. Symp. (Eds Schneider, H J and Teubner, B G) (1983) pp 318-332

6 Camacho-Gonzalez, J, Williams, M H and Aitchison, I E 'Evaluation of the effectiveness of PROLOG for a CAD application' IEEE Comput. Graph. Applic. Vol 4 No 3 (March 1984) pp 67-75

7 Moto-oka, T 'Challenge for knowledge information processing systems' Fifth Generation Computer Systems pp 3-89 North-Holland, Amsterdam, The Netherlands (1982)

8 Wirth, N 'The programming language PASCAL' Acta Inf. Vol 1 No 1 (1971) pp 35-63

9 Specification for computer programming language PASCAL BS 6192:1982 British Standards Institute (1982)

10 Williams, M H and Chen, G 'Restructuring PASCAL programs containing GOTO statements' Comput. J. Vol 28 No 2 (1985) pp 134-137

11 Wirth, N 'PASCAL-S: A subset and its implementation' (Ed Barron, D W) pp 199-259 Wiley-Interscience Publication (1981)

12 Jensen, K and Wirth, N PASCAL user's manual and report Springer-Verlag, New York (1975)

13 Williams, M H and Chen, G 'Translating PASCAL for execution on a PROLOG-based system' Comput. J. Vol 29 No 3 (1986) pp 246-252

14 Chen, G and Williams, M H 'The value of an array facility in PROLOG' Inf. Process. Lett. Vol 23 No 5 (November 1986) pp 247-251

15 Shapiro, E 'A subset of concurrent PROLOG and its interpreter' Report CS83-06 Dept Applied Mathematics, The Weizmann Institute of Science, Rehovot, Israel (February 1983)

16 Clark, K L and Gregory, S 'PARLOG: Parallel programming in logic' Research report DOC84/4 Dept of Computing, Imperial College of Science and Technology, London, UK (April 1984)

