
Parallelism and Implementation Technology for Logic

Programming Languages

Vítor Santos Costa

LIACC & DCC-FCUP

Universidade do Porto

4150 Porto, Portugal

Abstract

Logic programming provides a high-level view of programming where programs are fun-

damentally seen as a collection of statements that define a model of the intended problem.

Logic programming has been successfully applied to a vast number of applications, and has

been shown to be a good match for parallel computers.

This survey discusses the major issues on the implementation of logic programming

systems. We first survey the evolution of sequential implementations, since the original

Marseille Prolog implementation. Focus is then given to the WAM based techniques that

are the basis for most Prolog systems. More recent developments are presented, such as

compilation to native code.

We next survey the main issues of parallel logic programming, since the original pro-

posals for And/Or parallel systems. The article describes the major techniques used for the

shared memory systems that implement only Or-parallelism and only And-parallelism,

such as Aurora, Muse, &-Prolog, DDAS, &-ACE, PARLOG and KLIC. Last, the survey dis-

cusses recent work on combining several forms of parallelism, as in the Andorra based

languages such as Andorra-I or Penny, or in the Independent-And plus Or models, such as

the PBA, SBA, ACE, or Fire.

1 Introduction

Developments in computing have been dominated by the rise of ever more powerful hard-

ware. Processing speed and memory capacity have increased dramatically over the last


decades. Parallel computers connect together several processing units to obtain even higher

performance. Unfortunately, progress in software has been much less impressive. One rea-

son is that most programmers still rely on traditional, imperative languages, and high-level

tasks are difficult to express in an imperative language primarily concerned with how mem-

ory positions are to be updated. This low-level approach to programming is also cumbersome

when programming parallel computers, as the details of control flow can become very com-

plex, and as the best execution strategy can very much depend on a computer’s architecture

and configuration.

In contrast to the traditional programming languages, logic programming provides a

high-level view of programming. In this approach, programs are fundamentally seen as a

collection of statements that define a model of the intended problem. Questions may be

asked against this model, and can be answered by an inference system, with the aid of some

user-defined control. The combination was summarised by Kowalski [96]:

    algorithm = logic + control

Traditionally, logic programming systems are based on Horn clauses, a natural and useful

subset of First Order Logic. For Horn clauses, a simple proof algorithm, SLD resolution, pro-

vides clear operational semantics and can be implemented efficiently. The most popular logic

programming language is Prolog [39]. Throughout its history, Prolog has exemplified the use

of logic programming for applications such as artificial intelligence, database programming,

circuit design, genetic sequencing, expert systems, compilers, simulation and natural lan-

guage processing. Other logic programming languages have been successfully used in areas

such as constraint based resource allocation and optimisation problems, and on operating

system design.

Logic programming systems are also a good match for parallel computers. As different

execution schemes may be used for the same logic program, forms of program execution can

be developed to best exploit the advantages of the parallel architecture used. This means that

parallelism in logic programs can be exploited implicitly, and that the programmer can be

left free to concentrate on the logic of the program and on the control information necessary

to obtain efficient algorithms.


1.1 Organisation of the Survey

In this work we survey work on the sequential and parallel implementation of logic pro-

gramming. We describe the major issues in implementing sequential and parallel logic pro-

gramming systems including compilation techniques, abstract machine implementation, and

performance evaluation. The first part gives an overview of terminology and basic concepts

of logic programming (section 2), discusses how they were applied in the Prolog language

(section 3), and presents some of the major extensions to Prolog (section 4). The second

part (section 5) discusses the sequential implementation of Prolog and other logic program-

ming languages, since the original Marseille implementation. Focus is then given to the

WAM based techniques that are the basis for most Prolog systems. Last, we briefly survey

the more recent developments, such as native code compilation. The third part discusses

the implementation of parallelism since the original proposals for And/Or parallel systems.

The general concepts are given in section 6. Section 7 discusses the major issues in Or-

parallelism, section 8 the problems with And-parallelism and some combined models, and

last section 9 discusses the issues arising from the implementation of the Andorra models.

The survey terminates with pointers to further reading in the area (section 10) and with

some conclusions (section 11).

2 Logic Programming

Logic programs manipulate terms. A term is either a logical variable, or a constant, or a

compound term. Constants are elementary objects, and include symbols and numbers. Log-

ical variables are terms that can be attributed values or bindings. This process is known as

instantiation or binding. Logical variables can be seen as referring to an initially unspecified

object. Hence, variables can be given a definite value (or bound) only once. Several variables

can also be made to share the same value, that is a variable may be instantiated to another

variable.

Compound terms are structured data objects. Compound terms comprise a functor (called

the principal functor of the term) and a sequence of one or more terms, the arguments. A

functor is characterised by its name and by its arity, or number of arguments. The form name/arity is used to refer to a functor. In the Edinburgh syntax [32], terms are written as f(T1, ..., Tn), where f is the name of the principal functor and the Ti the arguments. A very

common term is the compound term '.'(Head, Tail), written as [Head|Tail], usually called the pair or the list constructor. The Edinburgh syntax also allows some functors to be written as

operators. For instance, the term ’+’(1,2) can also be written as 1+2.

A term is said to be ground, or fully instantiated, if it does not contain any variables. We

define size of a term to be one if the term is a constant or variable, and one plus the size of

the arguments if the term is a compound term.
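This definition is directly executable. Below is a minimal Prolog sketch (the predicate name term_size/2 is ours, not from the survey), using only the standard built-ins var/1, atomic/1 and =../2:

term_size(T, 1) :- var(T), !.              % a variable has size one
term_size(T, 1) :- atomic(T), !.           % so does a constant
term_size(T, Size) :-
    T =.. [_Functor|Args],                 % decompose the compound term
    args_size(Args, 0, ArgsSize),
    Size is ArgsSize + 1.

args_size([], Acc, Acc).
args_size([T|Ts], Acc0, Acc) :-
    term_size(T, S),
    Acc1 is Acc0 + S,
    args_size(Ts, Acc1, Acc).

For example, the query ?- term_size(f(a, g(X)), S). answers S = 4.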

We can now define Horn clauses. Horn clauses are terms of the form:

    H :- G1, ..., Gn.

where H is the head of the clause, and G1, ..., Gn form the body of the clause. The body of a clause is a conjunction of goals. Goals are terms, either compound terms or constants. Goals

are distinguished from other terms only by the context in which they appear in the programs.

The head consists of a single goal or is empty. If the head of a clause is empty, the clause

is called a query. If the head is a single goal but the body is empty, a clause is named a unit

clause, or fact. If the head is a goal and the body is non-empty, the clause is called a non-unit

clause, or rule. Logic programs consist of clauses. A sequence of clauses whose head goals

have the same functor forms a procedure.

One advantage of Horn clauses is that complete, and easy to implement, proof mecha-

nisms are available. Traditionally, the resolution rule is used by these mechanisms. Given

two clauses, resolution creates a new clause that is obtained by matching a negated goal of a

clause to a non-negated goal of another clause. Consider the two clauses:

    G' :- A, B.
    :- G'', C.

The resolution rule will use unification to match the goals G' and G'' to obtain a new clause, in this case :- A, B, C. If variables appear in the clauses, then the resolution process will obtain the most general unifier, mgu, for the goals that are being matched. For logic programs,

the mgu is unique if it exists. If it does not exist, the resolution rule fails.
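For example, unifying f(X, b) with f(a, Y) succeeds with mgu {X = a, Y = b}; in Prolog this can be observed directly, since =/2 unifies its arguments:

?- f(X, b) = f(a, Y).
X = a,
Y = b.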


The resolution rule can be used in a top-down or bottom-up fashion. Top-down systems

start from an initial query. This query is matched against a clause for the corresponding

predicate, and a new goal is launched according to some selection function. For Horn clauses,

one useful top-down form of resolution is SLD-resolution (or LUSH resolution [83]). In this

method, a query is matched against a clause, and generates a new query (or resolvent) built

from the remainder of the initial query and the body of the matching clause. This process goes

on recursively until either some goal has no matching clause, or until an empty query is

generated.

There is a simple and intuitive reading to SLD-resolution. Referring to the previous

clauses, G' :- A, B can be interpreted as part of the definition of a procedure, and the query :- G'', C as a set of goals to execute, or satisfy. SLD-resolution operates by selecting one goal of the query and calling a corresponding procedure. To satisfy this goal, some new goals need

to be satisfied, hence the new goals are added to the query. The process is repeated until all

the goals have been executed.

SLD-resolution does not specify which goal in the query should be selected. This is the

province of the selection function. Moreover, several clauses may match a goal, hence there

might be several ways to search for a solution. For a particular selection function, an SLD-

tree represents all the possible ways to solve a query from a program, that is, the search-space

of the program. It is important to remark that by changing the selection function, one can

change the search space. Consider this very small program:

a(1). a(2).

b(2).

Figure 1 shows the search trees corresponding to two different selection functions applied in

the execution of the query :- a(X), b(X). The first function selects a(X) first, and needs

to consider the two clauses for a/1. The second selects b(X) first and hence only has a single

matching clause for a(X).

There may be several strategies for exploring a search-space. A search rule describes

which alternative branches should be selected first. Search rules do not affect the search

space, but they can affect how quickly one will reach the first solution (if at all).


Figure 1: Different Search-Trees for the Same Query ((1) leftmost selection function; (2) rightmost selection function)

3 The Prolog Language

Prolog was invented in Marseille by Colmerauer and his team [38]. Prolog systems apply

SLD-resolution, but with some simplifications. Prolog uses a fixed selection function: the

leftmost goal is always selected first. The search rule of Prolog is also quite simple: Prolog

simply explores the tree in a depth-first left-to-right manner. Whenever several alternatives

for a goal are available, Prolog simply tries the first alternative, following the textual order in

the program. When an alternative fails, Prolog backtracks to the last place with unexplored

alternatives (that is, it restores the state of the computation to what it was before that point) and tries

the first remaining alternative.
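Prolog's selection function and search rule are made explicit by the classic vanilla meta-interpreter, shown below as a sketch (it assumes the program clauses are accessible through the standard built-in clause/2):

solve(true) :- !.
solve((A, B)) :- !,
    solve(A),                % selection function: leftmost goal first
    solve(B).
solve(Goal) :-
    clause(Goal, Body),      % search rule: clauses tried in textual order,
    solve(Body).             % alternatives retried on backtracking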

In Prolog, programs automatically give control information through the ordering of goals

in the body of a clause and of the clauses in the definition of a procedure. The ordering

of body goals gives control information for the selection function, whereas the ordering of

clauses gives control information for the search rule. Prolog also includes control operators.

A large number of built-in predicates provides extra control, Input/Output, database oper-

ations, arithmetic, term comparison, meta-logical operations, and set operations. Note that

actual features vary between different Prolog systems. ISO supported the development of

a standard for basic functionality in Prolog [55]. In practice, most implementations do not

fully adhere to the standard, and all have extensions. Readers should refer to the specific

Prolog manuals, such as SICStus Prolog’s [6], ECLiPSe’s [1], or YAP’s [48] for the ultimate

information on what is available on a specific system.

Note that the use of many built-ins relies on prior knowledge of Prolog execution. For

example, a typical top-level clause for a program might look like this:


top_level :-

read(Query),

solve(Query, Solution),

write(Solution).

The correct execution of the built-ins read/1 and write/1 implicitly assumes left-to-right

execution.

4 Other Logic Programming Languages

One of the most serious criticisms of Prolog is that the selection function used by Prolog is

too restrictive. From the beginning, authors such as Kowalski [96] remarked on the effect on

the size of the search space of solving different goals in different sequences. A more flexible

execution than the one used by Prolog can be obtained through coroutining. Coroutines

cooperatively produce and consume data [96], allowing for data-driven execution. Several

designs for coroutining became very influential in the logic programming community. IC-

Prolog, designed by Clark and others [30], was one of the first logic programming languages

to support delaying of goals until data is available. Colmerauer’s Prolog-II [36] supported

geler, a built-in that would delay a goal until a variable was instantiated. Naish's

MU-Prolog and NU-Prolog [124] support wait, a form of declaring that a goal should only

execute if certain arguments were instantiated. Features similar to geler and wait are

now common in modern logic programming systems.
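For example, in systems that provide freeze/2 (a descendant of geler, available in SICStus Prolog and YAP, among others), a goal can be delayed until its first argument is bound:

?- freeze(X, Y is X + 1), X = 2.
X = 2,
Y = 3.

The arithmetic goal is suspended when freeze/2 is called, and only wakes up once the unification X = 2 instantiates X.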

Coroutining allows more flexible execution of goals. One can go one step further, and

associate specific rules with variables. Whereas in traditional logic programming variables

are associated with terms, in these novel frameworks variables may also be associated with

real and rational numbers, intervals, booleans, lists, and so on. A special class of goals, or

constraints, manipulates these variables.

The concept of constraint predates logic programming, but constraint logic programming

has been shown to be a very effective way of applying constraints. Initial work originates from the Marseille group's Prolog-III [37], Jaffar and others' CLP framework of languages [85] and the ensuing CLP(R) system, and ECRC's CHIP [56]. There has been very intense research in the area since. We refer the reader to Marriott and Stuckey [111] for a good introduction to

this rapidly expanding field.


4.1 The Committed-Choice Languages

Research into coroutining led some authors to completely abandon Prolog's left-to-right

execution. The committed-choice, or concurrent, logic programming languages [155] are a

family of logic programming languages that use the process reading [184] of logic programs.

In this reading, each goal is viewed as a process, and the computation as a whole as a network

of concurrent processes, with interconnections specified by the shared logical variables. The

process reading of programs is most useful to build reactive systems, which contrast with

transformational systems in that their purpose is to interact with their environment in some

way, and not necessarily to obtain an answer to a problem. Examples of reactive systems are

operating systems and database management systems.

Initial research on these languages started in the early to mid eighties with Clark, Gre-

gory and others’ IC-Prolog [30] and then Parlog [29], and with Shapiro and others’ Con-

current Prolog [153]. At the time, the Japanese Government was starting an ambitious

research project on developing the Japanese hardware and software industry, the Fifth Gen-

eration Computing Systems Project (FGCS). Shapiro was influential in persuading Japanese

researchers to use committed choice languages as a basis for this project. These languages

were seen as a high-level programming tool that naturally allowed the exploitation of con-

currency and parallelism. Ueda’s GHC [182], later simplified to KL1 [183], was the basis

for FGCS work. The FGCS project had huge impact outside Japan, and both the American

and the European governments supported alternative research on sequential and parallel

committed choice languages and on traditional Prolog systems.

One major difference between Prolog and the committed choice languages is that clauses

are guarded. The head and usually some goals in the body of a clause are tested before ex-

ecuting. If they are satisfied, one says that a goal can commit to the clause. If the goal does

commit, the remaining clauses are discarded, even if they would match the goal. This simplified both semantics and implementation. A further simplification resulted in the flat committed-choice languages, where one only allows a few built-in goals, such as tests or arithmetic conditions,

in the guard.
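A classic example is stream merging, shown here in flat GHC-style syntax (this example is ours, not from the survey); the guards, written before the commit operator |, are all trivially true, so commitment depends only on head matching:

merge([X|Xs], Ys, Zs) :- true | Zs = [X|Zs1], merge(Xs, Ys, Zs1).
merge(Xs, [Y|Ys], Zs) :- true | Zs = [Y|Zs1], merge(Xs, Ys, Zs1).
merge([], Ys, Zs) :- true | Zs = Ys.
merge(Xs, [], Zs) :- true | Zs = Xs.

A goal merge(Xs, Ys, Zs) suspends until one of its input streams is instantiated; when both are, either of the first two clauses may commit, and the remaining alternative is discarded.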

Research in these languages was very intense during the eighties. Quite a few committed

choice languages and dialects have been developed. Rather sophisticated applications were

also developed, especially within the FGCS project. Work is still ongoing on the design and


application of committed choice languages, but many researchers in the area moved towards

applying their framework outside logic programming, particularly as a basis for coordination

languages in parallel and distributed systems. An initial example is the commercial Strand

system [61].

4.2 Andorra

The decision to support a single solution simplifies the design and implementation of these

languages. Arguably, nondeterminism in choosing clauses is sufficient for most reactive sys-

tems, and indeed the committed-choice languages have been used successfully to implement

complex applications such as operating systems kernels or compilers [154]. On the other

hand, search programs that can be coded easily and naturally in Prolog are much more

awkward to write in these languages [177]. Several authors proposed languages that al-

lowed non-deterministic procedures in a committed choice environment, such as Saraswat’s

CP[↓,|,&,;] [149], Yang's P-Prolog [195], and Takeuchi's ANDOR-II [170]. Starting from the opposite direction, Naish's PNU-Prolog [125] parallelised the implementation of NU-Prolog

by allowing the execution of deterministic computations in parallel.

Yang’s work was an important influence in the David Warren’s Basic AndorraModel. This

model follows the Andorra Principle:� Goals can execute in And-parallel, provided they are determinate;� If no (selectable) goals are determinate, we can select one nondeterminate goal, andexplore its alternatives, possibly in Or-parallel.
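To see the principle at work, consider again the small program of section 2 and the query :- a(X), b(X).:

a(1). a(2).
b(2).

Under the Basic Andorra Model, b(X) is determinate (only one clause can match) and so is executed first, binding X to 2; a(2) then also becomes determinate. The whole query runs without creating a single choice-point, mirroring the smaller search tree of Figure 1.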

The model could be used to parallelise Prolog programs, or to design new languages.

Work on Prolog parallelisation was mainly pursued at Bristol and resulted in the Andorra-I

system [145]. Several other groups proposed novel languages based on this model, such as

Haridi’s Andorra Prolog [74], developed at SICS, and Bahgat’s Pandora [8], from Imperial

College.

One problem with Andorra systems is the notion of determinacy. Systems such as Andorra-

I follow a strict definition of determinacy, where one considers a goal to be determinate if

head unification and built-in execution will succeed for at most one clause. The definition

was later extended to handle pruning operators.


Research on more ambitious forms of determinacy eventually led to the Extended Andorra

Model, or EAM, where one can parallelise non-determinate computations as long as they do

not bind external variables. Warren was interested in the EAM as a way to exploit all forms

of implicit parallelism in logic programs [191]. In contrast, Haridi, Janson, and colleagues

at SICS were interested in developing new programming frameworks. They eventually pro-

posed a new language, the Andorra Kernel Language, AKL [86], a general concurrent logic

programming language based on Kernel Andorra Prolog [75]. Several parallel implementa-

tions of AKL were designed and implemented [129, 120, 118].

More recently, Smolka’s Oz language [162], developed at DFKI in Germany, provides the

main features of AKL, but generalises them to a wider framework that encompasses functional

and object-oriented programming. The researchers involved in AKL have since moved on to

work with Oz.

5 Implementation of Logic Programming Languages

The Prolog language adapts well to conventional computer architectures. The selection func-

tion and search rule are simple operations, and the fact that Prolog only uses terms means

that the state of the computation can be coded quite efficiently.

5.1 The Beginnings

The original Marseille Prolog [38] system was an experimental interpreter, written in Algol-

W by Philippe Roussel. A second interpreter was written in Fortran by Battani and col-

leagues. The system used structure sharing to represent Prolog terms. In this represen-

tation terms are represented as pairs, one containing the fixed structure of the term, the

skeleton, the other containing the free variables of the term. Unification proceeds by compar-

ing the skeletons and assigning variables in the environments. Each goal was represented

by a record that included both the data necessary to represent the execution of a matching

clause, and the data needed to backtrack. System built-ins included the basic functionality

available in modern Prolog systems.

Warren’s DEC-10 Prolog system [187] was the first compiled Prolog system. The system

was developed by Warren, F. Pereira and L. M. Pereira. The system showed good perfor-

mance, comparable to the existing Lisp systems [192]. It included a separate stack to store


terms. Control was still represented by activations. Mode declarations were used to simplify

compilation.

The DEC-10 Prolog system became very popular. It is the reference for the “Edinburgh

syntax” that is still followed by most Prolog systems. The efficiency of this system was also

influential in the decision of the Japanese to use logic programming for their Fifth Genera-

tion Project.

5.2 The WAM

The basis for most of the current implementations of logic programming languages is the

Warren Abstract Machine [188], or WAM, an “abstract machine” useful as a target for the

compilation of Prolog because it can be implemented very efficiently in most conventional

architectures. The WAM was developed out of the interest in having a hardware imple-

mentation of Prolog. Warren presented a set of registers, stacks and instructions that could

be efficiently implemented by specialised hardware. In practice, most WAM-based systems

emulate such a machine in software, through an emulator.

The WAM represents Prolog terms as groups of cells, where a cell can be either a value,

such as a constant, or a pointer. Variables are represented as single cells. Free variables

are represented as pointers that point to themselves. Bound variables can simply receive the

value they are assigned to, if the value fits the cell size, or be made to point to the term they are

bound to. The WAM uses a copy representation for compound terms. In this representation

a compound term is represented as a set of cells, where the first cell represents the main

functor, and the other cells represent the arguments. Unification first compares the two

main functors and then is called recursively for each argument. Note that in copying, terms

must be constructed from scratch, whereas in structure sharing different terms can share the

same skeleton. Both sharing and copying have advantages and disadvantages, but Warren

argues that copying gives easier compilation and better locality [188].
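A schematic comparison of the two representations for the term f(a, Y) is given below (the layout and addresses are illustrative only, not the exact DEC-10 or WAM formats):

structure sharing:                  copying (WAM):
  skeleton:    f(a, var1)             100: f/2        functor cell: name and arity
  environment: var1 -> unbound        101: a          constant argument
                                      102: ref 102    free variable: points to itself

In the sharing scheme a second instance of the term needs only a new environment; in the copy scheme it needs a fresh group of three cells.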

The WAM was designed as a register based architecture. Arguments are passed through

the A registers. These registers also double as temporary registers, known as X registers.

Several other registers control the execution stacks:

• The Environment Stack tracks the flow of control in a program. Each environment frame represents a clause, and maintains the point to return to after executing the clause, plus the variables that are shared between goals in the clause. The E register points to the current active environment.

• The Choice-Point Stack stores open alternatives. Each choice-point frame records the values of the abstract machine registers when the alternative was taken. The B register points to the active choice-point, which is always the last.

• The Global Stack or Heap was inherited from the DEC-10 Prolog abstract machine. It stores compound terms and variables that cannot be stored in the environment stack. The H register points to the top of this stack.

• The Trail stores conditional bindings, that is, bindings to variables that need to be undone on backtracking. In the WAM bindings can be undone by simply resetting the variable. The TR register points to the top of this stack.

Figure 2: WAM Stacks (the Trail, Choice-Point Stack, Environment Stack, and Heap, with the TR, B, E, H, S, and HB registers pointing into them)

Figure 2 gives an overview of the stacks used by the WAM. The figure mentions other

important WAM registers. The HB and EB registers record the tops of the heap and of the environment stack at the last choice-point, and their values could be obtained from the choice-point pointed to by the B


register. The S register is used when unifying compound terms, and always points to the

global stack.

Systems that implement the WAM compile programs as sequences of abstract machine

instructions. To give the reader a flavour of what to expect from WAM code, we give a simple

example of the code for the naive reverse procedure:

nrev([], []).

nrev([H|T], R) :-

nrev(T, R0),

conc(R0, [H], R).

The WAM code for this procedure is shown in Table 1.

switch_on_term CV1,Cc,Cl,fail
CV1: try_me_else CV2           % nrev(
Cc:  get_constant [],A1        % [],
     get_constant [],A2        % [])
     proceed
CV2: trust_me                  % nrev(
Cl:  allocate 3
     get_list A1               % [
     unify_variable Y1         % H|
     unify_variable X1         % T],
     get_variable Y2,A2        % R) :-
     put_variable Y3,A2        % nrev(T,R0)
     call nrev/2               % ,
     put_unsafe_value Y3,A1    % conc(R0,
     put_list A2               % [
     unify_value Y1            % H|
     unify_constant []         % []],
     put_value Y2              % R)
     execute conc/3            % .

Table 1: WAM Code for Naive Reverse

The code shows examples of the four

different groups of WAM instructions:

• Indexing instructions choose clauses from the first argument. An example is the first instruction in the code, switch_on_term. The instruction tests the type of the first argument and jumps to different code according to whether the first argument is a variable, constant, compound term, or pair. Other indexing instructions switch on the value of constants and functors.

• Choice-Point Manipulation instructions manage choice-points. The code includes a try_me_else instruction, that creates a choice-point, and a trust_me instruction, that uses and then discards an existing choice-point. The retry_me instruction, not shown here, just uses a choice-point.

• Unification instructions implement specialised versions of the unification algorithm. The instructions are classified by position and type of argument. Head unification is performed by get instructions, sub-argument unification by unify instructions, and argument preparation for calls by put instructions. The variable instructions process the first occurrence of variables in the clause, value instructions process non-first occurrences, the constant instructions process constants in the clause, the list instructions process lists, and the structure instructions process other compound terms.

• Control instructions manage forward execution. The allocate and deallocate instructions respectively create and destroy environments. The proceed instruction returns from a fact. The call instruction calls a non-last subgoal, and the execute instruction calls the last subgoal. Note that by using the deallocate and execute instructions the WAM can perform last-call optimisation.

The outward simplicity of the WAM hides several intricate implementation issues. Complete books, such as Aït-Kaci's tutorial on the WAM [2], have been written on this subject.

5.3 Improving the WAM

The WAM soon became the standard technique for Prolog implementations. Several optimi-

sation techniques have been proposed for the WAM; we next discuss a few:

Register Allocation The WAM is register based, and there is scope for optimising the al-

location of temporary registers. Debray gives one of the first discussions of register allocation

in the WAM [50]. More sophisticated schemes were presented by Janssen et al. [87] and by

Matyska et al. [113]. A good discussion of the problem can also be found in Mats Carlsson's

thesis [21].


Compilation of Compound Terms The WAM uses a breadth-first scheme for compil-

ing compound terms and lists. Other schemes are possible. Mariën et al. [109] present a

depth-to-right scheme that had also been used in previous implementations, such as YAP.

Intermediate schemes, such as the one used in SICStus Prolog [19] and Andorra-I [148] are

also possible.

Implementation of Cut The implementation of cut usually requires an extra register

in the WAM. Mariën and Demoen discuss the problem in conjunction with stack manage-

ment [108].

Modes Unification instructions can be specialised if one knows arguments are already in-

stantiated, or if they are free. DEC-10 Prolog introduced mode annotations, where the user

can declare how he expects each argument to be used. Annotations demand extra work

from the user, and in the worst case may be erroneous, resulting in incorrect execution. Mel-

lish was the first to derive mode information automatically through global analysis [115].

He used the abstract interpretation framework, originally proposed in the context of impera-

tive languages by the Cousots [42, 43]. This framework detects properties of programs by

executing the programs under a simplified abstraction of their original domain. The abstrac-

tion must be such that the analysis process will converge. Since Mellish’s work, abstract

interpretation has been used for several applications in logic programming.
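As an illustration, a DEC-10-style mode declaration for the conc/3 procedure used in the naive reverse example above would be:

:- mode conc(+, +, -).

declaring that conc/3 is expected to be called with its first two arguments instantiated (+) and its third argument unbound (-).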

One generalisation of modes is the case where one can detect what is the first usage of a

variable. If we know where the variable will be used the first time, we know the variable is

unbound, and moreover, we do not need to initialise the variable. Beer [11] gives the first

application of this technique, that was fundamental in the Aquarius [185] and Parma [175]

native-code systems.

Types Although the Prolog language is untyped, one can try to infer types for the terms

used during program execution. This optimisation can be used to simplify code, for example

by not tagging terms. The Aquarius and Parma global analysers can detect simple types,

such as terms that are integers, floats or symbols. More sophisticated systems can detect

recursive types. A discussion on the application of type inference is given by Mariën and

colleagues [110]. There is recent work on untagging by Bigot and Debray [13] and on the

inference of recursive types by Lu [105].


Memory management One of the advantages of Prolog is that one can recover space in back-

tracking. Still, space recovery may be required for large computations that do not fail. The

landmark paper on garbage collection for the WAM is from Appleby and colleagues [7]. More

recently, Demoen and colleagues have presented a copying based algorithm that is a good

alternative [53].

There has been some research on using compile-time analysis to reutilise stack cells. See

Bruynooghe and colleagues on how to use global analysis, and specifically abstract inter-

pretation, for this purpose [16]. More recent work on the subject has been performed by

Mulkers, Winsborough and Bruynooghe [122] and by Debray [51].

Indexing Choice-point creation and management is one of the most expensive operations on

the WAM. Warren’s indexing instructions have several problems. For instance, they only

index on a single argument, and they may create several choice-points for the same call.

Extensions to the original indexing scheme have been used in most Prolog systems, such as

the one used in Prolog by BIM [107], or in SICStus Prolog [19].

One can go one step further and try to minimise the number of choice points created.

Hickey and Mudambi [82] use switching trees to minimise choice-points. More recently,

Zhou et al. propose a matching tree scheme for the compilation of Prolog [197].

One possible alternative is to stay close to the original Prolog code, but flesh out choice-points only when necessary. The scheme is known as shallow backtracking [20], and has been

used in several Prolog implementations such as SICStus Prolog.

5.4 Generating Machine Code

Since the initial Prolog implementations, systems such as Bim-Prolog [107], Aquarius [185]

and Parma [173, 174] created interest in using direct compilation to native code as a way to

improve performance over traditional emulators. Native code systems avoid the overheads

incurred in emulating an abstract machine, and simplify the introduction of new optimisations [186].

5.4.1 Aquarius and Parma

The Aquarius [185] system was developed at Berkeley. It uses the Berkeley Abstract Machine

(BAM) as the target for translation from Prolog. The BAM is a low-level instruction set


designed for easy translation into RISC style architectures. The abstract machine code is

generated after the following transformations:

1. Global Analysis. This gives information on groundness and freeness of variables, and

on the size of reference chains.

2. Determinism Extraction. The goal is to replace shallow backtracking by conditional

branching. This is implemented by rewriting the predicate into a series of nested conditional statements.

3. Type Enrichment. The idea is to generate different versions for the case where the first

argument (or the first interesting argument) is bound or unbound. It can be avoided if

global analysis gave type information.

4. Goal Reordering. Sometimes, goals can be reordered to improve performance.

5. Determinism Extraction. Last, try to generate the switch statements.

The output of these steps is still a program in a subset of Prolog. Van Roy calls this subset

Kernel Prolog.

Next, a disjunction and a clause compiler are used to generate BAM code. The disjunction

compiler handles choice-points, trying to minimise their size. The clause compiler performs

goal compilation, unification compilation and register allocation. BAM instructions are di-

vided into: the simple instructions, complex instructions, and embedded instructions. Simple

instructions are designed to support:

• comparison, data movement, address calculation, and stack manipulation;

• indexing, through switch, hash, and pair instructions;

• control, through instructions such as call or return.

The complex instructions are groups of instructions that represent common operations: deref-

erencing, trailing, unification and backtracking. Embedded instructions give information

that can help optimise code, such as pragmas.

Last, a final step implements several BAM optimisations, such as peephole optimisations

(with a 3-instruction window), dead-code elimination, jump elimination, duplicate-code elim-

ination and choice-point elimination.


The Parma system was independently developed by A. Taylor in Australia at about the

same time as Aquarius [175]. Many of its ideas are similar to the principles used in Aquar-

ius. Parma does use a more sophisticated abstract interpreter, and is specialised for MIPS

code. Van Roy gives a comparison of the two systems where Parma clearly out-performs

Aquarius [186]. In practice, Parma was never as widely available as Aquarius.

5.4.2 Super Monaco

Super Monaco is a system developed at the University of Oregon by Evan Tick’s group. It

compiles a subset of KL1 [178] into low-level code in the style of Parma or Aquarius. Work

on Super Monaco was influenced by the work in the compilation of CP, especially by the

decision graph compilation algorithms of Kliger and Shapiro [92, 93], and by previous work

in the compilation of KL1 and of Parlog, such as JAM [45].

In Super Monaco procedures are compiled into a decision graph. The compilation algo-

rithm is as follows:

• The front-end generates decision-graphs in the style of Kliger. The decision graph allows for very good compilation, although the initially generated decision graphs have an excessive number of tests, especially type-checking tests.

• An intermediate code is next generated. The intermediate code is similar to what one would find in Aquarius. The initial code assumes an infinite number of abstract machine registers.

• A flow-analysis pass builds a flow graph of basic blocks, and performs memory allocation coalescence.

• Next, common subexpressions are eliminated, and type and dereferencing information is propagated through the graph. This is fundamental to remove unnecessary instructions and branches introduced in the original graph. Dead-code removal is also performed.

• Register allocation is performed.

• Minor optimisations are performed. These include short-circuiting, branch-removal, peephole optimisation, and register move chain squashing.

• Native code is generated by using templates, that convert each instruction into a sequence of native code. Templates are available for X86 and MIPS assembly instructions.

The instruction set for the intermediate language is quite complex. It includes several op-

timised instructions, such as alloc, mkstruct, and mkgoal. Some instructions, such as the unify instruction, call runtime routines.

Finally, note that Super Monaco supports And-parallelism. Although this results in a

more expensive implementation than Aquarius or Parma, the authors of the system argued

that Super Monaco’s performance is close to C-systems [178]

5.5 Assembly Generation from WAM style Code

The previous systems try to obtain high efficiency by translating from Prolog directly to low-

level intermediate code. One alternative possibility is to generate low-level code from inter-

mediate WAM-style code. This approach has been followed in commercial systems such as

Prolog by BIM and in SICStus Prolog. We next discuss the implementation of native code in

SICStus Prolog, as documented by Haygood [77].

SICStus Prolog included a native code compiler for SPARC and 68k machines for quite

some time. More recently, a new native code system was designed towards having a more

portable implementation [77]. The novel implementation used some of the techniques origi-

nally used in Aquarius and supports SPARC, MIPS, and 68k architectures. The new native

code system uses an abstract machine, the SAM, as a target for translation from WAM-style code. The SAM

is then compiled into either RISS, an intermediate representation appropriate for RISC

machines, or directly into 68k style code. The SAM is quite simple, and includes only a few

instructions: ld, st, and stz for moves, the arithmetic and bitwise logical operations, and

simple control instructions. Complex operations, such as unifications, are implemented by

jumping to a “kernel” coded in assembly language.

Most effort in the SAM is devoted to the compilation of unification for compound terms.

SAM improves on Van Roy’s two stream technique [186].

The next step is the RISS. In the RISS the number of registers is specified, immediates

are size-restricted, and control transfers have delay slots. The transformation is performed

by another step, sam_riss, which binds the registers and tries to fill delay slots.


Performance analysis shows a two to three fold speedup with the native code compilation

in SICStus. The results from the new implementation, used in SICStus Prolog v3, versus

the previous implementation as used in SICStus v2.1, also indicate much better performance

with the new native code: the speed of the new native code seems to be similar to the per-

formance of Aquarius without global optimisations. The implementation is also very stable,

and is now the default mode for SICStus v3 in the Sparc port.

5.6 C Code Generation

One problem with traditional native code generators is portability. Supporting a new archi-

tecture is hard. Even changes in the Operating System may force the implementor to retool

the system. These problems could be avoided if one used a higher-level language, such as

C. Improvements in the quality of compiler back-ends, such as in GCC [165], also argue for

experimenting with this solution. Although the highest performance implementations still

generate assembly code, we next discuss three systems that do generate C code.

5.6.1 JC

The Janus Compiler, jc, was developed at Arizona by De Bosschere, Debray, Gudeman and

others [64]. It compiles from the Janus committed-choice language into C code. The

Janus committed-choice language [150] is an ask-and-tell language that simplifies traditional committed-choice languages by having the restriction that non-ground variables can

at most have one writable and one readable occurrence in a clause. This restriction enables

several simplifications to the implementation.

The jc compiler compiles a procedure into a single huge switch statement to reduce

work. The compiler assumes a set of virtual registers, some of which will be actual machine

registers. Differently from most committed-choice languages, jc uses environments for data

representation. Instead of using a code continuation pointer, environments point to the code

to execute next in the environment. Because jc compiles into a single switch statement,

predicates are represented as numbers.

The compiler starts from the predicate as a set of clauses and performs suspension-related transformations, expression flattening, common subexpression elimination and goal reorder-

ing. No decision graph is generated.


Code generation is performed next. The jc abstract machine has four kinds of registers,

ordinary tagged registers and untagged registers, which may contain addresses, integers or

floating-point numbers. The compiler always keeps untagged values. The authors claim at

most 10 registers are needed.

The system relies on optimisations to obtain good performance. The main optimisation

is call forwarding [49]. The idea is to generate procedures with multiple entry points, so

that information specific to a call can be used to bypass tests. The effects of call forward-

ing are generalised by using jump target duplication. Finally, the compiler also includes

instruction-pair motion, which allows the removal of complementary instructions, especially

of environment allocation.

The intermediate instruction set for jc is a set of simple macros that will later be com-

piled into C. Macros include Move, Assign, and quite a few versions of Jump.

The performance of the system is quite good, even versus native code implementations.

This result is explained in part by the simplicity of the language, and in part by the use of

call forwarding.

5.6.2 KLIC

The KLIC compiler [27] is a descendant of the Japanese implementations of KL1. These implementations compiled from KL1 into a WAM-like instruction set, and supported parallelism.

KLIC was designed as a portable and highly efficient implementation of KL1. The au-

thors argue that the advantages of compiling to C are portability, low-level optimisations

available in the C compiler, and ease of linking with foreign languages. The problems are

costly function calls, register allocation control, provision for interrupts and object code size.

The solutions proposed are: compiling each module into a single function; caching global variables in local variables, which may then be placed in machine registers; using flags for synchronisation with interrupts; and calling runtime routines for costly operations.

5.6.3 WAMCC

The WAMCC was designed by Diaz [34] at INRIA. It compiles from Prolog into straightfor-

ward WAM code, and then into C which can be compiled by GCC.

The WAMCC includes relatively few optimisations. The philosophy seems to be to leave the work to the C compiler. Unfortunately, performance is not very impressive as compared to


emulated systems such as YAP or SICStus v3. This indicates that care in the description of

abstract instructions is fundamental.

The WAMCC provides a very clean system that is ideal for experimentation with new

compilation technology. For instance, the WAMCC was the basis for Diaz and Codognet’s

clp(fd) system [35].

5.7 Other Approaches

There is a substantial body of work in logic programming implementation. We mention a few

of the most important contributions.

Structure Sharing Not all Prolog implementations rely on structure copying. MProlog

shows that one can get good performance by using structure sharing [95]. More

recently, Li has proposed an alternative term representation that combines structure sharing

and copying [100].

The Vienna Abstract Machine The WAM generates code for each procedure that is inde-

pendent of its callers. The VAM [97] innovates over the WAM by considering both the caller

and callee to generate more efficient code. The VAM can obtain significant performance im-

provements, but runs the risk of generating excessive code.

Binarisation The BinProlog system [171] introduced several important contributions to

logic programming. In BinProlog clauses are binarised. For instance, the code for naïve

reverse:

nrev([], []).

nrev([H|T], R) :-

nrev(T, R0),

conc(R0, [H], R).

would be transformed to the following binary clauses:

nrev([], [], Cont) :- call(Cont).

nrev([H|T], R, Cont) :-

nrev(T, R0, conc(R0, [H], R, Cont)).
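A query such as ?- nrev([1,2], R) would then be started by supplying an initial continuation, typically the trivial goal true, as in nrev([1,2], R, true) (the exact bootstrapping is BinProlog-specific; this is an illustrative sketch).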


Having continuations available explicitly allows continuation-passing style compilation, a

technique that has had quite good results for the functional languages. It also simplifies the

implementation of several extensions to Prolog.

A second contribution is that BinProlog combines both native and emulated code for the

same procedure. Native code is generated for the kernel part of a clause, and the remaining

code is still emulated [172]. This gives compact code and good performance.

Global Analysis Traditional global analysis in Prolog has been performed through ab-

stract interpretation [42, 43]. There has recently been work showing that abstract in-

terpretation is effective and can be integrated into more mainstream systems. Examples

include the work on CIAO at Madrid [17] and GAIA [99].

An alternative to abstract interpretation is to use control flow graphs to specialise

Prolog programs. This enables several mode and type based optimisations. Recent work in

this area is described by Lindgren [103] and Ferreira [60].

Extensions to Prolog Several authors have proposed extensions to Prolog. Examples in-

clude Miller and Nadathur’s �Prolog [116], a language that adds meta-level programmingto Prolog, Monteiro and Porto’s contextual logic programming [117], an interesting scheme

for modular programming. In a different vein, there have been several proposals for combin-

ing logic programming with other paradigms, such as the functional languages [98]. Most

of these extensions are implemented either by compiling into Prolog, or by adding suitable

extensions to the WAM, or by using totally new frameworks, as in BABEL [98] or in MALI’s

implementation of λProlog [15].

Tabling [26] is arguably one of the most important extensions to logic programs. It im-

proves on the left-to-right search scheme by storing and reutilising intermediate solutions. This

strategy can avoid unnecessary computations, and in a few cases can avoid looping. One

of the first widely available Prolog systems to implement tabling is XSB-Prolog. XSB im-

plements SLG-resolution [26] through the SLG-WAM [137]. More recently, CAT [54] was designed as an alternate implementation of SLG-resolution, based on copying, as in the Muse

or-parallel Prolog system [4].
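A standard illustration is left-recursive graph reachability, which loops under Prolog's depth-first search but terminates under tabling (the directive below uses XSB-style syntax):

:- table path/2.

path(X, Y) :- path(X, Z), edge(Z, Y).
path(X, Y) :- edge(X, Y).

Calls to path/2 are recorded in a table together with their answers; a repeated (variant) call consumes the tabled answers instead of re-entering the clauses, so each answer is computed only once.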

Mercury Some authors have advocated abandoning Prolog altogether in the interest of ef-

ficiency and, arguably, of declarativeness. One popular alternative is Mercury [163]. This is


a simple language, proposed by Somogyi and colleagues. The language supports a strict mode and type system by severely restricting the use of the logical variable, and is amenable to a

very fast C-code implementation.

6 Forms of Implicit Parallelism in Logic Programs

Parallelism in logic programs can be exploited implicitly or explicitly. In explicit systems such

as Delta Prolog [131] special types of goals (events and splits in Delta Prolog) are available

to control parallelism. Unfortunately, these languages do not preserve the declarative view

of programs as Horn clauses, and thus lose one of the most important advantages of logic

programming.

Implicit parallelism can be obtained through the parallel execution of several resolvents

arising from the same query, Or-parallelism, or through the parallel resolution of several

goals, And-parallelism. These two forms of parallelism can be explored according to very

different strategies. A large number of parallel models and systems have been developed

for both distributed and shared memory parallel architectures. It is impossible to discuss

all proposals and systems; in this survey we concentrate on some of the most influential

systems.

7 Or-Parallelism

In Or-parallel models of execution, several alternative search branches in a logic program’s

search tree can be tried simultaneously. So far, quite a few models have been proposed. The most

successful have been the multi-sequential models, where processing agents (workers in the

Aurora [106] notation) select Or-parallel work and then proceed to execute as normal Prolog

engines.

A fundamental problem in the implementation of Or-parallel systems is that different

or-branches may attribute different bindings to the same variable. In an Or-parallel system,

and differently from a sequential execution, these bindings must be simultaneously avail-

able. The problem is exemplified in Figure 3, where choice-points are represented by black

circles and branches that are being explored by some worker are represented by arrows. The

two branches corresponding to workers W1 and W2 see different bindings for the variable X.

Figure 3: The Binding Problem in Or-parallelism (workers W1 and W2 bind the shared variable X to a and b, respectively)

A large number of Or-parallel models, with different solutions to this problem,

have been proposed (the reader is referred to Gupta [69] for a survey of several Or-parallel

models). The models vary according to the way they address the binding problem. Next,

there follows a brief description of influential Or-parallel models.

7.0.1 Independent Prolog Engines

The binding problem can be avoided by having each worker operate on its part of the

or-tree as independently from other workers as possible. One extreme is represented by

the Delphi model [31], where each worker receives a set of pre-determined paths in the search tree, attributed by oracles allocated by a central controller. Whenever a worker must move

to an alternative in a different point of the search tree, the worker recomputes all state

information for that alternative. Clocksin and colleagues argue that Delphi allows for good

parallelism, low communication and efficient performance in coarse-grained problems. One

problem is that Delphi can have problems in fine-grained tasks as full recomputation of

work may become very expensive. Delphi was also severely limited by centralised control;

Lin proposed a different, self-organising, task scheduling scheme [102].

A different alternative, copying, was first used in the Japanese Kabu-Wake system [112] (which was later abandoned in favour of the Fifth Generation Project) and

in Ali’s BC-machine [3]. Ali and Karlsson eventually adapted copying to standard shared

memory machines, and developed the Muse system [4]. In copy scheme based implementa-

tions, whenever a worker W1 needs work from a worker W2, it copies the entire stacks of W2. Worker W1 will then work on its tasks independently from other workers until it needs to request more work. To minimise the number of occasions at which copying is needed,


scheduling in Muse favours selecting work at the bottom of the tree.

Full copy systems basically use the same data-structures as a sequential Prolog engine

during ordinary execution. Thus they do not suffer any special overheads during ordinary

execution. On the other hand, task switching becomes more expensive. One argument for

other systems, such as the Aurora system we discuss later, is that it would be difficult to

support cut and built-ins efficiently in Muse. Ali and Karlsson later presented an elegant

solution to this problem [5].

Copying has become the most popular form of exploiting Or-parallelism. Muse is now

a part of the standard SICStus implementation [6], and the ideas from Muse were used in

parallelising ECLiPSe [1] and YAP [140]. Muse was also influential in the design of other

parallel logic programming systems, such as the distributed Or-parallel system Opera [14],

ACE [66], and Penny [118].

7.0.2 Shared Space

Instead of each worker having its own stacks, all the workers may share the stacks. In

this case they will need to represent the different bindings for the or-branches. To do so,

changes must be made to the data structures used to represent terms. Whereas sequential

implementations of Prolog store bindings in the value-cell representing a variable,

these systems need to use some intermediate data structure to store bindings to variables

that are shared between or-branches. We next discuss two examples of shared space models,

the Hash Tables used in PEPSys, and the SRI model used in Aurora.

Hash Tables: The main characteristic of hash-table models [25] is that whenever a worker

conditionally binds a variable, the binding is stored in a shared data structure associated

with the current or-branch (these data structures are implemented as hash-tables for speedy

access). Whenever a worker needs to consult the value of a variable, instead of consulting

the variable’s cell immediately it will look-up the hash-tables first. Figure 4 (a) shows the

use of hash-tables: note the links between hash windows and the fact that only some hash

windows will have bindings. Note also that whenever the value for a variable is consulted, we

need only to consult the hash-tables younger than the variable, thus look-up is not necessary

for variables created after the last hash-table. PEPSys reduces the overheads in looking up

ancestor nodes by adding the binding of a variable to the current hash table whenever that


variable is accessed. Analysis of PEPSys showed that at most 7% of the execution time is spent dereferencing through the hash tables.

Figure 4: Shared Bindings in Or-parallel Models. (a) the Hash-Windows model; (b) the Binding Array model. [The figure shows a search tree whose branches carry conditional bindings such as X = a and X = b, stored in per-branch hash windows in (a) and in per-worker binding arrays in (b).]
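To make the look-up concrete, the following Prolog fragment is a minimal executable sketch of hash-window dereferencing; it is not PEPSys code. It models a branch as a list of windows, youngest first, each window being a list of Name-Value pairs (real implementations use true hash tables over machine-level terms):

% deref_hw(+Var, +Windows, -Value): return the binding for Var found
% in the youngest window that mentions it; Windows is youngest-first.
deref_hw(Var, [Window|Older], Value) :-
    (   lookup(Var, Window, V)
    ->  Value = V
    ;   deref_hw(Var, Older, Value)
    ).

% lookup(+Var, +Window, -Value): search one window, an association list.
lookup(Var, [Var-V|_], V) :- !.
lookup(Var, [_|Rest], V) :- lookup(Var, Rest, V).

For example, the query deref_hw(x, [[y-a], [x-b, z-a]], V) answers V = b, having skipped the younger window that holds no entry for x.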

The PEPSys system was developed at the ECRC research labs. The idea was to support both Or- and And-parallelism, but only a limited form of And-parallelism was eventually implemented. ECRC also maintained several Prolog and CLP systems. Although results were good, PEPSys was never available outside ECRC, and ECRC's ECLiPse system uses copying [1]. More recently, Shen proposed hash tables in the style of PEPSys for his FIRE system [159].

Binding arrays: In binding arrays each Prolog variable has associated to it a cell in an

auxiliary private data structure, the binding array [193, 189]. In the SRI model, a binding

array is associated with each active worker, and every variable is initialised to contain its offset into the binding array. Conditional bindings are stored in the

binding arrays and in the trail, but not in the environment stack or heap. Unconditional

bindings are still stored in the stacks. In this way the model guarantees that if a variable

can be bound differently by several or-branches, it must be accessed through the binding


array. Moreover, binding arrays have the important property that a variable has the same

binding array offset irrespective of or-branch. Figure 4 (b) shows the use of binding arrays:

notice that binding arrays always grow as one moves down the search tree.

In the SRI model the stacks are completely shared, but binding arrays are private to each worker. When a worker W1 wants to work on a choice-point created by another worker W2, it backtracks to a choice-point it shares with W2 and then moves down the or-tree until it finds W2's choice-point. Backtracking is implemented by inspecting the trail and resetting all entries in the binding array altered since the last choice-point. Moving down the tree is done by setting the worker's pointers to the ones in the choice-point and by inspecting W2's trail (which is shared) in order to place all the corresponding bindings in the worker's own binding array.
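The following Prolog fragment sketches this task-switching step as an executable model; it is not SRI-model engine code. The binding array is represented as a list of Offset-Value pairs, and each trail segment lists the Offset-Value entries recorded since the common ancestor choice-point:

% switch_task(+OwnTrail, +TargetTrail, +BA0, -BA): undo our own
% conditional bindings, then install those of the target branch.
switch_task(OwnTrail, TargetTrail, BA0, BA) :-
    deinstall(OwnTrail, BA0, BA1),
    install(TargetTrail, BA1, BA).

% reset every trailed slot back to unbound
deinstall([], BA, BA).
deinstall([Off-_|Rest], BA0, BA) :-
    set_slot(BA0, Off, unbound, BA1),
    deinstall(Rest, BA1, BA).

% copy the target branch's conditional bindings into our array
install([], BA, BA).
install([Off-Val|Rest], BA0, BA) :-
    set_slot(BA0, Off, Val, BA1),
    install(Rest, BA1, BA).

set_slot([Off-_|T], Off, Val, [Off-Val|T]) :- !.
set_slot([P|T], Off, Val, [P|T1]) :- set_slot(T, Off, Val, T1).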

Aurora [106] implements the SRI model. Aurora was a very important influence on the development of parallel logic programming. It was arguably the first parallel system that could run sizeable Prolog applications. We give a more detailed description of the Aurora implementation next.

Aurora Implementation The Aurora system [106] is based on SICStus Prolog [6]. The

system was a collaborative effort between the University of Bristol (originally Manchester)

in the UK, SICS in Sweden and Argonne National Labs in the USA. Aurora changes the

SICStus engine in several ways [22]; we discuss only the most important ones:

• each worker has two binding arrays, one for variables in the environments, the other for variables in the heap;

• choice points are expanded to point to the binding arrays, and to include fields relevant to the management of work in the search-tree;

• memory allocation is changed: each worker now represents its stacks as sets of blocks, which allows the stacks to grow without relocation of pointers.

A fundamental problem for Or-parallel systems is how to schedule or-work. Aurora uses

a demand driven approach to scheduling. Basically, a worker executes a task as a Prolog

engine. When its current task is finished, the worker calls the scheduler which tries to find

work somewhere in the search-tree. The interface between the two components has been de-

signed to be as independent as possible of the underlying engine and scheduler [168]. Initial


schedulers, such as the Manchester scheduler [18], favoured distribution of work topmost in

the tree. Such strategies do not necessarily obtain the best results, particularly when the

or-tree may be pruned by cuts or commits or when most work is fine-grained. The Bristol

scheduler [10] was initially implemented to support bottommost dispatching but has been

adapted to support several strategies, including selection of leftmost work. The Dharma

scheduler [161] favours work which is not likely to be pruned away and tries to avoid spec-

ulative work. Both the Dharma and Bristol schedulers can use voluntary suspension, i.e., a

worker abandoning its unfinished task, to move workers from speculative to non-speculative

areas in the or-tree.

Results for Aurora (with the latest schedulers) show good all-solution speedups, and improved first-solution speedups, on diverse applications. The static overheads caused by the parallel implementation average 15%–30%, basically due to supporting the binding arrays and to overheads in the implementation of choice-points. Instrumentation [166] shows that fixed overheads are more substantial than the distance-dependent overheads from moving around the search-tree.

Aurora influenced the development of the Or-parallel system Dorpp, by Silva [160], one of the first systems designed for shared distributed memory. It also influenced the And/Or Andorra-I system [145], and the PBA [71] and SBA models [41], which we discuss later. Aurora

was also one of the most stable parallel logic programming systems, and one of the few for

which there was significant work on practical applications, such as the work from Kluzniak

and from Szeredi [94, 167, 169].

8 And-Parallelism

Whereas workers in Or-parallel computations attempt to obtain different solutions to the same query, workers in And-parallel computations collaborate in obtaining a single solution to a query. Each And-parallel task contributes to the solution by binding variables to values. Problems can arise if the parallel goals have several different alternative solutions, or if several parallel goals want to assign different values to the same variable (the binding conflict problem).

There are several solutions to these problems. Traditional Independent And-parallel systems run in parallel only goals that do not share variables. Non-Strict Independent And-


Parallelism [81] gives a more general definition of independence between goals that allows

some variable sharing.

In contrast, Dependent And-parallel systems allow goals that share variables to proceed

in parallel, while (usually) enforcing some other restrictions. Examples of such systems are the parallel implementations of the committed-choice languages. Parallelism in these

languages can be exploited quite simply by allowing all goals that can commit to do so simul-

taneously. By their very nature, committed-choice systems do not have the multiple-solution

problem, as they disallow multiple solutions. In this case the binding conflict disappears,

since if two goals in the current query give different values for the same variable, then the

query is inconsistent and the entire computation should fail.
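For instance, given the facts p(1) and q(2), the conjunctive query below is inconsistent whatever the order or interleaving in which the two goals are run:

p(1).
q(2).

?- p(X), q(X).   % fails: no single value of X satisfies both goals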

We next discuss some examples of And-parallelism in logic programming systems. We

discuss And-parallelism in the committed-choice languages, with emphasis on the imple-

mentation issues. We mention some Independent And-parallel models and systems, and

we briefly refer to some proposals to exploit dependent And-parallelism between nondeter-

minate goals. Exploiting And-parallelism between determinate goals, as performed in the

Basic Andorra Model [190] and PNU-Prolog [125], is explained in more detail outside this

chapter.

8.1 And-Parallelism in the Committed-Choice Languages

In the committed-choice languages, all goals that can commit may run in parallel and gener-

ate new goals in parallel. Parallelism in these languages is thus at the goal-level. An ideal

execution model for such languages could be based on a pool of goals. Whenever a worker is

free, it fetches a goal from the pool. If the goal commits to a clause, the worker adds the goals in the body to the pool; otherwise, the worker looks for another goal.
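This idealised model can be pictured by the following meta-level Prolog sketch, where commit/2, which performs head unification and guard evaluation and returns the body goals, is a hypothetical predicate; a real system would suspend non-committing goals rather than busily re-queue them:

% run(+Pool): repeatedly reduce goals from the pool until it is empty.
run([]).
run([Goal|Pool0]) :-
    (   commit(Goal, Body)            % goal commits to some clause
    ->  append(Body, Pool0, Pool)     % add the body goals to the pool
    ;   append(Pool0, [Goal], Pool)   % otherwise try another goal first
    ),
    run(Pool).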

Consider now a very simple FGHC program (FGHC is the flat version of GHC, where only

built-ins are allowed in the guard):

append([X|Xs], Ys, XZs) :-

XZs = [X|Zs],

append(Xs, Ys, Zs).

append([], Ys, Zs) :- Zs = Ys.


append4(L1, L2, L3, L4, NL) :-

append(L1, L2, I1),

append(L3, L4, I2),

append(I1, I2, NL).

The procedure append4/5 appends four lists, by first appending the lists pairwise and then appending the results. Consider now the query append4([1,2],[],[3],[4],L). One And-parallel execution with three workers is shown in Figure 5. The workers executing the first two calls to append/3 execute independently. The leftmost and rightmost calls to append/3 execute in "pipeline", that is, the leftmost call generates bindings for the shared intermediate list which allow the rightmost call to commit.

Figure 5: Parallel Execution of Multiple Concatenation. [The figure shows the goal ap4([1,2],[],[3],[4],L) unfolding into three ap/3 calls, the first two running independently and the third consuming the bindings produced by the first.]

This very simple example shows the flexibility of the committed-choice languages. Par-

allelism between independent goals can be exploited naturally. More interestingly, logical

variables can be used in quite a natural way to give both sharing and synchronisation be-

tween goals one wants to execute in parallel.


Note that verifying whether a goal can commit is a very simple process in the flat languages: it requires only head unification and the evaluation of some built-ins. Thus the parallelism that is exploited in flat committed-choice languages is quite fine-grained.

We next discuss two influential implementations of the committed-choice languages that

have been quite successful for shared-memory machines. Both systems use abstract ma-

chines similar to the WAM, but with strong differences on how goals are manipulated.

Kimura and Chikayama’s KL1-B abstract machine [91, 152, 151] implements KL1. Cram-

mond’s JAM [46] is an abstract machine for the implementation of parallel PARLOG, includ-

ing deep guards (although there is no Or-parallelism between deep guards) and the sequen-

tial conjunction. JAM is based on a lightweight process execution model [45].

Both systems use a goal–stacking implementation where for each goal, goal records store

all the arguments plus some bookkeeping information, instead of the WAM’s environment

representation. Goal stacking was first proposed to represent And-parallelism in the RAP-

WAM [79], described later. Goal records can be quite heavy: in the JAM they have a total of eleven fields. In the KL1-B goals are stored in a separate heap. In the JAM goal records are divided into a set of arguments, stored in the argument stack, and the goal structure, stored in the process stack. Both systems store all variables in the heap. Parallel PARLOG supports the sequential conjunction, which is implemented with a special data structure, environments.

We briefly mention some of the more important alterations to the WAM:

• Manipulation of goals: Whereas Prolog can always immediately pick the leftmost goal, in committed-choice languages goals can be in several states. The KL1-B classifies these states as ready, or available for execution, suspended, or waiting for some variable to be instantiated, and current, or being executed. The JAM follows similar principles.

• Suspension on variables: Committed-choice languages allow multiple waiting, so a goal may suspend on several variables. The opposite is also true, and several goals may suspend on the same variable. Both languages associate with each variable a linked list of suspension records, or suspension notes. In the KL1-B each suspension record contains a pointer to the suspension flag record, itself consisting of a pointer to the goal record plus the number of variables the goal is suspended on. The JAM also uses indirect addressing to guarantee synchronisation between several variables whilst accessing goal records. One useful optimisation in the JAM is that goals suspended on a single variable are treated in a simpler way.

• Organisation of clauses: Clauses are divided into a guard, where unification is read-only (passive, in KL1-B notation) and can suspend, and the body, where (as in Prolog) unification can bind external variables (active, in KL1-B's terminology) or be used for argument preparation. New instructions are needed to support passive unification (these instructions need to consider suspension). Both abstract machines use a special suspension instruction that is called when the goal cannot commit.

• Backtracking: In committed-choice languages there is no true backtracking, so the trail and choice points can be dispensed with. (In JAM backtracking may occur in the guard, but as the goals in the guard cannot bind external variables it is not necessary to implement a trail.) Both abstract machines still include try instructions, but they do not manipulate choice-points. The disadvantage of not having backtracking is that it is impossible to recover space, and hence there is a strong need for dynamic memory recovery, such as the recovery of unreferenced data structures through the MRB bit [28] and garbage collection [44, 128].

Shared-memory parallel implementations of both languages have to perform locking whenever writing variables, because other processors may want to write to the same variables simultaneously. To reduce locking, structures are first created and unified with temporary variables; only then are they unified with the actual arguments. Finally, scheduling of and-work in these languages is dominated by locality considerations. Each processor has its own work queue, and manages its own parts of the data areas (again, similar ideas were proposed for the RAP-WAM [79]). Depth-first scheduling is favoured for efficiency: evaluating the leftmost call first allows better reuse of the goal frames. JAM supports better scheduling [47] by allowing the local run queues to be in part private to each worker. On the Sequent Symmetry

multiprocessor, JAM performs 20% to 40% faster than the corresponding implementation of

the KL1-B [179].
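At the source level, the effect of the locking-reduction technique mentioned above can be pictured as follows (a schematic fragment in the style of the earlier FGHC example, not engine code; in the actual systems the transformation happens inside the abstract machine):

p(Out) :-
    Tmp = f(a, b),   % structure built and bound to a fresh local
                     % variable, with no locking required
    Out = Tmp.       % a single, possibly locked, unification then
                     % publishes it to the shared argument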

KLIC [27] is a more recent implementation of KL1. It generates C code, and it supports

parallelism for both shared memory and distributed memory [141].


8.2 Initial And/Or Models

In contrast to work on the committed-choice languages, several authors suggested describing

Prolog computation as expanding an And/Or tree, and exploiting parallelism from this tree.

Conery’s AND/OR process model [40] was one of the most influential. In Conery’s model, or-

processes are created to solve the different alternative clauses, and and-processes are created

to solve the body of a goal. The and-processes start or-processes for the execution of the goals,

and join the solutions from the or-processes for the different goals. The model restricts Or-parallelism by starting or-processes for the remaining clauses only when no more or-processes for the current clauses remain to output solutions. The structure of the model is shown in

Figure 6, based on Figure 3.7 of Conery’s book [40].

Figure 6: Computation in the AND/OR Process Model. [The figure shows an or-process for the goal <- a & b & c, with processes created to solve the literals in the goal.]

In Conery’s model a dependency graph between goals indicates which goals depend on which other goals. The cost of creating processes and of maintaining the dependency graph means that this model has severe overheads in relation to a sequential Prolog system. Lin and Kumar

obtained an efficient shared memory AND-parallel system where dependency analysis was

performed dynamically without excessive run-time overheads [101].

The REDUCE/OR Process Model (ROPM) was designed by Kale [90]. The REDUCE-OR tree is used instead of the AND-OR tree to represent computation. OR nodes correspond to goals, and REDUCE nodes correspond to clauses in the program, with special notation for generators of variables; several or-nodes may exist for the same goal, corresponding to different bindings of their arguments. REDUCE nodes maintain partial solution sets, PSS, initially empty, that

are used to avoid recomputation. Sequencing between goals is given by data join graphs in

the style of Conery’s dependency graphs. Or-parallelism is exploited both when a goal is first executed and when solutions from a goal generate several instances (the latter case is not exploited in Conery’s model). The ROPM has been implemented on multiprocessors

using structure-sharing to implement a binding environment that prevents references from

a child to its parent node [138]. Benchmark results suggest significant overheads in the im-

plementation of the model, but almost linear speedups in suitable benchmarks due in some

cases to AND- and in some cases to Or-parallelism.

The dataflow paradigm has also been used to support And/Or parallelism. In this paradigm

the nodes in the And/Or tree become nodes in a dataflow graph. Examples of work on

dataflow systems include Wise’s Epilog [194] and Kacsuk’s LOGFLOW [88, 126].

8.3 Independent-And Parallelism

The overheads in implementing dependency (or join) graphs may be quite substantial. De-

Groot [52] proposed a scheme where only goals which do not have any run-time common

variables are allowed to execute in parallel. To verify these conditions, DeGroot suggested

the use of expressions that are added to the original clause and at run-time test the argu-

ments of goals to verify independence. DeGroot’s work on Restricted And Parallelism was

later refined by Hermenegildo. Hermenegildo proposed the conditional graph expressions,

or CGEs [79], and &-Prolog’s parallel expressions [123] to control And-parallelism. We next

give an example of a linear parallel expression in the &-Prolog language:

( ground(X), indep(Z, W) ->

a(X,Z) & b(X, W) ;

a(X,Z), b(X, W) )

If the two conditions hold, the goals a(X,Z) and b(X,W) are independent and can execute in parallel; otherwise, they are evaluated sequentially. The ground condition

guarantees that the shared variable will not contain unbound variables at run-time, and the


test indep guarantees that Z and W do not share run-time variables.
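A possible reading of these run-time tests in plain Prolog is sketched below; actual systems implement them at the abstract-machine level, but the logic is simply that two terms are independent when they share no unbound variables (term_variables/2 is assumed as a built-in, as in SICStus):

% indep(+T1, +T2): succeeds if T1 and T2 share no unbound variables.
indep(T1, T2) :-
    term_variables(T1, Vs1),
    term_variables(T2, Vs2),
    \+ (mem(V1, Vs1), mem(V2, Vs2), V1 == V2).

mem(X, [X|_]).
mem(X, [_|T]) :- mem(X, T).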

8.4 &-Prolog

Hermenegildo’s &-Prolog system implements Independent And-parallelism for Prolog. It is

based on an execution scheme proposed by Hermenegildo and Nasr [80] which extends back-

tracking to cope with independent and-parallel goals. As in Prolog, the scheme recomputes

the solutions to independent goals if previous independent goals are nondeterminate.

The &-Prolog language extends Prolog with the parallel conjunction and goal delaying. One objective of the corresponding system was to keep sequential execution as close to Prolog as possible. To do so, &-Prolog retains most of the Prolog data structures. &-Prolog pro-

grams are executed by a number of PWAMs running in parallel [79]. The instruction set

of a PWAM is the SICStus Prolog instruction set, plus instructions that include the CGE

tests and instructions for parallel goal execution. Synchronisation between goals is imple-

mented through the parcall-frames, data-structures that represent the CGEs and that are

used to manage sibling and-goals. The resulting system has very low overheads in relation

to the corresponding sequential system, SICStus, and shows good speedups for the selected

benchmark programs, including examples of linear speedups.

An essential component of &-Prolog is the &-Prolog compiler. This compiler can use global analysis to generate CGEs. (Notice that for some programs this may still be difficult, and &-Prolog allows hand-written annotations.) Abstract interpretation is used to verify that conditions such as groundness or independence between variables are always satisfied. If they are, the CGE generator can greatly simplify the resulting CGEs, and avoid the overheads inherent in performing the CGE tests.
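For instance, if analysis can prove that X is always ground at this point and that Z and W never share, the conditional expression shown earlier reduces to an unconditional parallel conjunction:

a(X, Z) & b(X, W)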

The &-Prolog system was designed to support full Prolog. Similar to Or-parallel systems,

parallel executions of goals may break the Prolog sequence of side-effects. Several solutions

have been proposed for this problem, the one actually used in &-Prolog is simply to sequence

computation around side-effects.

The &-Prolog system was initially designed at MCC, in the USA. The implementation was

unfortunately not available outside MCC. Several researchers have since continued work in Independent And-Parallelism. Hermenegildo and others in Spain focussed on compilation is-

sues, both for parallelising Prolog [17] and the constraint logic programming languages [9].


Shen in the UK reimplemented most of the functionality of &-Prolog in his DASWAM pro-

totype for DDAS [157]. Pontelli and Gupta at New Mexico State University have also re-

designed most of &-Prolog for their &-ACE prototype [134]. The &-ACE system has been

very successful as an efficient implementation of Independent And-parallelism, and has con-

tributed several important run-time optimisations [135].

8.5 Some Dependent And-Parallel Models

Several models that allow dependent And-parallelism between non-determinate goals have

been proposed in the literature. In order to solve the binding problems all these models

impose some ordering between goals.

In Tebra’s optimistic And-parallel model [176], the standard Prolog ordering is used. Dur-

ing normal execution all goals are allowed to go ahead and bind any variables. When binding

conflicts arise between two goals, the goal that would have been executed first by Prolog has

priority, and the other goal is discarded. In the worst case quite a lot of work can be dis-

carded, and the parallelism can become very speculative. Other optimistic models reduce

the amount of discarded work by using other orderings of goals, such as time-stamps [127].

Goals can also be classified according to producer-consumer relationships. In these mod-

els, producer goals are allowed to bind variables, and consumer goals wait for these vari-

ables. Goals can be classified as producers for a variable statically [33, 164], or dynam-

ically [156]. In Somogyi’s system [164], extended mode declarations statically determine

producer-consumer relationships. The Codognets’ IBISA scheme [33] suggests a system where variables in a clause are marked with read-only annotations in the style of

Concurrent Prolog. In Shen’s DDAS [158] model, dynamic relationships between produc-

ers and consumers are obtained by extending the CGE notation. The extended CGEs now

mark some variables as dependent, and the system dynamically follows these variables to

verify which goals are leftmost for them. If a goal has the leftmost occurrence of a dependent

variable it is allowed to bind the variable, but otherwise it delays.
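The leftmost-occurrence rule can be rendered as the following meta-level Prolog sketch, under the simplifying assumption that the goals to the left are available as a list (the predicate names are hypothetical, not DDAS code):

% producer(+Var, +GoalsToLeft): a goal may bind the dependent variable
% Var only if no goal to its left still shares Var.
producer(Var, GoalsToLeft) :-
    \+ (mem(G, GoalsToLeft), shares(G, Var)).

shares(Goal, Var) :-
    term_variables(Goal, Vs),
    mem(V, Vs), V == Var.

mem(X, [X|_]).
mem(X, [_|T]) :- mem(X, T).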

The producer-consumer models become very complex when the producer or consumer

have to backtrack. Both Somogyi’s scheme and especially IBISA apply ideas of “intelligent

backtracking” to reduce the search space. DDAS uses a backtracking scheme similar to

Hermenegildo’s semi-intelligent backtracking [80]. Shen claims that this scheme results in


a simpler execution model, closer to sequential Prolog, the target language for DDAS.

The producer-consumer models support Independent And-parallelism as a subset. In

addition, dependent And-parallelism between deterministic computations (as long as the

producer-consumer relations between goals are fixed) can be exploited naturally. Finally, the

models offer dependent And-parallelism between non-determinate dependent goals.

Shen gives a description and good initial performance results for his implementation of

DDAS [157]. One problem with Shen’s implementation of dependent And-parallelism is that

it is quite complex to support variables that are shared between and-branches. A simpler and

more elegant technique for detecting whether a goal is the producer or a consumer for such

a variable was proposed and implemented for ACE [133], with good results. ACE also inno-

vates over pure producer-consumer parallelism by allowing eager execution of deterministic

goals, in the style of Andorra.

8.6 Independent And/Or models

The PEPSys [25] model, Gupta’s Extended And-Or Tree model [68] and Fagin’s model [59]

are three models designed to implement Independent And- and Or-parallelism in a shared-

memory framework. All use CGEs to implement And-parallelism, and combine Or-parallelism

with And-parallelism by extending respectively hash tables and binding arrays. The sev-

eral solutions from the independent and-parallel computations are implemented through a

special cross-product node, which can be quite complex to implement. In fact, the PEPSys

system only truly implements deterministic And-parallel computations [23].

New proposals for combined And/Or parallelism use backtracking to obtain the several And-parallel solutions (thus, as in &-Prolog, some recomputation is performed) [67]. Such systems should be easier to implement than PEPSys or the Extended And-Or Tree Model, as they can exploit the technology of &-Prolog and the Or-parallel systems, and as they do not need to calculate cross-products. Such proposals include Shen’s or-under-and model [156] and the C-Tree, a framework for And/Or models proposed by Gupta and colleagues [67].

Proposed implementations of the C-tree framework include Gupta and Hermenegildo’s ACE [66],

a model that combines Muse and &-Prolog, and Gupta’s PBA model [70], a model that com-

bines Aurora and &-Prolog. One important advantage of these models is that they are quite

suitable for the implementation of Full Prolog [72].


Work in the implementation of these models has shown several implementation difficul-

ties, particularly in memory management. Correia and colleagues argue that to address

these problems, C-tree based models must be redesigned to minimise interference between And- and Or-parallelism, and propose a novel data structure towards this purpose, the Sparse

Binding Array [144].

An alternative approach to combining And/Or parallelism through recomputation has been proposed by Shen, based on his PhD work on simulating And/Or parallelism [156]. The

FIRE model uses hash tables instead of copying or binding arrays [159], thus avoiding mem-

ory management problems.

8.7 Reform Prolog

A very different approach to And-parallelism was suggested in Reform Prolog, a system de-

veloped at Uppsala [12]. The idea was to use data-parallelism by unfolding recursive calls in

a predicate. The scheme was shown to obtain excellent results. The Reform Prolog system

was developed from scratch, and included a compiler that could automatically parallelise

simple recursion on lists and on numbers. It was one of the few parallel logic programming

systems that could use static scheduling.
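The idea can be illustrated on a simple list recursion (a hedged sketch, not Reform Prolog’s actual compilation output): the recursion is unfolded so that the body instances become independent goals that can be run as one data-parallel step:

% source predicate:
add1([], []).
add1([X|Xs], [Y|Ys]) :- Y is X + 1, add1(Xs, Ys).

% conceptual unfolding for a list of known length 3: the three
% arithmetic goals are independent and can run in parallel.
add1_3([X1,X2,X3], [Y1,Y2,Y3]) :-
    Y1 is X1 + 1, Y2 is X2 + 1, Y3 is X3 + 1.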

Other authors have suggested that this parallelism can be implemented on top of tradi-

tional Prolog systems, either by compile-time transformations (see Hermenegildo et al. [78]),

or by run-time optimisations (see Gupta et al. [132]).

9 Andorra Based Systems

Andorra-I was the first implementation of the Basic Andorra Model. The system was de-

veloped at the University of Bristol by Beaumont, Dutra, Santos Costa, Yang, and War-

ren [145, 196]. It was designed to take full advantage of the Basic Andorra Model. This

means both exploiting parallelism, and exploiting implicit coroutining as much as possible.

Andorra-I programs are executed by teams of abstract processing agents called workers.

Each worker usually corresponds to a physical processor. Each team, when active, is asso-

ciated with a separate or-branch in the computation tree and is in one of two computation

phases:


Determinate For a team, as long as determinate goals exist in the or-branch, all such goals

are candidates for immediate evaluation, and thus can be picked up by a worker. This

phase ends when no determinate goals are available, or when a determinate goal fails.

In the first case, the team moves to the non-determinate phase. In the second case, the

corresponding or-branch must be abandoned, and the team will backtrack in order to

find a new or-branch to explore.

Nondeterminate If no determinate goals exist, the leftmost goal (or a particular goal spec-

ified by the user) is reduced. A choice-point is created to represent the fact that the

current or-branch has now forked into several or-branches, while the team itself will

explore one of the or-branches. If other teams are available, they can be used to explore

the remaining or-branches.
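A small example shows how determinacy-first execution prunes search (a sketch in plain Prolog syntax, not Andorra-I input):

mem(X, [X|_]).
mem(X, [_|T]) :- mem(X, T).

q(2).                          % q/1 has a single clause

p(X) :- mem(X, [1,2,3]), q(X).

Under the Basic Andorra Model the goal q(X) is determinate, since only one clause can possibly match, so it is reduced first and binds X to 2; the membership goal then runs with X already bound, and the failing alternative X = 1, which Prolog’s left-to-right order would try first, is never explored.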

Figure 7 shows the execution phases in terms of a pool of determinate goals. The figure

shows that the determinate phase is abandoned when either no more determinate goals are

available or when the team fails, and the determinate phase is reentered either after creating

a choice point, or after backtracking and reusing a choice point.

Figure 7: Execution Model of Andorra-I. [The figure depicts the pool of determinate goals, the determinate phase, left when no determinate goals remain or a goal fails, and the nondeterminate phase, left by creating a choice point or by backtracking.]

During the determinate phase, the workers of each team behave similarly to those of a

parallel committed-choice system; they work together to exploit And-parallelism. During

the non-determinate phase, and on backtracking, only one particular worker in the team

is active. We call this worker the master and the remaining workers slaves. The master


performs choice-point creation and backtracking in the same way as an Or-parallel Prolog

system.

The Andorra-I system consists of several components. The preprocessor, designed by

Santos Costa, is responsible for compiling the program and for generating the sequencing information

necessary to maintain the correct execution of Prolog programs [147]. The engine, designed by Yang, was responsible for the execution of Andorra-I programs [146]. Initially

this was an interpreted system largely based on JAM [45], as regards the treatment of And-

parallelism, and on Aurora [106], as regards the treatment of Or-parallelism. However, the

integration of both types of parallelism introduced a number of new implementation issues,

as discussed in [146]. A compiler-based version of Andorra-I was developed next. Yang et al. [196] describe the key ideas of the abstract machine, based on JAM, and give a performance analysis. The compilation techniques used are described by Santos Costa et al. [148].

Most of the execution time of workers should be spent executing engine code, i.e., performing reductions. Whenever a worker runs out of work, it enters a scheduler to find

another piece of available work. Andorra-I includes an or-scheduler, an and-scheduler and a

reconfigurer.

The or-scheduler is responsible for finding or-work, i.e., an unexplored alternative in the or-tree implied by the logic program. Andorra-I used the Bristol or-scheduler [10], originally

developed for the Aurora system. The and-scheduler, developed by Yang, is responsible for

finding eligible and-work, which corresponds to a goal in the run queue (list of goals not yet

executed) of a worker in the same team. Each worker in a team keeps a run queue of goals.

This run queue of goals has two pointers. The pointer to the head of the queue is only used

by the owner. The pointer to the tail of the queue is used by other workers to “steal” goals

when their own run queues are empty. If all the run queues are empty, the slaves wait either

until some other worker (in our implementation, the master) creates more work in its run

queue or until the master detects that there are no more determinate goals to be reduced

and it is time to create a choice-point.
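Ignoring locking and suspension, the two ends of the run queue behave as in this purely functional Prolog sketch (the predicate names are hypothetical):

% the owner takes work from the head of its run queue ...
owner_next([Goal|Rest], Goal, Rest).

% ... while an idle team-mate steals from the tail.
steal(Queue, Goal, Rest) :- append(Rest, [Goal], Queue).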

The initial version of Andorra-I relied on a fixed configuration of workers between teams.

Dutra [57] designed the Andorra-I reconfigurer, which could dynamically reassign workers to

the different forms of available parallelism. The reconfigurer was shown to be quite effective

and in fact could quite often improve on the best hand-tailored configurations.

Andorra-I was a very significant step in parallel logic programming. It was the first


system to support both Dependent And-Parallelism and Or-Parallelism. It showed good

speedups, whilst maintaining an acceptable base performance. More recently, there has been

work on application development with Andorra-I, and also on supporting parallelisation of

finite-domain constraints [63]. Andorra-I has also been used as basis for studying the per-

formance of parallel logic programming systems on scalable architectures, see Santos Costa,

Bianchini, and Dutra [143, 142], and on software distributed shared memory systems [84].

9.0.1 Other Andorra Systems

Palmer and Naish worked on a different implementation of the Basic Andorra Model, as

an extension of the Parallel Nu-Prolog System, the NUA-Prolog system [130]. This was

a compiled system supporting And-parallelism, and also showed good speedups. Tick and Korsloot also did some work towards an implementation of Bahgat’s Pandora [180].

The development of the EAM and AKL led to several parallel implementations of AKL.

In Melbourne, Palmer designed an implementation supporting And-parallelism [129]. He

extended AKL with mode declarations to provide finer control. Moolenaar and others at Leu-

ven designed an implementation supporting AND/OR parallelism [120] through hash tables.

This work was one of the first to use the Andorra principle for constraints, namely finite-

domain constraints [121]. Montelius and others worked on a parallel implementation of AKL using copying, Penny [118]. The implementation does not distinguish between forms of parallelism; both are exploited in much the same way. Other interesting contributions result from the work on parallel garbage collection and from the extensive performance analysis [119].

Research in parallelism for the language Oz has followed different directions. Oz has an efficient sequential implementation that benefits from work in logic programming systems [114]. Researchers have been more interested in explicit parallelism through threads [136], and in using Oz as a control language for distributed programming [76].

10 Further Reading

It is impossible to cover all the research in implementing sequential and parallel logic programming systems in a single survey. Several interesting books and surveys are available that cover further ground in this area. Aït-Kaci’s book [2] is the standard reference on the WAM. Van Roy published an excellent survey of sequential Prolog implementations,


up to 1993 [186]. The survey gives a more detailed analysis of the WAM and of the BAM,

and discusses subjects not covered in this work, such as hardware implementations of Pro-

log. Regarding parallel systems, Tick’s book gives an excellent analysis of Or-parallel Prolog

systems versus the committed choice languages [177], Gupta’s book gives an insightful dis-

cussion of the issues of parallelism for Prolog systems [65], and Kacsuk and Wise edited

a thorough collection on work on distributed implementations of logic programming [89].

Chassin de Kergommeaux and Codognet give a survey on Parallel Logic Programming Sys-

tems, also from 1994 [24]. An important survey is the one on committed choice languages

given by Shapiro [155].

Research in this field has often been published in the major conferences on logic pro-

gramming, published as The Logic Programming Series from MIT Press. Other important

conferences are the PLILP Conferences, published in Springer Verlag LNCS series. A yearly

workshop on implementation, so far associated with the major Logic Programming Confer-

ence, are a good source on the most up to date research. Often these workshops are published

as books [181, 58]. The Journal of Logic Programming, published by Elsevier, and New Gen-

eration Computing, published by Ohmsha and Springer Verlag are the two main journals in

the area. Implementation work on logic programming can also be found in several related

journals and conferences.

11 Conclusions

Logic programming is one of the most successful and widely available programming paradigms.

Part of this success results from the extensive work on the design and implementation of

logic programming systems. In this paper we survey some of the most significant work on

sequential and parallel implementations of logic programming.

Work on sequential implementations has been very successful. Prolog implementation has evolved from the original Marseille interpreter to compiled systems such as the WAM and then to high-performance native-code implementations. There is still scope for applying global optimisations effectively to further improve performance, and for optimising performance for modern computer architectures.

Some of the most exciting recent sequential work results from the many new paths constantly being opened in logic programming. Constraint Programming is now becom-


ing a separate research area. Work on tabulation and memoing is at last tackling some of

Prolog’s “original sins”. Ambitious execution schemes, such as the EAM, are being consid-

ered [104]. Other researchers are keen on combining logic programming with other success-

ful paradigms such as functional programming or Java.

Logic programming is an ideal paradigm for the development of parallel systems. Al-

though the wide acceptance of parallel logic programming has been hampered by the rela-

tive dearth of parallel computers, logic programming is one area in computing where main-

stream systems do support implicit parallelism. In fact, Or-parallelism is supported in

Prolog systems such as ECLiPse [1], SICStus Prolog [6], and YAP [48]. More experimen-

tal, but still excellent, available systems include Andorra-I [145], KLIC [27], Penny [118],

DASWAM [157], and ACE [134]. Research is now close to efficiently combining And-parallelism with Or-parallelism while preserving Prolog-style execution. Work is also going

on further optimisations to the current parallel paradigms, on providing more effective ap-

plication support, and on supporting extensions to logic programming, such as constraints,

tabulation [62, 139] and functional programming [73].

Acknowledgments

Thanks are due to Kish Shen and to Ines Dutra for reading drafts of this paper and making

helpful comments. Thanks are also due to the organisers of the ILPS97 workshop on implemen-

tation who made this work possible: Enrico Pontelli, Gopal Gupta, Ines Dutra, and Fernando

Silva. The author wants to acknowledge the support of the COPPE/Sistemas, Universidade

Federal do Rio de Janeiro, and from the Melodia project (grant JNICT PBIC/C/TIT/2495/95)

for this work. Last, but not least, the author wants to acknowledge the excellent work

that justifies this survey, and apologise to every author whose contribution could not be in-

cluded in this survey.

References

[1] A. Aggoun, D. Chan, P. Dufresne, E. Falvey, H. Grant, A. Herold, G. Macartney,

M. Meier, D. Miller, S. Mudambi, B. Perez, E. van Rossum, J. Schimpf, P. A. Tsahageas,

and D. H. de Villeneuve. ECLiPSe 3.5 User Manual. ECRC, December 1995.


[2] H. Aït-Kaci. Warren’s Abstract Machine: A Tutorial Reconstruction. MIT Press, 1991.

[3] K. A. M. Ali. Or-parallel Execution of Prolog on the BC-Machine. In Proceedings of

the Fifth International Conference and Symposium on Logic Programming, pages 253–

268. MIT Press, 1988.

[4] K. A. M. Ali and R. Karlsson. The Muse Or-parallel Prolog Model and its Performance.

In Proceedings of the North American Conference on Logic Programming, pages 757–

776. MIT Press, October 1990.

[5] K. A. M. Ali and R. Karlsson. Scheduling Speculative Work in Muse and Performance

Results. International Journal of Parallel Programming, 21(6):449–476, December

1992. Published in Sept. 1993.

[6] J. Andersson, S. Andersson, K. Boortz, M. Carlsson, H. Nilsson, T. Sjöland, and J. Widén. SICStus Prolog User’s Manual. Technical report, Swedish Institute of Com-

puter Science, November 1997. SICS Technical Report T93-01.

[7] K. Appleby, M. Carlsson, S. Haridi, and D. Sahlin. Garbage collection for Prolog based

on WAM. Communications of the ACM, 31(6):171–183, 1989.

[8] R. Bahgat and S. Gregory. Pandora: Non-deterministic Parallel Logic Programming.

In Proceedings of the Sixth International Conference on Logic Programming, pages

471–486. MIT Press, June 1989.

[9] M. G. d. l. Banda and M. V. Hermenegildo. A Practical Approach to the Global Analysis

of CLP Programs. In ILPS93, pages 437–455, 1993.

[10] A. Beaumont, S. M. Raman, P. Szeredi, and D. H. D. Warren. Flexible Scheduling of

OR-Parallelism in Aurora: The Bristol Scheduler. In PARLE91: Conference on Parallel

Architectures and Languages Europe, volume 2, pages 403–420. Springer Verlag, June

1991.

[11] J. Beer. Concepts, Design, and Performance Analysis of a Parallel Prolog Machine.

Number 404 in Lecture Notes in Computer Science. Springer Verlag, 1989.

[12] J. Bevemyr, T. Lindgren, and H. Millroth. Reform Prolog: The Language and its Im-

plementation. In Proceedings of the Tenth International Conference on Logic Program-

ming, pages 283–298. MIT Press, June 1993.


[13] P. A. Bigot and S. K. Debray. A simple approach to supporting untagged objects in

dynamically typed languages. The Journal of Logic Programming, 32(1), July 1997.

[14] J. Briat, M. Favre, C. Geyer, and J. Chassin. Scheduling of or-parallel Prolog on a

scaleable, reconfigurable, distributed-memory multiprocessor. In Proceedings of Par-

allel Architecture and Languages Europe. Springer Verlag, 1991.

[15] P. Brisset and O. Ridoux. Continuations in λProlog. In D. S. Warren, editor, Proceedings of the Tenth International Conference on Logic Programming, pages 27–43,

Budapest, Hungary, 1993. The MIT Press.

[16] M. Bruynooghe, G. Janssens, A. Callebault, and B. Demoen. Abstract Interpretation:

Towards the Global Optimisation of Prolog Programs. In Proceedings 1987 Symposium

on Logic Programming, pages 192–204. IEEE Computer Society, September 1987.

[17] F. Bueno, M. G. d. l. Banda, and M. V. Hermenegildo. Effectiveness of Abstract In-

terpretation in Automatic Parallelization: A Case Study in Logic Programming. ACM

TOPLAS, 1998.

[18] A. Calderwood and P. Szeredi. Scheduling or-parallelism in Aurora – the Manchester

scheduler. In Proceedings of the Sixth International Conference on Logic Programming,

pages 419–435. MIT Press, June 1989.

[19] M. Carlsson. Freeze, Indexing, and Other Implementation Issues in the Wam. In J.-L.

Lassez, editor, Proceedings of the Fourth International Conference on Logic Program-

ming, MIT Press Series in Logic Programming, pages 40–58, Melbourne. MIT Press, May 1987.

[20] M. Carlsson. On the efficiency of optimised shallow backtracking in Compiled Prolog.

In Proceedings of the Sixth International Conference on Logic Programming, pages 3–

15. MIT Press, June 1989.

[21] M. Carlsson. Design and Implementation of an OR-Parallel Prolog Engine. SICS Dis-

sertation Series 02, The Royal Institute of Technology, 1990.

[22] M. Carlsson and P. Szeredi. The Aurora abstract machine and its emulator. SICS

Research Report R90005, Swedish Institute of Computer Science, 1990.


[23] J. Chassin de Kergommeaux. Measures of the PEPSys Implementation on the MX500.

Technical Report CA-44, ECRC, January 1989.

[24] J. Chassin de Kergommeaux and P. Codognet. Parallel Logic Programming Systems.

Computing Surveys, 26(3):295–336, September 1994.

[25] J. Chassin de Kergommeaux and P. Robert. An Abstract Machine to Implement Or-And

Parallel Prolog Efficiently. The Journal of Logic Programming, 8(3), May 1990.

[26] D. Chen and D. S. Warren. Query evaluation under the well-founded semantics. In

Proc. of 12th PODS, pages 168–179, 1993.

[27] T. Chikayama, T. Fujise, and D. Sekita. A Portable and Efficient Implementation of

KL1. In 6th International Symposium PLILP, pages 25–39, 1994.

[28] T. Chikayama and Y. Kimura. Multiple Reference Management in Flat GHC. In J.-L.

Lassez, editor, Proceedings of the Fourth International Conference on Logic Program-

ming, MIT Press Series in Logic Programming, pages 276–293, Melbourne. MIT Press, May 1987.

[29] K. L. Clark and S. Gregory. PARLOG: Parallel Programming in Logic. ACM TOPLAS,

8:1–49, January 1986.

[30] K. L. Clark, F. G. McCabe, and S. Gregory. IC-PROLOG – language features. In

K. L. Clark and S. A. Tarnlund, editors, Logic Programming, pages 253–266. Academic

Press, London, 1982.

[31] W. F. Clocksin. Principles of the DelPhi parallel inference machine. Computer Journal,

30(5):386–392, 1987.

[32] W. F. Clocksin and C. Mellish. Programming in Prolog. Springer-Verlag, 1986.

[33] C. Codognet and P. Codognet. Non-deterministic Stream And-Parallelism Based on

Intelligent Backtracking. In G. Levi and M. Martelli, editors, Logic Programming:

Proceedings of the Sixth International Conference, pages 83–79. The MIT Press, 1989.

[34] P. Codognet and D. Diaz. wamcc: Compiling Prolog to C. In 12th International Confer-

ence on Logic Programming. The MIT Press, 1995.


[35] P. Codognet and D. Diaz. Compiling constraints in clp(fd). The Journal of Logic Pro-

gramming, 27(3):185–226, June 1996.

[36] A. Colmerauer. Prolog II: Reference Manual and Theoretical Model. Groupe

D’Intelligence Artificielle, Faculté des Sciences de Luminy, Marseilles, October 1982.

[37] A. Colmerauer. An Introduction to Prolog-III. Communications of the ACM, 33(7):69–

90, July 1990.

[38] A. Colmerauer. The Birth of Prolog. In The Second ACM-SIGPLAN History of Pro-

gramming Languages Conference, pages 37–52. ACM, March 1993.

[39] A. Colmerauer, H. Kanoui, R. Pasero, and P. Roussel. Un système de communication homme–machine en français. Technical Report CRI 72-18, Groupe Intelligence Artificielle, Université Aix-Marseille II, October 1973.

[40] J. S. Conery. Parallel Execution of Logic Programs. Kluwer Academic Publishers,

Norwell, Ma 02061, 1987.

[41] M. E. Correia, F. M. A. Silva, and V. Santos Costa. The SBA: Exploiting orthogonality

in OR-AND Parallel Systems. In Proceedings of the 1997 International Logic Program-

ming Symposium, October 1997. Also published as Technical Report DCC-97-3, DCC -

FC & LIACC, Universidade do Porto, April, 1997.

[42] P. Cousot and R. Cousot. Abstract Interpretation: a Unified Lattice Model for Static

Analysis of Programs by Construction or Approximation of Fixpoints. In Conference

Record of the 4th ACM Symposium on Principles of Programming Languages, pages

238–252, 1977.

[43] P. Cousot and R. Cousot. Abstract interpretation and application to logic programs.

The Journal of Logic Programming, 13(1, 2, 3 and 4):103–179, 1992.

[44] J. A. Crammond. A Garbage Collection Algorithm for Shared Memory Parallel Proces-

sors. International Journal of Parallel Processing, 17(6), December 1988.

[45] J. A. Crammond. Implementation of Committed Choice Logic Languages on Shared

Memory Multiprocessors. PhD thesis, Heriot-Watt University, Edinburgh, May 1988.

Research Report PAR 88/4, Dept. of Computing, Imperial College, London.


[46] J. A. Crammond. The Abstract Machine and Implementation of Parallel Prolog. Tech-

nical report, Dept. of Computing, Imperial College, London, June 1990.

[47] J. A. Crammond. Scheduling and Variable Assignment in the Parallel Parlog Imple-

mentation. In 1990 North American Conference on Logic Programming, pages 642–

657. MIT Press, October 1990.

[48] L. Damas, V. Santos Costa, R. Reis, and R. Azevedo. YAP User’s Guide and Reference

Manual, 1989.

[49] K. De Bosschere, S. K. Debray, D. Gudeman, and S. Kannan. Call Forwarding: An In-

terprocedural Optimization Technique for Dynamically Typed Languages. In Proceed-

ings of the SIGACT–SIGPLAN Symposium on Principles of Programming Languages,

1994.

[50] S. K. Debray. Register allocation in a Prolog machine. In Symposium on Logic Pro-

gramming, pages 267–275. IEEE Computer Society, The Computer Society Press,

September 1986.

[51] S. K. Debray. On copy avoidance in single assignment languages. In ICLP93, pages

393–407, 1993.

[52] D. DeGroot. Restricted and-parallelism. In H. Aiso, editor, International Conference

on Fifth Generation Computer Systems 1984, pages 471–478. Institute for New Gener-

ation Computing, Tokyo, 1984.

[53] B. Demoen, G. Engels, and P. Tarau. Segment Preserving Copying Garbage Collec-

tion for WAM based Prolog. In Proceedings of the 1996 ACM Symposium on Applied

Computing, pages 380–386, Philadelphia, February 1996. ACM Press.

[54] B. Demoen and K. Sagonas. CAT: the Copying Approach to Tabling. In Proceedings of

PLILP/ALP98. Springer Verlag, September 1998.

[55] P. Deransart, A. Ed-Dbali, L. Cervoni, and A. A. Ed-Ball. Prolog, The Standard :

Reference Manual. Springer Verlag, 1996.

[56] M. Dincbas, P. Van Hentenryck, H. Simonis, A. Aggoun, T. Graf, and F. Berthier. The

Constraint Logic Programming Language CHIP. In International Conference on Fifth

Generation Computer Systems 1988, pages 693–702. ICOT, Tokyo, Japan, Nov. 1988.


[57] I. Dutra. Strategies for Scheduling And- and Or-Work in Parallel Logic Programming

Systems. In Logic Programming: Proceedings of the 1994 International Symposium,

pages 289–304. MIT Press, 1994.

[58] I. Dutra, V. Santos Costa, F. Silva, E. Pontelli, G. Gupta, and M. Carro, editors. Paral-

lelism and Implementation Technology for Logic and Constraint Logic Programming.

Nova Science, 1998.

[59] B. S. Fagin and A. M. Despain. The Performance of Parallel Prolog Programs. IEEE

Transactions on Computers, 39(12):1434–1445, Dec. 1990.

[60] M. Ferreira and L. Damas. Unfolding WAM Code. In 3rd COMPULOG NET Workshop

on Parallelism and Implementation Technology for (Constraint) Logic Programming

Languages, Bonn, September 1996.

[61] I. Foster and S. Taylor. Strand: New Concepts in Parallel Programming. Prentice-Hall,

January 1990.

[62] J. Freire, R. Hu, T. Swift, and D. S. Warren. Exploiting Parallelism in Tabled Evalua-

tions. In 7th International Symposium PLILP, pages 115–132, 1995.

[63] S. Gregory and R. Yang. Parallel Constraint Solving in Andorra-I. In International

Conference on Fifth Generation Computer Systems 1992, pages 843–850. ICOT, Tokyo,

Japan, June 1992.

[64] D. Gudeman, K. de Bosschere, and S. K. Debray. jc: An Efficient and Portable Sequen-

tial Implementation of Janus. In Proceedings of the 1992 Joint International Confer-

ence and Symposium on Logic Programming, 1992.

[65] G. Gupta. Multiprocessor Execution of Logic Programs. Kluwer Academic Press, 1994.

[66] G. Gupta, M. Hermenegildo, E. Pontelli, and V. Santos Costa. ACE: And/Or-parallel

Copying-based Execution of Logic Programs. In Proc. ICLP’94, pages 93–109. MIT

Press, 1994.

[67] G. Gupta, M. Hermenegildo, and V. Santos Costa. And-Or Parallel Prolog: A Recom-

putation based Approach. New Generation Computing, 11(3,4):770–782, 1993.


[68] G. Gupta and B. Jayaraman. Compiled And-Or Parallelism on Shared Memory Multi-

processors. In Proceedings of the North American Conference on Logic Programming,

pages 332–349. MIT Press, October 1989.

[69] G. Gupta and B. Jayaraman. Analysis of or-parallel execution models. ACM TOPLAS,

15(4):659–680, 1993.

[70] G. Gupta and V. Santos Costa. And-Or Parallelism in Full Prolog with Paged Binding

Arrays. In LNCS 605, PARLE’92 Parallel Architectures and Languages Europe, pages

617–632. Springer-Verlag, June 1992.

[71] G. Gupta and V. Santos Costa. Optimal implementation of and-or parallel Prolog.

Future Generation Computer Systems, 14(10):71–92, 1994.

[72] G. Gupta and V. Santos Costa. Cuts and Side-Effects in And-Or Parallel Prolog. Jour-

nal of Logic Programming, 27(1):45–71, April 1996.

[73] M. Hanus and R. Sadre. A Concurrent Implementation of Curry in Java. In Workshop

on Parallelism and Implementation Technology for (Constraint) Logic Programming

Languages, Port Jefferson, October 1997.

[74] S. Haridi and P. Brand. Andorra Prolog–an integration of Prolog and committed choice

languages. In International Conference on Fifth Generation Computer Systems 1988.

ICOT, 1988.

[75] S. Haridi and S. Jansson. Kernel Andorra Prolog and its Computational Model. In

D. Warren and P. Szeredi, editors, Proceedings of the Seventh International Conference

on Logic Programming, pages 31–46. MIT Press, 1990.

[76] S. Haridi, P. Van Roy, and G. Smolka. An overview of the design of Distributed Oz. In

Proceedings of the Second International Symposium on Parallel Symbolic Computation

(PASCO ’97), pages 176–187, Maui, Hawaii, USA, July 1997. ACM Press.

[77] R. C. Haygood. Native code compilation in SICStus Prolog. In P. V. Hentenryck, edi-

tor, Proceedings of the Eleventh International Conference on Logic Programming. MIT

Press, June 1994.


[78] M. Hermenegildo and M. Carro. Relating Data–Parallelism and And–Parallelism in

Logic Programs. In Proceedings of EURO–PAR’95, Swedish Institute of Computer

Science (SICS), August 1995.

[79] M. V. Hermenegildo. An Abstract Machine Based Execution Model for Computer Archi-

tecture Design and Efficient Implementation of Logic Programs in Parallel. PhD thesis,

Dept. of Electrical and Computer Engineering (Dept. of Computer Science TR-86-20),

University of Texas at Austin, Austin, Texas 78712, August 1986.

[80] M. V. Hermenegildo and R. I. Nasr. Efficient Management of Backtracking in AND-

parallelism. In Third International Conference on Logic Programming, number 225 in

Lecture Notes in Computer Science, pages 40–54. Imperial College, Springer-Verlag,

July 1986.

[81] M. V. Hermenegildo and F. Rossi. Non-Strict Independent And-Parallelism. In Proceed-

ings of the Seventh International Conference on Logic Programming, pages 237–252.

MIT Press, June 1990.

[82] T. Hickey and S. Mudambi. Global compilation of Prolog. The Journal of Logic Pro-

gramming, pages 193–230, November 1989.

[83] R. Hill. LUSH-Resolution and its Completeness. Dcl memo 78, Department of Artificial

Intelligence, University of Edinburgh, 1974.

[84] Z. Huang, C. Sun, A. Sattar, and W.-J. Lei. Parallel Logic Programming on Distributed

Shared Memory System. In Proceedings of the IEEE International Conference on In-

telligent Processing Systems, October 1997.

[85] J. Jaffar and S. Michaylov. Methodology and implementation of a CLP system. In

J.-L. Lassez, editor, Proceedings of the Fourth International Conference on Logic Pro-

gramming, MIT Press Series in Logic Programming, pages 196–218, Melbourne. MIT Press, May 1987.

[86] S. Janson and S. Haridi. Programming Paradigms of the Andorra Kernel Language. In

Logic Programming: Proceedings of the International Logic Programming Symposium,

pages 167–186. MIT Press, October 1991.


[87] G. Janssens, B. Demoen, and A. Marien. Improving the register allocation of WAM by

recording unification. In ICLP88, pages 1388–1402, 1988.

[88] P. Kacsuk. A Highly Parallel Prolog Interpreter Based on the Generalised Data Flow

Model. In S.-Å. Tärnlund, editor, Proceedings of the Second International Logic Pro-

gramming Conference, pages 195–205, Uppsala University, Uppsala, Sweden, 1984.

[89] P. Kacsuk and M. J. Wise, editors. Implementations of Distributed Prolog. Wiley, Series

in Parallel Computing, 1992.

[90] L. V. Kale. The REDUCE OR process model for parallel execution of logic program-

ming. The Journal of Logic Programming, 11(1), July 1991.

[91] Y. Kimura and T. Chikayama. An Abstract KL1 Machine and its Instruction Set. In In-

ternational Symposium on Logic Programming, pages 468–477. San Francisco, IEEE

Computer Society, August 1987.

[92] S. Kliger and E. Shapiro. A Decision Tree Compilation Algorithm for FCP(|,:,?). In Proceedings of the Fifth International Conference and Symposium on Logic Programming,

pages 1315–1336. MIT Press, August 1988.

[93] S. Kliger and E. Shapiro. From Decision Trees to Decision Graphs. In Proceedings

of the North American Conference on Logic Programming, pages 97–116. MIT Press,

October 1990.

[94] F. Kluzniak. Developing applications for Aurora. Technical Report TR-90-17, Univer-

sity of Bristol, Computer Science Department, August 1990.

[95] P. Koves and P. Szeredi. Collection of Papers on Logic Programming, chapter Getting

the Most Out of Structure-Sharing. SZKI, November 1993.

[96] R. A. Kowalski. Logic for Problem Solving. Elsevier North-Holland Inc., 1979.

[97] A. Krall. The vienna abstract machine. The Journal of Logic Programming, 1-3, Octo-

ber 1996.

[98] H. Kuchen, R. Loogen, J. J. Moreno-Navarro, and M. Rodrıguez-Artalejo. The Func-

tional Logic Language BABEL and Its Implementation on a Graph Machine. New

Generation Computing, 14(4):391–427, 1996.

[99] B. Le Charlier and P. Van Hentenryck. Experimental evaluation of a generic abstract interpretation algorithm for PROLOG. ACM TOPLAS, 16(1):35–101, January 1994.

[100] X. Li. A new term representation method for Prolog. The Journal of Logic Programming, 34(1):43–57, January 1998.

[101] Y.-J. Lin and V. Kumar. AND-parallel execution of logic programs on a shared-memory multiprocessor. The Journal of Logic Programming, 10(1,2,3 and 4):155–178, 1991.

[102] Z. Lin. Self-organizing task scheduling for parallel execution of logic programs. In Proceedings of the International Conference on Fifth Generation Computer Systems, pages 859–868, ICOT, Japan, 1992. Association for Computing Machinery.

[103] T. Lindgren. Polyvariant detection of uninitialized arguments of Prolog predicates. The Journal of Logic Programming, 28(3), September 1997.

[104] R. Lopes and V. Santos Costa. The BEAM: Towards a first EAM Implementation. In Workshop on Parallelism and Implementation Technology for (Constraint) Logic Programming Languages, Port Jefferson, October 1997.

[105] L. Lu. Polymorphic type analysis in logic programs by abstract interpretation. The Journal of Logic Programming, 36(1), July 1998.

[106] E. Lusk, R. Butler, T. Disz, R. Olson, R. Overbeek, R. Stevens, D. H. D. Warren, A. Calderwood, P. Szeredi, S. Haridi, P. Brand, M. Carlsson, A. Ciepelewski, and B. Hausman. The Aurora or-parallel Prolog system. New Generation Computing, 7(2,3):243–271, 1990.

[107] A. Marien. Improving the Compilation of Prolog in the Framework of the Warren Abstract Machine. PhD thesis, Katholieke Universiteit Leuven, September 1993.

[108] A. Marien and B. Demoen. On the Management of Choicepoint and Environment Frames in the WAM. In E. L. Lusk and R. A. Overbeek, editors, Proceedings of the North American Conference on Logic Programming, pages 1030–1050, Cleveland, Ohio, USA, 1989.

[109] A. Marien and B. Demoen. A new scheme for unification in WAM. In V. Saraswat and K. Ueda, editors, Logic Programming, Proceedings of the 1991 International Symposium, pages 257–271, San Diego, USA, 1991. The MIT Press.

[110] A. Marien, G. Janssens, A. Mulkers, and M. Bruynooghe. The impact of abstract interpretation: an experiment in code generation. In Proceedings of the Sixth International Conference on Logic Programming, pages 33–47. MIT Press, June 1989.

[111] K. Marriott and P. J. Stuckey. Programming with Constraints: An Introduction. MIT Press, 1998.

[112] H. Masukawa, K. Kumon, A. Itashiki, K. Satoh, and Y. Sohma. "Kabu-Wake" Parallel Inference Mechanism and Its Evaluation. In 1986 Proceedings Fall Joint Computer Conference, pages 955–962. IEEE Computer Society Press, November 1986.

[113] L. Matyska, A. Jergova, and D. Toman. Register allocation in WAM. In K. Furukawa, editor, Proceedings of the Eighth International Conference on Logic Programming, pages 142–156, Paris, France, 1991. The MIT Press.

[114] M. Mehl, R. Scheidhauer, and C. Schulte. An Abstract Machine for Oz. Research Report RR-95-08, Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany, June 1995. Also in: Proceedings of PLILP'95, Springer-Verlag, LNCS, Utrecht, The Netherlands.

[115] C. S. Mellish. The Automatic Generation of Mode Declarations for Prolog Programs. DAI Research Paper 163, Department of Artificial Intelligence, Univ. of Edinburgh, August 1981.

[116] D. A. Miller and G. Nadathur. Higher-order logic programming. In E. Shapiro, editor, Proceedings of the Third International Conference on Logic Programming, Lecture Notes in Computer Science, pages 448–462, London, 1986. Springer-Verlag.

[117] L. Monteiro and A. Porto. Contextual logic programming. In Proceedings of the Sixth International Conference on Logic Programming, pages 284–299. MIT Press, June 1989.

[118] J. Montelius and K. A. M. Ali. An And/Or-Parallel Implementation of AKL. New Generation Computing, 14(1), 1996.

[119] J. Montelius and P. Magnusson. Using SimICS to Evaluate the Penny System. In Proceedings of the 1997 International Logic Programming Symposium, October 1997.

[120] R. Moolenaar and B. Demoen. A parallel implementation for AKL. In Proceedings of the Programming Language Implementation and Logic Programming: PLILP '93, Tallinn, Estonia, pages 246–261, 1993.

[121] R. Moolenaar and B. Demoen. Hybrid tree search in the Andorra Model. In P. V. Hentenryck, editor, Proceedings of the Eleventh International Conference on Logic Programming, pages 110–123. MIT Press, June 1994.

[122] A. Mulkers, W. Winsborough, and M. Bruynooghe. Live-structure dataflow analysis for Prolog. ACM TOPLAS, 16(2):205–258, March 1994.

[123] K. Muthukumar and M. V. Hermenegildo. The CDG, UDG, and MEL Methods for Automatic Compile-time Parallelization of Logic Programs for Independent And-parallelism. In Proceedings of the Seventh International Conference on Logic Programming, pages 221–237. MIT Press, June 1990.

[124] L. Naish. Negation and Control in Prolog. Lecture Notes in Computer Science 238. Springer-Verlag, 1985.

[125] L. Naish. Parallelizing NU-Prolog. In Proceedings of the Fifth International Conference and Symposium on Logic Programming, pages 1546–1564. MIT Press, August 1988.

[126] Z. Nemeth and P. Kacsuk. Experiments with Binding Schemes in LOGFLOW. In Proceedings of Europar 1998, Southampton, UK, 1998.

[127] I. W. Olthof. An Optimistic AND-Parallel Prolog Implementation. Master's thesis, Department of Computer Science, University of Calgary, 1991.

[128] T. Ozawa, A. Hosoi, and A. Hattori. Generation Type Garbage Collection for Parallel Logic Languages. In Proceedings of the North American Conference on Logic Programming, pages 291–305. MIT Press, October 1990.

[129] D. Palmer. The DAM: A Parallel Implementation of the AKL. Presented at the ILPS workshop on Parallel Logic Programming, October 1991.

[130] D. Palmer and L. Naish. NUA-Prolog: an Extension to the WAM for Parallel Andorra. In K. Furukawa, editor, Proceedings of the Eighth International Conference on Logic Programming. MIT Press, 1991.

[131] L. M. Pereira, L. Monteiro, J. Cunha, and J. N. Aparício. Delta Prolog: a distributed backtracking extension with events. In E. Shapiro, editor, Third International Conference on Logic Programming, London, pages 69–83. Springer-Verlag, 1986.

[132] E. Pontelli and G. Gupta. Data and-parallel logic programming in &ACE. In 7th IEEE Symposium on Parallel and Distributed Processing. IEEE Computer Society, 1995.

[133] E. Pontelli and G. Gupta. Dependent and Extended Dynamic Dependent And-parallelism in ACE. Journal of Functional and Logic Programming, (to appear).

[134] E. Pontelli, G. Gupta, and M. Hermenegildo. &ACE: A High-Performance Parallel Prolog System. In International Parallel Processing Symposium. IEEE Computer Society Technical Committee on Parallel Processing, IEEE Computer Society, April 1995.

[135] E. Pontelli, G. Gupta, M. Hermenegildo, M. Carro, and D. Tang. Efficient Implementation of And-Parallel Logic Programming Systems. Computer Languages, 22(2/3), 1996.

[136] K. Popov. A Parallel Abstract Machine for the Thread-Based Concurrent Language Oz. In 1997 Post ILPS Workshop on Parallelism and Implementation Technology for (Constraint) Logic Programming, 1997.

[137] I. V. Ramakrishnan, P. Rao, K. Sagonas, T. Swift, and D. S. Warren. Efficient Tabling Mechanisms for Logic Programs. In L. Sterling, editor, Proceedings of the 12th International Conference on Logic Programming, pages 687–711, Tokyo, Japan, June 1995. The MIT Press.

[138] B. Ramkumar and L. Kale. Compiled Execution of the Reduce-OR Process Model on Multiprocessors. In Proceedings of the North American Conference on Logic Programming, pages 313–331. MIT Press, October 1989.

[139] R. Rocha, F. Silva, and V. Santos Costa. On Applying Or-Parallelism to Tabled Evaluations. In Post-ICLP'97 Workshop on Tabling in Logic Programming, Leuven, Belgium, July 1997. Also published as Technical Report DCC-97-2, DCC - FC & LIACC, Universidade do Porto, April 1997.

[140] R. Rocha, F. Silva, and V. Santos Costa. YapOr: an Or-Parallel Prolog System based on Environment Copying. Technical Report DCC-97-14, DCC - FC & LIACC, Universidade do Porto, December 1997. (Submitted for publication.)

[141] K. Rokusawa, A. Nakase, and T. Chikayama. Distributed memory implementation of KLIC. New Generation Computing, 14(3):261–280, 1996.

[142] V. Santos Costa and R. Bianchini. Optimising Parallel Logic Programming Systems for Scalable Machines. In Proceedings of Europar 1998, Southampton, UK, 1998.

[143] V. Santos Costa, R. Bianchini, and I. C. Dutra. Parallel Logic Programming Systems on Scalable Multiprocessors. In Proceedings of the 2nd International Symposium on Parallel Symbolic Computation, PASCO'97, pages 58–67, July 1997.

[144] V. Santos Costa, M. E. Correia, and F. Silva. Performance of Sparse Binding Arrays for Or-Parallelism. In Proceedings of the VIII Brazilian Symposium on Computer Architecture and High Performance Processing – SBAC-PAD, August 1996.

[145] V. Santos Costa, D. H. D. Warren, and R. Yang. Andorra-I: A Parallel Prolog System that Transparently Exploits both And- and Or-Parallelism. In Third ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming PPOPP, pages 83–93. ACM Press, April 1991. SIGPLAN Notices vol 26(7), July 1991.

[146] V. Santos Costa, D. H. D. Warren, and R. Yang. The Andorra-I Engine: A parallel implementation of the Basic Andorra model. In Proceedings of the Eighth International Conference on Logic Programming, pages 825–839. MIT Press, June 1991.

[147] V. Santos Costa, D. H. D. Warren, and R. Yang. The Andorra-I Preprocessor: Supporting full Prolog on the Basic Andorra model. In Proceedings of the Eighth International Conference on Logic Programming, pages 443–456. MIT Press, June 1991.

[148] V. Santos Costa, D. H. D. Warren, and R. Yang. Andorra-I Compilation. New Generation Computing, 14(1), 1996.

[149] V. A. Saraswat. Partial Correctness Semantics for CP[↓,|,&,;]. In Proceedings of the Foundations of Software Technology and Theoretical Computer Science Conference, pages 347–368, December 1985.

[150] V. A. Saraswat, K. Kahn, and J. Levy. Janus: A step towards distributed constraint programming. In S. Debray and M. Hermenegildo, editors, Proceedings of the 1990 North American Conference on Logic Programming, pages 431–446, Cambridge, Massachusetts, 1990. MIT Press.

[151] M. Sato and A. Goto. Evaluation of the KL1 Parallel System on a Shared Memory Multiprocessor. In IFIP Working Conference on Parallel Processing, pages 305–318. Pisa, North Holland, May 1988.

[152] M. Sato, H. Shimizu, A. Matsumoto, K. Rokusawa, and A. Goto. KL1 Execution Model for PIM Cluster with Shared Memory. In J.-L. Lassez, editor, Proceedings of the Fourth International Conference on Logic Programming, MIT Press Series in Logic Programming, pages 338–355. University of Melbourne, MIT Press, May 1987.

[153] E. Shapiro. A Subset of Concurrent Prolog and Its Interpreter. In E. Shapiro, editor, Concurrent Prolog: Collected Papers, pages 27–83. MIT Press, Cambridge MA, 1987.

[154] E. Shapiro. Concurrent Prolog: Collected Papers. MIT Press, 1987.

[155] E. Shapiro. The family of Concurrent Logic Programming Languages. ACM Computing Surveys, 21(3):412–510, 1989.

[156] K. Shen. Studies of AND/OR Parallelism in Prolog. PhD thesis, University of Cambridge, 1992.

[157] K. Shen. Initial Results from the Parallel Implementation of DASWAM. In M. Maher, editor, Proceedings of the 1996 Joint International Conference and Symposium on Logic Programming. The MIT Press, 1996.

[158] K. Shen. Overview of DASWAM: Exploitation of Dependent And-parallelism. The Journal of Logic Programming, 29(1–3), 1996.

[159] K. Shen. A New Implementation Scheme for Combining And/Or Parallelism. In 1997 Post ILPS Workshop on Parallelism and Implementation Technology for (Constraint) Logic Programming, 1997.

[160] F. M. A. Silva. Implementations of Logic Programming Systems, chapter Or-Parallelism on Distributed Shared Memory Architectures. Kluwer Academic Pub., 1994.

[161] R. Sindaha. The Dharma Scheduler – Definitive Scheduling in Aurora on Multiprocessors Architecture. In PDP '92, pages 296–303. IEEE, November 1992.

[162] G. Smolka. The Oz programming model. In J. van Leeuwen, editor, Computer Science Today, Lecture Notes in Computer Science, vol. 1000, pages 324–343. Springer-Verlag, Berlin, 1995.

[163] Z. Somogyi, F. Henderson, and T. Conway. The execution algorithm of Mercury, an efficient purely declarative logic programming language. The Journal of Logic Programming, 1-3, October 1996.

[164] Z. Somogyi, K. Ramamohanarao, and J. Vaghani. A Stream AND-Parallel Execution Algorithm with Backtracking. In R. A. Kowalski and K. A. Bowen, editors, Logic Programming: Proceedings of the Fifth International Conference and Symposium, Volume 2, pages 1142–1159. The MIT Press, 1988.

[165] R. M. Stallman. Using and porting gcc. Technical report, The Free Software Foundation, 1993.

[166] P. Szeredi. Performance analysis of the Aurora or-parallel Prolog system. In Proceedings of the North American Conference on Logic Programming, pages 713–732. MIT Press, October 1989.

[167] P. Szeredi. Using Dynamic Predicates in an Or-Parallel Prolog System. In Logic Programming: Proceedings of the International Logic Programming Symposium, pages 355–371. MIT Press, October 1991.

[168] P. Szeredi, M. Carlsson, and R. Yang. Interfacing Engines and Schedulers in OR-Parallel Prolog Systems. In PARLE91: Conference on Parallel Architectures and Languages Europe, volume 2, pages 439–453. Springer Verlag, June 1991.

[169] P. Szeredi and Z. Farkas. Handling large knowledge bases in parallel Prolog. In Workshop on High Performance Logic Programming Systems, European Summer School on Logic, Language, and Information, August 1996.

[170] A. Takeuchi. Parallel Logic Programming. PhD thesis, University of Tokyo, July 1990.

[171] P. Tarau. An Efficient Specialization of the WAM for Continuation Passing Binary Programs. In Proceedings of the 1993 ILPS Conference, Vancouver, Canada, 1993. MIT Press. Poster.

[172] P. Tarau, K. de Bosschere, and B. Demoen. Partial translation: towards a portable and efficient Prolog implementation technology. The Journal of Logic Programming, 1-3, October 1996.

[173] A. Taylor. Removal of Dereferencing and Trailing in Prolog Compilation. In Proceedings of the Sixth International Conference on Logic Programming, pages 49–60. MIT Press, June 1989.

[174] A. Taylor. LIPS on a MIPS: Results from a Prolog Compiler for a RISC. In Proceedings of the Seventh International Conference on Logic Programming, pages 174–185. MIT Press, June 1990.

[175] A. Taylor. Parma–bridging the performance gap between imperative and logic programming. The Journal of Logic Programming, 1-3, October 1996.

[176] H. Tebra. Optimistic And-Parallelism in Prolog. In PARLE: Parallel Architectures and Languages Europe, Volume II, pages 420–431. Springer-Verlag, 1987. Published as Lecture Notes in Computer Science 259.

[177] E. Tick. Parallel Logic Programming. MIT Press, 1991.

[178] E. Tick and C. Banerjee. Performance evaluation of Monaco compiler and runtime kernel. In ICLP93, pages 757–773, 1993.

[179] E. Tick and J. A. Crammond. Comparison of Two Shared-Memory Emulators for Flat Committed-Choice Logic Programs. In International Conference on Parallel Processing, volume 2, pages 236–242, Penn State, August 1990.

[180] E. Tick and M. Korsloot. Determinacy testing for nondeterminate logic programming languages. ACM TOPLAS, 16(1):3–34, January 1994.

[181] E. Tick and G. Succi, editors. Implementations of Logic Programming Systems. Kluwer Academic Pub., 1994.

[182] K. Ueda. Guarded Horn Clauses. In E. Shapiro, editor, Concurrent Prolog: Collected Papers, pages 140–156. MIT Press, Cambridge MA, 1987.

[183] K. Ueda and T. Chikayama. Design of the Kernel Language for the Parallel Inference Machine. The Computer Journal, December 1990.

[184] M. H. van Emden and G. J. de Lucena Filho. Predicate Logic as a Language for Parallel Programming. In K. L. Clark and S. A. Tarnlund, editors, Logic Programming, pages 189–198. Academic Press, London, 1982.

[185] P. Van Roy. Can Logic Programming Execute as Fast as Imperative Programming? PhD thesis, University of California at Berkeley, November 1990.

[186] P. Van Roy. 1983-1993: The Wonder Years of Sequential Prolog Implementation. The Journal of Logic Programming, 19/20, May/July 1994.

[187] D. H. D. Warren. Implementing Prolog - Compiling Predicate Logic Programs. Technical Report 39 and 40, Department of Artificial Intelligence, University of Edinburgh, 1977.

[188] D. H. D. Warren. An Abstract Prolog Instruction Set. Technical Note 309, SRI International, 1983.

[189] D. H. D. Warren. The SRI model for or-parallel execution of Prolog—abstract design and implementation issues. In Proceedings of the 1987 Symposium on Logic Programming, pages 92–102, 1987.

[190] D. H. D. Warren. The Andorra model. Presented at Gigalips Project workshop, University of Manchester, March 1988.

[191] D. H. D. Warren. Extended Andorra model. PEPMA Project workshop, University of Bristol, October 1989.

[192] D. H. D. Warren, L. M. Pereira, and F. C. N. Pereira. Prolog—The Language and its Implementation Compared with Lisp. ACM SIGPLAN Notices, 12(8):109–115, 1977.

[193] D. S. Warren. Efficient Prolog Memory Management for Flexible Control Strategies. New Generation Computing, 2:361–369, 1984.

[194] M. J. Wise. Prolog Multiprocessors. Prentice-Hall, 1986.

[195] R. Yang. P-Prolog: a Parallel Logic Programming Language. World Scientific, 1987.

[196] R. Yang, T. Beaumont, I. Dutra, V. Santos Costa, and D. H. D. Warren. Performance of the Compiler-based Andorra-I System. In Proceedings of the Tenth International Conference on Logic Programming, pages 150–166. MIT Press, June 1993.

[197] N.-F. Zhou, T. Takagi, and K. Ushijima. A Matching Tree Oriented Abstract Machine for Prolog. In D. Warren and P. Szeredi, editors, Proceedings of the Seventh International Conference on Logic Programming, pages 158–173. MIT Press, 1990.
