Parallelism and Implementation Technology for Logic
Programming Languages
Vítor Santos Costa
LIACC & DCC-FCUP
Universidade do Porto
4150 Porto, Portugal
Abstract
Logic programming provides a high-level view of programming where programs are fun-
damentally seen as a collection of statements that define a model of the intended problem.
Logic programming has been successfully applied to a vast number of applications, and has
been shown to be a good match for parallel computers.
This survey discusses the major issues in the implementation of logic programming
systems. We first survey the evolution of sequential implementations, since the original
Marseille Prolog implementation. Focus is then given to the WAM based techniques that
are the basis for most Prolog systems. More recent developments are presented, such as
compilation to native code.
We next survey the main issues of parallel logic programming, since the original pro-
posals for And/Or parallel systems. The article describes the major techniques used for the
shared memory systems that implement only Or-parallelism or only And-parallelism,
such as Aurora, Muse, &-Prolog, DDAS, &-ACE, PARLOG and KLIC. Last, the survey dis-
cusses recent work on combining several forms of parallelism, as in the Andorra based
languages such as Andorra-I or Penny, or in the Independent-And plus Or models, such as
the PBA, SBA, ACE, or Fire.
1 Introduction
Developments in computing have been dominated by the rise of ever more powerful hard-
ware. Processing speed and memory capacity have increased dramatically over the last
decades. Parallel computers connect together several processing units to obtain even higher
performance. Unfortunately, progress in software has been much less impressive. One rea-
son is that most programmers still rely on traditional, imperative languages, and high-level
tasks are difficult to express in an imperative language primarily concerned with how memory positions are to be updated. This low-level approach to programming is also cumbersome
when programming parallel computers, as the details of control flow can become very com-
plex, and as the best execution strategy can very much depend on a computer’s architecture
and configuration.
In contrast to the traditional programming languages, logic programming provides a
high-level view of programming. In this approach, programs are fundamentally seen as a
collection of statements that define a model of the intended problem. Questions may be
asked against this model, and can be answered by an inference system, with the aid of some
user-defined control. The combination was summarised by Kowalski [96]:

    algorithm = logic + control

Traditionally, logic programming systems are based on Horn clauses, a natural and useful
subset of First Order Logic. For Horn clauses, a simple proof algorithm, SLD resolution, pro-
vides clear operational semantics and can be implemented efficiently. The most popular logic
programming language is Prolog [39]. Throughout its history, Prolog has exemplified the use
of logic programming for applications such as artificial intelligence, database programming,
circuit design, genetic sequencing, expert systems, compilers, simulation and natural lan-
guage processing. Other logic programming languages have been successfully used in areas
such as constraint based resource allocation and optimisation problems, and on operating
system design.
Logic programming systems are also a good match for parallel computers. As different
execution schemes may be used for the same logic program, forms of program execution can
be developed to best exploit the advantages of the parallel architecture used. This means that
parallelism in logic programs can be exploited implicitly, and that the programmer can be
left free to concentrate on the logic of the program and on the control information necessary
to obtain efficient algorithms.
1.1 Organisation of the Survey
In this article we survey work on the sequential and parallel implementation of logic programming. We describe the major issues in implementing sequential and parallel logic programming systems, including compilation techniques, abstract machine implementation, and
performance evaluation. The first part gives an overview of terminology and basic concepts
of logic programming (section 2), discusses how they were applied in the Prolog language
(section 3), and presents some of the major extensions to Prolog (section 4). The second
part (section 5) discusses the sequential implementation of Prolog and other logic program-
ming languages, since the original Marseille implementation. Focus is then given to the
WAM based techniques that are the basis for most Prolog systems. Last, we briefly survey
the more recent developments, such as native code compilation. The third part discusses
the implementation of parallelism since the original proposals for And/Or parallel systems.
The general concepts are given in section 6. Section 7 discusses the major issues in Or-
parallelism, section 8 the problems with And-parallelism and some combined models, and
last section 9 discusses the issues arising from the implementation of the Andorra models.
The survey terminates with pointers to further reading in the area (section 10) and with
some conclusions (section 11).
2 Logic Programming
Logic programs manipulate terms. A term is either a logical variable, or a constant, or a
compound term. Constants are elementary objects, and include symbols and numbers. Log-
ical variables are terms that can be attributed values or bindings. This process is known as
instantiation or binding. Logical variables can be seen as referring to an initially unspecified
object. Hence, variables can be given a definite value (or bound) only once. Several variables
can also be made to share the same value, that is a variable may be instantiated to another
variable.
Compound terms are structured data objects. Compound terms comprise a functor (called
the principal functor of the term) and a sequence of one or more terms, the arguments. A
functor is characterised by its name and by its arity, or number of arguments. The form name/arity is used to refer to a functor. In the Edinburgh syntax [32], terms are written as f(T1, ..., Tn), where f is the name of the principal functor and the Ti the arguments. A very common term is the compound term '.'(Head, Tail), written as [Head|Tail], usually called the pair or the list constructor. The Edinburgh syntax also allows some functors to be written as operators. For instance, the term '+'(1,2) can also be written as 1+2.
A term is said to be ground, or fully instantiated, if it does not contain any variables. We define the size of a term to be one if the term is a constant or a variable, and one plus the sum of the sizes of the arguments if the term is a compound term.
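To make the definition concrete, here is a small Python sketch (our illustration only; the tuple representation and the Var class are assumptions, not part of any Prolog system):

```python
# A minimal sketch: constants are strings/numbers, a logical variable is
# a Var object, and a compound term is a tuple (functor, arg1, ..., argn).

class Var:
    """A logical variable; 'ref' holds its binding, or None while free."""
    def __init__(self, name):
        self.name = name
        self.ref = None

def size(term):
    # constants and variables have size one
    if not isinstance(term, tuple):
        return 1
    # a compound term: one plus the sum of the sizes of its arguments
    return 1 + sum(size(arg) for arg in term[1:])

# size of f(a, g(X, b)), represented as ('f', 'a', ('g', Var('X'), 'b'))
print(size(('f', 'a', ('g', Var('X'), 'b'))))   # prints 5
```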
We can now define Horn clauses. Horn clauses are terms of the form:

    H :- G1, ..., Gn.

where H is the head of the clause, and G1, ..., Gn form the body of the clause. The body of a clause is a conjunction of goals. Goals are terms, either compound terms or constants. Goals
are distinguished from other terms only by the context in which they appear in the programs.
The head consists of a single goal or is empty. If the head of a clause is empty, the clause
is called a query. If the head is a single goal but the body is empty, a clause is named a unit
clause, or fact. If the head is a goal and the body is non-empty, the clause is called a non-unit
clause, or rule. Logic programs consist of clauses. A sequence of clauses whose head goals have the same functor forms a procedure.
One advantage of Horn clauses is that complete, and easy to implement, proof mecha-
nisms are available. Traditionally, the resolution rule is used by these mechanisms. Given
two clauses, resolution creates a new clause that is obtained by matching a negated goal of a
clause to a non-negated goal of another clause. Consider the two clauses:G00 : �A;B:: �G0; C:The resolution rule will use unification to match the goals G0 and G00 to obtain a new clause,in this case :� A;B;C. If variables appear in the clauses, then the resolution process will ob-tain the most general unifier,mgu, for the goals that are being matched. For logic programs,
the mgu is unique if it exists. If it does not exist, the resolution rule fails.
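The unification algorithm itself is easy to sketch. The Python fragment below is an illustration only (the string-based representation of variables is our assumption); like most Prolog systems, it omits the occurs check:

```python
# A small sketch of unification over terms.  Variables are strings that
# start with an upper-case letter, constants are other strings, and
# compound terms are tuples ('functor', arg1, ..., argn).

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # dereference a variable through the substitution s
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(t1, t2, s=None):
    """Return the mgu extending s, or None if unification fails."""
    s = dict(s or {})
    stack = [(t1, t2)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        elif is_var(a):
            s[a] = b                         # bind variable a to b
        elif is_var(b):
            s[b] = a
        elif isinstance(a, tuple) and isinstance(b, tuple) \
                and a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))  # unify arguments pairwise
        else:
            return None                      # functor clash: no mgu
    return s

# mgu of f(X, b) and f(a, Y): X is bound to a and Y to b
print(sorted(unify(('f', 'X', 'b'), ('f', 'a', 'Y')).items()))
```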
The resolution rule can be used in a top-down or bottom-up fashion. Top-down systems
start from an initial query. This query is matched against a clause for the corresponding
predicate, and a new goal is launched according to some selection function. For Horn clauses,
one useful top-down form of resolution is SLD-resolution (or LUSH resolution [83]). In this
method, a query is matched against a clause, and generates a new query (or resolvent) built
from the remainder of the initial query and the body of the matching clause. This process goes
on recursively until either some goal has no matching clause, or until an empty query is
generated.
There is a simple and intuitive reading to SLD-resolution. Referring to the previous
clauses, G' :- A, B can be interpreted as part of the definition of a procedure, and the query :- G'', C as a set of goals to execute, or satisfy. SLD-resolution operates by selecting one goal of the query and calling a corresponding procedure. To satisfy this goal, some new goals need
to be satisfied, hence the new goals are added to the query. The process is repeated until all
the goals have been executed.
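This procedural reading can be made concrete with a small interpreter. The Python sketch below is an illustration, not a real Prolog engine; it implements SLD-resolution with Prolog's leftmost selection function and depth-first search over a tiny example program:

```python
# Terms are tuples ('functor', args...), variables are upper-case
# strings, and a clause is a pair (head, [body goals]).
import itertools

def is_var(t): return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s: t = s[t]
    return t

def unify(a, b, s):
    s = dict(s); stack = [(a, b)]
    while stack:
        a, b = stack.pop(); a, b = walk(a, s), walk(b, s)
        if a == b: continue
        if is_var(a): s[a] = b
        elif is_var(b): s[b] = a
        elif isinstance(a, tuple) and isinstance(b, tuple) and \
                a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))
        else: return None
    return s

counter = itertools.count()

def rename(t, suffix):
    # give clause variables fresh names at each resolution step
    if is_var(t): return t + suffix
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(a, suffix) for a in t[1:])
    return t

def solve(goals, program, s):
    """Yield answer substitutions, depth-first and left-to-right."""
    if not goals:
        yield s; return
    goal, rest = goals[0], goals[1:]        # leftmost selection function
    for head, body in program:
        suffix = '_%d' % next(counter)
        head = rename(head, suffix)
        body = [rename(g, suffix) for g in body]
        s1 = unify(goal, head, s)
        if s1 is not None:                  # resolvent: body ++ rest
            yield from solve(body + rest, program, s1)

program = [(('a', '1'), []), (('a', '2'), []), (('b', '2'), [])]
query = [('a', 'X'), ('b', 'X')]
for answer in solve(query, program, {}):
    print(walk('X', answer))                # prints 2
```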
SLD-resolution does not specify which goal in the query should be selected. This is the
province of the selection function. Moreover, several clauses may match a goal, hence there
might be several ways to search for a solution. For a particular selection function, an SLD-
tree represents all the possible ways to solve a query from a program, that is, the search-space
of the program. It is important to remark that by changing the selection function, one can
change the search space. Consider this very small program:
a(1). a(2).
b(2).
Figure 1 shows the search trees corresponding to two different selection functions applied in
the execution of the query :- a(X), b(X). The first function selects a(X) first, and needs
to consider the two clauses for a/1. The second selects b(X) first and hence only has a single
matching clause for a(X).
There may be several strategies for exploring a search-space. A search rule describes
which alternative branches should be selected first. Search rules do not affect the search
space, but they can affect how quickly one will reach the first solution (if at all).
[Figure 1 shows the two search-trees. With the leftmost selection function (1), the root :- a(X), b(X). has two children, :- b(1)., which fails, and :- b(2)., which succeeds. With the rightmost selection function (2), the root :- a(X), b(X). has the single child :- a(2)., which succeeds.]

Figure 1: Different Search-Trees for the Same Query
3 The Prolog Language
Prolog was invented in Marseille by Colmerauer and his team [38]. Prolog systems apply
SLD-resolution, but with some simplifications. Prolog uses a fixed selection function: the
leftmost goal is always selected first. The search rule of Prolog is also quite simple: Prolog
simply explores the tree in a depth-first left-to-right manner. Whenever several alternatives
for a goal are available, Prolog simply tries the first alternative, following the textual order in
the program. When an alternative fails, Prolog backtracks to the last place with an unexplored alternative (that is, it restores the state of the computation to what it was at that point) and tries the first remaining alternative.
In Prolog, programs automatically give control information through the ordering of goals
in the body of a clause and of the clauses in the definition of a procedure. The ordering
of body goals gives control information for the selection function, whereas the ordering of
clauses gives control information for the search rule. Prolog also includes control operators.
A large number of built-in predicates provide extra control, Input/Output, database operations, arithmetic, term comparison, meta-logical operations, and set operations. Note that
actual features vary between different Prolog systems. ISO supported the development of
a standard for basic functionality in Prolog [55]. In practice, most implementations do not
fully adhere to the standard, and all have extensions. Readers should refer to the specific
Prolog manuals, such as SICStus Prolog’s [6], ECLiPSe’s [1], or YAP’s [48] for the ultimate
information on what is available on a specific system.
Note that the use of many built-ins relies on prior knowledge of Prolog execution. For
example, a typical top-level clause for a program might look like this:
top_level :-
read(Query),
solve(Query, Solution),
write(Solution).
The correct execution of the built-ins read/1 and write/1 implicitly assumes left-to-right
execution.
4 Other Logic Programming Languages
One of the most serious criticisms of Prolog is that its selection function is too restrictive. From the beginning, authors such as Kowalski [96] remarked on the effect that solving goals in different orders has on the size of the search space. A more flexible
execution than the one used by Prolog can be obtained through coroutining. Coroutines
cooperatively produce and consume data [96], allowing for data-driven execution. Several
designs for coroutining became very influential in the logic programming community. IC-
Prolog, designed by Clark and others [30], was one of the first logic programming languages
to support delaying of goals until data is available. Colmerauer’s Prolog-II [36] supported
geler, a built-in that would delay a goal until a variable was instantiated. Naish’s MU-Prolog and NU-Prolog [124] support wait, a form of declaring that a goal should only execute if certain arguments are instantiated. Features similar to geler and wait are now common in modern logic programming systems.
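The operational idea behind geler and wait can be sketched outside Prolog. The toy Python scheduler below is our illustration only (the goal representation is invented for the example): a goal that waits on an unbound variable suspends, and resumes once a producer binds that variable.

```python
from collections import deque

def run(goals, store):
    """Round-robin scheduler: each goal is (wait_var, action).
    A goal whose wait_var is unbound is suspended and retried later."""
    queue = deque(goals)
    stalls = 0
    while queue and stalls < len(queue):
        wait_var, action = queue.popleft()
        if wait_var is not None and wait_var not in store:
            queue.append((wait_var, action))  # suspend: data not ready
            stalls += 1
        else:
            action(store)                     # run: may bind variables
            stalls = 0
    return store

# the consumer waits on X (like geler(X, Goal) or a wait declaration);
# the producer binds X, which wakes the consumer
goals = [
    ('X', lambda s: s.__setitem__('Y', s['X'] + 1)),  # Y is X+1, needs X
    (None, lambda s: s.__setitem__('X', 41)),         # binds X to 41
]
print(run(goals, {})['Y'])   # prints 42
```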
Coroutining allows more flexible execution of goals. One can go one step further, and
associate specific rules with variables. Whereas in traditional logic programming variables
are associated with terms, in these novel frameworks variables may also be associated with
real and rational numbers, intervals, booleans, lists, and so on. A special class of goals, or
constraints, manipulates these variables.
The concept of constraint predates logic programming, but constraint logic programming has been shown to be a very effective way of applying constraints. Initial work originates from the Marseille group’s Prolog-III [37], Jaffar and others’ CLP framework of languages [85] and the ensuing CLP(R) system, and ECRC’s CHIP [56]. There has been very intense research in the area since. We refer the reader to Marriott and Stuckey [111] for a good introduction to
this rapidly expanding field.
4.1 The Committed-Choice Languages
Research into coroutining led some authors to completely abandon Prolog’s left-to-right
execution. The committed-choice, or concurrent, logic programming languages [155] are a
family of logic programming languages that use the process reading [184] of logic programs.
In this reading, each goal is viewed as a process, and the computation as a whole as a network of concurrent processes, with interconnections specified by the shared logical variables. The
process reading of programs is most useful to build reactive systems, which contrast with
transformational systems in that their purpose is to interact with their environment in some
way, and not necessarily to obtain an answer to a problem. Examples of reactive systems are
operating systems and database management systems.
Initial research on these languages started in the early to mid eighties with Clark, Gre-
gory and others’ IC-Prolog [30] and then Parlog [29], and with Shapiro and others’ Con-
current Prolog [153]. At the time, the Japanese Government was starting an ambitious
research project on developing the Japanese hardware and software industry, the Fifth Gen-
eration Computing Systems Project (FGCS). Shapiro was influential in persuading Japanese
researchers to use committed choice languages as a basis for this project. These languages
were seen as a high-level programming tool that naturally allowed the exploitation of concurrency and parallelism. Ueda’s GHC [182], later simplified to KL1 [183], was the basis
for FGCS work. The FGCS project had huge impact outside Japan, and both the American
and the European governments supported alternative research on sequential and parallel
committed choice languages and on traditional Prolog systems.
One major difference between Prolog and the committed choice languages is that clauses
are guarded. The head and usually some goals in the body of a clause are tested before execution. If they are satisfied, one says that a goal can commit to the clause. If the goal does commit, the remaining clauses are discarded, even if they would match the goal. This simplifies both semantics and implementation. A further simplification resulted in the flat committed
languages where one only allows a few built-in goals, such as tests or arithmetic conditions,
in the guard.
Research in these languages was very intense during the eighties. Quite a few committed
choice languages and dialects have been developed. Rather sophisticated applications were
also developed, especially within the FGCS project. Work is still going on in the design and application of committed choice languages, but many researchers in the area moved towards
applying their framework outside logic programming, particularly as a basis for coordination
languages in parallel and distributed systems. An initial example is the commercial Strand
system [61].
4.2 Andorra
The decision to support a single solution simplifies the design and implementation of these
languages. Arguably, nondeterminism in choosing clauses is sufficient for most reactive sys-
tems, and indeed the committed-choice languages have been used successfully to implement
complex applications such as operating systems kernels or compilers [154]. On the other
hand, search programs that can be coded easily and naturally in Prolog are much more
awkward to write in these languages [177]. Several authors proposed languages that al-
lowed non-deterministic procedures in a committed choice environment, such as Saraswat’s
CP[↓,|,&,;] [149], Yang’s P-Prolog [195], and Takeuchi’s ANDOR-II [170]. Starting from the opposite direction, Naish’s PNU-Prolog [125] parallelised the implementation of NU-Prolog
by allowing the execution of deterministic computations in parallel.
Yang’s work was an important influence on David Warren’s Basic Andorra Model. This model follows the Andorra Principle:

- Goals can execute in And-parallel, provided they are determinate;

- If no (selectable) goals are determinate, we can select one nondeterminate goal, and explore its alternatives, possibly in Or-parallel.
The model could be used to parallelise Prolog programs, or to design new languages.
Work on Prolog parallelisation was mainly pursued at Bristol and resulted in the Andorra-I
system [145]. Several other groups proposed novel languages based on this model, such as
Haridi’s Andorra Prolog [74], developed at SICS, and Bahgat’s Pandora [8], from Imperial
College.
One problem with Andorra systems is the notion of determinacy. Systems such as Andorra-I follow a strict definition of determinacy, where one considers a goal to be determinate if head unification and built-in execution will succeed for at most one clause. The definition was later extended to handle pruning operators.
Research on more ambitious forms of determinacy eventually led to the Extended Andorra Model, or EAM, where one can parallelise non-determinate computations as long as they do
not bind external variables. Warren was interested in the EAM as a way to exploit all forms
of implicit parallelism in logic programs [191]. In contrast, Haridi, Janson, and colleagues
at SICS were interested in developing new programming frameworks. They eventually pro-
posed a new language, the Andorra Kernel Language, AKL [86], a general concurrent logic
programming language based on Kernel Andorra Prolog [75]. Several parallel implementa-
tions of AKL were designed and implemented [129, 120, 118].
More recently, Smolka’s Oz language [162], developed at DFKI in Germany, provides the main features of AKL, generalised to a wider framework that encompasses functional and object-oriented programming. The researchers involved in AKL have since moved on to
work with Oz.
5 Implementation of Logic Programming Languages
The Prolog language adapts well to conventional computer architectures. The selection func-
tion and search rule are simple operations, and the fact that Prolog only uses terms means
that the state of the computation can be coded quite efficiently.
5.1 The Beginnings
The original Marseille Prolog [38] system was an experimental interpreter, written in Algol-
W by Philippe Roussel. A second interpreter was written in Fortran by Battani and col-
leagues. The system used structure sharing to represent Prolog terms. In this represen-
tation terms are represented as pairs, one containing the fixed structure of the term, the
skeleton, the other containing the free variables of the term. Unification proceeds by compar-
ing the skeletons and assigning variables in the environments. Each goal was represented
by a record, that included both the data necessary to represent the execution of a matching
clause, and the data needed to backtrack. System built-ins included the basic functionality
available in modern Prolog systems.
Warren’s DEC-10 Prolog system [187] was the first compiled Prolog system, developed by Warren, F. Pereira and L. M. Pereira. The system showed good performance, comparable to the existing Lisp systems [192]. It included a separate stack to store
terms. Control was still represented by activations. Mode declarations were used to simplify
compilation.
The DEC-10 Prolog system became very popular. It is the reference for the “Edinburgh
syntax” that is still followed by most Prolog systems. The efficiency of this system was also
influential in the decision of the Japanese to use logic programming for their Fifth Genera-
tion Project.
5.2 The WAM
The basis for most of the current implementations of logic programming languages is the
Warren Abstract Machine [188], or WAM, an “abstract machine” useful as a target for the
compilation of Prolog because it can be implemented very efficiently in most conventional
architectures. The WAM was developed out of the interest in having a hardware imple-
mentation of Prolog. Warren presented a set of registers, stacks and instructions that could
be efficiently implemented by specialised hardware. In practice, most WAM-based systems
implement such a machine in software, through an emulator.
The WAM represents Prolog terms as groups of cells, where a cell can be either a value,
such as a constant, or a pointer. Variables are represented as single cells. Free variables are represented as pointers that point to themselves. Bound variables can simply receive the value they are assigned, if the value fits the cell size, or are made to point to the term they are
bound to. The WAM uses a copy representation for compound terms. In this representation
a compound term is represented as a set of cells, where the first cell represents the main
functor, and the other cells represent the arguments. Unification first compares the two
main functors and then is called recursively for each argument. Note that with copying, terms must be constructed from scratch, whereas with structure sharing different terms can share the
same skeleton. Both sharing and copying have advantages and disadvantages, but Warren
argues that copying gives easier compilation and better locality [188].
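The cell representation can be sketched in a few lines of Python. This is a simplified illustration only: a real WAM builds the argument cells of a compound term inline on the heap, whereas this sketch uses reference cells for the arguments.

```python
# Each heap cell is a (tag, value) pair: REF cells are variables (a free
# variable is a REF pointing to its own address), CON cells hold
# constants, and a STR cell points to a functor cell (name and arity)
# that is followed by the argument cells.

REF, CON, FUN, STR = 'REF', 'CON', 'FUN', 'STR'
heap = []                                # the global stack

def new_var():
    heap.append((REF, len(heap)))        # a free variable points to itself
    return len(heap) - 1

def new_con(c):
    heap.append((CON, c))
    return len(heap) - 1

def new_str(name, args):
    f = len(heap)
    heap.append((FUN, (name, len(args))))   # functor cell: name and arity
    for a in args:
        heap.append((REF, a))               # argument cells reference args
    heap.append((STR, f))
    return len(heap) - 1

def deref(a):
    while heap[a][0] == REF and heap[a][1] != a:
        a = heap[a][1]                   # follow the reference chain
    return a

def bind(a, b):
    heap[a] = (REF, b)   # in a real WAM the binding may also be trailed

def unify(a, b):
    stack = [(a, b)]
    while stack:
        a, b = stack.pop()
        a, b = deref(a), deref(b)
        if a == b:
            continue
        (ta, va), (tb, vb) = heap[a], heap[b]
        if ta == REF:
            bind(a, b)
        elif tb == REF:
            bind(b, a)
        elif ta == CON and tb == CON and va == vb:
            continue
        elif ta == STR and tb == STR and heap[va] == heap[vb]:
            arity = heap[va][1][1]       # same functor: unify arguments
            stack.extend((va + 1 + i, vb + 1 + i) for i in range(arity))
        else:
            return False                 # failure: incompatible terms
    return True

# unify f(X, b) with f(a, Y): X is bound to a and Y to b
X = new_var(); t1 = new_str('f', [X, new_con('b')])
Y = new_var(); t2 = new_str('f', [new_con('a'), Y])
print(unify(t1, t2), heap[deref(X)], heap[deref(Y)])
```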
The WAM was designed as a register based architecture. Arguments are passed through
the A registers. These registers also double as temporary registers, known as X registers.
Several other registers control the execution stacks:

- The Environment Stack tracks the flow of control in the program. Each environment frame represents a clause, and maintains the point where to return after executing the clause, plus the variables that are shared between goals in the clause. The E register points to the current active environment.

- The Choice-Point Stack stores open alternatives. Each choice-point frame records the current values of the abstract machine registers when an alternative was taken. The B register points to the active choice-point, which is always the last.

- The Global Stack, or Heap, was inherited from the DEC-10 Prolog abstract machine. It stores compound terms and variables that cannot be stored in the environment stack. The H register points to the top of this stack.

- The Trail stores conditional bindings, that is, bindings to variables that need to be undone on backtracking. In the WAM bindings can be undone by simply resetting the variable. The TR register points to the top of this stack.
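The role of the trail can be illustrated with a few lines of Python. This is a deliberate simplification (our illustration): a real WAM trails only conditional bindings, whereas this sketch trails every binding.

```python
# Bindings made since a choice-point are recorded on the trail, and
# backtracking undoes them by resetting the variables.

store = {}     # variable -> value (a binding)
trail = []     # names of variables bound since the start

def bind(var, value):
    store[var] = value
    trail.append(var)            # remember the binding

def choice_point():
    return len(trail)            # save the top of the trail (like TR)

def backtrack(tr_mark):
    while len(trail) > tr_mark:  # reset every variable bound since
        del store[trail.pop()]   # the choice-point was created

bind('X', 1)
tr = choice_point()              # try the first alternative
bind('Y', 2); bind('Z', 3)
backtrack(tr)                    # it failed: undo Y and Z, keep X
print(sorted(store))             # prints ['X']
```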
[Figure 2 depicts the four WAM stacks (Trail, Choice-Point Stack, Environment Stack, and Heap), with the registers TR, B, E, H, S, and HB pointing into them.]

Figure 2: WAM Stacks
Figure 2 gives an overview of the stacks used by the WAM. The figure mentions other
important WAM registers. The HB and EB registers record the tops of the heap and of the environment stack at the last choice-point; their values could also be obtained from the choice-point pointed to by the B register. The S register is used when unifying compound terms, and always points into the global stack.
Systems that implement the WAM compile programs as sequences of abstract machine
instructions. To give the reader a flavour of what to expect from WAM code, we give a simple
example of the code for the naive reverse procedure:
nrev([], []).
nrev([H|T], R) :-
nrev(T, R0),
conc(R0, [H], R).
The WAM code for this procedure is shown in Table 1. The code shows examples of the four
     switch_on_term CV1,Cc,Cl,fail
CV1: try_me_else CV2           % nrev(
Cc:  get_constant [],A1        % [],
     get_constant [],A2        % [])
     proceed
CV2: trust_me                  % nrev(
     allocate 3
     get_list A1               % [
     unify_variable Y1         % H|
     unify_variable X1         % T],
     get_variable Y2,A2        % R) :-
     put_variable Y3,A2        % nrev(T,R0)
     call nrev/2               % ,
     put_unsafe_value Y3,A1    % conc(R0,
     put_list A2               % [
     unify_value Y1            % H|
     unify_constant []         % []],
     put_value Y2              % R)
     execute conc/3            % .

Table 1: WAM Code for Naive Reverse
different groups of WAM instructions:

- Indexing instructions choose clauses from the first argument. An example is the first instruction in the code, switch_on_term. The instruction tests the type of the first argument and jumps to different code according to whether the first argument is a variable, constant, compound term, or pair. Other indexing instructions switch on the value of constants and functors.

- Choice-Point Manipulation instructions manage choice-points. The code includes a try_me_else instruction, that creates a choice-point, and a trust_me instruction, that uses and then discards an existing choice-point. The retry_me instruction, not shown here, just uses a choice-point.

- Unification instructions implement specialised versions of the unification algorithm. The instructions are classified by position and type of argument. Head unification is performed by get instructions, sub-argument unification by unify instructions, and argument preparation for calls by put instructions. The variable instructions process the first occurrence of variables in the clause, value instructions process non-first occurrences, constant instructions process constants in the clause, list instructions process lists, and structure instructions process other compound terms.

- Control instructions manage forward execution. The allocate and deallocate instructions respectively create and destroy environments. The proceed instruction returns from a fact. The call instruction calls a non-last subgoal, and the execute instruction calls the last subgoal. Note that by using the deallocate and execute instructions the WAM can perform last-call optimisation.
The outward simplicity of the WAM hides several intricate implementation issues. Complete books, such as Aït-Kaci’s tutorial on the WAM [2], have been written on this subject.
5.3 Improving the WAM
The WAM soon became the standard technique for Prolog implementations. Several optimisation techniques have been proposed for the WAM; we next discuss a few:
Register Allocation The WAM is register based, and there is scope for optimising the allocation of temporary registers. Debray gives one of the first discussions of register allocation in the WAM [50]. More sophisticated schemes were presented by Janssen et al. [87] and by Matyska et al. [113]. A good discussion of the problem can also be found in Mats Carlsson’s thesis [21].
Compilation of Compound Terms The WAM uses a breadth-first scheme for compil-
ing compound terms and lists. Other schemes are possible. Marien et al. [109] present a
depth-to-right scheme that had also been used in previous implementations, such as YAP.
Intermediate schemes, such as the one used in SICStus Prolog [19] and Andorra-I [148] are
also possible.
Implementation of Cut The implementation of cut usually requires an extra register in the WAM. Marien and Demoen discuss the problem in conjunction with stack management [108].
Modes Unification instructions can be specialised if one knows arguments are already in-
stantiated, or if they are free. DEC-10 Prolog introduced mode annotations, where the user
can declare how he expects each argument to be used. Annotations demand extra work from the user, and in the worst case may be erroneous, resulting in incorrect execution. Mellish was the first to derive mode information automatically through global analysis [115]. He used the abstract interpretation framework, originally proposed in the context of imperative languages by the Cousots [42, 43]. This framework detects properties of programs by
executing the programs under a simplified abstraction of their original domain. The abstrac-
tion must be such that the analysis process will converge. Since Mellish’s work, abstract
interpretation has been used for several applications in logic programming.
One generalisation of modes is the case where one can detect the first usage of a variable. If we know where the variable will be used for the first time, we know the variable is unbound, and moreover, we do not need to initialise the variable. Beer [11] gives the first application of this technique, which was fundamental in the Aquarius [185] and Parma [175] native-code systems.
Types Although the Prolog language is untyped, one can try to infer types for the terms
used during program execution. This optimisation can be used to simplify code, for example
by not tagging terms. The Aquarius and Parma global analysers can detect simple types,
such as terms that are integers, floats or symbols. More sophisticated systems can detect
recursive types. A discussion on the application of type inference is given by Marien and
colleagues [110]. There is recent work on untagging by Bigot and Debray [13] and on the
inference of recursive types by Lu [105].
Memory management One of the advantages of Prolog is that one can recover space on backtracking. Still, space recovery may be required for large computations that do not fail. The landmark paper on garbage collection for the WAM is by Appleby and colleagues [7]. More recently, Demoen and colleagues have presented a copying based algorithm that is a good alternative [53].
There has been some research on using compile-time analysis to reutilise stack cells. See
Bruynooghe and colleagues on how to use global analysis, and specifically abstract inter-
pretation, for this purpose [16]. More recent work on the subject has been performed by
Mulkers, Winsborough and Bruynooghe [122] and by Debray [51].
Indexing Choice-point creation and management is one of the most expensive operations in the WAM. Warren’s indexing instructions have several problems. For instance, they only index on a single argument, and they may create several choice-points for the same call.
Extensions to the original indexing scheme have been used in most Prolog systems, such as
the one used in Prolog by BIM [107], or in SICStus Prolog [19].
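The basic mechanism can be sketched in a few lines of Python (a toy model with a hypothetical term representation, not any system's actual code): clauses are grouped by the tag of their first head argument, and when the call's first argument selects exactly one clause, no choice-point is needed.

```python
# Toy sketch of WAM-style first-argument indexing.  Terms are Python
# values: int = integer, str = atom, tuple = structure ("name/arity",
# args...).  Clauses are grouped by the tag of the first head argument.

def tag_of(term):
    if isinstance(term, int):
        return ("int", term)
    if isinstance(term, str):
        return ("atom", term)
    if isinstance(term, tuple):
        return ("struct", term[0])
    raise TypeError(term)

def build_index(clauses):
    """clauses: list of (first head argument pattern, clause id)."""
    index = {}
    for pat, cid in clauses:
        index.setdefault(tag_of(pat), []).append(cid)
    return index

# append([], Ys, Ys).  append([X|Xs], Ys, [X|Zs]) :- ...
idx = build_index([("[]", 1), (("./2", "_", "_"), 2)])
# Call append([1], ..., ...): only clause 2 matches, so no choice-point.
candidates = idx[tag_of(("./2", 1, "[]"))]
print(candidates)
```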
One can go one step further and try to minimise the number of choice points created.
Hickey and Mudambi [82] use switching trees to minimise choice-points. More recently,
Zhou et al. propose a matching tree scheme for the compilation of Prolog [197].
One possible alternative is to stay close to the original Prolog code, but to create choice-
points only when necessary. This scheme is known as shallow backtracking [20], and has been
used in several Prolog implementations, such as SICStus Prolog.
5.4 Generating Machine Code
Since the initial Prolog implementations, systems such as Prolog by BIM [107], Aquarius [185]
and Parma [173, 174] created interest in direct compilation to native code as a way to
improve performance over traditional emulators. Native-code systems avoid the overheads
incurred in emulating an abstract machine, and simplify the introduction of new optimisations [186].
5.4.1 Aquarius and Parma
The Aquarius [185] system was developed at Berkeley. It uses the Berkeley Abstract Machine
(BAM) as the target for translation from Prolog. The BAM is a low-level instruction set
designed for easy translation into RISC style architectures. The abstract machine code is
generated after the following transformations:
1. Global Analysis. This gives information on groundness and freeness of variables, and
on the size of reference chains.
2. Determinism Extraction. The goal is to replace shallow backtracking by conditional
branching. This is implemented by rewriting the predicate into a series of nested conditional
statements.
3. Type Enrichment. The idea is to generate different versions for the case where the first
argument (or the first interesting argument) is bound or unbound. It can be avoided if
global analysis gave type information.
4. Goal Reordering. Sometimes, goals can be reordered to improve performance.
5. Determinism Extraction. Last, try to generate the switch statements.
The output of these steps is still a program in a subset of Prolog. Van Roy calls this subset
Kernel Prolog.
Next, a disjunction and a clause compiler are used to generate BAM code. The disjunction
compiler handles choice-points, trying to minimise their size. The clause compiler performs
goal compilation, unification compilation and register allocation. BAM instructions are
divided into simple instructions, complex instructions, and embedded instructions. Simple
instructions are designed to support:

- comparison, data movement, address calculation, and stack manipulation;

- indexing, through switch, hash, and pair instructions;

- control, such as call or return.

The complex instructions are groups of instructions that represent common operations: deref-
erencing, trailing, unification and backtracking. Embedded instructions give information
that can help optimise code, such as pragmas.
Last, a final step implements several BAM optimisations, such as peephole optimisations
(with a 3-instruction window), dead-code elimination, jump elimination, duplicate-code elim-
ination and choice-point elimination.
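A peephole pass over a small window is simple enough to sketch directly (a toy pass with made-up instruction names, not the BAM optimiser itself): the optimiser slides a window over the instruction list and rewrites recognised patterns, here collapsing a move followed by its inverse.

```python
# Toy peephole optimiser over a 3-instruction window (instruction
# names are made up; the BAM uses its own instruction set).  One rule:
# ("move", a, b) immediately followed by ("move", b, a) keeps only the
# first move, since the second is redundant.

def peephole(code):
    out = []
    i = 0
    while i < len(code):
        window = code[i:i + 3]
        if (len(window) >= 2 and window[0][0] == "move"
                and window[1] == ("move", window[0][2], window[0][1])):
            out.append(window[0])
            i += 2          # skip the redundant inverse move
            continue
        out.append(code[i])
        i += 1
    return out

code = [("move", "r1", "r2"), ("move", "r2", "r1"), ("call", "p/2")]
print(peephole(code))  # [('move', 'r1', 'r2'), ('call', 'p/2')]
```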
17
The Parma system was independently developed by A. Taylor in Australia at about the
same time as Aquarius [175]. Many of its ideas are similar to the principles used in Aquar-
ius. Parma does use a more sophisticated abstract interpreter, and is specialised for MIPS
code. Van Roy gives a comparison of the two systems where Parma clearly out-performs
Aquarius [186]. In practice, Parma was never as widely available as Aquarius.
5.4.2 Super Monaco
Super Monaco is a system developed at the University of Oregon by Evan Tick’s group. It
compiles a subset of KL1 [178] into low-level code in the style of Parma or Aquarius. Work
on Super Monaco was influenced by the work in the compilation of CP, especially by the
decision graph compilation algorithms of Kliger and Shapiro [92, 93], and by previous work
in the compilation of KL1 and of Parlog, such as JAM [45].
In Super Monaco procedures are compiled into a decision graph. The compilation algorithm is as follows:

- The front-end generates decision graphs in the style of Kliger. The decision graph allows for very good compilation, although the graphs initially generated have an excessive number of tests, and especially of type-checking tests.

- An intermediate code is next generated, similar to what one would find in Aquarius. The initial code assumes an infinite number of abstract machine registers.

- A flow analyser builds a flow graph of basic blocks, and performs memory allocation coalescence.

- Next, common subexpressions are eliminated, and type and dereferencing information is propagated through the graph. This is fundamental to remove unnecessary instructions and branches introduced in the original graph. Dead-code removal is also performed.

- Register allocation is performed.

- Minor optimisations are performed. These include short-circuiting, branch removal, peephole optimisation, and register-move chain squashing.

- Native code is generated by using templates that convert each instruction into a sequence of native code. Templates are available for x86 and MIPS assembly instructions.
The instruction set for the intermediate language is quite complex. It includes several
optimised instructions, such as alloc, mkstruct, and mkgoal. Some instructions, such as
unify, call runtime routines.
Finally, note that Super Monaco supports And-parallelism. Although this results in a
more expensive implementation than Aquarius or Parma, the authors of the system argued
that Super Monaco's performance is close to that of C-based systems [178].
5.5 Assembly Generation from WAM style Code
The previous systems try to obtain high efficiency by translating from Prolog directly to low-
level intermediate code. One alternative possibility is to generate low-level code from inter-
mediate WAM-style code. This approach has been followed in commercial systems such as
Prolog by BIM and in SICStus Prolog. We next discuss the implementation of native code in
SICStus Prolog, as documented by Haygood [77].
SICStus Prolog included a native code compiler for SPARC and 68k machines for quite
some time. More recently, a new native code system was designed towards having a more
portable implementation [77]. The novel implementation used some of the techniques origi-
nally used in Aquarius and supports SPARC, MIPS, and 68k architectures. The new native
code system uses an abstract machine, the SAM, as the target for WAM-style code. The SAM
is then compiled either into RISS, an intermediate representation appropriate for RISC
machines, or directly into 68k-style code. The SAM is quite simple, and includes only a few
instructions: ld, st, and stz for moves, the arithmetic and bitwise logical operations, and
simple control instructions. Complex operations, such as unifications, are implemented by
jumping to a “kernel” coded in assembly language.
Most effort in the SAM is devoted to the compilation of unification for compound terms.
SAM improves on Van Roy’s two stream technique [186].
The next step is the RISS. In the RISS the number of registers is specified, immediates
are size-restricted, and control transfers have delay slots. The transformation is performed
by another step, sam riss, which binds the registers and tries to fill delay slots.
Performance analysis shows a two- to three-fold speedup with native code compilation
in SICStus. The results from the new implementation, used in SICStus Prolog v3, versus
the previous implementation, as used in SICStus v2.1, also indicate much better performance:
the speed of the new native code seems to be similar to the performance of Aquarius
without global optimisations. The implementation is also very stable,
and is now the default mode for SICStus v3 in the SPARC port.
5.6 C Code Generation
One problem with traditional native code generators is portability. Supporting a new archi-
tecture is hard. Even changes in the operating system may force the implementor to retool
the system. These problems could be avoided if one used a higher-level language, such as
C. Improvements in the quality of compiler back-ends, such as in GCC [165], also argue for
experimenting with this solution. Although the highest performance implementations still
generate assembly code, we next discuss three systems that do generate C code.
5.6.1 JC
The Janus Compiler, jc, was developed at Arizona by De Bosschere, Debray, Gudeman and
others [64]. It compiles the Janus committed-choice language into C code. The
Janus language [150] is an ask-and-tell language that simplifies traditional
committed-choice languages through the restriction that non-ground variables can
have at most one writable and one readable occurrence in a clause. This restriction enables
several simplifications to the implementation.
The jc compiler compiles a procedure into a single huge switch statement to reduce
work. The compiler assumes a set of virtual registers, some of which will be actual machine
registers. Differently from most committed-choice languages, jc uses environments for data
representation. Instead of using a code continuation pointer, environments point to the code
to execute next in the environment. Because jc compiles into a single switch statement,
predicates are represented as numbers.
The compiler starts from the predicate as a set of clauses and performs suspension-related
transformations, expression flattening, common subexpression elimination and goal reorder-
ing. No decision graph is generated.
Code generation is performed next. The jc abstract machine has four kinds of registers:
ordinary tagged registers, and untagged registers that may contain addresses, integers or
floating-point numbers. The compiler keeps values untagged whenever possible. The authors claim at
most 10 registers are needed.
The system relies on optimisations to obtain good performance. The main optimisation
is call forwarding [49]. The idea is to generate procedures with multiple entry points, so
that information specific to a call can be used to bypass tests. The effects of call forward-
ing are generalised by using jump target duplication. Finally, the compiler also includes
instruction-pair motion, which allows the removal of complementary instructions, especially
of environment allocation.
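The effect of call forwarding can be caricatured in a few lines of Python (a hedged sketch with made-up names; jc works on C code with multiple entry points, not separate functions): a procedure exposes a general entry that performs the tests, and a forwarded entry that call sites use when the tests are already known to succeed.

```python
# Sketch of call forwarding (hypothetical names, not jc's actual
# generated code).  A procedure gets two entry points: a general one
# that performs the type test, and a forwarded one that skips it.

def plus1_checked(x):
    # general entry: must test the tag first
    if not isinstance(x, int):
        raise TypeError("integer expected")
    return plus1_unchecked(x)

def plus1_unchecked(x):
    # forwarded entry: the caller already established that x is an int
    return x + 1

def caller():
    n = 41                      # statically known to be an integer,
    return plus1_unchecked(n)   # so the call is forwarded past the test

print(caller())  # 42
```

In jc the forwarded entry is a label inside the generated C procedure, so the jump bypasses the test code without any call overhead.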
The intermediate instruction set for jc is a set of simple macros that will later be com-
piled into C. Macros include Move, Assign, and quite a few versions of Jump.
The performance of the system is quite good, even versus native code implementations.
This result is explained in part by the simplicity of the language, and in part by the use of
call forwarding.
5.6.2 KLIC
The KLIC compiler [27] is a descendant of the earlier Japanese implementations of KL1, which
compiled KL1 into a WAM-like instruction set, and supported parallelism.
KLIC was designed as a portable and highly efficient implementation of KL1. The authors
argue that the advantages of compiling to C are portability, the low-level optimisations
available in the C compiler, and ease of linking with foreign languages. The problems are
costly function calls, lack of control over register allocation, provision for interrupts, and
object-code size. The solutions proposed are compiling each module into a single C function,
caching global variables as local variables, which may then be placed in machine registers,
using flags for synchronisation with interrupts, and calling runtime routines for costly operations.
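The "one module as a function" idea, together with caching global state in locals, can be caricatured as follows (a toy sketch, not KLIC's actual scheme; the count_down predicate is made up): each reduction computes the next goal, and a single driver loop dispatches on it, so the state lives in local variables that a C compiler could keep in registers.

```python
# Toy model of KLIC-style compilation (hypothetical program, not
# KLIC's generated code): predicates become cases of one dispatch
# function, and the "machine state" is held in locals of the loop,
# mirroring KLIC's caching of global variables in local variables.

def run(goal):
    heap = []                       # cached "global" state held locally
    while goal is not None:
        functor, args = goal
        if functor == "count_down":
            (n,) = args
            heap.append(n)
            goal = ("count_down", (n - 1,)) if n > 0 else None
        else:
            raise ValueError(functor)
    return heap

print(run(("count_down", (3,))))  # [3, 2, 1, 0]
```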
5.6.3 WAMCC
The WAMCC was designed by Diaz [34] at INRIA. It compiles Prolog into straightforward
WAM code, and then into C, which can be compiled by GCC.
The WAMCC includes relatively few optimisations. The philosophy seems to be to leave
the work to the C compiler. Unfortunately, performance is not very impressive compared to
emulated systems such as YAP or SICStus v3. This indicates that care in the design of the
abstract instructions is fundamental.
The WAMCC provides a very clean system that is ideal for experimentation with new
compilation technology. For instance, the WAMCC was the basis for Diaz and Codognet’s
clp(fd) system [35].
5.7 Other Approaches
There is a substantial body of work on logic programming implementation. We mention a few
of the most important contributions.
Structure Sharing Not all Prolog implementations rely on structure copying. MProlog
shows that one can get good performance by using structure sharing [95]. More
recently, Li has proposed an alternative term representation that combines structure sharing
and copying [100].
The Vienna Abstract Machine The WAM generates code for each procedure that is inde-
pendent of its callers. The VAM [97] innovates over the WAM by considering both the caller
and callee to generate more efficient code. The VAM can obtain significant performance im-
provements, but runs the risk of generating excessive code.
Binarisation The BinProlog system [171] introduced several important contributions to
logic programming. In BinProlog clauses are binarised. For instance, the code for naïve
reverse:
nrev([], []).
nrev([H|T], R) :-
nrev(T, R0),
conc(R0, [H], R).
would be transformed to the following binary clauses:
nrev([], [], Cont) :- call(Cont).
nrev([H|T], R, Cont) :-
nrev(T, R0, conc(R0, [H], R, Cont)).
Having continuations available explicitly allows continuation-passing style compilation, a
technique that has had quite good results for the functional languages. It also simplifies the
implementation of several extensions to Prolog.
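The same effect can be obtained in any language with first-class functions; the following Python sketch (an analogy, not BinProlog's compilation scheme) mirrors the binarised nrev clauses, threading the continuation as an explicit last argument.

```python
# Continuation-passing sketch mirroring the binarised nrev clauses:
# every call receives the remaining work as its last argument, just
# as the binarised clauses carry an explicit Cont argument.

def nrev(xs, cont):
    if not xs:                    # nrev([], [], Cont) :- call(Cont).
        return cont([])
    h, t = xs[0], xs[1:]
    # nrev([H|T], R, Cont) :- nrev(T, R0, conc(R0, [H], R, Cont)).
    return nrev(t, lambda r0: conc(r0, [h], cont))

def conc(xs, ys, cont):
    return cont(xs + ys)

print(nrev([1, 2, 3], lambda r: r))  # [3, 2, 1]
```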
A second contribution is that BinProlog combines both native and emulated code for the
same procedure. Native code is generated for the kernel part of a clause, and the remaining
code is still emulated [172]. This gives compact code and good performance.
Global Analysis Traditional global analysis in Prolog has been performed through abstract
interpretation [42, 43]. There has recently been work showing that abstract interpretation
is effective and can be integrated into more mainstream systems. Examples
include the work on CIAO at Madrid [17] and GAIA [99].
An alternative to abstract interpretation is to use control flow graphs to specialise
Prolog programs. This enables several mode- and type-based optimisations. Recent work in
this area is described by Lindgren [103] and Ferreira [60].
Extensions to Prolog Several authors have proposed extensions to Prolog. Examples include
Miller and Nadathur's λProlog [116], a language that adds meta-level programming
to Prolog, and Monteiro and Porto's contextual logic programming [117], an interesting scheme
for modular programming. In a different vein, there have been several proposals for combining
logic programming with other paradigms, such as the functional languages [98]. Most
of these extensions are implemented either by compiling into Prolog, by adding suitable
extensions to the WAM, or by using totally new frameworks, as in BABEL [98] or in MALI's
implementation of λProlog [15].
Tabling [26] is arguably one of the most important extensions to logic programs. It improves
on the left-to-right search scheme by storing and reusing intermediate solutions. This
strategy can avoid unnecessary computations, and in a few cases can avoid looping. One
of the first widely available Prolog systems to implement tabling is XSB-Prolog. XSB implements
SLG-resolution [26] through the SLG-WAM [137]. More recently, CAT [54] provides
an alternative implementation of SLG-resolution, based on copying, as in the Muse
or-parallel Prolog system [4].
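A tiny Python sketch conveys the effect of tabling (a plain fixpoint over stored answers, not SLG-resolution or the SLG-WAM): a left-recursive reachability relation over a cyclic graph, which loops under ordinary left-to-right SLD resolution, terminates once intermediate answers are tabled and reused.

```python
# Sketch of the effect of tabling (naive fixpoint memoisation, not the
# SLG-WAM).  path(X,Z) :- edge(X,Z).  path(X,Z) :- path(X,Y), path(Y,Z).
# The graph a->b, b->a, b->c is cyclic, so SLD resolution would loop;
# tabling answers and iterating to a fixpoint terminates.

edges = {("a", "b"), ("b", "a"), ("b", "c")}

def tabled_path():
    table = set(edges)              # base case: every edge is a path
    while True:
        new = {(x, z)
               for (x, y1) in table for (y2, z) in table if y1 == y2}
        if new <= table:            # fixpoint: no new answers
            return table
        table |= new

print(sorted(tabled_path()))
```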
Mercury Some authors have defended abandoning Prolog altogether in the interest of efficiency
and, arguably, of declarativeness. One popular alternative is Mercury [163], a
simple language proposed by Somogyi and colleagues. The language supports a strict mode
and type system by severely restricting the use of the logic variable, and is amenable to a
very fast C-code implementation.
6 Forms of Implicit Parallelism in Logic Programs
Parallelism in logic programs can be exploited implicitly or explicitly. In explicit systems such
as Delta Prolog [131] special types of goals (events and splits in Delta Prolog) are available
to control parallelism. Unfortunately, these languages do not preserve the declarative view
of programs as Horn clauses, and thus lose one of the most important advantages of logic
programming.
Implicit parallelism can be obtained through the parallel execution of several resolvents
arising from the same query, Or-parallelism, or through the parallel resolution of several
goals, And-parallelism. These two forms of parallelism can be exploited according to very
different strategies. A large number of parallel models and systems have been developed
for both distributed and shared memory parallel architectures. It is impossible to discuss
all proposals and systems; in this survey we concentrate on some of the most influential
ones.
7 Or-Parallelism
In Or-parallel models of execution, several alternative branches in a logic program's
search tree can be tried simultaneously. Quite a few models have been proposed so far. The most
successful have been the multi-sequential models, where processing agents (workers, in the
Aurora [106] notation) select Or-parallel work and then proceed to execute as normal Prolog
engines.
A fundamental problem in the implementation of Or-parallel systems is that different
or-branches may give different bindings to the same variable. In an Or-parallel system,
and differently from a sequential execution, these bindings must be simultaneously available.
The problem is exemplified in Figure 3, where choice-points are represented by black
circles and branches being explored by some worker are represented by arrows. The
two branches corresponding to workers W1 and W2 see different bindings for the variable X.
Figure 3: The Binding Problem in Or-parallelism
A large number of Or-parallel models, embodying different solutions to this problem,
have been proposed (the reader is referred to Gupta [69] for a survey of several Or-parallel
models). The models vary according to the way they address the binding problem. Next,
there follows a brief description of influential Or-parallel models.
7.0.1 Independent Prolog Engines
The binding problem can be avoided by having each worker operate on its part of the
or-tree as independently from other workers as possible. One extreme is represented by
the Delphi model [31], where each worker receives a set of pre-determined paths in the search
tree, attributed by oracles allocated by a central controller. Whenever a worker must move
to an alternative in a different point of the search tree, it recomputes all state
information for that alternative. Clocksin and colleagues argue that Delphi allows for good
parallelism, low communication and efficient performance on coarse-grained problems. One
problem is that Delphi can struggle with fine-grained tasks, as full recomputation of
work may become very expensive. Delphi was also severely limited by its centralised control;
Lin proposed a different, self-organising, task-scheduling scheme [102].
A different alternative, copying, was first used in the Japanese Kabu-Wake system [112]
(later abandoned in favour of the Fifth Generation Project) and
in Ali's BC-machine [3]. Ali and Karlsson eventually adapted copying to standard shared
memory machines, and developed the Muse system [4]. In copying-based implementations,
whenever a worker W1 needs work from a worker W2, it copies the entire stacks of W2.
Worker W1 will then work on its tasks independently from other workers until it needs
to request more work. To minimise the number of occasions at which copying is needed,
scheduling in Muse favours selecting work at the bottom of the tree.
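The copy-based sharing step can be sketched as follows (a toy model with an invented state representation, not Muse's implementation): the idle worker copies the busy worker's stacks wholesale, then each side takes a different untried alternative from a shared choice-point.

```python
# Toy model of copy-based work sharing in the style of Muse (not the
# actual implementation).  An idle worker deep-copies the stacks of a
# busy one, then takes an untried alternative from a choice-point.
import copy

busy = {
    "stack": ["root", "cp1"],                         # execution state so far
    "alternatives": {"cp1": ["branch2", "branch3"]},  # untried branches
}

def share_work(busy_worker):
    idle = copy.deepcopy(busy_worker)           # copy the entire stacks
    branch = idle["alternatives"]["cp1"].pop()  # idle takes one alternative
    busy_worker["alternatives"]["cp1"].remove(branch)  # no longer untried
    idle["stack"].append(branch)
    return idle

idle = share_work(busy)
print(idle["stack"])          # ['root', 'cp1', 'branch3']
print(busy["alternatives"])   # {'cp1': ['branch2']}
```

After the copy the two workers proceed with ordinary sequential execution, which is why copy-based systems add no overhead outside task switching.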
Full copy systems basically use the same data-structures as a sequential Prolog engine
during ordinary execution. Thus they do not suffer any special overheads during ordinary
execution. On the other hand, task switching becomes more expensive. One argument for
other systems, such as the Aurora system we discuss later, is that it would be difficult to
support cut and built-ins efficiently in Muse. Ali and Karlsson later presented an elegant
solution to this problem [5].
Copying has become the most popular form of exploiting Or-parallelism. Muse is now
a part of the standard SICStus implementation [6], and the ideas from Muse were used in
parallelising ECLiPSe [1] and YAP [140]. Muse was also influential in the design of other
parallel logic programming systems, such as the distributed Or-parallel system Opera [14],
ACE [66], and Penny [118].
7.0.2 Shared Space
Instead of each worker having its own stacks, all workers may share the stacks. In
this case they need some way to represent the different bindings of the or-branches. To do so,
changes must be made to the data structures used to represent terms. Whereas sequential
implementations of Prolog store bindings in the value-cell representing a variable,
these systems need some intermediate data structure to store bindings to variables
that are shared between or-branches. We next discuss two examples of shared space models:
the hash tables used in PEPSys, and the SRI model used in Aurora.
Hash Tables: The main characteristic of hash-table models [25] is that whenever a worker
conditionally binds a variable, the binding is stored in a shared data structure associated
with the current or-branch (these data structures are implemented as hash-tables for speedy
access). Whenever a worker needs to consult the value of a variable, instead of consulting
the variable's cell immediately it will look up the hash-tables first. Figure 4 (a) shows the
use of hash-tables: note the links between hash windows, and the fact that only some hash
windows will have bindings. Note also that whenever the value of a variable is consulted, we
only need to consult the hash-tables younger than the variable; thus look-up is not necessary
for variables created after the last hash-table. PEPSys reduces the overheads of looking up
ancestor nodes by adding the binding of a variable to the current hash table whenever that
variable is accessed. Analysis of PEPSys showed that at most 7% of the execution
time is spent dereferencing through the hash tables.
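The lookup-and-cache behaviour can be sketched in a few lines (a toy model of the scheme, not PEPSys code; a dictionary stands in for each hash window): each or-branch node carries its own window and a link to its parent, lookup walks toward the root, and the binding found is cached in the current window to shorten later lookups.

```python
# Toy model of hash-window lookup in the style of PEPSys.  Each
# or-branch node has a window (here a dict) of conditional bindings
# and a link to its parent branch.

class HashWindow:
    def __init__(self, parent=None):
        self.parent = parent
        self.bindings = {}

def lookup(window, var):
    w = window
    while w is not None:
        if var in w.bindings:
            value = w.bindings[var]
            window.bindings[var] = value   # cache in the current window
            return value
        w = w.parent
    return None                            # variable is unbound

root = HashWindow()
left = HashWindow(root)
leaf = HashWindow(left)
left.bindings["X"] = "a"                   # bound in an ancestor branch
print(lookup(leaf, "X"))     # a
print("X" in leaf.bindings)  # True: now cached at the leaf
```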
Figure 4: Shared Bindings in Or-parallel Models: (a) the Hash-Windows model; (b) the Binding Array model
The PEPSys system was developed at the ECRC research labs. The idea was to support
both Or- and And-parallelism, but only a limited form of And-parallelism was eventually
implemented. ECRC also maintained several Prolog and CLP systems. Although results
were good, PEPSys was never available outside ECRC, and ECRC's ECLiPSe system uses
copying [1]. More recently, Shen proposed hash tables in the style of PEPSys in his FIRE
system [159].
Binding arrays: In binding-array models each Prolog variable has a cell associated with it in an
auxiliary private data structure, the binding array [193, 189]. In the SRI model, a binding
array is associated with each active worker, and every variable is initialised to point to its
offset in the corresponding binding-array location. Conditional bindings are stored in the
binding arrays and in the trail, but not in the environment stack or heap. Unconditional
bindings are still stored in the stacks. In this way the model guarantees that if a variable
can be bound differently by several or-branches, it must be accessed through the binding
array. Moreover, binding arrays have the important property that a variable has the same
binding array offset irrespective of or-branch. Figure 4 (b) shows the use of binding arrays:
notice that binding arrays always grow as one goes down the search tree.
In the SRI model the stacks are completely shared, but binding arrays are private to each
worker. When a worker W1 wants to work on a choice-point created by another worker W2,
it backtracks to a choice-point it shares with W2 and then moves down the or-tree
until it finds W2's choice-point. Backtracking is implemented by inspecting the trail and
resetting all entries in the binding array altered since the last choice-point. Moving down
the tree is done by setting its pointers to the ones in the choice-point and by inspecting W2's
trail (which is shared) in order to place all the corresponding bindings in the worker's own
binding array.
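A toy sketch makes the key property concrete (hypothetical data layout, not the SRI model's actual memory representation): the shared variable cell holds only its offset, each worker stores conditional bindings at that offset in a private array, and backtracking resets the trailed offsets.

```python
# Toy model of the SRI binding-array scheme.  A shared variable cell
# stores only its binding-array offset; each worker keeps conditional
# bindings at that offset in a private array, so the same variable has
# a different value in each or-branch.

shared_heap = [("var", 0)]        # variable X, allocated at offset 0

worker1 = {"binding_array": ["a"], "trail": [0]}   # branch where X = a
worker2 = {"binding_array": ["b"], "trail": [0]}   # branch where X = b

def deref(worker, cell):
    tag, offset = cell
    return worker["binding_array"][offset] if tag == "var" else cell

print(deref(worker1, shared_heap[0]))  # a
print(deref(worker2, shared_heap[0]))  # b

def backtrack(worker):
    # reset every trailed entry since the last choice-point
    for offset in worker["trail"]:
        worker["binding_array"][offset] = None
    worker["trail"].clear()

backtrack(worker1)
print(deref(worker1, shared_heap[0]))  # None: unbound again
```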
Aurora [106] implements the SRI model. Aurora was a very important influence on the
development of parallel logic programming. It was arguably the first parallel system that
could run sizeable Prolog applications. We give a more detailed description of the Aurora
implementation next.
Aurora Implementation The Aurora system [106] is based on SICStus Prolog [6]. The
system was a collaborative effort between the University of Bristol (originally Manchester)
in the UK, SICS in Sweden, and Argonne National Labs in the USA. Aurora changes the
SICStus engine in several ways [22]; we just discuss the most important ones:

- each worker has two binding arrays, one for variables in the environments, the other for variables in the heap;

- choice-points are expanded to point to the binding arrays, and to include fields relevant to the management of work in the search-tree;

- memory allocation in Aurora is changed; each worker now represents its stacks as sets of blocks, which allows the stacks to grow without relocation of pointers.
A fundamental problem for Or-parallel systems is how to schedule or-work. Aurora uses
a demand driven approach to scheduling. Basically, a worker executes a task as a Prolog
engine. When its current task is finished, the worker calls the scheduler which tries to find
work somewhere in the search-tree. The interface between the two components has been de-
signed to be as independent as possible of the underlying engine and scheduler [168]. Initial
schedulers, such as the Manchester scheduler [18], favoured distribution of work topmost in
the tree. Such strategies do not necessarily obtain the best results, particularly when the
or-tree may be pruned by cuts or commits or when most work is fine-grained. The Bristol
scheduler [10] was initially implemented to support bottommost dispatching but has been
adapted to support several strategies, including selection of leftmost work. The Dharma
scheduler [161] favours work which is not likely to be pruned away and tries to avoid spec-
ulative work. Both the Dharma and Bristol schedulers can use voluntary suspension, i.e., a
worker abandoning its unfinished task, to move workers from speculative to non-speculative
areas in the or-tree.
Results for Aurora (with the latest schedulers) show good all-solutions speedups and improvements
in first-solution speedups on diverse applications. The static overheads caused by the parallel
implementation average 15%–30%, basically due to supporting the binding
arrays and to overheads in the implementation of choice-points. Instrumentation [166] shows
that fixed overheads are more substantial than the distance-dependent overheads of moving
around the search-tree.
Aurora influenced the development of Silva's or-parallel system Dorpp [160], one of
the first systems designed for distributed shared memory. It also influenced the And/Or
Andorra-I system [145], and the PBA [71] and SBA [41] models, which we discuss later. Aurora
was also one of the most stable parallel logic programming systems, and one of the few for
which there was significant work on practical applications, such as the work of Kluzniak
and of Szeredi [94, 167, 169].
8 And-Parallelism
Whereas workers in Or-parallel computations attempt to obtain different solutions to the
same query, workers in And-parallel computations collaborate in obtaining a single solution to a
query. Each And-parallel task contributes to the solution by binding variables to values.
Problems can arise if the parallel goals have several different alternative solutions, or if
several parallel goals want to give different values to the same variable (the binding conflict
problem).
There are several solutions to these problems. Traditional Independent And-parallel sys-
tems only run goals that do not share variables in parallel. Non-Strict Independent And-
Parallelism [81] gives a more general definition of independence between goals that allows
some variable sharing.
In contrast, Dependent And-parallel systems allow goals that share variables to proceed
in parallel, while (usually) enforcing some other restrictions. Examples of such systems
are the parallel implementations of the committed-choice languages. Parallelism in these
languages can be exploited quite simply by allowing all goals that can commit to do so simul-
taneously. By their very nature, committed-choice systems do not have the multiple-solution
problem, as they disallow multiple solutions. In this case the binding conflict disappears,
since if two goals in the current query give different values for the same variable, then the
query is inconsistent and the entire computation should fail.
We next discuss some examples of And-parallelism in logic programming systems. We
discuss And-parallelism in the committed-choice languages, with emphasis on the imple-
mentation issues. We mention some Independent And-parallel models and systems, and
we briefly refer to some proposals to exploit dependent And-parallelism between nondeter-
minate goals. Exploiting And-parallelism between determinate goals, as performed in the
Basic Andorra Model [190] and PNU-Prolog [125] is explained in more detail outside this
chapter.
8.1 And-Parallelism in the Committed-Choice Languages
In the committed-choice languages, all goals that can commit may run in parallel and gener-
ate new goals in parallel. Parallelism in these languages is thus at the goal-level. An ideal
execution model for such languages could be based on a pool of goals. Whenever a worker is
free it looks in the pool of goals and fetches a goal. If the goal does commit to a clause, the
worker should add the goals in the body to the pool. Otherwise, the worker would look for
another goal.
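The ideal goal-pool model just described can be sketched directly (a toy single-worker simulation with a made-up program, not any real system): workers take goals from the pool, push the body goals of a committing clause back into it, and set aside goals that cannot commit yet as suspended.

```python
# Toy goal-pool model of committed-choice execution (a sketch of the
# ideal model described above, not of any real system).  A worker takes
# a goal; if a clause commits, the body goals go back into the pool,
# otherwise the goal is set aside as suspended.
from collections import deque

def run(pool, try_commit):
    pool = deque(pool)
    suspended = []
    reductions = 0
    while pool:
        goal = pool.popleft()
        body = try_commit(goal)        # None: no clause can commit yet
        if body is None:
            suspended.append(goal)
        else:
            pool.extend(body)
            reductions += 1
    return reductions, suspended

# Hypothetical program: c(N) reduces to c(N-1) until 0; s/0 suspends.
def try_commit(goal):
    name, n = goal
    if name == "c":
        return [("c", n - 1)] if n > 0 else []
    return None

print(run([("c", 2), ("s", 0)], try_commit))  # (3, [('s', 0)])
```

A real implementation would also wake suspended goals when the variables they wait on become bound, as discussed for the KL1-B and the JAM below.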
Consider now a very simple FGHC program (FGHC is the flat version of GHC, where only
built-ins are allowed in the guard):
append([X|Xs], Ys, XZs) :-
XZs = [X|Zs],
append(Xs, Ys, Zs).
append([], Ys, Zs) :- Zs = Ys.
append4(L1, L2, L3, L4, NL) :-
append(L1, L2, I1),
append(L3, L4, I2),
append(I1, I2, NL).
The procedure append4/5 appends four lists, by first appending the lists in pairs, and then
appending the results.
Consider now the query append4([1,2],[],[3],[4],L). One And-parallel execution
with three workers is shown in Figure 5. The workers executing the two leftmost calls to append/3
execute independently. The leftmost and rightmost calls to append/3 execute in "pipeline",
that is, the leftmost call generates bindings for L1 which allow the rightmost call to commit.
Figure 5: Parallel Execution of Multiple Concatenation
This very simple example shows the flexibility of the committed-choice languages. Par-
allelism between independent goals can be exploited naturally. More interestingly, logical
variables can be used in quite a natural way to give both sharing and synchronisation be-
tween goals one wants to execute in parallel.
Note that verifying if a goal should commit is a very simple process in the flat languages:
just performing head unification and some built-ins. Thus the parallelism that is exploited
in flat committed-choice languages is quite fine-grained.
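The commit test can be sketched as a one-way matching routine. The sketch below is illustrative only (our own representation, not KL1-B or JAM code): clause-head variables are written as capitalised strings and match anything; structures are tuples; caller variables are Var cells. Because the guard may not bind the caller's variables, an unbound caller variable met by a non-variable pattern forces suspension.

```python
class Var:
    def __init__(self):
        self.ref = None                       # None = unbound

def deref(t):
    while isinstance(t, Var) and t.ref is not None:
        t = t.ref
    return t

def passive_match(pattern, arg):
    """Return 'commit', 'suspend' or 'fail' for one-way head matching."""
    arg = deref(arg)
    if isinstance(pattern, str) and pattern[:1].isupper():
        return 'commit'                       # head variable matches anything
    if isinstance(arg, Var):
        return 'suspend'                      # would have to bind caller's variable
    if isinstance(pattern, tuple) and isinstance(arg, tuple):
        if len(pattern) != len(arg):
            return 'fail'
        results = {passive_match(p, a) for p, a in zip(pattern, arg)}
        if 'fail' in results:
            return 'fail'
        return 'suspend' if 'suspend' in results else 'commit'
    return 'commit' if pattern == arg else 'fail'
```

For the first append/3 clause of the example, with head pattern `('.', 'X', 'Xs')` standing for `[X|Xs]`: matching against `('.', 1, Var())` commits, matching against an unbound `Var()` suspends, and matching against `'[]'` fails, so the second clause is tried instead.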
We next discuss two influential implementations of the committed-choice languages that
have been quite successful for shared-memory machines. Both systems use abstract ma-
chines similar to the WAM, but with strong differences in how goals are manipulated.
Kimura and Chikayama’s KL1-B abstract machine [91, 152, 151] implements KL1. Cram-
mond’s JAM [46] is an abstract machine for the implementation of parallel PARLOG, includ-
ing deep guards (although there is no Or-parallelism between deep guards) and the sequen-
tial conjunction. JAM is based on a lightweight process execution model [45].
Both systems use a goal–stacking implementation where for each goal, goal records store
all the arguments plus some bookkeeping information, instead of the WAM’s environment
representation. Goal stacking was first proposed to represent And-parallelism in the RAP-
WAM [79], described later. Goal records can be quite heavy; in the JAM they have a total of
eleven fields. In the KL1-B goals are stored in a separate heap. In the JAM goal records are
divided into a set of arguments, stored in the argument stack, and the goal structure, stored in
the process stack. Both systems store all variables in the heap. Parallel PARLOG supports the
sequential conjunction, which is implemented with a special data structure, environments.
We briefly mention some of the more important alterations to the WAM:

- Manipulation of goals: Whereas Prolog can always immediately pick the leftmost goal, in committed-choice languages goals can be in several states. The KL1-B classifies these states as ready, or available for execution; suspended, or waiting for some variable to be instantiated; and current, or being executed. The JAM follows similar principles.

- Suspension on variables: Committed-choice languages allow multiple waiting, so a goal may suspend on several variables. The opposite is also true, and several goals may suspend on the same variable. Both languages associate with each variable a linked list of suspension records, or suspension notes. In the KL1-B each suspension record contains a pointer to a suspension flag record, itself consisting of a pointer to the goal record plus the number of variables the goal is suspended on. The JAM also uses indirect addressing to guarantee synchronisation between several variables whilst accessing goal records. One useful optimisation in the JAM is that goals suspended on a single variable are treated in a simpler way.

- Organisation of clauses: Clauses are divided into a guard, where unification is read-only (passive in KL1-B notation) and can suspend, and the body, where (as in Prolog) unification can bind external variables (active in KL1-B's terminology) or be used for argument preparation. New instructions are needed to support passive unification (these instructions must consider suspension). Both abstract machines use a special suspension instruction that is called when the goal cannot commit.

- Backtracking: In committed-choice languages there is no true backtracking, so the trail and choice-points can be dispensed with. (In the JAM backtracking may occur in the guard, but as goals in the guard cannot bind external variables it is not necessary to implement a trail.) Both abstract machines still include try instructions, but they do not manipulate choice-points. The disadvantage of not having backtracking is that space cannot be recovered on failure, and hence there is a strong need for dynamic memory recovery, such as the recovery of unreferenced data structures through the MRB bit [28] and garbage collection [44, 128].
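The suspension machinery just described can be sketched in a few lines. This is an illustrative sketch (the names and representation are ours, not the KL1-B's or the JAM's): each unbound variable keeps a list of suspension records, and the records of a goal waiting on several variables share one flag object, so binding any of the variables wakes the goal exactly once.

```python
class SuspVar:
    def __init__(self):
        self.value = None                  # None = unbound
        self.susp = []                     # shared suspension flags

    def bind(self, value, wake_list):
        self.value = value
        for flag in self.susp:
            if not flag['woken']:          # wake each goal at most once
                flag['woken'] = True
                wake_list.append(flag['goal'])
        self.susp = []

def suspend(goal, variables):
    """Suspend goal until any of the given variables is bound."""
    flag = {'goal': goal, 'woken': False}
    for v in variables:
        v.susp.append(flag)
```

The shared flag plays the role of the KL1-B's indirect suspension flag record: it is the single point that decides whether the goal has already been resumed.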
Shared-memory parallel implementations of both languages have to perform locking whenever
writing variables, because other processors may want to write to the same variables simultaneously.
To reduce locking, structures are first created and unified with temporary variables;
only then are they unified with the actual arguments. Finally, scheduling of and-work in
these languages is dominated by locality considerations. Each processor has its own work
queue, and manages its own parts of the data areas (again, similar ideas were proposed for
the RAP-WAM [79]). Depth-first scheduling is favoured for efficiency: evaluating the leftmost
call first allows better reuse of the goal frames. JAM supports better scheduling [47] by
allowing the local run queues to be in part private to each worker. On the Sequent Symmetry
multiprocessor, JAM performs 20% to 40% faster than the corresponding implementation of
the KL1-B [179].
KLIC [27] is a more recent implementation of KL1. It generates C code, and it supports
parallelism for both shared memory and distributed memory [141].
8.2 Initial And/Or Models
In contrast to work on the committed-choice languages, several authors suggested describing
Prolog computation as expanding an And/Or tree, and exploiting parallelism from this tree.
Conery’s AND/OR process model [40] was one of the most influential. In Conery’s model, or-
processes are created to solve the different alternative clauses, and and-processes are created
to solve the body of a goal. The and-processes start or-processes for the execution of the goals,
and join the solutions from the or-processes for the different goals. The model restricts Or-
parallelism by only starting or-processes for the remaining clauses when no more or-processes
for the current clauses remain to output the solutions. The structure of the model is shown in
Figure 6, based on Figure 3.7 of Conery’s book [40].
Figure 6: Computation in the AND/OR Process Model
In Conery’s model a dependency graph between goals indicates which goals depend on
which goal. The cost of process creation and of maintaining the dependency graph means that
this model has severe overheads in relation to a sequential Prolog system. Lin and Kumar
obtained an efficient shared memory AND-parallel system where dependency analysis was
performed dynamically without excessive run-time overheads [101].
The REDUCE/OR Process Model (ROPM) was designed by Kale [90]. The REDUCE-OR
tree is used instead of the AND-OR tree to represent computation. OR nodes correspond to
goals, and REDUCE nodes correspond to clauses in the program, with special notation for gen-
erators of variables; there may be several or-nodes for the same goal, corresponding to different bindings
of its arguments. REDUCE nodes maintain partial solution sets, PSS, initially empty, that
are used to avoid recomputation. Sequencing between goals is given by data join graphs in
the style of Conery’s dependency graphs. Or-parallelism is exploited both when a goal is first
executed, and when solutions from a goal generate several instances (the latter case is not
exploited in Conery’s model). The ROPM model has been implemented on multiprocessors
using structure-sharing to implement a binding environment that prevents references from
a child to its parent node [138]. Benchmark results suggest significant overheads in the im-
plementation of the model, but almost linear speedups in suitable benchmarks due in some
cases to AND- and in some cases to Or-parallelism.
The dataflow paradigm has also been used to support And/Or parallelism. In this paradigm
the nodes in the And/Or tree become nodes in a dataflow graph. Examples of work on
dataflow systems include Wise’s Epilog [194] and Kacsuk’s LOGFLOW [88, 126].
8.3 Independent-And Parallelism
The overheads in implementing dependency (or join) graphs may be quite substantial. De-
Groot [52] proposed a scheme where only goals which do not have any run-time common
variables are allowed to execute in parallel. To verify these conditions, DeGroot suggested
the use of expressions that are added to the original clause and at run-time test the argu-
ments of goals to verify independence. DeGroot’s work on Restricted And Parallelism was
later refined by Hermenegildo. Hermenegildo proposed the conditional graph expressions,
or CGEs [79], and &-Prolog’s parallel expressions [123] to control And-parallelism. We next
give an example of a linear parallel expression in the &-Prolog language:
( ground(X), indep(Z, W) ->
a(X,Z) & b(X, W) ;
a(X,Z), b(X, W) )
If the two conditions hold, the two goals a(X,Z) and b(X,W) are independent and can
execute in parallel, otherwise they are to be evaluated sequentially. The ground condition
guarantees that the shared variable will not contain unbound variables at run-time, and the
test indep guarantees that Z and W do not share run-time variables.
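The two run-time tests can be sketched directly. The term representation below is our own illustration (nested tuples plus Var cells), not &-Prolog's internal one:

```python
class Var:
    def __init__(self):
        self.ref = None                    # None = unbound

def vars_of(t, acc=None):
    """Set of unbound variables (by identity) occurring in term t."""
    acc = set() if acc is None else acc
    if isinstance(t, Var):
        if t.ref is None:
            acc.add(id(t))
        else:
            vars_of(t.ref, acc)            # follow the binding
    elif isinstance(t, tuple):
        for arg in t:
            vars_of(arg, acc)
    return acc

def ground(t):
    return not vars_of(t)                  # no unbound variables at all

def indep(t1, t2):
    return vars_of(t1).isdisjoint(vars_of(t2))  # no shared unbound variable
```

With these definitions, binding a shared variable can turn dependent terms into independent ones, which is exactly why the tests must be evaluated at the moment the CGE is reached, not at compile time.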
8.4 &-Prolog
Hermenegildo’s &-Prolog system implements Independent And-parallelism for Prolog. It is
based on an execution scheme proposed by Hermenegildo and Nasr [80] which extends back-
tracking to cope with independent and-parallel goals. As in Prolog, the scheme recomputes
the solutions to independent goals if previous independent goals are nondeterminate.
The &-Prolog language extends Prolog with the parallel conjunction and goal delaying.
One objective of the corresponding system was to have sequential execution as close to Prolog
as possible. To do so, &-Prolog maintains much of the Prolog data structures. &-Prolog pro-
grams are executed by a number of PWAMs running in parallel [79]. The instruction set
of a PWAM is the SICStus Prolog instruction set, plus instructions that include the CGE
tests and instructions for parallel goal execution. Synchronisation between goals is imple-
mented through the parcall-frames, data-structures that represent the CGEs and that are
used to manage sibling and-goals. The resulting system has very low overheads in relation
to the corresponding sequential system, SICStus, and shows good speedups for the selected
benchmark programs, including examples of linear speedups.
An essential component of &-Prolog is the &-Prolog compiler. This compiler can use global
analysis to generate CGEs. (Notice that for some programs this may still be difficult, and &-
Prolog allows hand-written annotations.) Abstract interpretation is used to verify that conditions
such as groundness or independence between variables are always satisfied. If they are, the
CGE generator can greatly simplify the resulting CGEs, and avoid the overheads inherent in
performing the CGE tests.
The &-Prolog system was designed to support full Prolog. Similar to Or-parallel systems,
parallel executions of goals may break the Prolog sequence of side-effects. Several solutions
have been proposed for this problem, the one actually used in &-Prolog is simply to sequence
computation around side-effects.
The &-Prolog system was initially designed at MCC, in the USA. The implementation was
unfortunately not available outside MCC. Several researchers have since continued work in
Independent And-Parallelism. Hermenegildo and others in Spain focussed on compilation is-
sues, both for parallelising Prolog [17] and the constraint logic programming languages [9].
Shen in the UK reimplemented most of the functionality of &-Prolog in his DASWAM pro-
totype for DDAS [157]. Pontelli and Gupta at New Mexico State University have also re-
designed most of &-Prolog for their &-ACE prototype [134]. The &-ACE system has been
very successful as an efficient implementation of Independent And-parallelism, and has con-
tributed several important run-time optimisations [135].
8.5 Some Dependent And-Parallel Models
Several models that allow dependent And-parallelism between non-determinate goals have
been proposed in the literature. In order to solve the binding problems all these models
impose some ordering between goals.
In Tebra’s optimistic And-parallel model [176], the standard Prolog ordering is used. Dur-
ing normal execution all goals are allowed to go ahead and bind any variables. When binding
conflicts arise between two goals, the goal that would have been executed first by Prolog has
priority, and the other goal is discarded. In the worst case quite a lot of work can be dis-
carded, and the parallelism can become very speculative. Other optimistic models reduce
the amount of discarded work by using other orderings of goals, such as time-stamps [127].
Goals can also be classified according to producer-consumer relationships. In these mod-
els, producer goals are allowed to bind variables, and consumer goals wait for these vari-
ables. Goals can be classified as producers for a variable statically [33, 164], or dynam-
ically [156]. In Somogyi’s system [164], extended mode declarations statically determine
producer-consumer relationships. In the Codognets’ IBISA scheme [33], a system is suggested
where variables in a clause are marked with read-only annotations in the style of
Concurrent Prolog. In Shen’s DDAS [158] model, dynamic relationships between produc-
ers and consumers are obtained by extending the CGE notation. The extended CGEs now
mark some variables as dependent, and the system dynamically follows these variables to
verify which goals are leftmost for them. If a goal has the leftmost occurrence of a dependent
variable it is allowed to bind the variable, but otherwise it delays.
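As a rough sketch of this test (the representation and names are ours, not DDAS's): a dependent variable records the left-to-right positions of the goals that share it, and a bind attempt succeeds only for the goal that is currently leftmost.

```python
class DepVar:
    def __init__(self, goal_positions):
        self.occurrences = sorted(goal_positions)   # left-to-right order
        self.value = None

    def try_bind(self, goal, value):
        if goal != self.occurrences[0]:
            return 'delay'                 # a goal to the left may still bind
        self.value = value
        return 'bound'

    def retire(self, goal):
        """Goal finished (or backtracked away) without binding."""
        self.occurrences.remove(goal)
```

When the leftmost goal finishes without binding the variable, retiring it promotes the next goal to producer, which is the dynamic part of the scheme.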
The producer-consumer models become very complex when the producer or consumer
has to backtrack. Both Somogyi’s scheme and especially IBISA apply ideas of “intelligent
backtracking” to reduce the search space. DDAS uses a backtracking scheme similar to
Hermenegildo’s semi-intelligent backtracking [80]. Shen claims that this scheme results in
a simpler execution model, closer to sequential Prolog, the target language for DDAS.
The producer-consumer models support Independent And-parallelism as a subset. In
addition, dependent And-parallelism between deterministic computations (as long as the
producer-consumer relations between goals are fixed) can be exploited naturally. Finally, the
models offer dependent And-parallelism between non-determinate dependent goals.
Shen gives a description and good initial performance results for his implementation of
DDAS [157]. One problem with Shen’s implementation of dependent And-parallelism is that
it is quite complex to support variables that are shared between and-branches. A simpler and
more elegant technique for detecting whether a goal is the producer or a consumer for such
a variable was proposed and implemented for ACE [133], with good results. ACE also inno-
vates over pure producer-consumer parallelism by allowing eager execution of deterministic
goals, in the style of Andorra.
8.6 Independent And/Or models
The PEPSys [25] model, Gupta’s Extended And-Or Tree model [68] and Fagin’s model [59]
are three models designed to implement Independent And- and Or-parallelism in a shared-
memory framework. All use CGEs to implement And-parallelism, and combine Or-parallelism
with And-parallelism by extending respectively hash tables and binding arrays. The sev-
eral solutions from the independent and-parallel computations are implemented through a
special cross-product node, which can be quite complex to implement. In fact, the PEPSys
system only truly implements deterministic And-parallel computations [23].
New proposals for combined and-or parallelism use backtracking to obtain the several
And-parallel solutions (thus, as in &-Prolog, some recomputation is performed) [67].
Such systems should be easier to implement than PEPSys or the Extended And-Or Tree Model,
as they can exploit the technology of &-Prolog and the Or-parallel systems, and as they do not
need to calculate cross-products. Such proposals include Shen’s or-under-and model [156] and
the C-Tree, a framework for And/Or models that was proposed by Gupta and colleagues [67].
Proposed implementations of the C-tree framework include Gupta and Hermenegildo’s ACE [66],
a model that combines Muse and &-Prolog, and Gupta’s PBA model [70], a model that com-
bines Aurora and &-Prolog. One important advantage of these models is that they are quite
suitable to the implementation of Full Prolog [72].
Work in the implementation of these models has shown several implementation difficul-
ties, particularly in memory management. Correia and colleagues argue that to address
these problems, C-tree based models must be redesigned to minimise interference between
And- and Or-parallelism, and propose a novel data-structure towards this purpose, the Sparse
Binding Array [144].
An alternative approach to combining and-or parallelism through recomputation has
been proposed by Shen, based on his PhD work in simulating And/Or parallelism [156]. The
FIRE model uses hash tables instead of copying or binding arrays [159], thus avoiding mem-
ory management problems.
8.7 Reform Prolog
A very different approach to And-parallelism was suggested in Reform Prolog, a system de-
veloped at Uppsala [12]. The idea was to use data-parallelism by unfolding recursive calls in
a predicate. The scheme was shown to obtain excellent results. The Reform Prolog system
was developed from scratch, and included a compiler that could automatically parallelise
simple recursion on lists and on numbers. It was one of the few parallel logic programming
systems that could use static scheduling.
Other authors have suggested that this parallelism can be implemented on top of tradi-
tional Prolog systems, either by compile-time transformations (see Hermenegildo et al. [78]),
or by run-time optimisations (see Gupta et al. [132]).
9 Andorra Based Systems
Andorra-I was the first implementation of the Basic Andorra Model. The system was de-
veloped at the University of Bristol by Beaumont, Dutra, Santos Costa, Yang, and War-
ren [145, 196]. It was designed to take full advantage of the Basic Andorra Model. This
means both exploiting parallelism, and exploiting implicit coroutining as much as possible.
Andorra-I programs are executed by teams of abstract processing agents called workers.
Each worker usually corresponds to a physical processor. Each team, when active, is asso-
ciated with a separate or-branch in the computation tree and is in one of two computation
phases:
Determinate For a team, as long as determinate goals exist in the or-branch, all such goals
are candidates for immediate evaluation, and thus can be picked up by a worker. This
phase ends when no determinate goals are available, or when a determinate goal fails.
In the first case, the team moves to the non-determinate phase. In the second case, the
corresponding or-branch must be abandoned, and the team will backtrack in order to
find a new or-branch to explore.
Nondeterminate If no determinate goals exist, the leftmost goal (or a particular goal spec-
ified by the user) is reduced. A choice-point is created to represent the fact that the
current or-branch has now forked into several or-branches, while the team itself will
explore one of the or-branches. If other teams are available, they can be used to explore
the remaining or-branches.
Figure 7 shows the execution phases in terms of a pool of determinate goals. The figure
shows that the determinate phase is abandoned when either no more determinate goals are
available or when the team fails, and the determinate phase is reentered either after creating
a choice point, or after backtracking and reusing a choice point.
Figure 7: Execution Model of Andorra-I
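The two-phase cycle of Figure 7 can be sketched as a small interpreter. This is an illustrative sequential simulation, not Andorra-I code: real teams reduce determinate goals in parallel and explore separate or-branches with separate teams, while here backtracking is simulated with an explicit stack of branches.

```python
def andorra_run(goals):
    """Illustrative two-phase search: ('det', body) reduces to its body
    goals, ('fail',) fails the current branch, and ('nondet', alts)
    creates a choice point over alternative bodies. Returns True when
    some branch reduces away all its goals."""
    stack = [list(goals)]                  # or-branches left to explore
    while stack:
        branch = stack.pop()
        failed = False
        # Determinate phase: keep reducing while determinate goals exist.
        while any(g[0] != 'nondet' for g in branch):
            g = next(g for g in branch if g[0] != 'nondet')
            branch.remove(g)
            if g[0] == 'fail':
                failed = True              # abandon branch, backtrack
                break
            branch.extend(g[1])            # body goals join the pool
        if failed:
            continue
        if not branch:
            return True                    # all goals solved
        # Nondeterminate phase: reduce one goal, creating a choice point.
        g = branch[0]
        for alt in reversed(g[1]):         # leftmost alternative tried first
            stack.append(branch[1:] + list(alt))
    return False
```

The determinate inner loop corresponds to the left box of the figure; creating the choice point and popping a branch off the stack correspond to the two arrows that re-enter it.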
During the determinate phase, the workers of each team behave similarly to those of a
parallel committed-choice system; they work together to exploit And-parallelism. During
the non-determinate phase, and on backtracking, only one particular worker in the team
is active. We call this worker the master and the remaining workers slaves. The master
performs choice-point creation and backtracking in the same way as an Or-parallel Prolog
system.
The Andorra-I system consists of several components. The preprocessor, designed by
Santos Costa, is responsible for compiling the program and for the sequencing information
necessary to maintain the correct execution of Prolog programs [147]. The engine, designed
by Yang, was responsible for the execution of Andorra-I programs [146]. Initially
this was an interpreted system largely based on JAM [45], as regards the treatment of And-
parallelism, and on Aurora [106], as regards the treatment of Or-parallelism. However, the
integration of both types of parallelism introduced a number of new implementation issues,
as discussed in [146]. A compiler based version of Andorra-I was next developed. Yang
et al. [196] describe the key ideas of the abstract machine, based on JAM, and give a perfor-
mance analysis. The compilation techniques used are described by Santos Costa et al. [148].
Most of the execution time of workers should be spent executing engine code, i.e. per-
forming reductions. Whenever a worker runs out of work, it enters a scheduler to find
another piece of available work. Andorra-I includes an or-scheduler, an and-scheduler and a
reconfigurer.
The or-scheduler is responsible for finding or-work, i.e. an unexplored alternative in the
or-tree implied by the logic program. Andorra-I used the Bristol or-scheduler [10], originally
developed for the Aurora system. The and-scheduler, developed by Yang, is responsible for
finding eligible and-work, which corresponds to a goal in the run queue (list of goals not yet
executed) of a worker in the same team. Each worker in a team keeps a run queue of goals.
This run queue of goals has two pointers. The pointer to the head of the queue is only used
by the owner. The pointer to the tail of the queue is used by other workers to “steal” goals
when their own run queues are empty. If all the run queues are empty, the slaves wait either
until some other worker (in our implementation, the master) creates more work in its run
queue or until the master detects that there are no more determinate goals to be reduced
and it is time to create a choice-point.
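The two-pointer run queue can be sketched with a double-ended queue. The sketch is illustrative only and ignores the synchronisation a real implementation needs at the tail end:

```python
from collections import deque

class RunQueue:
    """Owner works depth-first at the head; idle teammates steal the
    oldest goals from the tail."""
    def __init__(self):
        self.q = deque()
    def push(self, goal):                  # owner only, head of queue
        self.q.appendleft(goal)
    def pop(self):                         # owner only, head of queue
        return self.q.popleft() if self.q else None
    def steal(self):                       # other workers, tail of queue
        return self.q.pop() if self.q else None

def find_and_work(me, team):
    goal = me.pop()                        # own queue first (locality)
    if goal is not None:
        return goal
    for other in team:                     # otherwise steal from a teammate
        goal = other.steal()
        if goal is not None:
            return goal
    return None                            # wait for the master to create work
```

Keeping the owner at the head and thieves at the tail means the two ends rarely contend, which is the point of the two-pointer design.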
The initial version of Andorra-I relied on a fixed configuration of workers between teams.
Dutra [57] designed the Andorra-I reconfigurer, which could dynamically adapt workers to
the different forms of available parallelism. The reconfigurer was shown to be quite effective
and in fact could quite often improve on the best hand-tailored configurations.
Andorra-I was a very significant step in parallel logic programming. It was the first
system to support both Dependent And-Parallelism and Or-Parallelism. It showed good
speedups, whilst maintaining an acceptable base performance. More recently, there has been
work on application development with Andorra-I, and also on supporting parallelisation of
finite-domain constraints [63]. Andorra-I has also been used as basis for studying the per-
formance of parallel logic programming systems on scalable architectures, see Santos Costa,
Bianchini, and Dutra [143, 142], and on software distributed shared memory systems [84].
9.0.1 Other Andorra Systems
Palmer and Naish worked on a different implementation of the Basic Andorra Model, as
an extension of the Parallel Nu-Prolog System, the NUA-Prolog system [130]. This was
a compiled system supporting And-parallelism, and also showed good speedups. Tick and
Korsloot also did some work on an implementation of Bahgat’s Pandora [180].
The development of the EAM and AKL led to several parallel implementations of AKL.
In Melbourne, Palmer designed an implementation supporting And-parallelism [129]. He
extended AKL with mode declarations to provide finer control. Moolenaar and others at Leu-
ven designed an implementation supporting AND/OR parallelism [120] through hash tables.
This work was one of the first to use the Andorra principle for constraints, namely finite-
domain constraints [121]. Montelius and others worked on a parallel implementation of AKL
using copying, Penny [118]. The implementation does not distinguish between the two forms of
parallelism; both are exploited in much the same way. Other interesting contributions result from
the work on parallel garbage collection and from the extensive performance analysis [119].
Research in parallelism on the language Oz has followed different directions. Oz has
an efficient sequential implementation that benefits from work in logic programming sys-
tems [114]. Researchers have been more interested in explicit parallelism through threads [136],
and in using Oz as a control language for distributed programming [76].
10 Further Reading
It is impossible to cover all the research in implementing sequential and parallel logic pro-
gramming systems in a single survey. Several interesting books and surveys are available
that cover further ground in this area. Aït-Kaci’s book [2] is the standard reference on
the WAM. Van Roy published an excellent survey on sequential Prolog implementations,
up to 1993 [186]. The survey gives a more detailed analysis of the WAM and of the BAM,
and discusses subjects not covered in this work, such as hardware implementations of Pro-
log. Regarding parallel systems, Tick’s book gives an excellent analysis of Or-parallel Prolog
systems versus the committed choice languages [177], Gupta’s book gives an insightful dis-
cussion of the issues of parallelism for Prolog systems [65], and Kacsuk and Wise edited
a thorough collection on work on distributed implementations of logic programming [89].
Chassin de Kergommeaux and Codognet give a survey on Parallel Logic Programming Sys-
tems, also from 1994 [24]. An important survey is the one on committed choice languages
given by Shapiro [155].
Research in this field has often been published in the major conferences on logic pro-
gramming, published as The Logic Programming Series from MIT Press. Other important
conferences are the PLILP Conferences, published in Springer Verlag LNCS series. A yearly
workshop on implementation, so far associated with the major Logic Programming Confer-
ence, is a good source on the most up-to-date research. Often these workshops are published
as books [181, 58]. The Journal of Logic Programming, published by Elsevier, and New Gen-
eration Computing, published by Ohmsha and Springer Verlag are the two main journals in
the area. Implementation work on logic programming can also be found in several related
journals and conferences.
11 Conclusions
Logic programming is one of the most successful and widely available programming paradigms.
Part of this success results from the extensive work on the design and implementation of
logic programming systems. In this paper we survey some of the most significant work on
sequential and parallel implementations of logic programming.
Work in sequential implementations has been very successful. The work in Prolog has
evolved from the original Marseille interpreter to compiled systems such as the WAM and
then to high-performance native code implementations. There is still scope for applying
global optimisations effectively to further improve performance, and for optimising perfor-
mance for modern computer architectures.
Some of the most exciting recent sequential work results from the many new
paths constantly being opened in logic programming. Constraint Programming is now becom-
ing a separate research area. Work on tabulation and memoing is at last tackling some of
Prolog’s “original sins”. Ambitious execution schemes, such as the EAM, are being consid-
ered [104]. Other researchers are keen on combining logic programming with other success-
ful paradigms such as functional programming or Java.
Logic programming is an ideal paradigm for the development of parallel systems. Al-
though the wide acceptance of parallel logic programming has been hampered by the rela-
tive dearth of parallel computers, logic programming is one area in computing where main-
stream systems do support implicit parallelism. In fact, Or-parallelism is supported in
Prolog systems such as ECLiPSe [1], SICStus Prolog [6], and YAP [48]. More experimen-
tal, but still excellent, available systems include Andorra-I [145], KLIC [27], Penny [118],
DASWAM [157], and ACE [134]. Research is now close to at last efficiently combining And-
parallelism with Or-parallelism while preserving Prolog-style execution. Work is also ongoing
on further optimisations to the current parallel paradigms, on providing more effective ap-
plication support, and on supporting extensions to logic programming, such as constraints,
tabulation [62, 139] and functional programming [73].
Acknowledgments
Thanks are due to Kish Shen and to Ines Dutra for reading drafts of this paper and making
helpful comments. Thanks are also due to the organisers of ILPS97 workshop on implemen-
tation who made this work possible: Enrico Pontelli, Gopal Gupta, Ines Dutra, and Fernando
Silva. The author wants to acknowledge the support of the COPPE/Sistemas, Universidade
Federal do Rio de Janeiro, and from the Melodia project (grant JNICT PBIC/C/TIT/2495/95)
for this work. Last, and not the least, the author wants to acknowledge the excellent work
that justifies this survey, and apologise to every author whose contribution could not be in-
cluded in this survey.
References
[1] A. Aggoun, D. Chan, P. Dufresne, E. Falvey, H. Grant, A. Herold, G. Macartney,
M. Meier, D. Miller, S. Mudambi, B. Perez, E. van Rossum, J. Schimpf, P. A. Tsahageas,
and D. H. de Villeneuve. ECLiPSe 3.5 User Manual. ECRC, December 1995.
[2] H. Aït-Kaci. Warren’s Abstract Machine — A Tutorial Reconstruction. MIT Press, 1991.
[3] K. A. M. Ali. Or-parallel Execution of Prolog on the BC-Machine. In Proceedings of
the Fifth International Conference and Symposium on Logic Programming, pages 253–
268. MIT Press, 1988.
[4] K. A. M. Ali and R. Karlsson. The Muse Or-parallel Prolog Model and its Performance.
In Proceedings of the North American Conference on Logic Programming, pages 757–
776. MIT Press, October 1990.
[5] K. A. M. Ali and R. Karlsson. Scheduling Speculative Work in Muse and Performance
Results. International Journal of Parallel Programming, 21(6):449–476, December
1992. Published in Sept. 1993.
[6] J. Andersson, S. Andersson, K. Boortz, M. Carlsson, H. Nilsson, T. Sjoland, and
J. Widen. SICStus Prolog User’s Manual. Technical report, Swedish Institute of Com-
puter Science, November 1997. SICS Technical Report T93-01.
[7] K. Appleby, M. Carlsson, S. Haridi, and D. Sahlin. Garbage collection for Prolog based
on WAM. Communications of the ACM, 31(6):171–183, 1989.
[8] R. Bahgat and S. Gregory. Pandora: Non-deterministic Parallel Logic Programming.
In Proceedings of the Sixth International Conference on Logic Programming, pages
471–486. MIT Press, June 1989.
[9] M. G. d. l. Banda and M. V. Hermenegildo. A Practical Approach to the Global Analysis
of CLP Programs. In ILPS93, pages 437–455, 1993.
[10] A. Beaumont, S. M. Raman, P. Szeredi, and D. H. D. Warren. Flexible Scheduling of
OR-Parallelism in Aurora: The Bristol Scheduler. In PARLE91: Conference on Parallel
Architectures and Languages Europe, volume 2, pages 403–420. Springer Verlag, June
1991.
[11] J. Beer. Concepts, Design, and Performance Analysis of a Parallel Prolog Machine.
Number 404 in Lecture Notes in Computer Science. Springer Verlag, 1989.
[12] J. Bevemyr, T. Lindgren, and H. Millroth. Reform Prolog: The Language and its Im-
plementation. In Proceedings of the Tenth International Conference on Logic Program-
ming, pages 283–298. MIT Press, June 1993.
[13] P. A. Bigot and S. K. Debray. A simple approach to supporting untagged objects in
dynamically typed languages. The Journal of Logic Programming, 32(1), July 1997.
[14] J. Briat, M. Favre, C. Geyer, and J. Chassin. Scheduling of or-parallel Prolog on a
scalable, reconfigurable, distributed-memory multiprocessor. In Proceedings of Par-
allel Architecture and Languages Europe. Springer Verlag, 1991.
[15] P. Brisset and O. Ridoux. Continuations in λProlog. In D. S. Warren, editor, Proceedings
of the Tenth International Conference on Logic Programming, pages 27–43, Budapest,
Hungary, 1993. The MIT Press.
[16] M. Bruynooghe, G. Janssens, A. Callebault, and B. Demoen. Abstract Interpretation:
Towards the Global Optimisation of Prolog Programs. In Proceedings 1987 Symposium
on Logic Programming, pages 192–204. IEEE Computer Society, September 1987.
[17] F. Bueno, M. G. d. l. Banda, and M. V. Hermenegildo. Effectiveness of Abstract In-
terpretation in Automatic Parallelization: A Case Study in Logic Programming. ACM
TOPLAS, 1998.
[18] A. Calderwood and P. Szeredi. Scheduling or-parallelism in Aurora – the Manchester
scheduler. In Proceedings of the Sixth International Conference on Logic Programming,
pages 419–435. MIT Press, June 1989.
[19] M. Carlsson. Freeze, Indexing, and Other Implementation Issues in the WAM. In J.-L.
Lassez, editor, Proceedings of the Fourth International Conference on Logic Program-
ming, MIT Press Series in Logic Programming, pages 40–58. University of Melbourne,
MIT Press, May 1987.
[20] M. Carlsson. On the efficiency of optimised shallow backtracking in Compiled Prolog.
In Proceedings of the Sixth International Conference on Logic Programming, pages 3–
15. MIT Press, June 1989.
[21] M. Carlsson. Design and Implementation of an OR-Parallel Prolog Engine. SICS Dis-
sertation Series 02, The Royal Institute of Technology, 1990.
[22] M. Carlsson and P. Szeredi. The Aurora abstract machine and its emulator. SICS
Research Report R90005, Swedish Institute of Computer Science, 1990.
[23] J. Chassin de Kergommeaux. Measures of the PEPSys Implementation on the MX500.
Technical Report CA-44, ECRC, January 1989.
[24] J. Chassin de Kergommeaux and P. Codognet. Parallel Logic Programming Systems.
Computing Surveys, 26(3):295–336, September 1994.
[25] J. Chassin de Kergommeaux and P. Robert. An Abstract Machine to Implement Or-And
Parallel Prolog Efficiently. The Journal of Logic Programming, 8(3), May 1990.
[26] D. Chen and D. S. Warren. Query evaluation under the well-founded semantics. In
Proc. of 12th PODS, pages 168–179, 1993.
[27] T. Chikayama, T. Fujise, and D. Sekita. A Portable and Efficient Implementation of
KL1. In 6th International Symposium PLILP, pages 25–39, 1994.
[28] T. Chikayama and Y. Kimura. Multiple Reference Management in Flat GHC. In J.-L.
Lassez, editor, Proceedings of the Fourth International Conference on Logic Program-
ming, MIT Press Series in Logic Programming, pages 276–293. University of Mel-
bourne, MIT Press, May 1987.
[29] K. L. Clark and S. Gregory. PARLOG: Parallel Programming in Logic. ACM TOPLAS,
8:1–49, January 1986.
[30] K. L. Clark, F. G. McCabe, and S. Gregory. IC-PROLOG – language features. In
K. L. Clark and S.-Å. Tärnlund, editors, Logic Programming, pages 253–266. Academic
Press, London, 1982.
[31] W. F. Clocksin. Principles of the DelPhi parallel inference machine. Computer Journal,
30(5):386–392, 1987.
[32] W. F. Clocksin and C. Mellish. Programming in Prolog. Springer-Verlag, 1986.
[33] C. Codognet and P. Codognet. Non-deterministic Stream And-Parallelism Based on
Intelligent Backtracking. In G. Levi and M. Martelli, editors, Logic Programming:
Proceedings of the Sixth International Conference, pages 83–79. The MIT Press, 1989.
[34] P. Codognet and D. Diaz. wamcc: Compiling Prolog to C. In 12th International Confer-
ence on Logic Programming. The MIT Press, 1995.
[35] P. Codognet and D. Diaz. Compiling constraints in clp(fd). The Journal of Logic Pro-
gramming, 27(3):185–226, June 1996.
[36] A. Colmerauer. Prolog II: Reference Manual and Theoretical Model. Groupe
d'Intelligence Artificielle, Faculté des Sciences de Luminy, Marseille, October 1982.
[37] A. Colmerauer. An Introduction to Prolog-III. Communications of the ACM, 33(7):69–
90, July 1990.
[38] A. Colmerauer. The Birth of Prolog. In The Second ACM-SIGPLAN History of Pro-
gramming Languages Conference, pages 37–52. ACM, March 1993.
[39] A. Colmerauer, H. Kanoui, R. Pasero, and P. Roussel. Un système de communication
homme-machine en français. Technical Report CRI 72-18, Groupe Intelligence Artifi-
cielle, Université Aix-Marseille II, October 1973.
[40] J. S. Conery. Parallel Execution of Logic Programs. Kluwer Academic Publishers,
Norwell, Ma 02061, 1987.
[41] M. E. Correia, F. M. A. Silva, and V. Santos Costa. The SBA: Exploiting orthogonality
in OR-AND Parallel Systems. In Proceedings of the 1997 International Logic Program-
ming Symposium, October 1997. Also published as Technical Report DCC-97-3, DCC -
FC & LIACC, Universidade do Porto, April, 1997.
[42] P. Cousot and R. Cousot. Abstract Interpretation: a Unified Lattice Model for Static
Analysis of Programs by Construction or Approximation of Fixpoints. In Conference
Record of the 4th ACM Symposium on Principles of Programming Languages, pages
238–252, 1977.
[43] P. Cousot and R. Cousot. Abstract interpretation and application to logic programs.
The Journal of Logic Programming, 13(1, 2, 3 and 4):103–179, 1992.
[44] J. A. Crammond. A Garbage Collection Algorithm for Shared Memory Parallel Proces-
sors. International Journal of Parallel Programming, 17(6), December 1988.
[45] J. A. Crammond. Implementation of Committed Choice Logic Languages on Shared
Memory Multiprocessors. PhD thesis, Heriot-Watt University, Edinburgh, May 1988.
Research Report PAR 88/4, Dept. of Computing, Imperial College, London.
[46] J. A. Crammond. The Abstract Machine and Implementation of Parallel Prolog. Tech-
nical report, Dept. of Computing, Imperial College, London, June 1990.
[47] J. A. Crammond. Scheduling and Variable Assignment in the Parallel Parlog Imple-
mentation. In 1990 North American Conference on Logic Programming, pages 642–
657. MIT Press, October 1990.
[48] L. Damas, V. Santos Costa, R. Reis, and R. Azevedo. YAP User’s Guide and Reference
Manual, 1989.
[49] K. De Bosschere, S. K. Debray, D. Gudeman, and S. Kannan. Call Forwarding: An In-
terprocedural Optimization Technique for Dynamically Typed Languages. In Proceed-
ings of the SIGACT–SIGPLAN Symposium on Principles of Programming Languages,
1994.
[50] S. K. Debray. Register allocation in a Prolog machine. In Symposium on Logic Pro-
gramming, pages 267–275. IEEE Computer Society, The Computer Society Press,
September 1986.
[51] S. K. Debray. On copy avoidance in single assignment languages. In ICLP93, pages
393–407, 1993.
[52] D. DeGroot. Restricted and-parallelism. In H. Aiso, editor, International Conference
on Fifth Generation Computer Systems 1984, pages 471–478. Institute for New Gener-
ation Computing, Tokyo, 1984.
[53] B. Demoen, G. Engels, and P. Tarau. Segment Preserving Copying Garbage Collec-
tion for WAM based Prolog. In Proceedings of the 1996 ACM Symposium on Applied
Computing, pages 380–386, Philadelphia, February 1996. ACM Press.
[54] B. Demoen and K. Sagonas. CAT: the Copying Approach to Tabling. In Proceedings of
PLILP/ALP98. Springer Verlag, September 1998.
[55] P. Deransart, A. Ed-Dbali, and L. Cervoni. Prolog: The Standard: Reference Manual.
Springer Verlag, 1996.
[56] M. Dincbas, P. Van Hentenryck, H. Simonis, A. Aggoun, T. Graf, and F. Berthier. The
Constraint Logic Programming Language CHIP. In International Conference on Fifth
Generation Computer Systems 1988, pages 693–702. ICOT, Tokyo, Japan, Nov. 1988.
[57] I. Dutra. Strategies for Scheduling And- and Or-Work in Parallel Logic Programming
Systems. In Logic Programming: Proceedings of the 1994 International Symposium,
pages 289–304. MIT Press, 1994.
[58] I. Dutra, V. Santos Costa, F. Silva, E. Pontelli, G. Gupta, and M. Carro, editors. Paral-
lelism and Implementation Technology for Logic and Constraint Logic Programming.
Nova Science, 1998.
[59] B. S. Fagin and A. M. Despain. The Performance of Parallel Prolog Programs. IEEE
Transactions on Computers, 39(12):1434–1445, Dec. 1990.
[60] M. Ferreira and L. Damas. Unfolding WAM Code. In 3rd COMPULOG NET Workshop
on Parallelism and Implementation Technology for (Constraint) Logic Programming
Languages, Bonn, September 1996.
[61] I. Foster and S. Taylor. Strand: New Concepts in Parallel Programming. Prentice-Hall,
January 1990.
[62] J. Freire, R. Hu, T. Swift, and D. S. Warren. Exploiting Parallelism in Tabled Evalua-
tions. In 7th International Symposium PLILP, pages 115–132, 1995.
[63] S. Gregory and R. Yang. Parallel Constraint Solving in Andorra-I. In International
Conference on Fifth Generation Computer Systems 1992, pages 843–850. ICOT, Tokyo,
Japan, June 1992.
[64] D. Gudeman, K. de Bosschere, and S. K. Debray. jc: An Efficient and Portable Sequen-
tial Implementation of Janus. In Proceedings of the 1992 Joint International Confer-
ence and Symposium on Logic Programming, 1992.
[65] G. Gupta. Multiprocessor Execution of Logic Programs. Kluwer Academic Press, 1994.
[66] G. Gupta, M. Hermenegildo, E. Pontelli, and V. Santos Costa. ACE: And/Or-parallel
Copying-based Execution of Logic Programs. In Proc. ICLP’94, pages 93–109. MIT
Press, 1994.
[67] G. Gupta, M. Hermenegildo, and V. Santos Costa. And-Or Parallel Prolog: A Recom-
putation based Approach. New Generation Computing, 11(3,4):770–782, 1993.
[68] G. Gupta and B. Jayaraman. Compiled And-Or Parallelism on Shared Memory Multi-
processors. In Proceedings of the North American Conference on Logic Programming,
pages 332–349. MIT Press, October 1989.
[69] G. Gupta and B. Jayaraman. Analysis of or-parallel execution models. ACM TOPLAS,
15(4):659–680, 1993.
[70] G. Gupta and V. Santos Costa. And-Or Parallelism in Full Prolog with Paged Binding
Arrays. In LNCS 605, PARLE’92 Parallel Architectures and Languages Europe, pages
617–632. Springer-Verlag, June 1992.
[71] G. Gupta and V. Santos Costa. Optimal implementation of and-or parallel Prolog.
Future Generation Computer Systems, 14(10):71–92, 1994.
[72] G. Gupta and V. Santos Costa. Cuts and Side-Effects in And-Or Parallel Prolog. Jour-
nal of Logic Programming, 27(1):45–71, April 1996.
[73] M. Hanus and R. Sadre. A Concurrent Implementation of Curry in Java. In Workshop
on Parallelism and Implementation Technology for (Constraint) Logic Programming
Languages, Port Jefferson, October 1997.
[74] S. Haridi and P. Brand. Andorra Prolog–an integration of Prolog and committed choice
languages. In International Conference on Fifth Generation Computer Systems 1988.
ICOT, 1988.
[75] S. Haridi and S. Jansson. Kernel Andorra Prolog and its Computational Model. In
D. Warren and P. Szeredi, editors, Proceedings of the Seventh International Conference
on Logic Programming, pages 31–46. MIT Press, 1990.
[76] S. Haridi, P. Van Roy, and G. Smolka. An overview of the design of Distributed Oz. In
Proceedings of the Second International Symposium on Parallel Symbolic Computation
(PASCO ’97), pages 176–187, Maui, Hawaii, USA, July 1997. ACM Press.
[77] R. C. Haygood. Native code compilation in SICStus Prolog. In P. V. Hentenryck, edi-
tor, Proceedings of the Eleventh International Conference on Logic Programming. MIT
Press, June 1994.
[78] M. Hermenegildo and M. Carro. Relating Data–Parallelism and And–Parallelism in
Logic Programs. In Proceedings of EURO–PAR’95, Swedish Institute of Computer
Science (SICS), August 1995.
[79] M. V. Hermenegildo. An Abstract Machine Based Execution Model for Computer Archi-
tecture Design and Efficient Implementation of Logic Programs in Parallel. PhD thesis,
Dept. of Electrical and Computer Engineering (Dept. of Computer Science TR-86-20),
University of Texas at Austin, Austin, Texas 78712, August 1986.
[80] M. V. Hermenegildo and R. I. Nasr. Efficient Management of Backtracking in AND-
parallelism. In Third International Conference on Logic Programming, number 225 in
Lecture Notes in Computer Science, pages 40–54. Imperial College, Springer-Verlag,
July 1986.
[81] M. V. Hermenegildo and F. Rossi. Non-Strict Independent And-Parallelism. In Proceed-
ings of the Seventh International Conference on Logic Programming, pages 237–252.
MIT Press, June 1990.
[82] T. Hickey and S. Mudambi. Global compilation of Prolog. The Journal of Logic Pro-
gramming, pages 193–230, November 1989.
[83] R. Hill. LUSH-Resolution and its Completeness. Dcl memo 78, Department of Artificial
Intelligence, University of Edinburgh, 1974.
[84] Z. Huang, C. Sun, A. Sattar, and W.-J. Lei. Parallel Logic Programming on Distributed
Shared Memory System. In Proceedings of the IEEE International Conference on In-
telligent Processing Systems, October 1997.
[85] J. Jaffar and S. Michaylov. Methodology and implementation of a CLP system. In
J.-L. Lassez, editor, Proceedings of the Fourth International Conference on Logic Pro-
gramming, MIT Press Series in Logic Programming, pages 196–218. University of Mel-
bourne, MIT Press, May 1987.
[86] S. Janson and S. Haridi. Programming Paradigms of the Andorra Kernel Language. In
Logic Programming: Proceedings of the International Logic Programming Symposium,
pages 167–186. MIT Press, October 1991.
[87] G. Janssens, B. Demoen, and A. Marien. Improving the register allocation of WAM by
recording unification. In ICLP88, pages 1388–1402, 1988.
[88] P. Kacsuk. A Highly Parallel Prolog Interpreter Based on the Generalised Data Flow
Model. In S.-Å. Tärnlund, editor, Proceedings of the Second International Logic Pro-
gramming Conference, pages 195–205, Uppsala University, Uppsala, Sweden, 1984.
[89] P. Kacsuk and M. J. Wise, editors. Implementations of Distributed Prolog. Wiley, Series
in Parallel Computing, 1992.
[90] L. V. Kale. The REDUCE OR process model for parallel execution of logic program-
ming. The Journal of Logic Programming, 11(1), July 1991.
[91] Y. Kimura and T. Chikayama. An Abstract KL1 Machine and its Instruction Set. In In-
ternational Symposium on Logic Programming, pages 468–477. San Francisco, IEEE
Computer Society, August 1987.
[92] S. Kliger and E. Shapiro. A Decision Tree Compilation Algorithm for FCP(|,:,?). In
Proceedings of the Fifth International Conference and Symposium on Logic Programming,
pages 1315–1336. MIT Press, August 1988.
[93] S. Kliger and E. Shapiro. From Decision Trees to Decision Graphs. In Proceedings
of the North American Conference on Logic Programming, pages 97–116. MIT Press,
October 1990.
[94] F. Kluzniak. Developing applications for Aurora. Technical Report TR-90-17, Univer-
sity of Bristol, Computer Science Department, August 1990.
[95] P. Koves and P. Szeredi. Collection of Papers on Logic Programming, chapter Getting
the Most Out of Structure-Sharing. SZKI, November 1993.
[96] R. A. Kowalski. Logic for Problem Solving. Elsevier North-Holland Inc., 1979.
[97] A. Krall. The Vienna abstract machine. The Journal of Logic Programming, 1-3, Octo-
ber 1996.
[98] H. Kuchen, R. Loogen, J. J. Moreno-Navarro, and M. Rodríguez-Artalejo. The Func-
tional Logic Language BABEL and Its Implementation on a Graph Machine. New
Generation Computing, 14(4):391–427, 1996.
[99] B. Le Charlier and P. Van Hentenryck. Experimental evaluation of a generic abstract
interpretation algorithm for PROLOG. ACM TOPLAS, 16(1):35–101, January 1994.
[100] X. Li. A new term representation method for Prolog. The Journal of Logic Program-
ming, 34(1):43–57, January 1998.
[101] Y.-J. Lin and V. Kumar. AND-parallel execution of logic programs on a shared-memory
multiprocessor. The Journal of Logic Programming, 10(1,2,3 and 4):155–178, 1991.
[102] Z. Lin. Self-organizing task scheduling for parallel execution of logic programs. In
Proceedings of the International Conference on Fifth Generation Computer Systems,
pages 859–868, ICOT, Japan, 1992. Association for Computing Machinery.
[103] T. Lindgren. Polyvariant detection of uninitialized arguments of prolog predicates. The
Journal of Logic Programming, 28(3), September 1997.
[104] R. Lopes and V. Santos Costa. The BEAM: Towards a first EAM Implementation.
In Workshop on Parallelism and Implementation Technology for (Constraint) Logic
Programming Languages, Port Jefferson, October 1997.
[105] L. Lu. Polymorphic type analysis in logic programs by abstract interpretation. The
Journal of Logic Programming, 36(1), July 1998.
[106] E. Lusk, R. Butler, T. Disz, R. Olson, R. Overbeek, R. Stevens, D. H. D. Warren,
A. Calderwood, P. Szeredi, S. Haridi, P. Brand, M. Carlsson, A. Ciepelewski, and
B. Hausman. The Aurora or-parallel Prolog system. New Generation Computing,
7(2,3):243–271, 1990.
[107] A. Marien. Improving the Compilation of Prolog in the Framework of the Warren Ab-
stract Machine. PhD thesis, Katholiek Universiteit Leuven, September 1993.
[108] A. Marien and B. Demoen. On the Management of Choicepoint and Environment
Frames in the WAM. In E. L. Lusk and R. A. Overbeek, editors, Proceedings of
the North American Conference on Logic Programming, pages 1030–1050, Cleveland,
Ohio, USA, 1989.
[109] A. Marien and B. Demoen. A new scheme for unification in WAM. In V. Saraswat and
K. Ueda, editors, Logic Programming, Proceedings of the 1991 International Sympo-
sium, pages 257–271, San Diego, USA, 1991. The MIT Press.
[110] A. Marien, G. Janssens, A. Mulkers, and M. Bruynooghe. The impact of abstract inter-
pretation: an experiment in code generation. In Proceedings of the Sixth International
Conference on Logic Programming, pages 33–47. MIT Press, June 1989.
[111] K. Marriott and P. J. Stuckey. Programming with Constraints: An Introduction. MIT
Press, 1998.
[112] H. Masukawa, K. Kumon, A. Itashiki, K. Satoh, and Y. Sohma. ”Kabu-Wake” Parallel
Inference Mechanism and Its Evaluation. In 1986 Proceedings Fall Joint Computer
Conference, pages 955–962. IEEE Computer Society Press, November 1986.
[113] L. Matyska, A. Jergova, and D. Toman. Register allocation in WAM. In K. Furukawa,
editor, Proceedings of the Eighth International Conference on Logic Programming,
pages 142–156, Paris, France, 1991. The MIT Press.
[114] M. Mehl, R. Scheidhauer, and C. Schulte. An Abstract Machine for Oz. Research Re-
port RR-95-08, Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzen-
hausweg 3, D-66123 Saarbrücken, Germany, June 1995. Also in: Proceedings of
PLILP'95, Springer-Verlag, LNCS, Utrecht, The Netherlands.
[115] C. S. Mellish. The Automatic Generation of Mode Declarations for Prolog Programs.
DAI Research Paper 163, Department of Artificial Intelligence, Univ. of Edinburgh,
August 1981.
[116] D. A. Miller and G. Nadathur. Higher-order logic programming. In E. Shapiro, edi-
tor, Proceedings of the Third International Conference on Logic Programming, Lecture
Notes in Computer Science, pages 448–462, London, 1986. Springer-Verlag.
[117] L. Monteiro and A. Porto. Contextual logic programming. In Proceedings of the Sixth
International Conference on Logic Programming, pages 284–299. MIT Press, June
1989.
[118] J. Montelius and K. A. M. Ali. An And/Or-Parallel Implementation of AKL. New
Generation Computing, 14(1), 1996.
[119] J. Montelius and P. Magnusson. Using SimICS to Evaluate the Penny System. In
Proceedings of the 1997 International Logic Programming Symposium, October 1997.
[120] R. Moolenaar and B. Demoen. A parallel implementation for AKL. In Proceedings
of the Programming Language Implementation and Logic Programming: PLILP ’93,
Tallin, Estonia, pages 246–261, 1993.
[121] R. Moolenaar and B. Demoen. Hybrid tree search in the Andorra Model. In P. V.
Hentenryck, editor, Proceedings of the Eleventh International Conference on Logic Pro-
gramming, pages 110–123. MIT Press, June 1994.
[122] A. Mulkers, W. Winsborough, and M. Bruynooghe. Live-structure dataflow analysis for
Prolog. ACM TOPLAS, 16(2):205–258, March 1994.
[123] K. Muthukumar and M. V. Hermenegildo. The CDG, UDG, and MEL Methods
for Automatic Compile-time Parallelization of Logic Programs for Independent And-
parallelism. In Proceedings of the Seventh International Conference on Logic Program-
ming, pages 221–237. MIT Press, June 1990.
[124] L. Naish. Negation and Control in Prolog. Lecture Notes in Computer Science 238.
Springer-Verlag, 1985.
[125] L. Naish. Parallelizing NU-Prolog. In Proceedings of the Fifth International Conference
and Symposium on Logic Programming, pages 1546–1564. MIT Press, August 1988.
[126] Z. Nemeth and P. Kacsuk. Experiments with Binding Schemes in LOGFLOW. In
Proceedings of Europar 1998, Southampton, UK, 1998.
[127] I. W. Olthof. An Optimistic AND-Parallel Prolog Implementation. Master’s thesis,
Department of Computer Science, University of Calgary, 1991.
[128] T. Ozawa, A. Hosoi, and A. Hattori. Generation Type Garbage Collection for Parallel
Logic Languages. In Proceedings of the North American Conference on Logic Program-
ming, pages 291–305. MIT Press, October 1990.
[129] D. Palmer. The DAM: A Parallel Implementation of the AKL. Presented at the ILPS
workshop on Parallel Logic Programming, October 1991.
[130] D. Palmer and L. Naish. NUA-Prolog: an Extension to the WAM for Parallel Andorra.
In K. Furukawa, editor, Proceedings of the Eighth International Conference on Logic
Programming. MIT Press, 1991.
[131] L. M. Pereira, L. Monteiro, J. Cunha, and J. N. Aparício. Delta Prolog: a distributed
backtracking extension with events. In E. Shapiro, editor, Third International Confer-
ence on Logic Programming, London, pages 69–83. Springer-Verlag, 1986.
[132] E. Pontelli and G. Gupta. Data and-parallel logic programming in &ACE. In 7th IEEE
Symposium on Parallel and Distributed Processing. IEEE Computer Society, 1995.
[133] E. Pontelli and G. Gupta. Dependent and Extended Dynamic Dependent And-
parallelism in ACE. Journal of Functional and Logic Programming, (to appear).
[134] E. Pontelli, G. Gupta, and M. Hermenegildo. &ACE: A High-Performance Parallel Pro-
log System. In International Parallel Processing Symposium. IEEE Computer Society
Technical Committee on Parallel Processing, IEEE Computer Society, April 1995.
[135] E. Pontelli, G. Gupta, M. Hermenegildo, M. Carro, and D. Tang. Efficient Implementa-
tion of And-Parallel Logic Programming Systems. Computer Languages, 22(2/3), 1996.
[136] K. Popov. A Parallel Abstract Machine for the Thread-Based Concurrent Language
Oz. In 1997 Post ILPS Workshop on Parallelism and Implementation Technology for
(Constraint) Logic Programming, 1997.
[137] I. V. Ramakrishnan, P. Rao, K. Sagonas, T. Swift, and D. S. Warren. Efficient Tabling
Mechanisms for Logic Programs. In L. Sterling, editor, Proceedings of the 12th Inter-
national Conference on Logic Programming, pages 687–711, Tokyo, Japan, June 1995.
The MIT Press.
[138] B. Ramkumar and L. Kale. Compiled Execution of the Reduce-OR Process Model on
Multiprocessors. In Proceedings of the North American Conference on Logic Program-
ming, pages 313–331. MIT Press, October 1989.
[139] R. Rocha, F. Silva, and V. Santos Costa. On Applying Or-Parallelism to Tabled Evalua-
tions. In Post-ICLP’97 Workshop on Tabling in Logic Programming, Leuven, Belgium,
July 1997. Also published as Technical Report DCC-97-2, DCC - FC & LIACC, Univer-
sidade do Porto, April, 1997.
[140] R. Rocha, F. Silva, and V. Santos Costa. YapOr: an Or-Parallel Prolog System based
on Environment Copying. Technical report, DCC-97-14, DCC - FC & LIACC, Univer-
sidade do Porto, December 1997. (submitted for publication).
[141] K. Rokusawa, A. Nakase, and T. Chikayama. Distributed memory implementation of
KLIC. New Generation Computing, 14(3):261–280, 1996.
[142] V. Santos Costa and R. Bianchini. Optimising Parallel Logic Programming Systems
for Scalable Machines. In Proceedings of Europar 1998, Southampton, UK, 1998.
[143] V. Santos Costa, R. Bianchini, and I. C. Dutra. Parallel Logic Programming Systems
on Scalable Multiprocessors. In Proceedings of the 2nd International Symposium on
Parallel Symbolic Computation, PASCO’97, pages 58–67, July 1997.
[144] V. Santos Costa, M. E. Correia, and F. Silva. Performance of Sparse Binding Arrays
for Or-Parallelism. In Proceedings of the VIII Brazilian Symposium on Computer Ar-
chitecture and High Performance Processing – SBAC-PAD, August 1996.
[145] V. Santos Costa, D. H. D. Warren, and R. Yang. Andorra-I: A Parallel Prolog System
that Transparently Exploits both And- and Or-Parallelism. In Third ACM SIGPLAN
Symposium on Principles & Practice of Parallel Programming PPOPP, pages 83–93.
ACM press, April 1991. SIGPLAN Notices vol 26(7), July 1991.
[146] V. Santos Costa, D. H. D. Warren, and R. Yang. The Andorra-I Engine: A parallel
implementation of the Basic Andorra model. In Proceedings of the Eighth International
Conference on Logic Programming, pages 825–839. MIT Press, June 1991.
[147] V. Santos Costa, D. H. D. Warren, and R. Yang. The Andorra-I Preprocessor: Support-
ing full Prolog on the Basic Andorra model. In Proceedings of the Eighth International
Conference on Logic Programming, pages 443–456. MIT Press, June 1991.
[148] V. Santos Costa, D. H. D. Warren, and R. Yang. Andorra-I Compilation. New Genera-
tion Computing, 14(1), 1996.
[149] V. A. Saraswat. Partial Correctness Semantics for CP[↓,|,&,;]. In Proceedings of the
Foundations of Software Technology and Theoretical Computer Science Conference,
pages 347–368, December 1985.
[150] V. A. Saraswat, K. Kahn, and J. Levy. Janus: A step towards distributed constraint
programming. In S. Debray and M. Hermenegildo, editors, Proceedings of the 1990
North American Conference on Logic Programming, pages 431–446, Cambridge, Mas-
sachusetts London, England, 1990. MIT Press.
[151] M. Sato and A. Goto. Evaluation of the KL1 Parallel System on a Shared Memory
Multiprocessor. In IFIP Working Conference on Parallel Processing, pages 305–318.
Pisa, North Holland, May 1988.
[152] M. Sato, H. Shimizu, A. Matsumoto, K. Rokusawa, and A. Goto. KL1 Execution Model
for PIM Cluster with Shared Memory. In J.-L. Lassez, editor, Proceedings of the Fourth
International Conference on Logic Programming, MIT Press Series in Logic Program-
ming, pages 338–355. University of Melbourne, MIT Press, May 1987.
[153] E. Shapiro. A Subset of Concurrent Prolog and Its Interpreter. In E. Shapiro, editor,
Concurrent Prolog: Collected Papers, pages 27–83. MIT Press, Cambridge MA, 1987.
[154] E. Shapiro. Concurrent Prolog: Collected Papers. MIT Press, 1987.
[155] E. Shapiro. The family of Concurrent Logic Programming Languages. ACM computing
surveys, 21(3):412–510, 1989.
[156] K. Shen. Studies of AND/OR Parallelism in Prolog. PhD thesis, University of Cam-
bridge, 1992.
[157] K. Shen. Initial Results from the Parallel Implementation of DASWAM. In M. Maher,
editor, Proceedings of the 1996 Joint International Conference and Symposium on Logic
Programming. The MIT Press, 1996.
[158] K. Shen. Overview of DASWAM: Exploitation of Dependent And-parallelism. J. of
Logic Prog., 29(1–3), 1996.
[159] K. Shen. A New Implementation Scheme for Combining And/Or Parallelism. In 1997
Post ILPS Workshop on Parallelism and Implementation Technology for (Constraint)
Logic Programming, 1997.
[160] F. M. A. Silva. Implementations of Logic Programming Systems, chapter Or-
Parallelism on Distributed Shared Memory Architectures. Kluwer Academic Pub.,
1994.
[161] R. Sindaha. The Dharma Scheduler – Definitive Scheduling in Aurora on
Multiprocessor Architecture. In PDP '92, pages 296–303. IEEE, November 1992.
[162] G. Smolka. The Oz programming model. In J. van Leeuwen, editor, Computer Science
Today, Lecture Notes in Computer Science, vol. 1000, pages 324–343. Springer-Verlag,
Berlin, 1995.
[163] Z. Somogyi, F. Henderson, and T. Conway. The execution algorithm of Mercury, an
efficient purely declarative logic programming language. The Journal of Logic Pro-
gramming, 1-3, October 1996.
[164] Z. Somogyi, K. Ramamohanarao, and J. Vaghani. A Stream AND-Parallel Execution
Algorithm with Backtracking. In R. A. Kowalski and K. A. Bowen, editors, Logic Pro-
gramming: Proceedings of the Fifth International Conference and Symposium, Volume
2, pages 1142–1159. The MIT Press, 1988.
[165] R. M. Stallman. Using and porting gcc. Technical report, The Free Software Founda-
tion, 1993.
[166] P. Szeredi. Performance analysis of the Aurora or-parallel Prolog system. In Proceed-
ings of the North American Conference on Logic Programming, pages 713–732. MIT
Press, October 1989.
[167] P. Szeredi. Using Dynamic Predicates in an Or-Parallel Prolog System. In Logic Pro-
gramming: Proceedings of the International Logic Programming Symposium, pages
355–371. MIT Press, October 1991.
[168] P. Szeredi, M. Carlsson, and R. Yang. Interfacing Engines and Schedulers in OR-
Parallel Prolog Systems. In PARLE91: Conference on Parallel Architectures and Lan-
guages Europe, volume 2, pages 439–453. Springer Verlag, June 1991.
[169] P. Szeredi and Z. Farkas. Handling large knowledge bases in parallel Prolog. In Work-
shop on High Performance Logic Programming Systems, European Summer School on
Logic, Language, and Information, August 1996.
[170] A. Takeuchi. Parallel Logic Programming. PhD thesis, University of Tokyo, July 1990.
[171] P. Tarau. An Efficient Specialization of the WAM for Continuation Passing Binary
programs. In Proceedings of the 1993 ILPS Conference, Vancouver, Canada, 1993. MIT
Press. poster.
[172] P. Tarau, K. de Bosschere, and B. Demoen. Partial translation: towards a portable and
efficient Prolog implementation technology. The Journal of Logic Programming, 1-3,
October 1996.
[173] A. Taylor. Removal of Dereferencing and Trailing in Prolog Compilation. In Proceed-
ings of the Sixth International Conference on Logic Programming, pages 49–60. MIT
Press, June 1989.
[174] A. Taylor. LIPS on a MIPS: Results from a Prolog Compiler for a RISC. In Proceedings
of the Seventh International Conference on Logic Programming, pages 174–185. MIT
Press, June 1990.
[175] A. Taylor. Parma–bridging the performance gap between imperative and logic pro-
gramming. The Journal of Logic Programming, 1-3, October 1996.
[176] H. Tebra. Optimistic And-Parallelism in Prolog. In PARLE: Parallel Architectures and
Languages Europe, Volume II, pages 420–431. Springer-Verlag, 1987. Published as
Lecture Notes in Computer Science 259.
[177] E. Tick. Parallel Logic Programming. MIT Press, 1991.
[178] E. Tick and C. Banerjee. Performance evaluation of Monaco compiler and runtime
kernel. In ICLP93, pages 757–773, 1993.
[179] E. Tick and J. A. Crammond. Comparison of Two Shared-Memory Emulators for Flat
Committed–Choice Logic Programs. In International Conference on Parallel Process-
ing, volume 2, pages 236–242, Penn State, August 1990.
[180] E. Tick and M. Korsloot. Determinacy testing for nondeterminate logic programming
languages. ACM TOPLAS, 16(1):3–34, January 1994.
[181] E. Tick and G. Succi, editors. Implementations of Logic Programming Systems. Kluwer
Academic Pub., 1994.
[182] K. Ueda. Guarded Horn Clauses. In E. Shapiro, editor, Concurrent Prolog: Collected
Papers, pages 140–156. MIT Press, Cambridge MA, 1987.
[183] K. Ueda and T. Chikayama. Design of the Kernel Language for the Parallel Inference
Machine. Computer Journal, December 1990.
[184] M. H. van Emden and G. J. de Lucena Filho. Predicate Logic as a Language for Parallel
Programming. In K. L. Clark and S.-Å. Tärnlund, editors, Logic Programming, pages
189–198. Academic Press, London, 1982.
[185] P. Van Roy. Can Logic Programming Execute as Fast as Imperative Programming?
PhD thesis, University of California at Berkeley, November 1990.
[186] P. Van Roy. 1983-1993: The Wonder Years of Sequential Prolog Implementation. The
Journal of Logic Programming, 19/20, May/July 1994.
[187] D. H. D. Warren. Implementing Prolog - Compiling Predicate Logic Programs. Techni-
cal Report 39 and 40, Department of Artificial Intelligence, University of Edinburgh,
1977.
[188] D. H. D. Warren. An Abstract Prolog Instruction Set. Technical Note 309, SRI Inter-
national, 1983.
[189] D. H. D. Warren. The SRI model for or-parallel execution of Prolog—abstract design
and implementation issues. In Proceedings of the 1987 Symposium on Logic Program-
ming, pages 92–102, 1987.
[190] D. H. D. Warren. The Andorra model. Presented at Gigalips Project workshop, Uni-
versity of Manchester, March 1988.
[191] D. H. D. Warren. Extended Andorra model. PEPMA Project workshop, University of
Bristol, October 1989.
[192] D. H. D. Warren, L. M. Pereira, and F. C. N. Pereira. Prolog—The Language and its
Implementation Compared with Lisp. ACM SIGPLAN Notices, 12(8):109–115, 1977.
[193] D. S. Warren. Efficient Prolog Memory Management for Flexible Control Strategies.
New Generation Computing, 2:361–369, 1984.
[194] M. J. Wise. Prolog Multiprocessors. Prentice-Hall, 1986.
[195] R. Yang. P-Prolog a Parallel Logic Programming Language. World Scientific, 1987.
[196] R. Yang, T. Beaumont, I. Dutra, V. Santos Costa, and D. H. D. Warren. Performance
of the Compiler-based Andorra-I System. In Proceedings of the Tenth International
Conference on Logic Programming, pages 150–166. MIT Press, June 1993.