

Journal of Automated Reasoning 25: 167–217, 2000. © 2000 Kluwer Academic Publishers. Printed in the Netherlands.


Ordered Semantic Hyper-Linking*

DAVID A. PLAISTED and YUNSHAN ZHU
Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3175. e-mail: {plaisted|zhu}@cs.unc.edu

(Received: 8 June 1998; accepted: 31 January 1999)

Abstract. The ordered semantic hyper-linking strategy is complete for first-order logic and accepts a user-specified natural semantics that guides the search for a proof. Any semantics in which the meanings of the function and predicate symbols are computable on ground terms may be used. This instance-based strategy is efficient on near-propositional problems, is goal sensitive, and has an extension to equality and term rewriting. However, it sometimes has difficulty generating large terms. We compare this strategy with some others that use semantic information, and present a proof of soundness and completeness. We also give some theoretical results about the search efficiency of the strategy. Some examples illustrate the performance of the strategy.

Key words: hyper-linking, semantics, automated theorem proving.

1. Introduction

The goal of automatic theorem proving is to develop efficient automatic or semi-automatic methods of searching for proofs of theorems in formal logical systems. If this could be done, it would clearly have tremendous applications in many areas of mathematics and science in general. The existence of automatic theorem provers is a consequence of the syntactic properties of formal logic, which permit an algorithmic test for whether a rule of inference has been applied correctly or not.

However, formal logic includes not only syntax but also semantics. Syntax specifies which strings of symbols are allowable in a logic. Even the description of rules of inference in a logic is essentially syntactic in nature. A semantics for a statement or logic specifies the meanings of the symbols in the statement as functions and predicates over some set or sets of elements. The collection of such meanings is called an interpretation of the statement or formula. Although humans often use semantics in proving theorems, most computer theorem provers are entirely syntactic in approach. When proving a theorem about groups, a human will typically think about examples of groups and whether they satisfy the theorem or not. A human will typically consider the set of all groups in some sequential order, dividing this set into subcases in some organized fashion until all groups have been considered and the theorem is shown to be true for all of them. A mechanical theorem prover, on the other hand, will typically deal only with symbols and low-level operations on them. Such a prover will perform the same inferences on the same symbols, regardless of whether these symbols refer to groups or to pumpkins. This suggests the possibility that some of the power of human theorem proving comes from the use of semantics and that in order to obtain more powerful automated theorem provers, methods that depend not only on syntax but also on semantics might be necessary.

* This research was partially supported by the National Science Foundation under grant CCR-9108904.

For a long time the first author has been searching for a way to integrate meaningful semantic information into the theorem-proving process. He has also been attempting to combine this use of semantics with other desirable properties in a single theorem-proving strategy. Of course, others have also recognized the importance of semantics in theorem proving and have attempted to integrate semantics into theorem proving. Recently, we have succeeded in developing and implementing a semantic theorem-proving strategy that we believe has considerable potential.

We now present this ordered semantic hyper-linking strategy, OSHL, which makes use of meaningful semantic information to guide the search for a proof and has some other significant advantages over existing strategies. It is the first theorem-proving strategy we are aware of that combines the following elements:

• Equality and term rewriting
• Propositional efficiency
• Goal sensitivity
• Natural semantics

We will define and discuss each of these later. OSHL is an instance-based strategy. This means that to show a set S of clauses is unsatisfiable, it first generates a set T of ground instances of the clauses in S and then shows that T is propositionally unsatisfiable. By Herbrand’s theorem, if S is unsatisfiable, then such a set T always exists. This approach has the advantage that it can exploit efficient propositional theorem-proving methods such as Davis and Putnam’s method [21, 22] but has the disadvantage that one needs an efficient method for generating the instances. The first author has been involved in the development of a number of instance-based strategies. The main elements of OSHL were presented in [43], but without an implementation.
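The instance-based idea can be illustrated with a minimal sketch in Python. This is not the OSHL procedure: it blindly enumerates every ground instance over a small fixed set of constants and uses a brute-force truth-table test as a stand-in for Davis and Putnam's method. The clause encoding, the function names, and the example clause set are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy encoding: a literal is (sign, predicate, args), where
# args is a tuple of variable or constant names; a clause is a list of literals.

def ground_instances(clause, variables, constants):
    """Yield every ground instance of a clause over a finite set of constants."""
    for values in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, values))
        yield frozenset(
            (sign, pred, tuple(theta.get(a, a) for a in args))
            for sign, pred, args in clause
        )

def propositionally_unsatisfiable(ground_clauses):
    """Brute-force truth-table test (a stand-in for Davis and Putnam's method)."""
    atoms = sorted({(p, args) for c in ground_clauses for _, p, args in c})
    for bits in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, bits))
        if all(any(model[(p, args)] == sign for sign, p, args in c)
               for c in ground_clauses):
            return False  # this assignment satisfies every ground clause
    return True

# S = { P(x),  ¬P(x) ∨ Q(x),  ¬Q(a) }, each clause paired with its variables
S = [
    ([(True, "P", ("x",))], ["x"]),
    ([(False, "P", ("x",)), (True, "Q", ("x",))], ["x"]),
    ([(False, "Q", ("a",))], []),
]
T = {g for clause, vs in S for g in ground_instances(clause, vs, ["a", "b"])}
print(propositionally_unsatisfiable(T))
```

Here T contains five ground clauses, and the instances P(a), ¬P(a) ∨ Q(a), and ¬Q(a) already suffice for propositional unsatisfiability. The sketch also makes the stated disadvantage visible: the number of instances grows quickly with the number of variables and ground terms, which is why a guided method of generating instances is needed.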

In addition to the above features, this prover also has a replacement rule facility for set theory and for definitions in general, and has implicit typing and a UR-resolution component. We are not aware of any other prover that has all of these features. Of course, this strategy also has some disadvantages, and one cannot expect one strategy to be optimal for all theorems.

We now give brief definitions of the above elements of OSHL. Equality and term rewriting refer to efficient methods for handling the equality predicate; these typically involve replacement of equals by equals by term rewriting, as well as some form of completion. Many provers have such capabilities, typically implemented by paramodulation [49]. The propositional efficiency of a prover refers to the efficiency of the method on propositional problems, particularly those that are highly non-Horn. Goal sensitivity refers to the ability of the prover to concentrate its work on those clauses that refer to the particular theorem being proved, as opposed to those clauses that encode general axioms. Natural semantics refers to the ability of the prover to incorporate meaningful semantic information into its search; by “meaningful” we mean that any computable functions and predicates can be specified as the semantics of the function and predicate symbols in the input clauses, and the prover will still be reasonably efficient.
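To make the notion of a computable semantics concrete, the following Python sketch evaluates ground terms, literals, and clauses under a user-supplied interpretation. The domain (integers modulo 3), the symbols e, a, f, g, and P, and all function names are hypothetical choices for illustration; OSHL itself is described as taking such meanings as Prolog programs.

```python
# Hypothetical interpretation over the integers modulo 3.
# Ground terms are tuples (symbol, arg1, ..., argn); constants have no args.
FUNCS = {
    "e": lambda: 0,                    # constant e denotes 0
    "a": lambda: 1,                    # constant a denotes 1
    "f": lambda x, y: (x + y) % 3,     # f denotes addition mod 3
    "g": lambda x: (-x) % 3,           # g denotes additive inverse mod 3
}
PREDS = {"P": lambda x, y: x == y}     # P denotes equality on the domain

def eval_term(term):
    """Compute the domain element denoted by a ground term."""
    symbol, *args = term
    return FUNCS[symbol](*(eval_term(arg) for arg in args))

def literal_true(literal):
    """A literal is (positive, predicate, args); return its truth value."""
    positive, pred, args = literal
    return PREDS[pred](*(eval_term(arg) for arg in args)) == positive

def clause_true(clause):
    """A ground clause is true if at least one of its literals is true."""
    return any(literal_true(lit) for lit in clause)

a, e = ("a",), ("e",)
atom_args = (("f", a, ("g", a)), e)            # P(f(a, g(a)), e)
print(clause_true([(True, "P", atom_args)]))   # clause P(f(a, g(a)), e)
print(clause_true([(False, "P", atom_args)]))  # clause ¬P(f(a, g(a)), e)
```

Under this interpretation the first clause is true and the second is false, so the second is a ground clause that contradicts the interpretation.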

A disadvantage of OSHL is that it sometimes has trouble generating large terms, because it instantiates the input clauses to ground clauses and uses a constrained enumeration procedure to find these ground instances of the input clauses. For theorems with large terms and short proofs, especially on non-Horn clauses, one might want to use resolution or model elimination or some similar strategy instead. However, OSHL has several features that minimize the difficulty of generating large terms. Later we will discuss this problem of generating large terms and some ways in which OSHL alleviates it.

Many provers involve strategies that are especially adapted to Horn clauses, or that perform well on them. For example, Prolog with caching gives efficiency and goal sensitivity on Horn clauses, and model elimination can simulate this approach well. The first author also believed initially that efficiency on Horn clauses was essential for an efficient theorem prover. However, he now feels that this is not such a central issue overall. Thus OSHL does not have special mechanisms (other than UR-resolution, which is not goal sensitive) for handling Horn clauses. Therefore, for Horn clauses, one might want to use a model elimination-type strategy instead, especially if the term sizes of a proof may be large. But in fact it turns out that OSHL performs well on Horn clauses because of the way semantics works on them, assuming the term sizes are not too big. The reason for this is that on Horn clauses, OSHL behaves a lot like the geometry theorem prover of Gelernter et al. [24], which used geometric diagrams and backward chaining with Horn-clause-like inference rules to guide its search.

The OSHL prover has been implemented in Eclipse Prolog and is currently available on the World-Wide Web at the URL http://www.cs.unc.edu/zhu/prover.html. This Web page permits a user to enter theorems and call OSHL to attempt to prove them, with a 300-second time limit. A few flags are also available on this Web page to adjust the strategy used. The prover may also be downloaded.

2. Existing Strategies

For a new strategy to have a purpose, it must possess some advantages not already present in existing approaches. Therefore we now consider the extent to which a number of well-known strategies possess the features of OSHL, particularly the first four mentioned above, and avoid its disadvantages. The existing strategies we consider are resolution; the set of support restriction of resolution; semantic resolution; model elimination; the methods of Caferra, Peltier, and Zabel [15, 17, 18, 40]; and that of Ganzinger, Meyer, and Weidenbach [25]. We also consider the simplified and modified problem reduction formats [41, 42], clause linking [34], and clause linking with semantics (CLIN-S) [14], since these are previous provers in Plaisted’s research program. For a general discussion of these strategies and first-order theorem proving in general, see [13, 28, 33, 53].

In general, all of the strategies we consider below except for CLIN-S use unification, not enumeration, to generate large terms, and thus avoid the problems associated with enumeration. Only model elimination and the simplified and modified problem reduction formats have special facilities for Horn clauses, and none of the strategies listed below are propositionally efficient except clause linking, clause linking with semantics, and possibly the approach of Ganzinger, Meyer, and Weidenbach [25], which actually has considerable similarities with OSHL. For some evidence of the propositional inefficiency of many strategies, see [34] and [45]. To assess the significance of propositional efficiency on some problems, we encourage readers to try various provers on pigeonhole problems and various logic puzzles such as the zebra problem.

We first mention the geometry theorem prover of Gelernter et al. [24], which used semantics in the form of geometry diagrams to guide its application of geometric constructions. This was one of the earliest automatic theorem provers using semantics. However, its proof style is limited to Horn-clause-like rules of inference.

The resolution rule of Robinson [47], one of the simplest and most common inference rules for theorem proving, has a number of significant advantages. But it is not propositionally efficient or goal sensitive and does not use semantics. However, it does have an excellent method of incorporating equality and term rewriting using paramodulation and demodulation. The same comments apply to ordered resolution (that is, resolution with an A-ordering [33]). In particular, ordered resolution is not propositionally efficient. However, it has a number of significant advantages that we would like to exploit. A recent modification of resolution by Slaney [51] does use natural semantics, and may be goal sensitive, but it is not complete, nor is it propositionally efficient. However, it performs very well on certain problems.

The set of support refinement of resolution [54] is goal sensitive and can be viewed as incorporating natural semantics, to some extent, if the latter is used to choose the set of support. However, it is not propositionally efficient and may not permit a complete method of incorporating term rewriting and equality. In general, this strategy is much like model elimination in its properties.

There is another refinement of resolution, called semantic resolution [50], that is to some extent a generalization of hyperresolution to arbitrary interpretations. This strategy is goal sensitive and uses natural semantics. However, it is not propositionally efficient and may be difficult to combine with term rewriting and paramodulation in a complete manner.

Model elimination [32] is goal sensitive, which probably helps to explain its success in theorem prover competitions. It also restricts the order in which literals “resolve” away, unlike set of support resolution. It does not appear to be propositionally efficient, nor does it permit a complete combination with term rewriting and paramodulation except via some form of Brand’s modification method [12]. It also does not incorporate natural semantics, except possibly in its choice of a starting clause.

Some early work emphasizing the importance of semantics is found in [1, 10]. Both of these papers emphasize the use of semantics to provide counterexamples and find proofs in topology and analysis in a way similar to that used by humans in finding mathematical proofs. They both argue for the importance of natural semantics and attempt to mechanize the process of example (semantics) generation as far as possible. The second paper, emphasizing the use of semantics to guide the instantiation of set variables, has some formal similarity to the instantiation process used in OSHL.

The simplified problem reduction format [41] can make use of a limited form of natural semantics, but does not appear to be propositionally efficient. It is also not goal sensitive. The modified problem reduction format [42] has similar properties. However, there is a semantic version of the modified problem reduction format [39] that can incorporate natural semantics and is goal sensitive. But neither strategy has a straightforward, efficient method to incorporate equality and term rewriting.

Clause linking (CLIN) [34] is propositionally efficient but does not have a good way to incorporate equality and term rewriting; at least, no such combination has been implemented yet. There is a version of clause linking (user support) that is goal sensitive, but this version does not seem to be efficient in practice. Clause linking with semantics (CLIN-S) [14] is also propositionally efficient and permits the incorporation of a limited form of natural semantics. In particular, it can make use of a semantics with a finite domain or one that can be expressed using linear inequalities. This restriction on the semantics permits CLIN-S to use constraints to guide the instantiation of clauses. But CLIN-S lacks some of the other techniques that make instantiation efficient in OSHL. Also, CLIN-S does not incorporate equality and term rewriting. Nor is it compatible with ordered resolution in the same way that OSHL is. Despite its limitations, CLIN-S does perform remarkably well on certain problems. CLIN-S actually has many similarities to OSHL, except for its failure to use an ordering on literals.

Caferra, Peltier, and Zabel have developed a number of interesting and original techniques for using semantics in theorem proving [15, 17, 18, 40] that are based on constraints. These techniques involve inference rules for detecting unsatisfiability, as well as rules for finding models, combined into one system. Both sets of rules are applied to a set of clauses until either a refutation is obtained or no more rules can be applied; in the latter case, one knows that the formula is satisfiable, and one can often obtain a model from the set of remaining clauses in this case. Of course, it is also possible that the method will not terminate. These methods have been embodied in the RAMC (refutation and model construction) system and have been applied to abduction [16] and other areas. This approach can be very effective at finding models. RAMC avoids the need to enumerate terms by its use of constraints. The more powerful the constraint language, the more models this approach can find; see, for example, an extension to terms with exponents in [40]. In fact, it might be possible to build a similar approach for avoiding enumeration into OSHL, but it would introduce some nondeterminism into the strategy. Such an approach would entail performing an OSHL-like proof search at the nonground level and carrying along constraints on the ground terms that could replace the variables. If the constraints become too large, their evaluation can become time consuming. OSHL already has nondeterminism in the choice of the ground instances used in the proof, but handles this by including all selected instances in the same proof attempt. Another difference in the approach of Caferra, Peltier, and Zabel is that instead of accepting a semantics provided by the user at the beginning, the RAMC approach generates a semantics automatically in many cases. In [15], this semantics is used to prune the proof search, in a spirit somewhat similar to OSHL. RAMC can handle special inference rules for equality; an extension of the methods of Caferra, Peltier, and Zabel to permit paramodulation in the refutation part is given in [2].

The RAMC approach is probably much better than OSHL at detecting satisfiability of clause sets. However, RAMC may not be propositionally efficient, since it is based on resolution as the inference rule for finding refutations. Some versions of RAMC are goal sensitive, but most are not, since most versions of RAMC do not use the model to prune the search in a goal-sensitive manner. Also, OSHL often has to construct thousands of ground instances in order to find proofs; even though RAMC clauses are more general, and therefore it may not need as many of them, it is not clear how efficiently RAMC could handle such a large number of clauses. Still, without more testing, one cannot give a definite comparison of the two approaches, since they are so different.

Ganzinger, Meyer, and Weidenbach [25] have developed a refinement of ordered resolution that is very similar to OSHL, which may have been developed about the same time or a little later than that described in [44]. However, each method has advantages not possessed by the other. Because of the similarities between the methods, we devote special attention to a comparison between OSHL and the method of Ganzinger, Meyer, and Weidenbach.

Formally, the methods have many similarities, especially on ground clauses. Both methods specify a minimal model of a subset of the input clauses, which is minimal on a certain ordering on interpretations. Both methods seek to find clauses that contradict this model. However, OSHL computes a minimal ground clause that contradicts this model, and uses this clause to modify the model. Sometimes OSHL will resolve two ground clauses, but not always. The basic operation of Ganzinger, Meyer, and Weidenbach is to resolve two clauses, one of which is a minimal clause contradicting this minimal model. There is another slight difference between the methods, in that OSHL permits the user to specify an initial interpretation, while the refinement of Ganzinger et al. uses a fixed initial interpretation. This initial interpretation is minimal in the ordering on interpretations, and so the ability for the user to specify it means that the ordering on interpretations used by OSHL is more flexible than that used by the method of Ganzinger, Meyer, and Weidenbach. This is a small point formally, but in practice it is very significant, because this feature enables OSHL to be goal sensitive, which otherwise it would not be. Of course, this feature can easily be incorporated into the method of Ganzinger, Meyer, and Weidenbach, which probably would make it goal sensitive as well.

Another difference is the way in which the two strategies are lifted to first-order logic. OSHL is lifted by an enumeration process, which essentially extends the ground method to first-order logic in a fairly direct manner but introduces inefficiencies in enumerating ground terms in some cases. Ganzinger, Meyer, and Weidenbach work at the first-order level and perform all first-order inferences that generalize permitted ground inferences. This makes their method potentially more efficient, since the enumeration done by OSHL is avoided, but it also can introduce computability and decidability problems, which the authors consider. One obvious corollary of this difference is the fact that the first-order version of OSHL can use any computable interpretation. This permits many interpretations to be specified in a natural way, by giving Prolog programs to compute the meanings of the functions and predicates. However, the method of Ganzinger, Meyer, and Weidenbach does not appear to permit interpretations of this generality; rather, it appears to be necessary to solve constraints related to the interpretation in order to use their method. There does not seem to be any obvious way to extend the method of Ganzinger, Meyer, and Weidenbach to use arbitrary computable interpretations. This difference is related to the fact that in the following discussion we devote considerable attention to implementation issues and to the complexity of instantiation, topics that are only briefly touched on in [25]. In fact, it is not clear from the paper of Ganzinger, Meyer, and Weidenbach that their strategy is even effectively computable, although they do make use of sort theories to approximate it in first-order logic.

The fact that Ganzinger, Meyer, and Weidenbach lift to first-order logic in a different way makes their method qualitatively different from OSHL. One implication of this is that their method may not be as propositionally efficient on first-order logic as OSHL, since it is a less direct simulation of Davis and Putnam’s method than OSHL. If their method could be implemented exactly, then it would probably avoid this propositional inefficiency. But if it can only be approximated, then it seems likely that the extra resolutions performed will lead to some propositional inefficiency. The approach of Ganzinger, Meyer, and Weidenbach may also introduce some choices into their method, especially if it is only approximated, since it appears that there may be many possible ordered resolutions at the first-order level in their approach. At least, they consider fairness properties of proofs in order to show completeness. For OSHL, fairness is not an issue, since at any given time there is only one ground clause added to the set of ground instances, and as long as this clause is chosen to be a clause that is minimal in a certain ordering on clauses, OSHL is complete.

Another slight difference between the methods is that the method of Ganzinger et al. has as its basic operation ordered resolution. OSHL has as its basic operation the selection of a ground instance that contradicts the current interpretation. This makes OSHL more fine grained, in a sense. It could be that OSHL might select an infinite sequence of ground instances and never perform any ordered resolutions at all, if the set of input clauses is satisfiable. Since the basic step of OSHL is smaller, it is easier to perform, making OSHL possibly somewhat easier to implement. On the other hand, if the method of Ganzinger, Meyer, and Weidenbach could be effectively implemented, it might be better. The formalism of OSHL also differs to some extent from that used by Ganzinger et al.; for example, our notion of a progress function does not seem exactly to correspond to the formalism of Ganzinger et al.

There is another formal difference between the two methods. Both methods consider clauses that are minimal in a clause ordering. However, OSHL uses a different ordering on clauses from the method of Ganzinger, Meyer, and Weidenbach. The method of Ganzinger, Meyer, and Weidenbach essentially uses the same ordering on clauses and interpretations. Their completeness result depends only on the fact that the ordering on clauses is well founded. But even for well-founded orderings on clauses, OSHL is incomplete. For some such orderings, OSHL will fail to find a proof, even for an unsatisfiable set of input clauses, since it will generate many ground instances that are smaller than the ones it needs. For the completeness of OSHL, the clause ordering must satisfy a stronger condition than well-foundedness, which we call downward finiteness. Since the methods use different clause orderings, the set of inferences they perform is not comparable. OSHL will perform some inferences that the method of Ganzinger, Meyer, and Weidenbach will not perform, and their method will perform some inferences that OSHL will not perform. A consequence of this is that the completeness of neither method is implied by the completeness of the other. The fact that OSHL has a stronger requirement on the clause ordering is not really a restriction, since the ordering on interpretations is the key for compatibility with ordered paramodulation strategies. OSHL can use the same orderings on interpretations as the method of Ganzinger, Meyer, and Weidenbach, while using a downward finite ordering on clauses. In fact, the implementation of OSHL uses a length-lexicographic ordering for both interpretations and clauses, and this ordering is downward finite. But in theory, OSHL could use more general orderings on interpretations.
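As a rough illustration of a length-lexicographic ordering, the Python sketch below orders ground terms first by their number of symbol occurrences and then lexicographically by their preorder symbol sequence. The term encoding and the key function are assumptions made for illustration, and the precise ordering used in the OSHL implementation (in particular on clauses and interpretations) may differ in its details.

```python
# Ground terms are tuples (symbol, arg1, ..., argn); constants have no args.

def symbols(term):
    """Return the symbols of a ground term in preorder (left to right)."""
    symbol, *args = term
    result = [symbol]
    for arg in args:
        result.extend(symbols(arg))
    return result

def length_lex_key(term):
    """Sort key: number of symbols first, then the symbol sequence."""
    sequence = symbols(term)
    return (len(sequence), sequence)

a, b = ("a",), ("b",)
terms = [("f", a, b), b, ("g", a), a, ("f", a, a)]
ordered = sorted(terms, key=length_lex_key)
print(ordered)
```

The sorted order is a, b, g(a), f(a, a), f(a, b). Over a finite set of symbols there are only finitely many terms of any given size, so only finitely many terms precede any given term in such an ordering, which is the intuition behind downward finiteness.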

Another possible difference between the methods is in their relationship to equality reasoning. We implemented a complete extension of OSHL to equality, which is based on unit equations. The method of Ganzinger et al., working at a first-order level, has a more difficult time incorporating equality, and as far as we know no extension to equality has been implemented.

Ganzinger, Meyer, and Weidenbach also mention a few implementation results of their method, which give a speedup of about two or possibly more. We have done many more tests, and for some problems (such as the planning problems) obtain much more dramatic speedups than this. However, our implementation, being in Prolog, may be slower in terms of inferences per unit time.

The SATCHMO prover [37] is also related to OSHL, in that it generates ground instances of the input clauses and searches for models. It uses case analysis, as does OSHL. If the input clauses are not in a special range-restricted form, SATCHMO may need to perform exhaustive enumeration of ground instances, like OSHL. SATCHMO can detect satisfiability of clause sets in some cases. SATCHMO is efficient on logic puzzles and near-propositional problems. However, it does not permit a natural semantics as does OSHL, but essentially uses an all-negative semantics. SATCHMO lacks some of OSHL’s optimizations for enumeration of ground instances. It also has “proof confluence” in the sense of Bibel [9], as does OSHL, and so for SATCHMO it is not necessary to consider more than one choice at each step for completeness. Another difference is that SATCHMO does not make use of an ordering on ground literals to guide the search. There are no special features for equality in SATCHMO, but these could be added without much difficulty.

An application of semantics to diagnosis is given in [4], in which the hyper-tableaux method (similar to SATCHMO) is modified to incorporate semantics in a propositional setting. This semantic hyper-tableaux method is also similar to OSHL, and has proof confluence but does not make use of an ordering on literals in the same manner as OSHL. However, it does permit a natural use of semantics and appears to be a promising strategy. The object of diagnosis is to enumerate models of the axioms, which correspond to diagnoses. For this application, the use of natural semantics gives dramatic improvements in performance.

3. Importance of Various Features of OSHL

We now discuss various features of OSHL individually, before presenting an overview of the operation of the prover. These include natural semantics, goal sensitivity, propositional efficiency, ordering of literals, equality and term rewriting, replacement rules, implicit typing, and UR-resolution. For each feature, we indicate its importance, as well as sketch some aspects of the implementation mechanism used in OSHL to realize this feature. Afterwards, we discuss the prover as a whole and some complexity questions associated with it.

All the features we discuss fit together more or less naturally into one prover, and all have been implemented. We encourage users to experiment with this prover at the URL given above, or to download the prover. Many implementation decisions were made that could have been made differently, and so there is still plenty of room for experimentation.

Before discussing the various features of OSHL, we present some necessary terminology. We denote variables by the letters x, y, z, possibly with subscripts; function symbols by f, g, h, possibly with subscripts; and constant symbols by the symbols a, b, c, d, possibly with subscripts. The arity of a function symbol is the number of arguments it takes. A constant symbol can be considered as a function symbol of arity zero. We let F be the set of function symbols and X be the set of variables. A term is a well-formed expression formed from variables, constant symbols, and function symbols. For example, f(x, g(a)) is a term, where f has arity two and g has arity one. We use the letters r, s, t, possibly with subscripts, to refer to terms. The symbol size of a term is the number of occurrences of function, constant, and variable symbols in it. Thus the symbol size of the term f(x, g(x, a)), where g is now taken to be binary, is 5. We denote the symbol size of a term t by ||t||. We use the symbol ≡ for syntactic identity, as well as logical equivalence. Thus if r and s are terms, then r ≡ s means that r and s are syntactically identical.
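As an illustration of the definition of symbol size, the following Python sketch counts symbol occurrences in a term. The representation is an assumption made only for this example (a variable or constant is a string, and a compound term is a tuple whose first entry is the function symbol); it is not taken from the OSHL implementation, which is written in Prolog.

```python
# Hypothetical term representation for illustration only:
#   "x", "a"                  variables and constants
#   ("f", t1, ..., tn)        the compound term f(t1, ..., tn)

def symbol_size(term):
    """Number of occurrences of function, constant, and variable symbols."""
    if isinstance(term, str):
        return 1
    # one occurrence for the function symbol, plus the sizes of its arguments
    return 1 + sum(symbol_size(arg) for arg in term[1:])

t = ("f", "x", ("g", "x", "a"))   # the term f(x, g(x, a))
print(symbol_size(t))             # 5
```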

A 1-context is a term with one occurrence of □ in it. Thus f(a, g(□)) is a context. If r is a context and s is another term, then r[s] represents r with the occurrence of □ replaced by s. Thus f(a, g(□))[h(y)] is f(a, g(h(y))).

We denote predicate symbols by P, Q, R, possibly with subscripts. The arity of a predicate symbol is the number of arguments it takes. An atom is a predicate symbol of arity n followed by a list of n terms, as P(a, g(b)), where P has arity two. The symbol size ||A|| of an atom A is the number of symbol occurrences in A. Thus the symbol size of P(x, g(x)) is 4. A literal is an atom or an atom preceded by a negation sign ¬. A literal having a negation sign is said to be negative, and a literal without a negation sign (i.e., an atom) is said to be positive. The literals L and ¬L are said to be complementary, and if L denotes ¬A for an atom A, then ¬L denotes A. If L is a literal, then at(L) or L0 is the atom of L, which is L with the negation sign removed, if present. The symbol size ||L|| of a literal L is defined as ||A||, where A is the atom of L. A clause is a set of literals, denoting their logical disjunction, with free variables universally quantified. A clause, literal, atom, or term is said to be ground if it has no variables.

A substitution is a mapping from variables to terms that differs from the identity on only finitely many variables. The substitution that maps variable xi to term ti for 1 ≤ i ≤ n is denoted by {x1 ← t1, . . . , xn ← tn}. If t is a term and Θ is a substitution, then tΘ denotes the term that results by replacing variables by terms as specified by Θ. Such a term is called an instance of t. Similarly, LΘ denotes the application of a substitution to a literal L, and so forth. The set of variables x such that xΘ ≢ x is called the support of Θ. We denote substitutions by Greek letters.
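The following Python sketch illustrates applying a substitution and computing its support. It assumes, purely for this example, that variables and constants are strings, that compound terms are tuples headed by their function symbol, and that a substitution is a dictionary from variable names to terms.

```python
# Hypothetical representation: "x" is a variable or constant,
# ("f", t1, ..., tn) is the compound term f(t1, ..., tn).

def apply_subst(term, theta):
    """Return the instance obtained by replacing variables as theta specifies."""
    if isinstance(term, str):
        # symbols not mentioned in theta (including constants) are left alone
        return theta.get(term, term)
    return (term[0],) + tuple(apply_subst(arg, theta) for arg in term[1:])

def support(theta):
    """Variables that the substitution does not map to themselves."""
    return {var for var, image in theta.items() if image != var}

theta = {"x": ("g", "a"), "y": "y"}      # {x <- g(a)}, with y left fixed
t = ("f", "x", ("g", "y"))               # f(x, g(y))
instance = apply_subst(t, theta)         # f(g(a), g(y))
print(instance, support(theta))
```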

If I is a first-order structure and C is a clause, then I |= C means that the structure I satisfies the clause C. We also say that C is true in I. If I ⊭ C, we say C contradicts I, or that C is false in I. We speak of I satisfying or falsifying literals and other formulae in a similar way. If S is a set of clauses, then I |= S means that I |= C for all C ∈ S.


3.1. NATURAL SEMANTICS

We have already highlighted the importance of natural semantics in theorem proving. Such semantics are often used by humans when proving theorems, and so it is reasonable to use such semantics to guide a theorem prover, too. In fact, some of the successes of syntactic provers in solving open problems are precisely on structures that are not well understood, where human intuition and semantics are of little help. By “natural” semantics, we mean that one can specify meanings of the functions and predicates that correspond to their common mathematical or physical meanings. Some provers permit a limited semantics in which one can assign a predicate symbol to be identically true or identically false. This is of course much less useful, though it can provide some improvement as well.

In OSHL, the user specifies a semantics by giving the domain D of the interpretation and, for each function symbol f, a computable function from D^n to D, where n is the arity of f. For each predicate symbol P, the user gives a computable predicate on D^n, where n is the arity of P. Since OSHL is written in Prolog, the user specifies the semantics in Prolog as well, which is an advantage due to the simple logical structure of Prolog. Any computable semantics may be specified in this way. We note that this is very flexible, since many semantics are computable, and is more general than the semantic mechanism of CLIN-S. Not all semantics are computable; for example, set theory is undecidable, so there is no computable semantics for a first-order axiomatization of set theory. It probably would be possible to extend OSHL to accept partially computable semantics, but we have not explored this question.

We feel that it is reasonable to require the user to specify a semantics, since standard semantics are known for many common sets of axioms, and it makes sense to permit the prover to take advantage of this mathematical knowledge. If such semantic information leads to a powerful theorem prover, then the effort invested in providing it is well justified, even if it becomes more difficult to use such a prover in theorem-proving competitions. And there are many cases in which a semantics can be found automatically or semi-automatically; for example, an exhaustive search can find a finite semantics (that is, a semantics having a finite domain) that satisfies a collection of axioms, if such a semantics exists, since there is a countable number of finite semantics, and it is decidable whether such a semantics satisfies a set of first-order axioms. Caferra, Peltier, and Zabel [15, 17, 18, 40] have developed automatic methods capable of generating infinite semantics. Another possibility would be to store with the prover a large collection of semantics for various mathematical structures, such as rings, groups, and set theory, and then have the prover look at the syntactic form of the input clauses to see whether they are specifying one of these structures. If so, then one of these predefined semantics can be used. Of course, there is no guarantee that a mechanically generated semantics will lead to an efficient proof search. And no procedure can possibly generate and validate models satisfying an arbitrary satisfiable set of axioms, for this would imply that first-order logic is decidable. Thus, the finding of such models will always be to some extent an art. OSHL will be complete regardless of which semantics is specified, but a good semantics can enhance its efficiency.

We now give a couple of examples of semantics and show how instantiation of clauses would proceed for these semantics.

EXAMPLE 3.1. First we consider clauses containing a ternary predicate symbol M, the constant symbols e, a, b, c, and the binary function symbol •. The domain D of the interpretation is {0, a, b, c}. We indicate the meaning of a function or constant f by f^I and the meaning of a predicate P by P^I. We define •^I by

    •^I(x, 0) = •^I(0, x) = x,   x ∈ D,
    •^I(x, x) = 0,   x ∈ D,
    •^I(x, y) = •^I(y, x) = z,   x, y, z ∈ {a, b, c}, x, y, z are distinct.

We define a^I = a, b^I = b, c^I = c, e^I = 0. Finally, M^I(x, y, z) is true iff •^I(x, y) = z. Letting I be this structure, we then have, for example, that

    I |= M(a, b, c),
    I |= ∀x(M(x, x, e)),
    I |= ∀x, y, z(M(x, y, z) ⊃ M(y, x, z)).
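The structure of Example 3.1 is small enough to check by brute force. The following Python sketch is only an illustration (OSHL itself takes its semantics in Prolog); the names `op`, `VALUE`, and `M` are chosen here for the example and are not part of the prover.

```python
D = ["0", "a", "b", "c"]

def op(x, y):
    """The operation •^I on D from Example 3.1."""
    if x == "0":
        return y
    if y == "0":
        return x
    if x == y:
        return "0"
    # x and y are distinct elements of {a, b, c}: the result is the third one
    (z,) = {"a", "b", "c"} - {x, y}
    return z

# meanings of the constant symbols e, a, b, c
VALUE = {"e": "0", "a": "a", "b": "b", "c": "c"}

def M(x, y, z):
    """M^I(x, y, z) holds iff •^I(x, y) = z."""
    return op(x, y) == z

fact = M(VALUE["a"], VALUE["b"], VALUE["c"])
self_inverse = all(M(x, x, VALUE["e"]) for x in D)
commutative = all(not M(x, y, z) or M(y, x, z)
                  for x in D for y in D for z in D)
print(fact, self_inverse, commutative)   # True True True
```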

Given such a structure I (which can be specified in Prolog in a manner to be illustrated below), the prover operates by instantiating clauses C to ground instances CΘ such that I ⊭ CΘ. We call such an instance CΘ of C a contradiction instance of C for I. We refer to the problem of finding such an instance as the instantiation problem for C relative to I. We illustrate how contradiction instances can be found for the above structure I. Let C be the clause ¬M(x, y, z). Then there are many contradiction instances of C for I: ¬M(a, b, c), ¬M(b, c, a), ¬M(a, b, a • b), ¬M(a • b, b • a, c • c), and so on. But we note that if ¬M(r, s, t) is a contradiction instance of C for I, then so is ¬M(r1, s1, t1), for any r1, s1, t1 such that r^I = r1^I and s^I = s1^I and t^I = t1^I. Thus, if we are looking for minimal size contradiction instances, it is only necessary to consider r, s, t such that there are no smaller r1, s1, t1 with r^I = r1^I and s^I = s1^I and t^I = t1^I. This result is general, so we have the following definition and theorem.

DEFINITION 3.2. An ordering > on ground terms and ground atoms is called a termination ordering if > is well founded, that is, there are no infinite sequences t1, t2, t3, . . . of terms or atoms with t1 > t2 > t3 > · · · , and if for all 1-contexts t and all terms or atoms u and v, u > v implies t[u] > t[v]. An ordering >cl on clauses is called a termination ordering if >cl is well founded and if there is a termination ordering > on ground terms such that for all clauses C and ground terms s and t, s > t implies C[s] ≥ C[t].


DEFINITION 3.3. We define the ordering by size on ground terms and ground atoms by r >size s for ground terms or atoms r and s if ||r|| > ||s||. The lexicographic ordering >lex on ground terms and atoms is defined by r >lex s for ground terms or atoms r and s if ||r|| > ||s||, or if ||r|| = ||s|| and r is lexicographically larger than s; that is, if g and f are the symbols of r and s at the first place where there is a difference, reading left to right, then g is alphabetically larger than f. Thus f(b, a) >lex f(a, c) and f(b, c) >lex f(b, b).

We note that ordering by size is a partial termination ordering on ground terms and ground atoms, and the lexicographic ordering is a total termination ordering on ground terms and ground atoms.
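The two orderings of Definition 3.3 can be sketched in Python as follows. This is an illustration under stated assumptions: ground terms are strings (constants) or tuples headed by a function symbol, "reading left to right" is taken to mean comparing the symbols in prefix order, and "alphabetically larger" is taken to be ordinary string comparison of symbol names.

```python
def symbols(term):
    """Symbols of a ground term in left-to-right (prefix) order."""
    if isinstance(term, str):
        return [term]
    out = [term[0]]
    for arg in term[1:]:
        out.extend(symbols(arg))
    return out

def greater_size(r, s):
    """r >size s: strictly more symbol occurrences."""
    return len(symbols(r)) > len(symbols(s))

def greater_lex(r, s):
    """r >lex s: larger size, or equal size and larger at the first difference."""
    rs, ss = symbols(r), symbols(s)
    if len(rs) != len(ss):
        return len(rs) > len(ss)
    for g, f in zip(rs, ss):
        if g != f:
            return g > f          # alphabetical comparison of symbol names
    return False                  # identical terms are not ordered

print(greater_lex(("f", "b", "a"), ("f", "a", "c")))   # True
print(greater_lex(("f", "b", "c"), ("f", "b", "b")))   # True
```

Terms of equal size are incomparable under `greater_size` but always comparable under `greater_lex` unless they are identical, which reflects the partial versus total distinction noted above.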

DEFINITION 3.4. If > is a termination ordering on ground terms and ground atoms, then we extend > to an ordering >lit on ground literals by M >lit L for ground literals L and M if B > A, where B is the atom of M and A is the atom of L. Thus the literals A and ¬A are not ordered with respect to each other, for an atom A. But for all other pairs L and M of ground literals, if > is a total termination ordering, then either L >lit M or M >lit L. We extend an ordering >lit on ground literals to an ordering >cl on ground clauses by D >cl C for ground clauses C and D if for all literals L in C, there is a literal M in D such that M >lit L. We extend an ordering > on ground terms to ground substitutions having the same support by Θ2 > Θ1 if for some variable x in the support of Θ1 or Θ2, xΘ2 > xΘ1, and if for all variables x in the support of Θ1 or Θ2, either xΘ2 > xΘ1 or xΘ2 ≡ xΘ1.

THEOREM 3.5. If > is a termination ordering on ground terms and ground atoms, then the extensions >lit and >cl of > to ground literals and ground clauses, respectively, are well-founded partial orderings. The extension of > to ground substitutions having the same support is also a well-founded partial ordering.

Proof. One only needs to show that >lit and >cl are transitive, irreflexive, and well founded, and similarly for the extension to ground substitutions. The proof is straightforward. Showing that >cl is well founded is nontrivial, but standard. □

DEFINITION 3.6. Suppose I is a first-order structure and > is a termination ordering on ground terms and ground atoms. We say a ground term t is minimal for I and > if there is no other ground term u with t > u such that t^I = u^I. We say a substitution Θ is minimal for I and > if for all variables x, xΘ is a ground term that is minimal for I and >. Let >cl be the extension of > to ground clauses. We say a contradiction instance CΘ of clause C for I is minimal for C, I and > (or >cl) if there is no other Θ′ such that CΘ′ is a contradiction instance of C for I and such that CΘ >cl CΘ′.

Note that if ground term t is minimal for I and >, then all subterms of t are also minimal for I and >. Also, note that if > is a total termination ordering on ground terms, then if r and s are minimal for I and > and r^I = s^I, then r ≡ s, that is, r and s are identical. It follows in this case that for all ground terms u there is a unique ground term t that is minimal for I and > such that u^I = t^I.

THEOREM 3.7. Suppose > is a termination ordering on ground terms and ground atoms, and >cl is its extension to ground clauses. If C is a first-order clause and I is a first-order structure, and if there is a Θ such that CΘ is a contradiction instance of C for I, then there is a minimal (with respect to >) Θ such that CΘ is a contradiction instance of C for I. Furthermore, there is a minimal (with respect to >) Θ such that CΘ is a minimal contradiction instance for C, I, and >cl.

Proof. If CΘ is a contradiction instance of C for I, then let Θ1 be defined so that for all x, if xΘ ≢ x, then xΘ1 is a minimal term for I and > such that (xΘ1)^I = (xΘ)^I. Such a minimal term exists, because > is well founded. Then CΘ1 is also a contradiction instance of C for I, and Θ1 is minimal for I and >. The fact that a minimal contradiction instance for C, I, and >cl exists follows from the fact that >cl is well founded. Also, if Θ2 > Θ1, then either CΘ2 > CΘ1 or CΘ2 ≡ CΘ1. This implies that there is a minimal Θ such that CΘ is a minimal contradiction instance for C, I, and >cl. □

The implication of these results is that in looking for small contradiction instances, it is necessary to consider only minimal Θ. Note that if > is the lexicographic ordering, not only are >-minimal terms minimal in size, but among all equivalent terms of a given size, only one of them is minimal with respect to >. Also, all subterms of a minimal term are minimal with respect to >. The fact that minimal terms have minimal subterms increases the efficiency of instantiation even more. For example, if D is a finite set, and > is a total termination ordering, then there are only finitely many minimal terms altogether, and they are closed under the subterm operation.

DEFINITION 3.8. If C is a clause, I is a structure, and > is a termination ordering on ground terms, then let G_C(I, >) be the set of CΘ such that Θ is minimal for I and >. Let G_C,n(I, >) be the set of CΘ as before, with the additional restriction that for all variables x in the support of Θ, ||xΘ|| ≤ n.

THEOREM 3.9. If C is a clause, I is a structure, and > is a termination ordering on ground terms, and if there is a contradiction instance of C for I, then G_C(I, >) contains a contradiction instance of C for I, in fact, one that is minimal for C, I and >. Also, if there is a contradiction instance D of C for I such that ||D|| ≤ n, then G_C,n(I, >) contains a minimal contradiction instance CΘ for C, I, and > such that ||CΘ|| ≤ n, where > is the size ordering.

Proof. The first part follows from Theorem 3.7, which states that if there is a contradiction instance for C and I, there is one obtainable from a minimal Θ. The second part follows from this and the fact that any terms in a contradiction instance of size n or less can have size at most n. □

The significance of this result is that it suffices to examine the set G_C(I, >) when searching for (minimal) contradiction instances, and it suffices to examine the set G_C,n(I, >) when searching for contradiction instances of size at most n. Thus we can bound the complexity of the instantiation procedure by bounding the sizes of these sets. And we have the following simple result.

PROPOSITION 3.10. The number of elements in G_C(I, >), where C is a clause having k variables and I is an interpretation having a domain D with d elements, is at most d^k. The number of elements in G_C,n(I, >), where C is a clause having k variables and I is an interpretation such that {t^I : t is a ground term, ||t|| ≤ n} has d elements, is at most d^k.

Since d is a constant for a given I, this bound is exponential in k. It is often the case that clauses do not have many variables, and we will see later that this number of variables is often reduced before instantiation, further improving the efficiency. For I from Example 3.1, the only minimal terms are e, a, b, and c, even if we use the size ordering, so the process of finding contradiction instances is particularly simple in this case. The number of elements in G_C(I, >) is then 4^k.
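To make the count concrete, the following Python sketch enumerates G_C(I, >) for the clause C = ¬M(x, y, z) and the structure I of Example 3.1, using the minimal terms e, a, b, c. It is a brute-force illustration, not the instantiation procedure used by OSHL; the helper names are chosen for this example.

```python
from itertools import product

def op(x, y):
    """The operation •^I on D = {0, a, b, c} from Example 3.1."""
    if x == "0":
        return y
    if y == "0":
        return x
    if x == y:
        return "0"
    (z,) = {"a", "b", "c"} - {x, y}   # the third element of {a, b, c}
    return z

VALUE = {"e": "0", "a": "a", "b": "b", "c": "c"}   # meanings of the constants
MINIMAL_TERMS = ["e", "a", "b", "c"]

# every substitution of minimal terms for x, y, z gives one element of G_C
candidates = list(product(MINIMAL_TERMS, repeat=3))

# the instance ¬M(r, s, t) is false in I exactly when M^I(r^I, s^I, t^I) holds
instances = [(r, s, t) for r, s, t in candidates
             if op(VALUE[r], VALUE[s]) == VALUE[t]]

print(len(candidates))   # 64, that is, 4^3
print(len(instances))    # 16 contradiction instances
```

Each pair of values for x and y determines exactly one value of z, which is why 16 of the 64 candidates are contradiction instances.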

We now give another example of a semantics I and how the process of instantiation would work with it.

EXAMPLE 3.11. We consider clauses containing the binary predicate symbols B and E, the constant symbol 0, the unary function symbols s and p, and the binary function symbols + and −. The domain D of the interpretation is the integers. As usual, we indicate the meaning of a function or constant f by f^I and the meaning of a predicate P by P^I. We define +^I to be integer addition, −^I to be integer subtraction, 0^I to be the integer zero, s^I to be the integer successor operation, and p^I to be the integer predecessor operation. Finally, B^I is defined by B^I(x, y) iff x > y, and E^I is defined by E^I(x, y) iff x = y. Letting I be the structure so defined, we then have, for example, that

    I |= E(s(s(0)), s(0) + s(0)),
    I |= E(0, s(s(p(p(0))))),
    I |= ∀x∃yB(y, x),
    I |= ∀x, yE(x + y, y + x).

For this structure I, assuming we use ordering by size, the only minimal ground terms are those of the form s^n(0) and p^n(0) for various n. This eliminates a lot of nonminimal terms and makes instantiation much simpler than it otherwise would be. If we are restricting the search to minimal contradiction instances of a given size s or less, then there are at most 2s − 1 minimal ground terms to consider. Thus it is necessary to consider only a subset of D of size 2s − 1 or less. This implies that G_C,s(I, >) has at most (2s − 1)^k elements.
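The count of 2s − 1 minimal terms can be checked by brute force for small size bounds. The following Python sketch, an illustration only, enumerates every ground term over 0, s, p, + and − up to a given symbol size, evaluates it in the integers, and keeps those terms for which no strictly smaller term has the same value (minimality under the size ordering). The function names are assumptions of this example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def terms_of_size(n):
    """All ground terms with exactly n symbols, as (text, integer value) pairs."""
    if n == 1:
        return (("0", 0),)
    out = []
    for text, val in terms_of_size(n - 1):        # unary symbols s and p
        out.append((f"s({text})", val + 1))
        out.append((f"p({text})", val - 1))
    for left in range(1, n - 1):                  # binary symbols + and -
        right = n - 1 - left
        for lt, lv in terms_of_size(left):
            for rt, rv in terms_of_size(right):
                out.append((f"({lt}+{rt})", lv + rv))
                out.append((f"({lt}-{rt})", lv - rv))
    return tuple(out)

def minimal_terms(bound):
    """Terms of size <= bound with no strictly smaller term of equal value."""
    smallest = {}                                 # value -> least size reaching it
    seen = []
    for n in range(1, bound + 1):
        for text, val in terms_of_size(n):
            smallest.setdefault(val, n)           # sizes are visited in increasing order
            seen.append((text, val, n))
    return {text for text, val, n in seen if smallest[val] == n}

print(sorted(minimal_terms(3)))
print(len(minimal_terms(4)))                      # 7, that is, 2*4 - 1
```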


We now comment on the way in which natural semantics helps to focus attention on clauses related to the particular theorem being proved. A theorem is typically of the form A ⊃ R, where A is a collection of general axioms and R is a particular result one wishes to prove. In refutational clause-form theorem proving, this theorem is first negated to obtain A ∧ ¬R and then converted to clause form S, where S is S_A ∪ S_¬R and S_A are the clauses obtained from A and S_¬R are the clauses obtained from ¬R. Now, if I is a structure satisfying A, then there is a closely related structure satisfying S_A. (It is necessary only to interpret Skolem functions in a reasonable way.) Thus the only clauses that are not satisfied by I will be those in S_¬R, and these clauses derive from the negation of the particular theorem being proved. By choosing such an I, we guarantee that contradiction instances found are related to the particular theorem R and are not general axioms from A. As the theorem-proving process continues, OSHL will modify I and obtain additional contradiction instances, but we will show below that these additional instances are also related to R.

For example, consider the following clause C expressing the associativity of the group operation of Example 3.1:

    ¬M(x, y, u) ∨ ¬M(y, z, v) ∨ ¬M(u, z, w) ∨ M(x, v, w).

Since I |= C, where I is the structure of Example 3.1, we know that there will not be any contradiction instances for C and I. In fact, OSHL permits the user to specify a set of support, so that the prover will not waste time attempting to instantiate such clauses C at the start. However, the clause ¬M(b, a, c) from the negation of the theorem stating that the group is commutative does, in fact, contradict I.
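Both claims can be confirmed by exhaustive evaluation, since the domain of Example 3.1 has only four elements. The following Python sketch is an illustration under the same assumed encoding of the structure as a Python function; it is not how OSHL checks clauses.

```python
from itertools import product

D = ["0", "a", "b", "c"]

def op(x, y):
    """The operation •^I on D from Example 3.1."""
    if x == "0":
        return y
    if y == "0":
        return x
    if x == y:
        return "0"
    (z,) = {"a", "b", "c"} - {x, y}
    return z

def M(x, y, z):
    return op(x, y) == z

def assoc_clause(x, y, z, u, v, w):
    """Truth value in I of ¬M(x,y,u) ∨ ¬M(y,z,v) ∨ ¬M(u,z,w) ∨ M(x,v,w)."""
    return (not M(x, y, u)) or (not M(y, z, v)) or (not M(u, z, w)) or M(x, v, w)

# I |= C: the clause holds under all 4^6 assignments, so it has no contradiction instance
axiom_satisfied = all(assoc_clause(*vals) for vals in product(D, repeat=6))

# ¬M(b, a, c) is false in I because M^I(b, a, c) is true
goal_contradicts = M("b", "a", "c")

print(axiom_satisfied, goal_contradicts)   # True True
```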

We have argued that semantics can aid in the process of proving a theorem. Another role of semantics is to find counterexamples. If a set S of clauses is satisfiable, it is possible that a structure I will be given by the user or generated by the prover in the course of its operation such that I |= S. If I has a finite domain, then the prover may be able to detect this and report that the set S is satisfiable. However, OSHL may fail to find such a model I even if one exists with a finite domain. Extending OSHL to detect finite models, or models in some other class, when they exist, is a promising research area.

3.2. GOAL SENSITIVITY

OSHL is goal sensitive. Goal sensitivity means that the inferences performed by the prover are all related to the particular theorem being proved, and not simply the combination of general axioms. This is especially important when there are many input clauses from axioms, but only a small number of clauses from the particular theorem being proved. If the process of inference is not goal sensitive, and there are many general axioms, the proof search will be inefficient, because of the many inferences obtained simply by combining general axioms. Tests by [48] show that the provers that perform best on large axiom sets from some realistic examples involving software verification are goal sensitive. This demonstrates the importance of goal sensitivity, especially since this effect should be even more pronounced on larger axiom sets. Some strategies such as Knuth–Bendix completion are not goal sensitive but are still efficient on moderately sized problems. Also, if the general axioms are saturated with respect to a strategy, it can be efficient on large clause sets even if it is not inherently goal sensitive.

To make the concept of goal sensitivity precise, we need to specify what it means for an inference to be “related” to the particular theorem being proved. Since our prover constructs ground instances of input clauses, this needs to be stated in terms of ground instances being relevant to the particular theorem. We have the following definition:

• A ground instance that is not satisfied by the initial structure I is relevant to the particular theorem being proved.
• If ground instance C is relevant to the theorem being proved and D is another ground instance and there are literals L and M in C and D, respectively, such that L and M are complementary, then D is relevant to the theorem being proved.

The first point assumes that the initial structure I supplied by the user satisfies the clauses coming from general axioms, so a ground instance C not satisfied by I must be related to the particular theorem being proved. The second point is justified because clauses C and D containing complementary literals can resolve, and thus are related. The degree of relevance of a ground instance D can be measured by how many such links (L, M) are needed to connect D to a clause that contradicts the initial structure I. More relevant clauses require fewer such links. One can show that all instances in a minimal unsatisfiable set of ground clauses will be relevant to the theorem.
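The two-point definition of relevance, together with the link count, amounts to a breadth-first search over ground clauses. The following Python sketch illustrates this on a propositional example. The encoding (literals as strings with a leading "-" for negation, clauses as frozensets, the initial structure as the set of atoms it makes true) is an assumption of the example, and the code is not the mechanism implemented in OSHL.

```python
from collections import deque

def complement(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def satisfied(clause, true_atoms):
    """Does the structure making exactly true_atoms true satisfy the clause?"""
    return any((lit[1:] not in true_atoms) if lit.startswith("-")
               else (lit in true_atoms)
               for lit in clause)

def relevance(clauses, true_atoms):
    """Map each relevant clause to the number of links to a clause false in I."""
    dist = {c: 0 for c in clauses if not satisfied(c, true_atoms)}
    queue = deque(dist)
    while queue:
        c = queue.popleft()
        for d in clauses:
            # d is linked to c if it contains the complement of a literal of c
            if d not in dist and any(complement(lit) in d for lit in c):
                dist[d] = dist[c] + 1
                queue.append(d)
    return dist

goal = frozenset({"-q"})          # from the negated theorem; false in I
ax1 = frozenset({"-p", "q"})      # axiom instances, all true in I
ax2 = frozenset({"p"})
ax3 = frozenset({"s", "-t"})      # shares no complementary literal with the others
dist = relevance([goal, ax1, ax2, ax3], true_atoms={"p", "q", "s"})
print(dist.get(goal), dist.get(ax1), dist.get(ax2), ax3 in dist)
```

In this example `ax3` receives no distance at all, so it would never be considered relevant, while `ax2` is relevant but two links away from the goal.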

For a strategy to be goal sensitive, we require that all instances used be relevant to the particular theorem, and also require that a verification of relatedness to the goal be supplied with each generated ground instance, either actually or potentially. Our prover could easily be modified to print out the appropriate literals demonstrating relatedness of ground instances to the particular theorem.

Our prover actually uses a stronger notion of relevance than this, since only a subset of the relevant instances is used. Other notions of relevance that incorporate more information than our simple link criterion may also be useful, as in [36]. This approach involves extending SATCHMO by essentially detecting which literals could be useful for a backward chaining proof, and generating only clauses containing such literals. The same restriction can be applied to forward chaining proofs, and the two notions are combined in [26] for non-Horn clauses using “magic sets.” This involves a transformation of the input clause set and leads to a dramatic improvement in search space size and proof time on some examples. The approach of [48] is also promising and improves the performance of several theorem provers on realistic examples of proofs of specification and program properties by analyzing the large-scale structure of a specification to determine relevant axioms.

3.3. PROPOSITIONAL EFFICIENCY

Propositional efficiency refers to the efficiency of the prover on propositional and near-propositional problems, especially those that are highly non-Horn. A problem is near-propositional if the terms needed in the proof are small, that is, the proof does not make extensive use of function symbols. For example, a problem without any function symbols is near-propositional. Horn problems are not so interesting because near-propositional Horn problems can be solved efficiently by positive resolution or hyperresolution in many cases. In fact, propositional Horn problems can be solved in polynomial time [19].

Examples of near-propositional problems for which propositional efficiency becomes important are the pigeonhole problems and logic puzzles such as the salt-and-mustard problem and the zebra problem. We showed in [34] that an instance-based strategy far outperforms resolution on such problems, and we believe that many other strategies are nearly as inefficient as resolution on such problems. It seems likely that such inefficiencies are also affecting non-Horn problems in general, even those that are not near-propositional. It is interesting to note along this line that many of the success stories of theorem provers are on problems that are Horn problems or even pure equality problems. Thus there is a potentially dramatic increase in the power of a theorem prover as a result of propositional efficiency. None of the instance-based provers implemented by the first author’s students have been implemented in the most efficient languages; if they had been, their performance advantages might have been much more dramatic. All of these provers were implemented in Prolog.

The commonest way to get propositional efficiency is by some kind of Davis-and-Putnam-like procedure [21]. Such procedures are in practice often highly efficient on propositional problems. The procedures essentially involve case analysis. This means that a proposition (ground literal) A is chosen and first assumed to be true. Then a contradiction is derived. Then A is assumed to be false, and a contradiction is derived. Thus we have the two cases to consider, A being true and A being false. This is a natural mechanism, and one often used by humans in finding and expressing proofs, since human proofs often involve the consideration of a number of cases in some planned sequence. So the use of natural semantics and propositional efficiency together makes OSHL remarkably humanlike.
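The case analysis described above can be sketched in a few lines of Python. This is a bare splitting procedure for illustration, without the unit and pure-literal rules that practical Davis-and-Putnam-like procedures add, and it is not the procedure used inside OSHL. Clauses are assumed to be frozensets of nonzero integers, with −n standing for the negation of proposition n.

```python
def assume(clauses, lit):
    """Assume lit is true: drop satisfied clauses, delete the complementary literal."""
    return [c - {-lit} for c in clauses if lit not in c]

def unsatisfiable(clauses):
    """True iff both cases of every split lead to a contradiction."""
    if not clauses:
        return False                       # nothing left to satisfy
    if any(len(c) == 0 for c in clauses):
        return True                        # empty clause: this case is contradictory
    atom = abs(next(iter(clauses[0])))     # choose a proposition A to split on
    return (unsatisfiable(assume(clauses, atom))        # case A true
            and unsatisfiable(assume(clauses, -atom)))  # case A false

# a small non-Horn example: all four sign combinations of two propositions
non_horn = [frozenset({1, 2}), frozenset({-1, 2}),
            frozenset({1, -2}), frozenset({-1, -2})]
print(unsatisfiable(non_horn))        # True
print(unsatisfiable(non_horn[:3]))    # False
```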


3.4. ORDERING LITERALS

A number of efficient resolution strategies impose an ordering on literals and choose the largest literals in clauses as the ones to resolve together. Examples of such strategies are the literal ordering strategy of Bachmair and Ganzinger [5] and the paramodulation refinements of Hsiang and Rusinowitch [27]. We would like OSHL to be compatible with such strategies. These strategies eliminate large literals from clauses, and large literals tend to be the most difficult for OSHL to generate. CLIN-S [14] used rough resolution to eliminate large literals, and this substantially increased its power, so we would like to have something similar available on OSHL. Thus if OSHL and some kind of ordered resolution were used together, the ordered resolution component would eliminate large literals, and OSHL would be efficient on the remaining problem involving small literals.

Just using ordered resolution by itself may not always work, because there may be an efficiency problem with ordered resolution. Ordered resolution by itself does not appear to be propositionally efficient. This can be verified by running an ordered resolution prover on logic puzzles or on the pigeonhole problems. A theoretical result in this direction is also contained in [45] and [44].

3.5. ORDERING AND TERM REWRITING

For problems involving equality, it is widely recognized that term-rewriting techniques are generally required. Such techniques permit the replacement of equals by equals and lead to the paramodulation and demodulation inference rules of first-order logic. Demodulation is useful because it simplifies many equivalent expressions to a common canonical form. The most efficient versions of paramodulation and demodulation involve a termination ordering on literals. For pure equational problems, completion methods may be used, and these are often strikingly successful in this domain. Completion methods are also based on rewriting techniques and term orderings. In order to achieve efficiency on problems involving equality, it is necessary to build in some kind of a term-rewriting mechanism, and this also requires the incorporation of a term ordering into a prover. For surveys of term rewriting, see [20, 11]. For an early paper about paramodulation and demodulation, see [27].

There are approaches to equality that do not involve term orderings, but these are generally much less efficient. One possibility is a direct use of the equality axioms in a first-order prover. Another possibility that is generally somewhat better is to use Brand’s modification method [12], which is a preprocessing step on the input clauses that eliminates the need for most of the equality axioms but does not require any other modification to a theorem prover. However, rewriting techniques are often superior to Brand’s modification method.

OSHL is the first instance-based prover implemented in the first author’s research program that has an efficient equality mechanism, and possibly the first instance-based prover anywhere that has such a mechanism. (The method of Ganzinger et al. [25] is not instance based.) OSHL uses rewriting and narrowing (paramodulation) with unit equations, but requires some other mechanism such as Brand’s transformation to handle equations that appear in nonunit clauses. It seems feasible to extend OSHL to permit rewriting and narrowing on Horn clauses containing equations. It is interesting that OSHL obtains reasonable performance with a fixed ordering (the length-lexicographic ordering). This is somewhat remarkable because this ordering orients some equations (such as the distributivity law) the “wrong way.” Most rewriting-based provers allow the user to choose a different ordering for each problem. The fact that one ordering always seems to work well means that the user need give less input to OSHL. Of course, OSHL might be more powerful if the user could specify the ordering.

3.6. REPLACEMENT RULES

In [35], the value of replacement rules for theorems in set theory and temporal logic was demonstrated. Such rules have the effect of replacing predicates by their definitions in a Skolemized framework. Ordinary resolution often has considerable difficulty proving theorems in set theory. We have built a replacement rule facility into OSHL.

3.7. IMPLICIT TYPING

OSHL has a facility for typing, but the user must specify it in the semantics. Without going into details, we note that this facility permits the user to specify restrictions on the forms of the terms that can appear in a proof. For example, in theorems involving set theory, the user could specify that no set constructors producing infinite sets be allowed. Such a mechanism often permits a considerable reduction in search space, because meaningless expressions like apple + house can be eliminated from consideration. This mechanism can also be used to give heuristic guidance to the prover, based on the user’s intuition and experience about the structure of the proof.

3.8. UR-RESOLUTION

UR-resolution is an inference rule that generates a unit clause (or the empty clause) from a nonunit clause and a collection of unit clauses. This rule avoids many of the search inefficiencies of resolution because literals from different clauses are not combined together. However, the price is that UR-resolution is not complete as a theorem-proving strategy. Still, many theorems can be proved by UR-resolution, and so it makes sense for a prover to have some UR-resolution capability. In addition, even if a theorem cannot be proved by UR-resolution, the extra unit clauses generated in this way can be useful in reducing the search space by permitting other clauses to be simplified.
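For ground clauses, the UR-resolution rule reduces to deleting every literal whose complement is available as a unit clause. The following Python sketch shows this ground case only; it assumes literals are strings with a leading "-" for negation, performs no unification, and is an illustration rather than the UR-resolution facility of OSHL.

```python
def complement(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def ur_resolve(nucleus, units):
    """Ground UR-resolution of a clause against a set of unit literals.

    Returns the resulting unit clause or empty clause as a frozenset,
    or None when more than one literal survives (no UR-resolvent).
    """
    remaining = frozenset(lit for lit in nucleus if complement(lit) not in units)
    return remaining if len(remaining) <= 1 else None

nucleus = frozenset({"-P(a)", "-Q(a)", "R(a)"})
unit = ur_resolve(nucleus, {"P(a)", "Q(a)"})             # derives the unit R(a)
empty = ur_resolve(nucleus, {"P(a)", "Q(a)", "-R(a)"})   # derives the empty clause
nothing = ur_resolve(nucleus, {"P(a)"})                  # two literals remain
print(unit, empty, nothing)
```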


4. The OSHL Procedure

We now give a high-level description of the OSHL theorem-proving strategy and show that, under certain assumptions, OSHL is complete as a theorem prover. Then we present a more detailed description and verify that the assumed properties of OSHL are satisfied.

We first give a verbal sketch of the operation of OSHL.

We assume an ordering on interpretations and an ordering on ground clauses are given.

The prover OSHL attempts to determine whether a set S of first-order clauses is unsatisfiable. It does this by constructing a set T of ground instances of S. Initially, T is empty.

The prover repeatedly performs the following three steps:

• First, OSHL constructs the model I of T that is minimal in the ordering on interpretations.
• Next, OSHL chooses a ground instance D of some clause of S such that I ⊭ D and such that D is minimal in the ordering on clauses, subject to this restriction. D is added to the set T of ground clauses.
• Finally, OSHL modifies the set T of clauses so as to preserve or increase its minimal model. (I will no longer be a model of T, since D has been added to T.)

We assume that the modification performed on T at each step is polynomial time computable. We also assume that if T is unsatisfiable, then it contains the empty clause.

The above three steps are repeated until T contains the empty clause, at which point OSHL reports unsatisfiable.

We now give a more formal and precise definition.

We assume a total ordering >i on interpretations. We also assume a partial ordering >c on clauses. We write I1 <i I2 if I2 >i I1 and C1 <c C2 if C2 >c C1.

DEFINITION 4.1. mmin[T] is the minimal (wrt >i) model of a set T of ground clauses, if T is satisfiable. We assume that such minimal models exist. Later we will demonstrate this.

DEFINITION 4.2. GminS[I] is a minimal (wrt >c) ground instance of S such that I ⊭ GminS[I]. We assume that GminS[I] is either undefined (that is, does not terminate) or terminates with “fail” if I |= S.

DEFINITION 4.3. A progress function f(T, D) is a function of two variables, where T is a set of ground clauses, D is a ground clause, and f(T, D) is a set of ground clauses, such that

• mmin[f(T, D)] >i mmin[T] if D contradicts mmin[T] and T ∪ {D} is satisfiable,

• f(T, D) = {{}} if T ∪ {D} is unsatisfiable,

• f(T, D) is a logical consequence of T ∪ {D},

• f is polynomial time computable,

• if T0, T1, T2, ... is a sequence of sets of clauses such that T0 is {} and Ti+1 = f(Ti, Ci) for all i, and {C1, C2, C3, ...} is finite, then ⋃i Ti is finite.

The intuition for this definition is that a progress function keeps track of the progress of a theorem prover in examining interpretations. Interpretations are examined in the order of >i, smallest first, and mmin[f(T, D)] is the smallest interpretation not known to be contradicted by the set S of input clauses. In the following, we assume that simp is a progress function and that finite sets T of ground clauses have models that are minimal with respect to >i.

The general OSHL procedure is as follows:

procedure OSHL(S);
  T ← {};
  while {} ∉ T do
    if mmin[T] |= S and this can be detected
    then return satisfiable
    else
      D ← GminS[mmin[T]];
      if D = “fail” then return satisfiable fi;
      T ← simp(T, D)
    fi
  od; [[ end while ]]
  return unsatisfiable
end OSHL.

We assume that simp is a progress function. The test mmin[T] |= S may not be decidable, but we include it in case it is. We note that the fact that simp is a progress function guarantees that a model of T exists if {} ∉ T.

DEFINITION 4.4. An ordering < on interpretations is coarse if for all infinite sequences T1, T2, T3, ... of sets of clauses, if mmin[T1] < mmin[T2] < mmin[T3] < ..., then ⋃i Ti is infinite.

We assume that the ordering>i on interpretations is coarse.

DEFINITION 4.5. An ordering > is downward finite if for all x, for all but finitely many y, y > x.

LEMMA 4.6. If the set S of clauses input to procedure OSHL is unsatisfiable, minimal models of finite sets of ground clauses (in the sense of Definition 4.1) exist, and the ordering >c on clauses is downward finite, then the set of clauses D that are added to T during the run of OSHL is finite.

Proof. Each D is GminS[I] for some interpretation I. Since S is unsatisfiable, there is a finite unsatisfiable set S′ of ground instances of S. Thus, some clause C of S′ contradicts I. Since D is the minimal clause contradicting I, it must not be the case that D >c C. Since >c is downward finite, the set of such D for a given C in S′ is finite. Since S′ is finite, the set of all such D is a finite union of finite sets, hence finite. □

4.1. CORRECTNESS

We now show correctness of the OSHL procedure.

THEOREM 4.7. If S is unsatisfiable, and the ordering >i on interpretations is coarse, and finite sets of ground clauses have minimal models with respect to >i, and the ordering >c on clauses is downward finite, and simp is a progress function, then the OSHL procedure will report unsatisfiability in finite time. Also, if OSHL reports unsatisfiability, then S is unsatisfiable.

Proof. Let T1, T2, T3, ... be the sequence of sets T constructed by OSHL, and let I1, I2, I3, ... be the structures mmin[Ti] for these Ti. Let D1, D2, D3, ... be the sequence of clauses D that contradict I1, I2, I3, .... Thus Ti+1 = simp(Ti, Di). Then In+1 >i In for all n, since mmin[simp(Ti, Di)] > mmin[Ti] because simp is a progress function, and because the clause Dn contradicts In. By Lemma 4.6 above, the set {D1, D2, D3, ...} is finite. Since >i is coarse, the set ⋃i Ti is finite. Thus the set {T1, T2, T3, ...} is finite. Since >i is an ordering, this means that the sequence I1, I2, I3, ... is finite (since there cannot be repetitions). This means that the procedure OSHL must terminate. Since S is unsatisfiable, OSHL cannot terminate and report “satisfiable”; hence OSHL must terminate and report “unsatisfiable.”

Conversely, if OSHL terminates and reports “unsatisfiable,” then the empty clause {} must be in Tn for some n. Since simp(T, D) is a logical consequence of T ∪ {D} because simp is a progress function, the empty clause must be a logical consequence of S. Therefore S is unsatisfiable. □

We now give a more detailed description of the procedure OSHL, specifying the orderings >i and >c and specifying how GminS and simp are computed. We verify that these orderings and procedures satisfy the properties assumed above in the proof of completeness of OSHL. This shows that OSHL is complete. Of course, there could be other ways of specifying these orderings and functions, and as long as they satisfy the general properties given above, OSHL would still be a complete theorem prover.

We recall the definition of the extension of an ordering >lit on ground literals to an ordering >cl on ground clauses:

DEFINITION 4.8. D >cl C if for all literals L ∈ C, there is a literal M ∈ D such that M >lit L.
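The clause extension of Definition 4.8 can be sketched for the propositional case as follows. The encoding is an assumption made only for illustration and is not taken from the paper: a literal is a nonzero integer whose absolute value indexes its atom and whose sign gives its polarity, and >lit compares atom indices only, so that A and ¬A remain unordered.

```python
def lit_greater(m: int, l: int) -> bool:
    """M >lit L under the assumed encoding: compare atom indices only,
    so a literal and its complement are unordered."""
    return abs(m) > abs(l)


def clause_greater(d: frozenset, c: frozenset) -> bool:
    """D >cl C iff every literal of C lies below some literal of D."""
    return all(any(lit_greater(m, l) for m in d) for l in c)


# {¬P5, ¬P8} is greater than {¬P5, ¬P6}, but not conversely.
print(clause_greater(frozenset({-5, -8}), frozenset({-5, -6})))
print(clause_greater(frozenset({-5, -6}), frozenset({-5, -8})))
```

Because the relation is only a partial ordering, two clauses such as {P8} and {¬P8} are incomparable in both directions.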

We can extend the ordering >lex on terms and atoms to an ordering >lit on literals, and thence to an ordering >cl on clauses, as indicated in Definition 3.4 above. The ordering >lex is a total ordering on terms and atoms, but its extension to an ordering on clauses is a partial ordering. Recall that the literals A and ¬A are not ordered with respect to each other by the ordering >lit, but all other pairs of literals are.

Henceforth we assume that >c is this extension >cl of >lex to an ordering on clauses.

THEOREM 4.9. The ordering >c on clauses is downward finite. That is, there are only finitely many ground clauses not greater than any given one.

Proof. Let >lex also denote the extension of the lexicographic ordering to ground literals. We note that >lex is a total ordering on ground atoms. If not (C >lex D), then ¬(A >lex B), where A is the maximal atom in C and B is the maximal atom in D. Thus it must be true that for all atoms A in C, ¬(A >lex B). There are only finitely many such atoms A for a fixed B, since they all must have length not larger than that of B. From a finite set of atoms, one can construct only a finite set of such clauses C. Thus the set of ground clauses C such that not (C >lex D) is finite. □

Next we define the ordering >i on interpretations. For this purpose, we consider an arbitrary total termination ordering >t on ground terms and atoms and use it to order interpretations. We assume that an initial interpretation I0 is specified. Given two interpretations I and J, we say I and J agree on L if I |= L iff J |= L. If I and J are distinct, we let I ⊕ J be the set of ground literals L on which I and J do not agree. We let I ↓ J be the ground atom L in I ⊕ J that is minimal in the ordering >t. We say J >i I if I and I0 agree on I ↓ J. Thus I >i I0 for all I distinct from I0. We write I <i J if J >i I.
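As a minimal sketch of this ordering, assume (for illustration only) a finite set of atoms numbered so that >t is the numeric order, and interpretations represented as dictionaries from atom to truth value. The function names are hypothetical.

```python
def least_difference(i: dict, j: dict):
    """The atom I ↓ J: the >t-least atom on which I and J disagree,
    or None when the two interpretations are equal."""
    diff = sorted(a for a in i if i[a] != j[a])
    return diff[0] if diff else None


def greater_i(j: dict, i: dict, i0: dict) -> bool:
    """J >i I iff I agrees with I0 on the atom I ↓ J."""
    a = least_difference(i, j)
    return a is not None and i[a] == i0[a]


# Atoms p = 1, q = 2, r = 3; I0 makes every atom true.
i0 = {1: True, 2: True, 3: True}
not_r = {1: True, 2: True, 3: False}   # I0[¬r]
not_q = {1: True, 2: False, 3: True}   # I0[¬q]
print(greater_i(not_r, i0, i0), greater_i(not_q, not_r, i0))
```

Under these assumptions an interpretation that first departs from I0 at a smaller atom is the larger one, which is why I0 itself is the least interpretation.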

We show that this ordering>i on interpretations is coarse, as follows.

THEOREM 4.10. If T is a finite set of ground clauses, then I0 ⊕ mmin[T] is a finite set of literals.

Proof. If a literal L does not appear in T, then I0 and mmin[T] agree on L, since an interpretation differing from mmin[T] only on such literals will also satisfy T. Thus, because mmin[T] is the minimal model of T, mmin[T] must agree with I0 on literals L that do not appear in T. However, there are only finitely many literals in T. Thus there are only finitely many literals on which I0 and mmin[T] differ. □

COROLLARY 4.11. The ordering >i on interpretations is coarse.

Proof. Suppose mmin[T1] <i mmin[T2] <i mmin[T3] <i ..., where Tj are sets of ground clauses. Now, if mmin[Tj] differs from I0 on L, then L appears in Tj, and hence in ⋃i Ti. Let Ij be mmin[Tj]. If ⋃i Ti is finite, then ⋃i (I0 ⊕ Ii) is finite. But (I0 ⊕ Ij) ⊆ ⋃i (I0 ⊕ Ii). So if ⋃i Ti is finite, then each of the Ij can differ from I0 on a set of literals that is a subset of the finite set ⋃i (I0 ⊕ Ii). This means that the set of Ij is finite, so the sequence mmin[T1], mmin[T2], mmin[T3], ... must be finite. Thus the ordering >i is coarse. □


We now show that minimal models of finite sets of ground clauses exist, with respect to the ordering >i on interpretations.

DEFINITION 4.12. Given an interpretation I and literals Li, let I[L1, L2, ..., Ln] be defined as follows: I[L1, L2, ..., Ln] |= L iff

(a) L ∈ {L1, ..., Ln} or if
(b) L ∉ {L1, ..., Ln} and ¬L ∉ {L1, ..., Ln} and I |= L,

where ¬L denotes the complement of L.

Thus I[L1, L2, ..., Ln] is like I except for the finite list [L1, L2, ..., Ln] of “exceptions”; for an earlier use of this notation, see [14].
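The evaluation rule of Definition 4.12 can be sketched as below. The representation is an assumption for illustration: literals are nonzero integers (sign gives polarity), the base interpretation I is a callable from atom index to truth value, and the exception list is a set of literals that does not contain a complementary pair.

```python
def satisfies(base, exceptions: set, lit: int) -> bool:
    """Decide whether I[L1, ..., Ln] |= lit, where base(atom) gives I."""
    if lit in exceptions:          # case (a): lit is itself an exception
        return True
    if -lit in exceptions:         # the complement is an exception
        return False
    value = base(abs(lit))         # case (b): fall back to I
    return value if lit > 0 else not value


def contradicts(base, exceptions: set, clause) -> bool:
    """A ground clause contradicts the interpretation iff no literal holds."""
    return not any(satisfies(base, exceptions, l) for l in clause)


all_true = lambda atom: True       # an I0 that makes every atom true
print(contradicts(all_true, {-6, -8}, {6, -10}))
```

Only the finite exception set has to be stored; the base interpretation is consulted on demand, which matches the requirement that the semantics be computable on ground terms.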

THEOREM 4.13. Suppose I is an interpretation and J = I0[L1, L2, ..., Ln]. Then if I <i J, (I ↓ J) ∈ {at(L1), at(L2), ..., at(Ln)}.

Proof. If I <i J, then I and J are unequal; thus there must be an atom I ↓ J because the ordering >t is well founded. If (I ↓ J) ∉ {at(L1), at(L2), ..., at(Ln)}, then J agrees with I0 on the atom I ↓ J, so J <i I. □

This result classifies the interpretations that are smaller than I0[L1, L2, ..., Ln] into n distinct groups, depending on which at(Li) is equal to I ↓ J.

THEOREM 4.14. Suppose C = {L1, L2, ..., Ln} is a ground clause, and suppose L1 < L2 < ··· < Ln. Suppose I0 ⊭ C. Then the least model of C is I0[Ln].

Proof. We note that I0[Ln] is a model of C. We show that no smaller interpretation J is a model of C. Suppose J <i I0[Ln]. Then by Theorem 4.13, I ↓ J = Ln. This implies that J agrees with I0 on Ln, and so C contradicts J. □

THEOREM 4.15. Let G be a finite set of ground clauses. If G is satisfiable, then G has a least model.

Proof. Let A1, ..., An be the atoms that appear (positively or negatively) in clauses in G. There are only finitely many interpretations of these atoms; at least one of them, say I, is a model, since G is satisfiable. Let Imin be the least such model, in our ordering <i on interpretations. This must exist, since there are only finitely many such finite interpretations. Extend Imin to an interpretation J = I0[L1, L2, ..., Ln] where each Li is either Ai or ¬Ai, and Imin |= Li for all i. Then J is a model of G, since Imin is, and it is a least model, because Imin is as small as possible among the interpretations of the Ai and the interpretations of the other literals have been chosen to make J as small as possible. Any smaller interpretation would have to differ from J on some literal Li, by Theorem 4.13, which is not possible by the way Imin was chosen. □


4.2. THE PROCEDURE simp

We now describe the procedure simp and show that it is a progress function. The reader may wish to consult the examples in Sections 4.3 and 4.4.

DEFINITION 4.16. If C is a (nontautologous) clause, let max(C) be the maximal literal in C in the ordering >t. This exists because clauses are finite.

DEFINITION 4.17. A list C of clauses is ascending if it is of the form C1, C2, ..., Cn where max(C1) <t max(C2) <t ... and where for all i < n, mmin({C1, C2, ..., Ci}) ⊭ Ci+1. Thus C1 contradicts I0, C2 contradicts the least model of C1, and so on. The literals max(Ci) are called eligible literals, in harmony with the use of this term in [14].

THEOREM 4.18. If C = C1, C2, ..., Cn is ascending, then so are all its prefixes. Also, if C1, C2, ..., Cn is ascending, then I0 ⊭ max(Ci) for all i and mmin({C1, C2, ..., Cn}) = I0[max(C1), max(C2), ..., max(Cn)]. Furthermore, this latter interpretation disagrees with I0 on the literals max(Cj).

Proof. The part about prefixes is immediate. For the rest, we use induction on i. We show that I0 ⊭ max(Ci) for all i. By induction, mmin({C1, C2, ..., Ci−1}) = I0[max(C1), max(C2), ..., max(Ci−1)]. Since the literal max(Ci) is larger than any literal in [max(C1), max(C2), ..., max(Ci−1)], mmin({C1, C2, ..., Ci−1}) agrees with I0 on max(Ci). By the definition of ascending, mmin({C1, C2, ..., Ci−1}) ⊭ Ci, therefore mmin({C1, C2, ..., Ci−1}) ⊭ max(Ci). Since I0 agrees with mmin({C1, C2, ..., Ci−1}) on max(Ci), I0 ⊭ max(Ci).

Let I be I0[max(C1), max(C2), ..., max(Cn)]. To show that I and I0 disagree on the literals max(Ci), we have just shown that I0 ⊭ max(Ci) for all i, but I |= max(Ci) by the way it is constructed.

To show that I is mmin(C), we show that I is a model of C, but no smaller interpretation J is a model of C. Now, I is a model of C since it satisfies all the literals max(Ci). If J <i I, then J must differ from I on some eligible literal, by Theorem 4.13. Suppose I ↓ J is max(Ci) or its complement. Then J ⊭ max(Ci) (since I |= max(Ci) and I and J differ on max(Ci)). It remains to show that J does not satisfy the other literals of Ci. We know that mmin({C1, C2, ..., Ci−1}) ⊭ Ci, by the definition of ascending. By induction, we know that mmin({C1, C2, ..., Ci−1}) = I0[max(C1), max(C2), ..., max(Ci−1)]. Therefore I0[max(C1), max(C2), ..., max(Ci−1)] does not satisfy Ci. Therefore J does not satisfy Ci, since J agrees with I0[max(C1), max(C2), ..., max(Ci−1)] on all literals smaller than max(Ci), and thus the literals of Ci other than max(Ci) are not satisfied by J either. □

We now describe the procedure simp. There are two kinds of operations that take place during the processing of C involved in the call to simp. The first kind is to perform ordered resolutions between D and the last clause of C, when possible. For this, we define a_res(C, D) as follows.

DEFINITION 4.19. Suppose C and D are ground clauses, and suppose there is a literal L such that L = max(C) and ¬L = max(D), where ¬L denotes the complement of L. Then a_res(C, D) = (C − {L}) ∪ (D − {¬L}).

Now, a_res(C, D) is an (A-ordering) resolution involving the maximal literals in C and D. We note that A-ordering resolution is in itself a complete theorem-proving method for propositional logic and has natural extensions to first-order logic.

The second kind of operation that takes place during simp is to eliminate elements from C that are made irrelevant by these A-ordering resolutions, that is, elements of C that can be eliminated without affecting mmin[C]. We have the following procedure for simp.

procedure simp([C1, C2, ..., Cn], D);
  if max(Cn) and max(D) are complementary then
    D′ ← a_res(Cn, D);
    return simp([C1, C2, ..., Cn−1], D′)
  else if max(Cn) >t max(D) then
    return simp([C1, C2, ..., Cn−1], D)
  else return [C1, C2, ..., Cn, D] fi fi
end simp
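The recursion above can be sketched in Python for the propositional case. The encoding is an assumption for illustration: a literal is a nonzero integer, >t is the numeric order on atom indices, and a clause is a nontautologous frozenset of literals. The two base cases (an empty list, and an empty resolvent, which by Lemma 4.20 yields the list containing only the empty clause) are our reading of what the pseudocode leaves implicit.

```python
def max_lit(clause: frozenset) -> int:
    """max(C): the literal whose atom is greatest in the assumed order >t."""
    return max(clause, key=abs)


def a_res(c: frozenset, d: frozenset) -> frozenset:
    """A-ordering resolvent of C and D on their complementary maximal literals."""
    l = max_lit(c)
    return (c - {l}) | (d - {-l})


def simp(clauses: list, d: frozenset) -> list:
    """Insert D into the ascending list, resolving and pruning from the right."""
    if not d:                                   # empty clause derived
        return [frozenset()]
    if not clauses:                             # nothing left to compare with
        return [d]
    *rest, cn = clauses
    if max_lit(cn) == -max_lit(d):              # complementary maximal literals
        return simp(rest, a_res(cn, d))
    if abs(max_lit(cn)) > abs(max_lit(d)):      # Cn is made irrelevant
        return simp(rest, d)
    return clauses + [d]


# Second step of the example in Section 4.3: {¬P5, ¬P8} meets {¬P6, P8}.
print(simp([frozenset({-5, -8})], frozenset({-6, 8})))
```

The sketch assumes that D contradicts the minimal model of the list, as it does whenever simp is called from OSHL, so the case in which max(Cn) and max(D) are the same literal is not treated specially.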

In this procedure, we assume that the clause list [C1, C2, ..., Cn] is given in ascending order. The procedure simp operates much like Davis and Putnam’s method, which justifies our claim that OSHL is propositionally efficient. We have also found by experiment that OSHL is reasonably efficient on propositional and near-propositional problems. We now show that simp is a progress function.

LEMMA 4.20. If {C1, C2, ..., Cn, D} is satisfiable, then the procedure simp([C1, C2, ..., Cn], D) returns a list of the form [C1, C2, ..., Ci, D′′] for some i ≤ n and for some clause D′′ which is a logical consequence of {C1, C2, ..., Cn, D}, and for which max(D′′) >t max(Ci) if i > 0, and max(D′′) <t max(Ci+1) if i < n. If {C1, C2, ..., Cn, D} is unsatisfiable, then simp([C1, C2, ..., Cn], D) returns [D′′], where D′′ is the empty clause.

Proof. If max(Cn) <t max(D), then the lemma is true because then simp returns [C1, C2, ..., Cn, D]. If max(Cn) >t max(D), then the lemma follows by induction, because simp is called on smaller arguments of the same form. If max(Cn) and max(D) are complementary, then the theorem is true because then simp is called recursively on arguments of the same form, that is, simp([C1, C2, ..., Cn−1], D′) is called, where D′ is an A-resolvent of Cn and D, and thus max(D′) <t max(Cn). Since this clause D′ is a logical consequence of Cn and D, it follows that the clause D′′ in the lemma is a logical consequence of {C1, C2, ..., Cn, D}, since D′′ is obtained by a sequence of A-resolutions from clauses in {C1, C2, ..., Cn, D}. □

LEMMA 4.21. If {C1, C2, ..., Cn, D} is satisfiable, then the procedure simp([C1, C2, ..., Cn], D) returns a list of the form [C1, C2, ..., Ci, D′′] such that mmin[{C1, C2, ..., Ci, D′′}] >i mmin[{C1, C2, ..., Cn}].

Proof. By Theorem 4.18 above, we know that mmin[{C1, C2, ..., Cn}] = I0[max(C1), max(C2), ..., max(Cn)] and mmin[{C1, C2, ..., Ci, D′′}] = I0[max(C1), max(C2), ..., max(Ci), max(D′′)]. If i = n, then I0[max(C1), max(C2), ..., max(Ci), max(D′′)] >i I0[max(C1), max(C2), ..., max(Cn)] because max(D′′) = max(D) >t max(Cn). If i < n, then I0[max(C1), max(C2), ..., max(Ci), max(D′′)] >i I0[max(C1), max(C2), ..., max(Cn)] because max(D′′) <t max(Ci+1). □

THEOREM 4.22. The above procedure simp is a progress function, regarding its first argument [C1, C2, ..., Cn] as a representation of the set T = {C1, C2, ..., Cn}.

Proof. We first show that mmin[simp(T, D)] >i mmin[T] if D contradicts mmin[T] and T ∪ {D} is satisfiable. This follows by Lemma 4.21 above.

We next show that simp(T, D) = {{}} if T ∪ {D} is unsatisfiable. This follows because if T ∪ {D} is unsatisfiable, then either D is the empty clause or there must be an A-resolution proof of the empty clause from T ∪ {D} using the ordering >t, since A-resolution is complete. The proof is by induction on n. Since T is an ascending sequence, there cannot be any A-resolutions between the clauses of T. Thus the only possible A-resolution is that between D and a clause of T. If max(D) <t max(Cn), then Cn cannot contribute to an A-resolution proof, so we can delete Cn and still have an unsatisfiable set of clauses. If max(D) >t max(Cn), then no A-resolutions are possible, so this case cannot occur, since it would imply that T ∪ {D} is satisfiable. If max(D) and max(Cn) are complementary, then an A-resolution is possible between D and Cn, and afterwards no additional A-resolutions will be possible involving the clauses D and Cn, so {C1, C2, ..., Cn−1} ∪ {D′′} will be unsatisfiable, where D′′ is the A-resolvent of D and Cn. In all cases in which T ∪ {D} is unsatisfiable, simp is called recursively on arguments that preserve this unsatisfiability relation. Thus by induction, we will obtain a proof of the empty clause if T ∪ {D} is unsatisfiable. The base case is that in which T is empty, in which case D itself must be the empty clause.

We next show that simp(T, D) is a logical consequence of T ∪ {D}. This follows because the clause D′′ of Lemma 4.20 is a logical consequence of T and D. Now, simp is polynomial time computable because each call to simp takes a constant amount of time, plus time to do an A-resolution, and time for further recursive calls. An A-resolution can be done in linear time, and there are at most a linear number of recursive calls, so the total time bound is quadratic. Finally, we note that simp preserves atoms. That is, the set of atoms appearing in simp(T, D) is a subset of those appearing in T ∪ {D}. This follows from the fact that the only new clauses produced are obtained by resolution from T and D, and resolution does not generate new atoms. As a consequence of this, it follows that if T0, T1, T2, ... is a sequence of sets of clauses such that T0 is {} and Ti+1 = simp(Ti, Ci) for all i, and {C1, C2, C3, ...} is finite, then ⋃i Ti is finite. This follows because if {C1, C2, C3, ...} is finite, then it will only have a finite number of atoms. Since simp does not generate new atoms, ⋃i Ti will only have a finite number of atoms, hence will be finite. This completes the proof.

It is possible to define other versions of simp that are also progress functions. One way to do this is as follows: If simp([C1, C2, ..., Cn], D) returns [C1, C2, ..., Ci, D′′], and D′′ is not the empty clause, then define simp′([C1, C2, ..., Cn], D) to be [C1, C2, ..., Ci, D′′, D1, ..., Dk] where {D1, D2, ..., Dk} is the set of clauses Cj with j > i not containing the literal max(D). Then simp′ is also a progress function, since the extra clauses in it will only make the minimal model larger. The modified procedure simp′ has the advantage that it remembers more instances, avoiding the necessity to generate the same instance repeatedly.

We note that the above results together imply that the OSHL procedure is sound and complete, by Theorem 4.7. We also note that the search is deterministic, that is, it is not necessary to try more than one sequence of ground clauses. In a sense, OSHL constructs just one proof, in contrast to methods like resolution (or even those of Caferra, Peltier and Zabel [17, 18, 15, 40]) which construct many proofs in parallel. Thus OSHL has “proof confluence” in the sense of Bibel [9].

4.3. AN EXAMPLE

Suppose that the set of atoms is P1, P2, P3, ... ordered by P1 <t P2 <t P3 <t ... and that S contains the following clauses.

    P5
    ¬P5, ¬P8
    P6, ¬P10
    ¬P6, P8
    P10, ¬P11
    P11

Suppose I0 interprets Pi as true for all i, that is, I0 |= Pi. Then we generate the following sequence of current interpretations Ii = mmin[Ci], and the corresponding clauses GminS[Ii].

    I0                          (now the clause ¬P5, ¬P8 is chosen)
    I0[¬P8]                     (now ¬P6, P8 is chosen)
                                (resolvent ¬P5, ¬P6, from the above two clauses)
    I0[¬P6]                     (now ¬P5, ¬P8 is chosen again)
    I0[¬P6, ¬P8]                (now P6, ¬P10 is chosen)
    I0[¬P6, ¬P8, ¬P10]          (now P10, ¬P11 is chosen)
    I0[¬P6, ¬P8, ¬P10, ¬P11]    (now P11 is chosen)
                                (resolvent P10 is generated from P10, ¬P11 and P11)
                                (resolvent P6 is generated from P10 and P6, ¬P10)
                                (resolvent ¬P5 is generated from P6 and ¬P5, ¬P6)
    I0[¬P5]                     (now P5 is chosen)
                                (resolvent {} is generated from ¬P5 and P5)

We show the six ascending lists of clauses that are generated, too.

    {¬P5, ¬P8}
    {¬P5, ¬P6} (after “simp” is called)
    {¬P5, ¬P6}, {¬P5, ¬P8}
    {¬P5, ¬P6}, {¬P5, ¬P8}, {P6, ¬P10}
    {¬P5, ¬P6}, {¬P5, ¬P8}, {P6, ¬P10}, {P10, ¬P11}
    {¬P5}
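The run above can be replayed with a small propositional sketch of the whole loop. Everything about the encoding is an assumption made for illustration: Pk is the integer k and ¬Pk is −k, I0 makes every atom true, simp is written as a loop rather than a recursion, and ties among falsified clauses are broken by comparing atom indices from the largest downward, which is one total order compatible with the example but is not prescribed by the paper.

```python
def holds(lit: int, exceptions: set) -> bool:
    """Truth of lit in I0[exceptions], where I0 makes every atom true."""
    if lit in exceptions:
        return True
    if -lit in exceptions:
        return False
    return lit > 0


def oshl_propositional(S: list):
    """Run the OSHL loop on ground clauses; return the verdict and each list T."""
    T, trace = [], []
    while frozenset() not in T:
        # By Theorem 4.18 the minimal model of T is I0[max(C1), ..., max(Cn)].
        exceptions = {max(c, key=abs) for c in T}
        falsified = [c for c in S if not any(holds(l, exceptions) for l in c)]
        if not falsified:
            return "satisfiable", trace
        # Assumed tie-break: smallest clause by descending atom indices.
        d = min(falsified, key=lambda c: sorted(map(abs, c), reverse=True))
        while T and d:                           # simp, written iteratively
            top, m = max(T[-1], key=abs), max(d, key=abs)
            if top == -m:                        # resolve on maximal literals
                d = (T.pop() - {top}) | (d - {m})
            elif abs(top) > abs(m):              # last clause became irrelevant
                T.pop()
            else:
                break
        T = T + [d] if d else [frozenset()]
        trace.append(list(T))
    return "unsatisfiable", trace


S = [frozenset(c) for c in ({5}, {-5, -8}, {6, -10}, {-6, 8}, {10, -11}, {11})]
verdict, trace = oshl_propositional(S)
print(verdict, len(trace))
```

Under these assumptions the first six entries of the trace are the six ascending lists shown above, and the seventh is the list containing only the empty clause.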

4.4. ANOTHER EXAMPLE

Axioms:
    S1: {r, q, p},        S5: {r, q, ¬p},
    S2: {¬r, q, p},       S6: {¬r, q, ¬p},
    S3: {r, ¬q, p},       S7: {r, ¬q, ¬p}.
    S4: {¬r, ¬q, p},

Theorem:
    S8: {¬r, ¬q, ¬p}.

A propositional problem.

We now use another example to illustrate the ordered semantic hyper-linking algorithm. The set of clauses for our example is shown above. We choose the input semantics I0 to be {p, q, r}. We regard the first seven clauses as axioms and the last clause as the theorem, and I0 models all the axioms and contradicts the theorem. According to the ordering <t, p < q < r. Literals in the input clauses are ordered. We call each iteration of the loop in procedure OSHL a round. We use subscripts to distinguish each round’s variables.

Round 1: I1 = I0, D1 = {¬r, ¬q, ¬p}.

Round 2: I2 = I0[¬r] = {p, q, ¬r}, D2 = {r, ¬q, ¬p}. In the model generation procedure, D2 is resolved with D1, and a new clause C1 = {¬q, ¬p} is generated.

Round 3: I3 = I0[¬q] = {p, ¬q, r}, D3 = {¬r, q, ¬p}. No resolution is done in this round, and D3 is simply added to the set of ground clauses G. Note that A-resolution only resolves the maximal literals of two clauses.

Round 4: I4 = I0[¬q, ¬r] = {p, ¬q, ¬r}, D4 = {r, q, ¬p}. In the model generation procedure, D4 is resolved with D3 to generate clause C2 = {q, ¬p} and C2 subsequently is resolved with the C1 generated in round 2. A new clause {¬p} is generated.

Round 5: I5 = I0[¬p] = {¬p, q, r}.

In the next three rounds, an empty clause will be generated, and no more models can be found. Thus a refutation is obtained.

Figure 1. A semantic tree constructed by OSHL

Figure 1 shows the semantic tree that corresponds to the search of the OSHL procedure. Each leaf node of the semantic tree corresponds to an input clause, and each internal node corresponds to an A-resolvent. In fact, the proof process of ordered semantic hyper-linking can be understood as a process of constructing and searching through a semantic tree. An interpretation corresponds to a branch of the semantic tree. Assuming that the input semantics is the leftmost branch of the semantic tree and the nodes are in an increasing order from the root to the leaves, ordered semantic hyper-linking performs a depth-first* search of the semantic tree from left to right.

4.5. GENERATING GROUND INSTANCES

We now describe how to compute GminS in the procedure OSHL. To this end, we first present a theorem about disunification, which may be proved by methods similar to those in [30] and [23]:

THEOREM 4.23. If L is a literal and ℒ is a finite set of ground literals, then there is a finite set dis(L, ℒ) of possibly nonground literals such that for all instances L′ of L, L′ ∉ ℒ iff L′ is an instance of some literal in dis(L, ℒ). Moreover, the number of elements in dis(L, ℒ) is at worst polynomial in the sum of the symbol sizes of the literals in {L} ∪ ℒ, and dis(L, ℒ) can be computed in polynomial time.

* In the case of first-order logic, it performs an iterative-deepening search.

For the sake of completeness, we present a simple algorithm to compute dis(L, ℒ) below, after some definitions.

DEFINITION 4.24. Suppose that M is a ground literal and L is an arbitrary literal having M as an instance. Define M//L, the prefixes of M relative to L, to be the smallest set of literals such that (a) L ∈ M//L and (b) if L′ ∈ M//L and x is the leftmost variable in L′, then L′{x ← f(x1, ..., xn)} ∈ M//L, where x1, ..., xn are distinct variables that do not appear in L′, and where f is chosen so that M is an instance of L′{x ← f(x1, ..., xn)}.

For example, if L is P(x, f(y)) and M is P(a, f(f(b))) and F is {a, b, f}, then M//L is {P(x, f(y)), P(a, f(y)), P(a, f(f(y))), P(a, f(f(b)))}.

DEFINITION 4.25. Suppose that M is a ground literal and L is an arbitrary literal having M as an instance. Then dis2(L, M) is the set of literals of the form L′{x ← f(x1, ..., xn)} such that L′ ∈ M//L, L′ is not ground, x is the leftmost variable of L′, x1, ..., xn are distinct variables that do not appear in L′, and such that f is chosen so that M is not an instance of L′{x ← f(x1, ..., xn)}.

For L and M as above, dis2(L, M) is {P(b, f(y)), P(f(x1), f(y)), P(a, f(a)), P(a, f(b)), P(a, f(f(a))), P(a, f(f(f(x1))))}.

We observe that dis2(L, M) has a number of elements that is polynomial in ||L|| + ||M||, and the symbol sizes of these elements are also polynomial in ||L|| + ||M||. Also, no two distinct elements of dis2(L, M) unify with each other.

THEOREM 4.26. If L is a literal and L′ is an arbitrary ground instance of L, then L′ is not equal to M iff L′ is an instance of some element of dis2(L, M).

Proof. This may be seen by considering the leftmost position at which L′ and M differ. □

The algorithm to compute dis(L, ℒ) follows.

procedure dis(L, ℒ);
  [[ create a list S of instances of L such that an instance L′ of L is
     not in the set ℒ of ground literals iff L′ is an instance
     of some element of S ]]
  if ℒ is empty then return({L}) else
    let L1 be an element of ℒ;
    if L and L1 do not unify then return(dis(L, ℒ − {L1})) else
      if L ≡ L1 then return {} else
        return(⋃{dis(M, ℒ − {L1}) : M ∈ dis2(L, L1)})
  fi fi fi
end dis;
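A minimal Python sketch of M//L, dis2 and dis follows. The term representation is an assumption for illustration: a variable is a string, and a constant or compound term (or an atom) is a tuple whose first component is the symbol. The signature SIG, the fresh-variable naming scheme and all function names are hypothetical. Since the second argument of each unification here is ground, unification reduces to an instance test.

```python
from itertools import count

SIG = {"a": 0, "b": 0, "f": 1}      # assumed function symbols F with arities
_fresh = count(1)                   # supplies names for new variables


def leftmost_var(t):
    """Leftmost variable of t, or None if t is ground."""
    if isinstance(t, str):
        return t
    return next((v for v in map(leftmost_var, t[1:]) if v is not None), None)


def subst(t, x, s):
    """Replace every occurrence of variable x in t by s."""
    if isinstance(t, str):
        return s if t == x else t
    return (t[0],) + tuple(subst(a, x, s) for a in t[1:])


def is_instance(ground, pattern, binding=None):
    """Is the ground term an instance of pattern?"""
    binding = {} if binding is None else binding
    if isinstance(pattern, str):
        return binding.setdefault(pattern, ground) == ground
    return (not isinstance(ground, str) and ground[0] == pattern[0]
            and len(ground) == len(pattern)
            and all(is_instance(g, p, binding)
                    for g, p in zip(ground[1:], pattern[1:])))


def expansions(lit, sig):
    """Replace the leftmost variable of lit by f(x1, ..., xn) for each f."""
    x = leftmost_var(lit)
    for f, n in sig.items():
        yield subst(lit, x, (f,) + tuple(f"_v{next(_fresh)}" for _ in range(n)))


def prefixes(M, L, sig=SIG):
    """M//L: the chain of literals leading from L to its ground instance M."""
    chain = [L]
    while leftmost_var(chain[-1]) is not None:
        chain.append(next(e for e in expansions(chain[-1], sig)
                          if is_instance(M, e)))
    return chain


def dis2(L, M, sig=SIG):
    """One-step expansions of nonground prefixes that no longer match M."""
    return [e for p in prefixes(M, L, sig) if leftmost_var(p) is not None
            for e in expansions(p, sig) if not is_instance(M, e)]


def dis(L, ground_lits, sig=SIG):
    """Literals whose instances are the instances of L outside ground_lits."""
    if not ground_lits:
        return [L]
    L1, rest = ground_lits[0], ground_lits[1:]
    if not is_instance(L1, L):      # L and L1 do not unify
        return dis(L, rest, sig)
    if L == L1:
        return []
    return [r for M in dis2(L, L1, sig) for r in dis(M, rest, sig)]


L = ("P", "x", ("f", "y"))
M = ("P", ("a",), ("f", ("f", ("b",))))
print(len(prefixes(M, L)), len(dis2(L, M)))
```

For the literals L and M of the example following Definition 4.24, this sketch yields four prefixes and six elements of dis2(L, M), up to the naming of the fresh variables.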

THEOREM 4.27. The procedure dis(L, ℒ) returns a set ℳ of literals such that an arbitrary instance L′ of L is not in ℒ iff L′ is an instance of some literal in ℳ. Also, dis(L, ℒ) runs in polynomial time, and the sum of the symbol sizes of the literals in ℳ is polynomial in the sum of the symbol sizes of the literals in {L} ∪ ℒ. Furthermore, no two elements of ℳ unify with each other.

Proof. The proof that an arbitrary instance L′ of L is not in ℒ iff L′ is an instance of some literal in ℳ follows by simple properties of unification, and by Theorem 4.26. The key observation for the polynomial bound on running time is that the elements of dis2(L, L1) do not unify with each other. This means that the elements of dis2(L, L1) partition the set ℒ − {L1} into disjoint subsets that unify with the different elements of dis2(L, L1). We can then bound the number of recursive calls of the procedure “dis”, since for a given call of dis(L, ℒ), if L and L1 unify, then there will be at most one recursive call to “dis” for each element M of dis2(L, L1). Thus the number of recursive calls to “dis” generated by this invocation of dis(L, ℒ) is bounded by the number of elements M of dis2(L, L1), which is polynomial. If L1 and L do not unify, no recursive calls are generated. Also, for each literal M′ in ℒ, there will be only one call to dis(L, ℒ) in the entire execution tree such that L1 will be M′ and such that L1 and L will unify. Thus the total number of calls to “dis” in the execution tree is polynomial. Since each call to “dis” runs in polynomial time, excluding the time for recursive calls, the total execution time for “dis” is polynomial. The fact that no two distinct elements of the set returned by dis(L, ℒ) unify with each other follows because no two distinct elements of dis2(L, L1) unify. □

The procedure “dis” should be efficient because in most cases L and L1 will not unify. Also, L will get larger on recursive calls, making the probability of unification even less. If one is looking for instances of size s, for some s, which is the typical case, then dis(L, ℒ) can return the empty set if ||L|| > s, further improving the efficiency. In addition, it is not necessary to compute all elements of dis(L, ℒ) at once, but they can be returned one by one, and the computation can be stopped as soon as GminS finds an instance of size s, if one is using the size ordering on clauses. The computation of dis2(L, M) is efficient because it can be done in polynomial time. All of these factors together should make disunification very efficient, even beyond the fact that its worst-case running time is polynomial. In most cases the size of the set returned should be very small.

We can also characterize the set of ground clauses that contradict the model I0[L1, ..., Ln] as follows.

THEOREM 4.28. A ground clause C contradicts I0[L1, ..., Ln] iff C = C1 ∪ C2, C1 ∩ C2 = {}, for all L ∈ C1, ¬L ∈ {L1, ..., Ln}, and for all L ∈ C2, L ∉ {L1, ..., Ln} and I0 ⊭ L.

Proof. By the definition of |= and the definition of I0[L1, ..., Ln] from Definition 4.12 above. □

These ground instances can be obtained by first instantiating clauses of S with eligibility substitutions and then instantiating so that part of the clause contradicts I0.

DEFINITION 4.29. An eligibility substitution for a clause C and a set {L1, ..., Ln} of eligible literals is a most general substitution α such that for every literal L in C, Lα is an instance of some literal in dis(L, {L1, ..., Ln}) ∪ {¬L1, ..., ¬Ln}.

COROLLARY 4.30. If C is a clause and β is a ground substitution and I0[L1, ..., Ln] ⊭ Cβ, then there is an eligibility substitution α for C and {L1, ..., Ln} such that Cβ is an instance of Cα.

Proof. For every literal L of C, Lβ must be an instance of some literal in dis(L, {L1, ..., Ln}) ∪ {¬L1, ..., ¬Ln}. If we let Cα be the most general instance of C having this property, then α is an eligibility substitution as in the corollary. □

This shows that we can find β such that I0[L1, ..., Ln] ⊭ Cβ by first unifying each literal L of C with an element of dis(L, {L1, ..., Ln}) ∪ {¬L1, ..., ¬Ln} and then further instantiating the literals in Cα − {¬L1, ..., ¬Ln} so that they contradict I0. This latter job is often made easier by the fact that α often replaces some or all variables of C by ground terms.

THEOREM 4.31. Given a clause $C$, the set of eligibility substitutions $\alpha$ for $C$ and $\{L_1, \ldots, L_n\}$ can be computed in a time polynomial in $\|L_1\| + \|L_2\| + \cdots + \|L_n\|$ and in the size of $C$ and exponential in the number of literals in $C$.

Proof. To compute the eligibility substitutions, we need to examine all possible ways of unifying literals $L$ of $C$ with literals of $\mathit{dis}(L, \{L_1, \ldots, L_n\}) \cup \{\neg L_1, \ldots, \neg L_n\}$, and taking the most general among them. This latter set is of size polynomial in $\|L_1\| + \cdots + \|L_n\|$ and has a number of literals polynomial in $\|L_1\| + \cdots + \|L_n\|$. If this set has $k$ literals, and $C$ has $d$ literals, then there are at most $k^d$ unifications to try. Each such unification can be done in polynomial (even linear) time. □

If the number of literals in $C$ is bounded, as it will be for a fixed $S$, then this process of computing eligibility substitutions requires only polynomial time.

Finally, we define the procedure $G^{\min}_S$. This procedure first computes the set $U$ of all minimal terms of increasing sizes. The use of minimal terms considerably increases the efficiency of instantiation in OSHL. This is a significant improvement over the instantiation procedure in CLIN-S [14], for example. Next, $G^{\min}_S$ uses the set $U$ of minimal terms to compute a set $V$ of instances of clauses in $S$ which contradict $I$. Finally, a minimal element of $V$ is returned. If there are only finitely many minimal terms, $G^{\min}_S$ may halt and return “fail.” If there are infinitely many minimal terms, but no instances of $S$ contradict $I$, then $G^{\min}_S$ will fail to terminate. For this procedure, we assume $>_{cl}$ is the extension of the lexicographic ordering $>_{lex}$ to clauses. Note that we can test whether $I_0 \not\models D\beta$ because $I_0$ is ground decidable and $D\beta$ is a ground clause.

    procedure G^min_S [I];
        [[ compute minimal ground instance relative to >lex
           that contradicts interpretation I ]]
        let {L1, ..., Ln} be such that I is I0[L1, ..., Ln];
        [[ compute minimal terms relative to I and >lex ]]
        U ← {};  [[ U is the set of minimal terms ]]
        for s = 0, 1, 2, 3, ... do
            Unew ← {};  [[ Unew is the set of minimal terms of size s ]]
            U1 ← {terms f(s1 ... sk) of size s : si ∈ U, f ∈ F};
            for all terms t in U1 in order of >lex do
                if there does not exist u in U ∪ Unew such that t^I0 = u^I0
                then Unew ← Unew ∪ {t} fi;
            od;
            if Unew is empty then return “fail” fi;
            U ← U ∪ Unew;
            V ← {};  [[ V is a set of instances of S ]]
            for C ∈ S do
                for all eligibility substitutions α of C do
                    D ← Cα − {¬L1, ..., ¬Ln};
                    for all β of the form {x1 ← t1, ..., xm ← tm} where
                        x1, ..., xm are the variables of D and t1, ..., tm are in U do
                        if I0 ⊭ Dβ then V ← V ∪ {Cαβ} fi
                    od;
                od;
            od;
        until V is not empty od;
        return an element of V that is minimal in the clause ordering >cl;
    end G^min_S;
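The minimal-term loop at the start of $G^{\min}_S$ can be sketched as follows. This is an illustration under stated assumptions rather than the OSHL code: the size measure (0 for a constant, plus 1 per function application) is assumed, sorting by `repr` stands in for the ordering $>_{lex}$, and the signature and the mod-3 semantics are hypothetical.

```python
from itertools import product


def size(term):
    """Assumed size measure: 0 for a constant, plus 1 per function application."""
    return 0 if len(term) == 1 else 1 + sum(size(arg) for arg in term[1:])


def minimal_terms(signature, value, max_size):
    """Collect, size by size, ground terms whose value in I0 is not yet represented.

    signature maps each function symbol to its arity; value(term) returns
    the element of the domain denoted by a ground term in I0.
    """
    minimal, seen_values = [], set()
    for s in range(max_size + 1):
        # Candidate terms of size s whose arguments are already minimal terms.
        candidates = []
        for symbol, arity in signature.items():
            for args in product(minimal, repeat=arity):
                term = (symbol, *args)
                if size(term) == s:
                    candidates.append(term)
        new_terms = []
        for term in sorted(candidates, key=repr):  # stand-in for the >lex order
            v = value(term)
            if v not in seen_values:
                seen_values.add(v)
                new_terms.append(term)
        if not new_terms:
            break  # corresponds to the "fail" exit: no minimal terms of size s
        minimal.extend(new_terms)
    return minimal


def value_mod3(term):
    """Hypothetical I0: z denotes 0 and s denotes successor modulo 3."""
    head, args = term[0], term[1:]
    return 0 if head == "z" else (value_mod3(args[0]) + 1) % 3


terms = minimal_terms({"z": 0, "s": 1}, value_mod3, max_size=10)
print(terms)  # [('z',), ('s', ('z',)), ('s', ('s', ('z',)))]
```

With a three-element domain only three minimal terms exist, so the enumeration stops at size 3 even though the Herbrand universe is infinite; this is the case in which the procedure may return “fail.”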

The actual implementation of OSHL uses a different implementation of $G^{\min}_S$ that involves unifying some of the literals of $C$ with complements of eligible literals and replacing the remaining variables of $C$ with minimal or nonminimal terms in such a way that no eligible literals are created, and so that the resulting clause contradicts $I$. In practice, this use of nonminimal terms is efficient because the chance of instantiating a literal to an eligible literal is very low.


4.6. GOAL SENSITIVITY AND PROOF SENSITIVITY

We now show that the procedure OSHL is goal sensitive, as claimed in Section 3.2, that is, the inferences performed by the prover are all relevant to the particular theorem being proved. For this we have the following definition.

DEFINITION 4.32. The set of ground instances of a set $S$ of clauses relevant to a structure $I$ is the smallest set of ground instances of $S$ satisfying the following two conditions:

• A ground instance of a clause in $S$ that is not satisfied by $I$ is relevant to $I$.
• If ground instance $C$ of a clause in $S$ is relevant to $I$ and $D$ is another ground instance of a clause in $S$ and there are literals $L$ and $M$ in $C$ and $D$, respectively, such that $L$ and $M$ are complementary, then $D$ is also relevant to $I$.

The motivation for this definition is that $I$ is a structure satisfying the axioms of a theorem. Thus the only clauses that contradict $I$ are those clauses representing the negation of the particular theorem being attempted. These are relevant to the theorem. Also, if two ground instances $C$ and $D$ have complementary literals, then they are in some sense related to each other, so if one of them is relevant to the theorem, the other is, too.
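Over a finite collection of ground instances, the relevant set of Definition 4.32 is a closure that can be computed with a worklist. The sketch below assumes clauses are frozensets of `(is_positive, atom)` pairs and that the caller supplies the instances falsified by $I$; the function name and the example clauses are hypothetical.

```python
def relevant_instances(instances, falsified):
    """Smallest set containing `falsified` and closed under linking by
    a pair of complementary literals (Definition 4.32, finite case)."""
    relevant = set(falsified)
    frontier = list(relevant)
    while frontier:
        clause = frontier.pop()
        for other in instances:
            if other in relevant:
                continue
            # `other` becomes relevant if it holds the complement of a literal of `clause`.
            if any((not sign, atom) in other for (sign, atom) in clause):
                relevant.add(other)
                frontier.append(other)
    return relevant


goal = frozenset({(False, "q(a)")})                # negated theorem, false in I
ax1 = frozenset({(True, "q(a)"), (False, "p(a)")})
ax2 = frozenset({(True, "p(a)")})
ax3 = frozenset({(True, "r(b)")})                  # shares no complementary literal

result = relevant_instances([goal, ax1, ax2, ax3], [goal])
print(ax3 in result)  # False
```

Here `ax1` is linked to the goal through q(a), `ax2` is linked to `ax1` through p(a), and `ax3` is never reached.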

THEOREM 4.33. All ground instances $D$ generated by the procedure OSHL are relevant to the initial interpretation $I_0$, the minimal interpretation in the ordering $>_i$ on interpretations.

Proof. The first instance $D$ generated by OSHL contradicts $I_0$, so it is relevant to $I_0$. Later instances $D$ either contradict $I_0$ or contain the complement of some eligible literal $L_i$. But $L_i$ is contained in some earlier ground instance $D'$ generated by OSHL. By induction, we can assume that $D'$ is relevant to $I_0$. It follows that $D$ is, as well. □

Previously we considered interpretations that were minimal with respect to the ordering $<_i$ on interpretations, but here we consider interpretations $I$ that are minimal with respect to the set of positive literals $L$ such that $I \models L$. We write $I \le_{pos} J$ if for all atoms $A$, if $I \models A$, then $J \models A$. We say a model $I$ of a set $S$ of clauses is $\le_{pos}$-minimal if for all models $J$ of $S$, $I \le_{pos} J$.

We can say even more than Theorem 4.33 if the input set $S$ is a set of Horn clauses and the semantics is chosen as a $\le_{pos}$-minimal model of the clauses of $S$ that contain at least one positive literal. In this case, OSHL is proof sensitive, in that it only generates subgoals that are provable.

DEFINITION 4.34. For a set $S$ of Horn clauses, the axioms are the clauses containing at least one positive literal. The goal clauses are the remaining clauses.


This definition is consistent with the use of Horn clauses in Prolog, although one could imagine cases where the axioms might naturally include an all-negative clause.

THEOREM 4.35. Suppose $S$ is a set of Horn clauses and $I_0$ is a $\le_{pos}$-minimal model of the axioms of $S$. Then for all ground clauses $D$ generated by OSHL, and for all negative literals $\neg L$ in $D$, $L$ is a logical consequence of the axioms of $S$.

Proof. It is well known that the axioms of $S$ have a $\le_{pos}$-minimal model $I$, since they are a Horn set; see, for example, [29]. This model has the property that if $L$ is a positive literal, then $I \models L$ iff $L$ is a logical consequence of the axioms of $S$.

Now, if $D$ is a ground instance of a clause in $S$ and $I_0 \not\models D$, then $D$ must be an instance of a goal clause, since $I_0$ models the axioms of $S$. This means that every literal in $D$ is negative and that, for every literal $\neg L$ in $D$, $I_0 \not\models \neg L$, that is, $I_0 \models L$. Thus $L$ is a logical consequence of the axioms of $S$, as claimed.

We assume by induction that all eligible literals generated are negative and that their complements are logical consequences of the axioms of $S$. This is true for the first clause chosen, as we just showed. In general, the ground instance $D$ generated will either contradict $I_0$ or have some literals $L$ that are complements of eligible literals. In the former case, the theorem is true. In the latter case, there can be at most one such literal $L$, since any such literals must be positive, and $S$ contains only Horn clauses, which have at most one positive literal. The remaining literals of $D$ must be negative, and hence cannot be complements of eligible literals. These literals therefore must contradict $I_0$. Since $I_0$ is a $\le_{pos}$-minimal model of the axioms of $S$, the negations of these literals must be logical consequences of the axioms of $S$. Furthermore, any new eligible literals of $D$ will be negative, as assumed in our induction hypothesis. This completes the proof. □

If we think of each eligible literal as (the negation of) a subgoal, this means that the prover only generates subgoals that are logical consequences of axioms, and hence provable. Thus OSHL is proof sensitive. This is in a sense the best one could hope for. This is evidence that OSHL, with a good semantics, can be efficient on Horn sets. In this case, OSHL functions much like the geometry theorem prover of Gelernter et al. [24]. Later we will give some Horn clause planning problems for which a good semantics yielded good results, in harmony with the theorem we have just proved.
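For a finite set of ground Horn axioms, the $\le_{pos}$-minimal model used in Theorem 4.35 can be computed by forward chaining. The following sketch covers only that finite ground case; the `(head, body)` clause representation and the example atoms are assumptions made for illustration.

```python
def least_model(horn_axioms):
    """Atoms true in the least model of ground Horn axioms.

    horn_axioms is a list of (head, body) pairs, where head is an atom and
    body is a tuple of atoms; the pair stands for the clause
    {head, not b1, ..., not bk}.
    """
    true_atoms = set()
    changed = True
    while changed:
        changed = False
        for head, body in horn_axioms:
            if head not in true_atoms and all(b in true_atoms for b in body):
                true_atoms.add(head)
                changed = True
    return true_atoms


def goal_contradicts(goal_atoms, model):
    """A goal clause {not g1, ..., not gk} is false in the model iff every gi holds."""
    return all(g in model for g in goal_atoms)


axioms = [("p(a)", ()), ("q(a)", ("p(a)",)), ("r(a)", ("s(a)",))]
model = least_model(axioms)
print(sorted(model))                       # ['p(a)', 'q(a)']
print(goal_contradicts(["q(a)"], model))   # True
print(goal_contradicts(["r(a)"], model))   # False
```

Only the goal {¬q(a)} contradicts the model, and its subgoal q(a) is indeed derivable from the axioms, whereas {¬r(a)} is never selected because r(a) is not a consequence.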

4.7. SEMANTICS

We now give an example of how the semantics $I_0$ for a set $S$ of clauses can be specified in Prolog in the OSHL implementation. This is done by giving Prolog code to compute the truth values of ground literals in the interpretation $I_0$. Consider the following set of input clauses:


Axioms:
    {p(e, X, X)}
    {p(i(X), X, e)}
    {p(U, Z, W), ¬p(X, Y, U), ¬p(Y, Z, V), ¬p(X, V, W)}
    {p(X, V, W), ¬p(X, Y, U), ¬p(Y, Z, V), ¬p(U, Z, W)}
    {p(X, Y, f(X, Y))}
    {p(a, b, c)}.*

Theorem:
    {¬p(a, e, a)}.

The above example represents the following group theory problem: A group with left inverse and left identity has a right identity. A natural semantics for the problem can be constructed based on an example from the integer addition group. The domain is the set of integers. e denotes 0, a denotes 2, b denotes 3, c denotes 5, function i() denotes the inverse function, function f() denotes the addition function, and predicate p denotes the sum relation. Of course, many other semantics could be constructed in which the constants have different integer values, or values in different groups. The semantics is represented as decision procedures for ground literals. A Prolog representation of the procedures is shown below. The integer addition group is commutative, so p(a, b, f(b, a)) is true in the given semantics, but it is not true in the $\le_{pos}$-minimal model. A better semantics (i.e., a better approximation of the $\le_{pos}$-minimal model) can be constructed based on free group examples.

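    % eval(Term, Value): integer value of a ground term.
    eval(e,0).
    eval(a,2).
    eval(b,3).
    eval(c,5).
    eval(i(Xt),V) :- eval(Xt,X), V is -X.
    eval(f(Xt,Yt),V) :- eval(Xt,X), eval(Yt,Y),
                        V is X+Y.
    % eval(Atom): succeeds iff the ground atom is true in I0.
    eval(p(Xt,Yt,Zt)) :- eval(Xt,X), eval(Yt,Y),
                         eval(Zt,Z), Temp is X + Y,
                         Z = Temp.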

In general, any semantics $I$ can be specified in this manner, provided that it is computable for a ground literal $L$ whether $I \models L$. It might be possible to specify partially computable semantics by interleaving the computation of the semantics with the operation of the prover.
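For readers who prefer to experiment outside Prolog, the integer-addition semantics described above can be mirrored in a few lines of Python. This is an illustrative sketch, not part of OSHL: ground terms are written as strings for constants and nested tuples for applications, and the names `eval_term` and `holds_p` are hypothetical.

```python
CONSTANTS = {"e": 0, "a": 2, "b": 3, "c": 5}


def eval_term(term):
    """Integer value of a ground term such as "a" or ("f", "a", ("i", "b"))."""
    if isinstance(term, str):
        return CONSTANTS[term]
    head, *args = term
    if head == "i":
        return -eval_term(args[0])
    if head == "f":
        return eval_term(args[0]) + eval_term(args[1])
    raise ValueError(f"unknown function symbol {head!r}")


def holds_p(x, y, z):
    """Truth of the ground atom p(x, y, z): x + y = z over the integers."""
    return eval_term(x) + eval_term(y) == eval_term(z)


print(holds_p("a", "e", "a"))               # True
print(holds_p("a", "b", ("f", "b", "a")))   # True
print(holds_p("a", "b", "e"))               # False
```

The first call shows that the theorem clause {¬p(a, e, a)} is false in this semantics, so it contradicts $I_0$; the second reproduces the observation that p(a, b, f(b, a)) holds because integer addition is commutative.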

The Prolog implementation also permits a form of implicit typing, which was instrumental in solving some planning problems [55]. This feature prevents certain “ill-formed” terms such as 1+apple from being generated when instantiating ground clauses. By proper use of this feature, the user can enforce a typing discipline on his or her ground instances, and this has the potential to greatly increase the efficiency of the prover. This feature is implemented in the following manner: A ground instance $D$ contradicts $I_0$ if for all literals $L$ in $D$, $I_0 \not\models L$. The user inputs Prolog code to test whether $I_0 \models L$. The user can define both $p$ and $\neg p$, for a predicate symbol $p$. If both $p(t_1, \ldots, t_n)$ and $\neg p(t_1, \ldots, t_n)$ evaluate to true, then the literals $p(t_1, \ldots, t_n)$ and $\neg p(t_1, \ldots, t_n)$ will never be included in ground instances $D$ generated by the prover. The internal Prolog code in the prover to implement this feature is as follows.

* The last two axioms are not needed for the proof.

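    % A negative literal is not false if the user declares not(L) true.
    semantics_false_literal(not(L)) :-
        eval(not(L)), !, fail.
    % Otherwise not(L) is false exactly when L is true.
    semantics_false_literal(not(L)) :-
        !, eval(L).
    % A positive literal is false exactly when eval fails on it.
    semantics_false_literal(L) :-
        !, \+ eval(L).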

The user can make use of this feature by definitions of the form

    eval(p(Xt,Yt,Zt)) :- ...
    eval(not(p(Xt,Yt,Zt))) :- ...

in the semantics. This feature can also be useful in set theory, since it can permit one to avoid consideration of any set expressions that generate infinite sets. Since infinite sets make the semantics of set theory uncomputable, it may be convenient to exclude them. Even with such a restriction, many set theory problems should still be provable. However, we have not experimented with this possibility yet.
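The effect of defining both $p$ and $\neg p$ can be seen in a small Python analogue of `semantics_false_literal`. It is a sketch only: the predicate `even`, the two evaluator functions, and the treatment of the ill-typed argument "apple" are hypothetical and merely mimic the implicit typing described above.

```python
def make_false_literal_test(eval_pos, eval_neg=None):
    """Build a test for "this ground literal is false in the semantics".

    eval_pos(atom) plays the role of eval(Atom); eval_neg(atom), if given,
    plays the role of a user-supplied eval(not(Atom)).
    """
    def is_false(is_positive, atom):
        if not is_positive:
            if eval_neg is not None and eval_neg(atom):
                return False         # user declares not(atom) true
            return eval_pos(atom)    # not(atom) is false iff atom is true
        return not eval_pos(atom)    # atom is false iff evaluation fails
    return is_false


def even_pos(atom):
    """Hypothetical eval(even(X)): also succeeds on ill-typed arguments."""
    (_, x) = atom
    return not isinstance(x, int) or x % 2 == 0


def even_neg(atom):
    """Hypothetical eval(not(even(X))): also succeeds on ill-typed arguments."""
    (_, x) = atom
    return not isinstance(x, int) or x % 2 == 1


is_false = make_false_literal_test(even_pos, even_neg)
print(is_false(True, ("even", 3)))         # True
print(is_false(True, ("even", "apple")))   # False
print(is_false(False, ("even", "apple")))  # False
```

Because neither even(apple) nor its negation is ever reported false, no contradicting instance can contain either literal, which is how the ill-typed term is kept out of generated ground instances.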

4.8. THE COMPLEXITY OF INSTANTIATION

The fact that the instantiation procedure $G^{\min}_S$ constructs substitutions $\beta$ of the form $\{x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n\}$ where $x_1, \ldots, x_n$ are the variables of $D$ and $t_1, \ldots, t_n$ are in $U$ can lead to inefficiency, since this is essentially an exhaustive enumeration of the Herbrand universe (subject to the minimality condition). However, this procedure is made more efficient because the number of variables $x_i$ is typically small. This is true because the eligibility substitutions $\alpha$ typically unify one or more literals of $C$ with complements of eligible literals. Since these eligible literals are ground, any literals of $C$ so unified become ground literals, and all of their variables are replaced by ground terms. This reduces the number of variables (and probably, nonground literals) that remain in $C\alpha$. If no literals of $C$ are unified with complements of eligible literals, then $C$ itself contradicts $I_0$. Since axioms of $S$ are typically satisfied by $I_0$, this typically means that $C$ is a clause deriving from the negation of the particular theorem. This often means that $C$ is a ground clause, since the clausal form of a pure universal theorem will be pure existential, and its Skolemization will be a ground clause. These two factors together help to minimize the number of variables to instantiate by substitutions $\beta$, often to zero.

Another factor that tends to reduce the work for instantiation is that $I_0$ typically satisfies the clauses of $S$ obtained from axioms. This means that it is not necessary even to attempt to instantiate these clauses, in many cases, if one knows that they are satisfied by $I_0$. Also, unit clauses that derive from axioms will never need to be instantiated, because they will be satisfied by $I_0$, and if they unify with the complement of an eligible literal, there are no variables left over to require further instantiation. In the same way, any axiom clause in which all variables appear in all literals will never need to be instantiated.

Still, there are some cases where OSHL is very inefficient. Notable among these is the case when $S$ contains two unit clauses $\{L\}$ and $\{M\}$, which resolve to produce a contradiction. It can be that the most general instance of $L$ and $\neg M$ is of size exponential in the sizes of $L$ and $M$. This can make an exhaustive enumeration procedure run in double exponential time, in this case. However, the OSHL implementation has some UR-resolution built in, which will find this proof quickly. Also, a good semantics can reduce the instantiation time even in bad cases such as this. Still, there are undoubtedly some problems for which unification-based provers are superior because of the occurrence of large terms in short proofs.

In Theorem 4.35, we gave evidence that a $\le_{pos}$-minimal model is a good semantics for Horn clauses. For non-Horn clauses, assuming (as is often the case) that these clauses contain mostly negative literals, it seems reasonable that a good semantics is one in which

• the semantics models the (Skolemized) axioms of $S$, and
• there are as few true positive atoms as possible.

The first condition guarantees goal sensitivity. The second tends to reduce the number of instances that contradict $I_0[L_1, \ldots, L_n]$, since most of the literals in clauses of $S$ are negative, and $I_0[L_1, \ldots, L_n]$ will tend to make negative literals true. Since a clause $C\alpha\beta$ contradicts $I_0[L_1, \ldots, L_n]$ iff every literal of $C\alpha\beta$ contradicts $I_0[L_1, \ldots, L_n]$, it is unlikely for $C\alpha\beta$ to contradict $I_0[L_1, \ldots, L_n]$ if $C$ has many negative literals, all of which tend to be satisfied by $I_0[L_1, \ldots, L_n]$. This means that the probability that $C\alpha\beta$ contradicts $I_0[L_1, \ldots, L_n]$ is small, so the number of instances of $C$ generated by $G^{\min}_S$ will be reduced. These conditions at least give some general guidelines as to which semantics will be most efficient for OSHL, and why it would be preferred to approximate an all-negative semantics as closely as possible. In other words, we want to interpret a predicate $P(t_1 \ldots t_n)$ as true only if there is a good reason to do so; otherwise, we interpret $P(t_1 \ldots t_n)$ as false.


4.9. COMBINING OSHL AND RESOLUTION

Since OSHL has difficulty generating large literals in some cases, and resolution easily generates large literals using unification, it seems reasonable to combine OSHL and resolution. One way to do this is to call OSHL($S'$) where $S'$ is $S$ together with some resolvents of clauses in $S$. Actually, OSHL itself does A-resolutions (which eliminate the largest literals of clauses) in the simp procedure. In fact, the final proof generated by OSHL is an A-resolution proof. So if $S'$ contains $S$ together with some A-resolvents of clauses in $S$, then calling OSHL($S'$) might save some work for OSHL, since the size of the A-resolution proof OSHL would have to generate would be smaller.

This also applies to resolution-paramodulation strategies, such as those studied by Hsiang and Rusinowitch [27] and Bachmair et al. [7], that also eliminate large literals of clauses. A combination of OSHL with such resolution-paramodulation strategies is attractive because of the efficiency of these resolution-paramodulation strategies on some examples. The completeness of such a combination, however, depends on the fact that the ordering $>_t$ used by OSHL is the same as the termination ordering used by these resolution-paramodulation refinements.

4.10. EQUALITY

We have also implemented an extension of OSHL to equality, which has been described in [46], along with a mechanism for handling associative-commutative function symbols. This mechanism emphasizes pure equational problems and essentially performs completion on a set of equations.

We now present a complete extension of OSHL to first-order logic with equality which also in many cases avoids the explicit use of the equality axioms. The current implementation of OSHL may be incomplete for general clause sets involving equality without the equality axioms, since it may not fully implement this approach. For example, the current implementation assumes that the initial structure $I_0$ is a model of the unit equations. The extension to equality applies when the input clauses consist of a set $E$ of unit equations and a set $S$ of other clauses not containing positive occurrences of the equality predicate, and in other cases as well. Briefly, our extension does narrowings of the clauses of $S$ using equations in $E$, does unfailing completion [3] of $E$ to add new equations to $E$, and uses the equations of $E$ to simplify ground instances generated by OSHL.

In the following, assume that $E$ is a set of unit equations and that $S$ is a set of clauses not containing positive occurrences of the equality predicate, except for the axiom $x = x$, which is assumed to occur in $S$. Later we will give other conditions guaranteeing the completeness of this version of OSHL. In the following procedure, “unfailing_complete(E)” is assumed to do one round of unfailing completion on the set $E$ of equations, so that some critical pairs between equations of $E$ are added to $E$, and $E$ is simplified. Also, “narrow(S,E)” is assumed to do one round of narrowing of $S$ by the equations $E$, so that some of the paramodulants of $S$ by $E$ in a simplifying direction are added to $S$.

Note that the equations of $E$ are not instantiated in this version of OSHL. This, as well as omission of the equality axioms, improves efficiency.

    procedure OSHL(S, E)
        T ← {};
        while {} ∉ T do
            round ← 0;
            do
                round ← round + 1;
                E ← unfailing_complete(E);
                S ← narrow(S, E);
                D ← G^min_S [mmin[T]];
            until D is not “fail” or E and S are unchanged od;
            if (D = fail) and E and S are unchanged then return satisfiable;
            T ← simp_E(T, D)
        endwhile
        return unsatisfiable
    end OSHL

We now describe the modifications to $G^{\min}_S$ and simp that are used in this version of OSHL. In $G^{\min}_S$, we refer to the round in which a clause was generated to guide the instantiation. This is used to restrict the size of the substitutions $\beta$. We define the size of a substitution $\{x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n\}$ as the maximum of $\|t_i\|$. $G^{\min}_S$ is modified so that the substitutions $\beta$ applied to a clause $D$ obtained from $C$ are restricted so that (size of $\beta$) + (round in which $C$ was generated) = (current round). For this, we count clauses present when “round” is set to zero as being generated at round zero. Also, in the main loop of $G^{\min}_S$, $s$ increases only until it reaches the current round number. Finally, the line

    if Unew is empty then return “fail” fi;

of $G^{\min}_S$ is deleted. This modification of $G^{\min}_S$ permits instantiation and narrowing to be interleaved.

We also modify the simp procedure. In $\mathit{simp}_E(T, D)$, all clauses of an ascending sequence $T$ are rewritten to normal form using $E$, and the maximum prefix of $T$ that is still an ascending sequence is taken. Call this $T'$. Let $D'$ be the simplified form of $D$. Then $\mathit{simp}(T', D')$ is returned as the value of $\mathit{simp}_E(T, D)$.
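The round restriction on $\beta$ described above amounts to a simple arithmetic filter. The sketch below is illustrative only: the size measure $\|t\|$ (0 for a constant, plus 1 per function application) and the value 0 for the empty substitution are assumptions, and all names are hypothetical.

```python
def term_size(term):
    """Assumed size measure ||t||: 0 for a constant, plus 1 per application."""
    if isinstance(term, str):
        return 0
    return 1 + sum(term_size(arg) for arg in term[1:])


def substitution_size(beta):
    """Maximum of ||t_i|| over the bindings x_i <- t_i (0 if beta is empty, by assumption)."""
    return max((term_size(t) for t in beta.values()), default=0)


def admissible(beta, generated_round, current_round):
    """(size of beta) + (round in which C was generated) = (current round)."""
    return substitution_size(beta) + generated_round == current_round


beta = {"X": ("f", "a", "b"), "Y": "e"}   # size 1
print(admissible(beta, generated_round=2, current_round=3))  # True
print(admissible(beta, generated_round=2, current_round=4))  # False
```

Under this filter a clause produced by narrowing in a late round is first instantiated with small terms only, which is what lets instantiation and narrowing proceed in step.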

THEOREM 4.36. Suppose unfailing_complete preserves logical equivalence, that is, $E$ is equivalent to unfailing_complete($E$) relative to equality, and that if $S \cup E$ is unsatisfiable relative to the equality axioms, then eventually $S$ will have a subset $S'$ that remains from then on, that is, $S' \subseteq S$ implies $S' \subseteq$ narrow($S, E'$) for any $E'$ logically equivalent to $E$, and such that $S'$ is unsatisfiable without additional equality axioms. (This is true of typical versions of unfailing completion and narrowing.) Also, suppose that clauses of $S'$ will not be changed by rewriting in $\mathit{simp}_E$. Then OSHL as above is complete, that is, OSHL will eventually return unsatisfiable. This is true even if the ordering used for rewriting and narrowing differs from the ordering $>_t$ on terms. This also does not require that $I_0 \models E$.

Proof. After $S'$ is generated, the result holds, since this case reduces to the case of OSHL without equality. Before $S'$ is generated, we only have to show that OSHL can always find a clause contradicting any interpretation $I$ if $E$ and $S$ do not change. But this is so because narrowing and unfailing completion will continue to be called if no clause contradicting $I$ is found, and eventually a clause of $S'$ will be generated that will contradict any interpretation. □

A survey of paramodulation and rewriting in theorem proving may be found in [6]. Without giving the details, we sketch how the above result can be extended to other cases involving equality. For sets $S$ of clauses such that all positive equality literals occur in Horn equations, one can extend this approach by including in $E$ all the Horn equations (Horn clauses containing the equality predicate) and performing unfailing completion extended to Horn equations, which is not difficult to do. Then $S$ need not contain equality axioms (except for $x = x$) or Horn equations. The completeness proof is similar to the above.

If the positive equality predicate occurs in non-Horn clauses, it seems reasonable to apply some transformation such as Brand’s transformation [8] to the clauses of $S$ (excluding the Horn equational clauses) and apply unfailing Horn completion to the Horn equational clauses. This combination is complete, as above. However, it requires that the Brand transformation of all the input clauses be included in $S$, including the transformed unit equations. This also requires that the equality axiom $x = x$ be included in $S$. The completeness proof for this case is much the same as before.

A fourth possibility is to include in $S$ all the equality axioms and all of the clauses of $S$ except for the unit equations. Then unfailing completion may be done on the unit equations, and these may be used for narrowing and simplification of clauses of $S$ and their ground instances as before. Two advantages of this approach are that it avoids the need to instantiate the unit equations and it avoids the need to use Brand’s transformation. Unit equations are not needed in $S$ because all of their ground instances will simplify to instances of $x = x$. A disadvantage of this approach is that it requires the use of the equality axioms. However, we have found that the equality axioms are often not as detrimental as one would think. Furthermore, Brand’s transformation can create clauses with many literals, which can be difficult to handle for the instantiation procedure.

This approach to treating equality tries to use rewriting and completion where they are most efficient, that is, on unit equations, and uses Brand’s modification method or the equality axioms on more general occurrences of equality. However, it might be better to have one uniform method of handling equality. The problem is that paramodulation combines literals from different clauses and can produce a large number of clauses by combining literals in many different ways. Paramodulation is also not goal sensitive and is difficult to combine with semantics.

4.11. REPLACEMENT RULES

We have observed in [35] and elsewhere that set theory problems are often greatly helped by the use of replacement rules. OSHL has an advanced replacement rule facility that the user may invoke by setting the strategy flag to “SET THEORY.” This setting performs some UR-resolution, then applies replacement rules, and then calls the default theorem prover to attempt to complete the proof if the proof could not be found entirely by the use of replacement rules. The replacement rule facility has the effect of expanding definitions of predicates before attempting a proof. The results below on set theory problems made use of a hand-computed set of replacement rules for set theory. However, these rules were fairly straightforward to generate. We note that the use of replacement rules is a way to generate large literals in proofs more efficiently than by the basic instantiation procedure of OSHL.

5. Evaluation of OSHL

5.1. ASYMPTOTIC COMPLEXITY

We now comment on the complexity of ordered semantic hyper-linking. First-order logic has only partial decision procedures for validity. The run time of OSHL therefore cannot be bounded by a recursive function of the size of the input clauses. We define the maximal literal size of a set of clauses $S$ to be the maximum of the size of the literals in $S$. A Herbrand set for $S$ is an unsatisfiable set of ground instances of $S$. We define $c(S)$ to be the minimum, over all Herbrand sets $T$ for $S$, of the maximal literal size of $T$. Thus $c(S)$ is the smallest bound on literal size permitting a proof to be found. By Herbrand’s theorem, $c(S)$ is finite if $S$ is unsatisfiable. We can analyze the complexity of OSHL by including, in the input string, $c(S)$ represented in unary along with the input clause set $S$. We call this the complexity with respect to linear term size measure [45]. OSHL is double exponential with respect to the linear term size measure. There are at most $O(2^{c(S)})$ ground instances of $S$ of term size $c(S)$ or less; thus the instance generation module takes at most $O(2^{c(S)})$ time and the model generation module takes at most $O(2^{2^{c(S)}})$ time. It is co-NEXPTIME hard to determine whether a set $S$ has a Herbrand set with a maximal literal size of less than $n$. Thus the double exponential complexity of OSHL is probably the best we can get. In practice, because of the use of an efficient propositional procedure, the model generation module usually takes much less time than the instance generation module. In fact, we might assume an efficient propositional procedure to have an expected polynomial complexity (especially when there are many Horn clauses); thus OSHL will have an expected single exponential complexity.
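The exponential growth of the number of ground terms with the size bound, which underlies the instance count above, can be observed with a small counting sketch. It is an illustration under an assumed size measure (number of symbol occurrences) and a hypothetical signature, not a derivation of the bound stated in the text.

```python
def tuples_of_size(counts, k, n):
    """Number of k-tuples of ground terms whose sizes sum to n."""
    ways = [1] + [0] * n  # zero-length tuples: one way to reach total size 0
    for _ in range(k):
        ways = [sum(ways[m - j] * counts[j] for j in range(1, m + 1))
                for m in range(n + 1)]
    return ways[n]


def count_terms(arities, max_size):
    """counts[s] = number of ground terms with exactly s symbol occurrences.

    arities lists the arity of every function symbol, constants having arity 0.
    """
    counts = [0] * (max_size + 1)
    for s in range(1, max_size + 1):
        # A term of size s is a symbol applied to arguments of total size s - 1.
        counts[s] = sum(tuples_of_size(counts, k, s - 1) for k in arities)
    return counts


# Hypothetical signature: two constants, one unary and one binary function.
print(count_terms([0, 0, 1, 2], 4))  # [0, 2, 2, 6, 14]
```

Even for this small signature the counts more than double from one size to the next once binary terms appear, which is why bounding the size of the terms that must be enumerated matters so much in practice.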

5.2. RUNS ON EXAMPLES

We have run OSHL on a number of problems both with and without natural semantics, and compared OSHL with OTTER. The results are encouraging. Although some hard problems can be solved using a trivial semantics, the prover performs best when a natural semantics is used. We generated natural semantics for many problems by hand. The process is easier than it sounds. For example, many of the GRP problems from the TPTP library [52] have the same set of axioms. Had the library used a uniform naming scheme for constant symbols (not just functions and predicates), a single semantics would have sufficed for all the GRP problems! We hope that the TPTP library will eventually include semantics for the appropriate problems in the database.

Table I shows some of the problems that OSHL solved. The EXQ problems and the SET problems demonstrate the propositional efficiency of the prover. The replacement module helped in solving the SET problems. It essentially generates an unsatisfiable set of propositional clauses by expanding the definitions. A trivial semantics is used for the SET problems. The Wos problems and the intermediate value theorem problem show the effectiveness of semantic guidance. We used free groups for the Wos problem semantics and a piecewise linear function for the IMV semantics. Natural semantics greatly reduces the number of instances generated by OSHL. The problems e4, e5, and e6 show the effect of equality support in the prover. They, along with e1–e3, are a set of equality benchmark problems proposed by [31]. None of them can be solved without equality support. For these problems, the input was expressed as a set of equations, and an unfailing-completion based approach was used, which handled AC operators using permutations of the arguments, as in [46]. No equality axioms were necessary. In the current implementation, OSHL is not as efficient on pure equality problems as OTTER.

Natural semantics plays an important role in guiding the search of OSHL. We manually constructed semantics for some problems from the TPTP library. We focus on four categories of problems from the TPTP library, namely, GRP, PLA, SET, and SYN. GRP contains problems from group theory, PLA contains problems from AI planning, SET contains problems from set theory, and SYN contains syntactical problems. In Table II we report the total number of problems in each category, the percentages of problems solved with or without natural semantics, and the percentages of problems solved by OTTER. We ran OSHL with a time limit of 3000 seconds. The result for OTTER is collected from the Web site http://www-c.mcs.anl.gov/home/mccune/ar/otter/.

Table I. Timing of OSHL and OTTER. Time is measured in seconds on a SPARC-20. 3000+ means that no proof is found in 3000 seconds. In the second column, we list the number of instances that OSHL generates in searching for a proof.

    Problem             # of Instances   OSHL Time   OTTER Time
    EXQ1 (SYN013-1)     95               35.0        3000+
    EXQ2 (SYN014-1)     241              305.9       3000+
    EXQ3 (SYN015-1)     280              383.9       3000+
    SET159-6            101              95.5        3000+
    SET169-6            67               160.4       3000+
    SET249-6            93               97.1        3000+
    wos19 (GRP039-3)    5                38.2        3000+
    wos20 (GRP040-4)    61               882.3       950.7
    IMV (ANA002-3)      188              317.2       3000+
    e4 (GRP002-4)       n/a              801.0       2.0
    e5 (BOO002-1)       n/a              83.6        21.2
    e6 (RNG009-7)       n/a              245771      3000+

Table II. Timing of OSHL on some TPTP problems.

    Category   # of Problems   No Semantics    Semantics   OTTER
    SYN        414             positive 71%    N/A         88%
                               negative 41%
    SET        539             27%             N/A         13%
    GRP        84              48%             76%         80%
    PLA        26              8%              92%         19%

OSHL solves 41% and 71% of SYN problems with all negative semantics and all positive semantics, respectively. This indicates that the choice of initial semantics can significantly impact the performance of OSHL. We tested OSHL on set theory problems based on von Neumann–Bernays–Gödel’s axiomatization. These problems correspond to the subset of set theory problems from the TPTP library named as “SETxxx-6.” The performance of OSHL in set theory indicates the efficiency of replacement rules and the propositional decision procedure of OSHL. The performance of OSHL on SYN and SET might be further improved by using natural semantics. We selected the first 84 group theory problems from the TPTP library, and manually generated semantics for these problems. While OSHL can solve 48% of the problems with trivial semantics, it can solve 76% with a natural semantics. OTTER performs slightly better in this category, possibly because of its efficiency in handling equality problems. Natural semantics plays a critical role in solving the PLA problems. We used a nonground decision procedure as our semantics, and thus the enumeration of Herbrand terms in finding contradicting instances is avoided. Finally, we point out that the percentages of TPTP problems solved might be an inaccurate way of comparing theorem provers. Some of the problems in the TPTP library are variants of each other, and many problems in the same category are very similar to each other. Thus the strength of a theorem prover on a small subset of the problems might disproportionately affect its results on an entire category of problems.

6. Discussion

Semantics is a potentially significant aspect of theorem proving that is neglected in many current theorem provers. We have surveyed the work of a number of researchers concerning semantics in theorem proving. We discussed the importance of semantics as well as goal-sensitivity, propositional efficiency, and equality. We then presented the OSHL strategy, proved that it was correct and complete, and showed that it uses natural semantics, is goal sensitive and propositionally efficient, and has efficient equality strategies. The efficiency of OSHL is indicated by its performance on a number of examples.

Some areas have not received as much attention as we would like. We did not experiment much with natural semantics for set theory problems, since replacement rules permit many set theory problems to be solved with a trivial semantics. However, a natural semantics might help even more. The fact that the OSHL implementation is written in Prolog may limit its efficiency compared with provers implemented in LISP or C. However, Prolog makes the specification of semantics easier. It would be useful to incorporate specialized decision procedures into the prover in some manner. This might involve removing certain axioms from the set S of input clauses and incorporating them into $G^{\min}_S$ instead, to guarantee that the interpretation found is also a model of the omitted axioms. Another issue is that our equality mechanism is biased toward unit equations. It could be extended to Horn equations without much trouble, but we would also like to have an efficient mechanism for non-Horn equations, if possible. Also, we would like to take some hard problems and experiment with various semantics to see their influence on the proof search.

It is not clear what is learned by summary statistics of prover runs on many examples. What we really need is a better understanding of why the prover succeeds or fails on specific problems, and not just summary statistics based on large classes of problems, in order to have a better idea of how good the prover is and how it could be improved. In particular, we would like to try OSHL on more different kinds of hard problems, not just problems that are hard because they have all of the set theory axioms. Possibly problems in analysis like the intermediate value theorem would be suitable. We would also like to study in more detail what actually happens during the proof attempt, and why the prover succeeds or fails. We would like to make a more detailed attempt to find out the influence of various semantics, and to test various semantics.

In this way, we hope to learn more accurately (1) how good OSHL is, especially if implemented in a faster language with good data structures; (2) what kind of problems it is good for, and whether there is a class of problems for which OSHL is better than other strategies; (3) what would be needed to make OSHL better; and (4) how close we are to strategies that can prove truly interesting mathematical theorems automatically. This is not to deny the significance of the Robbins proof [38], but that was on a pure equational problem where efficient equality techniques in themselves sufficed to find the proof. For many theorems, other techniques are also needed. Also, is there a class of problems with a practical value for which OSHL is already good enough to constitute part of a practical tool, possibly in program or hardware verification or expert systems?

Another attractive possibility for the use of semantics is that of automatically generating models (as in the system of Caferra, Peltier, and Zabel [17, 18, 15, 40]) for the axioms of a set of clauses, and then using these models to guide the search. This is more easily done for finite models, but is also possible for some infinite models. We might also attempt to generate a number of models automatically, try to evaluate (automatically) which is best, and then use this model to guide the search.
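As a rough illustration of why finite models are the easier case, the following sketch enumerates every operation table on a small domain and keeps those that satisfy the group axioms. It is naive exhaustive search, not the constraint-based method of [17, 18, 15, 40]; the function name `group_tables`, the choice of group axioms, and the convention that element 0 is the identity are assumptions made only for this example.

```python
from itertools import product

def group_tables(n, identity=0):
    """Yield each operation table on {0, ..., n-1} satisfying the group axioms.

    A table is a flat tuple; the product x*y is stored at index x*n + y.
    Hypothetical helper: exhaustive search, feasible only for tiny n.
    """
    dom = range(n)
    for values in product(dom, repeat=n * n):
        def op(x, y, v=values):
            return v[x * n + y]
        # Axiom: the chosen element is a two-sided identity.
        if any(op(identity, x) != x or op(x, identity) != x for x in dom):
            continue
        # Axiom: every element has a right inverse.
        if any(all(op(x, y) != identity for y in dom) for x in dom):
            continue
        # Axiom: the operation is associative.
        if any(op(op(x, y), z) != op(x, op(y, z))
               for x, y, z in product(dom, repeat=3)):
            continue
        yield values

if __name__ == "__main__":
    for table in group_tables(2):
        print(table)
```

The search space has n^(n*n) candidate tables, so this approach stops being practical almost immediately; it is meant only to show that a finite model, once found, gives a semantics whose function and predicate meanings are computable by table lookup.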

Since OSHL sometimes has difficulty generating large literals, we would also like to study other methods of avoiding the inefficiency of having to generate large literals. Possibly some kind of unification or resolution can be added to OSHL to help alleviate this problem. The inefficiency of generating large literals is a consequence of the fact that OSHL enumerates the Herbrand universe when instantiating variables. Actually, the prover does not really enumerate the Herbrand universe, but rather, equivalence classes of terms relative to the semantics. This can be much more efficient than enumerating the entire Herbrand universe, but still entails a large search in some cases.
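The difference between enumerating ground terms and enumerating their equivalence classes relative to a semantics can be seen in a small sketch. The signature (a constant `0`, a unary `s`, a binary `plus`) and the semantics (integers modulo 3) are assumptions chosen for illustration, and the code is not taken from the OSHL implementation.

```python
from itertools import product

MOD = 3  # hypothetical semantics: integers modulo 3

def evaluate(term):
    """Interpret a ground term, given as a nested tuple, modulo MOD."""
    head, *args = term
    vals = [evaluate(a) for a in args]
    if head == "0":
        return 0
    if head == "s":
        return (vals[0] + 1) % MOD
    if head == "plus":
        return (vals[0] + vals[1]) % MOD
    raise ValueError(f"unknown symbol {head!r}")

def herbrand_terms(depth):
    """Return all ground terms of nesting depth at most `depth`."""
    terms = {("0",)}
    for _ in range(depth):
        new = set(terms)
        new |= {("s", t) for t in terms}
        new |= {("plus", a, b) for a, b in product(terms, repeat=2)}
        terms = new
    return terms

def representatives(terms):
    """Keep one shortest term for each semantic value."""
    reps = {}
    for t in sorted(terms, key=lambda t: (len(str(t)), str(t))):
        reps.setdefault(evaluate(t), t)
    return reps

if __name__ == "__main__":
    terms = herbrand_terms(2)
    reps = representatives(terms)
    print(len(terms), "ground terms,", len(reps), "classes")
    for value, term in sorted(reps.items()):
        print(value, term)
```

The number of ground terms grows rapidly with depth, while the number of classes under this semantics can never exceed three. With a semantics over an infinite domain the number of classes is unbounded, which is consistent with the remark that a large search may still be required.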

Another possible extension to the prover involves doing resolutions at the non-ground level corresponding to the A-resolutions performed on ground clauses by the procedure “simp.” Clauses generated by such non-ground resolutions can then be added to the set of input clauses. This corresponds to explanation-based generalization and has the potential to enhance the efficiency of the prover in some cases. However, the added input clauses can also slow the prover down.

7. Conclusions

Semantics has the potential significantly to enhance the power of theorem provers, being commonly used by humans when proving theorems. The combination of semantics, goal sensitivity, propositional efficiency, and equality found in OSHL is attractive, complete, and, on many problems, comparatively efficient. For some problems, the difficulty of generating large terms and enumerating the Herbrand universe may make ordered semantic hyper-linking less attractive. We would like to have a better understanding of how this inefficiency occurs and whether this limitation can be overcome for some of these problems. It would also facilitate the evaluation of provers such as OSHL if there were readily available problem sets such as TPTP with natural semantics already supplied. Finally, in order really to understand how powerful a prover such as OSHL is, it will be necessary to invest similar effort in data structures for OSHL as has been done in OTTER, since such data structures could lead to a further dramatic improvement in efficiency. Only then can we really estimate how powerful OSHL is, and for what applications it is suitable. If it turns out to be sufficiently powerful, this could dramatically impact the field of theorem proving. But we feel that the contribution of this work is more general than that, in presenting techniques that can be incorporated into many other theorem provers, as well.

Acknowledgments

Support from Ricardo Caferra and assistance from his group in Grenoble, France contributed to the final stages of the preparation of this paper. An anonymous referee provided an exceptionally thoughtful and detailed report.

References

1. Ballantyne, A. M. and Bledsoe, W. W.: On generating and using examples in proof discovery, in Michie and Pao (eds), Machine Intelligence, Vol. 10, Ellis Horwood, 1982, pp. 3–39.

2. Bourely, Ch., Caferra, R. and Peltier, N.: A method for building models automatically: Experiments with an extension of OTTER, in Proceedings of CADE-12, Springer LNAI 814, 1994, pp. 72–86. http://www-leibniz.imag.fr/ATINF/PUBLICATIONS/nico-cade94.ps.gz.

3. Bachmair, L., Dershowitz, N. and Plaisted, D.: Completion without failure, in H. Aït-Kaci and M. Nivat (eds), Resolution of Equations in Algebraic Structures 2: Rewriting Techniques, Academic Press, New York, 1989, pp. 1–30.

4. Baumgartner, P., Fröhlich, P., Furbach, U. and Nejdl, W.: Semantically guided theorem proving for diagnosis applications, in 15th International Joint Conference on Artificial Intelligence (IJCAI 97), 1997, pp. 460–465.

5. Bachmair, L. and Ganzinger, H.: Ordered chaining for total orderings, in A. Bundy (ed.), Proceedings of the 12th International Conference on Automated Deduction, Springer-Verlag, New York, 1994, pp. 435–450.

6. Bachmair, L. and Ganzinger, H.: Equational reasoning in saturation-based theorem proving, in W. Bibel and P. H. Schmitt (eds), Automated Deduction – A Basis for Applications. Volume I: Foundations – Calculi and Methods, Kluwer Acad. Publ., Dordrecht, 1998, pp. 353–398.

7. Bachmair, L., Ganzinger, H., Lynch, C. and Snyder, W.: Basic paramodulation, Inform. and Comput. 121(2) (1995), 172–192.

8. Bachmair, L., Ganzinger, H. and Voronkov, A.: Elimination of equality via transformation with ordering constraints, in C. Kirchner and H. Kirchner (eds), Proceedings of the 15th International Conference on Automated Deduction, Springer-Verlag, New York, 1998, pp. 175–190.

9. Bibel, W.: Automated Theorem Proving, 2nd edn, Vieweg, Braunschweig/Wiesbaden, 1987.


10. Bledsoe, W. W.: Using examples to generate instantiations of set variables, in Proceedings of the 8th International Joint Conference on Artificial Intelligence, 1983, pp. 892–901.

11. Baader, F. and Nipkow, T.: Term Rewriting and All That, Cambridge University Press, Cambridge, 1998.

12. Brand, D.: Proving theorems with the modification method, SIAM J. Comput. 4 (1975), 412–430.

13. Chang, C. and Lee, R.: Symbolic Logic and Mechanical Theorem Proving, Academic Press, New York, 1973.

14. Chu, H. and Plaisted, D.: Semantically guided first-order theorem proving using hyper-linking, in Proceedings of the Twelfth International Conference on Automated Deduction, Lecture Notes in Artif. Intell. 814, 1994, pp. 192–206.

15. Caferra, R. and Peltier, N.: Extending semantic resolution via automated model building: Applications, in Proceedings of IJCAI’95, Morgan Kaufmann, 1995, pp. 328–334. http://www-leibniz.imag.fr/ATINF/PUBLICATIONS/nico-ijcai95.ps.gz.

16. Caferra, R. and Peltier, N.: Disinference rules, model building and abduction, to appear in E. Orłowska (ed.), Logic at Work: Essays Dedicated to the Memory of Helena Rasiowa, 1997. http://www-leibniz.imag.fr/ATINF/PUBLICATIONS/nico-eva.ps.gz.

17. Caferra, R. and Zabel, N.: A method for simultaneous search for refutations and models by equational constraint solving, J. Symbolic Comput. 13 (1992), 613–641.

18. Caferra, R. and Zabel, N.: Building models by using tableaux extended by equational problems, J. Logic Comput. 3 (1993), 3–25.

19. Dowling, W. and Gallier, J.: Linear-time algorithms for testing the satisfiability of propositional Horn formulae, J. Logic Programming 1 (1984), 267–284.

20. Dershowitz, N. and Jouannaud, J.-P.: Rewrite systems, in J. van Leeuwen (ed.), Handbook of Theoretical Computer Science, North-Holland, Amsterdam, 1990.

21. Davis, M., Logemann, G. and Loveland, D.: A machine program for theorem-proving, Communications of the ACM 5 (1962), 394–397.

22. Davis, M. and Putnam, H.: A computing procedure for quantification theory, J. ACM 7 (1960), 201–215.

23. Fermüller, C. and Leitsch, A.: Hyperresolution and automated model building, J. Logic Comput. 6(2) (1996), 173–203.

24. Gelernter, H., Hansen, J. R. and Loveland, D. W.: Empirical explorations of the geometry theorem proving machine, in E. Feigenbaum and J. Feldman (eds), Computers and Thought, McGraw-Hill, New York, 1963, pp. 153–167.

25. Ganzinger, H., Meyer, C. and Weidenbach, C.: Soft typing for ordered resolution, in B. McCune (ed.), Proceedings of the Fourteenth Conference on Automated Deduction, 1997, pp. 321–335.

26. Hasegawa, R., Inoue, K., Ohta, Y. and Koshimura, M.: Non-Horn magic sets to incorporate top-down inference into bottom-up theorem proving, in W. McCune (ed.), Proceedings of the 14th International Conference on Automated Deduction, July 1997, pp. 176–190.

27. Hsiang, J. and Rusinowitch, M.: Proving refutational completeness of theorem-proving strategies: The transfinite semantic tree method, J. ACM 38(3) (1991), 559–587.

28. Leitsch, A.: The Resolution Calculus, Springer-Verlag, Berlin, 1997. Texts in Theoretical Computer Science.

29. Lloyd, J. W.: Foundations of Logic Programming, 2nd edn, Springer-Verlag, Berlin, 1987.

30. Lassez, J.-L. and Marriott, K. G.: Explicit representation of terms defined by counterexamples, J. Automated Reasoning 3(3) (1987), 1–17.

31. Lusk, E. and Overbeek, R.: Non-Horn problems, J. Automated Reasoning 1 (1985), 103–114.

32. Loveland, D.: A simplified format for the model elimination procedure, J. ACM 16 (1969), 349–363.

33. Loveland, D.: Automated Theorem Proving: A Logical Basis, North-Holland, New York, 1978.


34. Lee, S.-J. and Plaisted, D.: Eliminating duplication with the hyper-linking strategy, J. Automated Reasoning 9(1) (1992), 25–42.

35. Lee, S.-J. and Plaisted, D.: Use of replace rules in theorem proving, Methods of Logic in Computer Science 1 (1994), 217–240.

36. Loveland, D., Reed, D. and Wilson, D.: SATCHMORE: SATCHMO with RElevance, J. Automated Reasoning 14 (1995), 325–351.

37. Manthey, R. and Bry, F.: SATCHMO: A theorem prover implemented in Prolog, in Proceedings of the 9th Conference on Automated Deduction, Argonne, Illinois, May 1988, pp. 415–434.

38. McCune, W.: Solution of the Robbins problem, J. Automated Reasoning 19(3) (1997), 263–276.

39. Nie, X. and Plaisted, D.: A complete semantic back chaining proof system, in Proceedings of the 10th International Conference on Automated Deduction, 1990.

40. Peltier, N.: Increasing the capabilities of model building by constraint solving with terms with integer exponents, J. Symbolic Comput. 24 (1997), 59–101. http://www-leibniz.imag.fr/ATINF/PUBLICATIONS/nico-jsc97.ps.gz.

41. Plaisted, D.: A simplified problem reduction format, Artif. Intell. 18 (1982), 227–261.

42. Plaisted, D.: Non-Horn clause logic programming without contrapositives, J. Automated Reasoning 4 (1988), 287–325.

43. Plaisted, D.: Ordered semantic hyper-linking, Technical Report MPI-I-94-235, Max-Planck Institut für Informatik, Saarbrücken, Germany, 1994.

44. Plaisted, D.: The search efficiency of theorem proving strategies: An analytical comparison, Technical Report MPI-I-94-233, Max-Planck Institut für Informatik, Saarbrücken, Germany, 1994.

45. Plaisted, D. and Zhu, Y.: The Efficiency of Theorem Proving Strategies: A Comparative and Asymptotic Analysis, Vieweg, Wiesbaden, 1997.

46. Plaisted, D. and Zhu, Y.: Equational reasoning using AC constraints, in Proceedings of the 15th International Joint Conference on Artificial Intelligence, 1997.

47. Robinson, J.: A machine-oriented logic based on the resolution principle, J. ACM 12 (1965), 23–41.

48. Reif, W. and Schellhorn, G.: Theorem proving in large theories, in W. Bibel and P. H. Schmitt (eds), Automated Deduction – A Basis for Applications. Volume III: Applications, Kluwer Acad. Publ., Dordrecht, 1998, pp. 225–241.

49. Robinson, G. and Wos, L.: Paramodulation and theorem-proving in first order theories with equality, in Machine Intelligence 4, Edinburgh University Press, Edinburgh, 1969, pp. 135–150.

50. Slagle, J. R.: Automatic theorem proving with renameable and semantic resolution, J. ACM 14 (1967), 687–697.

51. Slaney, J.: SCOTT: A model-guided theorem prover, in R. Bajcsy (ed.), Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 1993, pp. 109–14.

52. Suttner, C. B. and Sutcliffe, G.: The TPTP problem library (TPTP v2.0.0), Technical Report AR-97-01, Institut für Informatik, Technische Universität München, Germany, 1997.

53. Wos, L., Overbeek, R., Lusk, E. and Boyle, J.: Automated Reasoning: Introduction and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1984.

54. Wos, L., Robinson, G. and Carson, D.: Efficiency and completeness of the set of support strategy in theorem proving, J. ACM 12 (1965), 536–541.

55. Zhu, Y. and Plaisted, D.: FOLPLAN: A semantically guided first-order planner, in Proceedings of the 10th International FLAIRS Conference, 1997.