
Deduction in non-Horn databases


Journal of Automated Reasoning 1 (1985), 141-160. 0168-7433/85.15. © 1985 by D. Reidel Publishing Company.

Deduction in Non-Horn Databases

ADNAN YAHYA and LAWRENCE J. HENSCHEN
Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60201, U.S.A.

(Received: 14 February 1984)

Abstract. The class of non-Horn, function-free databases is investigated and several aspects of the problem of using theorem proving techniques for such databases are considered. This includes exploring the treatment of negative information and extending the existing method, suggested by Minker, to accept non-unit negative clauses. It is shown that the algorithms based on the existing methods for the treatment of negative information can be highly inefficient. An alternative approach is suggested and a simpler algorithm based on it is given. The problems associated with query answering in non-Horn databases are addressed and compared with those for the Horn case. It is shown that the query evaluation process can be computationally difficult in the general case. Conditions under which the process is simplified are discussed. The topic of non-Horn general laws is considered and some guidelines are suggested to divide such laws into derivation rules and integrity constraints. The effect of such a division on the query evaluation process is discussed.

Key words. Deductive databases, non-Horn databases, generalized closed-world assumption.

1. Introduction

A major recent application for automated reasoning techniques has been in the area of logic-based databases. In particular, the relational model has been extended to allow more general well-formed formulas as definitions and integrity constraints. Then, theorem-proving methods, typically resolution, are used for query processing, constraint maintenance, etc. The use of mathematical logic and theorem proving methods in database design and applications has been extensively covered in the technical literature [1, 3, 6, 7, 10, 15-19]. Most of the attention there was given to the class of Horn databases. Although most real-life databases are Horn, there can also be many useful instances of non-Hornness. For example, a major instance arises in the logical representation of NULL values [4, 11, 20].

The focus of this paper will be deduction in function-free non-Horn databases. In particular, we are concerned with the treatment of negative information and the deductive techniques necessary to process this information in the non-Horn setting. Minker [12] suggested a generalized closed world assumption to deal with negative information in non-Horn databases. We extend that notion to include non-unit negative clauses as answers and give an algorithm by which a theorem prover could be used to tell when negative information could be assumed. We also discuss the computational difficulties associated with this assumption.

We then turn to the treatment of non-Horn general laws in an otherwise Horn database. Some guidelines are given to divide such laws into derivation rules and integrity constraints. If followed, these guidelines are shown to give better results at query evaluation.

We adopt the relational model as our data representation. Some definitions and background material necessary to understand this paper, as well as a review of some of the previous research relevant to the subject, are given, but the reader is referred to [2, 5, 12, 13] for more complete details.

2. Definitions and Background Material

We assume the reader is familiar with standard concepts from theorem proving such as literal, clause, logical consequence, refutation, Horn, etc. We also assume a basic familiarity with relational database concepts like relation, tuple and the like.

Under the relational model a database is a collection of relations. An n-place relation is a subset of the cartesian product $D_1 \times \cdots \times D_n$, where $D_1, \ldots, D_n$ are (not necessarily distinct) domains of elements. The components of a tuple are referred to as attributes. From a logical point of view, a stored tuple can be treated as a positive unit ground clause. We will refer to the stored set of ground clauses as the Extensional Database (EDB) [8] and to the set of general laws as the Intensional Database (IDB). Thus, for our purposes, a database DB = EDB + IDB. The advantages of the separation have been discussed in [8, 16]. We assume that the elements of IDB are in clausal form [3]. The database DB is Horn if all of its clauses (in both IDB and EDB) are Horn.

DEFINITION. A query Q is an expression of the form [18]:

$$(x_1, x_2, \ldots, x_n / (q_1 y_1)(q_2 y_2) \cdots (q_m y_m)\, F(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m)),$$

where each $q_i y_i$ is either $\forall y_i$ or $\exists y_i$, and $F(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m)$ is a quantifier-free well-formed formula. The intended meaning of such a query is "find all n-tuples $(x_1, x_2, \ldots, x_n)$ such that $(q_1 y_1)(q_2 y_2) \cdots (q_m y_m)\, F(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m)$ is true". For brevity we will write $(x / qy\, F(x, y))$.

A query Q of the form $(x / \exists y\, F(x, y))$ is called an existential query. Using generalized versions of the division and projection operators, Reiter [16] shows that arbitrary queries can be reduced to existential queries in a function-free database. Therefore, we will discuss only existential queries. In our notation, we use simple variable and constant symbols to denote n-tuples, i.e., we drop vector signs.

DEFINITION. A set of n-tuples $\{c_1, c_2, \ldots, c_n\}$ is an answer to Q if and only if

$$DB \vdash (qy)F(c_1, y) \lor (qy)F(c_2, y) \lor \cdots \lor (qy)F(c_n, y),$$

where $\vdash$ stands for 'derives'. Following Reiter [16], we say that $c_1 + c_2 + \cdots + c_n$ is an answer to Q.

DEFINITION. An answer is minimal if no proper subdisjunction of it is an answer, i.e. $c_1 + c_2 + \cdots + c_n$ is a minimal answer to Q if

$$DB \vdash (qy)F(c_1, y) \lor (qy)F(c_2, y) \lor \cdots \lor (qy)F(c_n, y),$$

but

$$DB \nvdash (qy)F(c_1, y) \lor \cdots \lor (qy)F(c_{i-1}, y) \lor (qy)F(c_{i+1}, y) \lor \cdots \lor (qy)F(c_n, y)$$

for any i in {1, ..., n}. Here $\nvdash$ denotes 'does not derive'.

DEFINITION. An answer $c_1 + c_2 + \cdots + c_n$ is definite if n = 1 and indefinite otherwise.

EXAMPLE. Let

$$DB = \{P(a, b) \lor P(a, c),\ P(a, d),\ R(b),\ R(c),\ R(d)\}.$$

Let

$$Q = (x / (P(a, x) \land R(x))).$$

Then, b + c is a minimal indefinite answer to Q, d is a definite answer to Q, and d + c is an answer to Q, but not minimal.
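The claims in this example can be checked mechanically. The following is a minimal sketch, not an algorithm from the paper: it assumes the ground database is small enough to enumerate every Herbrand interpretation, and it tests derivability of a positive disjunction by checking that it holds in every model. The function names (`models`, `is_answer`, `is_minimal_answer`) and the string encoding of atoms are illustrative assumptions.

```python
from itertools import combinations

# Ground clauses of the example DB; all are positive, so a clause is a tuple of atoms.
DB = [("P(a,b)", "P(a,c)"), ("P(a,d)",), ("R(b)",), ("R(c)",), ("R(d)",)]
ATOMS = sorted({atom for clause in DB for atom in clause})

def models(db):
    """Return every Herbrand interpretation (a set of atoms) that satisfies db."""
    found = []
    for size in range(len(ATOMS) + 1):
        for subset in combinations(ATOMS, size):
            interp = set(subset)
            if all(any(atom in interp for atom in clause) for clause in db):
                found.append(interp)
    return found

def is_answer(consts):
    """c1 + ... + cn answers Q iff each model satisfies P(a,ci) & R(ci) for some i."""
    return all(
        any(f"P(a,{c})" in m and f"R({c})" in m for c in consts)
        for m in models(DB)
    )

def is_minimal_answer(consts):
    """An answer is minimal when no proper nonempty sub-disjunction is an answer."""
    proper = [s for r in range(1, len(consts)) for s in combinations(consts, r)]
    return is_answer(consts) and not any(is_answer(s) for s in proper)

for candidate in [("b", "c"), ("d",), ("d", "c"), ("b",)]:
    print(candidate, is_answer(candidate), is_minimal_answer(candidate))
```

Model enumeration is exponential in the number of ground atoms, so this is only suitable for checking small examples by hand.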

3. Negative Information in Databases, the Closed World Assumption (CWA)

There are two approaches to dealing with negative information in databases:

(1) The negative EDB information is stored explicitly and used in the same way as positive information for deriving answers [6]. A negative fact could be assumed only if it could be proved from the existing set of clauses. In database applications, the storage requirements for the vast amounts of negative information make this approach unsatisfactory.

(2) The negative information is represented implicitly. A negative ground literal $\neg L$ is assumed to be true if we fail to prove L from the existing set of clauses in DB, i.e., $DB \nvdash L$. This representation is called the 'Closed World Assumption (CWA)' by Reiter [18]. The closed world assumption is logically equivalent to adding a new component $DB^-$ to the database, where $DB^- = \{\neg P(c) \mid P(c) \text{ is a ground atom and } DB \nvdash P(c)\}$, but without having $DB^-$ explicitly stored.

If we are not working under CWA then we will say that the Open World Assumption (OWA) is adopted.

DEFINITION. $Q = (x / \exists y\, P(t_1, t_2, \ldots, t_k))$ is an atomic query if and only if P is a predicate symbol and for all i in {1, ..., k}, $t_i$ is either a constant, an x or a y [18].

By $\|Q\|_{CWA}$ ($\|Q\|_{OWA}$) we will denote the set of minimal answers to the query Q under the closed world assumption (open world assumption).

Let DB be a consistent, function-free Horn database. The following are important theorems proved by Reiter in [18]:

THEOREM 1. Let Q be an existential query. Every minimal answer to Q under the CWA is definite.

THEOREM 2. If $DB + DB^-$ is consistent then the evaluation of an arbitrary quantifier free query under the CWA can be reduced to Boolean operations of set intersection ($\cap$), union ($\cup$) and difference ($-$) applied to atomic queries as follows: assume that F, F1 and F2 below are all quantifier free.

(a) $\|(x / \exists y (F_1 \lor F_2))\| = \|(x / \exists y\, F_1)\| \cup \|(x / \exists y\, F_2)\|$;

(b) $\|(x / (F_1 \land F_2))\| = \|(x / F_1)\| \cap \|(x / F_2)\|$;

(c) $\|(x / (\neg F))\| = D - \|(x / F)\|$;

(d) $\|(x / (F_1 \land \neg F_2))\| = \|(x / F_1)\| - \|(x / F_2)\|$;

where D is the domain of x.
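The four identities of Theorem 2 map directly onto set operations. As a small illustration, assume a hypothetical Horn database with two unary relations P and R over a three-element domain (the relation contents and function names below are made up for the example and are not from the paper); the answers to atomic queries are then plain sets.

```python
# Hypothetical stored relations of a Horn EDB (illustrative data only).
D = {"a", "b", "c"}      # domain of x
P = {"a", "b"}           # ||(x / P(x))||
R = {"b", "c"}           # ||(x / R(x))||

def answers_or(f1, f2):
    """Identity (a): a disjunction is answered by the union of the answer sets."""
    return f1 | f2

def answers_and(f1, f2):
    """Identity (b): a conjunction is answered by the intersection."""
    return f1 & f2

def answers_not(f, domain):
    """Identity (c): under the CWA a negation is the complement within the domain."""
    return domain - f

def answers_and_not(f1, f2):
    """Identity (d): F1 & ~F2 is answered by set difference."""
    return f1 - f2

print(answers_or(P, R), answers_and(P, R), answers_not(P, D), answers_and_not(P, R))
```

Identity (d) follows from (b) and (c): intersecting with the complement of R gives the same set as subtracting R.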

Reiter also gives techniques for reducing queries to quantifier-free form and then using

generalized projection and division operators to reconstruct answers to the original query

from answers to the reduced query.

THEOREM 3. If DB is Horn and consistent then $DB + DB^-$ is also Horn and consistent.

THEOREM 4. If Q is a positive existential query over a Horn database then the closed world assumption answers for Q when evaluated with respect to DB yield the same set of answers as when evaluated with respect to $DB - \{C \mid C \text{ is a clause with no positive literals}\}$.

THEOREM 5. Let Q be an atomic query over a Horn database. If $DB + DB^-$ is consistent then $\|Q\|_{CWA} = \|Q\|_{OWA}$.

The following examples demonstrate that the above results may fail or be otherwise

inapplicable when DB is not Horn.

EXAMPLE 3.1. DB1 = $P(a) \lor P(b)$; $Q = (x / P(x))$; a + b is a minimal answer for Q, but not a definite answer, violating the conclusion of Theorem 1.

EXAMPLE 3.2. DB2 = $P(a) \lor T(a)$. First, $DB2 \nvdash P(a)$ and $DB2 \nvdash T(a)$. Thus $DB2^- = \{\neg P(a), \neg T(a)\}$. Clearly $DB2 + DB2^-$ is inconsistent although DB2 is consistent, so that Theorem 3 fails. Theorems 2 and 5 are not applicable because the hypotheses are false. However, consider the query $Q = (x / P(x) \lor T(x))$. Clearly $\|Q\| = \{a\}$, while $\|(x / P(x))\| = \|(x / T(x))\| =$ the empty set, and so is their union.
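The inconsistency in Example 3.2 can be reproduced with a brute-force check. The sketch below is an illustration under stated assumptions, not the paper's procedure: clauses are tuples of string literals, a leading `~` marks negation, and derivability of a ground atom is tested by checking that it is true in every model. The database `HORN` is a hypothetical Horn database added for contrast.

```python
from itertools import combinations

def atoms_of(db):
    """Ground atoms occurring in db (the Herbrand base of a ground database)."""
    return sorted({lit.lstrip("~") for clause in db for lit in clause})

def holds(clause, interp):
    """A literal '~A' is true when A is absent from interp, 'A' when it is present."""
    return any((lit[1:] not in interp) if lit.startswith("~") else (lit in interp)
               for lit in clause)

def models(db, atoms):
    """All subsets of atoms that satisfy every clause of db."""
    subsets = [set(s) for r in range(len(atoms) + 1) for s in combinations(atoms, r)]
    return [m for m in subsets if all(holds(c, m) for c in db)]

def cwa_negations(db):
    """DB^-: a unit clause ~A for every ground atom A that DB does not derive."""
    atoms = atoms_of(db)
    ms = models(db, atoms)
    return [("~" + a,) for a in atoms if not all(a in m for m in ms)]

def consistent(db, atoms):
    return bool(models(db, atoms))

DB2 = [("P(a)", "T(a)")]                    # the non-Horn database of Example 3.2
HORN = [("P(a)",), ("~T(a)", "S(a)")]       # hypothetical Horn database

for db in (DB2, HORN):
    negs = cwa_negations(db)
    print(negs, consistent(db + negs, atoms_of(db)))
```

For DB2 both atoms are negated and the extended database has no model, while the Horn database stays consistent after its negations are added, in line with Theorem 3.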

EXAMPLE 3.3. DB3 = $\{\neg P(a) \lor \neg T(a),\ T(a) \lor S(a),\ P(a)\}$. Clearly $DB3 \vdash S(a)$ while $DB3 - \{\neg P(a) \lor \neg T(a)\} \nvdash S(a)$, so that Theorem 4 fails.

4. Non-Horn Databases

Although most of the databases in use are Horn, indefiniteness can appear as a result of our incomplete knowledge about the real world. For example, we could know that the blood type of John is A or B, but not enough information is available to determine exactly which blood type John has. This fact could be represented by the clause $BT(A, John) \lor BT(B, John)$. In addition, the IDB can have non-Horn general laws.

The closed world assumption as discussed above is not applicable to non-Horn databases. The inclusion of $\neg L$ whenever we fail to prove L can lead to inconsistencies. This was shown in Example 3.2 above. In [12] Minker suggests a modified version of the closed world assumption, called the generalized closed world assumption (GCWA). GCWA accommodates the non-Horn case and reduces to the ordinary CWA if the database under consideration is Horn. To describe Minker's GCWA we need the following:

Let H denote the set of all ground atomic formulas of the form $P(t_1, t_2, \ldots, t_n)$, where P is an n-place predicate symbol occurring in the database and $t_i$, for all i in {1, ..., n}, belongs to the Herbrand universe of DB (in the absence of function symbols, the Herbrand universe is the set of constant symbols appearing in DB). H is the Herbrand base for DB. Let H' denote the set of all negative ground literals in DB, i.e., $H' = \{\neg P(a) \mid P(a) \text{ is in } H\}$.

Since we will be dealing with databases not containing function symbols and with a finite number of constants, H and H' will always be finite. The reader is reminded of the definitions of general interpretation and model. In this paper we are concerned only with Herbrand interpretations.

DEFINITION. A Herbrand interpretation I of DB is any subset of H:

(1) A ground positive literal P is true in I if and only if P is in I.

(2) A ground negative literal $\neg P$ is true in I if and only if P is not in I.

(3) A ground clause is true in I if and only if at least one of its literals is true in I.

(4) A clause C is true in I if and only if every ground instance of C is true in I.

(5) A set of clauses is true in I if and only if every clause in the set is true in I.

DEFINITION. A model M of DB is an interpretation satisfying every clause in DB.

DEFINITION. A model M is minimal if and only if no proper subset of M is a model.

EXAMPLE 4.1. Let $DB = \{P(c) \lor \neg P(b),\ P(a) \lor P(c),\ P(d) \lor P(c)\}$.

M1 = {P(a), P(c), P(d)} is a model which is not minimal since M2 = {P(c)} is also a model and M2 is a proper subset of M1. M2 as well as M3 = {P(a), P(d)} are minimal models of DB.

A SEMANTIC DEFINITION OF THE GENERALIZED CLOSED WORLD ASSUMPTION (GCWA). A negative ground literal can be assumed to be true if and only if its positive counterpart is not in any minimal model of DB.

EXAMPLE 4.2. Let DB1 = $\{P(a) \lor P(b)\}$. The minimal models are {P(a)}, {P(b)}. Thus we can assume neither $\neg P(a)$ nor $\neg P(b)$. If DB2 = $\{P(c) \lor P(b),\ P(a) \lor \neg P(d)\}$, the minimal models are {P(c)}, {P(b)}, which enables us to assume $\neg P(a)$ and $\neg P(d)$.
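Examples 4.1 and 4.2 can be reproduced by enumerating minimal Herbrand models. The sketch below is a naive illustration of the semantic definition, assuming ground clauses encoded as tuples of strings with `~` marking negation; the helper names are hypothetical and the exhaustive enumeration is only practical for tiny databases.

```python
from itertools import combinations

def atoms_of(db):
    """Ground atoms occurring in db."""
    return sorted({lit.lstrip("~") for clause in db for lit in clause})

def holds(clause, interp):
    """True when at least one literal of the ground clause is true in interp."""
    return any((lit[1:] not in interp) if lit.startswith("~") else (lit in interp)
               for lit in clause)

def minimal_models(db):
    """Models of db that have no proper subset which is also a model."""
    atoms = atoms_of(db)
    candidates = [frozenset(s) for r in range(len(atoms) + 1)
                  for s in combinations(atoms, r)]
    ms = [m for m in candidates if all(holds(c, m) for c in db)]
    return [m for m in ms if not any(other < m for other in ms)]

def gcwa_negations(db):
    """Semantic GCWA: ~A may be assumed iff A occurs in no minimal model."""
    mins = minimal_models(db)
    return ["~" + a for a in atoms_of(db) if not any(a in m for m in mins)]

EX41 = [("P(c)", "~P(b)"), ("P(a)", "P(c)"), ("P(d)", "P(c)")]
DB1 = [("P(a)", "P(b)")]
DB2 = [("P(c)", "P(b)"), ("P(a)", "~P(d)")]

print(minimal_models(EX41))
print(gcwa_negations(DB1), gcwa_negations(DB2))
```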

A SYNTACTIC DEFINITION OF THE GCWA. Let DB be consistent and let $E = \{P(c) \mid P(c) \text{ is in } H \text{ and } DB \vdash P(c) \lor K\}$, where K is a positive or empty clause not derivable from DB ($DB \nvdash K$). Let P(d) be an element of the Herbrand base of DB. Under GCWA we can assume $\neg P(d)$ if and only if P(d) is not in E.

Minker [12] shows that the two above definitions of GCWA are equivalent. We define $DB^{--} = \{\neg P(d) \mid P(d) \text{ is not in } E\}$, where E is as defined above.
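The syntactic definition can also be evaluated naively on a ground database. The following sketch is an assumption-laden illustration, not Minker's or the authors' algorithm: it enumerates every positive clause K over H, keeps those not derivable from DB, and collects the atoms P(c) for which some $P(c) \lor K$ is derivable. Derivability is again tested by truth in all models, and all names are hypothetical.

```python
from itertools import combinations

def atoms_of(db):
    return sorted({lit.lstrip("~") for clause in db for lit in clause})

def holds(clause, interp):
    return any((lit[1:] not in interp) if lit.startswith("~") else (lit in interp)
               for lit in clause)

def models(db, atoms):
    subsets = [frozenset(s) for r in range(len(atoms) + 1)
               for s in combinations(atoms, r)]
    return [m for m in subsets if all(holds(c, m) for c in db)]

def derives(ms, clause):
    """A clause is derivable iff it is true in every model (ms is the model list)."""
    return all(holds(clause, m) for m in ms)

def gcwa_E(db):
    """E = atoms P(c) with DB |- P(c) v K for some positive or empty K, DB |/- K."""
    atoms = atoms_of(db)
    ms = models(db, atoms)
    underivable = [k for r in range(len(atoms) + 1)
                   for k in combinations(atoms, r) if not derives(ms, k)]
    return {a for a in atoms if any(derives(ms, (a,) + k) for k in underivable)}

def db_minus_minus(db):
    """DB^--: the negative literals that may be assumed under the GCWA."""
    e = gcwa_E(db)
    return ["~" + a for a in atoms_of(db) if a not in e]

DB2 = [("P(c)", "P(b)"), ("P(a)", "~P(d)")]   # DB2 of Example 4.2
print(gcwa_E(DB2), db_minus_minus(DB2))
```

On DB2 of Example 4.2 this yields the same assumable literals as the semantic definition, as the equivalence result predicts.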

THEOREM 6. If DB is consistent then so is $DB + DB^{--}$.

THEOREM 7. If DB is consistent then $DB + DB^{--}$ is maximally consistent with the GCWA in the sense that the addition of a ground negative literal $\neg L$ from H' which is not subsumed by $DB + DB^{--}$ satisfies the following: $DB + DB^{--} + \{\neg L\} \vdash K$, where K is a positive or empty clause not derivable from DB [12].

5. Definite and Indefinite Databases

DEFINITION. By PGC(DB) (positive and minimally derivable ground clauses from DB) we will denote the set of all positive ground clauses derivable from DB such that no proper subclause of a clause in PGC is derivable from DB.

$$PGC(DB) = \{K \mid K = P_1 \lor P_2 \lor \cdots \lor P_n,\ DB \vdash K,\ K \text{ is ground, and } DB \nvdash P_1 \lor \cdots \lor P_{i-1} \lor P_{i+1} \lor \cdots \lor P_n\},$$

where $P_j$ is an atom for all j in {1, ..., n} and the non-derivability condition holds for each i.

Here we mean $\vdash$ to be any sound and complete inference system. We will often write just PGC when DB is clear from the context. By L(K) we will denote the length of K, i.e., the number of literals in K.

DEFINITION. A database DB is indefinite if and only if there exists an element in PGC(DB) of length greater than 1.
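For a small consistent ground database, PGC and the indefiniteness test can be computed by brute force. The sketch below is only an illustration: it assumes DB is consistent (so the empty clause is not derivable), tests derivability by truth in every model, and uses hypothetical helper names and a `~` prefix for negation.

```python
from itertools import combinations

def atoms_of(db):
    return sorted({lit.lstrip("~") for clause in db for lit in clause})

def holds(clause, interp):
    return any((lit[1:] not in interp) if lit.startswith("~") else (lit in interp)
               for lit in clause)

def models(db):
    atoms = atoms_of(db)
    subsets = [frozenset(s) for r in range(len(atoms) + 1)
               for s in combinations(atoms, r)]
    return [m for m in subsets if all(holds(c, m) for c in db)]

def pgc(db):
    """Minimal positive ground clauses derivable from a consistent ground db."""
    atoms, ms = atoms_of(db), models(db)
    derivable = [frozenset(k) for r in range(1, len(atoms) + 1)
                 for k in combinations(atoms, r)
                 if all(any(a in m for a in k) for m in ms)]
    # Keep a clause only if no proper subclause is itself derivable.
    return {k for k in derivable if not any(other < k for other in derivable)}

def is_indefinite(db):
    """DB is indefinite iff PGC(DB) contains a clause of length greater than 1."""
    return any(len(k) > 1 for k in pgc(db))

print(pgc([("P(a)", "P(b)")]), is_indefinite([("P(a)", "P(b)")]))
print(pgc([("P(a)", "P(b)"), ("~P(b)",)]), is_indefinite([("P(a)", "P(b)"), ("~P(b)",)]))
```

The second database is non-Horn as written but definite, since its only minimal positive consequence is the unit clause P(a).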

We note here that every model of DB must contain a minimal model [12]. We will be

making implicit use of this fact frequently in this paper.

Later, we also use the following lemma:

LEMMA 1. A positive clause C is true in every model of DB (and thus is derivable from DB) if and only if C is true in every minimal model of DB.

Proof. The only-if part is trivial.

We show that if C is true in every minimal model of DB then it is true in every model of DB. Let M be an arbitrary model of DB. Then M contains a minimal model of DB, say M1, where M1 is a subset of M. Let C' be an arbitrary instance of C. Since C' is also positive, there must exist a literal in C' which is also in the minimal model M1 and subsequently in the model M (because M contains all the elements in M1). This implies that C is true in M. []

Note that the above result does not hold for nonpositive clauses as the following example demonstrates: Let $DB = \{P(a) \lor \neg P(b),\ P(c) \lor \neg P(d)\}$. Let $C = \neg P(a) \lor \neg P(c)$. The only minimal model of DB is the empty set. C is true in the only minimal model but not true in the model M = {P(a), P(c)}.

LEMMA 2. A positive clause C is not derivable from DB ($DB \nvdash C$) if and only if C is false in some minimal model.

Proof. This follows immediately from Lemma 1. []

THEOREM 8. Every indefinite database is non-Horn.

Proof. Let DB be Horn. Suppose $DB \vdash P_1 \lor \cdots \lor P_n$ where n > 1 and $\vdash$ is a complete proof system. Then $DB + \{\neg P_1, \ldots, \neg P_n\} \vdash$ EMPTY. Because DB is Horn, resolution theory [9] tells us that at most one of the Pi literals is needed to derive the contradiction, say P1. Then $DB \vdash P_1$, so that the above non-unit clause was not minimal. Thus DB cannot be indefinite. []

Note that the converse of this theorem is not true. There are non-Horn databases that are definite, for example $DB = \{P(a) \lor P(b),\ \neg P(b)\}$ [12]. In such cases, DB is logically equivalent to a Horn database, for example in the above case to $\{P(a), \neg P(b)\}$.

THEOREM 9. A database DB is definite if and only if it has at most one minimal model. A consistent database DB is definite if and only if it has exactly one minimal model.

Proof. DB is inconsistent if and only if $DB \vdash$ EMPTY, if and only if PGC = {EMPTY}; in that case DB is not indefinite and has no minimal model. Now, assume that DB is consistent.

($\Leftarrow$): By contradiction. Assume that DB has only one minimal model. Suppose that $DB \vdash P_1 \lor \cdots \lor P_n$, some ground clause with n > 1, but that no proper subclause is derivable. Then $P_2 \lor \cdots \lor P_n$ is not derivable and therefore false in some minimal model of DB, say M1. On the other hand, M1 has to be a model of $P_1 \lor \cdots \lor P_n$, so M1 must make P1 true and all the other Pi false. Similarly, there must exist a minimal model M2 making P2 true and all the other Pi false. But clearly M1 and M2 are distinct, a contradiction.

($\Rightarrow$): By contradiction. Assume that DB is a definite database and has n + 1 distinct minimal models M0, M1, ..., Mn. Let P0 be an element of (M0 − M1). Consider the ground clause $K = P_0 \lor P_1 \lor \cdots \lor P_n$, where Pi is in (Mi − M0) for all i in {1, ..., n}. Pi must exist because $M_i \neq M_j$ whenever $i \neq j$ and because, since these models are all minimal, no Mi can be contained within a distinct Mj.

$DB \vdash K$ since an element of K is in every minimal model.

$DB \nvdash P_0$ because P0 is not in M1.

$DB \nvdash K_1 = P_1 \lor \cdots \lor P_n$, since no element of K1 is in M0. Note that P1, ..., Pn need not be distinct. Now let K' be a minimal subclause of K such that $DB \vdash K'$. By the above remarks, K' cannot be contained in $P_1 \lor \cdots \lor P_n$, i.e., K' must contain P0. But since $DB \nvdash P_0$, K' must contain at least one other Pi. Thus an indefinite minimal positive clause is derivable from DB, contradicting our assumption that DB is definite. []

Note that the ground condition cannot be removed from the above definition of PGC as the following example shows. Let $DB = \{P(x) \lor Q(x),\ P(a),\ Q(b)\}$. H is the set {P(a), P(b), Q(a), Q(b)}. Any model must contain both P(a) and Q(b), and it is easy to see that no other element of H is needed. Thus M = {P(a), Q(b)} is the only minimal model. However, $P(x) \lor Q(x)$ is derivable while neither P(x) nor Q(x) alone is. An unpleasant consequence of this is that in the general case definiteness may not be decidable, unlike the case of Hornness which is always decidable by inspection.

THEOREM 10. A Herbrand interpretation M is a minimal model of a consistent database DB if and only if M is a minimal model of PGC(DB).

Proof. ($\Leftarrow$): By contradiction. Assume that M is a minimal Herbrand model of PGC but that either M is a nonminimal model of DB or it is not a model of DB at all. There can be three possibilities:

(a) M is a model of DB which is not minimal. A proper subset, M1, is a minimal model of DB. Since DB derives all clauses in PGC, it follows that whenever DB is true in an interpretation so must be PGC, i.e., M1 is a model of PGC. But M1 is smaller than M, contradicting our assumption that M is a minimal model of PGC.

(b) M is not a model of DB but a proper subset of M, say M1, is a model of DB. As above, this leads to a contradiction. This also means that a model of DB cannot be a proper subset of a minimal model of PGC.

(c) M is not a model of DB and no subset of M is a model of DB. Let M1, M2, ..., Mn be all the distinct minimal models of DB. Note that n is at least 1, because DB has at least one minimal model.

For all i in {1, ..., n} let Pi be a ground atom in Mi but not in M. Such a Pi must exist, for if not, then Mi will be a subset of M. $M_i \neq M$ because Mi is a model of DB while M is not. Mi cannot also be a proper subset of M because, as a model of DB, Mi would again be a model of PGC. This would make Mi a model of PGC smaller than M, contradicting the minimality of M. The clause $C = P_1 \lor \cdots \lor P_n$ is true in every minimal model of DB, so that $DB \vdash C$. Thus C, or a proper subclause of C, must be in PGC. But for all i in {1, ..., n}, Pi is not in M. This means that an element of PGC is false in M, contradicting our assumption that M is a model of PGC.

($\Rightarrow$): See [12]. []

COROLLARY 1. If K1, K2, ..., Kn are all the elements of PGC and $S = L(K_1) * L(K_2) * \cdots * L(K_n)$, then S is an upper bound on the number of minimal models of DB. Here * is the usual multiplication sign.

Proof. By Theorem 10 the only minimal models of DB are the minimal models of PGC. The number of the latter cannot exceed S, which is the number of all possible distinct n-tuples with the i-th element of each tuple drawn from clause number i in PGC. []
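The bound of Corollary 1 can be checked on a small case. The sketch below reuses the database of Example 4.1 and compares the product of the clause lengths in PGC with the actual number of minimal models. It is a brute-force illustration that assumes a consistent ground database; the helper names and the `~` encoding of negation are hypothetical.

```python
from itertools import combinations
from math import prod

def atoms_of(db):
    return sorted({lit.lstrip("~") for clause in db for lit in clause})

def holds(clause, interp):
    return any((lit[1:] not in interp) if lit.startswith("~") else (lit in interp)
               for lit in clause)

def models(db):
    atoms = atoms_of(db)
    subsets = [frozenset(s) for r in range(len(atoms) + 1)
               for s in combinations(atoms, r)]
    return [m for m in subsets if all(holds(c, m) for c in db)]

def minimal_models(db):
    ms = models(db)
    return [m for m in ms if not any(other < m for other in ms)]

def pgc(db):
    """Minimal positive ground clauses derivable from a consistent ground db."""
    atoms, ms = atoms_of(db), models(db)
    derivable = [frozenset(k) for r in range(1, len(atoms) + 1)
                 for k in combinations(atoms, r)
                 if all(any(a in m for a in k) for m in ms)]
    return {k for k in derivable if not any(other < k for other in derivable)}

EX41 = [("P(c)", "~P(b)"), ("P(a)", "P(c)"), ("P(d)", "P(c)")]
bound = prod(len(k) for k in pgc(EX41))      # S = L(K1) * ... * L(Kn)
print(bound, len(minimal_models(EX41)))
```

Here PGC consists of $P(a) \lor P(c)$ and $P(c) \lor P(d)$, so S = 4, while the database has only two minimal models; the bound is not tight in general.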

Observation. In Theorems 1, 2 and 3 we can change the word 'Horn' to 'definite'. Theorem 4 could be restated as follows:

If Q is a positive existential query over a definite database then the closed world assumption answers for Q when evaluated with respect to DB yield the same set of answers as when evaluated with respect to $DB_0$, where $DB_0$ is a database with the same minimal model as DB.

6. The Generalized Closed World Assumption: Further Discussion

Let DB be a consistent but not necessarily Horn database. In this section we will give alternative definitions for the GCWA. These definitions could be utilized as a basis for more efficient algorithms to determine whether a ground negative literal can be assumed in a general database. We will give such an algorithm later in this paper.

THEOREM 11. Under the generalized closed world assumption, a negative ground literal $\neg P(a)$ can be assumed to be true if and only if there exists no conjunction of negative unit clauses from H', say X0, such that DB + X0 is unsatisfiable while DB + X1, where $X1 = X0 - \{\neg P(a)\}$, is satisfiable.

Proof. Let us call DB + X0, where X0 is as above, form (1).

($\Rightarrow$): By contradiction. Assume that P(a) is in no minimal model but there exists a subset of H', X0, such that DB1 = DB + X0 is unsatisfiable but DB2 = DB + X1 is satisfiable.

Let M be a minimal model of DB2. M is clearly a model of DB. M is in fact a minimal model of DB; for if a proper subset of M, say M1, was, then M1 would be a model of DB2 which is smaller than M (for it satisfies DB because it is a model of it and satisfies X1 since M1 is smaller than M and so satisfies even more negative literals). Now M is not a model of DB1 since DB1 is unsatisfiable. Further, the only difference between DB1 and DB2 is that DB1 contains the unit clause $\neg P(a)$. Therefore it must be that M contains P(a), contradicting our assumption that P(a) is in no minimal model of DB.

($\Leftarrow$): We are assuming in this section that DB is consistent, so let M be a minimal model of DB. Assume that there does not exist an X0 and X1 as described. Suppose that P(a) was in M. Let $X1 = \{\neg P(c) \mid P(c) \text{ is in } H \text{ but not in } M\}$. Then M is also a model of DB + X1. Let X0 be $X1 + \{\neg P(a)\}$. By supposition, DB + X0 must also be satisfiable with model, say, M'. Clearly by the definition of X0 and X1, M' cannot contain any positive literal not in M. But M' also cannot contain P(a). Thus M' is strictly smaller than M, contradicting the minimality of M. []
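The condition of Theorem 11 can be tested directly on a ground database by searching for a blocking set X0. The sketch below is a naive illustration, not the more efficient algorithm the paper goes on to develop: it enumerates every candidate X1 (a set of negated atoms other than P(a)) and checks satisfiability by model enumeration. Function names and the `~` encoding are assumptions.

```python
from itertools import combinations

def atoms_of(db):
    return sorted({lit.lstrip("~") for clause in db for lit in clause})

def holds(clause, interp):
    return any((lit[1:] not in interp) if lit.startswith("~") else (lit in interp)
               for lit in clause)

def models(db, atoms):
    subsets = [frozenset(s) for r in range(len(atoms) + 1)
               for s in combinations(atoms, r)]
    return [m for m in subsets if all(holds(c, m) for c in db)]

def satisfiable(db, negated, atoms):
    """DB + X is satisfiable iff some model of DB contains no atom negated in X."""
    return any(not (m & negated) for m in models(db, atoms))

def assumable_by_theorem_11(db, atom):
    """True iff no X0 exists with DB + X0 unsatisfiable and DB + X1 satisfiable."""
    atoms = atoms_of(db)
    others = [a for a in atoms if a != atom]
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            x1 = frozenset(subset)
            if satisfiable(db, x1, atoms) and not satisfiable(db, x1 | {atom}, atoms):
                return False      # X0 = X1 + {~atom} blocks the assumption
    return True

DB2 = [("P(c)", "P(b)"), ("P(a)", "~P(d)")]   # DB2 of Example 4.2
print({a: assumable_by_theorem_11(DB2, a) for a in atoms_of(DB2)})
```

The result agrees with the minimal-model reading of Example 4.2: only $\neg P(a)$ and $\neg P(d)$ can be assumed.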

As preparation for Theorem 12, we present the following notation.

Let DB' denote the set of clauses remaining in the database after deleting from DB any clause with a pure negative literal (the pure literal rule of the Davis and Putnam method [2]). A literal L in a clause of DB is pure if and only if $\neg M$ does not appear in any clause in DB whenever L and M are unifiable. DB' is unsatisfiable if and only if DB is unsatisfiable [2].

Further, we have the following lemma:

LEMMA 3. A Herbrand interpretation M is a minimal model of DB if and only if M is a minimal model of DB'.

Proof. ($\Leftarrow$): Assume that M is a minimal model of DB'. Since the positive counterparts of the pure negative literals in DB occur nowhere in DB', M, as well as any other minimal model of DB', will not contain any of these positive counterparts. M must satisfy all the clauses deleted from DB because the pure negative literal in each such clause is true in M. Further, no proper subset of M can be a model of DB because it would also be a model of DB' smaller than M.

($\Rightarrow$): Assume that M is a minimal model of DB. M is a model of DB' because DB' is a subset of DB. If M is a minimal model of DB' then we are done. If not, a proper subset of M, say M1, must be a minimal model of DB'. By the first part of this proof, M1 must be a model of DB. This contradicts the assumption that M is a minimal model of DB. []

Note that if DB is empty then the only minimal model is the empty set. In this case

we can assume any negative ground literal.

We now introduce the notation to be used in the remainder of this section. Let DB be nonempty with no pure literals. Let P(a) be a ground atomic formula not derivable from DB ($DB \nvdash P(a)$).

Let $\{C_1, \ldots, C_m\}$ be the set of instances of clauses in DB which contain P(a). Let $C_i' = C_i - \{P(a)\}$.

Let $\{D_1, \ldots, D_n\}$ be the set of instances of clauses in DB which contain $\neg P(a)$. Let $D_i' = D_i - \{\neg P(a)\}$.

Let $\{K_1, \ldots, K_p\}$ be the set of instances of clauses in DB containing neither P(a) nor $\neg P(a)$.

Let

$$DB' = C_1 \land C_2 \land \cdots \land C_m \land D_1 \land D_2 \land \cdots \land D_n \land K_1 \land K_2 \land \cdots \land K_p.$$

We can also write DB' as

$$DB' = \{P(a) \lor (C_1' \land C_2' \land \cdots \land C_m')\} \land \{\neg P(a) \lor (D_1' \land D_2' \land \cdots \land D_n')\} \land K_1 \land K_2 \land \cdots \land K_p.$$

Clearly DB and DB' have the same minimal models.

Let

$$DB'' = D_1' \land D_2' \land \cdots \land D_n' \land K_1 \land K_2 \land \cdots \land K_p.$$

THEOREM 12. Under the above assumptions, we can assume $\neg P(a)$ if and only if every minimal model of DB'' is a model (not necessarily minimal) of $C_1' \land C_2' \land \cdots \land C_m'$.

Proof. ($\Leftarrow$): If every minimal model of DB'' is a model of $C_1' \land C_2' \land \cdots \land C_m'$ then we show that P(a) is in no minimal model of DB', and hence no minimal model of DB. Since P(a) does not occur in any clause of DB'', no minimal model of DB'' can contain P(a). But each such model must also be a model of $C_1' \land C_2' \land \cdots \land C_m'$. Thus it is a model of DB because every Ki is satisfied, every $P(a) \lor C_i'$ is satisfied through the $C_i'$ part, and every $\neg P(a) \lor D_i'$ is satisfied through the $\neg P(a)$ literal.

($\Rightarrow$): By contradiction. Assume that P(a) is not in any minimal model of DB' and M'' is a minimal model of DB'' in which $C_1' \land C_2' \land \cdots \land C_m'$ is false. Since M'' is minimal and DB'' contains no occurrences of P(a) by construction, P(a) is not in M''. But M'' is not a model of DB' (it falsifies P(a) and $C_1' \land C_2' \land \cdots \land C_m'$) and M'' + P(a) is a model of DB' (M'' satisfies DB'' while P(a) satisfies the clauses with P(a) in DB'). Further, no other element, say Q, can be deleted from M'' + P(a) and still have a model of DB' because then M'' − Q would satisfy DB'', contradicting the minimality of M'' for DB''. Thus M'' + P(a) is a minimal model of DB' and P(a) is in it, contradicting our assumption. []

THEOREM 13. Under the same assumptions as those for Theorem 12, we can assume $\neg P(a)$ if $DB'' \rightarrow C_1' \land \cdots \land C_m'$, or else if P(a) appears nowhere in DB'. If $C_1', C_2', \ldots, C_m'$ are positive clauses then an 'only-if' component can be added to this theorem.

Proof. Of course, if there are no positive instances of P(a), we may assume P(a) false.

($\Rightarrow$): Assume $DB'' \rightarrow (C_1' \land C_2' \land \cdots \land C_m')$. Then every model of DB'' is a model of $C_1' \land C_2' \land \cdots \land C_m'$. By Theorem 12, we can assume $\neg P(a)$.

($\Leftarrow$): If there is at least one $C_i'$ (i.e., P(a) actually occurs positively), all of the $C_i'$ are positive clauses, and P(a) is in no minimal model of DB, then we show that $DB'' \rightarrow C_1' \land \cdots \land C_m'$.

Let I0 be an interpretation in which some $C_i'$ is false, say $C_j'$. Suppose DB'' is true in I0. Let I1 be a minimal model of DB'' contained in I0. Since $C_j'$ has only positive literals, $C_j'$ is also false in I1, so that I1 is not a model of DB'. Note that P(a) is not in I1 because P(a) does not occur in DB''. Now we can, by adding P(a), extend I1 to a model, say I2, of DB. Any minimal submodel of I2 must contain P(a), contradicting our assumption that P(a) is not in any minimal model of DB. Thus DB'' must have been false in I0. []

The following counterexample shows that the only-if part need not hold if any of the Ci' is not a positive clause.

EXAMPLE 6.1

DB = {P(a) v ~P(b), P(b) v P(c), ~P(a) v P(c)}.

C1' = ~P(b), D1' = P(c), K1 = P(b) v P(c). (Note that C1' is not positive.)

The only minimal model is {P(c)}. So we can assume ~P(a) although the set DB" = D1' & K1 = {P(c), P(b) v P(c)} does not imply ~P(b).
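Example 6.1 is small enough to check by exhaustive enumeration. The following Python sketch is an illustration only and not part of the original method: the clause encoding and the helper names (`satisfies`, `models`, `minimal_models`) are assumptions introduced here, and brute-force enumeration of interpretations stands in for a theorem prover.

```python
from itertools import combinations

# A literal is (atom, is_positive); a clause is a frozenset of literals.
DB = [
    frozenset({("a", True), ("b", False)}),   # P(a) v ~P(b)
    frozenset({("b", True), ("c", True)}),    # P(b) v P(c)
    frozenset({("a", False), ("c", True)}),   # ~P(a) v P(c)
]
# DB" for P(a): D1' = P(c) together with K1 = P(b) v P(c).
DB2 = [frozenset({("c", True)}), frozenset({("b", True), ("c", True)})]
ATOMS = ["a", "b", "c"]

def satisfies(model, clause):
    """A set of true atoms satisfies a clause if some literal holds in it."""
    return any((atom in model) == positive for atom, positive in clause)

def models(clauses, atoms):
    """All Herbrand models of the clause set, as frozensets of true atoms."""
    found = []
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            m = frozenset(subset)
            if all(satisfies(m, c) for c in clauses):
                found.append(m)
    return found

def minimal_models(clauses, atoms):
    """Models that have no proper subset which is also a model."""
    ms = models(clauses, atoms)
    return [m for m in ms if not any(other < m for other in ms)]

mins = minimal_models(DB, ATOMS)
print(mins)  # a single minimal model, containing only c

# DB" does not imply C1' = ~P(b): some model of DB" makes P(b) true.
counter_models = [m for m in models(DB2, ATOMS) if "b" in m]
print(bool(counter_models))
```

The enumeration is exponential in the number of ground atoms, so it is only meant for checking small examples by hand.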

7. Extending GCWA to Accept Non-unit Negative Clauses

Indefinite databases admit positive answers of the form BT(A, John) v BT(B, John). If we interpret such an answer as 'the blood type of John is either A or B (but not both)' and this is the only information available in the database, it is reasonable to assume that the answer to the closed query ~BT(A, John) v ~BT(B, John) is 'yes'. This is in fact similar to interpreting the logical 'or' as exclusive rather than inclusive.

Taking the above into consideration, we think that it is natural to extend the GCWA to accept 'indefinite' negative answers, i.e., negative, but not necessarily unit, clauses. For that purpose we have the following two definitions.

SEMANTIC DEFINITION. We can assume a ground negative clause C = ~P1 v ~P2 v ... v ~Pn in DB if and only if C is true in every minimal model of DB (no minimal model of DB contains all the atoms in C).

SYNTACTIC DEFINITION. We cannot assume a ground negative clause C = ~P1 v ~P2 v ... v ~Pn in DB if and only if there exist positive or empty clauses K1, ..., Kn such that for all i in {1, ..., n}:

(1) DB ⊢ Pi v Ki and

(2) DB ⊬ Ki and

(3) DB ⊬ K1 v K2 v ... v Kn.


THEOREM 14. The semantic and syntactic definitions are equivalent.

Proof. ->: If conditions (1)-(3) hold, we show that some minimal model of DB contains P1, P2, ..., Pn. Because of condition (3), there must exist a minimal model M such that all Ki are false in M. Because of conditions (1) and (2), all of the Pi must be in M, which is what we wanted to prove.

<-: Assume that there exists a minimal model M such that all of the Pi are in M, i.e., we cannot assume C by the semantic definition. If M is the only minimal model, then conditions (1)-(3) hold with all the Ki's empty. Otherwise, let M1, ..., Mm be all of the minimal models of DB other than M. Let K = Q1 v Q2 v ... v Qm, where Qi is in (Mi - M) for all i in {1, ..., m}.

(1) DB ⊢ Pi v K, since Pi is in M and K is true in every minimal model other than M.

(2) DB ⊬ Ki, where every Ki = K, since no elements of Ki are in M.

(3) DB ⊬ K1 v K2 v ... v Kn, where K1 v ... v Kn = K, because none of the elements of any Ki is in M.

Thus all the three conditions hold. []
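The semantic definition can be tried out directly on the blood-type example from the start of this section. The Python sketch below is only an illustration under stated assumptions: the atom names `BT_A_John` and `BT_B_John`, the clause encoding, and the function `can_assume` are hypothetical, and minimal models are found by exhaustive enumeration rather than by deduction.

```python
from itertools import combinations

def satisfies(model, clause):
    """A clause is a frozenset of (atom, is_positive) literals."""
    return any((atom in model) == positive for atom, positive in clause)

def minimal_models(clauses, atoms):
    """Enumerate all models, then keep those with no smaller model inside."""
    ms = []
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            m = frozenset(subset)
            if all(satisfies(m, c) for c in clauses):
                ms.append(m)
    return [m for m in ms if not any(other < m for other in ms)]

def can_assume(negated_atoms, clauses, atoms):
    """Semantic definition: ~P1 v ... v ~Pn can be assumed if and only if
    no minimal model contains every Pi."""
    target = frozenset(negated_atoms)
    return not any(target <= m for m in minimal_models(clauses, atoms))

# DB = { BT(A, John) v BT(B, John) }
ATOMS = ["BT_A_John", "BT_B_John"]
DB = [frozenset({("BT_A_John", True), ("BT_B_John", True)})]

print(can_assume(["BT_A_John", "BT_B_John"], DB, ATOMS))  # True
print(can_assume(["BT_A_John"], DB, ATOMS))               # False
```

The non-unit clause ~BT(A, John) v ~BT(B, John) can be assumed because neither minimal model contains both atoms, while the unit clause ~BT(A, John) cannot, since one minimal model contains BT(A, John).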

We note here that the three conditions given above are not the minimal possible. The Ki's need not be distinct. Condition (3) implies condition (2). If we replace all Ki's by a single positive clause K = K1 v ... v Kn, then we will be left with only two conditions:

(1) DB ⊢ Pi v K, and

(2) DB ⊬ K.

The conditions stated here have the advantage of enabling the search for Ki's to be performed independently for each Pi. After all such Ki's are found, condition (3) can be checked. We prefer to use such an approach in the algorithm discussed later in this paper.

Let us denote by DB* the set of all minimal ground negative clauses that can be assumed using the above procedure. Here, a ground negative clause in DB* is minimal if no proper subclause of it is in DB*.

THEOREM 15. If DB is consistent then DB + DB* is also consistent.

Proof. We show that any minimal model of DB, M, is a model of DB + DB*. Assume that is not the case, i.e., M is a minimal model of DB but not a model of DB + DB*. There must exist a clause in DB*, say C = ~P1 v ... v ~Pn, such that Pi is in M for all i in {1, ..., n}.

If M is the only minimal model, then we have a contradiction because C is not true in M and therefore not true in all minimal models. Otherwise, let M1, ..., Mm be all the minimal models of DB other than M. Let K = Q1 v ... v Qm, where Qi is in (Mi - M). Let Ki = K for all i in {1, ..., n}. All of the Ki are false in M. This implies that DB ⊬ K1 v K2 v ... v Kn, and according to our definition C must not have been in DB* after all.

Now since DB is consistent, it must have at least one minimal model which, by the above, will also be a model of DB + DB*. []

THEOREM 16. An interpretation M is a minimal model of DB if and only if it is a model of DB + DB*.

Proof. ->: This direction follows from Theorem 15.

<-: By contradiction. Assume that M is a model of DB + DB* which is not a minimal model of DB. Let M0 be a minimal model of DB contained in M and let Y0 = M - M0. Let P0 be a ground atom from H which is an element of Y0. (Note that M is bound to be a model of DB because it is a model of DB + DB*. Our aim is to show that it is minimal.)

Let M1, M2, ..., Mn be all the minimal models of DB which contain P0. (Note, n can be zero.) Let Pi be an element of M0 - Mi for all i in {1, ..., n}. We claim that we can assume ~P0 v ~P1 v ... v ~Pn, for suppose that was not the case. Then according to the syntactic definition, there must exist positive or empty clauses K0, K1, ..., Kn such that for all i in {0, 1, ..., n}:

DB ⊢ Pi v Ki, and DB ⊬ Ki, and DB ⊬ K0 v K1 v ... v Kn.

But Pi is not in Mi, which means that Ki is true in Mi, for all i in {0, 1, ..., n}, since Mi is a model of DB and hence also of Pi v Ki. In addition, K0 must be true in every model not containing P0. This means that K0 v K1 v ... v Kn is true in every minimal model because Ki is true in Mi, and K0 is true in all the minimal models that do not contain P0, i.e., in all the minimal models other than M1, ..., Mn. Then DB ⊢ K0 v K1 v ... v Kn, contradicting the above assumption, and we must be able to assume ~P0 v ~P1 v ... v ~Pn.

Now, by the just proved claim, C = ~P0 v ~P1 v ... v ~Pn, or some subclause of it, is in DB*. Consider M, which is supposed to be a model of DB + DB*. Pi is in M0 for all i in {1, 2, ..., n} and therefore in M (recall that M0 is a subset of M). P0 is also in M. Therefore clause C, or any subclause of C, is falsified by M, contradicting our assumption that M is a model of DB + DB*. []
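Theorem 16 can be checked mechanically on small ground databases. The Python sketch below is an illustration under assumptions made here, not the authors' procedure: DB* is computed from the semantic definition by enumerating minimal models, each negative clause is represented simply by the set of atoms it negates, and the names `db_star` and `models_of_db_plus_star` are hypothetical.

```python
from itertools import combinations

def satisfies(model, clause):
    """A clause is a frozenset of (atom, is_positive) literals."""
    return any((atom in model) == positive for atom, positive in clause)

def models(clauses, atoms):
    found = []
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            m = frozenset(subset)
            if all(satisfies(m, c) for c in clauses):
                found.append(m)
    return found

def minimal_models(clauses, atoms):
    ms = models(clauses, atoms)
    return [m for m in ms if not any(other < m for other in ms)]

def db_star(clauses, atoms):
    """Minimal assumable negative clauses, each given as the set of atoms
    it negates: {P1, ..., Pn} stands for ~P1 v ... v ~Pn."""
    mins = minimal_models(clauses, atoms)
    star = []
    for size in range(1, len(atoms) + 1):        # smaller clauses first
        for subset in combinations(atoms, size):
            s = frozenset(subset)
            if any(t < s for t in star):         # a proper subclause is in DB*
                continue
            if not any(s <= m for m in mins):    # true in every minimal model
                star.append(s)
    return star

def models_of_db_plus_star(clauses, atoms):
    """Models of DB that falsify no clause of DB*."""
    star = db_star(clauses, atoms)
    return {m for m in models(clauses, atoms) if not any(s <= m for s in star)}

# The database of Example 6.1.
ATOMS = ["a", "b", "c"]
DB = [
    frozenset({("a", True), ("b", False)}),
    frozenset({("b", True), ("c", True)}),
    frozenset({("a", False), ("c", True)}),
]
print(db_star(DB, ATOMS))
print(models_of_db_plus_star(DB, ATOMS) == set(minimal_models(DB, ATOMS)))
```

For this database DB* consists of the unit clauses ~P(a) and ~P(b), and the models of DB + DB* coincide with the minimal models of DB, as the theorem states.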

Note that adding DB* does not allow the derivation of any new positive information, that is, there is no positive clause K such that DB + DB* ⊢ K but DB ⊬ K. To see this, let M be any model of DB, and M' a minimal submodel. Then M' is a model of DB + DB* and so a model of K. But because K has only positive literals, it must also be true in M. Thus K is true in arbitrary models of DB. We now show that DB* is the largest set of negative information with this property.

THEOREM 17. If DB is consistent then DB + DB* is maximally consistent in the following sense: no negative information, C, not subsumed by DB + DB*, can be added to DB + DB* without making it possible to derive a positive or empty clause not derivable from DB itself, i.e., DB + DB* + C ⊢ K, where K is a positive or empty clause not derivable from DB.

Proof. Since C = ~P1 v ~P2 v ... v ~Pn is not in DB*, there must exist positive or empty clauses K1, ..., Kn such that for all i in {1, ..., n}:

(1) DB ⊢ Pi v Ki and

(2) DB ⊬ Ki and

(3) DB ⊬ K1 v K2 v ... v Kn.

But clearly we can derive K1 v ... v Kn from C and the clauses Pi v Ki (by resolution, for example). Thus, DB + DB* + C ⊢ K1 v K2 v ... v Kn, whereas by condition (3), DB ⊬ K1 v K2 v ... v Kn. Note that if all Ki are empty, then K1 v K2 v ... v Kn is the empty clause. []

COROLLARY 2. Theorem 12 transforms into the following: Under the assumptions made for Theorem 12, we can assume ~P(a) if and only if every model of DB" + DB"* is also a model (not necessarily minimal) of C1' & C2' & ... & Cm'.

Proof. By Theorem 12, we can assume ~P(a) if and only if every minimal model of DB" is also a model (not necessarily minimal) of C1' & C2' & ... & Cm'. By Theorem 16, a Herbrand interpretation M is a minimal model of DB" if and only if it is a model of DB" + DB"*. The composition of these two results proves the corollary. []

Observation. The extended generalized closed world assumption reduces to the GCWA if we restrict the elements of DB* to be ground unit clauses. It reduces to the CWA if DB is Horn.

We can use Theorem 13 and Corollary 2 as a basis for an algorithm for deciding whether a negative ground literal can be assumed in a non-Horn database. This algorithm is more efficient than the one suggested by the definition of GCWA given by Minker in [12]. It is as follows (here we use the notation developed for Theorem 12):

ALGORITHM 1.

Step 1. Let i = 1. // i is the index of the clause, among those that originally contained P(a), currently being considered. //

Step 2. If i = m + 1 then stop with the message 'We can assume ~P(a)'. // The test succeeded for all the clauses that should be considered. //

Step 3. Try to derive the empty clause (EMPTY) from DB" + ~Ci'. If DB" + ~Ci' ⊢ EMPTY then set i = i + 1 and go to Step 2. // Ci' is the result of deleting P(a) from clause number i that originally contained P(a). This step tests whether Theorem 13 holds for Ci'. If yes, we move on to process the next Ci'. If no, we apply the next step. //

Step 4. If DB" + ~Ci' ⊬ EMPTY and all the Ci' clauses are positive then stop with the message 'We cannot assume ~P(a) to be true'. // The test based on Theorem 13 failed for the positive case (the case when the only-if part of Theorem 13 holds). This indicates a failure and we quit. //

Step 5. If DB" + ~Ci' ⊬ EMPTY and Ci' is not positive then try to derive the empty clause from DB"* + DB" + ~Ci'. If that is possible then set i = i + 1 and go to Step 2, otherwise stop with the message 'We cannot assume ~P(a) to be true'. // Here we use Corollary 2 for the case when Ci' is not a positive clause, where the only-if part of Theorem 13 fails. //

Notes concerning this algorithm:

(1) In Step 3 we try a derivation of the empty clause supported by the clauses in ~Ci'. Note that ~Ci' constitutes a set of unit clauses.

(2) In Step 5 we need to calculate only those parts of DB"* that are required for our attempt to derive the empty clause from DB"* + DB" + ~Ci'. The elements of ~Ci' are used to guide the search for those parts of DB"* necessary for the derivation.

(3) The saving achieved using this algorithm is the result of being able to cut short the computation as soon as we reach the conclusion that there is no positive clause K, not itself derivable from DB, such that P(a) v K is derivable from DB. This makes it unnecessary to derive all possible positive clauses and to test them for derivability from DB (see the syntactic definition of GCWA).

EXAMPLE 7.1. Let DB = {P(a) v ~P(m), P(a) v P(b) v P(m), P(b) v P(c), ~P(c) v P(e), ~P(c) v P(d) v P(m), ~P(e) v P(b), ~P(a) v P(n), ~P(m) v P(k), ~P(m) v ~P(k)}.

To test for the possibility to assume ~P(a) we have DB" = {P(b) v P(c), ~P(c) v P(e), ~P(c) v P(d) v P(m), ~P(e) v P(b), P(n), ~P(m) v P(k), ~P(m) v ~P(k)}, C1' = ~P(m), and C2' = P(b) v P(m). We are able to derive the empty clause from DB" + ~C1' and from DB" + ~C2' (we succeed in Step 3 in both cases), which enables us to assume ~P(a) under GCWA.

On the other hand, when testing for the possibility to assume ~P(n), DB" = {P(a) v ~P(m), P(a) v P(b) v P(m), P(b) v P(c), ~P(c) v P(e), ~P(c) v P(d) v P(m), ~P(e) v P(b), ~P(m) v P(k), ~P(m) v ~P(k)}, and C1' = ~P(a), and Step 3 fails. We try using Step 5 to derive the empty clause from DB" + DB"* + ~C1'. To do that, C1' guides us to try to show that we can assume ~P(a), which we can do as was shown in the first part of this example. Thus we will be able to assume ~P(n) without the need to calculate all the elements of DB"*.
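The control flow of Algorithm 1 can be sketched in Python and run on Example 7.1. Several assumptions are made here and are not from the original: the clause encoding and all function names are hypothetical; an exhaustive satisfiability check replaces the resolution refutations of Steps 3 and 5; DB" is taken to be the Ki's together with the Di's, as read off Examples 6.1 and 7.1; Step 5 is evaluated through Theorem 16 (the models of DB" + DB"* are the minimal models of DB"); and Step 5 is also applied to a positive Ci' when some other Ci' is not positive, a case the algorithm text leaves open.

```python
from itertools import combinations

def clause(*lits):
    """Build a clause from strings such as 'a' for P(a) or '~m' for ~P(m)."""
    return frozenset((s.lstrip("~"), not s.startswith("~")) for s in lits)

def satisfies(model, c):
    return any((atom in model) == positive for atom, positive in c)

def models(clauses, atoms):
    found = []
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            m = frozenset(subset)
            if all(satisfies(m, c) for c in clauses):
                found.append(m)
    return found

def minimal_models(clauses, atoms):
    ms = models(clauses, atoms)
    return [m for m in ms if not any(other < m for other in ms)]

def can_assume_not(db, p):
    """Sketch of Algorithm 1 for the ground atom p (tautologies not handled)."""
    atoms = sorted({atom for c in db for atom, _ in c})
    ci_list = [c - {(p, True)} for c in db if (p, True) in c]      # the Ci'
    di_list = [c - {(p, False)} for c in db if (p, False) in c]    # the Di'
    ki_list = [c for c in db
               if (p, True) not in c and (p, False) not in c]      # the Ki
    db2 = ki_list + di_list                                        # DB"
    all_positive = all(pos for c in ci_list for _, pos in c)
    for ci in ci_list:                                             # Steps 1-2
        # ~Ci' is a set of unit clauses, one per literal of Ci'.
        neg_ci = [frozenset({(atom, not pos)}) for atom, pos in ci]
        if not models(db2 + neg_ci, atoms):                        # Step 3
            continue
        if all_positive:                                           # Step 4
            return False
        # Step 5: every minimal model of DB" must satisfy Ci'.
        if not all(satisfies(m, ci) for m in minimal_models(db2, atoms)):
            return False
    return True

DB = [
    clause("a", "~m"), clause("a", "b", "m"), clause("b", "c"),
    clause("~c", "e"), clause("~c", "d", "m"), clause("~e", "b"),
    clause("~a", "n"), clause("~m", "k"), clause("~m", "~k"),
]
print(can_assume_not(DB, "a"))  # True: Step 3 succeeds for C1' and C2'
print(can_assume_not(DB, "n"))  # True: Step 3 fails, Step 5 succeeds
print(can_assume_not(DB, "b"))  # False: P(b) is in the only minimal model
```

Because every test is done by enumeration, the sketch does not show the efficiency gain discussed in note (3); it only mirrors the order in which the steps are tried.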

Minker and Grant [14] also give some algorithms for dealing with indefinite databases. However, there are a number of differences between their work and the present work. First, they assume that all formulas are ground. In the case of a database with general formulas, they suggest that all ground instances be generated. Thus, they deal at the outset with vastly larger sets of formulas. Second, they are answering a different set of questions than we are. In their case, they seek to answer the general question, 'Does a candidate answer a1* + ... + an* belong to a query Q(x*) = B1(y1*) & ... & Bn(yn*)?' They propose to find candidate answers by taking the JOIN of a set of tables called $Bi tables: each $Bi table is the set of all tuples bi* for which Bi(bi*) occurs as a disjunct in a (ground instance of a) clause in the database. They extend their algorithm to queries involving disjunction, negation and quantifiers. Their fundamental algorithm for the simple case is itself an exponential one, which first constructs a table of all tuples in all indefinite tables over the concerned relations up to order n, where n is as above. It then tests all clauses in the database for subsuming rows of this table. One of their principal aims is to be able to deal with NULL values. We, on the other hand, are attacking a much more restricted problem, namely, when can one assume ~P(a*), or in their terminology, when is a* not in $P. And, of course, we avoid having to generate all ground instances of all clauses, a prohibitive task. On the other hand, our method does involve finding sets of refutations. Now since we are dealing with function-free theories, these refutation problems will all be decidable, but of course again in exponential worst-case time. In our case, we can hope that the nature of the problem will lead to good heuristics that will find the required refutations relatively quickly in cases of real databases. An interesting question is whether or not any of our techniques for handling the database in its general, first-order form can be carried over to the problems attacked by Minker and Grant. In any case, from a computational-complexity point of view, non-Horn databases are very hard to deal with, and it remains to be seen which aspects of indefiniteness can be made practical.

8. Non-Horn General Laws

As we mentioned earlier, a database can have a component consisting of general laws (IDB).

One of the factors that determines how reasoning is to be done in a deductive database is which of the general laws are used for deriving answers and which are used as integrity constraints. The kinds of reasoning for the two classes can be quite different (see, e.g., [8, 10, 12, 15, 16]). Some guidelines for dividing general laws into derivation rules and integrity constraints are given in [15]. There, a general law is used as an integrity constraint if its positive literal is a primitive relation (predicate). A relation is considered to be primitive if its extension can be known by means of an algorithm. Examples of such relations are LESS THAN, GREATER THAN, etc. If that is not the case, the general law is used as a derivation rule.

The general laws in a database can also be in the form of non-Horn clauses. Here,

assuming that the general laws are the only non-Horn component in the database,

additional guidelines for the purpose of dividing general laws into derivation rules and integrity constraints are suggested.

As a motivating example let us consider the simple case when we have only one non-Horn general law with only two positive literals appearing in its clausal representation.

EXAMPLE 8.1. Assume that we have the general law:

L1 & L2 & ... & Ln -> M1 v M2    ... rule (2)

which is written in clausal form as:

~L1 v ~L2 v ... v ~Ln v M1 v M2    ... rule (2').

(a) If both M1 and M2 are base (stored) relations then rule (2) is used as an integrity constraint: whenever the left-hand side of the implication is true, so must be the right-hand side. If that is not the case, the integrity of the database is violated. It is the DBA's responsibility to recover the integrity (for example, by rejecting the update which caused this violation or by propagating further updates needed for recovery; of course this process can be automated). The result must be that the extensions of M1 and M2 satisfy rule (2). Thus, after any update, M1 and M2 will retain their status as base relations defined through their extensions.

(b) If both M1 and M2 are defined through other (Horn) clauses, or if both of them are mixed (partially stored and partially defined) then rule (2) is also used as an integrity constraint: whenever the Li's are true so must be M1 v M2, i.e., the tuples corresponding to M1 v M2 must either be stored in the database or be provable from it. The above discussion concerning recovery is applicable here, but we must take into consideration the derived nature of M1 and M2.

(c) Suppose that rule (2) is the only definition for M1 or M2 but not for both, say for M1. M2 is either a base relation or a relation defined through other (Horn) clauses. Rule (2) is treated as a defining rule for M1 while M2 is defined exclusively through other clauses. Rule (2) is marked as a defining rule for M1 if it will be used in that capacity. This means that rule (2) is treated as

L1 & L2 & ... & Ln & ~M2 -> M1    ... rule (2").

Under the closed world assumption we will be able to evaluate ~M2 through M2. If the Li's hold true and M2 does not, we can derive M1. Actually, here we are making use of the fact that D = ||M2|| ∪ ||~M2||, i.e., any tuple defined over the proper domains is either in M2 or in ~M2.

The criterion for our distinction between derivation rules and integrity constraints

is not to allow any rule to be used as a definition for more than one relation at the

same time. If that is not possible, the rule must be used as an integrity constraint. Note

that in the example above, even if M1 and M2 were originally base relations, we can

still treat rule (2) as a derivation rule for M1 or for M2 (but not for both) since this

does not violate our criterion.

The condition that a non-Horn general law can serve as a derivation rule for only one of its positive literals is a necessary but not a sufficient condition for simplifying the non-Horn case. The necessity stems from the fact that if rule (2) is used to derive both M1 and M2 then we may encounter the situation where we need to make sure that M1 does not hold in order for M2 to be true, and to make sure that M2 does not hold in order for M1 to be true. This will create a deadlock. The fact that the condition is not sufficient is illustrated by the following example:

EXAMPLE 8.2.

P(x, y) & SIB(y, z) -> UN(x, z) v AUN(x, z)    (1)

NIE(t, u) -> UN(u, t) v AUN(u, t).    (2)

If we use (1) as a derivation rule for UN and (2) as a derivation rule for AUN then we may be deadlocked in spite of the fact that each clause is used as a derivation rule for only one of its positive literals. Consider, for example, the situation when x = a, y = c, z = b, t = b and u = a.

We now develop a general criterion for the distinction between derivation rules and integrity constraints when we have an arbitrary number of general laws (Horn and non-Horn), with each clause containing an arbitrary number of positive literals. We will still require each general law to be used as a derivation rule for at most one of its positive literals.

Let K1, K2, ..., Kn be all the general laws used as derivation rules in DB.

Let Ci, for all i in {1, ..., n}, denote the positive clause that we get from Ki by deleting all negative literals from Ki. In each Ci, one literal, Li, is designated as distinguished (it is the literal for which Ki will be used as a derivation rule). Thus

C1 = M11 v M12 v ... v M1j1 v L1

...

Ci = Mi1 v Mi2 v ... v Miji v Li

...

Cn = Mn1 v Mn2 v ... v Mnjn v Ln.

Now, construct a connection graph (CG) as follows:

(1) For each clause Ci create a node representing Ci, with its literals Mi1, Mi2, ..., Miji and its distinguished literal Li.

(2) Create a directed edge from Mij to Lk if and only if these two literals are unifiable. This edge is labelled by the most general unifier (MGU) for Mij and Lk.

We will say that a connection graph is cyclic (has a cycle) if and only if there exists a sequence of consecutive edges such that the first edge of this sequence emanates from and the last edge enters into the same node and at the same time the substitutions labelling the edges of the sequence are compatible. The connection graph is acyclic if it has no cycles.

A terminal node is a node in the connection graph with no edges emanating from it.

THEOREM 18. If the connection graph constructed for the derivation rules in a database, DB, is acyclic then the non-Hornness of these rules will cause no problems for query evaluation (we can encounter no deadlocks when evaluating queries) not encountered for the Horn case. Under the CWA we will be able to use Reiter's method for the evaluation of complex queries in terms of set operations over the answers to their atomic components.

Proof. Since the connection graph is acyclic, each path in the graph must end in a terminal node. The distinguished literal in the terminal node is evaluated and the result is used to evaluate the preceding nodes on the path and so on. This makes it impossible to have mutually dependent nodes which could have created deadlocks. Each attempt to evaluate a node will end either with failure or success, and the evaluation process will always terminate (assuming no problems are caused by the Horn part of DB). On the other hand, if a cycle existed in the connection graph then there will be no guarantee of the absence of mutual dependence in the evaluation process. []
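The acyclicity test behind Theorem 18 can be sketched in Python on the rules of Example 8.2. This is a simplified illustration under assumptions made here: variables are written with a leading `?`, all names (`unify`, `connection_edges`, `has_cycle`) are hypothetical, and the check ignores whether the substitutions along a cycle are compatible, so it may report a cycle that the definition above would reject. It therefore errs on the cautious side.

```python
def rename(atom, suffix):
    """Rename variables (strings starting with '?') apart with a suffix."""
    pred, args = atom
    return (pred, tuple(t + suffix if t.startswith("?") else t for t in args))

def unify(a1, a2):
    """MGU of two function-free atoms (pred, args) with disjoint variables,
    returned as a dict, or None if the atoms are not unifiable."""
    if a1[0] != a2[0] or len(a1[1]) != len(a2[1]):
        return None
    subst = {}
    def walk(t):
        while t in subst:
            t = subst[t]
        return t
    for s, t in zip(a1[1], a2[1]):
        s, t = walk(s), walk(t)
        if s == t:
            continue
        if s.startswith("?"):
            subst[s] = t
        elif t.startswith("?"):
            subst[t] = s
        else:
            return None            # two different constants
    return subst

def connection_edges(rules):
    """rules[i] = (distinguished literal Li, [other positive literals Mij]).
    An edge (i, k) is created when some Mij unifies with Lk."""
    edges = set()
    for i, (_, others) in enumerate(rules):
        for m in others:
            for k, (dist, _) in enumerate(rules):
                if unify(rename(m, "_m"), rename(dist, "_l")) is not None:
                    edges.add((i, k))
    return edges

def has_cycle(n, edges):
    """Depth-first search for a directed cycle among nodes 0..n-1."""
    adj = {i: [k for (j, k) in edges if j == i] for i in range(n)}
    state = [0] * n                # 0 = unvisited, 1 = on stack, 2 = done
    def visit(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1 or (state[v] == 0 and visit(v)):
                return True
        state[u] = 2
        return False
    return any(state[u] == 0 and visit(u) for u in range(n))

UN1, AUN1 = ("UN", ("?x", "?z")), ("AUN", ("?x", "?z"))
UN2, AUN2 = ("UN", ("?u", "?t")), ("AUN", ("?u", "?t"))

# (1) derives UN and (2) derives AUN: the deadlock case of Example 8.2.
rules_deadlock = [(UN1, [AUN1]), (AUN2, [UN2])]
# Both laws used as derivation rules for UN only.
rules_safe = [(UN1, [AUN1]), (UN2, [AUN2])]

print(has_cycle(2, connection_edges(rules_deadlock)))  # True
print(has_cycle(2, connection_edges(rules_safe)))      # False
```

In the second assignment AUN is never a distinguished literal, so no edges are created and the graph is trivially acyclic.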

Notes

(1) We can use Theorem 18 to divide general laws into derivation rules and integrity

constraints. The way to do that is to keep changing laws from derivation rules to integrity constraints until the connection graph becomes acyclic. If we can achieve acyclicity in more than one way, then we can use the flexibility to improve the database design.

(2) Even if there was no way to achieve an acyclic connection graph, the cycles will affect only the evaluation of those queries that traverse the existing cycles. If the evaluation path of a query does not pass through a cycle, then we can still get good results although the connection graph as a whole is cyclic.

(3) Whenever the database is consistent and its integrity is not violated, the integrity constraints can be used to derive answers. For example, even if rule (2) was originally used as an integrity constraint, if the left-hand side of the implication in rule (2") is true, we can assume M1 without looking that up in the EDB.

(4) Other than the CWA for negative information, the above classification still uses the ordinary methods for query evaluation. The queries could still be compiled in advance and invoked when needed [8].

(5) The fact that we may be using negative information in the query evaluation process has the following drawbacks:

(a) The evaluation of negative literals under the CWA requires the exhaustive search for every possible answer. This process can be time consuming.

(b) Because of the nonmonotonic nature of the CWA [12], the addition of some

tuples to the database may decrease the number of tuples derivable for certain queries as the following example demonstrates:

Assume that IDB consists of one law

P(x) & ~Q(x) -> R(x).

IDB = {~P(x) v Q(x) v R(x)}.

EDB = {P(a), P(b), P(d), Q(a), Q(c)}.

We can assume ~Q(b), ~Q(d), and derive R(b), R(d).

Before adding Q(d): P = {a, b, d}, Q = {a, c}, R = {b, d}.

After adding Q(d): P = {a, b, d}, Q = {a, c, d}, R = {b}.

If we add Q(d) to EDB we will be able to derive R(b) only.
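The nonmonotonic effect in this example can be reproduced with a few lines of Python. The sketch is illustrative only: relations are modelled as plain sets of constants, and the function name `derive_r` is an assumption made here.

```python
def derive_r(p, q):
    """R(x) holds for every x with P(x) stored and Q(x) absent,
    so that ~Q(x) is assumed under the CWA."""
    return {x for x in p if x not in q}

P = {"a", "b", "d"}
Q = {"a", "c"}

print(sorted(derive_r(P, Q)))          # ['b', 'd']
print(sorted(derive_r(P, Q | {"d"})))  # ['b']
```

Adding the tuple d to Q removes R(d) from the derivable answers, although nothing was deleted from the database.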

9. Conclusion and Remarks

As the discussion in this paper shows, the general case of non-Hornness is a very undesirable property. It adds substantially to the complexity of deductive operations in a database. Therefore, the DBA should strive to design Horn databases whenever possible. If that is not achievable, it may help to use the guidelines for dividing non-Horn general laws into derivation rules and integrity constraints to get simpler algorithms.

Because of the problems associated with non-Horn databases, it may be necessary to monitor the evolution of the database to ensure that any assumptions affecting the algorithms used to manipulate the database continue to hold. We did not address the realization of such a monitoring system in this paper but it can serve as an area for further research.

References

1. Chang, C. L., 'On evaluation of queries containing derived relations', in Formal Bases for Databases, Preprints, Toulouse, France (1979).

2. Chang, C. L. and Lee, R. C. T., Symbolic Logic and Mechanical Theorem Proving, Computer Science and Applied Mathematics Series, Academic Press, New York (1973).

3. Clark, K. L., 'Negation as failure', in Logic and Databases (eds. H. Gallaire and J. Minker), Plenum Press, New York, 293-324 (1978).

4. Codd, E. F., 'Extending the database relational model to capture more meaning', ACM Transactions on Database Systems 4, 4, 339-434 (December 1979).

5. Date, C. J., An Introduction to Database Systems, 3rd Ed., Addison-Wesley Publishing Co., Reading, Mass. (1982).

6. Gallaire, H., Minker, J., and Nicolas, J. M., 'An overview and introduction to logic and databases', in Logic and Databases (eds. H. Gallaire and J. Minker), Plenum Press, New York, 3-32 (1978).

7. Ghafarzade, M., 'Function symbols in first-order databases', Ph.D. Dissertation, EECS Department, Northwestern University, Evanston, Ill. (1981).

8. Henschen, L. and Naqvi, S., 'On compiling queries in recursive first-order databases', Accepted JACM 31, 1, 47-85 (January 1984).

9. Henschen, L. and Wos, L., 'Unit refutation and Horn sets', JACM 21, 4, 590-605 (October 1974).

10. Kowalski, R., 'Logic for data description', in Logic and Databases (eds. H. Gallaire and J. Minker), Plenum Press, New York, 77-106 (1978).

11. Lipski, W. Jr., 'On semantic issues connected with incomplete information databases', ACM Transactions on Database Systems 4, 3, 262-301 (September 1979).

12. Minker, J., 'On indefinite databases and the closed world assumption', Technical Report, University of Maryland, College Park, Maryland (July 1981).

13. Minker, J., 'On theories of definite and indefinite databases', University of Maryland, College Park, Maryland, 1983. Submitted for publication to the JACM.

14. Minker, J. and Grant, J., 'Answering queries in indefinite databases and the null value problem', University of Maryland, College Park, Maryland (1984).

15. Nicolas, J. M. and Yazdanian, K., 'Logic and database integrity', in Logic and Databases (eds. H. Gallaire and J. Minker), Plenum Press, New York, 325-346 (1978).

16. Reiter, R., 'Deductive question-answering in relational databases', in Logic and Databases (eds. H. Gallaire and J. Minker), Plenum Press, New York, 149-178 (1978).

17. Reiter, R., 'Equality and domain closure in first-order databases', JACM 27, 4, 235-250 (April 1980).

18. Reiter, R., 'On closed world databases', in Logic and Databases (eds. H. Gallaire and J. Minker), Plenum Press, New York, 55-76 (1978).

19. Van Emden, M. H. and Kowalski, R. A., 'The semantics of predicate logic as a programming language', JACM 23, 733-742 (October 1976).

20. Vassiliou, Y., 'Null values in database management: a denotational semantics approach', ACM/SIGMOD International Symposium on Management of Data, 162-169 (May-June 1977).

21. Yahya, A., 'Deduction in non-Horn databases', Ph.D. Dissertation, EECS Department, Northwestern University, Evanston, Ill. (1984).