A Query Language for NC



Journal of Computer and System Sciences 55, 299-321 (1997)


Dan Suciu

AT&T Labs Research, Florham Park, New Jersey 07932-0971

and

Val Tannen*

University of Pennsylvania, Philadelphia, Pennsylvania 19104

Received January 2, 1995; revised April 29, 1996

We show that a form of divide and conquer recursion on sets, together with the relational algebra, expresses exactly the queries over ordered relational databases which are NC-computable. At a finer level, we relate k nested uses of recursion exactly to $\mathrm{AC}^k$, $k \ge 1$. We also give corresponding results for complex objects. © 1997 Academic Press

1. INTRODUCTION

NC is the complexity class of functions that are computable in polylogarithmic time with polynomially many processors on a parallel random access machine (PRAM). The query language for NC discussed here is centered around a form of divide and conquer recursion (dcr) on finite sets which has obvious potential for parallel evaluation and can easily express, for example, transitive closure and parity. Divide and conquer with parameters e, f, u defines the unique function $\varphi$, notation dcr(e, f, u), taking finite sets as arguments, such that

\[
\begin{aligned}
\varphi(\emptyset) &\stackrel{\mathrm{def}}{=} e \\
\varphi(\{y\}) &\stackrel{\mathrm{def}}{=} f(y) \\
\varphi(s_1 \cup s_2) &\stackrel{\mathrm{def}}{=} u(\varphi(s_1), \varphi(s_2)) \quad \text{when } s_1 \cap s_2 = \emptyset.
\end{aligned}
\]

For parity, we take $e \stackrel{\mathrm{def}}{=} \mathit{false}$, $f(y) \stackrel{\mathrm{def}}{=} \mathit{true}$, and $u(v_1, v_2) \stackrel{\mathrm{def}}{=} v_1 \ \mathrm{xor}\ v_2$. To compute the transitive closure of some binary relation r, take $e \stackrel{\mathrm{def}}{=} \emptyset$, $f(y) \stackrel{\mathrm{def}}{=} r$, and $u(r_1, r_2) \stackrel{\mathrm{def}}{=} r_1 \cup r_2 \cup r_1 \circ r_2$. Then, the transitive closure of r is obtained by applying $\varphi$ to the set of nodes of the relation r, namely $tc(r) = \varphi(\Pi_1(r) \cup \Pi_2(r))$, where $\Pi_1$, $\Pi_2$ are the relational projections. In general, dcr(e, f, u) is well defined when there is some set containing e and the range of f, on which u is associative, commutative, and has the identity e. For parity, this is the set B of booleans, while for transitive closure, it is the set $\{ r \cup r^2 \cup \cdots \cup r^n \mid n \ge 0 \}$.
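The two instances above can be made concrete with a minimal Python sketch. It is an illustration only, not part of the language defined in this paper: the names `dcr`, `compose`, `parity`, and `tc` are hypothetical, sets are modeled by Python sets, and the split into two disjoint halves is one arbitrary choice among many (the result does not depend on it when u satisfies the stated identities).

```python
def dcr(e, f, u):
    """Return phi = dcr(e, f, u), a function on finite sets (illustrative sketch)."""
    def phi(s):
        items = sorted(s, key=repr)      # fix some order; any disjoint split would do
        if not items:
            return e                     # phi(empty set) = e
        if len(items) == 1:
            return f(items[0])           # phi({y}) = f(y)
        mid = len(items) // 2
        # the two halves are disjoint, so the third clause applies
        return u(phi(items[:mid]), phi(items[mid:]))
    return phi


def compose(r1, r2):
    """Relational composition of two binary relations."""
    return frozenset((a, d) for (a, b) in r1 for (c, d) in r2 if b == c)


# parity: e = false, f(y) = true, u = xor
parity = dcr(False, lambda y: True, lambda v1, v2: v1 != v2)


def tc(r):
    """Transitive closure: apply phi to the set of nodes of r."""
    r = frozenset(r)
    nodes = {a for (a, _) in r} | {b for (_, b) in r}
    phi = dcr(frozenset(), lambda y: r, lambda r1, r2: r1 | r2 | compose(r1, r2))
    return phi(nodes)
```

For example, `tc({(1, 2), (2, 3), (3, 4)})` combines two copies of $r \cup r^2$ at the top level, which already covers all paths of length up to four. The recursion depth is logarithmic in the number of nodes, which is the source of the parallelism discussed in the text.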

We show that dcr, together with the relational algebra, expresses exactly the queries over ordered databases of flat relations that are NC-computable. We also show that a bounded version of dcr, together with the nested relational algebra, expresses exactly the queries over ordered databases of complex objects that are NC-computable. In fact, we prove the more refined versions that relate k nested uses of (bounded) dcr exactly to the subclass $\mathrm{AC}^k$ of NC, where $k \ge 1$ (the definitions of these complexity classes are reviewed in Section 4). Some explanations are in order:

• Computable queries are in the sense of Chandra and Harel [CH80], with a natural extension to complex objects (Section 5).

• Any language that can express the same class of queries as first-order logic would do just as well as the relational algebra. Similarly for complex objects, where a corresponding class of tractable queries has emerged from several equivalent formalisms. Some of these formalisms are syntactically restricted higher-order logics, others are algebraic languages, often called nested relational algebras, hence our statement above. In fact, we will use the family of query languages introduced in [BBW92, BNTW95] because it is semantically related to dcr (Section 2).

• dcr and (nested) relational algebra have meaning over any (nested) relational database. But, as with all known characterizations of query complexity classes below NP, we know how to capture the entire NC only over ordered databases. Formally, we do this by extending the language with an order predicate.

• A bounded version of dcr is necessary over complex objects; otherwise queries of high complexity such as powerset will be expressible. The bounded version is obtained by intersecting the result with a bounding set at each recursion step (Section 3). This is similar to the bounded fixpoints studied in [Suc97], an idea due to Peter Buneman, and, as with fixpoints, over flat relations dcr can always be expressed through bounded dcr (Section 3).

* The authors were partially supported by NSF Grant CCR-90-57570 and ONR Contract N00014-93-11284. E-mail: suciu@research.att.com; val@cis.upenn.edu.

We believe that these results are of interest from two angles:

Query language design. dcr is a well-known construct. It appears under the name pump, in a language specifically designed for a parallel database machine, FAD [BBKV87]. Following FAD, but under the name hom, this construct was included in Machiavelli [OBB89], where it fit nicely into the language's type system. Called (a form of) transducer, it is part of SVP [PSV92], precisely in order to support divide and conquer parallelism. Some limitations of its theoretical expressive power were examined (under the name hom) by Immerman, Patnaik, and Stemple [IPS91, Theorem 7.8]. They also note that dcr is in NC.

As part of a larger group of researchers, we became interested in dcr because it fits into a natural hierarchy of query languages that share a common semantic basis, built around forms of structural recursion on collection types [BTS91, BTBN91, BBW92, BNTW95] (see Section 3). Theoretical studies of expressiveness, such as [Won93, LW94a, Suc97, LW94b, SW95] and the present paper, help us with the choice and mix of primitives, as well as implementation strategies. In particular, dcr is at the core of a sublanguage for which we are currently seeking efficient implementation techniques for a variety of parallel architectures.

Computational complexity. Following Immerman and Vardi's influential result [Imm82, Var82, Imm86] that first-order logic with the least fixed point captures exactly the PTIME-computable queries on flat relations over ordered databases, several characterizations of low complexity classes in terms of logics or algebras used in databases have been discovered with the hope that logical methods may give insights into the difficult problem of complexity class separation. We mention first a few of these characterizations which have had a direct influence on the work here.

For parallel complexity classes, Immerman [Imm89] shows that the class of finite and ordered relational structures recognizable in parallel time t(n) (n is the size of the structure) on a certain CRCW (concurrent-read, concurrent-write) PRAM coincides with the class of structures definable by a first-order induction [Mos74] of depth up to t(n). Denninghoff and Vianu [DV91] characterize NC in terms of a resource-restricted message-passing model with parallel semantics which computes object-oriented queries. For complex object databases, Grumbach and Vianu [GV91b, GV91a, GV95] give a syntactic restriction of the ramified higher-order logic CALC which, together with inflationary fixpoints and in the presence of order, captures exactly the PTIME-computable complex-object queries. Suciu [Suc97] shows that, in the presence of order, the same class of queries is captured by the nested relational algebra augmented with an inflationary bounded fixpoint operator.

To the best of our knowledge, no characterization of parallel complexity classes of queries over complex objects has been given before. What is more likely to set our results apart, however, is the intrinsic nature of the language we are proposing: the semantics of dcr puts it naturally in NC; there is no need to impose logarithmic bounds on the number of iterations or recursion depth. Moreover, it can be shown that a different kind of recursion on sets, namely structural recursion on the insert presentation of sets ([BTS91]; notation sri; definitions reviewed in Section 3), together with the relational algebra expresses exactly the PTIME-computable queries on ordered databases.¹ This follows from results in [IPS91]; we state the corresponding result for complex objects in Proposition 6.7. Hence, at least over ordered databases, the difference between NC and PTIME boils down to two different ways of recurring on sets, divide and conquer versus element by element.

Gurevich [Gur83] and Compton and Laflamme [CL90] characterize the DLOGSPACE- and respectively the $\mathrm{NC}^1$-computable global functions on ordered finite relational structures as algebras with certain primitive recursion schemas. Compton and Laflamme capture $\mathrm{NC}^1$ also with first-order logic augmented with BIT,² and with an operator for defining relations by primitive recursion. The recursion forms used in these two papers are very different from dcr because they depend on some linear ordering of the underlying structures for their actual definition, and while dcr is a form of recursion on finite sets, the recursion forms in [Gur83, CL90] are on notations for elements of (linearly ordered) finite sets. In Section 3 we consider a related form of recursion on sets, set-divide, whose definition relies on an order relation of the base type. This is a recursion on sets; hence, it is different from the recursion considered in [CL90], which is a recursion on the elements of the domain; set-divide is related in spirit to set-reduce [IPS91]. Clote [Clo90] gives related characterizations of most parallel complexity classes, of arithmetical functions, however.

Since dcr can be defined for all structures, not just ordered ones, our characterization of NC is, instead, closer in style to the above-mentioned fixpoint characterization of PTIME by Immerman and Vardi, or to Immerman's characterizations of DLOGSPACE and NLOGSPACE by first-order logic extended with deterministic and nondeterministic transitive closure [Imm87b, Imm87a]. We must warn the reader, however, about one sense in which our language is not as neat as these extensions of first-order logic. For dcr to be well-defined, the operations involved in it must satisfy certain algebraic identities (associativity, commutativity, identity) and this turns out to be an undecidable condition (in fact $\Pi^0_1$-complete; see Section 3). Of course, only a certain family of instances of dcr is actually needed in the simulations, and for these, the algebraic conditions always hold (Proposition 8.3). Hence, it is of theoretical interest that there is a decidable sublanguage of dcr plus relational algebra which captures exactly NC in the presence of order. In practice, we have found it useful to provide special syntax for some instances of dcr in which the algebraic conditions are automatically satisfied, but we found it counterproductive to limit dcr to these instances, as other uses kept appearing.

¹ Of course, so does least fixpoint recursion, for example, but it is not a recursion on sets.

² A relation giving the binary representation of integers.

This paper is organized as follows. Section 2 reviews some basic query language constructs. Section 3 introduces four different forms of recursion over sets (one of which is dcr) and establishes the relationships between them, then defines bounded versions of recursion on sets, which are necessary for controlling complexity when we work with nested sets. It also briefly discusses the undecidability of the well-definedness of the dcr construct. We briefly review the parallel complexity classes $\mathrm{AC}^k$ and NC in Section 4 and extend them to classes of queries in Section 5. The main results are stated and discussed in Section 6. We prove our main results by translating dcr into a series of equivalent constructs, which are of interest in their own right: into order-dependent forms of recursion (Section 7) and into iterations (Section 8). Sections 9 and 10 complete the proofs.

2. QUERY LANGUAGES: THE BACKGROUND

Our language for the NC-computable queries over flat relations consists of a background language, equivalent to the relational algebra, and a form of recursion on sets, dcr. For complex objects, we also have a background language, this one equivalent to the nested relational algebra, and a form of recursion on sets called bounded dcr. The reader interested only in flat relations may skip the complex-object part; some of the proofs in this paper, however, are easier to understand in the more general context of complex objects than in that of flat relations.

We could have simply chosen the relational algebra or, equivalently, first-order logic (FO) as our background language for flat relations. But we do not do so, because there is a mismatch between the syntax of, say, FO, and the best notation for dcr. To illustrate the point, note that dcr(e, f, u) is a higher-order construct, expecting functions f and u as arguments. Such functions are cumbersome to denote in FO. Indeed FO is well equipped to express queries which are functions taking relations as inputs and returning a relation as output, but the input relation names are fixed by the input schema. In order to extend FO with higher order constructs like dcr, one needs a notation for defining arbitrary functions. In fact, a similar situation arises for the extension of FO with fixpoints, another higher order construct [AHV95]. The usual notation for a fixpoint is of the form $\mathit{fix}_T(\varphi(T))$, where $\varphi(T)$ is a formula with m free variables, in which T is a new m-ary relation name. The argument of the fixpoint is the query $\varphi(T)$, and the symbol $\mathit{fix}_T$ is needed to single out the name, T, of the input relation to that query. This works fine for fixpoints, because they take only a single query as argument, and, moreover, this query takes a single relation as input. For our dcr, however, f is not a traditional query, because its input is an atomic value, or a tuple of atomic values, rather than a relation, and moreover, u is a query expecting two relations as inputs. Extending the syntax of FO in the same way as for fixpoints would lead us to the following notation for dcr(e, f, u): $dcr_{x_1, \ldots, x_k, T_1, T_2}(\psi, \chi(x_1, \ldots, x_k), \varphi(T_1, T_2))$, where $\psi$ is a formula with m free variables (denoting e), $\chi(x_1, \ldots, x_k)$ is a formula with m+k free variables, denoting the function f in which $x_1, \ldots, x_k$ are the inputs, and $\varphi(T_1, T_2)$ is again a formula with m free variables and two additional m-ary relation names $T_1$ and $T_2$, denoting the query u. We found this notation cumbersome. Instead we chose to follow [BTBN91, BBW92, BNTW95] in presenting a language equivalent to the relational algebra, with a concise notation for functions. This will make the addition of a syntax for dcr a simple task. Moreover, the presentation in [BNTW95] is such that the transition to complex objects is natural and straightforward.

Flat relations, and their extensions to complex objects, are built essentially from tuples and finite sets. To describe them, we define the complex object types by the grammar

\[
t \stackrel{\mathrm{def}}{=} D \mid B \mid t \times \cdots \times t \mid \{t\},
\]

where D is some base type, which we assume to be infinite and denumerable, and B is the type of booleans.³ The values of type $t_1 \times \cdots \times t_k$ are k-tuples $(x_1, \ldots, x_k)$ with $x_i$ a value of type $t_i$, for $i = 1, \ldots, k$. The values of type $\{t\}$ are finite sets $\{x_1, \ldots, x_n\}$ of values of type t. We denote with unit the empty product, obtained by taking k=0 in $t_1 \times \cdots \times t_k$: it has a single value, the empty tuple denoted $()$. E.g., the type of finite binary relations is $\{D \times D\}$ and is populated with relations like $\{(a, c), (b, a), (a, d)\}$ with $a, b, c, d \in D$. The type $\{D \times \{D\}\}$ contains nested relations, like $\{(a, \{b, c\}), (b, \{\})\}$.

A scalar type is either one of the types D, B, or a product of two scalar types. A flat relation type, or a flat type in short, is either a type of the form $\{t\}$, with t a scalar type, or a product of two flat types. E.g., $D \times B \times D$ is a scalar type, while $\{D \times D\}$ and $\{\mathit{unit}\} \times \{D \times B\}$ are flat types.


³ Not really necessary; it could have been encoded as $\{\mathit{unit}\}$ [BNTW95].


Relational algebra and first-order logic deal only with values of flat types.

Preparing the way for our higher order constructs, like dcr and its relatives, we consider also function types. These have the form $t_1 \to t_2$, where $t_1$ and $t_2$ are complex object types.⁴ The syntactic construct for functions of type $t_1 \to t_2$ is $\lambda x^{t_1}.\, e$, where $x^{t_1}$ is a variable of type $t_1$ and e is an expression of type $t_2$. E.g., $\lambda x^{\{N\}}.\, x^{\{N\}} \cup \{5\}$ denotes the function which inserts 5 into a set of natural numbers. This will make it easy to introduce dcr later on as a construct dcr(e, f, u), where $e : t_2$ is a complex object expression, and $f : t_1 \to t_2$, $u : t_2 \times t_2 \to t_2$ are function expressions.

We present now the formalization of the nested relational algebra, NRA, that was essentially introduced in [BNTW95] and is based on what is called there the monad calculus. This language has the same expressive power (see [Won94]) as Thomas and Fischer's algebra [TF86], hence also as Schek and Scholl's NF$^2$ relational algebra [SS86], or Paredaens and Van Gucht's nested algebra [PG88, PG92], or Abiteboul and Beeri's algebra without powerset [AB88]. When restricted to flat relations, it is equivalent to relational algebra.

The basic idea, which proved quite fruitful, is to design query languages for complex objects by considering tuples and sets as orthogonal. Hence, there will be primitives that work on tuples, primitives that work on sets, and general primitives for combining other primitives.

We assume an infinite set of variables to be given, each having a complex object type associated with it. We write $x^t$ for a variable of type t. As usual, we distinguish between free and bound variables, and we identify those expressions that differ only in the name of some bound variables. NRA is defined by the expressions in Fig. 1, presented as part of their typing rules.

We briefly describe the semantics of the expressions: $(e_1, \ldots, e_k)$ constructs a k-tuple, $\pi_i^{t_1, \ldots, t_k}$ are the projections, $\{e\}$ is the singleton set, empty?(e) returns true iff $e = \emptyset$, f(e) is function application, and $ext(f)(\{x_1, \ldots, x_n\}) \stackrel{\mathrm{def}}{=} f(x_1) \cup \cdots \cup f(x_n)$. Finally, get(s, y) gives x if s is the singleton $\{x\}$ and y otherwise; this is a rather unusual operation in the context of a database query language; we introduce it for technical reasons, and we will comment on it shortly. We drop type superscripts, writing $\pi_i$ and $x$ instead of $\pi_i^{t_1, \ldots, t_k}$ and $x^t$, when no confusion arises. We denote by $e[e'/x]$ the result of substituting $e'$ for x in e.

A possible set $\Sigma$ of external functions $p : dom(p) \to codom(p)$ could be added to the language; in this case, we denote the language by NRA($\Sigma$).

FIG. 1. The definition of the nested relational algebra NRA.

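To illustrate with some simple examples, consider the following:⁵

\[
\begin{aligned}
\mathit{pairwith}(x, s) &\stackrel{\mathrm{def}}{=} ext(\lambda y.\, \{(x, y)\})(s) \\
\mathit{product}(s, s') &\stackrel{\mathrm{def}}{=} ext(\lambda x.\, \mathit{pairwith}(x, s'))(s) \\
\mathit{member}(x, s) &\stackrel{\mathrm{def}}{=} \mathit{not}(\mathit{empty?}(ext(\lambda y.\, \text{if } x = y \text{ then } \{()\} \text{ else } \emptyset)(s))) \\
\mathit{difference}(s, s') &\stackrel{\mathrm{def}}{=} ext(\lambda x.\, \text{if } \mathit{member}(x, s') \text{ then } \emptyset \text{ else } \{x\})(s)
\end{aligned}
\]

Here $\mathit{pairwith}(x, \{y_1, \ldots, y_n\}) = \{(x, y_1), \ldots, (x, y_n)\}$, and $\mathit{product}(s, s') = s \times s'$ computes the cartesian product of s and s'.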
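As a rough illustration of how ext drives these definitions, the following Python sketch models finite sets as frozensets and tuples as Python tuples. The helper names mirror the examples, but the encoding is an assumption of this sketch, and `difference` is written so that it returns the elements of s that are not in s'.

```python
def ext(f):
    """ext(f)({x1, ..., xn}) = f(x1) | ... | f(xn); each f(xi) is independent."""
    def apply(s):
        out = frozenset()
        for x in s:
            out = out | f(x)
        return out
    return apply


def is_empty(s):
    """Models the primitive empty?."""
    return len(s) == 0


def pairwith(x, s):
    return ext(lambda y: frozenset({(x, y)}))(s)


def product(s, t):
    return ext(lambda x: pairwith(x, t))(s)


def member(x, s):
    # {()} is the one-element set holding the empty tuple, used as a "found" flag
    return not is_empty(ext(lambda y: frozenset({()}) if x == y else frozenset())(s))


def difference(s, t):
    return ext(lambda x: frozenset() if member(x, t) else frozenset({x}))(s)
```

Since every call `f(x)` inside `ext` is independent of the others, this is the kind of one-step parallel operation that the language keeps as a primitive.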

In fact, it is easy to see that NRA can express all operations in the relational algebra. But, besides being able to deal with complex objects, NRA adds to the relational algebra the ability of dealing with scalar values. The ability to treat uniformly relations and scalar values is of importance to us, because, as we saw, the functions f and u in dcr(e, f, u) take scalar values and flat relations, respectively, as inputs. Therefore in defining queries we again depart slightly from the traditional approach, where a query f is a function taking relations as inputs and returning a relation as an output, i.e., it has type $f : \{t_1\} \times \cdots \times \{t_k\} \to \{t\}$. Instead we define a query in NRA to be any closed⁶ function expression $f : t_1 \to t_2$. This is only a slight generalization, in that it allows queries to also take scalar values as inputs and return both relations and/or scalar values as outputs.


⁴ Note that we never allow functions to take other functions as arguments. Thus, in the parlance of the theory of functional programming languages, our language is first-order; in that of logic, however, it is higher-order since it manipulates complex objects.

⁵ Here not e is an abbreviation for if e then false else true; true can be taken to be $\mathit{empty?}(\emptyset)$, while $\mathit{false} \stackrel{\mathrm{def}}{=} \mathit{empty?}(\{()\})$. Also, we prefer to define functions using the notation $\mathit{pairwith}(x, s) \stackrel{\mathrm{def}}{=} ext(\lambda y.\, \{(x, y)\})(s)$, instead of the official $\mathit{pairwith} \stackrel{\mathrm{def}}{=} \lambda z.\, ext(\lambda y.\, \{(\pi_1(z), y)\})(\pi_2(z))$.

⁶ I.e., with no free variables.


All queries in NRA are generic: see Appendix A for a review of the definition.

As we said earlier, the reader interested only in flat relations and not in complex objects may skip the part dealing with complex objects. Formally, we consider a certain fragment of NRA which has the same expressive power as relational algebra. First we define the set height of a type t as

\[
\begin{aligned}
height(D) \stackrel{\mathrm{def}}{=} height(B) \stackrel{\mathrm{def}}{=} height(\mathit{unit}) &\stackrel{\mathrm{def}}{=} 0 \\
height(t_1 \times \cdots \times t_k) &\stackrel{\mathrm{def}}{=} \max(height(t_1), \ldots, height(t_k)) \\
height(\{t\}) &\stackrel{\mathrm{def}}{=} 1 + height(t).
\end{aligned}
\]

Thus, height(t) = 0 iff t is a scalar type, and height(t) $\le$ 1 iff t is a product of scalar types and flat types. The fragment that interests us is $\mathrm{NRA}^1$, defined as the restriction of NRA to types of set height $\le 1$; i.e., the only types allowed in $\mathrm{NRA}^1$ as inputs, outputs, and intermediate types are products of base and flat types. Indeed:

Theorem 2.1 [BTBN91]. The set of $\mathrm{NRA}^1$ queries $f : t_1 \to t_2$ with $t_1$, $t_2$ flat types coincides with the set of queries expressible in the relational algebra and, hence, in first-order logic.

For all practical purposes $\mathrm{NRA}^1$ is equivalent to relational algebra. In the rest of the paper we will consider in parallel both $\mathrm{NRA}^1$ (i.e., flat relations) and NRA (i.e., complex objects).
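The set-height definition can be checked mechanically. The sketch below is a hypothetical encoding, not one used in the paper: base types are the strings "D" and "B", a product is the pair ("prod", components), and a set type is the pair ("set", element type).

```python
D, B = "D", "B"


def prod(*components):
    """Product type t1 x ... x tk; prod() with no arguments is unit."""
    return ("prod", components)


def set_of(t):
    """Set type {t}."""
    return ("set", t)


UNIT = prod()


def height(t):
    """Set height of a complex object type."""
    if t in (D, B):
        return 0
    tag, arg = t
    if tag == "prod":
        # unit is the empty product, so its height defaults to 0
        return max((height(c) for c in arg), default=0)
    return 1 + height(arg)          # tag == "set"


def allowed_in_nra1(t):
    """Types of set height at most 1, i.e. products of base and flat types."""
    return height(t) in (0, 1)
```

With this encoding the binary relation type `set_of(prod(D, D))` has height 1, while the nested relation type `set_of(prod(D, set_of(D)))` has height 2 and therefore falls outside the flat fragment.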

Finally we comment on the operation get which we included in NRA. It is obviously a generic function, in the sense of Definition A.1, and it is computable in NC, in the sense of Definition 5.2. Hence it must be expressible in our language for NC. In fact get turns out to be expressible with dcr (Example 3.3). But this is unsatisfactory for us, because it ruins our correspondence between the classes $\mathrm{AC}^k$ and the nesting depth of dcr. Worse, when we go to complex objects and replace dcr with a bounded dcr, get is no longer expressible (Proposition C.1). To simplify the exposition, we chose to include get in all our languages.

3. QUERY LANGUAGES: RECURSION ON SETS

In this section we discuss the properties of the divide and conquer construct, dcr, which is at the core of our query languages for NC, in the context of related operations.

We also discuss three additional forms of recursion on sets (two of which are structural recursions) and the relationship between them. While dcr corresponds to NC, another form, element-step-recursion, esr, corresponds to PTIME. When dealing with complex objects, instead of flat relations, we need to add an additional twist to these structural recursions to make them still capture NC and PTIME, respectively; we call the resulting forms of recursion bounded recursions.

We start by formally defining these forms of recursion, then establish the relationships between them and between the bounded and unbounded versions, and finally discuss the complexity of checking the ``side-conditions'' associated with recursions on sets.

3.1. Forms of Recursion on Sets

There seem to be two basic ways of describing the structure of finite sets. In one way, they are generated by finitely many (maybe zero!) binary unions of singleton sets. We call this the union presentation. In another way, they are generated by finitely many insertions of one element, starting with the empty set. We call this the insert presentation. Recognizing the relevant algebraic identities satisfied by union (associativity, commutativity, idempotence, has $\emptyset$ as an identity) and by element insertion (left-commutativity and left-idempotence) gives us two different algebraic structures on finite sets. Both these algebras are characterized by universality properties, which amount to definitions of functions by structural recursion [BTS91, BTBN91]. We have a structural recursion construct on the union presentation, sru,

\[
\frac{e : t_2 \qquad f : t_1 \to t_2 \qquad u : t_2 \times t_2 \to t_2}{sru(e, f, u) : \{t_1\} \to t_2}
\]

\[
\begin{aligned}
sru(e, f, u)(\emptyset) &\stackrel{\mathrm{def}}{=} e \\
sru(e, f, u)(\{y\}) &\stackrel{\mathrm{def}}{=} f(y) \\
sru(e, f, u)(s_1 \cup s_2) &\stackrel{\mathrm{def}}{=} u(sru(e, f, u)(s_1), sru(e, f, u)(s_2));
\end{aligned}
\]

sru(e, f, u) is well defined when there is some subset of $t_2$ containing e and the range of f, on which u is associative, commutative, idempotent, and has the identity e. We also have a structural recursion construct on the insert presentation, sri (we denote by $x \triangleleft s$ the element insertion operation, $\{x\} \cup s$),

\[
\frac{e : t_2 \qquad i : t_1 \times t_2 \to t_2}{sri(e, i) : \{t_1\} \to t_2}
\]

\[
\begin{aligned}
sri(e, i)(\emptyset) &\stackrel{\mathrm{def}}{=} e \\
sri(e, i)(y \triangleleft s) &\stackrel{\mathrm{def}}{=} i(y, sri(e, i)(s));
\end{aligned}
\]

sri(e, i) is well defined when there is some subset of $t_2$ containing e on which i is left-commutative,

\[
i(x, i(y, s)) = i(y, i(x, s)),
\]

and left-idempotent,

\[
i(x, i(x, s)) = i(x, s).
\]

We can now present our central operation, divide and conquer recursion, as a ``no duplicates'' variation of sru. As was already sketched in Section 1,

\[
\frac{e : t_2 \qquad f : t_1 \to t_2 \qquad u : t_2 \times t_2 \to t_2}{dcr(e, f, u) : \{t_1\} \to t_2}
\]

\[
\begin{aligned}
dcr(e, f, u)(\emptyset) &\stackrel{\mathrm{def}}{=} e \\
dcr(e, f, u)(\{y\}) &\stackrel{\mathrm{def}}{=} f(y) \\
dcr(e, f, u)(s_1 \cup s_2) &\stackrel{\mathrm{def}}{=} u(dcr(e, f, u)(s_1), dcr(e, f, u)(s_2)) \quad \text{when } s_1 \cap s_2 = \emptyset;
\end{aligned}
\]

dcr(e, f, u) is defined only when u is associative, commutative, and has e as identity on some subset of $t_2$ containing Im(f).

If sru(e, f, u) is well defined then so is dcr(e, f, u) and they are equal. But dcr is potentially more expressive, since u need not be idempotent. In fact, it is open whether $\mathrm{NRA}^1(sru)$, or even $\mathrm{NRA}^1(sru, \le)$, can express parity or transitive closure. However, over ordered databases sru, together with transitive closure, has the same expressive power as dcr (Proposition 7.4).

One can also define a no duplicates variant of sri; let us call it element-step recursion, esr. This is like sri, with the second clause modified as

\[
esr(e, i)(\{y\} \cup s) \stackrel{\mathrm{def}}{=} i(y, esr(e, i)(s)) \quad \text{when } y \notin s,
\]

where i is required to be left-commutative (but not necessarily left-idempotent). Obviously, esr can express sri.⁷ The nonimmediate relationships between the four forms of recursion on sets are contained in:

Proposition 3.1 [BTS91, BT92]. In the presence of the operations of NRA: sri can express sru and, similarly, esr can express dcr; moreover, sri can express esr. All this can be done without increase in set height and with at most polynomial overhead.

Schematically,

\[
sru \subseteq dcr \subseteq esr = sri.
\]

For the case of flat relations, the statement becomes:

Corollary 3.2. $\mathrm{NRA}^1(sru) \subseteq \mathrm{NRA}^1(dcr) \subseteq \mathrm{NRA}^1(esr) = \mathrm{NRA}^1(sri)$.
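One direction of Proposition 3.1, that esr can express dcr, can be pictured with a short Python sketch. The function names are hypothetical and the encoding is ours: the step function $i(y, a) = u(f(y), a)$ is left-commutative whenever u is associative and commutative, so folding it over the elements one at a time gives the same value as the divide and conquer evaluation, only sequentially.

```python
def esr(e, i):
    """Element-step recursion: consume the set one element at a time."""
    def rec(s):
        acc = e
        for y in s:              # a Python set yields each element exactly once
            acc = i(y, acc)      # the visiting order is irrelevant if i is left-commutative
        return acc
    return rec


def dcr_via_esr(e, f, u):
    """Sketch of expressing dcr(e, f, u) through esr, assuming u meets the dcr conditions."""
    return esr(e, lambda y, acc: u(f(y), acc))


# cardinality: the step is left-commutative but not left-idempotent,
# so this is an esr that is not an sri
card = esr(0, lambda y, n: n + 1)

# parity, obtained from its dcr parameters (false, constant true, xor)
parity = dcr_via_esr(False, lambda y: True, lambda a, b: a != b)
```

The loop performs one step per element, so the sequential depth is linear in the size of the set, in contrast with the logarithmic depth of the divide and conquer evaluation.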

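Example 3.3. We show here that get can be defined with sru. Indeed, $get \stackrel{\mathrm{def}}{=} \lambda(s, y).\, \pi_2(\varphi(s))$, where for each $y \in D$ we define $\varphi : \{D\} \to \{D\} \times D$ to be $\varphi \stackrel{\mathrm{def}}{=} sru(e, f, u)$, with

\[
\begin{aligned}
e &\stackrel{\mathrm{def}}{=} (\emptyset, y) \\
f(x) &\stackrel{\mathrm{def}}{=} (\{x\}, x) \\
u((s_1, x_1), (s_2, x_2)) &\stackrel{\mathrm{def}}{=}
\begin{cases}
(s_2, x_2) & \text{if } \mathit{empty?}(s_1) \\
(s_1, x_1) & \text{else if } \mathit{empty?}(s_2) \\
(s_1, x_1) & \text{else if } s_1 = s_2 \wedge card(s_1) = 1 \\
(s_1 \cup s_2, y) & \text{else.}
\end{cases}
\end{aligned}
\]

Note that the test $card(s_1) = 1$ is expressible in $\mathrm{NRA}^1$. Then u is associative, commutative, and idempotent on the subset $\{(\{x\}, x) \mid x \in D\} \cup \{(s, y) \mid card(s) \ne 1\}$ of $\{D\} \times D$ and this set also contains e and Im(f). Hence $\varphi$ is well defined.

As a consequence get can be expressed with any of the four forms of recursion in the presence of the other NRA operations.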
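The construction of Example 3.3 can be traced with a small Python sketch. Here `sru` is modeled as a plain fold, which agrees with the structural recursion only under the stated assumption that u is associative, commutative, idempotent, and has e as identity on the relevant subset; the function names are illustrative.

```python
def sru(e, f, u):
    """Fold-based model of sru(e, f, u); valid when u satisfies the sru conditions."""
    def phi(s):
        acc = e
        for x in s:
            acc = u(acc, f(x))
        return acc
    return phi


def get(s, y):
    """get(s, y) = x if s is the singleton {x}, and y otherwise."""
    e = (frozenset(), y)

    def f(x):
        return (frozenset({x}), x)

    def u(p1, p2):
        (s1, x1), (s2, x2) = p1, p2
        if not s1:                          # empty?(s1)
            return (s2, x2)
        if not s2:                          # empty?(s2)
            return (s1, x1)
        if s1 == s2 and len(s1) == 1:       # same singleton seen twice
            return (s1, x1)
        return (s1 | s2, y)                 # two or more elements: fall back to y

    return sru(e, f, u)(s)[1]               # second projection of phi(s)
```

The first component of each pair records the set seen so far, which is what allows u to detect that more than one element is present and to switch the second component to the default y.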

We have thus reached the language $\mathrm{NRA}^1(dcr)$, $k \ge 0$. Adding a linear order predicate to this language gives us the subject of one of our main theorems, the characterization of the NC-computable queries on ordered flat databases (Theorem 6.1).

In the same theorem, we obtain a finer characterization, involving the AC-hierarchy, for which we need the notion of the depth of recursion nesting, depth(e), of some expression e. We define this to be the maximum depth of recursions occurring in e. More precisely, $depth(dcr(e, f, u)) \stackrel{\mathrm{def}}{=} \max(depth(e), depth(f), 1 + depth(u))$ (only u is actually iterated), $depth((e_1, \ldots, e_k)) \stackrel{\mathrm{def}}{=} \max(depth(e_1), \ldots, depth(e_k))$, $depth(f(e)) \stackrel{\mathrm{def}}{=} \max(depth(f), depth(e))$, etc. We denote by $\mathrm{NRA}^1(dcr^{(k)})$ the restriction of the language $\mathrm{NRA}^1(dcr)$ to recursion depth $\le k$. We have thus obtained the hierarchy of languages $\mathrm{NRA}^1(dcr^{(k)})$. Adding a linear order predicate to these languages gives us the subject of the finer characterization given in Theorem 6.1.
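The depth measure can be computed by a straightforward traversal of the expression tree. The sketch below uses a hypothetical tuple encoding of expressions (a tag followed by subexpressions or plain names) and assumes, in line with the "etc." of the definition, that every construct other than dcr takes the maximum over its subexpressions.

```python
def depth(expr):
    """Recursion nesting depth of an expression encoded as (tag, *parts)."""
    tag, *parts = expr
    if tag == "dcr":
        e, f, u = parts
        # only u is actually iterated, so only u contributes the +1
        return max(depth(e), depth(f), 1 + depth(u))
    # assumption: all other constructs take the max over their subexpressions;
    # parts that are not tuples (variable names, constants) have depth 0
    children = [p for p in parts if isinstance(p, tuple)]
    return max((depth(c) for c in children), default=0)


# a dcr whose three parameters contain no recursion has depth 1
inner = ("dcr", ("var", "e"), ("var", "f"), ("var", "u"))

# a dcr used inside the f parameter does not increase the depth ...
in_f = ("dcr", ("var", "e"), ("lam", "x", ("app", inner, ("var", "x"))), ("var", "u"))

# ... while a dcr used inside the u parameter does
in_u = ("dcr", ("var", "e"), ("var", "f"), ("lam", "p", ("app", inner, ("var", "p"))))
```

Under this encoding `in_f` has depth 1 and `in_u` has depth 2, which matches the remark that only the combining function u is iterated.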

The reader may have noticed that some redundancy in expressibility will appear when we add dcr. Indeed, it turns out that ext(f) can be expressed with sru (and, hence, with dcr) as $sru(\emptyset, \lambda x.\, \{x\}, \cup)$. It is important, however, to keep ext(f) as a separate construct in the language because the expression derived through dcr would be computed in log n parallel steps, when in fact a direct one-step parallel computation is possible: obtain in parallel and independently $f(x_1), \ldots, f(x_n)$, and then take their union to compute $ext(f)(\{x_1, \ldots, x_n\})$.

⁷ sru and sri are easier to reason about than dcr or esr because they define functions that preserve the algebraic structure, i.e., homomorphisms, hence the ``structural'' in their names. A good way to think about dcr(e, f, u) is as the composition of the canonical coercion from sets to bags followed by the structural recursion on the sum presentation of bags [BTS91], with parameters e, f, u. Similarly, esr can be expressed via structural recursion on the increment presentation of bags.

3.2. Bounded Recursion

The four forms of recursion discussed so far do their job on flat relations. The reader interested in complex objects, however, may have noticed that each of them can express a powerset on complex objects and, hence, is too powerful for languages intended to capture complexity classes like NC or PTIME. All four forms of recursion can simulate each other, using powerset (at the cost of a high complexity); hence, NRA(sru) = NRA(dcr) = NRA(sri) = NRA(esr) and they have essentially the same expressive power as Abiteboul and Beeri's algebra [BNTW95]. To keep our languages for complex objects tractable, we will define bounded versions of these recursions, an analog to Peter Buneman's idea of bounded fixpoints [Suc97], and also related to Sazonov's boundedness for a language without constants [Saz93].

A PS-type (product of sets type) is either a set type, or a product of PS-types. E.g., $\{D \times \{D\}\} \times \{\mathit{unit}\}$ and $\{D\} \times (\{B \times B\} \times \{D\})$ are PS-types, while $D \times \{D\}$ is not. Set-theoretic operations like $\cup$, $\cap$, $-$ extend to PS-types component-wise.

The bounded version of dcr is defined by

\[
\frac{e : t_2 \qquad f : t_1 \to t_2 \qquad u : t_2 \times t_2 \to t_2 \qquad b : t_2}{bdcr(e, f, u, b) : \{t_1\} \to t_2}
\]

with the restriction that $t_2$ is a PS-type, and with the semantics:

\[
bdcr(e, f, u, b) \stackrel{\mathrm{def}}{=} dcr(e \cap b, f \cap b, u \cap b).
\]

Here $(u \cap b)(s_1, s_2) \stackrel{\mathrm{def}}{=} u(s_1, s_2) \cap b$. As for dcr, we define bounded versions for the other forms of recursion on sets, bsru, bsri, besr. Proposition 3.1 easily extends to the bounded versions of recursion:

Corollary 3.4. $\mathrm{NRA}(bsru) \subseteq \mathrm{NRA}(bdcr) \subseteq \mathrm{NRA}(bsri) = \mathrm{NRA}(besr)$.
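To see why bounding matters, the following Python sketch first writes powerset as an unbounded dcr over sets of sets and then applies the bounded form. The particular powerset parameters are one possible choice made for this illustration, not taken from the paper, and the sketch only covers the case where the result type is a single set type rather than a general PS-type.

```python
def dcr(e, f, u):
    """Divide and conquer recursion on finite sets (illustrative model)."""
    def phi(s):
        items = sorted(s, key=repr)
        if not items:
            return e
        if len(items) == 1:
            return f(items[0])
        mid = len(items) // 2
        return u(phi(items[:mid]), phi(items[mid:]))
    return phi


def bdcr(e, f, u, b):
    """bdcr(e, f, u, b) = dcr(e & b, f & b, u & b): intersect with b at every step."""
    return dcr(e & b,
               lambda y: f(y) & b,
               lambda s1, s2: u(s1, s2) & b)


# assumed powerset parameters: e = {{}}, f(x) = {{}, {x}}, u = pairwise unions
E = frozenset({frozenset()})


def F(x):
    return frozenset({frozenset(), frozenset({x})})


def U(p, q):
    return frozenset(a | c for a in p for c in q)


powerset = dcr(E, F, U)


def bounded_subsets(s, b):
    """The same recursion, but every intermediate result stays inside b."""
    return bdcr(E, F, U, b)(s)
```

Without the bound, the output has $2^n$ elements for an input of size n. With the bound, every intermediate and final result is a subset of b, so its size is limited by the size of b.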

Over flat relations the explicit bounding is unnecessary: we will prove in Appendix B that $\mathrm{NRA}^1(bdcr)$ and $\mathrm{NRA}^1(dcr)$ have the same expressive power:

Proposition 3.5. $\mathrm{NRA}^1(bdcr) = \mathrm{NRA}^1(dcr)$. Moreover, the equivalence preserves the nesting depth of iterations; i.e., $\forall k \ge 0$, $\mathrm{NRA}^1(bdcr^{(k)}) = \mathrm{NRA}^1(dcr^{(k)})$. Similarly for sri.

Note that, if we drop get from NRA, then this proposition fails, since $\mathrm{NRA}^1(dcr)$ can express get (Example 3.3), while $\mathrm{NRA}^1(bdcr)$ cannot (Proposition C.1). However, they still express the same set of queries $f : t \to t'$, where $t'$ is a PS-type. Also, Proposition 3.5 fails in the presence of certain external functions; see Proposition 6.3.

3.3. The Cost of Order-Independence

The algebraic conditions, besides the fact that they arise from principled mathematical characterizations of finite sets, provide us with an elegant alternative for ensuring the well-definedness of various forms of recursion on sets. Unfortunately, for a language at least as expressive as first-order logic, verifying in general most of these identities is as hard as testing the validity of a first-order formula in all finite models. We prove:

Theorem 3.6. Deciding whether some expression dcr(e, f, u) in $NRA^1(dcr)$ is well defined is $\Pi^0_1$-complete. Similar results hold for sri, as well as for their bounded versions.

As a consequence it is undecidable whether an arbitrary expression is a correct $NRA^1(dcr)$ expression. To prove this result we use Trakhtenbrot's theorem (see, for example, [Fag93]).

Theorem 3.7 (Trakhtenbrot). Assume that a first-order language L contains some relation symbol that is not unary. Then the set of first-order sentences over L valid in all finite structures is $\Pi^0_1$-complete.

Proof of Theorem 3.6. First we show that checking well-definedness is co-r.e. For some values of the free variables in e, f, u, let A denote the set obtained by closing $\{e\} \cup Im(f)$ under applications of u. The expression dcr(e, f, u) is well defined iff, for any values of the free variables in e, f, u, the function u is associative, commutative, and has identity e, on the set A. Since A is enumerable, testing whether dcr(e, f, u) is not well defined is obviously r.e. Now we show that well-definedness is co-r.e. complete ($\Pi^0_1$-complete), by reducing the validity problem over finite models to the decision whether some expression dcr(e, f, u) is well defined. Let L be a first-order language having at least some nonunary relation symbol. To keep our argument simple, suppose L has exactly one binary symbol R. Since first-order logic is equivalent to the relational algebra, and the latter is essentially equivalent to $NRA^1$, we conclude that for any sentence $\varphi$ in L there exists a query $g : \{D\} \times \{D \times D\} \to B$ in $NRA^1$ such that $\varphi$ is true in the finite model $(D, R)$, $R \subseteq D \times D$, iff $g(D, R) = true$. Then consider the following expression $dcr(e, f, u) : \{D\} \to \{D\}$ with free variables D, R:

$$e \stackrel{\text{def}}{=} \emptyset, \qquad f \stackrel{\text{def}}{=} \lambda x^D.\{x^D\}, \qquad u(s_1, s_2) \stackrel{\text{def}}{=} \text{if } g(D \cup \Pi_1(R) \cup \Pi_2(R), R) \text{ then } s_1 \cup s_2 \text{ else } s_1 - s_2.$$

When $\varphi$ is valid in all finite models, then $g(D \cup \Pi_1(R) \cup \Pi_2(R), R)$ is true for all values of the free variables D and R; hence the function u coincides with union. In this case dcr(e, f, u) is well defined and is actually the identity function. On the other hand, if $\varphi$ is not true in some finite model (D, R), then for that particular value of the free variables D and R the function u coincides with set difference, and hence, dcr(e, f, u) is not well defined. $\blacksquare$

4. COMPLEXITY CLASSES

We review here the definitions of the complexity classes $AC^k$ and NC. For some $W \in \{0, 1\}^*$ we denote by length(W) the length of the string W. Let $F : \{0, 1\}^* \to \{0, 1\}^*$.

Definition 4.1 [RB90, pp. 766]. We say that F is in $AC^k$, for $k \ge 0$, iff the following conditions are met:

1. There is some polynomial Q(n) s.t. $\forall W \in \{0, 1\}^*$, $length(F(W)) = Q(length(W))$. Thus, F is the union of its restrictions to inputs of length n, $F = \bigcup_{n \ge 0} F_n$, where $F_n : \{0, 1\}^n \to \{0, 1\}^{Q(n)}$.

2. There is a family of circuits $(\alpha_n)_{n \ge 0}$, where $\alpha_n$ is a circuit computing $F_n$, with n input gates, Q(n) outputs, and consisting of NOT gates, unbounded fan-in AND and OR gates.

3. For every $n \ge 0$, $size(\alpha_n) \le P(n)$ for some polynomial P (the size is the number of gates), and $depth(\alpha_n) = O(\log^k n)$.

4. The family $\alpha_n$ is "uniform," as described below.

Following Cook [Coo85, Proposition 4.7], we impose as uniformity condition the DLOGSPACE-DCL uniformity, reviewed below. Barrington, Immerman, and Straubing in [BIS90] give a weaker uniformity condition called FO-DCL-uniformity which is equivalent to the DLOGSPACE-DCL uniformity for the classes $AC^k$, $k \ge 1$, and which provides a more satisfactory characterization for $AC^0$. In this paper, only Proposition 6.5 deals with the class $AC^0$ and it remains true for the more restrictive FO-DCL-uniformity condition in [BIS90].

For completeness, we briefly review the definition of DLOGSPACE-DCL uniformity [Coo85]. Let $\alpha_n$, $n \ge 0$, be a family of circuits. The direct connection language DCL for this family is the set of quadruples $(n, g, g', t)$, where $g, g'$ are gate numbers in $\alpha_n$ (i.e., $1 \le g, g' \le P(n)$), such that the output of g is one of the inputs of $g'$ in $\alpha_n$, and t indicates what kind of gate $g'$ is in $\alpha_n$, i.e., $t \in \{NOT, AND, OR, y_1, \ldots, y_{Q(n)}\}$. When $t = y_i$, then $g'$ is the output bit i. We use the convention that the input gates corresponding to $x_1, \ldots, x_n$ are identified by assigning them the numbers 1, ..., n. We say that the family of circuits $\alpha_n$ is DLOGSPACE-DCL uniform iff the DCL can be accepted by some O(log n) space deterministic Turing machine T.

Definition 4.2. $NC \stackrel{\text{def}}{=} \bigcup_{k \ge 0} AC^k$.

The results in Stockmeyer and Vishkin [SV84] imply that NC coincides with the class of functions computable by a CRCW PRAM (concurrent read concurrent write parallel random access machine) in polylogarithmic time using polynomially many processors.

5. COMPUTING COMPLEX OBJECTS QUERIES

In order to give precise definitions for query complexity classes, we must specify an encoding of complex objects into strings that can be given as input to a computational model such as PRAMs or families of circuits.

Our encoding of complex objects with strings over some fixed alphabet is related to that in [GV91b]. We start with an encoding of the base type D into natural numbers; i.e., we assume some bijection $\Phi : D \to N$ to be given. When dealing with ordered databases, we require this encoding to preserve the order relation $\le$ on D. Next, we encode complex objects using the eight symbols from the alphabet $A = \{0, 1, \{, \}, (, ), comma, blank\}$, as follows: elements from D are encoded in binary, true and false are encoded by 1 and 0 respectively, a k-tuple is encoded by $(X_1, \ldots, X_k)$, and a set by $\{X_1, \ldots, X_n\}$. No duplicates are allowed in the encoding of a set. However, blanks may be scattered arbitrarily inside some encoding, but not inside the binary numbers. Since the encoding of some complex object x is not unique, we define an encoding relation $x \sim X$ to denote the fact that X is a valid encoding of x. We view encodings as strings in $\{0, 1\}^*$, by further encoding each of the eight symbols in A by a string of length 3 in $\{0, 1\}^*$.

Example 5.1. Let $x = \{\{a, c\}, \{c\}, \{d\}, \{\,\}\}$. Assuming that $a, c, d \in D$ are encoded (by $\Phi$) as 0, 2, and 3 respectively, then x can be encoded by the string

X = "{{0 comma 10} comma {10} comma {11} comma { }}"

but also by, say,

X' = "blank{blank{0 comma blank 10} comma {10} comma {11} comma blank{ }}"

When viewed as strings in $\{0, 1\}^*$, X has length $3 \times 21 = 63$, while X' has length $3 \times 25 = 75$.
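The symbol counts in Example 5.1 can be checked with a small Python sketch of the encoding. The representation choices are assumptions made for illustration only: sets are `frozenset` values, tuples are Python tuples, and the bijection $\Phi$ is given as a dictionary `phi`; the function names are hypothetical.

```python
def encode(x, phi):
    """Encode a complex object as a list of symbols from the alphabet A.

    Conventions assumed here: bool -> 1/0, tuple -> ( ... ),
    frozenset -> { ... }, anything else is an element of D looked up in phi.
    """
    if isinstance(x, bool):
        return ["1" if x else "0"]
    if isinstance(x, tuple):
        left, right, parts = "(", ")", [encode(c, phi) for c in x]
    elif isinstance(x, frozenset):
        # A frozenset has no duplicates; sorting only fixes one valid order.
        left, right, parts = "{", "}", sorted(encode(c, phi) for c in x)
    else:
        return list(format(phi[x], "b"))  # binary digits of phi(x)
    symbols = [left]
    for i, part in enumerate(parts):
        if i > 0:
            symbols.append("comma")
        symbols.extend(part)
    symbols.append(right)
    return symbols


def bit_length(symbols):
    """Each of the eight symbols of A costs three bits."""
    return 3 * len(symbols)


phi = {"a": 0, "c": 2, "d": 3}
x = frozenset({frozenset({"a", "c"}), frozenset({"c"}),
               frozenset({"d"}), frozenset()})
X = encode(x, phi)
X_blank = ["blank"] * 4 + X  # four extra blanks, placed outside any number
```

Any ordering of the set elements is a valid encoding, so the sketch only relies on the symbol count, which does not depend on that order.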

Removing duplicates in sets is essential in the presence of recursors or iterators; else the size of some representation could grow beyond any polynomial. Duplicates can be removed in $AC^0$, by replacing them with blanks, and blanks can be removed (more precisely, moved to the end) in $AC^1$. So, within $AC^0$ we have the alternative choice of encoding with possible duplicates and no blanks, because there are no iterations allowed within $AC^0$. Within $AC^k$, $k \ge 1$, we could ask both for blanks and duplicate elimination. Our choice of encoding without duplicates, but to allow blanks, works across all $AC^k$, for $k \ge 0$.

Note that this encoding is different from that considered by Immerman in [Imm89], who only deals with flat relations. Under that encoding, a relation of type $\{D^k\}$ is represented by a string of bits of length $n^k$, where n is the size of the active domain; see Appendix D. This encoding does not extend to complex objects, because it would require exponential space to encode objects of higher types, even when their size is only polynomial in the size of the active domain. For flat relations, we show in Appendix D that we can translate between the two encodings in $AC^0$; see Proposition D.1.

Adapting the definition in [CH80], we define a database query of type $t_1 \to t_2$ to be a function $f : t_1 \to t_2$ which is generic (see Definition A.1 for a review). We say that a query $f : t_1 \to t_2$ is computed by the function $F : \{0, 1\}^* \to \{0, 1\}^*$ iff $\forall x \in t_1$, $\forall X \in \{0, 1\}^*$, $x \sim X \Rightarrow f(x) \sim F(X)$.

The fact that we only deal with generic queries gives us more liberty in the encoding of complex objects, without using the bijection $\Phi$. Suppose the query f is computed by F. To compute f(x) for some input $x \in t_1$, we may proceed as follows. Let $d_0, d_1, \ldots, d_{m-1}$ be the active domain of x, i.e. all values of type D mentioned in x. In the case of ordered databases, assume $d_0 < d_1 < \cdots < d_{m-1}$. Obviously $\Phi(d_0), \ldots, \Phi(d_{m-1})$ can be any numbers, but we will choose to encode $d_0, \ldots, d_{m-1}$ with the numbers $0, 1, \ldots, m-1$; we call this the minimal encoding, to distinguish it from the standard encoding using $\Phi$. Let X be the resulting minimal encoding of x. Apply F on X to get Y = F(X), and finally decode Y under the minimal encoding, to get y. Now we prove that y is the correct result; i.e., y = f(x). Indeed, let $D' = \{d'_0, d'_1, \ldots, d'_{m-1}\}$, where $d'_0, \ldots, d'_{m-1}$ are such that $\Phi(d'_i) = i$, for $i = 0, \ldots, m-1$. Then the function $\varphi : D' \to D$, $\varphi(d'_i) \stackrel{\text{def}}{=} d_i$ for $i = 0, \ldots, m-1$, is injective and, in the case of ordered databases, also order-preserving. Let $\varphi_{t_1} : t'_1 \to t_1$ be its extension to $t_1$, where $t'_1$ is obtained from $t_1$ by replacing every occurrence of D with $D'$ (see Appendix A). Let $x' = \varphi_{t_1}^{-1}(x)$. Obviously X is the standard encoding of $x'$ and, since F computes f, we have that Y = F(X) is the standard encoding of $y' = f(x')$. A moment of thought will convince the reader that decoding Y under the minimal encoding yields $\varphi_{t_2}(y')$; i.e., $y = \varphi_{t_2}(y')$. Now we use the fact that f is generic to argue $\varphi_{t_2}(y') = \varphi_{t_2}(f(x')) = f(\varphi_{t_1}(x')) = f(x)$. Hence, when we decode Y according to the minimal encoding, we obtain f(x).

Definition 5.2. We say that a query f is in NC iff there is some function $F : \{0, 1\}^* \to \{0, 1\}^*$ computing f which is in NC. We denote by CQ-NC the class of queries which are in NC, and by Q-NC the class of queries over types of set height $\le 1$ which are in NC. Similarly, for some $k \ge 0$, we define queries in $AC^k$, and the classes CQ-$AC^k$ and Q-$AC^k$.

In short, Q-NC is the class of NC queries over flat relations and CQ-NC is that of NC queries over complex objects.

6. MAIN RESULTS

Our languages capture the classes Q-NC and CQ-NC only on ordered databases. The presence of order means for us the addition of an external function $\le : D \times D \to B$, which is understood always to denote a linear order on D. This immediately gives us an order relation at all types; that is, for every type t there is an expression in $NRA(\le)$, $\le_t : t \times t \to B$, whose meaning is a total order on t (e.g., see [LW94a]).

Theorem 6.1. $NRA^1(dcr, \le) = \text{Q-NC}$. More precisely $NRA^1(dcr(k), \le) = \text{Q-}AC^k$ for every $k \ge 1$.

Theorem 6.2. $NRA(bdcr, \le) = \text{CQ-NC}$. More precisely $NRA(bdcr(k), \le) = \text{CQ-}AC^k$ for every $k \ge 1$.

These languages are purely for flat relations, respectively complex objects. But many external functions of practical interest such as the usual arithmetical operations ($+$, $\cdot$, $-$, $/$, etc.), and the usual aggregate functions (cardinality, sum, average, etc.) are also in NC. Can they be added in? The answer is yes for bdcr, but no for dcr.

Proposition 6.3. Let $\Sigma$ be an extension consisting of possible additional base types and a set of functions computable in NC. Then $NRA(\Sigma, bdcr) \subseteq NC$. However, $NRA^1(N, +, dcr)$ can express exponential space queries.

Of course, in general $NRA(\Sigma, bdcr) \not\subseteq \text{CQ-NC}$, because the external functions in $\Sigma$ may not necessarily be generic. We prove the inclusion $NRA(\Sigma, bdcr) \subseteq NC$ in Section 9. The following example shows that $NRA^1(N, +, dcr)$ can express exponential space queries.

Example 6.4. Consider the subset of $\{N\}$ containing all sets of the form $\{0, 1, 2, \ldots, k\}$; we view such a set as the encoding of the integer k. Then we can define the functions plus, times, exp: $\{N\} \times \{N\} \to \{N\}$ in $NRA^1(N, +, dcr)$, with the meaning: $x = \{0, 1, 2, \ldots, m\}$, $y = \{0, 1, 2, \ldots, n\}$ $\Rightarrow$ $plus(x, y) = \{0, 1, 2, \ldots, m+n\}$, $times(x, y) = \{0, 1, 2, \ldots, m \cdot n\}$, $exp(x, y) = \{0, 1, 2, \ldots, n^m\}$. Namely we define plus directly, while for times and exp we hint at their definition with dcr:

$$plus(x, y) \stackrel{\text{def}}{=} ext(\lambda(u, v).\{u+v\})(x \times y)$$

$$times(x \cup x', y) \stackrel{\text{def}}{=} plus(times(x, y), times(x', y)) \quad \text{when } x \cap x' = \emptyset$$

$$exp(x \cup x', y) \stackrel{\text{def}}{=} times(exp(x, y), exp(x', y)) \quad \text{when } x \cap x' = \emptyset.$$
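Example 6.4 only hints at how times and exp are obtained from dcr. The following Python sketch fills in one possible choice of the parameters e and f; this choice is an assumption made for illustration and is not taken from the text. For times, e encodes 0 and every nonzero element of x contributes one copy of y; for exp, e encodes 1 and every nonzero element of x contributes one factor y. The helper `num` is a hypothetical name.

```python
def dcr(e, f, u):
    """Divide and conquer recursion; u is assumed associative and
    commutative with identity e on the values that actually occur."""
    def phi(items):
        items = list(items)
        if not items:
            return e
        if len(items) == 1:
            return f(items[0])
        mid = len(items) // 2
        return u(phi(items[:mid]), phi(items[mid:]))
    return phi


def num(k):
    """The set {0, 1, ..., k}, viewed as the encoding of the integer k."""
    return frozenset(range(k + 1))


def plus(x, y):
    # ext(lambda (u, v). {u + v}) applied to the cartesian product x * y
    return frozenset(u + v for u in x for v in y)


def times(x, y):
    # Assumed parameters: e = num(0), f(z) = num(0) if z == 0 else y
    return dcr(num(0), lambda z: num(0) if z == 0 else y, plus)(x)


def exp(x, y):
    # Assumed parameters: e = num(1), f(z) = num(1) if z == 0 else y
    return dcr(num(1), lambda z: num(1) if z == 0 else y, times)(x)
```

Since the output of `exp(num(m), num(n))` has $n^m + 1$ elements, its size is exponential in the size of the inputs, which is the point of the example.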

In particular, Example 6.4 shows that dcr is strictly more powerful than bdcr over flat types, in the presence of certain external functions, because when all external functions are in NC (and this is the case of + above), then bdcr can express only NC queries, while exp is obviously not in NC. This shows that, in general, Proposition 3.5 fails in the presence of external functions.

Immerman in [Imm89] and Barrington, Immerman, and Straubing in [BIS90] prove that FO is included in FO-DCL-uniform $AC^0$, and that FO, together with order and the BIT relation, has the same expressive power as $AC^0$. Here, we prove that NRA is included in $AC^0$, thus extending half of their result to complex objects.

Proposition 6.5. Under the encoding of complex objects described in Section 5, all queries in $NRA(\le)$ are in FO-DCL-uniform $AC^0$ (see [BIS90]).

We state two more results which help us put the main theorems in perspective. Their proofs are omitted from this paper.

Conservative extension. One may wonder in what sense Theorem 6.1 is a "particular case" of Theorem 6.2. Actually, even though the proof of Theorem 6.1 is quite similar to that of Theorem 6.2, and we do present them "together" in Sections 9 and 10, Theorem 6.1 in fact follows from Theorem 6.2, Proposition 3.5 and the conservative extension result presented below.

Paredaens and Van Gucht in [PG92], and Wong in [Won93] prove that NRA is a conservative extension of $NRA^1$. Suciu in [Suc97] proves that NRA(bfix) is a conservative extension of $NRA^1(fix)$, where fix is the usual inflationary fixpoint, and bfix is a bounded version of fix. Using the techniques in [Suc97], we can prove the following.

Proposition 6.6. Let $\Sigma$ be a set of external functions which have set heights $\le 1$. Then, $NRA(\Sigma, bdcr, \le)$ is a conservative extension of $NRA^1(\Sigma, bdcr, \le)$.

Note that for the case when $\Sigma = \emptyset$, we can turn the tables and Proposition 6.6 follows directly from the main theorems. For the case when $\Sigma \ne \emptyset$, this proposition requires a separate proof, and we are able to do it only in the presence of order. However, we conjecture that NRA(bdcr) is a conservative extension of $NRA^1(dcr)$.

PTIME vs NC. Immerman, Patnaik, and Stemple [IPS91] show that PTIME is captured by a language built around set-reduce (see Section 3). Extending their result also to complex objects we have:

Proposition 6.7. $NRA^1(sri(1), \le) = \text{Q-PTIME}$ [IPS91] and $NRA(bsri(1), \le) = \text{CQ-PTIME}$.

Here Q-PTIME and CQ-PTIME are the sets of queries over flat relations and over complex objects, respectively, computable by some function in PTIME.

Thus, by the main theorems and this proposition, the difference between PTIME and NC computable queries over ordered databases can be characterized by the difference between two kinds of recursion on sets. It is interesting to note that only one level of recursion nesting suffices for sri and PTIME, as opposed to dcr and NC. For the latter, the collapse of the dcr hierarchy is equivalent to the collapse of the $AC^k$, $k \ge 1$, hierarchy and, hence, is still open.

The rest of the paper is devoted to the proof of the main results, Theorems 6.1 and 6.2. Basically, we do this in three steps:

1. We show that, over ordered databases, dcr is equivalent to another form of recursion, called set-divide, whose definition depends essentially on the order relation. Moreover, the equivalence preserves the nesting depth.

2. Next we show that set-divide is equivalent, on ordered databases, to a form of iteration, which we call log-loop. Again, the nesting depth is preserved under the equivalence.

3. Finally we show that $NRA^1(\text{log-loop}, \le) = \text{Q-NC}$.

The new form of recursion set-divide is of interest in itself as an order-dependent variant of dcr. The other forms of recursion, bdcr, sri, and bsri have order-dependent cousins themselves, which we discuss too. Next, the log-loop construct is also of interest, because its definition is order independent (unlike set-divide), it requires no side-conditions (as dcr does) and, hence, forms the core of a decidable query language which, over ordered databases, expresses exactly Q-NC. Of course, there is a bounded version, too, which does the job for complex objects.

7. RECURSION AND ORDER

One way of interpreting the roles of conditions like associativity, commutativity, etc., in the definition of dcr and sri, is as simple sufficient conditions for order independence. As we saw in Subsection 3.3, checking these conditions is undecidable in general. On ordered structures on the other hand, one can define forms of recursion on sets that do not require conditions. For instance, Immerman, Patnaik, and Stemple [IPS91] consider under the name set-reduce a form of recursion on sets which resembles somewhat sri. Set-reduce does not require conditions such as left-commutativity, etc. Instead, its definition relies on the existence of a linear ordering on the elements of the sets to which it is applied. We prove that, in the presence of order, this form of recursion has the same expressive power as sri. We also formulate a similar order-dependent form of recursion that corresponds to dcr in the presence of order. Finally, an interesting relationship holds between sru and dcr in the presence of order.



Set-reduce [IPS91]. Let $< : t \times t \to B$ be a linear ordering on t, and for $e : t'$, $j : t \times t' \to t'$, let $\text{set-reduce}(e, j) : \{t\} \to t'$ be defined by

$$\text{set-reduce}(e, j)(\emptyset) \stackrel{\text{def}}{=} e$$

$$\text{set-reduce}(e, j)(\{x_1, \ldots, x_n\}) \stackrel{\text{def}}{=} j(x_1, \text{set-reduce}(e, j)(\{x_2, \ldots, x_n\}))$$

when $x_1 < x_2 < \cdots < x_n$. No conditions are imposed on j.

Set-divide. Similarly, one can conceive a form of divide and conquer recursion that relies on the ordering, which allows us to define some function by $\varphi(\{x_1, \ldots, x_n\}) \stackrel{\text{def}}{=} u(\varphi(\{x_1, \ldots, x_{\lfloor n/2 \rfloor}\}), \varphi(\{x_{\lfloor n/2 \rfloor + 1}, \ldots, x_n\}))$ (no conditions are imposed on u). That is, we make the arbitrary choice of dividing a set into two almost equal halves at each iteration step.$^8$ Again, we can prove that this form of recursion has the same expressive power as dcr.

Set-divide is defined formally as follows: Let $< : t \times t \to B$ be a linear ordering on t and for $e : t'$, $f : t \to t'$, $v : t' \times t' \to t'$ define $\text{set-divide}(e, f, v) : \{t\} \to t'$ by

$$\text{set-divide}(e, f, v)(\emptyset) \stackrel{\text{def}}{=} e$$

$$\text{set-divide}(e, f, v)(\{a\}) \stackrel{\text{def}}{=} f(a)$$

$$\text{set-divide}(e, f, v)(\{x_1, \ldots, x_n\}) \stackrel{\text{def}}{=} v(\text{set-divide}(e, f, v)(\{x_1, \ldots, x_{\lfloor n/2 \rfloor}\}),\ \text{set-divide}(e, f, v)(\{x_{\lfloor n/2 \rfloor + 1}, \ldots, x_n\}))$$

when $x_1 < x_2 < \cdots < x_n$. No conditions are imposed on v. Then:

Proposition 7.1. Over ordered flat relations, set-divide is equivalent to dcr and set-reduce is equivalent to sri,$^9$

$$NRA^1(\text{set-divide}, \le) = NRA^1(dcr, \le)$$

$$NRA^1(\text{set-reduce}, \le) = NRA^1(sri, \le).$$

Similarly, over complex objects we have

$$NRA(\text{bset-divide}, \le) = NRA(bdcr, \le)$$

$$NRA(\text{bset-reduce}, \le) = NRA(bsri, \le).$$

Moreover, these equivalences preserve the iteration depth, e.g. $NRA^1(\text{set-divide}(k), \le) = NRA^1(dcr(k), \le)$ for all $k \ge 0$, etc.
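The two order-dependent recursions defined before Proposition 7.1 can be sketched in Python as follows. The sketch assumes that the elements are comparable with Python's built-in `<`, which plays the role of the linear ordering; the function names are illustrative.

```python
def set_reduce(e, j):
    """set-reduce(e, j): fold j over the elements in increasing order,
    i.e. j(x1, j(x2, ... j(xn, e))) for x1 < x2 < ... < xn."""
    def go(s):
        acc = e
        for x in sorted(s, reverse=True):  # innermost call sees xn first
            acc = j(x, acc)
        return acc
    return go


def set_divide(e, f, v):
    """set-divide(e, f, v): split the sorted elements after the first
    floor(n/2) of them and combine the two results with v."""
    def go(xs):
        if not xs:
            return e
        if len(xs) == 1:
            return f(xs[0])
        mid = len(xs) // 2
        return v(go(xs[:mid]), go(xs[mid:]))
    return lambda s: go(sorted(s))


# Neither j nor v needs to satisfy any algebraic condition:
show_reduce = set_reduce("", lambda x, acc: str(x) + acc)
show_divide = set_divide("", str, lambda a, b: "(" + a + b + ")")
```

For instance, `show_divide({1, 2, 3, 4, 5})` makes the shape of the recursion visible as a bracketed string, and the non-associative `v` used there would not be admissible as the third parameter of dcr.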

Main technique. Proving one inclusion is easy: $NRA^1(dcr, \le) \subseteq NRA^1(\text{set-divide}, \le)$, because whenever dcr(e, f, u) is a correct expression, it is equivalent to set-divide(e, f, u). The other direction is more involved: given a set-divide(e, f, v) expression, in general v is not associative and does not have identity e; hence the expression dcr(e, f, v) is not well formed: we need a more involved dcr expression. Let $\psi = \text{set-divide}(e, f, v)$, $\psi : \{t\} \to t'$. The idea is to compute $\psi(x)$, where $x = \{x_1, \ldots, x_n\}$, $x_1 < x_2 < \cdots < x_n$, by computing $\psi$ iteratively on larger and larger subsets of x and "memorizing" all results. For that, we define, for every number k, $0 \le k \le n$, a complex object $\tilde{k}$ which "encodes" all applications of $\psi$ to "intervals" of length $\le k$ in x, i.e., to sets $s \subseteq \{x_1, \ldots, x_n\}$ of the form $s = \{x_p, x_{p+1}, \ldots, x_q\}$, of cardinality $\le k$ (i.e., $q - p + 1 \le k$). There is a single interval of length 0, namely $\emptyset$, there are n intervals of length 1 ($\{x_1\}, \ldots, \{x_n\}$), $n - 1$ intervals of length 2 ($\{x_1, x_2\}, \{x_2, x_3\}, \ldots, \{x_{n-1}, x_n\}$), etc. Thus the complex object, say, $\tilde{2}$, will contain enough information to allow us to extract from it all values $\psi(\emptyset)$, $\psi(\{x_1\}), \ldots, \psi(\{x_n\})$, $\psi(\{x_1, x_2\}), \ldots, \psi(\{x_{n-1}, x_n\})$. In general $\tilde{k}$ encodes the result of applying $\psi$ on $1 + n + (n-1) + \cdots + (n-k+1) = 1 + k(2n-k+1)/2$ intervals. So, in order to compute $\psi(x)$, we compute $\tilde{k}$ for larger and larger k's, until we get $\tilde{n}$, from which we extract $\psi(x)$. We need to be a bit careful, however, because to show, say, $NRA^1(\text{set-divide}, \le) \subseteq NRA^1(dcr, \le)$, we do not want $\tilde{k}$ to have a larger set height than $\psi$. Typically $t'$ is a PS-type and, for the sake of clarity, we will assume that $t'$ is a product of two set types; that is, $t' = \{t'_1\} \times \{t'_2\}$. Then for every number k, $0 \le k \le n$, we define the encoding of k w.r.t. $\psi$ to be $\tilde{k} \in \{t \times t\} \times \{(t \times t) \times t'_1\} \times \{(t \times t) \times t'_2\}$:

$$\tilde{k} \stackrel{\text{def}}{=} \big(\{(x_p, x_q) \mid 1 \le p, q \le n,\ 0 \le q-p+1 \le k\},$$
$$\{((x_p, x_q), a_1) \mid 1 \le p, q \le n,\ 0 \le q-p+1 \le k,\ a_1 \in \pi_1(\psi(\{x_p, x_{p+1}, \ldots, x_q\}))\},$$
$$\{((x_p, x_q), a_2) \mid 1 \le p, q \le n,\ 0 \le q-p+1 \le k,\ a_2 \in \pi_2(\psi(\{x_p, x_{p+1}, \ldots, x_q\}))\}\big).$$

We adopt the convention $\tilde{k} \stackrel{\text{def}}{=} \tilde{n}$, whenever $k > n$.

Thus $\tilde{k}$ contains the following information: (1) the set of all pairs $(x_p, x_q)$ for which $\{x_p, x_{p+1}, \ldots, x_q\}$ has between 0 and k elements, and (2) for each such pair $(x_p, x_q)$, all values of $\psi(\{x_p, x_{p+1}, \ldots, x_q\})$, tagged with the pair $(x_p, x_q)$. Note that the set height of $\tilde{k}$ is no larger than the maximum of the set heights of $\{t\}$ or $t'$. In particular, when $\psi$ takes as input a flat relation and returns two flat relations, then $\tilde{k}$ consists of three flat relations.

Given an encoding $\tilde{k}$ and $x_p, x_q \in x$, we can extract $\psi(\{x_p, \ldots, x_q\})$. Namely we define

$$extract(\tilde{k}, (x_p, x_q)) \stackrel{\text{def}}{=} \big(\{a_1 \mid ((x_p, x_q), a_1) \in \pi_2(\tilde{k})\},\ \{a_2 \mid ((x_p, x_q), a_2) \in \pi_3(\tilde{k})\}\big).$$


8 Another ad-hoc way of defining recursion, related to set-divide, but on vectors instead of sets, can be found in the equational parallel language EL* [RW93].

9 The equivalence holds for complex objects too, but we are only interested in bounded iterations over complex objects here.


Obviously extract can be expressed in NRA (or in $NRA^1$, when the type $t'$ has set height $\le 1$), and $extract(\tilde{k}, (x_p, x_q)) = \psi(\{x_p, x_{p+1}, \ldots, x_q\})$ for $0 \le q-p+1 \le k$. We will use this encoding to compute $\psi$ with dcr, or with esr (and, hence, with sri). To compute $\psi$ with dcr, assume that the following are expressible in NRA: $\tilde{0}$, $\tilde{1}$, and the function $\lambda(\tilde{k}, \tilde{k}').\widetilde{k+k'}$. Then we can express the function $\gamma : \{t\} \to \{t \times t\} \times \{(t \times t) \times t'_1\} \times \{(t \times t) \times t'_2\}$ defined by

$$\gamma(s) = \begin{cases} \tilde{k} & \text{when } card(s) = k \le n \\ \tilde{n} & \text{when } card(s) > n \end{cases}$$

as $\gamma = dcr(\tilde{0}, \lambda y.\tilde{1}, \lambda(\tilde{k}, \tilde{k}').\widetilde{k+k'})$. Obviously the function $\lambda(\tilde{k}, \tilde{k}').\widetilde{k+k'}$ is associative, commutative, and has identity $\tilde{0}$, and the dcr-expression is well defined. Finally, to compute $\psi(x)$, we take

$$\psi(x) = extract(\gamma(x), (x_1, x_n)).$$
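The memoization idea of this paragraph can be sketched in Python. The sketch makes several simplifying assumptions that are not in the text: the encoding of k is modelled as a pair `(k, table)` where `table` maps index pairs `(p, q)` to the value of the function on the interval (the key `None` stands for the empty interval), results are arbitrary Python values rather than pairs of sets, and a table for a larger k is obtained by extending the larger of the two given tables. All names are hypothetical.

```python
def set_divide(e, f, v):
    """Reference implementation, used only to compare results."""
    def go(xs):
        if not xs:
            return e
        if len(xs) == 1:
            return f(xs[0])
        mid = len(xs) // 2
        return v(go(xs[:mid]), go(xs[mid:]))
    return lambda s: go(sorted(s))


def psi_via_dcr(e, f, v, s):
    """Compute set-divide(e, f, v)(s) through a dcr-style recursion whose
    combining function is associative and commutative by construction."""
    xs = sorted(s)
    n = len(xs)
    zero = (0, {None: e})
    one = (1, {None: e, **{(p, p): f(xs[p]) for p in range(n)}})

    def combine(a, b):
        """From the tables for k and k', build the table for min(k+k', n)."""
        big = max(a, b, key=lambda t: t[0])
        k = min(a[0] + b[0], n)
        table = dict(big[1])
        for length in range(big[0] + 1, k + 1):
            for p in range(n - length + 1):
                q = p + length - 1
                r = p + length // 2  # first half has floor(length/2) elements
                table[(p, q)] = v(table[(p, r - 1)], table[(r, q)])
        return (k, table)

    def gamma(items):  # plays the role of dcr(zero, lambda y: one, combine)
        if not items:
            return zero
        if len(items) == 1:
            return one
        mid = len(items) // 2
        return combine(gamma(items[:mid]), gamma(items[mid:]))

    _, table = gamma(xs)
    return table[(0, n - 1)] if n else table[None]  # the extract step
```

The way `gamma` splits its argument does not matter here, because `combine` depends only on the two counts k and k'; that is what makes the side conditions of dcr hold even though `v` itself is arbitrary.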

Proof of Proposition 7.1. As argued earlier, the inclusions $NRA^1(dcr, \le) \subseteq NRA^1(\text{set-divide}, \le)$, $NRA^1(sri, \le) \subseteq NRA^1(\text{set-reduce}, \le)$, $NRA(bdcr, \le) \subseteq NRA(\text{bset-divide}, \le)$, and $NRA(bsri, \le) \subseteq NRA(\text{bset-reduce}, \le)$ follow immediately, so we show only the other four inclusions. We start by showing $NRA(\text{bset-reduce}, \le) \subseteq NRA(bsri, \le)$. Let $\psi = \text{bset-reduce}(e, j, b) : \{t\} \to t'$. In order to express $\psi(x)$ with sri, where $x = \{x_1, \ldots, x_n\}$, with $x_1 < x_2 < \cdots < x_n$, we will use the encoding described earlier. Since $t'$ is a PS-type, we will assume, for the sake of clarity, that it is the product of two set types, $t' = \{t'_1\} \times \{t'_2\}$. Then we associate to each number k its encoding $\tilde{k} \in \{t \times t\} \times \{(t \times t) \times t'_1\} \times \{(t \times t) \times t'_2\}$, as before. It is easy to check that, given the fact that $\psi = \text{bset-reduce}(e, j, b)$, then $\tilde{0}$ and the function $\lambda\tilde{k}.\widetilde{k+1}$ are expressible in $NRA(bsri, \le)$. For the latter, suppose we are given some $\tilde{k}$. To compute $\widetilde{k+1}$, consider all pairs $(x_p, x_q) \in \pi_1(\tilde{k})$ with $p > 1$. Compute $x_{p-1}$ (using the order relation on $x = \{x_1, \ldots, x_n\}$), and compute $\psi(\{x_p, \ldots, x_q\}) = extract(\tilde{k}, (x_p, x_q))$. To compute $\psi(\{x_{p-1}, x_p, \ldots, x_q\})$, observe that $\psi(\{x_{p-1}, x_p, \ldots, x_q\}) = j(x_{p-1}, \psi(\{x_p, \ldots, x_q\}))$. Finally this allows us to assemble together the value $\widetilde{k+1}$.

Then we define in $NRA(bsri, \le)$ the function $\gamma : \{t\} \to \{t \times t\} \times \{(t \times t) \times t'_1\} \times \{(t \times t) \times t'_2\}$, $\gamma = esr(\tilde{0}, \lambda(y, \tilde{k}).\widetilde{k+1})$, whose meaning is

$$\gamma(s) \stackrel{\text{def}}{=} \begin{cases} \tilde{k} & \text{when } card(s) = k \le n \\ \tilde{n} & \text{when } card(s) > n. \end{cases}$$

Next we convert the definition of $\gamma$ from esr into besr, by noting that $\forall k$, $0 \le k \le n$, $\tilde{k} \subseteq b'$, where $b' = (x \times x) \times ((x \times x) \times \Pi_1(b)) \times ((x \times x) \times \Pi_2(b))$. Hence $\gamma = besr(\tilde{0}, \lambda(y, \tilde{k}).\widetilde{k+1}, b')$. Finally we convert it into a definition with bsri, using Proposition 3.1, and observe that $\psi(x) = extract(\gamma(x), (x_1, x_n))$.

To check $NRA^1(\text{set-reduce}, \le) \subseteq NRA^1(sri, \le)$, we argue as follows. First, by extending Proposition 3.5 to set-reduce, we obtain $NRA^1(\text{set-reduce}, \le) = NRA^1(\text{bset-reduce}, \le)$. Next, we observe that in the translation given above of bset-reduce: $\{t\} \to t'$ into bsri, the resulting expression uses only types of set height $\le \max(height(\{t\}), height(t'), 1)$, which proves $NRA^1(\text{bset-reduce}, \le) \subseteq NRA^1(bsri, \le)$. Finally we use Proposition 3.5 to argue that $NRA^1(bsri, \le) = NRA^1(sri, \le)$.

We prove that $NRA(\text{bset-divide}, \le) \subseteq NRA(bdcr, \le)$ using the same idea, with minor additional complications. Here, given that $\psi = \text{bset-divide}(e, f, v, b)$, the following are expressible in $NRA(bdcr, \le)$: $\tilde{0}$, $\tilde{1}$, and the function $\lambda(\tilde{k}, \tilde{k}').\widetilde{k+k'}$. We will explain how to express the latter. Recall that $\tilde{k}$ encodes all values of $\psi(s)$, for s an interval of $x_1 < x_2 < \cdots < x_n$, of cardinality $\le k$. To compute $\widetilde{k+k'}$, we need to apply $\psi$ to all intervals $s'' \subseteq \{x_1, \ldots, x_n\}$ of cardinality $k'' \le k+k'$ (there are $n-(k+k')+1$ such intervals). To do this, split $s'' = \{x_p, x_{p+1}, \ldots, x_q\}$ into two halves of cardinalities $\lfloor k''/2 \rfloor$ and $\lceil k''/2 \rceil$, respectively, $s'' = s_1 \cup s_2$, with $s_1 = \{x_p, x_{p+1}, \ldots, x_r\}$, $s_2 = \{x_{r+1}, x_{r+2}, \ldots, x_q\}$. This can be done using Lemma 7.2 below, given that bdcr can express transitive closure. We will not compute transitive closure for every pair $(x_p, x_q)$, however, because this would increase the nesting depth of the bdcr expression (it would be 1 higher than the original bset-divide expression): instead, we precompute a relation containing all triples $(x_p, x_q, x_r)$, $0 \le q-p+1$, $r = \lfloor (p+q)/2 \rfloor$, using a single level of bdcr, then look up this table every time we need to split some set $s''$. Now we argue that both $s_1$ and $s_2$ have cardinalities which are less than or equal to the largest of k, $k'$: if not, then $card(s_1) \ge k$ (because $card(s_1) \ge card(s_2) - 1$), $card(s_2) > k'$; hence $card(s'') = card(s_1) + card(s_2) > k + k'$, a contradiction. Therefore, assuming $k \ge k'$, it suffices to "look up" the values $\psi(s_1)$ and $\psi(s_2)$ in $\tilde{k}$ and to compute $\psi(s'') = v(\psi(s_1), \psi(s_2))$.

Next, as in the first part of the proof, we will consider the function $\gamma \stackrel{\text{def}}{=} dcr(\tilde{0}, \lambda y.\tilde{1}, \lambda(\tilde{k}, \tilde{k}').\widetilde{k+k'})$, with the meaning:

$$\gamma(s) = \begin{cases} \tilde{k} & \text{when } card(s) = k \le n \\ \tilde{n} & \text{when } card(s) > n. \end{cases}$$

The definition of $\gamma$ is obviously correct, since $\lambda(\tilde{k}, \tilde{k}').\widetilde{k+k'}$ is associative, commutative, and has identity $\tilde{0}$. As in the first part of the proof, we observe that $\gamma(x) = bdcr(\tilde{0}, \lambda y.\tilde{1}, \lambda(\tilde{k}, \tilde{k}').\widetilde{k+k'}, b')(x)$, where $b' = (x \times x) \times (((x \times x) \times \Pi_1(b)) \times ((x \times x) \times \Pi_2(b)))$. Finally $\psi(x) = extract(\gamma(x), (x_1, x_n))$.



To prove $NRA^1(\text{set-divide}, \le) \subseteq NRA^1(dcr, \le)$, we proceed as for set-reduce above. Namely we first extend Proposition 3.5 to $NRA^1(\text{set-divide}, \le) = NRA^1(\text{bset-divide}, \le)$, then observe that the above construction also proves that $NRA^1(\text{bset-divide}, \le) \subseteq NRA^1(bdcr, \le)$, since the auxiliary types never exceed in set height the heights of the input and output types. Finally we use Proposition 3.5 to argue that $NRA^1(bdcr, \le) = NRA^1(dcr, \le)$. $\blacksquare$

In the proof above we needed the following lemma.

Lemma 7.2 [Imm87b]. Let $tc_t : \{t \times t\} \to \{t \times t\}$ be the function computing the transitive closure of binary relations of elements of type t, and let NRA(tc) stand for NRA extended with $tc_t$ for every type t. Then, the function eq-cardinality: $\{t\} \times \{t\} \to B$, defined by eq-cardinality(x, y) = true iff card(x) = card(y), is expressible in $NRA(tc, \le)$.

Proof. Let $x = \{x_1, \ldots, x_m\}$, $y = \{y_1, \ldots, y_n\}$, and assume $x_1 < x_2 < \cdots < x_m$, $y_1 < y_2 < \cdots < y_n$. Let $r \in \{(t \times t) \times (t \times t)\}$ be $r = \{((x_i, y_j), (x_{i+1}, y_{j+1})) \mid i = 1, \ldots, m-1;\ j = 1, \ldots, n-1\}$. Obviously r can be computed in $NRA(\le)$ from x and y. Compute the transitive closure of r, $q \stackrel{\text{def}}{=} tc_{t \times t}(r)$. Then $m = n$ iff $((x_1, y_1), (x_m, y_n)) \in q$. $\blacksquare$
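The construction in the proof of Lemma 7.2 can be sketched in Python. Two points are assumptions of the sketch rather than part of the text: transitive closure is computed by a naive fixpoint loop, and sets with fewer than two elements are compared directly, because the (non-reflexive) closure of an empty relation contains no pair at all.

```python
def transitive_closure(r):
    """Naive transitive closure of a binary relation given as a set of pairs."""
    closure = set(r)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def eq_cardinality(x, y):
    """True iff x and y have the same number of elements, decided by a
    reachability test instead of counting."""
    xs, ys = sorted(x), sorted(y)
    if len(xs) <= 1 or len(ys) <= 1:
        # Assumption of this sketch: tiny cases are handled directly.
        return len(xs) == len(ys)
    # Step simultaneously to the successor in x and the successor in y.
    r = {((xs[i], ys[j]), (xs[i + 1], ys[j + 1]))
         for i in range(len(xs) - 1) for j in range(len(ys) - 1)}
    q = transitive_closure(r)
    return ((xs[0], ys[0]), (xs[-1], ys[-1])) in q
```

A path starting at the pair of minima advances both coordinates in lockstep, so it can reach the pair of maxima only when both sets are exhausted after the same number of steps.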

sru versus dcr. We end this section with a comment on the expressive power of sru. Although superficially related to dcr, sru seems to have less expressive power. In fact it remains open whether transitive closure can be expressed in $NRA^1(sru)$, or in NRA(bsru), or even in $NRA(bsru, \le)$. We do not venture a conjecture about the last language, but there is some reason to think that the following is true.

Conjecture 7.3. Transitive closure cannot be expressed in $NRA^1(sru)$, nor in NRA(bsru).

Interestingly, with order and tc, sru becomes as powerful as dcr.

Proposition 7.4. $NRA^1(sru, \le, tc) = NRA^1(dcr, \le)$ and $NRA(bsru, \le, tc) = NRA(bdcr, \le)$.

Proof. To show $NRA(bdcr, \le) \subseteq NRA(bsru, \le, tc)$ consider some expression $\psi = bdcr(e, f, u, b)$ in $NRA(bdcr, \le)$. Without loss of generality we may assume that e, f, u, b are in $NRA(bsru, \le)$.

To compute $\psi(\{x_1, \ldots, x_n\})$, $x_1 < x_2 < \cdots < x_n$, using bsru, we use the encodings of numbers $\tilde{k}$, $0 \le k \le n$, mentioned earlier. Namely we define the function $\varphi$ to be $\varphi(s) \stackrel{\text{def}}{=} (\tilde{k}, s)$, where $k = card(s)$. $\varphi$ can be defined using bsru. Obviously $\varphi(\emptyset) \stackrel{\text{def}}{=} (\tilde{0}, \emptyset)$ and $\varphi(\{y\}) \stackrel{\text{def}}{=} (\tilde{1}, \{y\})$. For $\varphi(s \cup s')$, let $k = card(s)$, $k' = card(s')$ and $k'' = card(s \cup s')$. We have $\varphi(s \cup s') = (\tilde{k}'', s \cup s')$, so we have to argue that $\tilde{k}''$ can be computed, given $\tilde{k}$, $\tilde{k}'$, and s, $s'$. First note that $\tilde{k}'' \subseteq \widetilde{k+k'}$. So all we have to do is to select from $\widetilde{k+k'}$ all those values which are tagged with pairs $(x_p, x_q)$ with $q-p+1 \le k''$. ($\lambda(\tilde{k}, \tilde{k}').\widetilde{k+k'}$ can be computed with bsru in the same way as with bset-divide in the proof of Proposition 7.1; in fact it is even simpler, since we do not need to test for equal cardinality.) The latter is equivalent to $card(\{x_p, \ldots, x_q\}) \le card(s \cup s')$, which can be tested in $NRA(\le, tc)$, using Lemma 7.2. Finally we have $\psi(\{x_1, \ldots, x_n\}) = extract(\pi_1(\varphi(\{x_1, \ldots, x_n\})), (x_1, x_n))$. $\blacksquare$

8. ITERATION OVER SETS

Next we show that the order-dependent forms of recursion of the previous section are equivalent, over ordered databases, to simpler loops. The logarithmic and the bounded logarithmic iterator are defined by

$$\frac{f : t \to t}{\text{log-loop}(f) : \{t'\} \times t \to t} \qquad \frac{f : t \to t \quad b : t}{\text{blog-loop}(f, b) : \{t'\} \times t \to t}$$

with the semantics

$$\text{log-loop}(f)(x, y) \stackrel{\text{def}}{=} f^{(\lceil \log(card(x)+1) \rceil)}(y),$$

where card(x) is the number of elements of x. Thus, log-loop iterates some function f a number of times equal to the number of bits necessary to represent the number card(x). The bounded logarithmic iterator is defined by

$$\text{blog-loop}(f, b)(x, y) \stackrel{\text{def}}{=} \text{log-loop}(f \cap b)(x, y \cap b).$$

Similarly, we define the iterator and the bounded iterator loop and bloop, which iterate some function card(x) times, instead of $\lceil \log(card(x)+1) \rceil$ times,

$$\frac{f : t \to t}{loop(f) : \{t'\} \times t \to t} \qquad \frac{f : t \to t \quad b : t}{bloop(f, b) : \{t'\} \times t \to t},$$

with the semantics

$$loop(f)(x, y) \stackrel{\text{def}}{=} f^{(card(x))}(y)$$

$$bloop(f, b)(x, y) \stackrel{\text{def}}{=} loop(f \cap b)(x, y \cap b).$$

We extend the definition of depth of recursion nesting to depth of iteration nesting for these constructs by defining $depth(\text{log-loop}(f)(e)) \stackrel{\text{def}}{=} \max(1 + depth(f), depth(e))$, etc.

Both log-loop and loop are powerful enough to express powerset. Hence, we will only consider the unbounded versions in conjunction with flat relations and use their bounded versions for complex objects.



Example 8.1. log-loop can express transitive closure, tc: $\{t \times t\} \to \{t \times t\}$. Indeed, let $r \in \{t \times t\}$ be some relation. First compute $v = \Pi_1(r) \cup \Pi_2(r)$ (the set of all elements mentioned in r), then repeat $\lceil \log(n+1) \rceil$ times $r \leftarrow r \cup r \circ r$, where $n \stackrel{\text{def}}{=} card(v)$, and $\circ$ is relation composition. That is

$$tc(r) = \text{log-loop}(\lambda r.\ r \cup r \circ r)(v, r).$$
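Example 8.1 can be sketched in Python as follows. The sketch assumes that logarithms are taken in base 2, so that $\lceil \log(n+1) \rceil$ is the bit length of n; the function names are illustrative.

```python
def log_loop(f):
    """log-loop(f)(x, y): apply f to y as many times as there are bits in
    card(x), i.e. ceil(log2(card(x) + 1)) times."""
    def run(x, y):
        for _ in range(len(x).bit_length()):
            y = f(y)
        return y
    return run


def compose(r1, r2):
    """Relational composition of two binary relations."""
    return frozenset((a, d) for (a, b) in r1 for (c, d) in r2 if b == c)


def tc(r):
    """Transitive closure by repeated squaring, as in Example 8.1."""
    r = frozenset(r)
    v = frozenset(a for pair in r for a in pair)  # elements mentioned in r
    return log_loop(lambda s: s | compose(s, s))(v, r)
```

After i iterations the relation contains all paths of length at most $2^i$, so the bit length of n is enough iterations to cover every path of length at most n.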

Example 8.2. Let n = card(x). Then loop(f) and log-loop(f) allow us to iterate n and log n times, respectively. To iterate $n^2$ times, it suffices to loop over $x \times x$, which has $n^2$ elements: $f^{(n^2)}(y) = loop(f)(x \times x, y)$. But in order to iterate $\log^2 n$ times, we use two levels of iterations:

$$f^{(\lceil \log(n+1) \rceil^2)}(y) = \text{log-loop}(\lambda z.\ \text{log-loop}(f)(x, z))(x, y).$$
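The iteration counts of Example 8.2 can be checked with a short Python sketch that simply counts how often f is applied. Base 2 logarithms are assumed, and the names are illustrative.

```python
def loop(f):
    """loop(f)(x, y): apply f to y exactly card(x) times."""
    def run(x, y):
        for _ in range(len(x)):
            y = f(y)
        return y
    return run


def log_loop(f):
    """log-loop(f)(x, y): apply f to y ceil(log2(card(x) + 1)) times."""
    def run(x, y):
        for _ in range(len(x).bit_length()):
            y = f(y)
        return y
    return run


inc = lambda t: t + 1          # counting applications of f
x = set(range(10))             # n = 10
pairs = {(a, b) for a in x for b in x}

n_squared = loop(inc)(pairs, 0)                             # one level of loop
log_squared = log_loop(lambda z: log_loop(inc)(x, z))(x, 0)  # two nested levels
```

The quadratic count is reached without nesting, by enlarging the set that drives the loop, whereas the squared logarithm needs a second level of iteration.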

Immerman defines FO(t(n)) in [Imm89] to be first orderlogic, with order and with a binary relation BIT, extendedwith those inductive definitions which close after t(n)steps. The languages NRA1(log�loop, �, BIT ) andNRA1(loop, �, BIT ) have essentially the same expressivepower as FO[logO(1) n] and FO[nO(1)], respectively.However, without order, these two are no longer equiv-alent: loop can express parity, while FO[nO(1)] (withoutorder and BIT ) is included in FO+LFP, and hence it can-not express parity. Similarly, we can argue thatFO[logO(1) n] without order is less powerful thanNRA1(log�loop).

The key technical lemma in proving the main results states that dcr and log-loop have the same expressive power over ordered databases. Expressions with dcr can be translated into expressions with log-loop and conversely. Moreover, this translation preserves the depth of iteration nesting.

Proposition 8.3. Over ordered flat relations, set-divide is equivalent to log-loop, and set-reduce is equivalent to loop:

$$NRA^1(\text{log-loop}, \le) = NRA^1(\text{set-divide}, \le)$$

$$NRA^1(\text{loop}, \le) = NRA^1(\text{set-reduce}, \le).$$

Similarly, over complex objects we have

$$NRA(\text{blog-loop}, \le) = NRA(\text{bset-divide}, \le)$$

$$NRA(\text{bloop}, \le) = NRA(\text{bset-reduce}, \le).$$

Moreover, these equivalences preserve the iteration depth, e.g., $NRA^1(\text{log-loop}^{(k)}, \le) = NRA^1(\text{set-divide}^{(k)}, \le)$ for all $k \ge 0$, etc.

Proof. We saw in Proposition 3.5 that $NRA^1(\text{bdcr}) = NRA^1(\text{dcr})$. Similarly, one can show that $NRA^1(\text{blog-loop}) = NRA^1(\text{log-loop})$ and $NRA^1(\text{bset-divide}) = NRA^1(\text{set-divide})$. Hence, once we prove $NRA(\text{blog-loop}, \le) = NRA(\text{bset-divide}, \le)$, the equivalence $NRA^1(\text{log-loop}, \le) = NRA^1(\text{set-divide}, \le)$ follows. So we prove the first equivalence, and start by showing $NRA(\text{bset-divide}, \le) \subseteq NRA(\text{blog-loop}, \le)$. Consider some function $\varphi = \text{bset-divide}(e, f, u, b)$, $\varphi : \{t\} \to t'$. Let $x = \{x_1, \ldots, x_n\} \in \{t\}$. We will use the encoding $\bar{k}$ described in Section 7. Define $g$ to be the function $\lambda \bar{k}.\, \overline{k+k}$ (see the techniques used in the proof of Proposition 7.1). Then the sequence $\bar{1}, g(\bar{1}), g(g(\bar{1})), g(g(g(\bar{1}))), \ldots$ is $\bar{1}, \bar{2}, \bar{4}, \bar{8}, \ldots$; hence it suffices to iterate $g$ $\lceil \log(n+1) \rceil$ times, and to apply it to $\bar{1}$, to get $\text{log-loop}(g)(x, \bar{1}) = \tilde{n}$. Using the techniques from the proof of Proposition 7.1, given a bound $b$ for $\text{bset-divide}(e, f, u, b)$, we can compute a bound $b'$ for $\bar{k}$ for all $k \le n$; hence $\text{log-loop}(g)(x, \bar{1})$ is equivalent to $\text{blog-loop}(g, b')(x, \bar{1})$. Finally we obtain $\varphi(x) = \mathrm{extract}(\tilde{n}, \langle x_1, x_n \rangle)$.

The converse, $NRA(\text{blog-loop}, \le) \subseteq NRA(\text{bset-divide}, \le)$, follows immediately from $\text{blog-loop}(f, b)(x, y) = \text{bset-divide}(y, \lambda z.\, f(y), \lambda(z, z').\, f(z), b)(x)$.

Finally note that the translations preserve the nesting depth. The equivalence between bloop and bset-reduce is proved similarly. ∎

It follows that the languages $NRA^1(\text{dcr})$ and $NRA^1(\text{log-loop})$ coincide on ordered structures. However, we do not know what their relationship is without order. Still, Proposition 8.3 has an important consequence. Recall that the conditions for well-definedness of dcr are not r.e.; hence the language $NRA^1(\text{dcr}, \le)$ is not r.e. But, by restricting it to the instances of dcr used in the simulation of log-loop, we obtain an r.e., in fact a decidable, sublanguage $L$ which has the same expressive power as the whole $NRA^1(\text{dcr}, \le)$.

9. CIRCUITS

We will show here that $NRA^1(\text{log-loop}) \subseteq Q\text{-}NC$. First we establish some technical lemmas. For some string $W \in \{0, 1\}^n$ of length $n$ and numbers $i, j$, $0 \le i \le j \le n-1$, we denote with $W[i : j]$ the substring of $W$ containing all elements from positions $i$ to $j$ inclusive. Recall that we encode each symbol of the alphabet $A = \{0, 1, \{, \}, \langle, \rangle, \text{comma}, \text{blank}\}$ with a string of length 3. Thus if $X$ is an encoding of the complex object $x$, then for every $i$ for which $i \bmod 3 = 0$, $X[i : i+2]$ will be the binary encoding of some character in $A$.

Lemma 9.1. For every $d \ge 0$ there exists a function $F^d = \bigcup_{n \ge 0} F^d_n$ in $AC^0$, $F^d_n : \{0, 1\}^n \to \{0, 1\}^{n^2}$, finding the "matching parentheses" of nesting depth $\le d$.

More precisely, for every string $X \in \{0, 1\}^n$ and for every $i, j$, $0 \le i, j \le n-1$, $F^d_n(X)$ will have a 1 on position $i \cdot n + j$ iff $i \bmod 3 = j \bmod 3 = 0$, and there is a left parenthesis $\{$ on positions $i, i+1, i+2$, and a matching right parenthesis $\}$ on positions $j, j+1, j+2$ (or similarly, for a pair of matching angle parentheses $\langle$ and $\rangle$ on these positions), and the nesting depth of parentheses enclosed between the positions $i$ and $j$ is $< d$. E.g., consider the string $X = \text{``}\}_0 \{_1 \{_2 \}_3 \{_4 \{_5 \}_6 \{_7 \}_8 \}_9 \}_{10} \{_{11}\text{''}$ which, after encoding each parenthesis with three bits, will be translated into a string $X \in \{0, 1\}^{36}$. Then $F^2_{36}(X)$ reports the matching parentheses $\{_2, \}_3$; $\{_5, \}_6$; $\{_7, \}_8$; and $\{_4, \}_9$ (i.e., it will output a 1 on the positions $36i + j$, for $\langle i, j \rangle = \langle 6, 9 \rangle, \langle 15, 18 \rangle, \langle 21, 24 \rangle$, and $\langle 12, 27 \rangle$). It will not report the matching parentheses $\{_1, \}_{10}$, because their nesting level is 3.

Proof of Lemma 9.1. We prove the statement by induction on $d$. For $d = 0$ there is nothing to check: $F^0_n(X)$ just returns the string $000 \cdots 0$. For $d = 1$, $F^1_n(X)$ returns a 1 on those positions $i \cdot n + j$ for which both $i$ and $j$ are divisible by 3, $i < j$, $X[i : i+2]$ and $X[j : j+2]$ contain $\{$ and $\}$ respectively (or $\langle$ and $\rangle$), and $\forall k$, $k \bmod 3 = 0$, $i < k < j$, there is no parenthesis in $X[k : k+2]$.

For $d > 1$ we proceed as follows. First we compute $Y = F^{d-1}_n(X)$. Next replace in $X$ every character corresponding to a matched parenthesis in $Y$ with blank, and call $X'$ the result; i.e., $\forall i.\, 0 \le i \le n-1$, if $i \bmod 3 = 0$ then, if $\exists j$ s.t. $Y[i \cdot n + j] = 1$ or $Y[j \cdot n + i] = 1$, then $X'[i : i+2] \stackrel{\text{def}}{=} \text{blank}$, else $X'[i : i+2] \stackrel{\text{def}}{=} X[i : i+2]$. Compute $Y' = F^1_n(X')$, and finally output $Z$, where $Z[i] \stackrel{\text{def}}{=} \mathrm{OR}(Y[i], Y'[i])$, for $0 \le i \le n^2 - 1$.

We invite the reader to check that for $d \ge 1$ a circuit $\alpha^d_n$ can be built for $F^d_n$, for which $\mathrm{height}(\alpha^d_n) \le 8d - 4$ and $\mathrm{size}(\alpha^d_n) \le d \cdot (2n^3 + 3n^2 + 6n) - 5n$. ∎
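The inductive construction can be replayed at the character level with a short Python sketch. It is a sequential illustration only, not a circuit: it assumes one character per symbol instead of three bits, uses `(` and `)` as stand-ins for the angle parentheses, and the function names are chosen here for the example.

```python
OPEN_TO_CLOSE = {"{": "}", "(": ")"}
PARENS = set("{}()")


def match_depth_1(s):
    """F^1: pairs (i, j) with an opening symbol at i, the corresponding closing
    symbol at j, and no parenthesis of any kind strictly between them."""
    return {
        (i, j)
        for i, c in enumerate(s) if c in OPEN_TO_CLOSE
        for j in range(i + 1, len(s))
        if s[j] == OPEN_TO_CLOSE[c] and not any(ch in PARENS for ch in s[i + 1:j])
    }


def match_up_to_depth(s, d):
    """F^d: compute F^(d-1), blank out what it matched, then apply F^1 once more."""
    if d == 0:
        return set()
    if d == 1:
        return match_depth_1(s)
    y = match_up_to_depth(s, d - 1)
    matched = {pos for pair in y for pos in pair}
    blanked = "".join(" " if k in matched else ch for k, ch in enumerate(s))
    return y | match_depth_1(blanked)


example = "}{{}{{}{}}}{"   # the 12-symbol string from the statement of Lemma 9.1
pairs_d2 = match_up_to_depth(example, 2)
```

On the example string, depth 2 yields the character pairs (2, 3), (5, 6), (7, 8), and (4, 9); multiplying by 3 gives the bit positions quoted in the lemma, and the pair (1, 10) appears only at depth 3.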

Corollary 9.2. For each type $t$, there is some function $F^t = \bigcup_{n \ge 0} F^t_n$ in $AC^0$, $F^t_n : \{0, 1\}^n \to \{0, 1\}^{n^2}$, which identifies the pairs of parentheses for any encoding of type $t$. More precisely, for any encoding $X$ of an object $x$ of type $t$, $F^t_n(X)$ will have a 1 on position $i \cdot n + j$ iff there is a left parenthesis $\{$ on positions $i, i+1, i+2$, and a matching right parenthesis $\}$ on positions $j, j+1, j+2$, or similarly for a pair of matching angle parentheses $\langle\ \rangle$.

Proof. This follows directly from Lemma 9.1, since any encoding $X$ of an object of type $t$ will have at most $d$ nested parentheses, where $d$ depends only on the type $t$. ∎

Lemma 9.3. For any set type $\{t\}$, there is some function $B^{\{t\}} = \bigcup_{n \ge 0} B^{\{t\}}_n$ in $AC^0$, $B^{\{t\}}_n : \{0, 1\}^n \to \{0, 1\}^n$, which, for some encoding $\{X_1, \ldots, X_m\}$ of type $\{t\}$, returns a string containing exactly $m$ 1's, namely on those positions where some $X_i$ begins. Similarly, for every product type $t_1 \times \cdots \times t_k$, there exists a function $B^{t_1 \times \cdots \times t_k}$ in $AC^0$ which, when given an encoding $\langle X_1, \ldots, X_k \rangle$ of length $n$ of $\langle x_1, \ldots, x_k \rangle \in t_1 \times \cdots \times t_k$, returns a string of length $n$ containing exactly $k$ 1's, namely on the positions where $X_1, \ldots, X_k$ start.

Proof. The circuit computing $B^{\{t\}}_n$ identifies the outermost commas (i.e., those not included in any pair of matching parentheses, except the outermost $\{\ \}$), and returns a 1 on each first nonblank position following such a comma, or following the leading left brace. ∎
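A character-level Python sketch of this marking function is given below. It is a sequential illustration rather than an $AC^0$ circuit, it assumes one character per symbol with `(` `)` standing in for angle parentheses, and it treats the closing brace of an empty set as not starting an element (an assumption made here so that the empty set yields $m = 0$ marks).

```python
def element_starts(s):
    """Return a 0/1 list with a 1 wherever a top-level element of the encoding begins."""
    marks = [0] * len(s)
    depth = 0
    expecting = False  # True right after the leading brace or an outermost comma
    for i, ch in enumerate(s):
        if ch == " ":
            continue  # blanks may be scattered anywhere
        if expecting:
            # The closing brace of an empty set does not start an element.
            if not (ch in "})" and depth == 1):
                marks[i] = 1
            expecting = False
        if ch in "{(":
            depth += 1
            if depth == 1:
                expecting = True      # first nonblank after the leading brace
        elif ch in "})":
            depth -= 1
        elif ch == "," and depth == 1:
            expecting = True          # first nonblank after an outermost comma
    return marks


starts = [i for i, m in enumerate(element_starts("{ {0,1}, {1} ,{}}")) if m]
```

For the sample encoding the three elements start at character positions 2, 9, and 14; commas nested inside an element are ignored because they occur at depth 2.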

As a consequence, we have:

Lemma 9.4. For all types $t$, equality of objects of type $t$ is computable in $AC^0$.

Proof. We construct a family of circuits $E^t_n : \{0, 1\}^{2n} \to \{0, 1\}$ which, when given two encodings $X, Y$ of $x, y \in t$, both of length $n$, returns 1 iff $x = y$. We proceed by induction on the type $t$. For the base case $t = D$, we use essentially the fact that blanks are not allowed inside the binary representation of numbers. Then $E^D_n(XY)$ will return 1 iff $\exists i, i', j, j'$, $0 \le i \le i' \le n-1$, $0 \le j \le j' \le n-1$, $X[i : i'] = Y[j : j']$ (in particular $i' - i = j' - j$), and in $X[0 : i-1]$, $X[i'+1 : n-1]$, $Y[0 : j-1]$, $Y[j'+1 : n-1]$ there are only blanks. For the induction case $t = \{t'\}$, we have $X = \{X_1, \ldots, X_p\}$ and $Y = \{Y_1, \ldots, Y_q\}$, with possible blanks scattered. We essentially test the following two conditions:

$$\forall i \le p.\ \exists j \le q.\ X_i = Y_j \qquad (1)$$

$$\forall j \le q.\ \exists i \le p.\ X_i = Y_j. \qquad (2)$$

To do that, we start by using Lemma 9.3 to compute the strings $U, V$ of length $n$, which identify the $p$ positions in $X$ and the $q$ positions in $Y$ where some element starts. Then condition (1) becomes

$$\forall i, i'.\ (i < i' \wedge U[i] = U[i'] = 1 \wedge \forall i''.\ i < i'' < i' \Rightarrow U[i''] = 0)$$
$$\Rightarrow \exists j, j'.\ j < j' \wedge V[j] = V[j'] = 1 \wedge (\forall j''.\ j < j'' < j' \Rightarrow V[j''] = 0) \wedge E^{t'}_n(X[i : i'-1]\, Y[j : j'-1]) = 1.$$

In fact the expression $E^{t'}_n(X[i : i'-1]\, Y[j : j'-1])$ is not quite correct; we have to eliminate the trailing comma (i.e., replace it with blank) in $X[i : i'-1]$ and $Y[j : j'-1]$ and add $n - i' + i - 1$ trailing blanks to $X[i : i'-1]$ and $Y[j : j'-1]$ (to make them of length $n$) before computing $E^{t'}_n(X[i : i'-1]\, Y[j : j'-1])$. We invite the reader to fill in the details. Condition (2) is handled similarly. ∎
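The structure of the induction (base type, sets via conditions (1) and (2), products componentwise) can be sketched on parsed objects instead of bit strings. In this illustrative Python version, Python lists stand for set encodings that may contain duplicates in any order, tuples stand for products, and integers or booleans stand for base values; `equal` is a name chosen here.

```python
def equal(x, y):
    """Equality of nested objects following the induction on types."""
    if isinstance(x, list) and isinstance(y, list):
        # Condition (1): every element of x equals some element of y.
        cond1 = all(any(equal(a, b) for b in y) for a in x)
        # Condition (2): every element of y equals some element of x.
        cond2 = all(any(equal(a, b) for a in x) for b in y)
        return cond1 and cond2
    if isinstance(x, tuple) and isinstance(y, tuple):
        return len(x) == len(y) and all(equal(a, b) for a, b in zip(x, y))
    # Base case: values of the same base type are compared directly.
    return type(x) is type(y) and x == y
```

Both quantifier alternations are needed: with condition (1) alone, `[1]` would be judged equal to `[1, 2]`.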

Lemma 9.5. For every set type $\{t\}$ there exists a "duplicate elimination function" $D^t = \bigcup_n D^t_n$, where $D^t_n : \{0, 1\}^n \to \{0, 1\}^n$, with the following meaning. Given any encoding $X = \{X_1, \ldots, X_m\}$ of some object of type $\{t\}$, with possible duplicates, $D^t(X)$ returns a string of the same length, in which all duplicates are replaced with blanks.

Proof. $D^t_n(X)$ will work as follows. First compute $Y = B^t_n(X)$; i.e., identify where each of the $m$ substrings $X_1, \ldots, X_m$ begins in $X$. Now we have to replace each character $X[i : i+2]$ belonging to some duplicate $X_p$, with $0 \le i \le n-1$, $i \bmod 3 = 0$, with a blank. More precisely, we replace $X[i : i+2]$ with blank iff the following holds:

$$\exists j \le i.\ \exists k > i.\ Y[j] = Y[k] = 1 \wedge (\forall l.\ j < l < k \Rightarrow Y[l] = 0)$$
$$\wedge\ \exists j', k'.\ k \le j' < k' \wedge Y[j'] = Y[k'] = 1 \wedge (\forall l.\ j' < l < k' \Rightarrow Y[l] = 0) \wedge E^t_n(X[j : k-1]\, X[j' : k'-1]).$$

The condition says that $X[i : i+2]$ lies inside some string $X_p$ stretching between positions $j$ and $k$ and that $X_p$ equals $X_q$ with $p < q$, where $X_q$ stretches between the positions $j'$ and $k'$. The expression $E^t_n(X[j : k-1]\, X[j' : k'-1])$ is not quite correct: the possible trailing commas have to be eliminated (replaced with blank) in both $X[j : k-1]$ and $X[j' : k'-1]$, and then they have to be padded with blanks to make them strings of length $n$. We invite the reader to fill in the details. ∎
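The rule "blank out $X_p$ whenever some later $X_q$ equals it" can be illustrated on a parsed list of elements. This Python sketch assumes the elements are already separated, uses `None` as a stand-in for a blanked element, and takes the equality test as a parameter; none of these names come from the paper.

```python
def blank_duplicates(elems, equal=lambda a, b: a == b):
    """Replace every element that reappears later in the list with None (a blank).

    The output has the same length as the input, and the last occurrence of
    each value is the one that survives, as in the condition p < q above.
    """
    return [
        None if any(equal(e, later) for later in elems[p + 1:]) else e
        for p, e in enumerate(elems)
    ]


result = blank_duplicates([1, 2, 1, 3, 2])
```

In the sample call the first two elements are blanked and the last occurrences of 1, 3, and 2 remain in place, so no repositioning of the surviving elements is required.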

Finally we show that all queries in $NRA^1(\text{log-loop})$ are in NC. We want to prove this by "induction" on the structure of a query, but here we run into the following technical difficulty. In our presentation of NRA, queries do not admit an inductive definition, because subexpressions of some query $f$ may not be queries themselves (they may be functions $f : t_1 \to t_2$ with free variables, or they may be complex object expressions $e : t$). To circumvent that, we associate to each expression a query, as follows. To any complex object expression $e : t$, and any set of variables $x_1^{t_1}, \ldots, x_k^{t_k}$ including all free variables of $e$, we associate the query $\lambda(x_1^{t_1}, x_2^{t_2}, \ldots, x_k^{t_k}).\, e : t_1 \times t_2 \times \cdots \times t_k \to t$. Similarly, to any function expression $f : t \to t'$ we associate the query $\lambda(x^t, x_1^{t_1}, x_2^{t_2}, \ldots, x_k^{t_k}).\, f(x^t) : t \times t_1 \times t_2 \times \cdots \times t_k \to t'$. When no confusion arises, we shall omit mentioning explicitly the set of variables and talk about "the query $f$ associated to some expression $e$."

Proposition 9.6. The following hold for all $k \ge 0$:

$$NRA^1(\text{log-loop}^{(k)}) \subseteq Q\text{-}AC^k$$

$$NRA(\text{blog-loop}^{(k)}) \subseteq CQ\text{-}AC^k.$$

Proof. It suffices to prove only the second inclusion: the first one follows. For that we prove by induction on some complex object expression $e \in NRA(\text{blog-loop}^{(k)})$ of type $t$ that for any set of variables $x_1^{t_1}, \ldots, x_l^{t_l}$ which includes all free variables in $e$, the associated query

$$\lambda(x_1^{t_1}, x_2^{t_2}, \ldots, x_l^{t_l}).\, e : t_1 \times t_2 \times \cdots \times t_l \to t$$

is in $AC^k$. Simultaneously, we prove that for any function expression $f \in NRA(\text{blog-loop}^{(k)})$, the corresponding query is in $AC^k$. We illustrate only some of the cases for $e$ and $f$.

Union $e \cup e'$. Let $f$ and $f'$ be the queries associated to $e$ and $e'$, respectively. Then the query associated to $e \cup e'$ is $g = \lambda x.\, f(x) \cup f'(x)$. Let $f, f'$ be computed by the functions $F$ and $F'$, respectively, and let $\alpha_n$ and $\alpha'_n$ be circuits associated to $F_n$ and $F'_n$. To compute $g$, concatenate the outputs of $\alpha_n$ and $\alpha'_n$, eliminate the braces $\}\ \{$ by replacing them with blanks and conditionally placing a comma (the comma is placed only when both outputs encode a nonempty set). Finally, eliminate the duplicates in the resulting set using Lemma 9.5.

Function application $f(e)$, where $f : t \to t'$ and $e : t$. Let $g : t \times t_1 \times t_2 \times \cdots \times t_l \to t'$ be the query corresponding to $f$, and $h : t_1 \times t_2 \times \cdots \times t_l \to t$ be the query corresponding to $e$. Then the query corresponding to $f(e)$ is $k = \lambda x.\, g(\langle h(x), x \rangle)$, and this leads us directly to a circuit. Namely, let $\alpha_n$ be the circuit computing $g$ and $\beta_n$ be the circuit computing $h$; let $Q(n)$ be the size of the output of $\beta_n$. Then the circuit computing $k$ will consist of a copy of $\beta_n$ and a copy of $\alpha_{n+Q(n)+9}$. The input of $\alpha_{n+Q(n)+9}$ will consist of the concatenation of the following strings:

"$\langle$", the output of $\beta_n$, ",", the input $X$, "$\rangle$".

Extension $\mathrm{ext}(f)$. For the sake of clarity, suppose that $f : t \to \{t'\}$ does not have free variables and that $l = 0$. Then the query associated to $f$ is $f$ itself; let $\alpha_n$ be a circuit for computing $f$, and let $Q(n)$ be the size of the output of $\alpha_n$. The circuit $\beta_n$ for $\mathrm{ext}(f)$ will receive an input $X = \{X_1, X_2, \ldots, X_m\}$. Essentially it has to "apply" $\alpha_n$ to each substring $X_p$. It starts by computing $Y = B^t_n(X)$. Since it cannot anticipate where each substring $X_p$ lies, it will have a copy of the circuit $\alpha_{j-i}$ for every pair $(i, j)$, $0 \le i < j \le n-1$, for which $i \bmod 3 = j \bmod 3 = 0$, which will receive as input $X[i : j-1]$. The output of the circuit corresponding to $(i, j)$ will be invalidated, however (overwritten with blanks), unless $Y[i] = Y[j] = 1$ and $\forall k$, $i < k < j$, $Y[k] = 0$, i.e., unless there exists indeed some substring $X_p$ which stretches from position $i$ to position $j-1$. All $n(n+1)/2$ results are concatenated, all pairs of the inner parentheses $\}\ \{$ are replaced with a comma, and finally we feed the resulting string (of size $\le Q(n) \cdot n \cdot (n+1)/2$) into $D^t_{Q(n) \cdot n \cdot (n+1)/2}$ to eliminate the duplicates.

Iteration $\text{blog-loop}(f, b)$. Since the output type is a PS-type, we will assume, for the sake of clarity, that it is a set type, i.e., $f : \{t\} \to \{t\}$. Let $g : \{t\} \times t_1 \times \cdots \to \{t\}$ be the query associated to $f$, and $h : t_1 \times \cdots \to \{t\}$ be the query associated to $b$. Let $\alpha_n$ and $\beta_n$ be the circuits for computing $g$ and $h$, respectively, and let $G_n, H_n$ be the functions computed by them. Assume their output sizes to be $Q_g(n)$ and $Q_h(n)$, respectively. The input set $y$ in $f(x, y)$ cannot have more than $n$ elements (since at most $n/3$ characters are used to encode $\langle x, y \rangle$); hence it would suffice to generate $\lceil \log n \rceil$ copies of $\alpha_n$ and to feed the output of every circuit as input to the next. However, this naive approach does not work. Indeed, the output of the first $\alpha_n$ has size $Q_g(n)$, then the output of the second $\alpha_{Q_g(n)}$ has size $Q_g(Q_g(n))$, the output of the third $\alpha_{Q_g(Q_g(n))}$ has size $Q_g(Q_g(Q_g(n)))$, etc. Hence the output size grows faster than any polynomial. We need a more subtle technique which uses the bound $b$ in an essential way.
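To see the blow-up concretely, assume for illustration that $Q_g(n) = n^2$ (a hypothetical size bound chosen here, not one taken from the construction) and that logarithms are base 2. Composing $Q_g$ with itself then gives:

```latex
\[
Q_g^{(i)}(n) = n^{2^{i}}
\quad\Longrightarrow\quad
Q_g^{(\lceil \log n \rceil)}(n) \;=\; n^{2^{\lceil \log n \rceil}} \;\ge\; n^{n},
\]
```

so after only $\lceil \log n \rceil$ chained copies the wires would already carry a superpolynomial number of bits, even though each individual step is polynomial.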

Let $p : (\{t'\} \times \{t\}) \times (t_1 \times \cdots) \to \{t\}$ be the query associated to $\text{blog-loop}(f, b)$. The circuit computing $p$ will receive as input a string $\langle \langle X, Y \rangle, Z \rangle$ of length $n$. It will start by computing $H_n$ on the string $\langle Y, Z \rangle$ of length $n$ (obtained from the input by overriding some characters with blanks). Let $U = H_n(\langle Y, Z \rangle)$. $U$ is the encoding of the bounding set, $u = \{u_1, \ldots, u_l\}$, and $\mathrm{length}(U) = Q_h(n)$. Next, the circuit for $p$ contains $\lceil \log(n+1) \rceil$ copies of the circuit $\alpha_{Q_h(n)}$. The idea is that all intermediate results are subsets of $u$, hence $Q_h(n)$ bits will suffice to represent them. Namely, we feed the first circuit $\alpha_{Q_h(n)}$ with the intersection of $Y$ and $U$; we obtain $Y$ from the input $\langle \langle X, Y \rangle, Z \rangle$ by overriding the rest with blanks, and we compute the intersection by overriding in $U$ all elements which are not in $Y$ with blanks, so the result has the same size as $U$. Similarly, we feed the $(i+1)$-st copy of the circuit $\alpha_{Q_h(n)}$ with the output of the $i$-th copy, intersected with $U$, which is again a string of size $Q_h(n)$. Of course, we have to bypass all circuits $\alpha_{Q_h(n)}$ beyond level $\lceil \log(m+1) \rceil$, where $m$ is the cardinality of the set encoded by $X$. To do that, we start by computing $B^{t'}_n(X)$. Next we compute a string of length $\lceil \log(n+1) \rceil$ representing the binary encoding of the number of 1's in $B^{t'}_n(X)$, i.e., the binary encoding of $m$; this can be done in $AC^1$, since it is a particular case of the problem of adding $n$ binary numbers. Now we establish a correspondence between the $\lceil \log(n+1) \rceil$ bits representing $m$ and the $\lceil \log(n+1) \rceil$ copies of $\alpha_{Q_h(n)}$; the least significant bit will correspond to the first copy, etc. We will bypass the $i$-th copy of $\alpha_{Q_h(n)}$ if all bits $i, i+1, i+2, \ldots$ in $m$ are 0. Finally, observe that, if the circuit $\alpha_{Q(n)}$ for computing $g$ had depth $O(\log^k n)$, then the circuit for computing $p$ has depth $O(\log^{k+1} n)$.
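The bypass rule can be checked with a few lines of Python. The sketch assumes bits and copies are both numbered from 1, starting at the least significant bit; `active_copies` is a name introduced here for illustration.

```python
def active_copies(m, total):
    """Flags for copies 1..total of the circuit: copy i is bypassed (False)
    exactly when bits i, i+1, ... of m are all 0 (bit 1 is least significant)."""
    return [
        any((m >> (j - 1)) & 1 for j in range(i, total + 1))
        for i in range(1, total + 1)
    ]


n = 20
total = n.bit_length()            # ceil(log2(n + 1)) copies are laid out
flags_for_5 = active_copies(5, total)
```

For $m = 5$ (binary 101) the first three of the five copies stay active, and in general the number of active copies is $\lceil \log(m+1) \rceil$, which is the iteration count required by the semantics of log-loop.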

The other cases are treated in a similar fashion. We skip the proof of the uniformity, which is tedious but straightforward. ∎

Since the inequality $\le$ can be computed in $AC^0$, Proposition 9.6 immediately implies:

Corollary 9.7. $NRA(\text{blog-loop}^{(k)}, \le) \subseteq CQ\text{-}AC^k$ and $NRA^1(\text{log-loop}^{(k)}, \le) \subseteq Q\text{-}AC^k$.

Instead of designing a circuit for computing $f$, we could have shown that $f$ can be computed in $FO[\log^k n] + \le + \mathrm{BIT}$, and then used the results in [Imm89, BIS90] to conclude $f \in AC^k$; in fact, this is the way we prove Proposition 6.5. But we chose to construct the circuit for computing $f$ in order to suggest how $f$ may be compiled on a CRCW PRAM.

Putting together Propositions 7.1, 8.3, and 9.6, we get one direction of the inclusions in Theorems 6.1 and 6.2:

Proposition 9.8. For all $k \ge 0$, $NRA^1(\text{dcr}^{(k)}) \subseteq Q\text{-}AC^k$ and $NRA(\text{bdcr}^{(k)}) \subseteq CQ\text{-}AC^k$. Moreover, if every operator $p \in \Sigma$ is in NC, then $NRA^1(\text{dcr}, \Sigma) \subseteq NC$.

10. SIMULATING CIRCUITS

It remains to prove the other inclusion: $Q\text{-}AC^k \subseteq NRA^1(\text{dcr}^{(k)})$. For that we start with an NC query $f$ which we want to simulate in $NRA^1(\text{log-loop}, \le)$. We do this in four steps: (1) Given an input $x$ to $f$, we compute the minimal encoding $X$ for $x$ and represent it as a complex object. (2) Assuming $n = \mathrm{length}(X)$, we simulate the Turing machine from the DLOGSPACE-DCL uniformity and construct the circuit $\alpha_n$ which computes $f$ on inputs of size $n$; again, we encode $\alpha_n$ as a complex object. (3) We simulate the circuit $\alpha_n$ on the input $X$ in $NRA^1(\text{log-loop}, \le)$. Here we have to iterate $\log^k n$ times, where $\log^k n$ is the height of $\alpha_n$: $k$ nestings of log-loop suffice for that. (4) Finally we "decode" the string $Y$ resulting from the simulation of $\alpha_n$.

For step (2) above we adopt techniques for simulating PTIME Turing machines with query languages over finite, ordered structures [Imm86, Var82] and over complex objects [GV91c]. For step (1) we take the (ordered) input $x$ and construct some set $z$ whose cardinality is at least as large as the size of the minimal encoding of $x$ (see Section 4). The type of $z$ may depend on the value $x$, but there are a fixed number of choices, and we may test (using if's) which one is the right choice. E.g., if $t = \{D\}$, then $x = \{x_1, \ldots, x_m\}$, where $m = \mathrm{card}(x)$. Then $x$ can be encoded with $n \le 6 \cdot m^2 + 3$ bits, since each element in $x$ can be encoded with at most $\lceil \log(m+1) \rceil \le m$ characters 0, 1, each is followed by one additional comma (and the last element is followed by the right brace), and each character uses, in fact, 3 bits. Finally we need the left brace (hence the $\cdots + 3$). For this case, it suffices to take $z \stackrel{\text{def}}{=} x \times x \times x$, of type $\{D \times D \times D\}$ when $m > 2$, because $\mathrm{card}(z) = m^3 \ge 6 \cdot m^2 + 3$ for $m \ge 3$. But for $m \le 2$, we choose $z \stackrel{\text{def}}{=} \{\mathit{true}, \mathit{false}\}^5$ of type $\{B^5\}$, which has $32 > 6 \cdot m^2 + 3$ elements.

Lemma 10.1. For every type $t$ there exist $n_t$ conditions $C^t_1, \ldots, C^t_{n_t} : t \to B$, $n_t$ types $s^t_1, \ldots, s^t_{n_t}$, and $n_t$ functions $b^t_1 : t \to \{s^t_1\}, \ldots, b^t_{n_t} : t \to \{s^t_{n_t}\}$ in NRA, such that the following hold. Let $x \in t$, and let $X \in \{0, 1\}^*$ be the minimal encoding of $x$. Then:

• $\exists i$ such that $C^t_i(x) = \mathit{true}$.

• $\forall i$, $C^t_i(x) = \mathit{true} \Rightarrow \mathrm{length}(X) \le \mathrm{card}(b^t_i(x))$.

Moreover, $\mathrm{height}(\{s^t_i\}) \le \max(1, \mathrm{height}(t))$, for $i = 1, \ldots, n_t$.

This says that, for every type $t$ and any object $x$ of type $t$, we may construct a set $z$ whose cardinality is greater than or equal to the length of the minimal encoding for $x$: for different values of $x$ we may be forced to choose between $n_t$ different types for the set $z$, but there are conditions $C^t_1, \ldots, C^t_{n_t}$ telling us, for every $x$, which particular type for $z$ to choose.

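Proof. We prove the lemma by induction on the type $t$. We will abbreviate the set $\{\mathit{false}, \mathit{true}\} \times \{\mathit{false}, \mathit{true}\}$ with $\mathbf{4}$, and the set $\{\mathit{false}, \mathit{true}\}$ with $\mathbf{2}$.

Case $t = D$. Then we take $n_t = 1$, $C^t_1(x) \stackrel{\text{def}}{=} \mathit{true}$, $s^t_1 \stackrel{\text{def}}{=} B \times B$, and $b^t_1(x) \stackrel{\text{def}}{=} \mathbf{4}$. Indeed, $\forall x \in D$, the minimal encoding of $x$ is 0, which has length 3 (recall that we encode each symbol of the alphabet $A$ with three bits).

Case $t = \{t'\}$. Let $x \in t$, $x = \{x_1, \ldots, x_n\}$, and let $a \stackrel{\text{def}}{=} \mathrm{atoms}(x)$, $a \in \{D\}$. We first apply the induction hypothesis to $t'$, to get conditions $C^{t'}_1, \ldots, C^{t'}_{n_{t'}}$ and types $s^{t'}_1, \ldots, s^{t'}_{n_{t'}}$. Next observe that the minimal encoding $X$ of $x$ is $\{X_1, \ldots, X_n\}$. The strings $X_1, \ldots, X_n$ are not necessarily minimal encodings of $x_1, \ldots, x_n$, respectively (because $x_j$ does not necessarily mention all atoms in $a$), but their length is at most a factor of $3 \cdot \lceil \log(\mathrm{card}(a)+1) \rceil$ larger than the length of their minimal encoding (this is a generous upper bound). Observe that $3 \cdot \lceil \log(\mathrm{card}(a)+1) \rceil \le 4 \times \mathrm{card}(a)$ when $a \ne \emptyset$. Hence, to obtain a set of cardinality $\ge \mathrm{length}(X_j)$ when $C^{t'}_i(x_j)$ is true and $a \ne \emptyset$, it suffices to take the set $\mathbf{4} \times a \times b^{t'}_i(x_j)$. The latter set is included in $\mathbf{4} \times a \times \mathrm{ext}(b^{t'}_i)(x)$. Finally, if $x_j$ has the longest encoding among $x_1, \ldots, x_n$, then the length of the minimal encoding of $x$ is at most $2n$ times larger than the encoding of $x_j$; hence the cardinality of the set $\mathbf{2} \times x \times (\mathbf{4} \times a \times \mathrm{ext}(b^{t'}_i)(x))$ is larger than the length of $X$. However, when $a = \emptyset$, the latter set has cardinality 0. Here we argue that the size of $X$ is bounded by a constant which depends only on the type $t$, and we take as bounding set $\{\mathit{false}, \mathit{true}\}^k$, with $k$ sufficiently large. In conclusion we define $n_t \stackrel{\text{def}}{=} n_{t'} + 1$, and

$$C^t_i(x) \stackrel{\text{def}}{=} a \ne \emptyset \wedge \exists x_j \in x.\ \bigl(C^{t'}_i(x_j) \wedge \forall x_{j'} \in x\ (C^{t'}_{i'}(x_{j'}) \Rightarrow \mathrm{card}(b^{t'}_i(x_j)) \ge \mathrm{card}(b^{t'}_{i'}(x_{j'})))\bigr)$$

$$s^t_i \stackrel{\text{def}}{=} B \times t' \times (B^2 \times D \times s^{t'}_i)$$

$$b^t_i(x) \stackrel{\text{def}}{=} \mathbf{2} \times x \times (\mathbf{4} \times a \times \mathrm{ext}(b^{t'}_i)(x))$$

for $i = 1, \ldots, n_{t'}$, and

$$C^t_{n_{t'}+1}(x) \stackrel{\text{def}}{=} (a = \emptyset)$$

$$s^t_{n_{t'}+1} \stackrel{\text{def}}{=} B^k$$

$$b^t_{n_{t'}+1}(x) \stackrel{\text{def}}{=} \{\mathit{false}, \mathit{true}\}^k.$$

The other cases are handled using similar techniques. ∎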

Proposition 10.2. For all $k \ge 1$, $CQ\text{-}AC^k \subseteq NRA(\text{blog-loop}^{(k)}, \le)$ and $Q\text{-}AC^k \subseteq NRA^1(\text{log-loop}^{(k)}, \le)$.

Proof. Let $f$ be in $CQ\text{-}AC^k$, $f : t \to t'$. We shall construct an expression $f'$ in $NRA(\text{blog-loop}^{(k)}, \le)$ equivalent to $f$. This proves $CQ\text{-}AC^k \subseteq NRA(\text{blog-loop}^{(k)}, \le)$. For $Q\text{-}AC^k \subseteq NRA^1(\text{log-loop}^{(k)}, \le)$ it will suffice to observe that, when $\mathrm{height}(t) \le 1$ and $\mathrm{height}(t') \le 1$, then $f' \in NRA^1(\text{blog-loop}^{(k)}, \le)$ and, hence, $f' \in NRA^1(\text{log-loop}^{(k)}, \le)$, by a simple extension of Proposition 3.5 to blog-loop.

Since $f \in CQ\text{-}AC^k$ it is computed by some function $F : \{0, 1\}^* \to \{0, 1\}^*$. $F$ is given by: (1) a DLOGSPACE Turing machine $T$ accepting the DCL of a family of circuits, and (2) polynomials $P(n)$ and $Q(n)$ (see Section 4). For some input $x \in t$, let $n$ be the length of the minimal encoding $X$ of $x$ (see Section 5). The simulation of $F$ in $NRA(\text{blog-loop}^{(k)}, \le)$ proceeds as follows:

1. Apply Lemma 10.1 to construct some set $z$ having a cardinality $\ge n$. The type of $z$ may depend on the value $x$, but there are a fixed number of choices, and we may test (using if's) which one is the right choice. Observe that when $\mathrm{height}(t) \le 1$ then the type of $z$ will be flat and $z$ can be computed in $NRA^1$.

2. Some power $z^l$ of $z$ will have $p = n^l$ elements, enough to perform all the arithmetic needed in the sequel. Over this ordered set, we precompute the functions plus, minus, multiplication, and bit, on the numbers $0, \ldots, p-1$. See, e.g., the proof of Proposition 8.3 for the computation of addition, and see Example 6.4 for a hint on how to compute multiplication using bdcr and, hence, blog-loop. As a result, we obtain four relations plus, minus, mult, and bit of type $\{t \times t \times t\}$, where $\{t\}$ is the type of $z^l$. We only need one nesting level of blog-loop to compute each of them.

3. Compute the minimal encoding $X$ of $x$, of length $n$, without blanks: a string $X \in \{0, 1\}^*$ of length $n$ is represented as a set of "numbers," i.e., a subset of $z^l$. The computation is done in $NRA^1(\text{blog-loop}, \le)$, using the previously computed relations plus, minus, mult, bit as parameters; the blog-loop is needed to compute the sum of a set of numbers. As a byproduct of this encoding, we also obtain a translation table from atomic values in $x$ to numbers.

4. Simulate $F$ on $X$, as described below, to get $Y = F(X)$. Again we use plus, minus, mult, bit as parameters.

5. Finally "decode" $Y$, to get $y \in t$. Decoding is done in NRA, i.e., no loops are necessary, and uses the translation table constructed at point 3. The decoding consists essentially in parsing the string $Y$ and constructing $y$. The parser depends on the type $t$. E.g., when $t = \{t'\}$, then we construct $y$ as follows. We build the set of pairs of numbers $(i, j)$, $i < j$ (this is just a subset of $z^l \times z^l$), and for each such pair, we use the parser for the type $t'$ to test whether some element is encoded in $Y$ between the positions $i$ and $j$, and if so, to decode that element. Finally we construct the set of all such elements. So essentially we apply ext on the parser for the type $t'$.

There are two ways of simulating $F$ on $X$. One is to use the result in [Imm89] which says that, since $F$ is in $AC^k$, $F$ is also in $FO[\log^k n] + \le + \mathrm{BIT}$, and to observe that, for $k \ge 1$, $FO[\log^k n] + \le + \mathrm{BIT} \subseteq NRA^1(\text{log-loop}^{(k)}, \le)$. The second way is to use the DLOGSPACE-DCL-uniformity definition of $AC^k$. We will follow the second path here. First, we simulate the $O(\log n)$ space Turing machine computing the DCL of $\alpha_n$: this can be done, since there are only polynomially many configurations for $T$, and deciding whether $T$ accepts some input $(n, g, g', t)$ reduces, using standard techniques, to the computation of the transitive closure of the successor relation on the set of configurations of $T$. Second, we simulate the circuit $\alpha_n$ itself, by computing step by step the outputs of the gates at each level: this only requires $\log^k n$ iterations, so it can be done in $NRA^1(\text{log-loop}^{(k)}, \le)$ and, hence, in $NRA(\text{blog-loop}^{(k)}, \le)$. We observe that ext is used in an essential way at each iteration step, accounting for the parallelism in the evaluation of $\alpha_n$.
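The level-by-level evaluation can be pictured with a small Python sketch. The circuit representation used here (a list of levels, each a list of `(op, operand_indices)` gates that refer only to wires produced earlier) is a hypothetical format chosen for illustration, not the DCL encoding of the paper.

```python
OPS = {
    "AND": all,                        # unbounded fan-in conjunction
    "OR": any,                         # unbounded fan-in disjunction
    "NOT": lambda vals: not vals[0],   # negation of a single wire
}


def evaluate_circuit(levels, inputs):
    """Evaluate a layered circuit; returns the values of all wires.

    One pass of the outer loop corresponds to one iteration step; the inner
    comprehension treats all gates of a level independently, which is the
    role played by ext in the simulation.
    """
    wires = [bool(v) for v in inputs]
    for level in levels:
        new_values = [OPS[op]([wires[k] for k in args]) for op, args in level]
        wires.extend(new_values)
    return wires


# (x0 AND x1) OR (x2 AND x3): wires 4 and 5 on level 1, wire 6 on level 2.
circuit = [[("AND", [0, 1]), ("AND", [2, 3])], [("OR", [4, 5])]]
output = evaluate_circuit(circuit, [1, 1, 0, 0])[-1]
```

The number of sequential steps equals the height of the circuit, while the work inside one step is spread over all gates of that level.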

Finally we argue that the nesting depth of the resulting expression is no larger than $k$. We saw already that the circuit $\alpha_n$ can be simulated by some function, say $h(X)$, where the nesting depth of $h$ is $k$, but where $h$ uses the tables plus, minus, ... as free variables. We have to show that these tables do not increase the nesting depth. Indeed, the full expression of the simulation of $\alpha_n$ will then be

$$e \stackrel{\text{def}}{=} (\lambda(\mathit{plus}, \mathit{minus}, \ldots).\, h(X))(f_{\mathit{plus}}(z^l), f_{\mathit{minus}}(z^l), \ldots),$$

where $f_{\mathit{plus}}(z^l)$ computes the addition table plus on $z^l$, etc. By definition of the nesting depth (Subsection 3.1) we have $\mathrm{depth}(e) \stackrel{\text{def}}{=} \max(\mathrm{depth}(h(X)), \mathrm{depth}(f_{\mathit{plus}}), \mathrm{depth}(f_{\mathit{minus}}), \ldots) = \max(\mathrm{depth}(h(X)), 1) = \mathrm{depth}(h(X)) = k$ when $k \ge 1$; indeed, the iterations in $f_{\mathit{plus}}$ and $h(X)$ are not nested, but computed sequentially.

This proves

$$CQ\text{-}AC^k \subseteq NRA(\text{blog-loop}^{(k)}, \le) \qquad \forall k \ge 1.$$

If $t$ and $t'$ are both flat types, then all the computations described above are in $NRA^1(\text{blog-loop}^{(k)}, \le)$, which is equal to $NRA^1(\text{log-loop}^{(k)}, \le)$, by a straightforward extension of Proposition 3.5 to log-loop. Hence $Q\text{-}AC^k \subseteq NRA^1(\text{log-loop}^{(k)}, \le)$. ∎

Now we can prove our main results. Theorems 6.2 and 6.1 follow from Corollary 9.7 and Proposition 10.2. Proposition 6.5 follows from Proposition 9.6, and Proposition 6.3 is proven by a straightforward extension of the proof of Proposition 9.6.

11. CONCLUSIONS

Ordering seems to play a crucial role in capturing complexity classes below NP and our characterization is no exception. Indeed, it follows from Theorem 7.8 in [IPS91] that in the absence of ordering FO+dcr cannot express the lower bound in [CFI92] which is in $AC^0$ plus parity gates [CFI92, Remark 7.2]. As with PTIME, DLOGSPACE, etc., it remains an important open question whether there exists an r.e. set of "programs" that express exactly the NC-computable queries over arbitrary relational databases.

On the other hand, studying the expressiveness of the various forms of recursion on sets in the absence of ordering is quite relevant to query language design. It may also be relevant to complexity theory, if an analog to the surprising result of Abiteboul and Vianu [AV91] holds. They have shown that PTIME $\ne$ PSPACE iff first-order least fixpoint queries $\ne$ first-order while queries. (Vardi had shown that in the presence of order FO+while captures PSPACE [Var82].) Dawar, Lindell, and Weinstein [DLW93] give a machine-independent proof of the Abiteboul and Vianu result making use of properties of bounded variable logics. Abiteboul, Vardi, and Vianu [AVV92] give evidence for the robustness of the idea with several such results for other pairs of complexity classes. In our case, the analog would be: NC $\ne$ PTIME iff FO+dcr $\ne$ FO+sri (in our formalism, $NRA^1(\text{dcr}) \ne NRA^1(\text{sri})$). By setting aside the ordering, with its potential for tricky encodings, this would strengthen the observation (Section 6) that the difference between tractable sequential and tractable parallel computation can be characterized as the difference between two ways of recurring on sets.

APPENDICES

A. Genericity

In this appendix we review Chandra and Harel's definition of query genericity [CH80] and extend it to the case of queries over ordered databases. To define genericity, consider some subset $D' \subseteq D$. Then for every type $t$, let $t'$ be the type obtained by replacing each occurrence of $D$ with $D'$. Obviously $t' \subseteq t$. Let $\varphi : D' \to D$ be an injective function. For every type $t$, we define the extension of $\varphi$ to $t$ to be $\varphi^t : t' \to t$ as

$$\varphi^D(x) \stackrel{\text{def}}{=} \varphi(x)$$

$$\varphi^B(x) \stackrel{\text{def}}{=} x$$

$$\varphi^{t_1 \times \cdots \times t_k}(\langle x_1, \ldots, x_k \rangle) \stackrel{\text{def}}{=} \langle \varphi^{t_1}(x_1), \ldots, \varphi^{t_k}(x_k) \rangle$$

$$\varphi^{\{t\}}(\{x_1, \ldots, x_n\}) \stackrel{\text{def}}{=} \{\varphi^t(x_1), \ldots, \varphi^t(x_n)\}.$$

Definition A.1. We call a function $f : t_1 \to t_2$ generic if, for every injective function $\varphi : D' \to D$, the following holds: $\forall x \in t'_1.\ \varphi^{t_2}(f(x)) = f(\varphi^{t_1}(x))$.

In the case of ordered databases, i.e., in which a total order relation is defined on the base type $(D, \le)$, we call $f$ generic if $\varphi^{t_2} \circ f = f \circ \varphi^{t_1}$ for all order-preserving injective functions $\varphi : (D', \le) \to (D, \le)$.

Definition A.1 naturally extends that of Chandra and Harel [CH80] to ordered databases. In their case $\varphi$ only ranges over bijections $\varphi : D \to D$. For ordered databases, this is not enough for our purposes. Indeed, consider the case in which $D$ is the set of natural numbers. Then, there is only one order-preserving bijection $\varphi : (D, \le) \to (D, \le)$, namely the identity, and in this case, under the definition in [CH80] all queries are generic. By slightly changing the definition to allow the domain of $\varphi$ to be any subset $D'$ of $D$, we force $f$ to commute with order-preserving renamings of elements of $D$.
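The commutation condition can be tested on concrete values with a short Python sketch. It assumes atoms of $D$ are integers, sets are `frozenset`s, products are tuples, and a renaming is a dictionary; the two sample queries `swap` and `smallest` are hypothetical examples introduced here.

```python
def extend(phi, x):
    """Extension of an injective renaming phi of atoms to nested values."""
    if isinstance(x, bool):
        return x                                         # booleans are left fixed
    if isinstance(x, tuple):
        return tuple(extend(phi, c) for c in x)          # products: componentwise
    if isinstance(x, frozenset):
        return frozenset(extend(phi, c) for c in x)      # sets: elementwise
    return phi[x]                                        # atoms of type D


def commutes(f, phi, x):
    """Check phi(f(x)) == f(phi(x)) for one renaming and one input."""
    return extend(phi, f(x)) == f(extend(phi, x))


swap = lambda r: frozenset((b, a) for (a, b) in r)   # inverse of a binary relation
smallest = lambda s: frozenset({min(s)})             # uses the order on D

r = frozenset({(1, 2), (2, 3)})
s = frozenset({1, 2, 3})
scramble = {1: 30, 2: 10, 3: 20}    # injective, not order-preserving
monotone = {1: 10, 2: 20, 3: 30}    # injective and order-preserving
```

Here `swap` commutes with both renamings, while `smallest` commutes with `monotone` but not with `scramble`: it is generic only in the ordered sense.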

B. Type Lifting

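We prove here Proposition 3.5. For this we need the technique of type lifting. For any type $t$ we define the lifted type $\bar{t}$ to be the type obtained by putting set parentheses around all the base types it contains. Formally,

$$\bar{B} \stackrel{\text{def}}{=} \{B\}$$

$$\bar{D} \stackrel{\text{def}}{=} \{D\}$$

$$\overline{\{t\}} \stackrel{\text{def}}{=} \{t\}$$

$$\overline{t_1 \times \cdots \times t_k} \stackrel{\text{def}}{=} \bar{t}_1 \times \cdots \times \bar{t}_k.$$

Note that for every $t$, $\bar{t}$ is a PS-type. In addition we define $\mathrm{lift}_t : t \to \bar{t}$, which puts set parentheses around all the elements of base type that it contains, e.g. $\mathrm{lift}_D(x) \stackrel{\text{def}}{=} \{x\}$, $\mathrm{lift}_{t_1 \times \cdots \times t_k}(\langle x_1, \ldots, x_k \rangle) \stackrel{\text{def}}{=} \langle \mathrm{lift}_{t_1}(x_1), \ldots, \mathrm{lift}_{t_k}(x_k) \rangle$, and $\mathrm{lift}_{\{t\}}(x) \stackrel{\text{def}}{=} x$. Note that when $t$ is a PS-type, then $\bar{t} = t$ and $\mathrm{lift}_t(x) = x$.

For a given function $f : t \to t'$, where $t'$ is a PS-type, we define $\bar{f} : \bar{t} \to t'$ as

$$\bar{f}(x) \stackrel{\text{def}}{=} \bigcup \{ f(z) \mid \mathrm{lift}_t(z) \subseteq x \text{ and } \forall z'.\ (\mathrm{lift}_t(z) \subseteq \mathrm{lift}_t(z') \subseteq x) \Rightarrow z = z' \}.$$

(Recall that we extend the union operation to PS-types.) In the particular case when $t = D$ and $t'$ is a set type, $t' = \{t''\}$, we have $f : D \to \{t''\}$, $\bar{f} : \{D\} \to \{t''\}$, and $\bar{f} = \mathrm{ext}(f)$. Similarly, when $f : D \times D \to \{t''\}$ then $\bar{f} : \{D\} \times \{D\} \to \{t''\}$ is $\bar{f}(x, y) = \mathrm{ext}(\lambda u.\, \mathrm{ext}(\lambda v.\, f(u, v))(y))(x)$. In general: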

Proposition B.1. Whenever $f$ is expressible in $NRA^1$, $NRA^1(\text{dcr})$, or $NRA^1(\text{bdcr})$, then $\bar{f}$ is expressible in the same language.

Example B.2. Let $t = D \times (\{D \times D\} \times B)$, $t' = \{D\} \times \{D\}$, and $f : t \to t'$. Then $\bar{f}$ is defined by

$$\bar{f}(x_1, (x_2, x_3)) \stackrel{\text{def}}{=} \bigcup \{ f(z_1, (x_2, z_3)) \mid z_1 \in x_1,\ z_3 \in x_3 \}$$

and can be defined as

$$\bar{f}(x_1, (x_2, x_3)) = \mathrm{ext}(\lambda z_1.\, \mathrm{ext}(\lambda z_3.\, f(z_1, (x_2, z_3)))(x_3))(x_1).$$

Fact B.3. Let $f : t \to t'$, where $t'$ is a PS-type. Then $\bar{f}(\mathrm{lift}_t(x)) = f(x)$.
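The nested use of ext in this example can be tried out in Python. The sketch assumes a simplified output type $\{D\}$ instead of $\{D\} \times \{D\}$ so that plain set union suffices, models sets as `frozenset`s, and uses a made-up function `f`; all names are illustrative.

```python
def ext(f):
    """ext(f)(s): the union of f(z) over all z in s, for a set-valued f."""
    return lambda s: frozenset().union(*(f(z) for z in s))


def f(z1, pair):
    """A hypothetical f : D × ({D×D} × B) → {D}."""
    x2, z3 = pair
    if z3:
        return frozenset(b for (a, b) in x2 if a == z1)   # successors of z1 in x2
    return frozenset({z1})


def f_bar(x1, pair):
    """Lifted version: x1 is a set of atoms and x3 a set of booleans."""
    x2, x3 = pair
    return ext(lambda z1: ext(lambda z3: f(z1, (x2, z3)))(x3))(x1)


def lift(z1, pair):
    """lift_t: wrap the components of base type in singleton sets."""
    x2, z3 = pair
    return (frozenset({z1}), (x2, frozenset({z3})))


x2 = frozenset({(1, 2), (1, 3), (2, 4)})
direct = f(1, (x2, True))
via_lift = f_bar(*lift(1, (x2, True)))
```

On singleton arguments produced by `lift`, `f_bar` agrees with `f`, which is the content of Fact B.3; on larger sets it takes the union over all combinations of components.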

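Proof of Proposition 3.5. The inclusion NRA$^1$(bdcr) $\subseteq$ NRA$^1$(dcr) is obvious. For the inclusion NRA$^1$(dcr) $\subseteq$ NRA$^1$(bdcr), consider some expression $\mathit{dcr}(e, f, u) : \{t\} \to t'$ in NRA$^1$(dcr). First we lift $t'$ to a PS-type $\overline{t'}$. Also, we define $e' : \overline{t'}$, $f' : t \to \overline{t'}$, and $u' : \overline{t'} \times \overline{t'} \to \overline{t'}$ as

$$
\begin{aligned}
e' &\stackrel{\mathrm{def}}{=} \mathit{lift}_{t'}(e) \\
f' &\stackrel{\mathrm{def}}{=} \mathit{lift}_{t'} \circ f \\
u' &\stackrel{\mathrm{def}}{=} \overline{\mathit{lift}_{t'} \circ u}.
\end{aligned}
$$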

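If $u$ is associative, commutative, and has identity $e$ on some set $s \subseteq t'$, then $u'$ will still be associative, commutative, and will have identity $e'$ on the subset $\{\mathit{lift}_{t'}(x) \mid x \in s\}$ of $\overline{t'}$. Moreover, we have $\mathit{dcr}(e', f', u') = \mathit{lift}_{t'} \circ \mathit{dcr}(e, f, u)$. To see that, consider for illustration a set with three elements, $s = \{x_1, x_2, x_3\}$. Then

$$
\begin{aligned}
\mathit{dcr}(e', f', u')(\{x_1, x_2, x_3\})
&= u'(f'(x_1),\, u'(f'(x_2), f'(x_3))) \\
&= u'(f'(x_1),\, \overline{\mathit{lift}_{t'} \circ u}(\mathit{lift}_{t'}(f(x_2)), \mathit{lift}_{t'}(f(x_3)))) \\
&= u'(f'(x_1),\, \overline{\mathit{lift}_{t'} \circ u}(\mathit{lift}_{t' \times t'}(f(x_2), f(x_3)))) && \text{by the definition of } \mathit{lift}_{t' \times t'} \\
&= u'(f'(x_1),\, \mathit{lift}_{t'}(u(f(x_2), f(x_3)))) && \text{by Fact B.3} \\
&= \mathit{lift}_{t'}(u(f(x_1), u(f(x_2), f(x_3)))) && \text{by repeating the steps above} \\
&= \mathit{lift}_{t'}(\mathit{dcr}(e, f, u)(\{x_1, x_2, x_3\})).
\end{aligned}
$$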
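The identity between the lifted recursion and the lifting of the original recursion can be tried out on a concrete instance. The following is a minimal sketch, assuming that $D$ is modelled by the natural numbers, that $t = t' = D$, and that the hypothetical parameters are $e = 0$, $f(y) = y$, and $u = \max$ (associative and commutative with identity 0 on the naturals). The function `dcr` below is one possible evaluation strategy that splits a sorted listing of the set in halves; all names are introduced here for illustration.

```python
def dcr(e, f, u):
    """Divide and conquer recursion over a finite set of orderable elements."""
    def phi(s):
        items = sorted(s)
        def go(lo, hi):
            if hi - lo == 0:
                return e
            if hi - lo == 1:
                return f(items[lo])
            mid = (lo + hi) // 2
            # The two halves are disjoint, as the definition of dcr requires.
            return u(go(lo, mid), go(mid, hi))
        return go(0, len(items))
    return phi

# Original parameters at the non-PS-type t' = D.
e = 0
f = lambda y: y
u = max

def lift(x):
    """lift_D: wrap a scalar in a singleton set."""
    return frozenset([x])

# Lifted parameters at the PS-type {D}.
e_lifted = lift(e)
f_lifted = lambda y: lift(f(y))

def u_lifted(a, b):
    """The lifting of (lift o u): apply u to every pair drawn from a and b."""
    return frozenset(u(p, q) for p in a for q in b)

S = frozenset({3, 1, 4, 5, 9})
PLAIN_RESULT = dcr(e, f, u)(S)
LIFTED_RESULT = dcr(e_lifted, f_lifted, u_lifted)(S)
```

On a three-element input this evaluation order produces exactly the bracketing $u'(f'(x_1), u'(f'(x_2), f'(x_3)))$ used in the derivation.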

Now assume without loss of generality that $e$, $f$, $u$ have equivalent expressions in NRA$^1$(bdcr) and, hence, so do $e'$, $f'$, $u'$. Let $x_1, \ldots, x_k$ be all free variables in the expressions $e$, $f$, and $u$, and let $s$ be the input set to $\mathit{dcr}(e', f', u')$. We compute a bound for $\mathit{dcr}(e', f', u')$ as follows. First we define $b_D \in \{D\}$ to be the set of all values of type $D$ mentioned in $x_1, \ldots, x_k$, and $s$; obviously $b_D$ can be computed in NRA$^1$. Next, we define $b_B \in \{B\}$ to be $b_B \stackrel{\mathrm{def}}{=} \{\mathit{false}, \mathit{true}\}$. Finally we compute the bound $b$ of type $\overline{t'}$ by pairing proper cartesian products of $b_D$ and $b_B$. E.g., when $\overline{t'} = \{D\} \times (\{D \times D\} \times \{B\})$, we take $b \stackrel{\mathrm{def}}{=} (b_D, (b_D \times b_D, b_B))$. Then it is easy to check that $\lambda s .\, \mathit{bdcr}(e', f', u', b)(s) = \mathit{dcr}(e', f', u')$ (note that $s$ is a free variable occurring in the expression $b$).

So let $s \in \{t\}$ and $r = \mathit{dcr}(e, f, u)(s)$. We have shown how $\mathit{lift}_{t'}(r)$ can be computed in NRA$^1$(bdcr). Finally, observe that we can recover $r$ from $\mathit{lift}_{t'}(r)$ by using the function get. $\Box$
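The construction of the bound can be illustrated on the transitive closure example from the introduction, where the result type $\{D \times D\}$ is already a PS-type and lifting is the identity. The sketch below rests on an assumption about the semantics of bounded recursion that is not restated in this appendix: `bdcr(e, f, u, b)` is modelled as `dcr` with every intermediate result intersected with the bound `b`. The names `dcr`, `bdcr`, `compose`, `nodes`, and `bound` are hypothetical.

```python
from itertools import product

def dcr(e, f, u):
    """Divide and conquer recursion over a finite set of orderable elements."""
    def phi(s):
        items = sorted(s)
        def go(lo, hi):
            if hi - lo == 0:
                return e
            if hi - lo == 1:
                return f(items[lo])
            mid = (lo + hi) // 2
            return u(go(lo, mid), go(mid, hi))
        return go(0, len(items))
    return phi

def bdcr(e, f, u, b):
    """Assumed semantics: intersect each intermediate result with the bound b."""
    return dcr(e & b, lambda y: f(y) & b, lambda v1, v2: u(v1, v2) & b)

def compose(r1, r2):
    """Relational composition of two binary relations."""
    return frozenset((a, d) for (a, m) in r1 for (c, d) in r2 if m == c)

def nodes(r):
    """Union of the two projections of a binary relation."""
    return frozenset(a for a, _ in r) | frozenset(b for _, b in r)

def bound(r, s):
    """b_D x b_D, where b_D collects the D-values in the free variable r and in s."""
    b_d = nodes(r) | frozenset(s)
    return frozenset(product(b_d, repeat=2))

R = frozenset({(1, 2), (2, 3), (3, 4)})
S = nodes(R)
e = frozenset()
f = lambda y: R
u = lambda r1, r2: r1 | r2 | compose(r1, r2)

TC_UNBOUNDED = dcr(e, f, u)(S)
TC_BOUNDED = bdcr(e, f, u, bound(R, S))(S)
```

Because every pair produced along the way is built from values in the active domain, intersecting with `bound(R, S)` never discards anything, which is the point of the construction; a bound that is too small would truncate the result.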

C. The Role of get

We show in this appendix that even at flat types dcr is slightly more expressive than bdcr, because it can extract an element of type $D$ from a singleton set, as we have seen in Example 3.3.

Throughout this appendix we shall denote by NRA$_-$ the language NRA without get.

Proposition C.1. get cannot be expressed in NRA$_-$(bdcr).

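Proof (Sketch). Here we prove by induction on the structure of an expression $e$ in NRA$_-$(bdcr) that the following holds. Let $f : t \to t'$ be the associated query (i.e., the function $\lambda(x_1, \ldots, x_k) .\, e$, where $x_1, \ldots, x_k$ are all free variables in $e$; see Section 9). Then we show that each "scalar component" of $f(x)$ is one of the "scalar components" of $x$. More precisely, for every type $t$ define $\mathit{scalar}_t : t \to \{D\}$ to be

$$
\begin{aligned}
\mathit{scalar}_B(x) &\stackrel{\mathrm{def}}{=} \emptyset \\
\mathit{scalar}_D(x) &\stackrel{\mathrm{def}}{=} \{x\} \\
\mathit{scalar}_{\{t\}}(x) &\stackrel{\mathrm{def}}{=} \emptyset \\
\mathit{scalar}_{t_1 \times \cdots \times t_k}(x_1, \ldots, x_k) &\stackrel{\mathrm{def}}{=} \mathit{scalar}_{t_1}(x_1) \cup \cdots \cup \mathit{scalar}_{t_k}(x_k).
\end{aligned}
$$

Then, for every query $f : t \to t'$ associated to some expression in NRA$_-$(bdcr) we prove that the following condition holds:

$$
\forall x \in t, \quad \mathit{scalar}_{t'}(f(x)) \subseteq \mathit{scalar}_t(x). \tag{3}
$$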

Certainly get does not satisfy Condition 3, because $\mathit{scalar}_D(\mathit{get}(\{d_1\}, d_2)) = \mathit{scalar}_D(d_1) = \{d_1\}$, while $\mathit{scalar}_{\{D\} \times D}((\{d_1\}, d_2)) = \{d_2\}$.
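The counterexample can be replayed mechanically. This sketch assumes the same hypothetical type encoding as before (strings `"B"` and `"D"`, tagged tuples for set and product types) and models get as returning the unique element of a singleton first argument and the second argument otherwise; the helper `satisfies_condition_3` is a name introduced here and checks the inclusion at a single input only.

```python
def scalar(t, x):
    """Collect the D-values that occur in x outside of any set."""
    if t == "D":
        return frozenset([x])
    if t == "B" or t[0] == "set":
        return frozenset()
    if t[0] == "prod":
        out = frozenset()
        for c, xi in zip(t[1:], x):
            out |= scalar(c, xi)
        return out
    raise ValueError(f"unknown type: {t!r}")

def get(x, y):
    """Assumed behaviour of get : {D} x D -> D."""
    if len(x) == 1:
        (d,) = x
        return d
    return y

def satisfies_condition_3(query, t_in, t_out, x):
    """Check scalar(f(x)) is included in scalar(x) for one input x."""
    return scalar(t_out, query(x)) <= scalar(t_in, x)

T_IN = ("prod", ("set", "D"), "D")
X = (frozenset({1}), 2)                    # the pair ({d1}, d2) with d1 = 1, d2 = 2

GET_OK = satisfies_condition_3(lambda p: get(*p), T_IN, "D", X)
PROJ_OK = satisfies_condition_3(lambda p: p[1], T_IN, "D", X)
```

The second projection passes the check on this input, while get fails it, since it brings the value `1` out of the set and into scalar position.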

We prove Condition 3 by induction on the structure of an expression $e$ in NRA$_-$(bdcr). We consider some relevant cases for $e$:

Projection. For illustration take $e = \pi_1(e')$, where $e' : t' \times t''$, $e : t'$. Then $f = \lambda x .\, \pi_1(f'(x))$, where $f'$ is the query associated to $e'$. By induction hypothesis we have $\mathit{scalar}_{t' \times t''}(f'(x)) \subseteq \mathit{scalar}_t(x)$. It suffices to observe that $\mathit{scalar}_{t'}(\pi_1(f'(x))) \subseteq \mathit{scalar}_{t' \times t''}(f'(x))$.

Union $e = e_1 \cup e_2$. Since the type $t'$ is now a set type, $\mathit{scalar}_{t'}(f(x)) = \emptyset$, and there is nothing to prove.

Function application $e = g(e')$. For the sake of clarity suppose that $g : t'' \to t'$ does not have any free variables, hence its associated query is $g$ itself. Let $f' : t \to t''$ be the query associated to $e'$, so the query associated to $e$ is $f = g \circ f'$. Then, using the induction hypotheses we have

$$
\mathit{scalar}_{t'}(f(x)) = \mathit{scalar}_{t'}(g(f'(x))) \subseteq \mathit{scalar}_{t''}(f'(x)) \subseteq \mathit{scalar}_t(x).
$$

Bounded recursion. As in the case of union, there is nothing to prove here, since the resulting type is a PS-type.

The remaining cases can be easily checked. $\Box$

This result justifies the inclusion of get in the language NRA. The next result shows that the additional expressive power brought by get is largely cosmetic. Indeed, we show how to get around the function get through type lifting, a technique for transforming a non-PS-type into a PS-type defined in Subsection 3.2.

Proposition C.2. Let $f : t \to t'$ be a query in NRA(bdcr) (or in NRA$^1$(bdcr)). Then $\mathit{lift} \circ f : t \to \overline{t'}$ is expressible in NRA$_-$(bdcr) (or in NRA$^1_-$(bdcr), respectively), i.e., without get. Moreover, $\forall k \ge 0$, if $f \in$ NRA(bdcr$^{(k)}$) (or $f \in$ NRA$^1$(bdcr$^{(k)}$)), then $\mathit{lift} \circ f \in$ NRA$_-$(bdcr$^{(k)}$) (or $\mathit{lift} \circ f \in$ NRA$^1_-$(bdcr$^{(k)}$), respectively).

As a consequence, adding or dropping get from NRA does not affect in any way the queries whose result is a PS-type.

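Proof. We prove by induction on the structure of an expression $e$ in NRA$^1$(bdcr, get) that its associated query $f$ (in the sense of Section 9) satisfies the above property. The case when $e$ is the query $\mathit{get} : \{D\} \times D \to D$ is trivial, because $\mathit{lift}_D \circ \mathit{get} : \{D\} \times D \to \{D\}$ is

$$
\lambda(x, y) .\ \text{if } \mathit{card}(x) = 1 \text{ then } x \text{ else } \{y\}
$$

which is expressible in NRA$^1$. The only interesting case is when $e$ is function application, i.e., $e = g(e')$. Let $f' : t \to t''$ be the query associated to $e'$ and assume, for the sake of clarity, that $g$ has no free variables; i.e., the query associated to $g$ is $g$ itself. Then $f = g \circ f'$. By induction hypothesis we know that $\mathit{lift}_{t''} \circ f'$ and $\mathit{lift}_{t'} \circ g$ are expressible in NRA$^1_-$(bdcr), say by $h : t \to \overline{t''}$ and $h' : t'' \to \overline{t'}$. We prove that $\mathit{lift} \circ f = \overline{h'} \circ h$ and, since the latter is in NRA$^1_-$(bdcr), this concludes our proof. Indeed,

$$
\begin{aligned}
\overline{h'}(h(x)) &= \overline{h'}(\mathit{lift}_{t''}(f'(x))) \\
&= h'(f'(x)) && \text{by Fact B.3} \\
&= \mathit{lift}_{t'}(g(f'(x))) = \mathit{lift}_{t'}(f(x)) && \text{by the choice of } h'. \quad \Box
\end{aligned}
$$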
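The base case of this proof is easy to check by hand or by machine. The sketch below assumes that get returns the unique element of a singleton first argument and its second argument otherwise, which is the behaviour implied by the conditional expression for $\mathit{lift}_D \circ \mathit{get}$; the names `get` and `lifted_get` are introduced here for illustration.

```python
def get(x, y):
    """Assumed behaviour of get : {D} x D -> D."""
    if len(x) == 1:
        (d,) = x           # extract the single element
        return d
    return y

def lifted_get(x, y):
    """lift_D o get, written without extracting any element from x."""
    return x if len(x) == 1 else frozenset([y])

# A few inputs covering the singleton, empty, and larger-set cases.
SAMPLES = [
    (frozenset({7}), 0),
    (frozenset(), 0),
    (frozenset({1, 2}), 5),
]
AGREES = all(lifted_get(x, y) == frozenset([get(x, y)]) for x, y in SAMPLES)
```

The lifted version only tests the cardinality of its argument and builds a singleton, so it never needs to pull a value of type $D$ out of a set.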

D. Alternative Encoding of Flat Relations

Immerman [Imm89] proves a relationship between queries computable in extensions of FO and parallel complexity classes. His encoding of flat relations is different from ours; it is more elegant when dealing with flat relations, but it does not extend to complex objects. We prove here that over flat relations our encoding is AC$^0$ equivalent to Immerman's, i.e., that the functions translating between the two encodings are in AC$^0$.

For the sake of simplicity we will consider only the base type $D$, and drop the types unit and $B$. Then a flat relation type is a type of the form $\{D^k\}$. For a flat relation $x \in \{D^k\}$ we define the bit-wise encoding of $x$ to be the following string $X \in \{0, 1\}^*$. Let $d = \{d_0, d_1, \ldots, d_{n-1}\} \in \{D\}$ be a set containing the active domain of $x$, i.e., the set of all values of type $D$ mentioned in $x$, and assume $d_0 < d_1 < \cdots < d_{n-1}$. Then the encoding $X$ associated to $x$ and $d$ will have length $n^k$ and will be defined as follows. For every $i_1, i_2, \ldots, i_k$ with $0 \le i_1, i_2, \ldots, i_k \le n-1$, $X[i_1 n^{k-1} + i_2 n^{k-2} + \cdots + i_k] = 1$ iff $(d_{i_1}, \ldots, d_{i_k}) \in x$.
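The bit-wise encoding just defined can be sketched directly. The code below is an illustration only: it assumes the string $X$ is represented as a Python `str` of the characters `0` and `1`, that domain values are orderable, and that `bitwise_encode` and `bitwise_decode` are hypothetical helper names.

```python
from itertools import product

def bitwise_encode(x, d, k):
    """Encode the k-ary relation x over the ordered domain d as n**k bits."""
    dom = sorted(d)                       # d_0 < d_1 < ... < d_{n-1}
    n = len(dom)
    index = {v: i for i, v in enumerate(dom)}
    bits = ["0"] * (n ** k)
    for tup in x:
        pos = 0
        for v in tup:                     # pos = i_1*n^(k-1) + ... + i_k
            pos = pos * n + index[v]
        bits[pos] = "1"
    return "".join(bits)

def bitwise_decode(X, d, k):
    """Recover the relation from its bit-wise encoding relative to d."""
    dom = sorted(d)
    n = len(dom)
    # product(...) enumerates index tuples in the same order as the positions.
    return frozenset(
        tuple(dom[i] for i in idx)
        for pos, idx in enumerate(product(range(n), repeat=k))
        if X[pos] == "1"
    )

REL = frozenset({("a", "b"), ("c", "c")})
DOM = frozenset({"a", "b", "c"})
ENCODED = bitwise_encode(REL, DOM, 2)
DECODED = bitwise_decode(ENCODED, DOM, 2)
```

With $n = 3$ and $k = 2$ the pair `("a", "b")` lands at position $0 \cdot 3 + 1 = 1$ and `("c", "c")` at position $2 \cdot 3 + 2 = 8$.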

Proposition D.1. There exists a function $F : \{0, 1\}^{n^k} \to \{0, 1\}^{n^k(3kn+3k+6)+3}$ in AC$^0$ such that, for every string $X$ representing the bit-wise encoding of some flat relation $x$, we have $x \sim F(X)$ and, moreover, $F(X)$ is a minimal encoding of $x$. Conversely, there exists a function $G : \{0, 1\}^n \to \{0, 1\}^{n^k}$ in AC$^0$ such that for every string $X$ encoding some flat relation $x$, $G(X)$ is a bit-wise encoding of $x$.

Proof. To compute $F(X)$ we proceed as follows. The output string will be formed of a leading left brace, $\{$, followed by $n^k$ "cells" of $3kn+3k+6$ bits each. Each cell is split into a "body" built from the first $3kn+3k+3$ bits, and a "tail," containing the last three bits. We fill the cell corresponding to $i_1, i_2, \ldots, i_k$ either with blanks, when $X[i_1 n^{k-1} + i_2 n^{k-2} + \cdots + i_k] = 0$, or with $(i_1, \ldots, i_k)$ otherwise. For the second case observe that each of $i_1, \ldots, i_k$ is between $0$ and $n-1$, hence each requires only $3\lceil \log(n+1) \rceil \le 3n$ bits, for a total of $3kn$ bits, and we need an additional $3k+3$ bits to encode the brackets $(\ )$ and the commas. In the first case we fill the "tail" with a blank, while in the second case we fill it with a comma, except for the last nonblank body, where we put a right brace $\}$.

For $G(X)$ we observe first that the active domain of $x$ has fewer than $n$ elements, where $n = \mathit{length}(X)$. We start by computing $Y = B^{\{D^k\}}_n(X)$; see Lemma 9.3. Then, for every $i_1, \ldots, i_k$, the bit number $i_1 n^{k-1} + i_2 n^{k-2} + \cdots + i_k$ of the output will be 1 iff $\exists j, j'$ with $0 \le j < j' \le n-1$ such that $Y[j] = Y[j'] = 1$ and $\forall m .\, j < m < j' \Rightarrow Y[m] = 0$, and $X[j : j'-1]$ equals $(i_1, \ldots, i_k)$. The latter test can be made using a circuit of depth $O(k)$ (since we need $k'$ levels of comparisons to identify the position of $y_{k'}$, for $1 \le k' \le k$). $\Box$

ACKNOWLEDGMENTS

We thank Scott Weinstein for many illuminating discussions, Neil Immerman for answering our sometimes naive queries, Peter Buneman and Leonid Libkin for suggestions from a careful reading of an earlier version of this paper, and Peter, Leonid, and Limsoon Wong for their constant help.

REFERENCES

[AB88] S. Abiteboul and C. Beeri, On the power of languages for the manipulation of complex objects, in "Proceedings, International Workshop on Theory and Applications of Nested Relations and Complex Objects, Darmstadt, 1988." Also available as INRIA Technical Report 846.

[AHV95] S. Abiteboul, R. Hull, and V. Vianu, "Foundations of Databases," Addison-Wesley, Reading, MA, 1995.

[AV91] S. Abiteboul and V. Vianu, Generic computation and its complexity, in "Proceedings, 23rd ACM Symposium on the Theory of Computing, 1991."

[AVV92] S. Abiteboul, M. Vardi, and V. Vianu, Fixpoint logics, relational machines, and computational complexity, in "Structure and Complexity, 1992."

[BBKV87] F. Bancilhon, T. Briggs, S. Khoshafian, and P. Valduriez, FAD, a powerful and simple database language, in "Proceedings, 13th International Conference on Very Large Data Bases, 1987," pp. 97-105.

[BBW92] V. Breazu-Tannen, P. Buneman, and L. Wong, Naturally embedded query languages, in "Proceedings, 4th International Conference on Database Theory, Berlin, October, 1992" (J. Biskup and R. Hull, Eds.), Lect. Notes in Comput. Sci., Vol. 646, pp. 140-154, Springer-Verlag, New York/Berlin, 1992. Available as UPenn Technical Report MS-CIS-92-47.

[BIS90] D. M. Barrington, N. Immerman, and H. Straubing, On uniformity within NC$^1$, J. Comput. System Sci. 41 (1990), 274-306.

[BNTW95] P. Buneman, S. Naqvi, V. Tannen, and L. Wong, Principles of programming with collection types, Theoretical Computer Science, to appear.

[BT92] V. Breazu-Tannen, Generalized structural recursion and sets vs. bags, January 1992. Manuscript available from val@saul.cis.upenn.edu.

[BTBN91] V. Breazu-Tannen, P. Buneman, and S. Naqvi, Structural recursion as a query language, in "Proceedings, 3rd International Workshop on Database Programming Languages, Naphlion, Greece, August 1991," pp. 9-19, Morgan Kaufmann, San Mateo, CA, 1992. Also available as UPenn Technical Report MS-CIS-92-17.

[BTS91] V. Breazu-Tannen and R. Subrahmanyam, Logical and computational aspects of programming with Sets/Bags/Lists, in "LNCS 510: Proceedings of 18th International Colloquium on Automata, Languages, and Programming, Madrid, Spain, July 1991," pp. 60-75, Springer-Verlag, 1991.

[CFI92] J-Y. Cai, M. Furer, and N. Immerman, An optimal lower bound on the number of variables for graph identification, Combinatorica 12, No. 4 (1992), 389-410.



[CH80] A. Chandra and D. Harel, Computable queries for relational databases, J. Comput. System Sci. 21, No. 2 (1980), 156-178.

[CL90] K. L. Compton and C. Laflamme, An algebra and a logic for NC, Information and Computation 87, No. 1-2 (1990), 240-262.

[Clo90] P. Clote, Sequential, machine-independent characterizations of the parallel complexity classes AlogTime, AC$^k$, NC$^k$, and NC, in "Feasible Mathematics" (S. R. Buss and P. J. Scott, Eds.), Birkhäuser, Boston, 1990.

[Coo85] S. Cook, A taxonomy of problems with fast parallel algorithms, Inform. and Control 64 (1985), 2-22.

[DLW93] A. Dawar, S. Lindell, and S. Weinstein, Infinitary logic and inductive definability over finite structures, Inform. and Comput. 119 (1995), 160-175. Available as UPenn Technical Report MS-CIS-91-97.

[DV91] K. Denninghof and V. Vianu, The power of methods with parallel semantics, in "Proceedings, 17th International Conference on Very Large Databases, 1991."

[Fag93] R. Fagin, Finite model theory - A personal perspective, Theoretical Computer Science 116, No. 1 (1993), 3-32.

[Gur83] Y. Gurevich, Algebra of feasible functions, in "Proceedings, 24th IEEE Symposium on Foundations of Computer Science," pp. 210-214, IEEE Comput. Soc., Los Alamitos, CA, 1983.

[GV91a] S. Grumbach and V. Vianu, Expressiveness and complexity of restricted languages for complex objects, in "Proceedings, 3rd International Workshop on Database Programming Languages, Naphlion, Greece," pp. 191-202, Morgan Kaufmann, 1991.

[GV91b] S. Grumbach and V. Vianu, "Tractable Query Languages for Complex Object Databases," Technical Report 1573, INRIA, Rocquencourt BP 105, 78152 Le Chesnay, France, 1991. Extended abstract appeared in PODS 91.

[GV91c] S. Grumbach and V. Vianu, Tractable query languages for complex object databases, in "Proceedings, 10th ACM Symposium on Principles of Database Systems, 1991."

[GV95] S. Grumbach and V. Vianu, Tractable query languages for complex object databases, J. Comput. System Sci. 51, No. 2 (1995), 149-167.

[Imm82] N. Immerman, Upper and lower bounds for first-order expressibility, J. Comput. System Sci. 25 (1982), 76-98.

[Imm86] N. Immerman, Relational queries computable in polynomial time, Information and Control 68 (1986), 86-104.

[Imm87a] N. Immerman, Expressibility as a complexity measure: Results and directions, in "Proceedings, 2nd Conference on Structure in Complexity Theory," pp. 194-202, 1987.

[Imm87b] N. Immerman, Languages that capture complexity classes, SIAM Journal of Computing 16 (1987), 760-778.

[Imm89] N. Immerman, Expressibility and parallel complexity, SIAM Journal of Computing 18 (1989), 625-638.

[IPS91] N. Immerman, S. Patnaik, and D. Stemple, The expressiveness of a family of finite set languages, in "Proceedings, 10th ACM Symposium on Principles of Database Systems," pp. 37-52, 1991.

[LW94a] L. Libkin and L. Wong, Aggregate functions, conservative extension, and linear orders, in "Proceedings, 4th International Workshop on Database Programming Languages, New York, August 1993" (C. Beeri, A. Ohori, and D. E. Shasha, Eds.), pp. 282-294, Springer-Verlag, New York/Berlin, 1994. See also UPenn Technical Report MS-CIS-93-36.

[LW94b] L. Libkin and L. Wong, New techniques for studying set languages, bag languages, and aggregate functions, in "Proceedings, 13th ACM Symposium on Principles of Database Systems," pp. 115-166, Minneapolis, Minnesota, 1994. See also UPenn Technical Report MS-CIS-93-95.

[Mos74] Y. N. Moschovakis, "Elementary Induction on Abstract Structures," North Holland, Amsterdam, 1974.

[OBB89] A. Ohori, P. Buneman, and V. Breazu-Tannen, Database programming in Machiavelli, a polymorphic language with static type inference, in "Proceedings of ACM-SIGMOD International Conference on Management of Data" (J. Clifford, B. Lindsay, and D. Maier, Eds.), pp. 46-57, Portland, Oregon, 1989.

[PG88] J. Paredaens and D. Van Gucht, Possibilities and limitations of using flat operators in nested algebra expressions, in "Proceedings, 7th ACM Symposium on Principles of Database Systems," pp. 29-38, Austin, Texas, 1988.

[PG92] J. Paredaens and D. Van Gucht, Converting nested relational algebra expressions into flat algebra expressions, ACM Trans. Database Systems 17, No. 1 (1992), 65-93.

[PSV92] D. Stott Parker, E. Simon, and P. Valduriez, SVP: A model capturing sets, streams, and parallelism, in "Proceedings, 18th International Conference on Very Large Databases, Vancouver, August 1992" (L-Y. Yuan, Ed.), pp. 115-126, Morgan-Kaufmann, San Mateo, California, 1992.

[RB90] M. Sipser and R. Boppana, The complexity of finite functions, in "Handbook of Theoretical Computer Science. Vol. A: Algorithms and Complexity" (J. Van Leeuwen, Ed.), MIT Press, 1990.

[RW93] P. Rao and C. Walinsky, An equational language for data-parallelism, in "Proceedings, 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming," pp. 112-118, ACM Press, 1993.

[Saz93] V. Y. Sazonov, Hereditarily-finite sets, data bases and polynomial-time computability, Theoretical Computer Science 119 (1993), 187-214.

[SS86] H.-J. Schek and M. H. Scholl, The relational model with relation-valued attributes, Information Systems 11, No. 2 (1986), 137-147.

[Suc97] D. Suciu, Bounded fixpoints for complex objects, Theoretical Computer Science 176 (1997), 283-328.

[SV84] L. Stockmeyer and U. Vishkin, Simulation of parallel random access machines by circuits, SIAM Journal of Computing 13 (1984), 409-422.

[SW95] D. Suciu and L. Wong, On two forms of structural recursion, 1995.

[TF86] S. J. Thomas and P. C. Fisher, Nested relational structures, in "Advances in Computing Research: The Theory of Databases" (P. C. Kanellakis and F. P. Preparata, Eds.), pp. 269-307, JAI Press, London, England, 1986.

[Var82] M. Y. Vardi, The complexity of relational query languages, in "Proceedings, 14th ACM SIGACT Symposium on the Theory of Computing," pp. 137-146, San Francisco, California, 1982.

[Won93] L. Wong, Normal forms and conservative properties for query languages over collection types, in "Proceedings, 12th ACM Symposium on Principles of Database Systems," pp. 26-36, Washington, D.C., 1993. See also UPenn Technical Report MS-CIS-92-59.

[Won94] L. Wong, "Querying Nested Collections," Ph.D. thesis, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 19104, 1994. Available as University of Pennsylvania IRCS Report 94-09.

