



INFORMATION SCIENCES 6, 49-83 (1973)

On the Generative Power of Transformational Grammars*

P. STANLEY PETERS, JR.†

University of Texas, Austin, Texas

AND

R. W. RITCHIE§

University of Washington, Seattle, Washington

Communicated by Frank B. Cannonito

ABSTRACT

Mathematical modeling of phrase structure grammars has yielded many results of benefit to linguists in their investigation of these grammars, such as Chomsky's characterization in terms of self-embedding of those context-free languages which are not regular. The recent shift of focus in linguistic theory to transformational grammars has not been accompanied by a similar application of mathematical techniques to transformations. Our present purpose is to foster such studies by providing general definitions which model grammatical transformations as mappings on trees (equivalently, labeled bracketings) and investigating questions of current linguistic interest, such as the recursiveness of languages generated by transformational grammars. The first result of our research is that, despite the linguistically motivated, complex restrictions placed on transformational grammars, every recursively enumerable set of strings is a transformational language (Theorem 5.1). We demonstrate that this power of transformational grammars to generate non-recursive languages results from their ability to cycle their rules, applying transformations an unbounded number of times (Corollary 6.6). Analysis of decision procedures for grammars with bounded cycling reveals a connection between the amount of cycling permitted by a grammar and the complexity of the recursive set it generates; if cycling is bounded by any elementary recursive function (primitive recursive function, function in ℰⁿ for n ≥ 3), then the language generated has characteristic function in the same class (Corollary 6.7). One application of these results provides empirical support for the notion that natural languages are recursively, in fact elementarily, decidable. Our results also isolate one feature which must be further restricted in a linguistically motivated way if transformational theory is to achieve its goal of delimiting precisely the natural languages.

INTRODUCTION

In Aspects of the Theory of Syntax, Chomsky presents a theory of transformational grammar. The purpose of this paper is to formalize this notion of transformational grammar and to study the expressive power of these grammars.

*This work was supported in part by the 1965 and 1968 Advanced Research Seminars in Mathematical Linguistics, sponsored by the Center for Advanced Studies in the Behavioral Sciences, Stanford, California.

†Correspondence to this author at: Dept. of Linguistics, University of Texas, Austin, Texas 78712.

§Supported in part by National Science Foundation Grant NSF GP-1851.

© American Elsevier Publishing Company, Inc., 1973


In particular, we relate the languages generated by these grammars to classes of languages studied in recursive function theory.

The paper is arranged as follows:

Section 1 is an informal discussion of the nature of grammatical transformations and the manner in which they operate on phrase-markers. This material will be familiar to linguists.

Section 2 merely makes precise the concepts introduced in Sec. 1 with one difference. In informal discussion, phrase-markers are represented as trees to aid the reader’s intuitions, but in Sec. 2 they are represented as labeled bracketings for technical convenience in later sections.

Section 3 merely recaps the definitions of phrase structure grammars, with emphasis on the manner in which they generate sets of phrase-markers.

Section 4 defines a transformational grammar to contain two components: a base component (consisting of a phrase structure grammar) and a transformational component (consisting of a finite ordered set of grammatical transformations). Furthermore, transformations are defined to apply cyclically in derivations converting step by step a phrase-marker generated by the base into a derived phrase-marker. If the latter contains no occurrences of a special sentence boundary symbol, it is a surface structure and the phrase-marker initiating the derivation is a deep structure underlying it. A transformational grammar then generates as its language the set of all strings which have a surface phrase-marker.

With these definitions as background, we prove in Sec. 5 that every recursively enumerable set of strings is the language generated by some transformational grammar. In Sec. 6 we examine the sets of languages generated by restricted types of transformational grammar and prove that the complexity of the language generated by a transformational grammar is no greater than the complexity of computation of the length of an underlying deep structure from a sentence. Section 7 is devoted to discussing some implications of these results for natural language in light of empirical studies linguists have made of a variety of languages. Empirical support is given for the hypothesis that natural languages are recursive.

The reader whose interest is primarily in the results of Secs. 5, 6, or 7 is encouraged to proceed directly to these sections after reading Sec. 1. Sections 5 and 6, which require properties of transformational grammars detailed in Secs. 2-4, begin with summaries of the relevant properties. The properties summarized at the beginning of Sec. 5 follow immediately from the definitions, while those of Sec. 6 are deduced at the end of that section.

1. TRANSFORMATIONS: INFORMAL DEVELOPMENT

As is usual in the formal study of grammars, we consider a language to be a set of finite strings over a vocabulary of terminal symbols, i.e. given a finite


nonempty set VT (the terminal vocabulary) we may form the set VT* of all finite sequences of members of VT. Then a language is any subset of VT*. Phrase structure and transformational grammars also refer to another vocabulary of symbols, the nonterminal vocabulary VN of phrase types or grammatical categories.

These grammatical categories appear in phrase-markers of strings in VT*, which represent their segmentation into phrases and the classification of these phrases into types. A phrase-marker may be represented as a tree in which the leaves are labeled with members of VT and the other nodes with members of VN. The sequence of leaves dominated by a node labeled with a nonterminal symbol A is a phrase of type A. Alternatively, the same information can be represented by a well-formed labeled bracketing (cf. Defs. 2.1 and 2.11). As an example of a phrase-marker, assume that we are given the nonterminal vocabulary VN = {S, NP, VP, N, A} and the terminal vocabulary VT = {they, are, flying, planes} and consider the tree (1).

(1) [tree diagram: S branches into NP and VP; NP dominates N ("they"); VP dominates "are" and an NP; that NP dominates A ("flying") and N ("planes")]

Phrase-marker (1) represents the information that, for example, flying planes is a member of the grammatical category NP, as is they. On the other hand, are flying is not a phrase of any type according to (1).
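Readers who wish to experiment with these definitions can encode a phrase-marker directly. The following Python sketch (our own illustration; the tuple encoding and function names are not part of the paper's formalism) represents tree (1) and enumerates each phrase together with its grammatical category:

```python
# Phrase-marker (1) encoded as (label, children); leaves are terminal strings.
TREE = ("S", [
    ("NP", [("N", ["they"])]),
    ("VP", ["are",
            ("NP", [("A", ["flying"]), ("N", ["planes"])])]),
])

def leaves(node):
    """Terminal string (as a list of words) dominated by a node."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [w for c in children for w in leaves(c)]

def phrases(node):
    """All pairs (category, phrase) for nonterminal nodes of the tree."""
    if isinstance(node, str):
        return []
    label, children = node
    result = [(label, " ".join(leaves(node)))]
    for c in children:
        result.extend(phrases(c))
    return result

P = phrases(TREE)
# "flying planes" and "they" are NPs; "are flying" is not a phrase of any type.
```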

Transformational rules are mappings of phrase-markers into phrase-markers (cf. Def. 2.14). Each such rule consists of two parts: a structural condition and a set of elementary transformations (cf. Defs. 2.8, 2.10, and 2.12). The structural condition of a transformation serves to determine whether or not the rule will apply to a given phrase-marker and, if so, how to factor the phrase-marker into sections to be rearranged, duplicated or deleted. These effects are achieved by application of elementary transformations to factors of the phrase-marker. In order to be a transformation, a paired structural condition and set of elementary transformations must meet conditions of compatibility, chief among them the condition of recoverability of deletions (cf. Def. 2.13). A factorization of a phrase-marker is induced by a factorization of its terminal string in the

52 P. STANLEY PETERS, JR., AND R. W. RITCHIE

following way. Consider the factorization of the terminal string

(2) they are flying planes

into the four substrings X1 = they, X2 = are, X3 = flying, and X4 = planes. This induces the division of (1) into factors as indicated in (3) (cf. Def. 2.6). The factors are given in (4).

(3) [tree diagram: (1) with factor boundaries drawn between they | are | flying | planes]

(4) [tree diagrams of the four factors: NP dominating N ("they"); the bare terminal "are"; A dominating "flying"; N dominating "planes"]

Notice that each tree factor is chosen so as to include the highest node dominating only terminal symbols in the corresponding string factor and that nodes which dominate two or more string factors do not appear in any tree factor. (cf. Def. 2.5 for corresponding concepts in terms of labeled bracketings.)
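The selection rule just described, that each tree factor consists of the highest nodes dominating only terminal symbols of its string factor, can be sketched as follows (an illustrative Python fragment of our own; leaf-index spans stand in for domination):

```python
# Phrase-marker (1) as (label, children); leaves are terminal strings.
TREE = ("S", [
    ("NP", [("N", ["they"])]),
    ("VP", ["are",
            ("NP", [("A", ["flying"]), ("N", ["planes"])])]),
])

def node_spans(node, start=0):
    """List (label, lo, hi) for every nonterminal node, where the node
    dominates exactly the leaves with indices lo..hi-1; ancestors are
    listed before their descendants.  Returns (spans, end)."""
    if isinstance(node, str):
        return [], start + 1
    label, children = node
    spans, pos = [], start
    for c in children:
        s, pos = node_spans(c, pos)
        spans.extend(s)
    return [(label, start, pos)] + spans, pos

def highest_nodes(lo, hi):
    """Highest nodes of (1) dominating only leaves inside the factor [lo, hi)."""
    spans, _ = node_spans(TREE)
    inside = [s for s in spans if lo <= s[1] and s[2] <= hi]
    # keep a node only if no earlier-listed (i.e. higher) node contains it
    return [s for i, s in enumerate(inside)
            if not any(t[1] <= s[1] and s[2] <= t[2] for t in inside[:i])]

# Factors of "they are flying planes": "they" -> NP, "are" -> no node,
# "flying" -> A, "flying planes" -> NP.
```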

The factorization X1, X2X3, X4 of (2) into three terms induces the factorization of (1) indicated in (5).

(5) [tree diagram: (1) with factor boundaries drawn between they | are flying | planes]

GENERATIVE POWER OF TRANSFORMATIONAL GRAMMARS 53

The first and last factors are the same as before but the second factor is (6) which is not a subtree of (1) but a forest of adjacent subtrees.

(6) [forest: the bare terminal "are" followed by A dominating "flying"]

Such forests will arise not only as a single factor of a factorization but also as what we shall call a sequence of factors. The sequence of ith-jth factors of a tree factorization is defined to be the ith factor of the tree factorization induced by concatenating the ith through jth string factors (cf. Def. 2.7 of "contents," the corresponding notion on labeled bracketing). For example, (6) is the sequence of 2nd-3rd factors of (3) and (7) is the sequence of 2nd-4th factors of (3).

(7) [forest: the bare terminal "are" followed by NP dominating A ("flying") and N ("planes")]

A structural condition will specify the properties a factorization of a tree must have if a transformation is to operate on it. These properties are expressed by employing three sorts of predicate: one sort specifies that a particular sequence of factors has a phrase of a certain type as its terminal string, another specifies that two sequences of factors are identical, and a third specifies that a sequence of factors possesses a certain terminal string. Each predicate is true only of factorizations with a specified number of terms and deals with particular sequences of these terms.

(a) For every nonterminal symbol A the predicate A^n_{i→j} is true of a factorization if and only if 1 ≤ i ≤ j ≤ n, the factorization has n terms, and a node


labeled A dominates the terminal string of the sequence of ith-jth factors.

(b) The predicate h → i ≡ⁿ j → k is true of a factorization if and only if 1 ≤ h ≤ i ≤ n, 1 ≤ j ≤ k ≤ n, the factorization has n terms, and the sequence of hth-ith factors is identical to the sequence of jth-kth factors.

(c) For every string x of terminal symbols, the predicate i → j =ⁿ x is true of a factorization if and only if 1 ≤ i ≤ j ≤ n, the factorization has n terms, and the sequence of ith-jth factors has x as its terminal string.

Any Boolean combination of these predicates is a structural condition. The action of a transformation on a tree to which it applies is determined by

the elementary transformations it contains. Each elementary operates on factorizations of trees in one of three ways.

(i) The deletion elementary [T_d, (i, j)] deletes the sequence of ith-jth factors of a factorization.

(ii) The substitution elementary [T_s, (h, i), (j, k)] substitutes a copy of the sequence of jth-kth factors for the sequence of hth-ith factors if the latter is a subtree.

(iii) The adjunction elementaries [T_r, (h, i), (j, k)] and [T_l, (h, i), (j, k)] attach a copy of the sequence of jth-kth factors to the right and left respectively of the sequence of hth-ith factors.

A set of elementary transformations may appear in a transformation if the sequences of factors on which they operate (indicated by the first pair of integers) do not overlap.

For a structural condition and a set of elementary transformations to form a transformation they must both deal with factorizations into the same number of terms and meet the condition of recoverability of deletions; namely, if the set of elementary transformations deletes or substitutes for a sequence of factors without leaving a replica of them in another position, then the structural condition implies that either an identical sequence of factors remains elsewhere in the tree or else the deleted sequence of factors had a member of a finite, preassigned subset of VT* as its terminal string.

2. TRANSFORMATIONS: FORMAL DEVELOPMENT

Let VT and VN be fixed, finite (disjoint) terminal and nonterminal vocabularies, and let L = {[A | A ∈ VN} and R = {]A | A ∈ VN}. A labeled bracketing is a finite string of symbols from VT ∪ VN ∪ L ∪ R. A terminal labeled bracketing is a finite string of symbols from VT ∪ L ∪ R.

A labeled bracketing is said to be well-formed if its brackets occur in (nested) matched pairs:


Definition 2.1. A string φ is a well-formed labeled bracketing if

(i) φ ∈ VT ∪ VN,
(ii) φ = ψω, or
(iii) φ = [A ψ]A,

where ψ and ω are well-formed labeled bracketings and A ∈ VN. A well-formed terminal labeled bracketing is, of course, a labeled bracketing which is both well-formed and terminal.

We shall also define the debracketing function d mapping labeled bracketings into strings of terminals and nonterminals as follows.

Definition 2.2. The debracketing function d on labeled bracketings is the mapping defined by setting

d(α) = α if α ∈ VT ∪ VN,
d(α) = e if α ∈ L ∪ R,
d(φψ) = d(φ)d(ψ) for all labeled bracketings φ, ψ.

(Here, as elsewhere, we use "e" to denote the empty string.) For a labeled bracketing φ, we call d(φ) the debracketization of φ.
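Definition 2.2 simply erases bracket symbols. As an illustration (our own token-list encoding, writing a bracket labeled A as "[A" or "]A"; not the paper's notation):

```python
def d(tokens):
    """Debracketing: erase members of L and R, keep VT and VN symbols."""
    return [t for t in tokens if not (t.startswith("[") or t.startswith("]"))]

# Labeled bracketing (8) for "they are flying planes":
LB8 = ["[S", "[NP", "[N", "they", "]N", "]NP",
       "[VP", "are", "[NP", "[A", "flying", "]A",
       "[N", "planes", "]N", "]NP", "]VP", "]S"]

# d(LB8) == ["they", "are", "flying", "planes"]
```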

As examples of these definitions, let us consider the following. We fix VT to be the set {are, fly, they, planes, flying} and VN to be the set {S, V, N, VP, NP, Aux, A}.

Then (2) and (8)-(11) are well-formed labeled bracketings:

(8) [S[NP[N they]N]NP[VP are[NP[A flying]A[N planes]N]NP]VP]S,

(9) [S[NP[N they]N]NP[VP[Aux are]Aux[V flying]V[NP[N planes]N]NP]VP]S,

(10) [N[NP[N planes]N]NP]N,

(11) [S[NP they]NP[VP[Aux are]Aux V[NP[N planes]N]NP]VP]S.

In fact, (2) and (8)-(10) are well-formed terminal labeled bracketings [although (11) is not], and (2) is the debracketization of (8) and (9).

The following are labeled bracketings which are not well-formed:

(12) [V are]N,

(13) [V are[NP flying]V]NP.

Although (10) is well-formed, it is in a sense redundant. Having "planes" surrounded twice by brackets labeled N seems unnecessary. (In terms of the "is a" relation discussed following Definition 2.11, it says twice that "planes" is an N in (10).) To eliminate this sort of redundancy we define a reduced labeled bracketing to be one meeting the conditions below.


Definition 2.3. A labeled bracketing φ is said to be reduced if there are no A, χ1, χ2, ψ, ω, σ, τ such that either φ = χ1[A]Aχ2, or

(i) φ = χ1[A ψ]A χ2,
(ii) ψ = σ[A ω]A τ, and
(iii) ψ and ω are well-formed, σ ∈ L* and τ ∈ R*.

For example, (2), (8), (9), and (11) are reduced well-formed labeled bracket- ings, but (10) is not.

We shall use the notion of reduced well-formed labeled bracketing frequently in Sec. 6. In particular, we note now that there is an upper bound on the length of reduced well-formed labeled bracketings which may have a given string as their debracketization (in terms of the length of the given string and of the size of the nonterminal vocabulary).

LEMMA 2.4. Let q be the cardinality of VN, and let φ be a string in (VT ∪ VN)* of length l. If ψ is a reduced well-formed labeled bracketing such that d(ψ) = φ, then the length of ψ is at most 2q(2l − 1) + l.

Proof. The proof is by induction on l, and for convenience we shall prove the stronger fact that if ψ is "without exterior brackets," then its length is at most 2q(2l − 2) + l. We say that ψ is "without exterior brackets" if either ψ is a single symbol or if there are well-formed labeled bracketings ψ1, …, ψn (n ≥ 2) such that ψ = ψ1ψ2⋯ψn. If l = 1, the result is true trivially, so considering l > 1, assume the result for all l′ < l, and let ψ be a reduced well-formed labeled bracketing without exterior brackets whose debracketization has length l. It must be of the form ψ1ψ2⋯ψn, n ≥ 2, where each ψi is a reduced well-formed labeled bracketing whose debracketization is a string of length li < l. By induction, each ψi has length at most 2q(2li − 1) + li (it has the form [A1 ⋯ [Ak ψi′]Ak ⋯ ]A1 where k ≤ q and ψi′ is without exterior brackets). Hence, the length of ψ is at most ∑ⁿᵢ₌₁ [2q(2li − 1) + li] = 2q(2l − n) + l ≤ 2q(2l − 2) + l, as desired.¹ ∎

The interior of a terminal labeled bracketing will be, roughly speaking, the longest well-formed substring of the labeled bracketing which contains all the terminals in this labeled bracketing, if such a substring exists. We also speak of the residue as the (left and right) exterior. Precisely, we have:

¹We note that this bound can be achieved whenever the length of φ is a power of 2. For example, if φ is ab, then

[A1 ⋯ [Aq [A1 ⋯ [Aq a ]Aq ⋯ ]A1 [A1 ⋯ [Aq b ]Aq ⋯ ]A1 ]Aq ⋯ ]A1

is a reduced well-formed labeled bracketing of the desired length.
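The footnote's witness can be checked mechanically. In the Python sketch below (our own encoding; a bracket labeled Ai is written "[Ai" or "]Ai") we build the bracketing for φ = ab and compare its length with the bound of Lemma 2.4 for l = 2:

```python
def witness(q):
    """Reduced well-formed labeled bracketing over q nonterminals whose
    debracketization is 'a b', built as in the footnote."""
    opens = [f"[A{i}" for i in range(1, q + 1)]
    closes = [f"]A{i}" for i in range(q, 0, -1)]
    nest = lambda t: opens + [t] + closes   # wrap a symbol in all q brackets
    return opens + nest("a") + nest("b") + closes

def bound(q, l):
    """Upper bound of Lemma 2.4 on the length of a reduced bracketing."""
    return 2 * q * (2 * l - 1) + l

# For l = 2 the bound 2q(2l-1)+l = 6q+2 is met exactly by the witness.
```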


Definition 2.5. The interior of a terminal labeled bracketing φ [written I(φ)] is the longest well-formed labeled bracketing ψ such that

(i) d(φ) = d(ψ), and
(ii) there are labeled bracketings σ, τ such that φ = σψτ,

if such a ψ exists. We shall call σ the left exterior of φ [written E_l(φ)] and τ the right exterior of φ [E_r(φ)]. If there is no such ψ, we leave I(φ), E_l(φ) and E_r(φ) undefined.

For example, the interior of (10) is (10) itself. The interior of (12) is "are," but (13) has no interior. The left and right exteriors of (12) are "[V" and "]N" respectively, while both exteriors of (10) are null.
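The interior can be computed by brute force from Defs. 2.1 and 2.5. The following Python sketch (our own encoding and names; terminal and nonterminal tokens are assumed never to begin with a bracket character) checks well-formedness by recursive descent and searches all substrings:

```python
def parse_unit(toks, i):
    """Parse one well-formed unit (Def. 2.1) starting at i; return end or None."""
    t = toks[i]
    if t.startswith("]"):
        return None
    if t.startswith("["):
        j = parse_seq(toks, i + 1)          # brackets must enclose a nonempty
        if j is not None and j < len(toks) and toks[j] == "]" + t[1:]:
            return j + 1                    # well-formed string, then close
        return None
    return i + 1                            # a single vocabulary symbol

def parse_seq(toks, i):
    """Parse a maximal nonempty sequence of units; return end or None."""
    if i >= len(toks):
        return None
    j = parse_unit(toks, i)
    if j is None:
        return None
    while j < len(toks):
        k = parse_unit(toks, j)
        if k is None:
            break
        j = k
    return j

def well_formed(toks):
    return len(toks) > 0 and parse_seq(toks, 0) == len(toks)

def d(toks):
    return [t for t in toks if not (t.startswith("[") or t.startswith("]"))]

def interior(toks):
    """Longest well-formed substring with the same debracketization (Def. 2.5)."""
    best = None
    for a in range(len(toks)):
        for b in range(a + 1, len(toks) + 1):
            sub = toks[a:b]
            if d(sub) == d(toks) and well_formed(sub):
                if best is None or len(sub) > len(best):
                    best = sub
    return best

# interior of (12) = [V are ]N is ["are"]; (13) = [V are [NP flying ]V ]NP has none.
```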

Definition 2.6. A standard factorization into n terms, for n ≥ 1, of a terminal labeled bracketing φ is defined, if φ is a substring of a well-formed labeled bracketing, to be an (ordered) n-tuple (ψ1, …, ψn) of labeled bracketings such that

(i) φ = ψ1 ⋯ ψn, and
(ii) for each i = 1, …, n, the leftmost symbol of ψi is not a right bracket, nor is the rightmost symbol a left bracket.

The second condition assures us that the factors have been chosen to coincide with the phrase breaks, and that each non-null factor contains terminals. The conditions are necessary for the correct assignment of derived constituent structure by the transformations defined below.

As an example, let us consider the standard factorization of labeled bracketing (8) given as follows:

(14) (ψ1, ψ2, ψ3, ψ4), where

ψ1 = [S[NP[N they]N]NP,

ψ2 = [VP are,

ψ3 = [NP[A flying]A, and

ψ4 = [N planes]N]NP]VP]S.

Other standard factorizations are

(15) (ψ1, ψ2ψ3, ψ4), and

(16) (ψ1, ψ2, ψ3ψ4), where the ψi are as in (14).

Note that (14) is the same factorization as (3), (15) is the same as (5), and the second factor, ψ2ψ3, in (15) has no interior, even though the entire string ψ1ψ2ψ3ψ4 does, as does each ψi individually.


Only standard factorizations of labeled bracketings will be employed below. Therefore, we shall henceforth omit the word "standard" and call these simply "factorizations."
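Condition (ii) of Def. 2.6 is easy to test mechanically. A Python sketch (our own encoding of factors as token lists):

```python
def is_standard(factors):
    """Condition (ii) of Def. 2.6: in each nonempty factor, the leftmost
    symbol is not a right bracket and the rightmost is not a left bracket."""
    for f in factors:
        if f and (f[0].startswith("]") or f[-1].startswith("[")):
            return False
    return True

# Factorization (14) of labeled bracketing (8):
PSI1 = ["[S", "[NP", "[N", "they", "]N", "]NP"]
PSI2 = ["[VP", "are"]
PSI3 = ["[NP", "[A", "flying", "]A"]
PSI4 = ["[N", "planes", "]N", "]NP", "]VP", "]S"]

# (PSI1, PSI2, PSI3, PSI4) is standard; moving "]NP" from the end of PSI1
# to the front of PSI2 violates condition (ii).
```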

Definition 2.7. The contents C(φ) of a terminal labeled bracketing φ is defined, if and only if φ is a substring of a well-formed labeled bracketing, the leftmost symbol of φ is not in R and the rightmost is not in L, to be the concatenation of the interiors of the terms of the unique factorization (ψ1, …, ψn) of φ such that

(i) each ψi has an interior, and
(ii) for any factorization (ω1, …, ωk) of φ in which each term has an interior, each ψi is a product of adjacent ωj's; i.e. there are p0, …, pn such that 0 = p0 < p1 < ⋯ < pn = k and for each i = 1, …, n we have ψi = ω_{p_{i−1}+1} ω_{p_{i−1}+2} ⋯ ω_{p_i}.

For use in Def. 2.8 below, we also introduce the notation R(φ) for the string of brackets of φ remaining after C(φ) has been removed.

Contents is well-defined since there is always a factorization in which each term has an interior (for example, if φ ≠ e, the one in which each term contains a single terminal symbol) and given two such factorizations (σ1, …, σp), (τ1, …, τr) there is a factorization (χ1, …, χm) refined by both in which every term has an interior [for example, the factorization (χ1, …, χm) with m maximum, each χk a product of adjacent σi's and also of adjacent τj's, each term of which has an interior since E_l(σi) = E_l(τj) = e for every σi and τj except those leftmost in factors χk, and similarly for E_r(σi) and E_r(τj)].

The contents of any well-formed labeled bracketing φ is φ itself, and R(φ) = e, so that for example the contents of (8) is (8). The contents of ψ2ψ3 of (14) is are[A flying]A [cf. (6)] while that of ψ2ψ3ψ4 of (14) is are[NP[A flying]A[N planes]N]NP [cf. (7)]. R(ψ2ψ3) is [VP[NP and R(ψ2ψ3ψ4) is [VP]VP]S. On the other hand, (13) has no contents since it is not a substring of any well-formed labeled bracketing.

Following Chomsky, we shall require that the mappings (of labeled bracketings) effected by transformations be the results of applying three very restricted types of "elementary transformations" (see Ref. 1, pp. 144 and 147): deletion of the contents of a sequence of terms, substitution of the contents of a sequence of terms for the interior of a sequence of terms, and (left or right) adjunction of the contents of a sequence of terms to the interior of a sequence of terms.

Definition 2.8. (I) The deletion elementary is the function T_d from substrings of well-formed labeled bracketings to labeled bracketings defined by

T_d(φ) = R(φ).


(II) The substitution elementary is the function T_s from pairs (φ, ψ) of substrings of well-formed labeled bracketings to labeled bracketings defined if and only if φ has an interior by setting T_s(φ, ψ) = E_l(φ) C(ψ) E_r(φ).

(III_l) The left-adjunction elementary is the function T_l from pairs (φ, ψ) of substrings of well-formed labeled bracketings to labeled bracketings defined if and only if φ has an interior by setting T_l(φ, ψ) = E_l(φ)[A1 ⋯ [Am C(ψ)I(φ)]Am ⋯ ]A1 E_r(φ), where A1, …, Am is the longest sequence of nonterminals such that there is a well-formed labeled bracketing ω for which I(φ) = [A1 ⋯ [Am ω]Am ⋯ ]A1, allowing the case m = 0 in which there are no such brackets.²

(III_r) The right-adjunction elementary T_r is defined exactly parallel with the left-adjunction elementary immediately above.

It is easy to see that if (ψ1, …, ψn) is a factorization of a well-formed labeled bracketing then ψ1 ⋯ ψ_{h−1} T_d(ψ_h ⋯ ψ_i) ψ_{i+1} ⋯ ψn is also a well-formed labeled bracketing, since it results from a well-formed labeled bracketing by deleting maximal well-formed substrings. Similarly, the result of a single application of any of T_s, T_l, or T_r is a well-formed labeled bracketing, since it is obtained from ψ1 ⋯ ψn by replacing the interior of ψ_h ⋯ ψ_i by a well-formed labeled bracketing or by the empty string. However, applying two elementary transformations simultaneously to ψ1 ⋯ ψn in this sense may result in a labeled bracketing which is not well-formed for the trivial reason that a subpart of the form [A]A may occur; for example let ψ1 be [S[A b, ψ2 be c]A, and ψ3 be a]S. The labeled bracketing T_d(ψ1)T_d(ψ2)ψ3 is [S[A]A a]S which is not well-formed, because it is not reduced in the sense of Def. 2.3. We now define a reduction mapping ρ from labeled bracketings to reduced labeled bracketings; following these applications of sets of elementary transformations to factorizations of well-formed labeled bracketings by ρ will then always result in a well-formed labeled bracketing if each elementary transformation is defined on the factorization. This reduction mapping³ ρ will not only remove nests of matched brackets around the empty string, but will also remove redundant pairs of brackets in labeled bracketings making them reduced in the sense of Def. 2.3:

Definition 2.9. We say that the labeled bracketing φ is an immediate reduction of the labeled bracketing ψ if either there are labeled bracketings ψ1, ψ2 and

²The additional complexity here arises from the requirement (see subsequent discussion of "recoverability of deletions") that a transformation not lose any information. In this case, the information that ω is inside each of the labeled brackets Ai is preserved, and this information is also given to the string C(ψ)I(φ).

³The reduction mapping is a special case of some general conventions such as Ross' Tree Pruning Convention which can delete nodes from a tree after each transformation has applied. For the linguistic motivation for Ross' convention see Ref. 13.


a nonterminal A such that φ = ψ1ψ2 and ψ = ψ1[A]Aψ2, or else there are labeled bracketings ψ1, ψ2, ω, strings σ in L* and τ in R*, and a nonterminal A such that ω is well-formed, φ = ψ1[A σωτ]Aψ2 and ψ = ψ1[A σ[A ω]A τ]Aψ2. We shall further say that φ is a reduction of ψ if it can be obtained from ψ by a finite sequence of immediate reductions. We define the reduction mapping ρ from labeled bracketings to labeled bracketings by giving ρ(ψ) as value the unique reduced labeled bracketing which is a reduction of ψ. (It is unique since if φ1 and φ2 are reductions of ψ, there is an ω which is a reduction of both φ1 and φ2.)
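The first kind of immediate reduction, erasure of a matched pair [A]A around the empty string, can be sketched as follows (our own illustration; the second kind, removal of a redundant nested pair, can be implemented along the same lines but additionally requires a well-formedness check):

```python
def reduce_empty(toks):
    """Apply immediate reductions of the first kind (Def. 2.9): repeatedly
    erase an adjacent matched pair [A ]A, until none remains."""
    toks = list(toks)
    changed = True
    while changed:
        changed = False
        for i in range(len(toks) - 1):
            if toks[i].startswith("[") and toks[i + 1] == "]" + toks[i][1:]:
                del toks[i:i + 2]
                changed = True
                break
    return toks

# The ill-formed intermediate result [S [A ]A a ]S from the text
# reduces to the well-formed [S a ]S.
```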

Definition 2.10. The set {α1, …, αp} defines an n-term transformational mapping if each αq, 1 ≤ q ≤ p, is either a pair [T_d, (h_q, i_q)] or one of the triples [T_s, (h_q, i_q), (j_q, k_q)], [T_l, (h_q, i_q), (j_q, k_q)], or [T_r, (h_q, i_q), (j_q, k_q)], and if 1 ≤ h_1 ≤ i_1 < h_2 ≤ i_2 < ⋯ < h_p ≤ i_p ≤ n. In this case, the value of the mapping of n-term factorizations of well-formed labeled bracketings to well-formed labeled bracketings, on the factorization (ψ1, …, ψn), is defined, if and only if there are a nonterminal A and a well-formed labeled bracketing φ such that ψ1 ⋯ ψn = [A φ]A, to be⁴:

(i) ρ(ω1 ⋯ ω_{2p+1}) if each ω_m is defined, 1 ≤ m ≤ 2p + 1, and if there is a well-formed labeled bracketing χ such that ρ(ω1 ⋯ ω_{2p+1}) = [A χ]A, and

(ii) [A φ]A otherwise,

where for q = 0, …, p

ω_{2q+1} = ψ_{i_q+1} ⋯ ψ_{h_{q+1}−1} if h_{q+1} − 1 ≥ i_q + 1, and ω_{2q+1} = e otherwise (here we set i_0 = 0 and h_{p+1} = n + 1),

and for q = 1, …, p

ω_{2q} = T_d(ψ_{h_q} ⋯ ψ_{i_q}) if α_q = [T_d, (h_q, i_q)],

ω_{2q} = T_β(ψ_{h_q} ⋯ ψ_{i_q}, ψ_{j_q} ⋯ ψ_{k_q}) if β is s, l, or r and α_q = [T_β, (h_q, i_q), (j_q, k_q)].

The reader interested in examples of transformational mappings at this point may refer to the end of this section.

The "is a" relation between strings and trees or appropriate linguistic structures was used in Ref. 1, p. 84, and pp. 142-3. For us, the "is a" relation takes the form of a relation between labeled bracketings and nonterminals.

Definition 2.11. For a labeled bracketing φ and a nonterminal symbol B we say that φ is a B if there are labeled bracketings ψ, ω, σ, τ such that

⁴Chomsky considers elementary transformations to be applied in sequence rather than simultaneously. We have discussed our formulation with him and believe that the transformational mappings we allow are the same as those he desires to have available.


(i) ψ and ω are well-formed labeled bracketings and σ ∈ L* and τ ∈ R*,
(ii) φ = σψτ, and
(iii) ψ = [B ω]B.

By this definition each of examples (8) and (9) is an S, but (2) is not. Also, (10) is both an N and an NP.

Although this definition is not the more familiar linguistic notion "is a," the usual notion is easily recaptured as follows. If x is a string of terminals, B is a nonterminal and φ is a labeled bracketing, then we may say that x is a B in φ if there are labeled bracketings ψ, σ, and τ such that φ = σψτ, x = d(ψ) and ψ is a B.

Definition 2.12. (I) For each nonterminal B and all integers h, i and n, the predicate B^n_{h→i} holds of the factorization (ψ1, …, ψn) if

(i) 1 ≤ h ≤ i ≤ n, and
(ii) ψh ψh+1 ⋯ ψi−1 ψi is a B.

(II) For integers h, i, j, k, and n, the predicate h → i ≡ⁿ j → k holds of the factorization (ψ1, …, ψn) if

(i) 1 ≤ h ≤ i ≤ n and 1 ≤ j ≤ k ≤ n, and
(ii) C(ψh ⋯ ψi) = C(ψj ⋯ ψk).

(III) For all integers h, i and n, and for each terminal string x, the predicate h → i =ⁿ x holds of the factorization (ψ1, …, ψn) if

(i) 1 ≤ h ≤ i ≤ n, and
(ii) d(ψh ⋯ ψi) = x.

An n-term structural condition⁵ on a factorization is a Boolean combination of the n-ary predicates B^n_{h→i}, h → i ≡ⁿ j → k, and h → i =ⁿ x.
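Predicate (III) depends only on the debracketization of a block of factors, so it is straightforward to evaluate. A Python sketch (our own encoding; factorization (14) is used as data):

```python
def d(toks):
    """Debracketing: keep vocabulary symbols, drop brackets."""
    return [t for t in toks if not (t.startswith("[") or t.startswith("]"))]

def pred_string(factors, h, i, x):
    """Predicate (III): h -> i =n x holds of a factorization iff
    1 <= h <= i <= n and the hth-ith factors debracket to x."""
    n = len(factors)
    if not (1 <= h <= i <= n):
        return False
    span = [t for f in factors[h - 1:i] for t in f]
    return d(span) == x.split()

# Factorization (14) of labeled bracketing (8):
FACTORS = [["[S", "[NP", "[N", "they", "]N", "]NP"],
           ["[VP", "are"],
           ["[NP", "[A", "flying", "]A"],
           ["[N", "planes", "]N", "]NP", "]VP", "]S"]]

# 2 -> 3 =4 "are flying" holds of (14); 2 -> 3 =4 "are" does not.
```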

Every pair (C, M), where C is an n-term structural condition and M is an n-term transformational mapping, induces a mapping (also denoted (C, M)) of factorizations of well-formed labeled bracketings into reduced well-formed labeled bracketings as follows. The value of this mapping on the factorization (ψ_1, …, ψ_n) is

(i) M(ψ_1, …, ψ_n) if the structural condition C holds of the factorization (ψ_1, …, ψ_n), and
(ii) ψ_1 … ψ_n if the structural condition C does not hold of (ψ_1, …, ψ_n).
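The induced mapping can be sketched minimally as follows; C and M are assumed stand-in callables, and factors are represented as token lists.

```python
def induced(C, M):
    """The mapping induced by (C, M): apply the transformational mapping M
    when the structural condition C holds of the factorization; otherwise
    return the concatenation psi_1 ... psi_n unchanged."""
    def mapping(fact):
        if C(fact):
            return M(fact)
        return [t for part in fact for t in part]
    return mapping

# Toy usage: swap the two factors when the second is the single symbol "b".
f = induced(lambda fact: fact[1] == ["b"],
            lambda fact: fact[1] + fact[0])
# f((["a"], ["b"])) -> ["b", "a"];  f((["a"], ["c"])) -> ["a", "c"]
```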

⁵Structural conditions replace Chomsky's structure indices as follows. Conditions (I) and (III) can be used to define the predicate "Analyzable" (Ref. 1, p. 143) in terms of which Chomsky states his structure indices. Conditions (II) and (III) are required for a precise statement of "recoverability of deletions," and in the presence of the boundary symbol (see Sec. 3) accomplish the filter function of transformations as well.


Definition 2.13. The mapping induced by the pair (C, M), where C is an n-term structural condition and M is an n-term transformational mapping, satisfies the principle of recoverability of deletions⁶ if the following implication holds for each pair h, i of integers: whenever M contains a pair [T_d, (h, i)] or a triple [T_s, (h, i), (j, k)], then one of the following conditions obtains:

(i) there are t and u such that the following two conditions hold: C implies h→i ≈^n t→u and, if M contains a pair [T_d, (f, g)] or a triple [T_s, (f, g), (v, w)] with either t ≤ f ≤ u or t ≤ g ≤ u, then there exist p, q, y, and z with p ≤ t ≤ u ≤ q such that M contains either [T_s, (y, z), (p, q)], [T_al, (y, z), (p, q)], or [T_ar, (y, z), (p, q)];

(ii) there exist a finite number of terminal strings x_1, …, x_m such that C implies that either h→i ≡^n x_1 or h→i ≡^n x_2 or … or h→i ≡^n x_m.

We now state the formal definition of a transformation.

Definition 2.14. A transformation is the mapping induced by any pair (C, M) which satisfies the principle of recoverability of deletions, where C is an n-term structural condition and M is an n-term transformational mapping for some n. A factorization (ψ_1, …, ψ_n) is called a proper analysis for the transformation induced by (C, M) if C holds of (ψ_1, …, ψ_n)⁷ and if M(ψ_1, …, ψ_n) is defined by (i) of Def. 2.10.

We conclude this section with three examples of transformations, modified for expository purposes: the "passive," the "Aux-attraction," and the "Wh-inversion" transformations. To illustrate their operation, these will be applied in sequence, as they would be in the derivation of the sentence "By whom had the call been put through to Chicago before John left?"

⁶This condition is discussed in Ref. 1, pp. 144-5, footnotes 1 and 13 to Chapter 3, p. 132, and Ref. 2, p. 41. The reader should note that in Aspects Chomsky restricts use of predicates of the form h→i ≈^n t→u rather drastically (Ref. 1, p. 145 and footnote 13, p. 225). We can capture the restriction he imposes as follows. We can add to Def. 2.10 new triples of the form [T_d, (h, i), (t, u)] and new quadruples of the form [T_s, (h, i), (j, k), (t, u)]. Each of these has the same effect as the corresponding tuple without the last pair. We disallow all explicit use of the predicates h→i ≈^n t→u in structural conditions, but when our new tuples are members of the transformational mapping of a pair (C, M), by convention this pair stands for the pair (C′, M′) where C′ = (C and h→i ≈^n t→u) and M′ equals M with the final pairs (t, u) deleted. Such a restriction would not change our theorems.

⁷We have made provision for deletion of sequences of terms even when the sequence has no interior. If this power is eliminated, we can dispense with the deletion elementary. To delete the hth through ith terms of an n-term proper analysis (φ_1, …, φ_n) satisfying a structural condition C, let C′ be the (n+1)-term structural condition (C̄ and n+1→n+1 ≡^{n+1} e) where C̄ is obtained from C by adding one to superscripts. Then (φ_1, …, φ_n, e) satisfies C′. Now replace the deletion elementary [T_d, (h, i)] by [T_s, (h, i), (n+1, n+1)]. If φ_h … φ_i has an interior, the result is the same labeled bracketing as before. If the deletion elementary is eliminated, we can simplify the statement of recoverability of deletions.


Schematically, we might represent the passive transformation as follows:

W    NP    Aux    V    X    NP    Y    Passive    Z
1    2     3      4    5    6     7    8          9
1    6     3+8    4    5    0     7    2          9

Here the intended interpretation is that the structural condition must assure us that the factorization to which the passive transformation is to be applied has the structure: anything, a noun phrase, an auxiliary, a verb, anything, a noun phrase, anything, the passive marker, anything. (Some of these pieces of the labeled bracketing may be empty.) Further, if this structural condition holds, then the pieces of the labeled bracketing are to be reordered as indicated. More formally, the 9-term transformation "passive" consists of the structural condition "NP^9_{2→2} and Aux^9_{3→3} and V^9_{4→4} and NP^9_{6→6} and Passive^9_{8→8}," together with the transformational mapping {[T_s, (2, 2), (6, 6)], [T_ar, (3, 3), (8, 8)], [T_d, (6, 6)], [T_s, (8, 8), (2, 2)]}.

To see how this operates in our example, let us consider the deep structure

"Q wh A_a past have+en put through the call to Chicago by be+en before John left." Here A_a is an animate dummy noun form; i.e., someone. With the appropriate bracketing this gives us the factorization

(17)  [S [Pre Q]Pre | [NP wh [N A_a]N]NP | [PP [Aux [Tense past]Tense [Aspect [Perf have en]Perf]Aspect]Aux | [VP [V put]V | [Prt through]Prt | [NP [Det the]Det [N call]N]NP | [Dir to Chicago]Dir [Manner [Agent [Prep-P [Prep by]Prep | [Passive be en]Passive | ]Prep-P]Agent]Manner]VP [Time before John left]Time]PP]S

(Here we have indicated the appropriate factoring by the vertical bars.) The passive transformation applied to (17) gives us the labeled bracketing:

(18)  [S | [Pre Q]Pre | [NP [Det the]Det [N call]N]NP | [PP [Aux [Aux [Tense past]Tense | [Aspect [Perf have | en]Perf]Aspect]Aux | [Passive be en]Passive]Aux [VP [V put]V [Prt through]Prt [Dir to Chicago]Dir [Manner [Agent [Prep-P [Prep by]Prep [NP wh [N A_a]N]NP]Prep-P]Agent]Manner]VP [Time before John left]Time]PP]S


(Here the indicated factorization has no relation to the passive transformation; it is shown merely for convenience in the following discussion.)
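The bottom row of the passive schema can be applied positionally to the debracketized factors of (17). This is a toy sketch of our own: it manipulates only the terminal strings of the factors, not the bracket structure that the elementary transformations actually operate on (A abbreviates the dummy noun).

```python
def apply_schema(factors, schema):
    """Apply a reordering schema such as "1 6 3+8 4 5 0 7 2 9" to a
    tuple of (debracketized) factors. Positions are 1-based; "0" means
    deletion; "i+j" means term j right-adjoined to term i."""
    out = []
    for slot in schema:
        if slot == "0":
            continue  # a deleted term contributes nothing
        out.append(" ".join(factors[int(p) - 1] for p in slot.split("+")))
    return " ".join(w for w in out if w)

factors = ["Q", "wh A", "past have en", "put", "through",
           "the call", "to Chicago by", "be en", "before John left"]
print(apply_schema(factors, "1 6 3+8 4 5 0 7 2 9".split()))
# Q the call past have en be en put through to Chicago by wh A before John left
```

The result is the factor order of (18), roughly "the call had been put through to Chicago by whom ..." once later rules apply.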

Now let us consider the Aux-attraction transformation. In the same informal notation used above, this transformation is represented as

X    Q        NP    Tense    ( {M, be, have} )    Y    Z
1    2        3     4        5                    6    7
1    2+4+5    3     0        0                    6    7

Condition: if 5 = have then 6 = en, and if 5 = e then 6 is a V. Here we have followed the usual linguistic notational convention that parentheses enclose an optional element, and braces enclose alternative choices. In complete formal detail this is a 7-term transformation whose structural condition is "2→2 ≡^7 Q and NP^7_{3→3} and Tense^7_{4→4} and [M^7_{5→5} or 5→5 ≡^7 be or (5→5 ≡^7 have and 6→6 ≡^7 en) or (5→5 ≡^7 e and V^7_{6→6})]" and whose transformational mapping is {[T_ar, (2, 2), (4, 5)], [T_d, (4, 5)]}. The factorization (18) is a proper analysis for this transformation, and its result when applied to (18) is the labeled bracketing below.

(19)  [S [Pre | [Pre Q]Pre | [Tense past]Tense have]Pre [NP [Det the]Det [N call]N]NP [PP [Aux [Aux [Aspect [Perf en]Perf]Aspect]Aux [Passive be en]Passive]Aux [VP [V put]V [Prt through]Prt [Dir to Chicago]Dir | [Manner [Agent [Prep-P [Prep by]Prep | [NP wh | [N A_a]N]NP | ]Prep-P]Agent]Manner]VP [Time before John left]Time]PP]S

The final example which we shall consider is the Wh-inversion transformation. This is the 7-term transformation composed of the structural condition "2→2 ≡^7 Q and (Prep^7_{4→4} or 4→4 ≡^7 e) and 5→5 ≡^7 wh and NP^7_{5→6}" and the transformational mapping {[T_s, (2, 2), (4, 6)], [T_d, (4, 6)]}. Application of this transformation to the proper analysis indicated in (19) yields:

[S [Pre [Manner [Agent [Prep-P [Prep by]Prep [NP wh [N A_a]N]NP]Prep-P]Agent]Manner [Tense past]Tense have]Pre [NP [Det the]Det [N call]N]NP [PP [Aux [Aux [Aspect [Perf en]Perf]Aspect]Aux [Passive be en]Passive]Aux [VP [V put]V [Prt through]Prt [Dir to Chicago]Dir]VP [Time before John left]Time]PP]S


3. PHRASE STRUCTURE GRAMMARS

In the previous sections we defined transformations as rules which apply to labeled bracketings to form other labeled bracketings. Now we must provide a source for the labeled bracketings to which transformations are applied before we can define transformational grammars. Following Chomsky (Ref. 1, pp. 120-3, 128, 141-2) we employ phrase-structure grammars as the source.⁸ Since phrase-structure grammars are ordinarily thought of as generating sets of terminal strings (i.e., languages in our sense) rather than sets of labeled bracketings, we recapitulate the definitions of such grammars here with slight modifications.

Let V_T and V_N be finite, disjoint, nonempty sets. An unrestricted rewriting system (URS) G is a 5-tuple (V_T, V_N, S, #, →) where S is a distinguished member of V_N called the initial symbol, # is a distinguished member of V_T called the boundary symbol,⁹ and → is a finite subset of (V_T ∪ V_N)* × (V_T ∪ V_N)* (the Cartesian product of (V_T ∪ V_N)* with itself) called the set of rules of G. As before, V_T is called the terminal vocabulary and V_N the nonterminal vocabulary of G.

A sequence φ_1, φ_2, …, φ_n of strings over V_T ∪ V_N is called a weak derivation for G if φ_1 = #S# and for each i (1 ≤ i < n) there are χ_1, χ_2, ψ, ω ∈ (V_T ∪ V_N)* such that φ_i = χ_1 ψ χ_2, φ_{i+1} = χ_1 ω χ_2, and ψ → ω is a rule of G. A weak derivation φ_1, φ_2, …, φ_n for G is said to be terminated if there is no φ_{n+1} ∈ (V_T ∪ V_N)* such that φ_1, φ_2, …, φ_n, φ_{n+1} is a weak derivation for G. The language L(G) weakly generated by G is the set of all terminal strings x ∈ V_T* such that there is a terminated weak derivation φ_1, φ_2, …, φ_n for G with φ_n = #x#.
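The one-step rewrite relation and the weak-derivation condition just defined can be sketched directly; strings over V_T ∪ V_N are represented as symbol lists, and the toy grammar below is an assumption for illustration.

```python
def rewrites(phi, chi, rules):
    """One-step rewriting: chi results from phi by a single application
    of some rule psi -> omega of G."""
    for psi, omega in rules:
        for i in range(len(phi) - len(psi) + 1):
            if (phi[i:i + len(psi)] == psi
                    and phi[:i] + omega + phi[i + len(psi):] == chi):
                return True
    return False

def is_weak_derivation(lines, rules, S="S"):
    """lines is a weak derivation for G: it starts at #S# and each line
    follows from its predecessor by one rule application."""
    return (lines[0] == ["#", S, "#"] and
            all(rewrites(lines[i], lines[i + 1], rules)
                for i in range(len(lines) - 1)))

# A toy grammar generating a^n b^n:
rules = [(["S"], ["a", "S", "b"]), (["S"], ["a", "b"])]
deriv = [["#", "S", "#"],
         ["#", "a", "S", "b", "#"],
         ["#", "a", "a", "b", "b", "#"]]
# is_weak_derivation(deriv, rules) -> True
```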

Linguists are not interested in the full class of unrestricted rewriting systems,

⁸Chomsky actually employs a modified version of phrase structure grammar as the source of the labeled bracketings to which transformations apply. His "base components" are derived from phrase structure grammars by four changes which do not affect any of the results in our paper. First, he imposes a linear order on the rules of his base component (cf. Ref. 9 for a proof that this does not affect weak generative capacity). Second, he introduces rules which operate on an extended nonterminal vocabulary consisting in part of complex symbols. As Chomsky notes, these can be eliminated at the cost of increased complexity of the rules. Third, he uses a lexicon of complex terminal symbols which a special rule permits to be substituted for complex nonterminals. This lexicon and special rule can be replaced by additional rewriting rules, again increasing the complexity of the grammar. Fourth, he restricts context-sensitive rules to rewriting complex nonterminal symbols as other such symbols; thus the eliminations just referred to yield a weakly equivalent context-free grammar.

⁹The symbol '#' was used in Ref. 3 to mark the ends of the lines of a derivation. We allow it to be introduced into the middle of lines of a derivation so it can be used to perform the filter function of transformations, following Ref. 1, pp. 137-139.


but in a smaller class called the phrase-structure grammars. A URS G is called a phrase-structure grammar (also called a context-sensitive (CS) grammar) if for each rule ψ → ω of G there are unique¹⁰ strings χ_1, χ_2, φ ∈ (V_T ∪ V_N)* and A ∈ V_N such that ψ = χ_1 A χ_2, ω = χ_1 φ χ_2, and φ ≠ e. For this limited class of grammars we can assign a labeled bracketing to each string in the generated language. This is the reason we limit our attention to this subclass of the class of all URS's.

We shall assign labeled bracketings to the strings weakly generated by phrase structure grammars by means of strong derivations. Since we desire these labeled bracketings to be reduced in the sense of Def. 2.3, avoiding the redundancy discussed there, we shall apply the reduction mapping ρ of Def. 2.9 at each step.

Given a CS grammar G, a sequence φ_1, …, φ_n of reduced labeled bracketings is called a strong derivation for G if φ_1 = #S# and for each i (1 ≤ i < n) there are χ_1, χ_2, φ, ψ, ω ∈ (V_T ∪ V_N ∪ L ∪ R)* and A ∈ V_N such that φ_i = χ_1 φ A ψ χ_2, φ_{i+1} = ρ(χ_1 φ [A ω]A ψ χ_2), d(ω) ≠ e, and d(φAψ) → d(φ [A ω]A ψ) is a rule of G (where d is the debracketing function of Def. 2.2). We define terminated strong derivation for G in the obvious manner. The set Σ(G) of structural descriptions strongly generated by G is the set of all φ ∈ (V_T ∪ L ∪ R)* such that there is a terminated strong derivation φ_1, …, φ_n for G such that φ_n = #φ#. Note that for any CS grammar G, Σ(G) contains only reduced well-formed labeled bracketings. We now have the source of labeled bracketings which we need as input for transformations.

Before defining “transformational grammar,” however, we will define several subclasses of the phrase structure grammars which have been studied in the literature. The most important of these subclasses is the class of context-free grammars.

A URS G is a context-free (CF) grammar if for every rule ψ → ω of G, ψ ∈ V_N and ω ≠ e. A URS G is a linear grammar if it is a CF grammar and for every rule ψ → ω of G there are x, y ∈ V_T* and X ∈ V_N ∪ {e} such that ω = xXy. A URS G is a left (respectively, right) linear grammar if it is a linear grammar and x = e (respectively, y = e) in the preceding sentence. A URS G is a one-sided linear grammar if it is either a left or a right linear grammar. These, and several other, classes of grammars have been extensively studied in the literature.
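The rule shapes that separate these subclasses can be checked mechanically; the following is a sketch under our list-of-symbols representation, with V_N the nonterminals and everything else terminal.

```python
def rule_kind(psi, omega, VN):
    """Classify a rule psi -> omega by the subclass shapes of this
    section."""
    if not (len(psi) == 1 and psi[0] in VN and omega):
        return "not context-free"          # lhs not a lone nonterminal, or rhs = e
    nts = [i for i, t in enumerate(omega) if t in VN]
    if len(nts) > 1:
        return "context-free, not linear"
    if not nts:
        return "left and right linear"     # omega = xy with X = e
    i = nts[0]
    if i == 0:
        return "left linear"               # omega = Xy (a bare nonterminal
                                           #  is both; we report this first)
    if i == len(omega) - 1:
        return "right linear"              # omega = xX
    return "linear"                        # omega = xXy
```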

4. TRANSFORMATIONAL GRAMMARS

We may now define a context-sensitive (context-free, linear, etc.) based transformational grammar 𝒢 as a pair (𝒫, 𝒯) where 𝒫 is a context-sensitive (context-free, linear, etc.) phrase-structure grammar and where 𝒯 = (T_1, …, T_k) is a finite sequence of transformations over the vocabulary of 𝒫. We call 𝒫 the base component of 𝒢, and 𝒯 the transformational component of 𝒢.

¹⁰The uniqueness condition is not usually imposed, but it does not alter the weak generative capacity of the class of context-sensitive grammars nor materially affect its strong generative capacity.

In order to define the language L(𝒢) generated by the transformational grammar 𝒢, we must define transformational derivations with respect to (T_1, …, T_k). Having done so, the language L(𝒢) will be the set of debracketizations of strings φ_t for which there is a terminated strong derivation in the grammar 𝒫 ending in #φ_1# for some terminal labeled bracketing φ_1, and a transformational derivation φ_1, …, φ_t with respect to (T_1, …, T_k). We shall follow Chomsky [Ref. 1, pp. 134-5] in using cycles of transformations, applied to "innermost" phrase-markers and then working "outward." We now proceed to make this precise.

Definition 4.1. Given a sequence (T_1, …, T_k) of transformations, a sequence φ_1, …, φ_t of reduced labeled bracketings is called a transformational cycle with respect to (T_1, …, T_k) if there is a sequence i_1, …, i_{t−1}, 1 ≤ i_1 < i_2 < … < i_{t−1} ≤ k, such that

(i) for every j = 1, …, t − 1, there is a proper analysis (ψ_1, …, ψ_n) for T_{i_j} of φ_j and φ_{j+1} = T_{i_j}(ψ_1, …, ψ_n),

(ii) for every 1 ≤ j ≤ t − 2, and every r, i_j < r < i_{j+1}, there is no proper analysis of φ_{j+1} for T_r, and

(iii) there is no r, 1 ≤ r < i_1 (respectively i_{t−1} < r ≤ k), such that there is a proper analysis of φ_1 (respectively φ_t) for T_r.
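The control regime of Def. 4.1 can be sketched as a simple driver; the transformations here are assumed stand-in pairs of functions (a proper-analysis test and an application), and clauses (ii) and (iii), which forbid proper analyses for the skipped T_r, are not checked by this toy version.

```python
def cycle(phi, transformations):
    """One transformational cycle, sketched: scan T_1, ..., T_k in order;
    whenever the current line has a proper analysis for T_j, apply T_j
    once. The indices of applied transformations are therefore strictly
    increasing, as the definition requires."""
    lines = [phi]
    for has_pa, apply_T in transformations:
        if has_pa(lines[-1]):
            lines.append(apply_T(lines[-1]))
    return lines

# Toy transformations on plain strings: T1 deletes a trailing "b",
# T2 always appends "!".
T1 = (lambda s: s.endswith("b"), lambda s: s[:-1])
T2 = (lambda s: True, lambda s: s + "!")
# cycle("ab", [T1, T2]) -> ["ab", "a", "a!"]
```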

The interested reader will note that, if φ_1 is a reduced well-formed labeled bracketing, so is φ_t.

A well-formed substring φ of a well-formed labeled bracketing ψ is said to be a subsentence of ψ if there is a well-formed labeled bracketing ω such that φ = [S ω]S. (Here S is, of course, the initial symbol of 𝒫.) A well-formed labeled bracketing is a sentence if it is a subsentence of itself.

Definition 4.2. A sequence of sentences is said to be a transformational derivation with respect to the sequence (T_1, …, T_k) of transformations if there are integers i_1, …, i_p, 1 ≤ i_j ≤ k + 1 for each j = 1, …, p; labeled bracketings σ_j, τ_j, j = 1, …, p; and sentences ψ_{j,m} for all j = 1, …, p, m = 1, …, i_j, such that the sequence is of the form

σ_1 ψ_{1,1} τ_1, σ_1 ψ_{1,2} τ_1, …, σ_1 ψ_{1,i_1} τ_1,
σ_2 ψ_{2,1} τ_2, σ_2 ψ_{2,2} τ_2, …, σ_2 ψ_{2,i_2} τ_2,
…,
σ_p ψ_{p,1} τ_p, σ_p ψ_{p,2} τ_p, …, σ_p ψ_{p,i_p} τ_p,

68 P. STANLEY PETERS, JR., AND R. W. RITCHIE

where

(a) ψ_{1,1} has as its rightmost bracket ]S the leftmost ]S in σ_1 ψ_{1,1} τ_1 (i.e., τ_1 contains all but the leftmost ]S of σ_1 ψ_{1,1} τ_1),

(b) for each j = 1, …, p, the sequence ψ_{j,1}, ψ_{j,2}, …, ψ_{j,i_j} is a transformational cycle with respect to (T_1, …, T_k),

(c) for each j = 1, …, p − 1, σ_{j+1} ψ_{j+1,1} τ_{j+1} = ρ(σ_j ψ_{j,i_j} τ_j) and the rightmost ]S of ψ_{j+1,1} is the leftmost ]S of τ_j, and

(d) σ_p and τ_p are empty.

The reader will note that, in clause (c), τ_j is preserved in the reduction, since ρ removes redundant brackets from the inside, preserving everything outside of ψ_{j,i_j}. Further, the cycling works on each subsentence of σ_1 ψ_{1,1} τ_1 from "inside out" and left to right across the sentence, by always taking as the next subsentence the one with the next ]S to the right of the subsentence just completed.

Thus, a transformational derivation is a sequence of transformational cycles operating on sentences, one cycle after the other, reducing at the end of each cycle. We now proceed to define the notions of the language generated by a transformational grammar, and of the deep and surface phrase-markers of sen- tences in this language.

Definition 4.3. Let 𝒢 = (𝒫, 𝒯) be a transformational grammar, and let 𝒫 = (V_T, V_N, S, #, →). The last line ψ of a transformational derivation with respect to 𝒯 whose first line φ is in Σ(𝒫) will be called a surface phrase-marker of 𝒢 provided that ψ contains no occurrences of #. In this case, φ is said to be a deep phrase-marker of 𝒢 underlying ψ, and the pair (φ, ψ) is said to be a structural description generated by 𝒢. We say that (φ, ψ) is a structural description of the debracketization of ψ. A string x [in (V_T − {#})*] is said to be a sentence in the language generated by 𝒢 if it has a structural description generated by 𝒢. We let L(𝒢) be the class of all sentences generated by 𝒢, and refer to it as the language generated by 𝒢. A context-sensitive (context-free, linear, etc.) based transformational language is, of course, the language generated by some context-sensitive (context-free, linear, etc.) based transformational grammar. (When the base is of no interest in the discussion, we shall refer simply to transformational grammars and languages, omitting all reference to the type of base.)

5. EVERY RECURSIVELY ENUMERABLE SET IS A TRANSFORMATIONAL LANGUAGE

Having defined transformational grammars we are now in a position to prove some theorems about their expressive power. We can distinguish two different aspects of expressive power: the first, called weak generative capacity, is the set


of all languages which can be generated by transformational grammars; the second, called strong generative capacity, is the set of all sets of structural de- scriptions generated by transformational grammars.

The usual approach to the study of the expressive power of a class of grammars is to investigate the weak generative capacity of the class. This is the approach we shall adopt. In particular we will compare the weak generative capacity of the class of transformational grammars with the set of recursively enumerable languages and with the set of elementary languages.¹¹

This section requires two facts about transformational grammars. First, that transformations are applied cyclically, each subsentence of a deep phrase-marker serving exactly once as the domain of application of each transformation. Second, that there is a transformation which alters a sentence if and only if the sentence contains at least two terminal symbols the rightmost of which is a particular specified symbol, and which in this case deletes that symbol.

Our first theorem concerns the full class of context-sensitive based trans- formational grammars.

THEOREM 5.1. Every recursively enumerable language is generated by some context-sensitive based transformational grammar, and conversely.

Proof. The converse follows directly from our definitions by Church's Thesis. In somewhat more detail, one might carry out the enumeration as follows:

(1) effectively generate all pairs (n, x) where n is a positive integer and x is in V_T*;

(2) as each pair appears, check effectively (say, using the Turing machine constructed quite explicitly in Theorem 6.4) whether x has a deep phrase-marker in the desired grammar with no more than n subsentences;

(3) if x does have such a deep phrase-marker, output x; otherwise consider the next pair in the list being generated.
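Steps (1)-(3) amount to a dovetailing enumeration, which can be sketched as follows. Here has_dpm_within(x, n) is an assumed decidable stand-in for "x has a deep phrase-marker with at most n subsentences," universe stands in for an enumeration of V_T*, and max_n truncates what is really an unbounded search.

```python
def enumerate_language(universe, has_dpm_within, max_n=5):
    """Run through pairs (n, x) and output x as soon as some test
    succeeds (step (3)); each x is emitted at most once."""
    emitted = []
    for n in range(1, max_n + 1):
        for x in universe:
            if x not in emitted and has_dpm_within(x, n):
                emitted.append(x)
    return emitted

# With the toy test "x needs at least len(x) subsentences":
# enumerate_language(["a", "aa", "aaa"], lambda x, n: len(x) <= n)
#   -> ["a", "aa", "aaa"]
```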

For the other direction, let L be any recursively enumerable language. We wish to construct a CS-based transformational grammar 𝒢 such that L = L(𝒢). It is well known (see, for example, Ref. 7, p. 208, Theorem 0.3) that there must be an unrestricted rewriting system G such that L = L(G). In fact, G can even be chosen so that each rule is either of the form φXψ → φYψ where X and Y are in V_N* or of the form A → a where A is in V_N and a is in V_T. We will construct another URS G′, such that L(G′) is closely related to L(G). Let the terminal vocabulary of G′ be V_T ∪ {b} where b is a new symbol, and the nonterminal vocabulary of G′ be V_N ∪ {B} where B is also a new symbol. Let G′ have the same initial symbol as G, and have the following rules:

¹¹By "elementary language" we mean a language with an elementary characteristic function; see Corollary 6.7 ff.


I. If φ → ψ is a rule of G, then
(a) φ → ψ is a rule of G′ if the length of ψ is not less than the length of φ;
(b) φ → ψB^n is a rule of G′ if n = (length of φ) − (length of ψ) is greater than zero.
II. (c) BA → AB is a rule of G′ for all A ∈ V_N;
(d) B → b is a rule of G′.

The following relation holds between L(G′) and L(G) (= L): L(G) is the set of all strings obtainable by deleting all occurrences of the symbol "b" from strings y in L(G′). To see that all members of L(G) can be obtained in this way, note that given any x ∈ L(G) there is an integer m such that xb^m ∈ L(G′); it is also easy to see that only members of L(G) can be obtained in this way. Notice that every rule of G′ has a right-hand side at least as long as its left-hand side. Therefore, by the proofs of Kuroda's Lemmas 2 and 3 (Ref. 7, p. 211) there is a CS grammar 𝒫 such that (i) L(𝒫) = L(G′), (ii) the only rules of 𝒫 involving its initial symbol S are S → SA and S → S′ where A and S′ are nonterminals, and (iii) all other rules of 𝒫 are of the form CD → EF, C → E, and C → c where C, D, E, and F are nonterminals and c is a terminal.

𝒫 will be the base component of 𝒢. Let φ ∈ Σ(𝒫) and let x be the terminal string of φ. Each initial substring of x is an S in φ (in the sense of "is a" defined after Def. 2.11) and thus each b in φ is the rightmost terminal symbol in some subsentence of φ. Now consider the transformation T which deletes b when it is the rightmost terminal symbol in a sentence (in the notation of Sec. 2, let T be (2→2 ≡² b, {[T_d, (2, 2)]})). Although T does not necessarily preserve a copy of the symbol b, it satisfies the condition of recoverability of deletions by specifying that the deleted string must in every case be a single b. If we put 𝒯 = (T) and 𝒢 = (𝒫, 𝒯) then L(𝒢) is the set of all strings obtained by deleting each occurrence of the symbol "b" from each member of L(𝒫) [= L(G′)], since T applies cyclically to every phrase-marker φ in Σ(𝒫), operating on every subsentence. Thus L(𝒢) = L(G) = L. This proves the theorem. ∎

Theorem 5.1 was straightforward to prove, since it did not involve any use of sophisticated properties of transformations. Its proof relied ultimately on an observation of Scheinberg's concerning the relation of recursively enumerable languages to context-sensitive languages. We can strengthen Theorem 5.1 considerably at the cost of a much more intricate argument. We state Theorem 5.2 here and prove it in Ref. 10.

THEOREM 5.2. Every recursively enumerable language is generated by some context-free based transformational grammar, and conversely.

The linguistic import of Theorems 5.1 and 5.2 is considerable. To get an idea of this import we must briefly look at the meaning of the notion “possible


transformational grammar” for linguistics. In Aspects Chomsky states that the set of possible transformational grammars is the set of hypotheses which a child learning a language has available concerning the linguistic data he must explain. Learning a grammar consists of selecting one of these hypotheses on the basis of certain criteria. Theorems 5.1 and 5.2 show that the theory formulated in Aspects makes available to the child a hypothesis to explain any recursively enumerable set of data. It could be that the fact that the data will always be finite together with the criteria which are the basis of selection will prevent the child from ever learning a grammar which generates a nonrecursive language. But until the criteria of selection are more clearly stated we must consider the possibility that the child can actually learn any grammar made available given some set of data. In other words, until the criteria of selection are better de- fined, the theory of transformational grammar contained in Aspects should be taken as asserting that a child can learn a nonrecursive language. This is the linguistic import of the theorems in this section.

Such an assertion is quite strange, for it flies in the face of the intuitive meaning of recoverability of deletions. The intuitive meaning of this condition is that given a speaker who knows a grammar and given a string over the terminal vocabulary, the speaker can construct all structural descriptions for the string generated by the grammar and can furthermore determine that the grammar does not generate the string if this is the case.¹² Chomsky states, somewhat tentatively, that there should be an algorithm to determine for an arbitrary possible grammar and an arbitrary string over the terminal vocabulary what structural descriptions the grammar assigns to the string, if any, as well as whether the grammar generates the string at all (Ref. 1, pp. 31-32 and footnote 18, p. 202). This is an even stronger condition than what we have called the intuitive meaning of the condition of recoverability of deletions. It follows from either of these conditions by Church's Thesis that each possible grammar generates a recursive language. Chomsky later says that conceivably the child has available a grammar for any recursively enumerable language (Ref. 1, p. 62).

But even as he makes this statement, he adds that this “seems definitely not to be the case” (Ref. 1, footnote 37, p. 208) and he clearly expects the condition of recoverability of deletions as formulated in Aspects to require that every possible grammar generates a recursive language. Thus Theorem 5.1 can be taken as an indication of something amiss in the theory of transformational

¹²Chomsky states that each possible grammar should assign a structural description to each "possible sentence" (in our framework, each string over the terminal vocabulary), and that for certain possible grammar-sentence pairs the structural description will indicate that the sentence deviates from well-formedness, and further will state the respect in which the sentence deviates from well-formedness.


grammar as presented in Aspects.¹³ We have stated Theorem 5.2 here to show that these problems cannot be solved by restricting the base to be context-free.

For this reason, among others, it seems reasonable to us to search for con- ditions under which transformational grammars generate recursive languages. Another reason which supports this search is simply the desirability of restricting as much as possible the class of possible grammars without ruling out as im- possible grammars which are required for any known natural language. Thus we should look not just for conditions under which transformational grammars gen- erate recursive languages, but for such conditions as might conceivably be sup- ported by empirical observations of linguists as to what types of transformational grammars are needed to describe known languages and what types are not. In the next section we investigate some such conditions.

6. THE DECIDABILITY OF TRANSFORMATIONAL LANGUAGES WITH BOUNDED CYCLING

It was shown in Sec. 5 that some nonrecursive languages are generated by transformational grammars. We now wish to study the nature of these grammars in order to determine which portions of the generative process are in fact ef- fective, and isolate exactly those points at which the noneffectiveness enters.

Section 5 contradicts the following "plausible argument" that transformational languages are effectively decidable. To decide whether a given terminal string x is in the language generated by the transformational grammar (𝒫, 𝒯), apply the following procedure:

(1) enumerate successively the well-formed terminal labeled bracketings over the vocabulary of 𝒫;

(2) given such a well-formed labeled bracketing, decide whether it is strongly generated by 𝒫; if not, continue the enumeration in (1); if so, go to (3);

(3) apply the transformations of 𝒯 to the given well-formed labeled bracketing and determine whether or not the last line of the transformational derivation obtained is a well-formed surface phrase-marker having x as its debracketization.
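The three-step procedure can be written out as code; over a genuinely infinite enumeration in step (1) it only semi-decides membership, since it can confirm that x is in the language but never learns when to stop looking. All three arguments below are assumed stand-ins (a finite list and two toy predicates).

```python
def plausible_procedure(x, bracketings, strongly_generated, surface_string):
    """Steps (1)-(3) of the 'plausible argument'."""
    for phi in bracketings:                    # step (1)
        if strongly_generated(phi):            # step (2)
            if surface_string(phi) == x:       # step (3)
                return True
    return False  # reachable only because the toy enumeration is finite
```

For instance, with bracketings = ["[S a ]S", "[S b ]S"], strongly_generated = lambda p: True, and surface_string = lambda p: p.split()[1], the call plausible_procedure("b", ...) returns True, while for "c" the real (unbounded) procedure would simply never halt.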

¹³The reader should note that, as usually formulated (and as we have formulated it), the condition of recoverability of deletions does not attempt to capture what we have called its intuitive meaning. Rather it is intended to assert that if one is given a labeled bracketing which one knows to be the result of applying a certain transformation, then one can reconstruct the labeled bracketing to which the transformation applied. Thus the principle as usually formulated would not necessarily guarantee that one could recover a deep structure underlying a given surface structure unless one knows at least the sequence of transformations that must be applied in deriving the surface structure from (one of) its deep structure(s). This is a very peculiar notion of recoverability of deletions, since in interpreting a sentence a speaker is not given a list of what transformations apply in the derivation of the sentence. Discovering what transformations apply is perhaps a part of the process of interpreting a sentence, but the information is not given in advance.


We now proceed to analyze this argument in order to determine why it does not in fact yield a decision procedure. In so doing, we shall isolate the one element missing from the argument, and then be able to formulate a necessary and sufficient condition for a transformational language to be recursively decidable.

This analysis uses a variety of properties of transformational grammars as defined in Sec. 4; however, we shall summarize in the following three lemmas all the results about grammars used in this section.

LEMMA 6.1. If S is a context-sensitive grammar, then there is a constant K1 and a Turing machine Z1 on V_T ∪ L ∪ R ∪ {Δ} which accepts Σ(S), rejects the complement of Σ(S), and uses at most K1^(l(φ)) tape squares on any input φ of length l(φ).

LEMMA 6.2. If (S, T) is a transformational grammar, then there is a constant K2 and a Turing machine Z2 on V_T ∪ L ∪ R ∪ {Δ} which accepts a tape of the form Δ x Δ φ Δ #^l Δ, where x ∈ (V_T − {#})*, φ ∈ (V_T ∪ L ∪ R)*, and l is a positive integer, if and only if φ is well-formed and there is a transformational derivation (ψ1, ..., ψt) with respect to T such that φ = ψ1, x = d(ψt), and l(ψi) ≤ l for each i = 1, ..., t. Further, the number of tape squares used is at most K2·l^2.

LEMMA 6.3. For every transformational grammar G = (S, T) there is a constant K3 such that, if x is in L(G) and if s is an integer such that some deep phrase-marker underlying x contains at most s subsentences, then there is a transformational derivation φ1, ..., φt such that φ1 ∈ Σ(S), d(φt) = x, and further, for each i = 1, ..., t, l(φi) ≤ K3^s·l(x).

The reader who is interested in the results of this section and who is willing to accept these lemmas can read this section, with the exception of the proofs of these lemmas, without any knowledge of the definitions in Secs. 2, 3, and 4. No reference to those sections is made until the proofs of these lemmas, which are given at the conclusion of this section.

Consideration of the "plausibility" argument with which this section began (cf. Theorem 6.4 below) will show that the procedure does, in fact, yield a transformational derivation of each terminal string x which is in L(G), if the interpretation of clause (3) is that the procedure returns to (1) when the last line is not a surface phrase-marker of x. The flaw lies in the inability of this procedure to identify those strings not in L(G). When presented with any such string x, the procedure will not terminate, but will instead continue enumerating and testing longer and longer candidates for the deep phrase-marker underlying x. If the process incorporated an effective method for determining from x when to stop enumerating in step (1) and conclude that x is not in L(G), it would constitute the desired decision procedure. The results of Sec. 5 thus make clear that there is no such method for determining when to stop the enumeration. For


example, there cannot be any effective method for computing from each x an upper bound on the length of the shortest deep phrase-marker underlying x, where this bound is taken to be zero when x is not a member of L(G).

Even though we know that it is not the case that every transformational grammar has associated with it an effective method for finding from an x an upper bound for the shortest underlying deep structure, some clearly do. In fact, this discussion indicates that the existence, for a particular grammar G, of such an effective method for obtaining these upper bounds is a sufficient condition for the language L(G) to be decidable. The condition is also necessary, since if L(G) is decidable one can, given x, apply the decision procedure and, if x is in L(G), then continue by executing steps (1), (2), and (3) of the above procedure to obtain a deep phrase-marker underlying x, and hence an upper bound.

Thus, we will see that, although there is no decision procedure for arbitrary transformational languages, a modification of the fallacious "plausible argument" provides a necessary and sufficient condition for the language generated by a transformational grammar to be decidable (cf. Corollary 6.6 below). We have phrased the condition in terms of the length of deep phrase-markers, but in our rigorous treatment below we will use the linguistically more significant notion of the number of subsentences of the deep phrase-marker, or, equivalently, the number of cycles in the transformational derivation. We shall do this by showing that a bound on the number of subsentences, or of cycles, provides an upper bound on the length of an underlying deep phrase-marker.

We begin a more rigorous investigation of the argument given above by noting that Lemma 6.1 gives a Turing machine which performs step (2) of the procedure. At this and the remaining stages of this section, we will also explicitly note the amount of tape required by each Turing machine described, in order to determine when the decision procedure for a particular transformational grammar falls in a restricted class of procedures, such as the primitive recursive or elementary recursive ones. Lemma 6.2 shows that step (3) of the procedure can be carried out (within limited storage, if there is an upper bound on the length of any line of a derivation). Lemma 6.3 gives an upper bound on the length of a deep phrase-marker (and on each line of a derivation from this phrase-marker to the terminal string) in terms of the number of subsentences in the deep phrase-marker, equivalently, the number of cycles in the derivation. Theorem 6.4 produces a Turing machine which, given a string x and a number s, determines whether x has any deep phrase-marker containing s or fewer subsentences. Corollary 6.6 justifies our remark that a necessary and sufficient condition for decidability of languages generated by transformational grammars is obtained in terms of computability of a bound on the number of subsentences or cycles. Corollary 6.7 shows that if the bound is a very simply computable function of the input string, say elementary recursive, then the decision procedure is similarly easily executed, say is an elementary recursive one.


We begin with a proof of Theorem 6.4 from Lemmas 6.1, 6.2, and 6.3.

THEOREM 6.4. For every transformational grammar there is a Turing machine Z, whose alphabet consists of the terminal symbols of the grammar together with a blank Δ, which accepts a tape Δ x Δ #^s Δ if underlying x there is a deep phrase-marker containing at most s subsentences, and which rejects Δ x Δ #^s Δ otherwise. Further, there is a constant C such that Z uses at most C^(C^s·l(x)) tape squares on this input.

Proof. Let G = (S, T) be a transformational grammar. We shall construct a Turing machine Z' operating over the alphabet V_T ∪ L ∪ R ∪ {0, Δ} which operates as desired, and appeal to the well-known result (stated and proved here for completeness as Lemma 6.8) that symbols may be removed from the alphabet of a Turing machine without affecting the function computed, at the cost of a linear increase in length of tape. On input Δ x Δ #^s Δ, Z' arranges its tape as follows:

Δ x Δ # ... # Δ 0 ... 0 Δ 0 ... 0 Δ 0 ... 0 Δ
 (1)    (2)      (3)       (4)        (5)

with section (1) of length l(x) holding x, section (2) holding s #'s, and sections (3), (4), and (5) consisting of K3^s·l(x), K1^(K3^s·l(x)), and K2·K3^(2s)·l^2(x) 0's respectively,

where K1, K2, and K3 are the constants given in Lemmas 6.1, 6.2, and 6.3.

The procedure is then to enumerate terminal labeled bracketings φ in increasing order in position (3) and, as each is produced: if it exceeds the space provided in position (3), reject the input; if not, check in position (4) by Lemma 6.1 whether φ is in Σ(S). If it is not, or if it contains more than s subsentences, continue the enumeration. If φ is in Σ(S) and has at most s subsentences, then copy Δ x Δ φ Δ #^(K3^s·l(x)) Δ into position (5) and check by the procedure of Lemma 6.2 whether φ is a deep structure underlying x; if not, continue enumerating. If x has a deep phrase-marker with no more than s subsentences, then Lemma 6.3 guarantees that there is one, φ, such that each line of a transformational derivation of x from φ is no longer than K3^s·l(x), so that the procedures of Lemmas 6.1 and 6.2 can be carried out on the tape as set up by Z'.

To obtain the constant C, we note that Z' uses l(x) + s + K3^s·l(x) + K1^(K3^s·l(x)) + K2·K3^(2s)·l^2(x) + 4 tape squares. Letting C be, for example, K1 + K2 + K3, we have the theorem. ∎

The Turing machine Z of Theorem 6.4 is almost the machine described by the fallacious "plausible argument" with which this section began, except that a number s is required as input in addition to the string x. We know that s cannot be eliminated for all transformational grammars, but we now proceed to eliminate s for certain grammars.


Definition 6.5. The cycling function f_G of a transformational grammar G is that function from V_T^+ into the non-negative integers whose value on a string x is 0 if x ∉ L(G), and otherwise is the smallest number s such that some deep phrase-marker underlying x has s subsentences.

COROLLARY 6.6. The following three conditions are equivalent for any transformational grammar G:

(i) the language L(G) is decidable;
(ii) the cycling function f_G is recursive;

(iii) f_G is bounded (pointwise) by a recursive function.14

Proof. Assume that f_G is bounded by a recursive function f. Given x, to decide membership in L(G), compute f(x) and apply the machine Z of Theorem 6.4 to Δ x Δ #^(f(x)) Δ; thus (iii) implies (i). To see that (i) implies (ii), assume that L(G) is decidable. To compute f_G(x), output 0 if x ∉ L(G); otherwise, set f_G(x) equal to the smallest integer s for which the machine Z of Theorem 6.4 accepts Δ x Δ #^s Δ. That (ii) implies (iii) is trivial, since f_G bounds itself. ∎
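The (iii) implies (i) direction can be sketched in a few lines of code. Everything below is a toy stand-in, not the paper's construction: z_accepts plays the role of the machine Z of Theorem 6.4 for an invented "grammar" whose language is {a^(2^k) : k ≥ 0}, and cycling_bound is an invented recursive bound; the sketch only illustrates how a computable bound turns the bounded search into a decider:

```python
def z_accepts(x, s):
    """Toy stand-in for the machine Z of Theorem 6.4: accept x just when
    some 'deep structure' with at most s subsentences (here modeled as s
    doublings of a single 'a') underlies x."""
    for k in range(s + 1):
        if x == "a" * (2 ** k):
            return True
    return False

def cycling_bound(x):
    """A recursive (indeed elementary) pointwise bound on the toy grammar's
    cycling function: a member a^(2^k) always has k <= len(x)."""
    return len(x)

def decide(x):
    """Corollary 6.6, (iii) => (i): with a computable bound in hand, a
    single call to the bounded search settles membership."""
    return z_accepts(x, cycling_bound(x))
```

Without cycling_bound, z_accepts alone cannot reject: one would never know how large an s to try.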

This corollary, which completes our discussion of decidability of transformational grammars, does not make use of the bound obtained on tape storage used in Theorem 6.4. This bound becomes g(x) = C^(C^(f(x))·l(x)) when f bounds the cycling function, and g is not much more "difficult to compute" than is f. For example, if f is primitive recursive, or elementary recursive (in the sense of Csillag-Kalmar), so is g. Further, it was shown in Ref. 12 that functions computed by Turing machines for which the amount of tape used is elementary recursive are again elementary, and the extension to primitive recursive, though not drawn there, is implicit, and was affirmed in Ref. 4. Hence we have the following corollary.
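In symbols, this observation merely restates the bound of Theorem 6.4 (C is the constant there, and f is any recursive bound on the cycling function):

```latex
% If f_G(x) <= f(x) for all x, Theorem 6.4 decides membership in space
g(x) \;=\; C^{\,C^{f(x)}\, l(x)} .
% The elementary recursive (and the primitive recursive) functions are
% closed under the operations used to form g from f -- exponentiation
% with the constant base C and multiplication by l(x) -- so g lies in
% the same class as f.
```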

COROLLARY 6.7. If G is a transformational grammar whose cycling function f_G is bounded by an elementary (primitive) recursive function, then L(G) is an elementary (primitive) recursive language. (The same holds upon substituting ℰ^n, for any n ≥ 3, for "elementary" or "primitive," where ℰ^n is defined in Ref. 5.)

Proof. In Ref. 12, p. 148, it was shown that the class of elementary recursive functions of Csillag-Kalmar (see Refs. 5; 8, p. 76; or 6, Ex. 1, Sec. 57, p. 285, for a definition) can be characterized as the class of predictably computable functions, i.e., the union of the classes F_i for i = 1, 2, .... Here F_0 is the class of functions computable on finite automata, and F_(i+1) is the class of all functions f computable on Turing machines for which there is a g in F_i such that, for each x, g(x) is an upper bound on the storage used to compute f(x). It was shown in Ref. 12 that, if f(x) is in F_i, then for any constant C, C^(f(x)) is in F_(i+1), as is C^(f(x))·l(x). Hence, if the cycling function f_G(x) for a grammar G is bounded by a function in F_i, then the bound C^(C^(f_G(x))·l(x)) on tape squares used to decide membership of x in L(G), provided in Theorem 6.4, is in F_(i+2), and thus the characteristic function of L(G) is in F_(i+3) (i.e., membership in L(G) is decidable in class F_(i+3)). It was implicit in Ref. 12, and noted explicitly in Ref. 4, that if the tape bound for a computation is in ℰ^n, for any n ≥ 3, then so is the function computed; hence the result for primitive recursive functions also follows, since the primitive recursive functions are the union of the classes ℰ^n. ∎

14 We shall assume that the domain of f_G has been encoded in a suitable fashion into the non-negative integers so that we may speak of recursiveness; the p-adic encoding is a natural one, for example.

It is worth noting that Corollary 6.7, unlike Corollary 6.6, does not establish the equivalence of elementary recursiveness of f_G and of L(G). The question whether a grammar for an elementary (predictably computable) language can have very complex deep structures, with "unpredictable" nesting in the sense that the cycling function is not predictably computable, is an interesting one which the authors have not studied. A positive answer would suggest that decisions of grammaticality can be made in ways far "simpler" than the resurrection of an entire deep structure, a finding with some psycholinguistic interest.

We now conclude this section with the proofs of Lemmas 6.1, 6.2, and 6.3. It will be convenient also to state and prove explicitly the result, used in the proof of Theorem 6.4, that the alphabet on which a Turing machine operates may be modified with little effect on the storage needed.

LEMMA 6.8. Let Z be a Turing machine on the alphabet V1, let V2 be a subset of V1 containing at least two symbols, and let c be the number of symbols in V1 − V2. There is a Turing machine Z' on V2 such that, for every string φ ∈ V2*, Z' accepts (rejects) φ if and only if Z accepts (rejects) φ, and further, Z' uses exactly c + 1 times the amount of tape used by Z on input φ.

Proof. Let b1 and b2 be distinct elements of V2. We shall encode V1 into V2* as follows. Each string of V1* of length l is replaced by a string of V2* of length (c + 1)·l; namely, each symbol a of V2 is replaced by a·b1^c, and the kth symbol of V1 − V2 is replaced by b2^(c−k)·b1·b2^k. The machine Z' operates as follows. On an input φ in V2*, it replaces φ by its encoded version, multiplying the length of the tape by c + 1, then operates on the encoded version exactly as Z does (interpreting each string of c + 1 symbols as a single symbol of V1). ∎
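The encoding in this proof is mechanical enough to check by machine. The sketch below (all names ours) builds the symbol table exactly as described, with every code block of length c + 1, and lets one verify that distinct symbols of V1 receive distinct codes:

```python
def make_encoder(v1, v2):
    """Build the symbol encoding of Lemma 6.8. V2 must be a subset of V1
    with at least two elements; c = |V1 - V2|. Each symbol a of V2 maps to
    a b1^c; the kth symbol of V1 - V2 (k = 1, ..., c) maps to
    b2^(c-k) b1 b2^k. Every code block has length c + 1."""
    b1, b2 = sorted(v2)[:2]
    extra = sorted(set(v1) - set(v2))
    c = len(extra)
    table = {a: a + b1 * c for a in v2}
    for k, sym in enumerate(extra, start=1):
        table[sym] = b2 * (c - k) + b1 + b2 * k
    return table, c

def encode(string, table):
    """Encode a string of V1*; the output has length (c + 1) times l."""
    return "".join(table[ch] for ch in string)
```

For instance, with V1 = {a, b, c, d, [, ]} and V2 = {a, b}, each of the six symbols gets a distinct 5-symbol code over {a, b}.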

Proof of Lemma 6.1. We shall construct Z on the alphabet V_T ∪ L ∪ R ∪ {Δ} ∪ V_N ∪ {0, 1, 2, ..., j} and appeal to Lemma 6.8. Upon input of a string φ, Z checks that φ is in (V_T ∪ L ∪ R)*, and if so sets up the tape as follows:


Δ φ Δ # S # Δ ... Δ 1 0^(l−1) Δ 1 0^(l−1) Δ ... Δ 1 0^(l−1) Δ

with (p + q)^l copies of 1 0^(l−1) Δ, where p and q are the cardinalities of V_T and V_N respectively, and l = l(φ). In the section of the tape initially containing # S #, Z carries out a strong derivation in accordance with the rules of S.

The (p + 4)’ rightmost sections of the tape, each of length I, which will be referred to as “counters,” are used to specify, at each step of the derivation, that rule to be applied and the position at which to apply it, and also to assure that no possible derivation is overlooked. Since without loss of generality we may require that no line of a derivation be repeated, since there are fewer than k’ distinct strings of length at most I- 2 over an alphabet of k symbols, and since the outermost two symbols ignoring #‘s of each line but the first in our derivation are [S and] s, we see that if there is a derivation of cp, then theremust be one with less than (p + 4)l steps. Specification of a derivation is accom- plished as follows. The ith counter specifies the action taken at the ith step; it will contain a 0 on each of its 1 squares except one, say the mth which will con- tain an integer n, 1 Q n < j. This setting of the ith counter requires the nth rule of 9 to be applied to the mth symbol of the string in this ith step. If that is not possible (if the string at this step is less than m symbols long, if its mth symbol is not the nonterminal rewritten by the nth rule, or if the context of the nth rule is not satisfied at this position or if the result would be a string more than I symbols long) then the string at this ith step is compared with cp. If they are equal, the computation terminates and cp is accepted; if not, then this attempted derivation fails and the next attempt is begun by restoring S on the second section of the tape and by passing to the next arrangement of the counters. If all arrangements have been tried, cp is rejected. We leave to the reader the specification of a systematic procedure for running through all the (p + 4)‘. 1. j possible arrangements of the counters so that every possible derivation of length (p + q)l or less is attempted. The tape used in this computation is actually [(p + s)’ t 21 (It 1) t 3 tape squares. 
Reducing this to the alphabet VT U L U R U {A} by Lemma 6.8, the Turing machine Zr uses (4 t j + 2) times this many tape squares. Hence, setting K1 equal to, for example (p + 4 + j)’ establishes the desired result. n
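The exhaustive, length-bounded search that the counters implement can be illustrated with a toy rewriting system. The sketch below is ours, not the paper's construction: it replaces the counter bookkeeping with a bounded depth-first search over the same choices, namely which rule to apply and at which position, never letting a line grow past the length bound and never repeating a line:

```python
def derivable(rules, start, phi, max_len):
    """Bounded search for a derivation of phi from start, in the spirit of
    the proof of Lemma 6.1: try every (rule, position) choice, prune any
    line longer than max_len, and never revisit a line. `rules` is a list
    of (lhs, rhs) rewriting rules; lhs may be any substring, so
    context-sensitive rules are expressible."""
    seen = set()

    def search(line):
        if line == phi:
            return True
        if line in seen or len(line) > max_len:
            return False
        seen.add(line)
        for lhs, rhs in rules:            # which rule (the 'n' of a counter)
            pos = line.find(lhs)
            while pos != -1:              # where to apply it (the 'm')
                if search(line[:pos] + rhs + line[pos + len(lhs):]):
                    return True
                pos = line.find(lhs, pos + 1)
        return False

    return search(start)
```

With the rules S → aSb and S → ab, for example, the search finds a derivation of "aaabbb" within a length bound of 10, but no derivation of "aab" at all.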

Proof of Lemma 6.2. We appeal to Lemma 6.8 and add 0 to the alphabet before constructing Z. Given (x, φ, #^l), Z rejects the input if l(φ) > l or if φ is not well-formed; otherwise, it sets up the tape as follows:

Δ x Δ φ Δ Δ^(n−1) φ 0^(l−l(φ)) Δ Δ^(n−1) 0^l Δ ... Δ Δ^(n−1) 0^l Δ

with (k + 1)·s counter sections in all,


where k is the number of transformations of T, n is the maximum number of terms in any of these transformations, and s is the number of subsentences in φ. The leftmost two sections of this tape retain permanent copies of x and φ, while the rightmost (k + 1)s sections, which we will call "counters," are used to carry out transformational derivations from φ. Since each line in any derivation has length less than l, and since the transformation applied to any line analyzes it into at most n terms, which we can record by the insertion of n − 1 or fewer Δ's into the line, each counter can hold a line of a derivation factored into the terms appropriate to the next transformation to be applied. Since there can be only s cycles of transformations, each of k + 1 steps, the tape is sufficient to hold an arbitrary transformational derivation from φ each line of which is to have length less than l. The ith counter is used to specify the factorization of the line obtained at the ith step; and each counter is set by inserting n − 1 Δ's into a string of l 0's. We have indicated the insertion of these n − 1 Δ's in the leftmost spaces in each counter, and the duplication of φ following these in the first counter, in our description of the tape we set up.

Each time a derivation is unsuccessful, the next position of the counters is obtained and a new derivation is attempted. We leave to the reader the specification of a systematic procedure for moving the (k + 1)s·(n − 1) Δ's in the counters through all possible positions so that all possible factorizations are attempted at each step of a derivation.

To describe the manner in which a derivation is carried out, let us describe an arbitrary step in such a derivation. Assume that the transformations are T1, ..., Tk, and that they are n1-, ..., nk-term transformations respectively, ni ≤ n. Consider the [(k + 1)(p − 1) + q]th step, 1 ≤ p ≤ s, 0 ≤ q ≤ k, and let the [(k + 1)(p − 1) + q]th line of the derivation be ψ. If q = 0, simply place p(ψ) into the spaces occupied by 0's in the next counter, skipping spaces occupied by Δ's. If q ≥ 1, let the contents of the counter be χ1 ω1 Δ ω2 ... ω_(m−1) Δ ω_m χ2, where ω = ω1 ω2 ... ω_(m−1) ω_m is the pth subsentence of ψ, and where the remaining Δ's, if any, are in the χ's. (More precisely, χ2 contains exactly s − p occurrences of ]S.)

If (ω1, ..., ω_m) is a proper analysis of ω for Tq, then apply Tq to (ω1, ..., ω_m) and write χ1' Tq(ω1, ..., ω_m) χ2' in the spaces occupied by 0's in the next counter, where χi' is the result of deleting Δ's from χi. If the result is more than l symbols long, reset the counters and try the next derivation. If (ω1, ..., ω_m) is not a proper analysis of ω for Tq, write χ1' ω χ2' into the spaces holding 0's in the next counter.

In this process, when a result appears in the (k + 1)s-th counter, its debracketization should be compared with x and, if different, the counters should be reset and the next derivation tried. If equal, then we have carried out a transformational derivation of x, unless at some step in which the factorization was not a proper analysis there was some other factorization which could be a proper


analysis. Recall that our transformations are obligatory, and if any analysis at the [(k + 1)(p − 1) + q]th step is proper, then the qth transformation must apply there.15 Hence, if x is obtained at the end of a purported derivation, we then go back through each step at which the analysis was improper, and try all other arrangements of the n − 1 Δ's. If some arrangement yields a proper analysis, we reject this purported derivation, reset the counters, and try the next derivation. If there is no proper analysis, we reset the Δ's in this one counter, and try the next counter at which the analysis was improper. If no counter holding an improper analysis can be reset in this way to yield a proper analysis, then the input tape Δ x Δ φ Δ #^l Δ is accepted.

The number of tape squares used is 2[l(x) + l(φ) + (k + 1)s(l + n) + 3] and, since l(x), l(φ), and s are each less than l, we can obtain the desired bound by setting K2 equal to, for example, 4(k + 1)n. ∎

Proof of Lemma 6.3. We shall show that K3 may be taken to be [n(4q + 1)(c + 1)]^(k+1), where k is the number of transformations in T, n is the maximum number of terms in any of the k transformations in T, q is the number of nonterminals in S, and c is the length of the longest terminal string y mentioned in the structural condition of any transformation in T.

Consider the step of the transformational derivation taking φ_i to φ_(i+1). We desire an upper bound on l(φ_i)/l(φ_(i+1)); since these labeled bracketings are reduced, it will suffice to bound the ratio of the debracketizations. Letting l be the length of d(φ_(i+1)), we now show that the length of d(φ_i) can be at most n(l + c). The only elementary transformations which can shorten φ_i are deletion and substitution, and the condition of recoverability of deletions requires that each application of these operations occur only when there is an associated condition either of the form (i) in Def. 2.13 or of the form (ii) in that definition. The greatest number of terminals that can be erased from any term of the proper analysis of φ_i by a deletion or substitution is thus less than l + c: l if the condition is of type (i), since then a copy (of length at most l) is left in φ_(i+1); c if the condition is of the other form. Since at most one operation can be applied to each of the at most n terms, the length of d(φ_i) is less than n(l + c). Since φ_i is reduced, its length is less than 2q[2n(l + c) − 1] + n(l + c) by Lemma 2.4. Since l ≥ 1, l(φ_i) < n(4q + 1)(c + 1)·l. Since l is less than l(φ_(i+1)), we have l(φ_i) < n(4q + 1)(c + 1)·l(φ_(i+1)) for each i = 1, ..., t − 1, so that l(φ_i) ≤ [n(4q + 1)(c + 1)]^(t−i)·(4q + 1)·l(x), where the right factors are obtained by Lemma 2.4 using x = d(φ_t). Finally, we note that t is at most (k + 1)s, and this gives the desired result l(φ_i) ≤ {[n(4q + 1)(c + 1)]^(k+1)}^s·l(x). ∎

15 Throughout the development, we could have generalized the notion of transformation to include optional transformations, by allowing an optional/obligatory distinction to be specified for each transformation, without changing our results. In this proof, we would check only for transformations marked obligatory that there is no proper analysis, by the method described above.
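The chain of inequalities in the last few sentences can be displayed in full (same symbols as the proof; Lemma 2.4 supplies the bound on the length of a reduced labeled bracketing in terms of its debracketization):

```latex
\begin{aligned}
l(\varphi_i) &\le 2q\bigl[2n(l+c)-1\bigr] + n(l+c)
              < (4q+1)\,n(l+c) \le n(4q+1)(c+1)\,l
              && (l \ge 1),\\
l(\varphi_i) &< n(4q+1)(c+1)\,l(\varphi_{i+1})
  \;\Longrightarrow\;
  l(\varphi_i) \le \bigl[n(4q+1)(c+1)\bigr]^{\,t-i}(4q+1)\,l(x),\\
t &\le (k+1)s
  \;\Longrightarrow\;
  l(\varphi_i) \le \Bigl(\bigl[n(4q+1)(c+1)\bigr]^{k+1}\Bigr)^{\!s}\,l(x)
  \;=\; K_3^{\,s}\,l(x).
\end{aligned}
```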

7. SOME CONCLUDING REMARKS

We have seen that the ability of transformational grammars to generate non-recursive languages, nonprimitive recursive languages, nonelementary languages, etc., resides in the fact that very short sentences may have very large numbers of cycles in their derivations, and thus a great amount of deletion may take place in the transformational derivation even though it is all "recoverable." Thus Corollaries 6.6 and 6.7 show that any restriction which limits the number of subsentences in the deep phrase-markers of strings generated by a transformational grammar can be interpreted as a stronger condition of recoverability of deletions. Available transformational grammars of natural languages do not make use of the power to take enormous numbers of cycles in the derivation of very short sentences. In fact, it appears that for every transformational grammar G written for a natural language there is a constant k such that the function k^(l(x)) bounds f_G. Since k^(l(x)) is an elementary function, the language L(G) is elementary by Corollary 6.7; it is even in F3, as can be seen from the proof of this corollary, since 2^(l(x)) is in F0.

These observations suggest that an appropriate line of research for the discovery of a more adequate condition of recoverability of deletions would be to search for empirically supportable restrictions on transformational grammars which would guarantee that the cycling function of such grammars be bounded by an exponential or polynomial function. This would become especially interesting if the length of the deep phrase-marker were linear in the terminal string x. Then we would know that the languages generated by these grammars were context-sensitive, since this restriction would permit checking of base and transformational components to be done nondeterministically in linearly bounded storage.

We relate our results to some remarks and proposals of Putnam (Ref. 11). There he noted that every recursively enumerable language is generated by a transformational grammar, and made several suggestions for conditions which would restrict the transformational languages to being recursive. We will return to his reasons for desiring such restrictions. He suggested two conditions (Ref. 11, p. 42): (i) that the transformational rules be made "cut-free" in the sense that the output of a transformation never be shorter than its input, and (ii) that there be constants n1 and n2 for each transformational grammar such that at most n1 terminals can be deleted by any transformation and at most n2 deletion transformations can be applied in any derivation.

Empirical considerations clearly rule out both of these as restrictions on the definition of a transformational grammar. Noting this, Putnam proposed that


the class of transformational grammars be defined so that they satisfy a "cut-elimination" theorem. We can interpret this rather broadly to mean that for every grammar G1 in the class there is another grammar G2 such that (i) L(G1) = L(G2), and (ii) there is a constant k with the property that for every x ∈ L(G2) there is a deep phrase-marker φ underlying x with respect to G2 such that l[d(φ)] ≤ k·l(x). We now see that any grammar satisfying such a cut-elimination theorem generates a language which, more than being recursive, is context-sensitive. This is so because a nondeterministic linear bounded automaton can determine both that a labeled bracketing φ is strongly generated by a context-sensitive grammar and that it underlies a given string x, if the automaton has enough tape to write φ (since the C^(C^s·l(x)) sections of the tape in the proof of Theorem 6.4 are used only to check deterministically all possibilities, and hence are dispensable in nondeterministic operation). However, we have no way of settling the question whether grammars of natural languages satisfy a cut-elimination theorem.

Thus, let us return to the point discussed at the end of Sec. 5, where we concerned ourselves with the question whether all natural languages are recursive. Putnam offers an argument (Ref. 11, pp. 39-41) that natural languages are recursive. His argument involves several highly debatable assumptions, and in addition it is in reality an argument that the set of sentences of a natural language acceptable to a speaker under performance conditions is recursive, rather than an argument about the set of sentences specified as grammatical by the speaker's competence (Ref. 1, pp. 3-4, 10-15). We are able to circumvent these difficulties and offer a new argument based on empirical research in linguistics.

There has been a great deal of work describing the competence of native speakers of a variety of natural languages by transformational grammars. As we have noted, all these grammars seem to have exponentially bounded cycling functions. Thus, if one makes the empirically falsifiable assumptions (a) that every natural language has a descriptively adequate transformational grammar, and (b) that the languages investigated so far are typical as regards the computational complexity of their cycling functions, then it follows that the set of grammatical sentences of every natural language is recursive, in fact predictably computable and in F3 at worst. There is a great deal of empirical evidence to support assumption (a), and we see no reason to doubt (b); thus we feel that this argument is empirically well supported. It provides strong justification for our feeling, expressed at the end of Sec. 5, that recoverability of deletions should restrict natural languages to being recursive. It is worthy of note that the assumptions of this argument are not philosophical but empirical in nature.

Thus we can justify the intuition of virtually all linguists that natural languages are recursive. This provides motivation for the desire of transformational linguists, as seen for example in Ref. 1, footnote 37, p. 208, to restrict deletions so that transformational languages are recursive. Although we have shown that the restrictions currently imposed on deletions do not accomplish this, our results provide guidance for research into this problem.

REFERENCES

1. Noam Chomsky, Aspects of the Theory of Syntax, M.I.T. Press, Cambridge (1965).
2. Noam Chomsky, Current Issues in Linguistic Theory, Mouton, The Hague (1964).
3. Noam Chomsky, On certain formal properties of grammars, Information and Control 2, 137-167 (1959).
4. Alan Cobham, The intrinsic computational difficulty of functions, Logic, Methodology and Philosophy of Science (Proc. 1964 Internat. Congr.), North-Holland, Amsterdam (1965), pp. 24-30.
5. Andrzej Grzegorczyk, Some classes of recursive functions, Rozprawy Matematyczne, Warsaw (1953).
6. S. C. Kleene, Introduction to Metamathematics, Van Nostrand, Princeton, N.J. (1952).
7. S. Y. Kuroda, Classes of languages and linear-bounded automata, Information and Control 7, 207-223 (1964).
8. Rozsa Peter, Rekursive Funktionen, Akademiai Kiado, Budapest (1951).
9. Stanley Peters, A note on the equivalence of ordered and unordered grammars, Harvard Computation Laboratory Report to NSF, No. 17 (1966).
10. Stanley Peters and R. W. Ritchie, On restricting the base component of transformational grammars, Information and Control 18, 483-501 (1971).
11. Hilary Putnam, Some issues in the theory of grammar, The Structure of Language and Its Mathematical Aspect (Roman Jakobson, Ed.), American Mathematical Society, Providence, R.I. (1961).
12. R. W. Ritchie, Classes of predictably computable functions, Trans. Amer. Math. Soc. 106, 139-173 (1963).
13. John R. Ross, A proposed rule of tree pruning, Harvard Computation Laboratory Report to NSF, No. 17 (1966).

Received May, 1969; revised version received April, 1971