Uniquely Parsable Unification Grammars and Their Parser Implemented in Prolog


JIA LEE, KENICHI MORITA, HIROKI ASOU and KATSUNOBU IMAI
Hiroshima University, Faculty of Engineering, Higashi-Hiroshima, 739-8527, Japan
E-mail: [email protected]

Abstract. A uniquely parsable grammar (UPG) introduced by Morita and coworkers is a formal grammar with a restricted type of rewriting rules, where parsing can be performed without backtracking. By extending a UPG, we introduce a uniquely parsable unification grammar (UPUG), and we investigate its applicability to parsing. A unification grammar (UG) is a system in which a sequence of terms is rewritten by a set of rules, and the rewriting process is accompanied by unification of terms as in Prolog. We first define a general framework of a UG and then give a UPUG-condition under which it has the property of unique parsability. Since the class of UPGs is a subclass of UPUGs and is known to be universal in language generating ability, the class of UPUGs is also universal. We then show a simple parsing method for UPUGs. Based on it, we give a Prolog implementation of a parser, which will be useful for natural language analysis and other applications.

Key words: deterministic parsing, unification grammar, uniquely parsable grammar

1. Introduction

A uniquely parsable grammar (UPG), introduced by Morita et al. (1997), is a generative grammar whose rewriting rules satisfy the following condition: if a suffix of the right-hand side of a rule matches a prefix of the right-hand side of some other rule, then these overlapping portions remain unchanged by the reverse application of these rules. Due to this condition, UPGs have a kind of confluence property and thus can be parsed without backtracking.

Usual grammars have, in general, a 'nondeterministic' nature in both directions, derivation and reduction. That is, there are many possible ways of applying (or reversely applying, respectively) rules in each step of a derivation (reduction). In UPGs, on the other hand, a derivation process is nondeterministic as usual, but a reduction is (in some sense) deterministic. This 'backward deterministic' property leads to the result that the class of UPGs and its three subclasses form a 'deterministic Chomsky hierarchy', i.e., they are exactly characterized by deterministic Turing machines, deterministic linear-bounded automata, deterministic pushdown automata, and deterministic finite automata, respectively (see Morita et al., 1997).

In this paper, we introduce a unification grammar (UG) version of a UPG, called a uniquely parsable unification grammar (UPUG). Here, we regard a UG as a grammar in which a string of terms is rewritten by a rewriting rule, and the derivation (and reduction) steps are carried out based on unification of terms as in Prolog. (Note that, in the field of linguistic theory, the notion of unification or a unification grammar has a slightly different meaning from the above; see, e.g., Sells, 1985.) Since each term may have several arguments, we can keep subsidiary information in them. For example, various linguistic features associated with a syntactic category of a natural language can be handled by using these arguments. A definite clause grammar (DCG) introduced by Pereira and Warren (1980), which can be easily implemented in Prolog, is a kind of UG in this sense. A UG defined here may be regarded as a generalized variant of a DCG. In order to use a UPG in a practical situation, it is necessary to extend it into the framework of a UG.

Grammars 3: 63–81, 2000. © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

In addition to such properties, a UPUG inherits unique parsability from a UPG, and thus parsing can be performed deterministically in exactly the same number of steps as the derivation process. We give a simple and efficient leftmost parsing algorithm for UPUGs. This algorithm can be implemented in Prolog in a very simple manner, and it will be useful for natural language analysis and other applications.

2. A Unification Grammar

2.1. DEFINITIONS

In this paper, we define a unification grammar (UG) as a kind of generative grammar in which a string of terms is rewritten by a set of rules, and the derivation (and reduction) steps are accompanied by unification of terms.

DEFINITION 2.1. There are two kinds of symbols used in a unification grammar: function symbols and variables. The number of arguments associated with a function symbol is called its arity, and a function symbol \(f\) of arity \(n\) is denoted by \(f^{(n)}\). A constant is a special function symbol of arity 0. Let \(F\) and \(V\) be sets of function symbols and variables, respectively. The set of all constants in \(F\) is denoted by \(\mathit{Const}(F)\). A term over \(F\) and \(V\) is defined recursively as follows:
(1) Every variable is a term.
(2) If \(f^{(n)}\) is a function symbol of arity \(n\) and \(t_1, \ldots, t_n\) are terms, then \(f^{(n)}(t_1, \ldots, t_n)\) is a term. (Note that \(f^{(0)}()\) may be written simply as \(f^{(0)}\).)
The set of all terms over \(F\) and \(V\) is denoted by \(\mathit{Term}(F, V)\). A complex term is a term other than a single variable. Thus the set \(\mathit{CTerm}(F, V)\) of all complex terms is as follows:

\[ \mathit{CTerm}(F, V) = \mathit{Term}(F, V) - V. \]

A finite string of terms is represented by

\[ \alpha = t_1 t_2 \cdots t_k \]

where \(t_i \in \mathit{Term}(F, V)\) \((i = 1, \ldots, k)\).

The sets of variables contained in a term \(t \in \mathit{Term}(F, V)\), a string of terms \(\gamma \in (\mathit{Term}(F, V))^*\), and a pair of strings of terms \((\alpha, \beta) \in ((\mathit{Term}(F, V))^*)^2\) are denoted by \(\mathit{Var}(t)\), \(\mathit{Var}(\gamma)\), and \(\mathit{Var}((\alpha, \beta))\), respectively.
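As an informal illustration that is not part of the original text, the notions of Definition 2.1 map almost directly onto ordinary Prolog terms. The sketch below assumes a standard Prolog system such as SWI-Prolog and uses only the built-ins `nonvar/1`, `functor/3`, and `term_variables/2`; the predicate names `complex_term/1`, `symbol_arity/2`, and `string_vars/2` are hypothetical helpers introduced here, and a string of terms is assumed to be represented as a Prolog list.

```prolog
% Terms of Definition 2.1 written as ordinary Prolog terms (a sketch).

% complex_term(+T): T is a term other than a single variable.
complex_term(T) :- nonvar(T).

% symbol_arity(+T, -Name/N): outermost function symbol of T and its arity;
% a constant such as a or 1 has arity 0.
symbol_arity(T, Name/N) :- nonvar(T), functor(T, Name, N).

% string_vars(+Terms, -Vars): Var(gamma) for a string of terms given as a list.
string_vars(Terms, Vars) :- term_variables(Terms, Vars).
```

For instance, the query `?- symbol_arity(b(f(X)), S).` answers `S = b/1`, and `?- string_vars([a(X), b(f(Y)), c(X)], Vs).` answers `Vs = [X, Y]`.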


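DEFINITION 2.2. A substitution is a mapping \(\sigma : V \rightarrow \mathit{Term}(F, V)\). For convenience, it can be expressed by the following set, and we often use this notation hereafter:

\[ \{ t/X \mid \sigma(X) = t \ \text{and}\ t \neq X \}. \]

Let \(\mathit{Supp}(\sigma)\) be as follows: \(\mathit{Supp}(\sigma) = \{ X \mid \sigma(X) \neq X \}\).

An application of a substitution \(\sigma\) to a term \(t\) is defined as follows, where the result of the application is denoted by \(t\sigma\):
(1) If \(t = X \in V\), then \(t\sigma = \sigma(X)\).
(2) If \(t = f^{(n)}(s_1, \ldots, s_n) \in \mathit{CTerm}(F, V)\), then \(t\sigma = f^{(n)}(s_1\sigma, \ldots, s_n\sigma)\).
Similarly, an application of a substitution \(\sigma\) to a string of terms \(t_1 \cdots t_m\) is defined as follows:

\[ (t_1 \cdots t_m)\sigma = t_1\sigma \cdots t_m\sigma. \]

Let \(\sigma, \tau\) be substitutions. The composition of \(\sigma\) and \(\tau\) is a mapping \(\sigma \circ \tau : V \rightarrow \mathit{Term}(F, V)\) defined as follows:

\[ \forall X \in V : (\sigma \circ \tau)(X) = \tau(\sigma(X)), \]

that is, \(\sigma\) is applied first and \(\tau\) is then applied to the resulting term. If two substitutions \(\sigma, \tau\) satisfy \(\mathit{Supp}(\sigma) \cap \mathit{Supp}(\tau) = \emptyset\), we can define \(\sigma \cup \tau\) as follows:

\[ \sigma \cup \tau = \{ t/X \mid (\sigma(X) = t \ \text{or}\ \tau(X) = t) \ \text{and}\ t \neq X \}. \]

A renaming of variables is a substitution \(\sigma : V \rightarrow V\) which is one-to-one.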
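Prolog normally keeps substitutions implicit in its variable bindings, so the following sketch makes them explicit purely for illustration. It assumes an encoding that is not in the paper: an object-level variable is written `var(Name)`, every term is ground at the Prolog level, and a substitution is a list of `Name-Term` pairs. The predicates `apply_subst/3` and `compose/3` are hypothetical names; SWI-Prolog with `library(lists)` and `library(apply)` is assumed.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% Assumed encoding: object variables are var(Name); a substitution is a
% list of Name-Term pairs, e.g. [y1-f(var(y2))] for {f(Y2)/Y1}.

% apply_subst(+Term, +Subst, -Result): Result is Term with Subst applied once.
apply_subst(var(X), S, R) :- !,
    (   memberchk(X-T, S) -> R = T ; R = var(X) ).
apply_subst(T, S, R) :-
    T =.. [F|Args],
    maplist(apply_in(S), Args, RArgs),
    R =.. [F|RArgs].

apply_in(S, T, R) :- apply_subst(T, S, R).

% compose(+Sigma, +Tau, -SigmaTau): Sigma is applied first, then Tau.
compose(Sigma, Tau, SigmaTau) :-
    findall(X-R,
            ( member(X-T, Sigma), apply_subst(T, Tau, R), R \== var(X) ),
            FromSigma),
    findall(X-T,
            ( member(X-T, Tau), \+ memberchk(X-_, Sigma) ),
            FromTau),
    append(FromSigma, FromTau, SigmaTau).
```

With this encoding, `?- compose([y1-f(var(y2))], [y2-1], S).` yields `S = [y1-f(1), y2-1]`, and applying `S` to `b(var(y1))` gives `b(f(1))`, the same result as applying the two substitutions one after the other.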

DEFINITION 2.3. Let \(s, t\) be two terms. We say \(s\) and \(t\) are unifiable (denoted by \(s \sim t\)) iff there exists a substitution \(\sigma\) such that \(s\sigma = t\sigma\). Such a \(\sigma\) is called a unifier of \(s\) and \(t\). Similarly, two strings \(\xi, \eta \in (\mathit{Term}(F, V))^*\) of terms are said to be unifiable (denoted by \(\xi \sim \eta\)) iff there exists a substitution \(\sigma\) such that \(\xi\sigma = \eta\sigma\).
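Unifiability in the sense of Definition 2.3 can be tested in Prolog with the ISO built-in `unify_with_occurs_check/2`. The sketch below is only an illustration under the assumption that strings of terms are Prolog lists; `unifiable_terms/2` and `unifiable_strings/2` are hypothetical names. The double negation checks for the existence of a unifier without leaving any bindings behind.

```prolog
% unifiable_terms(+S, +T): S ~ T, i.e. some substitution makes S and T equal.
unifiable_terms(S, T) :- \+ \+ unify_with_occurs_check(S, T).

% unifiable_strings(+Xi, +Eta): two strings of terms (lists) are unifiable
% iff they have the same length and a single substitution unifies them
% position by position; list unification checks both at once.
unifiable_strings(Xi, Eta) :- \+ \+ unify_with_occurs_check(Xi, Eta).
```

For example, `unifiable_terms(a(f(X)), a(Y))` succeeds, whereas `unifiable_strings([a(X), b(X)], [a(1), b(f(1))])` fails because `X` would have to be both `1` and `f(1)`.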

DEFINITION 2.4. A unification grammar (UG) is a system defined by

\[ G = (F, T, V, P, s^{(n_s)}). \]

The items \(F, T, V, P\), and \(s^{(n_s)}\) are as follows:
(1) \(F\) is a non-empty set of function symbols such that \(\mathit{Const}(F) \neq \emptyset\).
(2) \(T \subseteq \mathit{Const}(F)\) is a non-empty set of terminals.
(3) \(V\) is a set of variables.
(4) \(P\) is a finite set of rewriting rules, each of which is a pair \((\alpha, \beta)\) of finite strings of complex terms such that \(\alpha \in ((\mathit{CTerm}(F, V))^+ - T^+)\), \(\beta \in (\mathit{CTerm}(F, V))^*\), and \(\mathit{Var}(\alpha) \subseteq \mathit{Var}(\beta)\). Hereafter, a rewriting rule is denoted by \(\alpha \rightarrow \beta\) instead of \((\alpha, \beta)\).
(5) \(s^{(n_s)} \in F - T\) is a start symbol.


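DEFINITION 2.5. Let \(G = (F, T, V, P, s^{(n_s)})\) be a UG, \(\xi = t_1 t_2 \cdots t_k \in (\mathit{CTerm}(F, V))^*\) be a string of complex terms, and

\[ R = a_1 a_2 \cdots a_m \rightarrow b_1 b_2 \cdots b_n \]

be a rewriting rule in \(P\). We assume \(\mathit{Var}(\xi) \cap \mathit{Var}(R) = \emptyset\) (otherwise, rename the variables in \(R\); this is possible because each variable in \(\mathit{Var}(R)\) is assumed to be universally quantified). The rule \(R\) is said to be applicable to \(\xi\) at the position \(j\) \((1 \leq j \leq k)\) iff there exists a substitution \(\sigma\) that satisfies

\[ (t_j t_{j+1} \cdots t_{j+m-1})\sigma = (a_1 a_2 \cdots a_m)\sigma. \]

If \(\eta\) is the following string of complex terms:

\[ \eta = (t_1 \cdots t_{j-1}\, b_1 \cdots b_n\, t_{j+m} \cdots t_k)\sigma, \]

then we say that \(\eta\) is directly derived from \(\xi\) in \(G\). It is written as \(\xi \underset{G}{\Longrightarrow} \eta\) (or \(\xi \Longrightarrow \eta\) if \(G\) is understood). When indicating the applied rule, position, and substitution, we write it as

\[ \xi \overset{[R,j,\sigma]}{\underset{G}{\Longrightarrow}} \eta. \]

We call \([R, j, \sigma]\) an item of rewriting. The reflexive and transitive closure of \(\underset{G}{\Longrightarrow}\) gives the notion of a derivation, and is denoted by \(\overset{*}{\underset{G}{\Longrightarrow}}\). We denote an \(n\)-step derivation by \(\overset{n}{\underset{G}{\Longrightarrow}}\).

Let \(X_1, \ldots, X_{n_s}\) be distinct variables. A string of complex terms \(\xi \in (\mathit{CTerm}(F, V))^*\) is said to be a sentential form in \(G\) iff

\[ s^{(n_s)}(X_1, \ldots, X_{n_s}) \overset{*}{\underset{G}{\Longrightarrow}} \xi. \]

A sentential form \(\xi\) is called a sentence in \(G\) iff \(\xi \in T^*\). The language \(L(G)\) generated by \(G\) is the set of sentences in \(G\), i.e.,

\[ L(G) = \{ w \mid s^{(n_s)}(X_1, \ldots, X_{n_s}) \overset{*}{\underset{G}{\Longrightarrow}} w \ \wedge\ w \in T^* \}. \]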
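A single derivation step of Definition 2.5 can be sketched in Prolog as follows. This is an illustration under assumptions that are not in the paper: a rule is written `rule(Lhs, Rhs)` with both sides as lists, a string of terms is a list, and the hypothetical predicate `derive_step/4` enumerates positions from left to right. Because Prolog bindings are shared, unifying the left-hand side with a segment of the string automatically applies the unifier to the whole result, as the definition of the derived string requires.

```prolog
:- use_module(library(lists)).

% derive_step(+Rule, +Xi, -J, -Eta): Rule is applicable to Xi at position J
% and Eta is the string directly derived from Xi.
derive_step(Rule, Xi, J, Eta) :-
    copy_term(Rule, rule(Lhs, Rhs)),   % rename the rule's variables apart
    append(Prefix, Rest, Xi),          % choose a position J = |Prefix| + 1
    append(Lhs, Suffix, Rest),         % unify Lhs with the terms found there
    length(Prefix, J0),
    J is J0 + 1,
    append(Rhs, Suffix, Tail),
    append(Prefix, Tail, Eta).         % the bindings act as sigma on all of Eta
```

Writing `at`, `bt`, `ct` for the arity-1 symbols a, b, c, the query `?- derive_step(rule([at(f(X1))], [a, at(X1)]), ['$', at(Y), bt(Y), ct(Y), '$'], J, Eta).` binds `J = 2` and `Eta` to `['$', a, at(Z), bt(f(Z)), ct(f(Z)), '$']` for a fresh variable `Z`. Note that plain Prolog unification omits the occurs check, which is adequate for this sketch but is a simplification.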

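DEFINITION 2.6. Let \(G = (F, T, V, P, s^{(n_s)})\) be a UG, \(\xi = t_1 t_2 \cdots t_k \in (\mathit{CTerm}(F, V))^*\) be a string of complex terms, and

\[ R = a_1 a_2 \cdots a_m \rightarrow b_1 b_2 \cdots b_n \]

be a rule in \(P\). We assume \(\mathit{Var}(\xi) \cap \mathit{Var}(R) = \emptyset\) (otherwise, rename the variables in \(R\)). The rule \(R\) is said to be reversely applicable to \(\xi\) at the position \(j\) \((1 \leq j \leq k)\) iff there exists a substitution \(\sigma\) that satisfies

\[ t_j t_{j+1} \cdots t_{j+n-1} = (b_1 b_2 \cdots b_n)\sigma. \]

Such a \(\xi\) is called reducible in \(G\). If \(\eta\) is the following string of complex terms:

\[ \eta = t_1 \cdots t_{j-1}\, (a_1 \cdots a_m)\sigma\, t_{j+n} \cdots t_k, \]

then we say that \(\eta\) is directly reduced from \(\xi\) in \(G\). It is written as

\[ \xi \overset{[R,j,\sigma]}{\underset{G}{\Longleftarrow}} \eta \quad (\text{or}\ \xi \underset{G}{\Longleftarrow} \eta). \]

We call \([R, j, \sigma]\) an item of reverse rewriting. The reflexive and transitive closure of \(\underset{G}{\Longleftarrow}\) gives the notion of reduction \(\overset{*}{\underset{G}{\Longleftarrow}}\). The notion of \(n\)-step reduction \(\overset{n}{\underset{G}{\Longleftarrow}}\) is defined similarly.

REMARK. In the above definition of a reduction \(\xi \overset{[R,j,\sigma]}{\underset{G}{\Longleftarrow}} \eta\), a substitution \(\sigma\) is applied only to a rewriting rule \(R\), but not to a string \(\xi\) to be reduced. We employ this definition because it is sufficient when reducing (i.e., parsing) terminal words. Consider a reducing process of \(w \in T^*\) as below:

\[ w = \xi_0 \underset{G}{\Longleftarrow} \xi_1 \underset{G}{\Longleftarrow} \xi_2 \underset{G}{\Longleftarrow} \cdots \]

Then, \(\mathit{Var}(\xi_i) = \emptyset\) for each \(i\) \((= 0, 1, \ldots)\), since \(w\) contains no variable and \(\mathit{Var}(\alpha) \subseteq \mathit{Var}(\beta)\) for each \(R = \alpha \rightarrow \beta\) (by the definition of a UG). Hence, in such a case, there is no need to apply a substitution to a string to be reduced.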
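The one-sidedness noted in the Remark (the substitution instantiates the rule, never the string) corresponds to matching rather than full unification. A minimal Prolog sketch, assuming the ISO built-in `subsumes_term/2` and a hypothetical helper name `instance_of/2`, with strings of terms again represented as lists:

```prolog
% instance_of(+Segment, +RuleRhs): succeeds iff Segment = RuleRhs*sigma for
% some substitution sigma that instantiates variables of the rule only.
instance_of(Segment, RuleRhs) :-
    copy_term(RuleRhs, Fresh),        % rename the rule's variables apart
    subsumes_term(Fresh, Segment).    % one-way matching; leaves no bindings
```

Here `instance_of([c, ct(1)], [c, ct(X5)])` succeeds, while `instance_of([c, ct(Y)], [c, ct(1)])` fails even though the two lists are unifiable, because the segment itself would have to be instantiated. For variable-free strings, as in the Remark, matching and unification coincide.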

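DEFINITION 2.7. Let \(G = (F, T, V, P, s^{(n_s)})\) be a UG. A direct reduction \(\xi \overset{[R,j,\sigma]}{\underset{G}{\Longleftarrow}} \eta\) is called a direct leftmost reduction iff for any \([R', j', \sigma']\) and \(\eta'\) such that \(\xi \overset{[R',j',\sigma']}{\underset{G}{\Longleftarrow}} \eta'\), the inequality \(j \leq j'\) holds. We denote a direct leftmost reduction by

\[ \xi \overset{[R,j,\sigma]}{\underset{\mathrm{lmr}}{\Longleftarrow}} \eta \quad (\text{or}\ \xi \underset{\mathrm{lmr}}{\Longleftarrow} \eta). \]

A reduction \(\xi_0 \Longleftarrow \xi_1 \Longleftarrow \cdots \Longleftarrow \xi_n\) \((n = 0, 1, \ldots)\) is called a leftmost reduction iff \(\xi_0 \underset{\mathrm{lmr}}{\Longleftarrow} \xi_1 \underset{\mathrm{lmr}}{\Longleftarrow} \cdots \underset{\mathrm{lmr}}{\Longleftarrow} \xi_n\), and is denoted by \(\xi_0 \overset{*}{\underset{\mathrm{lmr}}{\Longleftarrow}} \xi_n\) (or \(\xi_0 \overset{n}{\underset{\mathrm{lmr}}{\Longleftarrow}} \xi_n\)).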
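To make Definitions 2.6 and 2.7 concrete, the following sketch performs leftmost reductions by brute-force search over all rules and positions. It is a naive illustration, not the paper's parsing algorithm: rules are assumed to be encoded as `rule(Lhs, Rhs)` with list sides, the input string is assumed to be variable-free, and the predicate names are hypothetical. The rule set is that of the grammar for a^n b^n c^n given later in Example 3.1 (including its end-marker `'$'`), with `at`, `bt`, `ct` standing for the arity-1 symbols a, b, c. The loop need not terminate for arbitrary grammars.

```prolog
:- use_module(library(lists)).

% Rules (1)-(8) of the a^n b^n c^n grammar as rule(Lhs, Rhs).
anbncn([ rule(['$', s(0), '$'],  ['$', '$']),
         rule(['$', s(X)],       ['$', at(X), bt(X), ct(X)]),
         rule([at(f(X1))],       [a, at(X1)]),
         rule([at(1), bt(X2)],   [a, bt(X2)]),
         rule([bt(f(X3))],       [b, bt(X3)]),
         rule([bt(1), ct(X4)],   [b, ct(X4)]),
         rule([ct(f(X5))],       [c, ct(X5)]),
         rule([ct(1), '$'],      [c, '$']) ]).

% reduce_step(+Rule, +Xi, -J, -Eta): reverse application at position J.
reduce_step(Rule, Xi, J, Eta) :-
    copy_term(Rule, rule(Lhs, Rhs)),
    length(Rhs, N), length(Seg, N),
    append(Prefix, Rest, Xi),
    append(Seg, Suffix, Rest),
    subsumes_term(Rhs, Seg),           % sigma may bind rule variables only
    Rhs = Seg,
    length(Prefix, J0), J is J0 + 1,
    append(Lhs, Suffix, Tail),
    append(Prefix, Tail, Eta).

% leftmost_step(+Rules, +Xi, -Eta): a direct reduction with the smallest J.
leftmost_step(Rules, Xi, Eta) :-
    findall(J-E, (member(R, Rules), reduce_step(R, Xi, J, E)), Cands),
    keysort(Cands, [_-Eta|_]).

% reduce_all(+Rules, +Xi, -Final, -Steps): reduce until irreducible.
reduce_all(Rules, Xi, Final, Steps) :-
    (   leftmost_step(Rules, Xi, Eta)
    ->  reduce_all(Rules, Eta, Final, Steps0),
        Steps is Steps0 + 1
    ;   Final = Xi, Steps = 0
    ).
```

For example, `?- anbncn(Rs), reduce_all(Rs, ['$',a,a,b,b,c,c,'$'], F, N).` gives `F = ['$', s(f(1)), '$']` and `N = 7`.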

2.2. BASIC PROPERTIES OF A UG

As defined above, derivation and reduction in a UG are accompanied by unification processes. Hence, it is not obvious whether there is a reduction of \(\xi\) for a given derivation of \(\xi\) (and vice versa). Here, we show the relation between derivation and reduction in a UG.


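LEMMA 2.1. Let \(G = (F, T, V, P, s^{(n_s)})\) be a UG, \(\xi \in \mathit{CTerm}(F, V)^+\) be a string of complex terms, and \(t_1, \ldots, t_{n_s} \in \mathit{Term}(F, V)\) be terms. If

\[ \xi \overset{n}{\underset{G}{\Longleftarrow}} s(t_1, \ldots, t_{n_s}), \]

then the following relation holds for any substitution \(\sigma\) \((n = 0, 1, \ldots)\):

\[ \xi\sigma \overset{n}{\underset{G}{\Longleftarrow}} s(t_1, \ldots, t_{n_s})\sigma. \]

Proof. It is shown by induction on the number \(n\) of reduction steps. When \(n = 0\), it is clear that the lemma holds. We suppose the lemma holds for \(n \leq k\), and consider an arbitrary reduction of \(k + 1\) steps as below:

\[ \xi \overset{[R,j,\tau]}{\underset{G}{\Longleftarrow}} \eta \overset{k}{\underset{G}{\Longleftarrow}} s(t_1, \ldots, t_{n_s}). \]

Assume \(R = \alpha \rightarrow \beta\). Then there exist some \(\gamma, \delta \in \mathit{CTerm}(F, V)^*\) such that \(\xi = \gamma\, \beta\tau\, \delta\). Therefore:

\[ \xi = \gamma\, \beta\tau\, \delta \overset{[R,j,\tau]}{\underset{G}{\Longleftarrow}} \gamma\, \alpha\tau\, \delta = \eta. \]

It is clear that

\[ \xi\sigma = \gamma\sigma\, \beta(\tau \circ \sigma)\, \delta\sigma \overset{[R,j,\tau\circ\sigma]}{\underset{G}{\Longleftarrow}} \gamma\sigma\, \alpha(\tau \circ \sigma)\, \delta\sigma = \eta\sigma \]

holds for any substitution \(\sigma\). By the inductive hypothesis:

\[ \eta\sigma \overset{k}{\underset{G}{\Longleftarrow}} s(t_1, \ldots, t_{n_s})\sigma. \]

Hence,

\[ \xi\sigma \underset{G}{\Longleftarrow} \eta\sigma \overset{k}{\underset{G}{\Longleftarrow}} s(t_1, \ldots, t_{n_s})\sigma, \]

and the lemma is proved. □

LEMMA 2.2. Let \(G = (F, T, V, P, s^{(n_s)})\) be a UG, \(\xi_i \in \mathit{CTerm}(F, V)^*\) \((i = 1, \ldots, n)\) be strings of complex terms, and \(X_1, \ldots, X_{n_s} \in V\) be distinct variables. If

\[ s(X_1, \ldots, X_{n_s}) \overset{[R_1,j_1,\sigma_1]}{\underset{G}{\Longrightarrow}} \xi_1 \cdots \overset{[R_n,j_n,\sigma_n]}{\underset{G}{\Longrightarrow}} \xi_n, \]

then the following relation holds \((n = 0, 1, \ldots)\):

\[ \xi_n \overset{n}{\underset{G}{\Longleftarrow}} s(X_1, \ldots, X_{n_s})(\sigma_1 \circ \cdots \circ \sigma_n). \]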


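Proof. It is shown by induction on \(n\). The lemma is obvious for \(n = 0\). Suppose the lemma holds for \(n \leq k\), and consider a derivation of \(k + 1\) steps as follows:

\[ s(X_1, \ldots, X_{n_s}) \overset{[R_1,j_1,\sigma_1]}{\underset{G}{\Longrightarrow}} \xi_1 \cdots \overset{[R_k,j_k,\sigma_k]}{\underset{G}{\Longrightarrow}} \xi_k \overset{[R_{k+1},j_{k+1},\sigma_{k+1}]}{\underset{G}{\Longrightarrow}} \xi_{k+1}. \]

We assume \(R_{k+1} = \alpha \rightarrow \beta\). Also assume \(\mathit{Var}(R_{k+1}) \cap \mathit{Var}(\xi_{k+1}) = \emptyset\) (otherwise, rename the variables in \(R_{k+1}\)) as well as \(\mathit{Var}(R_{k+1}) \cap \mathit{Var}(\xi_k) = \emptyset\). Since \(R_{k+1}\) is applicable to \(\xi_k\), we can write \(\xi_k = \gamma\alpha'\delta\) for some \(\gamma, \alpha', \delta \in \mathit{CTerm}(F, V)^*\) such that \(\alpha\sigma_{k+1} = \alpha'\sigma_{k+1}\). From the definition of a derivation we can see \(\xi_{k+1} = \gamma\sigma_{k+1}\, \beta\sigma_{k+1}\, \delta\sigma_{k+1}\). So \(R_{k+1}\) is reversely applicable to \(\xi_{k+1}\) at the position \(j_{k+1}\), and thus the following relation holds:

\[ \xi_{k+1} = \gamma\sigma_{k+1}\, \beta\sigma_{k+1}\, \delta\sigma_{k+1} \overset{[R_{k+1},j_{k+1},\sigma_{k+1}]}{\underset{G}{\Longleftarrow}} \gamma\sigma_{k+1}\, \alpha\sigma_{k+1}\, \delta\sigma_{k+1} = \xi_k\sigma_{k+1}. \]

By the inductive hypothesis,

\[ \xi_k \overset{k}{\underset{G}{\Longleftarrow}} s(X_1, \ldots, X_{n_s})(\sigma_1 \circ \cdots \circ \sigma_k) \]

holds. Hence, from Lemma 2.1, we obtain the following relation:

\[ \xi_{k+1} \overset{[R_{k+1},j_{k+1},\sigma_{k+1}]}{\underset{G}{\Longleftarrow}} \xi_k\sigma_{k+1} \overset{k}{\underset{G}{\Longleftarrow}} s(X_1, \ldots, X_{n_s})(\sigma_1 \circ \cdots \circ \sigma_k \circ \sigma_{k+1}). \quad \Box \]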

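LEMMA 2.3. Let \(G = (F, T, V, P, s^{(n_s)})\) be a UG, \(\xi \in \mathit{CTerm}(F, V)^*\) be a string of complex terms, and \(t_1, \ldots, t_{n_s} \in \mathit{Term}(F, V)\) be terms. If

\[ \xi \overset{n}{\underset{G}{\Longleftarrow}} s(t_1, \ldots, t_{n_s}), \]

then for any distinct variables \(X_1, \ldots, X_{n_s} \in V - (\mathit{Var}(\xi) \cup \mathit{Var}(s(t_1, \ldots, t_{n_s})))\), the following relation holds \((n = 1, 2, \ldots)\):

\[ s(X_1, \ldots, X_{n_s}) \overset{n}{\underset{G}{\Longrightarrow}} \xi. \]

Proof. It is shown by induction on \(n\).
(1) The case \(n = 1\): Suppose

\[ \xi \overset{[R,j,\sigma]}{\underset{G}{\Longleftarrow}} s(t_1, \ldots, t_{n_s}) \]

for some \(R = s(u_1, \ldots, u_{n_s}) \rightarrow \xi'\). Then, \(\xi'\sigma = \xi\) and \(s(u_1, \ldots, u_{n_s})\sigma = s(t_1, \ldots, t_{n_s})\) hold. Let \(X_1, \ldots, X_{n_s} \in V - (\mathit{Var}(\xi) \cup \mathit{Var}(s(t_1, \ldots, t_{n_s})))\) be any distinct variables. We assume \(\{X_1, \ldots, X_{n_s}\} \cap \mathit{Var}(R) = \emptyset\) (otherwise, rename the variables of \(R\)). Let \(\tau = \{u_1/X_1, \ldots, u_{n_s}/X_{n_s}\}\). Then, \(s(X_1, \ldots, X_{n_s})(\tau \circ \sigma) = s(t_1, \ldots, t_{n_s})\) and \(s(u_1, \ldots, u_{n_s})(\tau \circ \sigma) = s(t_1, \ldots, t_{n_s})\) hold. Therefore, \(R\) is applicable to \(s(X_1, \ldots, X_{n_s})\), i.e.,

\[ s(X_1, \ldots, X_{n_s}) \overset{[R,j,\tau\circ\sigma]}{\underset{G}{\Longrightarrow}} \xi'(\tau \circ \sigma). \]

Since \(\mathit{Supp}(\tau) \cap \mathit{Var}(\xi') = \emptyset\), \(\xi'(\tau \circ \sigma) = \xi'\sigma = \xi\). Therefore,

\[ s(X_1, \ldots, X_{n_s}) \overset{[R,j,\tau\circ\sigma]}{\underset{G}{\Longrightarrow}} \xi. \]

(2) The case \(n > 1\): Suppose that the lemma holds for \(n \leq k\), and consider a reduction of \(k + 1\) steps as below:

\[ \xi \overset{[R,j,\sigma]}{\underset{G}{\Longleftarrow}} \eta \overset{k}{\underset{G}{\Longleftarrow}} s(t_1, \ldots, t_{n_s}). \]

We assume \(R = \alpha \rightarrow \beta\). Also assume \(\mathit{Var}(R) \cap \mathit{Var}(\eta) = \emptyset\) (otherwise, rename the variables in \(R\)) as well as \(\mathit{Var}(R) \cap \mathit{Var}(\xi) = \emptyset\). From the definition of reduction, there exist some \(\gamma, \delta \in \mathit{CTerm}(F, V)^*\) that satisfy

\[ \xi = \gamma\, \beta\sigma\, \delta \overset{[R,j,\sigma]}{\underset{G}{\Longleftarrow}} \gamma\, \alpha\sigma\, \delta = \eta. \]

Since \(\mathit{Var}(R) \cap (\mathit{Var}(\eta) \cup \mathit{Var}(\xi)) = \emptyset\), we can see \(\mathit{Supp}(\sigma) \cap (\mathit{Var}(\eta) \cup \mathit{Var}(\xi)) = \emptyset\). Hence, \(\gamma\sigma = \gamma\) and \(\delta\sigma = \delta\) hold. Therefore, \(R\) is applicable to \(\eta\) and the following relation holds:

\[ \eta = \gamma\, \alpha\sigma\, \delta \overset{[R,j,\sigma]}{\underset{G}{\Longrightarrow}} (\gamma\, \beta\, \delta)\sigma = \gamma\, \beta\sigma\, \delta = \xi. \]

By the inductive hypothesis, \(s(X_1, \ldots, X_{n_s}) \overset{k}{\underset{G}{\Longrightarrow}} \eta\). Therefore,

\[ s(X_1, \ldots, X_{n_s}) \overset{k+1}{\underset{G}{\Longrightarrow}} \xi. \quad \Box \]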

The next theorem follows from Lemmas 2.2 and 2.3.

THEOREM 2.1. Let \(G = (F, T, V, P, s^{(n_s)})\) be a UG, and \(\xi\) be a string of complex terms. Then \(\xi\) is a sentential form iff

\[ \xi \overset{*}{\underset{G}{\Longleftarrow}} s(t_1, \ldots, t_{n_s}) \]

for some terms \(t_1, \ldots, t_{n_s} \in \mathit{Term}(F, V)\).

3. A Uniquely Parsable Unification Grammar

3.1. DEFINITION OF A UNIQUELY PARSABLE UNIFICATION GRAMMAR

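DEFINITION 3.1. A uniquely parsable unification grammar (UPUG) is a system defined by

\[ G = (F, T, V, P, s^{(n_s)}, \$). \]

The items \(F, T, V\), and \(s^{(n_s)}\) are the same as in a UG. \(\$\) \((\notin F)\) is a special symbol called an end-marker. \(P\) is a finite set of rewriting rules of the following form:

\[ \alpha \rightarrow \beta, \quad \$\alpha \rightarrow \$\beta, \quad \alpha\$ \rightarrow \beta\$, \quad \$\alpha\$ \rightarrow \$\beta\$, \quad \text{or} \quad \$t\$ \rightarrow \$\$ \]

where \(\alpha \in ((\mathit{CTerm}(F, V))^+ - T^+)\), \(\beta \in (\mathit{CTerm}(F, V))^+\), \(\alpha \neq \beta\), \(\mathit{Var}(\alpha) \subseteq \mathit{Var}(\beta)\), and \(t \in ((\mathit{CTerm}(F, V)) - T)\) (a rule of the form \(\$t\$ \rightarrow \$\$\) is called an \(\varepsilon\)-rule). Furthermore, \(P\) satisfies the following condition.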

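The UPUG-Condition:
1. The right-hand side of each rule in \(P\) is unifiable with none of

\[ s^{(n_s)}(X_1, \ldots, X_{n_s}), \quad \$s^{(n_s)}(X_1, \ldots, X_{n_s}), \quad s^{(n_s)}(X_1, \ldots, X_{n_s})\$, \quad \text{and} \quad \$s^{(n_s)}(X_1, \ldots, X_{n_s})\$, \]

where \(X_1, \ldots, X_{n_s}\) are distinct variables which do not appear in the rules in \(P\).

2. For any two rules \(R_1 = \alpha_1 \rightarrow \beta_1\) and \(R_2 = \alpha_2 \rightarrow \beta_2\) in \(P\) (\(R_1\) and \(R_2\) may be the same) the following statements hold:
(a) If \(\beta_1 = \beta_1'\delta_1\), \(\beta_2 = \delta_2\beta_2'\), and \(\delta_1 \sim \delta_2\) for some \(\beta_1', \beta_2', \delta_1, \delta_2 \in (\mathit{CTerm}(F, V) \cup \{\$\})^+\), then there exist \(\alpha_1', \alpha_2' \in (\mathit{CTerm}(F, V) \cup \{\$\})^*\) that satisfy the following conditions:
    (i) \(\alpha_1 = \alpha_1'\delta_1\)
    (ii) \(\alpha_2 = \delta_2\alpha_2'\)
(b) If \(\beta_1 \sim \gamma\beta_2\gamma'\) for some \(\gamma, \gamma' \in (\mathit{CTerm}(F, V) \cup \{\$\})^*\), then \(R_1 = R_2\) (therefore \(\gamma = \gamma' = \varepsilon\), the empty string).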

The UPUG-condition 2(a) requires that if some suffix of the right-hand side of \(R_1\) is unifiable with some prefix of that of \(R_2\), then the left-hand sides of \(R_1\) and \(R_2\) also contain them as a suffix and a prefix, respectively. The condition 2(b) states that there is no pair of distinct rules \(R_1\) and \(R_2\) such that the right-hand side of \(R_2\) is unifiable with a substring of that of \(R_1\). Note that there is at most one \(\varepsilon\)-rule because of 2(b).
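Condition 2(b) lends itself to a mechanical check. The sketch below is only an illustration and covers 2(b) alone, not condition 1 or 2(a); it assumes rules are encoded as `rule(Lhs, Rhs)` with list sides (the end-marker written `'$'`), and `violates_2b/3` is a hypothetical name. It searches for two distinct rules such that the right-hand side of the second unifies, with occurs check and after renaming apart, with a substring of the right-hand side of the first.

```prolog
:- use_module(library(lists)).

% violates_2b(+Rules, ?I1, ?I2): the rules numbered I1 and I2 (distinct)
% break condition 2(b) of the UPUG-condition.
violates_2b(Rules, I1, I2) :-
    nth1(I1, Rules, rule(_, Beta1)),
    nth1(I2, Rules, rule(_, Beta2)),
    I1 =\= I2,
    copy_term(Beta1, B1),              % rename both right-hand sides apart
    copy_term(Beta2, B2),
    length(B2, N), length(Seg, N),
    append(_, Rest, B1),               % pick a substring Seg of B1 ...
    append(Seg, _, Rest),
    unify_with_occurs_check(Seg, B2).  % ... that is unifiable with B2
```

For example, `?- violates_2b([rule([s], [a, b]), rule([t], [b])], I1, I2).` reports `I1 = 1, I2 = 2`, since the right-hand side `b` of the second rule occurs inside `a b`. A rule set satisfies 2(b) in this encoding when the query fails for all index pairs.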

The notions of derivation, reduction, and leftmost reduction for UPUGs are similar to those for UGs, except that each string of terms derived or reduced has \(\$\)'s at the left and right ends. Hence, the language \(L(G)\) generated by a UPUG \(G\) is as follows:

\[ L(G) = \{ w \mid \$s^{(n_s)}(Y_1, \ldots, Y_{n_s})\$ \overset{*}{\underset{G}{\Longrightarrow}} \$w\$ \ \wedge\ w \in T^* \}. \]

Note that it is easy to modify Lemmas 2.1–2.3 and Theorem 2.1 for UPUGs, which have end-markers.


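EXAMPLE 3.1. The grammar \(G_{a^n b^n c^n} = (F, T, V, P, s^{(1)}, \$)\) is a UPUG that generates the language \(\{a^n b^n c^n \mid n = 0, 1, \ldots\}\), where \(F, T, V, P\) are as follows:

\[
\begin{array}{l}
F = \{\, s^{(1)}, a^{(1)}, b^{(1)}, c^{(1)}, f^{(1)}, 0^{(0)}, 1^{(0)}, a^{(0)}, b^{(0)}, c^{(0)} \,\} \\
T = \{\, a^{(0)}, b^{(0)}, c^{(0)} \,\} \\
V = \{\, X, X_1, X_2, \ldots, Y_1, Y_2, \ldots \,\}
\end{array}
\]

\[
\begin{array}{llr}
P = \{ & \$\; s(0)\; \$ \rightarrow \$\; \$ & (1) \\
 & \$\; s(X) \rightarrow \$\; a(X)\; b(X)\; c(X) & (2) \\
 & a(f(X_1)) \rightarrow a\; a(X_1) & (3) \\
 & a(1)\; b(X_2) \rightarrow a\; b(X_2) & (4) \\
 & b(f(X_3)) \rightarrow b\; b(X_3) & (5) \\
 & b(1)\; c(X_4) \rightarrow b\; c(X_4) & (6) \\
 & c(f(X_5)) \rightarrow c\; c(X_5) & (7) \\
 & c(1)\; \$ \rightarrow c\; \$ \quad \} & (8)
\end{array}
\]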

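A word \(aabbcc\) is generated by the following derivation in \(G_{a^n b^n c^n}\):

\[
\begin{array}{lll}
\$\, s(Y_1)\, \$ & \overset{[(2),1,\sigma_1]}{\Longrightarrow} \$\; a(Y_1)\; b(Y_1)\; c(Y_1)\; \$ & (\sigma_1 = \{Y_1/X\}) \\
 & \overset{[(3),2,\sigma_2]}{\Longrightarrow} \$\; a\; a(Y_2)\; b(f(Y_2))\; c(f(Y_2))\; \$ & (\sigma_2 = \{f(Y_2)/Y_1,\ Y_2/X_1\}) \\
 & \overset{[(4),3,\sigma_3]}{\Longrightarrow} \$\; a\; a\; b(f(1))\; c(f(1))\; \$ & (\sigma_3 = \{1/Y_2,\ f(1)/X_2\}) \\
 & \overset{[(5),4,\sigma_4]}{\Longrightarrow} \$\; a\; a\; b\; b(1)\; c(f(1))\; \$ & (\sigma_4 = \{1/X_3\}) \\
 & \overset{[(6),5,\sigma_5]}{\Longrightarrow} \$\; a\; a\; b\; b\; c(f(1))\; \$ & (\sigma_5 = \{f(1)/X_4\}) \\
 & \overset{[(7),6,\sigma_6]}{\Longrightarrow} \$\; a\; a\; b\; b\; c\; c(1)\; \$ & (\sigma_6 = \{1/X_5\}) \\
 & \overset{[(8),7,\sigma_7]}{\Longrightarrow} \$\; a\; a\; b\; b\; c\; c\; \$ & (\sigma_7 = \{\ \})
\end{array}
\]

On the other hand, a reduction process of the word \(aabbcc\) is as follows:

\[
\begin{array}{lll}
\$\; a\; a\; b\; b\; c\; c\; \$ & \overset{[(8),7,\tau_1]}{\Longleftarrow} \$\; a\; a\; b\; b\; c\; c(1)\; \$ & (\tau_1 = \{\ \}) \\
 & \overset{[(7),6,\tau_2]}{\Longleftarrow} \$\; a\; a\; b\; b\; c(f(1))\; \$ & (\tau_2 = \{1/X_5\}) \\
 & \overset{[(6),5,\tau_3]}{\Longleftarrow} \$\; a\; a\; b\; b(1)\; c(f(1))\; \$ & (\tau_3 = \{f(1)/X_4\}) \\
 & \overset{[(5),4,\tau_4]}{\Longleftarrow} \$\; a\; a\; b(f(1))\; c(f(1))\; \$ & (\tau_4 = \{1/X_3\}) \\
 & \overset{[(4),3,\tau_5]}{\Longleftarrow} \$\; a\; a(1)\; b(f(1))\; c(f(1))\; \$ & (\tau_5 = \{f(1)/X_2\}) \\
 & \overset{[(3),2,\tau_6]}{\Longleftarrow} \$\; a(f(1))\; b(f(1))\; c(f(1))\; \$ & (\tau_6 = \{1/X_1\}) \\
 & \overset{[(2),1,\tau_7]}{\Longleftarrow} \$\; s(f(1))\; \$ & (\tau_7 = \{f(1)/X\})
\end{array}
\]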


REMARK. The class of UPGs is a special class of UPUGs such that all function symbols are of arity 0. Since the class of UPGs is universal in language generating ability (Morita et al., 1997), the class of UPUGs is also universal, i.e., equivalent to the class of type-0 grammars.

3.2. UNIQUE PARSABILITY OF A UPUG

We now show that a reduction (parsing) of a word of a language generated by a UPUG can be done without backtracking. A UPUG inherits this useful property from a UPG (Morita et al., 1997). Although the essential idea of the proof is the same as for a UPG, it is somewhat more involved because a reduction process is accompanied by unification of terms. The next lemma shows that a UPUG has a kind of confluence property in reduction.

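LEMMA 3.1. Let \(G = (F, T, V, P, s^{(n_s)}, \$)\) be a UPUG, and \(\eta\) be a string of complex terms and \(\$\) (i.e., \(\eta \in (\mathit{CTerm}(F, V) \cup \{\$\})^*\)). For any two rules \(R_1 = \alpha_1 \rightarrow \beta_1\), \(R_2 = \alpha_2 \rightarrow \beta_2\) in \(P\), and two non-negative integers \(j_1, j_2\) such that \(1 \leq j_1 \leq j_2 \leq |\eta|\), if \(\eta \overset{[R_1,j_1,\sigma_1]}{\underset{G}{\Longleftarrow}} \xi_1\) and \(\eta \overset{[R_2,j_2,\sigma_2]}{\underset{G}{\Longleftarrow}} \xi_2\) hold, then \(\xi_1 = \xi_2\) or there exists a unique \(\xi \in (\mathit{CTerm}(F, V) \cup \{\$\})^+\) that satisfies the following relations, where \(j_2' = j_2 + |\alpha_1| - |\beta_1|\):

\[ \eta \overset{[R_1,j_1,\sigma_1]}{\underset{G}{\Longleftarrow}} \xi_1 \overset{[R_2,j_2',\sigma_2]}{\underset{G}{\Longleftarrow}} \xi \]

\[ \eta \overset{[R_2,j_2,\sigma_2]}{\underset{G}{\Longleftarrow}} \xi_2 \overset{[R_1,j_1,\sigma_1]}{\underset{G}{\Longleftarrow}} \xi \]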

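Proof. First we show that if \(j_1 = j_2\), then \(R_1 = R_2\), and hence \(\xi_1 = \xi_2\). Suppose \(\eta \overset{[R_1,j,\sigma_1]}{\underset{G}{\Longleftarrow}} \xi_1\) and \(\eta \overset{[R_2,j,\sigma_2]}{\underset{G}{\Longleftarrow}} \xi_2\), where \(R_1 = \alpha_1 \rightarrow \beta_1\), \(R_2 = \alpha_2 \rightarrow \beta_2\). Then there exist \(\gamma, \delta_1, \delta_2 \in (\mathit{CTerm}(F, V))^*\) such that

\[ \eta = \gamma\, \beta_1\sigma_1\, \delta_1 = \gamma\, \beta_2\sigma_2\, \delta_2, \]

where \(|\gamma| = j - 1\). We can also assume \(\mathit{Supp}(\sigma_1) \cap \mathit{Supp}(\sigma_2) = \emptyset\) (otherwise, rename the variables in \(R_1\)). If \(|\beta_2| \leq |\beta_1|\) (the case \(|\beta_1| \leq |\beta_2|\) is similar), then we can see that there exist some \(\beta_1', \beta_1'' \in (\mathit{CTerm}(F, V))^*\) that satisfy \(\beta_1 = \beta_1'\beta_1''\) and \(\beta_1'(\sigma_1 \cup \sigma_2) = \beta_2(\sigma_1 \cup \sigma_2)\). Hence, from 2(b) of the UPUG-condition, \(R_1 = R_2\) holds.

Next, we consider the case \(j_1 < j_2\), which is divided into two subcases.
(1) The case \(j_2 - j_1 > |\beta_1|\): By the definition of reduction in a UG, there exist some \(\gamma, \theta, \zeta \in (\mathit{CTerm}(F, V) \cup \{\$\})^*\) such that

\[ \eta = \gamma\, \beta_1\sigma_1\, \theta\, \beta_2\sigma_2\, \zeta, \]

where \(|\gamma| = j_1 - 1\), and \(|\gamma\, \beta_1\sigma_1\, \theta| = j_2 - 1\). Therefore,

\[ \xi_1 = \gamma\, \alpha_1\sigma_1\, \theta\, \beta_2\sigma_2\, \zeta, \]

\[ \xi_2 = \gamma\, \beta_1\sigma_1\, \theta\, \alpha_2\sigma_2\, \zeta. \]

Hence \(R_2\) and \(R_1\) are reversely applicable to \(\xi_1\) and \(\xi_2\) at the positions \(j_2'\) and \(j_1\) respectively, i.e.,

\[ \xi_1 \overset{[R_2,j_2',\sigma_2]}{\underset{G}{\Longleftarrow}} \gamma\, \alpha_1\sigma_1\, \theta\, \alpha_2\sigma_2\, \zeta, \]

\[ \xi_2 \overset{[R_1,j_1,\sigma_1]}{\underset{G}{\Longleftarrow}} \gamma\, \alpha_1\sigma_1\, \theta\, \alpha_2\sigma_2\, \zeta. \]

Thus, the lemma holds for \(\xi = \gamma\, \alpha_1\sigma_1\, \theta\, \alpha_2\sigma_2\, \zeta\).
(2) The case \(j_2 - j_1 \leq |\beta_1|\): By the definition of reduction and the UPUG-condition, there are \(\gamma, \zeta, \alpha_1', \beta_1', \alpha_2', \beta_2', \delta, \delta_1, \delta_2 \in (\mathit{CTerm}(F, V) \cup \{\$\})^*\) such that

\[ \eta = \gamma\, \beta_1'\sigma_1\, \delta\, \beta_2'\sigma_2\, \zeta, \]

\[ R_1 = \alpha_1'\delta_1 \rightarrow \beta_1'\delta_1, \]

\[ R_2 = \delta_2\alpha_2' \rightarrow \delta_2\beta_2', \]

\[ \delta = \delta_1\sigma_1 = \delta_2\sigma_2, \]

where \(|\gamma| = j_1 - 1\), \(|\gamma\, \beta_1'\sigma_1| = j_2 - 1\). Note that \(R_1\) and \(R_2\) should be as above, since \(\delta_1 \sim \delta_2\) (because we can assume \(\mathit{Var}(R_1) \cap \mathit{Var}(R_2) = \emptyset\) and thus \(\delta_1(\sigma_1 \cup \sigma_2) = \delta_1\sigma_1 = \delta_2\sigma_2 = \delta_2(\sigma_1 \cup \sigma_2)\)). Therefore,

\[ \xi_1 = \gamma\, \alpha_1'\sigma_1\, \delta_1\sigma_1\, \beta_2'\sigma_2\, \zeta, \]

\[ \xi_2 = \gamma\, \beta_1'\sigma_1\, \delta_2\sigma_2\, \alpha_2'\sigma_2\, \zeta. \]

Since \(\delta_1\sigma_1 = \delta_2\sigma_2\), it is obvious that \(R_2\) and \(R_1\) are reversely applicable to \(\xi_1\) and \(\xi_2\) at the positions \(j_2'\) and \(j_1\) respectively, i.e.,

\[ \xi_1 \overset{[R_2,j_2',\sigma_2]}{\underset{G}{\Longleftarrow}} \gamma\, \alpha_1'\sigma_1\, \delta_2\sigma_2\, \alpha_2'\sigma_2\, \zeta, \]

\[ \xi_2 \overset{[R_1,j_1,\sigma_1]}{\underset{G}{\Longleftarrow}} \gamma\, \alpha_1'\sigma_1\, \delta_1\sigma_1\, \alpha_2'\sigma_2\, \zeta. \]

Thus, the lemma holds for \(\xi = \gamma\, \alpha_1'\sigma_1\, \delta\, \alpha_2'\sigma_2\, \zeta\). □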

The next theorem states that any string \(\eta \in (\mathit{CTerm}(F, V) \cup \{\$\})^+\) can be parsed without backtracking if it is a sentential form of a UPUG.

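THEOREM 3.1. Let \(G = (F, T, V, P, s^{(n_s)}, \$)\) be a UPUG, \(\eta\) be a string of complex terms and \(\$\) (i.e., \(\eta \in (\mathit{CTerm}(F, V) \cup \{\$\})^+\)), and \(t_1, \ldots, t_{n_s} \in \mathit{Term}(F, V)\) be terms. If

\[ \eta \overset{n}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$, \]

then for any reverse rewriting item \([R, j, \sigma]\) and \(\xi \in (\mathit{CTerm}(F, V) \cup \{\$\})^+\) such that

\[ \eta \overset{[R,j,\sigma]}{\underset{G}{\Longleftarrow}} \xi, \]

the following relation holds \((n = 1, 2, \ldots)\):

\[ \eta \overset{[R,j,\sigma]}{\underset{G}{\Longleftarrow}} \xi \overset{n-1}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$. \]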

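Proof. The theorem is shown by induction on \(n\).
(1) The case \(n = 1\): This case is obvious, because there exists only one reverse rewriting item by the definition of a UPUG.
(2) The case \(n > 1\): Suppose that the theorem holds for \(n \leq k\). If \(\eta \overset{k+1}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$\), then there exists an item \([R_0, j_0, \sigma_0]\) such that

\[ \eta \overset{[R_0,j_0,\sigma_0]}{\underset{G}{\Longleftarrow}} \xi_0 \overset{k}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$. \]

Let \([R, j, \sigma]\) be any item such that \(\eta \overset{[R,j,\sigma]}{\underset{G}{\Longleftarrow}} \xi'\) for some \(\xi'\). We assume \(j \neq j_0\), because if \(j = j_0\) then \([R, j, \sigma] = [R_0, j_0, \sigma_0]\), and hence the theorem is proved. By Lemma 3.1, there are \(\xi \in (\mathit{CTerm}(F, V) \cup \{\$\})^+\) and non-negative integers \(j_0', j'\) such that

\[ \eta \overset{[R_0,j_0,\sigma_0]}{\underset{G}{\Longleftarrow}} \xi_0 \overset{[R,j',\sigma]}{\underset{G}{\Longleftarrow}} \xi, \]

\[ \eta \overset{[R,j,\sigma]}{\underset{G}{\Longleftarrow}} \xi' \overset{[R_0,j_0',\sigma_0]}{\underset{G}{\Longleftarrow}} \xi. \]

By the inductive hypothesis,

\[ \xi_0 \overset{[R,j',\sigma]}{\underset{G}{\Longleftarrow}} \xi \overset{k-1}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$. \]

Thus,

\[ \eta \overset{[R,j,\sigma]}{\underset{G}{\Longleftarrow}} \xi' \overset{[R_0,j_0',\sigma_0]}{\underset{G}{\Longleftarrow}} \xi \overset{k-1}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$. \quad \Box \]

The next corollary follows from Theorem 3.1. It states that any \(\eta \in (\mathit{CTerm}(F, V) \cup \{\$\})^+\) can be uniquely parsed by a leftmost reduction if it is derived in a given UPUG.

COROLLARY 3.1. Let \(G = (F, T, V, P, s^{(n_s)}, \$)\) be a UPUG, \(\eta \in (\mathit{CTerm}(F, V) \cup \{\$\})^+\) be a string of complex terms and \(\$\), and \(t_1, \ldots, t_{n_s} \in \mathit{Term}(F, V)\) be terms. If \(\eta \overset{n}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$\), then the following relation holds: \(\eta \overset{n}{\underset{\mathrm{lmr}}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$\).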


4. Prolog Implementation of a Parsing Algorithm

By Theorem 2.1 and Corollary 3.1, any sentence of a UPUG can be parsed by a leftmost reduction. Since the number of steps of the reduction is equal to that of a derivation, we can design an efficient parsing algorithm for UPUGs based on leftmost reduction. Furthermore, this algorithm can be implemented in Prolog very simply.

Here, we write each rule of \(G = (F, T, V, P, s^{(n_s)}, \$)\) in the form

\[ a_1 \cdots a_i\, b_1 \cdots b_j \rightarrow a_1 \cdots a_i\, c_1 \cdots c_k \]

where \(a_1, \ldots, a_i, b_1, \ldots, b_j, c_1, \ldots, c_k \in \mathit{CTerm}(F, V)\) satisfy the following condition \((i, j, k \in \{0, 1, \ldots\})\):

\[ (b_1 \neq c_1) \vee (j = 0) \vee (k = 0). \]

Note that, since a rule of the form

\[ a_1 \cdots a_i\, b_1 \cdots b_j \rightarrow a_1 \cdots a_i \]

cannot be used in a derivation starting from \(\$s^{(n_s)}(X_1, \ldots, X_{n_s})\$\), we remove such rules from the set \(P\). (If such a rule were reversely applicable to a string \(\eta\) such that \(\$s^{(n_s)}(X_1, \ldots, X_{n_s})\$ \overset{*}{\Longrightarrow} \eta\), then there would exist an infinite reduction process starting from \(\eta\), which contradicts Theorem 3.1.)

We now give a parsing algorithm for UPUGs.

Algorithm A:
Input: a UPUG \(G = (F, T, V, P, s^{(n_s)}, \$)\) and a string \(d_1 \cdots d_n \in T^*\).
Output: write "yes" iff \(\$d_1 \cdots d_n\$ \overset{*}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$\) for some \(t_1, \ldots, t_{n_s} \in \mathit{CTerm}(F, V)\).
1. begin
2.   \(\zeta := \$\), \(\eta := d_1 \cdots d_n\$\);
3.   while \(\eta \neq \varepsilon\) do
4.     if there is a rule \(a_1 \cdots a_i\, b_1 \cdots b_j \rightarrow a_1 \cdots a_i\, c_1 \cdots c_k \in P\) such that \(\zeta = \zeta' a_1' \cdots a_i' c_1' \cdots c_{k-1}'\), \(\eta = c_k' \eta'\) and \((a_1 \cdots a_i\, c_1 \cdots c_k)\sigma = a_1' \cdots a_i' c_1' \cdots c_k'\) for some substitution \(\sigma\)
5.     then \(\zeta := \zeta' a_1' \cdots a_i'\), and \(\eta := (b_1 \cdots b_j)\sigma\, \eta'\)
6.     else remove the leftmost term of \(\eta\), and attach it at the right end of \(\zeta\);
7.   if \(\zeta = \$\, s^{(n_s)}(t_1, \ldots, t_{n_s})\, \$\) for some terms \(t_1, \ldots, t_{n_s}\) and \(\eta = \varepsilon\) then write "yes"
8. end.

The above algorithm scans the input string from left to right, and if there is a rule reversely applicable to the string, then it performs a reduction operation. Otherwise, it does a right-shift operation (Figure 1). Note that there are cases in which the algorithm does not halt, since it is known that a UPG (hence a UPUG) is universal (Morita et al., 1997).

Figure 1. Reduction and right-shift operations of Algorithm A (△ shows the scanning position).

THEOREM 4.1. Algorithm A gives "yes" as an output iff

\[ \$d_1 \cdots d_n\$ \overset{*}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$ \]

for some \(t_1, \ldots, t_{n_s} \in \mathit{CTerm}(F, V)\). The total number of unifications and other elementary operations on strings of terms executed for an input \(d_1 \cdots d_n\) such that \(\$d_1 \cdots d_n\$ \overset{m}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$\) is \(O(m)\).

Proof. First, we prove the correctness of the algorithm. Let \(\zeta_h\) and \(\eta_h\) denote the values of \(\zeta\) and \(\eta\) just after \(h\) executions of the while loop of lines 3–6 (note that \(\zeta_0 = \$\) and \(\eta_0 = d_1 \cdots d_n\$\)). We first show that the following proposition (a) holds for any \(h\) such that \(\zeta_h\) and \(\eta_h\) are defined:

(a) \(\$d_1 \cdots d_n\$ \overset{*}{\underset{\mathrm{lmr}}{\Longleftarrow}} \zeta_h\eta_h\), and \(\zeta_h\) is irreducible in \(G\).

It is obvious when \(h = 0\). Assume (a) holds for any \(h < H\), and consider the \(H\)-th execution of the while loop. If the condition of line 4 is satisfied, then by the operation of line 5, \(\zeta_{H-1}\eta_{H-1} \Longleftarrow \zeta_H\eta_H\) holds. Furthermore, since \(\zeta_H\) is irreducible and \(G\) is a UPUG, we can see \(\zeta_{H-1}\eta_{H-1} \underset{\mathrm{lmr}}{\Longleftarrow} \zeta_H\eta_H\), and thus \(\$d_1 \cdots d_n\$ \overset{*}{\underset{\mathrm{lmr}}{\Longleftarrow}} \zeta_H\eta_H\) is obtained. It is clear that \(\zeta_H\) is irreducible since it is a prefix of \(\zeta_{H-1}\). If the condition of line 4 is not satisfied, then clearly \(\zeta_{H-1}\eta_{H-1} = \zeta_H\eta_H\) holds by the operation of line 6, and hence \(\$d_1 \cdots d_n\$ \overset{*}{\underset{\mathrm{lmr}}{\Longleftarrow}} \zeta_H\eta_H\). \(\zeta_H\) is also irreducible in this case, since the condition of line 4 is not satisfied.

Next, we show the proposition (b):

(b) If \(\zeta_h\eta_h\) is reducible, then \(\exists H > h\ (\zeta_h\eta_h \underset{\mathrm{lmr}}{\Longleftarrow} \zeta_H\eta_H)\).

Assume there exists no \(H\) \((> h)\) such that \(\zeta_h\eta_h \underset{\mathrm{lmr}}{\Longleftarrow} \zeta_H\eta_H\). Then, it is not possible to reduce \(\zeta_h\eta_h\) by the operation of line 5 in the \(h'\)-th execution \((h' > h)\) of the while loop, because this operation must be a leftmost reduction by (a). Hence, the shift operation of line 6 should be repeatedly applied until \(\zeta_{H'} = \zeta_h\eta_h\), \(\eta_{H'} = \varepsilon\) is obtained for some \(H' > h\). But, from the assumption that \(\zeta_h\eta_h\) is reducible, \(\zeta_{H'}\) is also reducible, which contradicts (a).

Using (a) and (b) we show the proposition (c):

(c) \(\$d_1 \cdots d_n\$ \overset{*}{\underset{\mathrm{lmr}}{\Longleftarrow}} \xi\) iff \(\exists h\ (\zeta_h\eta_h = \xi)\).

First we show the 'only-if' part of (c), i.e., if \(\$d_1 \cdots d_n\$ \overset{m}{\underset{\mathrm{lmr}}{\Longleftarrow}} \xi\) then \(\exists h\ (\zeta_h\eta_h = \xi)\), by induction on \(m\). It is obvious when \(m = 0\). Assume it holds for \(m < M\), and consider a reduction \(\$d_1 \cdots d_n\$ \overset{M-1}{\underset{\mathrm{lmr}}{\Longleftarrow}} \xi_0 \underset{\mathrm{lmr}}{\Longleftarrow} \xi_1\). By the inductive hypothesis, there exists some \(H_0\) such that \(\xi_0 = \zeta_{H_0}\eta_{H_0}\). Since \(\zeta_{H_0}\eta_{H_0}\) is reducible, there is some \(H_1\) such that \(\zeta_{H_0}\eta_{H_0} \underset{\mathrm{lmr}}{\Longleftarrow} \zeta_{H_1}\eta_{H_1} = \xi_1\) by (b). Hence, if \(\$d_1 \cdots d_n\$ \overset{M}{\underset{\mathrm{lmr}}{\Longleftarrow}} \xi_1\) then \(\exists h\ (\zeta_h\eta_h = \xi_1)\). The 'if' part of (c) is clear from (a).

By (c) we can see that \(\$d_1 \cdots d_n\$ \overset{*}{\underset{\mathrm{lmr}}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$\) iff \(\exists h\ (\zeta_h\eta_h = \$s(t_1, \ldots, t_{n_s})\$)\). In the case \(d_1 \cdots d_n \in L(G)\), \(\zeta\) and \(\eta\) will eventually become \(\zeta = \$s(t_1, \ldots, t_{n_s})\$\) and \(\eta = \varepsilon\) (because \(\$s(t_1, \ldots, t_{n_s})\$\) is irreducible by the definition of a UPUG), and Algorithm A terminates with the output "yes". It is also easy to see that, in the case \(d_1 \cdots d_n \notin L(G)\), the algorithm never writes "yes". By the above, the correctness of Algorithm A is proved.

Next, we evaluate the total number of unifications and other elementary operations on strings of terms executed in the algorithm. Consider a leftmost reduction process \(\$d_1 \cdots d_n\$ \overset{m}{\underset{\mathrm{lmr}}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$\) by Algorithm A. Let \(L\) (\(R\), respectively) be the maximum length (i.e., number of terms) of the left-hand (right-hand) side of a rewriting rule in \(P\). The initial length of \(\eta\) is \(n + 1\) by line 2 of the algorithm. The length of \(\eta\) can increase only if line 5 is executed. If a rule \(\alpha \rightarrow \beta\) such that \(|\alpha| > |\beta|\) is reversely applied, then \(|\eta|\) increases by \(|\alpha| - |\beta| \leq L - 1\). Since line 5 is executed \(m\) times, the total increase of \(|\eta|\) is at most \(m(L - 1)\). Hence, the total number of times that line 6 is executed is at most \(n + 1 + m(L - 1)\). On the other hand, if we consider a derivation \(\$s(X_1, \ldots, X_{n_s})\$ \overset{m}{\Longrightarrow} \$d_1 \cdots d_n\$\), we can see \(n - 1 \leq m(R - 1)\), since the increase of the number of terms by applying a rule \(\alpha \rightarrow \beta\) is \(|\beta| - |\alpha| \leq R - 1\). Hence, the total number of times that line 6 is executed is at most \((L + R - 2)m + 2\).

The number of times that the while loop is executed is at most \(m + (L + R - 2)m + 2 = (L + R - 1)m + 2\). Hence, the total number of unifications and other elementary operations on strings of terms executed for an input \(d_1 \cdots d_n\) such that \(\$d_1 \cdots d_n\$ \overset{m}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$\) is \(O(m)\). □

Algorithm A can be implemented in Prolog in a very simple manner, as shown below. Here we use two predicates, parse and rule. The predicate parse(A,B,Y) realizes the control structure of Algorithm A, where the argument A keeps the reverse string of \(\zeta\) as a list, while B keeps \(\eta\). The variable Y is used to obtain \(s(t_1, \ldots, t_{n_s})\) such that \(\$d_1 \cdots d_n\$ \overset{*}{\underset{G}{\Longleftarrow}} \$s(t_1, \ldots, t_{n_s})\$\). Each unit clause with the predicate name rule corresponds to one rule of the UPUG, and performs the reduction operation.

Prolog Implementation of a UPUG Parser:
Let \(G = (F, T, V, P, s^{(n_s)}, \$)\) be a given UPUG.
1. Define the predicate "parse" by the following three clauses:

    parse(['$',s(X1,...,Xns),'$'],[],s(X1,...,Xns)) :- !.
    parse(A,B,Y) :- rule(C,D,A,B), !, parse(C,D,Y).
    parse(A,[H|B],Y) :- !, parse([H|A],B,Y).

   The first clause is for terminating the algorithm, the second is for the reduction operation, and the third is for the right-shift operation.

2. For each rule

    a1 ... ai b1 ... bj → a1 ... ai c1 ... ck

   in P, add the following clause:

    rule([ai,...,a1|A],[b1,...,bj|B],[ck-1,...,c1,ai,...,a1|A],[ck|B]).

   Note that the list '[ai,...,a1|A]' should be replaced by 'A' if i = 0 (likewise for '[b1,...,bj|B]' and the others).

3. Let \(w = d_1 d_2 \cdots d_n \in T^*\) be an input string. The following query clause initiates the parsing procedure, and answers whether \(w \in L(G)\) or not:

    ?- parse(['$'],[d1,...,dn,'$'],Y).

Note. Each term of G should be replaced by an appropriate term of Prolog.
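The translation in step 2 is mechanical, so it can itself be sketched as a small Prolog predicate. The helper below is hypothetical and not part of the paper; it assumes a rule is supplied as three lists, the common prefix `As`, the remainder `Bs` of the left-hand side, and the non-empty remainder `Cs` of the right-hand side, and it builds the corresponding `rule/4` clause term. SWI-Prolog with `library(lists)` is assumed.

```prolog
:- use_module(library(lists)).

% rule_clause(+As, +Bs, +Cs, -Clause): Clause is the rule/4 fact for the
% grammar rule  As Bs -> As Cs  (Cs must contain at least one term).
rule_clause(As, Bs, Cs, rule(L1, L2, L3, [Ck|B])) :-
    append(CsInit, [Ck], Cs),      % split off the last term c_k
    reverse(As, RevAs),
    append(RevAs, A, L1),          % [a_i,...,a_1|A]
    append(Bs, B, L2),             % [b_1,...,b_j|B]
    reverse(CsInit, RevCs),
    append(RevCs, L1, L3).         % [c_{k-1},...,c_1,a_i,...,a_1|A]
```

For a rule with prefix `$`, left remainder `s(X)` and right remainder `at(X) bt(X) ct(X)`, the query `?- rule_clause(['$'], [s(X)], [at(X), bt(X), ct(X)], C).` yields `C = rule(['$'|A], [s(X)|B], [bt(X), at(X), '$'|A], [ct(X)|B])`. The resulting term could be added to the program with `assertz/1`, although writing the facts by hand as in step 2 is equally simple.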

EXAMPLE 4.1. Using the above method we can construct a Prolog parser for the UPUG \(G_{a^n b^n c^n}\) of Example 3.1 as follows. Here, the function symbols a, b, and c of arity 1 are represented by at, bt, and ct, respectively.

    % Control structure of Algorithm A: terminate, reduce, or shift.
    parse(['$',s(X),'$'],[],s(X)) :- !.
    parse(A,B,Y) :- rule(C,D,A,B), !, parse(C,D,Y).
    parse(A,[H|B],Y) :- !, parse([H|A],B,Y).

    % Rules (1)-(8) of Example 3.1, in the same order.
    rule(['$'|A],[s(0),'$'|B],['$'|A],['$'|B]).
    rule(['$'|A],[s(X)|B],[bt(X),at(X),'$'|A],[ct(X)|B]).
    rule(A,[at(f(X1))|B],[a|A],[at(X1)|B]).
    rule(A,[at(1),bt(X2)|B],[a|A],[bt(X2)|B]).
    rule(A,[bt(f(X3))|B],[b|A],[bt(X3)|B]).
    rule(A,[bt(1),ct(X4)|B],[b|A],[ct(X4)|B]).
    rule(A,[ct(f(X5))|B],[c|A],[ct(X5)|B]).
    rule(A,[ct(1),'$'|B],[c|A],['$'|B]).

A sentence 'aaabbbccc' can be parsed by the following query, and the result \(s(f(f(1)))\) is obtained (note that \(\$aaabbbccc\$ \overset{*}{\Longleftarrow} \$s(f(f(1)))\$\)):

    ?- parse(['$'],[a,a,a,b,b,b,c,c,c,'$'],Y).
    Y = s(f(f(1)))
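To relate the parser to the step counts discussed for Theorem 4.1, the following variant counts how many reduction and shift operations are performed. It is an illustrative sketch only: `parse_count/5` is a hypothetical predicate added here, while the `rule/4` facts repeat those of Example 4.1 so that the block can be loaded on its own.

```prolog
% parse_count(+A, +B, -Y, -Reductions, -Shifts): like parse/3, but also
% counts the reduction and right-shift operations that were performed.
parse_count(['$',s(X),'$'], [], s(X), 0, 0) :- !.
parse_count(A, B, Y, R, S) :-
    rule(C, D, A, B), !,
    parse_count(C, D, Y, R0, S),
    R is R0 + 1.
parse_count(A, [H|B], Y, R, S) :-
    parse_count([H|A], B, Y, R, S0),
    S is S0 + 1.

% Rules (1)-(8) of the a^n b^n c^n grammar, as in Example 4.1.
rule(['$'|A],[s(0),'$'|B],['$'|A],['$'|B]).
rule(['$'|A],[s(X)|B],[bt(X),at(X),'$'|A],[ct(X)|B]).
rule(A,[at(f(X1))|B],[a|A],[at(X1)|B]).
rule(A,[at(1),bt(X2)|B],[a|A],[bt(X2)|B]).
rule(A,[bt(f(X3))|B],[b|A],[bt(X3)|B]).
rule(A,[bt(1),ct(X4)|B],[b|A],[ct(X4)|B]).
rule(A,[ct(f(X5))|B],[c|A],[ct(X5)|B]).
rule(A,[ct(1),'$'|B],[c|A],['$'|B]).
```

The query `?- parse_count(['$'],[a,a,b,b,c,c,'$'],Y,R,S).` answers `Y = s(f(1)), R = 7, S = 10`: the seven reductions correspond to the seven derivation steps of aabbcc in Example 3.1. An input outside the language, such as `[a,a,b,b,c,'$']`, makes the query fail.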

5. Concluding Remarks

In this paper, we extended the framework of a UPG to a UPUG, a unification grammar version of the former, and gave a simple parsing algorithm and its implementation in Prolog. A UG has some features and functions similar to those of an attribute grammar (see, e.g., Deransart and Maluszynski, 1993). Hence a UPUG can be used as a deterministic alternative to an attribute grammar. Furthermore, since each term in a UPUG may have arguments, it is useful for describing a grammar of a natural language, in which various 'features' should be handled as parameters of a syntactic category. Of course, when a natural language is described by a UPUG, syntactic ambiguity cannot be handled. This may be a serious drawback in some cases, but there are also cases in which we want a unique parse result obtained by putting some constraints on the syntactic structure, and in those cases UPUGs are useful.

Although the number of rewriting rules of a UPUG is generally larger than that of a CFG generating the same language, the UPUG-condition, by which deterministic parsing is possible, is much simpler than the condition for an LR(k) grammar (Knuth, 1965) or an LL(k) grammar (Rosenkrantz and Stearns, 1970). In addition, the language generating ability of UPUGs is universal (i.e., they cover the class of all recursively enumerable sets, rather than only the class of context-free languages), since UPGs are already universal.


Acknowledgements

The authors would like to express their thanks to the referees for their invaluable comments. They also thank Dr. Chuzo Iwamoto of Hiroshima University for his useful discussion. This work was supported in part by Grant-in-Aid for Scientific Research (C), No. 10680355, from the Ministry of Education, Science, Sports and Culture of Japan.

References

Deransart, P. and J. Maluszynski. A Grammatical View of Logic Programming, The MIT Press, Cambridge, MA, 1993.

Knuth, D. E. On the translation of languages from left to right, Information and Control, 8: 607–639, 1965.

Morita, K., N. Nishihara, Y. Yamamoto and Z. Zhang. A hierarchy of uniquely parsable grammar classes and deterministic acceptors, Acta Informatica, 34: 389–410, 1997.

Pereira, F. C. N. and D. H. D. Warren. Definite clause grammar for language analysis, Artificial Intelligence, 13: 231–278, 1980.

Rosenkrantz, D. J. and R. E. Stearns. Properties of deterministic top-down grammars, Information and Control, 17: 226–256, 1970.

Sells, P. Lectures on Contemporary Syntactic Theories, CSLI Lecture Notes, Stanford, CA, 1985.
