
Workshop TAG+5, Paris, 25-27 May 2000

Practical Experiments in Parsing using Tree Adjoining Grammars*

Anoop Sarkar

Dept. of Computer and Information Sciences, University of Pennsylvania
200 South 33rd Street, Philadelphia, PA 19104 USA

[email protected]

Abstract

We present an implementation of a chart-based head-corner parsing algorithm for lexicalized Tree Adjoining Grammars. We report on some practical experiments where we parse 2250 sentences from the Wall Street Journal using this parser. In these experiments the parser is run without any statistical pruning; it produces all valid parses for each sentence in the form of a shared derivation forest. The parser uses a large Treebank Grammar with 6789 tree templates and about 120,000 lexicalized trees. The results suggest that the observed complexity of parsing for LTAG is dominated by factors other than sentence length.

* I would like to thank Aravind Joshi, Carlos Prolo and Fei Xia for their help and suggestions. This work was partially supported by NSF Grant SBR 8920230.

1. Motivation

The particular experiments that we report on in this paper were chosen to discover certain facts about LTAG parsing in a practical setting. Specifically, we wanted to discover the importance of the worst-case results for LTAG parsing in practice. Let us take Schabes' Earley-style TAG parsing algorithm (Schabes, 1994), which is the usual candidate for a practical LTAG parser. The parsing time complexity of this algorithm for various types of grammars is as follows (for input of length n):

O(n^6) - TAGs for inherently ambiguous languages

O(n^4) - unambiguous TAGs

O(n) - bounded state TAGs, e.g. the usual grammar G where L(G) = {a^n b^n e c^n d^n | n ≥ 0} (see (Joshi et al., 1975))

The grammar factors are as follows: Schabes' Earley-style algorithm takes O(|A| |I ∪ A| N n^6) worst-case time and O(|A ∪ I| N n^4) worst-case space, where n is the length of the input, A is the set of auxiliary trees, I is the set of initial trees, and N is the maximum number of nodes in an elementary tree.

Given these worst-case estimates we wish to explore what the observed times might be for a TAG parser. It is not our goal here to compare different TAG parsing algorithms; rather it is to discover what kinds of factors can contribute to parsing time complexity. Of course, a natural-language grammar that is large and complex enough to be used for parsing real-world text is typically neither unambiguous nor bounded in state size.


It is important to note that in this paper we are not concerned with parsing accuracy; rather we want to explore parsing efficiency. This is why we do not pursue any statistical pruning while parsing. Instead we produce a shared derivation forest for each sentence which stores, in compact form, all derivations for that sentence. This helps us evaluate our TAG parser for time and space efficiency. The experiments reported here are also useful for statistical parsing using TAG, since discovering the source of grammar complexity in parsing can help in finding the right figures-of-merit for effective pruning in a statistical parser.

2. Treebank Grammar

The grammar we used for our experiments was a Treebank Grammar which was extracted from Sections 02-21 of the Wall Street Journal Penn Treebank II corpus (Marcus et al., 1993). We are grateful to Fei Xia for use of this grammar, which was part of a separate study (Xia, 1999). The extraction converted the derived trees of the Treebank into derivation trees which represent the attachments of lexicalized elementary trees. There are 6789 tree templates in the grammar with 47,752 tree nodes. Each word in the corpus selects some set of tree templates. The total number of lexicalized trees is 123,039. The total number of word types in the lexicon is 44,215. The average number of trees per word is 2.78. However, the average gives a misleading overall picture of the syntactic lexical ambiguity in the grammar.[1] Figure 1 shows the syntactic lexical ambiguity of the 150 most frequent words in the corpus. We shall return to this issue of lexical ambiguity when we evaluate our TAG parser.

[1] We define the (syntactic) lexical ambiguity for a lexicalized TAG as the number of trees selected by a lexical item. Note that in a fully lexicalized formalism like LTAG, lexical ambiguity includes (to some extent) what would be considered to be a purely syntactic ambiguity in other formalisms.

Finally, some lexicalized trees from the grammar are shown in Figure 2.

Figure 1: Number of trees selected by the 150 most frequent words in the input corpus. (x-axis: word frequency; y-axis: number of trees selected)
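As a concrete illustration of how these lexicon statistics can be computed, here is a minimal sketch assuming the lexicon is available as a mapping from each word type to the set of tree templates it selects; the function and variable names are assumptions for illustration, not the actual grammar tooling:

```python
from statistics import mean

def lexicon_stats(lexicon, word_freq, top_k=150):
    """lexicon: dict mapping a word type to the set of tree templates it selects.
    word_freq: dict mapping a word type to its corpus frequency.
    Reports the average number of trees per word and, for the top_k most
    frequent words, the per-word ambiguity shown in Figure 1."""
    trees_per_word = {w: len(ts) for w, ts in lexicon.items()}
    avg = mean(trees_per_word.values())        # 2.78 for this Treebank Grammar

    most_frequent = sorted(word_freq, key=word_freq.get, reverse=True)[:top_k]
    ambiguity = [(w, word_freq[w], trees_per_word.get(w, 0))
                 for w in most_frequent]
    return avg, ambiguity
```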

3. The Parser

3.1. Parsing Algorithm

The parser used in this paper implements a chart-based head-corner algorithm. The use of head-driven prediction to enhance efficiency was first suggested by (Kay, 1989) for CF parsing (see (Sikkel, 1997) for a more detailed survey). (Lavelli & Satta, 1991) provided the first head-driven algorithm for LTAGs, which was a chart-based algorithm but lacked any top-down prediction. (van Noord, 1994) describes a Prolog implementation of a head-corner parser for LTAGs which includes top-down prediction. Significantly, (van Noord, 1994) uses a different closure relation from (Lavelli & Satta, 1991): the head-corner traversal for auxiliary trees starts from the footnode rather than from the anchor.

The parsing algorithm we use is a chart-based variant of the (van Noord, 1994) algorithm. We use the same head-corner closure as proposed there.


Figure 2: Example lexicalized elementary trees from the Treebank Grammar. They are shown in the usual notation: ◊ = anchor; ↓ = substitution node; * = footnode; na = null-adjunction constraint. These trees can be combined using substitution and adjunction to parse the sentence Ms. Haag plays Elianti.

We do not give a complete description of our parser here since the basic idea behind the algorithm can be grasped by reading (van Noord, 1994). Our parser differs from the algorithm in (van Noord, 1994) in some important respects: our implementation is chart-based, explicitly tracks goal and item states, and does not perform any implicit backtracking or selective memoization; we do not need any additional variables to keep track of which words are already 'reserved' by an auxiliary tree (which (van Noord, 1994) needs to guarantee termination); and we have an explicit completion step.
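The goal/item bookkeeping can be pictured with a small sketch. The records and the agenda loop below are illustrative simplifications under assumed names (the Goal/Item fields and the axioms/infer hooks are not the parser's actual C data structures):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Goal:
    """Top-down prediction: find an analysis of node `node` of elementary
    tree `tree` over some span inside [i, j)."""
    tree: str
    node: int
    i: int
    j: int

@dataclass(frozen=True)
class Item:
    """A (partially) recognized tree node covering words[i:j], with an
    optional foot-node gap [fl, fr) when the node sits on an auxiliary tree."""
    tree: str
    node: int
    i: int
    j: int
    fl: Optional[int] = None
    fr: Optional[int] = None

def parse(words, axioms, infer):
    """Generic agenda-driven chart loop. `axioms(words)` seeds goals/items from
    the lexical anchors (head-corner prediction); `infer(state, chart, words)`
    yields new goals/items for the head-corner, completion, substitution and
    adjunction steps. Both hooks are grammar-specific and omitted here."""
    chart, agenda = set(), list(axioms(words))
    while agenda:
        state = agenda.pop()
        if state in chart:          # every goal/item is memoized exactly once,
            continue                # so no implicit backtracking is needed
        chart.add(state)
        agenda.extend(infer(state, chart, words))
    return chart
```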

3.2. Parser Implementation

The parser is implemented in ANSI C and runs on SunOS 5.x and Linux 2.x. Apart from the Treebank Grammar used in this paper, the parser has been tested with the XTAG English Grammar and also with a Korean grammar. The implementation optimizes for space at the expense of speed: e.g. the recognition chart is implemented as a sparse array, thus taking considerably less than the worst-case n^4 space, and the lexical database is read dynamically from a disk-based hash table. For each input sentence, the parser produces as output a shared derivation forest, which is a compact representation of all the derivation trees for that sentence. We use the definition of derivation forests for TAGs represented as CFGs, taking O(n^4) space, as defined in (Vijay-Shanker & Weir, 1993; Lang, 1994).
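As an illustration of the space trade-off just described, a sparse chart can be kept as a hash table indexed only by cells that are actually proposed, rather than as a dense four-dimensional array. A minimal sketch, with an assumed cell layout (span plus optional foot-node gap); this is not the parser's actual C data structure:

```python
from collections import defaultdict

class SparseChart:
    """Chart cells indexed by (i, j, fl, fr): the spanned substring [i, j) and,
    for auxiliary trees, the foot-node gap [fl, fr) ((None, None) otherwise).
    Only cells that are actually proposed during parsing occupy memory, so the
    space used is typically far below the dense n^4 bound."""
    def __init__(self):
        self._cells = defaultdict(set)

    def add(self, i, j, fl, fr, entry):
        """Record a recognized tree node (`entry`) in this cell; return True
        if it is new, so the caller can decide whether to extend the agenda."""
        cell = self._cells[(i, j, fl, fr)]
        if entry in cell:
            return False
        cell.add(entry)
        return True

    def get(self, i, j, fl=None, fr=None):
        return self._cells.get((i, j, fl, fr), set())
```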

4. Input Data

The data used as input to the parser was a set of 2250 sentences from the WSJ Penn Treebank. The length of each sentence was 21 words or less. The average sentence length was 12.3 and the total number of tokens was 27,715. These sentences were taken from the same sections as the input Treebank Grammar. This was done to avoid any processing difficulties incurred in handling unknown words properly.
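A sketch of this selection step, assuming the Treebank sentences are available as lists of tokens; the corpus-reading details are omitted and the names are illustrative:

```python
def select_input(sentences, max_len=21):
    """sentences: iterable of tokenized sentences (lists of words) drawn from
    the same WSJ sections (02-21) used to extract the Treebank Grammar."""
    chosen = [s for s in sentences if len(s) <= max_len]
    tokens = sum(len(s) for s in chosen)
    avg_len = tokens / len(chosen)     # 12.3 for the 2250 sentences used here
    return chosen, tokens, avg_len
```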


5. Results

In this section we examine the performance of the parser on the input data (described in §4).[2]

Figure 3 shows the time taken in seconds by the parser plotted against sentence length.[3] We see a great deal of variation in timing for the same sentence length, especially for longer sentences. This is surprising since all time complexity analyses reported for parsing algorithms assume that the only relevant factor is the length of the sentence. In this paper, we will explore whether sentence length is the only relevant factor.[4]

Figure 4 shows the median of the time taken for each sentence length. This figure shows that for some sentences the time taken by the parser deviates by a large magnitude from the median case for the same sentence length. Next we considered each set of sentences of the same length to be a sample, and computed the standard deviation for each sample. This number ignores the outliers and gives us a better estimate of parser performance in the most common case. Figure 5 shows the plot of the standard deviation points against parsing time. The figure also shows that these points can be described by a linear function.

[2] The data was split into 45 equal-sized chunks and parsed in parallel on a Beowulf cluster of Pentium Pro 200MHz servers with 512MB of memory running Linux 2.2.

[3] From the total input data of 2250 sentences, 315 sentences did not get a parse. This was because the parser was run with the start symbol set to the label S. Of the sentences that did not parse, 276 sentences were rooted at other labels such as FRAG, NP, etc. The remaining 39 sentences were rejected because a tokenization bug did not remove a few punctuation symbols which do not select any trees in the grammar.

[4] A useful analogy to consider is the run-time analysis of quicksort. For this particular sorting algorithm, it was determined that the distribution of the order of the numbers in the input array to be sorted is an extremely important factor in guaranteeing sorting in time Θ(n log n). An array of numbers that is already completely sorted has time complexity Θ(n^2).
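A minimal sketch of the per-sentence-length analysis behind Figures 4 and 5, assuming the timing results are available as (sentence length, seconds) pairs; the function and variable names are illustrative:

```python
import numpy as np
from collections import defaultdict

def per_length_stats(results, min_len=8):
    """results: iterable of (sentence_length, parse_time_seconds) pairs.
    Groups times by sentence length, reports the median and standard
    deviation per length, and fits a least-squares line to the per-length
    standard deviations (roughly the analysis behind Figures 4 and 5)."""
    by_len = defaultdict(list)
    for length, seconds in results:
        by_len[length].append(seconds)

    medians = {n: float(np.median(t)) for n, t in sorted(by_len.items())}
    stdevs = {n: float(np.std(t)) for n, t in sorted(by_len.items())
              if n >= min_len}       # short sentences dropped (round-off errors)

    lengths = np.array(sorted(stdevs))
    devs = np.array([stdevs[n] for n in lengths])
    slope, intercept = np.polyfit(lengths, devs, 1)   # linear fit, as in Figure 5
    return medians, stdevs, slope, intercept
```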

Figure 4: Median running times for the parser. (x-axis: sentence length; y-axis: median time in seconds)

Figure 5: Least-squares fit over the standard deviation points for each sentence length. The error was 9.078% and 13.74% for the slope and intercept respectively. We ignored sentences shorter than 8 words due to round-off errors; cf. Figure 3. (x-axis: std. deviation points; y-axis: time in seconds)


Figure 3: Parse times plotted against sentence length. (x-axis: sentence length; y-axis: log(time) in seconds)

Figure 6: Log of the number of derivations produced by the parser plotted against sentence length. (x-axis: sentence length; y-axis: log(no. of derivations))

Figure 6 shows the number of derivations reported by the parser for each sentence plotted against sentence length. These derivations were never enumerated by the parser: the total number of derivations for each sentence was computed directly from the shared derivation forest reported by the parser. The size of the grammar has a direct relation to the large number of derivations reported by the parser. However, in this figure, just as in the figure for the parsing times, while there is an overall increase in the number of derivations as the sentence length increases, there is also a large variation in this number for identical sentence lengths. We wanted to discover the relevant variable other than sentence length which would be the right predictor of parsing time complexity.[5] As our analysis of the lexicon showed us (see Figure 1), there can be a large variation in syntactic lexical ambiguity, which might be a relevant factor in parsing time complexity.

[5] Note that this variable cannot be the number of active or passive edges proposed by the parser, since these values can only be computed at run-time.


To draw this out, in Figure 7 we plotted the number of trees selected by a sentence against the time taken to parse that sentence. From this graph we see that the number of trees selected is a better predictor than sentence length of the increase in parsing complexity. Based on the comparison of the graph in Figure 7 with Figure 3, we assert that it is the syntactic lexical ambiguity of the words in the sentence which is the major contributor to parsing time complexity. One might be tempted to suggest that instead of the number of trees selected, the number of derivations reported by the parser might be a better predictor of parsing time complexity.[6] We tested this hypothesis by plotting the number of derivations reported for each sentence against the time taken to produce them (shown in Figure 8). The figure shows that the final number of derivations reported is not a valid predictor of parsing time complexity.

Figure 7: The impact of syntactic lexical ambiguity on parsing times. Log of the time taken to parse a sentence plotted against the total number of trees selected by the sentence. (x-axis: total number of trees selected by a sentence; y-axis: log(time taken) in seconds). Compare with Figure 3.

Figure 8: Number of derivations reported for each sentence plotted against the time taken to produce them (both axes are in log scale). (x-axis: log(number of derivations reported); y-axis: log(time taken) in seconds)
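The derivation counts plotted in Figures 6 and 8 can be read off a shared forest without enumerating derivations, by a memoized sum-of-products recursion over the packed structure. A minimal, self-contained sketch, assuming a simplified dict-based forest; the node ids and packing format are illustrative, not the parser's actual forest representation:

```python
from functools import lru_cache
from math import prod

def count_derivations(forest, root):
    """forest: dict mapping a node id to a list of packed alternatives; each
    alternative is a tuple of child node ids (empty for a leaf choice).
    This mirrors the CFG view of a derivation forest: node = nonterminal,
    alternative = production. The count at a node is the sum over its
    alternatives of the product of the children's counts."""
    @lru_cache(maxsize=None)
    def count(node):
        return sum(prod(count(child) for child in alt) for alt in forest[node])
    return count(root)

# Hypothetical toy forest with two packed derivations at the root:
toy = {
    "S[0,4]": [("NP[0,2]", "VP[2,4]"), ("NP[0,1]", "VP[1,4]")],
    "NP[0,2]": [()], "VP[2,4]": [()], "NP[0,1]": [()], "VP[1,4]": [()],
}
print(count_derivations(toy, "S[0,4]"))  # -> 2
```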

[6] One of the anonymous reviewers of this paper suggested that this might be a useful indicator.

6. Conclusion

In this paper, we described an implementation of a chart-based head-corner parser for LTAGs. We ran some empirical tests by running the parser on 2250 sentences from the Wall Street Journal. We used a large Treebank Grammar to parse these sentences. We showed that the observed time complexity of the parser on these sentences does not increase predictably with longer sentence lengths. Looking at the derivations produced by the parser, we see a similar variation in the number of derivations for the same sentence length. We presented evidence indicating that the number of trees selected by the words in the sentence (a measure of the syntactic lexical ambiguity of a sentence) is a better predictor of complexity in LTAG parsing.

References

JOSHI A. K., LEVY L. & TAKAHASHI M. (1975). Tree Adjunct Grammars. Journal of Computer and System Sciences.

KAY M. (1989). Head driven parsing. In Proc. of IWPT '89, p. 52–62, Pittsburgh, PA.

LANG B. (1994). Recognition can be harder than parsing. Computational Intelligence, 10 (4).

LAVELLI A. & SATTA G. (1991). Bidirectional parsing of Lexicalized Tree Adjoining Grammars. In Proc. 5th EACL, Berlin, Germany.

MARCUS M., SANTORINI B. & MARCINKIEWICZ M. (1993). Building a large annotated corpus of English. Computational Linguistics, 19 (2), 313–330.

SCHABES Y. (1994). Left to right parsing of lexicalized tree adjoining grammars. Computational Intelligence, 10 (4).

SIKKEL K. (1997). Parsing Schemata. EATCS Series. Springer-Verlag.

VAN NOORD G. (1994). Head-corner parsing for TAG. Computational Intelligence, 10 (4).



VIJAY-SHANKER K. & WEIR D. J. (1993). The use of shared forests in TAG parsing. In Proc. of 6th Meeting of the EACL, p. 384–393, Utrecht, The Netherlands.

XIA F. (1999). Extracting tree adjoining grammars from bracketed corpora. In Proceedings of 5th Natural Language Processing Pacific Rim Symposium (NLPRS-99), Beijing, China.