16
A Brief Introduction to the TIGER Treebank, Version 1 George Smith Universit¨ at Potsdam July 22, 2003 1 Introduction This brief introduction provides a basic description of the structural format of Ver- sion 1 of the TIGER treebank, giving representative examples of the way linguistic phenomena are encoded. It is intended to guide the user through the first steps of becoming familiar with the treebank. The user is presumed to be a linguist specifi- cally interested in German. A basic knowledge of the syntax and vocabulary of the German language is assumed. The TIGER Treebank (Version1) is the product of an ongoing joint project at the the Department of Computational Linguistics and Phonetics in Saarbr ¨ ucken,the Institute of Natural Language Processing (IMS) in Stuttgart, and the Institut f¨ ur Germanistik in Potsdam 1 . Version 1 of the treebank consists of approximately 700,000 tokens (40,000 sentences) of German newspaper text, taken from the Frankfurter Rundschau. The information encoded in the treebank can be most easily accessed via the TIGERSearch query tool (Lezius, 2002). Thus, for each phenomenon discussed here, sample queries in the TIGER query language are given which demonstrate how relevant sentences can be retrieved from the treebank via the query tool. These queries can be used as building blocks in the formulation of user queries. This doc- ument is not specifically intended as a tutorial for the query language, but it could provide a starting point for familiarizing oneself with the basic concepts of the language. The examples of queries given here progress from very simple to more complex and are accompanied by explanations. No prior knowledge of the query language is assumed. For introductory information regarding the TIGERSearch query tool and the TIGER language see K¨ onig et al. (2003). For a formal defini- tion of the language see K¨ onig and Lezius (2003). The annotation scheme used to encode grammatical information in the treebank builds on that developed for the NEGRA treebank (Skut et al., 1997, 1998; Brants et al., 1999). The corpus is divided into sentences. Each sentence has been given 1 The TIGER project has been financed by the Deutsche Forschungsgemeinschaft since 1999 1

Tiger Introduction

Embed Size (px)

Citation preview

Page 1: Tiger Introduction

A Brief Introduction to theTIGERTreebank,Version1

GeorgeSmithUniversitat Potsdam

July22,2003

1 Introduction

This brief introduction providesabasic description of thestructural formatof Ver-sion1 of theTIGERtreebank,giving representativeexamplesof theway linguisticphenomenaareencoded. It is intended to guidetheuser through thefirst steps ofbecoming familiarwith thetreebank. Theuseris presumedto bea linguist specifi-cally interestedin German.A basicknowledgeof thesyntax andvocabularyof theGermanlanguageis assumed.

TheTIGER Treebank(Version1) is the product of an ongoing joint project atthetheDepartment of Computational LinguisticsandPhonetics in Saarbrucken,theInstitute of Natural Language Processing (IMS) in Stuttgart, and the Institut furGermanistik in Potsdam1. Version1 of the treebank consistsof approximately700,000 tokens (40,000 sentences) of Germannewspaper text, taken from theFrankfurter Rundschau.

The information encodedin the treebankcanbe mosteasily accessedvia theTIGERSearchquery tool (Lezius,2002). Thus, for eachphenomenondiscussedhere,samplequeries in the TIGER query languagearegiven which demonstratehow relevantsentencescanberetrievedfrom thetreebankvia thequery tool. Thesequeriescanbeusedasbuilding blocks in theformulation of userqueries.Thisdoc-umentis not specifically intendedasa tutorial for thequery language, but it couldprovide a starting point for familiarizing oneself with the basic conceptsof thelanguage.Theexamples of queriesgivenhereprogressfrom very simpleto morecomplex andareaccompanied by explanations. No prior knowledge of the querylanguageis assumed. For introductory information regarding the TIGERSearchquerytool andthe TIGER languageseeKonig et al. (2003). For a formal defini-tion of thelanguageseeKonig andLezius(2003).

Theannotationschemeusedto encodegrammatical information in thetreebankbuildson thatdevelopedfor theNEGRA treebank(Skutet al., 1997, 1998; Brantset al., 1999). Thecorpus is divided into sentences.Eachsentencehasbeengiven

1TheTIGERprojecthasbeenfinancedby theDeutscheForschungsgemeinschaftsince1999

1

Page 2: Tiger Introduction

Nouns N Adverbs ADVVerbs V Conjunctions KOArticles ART Adpositions APAdjectives ADJ Interjections ITPronouns P Particles PTKCardinal Numbers CARD

Table1: Main STTSCategories

a structural description, henceforth annotation. Theannotationcanbedividedintotwo parts: informationspecific to individual wordformsis encodedby tagsassoci-atedwith those forms;syntactic structureis encodedvia asyntaxgraph, a tree-likestructurewith potentially crossingbranchesandlabeled edges (Konig andLezius,2003). In this paperwewill informally refer to trees.Thewordformsof asentencearetagged for part of speech2. A treeis built above thewordformsof a sentence.Thenodesof thetreearelabeledfor their constituent category. Theedges leadingfrom onenodeto anotherarelabeled according to thefunction of thechild node inthe constituent formedby the parent node. The treebank alsocontains secondaryedges. Secondary edgesaredrawn betweentwo nodeswhich arenot in a relationof immediate dominanceandindicate a syntacticrelation betweenthose nodes.

Searching the treebankusing the TIGER query languageinvolvesdescribingnodes andrelationsbetween nodes via expressions in thelanguage.Theseexpres-sionscanbe combined using various Boolean operators. Basicnode descriptionsareintroducedin section 2. Meansof expressingrelations betweennodesarein-troducedin sections3 and4.

2 Part of Speech

Thetagset for partof speech usedin theTIGERTreebank is theStuttgart-TubingerTagset(STTS), with minor variations.For a completedocumentation of STTS seeSchilleret al. (1999). A brief overviewof thetagset is givenin section A.1. Pointswherethe tagset used in TIGER differs from Schiller et al. (1999) are given insection A.2. The STTSdivides the wordforms of Germaninto the eleven maincategories shown in table 1. Thesemain categories are then subject to varyingdegrees of further subdivision. A tag for a word form consistsof the tag for themainword category foll owedby tagsfor thesubcategories.

Table2 uses the class of verbsasan example of the further subdivision of amain word category in STTS. It showshow verbs arefurther subdivided into fullverbs,auxiliary verbs,andmodalverbs,andhow these arefurther subdividedinto

2Wearecurrentlyin theprocessof addingadditionalword-level tags.In version2 of thetreebank,inflectedformsandcertainotherformswill betaggedwith categoriesrelevantfor inflection,andeachwordformwill betaggedwith its respective lemma.

2

Page 3: Tiger Introduction

Part of Speech VerbType FinitenessV Verb A Auxiliar FIN finit

INF infinitIMP imperativPP Partizip Perfekt

MModal FIN finitINF infinitPP Partizip Perfekt

V Voll FIN finitINF infinitIZU Infinitiv mit zuIMP imperativPP Partizip Perfekt

Table2: Subcategories of Verbsin STTS

finite formsandseveral categoriesof non-finite forms: infinitives,infinitiveswithzu, pastparticiplesandimperatives.Thefull labelfor averbform is aconcatenationof theabbreviation for themaincategory, verbtype, andfiniteness/infinitive-type.

(1) a. gewesenVAPP

b. konnteVMFIN

c. abzugebenVVIZU

A pastparticiple of anauxiliary verbwould thenbegiventhetagin (1a)(Verb,auxiliar, Partizip Perfekt). A finite form of a modalverbwould begiventhetagin(1b) (Verb, modal,finit), anda singleinfinitive wordform containing zuwould begiventhetagin (1c) (Verb,voll, infitit iv, zu). An exampleof partof speech taggingof a sentenceis givenin (2).

(2) JetztADV

solleVMFIN

erneutADJD

einART

AntragNN

gestelltVVPP

werdenVAI NF

.$.

A search for wordformsbelongingto a particular category is accomplished byusing an expression in the TIGER languageknown as a nodedescription. Thesimplest nodedescription consistsof anexpressionknown asa feature constraint.The simplest feature constraint consistsof a single feature-value pair. A featureanda valueareseparatedby anequal sign.

3

Page 4: Tiger Introduction

(3) a. [pos = "ART"]

b. [pos = "NN" ]

c. [pos = "VVFIN"]

The nodedescription (3a) describesa nodewith the value "ART" (Artikel)for the feature pos (part of speech). In otherwords,this feature description canbe usedto access an article. Similarly, the node description (3b) would access acommonnoun(anodewith thevalue"NN" , Nomen,normal, for thefeaturepos ).And (3c)wouldaccessafinite form of afull verb(anodewith thevalue"VVFIN" ,Verb,voll, finit, for thefeature pos ). It is of coursenot alwaysdesirableto searchsolely for the mostspecific category. Lessspecific categoriescanbe accessedbycombining morespecific categories. Oneway of accomplishing this is by usingBooleanoperators,anotherway is by using types.

Within theclass of verbs, for example, onecould searchspecifically for finiteforms of full verbs,using the nodedescription (4a). Onecould also search forfinite forms of auxiliary verbs with the nodedescription (4b), or for finite formsof modalverbswith the nodedescription (4c). Or onecould search for all finiteverb forms,usingthesingle feature constraint (4d) This node description uses theBooleanoperator for disjunction | . Parenthesesindicategrouping.

(4) a. [pos = "VVFIN"]

b. [pos = "VAFIN"]

c. [pos = "VMFIN"]

d. [pos = ("VV FIN" | "VAFIN" | "VMFIN" )]

Thedouble quotes in (4a–4c) signify constants, namelythestrings"VVFIN" ,"VAFIN" or "VMFIN" . Thesefeature constraints accept nodeswith the givenstring asa valuefor the feature pos . Theconstraint (4d) contains three constantsandthe Booleanoperator for disjunction andaccepts nodeswith a valueof either"VVFIN" , "VAFIN" or "VMFIN" . The Booleanexpression describesa setofpossible feature values. Theparenthesesindicate grouping.

(5) a. finit e := VAFIN, VVFIN, VMFIN;

b. [pos = fini te]

Another way of searching for finite verbsis by defining a type. A pre-definedtypehierarchy for part of speech hasbeen developedfor usewith theTIGERTree-bank (Version 1). Userscan also definetheir own type hierarchy (Konig et al.,2003). The type definition in (5a) definesa type for finite verb forms. The typefini te canthenbe used in a nodedescription suchas(5b) to search for finiteverbforms. A type,oncedefined, canbereused. It is ultimatelysimpler andmoreeconomical to usepre-definedtypes, ratherthanBoolean expressions for setsoffeature values which arefrequently searchedfor. Theusercandefinea hierarchyof typesfor eachfeature. A typedefinition hierarchy is stored in a file andis thusspecifically intendedfor reuse. Well-definedtypehierarchiescansimplify thecon-

4

Page 5: Tiger Introduction

struction of queries considerably. In constructing a typehierarchy, a usercanmapthe tagsetsusedin TIGER to a terminology of the usersown choosing,making iteasier to posequeriesin anintuitive way.

3 Syntactic Structure

Syntactic structure is expressedby meansof a graphstructure. Constituent cat-egories are encoded in node labels: non-terminal node labels representphrasalcategories;terminal nodesrepresentwordformsandaretagged asdescribedin sec-tion 2. Syntactic functions areencoded in edgelabels, discussedin section 4. Asimpleexamplefrom thetreebank is givenin figure1.

Constituentstructure treesin the TIGER Treebank tendto be flat; redundantbranching is eliminated. The distinction betweenargumentsandadjuncts is notmadein constituent structure,but is expressedvia syntacticfunctionsexpressedviaedgelabels(cf. section 4). In general, ergonomicshasbeen an important designcriterion. The elimination of redundant structures,which canbe found in manylinguistic descriptions but arenot actually needed for processing queries, resultsin treestructureswhich aremoreeasily viewed on screen. Two examplesof thiselimination of redundantinformationarethelack of non-branching NPnodes3 andthe encoding of prepositional phrases without a separate NP node for the nounphrase immediately dominated by the prepositional phrase4. In the former case,thecategory of theconstituent formedby thewordform is derivable from thepartof speech tag. In thelattercase,thesetof formscomprisingthenoun phrasewithintheprepositional phraseis completely predictable.

The branchesof a treemay cross(cf. figure 2), allowing local andnon-localdependenciesto beencodedin a uniform way andeliminating theneedfor traces.This approachhassignificant advantages for non-configurational languagessuchasGerman,which exhibit a rich inventory of discontinuousconstituency typesandconsiderable freedom with respectto word order.

Searching for syntactic structureswith theTIGERSearchsoftwareinvolvesex-pressing relations between nodes. A numberof predicatesfor expressingsuchrelationsareintroducedin Konig etal. (2003). Therearepredicatesfor expressingrelationsof dominance(seeexamples 6–8)andpredicatesfor expressingrelationsof linear precedence(seeexamples9–11). In addition, there arepredicates whichdirectly expressconstraints on graphs (seeexample12).

(6) a. [cat = "NP" ]

b. [cat = "NP" ] > [po s = "NN"]

3For anexampleof thelack of non-branchingnodes,seethepronoun sieandtheplural commonnounsVerpackungen andEtikettenin figure3

4For anexample,seethetwo prepositionalphrasesin figure1.

5

Page 6: Tiger Introduction

01

23

45

67

89

10

500

501

502

503

504

Der

AR

T

Par

teita

g

NN

der

AR

T

SP

D

NE

bega

nn

VV

FIN

am

AP

PR

AR

T

Mitt

woc

h

NN

um

AP

PR

15.1

5

CA

RD

Uhr

NN

. $.

NK

NK

AC

NK

AC

NK

NK

NK

NK

NP

AG

NP

SB

HD

PP

MO

PP

MO

S

Fig

ure

1:B

asic

Con

stitu

entS

truc

ture

6

Page 7: Tiger Introduction

01

23

45

67

89

1011

1213

500

501

502

503

504

505

506

Unt

er

AP

PR

ande

rem

PIS

wur

de

VA

FIN

ein

AR

T

Bet

eilig

ungs

mod

ell

NN

gesc

haffe

n

VV

PP

, $,

das

PR

ELS

heut

e

AD

V

insg

esam

t

AD

V

17

CA

RD

Ges

ells

chaf

ter

NN

umfa

ßt

VV

FIN

. $.

AC

NK

MO

HD

PP

MO

HD

AP

NK

NK

SB

MO

NP

OA

HD

NK

NK

SRC

HD

VP

OC

NP

SB

S

Fig

ure

2:D

isco

ntin

uous

Con

stitu

ents

7

Page 8: Tiger Introduction

Thequery in (6a) is matchedby all noun phrases,that is, nodeswith thevalueNPfor the featurecat (category). Thequery in (6b) usestheoperator for imme-diate dominance>. This operator expresses a relation of immediatedominancebetweentwo nodes. Thequerymatchesany nounphrasewhich immediately dom-inates a commonnoun. More precisely, it matches any structuresconsisting of anodewith thevalueNP for the feature cat which immediately dominatesa nodewith thevalueNNfor thefeature pos .

(7) a. #xyz

b. #xyz: [cat = "NP" ]

c. #np:[ cat = "NP"] &#np > [po s = "NN"]

Whenformulating a query, it is often necessary to refer to a particular nodemorethanonce. This is doneby usingvariables(Konig et al., 2003). Thesymbol# indicates that what follows is a variable name. Example(7a) shows a validvariable name. It is also a simple query, matching any node, as the variable isnot constrained. Variablesmay be constrained, as in example(7b), in which thevariable xyz is constrained to matchnodes of type [ca t = "NP"] . Note theuseof thesymbol : to indicatethat thevariable is constrained. A simpleexampleof theuseof a variable is given in (7c). Here,a variable nameis chosen which ismnemonically convenient, a practice which makesmorecomplex querieseasier toread.This queryaccessesthesamesetof nodesasthequery in example (6b). It isnotnecessaryto useavariablehere; theexampleis intendedto illustratethesyntaxfor usingvariablesfor nodedescriptions.Thequery in example(8), however, couldnot have beenformulatedwithout theuseof a variable.

(8) #np:[cat = "NP"] &#np > [pos = "ART"] &#np > [pos = "NE"]

In example(8), we seethreelines,each containinganexpressionrepresentinga condition on syntactic structure. The expressions in these lines are joined bythe Boolean operator for conjunction (&). In the first line, we seea constrainedvariable: thereis a node#np with the value "NP" for the feature cat . In thesecond line of thequerythecondition is expressedthat thenode referredto by thevariable #np dominatesa nodewith the valueART for the feature pos . In thethird line, thecondition is expressedthatthenode#np alsodominatesanodewiththevalue NE for the feature pos . Thequery is thusmatched by any noun phrasewhich immediately dominatesbothanarticle (ART) anda propernoun (NE).

(9) [wo rd = "noc h"] . [wo rd = "nic ht"]

Thequeries in examples (9–11) demonstratetheuseof two operatorsfor rela-tions of precedence: . and .* . The operator . indicatesimmediate precedence.Thequeryin example(9) would accessall instancesin which thewordform nochimmediately precedesthewordform nicht.

8

Page 9: Tiger Introduction

(10) #s: [cat = "S"] &#n: [cat = "NP" ] &#v: [pos = "VAFIN"] &#s >SB #n &#s > #v &#v .* #n

A more complex example of the useof a predicate expressinga relation oflinear precedenceis given in (10). Thequery would accessall sentencesin whicha noun phrasefunctioning assubject is precededby a finite form of an auxiliaryverb,suchasthesentencein figure2. Theoperator .* denotesprecedence whichmay be, but neednot be, immediate. The first three lines of the query containexpressionsconstraining the three variables: #s , #n and#v . In the fourth line,a relation of labeled immediatedominanceis expressed(seesection 4). In otherwords,#n is thesubject(SB) of #s . Thefifth line containsanexpressionindicatingthat #v is immediately dominated by #s . The last line contains an expressionindicatingthat#v occurs at somepoint in thesentencebefore #n .

(11) #s: [cat = "S"] >SB #n:[ cat = "NP"] &#s > #v:[p os = "VAFIN"] &#v .* #n

Thequery (10) canbe reformulatedin a morecompact way. In (10), the firstthreelines simply constrain thethreevariablesused. This makesthequeryeasytoread.It is not necessaryto do this however, asseenin theequivalentqueryin (11).It is amatterof personal tastewhether theformulationin (10) or (11) is preferable.For thosenotaccustomedto using query languages, it maybebetter in thelongrunto begin writing queries that areformulatedin a simplermorestructuredstyle, asdebugging queries which arehardto understand requiresconsiderably moretimethandoesa bit of extra typing.

(12) a. #n:[c at=" NP"] & disc onti nuous (#n)

b. #n:[c at=" NP"] & cont inuo us(#n )

The queries in (12) show two examples of predicateson graphs. The no-tion discontinuous constituent can be expressedby combining the use of rela-tions of dominance with the useof relations of linear precedence. Expressingthis in a query would not besimple, however. Thepredicatescont inuou s anddisc onti nuou s areincludedin theTIGERlanguageto provideasimplewayofdifferentiating continuous anddiscontinuousconstituents. Thequery (12a) wouldaccess all discontinuousnounphrases,suchasthat in figure2 andthequery (12b)would access all continuousnounphrases.Note thatpredicateson graphsrequiretheuseof a variable. For other predicateson graphs,see Konig et al. (2003).

9

Page 10: Tiger Introduction

502�

500�

0�

1�

2�

3�

4�

5�

6�

7�

500�

501�

502�

503�

Sie

PPER

entwickelt

VVFIN

und

KON

druckt

VVFIN

Verpackungen

NN

und

KON

Etiketten

NN

.

$.

SB�

HD

CJ

CD

CJ

HD

CNPOA

S�

CJ

CD

S�CJ

CS

SB�

OA�

Figure3: SecondaryEdges

4 Syntactic Relations

Syntactic relations areencodedon the edgedrawn from a parent node to a childnode. Thesentencein figure 1 containsa nounphrase(NP) which itself containsanother nounphrase, the latter functioning asa genitive attribute (AG). This syn-tactic relation is indicatedby a label on theedgedrawn from parentto child. It ispossible to search for pairsof nodes which arein a particular syntacticrelation byusinganoperatorfor labeled immediatedominancesuchas>AG(attribut, genitive)or >RC(relative clause) in (13).

(13) a. [cat = "NP" ] >AG [ca t = "NP"]

b. [] >RC [cat = "S"]

The expression in (13a)would access all nodes of the category noun phrase(NP) which themselveshave a child of thecategory nounphrase(NP) which func-tionsasa genitive attribute(AG). Theexpressionin (13b) would bematchedby allnodes of any kind which have a daughter which is a sentence(S) which functionsasa relative clause(RC). Thenode description [] containsno feature-valuepairsandis thusmatched by any node. Thequery(13b)would, for example,accessthesubject of thesentencein figure2.

Syntactic relationswhich obtain betweennodeswhich arenot in a relation ofimmediatedominance are encodedvia secondary edges. In figure 3, the initialwordform Siefunctionsnot only assubject of thefirst of thetwo coordinatedsen-tences, but alsoassubject of the second. Likewise, the coordinated nounphrase(CNP) functionsasaccusative object of both sentences.

(14) a. [] >˜SB []

b. [] >˜OA []

Thequery(14a) would access all casesin which thesecondarysyntactic rela-tion of subject (SB) holds betweentwo nodes. The query(14b) cansimilarly beusedto searchfor secondaryaccusative objects(OA).

10

Page 11: Tiger Introduction

(15) a. #n:[c at=" NP"] >RC [cat ="S"] & dis conti nuou s(#n )

b. #n:[c at=" NP"] >RC [cat ="S"] & con tinuo us(# n)

The queries in (15) usetwo predicateson graphswe saw in section 3 alongwith a relation of labeled immediate dominance.Thequery (15a)would accessalldiscontinuousnoun phraseswhich immediately dominatea sentencewhich func-tionsasa relative clause.This query would accessnounphrasescontaining extra-posed relative clausessuchasthe subject of the sentence in figure 2. The query(15b) would access nounphraseswhich immediately dominatea sentencewhichfunctions as a relative clause, but which are not discontinuous, excluding thosecontaining extraposedrelativeclauses.

5 Conclusion

This documentwasintendedasa brief introduction to theTIGER Treebank (Ver-sion 1). The best way to explore the treebankat this point is to experiment withtheTIGERSearch query tool, possibly modifying andexpandinguponthevariousqueriesgiven here. An introduction to the query language,including a completelist of operators, canbe found in Konig et al. (2003). For a formal definition ofthequery languageseeKonig andLezius(2003). Examplesof queriesrelating toword structure canbe found in Smith (2003). Theseandotherdocuments canbefound on theTIGERwebsite,located at:

http://www.ims.uni-stuttgart.de/projekte/TIGER/

11

Page 12: Tiger Introduction

A Tagsets

A.1 The Tagset for Part of Speech

Thetagset for partof speech is theSTTS tagset(Schiller et al., 1999) with minormodificationsoutlinedin A.2. As the abbreviationsof the STTStagset arebasedon Germanterminology, it may be easierfor readersof Germanto usethe origi-nal documentation in Schiller et al. (1999). Englishtranslations aregivenhereforthoselessfamiliar with Germanlinguistic terminology.

ADJA adjective,attributive [das]große[Haus]ADJD adjective,adverbial or predicative [er fahrt] schnell, [er ist] schnellADV adverb schon, bald,dochAPPR preposition; circumposition left in [der Stadt],ohne [mich]APPRART preposition with article im [Haus],zur [Sache]APPO postposition [ihm] zufolge,[der Sache]wegenAPZR circumposition right [von jetzt] anART definite or indefinite article der, die,das, ein,eine,...CARD cardinal number zwei [Manner], [im Jahre] 1994FM foreign languagematerial [Er hatdasmit “] A big fish [”

ubersetzt]ITJ interjection mhm,ach, tjaKOUI subordinate conjunction with zu

andinfinitiveum [zu leben], anstatt [zu fragen]

KOUS subordinate conjunction withsentence

weil, daß,damit,wenn,ob

KON coordinate conjunction und,oder, aberKOKOM comparative conjunction als,wieNN commonnoun Tisch,Herr, [das]ReisenNE proper noun Hans,Hamburg, HSVPDS substituting demonstrativepronoun dieser, jenerPDAT attributive demonstrativepronoun jener[Mensch]PIS substituting indefinite pronoun keiner, viele, man,niemandPIA T attributive indefinite pronoun

without determinerkein [Mensch], irgendein[Glas]

PIDAT attributive indefinite pronounwithdeterminer

[ein] wenig[Wasser], [die] beiden[Bruder]

PPER non-reflexive personal pronoun ich, er, ihm, mich,dirPPOSS substituting possessive pronoun meins,deinerPPOSAT attributive possessive pronoun mein[Buch], deine[Mutter]

12

Page 13: Tiger Introduction

PRELS substituting relativepronoun [der Hund,] derPRELAT attributive relativepronoun [der Mann,] dessen[Hund]PRF reflexive personalpronoun sich, dich, mirPWS substituting interrogative pronoun wer, wasPWAT attributive interrogative pronoun welche [Farbe],wessen[Hut]PWAV adverbial interrogativeor relative

pronounwarum,wo, wann,woruber, wobei

PAV pronominal adverb dafur, dabei, deswegen,trotzdemPTKZU zubefore infinitive zu [gehen]PTKNEG negative particle nichtPTKVZ separable verbalparticle [er kommt] an,[er fahrt] radPTKANT answerparticle ja, nein,danke, bittePTKA particle with adjective or adverb am[schonsten], zu [schnell]SGML SGML markup � turnid=n022k TS2004 SPELL lettersequence S-C-H-W-E-I-K-LTRUNC word remnant An- [und Abreise]VVFIN finite verb,full [du] gehst, [wir] kommen[an]VVI MP imperative, full komm[!]VVI NF infinitive, full gehen,ankommenVVI ZU Infinitive with zu, full anzukommen,loszulassenVVPP perfect participle, full gegangen,angekommenVAFIN finite verb,auxiliary [du] bist, [wir] werdenVAI MP imperative,auxiliary sei[ruhig !]VAI NF infinitive,auxiliary werden, seinVAPP perfect participle, auxiliary gewesenVMFIN finite verb,modal durfenVMINF infinitive,modal wollenVMPP perfect participle, modal gekonnt,[er hatgehen] konnenXY non-word containing non-letter 3:7, H2O,D2XW3$, comma ,$. sentence-finalpunctuation mark . ? ! ; :$( othersentence-internal punctuation

mark- [,]()

A.2 Deviation from STTS in the TIGER Treebank� PIDAT vs. PIAT

No distinctionis madebetweenPIAT andPIDAT. PIAT is usedfor indefi-nite pronounsin anattributive function bothwith andwithout determiner.

� ADVPrepositions aretaggedasADVwhenthemodify numerals.

� PAVvs. PROAVThetagPROAV replacestheSTTStagPAV.

13

Page 14: Tiger Introduction

A.3 The Tagset for Node Labels

AA superlative phrasewith amAP adjective phraseAVP adverbial phraseCAC coordinatedadpositionCAP coordinatedadjective phraseCAVP coordinatedadverbialphraseCCP coordinatedcomplementiserCH chunkCNP coordinatednoun phraseCO coordinationCPP coordinatedadpositional phraseCS coordinatedsentenceCVP coordinatedverbphrase(non-finite)CVZ coordinatedinfinitive with zuDL discourselevel constituentISU idiosyncratic unitMTA multi-tokenadjectiveNM multi-tokennumberNP nounphrasePN proper nounPP adpositional phraseQL quasi-languageS sentenceVP verbphrase(non-finite)VZ infinitive with zu

14

Page 15: Tiger Introduction

A.4 The Tagset for Edge Labels

AC adpositional casemarkerADC adjective componentAG genitive attributeAMS measure argumentof adjectiveAPP appositionAVC adverbial phrase componentCC comparative complementCD coordinating conjunctionCJ conjunctCM comparative conjunctionCP complementizerCVC collocational verbconstruction (Funktionsverbgefuge)DA dativeDH discourse-level headDM discoursemarkerEP expletive esHD headJU junctorMNR postnominalmodifierMO modifierNG negationNK nounkernel elementNMC numerical componentOA accusative objectOA second accusative objectOC clausal objectOG genitive objectOP prepositional objectPAR parenthetical elementPD predicatePG phrasalgenitivePH placeholderPM morphological particlePNC proper nouncomponentRC relative clauseRE repeatedelementRS reportedspeechSB subject

15

Page 16: Tiger Introduction

SBP passivisedsubject(PP)SP subjector predicateSVP separableverbprefixUC unit componentVO vocative

References

Brants,Thorsten, Wojciech Skut,andHansUszkoreit. 1999. Syntactic annotationof a Germannewspaper corpus. In Proceedings of the ATALA Treebank Work-shop, 69–76. Paris,France.

Konig, Esther, andWolfgangLezius.2003. TheTIGER language- a descriptionlanguagefor syntax graphs, Formaldefinition. Tech.rep., IMS, University ofStuttgart.

Konig, Esther, Wolfgang Lezius, and Holger Voormann. 2003. TIGERSearchUser’s Manual. IMS, Universityof Stuttgart, Stuttgart.

Lezius, Wolfgang. 2002. Ein suchwerkzeug fur syntaktisch annotierte textkor-pora. Ph.D.thesis, IMS, University of Stuttgart. ArbeitspapieredesInstituts urMaschinelle Sprachverarbeitung (AIMS), Volume8, Number4.

Schiller, Anne, SimoneTeufel, Christine Stockert, and Christine Thielen.1999.Guidelines fur das Tagging deutscher Textcorpora mit STTS. UniversitatStuttgartandUniversitat Tubingen.

Skut, Wojciech, Thorsten Brants, Brigitte Krenn, and HansUszkoreit. 1998. Alinguistically interpreted corpus of Germannewspaper text. In Proceedingsof theConferenceon Language Resourcesand Evaluation LREC-98, 705–711.Granada, Spain.

Skut,Wojciech, Brigitte Krenn, Thorsten Brants,andHansUszkoreit. 1997. Anannotationschemefor freeword order languages. In Proceedingsof ANLP-97.Washington.

Smith,George. 2003. Searching for morphological structurewith regular expres-sions. Tiger Projektbericht, Univ. Potsdam.

16