Upload
victoria-strickland
View
215
Download
1
Embed Size (px)
Citation preview
The Rule-based Parser of the NLP Group of the University of Torino
Leonardo LesmoDipartimento di Informatica and
Centro di Scienze Cognitive,Università di Torino,
ItalyEmail: [email protected]
GoalsWide-coverage toolDomain-independence
ApproachManually developed rulesTwo phases: Chunking and subcategorization
Extensibility to semantics
Procedural analysis of conjunctions and of identification of verbal dependents
TULE (Turin University Linguistic Environment)
TOKENIZER
Tokens
Text
Token Automaton Splits the text into words, numbers, punctuation marks
DICTIONARY LOOKUP
Sets of lexical items
Morphological dictionary
Suffix tables
Extracts all lexical interpretations of each token
POS TAGGERTagging rules Chooses one lexical interpretation
Lexical items
DEPENDENCY PARSER
Parse Tree
Parsing rules
Verbal Caseframes
Establishes the connections between lexical items
The grammarRule-based dependency grammar
Chunking (non-verbal groups) + verbal subcategorization frames
Output: a projective tree represented as pointers to parents, including some null elements (understood items – e.g. pro-drop - and traces)
CHUNKING
Chunked text
Lexical Items
Chunking rules Splits the text into groups of strictly connected words
ANALYSIS OF CONJUNCTIONS
Chunked text
Procedural preference rules 1
Connects chunks linked by conjunctions, to form larger chunks
SEGMENTATIONDetermines the dependents of verbs
Lexical items
VERBAL ATTACHMENT
Parse Tree
Verb classes
Verbal Caseframes
Determines the role (arc labels) of the verbal dependents
Parser Architecture
Procedural preference rules 2
An exampleExample: Slitta a Tirana la decisione sullo stato di emergenza.
(The decision on the emergency status in Tirana has been delayed)
1 Slitta (SLITTARE VERB MAIN IND PRES INTRANS 3 SING)2 a (A PREP MONO)3 Tirana (TIRANA NOUN PROPER F SING ££CITY) 4 la (IL ART DEF F SING)5 decisione (DECISIONE NOUN COMMON F SING DECIDERE INTRANS)6 sullo ((SU PREP MONO) 6.10 (IL ART DEF M SING))7 stato (STATO NOUN COMMON M SING)8 di (DI PREP MONO)9 emergenza (EMERGENZA NOUN COMMON F SING)10 . (#\. PUNCT)
Lexical Items
[0;TOP-VERB][1;PREP-RMOD] [2;PREP-ARG][1;VERB-SUBJ] [4;DET+DEF-ARG][5;PREP-RMOD][6;PREP-ARG][6.10;DET+DEF-ARG][7;PREP-RMOD] [8;PREP-ARG][1;END]
Parse Tree Infos
1: Slitta
Prep-rmod
2: a
Verb-subj
4: la
3: Tirana
Prep-arg
5; decisione
Det+def-arg
6: su
Prep-rmod
Prep-arg
6.10: lo
Stato di emergenza
PuoiV-modal-2nd-sing-pres dirV-inf [miPron-1st-dative]Pron
[cheAdj-interr spettacoliNoun [diPrep cabaretNoun]P-group ]N-group possoV-modal-1st-sing-pres vedereV-inf [domaniAdv]A-group?
ChunkingExample: Puoi dirmi che spettacoli di cabaret posso vedere domani? (Can you tell me what cabaret plays I can see tomorrow?)
Chunking Rules Chunking rules are grouped in packets. Each packet is associated with a lexical category, and describes
the “chunkable” possible dependents of words of that category. Chunkable means a dependent handled during chunking (e.g.
auxiliaries, but not arguments of verbs)
(NOUN common(precedes (ADJ qualif T (#\- #\' #\")) (ADJ ((type qualif) (agree)))ADJC+QUALIF-RMOD))
A chunk rule
Packet (governing word)
feature (constrains applicability)
Label of connecting arc
Category of possible dep (and constraints on it)
Position of dep (and possible words separating head from dep)
Conjunctions When a coordinating conjunction is found, all following and
preceding chunks are collected All pairs are built, and the best one is chosen according to
criteria based on structural similarity and distance Special treatment for verbs
HoV-aux incontratoV-main
[MarcoNoun-Proper]Noun eConj-coord [LuciaNoun-Proper]Noun eConj-coord [liPron-pers ]Pron hoV-aux salutatiV-main
Example: Ho incontrato Marco e Lucia e li ho salutati (I met Marco e Lucia and I greeted them)
Segmentation
PuoiV-modal-2nd-sing-pres { dirV-inf [miPron-1st-dative]Pron
{[cheAdj-interr spettacoliNoun [diPrep cabaretNoun]P-group ]N-group possoV-modal-1st-sing-pres {vedereV-inf [domaniAdv]A-group? } } } }
For each verb (going from left to right): Look for possible dependents (on its right and left) On the left, the search is blocked from the previous verb On the right, some “barriers” are defined to stop the search (for
instance, a subordinating conjunction acts as a barrier)
Verbal Subcategorization
verbs
nosubj-verbs
subj-verbs
obj-verbs
basic-transempty-modal
modal
ssubj-inf-verbs
trans
indobj-verbs
trans-indobj
subcategorization classes
bisognare
camminare
dovere
dictionary
potere
need
walk
must
can
The subcategorization classes:
(subj-verbs (intrans) (verbs) ; *** verbs with a subject. Definition of subject
( verb-subj ((noun (agree)) (art (agree)) (pron (not (word quale) (type relat)) (case lsubj) (agree)) (adj (type (indef demons deitt interr poss)) (agree)) (num (agree)) (prep (word in) (down (cat pron) (type indef)) (agree)))))
(ssubj-inf-verbs () (verbs) ; *** verbs with an inf-verb sentential subject
( verb-subj ((verb (mood infinite) (agree)))))
(empty-modal () (no-subj-verbs) ; *** modals without subject ( verb-indcompl-modal ((verb (mood infinite)))))
Example subcategorization class definitions:
Transformations:
basic class (e.g. trans) transformed classes (e.g. trans, trans+passivization,trans+infinitivization,trans+prodrop,trans+passivization+infinitivization,….. )
Example transformation:(infinitivization replacing (subj-verbs) (is-inf-form tr-verb v-casefr) (cancel-case s-subj))
Some statistics
Chunking rulesTotal: 295 rulesCommon: 250 rulesEnglish: 34 rulesItalian: 7 rulesSpanish + Catalan: 4 rules
Base SubcategorizationTotal: 118 classesAbstract: 21 classes
plus verbal locutionsItalian: 40 classesEnglish: 1 class
Derived surface case frames 2653 case frames
Conclusions Test of the parser on other languages, using the same grammar
augmented with extra rules (see previous slide)
Partial use of semantic information (about 400 words classified according to a semantic taxonomy)
The parser has been used in a project involving spoken and written linguistic interaction with a user. It has been interfaced with an repository of semantic knowledge to build a meaning representation.