15
The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università di Torino, Italy Email: [email protected]

The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

Embed Size (px)

Citation preview

Page 1: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

The Rule-based Parser of the NLP Group of the University of Torino

Leonardo LesmoDipartimento di Informatica and

Centro di Scienze Cognitive,Università di Torino,

ItalyEmail: [email protected]

Page 2: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

GoalsWide-coverage toolDomain-independence

ApproachManually developed rulesTwo phases: Chunking and subcategorization

Extensibility to semantics

Procedural analysis of conjunctions and of identification of verbal dependents

Page 3: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

TULE (Turin University Linguistic Environment)

TOKENIZER

Tokens

Text

Token Automaton Splits the text into words, numbers, punctuation marks

DICTIONARY LOOKUP

Sets of lexical items

Morphological dictionary

Suffix tables

Extracts all lexical interpretations of each token

POS TAGGERTagging rules Chooses one lexical interpretation

Lexical items

DEPENDENCY PARSER

Parse Tree

Parsing rules

Verbal Caseframes

Establishes the connections between lexical items

Page 4: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

The grammarRule-based dependency grammar

Chunking (non-verbal groups) + verbal subcategorization frames

Output: a projective tree represented as pointers to parents, including some null elements (understood items – e.g. pro-drop - and traces)

Page 5: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

CHUNKING

Chunked text

Lexical Items

Chunking rules Splits the text into groups of strictly connected words

ANALYSIS OF CONJUNCTIONS

Chunked text

Procedural preference rules 1

Connects chunks linked by conjunctions, to form larger chunks

SEGMENTATIONDetermines the dependents of verbs

Lexical items

VERBAL ATTACHMENT

Parse Tree

Verb classes

Verbal Caseframes

Determines the role (arc labels) of the verbal dependents

Parser Architecture

Procedural preference rules 2

Page 6: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

An exampleExample: Slitta a Tirana la decisione sullo stato di emergenza.

(The decision on the emergency status in Tirana has been delayed)

1 Slitta (SLITTARE VERB MAIN IND PRES INTRANS 3 SING)2 a (A PREP MONO)3 Tirana (TIRANA NOUN PROPER F SING ££CITY) 4 la (IL ART DEF F SING)5 decisione (DECISIONE NOUN COMMON F SING DECIDERE INTRANS)6 sullo ((SU PREP MONO) 6.10 (IL ART DEF M SING))7 stato (STATO NOUN COMMON M SING)8 di (DI PREP MONO)9 emergenza (EMERGENZA NOUN COMMON F SING)10 . (#\. PUNCT)

Lexical Items

[0;TOP-VERB][1;PREP-RMOD] [2;PREP-ARG][1;VERB-SUBJ] [4;DET+DEF-ARG][5;PREP-RMOD][6;PREP-ARG][6.10;DET+DEF-ARG][7;PREP-RMOD] [8;PREP-ARG][1;END]

Parse Tree Infos

1: Slitta

Prep-rmod

2: a

Verb-subj

4: la

3: Tirana

Prep-arg

5; decisione

Det+def-arg

6: su

Prep-rmod

Prep-arg

6.10: lo

Stato di emergenza

Page 7: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

PuoiV-modal-2nd-sing-pres dirV-inf [miPron-1st-dative]Pron

[cheAdj-interr spettacoliNoun [diPrep cabaretNoun]P-group ]N-group possoV-modal-1st-sing-pres vedereV-inf [domaniAdv]A-group?

ChunkingExample: Puoi dirmi che spettacoli di cabaret posso vedere domani? (Can you tell me what cabaret plays I can see tomorrow?)

Chunking Rules Chunking rules are grouped in packets. Each packet is associated with a lexical category, and describes

the “chunkable” possible dependents of words of that category. Chunkable means a dependent handled during chunking (e.g.

auxiliaries, but not arguments of verbs)

Page 8: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

(NOUN common(precedes (ADJ qualif T (#\- #\' #\")) (ADJ ((type qualif) (agree)))ADJC+QUALIF-RMOD))

A chunk rule

Packet (governing word)

feature (constrains applicability)

Label of connecting arc

Category of possible dep (and constraints on it)

Position of dep (and possible words separating head from dep)

Page 9: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

Conjunctions When a coordinating conjunction is found, all following and

preceding chunks are collected All pairs are built, and the best one is chosen according to

criteria based on structural similarity and distance Special treatment for verbs

HoV-aux incontratoV-main

[MarcoNoun-Proper]Noun eConj-coord [LuciaNoun-Proper]Noun eConj-coord [liPron-pers ]Pron hoV-aux salutatiV-main

Example: Ho incontrato Marco e Lucia e li ho salutati (I met Marco e Lucia and I greeted them)

Page 10: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

Segmentation

PuoiV-modal-2nd-sing-pres { dirV-inf [miPron-1st-dative]Pron

{[cheAdj-interr spettacoliNoun [diPrep cabaretNoun]P-group ]N-group possoV-modal-1st-sing-pres {vedereV-inf [domaniAdv]A-group? } } } }

For each verb (going from left to right): Look for possible dependents (on its right and left) On the left, the search is blocked from the previous verb On the right, some “barriers” are defined to stop the search (for

instance, a subordinating conjunction acts as a barrier)

Page 11: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

Verbal Subcategorization

verbs

nosubj-verbs

subj-verbs

obj-verbs

basic-transempty-modal

modal

ssubj-inf-verbs

trans

indobj-verbs

trans-indobj

subcategorization classes

bisognare

camminare

dovere

dictionary

potere

need

walk

must

can

The subcategorization classes:

Page 12: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

(subj-verbs (intrans) (verbs) ; *** verbs with a subject. Definition of subject

( verb-subj ((noun (agree)) (art (agree)) (pron (not (word quale) (type relat)) (case lsubj) (agree)) (adj (type (indef demons deitt interr poss)) (agree)) (num (agree)) (prep (word in) (down (cat pron) (type indef)) (agree)))))

(ssubj-inf-verbs () (verbs) ; *** verbs with an inf-verb sentential subject

( verb-subj ((verb (mood infinite) (agree)))))

(empty-modal () (no-subj-verbs) ; *** modals without subject ( verb-indcompl-modal ((verb (mood infinite)))))

Example subcategorization class definitions:

Page 13: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

Transformations:

basic class (e.g. trans) transformed classes (e.g. trans, trans+passivization,trans+infinitivization,trans+prodrop,trans+passivization+infinitivization,….. )

Example transformation:(infinitivization replacing (subj-verbs) (is-inf-form tr-verb v-casefr) (cancel-case s-subj))

Page 14: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

Some statistics

Chunking rulesTotal: 295 rulesCommon: 250 rulesEnglish: 34 rulesItalian: 7 rulesSpanish + Catalan: 4 rules

Base SubcategorizationTotal: 118 classesAbstract: 21 classes

plus verbal locutionsItalian: 40 classesEnglish: 1 class

Derived surface case frames 2653 case frames

Page 15: The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università

Conclusions Test of the parser on other languages, using the same grammar

augmented with extra rules (see previous slide)

Partial use of semantic information (about 400 words classified according to a semantic taxonomy)

The parser has been used in a project involving spoken and written linguistic interaction with a user. It has been interfaced with an repository of semantic knowledge to build a meaning representation.