1. Lexicon-Grammar tables for French | 2. LGLex lexicon | 3. DELA lexicon | 4. Evaluation
Adverbial entries in various lexicons of French
Elsa Tolone (1)
(1) FaMAF, Universidad Nacional de Córdoba (Argentina)
Workshop on Natural Language Processing 2012
Scientific Seminar STIC-AmSud 2012
InCo, Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay
November 8, 2012
Context in the 1970s
- Syntactic lexicons for NLP applications
  → syntactic and semantic information about predicates (verbs, nouns or adjectives)
- Work in syntax aimed at general rules, e.g., Chomsky's transformation rules
  → the question is: can a given general rule be applied to every word?
- Objective of M. Gross: to create a large-coverage lexical resource
  → Lexicon-Grammar tables
Context now
- Lexicon-Grammar tables for French are a large-coverage lexical resource
- They contain syntactic and semantico-syntactic information
- Such information is useful for parsing
- Experiments have been carried out to integrate them into a real-life symbolic parser [Tolone 2011]
- We continue to improve the data, with a focus on adverbs
  → conversion into lexicons usable for different NLP tasks [Laporte & Voyatzi 2008] [Agirre et al. 2008]
1. Lexicon-Grammar tables for French
2. LGLex lexicon
3. DELA lexicon
4. Evaluation
1. Lexicon-Grammar tables for French
Lexicon-Grammar tables
Developed manually for over 40 years by the LADL group [Gross 1975] and the Computational Linguistics Group of LIGM (Université Paris-Est)
Methodology:
- Study the syntax of basic sentences (or subcategorization frames)
  e.g.: N0 V N1
- Study French verbs, adverbs, predicative nouns and adjectives, and frozen expressions
  → entries that share some features are grouped into classes
- Distinguish the different meanings of a word
  e.g.: se rendre 'to surrender / to accept' - 'rendirse: capitular / aceptar'
Principle
- Each class is described in a table:
  - one row for each (lemma-level) entry
  - one column for each feature relevant to the class
  - in each cell, + (resp. −) = the corresponding feature is valid (resp. not valid) for the corresponding entry
- A class is defined by a set of "defining features"
- For a given table, the defining features include:
  - a basic defining feature (a subcategorization frame)
  - often additional features (distributional, morphological, transformational, semantic, etc.)
    e.g.: N0 =: Nhum → human noun
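The row/column principle above can be sketched in a few lines of Python. This is only an illustration: the entry names and the +/- values are invented, and the three feature names are borrowed from examples elsewhere in these slides rather than from a real table.

```python
# Toy Lexicon-Grammar table: one row per entry, one column per feature.
# Entry names and cell values are invented for illustration only.
FEATURES = ["N0 =: Nhum", "N0 V", "N1 =: Nhum"]
TABLE = {
    "entry_a": ["+", "+", "-"],
    "entry_b": ["+", "-", "+"],
    "entry_c": ["-", "+", "+"],
}

def accepts(entry, feature):
    """Return True if the cell for (entry, feature) is marked '+'."""
    return TABLE[entry][FEATURES.index(feature)] == "+"

def entries_with(feature):
    """List the entries for which the given feature is valid."""
    return [entry for entry in TABLE if accepts(entry, feature)]

print(entries_with("N0 V"))
```

Storing the feature names once and the cells per row mirrors the table layout, so adding an entry means adding one row.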
Table V 33
Defining feature: N0 V à N1
Inventory
- Inventory [Tolone 2011]:
  - 67 classes of simple verbs
    - 13 872 entries for 5 738 distinct lemmas
  - 81 classes of simple and compound predicative nouns (nouns with argument(s) that are studied with their light verb)
    - 14 271 entries for 10 112 distinct lemmas
    e.g.: Luc monte une attaque contre le fort 'Luc is launching an attack against the fort'
  - 69 classes of frozen expressions, mostly verbal and adjectival
    - 39 628 entries for 38 658 distinct lemmas
    e.g.: Tu n'arrives pas à la cheville de Marie 'You don't hold a candle to Mary' - 'no llegarle a los talones / a la suela del zapato', literally 'You don't arrive at the ankle of Mary' - 'no llegarle al tobillo'
  - 32 classes of simple and frozen adverbs
    - 10 488 entries for 9 326 distinct lemmas
    e.g.: difficilement 'with difficulty' - 'difícilmente' & [changer] du jour au lendemain '[to change] overnight' - '[cambiar] de un día para otro'
Defining features of adverbs
Table     Defining features     Example
ADVM...   N0 V Adv W            Ce livre est en vente exclusivement sur ce site
                                *Exclusivement, ce livre est en vente sur ce site
ADVMP     N0 V Adv W            Ce livre est en vente régulièrement sur ce site
ADVMS     Adv, N0 V W           Régulièrement, ce livre est en vente sur ce site
ADVMTF                          *Régulièrement, ce livre n'est pas en vente sur ce site
ADVMP     N0 V Adv W            Ce concert est musicalement une réussite
          Adv, N0 V W           Musicalement, ce concert est une réussite
          Adv, N0 ne V pas W    Musicalement, ce concert n'est pas une réussite
Simple adverbs in -ment (Table ADVMS)
Defining structure: Adv
Defining features: N0 V Adv W & Adv, N0 V W
→ 3 203 simple adverbs in -ment '-mente' [Molinier & Levrier 2000]
Defining structures of simple and frozen adverbs
Simple and frozen adverbs (Table PCA)
Defining structure: Prep Det C Modif pre-adj Adj
Defining features: N0 V Adv W & Adv, N0 V W & Adv, N0 ne V pas W
→ 7 285 simple and frozen adverbs [Gross 1986b] [Gross 1986a]
2. LGLex lexicon
LGLex
The improvement of the tables enables the extraction of a syntactic lexicon for each category from the Lexicon-Grammar tables [Constant & Tolone 2010]:
- named the LGLex lexicon
- generated from the original Excel or CSV tables by the LGExtract tool
- an exchange format based on the same linguistic concepts as the tables
- text or XML format
- version 3.4 includes LGLex for each category, available at http://infolingu.univ-mlv.fr/
Format of LGLex lexicon
- ID=category numTable numEntry
- lexical-info=[...] → lemma and lexical information (auxiliaries, support verbs, determiners, prepositions)
- args=(...) → arguments and their nature, with other information (semantic features, mood of the complementizer phrase, argument controlled by the infinitive, prepositions)
- all-constructions=[...] → list of accepted constructions
- example=[...] → an illustrative example of the entry
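As a small illustration of the first field, the sketch below splits an ID line into its category, table and entry number. The function and class names are hypothetical, and the separator is an assumption: the IDs appear in these slides with spaces (for instance "V 33 131"), so the sketch accepts either spaces or underscores.

```python
import re
from typing import NamedTuple

class EntryId(NamedTuple):
    category: str  # e.g. "V" for a verb entry
    table: str     # table name within the category, e.g. "33"
    entry: int     # entry number within the table

def parse_id(line: str) -> EntryId:
    """Parse an 'ID=...' line (hypothetical helper, separator assumed)."""
    if not line.startswith("ID="):
        raise ValueError(f"not an ID line: {line!r}")
    parts = re.split(r"[ _]+", line[len("ID="):].strip())
    if len(parts) != 3:
        raise ValueError(f"expected three components in {line!r}")
    category, table, entry = parts
    return EntryId(category, table, int(entry))

print(parse_id("ID=V 33 131"))
```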
LGLex: an example of a verb
ID=V 33 131
lexical-info=[cat="verb",verb=[lemma="rendre",ppvse="true"],
  aux-list=(etre="true"),prepositions=(),locatifs=()]
args=(
  const=[pos="0",dist=(comp=[cat="NP",hum="true",introd-prep=(),introd-loc=(),origin=(orig="N0 =: Nhum")])],
  const=[pos="1",dist=(comp=[cat="NP",hum="true",introd-prep=(),introd-loc=(),origin=(orig="N1 =: Nhum")])])
all-constructions=[absolute=(construction="true::N0 V à N1",construction="o::N0 V"),relative=()]
example=[example="Le caporal s'est rendu à l'ennemi"]

- entry se rendre V 33 131 'to surrender' - 'rendirse'
Adverbial variants in LGLex
We added the following fields to lexical-info:
- paraphrases: franchement 'frankly' - 'francamente'
  → à franchement parler 'frankly speaking' - 'francamente hablando'
  → de (façon+manière) franche 'in a frank way' - 'de forma+manera franca'
  (à Adv parler, P & N0 V W de (façon+manière) Adj)
- substructures: jusqu'à la fin des (=de les) temps 'until the end of time' - 'hasta el fin de los tiempos'
  → jusqu'à la fin 'until the end' - 'hasta el fin'
  (Prep1 Det1 C1 derived from the basic structure Prep1 Det1 C1 Prep2 C2)
- intensified structures: particulièrement 'particularly'
  → plus particulièrement 'more particularly' - 'más particularmente'
  (plus Adv)
3. DELA lexicon
French morphological lexicon DELA
- Freely available: http://infolingu.univ-mlv.fr/english (Language Resources > Dictionaries > Download)
- Composed of 683 824 simple entries and 108 436 compound entries [Courtois & Silberztein, 1990]

paresseuse,paresseux.A+d+z1:fs 'lazy' - 'perezosa, perezoso'

- paresseuse: inflected form of the entry
- paresseux: canonical form (lemma) of the entry
- A+d+z1: sequence of grammatical and semantic information
  - A: adjective
  - d: the adjective occurs after the noun
  - z1, z2, z3: the language register
    - z1: general language (blague 'joke' - 'broma')
    - z2: specialized language (disquette 'floppy disk' - 'disquete')
    - z3: very specialized (or technical) language (sérialisation 'serialization' - 'serialización')
- fs: inflectional code (feminine singular)
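The anatomy of the entry above can be made concrete with a short parser. This is a minimal sketch, not the tooling used by the authors: the names `DelaEntry` and `parse_dela` are hypothetical, escaped commas or dots are not handled, and an empty lemma (as in "par temps sec,.ADV+pca" later in these slides) is assumed to mean that the lemma is identical to the inflected form.

```python
from typing import List, NamedTuple

class DelaEntry(NamedTuple):
    inflected: str     # inflected form, e.g. "paresseuse"
    lemma: str         # canonical form, e.g. "paresseux"
    codes: List[str]   # grammatical and semantic codes, e.g. ["A", "d", "z1"]
    inflection: str    # inflectional code, e.g. "fs" (may be empty)

def parse_dela(line: str) -> DelaEntry:
    """Split 'inflected,lemma.CODE+code...:inflection' into its parts."""
    inflected, rest = line.split(",", 1)
    inflected = inflected.strip()
    lemma, info = rest.split(".", 1)
    codes, _, inflection = info.partition(":")
    # Assumption: an empty lemma stands for "same as the inflected form".
    lemma = lemma.strip() or inflected
    return DelaEntry(inflected, lemma, codes.split("+"), inflection)

print(parse_dela("paresseuse,paresseux.A+d+z1:fs"))
```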
LGLex → DELA: an example
ID=P advms 643
lexical-info=[cat="adv",exprF=[adv=[notperm=[complete="paresseusement"]]],
  paraphrases=(adv="de façon <paresseux>",adv="d'une façon <paresseux>",adv="de manière <paresseux>",adv="d'une manière <paresseux>"),
  autres-ID=(),autres-structures=()]

The four variants associated in LGLex with the adverbial entry paresseusement 'lazily' - 'perezosamente' produce, in DELA format:

de façon paresseuse, paresseusement.ADV+advms
d'une façon paresseuse, paresseusement.ADV+advms
de manière paresseuse, paresseusement.ADV+advms
d'une manière paresseuse, paresseusement.ADV+advms
'in a lazy way' - 'de (una) forma+manera perezosa'

[Tolone & Voyatzi 2011] [Tolone et al. 2012a]
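The substitution step in this example can be sketched as follows. The code is illustrative only: `FEMININE_SINGULAR` is a hypothetical one-entry inflection table standing in for a real morphological lexicon, and the choice of the feminine singular form is an assumption based on the head nouns façon and manière being feminine.

```python
import re

# Hypothetical mini inflection table: lemma -> feminine singular form.
FEMININE_SINGULAR = {"paresseux": "paresseuse"}

def variants_to_dela(paraphrases, lemma, codes):
    """Replace each <lemma> placeholder by its inflected form and emit DELA lines."""
    entries = []
    for pattern in paraphrases:
        form = re.sub(
            r"<([^<>]+)>",
            lambda m: FEMININE_SINGULAR[m.group(1)],
            pattern,
        )
        entries.append(f"{form},{lemma}.{codes}")
    return entries

paraphrases = [
    "de façon <paresseux>",
    "d'une façon <paresseux>",
    "de manière <paresseux>",
    "d'une manière <paresseux>",
]
for line in variants_to_dela(paraphrases, "paresseusement", "ADV+advms"):
    print(line)
```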
Conversion into DELA format
We extracted the list of adverbial variants and converted them into DELA format, which gives:
- 20 761 entries without variables, after substitutions (<paresseux> → paresseuse) and deletion of duplicates
- Corrections were necessary for two reasons: to improve the quality of the entries and to allow comparison with the current version of DELA
- 830 entries contain variables such as <A> (adjective) or :DNUM (numerical determiner):
  par temps <A>,.ADV+pca
  à <A> échéance, à échéance <A>.ADV+pca
  à :DNUM franc près,.ADV+pca
- We had to interpret the variables using graphs
Integration in the existing DELA
We merged the entries without variables with the relevant entries present in the existing DELA:

abominablement,abominablement.ADV+advmqi (Lexicon-Grammar table)
abominablement,abominablement.ADV+z1 (existing DELA)
abominablement,abominablement.ADV+advmqi+z1 (new entry)
'abominably' - 'abominablemente'

We obtained 22 481 final entries by merging 9 036 initial entries and 20 761 new entries in DELA format
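One way to picture the merge is to group entries by inflected form, lemma and part of speech, and to accumulate the remaining codes. The sketch below is an assumption about how such a merge could be written, not the procedure actually used; `merge_entries` is a hypothetical name and inflectional codes are ignored.

```python
def merge_entries(*entry_lists):
    """Merge DELA lines that share (inflected form, lemma, part of speech)."""
    merged = {}  # (inflected, lemma, pos) -> ordered list of extra codes
    for entries in entry_lists:
        for line in entries:
            inflected, rest = line.split(",", 1)
            lemma, info = rest.split(".", 1)
            pos, *extra = info.split("+")
            codes = merged.setdefault((inflected, lemma, pos), [])
            for code in extra:
                if code not in codes:
                    codes.append(code)
    return [
        f"{inflected},{lemma}." + "+".join([pos, *codes])
        for (inflected, lemma, pos), codes in merged.items()
    ]

from_tables = ["abominablement,abominablement.ADV+advmqi"]
existing_dela = ["abominablement,abominablement.ADV+z1"]
print(merge_entries(from_tables, existing_dela))
```

Passing the Lexicon-Grammar list first keeps the table code before the register code, as in the merged entry shown above.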
Graph dictionary: an example
During corpus processing, this graph can produce the following entries in DELA format:

par temps sec,.ADV+pca 'in dry weather' - 'por tiempo seco'
par temps pluvieux,.ADV+pca 'in rainy weather' - 'por tiempo lluvioso'
à brève échéance, à échéance brève.ADV+pca 'in the short term' - 'a corto plazo'
à cinq francs près,.ADV+pca 'to the very last penny' - 'por un peso'
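The graphs themselves are not reproduced here. As a rough stand-in, the sketch below turns a template containing the <A> variable into a regular expression and emits a DELA-style line for each match in a text. It swaps graphs for plain regular expressions, so it only approximates the idea; the adjective list, the function names and the sample sentence are all invented for illustration.

```python
import re

# Hypothetical adjective list standing in for the <A> variable.
ADJECTIVES = ["sec", "pluvieux", "froid"]

def template_to_regex(template, adjectives):
    """Compile a template such as 'par temps <A>' into a regular expression."""
    parts = re.split(r"(<A>)", template)
    alternation = "(?:" + "|".join(re.escape(a) for a in adjectives) + ")"
    body = "".join(alternation if p == "<A>" else re.escape(p) for p in parts)
    return re.compile(r"\b" + body + r"\b")

def recognize(text, template, code, adjectives=ADJECTIVES):
    """Return one DELA-style line per occurrence of the template in the text."""
    pattern = template_to_regex(template, adjectives)
    return [f"{m.group(0)},.{code}" for m in pattern.finditer(text)]

sample = "Il sort par temps sec et par temps pluvieux."
print(recognize(sample, "par temps <A>", "ADV+pca"))
```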
Number of entries in DELA
                               without variables   with variables
Initial entries in DELA              9 036               /
New entries from LGLex              20 761              830
(without duplicate entries)
All entries                         22 481          (155 graphs)
(without duplicate entries)

We enhanced the DELA with 13 445 entries, and we have 830 entries with variables, which enable the generation of more entries by using the graph dictionary method
4. Evaluation
Annotated corpus
The resulting data were evaluated on the reference corpus of French annotated for frozen expressions with adverbial function [Laporte et al. 2008]:
- http://infolingu.univ-mlv.fr/corpus/fr-MW-Adv/
- it contains 8 794 sentences, or 168 846 words, and is composed of:
  - the transcript of the sessions of the French National Assembly of October 3 and 4, 2006
  - the novel by Jules Verne, Le Tour du monde en quatre-vingts jours, written in 1873
Results
Data used                                          Number of annotated adverbs
Reference                                          3 239 (frozen adverbs only)
Initial DELA (9 036 entries)                       3 690
Extended DELA (22 481 entries)                     4 706
Graph dictionaries (830 entries with variables)    727
Extended DELA + graph dictionaries                 5 209

The graph dictionaries made it possible to recognize 503 additional entries, 224 entries being already included in the extended DELA
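The figures reported in the results and in the entry counts are mutually consistent, which a few lines of arithmetic can confirm. The variable names are ad hoc; all numbers are copied from the slides.

```python
# Figures copied from the slides.
initial_dela = 9036
extended_dela_entries = 22481
annotated_extended = 4706
annotated_graphs = 727
already_in_extended = 224

# Entries added to the DELA by the extension.
added_entries = extended_dela_entries - initial_dela

# Adverbs found only through the graph dictionaries.
additional = annotated_graphs - already_in_extended

# Adverbs annotated with the extended DELA plus the graph dictionaries.
combined = annotated_extended + additional

print(added_entries, additional, combined)
```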
Conclusions and perspectives
- Resources: http://infolingu.univ-mlv.fr/english (Language Resources > Lexicon-Grammar > Download)
- Annotation with the extended DELA and the graph dictionaries:
  - It recognizes valid adverbs, including simple adverbs that are not covered by the reference, but it also increases noise
    → filter the new entries added to the DELA in order to remove or modify some of them
- Perspectives:
  - Convert the new adverbial entries to the Lefff format [Sagot 2010] in order to integrate them into a parser, as was done for verbs and predicative nouns [Tolone & Sagot 2011] [Tolone et al. 2012b]
  - Improve the French WordNet with these adverbial entries [Sagot et al. 2009]
References
- [Agirre et al. 2008] Agirre E., Baldwin T. & Martinez D. Improving parsing and PP attachment performance with sense information. Proceedings of ACL'08, Columbus, Ohio. 2008.
- [Constant & Tolone 2010] Constant M. & Tolone E. A generic tool to generate a lexicon for NLP from Lexicon-Grammar tables. Lingue d'Europa e del Mediterraneo, Grammatica comparata, vol. 1, pp. 79-93. Aracne. 2010.
- [Courtois & Silberztein, 1990] Courtois B. & Silberztein M. Dictionnaires électroniques du français, Langue Française 87, Larousse, Paris. 1990.
- [Gross 1975] Gross M. Méthodes en syntaxe : Régime des constructions complétives. Hermann, Paris, France. 1975.
- [Gross 1986a] Gross M. Grammaire transformationnelle du français : Syntaxe de l'adverbe, Volume 3, ASSTRIL, Paris, France. 1986.
- [Gross 1986b] Gross M. Lexicon-grammar: The representation of compound words. Proceedings of the 11th International Conference on Computational Linguistics, Bonn, Germany. 1986.
- [Laporte & Voyatzi 2008] Laporte E. & Voyatzi S. An electronic dictionary of French multiword adverbs. Proceedings of the LREC workshop Towards a Shared Task on Multiword Expressions, Marrakech, Morocco. 2008.
- [Laporte et al. 2008] Laporte E., Nakamura T. & Voyatzi S. A French Corpus Annotated for Multiword Expressions with Adverbial Function. Proceedings of LREC'08, Workshop on Linguistic Annotation Conference (LAW II), Marrakech, Morocco, pp. 48-51. 2008.
- [Molinier & Levrier 2000] Molinier C. & Levrier F. Grammaire des adverbes : description des formes en -ment, Droz, Geneva, Switzerland. 2000.
- [Sagot 2010] Sagot B. The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French. Proceedings of LREC'10, 8 pp. Valletta, Malta. 2010.
- [Sagot et al. 2009] Sagot B., Fort K. & Venant F. Extending the adverbial coverage of a French Wordnet. Proceedings of the NODALIDA 2009 workshop on WordNets and other Lexical Semantic Resources, Odense, Denmark. 2009.
- [Tolone & Sagot 2011] Tolone E. & Sagot B. Using Lexicon-Grammar tables for French verbs in a large-coverage parser. LNAI. Springer Verlag. 2011.
- [Tolone 2011] Tolone E. Analyse syntaxique à l'aide des tables du Lexique-Grammaire du français, Thèse de doctorat, Université Paris-Est, 340 pp. 2011.
- [Tolone & Voyatzi 2011] Tolone E. & Voyatzi S. Extending the adverbial coverage of a NLP oriented resource for French. Proceedings of IJCNLP'11, pp. 1225-1234, Chiang Mai, Thailand. 2011.
- [Tolone et al. 2012a] Tolone E., Voyatzi S., Martineau C. & Constant M. Extending the adverbial coverage of a French morphological lexicon. Proceedings of LREC'12, 7 pp. Istanbul, Turkey. 2012.
- [Tolone et al. 2012b] Tolone E., Sagot B. & de La Clergerie É. Evaluating and improving syntactic lexica by plugging them within a parser. Proceedings of LREC'12, 8 pp. Istanbul, Turkey. 2012.
5. Evaluation of verbs and predicative nouns
The FRMG parser
A TAG parser for French [Thomasset & de La Clergerie 2005]
FRMG fits into a processing chain:
- upstream:
  - SXPipe: segmentation, tokenization, corrections, named entities
  - Lefff: morphological and syntactic lexicon for French
    → lexicon/grammar connection: anchoring with hypertags
- downstream: a disambiguation module (based on heuristics)
The Lefff
- The Lefff (Lexique des Formes Fléchies du Français) is a morphological and syntactic lexicon for French [Sagot 2010]
- large coverage (536 375 entries corresponding to 110 477 distinct lemmas covering all categories)
- freely available (LGPL-LR license)
- It relies on the Alexina framework for the modeling and acquisition of morphological and syntactic lexicons
Integration in the FRMG parser
Only verbs and predicative nouns
- We replaced the Lefff with a modified version of the Lefff in which verb entries are replaced by those of LGLexLefff
- We added the nominal entries of LGLexLefff
- We kept the other entries of the Lefff
The result is a variant of FRMG named FRMGLGLex, as opposed to the standard variant, denoted FRMGLefff
Example of dependencies in FRMG
Paul s'adresse à Max 'Paul talks to Max' - 'Paul se dirige a Max'
→ entry s'adresser V 33 8
Protocol used
- We evaluated FRMGLefff and FRMGLGLex by parsing the manually annotated part of the first Passage parser evaluation campaign of 2007 [Hamon et al. 2008]
- 4 306 sentences of the EASy annotated corpus + 400 new sentences, covering various genres (journalistic, medical, oral, questions, literary, etc.)
- evaluation metrics: those of the first EASy parser evaluation campaign, which took place in December 2005 [Paroubek et al. 2006]
- evaluation in terms of chunks and relations (∼ dependencies between lexical words)
Preliminary remarks
FRMGLGLex's results must be analyzed with the following facts in mind:
- FRMGLGLex's verb entries are the result of a conversion process from the original tables
  → this conversion process certainly introduces errors
- the majority of predicative nouns cannot be evaluated because FRMG does not consider those with determiners
- Passage does not allow evaluating all the information contained in the tables (e.g. semantic features)
- the Lefff was developed in parallel with the EASy and Passage campaigns (unlike the Lexicon-Grammar tables)
- LGLexLefff does not contain all the necessary verb entries; we added other ones
  → other verb entries may still be missing because not all verb entries are encoded
Results
Passage: comparative results of FRMGLefff and FRMGLGLex (in terms of f-measure):

                       Chunks                    Relations
Sub-corpus             FRMGLefff   FRMGLGLex     FRMGLefff   FRMGLGLex
general_lemonde        88.22%      84.60%        62.73%      59.01%
litteraire_2           88.91%      88.46%        65.28%      62.43%
mail_9                 82.60%      81.90%        58.55%      56.00%
medical_3              85.04%      85.89%        64.79%      65.26%
oral_delic_4           78.80%      81.79%        51.67%      51.14%
questions_amaryllis    91.30%      90.73%        66.56%      64.77%
total                  87.05%      85.53%        63.10%      60.25%

Parsing times are higher with FRMGLGLex than with FRMGLefff: the median parsing time per sentence is 0.62 s vs. 0.26 s
- this comes from the higher average number of entries per verb lemma (approx. 3) in LGLex than in the Lefff
  → more ambiguity
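To read the table more easily, the per-sub-corpus differences between the two variants can be computed directly. The sketch below uses only the f-measures reported above; the variable and function names are ad hoc.

```python
# F-measures (%) copied from the table, per sub-corpus:
# (chunks FRMGLefff, chunks FRMGLGLex, relations FRMGLefff, relations FRMGLGLex)
RESULTS = {
    "general_lemonde": (88.22, 84.60, 62.73, 59.01),
    "litteraire_2": (88.91, 88.46, 65.28, 62.43),
    "mail_9": (82.60, 81.90, 58.55, 56.00),
    "medical_3": (85.04, 85.89, 64.79, 65.26),
    "oral_delic_4": (78.80, 81.79, 51.67, 51.14),
    "questions_amaryllis": (91.30, 90.73, 66.56, 64.77),
    "total": (87.05, 85.53, 63.10, 60.25),
}

def deltas(results):
    """Return (chunk delta, relation delta) of FRMGLGLex minus FRMGLefff, in points."""
    return {
        name: (round(c_lglex - c_lefff, 2), round(r_lglex - r_lefff, 2))
        for name, (c_lefff, c_lglex, r_lefff, r_lglex) in results.items()
    }

DELTAS = deltas(RESULTS)
# Sub-corpora on which FRMGLGLex has the higher chunk f-measure.
LGLEX_BETTER_ON_CHUNKS = [
    name for name, (d_chunks, _) in DELTAS.items() if d_chunks > 0 and name != "total"
]

for name, (d_chunks, d_relations) in DELTAS.items():
    print(f"{name:22s} chunks {d_chunks:+.2f}  relations {d_relations:+.2f}")
```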
References (continued)
- [Hamon et al. 2008] Hamon O., Mostefa D., Ayache C., Paroubek P., Vilnat A. & de La Clergerie É. Passage: from French Parser Evaluation to Large Sized Treebank. Proceedings of LREC'08. Marrakech, Morocco. 2008.
- [Paroubek et al. 2006] Paroubek P., Robba I., Vilnat A. & Ayache C. Data, Annotations and Measures in EASy: the Evaluation Campaign for Parsers of French. Proceedings of LREC'06. Genoa, Italy. 2006.
- [Thomasset & de La Clergerie 2005] Thomasset F. & de La Clergerie É. Comment obtenir plus des méta-grammaires. Proceedings of TALN'05. Dourdan, France. 2005.