Correcting errors produced by French speakers writing in English: an illustration with misplaced adverbs
Workshop LORIA, Nancy, 17-18 June 2010
Marie Garnier, Cultures Anglo-Saxonnes, Université Toulouse 2, France
P. Saint-Dizier, IRIT, CNRS, France
Slide 2
Introduction
CorrecTools project
Objective: develop correction rules for grammatical errors produced by French speakers writing in a foreign language, applied here to English (errors that grammar checkers neither detect nor correct)
Didactic perspective: inclusion of dynamically generated explanations (grammar, several possible corrections, etc.) and possibly argumentation; possible extension to style
First experiment: errors linked to misplaced adverbs (adjuncts): motivations for correcting such errors, and their automatic correction
Slide 3
Project Overview
Target: French speakers
Audience: the general public as well as professionals
Exploratory corpus: a variety of document types, domains and authors, around 100,000 words (errors are manually detected and annotated)
Classification of errors: a priori choice of a system of categories based on linguistic criteria (NP, PP, VP, clause and sentence) (Albert et al., 2009)
Slide 4
Parameters of the construction of a corpus
General methodology
Construction of the corpus: first step of an error analysis methodology
Designed in accordance with our objective (representativeness of errors and of types of situations)
Parameters taken into consideration: level of control; type of document; authors and target audience; fields or domains of document production
Slide 5
Description of parameters
Type of documents and level of control: from short spontaneous productions (e.g. emails, posts) to longer professional productions
A quasi-continuum from a low to a high level of control: emails and blogs = low level of control; Web pages = average level of control; professional productions = high level of control. Variations exist within groups.
Around 200 pages (90 pages of Internet productions, 110 pages of professional productions; 100,000 words), 79 authors
Slide 6
Constraints on the classification of errors
Methods
Two main methods (Ellis, 2008):
Errors categorized according to linguistic criteria (e.g. syntax/morphology/lexicon, parts of speech, linguistic systems such as determination, expression of the future, etc.)
Errors categorized according to the observation of surface phenomena (e.g. omission, addition, wrong use, etc.)
Possibility of ad hoc categories (study of a limited number of error types concerning a specific group of learners)
Slide 7
Constraints on our classification system
Categories should describe most types of errors (not ad hoc)
Categories should be designed according to linguistic criteria (descriptions used to analyze the source of errors)
Categories should be understood by most annotators and users
The classification system should show internal coherence (linguistic, cognitive)
Categories could be portable to other languages
Slide 8
Presentation of our error categorization system
Main categories: the syntactic phrases that contain the errors (NP, VP, PP, sentence and clause)
Internal categories: finer distinctions designed after observing the nature of errors, leading to about 40 subclasses
Analyzing the reasons for (sources of) errors enables better correction
Slide 9
Error categories: a few examples
NOUN PHRASE
Adjective
Position of adjective w.r.t. noun: The carrying of weapons is permitted in fifty states different. → The carrying of weapons is permitted in fifty different states.
Order of adjectives in a complex construction: European academic and industrial partners → Academic and industrial European partners
Position of the adverb modifying an adjective (exceptional construction): A quite detailed analysis → Quite a detailed analysis
Determination
Choice of article: A Merovingian necropolis was built on exact site of the villa. → A Merovingian necropolis was built on the exact site of the villa.
NN construction
Ungrammatical NN construction: The objects properties → The properties of the objects
Excessive NN stacking: Security object granularity → The granularity of security objects
Table 4.
Slide 10
Category | Number of calque errors
Lexical & lexical choice calques | 200
   Incorrect lexical choice of preposition | 62
   Determiner | 30
   Adverbs | 12
   Modals | 26
   Incorrect idiomatic expression | 70
Structural calques | 105
   Incorrect position of adverbs | 38
   Incorrect position of adjectives | 7
   Argument omissions | 52
   Incorrect passive forms | 8
Stylistic calques | 122
   Incorrect temporal sequence | 26
   Incorrect choice of aspect | 20
   Punctuation errors | 76
CALQUE: Frequency table
Slide 11
Distribution of errors: a sample
Columns: Public. | Emails | Learner product. | Reports | TOTAL
NN constructions: 55461110,5%
Choice of article: 2492599,3%
Choice of preposition: 18271638,9%
Position of adverb: 160864,2%
Transitivity: 621124,2%
TOTAL: 47% | 22,9% | 39,2% | 50% | 37,1%
Table 5. Main types of errors in the corpus
Slide 12
About other language pairs
The same remarks apply, but with quite different error categories:
French → Spanish (Mathilde Janier), e.g. temporal agreement: Éramos en los tiempos, nos vamos con destino a Lyón → Éramos en los tiempos, nos fuimos con destino a Lyón; future with cuando: cuando seré más vieja → cuando sea más vieja
Spanish → English (Astrid Rojas): Realmente espero ir el próximo año → Really I hope I can go there next year (correct: I really hope); Tengo 20 años → I have 20 years. The grammar of pronouns and reflexives is quite different in Spanish, leading to forms such as David is me, a calque of David soy yo.
French → German (Camille Albert): Ich habe gern die Suppe → Ich habe die Suppe gern.
Slide 13
Proposal for an annotation schema
An attempt to reflect the parameters involved in error detection and correction by human correctors
Annotations are in XML format
The aim is to derive correction rules from annotations, possibly through machine-learning techniques
Slide 14
Error annotations: a preliminary proposal
Error tag: marks the group of words involved in the error
comprehension: indicates whether the segment is understandable (0 to 4)
grammaticality: indicates how ungrammatical the error is (0 to 2)
categ: main category of the error (lexical, syntactic, stylistic, semantic, textual)
source: source of the error (transfer, overgeneralization, erroneous rule)
Table 1. Delimitation and characterization of an error
Slide 15
Correction tags: one tag marks the text fragment involved in the correction, another marks each correction
surface: size of the text fragment affected by the correction (minimal, average, maximal)
grammar: indicates whether the proposed correction is standard (by-default, alternative, unlikely)
meaning: indicates whether the meaning has been altered (yes, somewhat, no)
var-size: indicates the increase/decrease in number of words
change: indicates the nature of the change (lexical, syntactic, stylistic, semantic, textual)
comp: indicates whether the correction is easy to understand (yes, average, no)
fix: indicates whether the error is specific or not (yes, no)
qualif: indicates the annotator's level of certainty (high, average, low)
correct: gives the correction
NB: this schema is more complex than those used in other projects (ICLE and FreeText, NICT Japanese Learner English, Cambridge Learner Corpus), but the purposes are very different.
Table 2. Delimitation and characterization of correction(s)
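In Table 2, the var-size attribute can be computed mechanically from the erroneous fragment and its correction. The other attributes are annotator judgments drawn from closed value sets. The Python sketch below makes three assumptions:
- var-size is recorded as a signed word-count difference (the slide only says it indicates an increase or decrease);
- the element is named `correction` (a placeholder);
- the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Closed value sets from Table 2 (var-size and correct are computed, not chosen).
ALLOWED = {
    "surface": {"minimal", "average", "maximal"},
    "grammar": {"by-default", "alternative", "unlikely"},
    "meaning": {"yes", "somewhat", "no"},
    "change": {"lexical", "syntactic", "stylistic", "semantic", "textual"},
    "comp": {"yes", "average", "no"},
    "fix": {"yes", "no"},
    "qualif": {"high", "average", "low"},
}


def var_size(original: str, corrected: str) -> str:
    """Signed word-count difference from the erroneous fragment to its correction (assumed encoding)."""
    delta = len(corrected.split()) - len(original.split())
    return f"{delta:+d}"


def correction_element(original: str, corrected: str, **judgments: str) -> ET.Element:
    """Build a placeholder <correction> element; unknown attribute names or values are rejected."""
    for name, value in judgments.items():
        if value not in ALLOWED.get(name, set()):
            raise ValueError(f"invalid value {value!r} for attribute {name!r}")
    attrs = dict(judgments)
    attrs["var-size"] = var_size(original, corrected)
    attrs["correct"] = corrected
    return ET.Element("correction", attrs)


# Judgment values below are illustrative only.
nn = correction_element("The objects properties", "The properties of the objects",
                        grammar="by-default", meaning="no")
adv = correction_element("index efficiently the soundtrack",
                         "efficiently index the soundtrack", grammar="by-default")
print(nn.get("var-size"), adv.get("var-size"))  # +2 +0
```

In the first call, the NN correction taken from the examples slide adds two words. In the second, moving an adverb leaves the word count unchanged.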
Slide 16
Example of an annotated error with multiple corrections:
*We need to index efficiently the soundtrack of multimedia documents
We need to
Table 3. Example of an annotated error
Slide 17
The case of misplaced adverbs
Distribution and type of errors in the corpus
Responses offered by grammar checkers
A correction strategy
Slide 18
Type of errors
Function | Type | Example
Adjuncts: VP modifiers | Manner | *To index efficiently the distributional data
 | Degree | *His father resembles strongly his own character
 | Means or instrument | *Our system is able to derive automatically information
Adjuncts: clausal | Connective | *They exhibit nevertheless the dependency relationships observed in the source parse tree
Focusing modifiers | Additive | *The treatment of this official day exemplifies also an awnswer to associations
 | Restrictive | ?in order to hand down exclusively family memories
Table 6. Errors linked to adverbs
Morphology: mostly prototypical -ly adverbs, plus other simple or complex adverbs (well, nevertheless, ...)
Slide 19
Grammar checkers
Systems tested, from payware to freeware, from professional websites to research projects: After the Deadline, Paper Rater, TwinMarker, SpellCheckPlus, LanguageTool, Grammar Expert +, GrammarCheckAnywhere, Ginger, Word 2007, Grammar Slammer...
With those systems, on error samples from the corpus: best result = 19.3%
Misplaced adverbs in the VP are generally neither detected nor corrected.
Slide 20
Error sources
Syntactic transfer (Ellis, 2008): adverb placed between the main verb and its complement (L1 influence)
Ex: *It won't change completely the life of its citizens.
Generalization from exceptional cases: in English, adverbs can be found after the verb when the complement is long (Huddleston and Pullum, 2002) or when there is no complement (intransitive VP)
Ex: She ate slowly.
Ex: She waited anxiously for the results of the exam she had had such a hard time preparing for.
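The transfer pattern described above (main verb, then adverb, then complement) can be flagged over chunked input while sparing the two exceptions: no complement, or a long complement. The Python sketch below assumes the sentence has already been split into labelled chunks; the labels `V`, `ADV` and `COMP` are hypothetical. It treats any complement of more than 8 words as "long". The slide gives no figure, so this cut-off is an arbitrary assumption to be tuned.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]  # (hypothetical chunk label, surface text)

# Assumed cut-off: complements longer than this count as "long" (not from the source).
MAX_SHORT_COMPLEMENT = 8


def flag_verb_adverb_complement(chunks: List[Chunk],
                                max_short: int = MAX_SHORT_COMPLEMENT) -> List[int]:
    """Return indices of adverbs placed between a main verb and a short complement."""
    flagged = []
    for i in range(1, len(chunks) - 1):
        prev_label = chunks[i - 1][0]
        label = chunks[i][0]
        next_label, next_text = chunks[i + 1]
        if (prev_label == "V" and label == "ADV" and next_label == "COMP"
                and len(next_text.split()) <= max_short):
            flagged.append(i)
    return flagged


transfer = [("NP", "It"), ("AUX", "won't"), ("V", "change"),
            ("ADV", "completely"), ("COMP", "the life of its citizens")]
intransitive = [("NP", "She"), ("V", "ate"), ("ADV", "slowly")]
long_comp = [("NP", "She"), ("V", "waited"), ("ADV", "anxiously"),
             ("COMP", "for the results of the exam she had had such a hard time preparing for")]

print(flag_verb_adverb_complement(transfer))      # [3]
print(flag_verb_adverb_complement(intransitive))  # []
print(flag_verb_adverb_complement(long_comp))     # []
```

Only the first example is flagged. The intransitive sentence has no complement, and the 15-word complement in the third example exceeds the assumed threshold.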
Slide 21
Towards automatic correction
Grammatical and linguistic framework:
Descriptive grammar: The Cambridge Grammar of the English Language, R. Huddleston & G. K. Pullum (2002)
Prescriptive grammar: Grammaire explicative de l'anglais, P. Larreya & C. Rivière (2005)
Overview of the grammatical rules and tendencies governing adverb placement
Slide 22
Parameters involved in correction rules
Weight
Length of the AdvP (long head adverb and/or modification): ?She would very erratically tell her story.
Presence/absence of complements after the verb, and their length: ?She was slowly eating. She has slowly opened the door to the second guestroom. vs She has opened it slowly.
Semantics
Adjunct type (manner, degree, act-related...): They deliberately had stopped the train.
Scope of the adverb (VP-oriented, clause-oriented): Sadly they were arguing about the children. ?They were arguing sadly about the children.
Slide 23
Syntax
"Simple" verbs vs prepositional verbs vs phrasal verbs: She has opened the door slowly. She has slowly given up cigarettes.
Prosody
Prosodically integrated vs prosodically detached: *Anxiously she waited for the results. Anxiously, she waited for the results.
(Other works on the parameters of adverb placement include Kampers-Manhe, 1994; Engels, 2004)
Slide 24
Tests with native speakers
Sentence | OK | Incorrect (1) | Incorrect (2) | Best choice
1. Slowly she has opened the door. x
2. She slowly has opened the door. x
3. She has slowly opened the door. x
4. She has opened the door slowly. x x
(1) Grammatical but unnatural and/or changes the original meaning; (2) Ungrammatical
Table 7. Sample from native speaker (NS) tests
Slide 25
Error patterns and correction rules
Manner adverbs used as adjuncts (VP-oriented)
Ex: *Slowly she has opened the door.
(1) Slowly, she has opened the door.
(2) She has opened the door slowly.
(3) She has slowly opened the door.
Correction: a pattern for detection + rewriting under conditions, with preferences:
Adverb(+manner), NP 1, {Auxiliary}, Verb, NP 2 →
Adverb(+manner), [,], NP 1, {Auxiliary}, Verb, NP 2, {preference: 1}
NP 1, {Auxiliary}, Verb, NP 2, Adverb(+manner), {preference: 2}
NP 1, {Auxiliary}, Adverb(+manner), Verb, NP 2, {preference: 3}
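Read as a rewriting rule, the pattern above maps one chunk sequence to a ranked list of candidate sentences. The Python sketch below implements only this rule. It assumes NP recognition and manner-adverb tagging have already been done; the chunk labels `ADVM`, `NP`, `AUX` and `V` are hypothetical. It returns the three rewritings in preference order and leaves aside the conditions that would select among them.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[str, str]  # (hypothetical chunk label, surface text)


def _lower_first(s: str) -> str:
    return s[:1].lower() + s[1:]


def _upper_first(s: str) -> str:
    return s[:1].upper() + s[1:]


def rewrite_initial_manner_adverb(chunks: List[Chunk]) -> Optional[List[str]]:
    """Match Adverb(+manner), NP 1, {Auxiliary}, Verb, NP 2 and return the
    rewritten sentences ordered by preference (1, 2, 3), or None if no match."""
    labels = [label for label, _ in chunks]
    # {Auxiliary} is optional, so both sequences match the pattern.
    if labels not in (["ADVM", "NP", "AUX", "V", "NP"], ["ADVM", "NP", "V", "NP"]):
        return None
    texts = [text for _, text in chunks]
    adverb, np1, np2 = texts[0], texts[1], texts[-1]
    verb_group = texts[2:-1]  # [aux, verb] or [verb]
    adv = _lower_first(adverb)
    subject = _upper_first(np1)
    before_verb = verb_group[:-1] + [adv, verb_group[-1]]
    return [
        # preference 1: adverb kept in front, detached by a comma
        f"{_upper_first(adverb)}, {np1} {' '.join(verb_group)} {np2}.",
        # preference 2: adverb after the complement
        f"{subject} {' '.join(verb_group)} {np2} {adv}.",
        # preference 3: adverb immediately before the main verb
        f"{subject} {' '.join(before_verb)} {np2}.",
    ]


example = [("ADVM", "Slowly"), ("NP", "she"), ("AUX", "has"),
           ("V", "opened"), ("NP", "the door")]
for rank, sentence in enumerate(rewrite_initial_manner_adverb(example), start=1):
    print(rank, sentence)
```

On the slide's example, the three outputs are exactly sentences (1) to (3) above, in the same order.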
Slide 26
Ex: *She anxiously was waiting for the results.
(1) She was anxiously waiting for the results.
(2) She was waiting for the results anxiously.
Rewriting rule:
NP 1, Adverb(+manner), {Auxiliary}, Verb, NP 2 →
NP 1, {Auxiliary}, Verb, NP 2, Adverb(+manner), {preference: 1}
NP 1, {Auxiliary}, Adverb(+manner), Verb, NP 2, {preference: 2}
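The rewriting rule for an adverb placed before the auxiliary can be sketched in Python as a function from a chunk sequence to ranked candidates. As assumptions, the chunk labels (`NP1`, `ADVM`, `AUX`, `V`, `NP2`) are hypothetical and chunking is already done. This version requires the auxiliary to be present. Without one, the sequence NP 1, Adverb, Verb, NP 2 already has the adverb in the pre-verbal position that the rule offers as its second choice. Following the slide, the segment after the verb fills the NP 2 slot, even when, as here, it is the complement of the prepositional verb wait for.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[str, str]  # (hypothetical chunk label, surface text)


def rewrite_preauxiliary_manner_adverb(chunks: List[Chunk]) -> Optional[List[str]]:
    """Match NP 1, Adverb(+manner), Auxiliary, Verb, NP 2 and return the
    rewritten sentences ordered by preference (1, 2), or None if no match."""
    if [label for label, _ in chunks] != ["NP1", "ADVM", "AUX", "V", "NP2"]:
        return None
    np1, adverb, aux, verb, np2 = (text for _, text in chunks)
    return [
        # preference 1: adverb moved after the complement
        f"{np1} {aux} {verb} {np2} {adverb}.",
        # preference 2: adverb between the auxiliary and the main verb
        f"{np1} {aux} {adverb} {verb} {np2}.",
    ]


example = [("NP1", "She"), ("ADVM", "anxiously"), ("AUX", "was"),
           ("V", "waiting"), ("NP2", "for the results")]
print(rewrite_preauxiliary_manner_adverb(example))
```

The rule ranks the end position first, whereas the slide's example list numbers the pre-verbal version (1). The sketch follows the rule's stated preferences.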
Slide 27
Difficulties: recognizing NPs; possible interactions with other functions of adverbs (e.g. He loves only his work: focusing modifier)
Testing and implementation still to come, using (software platform for the identification of textual semantic structures)
Evaluation of: annotation of errors + correction proposals
Slide 28
Perspectives
Further research on adverbs: other functions, e.g. modifiers of adjectives and adverbs, focusing modifiers (which might interact with existing error patterns); internal syntax of AdvPs
Develop the explanation aspects of the project: generate argumentation to deal with multiple correction proposals (Garnier et al., 2009); design dynamically generated explanations for errors linked to adverbs; investigate cognitive aspects of error correction
Correction of NN errors (e.g. the meaning utterance) and other types of errors: requires knowledge from different areas (lexical, ontological, domain knowledge, etc.)
Slide 29
More information on:
http://www.irit.fr/recherches/ILPL/webct/ct.html