Correcting errors produced by French speakers writing in English: An illustration with misplaced adverbs. Workshop LORIA, Nancy, 17-18 June 2010. Marie Garnier (Cultures Anglo-Saxonnes, Université Toulouse 2, France); P. Saint-Dizier (IRIT, CNRS, France)


  • Slide 1
  • Correcting errors produced by French speakers writing in English: An illustration with misplaced adverbs. Workshop LORIA, Nancy, 17-18 June 2010. Marie Garnier (Cultures Anglo-Saxonnes, Université Toulouse 2, France); P. Saint-Dizier (IRIT, CNRS, France)
  • Slide 2
  • Introduction
    • CorrecTools project objective: develop correction rules for grammatical errors produced by French speakers writing in a foreign language, applied here to English (errors that grammar checkers neither detect nor correct)
    • Didactic perspective: inclusion of dynamically generated explanations (grammar, several corrections, etc.) and possibly argumentation. Possible extension to style.
    • First experiment: errors linked to misplaced adverbs (adjuncts)
      • motivations for the correction of such errors
      • their automatic correction
  • Slide 3
  • Project Overview
    • Target: French speakers
    • Audience: general public as well as professionals
    • Exploratory corpus: variety of document types, domains, and authors; around 100,000 words (errors are manually detected and annotated)
    • Classification of errors: a priori choice of a system of categories based on linguistic criteria (NP, PP, VP, Clause and Sentence) (Albert et al., 2009)
  • Slide 4
  • Parameters of the construction of a corpus
    • General methodology
      • Construction of the corpus: first step of an error analysis methodology
      • Designed in accordance with our objective (representativeness of errors and types of situations)
    • Parameters taken into consideration:
      • Level of control
      • Type of document
      • Authors and target audience
      • Fields or domains of document production
  • Slide 5
  • Description of parameters
    • Type of documents and level of control:
      • From short spontaneous productions (e.g. emails, posts) to longer professional productions
      • Quasi-continuum from low level to high level of control
      • Emails, blogs = low level of control; Web pages = average level of control; professional productions = high level of control
      • Variations exist within groups
    • Around 200 pages (90 pages of internet productions, 110 pages of professional productions, 100,000 words), 79 authors
  • Slide 6
  • Constraints on the classification of errors
    • Methods: two main methods (Ellis, 2008):
      • Errors categorized according to linguistic criteria (e.g. syntax/morphology/lexicon, parts of speech, linguistic systems such as determination or the expression of the future)
      • Errors categorized according to the observation of surface phenomena (e.g. omission, addition, wrong use)
    • Possibility of ad hoc categories (study of a limited number of error types concerning a specific group of learners)
  • Slide 7
  • Constraints on our classification system
    • Categories should describe most types of errors (not ad hoc)
    • Categories should be designed according to linguistic criteria (descriptions used to analyze the source of errors)
    • Categories should be understood by most annotators and users
    • Classification system should show internal coherence (linguistic, cognitive)
    • Categories could be portable to other languages
  • Slide 8
  • Presentation of our error categorization system
    • Main categories: syntactic phrases that contain the errors (NP, VP, PP, Sentence and Clause)
    • Internal categories: finer distinctions designed after observation of the nature of errors. Leads to about 40 subclasses.
    • Analyze reasons/source of errors for a better correction.
  • Slide 9
  • Error categories: a few examples
    • NOUN PHRASE
      • Adjective
        • Position of adjective w.r.t. noun: The carrying of weapons is permitted in fifty states different. → The carrying of weapons is permitted in fifty different states.
        • Order of adjectives in a complex construction: European academic and industrial partners → Academic and industrial European partners
        • Position of the adverb modifying an adjective (exceptional construction): A quite detailed analysis → Quite a detailed analysis
      • Determination
        • Choice of article: A Merovingian necropolis was built on exact site of the villa. → A Merovingian necropolis was built on the exact site of the villa.
      • NN construction
        • Ungrammatical NN construction: The objects properties → The properties of the objects
        • Abusive NN stacking: Security object granularity → The granularity of security objects
    • Table 4.
  • Slide 10
  • CALQUE: Frequency table (category: number of calque errors)
    • Lexical & lex. choice calques: 200
      • Incorrect lexical choice of preposition: 62
      • Determiner: 30
      • Adverbs: 12
      • Modals: 26
      • Incorrect idiomatic expression: 70
    • Structural calques: 105
      • Incorrect position of adverbs: 38
      • Incorrect position of adjectives: 7
      • Argument omissions: 52
      • Incorrect passive forms: 8
    • Stylistic calques: 122
      • Incorrect temporal sequence: 26
      • Incorrect choice of aspect: 20
      • Punctuation errors: 76
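The frequency table is internally consistent, which a few lines of Python can confirm while also deriving the share of misplaced adverbs. The counts are those of the table; the nested dictionary layout and the variable names are illustrative assumptions, not part of the original slides.

```python
# Counts copied from the CALQUE frequency table; the nesting is an assumed layout.
calques = {
    "Lexical & lex. choice calques": (200, {
        "Incorrect lexical choice of preposition": 62,
        "Determiner": 30,
        "Adverbs": 12,
        "Modals": 26,
        "Incorrect idiomatic expression": 70,
    }),
    "Structural calques": (105, {
        "Incorrect position of adverbs": 38,
        "Incorrect position of adjectives": 7,
        "Argument omissions": 52,
        "Incorrect passive forms": 8,
    }),
    "Stylistic calques": (122, {
        "Incorrect temporal sequence": 26,
        "Incorrect choice of aspect": 20,
        "Punctuation errors": 76,
    }),
}

# Each group total should equal the sum of its subcategories.
subtotals_ok = all(total == sum(sub.values()) for total, sub in calques.values())

grand_total = sum(total for total, _ in calques.values())
structural_total, structural = calques["Structural calques"]
adverb_position = structural["Incorrect position of adverbs"]
share_of_structural = adverb_position / structural_total
share_of_all = adverb_position / grand_total

print(subtotals_ok, grand_total)
print(f"{share_of_structural:.1%} of structural calques, {share_of_all:.1%} of all calques")
```

Under this reading, misplaced adverbs are the second largest structural calque after argument omissions.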
  • Slide 11
  • Distribution of errors: a sample
    • Columns: Public. | Emails | Learner product. | Reports | TOTAL
    • NN Constructions: 554611 | TOTAL 10.5%
    • Choice of article: 24925 | TOTAL 9.3%
    • Choice of preposition: 1827163 | TOTAL 8.9%
    • Position of adverb: 16086 | TOTAL 4.2%
    • Transitivity: 62112 | TOTAL 4.2%
    • TOTAL: 47% | 22.9% | 39.2% | 50% | 37.1%
    • Table 5. Main types of errors in the corpus
  • Slide 12
  • About other language pairs
    • Same remarks apply, but with quite different error categories:
    • French → Spanish (Mathilde Janier)
      • ex: temporal agreement: Éramos en los tiempos, nos vamos con destino a Lyón → Éramos en los tiempos, nos fuimos con destino a Lyón
      • future with cuando: cuando seré más vieja → cuando sea más vieja
    • Spanish → English (Astrid Rojas)
      • Realmente espero ir el próximo año: Really I hope I can go there next year (I really hope)
      • Tengo 20 años: I have 20 years.
      • The grammar of pronouns and reflexives is quite different in Spanish, leading to forms such as David is me, a calque of David soy yo.
    • French → German (Camille Albert)
      • Ich habe gern die Suppe → Ich habe die Suppe gern.
  • Slide 13
  • Proposal for an annotation schema
    • Attempt to reflect the parameters involved in error detection and correction as performed by human correctors
    • Annotations are in XML format
    • The aim is to derive correction rules from annotations, possibly through machine-learning techniques
  • Slide 14
  • Error annotations: a preliminary proposal
    • tags the group of words involved in the error
      • comprehension: indicates if the segment is understandable (0 to 4)
      • grammaticality: indicates how ungrammatical the error is (0 to 2)
      • categ: main category of the error (lexical, syntactic, stylistic, semantic, textual)
      • source: transfer, overgeneralization, erroneous rule
    • Table 1. Delimitation and characterization of an error
  • Slide 15
  • tags the text fragment involved in the correction
    • tags each correction
      • surface: size of the text fragment affected by the correction (minimal, average, maximal)
      • grammar: indicates if the correction proposed is standard (by-default, alternative, unlikely)
      • meaning: indicates if the meaning has been altered (yes, somewhat, no)
      • var-size: indicates increase/decrease in number of words
      • change: indicates the nature of the change (lexical, syntactic, stylistic, semantic, textual)
      • comp: indicates if the correction is easy to understand (yes, average, no)
      • fix: indicates whether the error is specific or not (yes, no)
      • qualif: indicates the certainty level of the annotator (high, average, low)
      • correct: gives the correction
    • NB: More complex schema than those used in other projects (ICLE and FreeText, NICT Japanese Learner English, Cambridge Learner Corpus), but the purposes are very different.
    • Table 2. Delimitation and characterization of correction(s)
  • Slide 16
  • Example of an annotated error with multiple corrections:
    • *We need to index efficiently the soundtrack of multimedia documents
    • We need to
    • Table 3. Example of an annotated error
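Since the XML markup itself did not survive in this transcript, the following is only a sketch of what an annotation along the lines of Tables 1 and 2 might look like. The attribute names come from the tables; the element names `error-zone`, `correction-zone` and `correction`, the attribute values, and the two proposed corrections are all hypothetical and are not the authors' actual annotation.

```python
import xml.etree.ElementTree as ET

# Element names are hypothetical stand-ins; attribute names follow Tables 1 and 2.
error = ET.Element("error-zone", {
    "comprehension": "4",    # scale 0 to 4 (illustrative value)
    "grammaticality": "1",   # scale 0 to 2 (illustrative value)
    "categ": "syntactic",
    "source": "transfer",
})
error.text = "index efficiently the soundtrack of multimedia documents"

# One correction element per proposed rewrite (both rewrites are illustrative).
zone = ET.SubElement(error, "correction-zone")
proposals = [
    ("by-default", "efficiently index the soundtrack of multimedia documents"),
    ("alternative", "index the soundtrack of multimedia documents efficiently"),
]
for grammar, correct in proposals:
    ET.SubElement(zone, "correction", {
        "surface": "minimal",
        "grammar": grammar,
        "meaning": "no",
        "change": "syntactic",
        "qualif": "high",
        "correct": correct,
    })

xml_text = ET.tostring(error, encoding="unicode")
print(xml_text)
```

Keeping every correction as a separate element with its own attributes is what would let a later step rank alternatives or learn rules from the annotations.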
  • Slide 17
  • The case of misplaced adverbs
    • Distribution and type of errors in the corpus
    • Responses offered by grammar checkers
    • A correction strategy
  • Slide 18
  • Type of errors (Function / Type / Example)
    • Adjuncts
      • VP modifiers
        • Manner: *To index efficiently the distributional data
        • Degree: *His father resembles strongly his own character
        • Means or Instrument: *Our system is able to derive automatically information
      • Clausal
        • Connective: *They exhibit nevertheless the dependency relationships observed in the source parse tree
    • Focusing Modifiers
      • Additive: *The treatment of this official day exemplifies also an awnswer to associations
      • Restrictive: ?in order to hand down exclusively family memories
    • Table 6. Errors linked to adverbs
    • Morphology: mostly prototypical -ly adverbs + simple or complex other adverbs (well, nevertheless...)
  • Slide 19
  • Grammar checkers
    • From payware to freeware, from professional websites to research projects: After the Deadline, Paper Rater, TwinMarker, SpellCheckPlus, LanguageTool, Grammar Expert +, GrammarCheckAnywhere, Ginger, Word 2007, Grammar Slammer...
    • With those systems, on error samples from the corpus: best result = 19.3%
    • Misplaced adverbs in the VP are in general neither detected nor corrected
  • Slide 20
  • Error sources
    • Syntactic transfer (Ellis, 2008): adverb placed between main verb and complement (L1 influence)
      • Ex: *It won't change completely the life of its citizens
    • Generalization from exceptional cases: in English, adverbs can be found after the verb when the complement is long (Huddleston and Pullum, 2002) or when there is no complement (intransitive VP)
      • Ex: She ate slowly.
      • Ex: She waited anxiously for the results of the exam she had had such a hard time preparing for.
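The two exceptional cases above suggest a simple first filter before flagging a post-verbal adverb. The sketch below is only one possible reading: the function name is hypothetical, the complement is assumed to be already identified, and the length threshold of 8 words is an arbitrary assumption, since the slides give no figure for what counts as a "long" complement.

```python
def post_verbal_adverb_is_suspect(complement_tokens, long_complement=8):
    """Return True when an adverb placed right after the main verb is a likely transfer error.

    complement_tokens: words of the complement following the adverb (empty if intransitive).
    long_complement: hypothetical length threshold; the slides give no figure.
    """
    if not complement_tokens:
        # Intransitive VP: "She ate slowly." is fine.
        return False
    if len(complement_tokens) >= long_complement:
        # Long complement: the post-verbal adverb is tolerated.
        return False
    # Short complement: "*change completely the life of its citizens".
    return True


examples = {
    "She ate slowly.": [],
    "*It won't change completely the life of its citizens":
        "the life of its citizens".split(),
    "She waited anxiously for the results of the exam ...":
        "for the results of the exam she had had such a hard time preparing for".split(),
}
for sentence, complement in examples.items():
    print(sentence, "->", post_verbal_adverb_is_suspect(complement))
```

With these inputs only the second sentence is flagged. A real system would need the weight, semantic and prosodic parameters discussed in the following slides rather than a single word count.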
  • Slide 21
  • Towards automatic correction
    • Grammatical and linguistic framework:
      • Descriptive grammar: The Cambridge Grammar of the English Language, R. Huddleston & G. K. Pullum (2002)
      • Prescriptive grammar: Grammaire Explicative de l'Anglais, P. Larreya & C. Rivière (2005)
    • Overview of grammatical rules and tendencies governing adverb placement
  • Slide 22
  • Parameters involved in correction rules
    • Weight
      • Length of AdvP (long HeadAdv and/or modification): ? She would very erratically tell her story.
      • Presence/absence of complements after the verb + length: ? She was slowly eating. She has slowly opened the door to the second guestroom. vs She has opened it slowly.
    • Semantics
      • Adjunct type (Manner, Degree, Act-related...): They deliberately had stopped the train.
      • Scope of the adverb (VP-oriented, Clause-oriented): Sadly they were arguing about the children. ? They were arguing sadly about the children.
  • Slide 23
  • Syntax
    • "Simple" verbs vs prepositional verbs vs phrasal verbs: She has opened the door slowly. She has slowly given up cigarettes.
  • Prosody
    • Prosodically integrated vs prosodically detached: *Anxiously she waited for the results. Anxiously, she waited for the results.
  • (Other works on parameters of adverb placement include: Kampers-Manhe, 1994; Engels, 2004)
  • Slide 24
  • Tests with native speakers
    • Columns: Sentences | OK | Incorrect (1) | Incorrect (2) | Best Choice
    • 1. Slowly she has opened the door. x
    • 2. She slowly has opened the door. x
    • 3. She has slowly opened the door. x
    • 4. She has opened the door slowly. x x
    • (1) Grammatical but unnatural and/or changes original meaning; (2) Ungrammatical
    • Table 7. Sample from NS tests
  • Slide 25
  • Error patterns and correction rules
    • Manner adverbs used as adjuncts (VP-oriented)
    • Ex: *Slowly she has opened the door.
      • (1) Slowly, she has opened the door.
      • (2) She has opened the door slowly.
      • (3) She has slowly opened the door.
    • Correction: pattern for detection + rewriting under conditions, with preferences:
      • Pattern: Adverb(+manner), NP1, {Auxiliary}, Verb, NP2
      • → Adverb(+manner), [,], NP1, {Auxiliary}, Verb, NP2, {preference: 1}
      • → NP1, {Auxiliary}, Verb, NP2, Adverb(+manner), {preference: 2}
      • → NP1, {Auxiliary}, Adverb(+manner), Verb, NP2, {preference: 3}
  • Slide 26
  • Ex: *She anxiously was waiting for the results.
      • (1) She was anxiously waiting for the results.
      • (2) She was waiting for the results anxiously.
    • Rewriting rule:
      • Pattern: NP1, Adverb(+manner), {Auxiliary}, Verb, NP2
      • → NP1, {Auxiliary}, Verb, NP2, Adverb(+manner), {preference: 1}
      • → NP1, {Auxiliary}, Adverb(+manner), Verb, NP2, {preference: 2}
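The two detection patterns and their ranked rewrites can be sketched in Python over pre-chunked input. This is a minimal illustration under stated assumptions, not the project's implementation: the chunk tags `ADV`, `NP`, `AUX` and `V`, the function names, and the lower-cased input convention are all hypothetical, and chunking (including NP recognition) is assumed to happen upstream. `{Auxiliary}` is read as an optional run of auxiliaries, except that the second pattern is made to require at least one auxiliary, because without one it would also match well-formed sentences such as "She slowly opened the door".

```python
def parse(chunks):
    """Match pre-chunked input [(tag, text), ...] against the two error patterns.

    Tags ADV (manner adverb), NP, AUX and V are hypothetical labels.
    Returns (rule_name, slots) or None when neither pattern matches.
    """
    tags = [tag for tag, _ in chunks]
    texts = [text for _, text in chunks]
    n_aux = tags.count("AUX")
    aux_run = ["AUX"] * n_aux
    if tags == ["ADV", "NP", *aux_run, "V", "NP"]:
        rule, adv, np1 = "fronted", texts[0], texts[1]
    elif n_aux >= 1 and tags == ["NP", "ADV", *aux_run, "V", "NP"]:
        rule, np1, adv = "pre-auxiliary", texts[0], texts[1]
    else:
        return None
    # In both patterns exactly two chunks precede the auxiliary run.
    slots = {"adv": adv, "np1": np1, "aux": texts[2:2 + n_aux],
             "verb": texts[-2], "np2": texts[-1]}
    return rule, slots


def render(words):
    """Join chunks into a sentence; input chunks are assumed to be lower-cased."""
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


def corrections(chunks):
    """Return candidate rewrites, most preferred first; empty list if no error pattern matches."""
    parsed = parse(chunks)
    if parsed is None:
        return []
    rule, s = parsed
    adv, np1, aux, verb, np2 = s["adv"], s["np1"], s["aux"], s["verb"], s["np2"]
    final = [np1, *aux, verb, np2, adv]      # adverb in end position
    medial = [np1, *aux, adv, verb, np2]     # adverb between auxiliary and verb
    if rule == "fronted":
        ranked = [[adv + ",", np1, *aux, verb, np2], final, medial]
    else:
        ranked = [final, medial]
    return [render(words) for words in ranked]


fronted = [("ADV", "slowly"), ("NP", "she"), ("AUX", "has"), ("V", "opened"), ("NP", "the door")]
pre_aux = [("NP", "she"), ("ADV", "slowly"), ("AUX", "has"), ("V", "opened"), ("NP", "the door")]
for candidate in corrections(fronted) + corrections(pre_aux):
    print(candidate)
```

For the fronted input this yields the three rewrites of the first example in the same preference order; for the pre-auxiliary input it yields the end-position rewrite first and the medial one second, following the preference values attached to the second rule.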
  • Slide 27
  • Difficulties:
    • Deal with the recognition of NPs
    • Possible interactions with other functions of adverbs (ex: He loves only his work, focusing modifier)
    • Await testing and implementation using (software platform for the identification of textual semantic structures):
    • Evaluation of : Annotation of errors + correction proposals.
  • Slide 28
  • Perspectives
    • Further research on adverbs:
      • Other functions, e.g. modifiers of adjectives and adverbs, focusing modifiers (might interact with existing error patterns)
      • Internal syntax of AdvPs
    • Develop explanation aspects of the project:
      • Generate argumentations to deal with multiple correction propositions (Garnier et al., 2009)
      • Design dynamically generated explanations for errors linked to adverbs
      • Investigate cognitive aspects of error correction
    • Correction of NN errors (ex: the meaning utterance) and other types of errors
      • Requires knowledge from different areas (lexical, ontological, domain knowledge, etc.)
  • Slide 29
  • More information at: http://www.irit.fr/recherches/ILPL/webct/ct.html
  • Slide 30
  • Thank you