Towards Interactive and Automatic Refinement of Translation Rules

  • Towards Interactive and Automatic Refinement of Translation Rules PhD Thesis Proposal

    Ariadna Font Llitjós

    5 November 2004

    Interactive and Automatic Rule Refinement

  • Outline
      - Introduction
      - Related Work
      - Technical Approach
          - Interactive elicitation of error information
          - A framework for automatic rule adaptation
      - Preliminary Research
      - Proposed Research
      - Contributions and Thesis Timeline

  • How to recycle corrections of MT output back into the system

    by adjusting and adapting the grammar and lexical rules

  • The Problem
    General
      - MT output still requires post-editing.
      - Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data.

    Avenue specific
      - Resource-poor scenarios: no manual grammar, or only a very small initial grammar.
      - Need to validate the elicitation corpus and the automatically learned translation rules.

  • Motivation
    General
      - Refining and extending translation rule sets manually is very costly and time-consuming, and requires trained computational linguists with knowledge of both languages.

    Resource-poor scenarios
      - Indigenous communities have limited access to crucial information that directly affects their lives (such as land laws, plagues, health warnings, etc.).
      - Preservation of language and culture.

  • MT Output
    SL: Mary and Anna are falling
    TL: María y Ana están cayendo
    TL: María y Ana se están cayendo

    SL: Gaudi was a great artist
    TL: Gaudi estaba un artista grande
    TL: Gaudi era un artista grande
    TL: Gaudi era un gran artista

    SL: You saw the woman
    TL: Viste la mujer
    TL: Viste a la mujer
    TL: Vi la mujer

    SL: I used my elbow to push the button
    TL: Usé mi codo que apretar el botón
    TL: Usé mi codo para apretar el botón

    SL: We are building new bridges in the city
    TL: Nosotros estamos construyendo nuevo puentes dentro la ciudad
    TL: Nosotros estamos construyendo nuevo puentes dentro de la ciudad

  • Resource-poor scenarios
      - No e-data available (often spoken tradition), so no SMT or EBMT
      - No computational linguists to write a grammar

    So how can we even start to think about MT?
      - That's what AVENUE is all about: Elicitation Corpus + Automatic Rule Learning

    What do we usually have available in resource-poor scenarios?
      - Bilingual users

  • Avenue overview

  • Avenue overview: my thesis

  • Thesis Statement
      - Given a rule-based Transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable.
      - We can automatically refine translation rules, given corrected and aligned translation pairs and some error information, so as to improve coverage and overall MT quality.

  • Related Work
    Post-editing to improve MT systems
      - minimal post-editing [Allen, 2003]
      - include user feedback in the MT loop [Callison-Burch, 2004], [Allen & Hogan, 2000], [Su et al., 1995], [Menezes & Richardson, 2001] and [Imamura et al., 2003]
    MT error information and classification
      - [Flanagan, 1994], [White et al., 1994], [Allen, 2003], [Niessen et al., 2000]

  • Related Work++
    Rule Adaptation
      - POS tagging: [Lin et al., 1994]
      - parsing: [Lehman, 1989], [Brill, 2003]
      - NLU: [Gavaldà, 2000]
      - MT:
          - [Corston-Oliver & Gammon, 2003]: decision trees (DTs) to correct binary features of the logical form (LF) to reduce noise
          - [Yamada, 1995]: structural comparison between machine translations and manual translations to adapt the MT system to a new domain
          - [Naruedomkul, 2001]: modify an HPSG-like semantic representation of the TL until it is acceptably similar to the SL

  • Interactive elicitation of MT errors
    Assumptions: non-expert bilingual users can reliably detect and minimally correct MT errors, given:
      - SL sentence (I saw you)
      - TL sentence (Yo vi tú)
      - word-to-word alignments (I-yo, saw-vi, you-tú)
      - (context)
    using an online GUI: the Translation Correction Tool (TCTool)

    Goal: simplify the MT correction task as much as possible
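
As a rough sketch of the input a user is shown, the record below bundles the SL sentence, the MT output and the word-to-word alignments from the example above. The class and field names (`CorrectionTask`, `alignments`, `aligned_target`) are assumptions made for illustration, not the TCTool's actual data model.

```python
from dataclasses import dataclass


@dataclass
class CorrectionTask:
    """Hypothetical container for what a bilingual user is shown."""
    sl: list[str]                      # source-language tokens
    tl: list[str]                      # MT output tokens
    alignments: list[tuple[int, int]]  # (SL index, TL index) pairs
    context: str = ""                  # optional surrounding context

    def aligned_target(self, sl_index: int) -> list[str]:
        """Return the TL words aligned to one SL word."""
        return [self.tl[t] for s, t in self.alignments if s == sl_index]


task = CorrectionTask(
    sl=["I", "saw", "you"],
    tl=["Yo", "vi", "tú"],
    alignments=[(0, 0), (1, 1), (2, 2)],
)
print(task.aligned_target(2))  # TL words aligned to "you"
```

Keeping alignments as index pairs rather than word pairs is one possible design choice; it stays unambiguous when a word occurs twice in a sentence.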

  • MT error typology for RR (simplified)
      - missing word
      - extra word
      - word order (local vs long-distance, word vs phrase, word change)
      - incorrect word (sense, form, selectional restrictions, idiom, ...)
      - agreement (missing constraint, extra agreement constraint)

  • Automatic Rule Refinement Framework
    Find the best RR operations given:
      - a grammar (G),
      - a lexicon (L),
      - a (set of) source language sentence(s) (SL),
      - a (set of) target language sentence(s) (TL),
      - its parse tree (P), and
      - a minimal correction of TL (TL')
    such that TQ2 > TQ1.

    This can also be expressed as:
      max TQ(TL' | TL, P, SL, RR(G, L))

  • Types of RR operations
    Grammar:
      - R0 → R0 + R1 [= R0 + constr]    Cov[R0] → Cov[R0, R1]    (bifurcate)
      - R0 → R1 [= R0 + constr]    Cov[R0] → Cov[R1]    (refine)
      - R0 → R1 [= R0 + constr = -] + R2 [= R0 + constr =c +]    Cov[R0] → Cov[R1, R2]
    Lexicon:
      - Lex0 → Lex0 + Lex1 [= Lex0 + constr]
      - Lex0 → Lex1 [= Lex0 + constr]
      - Lex0 → Lex0 + Lex1 [Lex0 + TLword]
      - → Lex1 (adding lexical item)
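
One way to read the two basic grammar operations, as a hedged sketch: refine replaces R0 with a more constrained R1, while bifurcate keeps R0 and adds the constrained copy next to it. The `Rule` class, the string encoding of constraints and the example rule name are all hypothetical, and any modification of the copy beyond adding the constraint is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical transfer rule: a name plus a set of constraints."""
    name: str
    constraints: frozenset[str] = frozenset()

    def with_constraint(self, constraint: str, new_name: str) -> "Rule":
        return Rule(new_name, self.constraints | {constraint})


def refine(grammar: list[Rule], r0: Rule, constraint: str) -> list[Rule]:
    """R0 -> R1 [= R0 + constr]: the original rule is replaced."""
    r1 = r0.with_constraint(constraint, r0.name + "_refined")
    return [r1 if rule == r0 else rule for rule in grammar]


def bifurcate(grammar: list[Rule], r0: Rule, constraint: str) -> list[Rule]:
    """R0 -> R0 + R1: the original rule is kept and a copy is added."""
    r1 = r0.with_constraint(constraint, r0.name + "_variant")
    return grammar + [r1]


np_rule = Rule("NP_det_n_adj")
agreement = "(y2 gender) = (y3 gender)"  # illustrative constraint string

refined = refine([np_rule], np_rule, agreement)
bifurcated = bifurcate([np_rule], np_rule, agreement)
print(len(refined), len(bifurcated))  # grammar size after each operation
```

Under this reading, refine narrows coverage from Cov[R0] to Cov[R1], whereas bifurcate leaves the original coverage in place and adds that of the new rule.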

  • Formalizing Error Information

    Wi = error, Wi' = correction, Wc = clue word

    Example:

    SL: the red car - TL: *el auto roja → TL': el auto rojo

    Wi = roja, Wi' = rojo, Wc = auto (need to agree)

  • Finding Triggering Features

    Once we have the user's correction (Wi'), we can compare it with Wi at the feature level and find which is the triggering feature.

    Delta function: [formula not preserved]

    If the resulting set is empty, a new binary feature needs to be postulated.
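
The feature-level comparison can be sketched as a set difference over attribute-value pairs. The slide names a delta function without giving its definition here, so the one below (attributes whose values differ between Wi and Wi') is an assumption, and the feature dictionaries for roja, rojo and auto are made up for illustration.

```python
def delta(wi: dict[str, str], wi_corrected: dict[str, str]) -> set[str]:
    """Attributes present in both words but with different values."""
    shared = wi.keys() & wi_corrected.keys()
    return {attr for attr in shared if wi[attr] != wi_corrected[attr]}


# Illustrative lexical features, not taken from the actual lexicon.
roja = {"pos": "adj", "gender": "fem", "number": "sg"}
rojo = {"pos": "adj", "gender": "masc", "number": "sg"}
auto = {"pos": "n", "gender": "masc", "number": "sg"}

triggering = delta(roja, rojo)
print(triggering)  # {'gender'}

# The correction should agree with the clue word on the triggering feature.
print(all(rojo[attr] == auto[attr] for attr in triggering))  # True
```

An empty result from `delta` would correspond to the case where a new binary feature has to be postulated.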

  • TCTool v0.1 (Interactive elicitation of error information)
    Actions:
      - Add a word
      - Delete a word
      - Modify a word
      - Change word order
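
The four correction actions can be modelled as simple operations on a token list. The function names and index conventions below are assumptions for illustration and are not taken from the TCTool code; the example sentences come from the MT output slide.

```python
def add_word(tokens: list[str], position: int, word: str) -> list[str]:
    """Insert a word before the given position."""
    return tokens[:position] + [word] + tokens[position:]


def delete_word(tokens: list[str], position: int) -> list[str]:
    """Remove the word at the given position."""
    return tokens[:position] + tokens[position + 1:]


def modify_word(tokens: list[str], position: int, word: str) -> list[str]:
    """Replace the word at the given position."""
    return tokens[:position] + [word] + tokens[position + 1:]


def move_word(tokens: list[str], source: int, target: int) -> list[str]:
    """Change word order by moving one word to a new position."""
    rest = delete_word(tokens, source)
    return add_word(rest, target, tokens[source])


mt_output = "Viste la mujer".split()
print(" ".join(add_word(mt_output, 1, "a")))  # Viste a la mujer

mt_output = "Gaudi era un artista grande".split()
fixed = modify_word(move_word(mt_output, 4, 3), 3, "gran")
print(" ".join(fixed))  # Gaudi era un gran artista
```

Each function returns a new list instead of editing in place, so a sequence of user actions can be logged and replayed step by step.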

  • TCTool v0.1 specs (Interactive elicitation of error information)
      - Shows the first five translations from the lattice produced by the transfer engine.
      - Asks users to pick the correct translation or, failing that, the best incorrect translation (i.e. the one requiring the fewest corrections).
      - Provides translation correction and error classification help (static tutorial + error example page).
      - CGI scripts in Perl
      - Correction interface in JavaScript (Kenneth Sim and Patrick Milholland)

  • 1st Eng2Spa user study [LREC 2004] (Interactive elicitation of error information)
      - Manual grammar: 12 rules + 442 lexical entries
      - MT error classification (v0.0): 9 linguistically-motivated classes: word order, sense, agreement error (number, person, gender, tense), form, incorrect word and no translation
      - Test set: 32 sentences from the AVENUE Elicitation Corpus (4 correct / 28 incorrect)

  • Data Analysis

    Interested in high precision, even at the expense of lower recall

    Users did not al