
  • Correcting Language Errors: Machine Translation Techniques

    Shamil Chollampatt (shamil@u.nus.edu)

    Natural Language Processing Group, National University of Singapore

  • What are language errors?

    A language error is a deviation from the rules of a language.

    Usually due to a lack of knowledge.

    Made by learners of the language.

    Language errors in writing include spelling, grammatical, word choice, and stylistic errors.


  • How can NLP help?

    Building automatic grammar correction tools and spell checkers.

    Rule-based systems (e.g., Microsoft Word) and advanced software that corrects different kinds of errors (e.g., Grammarly, Ginger).

    A useful tool for non-native writers.

    There is evidence that corrective feedback helps language learning (Leacock et al., Automated Grammatical Error Detection for Language Learners, 2nd ed., 2014).


  • Grammatical Error Correction (GEC)

    Automatic correction of various kinds of errors in written text.

    Example (input, with each correction shown after an arrow):

    The problems bring some effect on → affect engineering design from → in two aspect → aspects, independent innovation and engineering application.

    From the NUS Corpus of Learner English (NUCLE).

    The most popular approach is the machine translation approach.


  • The Translation Approach

    Treats GEC as a translation task from bad English → good English.

    Advantages:
    - Able to learn text transformations from parallel data.
    - Simple, and does not need language-dependent tools.
    - Can correct interacting errors and complex error types.

    Typically uses statistical machine translation (SMT) or neural machine translation (NMT) frameworks.


  • History

    - SMT for countability errors of mass nouns (Brockett et al., 2006)
    - Japanese SMT-based GEC and the Lang-8 corpus (Mizumoto et al., 2011)
    - CoNLL-2014 Shared Task: 2 of the top 3 systems use SMT
    - Neural models as features (Chollampatt et al., 2016)
    - System combination approach beats CoNLL-2014 systems (Susanto et al., 2014)
    - Neural machine translation approach to GEC (Yuan and Briscoe, 2016)
    - GEC-specific features (Junczys-Dowmunt and Grundkiewicz, 2016)
    - Combining word- and character-level SMT (Chollampatt and Ng, 2017)
    - Convolutional neural encoder-decoder for GEC achieves the best results (Chollampatt and Ng, AAAI 2018, to appear)


  • Data

    For training: parallel corpora
    - Annotated learner dataset: NUCLE
    - Crawled from Lang-8

    English corpora: Wikipedia, Common Crawl

    For testing: CoNLL-2014 shared task test set (1,312 sentences)

    Metric: F0.5 using the MaxMatch scorer
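    A minimal sketch of the F0.5 computation behind this metric (the real evaluation applies it to edit-level counts from the MaxMatch scorer, not raw token counts):

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    # F_beta combines precision and recall; beta = 0.5 weighs precision
    # twice as heavily as recall.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# With P = 0.60 and R = 0.30, F0.5 = 0.5 while F1 would only be 0.4:
# the metric rewards precise corrections.
print(f_beta(0.60, 0.30))  # ~0.5
```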


  • Word and Character-level SMT for GEC


  • Statistical Machine Translation Approach

    [Diagram: parallel text (learner text & corrected text) trains the translation model; well-formed English text trains the language model; both feed the SMT decoder, which maps an input sentence to an output sentence.]

  • Statistical Machine Translation Approach

    Using a log-linear framework:

    $T^* = \arg\max_{T} P(T \mid S) = \arg\max_{T} \sum_{i=1}^{N} \lambda_i \, f_i(S, T)$

    where $T^*$ is the best output sentence, $S$ the source sentence, $T$ a candidate output sentence, $N$ the number of features, $\lambda_i$ the $i$-th feature weight, and $f_i$ the $i$-th feature function.

    Feature weights $\lambda_i$ are tuned using MERT, optimizing the F0.5 metric on a development set.

  • Phrase-based SMT

    Input Sentence (S): Thus, advice from hospital plays the important role for this.

    Output Sentence (T*): Thus, advice from the hospital plays an important role in this.

  • Useful GEC-specific Features

    Introduced by Junczys-Dowmunt and Grundkiewicz (CoNLL-2014 Shared Task; EMNLP 2016)

    - Word class language model
    - Operation sequence model
    - Edit operations
    - Sparse edit operation features
    - A web-scale LM
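    As a rough illustration of the edit-operation signal, a difflib-based approximation of insertion/deletion/substitution counts (my own sketch, not the feature implementation used in these systems):

```python
from difflib import SequenceMatcher

def edit_op_counts(src_tokens: list, hyp_tokens: list) -> dict:
    # Align source and hypothesis tokens and count the edit operations
    # that turn one into the other.
    ins = dels = subs = 0
    for op, i1, i2, j1, j2 in SequenceMatcher(a=src_tokens, b=hyp_tokens).get_opcodes():
        if op == "insert":
            ins += j2 - j1
        elif op == "delete":
            dels += i2 - i1
        elif op == "replace":
            subs += max(i2 - i1, j2 - j1)
    return {"insertions": ins, "deletions": dels, "substitutions": subs}

print(edit_op_counts("he go to school yesterday".split(),
                     "he went to school".split()))
# {'insertions': 0, 'deletions': 1, 'substitutions': 1}
```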


  • Neural Network Joint Model

    Joint Model (JM) vs. Language Model (LM)

    Feature function:

    $f(S, T) = \sum_{i=1}^{|T|} \log P(t_i \mid s_{a_i-1}, s_{a_i}, s_{a_i+1}, t_{i-1})$

    SRC: The cat sit in a mat.
    HYP: The cats sat on the mat.

    3+2-gram JM: $P(\text{sat} \mid \text{cat}, \text{sit}, \text{in}, \text{cats})$
    Bigram LM: $P(\text{sat} \mid \text{cats})$

  • Neural Network Joint Model

    Uses a feed-forward neural network (Devlin et al., 2014)

    5+5-gram NNJM for GEC in Chollampatt et al. (IJCAI 2016 and BEA Workshop 2017)

    [Figure: the context words (cat, sit, in, cats) are mapped through embeddings (E_s for source words, E_t for target words) into a feed-forward network that outputs P(target word | context) over the output vocabulary, e.g. P(sat | cat, sit, in, cats).]
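    A minimal PyTorch sketch of such a feed-forward joint model (layer sizes and the 3+1-word context below are illustrative, matching the example above rather than the 5+5-gram configuration of the paper):

```python
import torch
import torch.nn as nn

class NNJM(nn.Module):
    """Feed-forward joint model: P(target word | source window, target history)."""
    def __init__(self, src_vocab: int, tgt_vocab: int, emb: int = 192,
                 hidden: int = 512, src_ctx: int = 3, tgt_ctx: int = 1):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)   # E_s in the figure
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)   # E_t in the figure
        self.ff = nn.Sequential(
            nn.Linear((src_ctx + tgt_ctx) * emb, hidden),
            nn.Tanh(),
            nn.Linear(hidden, tgt_vocab),
        )

    def forward(self, src_window, tgt_history):
        # src_window: (batch, src_ctx) source context ids, e.g. (cat, sit, in)
        # tgt_history: (batch, tgt_ctx) previous target ids, e.g. (cats,)
        x = torch.cat([self.src_emb(src_window).flatten(1),
                       self.tgt_emb(tgt_history).flatten(1)], dim=-1)
        return torch.log_softmax(self.ff(x), dim=-1)  # log P(t_i | context)

model = NNJM(src_vocab=30000, tgt_vocab=30000)
logp = model(torch.tensor([[10, 11, 12]]), torch.tensor([[42]]))
print(logp.shape)  # torch.Size([1, 30000])
```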

  • NNJM Adaptation

    Training: log-likelihood with self-normalization.

    Adaptation: adding a KL-divergence regularization term to the loss function (a sketch of one such objective follows below).

    Adaptation data: higher-quality error annotations; higher error/sentence ratio.
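    One common way to write such a KL-regularized objective (a sketch of the general idea; the exact formulation is given in Chollampatt et al., 2017):

```latex
% Negative log-likelihood on the adaptation data, plus a KL penalty that
% keeps the adapted model P_theta close to the baseline model P_theta0.
\mathcal{L}(\theta) = -\sum_{i} \log P_{\theta}(t_i \mid c_i)
  \;+\; \lambda \sum_{i} D_{\mathrm{KL}}\!\left(
      P_{\theta_0}(\cdot \mid c_i) \,\middle\|\, P_{\theta}(\cdot \mid c_i) \right)
```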


  • SMT for Spelling Correction

    Added as a post-processing step to the word-level SMT.

    Character-level SMT takes the unknown words from the word-level SMT system and generates candidates (possibly including non-words).

    Re-scoring with a language model filters away non-word candidates and picks the best correction based on context.

    [Example: the character sequence "u t l i s e s" yields candidates such as "u t i l i s e s", "u t i l i z e s", and other near variants.]
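    A sketch of the rescoring step (the vocabulary, candidates, and LM scorer below are stand-ins for illustration):

```python
VOCAB = {"utilises", "utilizes"}          # known words; a real system uses a large vocabulary

def lm_logprob(sentence: str) -> float:  # stand-in for a real language model
    return -len(sentence)

def correct_spelling(unknown: str, candidates, left_ctx: str, right_ctx: str) -> str:
    # Filter away non-word candidates, then pick the candidate the
    # language model prefers in the surrounding context.
    real_words = [c for c in candidates if c in VOCAB]
    return max(real_words or [unknown],
               key=lambda c: lm_logprob(f"{left_ctx} {c} {right_ctx}"))

cands = ["utilises", "utilizes", "utilise", "utilishes"]  # from character-level SMT
print(correct_spelling("utlises", cands, "the method", "existing data"))
```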

  • Setup

    Development data: 5,458 sentences from NUCLE with at least 1 error per sentence.

    Parallel training data for word-level SMT: Lang-8, NUCLE (2.21M sentences, 26.77M source words)

    Data for character-level SMT: unique words on the corrected side of NUCLE and the corpora of misspellings (http://www.dcs.bbk.ac.uk/~ROGER/corpora.html)

    LM training data: Wikipedia (1.78B tokens), Common Crawl (94B tokens)

  • Results

    F0.5 on the CoNLL-2014 test set:

    System                  F0.5
    SMT-GEC                 43.16
    + GEC features          45.90
    + web-scale LM          49.25
    + adapted NNJM          51.70
    + SMT for spelling      53.14
    R&R (2016)              47.40
    J&G (2016)              49.52

    R&R (2016): Rozovskaya and Roth (ACL 2016). J&G (2016): Junczys-Dowmunt and Grundkiewicz (EMNLP 2016).

  • Multilayer Convolutional Encoder and Decoder Neural Network for GEC

  • Encoder-Decoder Approach

    [Diagram: the input sentence is read by an encoder; attention connects the encoder to the decoder, which produces the output sentence.]

  • Encoder-Decoder Approach

    Prior work in GEC: recurrent neural network (RNN)-based approaches (Bahdanau et al., 2015)

    We use a fully convolutional neural network (CNN)-based approach (Gehring et al., 2017)

  • A Multilayer Convolutional Encoder-Decoder

    Encoder consists of seven layers.

    Convolution operation: $h_i^l = \mathrm{Conv}(h_{i-1}^{l-1}, h_i^{l-1}, h_{i+1}^{l-1})$

    Gated linear units (GLUs): $\mathrm{GLU}(h_i^l) = h_{i,1:d}^l \odot \sigma(h_{i,d+1:2d}^l)$

    Residual connections: $h_i^l = \mathrm{GLU}(h_i^l) + h_i^{l-1}$
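    A minimal PyTorch sketch of one such encoder layer (dimensions illustrative; a width-3 convolution produces 2d channels, a GLU gates them back to d, and a residual connection adds the layer input):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvGLULayer(nn.Module):
    def __init__(self, d: int = 1024):
        super().__init__()
        # 2*d output channels: the first d are values, the second d are gates.
        self.conv = nn.Conv1d(d, 2 * d, kernel_size=3, padding=1)

    def forward(self, h):                  # h: (batch, d, seq_len)
        out = F.glu(self.conv(h), dim=1)   # GLU: values * sigmoid(gates)
        return out + h                     # residual connection

encoder = nn.Sequential(*[ConvGLULayer(1024) for _ in range(7)])  # seven layers
h = torch.randn(2, 1024, 10)               # batch of 2, length-10 sequences
print(encoder(h).shape)                     # torch.Size([2, 1024, 10])
```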

  • A Multilayer Convolutional Encoder-Decoder

    Decoder consists of seven layers of convolutions and non-linearities.

    + Attention:

    $\alpha_{i,j} = \dfrac{\exp(d_i \cdot z_j)}{\sum_{k=1}^{|S|} \exp(d_i \cdot z_k)}$

    $c_i = \sum_{j=1}^{|S|} \alpha_{i,j} \,(z_j + e_j)$

    where $d_i$ is the decoder state for output position $i$, $z_j$ the final encoder state for source position $j$, and $e_j$ the source word embedding.

  • Pre-training Word Embeddings

    Word embeddings are pre-trained and used to initialize the network. They are trained using fastText (Bojanowski et al., 2017) on Wikipedia, which uses the underlying character n-gram sequences of words.

    Advantages:
    - Reliable embeddings can be constructed for rarer words.
    - Morphology of words is taken into account.
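    A sketch of pre-training such subword-aware embeddings with gensim's FastText implementation (the talk used the fastText toolkit of Bojanowski et al. on Wikipedia; the corpus and parameters below are toy stand-ins apart from the 500-dimensional vectors):

```python
from gensim.models import FastText

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "cats", "sit", "on", "mats"]]     # toy corpus
model = FastText(sentences, vector_size=500, window=5,
                 min_count=1, min_n=3, max_n=6)         # char n-grams of length 3-6

# Because vectors are composed from character n-grams, even a rare or
# unseen word still gets a usable embedding.
print(model.wv["matting"].shape)  # (500,)
```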

  • Ensembling and Re-scoring

    Ensembling of multiple models: the log probabilities from the models are averaged during prediction of each output word (see the sketch below).

    The final beam candidates are re-scored using features:
    - Edit operation (EO): #insertions, #deletions, #substitutions
    - Language model (LM): web-scale LM score, #words

    Feature weight tuning is done as in SMT: MERT optimizing F0.5 on the development data.
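    A minimal sketch of the ensembling step (shapes invented; in practice this happens inside beam search):

```python
import numpy as np

def ensemble_next_word(log_probs_per_model):
    # log_probs_per_model: (n_models, vocab_size) log P(word | history)
    avg = log_probs_per_model.mean(axis=0)   # average log probabilities
    return int(avg.argmax()), avg

rng = np.random.default_rng(1)
scores = rng.normal(size=(4, 30000))          # 4 models, 30K vocabulary
logp = scores - np.logaddexp.reduce(scores, axis=1, keepdims=True)  # normalize
word_id, avg = ensemble_next_word(logp)
print(word_id)
```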


  • Model and Training Details

    Data: as in Chollampatt and Ng (BEA 2017), except that only annotated sentence pairs are used during training.

    Vocabulary: 30K most frequent words on the source and target sides

    Embedding dimensions: 500

    Encoder/decoder output vector dimensions: 1024

  • Results

    F0.5 on the CoNLL-2014 test set:

    System                          F0.5
    Multilayer Conv Enc-Dec         45.36
    + pre-trained embeddings        46.38
    + ensemble of 4 models          49.33
    + re-scoring (EO, LM)           54.13
    Chollampatt and Ng (2017)       53.14
    Ji et al. (2017) without LM     41.53
    Ji et al. (2017)                45.15
    Schmaltz et al. (2017)          41.37

  • Challenges and Future Work

    Lack of good-quality parallel data.

    Going beyond the sentence level.

    Adaptation to diverse learners.

  • Thank You

    Email: shamil@u.nus.edu
    Website: shamilcm.github.io