Correcting Language Errors: Machine Translation Techniques
Shamil Chollampatt ([email protected])
Natural Language Processing Group, National University of Singapore
What are language errors?

• A "language error" is a deviation from the rules of a language.
• Due to lack of knowledge.
• Made by learners of the language.
• Language errors in writing include spelling, grammatical, word choice, and stylistic errors.
How can NLP help?

• Building automatic grammar correction tools and spell checkers.
• Rule-based systems (e.g., Microsoft Word) and advanced software that corrects different kinds of errors (e.g., Grammarly, Ginger).
• Useful tools for non-native writers.
• Evidence that corrective feedback helps language learning (Leacock et al., Automated Grammatical Error Detection for Language Learners, 2nd ed., 2014).
Grammatical Error Correction, or "GEC"

• Automatic correction of various kinds of errors in written text.

Example (from the NUS Corpus of Learner English, NUCLE):
  Input: The problems bring some effect on engineering design from two aspect, independent innovation and engineering application.
  Corrected: The problems affect engineering design in two aspects, independent innovation and engineering application.

• The most popular approach is the machine translation approach.
The Translation Approach

• Treats GEC as a translation task from "bad" English → "good" English.

Advantages:
✓ Able to learn text transformations from parallel data.
✓ Simple, and does not need language-dependent tools.
✓ Can correct interacting errors and complex error types.

• Typically uses statistical machine translation (SMT) or neural machine translation (NMT) frameworks.
History

• SMT for countability errors of mass nouns (Brockett et al., 2006)
• Japanese SMT-based GEC and the Lang-8 corpus (Mizumoto et al., 2011)
• CoNLL-2014 Shared Task: 2 of the top 3 systems use SMT
• System combination approach beats CoNLL-2014 systems (Susanto et al., 2014)
• Neural models as features (Chollampatt et al., 2016)
• Neural machine translation approach to GEC (Yuan and Briscoe, 2016)
• GEC-specific features (Junczys-Dowmunt and Grundkiewicz, 2016)
• Combining word- and character-level SMT (Chollampatt and Ng, 2017)
• Convolutional neural encoder-decoder for GEC achieves the best results (Chollampatt and Ng, AAAI 2018, to appear)
Data

For training:
• Parallel corpora:
  - Annotated learner dataset: NUCLE
  - Crawled from Lang-8
• English corpora: Wikipedia, CommonCrawl

For testing: CoNLL-2014 shared task test set (1,312 sentences)
Metric: F0.5 using the MaxMatch scorer
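For reference, $F_{0.5}$ weights precision twice as much as recall, reflecting that precision matters more than recall when correcting learner text; with precision $P$ and recall $R$ computed by the scorer over system and gold edits:

$$F_{0.5} = \frac{(1 + 0.5^2)\, P \cdot R}{0.5^2 \cdot P + R}$$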
Word and Character-level SMT for GEC
Statistical Machine Translation Approach

[Diagram: a translation model is trained on parallel text (learner text & corrected text), and a language model is trained on well-formed English text; the SMT decoder combines both to map an input sentence to an output sentence.]
Statistical Machine Translation Approach

• Uses a log-linear framework:

$$T^* = \arg\max_{T} P(T \mid S) = \arg\max_{T} \sum_{i=1}^{N} \lambda_i f_i(S, T)$$

where $T^*$ is the best output sentence, $S$ the source sentence, $T$ a candidate output sentence, $N$ the number of features, $\lambda_i$ the $i$-th feature weight, and $f_i$ the $i$-th feature function.

• Feature weights $\lambda_i$ are tuned using MERT, optimizing the F0.5 metric on a development set.
Phrase-based SMT

Input Sentence (S):
  Thus, advice from hospital plays the important role for this.

Output Sentence (T*):
  Thus, advice from the hospital plays an important role in this.
Useful GEC-specific Features

• Introduced by Junczys-Dowmunt and Grundkiewicz (CoNLL-2014 Shared Task; EMNLP 2016):
  ‣ Word Class Language Model
  ‣ Operation Sequence Model
  ‣ Edit Operations (sketched below)
  ‣ Sparse Edit Operation Features
  ‣ A Web-scale LM
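The edit-operation features count the insertions, deletions, and substitutions a hypothesis applies to the source. A minimal sketch of that idea, using Python's difflib for token alignment (the actual system computes these features inside the decoder):

```python
# Count insertions, deletions, and substitutions between source and
# hypothesis tokens, a simplified stand-in for the edit-operation features.
import difflib

def edit_operation_counts(source_tokens, hyp_tokens):
    ins = dele = sub = 0
    matcher = difflib.SequenceMatcher(a=source_tokens, b=hyp_tokens)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "insert":
            ins += j2 - j1
        elif op == "delete":
            dele += i2 - i1
        elif op == "replace":
            sub += max(i2 - i1, j2 - j1)
    return {"insertions": ins, "deletions": dele, "substitutions": sub}

src = "Thus , advice from hospital plays the important role for this .".split()
hyp = "Thus , advice from the hospital plays an important role in this .".split()
print(edit_operation_counts(src, hyp))
# -> {'insertions': 1, 'deletions': 0, 'substitutions': 2}
```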
Neural Network Joint Model

• Joint Model (JM) vs. Language Model (LM)
• Feature function:

$$f(T, S) = P(T \mid S) \approx \prod_{i=1}^{|T|} P(t_i \mid s_{j-1}, s_j, s_{j+1}, t_{i-1})$$

where $s_j$ is the source word aligned to the target word $t_i$.

Example:
  SRC: The cat sit in a mat.
  HYP: The cats sat on the mat.
  3+2-gram JM: $P(\text{sat} \mid \text{cat}, \text{sit}, \text{in}, \text{cats})$
  Bigram LM: $P(\text{sat} \mid \text{cats})$
Neural Network Joint Model

• Uses a feed-forward neural network (Devlin et al., 2014).
• 5+5-gram NNJM for GEC in Chollampatt et al. (IJCAI 2016 and BEA Workshop 2017).

[Diagram: the source context words (cat, sit, in) and the previous target word (cats) are looked up in embedding matrices $E_s$ and $E_t$, passed through hidden layers, and a softmax over the output vocabulary yields $P(\text{target word} \mid \text{context})$, e.g., $P(\text{sat} \mid \text{cat}, \text{sit}, \text{in}, \text{cats})$.]
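A minimal PyTorch sketch of such a feed-forward joint model, mirroring the 3+2-gram example above (a 3-word source window plus one previous target word); the vocabulary sizes, single hidden layer, and dimensions are illustrative assumptions, not the exact configuration of the cited papers:

```python
# Feed-forward NNJM sketch: embed a source window and the target history,
# then predict the next target word over the output vocabulary.
import torch
import torch.nn as nn

class FeedForwardNNJM(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=192, hidden=512,
                 src_window=3, tgt_history=1):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)  # E_s
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)  # E_t
        n_ctx = src_window + tgt_history
        self.hidden = nn.Linear(n_ctx * emb_dim, hidden)
        self.out = nn.Linear(hidden, tgt_vocab)          # output vocabulary layer

    def forward(self, src_ctx, tgt_ctx):
        # src_ctx: (batch, src_window) window around the aligned source word
        # tgt_ctx: (batch, tgt_history) previous target words
        x = torch.cat([self.src_emb(src_ctx).flatten(1),
                       self.tgt_emb(tgt_ctx).flatten(1)], dim=1)
        h = torch.tanh(self.hidden(x))
        return torch.log_softmax(self.out(h), dim=-1)  # log P(target word | context)

# P(sat | cat, sit, in, cats) as a toy call with made-up token ids.
model = FeedForwardNNJM(src_vocab=100, tgt_vocab=100)
log_probs = model(torch.tensor([[11, 12, 13]]), torch.tensor([[21]]))
print(log_probs.shape)  # torch.Size([1, 100])
```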
NNJM Adaptation

• Training: log-likelihood objective with self-normalization.
• Adaptation: adding a KL-divergence regularization term to the loss function (a sketch follows).
• Adaptation data:
  ✓ Higher-quality error annotations
  ✓ Higher error/sentence ratio
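A hedged sketch of what such an adaptation objective can look like: cross-entropy on the adaptation data plus a KL term that keeps the adapted model close to the baseline model's output distribution. The weighting and the exact form of the term are assumptions for illustration, not the paper's exact loss:

```python
# Adaptation loss sketch: NLL on in-domain data + KL(P_baseline || P_adapted).
# The kl_weight and the direction/placement of the KL term are assumptions.
import torch
import torch.nn.functional as F

def adaptation_loss(adapted_logits, baseline_logits, target_ids, kl_weight=0.5):
    # Standard negative log-likelihood on the adaptation data.
    nll = F.cross_entropy(adapted_logits, target_ids)
    # KL divergence anchoring the adapted model to the baseline model.
    kl = F.kl_div(F.log_softmax(adapted_logits, dim=-1),
                  F.softmax(baseline_logits.detach(), dim=-1),
                  reduction="batchmean")
    return nll + kl_weight * kl

# Toy usage with random logits over a 100-word vocabulary.
logits_adapted = torch.randn(8, 100, requires_grad=True)
logits_baseline = torch.randn(8, 100)
targets = torch.randint(0, 100, (8,))
print(adaptation_loss(logits_adapted, logits_baseline, targets))
```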
SMT for Spelling Correction

• Added as a post-processing step to the word-level SMT.
• Character-level SMT takes the unknown words left by the word-level SMT system and generates candidates (which may be non-words).
• Rescoring with a language model filters away non-word candidates and picks the best correction based on context.

Example: for the unknown word "utlises", the character-level system generates candidates such as "utilises", "utilizes", "utilise", and "utilishes".
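A minimal sketch of the rescoring step; the candidate list mirrors the example above, while the vocabulary and scores are made up, and a unigram score stands in for the n-gram LM that the actual system applies in sentence context:

```python
# Filter character-level SMT candidates through a word list (dropping
# non-words), then pick the best surviving candidate by LM score.
def correct_unknown_word(candidates, vocabulary, lm_score):
    real_words = [c for c in candidates if c in vocabulary]
    if not real_words:
        return None  # leave the word unchanged if no candidate survives
    return max(real_words, key=lm_score)

candidates = ["utilises", "utilizes", "utilise", "utilishes"]
vocabulary = {"utilises", "utilizes", "utilise"}          # made-up word list
logprob = {"utilises": -9.1, "utilizes": -8.7, "utilise": -9.8}.get
print(correct_unknown_word(candidates, vocabulary,
                           lambda w: logprob(w, float("-inf"))))
# -> "utilizes" under these made-up scores
```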
Setup

• Development data: 5,458 sentences from NUCLE with at least 1 error per sentence.
• Parallel training data for word-level SMT: Lang-8 and NUCLE (2.21M sentences, 26.77M source words).
• Data for character-level SMT: unique words on the corrected side of NUCLE and the corpora of misspellings (http://www.dcs.bbk.ac.uk/~ROGER/corpora.html).
• LM training data: Wikipedia (1.78B tokens), CommonCrawl (94B tokens).
Results (F0.5 on the CoNLL-2014 test set):

  SMT-GEC              43.16
  + GEC features       45.90
  + Web-scale LM       49.25
  + Adapted NNJM       51.70
  + SMT for spelling   53.14
  R&R (2016)           47.40
  J&G (2016)           49.52

R&R (2016): Rozovskaya and Roth (ACL 2016). J&G (2016): Junczys-Dowmunt and Grundkiewicz (EMNLP 2016).
Multilayer Convolutional Encoder and Decoder Neural Network for GEC
Encoder-Decoder Approach

[Diagram: the encoder reads the input sentence, and the decoder generates the output sentence, attending over the encoder states.]

• Prior work in GEC: recurrent neural network (RNN)-based approaches (Bahdanau et al., 2015).
• We use a fully convolutional neural network (CNN)-based approach (Gehring et al., 2017).
A Multilayer Convolutional Encoder-Decoder

Encoder: consists of seven layers.

• Convolution operation: $\mathbf{f}_i^l = \mathrm{Conv}(\mathbf{h}_{i-1}^{l-1}, \mathbf{h}_i^{l-1}, \mathbf{h}_{i+1}^{l-1})$
• Gated linear units (GLUs): $\mathrm{GLU}(\mathbf{f}_i^l) = \mathbf{f}_{i,1:d}^l \otimes \sigma(\mathbf{f}_{i,d+1:2d}^l)$
• Residual connections: $\mathbf{h}_i^l = \mathrm{GLU}(\mathbf{f}_i^l) + \mathbf{h}_i^{l-1}$
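A PyTorch sketch of one such encoder layer: a width-3 convolution producing $2d$ channels, a GLU, and a residual connection. The dimension $d = 1024$ matches the output vector size given later; other details are illustrative:

```python
# One encoder layer: Conv -> GLU -> residual, stacked seven times.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvGLULayer(nn.Module):
    def __init__(self, d=1024, kernel=3):
        super().__init__()
        # 2d output channels: d for the content half, d for the gate half.
        self.conv = nn.Conv1d(d, 2 * d, kernel, padding=kernel // 2)

    def forward(self, h_prev):
        # h_prev: (batch, d, seq_len) hidden states from layer l-1.
        f = self.conv(h_prev)   # f_i^l = Conv(h_{i-1}^{l-1}, h_i^{l-1}, h_{i+1}^{l-1})
        g = F.glu(f, dim=1)     # GLU: f_{1:d} * sigmoid(f_{d+1:2d})
        return g + h_prev       # residual: h_i^l = GLU(f_i^l) + h_i^{l-1}

# Seven-layer encoder stack, as described above.
encoder = nn.Sequential(*[ConvGLULayer() for _ in range(7)])
x = torch.randn(2, 1024, 10)  # (batch, channels, source length)
print(encoder(x).shape)       # torch.Size([2, 1024, 10])
```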
A Multilayer Convolutional Encoder-Decoder

Decoder: consists of seven layers.

• Consists of convolutions and non-linearities, plus attention:

$$\alpha_{j,i}^l = \frac{\exp(\mathbf{e}_i^\top \mathbf{z}_j^l)}{\sum_{k=1}^{m} \exp(\mathbf{e}_k^\top \mathbf{z}_j^l)} \qquad \mathbf{x}_j^l = \sum_{i=1}^{m} \alpha_{j,i}^l \,(\mathbf{e}_i + \mathbf{s}_i)$$

where $\mathbf{e}_i$ are the source word representations, $\mathbf{s}_i$ the encoder outputs, $\mathbf{z}_j^l$ the decoder state at target position $j$ in layer $l$, and $m$ the source length.
Pre-training Word Embeddings

• Word embeddings are pre-trained and used for initialization.
• Trained using fastText (Bojanowski et al., 2017) on Wikipedia.
• Uses the underlying character n-gram sequences of words.

Advantages:
✓ Reliable embeddings can be constructed for rarer words.
✓ Morphology of words is taken into account.
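A hedged sketch of this pre-training step with the fastText Python bindings; "wiki.txt" is a placeholder path to a preprocessed Wikipedia dump, and the n-gram range is the library default rather than a confirmed setting:

```python
# Pre-train subword-aware embeddings with fastText.
import fasttext

model = fasttext.train_unsupervised(
    "wiki.txt",          # placeholder: plain-text Wikipedia corpus
    model="skipgram",
    dim=500,             # matches the embedding size reported later
    minn=3, maxn=6,      # character n-gram range: the source of subword info
)

# Because vectors are composed from character n-grams, rare or unseen
# words still get usable embeddings:
print(model.get_word_vector("unallowable")[:5])
```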
Ensembling and Re-scoring

• Ensembling of multiple models: the log probabilities of multiple models are averaged during the prediction of each output word.
• The final beam candidates are re-scored using features:
  - Edit operations (EO): #insertions, #deletions, #substitutions
  - Language model (LM): web-scale LM score, #words
• Feature weight tuning is done as in SMT: MERT optimizing F0.5 on the development data. (A sketch of both steps follows.)
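A minimal sketch of both steps; the model stubs, feature values, and weights below are made up (in the real system the weights come from MERT tuning):

```python
# Ensembling: average per-word log probabilities across models.
# Re-scoring: re-rank final beam candidates with a weighted feature sum.

def ensemble_log_prob(word, context, models):
    """Average the log probabilities assigned by each model."""
    return sum(m(word, context) for m in models) / len(models)

def rescore(candidates, features, weights):
    """candidates: list of (sentence, beam_log_prob) pairs."""
    def total(cand):
        sent, logp = cand
        return logp + sum(w * f(sent) for f, w in zip(features, weights))
    return max(candidates, key=total)

# Toy model stubs standing in for trained networks.
models = [lambda w, ctx: -1.2, lambda w, ctx: -0.8]
print(ensemble_log_prob("sat", ("cats",), models))  # -1.0

# Hypothetical re-scoring features: word count and a stand-in LM score.
features = [lambda s: len(s.split()), lambda s: -0.1 * len(s)]
weights = [0.2, 1.0]
beam = [("He has a car .", -3.2), ("He have a car .", -2.9)]
print(rescore(beam, features, weights))
```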
Model and Training Details

• Data: as in Chollampatt and Ng (BEA 2017), except that only annotated sentence pairs are used during training.
• Vocabulary: 30K most frequent words on the source and target sides.
• Embedding dimensions: 500
• Encoder/decoder output vector dimensions: 1024
Results (F0.5 on the CoNLL-2014 test set):

  Multilayer Conv Enc-Dec       45.36
  + Pre-training embeddings     46.38
  + Ensemble of 4 models        49.33
  + Re-scoring (EO, LM)         54.13
  Chollampatt and Ng (2017)     53.14
  Ji et al. (2017) without LM   41.53
  Ji et al. (2017)              45.15
  Schmaltz et al. (2017)        41.37
Challenges and Future Work

• Lack of good-quality parallel data.
• Going beyond the sentence level.
• Adaptation to diverse learners.
Thank You

Email: [email protected]
Homepage: shamilcm.github.io