NLP Guest Lecture: Morphological Word Embeddings
Pamela Shapiro
Outline
◦ Review: Morphology, Word Embeddings
◦ Research: Morphological Word Embeddings
◦ Application: Neural Machine Translation
Review: Morphology
Morphology: studies sub-word-level phenomena
◦ e.g. prefixes, suffixes, compounding
Morpheme: sub-word component that bears meaning
◦ e.g. play, -ing, -s, -er, re-
Inflectional morphology: doesn't change the core content of a word (including POS); tenses/cases
◦ The base that inflectional morphology attaches to is called the lemma
Derivational morphology: can change meaning, part of speech
Compounding: combining complete words
◦ e.g. toothbrush
Interacts with phonology, orthography, and syntax
Morphology Across Languages
Morphological patterns vary across languages
Analytic: few morphemes per word (e.g. Chinese, Vietnamese)
◦ English is moderately analytic
Fusional: tend to "fuse" multiple features into one affix
◦ Includes most Indo-European and Semitic languages
◦ e.g. habló, where -ó denotes both past tense and 3rd person singular
Agglutinative: tend to have more morphemes per word, more clearly demarcated and regular
◦ Includes Finnish, Hungarian, Turkish
◦ e.g. anlamadım 'I did not understand': verb root anla-, negative marker -m(a), definite past marker -d(ı), and 1st person indicator -(ı)m
Non-concatenative Morphology
Non-concatenative morphology: an additional property of Semitic languages; morphemes are not simply concatenated together, but include "templates" into which roots are inserted
Review: Word Embeddings
Skip-gram objective (word2vec)
Learn input vectors and output vectors simultaneously, s.t. each input vector predicts the output vectors of the surrounding words
Mikolov et al. (2013)
[Figure: skip-gram architecture — INPUT w(t) → PROJECTION → OUTPUT w(t-2), w(t-1), w(t+1), w(t+2)]
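For reference, the skip-gram objective of Mikolov et al. (2013): maximize the average log-probability of context words given the center word, where each word has an input vector $v_w$ and an output vector $v'_w$:

$$\frac{1}{T}\sum_{t=1}^{T}\;\sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j}\mid w_t),\qquad p(w_O\mid w_I)=\frac{\exp\!\left({v'_{w_O}}^{\top}v_{w_I}\right)}{\sum_{w=1}^{W}\exp\!\left({v'_{w}}^{\top}v_{w_I}\right)}$$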
Problem: Sparsity due to Morphology
Rare words don't get their vectors updated enough to have meaningful representations
In morphologically rich languages, maybe only the most common variant gets updated frequently
Solution: bring morphological variants closer together
◦ "morphological word embeddings" is a very vague term that could mean many things, but generally addresses this issue
We try a couple of approaches to this problem in Arabic, training word vectors on Arabic Wikipedia
Approach #1: Bag of Character N-Grams
From existing literature (fastText)
No access to morphological information, but may be able to learn it implicitly
Bojanowski et al. (2017)
[Figure: fastText skip-gram — INPUT: the character n-gram vectors g_{i-1}(w(t)), g_i(w(t)), g_{i+1}(w(t)), ... are summed to form the PROJECTION for w(t); OUTPUT: w(t-2), w(t-1), w(t+1), w(t+2)]
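To make the idea concrete, here is a minimal sketch of fastText-style character n-gram extraction. The boundary markers < and > and the n-gram lengths 3-6 follow Bojanowski et al. (2017); the function name and final comment are illustrative assumptions, not their code.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Bag of character n-grams for one word, fastText-style.
    Boundary markers < and > let prefixes/suffixes be distinguished
    from word-internal sequences."""
    marked = "<" + word + ">"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    grams.add(marked)  # the whole word is also kept as one extra unit
    return grams

# e.g. char_ngrams("play") contains "<pl", "pla", "lay", "ay>", "<play>", ...
# The word's input vector is then the sum of its n-gram vectors.
```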
Approach #2: Lemma-Informed
Lemma-informed (morph)
Acquire lemmas using a specialized morphological analyzer (MADAMIRA); train word and lemma embeddings
[Figure: lemma-informed skip-gram — INPUT: w(t) and its lemma l(t) → PROJECTION → OUTPUT: w(t-2), w(t-1), w(t+1), w(t+2)]
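The slides don't spell out the modified objective; one natural formulation (an assumption here, not necessarily the exact training setup used) has both the word $w_t$ and its lemma $l_t$ predict the surrounding words:

$$\frac{1}{T}\sum_{t=1}^{T}\;\sum_{-c \le j \le c,\; j \ne 0}\Big[\log p(w_{t+j}\mid w_t)+\log p(w_{t+j}\mid l_t)\Big]$$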
Word Embeddings: Evaluation
Intrinsic: assess similarity of words either directly or with analogies, comparing to human judgment
Extrinsic: assess usefulness in a downstream application (e.g. Neural Machine Translation)
Word Similarity
WordSimilarity-353 test set: word pairs with human judgments of "relatedness" on a scale of 0-10
◦ e.g. tiger/cat 7.35, noon/string 0.54 (averaged over several human judgments)
To evaluate success:
◦ Use the model's word vectors to compute cosine similarity (between -1 and 1)
◦ Assess correlation with human judgments, typically with Spearman's rank correlation coefficient
We use a version of this test set that has been translated into Arabic (Hassan and Mihalcea, 2009)
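A minimal sketch of this evaluation loop, assuming `vectors` maps words to numpy arrays and `pairs` holds (word1, word2, human_score) triples from the (translated) test set; both names are mine:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate_word_similarity(vectors, pairs):
    """Spearman correlation between model cosine similarities and
    human relatedness judgments; out-of-vocabulary pairs are skipped."""
    model_scores, human_scores = [], []
    for w1, w2, human in pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(human)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho
```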
Results: Word Similarity
Extrinsic Evaluation: NMT
Use word embeddings to initialize the source word vectors of an NMT system
MT Bitext: Ar→En corpus of TED talks, considered a low-resource setting
◦ ~175k train sentences
◦ 2k dev
◦ 2k test
Background: Neural Machine Translation
[Figure sequence: the source sentence "vielen dank!" is mapped to word embeddings and encoded with a bidirectional LSTM/GRU; a unidirectional LSTM/GRU decoder with attention over the encoder states then generates the translation one word at a time: "thank" → "thank you" → "thank you very" → "thank you very much" → "thank you very much!"]
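As a concrete, deliberately tiny sketch of this architecture shape — not the lecture's actual system — here is an encoder-decoder with dot-product attention in PyTorch; all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Bidirectional GRU encoder + unidirectional GRU decoder with
    dot-product attention, mirroring the diagrams above."""
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)   # source word embeddings
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, bidirectional=True, batch_first=True)
        self.decoder = nn.GRU(dim, 2 * dim, batch_first=True)
        self.out = nn.Linear(4 * dim, tgt_vocab)

    def forward(self, src, tgt):
        enc, _ = self.encoder(self.src_emb(src))        # (B, S, 2*dim)
        dec, _ = self.decoder(self.tgt_emb(tgt))        # (B, T, 2*dim)
        # Each decoder state attends over all encoder states
        attn = torch.softmax(torch.bmm(dec, enc.transpose(1, 2)), dim=-1)
        context = torch.bmm(attn, enc)                  # (B, T, 2*dim)
        return self.out(torch.cat([dec, context], dim=-1))  # scores over vocab
```

The pretrained morphological embeddings from the previous slides would be loaded into `src_emb` here (e.g. via `src_emb.weight.data.copy_(...)`).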
MT Evaluation: BLEU Score
Human evaluation is expensive, so use an automatic metric based on n-gram precision
Not perfect, but shown to correlate with human judgment
Higher is better
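For reference, the standard definition (Papineni et al., 2002), with modified n-gram precisions $p_n$, candidate length $c$, and reference length $r$:

$$\mathrm{BLEU} = \underbrace{\min\!\big(1,\; e^{\,1 - r/c}\big)}_{\text{brevity penalty}} \cdot \exp\!\Big(\sum_{n=1}^{4} \tfrac{1}{4}\log p_n\Big)$$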
Results: MT
Example Sentence
Can discuss BPE at the end if there's time (tl;dr: it segments words with an automatic heuristic)
Conclusion
Incorporating awareness of morphology into word embeddings can help in NMT
Lemma-informed (morph) and bag-of-character-n-grams (fastText) modifications to skip-gram (word2vec)
◦ Both help in both intrinsic & extrinsic evaluation
◦ However, the lemma is more useful for word similarity
◦ Meanwhile, fastText does better with MT
Byte-Pair Encoding
Compression algorithm (Gage, 1994; Sennrich et al., 2016) aimed at keeping frequent words intact while breaking up rare/unknown words
Then translates these "subword units" with the same model
Has become standard practice
Byte-Pair Encoding
Begins with everything split up into characters, iteratively merges the most frequent pair of symbols, then applies the learned merge operations to test data (doesn't cross word boundaries); a toy implementation follows the example below
e.g. all of the animals are going down the path
all | of | the | animals | are | going | down | the | path
1) t h → th
2) a l → al
3) th e → the
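Here is the toy implementation promised above — a minimal sketch of BPE merge learning in the spirit of Sennrich et al. (2016), not their exact code; `learn_bpe` and its inputs are illustrative:

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations from a {word: frequency} dict.
    Words are processed independently, so merges never cross
    word boundaries."""
    vocab = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)       # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():    # apply the merge everywhere
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

# e.g. learn_bpe({"the": 3, "that": 1, "path": 1}, 3)
# learns [('t', 'h'), ('th', 'e'), ...] on this toy corpus
```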
Byte-Pair Encoding
Typical range for the number of BPE merge operations: 15k-100k