Representation Learning for Reading Comprehension
Russ Salakhutdinov
Machine Learning Department, Carnegie Mellon University
Canadian Institute for Advanced Research
Joint work with Bhuwan Dhingra, Zhilin Yang, Ye Yuan, Junjie Hu, Hanxiao Liu, and William Cohen
Talk Roadmap
• Multiplicative and Fine-grained Attention
• Incorporating Knowledge as Explicit Memory for RNNs
• Generative Domain-Adaptive Nets
Who-Did-What Dataset
• Document: “…arrested Illinois governor Rod Blagojevich and his chief of staff John Harris on corruption charges … included Blagojevich allegedly conspiring to sell or trade the senate seat left vacant by President-elect Barack Obama…”
• Query: President-elect Barack Obama said Tuesday he was not aware of alleged corruption by X who was arrested on charges of trying to sell Obama’s senate seat.
• Answer: Rod Blagojevich
Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP, 2016
Introduction
• The cloze-style Q/A task involves tuples of the form $(d, q, a)$:
Ø $d$ is a document (paragraph/context)
Ø $q$ is a question over the contents of that document
Ø $a$ is the answer to this query
• The answer comes from a fixed vocabulary.
• Task: given a document-query pair $(d, q)$, find the answer $a$ in that vocabulary which answers $q$.
Prior Work
• LSTMs with Attention: Hermann et al., 2015; Hill et al., 2015; Chen et al., 2016; Bahdanau et al., 2014
• Memory Networks: Weston et al., 2014; Sukhbaatar et al., 2015; Bajgar et al., 2016
• Attention Sum Reader: Kadlec et al., 2016; Cui et al., 2016
• Dynamic Entity Representations: Kobayashi et al., 2016
• Neural Semantic Encoders: Munkhdalai & Yu, 2016
• Iterative Attentive Reader: Sordoni et al., 2016
• ReasoNet: Shen et al., 2016
Recurrent Neural Network
[Figure: RNN unrolled over inputs x1, x2, x3 with hidden states h1, h2, h3. Equation labels: nonlinearity; hidden state at previous time step; input at time step t.]
Multiplicative Integration
• Replace $\phi(Wx + Uz + b)$
• With $\phi(Wx \odot Uz + b)$
• Or more generally $\phi(\alpha \odot Wx \odot Uz + \beta_1 \odot Uz + \beta_2 \odot Wx + b)$
Wu et al., NIPS 2016
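The multiplicative integration idea can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the formulation in Wu et al. (2016): the additive block tanh(Wx + Uz + b) becomes tanh(Wx ⊙ Uz + b), and the general form adds gating vectors alpha, beta1 and beta2. The function names, the sizes and the random parameters are hypothetical and not taken from the talk.

```python
import numpy as np

def additive_block(W, U, b, x, z):
    """Standard additive building block: tanh(Wx + Uz + b)."""
    return np.tanh(W @ x + U @ z + b)

def mi_block(W, U, b, x, z):
    """Multiplicative integration: tanh(Wx * Uz + b), with * element-wise."""
    return np.tanh((W @ x) * (U @ z) + b)

def general_mi_block(W, U, b, alpha, beta1, beta2, x, z):
    """General form: tanh(alpha*Wx*Uz + beta1*Uz + beta2*Wx + b)."""
    wx, uz = W @ x, U @ z
    return np.tanh(alpha * wx * uz + beta1 * uz + beta2 * wx + b)

# Illustrative sizes and random parameters (assumptions, not from the talk).
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = rng.normal(size=(n_hid, n_in))   # input-to-hidden weights
U = rng.normal(size=(n_hid, n_hid))  # hidden-to-hidden weights
b = np.zeros(n_hid)
x = rng.normal(size=n_in)            # input at time step t
z = rng.normal(size=n_hid)           # hidden state at the previous time step

ones, zeros = np.ones(n_hid), np.zeros(n_hid)
h_mi = mi_block(W, U, b, x, z)
h_general = general_mi_block(W, U, b, ones, zeros, zeros, x, z)
```

Setting alpha to ones and both betas to zeros recovers the purely multiplicative block, while alpha at zeros and both betas at ones recovers the additive one, which is why the general form subsumes both.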
Representing Document/Query
• Let $x_1, \ldots, x_{|D|}$ denote the token embeddings of the document.
• Let $y_1, \ldots, y_{|Q|}$ denote the token embeddings of the query.
• $|D|$ and $|Q|$ denote the document and query lengths, respectively.
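Representing Document/Query
• Forward RNN reads sentences from left to right, giving $\overrightarrow{h}_t$
• Backward RNN reads sentences from right to left, giving $\overleftarrow{h}_t$
• The hidden states are then concatenated: $h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$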
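The forward/backward reading and concatenation can be sketched as follows. This is a minimal NumPy illustration that assumes a plain tanh RNN cell for brevity (the talk uses GRUs); `run_rnn`, `bidirectional_encode`, `make_params` and all sizes are hypothetical names and values chosen for the example.

```python
import numpy as np

def run_rnn(X, W, U, b):
    """Run a tanh RNN over the rows of X (T, n_in); return states of shape (T, n_hid)."""
    h = np.zeros(U.shape[0])
    states = []
    for x_t in X:
        h = np.tanh(W @ x_t + U @ h + b)
        states.append(h)
    return np.stack(states)

def bidirectional_encode(X, fwd_params, bwd_params):
    """Concatenate left-to-right and right-to-left hidden states for each token."""
    h_fwd = run_rnn(X, *fwd_params)
    # Read the sequence reversed, then flip back so row t aligns with token t.
    h_bwd = run_rnn(X[::-1], *bwd_params)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(0)
T, n_in, n_hid = 5, 4, 3  # illustrative sizes (assumptions)

def make_params():
    return (rng.normal(size=(n_hid, n_in)),
            rng.normal(size=(n_hid, n_hid)),
            np.zeros(n_hid))

fwd_params, bwd_params = make_params(), make_params()
X = rng.normal(size=(T, n_in))  # token embeddings, one row per token
H = bidirectional_encode(X, fwd_params, bwd_params)
```

Each row of `H` then carries context from both sides of the corresponding token.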
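Representing Document/Query
• Use GRUs to encode a document and a query, giving $D$ and $Q$.
• Note that, for example, $Q$ is a matrix (one vector per query token).
• We can then use the Gated Attention mechanism:
Ø use the element-wise multiplication operator to model the interactions between each document token representation $d_i$ and its token-specific query representation $\tilde{q}_i$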
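Gated Attention Mechanism
• For each token $d_i$ in $D$, we form a token-specific representation of the query: $\alpha_i = \mathrm{softmax}(Q^\top d_i)$, $\tilde{q}_i = Q \alpha_i$, and the gated output is $x_i = d_i \odot \tilde{q}_i$.
Dhingra, Liu, Yang, Cohen, Salakhutdinov, ACL 2017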
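A minimal NumPy sketch of gated attention, assuming the formulation in Dhingra et al. (ACL 2017): each document token attends over the query tokens with a softmax, and the resulting query summary gates the token by element-wise multiplication. Here the token vectors are stored as rows, and the function names and sizes are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(D, Q):
    """D: (|D|, n) document token states; Q: (|Q|, n) query token states.

    Returns the gated document states X (|D|, n) and the attention weights.
    """
    alpha = softmax(D @ Q.T, axis=1)  # (|D|, |Q|): attention of each token over the query
    Q_tilde = alpha @ Q               # (|D|, n): token-specific query representation
    X = D * Q_tilde                   # element-wise (multiplicative) gating
    return X, alpha

rng = np.random.default_rng(0)
D = rng.normal(size=(5, 4))  # 5 document tokens, hidden size 4 (illustrative)
Q = rng.normal(size=(3, 4))  # 3 query tokens
X, alpha = gated_attention(D, Q)
```

In a multi-hop model the output `X` would be fed to the next document encoder layer, so the query can filter the document representation several times.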
Multi-hop Architecture
• Many QA tasks require reasoning over multiple sentences.
• Need to perform several passes over the context.
Dhingra, Liu, Yang, Cohen, Salakhutdinov, ACL 2017
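Output Model
• Probability that a particular token in the document answers the query:
Ø Take an inner product between the query embedding $q$ and the output of the last layer $d_i$ for each token, normalized over the document: $s_i \propto \exp(q^\top d_i)$
• The probability of a particular candidate $c$ is then aggregated over all document tokens which appear in $c$: $\Pr(c \mid d, q) \propto \sum_{i \in I(c, d)} s_i$, where $I(c, d)$ is the set of positions where a token in $c$ appears in the document $d$.
Pointer Sum Attention of Kadlec et al., 2016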
Output Model
• The probability of a particular candidate is then aggregated over all document tokens which appear in c.
• The candidate with maximum probability is selected as the predicted answer: $a^* = \arg\max_c \Pr(c \mid d, q)$
• Use cross-entropy loss between the predicted probabilities and the true answers.
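The output steps above (token scores, aggregation per candidate, argmax, cross-entropy) can be sketched as follows. This is a minimal illustration under stated assumptions: candidates are single tokens, scores are renormalized over the candidate set, and all names, sizes and the toy document are hypothetical.

```python
import numpy as np

def token_probabilities(D_last, q):
    """Softmax over document positions of the inner products between q and each token state."""
    scores = D_last @ q
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def candidate_probabilities(s, doc_tokens, candidates):
    """Sum token probabilities over the positions where each candidate occurs, then renormalize."""
    raw = {c: float(sum(s[i] for i, tok in enumerate(doc_tokens) if tok == c))
           for c in candidates}
    total = sum(raw.values())
    return {c: p / total for c, p in raw.items()}

def predict(cand_probs):
    """Pick the candidate with maximum probability."""
    return max(cand_probs, key=cand_probs.get)

def cross_entropy(cand_probs, true_answer):
    """Negative log-probability of the true answer."""
    return -np.log(cand_probs[true_answer])

# Toy document (assumption): one candidate appears twice.
doc_tokens = ["obama", "blagojevich", "harris", "blagojevich"]
candidates = ["obama", "blagojevich", "harris"]

rng = np.random.default_rng(0)
D_last = rng.normal(size=(len(doc_tokens), 6))  # last-layer token states
q = rng.normal(size=6)                          # query embedding

s = token_probabilities(D_last, q)
probs = candidate_probabilities(s, doc_tokens, candidates)
answer = predict(probs)
loss = cross_entropy(probs, "blagojevich")
```

Summing over positions means a candidate mentioned several times accumulates probability mass from every mention.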
Datasets
• We have looked at 6 datasets.
• The word lookup was initialized with GloVe vectors (Pennington et al., 2014).
Effect of Multiplicative Gating
• Performance of different gating functions on the “Who did What” (WDW) dataset.
Analysis of Attention
• Context: “…arrested Illinois governor Rod Blagojevich and his chief of staff John Harris on corruption charges … included Blagojevich allegedly conspiring to sell or trade the senate seat left vacant by President-elect Barack Obama…”
• Query: “President-elect Barack Obama said Tuesday he was not aware of alleged corruption by X who was arrested on charges of trying to sell Obama’s senate seat.”
• Answer: Rod Blagojevich
[Figure: attention visualizations for Layer 1 and Layer 2]
Code + Data: https://github.com/bdhingra/ga-reader
Words vs. Characters
• Word-level representations are good at learning the semantics of the tokens
• Character-level representations are more suitable for modeling sub-word morphologies (“cat” vs. “cats”)
Ø Word-level representations are obtained from a learned lookup table
Ø Character-level representations are usually obtained by applying an RNN or CNN
• Hybrid word-character models have been shown to be successful in various NLP tasks (Yang et al., 2016a; Miyamoto & Cho, 2016; Ling et al., 2015)
• A commonly used method is to concatenate these two representations
Fine-Grained Gating
• Fine-grained gating mechanism: $g = \sigma(W_g v + b_g)$, $h = g \odot c + (1 - g) \odot w$
Ø $w$: word-level representation
Ø $c$: character-level representation
Ø $g$: gating
Ø $v$: additional features: named entity tags, part-of-speech tags, document frequency vectors, word look-up representations
Yang et al., ICLR 2017
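A minimal NumPy sketch of fine-grained gating, assuming the form in Yang et al. (ICLR 2017): a per-dimension gate computed from token features mixes the character-level and word-level vectors. The parameter names `W_g` and `b_g`, the feature size and the random inputs are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fine_grained_gate(word_vec, char_vec, features, W_g, b_g):
    """Mix the two representations dimension by dimension.

    The gate g lies in (0, 1) per dimension: values near 1 favour the
    character-level vector, values near 0 favour the word-level vector.
    """
    g = sigmoid(W_g @ features + b_g)
    h = g * char_vec + (1.0 - g) * word_vec
    return h, g

rng = np.random.default_rng(0)
n_rep, n_feat = 4, 6                  # illustrative sizes (assumptions)
word_vec = rng.normal(size=n_rep)     # from a learned lookup table
char_vec = rng.normal(size=n_rep)     # e.g. from an RNN over characters
features = rng.normal(size=n_feat)    # e.g. NER/POS tags, document frequency
W_g = rng.normal(size=(n_rep, n_feat))
b_g = np.zeros(n_rep)

h, g = fine_grained_gate(word_vec, char_vec, features, W_g, b_g)
```

Unlike concatenation, the gate is a vector rather than a scalar, so each dimension of the token representation can lean on a different source.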
Children’s Book Test (CBT) Dataset
Yang et al., ICLR 2017
Words vs. Characters
• High gate values: character-level representations
• Low gate values: word-level representations
Yang et al., ICLR 2017
Talk Roadmap
• Multiplicative and Fine-grained Attention
• Linguistic Knowledge as Explicit Memory for RNNs
• Generative Domain-Adaptive Nets
Broad-Context Language Modeling
Her plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.” She gave me a quick nod and turned back to X
X = Terry
LAMBADA dataset, Paperno et al., 2016
[Figure: the same LAMBADA passage, annotated with coreference, dependency parses, entity relations and word relations (sources shown: CoreNLP, Freebase, WordNet), feeding a recurrent neural network that produces a text representation.]
Incorporating Prior Knowledge
[Figure: an RNN reading “Mary got the football. She went to the kitchen. She left the ball there.”, with coreference and hyper/hyponymy edges between tokens.]
Dhingra, Yang, Cohen, Salakhutdinov 2017
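Incorporating Prior Knowledge
Memory as Acyclic Graph Encoding (MAGE)-RNN
[Figure: the example sentences “Mary got the football. She went to the kitchen. She left the ball there.” with coreference and hyper/hyponymy edges, next to an RNN cell diagram. Diagram symbols: x_t, M_t, h_0, h_1, ..., h_{t-1}, e_1, ..., e_{|E|}, h_t, M_{t+1}, g_t.]
Dhingra, Yang, Cohen, Salakhutdinov 2017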
Incorporating Prior Knowledge
Forward/Backward DAG
[Figure: the example sentences “Mary got the football. She went to the kitchen. She left the ball there.” shown twice, once with the forward subgraph and once with the backward subgraph.]
Multiple Sequences
[Figure: the same sentences together with the question “Where is the football?”]
Results with Coreference
Model          bAbi QA (1k training)   LAMBADA   CNN
Previous Best  90.1                    49.0      77.9
GA             75.2                    50.0      77.9
GA + MAGE      91.3                    51.6      78.6
• Plan to incorporate relations beyond coreference
• Attention over edge types
Learned Representation
Talk Roadmap
• Multiplicative and Fine-grained Attention
• Linguistic Knowledge as Explicit Memory for RNNs
• Generative Domain-Adaptive Nets
Extractive Question Answering
In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls under gravity. The main forms of precipitation include drizzle, rain, sleet, snow, and hail… Precipitation forms as smaller droplets coalesce via collision with other rain drops or ice crystals within a cloud. Short, intense periods of rain in scattered locations are called “showers”
What causes precipitation to fall? gravity
• Given a paragraph/question, extract a span of text as the answer
• Expensive to obtain large labeled datasets
• SOTA approaches rely on large labeled datasets
SQuAD Dataset, Rajpurkar et al., 2016
Leverage Unlabeled Text
• Almost unlimited unlabeled text.
[Figure: labeled QA pairs and unlabeled text both feed the QA model (semi-supervised QA).]
Extractive Question Answering
• Use POS/NER/parsing to extract possible answer chunks
• Anything can be an answer
• We will assume that answers are available.
• Labeled data: (p, q, a)
• Unlabeled data: (p, a), with q to be generated
• Generator G: from (p, a) to q; seq2seq with copy mechanism
• Discriminator D: from (p, q) to a; GA reader
• Combine to train a QA model
Generating Questions
Baseline 1: Context Questions
• Generate a context question for the answer “gravity”
In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls under gravity. The main forms of precipitation include drizzle, rain, sleet, snow, and hail… Precipitation forms as smaller droplets coalesce via collision with other rain drops or ice crystals within a cloud. Short, intense periods of rain in scattered locations are called “showers”
What causes precipitation to fall? gravity
Context question: Water vapor that falls under The main forms of
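The context-question baseline can be read as taking the words surrounding the answer span as a stand-in question. Below is a minimal sketch of that reading; the helper `context_question`, the whitespace tokenization and the window size of 5 are assumptions made for illustration, not details given in the talk.

```python
def context_question(tokens, answer_start, answer_len, window=5):
    """Return the tokens within `window` positions on each side of the answer span."""
    left = tokens[max(0, answer_start - window):answer_start]
    answer_end = answer_start + answer_len
    right = tokens[answer_end:answer_end + window]
    return left + right

# Shortened, pre-tokenized version of the example passage (punctuation split off).
passage = ("precipitation is any product of the condensation of atmospheric "
           "water vapor that falls under gravity . The main forms of "
           "precipitation include drizzle")
tokens = passage.split()

start = tokens.index("gravity")
question = context_question(tokens, start, answer_len=1, window=5)
print(" ".join(question))
```

Such pseudo-questions need no labels, which is what makes them usable on unlabeled paragraphs, but they differ in distribution from human-written questions.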
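Baseline 2: GANs
[Figure: generator G maps (paragraph, answer) to a question; discriminator D takes (paragraph, question) and reconstructs the answer; D’ judges whether the question is true or fake.]
Goodfellow et al., 2014; Ganin et al., 2014; Xia et al., 2016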
Generative Domain-Adaptive Nets (GDANs)
[Diagram labels: Labeled Data: Train D, Train G; Unlabeled Data: Train D]
Yang, Hu, Salakhutdinov, Cohen, ACL 2017
Johnson et al., 2016; Chu et al., 2017
Generative Domain-Adaptive Nets (GDANs)
[Diagram labels: Labeled Data, Unlabeled Data; Train D, Train D, Train G]
• Generator as a data domain
• Condition discriminator D on domains
• Adversarial training for G
Yang, Hu, Salakhutdinov, Cohen, ACL 2017
Examples
Context: “…an additional warming of the Earth’s surface. They calculate with confidence that CO2 has been responsible for over half the enhanced greenhouse effect. They predict that under a “business as usual” scenario, …”
Answer: over half
Question: what the enhanced greenhouse effect that CO2 been responsible for?
Ground-truth Q: How much of the greenhouse effect is due to carbon dioxide?

Context: “…in 0000, bankamericard was renamed and spun off into a separate company known today as visa inc.”
Answer: visa inc.
Question: what was the separate company bankamericard?
Ground-truth Q: what present-day company did bankamericard turn into?
Yang, Hu, Salakhutdinov, Cohen, ACL 2017
SQuAD dataset
Labeling rate  Method      Test F1  Exact Matching
0.1            Supervised  0.3815   0.2492
0.1            Context     0.4515   0.2966
0.1            Gen + GAN   0.4373   0.2885
0.1            GDAN        0.4802   0.3218
0.5            Supervised  0.5722   0.4187
0.5            Context     0.5740   0.4195
0.5            Gen + GAN   0.5590   0.4044
0.5            GDAN        0.5831   0.4267
• SQuAD dataset: 87,636 training, 10,600 development instances
• Use 50K unlabelled examples.
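For readers unfamiliar with the two reported metrics, the following sketch shows how a token-level F1 and an exact-match score are commonly computed for extractive QA. It is a simplified illustration: it assumes lowercasing and whitespace tokenization only, which is not necessarily the normalization used to produce the numbers in the table.

```python
from collections import Counter

def exact_match(prediction, gold):
    """1.0 if the predicted span equals the gold span after lowercasing, else 0.0."""
    return float(prediction.lower().split() == gold.lower().split())

def token_f1(prediction, gold):
    """Harmonic mean of token-level precision and recall between the two spans."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# A partially correct span gets partial F1 credit but no exact-match credit.
partial_f1 = token_f1("half", "over half")
partial_em = exact_match("half", "over half")
```

This is why the F1 column is consistently higher than the exact-matching column: spans that overlap the gold answer without matching it still contribute to F1.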
Variational Autoencoder (VAE)
• Transform samples from some simple distribution (e.g. normal) to the data manifold:
Ø Generative process: draw a latent sample, then map it through a deterministic neural network
Kingma and Welling, 2014
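The generative direction of a VAE can be sketched in a few lines: draw a latent vector from a standard normal and push it through a deterministic network. This is only an illustration of that one step, assuming a small untrained one-hidden-layer decoder with random weights; the names, sizes and architecture are hypothetical, and training (the encoder and the variational objective) is not shown.

```python
import numpy as np

def decoder(z, W1, b1, W2, b2):
    """Deterministic neural network mapping a latent vector z to data space."""
    hidden = np.tanh(W1 @ z + b1)
    return W2 @ hidden + b2

rng = np.random.default_rng(0)
latent_dim, hidden_dim, data_dim = 2, 8, 5  # illustrative sizes (assumptions)

# Untrained random weights stand in for learned decoder parameters.
W1 = rng.normal(size=(hidden_dim, latent_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(size=(data_dim, hidden_dim))
b2 = np.zeros(data_dim)

z = rng.standard_normal(latent_dim)  # sample from the simple distribution
x = decoder(z, W1, b1, W2, b2)       # point in data space
```

All randomness sits in the draw of `z`; given `z`, the mapping to data space is deterministic.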
VAE for Text Generation
Example: The movie was awful and boring
• Sample c, fix z.
Hu, Yang, Liang, Salakhutdinov, Xing, 2017
VAE for Text Generation
• Sample z, fix c.
Hu, Yang, Liang, Salakhutdinov, Xing, 2017
Thank you