CS839: Probabilistic Graphical Models
Lecture 22: The Attention Mechanism
Theo Rekatsinas
Why Attention?

• Consider machine translation: we need to pay attention to the word we are currently translating. Is the entire sequence needed as context?
• Example: "The cat is black" -> "Le chat est noir"
• RNNs are the de-facto standard for machine translation.
• Problem: translation relies on reading the complete sentence and compressing all of its information into a fixed-length vector; a sentence with hundreds of words squeezed into one such vector will surely lead to information loss, inadequate translation, etc.
• Long-range dependencies are tricky.
Basic encoder-decoder
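The figure for this slide is not recoverable from the text, but the idea is the standard sequence-to-sequence bottleneck. A minimal PyTorch sketch (module and layer names are my own, not from the lecture) that makes the fixed-length context vector explicit:

```python
import torch
import torch.nn as nn

class BasicEncoderDecoder(nn.Module):
    """Minimal seq2seq model: the encoder compresses the whole source
    sentence into one fixed-length vector, which is all the decoder sees."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode: keep only the final hidden state (the fixed-length bottleneck).
        _, h = self.encoder(self.src_emb(src_ids))
        # Decode: every target position is conditioned on the same single vector h.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)
        return self.out(dec_out)  # logits over the target vocabulary
```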
Soft Attention for Translation

• Example: "I love coffee" -> "Me gusta el café"
• At each decoding step, the model computes a distribution over the input words and forms the context as a weighted combination of them.

Bahdanau et al., "Neural Machine Translation by Jointly Learning to Align and Translate", ICLR 2015
Soft Attention

Figure (from Y. Bengio's CVPR 2015 tutorial): a bidirectional encoder RNN, a decoder RNN, and the attention model connecting them.
Context vector (input to the decoder):
$c_i = \sum_j \alpha_{ij} h_j$

Mixture weights:
$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})}$

Alignment score (how well do input words near position j match output words at position i):
$e_{ij} = f_{\text{att}}(s_{i-1}, h_j)$
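A minimal PyTorch sketch of these three equations, assuming the additive (tanh + linear) scoring function of Bahdanau et al.; the class and tensor names are illustrative, not from the lecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Bahdanau-style soft attention: score each encoder state against the
    current decoder state, softmax the scores, and mix the encoder states."""
    def __init__(self, dec_dim, enc_dim, att_dim=128):
        super().__init__()
        self.W_s = nn.Linear(dec_dim, att_dim, bias=False)
        self.W_h = nn.Linear(enc_dim, att_dim, bias=False)
        self.v = nn.Linear(att_dim, 1, bias=False)

    def forward(self, s_prev, enc_states):
        # s_prev: (batch, dec_dim), the previous decoder state s_{i-1}
        # enc_states: (batch, src_len, enc_dim), the encoder states h_j
        e = self.v(torch.tanh(self.W_s(s_prev).unsqueeze(1) + self.W_h(enc_states)))
        alpha = F.softmax(e.squeeze(-1), dim=-1)           # mixture weights alpha_ij
        context = torch.bmm(alpha.unsqueeze(1), enc_states).squeeze(1)  # c_i = sum_j alpha_ij h_j
        return context, alpha
```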
Luong, Pham, and Manning's translation system (2015); see Luong and Manning, IWSLT 2015. (Figure: translation error rate vs. human.)
Hard Attention
Monotonic Attention
Global Attention
• Blue = encoder, red = decoder (in the figure).
• The decoder attends to a context vector, so it captures global information rather than only the information from one hidden state.
• The context vector takes all of the encoder cells' outputs as input and computes a probability distribution over them for each token the decoder wants to generate.
Local Attention
• Compute a best aligned position first.
• Then compute a context vector centered at that position.
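A minimal sketch of local attention in the spirit of Luong et al.'s predictive alignment, assuming a general dot-product score and a Gaussian window around the predicted position; the names and window size are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalAttention(nn.Module):
    """Local attention: predict an aligned source position p_t, then attend
    mostly to a window of encoder states centered at p_t."""
    def __init__(self, dim, window=5):
        super().__init__()
        self.window = window
        self.W_p = nn.Linear(dim, dim, bias=False)
        self.v_p = nn.Linear(dim, 1, bias=False)
        self.W_a = nn.Linear(dim, dim, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dim); enc_states: (batch, src_len, dim)
        src_len = enc_states.size(1)
        # 1) Predict the best aligned position p_t in [0, src_len).
        p_t = src_len * torch.sigmoid(self.v_p(torch.tanh(self.W_p(dec_state))))
        # 2) Score all positions, then focus near p_t with a Gaussian window.
        scores = torch.bmm(enc_states, self.W_a(dec_state).unsqueeze(-1)).squeeze(-1)
        positions = torch.arange(src_len, device=enc_states.device).float().unsqueeze(0)
        gauss = torch.exp(-((positions - p_t) ** 2) / (2 * (self.window / 2) ** 2))
        alpha = F.softmax(scores, dim=-1) * gauss
        # 3) Context vector centered at the predicted position.
        context = torch.bmm(alpha.unsqueeze(1), enc_states).squeeze(1)
        return context, alpha
```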
RNN for Captioning

Figure: a CNN maps the image (H x W x 3) to a single feature vector (D), which initializes the hidden state h0 (hidden state: H). At each step the RNN produces a distribution over the vocabulary (d1, d2, ...) and emits a word (y1 = first word, y2 = second word, ...), updating the hidden state (h1, h2, ...).

• The RNN only looks at the whole image, once.
• What if the RNN looks at different parts of the image at each time step?
Soft Attention for Captioning

Figure: a CNN maps the image (H x W x 3) to a grid of features (L x D). From the initial hidden state h0 the model computes a1, a distribution over the L locations, and z1, the weighted combination of the features (a D-dimensional vector). z1 and the first word y1 feed h1, which produces d1, a distribution over the vocabulary, and a2, a new distribution over locations. Then z2 (the features weighted by a2) and the second word y2 feed h2, which produces d2 and a3, and so on.

Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015
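A minimal sketch of one attention step over the L x D feature grid, in the spirit of Xu et al.'s soft attention; the function and tensor names are illustrative:

```python
import torch
import torch.nn.functional as F

def soft_attend(features, scores):
    """One step of soft attention over a grid of image features.
    features: (L, D) grid of CNN features; scores: (L,) unnormalized scores
    produced from the current hidden state. Returns the weighted feature
    vector z (D,) and the attention distribution a (L,)."""
    a = F.softmax(scores, dim=0)   # distribution over the L locations
    z = a @ features               # weighted combination of features: (D,)
    return z, a

# Toy usage: 196 locations (a 14x14 grid), 512-dimensional features.
features = torch.randn(196, 512)
scores = torch.randn(196)
z, a = soft_attend(features, scores)
```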
Soft vs Hard Attention

Figure: a CNN maps the image (H x W x 3) to a grid of features a, b, c, d (each D-dimensional). From the RNN we get pa, pb, pc, pd, a distribution over the grid locations with pa + pb + pc + pd = 1, and we form a context vector z (D-dimensional).

• Soft attention: summarize ALL locations, z = pa·a + pb·b + pc·c + pd·d. The derivative dz/dp is nice! Train with gradient descent.
• Hard attention: sample ONE location according to p, and set z to that location's feature vector. With argmax, dz/dp is zero almost everywhere, so we can't use gradient descent; we need reinforcement learning.

Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015
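A minimal sketch of the two context-vector computations, with illustrative names; the hard branch only shows the sampling step, not the REINFORCE-style training it would require:

```python
import torch

def soft_context(features, p):
    """Soft attention: differentiable weighted sum over all locations.
    features: (L, D); p: (L,) probabilities summing to 1."""
    return p @ features            # z = sum_l p_l * feature_l

def hard_context(features, p):
    """Hard attention: sample a single location; gradients w.r.t. p do not
    flow through the sample, so training needs reinforcement learning."""
    idx = torch.multinomial(p, num_samples=1)
    return features[idx.item()]    # z = the sampled location's feature vector

features = torch.randn(4, 512)                  # locations a, b, c, d
p = torch.tensor([0.1, 0.2, 0.3, 0.4])          # pa + pb + pc + pd = 1
z_soft = soft_context(features, p)
z_hard = hard_context(features, p)
```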
Multi-headed Attention
Attention is all you need
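The figures for these slides are not recoverable here, but as a reference point, here is a minimal sketch of scaled dot-product attention run over multiple heads in the style of Vaswani et al., "Attention Is All You Need" (the learned input/output projections are omitted; all names are illustrative):

```python
import torch
import torch.nn.functional as F

def multi_head_attention(Q, K, V, num_heads):
    """Scaled dot-product attention computed in parallel over several heads.
    Q, K, V: (batch, seq_len, d_model), with d_model divisible by num_heads."""
    batch, seq_len, d_model = Q.shape
    d_head = d_model // num_heads

    def split(x):
        # Split the model dimension into heads: (batch, heads, seq_len, d_head).
        return x.view(batch, seq_len, num_heads, d_head).transpose(1, 2)

    q, k, v = split(Q), split(K), split(V)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # (batch, heads, seq, seq)
    weights = F.softmax(scores, dim=-1)                # attention distribution per head
    out = weights @ v                                  # (batch, heads, seq, d_head)
    # Concatenate the heads back into the model dimension.
    return out.transpose(1, 2).reshape(batch, seq_len, d_model)
```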
Attention tricks
Attention Takeaways

Performance:
• Attention models can improve accuracy and reduce computation at the same time.

Complexity:
• There are many design choices.
• Those choices have a big effect on performance.
• Ensembling has unusually large benefits.
• Simplify where possible!
Explainability:
• Attention models encode explanations.
• Both locus and trajectory help understand what's going on.

Hard vs. Soft:
• Soft models are easier to train; hard models require reinforcement learning.
• They can be combined, as in Luong et al.