

Page 1:

Why LSTM outperforms RNN?

Xintao Huan

Page 2:

Outline

• Quick Review of RNN

• Investigate RNN

• Key Issue

• Solution - LSTM

Page 3:

Quick Review of RNN

Recurrent Neural Network (RNN) and the unfolding in time of the computation involved in its forward pass.

Page 4:

Investigate RNN

Whh: weight between hidden layers
Wxh: weight between input layer and hidden layer
Why: weight between hidden layer and output layer
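
For reference, these weights enter the usual tanh formulation of the RNN forward step (the slide does not print the equations, so this is the standard textbook form, written in the slide's notation):

    ht = tanh(Whh * ht-1 + Wxh * xt)
    yt = Why * ht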

Page 5:

Investigate RNN

In the computation of the hidden layer, the weight matrix W is shared across all time-steps!
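
A minimal numpy sketch of this forward unrolling (an illustration assuming the standard tanh step above, not the author's code): the three weight matrices are created once and the very same arrays are reused at every time-step.

    import numpy as np

    hidden, inp, out, T = 4, 3, 2, 5              # illustrative sizes: hidden dim, input dim, output dim, time-steps
    rng = np.random.default_rng(0)

    # One set of weights, created once and reused at EVERY time-step (this is the sharing).
    Whh = rng.normal(size=(hidden, hidden)) * 0.1
    Wxh = rng.normal(size=(hidden, inp)) * 0.1
    Why = rng.normal(size=(out, hidden)) * 0.1

    xs = [rng.normal(size=(inp,)) for _ in range(T)]  # a toy input sequence
    h = np.zeros(hidden)                              # initial hidden state h0
    ys = []
    for x in xs:
        h = np.tanh(Whh @ h + Wxh @ x)                # same Whh, Wxh at every step
        ys.append(Why @ h)                            # same Why at every step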

Page 6:

Investigate RNN

In TRAINING, we compare the output of each time-step yt with the reference result to obtain the loss Lt; the sequence loss L is then backpropagated from the last time-step back to the first. Next, we employ Stochastic Gradient Descent (SGD) to minimize the loss and update the parameters in W.
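
As a concrete illustration of Lt and L, here is an assumed squared-error loss over a toy sequence; the slides do not specify the actual loss function, so this is only an example.

    import numpy as np

    # Toy outputs yt and reference results for T = 3 time-steps (illustrative values only).
    ys      = [np.array([0.2, 0.9]), np.array([0.5, 0.1]), np.array([0.7, 0.4])]
    targets = [np.array([0.0, 1.0]), np.array([1.0, 0.0]), np.array([1.0, 0.0])]

    # Per-step loss Lt (assumed squared error) and the sequence loss L that gets backpropagated.
    Lts = [0.5 * np.sum((y - t) ** 2) for y, t in zip(ys, targets)]
    L = sum(Lts)

    # SGD update rule applied to every weight matrix once dL/dW is known from BPTT (next slide):
    #   W = W - learning_rate * dL_dW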

Page 7:

Investigate RNN

Forward through the entire sequence to compute the loss. Backward through the entire sequence to compute the gradients, then update the parameters in W.

Back Propagation Through Time (BPTT)
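
A minimal BPTT sketch under the same assumptions as above (tanh step, squared-error loss, plain SGD); note that on the way back, the gradient is multiplied by Whh once per time-step.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden, inp, out, T, lr = 4, 3, 2, 6, 0.1
    Whh = rng.normal(size=(hidden, hidden)) * 0.1
    Wxh = rng.normal(size=(hidden, inp)) * 0.1
    Why = rng.normal(size=(out, hidden)) * 0.1
    xs      = [rng.normal(size=(inp,)) for _ in range(T)]
    targets = [rng.normal(size=(out,)) for _ in range(T)]

    # Forward through the entire sequence, storing hidden states for the backward pass.
    hs = [np.zeros(hidden)]
    ys = []
    for x in xs:
        hs.append(np.tanh(Whh @ hs[-1] + Wxh @ x))
        ys.append(Why @ hs[-1])

    # Backward through the entire sequence (BPTT), accumulating gradients for the shared weights.
    dWhh, dWxh, dWhy = np.zeros_like(Whh), np.zeros_like(Wxh), np.zeros_like(Why)
    dh_next = np.zeros(hidden)
    for t in reversed(range(T)):
        dy = ys[t] - targets[t]                  # dLt/dyt for the squared-error loss
        dWhy += np.outer(dy, hs[t + 1])
        dh = Why.T @ dy + dh_next                # gradient flowing into ht
        da = dh * (1.0 - hs[t + 1] ** 2)         # back through the tanh nonlinearity
        dWxh += np.outer(da, xs[t])
        dWhh += np.outer(da, hs[t])
        dh_next = Whh.T @ da                     # multiplied by Whh again at EVERY step back

    # SGD update of the shared parameters.
    for W, dW in ((Whh, dWhh), (Wxh, dWxh), (Why, dWhy)):
        W -= lr * dW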

Page 8:

Key Issue

In every backpropagation from ht to ht-1:

the same weight matrix W (more precisely Whh) is multiplied in again!

e.g. computing the gradient of h0 involves many repeated factors of W!

If the sequence is long enough and the largest singular value of W is greater than 1: exploding gradients! Smaller than 1: vanishing gradients!
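
A toy illustration of what those repeated factors do, collapsing the weight matrix to a single number w (an assumption made only for illustration; for a real matrix the relevant quantity is its largest singular value):

    for w in (1.1, 0.9):
        g = 1.0
        for _ in range(100):   # 100 time-steps back
            g *= w
        print(w, g)            # 1.1 -> about 1.4e4 (explodes), 0.9 -> about 2.7e-5 (vanishes)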

Page 9:

Key Issue

Exploding gradients:

Employ gradient clipping to rescale the gradient when its norm exceeds a threshold (i.e. cap its magnitude); a sketch follows below.

Vanishing gradients:

Change the RNN architecture!
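
A minimal sketch of gradient clipping by norm, as mentioned above; the threshold 5.0 is an arbitrary illustrative value.

    import numpy as np

    def clip_gradient(grad, max_norm=5.0):
        # Rescale grad so its L2 norm never exceeds max_norm; small gradients pass through untouched.
        norm = np.linalg.norm(grad)
        if norm > max_norm:
            grad = grad * (max_norm / norm)
        return grad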

Page 10:

Solution - LSTM

The cell state runs straight down the entire chain, with only some minor linear interactions (NOT matrix multiplication like in the RNN).

It’s very easy for information to just flow along it unchanged.

• input gate
• forget gate
• output gate
• update gate
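
A minimal sketch of one LSTM step built from these four gates, assuming the common formulation in which the "update gate" is the tanh candidate written into the cell; the weights, sizes and the omission of biases are illustrative simplifications.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    hidden, inp = 4, 3
    rng = np.random.default_rng(0)
    # One weight matrix per gate, each acting on the concatenation [h_prev, x].
    Wi, Wf, Wo, Wg = (rng.normal(size=(hidden, hidden + inp)) * 0.1 for _ in range(4))

    def lstm_step(h_prev, c_prev, x):
        z = np.concatenate([h_prev, x])
        i = sigmoid(Wi @ z)        # input gate
        f = sigmoid(Wf @ z)        # forget gate
        o = sigmoid(Wo @ z)        # output gate
        g = np.tanh(Wg @ z)        # "update gate": candidate content
        c = f * c_prev + i * g     # cell state: only elementwise (linear) interactions, no matrix multiply of c
        h = o * np.tanh(c)
        return h, c

    h, c = np.zeros(hidden), np.zeros(hidden)
    x = rng.normal(size=(inp,))
    h, c = lstm_step(h, c, x)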

Page 11:

Solution - LSTM

[Figure: RNN vs. LSTM comparison]

Vanishing gradients SOLVED!
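
The usual argument (not spelled out on the slide), in the slide's notation: along the cell-state path, the step-to-step gradient is dct / dct-1 = ft, an elementwise product with the forget gate, rather than yet another multiplication by the same Whh followed by a tanh. When the forget gate is close to 1, the gradient passes through essentially unchanged, so it no longer shrinks geometrically with sequence length the way it does in the plain RNN.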

Page 12:

References

1. Understanding LSTM http://colah.github.io/posts/2015-08-Understanding-LSTMs/

2. The Unreasonable Effectiveness of Recurrent Neural Networks http://karpathy.github.io/2015/05/21/rnn-effectiveness/

3. An Empirical Exploration of Recurrent Network Architectures http://proceedings.mlr.press/v37/jozefowicz15.pdf

4. A Critical Review of Recurrent Neural Networks for Sequence Learning https://arxiv.org/pdf/1506.00019.pdf

5. https://www.zhihu.com/question/44895610

6. https://my.oschina.net/u/2719468/blog/662099

7. https://yq.aliyun.com/articles/574218

8. https://juejin.im/post/59ae29d36fb9a024966cac99

Page 13:

Thanks for your attention!