ELEC 576: Recurrent Neural Network Language Models & Deep Reinforcement Learning
Ankit B. Patel, CJ Barberan
Baylor College of Medicine (Neuroscience Dept.), Rice University (ECE Dept.)
11-01-2017


Page 1:

ELEC 576: Recurrent Neural Network Language Models & Deep Reinforcement Learning

Ankit B. Patel, CJ Barberan

Baylor College of Medicine (Neuroscience Dept.), Rice University (ECE Dept.)

11-01-2017

Page 2:

Word-level RNN Language Models

Page 3:

Motivation

• Model the probability distribution of the next word in a sequence, given the previous words

• Words are the smallest units that carry meaning

• Another step toward a hierarchical model

[Nicholas Leonard]
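The modeling goal above can be written as factorizing the probability of a word sequence; a word-level RNN language model approximates each conditional with its hidden state:

```latex
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})
```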

Page 4:

Global Vectors for Word Representation (GloVe)

• Provide semantic information/context for words

• Unsupervised method for learning word representations

[Richard Socher]

Page 5:

GloVe Visualization

[Richard Socher]

Page 6:

Word2Vec

• Learns word embeddings

• Shallow, two-layer neural network

• Training makes observed word-context pairs have similar embeddings while scattering unobserved pairs; intuitively, words that appear in similar contexts should get similar embeddings

• Produces a vector space for the words

[Goldberg, Levy Arxiv 2014]
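As a concrete illustration of the training idea above, here is a minimal skip-gram-with-negative-sampling sketch in NumPy. The corpus and every hyperparameter below are toy illustrative choices, not the paper's setup:

```python
import numpy as np

# Minimal skip-gram with negative sampling (SGNS) sketch.
# Observed word-context pairs are pushed together (label 1);
# random "negative" pairs are pushed apart (label 0).
rng = np.random.default_rng(0)
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                       # vocabulary size, embedding dim

W_in = 0.1 * rng.standard_normal((V, D))   # word embeddings
W_out = 0.1 * rng.standard_normal((V, D))  # context embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, window, k = 0.05, 2, 3                 # learning rate, window, negatives
for epoch in range(100):
    for t, w in enumerate(corpus):
        for j in range(max(0, t - window), min(len(corpus), t + window + 1)):
            if j == t:
                continue
            wi = idx[w]
            # one observed pair (label 1) plus k random negatives (label 0)
            targets = np.concatenate(([idx[corpus[j]]],
                                      rng.integers(0, V, size=k)))
            labels = np.array([1.0] + [0.0] * k)
            ctx = W_out[targets]
            grad = sigmoid(ctx @ W_in[wi]) - labels   # logistic-loss gradient
            W_out[targets] -= lr * np.outer(grad, W_in[wi])
            W_in[wi] -= lr * grad @ ctx
```

After training, words sharing contexts (e.g. "cat" and "dog", which both precede "sat") tend to end up near each other in the embedding space.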

Page 7:

Word2Vec Visualization

[Tensorflow]

Page 8:

Understanding Word2Vec

Probability that a word-context pair was drawn from the document

[Goldberg, Levy Arxiv 2014]

Page 9:

Understanding Word2Vec

Maximize the likelihood that observed word-context pairs come from the document

[Goldberg, Levy Arxiv 2014]

Page 10:

Word2Vec as Word-Context Association Matrix Decomposition

At the optimum, the parameters obey the relation (pointwise mutual information, shifted by the number of negative samples $k$):

$$\vec{w} \cdot \vec{c} = \mathrm{PMI}(w, c) - \log k$$

1. Construct the word-context association matrix

2. Take a low-rank decomposition

[Goldberg, Levy Arxiv 2014]
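The two steps above can be sketched directly: build a positive-PMI word-context matrix from co-occurrence counts, then take a low-rank SVD. The corpus, window size, and rank below are toy illustrative choices:

```python
import numpy as np

# Word embeddings via explicit matrix factorization, mirroring the
# decomposition view above (SGNS implicitly factorizes shifted PMI).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# 1. word-context co-occurrence counts (window = 1)
C = np.zeros((V, V))
for t, w in enumerate(corpus):
    for j in (t - 1, t + 1):
        if 0 <= j < len(corpus):
            C[idx[w], idx[corpus[j]]] += 1

# 2. positive PMI: max(0, log P(w,c) / (P(w) P(c))); unseen pairs get 0
total = C.sum()
Pw = C.sum(axis=1, keepdims=True) / total
Pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore"):
    ppmi = np.maximum(np.log((C / total) / (Pw * Pc)), 0.0)

# 3. rank-d SVD gives d-dimensional word embeddings
d = 4
U, S, _ = np.linalg.svd(ppmi)
emb = U[:, :d] * np.sqrt(S[:d])      # one common weighting choice
```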

Page 11:

Question Time

• Given the theoretical understanding of word2vec, what kinds of things will word2vec not capture well?

• Can you think of ways to make it better?

Page 12:

Word2vec with RNNs

[Mikolov, Yih, Zweig]

Page 13:

Word RNN trained on Shakespeare

[Sung Kim]

Page 14:

Gated Word RNN

[Miyamoto, Cho]

Page 15:

Gated Word RNN Results

[Miyamoto, Cho]

Page 16:

Combining Character & Word Level

[Bojanowski, Joulin, Mikolov]

Page 17:

Question Time

• In which situation(s) would a character-level RNN be more suitable than a word-level RNN?

Page 18:

Generating Movie Scripts

• An LSTM named Benjamin

• Learned to predict which letters would follow, then words and phrases

• Trained on a corpus of 1980s and 1990s sci-fi movie scripts

• "I'll give them top marks if they promise never to do this again."

• https://www.youtube.com/watch?v=LY7x2Ihqjmc

Page 19:

Character vs Word Level Models

Page 20:

Character vs Word-Level Models

[Kim, Jernite, Sontag, Rush]

Page 21:

Word Representations of Character & Word Models

[Kim, Jernite, Sontag, Rush]

Page 22:

Other Embeddings

Page 23:

Tweet2Vec

[Dhingra, Zhou, Fitzpatrick, Muehl, Cohen]

Page 24:

Tweet2Vec Encoder

[Dhingra, Zhou, Fitzpatrick, Muehl, Cohen]

Page 25:

Tweet2Vec Results

[Dhingra, Zhou, Fitzpatrick, Muehl, Cohen]

Page 26:

Gene2Vec

• Word2Vec performs poorly on long nucleotide sequences

• Short sequences like AAAGTT are very common

[David Cox]

Page 27:

Gene2Vec Visual

Hydrophobic Amino Acids

[David Cox]

Page 28:

Doc2Vec

• Similar to Word2Vec but at a larger scale

• Works on sentences & paragraphs

[RaRe Technologies]

Page 29:

Applications of Document Models

• Litigation discovery, e.g. CS Disco

• Sentiment Classification e.g. movie reviews

Page 30:

Reinforcement Learning

Page 31:

What is Reinforcement Learning?

• Reinforcement Learning (RL) is a framework for decision-making

• An agent acts in an environment

• Each action influences the agent's future state

• Success is measured by a reward

• Goal: find actions that maximise future reward
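The interaction cycle in these bullets can be made concrete with a minimal loop. `CoinFlipEnv` and `RandomAgent` are hypothetical toy stand-ins (not a real RL library), used only to show the action → observation → reward cycle:

```python
import random

class CoinFlipEnv:
    """Toy environment: agent guesses 0 or 1; reward +1 for a correct guess."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = self.rng.randint(0, 1)

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = self.rng.randint(0, 1)   # next observation
        return self.state, reward

class RandomAgent:
    """Toy agent with a uniformly random policy."""
    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.randint(0, 1)

env, agent = CoinFlipEnv(), RandomAgent()
obs, total_reward = env.state, 0.0
for t in range(100):
    action = agent.act(obs)          # agent executes an action
    obs, reward = env.step(action)   # environment returns observation + reward
    total_reward += reward           # success measured by cumulative reward
```

A learning agent would replace `RandomAgent` with a policy that adapts to maximise `total_reward`.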

Page 32:

Agent and Environment

• Agent

• Executes action

• Receives observation

• Receives reward

• Environment

• Receives action

• Sends new observation

• Sends new reward

[David Silver ICML 2016]

Page 33:

Defining the State

• The sequence of observations, rewards, and actions defines the experience

• The state is a summary of the experience

[David Silver ICML 2016]

Page 34:

Components of an RL Agent

• Policy

• The agent's behavior function

• Value Function

• Determines how good each state or action is

• Model

• The agent's representation of the environment

Page 35:

Policy

• The policy is the agent's behavior

• It is a map from state to action

• Deterministic

• Stochastic

[David Silver ICML 2016]
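The two cases above can be written as (notation follows standard RL usage):

```latex
a = \pi(s) \quad \text{(deterministic)}
\qquad
\pi(a \mid s) = \mathbb{P}[A = a \mid S = s] \quad \text{(stochastic)}
```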

Page 36:

Value Function

• The value function is a prediction of future reward

• "What will my reward be, given action a from state s?"

• Example: Q-value function

• State s, action a, policy π, discount factor γ

[David Silver ICML 2016]
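The Q-value function above is the expected discounted return from taking action a in state s and then following policy π:

```latex
Q^{\pi}(s, a) = \mathbb{E}\left[\, r_{t+1} + \gamma\, r_{t+2} + \gamma^{2} r_{t+3} + \cdots \;\middle|\; s, a \,\right]
```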

Page 37:

Optimal Value Function

• The goal is the maximum achievable value

• With it, the agent can act optimally under the corresponding policy

• The optimal value maximises over all future decisions

[David Silver ICML 2016]
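In symbols, the optimal value maximises over all policies:

```latex
Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a) = Q^{\pi^{*}}(s, a)
```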

Page 38:

Model

• The model is learned from experience

• It can act as a proxy for the environment

Page 39:

RL Approaches

• Value-based RL

• Estimate the optimal value function

• The maximum value achievable under any policy

• Policy-based RL

• Search directly for the optimal policy

• The policy achieving maximum future reward

• Model-based RL

• Build a model of the environment

• Plan using the model

[David Silver ICML 2016]

Page 40:

Deep Reinforcement Learning

Page 41:

Deep RL

• RL + Deep Learning (DL) = AI

• RL defines the objective

• DL provides the mechanism to learn it

Page 42:

Deep RL

• Can use DL to represent:

• Value function

• Policy

• Model

• Optimise loss function via Stochastic Gradient Descent

Page 43:

Value-Based Deep Reinforcement Learning

Page 44:

Value-Based Deep RL

• The value function is represented by a Q-network with weights w

[David Silver ICML 2016]

Page 45:

Q-Learning

• Optimal Q-values obey the Bellman equation

• Minimise the MSE loss via SGD

• Naive Q-learning with neural networks diverges due to:

• Correlations between samples

• Non-stationary targets
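The Bellman equation and the MSE loss referred to above are, in standard notation:

```latex
Q^{*}(s, a) = \mathbb{E}_{s'}\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \;\middle|\; s, a \,\right]
\qquad
L(w) = \mathbb{E}\left[\left( r + \gamma \max_{a'} Q(s', a', w) - Q(s, a, w) \right)^{2}\right]
```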

Page 46:

Experience Replay

• Removes correlations between samples

• Build a dataset from the agent's own experience

• Sample experiences from the dataset and apply updates

[David Silver ICML 2016]
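The mechanism above can be sketched as a small replay buffer; the class name and the fake transitions are illustrative, not a real library API:

```python
import random
from collections import deque

# Experience replay: store transitions from the agent's own experience,
# then sample uniformly at random, breaking the temporal correlation
# between consecutive training samples.
class ReplayBuffer:
    def __init__(self, capacity=10000, seed=0):
        self.buffer = deque(maxlen=capacity)   # oldest experience evicted
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # uniform sampling without replacement within the minibatch
        return self.rng.sample(list(self.buffer), batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(50):                  # fake transitions for illustration
    buf.add(t, t % 4, 1.0, t + 1)
batch = buf.sample(8)                # decorrelated minibatch for the update
```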

Page 47:

DQN: Atari 2600

[David Silver ICML 2016]

Page 48:

DQN: Atari 2600

• End-to-end learning of Q(s,a) from raw pixel frames

• Input state is a stack of pixels from the last 4 frames

• Output is Q(s,a) for each joystick/button position

• The number of valid actions varies across games (~3-18)

• Reward is the change in score for that step

Page 49:

DQN: Atari 2600

• Network architecture and hyperparameters are fixed across all games

[David Silver ICML 2016]

Page 50:

DQN: Atari Video

[Jon Juett Youtube]

Page 51:

DQN: Atari Video

[PhysX Vision Youtube]

Page 52:

Improvements after DQN

• Double DQN

• The current Q-network w is used to select actions

• An older Q-network w⁻ is used to evaluate actions

• Prioritized replay

• Store experience in a priority queue ordered by the DQN error

Page 53:

Improvements after DQN

• Duelling Network

• Action-independent value function V(s,v)

• Action-dependent advantage function A(s,a,w)

• Q(s,a) = V(s,v) + A(s,a,w)
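The decomposition above can be sketched numerically. Note that the dueling-network paper centres the advantage (subtracting its mean) so V and A are identifiable; the random weights below are stand-ins for the two learned heads:

```python
import numpy as np

# Dueling decomposition: Q(s,a) = V(s,v) + A(s,a,w).
rng = np.random.default_rng(0)
n_actions, d = 4, 16
features = rng.standard_normal(d)           # shared features of state s

w_v = rng.standard_normal(d)                # value-head parameters v
W_a = rng.standard_normal((n_actions, d))   # advantage-head parameters w

V = w_v @ features                          # scalar V(s, v)
A = W_a @ features                          # per-action A(s, a, w)
Q = V + (A - A.mean())                      # mean-centred, identifiable form
```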

Page 54:

Improvements after DQN

[Marc G. Bellemare Youtube]

Page 55:

General Reinforcement Learning Architecture

[David Silver ICML 2016]

Page 56:

Policy-based Deep Reinforcement Learning

Page 57:

Deep Policy Network

• The policy is represented by a network with weights u

• Define the objective function as the total discounted reward

• Optimise the objective via SGD

• i.e. adjust the policy parameters u to achieve higher reward

[David Silver ICML 2016]

Page 58:

Policy Gradients

• Gradient of a stochastic policy

• Gradient of a deterministic policy

[David Silver ICML 2016]
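In standard notation, the two gradients are (stochastic first; the second is the deterministic policy gradient):

```latex
\nabla_u J(u) = \mathbb{E}\left[ \nabla_u \log \pi(a \mid s; u)\, Q^{\pi}(s, a) \right]
\qquad
\nabla_u J(u) = \mathbb{E}_{s}\left[ \nabla_u \pi(s; u)\, \nabla_a Q^{\pi}(s, a) \big|_{a = \pi(s)} \right]
```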

Page 59:

Actor-Critic Algorithm

• Estimate the value function Q(s, a, w) (the critic)

• Update the policy parameters u via SGD (the actor)

[David Silver ICML 2016]
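The two updates above can be written as (a TD-style critic update is shown; α and β are step sizes):

```latex
\Delta w = \alpha \left( r + \gamma\, Q(s', a', w) - Q(s, a, w) \right) \nabla_w Q(s, a, w)
\qquad
\Delta u = \beta\, \nabla_u \log \pi(a \mid s; u)\, Q(s, a, w)
```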

Page 60:

Asynchronous Advantage Actor-Critic: Labyrinth

[DeepMind Youtube]

Page 61:

A3C: Labyrinth

• End-to-end learning of a softmax policy from raw pixels

• Observations are the pixels of the current frame

• State is the hidden state of an LSTM

• Network outputs both a value V(s) and a softmax over actions

• Task is to collect apples (+1 reward) and escape (+10 reward)

[David Silver ICML 2016]

Page 62:

Model-based Deep Reinforcement Learning

Page 63:

Learning Models of the Environment

• Generative model of Atari 2600 frames

• Issues:

• Errors in the transition model compound over the trajectory

• Planned trajectories differ from executed trajectories

• At the end of long, unusual trajectories, rewards are totally wrong

[David Silver ICML 2016]

Page 64:

Learning Models of the Environment

[Junhyuk Oh Youtube]