Using Deep Reinforcement Learning for Dialogue Systems
Harm van Seijen, Research Scientist, Montréal, Canada
spoken dialogue system
[diagram: pipeline of natural language understanding → state tracker → policy manager → natural language generation, connected to a data store]
user input: "Hi, do you know a good Indian restaurant?"
user act: inform(food="Indian")
system act: request(price_range)
system response: "Sure. What price range are you thinking of?"
The central question: how to train the policy manager?
outline
1. what is reinforcement learning
2. solution strategies for RL
3. applying RL to dialogue systems
what is reinforcement learning
Reinforcement Learning is a data-driven approach towards learning behaviour.
machine learning comprises unsupervised learning, supervised learning, and reinforcement learning; deep learning can be combined with each of them
reinforcement learning + deep learning = deep reinforcement learning
RL vs supervised learning
behaviour: function that maps environment states to actions
supervised learning: hard to specify the function, easy to identify the correct output
example: recognizing cats in images, where a function f maps an image to cat / no cat
RL vs supervised learning
reinforcement learning: hard to specify the function, hard to identify the correct output, easy to specify the behaviour goal
example: double inverted pendulum
state: θ1, θ2, ω1, ω2
action: clockwise/counter-clockwise torque on the top joint
goal: balance the pendulum upright
advantages of RL
does not require knowledge of a good policy
does not require labelled data
online learning: adapts to environment changes
challenges of RL
requires lots of data
the sample distribution changes during learning
samples are not i.i.d.
outline
1. what is reinforcement learning
2. solution strategies for RL
3. applying RL to dialogue systems
finding the optimal policy
Q-learning: classical RL algorithm that combines (partial) policy evaluation with (partial) policy improvement
update target: r + γ max_a' Q(s', a')
policy evaluation: Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') - Q(s, a) ]
policy improvement: act greedily (or ε-greedily) with respect to the current Q estimates
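A minimal sketch of the Q-learning update on a hypothetical two-state MDP (the toy dynamics and all names are illustrative; sweeping every state-action pair replaces the usual exploratory sampling so the example stays deterministic):

```python
from collections import defaultdict

# Hypothetical toy MDP: states 0 and 1; action 1 moves to state 1,
# which yields reward 1. Dynamics invented purely for illustration.
ALPHA, GAMMA = 0.5, 0.9
ACTIONS = (0, 1)
Q = defaultdict(float)  # maps (state, action) -> value estimate

def step(state, action):
    """Toy deterministic dynamics: action 1 always leads to state 1."""
    next_state = 1 if action == 1 else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

def q_update(state, action, reward, next_state):
    # update target: r + gamma * max_a' Q(s', a')   (partial policy evaluation)
    target = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# Sweep all state-action pairs; a real agent would instead sample them
# by acting (e.g. epsilon-greedily) in the environment.
for _ in range(100):
    for state in (0, 1):
        for action in ACTIONS:
            next_state, reward = step(state, action)
            q_update(state, action, reward, next_state)

# policy improvement: act greedily with respect to the learned Q values
greedy_action = max(ACTIONS, key=lambda a: Q[(0, a)])
assert greedy_action == 1  # the rewarding action is learned to be best
```

Note how the same table of Q values drives both steps: the max in the update target evaluates the greedy policy, and acting greedily on the table improves it.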
deep reinforcement learning
A 2015 Nature paper from DeepMind introduced an RL method based on deep learning, called DQN.
main result: with the same network architecture, it learned to play a large number of Atari 2600 games effectively
DQN characteristics
a variation on Q-learning that uses deep neural networks to approximate the Q function
uses experience replay to deal with non-i.i.d. samples
uses two networks (Q and Q') to mitigate the non-stationarity of update targets
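The experience-replay idea can be sketched as a buffer of past transitions sampled uniformly at random (a hedged sketch with illustrative names; the neural networks themselves are omitted):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience-replay buffer: stores transitions and returns
    uniformly sampled mini-batches, breaking the correlation between
    consecutive samples along a trajectory."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling decorrelates the batch from the current trajectory
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(150):              # push more transitions than the capacity
    buf.push(t, t % 2, 0.0, t + 1, False)
assert len(buf) == 100            # capacity caps the buffer size
batch = buf.sample(32)
assert len(batch) == 32           # a mini-batch of random past transitions
```

In DQN this buffer sits between the environment and the learner: each interaction is pushed, and gradient updates are computed on sampled batches rather than on the latest transition.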
outline
1. what is reinforcement learning
2. solution strategies for RL
3. applying RL to dialogue systems
applying RL to dialogue systems
training the dialogue manager requires a huge number of online samples; hence a user simulator, trained on offline data, is used to train the dialogue manager
[diagram: training loop in which the user simulator (trained on offline data) emits a dialogue act, the state tracker updates the dialogue state, and the policy manager replies with a system act fed back to the simulator]
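The simulator-in-the-loop setup can be sketched with stub components (everything here, including the act strings, is hypothetical; the real policy manager would be updated by an RL algorithm rather than acting randomly, and the real simulator would be a model trained on offline data):

```python
import random

random.seed(1)  # illustrative acts only; not a real dialogue-act taxonomy
USER_ACTS = ["inform(food=indian)", "inform(price=cheap)", "bye()"]
SYSTEM_ACTS = ["request(price_range)", "offer(restaurant)", "confirm(food)"]

def user_simulator(system_act):
    """Stub simulator: in practice this model is trained on offline data."""
    return random.choice(USER_ACTS)

def policy_manager(dialogue_state):
    """Placeholder policy: the component RL is meant to optimize."""
    return random.choice(SYSTEM_ACTS)

dialogue_state = []           # the state tracker's running turn history
system_act = "request(food)"  # opening system act
for turn in range(10):
    user_act = user_simulator(system_act)
    dialogue_state.append(user_act)          # crude state tracking
    system_act = policy_manager(dialogue_state)
    if user_act == "bye()":                  # simulated user hangs up
        break
```

Because the simulator is cheap to query, the loop can run for the millions of turns that online RL training needs without involving real users.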
deep RL for dialogue systems
the exact state is not observed, hence a belief state is used
belief-state spaces are typically discretized into summary state spaces to make the task tractable
deep RL can be applied directly to the belief-state space due to its strong generalization properties
with pre-training, a deep RL method can become even more efficient
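As an illustration of the discretization step that deep RL lets one skip, here is a hypothetical mapping from one slot's belief distribution to a summary state (top hypothesis plus a confidence bucket); the slot values and bin edges are invented for the example:

```python
def summarize(belief, bins=(0.3, 0.6, 0.8)):
    """Map a slot's belief distribution (value -> probability) to a
    summary state: (most likely value, confidence bucket 0..len(bins))."""
    top_value = max(belief, key=belief.get)       # most probable hypothesis
    confidence = belief[top_value]
    bucket = sum(confidence >= b for b in bins)   # count thresholds cleared
    return top_value, bucket

# hypothetical belief over the "food" slot after one user turn
belief_food = {"indian": 0.7, "italian": 0.2, "none": 0.1}
assert summarize(belief_food) == ("indian", 2)    # 0.7 clears bins 0.3 and 0.6
```

A deep RL policy can instead consume the full probability vector directly, so no information is thrown away by the bucketing.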
summary
RL is a data-driven approach towards learning behaviour
RL does not require knowledge of a good policy
RL can be used for online learning
combining RL with deep learning means RL can be applied to much bigger problems
constructing a good policy for a modern dialogue manager is a challenging task
deep RL is the perfect candidate to address this challenge
Further reading:
“Introduction to Reinforcement Learning” by Richard S. Sutton & Andrew G. Barto https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
“Algorithms for Reinforcement Learning” by Csaba Szepesvári https://sites.ualberta.ca/~szepesva/RLBook.html
“Policy Networks with Two-Stage Training for Dialogue Systems” by Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman https://arxiv.org/abs/1606.03152
Code examples:
simple DQN example in Python: https://edersantana.github.io/articles/keras_rl/
tool for testing/developing RL algorithms: https://gym.openai.com/