Institute of Systems Science, National University of Singapore
What is this talk about?
An introduction to key ideas in reinforcement learning
Conceptual, fairly high level
Intuition about the maths without the equations (well, not quite: there are 4 equations)
Demos
What is Machine Learning?
“Machine learning is the field of
study that gives computers the
ability to learn without being
explicitly programmed”
Arthur Samuel
Wrote the first program that learned how to play checkers, in 1959
Used the minimax algorithm
Arthur Samuel
Programmed vs Learnt
Traditional Program
Data + Program → Machine → Answer
Machine Learning (training phase)
Data + Answer → Machine → Program
Machine Learning (after training)
New Data + Program → Machine → New Answer
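A minimal sketch of the contrast in code; the spam-filter task, the example emails, and the scikit-learn model are illustrative assumptions, not something from the slides:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Traditional program: a hand-written rule maps data directly to an answer.
def programmed_filter(email_text):
    return "spam" if "prize" in email_text.lower() else "not spam"

# Machine learning, training phase: data + answers go in, a "program" (model) comes out.
emails  = ["win a free prize now", "meeting at 3pm", "claim your prize", "lunch tomorrow?"]
answers = ["spam", "not spam", "spam", "not spam"]

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(emails), answers)

# After training: new data + learnt program -> new answer.
print(model.predict(vectorizer.transform(["free prize waiting"])))  # e.g. ['spam']
```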
How Do Machines Learn?
Slides adapted from https://www.slideshare.net/aaalm/2016-01-27-a-gentle-and-structured-introduction-to-machine-learning-v105-for-release-58329371
By generalising
Supervised
Learning
Give samples
Give answers to the
samples
Infer rules from the
samples and the
answers
By comparing
Unsupervised
Learning
Give samples
Do not give answers
Use some similarity metric to
group the samples
By reward
Reinforcement
Learning
Do not give samples
Do not give answers
Infer the rules from the
positive or negative
feedback
Reinforcement Learning circa 1977
Agent → action → Environment
Environment → state, reward → Agent (observes)
Reinforcement Learning
State
A ‘snapshot’ of the environment at a point in time
What are the possible actions to take?
State
Possible actions: LEFT, UP
State Transition
state + action → one of several possible next states
State
LEFT or UP?
Which of these 2 states is the better one to be in?
State
Features of this state:
straight ahead, no reduction in speed
Features of this state:
close to a power pill, but it is around the corner (obstacle) and requires reducing speed
Utility
state + action → several possible next states
One of the next states has the best features
Utility is how we calculate
the ‘goodness’ of a state
Using the utility function we
can express the agent’s
preference
u(state0) > u(state1)
means the agent prefers state0 over state1
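A tiny sketch of this idea in code; the Pac-Man-style feature names and their weights are made-up assumptions, purely to illustrate comparing states by utility:

```python
# Hypothetical state features; the weights are illustrative, not from the talk.
def utility(state):
    """Score the 'goodness' of a state as a weighted sum of its features."""
    return (10.0 * state["near_power_pill"]   # being near a power pill is good
            - 5.0 * state["obstacle_ahead"]   # obstacles are bad
            + 1.0 * state["speed"])           # keeping speed up is mildly good

state0 = {"near_power_pill": 1, "obstacle_ahead": 1, "speed": 0.5}
state1 = {"near_power_pill": 0, "obstacle_ahead": 0, "speed": 1.0}

# u(state0) > u(state1) means the agent prefers state0 over state1.
print(utility(state0), utility(state1), utility(state0) > utility(state1))
```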
To Win
Maximize our utility
Utility, Value and Reward
How do they differ?
What Actions to Take?
Image from http://ai.berkeley.edu/home.html
Rewards
- Can be either positive or negative
- Given at the end
- Given at every step - living reward
Prefer now to later
- Discounting - earlier rewards will have higher utility than later rewards
Image from http://ai.berkeley.edu/home.html
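A small sketch of discounting, assuming the usual geometric discount factor gamma; the reward sequences are made up for illustration:

```python
# Discounted return: a reward k steps in the future is weighted by gamma**k,
# so earlier rewards contribute more than later ones.
def discounted_return(rewards, gamma=0.9):
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(discounted_return([0, 0, 100]))   # 100 received late  -> 81.0
print(discounted_return([100, 0, 0]))   # 100 received early -> 100.0
```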
How will the Agent Behave?
Scenario #1
Living reward is -1
End game reward is 100
Scenario #2
Living reward is 1
End game reward is 100
UP
Uncertainty
Controller Problem
80% move in the correct direction
10% go left
10% go right
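A rough sketch of such a noisy controller; the 80/10/10 split comes from the slide, everything else (action names, slip directions) is an illustrative assumption:

```python
import random

# Intended action succeeds 80% of the time; 10% it slips left, 10% it slips right.
SLIP_LEFT  = {"UP": "LEFT",  "LEFT": "DOWN", "DOWN": "RIGHT", "RIGHT": "UP"}
SLIP_RIGHT = {"UP": "RIGHT", "RIGHT": "DOWN", "DOWN": "LEFT", "LEFT": "UP"}

def noisy_action(intended):
    roll = random.random()
    if roll < 0.8:
        return intended
    elif roll < 0.9:
        return SLIP_LEFT[intended]
    else:
        return SLIP_RIGHT[intended]

print(noisy_action("UP"))  # usually "UP", sometimes "LEFT" or "RIGHT"
```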
What is the Value?
Action UP: 80% intended outcome, 20% otherwise
Time to Changi Airport
Normal traffic = 30 mins
Probability = 60%
Heavy traffic = 45 mins
Probability = 40%
Time to Changi Airport
(30 ✕ 0.6) + (45 ✕ 0.4) = 18 + 18 = 36 mins (expected time)
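The same expected-value calculation as a trivial code sketch; the numbers are from the slide:

```python
# Expected travel time = sum over outcomes of (time * probability).
outcomes = [(30, 0.6),   # normal traffic: 30 mins, 60% of the time
            (45, 0.4)]   # heavy traffic:  45 mins, 40% of the time

expected_time = sum(time * prob for time, prob in outcomes)
print(expected_time)  # 36.0 mins
```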
Bellman Equation
Richard Bellman
The Bellman Equation
state0 → (state0, action) → (state0, action, state1) → state1
(state0, action) is a Q-state; each state has a value
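The equation itself did not survive the export. A common statement of the Bellman optimality equation, in the notation used by the Berkeley AI course these slides reference (T is the transition probability, R the reward, γ the discount factor), is:

```latex
% Value of a state = value of its best Q-state;
% Q-state value = expected immediate reward + discounted value of the next state
V^*(s)   = \max_a Q^*(s, a)
Q^*(s,a) = \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \, V^*(s') \right]
```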
Policy Extraction or How to Win
Step 1 - start by being optimal
Step 2 - keep being optimal
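A minimal sketch of policy extraction, assuming the optimal Q-values have already been computed (e.g. by value iteration); the `q_values` input and helper names here are hypothetical:

```python
# Policy extraction: in each state, pick the action whose Q-value is highest.
# q_values is assumed to be a dict {(state, action): value} computed beforehand;
# it is a placeholder for illustration, not something defined in the slides.
def extract_policy(q_values, states, actions):
    policy = {}
    for s in states:
        policy[s] = max(actions, key=lambda a: q_values[(s, a)])
    return policy
```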
Demo
1. Prefer the closer exit (1), risking the cliff (-10)
2. Prefer the closer exit (1), avoiding the cliff (-10)
3. Prefer the distant exit (10), risking the cliff (-10)
4. Prefer the distant exit (10), avoiding the cliff (-10)
Model Free Learning
Trial and Error
Don’t know the transitions
Don’t know the rewards
Model Free Learning
Learn by trial and error
Eventually will approximate Bellman updates
Explore vs Exploit
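A compact sketch of one model-free method, Q-learning with epsilon-greedy exploration. The slides do not name a specific algorithm, so this is an illustrative choice; the environment interface (reset/step) is an assumption for the sketch:

```python
import random
from collections import defaultdict

# Q-learning: learn Q(s, a) from trial and error, without knowing T or R.
# `env` is a placeholder with reset() -> state and step(a) -> (next_state, reward, done);
# this interface is an assumption, not something defined in the slides.
def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Explore vs exploit: mostly take the best-known action, sometimes a random one.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Sample-based approximation of the Bellman update.
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```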
Demo
Where to Learn?
Reinforcement Learning: An Introduction by Richard S Sutton and Andrew G Barto
Berkeley AI Course - http://ai.berkeley.edu/home.html
David Silver - http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html
Thank You
Have a great day
Maximize your utility