HOW ABOUT A NICE GAME OF CHESS? Lee Chuk Munn [email protected] http://bit.ly/learningday2017

NUS-ISS Learning Day 2017 - How About a Game of Chess?


Page 1: NUS-ISS Learning Day 2017 - How About a Game of Chess?

HOW ABOUT A NICE GAME OF CHESS?

Lee Chuk Munn [email protected]

http://bit.ly/learningday2017

Page 2: NUS-ISS Learning Day 2017 - How About a Game of Chess?
Page 3: NUS-ISS Learning Day 2017 - How About a Game of Chess?

What is this talk about?

An introduction to key ideas in reinforcement learning

Conceptual, fairly high level

Intuition about the maths without the equations. Not quite: there are 4 equations

Demos

Page 4: NUS-ISS Learning Day 2017 - How About a Game of Chess?

What is Machine Learning?

“Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed”

Arthur Samuel

Page 5: NUS-ISS Learning Day 2017 - How About a Game of Chess?

In 1959, wrote the first program that learnt how to play checkers

Uses the minimax algorithm

Arthur Samuel

Page 6: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Programmed vs Learnt

Page 7: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Traditional Program

[Diagram: Program and Data go into the Machine, which produces the Answer]

Page 8: NUS-ISS Learning Day 2017 - How About a Game of Chess?

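Machine Learning

[Diagram, training phase: Data and Answer go into Machine Learning, which produces the Program]

[Diagram, after training: New Data goes into the Machine running that Program, which produces the New Answer]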

Page 9: NUS-ISS Learning Day 2017 - How About a Game of Chess?

How Do Machines Learn?

Slides adapted from https://www.slideshare.net/aaalm/2016-01-27-a-gentle-and-structured-introduction-to-machine-learning-v105-for-release-58329371

By generalising: Supervised Learning

- Give samples

- Give answers to the samples

- Infer rules from the samples and the answers

By comparing: Unsupervised Learning

- Give samples

- Do not give answers

- Use some metrics to infer similarity by grouping them

By reward: Reinforcement Learning

- Do not give samples

- Do not give answers

- Infer the rules from the positive or negative feedback

Page 10: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Reinforcement Learning circa 1977

Page 11: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Reinforcement Learning

[Diagram: the Agent takes an Action on the Environment, then observes the resulting State and Reward]

Page 12: NUS-ISS Learning Day 2017 - How About a Game of Chess?
Page 13: NUS-ISS Learning Day 2017 - How About a Game of Chess?

State

A ‘snapshot’ of the environment at a point in time

What are the possible actions to take?

Page 14: NUS-ISS Learning Day 2017 - How About a Game of Chess?

State

LEFT UP

Page 15: NUS-ISS Learning Day 2017 - How About a Game of Chess?

State Transition

[Diagram: taking an Action in a State leads to one of several possible next States]

Page 16: NUS-ISS Learning Day 2017 - How About a Game of Chess?

State

LEFT UP

Which of these 2 states is the better one to be in?

Page 17: NUS-ISS Learning Day 2017 - How About a Game of Chess?

State

Features of this state (LEFT)

Straight ahead, no reduction in speed

Features of this state (UP)

Close to the power pill, but around the corner (obstacle), so speed is reduced

Page 18: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Utility

[Diagram: taking an Action in a State leads to one of several possible next States]

The best state is the one with the best features

Utility is how we calculate the ‘goodness’ of a state

Using the utility function we can express the agent’s preference

u(state0) > u(state1) means the agent prefers state0 over state1
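One common way to turn features into a utility is a weighted sum. The sketch below assumes that approach; the feature names, weights, and the two example states are hypothetical and are not taken from the slides.

```python
from typing import Dict

# Hypothetical feature weights; the slides do not give any numbers.
WEIGHTS: Dict[str, float] = {
    "near_power_pill": 2.0,
    "keeps_speed": 1.0,
    "around_corner": -0.5,
}


def utility(features: Dict[str, float]) -> float:
    """Score a state as the weighted sum of its feature values."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())


# Two made-up states described only by their features.
state0 = {"near_power_pill": 1.0, "keeps_speed": 0.0, "around_corner": 1.0}
state1 = {"near_power_pill": 0.0, "keeps_speed": 1.0, "around_corner": 0.0}

# The agent prefers whichever state has the higher utility.
preferred = "state0" if utility(state0) > utility(state1) else "state1"
print(utility(state0), utility(state1), preferred)
```

With these made-up weights u(state0) > u(state1), so the agent prefers state0. Different weights would express a different preference.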

Page 19: NUS-ISS Learning Day 2017 - How About a Game of Chess?

To Win

Maximize our utility

Page 20: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Utility, Value and Reward

Page 21: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Rewards - How do they differ?

Page 22: NUS-ISS Learning Day 2017 - How About a Game of Chess?

What Actions to Take?

Image from http://ai.berkeley.edu/home.html

Page 23: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Rewards

- Can be either positive or negative

- Given at the end

- Given at every step - living reward

Prefer now to later

- Discounting - earlier rewards will have higher utility than later rewards
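Discounting can be illustrated with a few lines of code. This is a minimal sketch assuming the usual geometric discount, where a reward received t steps from now is multiplied by gamma to the power t; the discount factor 0.5 and the reward sequences are made up for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum the rewards, shrinking the reward at step t by gamma ** t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# The same reward of 10, received immediately or two steps later.
early = discounted_return([10, 0, 0], gamma=0.5)
late = discounted_return([0, 0, 10], gamma=0.5)
print(early, late)
```

The same reward is worth less the later it arrives, which is why the agent prefers now to later.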

Image from http://ai.berkeley.edu/home.html

Page 24: NUS-ISS Learning Day 2017 - How About a Game of Chess?

How will the Agent Behave?

Scenario #1

Living reward is -1

End game reward is 100

Scenario #2

Living reward is 1

End game reward is 100
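The two scenarios can be compared with simple arithmetic. The sketch below assumes no discounting and no movement noise, and the path lengths (3 steps versus 10 steps before the exit) are hypothetical.

```python
def episode_return(steps_before_exit: int, living_reward: float, exit_reward: float) -> float:
    """Total undiscounted reward: one living reward per step, plus the exit reward."""
    return steps_before_exit * living_reward + exit_reward


results = {}
for living_reward in (-1, 1):
    short_path = episode_return(3, living_reward, 100)
    long_path = episode_return(10, living_reward, 100)
    results[living_reward] = (short_path, long_path)
    print(living_reward, short_path, long_path)
```

Under these assumptions a living reward of -1 makes the short path better, so the agent hurries to the exit, while a living reward of 1 makes the long path better, so the agent has an incentive to delay finishing.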

Page 25: NUS-ISS Learning Day 2017 - How About a Game of Chess?

UP

Page 26: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Uncertainty

Controller Problem

80% move in the correct direction

10% go left

10% go right
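The noisy controller can be sketched as a probability distribution over actual moves. The code assumes a four-direction grid and that "left" and "right" are relative to the intended direction; the function names are illustrative only.

```python
import random
from typing import Dict

# Directions in clockwise order, so index - 1 is "to the left" and index + 1 is "to the right".
DIRECTIONS = ["UP", "RIGHT", "DOWN", "LEFT"]


def outcome_distribution(intended: str) -> Dict[str, float]:
    """80% the intended direction, 10% veer left of it, 10% veer right of it."""
    i = DIRECTIONS.index(intended)
    left = DIRECTIONS[(i - 1) % 4]
    right = DIRECTIONS[(i + 1) % 4]
    return {intended: 0.8, left: 0.1, right: 0.1}


def noisy_move(intended: str, rng: random.Random) -> str:
    """Sample the direction the agent actually moves in."""
    dist = outcome_distribution(intended)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


rng = random.Random(0)  # seeded so repeated runs give the same samples
print(outcome_distribution("UP"))
print([noisy_move("UP", rng) for _ in range(5)])
```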

Page 27: NUS-ISS Learning Day 2017 - How About a Game of Chess?

What is the Value?

UP

80%

20%

Page 28: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Time to Changi Airport

Normal traffic = 30 mins

Probability = 60%

Heavy traffic = 45 mins

Probability = 40%

Page 29: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Time to Changi Airport

(30 ✕ 0.6) = 18 mins, (45 ✕ 0.4) = 18 mins

Expected time = 18 + 18 = 36 mins
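The airport calculation is an expected value: each outcome weighted by its probability. A minimal sketch using the numbers from the slide (the function name is illustrative):

```python
from typing import Iterable, Tuple


def expected_value(outcomes: Iterable[Tuple[float, float]]) -> float:
    """Sum of value * probability over all (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)


# (travel time in minutes, probability) from the slide.
travel = [(30, 0.6), (45, 0.4)]
expected_minutes = expected_value(travel)
print(expected_minutes)
```

The same weighting is what gives a value to an action whose result is uncertain, such as the 80%/20% move above.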

Page 30: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Bellman Equation

Richard Bellman

Page 31: NUS-ISS Learning Day 2017 - How About a Game of Chess?

The Bellman Equation

[Diagram: from state0, choosing an action gives the Q-state (state0, action); the transition (state0, action, state1) then leads to state1, which has its own Value]

Policy Extraction or How to Win

Step 1 - start by being optimal

Step 2 - keep being optimal
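The equations on this slide did not survive extraction. The sketch below assumes the standard textbook form of the Bellman update, V(s) = max over a of the sum over s' of T(s, a, s') * [R(s, a, s') + gamma * V(s')], and applies it repeatedly (value iteration) before extracting a policy. The three-state MDP, its rewards, and the discount of 0.9 are hypothetical.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, reward)
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

# Hypothetical MDP: "goal" is terminal and has no actions.
MDP: Transitions = {
    "start": {
        "safe": [(1.0, "middle", -1.0)],
        "risky": [(0.8, "goal", 10.0), (0.2, "start", -5.0)],
    },
    "middle": {"forward": [(1.0, "goal", 10.0)]},
    "goal": {},
}


def q_value(mdp: Transitions, values: Dict[str, float], state: str, action: str, gamma: float) -> float:
    """Expected reward plus discounted value of the next state for a Q-state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in mdp[state][action])


def value_iteration(mdp: Transitions, gamma: float = 0.9, sweeps: int = 100) -> Dict[str, float]:
    """Repeat the Bellman update: each state's value is its best Q-value."""
    values = {s: 0.0 for s in mdp}
    for _ in range(sweeps):
        values = {
            s: max((q_value(mdp, values, s, a, gamma) for a in mdp[s]), default=0.0)
            for s in mdp
        }
    return values


def extract_policy(mdp: Transitions, values: Dict[str, float], gamma: float = 0.9) -> Dict[str, str]:
    """Policy extraction: in every non-terminal state pick the action with the best Q-value."""
    return {
        s: max(mdp[s], key=lambda a: q_value(mdp, values, s, a, gamma))
        for s in mdp
        if mdp[s]
    }


values = value_iteration(MDP)
policy = extract_policy(MDP, values)
print(values, policy)
```

Acting on the extracted policy is the "start by being optimal, keep being optimal" recipe: the best action now assumes the best actions are taken from every later state as well.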

Page 32: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Demo

1. Prefer the closer exit (1), risking the cliff (-10)

2. Prefer the closer exit (1), avoiding the cliff (-10)

3. Prefer the distant exit (10), risking the cliff (-10)

4. Prefer the distant exit (10), avoiding the cliff (-10)

Page 33: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Model Free Learning

Trial and Error

Don’t know the transitions

Don’t know the rewards

Page 34: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Model Free Learning

Learn by trial and error

Eventually will approximate Bellman updates
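One well-known model-free method is Q-learning; the slides do not name the algorithm used in the demo, so treat this as an assumed example. The agent never sees the transitions or rewards up front and only nudges its estimates toward each observed sample. The states, actions, rewards, learning rate (alpha) and discount (gamma) below are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

QTable = DefaultDict[Tuple[str, str], float]


def q_update(q: QTable, state: str, action: str, reward: float, next_state: str,
             next_actions: List[str], alpha: float = 0.5, gamma: float = 0.9) -> None:
    """Move Q(state, action) part of the way toward the observed sample."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    sample = reward + gamma * best_next
    q[(state, action)] += alpha * (sample - q[(state, action)])


q: QTable = defaultdict(float)

# Hypothetical experience gathered by trial and error:
# (state, action, reward, next_state, actions available in next_state)
experience = [
    ("A", "right", 0.0, "B", ["left", "right"]),
    ("B", "right", 10.0, "exit", []),
    ("A", "right", 0.0, "B", ["left", "right"]),
]

for s, a, r, s2, acts in experience:
    q_update(q, s, a, r, s2, acts)

print(dict(q))
```

After the second visit to A, the reward found at B has started to propagate backwards into Q(A, right), which is how repeated samples come to approximate the Bellman update.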

Page 35: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Explore vs Exploit
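A simple way to balance the two is an epsilon-greedy rule: with probability epsilon try a random action (explore), otherwise take the best known action (exploit). This is a sketch of that common rule, not necessarily what the demo uses; the Q-values are made up.

```python
import random
from typing import Dict, List, Tuple


def epsilon_greedy(q: Dict[Tuple[str, str], float], state: str, actions: List[str],
                   epsilon: float, rng: random.Random) -> str:
    """Explore with probability epsilon, otherwise exploit the best known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore: pick any action at random
    return max(actions, key=lambda a: q.get((state, a), 0.0))  # exploit


# Hypothetical learnt Q-values for a single state "s".
q = {("s", "left"): 1.0, ("s", "right"): 3.0}
rng = random.Random(0)

greedy_choice = epsilon_greedy(q, "s", ["left", "right"], epsilon=0.0, rng=rng)
random_choice = epsilon_greedy(q, "s", ["left", "right"], epsilon=1.0, rng=rng)
print(greedy_choice, random_choice)
```

With epsilon set to 0 the agent always exploits, and with epsilon set to 1 it always explores; in practice epsilon is often started high and lowered as the agent learns.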

Page 36: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Demo

Page 37: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Where to Learn?

Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto

Berkeley AI Course - http://ai.berkeley.edu/home.html

David Silver - http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html

Page 38: NUS-ISS Learning Day 2017 - How About a Game of Chess?

Thank You

Have a great day

Maximize your utility