Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell

Machine Learning

Chapter 13. Reinforcement Learning

Tom M. Mitchell

Control Learning

Consider learning to choose actions, e.g.,
- Robot learning to dock on battery charger
- Learning to choose actions to optimize factory output
- Learning to play Backgammon

Note several problem characteristics:
- Delayed reward
- Opportunity for active exploration
- Possibility that state is only partially observable
- Possible need to learn multiple tasks with the same sensors/effectors

One Example: TD-Gammon

Learn to play Backgammon

Immediate reward: +100 if win, -100 if lose, 0 for all other states

Trained by playing 1.5 million games against itself

Now approximately equal to best human player

Reinforcement Learning Problem

Markov Decision Processes

Assume
- finite set of states $S$
- set of actions $A$
- at each discrete time $t$, agent observes state $s_t \in S$ and chooses action $a_t \in A$
- then receives immediate reward $r_t$, and state changes to $s_{t+1}$

Markov assumption: $s_{t+1} = \delta(s_t, a_t)$ and $r_t = r(s_t, a_t)$
- i.e., $r_t$ and $s_{t+1}$ depend only on the current state and action
- functions $\delta$ and $r$ may be nondeterministic
- functions $\delta$ and $r$ not necessarily known to agent
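
A minimal sketch of the definitions above, assuming a deterministic world. The four-state corridor, the action names, and the reward of 100 for entering the goal are hypothetical choices made for illustration; they do not come from the slides. The transition function is written as `delta` and the reward function as `reward`.

```python
# Hypothetical deterministic MDP: a corridor of states 0..3 with the goal at 3.
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
GOAL = 3


def delta(s, a):
    """Transition function: next state depends only on current state and action."""
    if s == GOAL:  # the goal is absorbing
        return s
    step = 1 if a == "right" else -1
    return min(max(s + step, 0), GOAL)  # walls at both ends of the corridor


def reward(s, a):
    """Immediate reward: 100 for the move that enters the goal, 0 otherwise."""
    return 100 if s != GOAL and delta(s, a) == GOAL else 0


def rollout(s0, actions):
    """Apply a fixed action sequence and record (state, action, reward) triples."""
    s, trajectory = s0, []
    for a in actions:
        trajectory.append((s, a, reward(s, a)))
        s = delta(s, a)
    return trajectory, s
```

Calling `rollout(0, ["right", "right", "right"])` walks the agent from state 0 to the goal. Because both functions take only the current state and action as arguments, the Markov assumption holds by construction in this sketch.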

Agent's Learning Task

Value Function

What to Learn

Q Function

Training Rule to Learn Q

Q Learning for Deterministic Worlds
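
The body of this slide did not survive extraction. As a hedged illustration only, the sketch below applies the standard deterministic Q-learning update, which overwrites the estimate for (s, a) with r + gamma * max over a' of the estimate for (s', a'), on a hypothetical four-state corridor. The environment, the discount factor of 0.9, the episode count, and the purely random action selection are all assumptions made for this example, not the author's settings.

```python
import random

GAMMA = 0.9                    # assumed discount factor
STATES = range(4)              # hypothetical corridor, goal at state 3
ACTIONS = ("left", "right")
GOAL = 3


def delta(s, a):
    """Deterministic transition function with walls at both ends."""
    if s == GOAL:
        return s
    return min(max(s + (1 if a == "right" else -1), 0), GOAL)


def reward(s, a):
    """100 for the move that enters the goal, 0 otherwise."""
    return 100 if s != GOAL and delta(s, a) == GOAL else 0


def q_learning(episodes=500, seed=0):
    """Learn a Q table from randomly explored episodes."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}  # initialise estimates to 0
    for _ in range(episodes):
        s = rng.choice([0, 1, 2])          # start in a random non-goal state
        while s != GOAL:
            a = rng.choice(ACTIONS)        # explore uniformly at random
            r, s_next = reward(s, a), delta(s, a)
            # Deterministic update: no learning rate is needed.
            q[(s, a)] = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
            s = s_next
    return q


def greedy_policy(q):
    """Pick the action with the highest learned Q value in each non-goal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES if s != GOAL}
```

Running `greedy_policy(q_learning())` yields a policy for the three non-goal states. Random exploration is used here only to keep the sketch short; any strategy that keeps visiting every state-action pair would serve the same purpose.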

Nondeterministic Case

Nondeterministic Case (Cont'd)

Temporal Difference Learning

Temporal Difference Learning (Cont'd)

Subtleties and Ongoing Research