12/31/21 DARPA-MARS Kickoff Adaptive Intelligent Mobile Robots Leslie Pack Kaelbling Artificial Intelligence Laboratory MIT


Adaptive Intelligent Mobile Robots

Leslie Pack Kaelbling

Artificial Intelligence Laboratory

MIT


Two projects

Making reinforcement learning work on real robots

Solving huge problems
  dynamic problem reformulation
  explicit uncertainty management


Reinforcement learning

Given a connection to the environment, find a behavior that maximizes long-run reinforcement

[Diagram: Environment, with Action, Observation, and Reinforcement signals]


Why reinforcement learning?

Unknown or changing environments

Easier for human to provide reinforcement function than whole behavior


Q-Learning

Learn to choose actions because of their long-term consequences

Given experience $\langle s, a, r, s' \rangle$:

$$Q(s,a) = (1-\alpha)\,Q(s,a) + \alpha\,\bigl(r + \gamma \max_{a'} Q(s',a')\bigr)$$

Given a state $s$, take the action $a$ that maximizes $Q(s,a)$
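A minimal tabular sketch of the update above, assuming discrete states and actions stored in a dictionary. The function names, the state and action labels, and the values of alpha and gamma are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update for the experience (s, a, r, s_next)."""
    # Value of the best action available in the next state.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    # Blend the old estimate with the new one-step target.
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]


def greedy_action(Q, s, actions):
    """Given a state s, return the action a that maximizes Q(s, a)."""
    return max(actions, key=lambda a: Q[(s, a)])


# Hypothetical two-action example; all Q values start at zero.
Q = defaultdict(float)
actions = ["left", "right"]
q_update(Q, "s0", "right", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
```

After this single update, `greedy_action(Q, "s0", actions)` prefers `"right"`, since it is the only action with a positive estimate.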


Does it Work?

Yes and no.

Successes in simulated domains: backgammon, elevator scheduling

Successes in manufacturing and juggling with strong constraints

No strong successes in more general online robotic learning


Why is RL on robots hard?

Need fast, robust supervised learning

Continuous input and action spaces

Q-learning slow to propagate values

Need strong exploration bias


Making RL on robots easier

Need fast, robust supervised learning
  locally weighted regression

Continuous input and action spaces
  search and caching of optimal action

Q-learning slow to propagate values
  model-based acceleration

Need strong exploration bias
  start with human-supplied policy
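As a rough illustration of the first item, locally weighted regression fits a small linear model around each query point, weighting stored examples by their distance to the query. The sketch below is one-dimensional and uses a Gaussian kernel; the kernel choice, the bandwidth, and the function name are assumptions made for the example.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Locally weighted linear regression (1-D) evaluated at one query point."""
    # Gaussian kernel: examples near the query get weights close to 1.
    w = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    sw = sum(w)
    # Weighted means of inputs and outputs.
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    # Weighted least-squares slope around the weighted means.
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (query - mx)


# Hypothetical training data lying on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
prediction = lwr_predict(xs, ys, 2.5)
```

Because nothing is fit until a query arrives, adding a new example is just an append, which is one reason this family of methods suits online learning.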


Start with human-provided policy

[Diagram: Human Policy and Environment blocks connected by action and state signals]


Do supervised policy learning

[Diagram: Human Policy and Environment blocks connected by action and state signals; a Train block uses (s, a) pairs to train the Policy]
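One way to picture this step: log the (state, action) pairs produced while the human policy drives, then fit a supervised learner to them. The sketch below uses a one-dimensional state and a nearest-neighbor learner purely for illustration; the toy `human_policy`, the state values, and the choice of learner are assumptions, not the method on the slide.

```python
def human_policy(state):
    """Hypothetical human-supplied controller: steer back toward zero."""
    return -0.5 * state


def collect_demonstrations(policy, states):
    """Record the (s, a) pairs produced while the given policy drives."""
    return [(s, policy(s)) for s in states]


def make_nearest_neighbor_policy(demos):
    """Supervised learner: copy the action of the closest recorded state."""
    def learned_policy(state):
        _, action = min(demos, key=lambda d: abs(d[0] - state))
        return action
    return learned_policy


demos = collect_demonstrations(human_policy, [-2.0, -1.0, 0.0, 1.0, 2.0])
learned_policy = make_nearest_neighbor_policy(demos)
```

Once `learned_policy` reproduces the demonstrations well enough, it can take over from `human_policy`.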


When the policy is learned, let it drive

[Diagram: Human Policy, Train, Policy, and Environment blocks; action and state signals between Policy and Environment]


Q-Learning

[Diagram: Policy, Environment, RL, Train, and Q-Value blocks; signals action, state, s, a, v; one element labeled D]


Acting based on Q values

[Diagram: the state s and candidate actions a1, a2, ..., an feed copies of the Q-Value block; a max-index block selects the action a]
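The selection step in this diagram can be sketched as follows: evaluate the Q-value function once per candidate action and return the candidate at the index of the maximum. The quadratic `toy_q` function and the candidate grid are hypothetical stand-ins for a learned Q-value approximator.

```python
def best_action(q_value, state, candidate_actions):
    """Evaluate Q(s, a_i) for each candidate and return the maximizing action."""
    values = [q_value(state, a) for a in candidate_actions]
    # Index of the largest Q value (the "max index" step).
    max_index = max(range(len(values)), key=values.__getitem__)
    return candidate_actions[max_index]


def toy_q(state, action):
    """Hypothetical Q function that peaks when the action equals the state."""
    return -(action - state) ** 2


candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
chosen = best_action(toy_q, 0.4, candidates)
```

With a continuous action space, the candidate grid is only a coarse search; a finer local search or a cache of previously found maximizers could refine the result.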


Letting the Q-learner drive

[Diagram: Policy, Environment, RL, Train, Q-Value, and max blocks; signals action, state, s, a, v; one element labeled D]


Train policy with max Q values

[Diagram: Policy, Environment, RL, Train, Q-Value, and max blocks; signals action, state, s, a, v, s'; one element labeled D]


Add model learning

[Diagram: Policy, Environment, RL, Q-Value, and Model blocks, with Train blocks for both Q-Value and Model; signals action, state, s, a, r, v; one element labeled D]


When model is good, train Q with it

[Diagram: Policy, Environment, RL, Q-Value, and Model blocks, with Train blocks for both Q-Value and Model; signals action, state, s, a, v, s', a'; one element labeled D]
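One common way to realize this idea is a Dyna-style loop: store observed transitions in a model, then replay them through the Q-learning update without touching the real robot. The sketch below assumes a deterministic, tabular model and a tiny three-state chain; the function names, the chain, and the sweep count are illustrative assumptions rather than the system described on the slides.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)


def record_experience(model, s, a, r, s_next):
    """Model learning: remember the observed outcome of taking a in s."""
    model[(s, a)] = (r, s_next)


def train_q_with_model(Q, model, actions, sweeps=50):
    """Replay modeled transitions so values propagate without new real steps."""
    for _ in range(sweeps):
        for (s, a), (r, s_next) in model.items():
            q_update(Q, s, a, r, s_next, actions)


# Hypothetical chain 0 -> 1 -> 2 with reward 1.0 on the final step.
actions = ["go"]
Q = defaultdict(float)
model = {}
record_experience(model, 0, "go", 0.0, 1)
record_experience(model, 1, "go", 1.0, 2)
train_q_with_model(Q, model, actions)
```

Two real transitions are enough here for the value of state 0 to approach gamma times the final reward, which plain Q-learning would only reach after many repeated real trials.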


Other forms of human knowledge

  hard safety constraints on action choices
  partial models or constraints on models
  value estimates or value orderings on states
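For the first kind of knowledge, a hard safety constraint can be sketched as a filter applied before the greedy choice, so the learner never selects a forbidden action however high its estimated value. The `is_safe` predicate, the speed-based Q function, and the state layout below are hypothetical examples.

```python
def safe_greedy_action(q_value, state, actions, is_safe):
    """Pick the highest-valued action among those the constraint allows."""
    allowed = [a for a in actions if is_safe(state, a)]
    if not allowed:
        raise ValueError("no safe action available in this state")
    return max(allowed, key=lambda a: q_value(state, a))


def is_safe(state, speed):
    """Hypothetical rule: near an obstacle, never exceed speed 1.0."""
    return not (state["near_obstacle"] and speed > 1.0)


def toy_q(state, speed):
    """Hypothetical learned values that always prefer going faster."""
    return speed


speeds = [0.0, 0.5, 1.0, 2.0]
near = safe_greedy_action(toy_q, {"near_obstacle": True}, speeds, is_safe)
clear = safe_greedy_action(toy_q, {"near_obstacle": False}, speeds, is_safe)
```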


We will have succeeded if

It takes less human effort and total development time to
  provide prior knowledge
  run and tune the learning algorithm
than to write and debug the program without learning


Test domain

Indoor mobile-robot navigation and delivery tasks

quick adaptation to new buildings

quick adaptation to sensor change or failure

quick incorporation of human information


Solving huge problems

We have lots of good techniques for small-to-medium sized problems

  reinforcement learning
  probabilistic planning
  Bayesian inference

Rather than scale them to tackle huge problems directly, formulate right-sized problems on the fly


Dynamic problem reformulation

[Diagram: working memory, with perception and action]


Reformulation strategy

Dynamically swap variables in and out of working memory

  constant-sized problem always tractable
  adapt to changing situations, goals, etc.

Given more time pressure, decrease problem size

Given less time pressure, increase problem size
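A toy sketch of this strategy, under the assumption that each variable carries a relevance score and that solving cost grows with the number of variables held in working memory. The scores, the variable names, and the linear cost model are hypothetical; the slides do not specify how variables are ranked.

```python
def select_working_memory(relevance, time_budget, cost_per_variable=1.0):
    """Keep as many of the most relevant variables as the time budget allows."""
    # Less time pressure (larger budget) means a larger problem.
    capacity = max(1, int(time_budget // cost_per_variable))
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    return ranked[:capacity]


# Hypothetical relevance scores for the current situation.
relevance = {"door": 0.9, "battery": 0.7, "weather": 0.1, "hallway": 0.5}
under_pressure = select_working_memory(relevance, time_budget=2.0)
relaxed = select_working_memory(relevance, time_budget=3.5)
```

Re-running the selection whenever the scores or the budget change swaps variables in and out while keeping the problem within a fixed size.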


Multiple-resolution plans

Fine view of near-term high-probability events

Coarse view of distant low-probability events


Information gathering

Explicit models of the robot’s uncertainty allow information gathering actions

  drive to top of hill for better view
  open a door to see what’s inside
  ask a human for guidance

[Illustration: the question "Where is the supply depot?" and the answer "Two miles up this road"]
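A small worked example of why an explicit belief makes such actions comparable with ordinary ones: compute the expected value of acting on the current belief, and compare it with the expected value of first paying a cost to resolve the uncertainty. The belief, the rewards, the asking cost, and the assumption that the human's answer is always correct are all hypothetical.

```python
def value_of_acting_now(belief, reward_correct, reward_wrong):
    """Expected reward of committing to the most likely hypothesis."""
    p_best = max(belief.values())
    return p_best * reward_correct + (1 - p_best) * reward_wrong


def value_of_asking_first(reward_correct, ask_cost):
    """Expected reward if asking reveals the truth (assumed fully reliable)."""
    return reward_correct - ask_cost


# Hypothetical belief about where the supply depot is.
belief = {"north": 0.6, "south": 0.4}
act_now = value_of_acting_now(belief, reward_correct=10.0, reward_wrong=0.0)
ask_first = value_of_asking_first(reward_correct=10.0, ask_cost=1.0)
should_ask = ask_first > act_now
```

With these numbers, asking is worth it; with a sharper belief or a more expensive question, the comparison can flip.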


Explicit uncertainty modeling

POMDP work gives us theoretical understanding

Derive practical solutions from
  learning explicit memorization policies
  approximating optimal control
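For context, the core operation in a POMDP is the Bayesian belief update: predict the next state with the transition model, then reweight by the likelihood of the observation. The sketch below uses dictionary-based models; the two-state listening example, its 0.85 observation accuracy, and the table layout are hypothetical.

```python
def belief_update(belief, action, observation, T, O):
    """Return the posterior belief after taking an action and seeing an observation.

    T[(s, action)] maps next states to probabilities.
    O[(s_next, action)] maps observations to probabilities.
    """
    unnormalized = {}
    for s_next in belief:
        # Prediction step: probability of reaching s_next under the action.
        predicted = sum(T[(s, action)].get(s_next, 0.0) * p
                        for s, p in belief.items())
        # Correction step: weight by how likely the observation is in s_next.
        unnormalized[s_next] = O[(s_next, action)].get(observation, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical problem: listening does not change the state, and the
# observation names the true state with probability 0.85.
T = {("left", "listen"): {"left": 1.0}, ("right", "listen"): {"right": 1.0}}
O = {("left", "listen"): {"hear-left": 0.85, "hear-right": 0.15},
     ("right", "listen"): {"hear-left": 0.15, "hear-right": 0.85}}
posterior = belief_update({"left": 0.5, "right": 0.5}, "listen", "hear-left", T, O)
```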


Huge-domain experiments

Simulation of very complex task environment
  large number of buildings and other geographical structures
  concurrent, competing tasks such as
    surveillance
    supply delivery
    self-preservation
  other agents from whom information can be gathered