From Motor Babbling to Planning Cornelius Weber Frankfurt Institute for Advanced Studies Goethe University Frankfurt, Germany ICN Young Investigators’ Colloquium 26 th June 2008, Frankfurt am Main



Reinforcement Learning

(diagram: value units and actor units over the state space)

fixed reactive system that always strives for the same goal

Trained Weights

reinforcement learning does not use the exploration phase

to learn a general model of the environment

that would allow the agent to plan a route to any goal

so let’s do this

Learning

actor

state space

randomly move around the state space

learn world models:
● associative model
● inverse model
● forward model

variables:
► action a
► current state s
► next state s'
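The babbling phase above can be sketched in code. The slides leave the environment unspecified, so the ring of discrete states, the two actions, and the deterministic dynamics below are illustrative assumptions, not the setup actually used:

```python
import numpy as np

# Hypothetical toy world: a 1-D ring of N discrete states and two
# actions (step left / step right). Purely an assumed stand-in for
# the unspecified state space of the slides.
N_STATES = 10
N_ACTIONS = 2  # 0 = left, 1 = right

def step(s, a):
    """Deterministic toy dynamics on the ring."""
    return (s + (1 if a == 1 else -1)) % N_STATES

def motor_babbling(n_steps, rng):
    """Random exploration: collect (action, state, next state) triples."""
    s = int(rng.integers(N_STATES))
    triples = []
    for _ in range(n_steps):
        a = int(rng.integers(N_ACTIONS))
        s_next = step(s, a)
        triples.append((a, s, s_next))
        s = s_next
    return triples

rng = np.random.default_rng(0)
data = motor_babbling(1000, rng)
```

The collected triples (a, s, s') are exactly the variables the three world models are trained on.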

Learning: Associative Model

weights to associate neighbouring states

use these to find any possible routes between agent and goal

$s'_i = \sum_j w^{s's}_{ij}\, s_j$

$\Delta w^{s's}_{ij} = \varepsilon\,(\bar s'_i - s'_i)\, s_j$
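The delta rule above can be sketched directly, reading $\bar s'_i$ as the observed next-state activity. One-hot state coding, the learning rate, and the ring-world dynamics used for training data are assumptions for this toy example:

```python
import numpy as np

# Sketch of the associative model: predict the next state from the
# current state, delta rule on the prediction error. One-hot coding
# and epsilon = 0.1 are illustrative assumptions.
N = 10
EPS = 0.1

def one_hot(i, n=N):
    v = np.zeros(n); v[i] = 1.0
    return v

W_assoc = np.zeros((N, N))  # w^{s's}_{ij}: associates next state i with state j

def train_assoc(W, s, s_next, eps=EPS):
    sj = one_hot(s)
    target = one_hot(s_next)            # observed next state, bar{s}'
    pred = W @ sj                       # s'_i = sum_j w_ij s_j
    W += eps * np.outer(target - pred, sj)  # delta rule

# train on random-walk transitions of an assumed ring world
rng = np.random.default_rng(0)
s = 0
for _ in range(2000):
    a = int(rng.integers(2))
    s_next = (s + (1 if a else -1)) % N
    train_assoc(W_assoc, s, s_next)
    s = s_next
```

After training, each column of `W_assoc` carries weight only on the states reachable from that state, so spreading activity through these weights can trace routes between agent and goal.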

Learning: Inverse Model

weights to “postdict” action given state pair

use these to identify the action that leads to a desired state

$a_k = \sum_{ij} w^{as's}_{kij}\, s'_i\, s_j$

$\Delta w^{as's}_{kij} = \varepsilon\,(\bar a_k - a_k)\, s'_i\, s_j$

Sigma-Pi neuron model: $\sum$ sum over $\prod$ product terms
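A sketch of the inverse model with Sigma-Pi units: each action unit $a_k$ sums over product terms $s'_i s_j$, and $\bar a_k$ is the action actually taken. One-hot coding, the learning rate, and the ring-world training data are assumptions:

```python
import numpy as np

# Inverse model: "postdict" the action from a (state, next-state) pair.
# With one-hot states, the Sigma-Pi product s'_i * s_j is a rank-one
# outer product with a single active entry.
N, N_ACT, EPS = 10, 2, 0.5  # sizes and epsilon are illustrative assumptions

def one_hot(i, n):
    v = np.zeros(n); v[i] = 1.0
    return v

W_inv = np.zeros((N_ACT, N, N))  # w^{a s' s}_{kij}

def train_inverse(W, a, s, s_next, eps=EPS):
    pi = np.outer(one_hot(s_next, N), one_hot(s, N))  # product terms s'_i s_j
    target = one_hot(a, N_ACT)                        # action taken, bar{a}
    pred = np.tensordot(W, pi, axes=2)                # a_k = sum_ij w_kij s'_i s_j
    W += eps * np.einsum('k,ij->kij', target - pred, pi)  # delta rule

def infer_action(W, s, s_next):
    """Identify the action leading from s to the desired state s_next."""
    pi = np.outer(one_hot(s_next, N), one_hot(s, N))
    return int(np.argmax(np.tensordot(W, pi, axes=2)))

# train on random-walk transitions of an assumed ring world
rng = np.random.default_rng(0)
s = 0
for _ in range(2000):
    a = int(rng.integers(N_ACT))
    s_next = (s + (1 if a else -1)) % N
    train_inverse(W_inv, a, s, s_next)
    s = s_next
```

Because the toy dynamics are deterministic and invertible, the weight for the correct action converges towards 1 while the others stay at 0.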

Learning: Forward Model

weights to predict stategiven state-action pair

use these to predict the next state given the chosen action

$s'_i = \sum_{kj} w^{s'as}_{ikj}\, a_k\, s_j$

$\Delta w^{s'as}_{ikj} = \varepsilon\,(\bar s'_i - s'_i)\, a_k\, s_j$
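The forward model follows the same Sigma-Pi pattern with product terms $a_k s_j$; $\bar s'_i$ is again the observed next state. As before, one-hot coding, the learning rate, and the ring world are illustrative assumptions:

```python
import numpy as np

# Forward model: predict the next state from the current (state, action)
# pair via Sigma-Pi product terms a_k * s_j.
N, N_ACT, EPS = 10, 2, 0.5  # sizes and epsilon are illustrative assumptions

def one_hot(i, n):
    v = np.zeros(n); v[i] = 1.0
    return v

W_fwd = np.zeros((N, N_ACT, N))  # w^{s' a s}_{ikj}

def train_forward(W, a, s, s_next, eps=EPS):
    pi = np.outer(one_hot(a, N_ACT), one_hot(s, N))  # product terms a_k s_j
    target = one_hot(s_next, N)                      # observed next state
    pred = np.tensordot(W, pi, axes=2)               # s'_i = sum_kj w_ikj a_k s_j
    W += eps * np.einsum('i,kj->ikj', target - pred, pi)  # delta rule

def predict_next(W, s, a):
    """Predict the next state given the chosen action."""
    pi = np.outer(one_hot(a, N_ACT), one_hot(s, N))
    return int(np.argmax(np.tensordot(W, pi, axes=2)))

# train on random-walk transitions of an assumed ring world
rng = np.random.default_rng(0)
s = 0
for _ in range(2000):
    a = int(rng.integers(N_ACT))
    s_next = (s + (1 if a else -1)) % N
    train_forward(W_fwd, a, s, s_next)
    s = s_next
```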

Planning

[Animation over several slides: agent, actor units, and goal in the state space; the agent moves step by step toward the goal.]
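The slides show the planning phase only as animation frames and do not spell out the algorithm. One plausible reading, consistent with the “goal hills” remark in the discussion, is to spread activity from the goal through the associative weights so that a decaying hill forms over the state space, climb that hill greedily, and read out each step's action with the inverse model. Everything below (the ring world, one-hot coding, thresholds, learning rates, the `plan` function itself) is a hedged reconstruction, not the authors' implementation:

```python
import numpy as np

N, N_ACT = 10, 2  # assumed toy ring world with two actions

def one_hot(i, n):
    v = np.zeros(n); v[i] = 1.0
    return v

W_assoc = np.zeros((N, N))        # associative model
W_inv = np.zeros((N_ACT, N, N))   # inverse model (Sigma-Pi)

# --- motor babbling: train both world models on random transitions ---
rng = np.random.default_rng(0)
s = 0
for _ in range(3000):
    a = int(rng.integers(N_ACT))
    s_next = (s + (1 if a else -1)) % N
    sj, si, ta = one_hot(s, N), one_hot(s_next, N), one_hot(a, N_ACT)
    W_assoc += 0.1 * np.outer(si - W_assoc @ sj, sj)
    pi = np.outer(si, sj)
    W_inv += 0.5 * np.einsum('k,ij->kij',
                             ta - np.tensordot(W_inv, pi, axes=2), pi)
    s = s_next

def plan(start, goal, max_steps=20):
    # spread a decaying "goal hill" through the associative weights
    v = one_hot(goal, N)
    for _ in range(N):
        v = np.maximum(v, W_assoc @ v)
    # greedy hill climbing; the inverse model supplies each action
    path, s = [], start
    for _ in range(max_steps):
        if s == goal:
            break
        cands = np.where(W_assoc[:, s] > 0.1)[0]    # associatively reachable
        s_next = int(cands[np.argmax(v[cands])])    # highest point on the hill
        pi = np.outer(one_hot(s_next, N), one_hot(s, N))
        a = int(np.argmax(np.tensordot(W_inv, pi, axes=2)))
        path.append(a)
        s = (s + (1 if a else -1)) % N              # execute in the toy world
    return path

route = plan(3, 0)
```

In this toy run the hill decays with distance from the goal, so greedy ascent takes the short way around the ring; with wide, flat goal hills (as the discussion notes) the gradient would be harder to follow.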

Discussion

- AI context ... assumed links explained by learning

- reinforcement learning ... if no access to full state space

- noise ... wide “goal hills” will have flat slopes

- shortest path ... not taken; how to define?

- biological plausibility ... Sigma-Pi neurons; winner-take-all

- to do: embedding ... learn state space from sensor input

- to do: embedding ... let the goal be assigned naturally

- to do: embedding ... hand-designed planning phases

Acknowledgments

Collaborators:

Jochen Triesch, FIAS, J. W. Goethe University Frankfurt

Stefan Wermter, University of Sunderland

Mark Elshaw, University of Sheffield