From Motor Babbling to Planning
Cornelius Weber, Frankfurt Institute for Advanced Studies, Goethe University Frankfurt, Germany
ICN Young Investigators’ Colloquium, 26th June 2008, Frankfurt am Main
Reinforcement Learning
(Diagram: value units and actor units.)
fixed reactive system that always strives for the same goal
Trained Weights
reinforcement learning does not use the exploration phase to learn a general model of the environment that would allow the agent to plan a route to any goal
so let’s do this
Learning
(Diagram: actor in the state space.)
randomly move around the state space
learn world models:
● associative model
● inverse model
● forward model
variables:
► action $a$
► current state $s$
► next state $s'$
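To make the babbling phase concrete, here is a minimal sketch; the specifics are assumptions not in the slides (a 1-D grid world, one-hot state coding, two actions, and the names `one_hot`, `step`, `transitions`). The agent acts at random and records (action, state, next state) triples for the model-learning rules on the following slides.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 10        # size of the discrete state space
N_ACTIONS = 2        # 0 = left, 1 = right

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def step(state, action):
    # world dynamics: move left or right, clipped at the borders
    return min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)

# motor babbling: act randomly, record (action, state, next state)
transitions = []
state = int(rng.integers(N_STATES))
for _ in range(5000):
    action = int(rng.integers(N_ACTIONS))
    next_state = step(state, action)
    transitions.append((one_hot(action, N_ACTIONS),
                        one_hot(state, N_STATES),
                        one_hot(next_state, N_STATES)))
    state = next_state
```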
Learning: Associative Model
weights to associate neighbouring states
use these to find any possible routes between agent and goal
$\tilde{s}_i' = \sum_j w_{ij}^{s's} \, s_j$
$\Delta w_{ij}^{s's} = \varepsilon \, (s_i' - \tilde{s}_i') \, s_j$
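One possible implementation of this delta rule, continuing the babbling sketch above. The learning rate `epsilon`, the decay `gamma`, and the backward spreading-activation reading of “find any possible routes” are assumptions:

```python
import numpy as np

epsilon = 0.1                              # assumed learning rate
W_assoc = np.zeros((N_STATES, N_STATES))   # w^{s's}: next state x state

for a, s, s_next in transitions:
    s_pred = W_assoc @ s                   # prediction: sum_j w_ij s_j
    W_assoc += epsilon * np.outer(s_next - s_pred, s)

# route finding (one reading of the slide): spread activation backwards
# from the goal through the learned associations; every state reachable
# from the goal receives some activation
gamma = 0.9
activation = one_hot(N_STATES - 1, N_STATES)   # goal in the last state
for _ in range(N_STATES):
    activation = np.maximum(activation, gamma * (W_assoc.T @ activation))
```

The decaying activation forms a “goal hill” over the state space that the agent can climb; this is the quantity whose flat slopes are raised as a noise issue in the discussion.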
Learning: Inverse Model
weights to “postdict” action given state pair
use these to identify the action that leads to a desired state
$\tilde{a}_k = \sum_{ij} w_{kij}^{a\,s's} \, s_i' s_j$
$\Delta w_{kij}^{a\,s's} = \varepsilon \, (a_k - \tilde{a}_k) \, s_i' s_j$
Sigma-Pi neuron model: a ∑ (sum) over ∏ (product) terms
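A sketch of the inverse model under the same assumptions, reusing `transitions` and `epsilon` from the earlier code. Each Sigma-Pi weight multiplies one product pair $s_i' s_j$; summing over all pairs gives the predicted action components:

```python
import numpy as np

W_inv = np.zeros((N_ACTIONS, N_STATES, N_STATES))  # w_kij^{a s's}

for a, s, s_next in transitions:
    pair = np.outer(s_next, s)                  # product terms s'_i * s_j
    a_pred = np.tensordot(W_inv, pair, axes=2)  # sum over all (i, j) pairs
    W_inv += epsilon * np.multiply.outer(a - a_pred, pair)
```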
Learning: Forward Model
weights to predict state given state-action pair
use these to predict the next state given the chosen action
$\tilde{s}_i' = \sum_{kj} w_{ikj}^{s'as} \, a_k s_j$
$\Delta w_{ikj}^{s'as} = \varepsilon \, (s_i' - \tilde{s}_i') \, a_k s_j$
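A sketch of the forward model, again continuing the earlier code, followed by a small usage example in which the inverse model proposes an action towards a desired next state and the forward model predicts where that action leads (the state indices 3 and 4 are arbitrary):

```python
import numpy as np

W_fwd = np.zeros((N_STATES, N_ACTIONS, N_STATES))  # w_ikj^{s' as}

for a, s, s_next in transitions:
    pair = np.outer(a, s)                          # product terms a_k * s_j
    s_pred = np.tensordot(W_fwd, pair, axes=2)     # sum over all (k, j) pairs
    W_fwd += epsilon * np.multiply.outer(s_next - s_pred, pair)

# usage: inverse model proposes the action from state 3 towards state 4,
# forward model then predicts the outcome of that action
s, s_desired = one_hot(3, N_STATES), one_hot(4, N_STATES)
a = np.tensordot(W_inv, np.outer(s_desired, s), axes=2)
s_next = np.tensordot(W_fwd, np.outer(a, s), axes=2)
print("action:", a.argmax(), "predicted next state:", s_next.argmax())
```

Note that the proposed action vector is graded rather than one-hot; taking its argmax (or a winner-take-all step, as mentioned in the discussion) would select the single action to execute.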
Discussion
- AI context ... assumed links explained by learning
- reinforcement learning ... if no access to full state space
- noise ... wide “goal hills” will have flat slopes
- shortest path ... not taken; how to define?
- biological plausibility ... Sigma-Pi neurons; winner-take-all
- to do: embedding ... learn state space from sensor input
- to do: embedding ... let the goal be assigned naturally
- to do: embedding ... hand-designed planning phases