Robot Motor Skill Coordination with EM-based Reinforcement Learning
Italian Institute of Technology, Advanced Robotics Dept.
http://www.iit.it
Petar Kormushev, Sylvain Calinon, Darwin G. Caldwell
IROS 2010, October 20, 2010
Motivation
• How to learn complex motor skills which also require variable stiffness?
• How to demonstrate the required stiffness/compliance?
• How to teach highly-dynamic tasks?
Background
• Learning adaptive stiffness by extracting variability and correlation information from multiple demonstrations (Sylvain Calinon et al., IROS 2010)
Robot Motor Skill Learning
[Diagram: demonstration by human (motion capture or kinesthetic teaching) → encoding the skill (shared representation) → refining the skill → reproduction; imitation learning covers the demonstration and encoding stages, reinforcement learning the refinement]
Skill representation (encoding)
[Diagram: spectrum of skill encodings from time-dependent to time-independent — trajectory-based (via-points, DMP), GMM/GMR, DS-based]
Dynamic Movement Primitives
[Figure: a demonstrated trajectory encoded by a DMP as a sequence of attractors]

$$\hat{\ddot{x}} = \sum_{i=1}^{K} h_i(t)\left[\kappa^{P}\left(\mu_i^{X} - x\right) - \kappa^{V}\,\dot{x}\right]$$
Ijspeert, Nakanishi, Schaal, IROS 2001
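A minimal sketch of this mixture-of-attractors formulation (all names, and the Gaussian shape of the activations h_i(t), are illustrative assumptions):

```python
import numpy as np

def dmp_acceleration(x, xdot, t, mu, kP, kV, centers, width):
    """Desired acceleration from a mixture of K attractors.

    x, xdot : current position and velocity, shape (D,)
    mu      : attractor centers mu_i^X, shape (K, D)
    kP, kV  : scalar stiffness and damping gains
    centers : activation centers in time for h_i(t), shape (K,)
    width   : common width of the activation functions
    """
    # Gaussian activations h_i(t), normalized to sum to 1
    h = np.exp(-0.5 * ((t - centers) / width) ** 2)
    h /= h.sum()
    # Weighted sum of spring-damper terms toward each attractor
    return sum(h[i] * (kP * (mu[i] - x) - kV * xdot) for i in range(len(h)))
```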
Extended DMP to include coordination
$$\hat{\ddot{x}} = \sum_{i=1}^{K} h_i(t)\left[K_i^{P}\left(\mu_i^{X} - x\right) - \kappa^{V}\,\dot{x}\right]$$

Coordination matrix $K_i^P$ (full stiffness matrix), replacing the scalar stiffness gain $\kappa^P$ of the original DMP:

$$\hat{\ddot{x}} = \sum_{i=1}^{K} h_i(t)\left[\kappa^{P}\left(\mu_i^{X} - x\right) - \kappa^{V}\,\dot{x}\right]$$

Advantages:
– capture correlations between the different motion variables
– reduce the number of primitives
Proposal: use reinforcement learning to learn the coordination matrices
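A sketch of the extended formulation, where the scalar gain becomes a full matrix per primitive (same illustrative assumptions as above):

```python
import numpy as np

def coordinated_dmp_acceleration(x, xdot, t, mu, KP, kV, centers, width):
    """Extended DMP: the scalar gain kappa^P is replaced by a full
    coordination (stiffness) matrix K_i^P for each primitive.

    KP : full stiffness matrices, shape (K, D, D)
    """
    h = np.exp(-0.5 * ((t - centers) / width) ** 2)
    h /= h.sum()
    # Off-diagonal entries of KP[i] couple the motion variables, so one
    # primitive can encode correlated multi-dimensional motion
    return sum(h[i] * (KP[i] @ (mu[i] - x) - kV * xdot) for i in range(len(h)))
```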
Example: Reaching task with obstacle
[Figure: reproduced reaching trajectories using full coordination matrices and using diagonal matrices; expected returns 0.61 and 0.73]

Reward function, where $x_t^R$ is the robot position at time $t$, $x_t^D$ the demonstrated position, $x^G$ the goal, and $t_e$ the end time:

$$r(t) = \begin{cases} \dfrac{w_1}{T}\, e^{-\left\|x_t^{R} - x_t^{D}\right\|}, & t \neq t_e \\[6pt] w_2\, e^{-\left\|x_t^{R} - x^{G}\right\|}, & t = t_e \end{cases}$$
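A sketch of this piecewise reward (the division of w1 by the trajectory length T follows the formula above; names are assumptions):

```python
import numpy as np

def reaching_reward(x_robot, x_demo, x_goal, t, t_end, T, w1=0.5, w2=0.5):
    """Piecewise reward for the reaching task with obstacle."""
    if t != t_end:
        # Intermediate steps: stay close to the demonstrated path
        return (w1 / T) * np.exp(-np.linalg.norm(x_robot - x_demo))
    # Final step: land on the goal
    return w2 * np.exp(-np.linalg.norm(x_robot - x_goal))
```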
EM-based Reinforcement learning (RL)
• PoWER algorithm - Policy learning by Weighting Exploration with the Returns
• Advantages over policy-gradient-based RL:
– no need for a learning rate
– can use importance sampling
– a single rollout is enough to update the policy
Jens Kober and Jan Peters, NIPS 2009
RL implementation
• Policy parameters $\theta$:
– full coordination matrices $K_i^P$
– attractor vectors $\mu_i^X$
• Policy update rule:

$$\theta_{n+1} = \theta_n + \frac{\left\langle (\theta_k - \theta_n)\, R(\tau_k) \right\rangle_{w(\tau_k)}}{\left\langle R(\tau_k) \right\rangle_{w(\tau_k)}}$$

• Importance sampling: uses the best $\sigma$ rollouts so far

$$\left\langle f(\theta_k, \tau_k) \right\rangle_{w(\tau_k)} = \sum_{k=1}^{\sigma} f\!\left(\theta_{\mathrm{ind}(k)}, \tau_{\mathrm{ind}(k)}\right)$$
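A compact sketch of this update with the importance sampler (how rollouts are generated, e.g. by Gaussian perturbation of theta, is an assumption here; see Kober and Peters for the exact exploration strategy):

```python
import numpy as np

def power_update(theta_n, rollouts, sigma=5):
    """One PoWER-style policy update.

    theta_n  : current policy parameter vector (np.ndarray)
    rollouts : list of (theta_k, R_k) pairs gathered so far
    sigma    : number of best rollouts kept (importance sampling)
    """
    # Importance sampling: keep only the sigma best rollouts so far
    best = sorted(rollouts, key=lambda kr: kr[1], reverse=True)[:sigma]
    # Reward-weighted average of the parameter offsets
    num = sum(R_k * (theta_k - theta_n) for theta_k, R_k in best)
    den = sum(R_k for _, R_k in best)
    return theta_n + num / den
```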
Pancake flipping: Experimental setup
Frying pan mounted on the end-effector
Artificial pancake with 4 passive markers
(more robust to occlusions)
Barrett WAM 7-DOF robot
Evaluation: Tracking of the pancake
NaturalPoint OptiTrack motion capture system
12 cameras, 100 Hz camera frame rate, 40 Hz real-time capture
Reward function
• Reward function, with terms rewarding flip orientation, landing position, and height:

$$r(t_f) = w_1\left[\frac{\arccos\!\left(v_0 \cdot v_{t_f}\right)}{\pi}\right] + w_2\, e^{-\left\|x^{p} - x^{F}\right\|} + w_3\, x_3^{M}$$

• Cumulative return of a rollout:

$$R(\tau) = \sum_{t=1}^{T} r(t)$$
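A sketch of the terminal reward, assuming v_0 and v_{t_f} are unit normal vectors of the pancake before and after the flip, x^p the pancake position, x^F a target position in the pan, and x_3^M the maximum height reached:

```python
import numpy as np

def flip_reward(v0, v_tf, x_pancake, x_target, max_height,
                w1=1.0, w2=1.0, w3=1.0):
    """Terminal reward combining flip orientation, landing position,
    and maximum height reached (weights are placeholders)."""
    # Orientation: angle between pancake normals, normalized by pi
    orientation = np.arccos(np.clip(np.dot(v0, v_tf), -1.0, 1.0)) / np.pi
    # Position: exponential penalty on distance from the target point
    position = np.exp(-np.linalg.norm(x_pancake - x_target))
    return w1 * orientation + w2 * position + w3 * max_height

def rollout_return(rewards):
    """Cumulative return R(tau) = sum over t of r(t)."""
    return sum(rewards)
```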
Kinesthetic demonstration of the task
Learning by trial and error
Final learned skill
Motion capture to evaluate rollouts
Captured pancake trajectory
[Plots: captured pancake trajectories for a 90° flip and a 180° flip]
Performance
Reproduction control strategy

$$M(q)\,\ddot{q} + C(\dot{q}, q)\,\dot{q} + g(q) = \tau_G + \tau_T$$

Gravity compensation:

$$\tau_G = \sum_{i=1}^{L} J_{G,i}^{T}\, F_{G,i}$$

Task execution:

$$\tau_T = J_T^{T}\, F_T$$
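A sketch of this strategy on a torque-controlled arm such as the WAM (all variable names are assumptions):

```python
import numpy as np

def control_torques(J_task, F_task, link_jacobians, link_gravity_wrenches):
    """Commanded torques tau = tau_G + tau_T.

    J_task                : task Jacobian J_T
    F_task                : desired end-effector wrench F_T
    link_jacobians        : Jacobians J_{G,i} for the L link frames
    link_gravity_wrenches : gravity wrenches F_{G,i} of the L links
    """
    # Gravity compensation: sum of per-link contributions
    tau_G = sum(J.T @ F for J, F in zip(link_jacobians,
                                        link_gravity_wrenches))
    # Task execution: map the desired wrench into joint torques
    tau_T = J_task.T @ F_task
    return tau_G + tau_T
```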
Conclusion
• Combining imitation learning + RL to learn motor skills with variable stiffness
– imitation used to initialize the policy
– RL used to learn the coordination matrices
– variable stiffness learned during reproduction
• Future work
– other representations
– other RL algorithms
Thanks for your attention!