Robot Motor Skill Coordination with EM-based Reinforcement Learning
Petar Kormushev, Sylvain Calinon, Darwin G. Caldwell
Italian Institute of Technology, Advanced Robotics dept. (http://www.iit.it)
October 20, 2010, IROS 2010

Robot Motor Skill Coordination with EM-based Reinforcement Learning



A Barrett WAM robot learns to flip pancakes by reinforcement learning. The motion is encoded as a mixture of basis force fields through an extension of Dynamic Movement Primitives (DMP) that represents the synergies across the different variables through stiffness matrices. An inverse dynamics controller with variable stiffness is used for reproduction. The skill is first demonstrated via kinesthetic teaching and then refined by the Policy learning by Weighting Exploration with the Returns (PoWER) algorithm. After 50 trials, the robot learns that the first part of the task requires stiff behavior to throw the pancake in the air, while the second part requires the hand to be compliant in order to catch the pancake without having it bounce off the pan.


Page 1: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Robot Motor Skill Coordination with EM-based Reinforcement Learning

Italian Institute of Technology, Advanced Robotics dept.

http://www.iit.it

Petar Kormushev, Sylvain Calinon, Darwin G. Caldwell

October 20, 2010, IROS 2010

Page 2: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Petar Kormushev, Italian Institute of Technology

Motivation

• How to learn complex motor skills which also require variable stiffness?

• How to demonstrate the required stiffness/compliance?

• How to teach highly-dynamic tasks?


Page 3: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Background

• Learning adaptive stiffness by extracting variability and correlation information from multiple demonstrations (Sylvain Calinon et al., IROS 2010)

Page 4: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Robot Motor Skill Learning

Pipeline (diagram): demonstration by human (motion capture or kinesthetic teaching) → imitation learning → encoding the skill in a shared representation → refining the skill with reinforcement learning → reproduction

Page 5: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Skill representation (encoding)

• Time dependent: trajectory-based, via-points, DMP
• Time independent: GMM/GMR, DS-based

Page 6: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Dynamic Movement Primitives (DMP)

A demonstrated trajectory is encoded as a sequence of attractors:

\hat{\ddot{x}} = \sum_{i=1}^{K} h_i(t) \left[ \kappa^P (\mu_i^X - x) - \kappa^V \dot{x} \right]

Ijspeert, Nakanishi, Schaal, IROS 2001
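A minimal numerical sketch of this formulation: a weighted sum of spring-damper terms pulls the state toward a sequence of attractors. All gains, activation functions, and attractor values below are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def dmp_accel(x, xdot, t, attractors, centers, widths, kP=50.0, kV=10.0):
    # Normalized Gaussian activations h_i(t) (an assumption; the exact
    # form of h_i is not specified on the slide)
    h = np.exp(-0.5 * ((t - centers) / widths) ** 2)
    h /= h.sum()
    # Weighted sum of spring-damper terms toward each attractor mu_i^X
    return sum(hi * (kP * (mu_i - x) - kV * xdot)
               for hi, mu_i in zip(h, attractors))

# Euler-integrate the system through a sequence of two attractors in 2-D
attractors = np.array([[0.0, 0.5], [1.0, 1.0]])
centers, widths = np.array([0.0, 0.5]), np.array([0.2, 0.2])
x, xdot, dt = np.zeros(2), np.zeros(2), 0.01
for step in range(100):
    xddot = dmp_accel(x, xdot, step * dt, attractors, centers, widths)
    xdot = xdot + xddot * dt
    x = x + xdot * dt
# x ends near the last attractor [1.0, 1.0]
```

Because the activations are normalized, the mixture behaves like a single spring-damper toward a time-varying blended target, which is what makes the sequence of attractors reproduce a smooth trajectory.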


Page 7: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Extended DMP to include coordination

Original DMP with a scalar stiffness gain \kappa^P:

\hat{\ddot{x}} = \sum_{i=1}^{K} h_i(t) \left[ \kappa^P (\mu_i^X - x) - \kappa^V \dot{x} \right]

Extended DMP with a coordination matrix (full stiffness matrix) K_i^P per primitive:

\hat{\ddot{x}} = \sum_{i=1}^{K} h_i(t) \left[ K_i^P (\mu_i^X - x) - \kappa^V \dot{x} \right]

Advantages:
• capture correlations between the different motion variables
• reduce the number of primitives

Proposal: use reinforcement learning to learn the coordination matrices
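To illustrate what a full coordination (stiffness) matrix adds over a diagonal one, here is a hypothetical 2-D numeric example (values are made up for illustration): with a diagonal matrix, an error along one coordinate produces force only along that coordinate, while off-diagonal terms couple the coordinates.

```python
import numpy as np

mu_i = np.array([1.0, 0.0])          # attractor center (illustrative)
x = np.array([0.0, 0.0])             # current position
err = mu_i - x                       # error lies purely along the 1st axis

K_diag = np.diag([50.0, 50.0])       # diagonal: two independent springs
K_full = np.array([[50.0, 20.0],     # full coordination matrix: off-diagonal
                   [20.0, 50.0]])    # terms correlate the two coordinates

f_diag = K_diag @ err                # [50., 0.]  - no force along 2nd axis
f_full = K_full @ err                # [50., 20.] - coupled response
```

The coupled response is how a single primitive with a full matrix can encode a correlated motion (e.g., curving around an obstacle) that would otherwise require several diagonal-stiffness primitives.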


Page 8: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Example: Reaching task with obstacle

Reward function:

r(t) = \begin{cases} \frac{w_1}{T}\, e^{-\| x_t^R - x_t^D \|}, & t \neq t_e \\ w_2\, e^{-\| x_t^R - x^G \|}, & t = t_e \end{cases}

Using diagonal matrices: expected return 0.61. Using full coordination matrices: expected return 0.73.
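A direct transcription of this piecewise reward: track the demonstrated trajectory at intermediate steps, and reward proximity to the goal at the final step. The weight values below are placeholders, not the paper's.

```python
import numpy as np

def reaching_reward(x_robot, x_demo, x_goal, t, t_end, T, w1=0.5, w2=0.5):
    """Track the demonstration at intermediate steps; reward
    proximity to the goal at the final time step t_end."""
    if t != t_end:
        return (w1 / T) * np.exp(-np.linalg.norm(x_robot - x_demo))
    return w2 * np.exp(-np.linalg.norm(x_robot - x_goal))

# Perfect tracking at an intermediate step: (w1 / T) * e^0 = 0.05
r_mid = reaching_reward(np.ones(2), np.ones(2), np.zeros(2),
                        t=3, t_end=10, T=10)
# Exactly on the goal at the final step: w2 * e^0 = 0.5
r_end = reaching_reward(np.zeros(2), np.ones(2), np.zeros(2),
                        t=10, t_end=10, T=10)
```

Dividing the tracking term by T keeps the summed intermediate reward comparable to the single terminal goal reward.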


Page 9: Robot Motor Skill Coordination with EM-based Reinforcement Learning

EM-based Reinforcement Learning (RL)

• PoWER algorithm: Policy learning by Weighting Exploration with the Returns

• Advantages over policy-gradient-based RL:
  – no need for a learning rate
  – can use importance sampling
  – a single rollout is enough to update the policy

Jens Kober and Jan Peters, NIPS 2009


Page 10: Robot Motor Skill Coordination with EM-based Reinforcement Learning

RL implementation

• Policy parameters:
  – full coordination matrices K_i^P
  – attractor vectors \mu_i^X

• Policy update rule:

\theta_{n+1} = \theta_n + \frac{\left\langle (\theta_k - \theta_n)\, R(\tau_k) \right\rangle_{w(\tau_k)}}{\left\langle R(\tau_k) \right\rangle_{w(\tau_k)}}

• Importance sampling (uses the best \sigma rollouts so far):

\left\langle f(\theta_k, \tau_k) \right\rangle_{w(\tau_k)} = \sum_{k=1}^{\sigma} f(\theta_{ind(k)}, \tau_{ind(k)})
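A toy sketch of this update with a 1-D parameter and hand-picked rollouts (the rollout values and σ are illustrative): the new parameters move by the return-weighted average offset of the σ best rollouts.

```python
import numpy as np

def power_update(theta, rollouts, sigma):
    """One PoWER-style update: keep the sigma best rollouts (importance
    sampling), then move theta by the return-weighted mean offset.
    rollouts: list of (theta_k, R_k) pairs."""
    best = sorted(rollouts, key=lambda r: r[1], reverse=True)[:sigma]
    num = sum(R * (th - theta) for th, R in best)
    den = sum(R for _, R in best) + 1e-12   # guard against zero return
    return theta + num / den

theta = np.array([0.0])
rollouts = [(np.array([1.0]), 1.0),
            (np.array([2.0]), 4.0),
            (np.array([-1.0]), 0.1)]
theta_new = power_update(theta, rollouts, sigma=2)
# Weighted toward the best rollout: (4*2 + 1*1) / (4 + 1) = 1.8
```

Note there is no learning rate anywhere: the step size emerges from the ratio of return-weighted offsets to returns, which is the EM-derived property the slide highlights.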


Page 11: Robot Motor Skill Coordination with EM-based Reinforcement Learning


Pancake flipping: Experimental setup

Frying pan mounted on the end-effector

Artificial pancake with 4 passive markers (more robust to occlusions)

Barrett WAM 7-DOF robot


Page 12: Robot Motor Skill Coordination with EM-based Reinforcement Learning


Evaluation: Tracking of the pancake

NaturalPoint OptiTrack motion capture system: 12 cameras, 100 Hz camera frame rate, 40 Hz real-time capture


Page 13: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Reward function

• Reward at the final time step (orientation, position, and height terms):

r(t_f) = w_1 \left[ \frac{\arccos(v_0 \cdot v_{t_f})}{\pi} \right] + w_2\, e^{-\| x^p - x^F \|} + w_3\, x_3^M

• Cumulative return of a rollout:

R(\tau) = \sum_{t=1}^{T} r(t)
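A hedged sketch of this final-step reward. The interpretation of the symbols here is an assumption: v_0 and v_{t_f} as initial/final pancake orientation vectors, x^p and x^F as pancake and target positions, and x_3^M as the maximum toss height; the weights are placeholders.

```python
import numpy as np

def pancake_reward(v0, vtf, x_pancake, x_target, max_height,
                   w1=0.4, w2=0.4, w3=0.2):
    """Final-step reward: orientation term (flip angle between initial
    and final pancake normals), position term (landing near the target),
    height term (how high the pancake was tossed)."""
    cos_ang = np.clip(np.dot(v0, vtf), -1.0, 1.0)   # numerical safety
    return (w1 * np.arccos(cos_ang) / np.pi
            + w2 * np.exp(-np.linalg.norm(x_pancake - x_target))
            + w3 * max_height)

# A perfect 180-degree flip landing exactly on target, tossed to height 1.0
r = pancake_reward(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, -1.0]),
                   np.zeros(3), np.zeros(3), 1.0)
# arccos(-1)/pi = 1, exp(0) = 1, so r = 0.4 + 0.4 + 0.2 = 1.0
```

The arccos term is maximal when the final orientation is opposite the initial one, i.e., a full 180° flip, which is exactly the behavior the task rewards.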


Page 14: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Kinesthetic demonstration of the task

Page 15: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Learning by trial and error

Page 16: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Final learned skill

Page 17: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Motion capture to evaluate rollouts

Page 18: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Captured pancake trajectory

90° flip vs. 180° flip

Page 19: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Performance

Page 20: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Reproduction control strategy

M(q)\,\ddot{q} + C(\dot{q}, q)\,\dot{q} + g(q) = \tau_G + \tau_T

Gravity compensation: \tau_G = \sum_{i=1}^{L} J_{G,i}^T F_{G,i}

Task execution: \tau_T = J_T^T F_T
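The controller torque is the sum of the gravity-compensation and task terms, each mapped through a Jacobian transpose. A toy sketch with made-up 2-joint Jacobians and forces (illustrative only, not the robot's actual model):

```python
import numpy as np

def control_torques(J_task, F_task, grav_terms):
    """tau = tau_G + tau_T: gravity compensation sums J^T F over the
    links; the task wrench maps through the task Jacobian transpose.
    grav_terms: list of (J_{G,i}, F_{G,i}) pairs, one per link."""
    tau_G = sum(J.T @ F for J, F in grav_terms)
    tau_T = J_task.T @ F_task
    return tau_G + tau_T

# Toy 2-joint example with identity Jacobians
tau = control_torques(np.eye(2), np.array([1.0, 2.0]),
                      [(np.eye(2), np.array([0.0, 3.0]))])
# tau = [1., 5.]
```

Separating the two terms lets the task force F_T (and hence the variable stiffness) be commanded independently of the gravity model.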


Page 21: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Conclusion

• Combining imitation learning + RL to learn motor skills with variable stiffness
  – imitation used to initialize the policy
  – RL to learn the coordination matrices
  – variable stiffness learned for reproduction

• Future work
  – other representations
  – other RL algorithms

Page 22: Robot Motor Skill Coordination with EM-based Reinforcement Learning

Thanks for your attention!
