Giannone Nao Learning

7/30/2019 Giannone Nao Learning

1/20


2/20

Overview: environment

Robotic agent: NAO, a humanoid robot produced by Aldebaran

Application: robotic soccer

SDK

Simulator


3/20

Vision Module: process raw data from the environment

Modelling Module: elaborate raw data to obtain more reliable information

Behaviour Control Module: decide the best behaviour to accomplish the agent goal

Motion Control Module: actuate the robot motors accordingly

Environment

At First !!!

At First !!!

Overview: (sub)tasks


4/20

Make Nao walk: how?

Nao is equipped with a set of motion utilities, including a walk implementation that can be called through an interface (NaoQi Motion Proxy) and partially customized by tuning some parameters.

Main advantage: ready to use (to be tuned)

Drawback: based on an unknown walk model. No flexibility at all!!!

For these reasons we decided to develop our own walk model and to tune it using machine learning techniques.


5/20


6/20

A simple walking RAgent for Nao

Motion Control Module

NaoQi Adaptor

Simple Behaviour Module

Switches between two states: walk and stand

Smemy

SPQR Walking Library

NAO (NaoQi)

Webots Client

TCP channel

WEBOTS

uses
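The two-state behaviour on this slide can be sketched as a tiny state machine. This is only an illustration: the class name, the `walk_requested` flag and the default velocity values are assumptions, not part of the SPQR code.

```python
from enum import Enum


class State(Enum):
    STAND = "stand"
    WALK = "walk"


class SimpleBehaviour:
    """Hypothetical two-state behaviour: the agent either stands or walks."""

    def __init__(self):
        self.state = State.STAND

    def update(self, walk_requested):
        # The request flag directly selects the next state.
        self.state = State.WALK if walk_requested else State.STAND
        return self.state

    def velocity_command(self, v=0.05, omega=0.0):
        # Assumed output for the motion control module: (v, omega).
        if self.state is State.WALK:
            return (v, omega)
        return (0.0, 0.0)
```

The behaviour only decides what to ask for; turning the command into joint motion is left to the motion control module and the walking library.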


7/20

SPQR Walking Engine Model

NAO model characteristics: 21 degrees of freedom, no actuated trunk, no dynamic model available

Velocity commands (v, ω): v is the linear velocity, ω is the angular velocity

We follow the Static Walking Pattern: use an a-priori definition of the desired trajectories:

Choose a set of output variables: 3D coordinates of selected points of the robot

Choose and parametrize the desired trajectories for these variables at each phase of the gait
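As an illustration of what an a-priori, parametrized trajectory can look like, here is a minimal sketch of a swing-foot path in the xz plane. The half-sine shape and the names `step_length` and `step_height` are assumptions made for the example; the slides do not give the actual SPQR trajectory equations.

```python
import math


def swing_foot_xz(s, step_length, step_height):
    """Foot position in the xz plane at swing progress s in [0, 1].

    Assumed shape: x moves linearly forward, z follows a half sine.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be in [0, 1]")
    x = step_length * (s - 0.5)              # from -L/2 to +L/2
    z = step_height * math.sin(math.pi * s)  # zero at lift-off and touch-down
    return x, z
```

With this kind of model, tuning the walk means choosing numbers such as the step length and height, which is what makes it a natural target for parameter learning.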


8/20


9/20

SPQR walking subtasks and parameters

SPQR walk subtasks

Foot trajectories in the xz plane

Center of mass trajectory in the lateral direction

Hip yaw/pitch control (turn)

Arm control

Xtot, Xsw0, Xds

Zst, Zsw

Yft, Yss, Yds, Kr

Hyp, Ks

Biped walking

Double support phase

Swing phase

SS%
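The split of each step into a swing phase and a double support phase can be sketched as follows. The sketch assumes that SS% is the fraction of the step period spent in single support (swing); the slide does not define it, so treat `ss_fraction` and `step_period` as hypothetical names.

```python
def gait_phase(t, step_period, ss_fraction):
    """Return (phase name, progress in [0, 1)) at time t of a periodic step."""
    if step_period <= 0 or not 0.0 < ss_fraction < 1.0:
        raise ValueError("need step_period > 0 and 0 < ss_fraction < 1")
    u = (t % step_period) / step_period   # normalized time within the step
    if u < ss_fraction:
        return "swing", u / ss_fraction
    return "double_support", (u - ss_fraction) / (1.0 - ss_fraction)
```

The returned progress value can then drive the trajectory of whichever subtask is active in that phase.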


10/20

Walk tuning: main issues

Possible choices

By hand

By using machine learning techniques

Machine Learning seems the best solution

Less human interaction

Explores the search space in a more systematic way

but some aspects need care:

You need to define an effective fitness function

You need to choose the right algorithm to explore the parameter space

Only a limited number of experiments can be done on a real robot
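To make the fitness point concrete, here is one possible fitness function for a straight walk. It is a sketch under assumptions: the slides do not state the fitness actually used, and the inputs (start and end positions, trial duration, a fall flag) are hypothetical measurements.

```python
def walk_fitness(start_xy, end_xy, duration_s, fell=False):
    """Assumed fitness: forward speed in m/s, reduced by lateral drift.

    A trial in which the robot fell scores zero.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if fell:
        return 0.0
    forward = end_xy[0] - start_xy[0]
    lateral_drift = abs(end_xy[1] - start_xy[1])
    # Subtracting the drift makes straight walks score higher.
    return (forward - lateral_drift) / duration_s
```

Whatever form is chosen, the function has to be cheap to measure, because every candidate parameter set costs one walking experiment.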


11/20

SPQR Learning System Architecture

Learner

Learning library

RAgent

Walking library

uses

uses

Real Nao

Webots

Data to evaluate the fitness

Fitness

Iteration

experiments

(GPS)


12/20

SPQR Learner

Learner

First iteration?

Yes: return the initial iteration and iteration information

No: apply the chosen algorithm (strategy), then return the next iteration and iteration information

Available strategies:

Policy Gradient (e.g., PGPR)

Nelder-Mead Simplex Method

Genetic Algorithm
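The learner flow on this slide can be sketched as a class that returns the initial iteration on the first call and delegates to a pluggable strategy afterwards. The class layout and the placeholder strategy `keep_best_and_perturb` are assumptions for illustration; they stand in for Policy Gradient, Nelder-Mead or a genetic algorithm and are not the SPQR implementation.

```python
import random


class Learner:
    """Returns the initial iteration first, then applies the chosen strategy."""

    def __init__(self, initial_policies, strategy):
        self.initial_policies = initial_policies
        self.strategy = strategy
        self.iteration = 0

    def next_iteration(self, policies=None, fitnesses=None):
        if self.iteration == 0:
            result = self.initial_policies   # nothing to learn from yet
        else:
            result = self.strategy(policies, fitnesses)
        self.iteration += 1
        return self.iteration, result


def keep_best_and_perturb(policies, fitnesses, sigma=0.01, rng=None):
    """Placeholder strategy: keep the fittest policy, add noisy copies of it."""
    rng = rng if rng is not None else random.Random()
    best_index = max(range(len(policies)), key=fitnesses.__getitem__)
    best = policies[best_index]
    noisy = [[p + rng.gauss(0.0, sigma) for p in best]
             for _ in range(len(policies) - 1)]
    return [list(best)] + noisy
```

Keeping the strategy as a plain callable means the agent and the walking library do not need to change when a different algorithm is tried.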


13/20


14/20

Enhancing PG: PGPR

At each iteration i, the gradient estimate at that iteration can be used to obtain a metric for measuring the relevance of the parameters.

Given the relevance and a threshold T, PGPR prunes the less relevant parameters in the next iterations.

Forgetting factor
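The pruning idea can be sketched as follows. The exact relevance metric of PGPR is not given on this slide, so the form below (normalized gradient magnitude, blended with the previous relevance through a forgetting factor) is an assumption, as are the function names.

```python
def update_relevance(relevance, gradient, forgetting=0.5):
    """Blend the previous relevance with the normalized gradient magnitude.

    Assumed update: r_k <- forgetting * r_k + (1 - forgetting) * |g_k| / sum|g|.
    """
    total = sum(abs(g) for g in gradient)
    if total == 0.0:
        current = [0.0] * len(gradient)
    else:
        current = [abs(g) / total for g in gradient]
    return [forgetting * r + (1.0 - forgetting) * c
            for r, c in zip(relevance, current)]


def active_parameters(relevance, threshold):
    """Indices of the parameters whose relevance is at least the threshold T."""
    return [k for k, r in enumerate(relevance) if r >= threshold]
```

Parameters that drop out of the active set would simply be held fixed in later iterations, which shrinks the space the policy gradient has to explore.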


15/20


16/20

Simulators in learning tasks

Advantages

You can test the gait model and the learning algorithm without being biased by noise

Limits

The results of the experiments on the simulator can be ported to the real robot, but solutions specialized for the simulated model may not be as effective on the real robot (e.g., the simulator does not take asymmetries into account, and its models are not very accurate)


17/20

Results (1)

Five sessions of PG, 20 iterations each, all starting from

the same initial configuration

SS%, Ks, Yft have been set to hand-tuned values

16 policies for each iteration

Fitness increases

in a regular way

Low variance

among the five

simulations
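One iteration with 16 test policies could look like the sketch below. It uses a finite-difference policy-gradient estimate with random -eps / 0 / +eps perturbations; the slides do not say which PG variant was used, so this scheme, the step size and all names are assumptions, and the toy fitness in the usage comment is not the walking fitness.

```python
import random


def pg_iteration(theta, fitness, epsilons, n_policies=16, step=0.01, rng=None):
    """One finite-difference policy-gradient step around the policy theta."""
    rng = rng if rng is not None else random.Random()
    # Each test policy perturbs every parameter by -eps, 0 or +eps.
    signs = [[rng.choice((-1, 0, 1)) for _ in theta] for _ in range(n_policies)]
    scores = [fitness([t + s * e for t, s, e in zip(theta, sg, epsilons)])
              for sg in signs]
    gradient = []
    for k in range(len(theta)):
        plus = [f for sg, f in zip(signs, scores) if sg[k] == 1]
        minus = [f for sg, f in zip(signs, scores) if sg[k] == -1]
        if plus and minus:
            gradient.append(sum(plus) / len(plus) - sum(minus) / len(minus))
        else:
            gradient.append(0.0)  # not enough samples to compare
    norm = sum(g * g for g in gradient) ** 0.5
    if norm == 0.0:
        return list(theta), gradient
    # Move a fixed distance along the normalized gradient estimate.
    return [t + step * g / norm for t, g in zip(theta, gradient)], gradient


# Usage sketch with a toy fitness that peaks at (1, 1):
# theta = [0.0, 0.0]
# for _ in range(20):
#     theta, _ = pg_iteration(theta, lambda p: -sum((x - 1) ** 2 for x in p),
#                             [0.1, 0.1])
```

A session in the sense of this slide would repeat such a step 20 times from the same initial configuration, with each fitness evaluation being one walking experiment.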


18/20

Results (2)

Zsw, Xs, Kr, Xsw0

Five runs of PGPR

Final parameter sets for the five PG runs


19/20

Bibliography

A. Cherubini, F. Giannone, L. Iocchi, M. Lombardo, G. Oriolo. Policy Gradient Learning for a Humanoid Soccer Robot. Accepted for Journal of Robotics and Autonomous Systems.

A. Cherubini, F. Giannone, L. Iocchi, and P. F. Palamara. An extended policy gradient algorithm for robot task learning. Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007.

A. Cherubini, F. Giannone, and L. Iocchi. Layered learning for a soccer legged robot helped with a 3D simulator. Proc. of 11th International RoboCup Symposium, 2007.

http://openrdk.sourceforge.net

http://www.aldebaran-robotics.com/

http://spqr.dis.uniroma1.it


20/20

Any questions?
