
Incremental Motion Learning with Locally Modulated Dynamical Systems

K. Kronander (a,*), M. Khansari (b), A. Billard (a)

(a) Learning Algorithms and Systems Laboratory, EPFL-STI-I2S-LASA, Station 9, CH 1015, Lausanne, Switzerland
(b) Artificial Intelligence Laboratory, Stanford University, Stanford, CA 94305-9010, USA

(*) Corresponding author. Email addresses: [email protected] (K. Kronander), [email protected] (M. Khansari), [email protected] (A. Billard)

Preprint submitted to Elsevier, March 18, 2015


Abstract

Dynamical Systems (DS) for robot motion modeling are a promising approach for efficient robot learning and control. Our focus in this paper is on autonomous dynamical systems, which represent a motion plan without dependency on time. We develop a method that locally reshapes an existing, stable nonlinear autonomous DS while preserving important stability properties of the original system. Our method is based on local transformations of the dynamics. We propose an incremental learning algorithm based on Gaussian Processes for learning to reshape dynamical systems using this representation. The approach is validated in a 2d task of learning handwriting motions and in a manipulation task with a 7-degrees-of-freedom Barrett WAM manipulator.

Keywords: Dynamical Systems, Motion Modeling, Robotics

1. Introduction

A set of preprogrammed behaviors is insufficient for a truly versatile robot. Alternative solutions should hence be sought to provide the user with an intuitive interface that can be used to quickly and efficiently teach the robot new tasks. Robot Learning from Demonstration (RLfD) addresses this issue by endowing the robot with the capability to learn tasks from demonstrations [1][2]. The goal is that the user should be able to show the robot how to perform a task rather than programming it explicitly. A task can be demonstrated in various ways, e.g. through motion capture or kinesthetic teaching. After a set of demonstrations has been collected, they are typically used for optimizing a parametrized representation of robot motions. While these representations can take many forms, in this paper we are particularly interested in approaches that represent motions using Dynamical Systems (DS).

In order to successfully model the robot motion, demonstrations should be provided such that they include the generic characteristics of the motion. This is, however, very difficult when dealing with complex and/or high-dimensional motions. Incremental learning, whereby the robot successively learns the task through several demonstrations, can alleviate this difficulty. Furthermore, incremental learning can allow task refinement (incremental adaptation of the task model to improve task performance) and reuse (use of an existing task model for a completely different task) [3]. A general workflow of an incremental learning setting is described in Fig. 1. While numerous advances have been made in incremental motion learning for time-indexed trajectories, incremental learning in DS representations is still a largely unexplored area of research.


(Figure 1 diagram: Original DS model → task execution → teacher observes performance → corrective demonstrations → incremental learning → reshaped DS model → autonomous skill execution)

Figure 1: The figure illustrates the incremental process of acquiring a new skill by reshaping dynamics using the proposed framework. The system can be reshaped repeatedly until satisfactory task performance has been achieved.


In this work, we address this by proposing a novel DS representation, called Locally Modulated Dynamical Systems (LMDS), that allows reshaping a DS while preserving stability properties of the original system. As hinted by the name, this is done by locally applying transformations (e.g. rotations) to the original dynamics. It is shown that this way of reshaping dynamics is suitable for robot motion modeling, since complex motions can be modeled without risking the introduction of spurious attractor points or unstable behavior. The LMDS representation is not constrained to a particular form of original dynamics or learning method (any local regression method can in principle be used, together with any representation of a first-order autonomous dynamical system). We further propose the Gaussian Process Modulated Dynamical Systems (GP-MDS) algorithm, which uses Gaussian Process Regression (GPR) to learn reshaped dynamics. In summary, the main contributions of this paper are:

• The Locally Modulated Dynamical Systems (LMDS) formulation.

• Stability analysis of the general LMDS formulation.

• An incremental learning algorithm for LMDS based on Gaussian Processes, GP-MDS.

The remainder of this paper is organized as follows. In Section 2, we provide a literature review. In Section 3, we detail the LMDS formalism and propose a particular parameterized form of the modulation function, which is used in this paper. In Section 4, we then address the problem of how to learn LMDS, introducing the GP-MDS algorithm. Experimental validation is presented in Section 5, with a 2d example of warping dynamics for handwriting letters, and one periodic as well as one discrete manipulation task on the KUKA LWR and Barrett WAM arms. The paper is concluded with a discussion and an outlook on future directions of research in Section 6.

2. Related Work

Dynamical Systems have emerged as one of the most general and flexible ways of representing motion plans for robotic applications. In contrast to classical architectures, where a robot is usually programmed to track a given reference position trajectory as accurately as possible, in DS representations the static reference trajectory is replaced by one which unfolds as the task progresses, making adaptation to unforeseen events possible. Motion generation with dynamical systems is a long-standing research topic, with important early approaches such as the VITE model [4] suggested to simulate arm reaching motions. Recurrent Neural Networks (RNN) have been successfully used for modeling dynamics in various applications [5, 6, 7]. However, neural network approaches typically suffer from long training times and difficulty in ensuring stability.

More recently, the Dynamic Movement Primitives (DMP) framework [8, 9] and variants [10] have gained popularity both in imitation learning [11] and reinforcement learning [12, 13]. This class of DS has been shown to be very useful and flexible for a large variety of robotics tasks, both discrete and rhythmic. Coupling between several DS is achieved via a shared phase variable that acts as a clock, which also forces potentially unstable non-linear dynamics to decay and eventually be replaced by a linear system with known stability properties. This mechanism makes it easy to incorporate exploration and learning without risking unstable behavior, but it also means that the system is time-dependent, which may or may not be desirable depending on the task.

In contrast, autonomous DS formulations [14] can encode motions in a completely time-independent manner. By scaling the speed of motion, time-invariant models can be transformed into time-dependent models and cope with timing constraints [15, 16]. Stability is arguably one of the most fundamental properties that should be ensured when using DS for modeling robot motions, both from a task performance and a safety perspective. In our previous work, this was addressed in [17] by deriving stability constraints for a particular parametric form of DS, Gaussian Mixture Regression (GMR). A similar analysis with resulting constraints has also been performed for DS learned by Extreme Learning Machines (ELM) in [18]. A more generally applicable method was proposed in [19], which presents an approach that can stabilize any DS by online generation of an auxiliary command that ensures monotonic decay of a task-based energy function learned from demonstrations. This method allows more complex motions than stability constraints based on a quadratic energy function, as used e.g. in [17], but is still limited by the energy function that serves as the basis for the stabilization mechanism. Task-based Lyapunov functions have also been explored in the ELM framework in [20]. All of these methods rely on a parameterized Lyapunov function for ensuring asymptotic stability of the dynamics. In each case, this has consequences for the accuracy with which trajectories can be represented. In this work, we do not base the stability analysis on a known Lyapunov function, and instead construct a DS which is 1) inherently incapable of introducing spurious attractors and 2) guaranteed to generate bounded trajectories. These are weaker properties than asymptotic stability, with the consequence that our dynamics can converge to limit cycles or orbits (but not spurious attractors). In exchange, we can directly incorporate incremental demonstrations, which need not comply with an energy function. As will be shown later, asymptotic stability is for all practical purposes an unnecessary restriction in our framework, since it is not violated unless the demonstrations explicitly indicate orbital behavior.

Our model is based on modulating an available autonomous (time-invariant) DS with a state-dependent full-rank matrix, and is strongly related to our previous work where state-dependent modulation was used to steer trajectories away from obstacles [21]. While similar in the formulation of the dynamical system, here we assume a non-parametric form of the modulation and learn it from examples.

Incremental learning from demonstration can alleviate the difficulty of simultaneously demonstrating desired behavior in multiple degrees of freedom. Furthermore, it can allow refinement and reuse of a learned model for a different task. Various methodologies have been used. In [22], a neural network based approach inspired by how humans consolidate existing knowledge is presented. Gaussian Mixture Modeling (GMM), usually in combination with Gaussian Mixture Regression (GMR), is a well-established framework in the RLfD community [23]. GMM are usually trained offline with the EM algorithm [24], but incremental variants exist [25, 26], allowing incremental RLfD based on GMM [27, 28]. To deal with synchronization and clustering of motion data, Hidden Markov Models (HMM) have been extensively used. In [29], a system based on HMM is presented that incrementally learns whole-body motion primitives from motion capture data. In [30], a specialized impedance control law designed to facilitate incremental kinesthetic teaching was used together with an HMM for representing the incoming data, elegantly ridding the system of synchronization problems between demonstrations. Similarly to HMM, an autonomous DS representation does not have issues with synchronization, in this case because no temporal information is encoded in the model. In general, autonomous DS models seem particularly well-suited for incremental learning settings, but so far there has been little research in this direction.

Our learning algorithm uses Gaussian Process Regression to encode variations of a parameterization of the modulation function across the state-space. Since GPR suffers computationally from an expanding data set, it is important to sparsely select the training points that are actually used for regression. This is often referred to as selecting an active set of training points. The GP literature is rich in sparse approximations. A complete coverage of all proposed sparse GP methods is outside the scope of this work. We refer the reader to [31], which provides an excellent review and unifying view of most sparse approximations. What most previous methods have in common is that they define sparsity criteria based on the input patterns of the data. This is natural in the GP framework, since the inputs implicitly define the covariance of the outputs and hence allow the use of information-theoretic principles for selecting the data to use for regression. In contrast, we define a sparsity criterion based on the outputs, similarly to e.g. the LWPR algorithm [32]. This criterion is designed to select training points not to maximize information gain, but to maximize performance metrics that are important for the specific application of trajectory encoding with DS.

3. Locally Modulated Dynamical Systems

In this work, we assume the availability of an autonomous DS which serves as a coarse motion representation of a robotic task. We will refer to this DS as the original dynamics. Throughout, we will exclusively use original dynamics that are asymptotically stable at a single attractor point.

3.1. Formulation and properties

Let x ∈ R^N represent an N-dimensional kinematic variable, e.g. a Cartesian position vector. Let a continuous function f : R^N → R^N represent the original dynamics:

ẋ = f(x)    (1)

Table 1: Definitions

Definition 1 (Locally active). A matrix-valued function M(x) ∈ R^{N×N} is said to be acting locally, or to be locally active, if there exists a compact subset χ ⊂ R^N such that M(x) = I_{N×N} for all x ∈ R^N \ χ.

Let an autonomous DS be defined by ẋ = f(x), where f : R^N → R^N is a continuous real-valued function. The following are standard definitions related to the properties of this DS.

Definition 2 (Boundedness). A DS is bounded if for each δ > 0 there exists ε > 0 such that:

‖x(t_0)‖ < δ ⇒ ‖x(t)‖ < ε, ∀t > t_0

Definition 3 (Equilibrium point). An equilibrium point of a DS is a point x ∈ R^N such that f(x) = 0.

Definition 4 (Lyapunov Stability). An equilibrium point x∗ is said to be stable in the sense of Lyapunov, or simply stable, if for each ε > 0 there exists δ(ε) > 0 such that:

‖x(t_0) − x∗‖ < δ ⇒ ‖x(t) − x∗‖ < ε, ∀t > t_0

Definition 5 (Asymptotic Stability). An equilibrium point x∗ is called asymptotically stable if it is stable and if, in addition, there exists R > 0 such that:

‖x(t_0) − x∗‖ < R ⇒ ‖x(t) − x∗‖ → 0 as t → ∞

If R can be chosen arbitrarily large, the equilibrium point is globally asymptotically stable.

These dynamics are reshaped by a modulation field M(x). The form of the dynamics in LMDS is hence:

ẋ = g(x) = M(x) f(x)    (2)

where M(x) ∈ R^{N×N} is a continuous matrix-valued function that modulates the original dynamics f(x). As will be shown later, this seemingly simple representation is highly flexible and can model very complex motions. If the modulation is local and full rank, several important properties are inherited from the original dynamical system by the reshaped one.

Proposition 1 (Equilibrium points). If M(x) has full rank for all x, the reshaped dynamics has the same equilibrium point(s) as the original dynamics.

If M(x) has full rank, it has an empty null-space, and hence Eq. (2) is zero iff f(x) = 0. This simple result is of tremendous importance for using DS for motion representation. The introduction of spurious attractors is one of the main problems in using regression methods to learn dynamics [17]. Here, we make such spurious attractors impossible by construction.

Proposition 2 (Boundedness). Assume that the original dynamics is bounded (see Def. 2). Assume further that M(x) is locally active in a compact subset χ ⊂ R^N (see Def. 1). Then, the reshaped dynamics is bounded.

Proof. Let B_R be a ball centered at the origin with radius R in R^N. Let R be chosen such that χ lies entirely in B_R. Since χ is a compact set in R^N, it is always possible to find such an R. For each δ > 0, let ε(δ) > 0 be an associated boundary for the original dynamics (refer to Def. 2). Define ε′(δ) as a boundary for the reshaped dynamics as follows: ε′ = ε(R) for δ < R and ε′ = ε(δ) for δ ≥ R. Boundedness follows from Def. 2.

DS that have a single attractor are useful for representing generic motions to a fixed point, e.g. reach-and-grasp type motions. For such systems, in addition to equilibrium points and boundedness, the stronger stability property is also inherited, and locally the stability is asymptotic.


Figure 2: Left: Illustration of the introduction of the balls B_r and B_R. B_r is the ball centered at the origin with the largest possible r such that there are no points of χ in B_r. B_R is a ball with radius R, chosen as the smallest possible number such that B_R fully contains χ. Right: An example of a 2d reshaped system where the original asymptotically stable (linear) dynamics are reshaped into a system that converges to either a limit cycle or the origin, depending on the starting position. The reshaped system is globally stable but not asymptotically stable.

Proposition 3 (Lyapunov stability). Consider a system ẋ = f(x) that has a single equilibrium point. Without loss of generality, let this equilibrium point be placed at the origin. Assume further that the equilibrium point is stable. Assume that the criteria for Propositions 1 and 2 are satisfied. If, in addition, χ does not include the origin, the reshaped system is stable at the origin.

Proof. According to Proposition 1, the reshaped dynamics has a single equilibrium point at the origin. Let B_r be a ball centered at the origin with radius r small enough that B_r does not include any point in χ. Hence, inside B_r, we have g(x) = f(x). By the stability of f, there exists for all 0 < ε < r a δ(ε) such that ‖x(0)‖ < δ(ε) ⇒ ‖x(t)‖ < ε, ∀t > 0. For any ε > r, let δ(ε) = δ(r). Then, by the stability of f, ‖x(0)‖ < δ(ε) = δ(r) ⇒ ‖x(t)‖ < r < ε.

Note that the above proposition does not ensure asymptotic stability, which would mean that all trajectories are guaranteed to converge to the origin. Instead, Prop. 3 says that trajectories can be kept arbitrarily close to the origin if they start close enough. Since stability is necessary for asymptotic stability, the result is important because it represents a minimum requirement, but not a guarantee, for trajectories to converge to the attractor of the original system. Unsurprisingly, the precondition that there is a region around the origin which is not reshaped also implies local asymptotic stability.

Proposition 4 (Local asymptotic stability). Consider a system ẋ = f(x) that has a single equilibrium point. Assume that the conditions of Propositions 1, 2 and 3 are satisfied. Then, the reshaped system is locally asymptotically stable at the origin.

Proof. The original dynamics are globally asymptotically stable, which implies the existence of a Lyapunov function V : R^N → R_+ such that:

V(x) > 0, ∀x ≠ 0 and V(0) = 0    (3)

V̇ = (∂V/∂x) f(x) < 0, ∀x ≠ 0 and V̇(0) = 0    (4)

Let B_r be defined as in the proof of Proposition 3. Let M ⊂ B_r denote the largest level set of V that lies entirely inside B_r. For any x_0 ∈ M, the reshaped dynamics is exactly equal to the original dynamics ẋ = f(x). Hence, V(x) > 0 and V̇(x) < 0 hold for all x ∈ M, which proves that the system is locally asymptotically stable at the origin with region of attraction given by M.

If demonstrations are given that clearly contradict the asymptotic stability property of the original dynamics, it will not be retained. A simple example is given in Fig. 2, which illustrates a reshaped asymptotically stable linear system. As can be seen, the resulting topology after reshaping is a half-stable¹ limit cycle. The non-inheritance of global asymptotic stability is both an advantage and a disadvantage. It is an advantage because it allows representing repetitive motions such as shaking or polishing, as will be shown in Section 5.3. It is a disadvantage because, for discrete motions, it would be preferable to rigorously ensure that the target point will be reached under all circumstances. However, we conjecture that limit cycles are not introduced unless they are explicitly demonstrated, as will be exemplified in Section 5.2.

3.2. Illustrative Examples

Here, we give a set of illustrative 2d examples of some of the types of transformations that can be achieved in the LMDS formulation. Consider the following linear original dynamics:

ẋ = − [ 10   0
         0  10 ] x    (5)

The following function will be used to control the influence of the modulations, i.e. in what region of the state-space they influence the dynamics:

h(x) = exp(−50 ‖x − c‖²)    (6)

Let the center point of the influence function be placed at c = [50, 50]^T. This function is used to impose the locally active property², see the definition in Table 1.

¹The term half-stable refers to the property that trajectories may converge to the limit cycle or to an attractor point depending on the starting location.

²Strictly speaking, for the modulation to be locally active, Eq. (6) should be truncated to zero below a small value, as described in Section 4.3. For illustrative purposes, however, this is not necessary here.



Figure 3: Top: Examples of local modulation of linear dynamics with three different random matrices. Note that in the second example (top middle), a spurious attractor point has been introduced due to rank-deficiency of the modulating function. Bottom: Examples of local rotation of linear dynamics with three different rotation angles.

3.2.1. Local Modulation by a Random Matrix

For illustrative purposes, we construct here an LMDS which locally modulates the original dynamics with a random matrix A ∈ R^{2×2}. We define the modulation as follows:

M_a(x) = (1 − h(x)) I_2 + h(x) A    (7)

This modulation is locally active³, but does not have full rank everywhere. Consequently, the modulation can introduce spurious attractors, as illustrated in Figures 3a-3c.
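To make the rank-deficiency concrete, the following minimal sketch (the random seed and helper names are ours, not the paper's) scans the blend of Eq. (7) between I_2 and a random matrix A. A zero crossing of the determinant means the modulation loses rank somewhere in the influence region, which is precisely how the spurious attractor in Fig. 3 (top middle) can arise.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))   # random modulation target of Eq. (7)

def M_a(hx):
    """Eq. (7) evaluated for a given influence value hx = h(x) in [0, 1]."""
    return (1.0 - hx) * np.eye(2) + hx * A

# If det(M_a) crosses zero for some hx, the modulation is rank-deficient
# there and the reshaped system can acquire a spurious attractor.
for hx in np.linspace(0.0, 1.0, 11):
    print(f"h = {hx:.1f}  det(M_a) = {np.linalg.det(M_a(hx)):+.3f}")
```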

3.2.2. Locally Rotating Dynamics

One particularly interesting class of local modulations are rotations of the original dynamics. Let φ(x) = h(x) φ_c denote a state-dependent rotation angle. This results in a smoothly decaying rotation which fully rotates the dynamics by the angle φ_c only at x = c. The modulation function is then defined as the associated rotation matrix:

M_r(x) = [ cos(φ(x))  −sin(φ(x))
           sin(φ(x))   cos(φ(x)) ]    (8)

In this case, M_r(x) is guaranteed to have full rank for all x. Furthermore, the modulation is locally active⁴. This means that local rotations can be applied without introducing spurious attractors, regardless of the form of the original dynamics. This very useful property will be exploited to apply nonparametric learning without constraints, as detailed in Section 4. Examples of locally rotating the linear dynamics in Eq. (5) with a few different values of φ_c are given in Figures 3d-3f.
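As a concrete illustration, the sketch below (function names and integration settings are ours) implements Eqs. (5), (6) and (8) and integrates the reshaped system ẋ = M_r(x) f(x) with forward Euler. Since the influence function of Eq. (6) is very narrow, the rollout is started close to c so that the local rotation is actually felt; elsewhere M_r is essentially the identity.

```python
import numpy as np

def f(x):
    """Original dynamics, Eq. (5): a stable linear system."""
    return -10.0 * x

def h(x, c=np.array([50.0, 50.0])):
    """Influence function, Eq. (6): Gaussian bump centered at c."""
    return np.exp(-50.0 * np.sum((x - c) ** 2))

def M_r(x, phi_c=np.pi / 2):
    """Rotation modulation, Eq. (8), with angle phi(x) = h(x) * phi_c."""
    phi = h(x) * phi_c
    cp, sp = np.cos(phi), np.sin(phi)
    return np.array([[cp, -sp], [sp, cp]])

def g(x):
    """Reshaped dynamics, Eq. (2): x_dot = M(x) f(x)."""
    return M_r(x) @ f(x)

# Forward-Euler rollout, started inside the influence region around c
# so that the local rotation is visible; far from c, M_r(x) ~ I.
x = np.array([50.1, 50.1])
dt = 1e-4
for _ in range(10000):
    x = x + dt * g(x)
```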

³Strictly speaking, for the modulation to be locally active, Eq. (6) should be truncated to zero below a small value, as described in Section 4.3. For illustrative purposes, however, this is not necessary here.

⁴See previous footnote.

3.3. Modulation by Rotation and Norm-scaling

In this section, we describe a particular choice of modulation function which is used in the remainder of this paper. As seen in Section 3.2.2, rotations (or any other orthogonal transformations) always have full rank. It is possible to define and parameterize rotations in any dimension, but we will focus mainly on 2d and 3d systems in the remainder of this work.

For increased flexibility, a scaling of the speed in the DS can be achieved by multiplying the rotation matrix by a scalar. Let R(x) denote a state-dependent rotation matrix, and let κ(x) denote a state-dependent scalar function strictly larger than −1. We then construct a modulation function that can locally rotate and speed up or slow down dynamics as follows:

M(x) = (1 + κ(x))R(x) (9)

Both κ and R should vary in a continuous manner across the state-space. In a continuous system, the inclusion of a speed-scaling does not influence the stability properties, although it may do so in discrete implementations, so care should be taken not to allow κ(x) to take large values. Also, note that κ has been given an offset of 1 so that with κ(x) = 0 the original speed is retained. This is useful when modeling κ with local regression techniques such as GPR, as will be done in Section 4.2.

Rotations in arbitrary dimension can be defined by means of a two-dimensional rotation plane and a rotation angle φ. In 2d, the fact that the rotation plane is the entire R² means that a rotation is fully defined by the rotation angle only. Hence, the parameterization in that case is simply θ_2d = [φ, κ]. In 3d, the rotation plane can be compactly parameterized by its normal vector. Hence, the parameterization in 3d is θ_3d = [φ μ_R, κ], where μ_R is the rotation vector (the normal of the rotation plane). Parameterizations in higher dimensions are possible, but require additional parameters for describing the rotation plane.
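As a hedged sketch (function names are ours), the modulation of Eq. (9) can be assembled from these parameterizations as follows; the 3d case recovers the rotation matrix from the axis-angle vector φμ_R via the Rodrigues formula.

```python
import numpy as np

def modulation_2d(theta):
    """M = (1 + kappa) * R(phi), Eq. (9), with theta_2d = [phi, kappa]."""
    phi, kappa = theta
    c, s = np.cos(phi), np.sin(phi)
    return (1.0 + kappa) * np.array([[c, -s], [s, c]])

def modulation_3d(theta):
    """theta_3d = [phi*mu_R (3 entries), kappa]: rotation by angle phi
    about the unit axis mu_R, scaled by (1 + kappa)."""
    rotvec, kappa = np.asarray(theta[:3], dtype=float), theta[3]
    phi = np.linalg.norm(rotvec)
    if phi < 1e-12:                      # theta -> 0: identity modulation
        return (1.0 + kappa) * np.eye(3)
    mu = rotvec / phi                    # unit normal of the rotation plane
    K = np.array([[0.0, -mu[2], mu[1]],
                  [mu[2], 0.0, -mu[0]],
                  [-mu[1], mu[0], 0.0]])  # cross-product (skew) matrix
    R = np.eye(3) + np.sin(phi) * K + (1.0 - np.cos(phi)) * (K @ K)  # Rodrigues
    return (1.0 + kappa) * R
```

With θ = 0 both functions return the identity, so the reshaped dynamics reduce to the original ones; this is the property exploited in Section 4.3 to enforce locality.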

4. Learning Locally Modulated Dynamical Systems

In the previous section, we described how the dynamics can be reshaped in the LMDS framework. We now turn to the problem of how to learn from data in LMDS. The procedure for generating training data for LMDS from trajectory data is described in Section 4.1. After this step, one can in principle use any local regression technique to learn an LMDS system.

4.1. Training Data

Assume that a training set of M observations of x and ẋ is available: {x_m, ẋ_m}_{m=1}^{M}. To exploit this data for learning, it is first converted to a data set consisting of input locations and corresponding modulation vectors: {x_m, θ_m}_{m=1}^{M}. To compute the modulation data, the first step is to compute the original velocities, denoted by ẋ_m^o, m = 1 ... M. These are computed by evaluating the original dynamics function at all x_m, m = 1 ... M in the trajectory data set. Each pair {ẋ_m^o, ẋ_m} then corresponds to a modulation parameter vector θ_m. How this parameter vector is computed depends on the structure and parameterization chosen for the modulation function. The procedure for computing the modulation parameters for the particular choice of modulation function used in this work (rotation and norm scaling) is described in Table 2.


Table 2: Procedure for converting 2d or 3d trajectory data to modulation data.

Require: Trajectory data {x_m, ẋ_m}_{m=1}^{M}
1: for m = 1 to M do
2:   Compute original velocity: ẋ_m^o = f(x_m)
3:   Compute rotation vector (3d only): μ_m = (ẋ_m × ẋ_m^o) / (‖ẋ_m‖ ‖ẋ_m^o‖)
4:   Compute rotation angle: φ_m = arccos( ẋ_m^T ẋ_m^o / (‖ẋ_m‖ ‖ẋ_m^o‖) )
5:   Compute scaling: κ_m = ‖ẋ_m‖ / ‖ẋ_m^o‖ − 1
6:   3d: θ_m = [φ_m μ_m, κ_m];  2d: θ_m = [φ_m, κ_m]
7: end for
8: return Modulation data {x_m, θ_m}_{m=1}^{M}

Parameter vectors for each collected data point are computed this way and, paired with the corresponding state observations, constitute a new data set: {x_m, θ_m}_{m=1}^{M}. Regression can now be applied to learn θ(x) as a state-dependent function.
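The conversion in Table 2 takes only a few lines. In the sketch below (3d case; names are ours) the axis sign and normalization are chosen such that the reconstructed modulation (1 + κ)R maps f(x_m) onto ẋ_m; depending on the rotation convention used, the axis in Table 2 may carry the opposite sign.

```python
import numpy as np

def trajectory_to_modulation_data(X, Xdot, f):
    """Convert 3d trajectory samples {x_m, xdot_m} into modulation
    parameters theta_m = [phi_m * mu_m, kappa_m], following Table 2."""
    thetas = []
    for x, xdot in zip(X, Xdot):
        xo = f(x)                                   # original velocity
        no, nd = np.linalg.norm(xo), np.linalg.norm(xdot)
        cosang = np.clip(xdot @ xo / (nd * no), -1.0, 1.0)
        phi = np.arccos(cosang)                     # rotation angle
        axis = np.cross(xo, xdot)                   # normal of rotation plane
        n_axis = np.linalg.norm(axis)
        axis = axis / n_axis if n_axis > 1e-12 else np.zeros(3)
        kappa = nd / no - 1.0                       # speed-scaling factor
        thetas.append(np.concatenate([phi * axis, [kappa]]))
    return np.array(thetas)
```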

4.2. Gaussian Process Modulated Dynamical Systems

Gaussian Process Regression (GPR) is a state-of-the-art regression technique which in its standard form can model functions with inputs of arbitrary dimension and scalar outputs. The basic GPR equations are reviewed briefly in Table 3.

The behavior of GPR depends on the choice of covariance function k(·, ·). In this work, we use the squared exponential covariance function, defined by:

k(x, x′) = σ_f exp( −(x − x′)^T (x − x′) / (2l²) )

where l, σ_f > 0 are scalar hyper-parameters. In this paper, these parameters are set to predetermined values. Alternatively, they could be optimized to maximize the likelihood of the training data [33].

GP-MDS is based on encoding the parameter vector of the modulation function with Gaussian Processes. The data set from Section 4.1 is used as the training set for the GP, where the positions x_m are considered as inputs and the corresponding modulation parameters θ_m are considered as outputs. Note that since θ is multidimensional, one GP per parameter is needed. This can be done at little computational cost if the same hyper-parameters are used in each GP, as is clear by inspecting Eq. (14). A vector of scalar weights can be precomputed:

α(x∗) = [K_XX + σ_n² I]⁻¹ K_Xx∗    (10)

Prediction of each entry of θ then only requires computing a dot product: θ^j(x∗) = α(x∗)^T Θ^j, where Θ^j is a vector of all the training samples of the j-th parameter of θ.
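A minimal sketch of this shared-hyperparameter scheme (class and helper names are ours; the hyper-parameter values echo Section 5.4): the regularized Gram matrix is inverted once, α(x∗) of Eq. (10) is computed once per query, and each additional output dimension of θ then costs only a dot product.

```python
import numpy as np

def se_kernel(X1, X2, l=0.07, sigma_f=1.0):
    """Squared exponential covariance (the paper's sigma_f convention)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return sigma_f * np.exp(-d2 / (2.0 * l ** 2))

class SharedGP:
    """One GP per output dimension of theta, all sharing the same kernel
    and hyper-parameters, so Eq. (10) is evaluated only once per query."""
    def __init__(self, X, Theta, sigma_n=0.4):
        self.X, self.Theta = X, Theta        # inputs (M x D), outputs (M x d)
        K = se_kernel(X, X)
        self.Kinv = np.linalg.inv(K + sigma_n ** 2 * np.eye(len(X)))

    def predict(self, x_query):
        k_star = se_kernel(self.X, x_query[None, :])[:, 0]
        alpha = self.Kinv @ k_star           # weight vector of Eq. (10)
        return alpha @ self.Theta            # theta_j(x*) = alpha^T Theta_j
```

In practice one would factor the regularized Gram matrix with a Cholesky decomposition instead of forming an explicit inverse, but the structure of the computation is the same.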

4.3. Enforcing Local Modulation

The particular choice of GP prior with zero mean and the squared exponential covariance function results in all elements of θ going to zero in regions far from any training data. Hence, for the modulation to be local, it should be parameterized such that M → I as θ → 0. This is the case for the rotation and speed-scaling modulation used in this paper.


Figure 4: The figure illustrates the smooth truncation function in Eq. (11) applied to a Gaussian kernel. For clarity of illustration, the threshold has been set to a relatively high value.

As described in Section 4.1, this modulation encodes the rotation angle in the norm of a subvector of θ. Also, when the speed factor κ goes to zero, the speed of the reshaped dynamics approaches the original speed. Consequently, the modulation function does go to the identity, but there is no strict boundary outside of which M is exactly equal to I. To make the modulation locally active in the strict sense, the entries of α(x∗) (Eq. (10)) should be smoothly truncated at some small value. To this end, we use a sinusoid, computing the truncated weights α′(x∗) as follows:

α′(x∗) =
  0                                                  if α(x∗) < α̲
  ½ (1 + sin( 2π(α(x∗) − α̲)/(2ρ) − π/2 )) α(x∗)      if α̲ ≤ α(x∗) ≤ α̲ + ρ
  α(x∗)                                              if α̲ + ρ < α(x∗)
                                                     (11)

This function is illustrated in Fig. 4. Throughout this work, we used the weighting function above with values α̲ = 0.01 and ρ = 0.01. It should be noted that this particular choice of truncation function is not critical, and could surely be replaced by other methods without perceivably impacting the resulting dynamics. The computation of the reshaping parameters θ^j(x∗) at a query location x∗ can hence be summarized as follows:

1. compute α(x∗) according to Eq. (10)
2. compute the truncated weights α′(x∗) according to Eq. (11)
3. compute the predicted parameters θ^j(x∗) = α′(x∗)^T Θ^j
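A sketch of this truncation, following Eq. (11) literally and applied elementwise to the weight vector (threshold values as in the paper, α̲ = ρ = 0.01; names are ours):

```python
import numpy as np

def truncate_weights(alpha, a_lo=0.01, rho=0.01):
    """Elementwise smooth truncation of the GP weights, Eq. (11):
    entries below a_lo are zeroed, entries above a_lo + rho pass
    through unchanged, with a sinusoidal blend in between."""
    alpha = np.asarray(alpha, dtype=float)
    blend = 0.5 * (1.0 + np.sin(np.pi * (alpha - a_lo) / rho - np.pi / 2.0))
    return np.where(alpha < a_lo, 0.0,
                    np.where(alpha > a_lo + rho, alpha, blend * alpha))
```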

4.4. Trajectory-based Sparsity Criteria

If fixed hyper-parameters are considered, as in this paper, incremental learning can be achieved simply by expanding the training set used for GPR. To deal with the increased complexity of having to recompute an M × M matrix inverse each time a new data point is added, it is useful to sparsely represent the incoming data. The GP literature is already rich in sparse variants of GPR. Like many of these previous works, GP-MDS uses a sparsely selected subset of the collected data for GPR. We propose a novel selection criterion, which is based only on the observed outputs. Assume that there are already M training data points in the GP training set. In order to determine whether a new data point {x_{M+1}, θ_{M+1}} should be included in the GP training set, we introduce two functions:

J1_{M+1} = |κ_{M+1} − κ(x_{M+1})| / (1 + κ_{M+1})    (12a)



Figure 5: Left: Example of reshaped dynamics using GP-MDS in a 3d system. The colored streamtapes represent example trajectories of the reshaped dynamics. The streamtapes colored in black represent trajectories that do not pass through the reshaped region of the state space, and hence retain the straight-line characteristics of the linear system used as original dynamics here. The green streamtube is artificially generated data representing an expanding spiral. Points in magenta represent the subset of this data that was selected as the training set. The gray surface illustrates the region in which the dynamics are significantly altered (corresponding to a level set of the predictive variance of the GP). The colored streamtapes are example trajectories that pass through the reshaped region. Right: Same as left, but zoomed in, and the influence surface has been sliced to improve visibility of the training points and the trajectories.

J2_{M+1} = min_{k∈Z} |φ_{M+1} − φ(x_{M+1}) + 2kπ|    (12b)

where κ(x_{M+1}) and φ(x_{M+1}) denote the predicted speed scaling and rotation angle at the new input point, using the existing GP training data {x_m, θ_m}_{m=1}^{M}. Eq. (12a) is a relative measure of the speed error, and Eq. (12b) is an absolute measure of the error in rotation angle. The data point is added to the training set if either J1_{M+1} or J2_{M+1} exceeds its predetermined threshold value J̄1 or J̄2. E.g., by setting J̄1 = 0.1 and J̄2 = 0.1π, speed errors of less than 10% and errors in rotation angle below 0.1π are considered acceptable. Note that these thresholds relate directly to the trajectory and are hence easily tuned to a desirable trade-off between sparsity and accurate trajectory representation.

An illustrative example of GP-MDS on toy 3d data is given in Fig. 5.
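The selection test of Eqs. (12a)-(12b) is a few lines in practice; the sketch below (2d convention; names are ours) wraps the angle difference into (−π, π], which realizes the minimization over k:

```python
import numpy as np

def should_add_point(theta_new, theta_pred, J1_max=0.1, J2_max=0.1 * np.pi):
    """Trajectory-based sparsity test, Eqs. (12a)-(12b): keep the new
    point only if the current GP predicts its speed scaling or rotation
    angle badly. theta = [phi, kappa] (2d convention)."""
    phi_new, kappa_new = theta_new
    phi_pred, kappa_pred = theta_pred
    J1 = abs(kappa_new - kappa_pred) / (1.0 + kappa_new)   # relative speed error
    dphi = (phi_new - phi_pred + np.pi) % (2.0 * np.pi) - np.pi
    J2 = abs(dphi)                                         # wrapped angle error
    return J1 > J1_max or J2 > J2_max
```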

5. Evaluation

In this section, we present simulations and experiments to evaluate the proposed approach. First, GP-MDS is applied for refining handwriting motions in 2d. We then provide a set of simulations with artificially generated data illustrating 1) that GP-MDS can represent cyclic motions and 2) that cyclic behavior needs to be explicitly demonstrated in order to occur. Lastly, we apply GP-MDS to a real-world task consisting of teaching a robot to put plates in the slots of a dishwasher rack.

5.1. Handwriting Motions

The LASA handwriting dataset [17] is commonly used for benchmarking learning in autonomous DS [20, 18]. It consists of a series of demonstrated 2d trajectories of handwriting letters.

Table 3: Gaussian Process Regression

Let Θ^j ∈ R^L be a vector containing L observations of the j-th dimension of the parameter vector θ, with associated inputs x_1, ..., x_L ∈ R^D. In GPR, the goal is to make predictions of an unknown function b, which is assumed to underlie the observed data. It is assumed that the observed outputs are noisy samples of this function, θ^j_l = b(x_l) + ε_l, l = 1 ... L, with ε being i.i.d. Gaussian noise with variance σ_n². The joint distribution of the L training points and the output at some query point x∗ is then fully determined by a covariance function k(·, ·) and the Gaussian observation noise σ_n²:

[Θ^j; θ^j_∗] ∼ N( 0, [ K_XX + σ_n² I,  K_Xx∗ ;  K_x∗X,  k(x∗, x∗) ] )

where

K_Xx∗ = [k(x_1, x∗), ..., k(x_L, x∗)]^T,   K_x∗X = K_Xx∗^T

and where K_XX is an L × L matrix whose element at row i, column j is given by:

[K_XX]_ij = k(x_i, x_j)

Predictions are made by conditioning the joint distribution over the training points and the query point:

θ^j_∗ | Θ^j ∼ N( μ_{θ^j_∗|Θ^j}, σ²_{θ^j_∗|Θ^j} )

The mean of the resulting distribution is used as the estimator:

μ_{θ^j_∗|Θ^j} = K_x∗X [K_XX + σ_n² I]⁻¹ Θ^j    (14a)

with predictive variance:

σ²_{θ^j_∗|Θ^j} = k(x∗, x∗) − K_x∗X [K_XX + σ_n² I]⁻¹ K_Xx∗    (14b)

Here, we will not present a comparative evaluation of GP-MDS versus other methods for learning these motions, but focus instead on illustrating an interesting application scenario of GP-MDS for reshaping existing DS with additional demonstrations.

The first column of Fig. 6 shows training data for four letters from the LASA handwriting set, along with streamlines from SEDS models trained on this data. Note that these models already do a good job of modeling the data. GP-MDS was applied to refine these dynamics using additional demonstrations. The middle column of Fig. 6 shows GP-MDS being applied to modify the SEDS dynamics. In the case of the S-shape, starting the trajectory from some points results in a letter with disproportionate features, as illustrated by the black trajectory in Fig. 6a. In Fig. 6b, the dynamics have been reshaped such that trajectories starting in one problematic region are deviated toward a region of the state-space from which they produce a good letter. This results in a trajectory that produces a good letter S after it has been deviated toward the region where the original demonstrations start. For letter N, starting trajectories left of the demonstrated starting location is problematic, as illustrated by the black example trajectory in Fig. 6d. In Fig. 6e, this is again remedied with a very simple corrective demonstration. For letters W and Z, one additional demonstration (different from the demonstrations used for the SEDS model) was given. The goal here is to sharpen the corners, which are overly smooth both in the original demonstrations and the resulting SEDS model (Figures 6g and 6j). In order to favor detail over generalization, a fine lengthscale was selected, resulting in the sharpened letters in Figures 6h and 6k.

The right column of Fig. 6 shows streamlines from GP-MDS applied to a linear system in place of an SEDS model. In these cases, the original training data (the same that was used for training the SEDS models) was used for training GP-MDS. A medium lengthscale was chosen to trade off generalization and detail. As can be seen, in most cases GP-MDS reproduces the shape of the letters, although using considerably more parameters than the SEDS models. While we can conclude that relatively complex motions can be learned even without any knowledge of the task in the original dynamics, the performance of GP-MDS is better if the original dynamics can already provide a rough estimate of the trajectory.

Note the sparse selection of training data in Fig. 6, middle column. In areas of the state-space where the original dynamics have the same direction as the corrective demonstration, it is not necessary to add training data⁵. The sparse data selection is also clearly visible near the end of the letters in Figures 6b and 6i, since the demonstrations there are roughly aligned with the trajectories of the linear system used as original dynamics in these cases.

5.2. Non-convergence in reshaped systems

Recall from Section 3.1 that, starting from an asymptotically stable DS, reshaping the system with a full-rank and locally active modulation function (e.g. GP-MDS) only guarantees that the system remains stable. Hence, it is theoretically possible that trajectories could end up in orbits (open or closed) instead of converging to the attractor point of the original system. We argue that this is not a problem, because in practice GP-MDS converges unless it is presented with data that explicitly indicates orbital behavior. In this section, we support this statement by providing GP-MDS with data that is artificially generated to be at risk of producing such behavior in the reshaped system.

Fig. 7a shows GP-MDS reshaping a linear system, explicitly trying to create a repetitive pattern. The resulting system converges either to a limit cycle or to the origin, depending on the starting point of the trajectory. In Fig. 7b, additional data has been provided that changes the characteristic of the origin from a sink to a source, resulting in a system in which all trajectories converge to the stable limit cycle. Note that the system in Fig. 7b is not stable at the origin, and violates a condition of Prop. 3, because the reshaped region includes the origin. In planar systems, any closed orbit necessarily encloses an equilibrium point, which is illustrated in Fig. 7c, where similar data has been presented in a different part of the plane. The resulting system is asymptotically stable at the origin, although it exhibits characteristics that are undesirable for any practical application. We retain from this that closed orbits in the demonstrations should generally be avoided.

In higher dimensions, orbits can in principle occur anywhere in the state-space. We have found that it is quite difficult to produce data that causes orbital behavior, although it is possible.

⁵In these experiments, J̄1 was set to a very high value, since speed was not considered important for this task. Hence, the selection criterion in practice depends only on the directionality of the vector field.


Figure 6: Left column: Demonstrated trajectories and resulting SEDS models for the letters S, N, Z and W. Middle column: GP-MDS is used to improve various aspects of the SEDS models. In the case of letters S and N, the favorable starting region of the state space is achieved with very simple data being provided to GP-MDS with a crude length-scale. In the case of Z and W, GP-MDS with a fine length-scale is used to sharpen the corners of the letters. Right column: The original training data is provided to GP-MDS, with a simple linear system replacing SEDS as the original dynamics.


Figure 7: a) A demonstration of a closed repetitive pattern is used for reshaping a linear 2d system with GP-MDS. b) An additional demonstration aimed at destabilizing the origin results in a system in which all trajectories converge to a stable limit cycle. c) A demonstrated closed curve not containing an equilibrium point.



Figure 8: Left: An artificially generated spiral-shaped trajectory is used to reshape a linear 3d system. Right: Reshaping that leads to loss of asymptotic stability. Here, an artificially generated planar circular trajectory centered at the origin is used with GP-MDS with a very large lengthscale.

Fig. 8 shows GP-MDS used to reshape a linear system in 3d with artificial data from a spiral-shaped motion. Even with a very tight spiral as in Fig. 8a, it seems that trajectories do not get stuck but eventually converge to the origin. Only by translating the data so that the spiral has the origin at its center, and by significantly increasing the lengthscale of the GP, were we able to clearly produce a system in which the tested trajectories did not converge to the origin; see Fig. 8b.

These are examples of particular systems with particular demonstrations and parameter settings, which cannot be used to draw any certain conclusions regarding asymptotic stability. What we can see, though, is that unless repetitive behavior is explicitly demonstrated, the equilibrium point of the system seems to remain asymptotically stable. Planar systems are the only case in which it is easy to achieve closed orbits. With artificial data perfectly centered around the origin and with extreme parameter settings for the GP, we were able to produce a system that was clearly not asymptotically stable. We conclude that in no case will orbital behavior occur unless the demonstrations explicitly include repetitive patterns, and even with such demonstrations the resulting system will only produce closed orbits in special cases. However, it is advisable to avoid repetitive patterns in the demonstrations, as they can lead to unnatural motions as in Fig. 7c.

5.3. Polishing Task Using Planar Periodic Motion

From Section 5.2, it is clear that periodic motion is generally quite difficult to achieve with GP-MDS. In the special case of planar systems, however, periodic motion can be achieved by reshaping the dynamics into a limit cycle, which can be half-stable (Fig. 7a) or stable (Fig. 7b). Note that in planar systems, limit cycles can only occur if they encircle an equilibrium point. Since we consider original dynamics that have a single stable equilibrium point, the location of the limit cycle is constrained to include this point. By reshaping at the origin, the latter can be turned from a sink into a source, resulting in a system in which all trajectories converge to a limit cycle, as in Fig. 7b. Note that to achieve this, Propositions 3 and 4 are violated and the origin is no longer stable.


Figure 9: Left: The plot shows the initial demonstrated data and the resulting limit cycle system. Right: For polishing of differently shaped objects, the polishing motion has to be adapted. Here, additional demonstrations were provided until a new satisfactory shape, highlighted by the black line, had been established.

Periodic motions that can be parameterized in the plane can hence be modeled using the proposed system. To exemplify this, we consider a robotic polishing task. The polishing task mimics the final brightening step of watches with a major Swiss watchmaker. This task is currently done manually at this company. Our implementation is a prototype meant to showcase a robotic system that could potentially ease the repetitive work of the polisher.

The motion is parameterized in 2d by one translational component and one rotational component. These were chosen as the z-coordinate and the rotation around the y-axis in the reference frame of the polishing center. The remaining degrees of freedom remain constant during the task. As original dynamics, a linear system bringing the robot in a straight line to the polishing center was used. By starting the demonstration when the robot is at the equilibrium point of the DS, the latter is naturally destabilized. This effect can be seen in Fig. 9a, which shows the demonstrated data and the resulting reshaped system. When polishing objects of different shapes, sizes and materials, it is important to adapt the polishing motion accordingly. GP-MDS is a suitable modeling tool for such tasks, since new data can be incorporated incrementally while retaining the periodic motion. The demonstrations seen in Fig. 9b were aimed at establishing a more pronounced orientation change before the workpiece is stroked in a linear motion over the polishing wheel. Corrective data was provided until a satisfactory new limit cycle had been shaped.

5.4. Cartesian Trajectory Modeling for Stacking Plates

The task considered is to insert plates into slots in a dish rack, see Fig. 11. To perform this task, the robot needs to grasp the plates, transport them from an arbitrary starting location to the slot, and insert them with the correct orientation. We focus on using the proposed methodology to learn a model for the translational motion of the end-effector. While general treatment of the grasping and orientation control aspects also presents interesting problems per se, these are outside the scope of this paper.



Figure 10: Left: The KUKA LWR robot performing the polishing task. The green arrows illustrate the planar polishing motion. Right: The Barrett WAM performing the Cartesian plate stacking task.

We hence achieve proper orientation by keeping the end-effector orientation fixed. The grasping is completed by manual control of the Barrett Hand by a human operator.

As original dynamics, a Cartesian SEDS model corresponding to a standard place-type motion, trained from trajectories recorded from humans, was used. Example trajectories from this system are illustrated in Fig. 11a. As can be seen, the general motion pattern is appropriate for the task, but trajectories starting close to the dish rack tend to go straight toward the target, colliding with the rack on their path. This model can be improved by locally reshaping the system in the problematic region.

For controlling the robot, we use a Cartesian impedance controller to follow the trajectory generated by the DS. Since the teaching procedure takes place iteratively as the robot is performing the task, it is necessary to inform the system when a corrective demonstration is being delivered. We achieve this through the use of an artificial skin module mounted on the robot. The idea is to achieve accurate tracking in combination with compliant motion when necessary for corrective demonstrations. This is achieved by multiplying the feedback component of the controller with a positive scalar which is inversely proportional to the pressure detected on the artificial skin. Let τ_PD ∈ R⁷ denote the vector of joint torques coming from the Cartesian impedance controller, and τ_G ∈ R⁷ denote the gravity compensation torques. The control torque τ commanded to the robot joints is then:

τ = ψ τ_PD + τ_G    (15)

where ψ ∈ [0, 1] is a truncated linear function which is equal to one when there is no pressure on the skin, and equal to zero when the detected pressure exceeds a predetermined threshold value. As an effect of Eq. (15), the resistance to perturbations is decreased when the teacher pushes the arm in order to deviate its trajectory during a corrective demonstration.
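A sketch of this gating under stated assumptions (the pressure threshold p_max, its units, and the function names are hypothetical; the paper does not specify them):

```python
import numpy as np

def skin_gain(pressure, p_max=1.0):
    """Truncated linear gating term psi of Eq. (15): 1 with no contact,
    0 once skin pressure exceeds the threshold p_max (hypothetical units)."""
    return float(np.clip(1.0 - pressure / p_max, 0.0, 1.0))

def control_torque(tau_pd, tau_g, pressure):
    """tau = psi * tau_PD + tau_G: attenuate the tracking torques (but not
    gravity compensation) while the teacher physically corrects the robot."""
    return skin_gain(pressure) * np.asarray(tau_pd) + np.asarray(tau_g)
```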

The teaching procedure was initialized by starting the robot at a problematic starting point (one that would result in collision with the rack if the teacher did not intervene). The teacher then physically guided the robot during its motion so as to prevent collision with the rack. The data was recorded and used by GP-MDS to reshape the dynamics according to the incoming data. This procedure was repeated from a few problematic points in order to expand the reshaped region of the state space. Corrective demonstrations were provided during four trajectories starting in the problematic region, resulting in the training data visible in Figures 11b and 11c. A total of 1395 data points were collected. With lengthscale l = 0.07, signal variance σ_f = 1, signal noise σ_n = 0.4, and with selection parameter values J̄1 = 0.1, J̄2 = 0.2, only 22 training points needed to be saved in the training set. The region in which the dynamics are reshaped is illustrated as a gray surface in Fig. 11b. As can be seen in Figures 11b and 11c, the dynamics were successfully reshaped to avoid collision with the edge of the rack. The total computational time (GP prediction followed by reshaping) was about 0.04 ms, two orders of magnitude faster than required for our control frequency of 500 Hz. The program was written in C++ and executed on an average desktop with a quadcore Intel Xeon processor. On this particular machine, the maximum number of training points compatible with our control frequency is just over 2000.

6. Discussion

We have proposed a novel framework for incremental learning in dynamical systems. Desired behavior is achieved by locally applying transformations to an existing dynamical system. As such, this work is strongly related to our previous work on dynamic obstacle avoidance, which also uses full-rank modulations to modify a dynamical system without introducing new equilibrium points [21]. Here, we exploit such modulations not for obstacle avoidance but to learn the task itself.

The LMDS framework can be used with various forms of modulation functions and learning algorithms. In this paper, we have proposed a particular modulation function, based on locally scaling the speed and rotating the direction of the velocity. This modulation function proved very useful for locally making the streamlines of the DS match demonstrated trajectories. We would like to emphasize that this is one particular example of a possible modulation function, and a wealth of interesting behaviors could be implemented through a different choice of modulation function. For example, it would be straightforward to implement attraction to or repulsion from an arbitrary axis in the workspace. The former could be very useful for locally encoding convergence of trajectories to a narrow path.

We proposed a particular algorithm, GP-MDS, which modulates the original dynamics by locally rotating and scaling it, learning the parameters of the modulation function with GPR. There exist numerous other regression techniques that could be used instead of GPR, if desirable. The required property is that the regression must be local, so that the parameter vector approaches zero in regions remote from the demonstrations. Possible algorithms include Support Vector Regression [34], which can achieve this by enforcing zero bias, Locally Weighted Learning [35] with radial basis functions, and, of perhaps particular interest for large datasets, Locally Weighted Projection Regression [32].

The three hyper-parameters of the GP (lengthscale, signal variance and noise variance) were selected by hand in this work. These parameters are easy to tune.



Figure 11: Left: Trajectories resulting from the original dynamics from a set of starting points. Note that the overall motion seems well suited for the task, but trajectories starting close to the rack tend to collide with it. Corrective training data delivered through physical guiding of the robot is shown in green. Middle: Resulting reshaped system. The gray shaded region illustrates the region of influence of the GP and is computed as a level set of the predictive variance. Right: Reshaped system from a different point of view. Note the sparse selection of the training data.

For a wide range of applications, the signal and noise variance could be fixed and the lengthscale varied to achieve the desired trade-off between generalization and local accuracy. Further open parameters are the thresholds J̄1 and J̄2, which determine the sparsity of the data used for GPR. Because of the form of the sparsity criteria, the thresholds represent quantities that are interpretable in a trajectory context: they are the acceptable levels of error in speed and direction, respectively. Note that there is no cap on the number of data points used in GP-MDS, and problematic data set sizes could be reached despite the sparsity criteria if very rich and varying data is provided. This aspect could be improved by incorporating a maximum number of allowed training points and a matching strategy for pruning old data from the training set when necessary. Other directions for future research include using a non-stationary covariance function, which would increase flexibility by allowing the generalization/detail trade-off to vary across the state-space. Furthermore, using multiple GPs with different lengthscales would potentially remove the need to compromise between generalization and local accuracy.

Generic methods that use task-based energy functions can allow incremental learning in DS, but these methods are always limited by the form of the task energy function. There are methods to deal with this difficulty, such as our previous work [19], which learns a task-based energy function from the demonstrated data. The system proposed in [19] first builds an estimate of a task-based Lyapunov function from demonstrations, and subsequently allows incremental adjustments only if they respect descent of the learned Lyapunov function. LMDS, which is not based on an energy function, in contrast supports unconstrained incremental learning.

Our experimentation indicates that with asymptotically stable original dynamics, systems reshaped with GP-MDS retain convergence of all trajectories to the origin unless orbital behavior was explicitly demonstrated (Section 5.2). There are possible avenues for verifying the asymptotic stability of a reshaped system. For planar systems, asymptotic stability can be concluded by precluding the existence of limit cycles6. This can be done by invoking Bendixson's criterion, which gives a sufficient condition for the non-existence of limit cycles depending on the Jacobian of the system [36]. Generalizations of Bendixson's criterion to higher dimensions, notably Smith's autonomous convergence theorem [37] and its extensions [38], further allow asymptotic stability to be ensured in any dimension. Depending on the form of the original dynamics, numerical evaluation across the state space would generally be required to ensure asymptotic stability using such an approach.
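As a rough illustration of such a numerical check (an assumption-laden sketch, not code from this work), one can approximate the Jacobian trace of a planar system by finite differences on a grid and test whether its sign is constant; by Bendixson's criterion this precludes limit cycles lying entirely in that region.

```python
import numpy as np

def divergence_sign_constant(f, xs, ys, eps=1e-5):
    """Numerical Bendixson check on a planar grid: if div f = dfx/dx + dfy/dy
    keeps a single sign over a simply connected region, no limit cycle can
    lie entirely within it. Central differences approximate the trace of
    the Jacobian at each grid point."""
    signs = set()
    for x in xs:
        for y in ys:
            dfx_dx = (f(x + eps, y)[0] - f(x - eps, y)[0]) / (2 * eps)
            dfy_dy = (f(x, y + eps)[1] - f(x, y - eps)[1]) / (2 * eps)
            div = dfx_dx + dfy_dy
            if div != 0.0:
                signs.add(np.sign(div))
    return len(signs) == 1

# Example: f(x, y) = (-x + y, -x - y) has div = -2 everywhere,
# so no limit cycle can exist in the sampled region.
f = lambda x, y: (-x + y, -x - y)
grid = np.linspace(-2.0, 2.0, 25)
print(divergence_sign_constant(f, grid, grid))  # True
```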

Acknowledgment

This work was supported by the Swiss National Science Foundation through the National Centre of Competence in Research Robotics.

References

[1] B. D. Argall, S. Chernova, M. Veloso, B. Browning, A survey of robot learning from demonstration, Robotics and Autonomous Systems 57 (5) (2009) 469–483. doi:10.1016/j.robot.2008.10.024.

[2] A. Billard, S. Calinon, R. Dillmann, S. Schaal, Robot programming by demonstration, in: Handbook of Robotics, Chapter 59, Springer, 2008. doi:10.1109/IROS.2008.4650593.

[3] E. Sauser, B. Argall, G. Metta, Iterative learning of grasp adaptation through human corrections, Robotics and Autonomous Systems 60 (1) (2011) 55–71. doi:10.1016/j.robot.2011.08.012.

[4] D. Bullock, S. Grossberg, The VITE model: a neural command circuit for generating arm and articulator trajectories, Dynamic Patterns in Complex Systems (1988) 305–326.

[5] B. Pearlmutter, Learning state space trajectories in recurrent neural networks, Neural Computation 1 (2) (1989) 263–269.

[6] M. Ito, On-line imitative interaction with a humanoid robot using a dynamic neural network model of a mirror system, Adaptive Behavior 12 (2) (2004) 93–115. doi:10.1177/105971230401200202.

[7] D. Lin, J. Dayhoff, P. Ligomenides, Trajectory production with the adaptive time-delay neural network, Neural Networks 8 (3) (1995) 447–461.

[8] S. Schaal, Dynamic movement primitives – a framework for motor control in humans and humanoid robotics, in: The International Symposium on Adaptive Motion of Animals and Machines, 2003.

6 Limit cycles must enclose an equilibrium point.



[9] A. Ijspeert, J. Nakanishi, S. Schaal, Movement imitation with nonlinear dynamical systems in humanoid robots, in: IEEE Intl. Conf. on Robotics and Automation, 2002, pp. 1398–1403. doi:10.1109/ROBOT.2002.1014739.

[10] S. Calinon, Z. Li, T. Alizadeh, N. G. Tsagarakis, D. Caldwell, Statistical dynamical systems for skills acquisition in humanoids, in: International Conference on Humanoid Robots.

[11] S. Schaal, A. Ijspeert, A. Billard, Computational approaches to motor learning by imitation, Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences 358 (1431) (2003) 537–547. doi:10.1098/rstb.2002.1258.

[12] J. Kober, J. Peters, Policy search for motor primitives in robotics, Machine Learning 84 (1–2) (2010) 171–203. doi:10.1007/s10994-010-5223-6.

[13] P. Kormushev, S. Calinon, D. Caldwell, Robot motor skill coordination with EM-based reinforcement learning, in: IEEE Intl. Conf. on Intelligent Robots and Systems (IROS), 2010, pp. 3232–3237.

[14] E. Gribovskaya, A. Billard, Learning nonlinear multi-variate motion dynamics for real-time position and orientation control of robotic manipulators, in: 9th IEEE-RAS International Conference on Humanoid Robots, 2009, pp. 472–477. doi:10.1109/ICHR.2009.5379536.

[15] K. Kronander, M. Khansari-Zadeh, A. Billard, Learning to control planar hitting motions in a minigolf-like task, in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2011, pp. 710–717.

[16] S. Kim, E. Gribovskaya, A. Billard, Learning motion dynamics to catch a moving object, in: 10th IEEE-RAS International Conference on Humanoid Robots, 2010, pp. 106–111. doi:10.1109/ICHR.2010.5686332.

[17] S. Khansari-Zadeh, A. Billard, Learning stable nonlinear dynamical systems with Gaussian mixture models, IEEE Transactions on Robotics 27 (2011) 1–15.

[18] A. Lemme, K. Neumann, F. Reinhart, J. Steil, Neurally imprinted stable vector fields, in: European Symposium on Artificial Neural Networks, 2013.

[19] S. M. Khansari-Zadeh, A. Billard, Learning control Lyapunov function to ensure stability of dynamical system-based robot reaching motions, Robotics and Autonomous Systems 62 (6) (2014) 752–765. doi:10.1016/j.robot.2014.03.001.

[20] K. Neumann, A. Lemme, J. J. Steil, Neural learning of stable dynamical systems based on data-driven Lyapunov candidates, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013, pp. 1216–1222. doi:10.1109/IROS.2013.6696505.

[21] S. M. Khansari-Zadeh, A. Billard, A dynamical system approach to realtime obstacle avoidance, Autonomous Robots 32 (4) (2012) 433–454. doi:10.1007/s10514-012-9287-y.

[22] T. Ogata, S. Sugano, J. Tani, Open-end human–robot interaction from the dynamical systems perspective: mutual adaptation and incremental learning, Advanced Robotics 19 (6) (2005) 651–670. doi:10.1163/1568553054255655.

[23] S. Calinon, F. Guenter, A. Billard, On learning, representing, and generalizing a task in a humanoid robot, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 37 (2) (2007) 286–298.

[24] A. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological) 39 (1) (1977) 1–38.

[25] R. Neal, G. Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, NATO ASI Series D: Behavioural and Social Sciences …

[26] M. Song, H. Wang, Highly efficient incremental estimation of Gaussian mixture models for online data stream clustering, in: Defense and Security, 2005, pp. 174–183. doi:10.1117/12.601724.

[27] S. Calinon, A. Billard, Incremental learning of gestures by imitation in a humanoid robot, in: Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, ACM, 2007, pp. 255–262.

[28] T. Cederborg, M. Li, A. Baranes, P. Oudeyer, Incremental local online Gaussian mixture regression for imitation learning of multiple tasks, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 2010, pp. 267–274.

[29] D. Kulic, W. Takano, Y. Nakamura, Incremental learning, clustering and hierarchy formation of whole body motion patterns using adaptive hidden Markov chains, The International Journal of Robotics Research 27 (7) (2008) 761–784. doi:10.1177/0278364908091153.

[30] D. Lee, C. Ott, Incremental kinesthetic teaching of motion primitives using the motion refinement tube, Autonomous Robots 31 (2011) 115–131. doi:10.1007/s10514-011-9234-3.

[31] J. Quiñonero-Candela, C. Rasmussen, A unifying view of sparse approximate Gaussian process regression, Journal of Machine Learning Research 6 (2005) 1939–1959.

[32] S. Vijayakumar, A. D'Souza, S. Schaal, Incremental online learning in high dimensions, Neural Computation 17 (12) (2005) 2602–2634. doi:10.1162/089976605774320557.

[33] C. Rasmussen, C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.

[34] H. Drucker, C. Burges, L. Kaufman, A. Smola, V. Vapnik, Support vector regression machines, in: Advances in Neural Information Processing Systems, 1997, pp. 155–161.

[35] C. G. Atkeson, A. W. Moore, S. Schaal, Locally weighted learning, Artificial Intelligence Review (1997) 11–73.

[36] J.-J. Slotine, W. Li, Applied Nonlinear Control, Prentice Hall, 1991.

[37] R. Smith, Some applications of Hausdorff dimension inequalities for ordinary differential equations, Proceedings of the Royal Society of Edinburgh: Section A 104A (1986) 235–259.

[38] M. Y. Li, J. S. Muldowney, On R. A. Smith's autonomous convergence theorem, Rocky Mountain Journal of Mathematics 25 (1).
