Spacecraft AttitudeControlusingPathIntegral

Spacecraft Attitude Control using Path Integral

Method via Riemann Manifold Hamiltonian Monte

Carlo

Bryce G. Doerr∗, Richard Linares†

University of Minnesota, Minnesota, 55455, USA

Christopher D. Petersen‡

Air Force Research Laboratory, Kirtland AFB, New Mexico, 87117, USA

The separation principle for control and estimation is the traditional method for stochas-tic optimal control of linear systems. However, such an approach does not hold true assystems become increasingly nonlinear. This paper presents a new method for controlthrough estimation based on the Path Integral (PI) formulation and the Riemann Mani-fold Hamiltonian Monte Carlo (RMHMC) method. The approach uses the path integralmethod to formulate the control problem as an estimation problem and adds the estima-tion of the model parameters to the problem. Then, a solution is found by solving theestimation problem by Riemann Manifold Hamiltonian Monte Carlo (RMHMC) samplingwhich includes a solution for the control parameters. By solving the nonlinear controland estimation problem using the path integral method via RMHMC, no open algorithmictuning parameters other than exploration noise are required and the control solution hasnumerically robust performance to high dimensionality in the system control input. Themethodology is specifically applied to the spacecraft attitude control problem, though itshould be noted that such an approach is generalizable. Simulation results are presentedwhich demonstrate good performance when utilizing the PI via RMHMC method.

I. Introduction

There exists uncertainty in how dynamical systems are modeled, how information is obtained from sensors,and the effects of external disturbances on the system. This is traditionally reflected with the addition ofprocess and measurement noise in the equations of motion. While some of these uncertainties are small,it presents a large problem in areas that need precise control. Specifically for spacecraft, such precision isneeded in small satellites for astrophysics research, surveillance, and autonomous servicing which requireprecise attitude control.1 The issue that occurs for controlling spacecraft precisely is that the system ishighly nonlinear, and as such, dynamic uncertainties can be amplified, making the system difficult to control.2

Nevertheless, scientists are motivated to develop estimation and control methods that provide precise trackingso spacecraft can return necessary data for analysis. Our work here develops and extends the control theoryfor the space attitude problem using the path integral method via Riemann Manifold Hamiltonian MonteCarlo (RMHMC) in order to control spacecraft with precision in the presence of uncertainties.

Several methods have been used to develop control and estimation techniques for nonlinear stochastic sys-tems. For the control of the spacecraft attitude problem, nonlinear optimal control strategies include solvingthe Hamilton-Jacobi equation,3 and using physically-based penalty functions.4 With estimation, Kalmanfiltering provides accurate estimates of the spacecraft orientation and angular rates.5, 6 Unfortunately, non-linearities in the dynamics do not allow for control and estimation to be combined without performanceand robustness to uncertainty repercussions. Fortunately, combined control and estimation methods existthrough the separation principle.

∗Graduate Student, Aerospace Engineering and Mechanics. AIAA Student Member. email: [email protected].†Assistant Professor, Aerospace Engineering and Mechanics. Senior AIAA Member. email: [email protected].‡Research Engineer, Space Vehicles Directorate. AIAA Member. email: [email protected].

1 of 15

American Institute of Aeronautics and Astronautics

By developing a stochastic adaptive control system, the separation principle can be used to estimateand control the stochastic system simultaneously.7, 8 The separation of both the estimation and controlcan be optimal to a known objective function, or it can also be used as an assumption to simplify theproblem. For example, the separation principle holds when the process and measurement noise in a system isGaussian and linear for the unknown parameters, and the objective function is quadratic.9 By separating thecontroller and estimator design, both can be combined to form the stochastic adaptive control by estimatingthe parameters and then using the information to update the controller. The benefit of this solution isthat if both the estimator and controller are optimal to their individual objective functions, the controllerinput obtained will be optimal (if the additive uncertainty is Gaussian). This also follows the certaintyequivalence principle where the additive uncertainty in the system does not change the optimality of thecontrol solution. Unfortunately, this does not let the controller take any measure to improve the controllerusing the uncertainties in the parameters. These are then non-dual adaptive controllers.

Dual control model predictive control (MPC) has been used for simultaneous identification and controlof uncertain systems while satisfying constraints.10–12 MPC computes the control input by solving a finite-horizon optimization problem at the current time instant. The benefit of dual control MPC is that it providesrobustness to uncertainties using simultaneous identification and control for linear systems. For applications,dual control MPC has been studied for spacecraft rendezvous and attitude control.13, 14

Most dual control MPC methods, however, are required to have additional hard constraints in order toproduce a stabilizing control. An example is the persistent excitation condition (constraint) which developsbounds where the input is full rank by being persistently excited.15 By having this constraint, the problembecomes more complex and more computationally expensive.15 Nonlinear MPC adds more complexity ascomplicated programming methods are necessary, and it may not even provide guarantees of a solution. Forour formulation, the persistent excitation condition is not needed to provide robustness to uncertainties dueto exploiting Riemann Manifold Hamilton Monte Carlo (RMHMC).

Stochastic adaptive control and dual control MPC which combine estimation and control are viable indifferent conditions, but for the nonlinear spacecraft attitude problem, methods that are robust to uncertain-ties and not computationally expensive are needed. The objectives of the paper is to control a spacecraft’sattitude through the use of the path integral formulation. By setting up an objective function that includesa control and estimation term, it can be converted to an inference problem that Markov Chain Monte Carlo(MCMC) can solve. In previous work, the inference problem was solved using a variant of MCMC calledHamiltonian Monte Carlo (HMC).16 For this work, a variation of HMC called the RMHMC is used whichovercomes the difficulty of tuning the exploration noise. Compared to other MCMC algorithms, RMHMCprovides less iterations to form the sample distribution of the probability density distribution than theMetropolis-Hastings (MH) approach and accepts the states in most of the runs of the Monte Carlo simu-lations.17, 18 RMHMC can explore the probability density function (PDF) of the inference problem moreefficiently than MH for higher dimensionality. By solving the nonlinear control and estimation using thepath integral formulation via RMHMC, the path integral method provides a straightforward formulation, noopen algorithmic tuning parameters other than exploration noise are needed, and the control solution hasnumerically robust performance to high dimensionality in the system.19

The paper is organized as follows. In Section II, the path integral and MCMC theory is discussed includ-ing methods and solutions to path integral, the Metropolis-Hastings algorithm and the Riemann ManifoldHamilitonian Monte Carlo algorithm. Section III discusses the dynamical models that demonstrate themethods and solutions discussed in the previous section. Section IV discusses relevant simulation resultsinvolved in the path integral formulation, MCMC, and RMHMC. Lastly in Section V, concluding remarksis provided including future research work.

II. Theory

A. Stochastic Optimal Control Problem

The following discussion on stochastic optimal control follows closely to that of Theodorou.19 The stochasticdynamical system is defined as

xt = f(xt, t, θ) +G(xt)(ut + ǫt), (1)

yt = h(t,xt) + vt, (2)

2 of 15


where xt is the system state, ut is control input, ǫt is zero mean Gaussian noise with variance Σǫ, f(xt, t, θ)is dynamics as a function of the state, and G(xt) is the control matrix. It is desired to determine both thecontrol trajectory and best parameters, θ that matches the output data, yt. The approach of this paperis to achieve this objective using the path integral control method to formulate the control problem as aninference problem and then add the additional inference of the model parameters, θ, to the problem.30–32 Ageneralized finite horizon cost function for the trajectory is given by

R(τk) = φtN +

∫ tN

tk

rtdt, (3)

where tk is the initial time, tn is the final time, τk is the trajectory, φtN is the terminal cost, and rt is theimmediate cost. The immediate cost is given in the form

rt = q(xt, t) +1

2u⊺

tRut, (4)

where q(xt, t) is the state dependent cost and R is the positive semi-definite control weight matrix. The costis assumed to be quadratic in terms of control input. For deterministic optimal control, the input is foundby minimizing the cost function in terms of control, but for stochastic optimal control, the input is found byminimizing the expectation of the cost function in terms of control

V (xtk) = minutk :tN

Eτk [R(τk)], (5)

where Eτk [·] is the expectation of all the trajectories. The stochastic optimal control problem (Eq. (5)) canbe equivalently given as a Hamilton-Jacobi-Bellman (HJB) equation given by

−∂tVt = minu

(

rt + (∇xVt)⊺Ft +

1

2Trace [(∇xxVt)GtΣǫG

⊺

t ]

)

, (6)

where Ft is the nonlinear dynamics without the noise given by Ft = f(xt, t) +G(xt)(ut), ∂t is the partialderivative with respect to time, ∇x is the Jacobian of the value function, and ∇xx is the Hessian of the valuefunction.20, 21

B. Hamilton-Jacobi-Bellman (HJB) Solution

To obtain the HJB solution, the optimal control must first be found by taking the minimization of Eq. (6).Equation (4) is substituted into Eq. (6), and the minimum is solved by taking the gradient and setting it tozero which results in

ut = −R−1G⊺

t (∇xtVt). (7)

The resulting second order partial differential equation (PDE) from the substitution from Eq. (4) and (7) is

−∂tVt = qt + (∇xVt)⊺ft −

1

2(∇xVt)

⊺GtR

−1G⊺

t (∇xVt) +1

2Trace ((∇xxVt)GtΣǫG

⊺

t ) . (8)

The goal is to transform this second order PDE into a linear PDE in which there are solutions for. To forma linear PDE from Eq. (8), an exponential transformation of the value function is taken which is given by

Vt = −λ logΨt, (9)

where Ψt is the exponential cost. The partial derivatives in terms of time (∂tVt) and state (∇xVt and ∇xxVt)are given as

∂tVt = −λ1

Ψt∂tΨt, (10)

∇xVt = −λ1

Ψt∇xΨt, (11)

∇xxVt = λ1

Ψ2t

∇xΨt∇xΨ⊺

t − λ1

Ψt∇xxΨt. (12)

3 of 15


These are substituted back into Eq. (8) to form

λ

Ψt∂tΨt = qt −

λ

Ψt(∇xΨt)

⊺ft −λ2

2Ψ2t

(∇xΨt)⊺GtR

−1G⊺

t (∇xΨt) +1

2Trace(Γ), (13)

where

Γ =

(

λ1

Ψ2t

∇xΨt∇xΨ⊺

t − λ1

Ψt∇xxΨt

)

GtΣǫG⊺

t . (14)

Equation (14) can be simplified further using a property of traces,

Trace(Γ) = λ1

Ψ2t

Trace(∇xΨ⊺

tGtΣǫGt∇xΨt)− λ1

ΨtTrace(∇xxΨtGtΣǫG

⊺

t ). (15)

Next, it is assumed that λR−1 = Σǫ. This condition relates the noise variance to the control cost andimplies that a higher noise variance should have a lower control cost.22, 23 Otherwise, a lower noise varianceshould have a higher control cost. For example, a system may be under a large disturbance (high noisevariance), thus more significant control input is required to bring the system back to stability or a desiredstate. More significant control input occurs with a lower control cost R. Thus, the control weight matrix R

is inverse proportional to the variance of the noise.By using the assumption above, the expression inside Eq. (13) simplifies to

λGtR−1G

⊺

t = GtΣǫG⊺

t = Σ(xt) = Σt, (16)

and cancels out with the first term in Eq. (15). The PDE is reduced to

−∂tΨt = −1

λqtΨt + f

⊺

t ∇xΨt +1

2Trace(∇xxΨtGtΣǫG

⊺

t ), (17)

with a boundary condition of ΨtN = exp(

− 1λφtn

)

. The second order linear PDE (Eq. (17)) is known as theChapman Kolmogorov partial differential equation. Unfortunately, analytical solutions usually cannot befound for general problems that contain nonlinear systems and cost functions, but the Chapman Kolmogorovequation can be converted into an inference problem which can be approximated by well-known approximateinference techniques such as Markov chain Monte Carlo (MCMC) sampling, or for this work, RMHMC.

To convert the equation into an inference problem, a relation between the solution of PDEs and theirrepresentation as a stochastic differential equation (SDE) is used. The Feynman-Kac formula provides thisrelation by finding distributions of random processes used to solve SDEs and determining numerical methodsused to solve PDEs.24, 25 Thus, the solution by using the Feynman-Kac formula on Eq. (17) is

Ψtk = Eτk

(

Ψtne−

∫tntk

1λqtdt

)

= Eτk

[

exp

(

−1

λφtN −

1

λ

∫ tN

tk

qtdt

)]

(18)

in continuous time or

Ψtk = limdt→0

∫

p(τk|xi) exp

−1

λ

φtN +N−1∑

j=i

qtjdt

dτk (19)

in discrete time where p(τk|xi) is the probability of a sample trajectory conditioned on the initial state xtk .The integration is taken with respect to the each sample trajectory τk. Therefore, the exponential cost to goΨtk can be solved. The conditional probability p(τk|xi) is developed in the next section using path integraltheory.

C. Path Integral Formulation

The probability p(τk|xtk) can be expanded out as

p(τk|xtk) = p(τk+1|xtk),

= p(xtk+1, . . . ,xtn |xtk),

=

N−1∏

j=i

p(xtk+1|xtk),

(20)

4 of 15


where the states in the trajectory are independently distributed and the initial state xtk does not contributeto the joint conditional probability. The probability p(xtk+1

|xtk), under the assumption of Brownian motionand zero mean variance, is

p(xtk+1|xtk) = N (xtk ;xtk+1

− ftkdt,Σtk), (21)

where Σtk is formed as GtkΣǫG⊺

tkdt. The probability density with inclusion of the parameters and measure-

ments is

p(τk|xtk) =

N−1∏

j=k

p(ytj+1|xtj+1

)p(xtj+1|xtj , θ). (22)

Equation (22) is simplified by using Eq. (16), resulting in,

Σtj = GtjΣǫG⊺

tjdt = λGtjR−1G

⊺

tjdt = λHtjdt. (23)

Inside Eq. (23), Htj = GtjR−1G

⊺

tj . The probability p(τk|xtk) simplifies to

p(τk|xtk) =

N−1∏

j=k

1√

2π det(Htj )exp

−1

2λ

N−1∑

j=1

∥

∥

∥

∥

xtj+1− xtj

dt− ftj

∥

∥

∥

∥

2

H−1tj

, (24)

where ‖v‖2M = v⊺Mv is the weighted square norm or Mahalanobis distance. With Eq. (19) and (24), thepath integral cost to go Ψtk becomes

Ψtk = limdt→0

∫

exp

(

−1

λZ(τk)

)

dτk, (25)

where

Z(τk) = S(τk) +λ(N − i)l

2log(2πdtλ), (26)

S(τk) = S(τk) +λ

2

N−1∑

j=k

log ||Htj ||, (27)

and

S(τk) = φtN +

N−1∑

j=i

qtjdt+1

2

N−1∑

j=k

∥

∥

∥

∥

xtj+1− xtj

dt− ftj

∥

∥

∥

∥

2

H−1tj

. (28)

S(τk) is the cost function of the optimal control problem, and it can be compared to Eq. (3) and (4) directly.

For Eq. (27), if the Gtk is not state dependent (i.e. Gtk = G), then the λ2

∑N−1j=k log ||Htj || term vanishes

and Sτk = S(τk).

D. Path Integral Optimal Control

It was shown in Eq. (7) that the minimization of the HJB equation in terms of u is the optimal controlsolution in terms of the Jacobian of the value function. By substituting the transformation of the valuefunction into Eq. (7), the control becomes

utk = λR−1Gtk

∇xtkΨtk

Ψtk

. (29)

The above equation can be reduced by substituting Eq. (25) as given by

utk =

∫

P (τk)uL(τk)dτk , (30)

P (τk) =e−

1λS(τk)

∫

e−1λS(τk)

, (31)

5 of 15


and

uL(τk) = R−1G⊺

tk

(

GtkR−1G

⊺

tk

)

−1(

Gtkǫtk − λHtk

1

2Trace

(

H−1tk ∂xtk

jHtk

)

)

. (32)

The equations given by Eq. (30), (31), and (32) are the path integral solution to the stochastic optimalcontrol problem. With these final equations, both the estimation part p(ytk+1

|xtk+1) and the objective

function given by the control problem are included to an overall inference problem which MCMC samplingcan solve.

E. Markov Chain Monte Carlo (Metropolis-Hastings)

Markov Chain Monte Carlo (MCMC) are methods that are used to sample from probability densities whichare only known up to a scaling of dimension.26 These samples can then be used to provide estimates ofmoments or other statistical properties of the PDF. Suppose an equation in the form

U =

∫

∞

−∞

U(x)p(x)dx (33)

is to be determined, but the PDF p(x) is unknown up to a scale factor. MCMC can solve this problem byproviding a sequence of samples xi ∼ p(x). Additionally, if event p(x) is known, computing these multidi-mensional integrals can be computationally intensive by using normal numerical integration methods. Suchcomputation however can be alleviated by sampling the distribution and calculating statistical properties todetermine the integral over the high dimensional state.

The integral in Eq. (33) can be viewed as an expectation with respect to p(x). Thus, the integrationproblem can be expressed as a numerical expectation approximated by samples for the distribution

U =1

N

N∑

i=1

U(xi), (34)

where xi ∼ p(·), and N is the number of samples. This equation determines the integral of U(x) by samplingeach xi in the distribution and calculates the average of the accepted samples. Both MH and HMC, whichare explained later, are methods that sample the distribution to determine the integral, but they vary onhow they sample and accept the samples while exploring the PDF.

The idea for the MH algorithm is that it picks a neighboring state of the current state, and it eitheraccepts or rejects the state depending on the change in energy. First, the candidate in the same area of thelast state is selected as given by

x′

n = xn +wk, (35)

where wk is zero mean Gaussian noise. Thus, the candidate is created through a random walk. Withprobability

p = min [1, exp (log p(xn)− log p(x′

n))] , (36)

the new state is accepted as xn+1 = x′

n, otherwise it is rejected with probability 1− p and xn+1 = xn. FromEq. (36), the candidate is always accepted if the energy from the candidate state is lower (i.e. p(x′

n) < p(xn)).So for exploration of the PDF, candidate states will always go towards lower energies, but there is norestriction for moving to higher energies for exploration. Higher energies for candidate states will have alower probability for acceptance. With a set number of samples N , a chain of accepted states is producedx1,x2, . . . ,xN from this algorithm which is used to determine statistic metrics of the system including themean as shown in Eq. (34).

Unfortunately, MH has some problems when exploring the PDF space. First, it can take a great dealof samples to explore complex distributions. Due to its random walk exploration behavior, MH can beinefficient moving around the PDF. Another problem is that it will also inefficiently explore depending onthe dimensionality of the system. For example, accepting samples with 100 dimensions with lower energyis harder probabilistically than accepting lower energy samples with 3 dimensions for a constant number ofsamples. The higher dimension case may have certain elements that are lower energy, but since the candidatestate is always proposed as a random walk, the algorithm becomes inefficient as the dimension grows. Thus,for higher dimensions MH become inefficient due to random walk behavior, but HMC can provide a moreefficient exploration of the PDF.

6 of 15


F. Hamiltonian Monte Carlo

HMC provides a more efficient way in sampling to produce the sampled PDF of a distribution compared tothe MH algorithm for high-dimensional statistical problems.17 The HMC can be described as a hockey pucksliding over a surface without friction which will stop at a certain point in time and then pushed again ina random direction. By this visualization, it can explore the PDF much more readily than a random walk.While MH algorithm proposes a candidate through random walks, HMC proposes a candidate by simulatinga Hamiltonian dynamics given by

H(xt,pt) = V (xt) + T (xt,pt), (37)

where V (xt) is the potential energy given by the negative log of the target PDF (V (xt) = − log(p(τk))) andT (xt,pt) is the kinetic energy which is dependent on the state parameters and the auxiliary momentumvariable (gradient of the state parameters). This will lead the Hamiltonian to a higher probability state withthe gradient term. The momentum term inside T (xt,pt) contains the velocity or gradient of the Hamiltoniansystem which is randomly generated for a proposed candidate. The kinetic energy is assumed to have theform

T (pt) =1

2p⊺

tM−1pt, (38)

where M is the symmetric and positive definite mass matrix of the system. Substituting this equation intoEq. (37), the Hamiltonian system is given by

H(xt,pt) = V (xt) +1

2p⊺

tM−1pt. (39)

From this Hamiltonian system, the trajectories can be determined by taking partial derivatives in terms ofpt and xt as given by

xt =∂H

∂pt= M−1pt (40)

and

pt = −∂H

∂xt= −∇xt

V (xt). (41)

where ∇xtis the gradient of in terms of the state. These solutions preserve total energy, volume, and are time

reversible. The state candidate, using Eq. (40), is determined by numerically solving over some integrationtime t. The time for integration is chosen to increase the probability for acceptance. To determine thestate candidate by integration, a numerical integrator that preserves energy, volume, and time reversibilityis needed. The Leapfrog method preserves these attributes.17

The leapfrog integration method moves forward by ∆ with the following

pt+∆2= pt −

∆

2∇xV (xt), (42)

xt+∆ = xt +∆M−1pt+∆2, (43)

and

pt+∆ = pt+∆2−

∆

2∇xV (xt+∆). (44)

Unfortunately, numerical approximation introduces errors to the integration, and the leapfrog only ap-proximates the conservation of energy. To correct for this, a similar accept and reject method that was usedfor the MH algorithm is used for the HMC algorithm. This equation is given by

α(xt,x∗

t ) = min [1, exp (−H(x∗

t ,p∗

t ) +H(xt,pt))] . (45)

If the numerical integration produces no errors (Hamiltonian is conserved), the proposal candidates shouldhave a 100% probability of acceptance, but in reality, numerical integration may produce numerical errorsthat causes the trajectory to head in a decreased probabilistic state which would increase the probabilityfor the state candidate to be rejected. If the trajectory increases the probability then it will be accepted.Thus, with a set number of samples, the PDF can be explored through a position and a velocity termwhich provides a more thorough exploration compared to MH with acceptance that is not dependent on thedimensionality of the candidate state.

7 of 15


Equations (42) and (43) can be written in terms of xt+∆ as

xt+∆ = xt +∆M−1pt −∆2

2M−1∇xV (xt) (46)

which is in the form of a pre-conditioned gradient decent or Langevin diffusion. From this equation, a tunedmass matrix is necessary to obtain good exploration noise characteristics, but tuning this mass matrix M isdifficult without knowledge of the target density.

G. Riemann Manifold Hamiltonian Monte Carlo

The Riemann Manifold Hamiltonian Monate Carlo (RMHMC) removes the difficulty of manually tuningthe mass matrix by using the local structure of the target density. This is done by utilizing the FisherInformation Matrix of the PDF dependent on the state, given by

M(x) = F = E

([

∂

∂xlog (p(·))

] [

∂

∂xlog (p(·))

]⊺)

. (47)

Note, however, the mass matrix is state dependent, therefore the Leapfrog method is not applicable for a non-separable Hamiltonian system. Nevertheless, a generalized Leapfrog algorithm can be used to simulate non-separable dynamics using the Fisher Information Matrix. The Hamiltonian for the new RMHMC approachis


2p⊺

t F−1pt +

1

2log [Det(F )] , (48)

which must be solved to solve the HMC on the Riemann manifold defined by the geometry of the parameterspace. This is a challenge due to non-separable nature of the Hamiltonian dynamics which require gradientsof F . Therefore for this approach, F is held constant during the simulation of the Hamiltonian for generationof the proposal states. Despite the fact F is held constant while solving the Hamiltonian, F is updated forthe next proposal candidate. Thus, this method uses the Riemann manifold geometry without the use of ageneralized Leapfrog or further derivatives of F . With this generalization, the Hamiltonian is


2p⊺

t F−1pt, (49)

where the term 12 log [Det(F )] disappears since the mass matrix is held constant for the proposal step. Now,

the difficulty of tuning the mass matrix is removed, and the sampling can propagate the PDF withoutdifficulty.

H. Path Integral via RMHMC

For the nonlinear control and estimation problem, a path integral method can be developed using Eqs. (30),(31), and (32). Specifically, the probability given by P (τk) in Eq. (31) has a PDF in the form

p(τk) = exp

(

−1

λS(τk)

)

, (50)

where S(τk) is the cost function of the optimal control problem, and λ is a constant used in the exponentialtransformation given in Eq. (9). Using Eq. (50), MCMC simulations can determine the mean in Eq. (34),which is an approximation of the actual PDF given by Eq. (33). By inspection, Eq. (33) is related to pathintegral optimal control solution given by Eq. (30), thus by utilizing MCMC sampling, the control trajectoryfor a nonlinear control and estimation problem can be found. An improvement of the MCMC method calledRMHMC is used to determine the mean of the control trajectory states by sampling more efficiently thanits MH counterpart.

III. Dynamical Models

A double integrator system and a spacecraft attitude system are used to determine the viability of thepath integral method via MCMC. The double integrator system is used specifically because the linearizedspacecraft attitude equations of motion reduce to a form similar to a double integrator system. The dynamicsand kinematics of each are given below.

8 of 15


A. Double Integrator Model

The double integrator linear time-invariant (LTI) system is defined as

Ac =

[

0 1

0 0

]

, Bc =

[

0

1

]

, (51)

where Ac and Bc are continuous time state matrices. The Ac and Bc matrices are discretized along a fixedtime interval utilizing a zero-order hold assumption for the control (i.e. control is held constant over thetime-interval). This results in the discretized A and B matrices and equations of motion,

xk+1 = Axk +B(uk + ǫk). (52)

To solve for the control trajectory Un = [uk, . . . ,uk+N−1]⊺, the dynamical equation is stacked so it is only

dependent on the initial state and the control input at each time step as given by

Xk+1 = Πxk +ΥUk, (53)

Yk = ΦXk, (54)

where

Π =[

A A2 , . . . , AN]⊺

, (55)

Υ =

B 0 . . . . . . . . . 0

AB B 0 . . . . . ....

A2B AB B 0 . . ....

......

. . . . . . 0...

...... . . .

. . . B 0

AN−1B AN−2B . . . . . . . . . B

, (56)

Φ =

C 0 . . . 0

0. . .

......

. . . 0

0 . . . 0 C

, (57)

xk is the initial state, Yk = [yk+1, . . . ,yk+N ]⊺is the output trajectory vector, and Xk = [xk+1, . . . ,xk+N ]

⊺

is the state trajectory vector. An optimal control problem is determined through a finite-time horizonoptimization

minUn=[un,...,un+N−1]

⊺

1

2U⊺

nQUn +HUn, (58)

where Q and H is given byQ = 2L1 + 2Υ⊺L2Υ, (59)

H = 2x⊺

nΨ⊺L2Υ. (60)

L1 and L2 are defined as

L1 =

R 0 . . . 0

0. . .

......

. . . 0

0 . . . 0 R

, L2 =

Q 0 . . . 0

0. . .

...... Q 0

0 . . . 0 P

. (61)

9 of 15


The cost function given by Eq. (58) is the S(τk) given in Eq. (50). For both the MH and RMHMC algorithms,the potential energy V (xt) = − log(p(τk)) is used. This equation is given by

V (xt) = − log exp

(

−1

λS

)

,

=1

λ

(

1

2U⊺

nQUn +HUn

)

.

(62)

The RMHMC algorithm uses an additional velocity term given by

∂V (xt)

∂Un=

∂ 1λ

(

12U

⊺

nQUn +HUn

)

∂Un,

=1

λ(QUn +H) .

(63)

Now, path integral via MH and RMHMC can be found.

B. Spacecraft Attitude Model

The quaternion attitude kinematics equations are given by27

q =1

2Ξ (q)ω =

1

2Ω (ω)q, (64)

where q = [q1, q2, q3, q4]⊺ represents the quaternions for the system and ω is the angular velocity of the body-

fixed frame with respect to the inertial frame with components ω = [ωx, ωy, ωz]⊺. The quaternion describes

the orientation of the spacecraft using a body-fixed frame respect to the inertial frame. The dynamics for arigid body spacecraft are given by27

Jω = [ω×]Jω + L, (65)

where J is the moment of inertia matrix, L is the applied torque, and [ω×] is a skew-symmetric matrix ofangular velocity components given by

[ω×] =

0 −wz wy

wz 0 −wx

−wy wx 0

. (66)

By assumption, no other forces act on the spacecraft. Using the attitude dynamics, a cost function is definedas28

J =

∫ T

0

δq⊺Qqδq+ δω⊺Qωδω + u⊺Rudt, (67)

subject to:qq⊺ = 1, (68)

x = f(x, t), (69)

where x = [ω⊺ q⊺]⊺, δq is the quaternion error between the current quaternion (q) and the desiredquaternion (qd), and similarly, δω = ω − ωd. For the regulation problem, qd = [0, 0, 0, 1]

⊺where q4 is

the scalar quaternion. The quaternion error is simplified by

δq = q⊗ q−1d ,

= q⊗ qd,

= q,

(70)

where q⊗ qd is in the form

q⊗ q∗ =

[

q∗4q1:3 + q4q∗

1:3 − q1:3 × q∗

1:3

q4q∗

4 − q1:3q∗

1:3

]

. (71)

10 of 15


Therefore, the cost function simplifies to

J =

∫ T

0

q⊺Qqq+ ω⊺Qωω + u⊺Rudt. (72)

This cost function provides the full nonlinear control solution using the path integral method, but the controlsolution can be approximated using a linear equation about the current quaternion q. The dynamics arelinearized by a first order Taylor series expansion given by

d

dt

[

ω

q

]

=

[

J−1 [ω×]J 03×412Ω(ω) 1

2Ξ(q)

] [

ω

q

]

+

[

J−1

0

]

L, (73)

where

A(t) =

[

J−1 [ω×]J 03×412Ω(ω) 1

2Ξ(q)

]

, B =

[

J−1

0

]

. (74)

The A(t) and B matrices are discretized along a fixed time interval utilizing a zero-order hold for the control.Thus, it is in the form that can be solved by the same finite horizon optimal control problem used by thedouble integrator which is given by Eq. (58). The spacecraft attitude cost is nonlinear, but MH and RMHMCalgorithms use the gradient of the corresponding potential energy and kinetic energy of the linearized timevarying model for the path integral method given by Eqs. (62) and (63).

IV. Results

-20 -15 -10 -5 0 5 10 15 20-20

-15

-10

-5

0

5

10

15

20

u(t

2)

u(t1)

(a) The PDF for the control input at t1 andt2 and optimal control (red point).

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.11

1.05

1.1PI Control via MetroHast

LQR control

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10.6

0.8

1

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1-4

-2

0

x1(t)

x2(t)

u(t)

Time (s)

Time (s)

Time (s)

(b) The control input using path integral viaMH (blue) and LQR (red).

Figure 1: Simulation Results for Case 1.

For the simulations, a double integrator and spacecraft dynamical model are tested under path integraldual control method by using MCMC sampling. Both the MH and RMHMC methods are compared to eachother and the linear Quadratic regulator (LQR) method. The results are shown for the double integratordynamics and the spacecraft attitude dynamics.

A. Double Integrator Path Integral via Metropolis-Hastings

First, the LTI double integrator system is simulated to determine the characteristics of Metropolis-Hastingsat different conditions. The double integrator system uses 11 time-steps (Case 1) and 201 time-steps (Case2) to determine how exploring the PDF at different dimensions affects the sample propagation.

Figure 1(a) shows the contour plot for the path integral PDF for a mesh [−20, 20] (dimensionless) inboth axes and the sampled PDF for the control input at t1 and t2. The red point is the LQR optimalcontrol solution at t1 and t2 seconds. 10, 000 samples were generated from the initial sample in a randomwalk motion. Through the MH acceptance criteria, the sample was either accepted or rejected by comparingits energy to the previous point. For this simulation, the acceptance ratio was 62.56% from tuning MH

11 of 15


parameters. From Figure 1(a), the control input propagates more in higher probabilities of the PDF thanthe lower probabilities which follows the theory direction. Control input was determined by taking the meanof the sampled PDF which follows the formulation in Eq. (30). Figure 1(b) shows the linear response andcontrol input using the PI via MH method and a base LQR method. Starting from the initial conditionof [1, 1]⊺ and simuating for only 11 time-steps (an 11 state control input vector), the path integral methodprovides very similar performance to the LQR control method.

-20 -15 -10 -5 0 5 10 15 20-20

-15

-10

-5

0

5

10

15

20

u(t

2)

u(t1)


0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20.5

1

1.5

PI Control via MetroHast

LQR control

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2-1

0

1

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2-10

0

10

x1(t)

x2(t)

u(t)

Time (s)

Time (s)

Time (s)

(b) The control input using path integral viaMH (blue) and LQR (red).


Figure 2(a) shows the contour plot for the path integral PDF for a mesh [−20, 20] (dimensionless) inboth axes and the sampled PDF for the control input at t1 and t2 seconds. The red point is the LQR controlsolution at t1 and t2 seconds. Again, the 10, 000 samples were generated in a random walk motion. Theacceptance ratio for this simulation was 78.62% by tuning the MH parameters. From Figure 2(a), the controlinput does not explore the PDF evenly. By increasing the number of samples, there is a possibility thatit can explore the PDF space, but it could take more time. Also, the exploration itself does not look likea random walk when comparing this plot to Figure 1(a). Since Case 2 contains a 201 dimensional controlvector, MH has a hard time accepting samples with lower energy than its previous step. The larger thedimension, the more inefficient MH becomes when exploring the PDF and accepting samples. Figure 2(b)shows the linear response and control input for both the PI via MH and LQR method with initial conditionof [1, 1]⊺ and 201 time-steps. The control input from the figure deviates a lot more from the LQR control.More samples are needed to get a comparable control input to the LQR method. The state response also hasdiminished performance compared to the LQR state response. For the MH method to work, more samplesare needed to get performance comparable LQR in this high dimensional system.

B. Double Integrator Path Integral via Riemann Manifold Hamiltonian Monte Carlo

The LTI double integrator system was simulated using the RMHMC method to 11 time-steps (Case 3) and201 time-steps (Case 4) to show how the samples propagate at different dimensions.

For Case 3, Figure 3(a) shows the contour plot for the path integral PDF for a mesh [−20, 20] (dimen-sionless) in both axes and the sampled PDF for the control input at t1 and t2 seconds. The red point is theLQR control solution at t1 and t2 seconds. 3, 000 samples were generated using Hamiltonian dynamics withan acceptance ratio of 50.00%. The acceptance ratio of 50.00% and 3, 000 samples provides sufficient sampleexploration of the PDF. The samples were accepted or rejected using the candidate’s Hamiltonian. Sincethe Hamiltonian is conserved, it should have 100% probability of acceptance, but numerical errors from theintegration steps of the Hamiltonian can increase the rejection rate. The control input itself propagates inareas of higher probability in the PDF. By taking the mean of the sampled control inputs, the control inputfor the whole trajectory was determined using Eq. (30). Figure 3(b) shows the linear response and controlinput using the RMHMC method and a LQR method with 11 time-steps and initial conditions of [1, 1]⊺.PI via RMHMC provides very similar performance to the LQR controller. The RMHMC method at lowdimensionality also provides very similar performance to the MH method shown in Figure 1(b). Thus, pathintegral via MH or RMHMC can be used interchangeably for low dimensional problems without any drop in

12 of 15


performance in the controller.

-20 -15 -10 -5 0 5 10 15 20-20

-15

-10

-5

0

5

10

15

20

u(t

2)

u(t1)


0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.11

1.05

1.1PI Control via HMC

LQR control

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10.6

0.8

1

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1-3

-2

-1

x1(t)

x2(t)

u(t)

Time (s)

Time (s)

Time (s)

(b) The control input using path integral viaRMHMC (blue) and LQR (red).


-20 -15 -10 -5 0 5 10 15 20-20

-15

-10

-5

0

5

10

15

20

u(t

2)

u(t1)


0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20.5

1

1.5

PI Control via HMC

LQR control

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2-1

0

1

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2-5

0

5

x1(t)

x2(t)

u(t)

Time (s)

Time (s)

Time (s)

(b) The control input using path integral viaRMHMC (blue) and LQR (red).


The contour plot shown in Figure 4(a) was simulated for 201 time-steps (Case 4). The red point is aLQR control input for t1 and t2 seconds. 3, 000 samples were generated using Hamiltonian dynamics withan acceptance rate of 49.75%. In contrast with Figure 2(a), the samples propagate the PDF in a sufficientmanner. By using less than half the samples from the MH, the RMHMC method propagates the sampledcontrol input more in areas of higher probability and less in lower probability. The individual states in thehigh dimensional system do not affect the acceptance or rejection rates for exploration. The acceptance andrejection are only affected by errors in the integration which move state candidates toward low probabilisticstates. Thus, there will be a chance the state candidate will be rejected. The RMHMC method can propagatethrough a PDF sufficiently and with less samples than the MH method. Figure 4(b) shows the linear responseand the control input for PI via RMHMC and the LQR controller with 201 time-steps and initial conditionsof [1, 1]⊺. The control input deviates from the LQR controller a lot less than the MH method-based controllershown in 2(b). Plus, there is less variance from RMHMC sampling. The state response follows the LQRsolution a lot better than the MH method with less noise incorporated in the solution. Thus, path integralvia RMHMC provides better sampling than random walk methods for high dimensional problems.

C. Spacecraft via Riemann Manifold Hamiltonian Monte Carlo

A spacecraft with a mass of 2.6 kg and dimensions of 0.1 x 0.1 x 0.2 m is simulated using the RMHMC methodfor 121 time-steps. Since the control input at a time-step is size 3, the dimension length of the RMHMC

13 of 15


0 10 20 30 40 50 60

-4

-2

0

210

-3

0 10 20 30 40 50 60

-0.02

0

0.02

0.04

0 10 20 30 40 50 60

-2

-1

0

110

-5

ω(r

ad/s)

qu

(Nm

)

ω1ω2ω3

q1q2q3

u1u2u3

Time (s)

Time (s)

Time (s)

Figure 5: The time history using path integral via RMHMC.

simulation is 363. Figure 5 shows the response and control input for the spacecraft system using the stackedcost function by linear approximation given by Eq. (58). With an initial orientation of [2.0, 3.0, 4.0]

⊺,

the system was able to stabilize in all states by only using 1, 000 RMHMC samples. Due to the small angleapproximation, q4 is not shown since it will stay approximately 1 throughout the simulation. For dimensionlength of 363, 1, 000 RMHMC samples is sufficient to provide a good approximation of the PDF to stabilizethe spacecraft optimally to a cost function.

V. Conclusion

This work studied the control problem for nonlinear systems. The Path Integral (PI) method was usedto convert a stochastic optimal control problem into an inference problem connecting estimation and controlfor nonlinear systems. This inference problem is then solved using the Riemann Manifold HamiltonianMonte Carlo (RMHMC) method formulating a new approach called Path Integral via Riemann ManifoldHamiltonian Monte Carlo (PI-RMHMC). Compared to the path integral via MH approach, the PI-RMHMCis able to solve high dimensional problems (large number of control inputs over a large number of time-steps)with better approximation of the PDF than MH with less samples. Therefore, the PI-RMHMC method ismore robust in stability as compared to the MH method. Additionally, the PI-RMHMC control approach isviable for spacecraft attitude systems which is highly dimensional in the number of control inputs for eachtime-step. Hence, the PI-RMHMC can be extended to larger dimensions than the spacecraft problem thatwas solved.

For future work, the full nonlinear spacecraft model will be used to determine the viability for fullnonlinear systems. By using the full nonlinear model, we suspect that the optimal controller will provide alower cost function than LQR that is based on the linearized spacecraft dynamics. Also, PI-RMHMC will beextended to the dual control problem for spacecraft systems. By including process and measurement noise tothe spacecraft model, PI-RMHMC may be able to simultaneous control and estimate the spacecraft throughthe time interval. One last extension for PI-RMHMC is to use neural networks to develop parameter controlmethods for the spacecraft attitude problem. PI-RMHMC is generalizable to nonlinear systems, and it beuseful for more general stochastic control problems.

Acknowledgments

This work was made possible by support from the Air Force Research Laboratory at Kirtland Air ForceBase, through the Universities Space Research Association.

14 of 15


References

1Cebrowski, Arthur K., and John W. Raymond. “Operationally responsive space: A new defense business model.” ArmyWar Coll Carlisle Barracks PA, 2005.

2MacKunis, Will, et al. “Adaptive satellite attitude control in the presence of inertia and CMG gimbal friction uncertain-ties.” The Journal of the Astronautical Sciences 56.1 (2008): 121-134.

3Sharma, Rajnish, and Ashish Tewari. ”Optimal nonlinear tracking of spacecraft attitude maneuvers.” IEEE transactionson control systems technology 12.5 (2004): 677-682.

4Schaub, Hanspeter, John L. Junkins, and Rush D. Robinett. ”New penalty functions and optimal control formulation forspacecraft attitude control problems.” Journal of Guidance Control and Dynamics 20 (1997): 428-434.

5Lefferts, Ern J., F. Landis Markley, and Malcolm D. Shuster. ”Kalman filtering for spacecraft attitude estimation.”Journal of Guidance, Control, and Dynamics 5.5 (1982): 417-429.

6Kim, Son-Goo, et al. ”Kalman filtering for relative spacecraft attitude and position estimation.” Journal of GuidanceControl and Dynamics 30.1 (2007): 133-143.

7Wittenmark, Bjorn. “Adaptive dual control.” Control Systems, Robotics and Automation, Encyclopedia of Life SupportSystems (EOLSS), Developed under the auspics of the UNESCO. (2002).

8Wittenmark, Bjorn. ”Adaptive dual control methods: An overview.” IFAC Proceedings Volumes 28.13 (1995): 67-72.9Lobo, Miguel Sousa, and Stephen Boyd. “Policies for simultaneous estimation and optimization.” American Control

Conference, 1999. Proceedings of the 1999. Vol. 2. IEEE, 1999.10Weiss, Avishai, and Stefano Di Cairano. “Robust dual control mpc with guaranteed constraint satisfaction.” Decision

and Control (CDC), 2014 IEEE 53rd Annual Conference on. IEEE, 2014.11Rathousky, Jan, and Vladimir Havlena. “Multiple-step active control with dual properties.” IFAC Proceedings Volumes

44.1 (2011): 1522-1527.12Rathousky, Jan, and Vladimır Havlena. “MPC - based approximate dual controller by information matrix maximization.”

International Journal of Adaptive Control and Signal Processing 27.11 (2013): 974-999.13Mitani, Shinji. ”Satisficing Nonlinear Spacecraft Rendezvous Under Control Magnitude and Direction Constraints.”

(2013).14Manikonda, V., et al. ”A model predictive control-based approach for spacecraft formation keeping and attitude control.”

American Control Conference, 1999. Proceedings of the 1999. Vol. 6. IEEE, 1999.15Marafioti, G., R. Bitmead, and M. Hovd. “Persistently exciting model predictive control using fir models.” International

Conference Cybernetics and Informatics. Vol. 6. 2010.16van den Broek, Bart, Wim Wiegerinck, and Bert Kappen. ”Stochastic optimal control of state constrained systems.”

International Journal of Control 84.3 (2011): 597-615.17Neal, Radford M. “MCMC using Hamiltonian dynamics.” Handbook of Markov Chain Monte Carlo 2 (2011): 113-162.18Betancourt, Michael. “A conceptual introduction to Hamiltonian Monte Carlo.” arXiv preprint arXiv:1701.02434 (2017).19Theodorou, Evangelos, Jonas Buchli, and Stefan Schaal. “A generalized path integral control approach to reinforcement

learning.” Journal of Machine Learning Research 11.Nov (2010): 3137-3181.20Stengel, Robert F. Optimal control and estimation. Courier Corporation, 2012.21Fleming, Wendell H., and Halil Mete Soner. Controlled Markov processes and viscosity solutions. Vol. 25. Springer Science

& Business Media, 2006.22Kappen, Hilbert J. ”An introduction to stochastic control theory, path integrals and reinforcement learning.” AIP con-

ference proceedings. Vol. 887. No. 1. AIP, 2007.23van den Broek, L. J., W. A. J. J. Wiegerinck, and H. J. Kappen. ”Graphical model inference in optimal control of

stochastic multi-agent systems.” (2008).24Øksendal, Bernt. ”Stochastic differential equations.” Stochastic Differential Equations. Springer Berlin Heidelberg, 2003.

65-84.25Yong, Jiongmin. ”Relations among odes, pdes, fsdes, bsdes, and fbsdes.” Decision and Control, 1997., Proceedings of the

36th IEEE Conference on. Vol. 3. IEEE, 1997.26Geyer, Charles. ”Introduction to markov chain monte carlo.” Handbook of markov chain monte carlo (2011): 3-48.27Hughes, Peter C. “Spacecraft attitude dynamics.” Courier Corporation, 2012.28Schaub, Hanspeter, and John L. Junkins. Analytical mechanics of space systems. Aiaa, 2003.29Crassidis, John L., and John L. Junkins. Optimal estimation of dynamic systems. CRC press, 2011.30Yang, Insoon, et al. “Path integral formulation of stochastic optimal control with generalized costs.” IFAC Proceedings

Volumes 47.3 (2014): 6994-7000.31Williams, Grady, Andrew Aldrich, and Evangelos A. Theodorou. “Model predictive path integral control: from theory to

parallel computation.” Journal of Guidance, Control, and Dynamics. AIAA (2017).32Mensink, Thomas, Jakob Verbeek, and Bert Kappen. “EP for efficient stochastic control with obstacles.” ECAI 2010-19th

European Conference on Artificial Intelligence. IOS Press, 2010.

15 of 15


Documents

Spacecraft AttitudeControlusingPathIntegral