10 Splitting

Model Predictive Control

Lecture: Operator Splitting Methods for Fast MPC

Colin Jones

Laboratoire dAutomatique, EPFLSome material from Stephen Boyds lecture notes on ADMM

Outline

1. Duality

2. Dual Decomposition

3. Method of Multipliers

4. Alternating Direction Method of Multipliers

5. Common Patterns in Control

6. ADMM for MPC

Operator Splitting Methods for Fast MPC 102 Model Predictive Control ME-425

Duality

Primal problem: minz

f (z)

s.t. Az = b

Define the Lagrangian

L(z , ) = f (z) + T (Az b)

and the dual function

d() = minz

L(z , )

The dual problem is

max

d()

Recover the primal optimal solution from

z? = argminz L(z , ?)


Properties of the Dual Function

d() is concave1

d() = minz

f (z) + T (Az b)

The dual function is the pointwise minimum of affine functions

d() f (z) for all and all z such that Az = bGiven a feasible z such that Az = b

f (z) = f (z) + T (Az b) minz

L(z , ) = d()

Dual function gives lower bounds on the optimal solution.

1This is true whether f is convex or notOperator Splitting Methods for Fast MPC 104 Model Predictive Control ME-425

Properties of the Dual Function

d() is concave1

d() = minz

f (z) + T (Az b)

The dual function is the pointwise minimum of affine functions

d() f (z) for all and all z such that Az = bGiven a feasible z such that Az = b

f (z) = f (z) + T (Az b) minz

L(z , ) = d()

Dual function gives lower bounds on the optimal solution.

1This is true whether f is convex or notOperator Splitting Methods for Fast MPC 105 Model Predictive Control ME-425

Dual ProblemLagrange dual problem (find the best lower bound)

max

d()

Always a convex optimization problem

max d() minAx=b f (x)If problem is convex, then (under mild assumptions), we have strong duality:

max

d() = minz

f (z) s.t. Az = b

We can solve the primal, or the dual (whichever is easier).

Note: Duality extends to problems with inequality constraints too!


Example: Linearly Constrained QPs

minz

12zTQz + cT z

s.t. Az = b

Goal: Find a feasible z and a such that the dual function equals the primal

d() = minz

12zTQz + cT z + T (Az b)

Take the gradient and set it to zero: Qz+c+AT=0

= f (z)

Conditions for optimality:

Qz + c + AT = 0

Az = b

(z

)=

[Q AT

A 0

]1(cb

)


Dual Ascent MethodDual problem:

max

d()

This is a convex, unconstrained optimization problem.

We would like to apply a gradient ascent approach:

k+1 = k + ckd(k)

How do we compute a gradient of the dual function?

(Note: Were making the strong assumption that the dual function isdifferentiable here. Similar procedure in the non-differentiable case too.)


Gradient of the DualTheorem:If z = argminL(z , ), then Az b d()

Recall: g is a subgradient of a function h if and only if

h(x) gT (x y) h(y)

for all x and y .

d() (Az b)T ( ) = L(z , ) (Az b)T ( )= f (z) + T (Az b) (Az b)T ( )= f (z) + T (Az b) d()

Note: If d is differentiable, then d() = d()


Dual Gradient MethodWe can compute the gradient of the dual and implement the gradient method:

xk+1 = argminx L(x , k)

k+1 = k + c(Axk+1 b)

This works, but requires a number of strong assumptions.


Outline

1. Duality





6. ADMM for MPC


Dual Decomposition

Suppose our problem has the form:

min f (x) + g(y)

s.t. Ax + By = d

Computing the dual function:

minx ,y

L(x , y , ) = minx ,y

f (x) + g(y) + T (Ax + By d)= min

x(f (x) + TAx) + min

y(g(y) + TBy) d

The Lagrangian function is separable for a fixed !

Algorithm becomes:

xk+1 = argminx f (x) + Tk Ax

yk+1 = argminy g(y) + Tk By

k+1 = k + c(Axk+1 + Byk+1 b)Minimizing f and g independently and in parallelOperator Splitting Methods for Fast MPC 1012 Model Predictive Control ME-425

Dual Decomposition

Benefits:

Can solve very large problems in parallel

min f and min g may be much simpler to solve than min f + g

Limitations:

The function value converges non-monotonically to the optimal valueDoesnt matter for control

Slow (sub-gradient method)

If objective is not strictly convex, then the primal iterates xk do notnecessarily converge

MPC objectives will almost never be stricty convex because they willinclude indicator functionsThe primal iterates are the control law these must converge!


Outline

1. Duality





6. ADMM for MPC


Augmented Lagrangian

min f (z)

s.t. Az = b

Add a penalty term to the cost function to make it strictly convex:

min f (z) +

2Az b2

s.t. Az = b

Note that this doesnt change the solution!

Recall: f (z) is strictly convex if

z1, z2,t (0, 1) f (tz1 + (1 t)z2) < tf (z1) + (1 t)f (z2)


Dual FunctionThe (augmented) Lagrangian is:

L(z , ) = f (z) + T (Az b) + 2Az b2

The dual function is

d() = minz

L(z , )

Note that the dual function has changed

Theorem: Convex Analysis, Rockafellar (1970)If a convex program has a strictly convex objective, it has a unique solutionand its Lagrangian dual function is differentiable.

Convergence of the iterates

Differentiable function faster gradient method, rather than sub-gradient


Augmented Lagrangian Method

xk+1 = argminx f (x) + Tk (Ax b) +

2Ax b2

k=1 = k + (Axk+1 b)

We want to apply this to problems of the form

min f (x) + g(y)

s.t. Ax + By = d

Problem: Augmented Lagrangian doesnt decompose

minx ,y

f (x) + g(y) + Tk (Ax + By b) +

2Ax + By b2

Couples x and y


Augmented Lagrangian Method

Positive:

Converges under extremely loose conditions:non-differentiable functions, unbounded functions / indicator functions, etc

Negative:

Does not decompose / parallelize due to the quadratic term


Outline

1. Duality





6. ADMM for MPC


Alternating Direction Method of Multipliers

min f (x) + g(y)

s.t. Ax + By = b

L(x , y , ) = f (x) + g(y) + T (Ax + By b) + 2Ax + By b2

ADMM:

xk+1 = argminx L(x , yk , k)

y k+1 = argminy L(xk+1, y , k)

k+1 = k + (Axk+1 + By k+1 b)

Idea: Approximate the computation of the dual (sort of) via one step ofGauss-Seidel / block coordinate descent.Operator Splitting Methods for Fast MPC 1021 Model Predictive Control ME-425

ADMM: A Cleaner FormulationCombine linear and quadratic terms:

L(x , y , ) = f (x) + g(y) + T (Ax + By b) + 2Ax + By b2 (1)

= f (x) + g(y) +

2Ax + By b + 2 (2)

where = 1

ADMM (scaled form):

xk+1 = argminx f (x) +

2Ax + By k b + k2

y k+1 = argminy g(y) +

2Axk+1 + By b + k2

k+1 = k + Axk+1 + By k+1 b

This is the form that we will use in the exercises


Convergence of ADMM

If

f , g convex, closed, proper

L has a saddle point (i.e., an optimal solution exists).Note that this requires that the problem is feasible!

then

iterates approach feasibility Axk + By k b 0 objective approaches optimal valuef (xk) + g(y k) minx ,y f (x) + g(y) s.t. Ax + By = c


Outline

1. Duality





6. ADMM for MPC


Easily Evaluated Updates

Given a function f (x), we need to compute

x+ = argminx f (x) +

2Ax v2

If A = I (common), then this is called the proximal operator of f

proxf , (v) = argminx f (x) +

2x v2

We types of functions f can we evaluate this easily?


Common Patterns in Control. Upper / lower bounds

f (x) =

x l0 l x u x u

proxf , (v) = argminx f (x) +

2x v2

= argminx x v2s.t. l x u

=

l v lv l v uu v u

Evaluation of the proximal operator is trivial!


Common Patterns in Control. Polytopic constraints

min f (x)

s.t. Hx h

Re-write using slack variables:

min f (x) + g(s)

s.t. Hx + s = h

where g(s) is the indicator function for the positive orthant

g(s) =

{0 s 0 otherwise

proxg, (s) = max{s, 0}


Common Patterns in Control. Quadratic function

f (x) =12xTQx + cT x

proxf , (v) = argminx12xTQx + cT x +

2(x v)T (x v)

Take gradient, set to zero

Qx + c + (x v) = 0proxf , (v) = (Q + I )

1(v c)


Other Common PatternsMany other common constraints and functions have nice proximal operators

Ellipsoidal / ball-constraints

Vector norms: 1, 2, inf norms Matrix norms: Frobenius-norm, 2norm, Nuclear-norm Standard convex cones: second-order cone, semi-definite cone, positiveorthant, etc


Distributed Optimization

Separable cost function with shared variables

min

fi (xi )

s.t. xi = z for all i

Note that g(z) = 0.

Augmented Lagrangian is:

L(x0, . . . , xn, ) =

fi (xi ) +

2xi z + i2

ADMM:

xk+1i = argminxi fi (xi ) +

2xi zk + ki 2 Parallel

zk+1 = argminz

2xk+1i z + ki 2

=1n

xk+1i +

ki Consensus

k+1i = ki + x

k+1i + z

k+1 Parallel


Outline

1. Duality





6. ADMM for MPC


Linear Quadratic Predictive Control

14

Assumption: Prox operators for X and U are simple(Also possible for more complex sets)

Linear dynamics Quadratic stage costs Simple stage constraints

How to define functions f and g?

minx,u

N1i=0

x iQxi + uiRui

Z[ xi+1 = Axi + Buixi X, ui U

Sequential Convex Program

15

Make a copy of states and inputsmin

x,u

N1i=0

x iQxi + uiRui

Z[ xi+1 = Axi + Buixi = xi , ui = ui

xi X, ui U


16

minx,u

N1i=0

x iQxi + uiRui


xi X, ui U

f (x,u) Linear quadratic regulator


17

minx,u

N1i=0

x iQxi + uiRui


xi X, ui U


Linear coupling constraints


18

minx,u

N1i=0

x iQxi + uiRui


xi X, ui U xi , ui xi , ui


Simple constraints

Linear coupling constraints


19

minx,u

N1i=0

x iQxi + uiRui


xi X, ui U xi , ui xi , ui

g(xi , ui)

Sequential Convex Program Proximal Operators

20

2. Stage constraints Box (upper/lower bounds) Clipping Sphere Scaling

Also possible with additional scaling: Polyhedron Ellipse

1. LQR (Linearly constrained least-squares)

proxf (x , u) = minx,u

N1i=0

x iQxi + uiRui +

2xi xi22 +

2ui ui22

Z[ xi+1 = Axi + Bui

= M

x

u

.

Putting it Together: Sequential Convex Programming

21

minx,u

N1i=0

x iQxi + uiRui


xi X, ui U

min f (x) + g(y)

Z[ Ax + By = c

xk+1

uk+1

= M

xk + k

uk + k

xk+1 = X(x

k+1) uk+1 = U(uk+1)

k+1 = k + xk+1 xk+1k+1 = k + uk+1 uk+1

Multiplication

Clipping

Addition

Second-Order Example

22

k = 1

)SHJR! x6YHUNL! x


23

k = 2

)SHJR! x6YHUNL! x


24

k = 3

)SHJR! x6YHUNL! x


25

k = 5

)SHJR! x6YHUNL! x


26

k = 10

)SHJR! x6YHUNL! x


27

k = 15

)SHJR! x6YHUNL! x


28

k = 20

)SHJR! x6YHUNL! x


29

k = 30

)SHJR! x6YHUNL! x


30

k = 40

)SHJR! x6YHUNL! x


31

k = 50

)SHJR! x6YHUNL! x


32

k = 60

)SHJR! x6YHUNL! x


33

k = 70

)SHJR! x6YHUNL! x


34

k = 75

)SHJR! x6YHUNL! x

10 20 30 40 50 60 70 80 90 1000

0.5

1

1.5

2

Solve

Tim

e (m

s)

Iteration

Example: Quad-Copter

35

7 states, 4 inputs, horizon 8 Upper/lower bounds on states and inputs Ellipsoidal terminal set Algorithm: Fast AMA

Max: 1.5ms

Mean: 250s

Computation time sensitive to number of active constraints

Worst-case analysis critical!

Time [Ye, et al, 2013]

Example: AC/DC Converter

36

AC grid DC source

Switched inverter (150 kHz)CL filter

[Richter, et al, 2010]

min1

2uTHu + g(x, xss , uss , w)

T u

Z[uk U() uss , k = 0, . . . , N 1

0 100 200 300 400 500

100 ns1 us

10 us100 us

1 ms

Performance of Auto-Tuned FGM on 2.5GHz PC

37

Time step

CPLEX: 0.51 ms

0 100 200 300 400 500

100 ns1 us

10 us100 us

1 ms

Performance of Auto-Tuned FGM on 2.5GHz PC

38

Time step

CPLEX: 0.51 ms

0 100 200 300 400 500

100 ns1 us

10 us100 us

1 ms

Fast gradient: 360ns

On average 1400x faster than CPLEX

[Richter, et al, 2010]

Exercise: Box-constrained QP via ADMM

minz

12zTQz + cT z

s.t. Az = b

l z u

minz

12zTQz + cT z

s.t. Az = b

l z uz = z

zupdate: zk+1 = argminz12zTQz + cT z +

2z zk + k2

s.t. Az = b(zk+1

?

)=

[Q + I AT

A 0

]1(c + (xk k)b

)

zupdate: zk+1 = min{max{zk+1 + k , l}, u}

Dual update: k+1 = k + zk+1 zk+1


Simple prox operators

Least-squares

Toolbox for Deployment of Embedded Optimization

64Release date: Very soon ;)

% Optimization variablesx = splitvar(n, N);u = splitvar(m, N-1);x(:,1) = parameter(n,1); % Objective and dynamicsobj = 0;for i = 1:N-1 x(:,i+1) == A*x(:,i) + B*u(:,i); obj = obj + x(:,i)'*Q*x(:,i) + ...

u(:,i)'*R*u(:,i);endobj = obj + x(:,end)'*x(:,end); % set up constraints-5

Splitting Methods for Control

Main idea:

Separate complex optimization into a sequence of simpler operations

Use dual to push the individual problems into consensus

Key properties

Centralized optimization : Each iteration is extremely cheap

Parallel optimization : Sub-problems can be solved in parallel

Major downside

Number of iterations may be very high. Lots of ongoing research to dealwith this issue


DualityDual DecompositionMethod of MultipliersAlternating Direction Method of MultipliersCommon Patterns in ControlADMM for MPC

Documents

10 Splitting