Tutorial: Algorithms & Hardware for Embedded Optimization
14:00 What Is Different about Embedded Optimization? Eric Kerrigan
14:40 Survey of Industrial Applications of Embedded MPC Alexander Domahidi
15:00 Efficient QP Frameworks for Industrial Embedded MPC Giorgio Kufoalor
15:20 Implicit vs Explicit MPC Martin Klauco (on behalf of Michal Kvasnica)
15:40 Robustness of Explicit MPC Pedro Ayerbe
What can I expect from this session?
What is Different About Embedded Optimization?
Eric Kerrigan, Bulat Khusainov, George Constantinides
[Diagram: a computing system (communication, processing, storage) coupled to a physical system]
On time & on budget in an uncertain world
Applications of Embedded Optimization
A Fundamental Problem
$$\frac{dt}{dt} = 1$$
Correctness should be a function of time
Process + Embedded Optimizer = Uncertain Cyber-Physical System
Time + Uncertainty ⇒ Approximate today might be better than accurate tomorrow
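$$u^*(y) := \arg\min_u \; f(u, y) \quad \text{s.t.} \quad g(u, y) = 0, \;\; h(u, y) \le 0$$
[Diagram: the physical system, subject to disturbances, sends measurements $y$ to the computing system; the computing system, subject to numerical errors, returns optimal inputs $u^*(y)$]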
$$(p^*, c^*) := \arg\min_{p,c} \; \phi(p, c) \quad \text{s.t.} \quad \alpha(p, c) = 0, \;\; \beta(p, c) \le 0$$
[Diagram: the co-designer returns optimal design parameters $p^*$ for the physical system and $c^*$ for the computing system, which together form the cyber-physical system]
Discretized Optimal Control/Estimation Problems
$$\min_{q \in Q,\, s \in S} \; \sum_{i=0}^{N-1} \ell(q_i, s_i, s_{i+1}, i, d, t)$$
$$\text{s.t.} \quad f(q_i, s_i, s_{i+1}, i, d, t) = 0, \quad i = 0, \dots, N-1$$
$$g(q_i, s_i, s_{i+1}, i, d, t) \le 0, \quad i = 0, \dots, N-1$$
Ordering is important:
$$x := (s_0, q_0, s_1, q_1, \dots, s_{N-1}, q_{N-1}, s_N)$$
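One way to see why the ordering matters is to count how many columns each dynamics constraint spans. The sketch below assumes that constraint i couples only (q_i, s_i, s_{i+1}), with state dimension n and input dimension m; the helper name `row_spans` and the dimensions are hypothetical and chosen only for illustration. With the interleaved ordering x = (s_0, q_0, s_1, ...) every constraint touches a window of fixed width, which is what produces a banded structure; grouping all states before all inputs makes the window grow with N.

```python
def row_spans(N, n, m, interleaved):
    """Column span of each dynamics constraint f(q_i, s_i, s_{i+1}) = 0.

    interleaved=True  -> x = (s_0, q_0, s_1, q_1, ..., s_N)
    interleaved=False -> x = (s_0, ..., s_N, q_0, ..., q_{N-1})
    """
    def state_cols(i):
        start = i * (n + m) if interleaved else i * n
        return range(start, start + n)

    def input_cols(i):
        start = i * (n + m) + n if interleaved else (N + 1) * n + i * m
        return range(start, start + m)

    spans = []
    for i in range(N):
        cols = [*state_cols(i), *input_cols(i), *state_cols(i + 1)]
        spans.append(max(cols) - min(cols) + 1)
    return spans


# Hypothetical sizes: 2 states, 1 input.
for horizon in (10, 100):
    banded = max(row_spans(horizon, 2, 1, interleaved=True))
    grouped = max(row_spans(horizon, 2, 1, interleaved=False))
    print(horizon, banded, grouped)
```

In this sketch the interleaved span stays at 2n + m regardless of the horizon, while the grouped span grows linearly with N.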
Nonlinear Programming Strategies on High-Performance Computers
Victor M. Zavala, Scalable Systems Laboratory, Department of Chemical & Biological Engineering, University of Wisconsin-Madison
With: Carl Laird (Purdue), Jia Kang (Sabre Corporation), Nai-Yuan Chiang (United Technologies)
Acknowledgement: Yankai Cao (Purdue)
Sparse & structured matrices ⇒ small (and full) not always better
Exploit Structure: Interior Point Method
Time Space
Proposed = [Cantoni, Farokhi, Kerrigan, Shames, AuCC16]
Control / Automation
[Block diagram with blocks SYSTEM, MODEL and ALGORITHM (here the CONTROLLER). Signals: unknown inputs, known inputs, known outputs, unknown outputs, estimates of unknown outputs, estimates of known outputs, corrections.]
Real-time Optimal/Predictive Control (e.g. Receding Horizon)
1. Take measurement
2. Solve optimal control problem
3. Implement first part
4. Go to step 1
[Figure: output and input trajectories versus time]
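The four steps can be sketched in a few lines. This is a minimal illustration, not the method of any cited paper: it assumes a hypothetical scalar linear system x_{k+1} = a x_k + b u_k, a quadratic cost with weights q and r, no constraints, and a noise-free measurement of the full state. The function names and all numerical values are made up for the example.

```python
def solve_ocp(x0, a, b, q, r, horizon):
    """Optimal input sequence for a scalar finite-horizon LQ problem."""
    # Backward Riccati recursion; p is the cost-to-go weight.
    p = q
    gains = []
    for _ in range(horizon):
        k = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # gains[0] now belongs to the first stage

    # Forward pass: roll the model out under the optimal feedback gains.
    inputs, x = [], x0
    for k in gains:
        u = -k * x
        inputs.append(u)
        x = a * x + b * u
    return inputs


def receding_horizon(x0, steps, a=1.2, b=1.0, q=1.0, r=0.1, horizon=10):
    x, trajectory = x0, [x0]
    for _ in range(steps):
        y = x                                     # 1. take measurement
        plan = solve_ocp(y, a, b, q, r, horizon)  # 2. solve optimal control problem
        u = plan[0]                               # 3. implement first part only
        x = a * x + b * u                         # the plant moves on
        trajectory.append(x)                      # 4. go to step 1
    return trajectory


trajectory = receding_horizon(x0=1.0, steps=20)
print(trajectory[:4])
```

The rest of each plan is discarded and recomputed from the next measurement, which is where the feedback comes from. With a = 1.2 the open-loop system is unstable, yet the closed-loop state decays towards zero.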
Feedback Algorithm
Signal Processing / Estimation / Learning
[Block diagram with blocks SYSTEM, MODEL and ALGORITHM (here the ESTIMATOR). Signals: unknown inputs, known inputs, known outputs, unknown outputs, estimates of unknown outputs, estimates of known outputs, corrections.]
Process + Embedded Optimizer = Uncertain Cyber-Physical System
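[Block diagram: blocks H, C and P with signals z, v, y, w, e, u; labels "Unknowns" and "Behaviour"]
Optimal closed-loop system?
$$u(t) \in \operatorname{Proj} \arg\min_x \{ J(x, d, t) \mid x \in X(d, t),\; d = D(y, v, t) \}$$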
When is Sub-optimal Optimal? Real-time Dynamic Optimization
[Figure: objective function value versus latency/computational delay; curves labelled "optimal", "monotonic & sub-optimal" and "non-monotonic & sub-optimal"]
When is Sub-optimal Optimal? Real-time Dynamic Optimization
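$$(u^*(\cdot, \delta), y^*(\cdot, \delta)) := \arg\min_{(u, y)} V(u), \qquad V(u) := \int_0^1 u(t)^2 \, dt$$
$$\text{s.t.} \quad y(0) = 0, \quad y(1) \ge 1, \quad \dot{y}(t) = u(t) \;\; \forall t \in [0, 1), \quad u(t) = 0 \;\; \forall t \in [0, \delta)$$
$$u^*(t, \delta) = \begin{cases} 0 & \forall t \in [0, \delta) \\ 1/(1 - \delta) & \forall t \in [\delta, 1) \end{cases}$$
$$V(u^*(\cdot, 0.2)) = 1.25 < V(1.1\, u^*(\cdot, 0.2)) < V(u^*(\cdot, 0.4)) \approx 1.7$$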
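The three cost values in this example can be checked numerically. The sketch below assumes the reading that the delay parameter forces u(t) = 0 until the solver has finished, and integrates u(t)^2 over [0, 1) with a midpoint rule; the function names are hypothetical.

```python
def u_star(t, delta):
    """Input that is zero during the delay, then constant so that y(1) = 1."""
    return 0.0 if t < delta else 1.0 / (1.0 - delta)


def cost(u, n=100_000):
    """Midpoint-rule approximation of V(u) = integral of u(t)^2 over [0, 1)."""
    h = 1.0 / n
    return sum(u((i + 0.5) * h) ** 2 for i in range(n)) * h


v_fast_exact = cost(lambda t: u_star(t, 0.2))           # exact, delay 0.2
v_fast_rough = cost(lambda t: 1.1 * u_star(t, 0.2))     # 10% too large, delay 0.2
v_slow_exact = cost(lambda t: u_star(t, 0.4))           # exact, delay 0.4

print(v_fast_exact, v_fast_rough, v_slow_exact)
```

Analytically V(u*(., delta)) = 1/(1 - delta), so the values are 1.25, 1.21 x 1.25 = 1.5125 and 1/0.6, about 1.67. An input that is 10% too large but available after a delay of 0.2 still costs less than the exact optimum delivered after a delay of 0.4.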
Precision, Accuracy and Latency: Fast Gradient Method
Bounds for $i \to \infty$ [Jerez et al., ECC 2013, IEEE TAC 2014]
[Figure: error $\|z^*(x) - z_i\|_2$ (log scale, $10^{-9}$ to $10^{-1}$) versus number of fast gradient iterations $i$ (0 to 100); curves: double, b = 18, b = 24, b = 32]
Quantization Errors in Communication: Distributed First-Order Optimization
[Pu, Zeilinger, Jones, arXiv 2015]
Hardware or Algorithm?
Computing Systems and Resources
Communication
Storage
Processing
Space
Energy
Time
Possible Design Parameters
Computer Hardware | Algorithm
cost, space, energy, power | accuracy, termination tolerances
# processors/cores/arithmetic units | # iterations in each loop
pipeline depth | step length parameters
clock frequency and supply voltage | amount of data/results to store
memory architecture, latency, size | time horizon
communication architecture, bandwidth | complexity of physical model
number representation, word length | coarseness of discretization
actuation and sampling schedule/rate | scheduling/communication strategy
Co-Design as Multi-objective Optimization
[Figure: performance cost versus computing resources, showing the Pareto frontier, the performance limit, the performance constraint and the computational constraint]
[Block diagram: H(c), C(c) and P with signals z, v, y, w, e, u]
$$\min_c \{ F(H(c), c) \mid (H(c), c) \in G \}$$
[Khusainov, Kerrigan, Constantinides, ECC2016]
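A Pareto frontier of this kind can be extracted from a list of candidate designs by discarding every design that another design beats in both objectives. The sketch below is a generic non-dominated filter, not the procedure of the cited paper; the design names and numbers are invented for illustration, and lower is better for both computing resources and performance cost.

```python
def pareto_front(designs):
    """Return the designs not dominated in (resources, cost), sorted by resources."""
    front = []
    for name, res, cost in designs:
        dominated = any(
            r <= res and c <= cost and (r < res or c < cost)
            for _, r, c in designs
        )
        if not dominated:
            front.append((name, res, cost))
    return sorted(front, key=lambda d: d[1])


def cheapest_feasible(front, max_cost):
    """Least-resource design on the front that meets a performance constraint."""
    feasible = [d for d in front if d[2] <= max_cost]
    return feasible[0] if feasible else None


# Hypothetical candidates: (name, computing resources, performance cost).
candidates = [
    ("8 bits", 1.0, 1.60),
    ("12 bits", 2.0, 1.20),
    ("16 bits", 3.0, 1.05),
    ("16 bits, slow clock", 3.5, 1.30),
    ("32 bits", 6.0, 1.00),
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
print(cheapest_feasible(front, max_cost=1.1))
```

Here the slow-clock variant drops out because the plain 16-bit design uses fewer resources and achieves a lower cost. A performance constraint then picks one point on the remaining frontier.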
Explicit Constrained LQR in Fixed Point
[Figure: closed-loop cost (1 to 1.6) versus computing power [W] ($10^{-4}$ to $10^{0}$, log scale)]
[Suardi et al., ECC 2013]
[Block diagram: H(c), C(c) and P with signals z, v, y, w, e, u]
$$\min_c \{ F(H(c), c) \mid (H(c), c) \in G \}, \qquad c := \#\text{bits}$$
Size is Very Important in Microprocessor Design
Die area = 1 Working = 64
Die area = 4 Working = 4
Cost per die = f(area^x), x ∈ [2, 4]
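A possible reading of these numbers, assuming both die sizes are cut from the same wafer at the same wafer cost and that f is simply proportional to its argument (neither assumption is stated on the slide):

```latex
\frac{\text{cost per working die (area 4)}}{\text{cost per working die (area 1)}}
  \approx \frac{64}{4} = 16 = 4^{2},
\qquad
4^{x} \in [16, 256] \quad \text{for } x \in [2, 4].
```

Under these assumptions the drop from 64 to 4 working dies corresponds to the x = 2 end of the range, and x = 4 would make the larger die 256 times more expensive.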
FPGA Resources for an Optimal Controller
[Figure, four panels, each versus # of bits (ticks at 5, 23, 52): # of FFs required (axis scale x 10^5), # of LUTs required (axis scale x 10^5), # of DSPs required (axis ticks 2000, 4000), computation delay [s] (axis scale x 10^-6)]
Interior-point [Rao et al., 1998] on Xilinx Virtex 6 [Longo, Kerrigan, Constantinides, Automatica 2014]
Optimal Control in Low Precision Floating-Point
[Figure, two panels versus time [s] (0 to 20): state 5 and input 2; curves: "5 bits, shift", "5 bits, delta", "52 bits, shift"]
[Longo, Kerrigan, Constantinides, Automatica 2014]
Computational Resources for an Adder
Xilinx Virtex-7 XT 1140 FPGA:
Number representation | Registers/Flip-Flops (FFs) | Lookup-Tables (LUTs) | Latency/delay (clock cycles)
double floating-point, 52-bit mantissa | 1035 | 852 | 12
single floating-point, 23-bit mantissa | 542 | 445 | 12
fixed-point, 53 bits | 53 | 53 | 1
fixed-point, 24 bits | 24 | 24 | 1
Cheap and low power processors often only have fixed-point
Floating-Point Arithmetic
Fixed-Point Arithmetic
Computational Resources for an Adder
Given a fixed amount of silicon (£/$/€):
200x more fixed point additions
• per second
• per Joule
than in floating point
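Part of the reason fixed-point addition is so cheap is that it is plain integer addition once a scaling has been agreed. The sketch below emulates a fixed-point format with `frac_bits` fractional bits using Python integers; the helper names are hypothetical, and overflow and saturation, which a real fixed-width datapath must handle, are deliberately ignored.

```python
def to_fixed(x, frac_bits):
    """Quantize a real number to an integer count of 2**-frac_bits steps."""
    return round(x * (1 << frac_bits))


def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a float for inspection."""
    return q / (1 << frac_bits)


def fixed_add(a, b):
    """Addition needs no rescaling: it is a single integer add."""
    return a + b


def fixed_mul(a, b, frac_bits):
    """The product carries 2*frac_bits fractional bits; shift back down.

    The right shift rounds toward minus infinity.
    """
    return (a * b) >> frac_bits


FRAC_BITS = 8  # hypothetical word-length choice
a = to_fixed(0.1, FRAC_BITS)
b = to_fixed(0.2, FRAC_BITS)
total = fixed_add(a, b)
product = fixed_mul(a, b, FRAC_BITS)
print(a, b, total, to_float(total, FRAC_BITS))
print(product, to_float(product, FRAC_BITS))
```

The price is quantization error: each operand is off by at most half a step, so the sum here differs from 0.3 by less than 2^-8. Choosing the word length trades this error against silicon.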
Fast Gradient Method in Fixed Point Arithmetic
Atomic force microscope: actuation rate > 1 MHz
[Diagram: cantilever over a sample on a piezo plate actuator; signals y, d, r, u]
[Jerez et al., ECC 2013 & IEEE TAC 2014]
QP solver on FPGA: latency < 1 µs, power < 1 W, within 0.1% of optimal
$$\min_u \; u' H u + u' D(y, v, t) \quad \text{s.t.} \quad \underline{u} \le u \le \overline{u}$$
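A box-constrained QP of this form suits the fast gradient method because each iteration needs only a matrix-vector product and a clipping operation. The sketch below is a floating-point illustration of the constant-step scheme for strongly convex problems, not the fixed-point FPGA implementation from the cited papers. The matrix H, the vector d (standing in for D(y, v, t)) and the bounds are made-up values, and L and mu are the largest and smallest eigenvalues of the Hessian 2H, computed by hand for this particular H.

```python
import math


def fast_gradient_box_qp(H, d, lower, upper, L, mu, iterations):
    """Minimize u'Hu + u'd subject to lower <= u <= upper."""
    n = len(d)
    beta = (math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))
    # Start from the projection of the origin onto the box.
    u = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    w = list(u)
    for _ in range(iterations):
        # Gradient of u'Hu + u'd at w is 2Hw + d (H symmetric).
        grad = [2.0 * sum(H[r][c] * w[c] for c in range(n)) + d[r]
                for r in range(n)]
        # Gradient step followed by projection onto the box (clipping).
        u_next = [min(max(w[r] - grad[r] / L, lower[r]), upper[r])
                  for r in range(n)]
        # Momentum (extrapolation) step.
        w = [u_next[r] + beta * (u_next[r] - u[r]) for r in range(n)]
        u = u_next
    return u


H = [[2.0, 0.5], [0.5, 1.0]]
d = [-8.0, -2.0]
# Eigenvalues of 2H for this H: 3 +/- sqrt(2).
L, mu = 3.0 + math.sqrt(2.0), 3.0 - math.sqrt(2.0)

u_opt = fast_gradient_box_qp(H, d, [-1.0, -1.0], [1.0, 1.0], L, mu, 200)
print(u_opt)
```

For these values the first input saturates at its upper bound and the second settles in the interior, at (1, 0.5). The iteration count is fixed in advance, which makes the latency predictable.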
Latency, Precision and Silicon: MINRES
Xilinx Virtex-7 XT 1140 FPGA
[Figure: latency (cycles per iteration, 0 to 1000) versus % registers (FFs) (0 to 100); curves: float52, float23, fixed53, fixed29; annotations: "more parallelism", "faster"]
[Jerez, Constantinides, Kerrigan, IEEE TC 2015]
Fixed point (FPGA) vs floating point (GPU): MINRES
[Figure: operations per second (0 to 5 x 10^12) versus error tolerance for >90% of problems (η) ($10^{-15}$ to $10^{0}$); actual sustained FPGA performance at k = 41, P = 4; k = 23, P = 11; k = 17, P = 21; k = 58, P = 2; theoretical peak GPU performance]
NVIDIA C2050: 1.03 TFLOP/s, 1.15 GHz, 100 W, 10 GFLOP/s/W
Xilinx Virtex-7 XT 1140: 400 MHz, 22 W, >180 GOP/s/W
[Jerez, Constantinides, Kerrigan, IEEE TC 2015]
What is Different About Embedded Optimization?
Optimal?
[Block diagram: blocks H, C and P with signals z, v, y, w, e, u]
Feedback
Structure
Hardware ⇔ Algorithm