Regularization Methods for SDP Relaxations in Large Scale Polynomial Optimization
Li Wang
Joint Work With Jiawang Nie
Department of Mathematics, University of California, San Diego
2012 SIAM Annual Meeting, MS 76: Convex Algebraic Geometry and Optimization I,
Minneapolis, Minnesota, USA.
Outline
SDP Relaxations for Polynomial Optimization
Regularization Methods and Software SDPNAL
Computational Experiments
Numerical Issues
Discussion
Unconstrained polynomial optimization
Let f(x) be a multivariate polynomial in x ∈ R^n of degree 2d:

  min_{x∈R^n} f(x)

We want to find the global minimum f_min of f(x), or a guaranteed lower bound f_bd ≤ f_min, efficiently.

The problem is NP-hard, even when the degree is 4.

It is very difficult to find a good lower bound when n ≥ 20 and deg(f) ≥ 4.

A polynomial f(x) is a sum of squares if there exist polynomials g_i(x) such that

  f(x) = ∑_i g_i(x)^2.
Sum of Squares Relaxation
A standard SOS relaxation is

  f_sos := max γ  s.t.  f(x) − γ is SOS.

f(x) − γ is SOS if and only if there exists X ∈ S^N, N = (n+d choose d), satisfying

  f(x) − γ = [x]_d^T X [x]_d = X • ([x]_d [x]_d^T),  X ⪰ 0.

Here [x]_d denotes the vector of monomials

  [x]_d = [1, x_1, …, x_n, x_1^2, x_1x_2, …, x_n^d]^T.

The length of [x]_d is N = (n+d choose d).
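As a quick illustration (a Python sketch of ours, not part of any SOS library), the monomial basis [x]_d can be enumerated degree by degree and its length checked against (n+d choose d):

```python
from itertools import combinations_with_replacement
from math import comb

def monomial_exponents(n, d):
    """Exponent vectors alpha with |alpha| <= d, i.e. the entries of [x]_d,
    listed degree by degree as on the slide."""
    exps = []
    for deg in range(d + 1):
        # choose which variables appear (with repetition) in a degree-`deg` monomial
        for combo in combinations_with_replacement(range(n), deg):
            alpha = [0] * n
            for i in combo:
                alpha[i] += 1
            exps.append(tuple(alpha))
    return exps

basis = monomial_exponents(3, 2)
# [x]_2 for n = 3 starts with 1, x1, x2, x3, x1^2, ...
assert basis[:5] == [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 0, 0)]
assert len(basis) == comb(3 + 2, 2)   # N = C(n+d, d) = 10
```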
Let

  f(x) = ∑_{α∈N^n: |α|≤2d} f_α x_1^{α_1} ⋯ x_n^{α_n} = ∑_{α∈N^n: |α|≤2d} f_α y_α,

where y_α stands for the monomial x_1^{α_1} ⋯ x_n^{α_n}.
An SDP formulation of Sum of Squares

Define 0/1 constant symmetric matrices C and A_α such that

  [x]_d [x]_d^T = C + ∑_{α∈N^n: 0<|α|≤2d} A_α y_α.

If f(x) = ∑ f_α x^α, then f(x) − γ is SOS if and only if

  C • X + γ = f_0,
  A_α • X = f_α  ∀α ∈ N^n : 0 < |α| ≤ 2d,
  X ⪰ 0.

The number of equality constraints in X is m = (n+2d choose 2d) − 1.
An SDP formulation of Sum of Squares

The SOS relaxation

  f_sos := max γ  subject to  f(x) − γ is SOS

is equivalent to the semidefinite program (SDP)

  min  C • X
  s.t. A_α • X = f_α  ∀α ∈ N^n : 0 < |α| ≤ 2d,
       X ⪰ 0.

The dual optimization problem of the above is

  max  b^T y
  s.t. A^*(y) + Z = C,  Z ⪰ 0.

Here A^*(y) = ∑_{α∈N^n: 0<|α|≤2d} y_α A_α.
The size of the generated SDP
2d = 4:
  n = 10:  (66, 1000)        n = 20:  (231, 10625)      n = 30:  (496, 46375)
  n = 40:  (861, 135750)     n = 50:  (1326, 316250)    n = 60:  (1891, 635375)
  n = 70:  (2556, 1150625)   n = 80:  (3321, 1929500)   n = 90:  (4186, 3049500)
  n = 100: (5151, 4598125)
2d = 6:
  n = 10:  (286, 8007)       n = 15:  (816, 54263)      n = 20:  (1771, 230229)
  n = 25:  (3276, 736280)    n = 30:  (5456, 1947791)
2d = 8:
  n = 5:   (126, 1286)       n = 10:  (1001, 43757)     n = 15:  (3876, 490313)
  n = 20:  (10626, 3108104)  n = 25:  (23751, 13884155)
2d = 10:
  n = 5:   (252, 3002)       n = 8:   (1287, 43757)     n = 9:   (2002, 92377)
  n = 10:  (3003, 184755)    n = 15:  (15504, 3268759)

Table: In each pair (N, m), N is the side length of the matrix and m is the number of equality constraints; N = (n+d choose d), m = (n+2d choose 2d) − 1.
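The pairs (N, m) in the table follow directly from the binomial formulas in the caption; a small Python check (illustrative only, not part of SDPNAL):

```python
from math import comb

def sdp_size(n, two_d):
    """Matrix side N and constraint count m of the SOS relaxation."""
    d = two_d // 2
    N = comb(n + d, d)               # length of [x]_d
    m = comb(n + two_d, two_d) - 1   # one constraint per monomial of degree 1..2d
    return N, m

assert sdp_size(10, 4) == (66, 1000)
assert sdp_size(100, 4) == (5151, 4598125)
assert sdp_size(15, 10) == (15504, 3268759)
```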
Constrained polynomial optimization
f(x) and g_1(x), …, g_ℓ(x) are multivariate polynomials in x ∈ R^n with degrees no greater than 2d.

  f_min := min_{x∈R^n} f(x)
           s.t. g_1(x) ≥ 0, …, g_ℓ(x) ≥ 0.

The problem is NP-hard. The d-th Lasserre relaxation is

  f_sos := max γ
           s.t. f(x) − γ = σ_0(x) + g_1(x)σ_1(x) + ⋯ + g_ℓ(x)σ_ℓ(x),
                σ_0(x), σ_1(x), …, σ_ℓ(x) are SOS,
                deg(σ_0), deg(σ_1 g_1), …, deg(σ_ℓ g_ℓ) ≤ 2d.
Sum of Squares Relaxation

Then γ is feasible if and only if there exist X^(i) ∈ S^{N(d−d_i)} (i = 0, 1, …, ℓ), where g_0 = 1, d_i = ⌈deg(g_i)/2⌉, and N(k) = (n+k choose k), such that

  f(x) − γ = ∑_{i=0}^ℓ X^(i) • (g_i(x) [x]_{d−d_i} [x]_{d−d_i}^T),
  X^(0) ⪰ 0, X^(1) ⪰ 0, …, X^(ℓ) ⪰ 0.

Hence the relaxation is equivalent to an SDP:

  min  C_0 • X_0 + ⋯ + C_ℓ • X_ℓ
  s.t. A_0(X_0) + ⋯ + A_ℓ(X_ℓ) = b,
       X_0, …, X_ℓ ⪰ 0,

where b collects the coefficients f_α. Its dual problem is

  max  b^T y
  s.t. A_i^*(y) + Z_i = C_i,  i = 0, …, ℓ,
       Z_0, …, Z_ℓ ⪰ 0.
Semidefinite Programming (SDP)
Primal problem (C, A_i ∈ S^{N×N} and b ∈ R^m are constant):

  min  C • X
  s.t. A(X) = b,  X ⪰ 0,

where A(X) = (A_1 • X, …, A_m • X)^T : S^{N×N} → R^m.

Dual problem:

  max  b^T y
  s.t. A^*(y) + Z = C,  Z ⪰ 0.

The optimality conditions are

  A(X) = b,
  A^*(y) + Z = C,
  XZ = 0,
  y ∈ R^m,  0 ⪯ X,  Z ∈ S^{N×N}.
Interior Point Method
Construct a sequence (X_k, y_k, Z_k) near the central path

  A(X) = b,
  A^*(y) + Z = C,
  X, Z ≻ 0,  X · Z = μI_N,  μ → 0.

At each step, compute a search direction

  (ΔX_k, Δy_k, ΔZ_k) ∈ R^{N×N} × R^m × R^{N×N}

by linearizing the central path system (many variants exist):

  A(ΔX_k) = b − A(X_k),
  A^*(Δy_k) + ΔZ_k = C − A^*(y_k) − Z_k,
  ΔX_k · Z_k + X_k · ΔZ_k = μI_N − X_k · Z_k.
Regularization for primal and dual SDPs
  min_X  C • X                      max_{y∈R^m, Z}  b^T y
  s.t.   A(X) = b,                  s.t.  A^*(y) + Z = C,
         X ⪰ 0                            Z ⪰ 0,

where A(X) = (A_1 • X, …, A_m • X)^T.

The Moreau-Yosida regularization of the primal is

  min_{X,Y}  C • X + (1/2σ)‖X − Y‖²
  s.t.       A(X) = b,  X ⪰ 0.

The augmented Lagrangian for the dual is

  max_{y∈R^m, Z}  b^T y − (Z + A^*(y) − C) • Y − (σ/2)‖Z + A^*(y) − C‖²
  s.t.            Z ⪰ 0.
Regularization methods
Malick, Povh, Rendl, and Wiegele further showed that the dual augmented Lagrangian problem can be reduced to

  max_{y∈R^m}  b^T y − (σ/2)‖(A^*(y) − C + Y/σ)_+‖² + (1/2σ)‖Y‖²,

where (·)_+ denotes the projection onto the cone of positive semidefinite matrices.
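The projection (·)_+ is computed by zeroing out the negative eigenvalues of a symmetric matrix; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def psd_projection(A):
    """(A)_+ : zero out the negative eigenvalues of a symmetric matrix."""
    lam, Q = np.linalg.eigh(A)
    return (Q * np.maximum(lam, 0.0)) @ Q.T   # Q diag(lam_+) Q^T

A = np.diag([1.0, -2.0])
assert np.allclose(psd_projection(A), np.diag([1.0, 0.0]))
```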
Currently, there are three methods to solve the SDP problem based on this idea:

Boundary point method (BPM): Povh, Rendl, and Wiegele (2006)
Regularization methods for semidefinite programming: Malick, Povh, Rendl, and Wiegele (2009)
Newton-CG augmented Lagrangian method: Zhao, Sun, and Toh (2010)
Newton-CG Augmented Lagrangian method
The function

  φ(y) := b^T y − (σ/2)‖(A^*(y) − C + Y^k/σ)_+‖²

is concave and semismooth. Its gradient ∇φ(y) always exists, and the Hessian ∇²φ(y) exists almost everywhere:

  ∇φ(y) = b − σ A((A^*(y) − C + Y^k/σ)_+).

Apply Newton's method to maximize φ(y) with a sub-Hessian ∇²φ(y).

To avoid storing ∇²φ(y) in memory, use the conjugate gradient (CG) method to compute the Newton direction.
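The point of CG here is that it only needs matrix-vector products with the (negated, positive semidefinite) sub-Hessian, never the matrix itself. A plain CG loop as a sketch; the `hess_vec` callback is our placeholder for the operator application:

```python
import numpy as np

def cg(hess_vec, b, tol=1e-10, maxit=200):
    """Solve H x = b given only the map x -> H x (H symmetric positive definite)."""
    x = np.zeros_like(b)
    r = b - hess_vec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxit):
        Hp = hess_vec(p)
        alpha = rs / (p @ Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# sanity check on a small SPD system
H = np.array([[4.0, 1.0], [1.0, 3.0]])
x = cg(lambda v: H @ v, np.array([1.0, 2.0]))
assert np.allclose(H @ x, [1.0, 2.0])
```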
An Implicit Formula for the Sub-Hessian ∇²φ(y)

  φ(y) := b^T y − (σ/2)‖(A^*(y) − C + Y^k/σ)_+‖²

Compute the eigenvalue decomposition A^*(y) − C + Y^k/σ = QΛQ^T, where Λ = diag(λ_1, …, λ_N) with λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_N. Define Ω ∈ R^{N×N} by

  Ω(i,j) = 1                  if λ_i ≥ 0, λ_j ≥ 0,
           λ_i/(λ_i − λ_j)    if λ_i ≥ 0, λ_j < 0,
           λ_j/(λ_j − λ_i)    if λ_i < 0, λ_j ≥ 0,
           0                  otherwise.

Then

  ∇²φ_σ(y) = σ A(Q(Ω ∘ (Q^T A^*(y) Q))Q^T),

where ∘ denotes the Hadamard product.
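The map H ↦ Q(Ω ∘ (QᵀHQ))Qᵀ is precisely the (generalized) derivative of the projection (·)_+, which is why it appears in the sub-Hessian. A numpy sketch that checks this against finite differences; the function names are ours:

```python
import numpy as np

def omega(lam):
    """Omega(i,j) from the eigenvalues: 1 if both >= 0, 0 if both < 0,
    lam_i/(lam_i - lam_j) (and symmetrically) for mixed signs."""
    pos = np.maximum(lam, 0.0)
    dL = lam[:, None] - lam[None, :]
    dP = pos[:, None] - pos[None, :]
    same = (lam[:, None] >= 0).astype(float)   # value used when lam_i == lam_j
    return np.where(dL != 0.0, dP / np.where(dL != 0.0, dL, 1.0), same)

def dproj(A, H):
    """Directional derivative of the PSD projection at A along H:
    Q (Omega o (Q^T H Q)) Q^T."""
    lam, Q = np.linalg.eigh(A)
    return Q @ (omega(lam) * (Q.T @ H @ Q)) @ Q.T

def proj(M):
    lam, Q = np.linalg.eigh(M)
    return (Q * np.maximum(lam, 0.0)) @ Q.T

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
H = rng.standard_normal((5, 5)); H = (H + H.T) / 2

# compare against a central finite difference of the projection
t = 1e-6
fd = (proj(A + t * H) - proj(A - t * H)) / (2 * t)
assert np.allclose(fd, dproj(A, H), atol=1e-4)
```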
Software SDPNAL
SDPNAL (Zhao, Sun, and Toh (2010)) is an implementation of the Newton-CG augmented Lagrangian method.

The error of a computed minimizer x^* is measured as

  errsol = |f(x^*) − lbd| / max{1, |f(x^*)|},

where lbd is the computed lower bound. The error of a computed solution (X, y, Z) of the SDP relaxation is measured as

  errsdp = max{ |b^T y − ⟨C,X⟩| / (1 + |b^T y| + |⟨C,X⟩|),
                ‖A(X) − b‖₂ / (1 + ‖b‖₂),
                ‖A^*(y) + Z − C‖₂ / (1 + ‖C‖₂) }.
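For small dense problems, errsdp can be computed directly; a sketch assuming the constraints are given as a list of matrices A_i (our representation, not SDPNAL's data format):

```python
import numpy as np

def err_sdp(C, A_list, b, X, y, Z):
    """The accuracy measure above for a primal-dual SDP triple (X, y, Z)."""
    AX = np.array([np.tensordot(Ai, X) for Ai in A_list])   # A(X)
    Astar_y = sum(yi * Ai for yi, Ai in zip(y, A_list))     # A*(y)
    CX = float(np.tensordot(C, X))
    gap = abs(b @ y - CX) / (1 + abs(b @ y) + abs(CX))
    pinf = np.linalg.norm(AX - b) / (1 + np.linalg.norm(b))
    dinf = np.linalg.norm(Astar_y + Z - C) / (1 + np.linalg.norm(C))
    return max(gap, pinf, dinf)

# exact optimal pair for: min tr(X) s.t. X_11 = 1, X >= 0
C = np.eye(2)
A1 = np.zeros((2, 2)); A1[0, 0] = 1.0
X = np.diag([1.0, 0.0]); y = np.array([1.0]); Z = np.diag([0.0, 1.0])
assert err_sdp(C, [A1], np.array([1.0]), X, y, Z) == 0.0
```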
Unconstrained polynomial optimization
Minimize the following least-squares polynomial:

  ∑_{k=1}^3 (∑_{i=1}^n x_i^k − 1)² + ∑_{i=1}^n (x_{i−1}² + x_i² + x_{i+1}² − x_i³ − 1)²,

where x_0 = x_{n+1} = 0. For n = 16, the resulting SDP relaxation has size (N, m) = (969, 74612). Solving it by SDPNAL takes about 34 minutes. The computed SDP solution has error around 6·10⁻⁷. The computed lower bound is f_sos^{uc} ≈ 7.5586. The error of the SOS solution is about 2·10⁻⁶.
Random unconstrained polynomial optimization
We generate f(x) randomly as

  f(x) = f^T [x]_{2d−1} + [x^d]^T F^T F [x^d],

where f and F are a random vector/matrix of appropriate dimensions with Gaussian entries.

n    (N, m)           time (min, med, max)        errsol (min, med, max)     errsdp (min, med, max)
20   (231, 10625)     0:00:02, 0:00:04, 0:00:09   (4.1e-7, 1.0e-5, 1.6e-4)   (2.5e-7, 7.3e-7, 1.3e-6)
30   (496, 46375)     0:00:12, 0:00:21, 0:00:31   (1.3e-7, 5.6e-5, 1.5e-4)   (3.2e-7, 6.8e-7, 1.0e-6)
40   (861, 135750)    0:00:57, 0:01:10, 0:01:24   (7.8e-7, 1.2e-4, 3.1e-4)   (4.2e-7, 4.7e-7, 9.6e-7)
50   (1326, 316250)   0:02:44, 0:03:17, 0:04:08   (1.3e-5, 3.2e-5, 2.3e-4)   (5.6e-7, 6.4e-7, 8.3e-7)
60   (1891, 635375)   0:07:55, 0:08:49, 0:09:48   (4.6e-5, 1.8e-4, 5.1e-4)   (4.8e-7, 9.1e-7, 9.5e-7)
70   (2556, 1150625)  0:17:38, 0:19:34, 0:22:33   (8.0e-5, 2.8e-4, 3.3e-4)   (4.1e-7, 5.7e-7, 9.2e-7)
80   (3321, 1929500)  0:38:45, 0:38:48, 0:42:46   (9.3e-5, 2.7e-4, 9.6e-4)   (3.7e-7, 7.0e-7, 9.9e-7)
90   (4186, 3049500)  1:37:04, 1:46:57, 2:02:01   (1.1e-4, 2.7e-4, 6.4e-4)   (4.3e-7, 5.2e-7, 9.5e-7)
100  (5151, 4598125)  2:48:03, 2:55:34, 3:35:27   (2.1e-4, 2.6e-4, 4.5e-4)   (7.1e-7, 7.9e-7, 8.7e-7)

Table: Computational results for random polynomials of degree 4
Example: Constrained polynomial optimization
Minimize the quartic polynomial

  ∑_{1≤i<j≤n} (x_i x_j + x_i² x_j − x_j³ − x_i² x_j²)

over the hypercube [−1, 1]^n = {x ∈ R^n : x_i² ≤ 1}. We apply the 2nd Lasserre relaxation. For n = 50, solving the generated SDP takes about 2.8 hours. The error of its computed SDP solution is around 10⁻⁶. The computed lower bound is f_sos^{con} ≈ −1250. A local minimizer is

  x^* = (−1,−1,−1,−1,−1,−1,−1,−1,−1,−1,−1,−1,−1, 1, 1, 1,−1,−1,−1, 1, 1, 1,−1, 1,−1, 1,−1,−1, 1, 1, 1, 1,−1, 1,−1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1).

Its value is f(x^*) = −1248.
Random constrained polynomial optimization
Generate f(x) randomly as

  f(x) = ∑_{α∈N^n: |α|≤2d} f_α x^α,

where the coefficients f_α are Gaussian random variables.

The test results are as follows:

(n, 2d)  #Inst  time (min, med, max)           errsol (min, med, max)     errsdp (min, med, max)
(30, 4)  10     0:00:28, 0:00:52, 0:02:47      (5.6e-8, 1.3e-6, 6.9e-6)   (1.3e-7, 8.1e-7, 2.9e-6)
(40, 4)  10     0:03:35, 0:06:38, 0:10:32      (8.8e-8, 1.8e-6, 9.5e-6)   (2.2e-7, 1.0e-6, 4.5e-6)
(50, 4)  3      0:20:34, 0:22:18, 0:24:59      (5.7e-6, 5.6e-6, 7.0e-6)   (2.7e-6, 2.8e-6, 3.4e-6)
(60, 4)  3      0:35:02, 1:20:15, 1:20:38      (1.5e-7, 3.5e-6, 2.5e-5)   (1.7e-7, 1.7e-6, 1.2e-5)
(20, 6)  3      0:36:31, 0:49:17, 0:56:35      (8.5e-7, 2.7e-6, 4.4e-6)   (5.8e-7, 1.3e-6, 2.7e-6)
(12, 8)  3      0:27:11, 0:44:06, 0:59:30      (5.5e-7, 2.8e-6, 9.0e-6)   (9.0e-7, 1.3e-6, 4.2e-6)
(9, 10)  3      0:16:31, 0:36:05, 0:40:53      (2.6e-7, 3.3e-6, 1.4e-5)   (2.7e-7, 1.6e-6, 6.3e-6)
(80, 4)  3      10:52:30, 15:12:40, 15:57:30   (5.3e-6, 5.5e-6, 2.2e-1)   (2.6e-6, 2.6e-6, 2.7e-3)
(25, 6)  3      10:38:04, 11:00:48, 12:57:59   (5.9e-3, 6.6e-3, 1.4e-2)   (3.6e-3, 5.8e-3, 6.1e-3)

Table: Computational results for random constrained polynomial optimization
Scaling
The SDP relaxations for polynomial optimization are harder to solve than general SDP problems. If the optimal Z^* has rank 1, then Z^* = [x^*]_d [x^*]_d^T (x^* is a minimizer) has entries of the form

  1, x_1^*, …, (x_1^*)², …, …, (x_1^*)^{2d}, …, (x_n^*)^{2d},

which can span many orders of magnitude.

Let s = (s_1, …, s_n) > 0 be a scaling vector, and substitute x_i = s_i x̃_i. Then f(x) is replaced by the scaled polynomial f(s_1 x̃_1, …, s_n x̃_n) in x̃.
Restart Scaling
Step 1: If Algorithm Newton-CG converges well, do not scale, and return the solution; otherwise, select a scaling vector s = (s_1, …, s_n) > 0 as

  s_i = τ          if |y_{e_i}| ≤ τ,
        |y_{e_i}|  otherwise.

Here τ > 0 is fixed, y is the most recent dual solution of the SDP, and e_i denotes the exponent vector of the monomial x_i.

Step 2: Scale f(x) to f(s_1 x_1, …, s_n x_n). Go back to Step 1 and solve the scaled polynomial optimization again.
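Both steps are easy to sketch. Here the polynomial is represented (our choice, purely illustrative) as a dict mapping exponent tuples α to coefficients f_α:

```python
import numpy as np

def scaling_vector(y_unit, tau=1.0):
    """Step 1: s_i = tau if |y_{e_i}| <= tau, else |y_{e_i}|,
    where y_unit holds the dual values attached to the monomials x_i."""
    y = np.abs(np.asarray(y_unit, dtype=float))
    return np.where(y <= tau, tau, y)

def scale_coeffs(coeffs, s):
    """Step 2: replace f by f(s_1 x_1, ..., s_n x_n): each coefficient
    f_alpha is multiplied by s^alpha = prod_i s_i^alpha_i."""
    return {alpha: c * np.prod(s ** np.array(alpha)) for alpha, c in coeffs.items()}

s = scaling_vector([0.5, -3.0], tau=1.0)
assert np.allclose(s, [1.0, 3.0])
# f = x1^2 * x2  ->  coefficient scales by s1^2 * s2 = 3
scaled = scale_coeffs({(2, 1): 1.0}, s)
assert np.isclose(scaled[(2, 1)], 3.0)
```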
Example
Consider the polynomial optimization problem

  min_{x∈R^n}  x_1^4 + ⋯ + x_n^4 + ∑_{1≤i<j<k≤n} x_i x_j x_k.

For this kind of polynomial, the global minimum is a large negative value, which leads to ill-conditioning of the SDP relaxation. For n = 20, the iteration process is as follows:
Iter  time     lower bound   SDP error
1     0:01:15  -1.0806e+7    0.7555
2     0:01:15  -1.9444e+7    0.0460
3     0:00:37  -2.1883e+7    0.0082
4     0:01:16  -2.2266e+7    2.4e-6

Table: Scaling iteration process
Discussion
Our numerical experiments show that regularization methods are efficient for solving large-scale polynomial optimization problems. But some issues still need further study:

The inner CG iteration
The choice of scaling factor
Thank you!