Regularization Methods for SDP Relaxations in Large Scale Polynomial Optimization
Li Wang
Joint Work With Jiawang Nie
Department of Mathematics, University of California, San Diego
2012 SIAM Annual Meeting, MS 76: Convex Algebraic Geometry and Optimization I,
Minneapolis, Minnesota, USA.
Outline
SDP Relaxations for Polynomial Optimization
Regularization Methods and Software SDPNAL
Computational Experiments
Numerical Issues
Discussion
Unconstrained polynomial optimization
Let f(x) be a multivariate polynomial in x ∈ R^n of degree 2d:

  min_{x∈R^n} f(x)

We want to find the global minimum f_min of f(x), or a guaranteed lower bound f_bd ≤ f_min, efficiently.

The problem is NP-hard, even when the degree is 4.

It is very difficult to find a good lower bound when n ≥ 20 and deg(f) ≥ 4.

A polynomial f(x) is a sum of squares if there exist polynomials g_i(x) such that

  f(x) = ∑_i g_i(x)^2.
Sum of Squares Relaxation
A standard SOS relaxation is

  f_sos := max γ  s.t.  f(x) − γ is SOS.

f(x) − γ is SOS if and only if there exists X ∈ S^N, N = (n+d choose d), satisfying

  f(x) − γ = [x]_d^T X [x]_d = X • ([x]_d [x]_d^T),  X ⪰ 0.

Here [x]_d denotes the vector of monomials

  [x]_d = [1, x_1, …, x_n, x_1^2, x_1x_2, …, x_n^d]^T.

The length of [x]_d is N = (n+d choose d).
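As a quick illustration (a Python sketch of ours, not part of any SOS library), the monomial basis [x]_d can be enumerated degree by degree and its length checked against (n+d choose d):

```python
from itertools import combinations_with_replacement
from math import comb

def monomial_exponents(n, d):
    """Exponent vectors alpha with |alpha| <= d, i.e. the entries of [x]_d,
    listed degree by degree as on the slide."""
    exps = []
    for deg in range(d + 1):
        # choose which variables appear (with repetition) in a degree-`deg` monomial
        for combo in combinations_with_replacement(range(n), deg):
            alpha = [0] * n
            for i in combo:
                alpha[i] += 1
            exps.append(tuple(alpha))
    return exps

basis = monomial_exponents(3, 2)
# [x]_2 for n = 3 starts with 1, x1, x2, x3, x1^2, ...
assert basis[:5] == [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 0, 0)]
assert len(basis) == comb(3 + 2, 2)   # N = C(n+d, d) = 10
```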
Let

  f(x) = ∑_{α∈N^n: |α|≤2d} f_α x_1^{α_1} ⋯ x_n^{α_n} = ∑_{α∈N^n: |α|≤2d} f_α y_α,

where y_α stands for the monomial x_1^{α_1} ⋯ x_n^{α_n}.
An SDP formulation of Sum of Squares

Define 0/1 constant symmetric matrices C and A_α such that

  [x]_d [x]_d^T = C + ∑_{α∈N^n: 0<|α|≤2d} A_α y_α.

If f(x) = ∑ f_α x^α, then f(x) − γ is SOS if and only if

  C • X + γ = f_0,
  A_α • X = f_α  ∀α ∈ N^n : 0 < |α| ≤ 2d,
  X ⪰ 0.

The number of equality constraints in X is m = (n+2d choose 2d) − 1.
An SDP formulation of Sum of Squares

The SOS relaxation

  f_sos := max γ  subject to  f(x) − γ is SOS

is equivalent to the semidefinite program (SDP)

  min  C • X
  s.t. A_α • X = f_α  ∀α ∈ N^n : 0 < |α| ≤ 2d,
       X ⪰ 0.

The dual optimization problem of the above is

  max  b^T y
  s.t. A^*(y) + Z = C,  Z ⪰ 0.

Here A^*(y) = ∑_{α∈N^n: 0<|α|≤2d} y_α A_α.
The size of the generated SDP
2d = 4:
  n = 10:  (66, 1000)        n = 20:  (231, 10625)      n = 30:  (496, 46375)
  n = 40:  (861, 135750)     n = 50:  (1326, 316250)    n = 60:  (1891, 635375)
  n = 70:  (2556, 1150625)   n = 80:  (3321, 1929500)   n = 90:  (4186, 3049500)
  n = 100: (5151, 4598125)
2d = 6:
  n = 10:  (286, 8007)       n = 15:  (816, 54263)      n = 20:  (1771, 230229)
  n = 25:  (3276, 736280)    n = 30:  (5456, 1947791)
2d = 8:
  n = 5:   (126, 1286)       n = 10:  (1001, 43757)     n = 15:  (3876, 490313)
  n = 20:  (10626, 3108104)  n = 25:  (23751, 13884155)
2d = 10:
  n = 5:   (252, 3002)       n = 8:   (1287, 43757)     n = 9:   (2002, 92377)
  n = 10:  (3003, 184755)    n = 15:  (15504, 3268759)

Table: In each pair (N, m), N is the side length of the matrix and m is the number of equality constraints; N = (n+d choose d), m = (n+2d choose 2d) − 1.
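The pairs (N, m) in the table follow directly from the binomial formulas in the caption; a small Python check (illustrative only, not part of SDPNAL):

```python
from math import comb

def sdp_size(n, two_d):
    """Matrix side N and constraint count m of the SOS relaxation."""
    d = two_d // 2
    N = comb(n + d, d)               # length of [x]_d
    m = comb(n + two_d, two_d) - 1   # one constraint per monomial of degree 1..2d
    return N, m

assert sdp_size(10, 4) == (66, 1000)
assert sdp_size(100, 4) == (5151, 4598125)
assert sdp_size(15, 10) == (15504, 3268759)
```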
Constrained polynomial optimization
f(x) and g_1(x), …, g_ℓ(x) are multivariate polynomials in x ∈ R^n with degrees no greater than 2d.

  f_min := min_{x∈R^n} f(x)
           s.t. g_1(x) ≥ 0, …, g_ℓ(x) ≥ 0.

The problem is NP-hard. The d-th Lasserre relaxation is

  f_sos := max γ
           s.t. f(x) − γ = σ_0(x) + g_1(x)σ_1(x) + ⋯ + g_ℓ(x)σ_ℓ(x),
                σ_0(x), σ_1(x), …, σ_ℓ(x) are SOS,
                deg(σ_0), deg(σ_1 g_1), …, deg(σ_ℓ g_ℓ) ≤ 2d.
Sum of Squares Relaxation

Then γ is feasible if and only if there exist X^(i) ∈ S^{N(d−d_i)} (i = 0, 1, …, ℓ), where g_0 = 1, d_i = ⌈deg(g_i)/2⌉, and N(k) = (n+k choose k), such that

  f(x) − γ = ∑_{i=0}^ℓ X^(i) • (g_i(x) [x]_{d−d_i} [x]_{d−d_i}^T),
  X^(0) ⪰ 0, X^(1) ⪰ 0, …, X^(ℓ) ⪰ 0.

Hence the relaxation is equivalent to an SDP:

  min  C_0 • X_0 + ⋯ + C_ℓ • X_ℓ
  s.t. A_0(X_0) + ⋯ + A_ℓ(X_ℓ) = b,
       X_0, …, X_ℓ ⪰ 0,

where b collects the coefficients f_α. Its dual problem is

  max  b^T y
  s.t. A_i^*(y) + Z_i = C_i,  i = 0, …, ℓ,
       Z_0, …, Z_ℓ ⪰ 0.
Semidefinite Programming (SDP)
Primal problem (C, A_i ∈ S^{N×N} and b ∈ R^m are constant):

  min  C • X
  s.t. A(X) = b,  X ⪰ 0,

where A(X) = (A_1 • X, …, A_m • X)^T : S^{N×N} → R^m.

Dual problem:

  max  b^T y
  s.t. A^*(y) + Z = C,  Z ⪰ 0.

The optimality conditions are

  A(X) = b,
  A^*(y) + Z = C,
  XZ = 0,
  y ∈ R^m,  0 ⪯ X,  Z ∈ S^{N×N}.
Interior Point Method
Construct a sequence (X_k, y_k, Z_k) near the central path

  A(X) = b,
  A^*(y) + Z = C,
  X, Z ≻ 0,  X · Z = μI_N,  μ → 0.

At each step, compute a search direction

  (ΔX_k, Δy_k, ΔZ_k) ∈ R^{N×N} × R^m × R^{N×N}

by linearizing the central path system (many variants exist):

  A(ΔX_k) = b − A(X_k),
  A^*(Δy_k) + ΔZ_k = C − A^*(y_k) − Z_k,
  ΔX_k · Z_k + X_k · ΔZ_k = μI_N − X_k · Z_k.
Regularization for primal and dual SDPs
  min_X  C • X                      max_{y∈R^m, Z}  b^T y
  s.t.   A(X) = b,                  s.t.  A^*(y) + Z = C,
         X ⪰ 0                            Z ⪰ 0,

where A(X) = (A_1 • X, …, A_m • X)^T.

The Moreau-Yosida regularization of the primal is

  min_{X,Y}  C • X + (1/2σ)‖X − Y‖²
  s.t.       A(X) = b,  X ⪰ 0.

The augmented Lagrangian for the dual is

  max_{y∈R^m, Z}  b^T y − (Z + A^*(y) − C) • Y − (σ/2)‖Z + A^*(y) − C‖²
  s.t.            Z ⪰ 0.
Regularization methods
Malick, Povh, Rendl, and Wiegele further showed that the dual augmented Lagrangian problem can be reduced to

  max_{y∈R^m}  b^T y − (σ/2)‖(A^*(y) − C + Y/σ)_+‖² + (1/2σ)‖Y‖²,

where (·)_+ denotes the projection onto the cone of positive semidefinite matrices.
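The projection (·)_+ is computed by zeroing out the negative eigenvalues of a symmetric matrix; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def psd_projection(A):
    """(A)_+ : zero out the negative eigenvalues of a symmetric matrix."""
    lam, Q = np.linalg.eigh(A)
    return (Q * np.maximum(lam, 0.0)) @ Q.T   # Q diag(lam_+) Q^T

A = np.diag([1.0, -2.0])
assert np.allclose(psd_projection(A), np.diag([1.0, 0.0]))
```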
Currently, there are three methods to solve the SDP problem based on this idea:

Boundary point method (BPM): Povh, Rendl, and Wiegele (2006)
Regularization methods for semidefinite programming: Malick, Povh, Rendl, and Wiegele (2009)
Newton-CG augmented Lagrangian method: Zhao, Sun, and Toh (2010)
Newton-CG Augmented Lagrangian method
The function

  φ(y) := b^T y − (σ/2)‖(A^*(y) − C + Y^k/σ)_+‖²

is concave and semismooth. Its gradient ∇φ(y) always exists, and the Hessian ∇²φ(y) exists almost everywhere:

  ∇φ(y) = b − σ A((A^*(y) − C + Y^k/σ)_+).

Apply Newton's method to maximize φ(y) with a sub-Hessian ∇²φ(y).

To avoid storing ∇²φ(y) in memory, use the conjugate gradient (CG) method to compute the Newton direction.
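The point of CG here is that it only needs matrix-vector products with the (negated, positive semidefinite) sub-Hessian, never the matrix itself. A plain CG loop as a sketch; the `hess_vec` callback is our placeholder for the operator application:

```python
import numpy as np

def cg(hess_vec, b, tol=1e-10, maxit=200):
    """Solve H x = b given only the map x -> H x (H symmetric positive definite)."""
    x = np.zeros_like(b)
    r = b - hess_vec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxit):
        Hp = hess_vec(p)
        alpha = rs / (p @ Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# sanity check on a small SPD system
H = np.array([[4.0, 1.0], [1.0, 3.0]])
x = cg(lambda v: H @ v, np.array([1.0, 2.0]))
assert np.allclose(H @ x, [1.0, 2.0])
```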
An Implicit Formula for the Sub-Hessian ∇²φ(y)

  φ(y) := b^T y − (σ/2)‖(A^*(y) − C + Y^k/σ)_+‖²

Compute the eigenvalue decomposition A^*(y) − C + Y^k/σ = QΛQ^T, where Λ = diag(λ_1, …, λ_N) with λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_N. Define Ω ∈ R^{N×N} by

  Ω(i,j) = 1                  if λ_i ≥ 0, λ_j ≥ 0,
           λ_i/(λ_i − λ_j)    if λ_i ≥ 0, λ_j < 0,
           λ_j/(λ_j − λ_i)    if λ_i < 0, λ_j ≥ 0,
           0                  otherwise.

Then

  ∇²φ_σ(y) = σ A(Q(Ω ∘ (Q^T A^*(y) Q))Q^T),

where ∘ denotes the Hadamard product.
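The map H ↦ Q(Ω ∘ (QᵀHQ))Qᵀ is precisely the (generalized) derivative of the projection (·)_+, which is why it appears in the sub-Hessian. A numpy sketch that checks this against finite differences; the function names are ours:

```python
import numpy as np

def omega(lam):
    """Omega(i,j) from the eigenvalues: 1 if both >= 0, 0 if both < 0,
    lam_i/(lam_i - lam_j) (and symmetrically) for mixed signs."""
    pos = np.maximum(lam, 0.0)
    dL = lam[:, None] - lam[None, :]
    dP = pos[:, None] - pos[None, :]
    same = (lam[:, None] >= 0).astype(float)   # value used when lam_i == lam_j
    return np.where(dL != 0.0, dP / np.where(dL != 0.0, dL, 1.0), same)

def dproj(A, H):
    """Directional derivative of the PSD projection at A along H:
    Q (Omega o (Q^T H Q)) Q^T."""
    lam, Q = np.linalg.eigh(A)
    return Q @ (omega(lam) * (Q.T @ H @ Q)) @ Q.T

def proj(M):
    lam, Q = np.linalg.eigh(M)
    return (Q * np.maximum(lam, 0.0)) @ Q.T

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
H = rng.standard_normal((5, 5)); H = (H + H.T) / 2

# compare against a central finite difference of the projection
t = 1e-6
fd = (proj(A + t * H) - proj(A - t * H)) / (2 * t)
assert np.allclose(fd, dproj(A, H), atol=1e-4)
```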
Software SDPNAL
SDPNAL (Zhao, Sun, and Toh (2010)) is an implementation of the Newton-CG augmented Lagrangian method.

The error of a computed minimizer x^* is measured as

  errsol = |f(x^*) − lbd| / max{1, |f(x^*)|},

where lbd is the computed lower bound. The error of a computed solution (X, y, Z) of the SDP relaxation is measured as

  errsdp = max{ |b^T y − ⟨C,X⟩| / (1 + |b^T y| + |⟨C,X⟩|),
                ‖A(X) − b‖₂ / (1 + ‖b‖₂),
                ‖A^*(y) + Z − C‖₂ / (1 + ‖C‖₂) }.
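For small dense problems, errsdp can be computed directly; a sketch assuming the constraints are given as a list of matrices A_i (our representation, not SDPNAL's data format):

```python
import numpy as np

def err_sdp(C, A_list, b, X, y, Z):
    """The accuracy measure above for a primal-dual SDP triple (X, y, Z)."""
    AX = np.array([np.tensordot(Ai, X) for Ai in A_list])   # A(X)
    Astar_y = sum(yi * Ai for yi, Ai in zip(y, A_list))     # A*(y)
    CX = float(np.tensordot(C, X))
    gap = abs(b @ y - CX) / (1 + abs(b @ y) + abs(CX))
    pinf = np.linalg.norm(AX - b) / (1 + np.linalg.norm(b))
    dinf = np.linalg.norm(Astar_y + Z - C) / (1 + np.linalg.norm(C))
    return max(gap, pinf, dinf)

# exact optimal pair for: min tr(X) s.t. X_11 = 1, X >= 0
C = np.eye(2)
A1 = np.zeros((2, 2)); A1[0, 0] = 1.0
X = np.diag([1.0, 0.0]); y = np.array([1.0]); Z = np.diag([0.0, 1.0])
assert err_sdp(C, [A1], np.array([1.0]), X, y, Z) == 0.0
```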
Unconstrained polynomial optimization
Minimize the following least-squares polynomial:

  ∑_{k=1}^3 (∑_{i=1}^n x_i^k − 1)² + ∑_{i=1}^n (x_{i−1}² + x_i² + x_{i+1}² − x_i³ − 1)²,

where x_0 = x_{n+1} = 0. For n = 16, the resulting SDP relaxation has size (N, m) = (969, 74612). Solving it by SDPNAL takes about 34 minutes. The computed SDP solution has error around 6·10⁻⁷. The computed lower bound is f_sos^{uc} ≈ 7.5586. The error of the SOS solution is about 2·10⁻⁶.
Random unconstrained polynomial optimization
We generate f(x) randomly as

  f(x) = f^T [x]_{2d−1} + [x^d]^T F^T F [x^d],

where f and F are a random vector/matrix of appropriate dimensions with Gaussian entries.

n    (N, m)           time (min, med, max)        errsol (min, med, max)     errsdp (min, med, max)
20   (231, 10625)     0:00:02, 0:00:04, 0:00:09   (4.1e-7, 1.0e-5, 1.6e-4)   (2.5e-7, 7.3e-7, 1.3e-6)
30   (496, 46375)     0:00:12, 0:00:21, 0:00:31   (1.3e-7, 5.6e-5, 1.5e-4)   (3.2e-7, 6.8e-7, 1.0e-6)
40   (861, 135750)    0:00:57, 0:01:10, 0:01:24   (7.8e-7, 1.2e-4, 3.1e-4)   (4.2e-7, 4.7e-7, 9.6e-7)
50   (1326, 316250)   0:02:44, 0:03:17, 0:04:08   (1.3e-5, 3.2e-5, 2.3e-4)   (5.6e-7, 6.4e-7, 8.3e-7)
60   (1891, 635375)   0:07:55, 0:08:49, 0:09:48   (4.6e-5, 1.8e-4, 5.1e-4)   (4.8e-7, 9.1e-7, 9.5e-7)
70   (2556, 1150625)  0:17:38, 0:19:34, 0:22:33   (8.0e-5, 2.8e-4, 3.3e-4)   (4.1e-7, 5.7e-7, 9.2e-7)
80   (3321, 1929500)  0:38:45, 0:38:48, 0:42:46   (9.3e-5, 2.7e-4, 9.6e-4)   (3.7e-7, 7.0e-7, 9.9e-7)
90   (4186, 3049500)  1:37:04, 1:46:57, 2:02:01   (1.1e-4, 2.7e-4, 6.4e-4)   (4.3e-7, 5.2e-7, 9.5e-7)
100  (5151, 4598125)  2:48:03, 2:55:34, 3:35:27   (2.1e-4, 2.6e-4, 4.5e-4)   (7.1e-7, 7.9e-7, 8.7e-7)

Table: Computational results for random polynomials of degree 4
Example: Constrained polynomial optimization
Minimize the quartic polynomial

  ∑_{1≤i<j≤n} (x_i x_j + x_i² x_j − x_j³ − x_i² x_j²)

over the hypercube [−1, 1]^n = {x ∈ R^n : x_i² ≤ 1}. We apply the 2nd Lasserre relaxation. For n = 50, solving the generated SDP takes about 2.8 hours. The error of its computed SDP solution is around 10⁻⁶. The computed lower bound is f_sos^{con} ≈ −1250. A local minimizer is

  x^* = (−1,−1,−1,−1,−1,−1,−1,−1,−1,−1,−1,−1,−1, 1, 1, 1,−1,−1,−1, 1, 1, 1,−1, 1,−1, 1,−1,−1, 1, 1, 1, 1,−1, 1,−1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1).

Its value is f(x^*) = −1248.
Random constrained polynomial optimization
Generate f(x) randomly as

  f(x) = ∑_{α∈N^n: |α|≤2d} f_α x^α,

where the coefficients f_α are Gaussian random variables.

The test results are as follows:

(n, 2d)  #Inst  time (min, med, max)           errsol (min, med, max)     errsdp (min, med, max)
(30, 4)  10     0:00:28, 0:00:52, 0:02:47      (5.6e-8, 1.3e-6, 6.9e-6)   (1.3e-7, 8.1e-7, 2.9e-6)
(40, 4)  10     0:03:35, 0:06:38, 0:10:32      (8.8e-8, 1.8e-6, 9.5e-6)   (2.2e-7, 1.0e-6, 4.5e-6)
(50, 4)  3      0:20:34, 0:22:18, 0:24:59      (5.7e-6, 5.6e-6, 7.0e-6)   (2.7e-6, 2.8e-6, 3.4e-6)
(60, 4)  3      0:35:02, 1:20:15, 1:20:38      (1.5e-7, 3.5e-6, 2.5e-5)   (1.7e-7, 1.7e-6, 1.2e-5)
(20, 6)  3      0:36:31, 0:49:17, 0:56:35      (8.5e-7, 2.7e-6, 4.4e-6)   (5.8e-7, 1.3e-6, 2.7e-6)
(12, 8)  3      0:27:11, 0:44:06, 0:59:30      (5.5e-7, 2.8e-6, 9.0e-6)   (9.0e-7, 1.3e-6, 4.2e-6)
(9, 10)  3      0:16:31, 0:36:05, 0:40:53      (2.6e-7, 3.3e-6, 1.4e-5)   (2.7e-7, 1.6e-6, 6.3e-6)
(80, 4)  3      10:52:30, 15:12:40, 15:57:30   (5.3e-6, 5.5e-6, 2.2e-1)   (2.6e-6, 2.6e-6, 2.7e-3)
(25, 6)  3      10:38:04, 11:00:48, 12:57:59   (5.9e-3, 6.6e-3, 1.4e-2)   (3.6e-3, 5.8e-3, 6.1e-3)

Table: Computational results for random constrained polynomial optimization
Scaling
The SDP relaxations for polynomial optimization are harder to solve than general SDP problems. If the optimal Z^* has rank 1, then Z^* = [x^*]_d [x^*]_d^T (x^* is a minimizer) has entries of the form

  1, x_1^*, …, (x_1^*)², …, …, (x_1^*)^{2d}, …, (x_n^*)^{2d},

which can span many orders of magnitude.

Let s = (s_1, …, s_n) > 0 be a scaling vector, and substitute x_i = s_i x̃_i. Then f(x) is replaced by the scaled polynomial f(s_1 x̃_1, …, s_n x̃_n) in x̃.
Restart Scaling
Step 1: If Algorithm Newton-CG converges well, do not scale, and return the solution; otherwise, select a scaling vector s = (s_1, …, s_n) > 0 as

  s_i = τ          if |y_{e_i}| ≤ τ,
        |y_{e_i}|  otherwise.

Here τ > 0 is fixed, y is the most recent dual solution of the SDP, and e_i denotes the exponent vector of the monomial x_i.

Step 2: Scale f(x) to f(s_1 x_1, …, s_n x_n). Go back to Step 1 and solve the scaled polynomial optimization again.
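Both steps are easy to sketch. Here the polynomial is represented (our choice, purely illustrative) as a dict mapping exponent tuples α to coefficients f_α:

```python
import numpy as np

def scaling_vector(y_unit, tau=1.0):
    """Step 1: s_i = tau if |y_{e_i}| <= tau, else |y_{e_i}|,
    where y_unit holds the dual values attached to the monomials x_i."""
    y = np.abs(np.asarray(y_unit, dtype=float))
    return np.where(y <= tau, tau, y)

def scale_coeffs(coeffs, s):
    """Step 2: replace f by f(s_1 x_1, ..., s_n x_n): each coefficient
    f_alpha is multiplied by s^alpha = prod_i s_i^alpha_i."""
    return {alpha: c * np.prod(s ** np.array(alpha)) for alpha, c in coeffs.items()}

s = scaling_vector([0.5, -3.0], tau=1.0)
assert np.allclose(s, [1.0, 3.0])
# f = x1^2 * x2  ->  coefficient scales by s1^2 * s2 = 3
scaled = scale_coeffs({(2, 1): 1.0}, s)
assert np.isclose(scaled[(2, 1)], 3.0)
```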
Example
Consider the polynomial optimization problem

  min_{x∈R^n}  x_1^4 + ⋯ + x_n^4 + ∑_{1≤i<j<k≤n} x_i x_j x_k.

For this kind of polynomial, the global minimum is a large negative value, which leads to ill-conditioning of the SDP relaxation. For n = 20, the iteration process is as follows:
Iter  time     lower bound   SDP error
1     0:01:15  -1.0806e+7    0.7555
2     0:01:15  -1.9444e+7    0.0460
3     0:00:37  -2.1883e+7    0.0082
4     0:01:16  -2.2266e+7    2.4e-6

Table: Scaling iteration process
Discussion
Our numerical experiments show that regularization methods are efficient for solving large-scale polynomial optimization problems. But some issues still need further study:

The inner CG iteration
The choice of scaling factor
Thank you!