Convex Optimization and Applications
Lin Xiao
Center for the Mathematics of Information, California Institute of Technology
Acknowledgment:
Stephen Boyd (Stanford) and Lieven Vandenberghe (UCLA) and their research groups
CS286a: Mathematics of Information seminar, 10/15/04
Two problems
polyhedron P described by linear inequalities, a_i^T x ≤ b_i, i = 1, …, L
[figure: the polyhedron P, with a facet normal a_1]
Problem 1: find minimum volume ellipsoid ⊇ P
Problem 2: find maximum volume ellipsoid ⊆ P
are these (computationally) difficult? or easy?
problem 1 is very difficult
• in practice
• in theory (NP-hard)
problem 2 is very easy
• in practice (readily solved on small computer)
• in theory (polynomial complexity)
Moral
very difficult and very easy problems can look quite similar
. . . unless we are trained to recognize the difference
Linear program (LP)
minimize    c^T x
subject to  a_i^T x ≤ b_i, i = 1, …, m
c, a_i ∈ R^n are parameters; x ∈ R^n is variable
• easy to solve, in theory and practice
• can solve dense problems with n = 1000 variables, m = 10000 constraints easily; far larger for sparse or structured problems
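To make this concrete, here is a minimal sketch (an editorial addition, not from the slides) of such an LP in Python with the CVXPY modeling package; A, b, c are random placeholders, and a box constraint keeps the random instance bounded.

```python
import cvxpy as cp
import numpy as np

# random placeholder data for: minimize c^T x  s.t.  a_i^T x <= b_i
np.random.seed(0)
m, n = 100, 30
A = np.random.randn(m, n)
b = A @ np.random.randn(n) + np.random.rand(m)   # guarantees a feasible point
c = np.random.randn(n)

x = cp.Variable(n)
constraints = [A @ x <= b, cp.norm(x, "inf") <= 10]  # box keeps instance bounded
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print(prob.status, prob.value)
```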
Polynomial minimization
minimize p(x)
p is polynomial of degree d; x ∈ R^n is variable
• except for special cases (e.g., d = 2) this is a very difficult problem
• even sparse problems with size n = 20, d = 10 are essentially intractable
• all algorithms known to solve this problem require effort exponential in n
Moral
• a problem can appear∗ hard, but be easy
• a problem can appear∗ easy, but be hard
∗ if we are not trained to recognize them
What makes a problem easy or hard?
classical view:
• linear is easy
• nonlinear is hard(er)
What makes a problem easy or hard?
emerging (and correct) view:
. . . the great watershed in optimization isn't between linearity and nonlinearity, but convexity and nonconvexity.
— R. Rockafellar, SIAM Review 1993
Convex optimization
minimize    f_0(x)
subject to  f_1(x) ≤ 0, …, f_m(x) ≤ 0, Ax = b
x ∈ R^n is the optimization variable; f_i : R^n → R are convex:

f_i(λx + (1−λ)y) ≤ λ f_i(x) + (1−λ) f_i(y)

for all x, y, 0 ≤ λ ≤ 1
• includes least-squares, linear programming, maximum volume ellipsoid in polyhedron, and many others
• convex problems are fundamentally tractable
Example: Robust LP
minimize    c^T x
subject to  Prob(a_i^T x ≤ b_i) ≥ η, i = 1, …, m

coefficient vectors a_i IID, N(ā_i, Σ_i); η is required reliability

• for fixed x, a_i^T x is N(ā_i^T x, x^T Σ_i x)

• so for η = 50%, robust LP reduces to LP

minimize    c^T x
subject to  ā_i^T x ≤ b_i, i = 1, …, m
and so is easily solved
• what about other values of η, e.g., η = 10%? η = 90%?
constraint Prob(a_i^T x ≤ b_i) ≥ η equivalent to

ā_i^T x + Φ^{-1}(η) ‖Σ_i^{1/2} x‖_2 − b_i ≤ 0
Φ is CDF of unit Gaussian
is LHS a convex function?
Hint
{x | Prob(a_i^T x ≤ b_i) ≥ η, i = 1, …, m}

[figures: this set for η = 10%, η = 50%, η = 90%]
That’s right
robust LP with reliability η = 90% is convex, and very easily solved
robust LP with reliability η = 10% is not convex, and extremely difficult
Maximum volume ellipsoid in polyhedron
• polyhedron: P = {x | a_i^T x ≤ b_i, i = 1, …, m}
• ellipsoid: E = {By + d | ‖y‖ ≤ 1}, with B = B^T ≻ 0

[figure: ellipsoid E, centered at d, inscribed in the polyhedron]
maximum volume E ⊆ P, as convex problem in variables B, d:
maximize    log det B
subject to  B = B^T ≻ 0, ‖B a_i‖ + a_i^T d ≤ b_i, i = 1, …, m
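A CVXPY sketch of this problem (our illustration, with a small hypothetical polyhedron as data); the log_det atom expresses the objective and a PSD matrix variable plays the role of B:

```python
import cvxpy as cp
import numpy as np

# hypothetical polyhedron a_i^T x <= b_i in R^2 (a box cut by a diagonal facet)
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0, 1.0, 1.5])
n = A.shape[1]

B = cp.Variable((n, n), PSD=True)   # symmetric PSD; the optimal B is definite
d = cp.Variable(n)
constraints = [cp.norm(B @ A[i]) + A[i] @ d <= b[i] for i in range(len(b))]
prob = cp.Problem(cp.Maximize(cp.log_det(B)), constraints)
prob.solve()
print("center d =", d.value)
```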
Moral
• it’s not easy to recognize convex functions and convex optimization problems
• huge benefit, though, when we do
Convex Analysis and Optimization
Convex analysis & optimization
nice properties of convex optimization problems known since 1960s
• local solutions are global
• duality theory, optimality conditions
• simple solution methods like alternating projections
convex analysis well developed by 1970s (Rockafellar)
• separating & supporting hyperplanes
• subgradient calculus
What’s new (since 1990 or so)
• primal-dual interior-point (IP) methods: extremely efficient, handle nonlinear large-scale problems, polynomial-time complexity results, software implementations

• new standard problem classes: generalizations of LP, with theory, algorithms, software

• extension to generalized inequalities: semidefinite and cone programming
Applications and uses
• lots of applications: control, combinatorial optimization, signal processing, circuit design, communications, machine learning, …

• robust optimization: robust versions of LP, least-squares, other problems

• relaxations and randomization: provide bounds, heuristics for solving hard (e.g., combinatorial optimization) problems
Recent history
• 1984–97: interior-point methods for LP
  – 1984: Karmarkar’s interior-point LP method
  – theory: Ye, Renegar, Kojima, Todd, Monteiro, Roos, …
  – practice: Wright, Mehrotra, Vanderbei, Shanno, Lustig, …

• 1988: Nesterov & Nemirovsky’s self-concordance analysis

• 1989–: LMIs and semidefinite programming in control

• 1990–: semidefinite programming in combinatorial optimization (Alizadeh, Goemans, Williamson, Lovasz & Schrijver, Parrilo, …)

• 1994: interior-point methods for nonlinear convex problems (Nesterov & Nemirovsky, Overton, Todd, Ye, Sturm, …)

• 1997–: robust optimization (Ben Tal, Nemirovsky, El Ghaoui, …)
New Standard Convex Problem Classes
Some new standard convex problem classes
• second-order cone program (SOCP)
• geometric program (GP) (and entropy problems)
• semidefinite program (SDP)
all these new problem classes have
• complete duality theory, similar to LP
• good algorithms, and robust, reliable software
• wide variety of new applications
Second-order cone program
second-order cone program (SOCP) has form
minimize    c_0^T x
subject to  ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i, i = 1, …, m

with variable x ∈ R^n

• includes LP and QP as special cases
• nondifferentiable when A_i x + b_i = 0
• new IP methods can solve (almost) as fast as LPs
Example: robust linear program
minimize    c^T x
subject to  Prob(a_i^T x ≤ b_i) ≥ η, i = 1, …, m

where a_i ∼ N(ā_i, Σ_i)
equivalent to
minimize    c^T x
subject to  ā_i^T x + Φ^{-1}(η) ‖Σ_i^{1/2} x‖_2 ≤ b_i, i = 1, …, m
where Φ is (unit) normal CDF
robust LP is an SOCP for η ≥ 0.5 (since Φ^{-1}(η) ≥ 0)
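A sketch of the SOCP form with placeholder data (our addition); Φ^{-1}(η) comes from scipy.stats.norm.ppf, and each chance constraint becomes one second-order cone constraint:

```python
import cvxpy as cp
import numpy as np
from scipy.stats import norm

np.random.seed(0)
m, n, eta = 5, 3, 0.9
abar = np.random.randn(m, n)                 # means of the a_i (placeholders)
Shalf = [0.1 * np.eye(n) for _ in range(m)]  # Sigma_i^{1/2}
b = np.ones(m)
c = np.random.randn(n)

k = norm.ppf(eta)                            # Phi^{-1}(eta) >= 0 since eta >= 0.5
x = cp.Variable(n)
cons = [abar[i] @ x + k * cp.norm(Shalf[i] @ x, 2) <= b[i] for i in range(m)]
cons.append(cp.norm(x, "inf") <= 10)         # keeps the random instance bounded
prob = cp.Problem(cp.Minimize(c @ x), cons)
prob.solve()
print(prob.value)
```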
Geometric program (GP)
log-sum-exp function:
lse(x) = log(e^{x_1} + ⋯ + e^{x_n})
. . . a smooth convex approximation of the max function
geometric program:
minimize    lse(A_0 x + b_0)
subject to  lse(A_i x + b_i) ≤ 0, i = 1, …, m

A_i ∈ R^{m_i×n}, b_i ∈ R^{m_i}; variable x ∈ R^n
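A tiny convex-form GP sketch (ours; the A_i, b_i are random placeholders, with the objective rows built in ± pairs so the random instance is bounded below), using CVXPY's log_sum_exp atom for lse:

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
n = 3
R = np.random.randn(2, n)
A0, b0 = np.vstack([R, -R]), np.random.randn(4)  # +/- pairs: bounded objective
A1, b1 = np.random.randn(2, n), np.random.randn(2)

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.log_sum_exp(A0 @ x + b0)),
                  [cp.log_sum_exp(A1 @ x + b1) <= 0])
prob.solve()
print(prob.value)
```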
Posynomial form geometric program
x = (x_1, …, x_n): vector of positive variables
function f of form
f(x) = Σ_{k=1}^{t} c_k x_1^{α_{1k}} x_2^{α_{2k}} ⋯ x_n^{α_{nk}}

with c_k ≥ 0, α_{ik} ∈ R, is called a posynomial
like polynomial, but
• coefficients must be positive
• exponents can be fractional or negative
Posynomial form geometric program
posynomial form GP:
minimize    f_0(x)
subject to  f_i(x) ≤ 1, i = 1, …, m

f_i are posynomials; x_i > 0 are the variables
to convert to (convex form) GP, express as
minimize    log f_0(e^y)
subject to  log f_i(e^y) ≤ 0, i = 1, …, m

(change of variables x_i = e^{y_i}); objective and constraints have form lse(A_i y + b_i)
Entropy problems
unnormalized negative entropy is convex function
−entr(x) = Σ_{i=1}^{n} x_i log(x_i / 1^T x)

defined for x_i ≥ 0, 1^T x > 0
entropy problem:
minimize    −entr(A_0 x + b_0)
subject to  −entr(A_i x + b_i) ≤ 0, i = 1, …, m

A_i ∈ R^{m_i×n}, b_i ∈ R^{m_i}
Solving GPs (and entropy problems)
• GP and entropy problems are duals (if we solve one, we solve the other)
• new IP methods can solve large-scale GPs (and entropy problems) almost as fast as LPs
• applications in many areas:
  – information theory, statistics
  – communications, wireless power control
  – digital and analog circuit design
Generalized inequalities
with a proper convex cone K ⊆ R^k we associate the generalized inequality

x ⪯_K y  ⟺  y − x ∈ K
convex optimization problem with generalized inequalities:
minimize    f_0(x)
subject to  f_1(x) ⪯_{K_1} 0, …, f_L(x) ⪯_{K_L} 0, Ax = b
f_i : R^n → R^{k_i} are K_i-convex: for all x, y, 0 ≤ λ ≤ 1,

f_i(λx + (1−λ)y) ⪯_{K_i} λ f_i(x) + (1−λ) f_i(y)
Semidefinite program
semidefinite program (SDP):
minimize    c^T x
subject to  x_1 A_1 + ⋯ + x_n A_n ⪯ B

• B, A_i are symmetric matrices; variable is x ∈ R^n
• ⪯ is matrix inequality; constraint is a linear matrix inequality (LMI)
• SDP can be expressed as convex problem with constraint

λ_max(x_1 A_1 + ⋯ + x_n A_n − B) ≤ 0
or handled directly as cone problem
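A minimal CVXPY sketch of this standard form (our addition; the symmetric A_i and B are placeholders, the affine matrix expression is symmetrized explicitly before the matrix-inequality operator, and a box constraint keeps the random instance bounded):

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
n, k = 3, 4
A = [np.random.randn(k, k) for _ in range(n)]
A = [(M + M.T) / 2 for M in A]          # symmetric data matrices A_i
B = np.eye(k)
c = np.ones(n)

x = cp.Variable(n)
S = sum(x[i] * A[i] for i in range(n))  # affine, symmetric by construction
lmi = (S + S.T) / 2 << B                # LMI: x_1 A_1 + ... + x_n A_n <= B
prob = cp.Problem(cp.Minimize(c @ x), [lmi, cp.norm(x, "inf") <= 10])
prob.solve()
print(prob.value)
```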
Semidefinite programming
• nearly complete duality theory, similar to LP
• interior-point algorithms that are efficient in theory & practice
• applications in many areas:
  – control theory
  – combinatorial optimization & graph theory
  – structural optimization
  – statistics
  – signal processing
  – circuit design
  – geometrical problems
  – communications and information theory
  – quantum computing
  – algebraic geometry
  – machine learning
Continuous time symmetric Markov chain on a graph
• connected graph G = (V, E), V = {1, . . . , n}, no self-loops
• Markov process on V, with symmetric rate matrix Q
  – Q_ij = 0 for (i, j) ∉ E, i ≠ j
  – Q_ij ≥ 0 for (i, j) ∈ E
  – Q_ii = −Σ_{j≠i} Q_ij
• eigenvalues of Q ordered as 0 = λ_1(Q) > λ_2(Q) ≥ ⋯ ≥ λ_n(Q)

• state distribution given by π(t) = e^{tQ} π(0)

• distribution converges to uniform with rate determined by λ_2:

‖π(t) − 1/n‖_tv ≤ (√n/2) e^{λ_2(Q) t}
Fastest mixing Markov chain (FMMC) problem
minimize    λ_2(Q)
subject to  Q1 = 0, Q = Q^T
            Q_ij = 0 for (i, j) ∉ E, i ≠ j
            Q_ij ≥ 0 for (i, j) ∈ E
            Σ_{(i,j)∈E} d_ij^2 Q_ij ≤ 1
• variable is matrix Q; problem data are the graph and the constants d_ij on E

• need to add constraint on rates since λ_2 is homogeneous

• optimal Q gives fastest diffusion process on graph (subject to rate constraint)
Fast diffusion processes
electrical

[figure: resistor network on the graph, with conductances g_ij on the edges]

mechanical

[figure: chain of masses coupled by springs, with stiffnesses k_ij]
Swap objective and constraint
• λ_2(Q) and Σ_{(i,j)∈E} d_ij^2 Q_ij are homogeneous

• hence, can just as well minimize weighted rate sum Σ_{(i,j)∈E} d_ij^2 Q_ij subject to a bound on λ_2(Q)
minimize    Σ_{(i,j)∈E} d_ij^2 Q_ij
subject to  Q1 = 0, Q = Q^T
            Q_ij = 0 for (i, j) ∉ E, i ≠ j
            Q_ij ≥ 0 for (i, j) ∈ E
            λ_2(Q) ≤ −1
SDP formulation of FMMC
using Q1 = 0, we have
λ_2(Q) ≤ −1  ⟺  Q − 11^T/n ⪯ −I
so FMMC reduces to SDP
minimize    Σ_{(i,j)∈E} d_ij^2 Q_ij
subject to  Q1 = 0, Q = Q^T
            Q_ij = 0 for (i, j) ∉ E, i ≠ j
            Q_ij ≥ 0 for (i, j) ∈ E
            Q − 11^T/n ⪯ −I
hence: can solve efficiently, duality theory, . . .
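A sketch of this SDP on a small hypothetical instance (ours; a path graph with unit d_ij), using a symmetric matrix variable in CVXPY:

```python
import cvxpy as cp
import numpy as np

# hypothetical instance: path graph on n = 5 nodes, all d_ij = 1
n = 5
edges = [(i, i + 1) for i in range(n - 1)]

Q = cp.Variable((n, n), symmetric=True)
ones = np.ones(n)
cons = [Q @ ones == 0,
        Q - np.outer(ones, ones) / n << -np.eye(n)]   # lambda_2(Q) <= -1
for i in range(n):
    for j in range(i + 1, n):
        cons.append(Q[i, j] >= 0 if (i, j) in edges else Q[i, j] == 0)

obj = cp.Minimize(sum(Q[i, j] for (i, j) in edges))    # sum d_ij^2 Q_ij, d_ij = 1
prob = cp.Problem(obj, cons)
prob.solve()
print("optimal weighted rate sum:", prob.value)
```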
Robust Optimization
Robust optimization problem
robust optimization problem:
minimize    max_{a∈A} f_0(a, x)
subject to  max_{a∈A} f_i(a, x) ≤ 0, i = 1, …, m
• x is optimization variable
• a is uncertain parameter
• A is parameter set (e.g., box, polyhedron, ellipsoid)
heuristics (detuning, sensitivity penalty term) have been used since the beginning of optimization
Robust optimization via convex optimization
El Ghaoui, Ben Tal & Nemirovsky, . . .
• robust versions of LP, QP, SOCP problems, with ellipsoidal or polyhedral uncertainty, can be formulated as SDPs (or simpler)

• other robust problems (e.g., SDP) intractable, but there are good convex approximations
Robust least-squares
robust LS problem with uncertain parameters v_1, …, v_p:

minimize    sup_{‖v‖≤1} ‖(A_0 + v_1 A_1 + ⋯ + v_p A_p)x − b‖_2
equivalent SDP (variables x, t_1, t_2):

minimize    t_1 + t_2
subject to  [ I        P(x)    q(x) ]
            [ P(x)^T   t_1 I   0    ]  ⪰ 0
            [ q(x)^T   0       t_2  ]

where P(x) = [ A_1 x  A_2 x  ⋯  A_p x ] ∈ R^{m×p},  q(x) = A_0 x − b
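A sketch of this SDP in CVXPY (ours, with random placeholder sizes and data); the block matrix is assembled with bmat and symmetrized explicitly before the PSD constraint:

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
m, nx, p = 6, 3, 2                      # hypothetical sizes
A0 = 10 * np.random.randn(m, nx)
As = [np.random.randn(m, nx) for _ in range(p)]
b = np.random.randn(m)

x = cp.Variable(nx)
t1, t2 = cp.Variable(), cp.Variable()
P = cp.hstack([cp.reshape(Ai @ x, (m, 1)) for Ai in As])   # P(x), m x p
q = cp.reshape(A0 @ x - b, (m, 1))                         # q(x), m x 1
M = cp.bmat([[np.eye(m), P,                q],
             [P.T,       t1 * np.eye(p),   np.zeros((p, 1))],
             [q.T,       np.zeros((1, p)), cp.reshape(t2, (1, 1))]])
prob = cp.Problem(cp.Minimize(t1 + t2), [(M + M.T) / 2 >> 0])  # M is symmetric
prob.solve()
print("robust LS value:", prob.value)
```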
example: minimize sup_{‖v‖≤ρ} ‖(A_0 + v_1 A_1 + v_2 A_2)x − b‖_2
[figure: worst-case residual vs. ρ for the LS solution and the robust LS solution (‖A_0‖ = 10, ‖A_1‖ = ‖A_2‖ = 1)]
example: minimize sup_{‖v‖≤1} ‖(A_0 + v_1 A_1 + v_2 A_2)x − b‖_2
[figure: frequency distributions of the residual ‖(A_0 + v_1 A_1 + v_2 A_2)x − b‖_2 for x_ls, x_tych, x_rls, assuming v is uniformly distributed]
Relaxations & Randomization
Relaxations & randomization
convex optimization is increasingly used
• to find good bounds for hard (i.e., nonconvex) problems, via relaxation
• as a heuristic for finding good suboptimal points, often via randomization
Example: Boolean least-squares
Boolean least-squares problem:
minimize    ‖Ax − b‖^2
subject to  x_i^2 = 1, i = 1, …, n

• basic problem in digital communications
• could check all 2^n possible values of x . . .
• an NP-hard problem, and very hard in practice
• many heuristics for approximate solution
Boolean least-squares as matrix problem
‖Ax − b‖^2 = x^T A^T A x − 2 b^T A x + b^T b
           = Tr(A^T A X) − 2 b^T A x + b^T b

where X = xx^T
hence can express BLS as
minimize    Tr(A^T A X) − 2 b^T A x + b^T b
subject to  X_ii = 1, X ⪰ xx^T, rank(X) = 1
. . . still a very hard problem
SDP relaxation for BLS
ignore rank-one constraint, and use

X ⪰ xx^T  ⟺  [ X    x ]
              [ x^T  1 ]  ⪰ 0
to obtain SDP relaxation (with variables X, x)

minimize    Tr(A^T A X) − 2 b^T A x + b^T b
subject to  X_ii = 1,  [ X    x ]
                       [ x^T  1 ]  ⪰ 0
• optimal value of SDP gives lower bound for BLS
• if optimal matrix is rank one, we’re done
Interpretation via randomization
• can think of variables X, x in SDP relaxation as defining a normal distribution z ∼ N(x, X − xx^T), with E z_i^2 = 1

• SDP objective is E ‖Az − b‖^2
suggests randomized method for BLS:
• find X⋆, x⋆, optimal for SDP relaxation

• generate z from N(x⋆, X⋆ − x⋆(x⋆)^T)

• take x = sgn(z) as approximate solution of BLS (can repeat many times and take best one)
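A compact sketch of both steps, relaxation and randomized rounding (ours; sizes are scaled down from the 150 × 100 example on the next slide, and a small ridge is added to the covariance for numerical safety):

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
m, n = 15, 10                       # small stand-in for the 150 x 100 example
A, b = np.random.randn(m, n), np.random.randn(m)

# SDP relaxation
X = cp.Variable((n, n), symmetric=True)
x = cp.Variable(n)
M = cp.bmat([[X, cp.reshape(x, (n, 1))],
             [cp.reshape(x, (1, n)), np.ones((1, 1))]])
obj = cp.trace(A.T @ A @ X) - 2 * b @ (A @ x) + b @ b
prob = cp.Problem(cp.Minimize(obj), [cp.diag(X) == 1, (M + M.T) / 2 >> 0])
prob.solve()

# randomized rounding: z ~ N(x*, X* - x* x*^T), take sign(z)
mean = x.value
cov = X.value - np.outer(mean, mean) + 1e-9 * np.eye(n)   # ridge for numerics
best = np.inf
for _ in range(100):
    xb = np.sign(np.random.multivariate_normal(mean, cov))
    best = min(best, np.linalg.norm(A @ xb - b) ** 2)
print("SDP lower bound:", prob.value, " best rounded value:", best)
```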
Example
• (randomly chosen) parameters A ∈ R^{150×100}, b ∈ R^{150}

• x ∈ R^{100}, so feasible set has 2^{100} ≈ 10^{30} points

LS approximate solution: minimize ‖Ax − b‖ s.t. ‖x‖^2 = n, then round
yields objective 8.7% over SDP relaxation bound
randomized method: (using SDP optimal distribution)
• best of 20 samples: 3.1% over SDP bound
• best of 1000 samples: 2.6% over SDP bound
[figure: frequency distribution of ‖Ax − b‖/(SDP bound) over the randomized samples, with the SDP bound and the LS solution marked]
Calculus of Convex Functions
Approach
• basic examples or atoms
• calculus rules or transformations that preserve convexity
Convex functions: Basic examples
• x^p for p ≥ 1 or p ≤ 0; −x^p for 0 ≤ p ≤ 1

• e^x, −log x, x log x

• x^T x; x^T x/y (for y > 0); (x^T x)^{1/2}

• ‖x‖ (any norm)

• max(x_1, …, x_n), log(e^{x_1} + ⋯ + e^{x_n})

• −log Φ(x) (Φ is Gaussian CDF)

• log det X^{-1} (for X ≻ 0)
Calculus rules
• convexity preserved under sums, nonnegative scaling

• if f cvx, then g(x) = f(Ax + b) cvx

• pointwise sup: if f_α cvx for each α ∈ A, then g(x) = sup_{α∈A} f_α(x) cvx

• minimization: if f(x, y) cvx, then g(x) = inf_y f(x, y) cvx

• composition rules: if h cvx & increasing, f cvx, then g(x) = h(f(x)) cvx

• perspective transformation: if f cvx, then g(x, t) = t f(x/t) cvx for t > 0
. . . and many, many others
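These atoms and rules are exactly what CVXPY's DCP analyzer mechanizes; as a quick illustration (our addition), the curvature attribute certifies convexity of expressions built from the atoms and rules above:

```python
import cvxpy as cp

x = cp.Variable(3)
y = cp.Variable(pos=True)

# each expression below is certified convex by CVXPY's DCP rules
exprs = {
    "norm(x)":          cp.norm(x, 2),
    "log_sum_exp(x)":   cp.log_sum_exp(x),
    "x^T x / y":        cp.quad_over_lin(x, y),      # y > 0
    "f(Ax + b)":        cp.norm(2 * x + 1, 2),       # affine pre-composition
    "pointwise max":    cp.maximum(x[0], cp.exp(x[1])),
    "h(f(x))":          cp.exp(cp.norm(x, 2)),       # cvx increasing of cvx
    "sum_largest(x,2)": cp.sum_largest(x, 2),
}
for name, e in exprs.items():
    print(f"{name:18s} curvature: {e.curvature}")    # all print CONVEX
```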
More examples
• λ_max(X) (for X = X^T)

• f(x) = x_[1] + ⋯ + x_[k] (sum of largest k elements of x)

• −Σ_{i=1}^m log(−f_i(x)) (on {x | f_i(x) < 0}; f_i cvx)

• f(x) = −log Prob(x + z ∈ C) (C convex, z ∼ N(0, Σ))

• x^T Y^{-1} x is cvx in (x, Y) for Y = Y^T ≻ 0
Duality
Lagrangian and dual function
primal problem:
minimize    f_0(x)
subject to  f_i(x) ≤ 0, i = 1, …, m

Lagrangian: L(x, λ) = f_0(x) + Σ_{i=1}^m λ_i f_i(x)

dual function: g(λ) = inf_x L(x, λ)
Lower bound property
• for any primal feasible x and any λ ≥ 0, we have g(λ) ≤ f_0(x)

• hence for any λ ≥ 0, g(λ) ≤ p⋆ (optimal value of primal problem)

• duality gap of x, λ is defined as f_0(x) − g(λ) (gap is nonnegative; bounds suboptimality of x)
Dual problem
find best lower bound on p⋆:

maximize    g(λ)
subject to  λ_i ≥ 0, i = 1, …, m
p⋆ is optimal value of primal, and d⋆ is optimal value of dual

• weak duality: even when primal not convex, d⋆ ≤ p⋆

• for convex primal problem, we have strong duality: d⋆ = p⋆ (provided technical condition holds)
Example: linear program
primal:
minimize    c^T x
subject to  Ax ⪯ b

dual:
maximize    −b^T λ
subject to  A^T λ + c = 0, λ ⪰ 0
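A numerical check of this primal-dual pair (our sketch, with random placeholder data; the box rows keep the LP bounded and make x = 0 strictly feasible); CVXPY exposes the optimal λ via dual_value:

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
m, n = 5, 3
A = np.vstack([np.random.randn(m, n), np.eye(n), -np.eye(n)])
b = np.ones(m + 2 * n)              # x = 0 strictly feasible; box rows bound LP
c = np.random.randn(n)

x = cp.Variable(n)
ineq = A @ x <= b
prob = cp.Problem(cp.Minimize(c @ x), [ineq])
prob.solve()

lam = ineq.dual_value                # optimal dual variable, lambda* >= 0
print("p* =", prob.value)
print("d* =", -b @ lam)              # equals p* (strong duality)
print("residual A^T lam + c:", A.T @ lam + c)   # ~ 0 at optimality
```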
Example: unconstrained geometric program
primal:
minimize    log( Σ_{i=1}^m exp(a_i^T x + b_i) )

dual:
maximize    b^T ν − Σ_{i=1}^m ν_i log ν_i
subject to  1^T ν = 1, A^T ν = 0, ν ⪰ 0
. . . an entropy maximization problem
Example: duality between FMMC and MVU
FMMC problem
minimize    Σ_{(i,j)∈E} d_ij^2 Q_ij
subject to  Q1 = 0, Q = Q^T
            Q_ij = 0 for (i, j) ∉ E, i ≠ j
            Q_ij ≥ 0 for (i, j) ∈ E
            Q − 11^T/n ⪯ −I
dual of FMMC problem

maximize    Tr X
subject to  X_ii + X_jj − X_ij − X_ji ≤ d_ij^2, (i, j) ∈ E
            X1 = 0, X ⪰ 0
FMMC dual as maximum-variance unfolding
• use variables x_1, …, x_n ∈ R^n, with X = [x_1 ⋯ x_n]^T [x_1 ⋯ x_n], i.e., X_ij = x_i^T x_j
• dual FMMC problem becomes

maximize    Σ_{i=1}^n ‖x_i‖^2
subject to  Σ_i x_i = 0, ‖x_i − x_j‖ ≤ d_ij, (i, j) ∈ E
• position n points in R^n to maximize variance, respecting local distance constraints, i.e., maximum-variance unfolding problem
Semidefinite embedding
• similar to semidefinite embedding for unsupervised learning of manifolds (which has distance equality constraints) (Weinberger & Saul 2003)
• surprise: fastest diffusion on graph, max-variance unfolding are duals
Interior-Point Methods
Interior-point methods
• handle linear and nonlinear convex problems (Nesterov & Nemirovsky)

• based on Newton’s method applied to ‘barrier’ functions that trap x in interior of feasible region (hence the name IP)

• worst-case complexity theory: # Newton steps ∼ √(problem size)

• in practice: # Newton steps between 20 & 50 (!), over a wide range of problem dimensions, types, and data

• 1000 variables, 10000 constraints feasible on PC; far larger if structure is exploited
• readily available (commercial and noncommercial) packages
Log barrier
for convex problem
minimize    f_0(x)
subject to  f_i(x) ≤ 0, i = 1, …, m

we define logarithmic barrier as

φ(x) = −Σ_{i=1}^m log(−f_i(x))
• φ is convex, smooth on interior of feasible set
• φ → ∞ as x approaches boundary of feasible set
Central path
central path is curve
x⋆(t) = argmin_x ( t f_0(x) + φ(x) ), t ≥ 0

• x⋆(t) is strictly feasible, i.e., f_i(x⋆(t)) < 0

• x⋆(t) can be computed by, e.g., Newton’s method

• intuition suggests x⋆(t) converges to optimal as t → ∞
• using duality can prove x⋆(t) is m/t-suboptimal
Central path & duality
from

∇_x( t f_0(x⋆) + φ(x⋆) ) = t ∇f_0(x⋆) + Σ_{i=1}^m (1/(−f_i(x⋆))) ∇f_i(x⋆) = 0

we find that

λ_i⋆(t) = 1/(−t f_i(x⋆)), i = 1, …, m

is dual feasible, with g(λ⋆(t)) = f_0(x⋆) − m/t
• duality gap associated with pair x⋆(t), λ⋆(t) is m/t
• hence, x⋆(t) is m/t-suboptimal
Example: central path for LP
x⋆(t) = argmin_x ( t c^T x − Σ_{i=1}^6 log(b_i − a_i^T x) )

[figure: central path of an LP over a polyhedron, from the analytic center x⋆(0) through x⋆(10) toward x_opt, with objective direction c shown]
Barrier method
a.k.a. path-following method
given strictly feasible x, t > 0, µ > 1
repeat
1. compute x := x⋆(t) (using Newton’s method, starting from x)
2. exit if m/t < tol
3. t := µt
duality gap reduced by µ each outer iteration
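The loop above is short enough to spell out in full. Below is a minimal NumPy sketch (ours, not from the slides) of the barrier method for an LP, minimize c^T x subject to Ax ≤ b, with damped-Newton centering; the instance data are random placeholders.

```python
import numpy as np

def newton_centering(A, b, c, x, t, tol=1e-8, max_iter=50):
    """Minimize t*c^T x - sum_i log(b_i - a_i^T x) by damped Newton."""
    for _ in range(max_iter):
        d = b - A @ x                        # slacks, must stay > 0
        grad = t * c + A.T @ (1.0 / d)
        H = A.T @ np.diag(1.0 / d**2) @ A
        dx = np.linalg.solve(H, -grad)
        lam2 = -grad @ dx                    # Newton decrement squared
        if lam2 / 2 < tol:
            break
        s = 1.0                              # backtracking line search,
        while np.min(b - A @ (x + s * dx)) <= 0:   # keeping strict feasibility
            s *= 0.5
        while (t * c @ (x + s * dx) - np.sum(np.log(b - A @ (x + s * dx)))
               > t * c @ x - np.sum(np.log(d)) - 0.25 * s * lam2):
            s *= 0.5
        x = x + s * dx
    return x

def barrier_lp(A, b, c, x0, mu=20.0, t0=1.0, tol=1e-6):
    """Barrier method: duality gap bound m/t is reduced by mu each outer step."""
    x, t, m = x0, t0, A.shape[0]
    while m / t >= tol:
        x = newton_centering(A, b, c, x, t)  # x := x*(t)
        t *= mu
    return x

# random instance: box rows keep it bounded, x0 = 0 strictly feasible
np.random.seed(0)
A = np.vstack([np.random.randn(8, 3), np.eye(3), -np.eye(3)])
b = np.ones(A.shape[0])
c = np.random.randn(3)
print("optimal x:", barrier_lp(A, b, c, np.zeros(3)))
```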
Trade-off in choice of µ
large µ means
• fast duality gap reduction (fewer outer iterations), but
• many Newton steps to compute x⋆(t⁺) (more Newton steps per outer iteration)
total effort measured by total number of Newton steps
Typical example
GP with n = 50 variables, m = 100 constraints, m_i = 5

• wide range of µ works well
• very typical behavior (even for large m, n)
[figure: duality gap vs. Newton iterations for µ = 2, µ = 50, µ = 150]
Effect of µ
[figure: total Newton iterations vs. µ]

barrier method works well for µ over a large range
Typical convergence of IP method
[figure: duality gap vs. # Newton steps for LP, GP, SOCP, and SDP instances]
LP, GP, SOCP, SDP with 100 variables
Typical effort versus problem dimensions
• LPs with n variables, 2n constraints
• 100 instances for each of 20 problem sizes
• avg & std dev shown
[figure: Newton steps vs. n (log scale), average and standard deviation over the 100 instances]
Complexity analysis
• based on self-concordance (Nesterov & Nemirovsky, 1988)
• for any choice of µ, # steps is O(m log(1/ε)), where ε is final accuracy

• to optimize complexity bound, can take µ = 1 + 1/√m, which yields # steps O(√m log(1/ε))
• in any case, IP methods work extremely well in practice
Computational effort per Newton step
• Newton step effort dominated by solving linear equations to find primal-dual search direction

• equations inherit structure from underlying problem (e.g., sparsity, symmetry, Toeplitz, circulant, Hankel, Kronecker)
• equations same as for least-squares problem of similar size and structure
conclusion:
we can solve a convex problem with about the same effort as solving 20–50 least-squares problems
Other interior-point methods
more sophisticated IP algorithms
• primal-dual, incomplete centering, infeasible start
• use same ideas, e.g., central path, log barrier
• readily available (commercial and noncommercial) packages

typical performance: 20–50 Newton steps (!), over a wide range of problem dimensions, problem types, and problem data
Exploiting structure
sparsity
• well developed, since late 1970s
• direct (sparse factorizations) and iterative methods (CG, LSQR)
• standard in general purpose LP, QP, GP, SOCP implementations
• can solve problems with 10^5, 10^6 variables and constraints (depending on sparsity pattern)
symmetry
• reduce number of variables
• reduce size of matrices (particularly helpful for SDP)
A return to subgradient-type methods
• very large-scale problems: 10^5–10^7 variables
• applications: medical imaging, shape design of mechanical structures, machine learning and data mining, …
• IP methods out of consideration (even a single iteration is prohibitive); can use only first-order information (function values and subgradients)

• a reasonable algorithm should have
  – computational effort per iteration at most linear in design dimension
  – potential to obtain (and willingness to accept) medium-accuracy solutions
  – error reduction factor essentially independent of problem dimension
Ben Tal & Nemirovsky, Nesterov, . . .
Conclusions
Conclusions
convex optimization
• theory fairly mature; practice has advanced tremendously over the last decade
• qualitatively different from general nonlinear programming
• cost only 30× more than least-squares, but far more expressive
• lots of applications still to be discovered
Some references
• Convex Optimization, Boyd & Vandenberghe, 2004. www.stanford.edu/~boyd/cvxbook.html (pdf of full text on web)
• Introductory Lectures on Convex Optimization, Nesterov, 2003
• Lectures on Modern Convex Optimization, Ben Tal & Nemirovsky, 2001
• Interior-Point Polynomial Algorithms in Convex Programming, Nesterov & Nemirovsky, 1994
• Linear Matrix Inequalities in System and Control Theory, Boyd, El Ghaoui, Feron, & Balakrishnan, 1994