Convex Optimization in Communications and Signal Processing
Prof. Dr.-Ing. Wolfgang Gerstacker 1
University of Erlangen-Nürnberg, Institute for Digital Communications
National Technical University of Ukraine, KPI, Kiev, April 2015
1 These lecture slides are largely based on the book "Convex Optimization" by Stephen Boyd and Lieven Vandenberghe and the corresponding slides. Many thanks to Prof. Boyd for the permission to use his materials for this course.
1 Introduction
2 Convex sets
3 Convex functions
4 Convex optimization problems
5 Duality
6 Applications in communications
7 Algorithms – Equality constrained minimization
8 Conclusions
Prof. Dr.-Ing. Wolfgang Gerstacker – Convex Optimization (based on materials by S. Boyd)
1. Introduction
Introduction
mathematical optimization
least-squares and linear programming
convex optimization
course goals and topics
nonlinear optimization
brief history of convex optimization
Mathematical optimization
(mathematical) optimization problem
minimize f0(x)
subject to fi(x) ≤ bi, i = 1, ...,m
x = (x1, ..., xn): optimization variables
f0 : Rn → R: objective function
fi : Rn → R, i = 1, ...,m: constraint functions
an optimal solution x∗ has the smallest value of f0 among all vectors that satisfy the constraints
Examples
portfolio optimization
variables: amounts invested in different assets
constraints: budget, max./min. investment per asset, minimum return
objective: overall risk or return variance
device sizing in electronic circuits
variables: device widths and lengths
constraints: manufacturing limits, timing requirements, maximum area
objective: power consumption
data fitting
variables: model parameters
constraints: prior information, parameter limits
objective: measure of misfit or prediction error
Examples (communications and signal processing)
channel estimation
detection
filter design
beamformer design
network optimization
power control
. . .
Solving optimization problems
general optimization problem
very difficult to solve
methods involve some compromise, e.g., very long computation time, or not always finding the solution
exceptions: certain problem classes can be solved efficiently and reliably
least-squares problems
linear programming problems
convex optimization problems
Least-squares I
minimize ‖Ax − b‖22
interpretation
the linear system of equations Ax = b can also be written as aTi x = bi, i = 1, . . . , k, where the aTi are the rows of A ∈ Rk×n and the bi are the entries of b ∈ Rk
‖Ax − b‖22 is equal to ∑i=1,...,k (aTi x − bi)2, i.e. the sum of squared equation errors
solving least-squares problems
analytical solution: x∗ = (ATA)−1ATb
reliable and efficient algorithms and software
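As a quick numerical sketch (not part of the original slides, data chosen arbitrarily), the analytical solution can be compared against a library least-squares solver; in practice one prefers a factorization-based solver over forming (ATA)−1 explicitly.

```python
import numpy as np

# Solve minimize ||Ax - b||2^2 via the normal equations
# x* = (A^T A)^{-1} A^T b, assuming A has full column rank,
# and compare with numpy's least-squares solver.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # k = 20 equations, n = 5 unknowns
b = rng.standard_normal(20)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # normal-equations solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # numerically preferred route

assert np.allclose(x_normal, x_lstsq)
assert np.allclose(A.T @ (A @ x_normal - b), 0)   # gradient vanishes at x*
```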
Least-squares II
computation time proportional to n2k (A ∈ Rk×n); less if structured
a mature technology
using least-squares
least-squares problems are easy to recognize
a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
Calculation of least-squares solution I
cost function:
J(x) = ‖Ax − b‖22 = (Ax − b)T(Ax − b) = xTATAx − bTAx − xTATb + bTb
we need the gradient vector
∂J/∂x = [ ∂J/∂x1 ∂J/∂x2 . . . ∂J/∂xn ]T
differentiation rules for some simple cost functions:
J(x) = cTx = c1x1 + . . . + cnxn ⟹ ∂J/∂x = [ c1 . . . cn ]T = c
J(x) = xTCx ⟹ ∂J/∂x = 2Cx (for symmetric C)
Calculation of least-squares solution II
⟹ for the least-squares cost function we get
∂J/∂x = 2ATAx − 2ATb
setting the gradient vector to zero yields the solution
x∗ = (ATA)−1ATb
this is the global minimum since the function is differentiable, there is only one local extremum, and xTATAx = (Ax)T(Ax) ≥ 0 ∀x
Modified least-squares cost functions
weighted cost function
J(x) = ∑i=1,...,k wi (aTi x − bi)2
with nonnegative weight factors wi
regularized cost function
J(x) = ∑i=1,...,k (aTi x − bi)2 + ρ ∑i=1,...,n xi2
both cost functions can be minimized with similar calculations as the original least-squares cost function
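A numerical sketch of those similar calculations (not from the slides; data arbitrary): the closed-form minimizers of the weighted and regularized cost functions, checked by verifying that their gradients vanish.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

# weighted: x* = (A^T W A)^{-1} A^T W b with W = diag(w1, ..., wk)
w = rng.random(20)
W = np.diag(w)
x_w = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
assert np.allclose(A.T @ W @ (A @ x_w - b), 0)   # gradient of weighted cost is zero

# regularized: x* = (A^T A + rho I)^{-1} A^T b
rho = 0.1
x_r = np.linalg.solve(A.T @ A + rho * np.eye(5), A.T @ b)
assert np.allclose(A.T @ (A @ x_r - b) + rho * x_r, 0)   # gradient of regularized cost is zero
```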
Linear programming
minimize cTx
subject to aTi x ≤ bi, i = 1, ...,m
solving linear programs
no analytical formula for the solution
reliable and efficient algorithms and software (e.g. the simplex algorithm by Dantzig)
computation time proportional to n2m if m ≥ n; less if structured
a mature technology
using linear programming
not as easy to recognize as least-squares problems
a few standard tricks used to convert problems into linear programs (e.g., problems involving ℓ1- or ℓ∞-norms, piecewise-linear functions)
example of a linear programming problem: the Chebyshev approximation problem
minimize maxi=1,...,m |aTi x − bi|
Problem is equivalent to the linear program
minimize t
subject to aTi x − t ≤ bi, i = 1, ...,m
−aTi x − t ≤ −bi, i = 1, ...,m
−t ≤ 0
with variables x and t
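The LP above can be handed to any LP solver; a sketch with scipy's linprog (not from the slides, problem data arbitrary), stacking the variables as [x, t]:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 4))
b = rng.standard_normal(30)
m, n = A.shape

c = np.r_[np.zeros(n), 1.0]                    # objective: minimize t
A_ub = np.block([[A, -np.ones((m, 1))],        # a_i^T x - t <= b_i
                 [-A, -np.ones((m, 1))]])      # -a_i^T x - t <= -b_i
b_ub = np.r_[b, -b]
bounds = [(None, None)] * n + [(0, None)]      # x free, t >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
x_opt, t_opt = res.x[:n], res.x[n]

# at the optimum, t equals the largest absolute residual
assert res.success
assert np.isclose(np.max(np.abs(A @ x_opt - b)), t_opt, atol=1e-6)
```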
Convex optimization problem
minimize f0(x)
subject to fi(x) ≤ bi, i = 1, ...,m
objective and constraint functions are convex:
fi(αx + βy) ≤ αfi(x) + βfi(y)
if α + β = 1, α ≥ 0, β ≥ 0
includes least-squares problems and linear programs as special cases
solving convex optimization problems
no analytical solution
reliable and efficient algorithms (e.g. interior point algorithms)
computation time (roughly) proportional to max{n3, n2m, F}, where F is the cost of evaluating the fi's and their first- and second-order derivatives
almost a technology for subclasses of convex problems like second-order cone programming
using convex optimization
often difficult to recognize (is a given function convex?)
many tricks for transforming problems into convex form
surprisingly many problems can be solved via convex optimization
in particular, during the last 10-15 years, a variety of problems in communications and signal processing could be solved via convex optimization
Course goals and topics
goals
1 recognize/formulate problems as convex optimization problems
2 characterize optimal solution (optimal power distribution), give limits of performance, etc.
3 apply techniques to problems in communications and signal processing
4 understand the basic principles of convex optimization algorithms
topics
1 convex sets, functions, optimization problems
2 examples and applications
3 algorithms
Nonlinear optimization
traditional techniques for general nonconvex problems involve compromises
local optimization methods (nonlinear programming)
find a point that minimizes f0 among feasible points near it
fast, can handle large problems
require initial guess
provide no information about distance to (global) optimum
example: gradient search
global optimization methods
find the (global) solution
worst-case complexity grows exponentially with problem size
these algorithms are often based on solving convex subproblems
Brief history of convex optimization I
theory (convex analysis): ca. 1900 - 1970
algorithms
1947: simplex algorithm for linear programming(Dantzig)
1960s: early interior-point methods (Fiacco &McCormick, Dikin, . . . )
1970s: ellipsoid method and other subgradientmethods
1980s: polynomial-time interior-point methods forlinear programming (Karmarkar 1984)
late 1980s–now: polynomial-time interior-point methods for nonlinear convex optimization (Nesterov & Nemirovski 1994)
Brief history of convex optimization II
applications
before 1990: mostly in operations research; few in engineering
since 1990: many new applications in engineering (control, signal processing, communications, circuit design, . . . ); new problem classes (semidefinite and second-order cone programming, robust optimization)
2. Convex sets
Convex sets
vector spaces and subspaces
affine and convex sets
some important examples
operations that preserve convexity
separating and supporting hyperplanes
Vector spaces and subspaces
variables to be optimized are typically collected in vectors
⟹ we need the concept of a vector space from mathematics: a collection of vectors for which vector addition and multiplication of vectors with scalars are defined; these operations must fulfill a number of requirements (axioms) such as associativity of addition, commutativity of addition, etc.
one important property of a vector space is closure: the result of addition and scalar multiplication, respectively, also belongs to the vector space
subspace: subset of a vector space that is closed under addition and scalar multiplication
Affine set
line through x1,x2: all points
x = θx1 + (1− θ)x2 = x2 + θ (x1 − x2) (θ ∈ R)
affine set: contains the line through any two distinct points in the set
example: solution set of linear equations {x | Ax = b}
assume that we have two solutions x1, x2
Ax1 = b, Ax2 = b
⟹ θAx1 = θb, (1 − θ)Ax2 = (1 − θ)b
A(θx1 + (1 − θ)x2) = (θ + (1 − θ))b = b
θx1 + (1 − θ)x2, θ ∈ R, is also a solution, so the set is affine
(conversely, every affine set can be expressed as the solution set of a system of linear equations)
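A numerical illustration of the argument above (not from the slides; data arbitrary): for an underdetermined system Ax = b, affine combinations of two solutions solve the system again, for any real θ.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))           # generically rank 3, two-dimensional null space
x1 = rng.standard_normal(5)
b = A @ x1

# build a second solution by adding a null-space vector of A;
# the last right-singular vectors span null(A)
Vh = np.linalg.svd(A)[2]
x2 = x1 + Vh[3:].T @ rng.standard_normal(2)
assert np.allclose(A @ x2, b)

theta = 2.7                               # any real theta, not just [0, 1]: the set is affine
x = theta * x1 + (1 - theta) * x2
assert np.allclose(A @ x, b)
```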
Convex set
line segment between x1 and x2: all points
x = θx1 + (1 − θ)x2
with 0 ≤ θ ≤ 1
convex set: contains the line segment between any two points in the set
x1, x2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θx1 + (1 − θ)x2 ∈ C
examples: (one convex, two nonconvex sets)
Convex combination and convex hull
convex combination of x1, . . . , xk: any point x of the form
x = θ1x1 + θ2x2 + . . .+ θkxk
with θ1 + . . . + θk = 1, θi ≥ 0
convex hull conv S: set of all convex combinations of points in S
conv S is also the smallest convex set that contains S
Cones and convex cones
a set C is called a cone if for any x ∈ C and θ ≥ 0, θx ∈ C
conic (nonnegative) combination of x1 and x2: any point of the form
x = θ1x1 + θ2x2
with θ1 ≥ 0, θ2 ≥ 0
this gives a two-dimensional pie slice with apex 0 and edges passing through x1 and x2
convex cone: a set that is convex and a cone; a set that contains all conic combinations of points in the set
⟹ for any x1, x2 ∈ C, θ1 ≥ 0, θ2 ≥ 0, we have θ1x1 + θ2x2 ∈ C
Some important facts
any line is affine; if it passes through zero, it is a subspace
any line segment is convex, but not affine
any subspace is affine, and a convex cone
any ray, i.e. a set {x0 + θv | θ ≥ 0}, v ≠ 0, is convex but not affine; if x0 = 0, it is a convex cone
Hyperplanes and halfspaces
hyperplane: set of the form {x | aTx = b} (a ≠ 0)
halfspace: set of the form {x | aTx ≤ b} (a ≠ 0)
a is the normal vector
hyperplanes are affine and convex; halfspaces are convex
Alternative representation of hyperplanes
hyperplane in a two-dimensional vector space (line):
x = x0 + θc, θ ∈ R
c is orthogonal to a, cTa = 0
⟹ aTx = aTx0 + θ aTc = b + 0 = b
hyperplane in a three-dimensional vector space (plane):
x = x0 + θ1c1 + θ2c2, θ1, θ2 ∈ R
c1, c2 are orthogonal to a, cT1 a = 0, cT2 a = 0
⟹ aTx = aTx0 + θ1 aTc1 + θ2 aTc2 = b + 0 + 0 = b
Euclidean balls and ellipsoids
(Euclidean) ball with center xc and radius r:
B(xc, r) = {x | ‖x − xc‖2 ≤ r} = {xc + ru | ‖u‖2 ≤ 1}
ellipsoid: set of the form
{x | (x − xc)TP−1(x − xc) ≤ 1}
with P ∈ Sn++ (Sn++: set of all symmetric positive definite matrices)
other representation: {xc + Au | ‖u‖2 ≤ 1} with A square and nonsingular (A = P1/2)
Euclidean balls and ellipsoids are convex sets
Norm balls and norm cones
norm: a function ‖ · ‖ that satisfies
‖x‖ ≥ 0; ‖x‖ = 0 if and only if x = 0
‖tx‖ = |t| ‖x‖ for t ∈ R
‖x + y‖ ≤ ‖x‖ + ‖y‖
notation: ‖ · ‖ is a general (unspecified) norm; ‖ · ‖symb is a particular norm
examples:
ℓ1-norm: ‖x‖1 = |x1| + |x2| + . . . + |xn|
ℓ2-norm: ‖x‖2 = (|x1|2 + |x2|2 + . . . + |xn|2)1/2 (Euclidean norm)
ℓ∞-norm: ‖x‖∞ = maxi=1,...,n |xi|
ℓp-norm: ‖x‖p = (|x1|p + |x2|p + . . . + |xn|p)1/p
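The norms listed above map directly to numpy (a quick check, not from the slides; the example vectors are arbitrary), including the triangle inequality from the definition of a norm:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0, 1.0])
y = np.array([-1.0, 2.0, 5.0, 0.5])

assert np.isclose(np.linalg.norm(x, 1), np.abs(x).sum())                    # l1-norm
assert np.isclose(np.linalg.norm(x, 2), np.sqrt((x ** 2).sum()))            # l2-norm
assert np.isclose(np.linalg.norm(x, np.inf), np.abs(x).max())               # l-infinity-norm
p = 3
assert np.isclose(np.linalg.norm(x, p), (np.abs(x) ** p).sum() ** (1 / p))  # lp-norm

# triangle inequality ||x + y|| <= ||x|| + ||y||
for q in (1, 2, np.inf):
    assert np.linalg.norm(x + y, q) <= np.linalg.norm(x, q) + np.linalg.norm(y, q) + 1e-12
```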
norm ball with center xc and radius r: {x | ‖x−xc‖ ≤ r}
norm cone: {(x, t) | ‖x‖ ≤ t}
the Euclidean norm cone is called second-order cone or ice-cream cone or quadratic cone (since it can be defined by a quadratic inequality)
norm balls and cones are convex
Polyhedra
solution set of finitely many linear inequalities and equalities
Ax ⪯ b, Cx = d
(A ∈ Rm×n, C ∈ Rp×n, ⪯ is componentwise inequality)
a polyhedron is the intersection of a finite number of halfspaces and hyperplanes
example: nonnegative orthant
Rn+ = {x ∈ Rn | xi ≥ 0, i = 1, . . . , n} = {x ∈ Rn | x ⪰ 0}
Rn+ is a polyhedron and a convex cone
affine sets (e.g. subspaces, hyperplanes, lines), rays, line segments, halfspaces are all polyhedra
a bounded polyhedron is called a polytope
all polyhedra are convex
Positive semidefinite cone
notation:
Sn is the set of symmetric n × n matrices and forms a linear space or vector space
Sn+ = {X ∈ Sn | X ⪰ 0}: positive semidefinite n × n matrices
X ∈ Sn+ ⇔ zTXz ≥ 0 for all z
Sn+ is a convex cone
Sn++ = {X ∈ Sn | X ≻ 0}: positive definite n × n matrices
X ∈ Sn++ ⇔ zTXz > 0 for all z ≠ 0
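A numerical illustration (not from the slides; data arbitrary): Gram matrices BTB are positive semidefinite, and conic combinations stay in the cone Sn+, as expected for a convex cone.

```python
import numpy as np

rng = np.random.default_rng(4)
B1, B2 = rng.standard_normal((2, 4, 4))
X1, X2 = B1.T @ B1, B2.T @ B2          # Gram matrices are symmetric PSD

assert np.all(np.linalg.eigvalsh(X1) >= -1e-10)   # X1 in S^4_+
z = rng.standard_normal(4)
assert z @ X1 @ z >= -1e-10                       # z^T X z >= 0 for all z

Y = 1.5 * X1 + 0.3 * X2                           # conic combination, theta1, theta2 >= 0
assert np.all(np.linalg.eigvalsh(Y) >= -1e-10)    # stays in the cone
```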
Operations that preserve convexity
practical methods for establishing convexity of a set C
1 apply definition
x1, x2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θx1 + (1 − θ)x2 ∈ C
2 show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, . . . ) by operations that preserve convexity
intersection
affine functions
perspective function
linear-fractional functions
Intersection
the intersection of (any number of) convex sets is convex
Affine function
suppose f : Rn → Rm is affine (f(x) = Ax + b with A ∈ Rm×n, b ∈ Rm)
the image of a convex set under f is convex
S ⊆ Rn convex ⟹ f(S) = {f(x) | x ∈ S} convex
the inverse image f−1(C) of a convex set under f is convex
C ⊆ Rm convex ⟹ f−1(C) = {x ∈ Rn | f(x) ∈ C} convex
examples
scaling, translation
projection
Perspective and linear-fractional function
perspective function P : Rn+1 → Rn:
P(x, t) = x/t, dom P = {(x, t) | t > 0}
images and inverse images of convex sets under perspective functions are convex
linear-fractional function f : Rn → Rm:
f(x) = (Ax + b)/(cTx + d), dom f = {x | cTx + d > 0}
f(·) may be viewed as the concatenation of P(·) and an affine function g(·), f(·) = P(·) ◦ g(·), with
g(x) = [A; cT] x + [b; d] (A stacked on top of the row cT, b on top of d)
images and inverse images of convex sets under linear-fractional functions are convex
Separating hyperplane theorem
if C and D are disjoint convex sets, then there exist a ≠ 0 and b such that
aTx ≤ b for x ∈ C, aTx ≥ b for x ∈ D
the hyperplane {x | aTx = b} separates C and D
strict separation (i.e., in at least one of the two inequalities, "≤" resp. "≥" can be replaced by "<" resp. ">") requires additional assumptions (e.g., C is closed, D is a singleton)
Supporting hyperplane theorem
supporting hyperplane to set C at boundary point x0:
{x | aTx = aTx0}
where a ≠ 0 and aTx ≤ aTx0 for all x ∈ C
the hyperplane is tangent to C at x0
the halfspace defined by the hyperplane (in the "opposite direction" of a) contains C
supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C
3. Convex functions
Convex functions
basic properties and examples
operations that preserve convexity
Definition
f : Rn → R is convex if dom f is a convex set and
f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
for all x, y ∈ dom f, 0 ≤ θ ≤ 1
f is concave if −f is convex
f is strictly convex if dom f is convex and
f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y)
for x, y ∈ dom f, x ≠ y, 0 < θ < 1
convexity means that the line segment through any two points on the graph of the function lies above the graph
consider a function of one scalar variable, f(z)
straight line through (x, f(x)) and (y, f(y)): g(z) = az + b
g(x) = ax + b = f(x), g(y) = ay + b = f(y)
⟹ a = (f(y) − f(x))/(y − x), b = f(x) − (f(y) − f(x))/(y − x) · x
⟹ g(z) = f(x) + (f(y) − f(x))/(y − x) · (z − x)
g(θx + (1 − θ)y) = f(x) + (f(y) − f(x))/(y − x) · ((θ − 1)x + (1 − θ)y)
= f(x) + (1 − θ)(f(y) − f(x))
= θf(x) + (1 − θ)f(y) ≥ f(θx + (1 − θ)y)
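The chord argument above can be checked numerically for a simple convex function (a sketch, not from the slides; f(z) = z² and the endpoints are arbitrary choices):

```python
import numpy as np

# the straight line g through (x, f(x)) and (y, f(y)) lies above the
# graph of the convex function f(z) = z^2 on the segment [x, y]
f = lambda z: z ** 2
x, y = -1.0, 3.0
g = lambda z: f(x) + (f(y) - f(x)) / (y - x) * (z - x)

for theta in np.linspace(0.0, 1.0, 101):
    z = theta * x + (1 - theta) * y
    assert f(z) <= g(z) + 1e-12        # chord above graph at every point of the segment
```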
Examples on R
convex:
affine: ax + b on R, for any a, b ∈ R
exponential: eax, for any a ∈ R
powers: xα on R++, for α ≥ 1 or α ≤ 0
powers of absolute value: |x|p on R, for p ≥ 1
negative entropy: x log x on R++
concave:
affine: ax + b on R, for any a, b ∈ R
powers: xα on R++, for 0 ≤ α ≤ 1
logarithm: log x on R++
Examples on Rn and Rm×n
affine functions are convex and concave
all norms are convex due to the following inequality, valid for any norm function f(x):
f(θx + (1 − θ)y) ≤ f(θx) + f((1 − θ)y) = θf(x) + (1 − θ)f(y)
examples on Rn
affine function f(x) = aTx + b
norms: ‖x‖p = (∑i=1,...,n |xi|p)1/p for p ≥ 1; ‖x‖∞ = maxk |xk|
examples on Rm×n (m × n matrices)
affine function
f(X) = tr(ATX) + b = ∑i=1,...,m ∑j=1,...,n Aij Xij + b
spectral (maximum singular value) norm
f(X) = ‖X‖2 = σmax(X) = (λmax(XTX))1/2
Restriction of a convex function to a line
f : Rn → R is convex if and only if the function g : R → R,
g(t) = f (x + tv), dom g = {t | x + tv ∈ dom f}
is convex (in t) for any x ∈ dom f , v ∈ Rn
can check convexity of f by checking convexity of functions of one variable
First-order condition
f is differentiable if dom f is open and the gradient
∇f(x) = (∂f(x)/∂x1, ∂f(x)/∂x2, · · · , ∂f(x)/∂xn)
exists at each x ∈ dom f
1st-order condition: differentiable f with convex domain is convex iff
f(y) ≥ f(x) + ∇f(x)T(y − x) for all x, y ∈ dom f
the first-order Taylor series approximation of f is a global underestimator
the first-order approximation establishes a global lower bound
from local information about the function (function value, gradient vector) we can derive global information (global underestimator)
∇f(x) = 0 ⟹ f(y) ≥ f(x) for all y ⟹ x is a global minimizer of f
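The first-order condition can be probed numerically (a sketch, not from the slides; data arbitrary) for the convex least-squares objective with its gradient 2AT(Ax − b):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

f = lambda x: np.sum((A @ x - b) ** 2)     # f(x) = ||Ax - b||2^2, convex
grad = lambda x: 2 * A.T @ (A @ x - b)

# first-order Taylor approximation is a global underestimator:
# f(y) >= f(x) + grad f(x)^T (y - x) for any pair x, y
for _ in range(100):
    x, y = rng.standard_normal((2, 3))
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
```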
Second-order conditions
f is twice differentiable if dom f is open and the Hessian ∇2f(x) ∈ Sn,
∇2f(x)ij = ∂2f(x)/∂xi∂xj, i, j = 1, . . . , n,
exists at each x ∈ dom f
2nd-order conditions: for twice differentiable f with convex domain
f is convex if and only if
∇2f(x) ⪰ 0 for all x ∈ dom f
if ∇2f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex
Examples
quadratic function: f(x) = (1/2)xTPx + qTx + r (with P ∈ Sn)
∇f(x) = Px + q, ∇2f(x) = P
convex if P ⪰ 0
least-squares objective: f(x) = ‖Ax − b‖22
∇f(x) = 2AT(Ax − b), ∇2f(x) = 2ATA
convex (for any A)
quadratic-over-linear: f(x, y) = x2/y
∇2f(x, y) = (2/y3) vvT ⪰ 0 with v = [y, −x]T
convex for y > 0
the expression for the Hessian can be verified via the partial derivatives:
∂2f(x, y)/∂x∂x = 2/y
∂2f(x, y)/∂y∂y = 2x2/y3
∂2f(x, y)/∂x∂y = ∂2f(x, y)/∂y∂x = −2x/y2
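The partial derivatives above can be cross-checked by central finite differences at an arbitrary point with y > 0 (a numerical sketch, not from the slides):

```python
import numpy as np

f = lambda x, y: x ** 2 / y
x0, y0, h = 1.3, 0.7, 1e-4

# central second differences approximate the second partials
fxx = (f(x0 + h, y0) - 2 * f(x0, y0) + f(x0 - h, y0)) / h ** 2
fyy = (f(x0, y0 + h) - 2 * f(x0, y0) + f(x0, y0 - h)) / h ** 2
fxy = (f(x0 + h, y0 + h) - f(x0 + h, y0 - h)
       - f(x0 - h, y0 + h) + f(x0 - h, y0 - h)) / (4 * h ** 2)

assert np.isclose(fxx, 2 / y0, rtol=1e-4)
assert np.isclose(fyy, 2 * x0 ** 2 / y0 ** 3, rtol=1e-4)
assert np.isclose(fxy, -2 * x0 / y0 ** 2, rtol=1e-4)
```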
log-sum-exp: f(x) = log ∑k=1,...,n exp xk is convex
∇2f(x) = (1/(1Tz)) diag(z) − (1/(1Tz)2) zzT (zk = exp xk)
to show ∇2f(x) ⪰ 0, we must verify that vT∇2f(x)v ≥ 0 for all v:
vT∇2f(x)v = ((∑k zkvk2)(∑k zk) − (∑k vkzk)2)/(∑k zk)2 ≥ 0
since (∑k vkzk)2 ≤ (∑k zkvk2)(∑k zk) (from the Cauchy–Schwarz inequality (∑i aibi)2 ≤ (∑i ai2)(∑i bi2), use ai = √zi, bi = √zi vi)
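The log-sum-exp Hessian above can be formed explicitly and checked to be positive semidefinite at a random point (a numerical sketch, not from the slides):

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(6)
x = rng.standard_normal(5)
z = np.exp(x)
S = z.sum()                                        # 1^T z

# Hessian from the slide: diag(z)/(1^T z) - z z^T / (1^T z)^2
H = np.diag(z) / S - np.outer(z, z) / S ** 2
assert np.all(np.linalg.eigvalsh(H) >= -1e-12)     # PSD (one eigenvalue is exactly 0)

# midpoint convexity spot check along a random direction
v = rng.standard_normal(5)
lhs = logsumexp(x + 0.5 * v)
rhs = 0.5 * (logsumexp(x) + logsumexp(x + v))
assert lhs <= rhs + 1e-12
```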
geometric mean: f(x) = (∏k=1,...,n xk)1/n on Rn++ is concave (similar proof as for log-sum-exp)
Epigraph and sublevel set
α-sublevel set of f : Rn → R:
Cα = {x ∈ dom f | f (x) ≤ α}
sublevel sets of convex functions are convex (the converse is false)
epigraph of f : Rn → R:
epi f = {(x, t) ∈ Rn+1 | x ∈ dom f , f (x) ≤ t}
(epi means "above")
f is convex if and only if epi f is a convex set
Jensen’s inequality
basic inequality: if f is convex, then for 0 ≤ θ ≤ 1,
f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
extension: if f is convex, then
f(E z) ≤ E f(z)
for any random variable z
the basic inequality is the special case with the discrete distribution
prob(z = x) = θ, prob(z = y) = 1 − θ
general discrete distribution
f(∑i=1,...,k prob(zi) zi) ≤ ∑i=1,...,k prob(zi) f(zi)
⟹ f(θ1x1 + θ2x2 + . . . + θkxk) ≤ θ1f(x1) + θ2f(x2) + . . . + θkf(xk)
with θi ≥ 0, ∑i=1,...,k θi = 1
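A quick numerical check of the discrete Jensen inequality (not from the slides; the convex function f = exp and the random distribution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
k = 6
theta = rng.random(k)
theta /= theta.sum()             # theta_i >= 0, sum theta_i = 1: a probability distribution
z = rng.standard_normal(k)

# f(E z) <= E f(z) for the convex function f = exp
assert np.exp(theta @ z) <= theta @ np.exp(z) + 1e-12
```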
continuous distribution p(x):
f(∫S p(x) x dx) ≤ ∫S p(x) f(x) dx
with p(x) ≥ 0, ∫S p(x) dx = 1, S ⊆ dom f
dithering with respect to a deterministic vector x:
E f(x + z) ≥ f(x)
with zero-mean random vector z
Operations that preserve convexity
practical methods for establishing convexity of a function
1 verify definition (often simplified by restricting to a line)
2 for twice differentiable functions, show ∇2f(x) ⪰ 0
3 show that f is obtained from simple convex functions by operations that preserve convexity
nonnegative weighted sum
composition with affine function
pointwise maximum and supremum
composition
minimization
perspective
Positive weighted sum & composition with affine function
nonnegative multiple: αf is convex if f is convex, α ≥ 0
sum: f1 + f2 convex if f1, f2 convex (extends to infinite sums, integrals)
f1(θx + (1 − θ)y) + f2(θx + (1 − θ)y)
≤ θf1(x) + (1 − θ)f1(y) + θf2(x) + (1 − θ)f2(y)
= θ(f1(x) + f2(x)) + (1 − θ)(f1(y) + f2(y))
composition with affine function: f(Ax + b) is convex if f is convex
f(A(θx + (1 − θ)y) + b)
= f(θ(Ax + b) + (1 − θ)(Ay + b))
≤ θf(Ax + b) + (1 − θ)f(Ay + b)
Pointwise maximum I
if f1, . . . , fm are convex, then f(x) = max{f1(x), . . . , fm(x)} is convex
examples
piecewise-linear function: f(x) = maxi=1,··· ,m (aTi x + bi) is convex
sum of r largest components of x ∈ Rn:
f(x) = x[1] + x[2] + · · · + x[r]
is convex (x[i] is the ith largest component of x)
proof:
f (x) = max{xi1 + xi2 + · · ·+ xir | 1 ≤ i1 < i2 < · · · < ir ≤ n}
Pointwise maximum II
consider e.g. x = [x1 x2 x3]T (n = 3), f(x) = x[1] + x[2] (r = 2)
equivalently, f can be expressed as
f (x) = max{x1 + x2, x2 + x3, x1 + x3}
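A numerical sketch of this equivalence (not from the slides; data arbitrary): the sum of the r largest components equals the pointwise maximum over all index subsets of size r, and a midpoint spot check is consistent with convexity.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(8)
n, r = 6, 3
x = rng.standard_normal(n)

# f(x) = sum of the r largest components of x
f = lambda u: np.sort(u)[::-1][:r].sum()

# equals the maximum of x_{i1} + ... + x_{ir} over all subsets of size r
f_max = max(x[list(idx)].sum() for idx in combinations(range(n), r))
assert np.isclose(f(x), f_max)

# midpoint convexity spot check
y = rng.standard_normal(n)
assert f((x + y) / 2) <= (f(x) + f(y)) / 2 + 1e-12
```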
Pointwise supremum
supremum: ”smallest upper bound”
if f (x,y) is convex in x for each y ∈ A, then
g(x) = supy∈A f(x, y)
is convex
Composition with scalar functions
composition of g : Rn → R and h : R → R:
f(x) = h(g(x))
f is convex if
g convex, h convex, h nondecreasing
g concave, h convex, h nonincreasing
proof (for n = 1, with g, h differentiable on all of R)
f′′(x) = h′′(g(x)) g′(x)2 + h′(g(x)) g′′(x)
note: monotonicity must hold for the extended-value extension of h
examples
exp g(x) is convex if g is convex
1/g(x) is convex if g is concave and positive
Vector composition
composition of g : Rn → Rk and h : Rk → R:
f(x) = h(g(x)) = h(g1(x), g2(x), . . . , gk(x))
f is convex if:
gi convex, h convex, h nondecreasing in each argument
gi concave, h convex, h nonincreasing in each argument
proof (for n = 1, differentiable g, h)
f′′(x) = g′(x)T ∇2h(g(x)) g′(x) + ∇h(g(x))T g′′(x)
examples
∑i=1,...,m log gi(x) is concave if the gi are concave and positive
log ∑i=1,...,m exp gi(x) is convex if the gi are convex
Minimization I
if f (x,y) is convex in (x,y) and C is a convex set, then
g(x) = infy∈C
f (x,y)is convex; infimum: "greatest lower bound"
Perspective

the perspective of a function f: R^n → R is the function g: R^n × R → R,

g(x, t) = t f(x/t), dom g = {(x, t) | x/t ∈ dom f, t > 0}

g is convex if f is convex

this can be shown via the epigraph:

(x, t, s) ∈ epi g ⇔ t f(x/t) ≤ s ⇔ f(x/t) ≤ s/t ⇔ (x/t, s/t) ∈ epi f

thus, epi g is the inverse image of epi f under the perspective mapping (u, v, w) → (u/v, w/v); inverse images of convex sets under the perspective mapping are convex
4. Convex optimization problems

Outline

3 Convex functions
4 Convex optimization problems
5 Duality
Convex optimization problems
optimization problem in standard form
convex optimization problems
linear optimization
quadratic optimization
semidefinite programming
Optimization problem in standard form
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

x ∈ R^n is the optimization variable
f0: R^n → R is the objective or cost function
fi: R^n → R, i = 1, . . . , m, are the inequality constraint functions
hi: R^n → R are the equality constraint functions

optimal value:

p* = inf{ f0(x) | fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p }

p* = ∞ if the problem is infeasible (no x satisfies the constraints)
p* = −∞ if the problem is unbounded below
Optimal and locally optimal points

x is feasible if x ∈ dom f0 and it satisfies the constraints

a feasible x is optimal if f0(x) = p*; Xopt is the set of optimal points

x is locally optimal if there is an R > 0 such that x is optimal for

minimize (over z) f0(z)
subject to fi(z) ≤ 0, i = 1, . . . , m, hi(z) = 0, i = 1, . . . , p
           ‖z − x‖2 ≤ R

examples (with n = 1, m = p = 0)
f0(x) = 1/x, dom f0 = R++: p* = 0, no optimal point
f0(x) = −log x, dom f0 = R++: p* = −∞
f0(x) = x log x, dom f0 = R++: p* = −1/e, x = 1/e is optimal
f0(x) = x³ − 3x: p* = −∞, local optimum at x = 1
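The x log x example can be verified numerically; a small sketch (assuming numpy; the grid resolution is an implementation choice):

```python
import numpy as np

# f0(x) = x*log(x) on R_++ attains its minimum p* = -1/e at x = 1/e
xs = np.linspace(1e-6, 2.0, 200001)
vals = xs * np.log(xs)
x_star = xs[np.argmin(vals)]
p_star = vals.min()

assert abs(x_star - 1/np.e) < 1e-3
assert abs(p_star - (-1/np.e)) < 1e-6
```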
Implicit constraints

the standard form optimization problem has the implicit constraint

x ∈ D = ⋂_{i=0}^m dom fi ∩ ⋂_{i=1}^p dom hi

we call D the domain of the problem
the constraints fi(x) ≤ 0, hi(x) = 0 are the explicit constraints
a problem is unconstrained if it has no explicit constraints (m = p = 0)
example

minimize f0(x) = −∑_{i=1}^k log(bi − ai^T x)

is an unconstrained problem with implicit constraints ai^T x < bi
Feasibility problem

find x
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

can be considered a special case of the general problem with f0(x) = 0:

minimize 0
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

p* = 0 if the constraints are feasible; any feasible x is optimal
p* = ∞ if the constraints are infeasible
Convex optimization problem

standard form convex optimization problem

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           ai^T x = bi, i = 1, . . . , p

f0, f1, . . . , fm are convex; equality constraints are affine

often written as

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           Ax = b

important property: the feasible set of a convex optimization problem is convex
example

minimize f0(x) = x1² + x2²
subject to f1(x) = x1/(1 + x2²) ≤ 0
           h1(x) = (x1 + x2)² = 0

f0 is convex; the feasible set {(x1, x2) | x1 = −x2 ≤ 0} is convex

not a convex problem (according to our definition): f1 is not convex, h1 is not affine

equivalent (but not identical) to the convex problem

minimize x1² + x2²
subject to x1 ≤ 0
           x1 + x2 = 0
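The equivalence can be checked numerically; a sketch (assuming numpy; the grid over x1 is an illustration choice):

```python
import numpy as np

# eliminate the equality constraint x1 + x2 = 0 via x2 = -x1;
# the objective becomes 2*x1^2 over the feasible half-line x1 <= 0
x1 = np.linspace(-3.0, 0.0, 3001)
obj = 2.0 * x1**2
i = np.argmin(obj)

assert x1[i] == 0.0 and obj[i] == 0.0   # optimum at the origin, p* = 0
```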
Local and global optima

any locally optimal point of a convex problem is (globally) optimal

proof: suppose x is locally optimal and y is optimal with f0(y) < f0(x)
x locally optimal means there is an R > 0 such that

z feasible, ‖z − x‖2 ≤ R ⇒ f0(z) ≥ f0(x)

consider z = θy + (1 − θ)x with θ = R/(2‖y − x‖2)
‖y − x‖2 > R, so 0 < θ < 1/2
z is a convex combination of two feasible points, hence also feasible
‖z − x‖2 = R/2 and

f0(z) ≤ θ f0(y) + (1 − θ) f0(x) < f0(x)

which contradicts our assumption that x is locally optimal
or use some simplified reasoning:
consider z = θy + (1 − θ)x with very small θ
⇒ z lies in the Euclidean ball around x with radius R
z is feasible, since the feasible set is convex

f0(z) ≤ θ f0(y) + (1 − θ) f0(x) < θ f0(x) + (1 − θ) f0(x) = f0(x)

contradiction to the assumption that x is a local optimum ⇒ f0(y) ≥ f0(x)
Optimality criterion for differentiable f0

x is optimal if and only if it is feasible and

∇f0(x)^T (y − x) ≥ 0 for all feasible y

if nonzero, ∇f0(x) defines a supporting hyperplane to the feasible set X at x
proof: if ∇f0(x)^T (y − x) ≥ 0 holds for all feasible y, then

f0(y) ≥ f0(x) + ∇f0(x)^T (y − x) ≥ f0(x),

so x is an optimum point

converse: see the Boyd book
unconstrained problem: x is optimal if and only if

x ∈ dom f0, ∇f0(x) = 0

equality constrained problem

minimize f0(x) subject to Ax = b

x is optimal if and only if there exists a ν such that

x ∈ dom f0, Ax = b, ∇f0(x) + A^T ν = 0

proof: ∇f0(x)^T (y − x) ≥ 0 must hold for the optimum x for all y satisfying Ay = b
all admissible vectors y can be expressed as y = Fz + x0, where x0 is a particular solution of the linear system of equations and the columns of F span the nullspace of A
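The equality-constrained optimality condition amounts to one linear (KKT) system; a hedged sketch (assuming numpy; the objective ½‖x − c‖² and the random data are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 2
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)
c = rng.standard_normal(n)

# minimize (1/2)||x - c||^2  s.t.  Ax = b
# optimality: (x - c) + A^T nu = 0 and Ax = b  ->  one linear KKT system
K = np.block([[np.eye(n), A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([c, b]))
x, nu = sol[:n], sol[n:]

assert np.allclose(A @ x, b)                # primal feasibility
assert np.allclose((x - c) + A.T @ nu, 0)   # gradient condition
```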
⇒ ∇f0(x)^T F z ≥ 0 ∀z
⇒ ∇f0(x) is orthogonal to the nullspace of A; the orthogonal complement of the nullspace of A is identical to the column space of A^T
⇒ ∇f0(x) = A^T v for some v (equivalently, ∇f0(x) + A^T ν = 0 with ν = −v)

minimization over the nonnegative orthant

minimize f0(x) subject to x ⪰ 0

x is optimal if and only if

x ∈ dom f0, x ⪰ 0, and ∇f0(x)i ≥ 0 if xi = 0, ∇f0(x)i = 0 if xi > 0

proof: ∇f0(x)^T y ≥ ∇f0(x)^T x must hold ∀ y ⪰ 0 for the particular x ⪰ 0
∇f0(x)^T y is unbounded below over y ⪰ 0 unless ∇f0(x) ⪰ 0
⇒ demand ∇f0(x)^T x = 0
Equivalent convex problems

two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice versa

some common transformations that preserve convexity:

eliminating equality constraints

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           Ax = b

is equivalent to

minimize (over z) f0(Fz + x0)
subject to fi(Fz + x0) ≤ 0, i = 1, . . . , m

where F and x0 are such that

Ax = b ⇔ x = Fz + x0 for some z
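One way to construct F and x0 (a sketch assuming numpy; using least squares for x0 and the SVD for a nullspace basis is one possible choice):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 5))
b = rng.standard_normal(2)

# particular solution x0 and nullspace basis F, so {x | Ax = b} = {F z + x0}
x0 = np.linalg.lstsq(A, b, rcond=None)[0]
U, s, Vt = np.linalg.svd(A)
F = Vt[len(s):].T            # rows of Vt beyond rank(A) span the nullspace

assert np.allclose(A @ x0, b)
assert np.allclose(A @ F, 0)

z = rng.standard_normal(F.shape[1])
assert np.allclose(A @ (F @ z + x0), b)   # every F z + x0 is feasible
```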
introducing equality constraints

minimize f0(A0 x + b0)
subject to fi(Ai x + bi) ≤ 0, i = 1, . . . , m

is equivalent to

minimize (over x, yi) f0(y0)
subject to fi(yi) ≤ 0, i = 1, . . . , m
           yi = Ai x + bi, i = 0, 1, . . . , m

introducing slack variables for linear inequalities

minimize f0(x)
subject to ai^T x ≤ bi, i = 1, . . . , m

is equivalent to

minimize (over x, s) f0(x)
subject to ai^T x + si = bi, i = 1, . . . , m
           si ≥ 0, i = 1, . . . , m
epigraph form: the standard form convex problem is equivalent to

minimize (over x, t) t
subject to f0(x) − t ≤ 0
           fi(x) ≤ 0, i = 1, . . . , m
           Ax = b

minimizing over some variables

minimize f0(x1, x2)
subject to fi(x1) ≤ 0, i = 1, . . . , m

is equivalent to

minimize f̃0(x1)
subject to fi(x1) ≤ 0, i = 1, . . . , m

where f̃0(x1) = inf_{x2} f0(x1, x2)
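Partial minimization can be illustrated on a simple quadratic (assuming numpy; the particular f0 is an invented example):

```python
import numpy as np

# f0(x1, x2) = x1^2 + (x2 - x1)^2; minimizing over x2 gives
# f0_tilde(x1) = inf_{x2} f0(x1, x2) = x1^2  (attained at x2 = x1)
x1 = 0.7
x2_grid = np.linspace(-5, 5, 100001)
f_partial = (x1**2 + (x2_grid - x1)**2).min()

assert abs(f_partial - x1**2) < 1e-6
```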
Linear program (LP)

minimize c^T x + d
subject to Gx ⪯ h
           Ax = b

convex problem with affine objective and constraint functions

feasible set is a polyhedron
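Since the feasible set is a polyhedron, an LP attains its optimum at a vertex; this suggests a brute-force check on a tiny instance (a sketch assuming numpy; the unit-box polyhedron is an illustrative choice, not a practical LP method):

```python
import numpy as np
from itertools import combinations

# minimize c^T x over Gx <= h (the unit box in R^2);
# enumerate vertices as intersections of constraint pairs
c = np.array([-1.0, -2.0])
G = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
h = np.array([1.0, 1.0, 0.0, 0.0])

best = None
for i, j in combinations(range(4), 2):
    M = G[[i, j]]
    if abs(np.linalg.det(M)) < 1e-12:
        continue
    v = np.linalg.solve(M, h[[i, j]])
    if np.all(G @ v <= h + 1e-9):            # keep only feasible vertices
        val = c @ v
        if best is None or val < best[0]:
            best = (val, v)

assert np.allclose(best[1], [1.0, 1.0])      # optimum at vertex (1, 1)
assert np.isclose(best[0], -3.0)
```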
Linear-fractional program

minimize f0(x)
subject to Gx ⪯ h
           Ax = b

linear-fractional program:

f0(x) = (c^T x + d)/(e^T x + f), dom f0 = {x | e^T x + f > 0}

a quasiconvex optimization problem; can be solved by bisection

also equivalent to the LP (variables y, z)

minimize c^T y + dz
subject to Gy ⪯ hz
           Ay = bz
           e^T y + fz = 1
           z ≥ 0
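Bisection for a quasiconvex problem can be sketched on a one-dimensional linear-fractional example (assuming numpy; the particular f0 and the grid-based feasibility check are illustrative simplifications):

```python
import numpy as np

# quasiconvex f0(x) = (x + 2)/(x + 1) minimized over [0, 3] by bisection:
# f0(x) <= t is the (here: linear) feasibility problem (1 - t)*x + (2 - t) <= 0
xs = np.linspace(0.0, 3.0, 3001)

def feasible(t):
    return np.any((1 - t) * xs + (2 - t) <= 1e-12)

lo, hi = 0.0, 10.0           # f0 is known to lie in [lo, hi] on [0, 3]
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if feasible(mid):
        hi = mid
    else:
        lo = mid

assert abs(hi - 5.0/4.0) < 1e-6    # p* = f0(3) = 5/4
```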
Quadratic program (QP)

minimize (1/2) x^T P x + q^T x + r
subject to Gx ⪯ h
           Ax = b

P ∈ S^n_+, so the objective is convex quadratic

minimize a convex quadratic function over a polyhedron
Examples

least-squares

minimize ‖Ax − b‖2²

analytical solution x* = A†b (A† is the pseudo-inverse)
can add linear constraints, e.g., l ⪯ x ⪯ u

linear program with random cost

minimize c̄^T x + γ x^T Σ x = E(c^T x) + γ var(c^T x)
subject to Gx ⪯ h, Ax = b

c is a random vector with mean c̄ and covariance Σ
hence, c^T x is a random variable with mean c̄^T x and variance x^T Σ x
γ > 0 is a risk-aversion parameter; it controls the trade-off between expected cost and variance (risk)
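The analytical least-squares solution above can be verified numerically (a sketch assuming numpy; random problem data):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# analytical least-squares solution x* = A^dagger b
x_star = np.linalg.pinv(A) @ b

# x* satisfies the normal equations A^T A x = A^T b
assert np.allclose(A.T @ A @ x_star, A.T @ b)

# and no perturbed point does better
x_perturbed = x_star + 1e-3 * rng.standard_normal(3)
assert np.linalg.norm(A @ x_star - b) <= np.linalg.norm(A @ x_perturbed - b)
```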
Quadratically constrained quadratic program (QCQP)

minimize (1/2) x^T P0 x + q0^T x + r0
subject to (1/2) x^T Pi x + qi^T x + ri ≤ 0, i = 1, . . . , m
           Ax = b

Pi ∈ S^n_+; objective and constraints are convex quadratic

if P1, . . . , Pm ∈ S^n_++, the feasible region is the intersection of m ellipsoids and an affine set (standard form of an ellipsoid: {x | (x − xc,i)^T Bi^{−1} (x − xc,i) ≤ 1} with Bi ∈ S^n_++)
Second-order cone programming

minimize f^T x
subject to ‖Ai x + bi‖2 ≤ ci^T x + di, i = 1, . . . , m
           Fx = q

(Ai ∈ R^{ni×n}, F ∈ R^{p×n})

closely related to quadratic programming

inequalities are called second-order cone (SOC) constraints:
(Ai x + bi, ci^T x + di) ∈ second-order cone in R^{ni+1}
(the affine function (Ai x + bi, ci^T x + di) has to lie in the second-order cone in R^{ni+1})

for Ai = 0, the SOCP reduces to an LP; if ci = 0, it reduces to a QCQP

more general than QCQP and LP
Semidefinite program (SDP)

minimize c^T x
subject to x1 F1 + x2 F2 + · · · + xn Fn + G ⪯ 0
           Ax = b

with Fi, G ∈ S^k

the inequality constraint is called a linear matrix inequality (LMI)

includes problems with multiple LMI constraints: for example,

x1 F̂1 + · · · + xn F̂n + Ĝ ⪯ 0, x1 F̃1 + · · · + xn F̃n + G̃ ⪯ 0

is equivalent to the single LMI

x1 [F̂1 0; 0 F̃1] + x2 [F̂2 0; 0 F̃2] + · · · + xn [F̂n 0; 0 F̃n] + [Ĝ 0; 0 G̃] ⪯ 0
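Stacking two LMIs into one block-diagonal LMI can be checked on small matrices (a sketch assuming numpy; the particular M1, M2 are invented examples standing in for the evaluated left-hand sides):

```python
import numpy as np

# two matrix inequalities M1 <= 0, M2 <= 0 hold simultaneously iff the
# block-diagonal stacking diag(M1, M2) is negative semidefinite
def blkdiag(M1, M2):
    k1, k2 = M1.shape[0], M2.shape[0]
    out = np.zeros((k1 + k2, k1 + k2))
    out[:k1, :k1], out[k1:, k1:] = M1, M2
    return out

M1 = -np.eye(2)                               # eigenvalues -1, -1
M2 = np.array([[-2.0, 1.0], [1.0, -2.0]])     # eigenvalues -1, -3
stacked = blkdiag(M1, M2)

eigs = np.linalg.eigvalsh(stacked)
assert np.all(eigs <= 1e-12)                  # stacked LMI holds
assert set(np.round(eigs, 6)) == {-1.0, -3.0} # spectrum is the union
```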
LP and SOCP as SDP

LP and equivalent SDP

LP:  minimize c^T x          SDP: minimize c^T x
     subject to Ax ⪯ b            subject to diag(Ax − b) ⪯ 0

(note the different interpretation of the generalized inequality ⪯)

SOCP and equivalent SDP

SOCP: minimize f^T x
      subject to ‖Ai x + bi‖2 ≤ ci^T x + di, i = 1, . . . , m

SDP:  minimize f^T x
      subject to [ (ci^T x + di) I   Ai x + bi ; (Ai x + bi)^T   ci^T x + di ] ⪰ 0, i = 1, . . . , m
proof of equivalence: we consider the block matrix

P = [ A   b ; b^T   c ]

Schur complement of A in P: S = c − b^T A^{−1} b

P is positive semidefinite if and only if S ≥ 0, provided A is positive definite

for

P = [ (ci^T x + di) I   Ai x + bi ; (Ai x + bi)^T   ci^T x + di ],

and a positive value of ci^T x + di, positive semidefiniteness of P implies

S = (ci^T x + di) − (1/(ci^T x + di)) ‖Ai x + bi‖2² ≥ 0
⇒ ‖Ai x + bi‖2² ≤ (ci^T x + di)²
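The Schur complement condition can be tested numerically (a sketch assuming numpy; the random data and the two values of c are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)      # positive definite block
b = rng.standard_normal(n)

for c in [0.1, 50.0]:
    P = np.block([[A, b[:, None]], [b[None, :], np.array([[c]])]])
    S = c - b @ np.linalg.solve(A, b)
    # with A > 0:  P >= 0  <=>  Schur complement S >= 0
    assert (np.linalg.eigvalsh(P).min() >= -1e-9) == (S >= 0)
```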
5. Duality

Outline

4 Convex optimization problems
5 Duality
6 Applications in communications
Duality
Lagrange dual problem
weak and strong duality
geometric interpretation
optimality conditions
examples
Lagrangian

standard form problem (not necessarily convex)

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

variable x ∈ R^n, domain D, optimal value p*

Lagrangian: L: R^n × R^m × R^p → R, with dom L = D × R^m × R^p,

L(x, λ, ν) = f0(x) + ∑_{i=1}^m λi fi(x) + ∑_{i=1}^p νi hi(x)

weighted sum of objective and constraint functions
λi is the Lagrange multiplier associated with fi(x) ≤ 0
νi is the Lagrange multiplier associated with hi(x) = 0
Lagrange dual function

Lagrange dual function: g: R^m × R^p → R,

g(λ, ν) = inf_{x∈D} L(x, λ, ν) = inf_{x∈D} ( f0(x) + ∑_{i=1}^m λi fi(x) + ∑_{i=1}^p νi hi(x) )

λ and ν are the dual variables or Lagrange multiplier vectors

g is concave since it is defined as a pointwise infimum of affine functions; it can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p*

proof: if x̃ is feasible and λ ⪰ 0, then

f0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x∈D} L(x, λ, ν) = g(λ, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ, ν)
Least-norm solution of linear equations

minimize x^T x
subject to Ax = b

dual function
Lagrangian is L(x, ν) = x^T x + ν^T (Ax − b)
to minimize L over x, set the gradient equal to zero:

∇x L(x, ν) = 2x + A^T ν = 0 ⇒ x = −(1/2) A^T ν

plug into L to obtain g:

g(ν) = L(−(1/2) A^T ν, ν) = −(1/4) ν^T A A^T ν − b^T ν

a concave function of ν

lower bound property: p* ≥ −(1/4) ν^T A A^T ν − b^T ν for all ν
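Both the lower bound property and its tightness at ν* = −2(AA^T)^{−1}b can be checked numerically (a sketch assuming numpy; random problem data):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 5))
b = rng.standard_normal(2)

# least-norm primal optimum: x* = A^T (A A^T)^{-1} b, p* = x*^T x*
AAt = A @ A.T
x_star = A.T @ np.linalg.solve(AAt, b)
p_star = x_star @ x_star

def g(nu):
    return -0.25 * nu @ AAt @ nu - b @ nu

# lower bound property: g(nu) <= p* for every nu
for _ in range(100):
    assert g(rng.standard_normal(2)) <= p_star + 1e-9

# the bound is tight at nu* = -2 (A A^T)^{-1} b
nu_star = -2.0 * np.linalg.solve(AAt, b)
assert np.isclose(g(nu_star), p_star)
```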
Standard form LP

minimize c^T x
subject to Ax = b, x ⪰ 0

dual function
Lagrangian is

L(x, λ, ν) = c^T x + ν^T (Ax − b) − λ^T x = −b^T ν + (c + A^T ν − λ)^T x

L is affine in x, hence

g(λ, ν) = inf_x L(x, λ, ν) = { −b^T ν   if A^T ν − λ + c = 0
                             { −∞       otherwise

g is linear on the affine domain {(λ, ν) | A^T ν − λ + c = 0}, hence concave

lower bound property: p* ≥ −b^T ν if A^T ν + c ⪰ 0
The dual problem

Lagrange dual problem

maximize g(λ, ν)
subject to λ ⪰ 0

finds the best lower bound on p* obtainable from the Lagrange dual function
a convex optimization problem, even if the original problem is not; optimal value denoted d*
λ, ν are dual feasible if λ ⪰ 0, (λ, ν) ∈ dom g
often simplified by making the implicit constraint (λ, ν) ∈ dom g explicit

example: standard form LP and its dual

minimize c^T x          maximize −b^T ν
subject to Ax = b       subject to A^T ν + c ⪰ 0
           x ⪰ 0
Weak and strong duality
weak duality: d∗ ≤ p∗
always holds (for convex and nonconvex problems)
can be used to find nontrivial lower bounds fordifficult problems
strong duality: d∗ = p∗
does not hold in general
(usually) holds for convex problems
conditions that guarantee strong duality in convexproblems are called constraint qualifications
Slater's constraint qualification

strong duality holds for a convex problem

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           Ax = b

if it is strictly feasible, i.e.,

∃ x ∈ int D : fi(x) < 0, i = 1, . . . , m, Ax = b

also guarantees that the dual optimum is attained (if p* > −∞)
can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, . . .
there exist many other types of constraint qualifications
Inequality form LP

primal problem

minimize c^T x
subject to Ax ⪯ b

dual function

g(λ) = inf_x ((c + A^T λ)^T x − b^T λ) = { −b^T λ   if A^T λ + c = 0
                                         { −∞       otherwise

dual problem

maximize −b^T λ
subject to A^T λ + c = 0, λ ⪰ 0

from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃
in fact, p* = d* except when primal and dual are both infeasible
Quadratic program

primal problem (assume P ∈ S^n_++)

minimize x^T P x
subject to Ax ⪯ b

dual function

g(λ) = inf_x (x^T P x + λ^T (Ax − b)) = −(1/4) λ^T A P^{−1} A^T λ − b^T λ

(the infimum is attained at the x for which 2Px = −A^T λ ⇒ x = −(1/2) P^{−1} A^T λ)

dual problem

maximize −(1/4) λ^T A P^{−1} A^T λ − b^T λ
subject to λ ⪰ 0

from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃
in fact, p* = d* always
Geometric interpretation

for simplicity, consider a problem with one constraint f1(x) ≤ 0

interpretation of dual function:

g(λ) = inf_{(u,t)∈G} (t + λu), where G = {(f1(x), f0(x)) | x ∈ D}

t + λu = [λ 1] [u ; t]

λu + t = g(λ) is a (non-vertical) supporting hyperplane to G
the hyperplane intersects the t-axis at t = g(λ)
Complementary slackness I

assume strong duality holds, x* is primal optimal, (λ*, ν*) is dual optimal

f0(x*) = g(λ*, ν*) = inf_{x∈D} ( f0(x) + ∑_{i=1}^m λi* fi(x) + ∑_{i=1}^p νi* hi(x) )
       ≤ f0(x*) + ∑_{i=1}^m λi* fi(x*) + ∑_{i=1}^p νi* hi(x*)
       ≤ f0(x*)

(λi* ≥ 0 and fi(x*) ≤ 0, so λi* fi(x*) ≤ 0; hi(x*) = 0)
Complementary slackness II

hence, the two inequalities hold with equality

x* minimizes L(x, λ*, ν*)
λi* fi(x*) = 0 for i = 1, . . . , m (known as complementary slackness):

λi* > 0 ⇒ fi(x*) = 0,  fi(x*) < 0 ⇒ λi* = 0

the ith Lagrange multiplier λi can be nonzero only if the ith constraint is active at the optimum
Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem with differentiable fi, hi, D = R^n):

1 primal constraints: fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p
2 dual constraints: λ ⪰ 0
3 complementary slackness: λi fi(x) = 0, i = 1, . . . , m
4 gradient of the Lagrangian with respect to x vanishes:

∇f0(x) + ∑_{i=1}^m λi ∇fi(x) + ∑_{i=1}^p νi ∇hi(x) = 0

from the previous considerations: if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions
KKT conditions for convex problem I

if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:

from complementary slackness: f0(x̃) = L(x̃, λ̃, ν̃)
from the 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃) (L(x, λ̃, ν̃) is convex in x and differentiable)

hence, f0(x̃) = g(λ̃, ν̃)
( g(λ̃, ν̃) = L(x̃, λ̃, ν̃) = f0(x̃) + ∑_{i=1}^m λ̃i fi(x̃) + ∑_{i=1}^p ν̃i hi(x̃) = f0(x̃) )

since g(λ̃, ν̃) ≤ p* ≤ f0(x̃) and f0(x̃) = g(λ̃, ν̃) ⇒ p* = f0(x̃)
KKT conditions for convex problem II

if Slater's condition is satisfied:
x is optimal if and only if there exist λ, ν that satisfy the KKT conditions

recall that Slater implies strong duality, and the dual optimum is attained

generalizes the optimality condition ∇f0(x) = 0 for unconstrained problems
example: water-filling (assume αi > 0)

for signal transmission over n independent subchannels with signal power xi and noise power αi, the total channel capacity is

C = ∑_{i=1}^n log(1 + xi/αi) = ∑_{i=1}^n log(xi + αi) + const.

hence, for maximization of the channel capacity under a total power constraint the following optimization problem results:

minimize −∑_{i=1}^n log(xi + αi)
subject to x ⪰ 0, 1^T x = 1

x is optimal iff x ⪰ 0, 1^T x = 1, and there exist λ ∈ R^n, ν ∈ R such that

λ ⪰ 0,  λi xi = 0,  1/(xi + αi) + λi = ν
if ν < 1/αi: λi = 0 and xi = 1/ν − αi (xi = 0 would require a negative λi)
if ν ≥ 1/αi: λi = ν − 1/αi and xi = 0 (xi > 0 would give λi > 0, violating complementary slackness)
combining the two cases gives xi = max{0, 1/ν − αi}
determine ν from 1^T x = ∑_{i=1}^n max{0, 1/ν − αi} = 1

interpretation
n patches; the level of patch i is at height αi
flood the area with a unit amount of water
the resulting water level is 1/ν*
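The water-filling solution reduces to one-dimensional root finding for the water level 1/ν; a hedged sketch (assuming numpy; bisection and its iteration count are implementation choices):

```python
import numpy as np

def water_filling(alpha, total=1.0):
    """Solve sum_i max(0, 1/nu - alpha_i) = total for the water level 1/nu
    by bisection and return the optimal powers x_i = max(0, 1/nu - alpha_i)."""
    def power(level):                       # level plays the role of 1/nu
        return np.maximum(0.0, level - alpha).sum()
    lo, hi = alpha.min(), alpha.min() + total
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if power(mid) < total:
            lo = mid
        else:
            hi = mid
    return np.maximum(0.0, hi - alpha)

alpha = np.array([0.5, 1.0, 2.0])
x = water_filling(alpha)
assert np.isclose(x.sum(), 1.0)             # total power constraint
assert np.all(x >= 0)
# KKT: the water rises to a common level 1/nu on all active subchannels
levels = (x + alpha)[x > 1e-9]
assert np.allclose(levels, levels[0])
```

For this data the water level settles at 1.25, giving x = [0.75, 0.25, 0]; the deepest (noisiest) subchannel receives no power.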
Duality and problem reformulations

equivalent formulations of a problem can lead to very different duals

reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

common reformulations
introduce new variables and equality constraints
make explicit constraints implicit or vice versa
transform objective or constraint functions; e.g., replace f0(x) by φ(f0(x)) with φ convex, increasing
Semidefinite program

primal SDP (Fi, G ∈ S^k)

minimize c^T x
subject to x1 F1 + · · · + xn Fn ⪯ G

Lagrange multiplier is a matrix Z ∈ S^k

Lagrangian

L(x, Z) = c^T x + tr(Z (x1 F1 + · · · + xn Fn − G))
        = c1 x1 + · · · + cn xn + tr(Z F1) x1 + · · · + tr(Z Fn) xn − tr(Z G)

(scalar product between two symmetric matrices A and B: tr(AB))

dual function

g(Z) = inf_x L(x, Z) = { −tr(GZ)   if tr(Fi Z) + ci = 0, i = 1, . . . , n
                       { −∞        otherwise
dual SDP

maximize −tr(GZ)
subject to Z ⪰ 0, tr(Fi Z) + ci = 0, i = 1, . . . , n

p* = d* if the primal SDP is strictly feasible (∃ x with x1 F1 + · · · + xn Fn ≺ G)
6. Applications in communications

Outline

5 Duality
6 Applications in communications
7 Algorithms – Equality constrained minimization
Examples for applications in communications
downlink beamforming
uplink-downlink duality
multiuser detection
Downlink beamforming problem [Luo et al. 2006]

transmitter beamforming problem in the downlink of wireless communications

base station is equipped with multiple antennas, each of the K mobile terminals with a single antenna

block diagram for K = 2 in the equivalent complex baseband:
(figure: the base station combines symbols u1, u2 via beamformers w1, w2 into the transmit vector x; mobile terminal i receives yi through channel hi^H plus noise zi ~ N(0, σ²))
wi: transmit beamforming vector for the ith user (NT × 1; NT: number of transmit antennas)

ui: information symbol of the ith user with variance E{|ui|²} = 1

x: transmit vector of the base station in the current time step (NT × 1)
received signal of the ith user:

yi = hi^H x + zi, i = 1, . . . , K

hi: channel vector of the ith user
(µth entry (µ = 1, . . . , NT): overall (conjugate) channel coefficient from the µth transmit antenna of the BS to user i)

hi is assumed to be known at the BS and the receiver

zi: i.i.d. additive complex Gaussian noise of variance σ²

yi is a scalar (single-antenna receivers)
transmit vector of the beamformer in the BS:

x = ∑_{i=1}^K ui wi

⇒ representation of the received signal at the ℓth terminal:

yℓ = hℓ^H ( ∑_{i=1}^K ui wi ) + zℓ = uℓ hℓ^H wℓ + ∑_{i=1, i≠ℓ}^K ui hℓ^H wi + zℓ,  ℓ = 1, . . . , K

signal-to-interference-and-noise ratio (SINR) of the ℓth user:

SINRℓ = |hℓ^H wℓ|² / ( ∑_{k≠ℓ} |hℓ^H wk|² + σ² )
design criterion for the beamforming vectors:

minimization of the total transmit power, while satisfying a given set of SINR constraints γℓ for the users (assuming the set is feasible)

minimize E{x^H x} = E{‖x‖2²} = ∑_{i=1}^K ‖wi‖2²
subject to (wℓ^H hℓ hℓ^H wℓ) / ( ∑_{k≠ℓ} wk^H hℓ hℓ^H wk + σ² ) ≥ γℓ ∀ℓ

however: the SINR constraint is not convex ⇒ not a convex optimization problem, but one that can be relaxed or transformed into a convex problem
Relaxation approach

reformulation

define Bi = wi wi^H (NT × NT positive semidefinite matrix)
Hi = hi hi^H
Bi is of rank 1 (dyadic product)

⇒ minimize ∑_{i=1}^K tr(Bi)
  subject to tr(Hℓ Bℓ) − γℓ ∑_{k≠ℓ} tr(Hℓ Bk) ≥ γℓ σ² ∀ℓ
             Bi ⪰ 0, Bi Hermitian, rank(Bi) = 1 ∀i
dropping the rank-1 constraint results in a convex semidefinite programming (SDP) problem (SDP relaxation)

it can be shown that the SDP relaxation is guaranteed to have at least one optimal solution with rank 1
⇒ it can be used to optimally solve the original, nonconvex problem
Transformation into a convex problem

observe that an arbitrary phase rotation can be added to the beamforming vectors without affecting the transmit power or the constraints
⇒ hℓ^H wℓ can be chosen to be real and ≥ 0 without any loss of generality ∀ℓ

constraints: (1 + 1/γℓ) |hℓ^H wℓ|² ≥ ∑_{k=1}^K |hℓ^H wk|² + σ² ∀ℓ
define the following vector and matrix

w = [w1^T w2^T . . . wK^T]^T

Hℓ = [ hℓ^H 0 . . . 0 ; 0 hℓ^H . . . 0 ; . . . ; 0 0 . . . hℓ^H ]   (block-diagonal, K blocks)

⇒ Hℓ w = [ hℓ^H w1 ; hℓ^H w2 ; . . . ; hℓ^H wK ]

the constraint can be written as

(1 + 1/γℓ) |hℓ^H wℓ|² ≥ ‖ [ Hℓ w ; σ ] ‖2²
hℓ^H wℓ can be assumed to be real and non-negative ⇒ taking the square root yields:

√(1 + 1/γℓ) hℓ^H wℓ ≥ ‖ [ Hℓ w ; σ ] ‖2 ∀ℓ

or ‖Aw + b‖2 ≤ c^H w + d

with A = [ Hℓ ; 0^T ], b = [0 . . . 0 σ]^T,
c^H = [0^T 0^T . . . hℓ^H 0^T . . . 0^T] · √(1 + 1/γℓ), d = 0
(one such constraint for each ℓ)

a second-order cone constraint!

the original optimization problem is equivalent to the following second-order cone program
SOCP

minimize τ
subject to √(1 + 1/γℓ) hℓ^H wℓ ≥ ‖ [ Hℓ w ; σ ] ‖2 ∀ℓ
           ‖w‖2 ≤ τ

(this minimizes √(∑_{i=1}^K ‖wi‖2²) under the constraints; √(·) is a monotonically increasing function and does not change the optimum solution for the beamforming filters)
Uplink-downlink duality via Lagrangian duality [Luo et al. 2006]

exploring the dual of convex optimization problems in engineering often reveals the structure of the optimum solution

the Lagrangian dual of the above SOCP problem has an engineering interpretation, known as uplink-downlink duality

several different versions of uplink-downlink duality have been developed in the literature, referring to different figures of merit, e.g., channel capacity
6. Applications in communications6. Applications in communications
duality in the beamforming context: the minimum power needed to achieve a set of SINR targets in a downlink multiple-input multiple-output (MIMO) channel is the same as the minimum power needed to achieve the same set of SINR targets in the uplink channel, where the uplink channel is derived by reversing the input and output of the downlink
these results can be proven via Lagrangian duality in convex optimization in a unified manner
Lagrangian of the downlink beamforming problem:

L(w, λ) = Σ_{i=1}^K w_i^H w_i + Σ_{i=1}^K λ_i ( −(1/γ_i) |h_i^H w_i|² + Σ_{k≠i} |h_i^H w_k|² + σ² )

        = Σ_{i=1}^K λ_i σ² + Σ_{i=1}^K w_i^H ( I − (λ_i/γ_i) h_i h_i^H + Σ_{μ≠i} λ_μ h_μ h_μ^H ) w_i

λ = [λ_1, λ_2, ..., λ_K]^T

Lagrangian dual function:

g(λ) = inf_{w ∈ C^{N_T·K}} L(w, λ)
g(λ) = { Σ_{i=1}^K λ_i σ²   if I − (λ_i/γ_i) h_i h_i^H + Σ_{μ≠i} λ_μ h_μ h_μ^H ⪰ 0 ∀i
       { −∞                  else

dual problem (SDP):

maximize Σ_{i=1}^K λ_i σ²
subject to I + Σ_{μ=1}^K λ_μ h_μ h_μ^H ⪰ (1 + 1/γ_i) λ_i h_i h_i^H   ∀i
the dual problem can be shown to correspond to an uplink problem with λ_i as the (scaled) uplink power, h_i as the uplink channel, and γ_i as the SINR constraint
block diagram of uplink transmission (K = 2):

(figure: the mobile terminals transmit x_1 and x_2 over the channels h_1 and h_2; the base station applies receive filters w_1^H and w_2^H to obtain ỹ_1 and ỹ_2; additive noise z ~ N(0, σ²I))

w_i: receive filter vector of the base station for the ith user (N × 1)
x_i: information symbol of the ith user with variance E{|x_i|²} = ρ_i
h_i: channel vector of the ith user (μth entry: overall channel coefficient from user i to the μth receive antenna of the BS); N × 1
z: i.i.d. additive complex Gaussian noise vector with covariance matrix σ²I
received signal for user ℓ after filtering at the BS:

y_ℓ = w_ℓ^H ( Σ_{i=1}^K x_i h_i + z ),   ℓ = 1, ..., K

    = x_ℓ w_ℓ^H h_ℓ + Σ_{i=1, i≠ℓ}^K x_i w_ℓ^H h_i + w_ℓ^H z
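The decomposition above (desired signal, multiuser interference, filtered noise) can be checked numerically; a small NumPy sketch with an illustrative random instance (the dimensions K, N and the random draws are assumptions for the demonstration):

```python
import numpy as np

# Check: filtering the sum signal equals the sum of the three terms above.
rng = np.random.default_rng(0)
K, N = 3, 4
h = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))  # h[i]: channel of user i
w = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))  # w[l]: receive filter of user l
x = rng.standard_normal(K) + 1j * rng.standard_normal(K)            # information symbols
z = rng.standard_normal(N) + 1j * rng.standard_normal(N)            # noise

l = 0
y_direct = np.vdot(w[l], sum(x[i] * h[i] for i in range(K)) + z)    # w_l^H (sum_i x_i h_i + z)
y_split = (x[l] * np.vdot(w[l], h[l])                               # desired signal
           + sum(x[i] * np.vdot(w[l], h[i]) for i in range(K) if i != l)  # interference
           + np.vdot(w[l], z))                                      # filtered noise
assert np.isclose(y_direct, y_split)
```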
SINR of the ℓth user:

SINR_ℓ = ρ_ℓ |w_ℓ^H h_ℓ|² / ( Σ_{k≠ℓ} ρ_k |w_ℓ^H h_k|² + σ² w_ℓ^H w_ℓ )

design criterion for the receive filter vectors: minimize the total sum transmit power of the mobile terminals, while satisfying a given set of SINR constraints γ_ℓ for the users

minimize Σ_{i=1}^K ρ_i
subject to ρ_ℓ w_ℓ^H h_ℓ h_ℓ^H w_ℓ / ( Σ_{k≠ℓ} ρ_k w_ℓ^H h_k h_k^H w_ℓ + σ² w_ℓ^H w_ℓ ) ≥ γ_ℓ   ∀ℓ
the optimal w_i is the minimum mean-squared error (MMSE) filter

w_i = ( Σ_{k=1}^K ρ_k h_k h_k^H + σ² I )^{-1} h_i
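A quick NumPy sanity check of the MMSE filter formula: for a random instance (dimensions and powers are illustrative assumptions), the MMSE filter should achieve an SINR at least as large as a simple matched filter w = h_i, since the MMSE/MVDR filter maximizes the SINR:

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, sigma2 = 3, 4, 0.5
h = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))  # columns: user channels
rho = rng.uniform(0.5, 2.0, K)                                      # uplink powers

# covariance of the received signal: sum_k rho_k h_k h_k^H + sigma^2 I
R = sum(rho[k] * np.outer(h[:, k], h[:, k].conj()) for k in range(K)) + sigma2 * np.eye(N)

def sinr(w, i):
    # SINR formula from the slide above
    num = rho[i] * abs(w.conj() @ h[:, i]) ** 2
    den = sum(rho[k] * abs(w.conj() @ h[:, k]) ** 2 for k in range(K) if k != i) \
          + sigma2 * (w.conj() @ w).real
    return num / den

for i in range(K):
    w_mmse = np.linalg.solve(R, h[:, i])        # (sum rho_k h_k h_k^H + sigma^2 I)^{-1} h_i
    assert sinr(w_mmse, i) >= sinr(h[:, i], i) - 1e-12
```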
substituting the MMSE filters into the constraints and performing some matrix manipulations, it can be shown that the uplink problem is equivalent to

minimize Σ_{i=1}^K ρ_i
subject to (1 + 1/γ_i) ρ_i h_i h_i^H ⪯ σ² I + Σ_{μ=1}^K ρ_μ h_μ h_μ^H   ∀i
with λ_i = ρ_i/σ², the dual downlink problem and the uplink problem are identical, except that maximization and minimization are reversed and the right-hand and left-hand sides of the constraint inequalities have been interchanged
finally, both problems turn out to have the same solution
strong duality holds for the original and dual downlink problem (convex optimization problem)
⇒ original and dual problem must have the same optimal value
⇒ the solution of the dual uplink problem also solves the original downlink problem
the dual variables of the downlink problem can be interpreted as uplink powers scaled by the noise variance
there is a quite general uplink-downlink duality, which is useful because the uplink problem is easier to solve (e.g., iteratively)
Multiuser detection [Luo et al. 2006]
consider a multiple-input multiple-output (MIMO) transmission with received vector

y = √(ρ/n) H s + z

ρ: total signal-to-noise ratio (SNR)
H: n × m channel matrix; entry (ν, μ): channel coefficient from the μth transmit antenna to the νth receive antenna
m: number of transmit antennas, n: number of receive antennas
s: transmitted symbol vector with BPSK entries, s ∈ {−1, +1}^m
z: additive complex Gaussian noise vector with i.i.d. entries
the channel capacity of the MIMO channel is known to be proportional to the number of transmit antennas (for a sufficient number of receive antennas)
efficient detection algorithms are needed in order to realize the potential gains of a MIMO transmission
maximum-likelihood detection:

minimize || y − √(ρ/n) H s ||_2²
subject to s ∈ {±1}^m

a nonconvex problem!
known methods to find the ML solution:
1 full search
2 sphere decoder
in the following, a semidefinite programming relaxation method with polynomial complexity is described
for simplicity, we consider the case m = n
H, s, and z are assumed to be real
rewrite the decision metric as

|| y − √(ρ/n) H s ||_2² = tr(Q x x^T)

with real-valued matrix Q and vector x,

Q = [ (ρ/n) H^T H      −√(ρ/n) H^T y ]
    [ −√(ρ/n) y^T H      ||y||_2²    ],   x = [ s ; 1 ]
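The identity ||y − √(ρ/n) H s||₂² = tr(Q x xᵀ) can be verified numerically; a NumPy sketch with an illustrative random instance:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
rho = 2.0
H = rng.standard_normal((n, n))
s = rng.choice([-1.0, 1.0], size=n)                 # BPSK symbol vector
y = np.sqrt(rho / n) * H @ s + 0.1 * rng.standard_normal(n)

c = np.sqrt(rho / n)
Q = np.block([[ (rho / n) * H.T @ H, -c * H.T @ y[:, None]],
              [-c * (y[None, :] @ H),  np.array([[y @ y]])]])
x = np.concatenate([s, [1.0]])                      # x = [s; 1]

# tr(Q x x^T) = x^T Q x must equal the ML decision metric
assert np.isclose(np.trace(Q @ np.outer(x, x)),
                  np.linalg.norm(y - c * H @ s) ** 2)
```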
define X = x x^T; X ⪰ 0, X_ii = 1, rank(X) = 1 if and only if X = x x^T for some x with x_i = ±1
write the original problem in terms of X and relax the rank-1 constraint
⇒ SDP relaxation:

minimize tr(Q X)
subject to X ⪰ 0, X_ii = 1 ∀i

solve the SDP, which, however, in general will yield a matrix X with rank greater than one
randomized procedure to generate a feasible rank-1 solution x_SDR:
1 compute the largest eigenvalue of the optimum solution of the SDP and the associated unit-norm eigenvector v = [v_1 v_2 ... v_{n+1}]^T
2 generate L i.i.d. binary vectors x_ℓ, ℓ = 1, ..., L, whose ith entry (i = 1, ..., n+1) follows the distribution
  Pr{x_i = +1} = (1 + v_i)/2,   Pr{x_i = −1} = (1 − v_i)/2
3 pick x_SDR = argmin_{x_ℓ} x_ℓ^T Q x_ℓ
set s_SDR to the first n entries of x_SDR multiplied by the last entry (to correct the sign)
excellent performance-complexity tradeoff of the SDP detector in practical SNR ranges
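Steps 2 and 3 of the randomized procedure can be sketched in NumPy. The SDP solve itself (step 1) would need a solver and is omitted; the unit-norm vector v below is a stand-in for the dominant eigenvector, and Q, n, L are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
Q = rng.standard_normal((n + 1, n + 1))
Q = Q + Q.T                                   # symmetric cost matrix
v = rng.standard_normal(n + 1)
v /= np.linalg.norm(v)                        # stand-in for the unit-norm eigenvector

L = 50
# sample entries with Pr{x_i = +1} = (1 + v_i)/2
cands = np.where(rng.random((L, n + 1)) < (1 + v) / 2, 1.0, -1.0)
metrics = np.einsum('li,ij,lj->l', cands, Q, cands)   # x_l^T Q x_l for every sample
x_sdr = cands[np.argmin(metrics)]                     # step 3: best candidate
s_sdr = x_sdr[:n] * x_sdr[n]                          # first n entries, sign-corrected

assert set(np.unique(s_sdr)) <= {-1.0, 1.0}           # feasible BPSK decision
assert np.isclose(metrics.min(), x_sdr @ Q @ x_sdr)
```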
bit error rate vs. signal-to-noise ratio (SNR)
(from [Luo et al. 2006])
Reference
[Luo et al. 2006] Z.-Q. Luo and W. Yu, "An introduction to convex optimization for communications and signal processing", IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1426-1438, Aug. 2006.
7. Algorithms – Equality constrained minimization
Outline
6 Applications in communications
7 Algorithms – Equality constrained minimization
8 Conclusions
Equality constrained minimization
equality constrained minimization
eliminating equality constraints
Newton’s method with equality constraints
Equality constrained minimization
minimize f(x)
subject to Ax = b

f convex, twice continuously differentiable
A ∈ R^{p×n} with rank A = p
we assume p* is finite and attained

optimality conditions: x* is optimal iff there exists a ν* such that

∇f(x*) + A^T ν* = 0,   Ax* = b
equality constrained quadratic minimization (with P ∈ S^n_+):

minimize (1/2) x^T P x + q^T x + r
subject to Ax = b

optimality condition:

[ P   A^T ] [ x* ]   [ −q ]
[ A   0   ] [ ν* ] = [  b ]

the coefficient matrix is called the KKT matrix
the KKT matrix is nonsingular if and only if

Ax = 0, x ≠ 0 ⟹ x^T P x > 0

equivalent condition for nonsingularity: P + A^T A ≻ 0
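Solving the KKT system directly gives both x* and ν*; a minimal NumPy sketch for a random positive definite instance (dimensions are illustrative assumptions), checking the two optimality conditions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 5, 2
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)                      # positive definite P
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))              # full row rank (generic random matrix)
b = rng.standard_normal(p)

# assemble and solve the KKT system [P A^T; A 0] [x; nu] = [-q; b]
KKT = np.block([[P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(KKT, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

assert np.allclose(A @ x_star, b)                       # primal feasibility
assert np.allclose(P @ x_star + q + A.T @ nu_star, 0)   # stationarity
```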
Eliminating equality constraints

represent the solution set {x | Ax = b} as

{x | Ax = b} = {F z + x̂ | z ∈ R^{n−p}}

x̂ is (any) particular solution
range of F ∈ R^{n×(n−p)} is the nullspace of A (rank F = n − p and AF = 0)
reduced or eliminated problem:

minimize f(F z + x̂)

an unconstrained problem with variable z ∈ R^{n−p}
from the solution z*, obtain x* and ν* as

x* = F z* + x̂,   ν* = −(A A^T)^{-1} A ∇f(x*)
example: optimal allocation with resource constraint

minimize f_1(x_1) + f_2(x_2) + ... + f_n(x_n)
subject to x_1 + x_2 + ... + x_n = b

eliminate x_n = b − x_1 − ... − x_{n−1}, i.e., choose

x̂ = b e_n,   F = [ I ; −1^T ] ∈ R^{n×(n−1)}

reduced problem:

minimize f_1(x_1) + ... + f_{n−1}(x_{n−1}) + f_n(b − x_1 − ... − x_{n−1})

(variables x_1, ..., x_{n−1})
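The elimination above can be sketched in NumPy for quadratic costs f_i(x_i) = a_i x_i² (a hypothetical choice so that the reduced problem is a least-squares problem); the check confirms AF = 0, feasibility, and the equal-marginal-cost optimality condition:

```python
import numpy as np

n, b = 4, 10.0
a = np.array([1.0, 2.0, 4.0, 8.0])               # cost weights (assumed for the example)
A = np.ones((1, n))                              # constraint 1^T x = b
F = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])   # F = [I; -1^T]
x_hat = b * np.eye(n)[:, -1]                     # particular solution b * e_n

assert np.allclose(A @ F, 0)                     # range(F) = nullspace(A)

# minimize sum_i a_i (F z + x_hat)_i^2  <=>  least squares in z
D = np.diag(np.sqrt(a))
z, *_ = np.linalg.lstsq(D @ F, -D @ x_hat, rcond=None)
x = F @ z + x_hat                                # recover x* from z*

assert np.isclose(x.sum(), b)                    # feasible
g = 2 * a * x                                    # gradients of the f_i
assert np.allclose(g, g[0])                      # equal marginal costs at the optimum
```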
Newton step

the Newton step ∆x_nt of f at a feasible x is given by the solution v of

[ ∇²f(x)   A^T ] [ v ]   [ −∇f(x) ]
[ A        0   ] [ w ] = [    0   ]

interpretations:
∆x_nt solves the second-order approximation (with variable v)

minimize f̂(x + v) = f(x) + ∇f(x)^T v + (1/2) v^T ∇²f(x) v
subject to A(x + v) = b

the ∆x_nt equations follow from linearizing the optimality conditions

∇f(x + v) + A^T w ≈ ∇f(x) + ∇²f(x) v + A^T w = 0,   A(x + v) = b
Newton decrement

λ(x) = ( ∆x_nt^T ∇²f(x) ∆x_nt )^{1/2} = ( −∇f(x)^T ∆x_nt )^{1/2}

properties:
gives an estimate of f(x) − p* using the quadratic approximation f̂:

f(x) − inf_{Ay=b} f̂(y) = (1/2) λ(x)²

directional derivative in the Newton direction:

(d/dt) f(x + t ∆x_nt) |_{t=0} = −λ(x)²

in general, λ(x) ≠ ( ∇f(x)^T ∇²f(x)^{-1} ∇f(x) )^{1/2}
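The two expressions for λ(x)² can be verified numerically for f(x) = Σ exp(x_i) (a hypothetical smooth convex test function with a diagonal Hessian), along with the fact that the Newton step stays in the nullspace of A:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 5, 2
A = rng.standard_normal((p, n))
x = rng.standard_normal(n)                       # any x is feasible for b := A x

grad = np.exp(x)                                 # gradient of sum exp(x_i)
hess = np.diag(np.exp(x))                        # Hessian
KKT = np.block([[hess, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(KKT, np.concatenate([-grad, np.zeros(p)]))
dx_nt, w = sol[:n], sol[p and n:]                # Newton step and dual variable

assert np.allclose(A @ dx_nt, 0)                 # step preserves feasibility
lam2_a = dx_nt @ hess @ dx_nt                    # dx^T  Hess  dx
lam2_b = -grad @ dx_nt                           # -grad^T dx
assert np.isclose(lam2_a, lam2_b)                # both equal lambda(x)^2
```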
Newton's method with equality constraints

given a starting point x ∈ dom f with Ax = b, tolerance ε > 0.
repeat
1 Compute the Newton step and decrement ∆x_nt, λ(x).
2 Stopping criterion. Quit if λ²/2 ≤ ε.
3 Line search. Choose step size t by backtracking line search.
4 Update. x := x + t ∆x_nt.

a feasible descent method: x^{(k)} feasible and f(x^{(k+1)}) < f(x^{(k)})
affine invariant
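The algorithm above can be sketched compactly in NumPy for f(x) = Σ exp(x_i) subject to Ax = b; the backtracking parameters α = 0.1, β = 0.5 and the random instance are assumptions for the illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, eps = 6, 2, 1e-10
A = rng.standard_normal((p, n))
x = rng.standard_normal(n)
b = A @ x                                            # feasible starting point by construction

f = lambda v: np.exp(v).sum()
for _ in range(50):
    grad, hess = np.exp(x), np.diag(np.exp(x))
    KKT = np.block([[hess, A.T], [A, np.zeros((p, p))]])
    sol = np.linalg.solve(KKT, np.concatenate([-grad, np.zeros(p)]))
    dx, nu = sol[:n], sol[n:]
    lam2 = -grad @ dx                                # Newton decrement squared
    if lam2 / 2 <= eps:                              # stopping criterion
        break
    t = 1.0
    while f(x + t * dx) > f(x) - 0.1 * t * lam2:     # backtracking line search
        t *= 0.5
    x = x + t * dx                                   # update (stays feasible: A dx = 0)

assert lam2 / 2 <= eps                               # converged
assert np.allclose(A @ x, b)                         # all iterates remained feasible
assert np.allclose(np.exp(x) + A.T @ nu, 0, atol=1e-2)  # optimality condition (approx.)
```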
Newton's method and elimination

Newton's method for the reduced problem

minimize f̃(z) = f(F z + x̂)

variables z ∈ R^{n−p}; x̂ satisfies A x̂ = b; rank F = n − p and AF = 0
Newton's method for f̃, started at z^{(0)}, generates iterates z^{(k)}

Newton's method with equality constraints: when started at x^{(0)} = F z^{(0)} + x̂, the iterates are

x^{(k+1)} = F z^{(k+1)} + x̂

hence, no separate convergence analysis is needed
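The correspondence of the iterates rests on the Newton steps coinciding: ∆x_nt = F ∆z_nt. A NumPy check for f(x) = Σ exp(x_i) (the same hypothetical test function as above), with F built from the SVD of A:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 5, 2
A = rng.standard_normal((p, n))
x_hat = np.linalg.lstsq(A, rng.standard_normal(p), rcond=None)[0]  # particular solution
_, _, Vt = np.linalg.svd(A)
F = Vt[p:].T                                     # orthonormal basis of nullspace(A)
z = rng.standard_normal(n - p)
x = F @ z + x_hat                                # feasible point

# full-space Newton step via the KKT system
grad, hess = np.exp(x), np.diag(np.exp(x))
KKT = np.block([[hess, A.T], [A, np.zeros((p, p))]])
dx = np.linalg.solve(KKT, np.concatenate([-grad, np.zeros(p)]))[:n]

# reduced-space Newton step for f~(z) = f(Fz + x_hat)
dz = np.linalg.solve(F.T @ hess @ F, -(F.T @ grad))

assert np.allclose(dx, F @ dz)                   # the two Newton steps coincide
```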
8. Conclusions
Outline
7 Algorithms – Equality constrained minimization
8 Conclusions
Conclusions

mathematical optimization
problems in engineering design, data analysis and statistics, economics, management, etc., can often be expressed as mathematical optimization problems
techniques exist to take into account multiple objectives or uncertainty in the data

tractability

roughly speaking, tractability in optimization requires convexity
algorithms for nonconvex optimization find local (suboptimal) solutions, or are very expensive
surprisingly many applications can be formulated as convex problems
theoretical consequences of convexity
locally optimal solutions are globally optimal
the set of globally optimal solutions is always convex
extensive duality theory:
systematic way of deriving lower bounds on the optimal value
necessary and sufficient optimality conditions
solution methods with polynomial worst-case complexity
practical consequences of convexity: (most) convex problems can be solved globally and efficiently
interior-point methods require 20 to 80 steps in practice
basic algorithms (e.g., Newton, barrier method, etc.) are easy to implement and work well for small and medium size problems (larger problems if structure is exploited)
more and more high-quality implementations of advanced algorithms and modeling tools are becoming available
high-level modeling tools like cvx ease modeling and problem specification
how to use convex optimization in some applied context

use rapid prototyping and approximate modeling: start with simple models, small problem instances, inefficient solution methods
if you don't like the results, no need to expend further effort on more accurate models or efficient algorithms
work out, simplify, and interpret the optimality conditions and the dual
even if the problem is quite nonconvex, you can use convex optimization
in subproblems, e.g., to find a search direction
by repeatedly forming and solving a convex approximation at the current point
to obtain an approximate or even optimum solution via problem relaxation
applications in communications and signal processing

power allocation
optimization of beamformers/filters and signal powers in multiuser wireless networks (uplink and downlink)
detection (SISO, MIMO)
calculation of channel capacity
design of source codes
design of transmission policies for sensor networks with energy harvesting
signal recovery
blind source separation
radar signal design
optimization of power flow in smart grids
etc.
connections to current research topics for Master's / Ph.D. theses

knowledge of convex optimization is highly beneficial for research projects on
cellular systems with buffer-aided relaying,
wireless networks with information and power transfer (energy harvesting),
resource allocation,
etc.