L. Vandenberghe ECE236C (Spring 2020)
9. Dual decomposition
• dual methods
• dual decomposition
• network utility maximization
• network flow optimization
Dual methods
primal:  minimize  f(x) + g(Ax)

dual:  maximize  −g∗(z) − f∗(−AT z)
reasons why dual problem may be easier to solve by first-order methods:
• dual problem is unconstrained or has simple constraints (for example, z ⪰ 0)
• dual objective is differentiable or has a simple nondifferentiable term
• decomposition: exploit separable structure
(Sub-)gradients of conjugate function
assume f : Rn → R is closed and convex with conjugate
f∗(y) = supx (yT x − f(x))
• f ∗ is subdifferentiable on (at least) int dom f ∗ (page 2.4)
• maximizers in the definition of f ∗(y) are subgradients at y (page 5.15)
y ∈ ∂ f (x) ⇐⇒ yT x − f (x) = f ∗(y) ⇐⇒ x ∈ ∂ f ∗(y)
• if f is strictly convex, maximizer is unique (hence, equal to ∇ f∗(y)) if it exists
• if f is strongly convex, then conjugate is defined for all y and differentiable, with

‖∇ f∗(y) − ∇ f∗(y′)‖ ≤ (1/µ)‖y − y′‖∗ for all y, y′
(µ is strong convexity constant of f with respect to ‖ · ‖); see page 5.19
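These facts can be checked numerically on the strongly convex quadratic f(x) = (1/2)xTQx with Q ≻ 0 (an assumed toy example, not from the slides): f∗(y) = (1/2)yTQ⁻¹y, the maximizer x̂ = Q⁻¹y equals ∇f∗(y), and ∇f∗ is Lipschitz with constant 1/µ where µ = λmin(Q). A minimal sketch:

```python
import numpy as np

# Toy check with f(x) = (1/2) x^T Q x, Q > 0 (assumed example): then
# f*(y) = (1/2) y^T Q^{-1} y, the maximizer is x_hat = Q^{-1} y = grad f*(y),
# and grad f* is Lipschitz with constant 1/mu, mu = lambda_min(Q).
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Q = M @ M.T + np.eye(4)              # positive definite
y, yp = rng.standard_normal(4), rng.standard_normal(4)

x_hat = np.linalg.solve(Q, y)        # maximizer of y^T x - f(x)
print(np.linalg.norm(Q @ x_hat - y)) # grad f(x_hat) = y, i.e. y is a subgradient of f at x_hat

mu = np.linalg.eigvalsh(Q).min()     # strong convexity constant of f
lhs = np.linalg.norm(np.linalg.solve(Q, y - yp))   # ||grad f*(y) - grad f*(yp)||
print(lhs <= np.linalg.norm(y - yp) / mu)          # True
```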
Equality constraints
Primal and dual problems
primal:  minimize  f(x)
         subject to  Ax = b

dual:  maximize  −bT z − f∗(−AT z)
Dual gradient ascent algorithm (assuming dom f ∗ = Rn)
x̂ = argminx (f(x) + zT Ax)

z+ = z + t(Ax̂ − b)
• step one computes a subgradient x̂ ∈ ∂ f∗(−AT z)
• step two computes a subgradient b − Ax̂ of bT z + f∗(−AT z) at z
of interest if calculation of x̂ is inexpensive (for example, f is separable)
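A minimal numerical sketch of this iteration, on the toy problem minimize (1/2)‖x‖² subject to Ax = b (random data, assumed for illustration; here x̂ = −AT z in closed form and the solution is the least-norm point):

```python
import numpy as np

# Dual gradient ascent for: minimize (1/2)||x||^2 subject to Ax = b.
# Random data, purely illustrative; the known solution is the least-norm
# point x* = A^T (A A^T)^{-1} b.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 8))
b = rng.standard_normal(3)

z = np.zeros(3)
t = 1.0 / np.linalg.eigvalsh(A @ A.T).max()   # fixed step, safe for this quadratic
for _ in range(1000):
    x_hat = -A.T @ z                # argmin_x ((1/2)||x||^2 + z^T A x)
    z = z + t * (A @ x_hat - b)     # ascent step: dual gradient is A x_hat - b

x_star = A.T @ np.linalg.solve(A @ A.T, b)
print(np.linalg.norm(x_hat - x_star))          # ~0
```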
Dual decomposition
Convex problem with separable objective
minimize  f1(x1) + f2(x2)
subject to  A1x1 + A2x2 ⪯ b
constraint is complicating or coupling constraint
Dual problem
maximize  −f∗1(−AT1 z) − f∗2(−AT2 z) − bT z
subject to  z ⪰ 0

can be solved by (sub-)gradient projection if z ⪰ 0 is the only constraint
Dual subgradient projection
Subproblem: to calculate f∗j(−ATj z) and a (sub-)gradient for it,

minimize (over xj)  fj(xj) + zT Aj xj

• optimal value is −f∗j(−ATj z)
• minimizer x̂ j is in ∂ f∗j(−ATj z)
Dual subgradient projection method
x̂ j = argminxj (fj(xj) + zT Aj xj),   j = 1, 2

z+ = (z + t(A1 x̂1 + A2 x̂2 − b))+
• minimization problems over x1, x2 are independent
• z-update is projected subgradient step (u+ = max{u,0} elementwise)
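The two steps can be sketched on a toy instance with A1 = A2 = I and fj(xj) = (1/2)‖xj − aj‖² (data made up for illustration; for this model the optimal prices have the closed form z∗ = ((a1 + a2 − b)/2)+, which the iteration should recover):

```python
import numpy as np

# Two-block dual decomposition: f_j(x_j) = (1/2)||x_j - a_j||^2, A_1 = A_2 = I,
# coupling constraint x_1 + x_2 <= b (componentwise). Data is made up.
rng = np.random.default_rng(2)
a1, a2 = rng.standard_normal(5), rng.standard_normal(5)
b = rng.standard_normal(5)

z = np.zeros(5)
t = 0.25
for _ in range(200):
    x1_hat = a1 - z      # argmin_x1 ((1/2)||x1 - a1||^2 + z^T x1)
    x2_hat = a2 - z      # the two subproblems are solved independently
    z = np.maximum(z + t * (x1_hat + x2_hat - b), 0)   # projected subgradient step

z_star = np.maximum((a1 + a2 - b) / 2, 0)   # closed-form optimal prices
print(np.linalg.norm(z - z_star))            # ~0
```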
Interpretation as price coordination
• two units in a system; unit j chooses decision variable xj
• constraints are limits on shared resources; zi is price of resource i
Dual update: depends on slacks s = b − A1 x̂1 − A2 x̂2
z+ = (z − ts)+
• increases price zi if resource is over-utilized (si < 0)
• decreases price zi if resource is under-utilized (si > 0)
• never lets prices get negative
Distributed architecture: central node sets prices z, peripheral node j sets x j
[figure: central node broadcasts prices z to units 1 and 2; unit j returns its resource usage Aj x̂ j]
Example
Quadratic optimization problem
minimize   ∑j ((1/2)xTj Pj xj + qTj xj)
subject to  Bj xj ⪯ dj,  j = 1, . . . , r
            ∑j Aj xj ⪯ b

(sums over j = 1, . . . , r)
• without last inequality, problem would separate into r independent QPs
• we assume Pj ≻ 0
Formulation for dual decomposition
minimize   ∑j fj(xj)
subject to  ∑j Aj xj ⪯ b

where fj(xj) = (1/2)xTj Pj xj + qTj xj with domain {xj | Bj xj ⪯ dj}
Dual problem
maximize   −bT z − ∑j f∗j(−ATj z)
subject to  z ⪰ 0

• gradient of h(z) = ∑j f∗j(−ATj z) is Lipschitz continuous (since Pj ≻ 0):

  ‖∇h(z) − ∇h(z′)‖2 ≤ ((‖A‖2)² / minj λmin(Pj)) ‖z − z′‖2

  where A = [ A1 · · · Ar ]

• function value of −f∗j(−ATj z) is the optimal value of the QP

  minimize (over xj)  (1/2)xTj Pj xj + (qj + ATj z)T xj
  subject to  Bj xj ⪯ dj

• optimal solution x̂ j is the gradient: x̂ j = ∇ f∗j(−ATj z)
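The Lipschitz bound can be verified numerically in the simplified case without the constraints Bj xj ⪯ dj, so that each subproblem has the closed-form minimizer x̂ j = −P⁻¹j(qj + ATj z) (random data, purely illustrative):

```python
import numpy as np

# Check of the Lipschitz bound for grad h, with the block constraints dropped
# so that x_hat_j = -P_j^{-1} (q_j + A_j^T z) in closed form. Random data.
rng = np.random.default_rng(3)
r, n, m = 4, 6, 3
Ps, As, qs = [], [], []
for _ in range(r):
    M = rng.standard_normal((n, n))
    Ps.append(M @ M.T + np.eye(n))           # P_j > 0
    As.append(rng.standard_normal((m, n)))
    qs.append(rng.standard_normal(n))

def grad_h(z):
    # gradient of h(z) = sum_j f_j*(-A_j^T z) is -sum_j A_j x_hat_j
    g = np.zeros(m)
    for P, A, q in zip(Ps, As, qs):
        g -= A @ (-np.linalg.solve(P, q + A.T @ z))   # block minimizer x_hat_j
    return g

z, zp = rng.standard_normal(m), rng.standard_normal(m)
lhs = np.linalg.norm(grad_h(z) - grad_h(zp))
A_full = np.hstack(As)
L = np.linalg.norm(A_full, 2) ** 2 / min(np.linalg.eigvalsh(P).min() for P in Ps)
print(lhs <= L * np.linalg.norm(z - zp))     # True
```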
Numerical example
• 10 subproblems (r = 10), each with 100 variables and 100 constraints
• 10 coupling constraints
• projected gradient descent and FISTA, with the same fixed step size
[figure: relative dual suboptimality versus iteration (semilog scale), comparing gradient projection and FISTA]
Network utility maximization
Network flows
• n flows, with fixed routes, in a network with m links
• variable xj ≥ 0 denotes the rate of flow j
• flow utility is Uj : R → R, concave, increasing
Capacity constraints
• traffic yi on link i is sum of flows passing through it
• y = Rx, where R is the routing matrix:

  Rij = 1 if flow j passes over link i,  0 otherwise

• link capacity constraint: y ⪯ c
Dual network utility maximization problem
primal:  maximize   ∑j Uj(xj)
         subject to  Rx ⪯ c

dual:  minimize   cT z + ∑j (−Uj)∗(−rTj z)
       subject to  z ⪰ 0
• r j is column j of R
• dual variable zi is price (per unit flow) for using link i
• rTj z is the sum of prices along route j
(Sub-)gradients of dual function
Dual objective
f(z) = cT z + ∑j (−Uj)∗(−rTj z)
     = cT z + ∑j supxj (Uj(xj) − (rTj z)xj)

Subgradient

c − Rx̂ ∈ ∂ f(z)  where  x̂ j = argmaxxj (Uj(xj) − (rTj z)xj)

• rTj z is the sum of link prices along route j
• c − Rx̂ is the vector of link capacity margins for flow x̂
• if Uj is strictly concave, this is a gradient
Dual decomposition algorithm
given initial link price vector z ≻ 0 (e.g., z = 1), repeat:
1. sum link prices along each route: calculate λj = rTj z for j = 1, . . . , n
2. optimize flows (separately) using flow prices
x̂ j = argmaxxj (Uj(xj) − λj xj),   j = 1, . . . , n
3. calculate link capacity margins s = c − Rx̂
4. update link prices using projected (sub-)gradient step with step t
z := (z − ts)+
Decentralized:
• to find λj, x̂ j, source j only needs to know the prices on its route
• to update si, zi, link i only needs to know the flows that pass through it
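The four steps can be sketched with log utilities Uj(xj) = log xj (so step 2 is x̂ j = 1/λj in closed form) on a made-up network with two links and three flows; by symmetry the optimal rates work out to 1/3 for the two-link flow and 2/3 for the others:

```python
import numpy as np

# NUM sketch with U_j(x_j) = log x_j on a made-up network: flow 1 uses links
# 1 and 2; flows 2 and 3 use one link each. Optimal rates: [1/3, 2/3, 2/3].
R = np.array([[1.0, 1.0, 0.0],      # routing matrix (links x flows)
              [1.0, 0.0, 1.0]])
c = np.array([1.0, 1.0])            # link capacities

z = np.ones(2)                      # initial link prices
t = 0.1
for _ in range(5000):
    lam = R.T @ z                   # 1. price along each route
    x_hat = 1.0 / lam               # 2. argmax_x (log x - lam * x) = 1/lam
    s = c - R @ x_hat               # 3. link capacity margins
    z = np.maximum(z - t * s, 0)    # 4. projected (sub-)gradient step

print(np.round(x_hat, 4))
```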
Single commodity network flow
Network
• connected, directed graph with n links/arcs, m nodes
• node-arc incidence matrix A ∈ Rm×n is

  Aij = 1 if arc j enters node i,  −1 if arc j leaves node i,  0 otherwise
Flow vector and external sources
• variable x j denotes flow (traffic) on arc j
• bi is external demand (or supply) of flow at node i (satisfies 1T b = 0)
• flow conservation: Ax = b
Network flow optimization problem
minimize   φ(x) = ∑j φj(xj)
subject to  Ax = b
• φ is a separable sum of convex functions
• dual decomposition yields decentralized solution method
Dual problem (a j is jth column of A)
maximize  −bT z − ∑j φ∗j(−aTj z)
• dual variable zi can be interpreted as potential at node i
• y j = −aTj z is the potential difference across arc j
(potential at start node minus potential at end node)
(Sub-)gradients of dual function
Negative dual objective
f(z) = bT z + ∑j φ∗j(−aTj z)

Subgradient

b − Ax̂ ∈ ∂ f(z)  where  x̂ j = argminxj (φj(xj) + (aTj z)xj)

• this is a gradient if the functions φj are strictly convex
• if φj is differentiable, φ′j(x̂ j) = −aTj z
Dual decomposition network flow algorithm
given initial potential vector z, repeat
1. determine link flows from potential differences y = −AT z
x̂ j = argminxj (φj(xj) − yj xj),   j = 1, . . . , n
2. compute flow residual at each node: s := b − Ax̂
3. update node potentials using (sub-)gradient step with step size t
z := z − ts
Decentralized:
• flow is calculated from potential difference across arc
• node potential is updated from its own flow surplus
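A sketch of the algorithm on a small assumed graph with quadratic arc costs φj(xj) = (1/2)wj x²j, so step 1 reduces to x̂ j = yj/wj; the result is compared against the flow obtained directly from the KKT conditions:

```python
import numpy as np

# Network flow sketch: 4 nodes, 5 arcs (1->2, 1->3, 2->3, 2->4, 3->4),
# quadratic arc costs phi_j(x_j) = (1/2) w_j x_j^2, so step 1 is x_hat = y / w.
# One unit of flow enters at node 1 and leaves at node 4. Data assumed.
A = np.array([[-1, -1,  0,  0,  0],
              [ 1,  0, -1, -1,  0],
              [ 0,  1,  1,  0, -1],
              [ 0,  0,  0,  1,  1]], dtype=float)
b = np.array([-1.0, 0.0, 0.0, 1.0])
w = np.array([1.0, 2.0, 1.0, 2.0, 1.0])

z = np.zeros(4)
t = 0.2
for _ in range(2000):
    y = -A.T @ z            # 1. potential differences across arcs
    x_hat = y / w           #    argmin_xj ((1/2) w_j x_j^2 - y_j x_j)
    s = b - A @ x_hat       # 2. flow residual at each node
    z = z - t * s           # 3. (sub-)gradient step on node potentials

# optimal flow from the KKT conditions: W x = A^T lam, A x = b
lam = np.linalg.pinv(A @ np.diag(1.0 / w) @ A.T) @ b
x_star = (A.T @ lam) / w
print(np.linalg.norm(x_hat - x_star))    # ~0
```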
Electrical network interpretation
network flow optimality conditions (with differentiable φ j)
Ax = b,   y + AT z = 0,   yj = φ′j(xj),  j = 1, . . . , n
network with node incidence matrix A, nonlinear resistors in branches
Kirchhoff current law (KCL): Ax = b
x j is the current flow in branch j; bi is external current extracted at node i
Kirchhoff voltage law (KVL): y + AT z = 0
zi is the potential at node i; yj = −aTj z is the jth branch voltage
Current-voltage characteristics: yj = φ′j(xj)

for example, φj(xj) = Rj xj²/2 for a linear resistor with resistance Rj
currents and potentials in the circuit are the optimal flows and dual variables
Example: minimum queueing delay
Flow cost function and conjugate (cj > 0 is link capacity):

φj(xj) = xj/(cj − xj),   with dom φj = [0, cj)

φ∗j(yj) = (√(cj yj) − 1)²  if yj > 1/cj,   0  if yj ≤ 1/cj

• φj is differentiable except at xj = 0:

  ∂φj(0) = (−∞, 1/cj],   φ′j(xj) = cj/(cj − xj)²   (0 < xj < cj)

• φ∗j is differentiable:

  φ∗j′(yj) = 0  if yj ≤ 1/cj,   φ∗j′(yj) = cj − √(cj/yj)  if yj > 1/cj
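The conjugate formula can be sanity-checked by brute force: maximize yj xj − φj(xj) over a fine grid of xj and compare with the closed form (c = 2 is an arbitrary test capacity):

```python
import numpy as np

# Brute-force check of the conjugate: phi(x) = x/(c - x) on [0, c), so
# phi*(y) should equal (sqrt(c*y) - 1)^2 for y > 1/c and 0 otherwise.
c = 2.0
x = np.linspace(0.0, c - 1e-6, 2_000_000)
phi = x / (c - x)                     # phi evaluated on the grid

def phi_conj(y):
    return (np.sqrt(c * y) - 1) ** 2 if y > 1 / c else 0.0

results = {}
for y in [0.1, 0.5, 1.0, 4.0]:
    results[y] = (np.max(y * x - phi), phi_conj(y))   # grid sup vs. formula

for y, (grid, formula) in results.items():
    print(f"y = {y}: grid {grid:.6f}, formula {formula:.6f}")
```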
Flow cost function, conjugate, and their subdifferentials (c j = 1)
References
• S. Boyd, lecture slides and notes for EE364b, Convex Optimization II: lectures and notes on decomposition.
• M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, Layering as optimization decomposition: a mathematical theory of network architectures, Proceedings of the IEEE (2007).
• D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods (1989).
• D. P. Bertsekas, Network Optimization: Continuous and Discrete Models (1998).
• L. S. Lasdon, Optimization Theory for Large Systems (1970).