L. Vandenberghe ECE236C (Spring 2020)
9. Dual decomposition
• dual methods
• dual decomposition
• network utility maximization
• network flow optimization
Dual methods
primal:  minimize  f(x) + g(Ax)

dual:  maximize  −g∗(z) − f∗(−AT z)
reasons why dual problem may be easier to solve by first-order methods:
• dual problem is unconstrained or has simple constraints (for example, z ⪰ 0)
• dual objective is differentiable or has a simple nondifferentiable term
• decomposition: exploit separable structure
(Sub-)gradients of conjugate function
assume f : Rn → R is closed and convex with conjugate
f∗(y) = supx (yT x − f(x))
• f ∗ is subdifferentiable on (at least) int dom f ∗ (page 2.4)
• maximizers in the definition of f ∗(y) are subgradients at y (page 5.15)
y ∈ ∂ f (x) ⇐⇒ yT x − f (x) = f ∗(y) ⇐⇒ x ∈ ∂ f ∗(y)
• if f is strictly convex, maximizer is unique (hence, equal to ∇ f∗(y)) if it exists
• if f is strongly convex, then conjugate is defined for all y and differentiable, with

‖∇ f∗(y) − ∇ f∗(y′)‖ ≤ (1/µ)‖y − y′‖∗ for all y, y′
(µ is strong convexity constant of f with respect to ‖ · ‖); see page 5.19
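These facts can be checked numerically on the strongly convex quadratic f(x) = (1/2)xTQx with Q ≻ 0 (an assumed toy example, not from the slides): f∗(y) = (1/2)yTQ⁻¹y, the maximizer x̂ = Q⁻¹y equals ∇f∗(y), and ∇f∗ is Lipschitz with constant 1/µ where µ = λmin(Q). A minimal sketch:

```python
import numpy as np

# Toy check with f(x) = (1/2) x^T Q x, Q > 0 (assumed example): then
# f*(y) = (1/2) y^T Q^{-1} y, the maximizer is x_hat = Q^{-1} y = grad f*(y),
# and grad f* is Lipschitz with constant 1/mu, mu = lambda_min(Q).
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Q = M @ M.T + np.eye(4)              # positive definite
y, yp = rng.standard_normal(4), rng.standard_normal(4)

x_hat = np.linalg.solve(Q, y)        # maximizer of y^T x - f(x)
print(np.linalg.norm(Q @ x_hat - y)) # grad f(x_hat) = y, i.e. y is a subgradient of f at x_hat

mu = np.linalg.eigvalsh(Q).min()     # strong convexity constant of f
lhs = np.linalg.norm(np.linalg.solve(Q, y - yp))   # ||grad f*(y) - grad f*(yp)||
print(lhs <= np.linalg.norm(y - yp) / mu)          # True
```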
Equality constraints
Primal and dual problems
primal:  minimize  f(x)
         subject to  Ax = b

dual:  maximize  −bT z − f∗(−AT z)
Dual gradient ascent algorithm (assuming dom f ∗ = Rn)
x̂ = argminx (f(x) + zT Ax)

z+ = z + t(Ax̂ − b)
• step one computes a subgradient x̂ ∈ ∂ f∗(−AT z)
• step two computes a subgradient b − Ax̂ of bT z + f∗(−AT z) at z
of interest if calculation of x̂ is inexpensive (for example, f is separable)
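A minimal numerical sketch of this iteration, on the toy problem minimize (1/2)‖x‖² subject to Ax = b (random data, assumed for illustration; here x̂ = −AT z in closed form and the solution is the least-norm point):

```python
import numpy as np

# Dual gradient ascent for: minimize (1/2)||x||^2 subject to Ax = b.
# Random data, purely illustrative; the known solution is the least-norm
# point x* = A^T (A A^T)^{-1} b.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 8))
b = rng.standard_normal(3)

z = np.zeros(3)
t = 1.0 / np.linalg.eigvalsh(A @ A.T).max()   # fixed step, safe for this quadratic
for _ in range(1000):
    x_hat = -A.T @ z                # argmin_x ((1/2)||x||^2 + z^T A x)
    z = z + t * (A @ x_hat - b)     # ascent step: dual gradient is A x_hat - b

x_star = A.T @ np.linalg.solve(A @ A.T, b)
print(np.linalg.norm(x_hat - x_star))          # ~0
```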
Dual decomposition
Convex problem with separable objective
minimize  f1(x1) + f2(x2)
subject to  A1x1 + A2x2 ⪯ b
constraint is complicating or coupling constraint
Dual problem
maximize  −f∗1(−AT1 z) − f∗2(−AT2 z) − bT z
subject to  z ⪰ 0

can be solved by (sub-)gradient projection if z ⪰ 0 is the only constraint
Dual subgradient projection
Subproblem: to calculate f∗j(−ATj z) and a (sub-)gradient for it,

minimize (over xj)  fj(xj) + zT Aj xj

• optimal value is −f∗j(−ATj z)
• minimizer x̂ j is in ∂ f∗j(−ATj z)
Dual subgradient projection method
x̂ j = argminxj (fj(xj) + zT Aj xj),   j = 1, 2

z+ = (z + t(A1 x̂1 + A2 x̂2 − b))+
• minimization problems over x1, x2 are independent
• z-update is projected subgradient step (u+ = max{u,0} elementwise)
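The two steps can be sketched on a toy instance with A1 = A2 = I and fj(xj) = (1/2)‖xj − aj‖² (data made up for illustration; for this model the optimal prices have the closed form z∗ = ((a1 + a2 − b)/2)+, which the iteration should recover):

```python
import numpy as np

# Two-block dual decomposition: f_j(x_j) = (1/2)||x_j - a_j||^2, A_1 = A_2 = I,
# coupling constraint x_1 + x_2 <= b (componentwise). Data is made up.
rng = np.random.default_rng(2)
a1, a2 = rng.standard_normal(5), rng.standard_normal(5)
b = rng.standard_normal(5)

z = np.zeros(5)
t = 0.25
for _ in range(200):
    x1_hat = a1 - z      # argmin_x1 ((1/2)||x1 - a1||^2 + z^T x1)
    x2_hat = a2 - z      # the two subproblems are solved independently
    z = np.maximum(z + t * (x1_hat + x2_hat - b), 0)   # projected subgradient step

z_star = np.maximum((a1 + a2 - b) / 2, 0)   # closed-form optimal prices
print(np.linalg.norm(z - z_star))            # ~0
```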
Interpretation as price coordination
• two units in a system; unit j chooses decision variable xj
• constraints are limits on shared resources; zi is price of resource i
Dual update: depends on slacks s = b − A1 x̂1 − A2 x̂2
z+ = (z − ts)+
• increases price zi if resource is over-utilized (si < 0)
• decreases price zi if resource is under-utilized (si > 0)
• never lets prices get negative
Distributed architecture: central node sets prices z, peripheral node j sets x j
[figure: central node broadcasts prices z to units 1 and 2; unit j returns its resource usage Aj x̂ j]
Example
Quadratic optimization problem
minimize   ∑j ((1/2)xTj Pj xj + qTj xj)
subject to  Bj xj ⪯ dj,  j = 1, . . . , r
            ∑j Aj xj ⪯ b

(sums over j = 1, . . . , r)
• without last inequality, problem would separate into r independent QPs
• we assume Pj ≻ 0
Formulation for dual decomposition
minimize   ∑j fj(xj)
subject to  ∑j Aj xj ⪯ b

where fj(xj) = (1/2)xTj Pj xj + qTj xj with domain {xj | Bj xj ⪯ dj}
Dual problem
maximize   −bT z − ∑j f∗j(−ATj z)
subject to  z ⪰ 0

• gradient of h(z) = ∑j f∗j(−ATj z) is Lipschitz continuous (since Pj ≻ 0):

  ‖∇h(z) − ∇h(z′)‖2 ≤ ((‖A‖2)² / minj λmin(Pj)) ‖z − z′‖2

  where A = [ A1 · · · Ar ]

• function value of −f∗j(−ATj z) is the optimal value of the QP

  minimize (over xj)  (1/2)xTj Pj xj + (qj + ATj z)T xj
  subject to  Bj xj ⪯ dj

• optimal solution x̂ j is the gradient: x̂ j = ∇ f∗j(−ATj z)
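The Lipschitz bound can be verified numerically in the simplified case without the constraints Bj xj ⪯ dj, so that each subproblem has the closed-form minimizer x̂ j = −P⁻¹j(qj + ATj z) (random data, purely illustrative):

```python
import numpy as np

# Check of the Lipschitz bound for grad h, with the block constraints dropped
# so that x_hat_j = -P_j^{-1} (q_j + A_j^T z) in closed form. Random data.
rng = np.random.default_rng(3)
r, n, m = 4, 6, 3
Ps, As, qs = [], [], []
for _ in range(r):
    M = rng.standard_normal((n, n))
    Ps.append(M @ M.T + np.eye(n))           # P_j > 0
    As.append(rng.standard_normal((m, n)))
    qs.append(rng.standard_normal(n))

def grad_h(z):
    # gradient of h(z) = sum_j f_j*(-A_j^T z) is -sum_j A_j x_hat_j
    g = np.zeros(m)
    for P, A, q in zip(Ps, As, qs):
        g -= A @ (-np.linalg.solve(P, q + A.T @ z))   # block minimizer x_hat_j
    return g

z, zp = rng.standard_normal(m), rng.standard_normal(m)
lhs = np.linalg.norm(grad_h(z) - grad_h(zp))
A_full = np.hstack(As)
L = np.linalg.norm(A_full, 2) ** 2 / min(np.linalg.eigvalsh(P).min() for P in Ps)
print(lhs <= L * np.linalg.norm(z - zp))     # True
```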
Numerical example
• 10 subproblems (r = 10), each with 100 variables and 100 constraints
• 10 coupling constraints
• projected gradient descent and FISTA, with the same fixed step size
[figure: relative dual suboptimality versus iteration (semilog scale), comparing gradient projection and FISTA]
Network utility maximization
Network flows
• n flows, with fixed routes, in a network with m links
• variable xj ≥ 0 denotes the rate of flow j
• flow utility is Uj : R → R, concave, increasing
Capacity constraints
• traffic yi on link i is sum of flows passing through it
• y = Rx, where R is the routing matrix:

  Rij = 1 if flow j passes over link i,  0 otherwise

• link capacity constraint: y ⪯ c
Dual network utility maximization problem
primal:  maximize   ∑j Uj(xj)
         subject to  Rx ⪯ c

dual:  minimize   cT z + ∑j (−Uj)∗(−rTj z)
       subject to  z ⪰ 0
• r j is column j of R
• dual variable zi is price (per unit flow) for using link i
• rTj z is the sum of prices along route j
(Sub-)gradients of dual function
Dual objective
f(z) = cT z + ∑j (−Uj)∗(−rTj z)
     = cT z + ∑j supxj (Uj(xj) − (rTj z)xj)

Subgradient

c − Rx̂ ∈ ∂ f(z)  where  x̂ j = argmaxxj (Uj(xj) − (rTj z)xj)

• rTj z is the sum of link prices along route j
• c − Rx̂ is the vector of link capacity margins for flow x̂
• if Uj is strictly concave, this is a gradient
Dual decomposition algorithm
given initial link price vector z ≻ 0 (e.g., z = 1), repeat:
1. sum link prices along each route: calculate λj = rTj z for j = 1, . . . , n
2. optimize flows (separately) using flow prices
x̂ j = argmaxxj (Uj(xj) − λj xj),   j = 1, . . . , n
3. calculate link capacity margins s = c − Rx̂
4. update link prices using projected (sub-)gradient step with step t
z := (z − ts)+
Decentralized:
• to find λj, x̂ j, source j only needs to know the prices on its route
• to update si, zi, link i only needs to know the flows that pass through it
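The four steps can be sketched with log utilities Uj(xj) = log xj (so step 2 is x̂ j = 1/λj in closed form) on a made-up network with two links and three flows; by symmetry the optimal rates work out to 1/3 for the two-link flow and 2/3 for the others:

```python
import numpy as np

# NUM sketch with U_j(x_j) = log x_j on a made-up network: flow 1 uses links
# 1 and 2; flows 2 and 3 use one link each. Optimal rates: [1/3, 2/3, 2/3].
R = np.array([[1.0, 1.0, 0.0],      # routing matrix (links x flows)
              [1.0, 0.0, 1.0]])
c = np.array([1.0, 1.0])            # link capacities

z = np.ones(2)                      # initial link prices
t = 0.1
for _ in range(5000):
    lam = R.T @ z                   # 1. price along each route
    x_hat = 1.0 / lam               # 2. argmax_x (log x - lam * x) = 1/lam
    s = c - R @ x_hat               # 3. link capacity margins
    z = np.maximum(z - t * s, 0)    # 4. projected (sub-)gradient step

print(np.round(x_hat, 4))
```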
Single commodity network flow
Network
• connected, directed graph with n links/arcs, m nodes
• node-arc incidence matrix A ∈ Rm×n is

  Aij = 1 if arc j enters node i,  −1 if arc j leaves node i,  0 otherwise
Flow vector and external sources
• variable x j denotes flow (traffic) on arc j
• bi is external demand (or supply) of flow at node i (satisfies 1T b = 0)
• flow conservation: Ax = b
Network flow optimization problem
minimize   φ(x) = ∑j φj(xj)
subject to  Ax = b
• φ is a separable sum of convex functions
• dual decomposition yields decentralized solution method
Dual problem (a j is jth column of A)
maximize  −bT z − ∑j φ∗j(−aTj z)
• dual variable zi can be interpreted as potential at node i
• y j = −aTj z is the potential difference across arc j
(potential at start node minus potential at end node)
(Sub-)gradients of dual function
Negative dual objective
f(z) = bT z + ∑j φ∗j(−aTj z)

Subgradient

b − Ax̂ ∈ ∂ f(z)  where  x̂ j = argminxj (φj(xj) + (aTj z)xj)

• this is a gradient if the functions φj are strictly convex
• if φj is differentiable, φ′j(x̂ j) = −aTj z
Dual decomposition network flow algorithm
given initial potential vector z, repeat
1. determine link flows from potential differences y = −AT z
x̂ j = argminxj (φj(xj) − yj xj),   j = 1, . . . , n
2. compute flow residual at each node: s := b − Ax̂
3. update node potentials using (sub-)gradient step with step size t
z := z − ts
Decentralized:
• flow is calculated from potential difference across arc
• node potential is updated from its own flow surplus
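A sketch of the algorithm on a small assumed graph with quadratic arc costs φj(xj) = (1/2)wj x²j, so step 1 reduces to x̂ j = yj/wj; the result is compared against the flow obtained directly from the KKT conditions:

```python
import numpy as np

# Network flow sketch: 4 nodes, 5 arcs (1->2, 1->3, 2->3, 2->4, 3->4),
# quadratic arc costs phi_j(x_j) = (1/2) w_j x_j^2, so step 1 is x_hat = y / w.
# One unit of flow enters at node 1 and leaves at node 4. Data assumed.
A = np.array([[-1, -1,  0,  0,  0],
              [ 1,  0, -1, -1,  0],
              [ 0,  1,  1,  0, -1],
              [ 0,  0,  0,  1,  1]], dtype=float)
b = np.array([-1.0, 0.0, 0.0, 1.0])
w = np.array([1.0, 2.0, 1.0, 2.0, 1.0])

z = np.zeros(4)
t = 0.2
for _ in range(2000):
    y = -A.T @ z            # 1. potential differences across arcs
    x_hat = y / w           #    argmin_xj ((1/2) w_j x_j^2 - y_j x_j)
    s = b - A @ x_hat       # 2. flow residual at each node
    z = z - t * s           # 3. (sub-)gradient step on node potentials

# optimal flow from the KKT conditions: W x = A^T lam, A x = b
lam = np.linalg.pinv(A @ np.diag(1.0 / w) @ A.T) @ b
x_star = (A.T @ lam) / w
print(np.linalg.norm(x_hat - x_star))    # ~0
```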
Electrical network interpretation
network flow optimality conditions (with differentiable φ j)
Ax = b,   y + AT z = 0,   yj = φ′j(xj),  j = 1, . . . , n
network with node incidence matrix A, nonlinear resistors in branches
Kirchhoff current law (KCL): Ax = b
x j is the current flow in branch j; bi is external current extracted at node i
Kirchhoff voltage law (KVL): y + AT z = 0
zi is the potential at node i; yj = −aTj z is the jth branch voltage
Current-voltage characteristics: yj = φ′j(xj)

for example, φj(xj) = Rj xj²/2 for a linear resistor with resistance Rj
currents and potentials in the circuit are the optimal flows and dual variables
Example: minimum queueing delay
Flow cost function and conjugate (cj > 0 is link capacity):

φj(xj) = xj/(cj − xj),   with dom φj = [0, cj)

φ∗j(yj) = (√(cj yj) − 1)²  if yj > 1/cj,   0  if yj ≤ 1/cj

• φj is differentiable except at xj = 0:

  ∂φj(0) = (−∞, 1/cj],   φ′j(xj) = cj/(cj − xj)²   (0 < xj < cj)

• φ∗j is differentiable:

  φ∗j′(yj) = 0  if yj ≤ 1/cj,   φ∗j′(yj) = cj − √(cj/yj)  if yj > 1/cj
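The conjugate formula can be sanity-checked by brute force: maximize yj xj − φj(xj) over a fine grid of xj and compare with the closed form (c = 2 is an arbitrary test capacity):

```python
import numpy as np

# Brute-force check of the conjugate: phi(x) = x/(c - x) on [0, c), so
# phi*(y) should equal (sqrt(c*y) - 1)^2 for y > 1/c and 0 otherwise.
c = 2.0
x = np.linspace(0.0, c - 1e-6, 2_000_000)
phi = x / (c - x)                     # phi evaluated on the grid

def phi_conj(y):
    return (np.sqrt(c * y) - 1) ** 2 if y > 1 / c else 0.0

results = {}
for y in [0.1, 0.5, 1.0, 4.0]:
    results[y] = (np.max(y * x - phi), phi_conj(y))   # grid sup vs. formula

for y, (grid, formula) in results.items():
    print(f"y = {y}: grid {grid:.6f}, formula {formula:.6f}")
```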
Flow cost function, conjugate, and their subdifferentials (c j = 1)
References
• S. Boyd, lecture slides and notes for EE364b, Convex Optimization II: lectures and notes on decomposition.
• M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, Layering as optimization decomposition: a mathematical theory of network architectures, Proceedings of the IEEE (2007).
• D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods (1989).
• D. P. Bertsekas, Network Optimization: Continuous and Discrete Models (1998).
• L. S. Lasdon, Optimization Theory for Large Systems (1970).