Multi-Block Alternating Direction Method of Multipliers: Are There Alternative Algorithms for Linear Programming?

Yinyu Ye

Department of Management Science and Engineering and Institute for Computational and Mathematical Engineering, Stanford University, Stanford

MIT ORC, September 17, 2015


Table of Contents

1 Linear Programming and Algorithms

2 Advances in Simplex and Interior-Point Algorithms

3 First-Order and Alternating Direction Method of Multipliers

4 Convergence of RP-ADMM in Expectation

5 Why Multi-Block ADMM?


1 Linear Programming and Algorithms


Linear Programming (LP)


Algebra of Linear Programming

    minimize_x   c^T x
    subject to   Ax (≤, =, ≥) b,   x ≥ 0,

where the given constraint matrix A is m × n, the right-hand side b is an m-dimensional vector, and the objective-coefficient vector c is n-dimensional.

The variable x is an n-dimensional vector to be decided optimally.

LP is a data-driven computation/decision model with wide applications, so we would like to compute the optimal x fast in theory and in practice.
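To make the algebraic model concrete, here is a small hedged sketch that solves a toy LP with SciPy's linprog; the data c, A, b below are made up for illustration and are not from the talk.

```python
import numpy as np
from scipy.optimize import linprog

# Toy data (illustrative only): minimize c^T x  s.t.  A x <= b,  x >= 0.
c = np.array([1.0, 2.0])
A = np.array([[-1.0, -1.0],   # -x1 - x2 <= -1  (i.e., x1 + x2 >= 1)
              [ 1.0, -1.0]])  #  x1 - x2 <= 2
b = np.array([-1.0, 2.0])

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)], method="highs")
print(res.x, res.fun)         # optimal x and objective value
```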


Geometry of Linear Programming


LP Algorithms: the Simplex Method


LP Algorithms: the Interior-Point Method


2 Advances in Simplex and Interior-Point Algorithms


Advances in the Simplex Method

Markov decision processes (MDPs) provide a mathematical framework for sequential decision-making where outcomes are partly random and partly under the control of a decision maker.

MDPs are useful for studying a wide range of optimization problems solved via dynamic programming, which has been known at least since the 1950s (cf. Shapley 53, Bellman 57).

Modern applications include dynamic planning, reinforcement learning, social networking, and almost all other dynamic/sequential decision-making problems in the Mathematical, Physical, Management and Social Sciences.


The Markov Decision Process/Game continued

At each time step, the process is in some state i ∈ {1, ..., m}, and the decision maker chooses an action j_i from a set of actions A_i available in state i.

The process responds at the next time step by randomly moving into a new state and producing a corresponding cost c_{j_i}. The transition probabilities into the next state, p_{j_i}, are conditionally independent of all previous states and actions.

A stationary policy for the decision maker is a set of m actions, one per state, taken by the decision maker at all times.

The MDP problem is to find a policy that optimizes the expected discounted sum over an infinite horizon with a discount factor 0 ≤ γ < 1.


The Fixed-Point of Cost-to-Go Values

An optimal policy is associated with m cost-to-go values, one for every state, y ∈ R^m, forming a fixed point:

    y*_i = min{ c_{j_i} + γ p_{j_i}^T y* : j_i ∈ A_i },   ∀ i,

where the optimal action is

    j*_i = argmin{ c_{j_i} + γ p_{j_i}^T y* : j_i ∈ A_i },   ∀ i.

Equivalently, y* is the optimal solution of the linear program

    maximize_y   Σ_{i=1}^m y_i
    subject to   y_1 ≤ c_{j_1} + γ p_{j_1}^T y,   j_1 ∈ A_1,
                 ...
                 y_i ≤ c_{j_i} + γ p_{j_i}^T y,   j_i ∈ A_i,
                 ...
                 y_m ≤ c_{j_m} + γ p_{j_m}^T y,   j_m ∈ A_m.
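A minimal value-iteration sketch for the fixed point above. The data layout is an assumption for illustration: P[i][a] is the transition-probability vector p_{j_i} and C[i][a] the cost c_{j_i} of action a in state i.

```python
import numpy as np

def value_iteration(P, C, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Approximate the fixed point y*_i = min_a { C[i][a] + gamma * P[i][a]^T y }."""
    m = len(P)                      # number of states
    y = np.zeros(m)
    for _ in range(max_iter):
        y_new = np.array([min(C[i][a] + gamma * P[i][a] @ y for a in range(len(P[i])))
                          for i in range(m)])
        if np.max(np.abs(y_new - y)) < tol * (1 - gamma):
            return y_new
        y = y_new
    return y

# Tiny 2-state, 2-action example (illustrative data).
P = [[np.array([0.8, 0.2]), np.array([0.1, 0.9])],
     [np.array([0.5, 0.5]), np.array([0.0, 1.0])]]
C = [[1.0, 2.0],
     [0.5, 1.5]]
print(value_iteration(P, C))
```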


Algorithmic Events of the MDP Methods

Shapley 53 and Bellman 57 developed the value-iteration method to approximate the optimal state values.

Another best-known method, due to Howard 60, is the policy-iteration method, or block-pivoting simplex method, which generates an optimal policy in a finite number of iterations in a distributed and decentralized way.

de Ghellinck 60, D'Epenoux 60 and Manne 60 showed that the MDP has an LP representation, so it can be solved by the simplex method of Dantzig 47, a special policy-iteration method in which each step switches the action of only one state.

But most analyses of these methods are negative: the simplex method with the smallest-index rule (Melekopoglou/Condon 90), with a random pivoting rule (Friedman et al. 12), and policy iteration for undiscounted finite-horizon MDPs (Fearnley 10) are all exponential.
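A sketch of Howard's policy-iteration method in the same illustrative P, C layout as the value-iteration sketch above: evaluate the current policy exactly, then switch every state to its greedy action.

```python
import numpy as np

def policy_iteration(P, C, gamma=0.9):
    m = len(P)
    policy = [0] * m                          # start from an arbitrary stationary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) y = c_pi exactly.
        P_pi = np.vstack([P[i][policy[i]] for i in range(m)])
        c_pi = np.array([C[i][policy[i]] for i in range(m)])
        y = np.linalg.solve(np.eye(m) - gamma * P_pi, c_pi)
        # Policy improvement: switch every state to its greedy action.
        new_policy = [min(range(len(P[i])), key=lambda a: C[i][a] + gamma * P[i][a] @ y)
                      for i in range(m)]
        if new_policy == policy:
            return policy, y                  # no switch possible: policy is optimal
        policy = new_policy

# usage: policy, y = policy_iteration(P, C)   # with P, C as in the value-iteration sketch
```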


Positive Results for MDP of m states and n actions

The classic simplex method, with the Dantzig pivoting rule, terminates in

    (m(n − m)/(1 − γ)) · log(m²/(1 − γ))

iterations, and each iteration uses at most O(mn) arithmetic operations (Y MOR10).

The policy-iteration or block-pivoting method terminates in

    (n/(1 − γ)) · log(m/(1 − γ))

iterations, and each iteration uses at most m²n arithmetic operations (Hansen/Miltersen/Zwick ACM12).


Positive Results for MDP continued

The simplex method for deterministic MDPs, regardless of the discount factor, terminates in

    O(m³n² log² m)

iterations (Post/Y MOR2014).

The result was extended to general LP by Kitahara and Mizuno 2012; also see renewed exciting research on the simplex method, e.g., Feinberg/Huang 2013, Lee/Epelman/Romeijn/Smith 2013, Scherrer 2014, Fearnley/Savani 2014, Adler/Papadimitriou/Rubinstein 2014, etc.


The Turn-Based Two-Person Zero-Sum Game

The states are partitioned into two sets, one where the player minimizes and one where the player maximizes, i.e., the fixed point becomes:

    y*_i = min{ c_{j_i} + γ p_{j_i}^T y* : j_i ∈ A_i },   ∀ i ∈ I+,
    y*_i = max{ c_{j_i} + γ p_{j_i}^T y* : j_i ∈ A_i },   ∀ i ∈ I−.

It does not admit a convex programming formulation, and it is unknown whether it can be solved in polynomial time in general.

Hansen/Miltersen/Zwick ACM12 proved that the strategy-iteration method solves it in strongly polynomial time when the discount factor is fixed – the first strongly polynomial-time algorithm.
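A minimal min/max value-iteration sketch for the turn-based game, reusing the illustrative P, C layout from the MDP sketches; I_plus (the set of minimizer states) is an assumed input.

```python
import numpy as np

def game_value_iteration(P, C, I_plus, gamma=0.9, num_iter=5000):
    """Iterate the min/max Bellman operator of the turn-based zero-sum game."""
    m = len(P)
    y = np.zeros(m)
    for _ in range(num_iter):
        vals = [[C[i][a] + gamma * P[i][a] @ y for a in range(len(P[i]))] for i in range(m)]
        y = np.array([min(vals[i]) if i in I_plus else max(vals[i]) for i in range(m)])
    return y

# usage: y = game_value_iteration(P, C, I_plus={0})   # with P, C as in the earlier sketches
```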


Advances in Interior-Point Methods

In theory, the best interior-point algorithms converge in O(√(max{m, n})) iterations, where the notation O(·) hides constant and logarithmic factors in the dimensions and the accuracy ε.

Navigating Central Path with Electrical Flows: from Flows to Matchings, and Back (Madry FOCS13). The paper made a breakthrough on solving a class of max-flow and min-cut problems.

Path-Finding Methods for Linear Programming: Solving Linear Programs in O(√(min{m, n})) Iterations and Faster Algorithms for Maximum Flow (Lee and Sidford, FOCS14).

Efficient Inverse Maintenance and Faster Algorithms for Linear Programming (Lee and Sidford, FOCS15), where the authors have further reduced the operation complexity of linear programming.


3 First-Order and Alternating Direction Method of Multipliers


First-Order or “Inverse-Free” Methods

Early work includes the von Neumann projection method (also see Freund and Jarre/Rendl 07).

Use the subgradient method on a transformed problem, assuming a strictly feasible point is known (Renegar 14, Freund 15), with iteration complexity O(L²D²/ε²), where L and D are condition numbers of the optimal solution structure and the feasible region.

First-order Karmarkar potential reduction algorithm...


First-Order or “Inverse-Free” Methods

First-order method for packing and covering LPs (Allen-Zhu/Orecchia 14): penalize the constraints, then use stochastic coordinate descent and several other techniques. The iteration complexity is O(1/ε) for packing LPs and O(1/ε^1.5) for covering LPs.

Two-block ADMM for solving the primal LP (He/Yuan 11, Monteiro/Svaiter 13, Monteiro/Ortiz/Svaiter 14), with iteration complexity O(1/ε) (a matrix needs to be inverted only once).


First-Order Potential Reduction I

    Minimize    f(x)
    Subject to  e^T x = 1,  x ≥ 0,

where e is the vector of all ones.

Assume that f(x) is a convex function of x ∈ R^n and that f(x*) = 0, where x* is a minimizer of the problem. Furthermore,

    f(x + d) − f(x) ≤ ∇f(x)^T d + (γ/2)‖d‖²,

where the positive γ is the Lipschitz parameter.

We consider the potential function (e.g., Karmarkar 84, Todd/Ye 87)

    φ(x) = ρ ln(f(x)) − Σ_j ln(x_j),

where ρ ≥ n, over the simplex. Starting from x⁰ = (1/n)e, we generate a sequence of points x^k, k = 1, ..., whose potential value strictly decreases by a fixed amount at every step.


First-Order Potential Reduction II

Update x by solving

    Minimize    ∇φ(x)^T d
    Subject to  e^T d = 0,  ‖X^{−1} d‖ ≤ β,

where β < 1 is yet to be determined, and set x⁺ = x + d. Then

    φ(x⁺) − φ(x) ≤ −β + (ργ/(2f(x)))β² + β²/(2(1 − β)),

so that one can choose a β to make

    φ(x⁺) − φ(x) ≤ −f(x)/(2(f(x) + 2ργ)).

Theorem
The steepest-descent potential reduction algorithm generates an x^k with f(x^k)/f(x⁰) ≤ ε in no more than

    (4(n + √n) max{1, 2(n + √n)γ/f(x⁰)}/ε) · ln(1/ε)

steps.
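A sketch of one potential-reduction step, under the assumption (not stated on the slide) that the trust-region subproblem is solved in closed form: writing d = Xu, the linearized subproblem's minimizer is u = −β g_proj/‖g_proj‖, where g = X∇φ(x) and g_proj is g projected onto {u : x^T u = 0}; β is set to f(x)/(f(x) + 2ργ) in the spirit of the bound above. The test function and all names are illustrative.

```python
import numpy as np

def potential_reduction_step(x, f, grad_f, rho, lip):
    """One steepest-descent potential-reduction step on the simplex (sketch)."""
    beta = f(x) / (f(x) + 2 * rho * lip)                # a beta suggested by the bound above
    grad_phi = (rho / f(x)) * grad_f(x) - 1.0 / x       # grad phi = (rho/f(x)) grad f − X^{-1} e
    g = x * grad_phi                                    # g = X grad phi; work in u = X^{-1} d
    g_proj = g - (x @ g) / (x @ x) * x                  # enforce e^T d = x^T u = 0
    if np.linalg.norm(g_proj) < 1e-14:
        return x
    u = -beta * g_proj / np.linalg.norm(g_proj)         # ||u|| = beta < 1 keeps x + X u > 0
    return x + x * u                                    # d = X u

# Illustrative data: f(x) = ||x − x*||^2 with a made-up optimum x* on the simplex, Lipschitz = 2.
x_star = np.array([0.7, 0.1, 0.1, 0.1])
f = lambda x: float((x - x_star) @ (x - x_star))
grad_f = lambda x: 2.0 * (x - x_star)

x = np.ones(4) / 4                                      # x0 = (1/n) e
for _ in range(3000):
    x = potential_reduction_step(x, f, grad_f, rho=6.0, lip=2.0)   # rho = n + sqrt(n)
print(f(x))                                             # decreases toward 0 (slowly, per the bound)
```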


Alternating Direction Method of Multipliers

Consider the convex optimization problem

    min_{x∈R^n}  f_1(x_1) + · · · + f_p(x_p)
    s.t.         Ax := A_1 x_1 + · · · + A_p x_p = b,
                 x_i ∈ X_i ⊂ R^{d_i},  i = 1, . . . , p.

Augmented Lagrangian function:

    L_γ(x_1, ..., x_p; y) = Σ_i f_i(x_i) − y^T(Σ_i A_i x_i − b) + (γ/2)‖Σ_i A_i x_i − b‖².


Alternating Direction Method of Multipliers continued

Multi-block ADMM:

    x_1^{k+1} ← argmin_{x_1∈X_1}  L_γ(x_1, x_2^k, . . . , x_p^k; y^k),
    ...
    x_p^{k+1} ← argmin_{x_p∈X_p}  L_γ(x_1^{k+1}, . . . , x_{p−1}^{k+1}, x_p; y^k),
    y^{k+1}  ← y^k − γ(Σ_i A_i x_i^{k+1} − b).

Convergence was well established when p = 1 or p = 2 (..., Glowinski/Marrocco 75, Gabay/Mercier 76, Eckstein/Bertsekas 92, ...).

But what about p > 2?
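A minimal multi-block ADMM sweep for the special case f_i ≡ 0 and X_i = R^{d_i} (the setting of the counterexamples that follow), where each block update has a closed form; the data layout (a list of column blocks) and names are assumptions for illustration.

```python
import numpy as np

def admm_sweep(A_blocks, b, x_blocks, y, gamma=1.0, order=None):
    """One Gauss-Seidel ADMM round for  min 0  s.t.  sum_i A_i x_i = b  (f_i = 0, X_i = R^{d_i})."""
    p = len(A_blocks)
    order = range(p) if order is None else order
    for i in order:
        Ai = A_blocks[i]
        r = sum(A_blocks[j] @ x_blocks[j] for j in range(p) if j != i) - b   # other blocks' residual
        # Block minimizer of L_gamma over x_i:  gamma * Ai^T Ai x_i = Ai^T (y - gamma * r).
        x_blocks[i] = np.linalg.solve(Ai.T @ Ai, Ai.T @ (y / gamma - r))
    residual = sum(A_blocks[j] @ x_blocks[j] for j in range(p)) - b
    y = y - gamma * residual                                                 # multiplier update
    return x_blocks, y

# Illustrative use with three 1-D blocks (the columns of the 3x3 matrix below) and b = 0:
A = np.array([[1., 1., 1.], [1., 1., 2.], [1., 2., 2.]])
A_blocks = [A[:, [i]] for i in range(3)]
x_blocks = [np.ones(1) for _ in range(3)]
y = np.zeros(3)
x_blocks, y = admm_sweep(A_blocks, b=np.zeros(3), x_blocks=x_blocks, y=y)
```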


Convergence of Multi-Block ADMM

Recent Discovery: when p ≥ 3, ADMM can diverge [Chen/He/Y/Yuan MP14]. Take p = 3 and

    Ax = a_1 x_1 + a_2 x_2 + a_3 x_3 = 0,   where  A = [1 1 1; 1 1 2; 1 2 2].

The ADMM iteration is then a linear mapping with matrix M:

    z^{k+1} = M z^k,   where  z^k = (x_1^k; x_2^k; x_3^k; y^k).

Why does it diverge? ρ(M) > 1 for the above A and any choice of γ.

When you randomly choose an initial solution, it diverges with probability one!
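A self-contained numerical check of this counterexample (a sketch, with γ = 1): since the iteration is linear, the per-cycle matrix M can be recovered by applying one cyclic sweep to each standard basis vector of z = (x_1; x_2; x_3; y); per the slide, its spectral radius should exceed 1.

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 1., 2.],
              [1., 2., 2.]])
gamma = 1.0

def cyclic_sweep(z, order=(0, 1, 2)):
    """One ADMM cycle for  min 0  s.t.  A x = 0  with three scalar blocks; z = (x1, x2, x3, y)."""
    x, y = z[:3].copy(), z[3:].copy()
    for i in order:
        a_i = A[:, i]
        r = A @ x - a_i * x[i]                 # contribution of the other blocks
        x[i] = a_i @ (y / gamma - r) / (a_i @ a_i)
    y = y - gamma * (A @ x)
    return np.concatenate([x, y])

# The map z -> cyclic_sweep(z) is linear, so its matrix is obtained column by column.
M = np.column_stack([cyclic_sweep(e) for e in np.eye(6)])
print(max(abs(np.linalg.eigvals(M))))          # spectral radius; > 1, so cyclic ADMM diverges
```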


Almost Always Diverges


Strong Convexity Does not Help

    min   0.05x_1² + 0.05x_2² + 0.05x_3²
    s.t.  [1 1 1; 1 1 2; 1 2 2] (x_1; x_2; x_3) = 0.

The mapping matrix M of the ADMM (γ = 1) has

    ρ(M) = 1.0087 > 1.

That is, even for strongly convex programming, the ADMM is not necessarily convergent for a range of γ > 0.


The Stepsize of ADMM

For some step-size 0 < β < 1, update the Lagrangian multiplier:

    y^{k+1} := y^k − βγ(Σ_i A_i x_i^{k+1} − b).

p = 1 (Augmented Lagrangian Method): converges for β ∈ (0, 2) (Hestenes 69, Powell 69).

p = 2 (Alternating Direction Method of Multipliers): converges for β ∈ (0, (1+√5)/2) (Glowinski 84).

p ≥ 3: converges for β sufficiently small, provided additional conditions on the problem (Hong/Luo 12).


Is there a Problem-Data-Independent β?

For any positive β, the following linear system makes ADMM diverge:

    [1 1 1; 1 1 1+β; 1 1+β 1+β] (x_1; x_2; x_3) = 0.

There is no practical problem-data-independent β such that the small-step-size variant would work.

Many other alternatives were proposed, but somehow they all make ADMM converge slower in practice...


Randomly Permuted ADMM

Random-Permuted ADMM (RP-ADMM): in each round, draw a random permutation σ = (σ(1), . . . , σ(p)) of {1, . . . , p}, and update

    x_{σ(1)} → x_{σ(2)} → · · · → x_{σ(p)} → y.

(This is block sampling without replacement.)

It forces “absolute fairness” among blocks, or a mixed strategy on the updating order.

Random permutation has been used effectively in many areas, but little theoretical analysis is known.

Simulation test result: it almost always converges, and fast!
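A quick simulation sketch of RP-ADMM on the same 3×3 counterexample: the same block update as before, but in a freshly drawn random order each round; empirically the iterate norm shrinks, in contrast to the fixed cyclic order.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1., 1., 1.],
              [1., 1., 2.],
              [1., 2., 2.]])
gamma = 1.0
x, y = rng.standard_normal(3), rng.standard_normal(3)

for k in range(2000):
    for i in rng.permutation(3):               # random update order each round
        a_i = A[:, i]
        r = A @ x - a_i * x[i]                 # other blocks' contribution (b = 0)
        x[i] = a_i @ (y / gamma - r) / (a_i @ a_i)
    y = y - gamma * (A @ x)
    if k % 500 == 0:
        print(k, np.linalg.norm(np.concatenate([x, y])))
```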


Almost Always Converges when Randomizing the Update Order


4 Convergence of RP-ADMM in Expectation


Any Theory Behind the Success?

Consider a square system of linear equations:

    (S)  min_{x∈R^n}  0
         s.t.  A_1 x_1 + · · · + A_p x_p = b,

where A = [A_1, . . . , A_p] ∈ R^{n×n}, x_i ∈ R^{d_i} and Σ_i d_i = n.

After k rounds, RP-ADMM generates z^k, a random variable depending on

    ξ_k = (σ_1, . . . , σ_k),

where σ_j is the randomly picked permutation in the j-th round. Let

    φ_k := E_{ξ_k}(z^k).


Convergence in Expectation (Sun/Luo/Y 15)

Theorem 1
The expected iterate φ_k := E_{ξ_k}(z^k) converges to the solution linearly for any 1 ≤ p ≤ n if A is invertible, i.e.,

    {φ_k}_{k→∞} → [A^{−1}b; 0].

Remark: Expected convergence ≠ convergence, but it is strong evidence of convergence for most problems, e.g., when the iterates are bounded.


The Average Mapping is a Contraction

The update equation of RP-ADMM for (S) (with b = 0) is

    z^{k+1} = M_σ z^k,

where M_σ ∈ R^{2n×2n} depends on σ.

Define the expected update matrix as

    M = E_σ(M_σ) = (1/p!) Σ_σ M_σ.

Theorem 2
The spectral radius of M is strictly less than 1, i.e. ρ(M) < 1, for any 1 ≤ p ≤ n if A is invertible.
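A numerical sketch of Theorem 2 on the 3×3 counterexample: build M_σ for each of the p! = 6 update orders (by applying one sweep to basis vectors, as in the divergence sketch earlier), average them, and inspect the spectral radius, which should now be below 1.

```python
import itertools
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 1., 2.],
              [1., 2., 2.]])
gamma = 1.0

def sweep(z, order):
    x, y = z[:3].copy(), z[3:].copy()
    for i in order:
        a_i = A[:, i]
        x[i] = a_i @ (y / gamma - (A @ x - a_i * x[i])) / (a_i @ a_i)
    y = y - gamma * (A @ x)
    return np.concatenate([x, y])

def cycle_matrix(order):
    return np.column_stack([sweep(e, order) for e in np.eye(6)])

M_bar = np.mean([cycle_matrix(p) for p in itertools.permutations(range(3))], axis=0)
print(max(abs(np.linalg.eigvals(M_bar))))      # expected-update spectral radius; < 1 per Theorem 2
```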


Proof Difficulty of Theorem 2

That Theorem 2 implies Theorem 1 is relatively easy to show, but Theorem 2 itself...

Few tools deal with the spectral radius of non-symmetric matrices. E.g., ρ(X + Y) ≤ ρ(X) + ρ(Y) and ρ(XY) ≤ ρ(X)ρ(Y) do not hold.

Though ρ(M) ≤ ‖M‖, it turns out that ‖M‖ > 2.3 for the counterexample.

RP sampling is not independent, so M is a complicated function of A.

Techniques: symmetrization and mathematical induction.


Two Main Lemmas to Prove Theorem 2

Step 1: Relate M to a symmetric matrix AQA^T.

Lemma 1
λ ∈ eig(M) ⇐⇒ (1 − λ)²/(1 − 2λ) ∈ eig(AQA^T).

Since Q is symmetric, we have

    ρ(M) < 1 ⇐⇒ eig(AQA^T) ⊆ (0, 4/3).

Step 2: Bound the eigenvalues of AQA^T – proved by induction.

Lemma 2
eig(AQA^T) ⊆ (0, 4/3).

Remark: 4/3 is “almost” tight; for m = 3 the maximum is ≈ 1.18, and it increases toward 4/3 as m increases.


Fixed Cyclic ADMM vs RP-ADMM

Solving weakly Laplacian linear systems:

    A(i, i) = 1,   A(i, j) = A(j, i) = τ · rand(1, 1).

    (n, τ)          CycADMM    RP-ADMM (iter, time)
    (500, 0.01)     diverge    (10.1, 1.3)
    (500, 0.05)     diverge    (67, 0.61)
    (5000, 0.002)   diverge    (7.5, 21.1)
    (5000, 0.01)    diverge    (22, 38)
    (5000, 0.02)    diverge    (205, 251)
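A sketch of how test matrices of this form might be generated; the stopping rule, iteration counts, and timings in the table are from the talk and are not reproduced here.

```python
import numpy as np

def weakly_laplacian(n, tau, rng):
    """A(i,i) = 1, A(i,j) = A(j,i) = tau * Uniform(0,1) for i != j."""
    U = tau * rng.random((n, n))
    A = np.triu(U, 1)
    return A + A.T + np.eye(n)

rng = np.random.default_rng(0)
A = weakly_laplacian(500, 0.01, rng)
b = A @ rng.standard_normal(500)     # a consistent right-hand side, so a solution exists
# ... then run RP-ADMM with n scalar blocks on  min 0  s.t.  A x = b  (as in the earlier sketches)
```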


Further Result on Randomized Permutation for Convex Optimization

Consider the non-separable convex quadratic problem

    min_{x∈R^n}  x^T H x + c^T x
    s.t.         Ax := A_1 x_1 + · · · + A_p x_p = b,
                 x_i ∈ X_i ⊂ R^{d_i},  i = 1, . . . , p.

Theorem 3
If each block subproblem possesses a unique solution, the expected iterate φ_k := E_{ξ_k}(z^k) converges to an optimal solution of the original problem (Chen, Li, Liu and Y 15).


5 Why Multi-Block ADMM?


Universal Decomposition and Block Coordinate Descent

Consider the homogeneous and self-dual linear program: find a feasible solution (x, y, s) such that

    Ax − bτ = 0,
    −A^T y − s + cτ = 0,
    b^T y − c^T x − κ = 0,
    e^T x + τ + e^T s + κ = 1,
    (x, τ, s, κ) ≥ 0,

where the three blocks (x, τ), y and (s, κ) are alternately updated (O'Donoghue/Chu/Parikh/Boyd 15); also see Sun and Toh 14 for SDP.
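For concreteness, a sketch that evaluates the residuals of the homogeneous self-dual system above for given LP data (A, b, c); the actual three-block ADMM splitting follows the cited references and is not shown.

```python
import numpy as np

def hsd_residuals(A, b, c, x, tau, y, s, kappa):
    """Residuals of the homogeneous self-dual feasibility system (all ~0 at a solution)."""
    r1 = A @ x - b * tau                        # A x - b tau = 0
    r2 = -A.T @ y - s + c * tau                 # -A^T y - s + c tau = 0
    r3 = b @ y - c @ x - kappa                  # b^T y - c^T x - kappa = 0
    r4 = x.sum() + tau + s.sum() + kappa - 1.0  # normalization: e^T x + tau + e^T s + kappa = 1
    return r1, r2, r3, r4
```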


Combine ADMM and Interior-Point

Consider the logarithmic barrier function as the objective:

min   −μ ln(τκ) − μ Σ_j ln(x_j s_j)
s.t.  Ax − bτ = 0,
      −A^T y − s + cτ = 0,
      b^T y − c^T x − κ = 0,
      e^T x + τ + e^T s + κ = 1,

which is solved by Multi-Block ADMM.

Then μ is gradually reduced to 0 as in interior-point methods; initial computational results are encouraging (Lin/Ma 15)!

Ye, Yinyu (Stanford) Alternative Algorithms for LP? MIT ORC, 2015 42 / 45
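
A sketch of the resulting outer loop, with the inner multi-block ADMM solve abstracted as a callback; the names and the reduction factor are illustrative, not taken from Lin/Ma:

```python
def barrier_admm_homotopy(solve_barrier_subproblem, z0, mu0=1.0, sigma=0.1, mu_min=1e-8):
    """Interior-point-style homotopy around the barrier problem above.
    `solve_barrier_subproblem(mu, z)` is assumed to run multi-block ADMM on
    the mu-barrier problem from warm start z and return the new iterate."""
    mu, z = mu0, z0
    while mu > mu_min:
        z = solve_barrier_subproblem(mu, z)  # inner multi-block ADMM, warm-started
        mu *= sigma                          # shrink the barrier parameter toward 0
    return z
```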



Implementation on Distributed Platforms I

Reformulate

minimize_x   c^T x
subject to   Ax = b,  x ≥ 0,

as

minimize_{x_0, x_1, ..., x_m}   c^T x_0
subject to   a_i x_i = b_i,   i = 1, ..., m,
             x_i − x_0 = 0,   i = 1, ..., m,
             x_0 ≥ 0.

Then apply Multi-Block ADMM to the new formulation (He and Yuan 15, Sun/Y 15).

Ye, Yinyu (Stanford) Alternative Algorithms for LP? MIT ORC, 2015 43 / 45
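
Here (a_i, b_i) is the i-th row block of (A, b) and each x_i is a local copy of x coupled to the global variable x_0. A tiny NumPy sketch of the partition (helper name is mine):

```python
import numpy as np

def split_rows(A, b, m):
    """Partition the rows of (A, b) into m blocks (a_i, b_i) for the
    consensus reformulation above; np.array_split allows uneven blocks."""
    return list(zip(np.array_split(A, m, axis=0), np.array_split(b, m)))
```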



Implementation on Distributed Platforms II

In each round of ADMM, for i = 1, ..., m, for a simple quadratic objective one independently solves

minimize_{x_i}   q_i(x_i)
subject to   a_i x_i = b_i;

then updates x_0 as

x_0 = max{ (1/m) Σ_i x_i, 0 }.

Or

minimize_{x_i}   q_i(x_i)
subject to   a_i x_i = b_i,  x_i ≥ 0;

x_0 = (1/m) Σ_i x_i.

Ye, Yinyu (Stanford) Alternative Algorithms for LP? MIT ORC, 2015 44 / 45
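
A minimal sketch of one such round under the first variant, assuming each per-block quadratic q_i has already been assembled as a pair (H_i, g_i) from the augmented Lagrangian (the slide does not spell out these coefficients, so they are simply passed in); each equality-constrained subproblem is solved through its KKT system, and x_0 is the projected average:

```python
import numpy as np

def solve_eq_qp(H, g, Ai, bi):
    """Minimize (1/2) x^T H x + g^T x subject to Ai x = bi via the KKT system
    (H is assumed positive definite on the nullspace of Ai)."""
    n, m = H.shape[0], Ai.shape[0]
    K = np.block([[H, Ai.T], [Ai, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-g, bi]))
    return sol[:n]                              # primal part; sol[n:] are multipliers

def consensus_round(blocks):
    """One ADMM round in the spirit of this slide: every worker i solves its
    simple QP independently (in parallel on a distributed platform), then x_0
    is the projected block average.  `blocks` is a list of (H_i, g_i, a_i, b_i)."""
    xs = [solve_eq_qp(H, g, Ai, bi) for (H, g, Ai, bi) in blocks]
    x0 = np.maximum(np.mean(xs, axis=0), 0.0)   # x_0 = max{(1/m) sum_i x_i, 0}
    return xs, x0
```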



Summary and Future Directions

Could MDPs be solved in strongly polynomial time regardless of the discount factor?

Could the recent theoretically better interior-point algorithms be realized in practice?

Could RP-ADMM be shown to converge with high probability, and/or could the RP-ADMM result be extended to more general convex optimization problems?

Big Picture: Are there other linear programming algorithms that are competitive in both theory and practice?

Multi-Block ADMM might have a chance (at least in practice):

Good theories have been established.
It could easily be implemented in a distributed way on a cloud and/or GPU platform.

Linear Programming Research Continues ...

Ye, Yinyu (Stanford) Alternative Algorithms for LP? MIT ORC, 2015 45 / 45
