Multi-Block Alternating Direction Method of Multipliers: Are There Alternative Algorithms for Linear Programming?

Yinyu Ye

Department of Management Science and Engineering and Institute for Computational and Mathematical Engineering, Stanford University, Stanford

MIT ORC, September 17, 2015


Table of Contents

1 Linear Programming and Algorithms

2 Advances in Simplex and Interior-Point Algorithms

3 First-Order and Alternating Direction Method of Multipliers

4 Convergence of RP-ADMM in Expectation

5 Why Multi-Block ADMM?


1 Linear Programming and Algorithms


Linear Programming (LP)


Algebra of Linear Programming

    minimize_x   c^T x
    subject to   Ax (≤, =, ≥) b,   x ≥ 0,

where the given constraint matrix A is m × n, the right-hand side b is an m-dimensional vector, and the objective-coefficient vector c is n-dimensional.

The variable x is an n-dimensional vector to be decided optimally.

LP is a data-driven computation/decision model with wide applications, so we would like to compute the optimal x fast in theory and in practice.
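To make the algebraic model concrete, here is a small hedged sketch that solves a toy LP with SciPy's linprog; the data c, A, b below are made up for illustration and are not from the talk.

```python
import numpy as np
from scipy.optimize import linprog

# Toy data (illustrative only): minimize c^T x  s.t.  A x <= b,  x >= 0.
c = np.array([1.0, 2.0])
A = np.array([[-1.0, -1.0],   # -x1 - x2 <= -1  (i.e., x1 + x2 >= 1)
              [ 1.0, -1.0]])  #  x1 - x2 <= 2
b = np.array([-1.0, 2.0])

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)], method="highs")
print(res.x, res.fun)         # optimal x and objective value
```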


Geometry of Linear Programming


LP Algorithms: the Simplex Method


LP Algorithms: the Interior-Point Method


2 Advances in Simplex and Interior-Point Algorithms


Advances in the Simplex Method

Markov decision processes (MDPs) provide a mathematical framework for sequential decision-making where outcomes are partly random and partly under the control of a decision maker.

MDPs are useful for studying a wide range of optimization problems solved via dynamic programming, which has been known at least since the 1950s (cf. Shapley 53, Bellman 57).

Modern applications include dynamic planning, reinforcement learning, social networking, and almost all other dynamic/sequential decision-making problems in the Mathematical, Physical, Management and Social Sciences.


The Markov Decision Process/Game continued

At each time step, the process is in some state i ∈ {1, ..., m}, and the decision maker chooses an action j_i from a set of actions A_i available in state i.

The process responds at the next time step by randomly moving into a new state and producing a corresponding cost c_{j_i}. The transition probabilities into the next state, p_{j_i}, are conditionally independent of all previous states and actions.

A stationary policy for the decision maker is a set of m actions, one per state, taken by the decision maker at all times.

The MDP problem is to find a policy that optimizes the expected discounted sum over an infinite horizon with a discount factor 0 ≤ γ < 1.


The Fixed-Point of Cost-to-Go Values

An optimal policy is associated with m cost-to-go values, one for every state, y ∈ R^m, forming a fixed point:

    y*_i = min{ c_{j_i} + γ p_{j_i}^T y* : j_i ∈ A_i },   ∀ i,

where the optimal action is

    j*_i = argmin{ c_{j_i} + γ p_{j_i}^T y* : j_i ∈ A_i },   ∀ i.

Equivalently, y* is the optimal solution of the linear program

    maximize_y   Σ_{i=1}^m y_i
    subject to   y_1 ≤ c_{j_1} + γ p_{j_1}^T y,   j_1 ∈ A_1,
                 ...
                 y_i ≤ c_{j_i} + γ p_{j_i}^T y,   j_i ∈ A_i,
                 ...
                 y_m ≤ c_{j_m} + γ p_{j_m}^T y,   j_m ∈ A_m.
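A minimal value-iteration sketch for the fixed point above. The data layout is an assumption for illustration: P[i][a] is the transition-probability vector p_{j_i} and C[i][a] the cost c_{j_i} of action a in state i.

```python
import numpy as np

def value_iteration(P, C, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Approximate the fixed point y*_i = min_a { C[i][a] + gamma * P[i][a]^T y }."""
    m = len(P)                      # number of states
    y = np.zeros(m)
    for _ in range(max_iter):
        y_new = np.array([min(C[i][a] + gamma * P[i][a] @ y for a in range(len(P[i])))
                          for i in range(m)])
        if np.max(np.abs(y_new - y)) < tol * (1 - gamma):
            return y_new
        y = y_new
    return y

# Tiny 2-state, 2-action example (illustrative data).
P = [[np.array([0.8, 0.2]), np.array([0.1, 0.9])],
     [np.array([0.5, 0.5]), np.array([0.0, 1.0])]]
C = [[1.0, 2.0],
     [0.5, 1.5]]
print(value_iteration(P, C))
```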


Algorithmic Events of the MDP Methods

Shapley 53 and Bellman 57 developed the value-iteration method to approximate the optimal state values.

Another best-known method, due to Howard 60, is the policy-iteration method, or block-pivoting simplex method, which generates an optimal policy in a finite number of iterations in a distributed and decentralized way.

de Ghellinck 60, D'Epenoux 60 and Manne 60 showed that the MDP has an LP representation, so it can be solved by the simplex method of Dantzig 47, a special policy-iteration method in which each step switches the action of only one state.

But most analyses of these methods are negative: the simplex method with the smallest-index rule (Melekopoglou/Condon 90), with a random pivoting rule (Friedman et al. 12), and policy iteration for undiscounted finite-horizon MDPs (Fearnley 10) are all exponential.
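A sketch of Howard's policy-iteration method in the same illustrative P, C layout as the value-iteration sketch above: evaluate the current policy exactly, then switch every state to its greedy action.

```python
import numpy as np

def policy_iteration(P, C, gamma=0.9):
    m = len(P)
    policy = [0] * m                          # start from an arbitrary stationary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) y = c_pi exactly.
        P_pi = np.vstack([P[i][policy[i]] for i in range(m)])
        c_pi = np.array([C[i][policy[i]] for i in range(m)])
        y = np.linalg.solve(np.eye(m) - gamma * P_pi, c_pi)
        # Policy improvement: switch every state to its greedy action.
        new_policy = [min(range(len(P[i])), key=lambda a: C[i][a] + gamma * P[i][a] @ y)
                      for i in range(m)]
        if new_policy == policy:
            return policy, y                  # no switch possible: policy is optimal
        policy = new_policy

# usage: policy, y = policy_iteration(P, C)   # with P, C as in the value-iteration sketch
```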


Positive Results for MDP of m states and n actions

The classic simplex method, with the Dantzig pivoting rule, terminates in

    (m(n − m)/(1 − γ)) · log(m²/(1 − γ))

iterations, and each iteration uses at most O(mn) arithmetic operations (Y MOR10).

The policy-iteration or block-pivoting method terminates in

    (n/(1 − γ)) · log(m/(1 − γ))

iterations, and each iteration uses at most m²n arithmetic operations (Hansen/Miltersen/Zwick ACM12).


Positive Results for MDP continued

The simplex method for deterministic MDPs, regardless of the discount factor, terminates in

    O(m³n² log² m)

iterations (Post/Y MOR2014).

The result was extended to general LP by Kitahara and Mizuno 2012; also see renewed exciting research on the simplex method, e.g., Feinberg/Huang 2013, Lee/Epelman/Romeijn/Smith 2013, Scherrer 2014, Fearnley/Savani 2014, Adler/Papadimitriou/Rubinstein 2014, etc.


The Turn-Based Two-Person Zero-Sum Game

The states are partitioned into two sets, one where the player minimizes and one where the player maximizes, i.e., the fixed point becomes:

    y*_i = min{ c_{j_i} + γ p_{j_i}^T y* : j_i ∈ A_i },   ∀ i ∈ I+,
    y*_i = max{ c_{j_i} + γ p_{j_i}^T y* : j_i ∈ A_i },   ∀ i ∈ I−.

It does not admit a convex programming formulation, and it is unknown whether it can be solved in polynomial time in general.

Hansen/Miltersen/Zwick ACM12 proved that the strategy-iteration method solves it in strongly polynomial time when the discount factor is fixed – the first strongly polynomial-time algorithm.
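A minimal min/max value-iteration sketch for the turn-based game, reusing the illustrative P, C layout from the MDP sketches; I_plus (the set of minimizer states) is an assumed input.

```python
import numpy as np

def game_value_iteration(P, C, I_plus, gamma=0.9, num_iter=5000):
    """Iterate the min/max Bellman operator of the turn-based zero-sum game."""
    m = len(P)
    y = np.zeros(m)
    for _ in range(num_iter):
        vals = [[C[i][a] + gamma * P[i][a] @ y for a in range(len(P[i]))] for i in range(m)]
        y = np.array([min(vals[i]) if i in I_plus else max(vals[i]) for i in range(m)])
    return y

# usage: y = game_value_iteration(P, C, I_plus={0})   # with P, C as in the earlier sketches
```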


Advances in Interior-Point Methods

In theory, the best interior-point algorithms converge in O(√(max{m, n})) iterations, where the notation O(·) hides constant and logarithmic factors in the dimensions and the accuracy ε.

Navigating Central Path with Electrical Flows: from Flows to Matchings, and Back (Madry FOCS13). The paper made a breakthrough on solving a class of max-flow and min-cut problems.

Path-Finding Methods for Linear Programming: Solving Linear Programs in O(√(min{m, n})) Iterations and Faster Algorithms for Maximum Flow (Lee and Sidford, FOCS14).

Efficient Inverse Maintenance and Faster Algorithms for Linear Programming (Lee and Sidford, FOCS15), where the authors have further reduced the operation complexity of linear programming.


3 First-Order and Alternating Direction Method of Multipliers


First-Order or “Inverse-Free” Methods

Early work includes the von Neumann projection method (also see Freund and Jarre/Rendl 07).

Use the subgradient method on a transformed problem, assuming a strictly feasible point is known (Renegar 14, Freund 15), with iteration complexity O(L²D²/ε²), where L and D are condition numbers of the optimal solution structure and the feasible region.

First-order Karmarkar potential reduction algorithm...


First-Order or “Inverse-Free” Methods

First-order method for packing and covering LPs (Allen-Zhu/Orecchia 14): penalize the constraints, then use stochastic coordinate descent and several other techniques. The iteration complexity is O(1/ε) for packing LPs and O(1/ε^1.5) for covering LPs.

Two-block ADMM for solving the primal LP (He/Yuan 11, Monteiro/Svaiter 13, Monteiro/Ortiz/Svaiter 14), with iteration complexity O(1/ε) (a matrix needs to be inverted only once).


First-Order Potential Reduction I

    Minimize    f(x)
    Subject to  e^T x = 1,  x ≥ 0,

where e is the vector of all ones.

Assume that f(x) is a convex function of x ∈ R^n and that f(x*) = 0, where x* is a minimizer of the problem. Furthermore,

    f(x + d) − f(x) ≤ ∇f(x)^T d + (γ/2)‖d‖²,

where the positive γ is the Lipschitz parameter.

We consider the potential function (e.g., Karmarkar 84, Todd/Ye 87)

    φ(x) = ρ ln(f(x)) − Σ_j ln(x_j),

where ρ ≥ n, over the simplex. Starting from x⁰ = (1/n)e, we generate a sequence of points x^k, k = 1, ..., whose potential value strictly decreases by a fixed amount at every step.


First-Order Potential Reduction II

Update x by solving

    Minimize    ∇φ(x)^T d
    Subject to  e^T d = 0,  ‖X^{−1} d‖ ≤ β,

where β < 1 is yet to be determined, and set x⁺ = x + d. Then

    φ(x⁺) − φ(x) ≤ −β + (ργ/(2f(x)))β² + β²/(2(1 − β)),

so that one can choose a β to make

    φ(x⁺) − φ(x) ≤ −f(x)/(2(f(x) + 2ργ)).

Theorem
The steepest-descent potential reduction algorithm generates an x^k with f(x^k)/f(x⁰) ≤ ε in no more than

    (4(n + √n) max{1, 2(n + √n)γ/f(x⁰)}/ε) · ln(1/ε)

steps.
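A sketch of one potential-reduction step, under the assumption (not stated on the slide) that the trust-region subproblem is solved in closed form: writing d = Xu, the linearized subproblem's minimizer is u = −β g_proj/‖g_proj‖, where g = X∇φ(x) and g_proj is g projected onto {u : x^T u = 0}; β is set to f(x)/(f(x) + 2ργ) in the spirit of the bound above. The test function and all names are illustrative.

```python
import numpy as np

def potential_reduction_step(x, f, grad_f, rho, lip):
    """One steepest-descent potential-reduction step on the simplex (sketch)."""
    beta = f(x) / (f(x) + 2 * rho * lip)                # a beta suggested by the bound above
    grad_phi = (rho / f(x)) * grad_f(x) - 1.0 / x       # grad phi = (rho/f(x)) grad f − X^{-1} e
    g = x * grad_phi                                    # g = X grad phi; work in u = X^{-1} d
    g_proj = g - (x @ g) / (x @ x) * x                  # enforce e^T d = x^T u = 0
    if np.linalg.norm(g_proj) < 1e-14:
        return x
    u = -beta * g_proj / np.linalg.norm(g_proj)         # ||u|| = beta < 1 keeps x + X u > 0
    return x + x * u                                    # d = X u

# Illustrative data: f(x) = ||x − x*||^2 with a made-up optimum x* on the simplex, Lipschitz = 2.
x_star = np.array([0.7, 0.1, 0.1, 0.1])
f = lambda x: float((x - x_star) @ (x - x_star))
grad_f = lambda x: 2.0 * (x - x_star)

x = np.ones(4) / 4                                      # x0 = (1/n) e
for _ in range(3000):
    x = potential_reduction_step(x, f, grad_f, rho=6.0, lip=2.0)   # rho = n + sqrt(n)
print(f(x))                                             # decreases toward 0 (slowly, per the bound)
```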


Alternating Direction Method of Multipliers

Consider the convex optimization problem

    min_{x∈R^n}  f_1(x_1) + · · · + f_p(x_p)
    s.t.         Ax := A_1 x_1 + · · · + A_p x_p = b,
                 x_i ∈ X_i ⊂ R^{d_i},  i = 1, . . . , p.

Augmented Lagrangian function:

    L_γ(x_1, ..., x_p; y) = Σ_i f_i(x_i) − y^T(Σ_i A_i x_i − b) + (γ/2)‖Σ_i A_i x_i − b‖².


Alternating Direction Method of Multipliers continued

Multi-block ADMM:

    x_1^{k+1} ← argmin_{x_1∈X_1}  L_γ(x_1, x_2^k, . . . , x_p^k; y^k),
    ...
    x_p^{k+1} ← argmin_{x_p∈X_p}  L_γ(x_1^{k+1}, . . . , x_{p−1}^{k+1}, x_p; y^k),
    y^{k+1}  ← y^k − γ(Σ_i A_i x_i^{k+1} − b).

Convergence was well established when p = 1 or p = 2 (..., Glowinski/Marrocco 75, Gabay/Mercier 76, Eckstein/Bertsekas 92, ...).

But what about p > 2?
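A minimal multi-block ADMM sweep for the special case f_i ≡ 0 and X_i = R^{d_i} (the setting of the counterexamples that follow), where each block update has a closed form; the data layout (a list of column blocks) and names are assumptions for illustration.

```python
import numpy as np

def admm_sweep(A_blocks, b, x_blocks, y, gamma=1.0, order=None):
    """One Gauss-Seidel ADMM round for  min 0  s.t.  sum_i A_i x_i = b  (f_i = 0, X_i = R^{d_i})."""
    p = len(A_blocks)
    order = range(p) if order is None else order
    for i in order:
        Ai = A_blocks[i]
        r = sum(A_blocks[j] @ x_blocks[j] for j in range(p) if j != i) - b   # other blocks' residual
        # Block minimizer of L_gamma over x_i:  gamma * Ai^T Ai x_i = Ai^T (y - gamma * r).
        x_blocks[i] = np.linalg.solve(Ai.T @ Ai, Ai.T @ (y / gamma - r))
    residual = sum(A_blocks[j] @ x_blocks[j] for j in range(p)) - b
    y = y - gamma * residual                                                 # multiplier update
    return x_blocks, y

# Illustrative use with three 1-D blocks (the columns of the 3x3 matrix below) and b = 0:
A = np.array([[1., 1., 1.], [1., 1., 2.], [1., 2., 2.]])
A_blocks = [A[:, [i]] for i in range(3)]
x_blocks = [np.ones(1) for _ in range(3)]
y = np.zeros(3)
x_blocks, y = admm_sweep(A_blocks, b=np.zeros(3), x_blocks=x_blocks, y=y)
```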


Convergence of Multi-Block ADMM

Recent Discovery: when p ≥ 3, ADMM can diverge [Chen/He/Y/Yuan MP14]. Take p = 3 and

    Ax = a_1 x_1 + a_2 x_2 + a_3 x_3 = 0,   where  A = [1 1 1; 1 1 2; 1 2 2].

The ADMM iteration is then a linear mapping with matrix M:

    z^{k+1} = M z^k,   where  z^k = (x_1^k; x_2^k; x_3^k; y^k).

Why does it diverge? ρ(M) > 1 for the above A and any choice of γ.

When you randomly choose an initial solution, it diverges with probability one!
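A self-contained numerical check of this counterexample (a sketch, with γ = 1): since the iteration is linear, the per-cycle matrix M can be recovered by applying one cyclic sweep to each standard basis vector of z = (x_1; x_2; x_3; y); per the slide, its spectral radius should exceed 1.

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 1., 2.],
              [1., 2., 2.]])
gamma = 1.0

def cyclic_sweep(z, order=(0, 1, 2)):
    """One ADMM cycle for  min 0  s.t.  A x = 0  with three scalar blocks; z = (x1, x2, x3, y)."""
    x, y = z[:3].copy(), z[3:].copy()
    for i in order:
        a_i = A[:, i]
        r = A @ x - a_i * x[i]                 # contribution of the other blocks
        x[i] = a_i @ (y / gamma - r) / (a_i @ a_i)
    y = y - gamma * (A @ x)
    return np.concatenate([x, y])

# The map z -> cyclic_sweep(z) is linear, so its matrix is obtained column by column.
M = np.column_stack([cyclic_sweep(e) for e in np.eye(6)])
print(max(abs(np.linalg.eigvals(M))))          # spectral radius; > 1, so cyclic ADMM diverges
```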


Almost Always Diverges


Strong Convexity Does not Help

    min   0.05x_1² + 0.05x_2² + 0.05x_3²
    s.t.  [1 1 1; 1 1 2; 1 2 2] (x_1; x_2; x_3) = 0.

The mapping matrix M of the ADMM (γ = 1) has

    ρ(M) = 1.0087 > 1.

That is, even for strongly convex programming, the ADMM is not necessarily convergent for a range of γ > 0.


The Stepsize of ADMM

For some step-size 0 < β < 1, update the Lagrangian multiplier:

    y^{k+1} := y^k − βγ(Σ_i A_i x_i^{k+1} − b).

p = 1 (Augmented Lagrangian Method): converges for β ∈ (0, 2) (Hestenes 69, Powell 69).

p = 2 (Alternating Direction Method of Multipliers): converges for β ∈ (0, (1+√5)/2) (Glowinski 84).

p ≥ 3: converges for β sufficiently small, provided additional conditions on the problem (Hong/Luo 12).


Is there a Problem-Data-Independent β?

For any positive β, the following linear system makes ADMM diverge:

    [1 1 1; 1 1 1+β; 1 1+β 1+β] (x_1; x_2; x_3) = 0.

There is no practical problem-data-independent β such that the small-step-size variant would work.

Many other alternatives were proposed, but somehow they all make ADMM converge slower in practice...


Randomly Permuted ADMM

Random-Permuted ADMM (RP-ADMM): in each round, draw a random permutation σ = (σ(1), . . . , σ(p)) of {1, . . . , p}, and update

    x_{σ(1)} → x_{σ(2)} → · · · → x_{σ(p)} → y.

(This is block sampling without replacement.)

It forces “absolute fairness” among blocks, or a mixed strategy on the updating order.

Random permutation has been used effectively in many areas, but little theoretical analysis is known.

Simulation test result: it almost always converges, and fast!
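A quick simulation sketch of RP-ADMM on the same 3×3 counterexample: the same block update as before, but in a freshly drawn random order each round; empirically the iterate norm shrinks, in contrast to the fixed cyclic order.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1., 1., 1.],
              [1., 1., 2.],
              [1., 2., 2.]])
gamma = 1.0
x, y = rng.standard_normal(3), rng.standard_normal(3)

for k in range(2000):
    for i in rng.permutation(3):               # random update order each round
        a_i = A[:, i]
        r = A @ x - a_i * x[i]                 # other blocks' contribution (b = 0)
        x[i] = a_i @ (y / gamma - r) / (a_i @ a_i)
    y = y - gamma * (A @ x)
    if k % 500 == 0:
        print(k, np.linalg.norm(np.concatenate([x, y])))
```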


Almost Always Converges when Randomizing the Update Order


4 Convergence of RP-ADMM in Expectation


Any Theory Behind the Success?

Consider a square system of linear equations:

    (S)  min_{x∈R^n}  0
         s.t.  A_1 x_1 + · · · + A_p x_p = b,

where A = [A_1, . . . , A_p] ∈ R^{n×n}, x_i ∈ R^{d_i} and Σ_i d_i = n.

After k rounds, RP-ADMM generates z^k, a random variable depending on

    ξ_k = (σ_1, . . . , σ_k),

where σ_j is the randomly picked permutation in the j-th round. Let

    φ_k := E_{ξ_k}(z^k).


Convergence in Expectation (Sun/Luo/Y 15)

Theorem 1
The expected iterate φ_k := E_{ξ_k}(z^k) converges to the solution linearly for any 1 ≤ p ≤ n if A is invertible, i.e.,

    {φ_k}_{k→∞} → [A^{−1}b; 0].

Remark: Expected convergence ≠ convergence, but it is strong evidence of convergence for most problems, e.g., when the iterates are bounded.


The Average Mapping is a Contraction

The update equation of RP-ADMM for (S) (with b = 0) is

    z^{k+1} = M_σ z^k,

where M_σ ∈ R^{2n×2n} depends on σ.

Define the expected update matrix as

    M = E_σ(M_σ) = (1/p!) Σ_σ M_σ.

Theorem 2
The spectral radius of M is strictly less than 1, i.e. ρ(M) < 1, for any 1 ≤ p ≤ n if A is invertible.
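A numerical sketch of Theorem 2 on the 3×3 counterexample: build M_σ for each of the p! = 6 update orders (by applying one sweep to basis vectors, as in the divergence sketch earlier), average them, and inspect the spectral radius, which should now be below 1.

```python
import itertools
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 1., 2.],
              [1., 2., 2.]])
gamma = 1.0

def sweep(z, order):
    x, y = z[:3].copy(), z[3:].copy()
    for i in order:
        a_i = A[:, i]
        x[i] = a_i @ (y / gamma - (A @ x - a_i * x[i])) / (a_i @ a_i)
    y = y - gamma * (A @ x)
    return np.concatenate([x, y])

def cycle_matrix(order):
    return np.column_stack([sweep(e, order) for e in np.eye(6)])

M_bar = np.mean([cycle_matrix(p) for p in itertools.permutations(range(3))], axis=0)
print(max(abs(np.linalg.eigvals(M_bar))))      # expected-update spectral radius; < 1 per Theorem 2
```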


Proof Difficulty of Theorem 2

That Theorem 2 implies Theorem 1 is relatively easy to show, but Theorem 2 itself...

Few tools deal with the spectral radius of non-symmetric matrices. E.g., ρ(X + Y) ≤ ρ(X) + ρ(Y) and ρ(XY) ≤ ρ(X)ρ(Y) do not hold.

Though ρ(M) ≤ ‖M‖, it turns out that ‖M‖ > 2.3 for the counterexample.

RP sampling is not independent, so M is a complicated function of A.

Techniques: symmetrization and mathematical induction.


Two Main Lemmas to Prove Theorem 2

Step 1: Relate M to a symmetric matrix AQA^T.

Lemma 1
λ ∈ eig(M) ⇐⇒ (1 − λ)²/(1 − 2λ) ∈ eig(AQA^T).

Since Q is symmetric, we have

    ρ(M) < 1 ⇐⇒ eig(AQA^T) ⊆ (0, 4/3).

Step 2: Bound the eigenvalues of AQA^T – proved by induction.

Lemma 2
eig(AQA^T) ⊆ (0, 4/3).

Remark: 4/3 is “almost” tight; for m = 3 the maximum is ≈ 1.18, and it increases toward 4/3 as m increases.


Fixed Cyclic ADMM vs RP-ADMM

Solving weakly Laplacian linear systems:

    A(i, i) = 1,   A(i, j) = A(j, i) = τ · rand(1, 1).

    (n, τ)          CycADMM    RP-ADMM (iter, time)
    (500, 0.01)     diverge    (10.1, 1.3)
    (500, 0.05)     diverge    (67, 0.61)
    (5000, 0.002)   diverge    (7.5, 21.1)
    (5000, 0.01)    diverge    (22, 38)
    (5000, 0.02)    diverge    (205, 251)
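A sketch of how test matrices of this form might be generated; the stopping rule, iteration counts, and timings in the table are from the talk and are not reproduced here.

```python
import numpy as np

def weakly_laplacian(n, tau, rng):
    """A(i,i) = 1, A(i,j) = A(j,i) = tau * Uniform(0,1) for i != j."""
    U = tau * rng.random((n, n))
    A = np.triu(U, 1)
    return A + A.T + np.eye(n)

rng = np.random.default_rng(0)
A = weakly_laplacian(500, 0.01, rng)
b = A @ rng.standard_normal(500)     # a consistent right-hand side, so a solution exists
# ... then run RP-ADMM with n scalar blocks on  min 0  s.t.  A x = b  (as in the earlier sketches)
```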


Further Result on Randomized Permutation for Convex Optimization

Consider the non-separable convex quadratic problem

    min_{x∈R^n}  x^T H x + c^T x
    s.t.         Ax := A_1 x_1 + · · · + A_p x_p = b,
                 x_i ∈ X_i ⊂ R^{d_i},  i = 1, . . . , p.

Theorem 3
If each block subproblem possesses a unique solution, the expected iterate φ_k := E_{ξ_k}(z^k) converges to an optimal solution of the original problem (Chen, Li, Liu and Y 15).


5 Why Multi-Block ADMM?


Universal Decomposition and Block Coordinate Descent

Consider the homogeneous and self-dual linear program: find a feasible solution (x, y, s) such that

    Ax − bτ = 0,
    −A^T y − s + cτ = 0,
    b^T y − c^T x − κ = 0,
    e^T x + τ + e^T s + κ = 1,
    (x, τ, s, κ) ≥ 0,

where the three blocks (x, τ), y and (s, κ) are alternately updated (O'Donoghue/Chu/Parikh/Boyd 15); also see Sun and Toh 14 for SDP.
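For concreteness, a sketch that evaluates the residuals of the homogeneous self-dual system above for given LP data (A, b, c); the actual three-block ADMM splitting follows the cited references and is not shown.

```python
import numpy as np

def hsd_residuals(A, b, c, x, tau, y, s, kappa):
    """Residuals of the homogeneous self-dual feasibility system (all ~0 at a solution)."""
    r1 = A @ x - b * tau                        # A x - b tau = 0
    r2 = -A.T @ y - s + c * tau                 # -A^T y - s + c tau = 0
    r3 = b @ y - c @ x - kappa                  # b^T y - c^T x - kappa = 0
    r4 = x.sum() + tau + s.sum() + kappa - 1.0  # normalization: e^T x + tau + e^T s + kappa = 1
    return r1, r2, r3, r4
```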


Combine ADMM and Interior-Point

Consider the logarithmic barrier function as the objective:

min   −μ ln(τκ) − μ Σ_j ln(x_j s_j)
s.t.  Ax − bτ = 0,
      −A^T y − s + cτ = 0,
      b^T y − c^T x − κ = 0,
      e^T x + τ + e^T s + κ = 1,

which is solved by Multi-Block ADMM.

Then μ is gradually reduced to 0 as in interior-point methods; initial computational results are encouraging (Lin/Ma 15)!

Ye, Yinyu (Stanford) Alternative Algorithms for LP? MIT ORC, 2015 42 / 45
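
A sketch of the resulting outer loop, with the inner multi-block ADMM solve abstracted as a callback; the names and the reduction factor are illustrative, not taken from Lin/Ma:

```python
def barrier_admm_homotopy(solve_barrier_subproblem, z0, mu0=1.0, sigma=0.1, mu_min=1e-8):
    """Interior-point-style homotopy around the barrier problem above.
    `solve_barrier_subproblem(mu, z)` is assumed to run multi-block ADMM on
    the mu-barrier problem from warm start z and return the new iterate."""
    mu, z = mu0, z0
    while mu > mu_min:
        z = solve_barrier_subproblem(mu, z)  # inner multi-block ADMM, warm-started
        mu *= sigma                          # shrink the barrier parameter toward 0
    return z
```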



Implementation on Distributed Platforms I

Reformulate

minimize_x   c^T x
subject to   Ax = b,  x ≥ 0,

as

minimize_{x_0, x_1, ..., x_m}   c^T x_0
subject to   a_i x_i = b_i,   i = 1, ..., m,
             x_i − x_0 = 0,   i = 1, ..., m,
             x_0 ≥ 0.

Then apply Multi-Block ADMM to the new formulation (He and Yuan 15, Sun/Y 15).

Ye, Yinyu (Stanford) Alternative Algorithms for LP? MIT ORC, 2015 43 / 45
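
Here (a_i, b_i) is the i-th row block of (A, b) and each x_i is a local copy of x coupled to the global variable x_0. A tiny NumPy sketch of the partition (helper name is mine):

```python
import numpy as np

def split_rows(A, b, m):
    """Partition the rows of (A, b) into m blocks (a_i, b_i) for the
    consensus reformulation above; np.array_split allows uneven blocks."""
    return list(zip(np.array_split(A, m, axis=0), np.array_split(b, m)))
```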



Implementation on Distributed Platforms II

In each round of ADMM, for i = 1, ..., m, for a simple quadratic objective one independently solves

minimize_{x_i}   q_i(x_i)
subject to   a_i x_i = b_i;

then updates x_0 as

x_0 = max{ (1/m) Σ_i x_i, 0 }.

Or

minimize_{x_i}   q_i(x_i)
subject to   a_i x_i = b_i,  x_i ≥ 0;

x_0 = (1/m) Σ_i x_i.

Ye, Yinyu (Stanford) Alternative Algorithms for LP? MIT ORC, 2015 44 / 45
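
A minimal sketch of one such round under the first variant, assuming each per-block quadratic q_i has already been assembled as a pair (H_i, g_i) from the augmented Lagrangian (the slide does not spell out these coefficients, so they are simply passed in); each equality-constrained subproblem is solved through its KKT system, and x_0 is the projected average:

```python
import numpy as np

def solve_eq_qp(H, g, Ai, bi):
    """Minimize (1/2) x^T H x + g^T x subject to Ai x = bi via the KKT system
    (H is assumed positive definite on the nullspace of Ai)."""
    n, m = H.shape[0], Ai.shape[0]
    K = np.block([[H, Ai.T], [Ai, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-g, bi]))
    return sol[:n]                              # primal part; sol[n:] are multipliers

def consensus_round(blocks):
    """One ADMM round in the spirit of this slide: every worker i solves its
    simple QP independently (in parallel on a distributed platform), then x_0
    is the projected block average.  `blocks` is a list of (H_i, g_i, a_i, b_i)."""
    xs = [solve_eq_qp(H, g, Ai, bi) for (H, g, Ai, bi) in blocks]
    x0 = np.maximum(np.mean(xs, axis=0), 0.0)   # x_0 = max{(1/m) sum_i x_i, 0}
    return xs, x0
```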



Summary and Future Directions

Could MDPs be solved in strongly polynomial time regardless of the discount factor?

Could the recent theoretically better interior-point algorithms be realized in practice?

Could RP-ADMM be shown to converge with high probability, and/or could the RP-ADMM result be extended to more general convex optimization problems?

Big Picture: Are there other linear programming algorithms that are competitive in both theory and practice?

Multi-Block ADMM might have a chance (at least in practice):

Good theories have been established.
It could easily be implemented in a distributed way on a cloud and/or GPU platform.

Linear Programming Research Continues ...

Ye, Yinyu (Stanford) Alternative Algorithms for LP? MIT ORC, 2015 45 / 45
