DESIGN AND IMPLEMENTATION OF
PARALLEL AND RANDOMIZED
APPROXIMATION ALGORITHMS
A Project Report Submitted
in Partial Fulfilment of the Requirements
for the Degree of
BACHELOR OF TECHNOLOGY
in
Mathematics and Computing
by
Ajay Shankar Bidyarthy
(Roll No. 09012305)
to the
DEPARTMENT OF MATHEMATICS
INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI
GUWAHATI - 781039, INDIA
April 2013
CERTIFICATE
This is to certify that the work contained in this project report entitled “Design and implementation of parallel and randomized approximation algorithms” submitted
by Ajay Shankar Bidyarthy (Roll No.: 09012305) to Indian Institute of Technology
Guwahati towards partial requirement of Bachelor of Technology in Mathematics and
Computing has been carried out by him under my supervision and that it has not been
submitted elsewhere for the award of any degree.
Guwahati - 781 039 (Dr. Gautam K. Das)
April 2013 Project Supervisor
ABSTRACT
The multiplicative weights algorithm is a well known technique for approximately solving packing-covering linear programs (LPs) and semidefinite programs (SDPs). There has been considerable recent research building on this technique, developing fast approximate solutions to packing-covering linear as well as semidefinite programs [1], [2], [8], [12], [7].
In this report we implement these techniques. Our implementation outperforms the best known semidefinite programming solvers in the literature. Our algorithms run much faster for moderately small approximation factors and large instances (5000 × 5000 dense real matrices), compared to SeDuMi [11] (a well-known LP and SDP solver) and GLPK.
These algorithms are a building block for approximately solving large-scale combinatorial optimization problems, for example the Lovasz ϑ function, problems in control and system theory, and sparsest cut. Our algorithms have been implemented in MATLAB and have been tested on multiple modern 2-core machines.
We also compare our results with the matrix multiplicative weights algorithm for com-
puting approximate solutions to packing-covering semidefinite programs [7], and present
results on problems in SDPLIB [4].
Keywords: convex optimization, approximation algorithm, packing-covering semidef-
inite programs, positive semidefinite programming, multiplicative weights, online algo-
rithm, experimental analysis.
Contents
List of Figures vii
List of Tables viii
1 Fast Algorithms for Approximate Packing Semidefinite Programs using
the Multiplicative - Weights Update Techniques 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Primal only approach for approximately solving SDPs . . . . . . . . . . . 2
1.2.1 Overview of algorithm . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 ORACLE implementation . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.1 Techniques for correctness . . . . . . . . . . . . . . . . . . . . . . 10
1.4.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Converting a feasibility engine into an optimizer . . . . . . . . . . . . . . 12
1.6 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6.1 AHK versus SeDuMi . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 Solving Optimization problems using AHK optimizer 16
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Techniques to solve relaxed SDPs . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Conclusions and future work . . . . . . . . . . . . . . . . . . . . . . . . . 20
3 The Multiplicative - Weights Update Techniques: Survey 21
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 The Weighted Majority Algorithm . . . . . . . . . . . . . . . . . . . . . . 22
3.3 The Multiplicative - Weights Update Techniques . . . . . . . . . . . . . . 23
3.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Data Streaming and Online Algorithm: Matrix Multiplicative - Weights
Update technique 27
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 Data Streaming and Online Problems . . . . . . . . . . . . . . . . . . . . 28
4.2.1 2-Player Zero-Sum Game . . . . . . . . . . . . . . . . . . . . . . . 28
4.2.2 Matrix Multiplicative - Weights Update Algorithm . . . . . . . . 30
4.2.3 Data Streaming and Online Algorithm . . . . . . . . . . . . . . . 31
4.3 Experiments and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.1 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3.3 Experimental Analysis . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5 Fast Algorithms for Approximate Packing-Covering Semidefinite Pro-
grams using the Multiplicative - Weights Update Techniques 43
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2.1 Computation of vector y . . . . . . . . . . . . . . . . . . . . . . . 47
5.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.3.1 Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.3.2 Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Bibliography 51
List of Figures
1.1 CPU Time tradeoff versus n (problem size) of AHK Algorithm . . . . . . 14
1.2 CPU Time tradeoff versus n (problem size) of AHK Algorithm and of
SeDuMi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1 Total payoff and worst payoff Vs. number of rounds . . . . . . . . . . . 34
4.2 upperbound, random payoff and random payoff Div. total payoff Vs. num-
ber of rounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Total actual payoff Vs. size of PSD matrices . . . . . . . . . . . . . . . . 37
4.4 Random payoff Vs. size of PSD matrices . . . . . . . . . . . . . . . . . . 37
4.5 Random payoff div. total actual payoff Vs. size of PSD matrices . . . . . 38
4.6 Worst payoff Vs. size of PSD matrices . . . . . . . . . . . . . . . . . . . 38
4.7 Upper bound Vs. size of PSD matrices . . . . . . . . . . . . . . . . . . . 39
4.8 Total actual payoff Vs. ε error accuracy . . . . . . . . . . . . . . . . . . . 39
4.9 Random payoff Vs. ε error accuracy . . . . . . . . . . . . . . . . . . . . . 40
4.10 Random payoff div. total actual payoff Vs. ε error accuracy . . . . . . . 40
4.11 Worst payoff Vs. ε error accuracy . . . . . . . . . . . . . . . . . . . . . . 41
4.12 Upper bound Vs. ε error accuracy . . . . . . . . . . . . . . . . . . . . . . 41
List of Tables
2.1 SDP lower bound for Lovasz ϑ functions . . . . . . . . . . . . . . . . . . 19
2.2 SDP upper bound for Lovasz ϑ functions . . . . . . . . . . . . . . . . . . 19
Chapter 1
Fast Algorithms for Approximate
Packing Semidefinite Programs
using the Multiplicative - Weights
Update Techniques
1.1 Introduction
Semidefinite programming (SDP) is linear programming over positive semidefinite matrices. The properties of symmetric positive semidefinite matrices and their associated convex quadratic forms have many applications in mathematics and in industry. With the availability of optimization software, more and more problems are modeled as semidefinite programs. Hence semidefinite programming has become one of the basic modeling and standard optimization tools, along with linear and quadratic programming.
SDP has applications in control theory, robust optimization, combinatorial optimization and eigenvalue optimization. SDP had been used widely even before the development of efficient algorithms. Today SDP is one of the basic modeling and optimization tools, along with linear and quadratic programming.
In 2005 Arora et al. [7] developed the multiplicative-weights update technique to approximately solve primal-only SDPs (we will refer to it as AHK), and later Arora and Kale [2] developed the same technique for primal-dual SDPs (we will refer to it as AK). A recent paper by Raman et al. [12] developed fast approximations to packing-covering LPs and constant-sum games via the multiplicative-weights technique.
We closely follow the algorithms proposed by AHK and AK to design multiplicative-weights update algorithms, and we implement them to compute approximate solutions to packing-covering semidefinite programs. From now on we refer to the multiplicative weights update framework as the MW framework.
1.2 Primal only approach for approximately solving
SDPs
Semidefinite programming (SDP) solves the following problem:

    min  c • X
    subject to  A_j • X ≥ b_j  for j = 1, 2, ..., m
                X ⪰ 0,                                          (1.1)

where X ∈ R^{n×n} is a matrix of variables and A_1, A_2, ..., A_m ∈ R^{n×n}. For n × n matrices A and B, A • B = Σ_{ij} A_{ij} B_{ij} is their inner product, treating them as vectors in R^{n²}, and A ⪰ 0 denotes that A is positive semidefinite. Our implementation contributes a modified multiplicative weights technique that handles some high-width SDPs. Here we implicitly perform a binary search on the optimum, and the objective is converted to a constraint in the standard way (a later section gives the technique for this). One additional constraint is assumed: a bound on the trace of the solution (Σ_i X_{ii}):

    A_j • X ≥ b_j  for j = 1, 2, ..., m
    Σ_i X_{ii} ≤ R
    X ⪰ 0.                                                      (1.2)
Here we have added one additional constraint, namely an upper bound on the trace, Tr(X) = Σ_i X_{ii} ≤ R. For SDP relaxations it is natural to put an upper bound on the trace. We solve the SDP approximately up to a given tolerance ε, by which we mean that either we find a solution X which satisfies all the constraints up to an additive error of ε, i.e.

    A_j • X − b_j ≥ −ε  for j = 1, 2, ..., m,

or conclude correctly that the SDP is infeasible.
1.2.1 Overview of algorithm
The multiplicative weights update idea for solving SDPs is to perform several iterations, indexed by time t, as follows.
Following AHK, associate a non-negative weight w_j^{(t)} with constraint j, where Σ_j w_j^{(t)} = 1. A high current weight for a constraint indicates that it was not satisfied well in the past, and therefore it should receive higher priority in the next step. Thus the optimization problem for the next step is

    max  Σ_j w_j^{(t)} (A_j • X − b_j)
    subject to  Σ_i X_{ii} ≤ R
                X ⪰ 0,
which is an eigenvalue problem, since the optimum is attained at an X of rank 1. The Lagrangian relaxation idea is to solve this eigenvalue problem and update the weights w_j according to the usual multiplicative update rule, for some constant β:

    w_j^{(t+1)} = w_j^{(t)} (1 − β(A_j • X_t − b_j)) / S_t,

where X_t is the solution of the eigenvalue problem at time t, expressed as a rank-1 PSD matrix, and S_t is the normalization factor making the weights sum to 1:

    S_t = Σ_j w_j^{(t)} (1 − β(A_j • X_t − b_j)).

If β is small enough, the average (1/T) Σ_{t=1}^{T} X_t is guaranteed to converge to a near-feasible solution to the original SDP, assuming a feasible solution exists.
1.3 Algorithm
The objective is to

    min  c • X
    subject to  A_j • X ≥ b_j  for j = 1, 2, ..., m
                X ⪰ 0.

Let n be the size of the matrices A_j for j = 1, 2, ..., m, with m being the number of constraints. Let R be the upper bound on Tr(X), i.e. Tr(X) = Σ_i X_{ii} ≤ R. We assume each A_j is symmetric, i.e. A_j = A_jᵀ for j = 1, 2, ..., m. Let N ≥ n be the number of nonzero entries of each A_j.
Algorithm 1 checks the feasibility of a given SDP problem. To get the optimal solution of a particular SDP problem, we implicitly perform a binary search on the optimum, and the objective is converted to a constraint in the standard way. In the next section we provide details of the binary search on the optimum, and of the conversion of the objective function to a constraint, for the AHK algorithm. Since this algorithm uses the multiplicative weights update technique and checks the feasibility of a given SDP, we call it the decision making algorithm: the AHK primal-only multiplicative weights SDP solver.
Let C be the weighted cumulative constraint matrix (as defined by AHK); it is used to compute the eigenvector V corresponding to the largest eigenvalue λ by the Lanczos method, and V Vᵀ is added to the current solution XCur.
Algorithm 1 returns a feasible/infeasible solution PSD matrix X and the corresponding number of satisfied/unsatisfied constraints (feasible/infeasible constraints) A_j • X ≥ b_j for j = 1, 2, ..., m.
Theorem 1.3.1. (Theorem 1, Arora et al. [7])
Consider the SDP (1.2). Let P = {X ⪰ 0, Σ_i X_{ii} ≤ R}. Assume that for any j, A_j • X − b_j lies in one of the ranges [−ℓ, ρ] or [−ρ, ℓ], where 1 ≤ ℓ ≤ ρ. Then ρ is called the width of the constraints. Also assume that there is an algorithm (ORACLE) which runs in time T_oracle and, given any set of non-negative weights w_1, w_2, ..., w_m on the constraints summing to 1, either finds an X ∈ P which makes the weighted combination Σ_{j=1}^{m} w_j (A_j • X − b_j) ≥ −ε/2, or declares correctly that no X ∈ P makes this combination non-negative. Then there is an algorithm which runs in O((ℓρ/ε²)(T_oracle + m)) time and either finds an ε-approximate solution to SDP (1.2) or concludes that it is infeasible.
Theorem 1.3.2. (Theorem 2, Arora et al. [7])
With the setup as in the previous theorem, there is an algorithm which produces an ε-approximate solution to the SDP (1.2) or declares correctly its infeasibility in time

    O(m log(ρ) T_oracle + m log(ρ) M(m log(ρ))),

where M(m) = O(m^{2.3}) is the time needed to multiply two m × m matrices (m is the number of constraints). The O notation is used here to suppress polylog(mn/ε) factors.
Input: A_j ∈ R^{n×n} and b_j for j = 1, 2, ..., m; ε; R
Result: Feasible/infeasible constraints and the solution PSD matrix X
Initialize w_j such that Σ_j w_j = 1; numiter = n log n / ε²; β = min{ε/2, 0.01}.
while T < numiter do
    Update T: T = T + 1.
    Compute C: C = Σ_{j=1}^{m} w_j^{(t)} (A_j − (b_j/R) I).
    if C is negative definite then
        report that the SDP is infeasible and stop
    else
        compute the largest eigenvector V of C
    end
    Compute X_t: X_t = V Vᵀ   // t is the current iteration
    Update w_j: w_j^{(t+1)} = w_j^{(t)} (1 − β(A_j • X_t − b_j)) / S_t, where
        S_t = Σ_{j=1}^{m} w_j^{(t)} (1 − β(A_j • X_t − b_j)).
    Compute XCur: XCur = (1/T) Σ_{t=1}^{T} X_t
    // Check the feasibility of the constraints
    if A_j • XCur ≥ b_j for all j = 1, 2, ..., m then
        stop
    else
        continue
    end
end
Compute X: X = (1/T) Σ_{t=1}^{T} X_t, where T = number of iterations (numiter); check the feasibility of the constraints and return the feasible/infeasible constraints and the solution PSD matrix X.

Algorithm 1: Decision making algorithm for primal-only SDP using the multiplicative weights update method
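As a concrete illustration, the loop of Algorithm 1 can be sketched in a few lines of NumPy. This is a simplified sketch, not the MATLAB implementation described in this report: it uses a dense full eigendecomposition in place of the Lanczos method, and the weight clipping near the end of the loop is a practical safeguard of ours, not part of the algorithm as stated.

```python
import numpy as np

def ahk_feasibility(A, b, eps, R):
    """Sketch of Algorithm 1: primal-only AHK feasibility check.

    A : (m, n, n) array of symmetric constraint matrices A_j
    b : (m,) array of right-hand sides b_j
    Returns (feasible, X) where X is the running average of rank-1 iterates.
    """
    m, n, _ = A.shape
    numiter = max(1, int(np.ceil(n * np.log(n + 1) / eps**2)))
    beta = min(eps / 2, 0.01)
    w = np.full(m, 1.0 / m)                    # weights summing to 1
    X_sum = np.zeros((n, n))
    for t in range(1, numiter + 1):
        # weighted cumulative constraint matrix C = sum_j w_j (A_j - (b_j / R) I)
        C = np.tensordot(w, A, axes=1) - (w @ b / R) * np.eye(n)
        lams, vecs = np.linalg.eigh(C)         # dense stand-in for Lanczos
        if lams[-1] < 0:                       # C negative definite => infeasible
            return False, X_sum / max(t - 1, 1)
        v = vecs[:, -1]                        # largest eigenvector of C
        X_t = np.outer(v, v)                   # rank-1 PSD iterate X_t = V V^T
        X_sum += X_t
        X_cur = X_sum / t
        if np.all(np.einsum('jkl,kl->j', A, X_cur) - b >= -eps):
            return True, X_cur                 # all constraints hold up to -eps
        # multiplicative weights update, then renormalize to sum 1
        w = w * (1 - beta * (np.einsum('jkl,kl->j', A, X_t) - b))
        w = np.maximum(w, 1e-12)               # safeguard: keep weights positive
        w /= w.sum()
    return False, X_sum / numiter
```

For instance, the single constraint I • X ≥ 0 is trivially feasible for any PSD X, while −I • X ≥ 1 is infeasible for trace at most 1, and the sketch reports both correctly.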
1.3.1 ORACLE implementation
Here we describe the ORACLE implementation using approximate eigenvector computations. We state three lemmas of Arora et al. [7] which together allow us to implement the ORACLE (proofs are omitted).
Lemma 1.3.3. (Lemma 1, Arora et al. [7])
Suppose we have a procedure that, given a matrix C ∈ R^{n×n} and a tolerance δ > 0, computes a unit vector x which satisfies xᵀCx ≥ −δ in time T_ev(C, δ), or declares correctly that C is negative definite. Then by using this procedure once with C = Σ_{j=1}^{m} w_j (A_j − (b_j/R) I) and δ = ε/(2R), we have implemented the ORACLE.
We need to approximately compute the top eigenvector of the matrix which represents the weighted combination of the constraints. The Lanczos algorithm with a random starting vector is the most efficient algorithm for finding extreme eigenvectors. Its running time is as follows:

Lemma 1.3.4. (Lemma 2, Arora et al. [7])
Let C ∈ R^{n×n} be a matrix with N non-zero entries and eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n. Let δ > 0 be a given error parameter. Let γ = max{λ_1/(λ_1 + |λ_n|), δ/|λ_n|}. Then the Lanczos algorithm with a random start, applied to the matrix C + |λ_n| I, yields with high probability a unit vector x which satisfies xᵀCx ≥ −δ, or declares correctly that C is negative definite, in time T_ev(C, δ) = O(N/√γ).

The parameter γ is unknown; in applications we derive a lower bound for it. The lemma also shows that the running time of the ORACLE depends on the sparsity of C, i.e. the number of non-zero entries in it. This motivates the following randomized sparsification procedure:
Lemma 1.3.5. (Lemma 3, Arora et al. [7])
Let C ∈ R^{n×n} be a matrix with N non-zero entries and let S = Σ_{ij} |C_{ij}|. Let δ > 0 be a given error parameter. Then there is a randomized procedure which runs in O(N) time and with high probability produces a matrix C′ such that C′ has O(√n S/δ) non-zero entries and, for all unit vectors x ∈ R^n, we have |xᵀCx − xᵀC′x| ≤ O(δ).

C′ can be used in place of C in the Lanczos algorithm if it turns out to be sparser; the decision in specific applications depends on the relative values of N and √n S/δ.
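To make the eigenvector step concrete, here is a small sketch that uses power iteration on a shifted matrix as a simple stand-in for the Lanczos method of Lemma 1.3.4 (Lanczos converges faster, but the interface is the same: return a unit vector x with xᵀCx ≥ −δ, or report that C is negative definite). The shift by ‖C‖₁ + 1 is our own choice to make the iteration matrix positive definite; it is not taken from the lemma.

```python
import numpy as np

def approx_top_eigvec(C, delta, iters=300, seed=0):
    """Return a unit vector x with x^T C x >= -delta, or None if C is
    (numerically) negative definite. Power iteration on the shifted
    matrix M = C + shift * I, a simple stand-in for Lanczos."""
    n = C.shape[0]
    shift = np.abs(C).sum(axis=1).max() + 1.0   # exceeds the spectral radius of C
    M = C + shift * np.eye(n)                   # positive definite shift
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):                      # converge to the top eigenvector
        x = M @ x
        x /= np.linalg.norm(x)
    if x @ C @ x < -delta:
        return None                             # treat C as negative definite
    return x
```

For a diagonal test matrix the iteration aligns with the largest eigenvalue's coordinate, and for a strictly negative definite C it returns None.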
1.4 Correctness
We have checked the correctness of the decision making algorithm (the AHK primal-only multiplicative weights update SDP solver) implemented in MATLAB by verifying its solutions against those obtained from other SDP solvers such as SeDuMi, SDPA, SDPLR and CSDP. All agree for a given SDP optimization problem with a min/max objective and m constraints of size n × n.
Next we give the standard form of SDP that SeDuMi, SDPA, SDPLR, etc. deal with.
SDP

    P:  min  Σ_{j=1}^{m} c_j x_j
        subject to  Σ_{j=1}^{m} F_j x_j − F_0 = X
                    X ⪰ 0

    D:  max  F_0 • Y
        subject to  F_j • Y = c_j,  j = 1, 2, ..., m
                    Y ⪰ 0                                       (1.3)

where m is the number of constraint matrices, each of size n × n, and X, Y are n × n PSD matrices. Sometimes we also deal with a different primal-dual pair P and D of the form:

SDP'

    P:  min  A_0 • X
        subject to  A_j • X = b_j,  j = 1, 2, ..., m
                    X ⪰ 0

    D:  max  Σ_{j=1}^{m} b_j y_j
        subject to  Σ_{j=1}^{m} A_j y_j + Z = A_0
                    Z ⪰ 0                                       (1.4)

We can easily transform SDP' to SDP as follows:

    −A_j (j = 0, ..., m) → F_j (j = 0, ..., m)
    −b_j (j = 1, ..., m) → c_j (j = 1, ..., m)
    X → Y
    y → x
    Z → X                                                       (1.5)
AHK solves the SDP (original problem):

    min  A_0 • X
    subject to  A_j • X ≥ b_j,  j = 1, 2, ..., m
                X ⪰ 0                                           (1.6)

SeDuMi, SDPA and the other solvers handle equality SDPs, so we translate from (1.6) to (1.8) by adding slack variables. Let y_j for j = 1, 2, ..., m be the slack variables; then equation (1.6) becomes

    min  −A_0 • X
    subject to  A_j • X − y_j = b_j,  j = 1, 2, ..., m
                X ⪰ 0                                           (1.7)

which is nothing but

    max  A_0 • X
    subject to  A_j • X − y_j ≤ b_j,  j = 1, 2, ..., m
                X ⪰ 0

OR

    min  −A_0 • X
    subject to  −A_j • X + y_j ≥ −b_j,  j = 1, 2, ..., m
                X ⪰ 0                                           (1.8)
Note that we keep equation (1.8) in equality form, since it is solved by SeDuMi, SDPA, etc., which solve equality SDPs only. After the transformation, the new constraint matrices and objective (denote them M_j) look like

    M_j = [ A_j      0_{n×m} ]
          [ 0_{m×n}  y_j     ],   for j = 0, 1, 2, ..., m,

and the new solution PSD matrix (denote it Y) looks like

    Y = [ X_{n×n}   0_{n×m}             ]
        [ 0_{m×n}   diag(y_1, ..., y_m) ] ⪰ 0,

so Y is an (n + m) × (n + m) PSD matrix. Y being PSD ensures that y_1, y_2, ..., y_m are all non-negative and that X is PSD. This is because det(Y − λI) = (y_1 − λ)(y_2 − λ)...(y_m − λ) · det(X − λI) = 0; since Y is PSD, all roots are non-negative, which implies y_1 ≥ 0, y_2 ≥ 0, ..., y_m ≥ 0 and that all roots of det(X − λI) = 0 are non-negative, i.e. X is PSD.
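The block embedding above is easy to check numerically. The sketch below builds Y = blockdiag(X, diag(y_1, ..., y_m)) and confirms that Y is PSD exactly when X is PSD and every slack y_j is non-negative; the helper names are ours.

```python
import numpy as np

def slack_embedding(X, y):
    """Embed X and the slacks y into Y = blockdiag(X, diag(y)),
    the (n+m) x (n+m) matrix from the transformation above."""
    n, m = X.shape[0], len(y)
    Y = np.zeros((n + m, n + m))
    Y[:n, :n] = X                 # top-left block: the original solution X
    Y[n:, n:] = np.diag(y)        # bottom-right block: slacks on the diagonal
    return Y

def is_psd(M, tol=1e-9):
    """A symmetric matrix is PSD iff its smallest eigenvalue is >= 0."""
    return np.linalg.eigvalsh(M).min() >= -tol
```

With X = [[2, 1], [1, 2]] (PSD) and y = (1, 3), Y is PSD; flipping one slack to −1 destroys positive semidefiniteness, matching the determinant argument above.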
1.4.1 Techniques for correctness
The following steps represent the technique for checking the correctness of the decision making algorithm AHK SDP solver:

1. Solve the SDP optimization problem using the decision making algorithm AHK SDP solver and check how many constraints are satisfied (feasible); say K.

2. Solve the SDP optimization problem using SDPA, SeDuMi and other solvers with exactly those K constraints and the objective A_0. Find the optimal value, say α*.

3. Now give the SDP optimization problem back to the decision making algorithm AHK SDP solver with exactly the K constraints plus an extra constraint built from A_0 (i.e. A_0 • X ≥ α*).

Thus we now have K constraints plus A_0 • X ≥ α* ± δ, for small δ > 0. The decision making algorithm AHK should then report

    A_0 • X ≥ α* − δ : feasible (satisfiable), and
    A_0 • X ≥ α* + δ : infeasible (not satisfiable).            (1.9)

Feasibility of A_0 • X ≥ α* − δ together with infeasibility of A_0 • X ≥ α* + δ guarantees that the decision making algorithm AHK SDP solver returns an ε-optimal and δ-feasible optimum value of the optimization problem.
1.4.2 Example
Input for the decision making algorithm AHK: n = 2, m = 3,

    A_0 = [ −11   0 ]    A_1 = [ 10  4 ]    A_2 = [ 0   0 ]    A_3 = [  0  −8 ]
          [   0  23 ],         [  4  0 ],         [ 0  −8 ],         [ −8  −2 ],

    b = [ −48, −8, −20 ]ᵀ.
Output of the decision making algorithm AHK: K = 3 (all three constraints satisfied (feasible)). We then give this SDP optimization problem to SDPA, SeDuMi and other solvers, whose output is:

    objValPrimal = +2.3000000262935881e+01
    objValDual   = +2.2999999846605416e+01

Now we give the SDP optimization problem back to the decision making algorithm AHK SDP solver with 4 constraints. The AHK outputs, for n = 2, m = 3 + 1 = K + 1, ε = 0.01, R ≥ 11 (since Tr(X) ≤ R), δ = 0.01 and α* = 23.0, are

    A_0 • X ≥ α* + δ : infeasible (not satisfiable)
    A_0 • X ≥ α* − δ : feasible (satisfiable).
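The reported optimum is easy to sanity-check: the rank-1 PSD matrix X = diag(0, 1), found by inspection (it is not the matrix returned by any of the solvers), satisfies all three constraints of the example and attains A_0 • X = 23, matching objValPrimal above.

```python
import numpy as np

# constraint data of the example
A0 = np.array([[-11.0, 0.0], [0.0, 23.0]])
A = [np.array([[10.0, 4.0], [4.0, 0.0]]),    # A1, with b1 = -48
     np.array([[0.0, 0.0], [0.0, -8.0]]),    # A2, with b2 = -8
     np.array([[0.0, -8.0], [-8.0, -2.0]])]  # A3, with b3 = -20
b = [-48.0, -8.0, -20.0]

X = np.diag([0.0, 1.0])   # PSD, trace 1, found by inspection

# A_j . X >= b_j for all j: (0 >= -48), (-8 >= -8), (-2 >= -20)
feasible = all(float(np.sum(Aj * X)) >= bj for Aj, bj in zip(A, b))
objective = float(np.sum(A0 * X))   # A0 . X = 23
```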
1.5 Converting a feasibility engine into an optimizer
In this section we explain the technique to convert a feasibility engine into an optimizer. Let lb be the expected lower bound of the minimum and ub the expected upper bound of the minimum. By "constraints + objective" we mean A_j • X ≥ b_j for j = 0, 1, 2, ..., m, i.e. the AHK decision making SDP solver handles m + 1 constraints (the objective A_0 is also included among the constraints). To obtain the optimum of a given problem we formulate the objective function as the (m + 1)th constraint: A_0 • X ≥ lb or A_0 • X ≥ ub.
In Algorithm 2, each step moves lb towards ub or ub towards lb. If ub − lb < ε then we have an ε-approximation of the optimum and return lb as the optimum of the given optimization problem. Here we call Algorithm 1 via the function call AHKFeasibility(constraints + objective (A_0), α*), whose input arguments are the constraints A_j • X ≥ b_j and the objective expressed as the constraint A_0 • X ≥ α*, where α* = (ub + lb)/2. This gives a binary search technique for minimizing a given optimization problem using the decision making AHK SDP solver.
AHKOptimizer(A_j for j = 0, 1, 2, ..., m, lb, ub)
    if ub − lb ≤ ε then
        return lb as the ε-approximate optimum and stop
    end
    α* = (ub + lb)/2
    AHKFeasibility(A_j for j = 0, 1, 2, ..., m, α*)
    if A_j • X ≥ b_j for all j = 0, 1, 2, ..., m, where X is obtained from Algorithm 1, then
        ub = α*
    else
        lb = α*
    end
    AHKOptimizer(A_j for j = 0, 1, 2, ..., m, lb, ub)

Algorithm 2: Converting a feasibility engine (the decision making AHK SDP solver) into an optimizer.
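Stripped of the SDP machinery, Algorithm 2 is plain bisection on the guessed optimum α*. The sketch below makes that explicit; `feasible_with_bound` stands in for a call to the feasibility engine (Algorithm 1) on the m constraints plus the objective constraint A_0 • X ≥ α*, and is an assumed interface, not code from this report.

```python
def ahk_optimizer(feasible_with_bound, lb, ub, eps):
    """Bisection sketch of Algorithm 2.

    feasible_with_bound(alpha) -> bool should run the feasibility engine
    on the constraints plus the objective constraint A0 . X >= alpha.
    Returns lb, an eps-approximation of the optimum.
    """
    while ub - lb > eps:
        alpha = (lb + ub) / 2.0
        if feasible_with_bound(alpha):
            ub = alpha    # feasible: move ub down to alpha
        else:
            lb = alpha    # infeasible: move lb up to alpha
    return lb
```

For a toy predicate that is feasible exactly when α ≥ 0.7, the bisection brackets 0.7 to within eps.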
1.6 Experimental results
Figure 1.1 shows the CPU time versus n (the problem size, i.e. the size of the constraint matrices) for the AHK algorithm. For a given number of constraints m, an objective and its size n, the CPU time is the time required to obtain the minimum optimal value of the objective function of the given SDP.
1.6.1 AHK versus SeDuMi
We have performed an experimental analysis comparing the performance of our implementation against SeDuMi in MATLAB. SeDuMi (Self-Dual-Minimization) is an add-on for MATLAB which lets us solve optimization problems with linear, quadratic and semidefiniteness constraints. It implements the self-dual embedding technique for optimization over self-dual homogeneous cones. The self-dual embedding technique essentially makes it possible to solve certain optimization problems in a single phase, leading either to an optimal solution or to a certificate of infeasibility. Complex-valued data and variables can also be input to SeDuMi. Large scale optimization problems are solved efficiently by exploiting sparsity.

Figure 1.1: CPU time (seconds) versus n (problem size) of the AHK algorithm, with ε = 0.01 and m = 100, 200, 300 constraints.
Primal

    min  bᵀx
    subject to  Σ_{j=1}^{m} A_j x_j − A_0 = X
                X ⪰ 0                                           (1.10)

Dual

    max  A_0 • X
    subject to  A_j • X = b_j,  j = 1, 2, ..., m
                X ⪰ 0                                           (1.11)

where the vector x = [x_1, x_2, ..., x_m]ᵀ, the A_j are symmetric matrices and X is PSD (positive semidefinite). Since SeDuMi is one of the best known SDP solvers, we have checked the performance of the AHK SDP solver against the SeDuMi SDP solver. Figure 1.2 shows the CPU time tradeoff of the AHK algorithm versus SeDuMi.
Figure 1.2: CPU time (seconds) versus n (problem size) of the AHK algorithm (m = 100, 200, 300) and of SeDuMi (m = 100), with ε = 0.1.
1.7 Conclusions
We have designed and implemented multiplicative-weights update algorithms for solving SDPs approximately. In our computation, each iteration is an approximate eigenvector computation. In every iteration of an interior point algorithm, by contrast, one needs to compute the Cholesky decomposition of a positive semidefinite matrix. This computation takes O(n³) time, whereas the top eigenvector of a matrix can be computed much more efficiently. This is where our implementation gains an edge over interior point methods.
A second advantage of our implementation is that the Cholesky decomposition of the final solution comes for free, because the solution is a convex combination of many rank-1 matrices. Since the first step of rounding in approximation algorithms relying on SDPs is to compute the Cholesky decomposition of the optimal solution, this step is free of cost in our implementation.
Chapter 2
Solving Optimization problems using
AHK optimizer
2.1 Introduction
In the previous chapter we gave an algorithm to check the feasibility of a given SDP problem. We call it a feasibility engine, or decision making algorithm, since it only decides whether a given SDP problem is feasible or not. We then gave a technique to convert this decision making algorithm, or feasibility engine, into an optimizer.
In this chapter we use this technique to solve some of the SDP problems listed in SDPLIB [4], a library of well-designed SDP problems. This library contains equality-constrained SDPs, but our optimizer solves inequality-constrained SDPs. To resolve this issue we relax the equality-constrained SDPs to inequality-constrained SDPs. Due to the relaxation we get relaxed optima (lower and upper bounds on the minimum), since inequality constraints give more freedom to search for the optimum solution than equality constraints do (the feasible region increases). We solved some selected SDPs listed in SDPLIB. Next we give the technique to relax these problems, and finally we present our experimental results.
2.2 Techniques to solve relaxed SDPs
The decision making algorithm AHK SDP solver is a deterministic algorithm when considered theoretically, leaving numerical stability and other issues aside. The parameters under our control are R (with Trace(X) ≤ R) and β. We choose the first control parameter R sufficiently large, and the second control parameter β small enough for fast convergence. We defined R and β in section 1.2 of chapter 1.
We take R ≥ n, where n is the size of the constraint matrices, and β = min(0.01, ε/2). We have also fixed a cap on the number of iterations to be executed: the maximum number of iterations before timeout is max(1000, 16 · (log n)/ε²). The only other case in which we report an error is when C becomes negative semi-definite during the iterations (C is defined in the lemmas of section 1.3.1 of chapter 1). The effective optimal values written in the tables represent the effective lower and upper bounds on the optimal value (the minimum) that can be obtained for each ε-optimal and δ-feasible run.
SDPLIB SDPs: the primal-dual pair P and D has the form:

SDP'

    P:  min  A_0 • X
        subject to  A_j • X = b_j,  j = 1, 2, ..., m
                    X ⪰ 0

    D:  max  Σ_{j=1}^{m} b_j y_j
        subject to  Σ_{j=1}^{m} A_j y_j + Z = A_0
                    Z ⪰ 0                                       (2.1)

AHK SDPs: the primal-only SDP is

    min  A_0 • X
    subject to  A_j • X ≥ b_j,  j = 1, 2, ..., m
                X ⪰ 0                                           (2.2)
Relaxation of SDPLIB SDPs to AHK SDPs: we can write the constraints of the SDPLIB SDPs as A_j • X ≥ b_j − ε and A_j • X ≤ b_j + ε for j = 1, 2, ..., m and ε > 0. This can be formulated as

    min  A_0 • X
    subject to  A_j • X ≥ b_j − ε
                −A_j • X ≥ −b_j − ε,  for j = 1, 2, ..., m and ε > 0
                X ⪰ 0
                Tr(X) ≤ R.                                      (2.3)

To get lower and upper bounds on the optimum value with the AHK optimizer, we add one more constraint to AHK: the objective constraint matrix A_0 with a guessed optimum value α*. Thus the (2m + 1)th constraint is A_0 • X ≥ α* ± δ, where α* is the guessed optimal (here minimal) objective value of the given SDP. The AHK optimizer therefore solves 2m + 1 constraints, correctly satisfies the (2m + 1)th constraint with α*, and gives lower and upper bounds on the optimal objective value of the given SDP.
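The relaxation in (2.3) is mechanical: each equality constraint becomes a pair of inequalities. A sketch, with the array shapes as our own assumptions:

```python
import numpy as np

def relax_equalities(A, b, eps):
    """Relax A_j . X = b_j into A_j . X >= b_j - eps and
    -A_j . X >= -b_j - eps, as in (2.3).

    A : (m, n, n) array of constraint matrices; b : (m,) right-hand sides.
    Returns the 2m relaxed constraint matrices and right-hand sides.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    A_rel = np.concatenate([A, -A], axis=0)          # [A_1..A_m, -A_1..-A_m]
    b_rel = np.concatenate([b - eps, -b - eps])      # [b - eps, -b - eps]
    return A_rel, b_rel
```

Any X with A_j • X = b_j satisfies both halves of each pair, and any X satisfying both halves has |A_j • X − b_j| ≤ ε, so the relaxed feasible region contains the original one.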
2.3 Experimental results
Result 2.3.1. For the given optimization problems (m being the number of constraint matrices, each of size n × n), the tables present the effective lower and upper bounds on the optimal objective value, together with the number of iterations and the CPU time needed to obtain these bounds (the optimization problems are taken from SDPLIB 1.2).
Result 2.3.2. Here m is the number of constraint matrices, X is an n × n positive semidefinite matrix, and the inequalities are satisfied up to an additive error ε. Each iteration selects the eigenvector V corresponding to the largest eigenvalue λ of the new weighted cumulative constraint matrix (C, defined in chapter 1) by the Lanczos method and adds V Vᵀ to the current solution.
Problem   m      n     Optimal   ε      R    δ      α*      CPU Time (secs)   Iterations
theta1    104    50    23        0.1    1    0.1    0.40    0.076             1
                                 0.01   1    0.01   10.91   0.4219            7
theta2    498    100   32.879    0.1    1    0.1    0.40    0.7               1
                                 0.01   1    0.01   15.44   3.332             7
theta3    1106   150   42.17     0.1    1    0.1    0.40    2.17              1
                                 0.01   1    0.01   15.9    12.89             9

Table 2.1: SDP lower bound for Lovasz ϑ functions
Problem   m      n     Optimal   ε      R    δ      α*      CPU Time (secs)   Iterations
theta1    104    50    23        0.1    1    0.1    25.4    0.0375            1
                                 0.01   10   0.01   25.4    0.121             1
theta2    498    100   32.879    0.1    1    0.1    50.2    0.76              1
                                 0.01   1    0.01   40      0.39              1
theta3    1106   150   42.17     0.1    1    0.1    75.1    1.64              1
                                 0.01   1    0.01   60      1.56              1

Table 2.2: SDP upper bound for Lovasz ϑ functions
Result 2.3.3. The effective lower bound must be lower than the optimal value, as the AHK optimizer has the freedom to violate the linear dot-product constraints by ε. This is because we are solving relaxed SDPs, and hence there is a larger feasible region in which to search for the optimum solution.

Result 2.3.4. The effective lower bound must increase as ε decreases, since the minimization problem has greater freedom with larger ε. This is because the inequality constraints converge towards equality constraints as ε decreases.

Result 2.3.5. The effective upper bound must be greater than the optimal value, as the AHK optimizer has the freedom to violate the linear dot-product constraints by ε. Again, this is because we are solving relaxed SDPs, and hence there is a larger feasible region in which to search for the optimum solution.

Result 2.3.6. The effective upper bound must decrease as ε decreases, since the minimization problem has less freedom with smaller ε. This is again because the inequality constraints converge towards equality constraints as ε decreases.
2.4 Conclusions and future work
In this chapter we have presented our experimental results on solving relaxed SDPs for Lovasz ϑ functions, and shown how we approach the optimum. Our results are good in quality and efficient compared to existing solvers. At present our implementation runs sequentially, which may not be adequate for considerably larger combinatorial optimization problems. In future work I intend to implement these multiplicative weights update algorithms in a distributed setup, which should give faster results than the existing implementation while retaining the same quality. There has been considerable recent research building on this work, developing fast and parallel algorithms for approximate solutions to packing-covering linear as well as semidefinite programs. This research direction seems promising and can have a reasonable payoff, and hence it should not be ignored.
Chapter 3
The Multiplicative - Weights Update
Techniques: Survey
3.1 Introduction
The multiplicative - weights method is an idea which has been repeatedly discovered in the fields of machine learning, optimization and game theory.
The idea of multiplicative - weights update technique can be understood in the follow-
ing setting. A decision maker has a choice of n decisions, and needs to repeatedly make
a decision and obtain an associated payoff. The decision maker’s goal, in the long run,
is to achieve a total payoff which is comparable to the payoff of that fixed decision that
maximizes the total payoff with the benefit of hindsight. While this best decision may
not be known a priori, it is still possible to achieve this goal by maintaining weights on
the decisions, and choosing the decisions randomly with probability proportional to the
weights. In each successive round, the weights are updated by multiplying them with
factors which depend on the payoff of the associated decision in that round.
Intuitively, this scheme works because it tends to focus higher weight on higher payoff
decisions in the long run.
3.2 The Weighted Majority Algorithm
The weighted majority algorithm also known as the prediction from expert advice problem.
Let us assume the process of picking good times to invest in a stock. Assume there is
a single stock of interest. Assume its daily price movement is modeled as a sequence of
binary events: 1 : price goes up
0 : price goes down
(3.1)
Each morning we try to predict whether the price will go up or down that day:if our prediction happens to be wrong then we lose a dollar that day
and if it is correct then we lose nothing
(3.2)
The stock movements can be arbitrary and even adversarial. In order to balance out this
assumption, we assume that while making our predictions we are allowed to watch the
predictions of n ”experts.” These experts could be arbitrarily correlated, and they may
or may not know what they are talking about. The Weighted Majority Algorithm’s goal
is to limit its cumulative losses, that is the bad predictions, to roughly the same as the
best of these experts.
The first and trivial technique is to compute each day's prediction by going with the majority opinion among the experts that day. But this technique does not work, because a majority of experts may be consistently wrong on every single day.
The weighted majority algorithm corrects the trivial algorithm. It maintains a weight-
ing of the experts. Initially all experts have equal weight. As time goes on, some experts
are seen as making better predictions than others, and the weighted majority algorithm
increases their weight proportionately. The weighted majority algorithm’s prediction of
up and down for each day is computed by going with the opinion of the weighted majority
of the experts for that day. Now consider the following algorithm 3, which is called the
weighted majority algorithm.
Initialization: Fix an ε ≤ 1/2. With each expert i, associate the weight w_i^(1) := 1.
for t = 1, 2, ..., T do
    1. Make the prediction that is the weighted majority of the experts' predictions, based on the weights w_1^(t), ..., w_n^(t). That is, predict "up" or "down" depending on which prediction has the higher total weight of experts advising it (break ties arbitrarily).
    2. For every expert i who predicts wrongly, decrease his weight for the next round by multiplying it by a factor of (1 − ε):

        w_i^(t+1) = (1 − ε) w_i^(t)  (update rule).  (3.3)

end
Algorithm 3: Weighted Majority Algorithm
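Although the report's implementations are in MATLAB, a minimal Python sketch of Algorithm 3 looks like this; the input arrays are hypothetical stand-ins for expert predictions and observed price movements.

```python
import numpy as np

def weighted_majority(expert_predictions, outcomes, eps=0.5):
    """Weighted Majority Algorithm (Algorithm 3), a sketch.

    expert_predictions: T x n array of 0/1 predictions.
    outcomes: length-T array of 0/1 true events.
    Returns (mistakes of the algorithm, mistakes of each expert).
    """
    T, n = expert_predictions.shape
    w = np.ones(n)                       # w_i^(1) = 1 for every expert
    alg_mistakes = 0
    expert_mistakes = np.zeros(n)
    for t in range(T):
        preds = expert_predictions[t]
        up = w[preds == 1].sum()         # total weight advising "up"
        down = w[preds == 0].sum()       # total weight advising "down"
        guess = 1 if up >= down else 0   # weighted majority (ties -> "up")
        if guess != outcomes[t]:
            alg_mistakes += 1
        wrong = preds != outcomes[t]
        expert_mistakes += wrong
        w[wrong] *= (1 - eps)            # multiplicative penalty (rule 3.3)
    return alg_mistakes, expert_mistakes
```

Running this with one perfect expert among random ones lets the mistake bound of Theorem 3.2.1 below be checked empirically.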
The following theorem 3.2.1 bounds the number of mistakes of the weighted majority algorithm 3 in terms of the mistakes of the best expert.

Theorem 3.2.1. (Theorem 1.1, Arora et al. [3])
After T steps, let m_i^(T) be the number of mistakes of expert i and M^(T) be the number of mistakes the weighted majority algorithm has made. Then we have the following bound for every i:

    M^(T) ≤ 2(1 + ε) m_i^(T) + (2 ln n)/ε.

In particular, this holds for the best expert i, i.e. the one having the least m_i^(T).

An important thing to note is that the weighted majority algorithm makes no assumptions about the sequence of events: they could be arbitrarily correlated and could even depend upon our current weighting of the experts.
3.3 The Multiplicative - Weights Update Techniques
Consider a more general setting. We still have n experts, but the set of events or outcomes need not be binary and could even be infinite. To model this setting, we dispense with the notion of predictions altogether, and instead suppose that in each round every expert recommends a course of action, and our task is to pick an expert and use his advice. Essentially we have a set of n decisions, and in each round we are required to select one decision from the set. In each round each decision incurs a certain cost, which is determined by nature or an adversary. All the costs are revealed after we choose our decision, and we incur the cost of the decision that we chose.

For example, in the prediction from expert advice problem, each decision corresponds to a choice of an expert, and the cost of an expert is 1 if the expert makes a mistake, and 0 otherwise.
To understand the multiplicative - weights (MW) technique, first consider the naive strategy: in each iteration, simply pick a decision at random. The expected penalty will be that of the average decision. Suppose now that a few decisions are clearly better in the long run. This is easy to spot as the costs are revealed over time, and so it is sensible to reward those decisions by increasing their probability of being picked in the next round; multiplying their weights by a factor is what gives the multiplicative - weights update technique its name.

Initially, being in complete ignorance about the decisions, we select them uniformly at random. This maximum entropy starting rule reflects our ignorance. As we learn which decisions are good and which are bad, we lower the entropy to reflect our increased knowledge. The multiplicative weights update technique is our means of skewing the distribution.
Let t = 1, 2, ..., T denote the current round, and let i be a generic decision. In each round t, we select a distribution p^(t) over the set of decisions, and sample a decision i from it. At this point, the costs of all the decisions are revealed by nature in the form of a vector m^(t) such that decision i incurs cost m_i^(t). Assume that the costs lie in the range [−1, 1]. This is the only assumption we make on the costs; nature is completely free to choose the cost vector as long as these bounds are respected, even with full knowledge of the distribution from which we choose our decision.

The expected cost to the algorithm for sampling a decision i from the distribution p^(t) is

    E_{i∼p^(t)}[m_i^(t)] = m^(t) · p^(t).

The total expected cost over all rounds is therefore

    ∑_{t=1}^{T} m^(t) · p^(t).

Just as before, we want to achieve a total expected cost not much more than the cost of the best decision in hindsight, i.e.

    min_i ∑_{t=1}^{T} m_i^(t).
Now consider the following algorithm 4, which is called the multiplicative - weights update
algorithm.
Initialization: Fix an ε ≤ 1/2. With each decision i, associate the weight w_i^(1) := 1.
for t = 1, 2, ..., T do
    1. Choose decision i with probability proportional to its weight w_i^(t), i.e. use the distribution over decisions p^(t) = {w_1^(t)/Φ^(t), ..., w_n^(t)/Φ^(t)}, where Φ^(t) = ∑_i w_i^(t).
    2. Observe the costs of the decisions m^(t).
    3. Penalize the costly decisions by updating their weights as follows: for every decision i, set

        w_i^(t+1) = w_i^(t) (1 − ε m_i^(t))  (update rule).  (3.4)

end
Algorithm 4: Multiplicative - Weights Update Algorithm
The following theorem 3.3.1 bounds the total expected cost of the multiplicative -
weights update algorithm 4 in terms of the total cost of the best decision:
Theorem 3.3.1. (Theorem 2.1, Arora et al. [3])
Assume that all costs m_i^(t) ∈ [−1, 1] and ε ≤ 1/2. Then the multiplicative - weights update algorithm guarantees that after T rounds, for any decision i, we have

    ∑_{t=1}^{T} m^(t) · p^(t) ≤ ∑_{t=1}^{T} m_i^(t) + ε ∑_{t=1}^{T} |m_i^(t)| + (ln n)/ε.  (3.5)
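As an illustrative sketch (Python rather than the report's MATLAB), Algorithm 4 and the guarantee of Theorem 3.3.1 can be exercised on random costs; the cost matrix below is an arbitrary stand-in.

```python
import numpy as np

def mw_update(costs, eps=0.1):
    """Multiplicative - Weights Update (Algorithm 4), a sketch.

    costs: T x n array of cost vectors m^(t) with entries in [-1, 1].
    Returns the total expected cost  sum_t m^(t) . p^(t).
    """
    T, n = costs.shape
    w = np.ones(n)                   # w_i^(1) = 1
    total = 0.0
    for t in range(T):
        p = w / w.sum()              # p^(t) = w^(t) / Phi^(t)
        total += costs[t] @ p        # expected cost this round
        w *= 1.0 - eps * costs[t]    # update rule (3.4)
    return total

# Theorem 3.3.1 bounds the total expected cost against every decision i:
rng = np.random.default_rng(2)
T, n, eps = 300, 8, 0.1
costs = rng.uniform(-1.0, 1.0, (T, n))
total = mw_update(costs, eps)
```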
3.4 Conclusions
In this chapter we have presented a survey of the multiplicative - weights update techniques, first proposed in game theory in the early 1950s. We first presented the weighted majority algorithm, followed by the multiplicative - weights algorithm. These algorithms are mostly used in machine learning, data streaming and online problems. They increase their knowledge about the input after every round; based on the present knowledge, they update the weights and then construct a strategy such that cost is minimized and profit is maximized.
3.5 Remarks
We have not given proofs of Theorems 3.2.1 and 3.3.1 for the weighted majority algorithm and the multiplicative - weights algorithm. Proofs of these two theorems are available in the paper of Arora et al. [3].
Chapter 4
Data Streaming and Online
Algorithm: Matrix Multiplicative -
Weights Update technique
4.1 Introduction
The multiplicative - weights update technique is used in areas such as combinatorial optimization, game theory and machine learning. For example, it can be used to solve constrained optimization problems, data streaming and online problems, 2-player zero-sum games, etc.

Let us consider a constrained optimization problem. Let a decision represent each constraint in the problem, with costs specified by the points in the domain of interest. For a given point, the cost of a decision is made proportional to how well the corresponding constraint is satisfied at the point.
As for the weights update mechanism, we reduce a decision's weight depending on its penalty: if a constraint is well satisfied at the points, we want its weight to be smaller, so that the algorithm focuses on constraints that are poorly satisfied.
Overall, the choice of points is also under our control (though not in all applications). We need to generate the maximally adversarial point, i.e. the point that maximizes the expected cost. In order to apply the multiplicative - weights update technique to constrained optimization problems, we need to go through the following two phases:

1. An oracle for generating the maximally adversarial point at each step, and

2. The multiplicative - weights update technique for updating the weights of the decisions.
4.2 Data Streaming and Online Problems
Online algorithms are a theoretical framework for studying problems in interactive computing. They model, in particular, the fact that the input to an interactive system does not arrive as a batch but as a sequence of input portions, and that the system must react in response to each incoming portion. They take into account that at any point in time future input is unknown. Online algorithms consider the algorithmic aspects of interactive systems: the problem is to design strategies that always compute good output and keep a given system in a good state. One makes no assumptions about the input streams; the input can even be generated by an adversary that creates new input portions based on the system's reaction to previous ones.
In an online decision problem, one has to make a sequence of decisions without knowledge of the future. One version of this problem is the case of n experts corresponding to decisions. In each period we pick one expert and then observe the cost ∈ [0, 1] of each expert; our cost is that of the chosen expert. Our basic goal is to ensure that the total cost is not much larger than the minimum total cost of any expert. This is a version of the predicting from expert advice problem. Exponential weighting schemes for this problem have been discovered and rediscovered over many years. Here we use the matrix multiplicative - weights update technique to solve a data streaming online problem, and give results based on experimental analysis.
4.2.1 2-Player Zero-Sum Game
The strategic form, or normal form, of a two-player zero-sum game is given by a triplet (X, Y, A), where
1. X is a nonempty set, the set of strategies of Player I
2. Y is a nonempty set, the set of strategies of Player II
3. A is a real-valued function defined on X × Y .
Thus A(x, y) is a real number for every x ∈ X and every y ∈ Y .
A finite 2-player zero-sum game in strategic form, (X, Y, A), is also called a matrix game, because the payoff function A can be represented by a matrix. If X = {x_1, ..., x_m} and Y = {y_1, ..., y_n}, then by the game matrix or payoff matrix we understand the matrix

    A = [ a_11 ··· a_1n
           ⋮         ⋮
          a_m1 ··· a_mn ],  where a_ij = A(x_i, y_j).

In this form, Player I chooses a row, Player II chooses a column, and II pays I the entry in the chosen row and column. Here the entries of the matrix are the winnings of the row chooser and the losses of the column chooser.
A mixed strategy for Player I may be represented by an m-tuple p = (p_1, p_2, ..., p_m)^T of probabilities that add to 1. If I uses the mixed strategy p and II chooses column j, then the average payoff to I is

    ∑_{i=1}^{m} p_i a_ij.  (4.1)

In other words, the expected payoff when the column player chooses a strategy j is

    A(p, j) := E_{i∼p}[A(i, j)].  (4.2)

Here, the best response for the column player is the strategy j which maximizes this payoff. Similarly, a mixed strategy for Player II is an n-tuple q = (q_1, q_2, ..., q_n)^T. If II uses q and I uses row i, then the payoff to I is

    ∑_{j=1}^{n} a_ij q_j.  (4.3)
In other words, the expected payoff when the row player chooses the strategy i is

    A(i, q) := E_{j∼q}[A(i, j)].  (4.4)

Here, the best response for the row player is the strategy i which minimizes this payoff. In general, if I uses the mixed strategy p and II uses the mixed strategy q, the average payoff to I is

    p^T A q = ∑_{i=1}^{m} ∑_{j=1}^{n} p_i a_ij q_j.  (4.5)
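The payoff expressions (4.2)-(4.5) are plain matrix-vector products. A small Python sketch, where the "matching pennies" matrix and the uniform strategies are illustrative assumptions:

```python
import numpy as np

# Payoff matrix: A[i, j] is the payoff associated with row i, column j.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])             # matching pennies

p = np.array([0.5, 0.5])                # mixed strategy of Player I
q = np.array([0.5, 0.5])                # mixed strategy of Player II

payoff = p @ A @ q                      # average payoff to I (eq. 4.5)
A_p = p @ A                             # A(p, j) for every column j (eq. 4.2)
A_q = A @ q                             # A(i, q) for every row i  (eq. 4.4)
best_col = np.argmax(A_p)               # column player's best response
best_row = np.argmin(A_q)               # row player's best response
```

For uniform strategies on matching pennies every payoff expression evaluates to zero, the game's value.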
4.2.2 Matrix Multiplicative - Weights Update Algorithm
Let A be a symmetric matrix, and let λ_1(A) ≥ λ_2(A) ≥ ... ≥ λ_n(A) denote the eigenvalues of A. Consider a matrix generalization of the usual 2-player zero-sum game. The first player chooses a unit vector v ∈ S^{n−1}. The second player chooses a matrix M such that 0 ⪯ M ⪯ I. The first player has to pay the second player

    v^ᵀ M v = M • v v^ᵀ.  (4.6)

For n × n matrices A and B, A • B is their inner product (A • B = ∑_{ij} A_ij B_ij), treating them as vectors in R^{n²}. If the first player chooses his vector from a distribution D over S^{n−1}, then the expected loss of the first player is

    E_D[v^ᵀ M v] = M • E_D[v v^ᵀ].  (4.7)

The matrix P = E_D[v v^ᵀ] is a density matrix: it is positive semidefinite and has trace 1 (density matrices appear, for example, in quantum computation). Let us consider an online version of this game. The first player has to react to an external adversary who picks a matrix M at each step; this is called an observed event. An online algorithm for the first player chooses a density matrix P^(t) and observes the event matrix M^(t) in each round t = 1, 2, ..., T. After T rounds, the best fixed vector for the first player in hindsight is the unit vector v which minimizes the total loss

    ∑_{t=1}^{T} v^ᵀ M^(t) v.  (4.8)

This is minimized when v is the unit eigenvector of ∑_{t=1}^{T} M^(t) corresponding to the smallest eigenvalue.
Our next goal is to design and implement the matrix multiplicative - weights update algorithm, whose total expected loss over the T rounds is not much more than the minimum loss λ_n(∑_{t=1}^{T} M^(t)). Algorithm 5 below, the matrix multiplicative - weights algorithm, uses the matrix multiplicative - weights update technique to update the weights in every round.
Initialize: Fix an ε < 1/2, and let ε′ = −ln(1 − ε).
for t = 1, 2, ... do
    Compute W^(t) = (1 − ε)^{∑_{i=1}^{t−1} M^(i)} = exp(−ε′ ∑_{i=1}^{t−1} M^(i)).
    Use the density matrix P^(t) = W^(t) / Tr(W^(t)) and observe the event M^(t).
end
Algorithm 5: Matrix Multiplicative Weights Update Algorithm
Theorem 4.2.1. (Theorem 10, Arora et al. [2])
The matrix multiplicative weights update algorithm generates density matrices P^(1), P^(2), ..., P^(T) such that

    ∑_{t=1}^{T} M^(t) • P^(t) ≤ (1 + ε) λ_n(∑_{t=1}^{T} M^(t)) + (ln n)/ε.
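A Python sketch of Algorithm 5 (the report's code is in MATLAB). Since every matrix involved is symmetric, the matrix exponential is computed via eigendecomposition, and the bound of Theorem 4.2.1 can then be checked on randomly generated events, which are stand-ins rather than data from the report.

```python
import numpy as np

def sym_expm(A):
    """exp(A) for a symmetric matrix A, via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.exp(vals)) @ vecs.T

def matrix_mw(events, eps=0.1):
    """Matrix Multiplicative Weights Update (Algorithm 5), a sketch.

    events: list of symmetric matrices M^(t) with 0 <= M^(t) <= I.
    Returns (total loss  sum_t M^(t) . P^(t),  sum of all events).
    """
    n = events[0].shape[0]
    eps_p = -np.log(1.0 - eps)           # eps' = -ln(1 - eps)
    S = np.zeros((n, n))                 # sum of past events M^(1..t-1)
    total = 0.0
    for M in events:
        W = sym_expm(-eps_p * S)         # W^(t) = (1 - eps)^S
        P = W / np.trace(W)              # density matrix P^(t)
        total += np.sum(M * P)           # loss M^(t) . P^(t)
        S += M
    return total, S

# Random PSD events scaled so that the largest eigenvalue is 1:
rng = np.random.default_rng(3)
n, T, eps = 5, 50, 0.1
events = []
for _ in range(T):
    B = rng.standard_normal((n, n))
    B = B @ B.T                          # PSD
    events.append(B / np.linalg.eigvalsh(B)[-1])
total, S = matrix_mw(events, eps)
```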
4.2.3 Data Streaming and Online Algorithm
We assume the same setup as above for the matrix multiplicative - weights update algorithm. Our goal is now to design and implement the data streaming and online algorithm, whose total expected loss over the T rounds is not much more than the minimum loss λ_n(∑_{t=1}^{T} M^(t)). Algorithm 6 below, which we call the data streaming and online algorithm (an extended version of the matrix multiplicative - weights algorithm), uses the matrix multiplicative - weights update technique to update the weights in every round. Here our goal also includes calculating the minimum payoff, total payoff, upper bound on payoff, worst payoff and random payoff, and analysing these payoffs, which is done in the next section. Let T be the maximum number of observations. We introduce a new matrix N^(T) in the algorithm, defined as follows:

    N^(T) = ∑_{t=1}^{T} M^(t).  (4.9)

The actual total payoff is calculated as follows:

    payoff_total^(t) = payoff_total^(t−1) + P^(t) • M^(t).  (4.10)

We then calculate the largest eigenvector V_1 and the smallest eigenvector V_n, corresponding to the largest and smallest eigenvalues of N^(T) respectively. We calculate the minimum payoff as follows:

    payoff_min = V_n^ᵀ N^(T) V_n.  (4.11)

We calculate the worst payoff as follows:

    payoff_worst = V_1^ᵀ N^(T) V_1.  (4.12)

We calculate the upper bound on the minimum payoff as follows:

    payoff_ubound = payoff_min (1 + ε) + (log n)/ε.  (4.13)

Finally we calculate the random payoff as follows: fix k, draw a vector of normally distributed pseudorandom numbers of length n, normalize it to r = r/||r||, and average over k draws:

    payoff_random = (1/k) ∑_{i=1}^{k} r^ᵀ N^(T) r.  (4.14)
Input: event matrices M^(t) of size n × n and the number of observations
Result: total payoff
Initialize: Fix an ε < 1/2, and let ε′ = −ln(1 − ε), N^(0) = 0 (an n × n matrix), payoff_total^(0) = 0, δ ≥ 0.
for t = 1, 2, ... do
    Create a PSD matrix M^(t) that satisfies 0 ⪯ M^(t) ⪯ I:
        Compute the largest eigenvector V^(t), corresponding to the largest eigenvalue d^(t) of M^(t).
        Compute d^(t) = d^(t) + δ, i.e. slightly greater than the largest eigenvalue of M^(t).
        Compute M^(t) = M^(t) / d^(t).
    Row player calculates: N^(t) = N^(t−1) + M^(t).
    if we have already seen the matrix M^(t) then
        compute W^(t) = (1 − ε)^{N^(t)} = exp(−ε′ N^(t))
    end
    if we are not clairvoyant, and are developing an online algorithm that does not know future inputs, then
        compute W^(t) = (1 − ε)^{N^(t−1)} = exp(−ε′ N^(t−1))
    end
    Compute P^(t) = W^(t) / Tr(W^(t)).
    Compute payoff_total^(t) = payoff_total^(t−1) + P^(t) • M^(t).
end
Algorithm 6: Data Streaming and Online Algorithm
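A Python sketch of the non-clairvoyant branch of Algorithm 6, together with the payoffs of equations (4.9)-(4.13); the report's implementation is in MATLAB, and the random events used here are stand-ins.

```python
import numpy as np

def sym_expm(A):
    """exp(A) for a symmetric matrix A, via eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.exp(vals)) @ vecs.T

def streaming_payoffs(events, eps=0.1, delta=1e-6):
    """Data streaming and online algorithm (Algorithm 6), a sketch.

    Each event is normalized by (largest eigenvalue + delta) so that
    0 <= M^(t) <= I; the density matrix P^(t) is built from past events
    only (the online, non-clairvoyant branch).
    """
    n = events[0].shape[0]
    eps_p = -np.log(1.0 - eps)
    N = np.zeros((n, n))                 # N^(t): sum of normalized events
    total = 0.0                          # payoff_total, eq. (4.10)
    for M in events:
        M = M / (np.linalg.eigvalsh(M)[-1] + delta)
        W = sym_expm(-eps_p * N)         # uses N^(t-1): no future knowledge
        P = W / np.trace(W)
        total += np.sum(M * P)
        N += M
    vals, vecs = np.linalg.eigh(N)
    v_min, v_max = vecs[:, 0], vecs[:, -1]
    p_min = v_min @ N @ v_min            # eq. (4.11), best fixed vector
    p_worst = v_max @ N @ v_max          # eq. (4.12)
    p_ubound = p_min * (1 + eps) + np.log(n) / eps   # eq. (4.13)
    return total, p_min, p_worst, p_ubound

rng = np.random.default_rng(4)
events = [(lambda B: B @ B.T)(rng.standard_normal((6, 6))) for _ in range(40)]
total, p_min, p_worst, p_ubound = streaming_payoffs(events)
```

Only N and the running payoff are stored between rounds, which is the O(n²) space observation made in Result 4.3.3 below.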
4.3 Experiments and Analysis
4.3.1 Problem
You have to provide a density matrix P, a positive semidefinite matrix with trace 1, at every unit of time. After you provide the matrix, your strategy is evaluated for the current set of observations. At every unit of time, the environment gives you a positive semidefinite matrix M such that 0 ⪯ M ⪯ I, and you incur a cost P • M. Your aim is to minimize the cost when the game is repeated T times. T is hopefully large, say a hundred at least.
4.3.2 Experimental Results
Result 4.3.1. As T becomes large, the matrix exponential gives a strategy density that is quite reasonable.

• For 100 × 100 matrices and T = 100, we pay a 60% premium over the [clairvoyant] optimal, which drops to a 10% premium for T = 1000.

• A random density matrix, on the other hand, will cost at least a factor of, say, four.

The following figures present total payoff vs. number of rounds, worst payoff vs. number of rounds, upper bound vs. number of rounds, random payoff vs. number of rounds, and random payoff divided by total payoff vs. number of rounds.
[Two plots: total payoff vs. number of rounds T, and worst payoff vs. number of rounds T, each for (n = 100, ε = 0.001), (n = 200, ε = 0.01) and (n = 300, ε = 0.1).]

Figure 4.1: Total payoff and worst payoff Vs. number of rounds
[Three plots: upper bound vs. number of rounds T, random payoff vs. number of rounds T, and random payoff div. total payoff vs. number of rounds T, each for (n = 100, ε = 0.001), (n = 200, ε = 0.01) and (n = 300, ε = 0.1).]

Figure 4.2: upperbound, random payoff and random payoff Div. total payoff Vs. number of rounds
Result 4.3.2. The result is interesting, as the largest-eigenvector strategy, which incurs the maximum payoff (worst-case algorithm analysis), grows at least linearly in the matrix dimension and could cost a factor of, say, 100 more. The worst strategy can thus be very expensive, so on average we are doing very well.

Result 4.3.3. This can be seen as a data streaming online problem, as at every instant we only store O(n²) data (which does not depend on T).
Definition 4.3.4. An online algorithm is a 2-player zero-sum game between algorithm and adversary. It is well known that optimal strategies require randomization.
Definition 4.3.5. The areas of online algorithms and machine learning are both concerned with problems of making decisions about the present based only on knowledge of the past; hence we see online algorithms in machine learning.

Definition 4.3.6. Consider a problem where the goal is to predict whether or not to invest in a stock that day. The input is the advice of n "experts": each day, each expert predicts yes or no. The learning algorithm must use this information to make its own prediction, making no assumptions about the quality or independence of the experts. This is called predicting from expert advice.

Definition 4.3.7. An algorithm used to solve a problem where we only have a limited amount of storage is called a streaming algorithm.
4.3.3 Experimental Analysis
Under competitive analysis (worst case), for any input, the cost of our online algorithm is never worse than c (a constant) times the cost of the optimal offline algorithm; if we knew the whole input in advance it would be easy to optimize. Pros: it can make very robust statements about the performance of a strategy. Cons: its results tend to be pessimistic.

Under probabilistic analysis, we assume a distribution generating the input and design an algorithm which minimizes its expected cost. Pros: it can incorporate information predicting the future. Cons: it can be difficult to determine probability distributions accurately.
Figure 4.3 plots the total actual payoff against the size of the PSD matrices for three fixed error accuracies ε, with T = 500 rounds. For matrices A of size n × n, n = 100, 200, ..., 1000, the three fixed error accuracies ε = 0.05, 0.1, 0.2, and T = 500 rounds, the total actual payoff remains between 1.12 and 1.23.
[Plot: total payoff vs. size n of PSD matrices, for ε = 0.05, 0.1, 0.2.]

Figure 4.3: Total actual payoff Vs. size of PSD matrices
Figure 4.4 plots the random payoff against the size of the PSD matrices for three fixed error accuracies ε, with T = 500 rounds. For n = 100, 200, ..., 1000 and ε = 0.05, 0.1, 0.2, after T = 500 rounds the random payoff remains between 2.8 and 8.
[Plot: random payoff vs. size n of PSD matrices, for ε = 0.05, 0.1, 0.2.]

Figure 4.4: Random payoff Vs. size of PSD matrices
Figure 4.5 plots the random payoff divided by the total actual payoff against the size of the PSD matrices for three fixed error accuracies ε, with T = 500 rounds. For n = 100, 200, ..., 1000 and ε = 0.05, 0.1, 0.2, after T = 500 rounds this ratio remains between 2.2 and 6.6, i.e. the total actual payoff is a factor of 2.2 to 6.6 smaller than the random payoff.
[Plot: random payoff div. total payoff vs. size n of PSD matrices, for ε = 0.05, 0.1, 0.2.]

Figure 4.5: Random payoff div. total actual payoff Vs. size of PSD matrices
Figure 4.6 plots the worst payoff against the size of the PSD matrices for three fixed error accuracies ε, with T = 500 rounds. For n = 100, 200, ..., 1000 and ε = 0.05, 0.1, 0.2, after T = 500 rounds the worst payoff remains between 300 and 3300.
[Plot: worst payoff vs. size n of PSD matrices, for ε = 0.05, 0.1, 0.2.]

Figure 4.6: Worst payoff Vs. size of PSD matrices
Figure 4.7 plots the upper bound against the size of the PSD matrices for three fixed error accuracies ε, with T = 500 rounds. For n = 100, 200, ..., 1000 and ε = 0.05, 0.1, 0.2, after T = 500 rounds the upper bound remains between 10 and 900.
[Plot: upper bound vs. size n of PSD matrices, for ε = 0.05, 0.1, 0.2.]

Figure 4.7: Upper bound Vs. size of PSD matrices
Figure 4.8 plots the total actual payoff against the error accuracy ε for three fixed sizes of PSD matrices, with T = 500 rounds. For ε = 0.01, 0.02, ..., 0.1, n = 100, 300, 500, and T = 500 rounds, the total actual payoff remains between 1.1 and 1.8.
[Plot: total payoff vs. error accuracy ε, for n = 100, 300, 500.]

Figure 4.8: Total actual payoff Vs. ε error accuracy
Figure 4.9 plots the random payoff against the error accuracy ε for three fixed sizes of PSD matrices, with T = 500 rounds. For ε = 0.01, 0.02, ..., 0.1, n = 100, 300, 500, and T = 500 rounds, the random payoff remains between 1.7 and 6.3.
[Plot: random payoff vs. error accuracy ε, for n = 100, 300, 500.]

Figure 4.9: Random payoff Vs. ε error accuracy
Figure 4.10 plots the random payoff divided by the total actual payoff against the error accuracy ε for three fixed sizes of PSD matrices, with T = 500 rounds. For ε = 0.01, 0.02, ..., 0.1, n = 100, 300, 500, and T = 500 rounds, this ratio remains between 1.5 and 5.2, i.e. the total actual payoff is a factor of 1.5 to 5.2 smaller than the random payoff.
[Plot: random payoff div. total payoff vs. error accuracy ε, for n = 100, 300, 500.]

Figure 4.10: Random payoff div. total actual payoff Vs. ε error accuracy
Figure 4.11 plots the worst payoff against the error accuracy ε for three fixed sizes of PSD matrices, with T = 500 rounds. For ε = 0.01, 0.02, ..., 0.1, n = 100, 300, 500, and T = 500 rounds, the worst payoff remains between 300 and 1700.
[Plot: worst payoff vs. error accuracy ε, for n = 100, 300, 500.]

Figure 4.11: Worst payoff Vs. ε error accuracy
Figure 4.12 plots the upper bound against the error accuracy ε for three fixed sizes of PSD matrices, with T = 500 rounds. For ε = 0.01, 0.02, ..., 0.1, n = 100, 300, 500, and T = 500 rounds, the upper bound remains between 32 and 2100.
[Plot: upper bound vs. error accuracy ε, for n = 100, 300, 500.]

Figure 4.12: Upper bound Vs. ε error accuracy
4.4 Conclusions
In this chapter we have designed and implemented the data streaming and online algorithm. This algorithm uses the matrix multiplicative - weights update technique to update the weights of the problem, and at each round uses the matrix exponential to increase its knowledge about the input. The online algorithm presented here has many applications in constrained combinatorial optimization, machine learning, data streaming and online problems.
4.5 Remarks
We have not given separate theorems and proofs for the data streaming and online algorithm 6, since the theorems for the matrix multiplicative - weights update algorithm can be directly mapped to the data streaming and online algorithm 6 [3].
Chapter 5
Fast Algorithms for Approximate
Packing-Covering Semidefinite
Programs using the Multiplicative -
Weights Update Techniques
5.1 Introduction
In chapter 1 we presented a primal-only approach to approximating SDPs. Here we present a primal-dual approach. The main objective of this chapter is to design and implement a general primal-dual algorithm to compute a near-optimal solution to any SDP.
A general SDP with n² variables, an n × n matrix variable X, and m constraints, together with its dual, can be written as follows:

Max  C • X
subject to  Aj • X ≤ bj,  for j = 1, 2, ..., m
            X ⪰ 0        (Primal)   (5.1)

Min  b · y
subject to  ∑_{j=1}^m Aj yj ⪰ C
            y ≥ 0        (Dual)     (5.2)

where y = {y1, y2, ..., ym} are the dual variables and b = {b1, b2, ..., bm}. A linear program is a special case of an SDP: we can see this by setting all entries of all matrices to zero except the diagonal entries. Hence SDPs are a more general form of LPs.
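The diagonal embedding just described can be checked numerically. The following sketch (an illustrative NumPy example with made-up LP data, not the report's MATLAB code) shows that for diagonal matrices the Frobenius inner products Aj • X reduce to ordinary LP constraint rows:

```python
import numpy as np

# Hypothetical LP data: max c.x  s.t.  a_j . x <= b_j,  x >= 0
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0],
              [2.0, 0.5]])
x = np.array([1.0, 2.0])           # some candidate LP point

# Embed into SDP form: every matrix is diagonal
C  = np.diag(c)                    # objective matrix
Aj = [np.diag(row) for row in A]   # one constraint matrix per LP row
X  = np.diag(x)                    # diagonal PSD variable

# Frobenius inner product M . N = sum of entrywise products
dot = lambda M, N: float(np.sum(M * N))

assert dot(C, X) == c @ x                                       # objective agrees
assert all(dot(Aj[j], X) == A[j] @ x for j in range(len(Aj)))   # constraints agree
```

Since X is diagonal with nonnegative entries exactly when x ≥ 0, the SDP constraint X ⪰ 0 recovers the LP nonnegativity constraint.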
Assume A1 = I and b1 = R. This assumption gives a bound on the trace, i.e., Trace(X) ≤ R, which is a scaling constraint on the trace of the optimal X. As in the primal-only approach to approximating SDPs, our implementation of the SDP primal-dual algorithm uses binary search on α to reduce optimization to feasibility. Let α be the algorithm's current guess for the optimum value of the SDP. The implementation tries to either construct a PSD matrix that is primal feasible and has value > α, or a dual feasible solution whose value is at most (1 + δ)α for some arbitrarily small δ > 0.
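The binary search over α can be sketched generically; here `feasible(alpha)` stands in for one run of the primal-dual procedure (the function name and the toy optimum value are illustrative, not from the report):

```python
def binary_search_alpha(feasible, lo, hi, tol=1e-3):
    """Find (to within tol) the largest alpha in [lo, hi] for which
    feasible(alpha) holds, assuming feasibility is monotone in alpha."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):   # a primal solution of value > mid exists
            lo = mid
        else:               # dual certificate: optimum <= (1 + delta) * mid
            hi = mid
    return lo

# Toy stand-in: pretend the SDP optimum value is 23
alpha = binary_search_alpha(lambda a: a < 23.0, 0.0, 100.0)
```

Each call to `feasible` corresponds to one run of the feasibility procedure at guess α, so the optimization problem is solved with O(log(range/tol)) feasibility runs.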
5.2 Algorithm
The algorithm starts with a trivial candidate for a primal solution X, where X is PSD
matrix possibly infeasible of trace equals R. Let X(1) = RnI. Then algorithm iteratively
generates candidate primal solutions X(1), X(2), .... At every step it tries to improve X(t)
to obtain X(t+1). In order to apply the multiplicative - weights update technique in
constraints combinatorial optimization problems, we require to go through following two
phases:
44
1. an oracle for generating the maximally adversarial point at each step, and
2. the multiplicative-weights update technique for updating the weights of the decision.
The oracle checks the validity of the current X^(t), i.e., whether it is feasible or infeasible, as follows: it searches for a vector y from the polytope Dα = {y : y ≥ 0, b · y ≤ α} such that

∑_{j=1}^m (Aj • X^(t)) yj − (C • X^(t)) ≥ 0.   (5.3)

If the oracle succeeds in finding such a y, then we claim that X^(t) is either primal infeasible or has value satisfying

C • X^(t) ≤ α.   (5.4)

Otherwise, if X^(t) were primal feasible with value greater than α, then

∑_{j=1}^m (Aj • X^(t)) yj − (C • X^(t)) ≤ ∑_{j=1}^m bj yj − (C • X^(t)) < α − α = 0,   (5.5)

(the first inequality uses Aj • X^(t) ≤ bj and y ≥ 0; the second uses b · y ≤ α and C • X^(t) > α), which contradicts ∑_{j=1}^m (Aj • X^(t)) yj − (C • X^(t)) ≥ 0. Thus y implicitly contains useful information to improve the candidate primal X^(t), and the algorithm uses y to update X^(t) via a matrix exponential update rule. Note that the matrix exponential is always PSD, so the new matrix X^(t+1) is also PSD. If the oracle declares that there is no vector y ∈ Dα satisfying ∑_{j=1}^m (Aj • X^(t)) yj − (C • X^(t)) ≥ 0, then the algorithm declares that the current X^(t) is a primal feasible solution of objective value at least α. Algorithm 7 presents the SDP primal-dual algorithm proposed by Arora et al. [2]. It runs in two phases: an oracle for generating the maximally adversarial point at each step, and the multiplicative-weights update technique for updating the weights of the decision. In the next subsection we propose a general scheme of oracle implementation, which computes a vector y from the polytope Dα = {y : y ≥ 0, b · y ≤ α} such that ∑_{j=1}^m (Aj • X^(t)) yj − (C • X^(t)) ≥ 0, or otherwise returns a primal feasible X^(t) of objective value at least α.
Input: (Aj, bj, C, for j = 1, 2, ..., m, where the Aj are square matrices of dimension n × n)
Result: feasible primal PSD matrix X and dual y ≥ 0
Assumption: A1 = I and b1 = R, where Trace(X) ≤ R, and α is the algorithm's current guess for the optimum value of the SDP.
Initialize: Set X^(1) = (R/n)I. Let ε = δα/(2ρR), let ε′ = −ln(1 − ε), and let T = 8ρ²R² ln(n)/(δ²α²).
for t = 1, 2, ..., T do
    Run the oracle with candidate solution X^(t).
    if the oracle fails, i.e., ∑_{j=1}^m (Aj • X^(t)) yj − (C • X^(t)) < 0 for every y ∈ Dα, then
        stop and output X^(t).
    else
        let y^(t) be the vector generated by the oracle.
    end
    Let M^(t) = (∑_{j=1}^m Aj y^(t)_j − C + ρI)/(2ρ).
    Compute W^(t+1) = (1 − ε)^(∑_{τ=1}^t M^(τ)) = exp(−ε′(∑_{τ=1}^t M^(τ))).
    Set X^(t+1) = R W^(t+1)/Trace(W^(t+1)) and continue.
end
Algorithm 7: Primal-dual algorithm to approximate SDPs
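One round of the update in Algorithm 7 can be sketched in NumPy (the report's implementation is in MATLAB; this is an illustrative translation with random synthetic data, and the oracle's vector y^(t) is replaced by a random stand-in, so it is not the project code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, R, rho, eps = 3, 2, 2.0, 5.0, 0.1
eps_p = -np.log(1.0 - eps)                 # eps' = -ln(1 - eps)

sym = lambda M: (M + M.T) / 2              # make a random matrix symmetric
C  = sym(rng.standard_normal((n, n)))
Aj = [sym(rng.standard_normal((n, n))) for _ in range(m)]

X = (R / n) * np.eye(n)                    # X^(1) = (R/n) I
M_sum = np.zeros((n, n))                   # running sum of M^(tau)
for t in range(5):
    y = rng.random(m)                      # stand-in for the oracle's y^(t)
    M = (sum(yj * A for yj, A in zip(y, Aj)) - C + rho * np.eye(n)) / (2 * rho)
    M_sum += M
    # W^(t+1) = exp(-eps' * sum M^(tau)), via eigendecomposition of the
    # symmetric matrix M_sum (matrix exponential)
    w, V = np.linalg.eigh(M_sum)
    W = (V * np.exp(-eps_p * w)) @ V.T
    X = R * W / np.trace(W)                # renormalize to trace R

# X stays PSD with trace R throughout
assert np.isclose(np.trace(X), R)
assert np.all(np.linalg.eigvalsh(X) >= -1e-9)
```

Because exp(−ε′ M_sum) has strictly positive eigenvalues, the normalized X^(t+1) is always PSD with trace exactly R, as the algorithm requires.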
Theorem 5.2.1 bounds the number of iterations needed in Algorithm 7.

Theorem 5.2.1. (Theorem 1, Arora et al. [2])
Let e1 = {1, 0, ..., 0}. In the primal-dual SDP algorithm, assume that the oracle never fails for

T = 8ρ²R² ln n / (δ²α²)  iterations.   (5.6)

Let

y = (δα/R) e1 + (1/T) ∑_{t=1}^T y^(t).   (5.7)

Then y is a feasible dual solution with objective value at most (1 + δ)α.
5.2.1 Computation of vector y
Here we give a proposed implementation of the oracle for primal-dual SDPs. The implementation computes a vector y from the polytope Dα = {y : y ≥ 0, b · y ≤ α} such that ∑_{j=1}^m (Aj • X^(t)) yj − (C • X^(t)) ≥ 0, or otherwise returns a primal feasible X^(t) of objective value at least α. The search for the vector y proceeds as follows. We require

y ≥ 0,  b1 y1 + b2 y2 + ... + bm ym ≤ α,   (5.8)

∑_{j=1}^m (Aj • X^(t)) yj − (C • X^(t)) ≥ 0.   (5.9)

From equation (5.8), taking the budget constraint tight up to a slack ε,

y1 = (α − ∑_{j=2}^m bj yj − ε) / b1.   (5.10)

Substituting (5.10) into (5.9):

(A1 • X^(t)) y1 + ∑_{j=2}^m (Aj • X^(t)) yj ≥ C • X^(t)

(A1 • X^(t)) (α − ∑_{j=2}^m bj yj − ε)/b1 + ∑_{j=2}^m (Aj • X^(t)) yj ≥ C • X^(t)

(A1 • X^(t)) α/b1 − (A1 • X^(t)) ∑_{j=2}^m bj yj / b1 − ε (A1 • X^(t))/b1 + ∑_{j=2}^m (Aj • X^(t)) yj ≥ C • X^(t)

∑_{j=2}^m (Aj • X^(t)) yj − (A1 • X^(t)) ∑_{j=2}^m bj yj / b1 ≥ C • X^(t) + ε (A1 • X^(t))/b1 − (A1 • X^(t)) α/b1

∑_{j=2}^m (Aj • X^(t) − (A1 • X^(t)) bj/b1) yj ≥ C • X^(t) + ε (A1 • X^(t))/b1 − (A1 • X^(t)) α/b1.

Let

dj = Aj • X^(t) − (A1 • X^(t)) bj/b1,
c* = ε (A1 • X^(t))/b1 − (A1 • X^(t)) α/b1.

Therefore,

∑_{j=2}^m dj yj ≥ C • X^(t) + c* = CNew.   (5.11)

For all dj < 0, set

yj = 0,  2 ≤ j ≤ m.   (5.12)

Let K of the dj's (2 ≤ j ≤ m) be ≥ 0, say d_{l1}, d_{l2}, ..., d_{lK}, and let S = d_{l1} + d_{l2} + ... + d_{lK}. Set

y_{l1} = y_{l2} = ... = y_{lK} = CNew/S + ε;   (5.13)

the others were set to zero in (5.12). Finally, calculate y1 from (5.10). Hence (5.10), (5.12), and (5.13) give the vector y.
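The construction in (5.10)–(5.13) can be sketched directly; the following is an illustrative NumPy version of the proposed oracle (the report's own code is in MATLAB, and the function name is hypothetical):

```python
import numpy as np

def oracle_y(Aj, b, C, X, alpha, eps=1e-6):
    """Attempt to build y in D_alpha = {y >= 0, b.y <= alpha} with
    sum_j (Aj . X) y_j - (C . X) >= 0, following (5.10)-(5.13).
    Returns None when no d_j is nonnegative (declare X primal feasible)."""
    dot = lambda M, N: float(np.sum(M * N))   # Frobenius inner product
    m = len(Aj)
    a1x = dot(Aj[0], X)                       # A_1 . X (A_1 = I, b_1 = R)
    # d_j and c* as defined above (math indices 2..m -> Python 1..m-1)
    d = np.array([dot(Aj[j], X) - a1x * b[j] / b[0] for j in range(1, m)])
    c_star = eps * a1x / b[0] - alpha * a1x / b[0]
    C_new = dot(C, X) + c_star                # right-hand side of (5.11)
    pos = d >= 0
    if not pos.any():
        return None
    y = np.zeros(m)
    S = d[pos].sum()
    y[1:][pos] = C_new / S + eps              # (5.13); d_j < 0 get y_j = 0 (5.12)
    y[0] = (alpha - b[1:] @ y[1:] - eps) / b[0]   # (5.10)
    return y
```

As written, the sketch guarantees b · y = α − ε ≤ α; for data where CNew and the resulting y1 are nonnegative it also satisfies (5.9), matching the derivation above.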
The primal-dual SDP algorithm depends on the width parameter of the oracle. This is the smallest ρ ≥ 0 such that for every primal candidate X, the vector y ∈ Dα returned by the oracle satisfies

||∑_{j=1}^m Aj yj − C|| ≤ ρ.   (5.14)

Theorem 5.2.2. (Theorem V.1, Chan et al. [5])
There exists such an oracle with access time O(n²), and for candidate value α it has width ρα = O(αn).
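The width of a candidate y can be measured numerically; the sketch below (an illustrative helper, assuming ||·|| in (5.14) denotes the spectral norm) computes it for given data:

```python
import numpy as np

def width(Aj, y, C):
    """Spectral norm of sum_j A_j y_j - C, the quantity bounded by rho in (5.14)."""
    M = sum(yj * A for yj, A in zip(y, Aj)) - C
    return float(np.linalg.norm(M, 2))  # largest singular value

# Tiny check with diagonal data: 2*I - diag(1, 0) = diag(1, 2), norm 2
rho = width([np.eye(2)], np.array([2.0]), np.diag([1.0, 0.0]))
assert np.isclose(rho, 2.0)
```

In practice the width sets the iteration count T = 8ρ²R² ln(n)/(δ²α²), so a smaller measured ρ directly shortens the run.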
5.3 Examples
Consider the SDP primal (5.1) and its dual (5.2).
5.3.1 Example 1
Consider the following problem:

n = 2, m = 3,

C = [ −11   0 ]
    [   0  23 ],

A1 = [ 10  4 ]
     [  4  0 ],

A2 = [ 0   0 ]
     [ 0  −8 ],

A3 = [  0  −8 ]
     [ −8  −2 ],

b = [ −48, −8, −20 ]ᵀ.
AK solver output: for α* = 23, δ = 0.9, R = 3 we get iter = 8, CPU time = 0.0061 sec., primal objective value = 23.1256, and dual objective value = 22.9250. Hence we obtain a primal feasible solution with value > α, or a dual feasible solution whose value is at most (1 + δ)α for some arbitrarily small δ > 0.
SeDuMi output: iter = 7, CPU time = 0.2 sec., primal objective value = 23.000000006, and dual objective value = 23.000000025.
5.3.2 Example 2
Consider the following problem:

n = 3, m = 4,

C = [  2.0  −0.5  −0.6 ]
    [ −0.5   2.0   0.4 ]
    [ −0.6   0.4   3.0 ],

A1 = [ 0 1 0 ]
     [ 1 0 0 ]
     [ 0 0 0 ],

A2 = [ 0 0 1 ]
     [ 0 0 0 ]
     [ 1 0 0 ],

A3 = [ 0 0 0 ]
     [ 0 0 1 ]
     [ 0 1 0 ],

A4 = [ 1 0 0 ]
     [ 0 1 0 ]
     [ 0 0 1 ],

b = [ 0, 0, 0, 1 ]ᵀ.
AK solver output: for α* = 3, δ = 0.01, R = 2 we get iter = 2, CPU time = 0.0025 sec., primal objective value = 4.6668, and dual objective value = 2.9992. Hence we obtain a primal feasible solution with value > α, or a dual feasible solution whose value is at most (1 + δ)α for some arbitrarily small δ > 0.
SeDuMi output: iter = 5, CPU time = 0.1248 sec., primal objective value = 3.0, and dual objective value = 3.0.
5.4 Conclusions
In this chapter we have given an overview of the AK algorithm. The AK algorithm runs in two parts: the oracle computation and the matrix multiplicative-weights update method. The matrix multiplicative-weights update method is simple and easy to implement. The first part, the oracle computation (computation of the vector y), is the core of the algorithm, and we have given a different, very general approach to computing this vector y. This oracle computation is likely to work for general SDP problems. We have then given two examples and compared our results with SeDuMi, a well-known SDP solver, showing that our approach performs well in computation time.
5.5 Remarks
Please note that we have not modified the AK SDP algorithm presented in Algorithm 7; we have only given a general approach to compute the vector y at each round. Proofs of the related theorems are available in the paper of Arora et al. [2].
Bibliography
[1] F. Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5(1):13–51, 1995.
[2] S. Arora and S. Kale. A combinatorial, primal-dual approach to semidefinite pro-
grams. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of
Computing, pages 227–236, 2007.
[3] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update
method: A meta-algorithm and applications. Theory of computing, 8(1), 2012.
[4] Brian Borchers. SDPLIB 1.2, a library of semidefinite programming test problems. Optimization Methods and Software, 11(1):683–690, 1999.
[5] T.-H. Hubert Chan, Kevin L. Chang, and Rajiv Raman. An SDP primal-dual algorithm for approximating the Lovász theta function. ISIT, pages 2808–2812, 2009.
[6] M.D. Grigoriadis and L.G. Khachiyan. A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters, 18(2):53–58, 1995.
[7] E. Hazan, S. Arora, and S. Kale. Fast algorithms for approximate semidefinite
programming using the multiplicative weights update method. In Proceedings of the
46th Annual IEEE Symposium on Foundations of Computer Science, pages 339–348,
2005.
[8] C. Helmberg. Semidefinite programming. European Journal of Operational Research, 137(3):461–482, 2002.
[9] Rahul Jain and Penghui Yao. A parallel approximation algorithm for positive
semidefinite programming. FOCS, pages 463–471, 2011.
[10] Christos Koufogiannakis and Neal E. Young. Beating simplex for fractional packing and covering linear programs. The 48th IEEE Symposium on Foundations of Computer Science, 2007.
[11] Jos F. Sturm. SeDuMi ver. 1.3. Optimization Methods and Software.
[12] Dilys Thomas, Rajiv Raman, and Ajay Shankar Bidyarthy. Fast approximations to solve packing-covering LPs and constant-sum games via multiplicative-weights technique. In Proceedings of the International Symposium on Combinatorial Optimization - CO 2012, September 17-19, 2012.