
  • CONVEX OPTIMIZATION FOR

    SIGNAL PROCESSING PROBLEMS

    LUI WING KIN

    DOCTOR OF PHILOSOPHY

    CITY UNIVERSITY OF HONG KONG

    AUGUST 2009

  • CITY UNIVERSITY OF HONG KONG

    Convex Optimization for Signal Processing Problems

    Submitted to Department of Electronic Engineering

    in Partial Fulfillment of the Requirements

    for the Degree of Doctor of Philosophy

    by

    Lui Wing Kin

    August 2009

  • To my parents.


  • Abstract

    Convex optimization has been one of the most exciting research areas in optimization, and it refers to minimizing a convex objective function subject to convex constraints. By recognizing or formulating an optimization problem in convex form, the problem can be solved efficiently. In the past decade, convex optimization has become an essential tool in engineering because of the benefits of two properties of convexity. First, convex optimization gives a globally optimal solution which can be found efficiently and reliably. Second, the solution can be computed to within any desired accuracy using well-developed numerical methods. Once a problem is cast in convex form, it can effectively be considered solved.

    Source localization, sinusoidal parameter estimation, polynomial root finding and the determination of the capacity region, which are important areas of signal processing, are tackled from a convex optimization perspective. The problems are first formulated as optimization problems, and they are either relaxed or transformed into convex problems to yield global solutions and high-fidelity approximations.

    In source localization problems, the positions of targets, such as sensor nodes or mobile terminals, are the parameters of interest. Given position-bearing measurements, such as time-of-arrival or time-difference-of-arrival, together with the known coordinates of receivers, the target positions can be determined. These problems, especially time-of-arrival based localization, have been extended to multiple sources in a collaborative environment, which is called sensor network node localization. However, most of the literature concentrates on the case where the anchor positions and the propagation speed are perfectly known. In this thesis, node localization in the presence of uncertainties in anchor positions and/or propagation speed is dealt with. Furthermore, source localization in non-line-of-sight propagation, which contributes significant error, is addressed. Source localization using time-difference-of-arrival measurements is also studied based on convex optimization.

    Parameter estimators of several sinusoidal models, namely, the single complex/real tone, multiple complex sinusoids, single two-dimensional complex tone and polynomial phase signal, in the presence of additive Gaussian noise, are developed from a convex perspective. The major difficulty in optimally determining the parameters is that the corresponding maximum-likelihood estimators involve searching for the global minimum or maximum of multi-modal cost functions because of the nonlinearity of the frequencies in the observed signals. By relaxing the non-convex maximum-likelihood formulations using semidefinite programs, high-fidelity approximate solutions are obtained in a globally optimum fashion.

    The problem of solving a polynomial equation has been a classical problem in

    mathematics. Semidefinite relaxation, a branch of convex optimization techniques, is investigated to find the real roots of a real polynomial.

    The determination of the capacity region of parallel Gaussian interference channels

    is an open problem. The special case where the channel is one-sided is considered.

    The sum capacity is shown to be a convex function of the user powers. Exploiting the

    inherent structure of the problem, a numerical algorithm is constructed to compute

    the sum capacity.


  • Acknowledgments

    I would like to express my gratitude to my supervisor, Dr. So, Hing Cheung, for his great patience, guidance and support in both my personal life and research. He introduced many interesting research topics to me and provided a lot of insightful and inspirational comments, as well as proverbs for daily life, during the course of this research.

    I am also very grateful for the kind and valuable help of Prof. Chen, Ron Guanrong, Dr. Ma, Wing-Kin, Dr. Shum, Kenneth Wing Ki, Dr. Sung, Albert Chi Wan, Dr. Wong, Kwok-Wo and Dr. Yan, Wei (in alphabetical order). The knowledge and experience they shared with me greatly improved my work, and their expertise in their research areas broadened my view.

    My thanks also go to my ex-colleagues and colleagues, Dr. Chan, Frankie Kit Wing, Chan, Thomas Chin Tao, Liu, Michael Hongqing, Tawfiq Amin, Lo, Thomas Kai Chun, Dr. Wu, Yuntao, and Zheng, Jason Jun (in alphabetical order). Frequent discussions with them on research and leisure have been enjoyable and productive.

    Finally, I would like to thank my family and friends. Their support and encourage-

    ment are important to the completion of this work and will be forever remembered.


  • Mathematical Symbols

    Specific Sets

    C          complex numbers
    C^n        complex column vector of length n (n × 1 matrix)
    C^(m×n)    complex matrix of size m × n
    H^(m×n)    Hankel matrix of size m × n
    R          real numbers
    R^n        real column vector of length n (n × 1 matrix)
    R^(m×n)    real matrix of size m × n
    R_+        nonnegative real numbers
    R_++       positive real numbers
    R^n_+      nonnegative real column vector of length n (n × 1 matrix)
    R^n_++     positive real column vector of length n (n × 1 matrix)
    S^n        symmetric matrix of size n × n
    S^n_+      symmetric positive semidefinite matrix of size n × n
    S^n_++     symmetric positive definite matrix of size n × n
    Z          integers
    Z_+        nonnegative integers
    Z_++       positive integers

    Vectors and Matrices

    a          bold lower case symbols denote vectors
    A          bold upper case symbols denote matrices
    A          calligraphic upper case symbols denote sets
    0_m        m × m zero matrix
    0_(m×n)    m × n zero matrix
    1_m        m × m matrix with all elements one
    1_(m×n)    m × n matrix with all elements one
    I_m        m × m identity matrix


  • Operators

    [a]_i          ith element of a
    [A]_(i,j)      (i, j)th entry of A
    |A|            absolute value of A
    |A|            number of candidates in the set A
    ⌊A⌋            floor of A
    ||a||_n, ||a|| n-norm of a, any norm of a
    A^⋆            optimal A
    Ā              conjugate of A
    (x)_+          x if x > 0, otherwise 0
    A^T            transpose of A
    A^H            Hermitian transpose of A
    A^(-1)         inverse of A
    f^(-1)         inverse function of f
    A^(1/2)        matrix square root of A
    Â              estimate of A
    ∂/∂x           partial differentiation with respect to x
    ∂f             sub-differential of function f
    ∇f             vector differential of function f
    a ≈ b          a, b ∈ R being approximately equal
    A := B         A defined as B
    a ⪰ b          a ∈ R^n element-wise greater than b
    A ⪰ 0_n        A ∈ C^(n×n) being positive semidefinite
    A ≻ 0_n        A ∈ C^(n×n) being positive definite
    A ⊆ B          A being a subset of B
    A ⊂ B          A being a proper subset of B
    ¬A             negation of A
    A ∪ B          union of A and B
    ∪_(A∈S) A      union of the sets A in S
    A ∩ B          intersection of A and B
    ∩_(A∈S) A      intersection of the sets A in S
    a ⇒ b          statement a implying statement b


    aff A                   affine hull of A (see (2.1.20))
    B(x_c, r)               Euclidean ball with radius r and center x_c (see (2.1.27))
    arg max{a : a ∈ A}      argument of the largest element in A
    arg min{a : a ∈ A}      argument of the smallest element in A
    blkdiag(A_1, …, A_m)    block diagonal matrix with diagonal blocks A_1, A_2, …, A_m
    conv A                  convex hull of A (see (2.1.21))
    dom f                   domain of f
    diag(a)                 diagonal matrix in C^(n×n) with a ∈ C^n as its diagonal elements
    diag(A, k)              column vector containing the kth diagonal elements of A ∈ C^(n×n)
    epi f                   epigraph of function f (see (2.1.36))
    exp(a)                  exponential function of a
    E{A}                    expectation of A
    inf{a : a ∈ A}          infimum of A
    ln(a)                   natural logarithm of a ∈ R_++
    log(a)                  logarithm of a ∈ R_++
    max{i_1, i_2, …, i_m}   largest element among i_1 to i_m, for m ≥ 2
    max{a : a ∈ A}          largest element in A
    min{i_1, i_2, …, i_m}   smallest element among i_1 to i_m, for m ≥ 2
    min{a : a ∈ A}          smallest element in A
    x mod y                 x - ny, n = ⌊x/y⌋, with x > 0, y > 0
    N(μ, C)                 Gaussian distribution with mean μ and covariance C
    O(a)                    order of a
    p({A_i}|{B_j})          probability of {A_i} given {B_j}
    perm1(S, m_i, m_j)      permutation operator mapping S to C^(n×n) (see Appendix B.1)
    perm2(S, m_i, m_j)      permutation operator mapping S to C^(n×n) (see Appendix B.2)
    relint A                relative interior of A (see (2.2.5))
    sign(a)                 sign of a
    Toeplitz(x)             symmetric or Hermitian Toeplitz matrix whose first row is x
    tr(A)                   trace of A
    vec(A)                  vectorization of A


  • Abbreviation

    0 to L

    2D      two-dimensional
    3D      three-dimensional
    AOA     angle-of-arrival
    CRLB    Cramer-Rao lower bound
    DPT     discrete polynomial transform
    DTFT    discrete-time Fourier transform
    ESDP    edge-based semidefinite programming
    FIM     Fisher information matrix
    FIR     finite impulse response
    GA      genetic algorithm
    GP      geometric programming
    GPS     global positioning system
    IC      interference channel
    IID     independently and identically distributed
    IQML    iterative quadratic maximum-likelihood
    IW      iterative water-filling
    KKT     Karush-Kuhn-Tucker
    LLS     linear least-squares
    LOS     line-of-sight
    LS      least-squares


  • M to Z

    MAP      maximum a posteriori
    MCMC     Markov chain Monte Carlo
    MDS      multidimensional scaling
    ML       maximum-likelihood
    MLE      maximum-likelihood estimator
    MSE      mean square error
    MSPE     mean square position error
    NLOS     non-line-of-sight
    NLS      nonlinear least-squares
    NP-hard  nondeterministic polynomial-time hard
    PDF      probability density function
    PSD      positive semidefinite
    RD       range-difference
    RSS      received signal strength
    SOCP     second-order cone programming
    SDP      semidefinite programming
    SDR      semidefinite relaxation
    SNR      signal-to-noise ratio
    TDOA     time-difference-of-arrival
    TOA      time-of-arrival
    TSWLS    two-step weighted least-squares
    WLS      weighted least-squares
    WSN      wireless sensor network


  • Table of Contents

    Abstract ii

    Acknowledgments v

    Mathematical Symbols vi

    Abbreviation ix

    Table of Contents xi

    List of Figures xiv

    1 Introduction 1

    1.1 Mathematical Optimization . . . . . . . . . . . . . . . . . . . . . . . 1

    1.2 Conventional Optimization Problems . . . . . . . . . . . . . . . . . . 3

    1.2.1 Least-Squares and Linear Programming . . . . . . . . . . . . . 3

    1.2.2 Convex Optimization . . . . . . . . . . . . . . . . . . . . . . . 6

    1.2.3 Nonlinear Optimization . . . . . . . . . . . . . . . . . . . . . 8

    1.3 Applications to Signal Processing Problems . . . . . . . . . . . . . . . 10

    1.3.1 Source Localization . . . . . . . . . . . . . . . . . . . . . . . . 11

    1.3.2 Sinusoidal Parameter Estimation . . . . . . . . . . . . . . . . 12

    1.3.3 Polynomial Root-Finding . . . . . . . . . . . . . . . . . . . . . 13

    1.3.4 One-Sided Parallel Gaussian Interference Channels . . . . . . 13

    1.4 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    2 Preliminaries 15

    2.1 Disciplines of Convex Optimization . . . . . . . . . . . . . . . . . . . 15

    2.1.1 Optimization Theory . . . . . . . . . . . . . . . . . . . . . . . 15

    2.1.2 Convex Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 21

    2.1.3 Numerical Computation . . . . . . . . . . . . . . . . . . . . . 29

    2.2 Duality Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    2.2.1 The Lagrangian . . . . . . . . . . . . . . . . . . . . . . . . . 37

    2.2.2 Karush-Kuhn-Tucker Conditions . . . . . . . . . . . . . . . . 39

    2.3 Types of Convex Optimization . . . . . . . . . . . . . . . . . . . . . . 39

    2.3.1 Geometric Programming . . . . . . . . . . . . . . . . . . . . . 40

    2.3.2 Conic Optimization . . . . . . . . . . . . . . . . . . . . . . . . 41

    2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44


  • 3 Source Localization 46

    3.1 TOA-Based Positioning . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    3.1.1 Sensor Network Localization . . . . . . . . . . . . . . . . . . . 48

    3.1.2 Non-Line-of-Sight Environment . . . . . . . . . . . . . . . . . 77

    3.2 TDOA-Based Positioning . . . . . . . . . . . . . . . . . . . . . . . . . 86

    3.2.1 Relaxation on LLS Based Estimator . . . . . . . . . . . . . . . 89

    3.2.2 Relaxation on ML Based Estimator . . . . . . . . . . . . . . . 96

    3.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

    4 Sinusoidal Parameter Estimation 104

    4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

    4.2 Single Complex Tone . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

    4.2.1 Periodogram Approach . . . . . . . . . . . . . . . . . . . . . . 106

    4.2.2 ML Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

    4.2.3 NLS Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 109

    4.3 Single Real Tone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

    4.3.1 ML Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

    4.3.2 NLS Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 114

    4.4 Multiple Complex Tones . . . . . . . . . . . . . . . . . . . . . . . . . 115

    4.5 Two-Dimensional Complex Tone . . . . . . . . . . . . . . . . . . . . . 122

    4.5.1 Periodogram Approach . . . . . . . . . . . . . . . . . . . . . . 122

    4.5.2 NLS Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 124

    4.6 Polynomial Phase Signal . . . . . . . . . . . . . . . . . . . . . . . . . 126

    4.6.1 Review of Discrete Polynomial Transform . . . . . . . . . . . 127

    4.6.2 Relaxation on ML Function . . . . . . . . . . . . . . . . . . . 130

    4.7 Fast Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

    4.8 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

    4.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

    5 Polynomial Root-Finding 146

    5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

    5.2 Algorithm Development . . . . . . . . . . . . . . . . . . . . . . . . . 148

    5.3 Illustration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

    5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

    6 Gaussian Interference Channels 157

    6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

    6.2 Channel Model and Problem Formulation . . . . . . . . . . . . . . . 159

    6.3 Computation of the Sum Capacity . . . . . . . . . . . . . . . . . . . 164

    6.3.1 First Subproblem . . . . . . . . . . . . . . . . . . . . . . . . . 165

    6.3.2 Second Subproblem . . . . . . . . . . . . . . . . . . . . . . . . 166

    6.3.3 Alternating Optimization . . . . . . . . . . . . . . . . . . . . 167

    6.4 Comparison with Suboptimal Schemes . . . . . . . . . . . . . . . . . 168

    6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170


  • 7 Conclusions and Future Work 171

    A Development of MAP Estimator 174

    B Development of Permutation Operators 175

    B.1 Development of perm1(S,mi,mj) . . . . . . . . . . . . . . . . . . . . 175

    B.2 Development of perm2(S,mi,mj) . . . . . . . . . . . . . . . . . . . . 176

    C Proof of Theorem 6.3 177

    D Proof of Theorem 6.6 180

    Bibliography 184

    Publications 196


  • List of Figures

    2.1 Geometric interpretation of epigraph form problem. . . . . . . . . . . 21

    2.2 Some simple convex and non-convex sets. . . . . . . . . . . . . . . . . 22

    2.3 Convex hull of the kidney shaped set. . . . . . . . . . . . . . . . . . . 24

    2.4 Geometric interpretation of a convex cone. . . . . . . . . . . . . . . . 24

    2.5 Convexity of functions. . . . . . . . . . . . . . . . . . . . . . . . . . . 27

    2.6 Geometric interpretation of sub-level sets. . . . . . . . . . . . . . . . 28

    2.7 Hierarchical representation of common convex optimization problems. 40

    3.1 Intersection of circles gives the receiver location. . . . . . . . . . . . . 47

    3.2 Geometry of sensor network. . . . . . . . . . . . . . . . . . . . . . . . 67

    3.3 Single trial performance of the standard SDP algorithm in the presence

    of anchor position uncertainty. . . . . . . . . . . . . . . . . . . . . . . 68

    3.4 Single trial performance of the proposed SDP algorithm in the presence

    of anchor position uncertainty. . . . . . . . . . . . . . . . . . . . . . . 69

    3.5 Single trial performance of the proposed ESDP algorithm in the pres-

    ence of anchor position uncertainty. . . . . . . . . . . . . . . . . . . . 69

    3.6 Mean square position error versus σd^2 in the presence of anchor position
    uncertainty. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

    3.7 Mean square position error versus σi^2 at σd^2 = -50 dB. . . . . . . . . . 71

    3.8 Single trial performance of the standard SDP algorithm for unknown

    propagation speed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

    3.9 Single trial performance of the proposed SDP algorithm for unknown

    propagation speed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

    3.10 Mean square position error versus σt^2/c^2 for unknown propagation speed. 73

    3.11 Mean square speed error versus σt^2/c^2 for unknown propagation speed. . 73

    3.12 Single trial performance of the standard SDP algorithm in the presence

    of combined uncertainties. . . . . . . . . . . . . . . . . . . . . . . . . 74

    3.13 Single trial performance of the proposed SDP algorithm in the presence

    of combined uncertainties. . . . . . . . . . . . . . . . . . . . . . . . . 75


    3.14 Mean square position error versus σt^2/co^2 in the presence of combined
    uncertainties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

    3.15 Mean square speed error versus σt^2/co^2 in the presence of combined
    uncertainties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

    3.16 Illustration of two overlapped regions (along the line passing through

    xi and xj). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

    3.17 Illustration of the non-overlapped case (along the line passing through

    xi and xj). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

    3.18 LOS/NLOS detection probability. . . . . . . . . . . . . . . . . . . . . 85

    3.19 Mean square position error versus qi. . . . . . . . . . . . . . . . . . . 85

    3.20 Intersection of hyperbolas gives the target location. . . . . . . . . . . 87

    3.21 Geometrical interpretation of Ri and R1,i. . . . . . . . . . . . . . . . . 91

    3.22 Mean square position error versus σ^2 at x = [3.5, 2.5, 1.5]^T m. . . . . . 95

    3.23 Mean square position error versus σ^2 at x = [5.5, 3.5, 1.5]^T m. . . . . . 96

    3.24 Mean square position error versus σ^2 at x = [3.5, 2.5, 1.5]^T m. . . . . . 101

    3.25 Mean square position error versus x-coordinate at σ^2 = 10 dBm^2. . . . . 102

    4.1 Mean square error for of single complex sinusoid. . . . . . . . . . . 133

    4.2 Mean square error for of single complex sinusoid. . . . . . . . . . . 134

    4.3 Mean square error for of single complex sinusoid. . . . . . . . . . . 134

    4.4 Mean square error for of single real sinusoid. . . . . . . . . . . . . . 135

    4.5 Mean square error for of single real sinusoid. . . . . . . . . . . . . . 136

    4.6 Mean square error for of single real sinusoid. . . . . . . . . . . . . . 136

    4.7 Mean square error for 1 of multiple complex sinusoids. . . . . . . . . 137

    4.8 Mean square error for 2 of multiple complex sinusoids. . . . . . . . . 138

    4.9 Mean square error for 1 of multiple complex sinusoids. . . . . . . . . 138

    4.10 Mean square error for 2 of multiple complex sinusoids. . . . . . . . . 139

    4.11 Mean square error for 1 of multiple complex sinusoids. . . . . . . . . 139

    4.12 Mean square error for 2 of multiple complex sinusoids. . . . . . . . . 140

    4.13 Mean square error for of 2D single complex sinusoid. . . . . . . . . 140

    4.14 Mean square error for of 2D single complex sinusoid. . . . . . . . . 141

    4.15 Mean square error for of 2D single complex sinusoid. . . . . . . . . 141

    4.16 Mean square error for of 2D single complex sinusoid. . . . . . . . . 142

    4.17 Mean square error for of polynomial phase signal. . . . . . . . . . . 143


  • 4.18 Mean square error for a0 of polynomial phase signal. . . . . . . . . . . 143

    4.19 Mean square error for a1 of polynomial phase signal. . . . . . . . . . . 144

    4.20 Mean square error for a2 of polynomial phase signal. . . . . . . . . . . 144

    5.1 Intersection of f(x, y) = y + x + 1 and y - x^2 = 0. . . . . . . . . . . . 150

    5.2 f(x) = 0.1034x^7 + 0.1573x^6 + 0.4075x^5 + 0.4078x^4 + 0.0527x^3 + 0.9418x^2 +
    0.15x + 0.3844. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

    5.3 f(x) = 0.8959x^6 - 0.9791x^5 + 0.6537x^4 + 0.4208x^3 + 0.8830x^2 +
    1.2610x + 0.7249. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

    5.4 f(x) = 0.6813x^4 + 0.3795x^3 + 0.8318x^2 + 0.5028x + 0.7095. . . . . . . 154

    5.5 Mean absolute error versus polynomial degree n. . . . . . . . . . . . . 155

    6.1 One-sided parallel Gaussian IC. . . . . . . . . . . . . . . . . . . . . . 158

    6.2 Sum capacity of a single one-sided Gaussian IC, Ca, when cross link is
    weak (a = 0.5 ≤ 1). . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

    6.3 Sum capacity of a single one-sided Gaussian IC, Ca, when cross link is
    strong (a = 2.5 > 1). . . . . . . . . . . . . . . . . . . . . . . . . . . 161

    6.4 Comparison of three different transmission schemes. . . . . . . . . . . 169


  • CHAPTER 1

    Introduction

    In this chapter, some important classes of mathematical optimization problems and the investigated signal processing problems are presented. The constraint specifications of two mathematical optimization classes, namely, linear programming and convex optimization, are introduced in Section 1.1. Conventional optimization problems, that is, least-squares, linear programs, convex optimization approaches and nonlinear optimization methods, are presented in Section 1.2. Being a super-set of classical optimization methods such as least-squares and linear programming, convex optimization extends our ability to solve richer classes of optimization problems. By reviewing the nonlinear optimization approaches, it will be seen that convex optimization also plays an important role even when the problems are non-convex. The tackled signal processing applications are described in Section 1.3. Finally, the thesis organization is provided in Section 1.4.

    1.1 Mathematical Optimization

    The mathematical optimization problem [1], or in short optimization problem, can

    be expressed as

        minimize_x    f0(x)
        subject to    fi(x) ≤ bi,   i = 1, 2, …, m.                         (1.1.1)

    The optimization variables are encapsulated in a vector x = [x1, x2, …, xn]^T. The objective function, the inequality constraint functions and their bounds are denoted by f0 : Rn → R, fi : Rn → R, i = 1, 2, …, m, and b1, b2, …, bm, respectively. The optimal vector, or a solution of (1.1.1), is symbolized by the vector x^⋆, which corresponds to the smallest objective value among all vectors that satisfy the constraints, that is, for any vector z ∈ Rn with f1(z) ≤ b1, f2(z) ≤ b2, …, fm(z) ≤ bm, the inequality f0(z) ≥ f0(x^⋆) holds.

    The classes of optimization problems can be characterized by particular forms of

    the objective function as well as constraint functions. An important example is linear


    program [2], where its objective function and constraint functions, f0, f1, …, fm, are linear, satisfying

        fi(αx + βy) = αfi(x) + βfi(y),   i = 0, 1, …, m,                    (1.1.2)

    for all x, y ∈ Rn and for all α, β ∈ R. If any of the objective and constraint functions fails to fulfill the linearity requirement, the problem becomes a nonlinear

    program.

    In this thesis, a class of optimization problems in the context of convex optimiza-

    tion [3] is investigated. This refers to an optimization problem where the objective

    and constraint functions are convex [4], that is,

        fi(αx + βy) ≤ αfi(x) + βfi(y),   i = 0, 1, …, m,                    (1.1.3)

    for all x, y ∈ Rn and for all α, β ∈ R+ with α + β = 1. From (1.1.2) and (1.1.3), it can be observed that convexity is more general than linearity: the inequality replaces the more restrictive equality, and it needs to hold only for certain values of α and β, namely those with α + β = 1. Since any linear program is a convex optimization problem, convex optimization can be considered a generalization of linear programming.
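
    As a small illustrative sketch (not part of the original thesis), the two definitions can be checked numerically: a linear function satisfies (1.1.2) with equality for any α and β, while a convex function such as the Euclidean norm satisfies the inequality (1.1.3) whenever α + β = 1 with α, β ≥ 0. The specific function choices below are assumptions made only for illustration.

        # Illustrative sketch: numerically check the linearity property (1.1.2) for
        # f(x) = c^T x and the convexity inequality (1.1.3) for f(x) = ||x||_2.
        import numpy as np

        rng = np.random.default_rng(0)
        n = 5
        c = rng.standard_normal(n)

        f_lin = lambda x: c @ x               # linear, satisfies (1.1.2) with equality
        f_cvx = lambda x: np.linalg.norm(x)   # convex, satisfies (1.1.3)

        for _ in range(1000):
            x, y = rng.standard_normal(n), rng.standard_normal(n)
            alpha = rng.uniform()
            beta = 1.0 - alpha
            assert np.isclose(f_lin(alpha * x + beta * y),
                              alpha * f_lin(x) + beta * f_lin(y))
            assert f_cvx(alpha * x + beta * y) <= alpha * f_cvx(x) + beta * f_cvx(y) + 1e-12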

    An algorithm that computes a solution of a class of optimization problems, to a certain accuracy, is called a solution method. Since the 1940s, many researchers have devoted themselves to developing such algorithms for solving various kinds of optimization problems, analyzing their properties, and implementing them in well-developed software packages [5]. However, the forms of the objective and constraint functions, together with the number of variables and constraints, limit our ability to solve (1.1.1). For some specially structured optimization problems, such as sparse problems, where each constraint function depends on only a small number of the variables, exploiting the sparsity [6] of the problem greatly reduces the solving effort. In general, however, even if the objective and constraint functions are smooth, the optimization problem (1.1.1) can be extremely difficult to solve. Still, there are exceptions. Some classes of optimization problems are friendly, and there are effective

    algorithms to deal with them even if the problem is large in scale, say, with thousands

    of variables and constraints. Two important and famous instances are least-squares


    problem [7] and, as previously mentioned, linear programs, and they are introduced

    in the following section. Another less famous exceptional class, which is the super-

    class of the previous two classes of problems, is convex optimization, and it has been

    one of the most active and exciting research areas in optimization recently. Like

    least-squares and linear programming, convex optimization problems come along with

    reliable and efficient algorithms to find the solutions.

    1.2 Conventional Optimization Problems

    In this section, the classical optimization problems, namely, least-squares and linear

    programming, which are essentially two sub-sets of the convex optimization problem,

    are reviewed. Then, after introducing the convex optimization problem and its relation to least-squares and linear programs, general nonlinear optimization is presented. As a compromise between the efficiency and the solution quality of nonlinear programming, convex optimization acts as a reliable alternative and can play an important role even when the original problem is non-convex.

    1.2.1 Least-Squares and Linear Programming

    Two widely adopted subclasses of convex optimization, namely, least-squares and

    linear programs are described in this subsection.

    Least-Squares

    A least-squares problem is an optimization problem without constraints, that is, m = 0, whose objective function f0 is a sum of squares of terms of the form ai^T x - bi:

        minimize_x    f0(x) = ||Ax - b||_2^2 = Σ_{i=1}^{k} (ai^T x - bi)^2,                 (1.2.1)

    where A ∈ R^(k×n) with k ≥ n, {ai^T} are the rows of A, and the optimization variables are stored in the vector x ∈ Rn. From (1.2.1), the solution satisfies the set of linear equations

        (A^T A) x = A^T b,                                                  (1.2.2)


    where the closed-form solution is

        x^⋆ = (A^T A)^(-1) A^T b

    with x^⋆ being the optimal vector. For least-squares problems, well-developed software and algorithms [8] exist, and their solving time is approximately proportional to n^2 k for a known coefficient matrix A. In some cases, the solution can be further speeded up by exploiting special structure in A. If the matrix A is sparse, meaning that it has far fewer than kn nonzero entries, the least-squares problem can often be solved in less than O(n^2 k) computational time. In general, the least-squares problem is a mature technology. The solution is readily computed once the problem is recognized or reformulated in least-squares form.
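
    As an illustrative sketch (not part of the original thesis), the closed-form solution above can be computed directly; the synthetic A and b below are assumptions made only for demonstration, and a QR/SVD-based routine is numerically preferable to forming the normal equations explicitly.

        # Illustrative sketch: solving the least-squares problem (1.2.1) through the
        # normal equations (1.2.2) and, more robustly, with numpy's built-in solver.
        import numpy as np

        rng = np.random.default_rng(1)
        k, n = 50, 4                                     # overdetermined system, k >= n
        A = rng.standard_normal((k, n))
        b = rng.standard_normal(k)

        x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # x* = (A^T A)^{-1} A^T b
        x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # QR/SVD-based solver

        print(np.allclose(x_normal, x_lstsq))            # True up to round-off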

    Recognizing a least-squares problem is also straightforward. The problem has a

    quadratic function as its objective function and there is no constraint. Two tech-

    niques are employed to increase its flexibility, namely, weighted least-squares and

    regularization.

    Weighted Least-Squares   The formulation of the weighted least-squares problem can be given as

        minimize_x    (Ax - b)^T W (Ax - b),                                (1.2.3)

    where W is a positive definite weighting matrix. Here, the entries of W are chosen to reflect the importance of the corresponding entries of (Ax - b)(Ax - b)^T. The analytical solution of (1.2.3) is x^⋆ = (A^T W A)^(-1) A^T W b. One common application is the estimation problem, where the parameter vector of interest x is estimated from given measurements corrupted by noise, with W^(-1) being the known noise covariance matrix.
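
    A brief illustrative sketch (not from the thesis) of this estimation use case follows; the diagonal noise covariance and the synthetic data are assumptions made only for demonstration.

        # Illustrative sketch: weighted least-squares (1.2.3) with W chosen as the
        # inverse of a known (here diagonal) noise covariance matrix.
        import numpy as np

        rng = np.random.default_rng(2)
        k, n = 50, 4
        A = rng.standard_normal((k, n))
        x_true = rng.standard_normal(n)
        noise_var = rng.uniform(0.1, 2.0, size=k)          # heteroscedastic noise variances
        b = A @ x_true + rng.standard_normal(k) * np.sqrt(noise_var)

        W = np.diag(1.0 / noise_var)                       # W = (noise covariance)^{-1}
        x_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)  # x* = (A^T W A)^{-1} A^T W b
        print(x_wls)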

    Regularization   In regularization, extra terms are added to the cost function. For example, a sum of squares of the variables can be added to the objective function:

        minimize_x    Σ_{i=1}^{k} (ai^T x - bi)^2 + λ Σ_{i=1}^{n} xi^2,                 (1.2.4)

    where λ > 0 is a user-defined parameter and x = [x1, x2, …, xn]^T. Large values of xi, i = 1, 2, …, n, are penalized by the extra terms, and hence the solution entries tend to be small. The parameter λ gives the user the freedom to choose the trade-off between making the original objective Σ_{i=1}^{k} (ai^T x - bi)^2 small while keeping Σ_{i=1}^{n} xi^2 not too big. Weighted least-squares and regularization are two common modifications of least-squares optimization, but they can also be employed in other classes of optimization methods, such as convex optimization.
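
    As an illustrative sketch (not part of the thesis), the regularized problem (1.2.4) has the closed-form solution x^⋆ = (A^T A + λI)^(-1) A^T b, obtained by setting the gradient to zero; the small experiment below only demonstrates the trade-off controlled by λ on assumed synthetic data.

        # Illustrative sketch: regularized least-squares (1.2.4) for several values
        # of the trade-off parameter lambda.
        import numpy as np

        rng = np.random.default_rng(3)
        k, n = 50, 4
        A = rng.standard_normal((k, n))
        b = rng.standard_normal(k)

        for lam in (0.0, 0.1, 10.0):
            x_reg = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
            fit = np.sum((A @ x_reg - b) ** 2)           # original objective
            size = np.sum(x_reg ** 2)                    # penalty term
            print(f"lambda={lam:5.1f}  fit={fit:8.3f}  ||x||^2={size:8.3f}")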

    Linear Programming

    Another classical optimization problem is linear programming, where the objective and constraint functions are linear:

        minimize_x    c^T x
        subject to    ai^T x ≤ bi,   i = 1, 2, …, m,                        (1.2.5)

    where c, a1, a2, …, am ∈ Rn are known parameter vectors and b1, b2, …, bm ∈ R are known scalar parameters, which vary with different linear programming problems.

    Unlike the least-squares problem, there is no simple analytical formula for the solution, but well-developed and efficient algorithms, namely, Dantzig's simplex method [9] and interior-point methods [10,11], are available to provide numerical solutions. The exact number of operations involved in solving a linear programming problem cannot be given in advance, but a rigorous bound can be established when interior-point methods are used. The complexity is around O(n^2 m), under the assumption m ≥ n. The above-mentioned methods can handle large-scale problems. When sparsity or other exploitable structure is present, problems with even more variables can be solved in practice.

    The form of (1.2.5) can be found in some applications, but in many other cases, a transformation is necessary. A simple example is the Chebyshev approximation problem [12]:

        minimize_x    max_{i=1,…,k} |ai^T x - bi|,                          (1.2.6)

    where x ∈ Rn is the variable vector, and the known parameters are denoted by a1, a2, …, ak ∈ Rn and b1, b2, …, bk ∈ R. For both least-squares and linear programs, the objective function evaluates the terms ai^T x - bi, i = 1, 2, …, k.


    In a linear program such as Chebyshev approximation, the objective involves the maximum of the absolute values of these terms, while its least-squares counterpart involves their sum of squares. Another important distinction is the non-differentiability of the objective function in (1.2.6), whereas the least-squares objective in (1.2.1) is quadratic and hence differentiable. Nevertheless, the Chebyshev approximation problem can be transformed into the following linear program

    program

    minimizet,x

    t

    subject to aTi x bi t, i = 1, 2 , k,(aTi x bi) t, i = 1, 2 , k,

    (1.2.7)

    with t ∈ R and x ∈ Rn, which is ready to be solved. Recognizing or reducing a problem to a linear program involves more knowledge than doing so for a least-squares problem. Nevertheless, like the least-squares problem, once the problem is transformed into or recognized as a linear program, well-developed software [13] can be applied.
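
    As an illustrative sketch (not part of the original thesis), the reformulation (1.2.7) can be handed to any general-purpose LP solver; the SciPy routine and the synthetic data below are assumptions made only for demonstration, with the stacked variable [x; t].

        # Illustrative sketch: Chebyshev approximation (1.2.6) solved through its
        # linear-program reformulation (1.2.7) using a generic LP solver.
        import numpy as np
        from scipy.optimize import linprog

        rng = np.random.default_rng(4)
        k, n = 30, 3
        A = rng.standard_normal((k, n))
        b = rng.standard_normal(k)

        c = np.r_[np.zeros(n), 1.0]                            # minimize t (last variable)
        A_ub = np.vstack([np.hstack([A, -np.ones((k, 1))]),    #  a_i^T x - b_i <= t
                          np.hstack([-A, -np.ones((k, 1))])])  # -(a_i^T x - b_i) <= t
        b_ub = np.r_[b, -b]

        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
        x_cheb, t_opt = res.x[:n], res.x[-1]
        print(t_opt, np.max(np.abs(A @ x_cheb - b)))           # the two values agree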

    1.2.2 Convex Optimization

    Convex optimization has been one of the most active and exciting research areas

    in optimization recently, and it refers to minimizing an objective function, which is

    convex but not necessarily differentiable, subject to convex constraints. The problem

    is formulated as

        minimize_x    f0(x)
        subject to    fi(x) ≤ bi,   i = 1, 2, …, m,                         (1.2.8)

    with convex functions f0, f1, …, fm : Rn → R satisfying (1.1.3). The least-squares problem (1.2.1) and the linear programming problem (1.2.5) are special cases of the convex optimization problem (1.2.8). Like linear programs, convex problems in general have no analytical solution, but there are effective methods to solve them. The interior-point method [10, 11] is one of the most practical solvers for convex problems: it solves (1.2.8) to a specified accuracy in a number of steps or iterations, with a number of operations that does not exceed a polynomial of the problem dimensions [14]. Ignoring the structure of the problem, such as sparsity, each


    step requires the order of

        max{n^3, n^2 m, F}                                                  (1.2.9)

    operations [15], where F is the cost of evaluating the first and second derivatives of the objective and constraint functions f0, f1, …, fm, which are assumed to be differentiable. Interior-point methods for solving convex optimization problems are relatively mature. As with least-squares and linear programs, exploiting problem sparsity makes problems of extremely large scale solvable in practice.

    The usage of convex optimization is more or less the same as that of least-squares and linear programming. By recognizing or formulating the optimization problem in convex form, it can be solved efficiently, just like solving least-squares and linear programming problems. However, recognizing a convex function or transforming a problem into a convex program can be difficult, and the technique is much more sophisticated than recognizing a least-squares or linear program. When a problem is cast into convex form, the structure of the optimal solution, which often reveals design insights, can be identified with rigorous optimality conditions and duality theory [16,17]. Furthermore, numerical algorithms are available to compute the solution, so the study of convex optimization focuses on the formulation techniques. Once the convex problem is formed, it can be claimed to be essentially solved.
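
    As an illustrative sketch of this 'formulate it and it is solved' workflow (not part of the original thesis, and assuming the CVXPY modeling package is installed), a small convex problem can be stated almost verbatim and passed to a generic solver; the data and constraints below are assumptions chosen only for demonstration.

        # Illustrative sketch, assuming CVXPY is available: once the problem is
        # expressed in convex form, a generic conic/interior-point solver returns
        # the global optimum.
        import numpy as np
        import cvxpy as cp

        rng = np.random.default_rng(5)
        A = rng.standard_normal((20, 5))
        b = rng.standard_normal(20)

        x = cp.Variable(5)
        objective = cp.Minimize(cp.norm(A @ x - b, 2))   # convex objective
        constraints = [x >= 0, cp.sum(x) <= 1]           # convex constraints
        problem = cp.Problem(objective, constraints)
        problem.solve()

        print(problem.status, problem.value)
        print(x.value)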

    In the past decade, convex optimization has become an essential tool in engineering because of the benefits that follow from convexity:

    1. Convex optimization gives a globally optimal solution which can be found efficiently and reliably.

    2. The solution can be computed to within any desired accuracy using well-developed numerical methods.

    The solutions of practical engineering problems, which are often large in scale, can thus be obtained in a relatively reliable and efficient manner. Convex optimization, to a certain extent, provides an indispensable modern computational tool which extends classical best-fit problem solvers such as least-squares and linear programming. A much larger and richer class of problems has been enabled by this


    optimization technique [18]. Breakthroughs in algorithms for solving convex problems [10, 11, 19-21] and advances in computing power equip us to solve these kinds of problems. New engineering applications are being proposed from almost every discipline, for instance, control [22-25], circuit design [26, 27], computer science [28], and signal processing [16, 17, 29-32].

    1.2.3 Nonlinear Optimization

    With nonlinear constraint functions and/or objective function, which may not be convex, the problem is said to be a nonlinear optimization, or nonlinear programming, problem. Unfortunately, there is no algorithm able to solve the general optimization problem stated in (1.1.1) with nonlinear and non-convex constraint or objective functions efficiently. Even problems with only a few variables can be extremely challenging, and exhaustive search may be unavoidable. Stochastic algorithms are blooming in recent optimization research: the genetic algorithm (GA) [33], which is implemented as a computer simulation in which a population of abstract representations (called chromosomes, or the genotype of the genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves toward better solutions; the jumping genes evolutionary algorithm [34], which introduces a genetic operator called jumping genes transposition to increase the ability of the GA to find extreme solutions; particle swarm optimization [35], which is an algorithm modeled on swarm intelligence that finds a solution to an optimization problem in a search space; ant colony optimization [36], which is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs; Markov chain Monte Carlo (MCMC) [37], which is a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution; and particle filters [38], which are usually used to estimate Bayesian models and are the sequential analogue of MCMC batch methods, often similar to importance sampling methods. Nevertheless, these algorithms fail to guarantee global convergence within a reasonable time, and they are computationally intensive. In general, methods for nonlinear optimization accept one of two compromises, namely, settling for local convergence or sacrificing computational efficiency.


    For local optimization, seeking the optimal x that minimizes the objective function over all feasible solutions is not the major concern. Instead, a point that is only locally optimal is obtained; that is, the obtained solution is not guaranteed to produce the smallest objective function value over all feasible points. Local optimization methods are fast and able to handle large-scale problems. The only requirement for a local optimization method like Newton's method [39] is a differentiable objective function and differentiable constraints. Therefore, local optimization is widely adopted in engineering design applications. However, apart from the possibility of local convergence, the methods require an initial estimate for the optimization variable x, meaning that the quality of the solution depends heavily on how far the initial guess is from the globally optimal point. Local optimization methods are also sensitive to the user-defined parameters in the algorithm, hence careful adjustment is required for a particular family of problems. The procedure of applying local optimization methods involves choosing a suitable algorithm, adjusting algorithm parameters, and finding a high-quality initial guess or a method for generating one.

    Comparing convex optimization and local optimization for nonlinear optimization problems, the former provides a globally optimal solution and is able to handle even a non-differentiable objective function or constraints, although relaxation may be needed. To achieve a globally optimal solution using exhaustive search on the general optimization problem (1.1.1), efficiency is inescapably sacrificed: the solving time is expected to grow exponentially with the problem size. When the number of problem variables is small and the computation time is not critical, it may be worthwhile to find the true global optimum. Examples are worst-case analysis [40] and the verification of an expensive or safety-critical system in engineering design [41].

    On the other hand, convex optimization also plays an important role even when the problem is non-convex. First, the solution obtained from convex optimization can serve as a high-quality initial estimate for local optimization: the original problem is approximated by a convex problem, the approximated problem is solved in a relatively straightforward manner, and the solution is then used as the starting point for the local optimization method. Second, some nondeterministic polynomial-time hard (NP-hard) [42] combinatorial problems can be approximated by convex formulations, and the approximated problem can be solved in polynomial time. Finally, in some applications, an exact solution is not important, and a lower bound on the optimal value is sought with a relatively low computational burden. Two methods to fulfill such a requirement are based on relaxation, which essentially replaces each non-convex constraint with a looser but convex constraint, and Lagrangian relaxation, where the Lagrangian dual problem [20] is solved instead of the primal problem, as the primal and dual solutions are not always the same. The dual problem is convex and it provides a lower bound on the optimal value of the non-convex problem.

    To conclude, convex optimization plays an important role in nonlinear optimization problems. Some signal processing problems are introduced in this thesis as

    examples to illustrate the contribution of convex optimization.

    1.3 Applications to Signal Processing Problems

    In this thesis, some important signal processing problems are addressed from the convex point of view. With these enhanced classes of optimization, new constraints or requirements can be added to the original problems, or new insights can be gained through duality theory. However, limitations also exist for the convex optimization technique. When the problem is non-convex in nature and it is impossible to make it convex, there is no option but to relax the problem. Performance is unavoidably degraded by such relaxation, and the degradation in estimation accuracy depends heavily on the relaxation technique applied. Guidelines can be outlined, but there is no standard procedure to follow; therefore, dependence on empirical experience is another major limitation in applying convex optimization. Also, it is difficult to devise a performance analysis for a convex formulation, as there are no analytic solutions in general. In this thesis, the process of convex reformulation is demonstrated with some signal processing problems.

    Source localization [43], sinusoidal parameter estimation [44], polynomial root finding [45] and the determination of the capacity region [46] are tackled from a convex optimization perspective. The problems are first formulated as optimization problems, and they are either relaxed or transformed into convex problems to yield


    high-fidelity global solutions.

    The optimization problem described in (1.1.1) is an abstract problem of making

    the best possible choice of a vector in Rn from a set of candidate choices. The

    variable x represents the choice made, the constraints fi(x) ≤ bi, i = 1, 2, …, m, denote the firm requirements or the specifications that limit the possible choices, and the objective function represents the cost of choosing x. A solution x^⋆ of the

    optimization problem (1.1.1) corresponds to a choice that has the minimum cost.

    1.3.1 Source Localization

    In the localization problem, the positions of targets, such as sensor nodes or mobile terminals, are the parameters of interest, given position-bearing measurements, such as time-of-arrival (TOA), time-difference-of-arrival (TDOA) or angle-of-arrival, together with the known coordinates of receivers. For single-source TOA-based positioning, which is the simplest case of localization, the case of m receivers with known positions at xi = [xi, yi]^T, i = 1, 2, …, m, is considered and the target is located at the unknown position x = [x, y]^T. It is worth noting that this can be directly upgraded to the 3-dimensional case by including the z-coordinate in xi and x. The distance between the target and the ith receiver, which is obtained by multiplying the known propagation speed by the corresponding TOA measurement, is

        di = ||x - xi||_2 + εi,   i = 1, 2, …, m,                           (1.3.1)

    where εi ∈ R is the error of the ith measurement. Assuming the noise is a zero-mean white Gaussian process, the maximum-likelihood (ML) estimator is [47]:

        minimize_{x, {ri}}    Σ_{i=1}^{m} (ri - di)^2
        subject to            ri = ||x - xi||_2,   i = 1, 2, …, m,          (1.3.2)

    where the norm equality constraints do not fit the convex framework, and hence (1.3.2) is a non-convex problem. With the convex optimization technique of semidefinite relaxation, Cheung et al. [48] proposed a convex program formulation for this single-source case.
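
    As an illustrative sketch (not the semidefinite relaxation developed in this thesis), the measurement model (1.3.1) and the ML cost in (1.3.2) can be simulated and minimized with a generic local nonlinear least-squares solver; the receiver geometry, noise level and solver choice below are assumptions made only for demonstration.

        # Illustrative sketch: simulate TOA range measurements (1.3.1) and estimate
        # the target position by locally minimizing the ML/NLS cost of (1.3.2).
        import numpy as np
        from scipy.optimize import least_squares

        rng = np.random.default_rng(6)
        receivers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])  # x_i
        x_true = np.array([3.5, 2.5])
        sigma = 0.1

        # d_i = ||x - x_i||_2 + noise
        d = np.linalg.norm(x_true - receivers, axis=1) + sigma * rng.standard_normal(4)

        # residuals r_i(x) = ||x - x_i||_2 - d_i; their sum of squares is the ML cost
        residuals = lambda x: np.linalg.norm(x - receivers, axis=1) - d

        x0 = receivers.mean(axis=0)            # crude initial guess (centroid)
        x_hat = least_squares(residuals, x0).x
        print(x_hat)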


    The problem has been extended to multiple sources in a collaborative environ-

    ment, which corresponds to sensor network node localization, and the related convex

    formulations are presented by Biswas et al. [49,50]. However, they concentrate on the case where the anchor positions and the propagation speed are perfectly known, which is not valid in some applications. In this thesis, node localization in the presence of uncertainties in anchor positions and/or propagation speed is tackled. Furthermore, source localization in non-line-of-sight propagation [51], which contributes significant error, is addressed. Source localization using TDOA measurements is also studied based on convex optimization. The development of these kinds of problems using relaxation techniques is presented in Chapter 3.

    1.3.2 Sinusoidal Parameter Estimation

    The simplest case of the problem, namely, estimation of the parameters of a single real sinusoid [52], is first considered, and its discrete-time signal model is

        x(i) = α cos(ωi + φ) + q(i),   i = 1, 2, …, m,                      (1.3.3)

    where α ∈ R_++, ω ∈ (0, π) and φ ∈ [0, 2π) are unknown but deterministic constants which represent the tone amplitude, frequency and phase, respectively, while the noise q(i) is assumed to be a zero-mean white process with unknown variance. The objective is to find α, ω and φ given the m samples of {x(i)}. Similar to (1.3.2) and (1.1.1), the ML sinusoidal parameter estimator is

        minimize_{α, ω, φ, {s(i)}}    Σ_{i=1}^{m} (s(i) - x(i))^2
        subject to                    s(i) = α cos(ωi + φ),   i = 1, 2, …, m,          (1.3.4)

    where s(i) is the estimate of the noise-free signal. The constraint functions involve the nonlinear operator cos(·), which corresponds to a nonlinear and non-convex optimization problem, but relaxation can be utilized to yield a practically solvable problem. With the same principle, other related signal models, that is, the single/multiple complex tone [53-55], the single two-dimensional complex tone [56] and the polynomial phase signal [57,58], are taken into consideration in Chapter 4.
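
    As an illustrative baseline (not the semidefinite relaxation developed in Chapter 4), the multi-modal NLS cost of (1.3.4) can be explored directly: for a fixed trial frequency the model is linear in two coefficients determined by the amplitude and phase, so those are found by linear least-squares, and a coarse grid over the frequency locates the cost minimum. The signal parameters and grid below are assumptions chosen only for demonstration.

        # Illustrative sketch: coarse grid search over frequency for the single real
        # tone model x(i) = alpha*cos(omega*i + phi) + q(i); amplitude and phase are
        # recovered by linear least-squares for each candidate frequency.
        import numpy as np

        rng = np.random.default_rng(7)
        m = 100
        alpha, omega, phi = 1.0, 0.9, 0.4
        i = np.arange(1, m + 1)
        x = alpha * np.cos(omega * i + phi) + 0.2 * rng.standard_normal(m)

        best_cost, best_fit = np.inf, None
        for w in np.linspace(0.01, np.pi - 0.01, 2000):
            B = np.column_stack([np.cos(w * i), -np.sin(w * i)])  # basis for fixed w
            coef, *_ = np.linalg.lstsq(B, x, rcond=None)
            cost = np.sum((B @ coef - x) ** 2)                    # NLS cost (1.3.4)
            if cost < best_cost:
                best_cost, best_fit = cost, (w, coef)

        w_hat, (c1, c2) = best_fit
        print(w_hat, np.hypot(c1, c2), np.arctan2(c2, c1))        # omega, alpha, phi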


    1.3.3 Polynomial Root-Finding

    The problem of solving a polynomial equation is to find x that satisfies

        p(x) = p0 + p1 x + p2 x^2 + … + pn x^n = 0,                         (1.3.5)

    where pi is the ith real coefficient of the polynomial. To transform it into an optimization problem like (1.1.1), |p(x)|^2 is minimized, and hence the root-finding problem becomes

        minimize_{x, x̄}    |p^T x̄|^2
        subject to          x̄i = x^i,   i = 0, 1, …, n,                     (1.3.6)

    where the vector p = [p0, p1, …, pn]^T stores the coefficients and x̄ = [x^0, x^1, …, x^n]^T. The challenging polynomial equality constraints, x̄i = x^i, i = 0, 1, …, n, in (1.3.6) make the problem nonlinear, hence relaxation is needed to solve it. The relaxed optimization problem based on convex techniques is developed in Chapter 5.
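
    As an illustrative sketch (not the semidefinite relaxation developed in Chapter 5), the objective of (1.3.6) vanishes exactly at the real roots of p(x); the example polynomial below is a hypothetical one chosen to have known real roots, and numpy's companion-matrix routine is used only as a conventional reference.

        # Illustrative sketch: evaluate |p^T xbar|^2 from (1.3.6), where
        # xbar = [1, x, x^2, ..., x^n]^T, and compare with a standard root finder.
        import numpy as np

        p = np.array([1.0, -2.5, 0.5, 1.0])    # p(x) = 1 - 2.5x + 0.5x^2 + x^3
        n = len(p) - 1                         # (real roots at 0.5, 1 and -2)

        def objective(x):
            xbar = x ** np.arange(n + 1)       # monomial vector [1, x, ..., x^n]
            return (p @ xbar) ** 2             # |p^T xbar|^2

        roots = np.roots(p[::-1])              # numpy expects highest degree first
        real_roots = roots[np.isclose(roots.imag, 0)].real

        for r in real_roots:
            print(r, objective(r))             # objective is ~0 at every real root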

    1.3.4 One-Sided Parallel Gaussian Interference Channels

    The determination of the capacity region of L parallel Gaussian interference channels

    [59] is an open problem. By considering a one-sided situation, which is a special case of the channel, the sum capacity is shown to be a convex function of the two users' powers, and the optimization problem can be expressed as:

        maximize_{p1, p2}    Σ_{l=1}^{L} Ca^(l)(p1^(l), p2^(l))
        subject to           p ∈ P,                                         (1.3.7)

    where p1, p2 ∈ R^L denote the power allocations of the two users, namely, user 1 and user 2, p is the power allocation pair (p1, p2), P is the feasible set of the power allocation, and Ca(p1, p2), which is convex but non-differentiable, is the sum capacity function of a single one-sided Gaussian interference channel. Unlike the previous problems, where relaxation is the major technique, exploiting the inherent structure in (1.3.7) is the key concept for dealing with this sum capacity problem. Based on our findings about the inherent structure, a numerical algorithm is proposed to compute the sum capacity. The details of the algorithm as well as the formulation of the sum capacity problem are presented in Chapter 6.


    1.4 Thesis Organization

    The organization of this thesis is as follows. A brief introduction to convex optimization techniques and theory is given in Chapter 2, where the disciplines of convex optimization, including optimization theory, convex analysis and numerical computation, duality theory and types of convex optimization are presented. Then, the signal processing problems, namely, source localization, sinusoidal parameter estimation, polynomial root finding and the determination of the capacity region, together with their common applications and the development of their convex formulations, are presented in Chapters 3 to 6, respectively. TOA- and TDOA-based localization are studied in Chapter 3. For TOA-based localization, algorithms for sensor network localization with uncertainties in anchor positions and/or propagation speed are developed. Furthermore, the semidefinite relaxation formulation for identifying NLOS measurements is derived. For TDOA source localization, two relaxations, on linear least-squares and on ML, are proposed. Sinusoidal parameter estimation, including the single complex/real tone, multiple complex tones, two-dimensional complex tone and polynomial phase signal, is presented in Chapter 4. Based on relaxations of the periodogram, ML and nonlinear least-squares formulations, algorithms are proposed under the convex framework. The application of convex optimization to polynomial root-finding is described in Chapter 5. In Chapter 6, the sum capacity of a Gaussian interference channel is investigated; the algorithm to obtain the sum capacity is proposed and the related theoretical proofs are presented. Finally, conclusions and possible future work are presented in Chapter 7.

  • CHAPTER 2

    Preliminaries

    In this chapter, some background information about convex optimization theory is introduced. The disciplines of convex optimization, duality theory and the types of convex optimization are presented in the following sections.

    2.1 Disciplines of Convex Optimization

    Inheriting the wisdom accumulated from three disciplines, namely, convex analysis [60-62], optimization theory [63-66] and numerical computation [6, 67-69], convex optimization can be regarded as a fusion of these research topics.

    In this section, the three disciplines are briefly introduced in order to give a better understanding of the scope of convex optimization. Optimization theory, which contributes the notation and problem forms of convex optimization, is presented first. Convex analysis equips engineers with the ability to distinguish and to handle convex sets and convex functions; these two convex objects are critical for researchers to recognize convex problems or to transform problems into the convex framework. Numerical computation is vital for solving the problem in an efficient and reliable manner. Together, they are the three pillars of convex optimization.

    2.1.1 Optimization Theory

    In this subsection, optimization theory is introduced. Convex optimization adopts its notation and representation from the general optimization discipline. To better understand the meaning of a convex problem, the general notation and terms of optimization theory are presented. Finally, some tricks for handling difficult functions, developed in the field of general optimization, are introduced.

    Optimization is the study of systematically seeking a set of real or integer solutions from an allowed set to minimize or maximize a real objective function [1]. Mathematically, it can be written as (1.1.1), where an equality constraint is converted to two inequality constraints and the bounds bi are included in the


    constraint functions. More specifically, the constraints can be divided into inequality

    and equality constraints, so that (1.1.1) can be rewritten as:

        minimize_x    f0(x)
        subject to    fi(x) ≤ 0,   i = 1, 2, …, m,
                      hi(x) = 0,   i = 1, 2, …, p,                          (2.1.1)

    which describes the problem of finding the optimization variable x ∈ Rn minimizing the objective function f0(x), f0 : Rn → R, within the feasible set satisfying the conditions fi(x) ≤ 0, i = 1, 2, …, m, and hi(x) = 0, i = 1, 2, …, p, which refer to the inequality constraints and the equality constraints, respectively. The functions fi : Rn → R and hi : Rn → R are called inequality constraint functions and equality constraint functions, respectively. The allowed set of the problem is defined as the intersection of the domains of all constraint functions:

        D = (∩_{i=0}^{m} dom fi) ∩ (∩_{i=1}^{p} dom hi),                     (2.1.2)

    where a point x ∈ D is called feasible and D is the feasible set or constraint set. The problem (2.1.1) is said to be feasible if at least one point satisfies all the constraints

    and infeasible otherwise.

    Optimization theory provides a standard mathematical formulation of problems. By restricting the constraints and the objective function to comply with the convexity requirements, which are characterized by convex analysis, the problem can be said to be convex.

    The optimal value of (2.1.1) is denoted by o^⋆ and is defined as

        o^⋆ = inf{f0(x) | fi(x) ≤ 0, i = 1, 2, …, m, hi(x) = 0, i = 1, 2, …, p},        (2.1.3)

    where o^⋆ ∈ [-∞, ∞]. The optimal value is o^⋆ = ∞ if the problem is infeasible. When there is at least one feasible point xk with f0(xk) → -∞, (2.1.1) is unbounded below and o^⋆ = -∞. The optimal point x^⋆ is defined by f0(x^⋆) = o^⋆, which means that putting the solution into the objective produces the lowest value over the feasible set.

    In some problems, optimal points may not be unique, and they are collected in a set Xopt:

        Xopt = {x | fi(x) ≤ 0, i = 1, 2, …, m, hi(x) = 0, i = 1, 2, …, p, f0(x) = o^⋆},        (2.1.4)

    where if Xopt is a non-empty set, (2.1.1) is solvable, and otherwise it is unsolvable, including the situation of being unbounded below. A suboptimal solution x with f0(x) ≤ o^⋆ + ε, where ε > 0, is called ε-suboptimal for (2.1.1). A locally optimal x is defined by

    f0(x) = inf{f0(z) | fi(z) 0, i = 1, 2, ,m,hi(z) = 0, i = 1, 2, , p, z x2 R

    },

    (2.1.5)

    where R > 0 and z Rn is the variable vector.Let x be a feasible point and fi(x) = 0, i = 1, 2, ,m, the inequality constraint

    fi(x) 0 is active at x otherwise inactive, and inactive constraints are redundant.The feasible set does not change if the constraint is omitted.

In some problems, the objective value is identically zero or some constant; these are classified as feasibility problems:

$$
\begin{array}{ll}
\underset{\mathbf{x}}{\text{minimize}} & 0 \\
\text{subject to} & f_i(\mathbf{x}) \le 0, \quad i = 1, 2, \ldots, m, \\
& h_i(\mathbf{x}) = 0, \quad i = 1, 2, \ldots, p,
\end{array}
\tag{2.1.6}
$$

where the feasibility problem is to find a point that satisfies all the constraints; if no such point exists, the problem is unsolvable.

The optimization problem in (2.1.1) is in standard form. The convention of the standard form is to place zeros on the right-hand side of the inequality and equality constraints. For instance, the equality constraint $g_i(\mathbf{x}) = \tilde{g}_i(\mathbf{x})$ is represented as $h_i(\mathbf{x}) = 0$, where $h_i(\mathbf{x}) = g_i(\mathbf{x}) - \tilde{g}_i(\mathbf{x})$, and the inequality $f_i(\mathbf{x}) \ge 0$ is expressed as $-f_i(\mathbf{x}) \le 0$. A maximization problem can be replaced by minimizing the negated objective function $-f_0(\mathbf{x})$ subject to the same constraints.
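As a rough, self-contained illustration (not taken from the thesis), the following Python sketch poses a tiny convex problem in the standard form (2.1.1) with the cvxpy modeling package, which plays a role analogous to the CVX package cited later in this chapter; the objective, data and constraints are assumed purely for illustration.

```python
# A minimal sketch (assumed toy problem, not from the thesis) of the standard
# form (2.1.1): one convex objective f_0, inequality constraints f_i(x) <= 0
# and one equality constraint h_1(x) = 0, solved with cvxpy.
import numpy as np
import cvxpy as cp

x = cp.Variable(2)
f0 = cp.sum_squares(x - np.array([1.0, 2.0]))   # convex objective f_0(x)
constraints = [
    -x <= 0,               # f_1, f_2: x >= 0 written with zero right-hand side
    cp.sum(x) - 1 == 0,    # h_1: equality constraint in standard form
]
prob = cp.Problem(cp.Minimize(f0), constraints)
prob.solve()
print(prob.value, x.value)   # optimal value o* and an optimal point x*
```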

Unsurprisingly, there are problems that appear intractable in practice, although some of them are computationally friendly. Engineers can often make their lives easier by transforming complicated problems into solvable programs. Several such techniques have been developed within optimization theory, namely, substitution of variables, transformation of functions, insertion of slack variables, divide-and-conquer, and employing the epigraph problem form.

    Substitution of Variables

It is assumed that a function $\phi : \mathbb{R}^n \rightarrow \mathbb{R}^n$ is one-to-one with $\phi(\operatorname{dom}\phi) \supseteq \mathcal{D}$, where $\mathcal{D}$ is the problem domain. The functions $\tilde{f}_i$ and $\tilde{h}_i$ are defined as

$$
\begin{array}{ll}
\tilde{f}_i(\mathbf{z}) = f_i(\phi(\mathbf{z})), & i = 0, 1, \ldots, m, \\
\tilde{h}_i(\mathbf{z}) = h_i(\phi(\mathbf{z})), & i = 1, 2, \ldots, p.
\end{array}
\tag{2.1.7}
$$

Substituting $\mathbf{x} = \phi(\mathbf{z})$ in (2.1.1) gives

$$
\begin{array}{ll}
\underset{\mathbf{z}}{\text{minimize}} & \tilde{f}_0(\mathbf{z}) \\
\text{subject to} & \tilde{f}_i(\mathbf{z}) \le 0, \quad i = 1, 2, \ldots, m, \\
& \tilde{h}_i(\mathbf{z}) = 0, \quad i = 1, 2, \ldots, p,
\end{array}
\tag{2.1.8}
$$

with $\mathbf{z} \in \mathbb{R}^n$. The two problems characterized by (2.1.1) and (2.1.8) are equivalent. If (2.1.1) can be solved efficiently with solution $\mathbf{x}^\star$, then (2.1.8) is solved through the inverse mapping $\mathbf{z}^\star = \phi^{-1}(\mathbf{x}^\star)$. This technique helps us change the target variables: in some situations a problem is difficult to solve in terms of particular variables but easy to solve in terms of other variables.
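As a rough illustration of the idea (not from the thesis), the following Python sketch uses the assumed toy problem of minimizing $x + 1/x$ over $x > 0$; with the one-to-one map $x = \phi(z) = e^z$ the positivity constraint disappears, the problem becomes unconstrained in $z$, and the solution is recovered as $x^\star = \phi(z^\star)$.

```python
# A minimal sketch of substitution of variables on the assumed toy problem
# minimize_{x > 0} x + 1/x; with x = phi(z) = exp(z) the problem is
# unconstrained in z, and the answer is mapped back through phi.
import numpy as np
from scipy.optimize import minimize_scalar

f_original = lambda x: x + 1.0 / x                  # defined for x > 0
f_transformed = lambda z: f_original(np.exp(z))     # f_tilde(z) = f(phi(z))

res = minimize_scalar(f_transformed)                # unconstrained search over z
x_star = np.exp(res.x)                              # map back: x* = phi(z*)
print(x_star, f_original(x_star))                   # approximately 1.0 and 2.0
```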

    Transformation of Functions

Given that $\psi_0 : \mathbb{R} \rightarrow \mathbb{R}$ is monotone increasing, $\psi_1, \psi_2, \ldots, \psi_m : \mathbb{R} \rightarrow \mathbb{R}$ fulfill $\psi_i(u) \le 0$, $i = 1, 2, \ldots, m$, if and only if $u \le 0$, and $\psi_{m+1}, \psi_{m+2}, \ldots, \psi_{m+p} : \mathbb{R} \rightarrow \mathbb{R}$ satisfy $\psi_i(u) = 0$, $i = m+1, m+2, \ldots, m+p$, if and only if $u = 0$. The composite functions $\tilde{f}_i$ and $\tilde{h}_i$ are defined as

$$
\begin{array}{ll}
\tilde{f}_i(\mathbf{x}) = \psi_i(f_i(\mathbf{x})), & i = 0, 1, \ldots, m, \\
\tilde{h}_i(\mathbf{x}) = \psi_{m+i}(h_i(\mathbf{x})), & i = 1, 2, \ldots, p,
\end{array}
\tag{2.1.9}
$$

and the problem is transformed as

$$
\begin{array}{ll}
\underset{\mathbf{x}}{\text{minimize}} & \tilde{f}_0(\mathbf{x}) \\
\text{subject to} & \tilde{f}_i(\mathbf{x}) \le 0, \quad i = 1, 2, \ldots, m, \\
& \tilde{h}_i(\mathbf{x}) = 0, \quad i = 1, 2, \ldots, p,
\end{array}
\tag{2.1.10}
$$


where the feasible set remains unchanged. This technique assists us in handling some complicated functions by wrapping them with a monotone increasing function. The problem may become solvable after applying such an alteration, without changing the optimal point. A subclass of convex programming, geometric programming (GP) [70], is essentially based on this technique, and GP will be reviewed in Section 2.3.1.
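As a rough numerical illustration (assumed data, not from the thesis), the following Python sketch wraps the non-smooth objective $\|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2$ with the monotone increasing function $\psi_0(u) = u^2$ (on $u \ge 0$); the transformed objective is smooth while the minimizer is unchanged.

```python
# A minimal sketch of transformation of functions: ||Ax - b||_2 and its
# monotone transformation ||Ax - b||_2^2 share the same minimizer.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 3)), rng.standard_normal(20)

f0 = lambda x: np.linalg.norm(A @ x - b)              # original objective
f0_tilde = lambda x: np.linalg.norm(A @ x - b) ** 2   # transformed objective

x1 = minimize(f0, np.zeros(3)).x
x2 = minimize(f0_tilde, np.zeros(3)).x
print(x1, x2)   # the two minimizers agree up to solver tolerance
```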

    Insertion of Slack Variables

Inserting slack variables $s_i \ge 0$ is a procedure to convert an inequality constraint to an equality constraint and vice versa, that is, $f_i(\mathbf{x}) \le 0 \Leftrightarrow f_i(\mathbf{x}) + s_i = 0,\ s_i \ge 0$. The transformed optimization problem is

$$
\begin{array}{ll}
\underset{\mathbf{x},\,\mathbf{s}}{\text{minimize}} & f_0(\mathbf{x}) \\
\text{subject to} & s_i \ge 0, \quad i = 1, 2, \ldots, m, \\
& f_i(\mathbf{x}) + s_i = 0, \quad i = 1, 2, \ldots, m, \\
& h_i(\mathbf{x}) = 0, \quad i = 1, 2, \ldots, p,
\end{array}
\tag{2.1.11}
$$

where $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{s} = [s_1, s_2, \ldots, s_m]^T$ are the vector containing the optimization variables and the vector storing the slack variables, respectively. In (2.1.11), there are $n + m$ variables, $m$ inequality constraints, which limit $s_i$, $i = 1, 2, \ldots, m$, to be nonnegative, and $m + p$ equality constraints. The technique does not simplify the original problem, but it helps us evaluate how tight an approximating problem is. In Section 1.2.3 (see page 10), relaxation is mentioned, which replaces constraints with looser but convex constraints; in a relaxation, equality constraints are typically loosened into inequality constraints, so by introducing slack variables it is possible to evaluate how good the approximation is. The value of $\sum_{i=1}^{m} s_i$ is inversely proportional to the tightness of the relaxation.
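As a rough illustration (assumed toy linear program, not from the thesis), the following Python sketch converts inequality constraints into equalities with nonnegative slack variables and confirms that both formulations return the same optimal value.

```python
# A minimal sketch of inserting slack variables: A_ub x <= b_ub is rewritten
# as A_ub x + s = b_ub with s >= 0, and the two forms are equivalent.
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -2.0])
A_ub = np.array([[1.0, 1.0], [1.0, -1.0]])
b_ub = np.array([4.0, 2.0])

# Inequality form.
res_ineq = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)

# Equality form with slacks: the variable vector becomes [x; s].
c_eq = np.concatenate([c, np.zeros(2)])
A_eq = np.hstack([A_ub, np.eye(2)])
res_eq = linprog(c_eq, A_eq=A_eq, b_eq=b_ub, bounds=[(0, None)] * 4)

print(res_ineq.fun, res_eq.fun)   # identical optimal objective values
```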

    Divide-and-Conquer

In some optimization problems, there is

$$
\inf_{\mathbf{x},\,\mathbf{y}} f(\mathbf{x}, \mathbf{y}) = \inf_{\mathbf{x}} \tilde{f}(\mathbf{x}),
\tag{2.1.12}
$$

where

$$
\tilde{f}(\mathbf{x}) = \inf_{\mathbf{y}} f(\mathbf{x}, \mathbf{y}),
\tag{2.1.13}
$$


with two sets of optimization variables encapsulated in $\mathbf{x}$ and $\mathbf{y}$. That is, a function can be minimized by first minimizing over some of the variables and then minimizing over the remaining ones. In the standard form (2.1.1), it is assumed that the variable $\mathbf{x} \in \mathbb{R}^n$ is partitioned as $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2)$ with $\mathbf{x}_1 \in \mathbb{R}^{n_1}$, $\mathbf{x}_2 \in \mathbb{R}^{n_2}$ such that $n_1 + n_2 = n$. Then, by grouping the constraints according to the partitioned variables, the following problem is obtained:

$$
\begin{array}{ll}
\underset{\mathbf{x}_1,\,\mathbf{x}_2}{\text{minimize}} & f_0(\mathbf{x}_1, \mathbf{x}_2) \\
\text{subject to} & f_i(\mathbf{x}_1) \le 0, \quad i = 1, 2, \ldots, m_1, \\
& \tilde{f}_i(\mathbf{x}_2) \le 0, \quad i = 1, 2, \ldots, m_2,
\end{array}
\tag{2.1.14}
$$

where the constraints $f_i$, $i = 1, 2, \ldots, m_1$, and $\tilde{f}_i$, $i = 1, 2, \ldots, m_2$, are independent, in the sense that each constraint function depends only on $\mathbf{x}_1$ or only on $\mathbf{x}_2$. The variable $\mathbf{x}_2$ is minimized over first, and $\tilde{f}_0$ is defined in terms of $\mathbf{x}_1$ only:

$$
\tilde{f}_0(\mathbf{x}_1) = \inf\{ f_0(\mathbf{x}_1, \mathbf{z}) \mid \tilde{f}_i(\mathbf{z}) \le 0,\ i = 1, 2, \ldots, m_2 \},
\tag{2.1.15}
$$

with the variable $\mathbf{z} \in \mathbb{R}^{n_2}$ playing the role of $\mathbf{x}_2$. The problem (2.1.14) is then equivalent to

$$
\begin{array}{ll}
\underset{\mathbf{x}_1}{\text{minimize}} & \tilde{f}_0(\mathbf{x}_1) \\
\text{subject to} & f_i(\mathbf{x}_1) \le 0, \quad i = 1, 2, \ldots, m_1,
\end{array}
\tag{2.1.16}
$$

where the number of variables is smaller than in (2.1.14). The computation time generally grows with the number of variables; from (1.2.9), it is observed that the computation time is a cubic function of the number of variables involved. Dividing the variables and performing the optimization one group at a time is therefore a way to reduce the computational time.
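As a rough illustration (assumed toy function, not from the thesis), the following Python sketch applies partial minimization to $f(x, y) = (x-1)^2 + (x-y)^2 + y^2$: minimizing over $y$ first gives $y = x/2$, so the reduced problem $\tilde{f}(x) = (x-1)^2 + x^2/2$ involves only one variable, and both routes reach the same minimizer.

```python
# A minimal sketch of divide-and-conquer (partial minimization).
import numpy as np
from scipy.optimize import minimize, minimize_scalar

f = lambda v: (v[0] - 1) ** 2 + (v[0] - v[1]) ** 2 + v[1] ** 2

# Joint minimization over (x, y).
joint = minimize(f, np.zeros(2)).x

# Partial minimization: reduced objective in x only, then recover y = x/2.
f_tilde = lambda x: (x - 1) ** 2 + 0.5 * x ** 2
x_star = minimize_scalar(f_tilde).x
y_star = x_star / 2

print(joint, (x_star, y_star))   # both give approximately (2/3, 1/3)
```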

    Employing Epigraph Form

The standard form (2.1.1) can be represented by the following epigraph form:

$$
\begin{array}{ll}
\underset{\mathbf{x},\,t}{\text{minimize}} & t \\
\text{subject to} & f_0(\mathbf{x}) - t \le 0, \\
& f_i(\mathbf{x}) \le 0, \quad i = 1, 2, \ldots, m, \\
& h_i(\mathbf{x}) = 0, \quad i = 1, 2, \ldots, p,
\end{array}
\tag{2.1.17}
$$


where $\mathbf{x} \in \mathbb{R}^n$ and $t \in \mathbb{R}$ are the variables. The problem is to minimize $t$ over the epigraph of $f_0$, subject to the constraints on $\mathbf{x}$. This technique has been demonstrated by the transformation from (1.2.6) to (1.2.7), where the non-differentiable objective function in (1.2.6) is replaced by its epigraph. When $n = 1$, $\mathbf{x}$ is a scalar denoted by $x$, and the optimization problem in $(x, t)$ can be interpreted geometrically. This is illustrated in Figure 2.1, where the optimal point is denoted by $(x^\star, t^\star)$.

    Figure 2.1: Geometric interpretation of epigraph form problem.
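As a rough illustration of the epigraph trick (assumed data, not from the thesis), the following Python sketch converts the non-differentiable problem of minimizing $\max_i (a_i x + b_i)$ into a linear program in the variables $(x, t)$, exactly as in (2.1.17).

```python
# A minimal sketch of the epigraph form: minimize t subject to
# a_i * x + b_i - t <= 0 for all i, solved as a small LP.
import numpy as np
from scipy.optimize import linprog

a = np.array([1.0, -0.5, 2.0])
b = np.array([0.0, 1.0, -1.0])

c = np.array([0.0, 1.0])                         # variables are [x, t]; objective is t
A_ub = np.column_stack([a, -np.ones_like(a)])    # a_i * x - t <= -b_i
b_ub = -b
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None), (None, None)])

x_opt, t_opt = res.x
print(x_opt, t_opt, np.max(a * x_opt + b))       # t_opt equals the minimized maximum
```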

Optimization theory thus provides convex optimization with notation, symbols and techniques. The notation and symbols of optimization theory are adopted by convex optimization, so convex optimization does not need to develop its own language. The techniques for transforming optimization problems assist us in solving some apparently intractable problems.

In the next subsection, the properties of sets and functions, which fall within the discipline of convex analysis, are the focus. With knowledge of sets and functions, it is possible to distinguish the different properties of the constraint sets and the objective function.

    2.1.2 Convex Analysis

The origins of convex analysis date back to antiquity, but the name itself only appeared in the 1960s [4]. Many discoveries in convex analysis are recent, so it is a fusion of classical and modern topics. Closely related to geometry and deeply connected with analysis, convex analysis has attracted much interest recently because of its vast applications in mathematics, mathematical physics and economics. As a branch of mathematics, convex analysis is devoted to the study of convex sets and convex functions. In the following subsections, brief explanations of these two major convex objects, namely, convex sets and convex functions, are presented.

    Convex Set

A set $S \subseteq \mathbb{R}^n$ is said to be convex if it contains the line segment joining any two of its points, that is, if for any $\mathbf{x}, \mathbf{y} \in S$ and any $\theta$ with $0 \le \theta \le 1$, $\theta\mathbf{x} + (1-\theta)\mathbf{y} \in S$. Figure 2.2 shows four two-dimensional examples. The leftmost rounded shape and the four-sided shape, whose boundaries are included, are convex. The irregularly shaped set and the separated set on the right-hand side are not convex, because part of a line segment joining two of their points is not contained in the set. From a geometric point of view, a convex set always bulges outward, with no dents or kinks in it. Every point in the set can be seen from every other point along the corresponding straight line, which lies in the set.

Figure 2.2: Some simple convex and non-convex sets.

In the following subsections, some common and important convex sets, namely, lines and line segments, affine sets, convex hulls, convex cones, hyperplanes and half-spaces, and Euclidean balls and ellipsoids, as well as some sets constructed by convexity preservation operations, are described.


Lines and Line Segments A line and a line segment can be represented as

$$
\mathbf{y} = \theta\mathbf{x}_1 + (1-\theta)\mathbf{x}_2,
\tag{2.1.18}
$$

with $\mathbf{x}_1 \ne \mathbf{x}_2 \in \mathbb{R}^n$. For $\theta \in \mathbb{R}$ the form represents the line through $\mathbf{x}_1$ and $\mathbf{x}_2$, while restricting $\theta \in [0, 1]$ gives the line segment between the points $\mathbf{x}_1$ and $\mathbf{x}_2$. In addition, $\mathbf{y}$ can be expressed in the form

$$
\mathbf{y} = \mathbf{x}_2 + \theta(\mathbf{x}_1 - \mathbf{x}_2),
\tag{2.1.19}
$$

where $\mathbf{y}$ is the sum of the base point $\mathbf{x}_2$ and the direction $\mathbf{x}_1 - \mathbf{x}_2$ scaled by $\theta$.

Affine Sets If for any $\mathbf{x}_1, \mathbf{x}_2 \in C$, where $C \subseteq \mathbb{R}^n$, and any $\theta \in \mathbb{R}$, we have $\theta\mathbf{x}_1 + (1-\theta)\mathbf{x}_2 \in C$, then $C$ is defined as affine; in other words, $C$ contains the entire line through any two of its points. The idea can be generalized to more than two points. A point $\theta_1\mathbf{x}_1 + \theta_2\mathbf{x}_2 + \cdots + \theta_k\mathbf{x}_k$, where $\theta_1 + \theta_2 + \cdots + \theta_k = 1$, is an affine combination of the points $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$, and the set of all affine combinations of points in a set $C \subseteq \mathbb{R}^n$ is called the affine hull of $C$:

$$
\operatorname{aff} C = \{ \theta_1\mathbf{x}_1 + \theta_2\mathbf{x}_2 + \cdots + \theta_k\mathbf{x}_k \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k \in C,\ \theta_1 + \theta_2 + \cdots + \theta_k = 1 \},
\tag{2.1.20}
$$

where the affine hull is the smallest affine set containing $C$.

Convex Hulls The convex hull of a set $C$, denoted by $\operatorname{conv} C$, is the set of all convex combinations of points in $C$:

$$
\operatorname{conv} C = \{ \theta_1\mathbf{x}_1 + \theta_2\mathbf{x}_2 + \cdots + \theta_k\mathbf{x}_k \mid \mathbf{x}_i \in C,\ \theta_i \ge 0,\ i = 1, 2, \ldots, k,\ \theta_1 + \theta_2 + \cdots + \theta_k = 1 \},
\tag{2.1.21}
$$

where $\operatorname{conv} C$ is always convex, and it is the smallest convex set that contains $C$: if $G$ is any convex set that contains $C$, then $\operatorname{conv} C \subseteq G$. Figure 2.3 illustrates the definition of the convex hull.
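As a rough illustration (assumed random points, not from the thesis), the following Python sketch computes the convex hull of a finite point set with SciPy; the hull is the smallest convex set containing all the points.

```python
# A minimal sketch of computing a convex hull numerically.
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(1)
points = rng.standard_normal((30, 2))   # 30 assumed points in the plane

hull = ConvexHull(points)
print(hull.vertices)                    # indices of the extreme points
print(hull.volume)                      # enclosed area in the 2-D case
```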

Convex Cones A convex cone $C \subseteq \mathbb{R}^n$ contains all rays emerging from the origin and passing through its points, as well as all line segments joining any points on those rays, that is,

$$
\mathbf{x}, \mathbf{y} \in C,\ \theta_1, \theta_2 \ge 0 \ \Longrightarrow\ \theta_1\mathbf{x} + \theta_2\mathbf{y} \in C,
\tag{2.1.22}
$$

where the geometrical interpretation is shown in Figure 2.4.

Figure 2.3: Convex hull of the kidney shaped set.

Figure 2.4: Geometric interpretation of a convex cone.

Besides, the nonnegative orthant $\mathbb{R}^n_+$ is a convex cone. The set of symmetric positive semidefinite matrices, $\mathbb{S}^{n \times n}_+ = \{\mathbf{X} \in \mathbb{S}^{n \times n} \mid \mathbf{X} \succeq \mathbf{0}\}$, is also a convex cone, since any nonnegative combination of positive semidefinite matrices is positive semidefinite; hence $\mathbb{S}^{n \times n}_+$ is called the positive semidefinite cone.

Hyperplanes and Half-Spaces A hyperplane is defined as the set containing all possible $\mathbf{x}$ of the form

$$
\{ \mathbf{x} \mid \mathbf{a}^T\mathbf{x} = b \},
\tag{2.1.23}
$$

with $\mathbf{a} \in \mathbb{R}^n$, $\mathbf{a} \ne \mathbf{0}$ and $b \in \mathbb{R}$, and it can alternatively be expressed as

$$
\{ \mathbf{x} \mid \mathbf{a}^T(\mathbf{x} - \mathbf{x}_0) = 0 \},
\tag{2.1.24}
$$

where $\mathbf{a}$ is the normal vector and $\mathbf{x}_0$ lies on the hyperplane. As a hyperplane contains all the lines and line segments joining any two of its points, it is a convex set.


In a similar manner, a half-space is defined as

$$
\{ \mathbf{x} \mid \mathbf{a}^T\mathbf{x} \le b \},
\tag{2.1.25}
$$

with $\mathbf{a} \in \mathbb{R}^n$, $\mathbf{a} \ne \mathbf{0}$ and $b \in \mathbb{R}$, and the alternative representation is

$$
\{ \mathbf{x} \mid \mathbf{a}^T(\mathbf{x} - \mathbf{x}_0) \le 0 \},
\tag{2.1.26}
$$

where $\mathbf{a}$ is the normal vector, $\mathbf{x}_0$ lies on the boundary, and the set represents the region below the boundary. The less-than-or-equal-to sign, $\le$, can be replaced with greater-than-or-equal-to, $\ge$, to represent the region above the boundary. Both half-spaces are convex in nature, as all line segments joining any two of their points also lie within the sets.

Euclidean Balls and Ellipsoids A Euclidean ball in $\mathbb{R}^n$ is defined as

$$
B(\mathbf{x}_c, r) = \{ \mathbf{x} \mid \|\mathbf{x} - \mathbf{x}_c\|_2 \le r \} = \{ \mathbf{x} \mid (\mathbf{x} - \mathbf{x}_c)^T(\mathbf{x} - \mathbf{x}_c) \le r^2 \},
\tag{2.1.27}
$$

where $r > 0$ denotes the radius and $\mathbf{x}_c$ is the center of the ball. An alternative representation of the Euclidean ball is

$$
B(\mathbf{x}_c, r) = \{ \mathbf{x}_c + r\mathbf{u} \mid \|\mathbf{u}\|_2 \le 1 \},
\tag{2.1.28}
$$

which also contains the points within a ball of radius $r$ centered at $\mathbf{x}_c$. The Euclidean ball is a convex set: considering $\|\mathbf{x}_1 - \mathbf{x}_c\|_2 \le r$, $\|\mathbf{x}_2 - \mathbf{x}_c\|_2 \le r$ and $0 \le \theta \le 1$, we have

$$
\begin{aligned}
\|\theta\mathbf{x}_1 + (1-\theta)\mathbf{x}_2 - \mathbf{x}_c\|_2
&= \|\theta(\mathbf{x}_1 - \mathbf{x}_c) + (1-\theta)(\mathbf{x}_2 - \mathbf{x}_c)\|_2 \\
&\le \theta\|\mathbf{x}_1 - \mathbf{x}_c\|_2 + (1-\theta)\|\mathbf{x}_2 - \mathbf{x}_c\|_2 \le r.
\end{aligned}
\tag{2.1.29}
$$

Apart from the Euclidean ball, another similar set is the ellipsoid, which is defined as

$$
\mathcal{E} = \{ \mathbf{x} \mid (\mathbf{x} - \mathbf{x}_c)^T\mathbf{P}^{-1}(\mathbf{x} - \mathbf{x}_c) \le 1 \},
\tag{2.1.30}
$$

where $\mathbf{P} = \mathbf{P}^T \succ \mathbf{0}$ is a symmetric positive definite matrix and $\mathbf{x}_c \in \mathbb{R}^n$ is the center of the ellipsoid. The lengths of the semi-axes of $\mathcal{E}$ are given by $\sqrt{\lambda_i}$, where $\lambda_i$, $i = 1, 2, \ldots, n$, are the eigenvalues of $\mathbf{P}$. A Euclidean ball is a special case of an ellipsoid with $\mathbf{P} = r^2\mathbf{I}_n$. An alternative representation of an ellipsoid is

$$
\mathcal{E} = \{ \mathbf{x}_c + \mathbf{A}\mathbf{u} \mid \|\mathbf{u}\|_2 \le 1 \},
\tag{2.1.31}
$$

where $\mathbf{A} = \mathbf{P}^{1/2}$ is a square, nonsingular matrix.
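As a rough numerical check (assumed $\mathbf{P}$ and center, not from the thesis), the following Python sketch relates the two ellipsoid representations (2.1.30) and (2.1.31): points generated as $\mathbf{x}_c + \mathbf{P}^{1/2}\mathbf{u}$ with $\|\mathbf{u}\|_2 \le 1$ satisfy the quadratic-form description.

```python
# A minimal sketch linking the two ellipsoid descriptions.
import numpy as np
from scipy.linalg import sqrtm

P = np.array([[4.0, 1.0], [1.0, 2.0]])   # assumed symmetric positive definite
x_c = np.array([1.0, -1.0])
A = np.real(sqrtm(P))                     # A = P^{1/2}

rng = np.random.default_rng(0)
u = rng.standard_normal((2, 500))
u /= np.maximum(1.0, np.linalg.norm(u, axis=0))   # enforce ||u||_2 <= 1

x = x_c[:, None] + A @ u
d = (x - x_c[:, None]).T                  # rows are x - x_c for each sample
quad = np.einsum('ij,jk,ik->i', d, np.linalg.inv(P), d)
print(np.all(quad <= 1 + 1e-9))           # True: all points lie in the ellipsoid
```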

Sets Constructed by Convexity Preservation Operations Preservation of convexity is one of the most important properties. There are many convexity preservation operations, and their study is still an active research area [71,72]. Some simple examples are scalar multiplication, the vector sum and linear transformation, but the most important operation is intersection, as it lets us combine different convex sets into a new one. Let $\mathcal{A}$ be an arbitrary index set and $\{S_\alpha \mid \alpha \in \mathcal{A}\}$ a collection of convex sets; then $\bigcap_{\alpha \in \mathcal{A}} S_\alpha$ is also convex.

Examples of such preservation can be found everywhere. For instance, a polyhedron, which is the intersection of a finite number of half-spaces, is a convex set by the intersection principle. Another example of a convex set is the solution set of a linear matrix inequality

$$
F(\mathbf{x}) = \mathbf{A}_0 + x_1\mathbf{A}_1 + \cdots + x_m\mathbf{A}_m \succeq \mathbf{0},
\tag{2.1.32}
$$

where $\mathbf{A}_0, \mathbf{A}_1, \ldots, \mathbf{A}_m \in \mathbb{S}^{n \times n}$ are the coefficient matrices. The solution set $\{\mathbf{x} \mid F(\mathbf{x}) \succeq \mathbf{0}\}$ is convex because $F$ is affine in $\mathbf{x}$: if $F(\mathbf{x}) \succeq \mathbf{0}$ and $F(\mathbf{y}) \succeq \mathbf{0}$, then for any $0 \le \theta \le 1$, $F(\theta\mathbf{x} + (1-\theta)\mathbf{y}) = \theta F(\mathbf{x}) + (1-\theta)F(\mathbf{y}) \succeq \mathbf{0}$.
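As a rough numerical illustration of this convexity argument (assumed coefficient matrices, not from the thesis), the following Python sketch verifies that if $F(\mathbf{x})$ and $F(\mathbf{y})$ are positive semidefinite, then so is $F$ evaluated at a convex combination.

```python
# A minimal sketch of the convexity of an LMI solution set.
import numpy as np

A0 = np.eye(2)
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
A2 = np.array([[1.0, 0.0], [0.0, -1.0]])

def F(x):
    return A0 + x[0] * A1 + x[1] * A2    # affine matrix-valued mapping

def is_psd(M, tol=1e-9):
    return np.all(np.linalg.eigvalsh(M) >= -tol)

x, y = np.array([0.2, 0.3]), np.array([-0.1, 0.4])
assert is_psd(F(x)) and is_psd(F(y))     # both assumed points are feasible
theta = 0.6
print(is_psd(F(theta * x + (1 - theta) * y)))   # True, by affinity of F
```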

Having presented convex sets, which always correspond to the constraints and the feasible solution sets of problems, the other convex object, the convex function, which always relates to the objective function of a convex problem, is now described.

    Convex Function

A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is convex if its domain, $\operatorname{dom} f$, is convex and, for all $\mathbf{x}, \mathbf{y} \in \operatorname{dom} f$ and $\theta \in [0, 1]$, $f(\theta\mathbf{x} + (1-\theta)\mathbf{y}) \le \theta f(\mathbf{x}) + (1-\theta)f(\mathbf{y})$. If $f$ is convex, then $-f$ is concave. Some examples for $n = 1$ are shown in Figure 2.5.


    Figure 2.5: Convexity of functions.

An example of a convex function is $x^2$, whose domain is $\mathbb{R}$, while $\log(x)$, defined on $\mathbb{R}_{++}$, is concave. To satisfy the convention that the objective function returns $+\infty$ for infeasible points, a convex function $f$ is often extended as

$$
\tilde{f}(\mathbf{x}) =
\begin{cases}
f(\mathbf{x}), & \mathbf{x} \in \operatorname{dom} f, \\
+\infty, & \mathbf{x} \notin \operatorname{dom} f,
\end{cases}
\tag{2.1.33}
$$

where $\tilde{f}$ still satisfies the basic definition of convexity. In this thesis, the same symbol is used for $f$ and its extension, so all convex functions are assumed to be extended. Some important properties of convex functions are reviewed in the following sections, namely, first-order and second-order conditions, sub-level sets, the epigraph, and operations that preserve convexity.

First-Order and Second-Order Conditions If $f$ is a differentiable function, then $f$ is convex if and only if $\operatorname{dom} f$ is convex and

$$
f(\mathbf{y}) \ge f(\mathbf{x}) + \nabla f(\mathbf{x})^T(\mathbf{y} - \mathbf{x}),
\tag{2.1.34}
$$

where (2.1.34) holds for all $\mathbf{x}, \mathbf{y} \in \operatorname{dom} f \subseteq \mathbb{R}^n$. The inequality (2.1.34) shows that if $\nabla f(\mathbf{x}) = \mathbf{0}_{n \times 1}$, then $f(\mathbf{y}) \ge f(\mathbf{x})$ for all $\mathbf{y} \in \operatorname{dom} f$, hence $\mathbf{x}$ is a global minimizer of the function $f$.

It is assumed that the Hessian, $\nabla^2 f$, of the function $f$ exists at each point in $\operatorname{dom} f$. Then $f$ is convex if and only if $\operatorname{dom} f$ is convex and its Hessian is positive semidefinite, that is, $\nabla^2 f(\mathbf{x}) \succeq \mathbf{0}$ for all $\mathbf{x} \in \operatorname{dom} f$. These conditions provide simple ways to determine whether a differentiable or twice-differentiable function is convex or not.
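As a rough numerical illustration of the second-order condition (assumed example function, not from the thesis), the following Python sketch checks the positive semidefiniteness of the Hessian of $f(\mathbf{x}) = \log(e^{x_1} + e^{x_2})$ at random points.

```python
# A minimal sketch of checking the second-order condition numerically.
import numpy as np

def hessian_logsumexp(x):
    z = np.exp(x - np.max(x))            # numerically stable softmax weights
    p = z / z.sum()
    return np.diag(p) - np.outer(p, p)   # Hessian of log-sum-exp

rng = np.random.default_rng(0)
ok = all(
    np.all(np.linalg.eigvalsh(hessian_logsumexp(rng.standard_normal(2))) >= -1e-12)
    for _ in range(100)
)
print(ok)    # True: the Hessian is PSD everywhere tested, consistent with convexity
```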


Sub-Level Sets The $\alpha$-sub-level set of $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is defined as

$$
C_\alpha = \{ \mathbf{x} \in \operatorname{dom} f \mid f(\mathbf{x}) \le \alpha \},
\tag{2.1.35}
$$

and $C_\alpha$ is convex for every value of $\alpha$ when $f$ is a convex function. Figure 2.6 shows the geometric interpretation of the sub-level sets of a function $f : \mathbb{R} \rightarrow \mathbb{R}$, where one sub-level set is the interval $[a, b]$ and another is $(-\infty, c]$. This property helps us identify quasi-convex functions, whose sub-level sets are all convex; such functions may contain stationary points that are overlooked by the first-order or second-order conditions even when the function is differentiable. For some non-convex functions with locally minimal points, these points can be excluded by means of a well-designed sub-level set.

Figure 2.6: Geometric interpretation of sub-level sets.

Epigraph It is assumed that $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is a function; its graph is defined as $\{(\mathbf{x}, f(\mathbf{x})) \mid \mathbf{x} \in \operatorname{dom} f\}$, which is a subset of $\mathbb{R}^{n+1}$, and the epigraph of $f$ is defined as

$$
\operatorname{epi} f = \{ (\mathbf{x}, t) \mid \mathbf{x} \in \operatorname{dom} f,\ f(\mathbf{x}) \le t \},
\tag{2.1.36}
$$

which is also a subset of $\mathbb{R}^{n+1}$. The epigraph is a link between convex sets and convex functions, as a function is convex if and only if its epigraph is a convex set, which enables us to determine whether a function is convex from its epigraph.

Operations that Preserve Convexity Common convexity preservation operations are the nonnegative weighted sum, composition with an affine mapping, and the point-wise maximum and supremum.

If $f_i : \mathbb{R}^n \rightarrow \mathbb{R}$, $i = 1, 2, \ldots, m$, are convex, then the nonnegative weighted sum

$$
f = \sum_{i=1}^{m} w_i f_i
\tag{2.1.37}
$$

is also convex, with $w_i \ge 0$, $i = 1, 2, \ldots, m$.

Composition with an affine mapping is considered for $g : \mathbb{R}^m \rightarrow \mathbb{R}$ with

$$
g(\mathbf{x}) = f(\mathbf{A}\mathbf{x} + \mathbf{b}),
\tag{2.1.38}
$$

where $f : \mathbb{R}^n \rightarrow \mathbb{R}$, $\mathbf{A} \in \mathbb{R}^{n \times m}$, $\mathbf{b} \in \mathbb{R}^n$ and $\operatorname{dom} g = \{\mathbf{x} \mid \mathbf{A}\mathbf{x} + \mathbf{b} \in \operatorname{dom} f\}$. If $f$ is convex, then $g$ is also convex.

If $f_1$ and $f_2$ are convex functions and $f$ is defined as

$$
f(\mathbf{x}) = \max\{ f_1(\mathbf{x}), f_2(\mathbf{x}) \},
\tag{2.1.39}
$$

with $\operatorname{dom} f = \operatorname{dom} f_1 \cap \operatorname{dom} f_2$, then $f$ is convex, and hence the point-wise maximum

$$
f(\mathbf{x}) = \max\{ f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x}) \}
\tag{2.1.40}
$$

is also convex.

The study of these convex objects enables us to recognize a problem as convex or to cast it into the convex framework. With knowledge of the requirements on the constraints and the objective function of the optimization problem, a better understanding of relaxing or solving the problem can be achieved. Convex analysis contributes a systematic means of analyzing the convexity of an optimization problem and of assuring optimality.

    2.1.3 Numerical Computation

Numerical computation, much of which focuses on matrix calculations, is the study of algorithms for performing mathematical computations on computers. Numerical computation also plays an important role in engineering and computational science problems, such as image and signal processing, computational finance, materials science simulations, structural biology, data mining, bioinformatics and fluid dynamics. Software packages for numerical computation rely on the development, analysis and implementation of state-of-the-art algorithms for solving various numerical linear algebra problems, in large part because of the role of matrices in finite difference and finite element methods. For example, common problems are LU decomposition, QR decomposition, singular value decomposition, eigenvalue computation, interior-point methods and conic optimization. Numerical computation provides the computational backbone of convex optimization: convex optimization problems can be solved efficiently thanks to the development of numerical computation algorithms and of convex optimization software packages such as YALMIP [73], CVX [74], CVXOPT [75], SeDuMi [76], SDPT3 [77-79], etc.

Some numerical methods for tackling convex problems are reviewed in the following sections, namely, the ellipsoid method, the sub-gradient method, cutting-plane methods and interior-point methods.

    Ellipsoid Method

The ellipsoid method is an algorithm for solving convex optimization problems. It was introduced by Shor [80], Nemirovsky [81], and Yudin [82] in 1972, and used by Khachiyan [83] to prove the polynomial-time solvability of linear programs [84]. At that time, the ellipsoid method was the only algorithm for solving linear programs whose runtime was proved to be polynomial. The algorithm encloses the minimizer of a convex function in a sequence of ellipsoids whose volume decreases at each iteration.

First, a convex problem in the standard form (2.1.1) is considered, where it is assumed that every equality constraint $h_i$ has been replaced by two inequalities. An arbitrary initial ellipsoid $\mathcal{E}_0 \subseteq \mathbb{R}^n$ containing the solution can be defined as

$$
\mathcal{E}_0 = \{ \mathbf{z} \mid (\mathbf{z} - \mathbf{x}_0)^T\mathbf{P}_0^{-1}(\mathbf{z} - \mathbf{x}_0) \le 1 \},
\tag{2.1.41}
$$

where $\mathbf{z} \in \mathbb{R}^n$ and $\mathbf{x}_0$ is the center of $\mathcal{E}_0$. At the $k$th iteration, the point $\mathbf{x}_k \in \mathbb{R}^n$ is located at the center of $\mathcal{E}_k$:

$$
\mathcal{E}_k = \{ \mathbf{z} \mid (\mathbf{z} - \mathbf{x}_k)^T\mathbf{P}_k^{-1}(\mathbf{z} - \mathbf{x}_k) \le 1 \}.
\tag{2.1.42}
$$

Then the cutting-plane oracle is invoked, which is given by a sub-gradient $\mathbf{g}_{k+1} \in \mathbb{R}^n$ of $f_0$ at $\mathbf{x}_k$ satisfying

$$
\mathbf{g}_{k+1}^T(\mathbf{x}^\star - \mathbf{x}_k) \le 0,
\tag{2.1.43}
$$

so the optimal point $\mathbf{x}^\star$ satisfies

$$
\mathbf{x}^\star \in \mathcal{E}_k \cap \{ \mathbf{z} \mid \mathbf{g}_{k+1}^T(\mathbf{z} - \mathbf{x}_k) \le 0 \},
\tag{2.1.44}
$$

and $\mathcal{E}_{k+1}$ is then chosen as the minimal-volume ellipsoid containing this half of $\mathcal{E}_k$, which still contains the solution $\mathbf{x}^\star$. The update is given by

$$
\mathbf{x}_{k+1} = \mathbf{x}_k - \frac{1}{n+1}\mathbf{P}_k\tilde{\mathbf{g}}_{k+1},
\tag{2.1.45}
$$

$$
\mathbf{P}_{k+1} = \frac{n^2}{n^2 - 1}\left( \mathbf{P}_k - \frac{2}{n+1}\mathbf{P}_k\tilde{\mathbf{g}}_{k+1}\tilde{\mathbf{g}}_{k+1}^T\mathbf{P}_k \right),
\tag{2.1.46}
$$

where

$$
\tilde{\mathbf{g}}_{k+1} = \frac{1}{\sqrt{\mathbf{g}_{k+1}^T\mathbf{P}_k\mathbf{g}_{k+1}}}\,\mathbf{g}_{k+1}.
\tag{2.1.47}
$$

At each iteration, the feasibility of $\mathbf{x}_k$ is checked. If $\mathbf{x}_k$ is feasible, a sub-gradient $\mathbf{g}_{k+1}$ of the objective is chosen, which satisfies

$$
\mathbf{g}_{k+1}^T(\mathbf{x}^\star - \mathbf{x}_k) + f_0(\mathbf{x}_k) - f^{(k)}_{\text{best}} \le 0,
\tag{2.1.48}
$$

where $f^{(k)}_{\text{best}}$ denotes the smallest objective value over the $k$ feasible iterations so far. Otherwise, $\mathbf{x}_k$ is infeasible and violates, say, the $j$th constraint, and the retained set of $\mathbf{z}$ is updated according to

$$
\mathbf{g}_{(j)}^T(\mathbf{z} - \mathbf{x}_k) + f_j(\mathbf{x}_k) \le 0,
\tag{2.1.49}
$$

where $\mathbf{g}_{(j)}$ is a sub-gradient of $f_j$ at $\mathbf{x}_k$. Finally, the stopping criterion is

$$
\sqrt{\mathbf{g}_{k+1}^T\mathbf{P}_k\mathbf{g}_{k+1}} \le \epsilon,
\tag{2.1.50}
$$

which guarantees $f_0(\mathbf{x}_k) - f_0(\mathbf{x}^\star) \le \epsilon$, where $\epsilon$ is the tolerable error.

Although the algorithm has polynomial runtime and a clear stopping criterion, interior-point methods and variants of the simplex algorithm are much faster than the ellipsoid method in practice.
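As a rough illustration of the updates (2.1.45)-(2.1.47) on an assumed toy objective (not from the thesis), the following Python sketch runs the ellipsoid method on the unconstrained problem of minimizing $f_0(\mathbf{x}) = \|\mathbf{x} - \mathbf{c}\|_2^2$, starting from a ball that contains the minimizer.

```python
# A minimal sketch of the ellipsoid method on an assumed smooth objective.
import numpy as np

c = np.array([1.0, -2.0])
f0 = lambda x: np.sum((x - c) ** 2)
grad = lambda x: 2 * (x - c)               # (sub)gradient of f0

n = 2
x = np.zeros(n)                            # center of the initial ellipsoid
P = 100.0 * np.eye(n)                      # initial ellipsoid: large ball
x_best, f_best = x.copy(), f0(x)

for _ in range(200):
    g = grad(x)
    denom = np.sqrt(g @ P @ g)
    if denom <= 1e-8:                      # stopping criterion, cf. (2.1.50)
        break
    g_t = g / denom                        # normalized sub-gradient (2.1.47)
    x = x - (P @ g_t) / (n + 1)            # center update (2.1.45)
    P = n**2 / (n**2 - 1) * (P - 2.0 / (n + 1) * np.outer(P @ g_t, P @ g_t))  # (2.1.46)
    if f0(x) < f_best:
        x_best, f_best = x.copy(), f0(x)

print(x_best, f_best)                      # x_best approaches c = [1, -2]
```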


    Sub-Gradient Method

Sub-gradient methods are algorithms for solving convex optimization problems. Originally developed by Shor [80] in the 1960s and 1970s, sub-gradient methods can be used for non-differentiable objective functions [85]. When the objective function is differentiable, the sub-gradient method for an unconstrained problem uses the same search direction as the method of steepest descent.

Although sub-gradient methods can be much slower than interior-point methods and Newton's method in practice, they can be applied immediately to a far wider variety of problems and require much less memory. Moreover, by combining the sub-gradient method with primal or dual decomposition techniques, it is sometimes possible to develop a simple distributed solver.

A convex function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ with $\operatorname{dom} f \subseteq \mathbb{R}^n$ is considered. The sub-gradient method at the $k$th iteration is

$$
\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k\mathbf{g}_k,
\tag{2.1.51}
$$

where $\mathbf{g}_k$ is a sub-gradient of $f$ at $\mathbf{x}_k$ and $\alpha_k > 0$ is the step size, which is governed by a step size rule, such as the constant step size $\alpha_k = h$ or the constant step length $\alpha_k = h/\|\mathbf{g}_k\|_2$, which makes $\|\mathbf{x}_{k+1} - \mathbf{x}_k\|_2 = h$. For differentiable $f$, $\mathbf{g}_k$ equals $\nabla f$ at $\mathbf{x}_k$. A record $f^{(k)}_{\text{best}}$ that keeps track of the smallest objective function value over $k$ valid iterations is also kept, that is,

$$
f^{(k)}_{\text{best}} = \min\{ f^{(k-1)}_{\text{best}}, f(\mathbf{x}_k) \}.
\tag{2.1.52}
$$

Since a convex problem has a unique optimal value, the sub-gradient algorithm is guaranteed to converge to within a tolerance of the optimum under the constant step size and constant step length rules, that is,

$$
\lim_{k \rightarrow \infty} f^{(k)}_{\text{best}} - f^\star \le \epsilon,
\tag{2.1.53}
$$

where $f^\star$ and $\epsilon$ are the optimal value and the error, respectively. The sub-gradient method can be extended to handle problems in the form of (2.1.1), where constraints are present, with the sub-gradient modified as

$$
\mathbf{g}_k \in
\begin{cases}
\partial f_0(\mathbf{x}_k), & f_i(\mathbf{x}_k) \le 0,\ i = 1, 2, \ldots, m, \\
\partial f_j(\mathbf{x}_k), & f_j(\mathbf{x}_k) > 0,
\end{cases}
\tag{2.1.54}
$$

where $\partial f$ denotes the sub-differential of $f$. If $\mathbf{x}_k$ is feasible, the algorithm uses a sub-gradient of the objective; otherwise the algorithm chooses a sub-gradient of a violated constraint $f_j$.
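As a rough illustration (assumed toy data, not from the thesis), the following Python sketch applies the sub-gradient iteration (2.1.51) to the non-differentiable problem of minimizing $\|\mathbf{A}\mathbf{x} - \mathbf{b}\|_1$; a diminishing step size, a common variant of the rules above, is used, and the best value found is recorded as in (2.1.52).

```python
# A minimal sketch of the sub-gradient method for an l1-norm objective.
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 5)), rng.standard_normal(30)

f = lambda x: np.sum(np.abs(A @ x - b))
subgrad = lambda x: A.T @ np.sign(A @ x - b)   # a sub-gradient of the l1 objective

x = np.zeros(5)
f_best, x_best = f(x), x.copy()
for k in range(1, 3001):
    g = subgrad(x)
    x = x - (1.0 / k) * g                      # diminishing step size alpha_k = 1/k
    if f(x) < f_best:                          # keep the best iterate, cf. (2.1.52)
        f_best, x_best = f(x), x.copy()

print(f_best)
```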

    Cutting-Plane Methods

In mathematics, and more specifically in optimization, the cutting-plane method [86] is an umbrella term for optimization methods that iteratively refine a feasible set or objective function by means of linear inequalities, termed cuts. Such procedures are popularly used to find integer solutions to mixed integer linear programming problems, as well as to solve general, not necessarily differentiable, convex optimization problems. The use of cutting planes to solve mixed integer linear programming problems was introduced by Gomory [87].

Cutting-plane methods for mixed integer linear programming work by solving the non-integer linear program, the relaxation of the given integer program. The obtained optimum is tested to see whether it is also an integer solution. If it is not, it is guaranteed that there exists a linear inequality that separates the optimum from the convex hull of the true feasible set. Finding such an inequality is the separation problem, and such an inequality is called a cut. A cut can be added to the relaxed linear program to cut off the current non-integer solution. This process is repeated until an optimal integer solution is found.

Cutting-plane methods for general convex continuous optimization and their variants are known under various names: Kelley's method [88], the Kelley-Cheney-Goldstein method [89], and bundle methods [90]. They are popularly used for non-differentiable convex minimization, where a convex objective function and its sub-gradient can be evaluated efficiently but the usual gradient methods for differentiable optimization cannot be used. This situation is most typical for the concave maximization of Lagrangian dual functions. Another common situation is the application of the Dantzig-Wolfe decomposition [91] to a structured optimization problem in which formulations with an exponential number of variables are obtained; generating these variables on demand by means of delayed column generation is identical to performing cutting-plane steps on the respective dual problem.

Here, the basic cutting-plane method with the simplest Gomory cut is illustrated. It is assumed that an admissible solution $\mathbf{x} \in \mathbb{R}^n$ is known to us and

$$
[\mathbf{B}\ \ \mathbf{F}]
\begin{bmatrix}
\mathbf{x}_b \\ \mathbf{x}_f
\end{bmatrix}
= \mathbf{b},
\tag{2.1.55}
$$

with $\mathbf{x} = [\mathbf{x}_b^T\ \mathbf{x}_f^T]^T$, where $\mathbf{x}_b$ denotes the first $n_b$ elements, and hence

$$
\mathbf{B}\mathbf{x}_b + \mathbf{F}\mathbf{x}_f = \mathbf{b}
\ \Longrightarrow\
\mathbf{x}_b = \mathbf{B}^{-1}\mathbf{b} - \mathbf{B}^{-1}\mathbf{F}\mathbf{x}_f.
\tag{2.1.56}
$$

Defining $\bar{\mathbf{b}} = \mathbf{B}^{-1}\mathbf{b}$ and $\bar{\mathbf{A}} = \mathbf{B}^{-1}\mathbf{F}$ yields

$$
\mathbf{x}_b + \bar{\mathbf{A}}\mathbf{x}_f = \bar{\mathbf{b}}
\ \Longrightarrow\
x_i + \left[\bar{\mathbf{A}}\mathbf{x}_f\right]_i = \bar{b}_i, \quad i = 1, 2, \ldots, n_b,
\tag{2.1.57}
$$

with $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$ and $\bar{\mathbf{b}} = [\bar{b}_1, \bar{b}_2, \ldots, \bar{b}_{n_b}]^T$. Then the inequalities

$$
x_i + \left[\lfloor\bar{\mathbf{A}}\rfloor\mathbf{x}_f\right]_i \le \bar{b}_i, \quad i = 1, 2, \ldots, n_b,
\tag{2.1.58}
$$

are constructed, which hold because $x_i \ge 0$, $i = 1, 2, \ldots, n$, and $\lfloor\cdot\rfloor$ denotes element-wise rounding down. Without losing integer solutions, the right-hand side is rounded down to an integer:

$$
x_i + \left[\lfloor\bar{\mathbf{A}}\rfloor\mathbf{x}_f\right]_i \le \lfloor\bar{b}_i\rfloor, \quad i = 1, 2, \ldots, n_b.
\tag{2.1.59}
$$

Subtracting (2.1.59) from (2.1.57) gives the following integer formulation of the Gomory cut:

$$
\left[\left(\bar{\mathbf{A}} - \lfloor\bar{\mathbf{A}}\rfloor\right)\mathbf{x}_f\right]_i \ge \bar{b}_i - \lfloor\bar{b}_i\rfloor, \quad i = 1, 2, \ldots, n_b.
\tag{2.1.60}
$$
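As a rough numerical sketch of generating a Gomory cut as in (2.1.60), the following Python snippet uses an assumed simplex-tableau row (not from the thesis): for the row $x_i + \bar{\mathbf{a}}^T\mathbf{x}_f = \bar{b}_i$ with fractional $\bar{b}_i$, the cut is $(\bar{\mathbf{a}} - \lfloor\bar{\mathbf{a}}\rfloor)^T\mathbf{x}_f \ge \bar{b}_i - \lfloor\bar{b}_i\rfloor$.

```python
# A minimal sketch of forming a Gomory cut from one tableau row.
import numpy as np

a_bar = np.array([0.75, -1.25, 2.0])    # assumed row of B^{-1} F for basic variable x_i
b_bar_i = 3.4                           # assumed corresponding entry of B^{-1} b

cut_coeffs = a_bar - np.floor(a_bar)    # fractional parts of the row entries
cut_rhs = b_bar_i - np.floor(b_bar_i)   # fractional part of the right-hand side
print(cut_coeffs, cut_rhs)
# The current LP vertex has x_f = 0, which violates the cut (0 < 0.4), so adding
# the cut removes this fractional vertex without removing any integer point.
```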

Although the Gomory cut was proposed in the 1960s, it was largely forgotten for almost thirty years. Many experts, including Gomory himself, considered it to be impractical, both for numerical stability reasons and because it was regarded as ineffective, since many rounds of cuts were needed to make progress in the objective. Things changed in the mid-1990s, when Cornuejols et al. [92] showed Gomory cuts to be very effective in combination with branch-and-cut and showed ways to overcome the numerical instabilities. Nowadays, all commercial solvers for mixed integer linear programming, which refers to the minimization or maximization of a linear function subject to linear constraints with some variables restricted to be integers, use Gomory cuts.

There exist many more general cuts for mixed integer programs. Gomory cuts, however, are very efficient to generate from a simplex tableau, whereas many other types of cuts are either expensive or even NP-hard to separate. Among the other general cuts for mixed integer linear programming, lift-and-project cuts [93] dominate Gomory cuts.

    Interior-Point Methods

Interior-point methods, which are also referred to as barrier methods, are a certain class of algorithms for solving linear and nonlinear convex optimization problems.

These algorithms were inspired by Karmarkar's algorithm [94], developed in 1984 for linear programming. A basic element of the method is a self-concordant barrier function used to encode the convex set. Contrary to the simplex method [91], it reaches an optimal solution by traversing the interior of the feasible region.

Any convex optimization problem can be transformed into minimizing (or maximizing) a linear function over a convex set. The idea of encoding the feasible set using a barrier and design