Recursive Compressed Sensing


  • Recursive Compressed Sensing

    Pantelis Sopasakis

    Presentation at ICTEAM, UC Louvain, Belgium; joint work with N. Freris and P. Patrinos

    IMT Institute for Advanced Studies Lucca, Italy; NYU Abu Dhabi, United Arab Emirates

    ESAT, KU Leuven, Belgium

    April 7, 2016

  • Motivation

    [Word-cloud figure of application areas centered on "Compressed Sensing": MRI, radio-astronomy, holography, seismology, photography, radar, facial recognition, speech recognition, fault detection, medical imaging, particle physics, video processing, ECG, encryption, communication networks, system identification.]

  • Spoiler alert!

    The proposed method is an order of magnitude faster than other reported methods for recursive compressed sensing.


  • Outline

    1. Forward-Backward Splitting

    2. The Forward-Backward envelope function

    3. The Forward-Backward Newton method

    4. Recursive compressed sensing

    5. Simulations


  • I. Forward-Backward Splitting

  • Forward-Backward Splitting

    Problem structure

    minimize $\varphi(x) = f(x) + g(x)$

    where

    1. $f, g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ are proper, closed, convex
    2. $f$ has $L$-Lipschitz gradient
    3. $g$ is prox-friendly, i.e., its proximal operator

    $\operatorname{prox}_{\gamma g}(v) := \arg\min_z \{\, g(z) + \tfrac{1}{2\gamma}\|v - z\|^2 \,\}$

    is easily computable [1].

    [1] Parikh & Boyd, 2014; Combettes & Pesquet, 2010.

  • Example #1

    Constrained QPs

    minimize $\underbrace{\tfrac{1}{2}x^\top Q x + q^\top x}_{f} + \underbrace{\delta(x \mid B)}_{g}$

    where $B$ is a set on which projections are easy to compute and

    $\delta(x \mid B) = \begin{cases} 0, & \text{if } x \in B,\\ +\infty, & \text{otherwise} \end{cases}$

    Then $\operatorname{prox}_{\gamma g}(x) = \operatorname{proj}(x \mid B)$.
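    To make the example concrete, here is a minimal Python/NumPy sketch (assuming, purely for illustration, that $B$ is a box $[\ell, u]^n$) of the proximal operator of $g = \delta(\cdot \mid B)$, which is simply the projection onto $B$; the function name is illustrative.

```python
import numpy as np

def prox_indicator_box(v, lower, upper, gamma=1.0):
    """prox of g = indicator of B = [lower, upper]^n, i.e. the projection onto B.

    The step size gamma plays no role because the indicator only takes
    the values 0 and +inf.
    """
    return np.clip(v, lower, upper)

# Project a point onto the unit box [-1, 1]^3.
v = np.array([2.0, -0.3, -5.0])
print(prox_indicator_box(v, -1.0, 1.0))   # [ 1.  -0.3 -1. ]
```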

  • Example #2

    LASSO problems

    minimize $\underbrace{\tfrac{1}{2}\|Ax - b\|^2}_{f} + \underbrace{\lambda\|x\|_1}_{g}$

    Indeed,

    1. $f$ is continuously differentiable with $\nabla f(x) = A^\top(Ax - b)$
    2. $g$ is prox-friendly
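    A minimal sketch of the two ingredients of this splitting: the gradient of $f(x) = \tfrac{1}{2}\|Ax - b\|^2$ and the proximal operator of $g = \lambda\|\cdot\|_1$, which is elementwise soft-thresholding; function names are illustrative.

```python
import numpy as np

def grad_f(x, A, b):
    """Gradient of f(x) = 0.5 * ||A x - b||^2."""
    return A.T @ (A @ x - b)

def prox_l1(v, t):
    """prox_{t ||.||_1}(v): elementwise soft-thresholding with threshold t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

    With these two pieces, one forward-backward step reads `prox_l1(x - gamma * grad_f(x, A, b), gamma * lam)`.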

  • Other examples

    ✓ Constrained optimal control
    ✓ Elastic net
    ✓ Sparse log-logistic regression
    ✓ Matrix completion
    ✓ Subspace identification
    ✓ Support vector machines

  • Forward-Backward Splitting

    FBS offers a generic framework for solving such problems using the iteration

    $x^{k+1} = \operatorname{prox}_{\gamma g}(x^k - \gamma\nabla f(x^k)) =: T_\gamma(x^k),$

    for $\gamma < 2/L$.

    Features:

    1. $\varphi(x^k) - \varphi^\star = O(1/k)$
    2. with Nesterov's extrapolation, $\varphi(x^k) - \varphi^\star = O(1/k^2)$
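    A minimal sketch of this iteration for the LASSO example of the previous slides, with the conservative choice $\gamma = 1/L$, $L = \|A\|^2$; all names are illustrative and the instance is randomly generated.

```python
import numpy as np

def fbs_lasso(A, b, lam, n_iter=500):
    """Forward-backward splitting (ISTA) for 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad f
    gamma = 1.0 / L                            # step size (any gamma < 2/L works)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        v = x - gamma * A.T @ (A @ x - b)      # forward (gradient) step
        x = np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)  # backward (prox) step
    return x

# Small random instance: 5-sparse signal, 30 measurements, 100 unknowns.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[:5] = rng.standard_normal(5)
x_hat = fbs_lasso(A, A @ x_true, lam=0.1)
```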

  • Forward-Backward Splitting

    The iteration

    $x^{k+1} = \operatorname{prox}_{\gamma g}(x^k - \gamma\nabla f(x^k)),$

    can be written as [2]

    $x^{k+1} = \arg\min_z \{\, \underbrace{f(x^k) + \langle\nabla f(x^k), z - x^k\rangle + \tfrac{1}{2\gamma}\|z - x^k\|^2}_{Q^\gamma_f(z,\,x^k)} + g(z) \,\},$

    where $Q^\gamma_f(z, x^k)$ serves as a quadratic model for $f$ [3].

    [2] Beck and Teboulle, 2010.
    [3] $Q^\gamma_f(\cdot, x^k)$ is the linearization of $f$ at $x^k$ plus a quadratic term; moreover, $Q^\gamma_f(z, x^k) \geq f(z)$ and $Q^\gamma_f(z, z) = f(z)$.

  • Forward-Backward Splitting

    [Figure slides: the FBS iterates $x^0, x^1, x^2, x^3$ are generated by successively minimizing the quadratic model $Q^\gamma_f(z; x^k) + g(z)$, which upper-bounds $\varphi = f + g$; the objective values $\varphi(x^k)$ decrease at every iteration.]

  • Overview

    Generic convex optimization problem

    minimize $f(x) + g(x)$.

    The generic iteration

    $x^{k+1} = \operatorname{prox}_{\gamma g}(x^k - \gamma\nabla f(x^k))$

    is a fixed-point iteration for the optimality condition

    $x^\star = \operatorname{prox}_{\gamma g}(x^\star - \gamma\nabla f(x^\star))$

  • Overview

    It generalizes several other methods

    $x^{k+1} = \begin{cases} x^k - \gamma\nabla f(x^k) & \text{gradient method, } g = 0\\ \Pi_C(x^k - \gamma\nabla f(x^k)) & \text{gradient projection, } g = \delta(\cdot \mid C)\\ \operatorname{prox}_{\gamma g}(x^k) & \text{proximal point algorithm, } f = 0 \end{cases}$

    There are several flavors of proximal gradient algorithms [4].

    [4] Nesterov's accelerated method, FISTA (Beck & Teboulle), etc.

  • Shortcomings

    FBS is a first-order method and can therefore be slow!

    Overhaul. Use a better quadratic model for $f$ [5]:

    $Q_{f,B}(z, x^k) = f(x^k) + \langle\nabla f(x^k), z - x^k\rangle + \tfrac{1}{2}\|z - x^k\|^2_{B_k},$

    where $B_k$ is (an approximation of) $\nabla^2 f(x^k)$.

    Drawback. No closed-form solution of the inner problem.

    [5] As in Becker & Fadili, 2012; Lee et al., 2012; Tran-Dinh et al., 2013.

  • II. Forward-Backward Envelope

  • Forward-Backward Envelope

    The Forward-Backward envelope of $\varphi$ is defined as

    $\varphi_\gamma(x) = \min_z \{\, f(x) + \langle\nabla f(x), z - x\rangle + g(z) + \tfrac{1}{2\gamma}\|z - x\|^2 \,\},$

    with $\gamma \leq 1/L$. Let's see how it looks...

  • Forward-Backward Envelope

    [Figure slides: plots of the objective $\varphi(x)$ and its Forward-Backward envelope $\varphi_\gamma(x)$.]

  • Properties of FBE

    Define

    $T_\gamma(x) = \operatorname{prox}_{\gamma g}(x - \gamma\nabla f(x)), \qquad R_\gamma(x) = \tfrac{1}{\gamma}(x - T_\gamma(x))$

    FBE upper bound

    $\varphi_\gamma(x) \leq \varphi(x) - \tfrac{\gamma}{2}\|R_\gamma(x)\|^2$

    FBE lower bound

    $\varphi_\gamma(x) \geq \varphi(T_\gamma(x)) + \tfrac{\gamma(1 - \gamma L_f)}{2}\|R_\gamma(x)\|^2$

    [Figure: $\varphi$ and $\varphi_\gamma$ at the points $x$, $T_\gamma(x)$ and $x^\star = T_\gamma(x^\star)$, illustrating that $\varphi_\gamma(x)$ is sandwiched between $\varphi(T_\gamma(x))$ and $\varphi(x)$.]
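    These two bounds can be checked numerically; the following sketch does so for a random LASSO instance, using the definitions of $T_\gamma$, $R_\gamma$ and $\varphi_\gamma$ given above (all names and the instance are illustrative).

```python
import numpy as np

# Random LASSO instance and a random test point (illustrative).
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 120))
b = rng.standard_normal(40)
lam, x = 0.1, rng.standard_normal(120)
L = np.linalg.norm(A, 2) ** 2
gamma = 0.5 / L                                       # gamma <= 1/L

f = lambda z: 0.5 * np.linalg.norm(A @ z - b) ** 2
g = lambda z: lam * np.linalg.norm(z, 1)
grad_f = lambda z: A.T @ (A @ z - b)
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

T = soft(x - gamma * grad_f(x), gamma * lam)          # T_gamma(x)
R = (x - T) / gamma                                   # R_gamma(x)
phi_gamma = (f(x) + grad_f(x) @ (T - x) + g(T)
             + np.linalg.norm(T - x) ** 2 / (2 * gamma))

# Upper bound: phi_gamma(x) <= phi(x) - gamma/2 * ||R||^2
assert phi_gamma <= f(x) + g(x) - 0.5 * gamma * np.linalg.norm(R) ** 2 + 1e-9
# Lower bound: phi_gamma(x) >= phi(T) + gamma*(1 - gamma*L)/2 * ||R||^2
assert phi_gamma >= f(T) + g(T) + 0.5 * gamma * (1 - gamma * L) * np.linalg.norm(R) ** 2 - 1e-9
```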

  • Properties of FBE

    Ergo: minimizing $\varphi$ is equivalent to minimizing its FBE $\varphi_\gamma$.

    $\inf \varphi = \inf \varphi_\gamma$

    $\arg\min \varphi = \arg\min \varphi_\gamma$

    However, $\varphi_\gamma$ is continuously differentiable [6] whenever $f \in C^2$.

    [6] More about the FBE: P. Patrinos, L. Stella and A. Bemporad, 2014.

  • FBE is $C^1$

    The FBE can be written as

    $\varphi_\gamma(x) = f(x) - \tfrac{\gamma}{2}\|\nabla f(x)\|^2 + g^\gamma(x - \gamma\nabla f(x)),$

    where $g^\gamma$ is the Moreau envelope of $g$,

    $g^\gamma(v) = \min_z \{\, g(z) + \tfrac{1}{2\gamma}\|z - v\|^2 \,\}$

    $g^\gamma$ is a smooth approximation of $g$ with $\nabla g^\gamma(x) = \tfrac{1}{\gamma}(x - \operatorname{prox}_{\gamma g}(x))$. If $f \in C^2$, then

    $\nabla\varphi_\gamma(x) = (I - \gamma\nabla^2 f(x))R_\gamma(x).$

    Therefore,
    $\arg\min \varphi = \arg\min \varphi_\gamma = \operatorname{zer} \nabla\varphi_\gamma.$
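    As an illustration, a minimal sketch of the closed-form FBE and its gradient for the LASSO example, where $\nabla^2 f(x) = A^\top A$ and $g^\gamma$ is the (Huber-like) Moreau envelope of $\lambda\|\cdot\|_1$; names are illustrative.

```python
import numpy as np

def moreau_env_l1(v, lam, gamma):
    """Moreau envelope of lam*||.||_1 with parameter gamma (a Huber-like function)."""
    small = np.abs(v) <= gamma * lam
    return np.sum(np.where(small,
                           v ** 2 / (2 * gamma),
                           lam * np.abs(v) - gamma * lam ** 2 / 2))

def fbe_lasso(x, A, b, lam, gamma):
    """FBE value and gradient for f = 0.5*||Ax - b||^2, g = lam*||x||_1."""
    grad = A.T @ (A @ x - b)
    u = x - gamma * grad                               # forward step
    T = np.sign(u) * np.maximum(np.abs(u) - gamma * lam, 0.0)
    R = (x - T) / gamma                                # fixed-point residual R_gamma(x)
    value = (0.5 * np.linalg.norm(A @ x - b) ** 2
             - 0.5 * gamma * np.linalg.norm(grad) ** 2
             + moreau_env_l1(u, lam, gamma))
    gradient = R - gamma * (A.T @ (A @ R))             # (I - gamma * A'A) R
    return value, gradient
```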

  • The Moreau envelope

    [Figure: the function $g(x) = |x|$ and its Moreau envelopes $g^{0.1}$ and $g^{10}$.]

  • Forward-Backward Newton

    ✓ Since $\varphi_\gamma$ is $C^1$ but not $C^2$, we may not apply a Newton method.
    ✓ The FB Newton method is a semismooth method for minimizing $\varphi_\gamma$ using a notion of generalized differentiability.
    ✓ The FBN iterations are

    $x^{k+1} = x^k + \tau_k d^k,$

    where $d^k$ is a Newton direction given by

    $H_k d^k = -\nabla\varphi_\gamma(x^k), \qquad H_k \in \partial_B^2\varphi_\gamma(x^k),$

    ✓ $\partial_B$ is the so-called B-subdifferential (we'll define it later)

  • III. Forward-Backward Newton

  • Optimality conditions

    LASSO problem

    minimize $\underbrace{\tfrac{1}{2}\|Ax - b\|^2}_{f} + \underbrace{\lambda\|x\|_1}_{g}.$

    Optimality conditions

    $-\nabla f(x^\star) \in \partial g(x^\star),$

    where $\nabla f(x) = A^\top(Ax - b)$ and $\partial g(x)_i = \lambda\operatorname{sign}(x_i)$ for $x_i \neq 0$ and $\partial g(x)_i = [-\lambda, \lambda]$ otherwise, so

    $\nabla_i f(x^\star) = -\lambda\operatorname{sign}(x^\star_i)$ if $x^\star_i \neq 0$, and $|\nabla_i f(x^\star)| \leq \lambda$ otherwise.

  • Optimality conditions

    If we knew the sets

    $\beta = \{i : x^\star_i = 0\}, \qquad \alpha = \{j : x^\star_j \neq 0\},$

    we would be able to write down the optimality conditions as

    $A_\alpha^\top A_\alpha x^\star_\alpha = A_\alpha^\top b - \lambda\operatorname{sign}(x^\star_\alpha)$

    Goal. Devise a method to determine $\alpha$ efficiently.

  • Optimality conditions

    We may write the optimality conditions as follows

    $x^\star = \operatorname{prox}_{\gamma g}(x^\star - \gamma\nabla f(x^\star)),$

    where

    $\operatorname{prox}_{\gamma g}(z)_i = \operatorname{sign}(z_i)(|z_i| - \gamma\lambda)_+.$

    ISTA and FISTA are methods for the iterative solution of these conditions. Instead, we are looking for a zero of the fixed-point residual operator

    $R_\gamma(x) = x - \operatorname{prox}_{\gamma g}(x - \gamma\nabla f(x)).$

  • B-subdifferential

    For a function $F : \mathbb{R}^n \to \mathbb{R}^n$ which is almost everywhere differentiable, we define its B-subdifferential to be [7]

    $\partial_B F(x) := \{\, B \in \mathbb{R}^{n\times n} : \exists \{x_n\}_n : x_n \to x,\ \nabla F(x_n) \text{ exists and } \nabla F(x_n) \to B \,\}.$

    [7] See Facchinei & Pang, 2004.

  • Forward-Backward Newton

    $R_\gamma(x)$ is nonexpansive $\Rightarrow$ Lipschitz $\Rightarrow$ differentiable a.e. $\Rightarrow$ B-subdifferentiable ($\partial_B R_\gamma(x)$). The proposed algorithm takes the form

    $x^{k+1} = x^k - \tau_k H_k^{-1} R_\gamma(x^k)$, with $H_k \in \partial_B R_\gamma(x^k)$.

    When close to the solution, all $H_k$ are nonsingular. Take

    $H_k = I - P_k(I - \gamma A^\top A),$

    where $P_k$ is diagonal with $(P_k)_{ii} = 1$ iff $i \in \alpha_k$, where

    $\alpha_k = \{i : |x^k_i - \gamma\nabla_i f(x^k)| > \gamma\lambda\}$

    The scalar $\tau_k$ is computed by a simple line search method to ensure global convergence of the algorithm.

  • Forward-Backward Newton

    The Forward-Backward Newton method can be concisely written as

    $x^{k+1} = x^k + \tau_k d^k.$

    The Newton direction $d^k$ is determined as follows, without the need to formulate $H_k$:

    $d^k_{\beta_k} = -(R_\gamma(x^k))_{\beta_k},$

    $A_{\alpha_k}^\top A_{\alpha_k} d^k_{\alpha_k} = -(R_\gamma(x^k))_{\alpha_k} - A_{\alpha_k}^\top A_{\beta_k} d^k_{\beta_k},$

    where $\beta_k$ is the complement of $\alpha_k$. For the method to converge globally, we compute $\tau_k$ so that the Armijo condition is satisfied:

    $\varphi_\gamma(x^k + \tau_k d^k) \leq \varphi_\gamma(x^k) + \sigma\tau_k\nabla\varphi_\gamma(x^k)^\top d^k.$

  • Forward-Backward Newton

    Require: $A$, $y$, $\lambda$, $x^0$, $\epsilon$; $\gamma \gets 0.95/\|A\|^2$
    $x \gets x^0$
    while $\|R_\gamma(x)\| > \epsilon$ do
        $\alpha \gets \{i : |x_i - \gamma\nabla_i f(x)| > \gamma\lambda\}$
        $\beta \gets \{i : |x_i - \gamma\nabla_i f(x)| \leq \gamma\lambda\}$
        $d_\beta \gets -x_\beta$
        $s \gets \operatorname{sign}(x - \gamma\nabla f(x))$
        Solve $A_\alpha^\top A_\alpha(x_\alpha + d_\alpha) = A_\alpha^\top y - \lambda s_\alpha$
        $\tau \gets 1$
        while $\varphi_\gamma(x + \tau d) > \varphi_\gamma(x) + \sigma\tau\nabla\varphi_\gamma(x)^\top d$ do
            $\tau \gets \tau/2$
        end while
        $x \gets x + \tau d$
    end while
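    A minimal Python sketch of the pseudocode above for the LASSO, using the FBE of the earlier slides for the line search; names, tolerances and the Armijo constant are illustrative, and no continuation is used here.

```python
import numpy as np

def fbn_lasso(A, y, lam, x0, eps=1e-8, sigma=1e-4, max_iter=100):
    """Forward-Backward Newton sketch for 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2
    gamma = 0.95 / L
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def fbe(x):                                        # Forward-Backward envelope
        grad = A.T @ (A @ x - y)
        T = soft(x - gamma * grad, gamma * lam)
        return (0.5 * np.linalg.norm(A @ x - y) ** 2 + grad @ (T - x)
                + lam * np.linalg.norm(T, 1)
                + np.linalg.norm(T - x) ** 2 / (2 * gamma))

    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        grad = A.T @ (A @ x - y)
        u = x - gamma * grad
        R = x - soft(u, gamma * lam)                   # fixed-point residual
        if np.linalg.norm(R) <= eps:
            break
        alpha = np.abs(u) > gamma * lam                # estimated support
        d = -x.copy()                                  # d_beta = -x_beta off the support
        if alpha.any():
            Aa = A[:, alpha]
            s = np.sign(u[alpha])
            # Newton step on the support: A_a' A_a (x_a + d_a) = A_a' y - lam * s
            d[alpha] = np.linalg.solve(Aa.T @ Aa, Aa.T @ y - lam * s) - x[alpha]
        # Armijo line search on the FBE
        grad_fbe = (R - gamma * (A.T @ (A @ R))) / gamma
        phi, slope, tau = fbe(x), grad_fbe @ d, 1.0
        while fbe(x + tau * d) > phi + sigma * tau * slope and tau > 1e-12:
            tau *= 0.5
        x = x + tau * d
    return x
```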

  • Speeding up FBN by Continuation

    1. In applications of LASSO we have $\|x^\star\|_0 \leq m \ll n$ [8]
    2. If $\lambda_0 := \|\nabla f(x^0)\|_\infty$, then $\operatorname{supp}(x^\star(\lambda_0)) = \emptyset$
    3. We relax the optimization problem, solving

    $P(\bar\lambda)$ : minimize $\tfrac{1}{2}\|Ax - y\|^2 + \bar\lambda\|x\|_1$

    4. Once we have approximately solved $P(\bar\lambda)$ we update $\bar\lambda$ as

    $\bar\lambda \gets \max\{\eta\bar\lambda, \lambda\}$, for some factor $\eta \in (0, 1)$,

    until eventually $\bar\lambda = \lambda$.

    5. This way we enforce that (i) $|\alpha_k|$ increases smoothly, (ii) $|\alpha_k| < m$, (iii) $A_{\alpha_k}^\top A_{\alpha_k}$ remains always positive definite. (A continuation loop is sketched below.)

    [8] The zero-norm of $x$, $\|x\|_0$, is the number of its nonzero entries.
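    A minimal sketch of such a continuation loop; `solve_lasso` is a placeholder for any warm-startable LASSO solver (for instance the FBN sketch above) and the decrease factor `eta` is illustrative.

```python
import numpy as np

def continuation(A, y, lam_target, solve_lasso, eta=0.5):
    """Warm-started continuation: solve a sequence of relaxed LASSO problems
    with decreasing regularization weight until the target lambda is reached."""
    x = np.zeros(A.shape[1])
    # lambda_0 = ||grad f(x0)||_inf with x0 = 0, never below the target value.
    lam_bar = max(np.linalg.norm(A.T @ y, np.inf), lam_target)
    while True:
        x = solve_lasso(A, y, lam_bar, x)               # warm start from previous solution
        if lam_bar <= lam_target:
            return x
        lam_bar = max(eta * lam_bar, lam_target)         # decrease toward the target lambda
```

    Keeping the working support small at every outer iteration is what keeps the Newton systems in FBN cheap to solve.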
