
  • Parallel Computing 59 (2016) 43–59

    Contents lists available at ScienceDirect

    Parallel Computing

    journal homepage: www.elsevier.com/locate/parco

    A parallel min-cut algorithm using iteratively reweighted least squares targeting at problems with floating-point edge weights

    Yao Zhu, David F. Gleich

    Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907, USA

    Article info

    Article history:

    Received 30 March 2015

    Revised 21 November 2015

    Accepted 22 February 2016

    Available online 3 March 2016

    Keywords:

    Undirected graphs

    s–t min-cut

    Iteratively reweighted least squares

    Laplacian systems

    Parallel linear system solvers

    Abstract

    We present a parallel algorithm for the undirected s–t min-cut problem with floating-point valued edge weights. Our overarching algorithm uses an iteratively reweighted least squares framework. Specifically, this algorithm generates a sequence of Laplacian linear systems, which are solved in parallel. The iterative nature of our algorithm lets us trade off solution quality for execution time, in contrast to purely combinatorial algorithms, which only produce optimal solutions. We also propose a novel two-level rounding procedure that helps to enhance the quality of the approximate min-cut solution output by our algorithm. Our overall implementation, including the rounding procedure, demonstrates a significant speed improvement over a state-of-the-art serial solver: it can be up to 200 times faster on commodity platforms.

    © 2016 Elsevier B.V. All rights reserved.

    1. Introduction

    We consider the undirected s–t min-cut problem. Our goal is a practical, scalable, parallel algorithm for problems with hundreds of millions or billions of edges and with floating-point weights on the edges. Additionally, we expect there to be a sequence of such s–t min-cut computations where the difference between successive problems is small. The motivation for such problems arises from a few recent applications, including the FlowImprove method to improve a graph partition [1] and the GraphCut method to segment high-resolution images and MRI scans [2,3]. Both of these applications are limited by the speed of current s–t min-cut solvers and by the fact that most of them cannot handle problems with floating-point edge weights. We seek to accelerate such s–t min-cut computations, especially on floating-point valued instances.

    For the undirected s–t min-cut problem, we present a Parallel Iteratively Reweighted least squares Min-Cut solver, which we call PIRMCut for convenience. This algorithm draws its inspiration from recent theoretical work on using Laplacians and electrical flows to solve max-flow/min-cut in undirected graphs [4–6]. However, our exposition and derivation are entirely self-contained. In contrast to traditional combinatorial solvers for this problem, our method produces an approximate min-cut solution, just like many of the recent theory papers [4–6].

    There are three essential ingredients to our approach. The first is a variational representation of the $\ell_1$-minimization formulation of the undirected s–t min-cut (Section 2.1). This representation allows us to use the iteratively reweighted least squares (IRLS) method to generate a sequence of symmetric diagonally dominant linear systems whose solutions converge to an approximate solution (Theorem 2.6). We show that these systems are equivalent to electrical flow computations (Proposition 2.3). We also prove a Cheeger-type inequality that relates an undirected s–t min-cut to a generalized eigenvalue problem (Theorem 2.7). The second is a parallel implementation of the IRLS algorithm using a parallel linear system solver. The third is a two-level rounding procedure that uses information from the electrical flow solution to generate a much smaller s–t min-cut problem suitable for serial s–t min-cut solvers.

    Corresponding author. Tel.: +1 7654305239. E-mail addresses: yaozhu@purdue.edu (Y. Zhu), dgleich@purdue.edu (D.F. Gleich).

    http://dx.doi.org/10.1016/j.parco.2016.02.003

    The current state-of-the-art combinatorial s–t max-flow/min-cut solvers [7–10] are all based on operating on the residual graph. The residual graph and its associated algorithms usually involve complex data structures and updating procedures, so operations on it result in irregular memory access patterns. Moreover, the directed edges of the residual graph come and go frequently during algorithm execution. This dynamically changing structure further exacerbates the irregularity of the s–t min-cut computation when combinatorial solvers are used. In contrast, PIRMCut reduces the s–t min-cut problem to solving a sequence of Laplacian systems, all with the same fixed nonzero structure. Because only matrix computations are then used, this reduction also removes the need for complex updating procedures. Although it does not necessarily eliminate the irregularity of the application, it does significantly diminish the degree of irregularity. In fact, it has been demonstrated that irregular applications rich in parallel sparse matrix computations can obtain significant speedups on multithreaded platforms [11]. In a broader sense, the algorithmic paradigm embodied by PIRMCut, i.e., reducing irregular graph applications to more regular matrix computations, could facilitate the adoption of modern high performance computing systems and architectures, especially those specifically designed for irregular applications [12–14].
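The reduction described above can be illustrated with a minimal sketch. It assumes a generic smoothed IRLS reweighting, w = 1 / sqrt(r^2 + eps^2), which is one common choice and not necessarily the update used by PIRMCut. Gauss-Seidel sweeps stand in for the parallel linear solver, and a simple threshold stands in for the two-level rounding procedure. The function name `irls_min_cut`, its parameters, and the toy graph are all hypothetical. The point to notice is that the adjacency lists are built once: every system in the sequence shares the same nonzero pattern, and only the numerical values change.

```python
import math

def irls_min_cut(nodes, edges, s, t, outer_iters=30, sweeps=200, eps=1e-6):
    """Generic smoothed IRLS sketch for min ||C B x||_1 with x_s = 1, x_t = 0."""
    # Adjacency is built once: the nonzero pattern of every reweighted
    # Laplacian in the sequence is identical; only the values change.
    neighbors = {v: [] for v in nodes}
    for k, (u, v, _) in enumerate(edges):
        neighbors[u].append((v, k))
        neighbors[v].append((u, k))

    x = {v: 0.5 for v in nodes}
    x[s], x[t] = 1.0, 0.0             # boundary conditions at the terminals
    weights = [1.0] * len(edges)      # IRLS weights, one per edge

    for _outer in range(outer_iters):
        # Edge k enters the weighted least squares problem with value w_k * c_k^2.
        g = [w * c * c for w, (_, _, c) in zip(weights, edges)]
        # Stand-in solver (assumption): Gauss-Seidel sweeps on the Laplacian
        # system, started from the previous iterate.
        for _sweep in range(sweeps):
            for v in nodes:
                if v == s or v == t:
                    continue
                total = sum(g[k] for _, k in neighbors[v])
                x[v] = sum(g[k] * x[u] for u, k in neighbors[v]) / total
        # Smoothed l1 reweighting (assumed form): w = 1 / sqrt(r^2 + eps^2),
        # where r = c * (x_u - x_v) is the scaled difference across the edge.
        weights = [1.0 / math.sqrt((c * (x[u] - x[v])) ** 2 + eps ** 2)
                   for u, v, c in edges]
    return x

# Toy graph with floating-point capacities (hypothetical data).
nodes = ["s", "a", "b", "t"]
edges = [("s", "a", 3.0), ("a", "t", 1.5), ("s", "b", 0.5),
         ("b", "t", 2.0), ("a", "b", 0.25)]

x = irls_min_cut(nodes, edges, "s", "t")
S = {v for v in nodes if x[v] > 0.5}  # simple threshold rounding
print(sorted(S))
```

On this toy graph the unique minimum cut separates {s, a} from {b, t}, so the thresholded iterate should recover that partition. Stopping the outer loop early gives a cheaper but less accurate answer.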

    We have designed and implemented an MPI-based version of PIRMCut and evaluated its performance on both distributed and shared memory machines using a set of test problems consisting of different kinds of graphs. Our solver, PIRMCut, is 200 times faster (using 32 cores) than a state-of-the-art serial s–t min-cut solver on a test graph, with no essential difference in quality. In the experimental results, we also demonstrate the benefit of using warm starts when solving a sequence of related linear systems. We further show the advantage of the proposed two-level rounding procedure over the standard sweep cut in producing better approximate solutions.

    At the moment, we do not have a precise, graph-size based runtime bound for PIRMCut. We also acknowledge that, like most numerical solvers, it is only accurate up to $\varepsilon$. The focus of this paper is investigating and documenting a set of techniques that are principled and could lead to practically fast solutions on real-world s–t min-cut problems. We compare our approach with those of others and discuss some further opportunities for our approach in the related work and discussion sections (Sections 4 and 6).

    2. An IRLS algorithm for undirected s–t min-cut

    In this section, we describe the derivation of the IRLS algorithm for the undirected s–t min-cut problem. We first introduce our notation. Let $G = (V, E)$ be a weighted, undirected graph. In the derivation that follows, the term weight will be reserved to denote the set of weights that result from the IRLS algorithm. Thus, at this point, we refer to the edge weights as capacities, following the terminology of network flow problems. Let $n = |V|$ and $m = |E|$. We require that each undirected edge $\{u, v\} \in E$ has capacity $c(\{u, v\}) > 0$. Let $s$ and $t$ be two distinguished nodes in $G$; we call $s$ the source node and $t$ the sink node. The undirected s–t min-cut problem is to find a partition $V = S \cup \bar{S}$ with $s \in S$ and $t \in \bar{S}$ such that the cut value

    $$\mathrm{cut}(S, \bar{S}) = \sum_{\substack{\{u, v\} \in E \\ u \in S,\ v \in \bar{S}}} c(\{u, v\})$$

    is minimized. Because we are interested in solving the undirected s–t min-cut problem, we assume $G$ is connected. We call the subgraph of $G$ induced on $V \setminus \{s, t\}$ the non-terminal graph and denote it by $\tilde{G} = (\tilde{V}, \tilde{E})$. We call the edges incident to $s$ or $t$ the terminal edges and denote them by $E^T$.
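As a small illustration of the cut value defined above, the following sketch sums the capacities of the edges with exactly one endpoint in S. The function name `cut_value` and the toy graph are assumptions made for illustration only.

```python
def cut_value(capacity, S):
    """Sum the capacities of undirected edges with exactly one endpoint in S."""
    return sum(c for (u, v), c in capacity.items() if (u in S) != (v in S))

# Toy undirected graph with floating-point capacities (hypothetical data).
capacity = {
    ("s", "a"): 3.0,
    ("a", "t"): 1.5,
    ("s", "b"): 0.5,
    ("b", "t"): 2.0,
    ("a", "b"): 0.25,
}

print(cut_value(capacity, {"s"}))       # cuts edges s-a and s-b
print(cut_value(capacity, {"s", "a"}))  # cuts edges a-t, s-b and a-b
```

Any S passed in should contain the source and exclude the sink. The function itself does not check this.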

    The undirected s–t min-cut problem can be formulated as an $\ell_1$-minimization problem. Let $B \in \{-1, 0, 1\}^{m \times n}$ be the edge-node incidence matrix corresponding to an arbitrary orientation of $G$'s edges, and let $C$ be the $m \times m$ diagonal matrix with $c(\{u, v\})$ on the main diagonal. Further let $f = [1\ 0]^T$ and $\Phi^T = [e_s\ e_t]$, where $e_s$ ($e_t$) is the $s$-th ($t$-th) standard basis vector. Then the undirected s–t min-cut problem is

    $$\underset{x}{\text{minimize}} \quad \|CBx\|_1 \quad \text{subject to} \quad \Phi x = f, \quad x \in [0, 1]^n. \tag{1}$$

    Note that in (1) we adopt the constraint $x \in [0, 1]^n$ instead of the integral constraint $x \in \{0, 1\}^n$. This change is justified because once the s–t min-cut problem is converted into a linear program in standard form, the matrix $B$ appears in the constraints. The incidence matrix $B$ is a standard example of a totally unimodular matrix [15]. Thus, such a relaxation does not change the set of integral optimal solutions.
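To make formulation (1) concrete, the sketch below builds the rows of an incidence matrix for an arbitrary edge orientation, evaluates the objective ||CBx||_1 on 0/1 vectors, and enumerates all integral assignments of a toy graph. It is a brute-force check for tiny inputs only, not a solver, and the helper names and graph data are hypothetical.

```python
from itertools import product

nodes = ["s", "a", "b", "t"]
index = {name: i for i, name in enumerate(nodes)}
# (tail, head, capacity) under an arbitrary orientation (hypothetical data).
edges = [("s", "a", 3.0), ("a", "t", 1.5), ("s", "b", 0.5),
         ("b", "t", 2.0), ("a", "b", 0.25)]

def incidence_rows(edges, index):
    """Rows of B: +1 at the tail and -1 at the head of each oriented edge."""
    rows = []
    for u, v, _ in edges:
        row = [0] * len(index)
        row[index[u]] = 1
        row[index[v]] = -1
        rows.append(row)
    return rows

def l1_objective(edges, B, x):
    """Evaluate ||C B x||_1 with C = diag(capacities)."""
    return sum(c * abs(sum(b * xi for b, xi in zip(row, x)))
               for (_, _, c), row in zip(edges, B))

def brute_force_min_cut(edges, index, s="s", t="t"):
    """Enumerate integral x with x_s = 1 and x_t = 0; return (value, x)."""
    B = incidence_rows(edges, index)
    free = [name for name in index if name not in (s, t)]
    best = None
    for bits in product([0, 1], repeat=len(free)):
        x = [0] * len(index)
        x[index[s]] = 1               # equality constraint in (1): x_s = 1, x_t = 0
        for name, bit in zip(free, bits):
            x[index[name]] = bit
        value = l1_objective(edges, B, x)
        if best is None or value < best[0]:
            best = (value, x)
    return best

value, x = brute_force_min_cut(edges, index)
print(value, x)
```

For a 0/1 vector x, each row of Bx is nonzero exactly when the edge crosses the partition, so the objective equals the cut value of S = {v : x_v = 1}.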


    2.1. The IRL