
Parallel Computing 59 (2016) 43–59

Contents lists available at ScienceDirect

Parallel Computing

journal homepage: www.elsevier.com/locate/parco

A parallel min-cut algorithm using iteratively reweighted least squares targeting at problems with floating-point edge weights

Yao Zhu, David F. Gleich

Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN 47907, USA

Article info

Article history:

Received 30 March 2015

Revised 21 November 2015

Accepted 22 February 2016

Available online 3 March 2016

Keywords:

Undirected graphs

s-t min-cut

Iteratively reweighted least squares

Laplacian systems

Parallel linear system solvers

Abstract

We present a parallel algorithm for the undirected s-t min-cut problem with floating-point valued edge weights. Our overarching algorithm uses an iteratively reweighted least squares framework. Specifically, this algorithm generates a sequence of Laplacian linear systems, which are solved in parallel. The iterative nature of our algorithm enables us to trade off solution quality for execution time, which distinguishes it from purely combinatorial algorithms that only produce solutions at optimum. We also propose a novel two-level rounding procedure that helps to enhance the quality of the approximate min-cut solution output by our algorithm. Our overall implementation, including the rounding procedure, demonstrates significant speed improvement over a state-of-the-art serial solver, where it could be up to 200 times faster on commodity platforms.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

We consider the undirected s-t min-cut problem. Our goal is a practical, scalable, parallel algorithm for problems with hundreds of millions or billions of edges and with floating-point weights on the edges. Additionally, we expect there to be a sequence of such s-t min-cut computations where the difference between successive problems is small. The motivation for such problems arises from a few recent applications, including the FlowImprove method to improve a graph partition [1] and the GraphCut method to segment high-resolution images and MRI scans [2,3]. Both of these applications are limited by the speed of current s-t min-cut solvers, and by the fact that most of them cannot handle problems with floating-point edge weights. We seek to accelerate such s-t min-cut computations, especially on floating-point valued instances.

For the undirected s-t min-cut problem, we present a Parallel Iteratively Reweighted least squares Min-Cut solver, which we call PIRMCut for convenience. This algorithm draws its inspiration from the recent theoretical work on using Laplacians and electrical flows to solve max-flow/min-cut in undirected graphs [4–6]. However, our exposition and derivation are entirely self-contained. In contrast to traditional combinatorial solvers for this problem, our method produces an approximate min-cut solution, just like many of the recent theory papers [4–6].

Corresponding author. Tel.: +1 7654305239. E-mail addresses: yaozhu@purdue.edu (Y. Zhu), dgleich@purdue.edu (D.F. Gleich).

http://dx.doi.org/10.1016/j.parco.2016.02.003

0167-8191/© 2016 Elsevier B.V. All rights reserved.

There are three essential ingredients to our approach. The first is a variational representation of the $\ell_1$-minimization formulation of the undirected s-t min-cut (Section 2.1). This representation allows us to use the iteratively reweighted least squares (IRLS) method to generate a sequence of symmetric diagonally dominant linear systems whose solutions converge to an approximate solution (Theorem 2.6). We show that solving these systems is equivalent to computing electrical flows (Proposition 2.3). We also prove a Cheeger-type inequality that relates an undirected s-t min-cut to a generalized eigenvalue problem (Theorem 2.7). The second is a parallel implementation of the IRLS algorithm using a parallel linear system solver. The third is a two-level rounding procedure that uses information from the electrical flow solution to generate a much smaller s-t min-cut problem suitable for serial s-t min-cut solvers.
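To make the first ingredient concrete, the following minimal sketch runs a generic smoothed IRLS iteration on a four-node toy graph. It is not the PIRMCut implementation. The smoothing parameter `eps`, the dense Gaussian-elimination solver, the toy graph, and the plain 0.5 threshold used for rounding are all assumptions made for illustration; PIRMCut itself relies on a parallel linear solver and a two-level rounding procedure.

```python
def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (adequate for tiny systems)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def irls_min_cut(n, edges, s, t, iterations=50, eps=1e-6):
    """Approximately minimize sum_e c_e * |x_u - x_v| with x_s = 1, x_t = 0."""
    interior = [v for v in range(n) if v not in (s, t)]
    pos = {v: i for i, v in enumerate(interior)}
    x = [0.0] * n
    x[s] = 1.0
    weights = [c for _, _, c in edges]  # first solve uses the capacities
    for _ in range(iterations):
        # Assemble the weighted Laplacian system restricted to non-terminal nodes.
        A = [[0.0] * len(interior) for _ in interior]
        b = [0.0] * len(interior)
        for (u, v, _cap), w in zip(edges, weights):
            for a, other in ((u, v), (v, u)):
                if a in pos:
                    A[pos[a]][pos[a]] += w
                    if other in pos:
                        A[pos[a]][pos[other]] -= w
                    else:
                        b[pos[a]] += w * x[other]  # fixed terminal value
        sol = solve_dense(A, b)
        for v, i in pos.items():
            x[v] = sol[i]
        # Reweight: w_e = c_e / sqrt((x_u - x_v)^2 + eps^2)  (assumed smoothing).
        weights = [c / ((x[u] - x[v]) ** 2 + eps ** 2) ** 0.5 for u, v, c in edges]
    return x


# Toy graph (hypothetical): edges are (u, v, capacity), s = 0, t = 3.
edges = [(0, 1, 2.5), (0, 2, 1.0), (1, 2, 0.5), (1, 3, 1.5), (2, 3, 3.0)]
x = irls_min_cut(4, edges, s=0, t=3)
S = {v for v in range(4) if x[v] > 0.5}  # simple threshold rounding
value = sum(c for u, v, c in edges if (u in S) != (v in S))
print(sorted(S), value)
```

Every iteration solves a system with the same nonzero structure; only the numerical weights change, which is the property the text contrasts with the dynamically changing residual graph.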

The current state-of-the-art combinatorial s-t max-flow/min-cut solvers [7–10] are all based on operating on the residual graph. The residual graph and its associated algorithms usually involve complex data structures and updating procedures, so operations on it result in irregular memory access patterns. Moreover, the directed edges of the residual graph come and go frequently during algorithm execution. This dynamically changing structure of the residual graph further exacerbates the irregularity of the s-t min-cut computation if the combinatorial solvers are used. In contrast, PIRMCut reduces the s-t min-cut problem to solving a sequence of Laplacian systems, all with the same fixed nonzero structure. Because only matrix computations are then used, such a reduction also removes the need for complex updating procedures. Although it does not necessarily eliminate the irregularity of the application, it does significantly diminish the degree of irregularity. In fact, it has been demonstrated that irregular applications rich in parallel sparse matrix computations can obtain significant speedups on multithreaded platforms [11]. In a broader sense, the algorithmic paradigm embodied by PIRMCut, i.e., reducing irregular graph applications to more regular matrix computations, could facilitate the adoption of modern high performance computing systems and architectures, especially those specifically designed for irregular applications [12–14].

We have designed and implemented an MPI-based version of PIRMCut and evaluated its performance on both distributed and shared memory machines using a set of test problems consisting of different kinds of graphs. Our solver, PIRMCut, is 200 times faster (using 32 cores) than a state-of-the-art serial s-t min-cut solver on a test graph, with no essential difference in quality. In the experimental results, we also demonstrate the benefit of using warm starts when solving a sequence of related linear systems. We further show the advantage of the proposed two-level rounding procedure over the standard sweep cut in producing better approximate solutions.

At the moment, we do not have a precise runtime bound for PIRMCut in terms of the graph size. We also acknowledge that, like most numerical solvers, it is only accurate up to a tolerance $\epsilon$. The focus of this paper is investigating and documenting a set of techniques that are principled and could lead to practically fast solutions on real-world s-t min-cut problems. We compare our approach with those of others and discuss some further opportunities for our approach in the related work and discussion sections (Sections 4 and 6).

2. An IRLS algorithm for undirected s-t min-cut

In this section, we describe the derivation of the IRLS algorithm for the undirected s-t min-cut problem. We first introduce our notation. Let $G = (V, E)$ be a weighted, undirected graph. In the derivation that follows, the term weight will be reserved to denote the set of weights that result from the IRLS algorithm. Thus, at this point, we wish to refer to the edge weights as capacities, following the terminology in network flow problems. Let $n = |V|$ and $m = |E|$. We require that each undirected edge $\{u, v\} \in E$ has capacity $c(\{u, v\}) > 0$. Let $s$ and $t$ be two distinguished nodes in $G$; we call $s$ the source node and $t$ the sink node. The problem of undirected s-t min-cut is to find a partition $V = S \cup \bar{S}$ with $s \in S$ and $t \in \bar{S}$ such that the cut value

$$\operatorname{cut}(S, \bar{S}) = \sum_{\substack{\{u, v\} \in E \\ u \in S,\ v \in \bar{S}}} c(\{u, v\})$$

is minimized. In the interest of solving the undirected s-t min-cut problem, we assume $G$ to be connected. We call the subgraph of $G$ induced on $V \setminus \{s, t\}$ the non-terminal graph, and denote it by $\tilde{G} = (\tilde{V}, \tilde{E})$. We call the edges incident to $s$ or $t$ the terminal edges, and denote them by $E^T$.
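The cut value defined above can be evaluated directly from an edge list. The sketch below does so and, for a tiny graph only, finds the minimum by exhaustive enumeration. The toy graph and the function names `cut_value` and `brute_force_min_cut` are hypothetical and serve only to illustrate the definition; enumeration is exponential in the number of non-terminal nodes and is not how any practical solver works.

```python
from itertools import combinations


def cut_value(edges, S):
    """Sum the capacities of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(c for u, v, c in edges if (u in S) != (v in S))


def brute_force_min_cut(n, edges, s, t):
    """Enumerate every S with s in S and t not in S (tiny graphs only)."""
    others = [v for v in range(n) if v not in (s, t)]
    best_value, best_S = float("inf"), None
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            S = {s, *extra}
            value = cut_value(edges, S)
            if value < best_value:
                best_value, best_S = value, S
    return best_value, best_S


# Toy graph (hypothetical): edges are (u, v, capacity), s = 0, t = 3.
edges = [(0, 1, 2.5), (0, 2, 1.0), (1, 2, 0.5), (1, 3, 1.5), (2, 3, 3.0)]
value, S = brute_force_min_cut(4, edges, s=0, t=3)
print(value, sorted(S))  # 3.0 [0, 1]
```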

The undirected s-t min-cut problem can be formulated as an $\ell_1$-minimization problem. Let $B \in \{-1, 0, 1\}^{m \times n}$ be the edge-node incidence matrix corresponding to an arbitrary orientation of $G$'s edges, and let $C$ be the $m \times m$ diagonal matrix with $c(\{u, v\})$ on the main diagonal. Further let $f = [1 \;\; 0]^T$ and $\Phi^T = [e_s \;\; e_t]$, where $e_s$ ($e_t$) is the $s$-th ($t$-th) standard basis vector. Then the undirected s-t min-cut problem is

$$\underset{x}{\text{minimize}} \;\; \|CBx\|_1 \quad \text{subject to} \quad \Phi x = f, \quad x \in [0, 1]^n. \tag{1}$$
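As a quick check of formulation (1), the sketch below builds the incidence matrix $B$ row by row for a toy graph and evaluates $\|CBx\|_1$. For a 0/1 indicator vector with $x_s = 1$ and $x_t = 0$, the objective equals the cut value of the corresponding partition. The graph, the chosen edge orientation, and the function names are assumptions made for illustration.

```python
def incidence_rows(n, edges):
    """Rows of B: +1 at u and -1 at v for each edge oriented from u to v."""
    rows = []
    for u, v, _ in edges:
        row = [0] * n
        row[u], row[v] = 1, -1
        rows.append(row)
    return rows


def l1_objective(n, edges, x):
    """Evaluate ||C B x||_1, i.e. the sum over edges of c_e * |x_u - x_v|."""
    B = incidence_rows(n, edges)
    caps = [c for _, _, c in edges]  # diagonal of C
    Bx = [sum(b * xi for b, xi in zip(row, x)) for row in B]
    return sum(c * abs(d) for c, d in zip(caps, Bx))


# Toy graph (hypothetical): edges are (u, v, capacity), s = 0, t = 3.
edges = [(0, 1, 2.5), (0, 2, 1.0), (1, 2, 0.5), (1, 3, 1.5), (2, 3, 3.0)]

x_indicator = [1, 1, 0, 0]        # indicator of S = {0, 1}
x_fractional = [1, 0.5, 0.5, 0]   # a feasible but non-integral point

print(l1_objective(4, edges, x_indicator))   # 3.0, the cut value of S = {0, 1}
print(l1_objective(4, edges, x_fractional))  # 4.0
```

The orientation of each edge does not matter here because only the absolute value of each entry of $Bx$ enters the objective.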

Note that in (1) we adopt the constraint $x \in [0, 1]^n$ instead of the integral constraint $x \in \{0, 1\}^n$. This change is justified because once the s-t min-cut problem is converted into a linear program in standard form, the matrix $B$ appears in the constraints. The incidence matrix $B$ is a standard example of a totally unimodular matrix [15]. Thus, such a relaxation does not change the set of integral optimal solutions.
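The relaxation can also be sanity-checked numerically with an elementary threshold argument, which is a different route from the total unimodularity argument above. For any feasible $x$, the objective in (1) is a weighted average of the cut values of the sets $\{v : x_v > \tau\}$ over thresholds $\tau \in (0, 1)$, so at least one threshold cut is no worse than the fractional objective. The sketch below checks this on a toy graph; the graph, the fractional point, and the function names are hypothetical.

```python
def l1_objective(edges, x):
    """Objective of (1): sum over edges of c_e * |x_u - x_v|."""
    return sum(c * abs(x[u] - x[v]) for u, v, c in edges)


def cut_value(edges, S):
    """Sum the capacities of edges with exactly one endpoint in S."""
    return sum(c for u, v, c in edges if (u in S) != (v in S))


def best_threshold_cut(edges, x):
    """Try S = {v : x_v > tau} for one tau between consecutive values of x."""
    levels = sorted(set(x))
    best = None
    for lo, hi in zip(levels, levels[1:]):
        tau = (lo + hi) / 2
        S = {v for v, xv in enumerate(x) if xv > tau}
        value = cut_value(edges, S)
        if best is None or value < best[0]:
            best = (value, S)
    return best


# Toy graph (hypothetical): edges are (u, v, capacity), s = 0, t = 3.
edges = [(0, 1, 2.5), (0, 2, 1.0), (1, 2, 0.5), (1, 3, 1.5), (2, 3, 3.0)]
x = [1.0, 0.7, 0.2, 0.0]  # feasible fractional point: x_s = 1, x_t = 0

objective = l1_objective(edges, x)
value, S = best_threshold_cut(edges, x)
print(value <= objective, sorted(S))
```

Because $x_s = 1$ and $x_t = 0$, every threshold set contains $s$ and excludes $t$, so each candidate is a valid s-t cut.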


2.1. The IRL