Graph Partitioning for High-Performance Scientific Simulations Advanced Topics Spring 2008 Prof. Robert van Engelen

…engelen/courses/HPC-adv-2008/GraphPartitioning.pdf, 2/4/08, HPC Fall 2007


Graph Partitioning for High-Performance Scientific Simulations

Advanced Topics Spring 2008

Prof. Robert van Engelen


HPC Fall 2007, 2/4/08

Overview

Challenges for irregular meshes

Modeling mesh-based computations as graphs

Static graph-partitioning techniques:
  Geometric techniques
  Combinatorial techniques
  Spectral methods
  Multilevel schemes

Load balancing of adaptive computations:
  Scratch-remap repartitioners
  Diffusion-based repartitioners

Multiconstraint graph partitioning

Publicly available software packages


Challenges for Irregular Meshes

How to generate irregular meshes in the first place?
  Triangulation, 2D/3D finite-element mesh

How to partition them?
  Optimal graph partitioning is an NP-complete problem
  Heuristic edge-cut methods
  Sequential or parallel partitioning methods?
  Static or dynamic (adaptive) partitioning methods?

How to design iterative solvers?
  PETSc, Prometheus, …

How to design direct solvers?
  Reordering the adjacency matrix, e.g. reverse Cuthill-McKee
  SuperLU, parallel Gaussian elimination, …


Finite-Element Mesh Example

Image source: MATLAB 7.5 NASA airfoil demo

NASA airfoil finite-element mesh, 4253 grid points


Adjacency Matrix and Reverse Cuthill-McKee Reordering

Image source: MATLAB 7.5 NASA airfoil demo

Figure panels: adjacency matrix; reverse Cuthill-McKee reordering


Matrix - Graph Relationship

Figure: 8 × 8 sparse matrix with rows and columns numbered 1 to 8 (non-zeros shown as 1s), next to the corresponding graph on vertices 1 to 8

Can reorder rows and columns of matrix
  Non-zeros outside of blocks require communication

An optimal partition of the graph for parallel computation:
  Balanced number of vertices in each subdomain
  Minimum edge-cut (number of edges between subdomains)


Node Graph and Dual

Graph models the structure of the computation

Node graph (mesh nodes compute):
  Vertices = mesh nodes (nodes compute)
  Edges = communication between nodes

Dual graph (mesh elements compute):
  Vertices = mesh elements
  Edges = communication between adjacent mesh elements

Figure panels: 2D irregular mesh; node graph; dual graph


Goals of Partitioning

Balance computational load such that each processor has the same execution time

Balance storage such that each processor has the same storage demands

Minimize communication volume between subdomains, along the edges of the mesh

Figure: example 3-way partition with edge-cut = 9 (cuts labeled 4-cut and 5-cut)


Partitioning Problem

Let G = (V, E) be a weighted undirected graph with weight functions w_V: V → R+ and w_E: E → R+

The k-way graph partitioning problem is to split V into k disjoint subsets S_j, j = 1, …, k (subdomains) such that:
  Balance constraint: ∑_{v ∈ S_j} w_V(v) is roughly equal for all j = 1, …, k
  Min-cut (minimum edge-cut): ∑_{(u,v) ∈ E, u ∈ S_i, v ∈ S_j, i ≠ j} w_E((u,v)) is minimal

The weight functions are defined such that:
  w_V models CPU loads
  w_E models communication volumes (+ fixed startup latency)

Can add subdomain weights to improve mapping to heterogeneous clusters of workstations
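The two quantities in the problem statement can be evaluated directly for a given partition. The following is a minimal sketch, not part of the original slides; the function names, the edge-list representation and the 6-vertex example graph are all assumptions made for illustration.

```python
from collections import defaultdict

def edge_cut(edges, part):
    """Sum w_E over edges whose endpoints lie in different subdomains."""
    return sum(w for u, v, w in edges if part[u] != part[v])

def subdomain_loads(w_v, part):
    """Sum w_V over the vertices of each subdomain."""
    loads = defaultdict(float)
    for v, w in w_v.items():
        loads[part[v]] += w
    return dict(loads)

# Hypothetical graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 1.0)]
w_v = {v: 1.0 for v in range(6)}
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

cut = edge_cut(edges, part)          # only edge (2, 3) crosses the partition
loads = subdomain_loads(w_v, part)   # three unit-weight vertices per side
```

A partitioner searches over `part` to keep `loads` roughly equal while making `cut` as small as it can.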


Edge Cuts Versus Communication Steps

Min cut ≠ optimal communication

Communication pattern does not follow edges exactly

Edge-cut: 7

Communication steps: 9

A sends 1 to B
A sends 3 to B
A sends 4 to C
B sends 5 to A
B sends 7 to A
B sends 7 to C
B sends 8 to C
C sends 9 to B
C sends 10 to A
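The gap between edge-cut and communication steps can be reproduced on a small graph. The sketch below assumes that each boundary vertex sends its value once to every neighboring subdomain, however many cut edges lead there; the graph is hypothetical and is not the one in the slide's figure.

```python
def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different subdomains."""
    return sum(1 for u, v in edges if part[u] != part[v])

def communication_steps(edges, part):
    """Count distinct (vertex, destination subdomain) sends."""
    sends = set()
    for u, v in edges:
        if part[u] != part[v]:
            sends.add((u, part[v]))  # u's value is needed in v's subdomain
            sends.add((v, part[u]))  # v's value is needed in u's subdomain
    return len(sends)

# Hypothetical graph: vertex 0 (in A) has two neighbors in B and one in C.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
part = {0: "A", 1: "B", 2: "B", 3: "C"}

cut = edge_cut(edges, part)
steps = communication_steps(edges, part)
```

Here vertex 0 has two cut edges into B but only one send to B, so the two measures count different things.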


Static Graph Partitioning

Geometric techniques
  Coordinate Nested Dissection (CND)
  Recursive Inertial Bisection (RIB)
  Space-filling curve techniques
  Sphere-cutting approach

Combinatorial techniques
  Levelized Nested Dissection (LND)
  Kernighan-Lin/Fiduccia-Mattheyses (KL/FM)

Spectral methods

Multilevel schemes
  Multilevel recursive bisection
  Multilevel k-way partitioning


Geometric Techniques

Partition solely based on coordinate information
  Recursively bisects mesh into increasingly smaller subdomains

Applicable when coordinate system exists or can be constructed

Have no concept of edge cut, so no communication optimization

Geometric techniques are typically fast

May produce disconnected subdomains for meshes with complex geometry


CND

Coordinate Nested Dissection (CND) is a geometric method:
  Compute centers of mass of mesh elements
  Project onto axis of longest dimension
  Bisect the list of centers
  Repeat recursively

Good: fast and easy to parallelize
Bad: low quality partitioning
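The CND steps can be sketched in a few lines. This is an illustrative sketch only: the function name `cnd`, the `levels` parameter (number of recursive bisections) and the grid of element centers are assumptions, and the split is a plain median split on the sorted list.

```python
def cnd(points, ids, levels):
    """Recursively bisect point ids along the axis of longest extent."""
    if levels == 0 or len(ids) <= 1:
        return [ids]
    dims = len(points[ids[0]])
    # Extent of the point set along each coordinate axis.
    extent = [max(points[i][d] for i in ids) - min(points[i][d] for i in ids)
              for d in range(dims)]
    axis = extent.index(max(extent))
    # Project onto the chosen axis and bisect the sorted list of centers.
    ordered = sorted(ids, key=lambda i: points[i][axis])
    half = len(ordered) // 2
    return (cnd(points, ordered[:half], levels - 1) +
            cnd(points, ordered[half:], levels - 1))

# Hypothetical element centers on a 4 x 2 grid; point i = (i // 2, i % 2).
points = [(x, y) for x in range(4) for y in range(2)]
parts = cnd(points, list(range(len(points))), levels=2)
```

Two levels of bisection give a 4-way partition; no adjacency information is consulted at any point.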


CND

Mesh is bisected normal to the axis of the longest dimension
  Results in smaller subdomain boundaries
  Approximates smaller data volume exchanges

Figure panels: bisected normal to the x-axis; bisected normal to the y-axis


RIB

Recursive Inertial Bisection (RIB) orients the bisection to minimize the subdomain boundary
  Mesh elements are converted into point masses
  Compute principal inertial axis of the mass distribution
  Bisect orthogonal to principal inertial axis
  Repeat recursively

Fast and better quality than CND
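One bisection step of RIB can be sketched for the 2D case with unit point masses. The sketch assumes the principal inertial axis is the direction of greatest spread of the points, obtained here from the closed-form angle of the 2 × 2 scatter matrix; the function names and the sample points are hypothetical.

```python
import math

def principal_axis(points):
    """Centroid and unit direction of greatest spread for 2D unit masses."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points)
    syy = sum((p[1] - cy) ** 2 for p in points)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))

def rib_bisect(points):
    """Split point indices at the median of their projection onto the axis."""
    (cx, cy), (ax, ay) = principal_axis(points)
    order = sorted(range(len(points)),
                   key=lambda i: (points[i][0] - cx) * ax + (points[i][1] - cy) * ay)
    half = len(order) // 2
    return order[:half], order[half:]

# Hypothetical element centers stretched along a diagonal.
points = [(0, 0), (1, 1.2), (2, 1.8), (3, 3.1), (4, 3.9), (5, 5.2)]
left, right = rib_bisect(points)
```

Because the cut is orthogonal to the diagonal rather than to a coordinate axis, it follows the shape of the domain, which is the advantage over CND noted above.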

Figure panels: CND; RIB


Space-Filling Curve Techniques

CND and RIB consider only a single dimension at a time

Space-filling curve techniques follow points of mass of mesh elements using locality-preserving curves, e.g. Peano-Hilbert curves
  Curve segments fill higher-dimensional subspaces
  Split curve into k parts resulting in k subdomains

Good: fast and generally better quality than CND and RIB
Bad: works better for certain types of problems, e.g. n-body sims
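A space-filling curve partition can be sketched by sorting points by their position along a Hilbert curve and cutting the sorted list into k contiguous pieces. The index computation below is the widely used bit-manipulation formulation for an n × n grid with n a power of two. The names `hilbert_index` and `sfc_partition` and the 4 × 4 grid of cells are assumptions for illustration.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def sfc_partition(cells, n, k):
    """Order cells along the curve and split the order into k contiguous parts."""
    ordered = sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))
    bounds = [i * len(ordered) // k for i in range(k + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]

# Hypothetical mesh: one element center per cell of a 4 x 4 grid.
cells = [(x, y) for x in range(4) for y in range(4)]
parts = sfc_partition(cells, 4, 4)
```

Each part is a contiguous stretch of the curve, so cells in the same part stay spatially close in both dimensions at once.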

Figure: 8-way partition using a Peano-Hilbert space-filling curve


The Sphere-Cutting Approach

Approach separates mesh by vertices in overlap graph
  Create (α,k)-overlap graph, with k-ply neighborhood system
  Overlap graphs have O(n^((d-1)/d)) vertex separators
  No point is contained in more than k d-dimensional spheres
  Nodes are connected when spheres, sized up by α, intersect
  Project overlap graph onto (d+1)-dimensional sphere, which is cut

Figure labels: mesh nodes; 3-ply neighborhood system; (1,3)-overlap graph


Combinatorial Techniques

The problem with geometric techniques is that they group vertices that are spatially close using a coordinate system, whether nodes are connected or not

Combinatorial partitioning techniques use adjacency information
  Smaller edge-cuts
  Reasonably fast
  But not easily parallelizable


LND

Levelized Nested Dissection (LND)

Select vertex v0, preferably a peripheral vertex

For each vertex, compute vertex distance to v0 using a breadth-first search starting from v0

When half of the vertices have been assigned, split graph in two (assigned, unassigned)

Can be repeated with trials of v0 to improve edge-cut
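The LND bisection can be sketched as a breadth-first search that stops once half of the vertices have been visited. The sketch assumes a connected graph given as adjacency lists; the function name and the 6-vertex path graph are hypothetical.

```python
from collections import deque

def lnd_bisect(adj, v0):
    """Grow a subdomain from v0 in breadth-first order until it holds half the vertices."""
    target = len(adj) // 2
    assigned = []
    seen = {v0}
    queue = deque([v0])
    while queue and len(assigned) < target:
        u = queue.popleft()
        assigned.append(u)
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    first = set(assigned)
    return first, set(adj) - first

# Hypothetical path graph 0 - 1 - 2 - 3 - 4 - 5.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
from_end, rest_end = lnd_bisect(adj, 0)      # peripheral start vertex
from_mid, rest_mid = lnd_bisect(adj, 2)      # interior start vertex
```

Starting from the peripheral vertex 0 cuts one edge, while starting from the interior vertex 2 cuts two, which is why a peripheral v0 is preferred and why several trials of v0 can help.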

Figure labels: LND cut; min-cut


KL Partition Refinement

Kernighan-Lin (KL) partition refinement
  Given partition of vertices V in disjoint sets A and B
  Find X ⊆ A and Y ⊆ B such that swapping X to B and Y to A yields the greatest reduction in edge-cut
  Finding optimal X and Y is intractable
  Perform repeated passes over V, taking O(|V|^2) time in each pass
  Each pass swaps two vertices, one from A with one from B
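A single pair swap of the kind described above can be sketched as follows. The sketch assumes unit edge weights and uses the standard KL gain for exchanging a and b: D(a) + D(b) minus twice the weight of edge (a, b), where D(v) is the number of external edges of v minus its internal edges. It shows one swap only, not a full KL implementation, and the graph and starting partition are hypothetical.

```python
def cut_size(adj, side):
    """Number of edges crossing the bisection (each edge counted once)."""
    return sum(1 for u in adj for v in adj[u] if u < v and side[u] != side[v])

def d_value(adj, side, v):
    """External minus internal edges of v (unit edge weights)."""
    external = sum(1 for u in adj[v] if side[u] != side[v])
    return external - (len(adj[v]) - external)

def best_swap(adj, side):
    """Return (gain, a, b) for the pair whose exchange reduces the cut most."""
    best = (0, None, None)
    for a in (v for v in adj if side[v] == 0):
        for b in (v for v in adj if side[v] == 1):
            gain = d_value(adj, side, a) + d_value(adj, side, b) - 2 * (b in adj[a])
            if gain > best[0]:
                best = (gain, a, b)
    return best

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by edge (2, 3),
# with vertices 2 and 3 initially on the wrong sides.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
side = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}

before = cut_size(adj, side)
gain, a, b = best_swap(adj, side)
side[a], side[b] = side[b], side[a]
after = cut_size(adj, side)
```

Examining every pair (a, b) is what makes the search quadratic in the number of vertices.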

Figure panels: before KL pass (edge-cut: 6); after KL pass (edge-cut: 3)


KL/FM Partition Refinement

Kernighan-Lin/Fiduccia-Mattheyses (KL/FM) partition refinement

FM considers moving one vertex at a time

Works by annotating each vertex along the partition with gain/loss values

Move vertex that gives gain

May move vertex even when gain is negative
  Avoid local minimum
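An FM-style pass can be sketched as: repeatedly move the unlocked vertex with the highest gain (even when that gain is negative), lock it, and at the end keep the best partition seen during the pass. This is a simplified sketch under stated assumptions: unit edge weights, a size-difference `tolerance` as the balance rule, and a linear scan instead of the bucket data structure that real FM implementations use. All names and the example graph are hypothetical.

```python
def cut_size(adj, side):
    """Number of edges crossing the bisection (each edge counted once)."""
    return sum(1 for u in adj for v in adj[u] if u < v and side[u] != side[v])

def gain(adj, side, v):
    """Reduction in edge-cut if v alone moves to the other side."""
    external = sum(1 for u in adj[v] if side[u] != side[v])
    return external - (len(adj[v]) - external)

def fm_pass(adj, side, tolerance=2):
    """One pass of single-vertex moves; returns the best partition seen."""
    side = dict(side)
    cut = cut_size(adj, side)
    best_cut, best_side = cut, dict(side)
    locked = set()
    while len(locked) < len(adj):
        size = [sum(1 for s in side.values() if s == p) for p in (0, 1)]
        # Unlocked vertices whose move keeps the size difference within tolerance.
        movable = [v for v in adj if v not in locked and
                   abs((size[side[v]] - 1) - (size[1 - side[v]] + 1)) <= tolerance]
        if not movable:
            break
        v = max(movable, key=lambda x: gain(adj, side, x))
        cut -= gain(adj, side, v)      # a negative gain increases the cut
        side[v] = 1 - side[v]
        locked.add(v)
        if cut < best_cut:
            best_cut, best_side = cut, dict(side)
    return best_side, best_cut

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
start = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
refined, refined_cut = fm_pass(adj, start)
```

In this run the pass continues with negative-gain moves after reaching edge-cut 1, and those later moves are discarded when the best partition is restored.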


KL/FM


Spectral Methods

Given graph G with adjacency matrix A
  The discrete Laplacian matrix is L_G = A - D, with D the diagonal matrix such that D_ii = degree of vertex i
  Largest eigenvalue of L_G is 0
  Second largest eigenvalue measures graph connectivity
  The Fiedler vector (the eigenvector corresponding to the second largest eigenvalue) measures distance between vertices

Sort Fiedler vector and split graph according to the sorted list of vertices

Recursive spectral bisection computes a k-way partition recursively
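Spectral bisection can be sketched without a linear-algebra library by using power iteration. The sketch keeps the sign convention L_G = A - D used above, adds a positive shift so that all eigenvalues become positive, and removes the constant eigenvector (eigenvalue 0) at every step so that the iteration converges to the Fiedler vector. The shift value, the iteration count, the starting vector and the example graph are assumptions; production codes typically use Lanczos-type eigensolvers instead.

```python
import math

def fiedler_vector(adj, iters=200):
    """Approximate the Fiedler vector of L_G = A - D by shifted power iteration."""
    n = len(adj)
    shift = 2 * max(len(nbrs) for nbrs in adj) + 1   # makes L_G + shift*I positive definite
    x = [i - (n - 1) / 2 for i in range(n)]          # start orthogonal to the constant vector
    for _ in range(iters):
        # y = (A - D + shift*I) x
        y = [sum(x[j] for j in adj[i]) - len(adj[i]) * x[i] + shift * x[i]
             for i in range(n)]
        mean = sum(y) / n
        y = [val - mean for val in y]                # deflate the eigenvalue-0 eigenvector
        norm = math.sqrt(sum(val * val for val in y))
        x = [val / norm for val in y]
    return x

def spectral_bisect(adj):
    """Sort vertices by Fiedler-vector entry and split the sorted list in half."""
    f = fiedler_vector(adj)
    order = sorted(range(len(adj)), key=lambda i: f[i])
    half = len(order) // 2
    return set(order[:half]), set(order[half:])

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by edge (2, 3).
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
part_a, part_b = spectral_bisect(adj)
```

Applying `spectral_bisect` again to each half would give the recursive k-way scheme mentioned above.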


Spectral Methods

Figure panels: input graph; partitioned graph


Multilevel Schemes

Multilevel schemes consist of recursively applying:

1. Graph coarsening

2. Initial partitioning

3. Partition refinement

Multilevel recursive bisection
  Coarsening collapses vertices in pairs
    Pick random pairs
    Or pick heavy-edge pairs
  Partitioning is performed on coarsest graph (smallest)
  Refinement with KL/FM
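One coarsening level with heavy-edge matching can be sketched as follows. Each unmatched vertex is paired with the unmatched neighbor joined to it by the heaviest edge, and the pairs are then collapsed into single coarse vertices. The sketch visits vertices in dictionary order for reproducibility (an assumption; implementations usually visit them in random order), and the weighted square graph is hypothetical.

```python
def heavy_edge_matching(adj_w):
    """Pair each unmatched vertex with its heaviest-edge unmatched neighbor."""
    matched = set()
    groups = []
    for u in adj_w:
        if u in matched:
            continue
        free = [v for v in adj_w[u] if v not in matched]
        if free:
            v = max(free, key=lambda x: adj_w[u][x])
            matched.update((u, v))
            groups.append((u, v))
        else:
            matched.add(u)
            groups.append((u,))        # no free neighbor: vertex stays single
    return groups

def coarsen(adj_w, groups):
    """Collapse each group into one coarse vertex, summing parallel edge weights."""
    owner = {v: i for i, group in enumerate(groups) for v in group}
    coarse = {i: {} for i in range(len(groups))}
    for u in adj_w:
        for v, w in adj_w[u].items():
            cu, cv = owner[u], owner[v]
            if cu != cv:
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse

# Hypothetical weighted square: heavy edges (0,2) and (1,3), light edges (0,1) and (2,3).
adj_w = {0: {1: 1, 2: 5}, 1: {0: 1, 3: 5}, 2: {0: 5, 3: 1}, 3: {1: 5, 2: 1}}
groups = heavy_edge_matching(adj_w)
coarse = coarsen(adj_w, groups)
```

The heavy edges disappear inside the coarse vertices, so only light edges remain to be cut when the coarsest graph is partitioned.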


Multilevel Schemes

Figure panels: random matching and its coarsened graph; heavy-edge matching and its coarsened graph


Partitioning Methods Compared

From: [SRC] Ch. 18, K. Schloegel, G. Karypis, and V. Kumar


Load Balancing of Adaptive Computations

Adaptive graph partitioning methods
  Goal is to balance the load by redistributing data when mesh dynamically changes
  Redistributing data is expensive, should be minimized

To measure the redistribution cost we define:
  TotalV is the sum of the sizes of vertices that change subdomain (measures total data volume)
  MaxV is the max of the sum of sizes of those vertices that moved in/out of a subdomain (measures max time of send/recv)

Repartitioners aim to minimize TotalV or MaxV
  Minimizing TotalV is easier
  Minimizing TotalV tends to minimize MaxV
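TotalV and MaxV can be computed from an old and a new partition. The sketch below reads MaxV as the largest per-subdomain sum of vertex sizes moving in plus vertex sizes moving out, which is one interpretation of the definition above; the function name and the example data are hypothetical.

```python
from collections import defaultdict

def redistribution_cost(old, new, size):
    """Return (TotalV, MaxV) for moving from partition `old` to partition `new`."""
    total_v = 0
    traffic = defaultdict(int)      # per subdomain: sizes moved out plus sizes moved in
    for v in old:
        if old[v] != new[v]:
            total_v += size[v]
            traffic[old[v]] += size[v]
            traffic[new[v]] += size[v]
    max_v = max(traffic.values(), default=0)
    return total_v, max_v

# Hypothetical 3-way repartitioning of six vertices; vertex 4 carries more data.
old = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
new = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 0}
size = {0: 1, 1: 1, 2: 1, 3: 1, 4: 3, 5: 1}

total_v, max_v = redistribution_cost(old, new, size)
```

Vertices 2, 4 and 5 move, so TotalV is 1 + 3 + 1, while MaxV is set by the subdomains that handle the large vertex 4.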


Adaptive Graph Partitioning

Two repartitioning approaches:

Scratch (and remap) repartition methods
  Partition from scratch

Incremental repartition methods
  Cut-and-paste
  Diffusion-based methods


Repartitioning Approaches

Figure panels: imbalanced partitioning (vertex weights = 1); repartitioning from scratch; cut-and-paste repartitioning; diffusive repartitioning


Scratch-Remap Partitioning

Improves repartitioning from scratch
  After repartitioning from scratch, remap vertices to match up with old subdomains
  Lowers TotalV (and thus MaxV)
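The remapping step can be sketched with a similarity matrix whose entry (i, j) counts the vertices shared by old subdomain i and new subdomain j. The sketch relabels new subdomains greedily by largest overlap, which is one simple heuristic and an assumption here, and it treats all vertex sizes as 1. The function names and partitions are hypothetical.

```python
def remap(old, new, k):
    """Relabel the subdomains of `new` to overlap as much as possible with `old`."""
    similarity = [[0] * k for _ in range(k)]
    for v in old:
        similarity[old[v]][new[v]] += 1
    # Visit (overlap, old label, new label) entries from largest overlap down.
    entries = sorted(((similarity[i][j], i, j) for i in range(k) for j in range(k)),
                     reverse=True)
    label = {}
    used_old = set()
    for _, i, j in entries:
        if j not in label and i not in used_old:
            label[j] = i
            used_old.add(i)
    return {v: label[new[v]] for v in new}

def moved(old, new):
    """Number of vertices that change subdomain."""
    return sum(1 for v in old if old[v] != new[v])

# Hypothetical old partition and a from-scratch partition with swapped labels.
old = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
scratch = {0: 1, 1: 1, 2: 0, 3: 1, 4: 0, 5: 0}

remapped = remap(old, scratch, 2)
before, after = moved(old, scratch), moved(old, remapped)
```

The subdomains themselves are unchanged by the remap; only their labels are, yet the number of vertices that must migrate drops.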

Figure panels: similarity matrix; scratch-remap partitioned graph


Incremental Versus Scratch-Remap

Scratch-remap results in higher redistribution costs (TotalV) compared to incremental methods that use local perturbations

Incremental partitioning with cut-and-paste
  Moves few vertices between subdomains to restore balance

Incremental diffusion-based methods

Figure panels: imbalanced partitioning; incremental partitioning; scratch-remap partitioning


Diffusion-Based Repartitioners

Address two questions
  How much work should be transferred between processors?
  Which tasks should be transferred?

Local diffusion schemes only consider workloads of localized groups

Global diffusion schemes consider entire graph
  Recursive bisection diffusion partitioners
  Adaptive space-filling curve partitioners
  Repartitioning based on flow solutions
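The first question, how much work to transfer, can be illustrated with a first-order local diffusion scheme in which every processor exchanges a fixed fraction of the load difference with each neighbor per step. This is a sketch of one simple scheme, not a description of any particular repartitioner; the fraction `alpha`, the processor topology and the loads are assumptions.

```python
def diffuse(load, neighbors, alpha=0.25, steps=1):
    """Run `steps` rounds of first-order diffusion of load between neighbors."""
    load = list(load)
    for _ in range(steps):
        flow = [0.0] * len(load)
        for i, nbrs in enumerate(neighbors):
            for j in nbrs:
                # Processor i gives up alpha * (load difference) to neighbor j.
                flow[i] -= alpha * (load[i] - load[j])
        load = [l + f for l, f in zip(load, flow)]
    return load

# Hypothetical line of three processors with all work on the first one.
neighbors = [[1], [0, 2], [1]]
initial = [12.0, 0.0, 0.0]

after_one = diffuse(initial, neighbors, steps=1)
after_many = diffuse(initial, neighbors, steps=100)
```

Total load is conserved at every step and the loads approach the average. Which vertices to move to realize each flow is the second question and is not addressed by this sketch.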


Multiconstraint Graph Partitioning

Sophisticated classes of simulations for which traditional graph partitioning formulation is inadequate

Multiple constraints must be satisfied
  Nonuniform memory requirements
  Nonuniform CPU requirements
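With several constraints, each vertex carries a vector of weights and every component has to be balanced at once. The sketch below only measures the imbalance of a given partition per constraint (heaviest subdomain divided by the average); it does not compute a partition. The function name, the two-component weights (CPU work, memory) and the partitions are hypothetical.

```python
def constraint_imbalance(weights, part, k):
    """For each constraint, return (heaviest subdomain) / (average subdomain)."""
    m = len(next(iter(weights.values())))
    totals = [[0.0] * m for _ in range(k)]
    for v, w in weights.items():
        for c in range(m):
            totals[part[v]][c] += w[c]
    result = []
    for c in range(m):
        column = [totals[p][c] for p in range(k)]
        result.append(max(column) / (sum(column) / k))
    return result

# Hypothetical vertex weights: (CPU work, memory).
weights = {0: (1, 4), 1: (1, 0), 2: (1, 4), 3: (1, 0)}
cpu_only = {0: 0, 2: 0, 1: 1, 3: 1}   # balances CPU work but not memory
both = {0: 0, 1: 0, 2: 1, 3: 1}       # balances both constraints

imbalance_cpu_only = constraint_imbalance(weights, cpu_only, 2)
imbalance_both = constraint_imbalance(weights, both, 2)
```

A value of 1.0 means perfect balance for that constraint. The first partition would look ideal to a single-constraint partitioner that sees only CPU work.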


Software Packages

From: [SRC] Ch. 18, K. Schloegel, G. Karypis, and V. Kumar


Further Reading

[SRC] Chapter 18