Large-Scale Network Analysis with the Boost Graph Libraries

Douglas Gregor
Open Systems Lab
Indiana University
dgregor@osl.iu.edu


What are the BGLs?

A collection of libraries for computation on graphs/networks:
    Graph data structures
    Graph algorithms
    Graph input/output
Common design:
    Flexibility/customizability throughout
    Obsessed with performance
    Common interfaces throughout the collection
All open source, freely available online


The BGL Family

The Original (sequential) BGL

BGL-Python

The Parallel BGL

Parallel BGL-Python


The Original BGL

The largest and most mature BGL
    ~7 years of research and development
    Many users, contributors outside of the OSL
    Steadily evolving
Written in C++
    Generic
    Highly customizable
    Efficient (both storage and execution)


BGL: Graph Data Structures

Graphs:
    adjacency_list: highly configurable with user-specified containers for vertices and edges
    adjacency_matrix
    compressed_sparse_row
Adaptors:
    subgraphs, filtered graphs, reverse graphs
    LEDA and Stanford GraphBase
Or, use your own…
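
To make the compressed_sparse_row entry more concrete, here is a minimal sketch of the compressed sparse row idea in plain standard C++. It is not the BGL class and does not use its interface. The names CsrGraph, build_csr, and out_degree are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration type, not the BGL's compressed sparse row graph.
struct CsrGraph {
    // Size n + 1: out-edges of v occupy [row_start[v], row_start[v + 1]).
    std::vector<std::size_t> row_start;
    // Size m: target vertex of each edge, grouped by source vertex.
    std::vector<std::size_t> targets;
};

// Build a CSR structure for a directed graph with n vertices from an edge list.
CsrGraph build_csr(std::size_t n,
                   const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    CsrGraph g;
    g.row_start.assign(n + 1, 0);
    g.targets.resize(edges.size());

    // Pass 1: count the out-degree of every source vertex.
    for (const auto& e : edges) {
        assert(e.first < n && e.second < n);
        ++g.row_start[e.first + 1];
    }
    // Prefix sums turn the counts into starting offsets.
    for (std::size_t v = 0; v < n; ++v) {
        g.row_start[v + 1] += g.row_start[v];
    }
    // Pass 2: place each edge into the next free slot of its source's range.
    std::vector<std::size_t> next(g.row_start.begin(), g.row_start.end() - 1);
    for (const auto& e : edges) {
        g.targets[next[e.first]++] = e.second;
    }
    return g;
}

// Number of out-edges of vertex v.
std::size_t out_degree(const CsrGraph& g, std::size_t v) {
    assert(v + 1 < g.row_start.size());
    return g.row_start[v + 1] - g.row_start[v];
}
```

Two flat arrays keep the graph compact and make edge traversal a linear scan, at the cost of making later edge insertion expensive.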


Original BGL: Algorithms

Searches (breadth-first, depth-first, A*)
Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
All-pairs shortest paths (Johnson, Floyd-Warshall)
Minimum spanning tree (Kruskal, Prim)
Components (connected, strongly connected, biconnected)
Maximum cardinality matching
Max-flow (Edmonds-Karp, push-relabel)
Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
Betweenness centrality
PageRank
Isomorphism
Vertex coloring
Transitive closure
Dominator tree


Task: Biconnected Components

[Figure: input graph and output graph]

Articulation points: B G A


Define a Graph Type

Determine vertex/edge properties:

    struct Vertex { string name; };
    struct Edge { int bicomponent; };

Determine the graph type:

    typedef adjacency_list<
        /*OutEdgeListS=*/ vecS,
        /*VertexListS=*/ vecS,
        /*DirectedS=*/ undirectedS,
        /*VertexProperty=*/ Vertex,
        /*EdgeProperty=*/ Edge> Graph;


Read in a GraphViz DOT File

Build an empty graph:

    Graph g;

Map vertex properties:

    dynamic_properties dyn;
    dyn.property("node_id", get(&Vertex::name, g));

Read in the GraphViz graph:

    ifstream in("biconnected_components.dot");
    read_graphviz(in, g, dyn);


Run Biconnected Components

Keep track of the articulation points:

    vector<Graph::vertex_descriptor> art_points;

Compute biconnected components:

    biconnected_components(g, get(&Edge::bicomponent, g),
                           back_inserter(art_points));
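
For readers who want to see what kind of computation sits behind this call, here is a minimal sketch of the classic depth-first-search (low-point) method for finding articulation points, in plain standard C++. It is not the BGL implementation. The name articulation_points and the vector-of-vectors graph representation are assumptions made for illustration, and the recursion is only suitable for small graphs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper, not part of the BGL. adj[u] lists the neighbors of u
// in a simple undirected graph. Returns articulation points in index order.
std::vector<std::size_t> articulation_points(
        const std::vector<std::vector<std::size_t>>& adj) {
    const std::size_t n = adj.size();
    std::vector<std::size_t> disc(n, 0);  // discovery time, 0 = unvisited
    std::vector<std::size_t> low(n, 0);   // earliest discovery time reachable
    std::vector<bool> is_art(n, false);
    std::size_t time = 0;

    // parent == n is used as the "no parent" marker for DFS roots.
    std::function<void(std::size_t, std::size_t)> visit =
        [&](std::size_t u, std::size_t parent) {
            disc[u] = low[u] = ++time;
            std::size_t children = 0;
            for (std::size_t v : adj[u]) {
                assert(v < n);
                if (disc[v] == 0) {
                    ++children;
                    visit(v, u);
                    low[u] = std::min(low[u], low[v]);
                    // Nothing in v's subtree reaches above u, so removing u
                    // would cut that subtree off.
                    if (parent != n && low[v] >= disc[u]) is_art[u] = true;
                } else if (v != parent) {
                    low[u] = std::min(low[u], disc[v]);  // back edge
                }
            }
            // A DFS root is an articulation point only with 2+ tree children.
            if (parent == n && children >= 2) is_art[u] = true;
        };

    for (std::size_t s = 0; s < n; ++s) {
        if (disc[s] == 0) visit(s, n);
    }

    std::vector<std::size_t> result;
    for (std::size_t u = 0; u < n; ++u) {
        if (is_art[u]) result.push_back(u);
    }
    return result;
}
```

For a triangle 0-1-2 with a tail 2-3-4 attached, vertices 2 and 3 are the articulation points: removing either one disconnects the tail.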


Output results

Attach bicomponent number to the "label" property of edges:

    dyn.property("label", get(&Edge::bicomponent, g));

Write results to another GraphViz file:

    ofstream out("bc_out.dot");
    write_graphviz(out, g, dyn);  // named write_graphviz_dp in more recent Boost releases

Show articulation points:

    cout << "Articulation points: ";
    for (size_t i = 0; i < art_points.size(); ++i) {
        cout << g[art_points[i]].name << ' ';
    }


Task: Biconnected Components

[Figure: input graph and output graph]

Articulation points: B G A


Original BGL Summary

The original BGL is large, stable, efficient
    Lots of algorithms, graph types
    Peer-reviewed code with many users, nightly regression testing, etc.
    Performance comparable to FORTRAN.
Who should use the BGL?
    Programmers comfortable with C++
    Users with graph sizes from tens of vertices to millions of vertices


BGL-Python

Python is ideal for rapid prototyping:
    It’s a scripting language (no compiler)
    Dynamically typed means less typing for you
    Easy to use: you already know Python…
BGL-Python provides access to the BGL from within Python
    Similar interfaces to C++ BGL
    Easier to learn than C++
    Great for scripting, GUI applications
    help(bgl.dijkstra_shortest_paths)


Example: Biconnected Components

    import boost.graph as bgl  # Pull in the BGL bindings
    g = bgl.Graph.read_graphviz("biconnected_components.dot")

    # Compute biconnected components and articulation points
    bicomponent = g.edge_property_map('int')
    art_points = bgl.biconnected_components(g, bicomponent)

    # Save results with bicomponent numbers as edge labels
    g.edge_properties['label'] = bicomponent
    g.write_graphviz("biconnected_components_out.dot")

    print "Articulation points: ",
    node_id = g.vertex_properties['node_id']
    for v in art_points:
        print node_id[v], ' ',
    print ""


Wrapping the BGL in Python

BGL-Python is not a…
    “port”
    reimplementation
BGL-Python wraps the C++ BGL
    Python calls translate to C++ calls
    C++ can call back into Python
    Most of the speed of C++
    Most of the flexibility of Python
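
The "call back" point refers to user code that runs while an algorithm is in progress. As a rough sketch of that pattern in plain standard C++ (an assumption for illustration, not BGL-Python's actual binding code), a breadth-first search can accept a callable and invoke it for every vertex it discovers. In a wrapped library, that callable could forward to an interpreter instead of running native code. The name bfs_with_visitor is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical sketch: breadth-first search that reports each vertex it
// discovers through a user-supplied callback. adj[u] lists the neighbors of u.
void bfs_with_visitor(const std::vector<std::vector<std::size_t>>& adj,
                      std::size_t source,
                      const std::function<void(std::size_t)>& on_discover) {
    assert(source < adj.size());
    std::vector<bool> seen(adj.size(), false);
    std::queue<std::size_t> pending;

    seen[source] = true;
    on_discover(source);
    pending.push(source);

    while (!pending.empty()) {
        const std::size_t u = pending.front();
        pending.pop();
        for (std::size_t v : adj[u]) {
            assert(v < adj.size());
            if (!seen[v]) {
                seen[v] = true;
                on_discover(v);  // control leaves the algorithm here
                pending.push(v);
            }
        }
    }
}
```

The algorithm itself stays in compiled code and only the callback body belongs to the caller, so the cost of the callback is paid once per discovered vertex rather than once per line of the search.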


Performance: Shortest Paths

[Bar chart: running time in seconds (axis 0 to 30) for BGL Dijkstra, BGL Dijkstra with Python Visitor, and Python Dijkstra]


BGL-Python Summary

BGL-Python is all about tradeoffs:
    More gradual learning curve
    Faster time-to-solution
    Lower performance
Our typical approach:
    1. Prototype in Python to get your ideas down
    2. Port to C++ when performance matters


The Parallel BGL

A version of the C++ BGL for computational clusters
    Distributed memory for huge graphs
    Parallel processing for improved performance
An active research project
Closely related to the original BGL
    Parallelizing BGL programs should be “easy”


Parallel BGL: Distributed Graphs

A simple, directed graph… distributed across 3 processors.
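
One simple way to decide which processor stores which vertex is a block distribution, where each processor owns a contiguous range of vertex indices. The sketch below is in plain standard C++ and is an assumption for illustration: the slides do not say which distribution the Parallel BGL uses, and the names BlockDistribution, block_size, first, owner, and local_index are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block distribution of n vertices over p processors.
// The first (n % p) processors own one extra vertex each.
struct BlockDistribution {
    std::size_t n;  // total number of vertices
    std::size_t p;  // number of processors, must be > 0

    // Number of vertices stored on the given processor.
    std::size_t block_size(std::size_t rank) const {
        assert(p > 0 && rank < p);
        return n / p + (rank < n % p ? 1 : 0);
    }
    // First global vertex index owned by the given processor.
    std::size_t first(std::size_t rank) const {
        assert(p > 0 && rank < p);
        const std::size_t extra = n % p;
        return rank * (n / p) + (rank < extra ? rank : extra);
    }
    // Processor that owns global vertex v.
    std::size_t owner(std::size_t v) const {
        assert(p > 0 && v < n);
        const std::size_t base = n / p;
        const std::size_t extra = n % p;
        const std::size_t big = extra * (base + 1);  // vertices in the larger blocks
        return v < big ? v / (base + 1) : extra + (v - big) / base;
    }
    // Position of v within its owner's local storage.
    std::size_t local_index(std::size_t v) const {
        return v - first(owner(v));
    }
};
```

With 10 vertices on 3 processors, the blocks hold 4, 3, and 3 vertices. An edge whose endpoints have different owners crosses a processor boundary, so following it requires communication.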


Parallel Graph Algorithms

Breadth-first search
Eager Dijkstra’s single-source shortest paths
Crauser et al. single-source shortest paths
Depth-first search
Minimum spanning tree (Boruvka, Dehne & Götz)
Connected components
Strongly connected components
Biconnected components
PageRank
Graph coloring
Fruchterman-Reingold layout
Max-flow (Dinic’s)


Performance: Sparse graphs

[Plot: wall clock time in seconds (axis ticks 1 to 1000) versus number of processors (axis ticks 1 to 100). Series: Breadth-First Search, Crauser et al., Eager Dijkstra 0.1, Dense Boruvka, Merging Local MSFs, Boruvka-Then-Merge, Boruvka-Mixed-Merge, Boman et al. Coloring]


Scalability (~547k vertices/node)

[Plot: wall clock time in seconds (0 to 400) versus number of processors (0 to 150). Series: Breadth-First Search, Crauser et al. Shortest Paths, Eager Dijkstra Shortest Paths, Connected Components, Vertex Coloring]

Up to 70M vertices, 1B edges, small-world graph


Performance vs. CGMgraph

[Chart: comparison on an Erdős-Rényi graph with 96k vertices and 10M edges; annotations: 17x, 30x]


Parallel BGL Summary

The Parallel BGL is built for huge graphs
    Millions to hundreds of millions of nodes
    Distributed-memory parallel processing on clusters
    Future work will permit larger graphs…
Parallel programming has a learning curve
    Parallel graph algorithms much harder to write
    Distributed graph manipulation can be tricky
Parallel BGL is an active research library


Distributed Graph Layout


Parallel BGL in Python

Preliminary support for the Parallel BGL in Python
    Just import boost.graph.distributed
    Similar interface to sequential BGL-Python
Several options for usage with MPI:
    Straight MPI: mpirun -np 2 python script.py
    pyMPI: allows interactive use of the interpreter
Initially used to prototype our distributed Fruchterman-Reingold implementation.


Porting for Performance


Which BGL is Right for You?

Is any BGL right for you?
Depends on how large your networks are:
    Up to 1/2 million vertices, any BGL will do
    C++ BGL can push to a couple million vertices
    For tens of millions or larger, Parallel BGL only
Other considerations:
    You can prototype in Python, port to C++
    Algorithm authors might prefer the original BGL
    Parallelism is very hard to manage


Conclusion

The Boost Graph Library family is a collection of full-featured graph libraries
    All are flexible, customizable, efficient
    Easy to port from Python to C++
    Can port from sequential to parallel
    Always growing, improving
Is one of the BGLs right for you?
    A typical “build or buy” decision


For More Information…

(Original) Boost Graph Library
    http://www.boost.org/libs/graph/doc
Parallel Boost Graph Library
    http://www.osl.iu.edu/research/pbgl
Python Bindings for (Parallel) BGL
    http://www.osl.iu.edu/~dgregor/bgl-python
Contact us!
    Douglas Gregor <dgregor@osl.iu.edu>
    Andrew Lumsdaine <[email protected]>


Other BGL Variants

QuickGraph (C#)
    http://www.codeproject.com/cs/miscctrl/quickgraph.asp
Ruby Graph Library
    http://rubyforge.org/projects/rgl/
Rooster Graph (Scheme)
    http://savannah.nongnu.org/projects/rgraph/
RBGL (an R interface to the C++ BGL)
    http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html

Disclaimer: These are all separate projects. We do not maintain them.


Comparative Performance

[Chart: BC Clustering Performance, BGL vs. JUNG. Wall clock time in minutes (0 to 60) versus number of movies (200 to 400). Series: BGL, JUNG]
