22
A Lightweight Infrastructure for Graph Analytics Donald Nguyen Andrew Lenharth and Keshav Pingali The University of Texas at Austin

A Lightweight Infrastructure for Graph Analytics

  • Upload
    mariko

  • View
    113

  • Download
    3

Embed Size (px)

DESCRIPTION

A Lightweight Infrastructure for Graph Analytics. Donald Nguyen Andrew Lenharth and Keshav Pingali The University of Texas at Austin. What is Graph Analytics?. Algorithms to compute properties of graphs Connected components, shortest paths, centrality measures, diameter, PageRank, … - PowerPoint PPT Presentation

Citation preview

Page 1: A Lightweight Infrastructure for Graph Analytics

A Lightweight Infrastructure for Graph

AnalyticsDonald Nguyen

Andrew Lenharth and Keshav Pingali

The University of Texas at Austin

Page 2: A Lightweight Infrastructure for Graph Analytics

What is Graph Analytics?• Algorithms to compute properties

of graphs– Connected components, shortest

paths, centrality measures, diameter, PageRank, …

• Many applications– Google, path routing, friend

recommendations, network analysis

• Difficult to implement on a large scale– Data sets are large, data accesses

are irregular– Need parallelism and efficient

runtimes

Bridges of Konigsberg

The Social Network

Page 3: A Lightweight Infrastructure for Graph Analytics

3

Contributions• A lightweight infrastructure for graph

analytics based on improvements to the Galois system– Example: scalable priority scheduling

• Orders of magnitude faster than third-party graph DSLs for many applications and inputs– Galois permits better algorithms to be written– Galois infrastructure more scalable

Page 4: A Lightweight Infrastructure for Graph Analytics

4

Outline1. Abstraction for graph analytics

applications and implementation in Galois system

2. One piece of infrastructure: priority scheduling

3. Evaluation of Galois system versus graph analytics DSLs

Page 5: A Lightweight Infrastructure for Graph Analytics

5

Example: SSSP• Find the shortest distance from source

node to all other nodes in a graph– Label nodes with tentative distance– Assume non-negative edge weights

• Algorithms– Chaotic relaxation O(2V)– Bellman-Ford O(VE)– Dijkstra’s algorithm O(E log V)

• Uses priority queue– Δ-stepping

• Uses sequence of bags to prioritize work• Δ=1, O(E log V)• Δ=∞, O(VE)

• Different algorithms are different schedules for applying relaxations– SSSP needs priority scheduling for work

efficiency

1

9

2

31∞

0

Active edge

Neighborhood

Activity

1 9

4

3

1 3

Operator

Edge relaxation

∞2

Page 6: A Lightweight Infrastructure for Graph Analytics

6

Parallel Program = Operator + Schedule + Parallel Data Structure

• What is the operator?– Ligra, PowerGraph: only vertex programs– Galois: Unrestricted, may even morph graph by

adding/removing nodes and edges

• Where/When does it execute?– Autonomous scheduling: activities execute

transactionally– Coordinated scheduling: activities execute in

rounds• Read values refer to previous rounds• Multiple updates to the same location are resolved

with reduction, etc.

Algorithm

ba

c

Page 7: A Lightweight Infrastructure for Graph Analytics

7

Galois SystemParallel Program = Operator + Schedule + Parallel Data Structure

Operators Schedules Data Structure API

CallsSchedulers Data

StructuresAllocato

rThread

PrimitivesMulticore

User Program

Galois System

Page 8: A Lightweight Infrastructure for Graph Analytics

8

Outline1. Abstraction for graph analytics

applications and implementation in Galois system

2. One piece of infrastructure: priority scheduling

3. Evaluation of Galois system versus DSLs

Page 9: A Lightweight Infrastructure for Graph Analytics

9

Galois Infrastructure for Scheduling

• Library of different schedulers– FIFO, LIFO, Priority, …– DSL for specifying scheduling policies

• Nguyen and Pingali (ASPLOS ’11)

Page 10: A Lightweight Infrastructure for Graph Analytics

10

Priority Scheduling• Concurrent priority queue– Implemented in TBB, java.concurrent– Does not scale: our tasks are very fine-

grain

• Sequence of bags– One bag per priority level– High overhead for finding non-empty,

urgent-priority bags

Page 11: A Lightweight Infrastructure for Graph Analytics

11

Ordered by Metric (OBIM)• Sparse sequence of Bags– One Bag per priority level– To reduce synchronization• Sequence is replicated among cores• Bag is distributed between cores

– To reduce overhead of finding work• A reduction tree over machine topology to

get estimate of current urgent priority

Page 12: A Lightweight Infrastructure for Graph Analytics

12

Scheduling Bag

c1 c2

Head

Package 1

Bag

c1 c2

Head

Package 2

Page 13: A Lightweight Infrastructure for Graph Analytics

13

Scheduling Bag

c1 c2

Head

Package 1

Bag

c1 c2

Head

Package 2

Page 14: A Lightweight Infrastructure for Graph Analytics

14

Priority Map

0 1

Global Map

Local Maps

Core 1 Core 2

SchedulingBags

1 3

1 30

13

3

15

5

5

Page 15: A Lightweight Infrastructure for Graph Analytics

15

Outline1. Abstraction for graph analytics

applications and implementation in Galois system

2. One piece of infrastructure: priority scheduling

3. Evaluation of Galois system versus graph analytics DSLs

Page 16: A Lightweight Infrastructure for Graph Analytics

16

Graph Analytics DSLs

• Easy to implement their APIs on top of Galois system– Galois implementations called PowerGraph-g,

Ligra-g, etc.– About 200-300 lines of code each

GraphLab Low et al. (UAI ’10) PowerGraph Gonzalez et al. (OSDI ’12) GraphChi Kyrola et al. (OSDI ’12) Ligra Shun and Blelloch (PPoPP ’13) Pregel Malewicz et al. (SIGMOD ‘10) …

Page 17: A Lightweight Infrastructure for Graph Analytics

17

Evaluation• Platform

– 40-core system• 4 socket, Xeon E7-4860

(Westmere)– 128 GB RAM

• Applications– Breadth-first search (bfs)– Connected components

(cc)– Approximate diameter

(dia)– PageRank (pr)– Single-source shortest

paths (sssp)

• Inputs– twitter50 (50 M nodes, 2

B edges, low-diameter)– road (20 M nodes, 60 M

edges, high-diameter)

• Comparison with– Ligra (shared memory)– PowerGraph

(distributed)• Runtimes with 64 16-core

machines (1024 cores) does not beat one 40-core machine using Galois

Page 18: A Lightweight Infrastructure for Graph Analytics

18

Page 19: A Lightweight Infrastructure for Graph Analytics

19

Page 20: A Lightweight Infrastructure for Graph Analytics

20

• The best algorithm may require application-specific scheduling– Priority scheduling for

SSSP

• The best algorithm may not be expressible as a vertex program– Connected components

with union-find

• Autonomous scheduling required for high-diameter graphs– Coordinated scheduling

uses many rounds and has too much overhead

Page 21: A Lightweight Infrastructure for Graph Analytics

21

See Paper For• More inputs, applications

• Detailed discussion of performance results

• More infrastructure: non-priority scheduling, memory management

• Results for out-of-core DSLs

Page 22: A Lightweight Infrastructure for Graph Analytics

22

Summary• Galois programming model is general– Permits efficient algorithms to be written

• Galois infrastructure is lightweight

• Simpler graph DSLs can be layered on top– Can perform better than existing systems

• Download at http://iss.ices.utexas.edu/galois