Evaluation of Parallel Graph Loading Techniques

Manuel Then, Moritz Kaufmann, Alfons Kemper, Thomas Neumann

Technical University of Munich

Chair of Database Systems



Goal: Efficiently load a given graph dataset for explorative analytics


General Graph Loading Pipeline

Read

• Parse edges and create relabeling

• Write edges to worker-local buffer

Sync

• Find unique vertices

• Count neighbors

Write

• Create final graph data structure

• Apply final relabeling

Analytics

• The actual analytics work

Problem: The optimal way of loading the graph depends on various factors:

• Format of the graph data

• Source of the data

• Properties of the input data

• Target graph data structure

• Execution machine

Graph loading pipeline must be adapted to the scenario at hand


Scenario-specific Graph Loading




[Figure: the Read, Sync (count neighbors), and Write pipeline, annotated step by step with the following scenario questions]

• Identifier data type: binary, decimal, or string?

• Is random access possible?

• Can the input data be read multiple times?

• Is an explicit vertex list available?

• Which data structure to generate?




Parsers

Binary reader

• No parsing necessary => directly copy vertex identifiers

• Every edge has the same size => work splitting is trivial
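The two binary reader points can be illustrated with a minimal sketch. The file layout (two consecutive 64-bit identifiers per edge, host byte order) and the names `Edge`, `read_binary_edges`, and `split_edges` are assumptions made for this example; the slides do not specify them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Assumed layout (not stated in the slides): each edge is two consecutive
// 64-bit identifiers in host byte order, i.e. 16 bytes per edge.
struct Edge {
    std::uint64_t source;
    std::uint64_t target;
};
static_assert(sizeof(Edge) == 16, "sketch assumes a packed 16-byte edge");

// Copies `count` edges starting at edge index `first` out of a raw byte
// buffer. No parsing happens; the identifiers are copied as they are.
std::vector<Edge> read_binary_edges(const unsigned char* data,
                                    std::size_t first, std::size_t count) {
    std::vector<Edge> edges(count);
    if (count > 0) {
        std::memcpy(edges.data(), data + first * sizeof(Edge),
                    count * sizeof(Edge));
    }
    return edges;
}

// Splits `num_edges` fixed-size edges into `workers` contiguous ranges
// [first, last) whose sizes differ by at most one edge.
std::vector<std::pair<std::size_t, std::size_t>> split_edges(
        std::size_t num_edges, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t w = 0; w < workers; ++w) {
        ranges.emplace_back(num_edges * w / workers,
                            num_edges * (w + 1) / workers);
    }
    return ranges;
}
```

Because every edge occupies the same number of bytes, a worker's byte range follows directly from its edge range, with no need to scan the input for record boundaries.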

Library-provided decimal parsing

• Readily available for many languages

• We evaluated C++’s stream extraction operator (operator>>) and strtol

• Varying edge length => work splitting more complex
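A minimal sketch of library-based parsing and of the more complex work splitting follows. It assumes a text format with one "source target" pair per line, and it uses `strtoull` (the unsigned 64-bit sibling of `strtol`) so that large identifiers fit. The helper names `align_to_line_start` and `parse_chunk` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

using EdgeList = std::vector<std::pair<unsigned long long, unsigned long long>>;

// Moves a tentative chunk boundary forward to the start of the next line, so
// that no edge is split between two workers.
std::size_t align_to_line_start(const std::string& text, std::size_t pos) {
    if (pos == 0) return 0;
    if (pos >= text.size()) return text.size();
    const std::size_t newline = text.find('\n', pos - 1);
    return newline == std::string::npos ? text.size() : newline + 1;
}

// Parses all "source target" lines in text[begin, end) with strtoull.
// Assumes both bounds are line starts and stops at the first malformed token.
EdgeList parse_chunk(const std::string& text, std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= text.size());
    EdgeList edges;
    const char* cursor = text.c_str() + begin;
    const char* limit = text.c_str() + end;
    while (true) {
        // Skip separators here so strtoull never runs into the next chunk.
        while (cursor < limit && std::isspace(static_cast<unsigned char>(*cursor))) {
            ++cursor;
        }
        if (cursor >= limit) break;
        char* after_source = nullptr;
        const unsigned long long source = std::strtoull(cursor, &after_source, 10);
        if (after_source == cursor) break;
        char* after_target = nullptr;
        const unsigned long long target = std::strtoull(after_source, &after_target, 10);
        if (after_target == after_source) break;
        edges.emplace_back(source, target);
        cursor = after_target;
    }
    return edges;
}
```

A worker would take a byte position such as `text.size() * w / workers`, pass it through `align_to_line_start`, and parse from there up to the aligned start of the next worker's chunk.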


Iterative decimal parsing

• Multiply by ten and add character’s respective digit
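The multiply-and-add loop can be sketched as follows. The function name `parse_identifier` and the decision to omit overflow and error checks are assumptions of this example, not details from the talk.

```cpp
#include <cassert>
#include <cstdint>

// Parses an unsigned decimal identifier starting at `cursor` and advances
// `cursor` to the first non-digit character (or to `end`).
// No overflow or error checks are performed in this sketch.
std::uint64_t parse_identifier(const char*& cursor, const char* end) {
    assert(cursor <= end);
    std::uint64_t value = 0;
    while (cursor < end && *cursor >= '0' && *cursor <= '9') {
        // Multiply by ten and add the character's respective digit.
        value = value * 10 + static_cast<std::uint64_t>(*cursor - '0');
        ++cursor;
    }
    return value;
}
```

Unlike a general-purpose library routine, this loop does not handle signs, leading whitespace, locales, or alternative number bases.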


Vectorized decimal parsing

• Leverage wide vector units for identifier parsing
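The idea of converting several digits at once can be shown without platform-specific intrinsics. The sketch below is not the vectorized parser evaluated in the talk; it is a portable stand-in using the SWAR technique (SIMD within a register), which treats one 64-bit integer as eight byte lanes. It assumes a little-endian host and exactly eight digit characters, and the name `parse_eight_digits` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Converts exactly eight ASCII digits to their numeric value with three
// multiply-shift steps. Assumes a little-endian host and that all eight
// characters are digits; neither assumption is checked here.
std::uint32_t parse_eight_digits(const char* digits) {
    assert(digits != nullptr);
    std::uint64_t chunk;
    std::memcpy(&chunk, digits, sizeof(chunk));
    // Step 1: '0'..'9' -> 0..9 per byte, then merge byte pairs into 0..99.
    chunk = ((chunk & 0x0F0F0F0F0F0F0F0FULL) * 2561ULL) >> 8;
    // Step 2: merge pairs of two-digit values into 0..9999 (16-bit lanes).
    chunk = ((chunk & 0x00FF00FF00FF00FFULL) * 6553601ULL) >> 16;
    // Step 3: merge the two four-digit values into the final result.
    chunk = ((chunk & 0x0000FFFF0000FFFFULL) * 42949672960001ULL) >> 32;
    return static_cast<std::uint32_t>(chunk);
}
```

The multipliers are 10 · 2^8 + 1, 100 · 2^16 + 1, and 10000 · 2^32 + 1, so each step weights the more significant lane by its decimal factor and adds the neighboring lane in a single multiplication.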



T. Mühlbauer, W. Rödiger, R. Seilbeck, A. Reiser, A. Kemper, and T. Neumann. Instant loading for main memory databases. Proceedings of the VLDB Endowment, 2013.

Parser code generation


[Figure: comparison of the parsers above; axis ticks at 2x, 20x, and 200x]








Data Structures and Identifier Relabeling

The target data structure and the identifier relabeling are closely related areas.


Map of Neighbor Lists => No relabeling (Identity)

• Directly use dataset identifiers

• Runtime overhead for neighbor and property accesses

• Simple and efficient to load
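A minimal sketch of the map of neighbor lists, assuming 64-bit dataset identifiers and `std::unordered_map` as the hash map. The names `NeighborMap`, `build_neighbor_map`, and `neighbors` are chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using VertexId = std::uint64_t;
using NeighborMap = std::unordered_map<VertexId, std::vector<VertexId>>;

// Builds the map directly from dataset identifiers; there is no relabeling.
NeighborMap build_neighbor_map(
        const std::vector<std::pair<VertexId, VertexId>>& edges) {
    NeighborMap graph;
    for (const auto& edge : edges) {
        graph[edge.first].push_back(edge.second);
        graph[edge.second];  // make sure vertices without out-edges exist too
    }
    return graph;
}

// Every neighbor access pays for one hash lookup.
const std::vector<VertexId>& neighbors(const NeighborMap& graph, VertexId v) {
    const auto it = graph.find(v);
    assert(it != graph.end() && "unknown vertex");
    return it->second;
}
```

Loading is a single pass over the edges, while the hash lookup in `neighbors` is the runtime overhead that the bullets above refer to.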



[Figure: three neighbor lists (1), (1 2), and (0 2), reached via hash-based access]






Compressed Sparse Row (CSR) => Dense relabeling

• Dense identifiers [0, |V|-1]

• Packed, sequential memory layout

• Allows offset-based data structure access, e.g. for neighbor lists or properties

• Overhead during loading
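A minimal CSR construction can be sketched as follows, assuming the edges already carry dense 32-bit identifiers in [0, |V|-1]. The struct `CsrGraph` and the function `build_csr` are hypothetical names, and the counting pass plus prefix sum is one common way to build the layout rather than necessarily the one used in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct CsrGraph {
    std::vector<std::size_t> offsets;      // |V| + 1 entries
    std::vector<std::uint32_t> neighbors;  // |E| entries, grouped by source
};

// Builds a CSR from edges whose endpoints are dense ids in [0, |V|-1].
CsrGraph build_csr(std::uint32_t num_vertices,
                   const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
    CsrGraph graph;
    graph.offsets.assign(static_cast<std::size_t>(num_vertices) + 1, 0);
    // Count neighbors per source vertex, stored one slot to the right.
    for (const auto& edge : edges) {
        assert(edge.first < num_vertices && edge.second < num_vertices);
        ++graph.offsets[static_cast<std::size_t>(edge.first) + 1];
    }
    // A prefix sum turns the counts into start offsets.
    for (std::size_t v = 0; v < num_vertices; ++v) {
        graph.offsets[v + 1] += graph.offsets[v];
    }
    // Scatter each target into the slice that belongs to its source.
    graph.neighbors.resize(edges.size());
    std::vector<std::size_t> cursor(graph.offsets.begin(), graph.offsets.end() - 1);
    for (const auto& edge : edges) {
        graph.neighbors[cursor[edge.first]++] = edge.second;
    }
    return graph;
}
```

The neighbors of vertex `v` are `neighbors[offsets[v]]` up to, but not including, `neighbors[offsets[v + 1]]`. The same dense identifier can index a property array directly, which is the offset-based access named above.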



[Figure: the same neighbor lists packed into one array (1 1 2 0 2); the map uses hash-based access, the CSR offset-based access]

Relabeling Strategies

Mapping

• Assign dense identifiers while reading the input data

• Global: All workers use a shared map

• Local: Each worker creates a local relabeling
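The global variant of the mapping strategy can be sketched as a shared map that hands out the next dense identifier on first sight. The class name `GlobalRelabeling` is hypothetical, and the single mutex is a simplification chosen for this example; the slides do not say how the shared map is synchronized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <mutex>
#include <unordered_map>

// Shared ("global") relabeling map used by all workers.
class GlobalRelabeling {
public:
    // Returns the dense id of `original`, assigning the next free id the
    // first time the identifier is seen.
    std::uint32_t dense_id(std::uint64_t original) {
        std::lock_guard<std::mutex> guard(mutex_);
        assert(ids_.size() < std::numeric_limits<std::uint32_t>::max());
        // emplace keeps the existing entry if `original` is already mapped.
        const auto result =
            ids_.emplace(original, static_cast<std::uint32_t>(ids_.size()));
        return result.first->second;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return ids_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::uint64_t, std::uint32_t> ids_;
};
```

In the local variant, each worker would own an unsynchronized map of this kind, at the cost of reconciling the per-worker identifiers afterwards.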





Collection

• Gather unique identifiers while reading the input

• Assign dense identifiers at the end

• Global: Shared identifier set for all workers

• Local: Use a local set per worker




Relabeling is finalized/applied when the graph data structure is written
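The local variant of the collection strategy can be sketched as follows. Merging the worker-local sets into one sorted array, and using the array position as the dense identifier, is an assumption of this example; the slides only state that dense identifiers are assigned at the end. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Merges the worker-local identifier sets and returns the sorted unique ids.
// The position of an id in the result serves as its dense id.
std::vector<std::uint64_t> finalize_relabeling(
        const std::vector<std::unordered_set<std::uint64_t>>& local_sets) {
    std::vector<std::uint64_t> unique_ids;
    for (const auto& local : local_sets) {
        unique_ids.insert(unique_ids.end(), local.begin(), local.end());
    }
    std::sort(unique_ids.begin(), unique_ids.end());
    unique_ids.erase(std::unique(unique_ids.begin(), unique_ids.end()),
                     unique_ids.end());
    return unique_ids;
}

// Looks up the dense id of an original identifier by binary search; this is
// the step applied to every edge when the graph data structure is written.
std::uint32_t dense_id(const std::vector<std::uint64_t>& unique_ids,
                       std::uint64_t original) {
    const auto it = std::lower_bound(unique_ids.begin(), unique_ids.end(), original);
    assert(it != unique_ids.end() && *it == original);
    return static_cast<std::uint32_t>(it - unique_ids.begin());
}
```

Workers only insert into their own set while reading, so the read phase needs no synchronization; the cost moves to the merge and to the lookup during the write phase.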



[Figure: identifier sets combined by set union (∪)]

Relabeling Strategies - Measurements

[Figure: graph loading times for various relabeling strategies; no further dataset properties leveraged]








Leveraging Dataset Properties

Explicit vertex lists

• All unique vertices in the dataset are known beforehand

• No need to find and count vertices => improves loading efficiency




Partitioned edge list

• Edge list partitioned by source vertex

• Each source vertex has a responsible worker thread, determined by the input data chunk

• Significantly reduces worker communication overhead






Example edge lists (source target):

Partitioned    Unpartitioned
1 2            4 3
1 3            1 3
1 4            3 1
2 1            1 4
2 4            2 1
3 1            1 2
3 2            3 2
4 3            2 4
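Why partitioning reduces communication can be sketched with neighbor counting. The example assumes that all edges of one source vertex are contiguous in the input, as in the partitioned example list, and that chunk boundaries are moved so they never separate edges of the same source. The names `align_to_source` and `count_neighbors` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Edge = std::pair<std::uint32_t, std::uint32_t>;  // (source, target)

// Moves a tentative chunk boundary forward until the source vertex changes,
// so that all edges of one source end up in the same chunk.
std::size_t align_to_source(const std::vector<Edge>& edges, std::size_t pos) {
    assert(pos <= edges.size());
    while (pos > 0 && pos < edges.size() &&
           edges[pos].first == edges[pos - 1].first) {
        ++pos;
    }
    return pos;
}

// Counts neighbors for the sources in edges[begin, end). No other chunk
// contains these sources, so the result needs no merging or locking.
std::vector<std::pair<std::uint32_t, std::size_t>> count_neighbors(
        const std::vector<Edge>& edges, std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= edges.size());
    std::vector<std::pair<std::uint32_t, std::size_t>> counts;
    for (std::size_t i = begin; i < end; ++i) {
        if (counts.empty() || counts.back().first != edges[i].first) {
            counts.emplace_back(edges[i].first, 0);
        }
        ++counts.back().second;
    }
    return counts;
}
```

With the unpartitioned list, edges of source 1 appear in several places, so workers would have to combine their partial counts through shared or merged counters.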


Leveraging Dataset Properties - Measurements

Graphs

• LDBC-1000, |V| = 3.6M, |E| = 447M

• Twitter, |V| = 41.6M, |E| = 1.5B


Comparison with Existing Systems

System                 Twitter         LDBC
Oracle PGX             2153s           632s
GraphBIG               out of memory   1682s
Ours non-partitioned   88s             24s
Ours partitioned       34s             7s

Graphs

• LDBC-1000, |V| = 3.6M, |E| = 447M

• Twitter, |V| = 41.6M, |E| = 1.5B

Machine:

• 2x Intel Xeon E5-2660 v2 (2 × 20 @ 2.2GHz)

• 256GB, Ubuntu 15.10, kernel 4.2.0

Influence on Analytics

(Load + Run = Total)

                    CSR (relabeled)          Neighbors Map (identity)
                    Load   Run    Total      Load   Run    Total
PageRank            37s    33s    70s        25s    194s   219s
Triangle Counting   37s    49s    86s        25s    66s    92s

Graphs

• Twitter, |V| = 41.6M, |E| = 1.5B

Machine:

• 2x Intel Xeon E5-2660 v2 (2 × 20 @ 2.2GHz)

• 256GB, Ubuntu 15.10, kernel 4.2.0

Summary

The optimal loading pipeline for a graph dataset is highly dependent on the

• Data format

• Source of the data

• Properties of the dataset

• Algorithm-dependent graph data structure

• Target machine

Custom iterative identifier parsing is always beneficial

Concurrent identifier relabeling is mostly beneficial

• More challenging than identity mapping, but usually worth it

Leveraging properties of the dataset can lead to enormous speedups
