Manuel Then, Moritz Kaufmann, Alfons Kemper, Thomas Neumann
Technical University of Munich
Chair of Database Systems
Evaluation of Parallel Graph Loading Techniques
3Manuel Then (TUM) | Evaluation of Parallel Graph Loading Techniques
General Graph Loading Pipeline
Goal: Efficiently load a given graph dataset for explorative analytics
Read
• Parse edges and create relabeling
• Write edges to worker-local buffer
Sync
• Find unique vertices
• Count neighbors
Write
• Create final graph data structure
• Apply final relabeling
Analytics
• The actual analytics work
Scenario-specific Graph Loading
Problem: The optimal way of loading the graph depends on various factors:
• Format of the graph data
• Source of the data
• Properties of the input data
• Target graph data structure
• Execution machine
Graph loading pipeline must be adapted to the scenario at hand
Open questions along the pipeline:
• Identifier data type? Binary, decimal, string?
• Can input data be read multiple times?
• Random access possible?
• Explicit vertex list available?
• Which data structure to generate?
Parsers
Reference: T. Mühlbauer, W. Rödiger, R. Seilbeck, A. Reiser, A. Kemper, and T. Neumann. Instant loading for main memory databases. Proceedings of the VLDB Endowment, 2013.
Binary reader
• No parsing necessary => directly copy vertex identifiers
• Every edge same size => work splitting trivial
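As a rough illustration of why fixed-size edges make work splitting trivial, the sketch below divides a binary edge buffer evenly among workers and copies the identifiers out directly. The 8-byte edge layout (two 32-bit identifiers in native byte order) and the names `Edge`, `worker_range`, and `read_binary_edges` are assumptions made for this example, not details from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Assumed layout: each edge is two 32-bit identifiers in native byte order.
struct Edge {
    std::uint32_t src;
    std::uint32_t dst;
};

// Half-open range [first, last) of edge indices handled by one worker.
// Every edge has the same size, so the split needs no scan of the input.
std::pair<std::size_t, std::size_t> worker_range(std::size_t num_edges,
                                                 std::size_t num_workers,
                                                 std::size_t worker) {
    assert(num_workers > 0 && worker < num_workers);
    return {num_edges * worker / num_workers,
            num_edges * (worker + 1) / num_workers};
}

// Copies the edges in [first, last) out of the raw buffer; nothing is parsed.
std::vector<Edge> read_binary_edges(const unsigned char* data,
                                    std::size_t first, std::size_t last) {
    assert(first <= last);
    std::vector<Edge> edges(last - first);
    if (!edges.empty()) {
        std::memcpy(edges.data(), data + first * sizeof(Edge),
                    edges.size() * sizeof(Edge));
    }
    return edges;
}
```

Each worker would call `worker_range` with its own index and then `read_binary_edges` on the resulting range, filling a worker-local buffer.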
Library-provided decimal parsing
• Readily available for many languages
• We evaluated C++’s stream operator and strtol
• Varying edge length => work splitting more complex
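One way to handle varying edge lengths is to cut the text at roughly equal byte positions and then move each cut forward to the next line start. The sketch below does this and parses each chunk with `strtol`. It assumes one "src dst" pair per line, well-formed input, and the hypothetical helper names `align_to_line_start`, `split_into_chunks`, and `parse_chunk`; the talk does not specify how the splitting was implemented.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

using Chunk = std::pair<std::size_t, std::size_t>;  // half-open byte range

// Moves a tentative cut position forward to the start of the next line,
// so that no worker begins in the middle of an edge.
std::size_t align_to_line_start(const std::string& text, std::size_t pos) {
    if (pos == 0) return 0;
    if (pos >= text.size()) return text.size();
    if (text[pos - 1] == '\n') return pos;  // already at a line start
    const std::size_t newline = text.find('\n', pos);
    return newline == std::string::npos ? text.size() : newline + 1;
}

// Splits the text into one line-aligned chunk per worker.
std::vector<Chunk> split_into_chunks(const std::string& text,
                                     std::size_t num_workers) {
    assert(num_workers > 0);
    std::vector<Chunk> chunks;
    std::size_t begin = 0;
    for (std::size_t w = 1; w <= num_workers; ++w) {
        const std::size_t end =
            align_to_line_start(text, text.size() * w / num_workers);
        chunks.emplace_back(begin, end);
        begin = end;
    }
    return chunks;
}

// Parses all "src dst" lines of one chunk with the library function strtol.
std::vector<std::pair<long, long>> parse_chunk(const std::string& text,
                                               const Chunk& chunk) {
    std::vector<std::pair<long, long>> edges;
    const char* p = text.c_str() + chunk.first;
    const char* stop = text.c_str() + chunk.second;
    while (true) {
        // Skip whitespace here so strtol never reads into the next chunk.
        while (p < stop && std::isspace(static_cast<unsigned char>(*p))) ++p;
        if (p >= stop) break;
        char* next = nullptr;
        const long src = std::strtol(p, &next, 10);
        if (next == p) break;  // not a number: stop parsing this chunk
        p = next;
        const long dst = std::strtol(p, &next, 10);
        if (next == p) break;  // missing target identifier
        p = next;
        edges.emplace_back(src, dst);
    }
    return edges;
}
```

Chunks can differ slightly in size after alignment, which is the extra complexity compared with the fixed-size binary case.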
Iterative decimal parsing
• Multiply by ten and add character’s respective digit
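The multiply-by-ten loop can be sketched in a few lines. The function name `parse_id` and the cursor-based interface are assumptions for illustration; the sketch also assumes unsigned identifiers and performs no overflow checking.

```cpp
#include <cassert>
#include <cstdint>

// Parses an unsigned decimal identifier starting at cursor and advances the
// cursor to the first non-digit character (or to end).
// Assumes well-formed input: no sign, no overflow handling.
inline std::uint64_t parse_id(const char*& cursor, const char* end) {
    assert(cursor != nullptr && cursor <= end);
    std::uint64_t value = 0;
    while (cursor != end && *cursor >= '0' && *cursor <= '9') {
        // Multiply by ten and add the digit the character stands for.
        value = value * 10 + static_cast<std::uint64_t>(*cursor - '0');
        ++cursor;
    }
    return value;
}
```

A caller would parse the source identifier, skip the separator, and parse the target identifier with a second call.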
Vectorized decimal parsing
• Leverage wide vector units for identifier parsing
Parser code generation
[Figure: chart comparing the parsers; tick labels 2x, 20x, 200x]
Data Structures and Identifier Relabeling
Closely related areas
No relabeling (Identity) => Map of Neighbor Lists
• Directly use dataset identifiers
• Runtime overhead for neighbor and property accesses
• Simple and efficient to load
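A map of neighbor lists keyed by the original identifiers might look like the following sketch. The use of `std::unordered_map` and the names `NeighborMap`, `build_neighbor_map`, and `neighbors` are assumptions for this example; the talk does not name a concrete container.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using VertexId = std::uint64_t;  // original dataset identifier, used as-is
using NeighborMap = std::unordered_map<VertexId, std::vector<VertexId>>;

// Loading is simple: append each target to its source's neighbor list.
NeighborMap build_neighbor_map(
    const std::vector<std::pair<VertexId, VertexId>>& edges) {
    NeighborMap graph;
    for (const auto& e : edges) {
        graph[e.first].push_back(e.second);
        graph[e.second];  // ensure vertices without outgoing edges exist too
    }
    return graph;
}

// Every neighbor access pays for a hash lookup on the original identifier.
const std::vector<VertexId>& neighbors(const NeighborMap& graph, VertexId v) {
    const auto it = graph.find(v);
    assert(it != graph.end());
    return it->second;
}
```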
Dense relabeling => Compressed Sparse Row (CSR)
• Dense identifiers [0, |V|-1]
• Packed, sequential memory layout
• Allows offset-based data structure access
• e.g. for neighbor lists, or properties
• Overhead during loading
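The loading overhead of CSR comes from the extra counting and scattering passes, as the following sketch shows. It assumes the edges already carry dense identifiers in [0, |V|-1]; the struct `Csr` and the function `build_csr` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Csr {
    std::vector<std::size_t> offsets;      // |V| + 1 entries
    std::vector<std::uint32_t> neighbors;  // |E| entries, packed sequentially
};

// Builds a CSR from edges whose endpoints are dense identifiers.
Csr build_csr(std::size_t num_vertices,
              const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
    Csr csr;
    csr.offsets.assign(num_vertices + 1, 0);
    // Pass 1: count the neighbors of every source vertex.
    for (const auto& e : edges) {
        assert(e.first < num_vertices && e.second < num_vertices);
        ++csr.offsets[e.first + 1];
    }
    // A prefix sum turns the counts into start offsets.
    for (std::size_t v = 0; v < num_vertices; ++v) {
        csr.offsets[v + 1] += csr.offsets[v];
    }
    // Pass 2: scatter the targets into the packed neighbor array.
    csr.neighbors.resize(edges.size());
    std::vector<std::size_t> next(csr.offsets.begin(), csr.offsets.end() - 1);
    for (const auto& e : edges) {
        csr.neighbors[next[e.first]++] = e.second;
    }
    return csr;
}
```

The neighbors of vertex `v` then occupy `neighbors[offsets[v]]` up to, but not including, `neighbors[offsets[v + 1]]`, which is the offset-based access the bullets refer to. Vertex properties can be stored in plain arrays indexed by the same dense identifier.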
[Figure: example neighbor lists (1), (1 2), (0 2), stored as a map with hash-based access and as the packed CSR neighbor array 1 1 2 0 2 with offset-based access]
Relabeling Strategies
Mapping
• Assign dense identifiers while reading the input data
• Global: All workers use a shared map
• Local: Each worker creates a local relabeling
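A minimal sketch of the global mapping variant follows. It protects one shared `std::unordered_map` with a mutex, which is a simplifying assumption; the talk does not say how the shared map is synchronized, and the class name `GlobalRelabeling` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Global mapping: all workers share one map from original to dense identifier.
class GlobalRelabeling {
public:
    // Returns the dense identifier of `original`, assigning the next free
    // identifier the first time a vertex is seen.
    std::uint32_t dense_id(std::uint64_t original) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(map_.size() < 0xFFFFFFFFu);  // dense ids must fit in 32 bits
        const auto result = map_.emplace(
            original, static_cast<std::uint32_t>(map_.size()));
        return result.first->second;
    }

    // Number of distinct vertices seen so far.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::uint64_t, std::uint32_t> map_;
};
```

In the local variant each worker would own such a map without a lock, and the worker-local identifiers would have to be reconciled into one dense range afterwards.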
Collection
• Gather unique identifiers while reading the input
• Assign dense identifiers at the end
• Global: Shared identifier set for all workers
• Local: Use a local set per worker
Relabeling is finalized/applied when the graph data structure is written
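The local collection variant can be sketched as follows: each worker fills its own set while reading, and the sets are merged once at the end. Assigning dense identifiers by sorted rank is an assumption of this example, as are the names `IdSet`, `finalize_relabeling`, and `dense_id`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

using IdSet = std::unordered_set<std::uint64_t>;  // one per worker

// Merges the worker-local sets into a sorted vector of unique identifiers.
// The position of an identifier in the result is its dense identifier.
std::vector<std::uint64_t> finalize_relabeling(
    const std::vector<IdSet>& local_sets) {
    std::vector<std::uint64_t> ids;
    for (const auto& s : local_sets) {
        ids.insert(ids.end(), s.begin(), s.end());
    }
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}

// Looks up the dense identifier of a known original identifier.
std::uint32_t dense_id(const std::vector<std::uint64_t>& ids,
                       std::uint64_t original) {
    const auto it = std::lower_bound(ids.begin(), ids.end(), original);
    assert(it != ids.end() && *it == original);
    return static_cast<std::uint32_t>(it - ids.begin());
}
```

Because no identifiers are assigned while reading, workers need no coordination until the merge; the edges are rewritten with `dense_id` when the final data structure is written.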
[Figure: worker-local identifier sets combined by set union (∪)]
Relabeling Strategies - Measurements
Graph loading times for various relabeling strategies
No further dataset properties leveraged
Leveraging Dataset Properties
Explicit vertex lists
• All unique vertices in the dataset are known beforehand
• No need to find and count vertices => improves loading efficiency
Partitioned edge list
• Edge list partitioned by source vertex
• Each source vertex has a responsible worker thread
• determined by the input data chunk
• Significantly reduces worker communication overhead
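One possible way to derive the responsible worker from the input chunk is to cut the edge list evenly and then move each cut forward until the source vertex changes. The sketch below assumes an in-memory edge list that is already grouped by source vertex; `chunk_boundaries` is a hypothetical helper, not the authors' implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Edge = std::pair<std::uint32_t, std::uint32_t>;  // (source, target)

// Returns num_workers + 1 boundaries; worker w handles the edges in
// [bounds[w], bounds[w + 1]). Requires edges to be grouped by source vertex.
std::vector<std::size_t> chunk_boundaries(const std::vector<Edge>& edges,
                                          std::size_t num_workers) {
    assert(num_workers > 0);
    std::vector<std::size_t> bounds;
    bounds.push_back(0);
    for (std::size_t w = 1; w < num_workers; ++w) {
        std::size_t pos =
            std::max(bounds.back(), edges.size() * w / num_workers);
        // Move the cut forward so a source vertex is never split.
        while (pos > 0 && pos < edges.size() &&
               edges[pos].first == edges[pos - 1].first) {
            ++pos;
        }
        bounds.push_back(pos);
    }
    bounds.push_back(edges.size());
    return bounds;
}
```

With such boundaries every source vertex falls into exactly one chunk, so a worker can count and write the neighbors of its vertices without coordinating with other workers.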
Example edge lists (source target):
Partitioned    Unpartitioned
1 2            4 3
1 3            1 3
1 4            3 1
2 1            1 4
2 4            2 1
3 1            1 2
3 2            3 2
4 3            2 4
Leveraging Dataset Properties - Measurements
Graphs
• LDBC-1000, |V| = 3.6M, |E| = 447M
• Twitter, |V| = 41.6M, |E| = 1.5B
Comparison with Existing Systems
                        Twitter          LDBC
Oracle PGX              2153s            632s
GraphBIG                out of memory    1682s
Ours non-partitioned    88s              24s
Ours partitioned        34s              7s
Graphs
• LDBC-1000, |V| = 3.6M, |E| = 447M
• Twitter, |V| = 41.6M, |E| = 1.5B
Machine:
• 2x Intel Xeon E5-2660 v2 (2 × 20 @ 2.2GHz)
• 256GB, Ubuntu 15.10, kernel 4.2.0
Influence on Analytics
                     CSR (relabeled)          Neighbors Map (identity)
                     Load + Run = Total       Load + Run = Total
PageRank             37s    33s    70s        25s    194s   219s
Triangle Counting    37s    49s    86s        25s    66s    92s
Graphs
• Twitter, |V| = 41.6M, |E| = 1.5B
Machine:
• 2x Intel Xeon E5-2660 v2 (2 × 20 @ 2.2GHz)
• 256GB, Ubuntu 15.10, kernel 4.2.0
Summary
Optimal loading pipeline for a graph dataset is highly dependent on the
• Data format
• Source of the data
• Properties of the dataset
• Algorithm-dependent graph data structure
• Target machine
Custom iterative identifier parsing always beneficial
Concurrent identifier relabeling mostly beneficial
• More challenging than identity mapping, but usually worth it
Leveraging properties of the dataset can lead to enormous speedups