35
Large-Scale Graph Processing on Emerging Storage Devices Nima Elyasi 1 , Changho Choi 2 , Anand Sivasubramaniam 1 1 Pennsylvania State University 2 Samsung Semiconductor Inc.

Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Large-Scale Graph Processing on Emerging Storage Devices

Nima Elyasi1, Changho Choi2, Anand Sivasubramaniam1

1Pennsylvania State University

2Samsung Semiconductor Inc.

Page 2: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Graph Processing is Commonplace

2

Search Engines Social MediaRecommendations

and AdsMap and

Navigation

Page 3: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Large-Scale Graph Processing Challenges

Huge Datasets Irregular Accesses

5

High cost of DRAM

$$$$

DRAM

Page 4: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Large-Scale Graph Processing Challenges

Huge Datasets Irregular Accesses

External Graph Processing is Desirable

5

High cost of DRAM

$$$$

NVMe SSD

$

DRAM

Page 5: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Large-Scale Graph Processing Challenges

Huge Datasets Irregular Accesses

External Graph Processing is Desirable

5

High cost of DRAM

$$$$

NVMe SSD

$

DRAM

Page 6: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Large-Scale Graph Processing Challenges

Huge Datasets Irregular Accesses

External Graph Processing is Desirable

5

High cost of DRAM

$$$$

NVMe SSDFine-Grained and Random

Accesses

$

DRAM

Page 7: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Fine-Grained Access in External Graph Processing

5

SSD Page Size and Vertex Accesses Don’t Match!

SSD Page 0 SSD Page 1

SSD Page

Several KiloBytes(4KB ~ 16KB) Several Bytes,

e.g., 4Bytes

Vertex Value

Irregular Accesses

Page 8: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Fine-Grained Access in External Graph Processing

5

SSD Page Size and Vertex Accesses Don’t Match!

SSD Page 0 SSD Page 1

SSD Page

Several KiloBytes(4KB ~ 16KB)

Vertex updates are detrimental to:

Performance Device Endurance

Several Bytes, e.g., 4Bytes

Vertex Value

Irregular Accesses

Page 9: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Providing Perfect Sequentiality as a Remedy

9

• If vertex data could be stored on DRAM• Fine-grained accesses was less of an issue

GraFBoost, ISCA’18Instead, prior external graph processing framework maintains vertex data on SSD

Page 10: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Providing Perfect Sequentiality as a Remedy

10

• If vertex data could be stored on DRAM• Fine-grained accesses was less of an issue

GraFBoost, ISCA’18Instead, prior external graph processing framework maintains vertex data on SSD

Achieves perfect sequentiality by coalescing fine-grained accesses

Page 11: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Programming Model

5

Vertex-centric Programming Model

- Iterative programming model

- Each vertex runs a user-defined program

- Sending updates to neighbors along outgoing edges

A

Page 12: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Prior External Graph Processing -- GraFBoost

12

Vertex Data Index File

Edge File

Sang-Woo Jun, et al. Grafboost: Using accelerated flash storage for external graph analytics, ISCA’18.

Page 13: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Prior External Graph Processing -- GraFBoost

13

V0 → Vx

V0 → Vy

V0 → Vz

Vertex Data Index File

Edge File

Sang-Woo Jun, et al. Grafboost: Using accelerated flash storage for external graph analytics, ISCA’18.

Page 14: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Prior External Graph Processing -- GraFBoost

14

Keys: {Vx , Vy , Vz}, Value: {V0 value}

<Vx,V0 value>, <Vy,V0 value>, <Vz,V0 value>

V0 → Vx

V0 → Vy

V0 → Vz

Vertex Data Index File

Edge File

Sang-Woo Jun, et al. Grafboost: Using accelerated flash storage for external graph analytics, ISCA’18.

Page 15: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Prior External Graph Processing -- GraFBoost

15

Keys: {Vx , Vy , Vz}, Value: {V0 value}

<Vx,V0 value>, <Vy,V0 value>, <Vz,V0 value>

V0 → Vx

V0 → Vy

V0 → VzGraFBoost sorts key-value pairs in memory, logs them in SSD, merges them, and updates vertex list in SSD

Vertex Data Index File

Edge File

Sang-Woo Jun, et al. Grafboost: Using accelerated flash storage for external graph analytics, ISCA’18.

Page 16: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Computation Overhead of Sort!

16

• Up to 60% sort overhead (web graph)

• Higher sort overhead for PageRank- Processes all vertices in each iteration and generates more updates

Page 17: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Current External Graph Processing:

Read from SSD Sort in Memory Write to SSD

Linear Time O(|E|) |E|*log(|E|) Linear Time O(|E|)

Scalability Issue

17

Page 18: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Current External Graph Processing:

Read from SSD Sort in Memory Write to SSD

Linear Time O(|E|) |E|*log(|E|) Linear Time O(|E|)

Assuming DRAM “k” times faster than SSD (e.g., k=30):

Scalability Issue

18

When k < log(|E|) → Sorting can become bottleneck

Page 19: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Current External Graph Processing:

Read from SSD Sort in Memory Write to SSD

Linear Time O(|E|) |E|*log(|E|) Linear Time O(|E|)

Assuming DRAM “k” times faster than SSD (e.g., k=30):

Scalability Issue

19

When k < log(|E|) → Sorting can become bottleneck

Page 20: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Current External Graph Processing:

Read from SSD Sort in Memory Write to SSD

Linear Time O(|E|) |E|*log(|E|) Linear Time O(|E|)

Assuming DRAM “k” times faster than SSD (e.g., k=30):

Scalability Issue

20

When k < log(|E|) → Sorting can become bottleneck

Instead, we propose a vertex partitioning to eliminate the sorting

Page 21: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Extensive Prior Efforts on Partitioning Graph Data:

- Not well suited for fully external graph processing

Partitioning Graph Data

21

Require all vertices be present in main memory

Do not decouple vertices and edges

FlashGraph, FAST’15 GraphChi, OSDI’12, Mosaic, EuroSys’17

PowerGraph, OSDI’12GridGraph, USENIX ATC’15

GraphP, HPCA’18

Need each partition be completely present in

cache or memory

Dramatically increasing number of partitions and incurring high cross-

partition communication

Page 22: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Reorganizing graph data so that vertices associated with each partition can fit in main memory

Instead, We Propose a Partitioning for Vertex Data

22

Source Vertices

Destination Vertices

Page 23: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Reorganizing graph data so that vertices associated with each partition can fit in main memory

Instead, We Propose a Partitioning for Vertex Data

23

Sou

rce V

erte

x D

ata

Vertex ID &

ValueIndex

Vertex A Offset A

Vertex B Offset B

Vertex C Offset C

Partition 0

So

rted B

ased

on

Vertex

ID

Edge Data

Vertex A

Out-edge

Vertex A

Out-edge

Vertex B

Out-edge

Vertex C

Out-edge

Source Vertices

Destination Vertices

Page 24: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

In each iteration:

Execution Flow

24

SSD

Vertex Data

Destination

Vertex for a

partition

Memory

Page 25: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

In each iteration:

Execution Flow

25

Sou

rce V

erte

x D

ata

Vertex ID &

ValueIndex

Vertex A Offset A

Vertex B Offset B

Vertex C Offset C

Partition 0

SSD

Vertex Data

Destination

Vertex for a

partition

A Chunk

of Source

Vertex

(32MB)

Update Destination

Vertices

Reading

Neighboring

Information

Memory Memory

Stre

am

ing F

rom

SS

D

Page 26: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

In each iteration:

Execution Flow

26

Sou

rce V

erte

x D

ata

Vertex ID &

ValueIndex

Vertex A Offset A

Vertex B Offset B

Vertex C Offset C

Partition 0

SSD

Vertex Data

Destination

Vertex for a

partition

A Chunk

of Source

Vertex

(32MB)

Write all

updated

vertex data

on SSD

Update Destination

Vertices

Reading

Neighboring

Information

Generate Mirror Updates

for other partitions

Meta-

data for

current

partition

Memory Memory Memory

Stre

am

ing F

rom

SS

D

Page 27: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

In each iteration:

Execution Flow

27

Sou

rce V

erte

x D

ata

Vertex ID &

ValueIndex

Vertex A Offset A

Vertex B Offset B

Vertex C Offset C

Partition 0

SSD

Vertex Data

Destination

Vertex for a

partition

A Chunk

of Source

Vertex

(32MB)

Write all

updated

vertex data

on SSD

Update Destination

Vertices

Reading

Neighboring

Information

Generate Mirror Updates

for other partitions

Meta-

data for

current

partition

Memory Memory Memory

Stre

am

ing F

rom

SS

D

How to Update Vertex List in Main Memory?

Page 28: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Multiple threads are updating elements of the same vertex list- High synchronization cost

Updating Vertices in Memory

28

Vertex List

Page 29: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Multiple threads are updating elements of the same vertex list- High synchronization cost

Updating Vertices in Memory

29

Vertex List

Buffer,

e.g., 1MB

Page 30: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Required Meta-Data for Mirror Updates

Updating Vertex Mirrors on Different Partitions

30O(|V|) running time for updating mirrors

Vertex Value

Partition i

Mirrors for

Partition 0

Source Vertex Table

Vertex 0

Part ID(s)

Vertex 1

Part ID(s)

Vertex 2

Part ID(s)

Partition 0

Start Index

End Index

Partition i

Start Index

End Index

For each

partition

Metadata

For each

VertexStart Index

End Index

Page 31: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Experimental Setup

• Processor: Intel Xeon -- 48 Cores

• Memory: DRAM – 256 GB

• SSD: Two Samsung NVMe SSDs - 3.2 TB capacity in total, and 6.4 GB/s Sequential Read Speed

• Graph Algorithms: - PageRank and Breadth-First-Search (BFS)

• Input Graphs:- Web, Twitter, Synthetic (Kron)

31

Page 32: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Performance Evaluation

32

• More than 2X Improvement Compared to GrafSoft

• Providing Higher Benefits for larger graphs (Web, Kron32)

• Incurring around 10% space overhead for partitioning

Page 33: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Execution Time Breakdown

33

• Mirror updates account for 8-12% of execution time

• I/O does not remain the main contributor to the total execution time

PageRank

Page 34: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Concluding Remarks

• Large-scale graph processing suffers from random updates to vertices

• State-of-the-art provides perfect sequentiality by sorting all updates

- High computation overhead

• A partitioning for vertex data is proposed to eliminate the need for perfect sequentiality

• In Future: Addressing timely evolving graphs

• Thanks to GraFboost authors (Sang-Woo Jun) !34

Page 35: Large-Scale Graph Processing on Emerging Storage Devices...Prior External Graph Processing -- GraFBoost 13 V 0 →V x V 0 →V y V 0 →V z Vertex Data Index File Edge File Sang-Woo

Thanks!

35