Cypher-based Graph Pattern Matching in Gradoop

GRADES 2017: Graph Data-management Experiences & Systems
Chicago, May 2017

Martin Junghanns (1), Max Kießling (1,2), Alex Averbuch (2), André Petermann (1) and Erhard Rahm (1)
(1) University of Leipzig – Database Research Group
(2) Neo Technology


Motivation GRADOOP Implementation Benchmark


Motivation


"Who are the closest enemies of each Orc?"


Cypher


Flink Gelly


"Which two clan leaders hate each other and one of them knows Frodo over one to ten hops?"


Cypher


Flink Gelly (or any other non-declarative graph processing system)


GRADOOP


"An open-source graph dataflow system for declarative analytics of heterogeneous graph data."


Distributed Graph Storage (Apache HDFS)

Distributed Operator Execution (Apache Flink)

Extended Property Graph Model (EPGM)

Analytical API

I/O, Graph Operators, Graph Algorithms


Property Graph Model (example graph)

Vertices (label, properties):
Hobbit {name: Samwise}
Orc {name: Azog}
Clan {name: Tribes of Moria, founded: 1981}
Orc {name: Bolg}
Hobbit {name: Frodo, yob: 2968}

Edges (label, properties):
leaderOf {since: 2790}
memberOf {since: 2013}
hates {since: 2301}
hates
knows {since: 2990}


Extended Property Graph Model (example graph)

Logical graphs (label, properties):
Area {title: Mordor}
Area {title: Shire}

Vertices (label, properties):
Hobbit {name: Samwise}
Orc {name: Azog}
Clan {name: Tribes of Moria, founded: 1981}
Orc {name: Bolg}
Hobbit {name: Frodo, yob: 2968}

Edges (label, properties):
leaderOf {since: 2790}
memberOf {since: 2013}
hates {since: 2301}
hates
knows {since: 2990}


Gradoop Graph Transformations

Logical Graph
  Unary: Aggregation, Pattern Matching, Transformation, Grouping, Call, Subgraph
  Binary: Equality, Combination, Overlap, Exclusion, Fusion

Graph Collection
  Unary: Limit, Selection, Pattern Matching, Distinct, Apply, Reduce, Call
  Binary: Equality, Union, Intersection, Difference


[Figure: input graph 3, the pattern, and the resulting graph collection]

Pattern Matching (Single-Graph Setting)

GraphCollection collection = graph3.cypher("MATCH (:Green)-[:orange]->(:Orange) RETURN *", ISO, ISO);


Implementation


DataSet<EPGMGraphHead>

Id  Label  Properties
1   Area   {title:Mordor}
2   Area   {title:Shire}

DataSet<EPGMVertex>

Id  Label   Properties                              Graphs
1   Orc     {name:Azog}                             {1}
2   Clan    {name:Tribes of Moria, founded:1981}    {1}
3   Orc     {name:Bolg}                             {1,2}
4   Hobbit  {name:Frodo, yob:2968}                  {2}
5   Hobbit  {name:Samwise}                          {2}

DataSet<EPGMEdge>

Id  Label     Source  Target  Properties    Graphs
1   leaderOf  1       2       {since:2790}  {1}
2   memberOf  3       2       {since:2013}  {1}
3   hates     3       4       {since:2301}  {2}
4   hates     3       5       {}            {2}
5   knows     5       4       {since:2990}  {2}
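The graph-membership column can be illustrated with a small plain-Java sketch. It only mirrors the vertex table from this slide (properties omitted); the class and method names are made up for the example and are not Gradoop's EPGMVertex API, and an in-memory list stands in for the Flink DataSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the vertex table; not Gradoop's EPGMVertex class.
class EpgmSketch {

  static final class Vertex {
    final long id;
    final String label;
    final Set<Long> graphIds; // ids of the logical graphs that contain this vertex

    Vertex(long id, String label, Long... graphIds) {
      this.id = id;
      this.label = label;
      this.graphIds = new HashSet<>(Arrays.asList(graphIds));
    }
  }

  // The five vertices from the table (properties omitted for brevity).
  static List<Vertex> vertices() {
    return Arrays.asList(
        new Vertex(1, "Orc", 1L),
        new Vertex(2, "Clan", 1L),
        new Vertex(3, "Orc", 1L, 2L),
        new Vertex(4, "Hobbit", 2L),
        new Vertex(5, "Hobbit", 2L));
  }

  // Returns the ids of all vertices that belong to the given logical graph.
  static List<Long> vertexIdsOfGraph(long graphId) {
    List<Long> result = new ArrayList<>();
    for (Vertex v : vertices()) {
      if (v.graphIds.contains(graphId)) {
        result.add(v.id);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(vertexIdsOfGraph(1)); // graph 1 (Mordor): [1, 2, 3]
    System.out.println(vertexIdsOfGraph(2)); // graph 2 (Shire):  [3, 4, 5]
  }
}
```

Vertex 3 (Bolg) appears in both results because its graph set is {1,2}, which is how overlapping logical graphs share elements in this representation.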


Parsing, Planning, Execution

Query variables: c1, c2, o1, o2, h
Predicates: (c1 != c2) AND (o1 != o2) AND (h.name = Frodo Baggins)

[Figure: query graph, plan trees and the annotations => 23, => 42, => 84, => 123, => 456, => 789]


PlanTableEntry | type: GRAPH | all-vars: [...] | proc-vars: [...] | attr-vars: [] | est-card: 23 | prediates: () | Plan :
|-FilterEmbeddingsNode{filterPredicate=((c1 != c2) AND (o1 != o2))}
|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c1, filterPredicate=((c1.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e0', targetVar='c1', filterPredicate=((_e0.label = leaderOf)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o1, filterPredicate=((o1.label = Orc)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e1', targetVar='o2', filterPredicate=((_e1.label = hates)), projectionKeys=[]}
|.|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[h], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=h, filterPredicate=((h.label = Hobbit) AND (h.name = Frodo Baggins)), projectionKeys=[]}
|.|.|.|.|-ExpandEmbeddingsNode={startVar='o2', pathVar='_e3', endVar='h', lb=1, ub=10, direction=OUT, vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o2, filterPredicate=((o2.label = Orc)), projectionKeys=[]}
|.|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e3', targetVar='h', filterPredicate=((_e3.label = knows)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c2, filterPredicate=((c2.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e2', targetVar='c2', filterPredicate=((_e2.label = leaderOf)), projectionKeys=[]}


FilterAndProject

FlatMap(Vertex -> Embedding)

Input: DataSet<Vertex>

id  Properties
1   {…}
2   {…}
3   {…}
…   …

Filter Hobbit(name=Frodo Baggins)

name: Frodo Baggins, height: 1.22m, gender: male, city: Bag End

h.id  h.name  h.height  …
31    Frodo   1.22      …

Project [ ]

h.id
32

Output: DataSet<Embedding>

$\sigma_{Label = Hobbit \,\wedge\, name = Frodo}(V)$, $\pi_{h.Id}(V')$
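A minimal sketch of the filter-and-project step as a flat map, assuming a simplified in-memory model: Java streams stand in for the Flink DataSet, an embedding is a plain list of values, and the classes, the second vertex and its id are invented for illustration. It is not the operator's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Simplified stand-ins for illustration; not Gradoop's Vertex or Embedding classes.
class FilterAndProjectSketch {

  static final class Vertex {
    final long id;
    final String label;
    final Map<String, String> properties = new HashMap<>();

    Vertex(long id, String label, String... keyValuePairs) {
      this.id = id;
      this.label = label;
      for (int i = 0; i + 1 < keyValuePairs.length; i += 2) {
        properties.put(keyValuePairs[i], keyValuePairs[i + 1]);
      }
    }
  }

  // Emits one embedding (vertex id plus projected property values) if the vertex
  // passes the filter, and nothing otherwise. This is the flat-map step.
  static Stream<List<Object>> toEmbedding(Vertex v, List<String> projectionKeys) {
    boolean matches = "Hobbit".equals(v.label)
        && "Frodo Baggins".equals(v.properties.get("name"));
    if (!matches) {
      return Stream.empty();
    }
    List<Object> embedding = new ArrayList<>();
    embedding.add(v.id);
    for (String key : projectionKeys) {
      embedding.add(v.properties.get(key));
    }
    return Stream.of(embedding);
  }

  static List<List<Object>> run(List<String> projectionKeys) {
    // Example input; vertex 7 and its id are made up for this sketch.
    List<Vertex> vertices = Arrays.asList(
        new Vertex(31, "Hobbit", "name", "Frodo Baggins", "height", "1.22m"),
        new Vertex(7, "Orc", "name", "Azog"));
    return vertices.stream()
        .flatMap(v -> toEmbedding(v, projectionKeys))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(run(Arrays.asList("name")));   // [[31, Frodo Baggins]]
    System.out.println(run(new ArrayList<String>())); // [[31]]
  }
}
```

With an empty projection list only the vertex id survives, which corresponds to the `projectionKeys=[]` case: properties needed only for filtering are dropped before any join.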


JoinEmbeddings

Left: (c1:Clan)<-[:hasLeader]-(o1:Orc)
Right: (o1:Orc)-[:hates]->(o2:Orc)

Left input: DataSet<Embedding>

c.id  _e1.id  o1.id
51    11      2
52    12      3
…     …       …

Right input: DataSet<Embedding>

o1.id  _e2.id  o2.id
2      13      5
3      14      3
…      …       …

FlatJoin(lhs, rhs -> combine(lhs, rhs)), i.e. $L \bowtie_{o1.id} R$

Output: DataSet<Embedding>

c.id  _e1.id  o1.id  _e2.id  o2.id
51    11      2      13      5
52    12      3      14      3
…     …       …      …       …

Combine: check for vertex/edge isomorphism, remove duplicate entries
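The join on o1.id and the combine step can be sketched in plain Java. This is an illustration under simplifying assumptions, not the operator's implementation: embeddings are fixed-width lists of ids, a nested loop replaces Flink's distributed join, and the column positions (vertices at 0, 2, 4 and edges at 1, 3 in the combined row) are hard-coded for this one pattern. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; rows are [c, e1, o1] on the left and [o1, e2, o2] on the right.
class JoinEmbeddingsSketch {

  static List<List<Long>> left() {
    return Arrays.asList(Arrays.asList(51L, 11L, 2L), Arrays.asList(52L, 12L, 3L));
  }

  static List<List<Long>> right() {
    return Arrays.asList(Arrays.asList(2L, 13L, 5L), Arrays.asList(3L, 14L, 3L));
  }

  // Joins on o1.id and combines matching rows into [c, e1, o1, e2, o2].
  // The join column is kept only once, so it does not appear twice in the result.
  static List<List<Long>> join(List<List<Long>> lhs, List<List<Long>> rhs, boolean isomorphism) {
    List<List<Long>> result = new ArrayList<>();
    for (List<Long> l : lhs) {
      for (List<Long> r : rhs) {
        if (!l.get(2).equals(r.get(0))) {
          continue; // o1.id differs, no join partner
        }
        List<Long> combined = new ArrayList<>(l);
        combined.add(r.get(1));
        combined.add(r.get(2));
        if (!isomorphism || isInjective(combined)) {
          result.add(combined);
        }
      }
    }
    return result;
  }

  // Isomorphism check: no vertex id and no edge id may be bound to two variables.
  static boolean isInjective(List<Long> row) {
    Set<Long> vertexIds = new HashSet<>(Arrays.asList(row.get(0), row.get(2), row.get(4)));
    Set<Long> edgeIds = new HashSet<>(Arrays.asList(row.get(1), row.get(3)));
    return vertexIds.size() == 3 && edgeIds.size() == 2;
  }

  public static void main(String[] args) {
    System.out.println(join(left(), right(), false)); // [[51, 11, 2, 13, 5], [52, 12, 3, 14, 3]]
    System.out.println(join(left(), right(), true));  // [[51, 11, 2, 13, 5]]
  }
}
```

In this sketch the second combined row is rejected once the isomorphism check is enabled, because vertex 3 would be bound to both o1 and o2; without the check (homomorphism) both rows are kept.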


Benchmark


• 16x Intel(R) Xeon(R) 2.50GHz 6 (12), 48 GB RAM
• 1 Gigabit Ethernet
• Hadoop 2.6.0
• Flink 1.1.2

Dataset       # Vertices  # Edges  Disk size
LDBC-SNB 10   29 M        167 M    19 GB
LDBC-SNB 100  271 M       1.6 B    191 GB

Queries: Q2, Q6


Summary

• Gradoop & Extended Property Graph Model
    • Schema flexible: Type Labels and Properties
    • Logical Graphs / Graph Collections

• Cypher Pattern Matching Operator
    • Flexible operator for computing matches
    • Combination with existing analytical operators
    • Extendible architecture (planner, statistics, …)

• Implemented on Apache Flink
    • Horizontal scalability
    • Combine with other Flink libraries


www.gradoop.com

Junghanns, M.; Kießling, M.; Averbuch, A.; Petermann, A.; Rahm, E., "Cypher-based Graph Pattern Matching in Gradoop", Proc. ACM SIGMOD Workshop on Graph Data Management Experiences and Systems (GRADES), 2017.

Petermann, A.; Junghanns, M.; Kemper, S.; Gomez, K.; Teichmann, N.; Rahm, E., "Graph Mining for Complex Data Analytics", Proc. ICDM Conf. (Demo), 2016.

Junghanns, M.; Petermann, A.; Teichmann, N.; Gomez, K.; Rahm, E., "Analyzing Extended Property Graphs with Apache Flink", Int. Workshop on Network Data Analytics (NDA), SIGMOD, 2016.

Petermann, A.; Junghanns, M., "Scalable Business Intelligence with Graph Collections", it – Special Issue on Big Data Analytics, 2016.

Petermann, A.; Junghanns, M.; Müller, M.; Rahm, E., "Graph-based Data Integration and Business Intelligence with BIIIG", Proc. VLDB Conf. (Demo), 2014.