Upload
others
View
9
Download
0
Embed Size (px)
Citation preview
Extended Property Graphs and Cypher on Gradoop
1st openCypher Implementers Meeting
8 February 2017
Walldorf, Germany
Martin Junghanns University of Leipzig – Database Research Group
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 2
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
2
Gradoop
„An open-source graph dataflow system for declarative analytics of heterogeneous graph data.“
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 3
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
3
Gradoop
Distributed Graph Storage (Apache HDFS)
Apache Flink Operator Implementation
Distributed Operator Execution (Apache Flink)
Extended Property Graph Model (EPGM)
Graph Dataflow Operators
I/O
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 4
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
4
Extended Property Graphs
• Vertices and directed Edges
• Logical Graphs
• Identifiers
• Type Labels
• Properties
1 2
3
4
5 1 2 3
4
5
2
1
Hobbit name : Samwise
Orc name : Azog
Clan name : Tribes of Moria founded : 1981
Orc name : Bolg
Hobbit name : Frodo yob : 2968
leaderOf since : 2790
memberOf since : 2013
hates since : 2301
hates
knows since : 2990
|Area|title:Mordor
|Area|title:Shire
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 5
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
5
Extended Property Graphs
Graph Operators/Transformations
Unary Binary
Gra
ph
Co
llect
ion
Lo
gica
l Gra
ph
Equality
Union
Intersection
Difference
Limit
Selection
Pattern Matching
Distinct
Apply
Reduce
Call
Aggregation
Pattern Matching
Transformation
Grouping
Call
Subgraph
Equality
Combination
Overlap
Exclusion
Fusion
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 6
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
6
Extended Property Graphs
3
1 3
4
5 2 3
4
1 2 UDF
LogicalGraph graph3 = readFromHDFS(); LogicalGraph graph4 = graph3.subgraph( (vertex => vertex.getLabel().equals(‘Green’)), (edge => edge.getLabel().equals(‘orange’)));
Subgraph
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 7
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
7
Extended Property Graphs
3
1 3
4
5 2
Pattern
4 5
1 3
4
2
Graph Collection
GraphCollection collection = graph3.match(‘(:Green)-[:orange]->(:Orange)’);
Pattern Matching (Single Graph Input)
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 8
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
8
Cypher on Gradoop
„Which two clan leaders hate each other and one of them knows Frodo over one to ten hops?“
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 9
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
9
Cypher on Gradoop
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 10
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
10
Cypher on Gradoop
PlanTableEntry | type: GRAPH | all-vars: [...] | proc-vars: [...] | attr-vars: [] | est-card: 23 | prediates: () | Plan : |-FilterEmbeddingsNode{filterPredicate=((c1 != c2) AND (o1 != o2))} |.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I} |.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I} |.|.|.|-JoinEmbeddingsNode{joinVariables=[c1], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c1, filterPredicate=((c1.label = Clan)), projectionKeys=[]} |.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e0', targetVar='c1', filterPredicate=((_e0.label = leaderOf)), projectionKeys=[]} |.|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o1, filterPredicate=((o1.label = Orc)), projectionKeys=[]} |.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e1', targetVar='o2', filterPredicate=((_e1.label = hates)), projectionKeys=[]} |.|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I} |.|.|.|-JoinEmbeddingsNode{joinVariables=[h], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=h, filterPredicate=((h.label = Hobbit) AND (h.name = Frodo Baggins)), projectionKeys=[]} |.|.|.|.|-ExpandEmbeddingsNode={startVar='o2', pathVar='_e3', endVar='h', lb=1, ub=10, direction=OUT, vertexMorphism=H, edgeMorphism=I} |.|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o2, filterPredicate=((o2.label = Orc)), projectionKeys=[]} |.|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e3', targetVar='h', filterPredicate=((_e3.label = knows)), projectionKeys=[]} |.|.|.|-JoinEmbeddingsNode{joinVariables=[c2], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c2, filterPredicate=((c2.label = Clan)), projectionKeys=[]} |.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e2', targetVar='c2', filterPredicate=((_e2.label = leaderOf)), projectionKeys=[]}
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 11
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
11
Cypher on Gradoop
0 1 2 3 4 5 6 7 8 9
1 37 5 3 7 8 45 99 12 3
0 1 2
Frodo Baggins 1.22 Saruman
0
45: [4,1,33]
EmbeddingMetaData – Stores information about the embedding content
Mapping : Variable -> ID Column {h: 0, e1: 1, o2: 5, ...}
Mapping : Variable.Property -> Property Column {h.name: 0, h.height: 1, c1.name: 2, ...}
Embedding - Data structure used for intermediate results
Identifiers
Properties
Paths
Embedding
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 12
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
12
Cypher on Gradoop
Filter Hobbit(name=Frodo Baggins)
name: Frodo Baggins height: 1.22m gender: male city: Bag End
Project [ ]
h.id h.name h.height …
31 Frodo 1.22 …
h.id
32
id Properties
1 {…}
2 {…}
3 {…}
… …
DataSet<Vertex> DataSet<Embedding>
FlatMap(Vertex -> Embedding)
𝜋ℎ.𝐼𝑑(𝑉′) 𝜎 𝐿𝑎𝑏𝑒𝑙=𝐻𝑜𝑏𝑏𝑖𝑡
∧𝑛𝑎𝑚𝑒=𝐹𝑟𝑜𝑑𝑜(𝑉)
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 13
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
13
Cypher on Gradoop
c.id _e1.id o1.id
51 11 2
52 12 3
… … …
DataSet<Embedding> DataSet<Embedding>
FlatJoin(lhs, rhs -> combine(lhs, rhs)) DataSet<Embedding>
o1.id _e2.id o2.id
2 13 5
3 14 3
… … …
c.id _e1.id o1.id _e2.id o2.id
51 11 2 13 5
52 12 3 14 3
… … …
Combine Check for vertex/edge isomorphism,
Remove duplicate entries
JoinEmbeddings Left: (c1:Clan)<-[:hasLeader]-(o1:Orc) Right: (o1:Orc)-[:hates]->(o2.Orc)
𝐿 ⋈𝑜1.𝑖𝑑 𝑅
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 14
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
14
Cypher on Gradoop
ExpandEmbeddings
o2.id
5
DataSet<Embedding> DataSet<Embedding>
DataSet<Embedding>
_e3.sid _e3.id _e3.tid
5 26 31
31 27 32
32 28 33
o2.id _e3.id h.id
3 [26] 31
3 [26,31,27] 32
3 [26,31,27,32,28] 33
FlatJoin(lhs, rhs -> combine(lhs, rhs))
BulkIteration
𝐿 ⋈𝑜2.𝑖𝑑=_𝑒3.𝑠𝑖𝑑 𝐸
Left: (o2:Orc) Edge: (o2)-[:knows*1..10]->(h)
𝐸′ ⋈𝑒.𝑡𝑖𝑑=_𝑒3.𝑠𝑖𝑑 𝐸
Combine Check for vertex/edge isomorphism
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 15
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
15
Cypher on Gradoop
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 16
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
16
Cypher on Gradoop
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 17
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
17
Cypher on Gradoop
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 18
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
18
Conclusion
• Implement Cypher Technology Compatibility KIT (TCK) integration tests • Benchmarking
• Implement and evaluate LDBC benchmarking queries
• Optimizations • DP-Planner • Improve cost model (more statistics, Flink optimizer hints) • Reuse of intermediate results • Consider graph partitioning
• Support more Cypher features • e.g. Aggregation and Functions
• Introduce new Cypher features • e.g. regular path queries
Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 19
Gradoop Extended Property Graphs Conclusion Cypher on Gradoop
19
Conclusion
• Gradoop on Apache Flink • Extended Property Graph abstraction on Apache Flink • Schema flexible: Type Labels and Properties • Logical Graphs / Graphs Collection • Graph Transformations for Graphs and Graph collections
• Cypher on Gradoop • Covering many Cypher features (variable length paths, predicates) • Query execution engine incl. Greedy cost-based optimizer • Physical operators mapped to Flink transformations
www.gradoop.com
[1] Junghanns, M.; Petermann, A.; K.; Rahm, E., „Distributed Grouping of Property Graphs with Gradoop“, Proc. BTW Conf. , 2017.
[2] Petermann, A.; Junghanns, M.; Kemper, S.; Gomez, K.; Teichmann, N.; Rahm, E., „Graph Mining for Complex Data Analytics “,
Proc. ICDM Conf. (Demo), 2016.
[3] Junghanns, M.; Petermann, A.; Teichmann, N.; Gomez, K.; Rahm, E., „Analyzing Extended Property Graphs with Apache Flink“,
Int. Workshop on Network Data Analytics (NDA), SIGMOD, 2016.
[4] Petermann, A.; Junghanns, M., „Scalable Business Intelligence with Graph Collections“,
it – Special Issue on Big Data Analytics, 2016.
[5] Petermann, A.; Junghanns, M.; Müller, M.; Rahm, E., „Graph-based Data Integration and Business Intelligence with BIIIG“,
Proc. VLDB Conf. (Demo), 2014.