Modelling Data in Neo4j, bidirectional relationships, qualifying relationships with properties vs. relationship types (performance comparison), Neo4j hardware sizing, Cypher vs. Java API
Modelling Data in Neo4j
plus a few best practices and lessons learned
by Michal Bachman
GraphAware™
Contents
Quick intro
1x mistake
1x experiment
1x FAQ
1x case-study
Data Has Changed
Larger Volumes
Less Structured
More Interconnected
Polyglot Persistence
NoSQL
Key-Value Stores
Column-Family Stores
Document Databases
Graph Databases
Graph Databases
Key-value, column-family and document stores use aggregate data models; graph databases work with simple records and complex interconnections.
Neo4j
Open-source
Schema-less
JVM-based
Fully ACID
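Property Graph
[Figure: a property graph describing the film Pulp Fiction]
Nodes:
  name: "Pulp Fiction", year: 1994, type: "movie"
  name: "Quentin Tarantino", type: "person"
  name: "Samuel L. Jackson", type: "person"
  name: "Drama", type: "genre"
  name: "Thriller", type: "genre"
  name: "Director", type: "occupation"
  name: "Actor", type: "occupation"
Relationships:
  (Quentin Tarantino)-[:DIRECTED]->(Pulp Fiction)
  (Quentin Tarantino)-[:ACTED_IN {role: "Jimmie Dimmick"}]->(Pulp Fiction)
  (Samuel L. Jackson)-[:ACTED_IN {role: "Jules Winnfield"}]->(Pulp Fiction)
  (Pulp Fiction)-[:IS_OF_GENRE]->(Drama)
  (Pulp Fiction)-[:IS_OF_GENRE]->(Thriller)
  (Quentin Tarantino)-[:IS_A]->(Director)
  (Quentin Tarantino)-[:IS_A]->(Actor)
  (Samuel L. Jackson)-[:IS_A]->(Actor)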
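To make the property graph idea concrete, here is a minimal in-memory sketch in plain Java. It is not the Neo4j API: the class `TinyPropertyGraph` and its methods are hypothetical names chosen for illustration, and only part of the Pulp Fiction example is built.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal in-memory property graph; hypothetical names, not Neo4j API. */
class TinyPropertyGraph {
    /** A node is a bag of key-value properties. */
    static class Node {
        final Map<String, Object> properties = new LinkedHashMap<>();

        Node with(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    /** A relationship has a type, a direction (start -> end) and its own properties. */
    static class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Relationship(Node start, String type, Node end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }

        Relationship with(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Relationship> relationships = new ArrayList<>();

    Node createNode() {
        Node node = new Node();
        nodes.add(node);
        return node;
    }

    Relationship relate(Node start, String type, Node end) {
        Relationship relationship = new Relationship(start, type, end);
        relationships.add(relationship);
        return relationship;
    }

    /** Builds part of the Pulp Fiction example from the slide. */
    static TinyPropertyGraph pulpFiction() {
        TinyPropertyGraph g = new TinyPropertyGraph();
        Node movie = g.createNode().with("name", "Pulp Fiction").with("year", 1994).with("type", "movie");
        Node tarantino = g.createNode().with("name", "Quentin Tarantino").with("type", "person");
        Node jackson = g.createNode().with("name", "Samuel L. Jackson").with("type", "person");
        Node actor = g.createNode().with("name", "Actor").with("type", "occupation");

        g.relate(tarantino, "DIRECTED", movie);
        g.relate(tarantino, "ACTED_IN", movie).with("role", "Jimmie Dimmick");
        g.relate(jackson, "ACTED_IN", movie).with("role", "Jules Winnfield");
        g.relate(tarantino, "IS_A", actor);
        g.relate(jackson, "IS_A", actor);
        return g;
    }
}
```

Calling `TinyPropertyGraph.pulpFiction()` returns a graph in which both nodes and relationships carry properties, which is the defining feature of the model shown on the slide.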
Traversal
[Figure: the Pulp Fiction property graph from the Property Graph slide, shown again to illustrate a traversal.]
Modelling Data as Graphs
There is no single correct way.
One Way
[Figure: the Pulp Fiction property graph from the Property Graph slide. Genres and occupations are separate nodes (IS_OF_GENRE, IS_A), and each role is a property on an ACTED_IN relationship.]
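Another Way
[Figure: the same data modelled differently]
Nodes:
  name: "Pulp Fiction", year: 1994, type: "movie", genres: "Drama", "Thriller"
  name: "Quentin Tarantino", type: "person", occupation: "Actor", "Director"
  name: "Samuel L. Jackson", type: "person", occupation: "Actor"
  name: "Jules Winnfield", type: "role"
  name: "Jimmie Dimmick", type: "role"
Relationships:
  (Quentin Tarantino)-[:DIRECTED]->(Pulp Fiction)
  (Quentin Tarantino)-[:ACTED_AS]->(Jimmie Dimmick)
  (Samuel L. Jackson)-[:ACTED_AS]->(Jules Winnfield)
  (Jimmie Dimmick)-[:CHARACTER_IN]->(Pulp Fiction)
  (Jules Winnfield)-[:CHARACTER_IN]->(Pulp Fiction)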
Bidirectional Relationships
a common mistake
Ice Hockey
[Figure: Czech Republic and Sweden connected by a single DEFEATED relationship.]

Ice Hockey (Implied Relationship)
[Figure: the same two nodes with a second, implied DEFEATED_BY relationship added in the opposite direction.]
Traversals
In Neo4j, the speed of traversal does not depend on the direction of the relationships being traversed.
Why?
Neo4j Data Layout
Node record in the node store (9 bytes); first bit = inUse flag:
  next relationship (35 bits)
  next property (36 bits)
Relationship record in the relationship store (33 bytes); first bit = inUse flag, second bit unused:
  first node (35 bits)
  second node (35 bits)
  type (16 bits)
  first node's previous relationship (35 bits)
  first node's next relationship (35 bits)
  second node's previous relationship (35 bits)
  second node's next relationship (35 bits)
  next property (36 bits)
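The record layout suggests why direction does not affect traversal speed: each relationship record points to both of its nodes and sits in the relationship chain of both. The sketch below is a simplified plain-Java model of that idea, not Neo4j's storage code. The names `RecordChainSketch`, `RelRecord` and `Dir` are hypothetical, the "previous" pointers and properties are left out, and array indexes stand in for record ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified, hypothetical model of fixed-size store records; not Neo4j code. */
class RecordChainSketch {
    static final int NONE = -1;

    enum Dir { OUTGOING, INCOMING, BOTH }

    /** Relationship record: both end nodes plus one "next" pointer per end node. */
    static class RelRecord {
        final int firstNode;
        final int secondNode;
        int firstNext = NONE;  // next relationship in firstNode's chain
        int secondNext = NONE; // next relationship in secondNode's chain

        RelRecord(int firstNode, int secondNode) {
            this.firstNode = firstNode;
            this.secondNode = secondNode;
        }
    }

    final int[] firstRelOfNode; // node record: id of the node's first relationship
    final List<RelRecord> rels = new ArrayList<>();

    RecordChainSketch(int nodeCount) {
        firstRelOfNode = new int[nodeCount];
        Arrays.fill(firstRelOfNode, NONE);
    }

    /** Stores first -> second and links the record into the chains of BOTH nodes. */
    int relate(int first, int second) {
        RelRecord r = new RelRecord(first, second);
        r.firstNext = firstRelOfNode[first];
        r.secondNext = firstRelOfNode[second];
        rels.add(r);
        int id = rels.size() - 1;
        firstRelOfNode[first] = id;
        firstRelOfNode[second] = id;
        return id;
    }

    /** Walks one node's chain; each hop is one pointer lookup whatever the direction. */
    List<Integer> neighbours(int node, Dir dir) {
        List<Integer> result = new ArrayList<>();
        int id = firstRelOfNode[node];
        while (id != NONE) {
            RelRecord r = rels.get(id);
            boolean nodeIsFirst = r.firstNode == node;
            if (nodeIsFirst && dir != Dir.INCOMING) {
                result.add(r.secondNode);
            }
            if (r.secondNode == node && dir != Dir.OUTGOING) {
                result.add(r.firstNode);
            }
            id = nodeIsFirst ? r.firstNext : r.secondNext;
        }
        return result;
    }
}
```

With node 0 as Czech Republic and node 1 as Sweden, a single `relate(0, 1)` is enough: `neighbours(0, Dir.OUTGOING)` finds Sweden and `neighbours(1, Dir.INCOMING)` finds Czech Republic, each by walking its own chain, so a separate DEFEATED_BY record adds nothing.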
Company Partnership (Naturally Bidirectional)
[Figure: Neo Technology and GraphAware, first connected by two PARTNER relationships (one in each direction), then by a single PARTNER relationship.]
Why?
Neo4j APIs allow developers to completely ignore relationship direction when querying the graph.
Cypher
MATCH (neo)-[:PARTNER]->(partner)   // outgoing relationships only
MATCH (neo)<-[:PARTNER]-(partner)   // incoming relationships only
MATCH (neo)-[:PARTNER]-(partner)    // direction ignored
Qualifying Relationships
performance comparison
Qualifying by Properties
[Figure: Michal, Mark and Daniela each have a RATED relationship to Pulp Fiction; the relationships carry the properties rating: 5, rating: 1 and rating: 4.]
Who liked Pulp Fiction? (Cypher)
START pulpFiction=node({id})
MATCH (pulpFiction)<-[r:RATED]-(fan)
WHERE r.rating > 3
RETURN fan

Who liked Pulp Fiction? (Java)
for (Relationship r : pulpFiction.getRelationships(INCOMING, RATED)) {
    if ((int) r.getProperty("rating") > 3) {
        Node fan = r.getStartNode();
        // do something with it
    }
}
Qualifying by Relationship Type
[Figure: Michal, Mark and Daniela are connected to Pulp Fiction by LOVED, HATED and LIKED relationships instead of RATED relationships with a rating property.]
Who liked Pulp Fiction? (Cypher)
START pulpFiction=node({id})
MATCH (pulpFiction)<-[r:LIKED|LOVED]-(fan)
RETURN fan

Who liked Pulp Fiction? (Java)
for (Relationship r : pulpFiction.getRelationships(INCOMING, LIKED, LOVED)) {
    Node fan = r.getStartNode();
    // do something with it
}
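The two modelling choices can be compared side by side in a small plain-Java sketch. It does not use the Neo4j API: `QualifyingSketch` and `Rel` are hypothetical stand-ins for the relationships pointing at Pulp Fiction, and the assignment of the ratings 1 and 4 (and of HATED and LIKED) to Mark and Daniela is an assumption, since the slides do not say who gave which rating.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the relationships pointing at Pulp Fiction. */
class QualifyingSketch {
    static class Rel {
        final String type;
        final String fan;
        final Map<String, Object> properties = new HashMap<>();

        Rel(String type, String fan) {
            this.type = type;
            this.fan = fan;
        }

        Rel with(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    /** Qualifying by property: one property lookup per RATED relationship. */
    static List<String> fansByProperty(List<Rel> incoming) {
        List<String> fans = new ArrayList<>();
        for (Rel r : incoming) {
            if (r.type.equals("RATED") && ((Integer) r.properties.get("rating")) > 3) {
                fans.add(r.fan);
            }
        }
        return fans;
    }

    /** Qualifying by type: the relationship type alone answers the question. */
    static List<String> fansByType(List<Rel> incoming) {
        List<String> fans = new ArrayList<>();
        for (Rel r : incoming) {
            if (r.type.equals("LIKED") || r.type.equals("LOVED")) {
                fans.add(r.fan);
            }
        }
        return fans;
    }

    /** Model 1: a single RATED type, qualified by a rating property. */
    static List<Rel> ratedModel() {
        List<Rel> rels = new ArrayList<>();
        rels.add(new Rel("RATED", "Michal").with("rating", 5));
        rels.add(new Rel("RATED", "Mark").with("rating", 1));    // assumed assignment
        rels.add(new Rel("RATED", "Daniela").with("rating", 4)); // assumed assignment
        return rels;
    }

    /** Model 2: the qualification is encoded in the relationship type itself. */
    static List<Rel> typedModel() {
        List<Rel> rels = new ArrayList<>();
        rels.add(new Rel("LOVED", "Michal"));
        rels.add(new Rel("HATED", "Mark"));    // assumed assignment
        rels.add(new Rel("LIKED", "Daniela")); // assumed assignment
        return rels;
    }
}
```

Both `fansByProperty(ratedModel())` and `fansByType(typedModel())` yield the same fans. The difference is that the first has to read a property from every RATED relationship before deciding, while the second decides on the type alone.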
Winner!
[Figure: the relationship-type model again: Michal, Mark and Daniela connected to Pulp Fiction by LOVED, HATED and LIKED relationships.]
Other interesting info?
Hardware Sizing
frequently asked question
Neo4j Architecture
[Figure: layered architecture diagram with the following labels]
HDD: Record Files (Nodes, Relationships, Properties, Relationship Types), Transaction Log
Operating System: File System Cache
JVM / Neo4j: Object Cache, Core API, Other APIs, Transaction Management
Disk Space
> cd data
> ls -ah
drwxr-xr-x  5 bachmanm  wheel  170B 19 Oct 12:56 index
-rw-r--r--  1 bachmanm  wheel   31K 19 Oct 12:56 messages.log
-rw-r--r--  1 bachmanm  wheel   69B 19 Oct 12:56 neostore
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.id
-rw-r--r--  1 bachmanm  wheel  8.8K 19 Oct 12:56 neostore.nodestore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.nodestore.db.id
-rw-r--r--  1 bachmanm  wheel   39M 19 Oct 12:56 neostore.propertystore.db
-rw-r--r--  1 bachmanm  wheel  153B 19 Oct 12:56 neostore.propertystore.db.arrays
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.arrays.id
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.id
-rw-r--r--  1 bachmanm  wheel   43B 19 Oct 12:56 neostore.propertystore.db.index
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.index.id
-rw-r--r--  1 bachmanm  wheel  140B 19 Oct 12:56 neostore.propertystore.db.index.keys
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.index.keys.id
-rw-r--r--  1 bachmanm  wheel  154B 19 Oct 12:56 neostore.propertystore.db.strings
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.strings.id
-rw-r--r--  1 bachmanm  wheel   31M 19 Oct 12:56 neostore.relationshipstore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshipstore.db.id
-rw-r--r--  1 bachmanm  wheel   38B 19 Oct 12:56 neostore.relationshiptypestore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshiptypestore.db.id
-rw-r--r--  1 bachmanm  wheel  140B 19 Oct 12:56 neostore.relationshiptypestore.db.names
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshiptypestore.db.names.id
Disk Space
node 9B
relationship 33B
property 41B
Disk Space (Example)
1,000 nodes x 9B = 8.8 kB
1,000,000 rels x 33B = 31.5 MB
2,010,000 props x 41B = 78.6 MB
TOTAL 110.1 MB
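The arithmetic behind this example can be reproduced with a few lines of Java. This is a back-of-the-envelope sketch, assuming the per-record sizes from the slides (9, 33 and 41 bytes) and binary units (1 MB = 1024 x 1024 bytes), which is what the slide figures appear to use. The class and method names are hypothetical.

```java
/** Back-of-the-envelope store size estimate; per-record sizes are taken from the slides. */
class DiskSizing {
    static final long NODE_BYTES = 9;
    static final long RELATIONSHIP_BYTES = 33;
    static final long PROPERTY_BYTES = 41;

    /** Total bytes needed for the node, relationship and property records. */
    static long storeBytes(long nodes, long relationships, long properties) {
        return nodes * NODE_BYTES
                + relationships * RELATIONSHIP_BYTES
                + properties * PROPERTY_BYTES;
    }

    /** Converts bytes to megabytes using binary units (1 MB = 1024 * 1024 bytes). */
    static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    /** Rounds to one decimal place, matching the precision used on the slide. */
    static double roundToOneDecimal(double value) {
        return Math.round(value * 10.0) / 10.0;
    }
}
```

For the numbers on the slide, `DiskSizing.storeBytes(1_000, 1_000_000, 2_010_000)` gives 115,419,000 bytes, which `toMegabytes` and `roundToOneDecimal` turn into 110.1 MB.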
Low Level Cache
How about low level cache? Any guesses?
Same as disk space
High Level Cache
node 344B
relationship 208B
property 116B
...
Other interesting info?
Java API vs. Cypher
case study
Data Model
[Figure: User 1, User 2, User 3 and User 4 connected by TRAVELLED_WITH, TRAVELLED_TOGETHER and FRIEND relationships, each carrying a weight property (5, 1, 3 and 4 in the example).]
START from=node:node_auto_index(user_id="{FROM}"),
      to=node:node_auto_index(user_id="{TO}")
MATCH p = from-[r*1..5]->to
RETURN extract(n in nodes(p) : n.user_id),
       extract(rel in relationships(p) : rel.weight),
       extract(rel in relationships(p) : type(rel))
ORDER BY length(p),
         reduce(totalWeight = 0, rel in relationships(p) : totalWeight + rel.weight)
LIMIT 3

Cypher: > 1 second
Java API: 10 - 20 ms
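What the query asks for can be spelled out in plain Java: enumerate directed paths of one to five relationships between two users, order them by length and then by total weight, and keep the first three. The sketch below is not the Java API implementation behind the timing above, which the slides do not show. It is an in-memory illustration with hypothetical names (`WeightedPathSketch`, `Edge`, `Path`), and it assumes that a path may not reuse a relationship.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory illustration of the path query; not Neo4j code. */
class WeightedPathSketch {
    static class Edge {
        final String from;
        final String to;
        final String type;
        final int weight;

        Edge(String from, String type, String to, int weight) {
            this.from = from;
            this.type = type;
            this.to = to;
            this.weight = weight;
        }
    }

    static class Path {
        final List<Edge> edges;

        Path(List<Edge> edges) {
            this.edges = edges;
        }

        int length() {
            return edges.size();
        }

        int totalWeight() {
            int total = 0;
            for (Edge e : edges) {
                total += e.weight;
            }
            return total;
        }
    }

    final Map<String, List<Edge>> outgoing = new HashMap<>();

    void relate(String from, String type, String to, int weight) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(from, type, to, weight));
    }

    /** Paths of 1..maxDepth relationships, shortest first, then lightest, capped at limit. */
    List<Path> bestPaths(String from, String to, int maxDepth, int limit) {
        List<Path> found = new ArrayList<>();
        search(from, to, maxDepth, new ArrayList<>(), found);
        found.sort(Comparator.comparingInt(Path::length).thenComparingInt(Path::totalWeight));
        return new ArrayList<>(found.subList(0, Math.min(limit, found.size())));
    }

    /** Depth-first search that never reuses a relationship within one path. */
    private void search(String current, String to, int remaining, List<Edge> soFar, List<Path> found) {
        if (!soFar.isEmpty() && current.equals(to)) {
            found.add(new Path(new ArrayList<>(soFar)));
        }
        if (remaining == 0) {
            return;
        }
        for (Edge e : outgoing.getOrDefault(current, Collections.<Edge>emptyList())) {
            if (soFar.contains(e)) {
                continue; // relationship already on this path
            }
            soFar.add(e);
            search(e.to, to, remaining - 1, soFar, found);
            soFar.remove(soFar.size() - 1);
        }
    }
}
```

A call such as `bestPaths("1", "2", 5, 3)` mirrors the `*1..5` pattern, the `ORDER BY` clause and `LIMIT 3` from the query. The exhaustive enumeration is what makes the query expensive on a densely connected graph, whichever API runs it.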
Java API vs. Cypher
Cypher is great!
Cypher is improving
But don't be afraid of writing some Java
Thanks!
www.graphaware.com
@graph_aware