Using Graph Databases For Insights Into Connected Data Gagan Agrawal Xebia

Page 1: Graph db

Using Graph Databases For Insights Into Connected Data

Gagan Agrawal

Xebia

Page 2: Graph db

2

Agenda

High level view of Graph Space
Comparison with RDBMS and other NoSQL stores
Data Modeling
Cypher : Graph Query Language
Graphs in Real World
Graph Database Internals

Page 3: Graph db

3

What is a Graph?

Page 4: Graph db

4

Graph

Page 5: Graph db

5

What is a Graph?

A collection of vertices and edges.
Set of nodes and the relationships that connect them.
Graph represents -
    Entities as NODES
    The way those entities relate to the world as RELATIONSHIPS
Allows modeling all kinds of scenarios
    System of roads
    Medical history
    Supply chain management
    Data center
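
The nodes-and-relationships idea can be sketched in a few lines of Python. This is only an illustration, not how any graph database stores data; the `Node` class, the relationship types and the data center names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, e.g. a server or an application."""
    name: str
    # Outgoing relationships, stored as (relationship type, target node) pairs.
    relationships: list = field(default_factory=list)

    def relate(self, rel_type, other):
        """Add a directed, typed relationship from this node to `other`."""
        self.relationships.append((rel_type, other))

    def neighbours(self, rel_type):
        """Names of the nodes reached through relationships of one type."""
        return [target.name for kind, target in self.relationships
                if kind == rel_type]


# Hypothetical data center: one server, hosted in a rack, running two apps.
rack = Node("rack-1")
server = Node("server-7")
billing = Node("billing-app")
search = Node("search-app")

server.relate("HOSTED_IN", rack)
server.relate("RUNS", billing)
server.relate("RUNS", search)

print(server.neighbours("RUNS"))       # ['billing-app', 'search-app']
print(server.neighbours("HOSTED_IN"))  # ['rack-1']
```

Entities become nodes, and each connection carries a type and a direction, which is all the definition above asks for.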

Page 6: Graph db

6

Example – Twitter's Data

Page 7: Graph db

7

Example – Twitter's Data

Page 8: Graph db

8

High Level view of Graph Space

Graph Databases - Technologies used primarily for transactional online graph persistence – OLTP.

Graph Compute Engines - Technologies used primarily for offline graph analytics – OLAP.

Page 9: Graph db

9

Graph Databases

Online database management system with Create, Read, Update, Delete (CRUD) methods that expose a graph data model.
Built for use with transactional (OLTP) systems.
Used for richly connected data.
Querying is performed through traversals.
Can perform millions of traversal steps per second.
A traversal step resembles a join in an RDBMS.

Page 10: Graph db

10

Graph Database Properties

The Underlying Storage : Native / Non-Native

The Processing Engine : Native / Non-Native

Page 11: Graph db

11

Graph DB – The Underlying Storage

Native Graph Storage – Optimized and designed for storing and managing graphs.

Non-Native Graph Storage – Serializes the graph data into a relational database, an object-oriented database, or some other general-purpose data store.

Page 12: Graph db

12

Native Graph Storage

Page 13: Graph db

13

Graph DB – The Processing Engine

Index-free adjacency – Connected nodes physically point to each other in the database
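
A rough Python sketch of the difference, under stated assumptions: the "non-native" side is modeled as a sorted edge table searched with `bisect` (standing in for a tree index), and the "native" side as objects holding direct references. The names `edge_table`, `Node` and `two_hops_*` are made up for illustration and do not reflect any product's storage format.

```python
import bisect

edges = [("alice", "bob"), ("bob", "carol"), ("alice", "dave")]

# Non-native style: one global, sorted edge table. Every hop repeats an
# index search whose cost grows with the total number of edges.
edge_table = sorted(edges)


def neighbours_via_index(node_id):
    position = bisect.bisect_left(edge_table, (node_id, ""))
    found = []
    while position < len(edge_table) and edge_table[position][0] == node_id:
        found.append(edge_table[position][1])
        position += 1
    return found


def two_hops_via_index(start):
    return sorted({far for near in neighbours_via_index(start)
                   for far in neighbours_via_index(near)})


# Native style (index-free adjacency): each node holds direct references to
# its neighbours, so a hop is a pointer dereference with no global lookup.
class Node:
    def __init__(self, name):
        self.name = name
        self.out = []


nodes = {name: Node(name) for name in ("alice", "bob", "carol", "dave")}
for source, target in edges:
    nodes[source].out.append(nodes[target])


def two_hops_via_pointers(start):
    return sorted({far.name for near in start.out for far in near.out})


print(two_hops_via_index("alice"))             # ['carol']
print(two_hops_via_pointers(nodes["alice"]))   # ['carol']
```

Both functions return the same answer; what differs is the work done per hop.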

Page 14: Graph db

14

Non-Native : Index Look-Up

Page 15: Graph db

15

Native : Index Free Adjacency

Page 16: Graph db

16

Graph Databases

Page 17: Graph db

17

Power of Graph Databases

Performance

Flexibility

Agility

Page 18: Graph db

18

Comparison

Relational Databases

NoSQL Databases

Graph Databases

Page 19: Graph db

19

Relational Databases Lack Relationships

Initially designed to codify paper forms and tabular structures.
Deal poorly with relationships.
The rise in connectedness translates into increased joins.
Lower performance.
Difficult to cater for changing business needs.

Page 20: Graph db

20

RDBMS

Page 21: Graph db

21

RDBMS

What products did a customer buy?

Which customers bought this product?

Which customers bought this product who also bought that product?
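
To show how the third question stacks up joins, here is a small sketch using Python's built-in `sqlite3`. The `customer`, `product` and `purchase` tables and their rows are hypothetical; the slide's actual schema is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (customer_id INTEGER, product_id INTEGER);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Raj'), (3, 'Lee');
INSERT INTO product  VALUES (10, 'Laptop'), (20, 'Mouse');
INSERT INTO purchase VALUES (1, 10), (1, 20), (2, 10), (3, 20);
""")


def bought_both(first, second):
    """Customers who bought `first` and also bought `second`."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.name
        FROM customer c
        JOIN purchase p1  ON p1.customer_id = c.id
        JOIN product  pr1 ON pr1.id = p1.product_id
        JOIN purchase p2  ON p2.customer_id = c.id
        JOIN product  pr2 ON pr2.id = p2.product_id
        WHERE pr1.name = ? AND pr2.name = ?
        ORDER BY c.name
        """,
        (first, second),
    ).fetchall()
    return [name for (name,) in rows]


print(bought_both("Laptop", "Mouse"))  # ['Ann']
```

Each extra "who also bought" condition adds another pair of joins through the purchase table.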

Page 22: Graph db

22

RDBMS

Page 23: Graph db

23

Query to find friends-of-friends

Page 24: Graph db

24

NoSQL Databases also lack Relationships

NoSQL databases, e.g. key-value, document or column-oriented, store sets of disconnected values/documents/columns.

Makes it difficult to use them for connected data and graphs.

One solution is to embed an aggregate's identifier inside a field belonging to another aggregate.
    Effectively introduces foreign keys
    Requires joining aggregates at the application level.

Page 25: Graph db

25

NoSQL DB

Page 26: Graph db

26

NoSQL DB

Relationships between aggregates aren't first class citizens in the data model.
Foreign aggregate "links" are not reflexive.
Asking the database "Who has bought a particular product" is an expensive operation.
Need to use some external compute infrastructure, e.g. Hadoop, for such processing.
Do not maintain consistency of connected data.
Do not support index-free adjacency.
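
The one-directional nature of embedded identifiers can be sketched with plain dictionaries standing in for a document store. The `orders` and `products` collections and their fields are hypothetical.

```python
# Order aggregates embed product identifiers; product aggregates hold
# no link back to the orders that mention them.
orders = {
    "order-1": {"customer": "ann", "product_ids": ["p1", "p2"]},
    "order-2": {"customer": "raj", "product_ids": ["p1"]},
    "order-3": {"customer": "lee", "product_ids": ["p3"]},
}
products = {
    "p1": {"name": "Laptop"},
    "p2": {"name": "Mouse"},
    "p3": {"name": "Desk"},
}


def products_in_order(order_id):
    """Forward direction: follow each embedded id with one key lookup.
    This is the application-level join."""
    return [products[pid]["name"] for pid in orders[order_id]["product_ids"]]


def customers_who_bought(product_id):
    """Reverse direction: nothing points from product to order, so every
    order aggregate has to be scanned."""
    return sorted({order["customer"] for order in orders.values()
                   if product_id in order["product_ids"]})


print(products_in_order("order-1"))   # ['Laptop', 'Mouse']
print(customers_who_bought("p1"))     # ['ann', 'raj']
```

The forward lookup touches only the aggregates involved, while the reverse question costs a pass over the whole collection unless a separate index or batch job is maintained.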

Page 27: Graph db

27

NoSQL DB

Page 28: Graph db

28

Graph DB Embraces Relationships

Page 29: Graph db

29

Graph DB

Find friends-of-friends in a social network, to a maximum depth of 5.
Total records : 1,000,000
Each with approximately 50 friends
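
A depth-limited traversal of this kind can be sketched as a breadth-first search. This is a toy in-memory version with a hypothetical five-person network, not the benchmark itself; `friends_within` is an illustrative name.

```python
from collections import deque


def friends_within(graph, start, max_depth):
    """Return {person: depth} for everyone reachable from `start`
    in 1..max_depth hops, using breadth-first search."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if depth_of[person] == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in graph.get(person, ()):
            if friend not in depth_of:
                depth_of[friend] = depth_of[person] + 1
                queue.append(friend)
    del depth_of[start]  # the start person is not their own friend
    return depth_of


# Hypothetical friendship network (adjacency lists).
network = {
    "ann": ["raj", "lee"],
    "raj": ["ann", "mia"],
    "lee": ["ann"],
    "mia": ["raj", "tom"],
    "tom": ["mia"],
}

print(friends_within(network, "ann", 2))  # {'raj': 1, 'lee': 1, 'mia': 2}
```

The work done depends on how many people are actually reached, not on how many records the network holds in total.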

Page 30: Graph db

30

Graph DB

Page 31: Graph db

31

NoSQL Comparison

Page 32: Graph db

32

Data Modeling with Graph

Page 33: Graph db

33

Data Modeling

“Whiteboard” friendly

The typical whiteboard view of a problem is a GRAPH.

What we sketch in our creative and analytical modes maps closely to the data model inside the database.

Page 34: Graph db

34

The Property Graph Model

Page 35: Graph db

35

Cypher : Graph Query Language

Pattern-Matching Query Language
Humane language
Expressive
Declarative : Say what you want, not how
Borrows from well-known query languages
Aggregation, Ordering, Limit
Update the Graph

Page 36: Graph db

36

Cypher

Cypher Representation :

(c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)

(c)-[:KNOWS]->(b)-[:KNOWS]->(a)<-[:KNOWS]-(c)

Page 37: Graph db

37

Cypher

START c=node:user(name='Michael')
MATCH (c)-[:KNOWS]->(b)-[:KNOWS]->(a),
      (c)-[:KNOWS]->(a)
RETURN a, b
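
To make the pattern concrete, the same match can be spelled out by hand in Python. This is an imperative sketch of what the declarative query asks for, not how Cypher evaluates it; the `knows` data and the function name are hypothetical.

```python
# Directed KNOWS relationships: person -> set of people they know.
knows = {
    "Michael": {"Bob", "Anna"},
    "Bob": {"Anna", "Zoe"},
    "Anna": set(),
    "Zoe": set(),
}


def friends_reached_two_ways(c):
    """Return (a, b) pairs where c KNOWS b, b KNOWS a,
    and c also KNOWS a directly."""
    matches = []
    for b in sorted(knows.get(c, ())):
        for a in sorted(knows.get(b, ())):
            if a in knows[c]:
                matches.append((a, b))
    return matches


print(friends_reached_two_ways("Michael"))  # [('Anna', 'Bob')]
```

The query states only the shape of the subgraph; the nested loops are the "how" that the declarative form leaves to the database.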

Page 38: Graph db

38

Other Cypher Clauses

WHERE
    Provides criteria for filtering pattern matching results.
CREATE and CREATE UNIQUE
    Create nodes and relationships
DELETE
    Removes nodes, relationships and properties
SET
    Sets property values

Page 39: Graph db

39

Other Cypher Clauses

FOREACH
    Performs an updating action for each graph element in a list.
UNION
    Merges results from two or more queries.
WITH
    Chains subsequent query parts and forwards results from one to the next. Similar to piping commands in UNIX.

Page 40: Graph db

40

Comparison of Relational and Graph Modeling

Page 41: Graph db

41

Systems Management Domain

Page 42: Graph db

42

Entity Relationship Diagram

Page 43: Graph db

43

Tables and Relationships

Page 44: Graph db

44

Graph Representation

Page 45: Graph db

45

Query to find faulty Equipment

Page 46: Graph db

46

Matched Paths

Page 47: Graph db

47

Fine Grained vs Generic Relationships

DELIVERY_ADDRESS

VS

ADDRESS{type : 'delivery'}
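
The two modeling styles can be compared with a small Python sketch. Only DELIVERY_ADDRESS and ADDRESS{type : 'delivery'} come from the slide; the billing entries, the tuple layout and the helper names are assumptions added for illustration.

```python
# Each relationship: (source, relationship type, properties, target).
fine_grained = [
    ("user-1", "DELIVERY_ADDRESS", {}, "addr-A"),
    ("user-1", "BILLING_ADDRESS", {}, "addr-B"),
]
generic = [
    ("user-1", "ADDRESS", {"type": "delivery"}, "addr-A"),
    ("user-1", "ADDRESS", {"type": "billing"}, "addr-B"),
]


def follow(rels, source, rel_type):
    """Select relationships by type alone."""
    return [target for src, kind, _props, target in rels
            if src == source and kind == rel_type]


def follow_where(rels, source, rel_type, key, value):
    """Select by type, then inspect a property on each relationship."""
    return [target for src, kind, props, target in rels
            if src == source and kind == rel_type and props.get(key) == value]


# Fine-grained: the type itself identifies the delivery address.
print(follow(fine_grained, "user-1", "DELIVERY_ADDRESS"))          # ['addr-A']
# Generic: one type covers all addresses; the property narrows it down.
print(follow(generic, "user-1", "ADDRESS"))                        # ['addr-A', 'addr-B']
print(follow_where(generic, "user-1", "ADDRESS", "type", "delivery"))  # ['addr-A']
```

Fine-grained types avoid a property check on every relationship, while the generic type makes "all addresses" a single step.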

Page 48: Graph db

48

Page 49: Graph db

49

Page 50: Graph db

50

Graphs in the Real World

Page 51: Graph db

51

Common Use Cases

Social
Recommendations
Geo
Logistics Networks : for package routing, finding shortest path
Financial Transaction Graphs : for fraud detection
Master Data Management
Bioinformatics : Era7, to relate a complex web of information that includes genes, proteins and enzymes
Authorization and Access Control : Adobe Creative Cloud, Telenor

Page 52: Graph db

52

Graph Database Internals

Page 53: Graph db

53

Non Functional Characteristics

Transactions
    Fully ACID
Recoverability
Availability
Scalability

Page 54: Graph db

54

Scalability

Capacity (Graph Size)

Latency (Response Time)

Read and Write Throughput

Page 55: Graph db

55

Capacity

The 1.9 release of Neo4j can support single graphs having 10s of billions of nodes, relationships and properties.

The Neo4j team has publicly expressed the intention to support 100B+ nodes/relationships/properties in a single graph as part of its 2013 roadmap.

Page 56: Graph db

56

Latency

RDBMS – more data in tables/indexes results in longer join operations.
Graph DB doesn't suffer the same latency problem.
    Index is used to find the starting node.
    Traversal uses a combination of pointer chasing and pattern matching to search the data.
Performance does not depend on the total size of the dataset.
Depends only on the data being queried.

Page 57: Graph db

57

Throughput

Constant performance irrespective of graph size.

Page 58: Graph db

58

Who uses Neo4j ?

Page 59: Graph db

59

Resources

Page 60: Graph db

60

Thank You