Relational to (Big) Graph: Harnessing the Power of the Graph
Michael Hunger, JAX Mainz 2015


Agenda
• History of Neo4j
• Relational Pains – Graph Pleasure
• Relational to Graph
• Model -> Import -> Query -> Build -> Integrate
• Demo
• Q&A

History of Neo4j

A Story of Relational Pain

History of Neo4j - Problem
• Digital Asset Management System in 2000
• SaaS: many users in many countries
• Two hard use-cases
  • Multi-language keyword search, including synonyms / word hierarchies
  • Access Management to Assets for SaaS Scale

History of Neo4j – Relational Attempt
• Tried with many relational DBs
• JOIN Performance Problems
  • Hierarchies, Networks, Graphs
• Modeling Problems
  • Data Model evolution
• No Success, even …
  • With expensive database consultants!

History of Neo4j – First Working Implementation
• Graph Model & API sketched on a napkin
  • Nodes connected by Relationships
  • Just like your conceptual model
• Implemented network-database in memory
  • Java API, fast Traversals
• Worked well, but …
  • No persistence, No Transactions
  • Long import / export time from relational storage

History of Neo4j - Solution
• Evolved to full-fledged database in Java
  • With persistence using files + memory mapping
  • Transactions with Transaction Log (WAL)
  • Lucene for fast Node search
• Founded Company in 2007
  • Neo4j (REST)-Server
  • Neo4j Clustering & HA
  • Cypher Query Language
• Today …

Neo Technology Overview
Product
• Neo4j - World's leading graph database
• 1M+ downloads, adding 50k+ per month
• 150+ enterprise subscription customers including over 50 of the Global 2000
Company
• Neo Technology, Creator of Neo4j
• 80 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding from Fidelity, Sunstone, Conor, Creandum, Dawn Capital

Neo4j Adoption by Selected Verticals
• Financial Services
• Communications
• Health & Life Sciences
• HR & Recruiting
• Media & Publishing
• Social Web
• Industry & Logistics
• Entertainment
• Consumer Retail
• Information Services
• Business Services

How Customers Use Neo4j
• Network & Data Center
• Master Data Management
• Social
• Recommendations
• Identity & Access
• Search & Discovery
• GEO

“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”

Neo4j Leads the Graph Database Revolution

“Neo4j is the current market leader in graph databases.”

“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”

Sources:
• IT Market Clock for Database Management Systems, 2014: https://www.gartner.com/doc/2852717/it-market-clock-database-management
• TechRadar™: Enterprise DBMS, Q1 2014: http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
• Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates): http://blogs.enterprisemanagement.com/dennisdrogseth/2013/11/06/graph-databasesand-potential-transform-capture-interdependencies/

Largest Ecosystem of Graph Enthusiasts
• 1,000,000+ downloads
• 20,000+ educated developers
• 18,000+ Meetup members
• 100+ technology and service partners
• 150+ enterprise subscription customers, including 50+ Global 2000 companies

High Business Value in Data Relationships
Data is increasing in volume…
• New digital processes
• More online transactions
• New social networks
• More devices
… and is getting more connected
• Customers, products, processes, devices interact and relate to each other
Using Data Relationships unlocks value
• Real-time recommendations
• Fraud detection
• Master data management
• Network and IT operations
• Identity and access management
• Graph-based search
Early adopters became industry leaders

Relational Pains – Graph Pleasure

Relational DBs Can't Handle Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with number and levels of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
… making traditional databases inappropriate when data relationships are valuable in real-time
Slow development, poor performance, low scalability, hard to maintain

Why Can't Relational DBs Handle Relationships Well?
• The data model was built for tabular forms, not JOINs: managing connections was bolted on in both schema and query
• A strict schema is not suitable for the variably structured data generated and used by today's applications
• Data volume and number of JOINs affect the cost of a query operation exponentially
• Variable hierarchies and networks are hard to store and query, so many “patterns” were developed
… often only denormalization makes complex relational queries fast, but it destroys the good normalized data model
Built for forms, joins are expensive, denormalize #FTW

Unlocking Value from Your Data Relationships
• Model your data naturally as a graph of data and relationships
• Drive graph model from domain and use-cases
• Use relationship information in real-time to transform your business
• Add new relationships on the fly to adapt to your changing requirements

High Query Performance with a Native Graph DB
• Relationships are first-class citizens
• No need for joins, just follow pre-materialized relationships of nodes
• Query & data locality – navigate out from your starting points
• Only load what's needed
• Aggregate and project results as you go
• Optimized disk and memory model for graphs

High Query Performance: Some Numbers
• Traverse 4M+ relationships per second and core
• Cost-based query optimizer – complex queries return in milliseconds
• Import 100K-1M records per second transactionally
• Bulk import tens of billions of records in a few hours


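Modeling as a Graph

The Whiteboard Model Is the Physical Model
[Figure: a person (name: "Dan", born: May 29, 1970, twitter: "@dan"), a person (name: "Ann", born: Dec 5, 1975) and a CAR (brand: "Volvo", model: "V70"), connected by relationships, one of which carries the property since: Jan 10, 2011]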

Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure: two PERSON nodes connected by two LOVES relationships and one LIVES WITH relationship]
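The components listed above can be sketched in a few lines of Python. This is only an illustration of the property graph model, not Neo4j's storage format or API. The class names and the helper `outgoing` are made up for this sketch, the relationship directions are assumed, and attaching `since` to LIVES_WITH is an assumption, because the slide residue does not show which relationship carries it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An object in the graph: a set of labels plus name-value properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """Relates two nodes by type and direction; may carry properties too."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


dan = Node({"Person"}, {"name": "Dan", "born": "1970-05-29", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "1975-12-05"})

# Directions and the placement of `since` are assumptions for illustration.
relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann, {"since": "2011-01-10"}),
]


def outgoing(node, rel_type):
    """Follow relationships of one type that start at the given node."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])
```

Because every relationship holds direct references to its start and end node, following a connection is a lookup rather than a join, which is the point the slides make about the model.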

Relational Versus Graph Models
[Figure: relational model (Person, Person-Friend and Friend tables) versus graph model (KNOWS relationships) for the people Andreas, Tobias, Mica and Delia]

Let's Model!

Customer, Supplier, and Product (Master Data)
Orders (Activity)

The Domain Model
[Figure: node labels Order, Product, Customer, Employee, Category, Supplier; relationship types SOLD, ORDERS, REPORTS_TO, PART_OF, PURCHASED, SUPPLIES]

Except…

The Requisite Northwind Example!

NOT JUST ANY

(Northwind)-[:TO]->(Graph)
Building the Graph Model

Building Relationships in Graphs
[Figure: an Employee node connected to Order nodes by SOLD relationships]

Locate Foreign Keys

(FKs)-[:BECOME]->(Relationships)
Correct Directions

Drop Foreign Keys

Find the Join Tables

Simple Join Tables Become Relationships

Attributed Join Tables Become Relationships with Properties

Working Subset (Today's Exercise)

Northwind Graph Model
[Figure: node labels Order, Product, Customer, Employee, Category, Supplier; relationship types SOLD, ORDERS, REPORTS_TO, PART_OF, PURCHASED, SUPPLIES]

Recap - Rules

Model your graph first and import into that model.

Alternatively …

Normalized ER-Models: Transformation Rules
• Tables become nodes
  • Table name as node label
• Columns turn into properties
  • Convert values if needed
• Foreign keys (1:1, 1:n, n:1) turn into relationships, the column name into the relationship type (or better, a verb)
• JOIN tables represent relationships
  • Also other tables without domain identity (w/o PK) and two FKs
  • Columns turn into relationship properties
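The transformation rules above can be sketched on a handful of in-memory rows. This is a minimal illustration under stated assumptions, not the import tool's implementation: the table and column names are hypothetical (loosely modeled on Northwind), every table is assumed to have an `id` column, and the function names are invented for this sketch. The `reverse` flag stands in for the "correct directions" step, since a foreign key points from the order to the employee while the model reads (Employee)-[:SOLD]->(Order).

```python
# Hypothetical relational rows; names are illustrative only.
employees = [
    {"id": 1, "firstName": "Andrew", "reports_to": None},
    {"id": 2, "firstName": "Steven", "reports_to": 1},
]
products = [{"id": 10, "productName": "Chai"}]
orders = [{"id": 100, "employee_id": 2}]
# Join table without its own identity: two FKs plus an attribute column.
order_details = [{"order_id": 100, "product_id": 10, "quantity": 5}]


def rows_to_nodes(label, rows, fk_columns=()):
    """Tables become nodes: table name -> label, non-FK columns -> properties."""
    return {
        (label, row["id"]): {k: v for k, v in row.items() if k not in fk_columns}
        for row in rows
    }


def fk_to_relationships(label, rows, fk_column, target_label, rel_type, reverse=False):
    """Foreign keys become relationships; NULL foreign keys produce none."""
    rels = []
    for row in rows:
        if row[fk_column] is None:
            continue
        source, target = (label, row["id"]), (target_label, row[fk_column])
        if reverse:  # flip when the FK direction is not the natural reading
            source, target = target, source
        rels.append((source, rel_type, target, {}))
    return rels


def join_table_to_relationships(rows, start, end, rel_type):
    """Join-table rows become relationships; other columns become properties."""
    (start_label, start_fk), (end_label, end_fk) = start, end
    return [
        (
            (start_label, row[start_fk]),
            rel_type,
            (end_label, row[end_fk]),
            {k: v for k, v in row.items() if k not in (start_fk, end_fk)},
        )
        for row in rows
    ]


nodes = {}
nodes.update(rows_to_nodes("Employee", employees, fk_columns=("reports_to",)))
nodes.update(rows_to_nodes("Product", products))
nodes.update(rows_to_nodes("Order", orders, fk_columns=("employee_id",)))

relationships = (
    fk_to_relationships("Employee", employees, "reports_to", "Employee", "REPORTS_TO")
    + fk_to_relationships("Order", orders, "employee_id", "Employee", "SOLD", reverse=True)
    + join_table_to_relationships(
        order_details, ("Order", "order_id"), ("Product", "product_id"), "ORDERS"
    )
)

for rel in relationships:
    print(rel)
```

The sketch keeps the `id` values as properties so that relationships can be resolved; dropping such technical keys afterwards is a separate cleanup step.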

Normalized ER-Models: Cleanup Rules
• Remove technical IDs (auto-incrementing PKs)
• Keep domain IDs (e.g. ISBN)
  • Add constraints for those
• Add indexes for lookup fields
• Adjust names for Label, REL_TYPE and propertyName
Note: currently no composite constraints and indexes

Importing Your Data

Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader: neo4j-import
• For initial database population
• For loads up to 10B+ records
• Up to 1M records per second
4.58 million things and their relationships… load in 100 seconds!

Getting Data into Neo4j
Custom Cypher-Based Loader
• Uses transactional Cypher HTTP endpoint
• Parameterized, batched, concurrent Cypher statements
• Any programming/script language with driver or plain HTTP
JVM Transactional Loader
• Use Neo4j's Java API
• From any JVM language
• Up to 1M records per second
[Figure: any data, loaded by custom programs]
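As a rough sketch of the "parameterized, batched" idea behind a custom Cypher-based loader, the snippet below splits rows into batches and wraps each batch in a JSON payload with a `statements` list, the shape used by the transactional Cypher HTTP endpoint. It builds the payloads only and sends nothing. The statement text, the `employeeID` property, the batch size and the function names are assumptions for illustration. The `$rows` parameter syntax is the newer form; releases contemporary with these slides wrote parameters as `{rows}`.

```python
import json

# One parameterized statement reused for every batch (illustrative only).
STATEMENT = (
    "UNWIND $rows AS row "
    "MERGE (e:Employee {employeeID: row.employeeID}) "
    "SET e.firstName = row.firstName"
)


def batches(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_payloads(rows, batch_size):
    """One JSON request body per batch; the rows travel as a parameter."""
    return [
        json.dumps(
            {"statements": [{"statement": STATEMENT, "parameters": {"rows": batch}}]}
        )
        for batch in batches(rows, batch_size)
    ]


rows = [{"employeeID": i, "firstName": f"Employee {i}"} for i in range(1, 6)]
payloads = build_payloads(rows, batch_size=2)
print(len(payloads))
```

Keeping the statement text constant and passing the data as parameters lets the server reuse the query plan, and independent payloads can be posted from several workers, which is where the "concurrent" part comes in.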

Data Import Demo

Import Demo
Cypher-Based “LOAD CSV” Capability
• Use to import Northwind CSV dumps
Command-Line Bulk Loader: neo4j-import
• Chicago Crimes Dataset
Relational Import Tool: neo4j-rdbms-import
• Proof of Concept

RDBMS Import Tool Demo – Proof of Concept
• JDBC for vendor-independent database connection
• SchemaCrawler to extract DB metadata
• Use rules to drive graph model import
• Optional means to override default behavior
• Scales writes with Parallel Batch Importer API
• Reads tables concurrently for nodes & relationships
Demo: MySQL - Employee Demo Database
Source: github.com/jexp/neo4j-rdbms-import
[Figure: source databases Postgres, MySQL, Oracle]

Querying Your Data

Basic Query: Who do people report to?
MATCH (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})
[Figure: the pattern annotated; Steven and Andrew are NODEs, Employee is the LABEL, firstName is the PROPERTY, REPORTS_TO is the relationship]

Basic Query Comparison: Who do people report to?

SELECT *
FROM Employee AS e
JOIN Employee_Report AS er ON (e.id = er.manager_id)
JOIN Employee AS sub ON (er.sub_id = sub.id)

MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN *


Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, each up to 3 levels down

Cypher Query
MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.firstName = 'Andrew'
RETURN sub.firstName AS Subordinate,
       count(report) AS Total;

SQL Query
[Figure: SQL query shown for comparison]
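To make the variable-length patterns in the Cypher query above concrete, here is a small Python sketch of the same traversal over an in-memory hierarchy. It is an illustration, not how Neo4j executes the query. The sample data is hypothetical and assumed to form a tree, so counting paths and counting people give the same number. As in the query, `*0..3` includes the boss himself, and people without any reports are dropped because the second pattern finds no match for them.

```python
from collections import defaultdict

# Hypothetical REPORTS_TO edges (subordinate -> boss), assumed to form a tree.
reports_to = {
    "Nancy": "Andrew",
    "Janet": "Andrew",
    "Steven": "Andrew",
    "Michael": "Steven",
    "Robert": "Steven",
    "Anne": "Steven",
}

# Reverse index: boss -> direct reports.
direct_reports = defaultdict(list)
for sub, boss in reports_to.items():
    direct_reports[boss].append(sub)


def descendants(person, min_depth, max_depth):
    """People min_depth..max_depth levels below `person` (depth 0 is the person)."""
    found, frontier = [], [person]
    if min_depth == 0:
        found.append(person)
    for depth in range(1, max_depth + 1):
        frontier = [r for p in frontier for r in direct_reports.get(p, [])]
        if depth >= min_depth:
            found.extend(frontier)
    return found


def team_sizes(boss):
    """For everyone 0..3 levels below `boss`, count the people 1..3 levels below them."""
    totals = {}
    for sub in descendants(boss, 0, 3):
        count = len(descendants(sub, 1, 3))
        if count:  # a pattern without a match yields no row
            totals[sub] = count
    return totals


result = team_sizes("Andrew")
print(result)
```

The sketch expands the hierarchy one level at a time starting from the boss, which mirrors the "navigate out from your starting points" idea rather than joining a table against itself once per level.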

“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”
Volker Pacher, Senior Developer

Who is in Robert's (direct, upwards) reporting chain?

MATCH path=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN path;

Who's the Big Boss?

MATCH (e:Employee)
WHERE NOT (e)-[:REPORTS_TO]->()
RETURN e.firstName AS bigBoss;

Product Cross-Sell

MATCH (choc:Product {productName: 'Chocolade'})
      <-[:ORDERS]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->(o2)-[:ORDERS]->(other:Product)
RETURN employee.firstName, other.productName, count(distinct o2) AS count
ORDER BY count DESC
LIMIT 5;
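The cross-sell query above can be traced by hand with a small Python sketch. The data is hypothetical and the function name is made up; the sketch only mirrors the shape of the query: start at the product, walk to the employees who sold an order containing it, then to their other orders and the products in those. Skipping `o2 == o1` reflects the fact that Cypher does not reuse the same relationship twice within one MATCH.

```python
from collections import defaultdict

# Hypothetical data: orders sold per employee, and products contained in each order.
sold = {"Nancy": ["o1", "o2", "o3"], "Janet": ["o4", "o5"]}
orders = {
    "o1": ["Chocolade", "Chai"],
    "o2": ["Chai", "Tofu"],
    "o3": ["Chai"],
    "o4": ["Chocolade"],
    "o5": ["Tofu"],
}


def cross_sell(product, limit=5):
    """Rank (employee, other product) pairs by the number of distinct orders o2."""
    seen = defaultdict(set)  # (employee, other product) -> distinct orders o2
    for employee, employee_orders in sold.items():
        for o1 in employee_orders:
            if product not in orders[o1]:
                continue
            for o2 in employee_orders:
                if o2 == o1:
                    continue  # same SOLD relationship cannot be matched twice
                for other in orders[o2]:
                    seen[(employee, other)].add(o2)
    ranked = sorted(seen.items(), key=lambda item: len(item[1]), reverse=True)
    return [(emp, other, len(o2s)) for (emp, other), o2s in ranked[:limit]]


result = cross_sell("Chocolade")
print(result)
```

Only the orders of employees who actually sold the starting product are ever touched, so the work depends on the neighbourhood of that product, not on the total number of orders.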

Neo4j Query Planner
Cost-based Query Planner since Neo4j 2.2
• Uses database stats to select best plan
• Currently for Read Operations
• Query Plan Visualizer, finds
  • Non-optimal queries
  • Cartesian Product
  • Missing Indexes, Global Scans
  • Typos
  • Massive Fan-Out

Query Planner
Slight change, add an :Employee label -> more stats available -> new plan with fewer database hits

Architecture & Integration
“Polyglot Persistence”

Neo4j Clustering Architecture: Optimized for Speed & Availability at Scale
Performance Benefits
• No network hops within queries
• Real-time operations with fast and consistent response times
• Cache sharding spreads cache across cluster for very large graphs
Clustering Features
• Master-slave replication with master re-election and failover
• Each instance has its own local cache
• Horizontal scaling & disaster recovery
[Figure: a load balancer in front of three Neo4j instances]

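Three Ways to Migrate Data to Neo4j
• Migrate all data: the application keeps all data in the graph database
• Migrate graph data: non-graph data stays in the relational database, graph data moves to the graph database
• Duplicate graph data: all data stays in the relational database, graph data is duplicated in the graph database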

Neo4j Fits into Your Enterprise Environment
[Figure: end user and application on top of a graph database cluster of three Neo4j instances (data storage and business rules execution); data scientist doing ad hoc analysis; bulk analytic infrastructure (graph compute engine, EDW, …) for data mining and aggregation; other databases: relational, NoSQL, Hadoop]

User Voice

Users Love Neo4j

Learn the Way of the Graph Quickly and Easily

Quick Start: Plan Your Project
[Figure: timeline over weeks 1 to 8 with the phases Learn Neo4j, Decide on Architecture, Import and Model Data, Build Application, Test Application]
Deploy your app in as little as 8 weeks
PROFESSIONAL SERVICES PLAN

There Are Lots of Ways to Easily Learn Neo4j

GraphConnect Europe, London • May 6-7, 2015
DATE
• Wednesday, May 6 – Full Day Trainings (includes new Advanced Deployment class)
• Thursday, May 7 – Main Conference
LOCATION
Etc Venues in London, UK
• Training: 4 Norton Folgate
• Conference: at 155 Bishopsgate, Liverpool Street
ACTIVITIES
• Customers and community members such as adidas, Pitney Bowes, Orange, e-Spirit, KNMI and others, showcasing their Neo4j solutions
• Neo4j product training
• Free personal advice in Neo4j GraphClinics
• Opportunity to network with graph users from across the world
• Enjoy yourself!
TICKETS! JAX Discount Code: 50% off, JAX50GCE
www.graphconnect.com

