Neo4j: The Fastest and Most Scalable Native Graph Database
Agenda
• History of Neo4j
• Relational Pains – Graph Pleasure
• Relational to Graph
• Model -> Import -> Query -> Build -> Integrate
• Demo
• Q&A
History of Neo4j - Problem
• Digital asset management system in 2000
• SaaS with many users in many countries
• Two hard use cases:
  • Multi-language keyword search, including synonyms and word hierarchies
  • Access management to assets at SaaS scale
History of Neo4j - Relational Attempt
• Tried with many relational DBs
• JOIN performance problems
  • Hierarchies, networks, graphs
• Modeling problems
  • Data model evolution
• No success, even with expensive database consultants!
History of Neo4j - First Working Implementation
• Graph model & API sketched on a napkin
  • Nodes connected by relationships
  • Just like your conceptual model
• Implemented a network database in memory
  • Java API, fast traversals
  • Worked well, but …
  • No persistence, no transactions
  • Long import/export time from relational storage
History of Neo4j - Solution
• Evolved into a full-fledged database in Java
  • With persistence using files + memory mapping
  • Transactions with a transaction log (WAL)
  • Lucene for fast node search
• Founded the company in 2007
  • Neo4j (REST) Server
  • Neo4j Clustering & HA
  • Cypher query language
• Today …
Neo Technology Overview
Product
• Neo4j - World’s leading graph database
• 1M+ downloads, adding 50k+ per month
• 150+ enterprise subscription customers, including over 50 of the Global 2000
Company
• Neo Technology, creator of Neo4j
• 80 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding from Fidelity, Sunstone, Conor, Creandum, Dawn Capital
Neo4j Adoption by Selected Verticals
Financial Services, Communications, Health & Life Sciences, HR & Recruiting, Media & Publishing, Social Web, Industry & Logistics, Entertainment, Consumer Retail, Information Services, Business Services
How Customers Use Neo4j
Network & Data Center, Master Data Management, Social, Recommendations, Identity & Access, Search & Discovery, Geo
Neo4j Leads the Graph Database Revolution
“Neo4j is the current market leader in graph databases.”
“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”
“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”
Sources:
• IT Market Clock for Database Management Systems, 2014: https://www.gartner.com/doc/2852717/it-market-clock-database-management
• TechRadar™: Enterprise DBMS, Q1 2014: http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
• Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates): http://blogs.enterprisemanagement.com/dennisdrogseth/2013/11/06/graph-databasesand-potential-transform-capture-interdependencies/
Largest Ecosystem of Graph Enthusiasts
• 1,000,000+ downloads
• 20,000+ educated developers
• 18,000+ Meetup members
• 100+ technology and service partners
• 150+ enterprise subscription customers, including 50+ Global 2000 companies
High Business Value in Data Relationships
Data is increasing in volume…
• New digital processes
• More online transactions
• New social networks
• More devices
… and is getting more connected
• Customers, products, processes and devices interact and relate to each other
Using data relationships unlocks value
• Real-time recommendations
• Fraud detection
• Master data management
• Network and IT operations
• Identity and access management
• Graph-based search
Early adopters became industry leaders
Relational DBs Can’t Handle Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with the number and levels of relationships, and with database size
• Query complexity grows with the need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
… making traditional databases inappropriate when data relationships are valuable in real time
Slow development • Poor performance • Low scalability • Hard to maintain
Why Can’t Relational DBs Handle Relationships Well?
• The data model was built for tabular forms, not JOINs; managing connections was bolted on, in both schema and queries
• A strict schema is not suitable for the variably structured data that today’s applications generate and use
• Data volume and the number of JOINs increase the cost of a query exponentially
• Variable hierarchies and networks are hard to store and query, so many “patterns” were developed
… often only denormalization makes complex relational queries fast, but it destroys the clean normalized data model
Built for forms • Joins are expensive • Denormalize #FTW
Unlocking Value from Your Data Relationships
• Model your data naturally as a graph of data and relationships
• Derive the graph model from your domain and use cases
• Use relationship information in real time to transform your business
• Add new relationships on the fly to adapt to changing requirements
High Query Performance with a Native Graph DB
• Relationships are first-class citizens
  • No need for joins: just follow the pre-materialized relationships of nodes
• Query & data locality: navigate out from your starting points
  • Only load what’s needed
  • Aggregate and project results as you go
• Optimized disk and memory model for graphs
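The “navigate out from your starting points” and “aggregate as you go” ideas can be illustrated with a plain breadth-first expansion over an adjacency map. This is a minimal Python sketch under assumed data: the node names and the `ADJACENCY` map are hypothetical stand-ins for pre-materialized relationships, not Neo4j’s storage format or API.

```python
from collections import Counter, deque

# Hypothetical graph: node -> nodes it points to (stand-in for stored relationships).
ADJACENCY = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": ["f"],
    "f": [],
    "x": ["y"],  # not reachable from "a", so a traversal from "a" never touches it
    "y": [],
}


def expand(start, max_depth):
    """Breadth-first expansion from `start`, at most `max_depth` hops out.

    Returns the hop distance of every reached node and a per-depth count
    that is aggregated during the traversal rather than in a second pass.
    """
    depth = {start: 0}
    per_depth = Counter({0: 1})
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # never load anything beyond the requested depth
        for neighbour in ADJACENCY[node]:
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                per_depth[depth[neighbour]] += 1
                queue.append(neighbour)
    return depth, per_depth


depth, per_depth = expand("a", 2)
print(sorted(depth), dict(per_depth))  # ['a', 'b', 'c', 'd', 'e'] {0: 1, 1: 2, 2: 2}
```

The work done is bounded by what is reachable from the start node within the depth limit; nodes such as "x" and "y" cost nothing, however large the rest of the graph grows.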
High Query Performance: Some Numbers
• Traverse 4M+ relationships per second per core
• Cost-based query optimizer: complex queries return in milliseconds
• Import 100K-1M records per second transactionally
• Bulk import tens of billions of records in a few hours
Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure: two PERSON nodes, Dan (name: “Dan”, born: May 29, 1970, twitter: “@dan”) and Ann (name: “Ann”, born: Dec 5, 1975), joined by two LOVES relationships and a LIVES WITH relationship, with a relationship property since: Jan 10, 2011; a CAR node with brand: “Volvo”, model: “V70”.]
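The components listed above map naturally onto two small record types. This Python sketch is illustrative only: the class and function names are hypothetical, the LOVES directions and the placement of `since` on LIVES_WITH are assumptions about the figure, and the car’s relationship is omitted because it is not legible in the source.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set  # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # name-value pairs


@dataclass
class Relationship:
    rel_type: str  # every relationship has exactly one type
    start: Node    # direction runs from start to end
    end: Node
    properties: dict = field(default_factory=dict)  # name-value pairs


def neighbours(node, rel_type, relationships):
    """Nodes reached from `node` via outgoing relationships of `rel_type`."""
    return [r.end for r in relationships if r.start is node and r.rel_type == rel_type]


dan = Node({"Person"}, {"name": "Dan", "born": "1970-05-29", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "1975-12-05"})
# The car's relationship is not recoverable from the slide, so it is left unconnected.
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship("LOVES", dan, ann),
    Relationship("LOVES", ann, dan),
    # Assumption: `since` sits on LIVES_WITH, pointing from Dan to Ann.
    Relationship("LIVES_WITH", dan, ann, {"since": "2011-01-10"}),
]

print([n.properties["name"] for n in neighbours(dan, "LOVES", relationships)])  # ['Ann']
```

Because direction is part of each relationship, `neighbours(ann, "LIVES_WITH", relationships)` is empty even though Ann takes part in that relationship.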
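Relational Versus Graph Models
[Figure: in the relational model, the people Andreas, Tobias, Mica and Delia are rows linked through tables labeled Person, Friend and Person-Friend; in the graph model, the same people are nodes connected directly by KNOWS relationships.]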
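The contrast in the figure can be made concrete by storing the same friendships both ways. This Python sketch uses hypothetical friendship pairs (the figure’s actual edges are not recoverable). The relational side scans the join table without an index to keep it short; with an index it would become one index lookup per hop, whereas the graph side follows references the node already holds.

```python
# Relational shape: a Person table plus a Person_Friend join table of id pairs.
PERSON = {1: "Andreas", 2: "Tobias", 3: "Mica", 4: "Delia"}
PERSON_FRIEND = [(1, 2), (1, 3), (2, 4), (3, 4)]  # hypothetical friendships


def friends_relational(person_id):
    """Join through the join table by scanning it (no index, for brevity)."""
    rows_read, names = 0, []
    for a, b in PERSON_FRIEND:
        rows_read += 1
        if a == person_id:
            names.append(PERSON[b])  # second lookup back into the Person table
    return names, rows_read


# Graph shape: every node holds direct references to the nodes it KNOWS.
KNOWS = {name: [] for name in PERSON.values()}
for a, b in PERSON_FRIEND:
    KNOWS[PERSON[a]].append(PERSON[b])


def friends_graph(name):
    """Follow the node's own relationships; the work depends only on its degree."""
    return list(KNOWS[name]), len(KNOWS[name])


print(friends_relational(1))     # reads all 4 join rows
print(friends_graph("Andreas"))  # touches only Andreas's 2 relationships
```

Both calls return the same friends; what differs is that the relational cost grows with the size of the join table (or its index), while the graph cost grows only with the number of relationships on the starting node.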
The Domain Model
[Figure: Northwind domain model with the entities Customer, Order, Product, Category, Supplier and Employee, connected by PURCHASED, SOLD, ORDERS, PART_OF, SUPPLIES and REPORTS_TO.]
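Northwind Graph Model
[Figure: the graph model has the same shape as the domain model: (:Customer)-[:PURCHASED]->(:Order), (:Employee)-[:SOLD]->(:Order), (:Order)-[:ORDERS]->(:Product), (:Product)-[:PART_OF]->(:Category), (:Supplier)-[:SUPPLIES]->(:Product), (:Employee)-[:REPORTS_TO]->(:Employee).]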
Normalized ER-Models: Transformation Rules
• Tables become nodes
  • Table name as node label
• Columns turn into properties
  • Convert values if needed
• Foreign keys (1:1, 1:n, n:1) turn into relationships; the column name becomes the relationship type (or a better verb)
• JOIN tables represent relationships
  • So do other tables without domain identity (no PK) and with two FKs
  • Their columns turn into relationship properties
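The transformation rules above can be sketched as a single pass over table metadata. This is a minimal Python illustration, not the neo4j-rdbms-import tool: the schema dictionaries, the table contents and the helper names are hypothetical, and the relationship type for a join table is simply derived from the table name.

```python
# Hypothetical normalized schema: a tiny, invented Northwind-like subset.
TABLES = {
    "Category": {"pk": "id", "fks": {}, "rows": [{"id": 1, "name": "Beverages"}]},
    "Product": {"pk": "id", "fks": {"category_id": "Category"},
                "rows": [{"id": 11, "name": "Chai", "category_id": 1}]},
    "Order": {"pk": "id", "fks": {}, "rows": [{"id": 100, "date": "1997-01-01"}]},
}
# Join table: no primary key of its own, two foreign keys, plus one extra column.
JOIN_TABLES = {
    "Order_Details": {
        "from": ("order_id", "Order"),
        "to": ("product_id", "Product"),
        "rows": [{"order_id": 100, "product_id": 11, "quantity": 5}],
    },
}


def rel_type_from_column(column):
    """'category_id' -> 'CATEGORY'; a better verb can replace this later."""
    base = column[:-3] if column.endswith("_id") else column
    return base.upper()


def to_graph(tables, join_tables):
    nodes, rels = {}, []
    # Tables become nodes: table name -> label, non-FK columns -> properties.
    for label, table in tables.items():
        for row in table["rows"]:
            props = {c: v for c, v in row.items() if c not in table["fks"]}
            nodes[(label, row[table["pk"]])] = {"label": label, "props": props}
    # Each non-null foreign key becomes a relationship typed after its column.
    for label, table in tables.items():
        for row in table["rows"]:
            for column, target in table["fks"].items():
                if row[column] is not None:
                    rels.append((rel_type_from_column(column),
                                 (label, row[table["pk"]]), (target, row[column]), {}))
    # Join tables become relationships; remaining columns become relationship properties.
    for name, jt in join_tables.items():
        (from_col, from_label), (to_col, to_label) = jt["from"], jt["to"]
        for row in jt["rows"]:
            props = {c: v for c, v in row.items() if c not in (from_col, to_col)}
            rels.append((name.upper(), (from_label, row[from_col]),
                         (to_label, row[to_col]), props))
    return nodes, rels


nodes, rels = to_graph(TABLES, JOIN_TABLES)
for rel in rels:
    print(rel)
```

The output still carries the technical `id` property and a mechanical type such as ORDER_DETAILS; removing the former and renaming the latter to a verb like ORDERS is what the cleanup rules are for.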
Normalized ER-Models: Cleanup Rules
• Remove technical IDs (auto-incrementing PKs)
• Keep domain IDs (e.g. ISBN)
  • Add constraints for those
• Add indexes for lookup fields
• Adjust names for Label, REL_TYPE and propertyName
Note: currently no composite constraints and indexes
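The cleanup rules can be illustrated with a toy per-label store that drops technical IDs, enforces uniqueness on the domain ID and maintains a lookup index. In Neo4j 2.x the real equivalents are declared in Cypher, for example `CREATE CONSTRAINT ON (b:Book) ASSERT b.isbn IS UNIQUE` and `CREATE INDEX ON :Book(title)`. The `LabelStore` class and the sample rows below are hypothetical.

```python
class LabelStore:
    """Toy store for one label: a uniqueness constraint on a domain key plus lookup indexes."""

    def __init__(self, unique_key, indexed=(), technical_ids=("id",)):
        self.unique_key = unique_key
        self.technical_ids = set(technical_ids)
        self.by_key = {}                         # backs the uniqueness constraint
        self.indexes = {p: {} for p in indexed}  # property -> value -> list of nodes

    def add(self, row):
        # Cleanup rule: drop auto-incrementing technical IDs, keep the domain ID.
        node = {k: v for k, v in row.items() if k not in self.technical_ids}
        key = node[self.unique_key]
        if key in self.by_key:
            raise ValueError(f"{self.unique_key} {key!r} already exists")
        self.by_key[key] = node
        for prop, index in self.indexes.items():
            if prop in node:
                index.setdefault(node[prop], []).append(node)
        return node

    def find(self, prop, value):
        """Use an index when one exists for `prop`, otherwise fall back to a full scan."""
        if prop in self.indexes:
            return list(self.indexes[prop].get(value, []))
        return [n for n in self.by_key.values() if n.get(prop) == value]


books = LabelStore("isbn", indexed=("title",))
books.add({"id": 1, "isbn": "isbn-0001", "title": "Graph Basics"})  # hypothetical rows
books.add({"id": 2, "isbn": "isbn-0002", "title": "Graph Basics"})
print(len(books.find("title", "Graph Basics")))  # 2
```

Titles may repeat, so `title` gets a plain index; only the domain key gets the uniqueness check, and adding a second row with an existing ISBN raises `ValueError` before anything is stored.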
Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads up to 10B+ records
• Up to 1M records per second
4.58 million things and their relationships… loads in 100 seconds!
Getting Data into Neo4j
Custom Cypher-Based Loader
• Uses the transactional Cypher HTTP endpoint
• Parametrized, batched, concurrent Cypher statements
• Any programming/script language with a driver or plain HTTP
JVM Transactional Loader
• Use Neo4j’s Java API
• From any JVM language
• Up to 1M records per second
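A custom loader of the kind described above splits input rows into batches, wraps each batch as a parameter of one Cypher statement, and sends the batches concurrently. This Python sketch only builds the request bodies; the `{"statements": [...]}` shape follows the Neo4j 2.x transactional endpoint (`/db/data/transaction/commit`), while the Cypher statement, the row format and the `send` callable are assumptions for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Assumed statement; `{rows}` is the Neo4j 2.x parameter syntax (later versions use `$rows`).
STATEMENT = ("UNWIND {rows} AS row "
             "MERGE (p:Product {productID: row.id}) SET p.productName = row.name")


def batches(rows, size):
    """Split `rows` into consecutive lists of at most `size` items."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def payload(batch):
    """JSON body for one request: a single parameterized statement per batch."""
    return json.dumps({"statements": [{"statement": STATEMENT, "parameters": {"rows": batch}}]})


def load(rows, size, send, workers=4):
    """Send batches concurrently; `send` is a hypothetical caller-supplied function
    that POSTs one body and returns the number of rows it wrote."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send, (payload(b) for b in batches(rows, size))))


def fake_send(body):
    """Stand-in for an HTTP POST: reports how many rows the batch carried."""
    return len(json.loads(body)["statements"][0]["parameters"]["rows"])


rows = [{"id": i, "name": f"product-{i}"} for i in range(25)]
print([len(b) for b in batches(rows, 10)], load(rows, 10, fake_send))  # [10, 10, 5] 25
```

Keeping the statement text constant and varying only the parameters lets the server reuse its query plan, and batching amortizes the per-transaction overhead. Concurrent batches work best when they touch disjoint nodes; otherwise MERGE on shared keys can contend for locks.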
Import Demo
Cypher-Based “LOAD CSV” Capability
• Used to import the Northwind CSV dumps
Command-Line Bulk Loader neo4j-import
• Chicago Crimes dataset
Relational Import Tool neo4j-rdbms-import (JDBC + API)
• Proof of concept
RDBMS Import Tool Demo - Proof of Concept
• JDBC for vendor-independent database connection
• SchemaCrawler to extract DB metadata
• Use rules to drive the graph model import
• Optional means to override default behavior
• Scales writes with the Parallel Batch Importer API
• Reads tables concurrently for nodes & relationships
Demo: MySQL - Employee Demo Database
Source: github.com/jexp/neo4j-rdbms-import
[Figure: source databases Postgres, MySQL, Oracle]
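Basic Query: Who do people report to?
MATCH (steven:Employee {firstName: "Steven"})-[:REPORTS_TO]->(andrew:Employee {firstName: "Andrew"})
RETURN steven, andrew
[Figure: the pattern annotated to show two NODEs, each with a LABEL (Employee) and a PROPERTY (firstName), connected by a REPORTS_TO relationship from Steven to Andrew.]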
Basic Query Comparison: Who do people report to?
SQL:
SELECT * FROM Employee AS e JOIN Employee_Report AS er ON (e.id = er.manager_id) JOIN Employee AS sub ON (er.sub_id = sub.id)
Cypher:
MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee) RETURN *
Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, each up to 3 levels down
Cypher Query:
MATCH (sub)-[:REPORTS_TO*0..3]->(boss), (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.firstName = 'Andrew'
RETURN sub.firstName AS Subordinate, count(report) AS Total;
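What the query computes can be spelled out in plain Python over an in-memory hierarchy. The employee names and reporting lines are assumed to follow the Northwind sample data, and the helper names are hypothetical. The sketch assumes a tree without cycles, where `count(report)` (which counts matched rows) equals the number of distinct reports.

```python
# Assumed Northwind-like hierarchy: employee -> manager (REPORTS_TO).
REPORTS_TO = {
    "Nancy": "Andrew", "Janet": "Andrew", "Margaret": "Andrew",
    "Steven": "Andrew", "Laura": "Andrew",
    "Michael": "Steven", "Robert": "Steven", "Anne": "Steven",
}
# Reverse adjacency, so a manager can follow its incoming REPORTS_TO relationships.
MANAGES = {}
for employee, manager in REPORTS_TO.items():
    MANAGES.setdefault(manager, []).append(employee)


def reports_below(manager, max_depth):
    """Everyone 1..max_depth REPORTS_TO hops below `manager` (assumes no cycles)."""
    found = []
    for employee in MANAGES.get(manager, []):
        found.append(employee)
        if max_depth > 1:
            found.extend(reports_below(employee, max_depth - 1))
    return found


def subordinates_with_totals(boss, depth=3):
    """sub ranges 0..depth hops below boss; report ranges 1..depth hops below sub."""
    totals = {}
    for sub in [boss] + reports_below(boss, depth):
        count = len(reports_below(sub, depth))
        if count:  # MATCH produces no row for a sub without any reports
            totals[sub] = count
    return totals


print(subordinates_with_totals("Andrew"))  # {'Andrew': 8, 'Steven': 3}
```

The 0..3 lower bound is why Andrew appears in his own result: a zero-length path makes `sub` and `boss` the same node.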
“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.” Volker Pacher, Senior Developer
Who is in Robert’s (direct, upwards) reporting chain?
MATCH path=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN path;
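The variable-length pattern returns one path per reachable manager: Robert to his manager, then Robert through that manager to the next level, and so on. A minimal Python sketch of that enumeration, using assumed Northwind-like reporting lines and a hypothetical helper name:

```python
# Assumed Northwind-like REPORTS_TO relationships: employee -> manager.
REPORTS_TO = {"Robert": "Steven", "Michael": "Steven", "Steven": "Andrew"}


def chains_upwards(employee, reports_to):
    """All paths of length >= 1 that follow REPORTS_TO upwards from `employee`,
    mirroring the paths matched by (e)<-[:REPORTS_TO*]-(sub)."""
    paths, path = [], [employee]
    while path[-1] in reports_to:
        manager = reports_to[path[-1]]
        if manager in path:
            # Simplification: stop before revisiting an employee. Cypher only forbids
            # reusing a relationship, so on cyclic data it could return one more path.
            break
        path = path + [manager]  # new list, so earlier paths stay unchanged
        paths.append(path)
    return paths


for p in chains_upwards("Robert", REPORTS_TO):
    print(" -> ".join(p))
# Robert -> Steven
# Robert -> Steven -> Andrew
```

Because each employee has one manager, the walk is a straight line upwards; the cost is proportional to the length of the chain, independent of how many employees exist.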
Product Cross-Sell
MATCH (choc:Product {productName: 'Chocolade'})<-[:ORDERS]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->(o2)-[:ORDERS]->(other:Product)
RETURN employee.firstName, other.productName, count(distinct o2) AS count
ORDER BY count DESC
LIMIT 5;
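The cross-sell query can be mirrored in Python to show what it counts: for each employee who sold Chocolade, the number of distinct orders (`o2`) in which they sold each product. The sales data and function name are hypothetical. One subtlety is modeled explicitly: Cypher never reuses a relationship within a single MATCH, so the SOLD relationship that links the employee to Chocolade cannot also serve as the hop to `o2`.

```python
from collections import Counter

# Hypothetical data: employee -[:SOLD]-> orders, order -[:ORDERS]-> products.
SOLD = {
    "Nancy": ["o1", "o2", "o3"],
    "Janet": ["o4"],
    "Laura": ["o5"],
    "Margaret": ["o6", "o7", "o8"],
}
ORDERS = {
    "o1": ["Chocolade", "Chai"], "o2": ["Chai"], "o3": ["Tofu"],
    "o4": ["Chocolade", "Tofu"], "o5": ["Chai"],
    "o6": ["Chocolade"], "o7": ["Chocolade", "Chai"], "o8": ["Chai"],
}


def cross_sell(product, sold=SOLD, orders=ORDERS):
    """count(distinct o2) per (employee, other product), as in the Cypher query."""
    counts = Counter()
    for employee, emp_orders in sold.items():
        # Orders through which this employee reaches `product` (the first pattern).
        via = [o for o in emp_orders if product in orders[o]]
        for o2 in emp_orders:
            # The SOLD hop to o2 must differ from the SOLD hop used to reach
            # `product`, so some *other* order must contain `product`.
            if not any(o1 != o2 for o1 in via):
                continue
            for other in set(orders[o2]):
                counts[(employee, other)] += 1  # each o2 is visited once, so counts are distinct
    return counts


print(cross_sell("Chocolade").most_common(5))
```

Janet sold Chocolade in only one order, so she gets no rows at all: her single SOLD relationship is already used by the first pattern. Laura never sold Chocolade and is excluded for the usual reason.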
Neo4j Query Planner
• Cost-based query planner since Neo4j 2.2
  • Uses database statistics to select the best plan
  • Currently for read operations
• Query plan visualizer finds:
  • Non-optimal queries
  • Cartesian products
  • Missing indexes, global scans
  • Typos
  • Massive fan-out
Query Planner
Slight change: add an :Employee label -> more statistics available -> new plan with fewer database hits
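Why adding a label can change the plan can be shown with a toy cost model that picks the start operator with the fewest estimated rows. `AllNodesScan` and `NodeByLabelScan` are operator names that appear in Neo4j query plans. The statistics, the function name and the single-number cost model are invented for illustration; the real planner weighs far more than one estimate.

```python
# Hypothetical statistics of the kind a cost-based planner maintains.
STATS = {"all_nodes": 1000, "label_counts": {"Employee": 9}}


def choose_start(label=None, stats=STATS):
    """Toy cost model: pick the start operator with the fewest estimated rows."""
    candidates = [("AllNodesScan", stats["all_nodes"])]
    if label is not None and label in stats["label_counts"]:
        candidates.append(("NodeByLabelScan", stats["label_counts"][label]))
    return min(candidates, key=lambda c: c[1])


print(choose_start())            # no label in the pattern: ('AllNodesScan', 1000)
print(choose_start("Employee"))  # label statistics help: ('NodeByLabelScan', 9)
```

Without a label the only estimate available is the total node count; once the pattern names :Employee, the label count gives a much smaller estimate, and the cheaper scan wins.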
Neo4j Clustering Architecture Optimized for Speed & Availability at Scale
Performance Benefits
• No network hops within queries
• Real-time operations with fast and consistent response times
• Cache sharding spreads the cache across the cluster for very large graphs
Clustering Features
• Master-slave replication with master re-election and failover
• Each instance has its own local cache
• Horizontal scaling & disaster recovery
[Figure: a load balancer in front of three Neo4j instances]
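Three Ways to Migrate Data to Neo4j
• Migrate all data: all data moves from the relational database to the graph database
• Migrate graph data: graph data moves to the graph database; non-graph data stays in the relational database
• Duplicate graph data: all data stays in the relational database; graph data is duplicated into the graph database
[Figure labels: Application; Data Storage and Business Rules Execution; Data Mining and Aggregation]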
Neo4j Fits into Your Enterprise Environment
[Figure components: Application; Graph Database Cluster (three Neo4j instances); Ad Hoc Analysis; Bulk Analytic Infrastructure (Graph Compute Engine, EDW …); Data Scientist; End User; Databases (Relational, NoSQL, Hadoop)]
Quick Start: Plan Your Project
Professional Services Plan (timeline over weeks 1 to 8):
• Learn Neo4j
• Decide on Architecture
• Import and Model Data
• Build Application
• Test Application
Deploy your app in as little as 8 weeks
GraphConnect Europe, London • May 6-7, 2015
DATE: Wednesday, May 6 – Full-day trainings (includes new Advanced Deployment class); Thursday, May 7 – Main conference
LOCATION: Etc Venues in London, UK. Training: 4 Norton Folgate. Conference: 155 Bishopsgate, Liverpool Street
ACTIVITIES:
• Customers and community members such as adidas, Pitney Bowes, Orange, e-Spirit, KNMI and others showcasing their Neo4j solutions
• Neo4j product training
• Free personal advice in Neo4j GraphClinics
• Opportunity to network with graph users from across the world
• Enjoy yourself!
TICKETS! JAX discount code: 50% off with JAX50GCE
www.graphconnect.com