AgensGraph: a Multi-Model Graph Database based-on PostgreSQL
Kisung Kim ([email protected])
Bitnine R&D Center
2017-1-14
Who am I
• Kisung Kim, Ph.D., Chief Technology Officer of Bitnine Global Inc.
• Researched query optimization for graph-structured data during his doctorate
• Developed a distributed relational database engine at TmaxSoft
• Leads the development of a new graph database, AgensGraph, at Bitnine Global
What is a Graph Database?
Images from http://www.slideshare.net/debanjanmahata/an-introduction-to-nosql-graph-databases-and-neo4j
What is a Graph Database?
• Relationships are first-class citizens in a graph database
• Make your data connected in the graph database

                Relational Database    Graph Database
Entity          Row                    Node (Vertex)
Relationship    Row                    Relationship (Edge)
What is a Graph Database?
• Handles data from a different viewpoint
• Data model similar to entity-relationship model
• Gartner says it represents a radical change in how data is organized and processed
Cypher Query Language
• Declarative query language for the property graph model
• Inspired by SQL and SPARQL
– Designed to be human-readable query language
• Developed by Neo Technology Inc. since 2011
• Current version is 3.0
• OpenCypher.org (http://opencypher.org)
– Participate in developing the query language
Cypher Query Example
Make two nodes
  CREATE (:person {id: 1, name: "Kisung Kim", birthday: "1980-01-05"});
  CREATE (:company {id: 1, name: "Bitnine Global"});
Make a relationship between the two nodes
  MATCH (p:person {id: 1}), (c:company {id: 1})
  CREATE (p)-[:workFor {title: "CTO", since: 2014}]->(c);
[Figure: (Kisung Kim)-[:workFor]->(Bitnine Global)]
Cypher Query Example
Querying
  MATCH (p:person {name: "Kisung Kim"})-[:workFor]->(c:company)
  RETURN (p), (c)
No Table Definitions and No Joins
Query with variable length relationships
  MATCH (p:person {name: "Kisung Kim"})-[:knows*..3]->(f:person)
  RETURN (f)
[Figure: (Kisung Kim)-[:workFor]->(?)]
[Figure: (Kisung Kim)-[:knows]->(?)-[:knows]->(?)-[:knows]->(?)]
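The variable length pattern `[:knows*..3]` can be read as "follow between one and three knows edges". The sketch below is only an illustration of that reading in Python, not how AgensGraph evaluates the query. The edge list, the extra person names, and the function `reachable_within` are hypothetical, and it assumes Cypher's rule that an edge is not reused within a single path.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy data: directed "knows" edges as (start, end) name pairs.
KNOWS: List[Tuple[str, str]] = [
    ("Kisung Kim", "Alice"),
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Carol", "Dave"),
    ("Alice", "Kisung Kim"),
]


def reachable_within(start: str, max_hops: int,
                     edges: List[Tuple[str, str]] = KNOWS) -> Set[str]:
    """Return the end vertexes of all paths of 1..max_hops edges from start.

    An edge is never reused within one path; vertexes may be revisited.
    """
    # Outgoing edge ids per start vertex (a stand-in for an adjacency index).
    out: Dict[str, List[int]] = {}
    for edge_id, (s, _) in enumerate(edges):
        out.setdefault(s, []).append(edge_id)

    found: Set[str] = set()

    def expand(vertex: str, used: FrozenSet[int], depth: int) -> None:
        if depth == max_hops:
            return
        for edge_id in out.get(vertex, []):
            if edge_id in used:
                continue  # this edge is already part of the current path
            end = edges[edge_id][1]
            found.add(end)
            expand(end, used | {edge_id}, depth + 1)

    expand(start, frozenset(), 0)
    return found
```

With this toy data, a three-hop limit reaches Alice, Bob, and Carol (and Kisung Kim again through the cycle), but not Dave, who is four hops away.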
GraphDB to PostgreSQL Case
• From Hipolabs
http://engineering.hipolabs.com/graphdb-to-postgresql/
Graph Database and Hybrid Database
Magic Quadrant for Operational Database Management Systems, Gartner, 2016
So, What We Want to Make is
• Hybrid database engine with graph and relational model
• Cypher query processing on PostgreSQL
• Online transactional graph database
• Disk-based persistent graph storage
( ) -[:processes]->(Cypher)
Why We Choose PostgreSQL?
• Fully-featured enterprise-ready open source database
• Graph processing actually uses relational algebra
– Graph is serialized as tables on disk
– Every graph traversal step is in principle a join (from LDBC documentation)
• It is important to speed up join processing
– PostgreSQL has an excellent query optimizer
• And PostgreSQL has an abundant ecosystem
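To make "every graph traversal step is in principle a join" concrete, here is a minimal Python sketch that answers the earlier workFor query with a plain hash join over three row lists. The row layouts and the function `match_work_for` are assumptions made for illustration; they are not AgensGraph's actual tables or executor.

```python
from typing import List, Tuple

# Hypothetical rows; the column layouts are assumed for illustration only.
person: List[Tuple[int, str]] = [(1, "Kisung Kim")]            # (id, name)
company: List[Tuple[int, str]] = [(1, "Bitnine Global")]       # (id, name)
work_for: List[Tuple[int, int, str]] = [(1, 1, "CTO")]         # (person id, company id, title)


def match_work_for(name: str) -> List[Tuple[str, str]]:
    """Roughly MATCH (p:person {name})-[:workFor]->(c:company) RETURN p, c."""
    # Build side of a hash join: company id -> company name.
    company_by_id = {cid: cname for cid, cname in company}
    # Selection on the person table.
    person_ids = {pid for pid, pname in person if pname == name}
    # Probe: each edge row joins one person row to one company row.
    return [
        (name, company_by_id[cid])
        for pid, cid, _title in work_for
        if pid in person_ids and cid in company_by_id
    ]
```

Each additional hop in a pattern adds one more join of this kind, which is why join performance matters so much for graph queries.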
Challenges
• How to store graph data
– Efficient structure for graph pattern matching
– At the same time, efficient for transaction processing
• How to process graph queries
– Processing complex graph pattern matching: variable length path, shortest path
– Mismatches between graph data model & relational data model
– Graph query optimization
Graph Storage
• Graph data is stored on disk decomposed into vertexes and edges
• When processing graph pattern matching, it is essential to find adjacent vertexes or edges efficiently
– Given a start vertex, find end vertexes
– Given an end vertex, find start vertexes
Two Graph Databases
Solution   Company          Latest Version   Features
Neo4j      Neo Technology   3.1              Most famous graph database, Cypher; O(1) access using fixed-size array
Titan      Datastax         -                Distributed graph system based on Cassandra
Graph Storage – Neo4j
• Fixed-size array for nodes and relationships
• Relationships for a node are organized as a doubly-linked list
• Index-free adjacency
• O(1) access for adjacent edges: follow the pointer
From Graph Databases 2nd ed. O’Reilly, 2015
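The idea of fixed-size records plus per-node relationship chains can be sketched as follows. This is a simplified illustration, not Neo4j's record format: the class and field names are hypothetical, and the chains are singly linked here (the slide describes doubly-linked lists) to keep the example short.

```python
from dataclasses import dataclass
from typing import List

NIL = -1  # marks the end of a relationship chain


@dataclass
class NodeRecord:
    first_rel: int = NIL  # id of the first relationship in this node's chain


@dataclass
class RelRecord:
    start: int
    end: int
    start_next: int = NIL  # next relationship in the start node's chain
    end_next: int = NIL    # next relationship in the end node's chain


class Store:
    """Records live in arrays, so a record id is a direct array offset."""

    def __init__(self, node_count: int) -> None:
        self.nodes: List[NodeRecord] = [NodeRecord() for _ in range(node_count)]
        self.rels: List[RelRecord] = []

    def add_rel(self, start: int, end: int) -> int:
        rel_id = len(self.rels)
        # Push the new relationship onto the front of both nodes' chains.
        self.rels.append(RelRecord(start, end,
                                   start_next=self.nodes[start].first_rel,
                                   end_next=self.nodes[end].first_rel))
        self.nodes[start].first_rel = rel_id
        self.nodes[end].first_rel = rel_id
        return rel_id

    def neighbors(self, node: int) -> List[int]:
        """Walk the node's chain by following pointers; no index lookup."""
        result: List[int] = []
        rel_id = self.nodes[node].first_rel
        while rel_id != NIL:
            rel = self.rels[rel_id]
            if rel.start == node:
                result.append(rel.end)
                rel_id = rel.start_next
            else:
                result.append(rel.start)
                rel_id = rel.end_next
        return result
```

Reaching the next adjacent edge costs one array access per step, which is the "follow the pointer" behavior the slide refers to.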
Graph Storage – Titan (DSE Graph)
• Titan stores graphs in adjacency list format
• Each edge is stored twice
• Vertex and edge lists are stored in a backend storage such as HBase, Cassandra, or BerkeleyDB
From http://s3.thinkaurelius.com/docs/titan/1.0.0/data-model.html
Graph Storage – AgensGraph
• Fixed-size array is hard to implement in PostgreSQL
– Tuples are moved when updated
• Titan’s big row approach is also inadequate
• We chose B-tree index for graph traversal
[Figure: a graph is stored as a Vertex table (Vertex ID, Properties) and an Edge table (Edge ID, Start Vertex ID, End Vertex ID, Properties), with B-tree indexes on Vertex ID, (Start, End), and (End, Start)]
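The two composite indexes serve the two traversal directions. As a rough model, the Python sketch below replaces each B-tree with a sorted list of key pairs and uses binary search for the prefix lookup. The class `EdgeIndex` and its methods are hypothetical names for illustration, not part of AgensGraph.

```python
import bisect
from typing import List, Tuple


class EdgeIndex:
    """Two sorted key lists standing in for B-trees on (start, end) and (end, start)."""

    def __init__(self, edges: List[Tuple[int, int]]) -> None:
        self.by_start = sorted((s, e) for s, e in edges)
        self.by_end = sorted((e, s) for s, e in edges)

    @staticmethod
    def _prefix_scan(keys: List[Tuple[int, int]], first: int) -> List[int]:
        # (first,) sorts before every (first, x), so this finds the first match.
        pos = bisect.bisect_left(keys, (first,))
        result: List[int] = []
        while pos < len(keys) and keys[pos][0] == first:
            result.append(keys[pos][1])
            pos += 1
        return result

    def end_vertexes(self, start: int) -> List[int]:
        """Given a start vertex, find end vertexes via the (start, end) index."""
        return self._prefix_scan(self.by_start, start)

    def start_vertexes(self, end: int) -> List[int]:
        """Given an end vertex, find start vertexes via the (end, start) index."""
        return self._prefix_scan(self.by_end, end)
```

Because both columns are in the key, each lookup is answered from the index alone, at the cost of storing every edge in two indexes.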
Index Problems
• The current B-tree has several disadvantages for our workload
– A composite index is preferable, but it increases the index size
– There are many duplicate keys (vertex IDs) on start_ID or end_ID
– Property updates incur insertions into B-trees
• We are developing a new index with a bucket structure (like the GIN index), indirect indexing, and support for index-only scans for graph traversals
Graph Storage – AgensGraph
• Vertexes and edges are grouped into labels
• Labels are organized as a label hierarchy
• We use PostgreSQL’s table inheritance feature to implement the label hierarchy
[Figure: the ag_vertex table (Vertex ID, Properties) and the label tables Person, Message, Comment, and Post, each with Vertex ID and Properties columns, arranged in a label hierarchy]
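With a label hierarchy, scanning a parent label also returns the vertexes of its child labels, much like querying a parent table under PostgreSQL table inheritance. The sketch below models only that behavior. The exact parent/child arrangement (Comment and Post under Message) is an assumption, since the figure did not survive extraction, and the rows and the function `scan` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed hierarchy for illustration: ag_vertex is the root label.
CHILDREN: Dict[str, List[str]] = {
    "ag_vertex": ["Person", "Message"],
    "Message": ["Comment", "Post"],
}

# Hypothetical rows stored directly in each label: (vertex id, properties).
ROWS: Dict[str, List[Tuple[int, dict]]] = {
    "Person": [(1, {"name": "Kisung Kim"})],
    "Comment": [(2, {"text": "Nice talk"})],
    "Post": [(3, {"title": "AgensGraph"})],
}


def scan(label: str) -> List[Tuple[int, dict]]:
    """Return the rows of a label and, recursively, of all its child labels."""
    rows = list(ROWS.get(label, []))
    for child in CHILDREN.get(label, []):
        rows.extend(scan(child))
    return rows
```

Scanning `Message` here yields the Comment and Post rows, while scanning `ag_vertex` yields every vertex in the graph.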
Current Status
• AgensGraph v0.9 (https://github.com/bitnine-oss/agens-graph or http://bitnine.net/downloads/)
– Graph data model and DDL on PostgreSQL 9.6
– Cypher query processing (70% of OpenCypher spec.)
– Integrated query processing (Cypher + SQL)
– Client library (JDBC, ODBC, Python)
– Monitoring and development using Tadpole DB Hub
Tadpole for AgensGraph
• Tadpole DB Hub is an open-source project for managing unified infrastructure (https://github.com/hangum/TadpoleForDBTools)
• Supports various databases, including PostgreSQL and AgensGraph
• Features of Tadpole for AgensGraph
– Monitoring the AgensGraph server
– Cypher query browser and graph visualization
Tadpole for AgensGraph
Future Roadmap
• Distributed graph database
– Plan to exploit Postgres-XL
• Specialized storage and index for graph traversals
• Dictionary compression for JSONB (ZSON)
• Graph query optimization using graph statistics
• Integration with big data systems
– HDFS Storage
– Graph analysis using GraphX
Join Us
• AgensGraph is an open-source project https://github.com/bitnine-oss/agens-graph
• We also wish to contribute to the PostgreSQL community
• Graph database meetup in Silicon Valley
– http://www.meetup.com/Graph-Database-in-Silicon-Valley/
Thank you
[email protected]