DESCRIPTION
Graph databases are not widespread in development communities, although they are a Swiss Army knife for problems that the relational model simply can't handle well. In this talk we'll spend a few minutes on graph theory, see how to easily solve a few relational anti-patterns with graph databases, and show how to integrate them into your next project. At the end we will take a practical look at OrientDB, the "next big thing" of the NoSQL ecosystem, through its PHP Data Mapper, "Orient".
Graph databases: time for serious stuff
David Funaro, Alessandro Nadalin
Agenda
• Theory
• When to use a graph?
• Why graphDB?
• The graphDB community
• OrientDB
• Orient PHP library
Essential (Theory)
Graph G = (V, E)
V: Vertex
E: Edge
Binary Relation
[Figure: vertices A and B joined by the edge "Hates"; example: Itchy and Scratchy. A and B are vertices, the relation between them is an edge.]
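Graph
[Figure: a graph with vertices A, B, D, E, F, G]
Undirected Graph
Example: Friendship
[Figure: an undirected graph with vertices A, B, D, E, F]
Directed Edge
[Figure: vertex A and vertex B joined by a directed edge]
Directed Graph
Example: Followee
[Figure: a directed graph with vertices A, B, D, F]
Path
[Figure: a path highlighted in the graph with vertices A, B, D, E, F, G]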
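The definitions above (vertices, edges, undirected and directed graphs, paths) can be sketched in a few lines of Python. This is only an illustration: the edge list is hypothetical, since the figures from the slides do not survive in this text, and `build_graph` and `is_path` are names made up for the sketch.

```python
from collections import defaultdict

def build_graph(edges, directed=False):
    """Return an adjacency map {vertex: set of neighbours} built from edge pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target]  # register the target vertex even if it has no out-edges
        if not directed:
            adjacency[target].add(source)
    return dict(adjacency)

def is_path(adjacency, vertices):
    """True if every consecutive pair of vertices is joined by an edge."""
    return all(b in adjacency.get(a, ()) for a, b in zip(vertices, vertices[1:]))

# Hypothetical edges between the vertex names used on the slides.
EDGES = [("A", "B"), ("B", "D"), ("D", "E"), ("E", "G"), ("A", "F")]

friendship = build_graph(EDGES)               # undirected: friendship is mutual
followee = build_graph(EDGES, directed=True)  # directed: A follows B, not the reverse

print(is_path(friendship, ["E", "D", "B", "A"]))  # edges can be walked both ways
print(is_path(followee, ["E", "D", "B", "A"]))    # directed edges cannot be walked backwards
```

The same edge list gives two different graphs: in the friendship graph every edge can be walked in both directions, while in the followee graph a path has to respect edge direction.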
Graph -> GraphDB
A GraphDB is a database that uses the graph as its primary data structure
... when to use a graph ?
Recommendations
[Figure: recommendation graph. Vertices: John; Rome, Milan; Cinema A, Cinema B, Cinema C; Se7en, Mr Bean; Thriller, Fun. Edge labels: lives in, location, type, likes, shows. A second version of the slide marks each edge with ✓ or x.]
Your data is a graph
a tree is a graph
parent_id is a graph
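To make the "parent_id is a graph" point concrete, here is a minimal sketch, assuming a self-referencing table with `id` and `parent_id` columns. The rows and the helper names `children_index` and `descendants` are hypothetical; every `parent_id` value is read as an edge from parent to child.

```python
from collections import defaultdict

# Hypothetical rows of a self-referencing table: (id, parent_id, name).
ROWS = [
    (1, None, "root"),
    (2, 1, "docs"),
    (3, 1, "src"),
    (4, 3, "lib"),
]

def children_index(rows):
    """Turn parent_id references into explicit parent -> children edges."""
    children = defaultdict(list)
    for row_id, parent_id, _name in rows:
        if parent_id is not None:
            children[parent_id].append(row_id)
    return children

def descendants(children, node_id):
    """Collect all descendants of node_id with an iterative depth-first walk."""
    found = []
    stack = list(reversed(children.get(node_id, [])))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(reversed(children.get(current, [])))
    return found

tree = children_index(ROWS)
print(descendants(tree, 1))
```

Once the edges are explicit, "all descendants of a node" becomes a plain traversal instead of one self-join per level.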
Solve decision problems
Maximum flow
Given a dataset, calculate how to best organize it
Travelling salesman problem
The pizza guy needs to deliver to A, B and C.
The decision is based on distance, traffic, time and so on.
Shortest path
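As an illustration of the shortest-path problem, the sketch below runs a breadth-first search over an unweighted graph. The street map and the function name `shortest_path` are assumptions made for this example. Weights such as distance or traffic would need a weighted algorithm (for instance Dijkstra's), which this sketch does not implement.

```python
from collections import deque

def shortest_path(adjacency, start, goal):
    """Return the path with the fewest edges from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # vertex -> vertex it was reached from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = current
            if neighbour == goal:
                # Walk the back-pointers from goal to start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None

# Hypothetical street map for the pizza guy: Shop and customers A, B, C.
STREETS = {
    "Shop": ["A", "B"],
    "A": ["Shop", "C"],
    "B": ["Shop"],
    "C": ["A"],
    "D": [],  # a customer with no road leading to them
}

print(shortest_path(STREETS, "Shop", "C"))
```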
Identify "special" nodes of the graph
Given your dataset, organize some clusters
Are there some nodes which cannot belong to a cluster?
They probably have some properties different from the average
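One possible reading of the clustering idea, sketched under a strong assumption: a "cluster" is taken to be a connected component of an undirected graph, and a "special" node is one that ends up alone in its component. The slides do not say which clustering method is meant; the data and function names are hypothetical.

```python
def clusters(adjacency):
    """Group vertices into connected components (adjacency must list every vertex)."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            vertex = stack.pop()
            if vertex in component:
                continue
            component.add(vertex)
            stack.extend(adjacency[vertex])
        seen |= component
        result.append(component)
    return result

def special_nodes(adjacency):
    """Vertices that form a cluster on their own."""
    return sorted(next(iter(c)) for c in clusters(adjacency) if len(c) == 1)

# Hypothetical undirected network: two groups and one isolated vertex.
NETWORK = {
    "A": ["B"], "B": ["A", "C"], "C": ["B"],
    "D": ["E"], "E": ["D"],
    "X": [],
}

print(special_nodes(NETWORK))
```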
ACHTUNG! TERRORISTEN!
but ... why graphDB?
Representing a Graph in:
✓ Relational Database (MySQL, Oracle)
✓ Document Oriented DB (MongoDB, CouchDB)
✓ XML Database (MarkLogic, eXist-db)
http://www.slideshare.net/slidarko/problemsolving-using-graph-traversals-searching-scoring-ranking-and-recommendation#
where is the difference?
A graph database is any storage system that provides index-free adjacency.
GraphDB
http://www.slideshare.net/slidarko/problemsolving-using-graph-traversals-searching-scoring-ranking-and-recommendation
Step by step example
Given a list of people, find their homepages
Tree-based DB WAY
1. David Funaro
2. put in the Search Engine
3. find http://davidfunaro.com
The cost of finding a single friend's homepage grows as the friends' homepage table grows
GraphDB WAY
1. get the embedded information (index): www.odino.org
It is as if the GraphDB had an additional piece of information (the anchor <a>):
<a href="http://odino.org">Alessandro Nadalin</a>
The anchor works as a local index to reach the document = index-free adjacency
Local cost
The local cost is O(k) = constant
Thus, as the graph grows in size, the cost of a local step remains the same
Any database can implicitly represent a graph, BUT only a graph database makes the graph structure explicit
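The contrast between an implicit and an explicit graph can be sketched as follows. This is a toy model, not how any particular database is implemented: the edge table, the `Vertex` class and both lookup functions are assumptions for illustration. A real relational database would use an index rather than a full scan, but the lookup would still depend on the size of the table, whereas following a stored reference does not.

```python
class Vertex:
    """A vertex that stores direct references to its neighbours."""
    def __init__(self, name):
        self.name = name
        self.neighbours = []

def neighbours_via_edge_table(edge_table, name):
    """Implicit graph: search a global list of (source, target) rows."""
    return [target for source, target in edge_table if source == name]

def neighbours_via_adjacency(vertex):
    """Explicit graph: follow the references stored on the vertex itself."""
    return [neighbour.name for neighbour in vertex.neighbours]

# Hypothetical friendship edges.
EDGE_TABLE = [("David", "Alessandro"), ("Alessandro", "David"), ("David", "Luca")]

# Build the explicit structure once from the same edges.
vertices = {name: Vertex(name) for edge in EDGE_TABLE for name in edge}
for source, target in EDGE_TABLE:
    vertices[source].neighbours.append(vertices[target])

print(neighbours_via_edge_table(EDGE_TABLE, "David"))
print(neighbours_via_adjacency(vertices["David"]))
```

Both calls return the same neighbours. The difference is in the work done: the first one looks at the whole edge table, the second one only touches what is attached to the vertex.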
Benchmark
• 1 million vertices
• 4 million edges
• Scale-free topology
• MySQL vs Neo4j
• Both Hash and BTree

Depth   RDBMS      Graph
1       100ms      30ms
2       1000ms     500ms
3       10000ms    3000ms
4       100000ms   50000ms
5       N/A        100000ms
http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/
How ?
PREFIX geospecies: <http://rdf.geospecies.org/ont/geospecies#>
PREFIX lycopodiophyta: <http://lod.geospecies.org/phyla/Pc2>
PREFIX door_county: <http://sws.geonames.org/5250768/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?family_name ?canonicalName ?commonName ?identifier ?wikipedia_url
WHERE {
  ?x geospecies:hasFamilyName ?family_name;
     geospecies:hasCanonicalName ?canonicalName;
     geospecies:hasCommonName ?commonName;
     dcterms:identifier ?identifier;
     geospecies:inPhylum lycopodiophyta:;
     geospecies:isUSDA_ExpectedIn door_county:.
  OPTIONAL {
    ?x geospecies:hasCommonName ?commonName;
       geospecies:hasWikipediaArticle ?wikipedia_url
  }
}
ORDER BY ?family_name ?canonicalName
http://blog.acaro.org/entry/somebody-is-going-to-hate-me-nosparql
NoSPARQL
Databases
community that is building and feeding the GraphDB ecosystem
TinkerPop Stack
Blueprints is a collection of interfaces, implementations, ouplementations, and test suites for the property graph data model. Blueprints is analogous to JDBC, but for graph databases.
https://github.com/tinkerpop/blueprints/wiki/
Blueprints: a data model and its implementations
Pipes: a dataflow framework using process graphs; it provides a collection of "pipes" that are connected together to form processing pipelines
Gremlin: a Turing-complete, graph-based programming language that compiles Gremlin syntax down to Pipes
Rexster: a RESTful graph shell; it allows Blueprints graphs to be exposed through a RESTful API (HTTP)
What's hot
OrientDB
Glossary
RID: <10:05> (Cluster:Position)
CLASS
Main features
Inheritance
class Vehicle, with subclasses Car and Bike
SELECT FROM Vehicle WHERE owner = 1:1
can return records of class Bike or Car
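The polymorphic behaviour described above can be mimicked in plain Python. This is only an analogy, not OrientDB code: the records and the `select_from` helper are hypothetical, and class inheritance stands in for the database's class hierarchy.

```python
class Vehicle:
    def __init__(self, owner):
        self.owner = owner  # RID of the owner, kept as a plain string here

class Car(Vehicle):
    pass

class Bike(Vehicle):
    pass

def select_from(records, cls, **conditions):
    """Return records of cls or any subclass whose fields match the conditions."""
    return [
        record for record in records
        if isinstance(record, cls)
        and all(getattr(record, field, None) == value
                for field, value in conditions.items())
    ]

# Hypothetical stored records.
RECORDS = [Car("1:1"), Bike("1:1"), Bike("1:2")]

# Analogous to: SELECT FROM Vehicle WHERE owner = 1:1
for record in select_from(RECORDS, Vehicle, owner="1:1"):
    print(type(record).__name__)
```

Selecting from the parent class returns both the car and the bike owned by 1:1, while selecting from `Bike` would return only the bike.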
Traversal
SELECT FROM fellas WHERE any() traverse(0,-1) ( @rid = [Michelle @rid] )
SELECT FROM fellas WHERE any() traverse(0,2) ( @rid = [Michelle @rid] )
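The two queries above differ only in the second argument of traverse. Assuming it is a maximum depth, with -1 meaning "no limit" (the slides do not spell this out), the idea can be sketched as a depth-bounded breadth-first search. The graph and the function name `reachable_within` are hypothetical.

```python
from collections import deque

def reachable_within(adjacency, start, target, max_depth=-1):
    """Breadth-first search from start; max_depth=-1 means no depth limit."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, depth = queue.popleft()
        if current == target:
            return True
        if max_depth != -1 and depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in adjacency.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return False

# Hypothetical "fellas" graph: Ann knows Bob, Bob knows Carl, Carl knows Michelle.
FELLAS = {"Ann": ["Bob"], "Bob": ["Carl"], "Carl": ["Michelle"], "Michelle": []}

print(reachable_within(FELLAS, "Ann", "Michelle"))               # unlimited depth
print(reachable_within(FELLAS, "Ann", "Michelle", max_depth=2))  # at most 2 hops
```

In this toy graph Michelle is three hops away from Ann, so the unlimited search finds her and the two-hop search does not.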
SQL syntax
beyond SQL
SELECT FROM authors WHERE book.title = ...
ACID
speaks JSON
{
  "schema": { "name": "Address" },
  "result": [{
    "@type": "d",
    "@rid": "#13:0",
    "@version": 6,
    "@class": "Address",
    "type": "Residence",
    "street": "Piazza Navona, 1",
    "city": "#14:0",
    "nick": "Luca2"
  }, {
    ...
    ...
Double Protocol
HTTP: universal, easy to interact with
Binary: blazing fast
on-record SELECTs
SELECT FROM cats
SELECT FROM 11:0
SELECT FROM [11:0,11:1]
SELECT FROM [11:0,12:0]
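Selecting directly from record IDs works because the ID itself says where the record lives. A minimal sketch, assuming a RID has the form cluster:position as the glossary slide suggests; the in-memory `CLUSTERS` layout and both helper names are made up for illustration and say nothing about OrientDB's actual storage.

```python
def parse_rid(rid):
    """Split a RID such as '11:0' or '#11:0' into (cluster, position)."""
    cluster, position = rid.lstrip("#").split(":")
    return int(cluster), int(position)

def select_from_rids(clusters, rids):
    """Fetch records by address, without searching: cluster first, then position."""
    records = []
    for rid in rids:
        cluster, position = parse_rid(rid)
        records.append(clusters[cluster][position])
    return records

# Hypothetical storage: cluster id -> list of records, indexed by position.
CLUSTERS = {
    11: ["Tom", "Felix"],
    12: ["Garfield"],
}

print(select_from_rids(CLUSTERS, ["11:0", "11:1"]))  # like SELECT FROM [11:0,11:1]
print(select_from_rids(CLUSTERS, ["11:0", "12:0"]))  # like SELECT FROM [11:0,12:0]
```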
stress-free setup
2 Mb
./orient/bin/server.sh
in-memory DB
or disk-persisted
Supports standards
OrientDB
• Inheritance
• Traversal
• SQL-like syntax
• ACID
• Speaks JSON
• Double protocol
• On-record SELECT
• TinkerPop compliant
Language Bindings
http://code.google.com/p/orient/wiki/ProgrammingLanguageBindings
Orient = PHP Library to work with OrientDB
https://github.com/congow/Orient
• Data Mapper
• Query Builder
• HTTP Binding
HTTP Binding
use Congow\Orient;
use Congow\Orient\Foundation\Binding;

// cURL-based HTTP client, bound to the "demo" database on the local server
$driver = new Orient\Http\Client\Curl();
$orient = new Binding($driver, '127.0.0.1', '2480', 'admin', 'admin', 'demo');

// run a query and decode the JSON body of the response
$response = $orient->query("SELECT FROM Address");
$output = json_decode($response->getBody());

foreach ($output->result as $address) {
    var_dump($address->street);
}
apart from ->query($SQL)
->get|delete|postClass($class)
->post|delete|put|getDocument($rid)
...and much more!
(connect, disconnect, ...)
Query Builder
use Congow\Orient\Query;

$query = new Query();
$query->from(array('users'))->where('username = ?', "admin");

echo $query->getRaw(); // SELECT FROM users WHERE username = "admin"

$query->select(array('name', 'username', 'email'), false)
      ->from(array('12:0', '12:1'), false)
      ->where('any() traverse ( any() like "%danger%" )')
      ->orWhere("1 = ?", 1)
      ->andWhere("links = ?", 1)
      ->limit(20)
      ->orderBy('username')
      ->orderBy('name', true, true)
      ->range("12:0", "12:1");

SELECT name, username, email FROM [12:0, 12:1] WHERE any() traverse ( any() like "%danger%" ) OR 1 = "1" AND links = "1" ORDER BY name, username LIMIT 20 RANGE 12:0 12:1
Data Mapper
namespace Poland\PHPCon\Entity;

use Congow\Orient\ODM\Mapper\Annotations as ODM;

/**
 * @ODM\Document(class="Person")
 */
class Speaker
{
    /**
     * @ODM\Property(type="string")
     */
    protected $name;

    public function setName($name)
    {
        $this->name = $name;
    }
}
Domain Driven Design
{
  "schema": { "name": "Speaker" },
  "result": [{
    "@type": "d",
    "@rid": "#1:0",
    "@version": 6,
    "@class": "Speaker",
    "name": "David Coallier"
  }, {
    ...
    ...
$david = $mapper->hydrate(json_decode($speaker));
{
  "schema": { "name": "Speaker" },
  "result": [{
    "@type": "d",
    "@rid": "#1:0",
    "@version": 6,
    "@class": "Speaker",
    "name": "Martin Fowler"
  }, {
    ...
    ...
$david instanceOf Poland\PHPCon\Entity\Speaker
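What "hydration" means can be shown with a language-neutral sketch: a decoded JSON record is turned into an instance of the domain class named by its "@class" field. This is not the Orient library's implementation; `CLASS_MAP`, `hydrate` and the Python `Speaker` class are assumptions for illustration, and the response string is a shortened copy of the JSON shown above.

```python
import json

class Speaker:
    """Plain domain object; it knows nothing about the database."""
    def __init__(self):
        self.rid = None
        self.name = None

# Hypothetical mapping from database class names to domain classes.
CLASS_MAP = {"Speaker": Speaker}

def hydrate(record):
    """Build a domain object from one decoded record."""
    instance = CLASS_MAP[record["@class"]]()
    instance.rid = record["@rid"]
    for key, value in record.items():
        if not key.startswith("@"):  # skip metadata such as @type and @version
            setattr(instance, key, value)
    return instance

RESPONSE = """
{"schema": {"name": "Speaker"},
 "result": [{"@type": "d", "@rid": "#1:0", "@version": 6,
             "@class": "Speaker", "name": "David Coallier"}]}
"""

speakers = [hydrate(record) for record in json.loads(RESPONSE)["result"]]
print(type(speakers[0]).__name__, speakers[0].name)
```

The domain class stays free of persistence code, which is the point of the Data Mapper approach: the mapping lives outside the entity.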
Repository Pattern
$repo = $manager->getRepository('Speaker');
$speakers = $repo->findAll();
$speaker = $repo->find($rid);
$criteria = array('Name' => 'Martin');
$lornas = $repo->findBy($criteria);
$criteria = array( 'Name' => 'Martin', 'last_name' => 'Fowler');
$lornaJ = $repo->findOneBy($criteria);
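The repository calls above follow a simple contract: find everything, find by ID, find by criteria, find the first match. A minimal in-memory sketch of that contract, with hypothetical data and method names adapted to Python naming; it does not reflect how the Orient or Doctrine repositories are implemented.

```python
class InMemoryRepository:
    """Toy repository over a dict of {rid: record}."""
    def __init__(self, records):
        self._records = dict(records)

    def find_all(self):
        return list(self._records.values())

    def find(self, rid):
        return self._records.get(rid)

    def find_by(self, criteria):
        """All records whose fields equal every value in criteria."""
        return [
            record for record in self._records.values()
            if all(record.get(field) == value for field, value in criteria.items())
        ]

    def find_one_by(self, criteria):
        matches = self.find_by(criteria)
        return matches[0] if matches else None

# Hypothetical speaker records keyed by RID.
repo = InMemoryRepository({
    "#1:0": {"Name": "Martin", "last_name": "Fowler"},
    "#1:1": {"Name": "Martin", "last_name": "Odersky"},
    "#1:2": {"Name": "David", "last_name": "Coallier"},
})

print(len(repo.find_by({"Name": "Martin"})))
print(repo.find_one_by({"Name": "Martin", "last_name": "Fowler"}))
```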
https://github.com/doctrine/common/tree/master/lib/Doctrine/Common/Persistence
David Funaro
@ingdavidino
http://davidfunaro.com
Alessandro Nadalin
@_odino_
http://odino.org
That’s all, folks!