AllegroGraph as a Graph Database
Jans Aasman, Ph.D., CEO - Franz [email protected]
Contents
• AllegroGraph as a
– QuintupleStore (well, OctupleStore in 2011)
– RDF store
– Graph Database
• Agraph architecture
• Extreme use cases
– Amdocs … CRM on top of a trillion triples
– Pharmaceutical … explore connections in graph space
– Demo
Agraph as a quintuple store
• S, P, O, G + unique ID + transaction #
• SPOG can be any data type
1 2.0 3 4
2001-12-12 after 010-12-12 +19258781444
Jans loves pizza file1 12
NoOne believes 12
• And include very efficient geospatial and temporal representations and indices
• 6 default indices, 24 user-controlled indices
• Range indexing, freetext indexing
• Neighborhood matrices & UPI maps (for 1 ms access)
• 2011: time, security
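The quintuple idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Quad`, `TinyStore` and the two dictionary indices are hypothetical names, not AllegroGraph's API, and the transaction number is left out.

```python
from collections import defaultdict
from itertools import count
from typing import NamedTuple


class Quad(NamedTuple):
    """One statement: subject, predicate, object, graph, plus a unique ID."""
    s: object
    p: object
    o: object
    g: object
    id: int


class TinyStore:
    """Toy in-memory store (hypothetical, for illustration only)."""

    def __init__(self):
        self._ids = count(1)
        self.by_id = {}
        self.spo = defaultdict(list)  # (s, p) -> quads, stands in for one index
        self.pos = defaultdict(list)  # (p, o) -> quads, stands in for another

    def add(self, s, p, o, g=None):
        quad = Quad(s, p, o, g, next(self._ids))
        self.by_id[quad.id] = quad
        self.spo[(s, p)].append(quad)
        self.pos[(p, o)].append(quad)
        return quad.id


store = TinyStore()
fact = store.add("Jans", "loves", "pizza", g="file1")
# Because every statement has an ID, another statement can refer to it.
store.add("NoOne", "believes", fact)
```

The second `add` mirrors the "NoOne believes 12" row: the object is the ID of an earlier statement.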
Agraph as an RDF store
• RDF store when you adhere to the RDF conventions
• Full SPARQL 1.0, most of SPARQL 1.1
• RDFS++ reasoner
• Geospatial and temporal representations
• Prolog for rules
• Soon Common Logic (CLIF+)
– As a usability layer on top of Prolog
– Easier to combine rules and queries
Agraph as a Graph Database
• If you want a Property Graph:
– use the graph argument
Jans loves pizza gr1
gr1 weight 90
gr1 author Sophia
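A minimal Python sketch of the same pattern, assuming quads are plain `(subject, predicate, object, graph)` tuples. The helper names are hypothetical and only show how edge attributes hang off the graph name.

```python
# Quads as (subject, predicate, object, graph) tuples; None means no graph name.
quads = [
    ("Jans", "loves", "pizza", "gr1"),
    ("gr1", "weight", 90, None),
    ("gr1", "author", "Sophia", None),
]


def edge_attributes(quads, graph_name):
    """Collect the attributes attached to an edge through its graph name."""
    return {p: o for s, p, o, _ in quads if s == graph_name}


def edges_with_attributes(quads):
    """Yield (s, p, o, attributes) for every quad that carries a graph name."""
    for s, p, o, g in quads:
        if g is not None:
            yield s, p, o, edge_attributes(quads, g)
```

Here `list(edges_with_attributes(quads))` yields the single edge `Jans loves pizza` with the attributes `weight` and `author`.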
Schema
• Node typing: Yes
• Edge typing: Yes
• Attributes (nodes): Yes
• Attributes (edges): Yes: A trusts B gr1, gr1 certainty 80
• Directed edges: Yes: A trusts B
• Undirected edges: Yes, if using RDFS symmetric property or generators
• Restricted edges: Yes, if it means there can be islands
• Loop edges: Yes, A loves A
• Attribute indexing: Yes
• Starting node: No, although, is that a DB property?
• Schema: Yes and No: on demand you can use an ontology, and validation is straightforward
Querying
• Language: Lisp, Prolog, JavaScript and a toy version of Gremlin
• Traversals: Yes, through adjacency lists and special indices. This seems to be an implementation point and not a fundamental property
Database
• Transactional: Yes
• ACID: Yes
• Fully indexed: Yes
• Distributed: Federation (in-machine, between machines), AG5
• Cache: Yes, adjacency vectors (neighbourhood matrices)
• Embeddable: Yes: 3.3, No: 4.2.x
• Store-engine: Custom
• Migration framework: From RDB to Graph DB? Various
• Object mapping: Only in Lisp, not in clients
Utilities
• Shell: All from Lisp shell, some from cshell, wget/curl
• Algorithms: Yes, JavaScript, Prolog and Lisp
• Benchmark: Yes, but only for RDF stores and reasoning
• Protocols: REST/JSON
• RDF Store: Yes
• OWL Store: Yes
• IDE Integration: Yes
• Admin tool: Yes, AGWebview
• Importer: Yes, from various input formats
• Exporter: Yes, clients let you dump triples
• Loader: AGLoad, Gruff, AGWebview
• Scripting Language: Lisp and JavaScript
Languages
• Java
• Python
• Ruby
• C#
• Scala
• Clojure
• Perl
• PHP
Many graph algorithms using generator model
• Because of Social Network Analysis requirements we implement many graph algorithms.
– Using generators
– A first-class function that takes one node as input and returns all children
• And neighbourhood matrices (or adjacency hash tables) for speed.
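To make the generator model concrete, here is a small Python sketch. It assumes triples are plain tuples; `make_generator` and `cached` are hypothetical helpers, and the memoized wrapper only stands in for the idea of an adjacency hash table.

```python
from functools import lru_cache

TRIPLES = [("a", "knows", "b"), ("b", "knows", "c"), ("a", "knows", "c")]


def make_generator(triples, predicate):
    """Return a first-class function that maps one node to all its children."""
    def generator(node):
        return [o for s, p, o in triples if s == node and p == predicate]
    return generator


def cached(generator):
    """Memoize a generator so repeated lookups skip the triple scan."""
    return lru_cache(maxsize=None)(lambda node: tuple(generator(node)))


knows = cached(make_generator(TRIPLES, "knows"))
```

Any search routine that accepts a function from node to children can take `knows` unchanged, which is the point of passing generators around as values.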
How far is Actor1 from Actor2?
• Degrees of separation
– How far is P1 from P2?
• Connection strength
– How many shortest paths are there from P1 to P2 through a series of predicates and rules?
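Both questions can be answered with one breadth-first search. The sketch below is a generic textbook approach, not AllegroGraph's implementation; `shortest_path_stats` and the sample `EDGES` data are made up for illustration.

```python
from collections import deque


def shortest_path_stats(start, goal, generator):
    """Return (distance, number of shortest paths) from start to goal.

    Returns (None, 0) when goal cannot be reached.
    """
    dist = {start: 0}
    paths = {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in generator(node):
            if child not in dist:
                # First visit: the shortest distance is fixed here.
                dist[child] = dist[node] + 1
                paths[child] = paths[node]
                queue.append(child)
            elif dist[child] == dist[node] + 1:
                # Another equally short route into the same node.
                paths[child] += paths[node]
    return dist.get(goal), paths.get(goal, 0)


# Hypothetical sample data: two equally short routes from P1 to P2.
EDGES = {"P1": ["A", "B"], "A": ["P2"], "B": ["P2"], "P2": []}


def sample_generator(node):
    return EDGES.get(node, [])
```

With this data, `shortest_path_stats("P1", "P2", sample_generator)` gives a separation of 2 and a connection strength of 2.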
In what groups is this actor?
• Find the ego-network around a person or thing
– Friends, friends of friends, etc.
• Find all the fully connected graphs (cliques) around a person or thing
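An ego-network is just a depth-limited expansion with a generator. The following Python sketch assumes the group includes the actor itself; `ego_group` and the `FRIENDS` data are hypothetical.

```python
def ego_group(actor, generator, depth):
    """Nodes reachable from actor within `depth` generator steps (actor included)."""
    group = {actor}
    frontier = {actor}
    for _ in range(depth):
        # Expand one level and keep only nodes not seen before.
        frontier = {child for node in frontier for child in generator(node)} - group
        group |= frontier
    return group


# Hypothetical sample data.
FRIENDS = {"jans": ["ann", "bob"], "ann": ["cid"], "cid": ["dee"]}


def friend_of(node):
    return FRIENDS.get(node, [])
```

Depth 1 gives friends, depth 2 adds friends of friends, and so on.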
Questions in SNA: How Important is an actor?
• In-degree, out-degree
• Actor degree centrality
– I have the most connections in a group, so I am more important
• Actor closeness centrality
– I have the shortest paths to everyone else in the group, so I am more important
• Actor betweenness centrality
– I am more often on the shortest path between other people in the group, so I am more important. I can control the flow of information better than other people
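As a rough illustration, degree and closeness centrality can be computed as follows. This sketch uses the common normalized textbook definitions, which may differ from the formulas AllegroGraph uses, and assumes a connected, undirected group given as an adjacency dictionary. Betweenness is left out for brevity.

```python
from collections import deque


def distances_from(node, adjacency):
    """Breadth-first shortest-path distances from node to every reachable node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def degree_centrality(node, adjacency):
    """Share of the other group members this node is directly connected to."""
    return len(adjacency[node]) / (len(adjacency) - 1)


def closeness_centrality(node, adjacency):
    """(n - 1) divided by the summed distance to all other members."""
    dist = distances_from(node, adjacency)
    total = sum(d for other, d in dist.items() if other != node)
    return (len(adjacency) - 1) / total if total else 0.0


# Hypothetical star-shaped group: "hub" is connected to everyone else.
STAR = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
```

In the star, the hub scores 1.0 on both measures, while each leaf has a degree centrality of one third and a closeness of 0.6.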
Has the group a leader, is the group cohesive?
• Group centralization
– How centralized is this group?
– Does this group have a leader?
– Is there someone controlling the information flow?
• Group cohesiveness
– How strong and well connected is this group?
– Are most people connected?
– What is the density?
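Two standard measures give numbers for these questions: graph density for cohesiveness and Freeman's degree centralization for "is there a leader". The sketch below is a generic version for an undirected group of at least three nodes. It is an assumption that these match what the slides mean, and the function names are hypothetical.

```python
def density(adjacency):
    """Fraction of all possible undirected edges that are actually present."""
    n = len(adjacency)
    edges = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return edges / (n * (n - 1) / 2)


def degree_centralization(adjacency):
    """Freeman degree centralization: 1.0 for a star, 0.0 when all degrees are equal."""
    n = len(adjacency)
    degrees = [len(neighbours) for neighbours in adjacency.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical groups: one with a clear leader, one without.
STAR = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
RING = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
```

The star is fully centralized but only half as dense as a complete graph, while the ring has no leader at all.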
All search and SNA functions use Generators
• Generator
– Input: one node
– Output: list of nodes
– Fully functional, can be complex SPARQL or Prolog queries
– Or just predicates and an indication of direction
How to get from A to E??
subj pred obj
a dinner-with b
a kissed-with c
c movie-with e
b kissed-with d
d movie-with e
e dinner-with a

(defgenerator knows (node)
  (objects-of :p dinner-with))

(defgenerator knows (node)
  (objects-of :p dinner-with)
  (subjects-of :p dinner-with))
How to get from A to E??
(defgenerator knows ()
  (objects-of :p dinner-with)
  (subjects-of :p dinner-with)
  (objects-of :p movie-with)
  (subjects-of :p movie-with)
  (objects-of :p kissed-with)
  (subjects-of :p kissed-with))

(defgenerator knows ()
  (undirected (dinner-with movie-with kissed-with)))
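The effect of the generator choice on the A-to-E question can be tried out in plain Python. This is a sketch over the sample triples from the slide; `objects_of`, `undirected` and `find_path` are hypothetical stand-ins for the generator clauses and the search, not AllegroGraph functions.

```python
from collections import deque

TRIPLES = [
    ("a", "dinner-with", "b"),
    ("a", "kissed-with", "c"),
    ("c", "movie-with", "e"),
    ("b", "kissed-with", "d"),
    ("d", "movie-with", "e"),
    ("e", "dinner-with", "a"),
]


def objects_of(predicates):
    """Generator that follows the given predicates from subject to object only."""
    def generator(node):
        return [o for s, p, o in TRIPLES if s == node and p in predicates]
    return generator


def undirected(predicates):
    """Generator that follows the given predicates in both directions."""
    def generator(node):
        outgoing = [o for s, p, o in TRIPLES if s == node and p in predicates]
        incoming = [s for s, p, o in TRIPLES if o == node and p in predicates]
        return outgoing + incoming
    return generator


def find_path(start, goal, generator):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for child in generator(node):
            if child not in parents:
                parents[child] = node
                queue.append(child)
    return None
```

Following `dinner-with` only from subject to object finds no route from `a` to `e`. Following it in both directions finds the direct link, because `e dinner-with a` can then be traversed backwards.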
Declaratively specify
(defgenerator knows (node)
  (select (?x)
    (q ??node movie-with ?x)
    (q ??node dinner-with ?x)
    (not (q ??node kissed-with ?x)))
  (select (?x)
    (q ?x movie-with ??node)
    (q- ?x dinner-with ??node)
    (not (q- ?x kissed-with ??node))))
Sample SNA functions
(Ego-group actor generator depth ?group)
- binds ?group to the group of nodes
(Ego-group-members actor generator depth ?a)
- binds ?a to every member in the group
(Cliques actor generator min-depth ?cl)
- binds ?cl to all cliques
(Clique-members actor generator min-depth ?cl ?a)
- binds ?cl to cliques and then iterates over every member ?a in ?cl
(Actor-centrality actor group generator ?num)
- binds ?num to the actor centrality
(Actor-centrality-members group ?actor ?num)
- binds ?actor to every actor in group and ?num to the centrality of that actor, starting with the actor with the highest centrality
(Group-centrality group generator ?num)
Actor = single node
Group = list of nodes
Depth = number
Generator = generator
Integrated in Prolog and Common Logic (CLIF)
(defgenerator knows (node)
  (undirected :p (!fr:dinner-with !fr:kissed-with)))

(select (?x)
  (ego-group-members !person:jans knows ?x 2)
  (q ?x !geo:place ?y)
  (geo-box-around !geoname:Berkeley ?y 5 miles))

(select (?x)
  (ego-group !person:jans knows ?group 2)
  (actor-centrality-members ?group knows ?x ?num)
  (q ?x !geo:place ?y)
  (geo-box-around !geoname:Berkeley ?y 5 miles))
Where do we use this?
• Amdocs: know everything about every customer
– Partitioned on customer
– Most graph search centered in client
• Pfizer: help me find connections between drugs, diseases, genes and side effects in a sea of clinical trials
– Just a mess of data
– All graph search in server
Traditional Business Intelligence
Can tell you ALL about the average customer
but NOTHING about the individual.
Can you in < 1 second with one push of a button
• Predict the three most likely reasons why Joe Smith from Kansas is calling the call center? Bill unexpectedly high, losing connection too often, doesn't know how to use new subscription service?
• The last ten events that happened for JS? Phone calls, SMS, downloads of movies, device stopped working, payment of bill, looking at map, search for local store.
• What is the likelihood that he will change from T-Mobile to Sprint or AT&T?
• Who are his ten most important friends and what devices do they have? And who is the first to change and who follows?
Can you in < 1 second with one push of a button
• What are the usual daily locations for this person? What kind of shops?
• What kind of services does he download, what kind of movies/music/games does he like, what products does he buy?
• Is his plan the right plan for him?
• Is he in a good mood?
• Is he a valuable customer, is he a good payer, what is your margin on him, how many times per month does he call a call center, does he look up help for mail on the internet? Can you predict if he is going to pay the bill?
Architecture
[Diagram. Components shown: Event Data Sources, Operational Systems (CRM, OMS, NW, Web 2.0), Amdocs Event Collector, Amdocs Integration Framework, SBA Application Server with containers for Event Ingestion, Scheduled Events, Decision Engine, Inference Engine (Business Rules), Bayesian Belief Network, "Sesame" and the AllegroGraph Triple Store DB; Events come in, Actions go out.]
Work for Pharma
sider
Gruff Demo
What about Scalability?
Architecture overview
• Storage layer (compression, indexing, freetext, transactions)
• Session Management, Query Engine, Federation
• REST
• Backup/Restore, Replication, Warm Failover, Security, Management
• Sparql, Prolog, Rules, Clif++, Geo, SNA, Time, RDFS+, JavaScript
• Java (Sesame, Jena), Python, Ruby, C#, Clojure, Scala, Perl
• Thanks…