AURELIUS THINKAURELIUS.COM
Adding Value Through Graph Analysis
Matthias Broecheler, CTO @mbroecheler March V, MMXIII
KNOWLEDGE INFORMATION DATA
Communities of Interest
Finding Influencers
Understanding Behavior
Information Integration
Recommendation
Question Answering
Fraud Detection
Risk Analysis
Market Valuation
Data
Information
Knowledge
Value
Data
Information
Knowledge
2013-03-03 18:52:48:112; 12.123.211.192; ACCESS/TRR; http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; ACTION=CLICK|DELAY=250|x=450|y=632
userid:3552
adid:9914 clicked timestamp: 93932342
likes(Jane Joe, cute mammals):0.8
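The first step, from data to information, can be illustrated with a minimal Python sketch that parses the raw log line above into a structured record. The field layout (five ";"-separated fields, then a "|"-separated KEY=VALUE payload) is our assumption from this single example line, not a documented format, and the unit of DELAY is not given on the slide.

```python
from urllib.parse import urlparse, parse_qs

# The raw access-log line from the slide, copied verbatim.
RAW = ("2013-03-03 18:52:48:112; 12.123.211.192; ACCESS/TRR; "
       "http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; "
       "ACTION=CLICK|DELAY=250|x=450|y=632")


def parse_record(line: str) -> dict:
    """Turn one raw log line (data) into a structured record (information)."""
    # Assumption: the record is five ';'-separated fields in this order.
    timestamp, ip, kind, url, payload = (f.strip() for f in line.split(";"))
    # parse_qs drops query fields without '=' (such as 'flagtru') by default.
    query = parse_qs(urlparse(url).query)
    # The event payload is a '|'-separated list of KEY=VALUE pairs.
    event = dict(pair.split("=", 1) for pair in payload.split("|"))
    return {
        "timestamp": timestamp,
        "ip": ip,
        "kind": kind,
        "uid": query["uid"][0],
        "action": event["ACTION"],
        "delay": int(event["DELAY"]),  # unit not stated on the slide
        "click_xy": (int(event["x"]), int(event["y"])),
    }


record = parse_record(RAW)
print(record["uid"], record["action"], record["click_xy"])
```

Turning such records into knowledge, like the likes(...) score, needs aggregation across many records and is not shown here.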
Graph Databases &
Graph Analysis
I Graph Foundation
Graph
name: Jupiter type: god
name: Pluto type: god
name: Neptune type: god
name: Hercules type: demigod
name: Cerberus type: monster
name: Alcmene type: god
name: Saturn type: titan
Vertex Property
[Figure: edges between these vertices labeled father (×2), mother, brother (×2), battled, and pet; edge property time: 12]
Edge
Edge Property
Edge Type
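A minimal Python sketch of the property graph model: vertices and directed edges, where each edge has a label (its edge type) and both vertices and edges carry key/value properties. The class names are hypothetical. The edge "Hercules battled Cerberus" with time: 12 is our reading of the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Vertex:
    id: int
    properties: Dict[str, object] = field(default_factory=dict)  # vertex properties


@dataclass
class Edge:
    out_id: int   # tail vertex
    label: str    # edge type, e.g. "battled"
    in_id: int    # head vertex
    properties: Dict[str, object] = field(default_factory=dict)  # edge properties


class PropertyGraph:
    """In-memory property graph: vertices plus directed, labeled edges,
    both carrying key/value properties."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: List[Edge] = []

    def add_vertex(self, **props) -> Vertex:
        v = Vertex(len(self.vertices), dict(props))
        self.vertices[v.id] = v
        return v

    def add_edge(self, out_v: Vertex, label: str, in_v: Vertex, **props) -> Edge:
        e = Edge(out_v.id, label, in_v.id, dict(props))
        self.edges.append(e)
        return e


g = PropertyGraph()
hercules = g.add_vertex(name="Hercules", type="demigod")
cerberus = g.add_vertex(name="Cerberus", type="monster")
fight = g.add_edge(hercules, "battled", cerberus, time=12)
print(fight)
```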
Path
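A path is a sequence of vertices joined by edges. The sketch below enumerates every path that follows a given sequence of edge labels, for example father then brother. The edge list is a hypothetical reading of the figure: it matches the label counts shown, but the exact endpoints are assumed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical reading of the example figure: (tail, label, head).
EDGES: List[Tuple[str, str, str]] = [
    ("Hercules", "father", "Jupiter"),
    ("Jupiter", "father", "Saturn"),
    ("Hercules", "mother", "Alcmene"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Hercules", "battled", "Cerberus"),
    ("Pluto", "pet", "Cerberus"),
]


def out_index(edges) -> Dict[Tuple[str, str], List[str]]:
    """Map (tail, label) -> heads, so each traversal step is a dictionary lookup."""
    index = defaultdict(list)
    for tail, label, head in edges:
        index[(tail, label)].append(head)
    return index


def paths(start: str, labels: List[str], index) -> List[List[str]]:
    """Every path from start that follows the given edge labels in order."""
    frontier = [[start]]
    for label in labels:
        frontier = [path + [head]
                    for path in frontier
                    for head in index.get((path[-1], label), [])]
    return frontier


index = out_index(EDGES)
for path in paths("Hercules", ["father", "brother"], index):
    print(" -> ".join(path))
```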
Degree
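The degree of a vertex is the number of edges incident on it. It splits into out-degree (edges leaving the vertex) and in-degree (edges arriving). Here is a sketch over the same hypothetical reading of the figure.

```python
from collections import Counter

# Hypothetical reading of the example figure: (tail, label, head).
EDGES = [
    ("Hercules", "father", "Jupiter"),
    ("Jupiter", "father", "Saturn"),
    ("Hercules", "mother", "Alcmene"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Hercules", "battled", "Cerberus"),
    ("Pluto", "pet", "Cerberus"),
]

out_degree = Counter(tail for tail, _, _ in EDGES)
in_degree = Counter(head for _, _, head in EDGES)


def degree(vertex: str) -> int:
    """Total degree: edges incident on the vertex in either direction."""
    return out_degree[vertex] + in_degree[vertex]


for name in ("Hercules", "Jupiter", "Cerberus"):
    print(name, out_degree[name], in_degree[name], degree(name))
```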
Aurelius Graph Cluster (Apache 2)
Titan: stores a massive-scale property graph, allowing real-time traversals and updates
Faunus: batch processing of large graphs with Hadoop (Map/Reduce)
Fulgora: runs global graph algorithms on large, compressed, in-memory graphs
[Figure: Bulk Load and Load arrows connect the three; analysis results flow back into Titan]
II Titan Graph Database
Real-time Big Graph Data
Numerous concurrent users
Many short read/write transactions
Real-time traversals (OLTP)
High availability
Dynamic scalability
Variable consistency model: ACID or eventual consistency
Titan Features
Storage Backends
[Figure: storage backends compared by partitionability, availability, and consistency]
$ ./titan-0.2.0/bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = TitanFactory.open('/tmp/titan')
==>titangraph[local:/tmp/titan]
gremlin> v = g.V('name','Hercules').next()
==>v[4]
gremlin> v.out('father').out('brother').name
Vertex-Centric Indices
Sort and index the edges of each vertex by a primary key; the primary key can be composite
Enable efficient, focused traversals that retrieve only the edges that matter
Use push-down predicates for quick, index-driven retrieval
[Figure: vertex v with incident edges fought (×2), father, mother, and battled (×4, with time: 1, 3, 5, 9)]
v.query()
  all edges of v: fought (×2), father, mother, battled (×4)
v.query().direction(OUT)
  outgoing edges only: father, mother, battled (×4)
v.query().direction(OUT).labels('battled')
  outgoing battled edges only (time: 1, 3, 5, 9)
v.query().direction(OUT).labels('battled').has('time',T.lt,5)
  outgoing battled edges with time < 5 (time: 1, 3)
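The idea behind a vertex-centric index fits in a few lines. Group one vertex's edges by (direction, label) and keep each group sorted by the primary key. A predicate such as time < 5 then becomes a binary search instead of a scan. This is an in-memory illustration, not Titan's storage layout. The neighbour names (m1 to m4, f, a, b) are hypothetical, and edges without a time get 0.

```python
from bisect import bisect_left, insort
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]      # (direction, label)
Entry = Tuple[int, str]    # (time, other vertex)


class VertexCentricIndex:
    """Sketch: the incident edges of ONE vertex, grouped by (direction, label)
    and kept sorted by a primary key (here, 'time')."""

    def __init__(self) -> None:
        self._groups: Dict[Key, List[Entry]] = defaultdict(list)

    def add(self, direction: str, label: str, other: str, time: int = 0) -> None:
        # insort keeps each group ordered by (time, other) as edges arrive.
        insort(self._groups[(direction, label)], (time, other))

    def query(self, direction: str, label: str,
              time_lt: Optional[int] = None) -> List[Entry]:
        # Direction and label select one group; other edges are never touched.
        group = self._groups.get((direction, label), [])
        if time_lt is None:
            return list(group)
        # Push-down predicate: binary search for the first edge with
        # time >= time_lt and return only the edges before it.
        cut = bisect_left(group, (time_lt,))
        return group[:cut]


v = VertexCentricIndex()
for t, monster in [(9, "m4"), (1, "m1"), (5, "m3"), (3, "m2")]:
    v.add("OUT", "battled", monster, time=t)
v.add("OUT", "father", "f")
v.add("IN", "fought", "a")
v.add("IN", "fought", "b")
print(v.query("OUT", "battled", time_lt=5))
```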
Titan Features
I. Data Management
II. Vertex-Centric Indices
III. Graph Partitioning
IV. Edge Compression
III TITAN 0.3.0 [-SNAPSHOT]
Titan Embedding
Native clients (Java, Python, Clojure) connect to the Rexster Gremlin server via RexPro, a lightweight binary protocol
Rexster hosts the Titan Gremlin engine
The storage backend is embedded and accessed through in-JVM method calls
Graph Indexing
Vertex and edge indexing
Pluggable index providers: ElasticSearch, Lucene
Full-text search
Numeric range search
Geographic search
name: Jupiter age: 4800 title: God of the heaven and skies
name: Pluto age: 4900 title: God of the underworld
name: Neptune age: 5200 title: God of the earth and ocean
name: Hercules title: Divine hero
name: Cerberus title: Ugly beast of the underworld
name: Alcmene age: 3300
name: Saturn age: 5900
[Figure: edges labeled father (×2), mother, brother (×2), battled, and pet; edge properties time: 12, location: (38.071,23.745)]
g.query().has('age',Cmp.GREATER_THAN,5000).vertices()
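Conceptually, a numeric range query such as age > 5000 can be answered from a sorted index with a binary search. Here is a Python sketch using the ages from the figure. The RangeIndex class is hypothetical and stands in for whatever the index provider (ElasticSearch or Lucene) does internally.

```python
from bisect import bisect_right
from typing import Dict, List

# Ages from the slide; Hercules and Cerberus have no age property.
VERTICES: Dict[str, dict] = {
    "Jupiter": {"age": 4800},
    "Pluto": {"age": 4900},
    "Neptune": {"age": 5200},
    "Hercules": {},
    "Cerberus": {},
    "Alcmene": {"age": 3300},
    "Saturn": {"age": 5900},
}


class RangeIndex:
    """Sorted (value, vertex) pairs for one numeric key."""

    def __init__(self, vertices: Dict[str, dict], key: str) -> None:
        # Vertices lacking the key are simply not indexed.
        self._entries = sorted(
            (props[key], name) for name, props in vertices.items() if key in props)
        self._values = [value for value, _ in self._entries]

    def greater_than(self, bound: int) -> List[str]:
        # Every entry after the last value <= bound satisfies value > bound.
        start = bisect_right(self._values, bound)
        return [name for _, name in self._entries[start:]]


print(RangeIndex(VERTICES, "age").greater_than(5000))
```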
g.query().has('title',Txt.CONTAINS,'god').vertices()
g.query().has('age',Cmp.GREATER_THAN,5000).has('title',Txt.CONTAINS,'god').vertices()
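Full-text matching is typically served by an inverted index that maps tokens to vertices. The sketch below builds one over the title property. It lower-cases tokens so that "God" matches "god"; that is our assumption about the provider's analyzer. It then answers the combined query by intersecting the text result with the age > 5000 result. Intersecting result sets is one plausible strategy, not necessarily how Titan plans combined queries.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

VERTICES: Dict[str, dict] = {
    "Jupiter": {"age": 4800, "title": "God of the heaven and skies"},
    "Pluto": {"age": 4900, "title": "God of the underworld"},
    "Neptune": {"age": 5200, "title": "God of the earth and ocean"},
    "Hercules": {"title": "Divine hero"},
    "Cerberus": {"title": "Ugly beast of the underworld"},
    "Alcmene": {"age": 3300},
    "Saturn": {"age": 5900},
}


def tokenize(text: str) -> List[str]:
    # Lower-case word tokens, so 'God' matches the query term 'god'.
    return re.findall(r"[a-z0-9]+", text.lower())


class TextIndex:
    """Inverted index: token -> set of vertices whose property contains it."""

    def __init__(self, vertices: Dict[str, dict], key: str) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)
        for name, props in vertices.items():
            for token in tokenize(props.get(key, "")):
                self._postings[token].add(name)

    def contains(self, word: str) -> Set[str]:
        return set(self._postings.get(word.lower(), set()))


titles = TextIndex(VERTICES, "title")
gods = titles.contains("god")
old = {name for name, props in VERTICES.items() if props.get("age", 0) > 5000}
print(sorted(gods), sorted(gods & old))
```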
g.query().has('location',Geo.WITHIN,Geoshape.circle(38,23,100)).edges()
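A Geo.WITHIN test against a circle amounts to a great-circle distance check. The sketch below uses the haversine formula. We assume the circle is given as (latitude, longitude, radius) with the radius in kilometers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_circle(point, center, radius_km: float) -> bool:
    """True if point lies inside the circle around center (assumed: radius in km)."""
    return haversine_km(*center, *point) <= radius_km


# The location edge property from the slide.
battled = {"label": "battled", "time": 12, "location": (38.071, 23.745)}
center = (38, 23)
print(round(haversine_km(*center, *battled["location"]), 1),
      within_circle(battled["location"], center, 100))
```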
IV Faunus Graph Analytics
Hadoop-based Graph Computing Framework
Graph Analytics
Breadth-first Traversals
Global Graph Computations
Batch Big Graph Data
Faunus Features
Faunus Architecture
g._()
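Faunus Work Flow
g.V.out.out.count()
[Figure: the traversal runs as a sequence of Hadoop jobs whose outputs go to hdfs://user/ubuntu/output/job-0/, output/job-1/, and output/job-2/; each job directory holds graph* and sideeffect* files]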
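Faunus runs breadth-first traversals as MapReduce jobs. The sketch below simulates g.V.out.out.count() in-process. Each vertex holds a count of traversers, the map phase sends that count along out-edges, and the reduce phase sums the counts that arrive at each vertex. The adjacency list is hypothetical. This illustrates the counting idea only and is not Faunus's code.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

# Hypothetical adjacency list: vertex id -> out-neighbour ids.
ADJ: Dict[int, List[int]] = {1: [2, 3], 2: [3], 3: [1, 4], 4: []}


def map_out(traversers: Counter, adj) -> Iterator[Tuple[int, int]]:
    """Map phase: a vertex holding k traversers emits k to each out-neighbour."""
    for vertex, count in traversers.items():
        for neighbour in adj[vertex]:
            yield neighbour, count


def reduce_sum(pairs) -> Counter:
    """Reduce phase: sum the traverser counts arriving at each vertex."""
    totals = Counter()
    for vertex, count in pairs:
        totals[vertex] += count
    return totals


traversers = Counter({v: 1 for v in ADJ})   # g.V: one traverser per vertex
for _ in range(2):                          # .out.out: one map/reduce round per step
    traversers = reduce_sum(map_out(traversers, ADJ))
print(sum(traversers.values()))             # .count(): number of 2-step paths
```

Passing counts rather than individual traversers keeps the data shuffled between rounds proportional to the number of vertices, not the number of paths.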
Compressed HDFS graphs: stored in sequence files using variable-length encoding and prefix compression
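To show why such encodings save space, here is a sketch of variable-length integer encoding applied to the gaps between sorted neighbour ids (delta encoding). Delta encoding stands in here for the slide's prefix compression. The actual Faunus byte format may differ.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 bits per byte, high bit set means 'more bytes follow'."""
    if n < 0:
        raise ValueError("varint encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data: bytes) -> list:
    values, current, shift = [], 0, 0
    for b in data:
        current |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def encode_adjacency(neighbour_ids) -> bytes:
    """Sort ids and store each one as the gap from its predecessor, as a varint."""
    out, previous = bytearray(), 0
    for vid in sorted(neighbour_ids):
        out += encode_varint(vid - previous)
        previous = vid
    return bytes(out)


def decode_adjacency(data: bytes) -> list:
    ids, total = [], 0
    for gap in decode_varints(data):
        total += gap
        ids.append(total)
    return ids


ids = [1000010, 1000000, 1000003]
packed = encode_adjacency(ids)
print(len(packed), decode_adjacency(packed))  # 5 bytes instead of 3 x 8-byte longs
```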
What’s New
Faunus 0.1 released
Bulk import/export for Titan: bulk-loading graphs into Titan and loading derived results back into Titan
RDF support
Many optimizations, e.g. vertex compression
Faunus Setup
$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
==>faunusgraph[titanhbaseinputformat]
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=dbpedia
==>faunus.output.location.overwrite=true
gremlin> g._()
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
Build a Knowledge Graph
Based on DBpedia, a graph version of Wikipedia: ~290 million edges (~1B triples)
1. Bulk load RDF into Faunus (6 m1.xlarge)
2. Convert to a property graph
3. Bulk load into Titan (3 m1.xlarge with Cassandra)
4. OLTP + OLAP
Total time: ~2 hours
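Step 2, converting RDF to a property graph, can be sketched with one simple rule. A triple whose object is a literal becomes a vertex property, and a triple whose object is a resource becomes a labeled edge. Each vertex's name is the local part of its IRI. This mapping is our assumption, not a description of the Faunus converter, and the triples below are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Lit:
    """An RDF literal; any plain str in a triple is treated as an IRI."""
    value: str


DBR = "http://dbpedia.org/resource/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
WIKILINK = "http://dbpedia.org/ontology/wikiPageWikiLink"

TRIPLES = [
    (DBR + "Random_walker_algorithm", RDFS_LABEL, Lit("Random walker algorithm")),
    (DBR + "Random_walker_algorithm", WIKILINK, DBR + "Random_walk"),
    (DBR + "Random_walk", RDFS_LABEL, Lit("Random walk")),
]


def local_name(iri: str) -> str:
    # Text after the last '/' or '#', e.g. '...#label' -> 'label'.
    return re.split(r"[/#]", iri)[-1]


def to_property_graph(triples) -> Tuple[Dict[str, dict], List[tuple]]:
    vertices: Dict[str, dict] = {}   # IRI -> property map
    edges: List[tuple] = []          # (out IRI, label, in IRI)

    def vertex(iri: str) -> dict:
        return vertices.setdefault(iri, {"name": local_name(iri)})

    for subj, pred, obj in triples:
        props = vertex(subj)
        key = local_name(pred)
        if isinstance(obj, Lit):
            props[key] = obj.value           # literal -> vertex property (last one wins)
        else:
            vertex(obj)                      # resource -> vertex ...
            edges.append((subj, key, obj))   # ... joined by a labeled edge
    return vertices, edges


vertices, edges = to_property_graph(TRIPLES)
print(vertices[DBR + "Random_walker_algorithm"])
print(edges)
```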
gremlin> g = TitanFactory.open('bin/cassandra.local')
==>titangraph[cassandrathrift:10.176.213.110]
gremlin> g.V('name','Random_walker_algorithm').both.name
==>Random_walk
==>Segmentation_(image_processing)
==>Graph_(mathematics)
==>Laplacian_matrix
==>Graph
==>Laplacian_matrix
==>Electrical_network
==>Resistor
==>Electrical_resistance_and_conductance
==>Ground_(electricity)
==>Direct_current
==>Voltage_source
==>Precomputation
==>Category:Computer_vision
==>Random_Walker_(Computer_Vision)
==>List_of_algorithms
==>Segmentation_(image_processing)
==>Watershed_(image_processing)
==>Random_walker_(computer_vision)
==>Random_Walker_(computer_vision)
Graph OLTP
gremlin> g.V('name','Learning').out.out.out.out[0..10].name
==>Latium
==>Roman_Kingdom
==>Roman_Republic
==>Roman_Empire
==>Middle_Ages
==>Early_modern_Europe
==>Armenian_Kingdom_of_Cilicia
==>Lingua_franca
==>Vatican_City
==>Vulgar_Latin
==>Romance_languages
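The [0..10] suffix is an inclusive Groovy range, which is why 11 names come back. Here is a Python sketch of the same idea. It chains lazy out steps as generators, so only as many vertices as needed are produced before the range cuts the stream off. The tree-shaped graph is hypothetical.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

# Hypothetical graph: a complete binary tree, vertex n<i> -> n<2i+1>, n<2i+2>.
ADJ: Dict[str, List[str]] = {f"n{i}": [f"n{2 * i + 1}", f"n{2 * i + 2}"] for i in range(15)}


def out(vertices: Iterable[str], adj) -> Iterator[str]:
    """One lazy 'out' step: yields neighbours only as they are requested."""
    for v in vertices:
        yield from adj.get(v, [])


def out_n(start: Iterable[str], adj, hops: int) -> Iterator[str]:
    """Chain `hops` lazy out steps, like .out.out.out.out in Gremlin."""
    stream: Iterator[str] = iter(start)
    for _ in range(hops):
        stream = out(stream, adj)
    return stream


# Groovy's [0..10] includes both ends, i.e. the first 11 results.
first = list(islice(out_n(["n0"], ADJ, 4), 0, 11))
print(first)
```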
The Graph Landscape
[Figure: speed of traversal/process vs. size of graph; illustration only, not to scale]
TINKERPOP.COM
Thanks!
Vadas Gintautas @vadasg
Marko Rodriguez @twarko
Stephen Mallette @spmallette
Daniel LaRocque
We are Hiring