Adding Value Through Graph Analysis Using Titan and Faunus


In this presentation we discuss how graph analysis can add value to your data and how to use open source tools like Titan and Faunus to build scalable graph processing systems. This presentation gives an update on the development status of Titan and Faunus with a preview of what is to come.

AURELIUS THINKAURELIUS.COM

Adding Value Through Graph Analysis

Matthias Broecheler, CTO @mbroecheler March V, MMXIII

KNOWLEDGE INFORMATION DATA

Communities of Interest

Finding Influencers

Understanding Behavior

Information Integration

Recommendation

Question Answering

Fraud Detection

Risk Analysis

Market Valuation

Data

Information

Knowledge

Value

Data: 2013-03-03 18:52:48:112; 12.123.211.192; ACCESS/TRR; http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; ACTION=CLICK|DELAY=250|x=450|y=632

Information: userid:3552 addid:9914 clicked timestamp: 93932342

Knowledge: likes(Jane Joe, cute mammals):0.8

Graph Databases & Graph Analysis

I Graph Foundation

Graph

name: Jupiter type: god

name: Pluto type: god

name: Neptune type: god

name: Hercules type: demigod

name: Cerberus type: monster

name: Alcmene type: god

name: Saturn type: titan

Vertex Property

Graph

[Figure: the same seven vertices, now connected by labeled edges. Edge labels: father, father, mother, brother, brother, battled, pet. Edge property: time:12]

Edge

Edge Property

Edge Type

Path

[Figure: the same graph, illustrating a path]

Degree

[Figure: the same graph, illustrating vertex degree]
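
The vocabulary on these slides (vertex, property, edge, edge type, path, degree) can be made concrete with a small sketch. The Python below is an illustration only, not Titan code. Vertex names and types, the edge labels and time:12 are taken from the slides. The edge endpoints are an assumption based on how the labels read (for example, Hercules' father being Jupiter), and the helper names degree and is_path are hypothetical.

```python
# Vertices with their properties, copied from the slide.
vertices = {
    "Jupiter": {"type": "god"},
    "Pluto": {"type": "god"},
    "Neptune": {"type": "god"},
    "Hercules": {"type": "demigod"},
    "Cerberus": {"type": "monster"},
    "Alcmene": {"type": "god"},
    "Saturn": {"type": "titan"},
}

# Edges as (source, edge type, target, edge properties).
# ASSUMPTION: the slide text only lists the labels; the endpoints are guessed.
edges = [
    ("Hercules", "father", "Jupiter", {}),
    ("Hercules", "mother", "Alcmene", {}),
    ("Jupiter", "father", "Saturn", {}),
    ("Jupiter", "brother", "Neptune", {}),
    ("Jupiter", "brother", "Pluto", {}),
    ("Hercules", "battled", "Cerberus", {"time": 12}),
    ("Pluto", "pet", "Cerberus", {}),
]


def degree(name):
    """Number of edges touching the vertex, incoming plus outgoing."""
    return sum((src == name) + (dst == name) for src, _, dst, _ in edges)


def is_path(names):
    """True if every consecutive pair of vertices is joined by a directed edge."""
    linked = {(src, dst) for src, _, dst, _ in edges}
    return all(pair in linked for pair in zip(names, names[1:]))


print(degree("Jupiter"))                          # 1 incoming + 3 outgoing
print(is_path(["Hercules", "Jupiter", "Saturn"]))
```

Under the assumed endpoints, Jupiter has degree 4 and Hercules, Jupiter, Saturn is a path of length two.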

Aurelius Graph Cluster

TITAN: Stores a massive-scale property graph allowing real-time traversals and updates

FAUNUS: Batch processing of large graphs with Hadoop

FULGORA: Runs global graph algorithms on large, compressed, in-memory graphs

[Diagram labels: Bulk Load, Load, Map/Reduce, Analysis results back into Titan]

Apache 2

II Titan Graph Database

Titan Features

• Numerous Concurrent Users
• Many Short Transactions
    • read/write
• Real-time Traversals (OLTP)
• High Availability
• Dynamic Scalability
• Variable Consistency Model
    • ACID or eventual consistency

→ Real-time Big Graph Data

Storage Backends

Partitionability

Availability Consistency

$ ./titan-0.2.0/bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = TitanFactory.open('/tmp/titan')
==>titangraph[local:/tmp/titan]
gremlin> v = g.V('name','Hercules').next()
==>v[4]
gremlin> v.out('father').out('brother').name
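
To show what the two out steps in that traversal do, here is a minimal Python sketch that walks the same two hops over a plain adjacency map. It is not how Titan executes Gremlin. The edge endpoints are an assumption (the slides only show the labels), and the helper name out merely mirrors the Gremlin step for readability.

```python
# Outgoing adjacency: vertex -> list of (edge label, neighbour).
# ASSUMPTION: endpoints are guessed from the labels on the slide.
out_edges = {
    "Hercules": [("father", "Jupiter"), ("mother", "Alcmene"), ("battled", "Cerberus")],
    "Jupiter": [("father", "Saturn"), ("brother", "Neptune"), ("brother", "Pluto")],
    "Pluto": [("pet", "Cerberus")],
}


def out(frontier, label):
    """One traversal step: follow every outgoing edge that carries the label."""
    return [
        neighbour
        for vertex in frontier
        for edge_label, neighbour in out_edges.get(vertex, [])
        if edge_label == label
    ]


# v.out('father').out('brother').name, starting from Hercules
uncles = out(out(["Hercules"], "father"), "brother")
print(uncles)  # ['Neptune', 'Pluto'] under the assumed endpoints
```

Each step maps the current set of vertices to their neighbours along one edge label, which is why the steps chain so naturally.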

Vertex-Centric Indices

• Sort and index edges per vertex by primary key
    • Primary key can be composite
• Enables efficient focused traversals
    • Only retrieve edges that matter
• Uses push down predicates for quick, index-driven retrieval

v.query()
[Figure: all edges of v: fought, fought, father, mother, and four battled edges with time: 1, time: 3, time: 5, time: 9]

v.query().direction(OUT)
[Figure: father, mother, and the four battled edges (time: 1, 3, 5, 9)]

v.query().direction(OUT).labels('battled')
[Figure: the four battled edges (time: 1, 3, 5, 9)]

v.query().direction(OUT).labels('battled').has('time',T.lt,5)
[Figure: the two battled edges with time: 1 and time: 3]
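
The idea behind a vertex-centric index can be sketched in a few lines: keep the edges of one vertex sorted by a composite key, then answer a label-plus-range query with binary search instead of scanning every edge. The Python below is a sketch under stated assumptions, not Titan's storage layout. The class name VertexEdgeIndex, the (label, time) key, the neighbour names and the time value 0 for father and mother are all hypothetical.

```python
import bisect


class VertexEdgeIndex:
    """Outgoing edges of a single vertex, kept sorted by (label, time)."""

    def __init__(self):
        self._keys = []     # sorted composite keys: (label, time)
        self._targets = []  # neighbour stored at the same position as its key

    def add(self, label, time, target):
        key = (label, time)
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._targets.insert(pos, target)

    def query(self, label, time_lt):
        """Edges with this label and time < time_lt.

        The predicate is pushed down into two binary searches, so edges
        with other labels or later times are never touched.
        """
        lo = bisect.bisect_left(self._keys, (label, float("-inf")))
        hi = bisect.bisect_left(self._keys, (label, time_lt))
        return list(zip(self._keys[lo:hi], self._targets[lo:hi]))


v = VertexEdgeIndex()
for time, foe in [(9, "foe_d"), (1, "foe_a"), (5, "foe_c"), (3, "foe_b")]:
    v.add("battled", time, foe)
v.add("father", 0, "some_father")
v.add("mother", 0, "some_mother")

# Counterpart of v.query().direction(OUT).labels('battled').has('time',T.lt,5)
hits = v.query("battled", 5)
print(hits)
```

With the four battled edges from the slide (time 1, 3, 5, 9), the query returns only the edges with time 1 and 3.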

Titan Features

I.  Data Management

II. Vertex-Centric Indices

III.  Graph Partitioning

IV.  Edge Compression

III TITAN 0.3.0 [-SNAPSHOT]

Titan Embedding

• Rexster RexPro
    • lightweight Gremlin Server
    • binary protocol
• Titan Gremlin Engine
• Embedded Storage Backend
    • in-JVM method calls
• Native clients
    • Java, Python, Clojure

Graph Indexing

• Vertex and Edge indexing
• Pluggable index provider
    • ElasticSearch
    • Lucene
• Full-text search
• Numeric range search
• Geographic search

name: Jupiter age: 4800 title: God of the heaven and skies

name: Pluto age: 4900 title: God of the underworld

name: Neptune age: 5200 title: God of the earth and ocean

name: Hercules title: Divine hero

name: Cerberus title: Ugly beast of the underworld

name: Alcmene age: 3300

name: Saturn age: 5900

father father

mother brother

brother

battled

pet

time:12 location: (38.071,23.745)

g.query().has('age',Cmp.GREATER_THAN,5000).vertices()

g.query().has('title',Txt.CONTAINS,'god').vertices()

g.query().has('age',Cmp.GREATER_THAN,5000).has('title',Txt.CONTAINS,'god').vertices()

g.query().has('location',Geo.WITHIN,Geoshape.circle(38,23,100)).edges()
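
The three kinds of search shown on these slides (numeric range, full-text, geographic) can be spelled out as plain predicates. The Python below only illustrates what each predicate is meant to select. It scans the data instead of using an index, so it says nothing about how ElasticSearch or Lucene evaluate the queries. Property values come from the slides. The helper names, the token-based reading of "contains" and the interpretation of the circle radius as kilometres are assumptions.

```python
import math
import re

vertices = {
    "Jupiter": {"age": 4800, "title": "God of the heaven and skies"},
    "Pluto": {"age": 4900, "title": "God of the underworld"},
    "Neptune": {"age": 5200, "title": "God of the earth and ocean"},
    "Hercules": {"title": "Divine hero"},
    "Cerberus": {"title": "Ugly beast of the underworld"},
    "Alcmene": {"age": 3300},
    "Saturn": {"age": 5900},
}
battled_edge = {"label": "battled", "time": 12, "location": (38.071, 23.745)}


def greater_than(value, bound):
    """Numeric range predicate; a missing property never matches."""
    return value is not None and value > bound


def contains(text, term):
    """Full-text predicate: the term must equal one lower-cased token."""
    return text is not None and term.lower() in re.findall(r"\w+", text.lower())


def within_circle(point, lat, lon, radius_km):
    """Geo predicate using the haversine great-circle distance.

    ASSUMPTION: the radius is given in kilometres.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (point[0], point[1], lat, lon))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a)) <= radius_km


old = sorted(n for n, p in vertices.items() if greater_than(p.get("age"), 5000))
gods = sorted(n for n, p in vertices.items() if contains(p.get("title"), "god"))
old_gods = sorted(set(old) & set(gods))
near = within_circle(battled_edge["location"], 38, 23, 100)
print(old, gods, old_gods, near)
```

On the slide data, only Neptune is both older than 5000 and titled a god, and the battled edge lies roughly 66 km from (38, 23), inside the 100 km circle.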

IV Faunus Graph Analytics

Faunus Features

• Hadoop-based Graph Computing Framework
• Graph Analytics
• Breadth-first Traversals
• Global Graph Computations

→ Batch Big Graph Data

Faunus Architecture

g._()

Faunus Work Flow

g.V.out.out.count()

hdfs://user/ubuntu/output/
    job-0/
    job-1/
    job-2/
        graph*
        sideeffect*
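
One way to picture how a traversal such as g.V.out.out.count() becomes a chain of batch jobs is to treat each out step as one map phase plus one reduce phase over path counters. The Python below is a single-process sketch of that idea, not Faunus' implementation. The toy edge list and the function name out_step are hypothetical.

```python
from collections import Counter

# Hypothetical toy graph as directed (source, target) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
vertices = {"a", "b", "c", "d"}


def out_step(counts):
    """One breadth-first step expressed as a map phase and a reduce phase."""
    # Map: every vertex sends its current path count along each outgoing edge.
    messages = [(dst, counts[src]) for src, dst in edges if counts[src]]
    # Reduce: group the messages by receiving vertex and sum them.
    merged = Counter()
    for dst, n in messages:
        merged[dst] += n
    return merged


start = Counter({v: 1 for v in vertices})  # g.V: one traverser per vertex
after_two = out_step(out_step(start))      # .out.out
total = sum(after_two.values())            # .count()
print(total)  # 3: the paths a-b-c, a-c-d and b-c-d
```

Because only counters travel between steps, the work per step grows with the number of edges rather than with the number of paths.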

Compressed HDFS Graphs

• stored in sequence files
• variable length encoding
• prefix compression
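
The slide names two compression techniques without detail. As a generic illustration of what the terms usually mean, the Python below implements a base-128 variable-length integer encoding and a simple prefix compression of sorted strings. This is a sketch of the general techniques only: the deck does not describe Faunus' actual byte format, and all function names here are hypothetical.

```python
def encode_varint(n):
    """Variable-length encoding of a non-negative integer.

    Each byte carries 7 payload bits; the high bit says more bytes follow,
    so small numbers take a single byte.
    """
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data):
    """Inverse of encode_varint for one encoded integer."""
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return result


def prefix_compress(sorted_keys):
    """Store each key as (length of prefix shared with previous key, suffix)."""
    previous, out = "", []
    for key in sorted_keys:
        shared = 0
        while shared < min(len(key), len(previous)) and key[shared] == previous[shared]:
            shared += 1
        out.append((shared, key[shared:]))
        previous = key
    return out


def prefix_decompress(pairs):
    """Rebuild the original keys from (shared length, suffix) pairs."""
    previous, out = "", []
    for shared, suffix in pairs:
        previous = previous[:shared] + suffix
        out.append(previous)
    return out


print(encode_varint(5), encode_varint(300))
print(prefix_compress(["battled", "brother", "father"]))
```

Both techniques pay off on graph data because vertex ids are often small or close together and sorted keys tend to share long prefixes.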

What’s New

• Faunus 0.1 released
• Bulk Import / Export for Titan
    • loaded graph into Titan
    • loading derivations into Titan
• RDF support
• Many optimizations
    • vertex compression

Faunus Setup

$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
==>faunusgraph[titanhbaseinputformat]
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=dbpedia
==>faunus.output.location.overwrite=true

gremlin> g._()
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003

Build a Knowledge Graph

• Based on DBPedia
    • Graph version of Wikipedia
    • ~290 million edges (~1B triples)

1.  Bulk load RDF into Faunus
    • 6 m1.xlarge
2.  Convert to property graph
3.  Bulk load into Titan
    • 3 m1.xlarge with Cassandra
4.  OLTP+OLAP

• Total Time: ~ 2 hours
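
Step 2, converting RDF to a property graph, is not spelled out in the deck. One common convention is sketched below in Python: triples whose object is a literal become vertex properties, and triples whose object is another resource become labeled edges. This is an assumption about the general approach, not a description of Faunus' conversion rules. The function name, the tuple layout and the sample predicates are hypothetical.

```python
def triples_to_property_graph(triples):
    """Convert (subject, predicate, object, object_is_literal) tuples.

    Literal objects become properties on the subject vertex.
    Resource objects become edges labeled with the predicate.
    """
    vertices, edges = {}, []
    for subject, predicate, obj, is_literal in triples:
        properties = vertices.setdefault(subject, {})
        if is_literal:
            properties[predicate] = obj
        else:
            vertices.setdefault(obj, {})  # make sure the target vertex exists
            edges.append((subject, predicate, obj))
    return vertices, edges


# Hypothetical sample triples; resource names are borrowed from the
# console output in this deck, the predicates are made up.
sample = [
    ("Random_walker_algorithm", "label", "Random walker algorithm", True),
    ("Random_walker_algorithm", "links_to", "Random_walk", False),
    ("Random_walker_algorithm", "links_to", "Laplacian_matrix", False),
]
graph_vertices, graph_edges = triples_to_property_graph(sample)
print(len(graph_vertices), len(graph_edges))
```

A conversion of this kind yields fewer edges than triples, since literal-valued triples collapse into properties.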

gremlin> g = TitanFactory.open('bin/cassandra.local')
==>titangraph[cassandrathrift:10.176.213.110]

gremlin> g.V('name','Random_walker_algorithm').both.name
==>Random_walk
==>Segmentation_(image_processing)
==>Graph_(mathematics)
==>Laplacian_matrix
==>Graph
==>Laplacian_matrix
==>Electrical_network
==>Resistor
==>Electrical_resistance_and_conductance
==>Ground_(electricity)
==>Direct_current
==>Voltage_source
==>Precomputation
==>Category:Computer_vision
==>Random_Walker_(Computer_Vision)
==>List_of_algorithms
==>Segmentation_(image_processing)
==>Watershed_(image_processing)
==>Random_walker_(computer_vision)
==>Random_Walker_(computer_vision)

Graph OLTP

gremlin> g.V('name','Learning').out.out.out.out[0..10].name
==>Latium
==>Roman_Kingdom
==>Roman_Republic
==>Roman_Empire
==>Middle_Ages
==>Early_modern_Europe
==>Armenian_Kingdom_of_Cilicia
==>Lingua_franca
==>Vatican_City
==>Vulgar_Latin
==>Romance_languages

aureliusgraphs@googlegroups.com

The Graph Landscape

[Figure axes: Speed of Traversal/Process, Size of Graph. Illustration only, not to scale]

TINKERPOP.COM

Thanks!

Vadas Gintautas @vadasg

Marko Rodriguez @twarko

Stephen Mallette @spmallette

Daniel LaRocque

We are Hiring
