Transcript
Page 1: Lighthouse: Large-scale graph pattern matching on Giraph

Lighthouse: Large-scale graph pattern matching on Giraph

Page 2: Lighthouse: Large-scale graph pattern matching on Giraph

Page 3: Lighthouse: Large-scale graph pattern matching on Giraph

Timeline

• Inspired by Google Pregel (2010)

• Donated to ASF by Yahoo! in 2011

• Top-level project in 2012

• 1.0 release in January 2013

• 1.1 release in November 2014

• Used at Facebook, LinkedIn, Yahoo!

Page 4: Lighthouse: Large-scale graph pattern matching on Giraph

Vertex-centric API

[Figure: vertex-centric computation, with vertex values (5, 2, 3, ?) shown at Iteration i and Iteration i+1]

Page 5: Lighthouse: Large-scale graph pattern matching on Giraph

BSP/Pregel implementation

[Figure: processing units PU 1 to PU 5 executing Iteration i and then Iteration i+1]
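The iteration-by-iteration model on the two slides above can be sketched in a few lines of Python. This is only an illustration of the general BSP/Pregel idea, not Giraph code: the function name `run_supersteps`, the dictionary-based graph and the "propagate the maximum value" task are all assumptions chosen for the example.

```python
from collections import defaultdict

def run_supersteps(values, edges, max_supersteps=30):
    """Propagate the maximum vertex value through the graph, Pregel style.

    values: vertex -> initial value; edges: vertex -> list of out-neighbours.
    Hypothetical helper for illustration only.
    """
    values = dict(values)
    # First iteration: every vertex sends its value to its out-neighbours.
    inbox = defaultdict(list)
    for vertex, value in values.items():
        for neighbour in edges.get(vertex, []):
            inbox[neighbour].append(value)
    supersteps = 1
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        # Every vertex with pending messages runs the same compute step.
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                for neighbour in edges.get(vertex, []):
                    outbox[neighbour].append(best)
            # Otherwise the vertex sends nothing (it "votes to halt").
        # Barrier: messages sent in iteration i are only read in iteration i+1.
        inbox = outbox
        supersteps += 1
    return values, supersteps

final_values, _ = run_supersteps(
    {"a": 3, "b": 6, "c": 2},
    {"a": ["b"], "b": ["c"], "c": ["a"]},
)
```

The loop stops once no vertex sends a message, which corresponds to all vertices having halted.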

Page 6: Lighthouse: Large-scale graph pattern matching on Giraph

Architecture

[Figure: architecture diagram. Labels: Master Coordinator; Zookeeper; Worker 1, Worker 2, ..., Worker N and Master, each with a Netty endpoint; inside a worker: compute threads, vertices, message inbox, message outbox; Hadoop File System (HDFS).]

Page 7: Lighthouse: Large-scale graph pattern matching on Giraph

Lighthouse

Page 8: Lighthouse: Large-scale graph pattern matching on Giraph

Giraph execution algebra

Binding Table. Matching and potential graph patterns are stored in a table that is distributed across the messages sent around by vertices.

• Scan: starts traversals from certain vertices.

• Select: prunes traversals based on expressions.

• Project: adds data to the binding table.

• Hash Join: joins paths generated from different traversals.

• Step Join: performs a further hop in the traversal.

• Move: continues a traversal from different vertices.
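To make the operators above concrete, here is a small single-process sketch in which a binding table is just a list of rows (dictionaries). It is not Lighthouse code: the function names, the toy graph and its property values are invented for illustration, the table is not distributed over messages, and Move is left out because it only matters when rows travel between vertices.

```python
# Toy property graph; all data here is made up for illustration.
VERTICES = {
    1: {"label": "Person", "firstName": "John", "birthDate": "1980-01-01"},
    2: {"label": "Person", "firstName": "Paul", "birthDate": "1990-05-17"},
    3: {"label": "Company", "name": "Acme"},
}
EDGES = [(1, "WORK_AT", 3), (2, "WORK_AT", 3)]

def scan(var, label):
    """Scan: start one traversal (one row) per vertex with the given label."""
    return [{var: vid} for vid, props in VERTICES.items() if props["label"] == label]

def select(rows, predicate):
    """Select: prune rows that do not satisfy the expression."""
    return [row for row in rows if predicate(row)]

def project(rows, var, prop):
    """Project: add a property of the vertex bound to `var` to each row."""
    return [{**row, f"{var}.{prop}": VERTICES[row[var]][prop]} for row in rows]

def step_join(rows, src_var, edge_label, dst_var):
    """Step Join: extend each row by one hop along edges with `edge_label`."""
    out = []
    for row in rows:
        for src, label, dst in EDGES:
            if src == row[src_var] and label == edge_label:
                out.append({**row, dst_var: dst})
    return out

def hash_join(left, right, var):
    """Hash Join: combine rows from two traversals that agree on `var`."""
    index = {}
    for row in right:
        index.setdefault(row[var], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[var], [])]

# One possible plan for:
#   MATCH (person:Person) -[:WORK_AT]-> (company)
#   RETURN person.id, person.birthDate, company.id
rows = scan("person", "Person")
rows = project(rows, "person", "birthDate")
rows = step_join(rows, "person", "WORK_AT", "company")
```

Each row of `rows` now binds `person`, `person.birthDate` and `company`; `select` and `hash_join` can be chained in the same way for queries that filter or combine several traversals.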

Page 9: Lighthouse: Large-scale graph pattern matching on Giraph

Distributed Binding Table

[Figure: binding-table rows (V1 John … VN; V4 Paul … VJ; V7 Mark … VL) attached to the messages exchanged between Iteration i and Iteration i+1]

Page 10: Lighthouse: Large-scale graph pattern matching on Giraph

MATCH (person:Person {firstName:"Antonio"}) -[:WORK_AT]-> (company),
      (company) -[:IS_LOCATED_IN]-> (country)
WHERE person.browser = "Chrome"
RETURN person.id, person.lastName, company.id, country.id

Page 11: Lighthouse: Large-scale graph pattern matching on Giraph

MATCH (person:Person) -[:WORK_AT]-> (company)
RETURN person.id, person.birthDate, company.id

Page 12: Lighthouse: Large-scale graph pattern matching on Giraph

Scan

Project

Page 13: Lighthouse: Large-scale graph pattern matching on Giraph

StepJoin

Page 14: Lighthouse: Large-scale graph pattern matching on Giraph

Cypher path-queries

Desired functionality:

• weighted shortest paths

• multiple sources and destinations

• top N shortest paths for each pair

• provide both paths and their costs

• restrict search to a subset of the graph

Restrictions:

• Monotonic cost function

• Path-independent local vertex/edge restrictions

Page 15: Lighthouse: Large-scale graph pattern matching on Giraph

Proposal

MATCH p = (a:Start) -[e* | not(endNode(e)).danger ]-> (b:Finish)
CHEAPEST 3 SUM e.distance * e.maxSpeed AS length
RETURN a, b, path, length

Features:

• Selector applied before WHERE condition (optional)

• Number of paths for each pair (e.g. 3) (optional)

• User-defined cost function (required)

• AS keyword to bind distance to variable (optional)
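The intended result of such a query can be illustrated with a naive, sequential reference sketch. It is not the Giraph implementation and not an existing Cypher feature: `cheapest_paths`, the edge dictionaries and the `dst_danger` flag are hypothetical names, and the sketch assumes non-negative edge costs (a monotonic cost function) and simple paths with no repeated vertex.

```python
import heapq
from itertools import count

def cheapest_paths(edges, start, finish, n, cost, allowed=lambda e: True):
    """Return up to n cheapest simple paths as (total_cost, [vertices]).

    Best-first enumeration: with non-negative edge costs, paths leave the
    heap in non-decreasing order of total cost. Exponential in the worst
    case, so only suitable as a reference on small graphs.
    """
    adjacency = {}
    for e in edges:
        if allowed(e):  # selector: discard edges before the search starts
            adjacency.setdefault(e["src"], []).append(e)
    tie = count()  # tie-breaker so the heap never has to compare paths
    heap = [(0, next(tie), [start])]
    results = []
    while heap and len(results) < n:
        total, _, path = heapq.heappop(heap)
        if path[-1] == finish:
            results.append((total, path))
            continue
        for e in adjacency.get(path[-1], []):
            if e["dst"] not in path:  # keep paths simple
                heapq.heappush(heap, (total + cost(e), next(tie), path + [e["dst"]]))
    return results

def edge(src, dst, distance, max_speed, dst_danger=False):
    """Build one edge record; the field names are assumptions for this sketch."""
    return {"src": src, "dst": dst, "distance": distance,
            "maxSpeed": max_speed, "dst_danger": dst_danger}

ROADS = [
    edge("A", "B", 1, 2), edge("B", "D", 1, 1),
    edge("A", "C", 2, 2), edge("C", "D", 1, 1),
    edge("A", "D", 3, 3),
    edge("A", "X", 1, 1, dst_danger=True), edge("X", "D", 1, 1),
]

top3 = cheapest_paths(
    ROADS, "A", "D", 3,
    cost=lambda e: e["distance"] * e["maxSpeed"],  # SUM e.distance * e.maxSpeed
    allowed=lambda e: not e["dst_danger"],         # selector on the end node
)
```

Here the route through `X` would be the cheapest, but the selector removes it before any cost is computed, which mirrors the "selector applied before WHERE" feature.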

Page 16: Lighthouse: Large-scale graph pattern matching on Giraph

Giraph implementation

Two phases:

• First phase: we compute the routes of the top K shortest paths. Each vertex discovers and registers its preceding vertex on each shortest path (similar to Pregel BFS).

• Second phase: starting from the “leaves”, we traverse the structure backwards, building the paths.
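A much simplified sketch of the two phases, for a single source and K = 1 only, could look as follows. It simulates supersteps in one Python process; `phase_one`, `phase_two` and the adjacency-list format are assumptions for illustration, edge weights are assumed non-negative, and the real implementation would keep up to K predecessors per vertex and run the backtracking as messages as well.

```python
from collections import defaultdict

def phase_one(edges, source):
    """Phase 1: each vertex registers its best cost and preceding vertex.

    edges: vertex -> list of (neighbour, weight). Returns
    vertex -> (cost, predecessor) for every reachable vertex.
    """
    best = {source: (0, None)}
    inbox = defaultdict(list)
    for dst, weight in edges.get(source, []):
        inbox[dst].append((weight, source))
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            cost, pred = min(messages, key=lambda m: m[0])
            if vertex not in best or cost < best[vertex][0]:
                best[vertex] = (cost, pred)  # register the predecessor
                for dst, weight in edges.get(vertex, []):
                    outbox[dst].append((cost + weight, vertex))
        inbox = outbox  # superstep barrier
    return best

def phase_two(best, target):
    """Phase 2: walk back from the target through registered predecessors.

    Each loop iteration stands for one superstep in which the partial
    path is handed to the preceding vertex.
    """
    if target not in best:
        return None
    path, vertex = [], target
    while vertex is not None:
        path.append(vertex)
        vertex = best[vertex][1]
    return best[target][0], path[::-1]

GRAPH = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 1), ("D", 5)],
    "C": [("D", 1)],
}
registered = phase_one(GRAPH, "A")
shortest = phase_two(registered, "D")
```

A vertex may register a predecessor and later replace it when a cheaper message arrives; only the state left when no more messages are sent is used by the second phase.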

Page 17: Lighthouse: Large-scale graph pattern matching on Giraph

Preliminary results

Page 18: Lighthouse: Large-scale graph pattern matching on Giraph

Thanks.

