# Lighthouse: Large-scale graph pattern matching on Giraph

• Lighthouse: Large-scale graph pattern matching on Giraph

• Timeline
  - Inspired by Google Pregel (2010)
  - Donated to ASF by Yahoo! in 2011
  - Top-level project in 2012
  - 1.0 release in January 2013
  - 1.1 release in November 2014
  - Used at Facebook, LinkedIn, Yahoo!

• Vertex-centric API

[Figure: example graph with vertex values shown in iteration i and iteration i+1]
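
The vertex-centric model can be sketched in a few lines of Python. This is a minimal single-process simulation, not Giraph's Java API: the names `compute` and `run_supersteps`, the toy ring graph, and the choice of max-value propagation as the example are all assumptions made for illustration. Each vertex reads the messages sent to it in the previous iteration, updates its value, and sends messages that become visible only in the next iteration.

```python
from collections import defaultdict

def compute(value, messages):
    """User-defined vertex function: keep the largest value seen so far.

    Returns (new_value, changed); changed tells the runner whether the
    vertex should notify its neighbors in this superstep.
    """
    best = max([value] + messages)
    return best, best != value

def run_supersteps(values, edges, max_supersteps=50):
    """Run compute() on every vertex with mail until no messages are in flight."""
    values = dict(values)
    # Superstep 0: every vertex announces its initial value to its neighbors.
    inbox = defaultdict(list)
    for v, val in values.items():
        for n in edges.get(v, []):
            inbox[n].append(val)
    supersteps = 0
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            new_value, changed = compute(values[v], messages)
            values[v] = new_value
            if changed:
                for n in edges.get(v, []):
                    outbox[n].append(new_value)
        inbox = outbox  # messages are delivered in the next superstep
        supersteps += 1
    return values, supersteps

# Hypothetical 4-vertex ring: a -> b -> c -> d -> a
edges = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
values = {"a": 5, "b": 1, "c": 2, "d": 3}
final_values, steps = run_supersteps(values, edges)
```

In this sketch a vertex that receives no messages is simply skipped, which loosely mirrors the idea of inactive vertices in Pregel-style systems.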

• BSP/Pregel implementation

[Figure: processing units PU 1 to PU 5 shown across iteration i and iteration i+1]

• Architecture

[Figure: architecture diagram. Labels: Worker 1, Worker 2, ..., Worker N, and Master, each with a Netty endpoint; ZooKeeper; Master Coordinator; Vertices; Message Inbox; Message Outbox]
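
The roles of the vertices, message inbox, and message outbox in the diagram can be illustrated with a small Python sketch. It is an in-memory stand-in under stated assumptions, not Giraph code: the `Worker` class, the modulo partitioning rule in `owner`, and the `exchange` function (which replaces the Netty transport with a plain loop) are hypothetical names introduced here.

```python
class Worker:
    """Holds one partition of the graph plus its message buffers."""
    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.vertices = {}  # vertex id -> vertex value
        self.inbox = {}     # vertex id -> messages to process this superstep
        self.outbox = []    # (target vertex id, message) pairs produced this superstep

def owner(vertex_id, num_workers):
    """Assumed partitioning rule: integer vertex ids modulo the worker count."""
    return vertex_id % num_workers

def load(workers, vertex_values):
    """Place every vertex on the worker that owns it."""
    for vid, value in vertex_values.items():
        workers[owner(vid, len(workers))].vertices[vid] = value

def exchange(workers):
    """Stand-in for the network layer: move every outbox entry into the
    inbox of the worker that owns the target vertex."""
    for w in workers:
        w.inbox = {}
    for w in workers:
        for target, message in w.outbox:
            dest = workers[owner(target, len(workers))]
            dest.inbox.setdefault(target, []).append(message)
        w.outbox = []

workers = [Worker(i) for i in range(3)]
load(workers, {0: "v0", 1: "v1", 2: "v2", 3: "v3", 4: "v4"})
workers[0].outbox.append((4, "hello from 0"))
workers[1].outbox.append((4, "hello from 1"))
workers[2].outbox.append((3, "hello from 2"))
exchange(workers)
```

After `exchange`, the two messages for vertex 4 sit in the inbox of worker 1 and the message for vertex 3 sits in the inbox of worker 0, ready for the next superstep.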

• Lighthouse

• Giraph execution algebra

Binding table: matching and potential graph patterns are stored in a table that is distributed across the messages sent between vertices.

  - Scan: starts traversals from certain vertices.
  - Select: prunes traversals based on expressions.
  - Project: adds data to the binding table.
  - Hash Join: joins paths generated from different traversals.
  - Step Join: performs a further hop in the traversal.
  - Move: continues a traversal from different vertices.
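
The operators above can be read as functions over binding-table rows. The following Python sketch is one possible centralized interpretation, not Lighthouse's implementation: the function signatures, the `_at` column that tracks where a traversal currently stands, and the toy graph (including all names and ids in it) are assumptions made for illustration.

```python
# Hypothetical toy graph: vertex id -> properties and typed outgoing edges.
GRAPH = {
    1: {"label": "Person", "firstName": "Antonio", "lastName": "Rossi",
        "browser": "Chrome", "out": [("WORK_AT", 10)]},
    2: {"label": "Person", "firstName": "Maria", "lastName": "Bianchi",
        "browser": "Firefox", "out": [("WORK_AT", 10)]},
    10: {"label": "Company", "out": [("IS_LOCATED_IN", 20)]},
    20: {"label": "Country", "out": []},
}

def scan(graph, label, var):
    """Start one traversal (one binding row) at every vertex with the label."""
    return [{"_at": vid, var: vid} for vid, v in graph.items() if v["label"] == label]

def select(rows, graph, predicate):
    """Prune traversals whose current vertex fails the predicate."""
    return [r for r in rows if predicate(graph[r["_at"]])]

def project(rows, graph, prop, column):
    """Add a property of the current vertex to each binding row."""
    return [{**r, column: graph[r["_at"]][prop]} for r in rows]

def step_join(rows, graph, edge_type, var):
    """Extend each traversal by one hop along edges of the given type."""
    return [{**r, "_at": target, var: target}
            for r in rows
            for etype, target in graph[r["_at"]]["out"] if etype == edge_type]

def move(rows, var):
    """Continue each traversal from a vertex bound earlier in the row."""
    return [{**r, "_at": r[var]} for r in rows]

def hash_join(left, right, key):
    """Join rows from two traversals that agree on the key column."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Example plan: a Chrome user named Antonio, the company, and its country.
rows = scan(GRAPH, "Person", "person")
rows = select(rows, GRAPH, lambda v: v["firstName"] == "Antonio" and v["browser"] == "Chrome")
rows = project(rows, GRAPH, "lastName", "person.lastName")
rows = step_join(rows, GRAPH, "WORK_AT", "company")
rows = step_join(rows, GRAPH, "IS_LOCATED_IN", "country")
result = [(r["person"], r["person.lastName"], r["company"], r["country"]) for r in rows]
```

In a distributed setting each row would travel inside a message to the vertex named by `_at`, which is why the table never needs to exist in one place.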

• Distributed Binding Table

[Figure: example graph in iteration i and iteration i+1, with example binding-table rows:]

V1 John VN
V4 Paul VJ
V7 Mark VL

• MATCH (person:Person {firstName:"Antonio"}) -[:WORK_AT]-> (company),
        (company) -[:IS_LOCATED_IN]-> (country)
  WHERE person.browser = "Chrome"
  RETURN person.id, person.lastName, company.id, country.id

• MATCH (person:Person) -[:WORK_AT]-> (company)
  RETURN person.id, person.birthDate, company.id

• Scan

Project

• StepJoin
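
The three steps named on these slides appear to form the plan for the preceding query (`MATCH (person:Person) -[:WORK_AT]-> (company) RETURN person.id, person.birthDate, company.id`). The Python sketch below shows how such a plan might run in two supersteps, with partial binding rows carried inside messages. The function names, the toy data, and the exact split of work between supersteps are assumptions, not details taken from Lighthouse.

```python
from collections import defaultdict

# Hypothetical toy graph.
VERTICES = {
    1: {"label": "Person", "birthDate": "1980-01-01"},
    2: {"label": "Person", "birthDate": "1990-05-17"},
    10: {"label": "Company"},
    11: {"label": "Company"},
}
EDGES = {1: [("WORK_AT", 10)], 2: [("WORK_AT", 10), ("WORK_AT", 11)]}

def superstep_0(vid, vertex):
    """Scan + Project + StepJoin at the start vertex.

    Returns a list of (target vertex, partial binding row) messages.
    """
    if vertex["label"] != "Person":       # Scan: only Person vertices start
        return []
    row = {"person.id": vid,               # Project: copy local data into the row
           "person.birthDate": vertex["birthDate"]}
    return [(target, row)                  # StepJoin: one hop along WORK_AT
            for etype, target in EDGES.get(vid, []) if etype == "WORK_AT"]

def superstep_1(vid, rows):
    """Complete each arriving row with the id of the vertex it reached."""
    return [{**row, "company.id": vid} for row in rows]

def run():
    inbox = defaultdict(list)
    for vid, vertex in VERTICES.items():
        for target, row in superstep_0(vid, vertex):
            inbox[target].append(row)
    results = []
    for vid, rows in inbox.items():
        results.extend(superstep_1(vid, rows))
    return sorted(results, key=lambda r: (r["person.id"], r["company.id"]))

results = run()
```

Each result row is assembled partly at the person vertex and partly at the company vertex, so no single vertex ever needs to see the whole table.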

• Cypher path queries

Desired functionality:
  - weighted shortest paths
  - multiple sources and destinations
  - top N shortest paths for each pair
  - provide both paths and their costs
  - restrict search to a subset of the graph

Restrictions:
  - monotonic cost function
  - path-independent, local vertex/edge restrictions

• Proposal

MATCH p = (a:Start) -[e* | not(endNode(e).danger)]-> (b:Finish)
CHEAPEST 3 SUM e.distance * e.maxSpeed AS length
RETURN a, b, p, length

Features:
  - Selector applied before the WHERE condition (optional)
  - Number of paths for each pair, e.g. 3 (optional)
  - User-defined cost function (required)
  - AS keyword to bind the path cost to a variable (optional)
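
To make the intended result of such a query concrete, here is a small centralized Python sketch of "the k cheapest paths per pair under a user-defined cost and a local filter". It is a reference interpretation under assumptions, not the proposed Cypher semantics or the Giraph implementation: `cheapest_paths` and its parameters are hypothetical, costs are assumed non-negative, and paths are allowed to revisit vertices. It uses a best-first search over a heap, so paths are produced in order of non-decreasing cost.

```python
import heapq

def cheapest_paths(edges, sources, targets, k, cost, allowed):
    """Return up to k cheapest paths for each (source, target) pair.

    edges:   dict vertex -> list of (neighbor, edge properties)
    cost:    function(edge properties) -> non-negative number
    allowed: function(neighbor vertex) -> bool, a path-independent filter
    """
    results = {}
    for s in sources:
        pops = {}            # how many times each vertex has been settled
        heap = [(0, [s])]    # (cost so far, path)
        while heap:
            total, path = heapq.heappop(heap)
            u = path[-1]
            pops[u] = pops.get(u, 0) + 1
            if pops[u] > k:
                continue     # this vertex already has its k cheapest paths
            if u in targets and u != s:
                results.setdefault((s, u), []).append((total, path))
            for v, props in edges.get(u, []):
                if allowed(v):
                    heapq.heappush(heap, (total + cost(props), path + [v]))
    return results

# Hypothetical road network; X is marked as dangerous.
edges = {
    "A": [("B", {"distance": 1, "maxSpeed": 1}), ("C", {"distance": 2, "maxSpeed": 1})],
    "B": [("D", {"distance": 1, "maxSpeed": 2}), ("X", {"distance": 1, "maxSpeed": 1})],
    "C": [("D", {"distance": 2, "maxSpeed": 1})],
    "X": [("D", {"distance": 1, "maxSpeed": 1})],
}
danger = {"X"}
found = cheapest_paths(
    edges, sources=["A"], targets={"D"}, k=3,
    cost=lambda e: e["distance"] * e["maxSpeed"],   # cost expression from the slide
    allowed=lambda v: v not in danger,              # mirrors the end-node filter
)
```

Only two paths from A to D survive the filter here, so asking for three returns two.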

• Giraph implementation

Two phases:
  - First phase: we compute the routes of each of the top K shortest paths. Each vertex discovers and registers its preceding vertex in the shortest paths (similar to Pregel BFS).
  - Second phase: starting from the leaves, we traverse the structure backwards, building the paths.
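
The two phases can be illustrated with a deliberately simplified Python sketch. It assumes K = 1, a single source, positive edge weights, and a single-process superstep loop, so it shows only the predecessor-registration idea rather than the top-K implementation described on the slide. The names `phase_one` and `phase_two` and the toy graph are hypothetical.

```python
from collections import defaultdict

def phase_one(edges, source):
    """Superstep loop: each vertex keeps its best cost and the predecessor
    that offered it, and forwards improvements to its out-neighbors."""
    best, pred = {}, {}
    inbox = {source: [(0, None)]}
    while inbox:
        outbox = defaultdict(list)
        for v, offers in inbox.items():
            cost, sender = min(offers, key=lambda o: o[0])
            if cost >= best.get(v, float("inf")):
                continue                    # no improvement: stay silent
            best[v], pred[v] = cost, sender  # register the preceding vertex
            for n, weight in edges.get(v, []):
                outbox[n].append((cost + weight, v))
        inbox = outbox                       # delivered in the next superstep
    return best, pred

def phase_two(pred, target):
    """Walk the predecessor links back from the target, then reverse them."""
    if target not in pred:
        return None
    path = []
    v = target
    while v is not None:
        path.append(v)
        v = pred[v]
    return path[::-1]

# Hypothetical weighted graph: vertex -> list of (neighbor, weight).
edges = {"s": [("a", 1), ("b", 4)], "a": [("b", 1), ("t", 5)], "b": [("t", 1)]}
best, pred = phase_one(edges, "s")
path = phase_two(pred, "t")
```

Extending this to K paths would require each vertex to keep several (cost, predecessor) entries instead of one, which is where the real implementation is more involved.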

• Preliminary results

• Thanks.