Lighthouse: Large-scale graph pattern matching on Giraph


  • Lighthouse: Large-scale graph pattern matching on Giraph


  • Timeline

    Inspired by Google Pregel (2010)
    Donated to ASF by Yahoo! in 2011
    Top-level project in 2012
    1.0 release in January 2013
    1.1 release in November 2014
    Used at Facebook, LinkedIn, Yahoo!


  • Vertex-centric API

    [Figure: a vertex and its neighbours, with vertex values shown at iteration i and iteration i+1]

  • BSP/Pregel implementation

    [Figure: processing units PU 1 to PU 5 running iteration i, then iteration i+1]
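
    The superstep model named on this slide can be illustrated with a small, sequential simulation. This is only a sketch under stated assumptions: it is plain Python rather than Giraph's Java API, the function name `run_bsp` and the max-value propagation task are chosen here for illustration, and every vertex is assumed to have an initial value.

```python
from collections import defaultdict

def run_bsp(values, edges, max_supersteps=50):
    """Propagate the maximum value through a graph, one superstep at a time.

    values: vertex -> initial value (hypothetical input format)
    edges:  vertex -> list of out-neighbours
    """
    values = dict(values)
    # Superstep 0: every vertex sends its value to its out-neighbours.
    inbox = defaultdict(list)
    for v, val in values.items():
        for n in edges.get(v, []):
            inbox[n].append(val)
    supersteps = 1
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        # A vertex only sees messages that were sent in the previous superstep.
        for v, msgs in inbox.items():
            best = max(msgs)
            if best > values[v]:
                values[v] = best
                for n in edges.get(v, []):
                    outbox[n].append(best)
            # Otherwise the vertex stays silent, which plays the role of voting to halt.
        inbox = outbox  # barrier: new messages become visible in the next superstep
        supersteps += 1
    return values, supersteps
```

    The loop stops once no vertex sends a message, which mirrors the usual Pregel termination condition.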


  • Architecture

    [Figure: architecture diagram. Labels: Worker 1, Worker 2, ..., Worker N and Master, each with Netty; compute threads, vertices, message inbox, message outbox; Zookeeper; master coordinator; Hadoop File System (HDFS)]

  • Lighthouse

  • Giraph execution algebra

    Binding table: matching and potential graph patterns are stored in a table that is distributed across the messages sent around by vertices.

    Scan: starts traversals from certain vertices.
    Select: prunes traversals based on expressions.
    Project: adds data to the binding table.
    Hash Join: joins paths generated from different traversals.
    Step Join: performs a further hop in the traversal.
    Move: continues a traversal from different vertices.
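
    To make the operators above concrete, here is a minimal sequential sketch of four of them (Scan, Select, Project, Step Join) acting on binding rows. It is not Lighthouse code: the toy graph, the row layout (`at` for the current vertex, `row` for the bound columns) and all function names are assumptions made for this example, and the rows that would travel inside Giraph messages are simply held in Python lists.

```python
# Hypothetical toy graph: vertex id -> properties, and labelled out-edges.
VERTICES = {
    1: {"label": "Person", "firstName": "John", "browser": "Chrome"},
    2: {"label": "Person", "firstName": "Paul", "browser": "Firefox"},
    3: {"label": "Company", "name": "Acme"},
}
EDGES = {1: [("WORK_AT", 3)], 2: [("WORK_AT", 3)], 3: []}

def scan(label):
    """Scan: start one traversal (one empty binding row) at each vertex with the label."""
    return [{"at": v, "row": {}} for v, p in VERTICES.items() if p["label"] == label]

def select(bindings, predicate):
    """Select: drop traversals whose current vertex fails the predicate."""
    return [b for b in bindings if predicate(VERTICES[b["at"]])]

def project(bindings, column, fn):
    """Project: add a column computed at the current vertex to each row."""
    return [
        {"at": b["at"], "row": {**b["row"], column: fn(b["at"], VERTICES[b["at"]])}}
        for b in bindings
    ]

def step_join(bindings, edge_label):
    """Step Join: follow matching out-edges, carrying the row to the next vertex."""
    return [
        {"at": dst, "row": dict(b["row"])}
        for b in bindings
        for lbl, dst in EDGES[b["at"]]
        if lbl == edge_label
    ]

def run_plan():
    """One possible operator pipeline for a person-works-at-company pattern."""
    t = scan("Person")
    t = select(t, lambda p: p["browser"] == "Chrome")
    t = project(t, "person.id", lambda vid, p: vid)
    t = step_join(t, "WORK_AT")
    t = project(t, "company.id", lambda vid, p: vid)
    return [b["row"] for b in t]
```

    In a vertex-centric setting each row would be processed by the vertex named in `at`, so a Step Join corresponds to sending the row along an edge.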


  • Distributed Binding Table

    [Figure: the vertex diagram for iteration i and iteration i+1, with binding-table rows attached to the messages]

    V1  John  VN
    V4  Paul  VJ
    V7  Mark  VL

  • MATCH (person:Person {firstName:"Antonio"}) -[:WORK_AT]-> (company),
          (company) -[:IS_LOCATED_IN]-> (country)
    WHERE person.browser = "Chrome"
    RETURN person.id, person.lastName, company.id, country.id
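
    The query above has two path patterns that share the `company` variable. One plausible way to combine them, and not necessarily the plan Lighthouse chooses, is a hash join on that shared column. The sketch below assumes binding rows are plain dictionaries; the function name `hash_join` and the sample rows are invented for illustration.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Join two lists of binding rows (dicts) on a shared column."""
    buckets = defaultdict(list)
    for row in left:                      # build side: index rows by the join key
        buckets[row[key]].append(row)
    joined = []
    for row in right:                     # probe side: look up matching rows
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined

# Hypothetical partial results of the two traversals.
works_at = [
    {"person.id": 1, "person.lastName": "Rossi", "company.id": 10},
    {"person.id": 2, "person.lastName": "Bianchi", "company.id": 11},
]
located_in = [
    {"company.id": 10, "country.id": 100},
    {"company.id": 12, "country.id": 101},
]
result = hash_join(works_at, located_in, "company.id")
```

    Only rows whose `company.id` appears in both partial results survive the join.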


  • MATCH (person:Person) -[:WORK_AT]-> (company)
    RETURN person.id, person.birthDate, company.id


  • Scan

    Project

  • StepJoin


  • Cypher path queries

    Desired functionality:
      weighted shortest paths
      multiple sources and destinations
      top N shortest paths for each pair
      provide both paths and their costs
      restrict search to a subset of the graph

    Restrictions:
      Monotonic cost function
      Path-independent local vertex/edge restrictions


  • Proposal

    MATCH p = (a:Start) -[e* | not(endNode(e)).danger ]-> (b:Finish)
    CHEAPEST 3 SUM e.distance * e.maxSpeed AS length
    RETURN a, b, path, length

    Features:
      Selector applied before the WHERE condition (optional)
      Number of paths for each pair, e.g. 3 (optional)
      User-defined cost function (required)
      AS keyword to bind the distance to a variable (optional)
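
    The semantics of the proposed clause can be approximated on a single machine with a best-first search. The sketch below is an assumption-laden illustration, not the Giraph implementation: `cheapest_paths`, its parameters and the sample graph are invented here, the cost function must be non-negative so that path length grows monotonically, and the vertex restriction is checked locally on each edge's end node.

```python
import heapq

def cheapest_paths(edges, source, target, k, cost, allowed=lambda v: True):
    """Return up to k cheapest (length, path) pairs from source to target.

    edges:   vertex -> list of (neighbour, edge_properties)
    cost:    edge_properties -> non-negative number
    allowed: local restriction applied to the end node of every edge
    """
    heap = [(0, [source])]
    pops = {}            # how many times each vertex has been expanded
    found = []
    while heap and len(found) < k:
        length, path = heapq.heappop(heap)
        v = path[-1]
        pops[v] = pops.get(v, 0) + 1
        if v == target:
            found.append((length, path))
            continue
        if pops[v] > k:
            continue     # k cheaper partial paths already passed through v
        for n, props in edges.get(v, []):
            if allowed(n):
                heapq.heappush(heap, (length + cost(props), path + [n]))
    return found

# Hypothetical road graph; vertex "X" plays the role of a node with danger = true.
roads = {
    "A": [("B", {"distance": 1, "maxSpeed": 2}),
          ("C", {"distance": 2, "maxSpeed": 2}),
          ("X", {"distance": 1, "maxSpeed": 1})],
    "B": [("D", {"distance": 1, "maxSpeed": 2})],
    "C": [("D", {"distance": 1, "maxSpeed": 2})],
    "X": [("D", {"distance": 1, "maxSpeed": 1})],
}
top3 = cheapest_paths(
    roads, "A", "D", k=3,
    cost=lambda e: e["distance"] * e["maxSpeed"],
    allowed=lambda v: v != "X",
)
```

    Here the route through "X" would be the cheapest, but the restriction removes it before costs are compared, which matches the idea of applying the selector before any WHERE condition.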


  • Giraph implementation

    Two phases:

    First phase: we compute the routes of the top K shortest paths. Each vertex discovers and registers its preceding vertex on the shortest paths (similar to Pregel BFS).

    Second phase: starting from the leaves, we traverse the structure backwards, building the paths.
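
    The two phases can be sketched sequentially as follows. This is a simplification under stated assumptions: it keeps only the single cheapest predecessor per vertex (K = 1) instead of the top K, edge weights are non-negative numbers, and the names `phase_one` and `phase_two` are invented for this example.

```python
def phase_one(edges, source):
    """Superstep loop: each vertex registers its cheapest cost and predecessor.

    edges: vertex -> list of (neighbour, weight)
    Returns vertex -> (cost, predecessor).
    """
    best = {source: (0, None)}
    frontier = {source}
    while frontier:
        # Send step: every vertex updated in the last superstep messages its neighbours.
        messages = {}
        for v in frontier:
            for n, w in edges.get(v, []):
                messages.setdefault(n, []).append((best[v][0] + w, v))
        # Receive step: a vertex keeps the cheapest offer if it improves on its record.
        frontier = set()
        for n, msgs in messages.items():
            cost, pred = min(msgs)
            if n not in best or cost < best[n][0]:
                best[n] = (cost, pred)
                frontier.add(n)
    return best

def phase_two(best, target):
    """Walk the predecessor links back from the target and reverse them into a path."""
    if target not in best:
        return None
    path, v = [], target
    while v is not None:
        path.append(v)
        v = best[v][1]
    return best[target][0], path[::-1]

graph = {"A": [("B", 1), ("C", 5)], "B": [("C", 1)], "C": [("D", 1)]}
shortest = phase_two(phase_one(graph, "A"), "D")
```

    Extending the sketch to top K would mean storing up to K (cost, predecessor) entries per vertex in the first phase and branching over them in the second.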


  • Preliminary results


  • Thanks.