GraphConnect 2014 SF: From Zero to Graph in 120: Scale

Scaling Neo4j Applications

SAN FRANCISCO | 10.22.2014

@iansrobinson  

The Burden of Success

• More users
• Larger datasets
• More concurrent requests
• More complex queries

Scaling is a Feature

• It doesn't come for free
• Conditions of success:
  – Understand current needs
    • Design for an order of magnitude growth
  – Iterative and incremental development
  – Unit tests
    • Bedrock of asserted behaviour
  – Performance tests

Overview

• Scaling Reads
  – Latency
  – Throughput
• Scaling Writes
• Hardware

Scaling Reads - Latency

Query Latency

latency = f(search_area)

Search Area

search_area = f(domain_invariants)

Absolute: Every user has 50 friends
Relative: Every user is friends with 10% of the user base
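The difference between the two kinds of invariant can be made concrete with a rough sketch. It assumes a friends-of-friends query whose search area is about (friends per user) raised to the traversal depth, ignoring overlap between friend lists. That cost model and the function names are illustrative assumptions, not something stated in the talk.

```python
def search_area_absolute(user_count: int, friends_per_user: int = 50, depth: int = 2) -> int:
    """Search area when every user has a fixed number of friends.

    user_count is deliberately unused: with an absolute invariant the
    search area does not depend on the size of the user base.
    """
    return friends_per_user ** depth


def search_area_relative(user_count: int, percent: int = 10, depth: int = 2) -> int:
    """Search area when every user is friends with a percentage of all users."""
    friends_per_user = user_count * percent // 100
    return friends_per_user ** depth


# Compare how the two invariants behave as the user base grows.
for users in (1_000, 10_000, 100_000):
    print(users, search_area_absolute(users), search_area_relative(users))
```

Under these assumptions the absolute invariant keeps the search area, and so the latency, flat as the user base grows, while the relative invariant makes it grow with the square of the user base.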

Reducing Read Latency

• The Blackadder solution
• Improve the Cypher query
• Change the model
• Use an Unmanaged Extension

Improve Cypher Query

• Small queries, separated by WITH
• Start from low-cardinality nodes

http://thought-bytes.blogspot.co.uk/2013/01/optimizing-neo4j-cypher-queries.html
http://wes.skeweredrook.com/pragmatic-cypher-optimization-2-0-m06/

Change the Model

Goal: Do less work (in the query)
– By exploring less of the graph

How? Identify inferred relationships
– Replace with use-case specific shortcuts

Change the Model - From

MATCH (:Person{username:'ben'})
      -[:WORKED_ON]->(:Project)<-[:WORKED_ON]-
      (colleague:Person)

Change the Model - To

MATCH (:Person{username:'ben'})
      -[:WORKED_WITH]-
      (colleague:Person)

Tradeoff

More expensive writes, more data
Cheaper reads

When to add the new relationship?
• With tx
• Queue for subsequent tx
• Periodic/batch

Refactor Existing Data

MATCH (p1:Person)
      -[:WORKED_ON]->(:Project)<-[:WORKED_ON]-
      (p2:Person)
WHERE NOT ((p1)-[:WORKED_WITH]-(p2))
WITH DISTINCT p1, p2 LIMIT 10
MERGE (p1)-[r:WORKED_WITH]-(p2)
RETURN count(r)

• Select batch: MATCH ... WHERE NOT ...
• Batch size: LIMIT 10
• Add new relationship: MERGE
• Continue while count(r) > 0
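The "continue while count(r) > 0" step implies a driver loop around the statement. Here is a minimal sketch of that loop. `run_batch` is a hypothetical callable that executes the statement once and returns `count(r)`. To keep the sketch runnable without a database, an in-memory list of pending pairs stands in for Neo4j.

```python
from typing import Callable


def refactor_in_batches(run_batch: Callable[[], int], max_batches: int = 1_000_000) -> int:
    """Call run_batch until it reports zero new relationships; return the total."""
    total = 0
    for _ in range(max_batches):  # upper bound guards against a runaway loop
        created = run_batch()
        if created == 0:
            break
        total += created
    return total


# Stand-in for the database: 25 colleague pairs still missing WORKED_WITH.
pending_pairs = [("p%d" % i, "q%d" % i) for i in range(25)]


def fake_run_batch(batch_size: int = 10) -> int:
    """Hypothetical stand-in for executing the Cypher statement once."""
    batch = pending_pairs[:batch_size]
    del pending_pairs[:batch_size]
    return len(batch)


total_created = refactor_in_batches(fake_run_batch)
print(total_created)
```

Each call runs in its own transaction, so the batch size (LIMIT 10 on the slide) bounds how much work a single transaction does.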

Use Unmanaged Extensions

REST API: /db/data/cypher
Extensions: /my-extension/service

RESTful Resource

import java.io.IOException;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
// Imports for ObjectMapper (Jackson), CypherExecutor and ColleagueFinder
// are omitted: the slides do not show them.

// JAX-RS annotations: requests to /similar-skills/{name} reach getColleagues().
@Path("/similar-skills")
public class ColleagueFinderExtension {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private final ColleagueFinder colleagueFinder;

    // Inject database/Cypher execution engine: the server supplies it via @Context.
    public ColleagueFinderExtension(@Context CypherExecutor cypherExecutor) {
        this.colleagueFinder = new ColleagueFinder(cypherExecutor.getExecutionEngine());
    }

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    @Path("/{name}")
    public Response getColleagues(@PathParam("name") String name) throws IOException {
        String json = MAPPER.writeValueAsString(colleagueFinder.findColleaguesFor(name));
        return Response.ok().entity(json).build();
    }
}

1. Get Close to the Data

[Diagram: the Application sends one request; the extension runs MATCH, MATCH, CREATE, DELETE, MERGE, MATCH]

Single request, many operations
– Reduce network latencies

2. Multiple Implementation Options

[Diagram: REST API and Extensions]

• Cypher
• Traversal Framework
• Graph Algo Package
• Core API

3. Control Request/Response Format

{ users: [ { id: 1234}, { id: 9876} ] }

1a 03 08 96 01

JSON, CSV, protobuf, etc

Domain-specific representations
– Compact
– Conserve bandwidth

4. Control HTTP Headers

[Diagram: Application and Reverse Proxy]

GET /my-extension/service/top-10

HTTP/1.1 200 OK
Cache-Control: max-age=60

5. Integrate with Backend Systems

[Diagram: Application; REST API and Extensions; RDBMS; LDAP]

Migrating to Extensions

• Re-implement original query inside extension
• Modify request/response formats and headers
• Refactor implementation to use lower parts of the stack where necessary
• Measure, measure, measure

Scaling Reads - Throughput

Scale Horizontally For High Read Throughput

[Diagram: Application; Read Load Balancer; Write Load Balancer; cluster of Master, Slave, Slave]

Configure HAProxy as Read Load Balancer

global
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    default_backend neo4j-slaves

backend neo4j-slaves
    option httpchk GET /db/manage/server/ha/slave
    server s1 10.0.1.10:7474 maxconn 32 check
    server s2 10.0.1.11:7474 maxconn 32 check
    server s3 10.0.1.12:7474 maxconn 32 check

listen admin
    bind *:8080
    stats enable

Responses to GET /db/manage/server/ha/slave:

Instance   Status          Body
Master     404 Not Found   false
Slave      200 OK          true
Unknown    404 Not Found   UNKNOWN
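The effect of the health check can be sketched as follows. The status codes are the ones listed for the /db/manage/server/ha/slave endpoint, and HAProxy's `httpchk` treats 2xx and 3xx responses as healthy. The `cluster` mapping and the function names are hypothetical, chosen only for this illustration.

```python
# Status code and body returned by GET /db/manage/server/ha/slave, per role.
SLAVE_ENDPOINT = {
    "master": (404, "false"),
    "slave": (200, "true"),
    "unknown": (404, "UNKNOWN"),
}


def passes_health_check(status_code: int) -> bool:
    """HAProxy's httpchk counts 2xx and 3xx responses as healthy."""
    return 200 <= status_code <= 399


def read_pool(cluster: dict) -> list:
    """Return the servers that would keep receiving read traffic."""
    return [
        server
        for server, role in cluster.items()
        if passes_health_check(SLAVE_ENDPOINT[role][0])
    ]


# Hypothetical cluster: s1 is currently the master, s2 and s3 are slaves.
cluster = {"s1": "master", "s2": "slave", "s3": "slave"}
print(read_pool(cluster))
```

Because the check asks each instance about its current role, the read pool follows the cluster through a master re-election without any change to the HAProxy configuration.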

This Isn't The Throughput You Were Looking For

[Diagram: Application; Load Balancer; instances 1, 2, 3]

MATCH (c:Country{name:'Australia'})...
MATCH (c:Country{name:'Zambia'})...
MATCH (c:Country{name:'Norway'})...

Cache Sharding Using Consistent Routing

[Diagram: Application; Load Balancer; instances 1, 2, 3]

Routing table:
A-I   1
J-R   2
S-Z   3

MATCH (c:Country{name:'Australia'})...
MATCH (c:Country{name:'Zambia'})...
MATCH (c:Country{name:'Norway'})...
MATCH (c:Country{name:'Zimbabwe'})...
MATCH (c:Country{name:'Japan'})...
MATCH (c:Country{name:'Brazil'})...
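The routing table can be read as a function from request key to instance. The sketch below mirrors the A-I / J-R / S-Z table for illustration only. The function name is an assumption, and a real load balancer would more likely hash the key than use alphabetic ranges.

```python
# (first letter from, first letter to, instance), as in the routing table.
ROUTES = [("A", "I", 1), ("J", "R", 2), ("S", "Z", 3)]


def route(country: str) -> int:
    """Pick the instance for a country name by its first letter."""
    first = country[:1].upper()
    for low, high, instance in ROUTES:
        if low <= first <= high:
            return instance
    raise ValueError("no route for %r" % country)


for name in ("Australia", "Zambia", "Norway", "Zimbabwe", "Japan", "Brazil"):
    print(name, route(name))
```

The same key always reaches the same instance, so each instance's cache only needs to hold the part of the graph behind its share of the keys.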

Configure HAProxy for Cache Sharding

global
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    default_backend neo4j-slaves

backend neo4j-slaves
    balance url_param country_code
    server s1 10.0.1.10:7474 maxconn 32
    server s2 10.0.1.11:7474 maxconn 32
    server s3 10.0.1.12:7474 maxconn 32

listen admin
    bind *:8080
    stats enable

Scaling Writes - Throughput

Factors Impacting Write Performance

• Managing transactional state
  – Creating and committing are expensive operations
• Contending for locks
  – Nodes and relationships

Improving Write Throughput

• Delay taking expensive locks
• Batch/queue writes

Delay Expensive Locks

• Identify contended nodes
• Involve them as late as possible in a transaction

Add Linked List Item + Update Pointers

[Diagram sequence: the item is added and the pointers are updated together; the contended node is marked Locked throughout]

Add Linked List Item, then Add Pointers

[Diagram sequence: the list item is added first; the contended node is marked Locked only at the final Add Pointers step]
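A toy model shows why the ordering matters. It assumes a write lock is taken the first time a step touches a node and is held until the transaction commits. The step names and the `list-head` node are hypothetical.

```python
def steps_holding_lock(steps, contended):
    """Count the steps during which the lock on `contended` is held.

    Model: the lock is acquired at the first step that touches the node
    and released only when the transaction commits after the last step.
    """
    for index, (_name, touched) in enumerate(steps):
        if contended in touched:
            return len(steps) - index
    return 0


# Touch the contended list head first, then do the rest of the work.
lock_early = [
    ("update pointers", {"list-head"}),
    ("create item", {"item"}),
    ("link item", {"item", "previous-item"}),
]

# Do the uncontended work first and touch the list head last.
lock_late = [
    ("create item", {"item"}),
    ("link item", {"item", "previous-item"}),
    ("update pointers", {"list-head"}),
]

print(steps_holding_lock(lock_early, "list-head"))
print(steps_holding_lock(lock_late, "list-head"))
```

In this model the reordered transaction does the same work but holds the contended lock for one step instead of three, which leaves less time for other writers to queue behind it.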

Batch Writes

• Multiple CREATE/MERGE statements per request
  – Good for integration with backend systems
• Queue
  – Good for small, online transactions

Single-Threaded Queue

[Diagram: Write, Write, Write; Queue; Single Thread; Batch]

Queue Location Options

[Diagram: two options for where the queue sits relative to the Application]

Benefits of Batched Writes

• Less transactional state management
  – Create/commit per batch rather than per write
• No contention for locks
  – No deadlocks
• Query consolidation
  – Reduce the amount of work inside the database
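A single-threaded batching writer might look like the sketch below. It is an in-memory illustration under stated assumptions. `commit_batch` is a hypothetical callable standing in for "open one transaction, apply every write in the batch, commit", and `None` is used as a shutdown sentinel.

```python
import queue
import threading


def run_writer(write_queue, commit_batch, batch_size=3):
    """Drain the queue on one thread, committing writes batch by batch."""
    batch = []
    while True:
        item = write_queue.get()
        if item is None:  # sentinel: stop after flushing what is left
            break
        batch.append(item)
        if len(batch) >= batch_size:
            commit_batch(batch)
            batch = []
    if batch:
        commit_batch(batch)


committed = []  # records each committed batch, in order
writes = queue.Queue()
writer = threading.Thread(
    target=run_writer,
    args=(writes, lambda batch: committed.append(list(batch))),
)
writer.start()

# Any number of request threads could enqueue; one thread performs the writes.
for n in range(7):
    writes.put("write-%d" % n)
writes.put(None)
writer.join()

print(committed)
```

Only one thread ever writes, so writes cannot contend with each other for locks, and transaction setup and commit costs are paid once per batch. A production version would also flush on a timer so that a partly filled batch does not wait indefinitely.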

Query Consolidation

MATCH sam
MATCH jenny
CREATE sam-[:KNOWS]-jenny
MATCH sam
MATCH sarah
CREATE sam-[:KNOWS]-sarah
CREATE address1
CREATE address2
DELETE address1
MATCH sam
CREATE sam-[:LIVES_AT]-address2

Eliminate Duplicate Lookups

MATCH sam
MATCH jenny
CREATE sam-[:KNOWS]-jenny
MATCH sarah
CREATE sam-[:KNOWS]-sarah
CREATE address1
CREATE address2
DELETE address1
CREATE sam-[:LIVES_AT]-address2

Eliminate Unnecessary Writes

MATCH sam
MATCH jenny
CREATE sam-[:KNOWS]-jenny
MATCH sarah
CREATE sam-[:KNOWS]-sarah
CREATE address2
CREATE sam-[:LIVES_AT]-address2
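The two consolidation passes can be sketched over a batch of (verb, target) pairs. This is an illustrative simplification, not how the talk implements it. It assumes that a node created and deleted in the same batch is not referenced by any other operation in that batch, and the tuple representation is hypothetical.

```python
def consolidate(ops):
    """Apply both consolidation passes to a batch of (verb, target) pairs."""
    # Pass 1: eliminate duplicate lookups, keeping the first MATCH of each node.
    matched = set()
    deduped = []
    for verb, target in ops:
        if verb == "MATCH":
            if target in matched:
                continue
            matched.add(target)
        deduped.append((verb, target))

    # Pass 2: eliminate unnecessary writes, dropping the CREATE and the DELETE
    # of anything that is both created and deleted within the batch.
    created = {target for verb, target in deduped if verb == "CREATE"}
    deleted = {target for verb, target in deduped if verb == "DELETE"}
    transient = created & deleted
    return [(verb, target) for verb, target in deduped if target not in transient]


BATCH = [
    ("MATCH", "sam"),
    ("MATCH", "jenny"),
    ("CREATE", "sam-[:KNOWS]-jenny"),
    ("MATCH", "sam"),
    ("MATCH", "sarah"),
    ("CREATE", "sam-[:KNOWS]-sarah"),
    ("CREATE", "address1"),
    ("CREATE", "address2"),
    ("DELETE", "address1"),
    ("MATCH", "sam"),
    ("CREATE", "sam-[:LIVES_AT]-address2"),
]

for verb, target in consolidate(BATCH):
    print(verb, target)
```

On the batch from the slides this reduces eleven operations to seven, matching the final consolidated query.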

Tradeoff

Latency
Higher throughput

In-memory or durable queues?
• Lost writes in event of crash
• Transactional dequeue?

Further Reading

http://maxdemarzi.com/2013/09/05/scaling-writes/
http://maxdemarzi.com/2014/07/01/scaling-concurrent-writes-in-neo4j/

Hardware

Memory

• SLC (single-level cell) SSD w/SATA
• Lots of RAM
  – 8-12G heap
  – Explicitly memory-map store files

Object Cache

• 2G for 12G heap
• No object cache
  – consistent throughput at expense of latency

AWS

• HVM (hardware virtual machine) over PV (paravirtual)
• EBS-optimized instances
• Provisioned IOPS
