
Page 1: 2013.09.10 Giraph at London Hadoop Users Group

Scaling Apache Giraph

Nitay Joffe, Data Infrastructure Engineer

[email protected]

@nitayj

September 10, 2013

Page 2: 2013.09.10 Giraph at London Hadoop Users Group

Agenda

1 Background

2 Scaling

3 Results

4 Questions

Page 3: 2013.09.10 Giraph at London Hadoop Users Group

Background

Page 4: 2013.09.10 Giraph at London Hadoop Users Group

What is Giraph?

• Apache open source graph computation engine based on Google's Pregel.
• Support for Hadoop, Hive, HBase, and Accumulo.
• BSP model with a simple "think like a vertex" API.
• Combiners, Aggregators, Mutability, and more.
• Configurable Graph<I, V, E, M>:

– I: Vertex ID
– V: Vertex Value
– E: Edge Value
– M: Message data

(Each type parameter implements Hadoop's Writable.)

What is Giraph NOT?

• A graph database. See Neo4j.
• A completely asynchronous generic MPI system.
• A slow tool.
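In code, the four type parameters appear directly on the user's compute class. A minimal sketch, assuming the Giraph 1.1 BasicComputation API (earlier releases used a Vertex subclass instead; the class name and choice of Writables here are illustrative):

    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.LongWritable;

    // Graph<I, V, E, M>: I must implement WritableComparable;
    // V, E, and M implement Writable.
    public class MyComputation extends BasicComputation<
        LongWritable,    // I: vertex ID
        DoubleWritable,  // V: vertex value
        FloatWritable,   // E: edge value
        DoubleWritable>  // M: message data
    {
      @Override
      public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
                          Iterable<DoubleWritable> messages) {
        // "Think like a vertex": read messages from the last superstep,
        // update the vertex value, send messages along edges, vote to halt.
        vertex.voteToHalt();
      }
    }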

Page 5: 2013.09.10 Giraph at London Hadoop Users Group

Why not Hive?

(Diagram: the iterative MapReduce dataflow: InputFormat reads Input 0/1, map tasks write intermediate files, reduce tasks write Output 0/1 through the OutputFormat, and each iteration feeds the outputs back in as inputs.)

• Too much disk. Limited in-memory caching.
• Each iteration becomes a MapReduce job!

Page 6: 2013.09.10 Giraph at London Hadoop Users Group

Giraph components

Master – Application coordinator

• Synchronizes supersteps

• Assigns partitions to workers before superstep begins

Workers – Computation & messaging

• Handle I/O – reading and writing the graph

• Computation/messaging of assigned partitions

ZooKeeper

• Maintains global application state

Page 7: 2013.09.10 Giraph at London Hadoop Users Group

Giraph Dataflow

1. Loading the graph: the master assigns InputFormat splits (Split 0 … Split 4); Worker 0 and Worker 1 load them and send graph partitions (Part 0 … Part 3) to their owners.

2. Compute / Iterate: workers run compute on their partitions of the in-memory graph and send messages, then send stats to the master and iterate.

3. Storing the graph: workers write their partitions (Part 0 … Part 3) through the OutputFormat.

Page 8: 2013.09.10 Giraph at London Hadoop Users Group

Giraph Job Lifetime

(Diagram: vertex and job state machines.)

Vertex lifecycle: Active → vote to halt → Inactive; a received message makes the vertex Active again.

Job lifetime: Input → compute superstep → all vertices halted? No: run another superstep. Yes: master halted? No: keep going. Yes: write Output.

Page 9: 2013.09.10 Giraph at London Hadoop Users Group

Simple Example – Compute the maximum value

(Diagram: vertices with initial values 5, 1, and 2 spread across Processor 1 and Processor 2. Each superstep, every vertex sends its value to its neighbors and adopts the largest value it sees; over time all vertices converge to the maximum, 5.)

Related algorithm: Connected Components, e.g. finding communities.
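A sketch of this max-value algorithm as a Giraph compute function, assuming the BasicComputation API (class name illustrative):

    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;

    public class MaxValueComputation extends BasicComputation<
        LongWritable, LongWritable, NullWritable, LongWritable> {
      @Override
      public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
                          Iterable<LongWritable> messages) {
        long max = vertex.getValue().get();
        // Everyone announces on superstep 0; afterwards only on change.
        boolean changed = getSuperstep() == 0;
        for (LongWritable message : messages) {
          if (message.get() > max) {
            max = message.get();
            changed = true;
          }
        }
        if (changed) {
          vertex.setValue(new LongWritable(max));
          sendMessageToAllEdges(vertex, vertex.getValue());
        }
        // Halt; a larger incoming value will reactivate this vertex.
        vertex.voteToHalt();
      }
    }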

Page 10: 2013.09.10 Giraph at London Hadoop Users Group

PageRank – ranking websites

Mahout (Hadoop): 854 lines. Giraph: < 30 lines.

• Send neighbors an equal fraction of your page rank.
• New page rank = 0.15 / (# of vertices) + 0.85 * (messages sum).
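A sketch of that computation, following the formula on the slide (iteration cap and class name are illustrative; vertex values are assumed to be initialized by the InputFormat):

    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;

    public class PageRankComputation extends BasicComputation<
        LongWritable, DoubleWritable, NullWritable, DoubleWritable> {
      private static final int MAX_SUPERSTEPS = 30;  // illustrative cap

      @Override
      public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
                          Iterable<DoubleWritable> messages) {
        if (getSuperstep() >= 1) {
          double sum = 0;
          for (DoubleWritable message : messages) {
            sum += message.get();
          }
          // New page rank = 0.15 / (# of vertices) + 0.85 * (messages sum)
          vertex.setValue(new DoubleWritable(
              0.15 / getTotalNumVertices() + 0.85 * sum));
        }
        if (getSuperstep() < MAX_SUPERSTEPS) {
          // Send neighbors an equal fraction of your page rank.
          sendMessageToAllEdges(vertex, new DoubleWritable(
              vertex.getValue().get() / vertex.getNumEdges()));
        } else {
          vertex.voteToHalt();
        }
      }
    }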

Page 11: 2013.09.10 Giraph at London Hadoop Users Group

Scaling

Page 12: 2013.09.10 Giraph at London Hadoop Users Group

Problem: Worker Crash.

(Diagram: superstep timeline with periodic checkpointing.)

Superstep i (no checkpoint) → superstep i+1 (checkpoint) → superstep i+2 (no checkpoint) … worker failure! Restart from the superstep i+1 checkpoint: superstep i+2 (no checkpoint) → superstep i+3 (checkpoint) … worker failure after the checkpoint completes! Resume from the superstep i+3 checkpoint, and the application completes.

Solution: Checkpointing.
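A sketch of turning checkpointing on, assuming GiraphConfiguration's checkpoint setter (knob names and defaults vary by version):

    import org.apache.giraph.conf.GiraphConfiguration;

    GiraphConfiguration conf = new GiraphConfiguration();
    // Checkpoint every 2 supersteps: after a worker crash the job resumes
    // from the last completed checkpoint instead of restarting from scratch.
    conf.setCheckpointFrequency(2);

Checkpointing trades superstep latency for recovery time, so the frequency should reflect how expensive a full restart would be.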

Page 13: 2013.09.10 Giraph at London Hadoop Users Group

(Diagram: before failure of active Master 0, ZooKeeper holds the active-master state; "Active" Master 0 leads while "spare" Masters 1 and 2 stand by. After failure of active Master 0, "spare" Master 1 takes over the active-master state and becomes "Active".)

Problem: Master Crash.

Solution: ZooKeeper Master Queue.

Page 14: 2013.09.10 Giraph at London Hadoop Users Group

Problem: Primitive Collections.

• Graphs are often parameterized with {Null, Int, Long, Float, Double}.
• Boxing/unboxing. Objects have internal overhead.

Solution: Use fastutil, e.g. Long2DoubleOpenHashMap.

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion

(Diagrams: example graph problems built on primitive types: Single Source Shortest Path from s to t with double edge weights, Network Flow with double capacities, and Count In-Degree with integer counts.)
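For example, replacing a Map<Long, Double> with fastutil's type-specific map keeps keys and values as primitives end to end; a small self-contained sketch:

    import it.unimi.dsi.fastutil.longs.Long2DoubleMap;
    import it.unimi.dsi.fastutil.longs.Long2DoubleOpenHashMap;

    public class PrimitiveMapDemo {
      public static void main(String[] args) {
        // No Long/Double boxes and no per-entry garbage on the hot path,
        // unlike HashMap<Long, Double>.
        Long2DoubleOpenHashMap distances = new Long2DoubleOpenHashMap();
        distances.defaultReturnValue(Double.POSITIVE_INFINITY);

        distances.put(42L, 0.8);        // primitive overloads, no boxing
        double d = distances.get(42L);  // no unboxing

        for (Long2DoubleMap.Entry e : distances.long2DoubleEntrySet()) {
          System.out.println(e.getLongKey() + " -> " + e.getDoubleValue());
        }
      }
    }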

Page 15: 2013.09.10 Giraph at London Hadoop Users Group

Problem: Too many objects. Lots of time spent in GC.

Graph: 1B vertices, 200B edges, 200 workers.

• 1B edges per worker, 1 object per edge value: List<Edge<I, E>> ≈ 10B objects.
• 5M vertices per worker, 10 objects per vertex value: Map<I, Vertex<I, V, E>> ≈ 50M objects.
• 1 message per edge, 10 objects per message data: Map<I, List<M>> ≈ 10B objects.
• Objects used ≈ O(E·e + V·v + M·m) ⇒ O(E·e), since edges dominate.

(Diagram: Label Propagation, e.g. "Who's sleeping?": vertices labeled Boring, Amazing, and Confusing propagate weighted label scores (0.5, 0.2, 0.8 → 0.36, 0.17, 0.41) to answer "Q: What did he think?")

Page 16: 2013.09.10 Giraph at London Hadoop Users Group

Problem: Too many objects. Lots of time spent in GC.

Solution: byte[]

• Serialize messages, edges, and vertices.
• Iterable interface with a representative object.

(Diagram: successive next() calls deserialize from the byte[] input into one reusable object. Objects per worker ≈ O(V).)
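A sketch of the representative-object pattern (class and field names are hypothetical, not Giraph's actual implementation): all messages for a vertex sit serialized in one byte[], and next() repeatedly deserializes into a single reusable Writable instead of allocating per message.

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.DoubleWritable;

    public class ByteArrayMessages implements Iterable<DoubleWritable> {
      private final byte[] data;  // messages serialized back to back
      private final int count;

      public ByteArrayMessages(byte[] data, int count) {
        this.data = data;
        this.count = count;
      }

      @Override
      public Iterator<DoubleWritable> iterator() {
        final DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(data));
        return new Iterator<DoubleWritable>() {
          private final DoubleWritable representative = new DoubleWritable();
          private int read = 0;

          @Override public boolean hasNext() { return read < count; }

          @Override public DoubleWritable next() {
            try {
              representative.readFields(in);  // overwrite the same object
            } catch (IOException e) {
              throw new IllegalStateException(e);
            }
            read++;
            return representative;  // callers must not hold on to it
          }

          @Override public void remove() {
            throw new UnsupportedOperationException();
          }
        };
      }
    }

The catch: the returned object is only valid until the next call to next(), which is exactly why objects per worker drop to O(V).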

(Diagram: the same label-propagation example as the previous slide.)

Page 17: 2013.09.10 Giraph at London Hadoop Users Group

Problem: Serialization of byte[]

• DataInput? Kryo? Custom?

Solution: Unsafe

• Dangerous. No formal API. Volatile. Non-portable (Oracle JVM only).
• AWESOME. As fast as it gets.
• True native. Essentially C: *(long*)(data+offset);
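A sketch of the idea with sun.misc.Unsafe, which has to be obtained reflectively because its constructor is private (helper class illustrative; Oracle/HotSpot JVMs only):

    import java.lang.reflect.Field;

    import sun.misc.Unsafe;

    public final class UnsafeReads {
      private static final Unsafe UNSAFE;
      private static final long BYTE_ARRAY_OFFSET;

      static {
        try {
          Field f = Unsafe.class.getDeclaredField("theUnsafe");
          f.setAccessible(true);
          UNSAFE = (Unsafe) f.get(null);
          BYTE_ARRAY_OFFSET = UNSAFE.arrayBaseOffset(byte[].class);
        } catch (ReflectiveOperationException e) {
          throw new ExceptionInInitializerError(e);
        }
      }

      // Read a long straight out of a byte[]: effectively the C idiom
      // *(long*)(data + offset), with no bounds checks or stream objects.
      public static long getLong(byte[] data, int offset) {
        return UNSAFE.getLong(data, BYTE_ARRAY_OFFSET + offset);
      }
    }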

Page 18: 2013.09.10 Giraph at London Hadoop Users Group

Problem: Large Aggregations.

(Diagram: with master-owned aggregators, every worker sends all aggregator values straight to the master, which becomes a bottleneck.)

Solution: Sharded Aggregators.

(Diagram: workers own the aggregators. Each aggregator's owner collects values from the other workers, communicates with the master, and then distributes the final value back to all workers.)

Example use case: k-means clustering, e.g. grouping similar emails.
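Sharding changes which worker owns each aggregator, not the user-facing API. A sketch of ordinary aggregator use, assuming Giraph's MasterCompute registration flow (the aggregator name and class choice are illustrative):

    import org.apache.giraph.aggregators.LongSumAggregator;
    import org.apache.giraph.master.DefaultMasterCompute;

    // The master registers the aggregator once; each superstep, workers
    // aggregate into it and vertices read the previous superstep's value.
    public class SumMasterCompute extends DefaultMasterCompute {
      public static final String EDGE_COUNT = "edge.count";  // illustrative

      @Override
      public void initialize() throws InstantiationException,
          IllegalAccessException {
        registerAggregator(EDGE_COUNT, LongSumAggregator.class);
      }
    }

    // Inside a Computation's compute(...):
    //   aggregate(SumMasterCompute.EDGE_COUNT,
    //       new LongWritable(vertex.getNumEdges()));
    //   LongWritable total = getAggregatedValue(SumMasterCompute.EDGE_COUNT);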

Page 19: 2013.09.10 Giraph at London Hadoop Users Group

Problem: Network Wait.

• RPC doesn't fit the model.
• Synchronous calls are no good.

Solution: Netty. Tune queue sizes & threads.

(Diagram: two superstep timelines, each bracketed by barriers. Untuned: begin superstep, compute, end compute, then the worker waits while the network drains before the superstep can end. Tuned: messages start flowing early (short time to first message), network transfer overlaps compute, and the wait before the end-of-superstep barrier shrinks.)
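A sketch of the kind of tuning involved; the flag names below are assumptions based on Giraph's GiraphConstants and may differ across versions, so verify them against your release:

    import org.apache.giraph.conf.GiraphConfiguration;

    public class NettyTuningDemo {
      public static void main(String[] args) {
        GiraphConfiguration conf = new GiraphConfiguration();
        // Assumed knob names; check GiraphConstants for your version.
        conf.setInt("giraph.nettyClientThreads", 8);    // client I/O threads
        conf.setInt("giraph.nettyServerThreads", 16);   // server I/O threads
        conf.setInt("giraph.clientSendBufferSize", 512 * 1024);
      }
    }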

Page 20: 2013.09.10 Giraph at London Hadoop Users Group

Results

Page 21: 2013.09.10 Giraph at London Hadoop Users Group

Scalability Graphs

(Plot: Increasing Workers. 2B vertices, 200B edges, 20 compute threads; iteration time in seconds (0-450) against number of workers (50-300).)

(Plot: Increasing Data Size. 50 workers, 20 compute threads; iteration time in seconds (0-450) against number of edges, from 1B upward.)

Page 22: 2013.09.10 Giraph at London Hadoop Users Group

Lessons Learned

• Coordinating is a zoo. Be resilient with ZooKeeper.
• Efficient networking is hard. Let Netty help.
• Primitive collections, primitive performance. Use fastutil.
• byte[] is simple yet powerful.
• Being Unsafe can be a good thing.

• Have a graph? Use Giraph.

Page 23: 2013.09.10 Giraph at London Hadoop Users Group

What’s the final result?

Comparison with Hive:

• 20x CPU speedup.
• 100x elapsed-time speedup: 15 hours ⇒ 9 minutes.

Computations on the entire Facebook graph are no longer "weekend jobs". Now they're coffee breaks.

Page 24: 2013.09.10 Giraph at London Hadoop Users Group

Questions?

Page 25: 2013.09.10 Giraph at London Hadoop Users Group

Problem: Measurements.

• Need tools to gain visibility into the system.
• Problems with connecting to Hadoop sub-processes.

Solution: Do it all.

• YourKit: see YourKitProfiler.
• jmap: see JMapHistoDumper.
• VisualVM: with jstatd & an SSH SOCKS proxy.
• Yammer Metrics.
• Hadoop Counters.
• Logging & GC prints.

Page 26: 2013.09.10 Giraph at London Hadoop Users Group

Problem: Mutations

• Synchronization.
• Load balancing.

Solution: Reshuffle resources

• Mutations are handled at the barrier between supersteps (see the sketch below).
• The master rebalances vertex assignments to optimize distribution.
• Handle mutations in batches.
• Avoid mutations if using byte[].
• Favor algorithms which don't mutate the graph.
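A sketch of requesting mutations from inside compute(), assuming Giraph's mutation-request API (types and vertex IDs are illustrative); requests are buffered and only applied at the next barrier:

    import java.io.IOException;

    import org.apache.giraph.edge.EdgeFactory;
    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;

    public class MutatingComputation extends BasicComputation<
        LongWritable, DoubleWritable, NullWritable, DoubleWritable> {
      @Override
      public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
                          Iterable<DoubleWritable> messages) throws IOException {
        // Mutation requests are queued here and resolved in a batch at the
        // barrier between supersteps, never mid-superstep.
        addVertexRequest(new LongWritable(99L));
        addEdgeRequest(vertex.getId(),
            EdgeFactory.create(new LongWritable(99L), NullWritable.get()));
        removeVertexRequest(new LongWritable(13L));
        vertex.voteToHalt();
      }
    }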