Slides for a paper presentation given in the lab
Pregel: A System for Large-Scale Graph Processing
2014/5/14
Ishikawa Yasutaka
About this Paper
• Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski
• Google’s paper
• Proceedings of the 2010 international conference on Management of data - SIGMOD '10
2
Outline
• Introduction
• Model of computation
• Pregel’s API
• Implementation
• Application
• Experiments
• Conclusion
3
Today’s problems of graph processing
• Poor locality of memory access
• Very little work per vertex
5
Methods of graph processing…(1/2)
1. Crafting a custom distributed infrastructure → typically requires a substantial implementation effort
2. Relying on an existing distributed computing platform (e.g., MapReduce) → this can lead to suboptimal performance and usability issues
6
Methods of graph processing…(2/2)
3. Using a single-computer graph algorithm library → limits the scale of problems that can be addressed
4. Using an existing parallel graph system → these do not address fault tolerance or other issues that are important for very large scale distributed systems
7
What is Pregel
• Scalable graph processing model
- Based on BSP (Bulk Synchronous Parallel)
- Designed for efficient, scalable, and fault-tolerant implementation on clusters
- Distribution-related details are hidden behind an abstract API
• Not open source software
- Apache Giraph is an open source implementation of Pregel
8
Bulk Synchronous Parallel
• Bridging model for designing parallel algorithms
• BSP computes by iterating supersteps and synchronizes all processes at the end of each superstep
[Figure: one superstep]
9
BSP’s algorithm(1/3)
1. Concurrent computation
2. Communication
3. Barrier synchronisation
Each thread processes its own data concurrently and independently
10
BSP’s algorithm(2/3)
1. Concurrent computation
2. Communication
3. Barrier synchronisation
The threads pass messages to each other
11
BSP’s algorithm(3/3)
1. Concurrent computation
2. Communication
3. Barrier synchronisation
Each thread waits until all other threads have finished passing messages
Next superstep…
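The three BSP phases above can be sketched with Python threads. This is only an illustration under stated assumptions: the function name `bsp_ring_max`, the ring-shaped communication pattern, and the two alternating inbox sets are hypothetical choices made for the example, not something the slides prescribe.

```python
import threading


def bsp_ring_max(values, supersteps):
    """Each worker repeatedly forwards its value to its right neighbour (hypothetical ring)."""
    n = len(values)
    state = list(values)
    # Two sets of inboxes: one is read in the current superstep,
    # the other is filled with messages for the next superstep.
    inboxes = [[[] for _ in range(n)] for _ in range(2)]
    barrier = threading.Barrier(n)

    def worker(i):
        for s in range(supersteps):
            current, upcoming = inboxes[s % 2], inboxes[(s + 1) % 2]
            # 1. Concurrent computation: fold in messages sent during superstep s-1.
            if current[i]:
                state[i] = max(state[i], max(current[i]))
            current[i].clear()
            # 2. Communication: the message becomes visible in superstep s+1.
            upcoming[(i + 1) % n].append(state[i])
            # 3. Barrier synchronisation: nobody starts s+1 until everyone finished s.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


print(bsp_ring_max([3, 6, 2, 1], supersteps=4))  # [6, 6, 6, 6]
```

With four workers in a ring, the largest value needs at most three hops to reach everyone, so four supersteps are enough here.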
12
Outline
• Introduction
• Model of computation
• Pregel’s API
• Implementation
• Application
• Experiments
• Conclusion
13
Pregel’s input and output
• Input: graph
• Output: graph
• Iterates supersteps, each consisting of a user-defined function and message passing
[Figure: input graph → superstep → superstep → superstep → output graph]
14
Graph component
• A Pregel graph consists of vertices and edges
• Vertex:
- Consists of a unique identifier and a user-defined value
- Its outgoing edges and its value are modifiable
• Edge:
- Consists of a source vertex, a target vertex, and a user-defined value
- The user-defined value is modifiable
- Not a first-class citizen
[Figure: vertices A to D with edges a to d; the vertex value is modifiable, and the outgoing edges and edge values are modifiable]
15
State of vertex
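• A vertex has two states: Active and Inactive
• When an inactive vertex receives a message, its state changes to Active
• When a vertex votes to halt, its state changes to Inactive, and it stays inactive until a message arrives
[Figure: state machine; Active → Inactive on "Vote to halt", Inactive → Active on "Message received"]
16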
Pregel’s Superstep
1. In superstep S, each active vertex V runs the user-defined function on the messages sent to it in superstep S-1
2. V sends messages to other vertices, which will be received in superstep S+1
3. V may modify its own state
4. When all vertices have finished steps 1 to 3, the computation moves on to superstep S+1
• The algorithm terminates with its output when all vertices are inactive and no messages are in transit
17
Example: maximum value(1/4)
Superstep 0: 3 6 2 1
[Legend: each vertex is marked Active or Inactive]
18
Example: maximum value(2/4)
Superstep 0: 3 6 2 1
Superstep 1: 6 6 2 6
[Legend: each vertex is marked Active or Inactive]
19
Example: maximum value(3/4)
Superstep 0: 3 6 2 1
Superstep 1: 6 6 2 6
Superstep 2: 6 6 6 6
[Legend: each vertex is marked Active or Inactive]
20
Example: maximum value(4/4)
Superstep 0: 3 6 2 1
Superstep 1: 6 6 2 6
Superstep 2: 6 6 6 6
Superstep 3: 6 6 6 6
[Legend: each vertex is marked Active or Inactive]
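The maximum-value example can be replayed with a small single-process simulation of supersteps, vote-to-halt, and reactivation by message. This is a sketch under assumptions: the function name `pregel_max` is hypothetical, and the edge set below was chosen so that the run reproduces the four rows of the example; the slide text itself does not list the edges.

```python
def pregel_max(values, out_edges):
    """Propagate the maximum value; returns the vertex values after each superstep."""
    values = dict(values)
    active = {v: True for v in values}
    inbox = {v: [] for v in values}
    history = []
    superstep = 0
    # Stop when every vertex is inactive and no messages are in transit.
    while any(active.values()) or any(inbox.values()):
        outbox = {v: [] for v in values}
        for v in values:
            messages = inbox[v]
            if messages:
                active[v] = True          # a received message reactivates the vertex
            if not active[v]:
                continue
            largest = max([values[v]] + messages)
            changed = largest != values[v]
            values[v] = largest
            if superstep == 0 or changed:
                for target in out_edges[v]:
                    outbox[target].append(largest)   # delivered in the next superstep
            else:
                active[v] = False         # vote to halt
        inbox = outbox
        history.append([values[v] for v in values])
        superstep += 1
    return history


# Assumed topology: A <-> B, B -> D, C <-> D.
start = {"A": 3, "B": 6, "C": 2, "D": 1}
edges = {"A": ["B"], "B": ["A", "D"], "C": ["D"], "D": ["C"]}
for step, row in enumerate(pregel_max(start, edges)):
    print("Superstep", step, row)
```

Under the assumed edges, the run stops after superstep 3 because every vertex has voted to halt and the outboxes are empty.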
21
Outline
• Introduction
• Model of computation
• Pregel’s API
• Implementation
• Application
• Experiments
• Conclusion
22
Vertex class
• Writing a Pregel program involves subclassing the predefined Vertex class
• The Compute() method is executed at each active vertex in every superstep
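To make the subclassing idea concrete, here is a rough Python analogue of a vertex base class. It is an assumption-laden sketch: only the idea of overriding Compute() comes from the slide, while the Python names (`Vertex`, `compute`, `send_message_to`, `vote_to_halt`, `MaxValueVertex`) and the constructor arguments are hypothetical and are not Pregel's actual interface.

```python
from abc import ABC, abstractmethod


class Vertex(ABC):
    """Hypothetical stand-in for a predefined vertex base class."""

    def __init__(self, vertex_id, value, out_edges):
        self.vertex_id = vertex_id
        self.value = value
        self.out_edges = list(out_edges)   # ids of target vertices
        self.active = True
        self.outbox = []                   # (target id, message) pairs

    @abstractmethod
    def compute(self, messages, superstep):
        """User-defined logic, run once per superstep while the vertex is active."""

    def send_message_to(self, target, message):
        self.outbox.append((target, message))

    def vote_to_halt(self):
        self.active = False


class MaxValueVertex(Vertex):
    """User subclass: keep the largest value seen so far and forward it."""

    def compute(self, messages, superstep):
        largest = max([self.value, *messages])
        if superstep == 0 or largest > self.value:
            self.value = largest
            for target in self.out_edges:
                self.send_message_to(target, self.value)
        else:
            self.vote_to_halt()


v = MaxValueVertex("a", 3, ["b"])
v.compute([6], superstep=1)
print(v.value, v.outbox)  # 6 [('b', 6)]
```

A surrounding framework (not shown) would be responsible for delivering the outbox contents and for calling `compute` only on active vertices.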
23
Message Passing
• The type of message sent by a vertex is specified by the user as a template parameter of the Vertex class
• The order of messages in the iterator is not guaranteed, but messages are guaranteed to be delivered
24
Combiners
• Sending a message to a vertex on another machine incurs some overhead
• In some cases, using combiners can reduce the number of messages
• To enable this, the user subclasses the Combiner class
[Figure: reduction of messages]
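A minimal sketch of what a combiner does: messages bound for the same destination vertex are merged before they leave the machine. The function name `combine_messages` and the (destination, value) message shape are assumptions made for this example. The merge operation should be commutative and associative, because the order in which messages are grouped is not guaranteed.

```python
def combine_messages(messages, combiner):
    """Collapse messages addressed to the same vertex into a single message each."""
    combined = {}
    for dest, value in messages:
        if dest in combined:
            combined[dest] = combiner(combined[dest], value)
        else:
            combined[dest] = value
    return list(combined.items())


# Four outgoing messages become two when only the minimum matters
# (as it would for a shortest-path computation).
outgoing = [("v7", 5), ("v7", 3), ("v9", 4), ("v7", 8)]
print(combine_messages(outgoing, min))  # [('v7', 3), ('v9', 4)]
```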
25
Aggregators(1/2)
• Pregel aggregators are a mechanism for global communication
• Each vertex can provide a value in superstep S; the system combines these values with a reduction operator, and the result is made available to all vertices in superstep S+1
[Figure: sum aggregator counting the number of edges; vertices contribute 4, 2, 1, … in superstep S, and every vertex sees 4 + 2 + 1 = 7 in superstep S+1]
26
Aggregators(2/2)
• To define a new aggregator, a user subclasses the predefined Aggregator class
[Figure: sum aggregator counting the number of edges; vertices contribute 4, 2, 1, … in superstep S, and every vertex sees 4 + 2 + 1 = 7 in superstep S+1]
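The sum-aggregator figure can be mimicked in a few lines of Python. The class name `SumAggregator`, its methods, and the helper `visible_in_next_superstep` are hypothetical names for this sketch, not the predefined Aggregator class mentioned on the slide.

```python
class SumAggregator:
    """Hypothetical aggregator that reduces contributed values by addition."""

    def __init__(self):
        self.total = 0

    def accumulate(self, value):
        self.total += value

    def result(self):
        return self.total


def visible_in_next_superstep(out_degrees):
    """Superstep S: every vertex contributes; superstep S+1: all see the same total."""
    aggregator = SumAggregator()
    for degree in out_degrees.values():
        aggregator.accumulate(degree)
    return {vertex: aggregator.result() for vertex in out_degrees}


print(visible_in_next_superstep({"a": 4, "b": 2, "c": 1}))  # {'a': 7, 'b': 7, 'c': 7}
```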
27
Topology Mutations(1/2)
• Some graph algorithms need to change the graph’s topology
- Clustering algorithm
- Minimum spanning tree algorithm
• The user’s Compute() function can issue requests to add or remove vertices or edges
- Multiple vertices may issue conflicting requests in the same superstep
28
Topology Mutations(2/2)
• These conflicts are resolved using two mechanisms
- Partial ordering: edge removal → vertex removal → vertex addition → edge addition
- Handlers: by default the system picks one request arbitrarily; users can define a handler method in their Vertex subclass
• Partial ordering yields deterministic results for most conflicts
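The partial ordering above can be sketched as a sort over the mutation requests collected in one superstep. The request tuples, the name `apply_mutations`, and the dict-of-sets graph representation are assumptions for illustration; handlers for the remaining conflicts are left out.

```python
# Priority follows the slide's ordering:
# edge removal, then vertex removal, then vertex addition, then edge addition.
ORDER = {"remove_edge": 0, "remove_vertex": 1, "add_vertex": 2, "add_edge": 3}


def apply_mutations(graph, requests):
    """graph maps each vertex to the set of its out-edge targets; returns a new graph."""
    graph = {v: set(targets) for v, targets in graph.items()}
    for kind, *args in sorted(requests, key=lambda r: ORDER[r[0]]):
        if kind == "remove_edge":
            src, dst = args
            if src in graph:
                graph[src].discard(dst)
        elif kind == "remove_vertex":
            graph.pop(args[0], None)        # also drops the vertex's out-edges
        elif kind == "add_vertex":
            graph.setdefault(args[0], set())
        elif kind == "add_edge":
            src, dst = args
            if src in graph:
                graph[src].add(dst)
    return graph


# Requests arrive in an arbitrary order; the ordering makes the outcome deterministic.
requests = [("add_edge", "c", "a"), ("add_vertex", "c"),
            ("remove_edge", "a", "b"), ("remove_vertex", "b")]
print(apply_mutations({"a": {"b"}, "b": set()}, requests))  # {'a': set(), 'c': {'a'}}
```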
29
Input and output
• Pregel adapts to many file formats for input and output
- It decouples the task of interpreting an input file from the task of graph computation
- The library provides readers and writers
- Users can write their own by subclassing Reader and Writer
[Figure: file formats A and B → Reader → Compute → Writer → file formats C and D]
30
Outline
• Introduction
• Model of computation
• Pregel’s API
• Implementation
• Application
• Experiments
• Conclusion
31
Basic architecture(1/2)
• The Pregel library divides a graph into partitions
• Assignment of a vertex to a partition depends solely on the vertex ID
- The default partitioning function is hash(ID) mod N, where N is the number of partitions
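A small sketch of hash(ID) mod N partitioning. The function name `partition_of` and the use of CRC32 as the hash are assumptions for this example; CRC32 is used because Python's built-in `hash()` for strings changes between interpreter runs, whereas every machine must agree on where a vertex lives.

```python
import zlib


def partition_of(vertex_id: str, num_partitions: int) -> int:
    """Map a vertex ID to a partition index using a stable hash."""
    return zlib.crc32(vertex_id.encode("utf-8")) % num_partitions


# Any machine can compute the owner of a vertex from its ID alone.
for vid in ["page/1", "page/2", "page/3"]:
    print(vid, "->", partition_of(vid, 8))
```

Because the mapping depends only on the ID, a worker can route a message to the right partition without consulting a directory.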
32
Basic architecture(2/2)
• The execution of a Pregel program consists of several stages
1. Many copies of the user program begin executing on a cluster of machines. One of these acts as the master
2. The master determines how many partitions the graph will have, and assigns partitions to each worker
3. The master assigns a portion of the user’s input to each worker
4. The master instructs each worker to perform a superstep
33
Fault tolerance(1/2)
• Fault tolerance is achieved through checkpointing
• At the beginning of a superstep, the master instructs workers to save the state of their partitions to persistent storage
- Including vertex values, edge values, and incoming messages
- The master separately saves the aggregator values
34
Fault tolerance(2/2)
• Worker failures are detected using regular “ping” messages the master issues to workers
• When one or more workers fail, the master reassigns their graph partitions to the currently available workers
- The workers reload the most recent checkpoint and repeat the missing supersteps
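Checkpoint-and-replay recovery can be illustrated with a toy single-process loop. Everything here is a simplifying assumption: the function name `run_with_checkpoints`, the checkpoint interval, and the simulated failure are hypothetical, and real recovery involves many workers and persistent storage rather than an in-memory copy.

```python
import copy


def run_with_checkpoints(state, step, total_supersteps, checkpoint_every, fail_at=None):
    """Run `step` once per superstep, checkpoint periodically, and survive one failure."""
    checkpoint = (0, copy.deepcopy(state))
    executed = []                      # supersteps actually computed, including repeats
    s = 0
    while s < total_supersteps:
        if s % checkpoint_every == 0:
            checkpoint = (s, copy.deepcopy(state))   # saved at the start of superstep s
        if s == fail_at:
            fail_at = None             # the simulated failure happens only once
            s, state = checkpoint[0], copy.deepcopy(checkpoint[1])
            continue                   # roll back and repeat the missing supersteps
        state = step(state)
        executed.append(s)
        s += 1
    return state, executed


final, executed = run_with_checkpoints(0, lambda x: x + 1, total_supersteps=6,
                                       checkpoint_every=2, fail_at=3)
print(final, executed)  # 6 [0, 1, 2, 2, 3, 4, 5]
```

In this run the failure at superstep 3 forces a rollback to the checkpoint taken at superstep 2, so superstep 2 is computed twice and the final result is unchanged.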
35
Worker implementation
• A worker machine maintains the state of its portion of the graph in memory
• There are two copies of the active flag and the incoming message queue: one for the current superstep and another for the next superstep
• Message sending follows one of two patterns: remote or local
36
Master implementation
• The master assigns a unique identifier to each worker at the time of registration
• The master maintains a list of all workers known to be active
• If any worker fails, the master enters recovery mode
• The master runs an HTTP server that displays statistics about the progress of the computation
37
Outline
• Introduction
• Model of computation
• Pregel’s API
• Implementation
• Application
• Experiments
• Conclusion
38
[1]Page Rank(1/2)
• The PageRank algorithm determines the importance of web pages
• The algorithm is based on how papers are evaluated
- A good paper is likely to be cited by many other papers
- A paper cited by papers that are themselves cited by many papers is likely to be a good paper
• It is named after one of Google’s founders, Larry Page
39
[1]Page Rank(2/2)
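A minimal Python sketch of PageRank written in the vertex-centric, superstep-by-superstep style. The function name `pagerank`, the damping factor of 0.85, the cutoff of 30 supersteps, and the example graph are assumptions made for illustration; the sketch also assumes every vertex has at least one outgoing edge.

```python
def pagerank(out_edges, supersteps=30, damping=0.85):
    """out_edges maps each vertex to a non-empty list of target vertices."""
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    inbox = {v: [] for v in out_edges}
    for superstep in range(supersteps + 1):
        outbox = {v: [] for v in out_edges}
        for v, targets in out_edges.items():
            if superstep >= 1:
                # New rank from the shares received in the previous superstep.
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            if superstep < supersteps:
                # Split the current rank evenly across outgoing edges.
                for target in targets:
                    outbox[target].append(rank[v] / len(targets))
            # After the last superstep the vertex sends nothing (it would vote to halt).
        inbox = outbox
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(web).items()):
    print(page, round(score, 3))
```

Page "c" receives links from both other pages, so it ends up with a higher score than "b", which is linked only from "a".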
40
[2]Shortest Path(1/6)
• Shortest-path problem: find the shortest path between two given nodes of a weighted graph
• There are several variants of the shortest-path problem
- The single-source shortest paths problem
- The s-t shortest path problem
- The all-pairs shortest paths problem
• This paper focuses on the single-source shortest paths problem
41
[2]Shortest Path(2/6)
[Figure: weighted graph with seven vertices (edge weights 5, 3, 1, 4, 3, 2, 1, 2, 4) at superstep 0; vertex distances: ∞ ∞ 0 ∞ ∞ ∞ ∞]
42
[2]Shortest Path(3/6)
[Figure: weighted graph with seven vertices (edge weights 5, 3, 1, 4, 3, 2, 1, 2, 4) at superstep 1; vertex distances: 5 ∞ 0 3 ∞ ∞ ∞]
43
[2]Shortest Path(4/6)
[Figure: weighted graph with seven vertices (edge weights 5, 3, 1, 4, 3, 2, 1, 2, 4) at superstep 2; vertex distances: 4 6 0 3 6 ∞ 5]
44
[2]Shortest Path(5/6)
[Figure: weighted graph with seven vertices (edge weights 5, 3, 1, 4, 3, 2, 1, 2, 4) at superstep 3; vertex distances: 4 5 0 3 6 9 5]
45
[2]Shortest Path(6/6)
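A minimal Python sketch of single-source shortest paths in the vertex-centric style: a vertex that learns a shorter distance updates its value and relaxes its outgoing edges, and every vertex then halts until a new message arrives. The function name `sssp` and the small example graph are assumptions for illustration (the graph is not the one in the figures), and edge weights are assumed to be non-negative.

```python
import math


def sssp(out_edges, source):
    """out_edges maps each vertex to a list of (target, weight) pairs."""
    dist = {v: math.inf for v in out_edges}
    inbox = {v: [] for v in out_edges}
    active = set(out_edges)            # every vertex is active in superstep 0
    while active:
        outbox = {v: [] for v in out_edges}
        for v in active:
            candidate = 0 if v == source else math.inf
            candidate = min([candidate] + inbox[v])
            if candidate < dist[v]:
                dist[v] = candidate
                for target, weight in out_edges[v]:
                    outbox[target].append(candidate + weight)
            # Each vertex votes to halt here; only a new message reactivates it.
        inbox = outbox
        active = {v for v, messages in inbox.items() if messages}
    return dist


graph = {"s": [("a", 5), ("b", 3)], "a": [("c", 2)], "b": [("a", 1)], "c": []}
print(sssp(graph, "s"))  # {'s': 0, 'a': 4, 'b': 3, 'c': 6}
```

In this run vertex "a" first records 5 via the direct edge and improves to 4 one superstep later via "b", which mirrors how tentative distances shrink from superstep to superstep in the figures.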
46
Outline
• Introduction
• Model of computation
• Pregel’s API
• Implementation
• Application
• Experiments
• Conclusion
47
Experiment details
• Three experiments using the single-source shortest paths implementation
• Run on a cluster of 300 multicore commodity PCs
• Runtimes are reported for binary trees and log-normal random graphs
- Binary tree, varying the number of worker tasks
- Binary tree, varying the graph size
- Log-normal random graphs, varying the graph size
48
[1]1 billion vertex binary tree: varying number of worker tasks
• Setting
- A binary tree with a billion vertices; the number of Pregel workers varies from 50 to 800
• Result
- Using 16 times as many workers yields a speedup of about 10
49
[2]Binary tree: varying graph sizes on 800 worker tasks
• Setting
- Graph size varies from a billion to 50 billion vertices, using a fixed number of 800 worker tasks
• Result
- As the tree grows from a billion to 50 billion vertices, the runtime increases from 17.3 to 702 seconds
50
[3]Log-normal random graphs: varying graph sizes on 800 worker tasks(1/2)
• Binary trees are not representative of graphs encountered in practice
• Use a log-normal distribution of outdegrees
• In this experiment, μ = 4, σ = 1.3
$$p(d) = \frac{1}{\sqrt{2\pi}\,\sigma d}\, e^{-(\ln d - \mu)^2 / 2\sigma^2}$$
51
[3]Log-normal random graphs: varying graph sizes on 800 worker tasks(2/2)
• Setting
- Graph size varies from 10 million to a billion vertices
• Result
- The largest graph took a little over 10 minutes
52
Outline
• Introduction
• Model of computation
• Pregel’s API
• Implementation
• Application
• Experiments
• Conclusion
53
Conclusion
• The authors propose a computing model that is suitable for graph processing and is scalable and fault-tolerant
• They report that programmers can implement graph processing algorithms easily with Pregel
54