Large-Scale Graph Processing ~Introduction~ (Complete Version)

@doryokujin — Hadoop Conference Japan 2011 Fall


DESCRIPTION

For large-scale graph processing, computation models more efficient than MapReduce have been proposed and are being implemented in projects such as Google Pregel, Giraph, Hama, and GoldenOrb. Hama and Giraph are also being ported to NextGen Apache Hadoop MapReduce. This lightning talk introduces what "Large-Scale Graph Processing" is by comparing it with MapReduce, and closes with the characteristics of each project.


Page 1: Large-Scale Graph Processing〜Introduction〜(完全版)

Large-Scale Graph Processing

@doryokujin — Hadoop Conference Japan 2011 Fall

~Introduction~

Page 2: Large-Scale Graph Processing〜Introduction〜(完全版)

・Takahiro Inoue (井上 敬浩), age 26

・twitter: doryokujin

・Data mining engineer

・Lead of MongoDB JP

・Interested in Hadoop, MongoDB, and graph databases

・Marathon best: 2 hours 33 minutes

About Me

Page 3: Large-Scale Graph Processing〜Introduction〜(完全版)

・Large-scale graph data is growing explosively, driven above all by social networking services

→ Analyzing graph data has become a necessity

・PageRank, popularity ranking, recommendation, shortest paths, friend search

→ Is MapReduce really the right model for distributed processing of large graphs?

Motivation: Why Graph?

http://www.catehuston.com/blog/2009/11/02/touchgraph/

Page 4: Large-Scale Graph Processing〜Introduction〜(完全版)

In an interview article on @IT dated 9/15, Doug Cutting mentioned Giraph

Hadoop MapReduce Design Patterns — Large-Scale Text Data Processing with MapReduce

Written by Jimmy Lin and Chris Dyer; editorial supervision by 神林 飛志 and 野村 直之; translated by 玉川 竜司

Scheduled for release on October 1, 2011; 210 pages; ¥2,940

Graph algorithms are covered in Chapter 5 of this MapReduce design patterns book

Motivation: Why Graph?

Page 5: Large-Scale Graph Processing〜Introduction〜(完全版)

MR: Good For Simple Problems

[Diagram: a single MapReduce job — Map tasks, shuffle, Reduce tasks, output to HDFS]

Page 6: Large-Scale Graph Processing〜Introduction〜(完全版)

MR: Bad for Iterative Problems

[Diagram: iterations i and i+1 each run as a separate MapReduce job — every iteration pays for shuffle & barrier, job start/shutdown, and reloading the data from HDFS]

Page 8: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: the example weighted directed graph with vertices A–G]

Is MR Fit for Graph Data?

Page 9: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: the graph at iteration i and i+1 — each vertex updates its tentative distance from its neighbours' messages, e.g. min(6, 4)]

Graph Processing = “Vertex Based Approach”

How do you express message passing between adjacent vertices in MapReduce?

Is MR Fit for Graph Data?

Page 10: Large-Scale Graph Processing〜Introduction〜(完全版)

BSP: Bulk Synchronous Parallel

a super step

http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel

Page 11: Large-Scale Graph Processing〜Introduction〜(完全版)

BSP: Bulk Synchronous Parallel

1. Local computation: each processor performs independent computation on its local data

2. Communication: processors exchange messages (message passing)

3. Barrier synchronisation: wait until every processor has finished its message passing

One "superstep" = steps 1–3; the computation is an iteration of supersteps (a minimal sketch follows below)

...
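
As a concrete illustration, here is a minimal, self-contained sketch of this superstep loop in Python (the Processor class, the ring topology, and the "propagate the minimum" example are mine, not from the slides):

class Processor:
    def __init__(self, pid, value):
        self.pid = pid
        self.value = value
        self.inbox = []

    def local_compute(self, num_procs):
        # 1. Local computation: fold the received messages into the local value.
        new_value = min([self.value] + self.inbox)
        changed = new_value < self.value
        self.value, self.inbox = new_value, []
        # 2. Communication: return the messages to deliver in the next superstep
        #    (sent only on change, so the computation can terminate).
        return [((self.pid + 1) % num_procs, new_value)] if changed else []

def run_bsp(procs):
    # Superstep 0: every processor announces its initial value to its neighbour.
    pending = [((p.pid + 1) % len(procs), p.value) for p in procs]
    while pending:
        # Deliver the previous superstep's messages ...
        for dst, msg in pending:
            procs[dst].inbox.append(msg)
        # ... then run local computation; 3. the barrier is implicit here because
        # the next delivery starts only after every local_compute() has returned.
        pending = [m for p in procs for m in p.local_compute(len(procs))]

if __name__ == "__main__":
    procs = [Processor(i, v) for i, v in enumerate([7, 3, 9, 5])]
    run_bsp(procs)
    print([p.value for p in procs])   # -> [3, 3, 3, 3]: all converge to the minimum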

Page 12: Large-Scale Graph Processing〜Introduction〜(完全版)

Relation: MR and BSP

Local Computation = Map Phase

Communication + Barrier = Shuffle and Sort Phase

Aggregation or (next) Local Computation = Reduce Phase

a super step

Page 13: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: nested computation models — MR inside Iterative MR inside BSP]

Relation: MR and BSP

Page 14: Large-Scale Graph Processing〜Introduction〜(完全版)

Graph, Matrix, Machine Learning

[Diagram: the same nesting (MR ⊂ Iterative MR ⊂ BSP) with Graph Processing, Matrix Computation, and Machine Learning (*1) placed inside]

*1: Many machine learning models have been proven to be expressible in MapReduce

Page 15: Large-Scale Graph Processing〜Introduction〜(完全版)

・Announced by Google in June 2009

- Applies BSP to graph processing

- 80% of their large-scale data processing uses MapReduce, 20% uses Pregel

- Solved a shortest-path problem on a graph with 1 billion nodes and 80 billion edges in 200 seconds on 480 machines in parallel

- Reportedly used for YouTube's graph-based recommendations

- The paper is also publicly available

Google Pregel

Page 16: Large-Scale Graph Processing〜Introduction〜(完全版)

・Input:
- A directed graph (each vertex and edge has a unique id and a mutable value)

・Each superstep S:
- Receive the messages sent during superstep S-1
- Compute(): apply a user-defined function to each vertex V
- May update V's local state and the graph's local topology
- Communication: send messages to adjacent vertices
- Barrier synchronisation: wait until all communication has finished
(A minimal sketch of the implied vertex API follows after this slide.)

Google Pregel Model
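
A rough sketch of the vertex API this model implies (the class below is illustrative only — it mirrors the style of the compute() example near the end of this deck, not Pregel's actual C++ API):

class Vertex:
    # Illustrative Pregel-style vertex base class (not the real API).
    def __init__(self, vertex_id, value, out_edges):
        self.vertex_id = vertex_id
        self.value = value            # mutable per-vertex state
        self.out_edges = out_edges    # [(target_id, edge_value), ...]
        self.active = True
        self.outgoing = []            # messages buffered for superstep S+1

    def send_message(self, target_id, message):
        # Communication: delivered to target_id at the next superstep.
        self.outgoing.append((target_id, message))

    def vote_to_halt(self):
        # The vertex becomes inactive until a new message arrives.
        self.active = False

    def compute(self, messages):
        # User-defined; called once per superstep with the messages from S-1.
        raise NotImplementedError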

Page 17: Large-Scale Graph Processing〜Introduction〜(完全版)

・Termination condition:
- A vertex with no further work to do sends a halt signal (Vote to Halt) and becomes inactive

- The computation terminates when every vertex is inactive at the same time

- and when no message passing is taking place at all (a sketch of this loop follows below)

Google Pregel Model

[State diagram: active ⇄ inactive — "Vote to Halt" makes a vertex inactive; receiving a message makes it active again]
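
A hedged sketch of the resulting master loop, reusing the attributes of the illustrative Vertex class sketched above (the loop itself is an assumption, not Pregel's implementation):

def run_supersteps(vertices):
    # Stop only when every vertex has voted to halt AND no messages are in flight.
    inboxes = {v.vertex_id: [] for v in vertices}
    superstep = 0
    while any(v.active for v in vertices) or any(inboxes.values()):
        for v in vertices:
            if inboxes[v.vertex_id]:
                v.active = True                  # a message reactivates the vertex
            if v.active:
                v.compute(inboxes[v.vertex_id])  # may send_message() / vote_to_halt()
            inboxes[v.vertex_id] = []
        # Barrier, then deliver this superstep's messages for superstep S+1.
        for v in vertices:
            for target_id, message in v.outgoing:
                inboxes[target_id].append(message)
            v.outgoing = []
        superstep += 1
    return superstep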

Page 18: Large-Scale Graph Processing〜Introduction〜(完全版)

Mapping BSP to Graph Processing

Local Computation -> Compute(), a user-defined function applied to each vertex

Communication -> message passing to adjacent vertices

Barrier synchronisation -> wait until message passing has finished for every vertex

a super step

Page 19: Large-Scale Graph Processing〜Introduction〜(完全版)

SSSP: Parallel BFS

MapReduce & Pregel

※ SSSP: Single Source Shortest Paths, BFS: Breadth First Search

Page 20: Large-Scale Graph Processing〜Introduction〜(完全版)

SSSP: Parallel BFS

MapReduce & Pregel

※ SSSP: Single Source Shortest Paths, BFS: Breadth First Search

Page 21: Large-Scale Graph Processing〜Introduction〜(完全版)

SSSP: MapReduce Model

[Diagram: the example graph; A is the source vertex]

initialize

・Load: Adjacency List

A: <(B,5),(D,3)>
B: <(E,1)>
C: <(F,5)>
D: <(B,1),(C,3),(E,4),(F,2)>
E: <>
F: <(G,4)>
G: <>

(A is the source vertex; one possible on-disk encoding of these initial records is sketched below.)
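
As a concrete illustration, one plausible way to materialize these initial records as text input for the first iteration (source distance 0, everything else infinite). The tab/colon layout and the helper below are assumptions, not from the slides:

INF = float("inf")

adjacency = {
    "A": [("B", 5), ("D", 3)],
    "B": [("E", 1)],
    "C": [("F", 5)],
    "D": [("B", 1), ("C", 3), ("E", 4), ("F", 2)],
    "E": [],
    "F": [("G", 4)],
    "G": [],
}

def initial_records(source="A"):
    # node_id TAB distance TAB comma-separated "neighbour:weight" list
    for node_id, edges in adjacency.items():
        dist = 0 if node_id == source else INF
        adj = ",".join(f"{nbr}:{w}" for nbr, w in edges)
        yield f"{node_id}\t{dist}\t{adj}"

for line in initial_records():
    print(line)    # e.g. "A<TAB>0<TAB>B:5,D:3", "B<TAB>inf<TAB>E:1", ...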

Page 22: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: current tentative distances — A = 0, all other vertices +∞]

・Map Input: [Graph Structure]

- <A: <0, (B,5),(D,3)>>

- <B: <∞, (E,1)>>

- <C: <∞, (F,5)>>

- <D: <∞, (B,1),(C,3),(E,4),(F,2)>>

- <E: <∞>>

- <F: <∞, (G,4)>>

- <G: <∞>>

SSSP: MapReduce Model

Page 23: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: current tentative distances — A = 0, all other vertices +∞]

・Map Output:

- (B,5),(D,3), <A: <0, (B,5),(D,3)>>

- (E,∞), <B: <∞, (E,1)>>

- (F,∞), <C: <∞, (F,5)>>

- (B,∞),(C,∞),(E,∞),(F,∞),

<D: <∞, (B,1),(C,3),(E,4),(F,2)>>

- <E: <∞>>

- (G,∞), <F: <∞, (G,4)>>

- <G: <∞>>

SSSP: MapReduce Model

The graph structure is also sent to the reducers; map output is flushed to local disk

Page 24: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: current tentative distances — A = 0, all other vertices +∞]

・Reduce Input:

[A] - <A: <0, (B,5),(D,3)>>

[B] - (B,5),(B,∞), <B: <∞, (E,1)>>

[C] - (C,∞), <C: <∞, (F,5)>>

[D] - (D,3),

<D: <∞, (B,1),(C,3),(E,4),(F,2)>>

[E] - (E,∞),(E,∞), <E,<∞>>

[F] - (F,∞),(F,∞), <F: <∞, (G,4)>>

[G] - (G,∞), <G: <∞>>

SSSP: MapReduce Model

Page 25: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: distances after this reduce — A = 0, B = 5, D = 3, all other vertices +∞]

SSSP: MapReduce Model

・Reduce Process:

[A] - <A: <0, (B,5),(D,3)>>

[B] - (B,5),(B,∞), <B: <∞, (E,1)>>

[C] - (C,∞), <C: <∞, (F,5)>>

[D] - (D,3),

<D: <∞, (B,1),(C,3),(E,4),(F,2)>>

[E] - (E,∞),(E,∞), <E,<∞>>

[F] - (F,∞),(F,∞), <F: <∞, (G,4)>>

[G] - (G,∞), <G: <∞>>

After the reduce, the output is flushed to HDFS

Page 26: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: current distances — A = 0, B = 5, D = 3, all other vertices +∞]

SSSP: MapReduce Model

・Map Input (Reduce Output):

- <A: <0, (B,5),(D,3)>>

- <B: <5, (E,1)>>

- <C: <∞, (F,5)>>

- <D: <3, (B,1),(C,3),(E,4),(F,2)>>

- <E,<∞>>

- <F: <∞, (G,4)>>

- <G: <∞>>

Page 27: Large-Scale Graph Processing〜Introduction〜(完全版)

SSSP: MapReduce Model

[Diagram: current distances — A = 0, B = 5, D = 3, all other vertices +∞]

・Map Output:

- (B,5),(D,3), <A: <0, (B,5),(D,3)>>

- (E,6), <B: <5, (E,1)>>

- (F,∞), <C: <∞, (F,5)>>

- (B,4),(C,6),(E,7),(F,5), <D: <3, (B,1),(C,3),(E,4),(F,2)>>

- <E,<∞>>

- (G,∞), <F: <∞, (G,4)>>

- <G: <∞>>

Map output is flushed to local disk

Page 28: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: distances after this reduce — A = 0, B = 4, C = 6, D = 3, E = 6, F = 5, G = +∞]

・Reduce Process:

[A] - <A: <0, (B,5),(D,3)>>

[B] - (B,5),(B,4), <B: <5, (E,1)>>

[C] - (C,6), <C: <∞, (F,5)>>

[D] - (D,3),

<D: <3, (B,1),(C,3),(E,4),(F,2)>>

[E] - (E,6),(E,7), <E, <∞>>

[F] - (F,∞),(F,5), <F: <∞, (G,4)>>

[G] - (G,∞), <G: <∞>>

SSSP: MapReduce Model

After the reduce, the output is flushed to HDFS

Page 29: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: current distances — A = 0, B = 4, C = 6, D = 3, E = 6, F = 5, G = +∞]

・Map Input (Reduce Output):

- <A: <0, (B,5),(D,3)>>

- <B: <4, (E,1)>>

- <C: <6, (F,5)>>

- <D: <3, (B,1),(C,3),(E,4),(F,2)>>

- <E: <6>>

- <F: <5, (G,4)>>

- <G: <∞>>

SSSP: MapReduce Model

Page 30: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: current distances — A = 0, B = 4, C = 6, D = 3, E = 6, F = 5, G = +∞]

・Map Output:

- (B,5),(D,3), <A: <0, (B,5),(D,3)>>

- (E,5), <B: <4, (E,1)>>

- (F,11), <C: <6, (F,5)>>

- (B,4),(C,6),(E,7),(F,5),

<D: <3, (B,1),(C,3),(E,4),(F,2)>>

- <E: <6>>

- (G,9), <F: <5, (G,4)>>

- <G: <∞>>

SSSP: MapReduce Model

Map output is flushed to local disk

Page 31: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: distances after this reduce — A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9]

・Reduce Process:

[A] - <A: <0, (B,5),(D,3)>>

[B] - (B,5),(B,4), <B: <4, (E,1)>>

[C] - (C,6), <C: <6, (F,5)>>

[D] - (D,3),

<D: <3, (B,1),(C,3),(E,4),(F,2)>>

[E] - (E,5), (E,7), <E, <6>>

[F] - (F,5),(F,11), <F: <5, (G,4)>>

[G] - (G,9), <G: <∞>>

SSSP: MapReduce Model

Page 32: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: final distances — A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9; no further updates, so the iteration ends]

SSSP: MapReduce Model

Page 33: Large-Scale Graph Processing〜Introduction〜(完全版)

class ShortestPathMapper(Mapper):
    def map(self, node_id, Node):
        # send the graph structure along so the reducer can rebuild it
        emit node_id, Node
        # get the node's current distance and add each edge distance to it
        dist = Node.get_value()
        for neighbour_node_id in Node.get_adjacency_list():
            dist_to_nbr = Node.get_distance(node_id, neighbour_node_id)
            emit neighbour_node_id, dist + dist_to_nbr

SSSP: MapReduce Model

Page 34: Large-Scale Graph Processing〜Introduction〜(完全版)

class ShortestPathReducer(Reducer):
    def reduce(self, node_id, dist_list):
        min_dist = sys.maxint
        for dist in dist_list:
            # dist_list also contains the Node (graph structure) itself
            if is_node(dist):
                Node = dist
            elif dist < min_dist:
                min_dist = dist
        # update only when a shorter distance was found (keeps the source at 0)
        if min_dist < Node.get_value():
            Node.set_value(min_dist)
        emit node_id, Node

SSSP: MapReduce Model
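
To see the whole iterative pipeline end to end, here is a small local simulation of the mapper/reducer above in plain Python (no Hadoop). The record tagging and the "count the updates" convergence check are my choices, but the per-iteration map -> shuffle -> reduce flow and the final distances match the slide trace:

from collections import defaultdict

INF = float("inf")
graph = {                                   # node -> (distance, adjacency list)
    "A": (0,   [("B", 5), ("D", 3)]),
    "B": (INF, [("E", 1)]),
    "C": (INF, [("F", 5)]),
    "D": (INF, [("B", 1), ("C", 3), ("E", 4), ("F", 2)]),
    "E": (INF, []),
    "F": (INF, [("G", 4)]),
    "G": (INF, []),
}

def map_phase(records):
    for node_id, (dist, adj) in records.items():
        yield node_id, ("NODE", (dist, adj))      # pass the graph structure along
        for nbr, w in adj:
            yield nbr, ("DIST", dist + w)         # tentative distance offers

def reduce_phase(shuffled):
    out, updated = {}, 0
    for node_id, values in shuffled.items():
        dist, adj = next(v for tag, v in values if tag == "NODE")
        best = min([d for tag, d in values if tag == "DIST"], default=INF)
        if best < dist:
            dist, updated = best, updated + 1
        out[node_id] = (dist, adj)
    return out, updated

records, updated = graph, 1
while updated:                                    # iterate until nothing changes
    shuffled = defaultdict(list)
    for key, value in map_phase(records):         # the shuffle groups by key
        shuffled[key].append(value)
    records, updated = reduce_phase(shuffled)

print({n: d for n, (d, _) in records.items()})
# -> {'A': 0, 'B': 4, 'C': 6, 'D': 3, 'E': 5, 'F': 5, 'G': 9}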

Page 35: Large-Scale Graph Processing〜Introduction〜(完全版)

・MapReduce:
- Struggles with "dense" graphs
- The graph structure has to be sent through the shuffle phase on every iteration
- Optimization techniques exist for the basic problems

・Pregel:
- Simple algorithms
- Only messages go over the network

MapReduce vs. Pregel

Page 36: Large-Scale Graph Processing〜Introduction〜(完全版)

・MapReduce:
- Struggles with "dense" graphs
- The graph structure has to be sent through the shuffle phase on every iteration
- ↑ For the basic graph problems, this can be optimized away

・Pregel:
- Simple algorithms
- Only messages go over the network

MapReduce vs. Pregel

Page 37: Large-Scale Graph Processing〜Introduction〜(完全版)

SSSP: MR Optimization

・Combiner:
- Cuts the amount of network traffic in the shuffle phase
- Applying the same logic as reduce() in combine() is enough (a sketch follows after this list)

・In-Mapper Combiner:
- Uses a buffer (hash map) during the map phase
- Emits when the buffer reaches its size limit or when the map task completes

・Shimmy trick:
- Reads the graph structure directly from HDFS instead of receiving it from the mappers
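
A hedged sketch of that Combiner, in the same pseudocode style as the ShortestPathMapper/ShortestPathReducer above (the pass-through handling of graph-structure records is my assumption):

class ShortestPathCombiner(Reducer):
    def combine(self, node_id, values):
        # same "take the minimum" logic as reduce(), applied to each mapper's
        # local output before the shuffle
        min_dist = sys.maxint
        for value in values:
            if is_node(value):
                emit node_id, value      # forward the graph structure untouched
            elif value < min_dist:
                min_dist = value
        if min_dist < sys.maxint:
            emit node_id, min_dist       # at most one tentative distance per node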

Page 38: Large-Scale Graph Processing〜Introduction〜(完全版)

# In-Mapper Combiner
class ShortestPathMapper(Mapper):
    def __init__(self):
        self.buffer = {}

    def check_and_put(self, key, value):
        # keep only the minimum distance seen so far for each key
        if key not in self.buffer or value < self.buffer[key]:
            self.buffer[key] = value

    def check_and_emit(self):
        # flush the buffer once it exceeds its size limit
        if is_exceed_limit_buffer_size(self.buffer):
            for key, value in self.buffer.items():
                emit key, value
            self.buffer = {}

    def close(self):
        # flush whatever remains when the map task finishes
        for key, value in self.buffer.items():
            emit key, value

Page 39: Large-Scale Graph Processing〜Introduction〜(完全版)

# ...continued
    def map(self, node_id, Node):
        # send the graph structure
        emit node_id, Node
        # get the node's current distance and add each edge distance to it
        dist = Node.get_value()
        for nbr_node_id in Node.get_adjacency_list():
            dist_to_nbr = Node.get_distance(node_id, nbr_node_id)
            dist_nbr = dist + dist_to_nbr
            self.check_and_put(nbr_node_id, dist_nbr)
            self.check_and_emit()

Page 40: Large-Scale Graph Processing〜Introduction〜(完全版)

Shimmy Trick

・Exploits the idea of a "parallel merge join"

[Diagram: both inputs sorted by join_key, split into partitions P1, P2, P3, and joined in parallel]

Page 41: Large-Scale Graph Processing〜Introduction〜(完全版)

Shimmy Trick

・Assume the vertices of graph G are ordered by sorted node_id

・Split G into G = G1 ∪ G2 ∪ ... ∪ Gn

・Make the number of reducers equal to the number of graph partitions, n

・Partitioner:
- Use the same partitioner in every iteration
- Guarantee that the set of node_ids sent to reducer Ri is always contained in the node_id set of graph partition Gi (see the sketch below)
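
A hedged sketch of such an iteration-stable partitioner; the sorted-range scheme and the RangePartitioner name are assumptions — any deterministic function works, as long as the graph partitions Gi were built with the same one:

import bisect

class RangePartitioner:
    def __init__(self, split_points):
        # split_points: the first node_id of partitions G2 .. Gn, in sorted order.
        self.split_points = split_points

    def get_partition(self, node_id, num_reducers):
        # num_reducers mirrors Hadoop's getPartition() signature; the ranges
        # already fix the number of partitions, so it is unused here.
        return bisect.bisect_right(self.split_points, node_id)

# Example: 3 partitions of A..G -> G1 = {A, B, C}, G2 = {D, E}, G3 = {F, G}.
# The same split points must be used in every iteration.
partitioner = RangePartitioner(split_points=["D", "F"])
assert partitioner.get_partition("B", 3) == 0
assert partitioner.get_partition("E", 3) == 1
assert partitioner.get_partition("G", 3) == 2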

Page 42: Large-Scale Graph Processing〜Introduction〜(完全版)

Shimmy Trick

・Mapper:
- Emit only (node_id, dist) pairs; do not emit the graph structure

・Reducer:
- The node_ids handed to a reducer arrive already sorted by the shuffle phase
- Their order matches the node_id order of the corresponding graph partition Gi
- Read the corresponding Gi sequentially from HDFS to pick up each node_id's graph structure, then emit the computed minimum distance paired with that structure

Page 43: Large-Scale Graph Processing〜Introduction〜(完全版)

Shimmy Trick

[Diagram: mappers emit only (node_id, [d1, d2, ...]) distance lists; each reducer receives the lists for its partition (id_1..id_10, id_11..id_20, ...) while streaming the matching graph partition (id_1, N1), (id_2, N2), ... sequentially from HDFS, and emits the updated nodes id_1, N1', id_2, N2', ... back to HDFS]

Page 44: Large-Scale Graph Processing〜Introduction〜(完全版)

# Shimmy trick
class ShortestPathReducer(Reducer):
    def __init__(self):
        P.open_graph_partition()

    def emit_precede_node(self, node_id):
        # stream the graph partition from HDFS; emit the nodes preceding node_id
        # unchanged, and return the structure of the matching node
        for pre_node_id, Node in P.read():
            if node_id == pre_node_id:
                return Node
            else:
                emit pre_node_id, Node

Page 45: Large-Scale Graph Processing〜Introduction〜(完全版)

# (...continued)
    def reduce(self, node_id, dist_list):
        Node = self.emit_precede_node(node_id)
        min_dist = sys.maxint
        for dist in dist_list:
            if dist < min_dist:
                min_dist = dist
        Node.set_value(min_dist)
        emit node_id, Node

Page 46: Large-Scale Graph Processing〜Introduction〜(完全版)

SSSP: Parallel BFS

MapReduce & Pregel

※ SSSP: Single Source Shortest Paths, BFS: Breadth First Search

Page 47: Large-Scale Graph Processing〜Introduction〜(完全版)

SSSP: Pregel Model

[Diagram: the graph partitioned across workers P1 and P2, current distances A = 0 and +∞ elsewhere; timeline: Compute()]

Compute(): each vertex computes from its own value and the messages received in the previous superstep

Page 48: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: same graph and distances; timeline: Compute() → Communicate]

SSSP: Pregel Model

Communicate: each vertex sends its updated value as a message along its outgoing edges

Page 49: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: same graph and distances; timeline: Compute() → Communicate → Barrier]

SSSP: Pregel Model

Barrier: wait until message passing has finished for every vertex

Page 50: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: distances after the second Compute() — A = 0, B = 5, D = 3, all others +∞; timeline: Compute() → Communicate → Barrier → Compute()]

SSSP: Pregel Model

Compute(): each vertex computes from its own value and the messages received in the previous superstep

Page 51: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: same distances; timeline continues with Communicate → Barrier]

SSSP: Pregel Model

Communicate & Barrier: send the updated values along the outgoing edges, then wait for all message passing to finish

Page 52: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: distances after the third Compute() — A = 0, B = 4, C = 6, D = 3, E = 6, F = 5, G = +∞; timeline continues with Compute()]

SSSP: Pregel Model

Page 53: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: same distances; timeline continues with Communicate → Barrier]

SSSP: Pregel Model

Page 54: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: distances after the fourth Compute() — A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9; timeline continues with Compute()]

SSSP: Pregel Model

Page 55: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: same distances; timeline continues with Communicate → Barrier]

SSSP: Pregel Model

Page 56: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: distances unchanged after the fifth Compute() — A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9]

SSSP: Pregel Model

Page 57: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: same distances; timeline continues with Communicate → Barrier]

SSSP: Pregel Model

Page 58: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: final distances — A = 0, B = 4, C = 6, D = 3, E = 5, F = 5, G = 9; no messages remain and every vertex has voted to halt, so the computation terminates]

SSSP: Pregel Model

Page 59: Large-Scale Graph Processing〜Introduction〜(完全版)

class ShortestPathVertex:
    def compute(self, msgs):
        min_dist = 0 if self.is_source() else sys.maxint
        # take the minimum over the values of all incoming messages
        for msg in msgs:
            min_dist = min(min_dist, msg.get_value())
        if min_dist < self.get_value():
            # update the current value (state)
            self.set_current_value(min_dist)
            # send the new value along every outgoing edge
            out_edge_iterator = self.get_out_edge_iterator()
            for out_edge in out_edge_iterator:
                recipient = out_edge.get_other_element(self.get_id())
                self.send_message(recipient.get_id(),
                                  min_dist + out_edge.get_distance())
        self.vote_to_halt()

Page 60: Large-Scale Graph Processing〜Introduction〜(完全版)

・MapReduce:
- Struggles with "dense" graphs
- Network traffic carries vertex state and the graph structure
- Can be optimized for the basic problems

・Pregel:
- Simple algorithms
- Only messages go over the network

MapReduce vs. Pregel

Page 61: Large-Scale Graph Processing〜Introduction〜(完全版)

Hama, Giraph,

GoldenOrb, Pregel

Page 62: Large-Scale Graph Processing〜Introduction〜(完全版)

               Hama       GoldenOrb        Giraph
Logo           (logo)     (logo)           (logo)
API            BSP        Pregel (Graph)   Pregel (Graph)
NextGen MR     Supported  ?                Supported
License        Apache     Apache           Apache
Infrastructure Required   Required         Not required (runs on Hadoop)

Hama, GoldenOrb, Giraph

Page 63: Large-Scale Graph Processing〜Introduction〜(完全版)

(Same comparison table as on the previous slide, annotated with the callouts below.)

Hama, GoldenOrb, Giraph

YARN support!

Hama covers BSP in general

Runs as an iteration of Map phases on Hadoop

A Graph API that follows Pregel

Page 64: Large-Scale Graph Processing〜Introduction〜(完全版)

[Diagram: the nesting MR ⊂ Iterative MR ⊂ BSP, with Graph Processing, Matrix Computation, and Machine Learning placed inside; Pregel sits in the Graph Processing region]

Hama covers BSP in general

Google Pregel, Giraph, and GoldenOrb specialize in Graph Processing

Hama, GoldenOrb, Giraph

Page 66: Large-Scale Graph Processing〜Introduction〜(完全版)

・Master
- Controls supersteps and faults
- Schedules jobs
- Manages workers

・Worker
- Task processor
- Runs with GFS and BigTable
- Communicates with other workers

[Fault Tolerance]
・Checkpointing
- At each superstep S:
  ・Workers: checkpoint V, E, and messages
  ・Master: checkpoints aggregators
- Saved to persistent (local) storage
・Node failure
- Detected via ping messages
- Reload the checkpoint and restart from superstep S

Pregel: Architecture

* Still under investigation

Page 67: Large-Scale Graph Processing〜Introduction〜(完全版)

※ A Map-only job in Hadoop + thread assignment

・Master:
- Uses an InputFormat for the graph
- Creates VertexSplitObjects
- Synchronizes supersteps
- Handles changes that occur within supersteps
- Multiple masters for fault tolerance

・Worker
- Reads vertices from VertexSplitObjects and splits them into VertexRanges
- Executes compute() for each vertex and buffers incoming messages
- Runs with HDFS

・Zookeeper

Apache Giraph: Architecture

[Fault Tolerance]
・Checkpointing
- Multiple masters and Zookeeper
- Same concept as Pregel

* Still under investigation

Page 68: Large-Scale Graph Processing〜Introduction〜(完全版)

・BSP Master (≒ JobTracker)
- Controls supersteps and faults
- Schedules jobs
- Manages Groom Servers

・Groom Server (≒ TaskTracker)
- BSP task processor
- Runs with HDFS and other distributed file systems

※ Hadoop RPC is used for BSPPeers to communicate with each other.

・Zookeeper
- Manages the barrier synchronisation of the BSPPeers

Apache Hama: Architecture

[The slide shows the first page of the paper "HAMA: An Efficient Matrix Computation with the MapReduce Framework" (Sangwon Seo, Edward J. Yoon, Jaehong Kim, Seongwook Jin, Jin-Soo Kim, Seungryoul Maeng — KAIST, NHN Corp., and Sungkyunkwan University), including Fig. 1, the overall architecture of HAMA: the HAMA API and HAMA Shell on top of the HAMA Core, pluggable computation engines (MapReduce, BSP, Dryad), storage systems (HDFS, File, RDBMS, HBase), and Zookeeper for distributed locking.]

* Still under investigation

http://wiki.apache.org/hama/Articles

Page 69: Large-Scale Graph Processing〜Introduction〜(完全版)

・For graph processing, both MR-based and BSP-based approaches exist

・Choose between them based on the graph's structure and the algorithm at hand

・For basic problems such as SSSP, techniques have been proposed that solve them efficiently with optimized MapReduce

・More complex problems can be implemented comparatively simply with BSP

・Giraph iterates only the Map phase of the Hadoop framework, so it has the best affinity with Hadoop

・Going forward, operational testing and benchmarks are needed

Summary

Page 70: Large-Scale Graph Processing〜Introduction〜(完全版)

・Large-scale graph computing at Google

・ Pregel: A System for Large-Scale Graph Processing

・Processing graph/relational data with Map-Reduce and Bulk Synchronous Parallel

・2010-Pregel

・Google Pregel and other massive graph distributed systems.

・Design patterns for efficient graph algorithms in MapReduce

・Apache HAMA: An Introduction to Bulk Synchronization Parallel on Hadoop

・2011.06.29. Giraph - Hadoop Summit 2011

・Graph Exploration with Apache Hadoop and MapReduce

・Graph Exploration with Apache Hama

・Shortest Path Finding with Apache Hama

References