Upload
wiley
View
58
Download
0
Embed Size (px)
DESCRIPTION
gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments. 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura The University of Tokyo. Barriers of Grid Environments. Grid = Multiple Clusters (LAN/WAN) Complex environment Dynamic node joins - PowerPoint PPT Presentation
Citation preview
gluepy:A Simple Distributed Python Programming Framework for Complex Grid Environments
8/1/08Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro TauraThe University of Tokyo
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Barriers of Grid Environments
• Grid = Multiple Clusters (LAN/WAN)• Complex environment– Dynamic node joins– Resource removal/failure• Network and nodes
– Connectivity• NAT/firewall
leave
join
Fire Wall
Grid enabled frameworks are crucial to facilitate computing in these environments
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
What type of applications?
• Typical Usage– Standalone jobs– No interaction among nodes
• Parallel and distributed Applications– Orchestrate nodes for a single application
• Map an existing application on the Grid– Requires complex interaction⇒frameworks must make it
simple and manageable
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Common Approaches(1)
• Programming-less– Batch Scheduler• Task placement (inter-cluster)• Transparent retries on failure
– Enables minimal interaction• Pass data via files/raw sockets• Embarrassingly parallel tasks• Very limited for application
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
execute
redo
SUBMIT
Common Approaches(2)
• Incorporate some user programming– e.g. : Master-Worker framework
• Program the master/worker(s)– Job distribution– Handling worker join/leave– Error handling
• Enables simple interaction– Still limited in application
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
error()
join()
doJob()
For more complex interaction (larger problem set)must allow more flexible/general programming
The most flexible approach
• Parallel Programming Languages– Extend existing languages: retains flexibility– Countless past examples
• (MultiLisp[Halstead ‘85], JavaRMI, ProActive[Huet et al. ‘04], …)
– Problem : not in context of the Grid• Node joins/leaves?• Resolve connectivity with NAT/firewall?
– Coding becomes complex/overwhelming
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Can we not complement this?
Our Contribution
• Grid-enabled distributed object-oriented framework– a focus on coping with complex environment
• Joins, failures, connectivity
– simple Programming & minimal Configuration• Simple tool to act as a glue for the Grid
– Implemented parallel applications on Grid environment with 900 cores (9 clusters)
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Agenda
• Introduction• Related Work• Proposal• Evaluation• Conclusion
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Programming-less frameworks
• Condor/DAGMan [Thain et al. ‘05]– Batch scheduler– Transparent retires/ handle multiple clusters– Extremely limited interaction among nodes
• Tasks with DAG dependencies• Pass on data using intermediate/scratch files
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Central Manager
Busy Nodes
Assign
Cluster
Interaction using files
Task
“Restricted” Programming frameworks• Master-Worker Model: Jojo2 [Aoki et al. ‘06], OmniRPC [Sato et al. ‘01], Ninf-C [Nakata et al. ‘04], NetSolve [Casanova et al. ‘96]
– Event driven master code: handle join/leave• Map-Reduce [Dean et al. ‘05]
– define 2 functions: map(), reduce()– Partial retires when nodes fail
• Ibis – Satin [Wrzesinska et al. ‘06]– Distributed divide-and-conquer– Random work stealing: accommodate join/leave
• Effective for specialized problem sets– Specialize on a problem/model, made mapping/programming easy– For “unexpected models”, users have to resort to out-of-band/Ad-hoc means
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Map()
Map()
Map()
Reduce()
Reduce()Input Data
Join
Failure Handler
Join Handler
dividefib(n)
fib(n-1)
Distributed Object Oriented frameworks
• ABCL [Yonezawa ‘90]JavaRMI, Manta [Maassen et al. ‘99]ProActive [Huet et al. ‘04]
• Distributed Object oriented– Disperse objects among resources
• Load delegation/distribution– Method invocations– RMI (Remote Method Invocation)– Async. RMIs for parallelism
• RMI: – good abstraction
• Extension of general language:– Allow flexible coding
foo.doJob(args)
RMI
compute
foo
Async. RMI
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Hurdles for DOO on the Grid• Race conditions
– Simultaneous RMIs on 1 object– Active Objects
• 1 object = 1 thread• Deadlocks:
e.g.: recursive calls
• Handling asynchronous events– e.g., handling node joins– Why not event driven?
• The flow of the program is segmented, and hard to flow
• Handling joins/failures– Difficult to handle them transparently
in a reasonable manner
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
b.f()
a.g()a
bdeadlock
event
if join: addif done: give more…
failure
Checkpoint?Automatic retry?…
Hurdles for Implementation• Connecivity with NAT/firewall
– Solution: Build an overlay
• Existing implementations– ProActive [Huet et al. ‘04]
• Tree topology overlay• User must hand write connectable points
– Jojo2 [Aoki et al. ‘06]• 2-level Hierarchical topology
– SSH / UDP broadcast• assumes network topology/setting
– out of user control
• Requirements– Minimal user burden
NAT
Firewall
Configure each link
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
ConnectionConfiguration File
Summarization of the Problems
• Distributed Object-Oriented on the Grid– Thread race conditions– Event handling– Node join/leave– underlying Connectivity
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Proposal : gluepy• Grid enabled distributed object oriented framework
– As a Python Library– glue together Grid resources via simple and flexible coding
• Resolve the issues in an object-oriented paradigm– SerialObjects
• define “ownership” for objects• blocking operations unblock on events
– Constructs for handling Node join/leave• Resolve the “first reference” problem• Failures are abstracted as exceptions
– Connectivity (NAT/firewall)• Peers automatically construct an overlay
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
The Basic Programming Model• RemoteObjects
– Created/mapped to a process– Accessible from other processes (RMI)– Passive Objects
• Threads are not bound to objects
• Thread– Simply to gain parallelism– RMIs / async. invocations (RMIs)
implicitly spawn a thread
• Future– Returned for async. invocation– placeholder for result– Uncaught exception is stored
and re-raised at collection
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
a
f()
Proc: A Proc: B
a.f() Spawn for
RMI
a
f()
Proc
Spawn for async
store in F
F = a.f() async
Programming in gluepy• Basics: RemoteObject
– Inherit Base class– Externally referenceable
• Async. invocation with futures– No explicit threads– Easier to maintain
sequential flow
• mutual exclusion? events? ⇒ SerialObjects
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
class Peer(RemoteObject): def run(self, arg): # work here… return result
futures = [] for p in peers: f = p.run.future(arg) futures.append(f) waitall(futures)
for f in futures: print f.get()
async. RMI run() on all
wait for all results
read for all results
inherit Remote Object
“ownership” with SerialObjects• SerialObjects
– Objects with mutual exclusion– RemoteObject sub-class
• No explicit locks• Ownership for each object
– call acquire⇒– return release⇒– Method execution by only 1 thread
• The “owner thread”
• Owner releases ownership onblocking operations– e.g: waitall(), RMI to other SerialObject– Pending threads contest for ownership– Arbitrary thread is scheduled– Eliminate deadlocks for recursive calls
Th
Th Th Th
object
newownerthread
Give-upOwner
ship
block
Th Th Th Th
object
unblock
re-contestfor ownership
ThTh Th Th
object ownerthread
waiting threads
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Signals to SerialObjects• We don’t want event-driven loops!
• Events → “signals”– Blocking op. unblock on signal
• Signals to objects– Unblock a thread blocking
in object’s context• If none, unblock a next blocking thread
– Unblocked thread can handlethe signal(event)
Th
object
unblock
SIGNAL
Th
object handle
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
SerialObjects in gluepy• e.g. : A Queue
– pop() • blocks on empty Queue
– add()• call signal() to unblock waiter
• Atomic Section:– Between blocking ops
in a method– Can update obj. attr.s
and do invocation on Non-Serial Objects
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
class DistQueue(SerialObject): def __init__(self): self.queue = []
def add(self, x): self.queue.append(x) if len(self.queue) == 1: self.signal()
def pop(self): while len(self.queue) == 0:
wait([])
x = self.queue.pop(0) return x
Atomic Section
Signal & wake
Block until signal
Managing dynamic resources• Node Join:
– Python process starts• Node leave:
– Process termination
• Constructs for node joins/leaves– Node Join
“⇒ first reference” problemObject lookup
• obtain ref. to existing objects in computation
– Node Leave ⇒ RMI exception
• Catch to handle failure
Exception!
Objects in computation
joining node
lookup
Object on failed node
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
e.g.:Master-worker in gluepy (1/3)
• Handles join/leave
• code for join :– join will invoke signal– signal will unblock main
master thread
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
class Master(SerialObject): ...
def nodeJoin(self, node): self.nodes.append(node) self.signal()
def run (self): assigned = {} while True: while len(self.nodes)>0 and
len(self.jobs)>0: ASYNC. RMIS TO IDLE WORKERS
readys = wait(futures) if readys == None: continue
for f in readys: HANDLE RESULTS
Signal for join
Block &Handle join
e.g. : Master-worker in gluepy (2/3)
• Failure handling– Exception on collection– Handle exception to
resubmit task
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
for f in readys: node, job = assigned.pop(f) try: print ”done:”, f.get() self.nodes.append(node) except RemoteException, e: self.jobs.append(job)
Failurehandling
e.g.: Master-worker in gluepy (3/3)
• Deployment– Master exports object– Workers get reference
and do RMI to join
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
worker = Worker()
master = RemoteRef(“master”)
master.nodeJoin(worker)
while True: sleep(1)
master = Master()
master.register(“master”)
master.run()
lookup on join
Worker init
Master init
Automatic Overlay Construction(1)• Solution for Connectivity
– Automatically construct an overlay
• TCP overlay– On boot, acquire other peer info.– Each node connects to a small
number of peers– Establish a connected connection
graph
NAT
Firewall
Global IP
Attempt connection
established connections
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Automatic Overlay Construction(2)
• Firewalled clusters– Automatic
port-forwarding– User configure SSH info
• Transparent routing– P-P communication is
routed– (AODV [Perkins ‘97])
SSH
Firewalltraversal
P-to-Pcommunication
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
#config fileuse src_pat dst_pat, prot=ssh, user=kenny
RMI failure detection on Overlay• Problem with overlay
– A route consists of a number of connections
• RMI failure ⇒ failure of any intermediateconnection
• Path Pointers– Recorded on each forwarding
node– RMI reply returns the path it came
• Failure of intermediate connection– The preceding forwarding node
back-propagates the failure
Path pointer
RMI handler
failure
Backpropagate
RMI invoker
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Agenda
• Introduction• Related Work• Proposal• Evaluation• Conclusion
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Experimental Environment
hongo:98
chiba:186
okubo:28
suzuk:72
imade:60kototoi:88
kyoto:70
istbs:316
tsubame:64
FirewallPrivate IPs
All packets droppedhiro:88
mirai:48
Global IPs
Max. scale : 9 clusters, over 900 cores
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
InTrigger Grid
Platform in Japan
InTrigger
requires SSH forwarding
Necessary Configuration
• Configuration necessary for Overlay– 2 clusters( tsubame, istbs) require SSH-
portforwarding to other clusters ⇒ 2 lines of configuration
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
# istbs cluster uses SSH for inter-cluster conn.use 133\.11\.23\. (?!133\.11\.23\.), prot=ssh, user=kenny
# tsubame cluster gateway uses SSH for inter-cluster conn.use 131.112.3.1 (?!172\.17\.), prot=ssh, user=kenny
add connection instruction by regular expression
Overlay Construction Simulation– Evaluate the overlay construction scheme– For different cluster configurations, modified number of attempted connections per peer– 1000 trials per each cluster/attempted connection configuration
0 5 10 15 20 25 30 350
0.2
0.4
0.6
0.8
1
1.2
Probability of Connected Graph
4 Global 3 Private2 Global 3 Private1 Global 3 Private
num. of attempted connections per peer
Prob
abili
ty
28 Global/ 238 Private Peers Case: 95 %
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Dynamic Master-Worker• Master object distributes work to Worker objects
– 10,000 tasks as RMI
• Workers repeat join/leave– Tasks for failed nodes are redistributed– No tasks were lost during the experiment
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
0 500 1000 1500 2000 2500 30000
100200300400500600700800900
Workers
Time[sec]
Tas
k/W
orke
r C
ount
A Real-life Application
• A combination optimization problem– Permutation Flow Shop Problem– parallel branch-and-bound• Master-Worker like• Requires periodic exchange of bounds
– Code• 250 lines of Python code as glue code• Worker node starts up sequential C++ code
– Communicate with local Python through pipes
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Master-Worker interaction
• Master does RMI to worker– Worker: periodical RMI to master– Not your typical master-worker– requires a flexible framework like ours
Master
Worker
doJob() exchange_bound()
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Performance
• Work Rate– ci : total comp. time per core– N: num. of cores– T: completion time
• Slight drop with 950 cores– due to master node
becoming overloaded
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
169
569
948
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Work RateC
PU C
ores
Troubleshoot Search Engine• Ever stuck debugging, or troubleshooting?
• Re-rank query results obtained from google– Use results from machine learning web-forums– Perform natural language processing on page contents
at query time• Use a Grid backend
– Computationally intensive– Require good response time
• in 10s of seconds
backend
Search Engine
Query:“vmware kernel panic”
Compute!!
Compute!!
Compute!!
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Troubleshoot Search Engine Overview
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
PythonCGI
async.doQuery()
doSearch()
async.doWork()
parsing
Graph extraction
rescoring
Leveraged sync/async RMIs to seamlessly integrate parallelism into a sequential program.
Merged CGIs with Grid backend
Agenda
• Introduction• Related Work• Proposal• Evaluation• Conclusion
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Conclusion• gluepy: Grid enabled distributed object oriented framework
– Supports simple and flexible coding for complex Grid• SerialObjects• Signal semantics• Object lookup / exception on RMI failure• Automatic overlay construction
– as a tool to glue together Grid resources simply and flexibly
• Implemented and evaluated applications on the Grid– Max. scale: 900 core (9 cluster)
• NAT/Firewall, with runtime joins/leaves– Parallelized real-life applications
• Take full advantage of gluepy constructs for seamless programming
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Questions?
• gluepy is available from its homepagewww.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Overlay Construction Simulation– Evaluate the overlay construction scheme– For different cluster configurations, modified number of attempted connections per peer– 1000 trials per each cluster/attempted connection configuration
0 5 10 15 20 25 30 350
0.2
0.4
0.6
0.8
1
1.2
Probability of Connected Graph
4 Global 3 Private2 Global 3 Private1 Global 3 Private
num. of attempted connections per peer
Prob
abili
ty
28 Global/ 238 Private Peers Case: 95 %
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Troubleshoot Search Engine Overview
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
RMIfromCGI
extraction
parsing
Graph extraction
rescoring
asynchronously return to CGI
Leverage async. RMI from CGI script to work distribution on the Grid
All coding done in Python seamlessly, using gluepy
async.RMI forquery
async.RMI to
workers
Utilization of the Grid• Grid
= Multiple Clusters (LAN/WAN)
• Typical Usage– Many stand-alone jobs in parallel– Little or No interaction among nodes
• Parallel and distributed Computing– Utilize nodes for a single application
• Parallelize an existing application– Requires complex interaction
⇒ Utilize Grid enabled frameworks
www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1
The demands on the Grid
• A framework that realizes flexible/complex interaction on the Grid
• Can we learn anything from parallel languages?
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
leave
join
Fire Wall
apply
App.
Speedup
100 200 300 400 500 600 700 800 900 10000
2000
4000
6000
8000
10000
12000
14000
16000
Completion Time
CPU Cores
Com
plet
ion
Tim
e [s
ec]
900 cores でスケールしなくなる
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
累積実行時間
A(169)
B(569)
C(948)
0 1000000 2000000 3000000 4000000 5000000 6000000
Total Comm.Total Comp.
累積計算時間の拡大が再実行による無駄が出ていることが分かる
累積計算時間を考慮すると、スピードアップは 169 cores 948 cores ⇒(5.64 倍 ) で 4.942008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
オーバレイに関するMicro-Benchmarks
• 1 ノードから RMI を発行– ほぼすべてのノードに対して 3hop 以内で到達
• Latency– Overlay 上で no-op の RMI : ping()
• Bandwidth– Overlay 上で大きい引数の RMI : send_data()
www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1
オーバレイ上の遅延• 1 ノードから 5 クラスタのノード上の
object へ RMI : ping()– ping で測定した RTT と比較
• overlay, 1 hop = ~1.5 [ms]
chiba1
01
chiba1
03
hiro00
1
hong
o100
hongo
102
imad
e001
kotot
oi000
kototoi002
kyoto
002
mirai00
0
mirai00
20
5
10
15
20
25
30
35
ping()RTT
Lat
ency
[m
s]L
aten
cy
[ms]
www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1
オーバレイ上のバンド幅• 引数 (de)serialization overhead 大– Full 1Gbit Ethernet で理想最大値: 40[MB/sec]– iperf 測定値から算出される最大値と比較– overlay で hop をするたびにバンド幅が減少• store-and-forward
chiba
101
chiba1
03
hiro00
1
hongo
100
hong
o102
imad
e001
kotot
oi000
kotot
oi002
kyoto
002
mirai00
0
mirai00
205
1015202530354045
Throughputiperf
Thr
ough
put [
MB
/s]
Overlay hop 数でクラスタ内でもバンド幅が変化
www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1
Master-worker in gluepy
• Full Master-Worker– 並列 RMI
• New RMI for idle workers– ノードの参加・脱退
• Non-event driven
2008/8/1www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
class Master(SerialObject): def __init__(self, jobs): self.nodes = [] self.jobs = jobs
def nodeJoin(self , node): self.nodes.append(node) self.signal()
def run (self): assigned = {} while True: while len(self.nodes)>0 and len(self.jobs)>0: node = self.nodes.pop() job = self.jobs.pop() f = node.doJob.future(job) assigned[f] = (node, job)
readys = wait(assigned.keys()) if readys == None: continue
for f in readys: node, job = assigned.pop(f) try: print ”done:”, f.get() self.nodes.append(node) except RemoteException, e: self.jobs.append(job)
Failurehandling
Signal for join
Block &Handle join
worker = Worker()
master = RemoteRef(“master”)
master.join(worker)
while True: sleep(1)
master = Master()
master.register(“master”)
master.run()
lookup on join
Worker init
Master init
Grid 上で並列分散計算の需要• 例: Web アプリケーション
• CGI とのインタラクション• タスク分割・負荷分散• 結果集約・ CGI へ結果加工
backend
Application
Publicly accessible
複雑な協調を必要とする
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
提案するモデルでのプログラミング• future で非同期 RMI
– 呼び出しに対するplaceholder
– いずれ結果が格納される• 明示的なスレッドは不要
– 別スレッドによる callbackなどはない
• RMI を処理する側は?2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
# 非同期 RMI 実行f1 = foo1.doJob.future(args)f2 = foo2.doJob.future(args)# 他の計算……# 返り値を取得。 Waitvalue1 = f1.get()value2 = f2.get()
プログラミングモデルが提供するもの• 柔軟性:既存のオブジェクト指向言語をライブラリ拡張• 並列分散計算に携わる諸問題を解決
– ロックを意識させない• オブジェクトの所有権を使った implicit な排他制御• block 時に所有権を implicit に委託する:デッドロック予防
– event-driven に陥らないイベント処理• 同期モデルと協調した signal セマンティクス
– 参加・脱退への対応• object lookup : 「最初の参照」問題• 故障に対する例外セマンティクス
• 分散した動的な環境 (Grid) を簡潔に扱えるモデル2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
資源間通信と協調の実現• Condor/DAGMan [Thain et al. ‘05]
– バッチスケジューラ– 「タスク」はスクリプトで表現– タスク間の依存関係は DAG
• Ibis / Satin [Wrzesinska et al. ‘06]– 分散環境で分割統治問題– 子タスクに分割– タスクには親子関係
DAG DependencyRelationship
Central Manager
Busy Nodes
Assign
Cluster
Task
- タスク間の協調は中間ファイル媒体- 依存関係のあるタスク間での通信に限定
www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1
参加・脱退の処理• JoJo2 [Aoki et al. ‘06]
– Master – Worker– Event driven– イベント毎にハンドラ定義
• タスクの終了• 参加• 脱退・故障
Join
Failure Handler
Join Handler
- 同期の問題- イベント・ドリブンは分かりにくい-提案する同期セマンティクスに非同期イベント通知機構を導入-Event-driven でない記述が可能
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Overlay Construction Simulation– Evaluate the overlay construction scheme– For different cluster configurations, modified number of attempted connections per peer– 1000 trials per each cluster/attempted connection configuration
0 5 10 15 20 25 30 350
0.2
0.4
0.6
0.8
1
1.2
Probability of Connected Graph
4 Global 3 Private2 Global 3 Private1 Global 3 Private
num. of attempted connections per peer
Prob
abili
ty
28 Global/ 238 Private Peers Case: 95 %
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Future Work
• Reliable WAN communication for the Grid overlays– Node failure– Connection failure
• Weakness of WAN connections– Router Policies• close connections after given period
– Obscure kernel bugs with NAT• Connection resets
Faulty link
WAN links are more vulnerable, and failures will occur2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Some Related Work• Robust Tree Topologies for
Sensor Networks [ England ‘06]– Create spanning tree for data
reduction– Flat tree for high reliability
• Fewest Hops– Tree with short distance for
low power consumption• Shortest Path
⇒ Spanning Tree that merges the two metrics for the best of two worlds
Fewest Hop:High ReliabilityHigh Power Usage
Shortest Path:Low ReliabilityLow Power Usage
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Possible Future Direction
• Our context: Grid computing– communication latency = metric for link reliability
• Fewest Hops– Reliability for node
failure• Shortest Distance– Reliability for link failure
Short reliable links
Long faulty links
Can we construct an overlay connection topology that take the best of two worlds?
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Publications• 1. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming
Framework for Complex Grid Environments. In 8th IEEE International Symposium on Cluster Computing and the Grid, May 2008 (Poster Paper. To Appear).
• 2. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In IPSJ Transactions on Advanced Computing Systems. (Conditional Accept)
• 3. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In Proceedings of 2008 Symposium on Advanced Computing Systems and Infrastructures. (To Appear)
• 4. Ken Hironaka, Shogo Sawai, Kenjiro Taura. A Distributed Object-Oriented Library for Computation Across Volatile Resources. In Summer United Workshops on Parallel, Distributed and Cooperative Processing. August 2007
• 5. Ken Hironaka, Kenjiro Taura, Takashi Chikayama. A Low-Stretch Object Migration Scheme for Wide-Area Environments. In IPSJ Transactions on Programming. Vol 48 No.SIG 12 (PRO 34), pp.28-40, August 2007
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Problems with Grid Computing (2)
• Complexity of Programming on the Grid– Low Level Computing (sockets)
• Communication• Multi-threaded Computing (Synchronization)• Heavy Burden on Non-experts
– Flexibility and Integration• Grid Frameworks for task distribution• Independent parallel programming languages• Computing is not execution of many independent tasks
– Need finer grained communication• Bad interface with user application
– Java, Ruby, Python, PHP
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Related Work
• Discussed with respect to criteria necessary for modern Grid computing– Workflow Coordination• Flexibility without putting the burden on the user
– Joining Nodes / Failure of resources• Handling these events should not dominate the
programming overhead– Connectivity in Wide-Area Networks• Adaptation to networks with NAT/firewall with little
manual settings
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Workflow Coordination (1)
• Condor / DAGMan [Thain ‘05]– “Tasks” are expressed as script files and
distributed on idle nodes– Dependency between tasks can be
expressed in DAG (Directed Acyclic Graph)
• Ibis / Satin [Wrzesinska ‘06]– framework for divide-and-conquer
problems– Tasks can be broken into smaller sub-
tasks, on which it depends
DAG DependencyRelationship
Central Manager
Busy Nodes
Assign
Cluster
Task
- Many computation cannot be expressed as “Tasks” with dependencies- A task’s communication is limited to others to which it has dependencies
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Object Synchronization Example
class A: def __init__(self, x): self.x = x
def f(self, b): self.x += 1
#blocking RMI b.g() self.x -= 1
return
Atomic section
Atomic section
a b
b.g()
Value x stays
consistent
In method f(), instance a invokes blocking method g() on object b
only 1 thread at a
time
give-upownershipduring RMI
block
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Adaptation to Dynamic Resources• Signal delivery to objects
– Unblocks any thread that is blocking in the object’s context• Can be used to notify
asynchronous events– A joining node
• Node Failure RMI Failure⇒– Failure returned as exception to
method invocation– The user can catch the
exception, and perform rollback procedures if necessary
exception
Th
object
blockunblock
signal
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
Preliminary Experiments
• Overlay Construction Simulation• A Simple Master-Worker Application with
dynamically joining/leaving nodes• A Real-life Parallel Application on the Grid• A Troubleshoot-Search Engine
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
hongo(98)
chiba(186)
okubo(28)
suzuk(72)
imade(60)kototoi (88)
kyoto(70)
istbs(316)
tsubame(64)
Global IPs
FirewallPrivate IPs
All packets dropped
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
hongo(98)
chiba(186)
okubo(28)
suzuk(72)
imade(60)kototoi (88)
kyoto(70)
istbs(316)
tsubame(64)
Global IPs
FirewallPrivate IPs
All packets dropped
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy
100 200 300 400 500 600 700 800 900 10000
2000
4000
6000
8000
10000
12000
14000
16000
Completion Time
CPU Cores
Com
plet
ion
Tim
e [s
ec]