72
gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura The University of Tokyo 2008/8/1 www.logos.ic.i.u- tokyo.ac.jp/~kenny/gluepy

gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

  • Upload
    wiley

  • View
    58

  • Download
    0

Embed Size (px)

DESCRIPTION

gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments. 8/1/08 Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura The University of Tokyo. Barriers of Grid Environments. Grid = Multiple Clusters (LAN/WAN) Complex environment Dynamic node joins - PowerPoint PPT Presentation

Citation preview

Page 1: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

gluepy:A Simple Distributed Python Programming Framework for Complex Grid Environments

8/1/08Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro TauraThe University of Tokyo

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 2: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Barriers of Grid Environments

• Grid = Multiple Clusters (LAN/WAN)• Complex environment– Dynamic node joins– Resource removal/failure• Network and nodes

– Connectivity• NAT/firewall

leave

join

Fire Wall

Grid enabled frameworks are crucial to facilitate computing in these environments

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 3: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

What type of applications?

• Typical Usage– Standalone jobs– No interaction among nodes

• Parallel and distributed Applications– Orchestrate nodes for a single application

• Map an existing application on the Grid– Requires complex interaction⇒frameworks must make it

simple and manageable

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 4: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Common Approaches(1)

• Programming-less– Batch Scheduler• Task placement (inter-cluster)• Transparent retries on failure

– Enables minimal interaction• Pass data via files/raw sockets• Embarrassingly parallel tasks• Very limited for application

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

execute

redo

SUBMIT

Page 5: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Common Approaches(2)

• Incorporate some user programming– e.g. : Master-Worker framework

• Program the master/worker(s)– Job distribution– Handling worker join/leave– Error handling

• Enables simple interaction– Still limited in application

 2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

error()

join()

doJob()

For more complex interaction (larger problem set)must allow more flexible/general programming

Page 6: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

The most flexible approach

• Parallel Programming Languages– Extend existing languages: retains flexibility– Countless past examples

• (MultiLisp[Halstead ‘85], JavaRMI, ProActive[Huet et al. ‘04], …)

– Problem : not in context of the Grid• Node joins/leaves?• Resolve connectivity with NAT/firewall?

– Coding becomes complex/overwhelming

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Can we not complement this?

Page 7: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Our Contribution

• Grid-enabled distributed object-oriented framework– a focus on coping with complex environment

• Joins, failures, connectivity

– simple Programming & minimal Configuration• Simple tool to act as a glue for the Grid

– Implemented parallel applications on Grid environment with 900 cores (9 clusters)

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 8: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Agenda

• Introduction• Related Work• Proposal• Evaluation• Conclusion

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 9: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Programming-less frameworks

• Condor/DAGMan [Thain et al. ‘05]– Batch scheduler– Transparent retires/ handle multiple clusters– Extremely limited interaction among nodes

• Tasks with DAG dependencies• Pass on data using intermediate/scratch files

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Central Manager

Busy Nodes

Assign

Cluster

Interaction using files

Task

Page 10: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

“Restricted” Programming frameworks• Master-Worker Model: Jojo2 [Aoki et al. ‘06], OmniRPC [Sato et al. ‘01],           Ninf-C [Nakata et al. ‘04], NetSolve [Casanova et al. ‘96]

– Event driven master code: handle join/leave• Map-Reduce [Dean et al. ‘05]

– define 2 functions: map(), reduce()– Partial retires when nodes fail

• Ibis – Satin [Wrzesinska et al. ‘06]– Distributed divide-and-conquer– Random work stealing: accommodate join/leave

• Effective for specialized problem sets– Specialize on a problem/model, made mapping/programming easy– For “unexpected models”, users have to resort to out-of-band/Ad-hoc means

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Map()

Map()

Map()

Reduce()

Reduce()Input Data

Join

Failure Handler

Join Handler

dividefib(n)

fib(n-1)

Page 11: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Distributed Object Oriented frameworks

• ABCL [Yonezawa ‘90]JavaRMI, Manta [Maassen et al. ‘99]ProActive [Huet et al. ‘04]

• Distributed Object oriented– Disperse objects among resources

• Load delegation/distribution– Method invocations– RMI (Remote Method Invocation)– Async. RMIs for parallelism

• RMI: – good abstraction

• Extension of general language:– Allow flexible coding

foo.doJob(args)

RMI

compute

foo

Async. RMI

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 12: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Hurdles for DOO on the Grid• Race conditions

– Simultaneous RMIs on 1 object– Active Objects

• 1 object = 1 thread• Deadlocks:

e.g.: recursive calls

• Handling asynchronous events– e.g., handling node joins– Why not event driven?

• The flow of the program is segmented, and hard to flow

• Handling joins/failures– Difficult to handle them transparently

in a reasonable manner

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

b.f()

a.g()a

bdeadlock

event

if join: addif done: give more…

failure

Checkpoint?Automatic retry?…

Page 13: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Hurdles for Implementation• Connecivity with NAT/firewall

– Solution: Build an overlay

• Existing implementations– ProActive [Huet et al. ‘04]

• Tree topology overlay• User must hand write connectable points

– Jojo2 [Aoki et al. ‘06]• 2-level Hierarchical topology

– SSH / UDP broadcast• assumes network topology/setting

– out of user control

• Requirements– Minimal user burden

NAT

Firewall

Configure each link

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

ConnectionConfiguration File

Page 14: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Summarization of the Problems

• Distributed Object-Oriented on the Grid– Thread race conditions– Event handling– Node join/leave– underlying Connectivity

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 15: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Proposal : gluepy• Grid enabled distributed object oriented framework

– As a Python Library– glue together Grid resources via simple and flexible coding

• Resolve the issues in an object-oriented paradigm– SerialObjects

• define “ownership” for objects• blocking operations unblock on events

– Constructs for handling Node join/leave• Resolve the “first reference” problem• Failures are abstracted as exceptions

– Connectivity (NAT/firewall)• Peers automatically construct an overlay

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 16: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

The Basic Programming Model• RemoteObjects

– Created/mapped to a process– Accessible from other processes (RMI)– Passive Objects

• Threads are not bound to objects

• Thread– Simply to gain parallelism– RMIs / async. invocations (RMIs)

implicitly spawn a thread

• Future– Returned for async. invocation– placeholder for result– Uncaught exception is stored

and re-raised at collection

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

a

f()

Proc: A Proc: B

a.f() Spawn for

RMI

a

f()

Proc

Spawn for async

store in F

F = a.f() async

Page 17: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Programming in gluepy• Basics: RemoteObject

– Inherit Base class– Externally referenceable

• Async. invocation with futures– No explicit threads– Easier to maintain

sequential flow

• mutual exclusion? events? ⇒ SerialObjects 

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

class Peer(RemoteObject): def run(self, arg): # work here… return result

futures = [] for p in peers: f = p.run.future(arg) futures.append(f) waitall(futures)

for f in futures: print f.get()

async. RMI run() on all

wait for all results

read for all results

inherit Remote Object

Page 18: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

“ownership” with SerialObjects• SerialObjects

– Objects with mutual exclusion– RemoteObject sub-class

• No explicit locks• Ownership for each object

– call acquire⇒– return release⇒– Method execution by only 1 thread

• The “owner thread”

• Owner releases ownership onblocking operations– e.g: waitall(), RMI to other SerialObject– Pending threads contest for ownership– Arbitrary thread is scheduled– Eliminate deadlocks for recursive calls

Th

Th Th Th

object

newownerthread

Give-upOwner

ship

block

Th Th Th Th

object

unblock

re-contestfor ownership

ThTh Th Th

object ownerthread

waiting threads

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 19: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Signals to SerialObjects• We don’t want event-driven loops!

• Events → “signals”– Blocking op. unblock on signal

• Signals to objects– Unblock a thread blocking

in object’s context• If none, unblock a next blocking thread

– Unblocked thread can handlethe signal(event)

Th

object

unblock

SIGNAL

Th

object handle

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 20: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

SerialObjects in gluepy• e.g. : A Queue

– pop() • blocks on empty Queue

– add()• call signal() to unblock waiter

• Atomic Section:– Between blocking ops

in a method– Can update obj. attr.s

and do invocation on Non-Serial Objects

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

class DistQueue(SerialObject): def __init__(self): self.queue = []

def add(self, x): self.queue.append(x) if len(self.queue) == 1: self.signal()

def pop(self): while len(self.queue) == 0:

wait([])

x = self.queue.pop(0) return x

Atomic Section

Signal & wake

Block until signal

Page 21: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Managing dynamic resources• Node Join:

– Python process starts• Node leave:

– Process termination

• Constructs for node joins/leaves– Node Join

“⇒ first reference” problemObject lookup

• obtain ref. to existing objects in computation

– Node Leave ⇒ RMI exception

• Catch to handle failure

Exception!

Objects in computation

joining node

lookup

Object on failed node

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 22: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

e.g.:Master-worker in gluepy (1/3)

• Handles join/leave

• code for join :– join will invoke signal– signal will unblock main

master thread

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

class Master(SerialObject): ...

def nodeJoin(self, node): self.nodes.append(node) self.signal()

def run (self): assigned = {} while True: while len(self.nodes)>0 and

len(self.jobs)>0: ASYNC. RMIS TO IDLE WORKERS

readys = wait(futures) if readys == None: continue

for f in readys: HANDLE RESULTS

Signal for join

Block &Handle join

Page 23: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

e.g. : Master-worker in gluepy (2/3)

• Failure handling– Exception on collection– Handle exception to

resubmit task

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

for f in readys: node, job = assigned.pop(f) try: print ”done:”, f.get() self.nodes.append(node) except RemoteException, e: self.jobs.append(job)

Failurehandling

Page 24: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

e.g.: Master-worker in gluepy (3/3)

• Deployment– Master exports object– Workers get reference

and do RMI to join

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

worker = Worker()

master = RemoteRef(“master”)

master.nodeJoin(worker)

while True: sleep(1)

master = Master()

master.register(“master”)

master.run()

lookup on join

Worker init

Master init

Page 25: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Automatic Overlay Construction(1)• Solution for Connectivity

– Automatically construct an overlay

• TCP overlay– On boot, acquire other peer info.– Each node connects to a small

number of peers– Establish a connected connection

graph

NAT

Firewall

Global IP

Attempt connection

established connections

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 26: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Automatic Overlay Construction(2)

• Firewalled clusters– Automatic

port-forwarding– User configure SSH info

• Transparent routing– P-P communication is

routed– (AODV [Perkins ‘97])

SSH

Firewalltraversal

P-to-Pcommunication

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

#config fileuse src_pat dst_pat, prot=ssh, user=kenny

Page 27: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

RMI failure detection on Overlay• Problem with overlay

– A route consists of a number of connections

• RMI failure ⇒ failure of any intermediateconnection

• Path Pointers– Recorded on each forwarding

node– RMI reply returns the path it came

• Failure of intermediate connection– The preceding forwarding node

back-propagates the failure

Path pointer

RMI handler

failure

Backpropagate

RMI invoker

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 28: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Agenda

• Introduction• Related Work• Proposal• Evaluation• Conclusion

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 29: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Experimental Environment

hongo:98

chiba:186

okubo:28

suzuk:72

imade:60kototoi:88

kyoto:70

istbs:316

tsubame:64

FirewallPrivate IPs

All packets droppedhiro:88

mirai:48

Global IPs

Max. scale : 9 clusters, over 900 cores

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

InTrigger Grid

Platform in Japan

InTrigger

requires SSH forwarding

Page 30: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Necessary Configuration

• Configuration necessary for Overlay– 2 clusters( tsubame, istbs) require SSH-

portforwarding to other clusters ⇒ 2 lines of configuration

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

# istbs cluster uses SSH for inter-cluster conn.use 133\.11\.23\. (?!133\.11\.23\.), prot=ssh, user=kenny

# tsubame cluster gateway uses SSH for inter-cluster conn.use 131.112.3.1 (?!172\.17\.), prot=ssh, user=kenny

add connection instruction by regular expression

Page 31: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Overlay Construction Simulation– Evaluate the overlay construction scheme– For different cluster configurations, modified number of attempted connections per peer– 1000 trials per each cluster/attempted connection configuration

0 5 10 15 20 25 30 350

0.2

0.4

0.6

0.8

1

1.2

Probability of Connected Graph

4 Global 3 Private2 Global 3 Private1 Global 3 Private

num. of attempted connections per peer

Prob

abili

ty

28 Global/ 238 Private Peers Case: 95 %

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 32: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Dynamic Master-Worker• Master object distributes work to Worker objects

– 10,000 tasks as RMI

• Workers repeat join/leave– Tasks for failed nodes are redistributed– No tasks were lost during the experiment

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

0 500 1000 1500 2000 2500 30000

100200300400500600700800900

Workers

Time[sec]

Tas

k/W

orke

r C

ount

Page 33: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

A Real-life Application

• A combination optimization problem– Permutation Flow Shop Problem– parallel branch-and-bound• Master-Worker like• Requires periodic exchange of bounds

– Code• 250 lines of Python code as glue code• Worker node starts up sequential C++ code

– Communicate with local Python through pipes

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 34: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Master-Worker interaction

• Master does RMI to worker– Worker: periodical RMI to master– Not your typical master-worker– requires a flexible framework like ours

Master

Worker

doJob() exchange_bound()

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 35: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Performance

• Work Rate– ci : total comp. time per core– N: num. of cores– T: completion time

• Slight drop with 950 cores– due to master node

becoming overloaded

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

169

569

948

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Work RateC

PU C

ores

Page 36: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Troubleshoot Search Engine• Ever stuck debugging, or troubleshooting?

• Re-rank query results obtained from google– Use results from machine learning web-forums– Perform natural language processing on page contents

at query time• Use a Grid backend

– Computationally intensive– Require good response time

• in 10s of seconds

backend

Search Engine

Query:“vmware kernel panic”

Compute!!

Compute!!

Compute!!

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 37: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Troubleshoot Search Engine Overview

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

PythonCGI

async.doQuery()

doSearch()

async.doWork()

parsing

Graph extraction

rescoring

Leveraged sync/async RMIs to seamlessly integrate parallelism into a sequential program.

Merged CGIs with Grid backend

Page 38: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Agenda

• Introduction• Related Work• Proposal• Evaluation• Conclusion

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 39: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Conclusion• gluepy: Grid enabled distributed object oriented framework

– Supports simple and flexible coding for complex Grid• SerialObjects• Signal semantics• Object lookup / exception on RMI failure• Automatic overlay construction

– as a tool to glue together Grid resources simply and flexibly

• Implemented and evaluated applications on the Grid– Max. scale: 900 core (9 cluster)

• NAT/Firewall, with runtime joins/leaves– Parallelized real-life applications

• Take full advantage of gluepy constructs for seamless programming

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 40: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Questions?

• gluepy is available from its homepagewww.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 41: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 42: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Overlay Construction Simulation– Evaluate the overlay construction scheme– For different cluster configurations, modified number of attempted connections per peer– 1000 trials per each cluster/attempted connection configuration

0 5 10 15 20 25 30 350

0.2

0.4

0.6

0.8

1

1.2

Probability of Connected Graph

4 Global 3 Private2 Global 3 Private1 Global 3 Private

num. of attempted connections per peer

Prob

abili

ty

28 Global/ 238 Private Peers Case: 95 %

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 43: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Troubleshoot Search Engine Overview

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

RMIfromCGI

extraction

parsing

Graph extraction

rescoring

asynchronously return to CGI

Leverage async. RMI from CGI script to work distribution on the Grid

All coding done in Python seamlessly, using gluepy

async.RMI forquery

async.RMI to

workers

Page 44: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Utilization of the Grid• Grid

= Multiple Clusters (LAN/WAN)

• Typical Usage– Many stand-alone jobs in parallel– Little or No interaction among nodes

• Parallel and distributed Computing– Utilize nodes for a single application

• Parallelize an existing application– Requires complex interaction

⇒ Utilize Grid enabled frameworks

www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1

Page 45: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

The demands on the Grid

• A framework that realizes flexible/complex interaction on the Grid

• Can we learn anything from parallel languages?

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

leave

join

Fire Wall

apply

App.

Page 46: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Speedup

100 200 300 400 500 600 700 800 900 10000

2000

4000

6000

8000

10000

12000

14000

16000

Completion Time

CPU Cores

Com

plet

ion

Tim

e [s

ec]

900 cores でスケールしなくなる

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 47: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

累積実行時間

A(169)

B(569)

C(948)

0 1000000 2000000 3000000 4000000 5000000 6000000

Total Comm.Total Comp.

累積計算時間の拡大が再実行による無駄が出ていることが分かる

累積計算時間を考慮すると、スピードアップは 169 cores 948 cores ⇒(5.64 倍 ) で 4.942008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 48: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

オーバレイに関するMicro-Benchmarks

• 1 ノードから RMI を発行– ほぼすべてのノードに対して 3hop 以内で到達

• Latency– Overlay 上で no-op の RMI : ping()

• Bandwidth– Overlay 上で大きい引数の RMI : send_data()

www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1

Page 49: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

オーバレイ上の遅延• 1 ノードから 5 クラスタのノード上の

object へ RMI : ping()– ping で測定した RTT と比較

• overlay, 1 hop = ~1.5 [ms]

chiba1

01

chiba1

03

hiro00

1

hong

o100

hongo

102

imad

e001

kotot

oi000

kototoi002

kyoto

002

mirai00

0

mirai00

20

5

10

15

20

25

30

35

ping()RTT

Lat

ency

[m

s]L

aten

cy

[ms]

www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1

Page 50: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

オーバレイ上のバンド幅• 引数 (de)serialization overhead 大– Full 1Gbit Ethernet で理想最大値: 40[MB/sec]– iperf 測定値から算出される最大値と比較– overlay で hop をするたびにバンド幅が減少• store-and-forward

chiba

101

chiba1

03

hiro00

1

hongo

100

hong

o102

imad

e001

kotot

oi000

kotot

oi002

kyoto

002

mirai00

0

mirai00

205

1015202530354045

Throughputiperf

Thr

ough

put [

MB

/s]

Overlay hop 数でクラスタ内でもバンド幅が変化

www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1

Page 51: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Master-worker in gluepy

• Full Master-Worker– 並列 RMI

• New RMI for idle workers– ノードの参加・脱退

• Non-event driven

2008/8/1www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

class Master(SerialObject): def __init__(self, jobs): self.nodes = [] self.jobs = jobs

def nodeJoin(self , node): self.nodes.append(node) self.signal()

def run (self): assigned = {} while True: while len(self.nodes)>0 and len(self.jobs)>0: node = self.nodes.pop() job = self.jobs.pop() f = node.doJob.future(job) assigned[f] = (node, job)

readys = wait(assigned.keys()) if readys == None: continue

for f in readys: node, job = assigned.pop(f) try: print ”done:”, f.get() self.nodes.append(node) except RemoteException, e: self.jobs.append(job)

Failurehandling

Signal for join

Block &Handle join

worker = Worker()

master = RemoteRef(“master”)

master.join(worker)

while True: sleep(1)

master = Master()

master.register(“master”)

master.run()

lookup on join

Worker init

Master init

Page 52: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Grid 上で並列分散計算の需要• 例: Web アプリケーション

• CGI とのインタラクション• タスク分割・負荷分散• 結果集約・ CGI へ結果加工

   

backend

Application

Publicly accessible

複雑な協調を必要とする

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 53: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

提案するモデルでのプログラミング• future で非同期 RMI

– 呼び出しに対するplaceholder

– いずれ結果が格納される• 明示的なスレッドは不要

– 別スレッドによる callbackなどはない

• RMI を処理する側は?2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

# 非同期 RMI 実行f1 = foo1.doJob.future(args)f2 = foo2.doJob.future(args)# 他の計算……# 返り値を取得。 Waitvalue1 = f1.get()value2 = f2.get()

Page 54: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

プログラミングモデルが提供するもの• 柔軟性:既存のオブジェクト指向言語をライブラリ拡張• 並列分散計算に携わる諸問題を解決

– ロックを意識させない• オブジェクトの所有権を使った implicit な排他制御• block 時に所有権を implicit に委託する:デッドロック予防

– event-driven に陥らないイベント処理• 同期モデルと協調した signal セマンティクス

– 参加・脱退への対応• object lookup : 「最初の参照」問題• 故障に対する例外セマンティクス

• 分散した動的な環境 (Grid) を簡潔に扱えるモデル2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 55: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

資源間通信と協調の実現• Condor/DAGMan [Thain et al. ‘05]

– バッチスケジューラ– 「タスク」はスクリプトで表現– タスク間の依存関係は DAG

• Ibis / Satin [Wrzesinska et al. ‘06]– 分散環境で分割統治問題– 子タスクに分割– タスクには親子関係

DAG DependencyRelationship

Central Manager

Busy Nodes

Assign

Cluster

Task

- タスク間の協調は中間ファイル媒体- 依存関係のあるタスク間での通信に限定

www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy2008/8/1

Page 56: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

参加・脱退の処理• JoJo2 [Aoki et al. ‘06]

– Master – Worker– Event driven– イベント毎にハンドラ定義

• タスクの終了• 参加• 脱退・故障

Join

Failure Handler

Join Handler

- 同期の問題- イベント・ドリブンは分かりにくい-提案する同期セマンティクスに非同期イベント通知機構を導入-Event-driven でない記述が可能

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 57: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Overlay Construction Simulation– Evaluate the overlay construction scheme– For different cluster configurations, modified number of attempted connections per peer– 1000 trials per each cluster/attempted connection configuration

0 5 10 15 20 25 30 350

0.2

0.4

0.6

0.8

1

1.2

Probability of Connected Graph

4 Global 3 Private2 Global 3 Private1 Global 3 Private

num. of attempted connections per peer

Prob

abili

ty

28 Global/ 238 Private Peers Case: 95 %

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 58: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Future Work

• Reliable WAN communication for the Grid overlays– Node failure– Connection failure

• Weakness of WAN connections– Router Policies• close connections after given period

– Obscure kernel bugs with NAT• Connection resets

Faulty link

WAN links are more vulnerable, and failures will occur2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 59: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Some Related Work• Robust Tree Topologies for

Sensor Networks [ England ‘06]– Create spanning tree for data

reduction– Flat tree for high reliability

• Fewest Hops– Tree with short distance for

low power consumption• Shortest Path

⇒ Spanning Tree that merges the two metrics for the best of two worlds

Fewest Hop:High ReliabilityHigh Power Usage

Shortest Path:Low ReliabilityLow Power Usage

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 60: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Possible Future Direction

• Our context: Grid computing– communication latency = metric for link reliability

• Fewest Hops– Reliability for node

failure• Shortest Distance– Reliability for link failure

Short reliable links

Long faulty links

Can we construct an overlay connection topology that take the best of two worlds?

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 61: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Publications• 1. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming

Framework for Complex Grid Environments. In 8th IEEE International Symposium on Cluster Computing and the Grid, May 2008 (Poster Paper. To Appear).

• 2. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In IPSJ Transactions on Advanced Computing Systems. (Conditional Accept)

• 3. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In Proceedings of 2008 Symposium on Advanced Computing Systems and Infrastructures. (To Appear)

• 4. Ken Hironaka, Shogo Sawai, Kenjiro Taura. A Distributed Object-Oriented Library for Computation Across Volatile Resources. In Summer United Workshops on Parallel, Distributed and Cooperative Processing. August 2007

• 5. Ken Hironaka, Kenjiro Taura, Takashi Chikayama. A Low-Stretch Object Migration Scheme for Wide-Area Environments. In IPSJ Transactions on Programming. Vol 48 No.SIG 12 (PRO 34), pp.28-40, August 2007

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 62: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 63: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 64: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Problems with Grid Computing (2)

• Complexity of Programming on the Grid– Low Level Computing (sockets)

• Communication• Multi-threaded Computing (Synchronization)• Heavy Burden on Non-experts

– Flexibility and Integration• Grid Frameworks for task distribution• Independent parallel programming languages• Computing is not execution of many independent tasks

– Need finer grained communication• Bad interface with user application

– Java, Ruby, Python, PHP

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 65: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Related Work

• Discussed with respect to criteria necessary for modern Grid computing– Workflow Coordination• Flexibility without putting the burden on the user

– Joining Nodes / Failure of resources• Handling these events should not dominate the

programming overhead– Connectivity in Wide-Area Networks• Adaptation to networks with NAT/firewall with little

manual settings

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 66: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Workflow Coordination (1)

• Condor / DAGMan [Thain ‘05]– “Tasks” are expressed as script files and

distributed on idle nodes– Dependency between tasks can be

expressed in DAG (Directed Acyclic Graph)

• Ibis / Satin [Wrzesinska ‘06]– framework for divide-and-conquer

problems– Tasks can be broken into smaller sub-

tasks, on which it depends

DAG DependencyRelationship

Central Manager

Busy Nodes

Assign

Cluster

Task

- Many computation cannot be expressed as “Tasks” with dependencies- A task’s communication is limited to others to which it has dependencies

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 67: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Object Synchronization Example

class A: def __init__(self, x): self.x = x

def f(self, b): self.x += 1

#blocking RMI b.g() self.x -= 1

return

Atomic section

Atomic section

a b

b.g()

Value x stays

consistent

In method f(), instance a invokes blocking method g() on object b

only 1 thread at a

time

give-upownershipduring RMI

block

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 68: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Adaptation to Dynamic Resources• Signal delivery to objects

– Unblocks any thread that is blocking in the object’s context• Can be used to notify

asynchronous events– A joining node

• Node Failure RMI Failure⇒– Failure returned as exception to

method invocation– The user can catch the

exception, and perform rollback procedures if necessary

exception

Th

object

blockunblock

signal

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 69: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

Preliminary Experiments

• Overlay Construction Simulation• A Simple Master-Worker Application with

dynamically joining/leaving nodes• A Real-life Parallel Application on the Grid• A Troubleshoot-Search Engine

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 70: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

hongo(98)

chiba(186)

okubo(28)

suzuk(72)

imade(60)kototoi (88)

kyoto(70)

istbs(316)

tsubame(64)

Global IPs

FirewallPrivate IPs

All packets dropped

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 71: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

hongo(98)

chiba(186)

okubo(28)

suzuk(72)

imade(60)kototoi (88)

kyoto(70)

istbs(316)

tsubame(64)

Global IPs

FirewallPrivate IPs

All packets dropped

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

Page 72: gluepy : A Simple Distributed Python Programming Framework for Complex Grid Environments

2008/8/1 www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy

100 200 300 400 500 600 700 800 900 10000

2000

4000

6000

8000

10000

12000

14000

16000

Completion Time

CPU Cores

Com

plet

ion

Tim

e [s

ec]