Parallel Processing with IPython

January 22, 2010


In this screencast, Travis Oliphant gives an introduction to IPython, an extremely useful tool for task-based parallel processing with Python.


Page 1: Parallel Processing with IPython

Parallel Processing with IPython

January 22, 2010

Page 2: Parallel Processing with IPython

Enthought Python Distribution (EPD)

MORE THAN SIXTY INTEGRATED PACKAGES

• Python 2.6

• Science (NumPy, SciPy, etc.)

• Plotting (Chaco, Matplotlib)

• Visualization (VTK, Mayavi)

• Multi-language Integration (SWIG, Pyrex, f2py, weave)

• Repository access

• Data Storage (HDF, NetCDF, etc.)

• Networking (twisted)

• User Interface (wxPython, Traits UI)

• Enthought Tool Suite (Application Development Tools)

Page 3: Parallel Processing with IPython

Enthought Training Courses

Python Basics, NumPy, SciPy, Matplotlib, Chaco, Traits, TraitsUI, …

Page 4: Parallel Processing with IPython

PyCon

http://us.pycon.org/2010/tutorials/

Introduction to Traits
Introduction to Enthought Tool Suite

Fantastic deal (normally $700 at PyCon; get the same material for $275)

Corran Webster

Page 5: Parallel Processing with IPython

Upcoming Training Classes

March 1 – 5, 2010: Python for Scientists and Engineers, Austin, Texas, USA

March 1 – 5, 2010: Python for Quants, London, UK

http://www.enthought.com/training/

Page 6: Parallel Processing with IPython


Parallel Processing with IPython

Page 7: Parallel Processing with IPython


IPython.kernel

• IPython's interactive kernel provides a simple (but powerful) interface for task-based parallel programming.

• Allows fast development and tuning of task-parallel algorithms to make better use of available resources.

Page 8: Parallel Processing with IPython


Getting started --- local cluster (manually)

UNIX and OSX (and now WINDOWS):

# run ipcluster to start up a
# controller and a set of engines
$ ipcluster local -n 4
Your cluster is up and running.

...

You can then cleanly stop the cluster from IPython using:

mec.kill(controller=True)

You can also hit Ctrl-C to stop it, or, from the command line, use:

kill -INT 20465

Creates several key-files in ~/.ipython/security :

ipcontroller-engine.furl ipcontroller-mec.furl ipcontroller-tc.furl

WINDOWS:

# run ipcontroller and then
# ipengine for each desired engine
> start /B C:\Python25\Scripts\ipcontroller.exe
> start /B C:\Python25\Scripts\ipengine.exe
> start /B C:\Python25\Scripts\ipengine.exe
> start /B C:\Python25\Scripts\ipengine.exe
...
2009-02-11 23:58:26-0600 [-] Log opened.
2009-02-11 23:58:28-0600 [-] Using furl file: C:\Documents and Settings\demo\_ipython\security\ipcontroller-engine.furl
2009-02-11 23:58:28-0600 [-] registered engine with id: 3
2009-02-11 23:58:28-0600 [-] distributing Tasks
2009-02-11 23:58:28-0600 [Negotiation,client] engine registration succeeded, got id: 3

Creates several key-files in %HOME%\_ipython\security :

ipcontroller-engine.furl ipcontroller-mec.furl ipcontroller-tc.furl

Page 9: Parallel Processing with IPython


Getting started -- distributed

• Run ipcontroller on a host to create .furl files

• Creates separate .furl files to be used by the different connections (engine, multiengine client, task client).

• Places .furl files by default in ~/.ipython/security (UNIX or Mac OSX) or %HOME%\_ipython\security (Windows).

• Takes --<connection>-furl-file=FILENAME options where <connection> is engine, multiengine, or task to place the .furl files somewhere else.

• Ensure the ipcontroller-engine.furl file is available to each host that will run an engine, and run ipengine on those hosts.

  • Either place it in the default security directory, or

  • Use the --furl-file=FILENAME option to ipengine

• Ensure the multiengine (task) .furl file is available to each host that will run a multiengine (task) client.

  • Either place it in the default security directory, or

  • Pass the FILENAME as the first argument to the client constructor (see the sketch below)
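A minimal sketch of those last two steps, assuming the controller's .furl files have been copied to a shared directory (the path below is hypothetical):

>>> from IPython.kernel import client
# point each client at an explicitly placed furl file
>>> mec = client.MultiEngineClient('/shared/security/ipcontroller-mec.furl')
>>> tc = client.TaskClient('/shared/security/ipcontroller-tc.furl')

and, on each engine host:

$ ipengine --furl-file=/shared/security/ipcontroller-engine.furl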

Page 10: Parallel Processing with IPython


Initialize client

>>> from IPython.kernel import client

MULTIENGINE CLIENT

# * allows fine-grained control
# * each engine has an id number
# * more intuitive for beginners
# optional argument can be the
# location of the mec furl-file
# created by the controller
>>> mec = client.MultiEngineClient()
>>> mec.get_ids()
[0, 1, 2, 3]

mec.map      -- parallel map
mec.parallel -- parallel function
mec.execute  -- execute in parallel
mec.push     -- push data
mec.pull     -- pull data
mec.scatter  -- spread out
mec.gather   -- collect back
mec.kill     -- kill engines and controller

TASK CLIENT

# * does not expose individual engines
# * presents a load-balanced,
#   fault-tolerant queue
# optional argument can be the
# location of the tc furl-file
# created by the controller
>>> tc = client.TaskClient()

tc.map             -- parallel map
tc.parallel        -- function decorator
tc.run             -- run Tasks
tc.get_task_result -- get result

client.MapTask    -- function-like
client.StringTask -- code-string
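As a quick illustration of the scatter/gather pair listed above, a minimal sketch (assuming the 4-engine mec created above):

>>> mec.scatter('a', range(16))           # split the list across engines
>>> mec.execute('b = [2*x for x in a]')   # each engine works on its piece
>>> mec.gather('b')                       # collect the pieces back in order
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]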

Page 11: Parallel Processing with IPython


MultiEngineClient

SCALAR FUNCTION:

# Using map
>>> def func(x):
...     return x**2.5 * (3*x - 2)
# standard map
>>> result = map(func, range(32))
# mec.map
>>> parallel_result = mec.map(func, range(32))

PARALLEL VECTORIZED FUNCTION:

# mec.parallel
>>> pfunc = mec.parallel()(func)

or using decorators:

@mec.parallel
def pfunc(x):
    return x**2.5 * (3*x - 2)

>>> parallel_result2 = pfunc(range(32))
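Both parallel forms should return the same values as the serial map; a quick sanity check, assuming the snippets above have run:

>>> list(parallel_result) == list(result)
True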

Page 12: Parallel Processing with IPython


TaskClient – Load Balancing

SCALAR FUNCTION:

# Using map
>>> def func(x):
...     return x**2.5 * (3*x - 2)
# standard map
>>> result = map(func, range(32))
# tc.map
>>> parallel_result = tc.map(func, range(32))

PARALLEL VECTORIZED FUNCTION:

# tc.parallel
>>> pfunc = tc.parallel()(func)

or using decorators:

@tc.parallel
def pfunc(x):
    return x**2.5 * (3*x - 2)

>>> parallel_result2 = pfunc(range(32))
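client.MapTask (listed on Page 10) is the task-queue analogue of a plain function call; a minimal sketch, assuming the 0.10-era signature MapTask(function, args):

# one queued function call per argument; any free engine picks it up
>>> tasks = [client.MapTask(func, (x,)) for x in range(32)]
>>> ids = [tc.run(t) for t in tasks]
>>> results = [tc.get_task_result(id) for id in ids]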

Page 13: Parallel Processing with IPython


MultiEngineClient

EXECUTE A CODE STRING IN PARALLEL

>>> from enthought.blocks.api import func2str
# decorator that turns python code into a string
>>> @func2str
... def code():
...     import numpy as np
...     a = np.random.randn(N, N)
...     eigs, vals = np.linalg.eig(a)
...     maxeig = max(abs(eigs))
>>> mec['N'] = 100
>>> result = mec.execute(code)
>>> print mec['maxeig']
[10.471428625885835, 10.322386155553213, 10.237638983818622, 10.614715948426941]
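(The dictionary-style access above is shorthand for the push/pull calls from Page 10: mec['N'] = 100 pushes N to every engine, and mec['maxeig'] pulls maxeig back from each of them, which is why four values print.)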

Page 14: Parallel Processing with IPython


TaskClient – Load Balancing Queue

EXECUTE A CODE STRING IN PARALLEL

>>> from enthought.blocks.api import func2str
# decorator that turns python code into a string
>>> @func2str
... def code():
...     import numpy as np
...     a = np.random.randn(N, N)
...     eigs, vals = np.linalg.eig(a)
...     maxeig = max(abs(eigs))
>>> task = client.StringTask(str(code), push={'N': 100}, pull='maxeig')
>>> ids = [tc.run(task) for i in range(4)]
>>> res = [tc.get_task_result(id) for id in ids]
>>> print [x['maxeig'] for x in res]
[10.439989436983467, 10.250842410862729, 10.040835983392991, 10.603885977189803]
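Because the queue is load balanced, the tasks need not be identical; a hypothetical variation that pushes a different N with each task:

>>> sizes = [50, 100, 200, 400]
>>> tasks = [client.StringTask(str(code), push={'N': n}, pull='maxeig')
...          for n in sizes]
>>> ids = [tc.run(t) for t in tasks]
>>> res = [tc.get_task_result(id) for id in ids]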

Page 15: Parallel Processing with IPython

Parallel FFT On Memory Mapped File

Processors   Time (seconds)   Speed Up
    1            11.75           1.0
    2             6.06           1.9
    4             3.36           3.5
    8             2.50           4.7
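The slide shows only the timings; a rough sketch of how such a run might be structured with the multiengine client (the file name, dtype, and shape below are hypothetical):

# each engine memory-maps the same file and FFTs only its own rows
>>> mec.scatter('rows', range(1024))      # split row indices across engines
>>> mec.execute("""
... import numpy as np
... data = np.memmap('data.bin', dtype='complex128', mode='r', shape=(1024, 4096))
... spectra = np.fft.fft(data[rows])
... """)
>>> result = mec.gather('spectra')        # reassemble the transformed rows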

Page 16: Parallel Processing with IPython

EPD: http://www.enthought.com/products/epd.php

Enthought Training: http://www.enthought.com/training/

Webinars: http://www.enthought.com/training/webinars.php