Parallel Python

  • Published on 14-Jul-2015

<ul><li><p>Parallel Python</p><p>12 10 9</p></li>
<li><p>Parallel Python (Nodes &amp; Client)</p></li>
<li><p>Python: first released in 1991</p></li>
<li><p>GIL: Global Interpreter Lock</p><p>In CPython, the GIL lets only one thread at a time execute Python bytecode.</p><p>As a result, multithreaded Python code cannot use more than one CPU core for CPU-bound work.</p></li>
<li><p>Multiprocessing</p><p>Process-based "threading" interface. New in Python version 2.6.</p><p>&gt;&gt;&gt; from multiprocessing</p></li>
<li><p>Parallel Python</p><p>, &amp; Multiprocessing Python</p><p>(BSD-like license)</p></li>
<li><p>Installing PP</p><p>Python 2.5</p><pre>user$ curl http://www.parallelpython.com/downloads/pp/pp-1.6.2.tar.gz | tar -zx
user$ cd pp-1.6.2
user$ sudo python setup.py install</pre></li>
<li><p>On the client</p>
<p>2) Import the pp module:</p>
<pre>import pp</pre>
<p>3) Create a list of all the nodes in your cluster (computers where you've run ppserver.py):</p>
<pre>ppservers = ("node-1", "node-2", "node-3")</pre>
<p>4) Start the pp execution server with the number of workers set to the number of processors in the system and the list of ppservers to connect to:</p>
<pre>job_server = pp.Server(ppservers=ppservers)</pre>
<p>5) Submit all the tasks for parallel execution:</p>
<pre>f1 = job_server.submit(func1, args1, depfuncs1, modules1)
f2 = job_server.submit(func1, args2, depfuncs1, modules1)
f3 = job_server.submit(func2, args3, depfuncs2, modules2)
...etc...</pre>
<p>6) Retrieve the results as needed:</p>
<pre>r1 = f1()
r2 = f2()
r3 = f3()</pre></li>
<li><p>On the nodes</p>
<p>1) Start the Parallel Python execution server on all your remote computational nodes:</p>
<pre>node-1&gt; ./ppserver.py
node-2&gt; ./ppserver.py
node-3&gt; ./ppserver.py</pre></li>
<li><pre>import sys, time, hashlib   # hashlib replaces the deprecated md5 module
import pp

def md5test(hash_value, start, end):
    # Return the first x in [start, end) whose MD5 hex digest equals hash_value
    for x in xrange(start, end):
        if hashlib.md5(str(x)).hexdigest() == hash_value:
            return x
    return None

# ppserver nodes to connect to, in addition to the local workers
ppservers = ("127.0.0.1", "1.234.80.127", "210.94.181.157",)

if len(sys.argv) &gt; 1:
    # Number of local workers given on the command line
    ncpus = int(sys.argv[1])
    job_server = pp.Server(ncpus, ppservers=ppservers)
else:
    # Default: one local worker per detected processor
    job_server = pp.Server(ppservers=ppservers)

print "Starting pp with", job_server.get_ncpus(), "workers"
hash_value = hashlib.md5("1829182").hexdigest()
print "hash =", hash_value

start_time = time.time()
start = 1
end = 2000000

# Split [start, end) into 128 chunks and submit one job per chunk
parts = 128
step = (end - start) // parts + 1
jobs = []

for index in xrange(parts):
    starti = start + index * step
    endi = min(start + (index + 1) * step, end)
    # ("hashlib",) is imported on each worker before md5test runs
    jobs.append(job_server.submit(md5test, (hash_value, starti, endi), (), ("hashlib",)))

# Calling a job blocks until that job's result is available
result = None
for job in jobs:
    result = job()
    if result:
        break

if result:
    print "Reverse md5 for", hash_value, "is", result
else:
    print "Reverse md5 for", hash_value, "has not been found"

print "Time elapsed: ", time.time() - start_time, "s"
job_server.print_stats()</pre></li>
<li><p>Hadoop VS PP</p><p>. VS VS 3GB VS . . VS .</p></li>
<li><p>Python: http://python.org</p><p>GIL: http://openlook.org:625/blog/2006/11/12/cb-1136/</p><p>PP: http://www.parallelpython.com</p><p>Hadoop VS Parallel Python: http://stackoverflow.com/questions/7701989/can-someone-explain-parallelpython-versus-hadoop-for-distributing-python-process</p></li></ul>
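The GIL slide's claim can be checked empirically. Below is a minimal sketch (Python 3.6+, standard library only; `count_down`, `run_sequential` and `run_threaded` are illustrative names, not from the slides) that runs the same pure-Python, CPU-bound loop once sequentially and once split across threads, returning both the results and the wall-clock time.

```python
import threading
import time


def count_down(n):
    # Pure-Python, CPU-bound loop: it needs the GIL for every bytecode it runs
    while n > 0:
        n -= 1
    return n


def run_sequential(n, times):
    # Run count_down `times` times, one after another
    start = time.perf_counter()
    results = [count_down(n) for _ in range(times)]
    return results, time.perf_counter() - start


def run_threaded(n, times):
    # Run the same work with one thread per call
    results = [None] * times

    def worker(i):
        results[i] = count_down(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(times)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    n, times = 5_000_000, 4
    _, t_seq = run_sequential(n, times)
    _, t_thr = run_threaded(n, times)
    print("sequential: %.2fs, threaded: %.2fs" % (t_seq, t_thr))
```

On CPython the threaded timing is typically no better than the sequential one, because only the thread holding the GIL executes bytecode; the exact numbers depend on the machine and interpreter version.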
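For comparison with the pp example, here is a sketch of the same reverse-MD5 search using the standard-library `multiprocessing` module, limited to the local machine (no remote nodes). It assumes Python 3.3+ (for `Pool` as a context manager), and `split_range`, `md5_search` and `reverse_md5` are hypothetical helper names. The chunking reuses the step formula from the pp example but skips empty trailing chunks.

```python
import hashlib
from multiprocessing import Pool


def split_range(start, end, parts):
    """Split [start, end) into at most `parts` contiguous chunks, using the
    pp example's step formula (integer division, then +1)."""
    step = (end - start) // parts + 1
    chunks = []
    for index in range(parts):
        lo = start + index * step
        hi = min(start + (index + 1) * step, end)
        if lo < hi:  # the +1 can leave empty chunks at the end; skip them
            chunks.append((lo, hi))
    return chunks


def md5_search(task):
    """Worker: return the first x in [lo, hi) whose MD5 hex digest matches."""
    target, lo, hi = task
    for x in range(lo, hi):
        if hashlib.md5(str(x).encode("ascii")).hexdigest() == target:
            return x
    return None


def reverse_md5(target, start, end, parts=128, processes=None):
    # One task per chunk; processes=None means one worker per CPU
    tasks = [(target, lo, hi) for lo, hi in split_range(start, end, parts)]
    with Pool(processes) as pool:
        # imap yields results in task order, so we can stop at the first hit
        for result in pool.imap(md5_search, tasks):
            if result is not None:
                return result
    return None


if __name__ == "__main__":
    target = hashlib.md5(b"1829182").hexdigest()
    print("Reverse md5 for", target, "is", reverse_md5(target, 1, 2000000))
```

Because `pool.imap` returns results in submission order, the loop stops at the first chunk that contains a match, like the `for job in jobs` loop in the pp version. Leaving the `with` block terminates the workers that are still running.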
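The client-side pattern (submit every task first, then collect each result only when it is needed) is not specific to pp. As an illustration only, here is the same shape with the standard-library `concurrent.futures` module (Python 3.2+). `func1` and `func2` are placeholder functions standing in for the slide's `func1`/`func2`. Unlike pp's `submit`, which also takes `depfuncs` and `modules` so code can be shipped to remote ppserver nodes, an executor runs inside the local interpreter, and a result is fetched with `future.result()` rather than by calling the job object.

```python
from concurrent.futures import ThreadPoolExecutor


def func1(x):
    # Placeholder task: square a number
    return x * x


def func2(a, b):
    # Placeholder task: add two numbers
    return a + b


with ThreadPoolExecutor(max_workers=4) as executor:
    # Step 5: submit all tasks up front; each submit returns a Future immediately
    f1 = executor.submit(func1, 3)
    f2 = executor.submit(func1, 4)
    f3 = executor.submit(func2, 5, 6)

    # Step 6: block on each result only when it is needed
    r1 = f1.result()
    r2 = f2.result()
    r3 = f3.result()

print(r1, r2, r3)  # prints: 9 16 11
```

For CPU-bound functions, `ProcessPoolExecutor` would be the closer analogue to pp's worker processes, since threads in one interpreter share a single GIL.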
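The chunking arithmetic in the pp example, worked through for its values (start = 1, end = 2000000, parts = 128), with `end` exclusive as in `xrange(start, end)`:

$$
\text{step} = \left\lfloor \frac{\text{end} - \text{start}}{\text{parts}} \right\rfloor + 1
= \left\lfloor \frac{1999999}{128} \right\rfloor + 1 = 15624 + 1 = 15625
$$

$$
\text{chunk}_i = [\,\text{start} + i \cdot \text{step},\ \min(\text{start} + (i+1) \cdot \text{step},\ \text{end})\,), \qquad i = 0, \dots, 127
$$

$$
\text{chunk}_{127} = [\,1 + 127 \cdot 15625,\ \min(1 + 128 \cdot 15625,\ 2000000)\,) = [\,1984376,\ 2000000\,)
$$

Since 128 × 15625 = 2,000,000 ≥ 1,999,999, the +1 in the step guarantees that the chunks cover every value from 1 to 1,999,999 with no gaps. The test value 1829182 falls in chunk 117, i.e. [1828126, 1843751), so the result loop in the example has to wait for jobs 0 to 116 to finish before it reaches the match.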
