Georgia Advanced Computing Resource Center
EITS/University of Georgia
Zhuofei Hou, [email protected]
Python on GACRC Computing Resources
10/26/2016 PYTHON ON GACRC COMPUTING RESOURCES
Outline
• GACRC
• Python Overview
• Python on Clusters
• Python Packages on Clusters
• Run Python Interactively on Clusters
• Run Python Batch Job on Clusters
GACRC
Who Are We?
• Georgia Advanced Computing Resource Center
• Collaboration between the Office of Vice President for Research (OVPR) and the Office of the Vice President for Information Technology (OVPIT)
• Guided by a faculty advisory committee (GACRC-AC)
Why Are We Here?
• To provide computing hardware and network infrastructure in support of high-performance computing (HPC) at UGA
Where Are We?
• http://gacrc.uga.edu (Web)
• http://wiki.gacrc.uga.edu (Wiki)
• http://gacrc.uga.edu/help/ (Web Help)
• https://wiki.gacrc.uga.edu/wiki/Getting_Help (Wiki Help)
Python Overview – Language
• Open source general-purpose scripting language (https://www.python.org/)
• Supports procedural, object-oriented, and functional programming
• Glue language with interfaces to C/C++ (via SWIG), Objective-C (via PyObjC), Java (via Jython), Fortran (via F2PY), etc.
(https://wiki.python.org/moin/IntegratingPythonWithOtherLanguages)
• Mainstream version is 2.7.x; new version is 3.5.x (as of March 2016)
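As a small illustration of the three programming styles listed above, the sketch below computes a sum of squares procedurally, with a class, and functionally. The names `sum_squares_loop`, `SquareSummer`, and `sum_squares_functional` are made up for this example; it should run under both Python 2.7 and Python 3.

```python
from functools import reduce


def sum_squares_loop(values):
    """Procedural style: an explicit loop and an accumulator."""
    total = 0
    for v in values:
        total += v * v
    return total


class SquareSummer(object):
    """Object-oriented style: data and behavior live in one object."""

    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum_squares_loop(self.values)


def sum_squares_functional(values):
    """Functional style: map and reduce, no mutation of local state."""
    return reduce(lambda acc, sq: acc + sq, map(lambda v: v * v, values), 0)


if __name__ == "__main__":
    data = [1, 2, 3]
    print(sum_squares_loop(data))
    print(SquareSummer(data).total())
    print(sum_squares_functional(data))
```

All three calls return the same value for the same input; which style to use is mostly a matter of readability for the task at hand.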
Python Overview – Modules
• Python has a large collection of built-in modules included in standard distributions:
https://docs.python.org/2/py-modindex.html
https://docs.python.org/3/py-modindex.html
• Many third-party packages for scientific computing:
  NumPy
  SymPy
  SciPy
  Biopy
  Matplotlib
Python Overview – Scientific Modules
NumPy: MATLAB-like capabilities, fast N-D array operations, linear algebra, etc. (http://www.numpy.org/)
SciPy: Fundamental library for scientific computing (http://www.scipy.org/)
SymPy: Symbolic mathematics (http://www.sympy.org/en/index.html)
Matplotlib: High-quality plotting (http://matplotlib.org/)
Biopy: Phylogenetic exploration (https://code.google.com/archive/p/biopy/)
A scientific Python distribution may include all those packages for you!
Python Overview – Scientific Distributions
• Anaconda
“A Python distribution including ~200 of the most popular Python packages for science, math, engineering, and data analysis.”
Supports Linux, Mac and Windows (https://www.continuum.io/)
• Python(x,y)
Windows only (http://python-xy.github.io/)
• WinPython
Windows only (http://winpython.github.io/)
Python on Clusters
• Python
https://wiki.gacrc.uga.edu/wiki/Python
https://wiki.gacrc.uga.edu/wiki/Python-Sapelo
• Anaconda Python
https://wiki.gacrc.uga.edu/wiki/Anaconda
https://wiki.gacrc.uga.edu/wiki/Anaconda-Sapelo
Python on zcluster
Version          Installation Path        Invoke Command
2.4.3 (default)  /usr/bin                 python
2.7.2*           /usr/local/python/2.7.2  python2.7
2.7.8            /usr/local/python/2.7.8  python2.7.8
3.3.0            /usr/local/python/3.3.0  python3
3.4.0            /usr/local/python/3.4.0  python3.4
* Most Python site-packages installed by GACRC on zcluster are for version 2.7.2
Python on Sapelo
Version          Installation Path             Module Load               Invoke Command
2.6.6 (default)  /usr/bin                      -                         python
2.7.8            /usr/local/apps/python/2.7.8  module load python/2.7.8  python
3.4.3            /usr/local/apps/python/3.4.3  module load python/3.4.3  python3
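Because a plain `python` resolves to the system default (2.4.3 on zcluster, 2.6.6 on Sapelo) unless a module is loaded or PATH is changed, a script can check the interpreter version before doing any work. The following is a minimal sketch; the 2.7 minimum and the helper name `meets_minimum` are assumptions for illustration, and the code deliberately avoids newer syntax so that an old interpreter can still reach the check.

```python
import sys

MINIMUM = (2, 7)  # assumption: this script needs at least Python 2.7


def meets_minimum(version_info, minimum=MINIMUM):
    """Return True if version_info (e.g. sys.version_info) is >= minimum."""
    return tuple(version_info[:len(minimum)]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum(sys.version_info):
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit("Python %d.%d or newer is required; found %s"
                 % (MINIMUM[0], MINIMUM[1], sys.version.split()[0]))
    print("Running under Python " + sys.version.split()[0])
```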
Anaconda Python on zcluster
Version  Installation Path            Python Version  Export                                             Invoke Command
2.3.0    /usr/local/anaconda/2.3.0    2.7.11          export PATH=/usr/local/anaconda/2.3.0/bin:$PATH    python
3-2.2.0  /usr/local/anaconda/3-2.2.0  3.4.3           export PATH=/usr/local/anaconda/3-2.2.0/bin:$PATH  python
Anaconda Python on Sapelo
Version  Installation Path                 Python Version  Module Load                   Invoke Command
2.2.0    /usr/local/apps/anaconda/2.2.0    2.7.12          module load anaconda/2.2.0    python
2.5.0    /usr/local/apps/anaconda/2.5.0    2.7.11          module load anaconda/2.5.0    python
3-2.2.0  /usr/local/apps/anaconda/3-2.2.0  3.4.3           module load anaconda/3-2.2.0  python
Python Packages on Clusters
• Python Packages
https://wiki.gacrc.uga.edu/wiki/Python
https://wiki.gacrc.uga.edu/wiki/Python-Sapelo
• Anaconda Python Packages
https://wiki.gacrc.uga.edu/wiki/Anaconda
https://wiki.gacrc.uga.edu/wiki/Anaconda-Sapelo
Python Packages on Clusters
How do you know whether the package you need is already installed on the clusters?
1. python -c 'import pkgName; print pkgName.__version__'
2. conda list pkgName
3. python -m pip list | grep pkgName
Examples:
zcluster
1. python2.7 -c 'import numpy; print numpy.__version__'
2. python2.7.8 -m pip list | grep numpy
3. export PATH=/usr/local/anaconda/2.3.0/bin:$PATH
   conda list numpy
Sapelo
1. module load python/2.7.8
   python -c 'import numpy; print numpy.__version__'
2. python -m pip list | grep numpy
3. module load anaconda/3-2.2.0
   conda list numpy
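The same check can be done from inside Python for several packages at once. The sketch below is one possible way to do it with the standard `importlib` module (available from Python 2.7); the helper name `package_version` and the package list are chosen for illustration only, and some packages do not define `__version__` at all.

```python
import importlib


def package_version(name):
    """Return a package's __version__ as a string.

    Returns None if the package cannot be imported, and "unknown" if it
    imports but does not define __version__.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return str(getattr(module, "__version__", "unknown"))


if __name__ == "__main__":
    for pkg in ("numpy", "scipy", "matplotlib"):
        version = package_version(pkg)
        print("%-12s %s" % (pkg, version if version else "not installed"))
```

The result depends on which interpreter runs the script, so run it with the same python (and the same module load or PATH export) that the job will use.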
Common Python Packages on zcluster
Package       python2.7      python2.7.8    python3.4      Anaconda2.3.0   Anaconda3-2.2.0
              (python2.7.2)  (python2.7.8)  (python3.4.0)  (python2.7.11)  (python3.4.3)
Numpy         1.11.0         1.10.1         1.9.1          1.10.2          1.10.2
Scipy         0.10.1         0.14.1         n/a            0.15.1          0.15.1
Biopython     1.65           1.67           n/a            1.66            1.66
Matplotlib    1.3.1          1.3.1          1.3.1          1.4.3           1.5.0
Cython        0.16           0.19.1         0.19.1         0.23.2          0.23.4
Pandas        0.17.0         n/a            n/a            0.17.1          0.15.2
Scikit-image  n/a            0.10.1         n/a            0.11.3          0.11.2
Scikit-learn  0.15.2         0.17           n/a            0.16.1          0.15.2
Networkx      2.0.dev        1.11           1.11           1.9.1           1.9.1
Requests      2.5.1          2.8.0          n/a            2.9.0           2.9.0
Common Python Packages on Sapelo
Package       python         python3        Anaconda2.5.0   Anaconda3-2.2.0
              (python2.7.8)  (python3.4.3)  (python2.7.11)  (python3.4.3)
Numpy         1.9.2          1.9.2          1.10.4          1.9.2
Scipy         0.16.1         n/a            0.17.0          0.15.1
Biopython     1.66           n/a            1.66            1.66
Matplotlib    1.4.3          1.5.1          1.5.1           1.4.3
Cython        0.24.1         0.22           0.23.4          0.22
Pandas        0.17.1         0.17.1         0.17.1          0.15.2
Scikit-image  n/a            n/a            0.11.3          0.11.2
Scikit-learn  0.17.1         n/a            0.17            0.15.2
Networkx      n/a            n/a            1.11            1.9.1
Requests      2.5.1          n/a            2.9.1           2.6.0
Python Package Paths on zcluster
Version  Python Package Path                                  Python Shared Library Path
2.7.2    /usr/local/python/2.7.2/lib/python2.7                N/A
         /usr/local/python/2.7.2/lib/python2.7/site-packages
2.7.8    /usr/local/python/2.7.8/lib/python2.7                /usr/local/python/2.7.8/lib
         /usr/local/python/2.7.8/lib/python2.7/site-packages
3.3.0    /usr/local/python/3.3.0/lib/python3.3                N/A
         /usr/local/python/3.3.0/lib/python3.3/site-packages
3.4.0    /usr/local/python/3.4.0/lib/python3.4                N/A
         /usr/local/python/3.4.0/lib/python3.4/site-packages
• Python Package Path: python can find those packages automatically!
• Python Shared Library Path: to be exported in LD_LIBRARY_PATH
Python Package Paths on Sapelo
Version  Python Package Path                                       Python Shared Library Path
2.7.8    /usr/local/apps/python/2.7.8/lib/python2.7                /usr/local/apps/python/2.7.8/lib
         /usr/local/apps/python/2.7.8/lib/python2.7/site-packages
3.4.3    /usr/local/apps/python/3.4.3/lib/python3.4                /usr/local/apps/python/3.4.3/lib
         /usr/local/apps/python/3.4.3/lib/python3.4/site-packages
• Python Package Path: python can find those packages automatically!
• Python Shared Library Path: to be exported in LD_LIBRARY_PATH
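Python finds those packages automatically because each interpreter places its own library and site-packages directories on `sys.path` at startup. A minimal sketch for inspecting this from the interpreter you plan to use is shown below; the helper name `site_package_dirs` is made up for the example.

```python
import sys


def site_package_dirs(paths=None):
    """Return the entries of sys.path (or of paths) that end in site-packages."""
    if paths is None:
        paths = sys.path
    return [p for p in paths if p.rstrip("/").endswith("site-packages")]


if __name__ == "__main__":
    # sys.prefix is the installation directory of the running interpreter
    print("Interpreter prefix: " + sys.prefix)
    for directory in site_package_dirs():
        print(directory)
```

If the printed directories do not match the installation you expected, the wrong interpreter is probably first on PATH.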
Anaconda Python Package Paths on zcluster
Version  Installation Path            Python Version  Python Package Path
2.3.0    /usr/local/anaconda/2.3.0    2.7.11          /usr/local/anaconda/2.3.0/lib/python2.7
                                                      /usr/local/anaconda/2.3.0/lib/python2.7/site-packages
3-2.2.0  /usr/local/anaconda/3-2.2.0  3.4.3           /usr/local/anaconda/3-2.2.0/lib/python3.4
                                                      /usr/local/anaconda/3-2.2.0/lib/python3.4/site-packages
• python can find those packages automatically!
• Python shared libraries were built for all versions in lib
Anaconda Python Package Paths on Sapelo
Version  Installation Path                 Python Version  Python Package Path
2.2.0    /usr/local/apps/anaconda/2.2.0    2.7.12          /usr/local/apps/anaconda/2.2.0/lib/python2.7
                                                           /usr/local/apps/anaconda/2.2.0/lib/python2.7/site-packages
2.5.0    /usr/local/apps/anaconda/2.5.0    2.7.11          /usr/local/apps/anaconda/2.5.0/lib/python2.7
                                                           /usr/local/apps/anaconda/2.5.0/lib/python2.7/site-packages
3-2.2.0  /usr/local/apps/anaconda/3-2.2.0  3.4.3           /usr/local/apps/anaconda/3-2.2.0/lib/python3.4
                                                           /usr/local/apps/anaconda/3-2.2.0/lib/python3.4/site-packages
• python can find those packages automatically!
• Python shared libraries were built for all versions in lib
Run Python Interactively on Clusters
• Run Python Interactively
• Run Anaconda Python Interactively
Do NOT run jobs on the login node; run interactive tasks on an interactive node:
• Login node: zcluster.rcc.uga.edu (zcluster) or sapelo1.gacrc.uga.edu (Sapelo)
• Run qlogin on the login node to move to an interactive node
https://wiki.gacrc.uga.edu/wiki/Training - Download
Run Python Interactively on Clusters
Running on zcluster interactive node:
zhuofei@compute-18-16:~$ python
Python 2.4.3 (#1, Oct 23 2012, 22:02:41)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 7
>>> e = 2
>>> a**e
49
>>>
Running on Sapelo interactive node:
[zhuofei@n15 ~]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 7
>>> e = 2
>>> a**e
49
>>>
Run Python Interactively on Clusters
myScript.py:
import sys
print sys.version
a = 7
e = 2
print a**e
Running on zcluster:
zhuofei@compute-18-16:~$ /usr/local/python/2.7.8/bin/python myScript.py
2.7.8 (default, Jan 7 2015, 15:33:35)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
49
Running on Sapelo:
[zhuofei@n15 ~]$ module load python/2.7.8
[zhuofei@n15 ~]$ python myScript.py
2.7.8 (default, Sep 26 2014, 07:26:46)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
49
Run Python Interactively on Clusters
myScript.py on zcluster (the first line tells exec where python is on zcluster):
#!/usr/local/python/2.7.8/bin/python
import sys
print sys.version
a = 7; e = 2
print a**e
Running on zcluster:
zhuofei@compute-18-16:~$ chmod u+x myScript.py
zhuofei@compute-18-16:~$ ./myScript.py
2.7.8 (default, Jan 7 2015, 15:33:35)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
49
myScript.py on Sapelo (the first line tells exec where python is on Sapelo):
#!/usr/local/apps/python/2.7.8/bin/python2.7
import sys
print sys.version
a = 7; e = 2
print a**e
Running on Sapelo:
[zhuofei@n15 ~]$ chmod u+x myScript.py
[zhuofei@n15 ~]$ ./myScript.py
2.7.8 (default, Sep 26 2014, 07:26:46)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
49
Run Python Interactively on Clusters
myScript.py (env tells exec where python is by searching PATH):
#!/usr/bin/env python
import sys
print sys.version
a = 7; e = 2
print a**e
Running on zcluster:
zhuofei@compute-18-16:~$ chmod u+x myScript.py
zhuofei@compute-18-16:~$ export PATH=/usr/local/python/2.7.8/bin:$PATH
zhuofei@compute-18-16:~$ ./myScript.py
2.7.8 (default, Jan 7 2015, 15:33:35)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
49
Running on Sapelo:
[zhuofei@n15 ~]$ chmod u+x myScript.py
[zhuofei@n15 ~]$ module load python/2.7.8
[zhuofei@n15 ~]$ ./myScript.py
2.7.8 (default, Sep 26 2014, 07:26:46)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
49
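To see which interpreter a `#!/usr/bin/env python` line would select, it can help to walk PATH the same way. The sketch below is a simplified stand-in for that lookup, not a reimplementation of env; the helper name `find_on_path` is made up for the example.

```python
import os
import sys


def find_on_path(program, path=None):
    """Return the first executable file named program found on PATH, or None.

    Directories are searched in order, so an entry prepended to PATH wins.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print("first python on PATH: %s" % find_on_path("python"))
    print("currently running:    %s" % sys.executable)
```

Running it before and after an export PATH=... or module load line shows how the first match changes.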
Run Anaconda Python Interactively on Clusters
myScript.py:
#!/usr/bin/env python
import sys
print sys.version
a = 7; e = 2
print a**e
Running on zcluster:
zhuofei@compute-18-16:$ chmod u+x myScript.py
zhuofei@compute-18-16:$ export PATH=/usr/local/anaconda/2.3.0/bin:$PATH
zhuofei@compute-18-16:$ ./myScript.py
2.7.11 |Anaconda 2.3.0 (64-bit)| (default, Dec 6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
49
Running on Sapelo:
[zhuofei@n15 ~]$ chmod u+x myScript.py
[zhuofei@n15 ~]$ module load anaconda/2.5.0
[zhuofei@n15 ~]$ ./myScript.py
2.7.11 |Anaconda 2.5.0 (64-bit)| (default, Dec 6 2015, 18:08:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
49
Run Python Batch Job on Clusters
• Run Python Batch Job on zcluster
• Run Python Batch Job on Sapelo
Note: zcluster Job Working Space: /escratch4/username/
Sapelo Job Working Space: /lustre1/username/
https://wiki.gacrc.uga.edu/wiki/Training - Download
Run Python Batch Job on zcluster
sub.sh:
#!/bin/bash
cd working_directory
export PATH=/usr/local/python/2.7.8/bin:$PATH
export PYTHONPATH=/usr/local/python/2.7.8/lib/python2.7:/usr/local/python/2.7.8/lib/python2.7/site-packages:$PYTHONPATH
time python myScript.py [options]
Submit the job:
qsub -q rcc-30d sub.sh
Optional qsub options, e.g., -pe threads 4 -l mem_total=20g
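The `[options]` placeholder in the job script stands for whatever arguments your own script accepts. As one possible sketch, the standard `argparse` module (Python 2.7 and later) can turn the hard-coded values of the earlier myScript.py into command-line options; `--base` and `--exponent` are hypothetical option names invented for this example.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --base and --exponent are hypothetical options."""
    parser = argparse.ArgumentParser(description="Raise a base to a power.")
    parser.add_argument("--base", type=int, default=7)
    parser.add_argument("--exponent", type=int, default=2)
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.base ** args.exponent)
```

With this in place, the last line of the job script could read, for example, `time python myScript.py --base 3 --exponent 4`.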
Run Anaconda Python Batch Job on zcluster
sub.sh:
#!/bin/bash
cd working_directory
export PATH=/usr/local/anaconda/2.3.0/bin:$PATH
export PYTHONPATH=/usr/local/anaconda/2.3.0/bin:/usr/local/anaconda/2.3.0/lib/python2.7:$PYTHONPATH
time python myScript.py [options]
Submit the job:
qsub -q rcc-30d sub.sh
Optional qsub options, e.g., -pe threads 4 -l mem_total=20g
Run (Anaconda) Python Batch Job on Sapelo
sub.sh for Python:
#PBS -S /bin/bash
#PBS -q batch
#PBS -N PythonJob1
#PBS -l nodes=1:ppn=4:AMD
#PBS -l walltime=48:00:00
#PBS -l mem=10gb
cd $PBS_O_WORKDIR
module load python/3.4.3
time python3 myScript.py [options]
sub.sh for Anaconda Python:
#PBS -S /bin/bash
#PBS -q batch
#PBS -N PythonJob1
#PBS -l nodes=1:ppn=4:AMD
#PBS -l walltime=48:00:00
#PBS -l mem=10gb
cd $PBS_O_WORKDIR
module load anaconda/3-2.2.0
time python myScript.py [options]
Submit the job:
qsub sub.sh
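Note that the myScript.py used in the interactive examples relies on Python 2 print statements, which Python 3 rejects as a syntax error, so it would not run unchanged under python/3.4.3 or anaconda/3-2.2.0. A minimal sketch of a version that should run under both Python 2.6+ and Python 3 is shown below; the function name `power` is introduced only for this example.

```python
from __future__ import print_function  # makes print a function on Python 2.6+

import sys


def power(a, e):
    """Return a raised to the power e."""
    return a ** e


if __name__ == "__main__":
    print(sys.version)
    print(power(7, 2))
```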
Thank You!