Sheffield HPC Documentation
Release

October 06, 2016

Contents

1 Research Computing Team

2 Research Software Engineering Team


The current High Performance Computing (HPC) system at Sheffield, is the Iceberg cluster.

A new system, ShARC (Sheffield Advanced Research Computer), is currently under development. It is not yet ready for use.


CHAPTER 1

Research Computing Team

The Research Computing Team is responsible for the iceberg service, as well as all other aspects of research computing. If you require support with iceberg, training, or software for your workstations, the research computing team would be happy to help. Take a look at the Research Computing website or email [email protected].


CHAPTER 2

Research Software Engineering Team

The Sheffield Research Software Engineering Team is an academically led group that collaborates closely with CiCS. They can assist with code optimisation, training and all aspects of High Performance Computing, including GPU computing, along with local, national, regional and cloud computing services. Take a look at the Research Software Engineering website or email [email protected].

2.1 Using the HPC Systems

2.1.1 Getting Started

If you have not used a High Performance Computing (HPC) cluster, Linux or even a command line before, this is the place to start. This guide will get you set up using iceberg in the easiest way that fits your requirements.

Getting an Account

Before you can start using iceberg you need to register for an account. Staff can obtain an account by emailing [email protected].

The following categories of students can also have an account on iceberg with the permission of their supervisors:

• Research Postgraduates

• Taught Postgraduates - project work

• Undergraduates 3rd & 4th year - project work

Students' supervisors can request an account for the student by emailing [email protected].

Note: Once you have obtained your iceberg username, you need to initialize your iceberg password to be the same as your normal password by using the CICS synchronize passwords system.

Connecting to iceberg (Terminal)

Accessing iceberg through an SSH terminal is easy and the most flexible way of using iceberg, as it is the native way of interfacing with the Linux cluster. It also allows you to access iceberg from any remote location without having to set up VPN connections.


The rest of this page summarises the recommended methods of accessing iceberg from commonly used platforms. A more comprehensive guide to accessing iceberg and transferring files is located at http://www.sheffield.ac.uk/cics/research/hpc/using/access

Windows

Download and install the Installer edition of MobaXterm.

After starting MobaXterm you should see something like this:

Click Start local terminal and if you see something like the following then please continue to the Connect to iceberg section.

Mac OS/X and Linux

Linux and Mac OS/X both have a terminal emulator program pre-installed.

Open a terminal and then go to Connect to iceberg.

Connect to iceberg

Once you have a terminal open run the following command:


ssh -X <username>@iceberg.shef.ac.uk

where you replace <username> with your CICS username.

This should give you a prompt resembling the one below:

[te1st@iceberg-login2 ~]$

at this prompt type:

qsh

like this:

[te1st@iceberg-login2 ~]$ qsh
Your job 135355 ("INTERACTIVE") has been submitted
waiting for interactive job to be scheduled ....
Your interactive job 135355 has been successfully scheduled.

which will pop up another terminal window, which supports graphical applications.

Note: Iceberg is a compute cluster. When you login to the cluster you reach one of two login nodes. You should not run applications on the login nodes. Running qsh gives you an interactive terminal on one of the many worker nodes in the cluster.

If you only need terminal-based (CLI) applications you can run the qrsh command, which will give you a shell on a worker node, but without graphical application (X server) support.
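For example, a minimal sketch (the worker node you land on, shown here as testnode03, will vary):

[te1st@iceberg-login2 ~]$ qrsh
[te1st@testnode03 ~]$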

What Next?

Now you have connected to iceberg, you can look at how to submit jobs with Iceberg’s Queue System or look at Software on iceberg.

2.1.2 Scheduler

qhost

qhost is a scheduler command that shows the status of Sun Grid Engine hosts.

Documentation

Documentation is available on the system using the command:

man qhost

Examples

Get an overview of the nodes and CPUs on Iceberg

qhost

This shows every node in the cluster. Some of these nodes may be reserved/purchased for specific research groups and hence may not be available for general use.
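The output resembles the following truncated, illustrative sketch (the hostnames, architectures and figures shown are hypothetical):

HOSTNAME                ARCH          NCPU  LOAD   MEMTOT   MEMUSE   SWAPTO   SWAPUS
-------------------------------------------------------------------------------------
global                  -                -     -        -        -        -        -
testnode01              lx24-amd64      16  0.03    63.0G     2.1G     2.0G      0.0
testnode02              lx24-amd64      12  4.01    47.2G    12.5G     2.0G      0.0
...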


qrsh

qrsh is a scheduler command that requests an interactive session on a worker node. The resulting session will not support graphical applications. You will usually run this command from the head node.

Examples

Request an interactive session that provides the default amount of memory resources

qrsh

Request an interactive session that provides 10 Gigabytes of real and virtual memory

qrsh -l rmem=10G -l mem=10G

qrshx

qrshx is a scheduler command that requests an interactive session on a worker node. The resulting session will support graphical applications. You will usually run this command from the head node.

Examples

Request an interactive X-Windows session that provides the default amount of memory resources and launch the gedit text editor

qrshx
gedit

Request an interactive X-Windows session that provides 10 Gigabytes of real and virtual memory and launch the latest version of MATLAB

qrshx -l mem=10G -l rmem=10G
module load apps/matlab
matlab

Request an interactive X-Windows session that provides 10 Gigabytes of real and virtual memory and 4 CPU cores

qrshx -l rmem=10G -l mem=10G -pe openmp 4

Sysadmin notes

qrshx is a Sheffield-developed modification to the standard set of scheduler commands. It is at /usr/local/bin/qrshx and contains the following

#!/bin/sh
exec qrsh -v DISPLAY -pty y "$@" bash

qsh

qsh is a scheduler command that requests an interactive X-Windows session on a worker node. The resulting terminal is not user-friendly and we recommend that you use our qrshx command instead.


Examples

Request an interactive X-Windows session that provides the default amount of memory resources

qsh

Request an interactive X-Windows session that provides 10 Gigabytes of real and virtual memory

qsh -l rmem=10G -l mem=10G

Request an interactive X-Windows session that provides 10 Gigabytes of real and virtual memory and 4 CPU cores

qsh -l rmem=10G -l mem=10G -pe openmp 4

qstat

qstat is a scheduler command that displays the status of the queues.

Examples

Display all jobs queued on the system

qstat

Display all jobs queued by the username foo1bar

qstat -u foo1bar

Display all jobs in the openmp parallel environment

qstat -pe openmp

Display all jobs in the queue named foobar

qstat -q foobar.q

qsub

qsub is a scheduler command that submits a batch job to the system.

Examples

Submit a batch job called myjob.sh to the system

qsub myjob.sh

qtop

qtop is a scheduler command that provides a summary of all processes running on the cluster for a given user.


Examples

qtop is only available on the worker nodes. As such, you need to start an interactive session on a worker node using qrsh or qrshx in order to use it.

To give a summary of all of your currently running jobs

qtop

Summary for job 256127

HOST        VIRTUAL-MEM   RSS-MEM   %CPU  %MEM  CPUTIME+  COMMAND
testnode03    106.22 MB   1.79 MB    0.0   0.0  00:00:00  bash
testnode03    105.62 MB   1.27 MB    0.0   0.0  00:00:00  qtop
testnode03     57.86 MB   3.30 MB    0.0   0.0  00:00:00  ssh
            -----------  --------
TOTAL:          0.26 GB   0.01 GB

Iceberg’s Queue System

To manage use of the Iceberg cluster, there is a queue system (SoGE, a derivative of the Sun Grid Engine).

The queue system works by a user requesting that some task, either a script or an interactive session, be run on the cluster; the scheduler will then take tasks from the queue based on a set of rules and priorities.

Using Iceberg Interactively

If you wish to use the cluster for interactive use, such as running applications like MATLAB or Ansys, or compiling software, you will need to request that the scheduler gives you an interactive session. For an introduction to this see Getting Started.

There are three commands which give you an interactive shell:

• qrsh - Requests an interactive session on a worker node. No support for graphical applications.

• qrshx - Requests an interactive session on a worker node. Supports graphical applications. Superior to qsh.

• qsh - Requests an interactive session on a worker node. Supports graphical applications.

You can configure the resources available to the interactive session by specifying them as command line options to the qrshx, qrsh or qsh commands. For example to run a qrshx session with access to 16 GB of virtual RAM

[te1st@iceberg-login2 ~]$ qrshx -l mem=16G

or a session with access to 8 cores:

[te1st@iceberg-login2 ~]$ qrshx -pe openmp 8

A table of Common Interactive Job Options is given below; any of these can be combined together to request more resources (an example combining several options is given after the table).

Note: Long running jobs should use the batch submission system rather than requesting an interactive session for a very long time. Doing this will lead to better cluster performance for all users.


Common Interactive Job Options

Command             Description
-l h_rt=hh:mm:ss    Specify the total maximum execution time for the job.
-l mem=xxG          Specify the maximum amount (xx) of memory to be used (per process or core).
-pe <env> <nn>      Specify a parallel environment and number of processors.
-pe openmp <nn>     The openmp parallel environment provides multiple threads on one node.
                    <nn> specifies the max number of threads.
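For example, to request an interactive qrshx session with a one hour runtime limit, 16 Gigabytes of real and virtual memory and 4 OpenMP threads (the values are illustrative):

qrshx -l h_rt=01:00:00 -l rmem=16G -l mem=16G -pe openmp 4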

Running Batch Jobs on iceberg

The power of iceberg really comes from the ‘batch job’ queue submission process. Using this system, you write a script which requests various resources, initializes the computational environment and then executes your program(s). The scheduler will run your job when resources are available. As the task is running, the terminal output and any errors are captured and saved to disk, so that you can see the output and verify the execution of the task.

Any task that can be executed without any user intervention while it is running can be submitted as a batch job to Iceberg. This excludes jobs that require a Graphical User Interface (GUI); however, many common GUI applications such as Ansys or MATLAB can also be used without their GUIs.

When you submit a batch job, you provide an executable file that will be run by the scheduler. This is normally a bash script file which provides commands and options to the program you are using. Once you have a script file, or other executable file, you can submit it to the queue by running:

qsub myscript.sh

Here is an example batch submission script that runs a fictitious program called foo

#!/bin/bash
# Request 5 gigabytes of real memory (rmem)
# and 5 gigabytes of virtual memory (mem)
#$ -l mem=5G -l rmem=5G

# Load the module for the program we want to run
module load apps/gcc/foo

# Run the program foo with input foo.dat
# and output foo.res
foo < foo.dat > foo.res

Some things to note:

• The first line always needs to be #!/bin/bash to tell the scheduler that this is a bash batch script.

• Comments start with a #

• Scheduler options, such as the amount of memory requested, start with #$

• You will usually require one or more module commands in your submission file. These make programs and libraries available to your scripts.

Here is a more complex example that requests more resources

#!/bin/bash
# Request 16 gigabytes of real memory (rmem)
# and 16 gigabytes of virtual memory (mem)
#$ -l mem=16G -l rmem=16G
# Request 4 cores in an OpenMP environment
#$ -pe openmp 4


# Email notifications to [email protected]
#$ -M [email protected]
# Email notifications if the job aborts
#$ -m a

# Load the modules required by our program
module load compilers/gcc/5.2
module load apps/gcc/foo

# Set the OMP_NUM_THREADS environment variable to 4
export OMP_NUM_THREADS=4

# Run the program foo with input foo.dat
# and output foo.res
foo < foo.dat > foo.res

Scheduler Options

Command             Description
-l h_rt=hh:mm:ss    Specify the total maximum execution time for the job.
-l mem=xxG          Specify the maximum amount (xx) of memory to be used.
-l hostname=        Target a node by name. Not recommended for normal use.
-l arch=            Target a processor architecture. Options on Iceberg include intel-e5-2650v2 and intel-x5650.
-N                  Job name, used to name output files and in the queue list.
-j                  Join the error and normal output into one file rather than two.
-M                  Email address to send notifications to.
-m bea              Type of notifications to send. Can be any combination of begin (b), end (e) or abort (a),
                    i.e. -m ea for end and abort messages.
-a                  Specify the earliest time for a job to start, in the format MMDDhhmm,
                    e.g. -a 01011130 will schedule the job to begin no sooner than 11:30 on 1st January.

All scheduler commands

All available scheduler commands are listed in the Scheduler section.

Frequently Asked SGE Questions

How many jobs can I submit at any one time?

You can submit up to 2000 jobs to the cluster, and the scheduler will allow up to 200 of your jobs to run simultaneously (we occasionally alter this value depending on the load on the cluster).
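If you want to check how many jobs you currently have in the system, one possible sketch (assuming the two header lines that qstat prints before the job list) is:

qstat -u $USER | tail -n +3 | wc -l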

How do I specify the processor type on Iceberg?

Add the following line to your submission script

#$ -l arch=intel-e5-2650v2

This specifies nodes that have the Ivybridge E5-2650 CPU. All such nodes on Iceberg have 16 cores.
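For example, to request a whole node of this type in a batch script, you might combine this with an openmp request for all 16 cores (an illustrative sketch):

#$ -l arch=intel-e5-2650v2
#$ -pe openmp 16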


To only target the older, 12 core nodes that contain X5650 CPUs add the following line to your submission script

#$ -l arch=intel-x5650

How do I specify multiple email addresses for job notifications?

Specify each additional email with its own -M option

#$ -M [email protected]
#$ -M [email protected]

How do you ensure that a job starts after a specified time?

Add the following line to your submission script

#$ -a time

but replace time with a time in the format MMDDhhmm

For example, for 22nd July at 14:10, you’d do

#$ -a 07221410

This won’t guarantee that it will run precisely at this time since that depends on available resources. It will, however, ensure that the job runs after this time. If your resource requirements aren’t too heavy, it will be pretty soon after. When I tried it, it started about 10 seconds afterwards, but this will vary.

This is a reference section for commands that allow you to interact with the scheduler.

• qhost - Shows the status of Sun Grid Engine hosts.

• qrsh - Requests an interactive session on a worker node. No support for graphical applications.

• qrshx - Requests an interactive session on a worker node. Supports graphical applications. Superior to qsh.

• qsh - Requests an interactive session on a worker node. Supports graphical applications.

• qstat - Displays the status of jobs and queues.

• qsub - Submits a batch job to the system.

• qtop - Provides a summary of all processes running on the cluster for a given user

2.2 Iceberg

2.2.1 Software on iceberg

These pages list the software available on iceberg. If you notice an error or an omission, or wish to request new software, please email the research computing team at [email protected].

Modules on Iceberg

In general the software available on iceberg is loaded and unloaded via the use of the modules system 1.

Modules make it easy for us to install many versions of different applications, compilers and libraries side by side, and allow users to set up the computing environment to include exactly what they need.

1 http://modules.sourceforge.net/


Note: Modules are not available on the login node. You must move to a worker node using either qrsh or qsh (see Getting Started) before any of the following commands will work.

Available modules can be listed using the following command:

module avail

Modules have names that look like apps/python/2.7. To load this module, you’d do:

module load apps/python/2.7

You can unload this module with:

module unload apps/python/2.7

It is possible to load multiple modules at once, to create your own environment with just the software you need. For example, perhaps you want to use version 4.8.2 of the gcc compiler along with MATLAB 2014a

module load compilers/gcc/4.8.2
module load apps/matlab/2014a

Confirm that you have loaded these modules with

module list

Remove the MATLAB module with

module unload apps/matlab/2014a

Remove all modules to return to the base environment

module purge

Module Command Reference

Here is a list of the most useful module commands. For full details, type man module at an iceberg command prompt.

• module list – lists currently loaded modules

• module avail – lists all available modules

• module load modulename – loads module modulename

• module unload modulename – unloads module modulename

• module switch oldmodulename newmodulename – switches between two modules

• module initadd modulename – run this command once to automatically ensure that a module is loaded when you log in. (It creates a .modules file in your home dir which acts as your personal configuration.)

• module show modulename - Shows how loading modulename will affect your environment

• module purge – unload all modules

• module help modulename – may show longer description of the module if present in the modulefile

• man module – detailed explanation of the above commands and others
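As an illustrative sketch, a typical batch submission script starts from a clean environment and loads exactly the modules it needs (the module names are taken from the examples above):

#!/bin/bash
#$ -l mem=8G -l rmem=8G

# Start from a clean environment, then load a compiler and MATLAB
module purge
module load compilers/gcc/4.8.2
module load apps/matlab/2014a

# Commands that use gcc 4.8.2 and MATLAB 2014a go here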


Applications

CASTEP

CASTEP

Latest Version 16.1
URL http://www.castep.org/

Licensing Only licensed users of CASTEP are entitled to use it and license details are available on CASTEP’s website. Access to CASTEP on the system is controlled using a Unix group. That is, only members of the castep group can access and run the program. To be added to this group, you will need to email the team at [email protected] and provide evidence of your eligibility to use CASTEP.

Interactive Usage The serial version of CASTEP should be used for interactive usage. After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command. Make the serial version of CASTEP available using one of the commands

module load apps/intel/15/castep/16.1-serial
module load apps/intel/15/castep/8.0-serial

The CASTEP executable is called castep.serial, so if you execute

castep.serial

You should get the following

Usage:
castep <seedname>            : Run files <seedname>.cell [and <seedname>.param]
" [-d|--dryrun] <seedname>   : Perform a dryrun calculation on files <seedname>.cell
" [-s|--search] <text>       : print list of keywords with <text> match in description
" [-v|--version]             : print version information
" [-h|--help] <keyword>      : describe specific keyword in <>.cell or <>.param
" "   all                    : print list of all keywords
" "   basic                  : print list of basic-level keywords
" "   inter                  : print list of intermediate-level keywords
" "   expert                 : print list of expert-level keywords
" "   dummy                  : print list of dummy keywords

If, instead, you get

-bash: castep.serial: command not found

It is probably because you are not a member of the castep group. See the section on Licensing above for details on how to be added to this group.

Interactive usage is fine for small CASTEP jobs such as the Silicon example given at http://www.castep.org/Tutorials/BasicsAndBonding

To run this example, you can do

# Get the files, decompress them and enter the directory containing them
wget http://www.castep.org/files/Si2.tgz
tar -xvzf ./Si2.tgz
cd Si2


# Run the CASTEP job in serial
castep.serial Si2

# Read the output using the more command
more Si2.castep

CASTEP has a built-in help system. To get more information on using castep use

castep.serial -help

Alternatively you can search for help on a particular topic

castep.serial -help search keyword

or list all of the input parameters

castep.serial -help search all

Batch Submission - Parallel The parallel version of CASTEP is called castep.mpi. To make the parallel environment available, use one of the following module commands

module load apps/intel/15/castep/16.1-parallel
module load apps/intel/15/castep/8.0-parallel

As an example of a parallel submission, we will calculate the bandstructure of graphite following the tutorial at http://www.castep.org/Tutorials/BandStructureAndDOS

After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command. Download and decompress the example input files with the commands

wget http://www.castep.org/files/bandstructure.tgz
tar -xvzf ./bandstructure.tgz

Enter the directory containing the input files for graphite

cd bandstructure/graphite/

Create a file called submit.sge that contains the following

#!/bin/bash
#$ -pe openmpi-ib 4   # Run the calculation on 4 CPU cores
#$ -l rmem=4G         # Request 4 Gigabytes of real memory per core
#$ -l mem=4G          # Request 4 Gigabytes of virtual memory per core
module load apps/intel/15/castep/16.1-parallel

mpirun castep.mpi graphite

Submit it to the system with the command

qsub submit.sge

After the calculation has completed, get an overview of the calculation by looking at the file graphite.castep

more graphite.castep

Installation Notes These are primarily for system administrators.

CASTEP Version 16.1


The jump in version numbers from 8 to 16.1 is a result of CASTEP’s change of version numbering. There are no versions 9-15.

Serial (1 CPU core) and parallel versions of CASTEP were compiled. Both versions were compiled with version 15.0.3 of the Intel Compiler Suite, and the Intel MKL versions of BLAS and FFT were used. The parallel version made use of OpenMPI 1.8.8.

The Serial version was compiled and installed with

module load compilers/intel/15.0.3
install_dir=/usr/local/packages6/apps/intel/15/castep/16.1
mkdir -p $install_dir

tar -xzf ./CASTEP-16.1.tar.gz
cd CASTEP-16.1

# Compile Serial version
make INSTALL_DIR=$install_dir FFT=mkl MATHLIBS=mkl10
make INSTALL_DIR=$install_dir FFT=mkl MATHLIBS=mkl10 install install-tools

The directory CASTEP-16.1 was then deleted and the parallel version was installed with

#!/bin/bash
module load libs/intel/15/openmpi/1.8.8
# The above command also loads Intel Compilers 15.0.3
# It also places the MKL in LD_LIBRARY_PATH

install_dir=/usr/local/packages6/apps/intel/15/castep/16.1

tar -xzf ./CASTEP-16.1.tar.gz
cd CASTEP-16.1

# Workaround for bug described at http://www.cmth.ph.ic.ac.uk/computing/software/castep.html
sed 's/-static-intel/-shared-intel/' obj/platforms/linux_x86_64_ifort15.mk -i

# Compile parallel version
make COMMS_ARCH=mpi FFT=mkl MATHLIBS=mkl10
mv ./obj/linux_x86_64_ifort15/castep.mpi $install_dir

CASTEP Version 8

Serial (1 CPU core) and parallel versions of CASTEP were compiled. Both versions were compiled with version 15.0.3 of the Intel Compiler Suite, and the Intel MKL versions of BLAS and FFT were used. The parallel version made use of OpenMPI 1.8.8.

The Serial version was compiled and installed with

module load compilers/intel/15.0.3
install_dir=/usr/local/packages6/apps/intel/15/castep/8.0

tar -xzf ./CASTEP-8.0.tar.gz
cd CASTEP-8.0

# Compile Serial version
make INSTALL_DIR=$install_dir FFT=mkl MATHLIBS=mkl10
make INSTALL_DIR=$install_dir FFT=mkl MATHLIBS=mkl10 install install-tools

The directory CASTEP-8.0 was then deleted and the parallel version was installed with

#!/bin/bash
module load libs/intel/15/openmpi/1.8.8


# The above command also loads Intel Compilers 15.0.3
# It also places the MKL in LD_LIBRARY_PATH

install_dir=/usr/local/packages6/apps/intel/15/castep/8.0
mkdir -p $install_dir

tar -xzf ./CASTEP-8.0.tar.gz
cd CASTEP-8.0

# Compile parallel version
make COMMS_ARCH=mpi FFT=mkl MATHLIBS=mkl10
mv ./obj/linux_x86_64_ifort15/castep.mpi $install_dir

Modulefiles

• CASTEP 16.1-serial

• CASTEP 16.1-parallel

• CASTEP 8.0-serial

• CASTEP 8.0-parallel

Testing Version 16.1 Serial

The following script was submitted via qsub from inside the build directory:

#!/bin/bash
#$ -l mem=10G
#$ -l rmem=10G
module load compilers/intel/15.0.3

cd CASTEP-16.1/Test
../bin/testcode.py -q --total-processors=1 -e /home/fe1mpc/CASTEP/CASTEP-16.1/obj/linux_x86_64_ifort15/castep.serial -c simple -v -v -v

All but one of the tests passed. It seems that the failed test is one that fails for everyone for this version since there is a missing input file. The output from the test run is on the system at /usr/local/packages6/apps/intel/15/castep/16.1/CASTEP_SERIAL_tests_09022016.txt

Version 16.1 Parallel

The following script was submitted via qsub from inside the build directory

#!/bin/bash
#$ -pe openmpi-ib 4
#$ -l mem=10G
#$ -l rmem=10G
module load libs/intel/15/openmpi/1.8.8

cd CASTEP-16.1/Test
../bin/testcode.py -q --total-processors=4 --processors=4 -e /home/fe1mpc/CASTEP/CASTEP-16.1/obj/linux_x86_64_ifort15/castep.mpi -c simple -v -v -v

All but one of the tests passed. It seems that the failed test is one that fails for everyone for this version since there is a missing input file. The output from the test run is on the system at /usr/local/packages6/apps/intel/15/castep/16.1/CASTEP_Parallel_tests_09022016.txt

Version 8 Parallel The following script was submitted via qsub


#!/bin/bash
#$ -pe openmpi-ib 4
module load libs/intel/15/openmpi/1.8.8

cd CASTEP-8.0
make check COMMS_ARCH=mpi MAX_PROCS=4 PARALLEL="--total-processors=4 --processors=4"

All tests passed.

GATK

GATK

Version 3.4-46
URL https://www.broadinstitute.org/gatk/

The Genome Analysis Toolkit or GATK is a software package for analysis of high-throughput sequencing data, developed by the Data Science and Data Engineering group at the Broad Institute. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command.

The latest version of GATK (currently 3.4-46) is made available with the command

module load apps/binapps/GATK

Alternatively, you can load a specific version with

module load apps/binapps/GATK/3.4-46
module load apps/binapps/GATK/2.6-5

Version 3.4-46 of GATK also changes the environment to use Java 1.7, since this is required by GATK 3.4-46. An environment variable called GATKHOME is created by the module command that contains the path to the requested version of GATK.

Thus, you can run the program with the command

java -jar $GATKHOME/GenomeAnalysisTK.jar -h

which will give a large amount of help, beginning with the version information

---------------------------------------------------------------------------------
The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
Copyright (c) 2010 The Broad Institute
For support and documentation go to http://www.broadinstitute.org/gatk
---------------------------------------------------------------------------------
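As an illustrative sketch only (the reference, BAM and output filenames are hypothetical, and the GATK tool and options depend on your analysis), a simple batch script calling the HaplotypeCaller might look like:

#!/bin/bash
#$ -l mem=8G -l rmem=8G

module load apps/binapps/GATK/3.4-46

# $GATKHOME is set by the module; the filenames below are placeholders
java -Xmx6g -jar $GATKHOME/GenomeAnalysisTK.jar \
     -T HaplotypeCaller \
     -R reference.fasta \
     -I sample.bam \
     -o sample_variants.vcf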

Documentation The GATK manual is available online https://www.broadinstitute.org/gatk/guide/

Installation notes The entire install is just a .jar file. Put it in the install directory and you’re done.


Modulefile Version 3.4-46

• The module file is on the system at /usr/local/modulefiles/apps/binapps/GATK/3.4-46

Its contents are

#%Module1.0#####################################################################
##
## GATK 3.4-46 modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

# This version of GATK needs Java 1.7
module load apps/java/1.7

proc ModulesHelp { } {
    puts stderr "Makes GATK 3.4-46 available"
}

set version 3.4-46
set GATK_DIR /usr/local/packages6/apps/binapps/GATK/$version

module-whatis "Makes GATK 3.4-46 available"

prepend-path GATKHOME $GATK_DIR

Version 2.6-5

• The module file is on the system at /usr/local/modulefiles/apps/binapps/GATK/2.6-5

Its contents are

#%Module1.0#####################################################################
##
## GATK 2.6-5 modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes GATK 3.4-46 available"
}

set version 2.6.5
set GATK_DIR /usr/local/packages6/apps/binapps/GATK/$version

module-whatis "Makes GATK 2.6-5 available"

prepend-path GATKHOME $GATK_DIR

MOMFBD


MOMFBD

Version 2016-04-14
URL http://dubshen.astro.su.se/wiki/index.php?title=MOMFBD

MOMFBD, or Multi-Object Multi-Frame Blind Deconvolution, is an image processing application for removing the effects of atmospheric seeing from solar image data.

Usage The MOMFBD binaries can be added to your path with

module load apps/gcc/5.2/MOMFBD

Installation notes MOMFBD was installed using gcc 5.2 using this script and is loaded by this modulefile.

STAR

STAR

Latest version 2.5.0c
URL https://github.com/alexdobin/STAR

Spliced Transcripts Alignment to a Reference.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrsh command.

The latest version of STAR (currently 2.5.0c) is made available with the command

module load apps/gcc/5.2/STAR

Alternatively, you can load a specific version with

module load apps/gcc/5.2/STAR/2.5.0c

This command makes the STAR binary available to your session and also loads the gcc 5.2 compiler environment which was used to build STAR. Check that STAR is working correctly by displaying the version

STAR --version

This should give the output

STAR_2.5.0c
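As an illustrative sketch (the genome index directory and FASTQ filenames are hypothetical), a batch alignment job might look like:

#!/bin/bash
#$ -pe openmp 8
#$ -l mem=8G -l rmem=8G   # per core

module load apps/gcc/5.2/STAR/2.5.0c

# Align paired-end reads against a pre-built STAR genome index (paths are placeholders)
STAR --runThreadN 8 \
     --genomeDir /path/to/star_index \
     --readFilesIn sample_R1.fastq sample_R2.fastq \
     --outFileNamePrefix sample_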

Documentation The STAR manual is available online: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

Installation notes STAR was installed using gcc 5.2

module load compilers/gcc/5.2

mkdir STAR
mv ./STAR-2.5.0c.zip ./STAR


cd ./STAR/
unzip STAR-2.5.0c.zip
cd STAR-2.5.0c
make
cd source
mkdir -p /usr/local/packages6/apps/gcc/5.2/STAR/2.5.0c
mv ./STAR /usr/local/packages6/apps/gcc/5.2/STAR/2.5.0c/

Testing No test suite was found.

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/STAR/2.5.0c

Its contents are

#%Module1.0#####################################################################
##
## STAR 2.5.0c module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

module load compilers/gcc/5.2

module-whatis "Makes version 2.5.0c of STAR available"

set STAR_DIR /usr/local/packages6/apps/gcc/5.2/STAR/2.5.0c/

prepend-path PATH $STAR_DIR

Abaqus

Abaqus

Versions 6.13, 6.12 and 6.11
Support Level FULL
Dependencies Intel Compiler
URL http://www.3ds.com/products-services/simulia/products/abaqus/
Local URL https://www.shef.ac.uk/wrgrid/software/abaqus

Abaqus is a software suite for Finite Element Analysis (FEA) developed by Dassault Systèmes.

Interactive usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh command. Alternatively, if you require more memory, for example 16 gigabytes, use the command qsh -l mem=16G

The latest version of Abaqus (currently version 6.13) is made available with the command

module load apps/abaqus


Alternatively, you can make a specific version available with one of the following commands

module load apps/abaqus/613
module load apps/abaqus/612
module load apps/abaqus/611

After that, simply type abaqus to get the command-line interface to abaqus or type abaqus cae to get the GUI interface.

Abaqus example problems Abaqus contains a large number of example problems which can be used to become familiar with Abaqus on the system. These example problems are described in the Abaqus documentation, and can be obtained using the Abaqus fetch command. For example, after loading the Abaqus module enter the following at the command line to extract the input file for test problem s4d

abaqus fetch job=s4d

This will extract the input file s4d.inp. To run the computation defined by this input file, replace input=myabaqusjob with input=s4d in the commands and scripts below.

Batch submission of a single core job In this example, we will run the s4d.inp file on a single core using 8 Gigabytes of memory. After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh command.

Load version 6.13-3 of Abaqus and fetch the s4d example by running the following commands

module load apps/abaqus/613
abaqus fetch job=s4d

Now, you need to write a batch submission file. We assume you’ll call this my_job.sge

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -l rmem=8G
#$ -l mem=8G

module load apps/abaqus

abq6133 job=my_job input=s4d.inp scratch=/scratch memory="8gb" interactive

Submit the job with the command qsub my_job.sge

Important notes:

• We have requested 8 gigabytes of memory in the above job. The memory="8gb" switch tells abaqus to use 8 gigabytes. The #$ -l rmem=8G and #$ -l mem=8G lines tell the system to reserve 8 gigabytes of real and virtual memory respectively. It is important that these numbers match.

• Note the word interactive at the end of the abaqus command. Your job will not run without it.

Batch submission of a single core job with user subroutine In this example, we will fetch a simulation from Abaqus’ built-in set of problems that makes use of user subroutines (UMATs) and run it in batch on a single core. After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh command.

Load version 6.13-3 of Abaqus and fetch the umatmst3 example by running the following commands

module load apps/abaqus/613
abaqus fetch job=umatmst3*


This will produce 2 files: The input file umatmst3.inp and the Fortran user subroutine umatmst3.f.

Now, you need to write a batch submission file. We assume you’ll call this my_user_job.sge

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -l rmem=8G
#$ -l mem=8G

module load apps/abaqus/613
module load compilers/intel/12.1.15

abq6133 job=my_user_job input=umatmst3.inp user=umatmst3.f scratch=/scratch memory="8gb" interactive

Submit the job with the command qsub my_user_job.sge

Important notes:

• In order to use user subroutines, it is necessary to load the module for the intel compiler.

• The user-subroutine itself is passed to Abaqus with the switch user=umatmst3.f

Annovar

Annovar

Version 2015DEC14
URL http://annovar.openbioinformatics.org/en/latest/

ANNOVAR is an efficient software tool to utilize up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, hg38, as well as mouse, worm, fly, yeast and many others).

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command.

To add the annovar binaries to the system PATH, execute the following command

module load apps/binapps/annovar/2015DEC14

Documentation The annovar manual is available online at http://annovar.openbioinformatics.org/en/latest/

Installation notes The install is a collection of executable Perl scripts. Installation involves copying the directory and adding it to the PATH. It seems that annovar uses dates to distinguish between releases rather than version numbers.

tar -xvzf ./annovar.latest.tar.gz
mkdir -p /usr/local/packages6/apps/binapps/annovar
mv annovar /usr/local/packages6/apps/binapps/annovar/2015DEC14

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/binapps/annovar/2015DEC14


Its contents are

#%Module1.0#####################################################################
##
## annovar modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes annovar 2015DEC14 available"
}

set ANNOVAR_DIR /usr/local/packages6/apps/binapps/annovar/2015Dec14

module-whatis "Makes annovar 2015DEC14 available"

prepend-path PATH $ANNOVAR_DIR

Ansys

Ansys

Version 14, 14.5, 15
Support Level FULL
Dependencies If using User Defined Functions (UDF) you will also need the following: for Ansys Mechanical, Workbench, CFX and AutoDYN, the Intel 14.0 (or above) compiler; for Fluent, GCC 4.6.1 or above.
URL http://www.ansys.com/en_uk
Local URL http://www.shef.ac.uk/cics/research/software/fluent

The ANSYS suite of programs can be used to numerically simulate a large variety of structural and fluid dynamics problems found in many engineering, physics, medical, aeronautics and automotive industry applications.

Interactive usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh command. Alternatively, if you require more memory, for example 16 gigabytes, use the command qsh -l rmem=16G

To make the latest version of Ansys available, run the following module command

module load apps/ansys

Alternatively, you can make a specific version available with one of the following commands

module load apps/ansys/14
module load apps/ansys/14.5
module load apps/ansys/15.0

After loading the modules, you can issue the following commands to run an ansys product. The teaching versions mentioned below are installed for use during ANSYS and FLUENT teaching labs and will only allow models of up to 500,000 elements.


ansyswb   : to run Ansys Workbench
ansys     : to run Ansys Mechanical outside the workbench
ansystext : to run the line-mode version of ansys
Ansys and Ansystext : to run the teaching license version of the above two commands
fluent    : to run Fluent outside the workbench
fluentext : to run the line-mode version of Fluent
Fluent and Fluentext : to run the teaching license version of the above two commands
icemcfx or icem : to run icemcfd outside the workbench

Running Batch fluent and ansys jobs The easiest way of running batch ansys and fluent jobs is as follows:

module load {version_you_require}    for example: module load apps/ansys/15.0
followed by runfluent or runansys

The runfluent and runansys commands submit a fluent journal or ansys input file into the batch system and can take a number of different parameters, according to your requirements.

runfluent command Just typing runfluent will display information on how to use it.

Usage: runfluent [2d, 2ddp, 3d or 3ddp] fluent_journal_file -time hh:mm:ss [-mem=nn] [-rmem=nn] [-mail your_email_address] [-nq] [-parallel nprocs] [optional_extra_fluent_params].

Where all but the first two parameters are optional.

First parameter [2d, 2ddp, etc.] is the dimensionality of the problem.
Second parameter, fluent_journal_file, is the file containing the fluent commands.
Other 'optional' parameters are:
-time hh:mm:ss is the cpu time needed in hours:minutes:seconds
-mem=nn is the virtual memory needed (Default=8G). Example: -mem 12G (for 12 GBytes)
-rmem=nn is the real memory needed (Default=2G). Example: -rmem 4G (for 4 GBytes)
-mail email_address. You will receive emails about the progress of your job.
    Example: -mail [email protected]
-nq is an optional parameter to submit without confirming
-parallel nprocs : Only needed for parallel jobs to specify the no. of processors.
-project project_name : The job will use a project allocation.
fluent_params : any parameter not recognised will be passed to fluent itself.

Example: runfluent 3d nozzle.jou -time 00:30:00 -mem=10G

Fluent journal files are essentially a sequence of Fluent commands you would have entered by starting fluent in non-gui mode.

Here is an example journal file:

/file/read-case test.cas
/file/read-data test.dat
/solve iter 200
/file/write-data testv5b.dat
yes

/exit
yes

Note that there can be no graphics output related commands in the journal file as the job will be run in batch mode. Please see the fluent documents for further details of journal files and how to create them.

By using the -g parameter, you can start up an interactive fluent session in non-gui mode to experiment. For example: fluent 3d -g


runansys command The runansys command submits Ansys jobs to the Sun Grid Engine.

Usage: runansys ansys_inp_file [-time hh:mm:ss] [-mem=nn] [-rmem=nn] [-parallel n] [-project proj_name] [-mail email_address] [-fastdata] [other qsub parameters]

where ansys_inp_file is a file containing a series of Ansys commands.

-time hh:mm:ss is the cpu time needed in hours:minutes:seconds; if not specified, 1 hour will be assumed.
-mem=nn is the virtual memory requirement.
-rmem=nn is the real memory requirement.
-parallel n request an n-way parallel ansys job
-gpu use GPU. Note for GPU users: -mem= must be greater than 18G.
-project project_name : The job will use a project's allocation.
-mail your_email_address : Job progress report is emailed to you.
-fastdata use /fastdata/$USER/$JOB_ID as the working directory

As well as time and memory, any other valid qsub parameter can be specified. In particular, users of UPF functions will need to specify -v ANS_USER_PATH=the_working_directory

All parameters except the ansys_inp file are optional.

Output files created by Ansys take their names from the jobname specified by the user. You will be prompted for a jobname as well as any other startup parameter you wish to pass to Ansys. Example: runansys test1.dat -time 00:30:00 -mem 8G -rmem=3G -mail [email protected]

bcbio

bcbio

Latest version 0.9.6a
Dependencies gcc 5.2, R 3.2.1, Anaconda Python 2.3
URL http://bcbio-nextgen.readthedocs.org/en/latest/

A python toolkit providing best-practice pipelines for fully automated high throughput sequencing analysis. You write a high level configuration file specifying your inputs and analysis parameters. This input drives a parallel pipeline that handles distributed execution, idempotent processing restarts and safe transactional steps. The goal is to provide a shared community resource that handles the data processing component of sequencing analysis, providing researchers with more time to focus on the downstream biology.

Usage Load version 0.9.6a of bcbio with the command

module load apps/gcc/5.2/bcbio/0.9.6a

There is also a development version of bcbio installed on iceberg. This could change without warning and should not be used for production

module load apps/gcc/5.2/bcbio/devel

These module commands add bcbio commands to the PATH, load any supporting environments and correctly configure the system for bcbio usage.

Once the module is loaded you can, for example, check the version of bcbio

bcbio_nextgen.py -v

/usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/anaconda/lib/python2.7/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.


  warnings.warn(self.msg_depr % (key, alt_key))
0.9.6a

To check how the loaded version of bcbio has been configured

more $BCBIO_DIR/config/install-params.yaml

At the time of writing, the output from the above command is

aligners:
- bwa
- bowtie2
- rtg
- hisat2
genomes:
- hg38
- hg19
- GRCh37
isolate: true
tooldir: /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/tools
toolplus: []

Example batch submission TODO
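Pending a full example, a minimal SGE sketch might look something like the following (the project configuration YAML and core count are hypothetical; check the bcbio documentation for the exact bcbio_nextgen.py options appropriate to your pipeline):

#!/bin/bash
#$ -pe openmp 8
#$ -l mem=4G -l rmem=4G   # per core

module load apps/gcc/5.2/bcbio/0.9.6a

# Run the pipeline described in a previously prepared project configuration file
bcbio_nextgen.py ../config/my_project.yaml -n 8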

Integration with SGE TODO

Installation Notes These are primarily for system administrators.

0.9.6a

Version 0.9.6a was installed using gcc 5.2, R 3.2.1 and Anaconda Python 2.3. The install was performed in two parts.

The first step was to run the SGE script below in batch mode. Note that the install often fails due to external services being flaky. See https://github.com/rcgsheffield/iceberg_software/issues/219 for details. Depending on the reason for the failure, it should be OK to simply restart the install. This particular install was done in one shot: no restarts were necessary.

• install_bcbio_0.96a

The output from this batch run can be found in /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/install_output/

Once the install completed, the module file (see Modulefile section) was created and loaded and the following upgrades were performed

bcbio_nextgen.py upgrade --toolplus gatk=./GenomeAnalysisTK.jar
bcbio_nextgen.py upgrade --genomes hg38 --aligners hisat2

The GATK .jar file was obtained from https://www.broadinstitute.org/gatk/download/

A further upgrade was performed on 13th January 2016. STAR had to be run directly because the bcbio upgrade command that made use of it kept stalling (bcbio_nextgen.py upgrade --data --genomes GRCh37 --aligners bwa --aligners star). We have no idea why this made a difference, but at least the direct STAR run could make use of multiple cores whereas the bcbio installer only uses 1.

#!/bin/bash
#$ -l rmem=3G -l mem=3G
#$ -P radiant
#$ -pe openmp 16


module load apps/gcc/5.2/bcbio/0.9.6a
STAR --genomeDir /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Hsapiens/GRCh37/star --genomeFastaFiles /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Hsapiens/GRCh37/seq/GRCh37.fa --runThreadN 16 --runMode genomeGenerate --genomeSAindexNbases 14

bcbio_nextgen.py upgrade --data --genomes GRCh37 --aligners bwa

Another upgrade was performed on 25th February 2016

module load apps/gcc/5.2/bcbio/0.9.6a
bcbio_nextgen.py upgrade -u stable --data --genomes mm10 --aligners star --aligners bwa

As is usually the case for us, this stalled on the final STAR command. The exact call to STAR was found in /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Mmusculus/mm10/star/Log.out and run manually in a 16-core OpenMP script:

STAR --runMode genomeGenerate --runThreadN 16 --genomeDir /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Mmusculus/mm10/star --genomeFastaFiles /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Mmusculus/mm10/seq/mm10.fa --genomeSAindexNbases 14 --genomeChrBinNbits 14

This failed (see https://github.com/rcgsheffield/iceberg_software/issues/272). The fix was to add the line

index mm10 /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Mmusculus/mm10/seq/mm10.fa

to the file

/usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/galaxy/tool-data/sam_fa_indices.loc

Update: 14th March 2016

Another issue required us to modify /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/genomes/Mmusculus/mm10/seq/mm10-resources.yaml so that it read

version: 16

aliases:
  snpeff: GRCm38.82
  ensembl: mus_musculus_vep_83_GRCm38

variation:
  dbsnp: ../variation/mm10-dbSNP-2013-09-12.vcf.gz
  lcr: ../coverage/problem_regions/repeats/LCR.bed.gz

rnaseq:
  transcripts: ../rnaseq/ref-transcripts.gtf
  transcripts_mask: ../rnaseq/ref-transcripts-mask.gtf
  transcriptome_index:
    tophat: ../rnaseq/tophat/mm10_transcriptome.ver
  dexseq: ../rnaseq/ref-transcripts.dexseq.gff3
  refflat: ../rnaseq/ref-transcripts.refFlat
  rRNA_fa: ../rnaseq/rRNA.fa

srnaseq:
  srna-transcripts: ../srnaseq/srna-transcripts.gtf
  mirbase-hairpin: ../srnaseq/hairpin.fa
  mirbase-mature: ../srnaseq/hairpin.fa
  mirdeep2-fasta: ../srnaseq/Rfam_for_miRDeep.fa

Development version

The development version was installed using gcc 5.2, R 3.2.1 and Anaconda Python 2.3.

• install_bcbio_devel.sge This is an SGE submit script. The long running time of the installer made it better suited to being run as a batch job.


• bcbio-devel modulefile located on the system at /usr/local/modulefiles/apps/gcc/5.2/bcbio/devel

The first install attempt failed with the error

To debug, please try re-running the install command with verbose output:
export CC=${CC:-`which gcc`} && export CXX=${CXX:-`which g++`} && export SHELL=${SHELL:-/bin/bash} && export PERL5LIB=/usr/local/packages6/apps/gcc/5.2/bcbio/devel/tools/lib/perl5:${PERL5LIB} && /usr/local/packages6/apps/gcc/5.2/bcbio/devel/tools/bin/brew install -v --env=inherit --ignore-dependencies git
Traceback (most recent call last):
  File "bcbio_nextgen_install.py", line 276, in <module>
    main(parser.parse_args(), sys.argv[1:])
  File "bcbio_nextgen_install.py", line 46, in main
    subprocess.check_call([bcbio["bcbio_nextgen.py"], "upgrade"] + _clean_args(sys_argv, args, bcbio))
  File "/usr/local/packages6/apps/binapps/anacondapython/2.3/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/local/packages6/apps/gcc/5.2/bcbio/devel/anaconda/bin/bcbio_nextgen.py', 'upgrade', '--tooldir=/usr/local/packages6/apps/gcc/5.2/bcbio/devel/tools', '--isolate', '--genomes', 'GRCh37', '--aligners', 'bwa', '--aligners', 'bowtie2', '--data']' returned non-zero exit status 1

I manually ran the command

export CC=${CC:-`which gcc`} && export CXX=${CXX:-`which g++`} && export SHELL=${SHELL:-/bin/bash} && export PERL5LIB=/usr/local/packages6/apps/gcc/5.2/bcbio/devel/tools/lib/perl5:${PERL5LIB} && /usr/local/packages6/apps/gcc/5.2/bcbio/devel/tools/bin/brew install -v --env=inherit --ignore-dependencies git

and it completed successfully. I then resubmitted the submit script, which eventually completed successfully. It took several hours! At this point, I created the module file.

Bcbio was upgraded to the development version with the following interactive commands

module load apps/gcc/5.2/bcbio/devel
bcbio_nextgen.py upgrade -u development

The GATK .jar file was obtained from https://www.broadinstitute.org/gatk/download/ and installed to bcbio by running the following commands interactively

module load apps/gcc/5.2/bcbio/devel
bcbio_nextgen.py upgrade --tools --toolplus gatk=./cooper/GenomeAnalysisTK.jar

Module files

• 0.9.6a

Testing Version 0.9.6a

The following test script was submitted to the system as an SGE batch script

#!/bin/bash
#$ -pe openmp 12
#$ -l mem=4G #Per Core!
#$ -l rmem=4G #Per Core!

module add apps/gcc/5.2/bcbio/0.9.6a

git clone https://github.com/chapmanb/bcbio-nextgen.git
cd bcbio-nextgen/tests
./run_tests.sh devel
./run_tests.sh rnaseq

The tests failed due to a lack of pandoc

[2016-01-07T09:40Z] Error: pandoc version 1.12.3 or higher is required and was not found.
[2016-01-07T09:40Z] Execution halted
[2016-01-07T09:40Z] Skipping generation of coverage report: Command 'set -o pipefail; /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/anaconda/bin/Rscript /data/fe1mpc/bcbio-nextgen/tests/test_automated_output/report/qc-coverage-report-run.R


Error: pandoc version 1.12.3 or higher is required and was not found.
Execution halted' returned non-zero exit status 1

The full output of this test run is on the system at /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/tests/7-jan-2016/

Pandoc has been added to the list of applications that need to be installed on iceberg.

Development version

The following test script was submitted to the system. All tests passed. The output is at /usr/local/packages6/apps/gcc/5.2/bcbio/0.9.6a/tests/tests_07_01_2016/

#!/bin/bash
#$ -pe openmp 12
#$ -l mem=4G #Per Core!
#$ -l rmem=4G #Per Core!

module add apps/gcc/5.2/bcbio/0.9.6a

git clone https://github.com/chapmanb/bcbio-nextgen.git
cd bcbio-nextgen/tests
./run_tests.sh devel
./run_tests.sh rnaseq

bcl2fastq

bcl2fastq

Versions 1.8.4
Support Level Bronze
URL http://support.illumina.com/downloads/bcl2fastq_conversion_software_184.html

Illumina sequencing instruments generate per-cycle BCL basecall files as primary sequencing output, but many downstream analysis applications use per-read FASTQ files as input. bcl2fastq combines these per-cycle BCL files from a run and translates them into FASTQ files. bcl2fastq can begin bcl conversion as soon as the first read has been completely sequenced.

Usage To make bcl2fastq available, use the following module command in your submission scripts

module load apps/bcl2fastq/1.8.4
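
A minimal batch submission sketch is shown below; the run folder and output directory are hypothetical, the memory values are illustrative only, and the configureBclToFastq.pl options are just an assumed subset, so check the Illumina documentation for the options your run actually needs.

#!/bin/bash
#Illustrative memory request only
#$ -l mem=8G -l rmem=8G

module load apps/bcl2fastq/1.8.4

#Hypothetical paths: replace with your own run folder and output directory
RUN_FOLDER=/data/$USER/my_run
OUTPUT_DIR=/data/$USER/my_run_fastq

#Configure the conversion from the BaseCalls directory, then build it with make
configureBclToFastq.pl --input-dir $RUN_FOLDER/Data/Intensities/BaseCalls --output-dir $OUTPUT_DIR
cd $OUTPUT_DIR
make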

Installation Notes These notes are primarily for system administrators.

Compilation was done using gcc 4.4.7. I tried it with gcc 4.8 but ended up with a lot of errors. The package is also dependent on Perl. Perl 5.10.1 was used which was the system Perl installed at the time. The RPM Perl-XML-Simple also needed installing.

export TMP=/tmp
export SOURCE=${TMP}/bcl2fastq
export BUILD=${TMP}/bcl2fastq-1.8.4-build
mkdir -p /usr/local/packages6/apps/gcc/4.4.7/bcl2fastq/1.8.4
export INSTALL=/usr/local/packages6/apps/gcc/4.4.7/bcl2fastq/1.8.4


mv bcl2fastq-1.8.4.tar.bz2 ${TMP}
cd ${TMP}
tar xjf bcl2fastq-1.8.4.tar.bz2

mkdir ${BUILD}
cd ${BUILD}
${SOURCE}/src/configure --prefix=${INSTALL}

make
make install

bedtools

bedtools

Versions 2.25.0
Dependencies compilers/gcc/5.2
URL https://bedtools.readthedocs.org/en/latest/

Collectively, the bedtools utilities are a swiss-army knife of tools for a wide range of genomics analysis tasks.

Interactive usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh command.

The latest version of bedtools (currently version 2.25.0) is made available with the command:

module load apps/gcc/5.2/bedtools

Alternatively, you can make a specific version available:

module load apps/gcc/5.2/bedtools/2.25.0

After that any of the bedtools commands can be run from the prompt.
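
For example (a hedged sketch; regions_a.bed and regions_b.bed are hypothetical input files, not files shipped with the installation):

#Report intervals in regions_a.bed that overlap intervals in regions_b.bed
bedtools intersect -a regions_a.bed -b regions_b.bed > overlaps.bed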

Installation notes bedtools was installed using gcc 5.2 with the script install_bedtools.sh

Testing No test suite was found.

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/bedtools/2.25.0

The contents of the module file is

#%Module1.0#####################################################################
##
## bedtools module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##


proc ModulesHelp { } {
        global bedtools-version

        puts stderr " Adds `bedtools-$bedtools-version' to your PATH environment variable and necessary libraries"
}

set bedtools-version 2.25.0
module load compilers/gcc/5.2

prepend-path PATH /usr/local/packages6/apps/gcc/5.2/bedtools/2.25.0/bin

bitseq

bitseq

Versions 0.7.5
URL http://bitseq.github.io/

Transcript isoform level expression and differential expression estimation for RNA-seq

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrsh command. The latest version of BitSeq is made available with the command

module load apps/gcc/5.2/bitseq

Alternatively, you can load a specific version with

module load apps/gcc/5.2/bitseq/0.7.5

This command adds the BitSeq binaries to your PATH.

Documentation Documentation is available online at http://bitseq.github.io/howto/index

Installation notes BitSeq was installed using gcc 5.2

module load compilers/gcc/5.2
tar -xvzf ./BitSeq-0.7.5.tar.gz
cd BitSeq-0.7.5

make
cd ..
mkdir -p /usr/local/packages6/apps/gcc/5.2/bitseq
mv ./BitSeq-0.7.5 /usr/local/packages6/apps/gcc/5.2/bitseq/

Testing No test suite was found.

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/bitseq/0.7.5. Its contents are


#%Module1.0#####################################################################
##
## bitseq 0.7.5 modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

module load compilers/gcc/5.2

proc ModulesHelp { } {
        puts stderr "Makes bitseq 0.7.5 available"
}

set version 0.7.5
set BITSEQ_DIR /usr/local/packages6/apps/gcc/5.2/bitseq/BitSeq-$version

module-whatis "Makes bitseq 0.7.5 available"

prepend-path PATH $BITSEQ_DIR

BLAST

ncbi-blast

Version 2.3.0
URL https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download

BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit. The BLAST+ applications have a number of performance and feature improvements over the legacy BLAST applications. For details, please see the BLAST+ user manual.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx or qrsh command.

The latest version of BLAST+ (currently 2.3.0) is made available with the command

module load apps/binapps/ncbi-blast

Alternatively, you can load a specific version with

module load apps/binapps/ncbi-blast/2.3.0

This command makes the BLAST+ executables available to your session by adding the install directory to your PATH variable. It also sets the BLASTDB database environment variable.

You can now run commands directly. e.g.

blastn -help
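
As a hedged example of searching one of the installed databases (my_sequences.fasta is a hypothetical query file; the BLASTDB variable set by the module lets you refer to the database by name):

#Nucleotide search against the nt database, tabular output
blastn -db nt -query my_sequences.fasta -out my_results.txt -outfmt 6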

Databases The following databases have been installed following a user request

• nr.*tar.gz Non-redundant protein sequences from GenPept, Swissprot, PIR, PDF, PDB, and NCBI RefSeq


• nt.*tar.gz Partially non-redundant nucleotide sequences from all traditional divisions of GenBank, EMBL, and DDBJ excluding GSS, STS, PAT, EST, HTG, and WGS.

A full list of databases available on the NCBI FTP site is at ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blastdb.html

If you need any of these installing, please make a request on our github issues log.

Installation notes This was an install from binaries

#get binaries and put in the correct location
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.3.0+-x64-linux.tar.gz
tar -xzf ./ncbi-blast-2.3.0+-x64-linux.tar.gz
mkdir -p /usr/local/packages6/apps/binapps/ncbi-blast/
mv ncbi-blast-2.3.0+ /usr/local/packages6/apps/binapps/ncbi-blast/

#Create database directory
mkdir -p /usr/local/packages6/apps/binapps/ncbi-blast/ncbi-blast-2.3.0+/db

#Install the nr database
cd /usr/local/packages6/apps/binapps/ncbi-blast/ncbi-blast-2.3.0+/db
for num in `seq 0 48`;
do
  paddednum=`printf "%02d" $num`
  wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.$paddednum.tar.gz
done

#Install the nt database
for num in `seq 0 36`;
do
  paddednum=`printf "%02d" $num`
  wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.$paddednum.tar.gz
done

for f in *.tar.gz; do tar -xvzf $f; done

Testing No testing has been performed. If you can suggest a suitable test suite, please contact us.

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/binapps/ncbi-blast/2.3.0

The contents of the module file is

#%Module1.0#####################################################################
##
## BLAST 2.3.0 modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        puts stderr "Makes BLAST 2.3.0 available"
}


set BLAST_DIR /usr/local/packages6/apps/binapps/ncbi-blast/ncbi-blast-2.3.0+

module-whatis "Makes BLAST 2.3.0 available"

prepend-path PATH $BLAST_DIR/bin
prepend-path BLASTDB $BLAST_DIR/db

bless

Bless

Version 1.02
URL http://sourceforge.net/p/bless-ec/wiki/Home/

BLESS: Bloom-filter-based Error Correction Solution for High-throughput Sequencing Reads

Interactive Usage Bless uses MPI and we currently have no interactive MPI environments available. As such, it is not possible to run Bless interactively on Iceberg.

Batch Usage The latest version of bless (currently 1.02) is made available with the command

module load apps/gcc/4.9.2/bless

Alternatively, you can load a specific version with

module load apps/gcc/4.9.2/bless/1.02

The module creates the environment variable BLESS_PATH which points to the Bless installation directory. It also loads the dependent modules

• compilers/gcc/4.9.2

• mpi/gcc/openmpi/1.8.3

Here is an example batch submission script that makes use of two example input files we have included in our installation of BLESS

#!/bin/bash
#Next line is memory per slot
#$ -l mem=5G -l rmem=5G
#Ask for 4 slots in an OpenMP/MPI Hybrid queue
#These 4 slots will all be on the same node
#$ -pe openmpi-hybrid-4 4

#load the module
module load apps/gcc/4.9.2/bless/1.02

#Output information about nodes where code is running
cat $PE_HOSTFILE > nodes

data_folder=$BLESS_PATH/test_data
file1=test_small_1.fq
file2=test_small_2.fq

#Number of cores per node we are going to use.


cores=4
export OMP_NUM_THREADS=$cores

#Run BLESS
mpirun $BLESS_PATH/bless -kmerlength 21 -smpthread $cores -prefix test_bless -read1 $data_folder/$file1 -read2 $data_folder/$file2

BLESS makes use of both MPI and OpenMP parallelisation frameworks. As such, it is necessary to use the hybrid MPI/OpenMP queues. The current build of BLESS does not work on more than one node. This will limit you to the maximum number of cores available on one node.

For example, to use all 16 cores on a 16 core node you would request the following parallel environment

#$ -pe openmpi-hybrid-16 16

Remember that memory is allocated on a per-slot basis. You should ensure that you do not request more memory than is available on a single node or your job will be permanently stuck in a queue-waiting (qw) status.
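
As a hedged illustration of the arithmetic (the 64 GB node size is an assumption; check the memory of the node type you are actually targeting):

#16 slots x 5G per slot = 80G in total, which would not fit on an assumed 64 GB node
#16 slots x 3G per slot = 48G in total, which would fit
#$ -pe openmpi-hybrid-16 16
#$ -l mem=3G -l rmem=3G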

Installation notes Various issues were encountered while attempting to install bless. See https://github.com/rcgsheffield/iceberg_software/issues/143 for details. It was necessary to install gcc 4.9.2 in order to build bless. No other compiler worked!

Here are the install steps

tar -xvzf ./bless.v1p02.tgz

mkdir -p /usr/local/modulefiles/apps/gcc/4.9.2/bless/
cd v1p02/

Load Modules

module load compilers/gcc/4.9.2
module load mpi/gcc/openmpi/1.8.3

Modify the Makefile. Change the line

cd zlib; ./compile

to

cd zlib;

Manually compile zlib

cd zlib/
./compile

Finish the compilation

cd ..
make

Copy the bless folder to the central location

cd ..
cp -r ./v1p02/ /usr/local/packages6/apps/gcc/4.9.2/bless/

Testing No test suite was found.


Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/4.9.2/bless/1.02

• The module file is on github.

Bowtie2

bowtie2

Versions 2.2.6
URL http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command.

The latest version of bowtie2 (currently 2.2.6) is made available with the command

module load apps/gcc/5.2/bowtie2

Alternatively, you can load a specific version with

module load apps/gcc/5.2/bowtie2/2.2.6

This command makes the bowtie2 executables available to your session by adding the install directory to your PATH variable. This allows you to simply do something like the following

bowtie2 --version

which gives results that look something like

/usr/local/packages6/apps/gcc/5.2/bowtie2/2.2.6/bowtie2-align-s version 2.2.6
64-bit
Built on node063
Fri Oct 23 08:40:38 BST 2015
Compiler: gcc version 5.2.0 (GCC)
Options: -O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
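
As a hedged usage sketch (genome.fa, reads_1.fq and reads_2.fq are hypothetical input files):

#Build an index from a reference sequence, then align paired-end reads to it
bowtie2-build genome.fa genome_index
bowtie2 -x genome_index -1 reads_1.fq -2 reads_2.fq -S alignments.sam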

Installation notes bowtie2 2.2.6 was installed using gcc 5.2

#Build
module load compilers/gcc/5.2
unzip bowtie2-2.2.6-source.zip
cd bowtie2-2.2.6
make

#Install
cd ..


mkdir -p /usr/local/packages6/apps/gcc/5.2/bowtie2
mv ./bowtie2-2.2.6 /usr/local/packages6/apps/gcc/5.2/bowtie2/2.2.6

Testing No test suite was found.

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/bowtie2/2.2.6

The contents of the module file is

#%Module1.0#####################################################################
##
## bowtie2 2.2.6 modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

module load compilers/gcc/5.2

proc ModulesHelp { } {
        puts stderr "Makes bowtie 2.2.6 available"
}

set version 2.2.6
set BOWTIE2_DIR /usr/local/packages6/apps/gcc/5.2/bowtie2/$version

module-whatis "Makes bowtie2 v2.2.6 available"

prepend-path PATH $BOWTIE2_DIR

bwa

bwa

Versions 0.7.12
URL http://bio-bwa.sourceforge.net/

BWA (Burrows-Wheeler Aligner) is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx command.

The latest version of bwa (currently 0.7.12) is made available with the command


module load apps/gcc/5.2/bwa

Alternatively, you can load a specific version with

module load apps/gcc/5.2/bwa/0.7.12
module load apps/gcc/5.2/bwa/0.7.5a

This command makes the bwa binary available to your session.

Documentation Once you have made bwa available to the system using the module command above, you can read the man pages by typing

man bwa
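
A hedged example of a typical workflow (ref.fa, reads_1.fq and reads_2.fq are hypothetical input files):

#Index a reference and align paired-end reads with BWA-MEM
bwa index ref.fa
bwa mem ref.fa reads_1.fq reads_2.fq > aligned.sam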

Installation notes bwa 0.7.12

bwa 0.7.12 was installed using gcc 5.2

module load compilers/gcc/5.2

#build
module load compilers/gcc/5.2
tar -xvjf ./bwa-0.7.12.tar.bz2
cd bwa-0.7.12
make

#Sort out manfile
mkdir -p share/man/man1
mv bwa.1 ./share/man/man1/

#Install
mkdir -p /usr/local/packages6/apps/gcc/5.2/bwa/
cd ..
mv bwa-0.7.12 /usr/local/packages6/apps/gcc/5.2/bwa/0.7.12/

bwa 0.7.5a

bwa 0.7.5a was installed using gcc 5.2

module load compilers/gcc/5.2

#build
module load compilers/gcc/5.2
tar -xvjf bwa-0.7.5a.tar.bz2
cd bwa-0.7.5a
make

#Sort out manfile
mkdir -p share/man/man1
mv bwa.1 ./share/man/man1/

#Install
mkdir -p /usr/local/packages6/apps/gcc/5.2/bwa/
cd ..
mv bwa-0.7.5a /usr/local/packages6/apps/gcc/5.2/bwa/

Testing No test suite was found.


Module files The default version is controlled by the .version file at /usr/local/modulefiles/apps/gcc/5.2/bwa/.version

#%Module1.0#####################################################################
##
## version file for bwa
##
set ModulesVersion "0.7.12"

Version 0.7.12

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/bwa/0.7.12

• On github: 0.7.12.

Version 0.7.5a

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/bwa/0.7.5a

• On github: 0.7.5a.

CDO

CDO

Version 1.7.2
URL https://code.zmaw.de/projects/cdo

CDO is a collection of command line Operators to manipulate and analyse Climate and NWP model Data. Supported data formats are GRIB 1/2, netCDF 3/4, SERVICE, EXTRA and IEG. There are more than 600 operators available.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command.

To add the cdo command to the system PATH, execute the following command

module load apps/gcc/5.3/cdo/1.7.2
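
As a hedged check that the command is available (input.nc is a hypothetical netCDF file):

#Print the CDO version, then summarise the contents of a netCDF file
cdo -V
cdo sinfo input.nc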

Documentation The CDO manual is available online at https://code.zmaw.de/projects/cdo/embedded/index.html

Installation notes

Installation

(1) Download the latest current version:

wget https://code.zmaw.de/attachments/download/12350/cdo-current.tar.gz

(2) Extract the files into a working directory (I have used /data/cs1ds/2016/cdo):

gunzip cdo-current.tar.gz
tar -xvf cdo-current.tar

(3) Install the program (I have used /usr/local/extras/CDO):

module load compilers/gcc/5.3


./configure --prefix=/usr/local/extras/CDO --with-netcdf=/usr/local/extras/netcdf/4.3.2
make install

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.3/cdo

Its contents are

#%Module1.0#####################################################################
##
## CDO Climate Data Operators Module file.
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        puts stderr "Makes Climate Data Operator $version available"
}

set version 1.7.2
set BIN_DIR /usr/local/extras/CDO/$version

module-whatis "Makes CDO (Climate Data Operators) $version available"

prepend-path PATH $BIN_DIR/bin

Code Saturne 4.0

Code Saturne

Version 4.0
URL http://code-saturne.org/cms/
Documentation http://code-saturne.org/cms/documentation
Location /usr/local/packages6/apps/gcc/4.4.7/code_saturne/4.0

Code_Saturne solves the Navier-Stokes equations for 2D, 2D-axisymmetric and 3D flows, steady or unsteady, laminar or turbulent, incompressible or weakly dilatable, isothermal or not, with scalars transport if required.

Usage To make Code Saturne available, run the following module command after starting a qsh session.

module load apps/code_saturne/4.0.0

Troubleshooting If you run Code Saturne jobs from /fastdata they will fail since the underlying filesystem used by /fastdata does not support POSIX locks. If you need more space than is available in your home directory, use /data instead. If you use /fastdata with Code Saturne you may get an error message similar to the one below

File locking failed in ADIOI_Set_lock(fd 14,cmd F_SETLKW/7,type F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 26.
- If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
- If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.
ADIOI_Set_lock:: Function not implemented


ADIOI_Set_lock:offset 0, length 8
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
solver script exited with status 1.

Error running the calculation.
Check Code_Saturne log (listing) and error* files for details.

Installation Notes These notes are primarily for system administrators.

Installation notes for the version referenced by module module load apps/code_saturne/4.0.0:

Pre-requisites: This version of Code Saturne was built with the following:

• gcc 4.4.7

• cgns 3.2.1

• MED 3.0.8

• OpenMPI 1.8.3

• HDF5 1.8.14 built using gcc

module load libs/gcc/4.4.7/cgns/3.2.1

tar -xvzf code_saturne-4.0.0.tar.gz
mkdir code_saturn_build
cd code_saturn_build/
./../code_saturne-4.0.0/configure --prefix=/usr/local/packages6/apps/gcc/4.4.7/code_saturne/4.0 --with-mpi=/usr/local/mpi/gcc/openmpi/1.8.3/ --with-med=/usr/local/packages6/libs/gcc/4.4.7/med/3.0.8/ --with-cgns=/usr/local/packages6/libs/gcc/4.4.7/cgnslib/3.2.1 --with-hdf5=/usr/local/packages6/hdf5/gcc-4.4.7/openmpi-1.8.3/hdf5-1.8.14/

This gave the following configuration

Configuration options:
  use debugging code: no
  MPI (Message Passing Interface) support: yes
  OpenMP support: no

The package has been configured. Type:
  make
  make install

To generate and install the PLE package

Configuration options:
  use debugging code: no
  use malloc hooks: no
  use graphical user interface: yes
  use long integers: yes
  Zlib (gzipped file) support: yes
  MPI (Message Passing Interface) support: yes
  MPI I/O support: yes
  MPI2 one-sided communication support: yes
  OpenMP support: no
  BLAS (Basic Linear Algebra Subprograms) support: no


  Libxml2 (XML Reader) support: yes
  ParMETIS (Parallel Graph Partitioning) support: no
  METIS (Graph Partitioning) support: no
  PT-SCOTCH (Parallel Graph Partitioning) support: no
  SCOTCH (Graph Partitioning) support: no
  CCM support: no
  HDF (Hierarchical Data Format) support: yes
  CGNS (CFD General Notation System) support: yes
  MED (Model for Exchange of Data) support: yes
  MED MPI I/O support: yes
  MEDCoupling support: no
  Catalyst (ParaView co-processing) support: no
  EOS support: no
  freesteam support: no
  SALOME GUI support: yes
  SALOME Kernel support: yes
  Dynamic loader support (for YACS): dlopen

I then did

make
make install

Post Install Steps To make Code Saturne aware of the SGE system:

• Created /usr/local/packages6/apps/gcc/4.4.7/code_saturne/4.0/etc/code_saturne.cfg: See code_saturne.cfg 4.0

• Modified /usr/local/packages6/apps/gcc/4.4.7/code_saturne/4.0/share/code_saturne/batch/batch.SGE. See: batch.SGE 4.0

Testing This module has not yet been properly tested and so should be considered experimental.

Several user jobs of up to 8 cores have been submitted and ran to completion.

Module File Module File Location: /usr/local/modulefiles/apps/code_saturne/4.0.0

#%Module1.0#####################################################################
##
## code_saturne 4.0 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        global codesaturneversion

        puts stderr " Adds `code_saturn-$codesaturneversion' to your PATH environment variable and necessary libraries"
}

set codesaturneversion 4.0
module load mpi/gcc/openmpi/1.8.3

module-whatis "loads the necessary `code_saturne-$codesaturneversion' library paths"


set cspath /usr/local/packages6/apps/gcc/4.4.7/code_saturne/4.0
prepend-path MANPATH $cspath/share/man
prepend-path PATH $cspath/bin

ctffind

ctffind

Versions 3.140609
URL http://ctffind.readthedocs.org/en/latest/

CTFFIND3 and CTFTILT are two programs for finding CTFs of electron micrographs. The program CTFFIND3 is an updated version of the program CTFFIND2, which was developed in 1998 by Nikolaus Grigorieff at the MRC Laboratory of Molecular Biology in Cambridge, UK with financial support from the MRC. This software is licensed under the terms of the Janelia Research Campus Software Copyright 1.1.

Making ctffind available The following module command makes the latest version of ctffind available to your session

module load apps/gcc/4.4.7/ctffind

Alternatively, you can make a specific version available

module load apps/gcc/4.4.7/ctffind/3.140609

Installation notes These are primarily for system administrators.

• ctffind was installed using the gcc 4.4.7 compiler

• install_ctffind.sh

Modulefile The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/ctffind/3.140609

The contents of the module file is

#%Module1.0#####################################################################
##
## ctfind module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        global bedtools-version

        puts stderr " Adds `ctffind-$ctffindversion' to your PATH environment variable"
}

set ctffindversion 3.140609

prepend-path PATH /usr/local/packages6/apps/gcc/4.4.7/ctffind/3.140609/


Cufflinks

Cufflinks

Version 2.2.1
URL http://cole-trapnell-lab.github.io/cufflinks

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command.

The latest version of Cufflinks (currently 2.2.1) is made available with the command

module load apps/binapps/cufflinks

Alternatively, you can load a specific version with

module load apps/binapps/cufflinks/2.2.1

This command makes the cufflinks binary directory available to your session by adding it to the PATH environment variable.
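
A hedged example invocation (accepted_hits.bam is a hypothetical input file and cufflinks_output a hypothetical output directory):

#Assemble transcripts from aligned RNA-Seq reads using 8 threads
cufflinks -p 8 -o cufflinks_output accepted_hits.bam

If you use -p 8 in a batch job, remember to request a matching number of cores from the scheduler (for example with the openmp parallel environment).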

Documentation The Cufflinks manual is available online at http://cole-trapnell-lab.github.io/cufflinks/manual/

Installation notes A binary install was used

tar -xvzf cufflinks-2.2.1.Linux_x86_64.tar.gz
mkdir -p /usr/local/packages6/apps/binapps/cufflinks
mv cufflinks-2.2.1.Linux_x86_64 /usr/local/packages6/apps/binapps/cufflinks/2.2.1

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/binapps/cufflinks/2.2.1

Its contents are

##
## cufflinks 2.2.1 modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        puts stderr "Makes cufflinks 2.2.1 available"
}

set version 2.2.1
set CUFF_DIR /usr/local/packages6/apps/binapps/cufflinks/$version


module-whatis "Makes cufflinks 2.2.1 available"

prepend-path PATH $CUFF_DIR

ffmpeg

ffmpeg

Version 2.8.3
URL https://www.ffmpeg.org/

FFmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created. It supports the most obscure ancient formats up to the cutting edge. No matter if they were designed by some standards committee, the community or a corporation. It is also highly portable: FFmpeg compiles, runs, and passes our testing infrastructure FATE across Linux, Mac OS X, Microsoft Windows, the BSDs, Solaris, etc. under a wide variety of build environments, machine architectures, and configurations.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. The latest version of ffmpeg (currently 2.8.3) is made available with the command

module load apps/gcc/5.2/ffmpeg

Alternatively, you can load a specific version with

module load apps/gcc/5.2/ffmpeg/2.8.3

This command makes the ffmpeg binaries available to your session. It also loads version 5.2 of the gcc compiler environment since gcc 5.2 was used to compile ffmpeg 2.8.3

You can now run ffmpeg. For example, to confirm the version loaded

ffmpeg -version

and to get help

ffmpeg -h
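
As a hedged example (input.avi is a hypothetical file name):

#Convert an AVI file to MP4 using the default settings
ffmpeg -i input.avi output.mp4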

Documentation Once you have made ffmpeg available to the system using the module command above, you can read the man pages by typing

man ffmpeg

Installation notes ffmpeg was installed using gcc 5.2

module load compilers/gcc/5.2

tar xf ./ffmpeg-2.8.3.tar.xz
cd ffmpeg-2.8.3
mkdir -p /usr/local/packages6/apps/gcc/5.2/ffmpeg/2.8.3
./configure --prefix=/usr/local/packages6/apps/gcc/5.2/ffmpeg/2.8.3
make
make install


Testing The test suite was executed

make check

All tests passed.

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/ffmpeg/2.8.3

• The module file is on github.

Freesurfer

Freesurfer

Versions 5.3.0
URL http://freesurfer.net/

An open source software suite for processing and analyzing (human) brain MRI images.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive graphical session with the qsh command.

The latest version of Freesurfer is made available with the command

module load apps/binapps/freesurfer

Alternatively, you can load a specific version with

module load apps/binapps/freesurfer/5.3.0

Important note Freesurfer is known to produce differing results when used across platforms. See http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0038234 for details. We recommend that you run all of your analyses for any given study on the same operating system/hardware combination and record details of these as part of your results.

Testing No formal test suite was found. A set of commands to try was suggested at https://surfer.nmr.mgh.harvard.edu/fswiki/TestingFreeSurfer (Accessed November 6th 2015)

The following were tried

freeview -v $SUBJECTS_DIR/bert/mri/brainmask.mgz -v $SUBJECTS_DIR/bert/mri/aseg.mgz:colormap=lut:opacity=0.2 -f $SUBJECTS_DIR/bert/surf/lh.white:edgecolor=yellow -f $SUBJECTS_DIR/bert/surf/rh.white:edgecolor=yellow -f $SUBJECTS_DIR/bert/surf/lh.pial:annot=aparc:edgecolor=red -f $SUBJECTS_DIR/bert/surf/rh.pial:annot=aparc:edgecolor=red

tkmedit bert orig.mgz

tkmedit bert norm.mgz -segmentation aseg.mgz $FREESURFER_HOME/FreeSurferColorLUT.txt

tksurfer bert rh pial

qdec


Installation notes These are primarily for administrators of the system. The github issue for the original install request is at https://github.com/rcgsheffield/iceberg_software/issues/164

Freesurfer was installed as follows

wget -c ftp://surfer.nmr.mgh.harvard.edu/pub/dist/freesurfer/5.3.0/freesurfer-Linux-centos6_x86_64-stable-pub-v5.3.0.tar.gz
tar -xvzf ./freesurfer-Linux-centos6_x86_64-stable-pub-v5.3.0.tar.gz
mv freesurfer 5.3.0
mkdir -p /usr/local/packages6/apps/binapps/freesurfer
mv ./5.3.0/ /usr/local/packages6/apps/binapps/freesurfer/

The license file was obtained from the vendor and placed in /usr/local/packages6/apps/binapps/freesurfer/license.txt. The vendors were asked if it was OK to use this license on a central system. The answer is ‘yes’ - details at http://www.mail-archive.com/[email protected]/msg43872.html

To find the information necessary to create the module file

env > base.env
source /usr/local/packages6/apps/binapps/freesurfer/5.3.0/SetUpFreeSurfer.sh
env > after.env
diff base.env after.env

The script SetUpFreeSurfer.sh additionally creates a MATLAB startup.m file in the user’s home directory if it does not already exist. This is for MATLAB support only, has not been replicated in the module file and is currently not supported in this install.

The module file is at /usr/local/modulefiles/apps/binapps/freesurfer/5.3.0

#%Module10.2#####################################################################

## Module file logging
source /usr/local/etc/module_logging.tcl
##

# Freesurfer version (not in the user's environment)
set ver 5.3.0

proc ModulesHelp { } {
        global ver

        puts stderr "Makes Freesurfer $ver available to the system."
}

module-whatis "Sets the necessary Freesurfer $ver paths"

prepend-path PATH /usr/local/packages6/apps/binapps/freesurfer/$ver/bin
prepend-path FREESURFER_HOME /usr/local/packages6/apps/binapps/freesurfer/$ver

# The following emulates the results of 'source $FREESURFER_HOME/SetUpFreeSurfer.csh'
setenv FS_OVERRIDE 0
setenv PERL5LIB /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/lib/perl5/5.8.5
setenv OS Linux
setenv LOCAL_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/local
setenv FSFAST_HOME /usr/local/packages6/apps/binapps/freesurfer/5.3.0/fsfast
setenv MNI_PERL5LIB /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/lib/perl5/5.8.5
setenv FMRI_ANALYSIS_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/fsfast
setenv FSF_OUTPUT_FORMAT nii.gz
setenv MINC_BIN_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/bin
setenv SUBJECTS_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/subjects


prepend-path PATH /usr/local/packages6/apps/binapps/freesurfer/5.3.0/fsfast/bin:/usr/local/packages6/apps/binapps/freesurfer/5.3.0/tktools:/usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/bin

setenv FUNCTIONALS_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/sessions
setenv MINC_LIB_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/lib
setenv MNI_DIR /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni
#setenv FIX_VERTEX_AREA #How do you set this to the empty string? This was done in the original script.
setenv MNI_DATAPATH /usr/local/packages6/apps/binapps/freesurfer/5.3.0/mni/data

gemini

gemini

Versions 0.18.0
URL http://gemini.readthedocs.org/en/latest/

GEMINI (GEnome MINIng) is a flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome. By placing genetic variants, sample phenotypes and genotypes, as well as genome annotations into an integrated database framework, GEMINI provides a simple, flexible, and powerful system for exploring genetic variation for disease and population genetics.

Making gemini available The following module command makes the latest version of gemini available to your session

module load apps/gcc/4.4.7/gemini

Alternatively, you can make a specific version available

module load apps/gcc/4.4.7/gemini/0.18
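
A hedged example query (my_study.db is a hypothetical GEMINI database created from your own data):

#Select a few variants from a GEMINI database
gemini query -q "select chrom, start, end, ref, alt from variants limit 5" my_study.db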

Installation notes These are primarily for system administrators.

• gemini was installed using the gcc 4.4.7 compiler

• install_gemini.sh

Testing The test script used was

• test_gemini.sh

The full output from the test run is on the system at /usr/local/packages6/apps/gcc/4.4.7/gemini/0.18/test_results/

There was one failure

effstring.t02 ... \c
ok

effstring.t03 ... \c
15d14
< gene impact impact_severity biotype is_exonic is_coding is_lof
64a64
> gene impact impact_severity biotype is_exonic is_coding is_lof
fail
updated 10 variants

annotate-tool.t1 ... ok


updated 10 variants
annotate-tool.t1 ... ok
annotate-tool.t2 ... ok

This was reported to the developers, who indicated that it was nothing to worry about.

Modulefile The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/gemini/0.18

The contents of the module file is

#%Module1.0#####################################################################
##
## Gemini module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        global bedtools-version

        puts stderr " Adds `gemini-$geminiversion' to your PATH environment variable"
}

set geminiversion 0.18

prepend-path PATH /usr/local/packages6/apps/gcc/4.4.7/gemini/0.18/bin/

GROMACS

GROMACS

Latest version 5.1
Dependencies mpi/intel/openmpi/1.10.0
URL http://www.gromacs.org/

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To make GROMACS available in this session, run the following command:

source /usr/local/packages6/apps/intel/15/gromacs/5.1/bin/GMXRC

Installation notes

Version 5.1 Compilation Choices:

• Use latest intel compilers

• -DGMX_MPI=on

• -DCMAKE_PREFIX_PATH=/usr/local/packages6/apps/intel/15/gromcas

• -DGMX_FFT_LIBRARY="fftw3"

• -DGMX_BUILD_OWN_FFTW=ON


The script used to build gromacs can be found here.

htop

htop

Versions 2.0
URL http://hisham.hm/htop/

This is htop, an interactive process viewer for Unix systems.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command.

The latest version of htop (currently 2.0) is made available with the command

module load apps/gcc/4.4.7/htop

Alternatively, you can load a specific version with

module load apps/gcc/4.4.7/htop/2.0

This command makes the htop binary available to your session.

Installation notes htop was installed using gcc 4.4.7

tar -xvzf ./htop-2.0.0.tar.gz
cd htop-2.0.0
mkdir -p /usr/local/packages6/apps/gcc/4.4.7/htop/2.0
./configure --prefix=/usr/local/packages6/apps/gcc/4.4.7/htop/2.0
make
make install

Testing No test suite was found.

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/htop/2.0

• The module file is on github.

IDL

IDL

Version 8.5
Support Level extras
Dependencies java
URL http://www.exelisvis.co.uk/ProductsServices/IDL.aspx
Documentation http://www.exelisvis.com/docs/using_idl_home.html


IDL is a data analysis language that first appeared in 1977.

Usage If you wish to use the IDLDE then you may need to request more memory for the interactive session using something like qsh -l mem=8G.

IDL can be activated using the module file:

module load apps/idl/8.5

then run using idl or idlde for the interactive development environment.
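
As a hedged sketch, IDL can also evaluate a single statement from the command line, which is a quick way to confirm that the module is working (the statement itself is trivial and purely illustrative):

idl -e "print, 'Hello from IDL'"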

Installation notes Extract the supplied linux x86-64 tar file, run the install shell script and point it at the install directory.

Integrative Genomics Viewer (IGV)

Integrative Genomics Viewer (IGV)

Version 2.3.63
URL https://www.broadinstitute.org/igv/

The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start a graphical interactive session with the qsh command. In testing, we determined that you need to request at least 7 Gigabytes of memory to launch IGV

qsh -l mem=7G

The latest version of IGV (currently 2.3.63) is made available with the command

module load apps/binapps/igv

Alternatively, you can load a specific version with

module load apps/binapps/igv/2.3.63

This command makes the IGV binary directory available to your session by adding it to the PATH environment variable. Launch the application with the command

igv.sh

Documentation The IGV user guide is available online at https://www.broadinstitute.org/igv/UserGuide

Installation notes A binary install was used

unzip IGV_2.3.63.zip
mkdir -p /usr/local/packages6/apps/binapps/IGV
mv ./IGV_2.3.63 /usr/local/packages6/apps/binapps/IGV/2.3.63
rm /usr/local/packages6/apps/binapps/IGV/2.3.63/igv.bat


Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/binapps/igv/2.3.63

Its contents

#%Module1.0#####################################################################
##
## IGV 2.3.63 modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
        puts stderr "Makes IGV 2.3.63 available"
}

set version 2.3.63
set IGV_DIR /usr/local/packages6/apps/binapps/IGV/$version

module-whatis "Makes IGV 2.3.63 available"

prepend-path PATH $IGV_DIR

ImageJ

ImageJ

Version 1.50g
URL http://imagej.nih.gov/ij/

ImageJ is a public domain, Java-based image processing program developed at the National Institutes of Health.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx command.

The latest version of ImageJ (currently 1.50g) is made available with the command

module load apps/binapps/imagej

Alternatively, you can load a specific version with

module load apps/binapps/imagej/1.50g

This module command also changes the environment to use Java 1.8 and Java 3D 1.5.2. An environment variable called IMAGEJ_DIR is created by the module command that contains the path to the requested version of ImageJ.

You can start ImageJ, with default settings, with the command

imagej

This starts imagej with 512Mb of Java heap space. To request more, you need to run the Java .jar file directly and use the -Xmx Java flag. For example, to request 1 Gigabyte


java -Xmx1G -Dplugins.dir=$IMAGEJ_DIR/plugins/ -jar $IMAGEJ_DIR/ij.jar

You will need to ensure that you’ve requested enough virtual memory from the scheduler to support the above request. For more details, please refer to the Virtual Memory section of our Java documentation.

Documentation Links to documentation can be found at http://imagej.nih.gov/ij/download.html

Installation notes Install version 1.49 using the install_imagej.sh script

Run the GUI and update to the latest version by clicking on Help > Update ImageJ

Rename the run script to imagej since run is too generic.

Modify the imagej script so that it reads

java -Xmx512m -jar -Dplugins.dir=/usr/local/packages6/apps/binapps/imagej/1.50g/plugins/ /usr/local/packages6/apps/binapps/imagej/1.50g/ij.jar

ImageJ’s 3D plugin requires Java 3D which needs to be included in the JRE used by imagej. I attempted a module-controlled version of Java3D but the plugin refused to recognise it. See https://github.com/rcgsheffield/iceberg_software/issues/279 for details.

The plug-in will install Java3D within the JRE for you if you launch it and click Plugins>3D>3D Viewer

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/binapps/imagej/1.50g

Its contents are

#%Module1.0#####################################################################
##
## ImageJ 1.50g modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

module load apps/java/1.8u71

proc ModulesHelp { } {
        puts stderr "Makes ImageJ 1.50g available"
}

set version 1.50g
set IMAGEJ_DIR /usr/local/packages6/apps/binapps/imagej/$version

module-whatis "Makes ImageJ 1.50g available"

prepend-path PATH $IMAGEJ_DIR
prepend-path CLASSPATH $IMAGEJ_DIR/
prepend-path IMAGEJ_DIR $IMAGEJ_DIR

R (Intel Build)


R

Dependencies BLAS
URL http://www.r-project.org/
Documentation http://www.r-project.org/

R is a statistical computing language. This version of R is built using the Intel Compilers and the Intel Math Kernel Library. This combination can result in significantly faster runtimes in certain circumstances.

Most R extensions are written and tested for the gcc suite of compilers so it is recommended that you perform testing before switching to this version of R.

CPU Architecture The Intel build of R makes use of CPU instructions that are only present on the most modern of Iceberg’s nodes. In order to use them, you should add the following to your batch submission scripts

#$ -l arch=intel-e5-2650v2

If you do not do this, you will receive the following error message when you try to run R

Please verify that both the operating system and the processor support Intel(R) F16C and AVX instructions.

Loading the modules After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command.

There are two Intel builds of R: sequential and parallel. The sequential build makes use of one CPU core and can be used as a drop-in replacement for the standard version of R installed on Iceberg.

module load apps/intel/15/R/3.3.1_sequential

The parallel version makes use of multiple CPU cores for certain linear algebra routines since it is linked to the parallel version of the Intel MKL. Note that only linear algebra routines are automatically parallelised.

module load apps/intel/15/R/3.3.1_parallel

When using the parallel module, you must also ensure that you set the bash environment variable OMP_NUM_THREADS to the number of cores you require and also use the openmp parallel environment, e.g. add the following to your submission script

#$ -pe openmp 8
export OMP_NUM_THREADS=8

module load apps/intel/15/R/3.3.1_parallel

Example batch jobs Here is how to run the R script called linear_algebra_bench.r from the HPC Examples github repository

#!/bin/bash
#This script runs the linear algebra benchmark multiple times using the intel-compiled version of R
#that's linked with the sequential MKL
#$ -l mem=8G -l rmem=8G
# Target the Ivy Bridge Processors
#$ -l arch=intel-e5-2650v2

module load apps/intel/15/R/3.3.1_sequential


echo "Intel R with sequential MKL on intel-e5-2650v2"
Rscript linear_algebra_bench.r output_data.rds

Here is how to run the same code using 8 cores

#!/bin/bash
#$ -l mem=3G -l rmem=3G # Memory per core
# Target the Ivy Bridge Processors
#$ -l arch=intel-e5-2650v2
#$ -pe openmp 8
export OMP_NUM_THREADS=8

module load apps/intel/15/R/3.3.1_parallel

echo "Intel R with parallel MKL on intel-e5-2650v2"
echo "8 cores"
Rscript linear_algebra_bench.r 8core_output_data.rds

Installing additional packages By default, the standard version of R allows you to install packages into the location ~/R/x86_64-unknown-linux-gnu-library/3.3/, where ~ refers to your home directory.

To ensure that the Intel builds do not contaminate the standard gcc builds, the Intel R module files set the environment variable R_LIBS_USER to point to ~/R/intel_R/3.3.1

As a user, you should not need to worry about this detail and just install packages as you usually would from within R, e.g.

install.packages("dplyr")

The Intel build of R will ignore any packages installed in your home directory for the standard version of R and vice versa

Installation Notes These notes are primarily for administrators of the system.

version 3.3.1

This was a scripted install. It was compiled from source with Intel Compiler 15.0.3 and with --enable-R-shlib enabled. It was run in batch mode.

This build required several external modules including xz utils, curl, bzip2 and zlib

• install_intel_r_sequential.sh Downloads, compiles, tests and installs R 3.3.1 using Intel Compilers and the sequential MKL. The install and test logs are at /usr/local/packages6/apps/intel/15/R/sequential-3.3.1/install_logs/

• install_intel_r_parallel.sh Downloads, compiles, tests and installs R 3.3.1 using Intel Compilers and the parallel MKL. The install and test logs are at /usr/local/packages6/apps/intel/15/R/sequential-3.3.1/install_logs/

• 3.3.1_parallel Parallel Module File

• 3.3.1_sequential Sequential Module File

JAGS


JAGS

Latest version 4.2
URL http://mcmc-jags.sourceforge.net/

JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS. JAGS was written with three aims in mind:

• To have a cross-platform engine for the BUGS language

• To be extensible, allowing users to write their own functions, distributions and samplers.

• To be a platform for experimentation with ideas in Bayesian modeling

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx or qrsh command. To make JAGS available in this session, run one of the following module commands

module load apps/gcc/4.8.2/JAGS/4.2
module load apps/gcc/4.8.2/JAGS/3.4
module load apps/gcc/4.8.2/JAGS/3.1

You can now run the jags command

jags
Welcome to JAGS 4.2.0 on Thu Jun 30 09:21:17 2016
JAGS is free software and comes with ABSOLUTELY NO WARRANTY
Loading module: basemod: ok
Loading module: bugs: ok
.

The rjags and runjags interfaces in R rjags and runjags are CRAN packages that provide an R interface to jags. They are not installed in R by default.

After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command. Run the following module commands

module load compilers/gcc/4.8.2
module load apps/gcc/4.8.2/JAGS/4.2
module load apps/R/3.3.0

Launch R by typing R and pressing return. Within R, execute the following commands

install.packages('rjags')
install.packages('runjags')

and follow the on-screen instructions. Answer y to any questions about the creation of a personal library should they be asked.

The packages will be stored in a directory called R within your home directory.

You should only need to run the install.packages commands once. When you log into the system in future, you will only need to run the module commands above to make JAGS available to the system.

You load the rjags and runjags packages the same as you would any other R package

library('rjags')
library('runjags')


If you receive an error message such as

Error : .onLoad failed in loadNamespace() for 'rjags', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/home/fe1mpc/R/x86_64-unknown-linux-gnu-library/3.2/rjags/libs/rjags.so':
  libjags.so.3: cannot open shared object file: No such file or directory

Error: package or namespace load failed for 'rjags'

the most likely cause is that you forgot to load the necessary modules before starting R.

Installation notes Version 4.2

JAGS 4.2 was built with gcc 4.8.2

• Install script on github - https://github.com/mikecroucher/HPC_Installers/blob/master/apps/jags/4.2.0/sheffield/iceberg/install_jags4.2.0.sh

• Module file on github - https://github.com/mikecroucher/HPC_Installers/blob/master/apps/jags/4.2.0/sheffield/iceberg/4.2

Version 3.4

JAGS 3.4 was built with gcc 4.8.2

module load compilers/gcc/4.8.2
tar -xvzf ./JAGS-3.4.0.tar.gz
cd JAGS-3.4.0
mkdir -p /usr/local/packages6/apps/gcc/4.8.2/JAGS/3.4
./configure --prefix=/usr/local/packages6/apps/gcc/4.8.2/JAGS/3.4
make
make install

Version 3.1

JAGS 3.1 was built with gcc 4.8.2

module load compilers/gcc/4.8.2
tar -xvzf ./JAGS-3.1.0.tar.gz
cd JAGS-3.1.0
mkdir -p /usr/local/packages6/apps/gcc/4.8.2/JAGS/3.1
./configure --prefix=/usr/local/packages6/apps/gcc/4.8.2/JAGS/3.1
make
make install

Java

Java

Latest Version 1.8u71
URL https://www.java.com/en/download/

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Interactive Usage

After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command.

The latest version of Java (currently 1.8u71) is made available with the command


module load apps/java

Alternatively, you can load a specific version with one of

module load apps/java/1.8u71
module load apps/java/1.7.0u55
module load apps/java/1.7
module load apps/java/1.6

Check that you have the version you expect. First, the runtime

java -version

java version "1.8.0_71"
Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)

Now, the compiler

javac -version

javac 1.8.0_71

Virtual Memory

By default, Java requests a lot of virtual memory on startup. This is usually a given fraction of the physical memory on a node, which can be quite a lot on Iceberg. This then exceeds a user's virtual memory limit set by the scheduler and causes a job to fail.

To fix this, we have created a wrapper script to Java that uses the -Xmx1G switch to force Java to only request one Gigabyte of memory for its heap size. If this is insufficient, you are free to allocate as much memory as you require, but be sure to request enough from the scheduler as well. You'll typically need to request more virtual memory from the scheduler than you specify in the Java -Xmx switch.

For example, consider the following submission script. Note that it was necessary to request 9 Gigabytes of memory from the scheduler even though we only allocated a 5 Gigabyte heap size in Java. The requirement for 9 Gigabytes was determined empirically

#!/bin/bash
# Request 9 gigabytes of virtual memory (mem)
# and 9 gigabytes of real memory (rmem)
#$ -l mem=9G -l rmem=9G

# load the Java module
module load apps/java/1.8u71

# Run the java program allocating 5 Gigabytes for the heap
java -Xmx5G HelloWorld

Installation notes These are primarily for administrators of the system.

Unzip and copy the install directory to /usr/local/packages6/apps/binapps/java/jdk1.8.0_71/

To fix the virtual memory issue described above, we use a wrapper around the java install that sets Java's Xmx parameter to a reasonable value.

Create the file /usr/local/packages6/apps/binapps/java/jdk1.8.0_71/shef/java with contents

#!/bin/bash
#


# Java version 1.8 cannot be invoked without specifying the java virtual
# machine size due to the limitations imposed by us via SGE on memory usage.
# Therefore this script intercepts the java invocations and adds a
# memory constraint parameter to java engine unless there was one already
# specified on the command parameter.
#
#

if test -z "`echo $* | grep -e -Xmx`"; then
    # user has not specified -Xmx memory requirement flag, so add it.
    /usr/local/packages6/apps/binapps/java/jdk1.8.0_71/bin/java -Xmx1G $*
else
    # user specified the -Xmx flag, so don't add it.
    /usr/local/packages6/apps/binapps/java/jdk1.8.0_71/bin/java $*
fi

The module file is at /usr/local/modulefiles/apps/java/1.8u71. Its contents are

#%Module10.2
#####################################################################

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    global helpmsg
    puts stderr "\t$helpmsg\n"

}

set version 1.8

set javahome /usr/local/packages6/apps/binapps/java/jdk1.8.0_71/

if [ file isdirectory $javahome/bin ] {
    module-whatis "Sets JAVA to version $version"
    set helpmsg "Changes the default version of Java to Version $version"
    # bring in new version
    setenv JAVA_HOME $javahome
    prepend-path PATH $javahome/bin
    prepend-path PATH $javahome/shef
    prepend-path MANPATH $javahome/man
} else {
    module-whatis "JAVA $version not installed"
    set helpmsg "JAVA $version not installed"
    if [ expr [ module-info mode load ] || [ module-info mode display ] ] {
        # bring in new version
        puts stderr "JAVA $version not installed on [uname nodename]"
    }
}

Julia


Julia

Versions 0.5.0-rc3
URL http://julialang.org/

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.

Interactive Usage

After connecting to iceberg (see Connect to iceberg), start an interactive session with either the qrsh or qrshx command.

The latest version of Julia (currently 0.5.0-rc3) is made available with the commands

module load compilers/gcc/5.2
module load apps/gcc/5.2/julia/0.5.0-rc3

This adds Julia to your PATH and also loads the gcc 5.2 compiler environment with which Julia was built.

Start Julia by executing the command

julia

You can exit a Julia session with the quit() function.

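Julia programs can also be run non-interactively through the batch queue using a standard submission script. This is a minimal sketch only; the script name my_analysis.jl is a placeholder for your own Julia program.

#!/bin/bash
# Request 8 gigabytes of virtual memory (mem) and 8 gigabytes of real memory (rmem)
#$ -l mem=8G -l rmem=8G

# Load the gcc environment and Julia
module load compilers/gcc/5.2
module load apps/gcc/5.2/julia/0.5.0-rc3

# Run the Julia script
julia my_analysis.jl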
Installation notes These are primarily for administrators of the system.

Julia was installed using gcc 5.2

module load apps/gcc/5.2/git/2.5
module load compilers/gcc/5.2
module load apps/python/anaconda2-2.5.0

git clone git://github.com/JuliaLang/julia.git
cd julia
git checkout release-0.5
# The next line targets the NEHALEM CPU architecture. This is the lowest architecture available on
# Iceberg and so the resulting binary will be supported on all nodes. Performance will not be as good
# as it could be on modern nodes.
sed -i s/OPENBLAS_TARGET_ARCH:=/OPENBLAS_TARGET_ARCH:=NEHALEM/ ./Make.inc
make

Module File

The module file is as below

#%Module10.2
#####################################################################

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    global ver
    puts stderr " Adds Julia $ver to your environment variables."
}

# Julia version (not in the user's environment)


set ver 0.5.0-rc3

module-whatis "sets the necessary Julia $ver paths"

prepend-path PATH /usr/local/packages6/apps/gcc/5.2/julia/0.5.0-rc3

Knime

Knime

URL https://www.knime.org
Documentation http://tech.knime.org/documentation

KNIME Analytics Platform is an open solution for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures.

Interactive Usage

After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command.

The latest version of Knime including all of the extensions can be loaded with

module load apps/knime

Alternatively, you can load a specific version of Knime excluding all of the extensions using one of the following

module load apps/knime/3.1.2

With extensions

module load apps/knime/3.1.2ext

Knime can then be run with

$ knime

Batch usage There is a command line option allowing the user to run KNIME in batch mode:

knime -application org.knime.product.KNIME_BATCH_APPLICATION -nosplash -workflowDir="<path>"

• -application org.knime.product.KNIME_BATCH_APPLICATION launches the KNIME batch application.

• -nosplash does not show the initial splash window.

• -workflowDir=”<path>” provides the path of the directory containing the workflow.

Full list of options:

• -nosave do not save the workflow after execution has finished

• -reset reset workflow prior to execution

• -failonloaderror don’t execute if there are errors during workflow loading

• -updateLinks update meta node links to latest version

• -credential=name[;login[;password]] for each credential enter credential name and optional login/password


• -masterkey[=...] prompt for master password (used in e.g. database nodes), if provided with argument, use argument instead of prompting

• -preferences=... path to the file containing eclipse/knime preferences,

• -workflowFile=... ZIP file with a ready-to-execute workflow in the root of the ZIP

• -workflowDir=... directory with a ready-to-execute workflow

• -destFile=... ZIP file where the executed workflow should be written to; if omitted the workflow is only saved in place

• -destDir=... directory where the executed workflow is saved to; if omitted the workflow is only saved in place

• -workflow.variable=name,value,type define or overwrite workflow variable 'name' with value 'value' (possibly enclosed by quotes). The 'type' must be one of "String", "int" or "double".

The following return codes are defined:

• 0 upon successful execution

• 2 if parameters are wrong or missing

• 3 when an error occurs during loading a workflow

• 4 if an error during execution occurred

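For example, a KNIME workflow can be executed on a worker node by wrapping the batch command above in an SGE submission script. This is a sketch only; the workflow directory $HOME/knime-workspace/MyWorkflow and the memory request are placeholders that should be adjusted to your own workflow.

#!/bin/bash
# Request 8 gigabytes of virtual memory (mem) and 8 gigabytes of real memory (rmem)
#$ -l mem=8G -l rmem=8G

module load apps/knime/3.1.2

# Reset and execute the workflow without saving it back in place
knime -application org.knime.product.KNIME_BATCH_APPLICATION -nosplash \
      -reset -nosave -workflowDir="$HOME/knime-workspace/MyWorkflow"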
Installation Notes These notes are primarily for administrators of the system.

Version 3.1.2 without extensions

• Download with wget https://download.knime.org/analytics-platform/linux/knime_3.1.2.linux.gtk.x86_64.tar.gz

• Move to /usr/local/extras/knime_analytics/3.1.2

• Unzip tar -xvzf knime_3.1.2.linux.gtk.x86_64.tar.gz

The modulefile is at /usr/local/extras/modulefiles/apps/knime/3.1.2 and contains

#%Module1.0

proc ModulesHelp { } {
    puts stderr " Adds KNIME to your PATH environment variable and necessary libraries"

}

prepend-path PATH /usr/local/extras/knime_analytics/3.1.2

Version 3.1.2 with extensions

• Download with wget https://download.knime.org/analytics-platform/linux/knime-full_3.1.2.linux.gtk.x86_64.tar.gz

• Move to /usr/local/extras/knime_analytics/3.1.2ext

• Unzip tar -xvzf knime-full_3.1.2.linux.gtk.x86_64.tar.gz

The modulefile is at /usr/local/extras/modulefiles/apps/knime/3.1.2ext and contains

#%Module1.0

proc ModulesHelp { } {
    puts stderr " Adds KNIME to your PATH environment variable and necessary libraries"


}

prepend-path PATH /usr/local/extras/knime_analytics/3.1.2ext

Lua

Lua

URL https://www.lua.org/
Documentation https://www.lua.org/docs.html

Lua is a powerful, efficient, lightweight, embeddable scripting language. It supports procedural programming, object-oriented programming, functional programming, data-driven programming, and data description.

Interactive Usage

After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command.

Lua can be loaded with

module load apps/lua/5.3.3

Lua can then be run with

$ lua

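Lua scripts can also be run non-interactively through the batch queue. A minimal sketch, assuming your program is in a file called my_script.lua (a placeholder name):

#!/bin/bash
# Request 4 gigabytes of virtual memory (mem) and 4 gigabytes of real memory (rmem)
#$ -l mem=4G -l rmem=4G

module load apps/lua/5.3.3

lua my_script.lua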
Installation Notes These notes are primarily for administrators of the system.

Version 5.3.3

• curl -R -O http://www.lua.org/ftp/lua-5.3.3.tar.gz

• tar zxf lua-5.3.3.tar.gz

• cd lua-5.3.3

• make linux test

The modulefile is at /usr/local/extras/modulefiles/apps/lua/5.3.3 and contains

#%Module1.0

proc ModulesHelp { } {
    puts stderr " Adds Lua to your PATH environment variable and necessary libraries"

}

prepend-path PATH /usr/local/extras/Lua/lua-5.3.3/src

Maple


Maple

Latest Version 2016
URL http://www.maplesoft.com/products/maple/

Scientific Computing and Visualisation

Interactive Usage

After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command.

The latest version of Maple (currently 2016) is made available with the command

module load apps/binapps/maple

Alternatively, you can load a specific version with

module load apps/binapps/maple/2016
module load apps/binapps/maple/2015

You can then run the graphical version of Maple by entering xmaple or the command line version by entering maple.

Batch usage

It is not possible to run Maple worksheets in batch mode. Instead, you must convert your worksheet to a pure text file that contains a set of maple input commands. You can do this in Maple by opening your worksheet and clicking on File->Export As->Maple Input. The result will have the file extension .mpl

An example Sun Grid Engine submission script that makes use of a .mpl file called, for example, mycode.mpl is

#!/bin/bash
# Request 4 gigabytes of virtual memory (mem)
# and 4 gigabytes of real memory (rmem)
#$ -l mem=4G -l rmem=4G

module load apps/binapps/maple/2015

maple < mycode.mpl

For general information on how to submit batch jobs refer to Running Batch Jobs on iceberg

Tutorials

• High Performance Computing with Maple A tutorial from the Sheffield Research Software Engineering group on how to use Maple in a High Performance Computing environment

Installation notes These are primarily for administrators of the system.

Maple 2016

• Run the installer Maple2016.1LinuxX64Installer.run and follow instructions.

• Choose Install folder /usr/local/packages6/apps/binapps/maple2016

• Do not configure MATLAB

• Choose a network license. Details on CiCS internal wiki.

• Uncheck ‘Enable periodic checking for Maple 2016 updates’

• Check ‘Check for updates now’


The module file is at /usr/local/modulefiles/apps/binapps/maple/2016

#%Module10.2
#####################################################################

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    global ver
    puts stderr " Makes Maple $ver available to the system."
}

# Maple version (not in the user's environment)
set ver 2016

module-whatis "sets the necessary Maple $ver paths"

prepend-path PATH /usr/local/packages6/apps/binapps/maple2016/bin

Maple 2015

The module file is at /usr/local/modulefiles/apps/binapps/maple/2015

#%Module10.2
#####################################################################

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    global ver
    puts stderr " Makes Maple $ver available to the system."
}

# Maple version (not in the user's environment)
set ver 2015

module-whatis "sets the necessary Maple $ver paths"

prepend-path PATH /usr/local/packages6/maple/bin/

Mathematica

Wolfram Mathematica

Dependencies None
URL http://www.wolfram.com/mathematica/
Latest version 10.3.1

Mathematica is a technical computing environment and programming language with strong symbolic and numerical abilities.


Single Core Interactive Usage

After connecting to iceberg (see Connect to iceberg), start an interactive session with qsh.

The latest version of Mathematica can be loaded with

module load apps/binapps/mathematica

Alternatively, you can load a specific version of Mathematica using

module load apps/binapps/mathematica/10.3.1
module load apps/binapps/mathematica/10.2

Mathematica can then be started with the mathematica command

mathematica

Multicore Interactive Usage

Mathematica has extensive parallel functionality. To use it, you should request a parallel interactive session. For example, to request 4 cores

qsh -pe openmp 4

Load and launch Mathematica

module load apps/binapps/mathematica
mathematica

In Mathematica, let’s time how long it takes to calculate the first 20 million primes on 1 CPU core

AbsoluteTiming[primelist = Table[Prime[k], {k, 1, 20000000}];]

When I tried this, I got 78 seconds. Your results may vary greatly. Now, let's launch 4 ParallelKernels and redo the calculation in parallel

LaunchKernels[4]
AbsoluteTiming[primelist =
  ParallelTable[Prime[k], {k, 1, 20000000},
    Method -> "CoarsestGrained"];]

When I tried this, I got 29 seconds – around 2.7 times faster. This illustrates a couple of points:

• You should always request the same number of kernels as you requested in your qsh command (in this case, 4). If you request more, you will damage performance for yourself and other users of the system.

• N kernels doesn’t always translate to N times faster.

Batch Submission

Unfortunately, it is not possible to run Mathematica notebook .nb files directly in batch. Instead, you need to create a simple text file that contains all of the Mathematica commands you want to execute. Typically, such files are given the extension .m. Let's run the following simple Mathematica script as a batch job.

(*Find out what version of Mathematica this machine is running*)
Print["Mathematica version is " <> $Version]

(*Find out the name of the machine we are running on*)
Print["The machine name is " <> $MachineName]

(*Do a calculation*)
Print["The result of the integral is "]
Print[Integrate[Sin[x]^2, x]]


Copy and paste the above into a text file called very_simple_mathematica.m

An example batch submission script for this file is

#!/bin/bash
# Request 4 gigabytes of virtual memory (mem)
# and 4 gigabytes of real memory (rmem)
#$ -l mem=4G -l rmem=4G

module load apps/binapps/mathematica/10.3.1

math -script very_simple_mathematica.m

Copy and paste the above into a file called run_job.sh and submit with

qsub run_job.sh

Once the job has successfully completed, the output will be in a file named like run_job.sh.o396699. The number at the end refers to the job-ID given to this job by the system and will be different for you. Let's take a look at the contents of this file

more run_job.sh.o396699

Mathematica version is 10.2.0 for Linux x86 (64-bit) (July 28, 2015)
The machine name is node131
The result of the integral is
x/2 - Sin[2*x]/4

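The parallel functionality described earlier can be used in batch too: request an OpenMP parallel environment and make sure your script launches no more kernels than the number of cores requested. The following is a sketch only; parallel_script.m is a placeholder for a script that calls LaunchKernels and ParallelTable as shown above.

#!/bin/bash
# Request 4 cores in the OpenMP parallel environment
#$ -pe openmp 4
# Request 4 gigabytes of virtual memory (mem) and 4 gigabytes of real memory (rmem)
#$ -l mem=4G -l rmem=4G

module load apps/binapps/mathematica/10.3.1

# parallel_script.m should call LaunchKernels[4] before any Parallel* functions
math -script parallel_script.m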
Installation notes These are primarily for administrators of the system

For Version 10.3.1

mkdir -p /usr/local/packages6/apps/binapps/mathematica/10.3.1
chmod +x ./Mathematica_10.3.1_LINUX.sh
./Mathematica_10.3.1_LINUX.sh

The installer is interactive. Here's the session output

--------------------------------------------------------------------------------
Wolfram Mathematica 10.3 Installer
--------------------------------------------------------------------------------

Copyright (c) 1988-2015 Wolfram Research, Inc. All rights reserved.

WARNING: Wolfram Mathematica is protected by copyright law and international
treaties. Unauthorized reproduction or distribution may result in severe
civil and criminal penalties and will be prosecuted to the maximum extent
possible under law.

Enter the installation directory, or press ENTER to select
/usr/local/Wolfram/Mathematica/10.3:
> /usr/local/packages6/apps/binapps/mathematica/10.3.1

Now installing...

[*****************************************************************************]

Type the directory path in which the Wolfram Mathematica script(s) will be
created, or press ENTER to select /usr/local/bin:
> /usr/local/packages6/apps/binapps/mathematica/10.3.1/scripts


Create directory (y/n)?
> y

WARNING: No Avahi Daemon was detected so some Kernel Discovery features will
not be available. You can install Avahi Daemon using your distribution's
package management system.

For Red Hat based distributions, try running (as root):

yum install avahi

Installation complete.

Install the University network mathpass file at /usr/local/packages6/apps/binapps/mathematica/10.3.1/Configuration/Licensing

For Version 10.2

mkdir -p /usr/local/packages6/apps/binapps/mathematica/10.2
chmod +x ./Mathematica_10.2.0_LINUX.sh
./Mathematica_10.2.0_LINUX.sh

The installer is interactive. Here’s the session output

--------------------------------------------------------------------------------
Wolfram Mathematica 10.2 Installer
--------------------------------------------------------------------------------

Copyright (c) 1988-2015 Wolfram Research, Inc. All rights reserved.

WARNING: Wolfram Mathematica is protected by copyright law and international treaties. Unauthorized reproduction or distribution may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under law.

Enter the installation directory, or press ENTER to select /usr/local/Wolfram/Mathematica/10.2:>

Error: Cannot create directory /usr/local/Wolfram/Mathematica/10.2.

You may need to be logged in as root to continue with this installation.

Enter the installation directory, or press ENTER to select /usr/local/Wolfram/Mathematica/10.2:> /usr/local/packages6/apps/binapps/mathematica/10.2

Now installing...

[*********************************************************************************************************************************************************************************************************]

Type the directory path in which the Wolfram Mathematica script(s) will be created, or press ENTER to select /usr/local/bin:> /usr/local/packages6/apps/binapps/mathematica/10.2/scripts

Create directory (y/n)?> y

WARNING: No Avahi Daemon was detected so some Kernel Discovery features will not be available. You can install Avahi Daemon using your distribution's package management system.

For Red Hat based distributions, try running (as root):

yum install avahi


Installation complete.

Remove the playerpass file

rm /usr/local/packages6/apps/binapps/mathematica/10.2/Configuration/Licensing/playerpass

Install the University network mathpass file at /usr/local/packages6/apps/binapps/mathematica/10.2/Configuration/Licensing

Modulefiles

• The 10.3.1 module file.

• The 10.2 module file.

MATLAB

MATLAB

Versions 2013a, 2013b, 2014a, 2015a, 2016a
Support Level FULL
Dependencies None
URL http://uk.mathworks.com/products/matlab
Local URL http://www.shef.ac.uk/wrgrid/software/matlab
Documentation http://uk.mathworks.com/help/matlab

Scientific Computing and Visualisation

Interactive Usage

After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh command.

The latest version of MATLAB (currently 2016a) is made available with the command

module load apps/matlab

Alternatively, you can load a specific version with one of of the following commands

module load apps/matlab/2013a
module load apps/matlab/2013b
module load apps/matlab/2014a
module load apps/matlab/2015a
module load apps/matlab/2016a

You can then run MATLAB by entering matlab

Serial (one CPU) Batch usage Here, we assume that you wish to run the program hello.m on the system.

First, you need to write a batch submission file. We assume you’ll call this my_job.sge.

#!/bin/bash
#$ -l rmem=4G            # Request 4 Gigabytes of real memory
#$ -l mem=16G            # Request 16 Gigabytes of virtual memory
#$ -cwd                  # Run job from current directory
module load apps/matlab  # Make latest version of MATLAB available


matlab -nodesktop -r 'hello'

Ensuring that hello.m and my_job.sge are both in your current working directory, submit your job to the batch system.

qsub my_job.sge

Some notes about this example:

• We are running the script hello.m but we drop the .m in the call to MATLAB. That is, we do -r 'hello' rather than -r hello.m.

• All of the module commands introduced in the Interactive usage section will also work in batch mode. This allows you to select a specific version of MATLAB if you wish.

Easy Way of Running MATLAB Jobs on the Batch Queue

Firstly prepare a MATLAB script that contains all the commands for running a MATLAB task. Let us assume that this script is called mymatlabwork.m. Next select the version of MATLAB you wish to use by using the module load command, for example:

module load apps/matlab/2016a

Now submit a job that runs this MATLAB script as a batch job.

runmatlab mymatlabwork.m

That is all there is to it!

The runmatlab command can take a number of parameters to refine the control of your MATLAB batch job, such as the maximum time and memory needs. To get a full listing of these parameters simply type runmatlab on the iceberg command line.

MATLAB Compiler and running free-standing compiled MATLAB programs

The MATLAB compiler mcc is installed on iceberg and can be used to generate free-standing executables. Such executables can then be run on other computers that do not have MATLAB installed. We strongly recommend you use R2016a or later versions to take advantage of this feature.

To compile a MATLAB function or script for example called myscript.m the following steps are required.

module load apps/matlab/2016a  # Load the matlab 2016a module
mcc -m myscript.m              # Compile your program to generate the executable myscript and
                               # also generate a shell script named run_myscript.sh
./run_myscript.sh $MCRROOT     # Finally run your program

If myscript.m is a MATLAB function that requires inputs, these can be supplied on the command line. For example, if the first line of myscript.m reads:

function out = myscript ( a , b , c )

then to run it with 1.0, 2.0, 3.0 as its parameters you will need to type:

./run_myscript.sh $MCRROOT 1.0 2.0 3.0

After a successful compilation and run you can transfer your executable and the run script to another computer. That computer does not have to have MATLAB installed or licensed on it, but it will have to have the MATLAB runtime system installed. This can be done by either downloading the MATLAB runtime environment from the Mathworks web site or by copying the installer file from iceberg itself, which resides in:


/usr/local/packages6/matlab/R2016a/toolbox/compiler/deploy/glnxa64/MCRInstaller.zip

Unzip this file in a temporary area and run the setup script that unzipping yields to install the MATLAB runtime environment. Finally, set the environment variable $MCRROOT to the directory containing the runtime environment.

Parallel MATLAB on iceberg Currently we recommend the 2015a version of MATLAB for parallel work.

The default cluster configuration named local provides a parallel working environment by using the CPUs of the worker node that is running the current MATLAB session. Each iceberg worker node can run multiple users' jobs simultaneously. Therefore, depending on who else is using that node at the time, parallel MATLAB jobs can create contention between jobs and slow them considerably. It is therefore advisable to start parallel MATLAB jobs that will use the local profile from a parallel SGE job. For example, to use the local profile with 5 workers, do the following:

Start a parallel OpenMP job with 6 workers:

qsh -pe openmp 6

Run MATLAB in that session and select 5 workers:

matlab
parpool('local', 5)

The above example will use 5 MATLAB workers on a single iceberg node to run a parallel task.

To take advantage of multiple iceberg nodes, you will need to make use of a parallel cluster profile named sge. This can be done by issuing a locally provided MATLAB command named iceberg that imports the parallel cluster profile named sge, which can take advantage of the SGE scheduler to run larger parallel jobs.

When using the sge profile, MATLAB will be able to submit multiple MATLAB jobs to the SGE scheduler from within MATLAB itself. However, each job will have the default resource requirements unless the following trick is deployed. For example, during your MATLAB session type:

global sge_params
sge_params='-l mem=16G -l h_rt=36:00:00'

to make sure that all the MATLAB batch jobs will use up to 16 GBytes of memory and will not be killed unless they exceed 36 hours of run time.

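As a sketch, the interactive steps above can also be combined into a single batch submission that uses the local profile on one node. The script name my_parallel_task.m and the resource values are illustrative only.

#!/bin/bash
# Request 6 slots in the OpenMP parallel environment
#$ -pe openmp 6
# Request memory (mem is virtual, rmem is real)
#$ -l mem=8G -l rmem=8G

module load apps/matlab/2015a

# my_parallel_task.m is assumed to open its own pool, e.g. parpool('local', 5)
matlab -nodesktop -r 'my_parallel_task'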
Training

• CiCS run an Introduction to Matlab course

• In November 2015, CiCS hosted a Parallel Computing in MATLAB Masterclass. The materials are available at http://rcg.group.shef.ac.uk/courses/mathworks-parallelmatlab/

Installation notes These notes are primarily for system administrators.

Requires the floating license server licserv4.shef.ac.uk to serve the licenses. An install script named installer_input.txt and associated files are downloadable from the Mathworks site along with all the required toolbox-specific installation files.

The following steps are performed to install MATLAB on iceberg.

1. If necessary, update the floating license keys on licserv4.shef.ac.uk to ensure that the licenses are served for the versions to install.

2. Log onto the Mathworks site to download the MATLAB installer package for 64-bit Linux (for R2016a this was called matlab_R2016a_glnxa64.zip)


3. Unzip the installer package in a temporary directory: unzip matlab_R2016a_glnxa64.zip (This will create a few items including files named install and installer_input.txt)

4. Run the installer: ./install

5. Select install choice of Log in to Mathworks Account

6. Select Download only.

7. Select the offered default Download path (this will be in your home area $HOME/Downloads/MathWorks/...) Note: This is the default download location that is later used by the silent installer. Another option is to move all downloaded files to the same directory where the install script resides.

8. Finally run the installer using our customized installer_input.txt script as input (./install -inputFile installer_input.txt).

Installation should finish with exit status 0 if all has worked.

Note: A template installer_input file for 2016a is available in the /usr/local/packages6/matlab directory, named 2016a_installer_input.txt. This will need minor edits to install the next versions in the same way.

MrBayes

MrBayes

Version 3.2.6
URL https://sourceforge.net/projects/mrbayes/

MrBayes is a program for the Bayesian estimation of phylogeny.

Interactive Usage

After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx command.

The latest version of MrBayes (currently 3.2.6) is made available with the command

module load apps/gcc/4.4.7/mrbayes

Alternatively, you can load a specific version with

module load apps/gcc/4.4.7/mrbayes/3.2.6

This command makes the MrBayes mb binary available to your session.

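MrBayes analyses are often long-running, so you will usually want to run mb as a batch job rather than interactively. The following submission script is a minimal sketch; my_analysis.nex is a placeholder for your own Nexus file containing the data and MrBayes block.

#!/bin/bash
# Request 8 gigabytes of virtual memory (mem) and 8 gigabytes of real memory (rmem)
#$ -l mem=8G -l rmem=8G
# Request 24 hours of run time
#$ -l h_rt=24:00:00

module load apps/gcc/4.4.7/mrbayes/3.2.6

mb my_analysis.nex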
Installation notes MrBayes was installed with gcc 4.4.7

tar -xvzf ./mrbayes-3.2.6.tar.gz
ls
cd mrbayes-3.2.6
autoconf
./configure --with-beagle=/usr/local/packages6/libs/gcc/4.4.7/beagle/2.1.2 --prefix=/usr/local/packages6/apps/gcc/4.4.7/mrbayes/3.2.6/
make
make install


Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/mrbayes/3.2.6

• The module file is on github.

Octave

Octave

Versions 4.0.0
URL https://www.gnu.org/software/octave/

GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. Octave is normally used through its interactive command line interface, but it can also be used to write non-interactive programs. The Octave language is quite similar to MATLAB so that most programs are easily portable.

Interactive Usage

After connecting to iceberg (see Connect to iceberg), start an interactive session with either the qsh or qrsh commands.

The latest version of Octave (currently 4.0.0) is made available with the command

module load apps/gcc/5.2/octave

Alternatively, you can load a specific version with

module load apps/gcc/5.2/octave/4.0

This adds Octave to your PATH and also loads all of the supporting libraries and compilers required by the system.

Start Octave by executing the command

octave

If you are using a qsh session, the graphical user interface version will begin. If you are using a qrsh session, you willonly be able to use the text-only terminal version.

Batch Usage Here is an example batch submission script that will run an Octave program called foo.m

#!/bin/bash
# Request 5 gigabytes of virtual memory (mem)
# and 5 gigabytes of real memory (rmem)
#$ -l mem=5G -l rmem=5G
# Request 64 hours of run time
#$ -l h_rt=64:00:00

module load apps/gcc/5.2/octave/4.0

octave foo.m


Using Packages (Toolboxes)

Octave toolboxes are referred to as packages. To see which ones are installed, use the command ver from within Octave.

Unlike MATLAB, Octave does not load all of its packages at startup. It is necessary to load the package before its commands are available to your session. For example, as with MATLAB, the pdist command is part of the statistics package. Unlike MATLAB, pdist is not immediately available to you

octave:1> pdist([1 2 3; 2 3 4; 1 1 1])
warning: the 'pdist' function belongs to the statistics package from Octave
Forge which you have installed but not loaded. To load the package, run
`pkg load statistics' from the Octave prompt.

Please read `http://www.octave.org/missing.html' to learn how you can
contribute missing functionality.
warning: called from
  __unimplemented__ at line 524 column 5
error: 'pdist' undefined near line 1 column 1

As the error message suggests, you need to load the statistics package

octave:1> pkg load statistics
octave:2> pdist([1 2 3; 2 3 4; 1 1 1])
ans =

1.7321 2.2361 3.7417

Installation notes These are primarily for administrators of the system.

Octave was installed using gcc 5.2 and the following libraries:

• Java 1.8u60

• FLTK 1.3.3

• fftw 3.3.4

• Octave was installed using a SGE batch job. The install script is on github

• The make log is on the system at /usr/local/packages6/apps/gcc/5.2/octave/4.0/make_octave4.0.0.log

• The configure log is on the system at /usr/local/packages6/apps/gcc/5.2/octave/4.0/configure_octave4.0.0.log

For full functionality, Octave requires a large number of additional libraries to be installed. We have currently not installed all of these but will do so should they be required.

For information, here is the relevant part of the Configure log that describes how Octave was configured

Source directory: .Installation prefix: /usr/local/packages6/apps/gcc/5.2/octave/4.0C compiler: gcc -pthread -fopenmp -Wall -W -Wshadow -Wforma

t -Wpointer-arith -Wmissing-prototypes -Wstrict-prototypes -Wwrite-strings -Wcast-align -Wcast-qual -I/usr/local/packages6/compilers/gcc/5.2.0/include

C++ compiler: g++ -pthread -fopenmp -Wall -W -Wshadow -Wold-style-cast -Wformat -Wpointer-arith -Wwrite-strings -Wcast-align -Wcast-qual -g -O2

Fortran compiler: gfortran -OFortran libraries: -L/usr/local/packages6/compilers/gcc/5.2.0/lib -

L/usr/local/packages6/compilers/gcc/5.2.0/lib64 -L/usr/local/packages6/compilers/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0 -L/usr/local/packages6/compilers/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/local/packages6/libs/gcc/5.2/fftw/3.3.4/lib -L/usr/local/packages6/libs/gcc/5.2/fltk/1.3.3/lib -L/usr/local/packages6/compilers


/gcc/5.2.0/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/../../.. -lgfortran -lm -lquadmath

Lex libraries:LIBS: -lutil -lm

AMD CPPFLAGS:AMD LDFLAGS:AMD libraries:ARPACK CPPFLAGS:ARPACK LDFLAGS:ARPACK libraries:BLAS libraries: -lblasCAMD CPPFLAGS:CAMD LDFLAGS:CAMD libraries:CARBON libraries:CCOLAMD CPPFLAGS:CCOLAMD LDFLAGS:CCOLAMD libraries:CHOLMOD CPPFLAGS:CHOLMOD LDFLAGS:CHOLMOD libraries:COLAMD CPPFLAGS:COLAMD LDFLAGS:COLAMD libraries:CURL CPPFLAGS:CURL LDFLAGS:CURL libraries: -lcurlCXSPARSE CPPFLAGS:CXSPARSE LDFLAGS:CXSPARSE libraries:DL libraries:FFTW3 CPPFLAGS:FFTW3 LDFLAGS:FFTW3 libraries: -lfftw3_threads -lfftw3FFTW3F CPPFLAGS:FFTW3F LDFLAGS:FFTW3F libraries: -lfftw3f_threads -lfftw3fFLTK CPPFLAGS: -I/usr/local/packages6/libs/gcc/5.2/fltk/1.3.3/in

clude -I/usr/include/freetype2 -I/usr/local/packages6/compilers/gcc/5.2.0/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_THREAD_SAFE -D_REENTRANT

FLTK LDFLAGS: -L/usr/local/packages6/libs/gcc/5.2/fltk/1.3.3/lib -Wl,-rpath,/usr/local/packages6/libs/gcc/5.2/fltk/1.3.3/lib -L/usr/local/packages6/compilers/gcc/5.2.0/lib -L/usr/local/packages6/compilers/gcc/5.2.0/lib64 -lfltk_gl -lGLU -lGL -lfltk -lXcursor -lXfixes -lXext -lXft -lfontconfig -lXinerama -lpthread -ldl -lm -lX11

FLTK libraries:fontconfig CPPFLAGS:fontconfig libraries: -lfontconfigFreeType2 CPPFLAGS: -I/usr/include/freetype2FreeType2 libraries: -lfreetypeGLPK CPPFLAGS:GLPK LDFLAGS:GLPK libraries:HDF5 CPPFLAGS:HDF5 LDFLAGS:HDF5 libraries: -lhdf5Java home: /usr/local/packages6/apps/binapps/java/jre1.8.0_6


0/Java JVM path: /usr/local/packages6/apps/binapps/java/jre1.8.0_6

0/lib/amd64/serverJava CPPFLAGS: -I/usr/local/packages6/apps/binapps/java/jre1.8.0

_60//include -I/usr/local/packages6/apps/binapps/java/jre1.8.0_60//include/linuxJava libraries:LAPACK libraries: -llapackLLVM CPPFLAGS:LLVM LDFLAGS:LLVM libraries:Magick++ CPPFLAGS:Magick++ LDFLAGS:Magick++ libraries:OPENGL libraries: -lfontconfig -lGL -lGLUOSMesa CPPFLAGS:OSMesa LDFLAGS:OSMesa libraries:PCRE CPPFLAGS:PCRE libraries: -lpcrePortAudio CPPFLAGS:PortAudio LDFLAGS:PortAudio libraries:PTHREAD flags: -pthreadPTHREAD libraries:QHULL CPPFLAGS:QHULL LDFLAGS:QHULL libraries:QRUPDATE CPPFLAGS:QRUPDATE LDFLAGS:QRUPDATE libraries:Qt CPPFLAGS: -I/usr/include/QtCore -I/usr/include/QtGui -I/usr

/include/QtNetwork -I/usr/include/QtOpenGLQt LDFLAGS:Qt libraries: -lQtNetwork -lQtOpenGL -lQtGui -lQtCoreREADLINE libraries: -lreadlineSndfile CPPFLAGS:Sndfile LDFLAGS:Sndfile libraries:TERM libraries: -lncursesUMFPACK CPPFLAGS:UMFPACK LDFLAGS:UMFPACK libraries:X11 include flags:X11 libraries: -lX11Z CPPFLAGS:Z LDFLAGS:Z libraries: -lz

Default pager: lessgnuplot: gnuplot

Build Octave GUI: yesJIT compiler for loops: noBuild Java interface: noDo internal array bounds checking: noBuild static libraries: noBuild shared libraries: yesDynamic Linking: yes (dlopen)


Include support for GNU readline: yes64-bit array dims and indexing: noOpenMP SMP multithreading: yesBuild cross tools: no

configure: WARNING:

I didn't find gperf, but it's only a problem if you need toreconstruct oct-gperf.h

configure: WARNING:

I didn't find icotool, but it's only a problem if you need toreconstruct octave-logo.ico, which is the case if you're building fromVCS sources.

configure: WARNING: Qhull library not found. This will result in loss of functionality of some geometry functions.configure: WARNING: GLPK library not found. The glpk function for solving linear programs will be disabled.configure: WARNING: gl2ps library not found. OpenGL printing is disabled.configure: WARNING: OSMesa library not found. Offscreen rendering with OpenGL will be disabled.configure: WARNING: qrupdate not found. The QR & Cholesky updating functions will be slow.configure: WARNING: AMD library not found. This will result in some lack of functionality for sparse matrices.configure: WARNING: CAMD library not found. This will result in some lack of functionality for sparse matrices.configure: WARNING: COLAMD library not found. This will result in some lack offunctionality for sparse matrices.configure: WARNING: CCOLAMD library not found. This will result in some lack offunctionality for sparse matrices.

configure: WARNING: CHOLMOD library not found. This will result in some lack offunctionality for sparse matrices.

configure: WARNING: CXSparse library not found. This will result in some lack of functionality for sparse matrices.configure: WARNING: UMFPACK not found. This will result in some lack of functionality for sparse matrices.configure: WARNING: ARPACK not found. The eigs function will be disabled.configure: WARNING: Include file <jni.h> not found. Octave will not be able tocall Java methods.configure: WARNING: Qscintilla library not found -- disabling built-in GUI editorconfigure:

• Some commonly-used packages were additionally installed from Octave Forge using the following commands from within Octave

pkg install -global -forge io
pkg install -global -forge statistics
pkg install -global -forge mapping
pkg install -global -forge image
pkg install -global -forge struct
pkg install -global -forge optim

Module File The module file is octave_4.0


OpenFOAM

OpenFOAM

Version 3.0.0
Dependencies gcc/5.2 mpi/gcc/openmpi/1.10.0 scotch/6.0.4
URL http://www.openfoam.org
Documentation http://www.openfoam.org/docs/

OpenFOAM is free, open source software for computational fluid dynamics (CFD).

Usage

The latest version of OpenFOAM can be activated using the module file:

module load apps/gcc/5.2/openfoam

Alternatively, you can load a specific version of OpenFOAM:

module load apps/gcc/5.2/openfoam/2.4.0
module load apps/gcc/5.2/openfoam/3.0.0

This has the same effect as sourcing the openfoam bashrc file, so that should not be needed.

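Because loading the module sets up the usual OpenFOAM environment, a case can be run in batch with a standard submission script. The following serial sketch is illustrative only; my_case is a placeholder case directory, and blockMesh/icoFoam are just one possible meshing utility and solver choice.

#!/bin/bash
# Request 8 gigabytes of virtual memory (mem) and 8 gigabytes of real memory (rmem)
#$ -l mem=8G -l rmem=8G

module load apps/gcc/5.2/openfoam/3.0.0

# Move into the case directory, build the mesh and run the solver
cd my_case
blockMesh
icoFoam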
Installation notes OpenFoam was compiled using the install_openfoam.sh script, the module file is 3.0.0.

OpenSim

OpenSim

Support Level bronze
Dependencies None
URL https://simtk.org/home/opensim
Version 3.3

OpenSim is a freely available, user extensible software system that lets users develop models of musculoskeletal structures and create dynamic simulations of movement.

Usage The latest version of OpenSim can be loaded with

module load apps/gcc/4.8.2/opensim

Installation Notes These are primarily for administrators of the system.

Built using:

cmake /home/cs1sjm/Downloads/OpenSim33-source/ \
    -DCMAKE_INSTALL_PREFIX=/usr/local/packages6/apps/gcc/4.8.2/opensim/3.3/ \
    -DSIMBODY_HOME=/usr/local/packages6/libs/gcc/4.8.2/simbody/3.5.3/ \
    -DOPENSIM_STANDARD_11=ON

make -j 8


make install

orca

orca

Versions 3.0.3
URL https://orcaforum.cec.mpg.de/

ORCA is a flexible, efficient and easy-to-use general purpose tool for quantum chemistry with specific emphasis on spectroscopic properties of open-shell molecules. It features a wide variety of standard quantum chemical methods ranging from semiempirical methods to DFT to single- and multireference correlated ab initio methods. It can also treat environmental and relativistic effects.

Making orca available The following module command makes the latest version of orca available to your session

module load apps/binapps/orca

Alternatively, you can make a specific version available

module load apps/binapps/orca/3.0.3

Example single core job Create a file called orca_serial.inp that contains the following orca commands

#
# My first ORCA calculation :-)
#
# Taken from the Orca manual
# https://orcaforum.cec.mpg.de/OrcaManual.pdf
! HF SVP

* xyz 0 1
C 0 0 0
O 0 0 1.13
*

Create a Sun Grid Engine submission file called submit_serial.sh that looks like this

#!/bin/bash
# Request 4 Gig of virtual memory per process
#$ -l mem=4G
# Request 4 Gig of real memory per process
#$ -l rmem=4G

module load apps/binapps/orca/3.0.3
$ORCAHOME/orca orca_serial.inp

Submit the job to the queue with the command

qsub submit_serial.sh

Example parallel job An example Sun Grid Engine submission script is


#!/bin/bash
# Request 4 Processes
# Ensure that this matches the number requested in your Orca input file
#$ -pe openmpi-ib 4
# Request 4 Gig of virtual memory per process
#$ -l mem=4G
# Request 4 Gig of real memory per process
#$ -l rmem=4G

module load mpi/gcc/openmpi/1.8.8
module load apps/binapps/orca/3.0.3

ORCAPATH=/usr/local/packages6/apps/binapps/orca/3.0.3/
$ORCAPATH/orca example2_parallel.inp

Register as a user You are encouraged to register as a user of Orca at https://orcaforum.cec.mpg.de/ in order to takeadvantage of updates, announcements and also of the users forum.

Documentation A comprehensive .pdf manual is available online.

Installation notes These are primarily for system administrators. Orca was a binary install

tar -xjf ./orca_3_0_3_linux_x86-64.tbz
cd orca_3_0_3_linux_x86-64
mkdir -p /usr/local/packages6/apps/binapps/orca/3.0.3
mv * /usr/local/packages6/apps/binapps/orca/3.0.3/

Modulefile The module file is on the system at /usr/local/modulefiles/apps/binapps/orca/3.0.3

The contents of the module file are

#%Module1.0
#####################################################################
##
## Orca module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    global bedtools-version
    puts stderr " Adds `orca-$orcaversion' to your PATH environment variable"
}

set orcaversion 3.0.3
prepend-path ORCAHOME /usr/local/packages6/apps/binapps/orca/3.0.3/
prepend-path PATH /usr/local/packages6/apps/binapps/orca/3.0.3/

Paraview


Paraview

Version 4.3
Support Level extras
Dependencies openmpi (1.8)
URL http://paraview.org/
Documentation http://www.paraview.org/documentation/

Paraview is a parallel visualisation tool.

Using Paraview on iceberg

This guide describes how to use Paraview from iceberg. Paraview is a parallel visualisation tool designed for use on clusters like iceberg. Because iceberg is designed primarily as a headless computational cluster, paraview has not been installed on iceberg in such a way that you can load the GUI remotely on iceberg [1]. Paraview therefore runs on iceberg in client + server mode, the server running on iceberg, and the client on your local machine.

Configuring the Client

To use Paraview on iceberg, first download and install Paraview 4.3 [2]. Once you have installed Paraview locally (the client) you need to configure the connection to iceberg. In Paraview go to File > Connect, then click Add Server, name the connection iceberg, and select 'Client / Server (reverse connection)' for the Server Type; the port should retain the default value of 11111. Then click configure, on the next screen leave the connection as manual and click save. Once you are back at the original connect menu, click connect to start listening for the connection from iceberg.

Starting the Server

Once you have configured the local paraview client, login to iceberg from the client machine via ssh [3] and run qsub-paraview. This will submit a job to the scheduler queue for 16 processes with 4GB of RAM each. This is designed to be used for large visualisation tasks; smaller jobs can be requested by specifying standard qsub commands to qsub-paraview, e.g. qsub-paraview -pe openmpi-ib 1 will only request one process.

Assuming you still have the client listening for connections, once the paraview job starts in the queue it should connect to your client and you should be able to start accessing data stored on iceberg and rendering images.

A Note on Performance

When you run Paraview locally it will use the graphics hardware of your local machine for rendering. This is using hardware in your computer to create and display these images. On iceberg there is no such hardware, it is simulated via software. Therefore for small datasets you will probably find you are paying a performance penalty for using Paraview on iceberg.

The advantage, however, is that the renderer is on the same cluster as your data, so no data transfer is needed, and you have access to very large amounts of memory for visualising very large datasets.

Manually Starting the Server

The qsub-paraview command is a wrapper that automatically detects the client IP address from the SSH connection and submits the job. It is possible to customise this behaviour by copying and modifying this script. This, for instance, would allow you to start the paraview server via MyApps or from a different computer to the one with the client installed. The script used by qsub-paraview also serves as a good example script and can be copied into your home directory by running cp /usr/local/bin/pvserver_submit.sh ~/. This script can then be submitted as normal by qsub. The client IP address can be added manually by replacing echo $SSH_CLIENT | awk '{ print $1 }' with the IP address. More information on Paraview client/server can be found here.

[1] It is not possible to install the latest version of the paraview GUI on iceberg due to the Qt version shipped with Scientific Linux 5.
[2] The client and server versions have to match.
[3] Connecting to Paraview via the automatic method described here is not supported on the MyApps portal.


Installation

Custom build scripts are available in /usr/local/extras/paraview/build_scripts which can be used to recompile.

Perl

Perl

Latest Version 5.10.1
URL https://www.perl.org/

Perl 5 is a highly capable, feature-rich programming language with over 27 years of development. Perl 5 runs on over 100 platforms from portables to mainframes and is suitable for both rapid prototyping and large scale development projects.

Usage Perl 5.10.1 is installed by default on the system so no module command is required to use it

perl --version

This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi

Copyright 1987-2009, Larry Wall

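Since Perl is available without loading any module, running a Perl program in batch only needs a standard submission script. A minimal sketch, where my_script.pl is a placeholder for your own program:

#!/bin/bash
# Request 4 gigabytes of virtual memory (mem) and 4 gigabytes of real memory (rmem)
#$ -l mem=4G -l rmem=4G

perl my_script.pl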
Phyluce

Phyluce

Version 1.5.0
Dependencies apps/python/conda
URL https://github.com/faircloth-lab/phyluce
Documentation http://faircloth-lab.github.io/phyluce/

Phyluce (phy-loo-chee) is a software package that was initially developed for analyzing data collected from ultraconserved elements in organismal genomes.

Usage Phyluce can be activated using the module file:

module load apps/binapps/phyluce/1.5.0

Phyluce makes use of the apps/python/conda module, therefore this module will be loaded by loading Phyluce. As Phyluce is a Python package, your default Python interpreter will be changed by loading Phyluce.

Installation notes As root:

$ module load apps/python/conda
$ conda create -p /usr/local/packages6/apps/binapps/conda/phyluce python=2
$ source activate /usr/local/packages6/apps/binapps/conda/phyluce
$ conda install -c https://conda.binstar.org/faircloth-lab phyluce

This installs Phyluce as a conda environment in the /usr/local/packages6/apps/binapps/conda/phyluce folder, which is then loaded by the module file phyluce/1.5.0, which is a modification of the anaconda module files.


Picard

Picard

Version 1.129
URL https://github.com/broadinstitute/picard/

A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats. Picard is implemented using the HTSJDK Java library, supporting access to common file formats, such as SAM and VCF, used for high-throughput sequencing data.

Interactive Usage

After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx or qrsh command, preferably requesting at least 8 Gigabytes of memory

qrsh -l mem=8G -l rmem=8G

The latest version of Picard (currently 1.129) is made available with the command

module load apps/binapps/picard

Alternatively, you can load a specific version with

module load apps/binapps/picard/1.129
module load apps/binapps/picard/1.101

These module commands also change the environment to use Java 1.6 since this is required by Picard 1.129. An environment variable called PICARDHOME is created by the module command that contains the path to the requested version of Picard.

Thus, you can run the program with the command

java -jar $PICARDHOME/picard.jar

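For example, to run one of the Picard tools as a batch job, load the module in a submission script and keep the Java heap below the memory requested from the scheduler. This is a sketch only; SortSam and the file names are illustrative and should be replaced with the tool and data you actually need.

#!/bin/bash
# Request 10 gigabytes of virtual memory (mem) and 10 gigabytes of real memory (rmem)
#$ -l mem=10G -l rmem=10G

module load apps/binapps/picard/1.129

# Use an 8G Java heap, leaving headroom below the scheduler's memory limit
java -Xmx8G -jar $PICARDHOME/picard.jar SortSam \
     INPUT=input.bam OUTPUT=sorted.bam SORT_ORDER=coordinate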
Installation notes Version 1.129

A binary install was used. The binary came from the releases page of the project’s github repo

unzip picard-tools-1.129.zip
mkdir -p /usr/local/packages6/apps/binapps/picard
mv ./picard-tools-1.129 /usr/local/packages6/apps/binapps/picard/1.129

Version 1.101

A binary install was used. The binary came from the project's sourceforge site https://sourceforge.net/projects/picard/files/picard-tools/

unzip picard-tools-1.101.zip
mv ./picard-tools-1.101 /usr/local/packages6/apps/binapps/picard/1.101/

Modulefile Version 1.129

The module file is on the system at /usr/local/modulefiles/apps/binapps/picard/1.129

Its contents are


#%Module1.0#####################################################################
##
## Picard 1.129 modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

# This version of Picard needs Java 1.6
module load apps/java/1.6

proc ModulesHelp { } {
    puts stderr "Makes Picard 1.129 available"
}

set version 1.129
set PICARD_DIR /usr/local/packages6/apps/binapps/picard/$version

module-whatis "Makes Picard 1.129 available"

prepend-path PICARDHOME $PICARD_DIR

Version 1.101

The module file is on the system at /usr/local/modulefiles/apps/binapps/picard/1.101

Its contents are

#%Module1.0#####################################################################
##
## Picard 1.101 modulefile
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

# This version of Picard needs Java 1.6
module load apps/java/1.6

proc ModulesHelp { } {
    puts stderr "Makes Picard 1.101 available"
}

set version 1.101
set PICARD_DIR /usr/local/packages6/apps/binapps/picard/$version

module-whatis "Makes Picard 1.101 available"

prepend-path PICARDHOME $PICARD_DIR

Plink


Plink

Versions 1.9 beta 3
Support Level Bronze
Dependencies None
URL https://www.cog-genomics.org/plink2

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command.

The latest version of Plink is made available with the command

module load apps/binapps/plink

Alternatively, you can load a specific version with

module load apps/binapps/plink/1.9

You can now execute the plink command on the command line.
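
For illustration, a minimal run might look like the following, where mydata refers to a hypothetical mydata.ped/mydata.map pair and the allele frequency report is written to files prefixed mydata_freq:

plink --file mydata --freq --out mydata_freq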

Installation notes These are primarily for administrators of the system.

The binary version of Plink was installed

mkdir plink_build
cd plink_build
unzip plink_linux_x86_64.zip
rm plink_linux_x86_64.zip
mkdir -p /usr/local/packages6/apps/binapps/plink/1.9
mv * /usr/local/packages6/apps/binapps/plink/1.9

The module file is at /usr/local/modulefiles/apps/binapps/plink/1.9

#%Module10.2#####################################################################

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    global ver
    puts stderr " Makes Plink $ver available to the system."
}

# Plink version (not in the user's environment)
set ver 1.9

module-whatis "sets the necessary Plink $ver paths"

prepend-path PATH /usr/local/packages6/apps/binapps/plink/$ver


povray

povray

Version 3.7.0
URL http://www.povray.org/

The Persistence of Vision Raytracer is a high-quality, Free Software tool for creating stunning three-dimensional graphics. The source code is available for those wanting to do their own ports.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. The latest version of povray (currently 3.7) is made available with the command

module load apps/gcc/5.2/povray

Alternatively, you can load a specific version with

module load apps/gcc/5.2/povray/3.7

This command makes the povray binary available to your session. It also loads version 5.2 of the gcc compiler environment since gcc 5.2 was used to compile povray 3.7.

You can now run povray. For example, to confirm the version loaded

povray --version

and to get help

povray --help
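
As a sketch of a typical render (scene.pov and scene.png are hypothetical file names):

povray +W800 +H600 +Iscene.pov +Oscene.png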

Documentation Once you have made povray available to the system using the module command above, you can read the man pages by typing

man povray

Installation notes povray 3.7.0 was installed with gcc 5.2 using the following script

• install_povray-3.7.sh

Testing The test suite was executed

make check

All tests passed.

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/povray/3.7

• The module file is on github.


Python

Python

Support Level Gold
Dependencies None
URL https://python.org
Version All

This page documents the “miniconda” installation on iceberg. This is the recommended way of using Python on iceberg, and the best way to be able to configure custom sets of packages for your use.

“conda”, a Python package manager, allows you to create “environments”, which are sets of packages that you can modify. It does this by installing them in your home area. This page will guide you through loading conda and then creating and modifying environments so you can install and use whatever Python packages you need.

Using conda Python After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command.

Conda Python can be loaded with:

module load apps/python/conda

The root conda environment (the default) provides Python 3 and no extra modules. It is automatically updated and is not recommended for general use, just as a base for your own environments. There is also a python2 environment, which is the same but with a Python 2 installation.

Quickly Loading Anaconda Environments There are a small number of environments provided for everyone to use. These are the default root and python2 environments as well as various versions of Anaconda for Python 3 and Python 2.

The anaconda environments can be loaded through provided module files:

module load apps/python/anaconda2-2.4.0
module load apps/python/anaconda3-2.4.0
module load apps/python/anaconda3-2.5.0

Where anaconda2 represents Python 2 installations and anaconda3 represents Python 3 installations. These commands will also load the apps/python/conda module and then activate the anaconda environment specified.

Note: Anaconda 2.5.0 is compiled with Intel MKL libraries which should result in higher numerical performance.

Using conda Environments Once the conda module is loaded you have to load or create the desired conda environments. For the documentation on conda environments see the conda documentation.

You can load a conda environment with:

source activate python2

where python2 is the name of the environment, and unload one with:

source deactivate


which will return you to the root environment.

It is possible to list all the available environments with:

conda env list

Provided system-wide are a set of anaconda environments. These will be installed with the anaconda version number in the environment name, and never modified. They will therefore provide a static base for derivative environments or for using directly.

Creating an Environment Every user can create their own environments. Packages shared with the system-wide environments will not be reinstalled or copied to your file store; they will be symlinked. This reduces the space you need in your /home directory to install many different Python environments.

To create a clean environment with just Python 2 and numpy you can run:

conda create -n mynumpy python=2.7 numpy

This will download the latest release of Python 2.7 and numpy, and create an environment named mynumpy.
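
You can then activate the new environment and check that it provides what you expect, for example:

source activate mynumpy
python --version
python -c "import numpy; print(numpy.__version__)"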

Any version of Python or list of packages can be provided:

conda create -n myscience python=3.5 numpy=1.8.1 scipy

If you wish to modify an existing environment, such as one of the anaconda installations, you can clone that environment:

conda create --clone anaconda3-2.3.0 -n myexperiment

This will create an environment called myexperiment which has all the anaconda 2.3.0 packages installed with Python 3.

Installing Packages Inside an Environment Once you have created your own environment you can install additional packages or different versions of packages into it. There are two methods for doing this: conda and pip. If a package is available through conda it is strongly recommended that you use conda to install packages. You can search for packages using conda:

conda search pandas

then install the package using:

conda install pandas

If you are not in your environment you will get a permission denied error when trying to install packages. If this happens, create or activate an environment you own.

If a package is not available through conda you can search for and install it using pip:

pip search colormath

pip install colormath
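
Whichever method you use, a quick way to confirm that a package was installed into the active environment is to import it, for example:

python -c "import pandas; print(pandas.__version__)"
python -c "import colormath; print(colormath.__name__)"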

Previous Anaconda Installation There is a legacy anaconda installation which is accessible through the binapps/anacondapython/2.3 module. This module should be considered deprecated and should no longer be used.


Using Python with MPI There is an experimental set of packages for conda that have been compiled by the iceberg team, which allow you to use an MPI stack entirely managed by conda. This allows you to easily create complex environments and use MPI without worrying about other modules or system libraries.

To get access to these packages you need to run the following command to add the repo to your conda config:

conda config --add channels file:///usr/local/packages6/conda/conda-bld/

You should then be able to install the packages with the openmpi feature, which currently include openmpi, hdf5, mpi4py and h5py:

conda create -n mpi python=3.5 openmpi mpi4py

Currently, there are Python 2.7, 3.4 and 3.5 versions of mpi4py and h5py compiled in this repository.
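
A minimal smoke test of such an environment, assuming the mpi environment created above, is to launch a couple of Python processes under the conda-provided MPI and print their ranks:

source activate mpi
mpirun -np 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"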

The build scripts for these packages can be found in this GitHub repository.

Installation Notes These are primarily for administrators of the system.

The conda package manager is installed in /usr/share/packages6/conda; it was installed using the miniconda installer.

The two “root” environments root and python2 can be updated using the update script located in /usr/local/packages6/conda/_envronments/conda-autoupdate.sh. This should be run regularly to keep these base environments up to date with Python, and more importantly with the conda package manager itself.

Installing a New Version of Anaconda Perform the following:

$ cd /usr/local/packages6/conda/_envronments/
$ cp anaconda2-2.3.0.yml anaconda2-x.y.z.yml

then edit that file, modifying the environment name and the anaconda version under requirements, then run:

$ conda env create -f anaconda2-x.y.z.yml

then repeat for the Python 3 installation.

Then copy the modulefile for the previous version of anaconda to the new version and update the name of the environment. Also you will need to append the new module to the conflict line in apps/python/.conda-environments.tcl.

R

R

Dependencies BLAS
URL http://www.r-project.org/
Documentation http://www.r-project.org/

R is a statistical computing language.

Interactive Usage After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrshx command.

The latest version of R can be loaded with


module load apps/R

Alternatively, you can load a specific version of R using one of the following

module load apps/R/3.3.1
module load apps/R/3.3.0
module load apps/R/3.2.4
module load apps/R/3.2.3
module load apps/R/3.2.2
module load apps/R/3.2.1
module load apps/R/3.2.0
module load apps/R/3.1.2

R can then be run with

$ R

Serial (one CPU) Batch usage Here, we assume that you wish to run the program my_code.R on the system. With batch usage it is recommended to load a specific version of R, for example module load apps/R/3.2.4, to ensure the expected output is achieved.

First, you need to write a batch submission file. We assume you’ll call this my_job.sge

#!/bin/bash
#$ -S /bin/bash
#$ -cwd # Run job from current directory

module load apps/R/3.3.0 # Recommended to load a specific version of R

R CMD BATCH my_code.R my_code.R.o$JOB_ID

Note that R must be called with both the CMD and BATCH options which tell it to run an R program, in this case my_code.R. If you do not do this, R will attempt to open an interactive prompt.

The final argument, my_code.R.o$JOB_ID, tells R to send output to a file with this name. Since $JOB_ID will always be unique, this ensures that all of your output files are unique. Without this argument R sends all output to a file called my_code.Rout.

Ensuring that my_code.R and my_job.sge are both in your current working directory, submit your job to the batch system

qsub my_job.sge

Replace my_job.sge with the name of your submission script.

Graphical output By default, graphical output from batch jobs is sent to a file called Rplots.pdf

Installing additional packages As you will not have permissions to install packages to the default folder, additional R packages can be installed to your home folder ~/. To create the appropriate folder, install your first package in R in interactive mode. Load an interactive R session as described above, and install a package with

install.packages()

You will be prompted to create a personal package library. Choose yes. The package will download and install from a CRAN mirror (you may be asked to select a nearby mirror, which you can do simply by entering the number of your preferred mirror).


Once the chosen package has been installed, additional packages can be installed either in the same way, or by creating a .R script. An example script might look like

install.packages("dplyr")
install.packages("devtools")

Call this using source(). For example if your script is called packages.R and is stored in your home folder, source this from an interactive R session with

source("~/packages.R")

These additional packages will be installed without prompting to your personal package library.

To check your packages are up to date, and update them if necessary, run the following line from an R interactivesession

update.packages(lib.loc = "~/R/x86_64-unknown-linux-gnu-library/3.3/")

The folder name after ~/R/ will likely change, but this can be completed with tab autocompletion from the R session. Ensure the lib.loc folder is specified, or R will attempt to update the wrong library.

R Packages that require external libraries Some R packages require external libraries to be installed before you can install and use them. Since there are so many, we only install those libraries that have been explicitly requested by users of the system.

The associated R packages are not included in the system install of R, so you will need to install them yourself to your home directory following the instructions linked to below.

• geos This is the library required for the rgeos package.

• JAGS This is the library required for the rjags and runjags packages

Using the Rmath library in C Programs The Rmath library allows you to access some of R’s functionality from a C program. For example, consider the C program below

#include <stdio.h>
#define MATHLIB_STANDALONE
#include "Rmath.h"

int main()
{
    double shape1, shape2, prob;

    shape1 = 1.0;
    shape2 = 2.0;
    prob = 0.5;

    printf("Critical value is %lf\n", qbeta(prob, shape1, shape2, 1, 0));
    return 0;
}

This makes use of R’s qbeta function. You can compile and run this on a worker node as follows.

Start a session on a worker node with qrsh or qsh and load the R module

module load apps/R/3.3.0

Assuming the program is called test_rmath.c, compile with

gcc test_rmath.c -lRmath -lm -o test_rmath
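
Running the resulting binary is a useful check:

./test_rmath

For these parameters, qbeta(0.5, 1, 2, 1, 0) equals 1 - sqrt(0.5), so the output should be approximately "Critical value is 0.292893".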


For full details about the functions made available by the Rmath library, see section 6.7 of the document Writing R extensions

Accelerated version of R There is an experimental, accelerated version of R installed on Iceberg that makes use ofthe Intel Compilers and the Intel MKL. See R (Intel Build) for details.

Installation Notes These notes are primarily for administrators of the system.

version 3.3.1

• What’s new in R version 3.3.1

This was a scripted install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled. It was run in batch mode.

This build required several external modules including xz utils, curl, bzip2 and zlib

• install_R_3.3.1.sh Downloads, compiles, tests and installs R 3.3.1 and the Rmath library.

• R 3.3.1 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.3.1

• Install log-files, including the output of the make check tests are available on the system at /usr/local/packages6/R/3.3.1/install_logs

version 3.3.0

• What’s new in R version 3.3.0

This was a scripted install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled. You will need a large memory qrshx session in order to successfully run the build script. I used qrshx -l rmem=8G -l mem=8G

This build required several external modules including xz utils, curl, bzip2 and zlib

• install_R_3.3.0.sh Downloads, compiles, tests and installs R 3.3.0 and the Rmath library.

• R 3.3.0 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.3.0

• Install log-files, including the output of the make check tests are available on the system at /usr/local/packages6/R/3.3.0/install_logs

Version 3.2.4

• What’s new in R version 3.2.4

This was a scripted install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled. You will need a large memory qrshx session in order to successfully run the build script. I used qrshx -l rmem=8G -l mem=8G

This build made use of new versions of xz utils and curl

• install_R_3.2.4.sh Downloads, compiles, tests and installs R 3.2.4 and the Rmath library.

• R 3.2.4 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.2.4

• Install log-files, including the output of the make check tests are available on the system at /usr/local/packages6/R/3.2.4/install_logs

Version 3.2.3

• What’s new in R version 3.2.3

This was a scripted install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled. You will need a large memory qrsh session in order to successfully run the build script. I used qrsh -l rmem=8G -l mem=16G

• install_R_3.2.3.sh Downloads, compiles, tests and installs R 3.2.3 and the Rmath library.


• R 3.2.3 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.2.3

• Install log-files, including the output of the make check tests are available on the system at /usr/local/packages6/R/3.2.3/install_logs

Version 3.2.2

• What’s new in R version 3.2.2

This was a scripted install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled. You will need a large memory qrsh session in order to successfully run the build script. I used qrsh -l rmem=8G -l mem=16G

• install_R_3.2.2.sh Downloads, compiles and installs R 3.2.2 and the Rmath library.

• R 3.2.2 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.2.2

• Install log-files were manually copied to /usr/local/packages6/R/3.2.2/install_logs on the system. This step should be included in the next version of the install script.

Version 3.2.1

This was a manual install. It was compiled from source with gcc 4.4.7 and with --enable-R-shlib enabled.

• Install notes

• R 3.2.1 Modulefile located on the system at /usr/local/modulefiles/apps/R/3.2.1

Older versions

Install notes for older versions of R are not available.

relion

relion

Versions 1.4
URL http://relion.readthedocs.org/en/latest/

RELION is a software package that performs an empirical Bayesian approach to (cryo-EM) structure determination by single-particle analysis. Note that RELION is distributed under a GPL license.

Making RELION available The following module command makes the latest version of RELION available to your session

module load apps/gcc/4.4.7/relion

Alternatively, you can make a specific version available

module load apps/gcc/4.4.7/relion/1.4

Installation notes These are primarily for system administrators.

Install instructions: http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install

• RELION was installed using the gcc 4.4.7 compiler and Openmpi 1.8.8

• install_relion.sh


• Note - the environment variable RELION_QSUB_TEMPLATE points to an SGE qsub template, which needs customizing to work with our environment

Modulefile The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/relion/1.4

The contents of the module file are

#%Module1.0#####################################################################
##
## relion module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    global bedtools-version
    puts stderr " Setups `relion-$relionversion' environment variables"
}

set relionversion 1.4

module load apps/gcc/4.4.7/ctffind/3.140609
module load apps/binapps/resmap/1.1.4
module load mpi/gcc/openmpi/1.8.8

prepend-path PATH /usr/local/packages6/apps/gcc/4.4.7/relion/1.4/bin
prepend-path LD_LIBRARY_PATH /usr/local/packages6/apps/gcc/4.4.7/relion/1.4/lib
setenv RELION_QSUB_TEMPLATE /usr/local/packages6/apps/gcc/4.4.7/relion/1.4/bin/relion_qsub.csh
setenv RELION_CTFFIND_EXECUTABLE ctffind3_mp.exe
setenv RELION_RESMAP_EXECUTABLE /usr/local/packages6/apps/binapps/resmap/1.1.4

ResMap

ResMap

Versions 1.1.4
URL http://resmap.readthedocs.org/en/latest/

ResMap (Resolution Map) is a software package for computing the local resolution of 3D density maps studied in structural biology, primarily electron cryo-microscopy.

Making ResMap available The following module command makes the latest version of ResMap available to your session

module load apps/binapps/resmap

Alternatively, you can make a specific version available

module load apps/binapps/resmap/1.1.4


Installation notes These are primarily for system administrators.

• ResMap was installed as a precompiled binary from https://sourceforge.net/projects/resmap/files/ResMap-1.1.4-linux64/download

Modulefile The module file is on the system at /usr/local/modulefiles/apps/binapps/1.1.4

The contents of the module file are

#%Module1.0#####################################################################
##
## ResMap module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    global bedtools-version
    puts stderr " Adds `ResMap-$resmapversion' to your PATH environment variable"
}

set resmapversion 1.1.4

prepend-path PATH /usr/local/packages6/apps/binapps/resmap/$resmapversion/

Samtools

Samtools

Versions 1.2
URL http://samtools.sourceforge.net/

SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qsh command.

The latest version of Samtools (currently 1.2) is made available with the command

module load apps/gcc/5.2/samtools

Alternatively, you can load a specific version with

module load apps/gcc/5.2/samtools/1.2

This command makes the samtools binary directory available to your session.
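
For example, to confirm the version in use and to produce summary statistics for a hypothetical alignment file called input.bam:

samtools --version
samtools flagstat input.bam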

Documentation Once you have made samtools available to the system using the module command above, you can read the man pages by typing


man samtools

Installation notes Samtools was installed using gcc 5.2

module load compilers/gcc/5.2

tar -xvjf ./samtools-1.2.tar.bz2
cd samtools-1.2
mkdir -p /usr/local/packages6/apps/gcc/5.2/samtools/1.2
make prefix=/usr/local/packages6/apps/gcc/5.2/samtools/1.2
make prefix=/usr/local/packages6/apps/gcc/5.2/samtools/1.2 install
# tabix and bgzip are not installed by the above procedure.
# We can get them by doing the following
cd htslib-1.2.1/
make
mv ./tabix /usr/local/packages6/apps/gcc/5.2/samtools/1.2/bin/
mv ./bgzip /usr/local/packages6/apps/gcc/5.2/samtools/1.2/bin/

Testing The test suite was run with

make test 2>&1 | tee make_tests.log

The summary of the test output was

Test output:
    Number of tests:
        total            .. 368
        passed           .. 336
        failed           .. 0
        expected failure .. 32
        unexpected pass  .. 0

test/merge/test_bam_translate test/merge/test_bam_translate.tmp
test/merge/test_pretty_header
test/merge/test_rtrans_build
test/merge/test_trans_tbl_init
cd test/mpileup && ./regression.sh
Samtools mpileup tests:

EXPECTED FAIL: Task failed, but expected to fail;
when running $samtools mpileup -x -d 8500 -B -f mpileup.ref.fa deep.sam|awk '{print $4}'

Expected passes:      123
Unexpected passes:    0
Expected failures:    1
Unexpected failures:  0

The full log is on the system at /usr/local/packages6/apps/gcc/5.2/samtools/1.2/make_tests.log

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/5.2/samtools/1.2

• The module file is on github.


spark

Spark

Version 2.0
URL http://spark.apache.org/

Apache Spark is a fast and general engine for large-scale data processing.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrsh or qrshx command.

To make Spark available, execute the following module command

module load apps/gcc/4.4.7/spark/2.0
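
As a quick check that the installation works in local mode (the module sets MASTER to local[1]), you can print the version and run one of the examples bundled with Spark:

spark-submit --version
run-example SparkPi 10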

Installation notes Spark was built using the system gcc 4.4.7

tar -xvzf ./spark-2.0.0.tgz
cd spark-2.0.0
build/mvn -DskipTests clean package

mkdir /usr/local/packages6/apps/gcc/4.4.7/spark
mv spark-2.0.0 /usr/local/packages6/apps/gcc/4.4.7/spark/

Modulefile Version 2.0

• The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/spark/2.0

Its contents are

#%Module1.0#####################################################################
##
## Spark module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

# Use only one core. User can override this if they want
setenv MASTER local\[1\]
prepend-path PATH /usr/local/packages6/apps/gcc/4.4.7/spark/spark-2.0.0/bin

SU2

SU2

Version 4.1.0
Dependencies gcc/4.4.7 mpi/gcc/openmpi/1.10.1
URL http://su2.stanford.edu/
Documentation https://github.com/su2code/SU2/wiki


The SU2 suite is an open-source collection of C++ based software tools for performing Partial Differential Equation (PDE) analysis and solving PDE constrained optimization problems.

Usage SU2 can be activated using the module file:

module load apps/gcc/4.4.7/su2

Installation notes SU2 was compiled using the install_su2.sh script. SU2 also has its own version of CGNS which was compiled with the script install_cgns.sh. The module file is 4.1.0.

Tophat

Tophat

Versions 2.1.0
URL https://ccb.jhu.edu/software/tophat/index.shtml

TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with either the qsh or qrsh command.

The latest version of tophat (currently 2.1.0) is made available with the command:

module load apps/gcc/4.8.2/tophat

Alternatively, you can load a specific version with

module load apps/gcc/4.8.2/tophat/2.1.0

This command makes the tophat binary available to your session.
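
As a sketch only, a typical paired-end run looks like the following, where genome_index is a hypothetical Bowtie index basename and the FASTQ files are your reads; note that Bowtie must also be available on your PATH at runtime:

tophat -o tophat_out genome_index reads_1.fastq reads_2.fastq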

Installation notes Tophat 2.1.0 was installed using gcc 4.8.2. Installs were attempted using gcc 5.2 and gcc 4.4.7 but both failed (see this issue on github)

This install has dependencies on the following

• GNU Compiler Collection (gcc) 4.8.2

• Boost C++ Library 1.58

• Bowtie2 (not needed at install time but is needed at runtime)

Install details

module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/boost/1.58

mkdir -p /usr/local/packages6/apps/gcc/4.8.2/tophat/2.1.0

tar -xvzf ./tophat-2.1.0.tar.gz
cd tophat-2.1.0
./configure --with-boost=/usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0/ --prefix=/usr/local/packages6/apps/gcc/4.8.2/tophat/2.1.0


Configuration results

-- tophat 2.1.0 Configuration Results --
C++ compiler:        g++ -Wall -Wno-strict-aliasing -g -gdwarf-2 -Wuninitialized -O3 -DNDEBUG -I./samtools-0.1.18 -pthread -I/usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0//include -I./SeqAn-1.3
Linker flags:        -L./samtools-0.1.18 -L/usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0//lib
BOOST libraries:     -lboost_thread -lboost_system
GCC version:         gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
Host System type:    x86_64-unknown-linux-gnu
Install prefix:      /usr/local/packages6/apps/gcc/4.8.2/tophat/2.1.0
Install eprefix:     ${prefix}

Built with

make
make install

Testing A test script was executed based on the documentation on the tophat website (retrieved 29th October 2015). It only proves that the code can run without error, not that the result is correct.

• tophat_test.sh

Modulefile

• The module file is on the system at /usr/local/modulefiles/apps/gcc/4.8.2/tophat/2.1.0

• The module file is on github.

VCFtools

VCFtools

Version 0.1.14
Dependencies gcc/5.2
URL https://vcftools.github.io/
Documentation https://vcftools.github.io/examples.html

VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.

Usage VCFtools can be activated using the module file:

module load apps/gcc/5.2/vcftools/0.1.14

This will both add the binary files to the $PATH and the Perl module to $PERL5LIB.
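
As an illustration, the following computes per-site allele frequencies for a hypothetical file input.vcf, writing the results to files prefixed with allele_freq:

vcftools --vcf input.vcf --freq --out allele_freq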

Installation notes VCFtools was compiled using the install_vcftools.sh script; the module file is 0.1.14.

Velvet


Velvet

Version 1.2.10
URL https://www.ebi.ac.uk/~zerbino/velvet/

Sequence assembler for very short reads.

Interactive Usage After connecting to Iceberg (see Connect to iceberg), start an interactive session with the qrshx or qrsh command.

To add the velvet binaries to the system PATH, execute the following command

module load apps/gcc/4.4.7/velvet/1.2.10

This makes two programs available:-

• velvetg - de Bruijn graph construction, error removal and repeat resolution

• velveth - simple hashing program

Example submission script If the command you want to run is velvetg /fastdata/foo1bar/velvet/assembly_31 -exp_cov auto -cov_cutoff auto, here is an example submission script that requests 60 gigabytes of memory

#!/bin/bash
#$ -l mem=60G
#$ -l rmem=60G

module load apps/gcc/4.4.7/velvet/1.2.10

velvetg /fastdata/foo1bar/velvet/assembly_31 -exp_cov auto -cov_cutoff auto

Put the above into a text file called submit_velvet.sh and submit it to the queue with the command qsub submit_velvet.sh

Velvet Performance Velvet has multicore capabilities but these have not been compiled in our version. This is because there is evidence that the performance is better running in serial than in parallel. See http://www.ehu.eus/ehusfera/hpc/2012/06/20/benchmarking-genetic-assemblers-abyss-vs-velvet/ for details.

Installation notes Velvet was compiled with gcc 4.4.7

tar -xvzf ./velvet_1.2.10.tgz
cd velvet_1.2.10
make

mkdir -p /usr/local/packages6/apps/gcc/4.4.7/velvet/1.2.10
mv * /usr/local/packages6/apps/gcc/4.4.7/velvet/1.2.10

Modulefile The module file is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/velvet/1.2.10

Its contents are

#%Module1.0#####################################################################
##
## velvet 1.2.10 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes velvet 1.2.10 available"
}

set VELVET_DIR /usr/local/packages6/apps/gcc/4.4.7/velvet/1.2.10

module-whatis "Makes velevt 1.2.10 available"

prepend-path PATH $VELVET_DIR

xz utils

xz utils

Latest version 5.2.2
URL http://tukaani.org/xz/

XZ Utils is free general-purpose data compression software with a high compression ratio. XZ Utils were written for POSIX-like systems, but also work on some not-so-POSIX systems. XZ Utils are the successor to LZMA Utils.

The core of the XZ Utils compression code is based on LZMA SDK, but it has been modified quite a lot to be suitable for XZ Utils. The primary compression algorithm is currently LZMA2, which is used inside the .xz container format. With typical files, XZ Utils create 30 % smaller output than gzip and 15 % smaller output than bzip2.

XZ Utils consist of several components:

• liblzma is a compression library with an API similar to that of zlib.

• xz is a command line tool with syntax similar to that of gzip.

• xzdec is a decompression-only tool smaller than the full-featured xz tool.

• A set of shell scripts (xzgrep, xzdiff, etc.) have been adapted from gzip to ease viewing, grepping, and comparing compressed files.

• Emulation of command line tools of LZMA Utils eases transition from LZMA Utils to XZ Utils.

While liblzma has a zlib-like API, liblzma doesn’t include any file I/O functions. A separate I/O library is planned, which would abstract handling of .gz, .bz2, and .xz files with an easy to use API.

Usage There is an old version of xz utils available on the system by default. We can see its version with

xz --version

which gives

xz (XZ Utils) 4.999.9beta
liblzma 4.999.9beta

Version 4.999.9beta of xzutils was released in 2009.

To make version 5.2.2 (released in 2015) available, run the following module command


module load apps/gcc/4.4.7/xzutils/5.2.2
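
A quick round-trip test on a hypothetical file mydata.txt:

xz --version                           # should now report 5.2.2
xz -k mydata.txt                       # compress, keeping the original; produces mydata.txt.xz
xz -dc mydata.txt.xz > restored.txt    # decompress to a new file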

Documentation Standard man pages are available. The documentation version you get depends on whether or not you’ve loaded the module.

man xz

Installation notes This section is primarily for administrators of the system. xz utils 5.2.2 was compiled with gcc 4.4.7

tar -xvzf ./xz-5.2.2.tar.gz
cd xz-5.2.2
mkdir -p /usr/local/packages6/apps/gcc/4.4.7/xzutils/5.2.2
./configure --prefix=/usr/local/packages6/apps/gcc/4.4.7/xzutils/5.2.2
make
make install

Testing was performed with

make check

Final piece of output was

==================
All 9 tests passed
==================

Module file Modulefile is on the system at /usr/local/modulefiles/apps/gcc/4.4.7/xzutils/5.2.2

#%Module1.0#####################################################################
##
## xzutils 5.2.2 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes xzutils 5.2.2 available"
}

set XZUTILS_DIR /usr/local/packages6/apps/gcc/4.4.7/xzutils/5.2.2

module-whatis "Makes xzutils 5.2.2 available"

prepend-path PATH $XZUTILS_DIR/bin
prepend-path LD_LIBRARY_PATH $XZUTILS_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $XZUTILS_DIR/include
prepend-path CPATH $XZUTILS_DIR/include
prepend-path LIBRARY_PATH $XZUTILS_DIR/lib
prepend-path MANPATH $XZUTILS_DIR/share/man/


Libraries

bclr

bclr

Latest version UNKNOWN
URL http://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/

Future Technologies Group researchers are developing a hybrid kernel/user implementation of checkpoint/restart. Their goal is to provide a robust, production quality implementation that checkpoints a wide range of applications, without requiring changes to be made to application code. This work focuses on checkpointing parallel applications that communicate through MPI, and on compatibility with the software suite produced by the SciDAC Scalable Systems Software ISIC. This work is broken down into four main areas.

Usage BLCR was installed before we started using the module system. As such, it is currently always available to worker nodes. It should be considered experimental.

The checkpointing is performed at the kernel level, so any batch code should be checkpointable without modification(it may not work with our MPI environment though... although it should cope with SMP codes).

To run a code, use

cr_run ./executable

To checkpoint a process with process id PID

cr_checkpoint -f checkpoint.file PID

Use the --term flag if you want to checkpoint and kill the process

To restart the process from a checkpoint

cr_restart checkpoint.file

Using with SGE in batch A checkpoint environment called BLCR has been set up. An example of a checkpointing job would look something like

#!/bin/bash
#$ -l h_rt=168:00:00
#$ -c sx
#$ -ckpt blcr
cr_run ./executable >> output.file

The -c sx option tells SGE to checkpoint if the queue is suspended, or if the execution daemon is killed. You can also specify checkpoints to occur after a given time period.

A checkpoint file will be produced before the job is terminated. This file will be called checkpoint.[jobid].[pid]. This file will contain the complete in-memory state of your program at the time that it terminates, so make sure that you have enough disk space to save the file.

To resume a checkpointed job submit a job file which looks like

#!/bin/bash
#$ -l h_rt=168:00:00
#$ -c sx
#$ -ckpt blcr
cr_restart [name of checkpoint file]

Installation notes Installation notes are not available

beagle

beagle

Latest version 2.1.2
URL https://github.com/beagle-dev/beagle-lib
Location /usr/local/packages6/libs/gcc/4.4.7/beagle/2.1.2/

General purpose library for evaluating the likelihood of sequence evolution on trees

Usage To make this library available, run the following module command

module load libs/gcc/4.4.7/beagle/2.1.2

This populates the environment variables C_INCLUDE_PATH, LD_LIBRARY_PATH and LD_RUN_PATH with the relevant directories.

Installation notes This section is primarily for administrators of the system.

Beagle 2.1.2 was compiled with gcc 4.4.7

• Install script: https://github.com/mikecroucher/HPC_Installers/blob/master/libs/beagle/2.1.2/sheffield/iceberg/install_beagle_2_1_2.sh

• Module file: https://github.com/mikecroucher/HPC_Installers/blob/master/libs/beagle/2.1.2/sheffield/iceberg/2.1.2

Boost C++ Library

Boost C++ Library

Latest version 1.58
URL www.boost.org

Boost provides free, peer-reviewed and portable C++ source libraries.

Usage On Iceberg, different versions of Boost were built using different versions of gcc. We suggest that you use the matching version of gcc to build your code.

The latest version of Boost, version 1.59, was built using gcc 5.2. To make both the compiler and Boost library available to the system, execute the following module commands while in a qrsh or qsh session

module load compilers/gcc/5.2
module load libs/gcc/5.2/boost/1.59

Boost version 1.58 was built using gcc 4.8.2. To make both the compiler and Boost library available to the system, execute the following module commands while in a qrsh or qsh session


module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/boost/1.58

Version 1.41 of Boost uses version 4.4.7 of the gcc compiler. Since this is the default version of gcc on the system, you only need to load the module for the library

module load libs/gcc/4.4.7/boost/1.41

Build a simple program using Boost Many boost libraries are header-only which makes them particularly simple to compile. The following program reads a sequence of integers from standard input, uses Boost.Lambda to multiply each number by three, and writes them to standard output (taken from http://www.boost.org/doc/libs/1_58_0/more/getting_started/unix-variants.html):

#include <boost/lambda/lambda.hpp>
#include <iostream>
#include <iterator>
#include <algorithm>

int main()
{
    using namespace boost::lambda;
    typedef std::istream_iterator<int> in;

    std::for_each(in(std::cin), in(), std::cout << (_1 * 3) << " ");
}

Copy this into a file called example1.cpp and compile with

g++ example1.cpp -o example

Provided you loaded the correct modules given above, the program should compile without error.
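
You can then check the behaviour by piping some integers into it (this usage example follows the Boost getting-started guide):

echo 1 2 3 | ./example

which should print 3 6 9.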

Linking to a Boost library The following program is taken from the official Boost documentation http://www.boost.org/doc/libs/1_58_0/more/getting_started/unix-variants.html

#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main()
{
    std::string line;
    boost::regex pat( "^Subject: (Re: |Aw: )*(.*)" );

    while (std::cin)
    {
        std::getline(std::cin, line);
        boost::smatch matches;
        if (boost::regex_match(line, matches, pat))
            std::cout << matches[2] << std::endl;
    }
}

This program makes use of the Boost.Regex library, which has a separately-compiled binary component we need to link to. Assuming that the above program is called example2.cpp, compile with the following command

g++ example2.cpp -o example2 -lboost_regex


If you get an error message that looks like this:

example2.cpp:1:27: error: boost/regex.hpp: No such file or directory

the most likely cause is that you forgot to load the correct modules as detailed above.
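
Once it compiles, a quick way to exercise it is to pipe in a line that matches the pattern; the hypothetical input below should print Hello world:

echo "Subject: Re: Hello world" | ./example2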

Installation Notes This section is primarily for administrators of the system

version 1.59: Compiled with gcc 5.2 and icu version 55

module load compilers/gcc/5.2
module load libs/gcc/4.8.2/libunistring/0.9.5
module load libs/gcc/4.8.2/icu/55

mkdir -p /usr/local/packages6/libs/gcc/5.2/boost/1.59.0/
tar -xvzf ./boost_1_59_0.tar.gz
cd boost_1_59_0
./bootstrap.sh --prefix=/usr/local/packages6/libs/gcc/5.2/boost/1.59.0/

It complained that it could not find the icu library but when I ran

./b2 install --prefix=/usr/local/packages6/libs/gcc/5.2/boost/1.59.0/

It said that it had detected the icu library and was compiling it in

Version 1.58: Compiled with gcc 4.8.2 and icu version 55

module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/libunistring/0.9.5
module load libs/gcc/4.8.2/icu/55
tar -xvzf ./boost_1_58_0.tar.gz
cd boost_1_58_0
./bootstrap.sh --prefix=/usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0/

It complained that it could not find the icu library but when I ran

./b2 install --prefix=/usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0

It said that it had detected the icu library and was compiling it in

Version 1.41: This build of boost was built with gcc 4.4.7 and ICU version 42

module load libs/gcc/4.4.7/icu/42
tar -xvzf ./boost_1_41_0.tar.gz
cd boost_1_41_0
./bootstrap.sh --prefix=/usr/local/packages6/libs/gcc/4.4.7/boost/1.41
./bjam -sICU_PATH=/usr/local/packages6/libs/gcc/4.4.7/icu/42 install

Testing The two examples above were compiled and run.

Module Files Version 1.59

Module file location: /usr/local/modulefiles/libs/gcc/5.2/boost/1.59

#%Module1.0#####################################################################
##
## boost 1.59 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

module load libs/gcc/4.8.2/libunistring/0.9.5
module load libs/gcc/4.8.2/icu/55

proc ModulesHelp { } {
    puts stderr "Makes the Boost 1.59 library available"
}

set BOOST_DIR /usr/local/packages6/libs/gcc/5.2/boost/1.59.0

module-whatis "Makes the Boost 1.59 library available"

prepend-path LD_LIBRARY_PATH $BOOST_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $BOOST_DIR/include
prepend-path LIBRARY_PATH $BOOST_DIR/lib

Version 1.58

Module file location: /usr/local/modulefiles/libs/gcc/4.8.2/boost/1.58

#%Module1.0#####################################################################
##
## boost 1.58 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

module load libs/gcc/4.8.2/libunistring/0.9.5
module load libs/gcc/4.8.2/icu/55

proc ModulesHelp { } {
    puts stderr "Makes the Boost 1.58 library available"
}

set BOOST_DIR /usr/local/packages6/libs/gcc/4.8.2/boost/1.58.0

module-whatis "Makes the Boost 1.58 library available"

prepend-path LD_LIBRARY_PATH $BOOST_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $BOOST_DIR/include
prepend-path LIBRARY_PATH $BOOST_DIR/lib

Version 1.41

The module file is on the system at /usr/local/modulefiles/libs/gcc/4.4.7/boost/1.41

#%Module1.0#####################################################################
##
## Boost 1.41 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

module load libs/gcc/4.4.7/icu/42

proc ModulesHelp { } {
    puts stderr "Makes the Boost 1.41 library available"
}

set BOOST_DIR /usr/local/packages6/libs/gcc/4.4.7/boost/1.41

module-whatis "Makes the Boost 1.41 library available"

prepend-path LD_LIBRARY_PATH $BOOST_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $BOOST_DIR/include
prepend-path LIBRARY_PATH $BOOST_DIR/lib

bzip2

bzip2 is a freely available, patent free (see below), high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.

bzip2

Version 1.0.6
URL http://www.bzip.org/
Location /usr/local/packages6/libs/gcc/4.4.7/bzip2/1.0.6

Usage The default version of bzip2 on the system is version 1.0.5. If you need a newer version run the following module command

module load libs/gcc/4.4.7/bzip2/1.0.6

Check the version number that’s available using

bzip2 --version
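
A simple round trip on a hypothetical file mydata.txt looks like:

bzip2 -v mydata.txt       # compress; produces mydata.txt.bz2
bzip2 -d mydata.txt.bz2   # restore the original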

Documentation Standard man pages are available

man bzip2

Installation notes This section is primarily for administrators of the system.

It was built with gcc 4.4.7

wget http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz
mkdir -p /usr/local/packages6/libs/gcc/4.4.7/bzip2/1.0.6
tar -xvzf ./bzip2-1.0.6.tar.gz
cd bzip2-1.0.6

make -f Makefile-libbz2_so
make
make install PREFIX=/usr/local/packages6/libs/gcc/4.4.7/bzip2/1.0.6
mv *.so* /usr/local/packages6/libs/gcc/4.4.7/bzip2/1.0.6/lib/


Testing The library is automatically tested when you do a make. The results were

Doing 6 tests (3 compress, 3 uncompress) ...
If there's a problem, things might stop at this point.

./bzip2 -1 < sample1.ref > sample1.rb2

./bzip2 -2 < sample2.ref > sample2.rb2

./bzip2 -3 < sample3.ref > sample3.rb2

./bzip2 -d < sample1.bz2 > sample1.tst

./bzip2 -d < sample2.bz2 > sample2.tst

./bzip2 -ds < sample3.bz2 > sample3.tst
cmp sample1.bz2 sample1.rb2
cmp sample2.bz2 sample2.rb2
cmp sample3.bz2 sample3.rb2
cmp sample1.tst sample1.ref
cmp sample2.tst sample2.ref
cmp sample3.tst sample3.ref

If you got this far and the 'cmp's didn't complain, it looks like you're in business.

Module File Module location is /usr/local/modulefiles/libs/gcc/4.4.7/bzip2/1.0.6. Module contents

#%Module1.0#####################################################################
##
## bzip2 1.0.6 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes the bzip 1.0.6 library available"
}

module-whatis "Makes the bzip 1.0.6 library available"

set BZIP_DIR /usr/local/packages6/libs/gcc/4.4.7/bzip2/1.0.6

prepend-path LD_LIBRARY_PATH $BZIP_DIR/lib
prepend-path CPATH $BZIP_DIR/include
prepend-path MANPATH $BZIP_DIR/man
prepend-path PATH $BZIP_DIR/bin

cfitsio

cfitsio

Latest version 3.380
URL http://heasarc.gsfc.nasa.gov/fitsio/fitsio.html

CFITSIO is a library of C and Fortran subroutines for reading and writing data files in FITS (Flexible Image Transport System) data format. CFITSIO provides simple high-level routines for reading and writing FITS files that insulate the programmer from the internal complexities of the FITS format.


Usage To make this library available, run the following module command

module load libs/gcc/5.2/cfitsio

The modulefile creates a variable $CFITSIO_INCLUDE_PATH which is the path to the include directory.
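
As a sketch, a compile-only step for a hypothetical source file that includes fitsio.h can use that variable directly; linking the final program will additionally need the CFITSIO library (-lcfitsio) from the corresponding lib directory:

gcc -I$CFITSIO_INCLUDE_PATH -c my_fits_prog.c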

Installation notes This section is primarily for administrators of the system. CFITSIO 3.380 was compiled with gcc 5.2. The compilation used this script and it is loaded with this modulefile.

cgns

cgns

Version 3.2.1
Support Level Bronze
Dependencies libs/hdf5/gcc/openmpi/1.8.14
URL http://cgns.github.io/WhatIsCGNS.html
Location /usr/local/packages6/libs/gcc/4.4.7/cgnslib

The CFD General Notation System (CGNS) provides a general, portable, and extensible standard for the storage and retrieval of computational fluid dynamics (CFD) analysis data.

Usage To make this library available, run the following module command

module load libs/gcc/4.4.7/cgns/3.2.1

This will also load the module files for the prerequisite libraries, Open MPI 1.8.3 and HDF5 1.8.14 with parallel support.

Installing This section is primarily for administrators of the system.

• This is a prerequisite for Code Saturne version 4.0.

• It was built with gcc 4.4.7, openmpi 1.8.3 and hdf 1.8.14

module load libs/hdf5/gcc/openmpi/1.8.14
tar -xvzf cgnslib_3.2.1.tar.gz
mkdir /usr/local/packages6/libs/gcc/4.4.7/cgnslib
cd /usr/local/packages6/libs/gcc/4.4.7/cgnslib
mkdir 3.2.1
cd 3.2.1
cmake ~/cgnslib_3.2.1/
ccmake .

Configured the following using ccmake

CGNS_ENABLE_PARALLEL   ON
MPIEXEC                /usr/local/mpi/gcc/openmpi/1.8.3/bin/mpiexec
MPI_COMPILER           /usr/local/mpi/gcc/openmpi/1.8.3/bin/mpic++
MPI_EXTRA_LIBRARY      /usr/local/mpi/gcc/openmpi/1.8.3/lib/libmpi.s
MPI_INCLUDE_PATH       /usr/local/mpi/gcc/openmpi/1.8.3/include
MPI_LIBRARY            /usr/local/mpi/gcc/openmpi/1.8.3/lib/libmpi_c
ZLIB_LIBRARY           /usr/lib64/libz.so

FORTRAN_NAMING         LOWERCASE_

HDF5_INCLUDE_PATH      /usr/local/packages6/hdf5/gcc-4.4.7/openmpi-1.8.3/hdf5-1.8.14/include/
HDF5_LIBRARY           /usr/local/packages6/hdf5/gcc-4.4.7/openmpi-1.8.3/hdf5-1.8.14/lib/libhdf5.so
HDF5_NEED_MPI          ON
HDF5_NEED_SZIP         OFF
HDF5_NEED_ZLIB         ON
CGNS_BUILD_CGNSTOOLS   OFF
CGNS_BUILD_SHARED      ON
CGNS_ENABLE_64BIT      ON
CGNS_ENABLE_FORTRAN    ON
CGNS_ENABLE_HDF5       ON
CGNS_ENABLE_SCOPING    OFF
CGNS_ENABLE_TESTS      ON
CGNS_USE_SHARED        ON
CMAKE_BUILD_TYPE       Release
CMAKE_INSTALL_PREFIX   /usr/local/packages6/libs/gcc/4.4.7/cgnslib/3.2.1

Once the configuration was complete, I did

make
make install

Module File Module File Location: /usr/local/modulefiles/libs/gcc/4.4.7/cgns/3.2.1

#%Module1.0#####################################################################
##
## cgns 3.2.1 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes the cgns 3.2.1 library available"
}

module-whatis "Makes the cgns 3.2.1 library available"
module load libs/hdf5/gcc/openmpi/1.8.14

set CGNS_DIR /usr/local/packages6/libs/gcc/4.4.7/cgnslib/3.2.1

prepend-path LD_LIBRARY_PATH $CGNS_DIR/lib
prepend-path CPATH $CGNS_DIR/include

CUDA

CUDA, which stands for Compute Unified Device Architecture, is a parallel computing platform and application programming interface (API) model created by NVIDIA. It allows software developers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing, an approach known as GPGPU

Usage There are several versions of the CUDA library available. As with many libraries installed on the system, CUDA libraries are made available via module commands which are only available once you have started a qrsh or qsh session.

The latest version of CUDA is loaded with the command


module load libs/cuda

Alternatively, you can load a specific version with one of the following

module load libs/cuda/7.5.18
module load libs/cuda/6.5.14
module load libs/cuda/4.0.17
module load libs/cuda/3.2.16

Compiling the sample programs You do not need to be using a GPU-enabled node to compile the sample programs but you do need a GPU to run them.

In a qrsh session

#Load modules
module load libs/cuda/7.5.18
module load compilers/gcc/4.9.2

#Copy CUDA samples to a local directory
#It will create a directory called NVIDIA_CUDA-7.5_Samples/
mkdir cuda_samples
cd cuda_samples
cp -r $CUDA_SDK .

#Compile (This will take a while)
cd NVIDIA_CUDA-7.5_Samples/
make

A basic test is to run one of the resulting binaries, deviceQuery.
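
For instance, a minimal sketch of this check is shown below; the sub-directory shown assumes the default samples build layout for 64-bit Linux, so adjust it if your build differs, and run it from a session on a GPU-equipped node:

# run from a GPU node; path assumes the default NVIDIA_CUDA-7.5_Samples build layout
cd NVIDIA_CUDA-7.5_Samples/bin/x86_64/linux/release
./deviceQuery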

Documentation

• CUDA Toolkit Documentation

• The power of C++11 in CUDA 7

Determining the NVIDIA Driver version Run the command

cat /proc/driver/nvidia/version

Example output is

NVRM version: NVIDIA UNIX x86_64 Kernel Module  340.32  Tue Aug 5 20:58:26 PDT 2014
GCC version:  gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)

Installation notes These are primarily for system administrators

CUDA 7.5.18

The device drivers were updated separately by one of the sysadmins.

A binary install was performed using a .run file

mkdir -p /usr/local/packages6/libs/binlibs/CUDA/7.5.18/

chmod +x ./cuda_7.5.18_linux.run
./cuda_7.5.18_linux.run --toolkit --toolkitpath=/usr/local/packages6/libs/binlibs/CUDA/7.5.18/cuda --samples --samplespath=/usr/local/packages6/libs/binlibs/CUDA/7.5.18/samples --no-opengl-libs -silent


Previous version

No install notes are available

Module Files

• The module file is on the system at /usr/local/modulefiles/libs/cuda/7.5.18

• The module file is on github.

curl

curl

Latest version 7.47.1
URL https://curl.haxx.se/

curl is an open source command line tool and library for transferring data with URL syntax, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMTP, SMTPS, Telnet and TFTP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, HTTP/2, cookies, user+password authentication (Basic, Plain, Digest, CRAM-MD5, NTLM, Negotiate and Kerberos), file transfer resume, proxy tunneling and more.

Usage There is a default version of curl available on the system but it is rather old

curl-config --version

gives the result

libcurl 7.19.7

Version 7.19.7 was released in November 2009!

A newer version of the library is available via the module system. To make it available

module load libs/gcc/4.4.7/curl/7.47.1

The curl-config command will now report the newer version

curl-config --version

Should result in

libcurl 7.47.1
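
If you compile your own code against the newer libcurl, one approach (a sketch using a hypothetical source file name) is to let curl-config supply the compile and link flags once the module is loaded:

# hypothetical example program; curl-config comes from the loaded module
gcc fetch_url.c $(curl-config --cflags --libs) -o fetch_url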

Documentation Standard man pages are available

man curl

Installation notes This section is primarily for administrators of the system.

Curl 7.47.1 was compiled with gcc 4.4.7

• Install script: install_curl_7.47.1.sh

• Module file: 7.47.1


fftw

fftw

Latest version 3.3.4
URL http://www.fftw.org/
Location /usr/local/packages6/libs/gcc/5.2/fftw/3.3.4

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).

Usage To make this library available, run the following module command

module load libs/gcc/5.2/fftw/3.3.4
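
If you are compiling your own code against this build of FFTW, a minimal sketch (hypothetical source file name; the include and library paths are the install location given above) is:

# hypothetical example: compile a C program against FFTW 3.3.4
gcc my_fft.c -I/usr/local/packages6/libs/gcc/5.2/fftw/3.3.4/include \
    -L/usr/local/packages6/libs/gcc/5.2/fftw/3.3.4/lib -lfftw3 -lm -o my_fft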

Installation notes This section is primarily for administrators of the system. FFTW 3.3.4 was compiled with gcc 5.2

module load compilers/gcc/5.2
mkdir -p /usr/local/packages6/libs/gcc/5.2/fftw/3.3.4
tar -xvzf fftw-3.3.4.tar.gz
cd fftw-3.3.4
./configure --prefix=/usr/local/packages6/libs/gcc/5.2/fftw/3.3.4 --enable-threads --enable-openmp --enable-shared
make
make check

Result was lots of numerical output and

--------------------------------------------------------------
          FFTW transforms passed basic tests!
--------------------------------------------------------------
--------------------------------------------------------------
      FFTW threaded transforms passed basic tests!
--------------------------------------------------------------

Installed with

make install

Module file Modulefile is on the system at /usr/local/modulefiles/libs/gcc/5.2/fftw/3.3.4

#%Module1.0#####################################################################
## fftw 3.3.4 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

module load compilers/gcc/5.2

proc ModulesHelp { } {


puts stderr "Makes the FFTW 3.3.4 library available"}

set FFTW_DIR /usr/local/packages6/libs/gcc/5.2/fftw/3.3.4

module-whatis "Makes the FFTW 3.3.4 library available"

prepend-path LD_LIBRARY_PATH $FFTW_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $FFTW_DIR/include
prepend-path LIBRARY_PATH $FFTW_DIR/lib

FLTK

FLTK

Version 1.3.3
URL http://www.fltk.org/index.php
Location /usr/local/packages6/libs/gcc/5.2/fltk/1.3.3

FLTK (pronounced “fulltick”) is a cross-platform C++ GUI toolkit for UNIX®/Linux® (X11), Microsoft® Windows®, and MacOS® X. FLTK provides modern GUI functionality without the bloat and supports 3D graphics via OpenGL® and its built-in GLUT emulation.

Usage

module load libs/gcc/5.2/fltk/1.3.3
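
Since the module adds the FLTK bin directory to your PATH, the fltk-config helper can supply compile and link flags for your own code. A minimal sketch (hypothetical source file name) is:

# hypothetical example: compile a C++ FLTK program
g++ my_gui.cpp $(fltk-config --cxxflags --ldflags) -o my_gui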

Installation notes This section is primarily for administrators of the system.

• This is a pre-requisite for GNU Octave version 4.0

• It was built with gcc 5.2

module load compilers/gcc/5.2
mkdir -p /usr/local/packages6/libs/gcc/5.2/fltk/1.3.3
tar -xvzf ./fltk-1.3.3-source.tar.gz
cd fltk-1.3.3

#Fixes error relating to undefined _ZN18Fl_XFont_On_Demand5valueEv
#Source https://groups.google.com/forum/#!topic/fltkgeneral/GT6i2KGCb3A
sed -i 's/class Fl_XFont_On_Demand/class FL_EXPORT Fl_XFont_On_Demand/' FL/x.H

./configure --prefix=/usr/local/packages6/libs/gcc/5.2/fltk/1.3.3 --enable-shared --enable-xft
make
make install

Module File Modulefile at /usr/local/modulefiles/libs/gcc/5.2/fltk/1.3.3

#%Module1.0#####################################################################
## fltk 1.3.3 module file
##


## Module file logging
source /usr/local/etc/module_logging.tcl
##

module load compilers/gcc/5.2

proc ModulesHelp { } {
    puts stderr "Makes the FLTK 1.3.3 library available"
}

set FLTK_DIR /usr/local/packages6/libs/gcc/5.2/fltk/1.3.3

module-whatis "Makes the FLTK 1.3.3 library available"

prepend-path LD_LIBRARY_PATH $FLTK_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $FLTK_DIR/include
prepend-path LIBRARY_PATH $FLTK_DIR/lib
prepend-path PATH $FLTK_DIR/bin

geos

geos

Version 3.4.2
Support Level Bronze
Dependencies compilers/gcc/4.8.2
URL http://trac.osgeo.org/geos/
Location /usr/local/packages6/libs/gcc/4.8.2/geos/3.4.2

GEOS - Geometry Engine, Open Source

Usage To make this library available, run the following module commands

module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/geos/3.4.2

We load version 4.8.2 of gcc since gcc 4.8.2 was used to build this version of geos.

The rgeos interface in R rgeos is a CRAN package that provides an R interface to geos. It is not installed in R by default so you need to install a version in your home directory.

After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command. Run the following module commands

module load apps/R/3.2.0
module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/geos/3.4.2

Launch R and run the command

install.packages('rgeos')

If you’ve never installed an R package before on the system, it will ask you if you want to install to a personal library.Answer y to any questions you are asked.


The library will be installed to a sub-directory called R in your home directory and you should only need to perform the above procedure once.

Once you have performed the installation, you will only need to run the module commands above to make the geos library available to the system. Then, you use rgeos as you would any other library in R

library('rgeos')

Installation notes This section is primarily for administrators of the system.

qrsh
tar -xvjf ./geos-3.4.2.tar.bz2
cd geos-3.4.2
mkdir -p /usr/local/packages6/libs/gcc/4.8.2/geos/3.4.2
module load compilers/gcc/4.8.2
./configure prefix=/usr/local/packages6/libs/gcc/4.8.2/geos/3.4.2

Potentially useful output at the end of the configure run

Swig: false
Python bindings: false
Ruby bindings: false
PHP bindings: false

Once the configuration was complete, I did

make
make install

Testing Compile and run the test-suite with

make check

All tests passed.

Module File Module File Location: /usr/local/modulefiles/libs/gcc/4.8.2/geos/3.4.2

more /usr/local/modulefiles/libs/gcc/4.8.2/geos/3.4.2

#%Module1.0#####################################################################
## geos 3.4.2 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes the geos 3.4.2 library available"
}

set GEOS_DIR /usr/local/packages6/libs/gcc/4.8.2/geos/3.4.2

module-whatis "Makes the geos 3.4.2 library available"

prepend-path LD_LIBRARY_PATH $GEOS_DIR/lib
prepend-path PATH $GEOS_DIR/bin


HDF5

HDF5

Version 1.8.16, 1.8.15-patch1, 1.8.14 and 1.8.13
Dependencies gcc or pgi compiler, openmpi (optional)
URL http://www.hdfgroup.org/HDF5/
Documentation http://www.hdfgroup.org/HDF5/doc/

HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.

Two primary versions of this library are provided, MPI parallel enabled versions and serial versions.

Usage - Serial The serial versions were built with gcc version 4.8.2. As such, if you are going to build anything against these versions of HDF5, we recommend that you use gcc 4.8.2 which can be enabled with the following module command

module load compilers/gcc/4.8.2

To enable the serial version of HDF5, use one of the following module commands depending on which version of the library you require:

module load libs/hdf5/gcc/1.8.14
module load libs/hdf5/gcc/1.8.13

Usage – Parallel There are multiple versions of parallel HDF5 installed with different openmpi and compiler versions.

Two versions of HDF5 were built using gcc version 4.4.7 and OpenMPI version 1.8.3. Version 4.4.7 of gcc is the default compiler on the system so no module command is required for this

module load libs/hdf5/gcc/openmpi/1.8.14
module load libs/hdf5/gcc/openmpi/1.8.13

One version was built with the PGI compiler version 15.7 and openmpi version 1.8.8

module load libs/hdf5/pgi/1.8.15-patch1

The above module also loads the relevant modules for OpenMPI and PGI Compiler. To see which modules have been loaded, use the command module list

Finally, another version was built with GCC 4.4.7 and OpenMPI 1.10.1; this version is also linked against ZLIB and SZIP:

module load libs/gcc/4.4.7/openmpi/1.10.1/hdf5/1.8.16
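
Once one of the parallel modules is loaded, a rough sketch of compiling an MPI program that uses HDF5 (hypothetical source file name; add explicit -I and -L flags pointing at the HDF5 install if the wrapper does not pick up the headers and libraries automatically) is:

# hypothetical example: compile an MPI C program against parallel HDF5
mpicc my_parallel_io.c -lhdf5 -o my_parallel_io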

Installation notes This section is primarily for administrators of the system.

Version 1.8.16 built using GCC Compiler, with separate ZLIB and SZIP

• install_hdf5.sh Install script

• 1.8.16


Version 1.8.15-patch1 built using PGI Compiler

Here are the build details for the module libs/hdf5/pgi/1.8.15-patch1

Compiled using PGI 15.7 and OpenMPI 1.8.8

• install_pgi_hdf5_1.8.15-patch1.sh Install script

• Modulefile located on the system at /usr/local/modulefiles/libs/hdf5/pgi/1.8.15-patch1

gcc versions

This package is built from the source code distribution from the HDF Group website.

Two primary versions of this library are provided, an MPI parallel enabled version and a serial version. The serial version has the following configuration flags enabled:

--enable-fortran --enable-fortran2003 --enable-cxx --enable-shared

The parallel version has the following flags:

--enable-fortran --enable-fortran2003 --enable-shared --enable-parallel

The parallel library does not support C++, hence it being disabled for the parallel build.

icu - International Components for Unicode

icu

Latest Version 55
URL http://site.icu-project.org/

ICU is the premier library for software internationalization, used by a wide array of companies and organizations

Usage Version 55 of the icu library requires gcc version 4.8.2. To make the compiler and library available, run the following module commands

module load compilers/gcc/4.8.2
module load libs/gcc/4.8.2/icu/55

Version 42 of the icu library uses gcc version 4.4.7 which is the default on Iceberg. To make the library available, run the following command

module load libs/gcc/4.4.7/icu/42
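
Because the modules set LIBRARY_PATH and CPLUS_INCLUDE_PATH (see the module files below), you can link your own C++ code against ICU directly. A minimal sketch (hypothetical source file name) is:

# hypothetical example: link against ICU's common and i18n libraries
g++ my_unicode.cpp -licuuc -licui18n -o my_unicode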

Installation Notes This section is primarily for administrators of the system.

Version 55

Icu 55 is a pre-requisite for the version of boost required for an experimental R module used by one of our users.

module load compilers/gcc/4.8.2
tar -xvzf icu4c-55_1-src.tgz
cd icu/source
./runConfigureICU Linux/gcc --prefix=/usr/local/packages6/libs/gcc/4.8.2/icu/55/
make


Version 42

Icu version 42 was originally installed as a system RPM. This install moved icu to a module-based install

tar -xvzf icu4c-4_2_1-src.tgz
cd icu/source/
./runConfigureICU Linux/gcc --prefix=/usr/local/packages6/libs/gcc/4.4.7/icu/42
make
make install

Testing Version 55

make check

Last few lines of output were

[All tests passed successfully...]
Elapsed Time: 00:00:00.086
make[2]: Leaving directory `/home/fe1mpc/icu/icu/source/test/letest'
---------------
ALL TESTS SUMMARY:
All tests OK:  testdata intltest iotest cintltst letest
make[1]: Leaving directory `/home/fe1mpc/icu/icu/source/test'
make[1]: Entering directory `/home/fe1mpc/icu/icu/source'
verifying that icu-config --selfcheck can operate
verifying that make -f Makefile.inc selfcheck can operate
PASS: config selfcheck OK
rm -rf test-local.xml

Version 42

make check

Last few lines of output were

All tests passed successfully...]
Elapsed Time: 00:00:12.000
make[2]: Leaving directory `/home/fe1mpc/icu/source/test/cintltst'
---------------
ALL TESTS SUMMARY:
ok:  testdata iotest cintltst
===== ERRS:  intltest
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory `/home/fe1mpc/icu/source/test'
make: *** [check-recursive] Error 2

The error can be ignored since it is a bug in the test itself:

• http://sourceforge.net/p/icu/mailman/message/32443311/

Module Files Version 55 Module File Location: /usr/local/modulefiles/libs/gcc/4.8.2/icu/55

#%Module1.0#####################################################################
## icu 55 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##


proc ModulesHelp { } {
    puts stderr "Makes the icu library available"
}

set ICU_DIR /usr/local/packages6/libs/gcc/4.8.2/icu/55

module-whatis "Makes the icu library available"

prepend-path LD_LIBRARY_PATH $ICU_DIR/lib
prepend-path LIBRARY_PATH $ICU_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $ICU_DIR/include

Version 42 Module File Location: /usr/local/modulefiles/libs/gcc/4.4.7/icu/42

#%Module1.0#####################################################################
## icu 42 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes the icu library available"
}

set ICU_DIR /usr/local/packages6/libs/gcc/4.4.7/icu/42

module-whatis "Makes the icu library available"

prepend-path LD_LIBRARY_PATH $ICU_DIR/lib
prepend-path LIBRARY_PATH $ICU_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $ICU_DIR/include

libunistring

libunistring

Latest version 0.9.5
URL http://www.gnu.org/software/libunistring/#TOCdownloading/

Text files are nowadays usually encoded in Unicode, and may consist of very different scripts – from Latin letters to Chinese Hanzi –, with many kinds of special characters – accents, right-to-left writing marks, hyphens, Roman numbers, and much more. But the POSIX platform APIs for text do not contain adequate functions for dealing with particular properties of many Unicode characters. In fact, the POSIX APIs for text have several assumptions at their base which don’t hold for Unicode text.

This library provides functions for manipulating Unicode strings and for manipulating C strings according to the Unicode standard.

Usage To make this library available, run the following module command


module load libs/gcc/4.8.2/libunistring/0.9.5

This correctly populates the environment variables LD_LIBRARY_PATH, LIBRARY_PATH and CPLUS_INCLUDE_PATH.

Installation notes This section is primarily for administrators of the system. libunistring 0.9.5 was compiled with gcc 4.8.2

module load compilers/gcc/4.8.2
tar -xvzf ./libunistring-0.9.5.tar.gz
cd ./libunistring-0.9.5
./configure --prefix=/usr/local/packages6/libs/gcc/4.8.2/libunistring/0.9.5

#build
make
#Testing
make check
#Install
make install

Testing Run make check after make and before make install. This runs the test suite.

Results were

============================================================================
Testsuite summary for
============================================================================
# TOTAL: 492
# PASS:  482
# SKIP:  10
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================

Module file The module file is on the system at /usr/local/modulefiles/libs/gcc/4.8.2/libunistring/0.9.5

#%Module1.0#####################################################################
## libunistring 0.9.5 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes the libunistring library available"
}

set LIBUNISTRING_DIR /usr/local/packages6/libs/gcc/4.8.2/libunistring/0.9.5

module-whatis "Makes the libunistring library available"


prepend-path LD_LIBRARY_PATH $LIBUNISTRING_DIR/lib/
prepend-path LIBRARY_PATH $LIBUNISTRING_DIR/lib
prepend-path CPLUS_INCLUDE_PATH $LIBUNISTRING_DIR/include

MED

MED

Version 3.0.8
Support Level Bronze
URL http://www.salome-platform.org/downloads/current-version
Location /usr/local/packages6/libs/gcc/4.4.7/med/3.0.8

The purpose of the MED module is to provide a standard for storing and recovering computer data associated to numerical meshes and fields, and to facilitate the exchange between codes and solvers.

Usage To make this library available, run the following module command

module load libs/gcc/4.4.7/med/3.0.8

Installation notes This section is primarily for administrators of the system.

• This is a pre-requisite for Code Saturne version 4.0.

• It was built with gcc 4.4.7, openmpi 1.8.3 and hdf5 1.8.14

module load mpi/gcc/openmpi/1.8.3
tar -xvzf med-3.0.8.tar.gz
cd med-3.0.8
mkdir -p /usr/local/packages6/libs/gcc/4.4.7/med/3.0.8
./configure --prefix=/usr/local/packages6/libs/gcc/4.4.7/med/3.0.8 --disable-fortran --with-hdf5=/usr/local/packages6/hdf5/gcc-4.4.7/openmpi-1.8.3/hdf5-1.8.14/ --disable-python
make
make install

Fortran was disabled because otherwise the build failed with compilation errors. It’s not needed for Code Saturne 4.0.

Python was disabled because it didn’t have MPI support.

Testing The following was submitted as an SGE job from the med-3.0.8 build directory

#!/bin/bash

#$ -pe openmpi-ib 8
#$ -l mem=6G

module load mpi/gcc/openmpi/1.8.3
make check

All tests passed

Module File


#%Module1.0#####################################################################
## MED 3.0.8 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes the MED 3.0.8 library available"
}

module-whatis "Makes the MED 3.0.8 library available"

set MED_DIR /usr/local/packages6/libs/gcc/4.4.7/med/3.0.8

prepend-path LD_LIBRARY_PATH $MED_DIR/lib64
prepend-path CPATH $MED_DIR/include

NAG Fortran Library (Serial)

Produced by experts for use in a variety of applications, the NAG Fortran Library has a global reputation for its excellence and, with hundreds of fully documented and tested routines, is the largest collection of mathematical and statistical algorithms available.

This is the serial (1 CPU core) version of the NAG Fortran Library. For many routines, you may find it beneficial to use the parallel version of the library.

Usage There are several versions of the NAG Fortran Library available. The one you choose depends on which compiler you are using. As with many libraries installed on the system, NAG libraries are made available via module commands which are only available once you have started a qrsh or qsh session.

In addition to loading a module for the library, you will usually need to load a module for the compiler you are using.

NAG for Intel Fortran

Use the following command to make Mark 25 of the serial (1 CPU core) version of the NAG Fortran Library for Intel compilers available

module load libs/intel/15/NAG/fll6i25dcl

Once you have ensured that you have loaded the module for the Intel Compilers you can compile your NAG programusing

ifort your_code.f90 -lnag_mkl -o your_code.exe

which links to a version of the NAG library that’s linked against the high performance Intel MKL (which provides high-performance versions of the BLAS and LAPACK libraries). Alternatively, you can compile using

ifort your_code.f90 -lnag_nag -o your_code.exe

which is linked against a reference version of BLAS and LAPACK. If you are in any doubt as to which to choose, we suggest that you use -lnag_mkl


Running NAG’s example programs Most of NAG’s routines come with example programs that show how to use them. When you use the module command to load a version of the NAG library, the script nag_example becomes available. Providing this script with the name of the NAG routine you are interested in will copy the example program for that routine into your current working directory, then compile and run it.

For example, here is an example output for the NAG routine a00aaf which identifies the version of the NAG library you are using. If you try this yourself, the output you get will vary according to which version of the NAG library you are using.

nag_example a00aaf

If you have loaded the module for fll6i25dcl this will give the following output

Copying a00aafe.f90 to current directory
cp /usr/local/packages6/libs/intel/15/NAG/fll6i25dcl/examples/source/a00aafe.f90 .

Compiling and linking a00aafe.f90 to produce executable a00aafe.exe
ifort -I/usr/local/packages6/libs/intel/15/NAG/fll6i25dcl/nag_interface_blocks a00aafe.f90 /usr/local/packages6/libs/intel/15/NAG/fll6i25dcl/lib/libnag_nag.a -o a00aafe.exe

Running a00aafe.exe
./a00aafe.exe > a00aafe.r
A00AAF Example Program Results

*** Start of NAG Library implementation details ***

Implementation title: Linux 64 (Intel 64 / AMD64), Intel Fortran, Double Precision (32-bit integers)
           Precision: FORTRAN double precision

Product Code: FLL6I25DCL
        Mark: 25.1.20150610 (self-contained)

*** End of NAG Library implementation details ***

Functionality The key numerical and statistical capabilities of the Fortran Library are shown below.

• Click here for a complete list of the contents of the Library.

• Click here to see what’s new in Mark 25 of the library.

Numerical Facilities

• Optimization, both Local and Global

• Linear, quadratic, integer and nonlinear programming and least squares problems

• Ordinary and partial differential equations, and mesh generation

• Solution of dense, banded and sparse linear equations and eigenvalue problems

• Solution of linear and nonlinear least squares problems

• Curve and surface fitting and interpolation

• Special functions

• Numerical integration and integral equations

• Roots of nonlinear equations (including polynomials)

• Option Pricing Formulae

• Wavelet Transforms

Statistical Facilities


• Random number generation

• Simple calculations on statistical data

• Correlation and regression analysis

• Multivariate methods

• Analysis of variance and contingency table analysis

• Time series analysis

• Nonparametric statistics

Documentation

• The NAG Fortran Library Manual (Link to NAG’s website)

Installation notes fll6i25dcl

These are primarily for system administrators

tar -xvzf ./fll6i25dcl.tgz
./install.sh

The installer is interactive. Answer the installer questions as follows

Do you wish to install NAG Mark 25 Library? (yes/no):yes

License file gets shown

[accept/decline]? :accept

Where do you want to install the NAG Fortran Library Mark 25?
Press return for default location (/opt/NAG)
or enter an alternative path.
The directory will be created if it does not already exist.
> /usr/local/packages6/libs/intel/15/NAG/

Module Files fll6i25dcl

• The module file is on the system at /usr/local/modulefiles/libs/intel/15/NAG/fll6i25dcl

• The module file is on github.

NetCDF (fortran)

NetCDF

Latest version 4.4.3
Dependencies mpi/gcc/openmpi/1.10.1 libs/gcc/4.4.7/openmpi/1.10.1/hdf5/1.8.16
URL http://www.unidata.ucar.edu/software/netcdf/
Documentation http://www.unidata.ucar.edu/software/netcdf/docs/


NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. NetCDF was developed and is maintained at Unidata. Unidata provides data and software tools for use in geoscience education and research. Unidata is part of the University Corporation for Atmospheric Research (UCAR) Community Programs (UCP). Unidata is funded primarily by the National Science Foundation.

Usage This version of NetCDF was compiled using version 4.4.7 of the gcc compiler, openmpi 1.10.1 and HDF5 1.8.16. To make it available, run the following module command after starting a qsh or qrsh session

module load libs/gcc/4.4.7/openmpi/1.10.1/netcdf-fortran/4.4.3

This module also loads the relevant modules for OpenMPI and HDF5. To see which modules have been loaded, use the command module list

Installation notes This section is primarily for administrators of the system.

Version 4.4.3

• install_gcc_netcdf-fortran_4.4.3.sh Install script

• Modulefile.

NetCDF (PGI build)

NetCDF

Latest version 4.3.3.1
Dependencies PGI Compiler 15.7, PGI OpenMPI 1.8.8, PGI HDF5 1.8.15-patch1
URL http://www.unidata.ucar.edu/software/netcdf/
Documentation http://www.unidata.ucar.edu/software/netcdf/docs/

NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. NetCDF was developed and is maintained at Unidata. Unidata provides data and software tools for use in geoscience education and research. Unidata is part of the University Corporation for Atmospheric Research (UCAR) Community Programs (UCP). Unidata is funded primarily by the National Science Foundation.

Usage This version of NetCDF was compiled using version 15.7 of the PGI Compiler, OpenMPI 1.8.8 and HDF5 1.8.15-patch1. To make it available, run the following module command after starting a qsh or qrsh session

module load libs/pgi/netcdf/4.3.3.1

This module also loads the relevant modules for OpenMPI, PGI Compiler and HDF5. To see which modules have been loaded, use the command module list

Installation notes This section is primarily for administrators of the system.

Version 4.3.3.1

Compiled using PGI 15.7, OpenMPI 1.8.8 and HDF5 1.8.15-patch1

• install_pgi_netcdf_4.3.3.1.sh Install script

• Install logs are on the system at /usr/local/packages6/libs/pgi/netcdf/4.3.3.1/install_logs


• Modulefile located on the system at /usr/local/modulefiles/libs/pgi/netcdf/4.3.3.1

pcre

pcre

Version 8.37
Support Level Bronze
Dependencies None
URL http://www.pcre.org/
Location /usr/local/packages6/libs/gcc/4.4.7/pcre/8.37

The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. PCRE has its own native API, as well as a set of wrapper functions that correspond to the POSIX regular expression API. The PCRE library is free, even for building proprietary software.

Usage To make this library available, run the following module command after starting a qsh or qrsh session.

module load libs/gcc/4.4.7/pcre/8.37

This also makes the updated pcregrep command available and will replace the system version. Check the version you are using with pcregrep -V

pcregrep -V

pcregrep version 8.37 2015-04-28
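
The newer pcregrep can then be used much like grep but with Perl-style patterns; for example (hypothetical directory name):

# recursively search a source tree for a Perl-style pattern, printing line numbers
pcregrep -rn 'colou?r' ./src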

Installation notes This section is primarily for administrators of the system.

qrsh
tar -xvzf pcre-8.37.tar.gz
cd pcre-8.37
mkdir -p /usr/local/packages6/libs/gcc/4.4.7/pcre/8.37
./configure --prefix=/usr/local/packages6/libs/gcc/4.4.7/pcre/8.37

The configuration details were

pcre-8.37 configuration summary:

Install prefix .................. : /usr/local/packages6/libs/gcc/4.4.7/pcre/8.37
C preprocessor .................. : gcc -E
C compiler ...................... : gcc
C++ preprocessor ................ : g++ -E
C++ compiler .................... : g++
Linker .......................... : /usr/bin/ld -m elf_x86_64
C preprocessor flags ............ :
C compiler flags ................ : -g -O2 -fvisibility=hidden
C++ compiler flags .............. : -O2 -fvisibility=hidden -fvisibility-inlines-hidden
Linker flags .................... :
Extra libraries ................. :

Build 8 bit pcre library ........ : yes
Build 16 bit pcre library ....... : no
Build 32 bit pcre library ....... : no


Build C++ library ............... : yes
Enable JIT compiling support .... : no
Enable UTF-8/16/32 support ...... : no
Unicode properties .............. : no
Newline char/sequence ........... : lf
\R matches only ANYCRLF ......... : no
EBCDIC coding ................... : no
EBCDIC code for NL .............. : n/a
Rebuild char tables ............. : no
Use stack recursion ............. : yes
POSIX mem threshold ............. : 10
Internal link size .............. : 2
Nested parentheses limit ........ : 250
Match limit ..................... : 10000000
Match limit recursion ........... : MATCH_LIMIT
Build shared libs ............... : yes
Build static libs ............... : yes
Use JIT in pcregrep ............. : no
Buffer size for pcregrep ........ : 20480
Link pcregrep with libz ......... : no
Link pcregrep with libbz2 ....... : no
Link pcretest with libedit ...... : no
Link pcretest with libreadline .. : no
Valgrind support ................ : no
Code coverage ................... : no

Once the configuration was complete, I did

makemake install

Testing was performed and all tests passed

make check

============================================================================
Testsuite summary for PCRE 8.37
============================================================================
# TOTAL: 5
# PASS:  5
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================

Module File Module File Location: /usr/local/modulefiles/libs/gcc/4.4.7/pcre/8.37

#%Module1.0#####################################################################
## pcre 8.37 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##


proc ModulesHelp { } {
    puts stderr "Makes the pcre 8.37 library available"
}

module-whatis "Makes the pcre 8.37 library available"

set PCRE_DIR /usr/local/packages6/libs/gcc/4.4.7/pcre/8.37

prepend-path LD_LIBRARY_PATH $PCRE_DIR/lib
prepend-path CPATH $PCRE_DIR/include
prepend-path PATH $PCRE_DIR/bin

SCOTCH

SCOTCH

Latest version 6.0.4
URL http://www.labri.fr/perso/pelegrin/scotch/
Location /usr/local/packages6/libs/gcc/5.2/scotch/6.0.4

Software package and libraries for sequential and parallel graph partitioning, static mapping and clustering, sequential mesh and hypergraph partitioning, and sequential and parallel sparse matrix block ordering.

Usage To make this library available, run the following module command

module load libs/gcc/5.2/scotch/6.0.4

Installation notes This section is primarily for administrators of the system. SCOTCH 6.0.4 was compiled with gcc 5.2 using the bash file install_scotch.sh

SimBody

SimBody

Version 3.5.3
Support Level Bronze
Dependencies compilers/gcc/4.8.2
URL https://simtk.org/home/simbody
Location /usr/local/packages6/libs/gcc/4.8.2/simbody/3.5.3

Usage To make this library available, run the following module command

module load libs/gcc/4.8.2/simbody

Installing This section is primarily for administrators of the system.

Installed using the following procedure:


module load compilers/gcc/4.8.2

cmake ../simbody-Simbody-3.5.3/ -DCMAKE_INSTALL_PREFIX=/usr/local/packages6/libs/gcc/4.8.2/simbody/3.5.3

make -j 8

zlib

zlib

Version 1.2.8
URL http://zlib.net/
Location /usr/local/packages6/libs/gcc/4.4.7/zlib/1.2.8

A Massively Spiffy Yet Delicately Unobtrusive Compression Library.

Usage To make this library available, run the following module command

module load libs/gcc/4.4.7/zlib/1.2.8
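
If you are building your own code against this version of zlib, a minimal sketch (hypothetical source file name) is shown below; the explicit -I and -L flags point at the install location above so that you pick up 1.2.8 rather than the older system copy:

# hypothetical example: compile a C program against zlib 1.2.8
gcc my_compress.c -I/usr/local/packages6/libs/gcc/4.4.7/zlib/1.2.8/include \
    -L/usr/local/packages6/libs/gcc/4.4.7/zlib/1.2.8/lib -lz -o my_compress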

Installation notes This section is primarily for administrators of the system.

It was built with gcc 4.4.7

#!/bin/bash

install_dir=/usr/local/packages6/libs/gcc/4.4.7/zlib/1.2.8

wget http://zlib.net/zlib-1.2.8.tar.gz
tar -xvzf ./zlib-1.2.8.tar.gz
cd zlib-1.2.8

mkdir -p $install_dir

./configure --prefix=$install_dir
make
make install

Testing The library was tested with make check. The results were

make check

hello world
zlib version 1.2.8 = 0x1280, compile flags = 0xa9
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek: hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!

*** zlib test OK ***
hello world


zlib version 1.2.8 = 0x1280, compile flags = 0xa9
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek: hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!

*** zlib shared test OK ***
hello world
zlib version 1.2.8 = 0x1280, compile flags = 0xa9
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek: hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!

*** zlib 64-bit test OK ***

Module File Module location is /usr/local/modulefiles/libs/gcc/4.4.7/zlib/1.2.8. Module contents

#%Module1.0#####################################################################
## zlib 1.2.8 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes the zlib 1.2.8 library available"
}

module-whatis "Makes the zlib 1.2.8 library available"

set ZLIB_DIR /usr/local/packages6/libs/gcc/4.4.7/zlib/1.2.8

prepend-path LD_LIBRARY_PATH $ZLIB_DIR/lib
prepend-path CPATH $ZLIB_DIR/include
prepend-path MANPATH $ZLIB_DIR/share/man

Development Tools

CMake

CMake is a build tool commonly used when compiling other libraries.

CMake is installed in /usr/local/packages6/cmake.

Usage CMake can be loaded with:

module load compilers/cmake
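
A typical out-of-source build with the loaded CMake looks like the following sketch (the project being built and the install prefix are hypothetical):

# hypothetical example: configure, build and install a CMake-based project
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/my_software
make
make install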


Installation Run the following commands:

module load apps/python/2.7

./bootstrap --prefix=/usr/local/packages6/cmake/3.3.0/ --mandir=/usr/local/packages6/cmake/3.3.0/man --sphinx-man

gmake -j8

gmake install

GNU Compiler Collection (gcc)

The GNU Compiler Collection (gcc) is a widely used, free collection of compilers including C (gcc), C++ (g++) and Fortran (gfortran). The default version of gcc on the system is 4.4.7

gcc -v

Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC)

It is possible to switch to other versions of the gcc compiler suite using modules. After connecting to iceberg (see Connect to iceberg), start an interactive session with the qrsh or qsh command. Choose the version of the compiler you wish to use with one of the following commands

module load compilers/gcc/6.2
module load compilers/gcc/5.4
module load compilers/gcc/5.3
module load compilers/gcc/5.2
module load compilers/gcc/4.9.2
module load compilers/gcc/4.8.2
module load compilers/gcc/4.5.3

Alternatively load the most recent available version using

module load compilers/gcc

Confirm that you’ve loaded the version of gcc you wanted using gcc -v.
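
As a quick check, a C hello world program (hypothetical file name) can be compiled and run with the loaded compiler:

gcc hello.c -o hello
./hello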

Documentation man pages are available on the system. Once you have loaded the required version of gcc, type

man gcc

• What’s new in the gcc version 6 series?

• What’s new in the gcc version 5 series?

Installation Notes These notes are primarily for system administrators

• gcc version 6.2 was installed using :

– install_gcc_6.2.sh

– gcc 6.2 modulefile located on the system at /usr/local/modulefiles/compilers/gcc/6.2

• gcc version 5.4 was installed using :


– install_gcc_5.4.sh

– gcc 5.4 modulefile located on the system at /usr/local/modulefiles/compilers/gcc/5.4

• Installation notes for version 5.3 are not available.

• gcc version 5.2 was installed using :

– install_gcc_5.2.sh

– gcc 5.2 modulefile located on the system at /usr/local/modulefiles/compilers/gcc/5.2

• gcc version 4.9.2 was installed using :

– install_gcc_4.9.2.sh

– gcc 4.9.2 modulefile located on the system at /usr/local/modulefiles/compilers/gcc/4.9.2

• Installation notes for versions 4.8.2 and below are not available.

git

git

Latest version 2.5
Dependencies gcc 5.2
URL https://git-scm.com/

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Usage An old version of git is installed as part of the system’s operating system. As such, it is available everywhere, including on the log-in nodes

$ git --version
git version 1.7.1

This was released in April 2010. We recommend that you load the most up to date version using modules - something that can only be done after starting an interactive qrsh or qsh session

module load apps/gcc/5.2/git/2.5

Installation notes Version 2.5 of git was built with gcc 5.2 using the following install script and module file:

• install_git_2.5.sh

• git 2.5 modulefile located on the system at /usr/local/modulefiles/compilers/git/2.5

Intel Compilers

Intel Compilers help create C, C++ and Fortran applications that can take full advantage of the advanced hardware capabilities available in Intel processors and co-processors. They also simplify that development by providing high level parallel models and built-in features like explicit vectorization and optimization reports.


Making the Intel Compilers available After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To make one of the versions of the Intel Compiler available, run one of the following module commands

module load compilers/intel/15.0.3
module load compilers/intel/14.0
module load compilers/intel/12.1.15
module load compilers/intel/11.0

Compilation examples C

To compile the C hello world example into an executable called hello using the Intel C compiler

icc hello.c -o hello

C++

To compile the C++ hello world example into an executable called hello using the Intel C++ compiler

icpc hello.cpp -o hello

Fortran

To compile the Fortran hello world example into an executable called hello using the Intel Fortran compiler

ifort hello.f90 -o hello

Detailed Documentation Once you have loaded the module on Iceberg, man pages are available for Intel compiler products

man ifort
man icc

The following links are to Intel’s website

• User and Reference Guide for the Intel® C++ Compiler 15.0

• User and Reference Guide for the Intel® Fortran Compiler 15.0

• Step by Step optimizing with Intel C++ Compiler

Related Software on the system Users of the Intel Compilers may also find the following useful:

• NAG Fortran Library (Serial) - A library of over 1800 numerical and statistical functions.

Installation Notes The following notes are primarily for system administrators.

• Version 15.0.3

– The install is located on the system at /usr/local/packages6/compilers/intel/2015/

– The license file is at /usr/local/packages6/compilers/intel/license.lic

– The environment variable INTEL_LICENSE_FILE is set by the environment module and points to the license file location

– Download the files l_ccompxe_2015.3.187.tgz (C/C++) and l_fcompxe_2015.3.187.tgz (Fortran) from Intel Portal.

– Put the above .tgz files in the same directory as install_intel15.sh and silent_master.cfg


– Run install_intel15.sh

– To find what was required in the module file, I did

env > base.env
source /usr/local/packages6/compilers/intel/2015/composer_xe_2015.3.187/bin/compilervars.sh intel64
env > after_intel.env
diff base.env after_intel.env

– The module file is on iceberg at /usr/local/modulefiles/compilers/intel/15.0.3

version 14 and below Installation notes are not available for these older versions of the Intel Compiler.

NAG Fortran Compiler

The NAG Fortran Compiler is robust, highly tested, and valued by developers all over the globe for its checking capabilities and detailed error reporting. The Compiler is available on Unix platforms as well as for Microsoft Windows and Apple Mac platforms. Release 6.0 has extensive support for both legacy and modern Fortran features, and also supports parallel programming with OpenMP.

Making the NAG Compiler available After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To make the NAG Fortran Compiler available, run the following module command

module load compilers/NAG/6.0

Compilation examples To compile the Fortran hello world example into an executable called hello using the NAG compiler

nagfor hello.f90 -o hello

Detailed Documentation Once you’ve run the NAG Compiler module command, man documentation is available

man nagfor

Online documentation:

• PDF version of the NAG Fortran Compiler Manual

• NAG Fortran Compiler Documentation Index (NAG’s Website)

Installation Notes The following notes are primarily for system sysadmins

mkdir -p /usr/local/packages6/compilers/NAG/6.0
tar -xvzf ./npl6a60na_amd64.tgz
cd NAG_Fortran-amd64/

Run the interactive install script

./INSTALL.sh

Accept the license and answer the questions as follows


Install compiler binaries to where [/usr/local/bin]?
/usr/local/packages6/compilers/NAG/6.0

Install compiler library files to where [/usr/local/lib/NAG_Fortran]?
/usr/local/packages6/compilers/NAG/6.0/lib/NAG_Fortran

Install compiler man page to which directory [/usr/local/man/man1]?
/usr/local/packages6/compilers/NAG/6.0/man/man1

Suffix for compiler man page [1] (i.e. nagfor.1)?
Press enter

Install module man pages to which directory [/usr/local/man/man3]?
/usr/local/packages6/compilers/NAG/6.0/man/man3

Suffix for module man pages [3] (i.e. f90_gc.3)?
Press Enter

Licensing Add the license key to /usr/local/packages5/nag/license.lic

The license key needs to be updated annually before 31st July.

Point to this license file using the environment variable $NAG_KUSARI_FILE

This is done in the environment module

Module File Module file location is /usr/local/modulefiles/compilers/NAG/6.0

#%Module1.0#####################################################################
## NAG Fortran Compiler 6.0 module file
##

## Module file logging
source /usr/local/etc/module_logging.tcl
##

proc ModulesHelp { } {
    puts stderr "Makes the NAG Fortran Compiler v6.0 available"
}

module-whatis "Makes the NAG Fortran Compiler v6.0 available"

set NAGFOR_DIR /usr/local/packages6/compilers/NAG/6.0

prepend-path PATH $NAGFOR_DIR
prepend-path MANPATH $NAGFOR_DIR/man
setenv NAG_KUSARI_FILE /usr/local/packages5/nag/license.lic

PGI Compilers

The PGI Compiler suite offers C, C++ and Fortran Compilers. For full details of the features of this compiler suite, see PGI’s website at http://www.pgroup.com/products/pgiworkstation.htm


Making the PGI Compilers available After connecting to iceberg (see Connect to iceberg), start an interactive session with the qsh or qrsh command. To make one of the versions of the PGI Compiler Suite available, run one of the following module commands

module load compilers/pgi/15.10
module load compilers/pgi/15.7
module load compilers/pgi/14.4
module load compilers/pgi/13.1

Once you’ve loaded the module, you can check the version with

pgcc --version

Compilation examples C

To compile a C hello world example into an executable called hello using the PGI C compiler

pgcc hello.c -o hello

C++

To compile a C++ hello world example into an executable called hello using the PGI C++ compiler

pgc++ hello.cpp -o hello

Fortran

To compile a Fortran hello world example into an executable called hello using the PGI Fortran compiler

pgf90 hello.f90 -o hello

Additional resources

• Using the PGI Compiler with GPUs on Iceberg

Installation Notes Version 15.10

The interactive installer was slightly different to that of 15.7 (below) but the questions and answers were essentially the same.

Version 15.7

The installer is interactive. Most of the questions are obvious. Here is how I answered the rest

Installation type

A network installation will save disk space by having only one copy of the compilers and most of the libraries for all compilers on the network, and the main installation needs to be done once for all systems on the network.

1  Single system install
2  Network install

Please choose install option: 1

Path


Please specify the directory path under which the software will be installed.
The default directory is /opt/pgi, but you may install anywhere you wish,
assuming you have permission to do so.

Installation directory? [/opt/pgi] /usr/local/packages6/compilers/pgi

CUDA and AMD components

Install CUDA Toolkit Components? (y/n) y
Install AMD software components? (y/n) y

ACML version

This PGI version links with ACML 5.3.0 by default. Also available:
(1) ACML 5.3.0
(2) ACML 5.3.0 using FMA4

Enter another value to override the default (1)
1

Other questions

Install JAVA JRE [yes] yes
Install OpenACC Unified Memory Evaluation package? (y/n) n
Do you wish to update/create links in the 2015 directory? (y/n) y
Do you wish to install MPICH? (y/n) y
Do you wish to generate license keys? (y/n) n
Do you want the files in the install directory to be read-only? (y/n) n

The license file is on the system at /usr/local/packages6/compilers/pgi/license.dat and is a 5 seat network license. Licenses are only used at compile time.

Extra install steps Unlike gcc, the PGI Compilers do not recognise the environment variable LIBRARY_PATH which is used by a lot of installers to specify the locations of libraries at compile time. This is fixed by creating a siterc file at /usr/local/packages6/compilers/pgi/linux86-64/VER/bin/siterc with the following contents

# get the value of the environment variable LIBRARY_PATH
variable LIBRARY_PATH is environment(LD_LIBRARY_PATH);

# split this value at colons, separate by -L, prepend 1st one by -L
variable library_path is
default($if($LIBRARY_PATH,-L$replace($LIBRARY_PATH,":", -L)));

# add the -L arguments to the link line
append LDLIBARGS=$library_path;

Where VER is the version number in question: 15.7, 15.10 etc

At the time of writing (August 2015), this is documented on PGI’s website at https://www.pgroup.com/support/link.htm#lib_path_ldflags

Modulefile Version 15.10 The PGI compiler installer creates a suitable modulefile that’s configured to our system. It puts it at /usr/local/packages6/compilers/pgi/modulefiles/pgi64/15.10 so all that is required is to copy this to where we keep modules at /usr/local/modulefiles/compilers/pgi/15.10

Version 15.7


The PGI compiler installer creates a suitable modulefile that’s configured to our system. It puts it at /usr/local/packages6/compilers/pgi/modulefiles/pgi64/15.7 so all that is required is to copy this to where we keep modules at /usr/local/modulefiles/compilers/pgi/15.7

MPI

OpenMPI (gcc version)

OpenMPI (gcc version)

Latest Version 1.10.1
Dependencies gcc
URL http://www.open-mpi.org/

The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.

Module files The latest version is made available using

module load mpi/gcc/openmpi

Alternatively, you can load a specific version using one of

module load mpi/gcc/openmpi/1.10.1
module load mpi/gcc/openmpi/1.10.0
module load mpi/gcc/openmpi/1.8.8
module load mpi/gcc/openmpi/1.8.3
module load mpi/gcc/openmpi/1.6.4
module load mpi/gcc/openmpi/1.4.4
module load mpi/gcc/openmpi/1.4.3
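
Once a module is loaded, the MPI compiler wrappers and launcher become available. A minimal sketch (hypothetical source file name; production runs should request their cores through the scheduler rather than using a fixed process count on a login or interactive node) is:

# hypothetical example: compile an MPI C program and run it on 4 processes
mpicc hello_mpi.c -o hello_mpi
mpirun -np 4 ./hello_mpi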

Installation notes These are primarily for administrators of the system.

Version 1.10.1

Compiled using gcc 4.4.7.

• install_openMPI_1.10.1.sh Downloads, compiles and installs OpenMPI 1.10.1 using the system gcc.

• Modulefile 1.10.1 located on the system at /usr/local/modulefiles/mpi/gcc/openmpi/1.10.1

Version 1.10.0

Compiled using gcc 4.4.7.

• install_openMPI_1.10.0.sh Downloads, compiles and installs OpenMPI 1.10.0 using the system gcc.

• Modulefile 1.10 located on the system at /usr/local/modulefiles/mpi/gcc/openmpi/1.10.0

Version 1.8.8

Compiled using gcc 4.4.7.

• install_openMPI.sh Downloads, compiles and installs OpenMPI 1.8.8 using the system gcc.


• Modulefile located on the system at /usr/local/modulefiles/mpi/gcc/openmpi/1.8.8

OpenMPI (PGI version)


Latest Version 1.8.8
Support Level FULL
Dependencies PGI
URL http://www.open-mpi.org/

The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.

These versions of OpenMPI make use of the PGI compiler suite.

Module files

The latest version is made available using

module load mpi/pgi/openmpi

Alternatively, you can load a specific version using one of

module load mpi/pgi/openmpi/1.8.8
module load mpi/pgi/openmpi/1.6.4

Installation notes

These are primarily for administrators of the system.

Version 1.8.8

Compiled using PGI 15.7.

• install_openMPI.sh Downloads, compiles and installs OpenMPI 1.8.8 using v15.7 of the PGI Compiler.

• Modulefile located on the system at /usr/local/modulefiles/mpi/pgi/openmpi/1.8.8

Installation notes for older versions are not available.

There are many versions of applications, libraries and compilers installed on the iceberg cluster. In order to avoid conflict between these software items we employ a system called modules. Having logged onto a worker node, users are advised to select the software (and the version of the software) they intend to use by utilising the module command.

• Modules on Iceberg

• Applications

• Development Tools (including compilers)

• Libraries

• Scheduler Commands for interacting with the scheduler

• MPI Parallel Programming


2.2.2 Accessing Iceberg

Note: If you have not connected to iceberg before, the Getting Started documentation will guide you through the process.

Below is a list of the various ways to connect to or use iceberg:

Terminal Access

Warning: This page is a stub and needs writing. Please send a PR.

JupyterHub

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. It is an excellent interface to explore your data and share your research.

JupyterHub is a web interface for running notebooks on the iceberg cluster. It allows you to log in and have a notebook server started up on the cluster, with access to both your data stores and the resources of iceberg.

Warning: This service is currently experimental; if you use this service and encounter a problem, please provide feedback to research-it@sheffield.ac.uk.

Logging into the JupyterHub

To get started visit https://jupyter.shef.ac.uk and log in with your university account. A notebook session will be submitted to the iceberg queue once you have logged in; this can take a minute or two, so the page may seem to be loading for some time.

Note: There is currently no way of specifying any options to the queue when submitting the server job, so you cannot increase the amount of memory assigned or change the queue the job runs in. There is an ongoing project to add this functionality.

Using the Notebook on iceberg

To create a new notebook session, select the “New” menu on the right hand side of the top menu.


You will be presented with a menu showing all available conda environments that have Jupyter available. To learn more about installing custom Python environments with conda see Python.

To learn how to use the Jupyter Notebook see the official documentation.

Using the Notebook Terminal

In the “New” menu it is possible to start a terminal. This is a fully featured web terminal, running on a worker node. You can use this terminal to perform any command-line-only operation on iceberg, including configuring new Python environments and installing packages.

Troubleshooting

Below is a list of common problems:

1. If you modify the PYTHONPATH variable in your .bashrc file your Jupyter server may not start correctly, and you may encounter a 503 error after logging into the hub. The solution to this is to remove these lines from your .bashrc file.

2. If you have previously tried installing and running Jupyter yourself (i.e. not using this JupyterHub interface) then you may get 503 errors when connecting to JupyterHub due to the old .jupyter profile in your home directory; if you then find a jupyter log file in your home directory containing SSL WRONG_VERSION_NUMBER then try deleting (or renaming) the .jupyter directory in your home directory.

Remote Visualisation

Warning: This page is a stub and needs writing. Please send a PR.


2.2.3 Filestore on Iceberg

Every user on the system has access to three different types of filestore. They differ in terms of the amount of space available, the speed of the underlying storage system, frequency of backup and the time that the data can be left there.

Here are the current details of filestore available to each user.

Home directory

All users have a home directory in the location /home/username. The filestore quota is 10 GB per user.

Backup policy: /home has backup snapshots taken every 4 hours and we keep the 10 most recent. /home also has daily snapshots taken each night, and we keep 28 days worth, mirrored onto a separate storage system.

The filesystem is NFS.

Data directory

Every user has access to a much larger data-storage area provided at the location /data/username.

The quota for this area is 100 GB per user.

Backup policy: /data has snapshots taken every 4 hours and we keep the 10 most recent. /data also has daily snapshots taken each night, and we keep 7 days worth, but this is not mirrored.

The filesystem is NFS.

Fastdata directory

All users also have access to a large fast-access data storage area under /fastdata.

In order to avoid interference from other users’ files it is vitally important that you store your files in a directory created and named the same as your username, e.g.

mkdir /fastdata/yourusername

By default the directory you create will have world-read access - if you want to restrict read access to just your account then run

chmod 700 /fastdata/yourusername

after creating the directory. A more sophisticated sharing scheme would have private and public directories

mkdir /fastdata/yourusername
mkdir /fastdata/yourusername/public
mkdir /fastdata/yourusername/private

chmod 755 /fastdata/yourusername
chmod 755 /fastdata/yourusername/public
chmod 700 /fastdata/yourusername/private

The fastdata area provides 260 Terabytes of storage in total and takes advantage of the internal infiniband network for fast access to data.

Although /fastdata is available on all the worker nodes, only access from the Intel-based nodes ensures that you benefit from these speed improvements.


There are no quota controls on the /fastdata area but files older than 3 months will be automatically deleted without warning. We reserve the right to change this policy without warning in order to ensure efficient running of the service.

You can use the lfs command to find out which files under /fastdata are older than a certain number of days and hence approaching the time of deletion. For example, to find files 50 or more days old

lfs find -ctime +50 /fastdata/yourusername

/fastdata uses the Lustre filesystem. This does not support POSIX locking which can cause issues for some applications.

fastdata is optimised for large file operations and does not handle lots of small files very well. An example of how slow it can be for large numbers of small files is detailed at http://www.walkingrandomly.com/?p=6167

Shared directories

When you purchase an extra filestore from CiCS you should be informed of its name. Once you know this you canaccess it

• as a Windows-style (SMB) file share on machines other than Iceberg using \\uosfstore.shef.ac.uk\shared\

• as a subdirectory of /shared on Iceberg.

Note that this subdirectory will be mounted on demand on Iceberg: it will not be visible if you simply list the contents of the /shared directory but will be accessible if you cd (change directory) into it, e.g. cd /shared/my_group_file_share1
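As a quick illustration of this on-demand behaviour (reusing the example share name from above):

ls /shared                          # the share may not be listed yet
cd /shared/my_group_file_share1     # changing into it triggers the mount
ls                                  # the share contents are now visible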

A note regarding permissions: behind the scenes, the file server that provides this shared storage manages permissions using Windows-style ACLs (which can be set by area owners via the Group Management web interface). However, the filesystem is mounted on a Linux cluster using NFSv4 so the file server therefore requires a means for mapping Windows-style permissions to Linux ones. An effect of this is that the Linux mode bits as seen on Iceberg are not always to be believed for files under /shared: the output of ls -l somefile.sh may indicate that a file is readable/writable/executable when the ACLs are what really determine access permissions. Most applications have robust ways of checking for properties such as executability but some applications can cause problems when accessing files/directories on /shared by naively checking permissions just using Linux mode bits:

• which: a directory under /shared may be on your path and you may be able to run a contained executable without prefixing it with an absolute/relative directory, but which may fail to find that executable.

• Perl: scripts that check for executability of files on /shared using -x may fail unless Perl is explicitly told to test for file permissions in a more thorough way (see the mention of use filetest ’access’ here).

Determining your current filestore allocation

To find out your current filestore quota allocation and usage type quota.

If you exceed your file storage allocation

As soon as the quota is exceeded your account becomes frozen. In order to avoid this situation it is strongly recommended that you

• Use the quota command to check your usage regularly.

• Copy files that do not need to be backed up to the /data/username area, or remove them from iceberg completely.


Efficiency considerations - The /scratch areas

For jobs requiring a lot of Input and Output (I/O), it may sometimes be necessary to store copies of the data on the actual compute node on which your job is running. For this, you can create temporary areas of storage under the directory /scratch. The /scratch area is local to each worker node and is not visible to the other worker nodes or to the head-nodes. Therefore any data created by jobs should be transferred to either your /data or /home area before the job finishes if you wish to keep them.

The next best I/O performance that requires the minimum amount of work is achieved by keeping your data in the /fastdata area and running your jobs on the new Intel nodes by specifying -l arch=intel in your job submission script.
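A minimal sketch of a job script header that follows this second approach (the executable name is purely illustrative):

#!/bin/bash
# Run on the Intel nodes so that /fastdata access benefits from the fast interconnect
#$ -l arch=intel

cd /fastdata/yourusername
./my_io_heavy_program    # hypothetical I/O-intensive executable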

These methods provide much faster access to data than the network attached storage on either /home or /data areas, but you must remember to copy important data back onto your /home area.

If you decide to use the /scratch area we recommend that under /scratch you create a directory with the same name as your username and work under that directory to avoid the possibility of clashing with other users.
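A hedged sketch of that pattern; the results.dat filename is illustrative only:

# Create and use a per-user directory on the node-local /scratch
mkdir -p /scratch/$USER
cd /scratch/$USER

# ... run the I/O-intensive part of the job here ...

# Copy anything you want to keep back to permanent storage before the job ends
cp results.dat /data/$USER/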

Anything under /scratch is deleted periodically when the worker-node is idle, whereas files on the /fastdata area will be deleted only when they are 3 months old.

/scratch uses the ext4 filesystem.

Recovering snapshots

We take regular back-ups of your /home and /data directories and it is possible to directly access a limited subset of them.

There are 7 days worth of snapshots available in your /home and /data directories in a hidden directory called .snapshot. You need to explicitly cd into this directory to get at the files:

cd /home/YOURUSERNAME/.snapshot

The files are read-only. This allows you to attempt to recover any files you might have accidentally deleted recently.
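A recovery might look like the sketch below; the snapshot directory names depend on when each snapshot was taken, and myfile.txt is an illustrative filename:

cd /home/YOURUSERNAME/.snapshot
ls                                      # list the available snapshot directories
cp <snapshot-directory>/myfile.txt ~/   # copy the deleted file back to your home area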

This does not apply for /fastdata for which we take no back-ups.

2.2.4 Iceberg specifications

• Total CPUs: 3440 cores

• Total GPUs: 16 units

• Total Memory: 31.8 TBytes

• Permanent Filestore: 45 TBytes

• Temporary Filestore: 260 TBytes

• Physical size: 8 Racks

• Maximum Power Consumption: 83.7 KWatts

• All nodes are connected via fast infiniband.

For reliability, there are two iceberg head-nodes ‘for logging in’ configured to take over from each other in case of hardware failure.


Worker Nodes CPU Specifications

Intel Ivybridge based nodes

• 92 nodes, each with 16 cores and 64 GB of total memory (i.e. 4 GB per core).

• 4 nodes, each with 16 cores and 256 GB of total memory (i.e. 16GB per core).

• Each node uses 2 of Intel E5 2650V2 8-core processors (hence 2*8=16 cores).

• Scratch space on the local disk of each node is 400 GB

Intel Westmere based nodes

• 103 nodes, each with 12 cores and 24 GB of total memory ( i.e. 2 GB per core )

• 4 nodes with 12 cores and 48 GB of total memory ( i.e. 4GB per core )

• Each node uses 2 of Intel X5650 6-core processors ( hence 2*6=12 cores )

GPU Units Specifications

8 Nvidia Tesla Kepler K40Ms GPU units

• Each GPU unit contains 2880 thread processor cores

• Each GPU unit has 12GB of GDDR memory. Hence total GPU memory is 8*12=96 GB

• Each GPU unit is capable of about 4.3 TFlops of single precision floating point performance, or 1.4 TFlops at double precision.

8 Nvidia Tesla Fermi M2070s GPU units

• Each GPU unit contains 448 thread processor cores

• Each GPU unit contains 6GB of GDDR memory. Hence total GPU memory is 8*6=48 GB

• Each GPU unit is capable of about 1 TFlop of single precision floating point performance, or 0.5 TFlops at double precision.

Software and Operating System

Users normally log into a head node and then use one (or more) of the worker nodes to run their jobs on. Scheduling of users’ jobs on the worker nodes is managed by the ‘Sun Grid Engine’ software. Jobs can be run interactively (qsh) or submitted as batch jobs (qsub).

• The operating system is 64-bit Scientific Linux (which is Redhat based) on all nodes

• The Sun Grid Engine for batch and interactive job scheduling

• Many Applications, Compilers, Libraries and Parallel Processing Tools. See Software on iceberg

2.3 ShARC

The University of Sheffield’s new HPC service, ShARC, is currently under development and is not yet ready for users to access. Please use our current system, Iceberg, for now.


2.3.1 Software on ShARC

These pages list the software available on ShARC. If you notice an error or an omission, or wish to request new software please create an issue on our github page

Applications on Sharc

Libraries on Sharc

Development Tools on Sharc

Parallel Systems


2.4 Parallel Computing

Modern computers contain more than one processor with a typical laptop usually containing either 2 or 4 (if you think your laptop contains 8, it is because you have been fooled by Hyperthreading). Systems such as Iceberg or Sharc contain many hundreds of processors and the key to making your research faster is to learn how to distribute your work across them. If your program is designed to run on only one processor, running it on our systems without modification will not make it any faster (it may even be slower!). Learning how to use parallelisation technologies is vital.

This section explains how to use the most common parallelisation technologies on our systems.

If you need advice on how to parallelise your workflow, please contact the Research Software Engineering Group

2.4.1 SGE Job Arrays

The simplest way of exploiting parallelism on the clusters is to use Job Arrays. A Job Array is a set of batch jobs run from a single job script. For example

#!/bin/bash
#
#$ -t 1-100
#
echo "Task id is $SGE_TASK_ID"

./myprog.exe $SGE_TASK_ID > output.$SGE_TASK_ID

The above script will submit 100 tasks to the system at once. The difference between each of these 100 jobs is the value of the environment variable $SGE_TASK_ID which will range from 1 to 100, determined by the line #$ -t 1-100. In this example, the program myprog.exe will be run 100 times with differing input values of $SGE_TASK_ID. 100 output files will be created with filenames output.1, output.2 and so on.

Job arrays are particularly useful for Embarrassingly Parallel problems such as Monte Carlo simulations (where $SGE_TASK_ID might correspond to a random number seed), or batch file processing (where $SGE_TASK_ID might refer to a file in a list of files to be processed).
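Assuming the script above is saved as my_array_job.sh (a name chosen here purely for illustration), it is submitted like any other batch job:

qsub my_array_job.sh    # submits all 100 tasks in one go
qstat                   # each task is listed with its own task ID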


Examples

• MATLAB SGE Job Array example

2.4.2 Shared Memory Parallelism (SMP)

Shared Memory Parallelism (SMP) is when multiple processors can operate independently but they have access to the same memory space. On our systems, programs that use SMP can make use of only the resources available on a single node. This limits you to 16 processor cores on our current system.

If you wish to scale to more processors, you should consider more complex parallelisation schemes such as MPI or Hybrid OpenMP/MPI.

OpenMP

OpenMP (Open Multi-Processing) is one of the most common programming frameworks used to implement shared memory parallelism. To submit an OpenMP job to our system, we make use of the OpenMP parallel environment which is specified by adding the line

#$ -pe openmp N

to your job submission script, changing the value N to the number of cores you wish to use. Here’s an example for 4 cores

#!/bin/bash
# Request 4 cores from the scheduler
#$ -pe openmp 4
# Request 4 Gb of memory per CPU core
#$ -l rmem=4G
#$ -l mem=4G

# Tell the OpenMP executable to use 4 cores
export OMP_NUM_THREADS=4
./myprogram

Note that you have to specify the number of cores twice: once to the scheduler (#$ -pe openmp 4) and once to the OpenMP runtime environment (export OMP_NUM_THREADS=4).
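To avoid keeping the two numbers in sync by hand, you can usually use the $NSLOTS environment variable, which Sun Grid Engine sets to the number of slots granted to the job:

# Match the OpenMP thread count to the number of slots granted by the scheduler
export OMP_NUM_THREADS=$NSLOTS
./myprogram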

Running other SMP schemes

Any system that uses shared memory parallelism, such as pthreads, Python’s multiprocessing module, R’s mclapply and so on, can be submitted using the OpenMP parallel environment.
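As an illustration, a Python multiprocessing workload could be submitted through the same parallel environment; the script name below is hypothetical and the Python module is one of those mentioned elsewhere in this documentation:

#!/bin/bash
# Request 4 cores via the OpenMP parallel environment
#$ -pe openmp 4
#$ -l rmem=2G
#$ -l mem=6G

module load apps/python/anaconda3-2.5.0
python my_multiprocessing_script.py    # hypothetical script using multiprocessing.Pool(4)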

2.4.3 Message Passing Interface (MPI)

The Message Passing Interface (MPI) Standard is a specification for a message passing library. MPI was originally designed for distributed memory architectures and is used on systems ranging from a few interconnected Raspberry Pis through to the UK’s national supercomputer, Archer.

MPI Implementations

Our systems have several MPI implementations installed. See the MPI section in software for details


Example MPI jobs

Some example MPI jobs are available in the HPC Examples repository of Sheffield’s Research Software Engineering group

Batch MPI

The parallel environment to submit to is openmpi-ib. Here is an example that requests 4 slots with 8 GB per slot using the gcc implementation of OpenMPI

#!/bin/bash
#$ -l h_rt=1:00:00
# Change 4 to the number of slots you want
#$ -pe openmpi-ib 4
# 8Gb per slot
#$ -l rmem=8G
#$ -l mem=8G

module load mpi/gcc/openmpi
mpirun ./executable

Interactive MPI

Our general-access interactive queues currently don’t have any MPI-compatible parallel environments enabled. Thus, it is not possible to run MPI jobs interactively.

MPI Training

Course notes from the national supercomputing centre are available here

2.4.4 Hybrid OpenMP / MPI

Support for Hybrid OpenMP/MPI is in the preliminary stages. Here is an example job submission for an executable that combines OpenMP with MPI

#$ -pe openmpi-hybrid-4 8
#$ -l rmem=2G
#$ -l mem=6G
module load mpi/intel/openmpi
export OMP_NUM_THREADS=4
mpirun -bynode -np 2 -cpus-per-proc 4 [executable + options]

There would be 2 MPI processes running, each with 4x (6GB mem, 2GB rmem) shared between the 4 threads per MPI process, for a total of 8 x (6GB mem, 2GB rmem) for the entire job. When we tried this, we got warnings saying that the -cpus-per-proc was getting deprecated. A quick google suggests that

mpirun -np 2 --map-by node:pe=1 [executable + options]

would be the appropriate replacement.


2.4.5 GPU Computing

Graphics Processing Units (GPUs) were, as the name suggests, originally designed for the efficient processing of graphics. Over time, they were developed into systems that were capable of performing general purpose computing which is why some people refer to modern GPUs as GP-GPUs (General Purpose Graphical Processing Units).

Graphics processing typically involves relatively simple computations that need to be applied to millions of on-screen pixels in parallel. As such, GPUs tend to be very quick and efficient at computing certain types of parallel workloads.

The GPUComputing@Sheffield website aims to facilitate the use of GPU computing within University of Sheffield research by providing resources for training and purchasing of equipment as well as providing a network of GPU users and research projects within the University.

Requesting access to GPU facilities

In order to ensure that the nodes hosting the GPUs run only GPU related tasks, we have defined a special project-group for accessing these nodes. If you wish to take advantage of the GPU processors, please contact [email protected] asking to join the GPU project group.

Any iceberg user can apply to join this group. However, because our GPU resources are limited we will need to discuss the needs of the user and obtain consent from the project leader before allowing usage.

GPU Community and NVIDIA Research Centre

The University of Sheffield has been officially affiliated with NVIDIA since 2011 as an NVIDIA CUDA Research Centre. As such NVIDIA offer us some benefits as a research institution including discounts on hardware, technical liaisons, online training and priority seed hardware for new GPU architectures. For first access to hardware, training and details of upcoming events, discussions and help please join the GPUComputing google group.

GPU Hardware on iceberg

Iceberg currently contains 16 GPU units:

• 8 Nvidia Tesla Kepler K40M GPU units. Each unit contains 2880 CUDA cores, 12GB of memory and is capable of up to 1.43 Teraflops of double precision compute power.

• 8 Nvidia Tesla Fermi M2070 GPU units. Each unit contains 448 CUDA cores, 6GB of memory and is capable of up to 515 Gigaflops of double precision compute power.

Compiling on the GPU using the PGI Compiler

The PGI Compilers are a set of commercial Fortran, C and C++ compilers from the Portland Group. To make use of them, first start an interactive GPU session and run one of the following module commands, depending on which version of the compilers you wish to use

module load compilers/pgi/13.1
module load compilers/pgi/14.4

The PGI compilers have several features that make them interesting to users of GPU hardware:


OpenACC Directives

OpenACC is a relatively new way of programming GPUs that can be significantly simpler to use than low-level language extensions such as CUDA or OpenCL. From the OpenACC website:

The OpenACC Application Program Interface describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator. OpenACC is designed for portability across operating systems, host CPUs, and a wide range of accelerators, including APUs, GPUs, and many-core coprocessors.

The directives and programming model defined in the OpenACC API document allow programmers to create high-level host+accelerator programs without the need to explicitly initialize the accelerator, manage data or program transfers between the host and accelerator, or initiate accelerator startup and shutdown.

For more details concerning OpenACC using the PGI compilers, see The PGI OpenACC website.

CUDA Fortran

In mid 2009, PGI and NVIDIA cooperated to develop CUDA Fortran. CUDA Fortran includes a Fortran 2003 compiler and tool chain for programming NVIDIA GPUs using Fortran.

• CUDA Fortran Programming Guide.

CUDA-x86

NVIDIA CUDA was developed to enable offloading computationally intensive kernels to massively parallel GPUs. Through API function calls and language extensions, CUDA gives developers explicit control over the mapping of general-purpose computational kernels to GPUs, as well as the placement and movement of data between an x86 processor and the GPU.

The PGI CUDA C/C++ compiler for x86 platforms allows developers using CUDA to compile and optimize their CUDA applications to run on x86-based workstations, servers and clusters with or without an NVIDIA GPU accelerator. When run on x86-based systems without a GPU, PGI CUDA C applications use multiple cores and the streaming SIMD (Single Instruction Multiple Data) capabilities of Intel and AMD CPUs for parallel execution.

• PGI CUDA-x86 guide.

Deep Learning with Theano on GPU Nodes

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano is most commonly used to perform deep learning and has excellent GPU support and integration through PyCUDA. The following steps can be used to set up and configure Theano on your own profile.

Request an interactive session with a sufficient amount of memory:

qsh -l gpu=1 -P gpu -l gpu_arch=nvidia-k40m -l mem=13G

Load the relevant modules

module load apps/python/anaconda3-2.5.0
module load libs/cuda/7.5.18

Create a conda environment to load relevant modules on your local user account

conda create -n theano python=3.5 anaconda3-2.5.0
source activate theano


Install the other Python module dependencies which are required using pip (alternatively these could be installed with conda if you prefer)

pip install theano
pip install nose
pip install nose-parameterized
pip install pycuda

For optimal Theano performance, enable the CUDA memory manager CNMeM. To do this, create the .theanorc file in your HOME directory and set the fraction of GPU memory reserved by Theano. The exact amount may have to be hand-picked: if Theano asks for more memory than is currently available on the GPU, an error will be thrown during import of the theano module. Create or edit the .theanorc file with nano

nano ~/.theanorc

Add the following lines and, if necessary, change the 0.8 number to whatever works for you:

[lib]
cnmem=0.8

Run python and verify that Theano is working correctly:

python -c "import theano;theano.test()"

Interactive use of the GPUs

Once you are included in the GPU project group you may start using the GPU enabled nodes interactively by typing

qsh -l gpu=1 -P gpu

The gpu= parameter determines how many GPUs you are requesting. Currently, the maximum number of GPUs allowed per job is set to 4, i.e. you cannot exceed gpu=4. Most jobs will only make use of one GPU.

If your job requires selecting the type of GPU hardware, one of the following two optional parameters can be used to make that choice

-l gpu_arch=nvidia-m2070
-l gpu_arch=nvidia-k40m

Interactive sessions provide you with 2 Gigabytes of CPU RAM by default which is significantly less than the amount of GPU RAM available. This can lead to issues where your session has insufficient CPU RAM to transfer data to and from the GPU. As such, it is recommended that you request enough CPU memory to communicate properly with the GPU

-l gpu_arch=nvidia-m2070 -l mem=7G
-l gpu_arch=nvidia-k40m -l mem=13G

The above will give you 1 GB more CPU RAM than GPU RAM for each of the respective GPU architectures.

Submitting GPU jobs

Interactive use of the GPUs

You can access GPU enabled nodes interactively by typing

qsh -l gpu=1

The gpu= parameter determines how many GPUs you are requesting. Currently, the maximum number of GPUs allowed per job is set to 4, i.e. you cannot exceed gpu=4. Most jobs will only make use of one GPU.

If your job requires selecting the type of GPU hardware, one of the following two optional parameters can be used to make that choice


-l gpu_arch=nvidia-m2070
-l gpu_arch=nvidia-k40m

Interactive sessions provide you with 2 Gigabytes of CPU RAM by default which is significantly less than the amount of GPU RAM available. This can lead to issues where your session has insufficient CPU RAM to transfer data to and from the GPU. As such, it is recommended that you request enough CPU memory to communicate properly with the GPU

-l gpu_arch=nvidia-m2070 -l mem=7G
-l gpu_arch=nvidia-k40m -l mem=13G

The above will give you 1 GB more CPU RAM than GPU RAM for each of the respective GPU architectures.

Submitting batch GPU jobs

To run batch jobs on GPU nodes, edit your job file to include a request for GPUs, e.g. for a single GPU

#$ -l gpu=1

You can also use the gpu_arch parameter discussed above to target a specific GPU model

#$ -l gpu_arch=nvidia-m2070
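Putting these together, a minimal batch GPU job script might look like the sketch below; the CUDA module version is one of those listed later in this documentation and the executable name is illustrative:

#!/bin/bash
#$ -l gpu=1
#$ -l gpu_arch=nvidia-k40m
#$ -l mem=13G

module load libs/cuda/6.5.14
./my_cuda_program    # hypothetical GPU-enabled executable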

GPU enabled Software

Ansys

See Issue #25 on github

Maple

TODO

MATLAB

TODO

NVIDIA Compiler

TODO - Link to relevant section

PGI Compiler

TODO - Link to relevant section


Compiling on the GPU using the NVIDIA Compiler

To compile GPU code using the NVIDIA compiler, nvcc, first start an interactive GPU session. Next, you need to set up the compiler environment via one of the following module statements

module load libs/cuda/3.2.16
module load libs/cuda/4.0.17
module load libs/cuda/6.5.14

depending on the version of CUDA you intend to use. This makes the nvcc CUDA compiler available. For example

nvcc filename.cu -arch sm_20

will compile the CUDA program contained in the file filename.cu. The -arch flag above signifies the compute capability of the intended GPU hardware, which is 20 for our current GPU modules. If you are intending to generate code for older architectures you may have to specify sm_10 or sm_13 for example.

2.5 Troubleshooting

In this section, we’ll discuss some tips for solving problems with iceberg. It is suggested that you work through some of the ideas here before contacting the service desk for assistance.

2.5.1 Frequently Asked Questions

I’m a new user and my password is not recognised

When you get a username on the system, the first thing you need to do is to synchronise your passwords, which will set your password to be the same as your network password.

Strange things are happening with my terminal

Symptoms include many of the commands not working and just bash-4.1$ being displayed instead of your username at the bash prompt.

This may be because you’ve deleted your .bashrc and .bash_profile files - these are ‘hidden’ files which live in your home directory and are used to correctly set up your environment. If you hit this problem you can run the command resetenv which will restore the default files.

I can no longer log onto iceberg

If you are confident that you have no password or remote-access-client related issues but you still cannot log onto iceberg, you may be having problems due to exceeding your iceberg filestore quota. If you exceed your filestore quota in your /home area it is sometimes possible that crucial system files in your home directory get truncated, which affects the subsequent login process.

If this happens, you should immediately email [email protected] and ask to be unfrozen.

I cannot log into iceberg via the applications portal

Most of the time such problems arise due to JAVA version issues. As JAVA updates are released regularly, these problems are usually caused by changes to the JAVA plug-in for the browser. Follow the trouble-shooting link from the iceberg browser-access page to resolve these problems. There is also a link on that page to test the functionality


of your java plug-in. It can also help to try a different browser to see if it makes any difference. Failing all that, you may have to fall back to one of the non-browser access methods.

My batch job terminates without any messages or warnings

When a batch job is initiated using the qsub command or the runfluent, runansys or runabaqus commands, it gets allocated a specific amount of virtual memory and real-time. If a job exceeds either of these memory or time limits it gets terminated immediately and usually without any warning messages.

It is therefore important to estimate the amount of memory and time that is needed to run your job to completion and specify it at the time of submitting the job to the batch queue.

Please refer to the section on hitting-limits and estimating-resources for information on how to avoid these problems.

Exceeding your disk space quota

Each user of the system has a fixed amount of disk space available in their home directory. If you exceed this quota, various problems can emerge such as an inability to launch applications and run jobs. To see if you have exceeded your disk space quota, run the quota command:

quota

Size   Used  Avail  Use%  Mounted on
10.1G  5.1G      0  100%  /home/foo11b
 100G     0   100G    0%  /data/foo11b

In the above, you can see that the quota was set to 10.1 gigabytes and all of this is in use. Any jobs submitted by this user will likely result in an Eqw status. The recommended action is for the user to delete enough files to allow normal work to continue.
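One way to find out what is taking up the space is to list the size of everything in your home directory, for example:

# Show the size of each item in your home directory, largest last
# (hidden directories can also be significant)
du -sh ~/* | sort -h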

Sometimes, it is not possible to log in to the system because of a full quota, in which case you need to email research-it@sheffield.ac.uk and ask to be unfrozen.

I am getting warning messages and warning emails from my batch jobs about insufficient memory

There are two types of memory resources that can be requested when submitting batch jobs using the qsub command. These are virtual memory (-l mem=nnn) and real memory (-l rmem=nnn). The virtual memory limit specified should always be greater than or equal to the real memory limit specified.

If a job exceeds its virtual memory resource it gets terminated. However if a job exceeds its real memory resource it does not get terminated, but an email message is sent to the user asking them to specify a larger rmem= parameter the next time, so that the job can run more efficiently.

What are rmem (real memory) and mem (virtual memory)?

Running a program always involves loading the program instructions and also its data, i.e. all variables and arrays that it uses, into the computer’s “RAM” memory. A program’s entire instructions and its entire data, along with any dynamic link libraries it may use, define the VIRTUAL STORAGE requirements of that program. If we did not have clever operating systems we would need as much physical memory (RAM) as the virtual-storage requirements of that program. However, operating systems are clever enough to deal with situations where we have insufficient REAL MEMORY to load all the program instructions and data into the available Real Memory (i.e. RAM). This technique works because hardly any program needs to access all its instructions and its data simultaneously. Therefore the operating system loads into RAM only those bits of the instructions and data that are needed by the program at a given instance.


This is called PAGING and it involves copying bits of the program’s instructions and data to/from hard-disk to RAM as they are needed.

If the REAL MEMORY (i.e. RAM) allocated to a job is much smaller than the entire memory requirements of a job (i.e. VIRTUAL MEMORY) then there will be excessive need for ‘paging’ that will slow the execution of the program considerably due to the relatively slow speeds of transferring information to/from the disk into RAM.

On the other hand, if the Real Memory (RAM) allocated to a job is larger than the Virtual Memory requirement of that job then it will result in a waste of RAM resources, which will sit idle for the duration of that job.

It is therefore crucial to strike a fine balance between the VIRTUAL MEMORY (i.e. mem) and the PHYSICAL MEMORY (i.e. rmem) allocated to a job. The virtual memory limit defined by the -l mem parameter defines the maximum amount of virtual memory your job will be allowed to use. If your job’s virtual memory requirements exceed this limit during its execution your job will be killed immediately. The real memory limit defined by the -l rmem parameter defines the amount of RAM that will be allocated to your job.

The way we have configured SGE, if your job starts paging excessively your job is not killed, but you receive warning messages to increase the RAM allocated to your job next time by means of the rmem parameter.

It is important to make sure that your -l mem value is always greater than your -l rmem value so as not to waste the valuable RAM resources as mentioned earlier.
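For example, a job script header that follows this rule might request (values chosen purely for illustration):

# 8 GB of RAM, with up to 16 GB of virtual memory allowed
#$ -l rmem=8G
#$ -l mem=16G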

Insufficient memory in an interactive session

By default, an interactive session provides you with 2 Gigabytes of RAM (sometimes called real memory) and 6 Gigabytes of Virtual Memory. You can request more than this when running your qsh or qrsh command

qsh -l mem=64G -l rmem=8G

This asks for 64 Gigabytes of Virtual Memory and 8 Gigabytes of RAM (real memory). Note that you should

• not specify more than 768 Gigabytes of virtual memory (mem)

• not specify more than 256 GB of RAM (real memory) (rmem)

Windows-style line endings

If you prepare text files such as your job submission script on a Windows machine, you may find that they do not work as intended on the system. A very common example is when a job immediately goes into Eqw status after you have submitted it.

The reason for this behaviour is that Windows and Unix machines have different conventions for specifying ‘end of line’ in text files. Windows uses the control characters for ‘carriage return’ followed by ‘linefeed’, \r\n, whereas Unix uses just ‘linefeed’ \n.

The practical upshot of this is that a script prepared in Windows using Notepad looking like this

#!/bin/bash
echo 'hello world'

will look like the following to programs on a Unix system

#!/bin/bash\r
echo 'hello world'\r

If you suspect that this is affecting your jobs, run the following command on the system

dos2unix your_files_filename


error: no DISPLAY variable found with interactive job

If you receive the error message

error: no DISPLAY variable found with interactive job

the most likely cause is that you forgot the -X switch when you logged into iceberg. That is, you might have typed

ssh [email protected]

instead of

ssh -X [email protected]

Problems connecting with WinSCP

Some users have reported issues while connecting to the system using WinSCP, usually when working from home with a poor connection and when accessing folders with large numbers of files.

In these instances, turning off Optimize Connection Buffer Size in WinSCP can help:

• In WinSCP, go to the settings for the site (i.e. from the menu Session->Sites->Site Manager)

• From the Site Manager dialog, click on the selected session and click the Edit button

• Click the advanced button

• The Advanced Site Settings dialog opens.

• Click on connection

• Untick the box which says Optimize Connection Buffer Size

Login Nodes RSA Fingerprint

The RSA key fingerprint for Iceberg’s login nodes is “de:72:72:e5:5b:fa:0f:96:03:d8:72:9f:02:d6:1d:fd”.

2.6 Glossary of Terms

The worlds of scientific and high performance computing are full of technical jargon that can seem daunting to newcomers. This glossary attempts to explain some of the most common terms as quickly as possible.

HPC ‘High Performance Computing’. The exact definition of this term is sometimes a subject of great debate but for the purposes of our services you can think of it as ‘Anything that requires more resources than a typical high-spec desktop or laptop PC’.

GPU Acronym for Graphics Processing Unit.

SGE Acronym for Sun Grid Engine.

Sun Grid Engine The Sun Grid Engine is the software that controls and allocates resources on the system. There are many variants of ‘Sun Grid Engine’ in use worldwide and the variant in use at Sheffield is the Son of Grid Engine

Wallclock time The actual time taken to complete a job as measured by a clock on the wall or your wristwatch.
