94
Introductory Supercomputing

Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Introductory

Supercomputing

Page 2: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

• Understand basic parallel computing concepts and

workflows

• Understand the high-level architecture of a

supercomputer

• Use a basic workflow to submit a job and monitor it

• Understand a resource request and know what to

expect

• Check basic usage and accounting information

Course Objectives

Page 3: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

WORKFLOWS

Iron Man 2

Page 4: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Research enabled by computers is called eResearch.

eResearch examples:

– Simulations (e.g. prediction, testing theories)

– Data analysis (e.g. pattern matching, statistical analysis)

– Visualisation (e.g. interpretation, quality assurance)

– Combining datasets (e.g. data mashups, data mining)

A supercomputer is just one part of an eResearch

workflow.

eResearch Workflows

Page 5: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

eResearch Workflow

Collect• Kilobytes of parameters, or petabytes of data from a telescope or atomic collider

Stage• Data transfer from manual transfer of small files, to custom data management frameworks for large

projects

Process• Single large parallel job with check pointing, or many smaller parallel or serial jobs

Visualise• Examining values, generating images or movies, or interactive 3D projections

Archive• Storage of published results, or publicly accessible research datasets

Page 6: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Display

InstrumentSupercomputer(s)

Pawsey Centre

Vis cluster

Archive,databases

Archive,databases

scratch scratchcompute

• Provides multiple components of the workflow in one location

• Capacity to support significant computational and data requirements

The Pawsey Centre

Page 7: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Exploiting parallelism in a workflow allows us to

• get results faster, and

• break up big problems into manageable sizes.

A modern supercomputer is not a fast processor. It is

many processors working together in parallel.

Parallelism in Workflows

Page 8: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

It has a sequence of tasks that follow a recipe.

Just like a computer program!

Some tasks are independent

• Can be done in any order.

• Can be done in parallel.

Some tasks have prerequisites.

• Must be done sequentially.

Workflow Example – Cake Baking

Page 9: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

SupermarketKitchen pantry Kitchen bench

• “Staging” the ingredients improves access time.

• Ingredients are “prerequisites”.

• Each ingredient does not depend on others, so can

be moved in parallel, or at different times.

Baking a Cake - Staging

Page 10: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Measure ingredients Mix ingredients

Bake and cool cake

Mix icingIce cake

Stage ingredients Preheat oven

Serial

Parallel

Baking a Cake - Process

Page 11: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Either:

• Sample it for quality.

• Put it in a shop for others to browse and buy.

• Store it for later.

• Eat it!

Clean up!

Then make next cake.

Baking a Cake - Finished

Page 12: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Coarse-grained parallelism (high level)

• Different people baking cakes in their own kitchens.

• Preheating oven while mixing ingredients.

Greater autonomy, can scale to large problems and many helpers.

Fine-grained parallelism (low level)

• Spooning mixture into cupcake tray.

• Measuring ingredients.

Higher coordination requirement. Difficult to get many people to help on a single cupcake tray.

Levels of Parallelism

Page 13: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

What is your goal? – high throughput, to do one job fast, or solve a grand-challenge problem?

High Throughput:

• For many cakes, get many people to bake independently in their own kitchens – minimal coordination.

• Turn it into a production line. Use specialists and teams in some parts.

Doing one job fast:

• Experience as well as trial and error will find the optimal number of helpers.

How many helpers?

Page 14: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Workflows need supercomputing when the resources of a single laptop or workstation are not sufficient:

• The program takes too long to process

• There is not sufficient memory for the program

• The dataset is too large to fit on the computer

If you are unsure whether moving to a supercomputer will help, ask for help from Pawsey staff.

When to use supercomputing

World’s largest wedding cake6.8 tonnes

2004. Connecticut, USAPhoto: Guinness World Records

Page 15: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

SUPERCOMPUTERS

Cray 1, Cray Research Inc.

Page 16: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Data Movers

High Performance Storage

Login Nodes

Compute Nodes

Scheduler

Abstract Supercomputer

Page 17: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

• Remote access to the supercomputer

• Where users should manage workflows

• Many people (~100) share a login node

at the same time.

Do not run your programs on the login nodes!

• Use the login nodes to submit jobs to the queue to be executed on

the compute nodes

• Login nodes can have different hardware to compute nodes.

• Some build tests may fail if you try to compile on login nodes.

Login Nodes

Page 18: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

• Programs should run on

the compute nodes.

• Access is provided via the scheduler

• Compute nodes have a fast interconnect

that allows them to communicate with each other

• Jobs can span multiple compute nodes

• Individual nodes are not that different in performance from a

workstation

• Parallelism across compute nodes is how significant

performance improvements are achieved.

Compute Nodes

Page 19: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Each compute node has one or more CPUs:

• Each CPU has multiple cores

• Each CPU has memory attached to it

Each node has an external network connection

Some systems have accelerators (e.g. GPUs)

Inside a Compute Node

Page 20: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Fast storage “inside” the supercomputer

• Temporary working area

• Might have local node storage

• Not shared with other users

• Usually have global storage

• All nodes can access the filesystems

• Either directly connected to the interconnect, or via router nodes

• The storage is shared. Multiple simultaneous users on different

nodes will reduce performance

High Performance Storage

Page 21: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Externally connected servers that are

dedicated to moving data to and from the

high performance filesystems.

• Data movers are shared, but most users will not notice.

• Performance depends on the other end, distance, encryption

algorithms, and other concurrent transfers.

• Data movers see all the global filesystems

hpc-data.pawsey.org.au

Data Mover Nodes

Page 22: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

The scheduler feeds jobs into the compute nodes. It has a queue

of jobs and constantly optimizes how to get them all done.

As a user you interact with the queues.

The scheduler runs jobs on the compute nodes on your behalf.

Scheduler

Page 23: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

PAWSEY SYSTEMS

Pawsey Supercomputing Centre

Page 24: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Name System Nodes Cores/node RAM/node Notes

Magnus Cray XC40 1488 24 64 GB Aries interconnect

Galaxy Cray XC30 472 20 64 GB Aries interconnect, +64 K20X GPU nodes

Zeus HPE Cluster 90 28 98-128 GB +20 vis nodes, +15 K20/K40/K20X GPU nodes, +11 P100 GPU nodes, +80 KNL nodes, +1 6TB node

Pawsey Supercomputers

• Magnus should be used for large, parallel workflows.

• Galaxy is reserved for the operation of the ASKAP and

MWA radio telescopes.

• Zeus should be used for small parallel jobs, serial or single

node workflows, GPU-accelerated jobs, and remote

visualisation.

Page 25: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Filesystems are areas of storage with very different

purposes.

• Some are large, some are small.

• Some are semi-permanent, others temporary.

• Some are fast, some are slow.

High performance storage should not be used for long-

term storage.

Filesystems

Page 26: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

High performance Lustre filesystem for large data transfers

• Large 3 PB capacity, or 1.98 billion files

• Shared by all users, with no quotas

• Temporary space for queued, running, or recently finished

jobs

Your directory is /scratch/projectname/username

• Path is available in $MYSCRATCH environment variable

• Files on /scratch may be purged after 30 days

• Touching files to avoid purging violates Pawsey’s conditions

of use and will result in suspension of accounts

• Please delete your files before they are purged

Scratch Filesystem

Page 27: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

High performance Lustre filesystem for shared project files,

such as data sets and executables.

• Not as fast as /scratch, copy data before accessing it from the

compute nodes.

Your directory is /group/projectname

• Path is available in $MYGROUP environment variable

• Default 1TB quota: ̀ lfs quota -g projectname /group`

• Make sure files are set to your project group, not your username

group.

• Use `fix.group.permission.sh projectname` if needed

Group Filesystem

Page 28: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

You can use your home directory to store environment

configurations, scripts and small input files

• Small quota (about 10GB): `quota -s -f /home`

• Uses NFS, which is very slow compared to Lustre

• Where possible, use /group instead

Your directory is /home/username

• The path is available in the $HOME environment variable

Home Filesystem

Page 29: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

LOGGING IN

John Kogut using Cray X-MP at NCSA, University of Illinois

Page 30: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Within a terminal window, type:

ssh [email protected]

• Common terminal programs:

• Windows, use MobaXterm (download)

• Linux, use xterm (preinstalled)

• OS X, use Terminal (preinstalled) or xterm

(download)

Command line SSH

Page 31: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

• SSH uses fingerprints to identify computers … so you don't give your password to someone pretending to be the remote host

• We recommend that you set up SSH key authentication to increase the security of your account:

https://support.pawsey.org.au/documentation/display/US/Logging+in+with+SSH+keys

• Do not share your account with others, this violates the conditions of use (have the project leader add them to the project)

• Please do not provide your password in help desk tickets, we never ask for your password via email as it is not secure.

Account Security

Page 32: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

• Remote graphical interface for some tasks available. • Graphical debuggers, text editors, remote visualisation.

• Low performance over long distances.

• Very easy to set up for Linux, Mac and MobaXtermclients. Add -X flag to ssh.

ssh -X [email protected]

• For higher performance remote GUI, use VNC:

https://support.pawsey.org.au/documentation/display/US/Visualisation+documentation

Graphical Interfaces

Page 33: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

• Firewalls protect computers on the local network by only

allowing particular traffic in or out to the internet.

• Most computers are behind a firewall: Pawsey

supercomputers, university desktops, home desktops.

Firewalls

Image courtesy Wikipedia

Page 34: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

SCP and SFTP use the SSH port through firewalls.

• Pawsey firewalls allow both incoming/outgoing SSH

connections.

• Most organisations and ISPs do not allow incoming

SSH connections.

Firewalls and SSH

Pawsey

Supercomputers

Page 35: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

• Forgot password

• Self service reset

https://support.pawsey.org.au/password-reset/

• Scheduled maintenance

• Check your email or

https://support.pawsey.org.au/documentation/display/

US/Maintenance+and+Incidents

• Blacklisted IP due to too many failed login attempts

• This is a security precaution.

• Email the helpdesk with your username and the

machine you are attempting to log in to.

Common login problems

Page 36: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Log in to Zeus via ssh:

ssh [email protected]

Use git to download the exercise material:

Change into the downloaded directory:

cd Introductory-Supercomputing

List the contents of the directory:ls

• What are the message of the day announcements?

• What directories and files are in the exercises?

Exercise: Log in

git clone http://github.com/PawseySupercomputing/Introductory-Supercomputing

Page 37: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

USER ENVIRONMENT

Page 38: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Various software is provided to support work flows on the systems:

• Operating System (SLES or CLE)

• Compilers (e.g. Intel, GCC, Cray, PGI)

• Debuggers and Profilers (e.g. MAP, DDT)

• Performant mathematical libraries (e.g. MKL, Lapack, Petsc)

• Parallel programming libraries (e.g. MPI)

• File format libraries for parallel IO (e.g. HDF5)

Project groups are expected to manage the installation of their own software stack.

Applications that are widely used by a large number of groups may also be provided.

All of the above software is not immediately available as soon as you log in.

User Environment

Page 39: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

To prevent conflicts between software names and

versions, applications and libraries are not installed in

the standard directory locations.

Modules modify the environment to easily locate

software, libraries, documentation, or particular versions

of the software.

module load nwchem

Modules

Page 40: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Command Description

module avail Show available modules

module list List loaded modules

module load modulename Load a module into the current environment

module unload modulename Unload a module from the environment

module swap module1 module2 Swap a loaded module with another

module show modulename Give help for a particular module

module help Show module specific help

Module Commands

Page 41: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Some modules have prerequisites and order is important.

Most modules depend on an architecture and compiler, to allow the use of one module for multiple combinations.

Module Prerequisites

System Architecture modules Compiler modules

Magnus, Galaxy cray-sandybridgecray-ivybridgecray-haswell

PrgEnv-crayPrgEnv-gnuPrgEnv-intel

Zeus, Zythos sandybridgebroadwell

gccintelpgi

Page 42: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Module PrerequisitesOn Crays, switch to the desired programming environment first:

module swap PrgEnv-cray PrgEnv-intel

On Zeus and Zythos, switch to the desired architecture and compiler first:

module swap sandybridge broadwellmodule swap gcc intel

• Some modules can only be compiled for particular combinations of architectures and compilers.

• Loading these modules with the wrong prerequisites will generate a conflict error:

charris@zeus-1:~> module swap gcc intel

charris@zeus-1:~> module load gromacs/2018

Lmod has detected the following error: Cannot load module "gromacs/2018" because these module(s) are loaded:

intel/17.0.5

While processing the following module(s):

Module fullname Module Filename

--------------- ---------------

gromacs/2018 /pawsey/sles12sp3/modulefiles/apps/gromacs/2018.lua

Page 43: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Try running the following module commands:

module avail

module list

module load hdf5

module show hdf5

module swap gcc intel

module swap sandybridge broadwell

module show hdf5

• What paths are set by the hdf5 module?

Exercise: Modules

Page 44: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

SCHEDULERS

Page 45: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Supercomputers are expensive, and need to be fully utilised to get best value for money.

• All computers get replaced every 3-5 years and consume electricity whether used or not.

• Thus we want them running at maximum capacity including night and weekends.

=> use queues and a scheduler.

You may need to change your workflow:• Supercomputers do not sit idle waiting for you. They are

busy doing science. You have to wait your turn.

• Do not sit waiting for a job to start. Minimise interactivity and automate where possible.

• Always have jobs queued to maximise utilisation.

Maximising Science Outcomes

Page 46: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Filling a 192-cpu supercomputer over a day, with jobs of

different sizes and lengths.

A Scheduler in action

Page 47: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Tell the scheduler what resources your calculation needs. (Usually how many nodes and for how long).

Overestimating the time required means it will take longer to find an available slot.

Underestimating the time required means the job will get killed.

Underestimating memory will cause your program to crash.

Scheduling Your Job

Page 48: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

All Pawsey supercomputers (Magnus, Zeus, Zythos and Galaxy) use SLURM to manage queues.

The three essential commands:sbatch jobscriptfilenamesqueuescancel jobid

You’ll get an identifier (i.e., jobid) when you sbatch the job:

username@zeus-1:~> sbatch jobscript.slurm

Submitted batch job 2315399

Interacting with Pawsey Queues

Page 49: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

A SLURM partition is a queue.

A SLURM cluster is all the partitions that are managed by a single

SLURM daemon.

In the Pawsey Centre there are multiple SLURM clusters, each with

multiple partitions.

• The clusters approximately map to systems (e.g. magnus, galaxy, zeus).

• You can submit a job to a partition in one cluster from another cluster.

• This is useful for pre-processing, post-processing or staging data.

SLURM Partitions and Clusters

Page 50: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

To list the partitions when logged into a machine:sinfo

To get all partitions in all local clusters:

sinfo -M all

For example:

username@magnus-1:~> sinfoPARTITION AVAIL JOB_SIZE TIMELIMIT CPUS S:C:T NODES STATE NODELISTworkq* up 1-1366 1-00:00:00 24 2:12:1 2 idle* nid00[543,840]workq* up 1-1366 1-00:00:00 24 2:12:1 1 down* nid00694workq* up 1-1366 1-00:00:00 24 2:12:1 12 reserved nid000[16-27]workq* up 1-1366 1-00:00:00 24 2:12:1 1457 allocated nid0[0028-0063, … workq* up 1-1366 1-00:00:00 24 2:12:1 8 idle nid0[0193, …debugq up 1-6 1:00:00 24 2:12:1 4 allocated nid000[08-11]debugq up 1-6 1:00:00 24 2:12:1 4 idle nid000[12-15]

Querying SLURM Partitions

Page 51: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

It is important to use the correct system and partition for each part of a workflow:

Pawsey Partitions

System Partition Purpose

Magnus workq Large distributed memory (MPI) jobs

Magnus debugq Debugging and compiling on Magnus

Galaxy workq ASKAP operations astronomy jobs

Galaxy gpuq MWA operations astronomy jobs

Zeus workq Serial or shared memory (OpenMP) jobs

Zeus zythos Large shared memory (OpenMP) jobs

Zeus debugq Debugging and development jobs

Zeus gpuq, gpuq-dev GPU-accelerated jobs

Zeus knlq, knlq-dev Knights Landing evaluation jobs

Zeus visq Remote visualisation jobs

Zeus copyq Data transfer jobs

Zeus askap ASKAP data transfer jobs

Page 52: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

squeue displays the status of jobs in the local cluster

squeue

squeue –u username

squeue –p debugq

charris@zeus-1:~> squeue

JOBID USER ACCOUNT PARTITION NAME EXEC_HOST ST REASON START_TIME END_TIME TIME_LEFT NODES PRIORITY

2358518 jzhao pawsey0149 zythos SNP_call_zytho zythos R None Ystday 11:56 Thu 11:56 3-01:37:07 1 1016

2358785 askapops askap copyq tar-5182 hpc-data3 R None 09:20:35 Wed 09:20 1-23:01:09 1 3332

2358782 askapops askap copyq tar-5181 hpc-data2 R None 09:05:13 Wed 09:05 1-22:45:47 1 3343

2355496 pbranson pawsey0106 gpuq piv_RUN19_PROD n/a PD Priority Tomorr 01:53 Wed 01:53 1-00:00:00 2 1349

2355495 pbranson pawsey0106 gpuq piv_RUN19_PROD n/a PD Resources Tomorr 01:52 Wed 01:52 1-00:00:00 4 1356

2358214 yyuan pawsey0149 workq runGet_FQ n/a PD Priority 20:19:00 Tomorr 20:19 1-00:00:00 1 1125

2358033 yyuan pawsey0149 gpuq 4B_2 n/a PD AssocMaxJo N/A N/A 1-00:00:00 1 1140

2358709 pbranson pawsey0106 workq backup_RUN19_P n/a PD Dependency N/A N/A 1-00:00:00 1 1005

Querying the Queue

Page 53: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

NAME – job name. Set this if you have lots of jobs.

ST – job state. R=running. PD=pending.

REASON – the reason the job is not running

• Dependency – job must wait for another to complete before it

• Priority – a higher priority job exists

• Resources – the job is waiting for sufficient resources

Querying the Queue (cont’d)

charris@zeus-1:~> squeue

JOBID USER ACCOUNT PARTITION NAME EXEC_HOST ST REASON START_TIME END_TIME TIME_LEFT NODES PRIORITY

2358518 jzhao pawsey0149 zythos SNP_call_zytho zythos R None Ystday 11:56 Thu 11:56 3-01:37:07 1 1016

2358785 askapops askap copyq tar-5182 hpc-data3 R None 09:20:35 Wed 09:20 1-23:01:09 1 3332

2358782 askapops askap copyq tar-5181 hpc-data2 R None 09:05:13 Wed 09:05 1-22:45:47 1 3343

2355496 pbranson pawsey0106 gpuq piv_RUN19_PROD n/a PD Priority Tomorr 01:53 Wed 01:53 1-00:00:00 2 1349

2355495 pbranson pawsey0106 gpuq piv_RUN19_PROD n/a PD Resources Tomorr 01:52 Wed 01:52 1-00:00:00 4 1356

2358214 yyuan pawsey0149 workq runGet_FQ n/a PD Priority 20:19:00 Tomorr 20:19 1-00:00:00 1 1125

2358033 yyuan pawsey0149 gpuq 4B_2 n/a PD AssocMaxJo N/A N/A 1-00:00:00 1 1140

2358709 pbranson pawsey0106 workq backup_RUN19_P n/a PD Dependency N/A N/A 1-00:00:00 1 1005

Page 54: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

scontrol show job jobid

charris@magnus-1:~> scontrol show job 2474075JobId=2474075 JobName=m2BDF2

UserId=tnguyen(24642) GroupId=tnguyen(24642) MCS_label=N/APriority=7016 Nice=0 Account=pawsey0199 QOS=normalJobState=RUNNING Reason=None Dependency=(null)Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0RunTime=03:13:09 TimeLimit=1-00:00:00 TimeMin=N/ASubmitTime=12 Dec 2017 EligibleTime=12 Dec 2017StartTime=10:41:04 EndTime=Tomorr 10:41 Deadline=N/APreemptTime=None SuspendTime=None SecsPreSuspend=0Partition=workq AllocNode:Sid=magnus-2:53310ReqNodeList=(null) ExcNodeList=(null)NodeList=nid0[0041-0047,0080-0082,0132-0133,0208-0219,0224-0226,0251-0253,0278-0279,0284-0289,0310-0312,0319,0324-0332,0344,0349-0350,0377-

0379,0385-0387,0484-0503,0517-0520,0525-0526,0554-0573,0620-0628,0673-0686,0689-0693,0732,0894-0895,0900-0907,1036-1037,1048-1051,1134-1138,1202-1203,1295-1296,1379-1380,1443-1446,1530-1534]

BatchHost=mom1NumNodes=171 NumCPUs=4104 NumTasks=171 CPUs/Task=1 ReqB:S:C:T=0:0:*:*TRES=cpu=4104,mem=5601960M,node=171Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*MinCPUsNode=1 MinMemoryCPU=1365M MinTmpDiskNode=0Features=(null) Gres=(null) Reservation=(null)OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)Command=/scratch/pawsey0199/tnguyen/run_test_periodicwave/stiff_problem/forMagnus/4thOrder/accuracy_check/eta_1/PeriodicBCs/BDF2/m2/gpc.shWorkDir=/scratch/pawsey0199/tnguyen/run_test_periodicwave/stiff_problem/forMagnus/4thOrder/accuracy_check/eta_1/PeriodicBCs/BDF2/m2StdErr=/scratch/pawsey0199/tnguyen/run_test_periodicwave/stiff_problem/forMagnus/4thOrder/accuracy_check/eta_1/PeriodicBCs/BDF2/m2/m2BDF2StdIn=/dev/nullStdOut=/scratch/pawsey0199/tnguyen/run_test_periodicwave/stiff_problem/forMagnus/4thOrder/accuracy_check/eta_1/PeriodicBCs/BDF2/m2/m2BDF2Power=

Individual Job Information

Page 55: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Nodes can be manually reserved for a certain time by the system administrators.

• Email the helpdesk to ask for a reservation. Only ask if you cannot work via the standard queues.

• For scheduled maintenance - we reserve the whole machine.

• For interactive use – debugging a many-node job or for a training course.

• A once-off urgent deadline.

[reaper@magnus-2 ~]> sinfo -TRESV_NAME STATE START_TIME END_TIME DURATION NODELISTcourseq ACTIVE 09:00:00 17:00:00 08:00:00 nid000[16-23]

Reservations

Page 56: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Examine the partitions and jobs on Zeus:

sinfo

squeue

scontrol show job jobid

• What is the largest job running at the moment?

• What is the largest job queued and when is it

estimated to start?

• What is the most common reason for jobs not

running?

Exercise: Partitions and Queues

Page 57: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

REQUESTING NODES

Page 58: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

SLURM needs to know two things from you:

1. Resource requirement.• How many nodes and how long you need them for.

2. What to run.• You cannot submit an application directly to SLURM.

Instead, SLURM executes on your behalf a list of shell commands.

• In batch mode, SLURM executes a jobscript which contains the commands.

• In interactive mode, type in commands just like when you log in.

• These commands can include launching programs onto the compute nodes assigned for the job.

Job Request

Page 59: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

A jobscript is a bash or csh script.

sbatch interprets directives in the script, which are written as comments and not executed.

• Directive lines start with #SBATCH.

• These are equivalent to sbatch command-line arguments.

• Directives are usually more convenient and reproducible than command-line arguments. Put your resource request into the jobscript.

The jobscript will execute on one of the allocated compute node

• #SBATCH directives are comments, so only subsequent commands are executed.

Jobscripts

Page 60: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

#SBATCH --job-name=myjob makes it easier to find in squeue

#SBATCH --account=pawsey0001 project accounting

#SBATCH --nodes=2 number of nodes

#SBATCH --tasks-per-node processes per node

#SBATCH --cpus-per-task cores per process

#SBATCH --time=00:05:00 walltime requested

#SBATCH --export=NONE start with a clean environment. This improves reproducibility and avoids contamination of the environment.

Common sbatch directives

Page 61: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

#!/bin/bash -l#SBATCH –partition=workq#SBATCH --job-name=myjob#SBATCH --account=courses01#SBATCH --nodes=1#SBATCH --tasks-per-node=1#SBATCH --cpus-per-task=1#SBATCH --time=00:05:00#SBATCH --export=NONE

#The next line is executed on the compute nodehostname

Example Jobscript

Page 62: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

A SLURM reservation temporarily dedicates nodes for a

particular purpose, with constraints that may be different

from the standard partitions.

To use a reservation:

sbatch --reservation=reservation-name myscript

Or in your jobscript:

#SBATCH --reservation=reservation-name

Requesting Reservations

Page 63: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Standard output and standard error from your jobscript

are collected by SLURM, and written to a file in the

directory you submitted the job from when the job

finishes/dies.

slurm-jobid.out

SLURM Output

Page 64: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Launch the job with sbatch:

cd hostname

sbatch --reservation=courseq hostname.slurm

Use squeue to see if it is in the queue:

squeue -u username

It is a short job so it has probably run already.

If it is still in the queue can you work out why?

Examine the slurm-jobid.out file:

cat slurm-jobid.out

Which node did the job run on?

Exercise: Hostname

Page 65: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

LAUNCHING PROGRAMS

Page 66: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

On Zeus, multiple jobs can share nodes.

Serial commands can be used directly in the jobscript,

as seen in the previous section.

On Magnus and Galaxy, the node is exclusive. Do not

run a serial job, as it will waste idle cores that will count

against your project allocation.

Serial Jobs (Zeus only)

Page 67: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

This job script will run on a single core of Zeus for up to 5 minutes:

#!/bin/bash -l#SBATCH --job-name=hello-serial#SBATCH --nodes=1#SBATCH --tasks-per-node=1#SBATCH --cpus-per-task=1#SBATCH --time=00:05:00#SBATCH --export=NONE

# load modules

module load python/3.6.3

# launch serial python script

python3 hello-serial.py

The script can be submitted to the scheduler with:

sbatch hello-serial.slurm

Serial python example

Page 68: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Parallel applications are launched using srun.

The arguments determine the parallelism:

-N number of nodes

-n number of tasks (for process parallelism e.g. MPI)

-c cores per task (for thread parallelism e.g. OpenMP)

While these are already provided in the SBATCH directives, they should be provided again in the srun arguments.

To carry the environment in the script to the launched programs, set the export flag:

--export=all

Launching Parallel Programs

Page 69: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

This will run 1 process with 28 threads on Zeus, using 28 cores for up to 5 minutes:

#!/bin/bash -l#SBATCH --job-name=hello-openmp#SBATCH --nodes=1#SBATCH --tasks-per-node=1#SBATCH --cpus-per-task=28#SBATCH --time=00:05:00#SBATCH --export=NONE

# set OpenMP environment variables

export OMP_NUM_THREADS=28

export OMP_PLACES=cores

export OMP_PROC_BIND=close

# launch OpenMP program

srun --export=all -n 1 -c ${OMP_NUM_THREADS} ./hello-openmp

The program can be compiled and the script can be submitted to the scheduler with:

cd hello-openmp

make

sbatch hello-openmp.slurm

OpenMP example

Page 70: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

This will run 28 MPI processes on 1 node on Zeus:

#!/bin/bash -l#SBATCH --job-name=hello-mpi#SBATCH --nodes=1#SBATCH --tasks-per-node=28#SBATCH --cpus-per-task=1#SBATCH --time=00:05:00#SBATCH --export=NONE

# prepare MPI environment

module load intel-mpi

# launch MPU program

srun --export=all -N 1 -n 28 ./hello-mpi

The script can be submitted to the scheduler with:

cd hello-mpi

make

sbatch hello-mpi.slurm

MPI example

Page 71: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Run hello-serial.py on a Zeus compute node.

Move into the exercise directory:

cd hello-serial

View the submission script:

less hello-serial.slurm

Submit the script to the SLURM scheduler:

sbatch hello-serial.slurm

Check the queue:

squeue -u username

View the output:

less slurm-#jobid.out

Exercise: Run a job

Page 72: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Sometimes you need them:

• Debugging

• Compiling

• Pre/post-processing

Use salloc instead of sbatch.

You still need srun to place jobs onto compute nodes.

#SBATCH directives must be included as command line arguments.

For example:

salloc --tasks=16 --time=00:10:00

srun make -j 16

Interactive Jobs

Page 73: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

If there are no free nodes, you may need to wait while the job is in the queue.

charris@zeus-1:~> salloc --tasks 1

salloc: Pending job allocation 2315927

salloc: job 2315927 queued and waiting for resources

It may appear to hang – waiting for resources to become available.

For small interactive jobs on Magnus use the debugq to wait less.

salloc --tasks=1 --time=10:00 -p debugq

Interactive Jobs

Page 74: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Run hello-serial.py interactively on a Zeus compute node.

Start an interactive session (you may need to wait while it is in the queue):

salloc --reservation=courseq --tasks=1 --time=00:10:00

Prepare the environment:

module load python/3.6.3

Launch the program:

srun --export=all -n 1 python3 hello-serial.py

Exit the interactive session:

exit

Exercise: Interactive session

Page 75: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

PROJECT ACCOUNTING

Page 76: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Merit allocations are awarded typically for 12 months.

Merit allocations are divided evenly between the four quarters of the year, to avoid end-

of-year congestion. Priorities reset at the start of the quarter for merit allocations

Director share allocations are typically awarded for up to 12 months, or until the time is

consumed, and do not reset automatically.

The job priority in the queue is affected the following:

• usage relative to allocation

• size of request

• length of time in queue

Allocations

Page 77: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Project usage can be checked using the pawseyAccountBalance tool:

module load pawseytools

pawseyAccountBalance -p projectname -u

For example:

Project Usage

charris@magnus-2:~> pawseyAccountBalance -p pawsey0001 -u

Compute Information

-------------------

Project ID Allocation Usage % used

---------- ---------- ----- ------

pawsey0001 250000 124170 49.7

--mcheeseman 119573 47.8

--mshaikh 2385 1.0

--maali 1109 0.4

--bskjerven 552 0.2

--ddeeptimahanti 292 0.1

Page 78: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

The sacct tool provides high-level information on the jobs that have been run:

sacct

There are many arguments, some commonly used options are:

-a display jobs for all users, not just the current user

-A projectname display jobs from this project account

-S yyyy-mm-ddThh:mm:ss display jobs after this start time

-E yyyy-mm-ddThh:mm:ss display jobs before this end time

For example:charris@magnus-1:~> sacct -a -A pawsey0001 -S 2017-12-01 -E 2017-12-02 -X

JobID JobName Partition Account AllocCPUS State ExitCode

------------ ---------- ---------- ---------- ---------- ---------- --------

2461157 bash debugq pawsey0001 24 COMPLETED 0:0

2461543 bubble512 debugq pawsey0001 24 FAILED 1:0

2461932 bash workq pawsey0001 24 FAILED 2:0

2462029 bash workq pawsey0001 24 FAILED 127:0

2462472 bash debugq pawsey0001 24 COMPLETED 0:0

2462527 jobscript+ workq pawsey0001 960 COMPLETED 0:0

Job Information

Page 79: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

DATA TRANSFER

Page 80: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

• All transfers handled via secure copies

• scp, rsync, etc.

• Interactive use on login nodes is discouraged

• Small transfers may be okay

• Dedicated servers for transferring large amounts of data

• /home, /scratch and /group visible

• Externally visible scp

Data Transfer Nodes

Supercomputer Hostname

Magnus / Zeus / Zythos hpc-data.pawsey.org.au

Galaxy hpc-data.pawsey.org.au

Page 81: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

For file transfers, run scp from the remote system using the data transfer nodes.

For example, to copy a file to Pawsey:

scp filename [email protected]:/group/projectname/username

And to copy a file from Pawsey:

scp [email protected]:/group/projectname/username/filename $PWD

Use the same username and password as a normal ssh login.

There are many scp clients programs with graphical interfaces, such as MobaXTerm,

FileZilla, and WinSCP, particularly for Windows users.

Ensure username and passwords are correct for these programs, as they can

automatically retry and trigger IP blocking. This setting should be disabled in the

software.

Data Transfer Nodes

Page 82: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

copyq

• Batch job access to data

transfer nodes

• “copyq” partition

• Located on Zeus

• Available to all Pawsey

machines

• Serial job

• No srun needed

#!/bin/bash –login

#SBATCH --partition=copyq

#SBATCH --cluster=zeus

#SBATCH --ntasks=1

#SBATCH --account=[user-account]

#SBATCH --time=06:00:00

#SBATCH --export=NONE

# stage data

module load python

python ./data-mover.py

Page 83: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Practice data transfer by backing up the course material.

Exit the session, then use scp to copy via a data transfer node:

exit

scp -r cou###@hpc-data.pawsey.org.au:/group/courses01/cou### $PWD

For Windows users not using MobaXTerm, use WinSCP or Filezilla.

You may need to download it.

Exercise: Backing Up

Page 84: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

PERFORMANCE

Page 85: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Wall time

Length of real-world time taken.

E.g. The simulation took 12 hours.

Cost

Amount of resources used.

Cost = walltime * nodes.

E.g. 12 hours * 100 nodes = 1,200 node hours.

(= 28,800 core hours on a 24 cores per node system)

• This is also what you have prevented other people from being able to use!!

Metrics

Page 86: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

NWChem timings on Galaxy.

C60 molecule heat of formation using

double-hybrid DFT.Collaborative project w/ Amir Karton, UWA

Real-world Example

Cores Nodes Walltime (hours) Cost (Node hours)

128 8 11.2 90

256 16 4.4 70

512 32 2.1 67

1024 64 1.0 64

2048 128 0.91 116

4096 256 0.75 192

Fast and low cost

Fastest and highest cost2x 0.8x

Page 87: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

So How Many Cores to Use?

Fast turnaround

• Weigh up between

turnaround time and cost

• Total time = runtime +

queue time

High throughput

• Use an efficient core count

(usually low)

• Each job may run longer

• Run many jobs

Page 88: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

• For many workloads the parallel portion expands faster than the serial portion when the problem size is increased

• Humans tend to accept a certain delay for answers

• Many computational workloads can expand to consume more compute power if provided

tackle a larger problem with more cores for the same amount of time

• Higher cost, but more efficient use of many cores, and better science

Tackle Larger Problems

Page 89: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

When migrating from a desktop / workstation:

1. Request a single node with maximum allowed walltime(24 hours on Magnus).

2. Base subsequent walltime requests on how long the job took.

When trying larger (MPI) problems:

1. Start with a known job size, node count and walltime.

2. Repeat for successively larger job sizes and node count, and extrapolate the required time. • Use a table or plot.

This helps in knowing how the algorithm scales.

Do Some Testing

Page 90: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

ADDITIONAL HELP

Page 91: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Pawsey provides extensive documentation:

https://support.pawsey.org.au

• System user guides

• Frequently asked questions

• Pawsey-supported software list

• Maintenance logs

• Policies and terms of use

Documentation

Page 92: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

For further assistance, contact the help desk:

• https://support.pawsey.org.au/

Help us to help you. Provide details!

E.g.

• Which resource

• Error messages

• Location of files

• SLURM job id

• Your username and IP address if having login issues

• Never tell us (or anyone) your password!

Assistance

Page 93: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Applications Support Team

Team expertise:• Access to Pawsey

facilities

• High-performance computing

• Parallel programming

• Computational science

• GPU accelerators

• Cloud computing

• Scientific visualisation

• Data-intensive computing

Contact the helpdesk

Page 94: Introductory Supercomputing Supercomputing.pdf · • Jobs can span multiple compute nodes • Individual nodes are not that different in performance from a ... • Zeus should be

Come to the Intermediate Supercomputing training to learn about:

• Filesystem performance

• Compiling code

• Scientific libraries

• Advanced job launching

• SLURM workflows

• Job arrays

Further Training