29
18th–19th Mar. 2014 Introduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader High Performance Computing Computing Centre TU Darmstadt

Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Embed Size (px)

Citation preview

Page 1: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

18th–19th Mar. 2014

Introduction Workshop18th – 19th March 2014

Lecture II: Access and Batchsystem

Dr. Andreas Wolf

Group LeaderHigh Performance ComputingComputing Centre TU Darmstadt

Page 2: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 218th–19th Mar. 2014

Overview

● Access and Requirements

● Software packages– Module system

● Batch system– Queueing rules

– Commands

– Batch script examples for MPI and OpenMP

Page 3: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 318th–19th Mar. 2014

● Access and Requirements

● Software packages– Module system

● Batch system– Queueing rules

– Commands

– Batch script examples for MPI and OpenMP

Page 4: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 418th–19th Mar. 2014

● Because of the size of the system– Each potential user needs to be proved first against the

export limitations of the “Bundesamt für Wirtschaft und Ausfuhrkontrolle (BAFA)” - Export control/Embargo

● E-Mail at [email protected]– The new “user rules” document (“Nutzungsordnung”)

● Names, TU-ID, Email address, Institute and Institution affinity, Citizenship

● Project title● Reports● No private data (email, pictures etc.)● No commercial use● Limited data storage life

Access Requirements for the new Cluster

Page 5: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 518th–19th Mar. 2014

http://www.hhlr.tu-darmstadt.de

1.

2.

3.

Page 6: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 618th–19th Mar. 2014

http://www.hhlr.tu-darmstadt.de

1.

2.

3.

Only for lessons and practice groups!Only for lessons and practice groups!Finished later ...Finished later ...

Page 7: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 718th–19th Mar. 2014

Registration

● Send the filled “user rules” document to us

– We need the original sheet of paper● Wait for the automated answer for granting access

– Follow the instruction to setup your password via ANDO● Accept the additional email for registering to the

'HPC-Nutzer' email list– Here we will inform you about News and Trouble

Get Connected

● Windows: download SSH clients, e.g. “putty”

● Linux: typically SSH already installed

Get Access and be Connected

Page 8: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 818th–19th Mar. 2014

Access to the Lichtenberg Cluster (lcluster)

● per SSH

● Login Nodes: lcluster1.hrz.tu-darmstadt.de (currently disabled)lcluster2.hrz.tu-darmstadt.delcluster3.hrz.tu-darmstadt.de (currently disabled)lcluster4.hrz.tu-darmstadt.de

<some where>:~> ssh <user name>@lcluster1.hrz.tu-darmstadt.de

<user name>@lcluster1.hrz.tu-darmstadt.de's password:

<user name>@hla0001:~>

Page 9: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 918th–19th Mar. 2014

Data Transfer between your Client and the Cluster

● per SCP command (a part of the SSH tools)

– Transfer over Login nodes

<some where>:~> scp <file names> <user name>@lcluster1.hrz … de:

Password:<file name> 100% <Bytes> <1.0>KB/s <01:00>

<some where>:~>

<some where>:~> scp <user name>@lcluster1.hrz … de:<file name> <dir>

Password:<file name> 100% <Bytes> <1.0>KB/s <01:00>

<some where>:~>

Page 10: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1018th–19th Mar. 2014

HPC Newsletter

Email Newsletter: [email protected]

Newsletter subscription

● https://lists.tu-darmstadt.de/mailman/listinfo/hpc

● Information about

– Planned Events, User meetings

– Planned Lectures, Workshops etc.

– Common information of the system / news

Page 11: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1118th–19th Mar. 2014

● Access and Requirements

● Software packages– Module system

● Batch system– Queueing rules

– Commands

– Batch script examples for MPI and OpenMP

Page 12: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1218th–19th Mar. 2014

Operation System• SLES 11 SP3 (SP2), x86-64, 64Bit

System Tools• GCC 4.4.6, 4.7.2, 4.8.1 …• Intel 13.0.1 (incl. Intel Cluster Studio XE → Dr. G. B. Hagos)• PGI 13.1• ACML, Intel-MKL, SCALAPACK ...• OpenMPI 1.6.5, Intel-MPI ...• Totalview ( → Dr. S. Boldyrev), Vampir ( → A. Jäger)

Applications• Ansys 140, 145 ...• Abaqus 6.12-1• Matlab 2012a• COMSOL 4.3

Software available / installable

Page 13: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1318th–19th Mar. 2014

> module list

– Shows all currently load software environments(load packages of the user)

> module load <module name>

– Loads a specific software environment module

● Only when the module is successfully loaded – the software is really useable!

<user name>@hla0001:~> module load ansysModules: loading ansys/14.5<user name>@hla0001:~>

> module unload <module name>

– Unloads a software module

Modular Load and Unload – 1

Page 14: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1418th–19th Mar. 2014

> module avail

● Shows all available software packages currently installed

<user name>@hla0001:~> module avail

–---------------–---------- /shared/modulefiles –------------------------abaqus/6.12-1 gcc/4.6.4 openmpi/gcc/1.6.5(defansys/13.0 gcc/4.7.3(default) openmpi/intel/1.6.5ansys/14.5(default) gcc/4.8.1 openspeedshop/2.0.2(…

<user name>@hla0001:~>

Modular Load and Unload – 2

Page 15: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1518th–19th Mar. 2014

● Access and Requirements

● Software packages– Module system

● Batch system– Queueing rules

– Commands

– Batch script examples for MPI and OpenMP

Page 16: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1618th–19th Mar. 2014

Queueing – Scheduling

• Different queues – for different purposes– deflt – Limited to maximal 24 hours

● Main part of all computing nodes– long – Limited to maximal 7 days

● Very small part (~ 8) of all computing nodes– short – Limited to maximal 30 minutes

● Depending on the demand – reserved nodes,on all others nodes with highest scheduling priority

• Advantages

– Maintainability of the most computing nodes within 24 hours● Because of the main focus to 24 hours jobs

– Small test jobs (30 minutes) will scheduled promptly

Page 17: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1718th–19th Mar. 2014

Additional Queues

• Special queues – may be changed in the future– multi – Limited to maximal 24 hours

● Inter island computing – only MPI section– testmem – Limited to maximal 24 hours

● 1 TByte memory nodes – MEM section– testnvd – Limited to maximal 24 hours

● Nvidia Tesla accelerators – ACC-G section– testphi – Limited to maximal 24 hours

● Intel Xeon Phi accelerators – ACC-M section

Page 18: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1818th–19th Mar. 2014

Why using LSF?

• Scalability for a large number of nodes• Professional support (for fine tuning)• WebGUI – graphical front-end for

– Creating

– Submitting

– Monitoring

of batch jobs

(At present unfortunately not ready for use – later)

➔ Usability also from a Windows client

Batch System - LSF

Page 19: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2118th–19th Mar. 2014

> bsub < <batch script>

– Submit a new batch job to the queueing system

> bqueue

– Shows all presently submitted or active batch jobs and their batch-ID numbers

> bkill <batch-ID>

– Deletes own batch jobs (with ID ...)

> bjobs <batch-ID>

– Shows specific configuration or runtime information for a batch job

Batch System Commands - LSF

Page 20: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2218th–19th Mar. 2014

LSF – bsub & bkill

<user name>@hla0001:~> bsub < sample-script.sh

Job <12345> is submitted to default queue <deflt>.

<user name>@hla0001:~> bkill 12345

Job <12345> is being terminated

<user name>@hla0001:~>

● The sign < is important for using a script

Page 21: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2318th–19th Mar. 2014

LSF - bqueue

<user name>@hla0001:~> bqueues

QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSPshort 100 Open:Active - - - - 0 0 0 0multi 11 Open:Active - - - - 0 0 0 0deflt 10 Open:Active - - - - 39731 30080 9651 0testnvd 10 Open:Active - - - - 0 0 0 0testphi 10 Open:Active - - - - 0 0 0 0testmem 10 Open:Active - - - - 128 64 64 0long 1 Open:Active - - - - 1541 1413 128 0

<user name>@hla0001:~>

Page 22: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2418th–19th Mar. 2014

LSF - bjobs

<user name>@hla0001:~> bjobs

JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME APS12345 <user> RUN testmem hla0001 64*mem <title> Nov 4 14:50 -12346 <user> PEND testmem hla0001 <title> Nov 4 14:50 9.1

<user name>@hla0001:~>

● Absolute priority scheduling depends on the users computing history, a queue factor and the job size

Page 23: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2518th–19th Mar. 2014

● Access and Requirements

● Software packages– Module system

● Batch system– Queueing rules

– Commands

– Batch script examples for MPI and OpenMP

Page 24: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2618th–19th Mar. 2014

#Job name#BSUB -J MPItest

#File / path where STDOUT will be written, the %J is the job id#BSUB -o /home/<TU-ID>/MPItest.out%J

#File / path where STDERR will be written, the %J is the job id#BSUB -e /home/<TU-ID>/MPItest.err%J

#Request the time you need for execution in minutes#The format for the parameter is: [hour:]minute,#that means for 80 minutes you could also use this: 1:20#BSUB -W 10

#Request virtual memory you need for your job in MByte#BSUB -M 18000

Batch Script for running a MPI Program - 1

suppress full path

hard limit, sum over all cores, for job scheduling

Page 25: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2718th–19th Mar. 2014

...#Request the number of compute slots / #MPI tasks you want to use#BSUB -n 64

#Specify the MPI support#BSUB -a openmpi

#Specify your mail address#BSUB -u <email address>

#Send a mail when job is done#BSUB -N

Batch Script for running a MPI Program - 2

Page 26: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2818th–19th Mar. 2014

...module load openmpi/gcc/1.6.5#Prove the loaded modulesmodule list

cd ~/<working path>

mpirun <program>

#Otherwise generating an explicit hostfileecho "$LSB_HOSTS" | sed -e "s/ /\n/g" > hostfile.$LSB_JOBID#Specify number of tasks and hostsmpirun -n 64 -hostfile hostfile.$LSB_JOBID <program>

Batch Script for running a MPI Program - 3

Loading modules is saver than using the submit environment

Common use without any parameters ← given by LSF

Page 27: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2918th–19th Mar. 2014

Batch Script for running a OpenMP Program

...

#Specify the OpenMP support#BSUB -a openmp

#export OMP_NUM_THREADS=16

<OpenMP program call>

Page 28: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 3018th–19th Mar. 2014

Batch Description, seehttp://www.hhlr.tu-darmstadt.de/hhlr/lichtenberg/ arbeit_auf_dem_cluster_2/arbeit_mit_lsf_2/index~1.de.jsp

Page 29: Introduction Workshop 18 – 19 March 2014 - · PDF fileIntroduction Workshop 18th – 19th March 2014 Lecture II: Access and Batchsystem Dr. Andreas Wolf Group Leader ... abaqus/6.12-1

Lecture II: Access and Batchsystem | Dr. Andreas Wolf 3118th–19th Mar. 2014

Thank you for your attention

Questions ???