Upload
phungkhanh
View
218
Download
1
Embed Size (px)
Citation preview
18th–19th Mar. 2014
Introduction Workshop18th – 19th March 2014
Lecture II: Access and Batchsystem
Dr. Andreas Wolf
Group LeaderHigh Performance ComputingComputing Centre TU Darmstadt
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 218th–19th Mar. 2014
Overview
● Access and Requirements
● Software packages– Module system
● Batch system– Queueing rules
– Commands
– Batch script examples for MPI and OpenMP
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 318th–19th Mar. 2014
● Access and Requirements
● Software packages– Module system
● Batch system– Queueing rules
– Commands
– Batch script examples for MPI and OpenMP
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 418th–19th Mar. 2014
● Because of the size of the system– Each potential user needs to be proved first against the
export limitations of the “Bundesamt für Wirtschaft und Ausfuhrkontrolle (BAFA)” - Export control/Embargo
● E-Mail at [email protected]– The new “user rules” document (“Nutzungsordnung”)
● Names, TU-ID, Email address, Institute and Institution affinity, Citizenship
● Project title● Reports● No private data (email, pictures etc.)● No commercial use● Limited data storage life
Access Requirements for the new Cluster
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 518th–19th Mar. 2014
http://www.hhlr.tu-darmstadt.de
1.
2.
3.
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 618th–19th Mar. 2014
http://www.hhlr.tu-darmstadt.de
1.
2.
3.
Only for lessons and practice groups!Only for lessons and practice groups!Finished later ...Finished later ...
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 718th–19th Mar. 2014
Registration
● Send the filled “user rules” document to us
– We need the original sheet of paper● Wait for the automated answer for granting access
– Follow the instruction to setup your password via ANDO● Accept the additional email for registering to the
'HPC-Nutzer' email list– Here we will inform you about News and Trouble
Get Connected
● Windows: download SSH clients, e.g. “putty”
● Linux: typically SSH already installed
Get Access and be Connected
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 818th–19th Mar. 2014
Access to the Lichtenberg Cluster (lcluster)
● per SSH
● Login Nodes: lcluster1.hrz.tu-darmstadt.de (currently disabled)lcluster2.hrz.tu-darmstadt.delcluster3.hrz.tu-darmstadt.de (currently disabled)lcluster4.hrz.tu-darmstadt.de
<some where>:~> ssh <user name>@lcluster1.hrz.tu-darmstadt.de
<user name>@lcluster1.hrz.tu-darmstadt.de's password:
<user name>@hla0001:~>
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 918th–19th Mar. 2014
Data Transfer between your Client and the Cluster
● per SCP command (a part of the SSH tools)
– Transfer over Login nodes
<some where>:~> scp <file names> <user name>@lcluster1.hrz … de:
Password:<file name> 100% <Bytes> <1.0>KB/s <01:00>
<some where>:~>
<some where>:~> scp <user name>@lcluster1.hrz … de:<file name> <dir>
Password:<file name> 100% <Bytes> <1.0>KB/s <01:00>
<some where>:~>
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1018th–19th Mar. 2014
HPC Newsletter
Email Newsletter: [email protected]
Newsletter subscription
● https://lists.tu-darmstadt.de/mailman/listinfo/hpc
● Information about
– Planned Events, User meetings
– Planned Lectures, Workshops etc.
– Common information of the system / news
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1118th–19th Mar. 2014
● Access and Requirements
● Software packages– Module system
● Batch system– Queueing rules
– Commands
– Batch script examples for MPI and OpenMP
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1218th–19th Mar. 2014
Operation System• SLES 11 SP3 (SP2), x86-64, 64Bit
System Tools• GCC 4.4.6, 4.7.2, 4.8.1 …• Intel 13.0.1 (incl. Intel Cluster Studio XE → Dr. G. B. Hagos)• PGI 13.1• ACML, Intel-MKL, SCALAPACK ...• OpenMPI 1.6.5, Intel-MPI ...• Totalview ( → Dr. S. Boldyrev), Vampir ( → A. Jäger)
Applications• Ansys 140, 145 ...• Abaqus 6.12-1• Matlab 2012a• COMSOL 4.3
Software available / installable
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1318th–19th Mar. 2014
> module list
– Shows all currently load software environments(load packages of the user)
> module load <module name>
– Loads a specific software environment module
● Only when the module is successfully loaded – the software is really useable!
<user name>@hla0001:~> module load ansysModules: loading ansys/14.5<user name>@hla0001:~>
> module unload <module name>
– Unloads a software module
Modular Load and Unload – 1
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1418th–19th Mar. 2014
> module avail
● Shows all available software packages currently installed
<user name>@hla0001:~> module avail
–---------------–---------- /shared/modulefiles –------------------------abaqus/6.12-1 gcc/4.6.4 openmpi/gcc/1.6.5(defansys/13.0 gcc/4.7.3(default) openmpi/intel/1.6.5ansys/14.5(default) gcc/4.8.1 openspeedshop/2.0.2(…
<user name>@hla0001:~>
Modular Load and Unload – 2
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1518th–19th Mar. 2014
● Access and Requirements
● Software packages– Module system
● Batch system– Queueing rules
– Commands
– Batch script examples for MPI and OpenMP
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1618th–19th Mar. 2014
Queueing – Scheduling
• Different queues – for different purposes– deflt – Limited to maximal 24 hours
● Main part of all computing nodes– long – Limited to maximal 7 days
● Very small part (~ 8) of all computing nodes– short – Limited to maximal 30 minutes
● Depending on the demand – reserved nodes,on all others nodes with highest scheduling priority
• Advantages
– Maintainability of the most computing nodes within 24 hours● Because of the main focus to 24 hours jobs
– Small test jobs (30 minutes) will scheduled promptly
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1718th–19th Mar. 2014
Additional Queues
• Special queues – may be changed in the future– multi – Limited to maximal 24 hours
● Inter island computing – only MPI section– testmem – Limited to maximal 24 hours
● 1 TByte memory nodes – MEM section– testnvd – Limited to maximal 24 hours
● Nvidia Tesla accelerators – ACC-G section– testphi – Limited to maximal 24 hours
● Intel Xeon Phi accelerators – ACC-M section
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 1818th–19th Mar. 2014
Why using LSF?
• Scalability for a large number of nodes• Professional support (for fine tuning)• WebGUI – graphical front-end for
– Creating
– Submitting
– Monitoring
of batch jobs
(At present unfortunately not ready for use – later)
➔ Usability also from a Windows client
Batch System - LSF
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2118th–19th Mar. 2014
> bsub < <batch script>
– Submit a new batch job to the queueing system
> bqueue
– Shows all presently submitted or active batch jobs and their batch-ID numbers
> bkill <batch-ID>
– Deletes own batch jobs (with ID ...)
> bjobs <batch-ID>
– Shows specific configuration or runtime information for a batch job
Batch System Commands - LSF
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2218th–19th Mar. 2014
LSF – bsub & bkill
<user name>@hla0001:~> bsub < sample-script.sh
Job <12345> is submitted to default queue <deflt>.
<user name>@hla0001:~> bkill 12345
Job <12345> is being terminated
<user name>@hla0001:~>
● The sign < is important for using a script
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2318th–19th Mar. 2014
LSF - bqueue
<user name>@hla0001:~> bqueues
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSPshort 100 Open:Active - - - - 0 0 0 0multi 11 Open:Active - - - - 0 0 0 0deflt 10 Open:Active - - - - 39731 30080 9651 0testnvd 10 Open:Active - - - - 0 0 0 0testphi 10 Open:Active - - - - 0 0 0 0testmem 10 Open:Active - - - - 128 64 64 0long 1 Open:Active - - - - 1541 1413 128 0
<user name>@hla0001:~>
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2418th–19th Mar. 2014
LSF - bjobs
<user name>@hla0001:~> bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME APS12345 <user> RUN testmem hla0001 64*mem <title> Nov 4 14:50 -12346 <user> PEND testmem hla0001 <title> Nov 4 14:50 9.1
<user name>@hla0001:~>
● Absolute priority scheduling depends on the users computing history, a queue factor and the job size
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2518th–19th Mar. 2014
● Access and Requirements
● Software packages– Module system
● Batch system– Queueing rules
– Commands
– Batch script examples for MPI and OpenMP
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2618th–19th Mar. 2014
#Job name#BSUB -J MPItest
#File / path where STDOUT will be written, the %J is the job id#BSUB -o /home/<TU-ID>/MPItest.out%J
#File / path where STDERR will be written, the %J is the job id#BSUB -e /home/<TU-ID>/MPItest.err%J
#Request the time you need for execution in minutes#The format for the parameter is: [hour:]minute,#that means for 80 minutes you could also use this: 1:20#BSUB -W 10
#Request virtual memory you need for your job in MByte#BSUB -M 18000
Batch Script for running a MPI Program - 1
suppress full path
hard limit, sum over all cores, for job scheduling
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2718th–19th Mar. 2014
...#Request the number of compute slots / #MPI tasks you want to use#BSUB -n 64
#Specify the MPI support#BSUB -a openmpi
#Specify your mail address#BSUB -u <email address>
#Send a mail when job is done#BSUB -N
Batch Script for running a MPI Program - 2
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2818th–19th Mar. 2014
...module load openmpi/gcc/1.6.5#Prove the loaded modulesmodule list
cd ~/<working path>
mpirun <program>
#Otherwise generating an explicit hostfileecho "$LSB_HOSTS" | sed -e "s/ /\n/g" > hostfile.$LSB_JOBID#Specify number of tasks and hostsmpirun -n 64 -hostfile hostfile.$LSB_JOBID <program>
Batch Script for running a MPI Program - 3
Loading modules is saver than using the submit environment
Common use without any parameters ← given by LSF
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 2918th–19th Mar. 2014
Batch Script for running a OpenMP Program
...
#Specify the OpenMP support#BSUB -a openmp
#export OMP_NUM_THREADS=16
<OpenMP program call>
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 3018th–19th Mar. 2014
Batch Description, seehttp://www.hhlr.tu-darmstadt.de/hhlr/lichtenberg/ arbeit_auf_dem_cluster_2/arbeit_mit_lsf_2/index~1.de.jsp
Lecture II: Access and Batchsystem | Dr. Andreas Wolf 3118th–19th Mar. 2014
Thank you for your attention
Questions ???