© 2013 Regents of the University of Minnesota. All rights reserved.
Minnesota Supercomputing Institute
Intro to Job Submission and Scheduling
Evan F. Bollig, Ph.D.
Senior Scientific Computing Consultant
MN Supercomputing Institute for Advanced Computational Research
© 2009 Regents of the University of Minnesota. All rights reserved.
MSI Systems Overview
Resource Allocations
ALL groups will have access to these basic resources:
● Service Units (SUs) for HPC usage
○ 70,000 (~100,000 CPU-hours) or more if requested
● High Performance Storage
○ 150 GB home directory quota or more if requested
● Access to interactive gateways
● Access to MSI labs
○ contact us to activate your U Card for physical access
Batch vs. Interactive Computing

                     Batch (qsub)   Interactive SSH (qsub -I)   Interactive Desktop (Linux)
Wall clock limit     696 hours      696 hours (1,2)             24 hours
Requires SUs         ✔              ✔ (3)                       ✕ (4)
Memory limit         1 TB           1 TB                        16 GB
Core limit           8640           8640                        4
Software modules     400+           400+                        400+
GPUs                 ✔              ✔                           ✔
GUIs                 ✕              ✔                           ✔

1. Don’t be a jerk  2. Larger requests receive lower priority  3. Resource dependent  4. Subject to change
What is Interactive Computing?
• Software GUIs
• Prototyping workflows
o Design your workflow for a single node (multi-core) or small set of nodes
o Discover and test new tools/concepts
o Profile, optimize and debug
• Data Visualization
Batch Jobs
• When should you use Batch Jobs?
o Whenever possible! This is the traditional way to work in HPC
o “Don’t Be a Jerk”; share resources and be considerate of other researchers
• What are the benefits of Batch Jobs?
o Headless execution of automated processes
o Long runtimes
o Large core counts
o A scheduler packs jobs in hardware to maximize utilization, reduce latency, etc.
Interacting with MSI Systems
Machine Architecture: Cluster
Source: http://en.wikipedia.org/wiki/Cluster_%28computing%29
○ Mesabi
■ About 19,040 total cores (Intel Haswell).
■ 24 cores and 62 GB per node for most nodes.
■ Special queues with large memory (up to 1 TB) and GPUs.
■ Node sharing: good for both small and large jobs.
○ Itasca
■ About 9,000 total cores, on Intel Nehalem processors.
■ 8 cores and 22 GB per node in the large primary queue.
■ Special queues with larger memory and 16 cores per node.
○ Lab Server
■ About 500 total cores, on older hardware.
■ For interactive or small single-node jobs.
■ 8 cores and 15 GB per node in the primary queue.
Clusters at MSI
mesabi.msi.umn.edu
itasca.msi.umn.edu
lab.msi.umn.edu
[Cluster diagram — recoverable details. Mesabi: 860 TFlops DP, 788 total nodes, 19,040 cores, 83 TB memory; 24-core nodes with 64 GB or 256 GB memory and 480 GB HDD; 16 ram1t (1 TB) nodes; 78 × 256 GB nodes (some dedicated, some in the ram256 queue); 40 × K40 GPU nodes (2 × K40 each); FDR 54Gb interconnect. A second system: 1150 TFlops DP, 177 total nodes, 20,888 cores, 46 TB memory; 128-core nodes with 480 GB SSD; EDR 100Gb interconnect; 17 GPU nodes with 48 NVIDIA V100 cards (2×, 4×, or 8× V100 per node). Itasca: 637 nodes, 5096 cores.]
Choosing a Cluster
• Tiny jobs and desktops run on the Interactive Cluster (a.k.a. “lab”)
• Long-running jobs and batch jobs run on Mesabi/Itasca
• Shorter queues
• Better hardware
• Same software
• “qsub -I” also supported
• Itasca requires ppn=8
• Mesabi compute nodes cannot connect to the internet
• Mesabi compute nodes are heterogeneous (e.g., up to 1 TB memory, GPUs, SSDs)
= More research, faster.
Hands-On: Getting Connected
Prerequisite: VPN (for Non-UMN Networks)
https://it.umn.edu/virtual-private-network-vpn
Windows Laptop Prerequisite: SSH and File Transfer
https://winscp.net/eng/downloads.php
Download and install both WinSCP and PuTTY
Connecting to MSI
SSH is the most reliable connection method.
• Linux and Mac users can use the terminal command:
ssh login.msi.umn.edu
• Windows users will need an SSH-capable program, like PuTTY or Cygwin.
NOTE: all SSH connections must first connect to login.msi.umn.edu. From there you can connect to other systems.
For graphical connections use NICE or NX:
https://nice.msi.umn.edu
https://nx.msi.umn.edu
Clusters at MSI
Mesabi   Itasca   Lab

Login

First connect to login.msi.umn.edu, then connect to a cluster.
Must be on campus or using the VPN.
https://it.umn.edu/virtual-private-network-vpn
MSI Computing Environment
MSI systems are primarily Linux compute clusters running CentOS
• Home directories are unified across systems. • Software is managed via a module system. • Jobs are scheduled via a queueing system.
Home Directories
Home directories are unified across all Linux systems.
Each group has a disk quota which can be viewed with the command: groupquota
Panasas ActiveStor 14: 6 PB storage, capable of 30 GB/sec read/write and 270,000 IOPS
Loading Software
Software modules alter environment variables to make software available. MSI has hundreds of software modules.
Module Commands:

Description                      Command          Example
See all available modules        module avail     module avail
Load a module                    module load      module load matlab/2015a
Unload a module                  module unload    module unload matlab/2015a
Unload all modules               module purge     module purge
See what a module does           module show      module show matlab/2015a
List currently loaded modules    module list      module list
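Under the hood, `module load` mostly edits environment variables such as PATH and LD_LIBRARY_PATH (which is what `module show` prints). A rough sketch of the effect — the install prefix below is made up for illustration, not MSI's real directory layout:

```shell
# What "module load matlab/2015a" effectively does: prepend the package's
# bin directory to PATH. The prefix here is hypothetical.
MODULE_BIN="/soft/matlab/2015a/bin"
PATH="${MODULE_BIN}:${PATH}"

# Confirm the directory now leads the search path.
case ":${PATH}:" in
  *":${MODULE_BIN}:"*) ON_PATH=yes ;;
  *)                   ON_PATH=no  ;;
esac
echo "$ON_PATH"
```

This is why `module purge` is useful before a job: it undoes all such edits so the job's environment is reproducible.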
Job Scheduling
On MSI systems, calculations are performed within “jobs”. A job is a planned calculation that will run for a specified length of time on a specified set of hardware.
There are two types of job:
1. Non-interactive (the vast majority)
2. Interactive
The job scheduler front-end is called the Portable Batch System (PBS).
Jobs start in your home directory with no modules loaded.
Job Scripts
To submit a non-interactive job, first write a PBS job script.
Example:
#!/bin/bash -l
#PBS -l walltime=8:00:00,nodes=3:ppn=8,pmem=1000mb
#PBS -m abe
#PBS -M [email protected]

cd ~/program_directory
module load intel
module load ompi/intel
mpirun -np 24 program_name < inputfile > outputfile
Job Submission
To submit a job script use the command:
qsub -q queuename scriptname
A list of queues available on different systems can be found here:
https://www.msi.umn.edu/queues
Submit jobs to a queue which is appropriate for the resources needed.
Resources to consider when choosing a queue:
● Walltime
● Total cores and cores per node
● Memory
● Special hardware (GPUs, etc.)
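For example, submitting the earlier script to a hypothetical queue named `small` would look like the sketch below. The queue name is illustrative — real queue names vary by system, so check the queues page first.

```shell
# Assemble a submission command (queue name "small" and script name
# "myjob.pbs" are placeholders for illustration).
QUEUE=small
SCRIPT=myjob.pbs
CMD="qsub -q ${QUEUE} ${SCRIPT}"
echo "$CMD"
# When qsub itself runs successfully, it prints the new job's ID.
```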
Selecting Resources
Options to QSUB
o qsub -I [options]
• -l walltime=W (maximum wall clock time, e.g., walltime=24:00:00)
• -l nodes=X:ppn=Y (X nodes with Y processor cores per node)
• -l pmem=M (memory per process) -OR- -l mem=M (total memory for the job)
• -q “QueueName”
• -A groupname (the group whose allocation is charged)
• -l gres=MATLAB+4 (generic resources, e.g., software licenses)
o Enable graphics via X-tunneling (-X)
NOTE: you can specify options on the qsub command line or embed them in your PBS script with the "#PBS " prefix.
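A sketch of the embedded form: the same resources that could be requested on the command line are written into the script header with the `#PBS` prefix. Queue and group names below are placeholders.

```shell
# Generate a job script whose header embeds qsub options as #PBS directives.
# "batch" and "mygroup" are illustrative names, not real MSI queues/groups.
cat > myjob.pbs <<'EOF'
#!/bin/bash -l
#PBS -l walltime=12:00:00,nodes=2:ppn=8,pmem=2gb
#PBS -q batch
#PBS -A mygroup
EOF

# Count the embedded directives (one per #PBS line).
NDIRECTIVES=$(grep -c '^#PBS' myjob.pbs)
echo "$NDIRECTIVES"
```

With the options embedded, submission is just `qsub myjob.pbs`; command-line options still override the embedded ones if both are given.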
Job Submission
To view queued jobs use the commands:
qstat -u username
showq -w user=username

For detailed information:
checkjob -v jobnumber

To cancel a submitted job use the command:
qdel jobnumber
Interactive Jobs
Nodes may be requested for interactive use using the command:
qsub -I -X -l walltime=1:00:00,nodes=1:ppn=8,mem=2gb
The job waits in the queue like all jobs; when it begins, control returns to your terminal with a shell on the assigned node.
Build a PBS Script Interactively
• Use a terminal to complete your work
o Automate the process
o All commands go into a single BASH script (e.g., workflow.bash)
• Remember:
o Errors will interrupt the script and fail your job (work around them)
o Use relative file paths in case the script is moved (e.g., /home/bollige/boll0107/test.m -> ~/test.m or $HOME/test.m)
o Use BASH variables whenever possible to generalize
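A minimal workflow.bash following those rules. The filenames and the MATLAB step are illustrative; note the $HOME-relative path replacing the absolute /home/bollige/boll0107 one, and the tolerated cleanup step so a non-fatal error does not kill the job.

```shell
#!/bin/bash
# workflow.bash -- sketch of an automated, relocatable workflow.
INPUT="$HOME/test.m"        # $HOME-relative: survives a move between accounts
OUTDIR="$HOME/results"      # a BASH variable: change once, used everywhere
mkdir -p "$OUTDIR"

# Work around non-fatal errors so one bad step doesn't fail the whole job:
rm -f "$OUTDIR/stale.lock" 2>/dev/null || true

# The real work would go here; echoed so the sketch runs anywhere.
echo "would run: matlab -nodisplay < $INPUT > $OUTDIR/test.out"
```

Once the script works interactively, the same file becomes the body of a PBS job script by adding `#PBS` directives at the top.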
Service Units (SUs)
Jobs consume Service Units (SUs), which roughly correspond to processor/CPU time.
Each research group is given a service unit allocation at the beginning of the year.
To view the number of service units remaining use the command: acctinfo
If a group is using service units faster than the "fairshare target", then the group's jobs will have lower queue priority.
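As a back-of-the-envelope estimate, the earlier allocation figures (70,000 SUs ≈ 100,000 CPU-hours) imply roughly 0.7 SU per CPU-hour. The exact charge rate is resource dependent; this sketch only illustrates the arithmetic.

```shell
# Rough SU cost of a job: cores x wall hours x (SUs per CPU-hour).
# The 0.7 rate is inferred from 70,000 SUs ~ 100,000 CPU-hours; the real
# rate varies by hardware, so check acctinfo for actual balances.
CORES=24
HOURS=10
CPU_HOURS=$((CORES * HOURS))
SUS=$(awk -v ch="$CPU_HOURS" 'BEGIN { printf "%d", ch * 0.7 }')
echo "${CPU_HOURS} CPU-hours ~ ${SUS} SUs"
```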
Simple Parallelization: Backgrounding
Most easily done with single-node jobs.
#!/bin/bash
#PBS -l walltime=8:00:00,nodes=1:ppn=8,pmem=1000mb
#PBS -m abe
#PBS -M [email protected]

cd $PBS_O_WORKDIR
module load example/1.0
./program1.exe < input1 > output1 &
./program2.exe < input2 > output2 &
./program3.exe < input3 > output3 &
./program4.exe < input4 > output4 &
./program5.exe < input5 > output5 &
./program6.exe < input6 > output6 &
./program7.exe < input7 > output7 &
./program8.exe < input8 > output8 &
wait
Simple Parallelization: Job Arrays
Works best on Mesabi.
Template Job Script, template.pbs:
#!/bin/bash
#PBS -l walltime=8:00:00,nodes=1:ppn=8,pmem=1000mb
#PBS -m abe
#PBS -M [email protected]

cd $PBS_O_WORKDIR
module load example/1.0
program.exe < input$PBS_ARRAYID > output$PBS_ARRAYID
Submit an array of 10 jobs:
qsub -t 1-10 template.pbs
Simple Parallelization: MPI
Works best on Mesabi.
Template Job Script, template.pbs:

#!/bin/bash
#PBS -l walltime=8:00:00,nodes=2:ppn=8,pmem=1000mb
#PBS -m abe
#PBS -M [email protected]

cd $PBS_O_WORKDIR
module load intel impi
echo "Running job with $PBS_NUM_NODES nodes * $PBS_NUM_PPN ppn = $PBS_NP processes"

mpirun -np $PBS_NP ./program.exe
Submit the job:
qsub template.pbs
Simple Parallelization: GNU Parallel
Runs one command per node.
Template Job Script, template.pbs:

#!/bin/bash
#PBS -l walltime=8:00:00,nodes=2:ppn=8,pmem=1000mb
#PBS -m abe
#PBS -M [email protected]

cd $PBS_O_WORKDIR
module load parallel

echo "Running job with $PBS_NUM_NODES nodes * $PBS_NUM_PPN ppn = $PBS_NP processes"

# $PBS_NODEFILE lists each node once per core; deduplicate it so that
# --jobs 1 runs one command per node at a time.
sort -u $PBS_NODEFILE > unique-nodelist.txt

env_parallel --jobs 1 --sshloginfile unique-nodelist.txt \
  --workdir $PWD < commands.txt
Submit the job:
qsub template.pbs
Minnesota Supercomputing Institute
The University of Minnesota is an equal opportunity educator and employer. This PowerPoint is available in alternative formats upon request. Direct requests to Minnesota Supercomputing Institute, 599 Walter Library, 117 Pleasant St. SE, Minneapolis, Minnesota, 55455, 612-624-0528.
Web: www.msi.umn.edu
Email: [email protected]
Telephone: (612) 626-0802