Getting Started on Topsail
Charles Davis
ITS Research Computing
February 10, 2010
Outline
History of Topsail
Structure of Topsail
File Systems on Topsail
Compiling on Topsail
Topsail and LSF
Initial Topsail Cluster
Initially: 1040 CPU Dell Linux Cluster
•520 dual socket, single core nodes
Infiniband interconnect
Intended for capability research
Housed in ITS Franklin machine room
Fast and efficient for large computational jobs
Topsail Upgrade 1
Topsail upgraded to 4,160 CPUs
• Replaced blades with dual socket, quad core Intel Xeon 5345 (Clovertown) processors
• Quad-core with 8 CPUs/node
Increased the number of processors, but decreased individual processor speed (was 3.6 GHz, now 2.33 GHz)
Decreased energy usage and cooling requirements
Summary: slower clock speed, better memory bandwidth, less heat
• Benchmarks tend to run at the same speed per core
• Topsail shows a net ~4X improvement
• Of course, this number is VERY application dependent
Topsail – Upgraded Blades
52 chassis: basis of node names
• Each holds 10 blades -> 520 blades total
• Nodes = cmp-chassis#-blade#
Old compute blades: Dell PowerEdge 1855
• 2 single-core Intel Xeon EM64T 3.6 GHz procs
• 800 MHz FSB
• 2 MB L2 cache per socket
• Intel NetBurst microarchitecture
New compute blades: Dell PowerEdge 1955
• 2 quad-core Intel 2.33 GHz procs
• 1333 MHz FSB
• 4 MB L2 cache per socket
• Intel Core 2 microarchitecture
Topsail Upgrade 2
Most recent Topsail upgrade (Feb/Mar ‘09)
Refreshed much of the infrastructure
Improved IBRIX filesystem
Replaced and improved Infiniband cabling
Moved cluster to ITS-Manning building
•Better cooling and UPS
Current Topsail Architecture
Login node: 8 CPUs @ 2.3 GHz Intel EM64T, 12 GB memory
Compute nodes: 4,160 CPUs @ 2.3 GHz Intel EM64T, 12 GB memory
Shared disk: 39 TB IBRIX parallel file system
Interconnect: Infiniband 4x SDR
64-bit Linux operating system
Multi-Core Computing
Processor Structure on Topsail
• 500+ nodes
• 2 sockets/node
• 1 processor/socket
• 4 cores/processor (Quad-core)
• 8 cores/node
http://www.tomshardware.com/2006/12/06/quad-core-xeon-clovertown-rolls-into-dp-servers/page3.html
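The cluster totals follow directly from the per-node layout above; a quick shell check using the slide's numbers:

```shell
# Core counts implied by the node layout on the slide
sockets_per_node=2
cores_per_socket=4
nodes=520
cores_per_node=$((sockets_per_node * cores_per_socket))
total_cores=$((nodes * cores_per_node))
echo "$cores_per_node cores/node, $total_cores cores total"
```

This reproduces the 8 cores/node and 4,160-CPU figures quoted elsewhere in the deck.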
Multi-Core Computing
The trend in High Performance Computing is towards multi-core or many core computing.
More cores at slower clock speeds for less heat
Now, dual and quad core processors are becoming common.
Soon 64+ core processors will be common
•And these may be heterogeneous!
The Heat Problem
Taken From: Jack Dongarra, UT
More Parallelism
Taken From: Jack Dongarra, UT
Infiniband Connections
Connections come in single (SDR), double (DDR), and quad (QDR) data rates.
• Topsail is SDR: single data rate is 2.5 Gbit/s in each direction per link.
Links can be aggregated: 1x, 4x, 12x.
• Topsail is 4x. Links use 8B/10B encoding (10 bits carry 8 bits of data), so the useful data rate is four-fifths the raw rate. Thus single, double, and quad data rates carry 2, 4, or 8 Gbit/s per link, respectively.
The data rate for Topsail is 8 Gbit/s (4x SDR).
Topsail Network Topology
Infiniband Benchmarks
Point-to-point (PTP) intranode communication on Topsail for various MPI send types
Peak bandwidth: 1288 MB/s
Minimum latency (1-way): 3.6 µs
Infiniband Benchmarks
Scaled aggregate bandwidth for MPI Broadcast on Topsail
Note good scaling throughout the tested range (from 24-1536 cores)
Login to Topsail
Use ssh to connect:
• ssh topsail.unc.edu
Windows users can connect with SSH Secure Shell.
For using interactive programs with X-Windows display:
• ssh -X topsail.unc.edu
• ssh -Y topsail.unc.edu
Off-campus users (i.e., domains outside of unc.edu) must use a VPN connection.
Topsail File Systems
39TB IBRIX Parallel File System
Split into Home and Scratch Space
Home: /ifs1/home/my_onyen
Scratch: /ifs1/scr/my_onyen
Mass Storage
•Only Home is backed up
•/ifs1/home/my_onyen/ms
File System Limits
500GB Total Limit per User
Home – 15GB limit for Backups
Scratch:
•No limit except 500GB total
•Not backed up
•Periodically cleaned
Few installed packages/programs
Compiling on Topsail
Modules
Serial Programming
• Intel Compiler Suite for Fortran77, Fortran90, C and C++ (recommended by Research Computing)
• GNU
Parallel Programming
• MPI
• OpenMP: must use the Intel Compiler Suite; compiler flag -openmp; must set OMP_NUM_THREADS in the submission script
Compiling Modules
Module commands
• module – list commands
• module avail – list available modules
• module add – add a module temporarily
• module list – list modules being used
• module clear – remove modules temporarily
Add modules in your startup files to load them permanently
Available Compilers
Intel – ifort, icc, icpc
GNU – gcc, g++, gfortran
Libraries – BLAS/LAPACK
MPI:
• mpicc/mpiCC
• mpif77/mpif90
The mpixx commands are just wrappers around the Intel or GNU compilers
• Add the location of MPI libraries and include files
• Provided as a convenience
Test MPI Compile
Copy cpi.c to your scratch directory:
• cp /ifs1/scr/cdavis/Topsail/cpi.c /ifs1/scr/my_onyen/.
Add Intel module:
•module load hpc/mvapich-intel-11
Confirm Intel module:
•which mpicc
Compile code:
•mpicc –o cpi cpi.c
MPI/OpenMP Training
Courses are taught throughout the year by Research Computing: http://learnit.unc.edu/workshops
Next course:
•MPI – Summer
•OpenMP – March 3rd
Running Programs on Topsail
Upon ssh to Topsail, you are on the Login node.
Programs SHOULD NOT be run on Login node.
Submit programs to one of 4,160 Compute nodes.
Submit jobs using Load Sharing Facility (LSF).
Job Scheduling Systems
Allocates compute nodes to job submissions based on user priority, requested resources, execution time, etc.
Many types of schedulers
•Load Sharing Facility (LSF) – Used by Topsail
•IBM LoadLeveler
•Portable Batch System (PBS)
•Sun Grid Engine (SGE)
Load Sharing Facility (LSF)
[Diagram: a job submitted with "bsub app" flows from the submission host (LIM, Batch API) through the master host (MLIM, MBD) and its queue to an execution host (SBD, child SBD, LIM, RES), which runs the user job; load information flows back from other hosts.]
LIM – Load Information Manager
MLIM – Master LIM
MBD – Master Batch Daemon
SBD – Slave Batch Daemon
RES – Remote Execution Server
Submitting a Job to LSF
For a compiled MPI job:
• bsub -n "< number CPUs >" -o out.%J -e err.%J -a mvapich mpirun ./mycode
bsub – the LSF command that submits a job to the compute nodes
bsub -o and bsub -e
• Job output and errors are saved to files in the submission directory
Queue System on Topsail
Topsail uses queues to distribute jobs.
Specify the queue with -q in bsub:
• bsub -q week …
No -q specified = default queue (week)
Queues vary depending on the size and required time of jobs
See a listing of queues:
• bqueues
Topsail Queues

Queue    Time Limit   Jobs/User   CPU/Job
int      2 hrs        128         ---
debug    2 hrs        128         ---
day      24 hrs       1024        4 – 128
week     1 week       1024        4 – 128
512cpu   4 days       1024        32 – 1024
128cpu   4 days       1024        32 – 128
32cpu    2 days       1024        4 – 32
chunk    4 days       1024        Batch Jobs

• Most jobs do not scale very well over 128 CPUs.
Submission Scripts
It is easier to write a submission script that can be edited for each job submission.
Example script file – run.hpl:
#BSUB -n "< number CPUs >"
#BSUB -e err.%J
#BSUB -o out.%J
#BSUB -a mvapich
mpirun ./mycode
Submit with: bsub < run.hpl
More bsub Options
bsub -x – NO LONGER USE!
• Exclusive use of a node
• Use extensively when first testing code
bsub -n 4 -R span[ptile=4]
• Forces all 4 processors to be on the same node
• Similar to -x
bsub -J job_name – assigns a name to the job
See the man pages for a complete description:
• man bsub
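Combining the queue and node-placement options with the script form shown earlier, a run script might look like the following sketch (assuming a week-queue MPI job with 8 ranks packed on one node; mycode is the placeholder program name from the slides):

```shell
#!/bin/bash
# Illustrative LSF run script (a sketch, not site policy):
# week queue, 8 ranks, all on one node via span[ptile=8]
#BSUB -q week
#BSUB -n 8
#BSUB -R span[ptile=8]
#BSUB -o out.%J
#BSUB -e err.%J
#BSUB -a mvapich
mpirun ./mycode
```

Submit it with bsub < run.hpl, exactly as with the simpler script.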
Performance Test
Gromacs MD simulation of bulk water
Simulation setups:
• Case 1: -n 8 -R span[ptile=1]
• Case 2: -n 8 -R span[ptile=8]
Simulation times (1 ns MD):
• Case 1: 1445 sec
• Case 2: 1255 sec
Using a single node (Case 2) improved speed by about 13%.
Following Job After Submission
bjobs
• bjobs -l JobID
• Shows the current status of the job
bhist
• bhist -l JobID
• More detailed information regarding the job's history
bkill
• bkill -r JobID
• Ends a job prematurely
Submit Test MPI Job
Submit the test MPI program on Topsail:
• bsub -q week -n 4 -o out.%J -e err.%J -a mvapich mpirun ./cpi
Follow the submission: bjobs
Output is stored in the out.%J file (%J is replaced by the job ID)
Pre-Compiled Programs on Topsail
Some applications are precompiled for all users:
• /ifs1/apps
• Amber, Gaussian, Gromacs, NetCDF, NWChem, R
Add a module to your path using the module commands:
• module avail – shows available applications
• module add – adds a specific application
Once the module command is used, the executable is available on your PATH
Test Gaussian Job on Topsail
Add the Gaussian application to your path:
• module add apps/gaussian-03e01
• module list
Copy the input com file:
• cp /ifs1/scr/cdavis/Topsail/water.com .
Check that the executable has been added to your path:
• echo $PATH
Submit the job:
• bsub -q week -n 4 -e err.%J -o out.%J g03 water.com
Common Error 1
If a job immediately dies, check the err.%J file.
The err.%J file has the error:
• Can't read MPIRUN_HOST
Problem: MPI environment settings were not correctly applied on the compute node
Solution: include mpirun in the bsub command
Common Error 2
The job immediately dies after submission, and the err.%J file is blank.
Problem: ssh passwords and keys were not correctly set up at the initial login to Topsail
Solution:
• cd ~/.ssh/
• mv id_rsa id_rsa-orig
• mv id_rsa.pub id_rsa.pub-orig
• Log out of Topsail
• Log in to Topsail and accept all defaults
Interactive Jobs
To run long shell scripts on Topsail, use the int queue:
bsub -q int -Ip /bin/bash
• This bsub command provides a prompt on a compute node
• You can run a program or shell script interactively from the compute node
The TotalView debugger can also be run interactively on Topsail
Further Help with Topsail
More details about using Topsail can be found in the Getting Started on Topsail help document:
• http://help.unc.edu/?id=6214
• http://keel.isis.unc.edu/wordpress/ – ON CAMPUS only
For assistance with Topsail, please contact the ITS Research Computing group:
• Email: [email protected]
For immediate assistance, see the manual pages on Topsail:
• man <command>