
Page 1: Choosing Resources Wisely - HARVARD UNIVERSITY

FAS Research Computing

Choosing Resources Wisely

Plamen Krastev
Office: 38 Oxford, Room 117
Email: [email protected]

Page 2: Choosing Resources Wisely - HARVARD UNIVERSITY

Objectives

• Inform you of available computational resources
• Help you choose appropriate computational resources for your research
• Provide guidance for scaling up your applications and performing computations more efficiently
• More efficient use = more resources available to do research
• Enable you to “Work smarter, better, faster”

Page 3: Choosing Resources Wisely - HARVARD UNIVERSITY

Outline

• Choosing computational resources
• Overview of available RC resources
• Partition / Queue
• Time
• Number of nodes and cores
• Memory
• Storage
• Examples

Page 4: Choosing Resources Wisely - HARVARD UNIVERSITY

What resources do I need?

• Is my code serial or parallel?
• How many cores and/or nodes does it need?
• How much memory does it require?
• How long does my code take to run?
• How big is the input / output data for each run?
• How is the input data read by the code (e.g., hardcoded, keyboard, parameter/data file(s), external database/website, etc.)?

Page 5: Choosing Resources Wisely - HARVARD UNIVERSITY

What resources do I need?

• How is the output data written by the code (standard output/screen, data file(s), etc.)?
• How many tasks/jobs/runs do I need to complete?
• What is my timeframe / deadline for the project (e.g., paper, conference, thesis, etc.)?
• What computational resources are available at Research Computing?

Page 6: Choosing Resources Wisely - HARVARD UNIVERSITY

RC resources: Odyssey

Odyssey is a large-scale heterogeneous HPC cluster.

Compute:
• 60,000+ compute cores (and increasing)
• Cores per node: 8 to 64
• Memory per node: 12 GB to 512 GB (4 GB/core)
• 1,000,000+ NVIDIA GPU cores

Storage:
• Over 35 PB of storage
• Home directories: 100 GB
• Lab space: initial 4 TB at $0, with expansion available for purchase at $45/TB/year
• Local scratch: 270 GB/node
• Global scratch: high-performance shared scratch, 1 PB total, Lustre file system

https://rc.fas.harvard.edu/resources/odyssey-storage

Page 7: Choosing Resources Wisely - HARVARD UNIVERSITY

RC resources: Odyssey

Odyssey is a large-scale heterogeneous HPC cluster.

Software:
• CentOS
• SLURM job manager
• 1,000+ scientific tools and programs: https://portal.rc.fas.harvard.edu/apps/modules

Interconnect:
• 2 underlying networks connecting 3 data centers
• TCP/IP network
• Low-latency 56 Gb/s InfiniBand network: inter-node parallel computing, fast access to Lustre-mounted storage

Hosted Machines:
• 300+ virtual machines
• Lab instrument workstations

Page 8: Choosing Resources Wisely - HARVARD UNIVERSITY

Available Storage

• Home Directories: size limit 100 GB; available on all cluster nodes + desktop/laptop; backup: hourly snapshot + daily offsite; retention: indefinite; performance: moderate, not suitable for high I/O; cost: free.
• Lab Storage: size limit 4 TB+; available on all cluster nodes + desktop/laptop; backup: daily offsite; retention: indefinite; performance: moderate, not suitable for high I/O; cost: 4 TB free, expansion at $45/TB/yr.
• Local Scratch: size limit 270 GB/node; available on the local compute node only; backup: none; retention: job duration; performance: suited for small-file I/O-intensive jobs; cost: free.
• Global Scratch: size limit 1.2 PB total; available on all cluster nodes; backup: none; retention: 90 days; performance: appropriate for large-file I/O-intensive jobs; cost: free.
• Persistent Research Data: size limit 3 PB; available on IB-connected cluster nodes only; backup: none (external repos); retention: 3-9 months; performance: appropriate for large I/O-intensive jobs; cost: free.

Page 9: Choosing Resources Wisely - HARVARD UNIVERSITY

Partition / Queue

• general: time limit 7 days; 177 nodes; 64 cores/node; 256 GB/node
• serial_requeue: time limit 7 days; 1,071 nodes; 8-64 cores/node; 12-512 GB/node
• interact: time limit 3 days; 8 nodes; 64 cores/node; 256 GB/node
• bigmem: no time limit; 7 nodes; 64 cores/node; 512 GB/node
• unrestricted: no time limit; 8 nodes; 64 cores/node; 256 GB/node
• Lab queues: no time limit; 1,154 nodes; 8-64 cores/node; 12-512 GB/node

Batch jobs:
#SBATCH -p general # Partition name

Interactive or test jobs:
srun -p interact OTHER_OPTIONS

https://rc.fas.harvard.edu/resources/running-jobs/#SLURM_partitions
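The partition limits above can also be queried directly on the cluster with SLURM's sinfo command; a minimal sketch (the output format string is just one reasonable choice, not from the slides):

# Partition name, time limit, node count, CPUs per node, memory per node (MB)
sinfo -o "%P %l %D %c %m"
# Restrict to a single partition, e.g. general
sinfo -p general -o "%P %l %D %c %m"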

Page 10: Choosing Resources Wisely - HARVARD UNIVERSITY

Time

How long does my code take to run?

Batch jobs:
#SBATCH -p serial_requeue
#SBATCH -t 0-02:00 # Time in D-HH:MM

Interactive or test jobs:
srun -t 0-02:00 -p interact OTHER_JOB_OPTIONS
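One way to calibrate the time request is to look at the wall time of a comparable finished job; a minimal sketch using the sacct accounting command covered later in this deck (the job ID is a placeholder):

# Compare actual run time against the requested limit
sacct -j 12345678 -o JobID,Elapsed,Timelimit,State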

Page 11: Choosing Resources Wisely - HARVARD UNIVERSITY

Number of nodes and cores

Is my code serial or parallel?

Serial (single-core) jobs

Batch jobs:
#SBATCH -p serial_requeue
#SBATCH -c 1 # Number of cores

Interactive or test jobs:
srun -c 1 -p interact OTHER_JOB_OPTIONS

Core / Thread / Process / CPU
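The #SBATCH lines go into a job script that is then handed to the scheduler; a minimal sketch of the submit-and-check cycle (the script name is a placeholder):

sbatch serial_job.sh   # submit the batch script; SLURM prints the job ID
squeue -u $USER        # check its state (PD = pending, R = running)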

Page 12: Choosing Resources Wisely - HARVARD UNIVERSITY

Number of nodes and cores

Parallel shared memory (single-node) jobs

Examples:
• OpenMP (Fortran, C/C++)
• MATLAB Parallel Computing Toolbox (PCT)
• Python (e.g., threading, multiprocessing)
• R (e.g., multicore)

Batch jobs:
#SBATCH -p general # Partition
#SBATCH -N 1 # Number of nodes
#SBATCH -c 4 # Number of cores (per task)
srun -c 4 PROGRAM PROGRAM_OPTIONS

Interactive or test jobs:
srun -p interact -N 1 -c 4 OTHER_OPTIONS
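For OpenMP codes, the thread count is normally tied to the -c request so the program uses exactly the cores it was allocated; a minimal sketch (this mirrors the fuller OpenMP example later in the deck; the executable name is a placeholder):

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # match threads to requested cores
srun -c $SLURM_CPUS_PER_TASK ./my_openmp_code.x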

Page 13: Choosing Resources Wisely - HARVARD UNIVERSITY

Number of nodes and cores

Parallel distributed memory (multi-node) jobs

Examples:
• MPI (openmpi, impi, mvapich) with Fortran or C/C++ code
• MATLAB Distributed Computing Server (DCS)
• Python (e.g., mpi4py)
• R (e.g., Rmpi, snow)

Batch jobs:
#SBATCH -p general # Partition
#SBATCH -n 4 # Number of tasks
srun -n 4 PROGRAM PROGRAM_OPTIONS

Interactive or test jobs:
srun -p interact -n 4 OTHER_OPTIONS
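With -n alone, SLURM is free to place the tasks on one or several nodes. If the layout matters, the node count and tasks per node can be pinned explicitly; a hedged sketch using standard SLURM options not shown on the slide:

#SBATCH -N 2                  # number of nodes
#SBATCH --ntasks-per-node=2   # MPI tasks per node (2 nodes x 2 tasks = 4 tasks)
srun -n 4 PROGRAM PROGRAM_OPTIONS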

Page 14: Choosing Resources Wisely - HARVARD UNIVERSITY

Memory

Serial and parallel shared memory (single-node) jobs

Batch jobs:
#SBATCH -p serial_requeue # Partition
#SBATCH --mem=4000 # Memory / node in MB

Interactive or test jobs:
srun --mem=4000 -p interact OTHER_OPTIONS

Parallel distributed memory (multi-node) jobs

Batch jobs:
#SBATCH -p general # Partition
#SBATCH -n 4 # Number of tasks
#SBATCH --mem-per-cpu=4000 # Memory / core in MB

Interactive or test jobs:
srun --mem-per-cpu=4000 -n 4 -p interact OTHER_OPTIONS
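The following slides cover checking memory with top (while a job runs interactively) and sacct (after it completes). As an additional option not covered on the slides, SLURM's sstat command can report the memory high-water mark of a batch job that is still running; a sketch (the job ID is a placeholder; the ".batch" step suffix may be needed):

sstat -j 12345678.batch --format=JobID,MaxRSS,MaxVMSize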

Page 15: Choosing Resources Wisely - HARVARD UNIVERSITY

Memory

How much memory does my code require?

• Understand your code and how the algorithms scale analytically
• Run an interactive job and monitor memory usage (with the “top” Unix command)
• Run a test batch job and check memory usage after the job has completed (with the “sacct” SLURM command)

Page 16: Choosing Resources Wisely - HARVARD UNIVERSITY

Memory

Know your code

Example: A real*8 (Fortran), or double (C/C++), matrix of dimension 100,000 x 100,000 requires ~80 GB of RAM.

Data type (Fortran / C) and size in bytes:
• integer*4 / int: 4
• integer*8 / long: 8
• real*4 / float: 4
• real*8 / double: 8
• complex*8 / float complex: 8
• complex*16 / double complex: 16
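To make the ~80 GB figure concrete, the estimate is just the number of elements times bytes per element; a minimal back-of-the-envelope sketch in the shell (decimal units, as on the slides):

N=100000
echo "$(( N * N * 8 / 1000000000 )) GB"   # 100,000 x 100,000 real*8 -> 80 GB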

Page 17: Choosing Resources Wisely - HARVARD UNIVERSITY

Memory

Run an interactive job and monitor memory usage (with the “top” Unix command).

Example: Check the memory usage of a matrix diagonalization code.

Request an interactive bash shell session:
srun -p interact -n 1 -t 0-02:00 --pty --mem=4000 bash

Run the code, e.g.,
./matrix_diag.x

Open a new shell terminal and ssh to the compute node where the interactive job was dispatched, e.g.,
ssh holy2a18307

In the new shell terminal run top, e.g.,
top -u pkrastev
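If you do not remember which node the interactive job landed on, squeue will report it; a minimal sketch (the username is the slide's example):

squeue -u pkrastev -o "%i %P %j %N"   # job ID, partition, job name, node list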

Page 18: Choosing Resources Wisely - HARVARD UNIVERSITY

Memory

Run 1:
Matrix dimension = 3,000 x 3,000 (real*8)
Needs 3,000 x 3,000 x 8 / 1,000,000 = ~72 MB of RAM

Page 19: Choosing Resources Wisely - HARVARD UNIVERSITY

Memory

Run 2: Input size changed
Double the matrix dimension, quadruple the required memory.
Matrix dimension = 6,000 x 6,000 (real*8)
Needs 6,000 x 6,000 x 8 / 1,000,000 = ~288 MB of RAM

Page 20: Choosing Resources Wisely - HARVARD UNIVERSITY

sacct overview

• sacct queries the SLURM accounting database.
  Every 30 seconds the node collects the CPU and memory usage of all process IDs belonging to a given job; after the job ends, this data is sent to slurmdb.
• Common flags:
  -j jobid or --name=jobname
  -S YYYY-MM-DD and -E YYYY-MM-DD
  -o output_options

Example output options:
JobID,JobName,NCPUS,Nnodes,Submit,Start,End,CPUTime,TotalCPU,ReqMem,MaxRSS,MaxVMSize,State,Exit,Node

http://slurm.schedmd.com/sacct.html
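A sketch putting the flags and output fields together (the job ID, job name, and dates are placeholders; "Exit" and "Node" on the slide correspond to sacct's ExitCode and NodeList field names):

# Query one finished job by ID and print the fields listed above
sacct -j 12345678 -o JobID,JobName,NCPUS,NNodes,Submit,Start,End,CPUTime,TotalCPU,ReqMem,MaxRSS,MaxVMSize,State,ExitCode,NodeList
# Or select by job name and date range instead
sacct --name=lapack_test -S 2016-01-01 -E 2016-12-31 -o JobID,Elapsed,State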

Page 21: Choosing Resources Wisely - HARVARD UNIVERSITY

Memory

Run a test batch job and check memory usage after the job has completed (with the “sacct” SLURM command).

Example:

[pkrastev@sa01 Resources]$ sacct -o ReqMem,MaxRSS -j 70446364
    ReqMem     MaxRSS
---------- ----------
     320Mn    286648K

MaxRSS = 286648 KB = 286.648 MB
ReqMem = 320 MB per node ("Mn"), roughly 10% above MaxRSS

https://rc.fas.harvard.edu/resources/faq/how-to-know-what-memory-limit-to-put-on-my-job
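The implied rule of thumb is to request a bit more than the measured MaxRSS; a minimal sketch of that arithmetic (values from the slide; the ~10% margin is just the slide's example):

MAXRSS_MB=287                               # measured MaxRSS, rounded up, in MB
echo "--mem=$(( MAXRSS_MB * 110 / 100 ))"   # ~10% head room -> --mem=315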

Page 22: Choosing Resources Wisely - HARVARD UNIVERSITY

Storage

Home directories (/n/home*) and lab storage are not appropriate for I/O-intensive jobs or large numbers of jobs. Typical use is job scripts, in-house analysis codes, and self-installed software.

For jobs that create a high volume of small files (< 10 MB), use local scratch. You need to copy your input data to /scratch and move output data to a different location after the job completes.

For I/O-intensive jobs, with large data files (> 100 MB) and/or a large number of data files (hundreds of 10-100 MB files), use the global scratch file system /n/regal.

https://rc.fas.harvard.edu/policy-scratch
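A minimal sketch of the copy-in / copy-out pattern described above, inside a batch job; the directory layout under /scratch and the program and file names are illustrative assumptions, not site policy:

SCRATCH_DIR=/scratch/$USER/$SLURM_JOB_ID      # illustrative per-job directory
mkdir -p "$SCRATCH_DIR"
cp "$SLURM_SUBMIT_DIR"/input.dat "$SCRATCH_DIR"/
cd "$SCRATCH_DIR"
"$SLURM_SUBMIT_DIR"/my_code.x input.dat > output.dat   # placeholder program
cp output.dat "$SLURM_SUBMIT_DIR"/
cd "$SLURM_SUBMIT_DIR" && rm -rf "$SCRATCH_DIR"        # clean up local scratch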

Page 23: Choosing Resources Wisely - HARVARD UNIVERSITY

Storage

60 Oxford St:
• Initial lab shares (4 TB)
• Legacy equipment

1 Summer Street:
• Personal home directories
• Purchased lab shares
• Older lab-owned compute nodes

Holyoke, MA:
• Global scratch high-performance file system
• Compute nodes > 2012 (33K+ cores)

Topology may affect the efficiency of your work! For best performance, storage needs to be close to the compute nodes.

Page 24: Choosing Resources Wisely - HARVARD UNIVERSITY

Storage Utilization

Use the “du” Unix command to check disk usage, e.g.,

du -h $HOME
...
37G /n/home06/pkrastev

https://en.wikipedia.org/wiki/Du_(Unix)
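To see which subdirectories account for most of the usage, a common variant (GNU du and sort options; a sketch, not from the slides):

du -h --max-depth=1 $HOME | sort -h   # per-subdirectory totals, smallest to largest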

Page 25: Choosing Resources Wisely - HARVARD UNIVERSITY

Examples

Serial application

#!/bin/bash
#SBATCH -J lapack_test       # Job name
#SBATCH -o lapack_test.out   # Standard output file
#SBATCH -e lapack_test.err   # Standard error file
#SBATCH -p serial_requeue    # Partition
#SBATCH -t 0-00:30           # Time limit (D-HH:MM)
#SBATCH -N 1                 # Number of nodes
#SBATCH -c 1                 # Number of cores
#SBATCH --mem=4000           # Memory per node in MB

# Load required modules
source new-modules.sh

# Run program
./lapack_test.x

Page 26: Choosing Resources Wisely - HARVARD UNIVERSITY

Examples

Parallel OpenMP (single-node) application

#!/bin/bash
#SBATCH -J omp_dot           # Job name
#SBATCH -o omp_dot.out       # Standard output file
#SBATCH -e omp_dot.err       # Standard error file
#SBATCH -p general           # Partition
#SBATCH -t 0-02:00           # Time limit (D-HH:MM)
#SBATCH -N 1                 # Number of nodes
#SBATCH -c 4                 # Number of cores
#SBATCH --mem=16000          # Memory per node in MB

# Set up environment
source new-modules.sh
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Run program
srun -c $SLURM_CPUS_PER_TASK ./omp_dot.x

Page 27: Choosing Resources Wisely - HARVARD UNIVERSITY

Examples

MATLAB Parallel Computing Toolbox (single-node) application

#!/bin/bash
#SBATCH -J parallel_monte_carlo       # Job name
#SBATCH -o parallel_monte_carlo.out   # Standard output file
#SBATCH -e parallel_monte_carlo.err   # Standard error file
#SBATCH -N 1                          # Number of nodes
#SBATCH -c 8                          # Number of cores
#SBATCH -t 0-03:30                    # Time limit (D-HH:MM)
#SBATCH -p general                    # Partition
#SBATCH --mem=32000                   # Memory per node in MB

# Load required software modules
source new-modules.sh
module load matlab/R2016a-fasrc01

# Run program
srun -n 1 -c 8 matlab-default -nosplash -nodesktop -r "parallel_monte_carlo;exit"

Page 28: Choosing Resources Wisely - HARVARD UNIVERSITY

Examples

Parallel MPI (multi-node) application

#!/bin/bash
#SBATCH -J planczos              # Job name
#SBATCH -o planczos.out          # Standard output file
#SBATCH -e planczos.err          # Standard error file
#SBATCH -p general               # Partition
#SBATCH -t 30                    # Time limit in minutes
#SBATCH -n 8                     # Number of tasks
#SBATCH --mem-per-cpu=4000       # Memory per core in MB

# Load required modules
source new-modules.sh
module load intel/15.0.0-fasrc01
module load openmpi/1.8.3-fasrc02

# Run program
srun -n 8 --mpi=pmi2 ./planczos.x

https://github.com/fasrc/User_Codes

Page 29: Choosing Resources Wisely - HARVARD UNIVERSITY

Test first

Before diving right into submitting 100s or 1000s of research jobs, ALWAYS test a few first:
• ensure the job will run to completion without errors
• ensure you understand the resource needs and how they scale with different data sizes and input options
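One quick sanity check before submitting at scale (a standard SLURM feature, not mentioned on the slides): sbatch --test-only validates the script and resource request without actually queueing anything.

sbatch --test-only my_job.sh   # prints an estimated start time; no job is submitted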

Page 30: Choosing Resources Wisely - HARVARD UNIVERSITY

Contact Information

Harvard Research Computing website:
http://rc.fas.harvard.edu

Email:
[email protected]
[email protected]

Office Hours:
Wednesdays, noon-3pm
38 Oxford Street, 2nd Floor Conference Room