An Introduction to Gauss Paul D. Baines University of California, Davis November 20 th 2012

An Introduction to Gauss

Paul D. Baines

University of California, Davis

November 20th 2012

http://wiki.cse.ucdavis.edu/support:systems:gauss

12 node compute cluster (2 x 16 cores per node)

1 TB storage per node

~11 TB storage on head node

64 GB RAM per node

Total 416 cores (inc. head node)

What is Gauss?

Running large numbers of independent jobs

Running long-running jobs

Running jobs involving parallel computing

Running large-memory jobs

What is Gauss good for?

Running simple, fast jobs (just use your laptop)

Running interactive R sessions

Running GPU-based calculations

What Gauss is not designed for…

Create your public/private key (see Wiki for details)

Provide CSE with your public key and campus username (via email to [email protected])

Log in to Gauss via ssh (e.g., ssh -X [email protected])

When you ssh into Gauss, you log in to the head node

If you just directly type R at the command line, you will be running R on the head node (Please do not do this!)

To use the compute nodes you submit jobs via SLURM

SLURM manages which jobs run on which nodes

Gauss Overview

[Diagram: the Head Node, SLURM, and Compute Nodes 1, 2, 3, ..., 12]

Gauss Structure

Important commands to know:

sbatch (submit a job to Gauss)

sarray (submit an array job to Gauss)

squeue (check the status of running jobs)

scancel (cancel a job)

Examples (more detailed examples later):

squeue                 # view all running jobs
squeue -u pdbaines     # check all of pdbaines’ jobs
scancel -u pdbaines    # cancel all of pdbaines’ jobs
scancel 19213          # cancel job 19213

SLURM Basics

The compute resources (CPUs, memory) are shared across all Gauss users.

When users submit jobs, SLURM allocates resources.

You must be sure to request sufficient resources (e.g., cores, memory) for your jobs to run.

Resource requests are made when submitting your job (via your sbatch or sarray scripts).

Resources are allocated as per user requests, but strict limits are not enforced.

If you use more memory than you requested it can ~massively~ slow down your (and others’) jobs!

To check the memory usage of your jobs you can use the ‘myjobs’ command (see examples later).

Resource Allocation on Gauss

Gauss is a shared resource: your bad code can (potentially) ruin someone else’s simulation!

Test your code thoroughly before running large jobs

Make sure you request the correct amount of resources for your jobs

Regularly check memory usage for long-running jobs

Be considerate of others!

Gauss Etiquette

To use Gauss you need to know some basic Linux commands (these work on a Mac terminal too)

You should already be, or quickly get, familiar with the following commands:

ls, cd, cp, mv, rm, pwd, cat, tar, grep

It helps if you learn how to use a command-line editor such as vim or nano. (Hint: use vim.)

Aside: Linux Basics

Bob has been given a large dataset by a collaborator and told to analyze it. The job will take about 3 days to complete, so he doesn’t want to use his laptop!

Bob can submit the job on Gauss, and keep on working on other stuff in the meantime.

Ways to use Gauss: Example 1

Code files:

bob_example_1.R
bob_example_1.sh

To submit:

sbatch bob_example_1.sh

Example 1 cont…

Example 1 Code: SLURM script
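
The script shown on this slide is not reproduced in this text. As a rough idea of what a submission script such as bob_example_1.sh might contain, here is a minimal sketch that assumes standard sbatch directives. The job name, log file names, resource values and the Rscript call are all illustrative assumptions, not the author's actual script. The heredoc only writes the file to disk so that it can be inspected before submitting.

```shell
# Write a hypothetical submission script to disk (you could equally
# create it in vim or nano). Every value below is illustrative.
cat > bob_example_1.sh <<'EOF'
#!/bin/bash -l
# Name shown by squeue.
#SBATCH --job-name=bob_example_1
# Files that capture standard output and standard error.
#SBATCH --output=bob_example_1.out
#SBATCH --error=bob_example_1.err
# A single task using one core.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Assumed memory request; size it from small trial runs.
#SBATCH --mem=2G
# Assumed time limit: 4 days, leaving headroom for a ~3 day job.
#SBATCH --time=4-00:00:00

# Run the analysis non-interactively on the allocated compute node.
srun Rscript bob_example_1.R
EOF

# Review the directives SLURM will read, then submit with:
#   sbatch bob_example_1.sh
grep '^#SBATCH' bob_example_1.sh
```

Keeping the resource requests inside the script, rather than on the sbatch command line, means the same requests are reused every time the job is resubmitted.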

How do you know how much memory to request? Run small trial jobs!

Use the ‘myjobs’ command e.g.,

pdbaines@gauss:~/Examples/Example_3$ myjobs
Tue Nov 20 10:27:45 PST 2012 - pdbaines has jobs running on: c0-11
jobs for pdbaines on c0-11
USER       PID %CPU %MEM    VSZ    RSS TTY STAT START TIME COMMAND
pdbaines 13932 99.0  0.3 408424 216492 ?   R    10:25 3:12 R
pdbaines 13949 99.1  0.3 434308 242336 ?   R    10:25 3:12 R
pdbaines 13975 99.1  0.2 367720 175780 ?   R    10:25 3:12 R
pdbaines 13995 99.1  0.3 425100 233172 ?   R    10:25 3:12 R

VSZ and RSS give a rough indication of how much memory your job is using (in KB). E.g., going by VSZ, the above R jobs are using ~350-450 MB each.

Allocating Resources
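
To make the KB-to-MB reading of the myjobs output concrete, the sketch below converts the VSZ and RSS columns of the four rows shown above. It assumes the ps-style column order from that output (VSZ in column 5, RSS in column 6); the file names myjobs_sample.txt and myjobs_mb.txt are made up for the illustration.

```shell
# Sample rows copied from the myjobs output above (ps-style columns:
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND).
cat > myjobs_sample.txt <<'EOF'
pdbaines 13932 99.0 0.3 408424 216492 ? R 10:25 3:12 R
pdbaines 13949 99.1 0.3 434308 242336 ? R 10:25 3:12 R
pdbaines 13975 99.1 0.2 367720 175780 ? R 10:25 3:12 R
pdbaines 13995 99.1 0.3 425100 233172 ? R 10:25 3:12 R
EOF

# Convert VSZ (column 5) and RSS (column 6) from KB to MB, rounded down.
awk '{ printf "PID %s: VSZ %d MB, RSS %d MB\n", $2, int($5 / 1024), int($6 / 1024) }' \
  myjobs_sample.txt > myjobs_mb.txt
cat myjobs_mb.txt
```

VSZ is the larger of the two figures for each process, so basing the memory request on it, with some headroom, is the more cautious choice.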

Bob has been given 3 more datasets to analyze by his collaborator (or three new analyses to perform on the same dataset).

He just needs to set up the same thing as example 1 multiple times.

Ways to use Gauss: Example 2

Code files:

bob_example_2A.R, bob_example_2B.R, bob_example_2C.R
bob_example_2A.sh, bob_example_2B.sh, bob_example_2C.sh

To submit:

sbatch bob_example_2A.sh
sbatch bob_example_2B.sh
sbatch bob_example_2C.sh

Example 2 cont…

Example 2 Code: SLURM script
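
The script from this slide is not reproduced in this text. Since the three jobs differ only in their file names, one possible way to set them up is to generate the three submission scripts from a single template, as sketched below. The directives and resource values are illustrative assumptions, and generating the files in a loop is a convenience, not necessarily how the original scripts were written.

```shell
# Generate one hypothetical submission script per analysis (A, B, C).
# The heredoc delimiter is unquoted so ${label} is expanded in each copy.
for label in A B C; do
  cat > "bob_example_2${label}.sh" <<EOF
#!/bin/bash -l
#SBATCH --job-name=bob_example_2${label}
#SBATCH --output=bob_example_2${label}.out
#SBATCH --ntasks=1
#SBATCH --mem=2G

srun Rscript bob_example_2${label}.R
EOF
done

# List the generated scripts; each would then be submitted separately, e.g.
#   for label in A B C; do sbatch "bob_example_2${label}.sh"; done
ls bob_example_2?.sh
```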

Bob has developed a new methodology for analyzing super-complicated data.

He wants to run a simulation to prove to the world how awesome his method is compared to his competitors’ methods.

He decides to simulate 100 datasets, and analyze each of them with his method and his competitors’ methods. This is done using an array job.

Ways to use Gauss: Example 3

Bob writes an R script to randomly generate and analyze one dataset at a time

He would like to run the script 100 times on Gauss

To do this, he writes a shell script to submit to SLURM

Each run must use a different random seed, otherwise he will analyze the same dataset 100 times!

He will also need to write an R script to combine the results from all 100 jobs

He will also need a shell script to submit the post-processing portion of the analysis

(Note: I have described this process in detail on the Gauss page of the CSE Wiki: http://wiki.cse.ucdavis.edu/support:systems:gauss)

Example 3 cont…
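
The requirement that each of the 100 runs gets its own seed can be checked with a dry run that needs neither SLURM nor R. In the sketch below, TASK_ID stands in for the index each array task would receive; the seed rule (a fixed offset plus the index), the results_N.txt naming and the seeds.txt file are hypothetical choices made for the illustration.

```shell
# Dry run of the per-task bookkeeping, without SLURM or R.
# TASK_ID stands in for the index each of the 100 runs would receive.
: > seeds.txt
for TASK_ID in $(seq 1 100); do
  # Hypothetical seed rule: a fixed offset plus the task index,
  # so no two runs share a seed.
  SEED=$((20121120 + TASK_ID))
  echo "${TASK_ID} ${SEED} results_${TASK_ID}.txt" >> seeds.txt
done

# Show the first few task/seed/output triples.
head -n 3 seeds.txt

# Count distinct seeds; this should equal the number of runs.
cut -d ' ' -f 2 seeds.txt | sort -u | wc -l
```

Writing each run's output to its own file, keyed by the same index, also gives the post-processing script a predictable set of files to combine.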

Code files:

bob_example_3.R
Bob_post_process.R

To submit:

sarray bob_example_3.sh
sbatch bob_post_process.sh

Example 3 cont…

Example 3: SLURM script
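
The script from this slide is not reproduced in this text, and the format expected by Gauss's sarray command is not shown here either. The sketch below therefore swaps in standard SLURM job arrays (the --array option and the SLURM_ARRAY_TASK_ID variable, submitted with sbatch), which may differ from what sarray requires. The file name bob_example_3_array.sh, the resource values, and passing the task index to R as a command-line argument are all assumptions.

```shell
# Write a hypothetical array-job script using standard SLURM job arrays.
# The quoted 'EOF' keeps ${SLURM_ARRAY_TASK_ID} literal in the file.
cat > bob_example_3_array.sh <<'EOF'
#!/bin/bash -l
#SBATCH --job-name=bob_example_3
# Run 100 array tasks, numbered 1 to 100.
#SBATCH --array=1-100
# %A is the array job ID and %a the task index, giving one log per task.
#SBATCH --output=bob_example_3_%A_%a.out
#SBATCH --ntasks=1
#SBATCH --mem=1G

# Pass the task index to R so each run can set a different seed.
srun Rscript bob_example_3.R "${SLURM_ARRAY_TASK_ID}"
EOF

# Confirm the array range that would be requested.
grep -- '--array' bob_example_3_array.sh
```

With standard job arrays this file would be submitted once with sbatch, and SLURM would start the 100 tasks as resources become available.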

Example 3: Modified R Code

To copy results back from Gauss to your laptop:

Archive them, e.g.,

tar -cvzf all_results.tar.gz my_results/

Copy them either with a file transfer (sftp) program or from the command line (Linux/Mac users), e.g.,

scp [email protected]:~/all_results.tar.gz ./

Retrieving your results
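
Before copying a large archive it can be worth confirming that it contains what you expect. The sketch below builds a stand-in my_results/ directory (the file run_1.txt is made up so that the example is self-contained), archives it as above, and lists the archive contents.

```shell
# Stand-in results directory so the example is self-contained.
mkdir -p my_results
echo "example output" > my_results/run_1.txt

# Create the compressed archive (c = create, z = gzip, f = archive file name).
tar -czf all_results.tar.gz my_results/

# List the archive contents to confirm everything was captured.
tar -tzf all_results.tar.gz

# After copying the archive to your laptop, unpack it there with:
#   tar -xzf all_results.tar.gz
```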

Gauss can be set up to run parallel computing jobs using MPI, OpenMP, etc.

SLURM submit files need to be modified to specify the number of tasks, CPUs, memory per CPU, etc.

New (free) software can be installed on Gauss at your request by emailing help@cse

More Advanced Usage
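
As an illustration of the submit-file changes mentioned for parallel jobs, here is a minimal sketch for a multi-threaded (OpenMP-style) program, using standard sbatch options for tasks, CPUs per task and memory per CPU. The script name, the program ./my_openmp_program and the resource values are hypothetical, and Gauss-specific settings may differ.

```shell
# Write a hypothetical submit file for a multi-threaded job.
# The quoted 'EOF' keeps ${SLURM_CPUS_PER_TASK} literal in the file.
cat > openmp_example.sh <<'EOF'
#!/bin/bash -l
#SBATCH --job-name=openmp_example
# One task (process) with 8 cores for its threads.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
# Memory is requested per core, so this job asks for 8 x 2G in total.
#SBATCH --mem-per-cpu=2G

# Match the OpenMP thread count to the cores SLURM allocated.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK}"
srun ./my_openmp_program
EOF

# Review the resource requests before submitting with sbatch.
grep '^#SBATCH' openmp_example.sh
```

An MPI job would typically raise --ntasks to the number of processes instead of using several CPUs for one task.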