Tools for High Performance Computing 2013: 1. Introduction 1

Tools for High Performance Computing (Finnish title: Suurteholaskennan työkalut)

• Lectures: Wed 12-14, room E205, first lecture 4.9.2013

• Exercises: Fri 12-14, room TBA, first session TBA

• Due to the mid-term break, there is no lecture or exercise session during the week 21.-27.10.2013

• Lecturer: Antti Kuronen, [email protected], http://www.acclab.helsinki.fi/~aakurone/

• Exercise assistant: Andrey Ilinov, [email protected]

• Course homepage: http://www.physics.helsinki.fi/courses/s/stltk/

• Course completion:
  - exercises: 50%
  - final project: 50%

• Credit points: 5 ECTS points

• Objectives: To learn to use the programming tools for high performance computing: Fortran, Unix programming tools, program optimization, parallel computation.

Computational physics specialization alternative

http://atom.physics.helsinki.fi/suomi/laskfys/

Course material

• These lecture notes; downloadable in PDF format from address http://www.physics.helsinki.fi/courses/s/stltk/

• See also the links to external sites on the course homepage.

• Books on Fortran:

  - Modern Fortran Explained by Michael Metcalf, John Reid and Malcolm Cohen, Oxford University Press

  - Fortran 95/2003 by Juha Haataja, Jussi Rahola, Juha Ruokolainen, CSC, 2007 (in Finnish); see http://www.csc.fi/csc/julkaisut/oppaat

• Material on parallel computation:

  - Parallel and Distributed Computation: Numerical Methods, Dimitri P. Bertsekas and John N. Tsitsiklis, http://web.mit.edu/dimitrib/www/pdc.html

• Google ;-)

Exercises

• Return your solutions to the assistant’s email address.

• Return deadline will be announced later when the session time is known.

• Mostly small programming tasks.

• What should be returned?

- Program code.
- Possible compilation instructions or Makefile (code must compile without errors).
- Input and output files.
- Text that explains what you have done (plain ASCII or PDF).
- No Word documents.
- Plots and figures (if applicable).
- The exact instructions for naming files and such are given in http://www.physics.helsinki.fi/courses/s/stltk/exercises/exercisefiles_new.pdf
- Not conforming to these rules may mean losing points!

Computing environment

• Exercise programs (at least parallel ones) are run on the cluster computer alcyone.grid.helsinki.fi

- You get a user account for the duration of the course (unless you have one already).

- Alcyone is an 892-core (70-node) Intel Xeon cluster.

- More info at http://docs.physics.helsinki.fi/alcyone.html

- Software
  - Operating system: Scientific Linux
  - Batch queue system: SLURM (Simple Linux Utility for Resource Management, http://slurm.net/)
  - Compilers and libraries
    - GNU Fortran (gfortran) and C (gcc) compilers
    - LAPACK, FFTW
    - Intel Fortran (ifort) and C (icc) compilers; these include the Intel Math Kernel Library: BLAS, LAPACK, ScaLAPACK, FFT etc.

• More detailed usage instructions will be given later.

Computing environment

• The IT Department Unix system can be used for non-parallel exercises

- mutteri.helsinki.fi: Red Hat Enterprise Linux
  - GNU Fortran (gfortran) and C (gcc) compilers
  - maple and matlab are installed on ruuvi

- punk.it.helsinki.fi: Linux server (CentOS)
  - GNU Fortran (gfortran) and C (gcc) compilers

• Of course, you can use your own machine, too.

- The GNU compilers are free.
- They can be installed on Windows using e.g. the Cygwin environment: http://www.cygwin.com/

- Another free compiler suite is Open64: http://www.open64.net/

Table of contents

1. Introduction

2. The Fortran 90/95/2003/2008 programming language

3. Programming tools in the Unix environment: compilation, make utility, debugging, profiling

4. Code optimization

5. Introduction to parallel computations: different paradigms and architectures

6. Parallel computations: workstation clusters (MPI)

7. Parallel computations: shared memory architecture (OpenMP, threads)

Computational science and scientific computation

High performance computing (HPC)

• Physics
  - atomistic simulations
  - weather forecast
  - computational fluid dynamics
  - modeling nuclear weapons

• Chemistry
  - quantum chemistry calculations

• Biology
  - DNA sequences
  - modeling of molecules
  - bioinformatics

• Industry
  - aerodynamics
  - car crash simulations
  - oil and gas industry

• Most HPC problems in different fields of science and technology reduce to a small number of basic numerical problems.

• These are handled in the courses Scientific computing I-III.

• This course provides the tools to implement these numerical methods efficiently as programs.

High performance computing (HPC)

• Why parallel computations?

- Single CPU core performance no longer increases as drastically as in previous years, because of problems in heat dissipation etc.

- Instead, the number of processor cores is increased. Quad-core is now common.

- Challenge to programmers: How to efficiently utilize all cores (both in HPC and on your desktop)?

- Multitasking operating systems run many processes simultaneously.

- A single application can use many cores: multithreading.

- In HPC, parallel computations have been used for years.

- However, not all computational problems are parallelizable: they don’t scale well.

- In addition to speedup, some problems do not fit into one core due to memory usage.

http://www.scidacreview.org/0904/html/multicore.html

Programs used in HPC

• Self-made (compiled) codes

- Often the most efficient solution.
- Particularly when used with highly optimized numerical libraries. (Don’t reinvent the wheel!)

• Numerical software (in practice matlab and its clones octave, scilab)

- A perfect fit if your problem can be cast in a standard form (e.g. standard linear algebra operations) or if you need not do the calculation many times.

- In other cases not necessarily the most efficient solution.
- Compilation of matlab programs is possible.

• Symbolic mathematical software (mathematica, maple, mupad, maxima)

- For non-numerical calculations.
- Output in programming languages: the results can be used in one’s own codes.

• Self-made codes written in scripting languages (awk, perl, python)

- Good for handling textual data: process program output.
- Not very fast.
- Combination: numerically intensive part by compiled (C) code, user interface etc. by python.

The Unix programming environment

• We assume that you all have some experience in using Unix.

• Created by programmers for programmers.

• Things done by using the command line.

• Many (small) tools that ease program development.

- GNU Emacs (emacs): editor with language awareness:

  - syntax highlighting and smart indentation
  - tag files
  - compilation and debugging within the editor
  - access shell
  - good help system (info)
  - psychotherapist
  - no limit for customization (if you know Lisp)
  - You can do everything in Emacs!

- In Linux there are other good editors for program development: vim, kate, gedit, sublime_text, ...

- Real integrated development environments (IDEs): eclipse

The Unix programming environment

- Scripting language awk (or gawk for GNU awk)

  - Swiss army knife of Unix
  - a lot can be done using simple ’one-liners’
  - well suited for e.g. picking something out of an (ASCII) data file
  - no need to write and compile a C/Fortran program

- Example: we have a data file (atoms.out), shown below.
- We want to plot the distribution of the 5th column data of those lines that have ’Cu’ as the 1st string on the line:

    gawk 'BEGIN {c=20} $1=="Cu" {i=int(c*$5+0.5); e[i]++} END {for (i in e) print i/c,e[i]}' atoms.out |\
    sort -n | xgraph -P

- Here we have used the plotting program xgraph.

File atoms.out:

    mdmorse atom output at time 2.000 fs
    Cu 8.151144 8.155904 8.149658 3.033020
    Cu 6.336929 6.348670 8.163063 3.040200
    Cu 8.152819 6.338272 6.340627 3.050315
    Cu 6.341799 8.167974 6.346826 3.033914
    Cu 8.158021 8.155310 4.520418 3.051445
    Cu 6.351106 6.347753 4.540661 3.038614
    Cu 8.156158 6.336749 2.716651 3.043019
    Cu 6.343872 8.161483 2.726575 3.028146
    ...
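The binning in the one-liner above can be tried without xgraph. The following is a minimal sketch, assuming a POSIX awk behaves like gawk for this program and that the values to be binned are in the 5th field; the function name cu_histogram is hypothetical. It reads the data from standard input or from a named file and prints "bin centre, count" pairs as text.

```shell
#!/bin/sh
# Histogram of the 5th field of all lines whose 1st field is "Cu".
# c is the number of bins per unit; a value v falls into bin int(c*v + 0.5).
# cu_histogram is an illustrative name, not part of the course material.
cu_histogram() {
    awk 'BEGIN { c = 20 }
         $1 == "Cu" { i = int(c * $5 + 0.5); e[i]++ }
         END { for (i in e) print i / c, e[i] }' "$@" | sort -n
}

# Small made-up sample: the header and the "Ar" line are ignored.
cu_histogram <<'EOF'
sample header line
Cu 0.0 0.0 0.0 1.00
Cu 0.0 0.0 0.0 1.01
Ar 0.0 0.0 0.0 5.00
Cu 0.0 0.0 0.0 2.00
EOF
```

The array e is indexed by the bin number, so no array size needs to be declared in advance. The final sort -n is needed because awk does not guarantee any order in a "for (i in e)" loop.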

The Unix programming environment

- gawk command line syntax

    gawk -f progfile [--] file ...
    gawk 'program' file ...

- gawk program

    pattern {action_statements}

- Read the input file and when pattern is found execute the action_statements.

- C-like syntax
- Variables need not be declared; interpretation depends on the context.
- Good string handling functions, some mathematical functions (incl. trigonometry)
- Special variables:

    NR : number of the current input line (counted over all input files)
    NF : number of fields in the current line
    $i : ith field in the current line
    FS : field separator (default: spaces or tabs)

- Simple example: print the 4th field of every line that has the string ’dat’ in the very beginning of the line:

    gawk '/^dat/ {print $4}' file.dat

- For more information give one of the commands man gawk, man awk, info gawk.

Pattern:

    BEGIN
    END
    /regular expression/
    relational expression
    pattern && pattern
    pattern || pattern
    pattern ? pattern : pattern
    (pattern)
    !pattern
    pattern1, pattern2
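The special variables and pattern types listed above can be combined as in the following sketch. The two function names (field_report, mean_field4) and the sample input are hypothetical; only standard awk features (the -F option, NR, NF, $NF, regular expression and END patterns) are used.

```shell
#!/bin/sh
# field_report: for every non-empty line print its line number (NR), the
# number of fields (NF) and the last field ($NF). -F: sets FS to ":".
field_report() {
    awk -F: 'NF > 0 { print NR ": " NF " fields, last = " $NF }' "$@"
}

# mean_field4: average of the 4th field over lines that begin with "dat".
# A regular expression pattern selects the lines, END prints the result.
mean_field4() {
    awk '/^dat/ { s += $4; n++ } END { if (n > 0) print s / n }' "$@"
}

printf 'a:b:c\n\nx:y\n' | field_report
printf 'dat 1 2 3.0\nfoo 1 2 100\ndat 1 2 5.0\n' | mean_field4
```

The relational expression "NF > 0" is itself a pattern: the action runs only on lines that have at least one field, so the empty second line is skipped while still being counted in NR.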

The Unix programming environment

- sed stream editor

  - performs basic text transformations on an input stream
  - syntax:

      sed 'sed-commands' input.file > output.file
      sed -f sedscript.file input.file > output.file

  - a sed command consists of an optional address or address range, followed by a one-character command

  - the most common operation is substitution; e.g. change Fortran77-style comments to Fortran90-style:

      sed 's/^[Cc]/!/' oldfortran.f > newfortran.f

  - an address range can be added in the form of line numbers or regular expressions:

      sed '10,50s/^[Cc]/!/' old.f > new.f
      sed -n '/SUBROUTINE CH/,/^[ \t]END/{s/^[Cc]/!/;p}' old.f
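To see what the substitution and the line-number address range do, the commands can be run on a few lines of text. This is a sketch only: the function names and the sample Fortran lines are made up for illustration.

```shell
#!/bin/sh
# Replace a "C" or "c" in column 1 by "!" on every line.
f77_comments_to_f90() {
    sed 's/^[Cc]/!/' "$@"
}

# Same substitution, restricted to lines first..last (a line-number range).
f77_comments_in_range() {
    first=$1; last=$2; shift 2
    sed "${first},${last}s/^[Cc]/!/" "$@"
}

f77_comments_to_f90 <<'EOF'
C     Compute the sum
      S = A + B
c     Done
      END
EOF

# Only line 3 is inside the range, so the comment on line 1 stays as it is.
printf 'C one\n      X = 1\nc two\n' | f77_comments_in_range 3 3
```

Note that the pattern only looks at column 1, which is where fixed-form Fortran 77 expects the comment character; statements start further right and are not touched.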

The Unix programming environment

- Other tools

    sort
    uniq
    tr
    ...

- Use the man command to find more information.
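As an illustration of how these small tools combine, the sketch below counts how often each word occurs in a text. The function name word_freq is hypothetical; the pipeline uses only standard options of tr, sort, uniq and awk.

```shell
#!/bin/sh
# Word frequency count, most frequent word first.
word_freq() {
    tr -cs '[:alpha:]' '\n' |        # one word per line
    tr '[:upper:]' '[:lower:]' |     # ignore case
    sort |                           # uniq needs sorted input
    uniq -c |                        # count identical lines
    sort -k1,1nr -k2,2 |             # by count (descending), then by word
    awk 'NF == 2 { print $1, $2 }'   # normalize the spacing of uniq -c
}

echo 'The cat and the dog and the bird' | word_freq
```

Each stage does one simple job and passes plain text to the next, which is the usual way of working with the Unix command line tools.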

The Unix programming environment

• Shell scripting

- A common task: a series of computations with one or more varying parameters
- Use a shell script and a for loop

- Many shells available for Unix systems: bash good for scripting.

Solution 1:

    #! /bin/sh
    ./prog < input01 > output01
    ./prog < input02 > output02
    ./prog < input03 > output03
    ./prog < input04 > output04
    ./prog < input05 > output05

Solution 2:

    #! /bin/sh
    for p in 1 2 3 4 5; do
        sed "s/__p__/$p/g" input.template > input${p}
        ./prog < input${p} > output${p}
    done

File input.template:

    1.0
    2.0
    __p__
    1000
    2000
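The template approach can be tested without a real simulation program. In the sketch below, prog is a stand-in shell function (an assumption made for illustration) that just sums the numbers it reads; the function name run_sweep and the template contents are also made up.

```shell
#!/bin/sh
# Stand-in for ./prog: reads numbers from standard input, prints their sum.
prog() {
    awk '{ s += $1 } END { print s }'
}

# run_sweep TEMPLATE P1 P2 ...
# For each parameter value, replace the placeholder __p__ in the template,
# write the result to input<P> and store the program output in output<P>.
run_sweep() {
    template=$1; shift
    for p in "$@"; do
        sed "s/__p__/$p/g" "$template" > "input$p"
        prog < "input$p" > "output$p"
    done
}

# Demonstration in a temporary directory that is removed afterwards.
workdir=$(mktemp -d)
(
    cd "$workdir" || exit 1
    printf '1.0\n2.0\n__p__\n' > input.template
    run_sweep input.template 1 2 3
    cat output1 output2 output3
)
rm -rf "$workdir"
```

Because the parameter value is pasted into a sed expression, this simple form works for plain numbers and words; values containing "/" would need a different delimiter in the s command.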

The Unix programming environment

• Visualization

- Good tools for simple plotting of curves: xgraph, gnuplot

- For publication-quality figures: matlab (also 3D plots), gle, gnuplot, xmgrace.

- Python has good packages for visualization:
  - matlab-style plotting with matplotlib
  - 3D visualization with mayavi2

- For atomistic simulations, common molecular visualization programs are used (ovito, jmol, rasmol etc.)

The Unix programming environment

• Short instructions to program compilation and (interactive) running:

- GNU compilers
  - C:       gcc -o prog source.c          (alcyone, punk, mutteri)
  - Fortran: gfortran -o prog source.f90   (alcyone, punk, mutteri)

- Intel compilers
  - C:       icc -o prog source.c          (alcyone)
  - Fortran: ifort -o prog source.f90      (alcyone)

- When compiling with libraries you need to tell

  - which libraries to use (option -l)

      f90 -o eigen eigen.f90 -llapack
      cc -o eigen eigen.c -llapack -lfor -lm

    - option -lfile looks for library file libfile.so or libfile.a in predetermined directories (normally at least /usr/lib and /lib)

  - where to find them

    - Option -L tells the linker to look first for library files in particular directories. Example:

      gfortran -o prog prog.f90 -L/home/user/libs/ -lmylib
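The roles of -l and -L can be checked with a tiny library of your own. The following is a sketch under the assumption that a C compiler is available as cc and that ar is installed; all file names (mylib.c, libs/libmylib.a, main.c) and the function names are hypothetical.

```shell
#!/bin/sh
# Build a one-function static library and link a program against it.
# -L./libs adds a directory to the library search path,
# -lmylib asks for libmylib.so or libmylib.a from the search path.
build_and_run() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        mkdir libs
        printf '%s\n' 'int twice(int x) { return 2 * x; }' > mylib.c
        printf '%s\n' '#include <stdio.h>' \
                      'int twice(int x);' \
                      'int main(void) { printf("%d\n", twice(21)); return 0; }' > main.c
        cc -c mylib.c -o mylib.o &&
        ar rcs libs/libmylib.a mylib.o &&
        cc -o prog main.c -L./libs -lmylib &&
        ./prog
    )
    status=$?
    rm -rf "$dir"
    return $status
}

# Usage: build_and_run
```

Calling build_and_run compiles everything in a temporary directory and runs the resulting program. The library option is placed after main.c on the command line because the linker resolves symbols in the order in which the files and libraries are given.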

General programming issues

• Think before you start
  - Make a block diagram of the program.
  - Think about subroutine interfaces (what parameters in and out).
  - Write out subroutine interfaces explicitly (in F90 interface modules, in C prototypes).

• Modularize your code
  - Divide the code into smaller parts; code and test them separately.
  - Only after the separate parts work, test the whole program.

• Generalize slightly
  - One day you will need to use your code in a slightly different problem.
  - Or the size of the problem changes.
  - Foresee this by e.g. not fixing array sizes and declaring non-changing (or seldom-changing) constants:
    - in Fortran use modules with constants defined in them: integer,parameter :: MAXBUF=10000
    - in C use preprocessor macros: #define MAXBUF 10000

• Make the code readable
  - Use meaningful variable and function names.
  - Indent code (use a language-aware editor; e.g. GNU Emacs, XEmacs, ...)

• Think about efficiency
  - Concentrate on the most time-consuming parts of the code (profiling).
  - Optimize also your own time.

• Documentation
  - At least include comments in the source code and write a README file.

General programming issues

• Use a version control system (VCS)

- Keeps track of all versions of your code.
- In case your most recent modification does not work, you can go back to the last working version.
- The basic cycle in program development:
  1) check out code from the repository
  2) modify it
  3) commit the changes to the repository (with an explanatory message on what has been done)

- Many branches of the code can be maintained.
- Changes from one branch can be included into another.

- Many people can work on the same code simultaneously.

- Many open source VCSs exist. The two most used are probably Subversion and Git.

- Subversion: http://subversion.tigris.org/
  - Replacement of the old Unix CVS.

- Git: http://git-scm.com/
  - Developed originally by Linus Torvalds for version control of the Linux kernel.
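The modify and commit cycle described above could look like this with Git. This is a minimal sketch, assuming git is installed; it creates a fresh local repository instead of checking one out from a server, and the file name, user name, e-mail address and commit messages are made up for illustration.

```shell
#!/bin/sh
# A short Git session in a temporary directory: two commits, then a look
# at the history and at the previous version of the file.
vcs_demo() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        git config user.name "Student"
        git config user.email "student@example.org"

        echo 'print *, "v1"' > prog.f90
        git add prog.f90                            # start tracking the file
        git commit -q -m "Add first working version"

        echo 'print *, "v2"' > prog.f90             # modify ...
        git commit -q -a -m "Change output to v2"   # ... and commit

        git --no-pager log --format=%s              # messages, newest first
        git show HEAD~1:prog.f90                    # file as of previous commit
    )
    status=$?
    rm -rf "$repo"
    return $status
}

# Usage: vcs_demo
```

The last command shows why the explanatory messages matter: the history lists what was done in each step, and any earlier version of a file can be retrieved by referring to its commit.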

General programming issues

• Program input

- Large amounts of data: from files
- Often-changing parameters: from the command line

Fortran:

    program argtest
      implicit none
      character (len=80) :: argu
      integer :: i
      real :: x,y

      ! Argument 0 is the name of the program
      call get_command_argument(0,argu)

      if (command_argument_count()/=3) then
         write(0,'(a,a,a)') 'Usage: ',trim(argu),' x y i'
         stop
      endif
      call get_command_argument(1,argu); read(argu,*) x
      call get_command_argument(2,argu); read(argu,*) y
      call get_command_argument(3,argu); read(argu,*) i
      write(6,'(a,g10.4,a,g10.4,a,i5)') &
           & 'Command line parameters: x ',x,' y ',y,' i ',i

      stop
    end program argtest

C:

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    int main (int argc, char **argv)
    {
      double x,y;
      int i;

      /* argv[0] is the name of the program */
      if (argc!=4) {
        fprintf(stderr,"Usage: %s x y i\n",argv[0]);
        return (1);
      }
      x=atof(*++argv);
      y=atof(*++argv);
      i=atoi(*++argv);
      fprintf(stdout,"Command line parameters: x %g y %g i %d\n",
              x,y,i);
      return(0);
    }

General programming issues

- Note that the way to read command-line arguments is (finally) standardized in Fortran 2003.

- The old, almost de facto standard, versions of the functions are shown below.

    program argtest
      implicit none
      character (len=80) :: argu
      integer :: i
      real :: x,y
      integer :: iargc

      call getarg(0,argu)

      if (iargc()/=3) then
         call getarg(0,argu)
         write(0,'(a,a,a)') 'Usage: ',trim(argu),' x y i'
         stop
      endif

      call getarg(1,argu); read(argu,*) x
      call getarg(2,argu); read(argu,*) y
      call getarg(3,argu); read(argu,*) i
      write(6,'(a,g10.4,a,g10.4,a,i5)') &
           & 'Command line parameters: x ',x,' y ',y,' i ',i

      stop
    end program argtest