Getting acquainted with PDC
Nils Smeds
Using PDC resources
• Acquiring information
  • http://www.pdc.kth.se
• Guided tours
  • http://www.pdc.kth.se/support/
  • AFS
  • Strindberg IBM-SP
• Helpdesk
  • FAQs
  • Contact information
  • 08-790 7800
http://www.pdc.kth.se/doc
http://www.pdc.kth.se/support/sp2-tour.html
http://www.pdc.kth.se/compresc/
Your environment as a user
• File systems
  • AFS — Home directories
  • GPFS — Parallel file system (IBM SP)
  • HSM — Hierarchical storage management
  • /scratch — scratch file systems
• Modules — handle $PATH, $MANPATH (see the sketch after this list)
  • module add sp2 local
  • module show local
• E-mail — when you leave
  • Create $HOME/.forward, publicly readable
http://www.pdc.kth.se/support/misc-tour.html#EMAIL
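A short sketch of the two items above (the forwarding address is hypothetical; the exact steps for making the file publicly readable are in the URL above):
  smeds> module add sp2 local                     # set up $PATH and $MANPATH
  smeds> module show local                        # inspect what the module changes
  smeds> echo "me@example.org" > $HOME/.forward   # hypothetical address to forward to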
Kerberos commands
• kauth — Proves your identity
  > ./kauth -n smeds@NADA.KTH.SE -l 60
• klist — Lists your Kerberos tickets
  > ./klist
  Ticket file:    /tmp/tkt58016
  Principal:      smeds@NADA.KTH.SE
  Issued           Expires          Principal
  Mar 9 12:18:59   Mar 9 13:18:59   krbtgt.NADA.KTH.SE
  Mar 9 12:19:25   Mar 9 13:18:59   rcmd.r11n07.pdc.kth.se
• kdestroy — Removes your ticket file
  > ./kdestroy
  > ./klist
  Ticket file:    /tmp/tkt58016
  klist: No ticket file (tf_util)
• kpasswd — Changes your password
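A minimal session sketch combining these commands (principal and lifetime as in the examples above):
  > kauth -n smeds@NADA.KTH.SE -l 60   # authenticate, ticket valid 60 minutes
  > klist                              # verify the ticket and its expiry time
  > kdestroy                           # remove the ticket file when done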
Commands that rely on Kerberos
• Getting a shell
  > ./rxtelnet -l username strindberg.pdc.kth.se
  > ./telnet -l username strindberg.pdc.kth.se
• Transferring files
  > ./ftp strindberg.pdc.kth.se
  Connected to r11n07-f.pdc.kth.se
  220 r11n07.pdc.kth.se FTP server ready.
  Name (strindberg.pdc.kth.se:smeds): <RET>
  S:232- //PDC//
  S:232- //PDC// Welcome to Strindberg, an IBM SP ...
  S:232 User smeds logged in.
  ftp> kauth
  Password for smeds@NADA.KTH.SE: mypassword
  S:200 Tickets will be destroyed on exit
  ftp> binary
  ftp> put filename.dat
  ftp> get otherfile.dat
  ftp> quit
AFS commands
http://www.pdc.kth.se/support/afs-tour.html
• tokens — List your AFS tokens
  smeds> kauth
  smeds@NADA.KTH.SE's Password: mypassword
  smeds> unlog
  smeds> tokens
  Tokens held by the Cache Manager:
     --End of list--
  smeds> afslog
  smeds> tokens
  Tokens held by the Cache Manager:
  (AFS ID 22557) tokens for afs@nada.kth.se [Expires Aug 19 03:38]
  (AFS ID 22557) tokens for afs@pdc.kth.se [Expires Aug 19 03:38]
     --End of list--
  smeds>
• kauth and kdestroy automatically run afslog and unlog
More AFS
• fs — Directory access management
  smeds> fs setacl directoryname username rl
  smeds> fs listacl directoryname
  smeds> fs setacl directoryname username none
  smeds> fs setacl directoryname system:anyuser rl
  smeds> fs help
  smeds> fs setacl -h
• pts — ACL group management
  smeds> pts mem username
  smeds> pts creategroup username:bs106
  smeds> pts adduser mybuddy mygroup
  smeds> pts examine mygroup
  smeds> pts adduser -h
• Putting it all together (see the sketch below)
  smeds> fs setacl MyProject smeds:buddies rl
  smeds> fs setacl MyProject smeds:REALbuddies rlidwk
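A minimal end-to-end sketch of the above, with hypothetical group and directory names (smeds:buddies, MyProject):
  smeds> pts creategroup smeds:buddies          # create a private ACL group
  smeds> pts adduser mybuddy smeds:buddies      # add a user to the group
  smeds> pts examine smeds:buddies              # check the group
  smeds> fs setacl MyProject smeds:buddies rl   # grant read and lookup rights
  smeds> fs listacl MyProject                   # verify the resulting ACL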
HSM usage
• Use tar to pack many files into one file which can be saved in HSM
  smeds> module add hsm
  smeds> hsmls -l
  smeds> tar cvf /scratch/MyAnalysis.tar Results-980812/Run1/
  smeds> hsmcopyto /scratch/MyAnalysis.tar Res-980812-1.tar
• HSM location is kallsup:/hsm/home/u/username/..., see output from hsmmyhome
• You may use kerberized rcp to move files to and from this location.
• On-line help is available:
  smeds> hsmls -h
  smeds> hsmcopyfrom -h
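The reverse flow, as a sketch (archive name from the example above; the argument order for hsmcopyfrom is an assumption):
  smeds> hsmcopyfrom Res-980812-1.tar /scratch/MyAnalysis.tar   # fetch the archive back from HSM (assumed arg order)
  smeds> tar xvf /scratch/MyAnalysis.tar                        # unpack it on /scratch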
Strindberg usage
• http://www.pdc.kth.se/cgi-bin/strindberg-usage.pl
Node types
• http://www.pdc.kth.se/compresc/hardware/
• Batch nodes (T)
  • 160 MHz (640 MFlop/s), 256 MB RAM, 2 GB /scratch
• Batch nodes with more memory (W, Z)
  • 160 MHz (640 MFlop/s), 512/1024 MB RAM, 2 GB /scratch
• 4-way SMP nodes (M)
  • 4×332 MHz (4×664 MFlop/s), 512 MB RAM, 4 GB /scratch
• 8-way SMP nodes (N, H)
  • 8×222 MHz (8×888 MFlop/s), 4/16 GB RAM
• Serial nodes (G, S)
  • One 135 MHz wide node w. 2 GB RAM, some 67 MHz nodes
• Login node(s)
• The node of the SP that you are connected to after
  ./rxtelnet -l name strindberg.pdc.kth.se
  ./rxtelnet -l name august.pdc.kth.se
  ./rxtelnet -l name nf01r01.pdc.kth.se
• Interactive nodes
• Nodes that are shared among several users. Used for e.g. debugging and compiling.
  spattach -i -p#
• These nodes must be used with IP communication:
  export MP_EUILIB=ip
• Dedicated (or batch) nodes
  • Nodes used for production codes and/or longer pre-/post-jobs
    spsubmit -p# -t time -c CAC scriptfile
    spattach -p# -t time -c CAC
A full interactive example
rxtelnet strindberg.pdc.kth.se
(New window)
klist
kauth
cd workdir
mpcc -g -o myprog myprog.c
spattach -i -p5
(wait)
./myprog
./myprog -procs 3
./myprog -procs 3 -stdoutmode ordered -labelio yes
Interacting with the EASY scheduler
smeds> spsubmit -h
spsubmit [-h][-inWvCb][-c cac][-I#][-s#][[-p# -t#][-j#][-M] file [args]]
 -h: help
 -p processors: number of processors. (Example: -p2W)
 -t minutes: number of minutes (wall-clock)
 -j JobType: available job types mpi, task, pvm3...
 -c CAC: optional, submit for accounting group cac.
 -I InitialDir: optional, default current working directory.
 -b: optional, hold job until all jobs completed
 -i: optional, use IP instead of UserSpace.
 -v: optional, verbose.
 -C: optional, commit before submit.
 -s Filename: optional, save EASY generated script
 [...]
 program: executable or script.
 args: optional arguments to program.

User smeds can specify: staff free ta.smeds
spsubmit examples
• Submitting an MPI program
  smeds> spsubmit -p 4T -t 30 -j mpi ./mympiprog
• Saving the generated script for later re-use
  smeds> spsubmit -p 4T -t 30 -j mpi -s myscript.esy ./mympiprog
• To have a mix of nodes and start on a Z-node
  smeds> spsubmit -p 1Z8T -t 30 -j mpi ./mympiprog "arg1 'arg2 here'"
• Redirecting STDOUT for your program
  smeds> spsubmit -p 4T -t 30 -j mpi ./mympiprog "> job.out"
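Once submitted, the job can be followed in the queue; a typical check (the spq command is described later in these slides):
  smeds> spq -u smeds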
The batch script file
#!/bin/bash
#------ Customizable part ------
# (Use submitting directory as working directory)
cd $SP_INITIALDIR
OUT=MyProgram.out
PROGRAM="MyProgram" ; PROGRAMDIR="$HOME/Public/MyProgramDir"
#------ End customizable part ------
#------ Generic part ------
export MP_HOSTFILE=$SP_HOSTFILE
export MP_PROCS=$SP_PROCS
export MP_EUILIB=us ; export MP_EUIDEVICE=css0
export MP_INFOLEVEL=0
export MP_CSS_INTERRUPT="yes"
export TMPDIR=/scratch

echo "Executing $PROGRAM in directory `pwd` at `date`"
poe ${PROGRAMDIR}/${PROGRAM} > $OUT
echo "Program finished `date`"
http://www.pdc.kth.se/info/qwatch/
• A snapshot from different queues at PDC
• Updated at regular intervals
• It is generated from the same information you get using the command spq
smeds> spq -a
smeds> spq -r
smeds> spq -u smeds
http://www.pdc.kth.se/sp/sptetris/
Scheduling limits
smeds> spq -h
Usage: spq [-h] [-l] [-L] [-r] [-q] ...
smeds> spq -l
NICKNAME SATURATE CAC NJOB Wall Total
- - weekend -        r1149 1 16h
- - weekend saturate tf109 3 169h40
- - night   saturate gw11  4 48h40
- - day     -        gw11  1 3h30
 . . .
smeds> spq -L
INTERVAL NICKNAME MAXNJOB MAXWALLTIME
[15h,60h]  weekend - - - 30h
[4h,15h]   night   - - - 16h
[1h,4h]    day     - - - 16h
 . . .
[0m01s,2h] Nshort  - - 4 -
The concept of CACs
• Computer cycle "accounts" smeds> cac members smedsCAC groups smeds is a member of: ta.smeds staff summer-2000 freesmeds> cac -hsmeds> spjobsummary -c summer-2000usr jid req npe treq tstart r-cpu ucpusmeds ###### 1G2Z2T 5 0h30 yyhhmm 2h30 1h49mike ###### 4T 4 0h15 yyhhmm 1h 0h56 . . .smeds> cac -hsmeds> spjobsummary -u smeds -f 200003 -lsmeds> spjobsummary -hsmeds> spsummary -h
Compilers (IBM SP)
• cc, mpcc
  • IBM C compiler; mpcc adds special flags for compiling MPI parallel programs: include-file search path, tags the binary as parallel, etc.
• xlC, mpCC
  • IBM C++ compiler. Not fully ANSI compliant.
• xlf, mpxlf, xlf90, mpxlf90, f90
  • Fortran and Fortran 90/95 compilers
• Reentrant code generation (thread safety)
  • xlc_r, xlf_r, mpxlf90_r …
• OpenMP directives are currently only available in Fortran
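A minimal compile-line sketch using the drivers above (file and program names are hypothetical):
  smeds> cc -o myserial myserial.c             # serial C program
  smeds> mpcc -o mympiprog mympiprog.c         # MPI C program
  smeds> mpxlf90_r -o mythreaded mythreaded.f  # thread-safe Fortran90 program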
Code optimization
• -O2, -O3
  • Code restructuring and code inlining. Level 3 may cause arithmetic reorganization.
• -qhot
  • Higher-order transformations of the generated code. Uses cache-size information. Occasionally slows code down.
• -qipa, -O4, -O5
  • Interprocedural analysis, mainly code inlining across file boundaries. -O4 => -O3 -qhot -qipa -qtune=arch -qcache=arch
• -qsmp=omp, -qsmp=auto, -qreport=smplist
  • All of the above. Long compile times. Results need thorough checking.
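A sketch of how these flags combine on a compile line (program and file names are hypothetical; results at the higher levels deserve a correctness check):
  smeds> mpxlf90 -O3 -qhot -o mysolver mysolver.f   # aggressive loop optimization
  smeds> mpcc -O4 -o mympiprog mympiprog.c          # implies -O3 -qhot -qipa -qtune=arch -qcache=arch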
The lab session
• The object of the exercise is to get familiar with the PDC environment through hands-on experience
• The lab session has three parts
  • Install a Kerberos travel kit
    • The workstation is a Sun Solaris 2.6 workstation
    • Install a travel kit and verify that you can use it to log in to the SP
  • Experiment with file systems and storage media
    • Try AFS, tokens and ACLs
    • Use the HSM data migration system
  • Run a Fortran90 program on the IBM SP2
    • Serial and parallel, interactive and batch
• Play, experiment, think and ask!
Topics that cannot be covered in this talk
• Compiler options
  • Optimization options, linker options, file-name convention options
• Programming tools
  • Tracing, sampling, debugging, F90 conversion
  • See http://www.pdc.kth.se/compresc/software
  • Totalview, Foresys, Vampir, Dimemas
• Running parallel programs on other computers
  • Running MPICH in the NADA computer lab rooms
Totalview and “How to trick the OS”
• Have the program read from the keyboard as early in the program flow as possible after MPI_Init()
• Start the process and attach to the running poe process
  module add totalview
  ./myprog &                   (Start your program)
  totalview -no_stop_all &     (Or start totalview in another window)
• "Show all unattached processes"
• Attach to the poe process; the debugger locates all MPI processes
• Select one of the MPI processes (not the poe process)
• Set break-points later in the program flow if you want
• In one of the MPI-process windows say "Go group <G>"
• Give the program the input it is waiting for
Running MPI programs on SUNs (locally)
• MPICH 1.2.0
  • Argonne National Laboratory
  • Reference MPI implementation
• module add workshop/5.0
• module add mpich/1.2
• mpicc -o myprog-sun myprog.c
• mpirun -np 4 -machinefile LOCAL ./myprog-sun
• The machinefile is reused up to the number of processes requested by -np
• Further information on MPI at KTH: http://www.nada.kth.se/datorer/unix/
The machinefile LOCAL:
  red01.nada.kth.se
  red01.nada.kth.se
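As a sketch, the machinefile can be created directly in the shell (hosts as on the slide):
  cat > LOCAL <<EOF
  red01.nada.kth.se
  red01.nada.kth.se
  EOF
  mpirun -np 4 -machinefile LOCAL ./myprog-sun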
Running MPI programs on SUNs (remote)
• Running on several hosts
  • You may need to set up AFS tokens on the remote hosts
    kauth -h red02.nada.kth.se -l 30
    kauth -h red03.nada.kth.se -l 30
    kauth -h red04.nada.kth.se -l 30
    kauth                        (For your local rights)
• mpirun -np 4 -machinefile RED ./myprog-sun
• mpirun -np 6 -nolocal -machinefile RED ./myprog-sun
• The remote processes are started off by a Kerberos rsh to the remote host. Modern Kerberos rsh includes a call to the afslog command.
• The remote ticket must be there in advance
The machinefile RED:
  red02.nada.kth.se
  red03.nada.kth.se
  red04.nada.kth.se
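Putting the slide together as one sketch (hosts as above; the loop is just shell shorthand for the three kauth calls):
  for h in red02 red03 red04; do
      kauth -h $h.nada.kth.se -l 30   # AFS/Kerberos rights on each remote host
  done
  kauth                               # local rights
  mpirun -np 6 -nolocal -machinefile RED ./myprog-sun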