
Profiling Tools on the NERSC Crays and IBM/SP


Page 1: Profiling Tools on the NERSC Crays and IBM/SP

Profiling Tools on the NERSC Crays and IBM/SP

NERSC User Services

NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER

Page 2: Profiling Tools on the NERSC Crays and IBM/SP


Outline

• Profiling Tools on NERSC platforms

– Cray PVP (killeen, seymour)

– Cray T3E (mcurie)

– IBM/SP (gseaborg)

• UNIX profiling/performance analysis tools
• References

Page 3: Profiling Tools on the NERSC Crays and IBM/SP


Why Profile?

• Characterise the application:
  – Is the code CPU bound?
  – Is the code I/O bound?
  – Is the code memory bound?
  – Analyse communication patterns (distributed-memory codes)

• Focus optimisation effort ... and ultimately ...
• Improve performance and resource utilisation

Page 4: Profiling Tools on the NERSC Crays and IBM/SP


Cray PVP/T3E - Application Characterization

• Job accounting (ja)
  • ja
  • ./a.out
  • ja -st -n a.out (see next slide for sample output)

• Look out for:
  • Maximum Memory Used > available memory
  • Total I/O wait time (locked + unlocked) > 50% of User CPU time
  • Multitasking breakdown for parallel codes
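
As a minimal sketch (assuming the executable is ./a.out, as above), the full ja sequence is:

    ja                  # start job accounting for this shell
    ./a.out             # run the application under accounting
    ja -st -n a.out     # print the summary report (sample on the next slide)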

Page 5: Profiling Tools on the NERSC Crays and IBM/SP


Job accounting : summary report

Elapsed Time             :  8 Seconds
User CPU Time            : 35.5939 Seconds

Multitasking/Multistreaming Breakdown
(Concurrent CPUs * Connect seconds = CPU seconds)
     1 * 0.0100 =  0.0100
     2 * 0.0100 =  0.0200
     3 * 0.0600 =  0.1800
     4 * 8.8500 = 35.4000
(Avg.)   (total)   (total)
  3.99 * 8.9300 = 35.6100

System CPU Time          :  0.1226 Seconds
I/O Wait Time (Locked)   :  0.0000
I/O Wait Time (Unlocked) :  0.0000
CPU Time Memory Integral :  5.3854 Mword-seconds
Data Transferred         :  0.0001 MWords
Maximum memory used      :  0.4746 MWords

Page 6: Profiling Tools on the NERSC Crays and IBM/SP


HPM - Hardware Performance Monitor

• Helps locate CPU-related code bottlenecks
• Reports use of vector registers, instruction buffers, memory ports

• hpm {options} ./a.out {prog_arguments}
  • options = -g2 -> memory access information
  • options = -g3 -> vector register information

• Look for:
  • the ratio of floating ops per CPU second to CPU memory references per second, which should reflect the floating-point operations in the code
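
A sketch of collecting each report level (option meanings as listed above; a.out and its arguments are placeholders):

    hpm -g0 ./a.out     # summary counters (sample output over)
    hpm -g2 ./a.out     # memory access information
    hpm -g3 ./a.out     # vector register information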

Page 7: Profiling Tools on the NERSC Crays and IBM/SP


Sample hpm output : (hpm -g0 ./a.out)

Million inst/sec (MIPS) :  7.67     Instructions      :  274017290
Avg. clock periods/inst : 26.06
% CP holding issue      : 94.02     CP holding issue  : 6714667737
Inst.buffer fetches/sec :  0.04M    Inst.buf. fetches :    1420802
Floating adds/sec       : 15.40M    F.P. adds         :  550002417
Floating multiplies/sec : 24.36M    F.P. multiplies   :  870004996
Floating reciprocal/sec :  0.28M    F.P. reciprocals  :   10000042
Cache hits/sec          :  0.00M    Cache hits        :      45893
CPU mem. references/sec : 34.64M    CPU references    : 1236978495

Floating ops/CPU second : 40.5M

Page 8: Profiling Tools on the NERSC Crays and IBM/SP


Cray PVP : CPU Bound Codes : prof/profview

• Instruments code to provide % CPU time in function calls
  • f90 -lprof prog.f90
  • ./a.out -> generates prof.data
  • prof -st ./a.out > prof.report

• Chart (over) indicates relative distribution of CPU execution time by function call:
  – prof -x a.out > pgm.prof
  – profview pgm.prof
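
Putting the steps together, a minimal profiling session might look like:

    f90 -lprof prog.f90                # compile/link with the profiling library
    ./a.out                            # run; writes prof.data
    prof -st ./a.out > prof.report     # text report
    prof -x a.out > pgm.prof           # raw profile for the viewer
    profview pgm.prof                  # X11 chart of CPU time by function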

Page 9: Profiling Tools on the NERSC Crays and IBM/SP


Profview - Sample Output

Page 10: Profiling Tools on the NERSC Crays and IBM/SP


I/O and Memory Bound Codes : procstat/procview

• procstat -m -i -R a.raw a.out
• procview a.raw

  – I/O Analysis:
    • Reports -> Files -> All User Files (Long Report)
    • Bytes Processed or I/O Wait Time

  – Memory Analysis:
    • Reports -> Processes -> Maximum Memory Used (Long Format)
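
A minimal session, assuming the executable a.out as on the slide:

    procstat -m -i -R a.raw a.out   # collect statistics into a.raw (flags as above)
    procview a.raw                  # browse via the Reports menus listed above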

Page 11: Profiling Tools on the NERSC Crays and IBM/SP


I/O Bound Codes : procview

• procview indicates which files consume most real time for I/O processing

Page 12: Profiling Tools on the NERSC Crays and IBM/SP

Memory Bound Codes : procview

– "High" (> 10% of Elapsed Time) time to complete memory requests may indicate a memory bound code

– Use the Graphs option to produce a plot of memory use over the elapsed time of the application


Page 13: Profiling Tools on the NERSC Crays and IBM/SP


ATExpert - Autotasking Prediction

• Analysis of source code to predict autotasking performance on a dedicated Cray PVP
• f90 -eX -O3 -r4 -o {prog_name} prog.f90
  – ./a.out
  – atexpert -> shows predicted speed-up
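
The whole flow as one session sketch (a concrete name, prog, substituted for {prog_name}):

    f90 -eX -O3 -r4 -o prog prog.f90   # -eX compiles in ATExpert instrumentation
    ./prog                             # run to collect autotasking data
    atexpert                           # display the predicted speed-up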

Page 14: Profiling Tools on the NERSC Crays and IBM/SP


ATExpert Sample output

Indicates a predicted speed-up of 4.3 on a dedicated 8-processor PVP when the source code is autotasked

Page 15: Profiling Tools on the NERSC Crays and IBM/SP


Also available on Cray PVP

• Flowtrace/flowview
  • times subroutines and functions during program execution (using operating system timers)
• jumptrace/jumpview
  • provides exact timing in functions/subroutines by analysis of machine instructions in the program
• perftrace/perfview
  • times subroutines/functions based on statistics gathered from the HPM tool

Page 16: Profiling Tools on the NERSC Crays and IBM/SP


Cray T3E - Apprentice

• Locate performance problems/inefficiencies
• MPI and shared memory performance, load balance and communication, memory use
• Provides hardware performance information and tuning recommendations (Displays -> Observations)
• Compile/link:
  • f90 -o {prog} -eA {prog_name.f90} -lapp
  • cc -o {prog} -happrentice {prog_name.c} -lapp
• Run code to generate app.rif
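
A sketch of the Fortran flow (the 8-PE mpprun launch is an assumed example):

    f90 -o prog -eA prog.f90 -lapp   # compile/link with Apprentice instrumentation
    mpprun -n 8 ./prog               # run; produces app.rif
    apprentice app.rif               # browse results (see next slide)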

Page 17: Profiling Tools on the NERSC Crays and IBM/SP


Output from :

apprentice app.rif

Page 18: Profiling Tools on the NERSC Crays and IBM/SP


Cray T3E - PAT

• Generates a profile of CPU time in functions; load balance across PEs; hardware counter information
• Compile and link with the PAT library:
  • f90 -o exe -lpat {source.f} pat.cld
• Run the program as normal:
  • mpprun -n {procs} {exe} -> generates exe.pif
• pat executable exe.pif
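
For example (the 16-PE count is a placeholder):

    f90 -o exe -lpat source.f pat.cld   # link against the PAT library
    mpprun -n 16 ./exe                  # run as normal; produces exe.pif
    pat exe exe.pif                     # analyse executable plus .pif file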

Page 19: Profiling Tools on the NERSC Crays and IBM/SP


Profile based on relative CPU time in function calls

Load Balance Histogram for routine “COLL”

Page 20: Profiling Tools on the NERSC Crays and IBM/SP


Cray T3E - ACTS/TAU

• Performance analysis of distributed/shared memory applications (C++ in particular)
• module load tau
• instrument programs with TAU macros
• add $(TAU_DEFS), $(TAULIBS) to compile/link
• run application; view tracefile with pprof, VAMPIR

• Reference:
  • http://acts.nersc.gov/tau
  • http://hpcf.nersc.gov/training/classes/Teleconf/1999july/Wu
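
A sketch of the workflow; TAU_DEFS/TAULIBS are Makefile macros set up by the tau module, and the bare pprof invocation (reading profile files from the current directory) and the 4-PE launch are assumptions:

    module load tau
    # in the Makefile: add $(TAU_DEFS) to compile flags, $(TAULIBS) to the link line
    make
    mpprun -n 4 ./prog      # run the instrumented application
    pprof                   # text profile; or load the tracefile into VAMPIR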

Page 21: Profiling Tools on the NERSC Crays and IBM/SP


Cray T3E - Vampir

• Analysis of message passing characteristics - generates a display of MPI activity over the instrumented time period (e.g. sender, receiver, message size, elapsed time)

• module load VAMPIR; module load vampirtrace
• Facility to instrument with VAMPIRtrace calls
• Generate trace file using TAU or VAMPIRtrace

• Reference:
  • http://hpcf.nersc.gov/software/tools/vampir.html
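
As a sketch (the vampir command name, the trace-file name, and the 4-PE launch are assumptions):

    module load VAMPIR ; module load vampirtrace
    # relink with VAMPIRtrace instrumentation, or take a TAU-generated trace
    mpprun -n 4 ./prog      # run; writes the trace file
    vampir prog.trace       # browse MPI activity timelines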

Page 22: Profiling Tools on the NERSC Crays and IBM/SP


IBM/SP - Xprofiler

• Graphical interface for gprof profiles of parallel applications
  – Compile and link code with "-g -pg"
  – poe ./a.out -procs {n}
    • generates a gmon.out.{n} file for each process
    • may introduce significant (up to a factor of 2) overhead
  – (In $TMPDIR) xprofiler ./a.out gmon.out.*
• Report menu provides the (gprof) text profile
• Source statement profiling shown
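
End-to-end sketch; mpxlf as the MPI Fortran compile command is an assumption, the flags are from the slide:

    mpxlf -g -pg -o a.out prog.f   # build with symbols and profiling enabled
    poe ./a.out -procs 4           # run; writes one gmon.out.<n> per process
    xprofiler ./a.out gmon.out.*   # in $TMPDIR; Report menu gives the gprof text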

Page 23: Profiling Tools on the NERSC Crays and IBM/SP


Page 24: Profiling Tools on the NERSC Crays and IBM/SP


Statement-level profile is available by clicking on the relevant function in the graphical output - use the Show Source Code option

Page 25: Profiling Tools on the NERSC Crays and IBM/SP


IBM/SP - Visualization Tool (VT)

• Message passing trace visualization
• Realtime system activity monitor (limited)
• MPI load balance overview:
  • poe ./a.out -procs {n} -tlevel=3
  • copy a.out.trc to $TMPDIR
  • (In $TMPDIR) invoke vt
  • In trace visualization mode, "Play" a.out.trc
  • see next slide for a sample of interprocessor communication during program execution
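
A sketch of trace capture and playback (4 processes as an example; the tracing flag is as given on the slide):

    poe ./a.out -procs 4 -tlevel=3   # run with message-passing tracing
    cp a.out.trc $TMPDIR             # keep playback I/O out of $HOME
    cd $TMPDIR
    vt                               # then "Play" a.out.trc in trace visualization mode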

Page 26: Profiling Tools on the NERSC Crays and IBM/SP


Page 27: Profiling Tools on the NERSC Crays and IBM/SP


IBM/SP : system_stats

• IBM Internal Tool
• module load sptools
• instrument code with a system_stats() call
• Link with $(SPTOOLS), run code as normal

• Sample output - summary of the utilization of system resources:

node hostname  wall(s) user(s) sys(s) size(KB) pswitches
   0  gs01015   16.80   13.18   0.04     2748      2138
   1  gs01015   16.80   16.07   0.04     2744      1868
   2  gs01003   16.80   16.62   0.04     2740      1870
   3  gs01003   16.80   16.56   0.03     2732      1841
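
A sketch of the build-and-run flow; the mpxlf compile command and the shell-level $SPTOOLS expansion of the module's Makefile macro are assumptions:

    module load sptools
    # source must already contain a system_stats() call
    mpxlf -o a.out prog.f $SPTOOLS   # link with the sptools-provided libraries
    poe ./a.out -procs 4             # run as normal; produces the summary above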

Page 28: Profiling Tools on the NERSC Crays and IBM/SP


IBM/SP - trace-mpi

• IBM Internal tool - quantitative information on MPI calls
  – module load USG ; module load trace-mpi
  – Fortran - add $(TRACE_MPIF) to the build
  – C - add $(TRACE_MPI) to the build
  – poe ./a.out -procs {n} - generates an mpi.trace_file for each process (executable must call MPI_Finalize)
  – summary mpi.trace_file.{n} (see over)

• Useful check for load balance:
  – grep "Total Communication" mpi.trace_file.*
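
A sketch for a Fortran code (4 processes as an example; the mpxlf command and shell-level $TRACE_MPIF expansion are assumptions):

    module load USG ; module load trace-mpi
    mpxlf -o a.out prog.f $TRACE_MPIF             # build with the tracing library
    poe ./a.out -procs 4                          # writes mpi.trace_file.<n> per process
    summary mpi.trace_file.0                      # per-process report (see over)
    grep "Total Communication" mpi.trace_file.*   # quick load-balance check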

Page 29: Profiling Tools on the NERSC Crays and IBM/SP


MPI message-passing summary for mpi.trace_file.3

MPI Function     #calls   Avg Bytes   Time (sec)
------------------------------------------------
MPI_Allreduce:     9355         8.0        3.596
MPI_Barrier:          3         0.0        0.017
MPI_Bcast:           66         5.8        0.013
MPI_Scatter:         31      1008.0        0.088
MPI_Comm_rank:        1         0.0        0.000
MPI_Comm_size:        1         0.0        0.000
MPI_Isend:        43023      2003.7        0.893
MPI_Recv:         43023      2003.7        7.481
MPI_Wait:         43023      2003.7        3.739

Total Communication Information: WALL = 15.8277, CPU = 15.53, MBYTES = 258.72
The total amount of wall time = 26.229613

Page 30: Profiling Tools on the NERSC Crays and IBM/SP


Upcoming on the SP

• ACTS/TAU (C/C++)
  • currently being ported to the IBM/SP
• VAMPIR
  • has been ordered, awaiting delivery
• Performance Monitor Toolkit (HPM)
  • should be available with the Phase II system (requires AIX 4.3.4)
• Also, see the Performance API project:

– http://icl.cs.utk.edu/projects/papi

Page 31: Profiling Tools on the NERSC Crays and IBM/SP


General/UNIX Profiling Tools

• Command line profilers and system analysis
  • prof/gprof (enabled for MPI on the IBM/SP)
  • csh time command : time ./a.out
  • vmstat -> look for high paging over an extended time period (the application may require more memory)

• Fortran/C function timers
  • getrusage
  • rtc, irtc
  • etime, dtime, mclock
  • MPI_Wtime
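
For instance, a quick first pass at characterising a code with just the generic tools:

    time ./a.out                              # csh time: user vs. system vs. wall clock
    vmstat 5                                  # 5-second samples; watch for sustained paging
    gprof ./a.out gmon.out > gprof.report     # after a run of a "-pg" build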

Page 32: Profiling Tools on the NERSC Crays and IBM/SP


Reference Material

• NERSC web pages
  • http://hpcf.nersc.gov/software/tools

• Cray PVP/Cray T3E
  • http://www.cray.com/swpubs
    – Optimizing Code on Cray PVP Systems
    – Cray T3E C, Fortran Optimization Guides

• IBM/SP
  • LLNL Workshop on Performance Tools