Initial Design of a Test Suite for (Automatic) Performance Analysis Tools

Bernd Mohr
Forschungszentrum Jülich
John von Neumann - Institut für Computing
Germany

Jesper Larsson Träff
NEC Europe Ltd.
C&C Research Labs
Germany

© 2003 Forschungszentrum Jülich, NIC-ZAM [2]

IST Working Group APART (since 1999)

Automatic Performance Analysis: Resources and Tools

• Forum for scientists and vendors
• About 20 partners in Europe and the U.S.
• http://www.fz-juelich.de/apart/

• Current Automatic Performance Tools Projects
  • Askalon: http://www.par.univie.ac.at/project/askalon/
  • Kappa-Pi: http://www.caos.uab.es/kpi.html
  • KOJAK: http://www.fz-juelich.de/zam/kojak
  • Paradyn: http://www.cs.wisc.edu/~paradyn/
  • Peridot: http://wwwbode.cs.tum.edu/~gerndt/peridot/

(Full, Associated, and Former) Members

• European Research Centers and Universities

• U.S. Research Centers and Universities

• Vendors

APART Terminology

• Performance Property
  • Aspect of performance behavior of an application
    – E.g., communication dominated by waiting time
  • Specified as a condition referring to performance data
  • Quantified and normalized in terms of a behavior-independent metric (severity)

• Performance Problem
  • Performance property with “negative” implications

• Performance Bottleneck
  • Performance problem with highest severity
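The three terms above can be illustrated with a small C sketch. Everything in it is an assumption made for illustration: the `property_t` record, the choice to normalize severity as lost time divided by total run time, and the threshold that turns a property into a problem. The slides only state that severity is a normalized metric and that the bottleneck is the problem with the highest severity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one evaluated performance property. */
typedef struct {
    const char *name;   /* e.g. "late sender" */
    double lost_time;   /* seconds attributed to this property */
} property_t;

/* Severity: lost time normalized by total run time (assumed normalization). */
double severity(const property_t *p, double total_time)
{
    assert(total_time > 0.0);
    return p->lost_time / total_time;
}

/* Return the index of the bottleneck, i.e. the property with the highest
 * severity above the threshold, or -1 if no property exceeds the threshold
 * (no performance problem in this simplified model). */
int find_bottleneck(const property_t *props, size_t n,
                    double total_time, double threshold)
{
    int best = -1;
    double best_sev = threshold;
    for (size_t i = 0; i < n; ++i) {
        double s = severity(&props[i], total_time);
        if (s > best_sev) {
            best_sev = s;
            best = (int)i;
        }
    }
    return best;
}
```

With a 10 second run and properties costing 2.0, 5.0, and 0.1 seconds, the second property would be reported as the bottleneck for any threshold below 0.5.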

Example: Performance Property “Message in Wrong Order”

[Figure: time-line diagram with axes Location and Time, showing locations A, B, and C with SEND and RECV events and a "wait" interval]

The APART Test Suite (ATS)

• Users rely on correct working of tools
  • Tools need to be especially well tested
  • Systematic approach needed

• APART Test Suite
  • Common project inside APART group
    – Every member needs this, so a shared suite minimizes resources
    – Ensures re-usability
    – Will also allow evaluation / comparison of the different member projects
  • Main focus: automatic performance analysis tools
  • But also useful for “regular” performance tools
    – http://www.fz-juelich.de/apart/ats/

Desired Functionality

• Tests to verify that the semantics of the original program were not altered
• Tests to verify that the recorded performance data is correct
• Synthetic positive test cases for each known and defined performance property and combinations of them
• Negative test cases which have no known performance problem
• “Real world” size parallel applications and benchmarks

Can be partially based on existing validation suites (WWW)

Probably needs to be tool specific

Collect available benchmarks and applications (WWW)

Design and implementation of an ATS framework

Validation Suites and Kernel Benchmarks (I)

Validation
• MPI test / validation suites from Intel, IBM, ANL
  • http://www-unix.mcs.anl.gov/mpi/mpi-test/tsuite.html

MPI Benchmarks
• PARKBENCH (PARallel Kernels and BENCHmarks)
  • http://www.netlib.org/parkbench/
• PMB - Pallas MPI Benchmarks
  • http://www.pallas.com/e/products/pmb/
• SKaMPI (Special Karlsruher MPI – Benchmark)
  • http://liinwww.ira.uka.de/~skampi/

Kernel Benchmarks (II)

OpenMP Benchmarks
• EPCC OpenMP Microbenchmarks
  • http://www.epcc.ed.ac.uk/… research/openmpbench/openmp_index.html

Hybrid Benchmarks
• The Los Alamos MicroBenchmarks Suite (LAMB)
  • MPI and multithreading (Pthreads and OpenMP) programming models based on SKaMPI and EPCC

“Real World” Applications and Benchmarks

• The NAS Parallel Benchmarks (NPB)
  • http://www.nas.nasa.gov/Software/NPB/
• The ASCI Purple and Blue Benchmark Codes
  • http://www.llnl.gov/…
    – … asci/purple/benchmarks/limited/code_list.html
    – … asci_benchmarks/asci/asci_code_list.html
• NCAR Benchmarks
  • http://www.scd.ucar.edu/css/software/bench/

Current Design of ATS Framework

[Figure: module diagram of the ATS framework]

DISTRIBUTION: df_same(), df_cyclic2(), df_block2(), df_linear(), df_peak(), df_cyclic3(), df_block3()

WORK: do_work()

The Distribution Module

• Distribution specified by
  • Distribution function
  • Distribution parameters

• All distribution functions have the same signature
  • double distr_func (int me, int size, double sf, distr_t* dd)
    – me, size: member me of group of size size
    – sf: scaling factor
    – dd: distribution parameter descriptor
  • Returns the value for me, calculated based on me, size, and dd, and scaled by sf

• ATS provides a set of predefined distribution functions
  • Can easily be extended if needed
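A minimal sketch of functions with this signature is shown below. Only the signature and the `low`/`high` fields of `val2_distr_t` come from the slides. The definition of `distr_t` as an opaque type, the `val1_distr_t` struct, the `sketch_` function names, and the rule for which members receive `low` or `high` are assumptions and may differ from the real ATS code.

```c
#include <assert.h>

/* Assumed types: the slides name distr_t and val2_distr_t but do not show
 * their definitions. distr_t is treated as an opaque descriptor here. */
typedef void distr_t;
typedef struct { double low, high; } val2_distr_t;
typedef struct { double val; } val1_distr_t;      /* hypothetical */

/* Common signature taken from the slide. */
typedef double (*distr_func_t)(int me, int size, double sf, distr_t *dd);

/* Every member gets the same value. */
double sketch_df_same(int me, int size, double sf, distr_t *dd)
{
    const val1_distr_t *d = dd;
    (void)me;
    (void)size;
    return d->val * sf;
}

/* Assumed rule: first half of the group gets low, second half gets high. */
double sketch_df_block2(int me, int size, double sf, distr_t *dd)
{
    const val2_distr_t *d = dd;
    assert(size > 0 && me >= 0 && me < size);
    return (me < size / 2 ? d->low : d->high) * sf;
}

/* Assumed rule: even members get low, odd members get high. */
double sketch_df_cyclic2(int me, int size, double sf, distr_t *dd)
{
    const val2_distr_t *d = dd;
    assert(size > 0 && me >= 0 && me < size);
    return (me % 2 == 0 ? d->low : d->high) * sf;
}
```

Because all functions share one signature, a caller can hold any of them in a `distr_func_t` variable and switch distributions without changing the code that consumes the values.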

Predefined Distribution Functions

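[Figure: one plot per predefined distribution function, labeled with its parameters]

• block2: low, high
• cyclic2: low, high
• same: val
• linear: low, high
• peak: low, high
• block3: low, med, high
• cyclic3: low, med, high

The label "n" also appears in the figure.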
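The remaining shapes could be written along the following lines. This is a sketch under several assumptions: that linear interpolates from `low` at member 0 to `high` at the last member, that the "n" in the figure marks the member that receives the peak, and that the three-valued variants split or cycle the group in the order low, med, high. The struct and function names are hypothetical.

```c
#include <assert.h>

typedef void distr_t;                                  /* assumed opaque */
typedef struct { double low, high; } lin_distr_t;      /* hypothetical */
typedef struct { double low, high; int n; } peak_distr_t;  /* hypothetical */
typedef struct { double low, med, high; } val3_distr_t;    /* hypothetical */

/* Linear ramp from low (member 0) to high (member size-1). */
double sketch_df_linear(int me, int size, double sf, distr_t *dd)
{
    const lin_distr_t *d = dd;
    assert(size > 0 && me >= 0 && me < size);
    if (size == 1)
        return d->low * sf;
    return (d->low + (d->high - d->low) * me / (size - 1)) * sf;
}

/* Member n gets high, all other members get low. */
double sketch_df_peak(int me, int size, double sf, distr_t *dd)
{
    const peak_distr_t *d = dd;
    assert(size > 0 && me >= 0 && me < size);
    return (me == d->n ? d->high : d->low) * sf;
}

/* low, med, high repeat cyclically over the members. */
double sketch_df_cyclic3(int me, int size, double sf, distr_t *dd)
{
    const val3_distr_t *d = dd;
    assert(size > 0 && me >= 0 && me < size);
    switch (me % 3) {
    case 0:  return d->low * sf;
    case 1:  return d->med * sf;
    default: return d->high * sf;
    }
}

/* Three contiguous blocks; leftover members fall into the high block. */
double sketch_df_block3(int me, int size, double sf, distr_t *dd)
{
    const val3_distr_t *d = dd;
    int third = size / 3;
    assert(size > 0 && me >= 0 && me < size);
    if (me < third)
        return d->low * sf;
    if (me < 2 * third)
        return d->med * sf;
    return d->high * sf;
}
```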

Current Design of ATS Framework

[Figure: module diagram of the ATS framework, extended by utility and property modules]

DISTRIBUTION: df_same(), df_cyclic2(), df_block2(), df_linear(), df_peak(), df_cyclic3(), df_block3()

WORK: do_work()

MPI UTILS: par_do_mpi_work(), alloc_mpi_buf(), free_mpi_buf(), alloc_mpi_vbuf(), free_mpi_vbuf(), mpi_commpattern_sendrecv(), mpi_commpattern_shift()

OpenMP UTILS: par_do_omp_work()

MPI PROPERTIES

OpenMP PROPERTIES

Example: MPI Property Function late_sender

    /* Each rank does the amount of work the distribution function assigns to it. */
    void par_do_mpi_work(distr_func_t df, distr_t* dd, MPI_Comm c) {
      int me, sz;
      MPI_Comm_rank(c, &me);
      MPI_Comm_size(c, &sz);
      do_work(df(me, sz, 1.0, dd));
    }

    void late_sender(double bwork, double ework, int r, MPI_Comm c) {
      val2_distr_t dd;
      int i;
      mpi_buf_t* buf = alloc_mpi_buf(base_type, base_cnt);

      /* Ranks assigned "low" by df_cyclic2 do the extra work. */
      dd.low = bwork+ework;
      dd.high = bwork;

      /* Repeat r times: unbalanced work followed by a send/receive pattern. */
      for (i = 0; i<r; ++i) {
        par_do_mpi_work(df_cyclic2, &dd, c);
        mpi_commpattern_sendrecv(buf, DIR_UP, 0, 0, c);
      }
      free_mpi_buf(buf);
    }
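Why this produces a late sender can be checked with a small model that needs no MPI. The model rests on assumptions the slides do not confirm: `df_cyclic2` gives `low` to even ranks and `high` to odd ranks, `DIR_UP` makes each even rank send to the next higher odd rank, the receive is blocking, and message transfer time is ignored. Under these assumptions each receiver waits roughly `ework` per repetition.

```c
#include <assert.h>

/* Assumed behavior of df_cyclic2: even members get low, odd members high. */
static double model_cyclic2(int me, double low, double high)
{
    return (me % 2 == 0) ? low : high;
}

/* Expected total receive waiting time of rank me after r repetitions.
 * Model assumptions: even rank i sends to rank i+1, the receive blocks
 * until the sender has finished its work, transfer time is zero. */
double expected_late_sender_wait(int me, int sz, double bwork,
                                 double ework, int r)
{
    double my_work, sender_work, wait;

    assert(sz > 0 && me >= 0 && me < sz && r >= 0);
    if (me % 2 == 0)
        return 0.0;               /* senders do not wait in this model */

    /* Same parameter order as in late_sender: low = bwork + ework. */
    my_work = model_cyclic2(me, bwork + ework, bwork);
    sender_work = model_cyclic2(me - 1, bwork + ework, bwork);

    wait = sender_work - my_work; /* receiver idles until the send starts */
    if (wait < 0.0)
        wait = 0.0;
    return r * wait;
}
```

A tool under test could be checked against such a prediction: for `bwork = 1.0`, `ework = 0.5`, and `r = 3`, the model expects about 1.5 seconds of waiting on each odd rank and none on even ranks.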

Currently Implemented Performance Property Functions

• MPI Point-to-Point Communication Performance Properties
  • late_sender(basework, extrawork, rf, MPI_Comm);
  • late_receiver(basework, extrawork, rf, MPI_Comm);

• MPI Collective Communication Performance Properties
  • imbalance_at_mpi_barrier(distr_func, distr_param, rf, MPI_Comm);
  • imbalance_at_mpi_alltoall(distr_func, distr_param, rf, MPI_Comm);
  • late_broadcast(basework, rootextrawork, root, rf, MPI_Comm);
  • late_scatter(basework, rootextrawork, root, rf, MPI_Comm);
  • late_scatterv(basework, rootextrawork, root, rf, MPI_Comm);
  • early_reduce(rootwork, baseextrawork, root, rf, MPI_Comm);
  • early_gather(rootwork, baseextrawork, root, rf, MPI_Comm);
  • early_gatherv(rootwork, baseextrawork, root, rf, MPI_Comm);

• OpenMP Performance Properties
  • imbalance_in_parallel_region(distr_func, distr_param, rf);
  • imbalance_at_barrier(distr_func, distr_param, rf);
  • imbalance_in_loop(distr_func, distr_param, rf);
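For the imbalance properties, the waiting time a tool should report can be estimated from the per-member work values alone. The sketch below is a simplified model, not part of ATS: it assumes every member waits at the barrier until the slowest member arrives and ignores the cost of the barrier itself. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fill wait[i] with the time member i idles at a barrier, given the work
 * time of each member, and return the summed waiting time of the group.
 * Model: all members start together and leave when the slowest arrives. */
double barrier_wait_times(const double *work, double *wait, size_t n)
{
    double max, total = 0.0;
    size_t i;

    assert(work != NULL && wait != NULL && n > 0);

    max = work[0];
    for (i = 1; i < n; ++i)
        if (work[i] > max)
            max = work[i];

    for (i = 0; i < n; ++i) {
        wait[i] = max - work[i];
        total += wait[i];
    }
    return total;
}
```

For four members with work values 0.5, 0.5, 1.0, and 1.0, the model predicts 0.5 seconds of waiting on each of the first two members and 1.0 seconds in total per repetition.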

Current Design of ATS Framework

[Figure: module diagram of the ATS framework, with test programs on top]

DISTRIBUTION: df_same(), df_cyclic2(), df_block2(), df_linear(), df_peak(), df_cyclic3(), df_block3()

WORK: do_work()

MPI UTILS: par_do_mpi_work(), alloc_mpi_buf(), free_mpi_buf(), alloc_mpi_vbuf(), free_mpi_vbuf(), mpi_commpattern_sendrecv(), mpi_commpattern_shift()

OpenMP UTILS: par_do_omp_work()

MPI PROPERTIES

OpenMP PROPERTIES

TEST PROGRAMS

Performance Property Test Programs

• Single performance property testing
  • Programs can be generated automatically from the performance property function signature
    – Generator based on Program Database Toolkit (PDT)
    – http://www.cs.uoregon.edu/research/paracomp/pdtoolkit/
  • Property parameters become test program arguments
  • More extensive tests through scripting languages or an experiment management system (e.g., Zenturio)
    – http://www.par.univie.ac.at/project/zenturio/

• Composite performance property testing
  • Program containing multiple performance property functions
  • Complexity only limited by imagination
  • Currently: manually implemented

Example: Single Performance Property Test Program

    #include <stdio.h>
    #include <stdlib.h>
    #include "mpi_pattern.h"

    int main(int argc, char *argv[]) {
      /* Defaults, used when no arguments are given */
      distr_func_t df = atodf("b2:0.5:1.0");
      distr_t *dd = atodd("b2:0.5:1.0");
      int r = 1;

      MPI_Init(&argc, &argv);

      switch ( argc ) {
        case 3: r = atoi(argv[2]);      /* fall through */
        case 2: df = atodf(argv[1]);
                dd = atodd(argv[1]);    /* fall through */
        case 1: break;
        default:
          fprintf(stderr, "usage: %s <distf> <rfac>\n", argv[0]);
          MPI_Finalize();
          return 1;
      }

      imbalance_at_mpi_barrier(df, dd, r, MPI_COMM_WORLD);
      MPI_Finalize();
      return 0;
    }

Example: Single Performance Property Test Program

• imbalance_at_mpi_barrier <distribution-spec> <repetition-factor>

[Figures: example runs with the arguments "b2:0.5:1.0 2" and "b2:0.1:2.0 5"]

• Problem: additional property “MPI Setup/Termination Overhead” also holds!
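The distribution specification used above has the visible form `name:value:value`. The slides do not show how `atodf()` and `atodd()` parse it, so the following is only a sketch of one way to split such a string. The function `parse_distr_spec` and its return convention are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split "name:v1:v2:..." into a name and up to max_vals numbers.
 * Returns the number of values parsed, or -1 on a malformed spec. */
int parse_distr_spec(const char *spec, char *name, size_t name_len,
                     double *vals, int max_vals)
{
    const char *colon, *p;
    size_t len;
    int count = 0;

    assert(spec != NULL && name != NULL && vals != NULL && name_len > 0);

    /* The name is everything before the first ':'. */
    colon = strchr(spec, ':');
    len = colon ? (size_t)(colon - spec) : strlen(spec);
    if (len == 0 || len >= name_len)
        return -1;
    memcpy(name, spec, len);
    name[len] = '\0';

    /* Each further field must be a number terminated by ':' or the end. */
    p = colon;
    while (p != NULL) {
        char *end;
        double v;

        if (count == max_vals)
            return -1;
        ++p;                          /* skip the ':' */
        v = strtod(p, &end);
        if (end == p)
            return -1;                /* field does not start with a number */
        vals[count++] = v;

        if (*end == '\0')
            p = NULL;                 /* reached the end of the spec */
        else if (*end == ':')
            p = end;                  /* another field follows */
        else
            return -1;                /* trailing garbage */
    }
    return count;
}
```

For the default spec `"b2:0.5:1.0"` this yields the name `b2` and the two values 0.5 and 1.0; mapping the name to a distribution function would be a separate lookup step.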

Example: Collection of MPI Performance Properties

Examples: Detail MPI Properties

Example: MPI Properties in 2 Communicators

EXPERT Analysis of MPI 2 Communicator Example

Example: OpenMP Performance Property

ATS: Status and Future Work

• Initial prototype available from APART website
  • List of MPI, OpenMP, and hybrid validation and benchmark suites
  • 1st version of ATS framework including
    – C version of code
    – Single property test program generator

• Future Work
  • More complete collection of validation and benchmark suites
  • Real “real world” applications
  • ATS Framework
    – Fortran version
    – More complete list of property functions for MPI, OpenMP, hybrid, and sequential performance properties
    – Documentation