11
Presented by The HPC Challenge (HPCC) Benchmark Suite Piotr Luszczek The MathWorks, Inc. http://icl.cs.utk.edu/hpcc/

The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

Presented by

The HPC Challenge (HPCC)Benchmark Suite

Piotr Luszczek

The MathWorks, Inc.

http://icl.cs.utk.edu/hpcc/

Page 2: The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

2 Luszczek_HPCC_SC07

HPCC: Components

1. HPL (High Performance LINPACK)

2. STREAM

3. PTRANS A ← AT+B

4. RandomAccess

5. FFT

6. Matrix-matrix multiply7. b_eff (effective bandwidth/latency)

------------------------------------------------------- name kernel bytes/iter FLOPS/iter------------------------------------------------------- COPY: a(i) = b(i) 16 0 SCALE: a(i) = q*b(i) 16 1 SUM: a(i) = b(i) + c(i) 24 1 TRIAD: a(i) = b(i) + q*c(i) 24 2-------------------------------------------------------

+1

-1

T: T[k] (+) ai

64 bits

ping

C ← s*C + t * A*B

Ax=b

zk=Σxj exp(-2π√-1 jk/n)

pong

Page 3: The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

3 Luszczek_HPCC_SC07

HPCC: Motivation and measurement

HPC Challenge BenchmarksSelect Applications

0.00

0.20

0.40

0.60

0.80

1.00

0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00

Spatial Locality

Te

mp

ora

l lo

ca

lity

HPL

Test3D

CG

OverflowGamess

RandomAccess

AVUS

OOCore

RFCTH2

STREAM

HYCOM

Generated by PMaC @ SDSC

Spatial and temporal data locality hereis for one node/processor - i.e., locallyor “in the small.”

Hig

h

Spatial locality

Tem

pora

l lo

calit

y

DGEMMHPL

PTRANSSTREAM

FFT

RandomAccess

Missionpartner

applications

Low High

Measurement

Concept

Page 4: The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

4 Luszczek_HPCC_SC07

HPCC: Scope and naming conventions

G

EP

S

HPL STREAM

FFT...

RandomAccess(1 m)HPL (25%)

system

CPU

thread

M

PP

M

PP

M

PP

M

PP

Network

M

PP

M

PP

M

PP

M

PP

Network

M

PP

M

PP

M

PP

M

PP

Network

Global

Embarrassingly Parallel

Single

CPU

Memory Interconnect

Computationalresources

core(s)

MPI

OpenMP

Softwaremodules

Vectorize

Page 5: The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

5 Luszczek_HPCC_SC07

HPCC: Hardware probes

Registers

Cache

Local memory

Disk

Instr. Operands

Blocks

Pages

Remote memory

Messages

HPC ChallengeBenchmark

CorrespondingMemory Hierarchy

HPCS PerformanceTargets (improvement)

•Top500: solves a system

Ax = b

•STREAM: vector operations

A = B + s x C

•FFT: 1D fast Fourier transform

Z = FFT(X)

•RandomAccess: random updates

T(i) = XOR( T(i), r )

bandwidth

latency

2 Petaflops(8x)

6.5 Petabytes(40x)

0.5 Petaflops(200x)

64,000 GUPS(2000x)

• HPCS program has developed a new suite of benchmarks (HPC Challenge).

• Each benchmark focuses on a different part of the memory hierarchy.

• HPCS program performance targets will flatten the memory hierarchy, improvereal application performance, and make programming easier.

Page 6: The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

6 Luszczek_HPCC_SC07

HPCC: Official submission process1. Download

2. Install

3 . Run

4. Upload results

5. Confirm via @email@

6. Tune

7. Run

8. Upload results

9. Confirm via @email@• Only some routines can be replaced.• Data layout needs to be preserved.• Multiple languages can be used.

Provide detailedinstallation andexecution environment.

Results are immediately availableon the Web site:● Interactive HTML● XML● MS Excel● Kiviat charts (radar plots)

Optional

Prerequisites:• C compiler• BLAS• MPI

Page 7: The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

7 Luszczek_HPCC_SC07

HPCC: Submissions over time

10

100

1000

10000

100000

1000000

10000000

Nov-03

Jan-04

Mar-04

May-04

Jul-04

Sep-04

Nov-04

Jan-05

Mar-05

May-05

Jul-05

Sep-05

Nov-05

Jan-06

Mar-06

May-06

Jul-06

Sep-06

1

10

100

1000

10000

100000

Jul-04

Aug-04

Sep-04

Oct-04

Nov-04

Dec-04

Jan-05

Feb-05

Mar-05

Apr-05

May-05

Jun-05

Jul-05

Aug-05

Sep-05

Oct-05

Nov-05

Dec-05

Jan-06

Feb-06

Mar-06

Apr-06

May-06

Jun-06

Jul-06

Aug-06

Sep-06

Oct-06

0.1

1

10

100

1000

10000

Nov-03

Jan-04

Mar-04

May-04

Jul-04

Sep-04

Nov-04

Jan-05

Mar-05

May-05

Jul-05

Sep-05

Nov-05

Jan-06

Mar-06

May-06

Jul-06

Sep-06

0.001

0.01

0.1

1

10

100

1000

Jul-04

Aug-04

Sep-04

Oct-04

Nov-04

Dec-04

Jan-05

Feb-05

Mar-05

Apr-05

May-05

Jun-05

Jul-05

Aug-05

Sep-05

Oct-05

Nov-05

Dec-05

Jan-06

Feb-06

Mar-06

Apr-06

May-06

Jun-06

Jul-06

Aug-06

Sep-06

Oct-06

STREAM[GB/s]

HPL[Tflop/s]

FFT[Gflop/s]

RandomAccess[GUPS]

Sum Sum

Sum Sum

#1

#1

#1

#1

Page 8: The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

8 Luszczek_HPCC_SC07

HPCC: Comparing three interconnects

Kiviat chart (radar plot)• 3 AMD Opteron clusters− Clock: 2.2 GHz− 64-processor cluster

• Interconnect typesA. VendorB. CommodityC. GigE− G-HPL− Matrix-matrix multiply

• Cannot be differentiated based on− G-HPL− Matrix-matrix multiply

• Available on HPCC Web site− http://icl.cs.utk.edu/hpcc/

Page 9: The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

9 Luszczek_HPCC_SC07

HPCC: Analysis of sample resultsHPCS ~102

HPC ~104

Clusters ~106

Systems(in Top500

order)

Meg

aG

iga

Tera

Pet

a

Effe

ctiv

e B

andw

idth

(w

ords

/sec

ond)

• All results inwords/second

• Highlightsmemoryhierarchy

• Clusters− Hierarchy

steepens

• HPC systems− Hierarchy

constant

• HPCS goals− Hierarchy

flattens− Easier to

program

Kilo

Page 10: The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

10 Luszczek_HPCC_SC07

HPCC: Augmenting June TOP500

TOP500 rating Data provided by HPCC database

Page 11: The HPC Challenge (HPCC) Benchmark Suite · 2007. 10. 2. · 3 Luszczek_HPCC_SC07 HPCC: Motivation and measurement HPC Challenge Benchmarks Select Applications 0.00 0.20 0.40 0.60

11 Luszczek_HPCC_SC07

Contacts

Piotr Luszczek

The MathWorks, Inc.(508) [email protected]