Page 1: Benchmarks

Benchmarks: programs specifically chosen to measure performance.
They must reflect the typical workload of the user.

Benchmark types:
- Real applications
- Small benchmarks
- Benchmark suites
- Synthetic benchmarks

Page 2: Real Applications

Workload: Set of programs a typical user runs day in and day out.

Using real applications as metrics is a direct way of comparing the execution time of the workload on two machines.

Using real applications for metrics has certain restrictions:
- They are usually big
- They take time to port to different machines
- They take considerable time to execute
- It is hard to observe the outcome of a specific improvement technique

Page 3: Comparing & Summarizing Performance

A is 100 times faster than B for program 1.
B is 10 times faster than A for program 2.
For total performance, the arithmetic mean is used:

             Computer A   Computer B
Program 1    1 s          100 s
Program 2    1000 s       100 s
Total time   1001 s       200 s

$\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Time}_i$

Page 4: Arithmetic Mean

If the programs in the workload are not run an equal number of times, we must use the weighted arithmetic mean:

$\mathrm{WAM} = \sum_{i=1}^{n} w_i \times \mathrm{Time}_i$

                      weight   Computer A   Computer B
Program 1 (seconds)   10       1            100
Program 2 (seconds)   1        1000         100
Weighted AM           -        ?            ?

• Suppose that the program 1 runs 10 times as often as the program 2. Which machine is faster?
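A sketch (not from the slides) of the weighted arithmetic mean for the table above; normalizing by the total weight is an assumption that keeps the result in seconds:

```python
def weighted_am(weights, times):
    # sum of w_i * Time_i, divided by the total weight
    total_w = sum(weights)
    return sum(w * t for w, t in zip(weights, times)) / total_w

weights = [10, 1]     # program 1 runs 10 times as often as program 2
times_a = [1, 1000]   # seconds on Computer A
times_b = [100, 100]  # seconds on Computer B

print(weighted_am(weights, times_a))  # ~91.8 s
print(weighted_am(weights, times_b))  # 100.0 s, so Computer A is faster
```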

Page 5: Small Benchmarks

Small code segments which are common in many applications.
For example, loops with a certain instruction mix:

for (j = 0; j < 8; j++)
    S = S + A[j] * B[i-j];

Good for architects and designers.
Since small code segments are easy to compile and simulate, even by hand, designers use these kinds of benchmarks while working on a novel machine.

Can be abused by compiler designers by introducing special-purpose optimizations targeted at a specific benchmark.
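For illustration (not part of the slides), the loop above can be rewritten and timed in Python; the array contents and repetition count are hypothetical:

```python
import timeit

A = list(range(8))   # hypothetical coefficient array
B = list(range(16))  # hypothetical signal array
i = 8                # hypothetical output index

def kernel():
    # S = S + A[j] * B[i-j]: the inner loop of a convolution
    S = 0
    for j in range(8):
        S += A[j] * B[i - j]
    return S

# run the kernel many times to obtain a measurable elapsed time
elapsed = timeit.timeit(kernel, number=100_000)
print(kernel(), elapsed)
```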

Page 6: Benchmark Suites

SPEC (Standard Performance Evaluation Corporation):
Non-profit organization that aims to produce "fair, impartial and meaningful benchmarks for computers".
Began in 1989 with SPEC89 (CPU-intensive).
Companies agreed on a set of real programs and inputs which they hope best reflect a typical user's workload.

Valuable indicator of performance, but can still be abused.
Updates are required as the applications and their workloads change over time.

Page 7: SPEC Benchmark Sets

- CPU performance (SPEC CPU2006)
- Graphics (SPECviewperf)
- High-performance computing (HPC2002, MPI2007, OMP2001)
- Java server applications (jAppServer2004): a multi-tier benchmark for measuring the performance of Java 2 Enterprise Edition (J2EE) technology-based application servers
- Mail systems (MAIL2001, SPECimap2003)
- Network file systems (SFS97_R1 (3.0))
- Web servers (SPECweb99, SPECweb99 SSL)

More information: http://www.spec.org/

Page 8: SPECint (Integer Benchmarks)

Name Description

400.perlbench Programming Language

401.bzip2 Compression

403.gcc C Compiler

429.mcf Combinatorial Optimization

445.gobmk Artificial Intelligence

456.hmmer Search Gene Sequence

458.sjeng Artificial Intelligence

462.libquantum Physics / Quantum Computing

464.h264ref Video Compression

471.omnetpp Discrete Event Simulation

473.astar Path-finding Algorithms

483.xalancbmk XML Processing

Page 9: SPECfp (Floating-Point Benchmarks)

Name Type

wupwise Quantum chromodynamics

swim Shallow water model

mgrid Multigrid solver in 3D potential field

applu Parabolic/elliptic partial differential equations

mesa Three-dimensional graphics library

galgel Computational fluid dynamics

art Image recognition using neural nets

equake Seismic wave propagation simulation

facerec Image recognition of faces

ammp Computational chemistry

lucas Primality testing

fma3d Crash simulation

sixtrack High-energy nuclear physics accelerator design

apsi Meteorology: pollutant distribution

Page 10: SPEC CPU2006 – Summarizing

SPEC ratio: execution time measurements are normalized by dividing the execution time on the reference machine by the measured execution time.
The reference machine is a Sun Microsystems Fire V20z with an AMD Opteron 252 CPU running at 2600 MHz.
Example: the 164.gzip benchmark executes in 90.4 s; its reference time is 1400 s, so its SPEC ratio is 1400/90.4 × 100 = 1548 (a unitless value).
The performance of the programs in a suite is summarized using the geometric mean of the SPEC ratios.
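The SPEC ratio and geometric-mean summary described above can be sketched as follows (the ×100 scaling follows the slide's gzip example):

```python
import math

def spec_ratio(reference_time, measured_time):
    # reference time divided by measured time, scaled by 100 as on the slide
    return reference_time / measured_time * 100

def geometric_mean(ratios):
    # nth root of the product, computed via logarithms for numerical stability
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

print(spec_ratio(1400, 90.4))      # about 1548.7, as in the example
print(geometric_mean([2.0, 8.0]))  # ~4.0
```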

Page 11: Pentium III & Pentium 4

Page 12: Comparing Pentium III and Pentium 4

Ratio                        Pentium III   Pentium 4
CINT2000/Clock rate in MHz   0.47          0.36
CFP2000/Clock rate in MHz    0.34          0.39

Implementation efficiency?

Page 13: SPECweb99

System     Processor          # disk drives  # CPUs  # networks  Clock rate (GHz)  Result
1550/1000  Pentium III        2              2       2           1                 2765
1650       Pentium III        3              2       1           1.4               1810
2500       Pentium III        8              2       4           1.13              3435
2550       Pentium III        1              2       1           1.26              1454
2650       Pentium 4 Xeon     5              2       4           3.06              5698
4600       Pentium 4 Xeon     10             2       4           2.2               4615
6400/700   Pentium III Xeon   5              4       4           0.7               4200
6600       Pentium 4 Xeon MP  8              4       8           2                 6700
8450/700   Pentium III Xeon   7              8       8           0.7               8001

Page 14: Power Consumption Concerns

Performance is studied at different levels:
1. Maximum power
2. An intermediate level that conserves battery life
3. Minimum power that maximizes battery life

Intel mobile Pentium and Pentium M processors offer two clock rates:
1. Maximum clock rate
2. Reduced clock rate

- Pentium M @ 1.6/0.6 GHz
- Pentium 4-M @ 2.4/1.2 GHz
- Pentium III-M @ 1.2/0.8 GHz

Page 15: Three Intel Mobile Processors

Page 16: Energy Efficiency

Page 17: Synthetic Benchmarks

Artificial programs constructed to try to match the characteristics of a large set of programs.
Goal: create a single benchmark program where the execution frequency of instructions in the benchmark simulates the instruction frequency in a large set of benchmarks.

Examples: Dhrystone, Whetstone

They are not real programs.
Compiler and hardware optimizations can inflate the improvement far beyond what the same optimization would achieve with real programs.

Page 18: Amdahl's Law in Computing

Improving one aspect of a machine by a factor of n does not improve the overall performance by the same amount.

Speedup = (Performance after improvement) / (Performance before improvement)
        = (Execution time before improvement) / (Execution time after improvement)

Execution time after improvement = Execution time unaffected + (Execution time affected / n)
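The relations above can be expressed directly in Python; the numbers below are made up for illustration:

```python
def time_after_improvement(t_unaffected, t_affected, n):
    # the improved portion shrinks by a factor of n; the rest is unchanged
    return t_unaffected + t_affected / n

def overall_speedup(t_before, t_unaffected, t_affected, n):
    return t_before / time_after_improvement(t_unaffected, t_affected, n)

# hypothetical 60 s program, half of it sped up 3x
print(time_after_improvement(30, 30, 3))  # 40.0 s
print(overall_speedup(60, 30, 30, 3))     # 1.5, not 3
```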

Page 19: Amdahl's Law

Example: Suppose a program runs in 100 s on a machine, with multiplication responsible for 80 s of this time.

How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?

Can we improve the performance by a factor of 5?
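A quick check of this example, using the relation from the previous slide:

```python
total, mult = 100.0, 80.0
other = total - mult  # 20 s not spent in multiplication

# 4x faster overall means the new total must be 100/4 = 25 s,
# so multiplication must shrink to 5 s: n = 80 / 5 = 16
target = total / 4
n = mult / (target - other)
print(n)              # 16.0

# 5x faster overall would need a 20 s total, but even infinitely fast
# multiplication still leaves the 20 s of other work: the speedup
# can only approach, never reach, 100/20 = 5
limit = total / other
print(limit)          # 5.0
```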

Page 20: Amdahl's Law

The performance enhancement possible from a given improvement is limited by the amount that the improved feature is used.

In the previous example, it makes sense to improve multiplication, since it accounts for 80% of all execution time.
But after a certain improvement is made, further effort to optimize multiplication will yield insignificant gains: the Law of Diminishing Returns.

A corollary to Amdahl's Law is to make the common case fast.

Page 21: Examples

Suppose we enhance a machine so that all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions?

We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 90 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?
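Both exercises can be checked with the speedup relations above:

```python
# Exercise 1: 10 s total, 5 s of floating point made 5x faster
t_after = (10 - 5) + 5 / 5  # 5 s unaffected + 1 s improved = 6 s
speedup = 10 / t_after
print(speedup)              # ~1.67

# Exercise 2: 90 s total, overall speedup of 3 with 5x faster FP
# 90/3 = (90 - F) + F/5, solved for the floating-point time F
target = 90 / 3
F = (90 - target) * 5 / 4
print(F, F / 90)            # 75.0 s, about 83% of execution time
```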

Page 22: Remember

Total execution time is a consistent summary of performance.

Execution time = (IC × CPI) / f

For a given architecture, performance increases come from:
1. increases in clock rate (without too many adverse CPI effects)
2. improvements in processor organization that lower CPI
3. compiler enhancements that lower CPI and/or IC
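The execution-time formula above can be sketched with hypothetical values for instruction count, CPI, and clock rate:

```python
def execution_time(ic, cpi, f_hz):
    # Execution time = (instruction count x cycles per instruction) / clock rate
    return ic * cpi / f_hz

base = execution_time(1e9, 2.0, 1e9)          # 1B instructions, CPI 2, 1 GHz
faster_clock = execution_time(1e9, 2.0, 2e9)  # doubling f halves the time
lower_cpi = execution_time(1e9, 1.0, 1e9)     # halving CPI does the same
print(base, faster_clock, lower_cpi)          # 2.0 1.0 1.0
```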