CpE442 Lec2.1 Introduction to Computer Architectures CpE 442 Introduction to Computer Architecture The Role of Performance Instructor: H. H. Ammar

CpE 242 Introduction to Computer Architecture Lecture 2 ...community.wvu.edu/~hhammar/cpe242_files/lec2f.pdf · CpE442 Lec2.7 Introduction to Computer Architectures Review: Summary


CpE442 Lec2.1 Introduction to Computer Architectures

CpE 442 Introduction to Computer Architecture

The Role of Performance

Instructor: H. H. Ammar


Overview of Today’s Lecture: The Role of Performance

° Review from Last Lecture

° Definition and Measures of Performance

° Benchmarks

° Summarizing Performance and Performance Pitfalls


Review: What is "Computer Architecture"?

° Co-ordination of levels of abstraction

  • Application

  • Operating System

  • Compiler

  • Instruction Set Architecture (Instr. Set Proc., I/O system)

  • Digital Design

  • Circuit Design

° Under a set of rapidly changing Forces


Review: Levels of Representation

High Level Language Program

    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

  ↓ Compiler

Assembly Language Program

    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)

  ↓ Assembler

Machine Language Program

    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111

  ↓ Machine Interpretation

Control Signal Specification


Review: Levels of Organization

SPARCstation 20

° Computer

  • Processor (SPARC Processor): Control and Datapath

  • Memory

  • Devices: Input and Output


Computer Architecture Simulation Tools

1. The HASE Architecture Simulation Environment

2. The New Compiler Technology simulation (shown in class)

3. MIPS Assembly Language Simulators

   a. SPIM, a MIPS32 simulator: http://pages.cs.wisc.edu/~larus/spim.html

   b. MARS (MIPS Assembler and Runtime Simulator): http://courses.missouristate.edu/kenvollmar/mars/


Review: Summary from Last Lecture

° All computers consist of five components

• Processor: (1) datapath and (2) control

• (3) Memory

• (4) Input devices and (5) Output devices

° Not all “memory” is created equal

• Cache: fast (expensive) memory is placed closer to the processor

• Main memory: less expensive memory--we can have more

° Input and output (I/O) devices have the messiest organization

• Wide range of speed: graphics vs. keyboard

• Wide range of requirements: speed, standards, cost, etc.

• Least amount of research (so far)


Metrics of performance

° Levels of the system, from the application down to the hardware:

  • Application

  • Programming Language

  • Compiler

  • ISA

  • Datapath, Control

  • Function Units

  • Transistors, Wires, Pins

° Metrics used at these levels:

  • Response time, answers per month

  • Operations per second

  • (Millions of) instructions per second: MIPS

  • (Millions of) floating-point operations per second: MFLOP/s

  • Megabytes per second

  • Cycles per second (clock rate)


Relating Processor Metrics

The execution time of a given program on a given CPU architecture:

° CPU execution time = CPU clock cycles/pgm X clock cycle time

° or CPU execution time = CPU clock cycles/pgm ÷ clock rate

° Define CPI = the average number of clock cycles per instruction. CPI tells us something about the Instruction Set Architecture, the implementation of that architecture, and the program being measured.

° CPU clock cycles/pgm = Instructions/pgm X CPI

° or CPI = CPU clock cycles/pgm ÷ Instructions/pgm
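The relations above can be sketched in a few lines of Python. This is only an illustration: the function name `cpu_time_seconds` and the sample program (2 million instructions, CPI of 1.5, 500 MHz clock) are assumptions chosen for the sketch, not values from the lecture.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU execution time = (instructions/pgm x CPI) / clock rate."""
    # Clock cycles per program = instructions per program x average CPI.
    clock_cycles = instruction_count * cpi
    # Dividing by the clock rate is the same as multiplying by the cycle time.
    return clock_cycles / clock_rate_hz


def cpi_from_cycles(clock_cycles, instruction_count):
    """CPI = CPU clock cycles/pgm divided by instructions/pgm."""
    return clock_cycles / instruction_count


if __name__ == "__main__":
    # Hypothetical program: 2 million instructions, CPI 1.5, 500 MHz clock.
    seconds = cpu_time_seconds(2_000_000, 1.5, 500e6)
    print(f"CPU time: {seconds * 1e3:.1f} ms")
```

Keeping instruction count, CPI, and clock rate as separate arguments mirrors the way the lecture attributes each factor to a different part of the design.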




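Aspects of CPU Performance

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                  instr. count     CPI     clock rate
Program                X           (x)
Compiler               X           (x)
Instr. Set             X            X
Organization                        X          X
Technology                                     X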


Figures from a simulator for the following code segment, comparing two compilers:

    for (i = 0; i < 3; i++) {
        int_a[i]++;
        int_b[i]++;
        flt_d[i] = flt_d[i] + flt_c[i];
    }

[Figures: simulator results for the code segment above under each compiler]

Organizational Trade-offs

° Levels of the system: Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins

° Quantities traded off: Instruction Mix, CPI, Cycle Time

° Single-Cycle Processor Design

  • CPI = 1, large cycle time (slow clock)

° Multi-cycle Processor Design

  • CPI > 1, smaller cycle time (faster clock)


CPI

“Average cycles per instruction”

CPI = (CPU Time * Clock Rate) / Instruction Count

    = Clock Cycles / Instruction Count

The performance equation can be written as follows using instruction classes, with the instruction count $I_i$ and $CPI_i$ for each class $i$:

$$\text{CPU time} = \text{ClockCycleTime} \times \sum_{i=1}^{n} CPI_i \times I_i$$

$$CPI = \sum_{i=1}^{n} CPI_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}}$$

$F_i$ is the "instruction frequency" of class $i$.

Invest Resources where time is Spent!

See the example on the next slide.


Example

Typical Mix, Base Machine (Reg / Reg)

Op        Freq (Fi)    CPI(i)    Fi x CPI(i)    % Time
ALU          50%          1          0.5          33%
Load         20%          2          0.4          27%
Store        10%          2          0.2          13%
Branch       20%          2          0.4          27%
Total                                1.5

The CPI = 1.5 cycles per instruction
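The table's arithmetic can be checked with a short Python sketch. The frequencies and per-class CPIs are the ones in the base-machine table; the names `BASE_MIX`, `average_cpi`, and `time_share` are assumptions introduced only for this illustration.

```python
# Instruction mix for the base machine: class -> (frequency F_i, CPI_i).
BASE_MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix):
    """CPI = sum over classes of F_i x CPI_i."""
    return sum(freq * cpi for freq, cpi in mix.values())


def time_share(mix):
    """Fraction of execution time spent in each instruction class."""
    total = average_cpi(mix)
    return {op: freq * cpi / total for op, (freq, cpi) in mix.items()}


if __name__ == "__main__":
    print(f"CPI = {average_cpi(BASE_MIX):.2f}")
    for op, share in time_share(BASE_MIX).items():
        print(f"{op:<7}{share:.0%}")
```

The per-class time share is what "Invest resources where time is spent" refers to: a class's weight is its frequency multiplied by its CPI, not its frequency alone.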


Assume a program of 1 million instructions. Compare the performance of:

° Base Machine (B), with the CPI of 1.5 computed above and a 1 GHz clock, and

° Enhanced Machine (E), with a 1.333 GHz clock and a one-cycle increase for load/store and branch instructions

Enhanced Machine (Reg / Reg)

Op        Freq (Fi)    CPI(i)    Fi x CPI(i)    % Time
ALU          50%          1          0.5          25%
Load         20%          3          0.6          30%
Store        10%          3          0.3          15%
Branch       20%          3          0.6          30%
Total                                2.0


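Comparing the performance of two machines

Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o E}}{\text{ExTime w/ E}} = \frac{\text{Performance w/ E}}{\text{Performance w/o E}}$$

$$= \frac{\text{Perf. of E}}{\text{Perf. of B}} = \frac{\text{exec. time of B}}{\text{exec. time of E}} = \frac{1.5 \times 1\ \text{ns}}{2.0 \times 0.75\ \text{ns}} = 1$$

The instruction count is the same for both machines and cancels; 1.5 and 2.0 are the CPIs of B and E, and 1 ns and 0.75 ns are their clock cycle times (1 GHz and 1.333 GHz).

The performance of B equals that of E: the enhancement gives no gain in performance.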
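The comparison can be reproduced numerically with the sketch below. It uses the figures from the example (1 million instructions, CPI 1.5 at 1 GHz for B, CPI 2.0 at 1.333 GHz for E); the helper names are assumptions made for the illustration. Because 1.333 GHz is a rounded value, the computed speedup comes out just under 1 rather than exactly 1.

```python
INSTRUCTION_COUNT = 1_000_000  # program size given in the example


def execution_time(instruction_count, cpi, clock_rate_hz):
    """Execution time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def speedup(time_without, time_with):
    """Speedup(E) = ExTime without E / ExTime with E."""
    return time_without / time_with


if __name__ == "__main__":
    time_b = execution_time(INSTRUCTION_COUNT, 1.5, 1.0e9)    # base machine
    time_e = execution_time(INSTRUCTION_COUNT, 2.0, 1.333e9)  # enhanced machine
    print(f"B: {time_b * 1e3:.3f} ms, E: {time_e * 1e3:.3f} ms")
    print(f"Speedup(E) = {speedup(time_b, time_e):.3f}")
```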


Rate Metrics: MIPS (Million Instructions Per Second) and MFLOPS (Million Floating-Point Operations Per Second)

MIPS = Instruction Count / (CPU Time * 10^6)

= Clock Rate / (CPI * 10^6)

• machines with different instruction sets?

• programs with different instruction mixes? (dynamic frequency of instructions)

• uncorrelated with performance

MFLOP/s = FP Operations / (Time * 10^6)

• machine dependent

• often not where time is spent


Example showing why MIPS can fail

Compare performance with Compilers 1 and 2 for a given program on a given machine.

Instruction count in billions for instruction classes A, B, C:

                                  A     B     C
Compiler 1 instruction count      5     1     1
Compiler 2 instruction count     10     1     1
CPI for each class                1     2     3

Clock cycles using compiler 1 = 5x1 + 1x2 + 1x3 = 10 billion

Clock cycles using compiler 2 = 10x1 + 1x2 + 1x3 = 15 billion

Assuming a 1 GHz clock:

CPU Time 1 = 10 billion cycles / 1 GHz = 10 secs

CPU Time 2 = 15 billion cycles / 1 GHz = 15 secs

Yet the MIPS rating, MIPS = instr. count / (CPU time in sec x 10^6), is:

MIPS 1 = (5+1+1)/10 * 1000 = 700

MIPS 2 = (10+1+1)/15 * 1000 = 800

giving the impression that compiler 2 has a higher rate of executing instructions than compiler 1, even though its code takes longer to run.
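The two-compiler example can be recomputed with a short Python sketch. The instruction counts, per-class CPIs, and the 1 GHz clock come from the example; the names `CPI_PER_CLASS`, `COMPILER_1`, `COMPILER_2`, and `cpu_time_and_mips` are assumptions for the illustration.

```python
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}
CLOCK_RATE_HZ = 1e9  # 1 GHz clock, as assumed in the example

# Instruction counts in billions for each instruction class.
COMPILER_1 = {"A": 5, "B": 1, "C": 1}
COMPILER_2 = {"A": 10, "B": 1, "C": 1}


def cpu_time_and_mips(counts_in_billions):
    """Return (CPU time in seconds, MIPS rating) for one compiler's output."""
    instruction_count = sum(counts_in_billions.values()) * 1e9
    clock_cycles = 1e9 * sum(
        count * CPI_PER_CLASS[cls] for cls, count in counts_in_billions.items()
    )
    cpu_time = clock_cycles / CLOCK_RATE_HZ
    # MIPS = instruction count / (CPU time x 10^6)
    mips = instruction_count / (cpu_time * 1e6)
    return cpu_time, mips


if __name__ == "__main__":
    for name, counts in (("Compiler 1", COMPILER_1), ("Compiler 2", COMPILER_2)):
        seconds, mips = cpu_time_and_mips(counts)
        print(f"{name}: {seconds:.0f} s, {mips:.0f} MIPS")
```

Compiler 2 gets the higher MIPS rating only because its extra instructions are cheap class-A instructions; its execution time, the measure that matters, is worse.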


Why Do Benchmarks?

° How we evaluate differences

• Different systems

• Changes to a single system

° Provide a target

• Benchmarks should represent a large class of important programs

• Improving benchmark performance should help many programs

° For better or worse, benchmarks shape a field

° Good ones accelerate progress

• good target for development

° Bad benchmarks hurt progress

• Do they help real programs, or just sell machines and papers?

• Inventions that help real programs may not help the benchmark


Programs to Evaluate Processor Performance

° (Toy) Benchmarks

• 10-100 lines

• e.g., sieve, puzzle, quicksort

° Synthetic Benchmarks

• attempt to match average frequencies of real workloads

• e.g., Whetstone, Dhrystone

° Kernels

• Time-critical excerpts

° Real programs

• e.g., gcc, spice


Successful Benchmark: SPEC

http://www.spec.org/benchmarks.html

http://mrob.com/pub/comp/benchmarks/spec.html#CPU_06

° EE Times + 5 companies (Sun, MIPS, HP, Apollo, DEC) banded together to form the System Performance Evaluation Cooperative (SPEC), now the Standard Performance Evaluation Corporation

° Create standard list of programs, inputs, reporting: some real programs, includes OS calls, some I/O


SPEC second round, SPEC95

• 8 integer benchmarks in C and 10 floating-point benchmarks in Fortran


Amdahl's Law

Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o E}}{\text{ExTime w/ E}} = \frac{\text{Performance w/ E}}{\text{Performance w/o E}}$$

Suppose that enhancement E accelerates a fraction F of the task

by a factor S and the remainder of the task is unaffected then,

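$$\text{ExTime(with E)} = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without E)}$$

$$\text{Speedup(with E)} = \frac{\text{ExTime(without E)}}{\left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without E)}} = \frac{1}{(1-F) + \frac{F}{S}} \le \frac{1}{1-F}$$

The speedup is bounded by the factor 1/(1-F), no matter how large S is.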
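A minimal Python sketch of Amdahl's Law follows. The function names and the sample values (F = 0.4, S = 10) are assumptions chosen for illustration; they do not come from the lecture.

```python
def amdahl_speedup(fraction_enhanced, enhancement_factor):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("F must be between 0 and 1")
    if enhancement_factor <= 0:
        raise ValueError("S must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement_factor)


def amdahl_upper_bound(fraction_enhanced):
    """Limit of the speedup as S grows without bound: 1 / (1 - F)."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # Hypothetical case: 40% of the task is made 10 times faster.
    print(f"Speedup: {amdahl_speedup(0.4, 10):.4f}")
    print(f"Upper bound: {amdahl_upper_bound(0.4):.4f}")
```

Raising S in the hypothetical case moves the speedup toward the bound but never past it, since the unimproved 60% of the task still has to run at the original speed.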


Performance Evaluation Summary

° Time is the measure of computer performance!

° Good products are created when we have:

• Good benchmarks

• Good ways to summarize performance

° Without good benchmarks and a good summary, the choice is between improving the product for real programs and improving the product to get more sales; sales almost always wins

° Remember Amdahl’s Law: Speedup is limited by the unimproved part of the program

° HW 1, Submit via ecampus

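$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$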