Compsci 001 4.1 Today’s topics Performance & Computer Architecture Notes from David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 1997. http://computer.howstuffworks.com/pc.htm Slides from Alvy Lebeck, Duke CS Marti Hearst, UC Berkeley SIMS David Patterson, UC Berkeley CS Mounir Hamdi, HKUST CS Upcoming Complexity

Compsci 001 4.1

Today’s topics

Performance & Computer Architecture

• Notes from David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 1997.
• http://computer.howstuffworks.com/pc.htm
• Slides from
  • Alvy Lebeck, Duke CS
  • Marti Hearst, UC Berkeley SIMS
  • David Patterson, UC Berkeley CS
  • Mounir Hamdi, HKUST CS

Upcoming

• Complexity

Compsci 001 4.2

Performance

Performance = 1/Time

The goal for all software and hardware developers is to increase performance

Metrics for measuring performance (pros/cons?)

• Elapsed time
• CPU time
  • Instruction count (RISC vs. CISC)
  • Clock cycles per instruction
  • Clock cycle time
• MIPS vs. MFLOPS
• Throughput (tasks/time)
• Other more subjective metrics?

What kind of workload should be used?

• Applications, kernels and benchmarks (toy or synthetic)
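
The three CPU-time factors listed above (instruction count, clock cycles per instruction, clock cycle time) multiply together to give CPU time, and performance is the reciprocal of that time. The sketch below works one example; the two machines and all of their numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x cycles per instruction x clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cpi * clock_cycle_time


def performance(time_seconds):
    """Performance = 1 / Time, as defined on the slide."""
    return 1.0 / time_seconds


# Hypothetical machines running the same program at the same 2 GHz clock.
# Machine A executes more instructions, but each takes fewer cycles.
time_a = cpu_time_seconds(instruction_count=2_000_000_000, cpi=1.5, clock_rate_hz=2e9)
time_b = cpu_time_seconds(instruction_count=1_200_000_000, cpi=3.0, clock_rate_hz=2e9)

print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")
print(f"A is {performance(time_a) / performance(time_b):.2f}x the performance of B")
```

Note that the machine with the higher instruction count still wins here, which is one reason instruction count alone is a poor metric.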

Compsci 001 4.3

What is Realtime?

• Response time
  • Panic
    • How to tell “I am still computing”
    • Progress bar
• Flicker fusion frequency
  • Update rate vs. refresh rate
  • Movie film standards (24 fps projected at 48 fps)
• Interactive media
  • Interactive vs. non-interactive graphics
    • computer games vs. movies
    • animation tools vs. animation
  • Interactivity: real-time systems
    • system must respond to user inputs without any perceptible delay (A Primary Challenge in VR)
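
The film example (24 fps projected at 48 fps) separates the update rate, how often a new image arrives, from the refresh rate, how often the display flashes. A small sketch of that arithmetic follows; the function name is made up for illustration, and it assumes the refresh rate is a whole multiple of the update rate.

```python
def frame_timing(update_rate_fps, refresh_rate_hz):
    """Return (flashes per frame, ms per new frame, ms per flash)."""
    if refresh_rate_hz % update_rate_fps != 0:
        raise ValueError("sketch assumes refresh rate is a multiple of update rate")
    flashes_per_frame = refresh_rate_hz // update_rate_fps
    frame_time_ms = 1000.0 / update_rate_fps   # time budget to produce a new image
    flash_time_ms = 1000.0 / refresh_rate_hz   # time between flashes on screen
    return flashes_per_frame, frame_time_ms, flash_time_ms


flashes, frame_ms, flash_ms = frame_timing(update_rate_fps=24, refresh_rate_hz=48)
print(f"each frame is shown {flashes} times")
print(f"new image every {frame_ms:.1f} ms, flash every {flash_ms:.1f} ms")
```

An interactive system has to compute each new image inside the per-frame budget; a movie only has to play back images computed in advance.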

Compsci 001 4.4

The Big Picture

Since 1946 all computers have had 5 components

• Processor
  • Control
  • Datapath
• Memory
• Input
• Output

The Von Neumann Machine

What is computer architecture?

Computer Architecture = Machine Organization + Instruction Set Architecture + ...

Compsci 001 4.5

Fetch, Decode, Execute Cycle

• Computer instructions are stored (as bits) in memory
• A program’s execution is a loop
  • Fetch instruction from memory
  • Decode instruction
  • Execute instruction
• Clock rate
  • Measured in hertz (cycles per second)
  • A 2 GHz processor can execute this cycle up to 2 billion times a second
  • Not all cycles are the same though…
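
The loop described above can be sketched as a toy interpreter. Everything about this machine is an assumption made for illustration: the four opcodes, the single accumulator register, and the use of Python tuples in place of bit-encoded instructions. It is not any real instruction set.

```python
def run(memory, max_steps=1000):
    """Run a toy program held in 'memory' as (opcode, operand) pairs.

    Returns the final accumulator value and the number of loop iterations.
    """
    pc = 0    # program counter: index of the next instruction in memory
    acc = 0   # single accumulator register
    for cycles in range(1, max_steps + 1):
        instruction = memory[pc]        # fetch
        pc += 1
        opcode, operand = instruction   # decode
        if opcode == "LOAD":            # execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "MUL":
            acc *= operand
        elif opcode == "HALT":
            return acc, cycles
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise RuntimeError("program did not halt within max_steps")


program = [("LOAD", 2), ("ADD", 3), ("MUL", 4), ("HALT", 0)]
result, cycles = run(program)
print(f"result = {result} after {cycles} fetch/decode/execute iterations")
```

In this sketch every instruction costs one iteration. On real hardware instructions can take different numbers of clock cycles, which is the point of "not all cycles are the same".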

Compsci 001 4.6

Organization

Logic Designer's View

(figure: ISA Level above FUs & Interconnect)

• Capabilities & Performance Characteristics of Principal Functional Units (FUs) (e.g., Registers, ALU, Shifters, Logic Units, ...)
• Ways in which these components are interconnected
• Information flows between components
• Logic and means by which such information flow is controlled
• Choreography of FUs to realize the ISA

Compsci 001 4.7

Instruction Set Architecture

... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.

– Amdahl, Blaauw, and Brooks, 1964

SOFTWARE

-- Organization of Programmable Storage

-- Data Types & Data Structures: Encodings & Representations

-- Instruction Set

-- Instruction Formats

-- Modes of Addressing and Accessing Data Items and Instructions

-- Exceptional Conditions

Compsci 001 4.8

The Instruction Set: a Critical Interface

instruction set

What is an example of an Instruction Set architecture?

Compsci 001 4.9

Forces on Computer Architecture

(figure: forces acting on Computer Architecture)

• Technology
• Programming Languages
• Operating Systems
• History
• Applications
• Cleverness

Compsci 001 4.10

Technology

In ~1985 the single-chip processor (32-bit) and the single-board computer emerged
=> workstations, personal computers, multiprocessors have been riding this wave since

Now, we have multicore processors

DRAM chip capacity

Year    Size
1980    64 Kb
1983    256 Kb
1986    1 Mb
1989    4 Mb
1992    16 Mb
1996    64 Mb
1999    256 Mb
2002    1 Gb
2007    2 Gb
2009    4 Gb

(figure: Microprocessor Logic Density, by uP-Name)

Compsci 001 4.11

Technology => dramatic change

• Processor
  • logic capacity: about 30% per year
  • clock rate: about 20% per year
• Memory
  • DRAM capacity: about 60% per year (4x every 3 years)
  • Memory speed: about 10% per year
  • Cost per bit: improves about 25% per year
• Disk
  • capacity: about 60% per year
  • Total use of data: 100% per 9 months!
• Network Bandwidth
  • Bandwidth increasing more than 100% per year!
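
Annual growth rates compound, which is how "about 60% per year" and "4x every 3 years" can describe the same DRAM trend. The sketch below checks that and compounds the other quoted rates over a decade. The ten-year horizon is an arbitrary choice for illustration, and the rates are the approximate figures from the slide, not measured data.

```python
def growth_factor(rate_per_year, years):
    """Total multiplicative growth after compounding an annual rate."""
    return (1.0 + rate_per_year) ** years


# 60% per year compounded over 3 years is roughly the slide's "4x every 3 years".
dram_three_years = growth_factor(0.60, 3)
print(f"DRAM capacity after 3 years: {dram_three_years:.2f}x")

# Compound each quoted rate over an (arbitrarily chosen) 10-year span.
rates = {
    "processor logic capacity": 0.30,
    "processor clock rate": 0.20,
    "DRAM capacity": 0.60,
    "memory speed": 0.10,
}
for name, rate in rates.items():
    print(f"{name}: {growth_factor(rate, 10):.1f}x over 10 years")
```

The wide spread between the capacity rows and the memory speed row is the imbalance that the processor-memory speed gap slide picks up later.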

Compsci 001 4.12

Performance Trends

Compsci 001 4.13

Processor Transistor Count (from http://en.wikipedia.org/wiki/Transistor_count)

Processor                    Transistor count   Date of introduction   Manufacturer
Intel 4004                   2 300              1971                   Intel
Intel 8008                   3 500              1972                   Intel
Intel 8080                   4 500              1974                   Intel
Intel 8088                   29 000             1978                   Intel
Intel 80286                  134 000            1982                   Intel
Intel 80386                  275 000            1985                   Intel
Intel 80486                  1 200 000          1989                   Intel
Pentium                      3 100 000          1993                   Intel
AMD K5                       4 300 000          1996                   AMD
Pentium II                   7 500 000          1997                   Intel
AMD K6                       8 800 000          1997                   AMD
Pentium III                  9 500 000          1999                   Intel
AMD K6-III                   21 300 000         1999                   AMD
AMD K7                       22 000 000         1999                   AMD
Pentium 4                    42 000 000         2000                   Intel
Itanium                      25 000 000         2001                   Intel
Barton                       54 300 000         2003                   AMD
AMD K8                       105 900 000        2003                   AMD
Itanium 2                    220 000 000        2003                   Intel
Itanium 2 with 9MB cache     592 000 000        2004                   Intel
Cell                         241 000 000        2006                   Sony/IBM/Toshiba
Core 2 Duo                   291 000 000        2006                   Intel
Core 2 Quad                  582 000 000        2006                   Intel
Dual-Core Itanium 2          1 700 000 000      2006                   Intel
Quad-Core Itanium            2 000 000 000      200                    Intel
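
One way to read a table like this is to ask how long transistor counts took to double. The sketch below estimates an average doubling time from two rows, the Intel 4004 (2 300 transistors, 1971) and the Pentium 4 (42 000 000 transistors, 2000). Using only two endpoints is a simplifying assumption; a fit over every row would give a somewhat different number.

```python
import math


def doubling_time_years(count_start, year_start, count_end, year_end):
    """Average years per doubling between two (count, year) data points."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings


# Endpoints taken from the table above.
estimate = doubling_time_years(2_300, 1971, 42_000_000, 2000)
print(f"about {estimate:.1f} years per doubling between the 4004 and the Pentium 4")
```

The result comes out at roughly two years per doubling, in line with the usual statement of Moore's Law.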

Compsci 001 4.14

Processor-Memory Speed Gap

(figure: Performance on a log scale, 1 to 1000, plotted against year, 1980 to 2000, with one curve for CPU and one for DRAM)

• µProc: 50%/yr. (“Moore’s Law”)
• DRAM: 9%/yr. (2X/10 yrs)
• Processor-Memory Performance Gap: (grows 50% / year)

Compsci 001 4.15

Latency vs. Throughput

Compsci 001 4.16

Memory bottleneck

CPU can execute dozens of instructions in the time it takes to retrieve one item from memory

Solution: Memory Hierarchy

• Use fast memory
  • Registers
  • Cache memory
• Rule: small memory is fast, large memory is slow

Compsci 001 4.17

A great idea in computer science

• Temporal locality
  • Programs tend to access data that has been accessed recently (i.e. close in time)
• Spatial locality
  • Programs tend to access data at an address near recently referenced data (i.e. close in space)
• Useful in graphics and virtual reality as well
  • Realistic images require significant computational power
  • Don’t need to represent distant objects as well
• Efficient distributed systems rely on locality
  • Memory access time increases over a network
  • Want to access data on local machine
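
A small cache simulation shows why locality matters. This is a minimal sketch assuming a direct-mapped cache with 64 lines of 16 bytes each; those sizes and the three access patterns are made up for illustration and do not model any particular processor.

```python
def hit_rate(addresses, num_lines=64, block_size=16):
    """Fraction of accesses that hit in a simple direct-mapped cache."""
    lines = [None] * num_lines          # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // block_size      # which memory block holds this byte
        index = block % num_lines       # the one cache line that block may occupy
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block        # miss: load the block, evicting the old one
    return hits / len(addresses)


# Spatial locality: walk through consecutive byte addresses.
sequential = list(range(4096))
# Temporal locality: loop over the same 256 bytes again and again.
repeated = [i % 256 for i in range(4096)]
# Little locality: large jumps that all compete for the same cache line.
strided = [(i * 1024) % 65536 for i in range(4096)]

print(f"sequential: {hit_rate(sequential):.3f}")
print(f"repeated:   {hit_rate(repeated):.3f}")
print(f"strided:    {hit_rate(strided):.3f}")
```

The sequential walk misses once per 16-byte block and then hits on its neighbours, the repeated loop misses only on its first pass, and the strided pattern never finds its data in the cache.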

Compsci 001 4.18

Microprocessor Generations

• First generation: 1971-78
  • Behind the power curve (16-bit, <50k transistors)
• Second generation: 1979-85
  • Becoming “real” computers (32-bit, >50k transistors)
• Third generation: 1985-89
  • Challenging the “establishment” (Reduced Instruction Set Computer/RISC, >100k transistors)
• Fourth generation: 1990-
  • Architectural and performance leadership (64-bit, >1M transistors, Intel/AMD translate into RISC internally)

Compsci 001 4.19

In the beginning (8-bit)

• Intel 4004
  • First general-purpose, single-chip microprocessor
  • Shipped in 1971
  • 8-bit architecture, 4-bit implementation
  • 2,300 transistors
  • Performance < 0.1 MIPS (Million Instructions Per Sec)
• 8008: 8-bit implementation in 1972
  • 3,500 transistors
  • First microprocessor-based computer (Micral)
    • Targeted at laboratory instrumentation
    • Mostly sold in Europe

All chip photos in this talk courtesy of Michael W. Davidson and The Florida State University

Compsci 001 4.20

1st Generation (16-bit)

• Intel 8086
  • Introduced in 1978
    • Performance < 0.5 MIPS
  • New 16-bit architecture
    • “Assembly language” compatible with 8080
    • 29,000 transistors
    • Includes memory protection, support for Floating Point coprocessor
• In 1981, IBM introduces PC
  • Based on 8088--8-bit bus version of 8086

Compsci 001 4.21

2nd Generation (32-bit)

• Motorola 68000
  • Major architectural step in microprocessors:
    • First 32-bit architecture
      • initial 16-bit implementation
    • First flat 32-bit address
      • Support for paging
    • General-purpose register architecture
      • Loosely based on PDP-11 minicomputer
  • First implementation in 1979
    • 68,000 transistors
    • < 1 MIPS (Million Instructions Per Second)
  • Used in
    • Apple Mac
    • Sun, Silicon Graphics, & Apollo workstations

Compsci 001 4.22

3rd Generation

• MIPS R2000
  • Several firsts:
    • First (commercial) RISC microprocessor
    • First microprocessor to provide integrated support for instruction & data cache
    • First pipelined microprocessor (sustains 1 instruction/clock)
  • Implemented in 1985
    • 125,000 transistors
    • 5-8 MIPS (Million Instructions per Second)

Compsci 001 4.23

4th Generation (64 bit)

• MIPS R4000
  • First 64-bit architecture
  • Integrated caches
    • On-chip
    • Support for off-chip, secondary cache
  • Integrated floating point
  • Implemented in 1991:
    • Deep pipeline
    • 1.4M transistors
    • Initially 100MHz
    • > 50 MIPS
• Intel translates 80x86/Pentium X instructions into RISC internally

Compsci 001 4.24

Key Architectural Trends

• Increase performance at 1.6x per year (2X/1.5yr)
  • True from 1985-present
• Combination of technology and architectural enhancements
  • Technology provides faster transistors (speed ∝ 1/lithographic feature size) and more of them
  • Faster transistors lead to higher clock rates
  • More transistors (“Moore’s Law”):
    • Architectural ideas turn transistors into performance
      – Responsible for about half the yearly performance growth
• Two key architectural directions
  • Sophisticated memory hierarchies
  • Exploiting instruction level parallelism

Compsci 001 4.25

Where have all the transistors gone?

Intel Pentium III (10M transistors)

(figure labels: Execution, Icache, Dcache, branch, TLB, 2 Bus Intf, Out-Of-Order, SS)

• Superscalar (multiple instructions per clock cycle)
• Branch prediction (predict outcome of decisions)
• 3 levels of cache
• Out-of-order execution (executing instructions in a different order than the programmer wrote them)

Compsci 001 4.26

Laws?

Define each of the following. What has its effect been on the advancement of computing technology?

Moore’s Law

Amdahl’s Law

Metcalfe’s Law
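
As a starting point for the definitions, two of these laws have simple closed forms that can be tried out numerically. The sketch below uses the standard statement of Amdahl's Law (overall speedup when only part of a task is made faster) and counts pairwise links as a stand-in for Metcalfe's Law (network value growing roughly with the square of the number of users). The function names and the example numbers are illustrative assumptions.

```python
def amdahl_speedup(enhanced_fraction, enhancement_speedup):
    """Overall speedup when a fraction of the work runs enhancement_speedup times faster."""
    unchanged = 1.0 - enhanced_fraction
    return 1.0 / (unchanged + enhanced_fraction / enhancement_speedup)


def pairwise_links(users):
    """Number of distinct user pairs in a network; grows roughly as users squared."""
    return users * (users - 1) // 2


# Speeding up 80% of a program by 4x gives well under 4x overall.
print(f"overall speedup: {amdahl_speedup(0.8, 4.0):.2f}x")
# Even an enormous speedup of that 80% cannot beat 1 / 0.2 = 5x.
print(f"near the limit:  {amdahl_speedup(0.8, 1e9):.2f}x")

for n in (10, 100, 1000):
    print(f"{n} users -> {pairwise_links(n)} possible links")
```

The Amdahl numbers show why the part of a program that is not sped up ends up dominating the running time.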