Compsci 001 4.1
Today’s topics
Performance & Computer Architecture
Notes from
  David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 1997.
  http://computer.howstuffworks.com/pc.htm
Slides from
  Alvy Lebeck, Duke CS
  Marti Hearst, UC Berkeley SIMS
  David Patterson, UC Berkeley CS
  Mounir Hamdi, HKUST CS
Upcoming
  Complexity
Compsci 001 4.2
Performance
Performance = 1/Time
The goal for all software and hardware developers is to increase performance
Metrics for measuring performance (pros/cons?)
  Elapsed time
  CPU time
  • Instruction count (RISC vs. CISC)
  • Clock cycles per instruction
  • Clock cycle time
  MIPS vs. MFLOPS
  Throughput (tasks/time)
  Other more subjective metrics?
What kind of workload to be used?
  Applications, kernels and benchmarks (toy or synthetic)
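The three CPU-time factors listed above multiply together: CPU time = instruction count x clock cycles per instruction x clock cycle time. The sketch below is a minimal illustration with made-up workload numbers (the two "variants" and their instruction counts and CPI values are assumptions, not measurements). It also shows one drawback of MIPS as a metric: the variant with the higher MIPS rating can still take longer to finish.

```python
def cpu_time(instruction_count, cycles_per_instruction, clock_rate_hz):
    """CPU time in seconds = instructions x CPI x clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cycles_per_instruction * clock_cycle_time


def mips(instruction_count, seconds):
    """Millions of instructions executed per second."""
    return instruction_count / (seconds * 1e6)


CLOCK_RATE_HZ = 2e9  # a 2 GHz processor (illustrative)

# Hypothetical: the same task compiled two ways.
# Variant A uses many simple instructions, variant B fewer complex ones.
time_a = cpu_time(2_000_000_000, 1.0, CLOCK_RATE_HZ)
time_b = cpu_time(1_000_000_000, 1.8, CLOCK_RATE_HZ)

for name, count, seconds in (("A", 2_000_000_000, time_a),
                             ("B", 1_000_000_000, time_b)):
    print(f"variant {name}: {seconds:.2f} s, "
          f"{mips(count, seconds):.0f} MIPS, "
          f"performance = {1.0 / seconds:.2f} tasks/s")
```

With these assumed numbers variant A has the higher MIPS rating, yet variant B finishes first, which is why elapsed or CPU time on a real workload is the safer comparison.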
Compsci 001 4.3
What is Realtime?
Response time
  Panic
  • How to tell “I am still computing”
  • Progress bar
Flicker Fusion frequency
  Update rate vs. refresh rate
  Movie film standards (24 fps projected at 48 fps)
Interactive media
  Interactive vs. non-interactive graphics
  • computer games vs. movies
  • animation tools vs. animation
  Interactivity => real-time systems
  • system must respond to user inputs without any perceptible delay (A Primary Challenge in VR)
Compsci 001 4.4
The Big Picture
Since 1946 all computers have had 5 components
The Von Neumann Machine
  Processor
  • Control
  • Datapath
  Memory
  Input
  Output
What is computer architecture?
Computer Architecture = Machine Organization + Instruction Set Architecture + ...
Compsci 001 4.5
Fetch, Decode, Execute Cycle
Computer instructions are stored (as bits) in memory
A program’s execution is a loop
  Fetch instruction from memory
  Decode instruction
  Execute instruction
Cycle time
  Clock rate is measured in hertz (cycles per second); cycle time is its reciprocal
  2 GHz processor can execute this cycle up to 2 billion times a second
  Not all cycles are the same though…
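The fetch, decode, execute loop can be sketched as a toy simulator. Everything about this machine is an assumption made for illustration: a single accumulator, four invented opcodes (LOAD, ADD, STORE, HALT), and instructions stored as Python tuples rather than as bits. It is not a model of any real instruction set.

```python
def run(memory):
    """Run a toy accumulator machine until HALT; return (accumulator, cycles)."""
    pc = 0       # program counter: address of the next instruction
    acc = 0      # the single accumulator register
    cycles = 0   # number of fetch/decode/execute iterations
    while True:
        instruction = memory[pc]        # fetch
        pc += 1
        opcode, operand = instruction   # decode
        cycles += 1
        if opcode == "LOAD":            # execute
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return acc, cycles
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


# Instructions and data share one memory, as in a stored-program machine.
program = [
    ("LOAD", 4),    # address 0: acc = memory[4]
    ("ADD", 5),     # address 1: acc += memory[5]
    ("STORE", 6),   # address 2: memory[6] = acc
    ("HALT", 0),    # address 3: stop
    2,              # address 4: data
    3,              # address 5: data
    0,              # address 6: result goes here
]

acc, cycles = run(program)
print(acc, cycles, program[6])
```

In this sketch every instruction costs one loop iteration; on real hardware different instructions take different numbers of clock cycles, which is the point of "not all cycles are the same".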
Compsci 001 4.6
Organization
Logic Designer's View
  ISA Level
  FUs & Interconnect
Capabilities & Performance Characteristics of Principal Functional Units (FUs) (e.g., Registers, ALU, Shifters, Logic Units, ...)
Ways in which these components are interconnected
Information flows between components
Logic and means by which such information flow is controlled
Choreography of FUs to realize the ISA
Compsci 001 4.7
Instruction Set Architecture
“... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.”
– Amdahl, Blaauw, and Brooks, 1964
SOFTWARE
-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Set
-- Instruction Formats
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions
Compsci 001 4.8
The Instruction Set: a Critical Interface
instruction set
What is an example of an Instruction Set architecture?
Compsci 001 4.9
Forces on Computer Architecture
Computer Architecture is shaped by:
  Technology
  Programming Languages
  Operating Systems
  History
  Applications
  Cleverness
Compsci 001 4.10
Technology
In ~1985 the single-chip processor (32-bit) and the single-board computer emerged => workstations, personal computers, multiprocessors have been riding this wave since
Now, we have multicore processors
DRAM chip capacity
Year   Size
1980   64 Kb
1983   256 Kb
1986   1 Mb
1989   4 Mb
1992   16 Mb
1996   64 Mb
1999   256 Mb
2002   1 Gb
2007   2 Gb
2009   4 Gb
[Figure: Microprocessor Logic Density]
Compsci 001 4.11
Technology => dramatic change
Processor
  logic capacity: about 30% per year
  clock rate: about 20% per year
Memory
  DRAM capacity: about 60% per year (4x every 3 years)
  Memory speed: about 10% per year
  Cost per bit: improves about 25% per year
Disk
  capacity: about 60% per year
  Total use of data: 100% per 9 months!
Network Bandwidth
  Bandwidth increasing more than 100% per year!
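These yearly rates compound, which is how "60% per year" and "4x every 3 years" describe the same trend. A small sketch of the arithmetic follows; the function names are made up for this example, and the rates are the approximate figures from the slide.

```python
def compound(annual_rate, years):
    """Total growth factor after `years` at a fixed yearly rate."""
    return (1.0 + annual_rate) ** years


def annual_factor(growth_factor, period_months):
    """Convert 'grows by growth_factor every period_months' to a per-year factor."""
    return growth_factor ** (12.0 / period_months)


# DRAM capacity: about 60% per year, so roughly 4x after 3 years.
print(f"DRAM capacity over 3 years: {compound(0.60, 3):.2f}x")

# Memory speed at about 10% per year falls far behind over the same decade.
print(f"DRAM capacity over 10 years: {compound(0.60, 10):.0f}x")
print(f"Memory speed over 10 years:  {compound(0.10, 10):.1f}x")

# Data use doubling (100% growth) every 9 months, expressed per year.
print(f"Data use per year: {annual_factor(2.0, 9):.2f}x")
```

The gap between the capacity and speed lines is the interesting part: memories get much bigger far faster than they get quicker.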
Compsci 001 4.12
Performance Trends
Compsci 001 4.13
Processor Transistor Count (from http://en.wikipedia.org/wiki/Transistor_count)
Processor                  Transistor count   Date of introduction   Manufacturer
Intel 4004                 2,300              1971                   Intel
Intel 8008                 2,500              1972                   Intel
Intel 8080                 4,500              1974                   Intel
Intel 8088                 29,000             1978                   Intel
Intel 80286                134,000            1982                   Intel
Intel 80386                275,000            1985                   Intel
Intel 80486                1,200,000          1989                   Intel
Pentium                    3,100,000          1993                   Intel
AMD K5                     4,300,000          1996                   AMD
Pentium II                 7,500,000          1997                   Intel
AMD K6                     8,800,000          1997                   AMD
Pentium III                9,500,000          1999                   Intel
AMD K6-III                 21,300,000         1999                   AMD
AMD K7                     22,000,000         1999                   AMD
Pentium 4                  42,000,000         2000                   Intel
Itanium                    25,000,000         2001                   Intel
Barton                     54,300,000         2003                   AMD
AMD K8                     105,900,000        2003                   AMD
Itanium 2                  220,000,000        2003                   Intel
Itanium 2 with 9MB cache   592,000,000        2004                   Intel
Cell                       241,000,000        2006                   Sony/IBM/Toshiba
Core 2 Duo                 291,000,000        2006                   Intel
Core 2 Quad                582,000,000        2006                   Intel
Dual-Core Itanium 2        1,700,000,000      2006                   Intel
Quad-Core Itanium          2,000,000,000      200                    Intel
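One way to read the table is to ask how long it took the transistor count to double. The sketch below estimates that from two rows, the Intel 4004 (1971) and the Pentium 4 (2000), assuming steady exponential growth between them. The helper name is made up, and picking other rows gives somewhat different answers.

```python
import math


def doubling_time(count_start, year_start, count_end, year_end):
    """Average years per doubling between two data points."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings


# Rows taken from the table: Intel 4004 and Pentium 4.
years_per_doubling = doubling_time(2_300, 1971, 42_000_000, 2000)
print(f"about {years_per_doubling:.1f} years per doubling")
```

For these two rows the estimate comes out at roughly two years per doubling.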
Compsci 001 4.14
Processor-Memory Speed Gap
[Figure: Performance (log scale, 1 to 1000) vs. year, 1980 to 2000. CPU (µProc, “Moore’s Law”): 50%/yr. DRAM: 9%/yr. (2X/10 yrs). Processor-Memory Performance Gap: (grows 50% / year)]
Compsci 001 4.15
Latency vs. Throughput
Compsci 001 4.16
Memory bottleneck
CPU can execute dozens of instructions in the time it takes to retrieve one item from memory
Solution: Memory Hierarchy
  Use fast memory
  • Registers
  • Cache memory
  Rule: small memory is fast, large memory is slow
Compsci 001 4.17
A great idea in computer science
Temporal locality
  Programs tend to access data that has been accessed recently (i.e. close in time)
Spatial locality
  Programs tend to access data at an address near recently referenced data (i.e. close in space)
Useful in graphics and virtual reality as well
  Realistic images require significant computational power
  Don’t need to represent distant objects as well
Efficient distributed systems rely on locality
  Memory access time increases over a network
  Want to access data on local machine
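Locality is what makes a small, fast cache pay off. The sketch below simulates a tiny cache and counts how often an access is already cached (a hit). The cache design is an assumption chosen for brevity: 4 lines of 4 addresses each, with the least recently used line evicted. Real caches are organized differently, and the three access patterns are invented for illustration.

```python
from collections import OrderedDict


def hit_rate(addresses, cache_lines=4, line_size=4):
    """Fraction of accesses found in a small least-recently-used cache."""
    cache = OrderedDict()  # keys are line numbers, oldest first
    hits = 0
    for address in addresses:
        line = address // line_size  # neighbouring addresses share a line
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True             # load the whole line on a miss
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)


sequential = list(range(64))            # spatial locality: walk through an array
repeated = [0, 100, 200] * 20           # temporal locality: reuse the same few items
strided = [i * 64 for i in range(64)]   # neither: every access lands on a new line

print(f"sequential: {hit_rate(sequential):.2f}")
print(f"repeated:   {hit_rate(repeated):.2f}")
print(f"strided:    {hit_rate(strided):.2f}")
```

The sequential and repeated patterns hit most of the time, while the strided pattern never does, so it pays the slow-memory cost on every access.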
Compsci 001 4.18
Microprocessor Generations
First generation: 1971-78
  Behind the power curve (16-bit, <50k transistors)
Second Generation: 1979-85
  Becoming “real” computers (32-bit, >50k transistors)
Third Generation: 1985-89
  Challenging the “establishment” (Reduced Instruction Set Computer/RISC, >100k transistors)
Fourth Generation: 1990-
  Architectural and performance leadership (64-bit, >1M transistors, Intel/AMD translate into RISC internally)
Compsci 001 4.19
In the beginning (8-bit)
Intel 4004
  First general-purpose, single-chip microprocessor
  Shipped in 1971
  8-bit architecture, 4-bit implementation
  2,300 transistors
  Performance < 0.1 MIPS (Million Instructions Per Sec)
8008: 8-bit implementation in 1972
  3,500 transistors
  First microprocessor-based computer (Micral)
  • Targeted at laboratory instrumentation
  • Mostly sold in Europe
All chip photos in this talk courtesy of Michael W. Davidson and The Florida State University
Compsci 001 4.20
1st Generation (16-bit)
Intel 8086
  Introduced in 1978
  Performance < 0.5 MIPS
  New 16-bit architecture
  “Assembly language” compatible with 8080
  29,000 transistors
  Includes memory protection, support for Floating Point coprocessor
In 1981, IBM introduces PC
  Based on 8088, an 8-bit bus version of 8086
Compsci 001 4.21
2nd Generation (32-bit)
Motorola 68000
Major architectural step in microprocessors:
  First 32-bit architecture
  • initial 16-bit implementation
  First flat 32-bit address
  • Support for paging
  General-purpose register architecture
  • Loosely based on PDP-11 minicomputer
First implementation in 1979
  68,000 transistors
  < 1 MIPS (Million Instructions Per Second)
Used in
  Apple Mac
  Sun, Silicon Graphics, & Apollo workstations
Compsci 001 4.22
3rd Generation: MIPS R2000
Several firsts:
  First (commercial) RISC microprocessor
  First microprocessor to provide integrated support for instruction & data cache
  First pipelined microprocessor (sustains 1 instruction/clock)
Implemented in 1985
  125,000 transistors
  5-8 MIPS (Million Instructions per Second)
Compsci 001 4.23
4th Generation (64 bit)
MIPS R4000
  First 64-bit architecture
  Integrated caches
  • On-chip
  • Support for off-chip, secondary cache
  Integrated floating point
Implemented in 1991:
  Deep pipeline
  1.4M transistors
  Initially 100MHz
  > 50 MIPS
Intel translates 80x86/Pentium X instructions into RISC internally
Compsci 001 4.24
Key Architectural Trends
Increase performance at 1.6x per year (2X/1.5yr)
  True from 1985-present
Combination of technology and architectural enhancements
  Technology provides faster transistors (∝ 1/lithographic feature size) and more of them
  Faster transistors lead to higher clock rates
  More transistors (“Moore’s Law”):
  • Architectural ideas turn transistors into performance
    – Responsible for about half the yearly performance growth
Two key architectural directions
  Sophisticated memory hierarchies
  Exploiting instruction level parallelism
Compsci 001 4.25
Where have all the transistors gone?
Intel Pentium III (10M transistors)
Superscalar (multiple instructions per clock cycle)
• Branch prediction (predict outcome of decisions)
• 3 levels of cache
• Out-of-order execution (executing instructions in different order than programmer wrote them)
[Figure: die photo with labeled regions: Execution, Icache, Dcache, branch, TLB, 2 Bus Intf, Out-Of-Order, SS]
Compsci 001 4.26
Laws?
Define each of the following. What has its effect been on the advancement of computing technology?
Moore’s Law
Amdahl’s Law
Metcalfe’s Law
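As a starting point for Amdahl’s Law: if only part of a task can be sped up, the untouched part limits the overall gain. In its usual form, overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of run time that benefits and s is how much faster that fraction becomes. The sketch below evaluates it for illustrative numbers; the 80% figure is an assumption picked for the example.

```python
def amdahl_speedup(improved_fraction, improvement_factor):
    """Overall speedup when only part of the run time is made faster."""
    unimproved = 1.0 - improved_fraction
    return 1.0 / (unimproved + improved_fraction / improvement_factor)


# Assume 80% of the run time can be sped up.
for factor in (2, 4, 16, 1_000_000):
    print(f"{factor:>9}x on 80% of the work -> "
          f"{amdahl_speedup(0.8, factor):.2f}x overall")
```

However large the improvement factor gets, the overall speedup in this example stays below 5x, because the remaining 20% of the run time is unchanged.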