27
1 © 2004 Babak Falsafi 18-200 Introduction to Computer Architecture at Carnegie Mellon Or What are we doing with all these transistors? Fall 2004 Prof. Babak Falsafi

calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

1© 2004 Babak Falsafi

18-200

Introduction toComputer Architecture at Carnegie MellonOr

What are we doing with all these transistors?

Fall 2004Prof. Babak Falsafi

Page 2: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

2© 2004 Babak Falsafi

18-240 Fundamentals of Computer Engineering18-447: Introduction to Computer Architecture18-741: Advanced Computer Architecture18-747: Advanced Topics in Microarchitecture

High-Perf. Memory SystemsHigh-Perf. Memory Systems

[email protected]

Associate ProfessorDepartment of ECE & CSCALCMCSSIhttp://www.ece.cmu.edu/~babak

Babak FalsafiBabak Falsafi Courses:Courses:

0.0625

10

0.25

6

80

0.01

0.1

1

10

100

1000

Clo

cks

per i

nstr

uctio

n

0.01

0.1

1

10

100

1000

Clo

cks

per D

RA

M a

cces

s

CPUMemory

Nanoscale CMOS SystemsNanoscale CMOS SystemsProcessor/memory performance gap will continue to increase!

VAX/1980 PPro/1996 2010+

Tempest Tempest (w/ Ailamaki)(w/ Ailamaki)

• Streaming memory engine

• Correlate temporal data & insts.

• Predict “when” to stream

• Bridge the CPU/memory perf. gapBenchmark (SPEC 2K) Execution Time

Intel 2.8 GHzPentium 4

Fastest simulator of Pentium 4

Hour

s

10

100

1000

10,000

1.49 hours

250 days

Full-workload simulation is not tractable!

SimFlexSimFlex (w/ Hoe & Nowatzyk)(w/ Hoe & Nowatzyk)

• Fast & Accurate Simulation

• Using Statistical sampling

• Reduce time/benchmark from days to secs.

CPU L1-IL1-D

L2Node Engine

L1-IL1-D

Home EngineParity

CPU L1-IL1-D

L2Node Engine NIU

MEMBRANE

L1-IL1-D

Home Engine

Mem Cntlr

NIU

I/OChDRAM

L1-IL1-D

Blade0

Blade1

Bladen-1

TRUSS TRUSS (w/ Hoe & Nowatzyk)(w/ Hoe & Nowatzyk)Failure trends: • High chip density -> more error-prone• High hard & transient failure rate• Downtime cost for brokerage operations:

$6,450,000/h [Patterson, ROC keynote]

TRUSS = “RAID” for Computing• Non-Stop Scalable Servers

Chiller Chiller (w/ Asheghi)(w/ Asheghi)

Dynamic HotSpot management Average heat much lower than peakBut, designs target worst-case heat ->

expensive + unreliable

Chiller processors:Identify HotSpots using nanosensorsChill on the fly

Page 3: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

3© 2004 Babak Falsafi

Computers are Digital SystemsComputer system“pizza box”

…implementedas a digital systemon a big circuit board

…implementedas digital functionson individual chips

…implemented as atomicsubsystems (e.g. CPU) on complex packages like “multichip modules”

Page 4: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

4© 2004 Babak Falsafi

What’s Inside a Computer

• Processor(s), Microprocessor(s), or CPU(s)

• Memory subsystem• I/O subsystem

network, disk drives, keyboard, mouse, etc.

CPU

Memory

I/O Video

Bus

CPU

…..

I/O

…..

..

Page 5: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

5© 2004 Babak Falsafi

What is Computer Architecture?

• The science and art of selecting and interconnecting hardware components to create a computer that meets functional, performance and cost goals.

Selection process evolves because• Technology changes

Enables new applications

Page 6: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

6© 2004 Babak Falsafi

Why Study Computer Architecture?

http://www.intel.com/research/silicon/mooreslaw.htm

Page 7: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

7© 2004 Babak Falsafi

Besides More Transistors….

Technology Annual Improvement Transistor speed 20%-25%Memory density 60%Memory speed 4%Disk density 25%Disk speed 4%

Other aspects also improve

Page 8: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

8© 2004 Babak Falsafi

Processor Performance“Unmatched by any other industry” [John Crawford, Intel Fellow, 1993]

Doubling every 18 months (1982-1996): total of 800X- Cars travel at 44,000 MPH; get 16,000 miles/gal.- Air travel: L.A. to N.Y. in 22 seconds (MACH 800)- Wheat yield: 80,000 bushels per acre

Doubling every 24 months (1970-1996): total of 9,000X- Cars travel at 600,000 MPH; get 150,000 miles/gal.- Air travel: L.A. to N.Y. in 2 seconds (MACH 9,000)- Wheat yield: 900,000 bushels per acre

Exponential effect

Page 9: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

9© 2004 Babak Falsafi

User Requirements Change Too

• Types of applications todayScientific

Weather prediction, crash analysis, earthquake analysis, medical imaging, imaging of the earth (searching for oil)

Businessdatabase, data mining, video

General purposeMicrosoft Word, Excel

Real-timeautomated control systems

GamesNintendo

Mobile

• Tomorrow: you can think of one before leaving class today…..

Page 10: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

10© 2004 Babak Falsafi

A Historical Perspective • In the beginning...Eniac

5,000 additions in one second

Page 11: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

11© 2004 Babak Falsafi

Eniac

Built at the University of PennsylvaniaLt Gillon, Eckert and Mauchley

Initial contract for $61,700, June 1943eventually cost $486,804.22, in 1946Accumulator deployed in Jun 1944Accumulator, multiplier, divide and square root and 3 portable function tables completed in Fall 1945

200 µsecond cycle time for 1 addInternals

19K vacuum tubes, 1.5K relays, 100K’s of resistors, and inductors; 30 separate units; forced air cooling; Multiply’s in base 10; just like a humanOriginally, no internal memory --> programmed w/cables and switches

Designed to compute firing tablesDifferential equations of motion to compute trajectory in 15 seconds (same amount of computation took a human 20 hours)

Power = 200K Watts

Page 12: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

12© 2004 Babak Falsafi

Four Decades of Microprocessor

The Decade of the 1970’s “Microprocessors”- Single-Chip Microprocessors- Personal Computers (PC)

The Decade of the 1980’s “Quantitative Architecture”- Instruction Pipelining- Fast Cache Memories- Workstations

The Decade of the 1990’s “Instruction-Level Parallelism”- Superscalar Processors- Low-Cost Desktop Supercomputing

The Decade of the 2000’s

You will learn in 18-347/18-741/18-742/18-747

Page 13: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

13© 2004 Babak Falsafi

Intel 4004, circa 1971

• The first single chip CPU4-bit processor for a calculator. 1K data memory 4K program memory 2,300 transistors16-pin DIP package740kHz (eight clock cycles per CPU cycle of 10.8 microseconds)~100K OPs per second

Molecular Expressions: Chipshots

Page 14: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

14© 2004 Babak Falsafi

• Performance leader in floating-point apps

64-bit processor3 MByte on-chip memory!!221 million transistor1 GHz, issue up to 8 instruction per cycle

In ~30 years, about 100,000 fold growth in transistor count and performance!

Intel Itanium 2, circa 2002

http://cpus.hp.com/images/die_photos/McKinley_die.jpg

Page 15: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

15© 2004 Babak Falsafi

Where Are Microprocessors?

• Right now…Supercomputers, workstations, PCsSet-Top boxesDVD players, CD playersCars (many per car)Modems, network cardsToasters, microwave ovens, fridges

• Soon…Your clothing, your glasses, your jewelry…Everywhere

Page 16: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

16© 2004 Babak Falsafi

“Non-computer” ICs Have CPUs Too

Memories,ROMS &RAMS

Analogsensors,

signalprocessing

Controllogic CPU

Page 17: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

17© 2004 Babak Falsafi

Computer Architecture @ CMU: Curriculum

18-747

18-447

Introduction to Architecture

18-741Advanced Architecture

18-742

Multiprocessors

18-744

MicroarchitectureHardware Engineering

18-743

Energy-AwareSystems

15-213 18-240Pre-reqs

Page 18: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

18© 2004 Babak Falsafi

Computer Architecture Lab (CALCM)

• Pronounced “calcium”• Architecture students & faculty • “At Carnegie Mellon” means

ECE + CS + …• 10 faculty• Lots of very stellar students

(over 30 at last count) • Over a dozen research projects• Seminar on Tuesdays @ 4pm

http://www.ece.cmu.edu/CALCM

Page 19: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

19© 2004 Babak Falsafi

What Are Hot Topics in Architecture? Topic #1: Parallelism

Want to make programs go fastExecute program in parallel• Pipeline execution • Superscalar multiple

pipelines• Like mult. lunch buffetsResearch:• Independent exec?• Program control flow?

Page 20: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

20© 2004 Babak Falsafi

Topic #2: The Memory Wall

Processor/memory performance gap will continue to increase!

0.0625

10

0.25

6

80

0.01

0.1

1

10

100

1000

Clo

cks

per i

nstr

uctio

n

0.01

0.1

1

10

100

1000

Clo

cks

per D

RA

M a

cces

s

CPUMemory

VAX/1980 PPro/1996 2010+

Page 21: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

21© 2004 Babak Falsafi

100G

Today’s Solution• Hierarchy of memory levels• Trade off capacity for speed• Like fast short term memory• When looking, CPU must wait

Recent research indicates• Server running top 4 DB systems• Half of the time, CPU is idle

waiting for memory

MemoryMemory

CCPPUU

L3 32M

L2 1M

1000

clk

L1 32K

1 cl

k10

clk

100

clk

Page 22: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

22© 2004 Babak Falsafi

Bigger Problem in MultiprocessorsTypical platforms:1. Chips with multiple CPUs2. Servers with multiple chips3. Memory shared across

Memory access:• Traverse multiple

hierarchies

Current research:• Track repetitive patterns• Replay ahead of time to

overlap latency

PP

PP

Mem

Multiprocessor Server

PP

PPP

PP

P

MemMem

Page 23: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

23© 2004 Babak Falsafi

Topic #3: Slow Design Evaluation Tools

> 7 trillion inst.in SPEC CPU benchmarks(www.spec.org)

Full-benchmark simulation is not tractable

Benchmark Execution Time

Intel 2.8 GHzPentium 4

Fastest P4 Simulator

Hou

rs

10

100

1000

10,000

1.49 hours

250 days

Current research:

• Statistical sampling

• Accurate & fast measurement

Page 24: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

24© 2004 Babak Falsafi

Topic #4: Future CMOS Digital Systems

• In 2010, we will have chips with billions and billions of tiny but lousy transistors

“Leak” power need coolingLose information need protectionHighly defective cost a lotComplex designs need verification

• Research:Circumvent trans. limitations

Page 25: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

25© 2004 Babak Falsafi

Reliability in Future CMOS Systems

• Highly-integrated chipsmore error-prone

• High rate of failure in future hard failure or bit flips due to cosmic rays

• High complexity designs

Reliability/availability is key in many important enterprise applications

Downtime cost for brokerage operations: $6,450,000/h [Patterson, ROC keynote]

Current research:• Make systems “bullet-proof” for critical apps

Page 26: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

26© 2004 Babak Falsafi

Power in Future CMOS Systems

Power consumption is hitting the roof• Chip power soon approaching 1KW• Buildings → Hundreds of multi-KW desktops/servers• Battery is classic problem example• You also run out of electricity → e.g., California

But, heat density prohibitive in high-perf. systems!• Heat adversely affects reliability (remember AMD?)• Some rack-mounted blades are already heat-limited

Current research:• Reduced energy/heat with minimal perf. impact

Page 27: calcm - Carnegie Mellon Universityusers.ece.cmu.edu/~jzhu/class/18200/F04/Lecture09... · 18-741: Advanced Computer Architecture 18-747: Advanced Topics in Microarchitecture High-Perf

27© 2004 Babak Falsafi

Topic #5: Beyond CMOS

Molecular technology:• Carbon nanotubes• Act as both wires and switches• Trillions, trillions of “transistors”

Quantum technology:• At its very infancy