34
M. J. Flynn 1 HPEC ‘04 Banquet Talk: Area-Time-Power tradeoffs in computer design: the road ahead Michael J. Flynn

Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 1 HPEC ‘04

Banquet Talk: Area-Time-Power tradeoffs in computer design: the road ahead

Michael J. Flynn

Page 2: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

Report Documentation Page Form ApprovedOMB No. 0704-0188

Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering andmaintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information,including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, ArlingtonVA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if itdoes not display a currently valid OMB control number.

1. REPORT DATE 01 FEB 2005

2. REPORT TYPE N/A

3. DATES COVERED -

4. TITLE AND SUBTITLE Banquet Talk: Area-Time-Power tradeoffs in computer design: the road ahead

5a. CONTRACT NUMBER

5b. GRANT NUMBER

5c. PROGRAM ELEMENT NUMBER

6. AUTHOR(S) 5d. PROJECT NUMBER

5e. TASK NUMBER

5f. WORK UNIT NUMBER

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) Stanford University

8. PERFORMING ORGANIZATIONREPORT NUMBER

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S)

11. SPONSOR/MONITOR’S REPORT NUMBER(S)

12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release, distribution unlimited

13. SUPPLEMENTARY NOTES See also ADM00001742, HPEC-7 Volume 1, Proceedings of the Eighth Annual High PerformanceEmbedded Computing (HPEC) Workshops, 28-30 September 2004 Volume 1., The original documentcontains color images.

14. ABSTRACT

15. SUBJECT TERMS

16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT

UU

18. NUMBEROF PAGES

33

19a. NAME OFRESPONSIBLE PERSON

a. REPORT unclassified

b. ABSTRACT unclassified

c. THIS PAGE unclassified

Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18

Page 3: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 2 HPEC ‘04

The marketplace

Page 4: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 3 HPEC ‘04

The computer design market (in millions produced, not value)

REF:J. Hines, “2003 Semiconductor Manufacturing Market: Wafer Foundry.” Gartner Focus Report, August 2003.

Device Type 2001 2002 2003 2004 2005 2006Computational MPUs 147 151 168 187 202 218Embedded MPUs 159 152 161 167 177 1864- and 8-bit MCUs 3982 4180 4570 5090 5540 597016-bit MCUs 589 626 678 714 797 79732-bit MCUs 190 216 247 368 511 673DSP 484 526 569 638 694 691Total 5551 5851 6393 7164 7921 8535

Page 5: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 4 HPEC ‘04

The computer design market growth

MPU Marketplace Growth

0%2%4%6%8%

10%12%14%16%18%20%

Computational MPUs MPUs Overall SOCs

Ann

ual %

Gro

wth

B. Lewis, “Microprocessor Cores Shape Future SLI/SOC Market.” Gartner, July 2002

Page 6: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 5 HPEC ‘04

Moore’s Law

• Something doubles (or is announced to double) right before a stockholders meeting.

• So that it really doubles about every 2 years

Suggested by: Greg Astfalk lecture

Page 7: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 6 HPEC ‘04

Semiconductor Industry Roadmap

Year 2004 2007 2013 2016(f) Technology generation (nm) 90 65 32 22Wafer size (cm) 30 30 45 45(ρ) Defect density (per cm2) 0.14 0.14 0.14 0.14(Α) µP die size (cm2) 3.1 3.1 3.1 3.1!!! Chip Frequency (GHz) 4.2 9.3 23 39.6MTx per Chip (Microprocessor) 553 1204 4424 8848!!! MaxPwr(W) High Performance 158 189 251 288

Semiconductor Technology Roadmap

Page 8: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 7 HPEC ‘04

Time (Performance), Area and Power Tradeoffs

Page 9: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 8 HPEC ‘04

There’s a lot of silicon out there … and it’s cheap

www.newburycamraclub.org.uk

Page 10: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 9 HPEC ‘04

Cost of SiliconAverage $/ sq cm

02468

101214

2004 2005 2006 2007

Year

$/ s

q cm

90nm130nm180nm350nmold

REF:J. Hines, “2003 Semiconductor Manufacturing Market: Wafer Foundry.” Gartner Focus Report, August 2003.

Page 11: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 10 HPEC ‘04

The basics of wafer fab

• A 30 cm, state of the art (ρ = 0.2) wafer fabfacility might cost $3B and require $5B/year sales to be profitable…that’s at least 5M wafer starts and almost 5B 1cm die /per year. A die (f=90nm) has 100-200 M tx /cm2.

• Ultimately at O($2000) per wafer that’s $2/ cm2 or 100M tx. So how to use them?

Page 12: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 11 HPEC ‘04

Area and CostIs efficient use of die area important? Is processor cost important?NO, to a point − server processors (cost

dominated by memory, power/ cooling, etc.)

YES – everything in the middle.NO – very small, embedded die which are

package limited. (10-100Mtx/die)

Page 13: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 12 HPEC ‘04

But it takes a lot of effort to design a chip

world’s first CADsystem?

© Canadian Museum of

Civilization Corporation

Page 14: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 13 HPEC ‘04

Design time: CAD productivity limitations favor soft designs

1

Logi

c Tr

ansi

stor

s pe

r Chi

p(K

)

Prod

uctiv

ityTr

ans.

/Sta

ff - M

onth

10

100

1,000

10,000

100,000

1,000,000

10,000,000

10

100

1,000

10,000

100,000

1,000,000

10,000,000

100,000,000Logic Transistors/ChipTransistor/Staff Month

58%/Yr. compoundComplexity growth rate

21%/Yr. compoundProductivity growth rate

Source: S. Malik, orig. SEMATECH

1981

1983

1985

1987

1989

1991

1993

1995

1997

1999

2003

2001

2005

2007

2009

xx xx x

x

x

2.5µ

.10µ

.35µ

Page 15: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 14 HPEC ‘04

Time

www.snowder.com

Page 16: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 15 HPEC ‘04

High Speed Clocking

Fast clocks are not primarily the result of technology scaling, but rather of architecture/logic techniques:

– Smaller pipeline segments, less clock overhead

• Modern microprocessors are increasing clock speed more rapidly than anyone (SIA) predicted…fast clocks or hyperclocking (really short pipe segments)

• But fast clocks do not by themselves increase system performance

Page 17: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 16 HPEC ‘04

Change in pipe segment size

0

20

40

60

80

100

120

1996 1999 2003 2006

FO4

dela

ys p

er c

lock

FO4 gatedelays

M.S. Hrishikesh et al., “The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays.” 29th ISCA: 2002.

Page 18: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 17 HPEC ‘04

CMOS processor clock frequencyCMOS clock rates

1

10

100

1000

10000

73 83 93 03

Year

Freq

uenc

y, M

Hz

Frequency

MHz History Chart: see Mac Info Home Page .

Page 19: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 18 HPEC ‘04

Bipolar and CMOS clock frequency

1

10

100

1000

10000

67 73 83 93 '03

year

freq

uenc

y

BipolarCMOSLimit

Bipolar power limit

Page 20: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 19 HPEC ‘04

Bipolar cooling technology (ca ’91)

Hitachi M880: 500 MHz; one processor/module, 40 die sealed in helium then cooled by a water jacket.

Power consumed: about 800 watts per module

F. Kobayashi, et al . “Hardware technology for Hitachi M-880.” Proceedings Electronic Components and Tech Conf., 1991.

Page 21: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 20 HPEC ‘04

Translating time into performance: scaling the walls

• The memory “wall”: performance is limited by the predictability & supply of data from memory. This depends on the access time to memory and thus on wire delay which remains constant with scaling.

• But (so far) significant hardware support (area, fast dynamic logic) has enabled designers to manage cache misses, branches, etc reducing memory access.

• There’s also a frequency (minimum segment size) and related power “wall”. Here the questions is how to improve performance without changing the segment size or increasing the frequency.

Page 22: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 21 HPEC ‘04

Missing the memory wall

• Itanium processors now have 9MB of cache/die, moving to > 27 MB

• A typical processor (in 2004) occupies < 20% of die, moving to 5%.

• Limited memory BW, access time (cache miss now over 300~)

• Result: large cache and efforts to improve memory & bus

Cache size and performance for the Itanium processor family

0

10

20

30

Itanium2(1.0GHz)[43]

Itanium2(1.5GHz)[43]

Itanium2(1.7GHz)[43]

MontecitoItanium2[30]

processors

Cac

he s

ize

0

2

4

6

8

Rel

ativ

e pe

rfor

man

ce

On die cache size(MB)Relative performance

S. Rusu, et al, “Itanium 2 Processor 6M: Higher Frequency and Larger L3 Cache.” IEEE Micro, March/April 2004.

Page 23: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 22 HPEC ‘04

Power

• Old rough an’ ready, Zachary Taylor

Page 24: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 23 HPEC ‘04

Power: the real price of performance

While Vdd and C (capacity) decrease, frequency increases at an accelerating rate thereby increasing power density

As Vdd decreases so does Vth; this increases Ileakage and static power.Static power is now a big problem in high performance designs.

Net: while increasing frequency may or may not increaseperformance - it certainly does increase power

Page 25: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 24 HPEC ‘04

Power

• Cooled high power: >70 W/ die• High power: 10- 50 W/ die … plug in supply• Low power: 0.1- 2 W / die.. rechargeable battery• Very low power: 1- 100 mW /die .. AA size

batteries• Extremely low power: 1- 100 micro Watt/die and

below (nano Watts) .. button batteries• No power: extract from local EM field,

…. O (1µW/die)

Page 26: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 25 HPEC ‘04

Achieving low power systems

• By scaling alone a 1000x slower implementation may need only 10-9 as much power.

• Gating power to functional units and other techniques should enable 100MHz processors to operate at O(10-3) watts.

• Goal: O(10-6) watts…. Implies about 10 MHz

freq2

freq1

=P2

P1

3By Vdd and device scaling

Page 27: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 26 HPEC ‘04

Extremely Low Power Architecture (ELPA), O (µW), getting there….

1) Scaling alone lowers power by reducing parasitic wire and gate capacitance.

2) Lowering frequency lowers power by a cubic

3) Slower clocked processors are better matched to memoryreducing caches, etc

4) Asynchronous clocking and logic eliminate clock power and state transitions.

5) Circuits: adiabatic switching, adaptive body biasing, etc

6) Integrated power management

Page 28: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 27 HPEC ‘04

ELP Technology & Architecture

• ELPA Challenge: build a processor & memory that uses 10-6 the power of a high performance MPU and has 0.1 of the state of the art SPECmark (ok, how about .01?).

Page 29: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 28 HPEC ‘04

Study 1 StrongARM: 450 mW @ 160MHz

• Scale (@22nm) 450mW becomes 5 mW• Cubic rule: 5mW becomes 5µw@ 16MHz• Use 10x more area to recover performance• 10x becomes 3-4x memory match timing.• Asynchronous logic may give 5x in perf.• Use sub threshold (?) circuits; higher VT

• Power management

Page 30: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 29 HPEC ‘04

Study 2 the Ant

• 10 Hz, 0.1-1µW 250k neurons; 3D packaging, About ¼ mm3 or about 1 micron3 per neuron

• How many SPECmarks?

Page 31: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 30 HPEC ‘04

And while we’re at it, why not an EHP challenge?

• Much more work done in this area but limited applicability– Whole earth simulator (60TFlops)– Grape series (40TFlops)

• General solutions come slowly (like the mills of the Gods)– Streaming compilers – New mathematical models

Page 32: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 31 HPEC ‘04

And don’t forget reliability!

NASA Langley Research Center

Page 33: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 32 HPEC ‘04

Computational Integrity

Design for– Reliability– Testability– Serviceability– Process recoverability– Fail-safe computation

Page 34: Banquet Talk: Area-Time-Power tradeoffs in computer design ... · Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information

M. J. Flynn 33 HPEC ‘04

Summary• Embedded processors/SOC is the major growth

area with obvious challenges:

1) 10x speed without increasing power

2) 10-6 less power with the same speed

3) 100x circuit complexity with same design effort.

• Beyond this the real challenge is in system design: new system concepts, interconnect technologies, IP management and design tools.