Transcript
Page 1: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

1

www.compaq.com

Design Challenges in Design Challenges in MultiMulti --GHz GHz

MicroprocessorsMicroprocessors

Bill HerrickBill Herrick

Director, Alpha Microprocessor Director, Alpha Microprocessor DevelopmentDevelopment

2

ASP DAC 2000

IntroductionIntroduction

�� Moore’s Moore’s Law (Law (the trend that the demand for IC functions and the trend that the demand for IC functions and the capability of the semiconductor industry to meet that demandthe capability of the semiconductor industry to meet that demand, ,

will double every 1.5 to 2 yearswill double every 1.5 to 2 years) has worked well during ) has worked well during the last 30 yearsthe last 30 years

�� Difficult challenges face the industry attempting to Difficult challenges face the industry attempting to maintain the pacemaintain the pace

�� With collaboration, understanding, vision and With collaboration, understanding, vision and innovation this trend can continue for high innovation this trend can continue for high performance microprocessorsperformance microprocessors

Page 2: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

2

3

ASP DAC 2000

TopicsTopics�� Historical TrendsHistorical Trends

zz IntelIntel

zz Alpha chips and design styleAlpha chips and design style

zz Observations and trendsObservations and trends

�� Technology PredictionsTechnology Predictionszz ITRS 1999ITRS 1999

�� Key Design ChallengesKey Design ChallengesÎÎClocking and power Clocking and power -- how Alpha has managedhow Alpha has managed

ÎÎClocking and power Clocking and power -- long term solutionslong term solutions

4

ASP DAC 2000

Historical Trends: Then and NowHistorical Trends: Then and Now

Circa 1970Circa 1970�� 12µ PMOS12µ PMOS

�� 1000 transistors1000 transistors

�� 5 5 -- 10 mm10 mm22 die sizedie size

�� 10V supply10V supply

�� 50 50 -- 100 kHz frequency100 kHz frequency

�� 100 100 -- 200 200 mWmW

�� 16 pin 16 pin DIPsDIPs

Circa 2000Circa 2000�� 0.18µ CMOS0.18µ CMOS

�� 10 10 -- 100 million transistors100 million transistors

�� 300 300 -- 400 mm400 mm22 die sizedie size

�� 2.5V supply2.5V supply

�� 500 500 -- 1000 MHz frequency1000 MHz frequency

�� 50 50 -- 100 W100 W

�� 500 500 -- 1000 pin 1000 pin BGAsBGAs

Page 3: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

3

5

ASP DAC 2000

Intel Performance HistoryIntel Performance History

0.01

0.1

1

10

100

1000

10000

71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 00

Date o f Introduction

MIP

S

386

4004

8080

808680286

486

Pentium

Pentium ProPe ntium II

Xe onPe ntium III

6

ASP DAC 2000

Intel TrendsIntel Trends

�� The 4004 (1971)The 4004 (1971)zz 2300 transistors in a 10u process, 2300 transistors in a 10u process, zz 108kHz operation, executing 0.06108kHz operation, executing 0.06 MIPs MIPs

�� Pentium III (1999) Pentium III (1999) zz 28 million transistors in a 0.18u process, 28 million transistors in a 0.18u process, zz 733MHz operation, executes 2000 733MHz operation, executes 2000 MIPsMIPs

�� Over nearly 30 years Over nearly 30 years zz performance has increased 30,000x, performance has increased 30,000x, zz transistor count has increase 10,000x transistor count has increase 10,000x zz frequency has increased 7,000x frequency has increased 7,000x zz die size has increased only 25x. die size has increased only 25x. zz Moore’sMoore’s law predicts 30,000x to 1,000,000x law predicts 30,000x to 1,000,000x

improvement over this period.improvement over this period.

Page 4: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

4

7

ASP DAC 2000

Alpha ArchitectureAlpha Architecture

�� Alpha is a true 64Alpha is a true 64--bit load/store RISC architecturebit load/store RISC architecture

�� Alpha is designed for high clock speedAlpha is designed for high clock speedzz Simple, fixed length (32Simple, fixed length (32--bit) instructionsbit) instructions

zz Minimal instruction ordering constraintsMinimal instruction ordering constraints

zz No conditions codesNo conditions codeszz No branch delay slotsNo branch delay slots

�� Chip microChip micro--architecture is carefully chosen to architecture is carefully chosen to maximize performance without impacting cycle maximize performance without impacting cycle timetime

8

ASP DAC 2000

Alpha Performance HistoryAlpha Performance History

1

10

100

1992 1993 1994 1995 1996 1997 1998 1999

D a te o f In tro du ction

SP

EC

int9

5

E V45-275

E V5-300

E V56-500

E V6-575

E V67-700

E V4-200

EV56-600

EV56-400

Page 5: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

5

9

ASP DAC 2000

EV4 Chip OverviewEV4 Chip Overview•• 0.75µm 3LM N0.75µm 3LM N--well CMOS,well CMOS,

LLeffeff=0.5µm, =0.5µm, TToxox=10.5nm=10.5nm

•• 3.3V 3.3V VddVdd

•• 200MHz @100°C & 3.3V200MHz @100°C & 3.3V

•• 16 gate delays per cycle 16 gate delays per cycle

•• 30W @200MHz & 3.3V30W @200MHz & 3.3V

•• 13.9mm x 16.8mm (233 mm13.9mm x 16.8mm (233 mm22) )

•• 1.7 Million Transistors1.7 Million Transistors

~ 0.85 Million Logic ~ 0.85 Million Logic TransistorsTransistors

•• 431 pin PGA (291 signals)431 pin PGA (291 signals)

10

ASP DAC 2000

EV4 MicroEV4 Micro--ArchitectureArchitecture

�� Dual InDual In--Order Instruction Issue Order Instruction Issue zz singlesingle--issue Integer & singleissue Integer & single--issue FPissue FP

�� Fully Pipelined (except Integer MUL and FP DIV)Fully Pipelined (except Integer MUL and FP DIV)zz 77--stage Integer and 10stage Integer and 10--stage FP pipelinesstage FP pipelines

�� 11--bit Branch Prediction: 2kbit Branch Prediction: 2k--entry BHTentry BHT�� 8kB direct8kB direct--mapped Imapped I--Cache and 8kB directCache and 8kB direct--

mapped writemapped write--through Dthrough D--CacheCache�� 32 Integer and 32 FP Registers, 64b/entry32 Integer and 32 FP Registers, 64b/entry�� Flexible external interface: shared 128b/64b data, Flexible external interface: shared 128b/64b data,

34b address L2 cache and system interface34b address L2 cache and system interface

Page 6: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

6

11

ASP DAC 2000

EV5 Chip OverviewEV5 Chip Overview

•• 0.50µm 4LM N0.50µm 4LM N--well CMOS, well CMOS, LLeffeff=0.365µm, =0.365µm, TToxox=9.0nm=9.0nm

•• 3.3V 3.3V VddVdd

•• 350MHz @100°C & 3.3V350MHz @100°C & 3.3V

•• 14 gate delays per cycle 14 gate delays per cycle

•• 60W @350MHz & 3.3V60W @350MHz & 3.3V

•• 16.5mm x 18.1mm (298 mm16.5mm x 18.1mm (298 mm22) )

•• 9.3 Million Transistors9.3 Million Transistors

~ 2.5 Million Logic ~ 2.5 Million Logic TransistorsTransistors

•• 499 pin PGA (294 signals)499 pin PGA (294 signals)

12

ASP DAC 2000

EV5 MicroEV5 Micro--ArchitectureArchitecture

�� Quad InQuad In--Order Instruction IssueOrder Instruction Issuezz dualdual--issue Integer & dualissue Integer & dual--issue FPissue FP

�� 77--stage Integer and 9stage Integer and 9--stage FP pipelinesstage FP pipelineszz FP latencies reduced by 2 cyclesFP latencies reduced by 2 cycles

�� 22--bit Branch Prediction: 2kbit Branch Prediction: 2k--entry BHTentry BHT�� 8kB I8kB I--Cache and 8kB writeCache and 8kB write--through Dthrough D--CacheCache�� 96kB unified on96kB unified on--chip L2 Cachechip L2 Cache�� Improved external interface supports a nonImproved external interface supports a non--

blocking cache schemeblocking cache scheme

Page 7: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

7

13

ASP DAC 2000

EV6 Chip OverviewEV6 Chip Overview

• 0.35µm 6LM N6LM N--well well CMOS,LLeffeff=0.25µm,=0.25µm, TToxox=6.0nm=6.0nm

• 2.2V Vdd• 575MHz @100°C & 2.2V • 12 gate delays per cycle • 90W @575MHz & 2.2V• 16.7mm x 18.8mm (314 mm2) • 15.2 Million Transistors

~ 6 Million Logic Transistors•• 587 pin PGA (374 signals)587 pin PGA (374 signals)

14

ASP DAC 2000

EV6 MicroEV6 Micro--ArchitectureArchitecture

�� FourFour--wide Instruction Fetchwide Instruction Fetch

�� Tournament Branch PredictorTournament Branch Predictor

�� OutOut--ofof--Order Execution PipelinesOrder Execution Pipelineszz QuadQuad--speculativespeculative--issue integer pipelineissue integer pipelinezz DualDual--speculativespeculative--issue floatingissue floating--point pipelinepoint pipeline

�� 80 In80 In--flight Instructionsflight Instructions�� Registers: 80 Integer, 72 Floating PointRegisters: 80 Integer, 72 Floating Point�� Queue Entries: 20 Integer, 15 Floating PointQueue Entries: 20 Integer, 15 Floating Point�� 22--Way 64KB L1 OnWay 64KB L1 On--Chip Instruction and Data Chip Instruction and Data

CachesCaches�� Up to 16 outstanding offUp to 16 outstanding off--chip memory referenceschip memory references

Page 8: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

8

15

ASP DAC 2000

EV7 Chip OverviewEV7 Chip Overview

�� 0.18µm CMOS technology 0.18µm CMOS technology �� 1.5V 1.5V VddVdd�� Clock frequency >1.0GHzClock frequency >1.0GHz�� 100W100W�� ~350mm~350mm22

�� ~100 Million transistors~100 Million transistors�� EV6 coreEV6 core�� Integrated L2 Cache (1.75 MB 7Integrated L2 Cache (1.75 MB 7--way)way)�� Integrated memory controller (RAMBUS)Integrated memory controller (RAMBUS)�� Integrated network interfaceIntegrated network interface

16

ASP DAC 2000

EV8 Chip OverviewEV8 Chip Overview�� Clock frequency range 1.0Clock frequency range 1.0--2.0GHz2.0GHz

�� Leading edge 0.125µm CMOS technology Leading edge 0.125µm CMOS technology

�� ~1.2V~1.2V VddVdd

�� <150W<150W

�� ~250 Million transistors~250 Million transistors

�� Enhanced outEnhanced out--ofof--order executionorder execution

�� 88--widewide superscalarsuperscalar

�� 44--way simultaneous multiway simultaneous multi--threading (SMT)threading (SMT)

�� EV7 memory and system enhancementsEV7 memory and system enhancements

Page 9: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

9

17

ASP DAC 2000

Alpha Circuit Design PhilosophyAlpha Circuit Design Philosophy

�� Transistor level circuit designTransistor level circuit design�� Broad range of circuit styles and logic familiesBroad range of circuit styles and logic families

zz Complementary CMOSComplementary CMOSzz Dynamic logicDynamic logiczz DCVSL (DCVSL (cascodecascode))zz RatioedRatioed logiclogic

�� Key components to enable Key components to enable high performancehigh performancezz OnOn--chip clock generation and distribution (lowchip clock generation and distribution (low--

skew, fast edge)skew, fast edge)zz Latching (low latency)Latching (low latency)zz Low noise onLow noise on--chip power distributionchip power distributionzz OnOn--chip signal integrity managementchip signal integrity management

18

ASP DAC 2000

Complexity TrendsComplexity Trends

�� Process scaling has continued steadilyProcess scaling has continued steadily

�� Planarization Planarization has enabled an increase in has enabled an increase in the number of interconnect layersthe number of interconnect layers

�� Transistor counts have increased Transistor counts have increased dramatically with the L2 cache dramatically with the L2 cache SRAMsSRAMs

�� Additionally, design team size has Additionally, design team size has increased ~40% per generationincreased ~40% per generation

�� Opportunities to manage complexity and Opportunities to manage complexity and productivityproductivity

zz Fundamental understanding and modeling Fundamental understanding and modeling of process and circuit element behaviorsof process and circuit element behaviors

zz High level design methodsHigh level design methodszz CADCAD

zz Design reuseDesign reuse

zz MicroMicro--architecturearchitecture

Process Features

00.10.20.30.40.50.60.70.8

EV4 EV5 EV6 EV7 EV8

Dim

ensi

on (u

m)

0

2

4

6

8

10

Me

tal L

aye

rs

Chip Features

0

50

100

150

200

250

300

EV4 EV5 EV6 EV7 EV8

Tra

nsis

tors

(M)

050100150200250300350400450

Die

Siz

e (m

m2 )

Page 10: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

10

19

ASP DAC 2000

Power Dissipation TrendsPower Dissipation Trends

�� Power consumption is increasingPower consumption is increasingzz Better cooling technology neededBetter cooling technology needed

�� Supply current is increasing faster!Supply current is increasing faster!�� OnOn--chip signal integrity will be a major chip signal integrity will be a major

issueissue�� Power and current distribution are criticalPower and current distribution are critical�� Opportunities to slow power growthOpportunities to slow power growth

zz Accelerate Accelerate VddVdd scalingscalingzz /RZ � GLHOHFWULFV WKLQQHU �&X�/RZ � GLHOHFWULFV WKLQQHU �&X�

interconnectinterconnect

zz SOI circuit innovations SOI circuit innovations

zz Clock system designClock system designzz micromicro--architecturearchitecture

Power Dissipation

020406080

100120140160

EV4 EV5 EV6 EV7 EV8

Pow

er (W

)

0

0.51

1.5

22.5

3

3.5

Vo

ltag

e (

V)

Supply Current

0

20

40

60

80

100

120

140

EV4 EV5 EV6 EV7 EV8

Cu

rren

t (A

)

0

0.5

1

1.5

2

2.5

3

3.5

Vol

tage

(V)

20

ASP DAC 2000

Performance TrendsPerformance Trends

�� Performance has increased Performance has increased significantly (7x) faster than significantly (7x) faster than frequencyfrequency

�� Performance tracks transistor count Performance tracks transistor count when L2 cache ignoredwhen L2 cache ignored

zz Transistor budget has increased more Transistor budget has increased more than performance when L2 cache is than performance when L2 cache is considered butconsidered but

zz benchmarks did not reflect larger benchmarks did not reflect larger applicationsapplications

�� Opportunities to continue Opportunities to continue performance improvementsperformance improvements

zz Continued scaling of devices, interconnect Continued scaling of devices, interconnect and dielectricsand dielectrics

zz Clock distributionClock distribution

zz MicroMicro--architecturearchitecture

System designSystem design

Clock Speed

0

10

20

30

40

50

60

EV4 EV5 EV6 EV7 EV8

Re

lativ

e P

erfo

rma

nc

e

02004006008001000120014001600

Fre

que

ncy

(MH

z)

Transistor Count

0

10

20

30

40

50

60

70

EV4 EV5 EV6 EV7 EV8

Rel

ativ

e P

erfo

rman

ce

0

50

100

150

200

250

300

Tra

nsis

tors

(M)

Page 11: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

11

21

ASP DAC 2000

MicroMicro--Architecture TrendsArchitecture Trends

�� Trends have includedTrends have includedzz Wider superWider super--scalar scalar

machines, deep pipelinesmachines, deep pipelines

zz Larger register, L1 cachesLarger register, L1 caches

zz OnOn--chip L2 cacheschip L2 caches

zz Out of order executionOut of order execution

zz Sophisticated branch Sophisticated branch prediction, predication, prediction, predication, speculationspeculation

zz Integrated memory and Integrated memory and network controllersnetwork controllers

zz SMTSMT

zz Less idle logic but more Less idle logic but more bookkeeping logicbookkeeping logic

�� Future opportunities Future opportunities includeinclude

zz Floating point performance Floating point performance improvementsimprovements

zz VectorsVectors

zz ThreadThread--level speculationlevel speculation

zz More pipeliningMore pipelining

zz Better onBetter on--chip chip communicationscommunications

–– Banking, replicating Banking, replicating structuresstructures

–– Clustering functional Clustering functional unitsunits

zz OnOn--chip SMPchip SMP

22

ASP DAC 2000

Challenging Design TrendsChallenging Design Trends

�� MicroMicro--architecture and logic design are architecture and logic design are stressed as frequency has increased stressed as frequency has increased faster than scalingfaster than scaling

�� Further reducing the number of gate Further reducing the number of gate delays per cycle will be difficultdelays per cycle will be difficult

�� Cycles to communicate across chip track Cycles to communicate across chip track with frequencywith frequency

�� Clock edge rates are not scalingClock edge rates are not scaling

�� Opportunities to continue performance Opportunities to continue performance increasesincreases

zz Chip implementation designChip implementation design

zz Clock system designClock system design

zz MicroMicro--architecturearchitecture

Logic Levels per Cycle

0

5

10

15

20

EV4 EV5 EV6 EV7 EV8

Gat

e D

elay

s pe

r Cyc

le

02004006008001000120014001600

Fre

que

ncy

(MH

z)

Cycles Across Chip

012345678

EV4 EV5 EV6 EV7 EV8

Cyc

les

02004006008001000120014001600

Fre

quen

cy (M

Hz)

Page 12: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

12

23

ASP DAC 2000

ITRS ITRS --1999 Key Messages1999 Key Messages

�� No major issues through 130 nm generation, but No major issues through 130 nm generation, but significant issues for 100 nm generation (2005)significant issues for 100 nm generation (2005)

�� Continued technology scaling will require the Continued technology scaling will require the introduction of new process materials and new introduction of new process materials and new devicesdevices

�� Transistor densities will continue the historical Transistor densities will continue the historical trends ~2X / 2yrstrends ~2X / 2yrs

�� Clock frequency increases will slow compared to Clock frequency increases will slow compared to historical trends historical trends

24

ASP DAC 2000

ITRSITRS--1999 The Roadmap1999 The Roadmap

1999 2001 2003 2005 2008 2011 2014

Generation (nm) 180 130 130 100 70 50 35

Leff (nm) 140 100 80 65 45 31 21

Devices (M) 110 220 441 882 2494 7053 19949

Chip Size (mm 2) 450 450 567 622 713 817 937

Signals 768 1024 1024 1024 1280 1408 1472

Pins 1600 2007 2518 3158 4437 6234 8758

Page 13: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

13

25

ASP DAC 2000

ITRSITRS--1999 The Roadmap (continued)1999 The Roadmap (continued)

1999 2001 2003 2005 2008 2011 2014

Generation (nm) 180 130 130 100 70 50 35

Clock (MHz) 1200 1454 1724 2000 2500 3000 3600

Local Clk (MHz) 1250 1767 2490 3500 6000 10000 13500

IO (MHz) 480 722 932 1035 1285 1540 1800

Wiring Levels 7 7 8 9 9 10 10

Vdd (V) 1.8 1.5 1.5 1.2 0.9 0.6 0.6

Power (W) 90 115 140 160 170 174 183

26

ASP DAC 2000

ITRS ITRS -- 1999 Highlights1999 Highlights�� TransistorsTransistors

zz Drive currents will remain constant through 2014 at Drive currents will remain constant through 2014 at 750 µA/µm for 750 µA/µm for NFETs NFETs and 350 µA/µm for and 350 µA/µm for PFETsPFETs

zz Leakage currents will double every 3 years from 5 Leakage currents will double every 3 years from 5 nAnA/µm in 1999 to 160 /µm in 1999 to 160 nAnA/µm in 2014/µm in 2014

�� InterconnectInterconnectzz 8VH�RI�&X�DQG�ORZ���GLHOHFWULFV�ZLOO�EHFRPH��VWDQGDUG8VH�RI�&X�DQG�ORZ���GLHOHFWULFV�ZLOO�EHFRPH��VWDQGDUG

zz Local interconnect delays will scale with gate delaysLocal interconnect delays will scale with gate delays

zz Global interconnect delays, even with repeaters will not Global interconnect delays, even with repeaters will not scale with gate delaysscale with gate delays

zz Coplanar Coplanar waveguideswaveguides, free space RF and optical , free space RF and optical interconnect may be needed longer term interconnect may be needed longer term

Page 14: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

14

27

ASP DAC 2000

ITRS ITRS -- 1999 Highlights (2)1999 Highlights (2)�� PackagingPackaging

zz Maximum junction temperature must be reduced Maximum junction temperature must be reduced from 100from 100ooC to 85C to 85ooC by 2002 for reliability concernsC by 2002 for reliability concerns

zz SignificantSignificant θθjaja improvements will be required to improvements will be required to maintain air cooling system solutions: a 50% maintain air cooling system solutions: a 50% reduction by 2002 and another 30% by 2014reduction by 2002 and another 30% by 2014

�� Modeling & SimulationModeling & Simulationzz 2D and 3D interconnect models with inductance 2D and 3D interconnect models with inductance

and transmission line effects will be neededand transmission line effects will be neededzz Transistor models of nonTransistor models of non--quasiquasi--static effects and static effects and

quantum mechanical gate effects will be needed; quantum mechanical gate effects will be needed; gate currents will become importantgate currents will become important

zz OCV modeling will become necessary OCV modeling will become necessary zz CPU efficient and accurate models will be essentialCPU efficient and accurate models will be essential

28

ASP DAC 2000

ITRS ITRS -- 1999 Highlights (3)1999 Highlights (3)

�� Design ProductivityDesign Productivityzz Design team sizes will not exceed 300 peopleDesign team sizes will not exceed 300 people

zz Design cycle times will decrease from 36 months in Design cycle times will decrease from 36 months in 1999 to 30 months in 2005 to 24 months in 20141999 to 30 months in 2005 to 24 months in 2014

�� Verification and TestVerification and Testzz Verification has become more than 50% of the total Verification has become more than 50% of the total

design effortdesign effortzz Use of formal verification will increase from 15% Use of formal verification will increase from 15%

now to 30% in 2005 to 60% in 2014now to 30% in 2005 to 60% in 2014

zz BIST coverage will increase from 20% now to 40% BIST coverage will increase from 20% now to 40% in 2005 to 70% in 2014in 2005 to 70% in 2014

Page 15: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

15

29

ASP DAC 2000

�� 2 phase single wire clock, 2 phase single wire clock, distributed globallydistributed globallyzz Low skewLow skewzz Fast edge rateFast edge rate

�� 1 clock driver channel1 clock driver channelzz 3.5nF clock load3.5nF clock loadzz 35 cm final driver width35 cm final driver width

EV4 ClockingEV4 Clocking

trise = 0.5ns tskew = 0.3ns

tcycle= 6.0ns

Clock waveform

Location of clockdriver on die

30

ASP DAC 2000

EV4 LatchesEV4 Latches

First single wire clock First single wire clock implementationimplementationzz Race immune latchRace immune latchzz Level sensitive designLevel sensitive designzz 2 latches per cycle2 latches per cyclezz Can build logic into first stage Can build logic into first stage

of latchof latchzz ttcyclecycle latch overhead is latch overhead is

approximately 25%approximately 25%

D

CLKQ

CLK high loading latch

Page 16: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

16

31

ASP DAC 2000

EV4 Thermal GradientEV4 Thermal Gradient

Temp = 46C

Temp = 76C

EV4

32

ASP DAC 2000

EV5 ClockingEV5 Clocking

�� 2 phase single wire clock, 2 phase single wire clock, distributed globallydistributed globally

�� 2 distributed driver channels2 distributed driver channelszz Reduced RC delay/skewReduced RC delay/skewzz Improved thermal distributionImproved thermal distributionzz 3.75nF clock load3.75nF clock loadzz 58 cm final driver width58 cm final driver width

�� Local inverters for latchingLocal inverters for latching�� Conditional clocks in caches to Conditional clocks in caches to

reduce powerreduce power�� More complex race checkingMore complex race checking�� Device variationDevice variation

trise = 0.35ns tskew = 150ps

tcycle= 3.3ns

Clock waveform

Location of clockdriver on die

pre-driver

final drivers

Page 17: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

17

33

ASP DAC 2000

EV5 LatchesEV5 Latches

D

CLKQ

CLK_L

CLK high loading latch

Reduce tReduce tdqdq and reduce clock loadand reduce clock loadzz Local clock inverter Local clock inverter

complicated race issuescomplicated race issueszz Level sensitive designLevel sensitive designzz 2 latches per cycle2 latches per cyclezz Can build logic into first and Can build logic into first and

last stages of latchlast stages of latchzz ttcyclecycle latch overhead is latch overhead is

approximately 15%approximately 15%zz Smaller, faster and lower Smaller, faster and lower

power than EV4 latchpower than EV4 latch

34

ASP DAC 2000

EV5 Thermal GradientEV5 Thermal Gradient

Page 18: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

18

ASP DAC 2000

EV5 Global Clock SkewEV5 Global Clock Skew

ASP DAC 2000

EV5 Local Clock SkewEV5 Local Clock Skew

Page 19: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

19

37

ASP DAC 2000

�� 2 Phase, with multiple conditional 2 Phase, with multiple conditional buffered clocksbuffered clockszz 2.82.8 nFnF clock loadclock loadzz 40 cm final driver width40 cm final driver width

�� Local clocks can be gated “off” to Local clocks can be gated “off” to save powersave power

�� Reduced load/skewReduced load/skew�� Reduced thermal issuesReduced thermal issues�� Multiple clocks complicate race Multiple clocks complicate race

checkingchecking

trise = 0.35ns tskew = 50ps

tcycle= 1.67ns

EV6 ClockingEV6 Clocking

Global clock waveform

PLL

38

ASP DAC 2000

EV6 LatchesEV6 Latches

Conditional clocks to reduce Conditional clocks to reduce powerpowerzz Static designStatic designzz 1 latch per cycle1 latch per cyclezz Edge triggered to simplify Edge triggered to simplify

race rulesrace ruleszz Can build logic into latchCan build logic into latchzz ttcyclecycle latch overhead is latch overhead is

approximately 15%approximately 15%

D CLK

Q_H Q_L

Page 20: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

20

39

ASP DAC 2000

EV6 Clock ResultsEV6 Clock Results

GCLK Skew(at Vdd/2 Crossings)

ps5

101520253035404550

ps300305310315320325330335340345

GCLK Rise Times(20% to 80% Extrapolated to 0% to 100%)

40

ASP DAC 2000

EV7 Clock HierarchyEV7 Clock Hierarchy

GCLK(CPU Core)L

2L

_C

LK

(L2

Cac

he)

L2

R_

CL

K(L

2 C

ache

)

NCLK(Mem Ctr l )

DLL

PL

L

SYSCLK

DL

L

DL

L

+ widely dispersed drivers

+ DLLs compensate static and low-frequency variation

+ divides design and verification effort

- DLL design and verification is added work

+ tailored clocks

Active Skew Management and Multiple Clock Domains

Page 21: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

21

41

ASP DAC 2000

Power ConsumptionPower Consumption

�� Clocks consume the Clocks consume the largest fraction of powerlargest fraction of power

�� Driving interDriving inter--unit busses unit busses consumes as much power consumes as much power as intraas intra--unit gates and unit gates and interconnectinterconnect

ClocksCachesExecution Un itsContro lI/O D riv ers

40%20%

15%

15% 10%

42

ASP DAC 2000

EV4 EV4 -- 3 Metal Layers3 Metal Layers3rd “coarse and thick” metal layer added to the3rd “coarse and thick” metal layer added to the

technology for EV4 designtechnology for EV4 designPower supplied from two sides of the die via 3rd metal layerPower supplied from two sides of the die via 3rd metal layer

2nd metal layer used to form power grid2nd metal layer used to form power grid

90% of 3rd metal layer used for power/clock routing90% of 3rd metal layer used for power/clock routing

Metal 3

Metal 2

Metal 1

Page 22: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

22

43

ASP DAC 2000

EV5 EV5 -- 4 Metal Layers4 Metal Layers4th “coarse and thick” metal layer added to the4th “coarse and thick” metal layer added to the

technology for EV5 designtechnology for EV5 designPower supplied from four sides of the diePower supplied from four sides of the die

Grid strapping done all in coarse metalGrid strapping done all in coarse metal

90% of 3rd and 4th metals used for power/clock routing90% of 3rd and 4th metals used for power/clock routing

Metal 3

Metal 2

Metal 1

Metal 4

44

ASP DAC 2000

2 reference plane metal layers added to thetechnology for EV6 designSolid planes dedicated to Vdd/Vss

Significantly lowers resistance of gridLowers on-chip inductance

EV6 EV6 -- 6 Metal Layers6 Metal Layers

Metal 4

Metal 2Metal 1

RP2/Vdd

RP1/Vss

Metal 3

Page 23: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

23

ASP DAC 2000

Reference Plane ExampleReference Plane ExampleSimulation MethodologySimulation Methodology

Extract Inductance & Resistance versus FrequencyExtract Inductance & Resistance versus Frequency

Model Skin Effect Both Vertically and HorizontallyModel Skin Effect Both Vertically and Horizontally

Construct TimeConstruct Time--Domain SPICE Model and Simulate with SPICEDomain SPICE Model and Simulate with SPICE

Use FF Devices, HighUse FF Devices, High VddVdd & Low Temperature to Aggravate Inductive Effects& Low Temperature to Aggravate Inductive Effects

Substrate

RP2

Metal 4

Metal 3

RP1Metal 2Metal 1

Substrate

RP2

Metal 4

Metal 3

Metal 1Metal 2

⇐ Metal 3 Victims

ASP DAC 2000

Reference Plane Example (continued)Reference Plane Example (continued)

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

Time (ns)

M3 Victim

M3 Aggressors

M1 Aggressors

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2.0

Time (ns)

M3 Victim

M3 Aggressors

M1 Aggressors

RP2 Only RP1 & RP2

Page 24: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

24

47

ASP DAC 2000

DeDe--coupling Capacitor Ratioscoupling Capacitor Ratios

�� EV4EV4zz total effective switching capacitance = 12.5nFtotal effective switching capacitance = 12.5nFzz 128nF of de128nF of de--coupling capacitancecoupling capacitance

zz dede--coupling/switching capacitance ~ 10xcoupling/switching capacitance ~ 10x

�� EV5EV5zz 13.9nF of switching capacitance 13.9nF of switching capacitance

zz 160nF of de160nF of de--coupling capacitancecoupling capacitance

�� EV6EV6zz 34nF of effective switching capacitance34nF of effective switching capacitance

zz 320nF of de320nF of de--coupling capacitance coupling capacitance ---- not enough!not enough!

48

ASP DAC 2000

EV6 DeEV6 De--coupling Capacitancecoupling Capacitance

Design for Design for ∆∆IddIdd= 25 A @= 25 A @ VddVdd = 2.2 V, = 2.2 V, f f = 600 MHz= 600 MHzzz 0.320.32--µF of onµF of on--chip dechip de--coupling capacitance was coupling capacitance was

addedadded–– Under major busses and around majorUnder major busses and around major griddedgridded clock clock

driversdrivers

–– Occupies 15Occupies 15--20% of die area20% of die area

zz 11--µF 2µF 2--cmcm22 WirebondWirebond Attached Chip Capacitor Attached Chip Capacitor (WACC) significantly increases “Near(WACC) significantly increases “Near--Chip” deChip” de--couplingcoupling

–– 160 160 VddVdd//Vss bondwireVss bondwire pairs on the WACC minimize pairs on the WACC minimize inductanceinductance

Page 25: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

25

49

ASP DAC 2000

EV6 WACCEV6 WACC

587 IPGA

Micro p rocessorWACC

Heat Slu g

389 Signal - 198 VDD/VSS Pins389 Signal Bondwires

395 VDD/VSS Bondwires

320 VDD/VSS Bondwires

50

ASP DAC 2000

Clocking FuturesClocking Futures

�� Frequencies will continue to scaleFrequencies will continue to scale

�� Clock edge rates are not scaling as wellClock edge rates are not scaling as well

�� Multiple clock zones required Multiple clock zones required zz Architectures minimizing global communicationsArchitectures minimizing global communicationszz Adaptive and passive synchronization techniquesAdaptive and passive synchronization techniques

zz DLLs in the near termDLLs in the near term

zz Clocking schemes utilizing encoding, extraction, Clocking schemes utilizing encoding, extraction, multimulti--state and local phase optimization to state and local phase optimization to compensate for skew and latencycompensate for skew and latency

�� Asynchronous or quasiAsynchronous or quasi--synchronous architecturessynchronous architectures

Page 26: Design Challenges in Multi-GHz Microprocessorsbwrcs.eecs.berkeley.edu/Classes/CS252/Notes/alpha.pdf1 Design Challenges in Multi-GHz Microprocessors Bill Herrick Director, Alpha Microprocessor

26

51

ASP DAC 2000

Power FuturesPower Futures

�� Low power modesLow power modes

�� Power tradeoffs in the microPower tradeoffs in the micro--architecturearchitecture

�� More emphasis on a low power circuitsMore emphasis on a low power circuitszz Reducing clock loadReducing clock loadzz Low swing differential clocksLow swing differential clockszz Low swing busesLow swing buses

zz Adiabatic circuits, clocked drivers, retractile logicAdiabatic circuits, clocked drivers, retractile logiczz Asynchronous designAsynchronous design

�� Reference plans also help to minimize inductive Reference plans also help to minimize inductive and wave effectsand wave effects

52

ASP DAC 2000

ConclusionConclusion

�� Physical technology advances will enable multiPhysical technology advances will enable multi--GHz chipsGHz chips

�� Key challenges in power, clocking, complexity, Key challenges in power, clocking, complexity, and verification must be addressedand verification must be addressed

�� New tools and methods will be neededNew tools and methods will be needed

�� CAD developers and chip designers must CAD developers and chip designers must collaborate more closely than ever collaborate more closely than ever

�� With a solid understanding of the fundamentals, a With a solid understanding of the fundamentals, a clear vision of the product and ingenuity, we will clear vision of the product and ingenuity, we will realize multirealize multi--GHz microprocessorsGHz microprocessors


Recommended