
    www.compaq.com

    Design Challenges in Multi-GHz Microprocessors

    Bill Herrick

    Director, Alpha Microprocessor Development


    ASP DAC 2000

    Introduction

    Moore’s Law (the trend that the demand for IC functions and the capability of the semiconductor industry to meet that demand will double every 1.5 to 2 years) has worked well during the last 30 years

    Difficult challenges face the industry attempting to maintain the pace

    With collaboration, understanding, vision and innovation this trend can continue for high performance microprocessors


    Topics

    Historical Trends
      • Intel
      • Alpha chips and design style
      • Observations and trends

    Technology Predictions
      • ITRS 1999

    Key Design Challenges
      • Clocking and power - how Alpha has managed
      • Clocking and power - long term solutions


    Historical Trends: Then and Now

    Circa 1970
      • 12µ PMOS
      • 1000 transistors
      • 5 - 10 mm² die size
      • 10V supply
      • 50 - 100 kHz frequency
      • 100 - 200 mW
      • 16 pin DIPs

    Circa 2000
      • 0.18µ CMOS
      • 10 - 100 million transistors
      • 300 - 400 mm² die size
      • 2.5V supply
      • 500 - 1000 MHz frequency
      • 50 - 100 W
      • 500 - 1000 pin BGAs
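The 1970 and 2000 figures above can also be read as power density (power divided by die area), which bears on the power challenges listed in the topics. The following is a minimal sketch using only the ranges quoted on this slide. The function name `power_density_range` is illustrative, and bounding the range by pairing the lowest power with the largest die (and the highest power with the smallest die) is an assumption, not something the slide states.

```python
def power_density_range(power_w, area_mm2):
    """Return (min, max) power density in W/cm^2 from (low, high) ranges.

    power_w:  (low, high) power in watts
    area_mm2: (low, high) die area in square millimetres
    """
    p_lo, p_hi = power_w
    a_lo, a_hi = area_mm2
    mm2_per_cm2 = 100.0
    # Assumption: the lowest density is the least power over the largest die,
    # the highest density is the most power over the smallest die.
    return (p_lo / a_hi * mm2_per_cm2, p_hi / a_lo * mm2_per_cm2)


# Ranges quoted on the slide (100 - 200 mW expressed in watts).
circa_1970 = power_density_range((0.100, 0.200), (5.0, 10.0))
circa_2000 = power_density_range((50.0, 100.0), (300.0, 400.0))

print(f"circa 1970: {circa_1970[0]:.1f} to {circa_1970[1]:.1f} W/cm^2")
print(f"circa 2000: {circa_2000[0]:.1f} to {circa_2000[1]:.1f} W/cm^2")
```

Under these assumptions, power density rises by roughly an order of magnitude between the two eras even though supply voltage fell from 10V to 2.5V.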


    Intel Performance History

    [Figure: MIPS (log scale, 0.01 to 10000) versus date of introduction (1971 to 2000) for the 4004, 8080, 8086, 80286, 386, 486, Pentium, Pentium Pro, Pentium II, Xeon and Pentium III]


    Intel Trends

    The 4004 (1971)
      • 2300 transistors in a 10µ process
      • 108kHz operation, executing 0.06 MIPS

    Pentium III (1999)
      • 28 million transistors in a 0.18µ process
      • 733MHz operation, executing 2000 MIPS

    Over nearly 30 years
      • performance has increased 30,000x
      • transistor count has increased 10,000x
      • frequency has increased 7,000x
      • die size has increased only 25x
      • Moore’s law predicts 30,000x to 1,000,000x improvement over this period
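The ratios on this slide can be reproduced from the quoted 4004 and Pentium III figures, and the Moore's law range follows from doubling every 1.5 to 2 years. The sketch below is illustrative only: the names `growth_factor`, `i4004` and `pentium_iii` are hypothetical, and the 30-year span is an assumption taken from the slide's "nearly 30 years".

```python
def growth_factor(years, doubling_period_years):
    """Improvement factor after `years` if capability doubles every period."""
    return 2.0 ** (years / doubling_period_years)


# Figures quoted on the slide.
i4004 = {"transistors": 2300, "freq_hz": 108e3, "mips": 0.06}
pentium_iii = {"transistors": 28e6, "freq_hz": 733e6, "mips": 2000}

# Observed improvement between the two chips, per metric.
ratios = {key: pentium_iii[key] / i4004[key] for key in i4004}

# Moore's law over an assumed 30-year span.
moore_slow = growth_factor(30, 2.0)   # doubling every 2 years   -> 2**15
moore_fast = growth_factor(30, 1.5)   # doubling every 1.5 years -> 2**20

for key, value in ratios.items():
    print(f"{key}: {value:,.0f}x")
print(f"Moore's law: {moore_slow:,.0f}x to {moore_fast:,.0f}x")
```

The computed ratios (about 33,000x in MIPS, 12,000x in transistor count, 6,800x in frequency) round to the slide's figures, and 2^15 to 2^20 gives the quoted 30,000x to 1,000,000x range.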


    Alpha Architecture

    Alpha is a true 64-bit load/store RISC architecture

    Alpha is designed for high clock speed
      • Simple, fixed length (32-bit) instructions
      • Minimal instruction ordering constraints
      • No condition codes
      • No branch delay slots

    Chip micro-architecture is carefully chosen to maximize performance without impacting cycle time


    Alpha Performance History

    [Figure: SPECint95 (log scale, 1 to 100) versus date of introduction (1992 to 1999) for EV4-200, EV45-275, EV5-300, EV56-400, EV56-500, EV56-600, EV6-575 and EV67-700]


    EV4 Chip Overview

    • 0.75µm 3LM N-well CMOS, Leff=0.5µm, Tox=10.5nm

    • 3.3V Vdd

    • 200MHz @100°C & 3.3V

    • 16 gate delays per cycle

    • 30W @200MHz & 3.3V

    • 13.9mm x 16.8mm (233 mm²)

    • 1.7 Million Transistors
      ~ 0.85 Million Logic Transistors

    • 431 pin PGA (291 signals)


    EV4 Micro-Architecture

    Dual In-Order Instruction Issue
      • single-issue Integer & single-issue FP

    Fully Pipelined (except Integer MUL and FP DIV)
      • 7-stage Integer and 10-stage FP pipelines

    1-bit Branch Prediction: 2k-entry BHT

    8kB direct-mapped I-Cache and 8kB direct-mapped write-through D-Cache

    32 Integer and 32 FP Registers, 64b/entry

    Flexible external interface: shared 128b/64b data, 34b address L2 cache and system interface


    EV5 Chip Overview

    • 0.50µm 4LM N-well CMOS, Leff=0.365µm, Tox=9.0nm

    • 3.3V Vdd

    • 350MHz @100°C & 3.3V

    • 14 gate delays per cycle

    • 60W @350MHz & 3.3V

    • 16.5mm x 18.1mm (298 mm²)

    • 9.3 Million Transistors
      ~ 2.5 Million Logic Transistors

    • 499 pin PGA (294 signals)
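The power and frequency figures quoted for EV4 and EV5 can be turned into energy per clock cycle, and, with a further assumption, into an effective switched capacitance. The sketch below is a rough estimate rather than data from the slides: the function names are hypothetical, and the capacitance estimate assumes all power is dynamic switching power following P = C · Vdd² · f, ignoring leakage and short-circuit current.

```python
def energy_per_cycle_nj(power_w, freq_hz):
    """Energy dissipated per clock cycle, in nanojoules."""
    return power_w / freq_hz * 1e9


def effective_capacitance_nf(power_w, vdd_v, freq_hz):
    """Effective switched capacitance in nanofarads.

    Assumption: P = C * Vdd^2 * f, i.e. all power is dynamic switching power.
    """
    return power_w / (vdd_v ** 2 * freq_hz) * 1e9


# Figures quoted on the EV4 and EV5 chip overview slides.
ev4 = {"power_w": 30.0, "vdd_v": 3.3, "freq_hz": 200e6}
ev5 = {"power_w": 60.0, "vdd_v": 3.3, "freq_hz": 350e6}

for name, chip in (("EV4", ev4), ("EV5", ev5)):
    energy = energy_per_cycle_nj(chip["power_w"], chip["freq_hz"])
    cap = effective_capacitance_nf(**chip)
    print(f"{name}: {energy:.1f} nJ/cycle, ~{cap:.1f} nF effective capacitance")
```

Because both chips run at the same 3.3V supply, the doubling in power from EV4 to EV5 comes mostly from the 1.75x higher clock rate, with a smaller contribution from more capacitance switched per cycle.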


    EV5 Micro-Architecture

    Quad In-Order Instruction Issue
      • dual-issue Integer & dual-issue FP

    7-stage Integer and 9-stage FP pipelines
      • FP latencies reduced by 2 cycles

    2-bit Branch Prediction: 2k-entry BHT

    8kB I-Cache and 8kB write-through D-Cache

    96kB unified on-chip L2 Cache

    Improved external interface supports a non-blocking cache scheme
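The change from 1-bit prediction on EV4 to 2-bit prediction on EV5 can be illustrated with a generic textbook saturating-counter predictor. The sketch below models a single branch history table entry. It is not confirmed to match the actual EV4 or EV5 circuits, and the function name `mispredictions`, the initial counter value and the test pattern are all assumptions made for illustration.

```python
def mispredictions(outcomes, bits):
    """Count mispredictions of one saturating counter of the given width.

    outcomes: sequence of booleans (True = branch taken).
    bits:     counter width; 1 behaves as a last-outcome predictor,
              2 is the classic two-bit scheme.
    The counter starts at 0 (strongest not-taken state), an arbitrary choice.
    """
    max_count = (1 << bits) - 1
    threshold = 1 << (bits - 1)  # predict taken when counter >= threshold
    counter = 0
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= threshold
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at the ends.
        if taken:
            counter = min(counter + 1, max_count)
        else:
            counter = max(counter - 1, 0)
    return misses


# A loop-closing branch: taken 9 times, not taken at loop exit, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10

one_bit = mispredictions(loop_branch, bits=1)
two_bit = mispredictions(loop_branch, bits=2)
print(f"1-bit: {one_bit} mispredictions, 2-bit: {two_bit} mispredictions")
```

For this pattern the one-bit counter mispredicts twice per pass through the loop (at the exit and again on re-entry), while the two-bit counter, once warmed up, mispredicts only at the exit. A 2k-entry BHT would hold one such counter per entry.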


    EV6 Chip Overview

    • 0.35µm 6LM N-well CMOS, Leff=0.25µm, Tox=6.0nm

    • 2.2V Vdd

    • 575MHz @100°C & 2.2V

    • 12 gate delays per cycle

    • 90W @575MHz & 2.2V

    • 16.7mm x 18.8mm (314 mm²)

    • 15.2 Million Transistors
      ~ 6 Million Logic Transistors

    • 587 pin PGA (374 signals)
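The "gate delays per cycle" figures for EV4, EV5 and EV6 separate each frequency gain into two parts: faster gates (process) and fewer gates per cycle (design). A minimal sketch of that arithmetic, using only the frequencies and gate counts quoted on the chip overview slides, follows. The function names and the `chips` table are illustrative, and treating every gate delay as equal is a simplification.

```python
# Frequency and gate delays per cycle quoted on the chip overview slides.
chips = {
    "EV4": {"freq_hz": 200e6, "gates_per_cycle": 16},
    "EV5": {"freq_hz": 350e6, "gates_per_cycle": 14},
    "EV6": {"freq_hz": 575e6, "gates_per_cycle": 12},
}


def cycle_time_ps(freq_hz):
    """Clock period in picoseconds."""
    return 1e12 / freq_hz


def gate_delay_ps(freq_hz, gates_per_cycle):
    """Average delay of one gate, assuming all gate delays are equal."""
    return cycle_time_ps(freq_hz) / gates_per_cycle


for name, chip in chips.items():
    period = cycle_time_ps(chip["freq_hz"])
    delay = gate_delay_ps(chip["freq_hz"], chip["gates_per_cycle"])
    print(f"{name}: cycle {period:.0f} ps, about {delay:.1f} ps per gate delay")
```

On these numbers, the 2.9x frequency gain from EV4 to EV6 splits into roughly 2.2x from faster gates and 1.33x from reducing the logic depth from 16 to 12 gate delays per cycle.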


    EV6 Micro-Architecture

    Four-wide Instruction Fetch

    Tournament Branch Predictor

    Out-of-Order Execution Pipelines
      • Quad-speculative-issue integer pipeline
      • Dual-speculative-issue floating-point pipeline

    80 In-flight Instructions

    Registers: 80 Integer, 72 Floating Point

    Queue Entries: 20 Integer, 15 Floating Point

    2-Way 64KB L1 On-Chip Instruction and Data Caches

    Up to 16 outstanding off-chip memory references


    EV7 Chip Overview

    0.18µm CMOS technology

    1.5V Vdd

    Clock frequency >1.0GHz