ITRS-2001 Grenoble Meeting April 25, 2001 U.S. Design TWG


4/26 Agenda and Issues

• ASIC-LP SOC (product class driven by cost and power)
– SOC = system driver class, is roadmapped in the System Drivers Chapter, would like to keep correlation
• Participation of other TWGs in the System Drivers Chapter
• Design ITWG: (1) ORTC Lines (freq, density, power, chip size / logic-memory tx counts, pkg pins/balls); (2) System Drivers Chapter (MPU, SOC, AMS/RF, DRAM); (3) STRJ models; (4) AMS/RF models

SYSTEM DRIVERS Chapter

• Defines segments of silicon market that drive process and design technology
• Along with ORTCs, serves as “glue” for ITRS
• 4 Drivers: SOC (Japan), MPU (USA), DRAM (Korea), M/S (Europe)
– SOC: driven by cost, power, integration
– SOC: same as “ASIC-LP”, drives device requirements, packaging IO counts
– M/S: driven by applications in networking/telecomm
• Each section:
– Formal definition of this driver
– Nature, market, past/present/future
– What market forces apply to this driver?
– For what factors (process, device, design technology) is this a driver?
– Key figures of merit, and futures
• Participation of other ITWGs

DESIGN Chapter
• Context
– Scope of Design Technology
– High-level summary of complexities (at level of “issues”)
– Cost, productivity, quality, and other metrics of Design Technology
• Overview of Needs
– Driver classes and associated emphases {SOC, MPU, DRAM, MS}
– Resulting needs (e.g., power, …, cost-driven design)
• Summary of Difficult Challenges
• Detailed Statements of Needs, Potential Solutions
– System-Level, Circuit, Logic/Physical, Verification, Test

4/27 DESIGN Presentation Outline

• Mixed-Signal Roadmap summary slide
• Low-Power Scenario slide
• Clock Frequency Model slide
• SRAM Density model slide
• Decreasing Memory Content slide
• Design Cost (and Quality) Requirement slide
• System Drivers Chapter slide
• Design Chapter slide

Outline

• MPU diminishing returns
• New MPU clock frequency model
• MPU futures and ASIC/MPU/SOC convergence
• New logic (ASIC, MPU) and SRAM density models
• Required logic decrease due to power constraint
• Design cost / design quality requirement, gap analysis
• Summary of changes and errata (ORTCs, other TWGs)

MPU Diminishing Returns
• Pollack’s Rule
– In a given process technology, new uArch takes 2-3x area of old (last generation) uArch, and provides only 40% more performance (see Slide)
– Slide: process generations (x-axis) versus (1) ratio of Area of New/Old uArch, (2) ratio of Performance of New/Old (approaching 1)
– Slides: SPECint, SPECfp per MHz, SPECint per Watt all decreasing rapidly
• Power knob running out
– Speed == Power
– 10W/cm2 limit for convection cooling, 50W/cm2 limit for forced-air cooling
– Large currents, large power surges on wakeup
– Cf. 140A supply current, 150W total power at 1.2V Vdd for EV8 (Compaq)
• Speed knob running out
– Historically, 2x clock frequency every process generation
  • 1.4x from device scaling (running into t_ox, other limits?)
  • 1.4x from fewer logic stages (from 40-100 down to around 14 FO4 INV delays)
– Clocks cannot be generated with period < 6-8 FO4 INV delays
– Pipelining overhead (1-1.5 FO4 INV delay for pulse-mode latch, 2-3 for FF)
– Around 14 FO4 INV delays is limit for clock period (L1 $ access, 64b add)
• Unrealistic to continue 2x frequency trend in ITRS

Performance Efficiency of Microarchitectures – Pollack’s Rule

[Figure: Growth (X), scale 0-4, versus Technology Generation (1.5, 1, 0.7, 0.5, 0.35, 0.18). Series: Area (Lead / Compaction); Performance (Lead / Compaction). Note: Performance measured using SpecINT and SpecFP.]

Implications (in the same technology)
• New microarchitecture ~2-3X die area of the last microarchitecture
• Provides 1.4-1.7X performance of the last microarchitecture

We are on the Wrong Side of a Square Law

Intel: Gelsinger talk ISSCC-2001
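The square-law remark can be illustrated with a minimal sketch. It assumes the usual reading of Pollack's Rule, that performance in a fixed technology grows roughly as the square root of microarchitecture area; the function names are illustrative and do not come from the slides.

```python
import math


def pollack_performance_gain(area_ratio: float) -> float:
    """Performance ratio (new/old) under an assumed square-root law."""
    if area_ratio <= 0:
        raise ValueError("area_ratio must be positive")
    return math.sqrt(area_ratio)


def performance_per_area(area_ratio: float) -> float:
    """Performance per unit area of the new design relative to the old one."""
    return pollack_performance_gain(area_ratio) / area_ratio


if __name__ == "__main__":
    # Area ratios taken from the slide: new uArch is ~2-3X the old area.
    for area_ratio in (2.0, 3.0):
        gain = pollack_performance_gain(area_ratio)
        efficiency = performance_per_area(area_ratio)
        print(f"area x{area_ratio:.0f}: performance x{gain:.2f}, "
              f"performance/area x{efficiency:.2f}")
```

Under this assumption, 2-3X area yields roughly 1.4-1.7X performance, which is in line with the range quoted on the slide, while performance per unit area falls below 1.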

[Figure: Decreasing SPECint/MHz. SPECint95 SPEC ratio/MHz (0 to 0.09) versus Clock Speed (MHz), 0 to 1000; linear fit y = -5E-05x + 0.0989.]

[Figure: Decreasing SPECfp/MHz. SPECfp95 SPEC ratio/MHz (0 to 0.4) versus Clock Speed (MHz), 0 to 1000; linear fit y = -0.0005x + 0.5392.]

[Figure: Decreasing SPECfp/Watt. SPECfp Per Watt (0 to 3.5) versus Date of Data, May-1996 to Apr-2001.]

Addendum: SPEC Company List (www.specbench.org)

• Advanced Micro Devices
• Alpha Processor
• BULL S.A.
• Compaq Computer
• Data General Corp.
• Dell Computer
• Digital Equipment
• Fujitsu
• Gateway 2000
• HAL Computer Systems
• Hewlett-Packard
• Hitachi Ltd.
• IBM
• Intel
• Intergraph Corp.
• KryoTech
• Motorola
• Pyramid Technology
• ROSS Technology
• SGI
• Siemens
• Sun Microsystems
• Tandem Computers
• UNISYS Corp.

[Figure: MPU Clock Frequency Trend. Log scale 10 to 1000 versus date, Dec-83 to Dec-98. Series: 80386, 80486, Pentium, Pentium II, exponential fit. Source: Intel: Borkar/Parkhurst.]

[Figure: MPU Clock Cycle Trend (FO4 Delays). Log scale 10 to 100 versus date, Dec-83 to Dec-98. Series: 80386, 80486, Pentium, Pentium II, exponential fit. Source: Intel: Borkar/Parkhurst.]


New MPU Clock Model
• Global clock: flat at 14 FO4 INV delays
– FO4 INV delay = delay of an inverter driving a load equal to 4 times its input capacitance
– no local interconnect: negligible, scales with device performance
– no (buffered) global interconnect: (1) was unrealistically fast in Fisher98 (ITRS99) model, and (2) global interconnects are pipelined (clock frequency is set by time needed to complete local computation loops, not time for global communication - cf. Pentium-4 and Alpha-21264)
• Local clock: flat at 6 FO4 INV delays
– somewhat meaningless: only for ser-par conversion, small iterative structures, “marketing interpretation” of phase-pipelining
– reasonable alternative: delete from Roadmap
• ASIC/SOC: flat at 40-50 FO4 INV delays
– absence of interconnect component justified by same pipelining argument, and by convergence of ASIC / structured-custom design methodologies, tool sets
– higher ASIC/SOC frequencies possible, but represent tradeoffs with design cost, power, other figures of merit
– information content is nil; reasonable to delete from Roadmap
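A fixed-FO4 clock model reduces to one line of arithmetic once an FO4 delay is chosen. The sketch below assumes the widely used rule of thumb of about 360 ps of FO4 delay per micron of gate length, and it uses the technology node as a stand-in for gate length; neither assumption is stated in the slides, and the function names are illustrative.

```python
# Assumption (not from the slides): FO4 delay ~ 360 ps per um of gate length.
FO4_PS_PER_UM = 360.0


def fo4_delay_ps(gate_length_um: float) -> float:
    """Approximate FO4 inverter delay in picoseconds."""
    return FO4_PS_PER_UM * gate_length_um


def clock_frequency_ghz(fo4_per_cycle: float, gate_length_um: float) -> float:
    """Clock frequency when the cycle time is a fixed number of FO4 delays."""
    period_ps = fo4_per_cycle * fo4_delay_ps(gate_length_um)
    return 1000.0 / period_ps  # 1000 ps per ns, so 1000/ps gives GHz


if __name__ == "__main__":
    # FO4 counts from the slide: global clock 14, local clock 6, ASIC/SOC ~45.
    for label, fo4 in (("global", 14), ("local", 6), ("ASIC/SOC", 45)):
        freq = clock_frequency_ghz(fo4, 0.18)
        print(f"{label:9s} clock at 0.18 um: {freq:.2f} GHz")
```

Because the FO4 count is held flat, frequency in this model scales only linearly with the inverse of gate length, rather than doubling every generation.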


MPU Futures (1)
• Drivers: power, I/O bandwidth, yield, ...
• Multiple small cores per die, more memory hierarchy on board
– core can be reused across multiple apps/configs
– replication redundancy, power savings (lower freq, Vdd while maintaining throughput); better use of area than memory; avoid overhead of time-multiplexing
– IBM Power4 (2 CPU + L2); IBM S390 (14 MPU, 16MB L2 (8 chips) on 1 MCM (31 chips, 1000W, 1.4B xtors, 4224 pins))
– Processor-in-Memory (PIM): O(10M) xtors logic per core
– 0.5Gb eDRAM L3 by 2005
– high memory content gives better control of leakage, total chip power
• I/O bandwidth major differentiator
– double-clocking, phase-pipelining in par/ser data conversion hits 6 FO4 limit
– I/O count may stay same or decrease due to integration
– roughly constant die size (200-350 mm2) also limits I/O count
• Evolutionary uArch changes
– superpipelining (for freq), superscalar (beyond 4-way) running out of steam
– more multithreading support for parallel processing
– more complex hardwired functions (networking, graphics, communications, ...) (megatrend: shift of flexibility-efficiency tradeoff point away from GPP)

MPU Futures (2)
• Circuit design
– ECC for SEU
– pass gates on the way out due to low Vt
– more redundancy to compensate for yield loss
– density models are impacted
• Clocking and power (let’s be reasonable about “needs”!)
– 1V supplies, 10-50W total power both flat
– SOI (5% or 25%), multi-Vth (10%), multi-Vdd (30-50%), min-energy sizing under throughput constraints (25%), parallelism … (synergy not guaranteed)
– multiple clock domains, grids; more gating/scheduling
– adaptive voltage and frequency scaling
– frequency: +1 GHz/year ... BUT: marketing focus shifts to system throughput
• Bifurcation of MPU requirements via “centralized processing”?
– smart interface remedial processing (SIRP): basic computing and power efficiency, SOC integration of RF, M/S, digital (wireless mobile multimedia)
– centralized computing server: high-performance computing (traditional MPU)
• The preceding gives example content for definition of MPU (high-volume custom) in System Drivers Chapter

ASIC-SOC-MPU Convergence
• Custom vs. ASIC headroom diminishing
– density of custom == 1.25x ASIC (logic, memory)
– “custom quality on ASIC schedule” achieved by on-the-fly, tuning, liquid etc. cell-based methodologies (cf. IBM, Motorola)
– convergence of ASIC, structured-custom methodologies (accelerated by COT model, tool limitations) to “hierarchical ASIC/SOC”
• ASIC-SOC convergence
– ASIC = business model
– SOC = product class (like MPU, DRAM), driven by cost and integration
– ASICs are rapidly becoming indistinguishable from SOCs in terms of content, design methodology
• MPU-SOC convergence
– MPUs evolving into SOCs in two ways
– MPUs designed as cores to be included in SOCs
– MPUs themselves designed as SOCs to improve reuse
– (recall also SIRP = SOC integration)
• Thus, four System Driver Classes: MPU (high-volume custom), SOC, DRAM, AMS/RF


ASIC Logic Density Model
• Average size of gate (4t) = 32 MP^2 = 320 F^2
• MP is contacted lower-level metal pitch
– sets size of a standard cell (e.g., 7-track, 9-track, etc.)
– ITRS Interconnect chapter: MP ~ 3.1-3.2 * F => 1 MP^2 = 10 F^2 (consistent throughout technologies)
• 32 comes from:
– 8 tracks (expected height for dense std-cell library) by 4 tracks (avg width of 2-input NAND gate)
– close match with claimed gate densities (published and unpublished data), e.g., 100K gates/mm2 at 0.18um
• Overhead/white space factor = 0.5
– effective gate size = 64 MP^2
– logic density = 19.3Mt/cm2 at 180nm (compare to 20Mt/cm2 in ITRS2000, total density)
• Scales quadratically
– e.g., density 1.39Bt/cm2 at 30nm will be 36X that at 180nm (compare with current ITRS)
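The density arithmetic on this slide can be reproduced with a short sketch. The function name and the `utilization` parameter (standing in for the overhead/white space factor) are illustrative; the constants 320 F^2, 0.5 and 4 transistors per gate are taken from the slide.

```python
UM2_PER_CM2 = 1e8  # 1 cm^2 = 1e8 um^2


def asic_logic_density_mt_per_cm2(
    f_nm: float,
    gate_area_f2: float = 320.0,   # 32 MP^2 with 1 MP^2 = 10 F^2
    utilization: float = 0.5,      # overhead / white space factor
    transistors_per_gate: int = 4,
) -> float:
    """Logic density in millions of transistors per cm^2."""
    f_um = f_nm / 1000.0
    effective_gate_area_um2 = (gate_area_f2 / utilization) * f_um ** 2
    gates_per_cm2 = UM2_PER_CM2 / effective_gate_area_um2
    return gates_per_cm2 * transistors_per_gate / 1e6


if __name__ == "__main__":
    d180 = asic_logic_density_mt_per_cm2(180)
    d30 = asic_logic_density_mt_per_cm2(30)
    print(f"180 nm: {d180:.2f} Mt/cm2")
    print(f"30 nm / 180 nm ratio: {d30 / d180:.1f}X")
    # Same model without the white-space factor, for comparison.
    print(f"30 nm, utilization 1.0: "
          f"{asic_logic_density_mt_per_cm2(30, utilization=1.0):.0f} Mt/cm2")
```

With these inputs the model gives about 19.3 Mt/cm2 at 180nm and a 36X ratio between 30nm and 180nm. The 1.39Bt/cm2 figure quoted for 30nm appears to correspond to the same model evaluated without the white-space factor, though the slide does not say so explicitly.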

MPU Logic Density Model

• Custom logic density == 1.25X ASIC logic density
• Example: MPU logic density 24.13Mt/cm2 at 180nm (equal to 60K gates/mm2)
• Suggest breaking out logic and SRAM density separately for MPU, rather than lumping together
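The conversion between the two density units used on this slide is sketched below, assuming 4 transistors per gate as in the ASIC model; the helper names are illustrative.

```python
def mpu_logic_density_mt_per_cm2(asic_mt_per_cm2: float,
                                 custom_factor: float = 1.25) -> float:
    """Custom (MPU) logic density as a multiple of ASIC logic density."""
    return asic_mt_per_cm2 * custom_factor


def mt_per_cm2_to_kgates_per_mm2(mt_per_cm2: float,
                                 transistors_per_gate: int = 4) -> float:
    """Convert Mtransistors/cm^2 to thousands of gates per mm^2."""
    gates_per_cm2 = mt_per_cm2 * 1e6 / transistors_per_gate
    gates_per_mm2 = gates_per_cm2 / 100.0  # 100 mm^2 per cm^2
    return gates_per_mm2 / 1e3


if __name__ == "__main__":
    mpu = mpu_logic_density_mt_per_cm2(19.3)
    print(f"MPU logic density: {mpu:.2f} Mt/cm2")
    print(f"= {mt_per_cm2_to_kgates_per_mm2(24.13):.1f}K gates/mm2")
```

Starting from the rounded 19.3 Mt/cm2 ASIC figure gives 24.125 Mt/cm2, and 24.13 Mt/cm2 converts to roughly 60K gates/mm2, matching the slide.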

SRAM Density
• SRAM cell size expressed as A*F^2
• SRAM A factor essentially constant, barring paradigm shifts in architecture/stacking
– Slight reduction with scaling, as seen in following slide
– N.B.: 1-T SRAM (www.mosys.com): 2-3x area reduction, 4x power reduction, in production (Broadcom, Nintendo)
• Overhead (periphery)
– Best current estimate = 100% => effective bitcell size = 2*actual
– Periphery area can be more exact function of memory size
  • smaller caches experience more overhead (could pertain to cost-perf vs. high-perf MPUs)
  • A word * B bit SRAM: core area = A*B*C (Artisan TSMC25: C = 240 F^2); periphery area = K*log(A)*B (Artisan TSMC25: K = 4000-5000 F^2)

http://www.mosys.com/
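The size-dependent periphery model (core area = A*B*C, periphery area = K*log(A)*B) can be explored with a minimal sketch. Two assumptions are made that the slide does not state: the logarithm is taken base 2 (address bits), and K is set to 4500 F^2, the midpoint of the quoted 4000-5000 range.

```python
import math


def sram_areas_f2(words: int, bits: int,
                  c_f2: float = 240.0,
                  k_f2: float = 4500.0) -> tuple[float, float]:
    """Return (core_area, periphery_area) in units of F^2."""
    core = words * bits * c_f2
    periphery = k_f2 * math.log2(words) * bits  # log base 2 is an assumption
    return core, periphery


def periphery_fraction(words: int, bits: int) -> float:
    """Periphery share of total array area."""
    core, periphery = sram_areas_f2(words, bits)
    return periphery / (core + periphery)


if __name__ == "__main__":
    for words in (256, 4096, 65536):
        frac = periphery_fraction(words, 64)
        print(f"{words:6d} words x 64 bits: periphery = {frac:.1%} of area")
```

Under these assumptions the periphery share drops quickly as the word count grows, which is the sense in which smaller caches experience more overhead.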

Collection of 6T SRAM Cell Sizes from TSMC, Toshiba, Motorola, IBM, UMC, Samsung, Fujitsu, Intel

[Figure: A-Factor (SRAM cell area normalized to F^2), 0 to 200, versus F (DRAM half-pitch) in microns, 0.1 to 0.4; linear fit A-Factor = 50.546F + 133.19.]

Technology Node, F      180nm  130nm  100nm  70nm  50nm  35nm
SRAM Cell Size / F^2    142    139    138    136   135   134

Without overhead

SRAM Density

• At 180nm 65.2 Mt/cm2 (compare to 35 Mt/cm2 in ITRS00 for cost-performance MPU)

• Easier to understand: 10.87 Mbits/cm2 since the Mt/cm2 definition ignores peripheral transistor count

• At 30nm 414.6 Mbits/cm2 or 2.49 Bt/cm2 (compare to 3.5Bt/cm2 in ITRS00)

• Difference is due to non-quadratic scaling in ITRS00
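The SRAM density figures above follow from the A factor, the 100% periphery overhead and 6 transistors per cell. In the sketch below, the A factor of 134 at 30nm is an assumption (the table lists 134 at 35nm), and the function names are illustrative.

```python
UM2_PER_CM2 = 1e8  # 1 cm^2 = 1e8 um^2


def sram_density_mbits_per_cm2(f_nm: float, a_factor: float,
                               overhead: float = 1.0) -> float:
    """Bit density in Mbits/cm^2; overhead=1.0 doubles the bitcell area."""
    f_um = f_nm / 1000.0
    effective_cell_um2 = a_factor * (1.0 + overhead) * f_um ** 2
    return UM2_PER_CM2 / effective_cell_um2 / 1e6


def sram_density_mt_per_cm2(f_nm: float, a_factor: float,
                            transistors_per_cell: int = 6) -> float:
    """Transistor density in Mt/cm^2, counting bitcell transistors only."""
    return sram_density_mbits_per_cm2(f_nm, a_factor) * transistors_per_cell


if __name__ == "__main__":
    print(f"180 nm: {sram_density_mbits_per_cm2(180, 142):.2f} Mbits/cm2, "
          f"{sram_density_mt_per_cm2(180, 142):.1f} Mt/cm2")
    print(f" 30 nm: {sram_density_mbits_per_cm2(30, 134):.1f} Mbits/cm2, "
          f"{sram_density_mt_per_cm2(30, 134) / 1000:.2f} Bt/cm2")
```

Because the A factor is nearly constant, density in this model scales almost exactly with 1/F^2, which is the quadratic scaling the last bullet contrasts with ITRS00.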


Memory/Logic Power Study Setup
• Motivation: Is current ITRS MPU model consistent with power realities? Does it drive the right set of needs?
• Ptotal = Plogic + Pmemory = constant (say, 50W or 100W)
• Plogic composed of dynamic and static power, calculated as densities
• Pmemory = 0.1*Pdensity_dynamic
– power density in memories is around 1/10th that of logic
• Logic power density (dynamic) determined using active capacitance density (Borkar, Micro99)
– dynamic power density Pdensity_dynamic = Cactive * Vdd^2 * fclock
– fclock uses new fixed-FO4 inverter delay model (linear, not superlinear, with scale factor)
– Cactive = 0.25nF/mm2 at 180nm
– increases with scale factor (~1.43X)
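The dynamic power density relation can be written as a small helper. The supply voltage and clock frequency used in the example call (1.8V, 1.1GHz) are hypothetical illustration values and are not taken from the slides; only Cactive = 0.25nF/mm2 and the 0.1 memory ratio are.

```python
def dynamic_power_density_w_per_mm2(c_active_nf_per_mm2: float,
                                    vdd_v: float,
                                    f_clock_ghz: float) -> float:
    """P = Cactive * Vdd^2 * fclock, in W/mm^2 (nF * GHz = 1 A/V)."""
    return c_active_nf_per_mm2 * vdd_v ** 2 * f_clock_ghz


def memory_power_density_w_per_mm2(logic_dynamic_w_per_mm2: float,
                                   ratio: float = 0.1) -> float:
    """Memory power density, taken as ~1/10th of logic dynamic density."""
    return ratio * logic_dynamic_w_per_mm2


if __name__ == "__main__":
    # Hypothetical operating point: 1.8 V, 1.1 GHz (not from the slides).
    logic = dynamic_power_density_w_per_mm2(0.25, 1.8, 1.1)
    memory = memory_power_density_w_per_mm2(logic)
    print(f"logic dynamic: {logic:.3f} W/mm2, memory: {memory:.4f} W/mm2")
```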

Memory/Logic Power Study Setup
• Static power model considers dual Vth values
– 90% of logic gates use high-Vth with Ioff from PIDS Table 28a/b
– 10% of logic gates use low-Vth with Ioff = 10X Ioff from PIDS Table 28a/b (90/10 split is from IBM and other existing dual-Vth MPUs)
– Operating temp (80-100C) Ioff is 10X of Table 28a/b (room temp)
• Width of each gate determined from IBM SA-27E library
– 150nm technology; 2-input NAND = basic cell
– performance level E: smallest footprint, next to fastest implementation => W of each device ~ 4um
– Weff (effective leakage width) for each gate = 4um
– 0.8*Weff*Ioff (per um) = Ileak / gate (0.8 comes from avg leakage over input patterns)
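The static power assumptions combine into an average leakage current per gate, sketched below. The room-temperature Ioff passed in the example (1 nA/um) is a hypothetical placeholder, since the PIDS Table 28a/b values are not reproduced in these slides; the other defaults are the figures listed above.

```python
def average_gate_leakage_a(ioff_room_a_per_um: float,
                           weff_um: float = 4.0,
                           pattern_factor: float = 0.8,
                           low_vth_fraction: float = 0.10,
                           low_vth_multiplier: float = 10.0,
                           temperature_multiplier: float = 10.0) -> float:
    """Average leakage per gate (A) for a 90/10 high/low-Vth mix at temperature."""
    ioff_hot = ioff_room_a_per_um * temperature_multiplier
    # Weighted Ioff multiplier: 90% of gates at 1X, 10% at 10X.
    vth_mix = (1.0 - low_vth_fraction) + low_vth_fraction * low_vth_multiplier
    return pattern_factor * weff_um * ioff_hot * vth_mix


if __name__ == "__main__":
    # Hypothetical room-temperature Ioff of 1 nA/um for illustration only.
    ileak = average_gate_leakage_a(1e-9)
    print(f"average leakage per gate: {ileak * 1e9:.1f} nA")
```

Multiplying this per-gate current by gate density and Vdd gives a static power density that can be added to the dynamic term.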

Memory/Logic Study Setup

• Calculate densities, then find allowable logic component (percent of total area) to achieve constant power (or power density)
– Amemory + Alogic = Achip
– recall that Achip is flat at 157 mm2 from 1999-2004, then increases by 20% every 4 years

• Constant power and constant power density scenarios same until 65nm node (because chip area flat until then)
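Solving the two constraints (Amemory + Alogic = Achip, and total power held constant) for the logic area gives a closed form, sketched below. The power densities in the example are hypothetical round numbers chosen only to exercise the formula; the 157 mm2 chip area and 50W budget come from the slides.

```python
def allowable_logic_fraction(p_total_w: float,
                             a_chip_mm2: float,
                             logic_w_per_mm2: float,
                             memory_w_per_mm2: float) -> float:
    """Fraction of chip area usable for logic under a total power budget.

    Solves  Alogic*p_logic + (Achip - Alogic)*p_memory = Ptotal.
    """
    if logic_w_per_mm2 <= memory_w_per_mm2:
        raise ValueError("logic power density must exceed memory power density")
    a_logic = (p_total_w - a_chip_mm2 * memory_w_per_mm2) / (
        logic_w_per_mm2 - memory_w_per_mm2)
    # Clamp to the physically meaningful range [0, Achip].
    a_logic = max(0.0, min(a_chip_mm2, a_logic))
    return a_logic / a_chip_mm2


if __name__ == "__main__":
    # Hypothetical densities: logic 1.0 W/mm2, memory 0.1 W/mm2.
    frac = allowable_logic_fraction(50.0, 157.0, 1.0, 0.1)
    print(f"allowable logic content: {frac:.1%} of chip area")
```

As logic power density rises from node to node while the budget stays fixed, the allowable fraction shrinks, which is the effect the study sets out to quantify.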

Power as a Constraint: Implications

Constant power or power density => decreasing logic content => cannot scale logic, SRAM in lock step as in current ITRS

Anomaly going from 45nm to 32nm due to constant Vdd

Power as a Constraint: Implications

Using same constraints, calculate #MPU cores (12Mt/core) and Mbytes SRAM allowable (again, anomaly at 32nm due to constant Vdd)


Design Cost Requirement
• “Largest possible ASIC” design cost model
– engineer cost per year increases 5% per year ($181,568 in 1990)
– EDA tool cost per year increases 3.9% per year ($99,301 in 1990)
– #Gates in largest ASIC design per ORTCs (.25M in 1990, 250M in 2005)
– %Logic Gates constant at 70% (see next slide)
– #Engineers / Million Logic Gates decreasing from 250 in 1990 to 5 in 2005
– Productivity due to 7 Design Technology innovations (3.5 of which are still unavailable): RTL methodology; In-house P&R; Tall-thin engineer; Small-block reuse; Large-block reuse; IC implementation suite; Intelligent testbench; ES-level methodology
– Small refinements: (1) whether 30% memory content is fixed; (2) modeling increased amount of large-block reuse (not just the ability to do large-block reuse). No discussion of other design NRE (mask cost, etc.).
• #Engineers per ASIC design still rising (44 in 1990 to 875 in 2005), despite assumed 50x improvement in designer productivity
• New Design Technology -- beyond anything currently contemplated -- is required to keep costs manageable
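The headcount figures on this slide can be checked directly from the listed inputs. The sketch below also compounds the per-year engineer and tool costs; combining them as one tool seat per engineer is an assumption made here for illustration and is not stated in the slides.

```python
def engineers_per_design(total_gates_millions: float,
                         engineers_per_million_logic_gates: float,
                         logic_fraction: float = 0.70) -> float:
    """Engineers needed for the largest ASIC design."""
    logic_gates_millions = total_gates_millions * logic_fraction
    return logic_gates_millions * engineers_per_million_logic_gates


def annual_cost_per_engineer(year: int) -> float:
    """Engineer salary plus one EDA tool seat (assumption), in dollars."""
    elapsed = year - 1990
    salary = 181_568 * 1.05 ** elapsed
    tools = 99_301 * 1.039 ** elapsed
    return salary + tools


if __name__ == "__main__":
    print(f"1990: {engineers_per_design(0.25, 250):.0f} engineers")
    print(f"2005: {engineers_per_design(250, 5):.0f} engineers")
    print(f"productivity gain: {250 / 5:.0f}x")
    print(f"2005 cost per engineer-year: ${annual_cost_per_engineer(2005):,.0f}")
```

Gate count grows 1000x over the period while productivity improves only 50x, so headcount per design rises about 20x even with every assumed innovation in place.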

[Figure: Cost Metrics Forecast. Design cost, log scale $10,000,000 to $100,000,000,000, versus year, 1990 to 2005. Series: Design Cost for Largest Possible ASIC; Same Cost RTL Methodology Only.]

Design Cost Requirement
• Source: Dataquest (2001)

[Figure: ASIC Core Composition Breakout. Percentage of Die Area (I/Os Excluded), 0 to 60, for 1999, 2000, 2001. Series: Random Logic; Memory; Analog; Cores.]

ASIC Memory Content Trends
• Source: Dataquest (2001)

Design Quality Requirement
• “Normalized transistor” quality model
– speed, power, density in a given technology
– analog vs. digital
– custom vs. semi-custom vs. generated
– first-silicon success
– other: simple / complex clocking, …
– developing quality normalization model within MARCO GSRC; VSIA, Numetrics, others pursuing similar goals
• Design quality: gathering evidence (will have metric, historical trend / needs table)
• Design quality, and quality/cost, will show red bricks?


Pre-Meeting Design Changes (1/2)

• New clock frequency requirements
– FO4 based, no global interconnect
– global clock tracks 14 FO4 INV delays
– local clock tracks 6 FO4 INV delays (or can be deleted)
• New layout density requirements
– “A” factors for SRAM, logic (custom), logic (semi-custom)
– adjustments for overheads (memories)
– adjustments for redundancy, error correction
– adjustments for change in “MPU” architecture (multi-core, L3 on board, ...)
• New MPU power requirements
– bring total chip power down (e.g., flat at 90W, or perhaps 50W)
– socially responsible, reasonable “need”, if nothing else
• New MPU figures of merit and requirements
– statement of need: increase utility (SPECint, throughput, etc.), not frequency
– server: SPEC/W, I/O or request handling bandwidth
– smart interface: power, form factor, reusability, reprogrammability

Pre-Meeting Design Changes (2/2)

• #Metal layers
– formal model: #Layers grows as log (#Transistors) (DeHon 2000)
– add: dedicated metal layers for inductive shielding (1 per generation; these are not “interconnect” layers)
• Package pins/balls
• Variability
– performance uncertainty due to variation of Leff, Vt, Tox, W_int, t_ILD, etc. is managed by design of synchronization, logic, circuits
– these tolerances can be increased (removing some red bricks), and in any case should be developed via critical-path, other design models
• ASIC-SOC convergence
– SOC (= System-LSI) is the “product class” that is analogous to MPU, DRAM
– System Drivers Chapter: SOC, MPU, AMS/RF, (DRAM)
– references to “ASIC” in ITRS should be adjusted/removed accordingly

Design Pre-Meeting Notes (ORTCs 1/1)
• ORTCs: inconsistent density metrics
– ASIC, high-perf MPU give total density; cost-perf MPU breaks down logic vs. memory
• ORTCs: density scales super-quadratically
– 180nm to 90nm gives 5X rise in density (instead of 4X)
– 180nm to 30nm gives 100X rise in density (instead of 36X)
• ORTCs: ASIC total density == high-perf MPU total density
– MPU logic density should be 1.25X ASIC
– even if SRAM densities same, overall MPU density should be >5% larger (more if ASIC memory component is smaller than MPU)
• ORTC00: MPU pad counts, Tables 3a/3b
– flat from 2001-2005
– but in this time period, chip current draw increases 64%
• ORTCs: Distinguish b/w high-perf MPU, ASIC power?
– currently no estimates for high-end ASIC power consumption
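The super-quadratic scaling observation compares the ORTC density rises against ideal 1/F^2 scaling. A minimal sketch of that comparison follows; the function names are illustrative.

```python
def ideal_density_gain(from_nm: float, to_nm: float) -> float:
    """Density gain expected from purely quadratic (1/F^2) scaling."""
    return (from_nm / to_nm) ** 2


def excess_over_quadratic(ortc_gain: float, from_nm: float, to_nm: float) -> float:
    """How far a claimed density gain exceeds quadratic scaling."""
    return ortc_gain / ideal_density_gain(from_nm, to_nm)


if __name__ == "__main__":
    # (ORTC gain, from node, to node) as quoted in the notes above.
    for ortc_gain, to_nm in ((5.0, 90), (100.0, 30)):
        ideal = ideal_density_gain(180, to_nm)
        excess = excess_over_quadratic(ortc_gain, 180, to_nm)
        print(f"180 nm -> {to_nm} nm: ideal {ideal:.0f}X, "
              f"ORTC {ortc_gain:.0f}X ({excess:.2f}x above quadratic)")
```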

Post-Meeting Notes (PIDS 1/2)
• AR for DESIGN: Obtain MPU designers’ input
• AR for PIDS: contribute to the common LITHO-PIDS-FEP-DESIGN linked spreadsheet
• AR for DESIGN and PIDS: Review device specs and characteristics in terms of circuit implications and requirements
– Rds (> 10% of Ron nearing 22-25%)

– Rho_gate (sheet resistance spec on gate lead, increasing from previous 4-6 ohm values)

– Ion, Ioff values (Ion > 2.5mA/um ? Ioff = 1 uA/um at 65nm node (room temperature))

– Delta Lg, Delta Vt (25mV at 65nm node?), etc.

– AR for PIDS: Answer DESIGN questions/concerns re key device characteristics and statistical variations, especially variations that define Delta Lg – goal is to have a common understanding of variability requirements

– AR for DESIGN: give a power dissipation spec (tunneling, Ioff)

– AR for DESIGN: Consider both high performance and low power scenarios

• AR for PIDS: Pass along current work on ASIC-LP roadmap to Hiwatashi-san of Japan Design TWG; this will lead to SOC input within the System Drivers Chapter

Post-Meeting Notes (PIDS 2/2)
• Original list of comments on PIDS Chapter that DESIGN ITWG had before meeting (we did not get to discuss most of these…)
– Rds #’s in spreadsheet from PIDS are higher than 15-20% numbers – more like 22-25% near end of ITRS. Drive current penalty is < 14%. Necessary Vth reduction to compensate will adversely affect Ioff. Since Ioff is already a major headache this might not be good.
– What is “effective Vth”?
– Very strong dependence when in tunneling regime: leakage increases 10X with 2 angstrom differences in oxide thickness
– Ioff values seem much worse than in the current ITRS: 1 uA/um at 65nm (is this room temp or 100C? Answer from PIDS: room temp). This is listed as “user-adjustable parameter”. (Answer from PIDS: higher Vth means that it’s less temp-dependent.)
  • From DESIGN perspective, these values are problematic.
  • From DAC-01 work, also seem pessimistic, particularly if assuming negligible gate depletion and inversion layer quantization effects, Cox scaling should help alleviate Vth reductions which are otherwise the primary way of getting Ion to be 750 uA/um.
– 10% static power constraint in Table 28a/b
  • Justification?
  • 100X increase in Ioff from room temp to 100C is high (better estimate = 10-20X; see Borkar IEEE Micro99)
  • W/L of 3 for all devices – including memory? Max Ioff used? Pessimistic; should use simpler gate-level (not xtor-level) approach
– Suggest that PIDS be as clear as possible re both electrical and physical gate oxide thickness (Answer from PIDS: already in the table)
  • can incorporate expected gate material enhancements to reduce gate depletion effects (GDE)
  • can give better depiction of how Ion scales and how significant Ioff will be as a result

• DESIGN suggestion: optimize over all possibilities by exhaustive search.

Post-Meeting Notes (FEP 1/1)
• AR for FEP (with PIDS, Litho): close on definitions of CD variation
• AR for FEP, DESIGN: collaborate with PIDS, Litho to develop a big spreadsheet to call out interdependencies between these four TWGs as much as possible
• AR for FEP: explain to DESIGN ITWG the models or derivations that are behind the following variability specs: gate oxide, gate length, effective channel length, CD bias iso-dense
– Comment from FEP TWG: etch bias does not include every density, every feature size (comprehended by RET in the mask) – rather, just one single isolated line
– Comment from FEP TWG: companies will start with larger printed feature size, larger etch bias to get down to given physical gate length => need more control (what is a model for this trend?)
• AR for FEP: explain to DESIGN ITWG the critical area definition/model used
• AR for DESIGN: develop implications of low-power circuit techniques on FEP technology requirements
– must address both low operational power, low standby power regimes
– what are leakage requirements for low-power?
– N.B.: gate leakage (need high-k) and subthreshold leakage (Vth) are at roughly same order of magnitude. Also, leakage requirement is now getting to 100A/cm^2

Post-Meeting Notes (M&S 1/1)
• AR for M&S: provide model of SEU phenomena; send to DESIGN folks
– affects need for ECC, SRAM layout density, etc.
• Note: ABK did not take notes at this meeting
– mostly, was a freeform discussion with no obvious AR’s for either TWG at the conclusion (?)
• Some questions that came up for M&S from DESIGN:
– best practices: best-in-class analysis and simulation approximations (e.g., power, timing) to fit various CPU, information regimes
– compact modeling issues (4-terminal devices, SiGe, …)
– modeling of critical performance indicators (leakage, transconductance, max frequency of CMOS, Ion/Ioff ratios, etc.)
• Implicit AR for DESIGN: make some requests (?)
– variability modeling
– timing and power models (e.g., crosstalk delay uncertainty model)
– AR for M&S: Can M&S ITWG come up with optimizable models (i.e., suitable as objectives for optimization)?
– AR for M&S: In addition to purely modeling and simulation, can M&S ITWG come up with silicon calibration methodologies, eyecharts, etc. to provide the validation and test structures that accompany models and simulations?

• Were there any AR’s for DESIGN ?

Post-Meeting Notes (A&P 1/1)
• AR for DESIGN (from A&P): Address codesign of die and package: RF, passives, …
– 3 domains: EM, thermal-mech, microstructural (stress/strain); at minimum, need to pass datasets back and forth (e.g., design of RF front-end w/flip-chip)

– Differential heating (transient, not structural) + mechanical stresses in flip-chip die is the weak link (microstructure, fracturing, …)

– (What about redistribution, terminal assignment, etc. – and what about existing tool development in industry – cf. Cadence-Agere announcement?)

– A&P Goals: (1) short term: data; (2) medium-term: codesign; long-term: cost-driven die-package co-optimization? (May have written this down wrong…)

• AR for A&P: Send new writeup on multi-die packaging to DESIGN folks

• AR for A&P: Send new writeup on SOC / optoelectronics end of spectrum to DESIGN folks
– A&P cited four areas for roadmapping: MEMS, MCM, materials, optoelectronics

• Pre-Meeting Questions from DESIGN to A&P (we did not get to discuss most of these)– Effective bump pitch roughly constant at 350um throughout ITRS

• Why does bump/pad count scale with chip area only, not with technology demands (IR drop, L*di/dt) ?

• Implication – metal resource needed to ensure <10% IR drop skyrockets since Ichip and wiring resistance increase

– Later technologies (30-40nm) have too few bumps to carry required maximum current draw

• 1250 Vdd pads at 30nm: with bump pitch of 250um can carry 150mA (bumps at 350um can carry more, not shown in ITRS)

• 187.5A max capability but Ichip/Vdd > 300A

• 100,000 hour reliability #’s build cushion into this calculation, but… could A&P provide details of analysis?
– Why is hand-held power 2.6W in 2005 (monotonically increasing 1999-2005) but then 2.1W in 2008 (resumes increasing)?
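The bump current question raised in these notes (1250 Vdd pads at 150mA each versus a chip current above 300A) is simple arithmetic, sketched below with integer milliamps to avoid rounding surprises. The function names are illustrative.

```python
def supply_capacity_a(num_pads: int, ma_per_bump: int) -> float:
    """Total current the Vdd bumps can carry, in amperes."""
    return num_pads * ma_per_bump / 1000.0


def pads_required(chip_current_a: int, ma_per_bump: int) -> int:
    """Minimum number of Vdd bumps needed to carry the chip current."""
    chip_current_ma = chip_current_a * 1000
    return -(-chip_current_ma // ma_per_bump)  # integer ceiling division


if __name__ == "__main__":
    capacity = supply_capacity_a(1250, 150)
    needed = pads_required(300, 150)
    print(f"capacity with 1250 pads: {capacity:.1f} A")
    print(f"pads needed for 300 A:   {needed}")
```

With the quoted numbers, capacity is 187.5A, so at least 2000 Vdd bumps would be needed to carry 300A at 150mA per bump.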

Post-Meeting Notes (Interconnect 1/1)
• GLOBAL INTERCONNECT STUDY GROUP: perhaps should not expect activity this renewal cycle
• AR for INTERCONNECT: Linked spreadsheet; send to DESIGN folks
• AR for INTERCONNECT: “Algorithm” for creating idealized interconnect stack; send to both ITWGs
– AR for Werner Weber (DESIGN): Distribute the IBM paper he referred to (re optimal interconnect stacks)
• AR for DESIGN: Variability control requirement; send to INTERCONNECT folks
– planarization, width/spacing, … issues are all subsumed within this AR
– Example metric: “cross-sectional variance per unit length”…
– Note: INTERCONNECT ITWG would like a more well-supported/motivated planarization metric… (due to time considerations, INTERCONNECT ITWG was not intending to address variability in detail in this renewal cycle)
• AR for DESIGN: (Hiwatashi-san) Work with Japan TWG (Ohsaki-san) to understand the interconnect stack performance requirement issue
– Is the position of the Japan TWG that both #levels and “minimum dimensions” be increased? If so, what is the motivation/thinking behind this? (Initial reaction: increasing #levels is probably justifiable; increasing minimum dimensions is not as necessary since designers have non-minimum dimensions available to them, and since global interconnect performance is solvable by pipelining, etc.)
– AR for Chris Case: broadcast the Japan TWG slides (Ohsaki-san’s slides)
• AR for DESIGN: address the question of #Metal levels (should they increase?) by proposing a model; send to INTERCONNECT folks
• AR for DESIGN: Comment on NEED for particular effective ILD permittivities, particularly red brick values
– At least, will create a dialogue about any differences in “requirements”
• AR for INTERCONNECT: Explain CEP and any potential changes for Design – e.g., how does need for dummy features change?
• AR for DESIGN: How should Interconnect parameters be driven by design?
– Example hope: “Crosstalk metric” (e.g., DRAM makers must decide when to invest …)
– Potential criteria for metrics: Delay uncertainty? Power delivery? Via impact factor / porosity / routability
• AR for DESIGN: Variability requirement for on-chip passives (e.g., driven by Q variability needs)
– motivation: thick metal planarization (e.g., 5um high spiral inductors)
• AR for DESIGN/INTERCONNECT: Work out whether PSM of LI/M1 layers needed
– Check spreadsheet to determine whether pitches vs. wavelengths for use of PSM
• AR for DESIGN/INTERCONNECT: work with LITHO, PIDS, FEP to build common big spreadsheet

Post-Meeting Notes (Litho 1/2)
• AR for LITHO: Can cost of mask be added into the ITRS?
– As a “precedent”, we would cite the existence of a “Data Volume” requirement in Table 41.
– Independent of this issue, can Litho TWG help us gather this information?
• AR for LITHO: Is local interconnect going to use strong phase-shifting? If so, when?
• AR for LITHO: Can the CD variability requirement (Table 39) be relaxed based on input from DESIGN ITWG?
– Design also joins in the apparently existing request for a more precise definition of CD control (see below).
– What should the process be if Design and Litho end up with different requirements for CD control?

• AR for LITHO: Continuing with the variability discussion, could the LITHO ITWG supply DESIGN ITWG with explanations of WHY particular requirements exist?

– These requirements, in our opinion, must exist to achieve some target level of control over the manufactured features. (If we are wrong in this understanding, please tell us.)

– Design needs to know the relationship between the Litho requirement and the eventual effect in terms of control over functional aspects of the manufactured devices and interconnects. As examples, we list the following.

– Table 41: wafer overlay, magnification, mask minimum image size, mask OPC feature size, image placement, mask design grid, etc. (One observation is that Litho ITWG defines a mask OPC feature size requirement, yet is unable to tell Design ITWG the underlying assumptions as to what the TYPE of OPC might be!)

– Table 41 (continued): AttPSM transmission mean deviation from target (e.g., why is this mean only?), AttPSM transmission uniformity (don’t these transmission deviation/uniformity requirements depend on the nominal values of transmission?)

• AR for LITHO: Please tell us what needs to be improved (i.e., by DESIGN Technology) in the design-litho flow.
– For example, does the mask cost structure require function- and cost-driven mask data prep / RET?
– As suggested above, we are confused by Litho ITWG’s inability to provide guidance re efficiency and cost issues, given that there are data volume requirements, OPC minimum feature size, etc. requirements in Table 41. It seems to us that data volume requirements (for example) are completely driven by throughput, storage, etc. – i.e., cost – considerations.

– With respect to the design-litho flow, what information would LITHO like to see from DESIGN?

Post-Meeting Notes (Litho 2/2)
• AR for LITHO: Please supply formal (exact) definitions of the variabilities that are coming into design. (Some of the following may overlap with previous AR statements.)
– Are variabilities corrected or uncorrected?
– What correction mechanisms are assumed? (It matters GREATLY to Design whether these are using OAI, annular/quadrupole illumination, particular OPC technologies (SRAFs, level of OPC aggressiveness), particular PSM technologies (AltPSM (full-poly? LI layer?), AttPSM), etc.)

– Specify precisely where in the reticle these variabilities are occurring. Design requires knowledge of the decomposition of variability into systematic and random components; we need to know what is correctable in the mask data prep flow (RET), what is correctable in Design (e.g., compensating for coma effect by assuming that “lower-quality” features will print from the periphery of the reticle), and what is uncorrectable. Put another way, the agreement during the discussion was that Litho would break out variability into systematic / random and cross-chip / local.

• AR for LITHO: Please define (formally) OPC and PSM for us.
– The level of detail that we seek is suggested in the previous AR.
• AR for LITHO: Please explain the model/derivation of magnification (Table 41)
• AR for DESIGN: Can large stitching tolerances be allowed in the mask-making process? The EPL tool requires such stitching every M microns (M = 250 or 8000? Not clear from discussion). This would apparently yield a new problem or constraint for layout and floorplanning.
• AR for DESIGN: Are the presently specified overlay tolerances acceptable?
• LITHO comments during discussion: (1) Lithography sets the pace of the roadmap; (2) cost does not come into play in Litho’s roadmapping (???); (3) many answers to the questions from Design cannot be shared in an open forum (but, why is Litho discussing SRAF metrology requirements with Metrology while at the same time being unable to provide a one-bit (yes/no) indication to Design as to whether SRAFs should be assumed in future masks?); (4) CD variability requirements may vary by level.
• Comment re Data Volume and AR for LITHO: Please indicate the model for setting the “requirement”
– Members of Design ITWG have studied this issue (including GDSII Stream, MEBES etc. formats), and are curious how the numbers were derived.

Post-Meeting Notes (Test 1/2)
• AR for DESIGN: statistically aware timing/xtalk analysis: is this good enough for signoff, or does test have to be a backstop for this / clean up?
• AR for DESIGN: model for #logic, #memory transistors on-chip; send to TEST folks
• AR for DESIGN: what percentage of the memory arrays have BIST (can avoid having APGs on test equipment)
• AR for DESIGN: send to TEST folks all information re normal operation power dissipation, frequencies, #IOs, … for all System Drivers (i.e., type of blocks we have) – highest frequency pins (split p/g vs. signal pins) (IEEE 1394), SDRAM/DDR/RDRAM speeds, pin inductance, …
• AR for DESIGN: what kind of design debug environment is envisaged? (first-silicon design debug)
• AR for TEST: describe the boundary between design and test
• AR for TEST: Test design rules for design // n.b.: memory test is totally different animal
• AR for TEST: set a constraint on Design (in terms of test cost, etc.)
• AR for TEST: send to DESIGN folks all comments on test_column_feb01.ppt
• AR for TEST: power delivery, dI/dt, etc. constraints from Test that may impact Design
• AR for DESIGN AND TEST: need to split Design, Test material among three chapters (System Drivers, Design, Test)
– AR for DESIGN: describe System Drivers chapter to Test
– AR for TEST: pass along SOC test material

• AR for DESIGN AND TEST: define the proper group of liaison folks (Mike, Tom, Tim, Rob, …?) Should also set up processes/schedule for interaction and convergence.

Post-Meeting Notes (Test 2/2)
• Miscellaneous comments during discussion

– What about mainstream CMOS vs. ???

– TEST: split wafer test from final assembly at-speed test

– What level of coverage of SAF is achieved with DFT? 90% or so…

– There is also transition-fault testing and path-delay testing; the latter is applied to a few critical paths today, but will it be applied to many paths in future?

– What is level of Test application in design? E.g., w.r.t. correct by construction, etc.

– General issue: want to replace functional tests with structural testing based on static faults, i.e., how do we prove functional performance without functional tests? (something about functional vectors being from design community; test having to rely on high quality of these vectors; functional testing leading to expensive equipment)
– Something about TEST needing to define appropriate DFT fault model, size limits for logic, …
– Something about use of scan out LFSR capability after functional error found (?)
– Something about on-chip tests (design is merging with test; mindset of “design for testability” is giving way to mindset of test engineer being a design engineer)
