Interconnect Modeling for Improved System-Level Design Optimization Luca Carloni § Andrew B. Kahng Swamy Muddu Alessandro Pinto Kambiz Samadi Puneet Sharma § Columbia University University of California, San Diego University of California, Berkeley January 22, 2008

Interconnect Modeling for Improved System-Level Design Optimization


Page 1: Interconnect Modeling for Improved System-Level Design Optimization

Interconnect Modeling for Improved System-Level Design Optimization

Luca Carloni§, Andrew B. Kahng¶, Swamy Muddu¶, Alessandro Pinto‡, Kambiz Samadi¶, Puneet Sharma¶

§ Columbia University
¶ University of California, San Diego
‡ University of California, Berkeley

January 22, 2008

Page 2: Interconnect Modeling for Improved System-Level Design Optimization

Outline
• Motivation
• System-Level Communication Synthesis
• Buffered Interconnect Model
• Interconnect Optimization
• Validation and Significance Assessment
• Conclusions

Page 3: Interconnect Modeling for Improved System-Level Design Optimization

Motivation
• Focus of the design process is shifting from “computation” to “communication”
• Device and interconnect performance scaling mismatches cause breakdown of traditional across-chip communication
• System-level designers require accurate, yet simple models to bridge planning and implementation stages
• Today’s system-level performance and power modeling suffers from:
  • Ad hoc selection of models
  • Poor balance between accuracy and simplicity
  • Poor definition of inputs
  • Lack of model extensibility across future technology nodes
  • Inability to explore different implementation styles
• Our Goal: Develop accurate models that are easily usable by system-level design early in the design cycle

Page 4: Interconnect Modeling for Improved System-Level Design Optimization

Previous Interconnect Delay Models

• Missing required aspects of accurate delay estimation 90nm
  • Do not consider input slew change, which impacts effective drive resistance and consequently cell delay
  • Do not consider scattering, which impacts metal resistivity and consequently metal resistance
• Bakoglu90
  • No crosstalk impact; assumes driver on-resistance Rd and gate input capacitance Cg vary linearly with device size; uses Elmore delay model
• Pamunuwa03
  • Similar to Bakoglu90 but adds crosstalk impact
• CongPan99 (IPEM)
  • Multiple delay models under certain optimization schemes
• Use of second-order RC model for gate delay (e.g., Shao03)
  • Does not address gate loading during model construction

Page 5: Interconnect Modeling for Improved System-Level Design Optimization

Other Limitations of Previous Work

• Design style and buffering schemes
  • Design-level degrees of freedom: wire width, spacing, shielding
• Practical buffer sizing
  • Considering only delay as the optimization objective is wrong
  • Analytic solutions have large buffer sizes (100X-400X), which are not in any realistic cell library
• Model inputs and technology capture
  • Do not have well-defined pathways to capture necessary technology and device parameters
  • Collect inputs from ad hoc sources, which often leads to misleading conclusions

Page 6: Interconnect Modeling for Improved System-Level Design Optimization

Outline
• Motivation
• System-Level Communication Synthesis
• Buffered Interconnect Model
• Interconnect Optimization
• Validation and Significance Assessment
• Conclusions

Page 7: Interconnect Modeling for Improved System-Level Design Optimization

Communication Synthesis for Network-on-Chip

Given
• An input specification as a set of communication constraints
• A library of communication components
• An objective function (e.g., power, area, delay)

Find
• A network-on-chip implementation as a composition of library components that
  • Satisfies the specification
  • Minimizes the cost function

Communication Synthesis Infrastructure (COSI)
• Based on the Platform-Based Design methodology
• Takes specification and library descriptions in XML format
• Produces a variety of outputs, including a cycle-accurate SystemC implementation of the optimal network-on-chip

Page 8: Interconnect Modeling for Improved System-Level Design Optimization

Constraint-Driven Communication Synthesis

[Figure labels: Application, Implementation, Constraints Propagation, Point-to-Point Specification, On-Chip Communication Library, Perf. / Cost Abstractions, Synthesis, Synthesis Result]

Page 9: Interconnect Modeling for Improved System-Level Design Optimization

Communication Synthesis Key Elements
• Specification of input constraints
  • Set of IP cores: area and interface
  • End-to-end communication requirements between pairs of IP cores: latency and throughput
• Characterization of library of components
  • Interface types, max number of ports
  • Max capacities: bandwidth, latency, max distance
  • Performance and cost model
• Component instantiation and parallel composition
  • Rename, set parameters of library components
  • Composition based on algebra on quantities (including type compatibility)

Page 10: Interconnect Modeling for Improved System-Level Design Optimization

Communication Synthesis Example
• Synthesis of optimal network-on-chip
  • Returns a valid composition that meets input constraints and
  • Minimizes the objective function (e.g., power dissipation)

[Figure panels: (Original Specification), Platform Instance 1, Platform Instance 2]
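The selection step on this slide (keep only compositions that meet the input constraints, then take the one with the lowest objective value) can be illustrated with a minimal Python sketch. This is not COSI's algorithm, which the slides do not describe; it assumes the candidate compositions are already enumerated, and the `Candidate` fields, the names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    """A hypothetical, pre-enumerated network-on-chip composition."""
    name: str
    latency: float      # worst-case end-to-end latency (arbitrary units)
    throughput: float   # sustained throughput (arbitrary units)
    power: float        # objective value to minimize (arbitrary units)


def synthesize(candidates: Iterable[Candidate],
               max_latency: float,
               min_throughput: float) -> Optional[Candidate]:
    """Return the lowest-power candidate that satisfies both constraints."""
    valid = [c for c in candidates
             if c.latency <= max_latency and c.throughput >= min_throughput]
    # No valid composition means the specification is infeasible for this set.
    return min(valid, key=lambda c: c.power) if valid else None


# Illustrative data only; these are not results from the presentation.
EXAMPLE_CANDIDATES = [
    Candidate("bus", latency=12.0, throughput=1.0, power=3.0),
    Candidate("mesh", latency=6.0, throughput=4.0, power=5.0),
    Candidate("crossbar", latency=4.0, throughput=8.0, power=9.0),
]
```

With these made-up numbers, a latency bound of 8 and a throughput floor of 2 rule out the bus, and the mesh wins on power.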

Page 11: Interconnect Modeling for Improved System-Level Design Optimization

COSI: Communication Synthesis Infrastructure
• COSI is a public-domain software package for NoC synthesis: http://embedded.eecs.berkeley.edu/cosi/

Page 12: Interconnect Modeling for Improved System-Level Design Optimization

Outline
• Motivation
• System-Level Communication Synthesis
• Buffered Interconnect Model
• Interconnect Optimization
• Validation and Significance Assessment
• Conclusions

Page 13: Interconnect Modeling for Improved System-Level Design Optimization

Proposed Model Features

Tech. Characteristics
• # metal layers
• min. width, spacing, thickness
• dielectric thickness, constant
• device drive res, cap, leakage

Design Style
• width/spacing configs
• buffering scheme
• shielding
• signaling scheme

Bus Attributes
• length, # bits, layer, switching

Proposed Model
• Delay
• Leakage
• Dynamic
• Max. unclocked length, # pipelines, latency, throughput
• Area

• Improved accuracy with respect to well-known models
• Modeling of nanoscale-era effects: crosstalk, scattering, barrier thickness, dependence of delay on slews, etc.
• Single-digit percentage accuracy relative to gate-level analyses

Page 14: Interconnect Modeling for Improved System-Level Design Optimization

Model Technology Inputs
• Inputs for repeater delay calculation
  • Delay and slew values for a set of input slew and load capacitance values (obtained from Liberty / Timing Library Format (TLF) / SPICE)
  • Input capacitance for different repeater sizes (Liberty, Predictive Technology Models (PTM))
• Inputs for wire delay calculation
  • Wire dimensions (ITRS/PTM, LEF, ITF)
  • Inter-wire spacings for global and intermediate layers (ITRS/PTM, LEF, ITF)
• Inputs for power calculation
  • Input capacitance (Liberty, PTM)
  • Wire parasitics (computed in wire delay calculation)
• Inputs for area calculation
  • Wire dimensions used above
  • Repeater area is available from Liberty; for future technologies, ITRS A-factors or proposed area models can be used

Page 15: Interconnect Modeling for Improved System-Level Design Optimization

Buffered Interconnect Model
• Buffered interconnect model for delay, power, and area
  • Constructed from: buffer (repeater) and wire delay models
  • Accounts for coupling capacitances, slew dependence and UDSM effects (e.g., scattering-dependent wire resistance changes)
  • Calibrated against SPICE
• Components:
  • Repeater delay model
    • Separate models for intrinsic delay, output slew, input capacitance
  • Wire delay model
    • Accounts for coupling capacitance impact on wire delay
  • Repeater power model
    • Accounts for sub-threshold and gate leakages
  • Repeater area model
    • Derived from existing cell layouts (can be extrapolated)
  • Wire area model
    • Derived from wire width and spacing (can be extrapolated)

Page 16: Interconnect Modeling for Improved System-Level Design Optimization

Repeater Delay Model
• Repeater delay can be decomposed into load-independent ($i$) and load-dependent ($r_d \cdot c_l$) components:

    $d = i + r_d \cdot c_l$

• Intrinsic delay:

    $i(s_i) = \alpha_0 + \alpha_1 s_i + \alpha_2 s_i^2$

  $s_i$ denotes input slew; $\alpha_0$, $\alpha_1$ and $\alpha_2$ are coefficients obtained by quadratic regression
• Drive resistance is nearly linear with input slew; both the intercept and slope vary with repeater size:

    $r_d = r_{d0} + r_{d1} s_i$

• Output slew depends on load capacitance; the slope is independent of input slew, while the intercept depends linearly on it:

    $s_o(c_l, s_i) = s_{o0} + s_{o1} s_i + s_{o2} c_l$

  $s_o$ is the output slew, and $s_{o0}$, $s_{o1}$ and $s_{o2}$ are the fitting coefficients from linear regression
• Input capacitance:

    $c_i = \eta \, (w_p + w_n)$

  $c_i$ is the input capacitance, $w_p$ and $w_n$ are the PMOS and NMOS widths respectively, and $\eta$ is a coefficient derived using linear regression with zero intercept
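The repeater equations on this slide translate directly into code. The following is a minimal Python sketch; the `RepeaterCoeffs` container and the function names are illustrative, and any coefficient values supplied to it are placeholders rather than fitted data from the presentation.

```python
from dataclasses import dataclass


@dataclass
class RepeaterCoeffs:
    """Regression coefficients for one repeater size (hypothetical container)."""
    a0: float   # intrinsic delay, constant term (alpha_0)
    a1: float   # intrinsic delay, linear term (alpha_1)
    a2: float   # intrinsic delay, quadratic term (alpha_2)
    rd0: float  # drive resistance intercept
    rd1: float  # drive resistance slope versus input slew
    so0: float  # output slew intercept
    so1: float  # output slew dependence on input slew
    so2: float  # output slew dependence on load capacitance
    eta: float  # input capacitance per unit of total device width


def intrinsic_delay(c: RepeaterCoeffs, s_i: float) -> float:
    """i(s_i) = a0 + a1*s_i + a2*s_i^2"""
    return c.a0 + c.a1 * s_i + c.a2 * s_i ** 2


def drive_resistance(c: RepeaterCoeffs, s_i: float) -> float:
    """r_d = rd0 + rd1*s_i"""
    return c.rd0 + c.rd1 * s_i


def repeater_delay(c: RepeaterCoeffs, s_i: float, c_l: float) -> float:
    """d = i + r_d*c_l"""
    return intrinsic_delay(c, s_i) + drive_resistance(c, s_i) * c_l


def output_slew(c: RepeaterCoeffs, s_i: float, c_l: float) -> float:
    """s_o = so0 + so1*s_i + so2*c_l"""
    return c.so0 + c.so1 * s_i + c.so2 * c_l


def input_capacitance(c: RepeaterCoeffs, w_p: float, w_n: float) -> float:
    """c_i = eta*(w_p + w_n)"""
    return c.eta * (w_p + w_n)
```

Because the output slew of one stage is the input slew of the next, `output_slew` would typically be chained along a buffered line when evaluating multi-stage delay.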

Page 17: Interconnect Modeling for Improved System-Level Design Optimization

Wire Delay Model
• For wire delay we use the model proposed by Pamunuwa et al. (cf. TVLSI03), which accounts for crosstalk:

    $d_w = r_w \left( 0.4\, c_g + \frac{\lambda_i}{2}\, c_c + 0.7\, c_i \right)$

  • $d_w$, $r_w$, $c_g$, $c_c$, and $c_i$ respectively denote wire delay, wire resistance, ground capacitance, coupling capacitance and input capacitance of the next-stage repeater
  • $\lambda_i$ is a coefficient (based on SPICE simulation) that accounts for the switching patterns of the neighboring wires
• We enhance the quality of the wire delay model by considering two other important factors that change wire resistance:
  • Scattering-aware resistivity (cf. Shi et al. ASPDAC06):

      $\rho(w_w) = \rho_B + K_\rho / w_w$

    $w_w$ is the wire width, $\rho_B = 2.202\ \mu\Omega\cdot\mathrm{cm}$, and $K_\rho = 1.030 \times 10^{-15}\ \Omega\cdot\mathrm{m}^2$
  • Interconnect barrier (cf. Mai et al. IEEE01):

      $r_w = \frac{\rho\, l_w}{(t_m - t_b)\,(w_w - 2 t_b)}$

    $t_m$ and $t_b$ respectively are the metal and barrier thicknesses, $l_w$ is the length of the wire, and $\rho$ is computed using the above equation
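As a worked illustration of the wire equations on this slide, here is a minimal Python sketch in SI units. The two constants are the values quoted on the slide (with 2.202 µΩ·cm converted to Ω·m); the function names and any geometry passed in are illustrative assumptions.

```python
RHO_B = 2.202e-8   # bulk resistivity in ohm*m (2.202 micro-ohm*cm from the slide)
K_RHO = 1.030e-15  # scattering coefficient in ohm*m^2 (from the slide)


def resistivity(w_w: float) -> float:
    """Scattering-aware resistivity: rho = rho_B + K_rho / w_w (w_w in meters)."""
    return RHO_B + K_RHO / w_w


def wire_resistance(l_w: float, w_w: float, t_m: float, t_b: float) -> float:
    """r_w = rho*l_w / ((t_m - t_b)*(w_w - 2*t_b)); all lengths in meters."""
    # The barrier shrinks the conducting cross-section in both dimensions.
    return resistivity(w_w) * l_w / ((t_m - t_b) * (w_w - 2.0 * t_b))


def wire_delay(r_w: float, c_g: float, c_c: float, c_i: float, lam: float) -> float:
    """d_w = r_w*(0.4*c_g + lam*c_c/2 + 0.7*c_i)"""
    return r_w * (0.4 * c_g + lam * c_c / 2.0 + 0.7 * c_i)
```

Both corrections push resistance up as wires narrow: the scattering term grows as 1/w_w, and the barrier removes a fixed thickness from an ever smaller cross-section.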

Page 18: Interconnect Modeling for Improved System-Level Design Optimization

Repeater and Wire Delay Models

Model coefficients are fit from data extracted from Liberty/LEF/Tech. files and other extrapolatable sources (i.e., PTM and ITRS)

[Figure panels: Intrinsic Delay Model – i(slewin); Drive Resistance Model – r(slewin); Output Slew Model – o(slewin, CL)]

• delay = i(slewin) + r(slewin) * CL
• r(s) = f(size, slewin)
• slewout = f(slewin, CL)
• wire delay = Elmore
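The linear fits mentioned for these models (for example drive resistance versus input slew, or input capacitance versus total device width with zero intercept) can be sketched with ordinary least squares. This is a generic pure-Python illustration, not the authors' fitting flow; the function names are hypothetical and the data would come from library characterization tables.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def fit_through_origin(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares fit of y = k*x (zero intercept); returns k."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

`fit_line` would give (r_d0, r_d1) from (input slew, drive resistance) samples, and `fit_through_origin` would give η from (w_p + w_n, input capacitance) samples. The quadratic intrinsic-delay fit needs a three-parameter regression, which is omitted here.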

Page 19: Interconnect Modeling for Improved System-Level Design Optimization

Repeater and Wire Power Models
• Power is an important design objective and must be accounted for early in the design flow
• Today, leakage and dynamic power are primary forms of power dissipation
• Leakage has two main components: (1) sub-threshold leakage, and (2) gate-tunneling current
  • Both components depend linearly on device size

    $p_s = (p_s^n + p_s^p) / 2$
    $p_s^n = k_0^n + k_1^n \, w_n$
    $p_s^p = k_0^p + k_1^p \, w_p$

• Dynamic power can be calculated as:

    $p_d = a \cdot c_l \cdot v_{dd}^2 \cdot f$
    $c_l = c_i + c_g + c_c$

  • $p_d$, $a$, $c_l$, $v_{dd}$ and $f$ are dynamic power, activity factor, load capacitance, supply voltage and frequency, respectively
  • Load capacitance is composed of the input capacitance of the next repeater ($c_i$), and the ground ($c_g$) and coupling ($c_c$) capacitances of the wire driven
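The power equations on this slide can be evaluated as follows. This is a minimal Python sketch; reading the n and p superscripts as the NMOS and PMOS leakage contributions is an interpretation of the slide, and the function and parameter names are illustrative.

```python
def leakage_power(w_n: float, w_p: float,
                  k0_n: float, k1_n: float,
                  k0_p: float, k1_p: float) -> float:
    """p_s = (p_s^n + p_s^p)/2, with each term linear in its device width."""
    p_n = k0_n + k1_n * w_n  # NMOS contribution (assumed meaning of superscript n)
    p_p = k0_p + k1_p * w_p  # PMOS contribution (assumed meaning of superscript p)
    return (p_n + p_p) / 2.0


def dynamic_power(a: float, c_i: float, c_g: float, c_c: float,
                  vdd: float, f: float) -> float:
    """p_d = a*c_l*vdd^2*f, with c_l = c_i + c_g + c_c."""
    c_l = c_i + c_g + c_c
    return a * c_l * vdd ** 2 * f
```

The quadratic dependence on supply voltage means a 10% reduction in vdd lowers dynamic power by about 19% at fixed activity, load, and frequency.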

Page 20: Interconnect Modeling for Improved System-Level Design Optimization

Repeater and Wire Area Models
• For existing technologies, the area of a repeater can be calculated as:

    $a_r = \tau_0 + \tau_1 \, w_n$

  $a_r$ denotes repeater area, $\tau_0$ and $\tau_1$ are coefficients obtained using linear regression; $w_n$ and $w_p$ are the widths of NMOS and PMOS, respectively
• For future technologies, feature size ($F$), contacted pitch ($CP$), row height ($RH$), and row width ($RW$) can be used to estimate the area:

    $NF = (w_p + w_n + 2F) / RH$
    $RW = NF \times (F + CP) + CP$
    $a_r = RH \times RW$

• Wiring area can be calculated as:

    $a_w = n \times (w_w + s_w) + s_w$

  $a_w$ denotes wire area, $n$ is the bit width of the bus, and $w_w$ and $s_w$ are wire width and spacing
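A minimal Python sketch of the area formulas on this slide follows. The function names are illustrative; NF is left as a real number because the slide does not say whether it is rounded up to a whole number of fingers.

```python
def repeater_area_existing(w_n: float, tau0: float, tau1: float) -> float:
    """a_r = tau0 + tau1*w_n, with coefficients fit to existing cell layouts."""
    return tau0 + tau1 * w_n


def repeater_area_future(w_p: float, w_n: float,
                         F: float, CP: float, RH: float) -> float:
    """a_r = RH*RW, with NF = (w_p + w_n + 2F)/RH and RW = NF*(F + CP) + CP."""
    nf = (w_p + w_n + 2.0 * F) / RH   # not rounded; the slide gives no rounding rule
    rw = nf * (F + CP) + CP
    return RH * rw


def bus_wire_area(n: int, w_w: float, s_w: float) -> float:
    """a_w = n*(w_w + s_w) + s_w, exactly as written on the slide."""
    # Dimensionally this is the cross-bus footprint of n wires plus spacing.
    return n * (w_w + s_w) + s_w
```

As written, the wiring formula has units of length, so obtaining an area in square units would require multiplying by wire length; that step is an inference and is not stated on the slide.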

Page 21: Interconnect Modeling for Improved System-Level Design Optimization

Repeater Power and Area Models

• Repeater area and power models are fit from simulation data points
• Area and leakage power are linear over the range of implementable repeater sizes (larger repeater sizes lead to higher leakage power)

Page 22: Interconnect Modeling for Improved System-Level Design Optimization

Outline
• Motivation
• System-Level Communication Synthesis
• Buffered Interconnect Model
• Interconnect Optimization
• Validation and Significance Assessment
• Conclusions

Page 23: Interconnect Modeling for Improved System-Level Design Optimization

Interconnect Optimization: Buffering
• Conventional delay-optimal buffering leads to unrealistic buffer sizes and high dynamic / leakage power, and is therefore suboptimal
• Our approach: iterative optimization of a hybrid objective (power + delay)
  • Search for optimal number and size of repeaters
  • Can be extended for other interconnect optimizations (e.g., wire sizing and driver sizing)

[Figure: Pareto-optimal frontier of the power-delay tradeoff of a 5mm interconnect in 90nm / 65nm]
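To illustrate a search over the number and size of repeaters under a hybrid objective, here is a minimal Python sketch. It uses exhaustive enumeration as a stand-in for the iterative optimization named on the slide, and a weighted sum as the hybrid objective; both choices, along with the toy delay and power functions, are assumptions made for illustration and not the authors' method or models.

```python
from itertools import product


def optimize_buffering(delay_fn, power_fn, counts, sizes, weight=0.5):
    """Search repeater count and size for the lowest weighted delay/power cost.

    Returns a tuple (cost, count, size, delay, power).
    """
    best = None
    for count, size in product(counts, sizes):
        delay = delay_fn(count, size)
        power = power_fn(count, size)
        # Hypothetical hybrid objective: weight=1 is delay-only, weight=0 is power-only.
        cost = weight * delay + (1.0 - weight) * power
        if best is None or cost < best[0]:
            best = (cost, count, size, delay, power)
    return best


def toy_delay(count, size):
    """Placeholder delay: wire term falls with count, repeater term falls with size."""
    return 100.0 / count + count * (2.0 + 8.0 / size)


def toy_power(count, size):
    """Placeholder power: proportional to total repeater width."""
    return float(count * size)
```

With these toy functions, a delay-only search drifts to the largest available repeater size, while a balanced weight settles on fewer and smaller repeaters at the cost of some delay. Sweeping `weight` from 0 to 1 traces out points on a power-delay tradeoff curve.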

Page 24: Interconnect Modeling for Improved System-Level Design Optimization

Outline
• Motivation
• Communication Synthesis
• Buffered Interconnect Model
• Interconnect Optimization
• Validation and Significance Assessment
• Conclusions

Page 25: Interconnect Modeling for Improved System-Level Design Optimization

Model Validation
• Model comparison with results from physical implementation
  • {5mm wire} × {90nm, 65nm} × {wiring layers} × {design styles}
  • Model-predicted delays compared with delays from PrimeTime
• Deviation of proposed model from PrimeTime delays < 15%
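The deviation figure quoted above is a relative error against the reference timer. A minimal Python sketch of that comparison follows; the function names are illustrative, and the presentation does not state whether the bound refers to the maximum or the average deviation over the sweep.

```python
from typing import Iterable, Tuple


def percent_deviation(model: float, reference: float) -> float:
    """Absolute deviation of a model value from a reference value, in percent."""
    return abs(model - reference) / abs(reference) * 100.0


def max_deviation(pairs: Iterable[Tuple[float, float]]) -> float:
    """Largest percent deviation over (model, reference) delay pairs."""
    return max(percent_deviation(m, r) for m, r in pairs)
```

Each (model, reference) pair would correspond to one point of the sweep over technology node, wiring layer, and design style.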

Page 26: Interconnect Modeling for Improved System-Level Design Optimization

Impact on System-Level Design

Testcases
• VPROC: video processor with 42 cores and 128-bit datawidth
• dVOPD: dual video object plane decoder with 26 cores and 128-bit datawidth

• Original model (Orig.) underestimates power compared to the Proposed Model (Prop.)
• Original model is very optimistic in delay (i.e., the synthesis result may actually be infeasible)
• This could become more critical as technology scales and the chip size becomes larger than the critical sequential length

SOC   | Tech. | Dynamic Power (mW) | Leakage Power (mW) | Device Area (mm x mm) | Total Area (mm x mm) | Avg # of hops | Max # of hops
      |       | Orig.  | Prop.     | Orig.  | Prop.     | Orig.  | Prop.        | Orig.  | Prop.       | Orig. | Prop. | Orig. | Prop.
VPROC | 90nm  | 117.3  | 364.8     | 38.1   | 99.6      | 0.070  | 0.009        | 0.370  | 0.346       | 3.09  | 3.01  | 4     | 5
VPROC | 65nm  | 51.1   | 179.9     | 69.9   | 86.7      | 0.036  | 0.007        | 0.217  | 0.223       | 3.10  | 3.42  | 4     | 6
dVOPD | 90nm  | 63.4   | 88.0      | 14.2   | 32.5      | 0.026  | 0.003        | 0.141  | 0.162       | 1.76  | 1.76  | 3     | 3
dVOPD | 65nm  | 27.3   | 73.2      | 25.7   | 33.2      | 0.013  | 0.003        | 0.082  | 0.085       | 1.76  | 1.91  | 3     | 4

Page 27: Interconnect Modeling for Improved System-Level Design Optimization

Outline
• Motivation
• System-Level Communication Synthesis
• Buffered Interconnect Model
• Interconnect Optimization
• Validation and Significance Assessment
• Conclusions

Page 28: Interconnect Modeling for Improved System-Level Design Optimization

Conclusions and Future Directions
• Accurate models can drive effective system-level exploration
  • Inaccurate models can lead to misleading design targets
• Reproducible methodology for extracting inputs to models from reliable sources
• More realistic buffering scheme, where power and area are considered in addition to delay
• Modeling of NoC components besides wires
  • Across future nanometer technologies (45nm and beyond)
  • At different levels of abstraction
    • protocol encapsulation (e.g., hand-shaking for AMBA bus allocation)
    • buses, pipelined rings (e.g., EIB in IBM Cell)
    • routers, network interfaces
    • FIFOs, queues, crossbar switches (where ORION left off)
    • from high-level analytical models to low-level executable models
• Extending to other metrics
  • Reliability estimation (i.e., error probability of transmission over wires)