CSE241 VLSI Digital Circuits Winter 2003 Lecture 03: ASIC Flow and Design Convergence

This Class + Logistics
Overview of flow (preparation for Smith Chapters 12-17)
Read: Smith Chapter 12 (Synthesis), 13.7 (Static timing)
Lab #1 revised due date: Monday, January 20

Near-term schedule:
Ben has reserved the lab (EBU I, Room 3329) for this Friday, January 17, noon-1:20pm, for a running start into synthesis
Recitation #2 tomorrow (noon-12:50pm): not on RTL design, but on datapaths and memories
Lab tomorrow (3:30-5pm): really Lab #1
Slide courtesy of S. P. Levitan, U. Pittsburgh
CSE241 L3 ASICs.*
What happens when you make a gate bigger?
What happens when you make a wire taller? Wider?
RC delay
How do these issues impact estimates and design approaches?
Slide courtesy of S. P. Levitan, U. Pittsburgh
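The sizing questions above can be explored with a first-order model. The following is a minimal sketch, not a process-accurate calculator: it assumes a rectangular wire, an Elmore-style lumped delay, and made-up coefficients (`rho_ohm_um`, `c_area_ff_per_um2`, `c_side_ff_per_um2`, `r_unit_ohm`). All function names are hypothetical.

```python
def wire_rc(length_um, width_um, thickness_um,
            rho_ohm_um=0.022, c_area_ff_per_um2=0.04, c_side_ff_per_um2=0.05):
    """Return (R in ohms, C in fF) for a rectangular wire.

    R falls with the cross-section (width * thickness).
    C has an area term that grows with width and a sidewall term that
    grows with thickness (coupling to neighbours on both sides).
    All default coefficients are illustrative, not from a real process.
    """
    r = rho_ohm_um * length_um / (width_um * thickness_um)
    c = length_um * (c_area_ff_per_um2 * width_um
                     + 2 * c_side_ff_per_um2 * thickness_um)
    return r, c


def stage_delay_ps(driver_size, length_um, width_um, thickness_um,
                   r_unit_ohm=10_000.0, c_load_ff=10.0):
    """Elmore delay (ps) of a sized driver, a wire, and a lumped load."""
    r_drv = r_unit_ohm / driver_size      # a bigger gate has lower resistance
    r_w, c_w = wire_rc(length_um, width_um, thickness_um)
    # ohm * fF = 1e-15 s = 1e-3 ps
    return 1e-3 * (r_drv * (c_w + c_load_ff) + r_w * (c_w / 2 + c_load_ff))
```

Under these assumptions, doubling the width or the thickness halves the wire resistance but raises its capacitance. A larger driver lowers its own resistance, at the price of more input capacitance presented to the previous stage (which this sketch does not model).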
Basic flow
Semi-Custom (strong infrastructure, economical in lower volumes)
ASIC (Application-Specific Integrated Circuit)
Mixed-Signal / RF (unique to each process, no scaling)
Slide courtesy of S. P. Levitan, U. Pittsburgh
What are some implications?
Second, this is the roadmap for optical lithography.
Starting from 350-nanometer process generation, we are making features that are smaller than the wavelength of light.
The key: to stay on the roadmap, we will need sub-50 nanometer processes manufactured with 157 nanometer lasers.
but average only 500 wafers per set
The first driver is non-recurring engineering cost.
Certainly, OPC and Phase-Shift Masks increase this part of the system cost.
According to SEMATECH, a 25-level mask set will cost around 1 million dollars in the 130 nanometer process generation, which arrives one year early, in 2001.
Also according to SEMATECH, the average number of wafers that are processed with a given mask set is only 500.
So, we can afford to make only high-value designs!
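The arithmetic behind this conclusion is short enough to sketch. The mask-set cost and wafer count below are the SEMATECH figures quoted above; the number of good dies per wafer (200) is purely hypothetical, as are the function names.

```python
def mask_cost_per_wafer(mask_set_cost, wafers_per_mask_set):
    """Mask NRE amortized over the wafers processed with one mask set."""
    return mask_set_cost / wafers_per_mask_set


def mask_cost_per_die(mask_set_cost, wafers_per_mask_set, good_dies_per_wafer):
    """Mask NRE amortized over every good die from one mask set."""
    per_wafer = mask_cost_per_wafer(mask_set_cost, wafers_per_mask_set)
    return per_wafer / good_dies_per_wafer


# $1M mask set, 500 wafers on average (figures from the text above)
per_wafer = mask_cost_per_wafer(1_000_000, 500)
# 200 good dies per wafer is an assumed value for illustration only
per_die = mask_cost_per_die(1_000_000, 500, 200)
```

With these numbers, each wafer carries $2000 of mask cost before any processing cost is counted, which is why only high-value designs can absorb it.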
2-3X more verification engineers than designers on microprocessor teams
Software = 80% of system development cost (and Analog design hasn’t scaled)
Design NRE > 10’s of $M; manufacturing NRE $1M
Design TAT = months or years; manufacturing TAT = weeks
Without DFT, test cost per transistor grows exponentially relative to mfg cost
Incremental Cost Per Transistor
In summary, the design productivity gap goes hand in hand with a design QUALITY gap. Together, these eventually threaten the ASIC business model.
Design technology must deliver high-value silicon with low cost.
Non-ideal scaling (leakage, power management, circuit/device innovation, current delivery)
Coupled high-frequency devices and interconnects (signal integrity analysis and management)
Manufacturing variability (library characterization, analog and digital circuit performance, error-tolerant design, layout reusability, static performance verification methodology/tools)
Scaling of global interconnect performance (communication, synchronization)
Decreased reliability (SEU, gate insulator tunneling and breakdown, joule heating and electromigration)
Complexity of manufacturing handoff (reticle enhancement and mask writing/inspection flow, manufacturing NRE cost)
Reuse (hierarchical design support, heterogeneous SOC integration, reuse of verification/test/IP)
Verification and test (specification capture, design for verifiability, verification reuse, system-level and software verification, AMS self-test, noise-delay fault tests, test reuse)
Cost-driven design optimization (manufacturing cost modeling and analysis, quality metrics, die-package co-optimization, …)
Embedded software design (platform-based system design methodologies, software verification/analysis, codesign w/HW)
Reliable implementation platforms (predictable chip implementation onto multiple fabrics, higher-level handoff)
Design process management (team size / geog distribution, data mgmt, collaborative design, process improvement)
Basic flow
Floorplanning and custom WLM
Power distribution (Internal, I/O)
I/O driver, padring design
Models and technology data required to execute the design flow
Power, timing: ALF, DCL, OLA, .lib, STAMP
Delays and path timing, parasitics: SDF, GCF, SDC, DSPF, RSPF, SPEF, SPICE
Layout rules: Dracula, Calibre “deck”
Specification to RTL
Defines the logic and fundamental structure of the chip at the RTL level in either Verilog or VHDL
Requires considerable interaction with the customer, plus specs such as the architecture, system, design, test and block specs
May include RTL from the customer or third party IP providers
Coding guidelines should be established and adhered to, and the code must be compatible with the chosen synthesis tool
Special design considerations such as multiple clock frequencies, asynchronous logic, high speed logic, race conditions, gated clocks, etc. must be addressed
RTL Simulation
RTL code, written in Verilog, VHDL or a combination of both, is simulated to verify functional correctness
Testbenches apply input stimulus to the design
Several methods are used to verify the outputs
Self-checking testbenches automatically verify output correctness and report mismatches
Results can be stored in a file and compared to previous results
Waveform displays can be used to interactively verify the outputs
Verification-specific tools: Verisity Specman, Synopsys Vera
Functional verification
Mostly Modelsim
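A self-checking testbench, as described above, can be sketched in a few lines. This example is in Python rather than Verilog or VHDL, purely for illustration. `reference_adder`, `dut_adder`, and `run_testbench` are hypothetical names, and the 4-bit adder is only a stand-in for a real design under test.

```python
import itertools


def reference_adder(a, b, width=4):
    """Golden model: unsigned add returning (sum, carry_out)."""
    total = a + b
    return total & ((1 << width) - 1), total >> width


def dut_adder(a, b, width=4):
    """Stand-in for the design under test: a bitwise ripple-carry adder."""
    carry, result = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def run_testbench(dut, golden, width=4):
    """Apply every input pair and collect mismatches automatically."""
    mismatches = []
    for a, b in itertools.product(range(1 << width), repeat=2):
        got, expected = dut(a, b, width), golden(a, b, width)
        if got != expected:
            mismatches.append((a, b, got, expected))
    return mismatches
```

An empty list from `run_testbench(dut_adder, reference_adder)` means every stimulus matched the golden model; any entries identify the failing inputs together with the observed and expected outputs.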
Decide on the physical layout strategy—flat or hierarchical?
Advantages of a flat implementation are generally a smaller die size, and a more straightforward approach to clock and power distribution and RC generation
Advantages of a hierarchical design:
better runtimes
better ability to control timing within localized areas of the design
concurrent design
Issues for hierarchical design:
assignment of the physical locations for the block pins
timing budgeting
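Timing budgeting can be illustrated with a simple proportional split. This is only one possible scheme, assumed here for illustration: each block on a path receives a share of the clock period in proportion to its estimated delay. The function name and the `margin` parameter are hypothetical.

```python
def budget_path(clock_period, est_delays, margin=0.0):
    """Split the available time across blocks by estimated delay.

    est_delays maps block name -> estimated delay through that block.
    Returns block name -> time budget; budgets sum to period - margin.
    """
    available = clock_period - margin
    total = sum(est_delays.values())
    if total <= 0:
        raise ValueError("estimated delays must sum to a positive value")
    return {blk: available * d / total for blk, d in est_delays.items()}


# A path crossing blocks A and B with a 10 ns clock (example values)
budgets = budget_path(10.0, {"A": 2.0, "B": 6.0})
```

Each block can then be synthesized against its own budget as if it were a standalone constraint, which is what makes concurrent block-level design possible.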
Take advantage of RTL hierarchy
Generate a physical hierarchy
Place big blocks on chip (memories)
Allow space for power/clk/busses
Reduce complexity of placement
Target foundry specific library
Input driving cells, output loading
False paths, multi-cycle paths
Interconnect delay is estimated from a wireload model, which predicts each net's parasitics (and hence its delay) from its fanout
Clock parameters (insertion delay, skew, jitter, etc.) are assumed to be attainable later in place and route
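The fanout-based wireload model mentioned above can be sketched as a lookup table with linear extrapolation. The table values, the slope, and the pin capacitance are all hypothetical numbers chosen for illustration; real values come from the foundry library.

```python
# Estimated wire capacitance (pF) for fanout 1..4; hypothetical values
WLM_CAP_PF = [0.010, 0.022, 0.036, 0.052]
# Extra capacitance (pF) per fanout beyond the end of the table
WLM_SLOPE_PF = 0.018


def wireload_cap_pf(fanout):
    """Estimate net capacitance from fanout alone (no placement data)."""
    if fanout < 1:
        raise ValueError("fanout must be >= 1")
    if fanout <= len(WLM_CAP_PF):
        return WLM_CAP_PF[fanout - 1]
    return WLM_CAP_PF[-1] + WLM_SLOPE_PF * (fanout - len(WLM_CAP_PF))


def net_delay_ns(fanout, drive_res_kohm, pin_cap_pf=0.005):
    """Lumped RC delay: driver resistance times (wire + pin) capacitance."""
    c_total = wireload_cap_pf(fanout) + fanout * pin_cap_pf
    return drive_res_kohm * c_total      # kohm * pF = ns
```

Because the estimate depends only on fanout, two nets with the same fanout get the same delay regardless of where their cells end up, which is the source of the prediction error that later slides discuss.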
Formal Verification
RTL description and gate level netlist are compared to verify functional equivalence, thereby verifying the synthesis results
An emerging technology that supplements the more traditional approach of gate level simulation
Synopsys Formality (will be used in-class)
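The idea of comparing an RTL description against a gate-level netlist can be shown on a toy example. This is a brute-force sketch that enumerates every input vector; it is not how a production equivalence checker works internally, and the 2:1 mux, the NAND netlist, and all function names are hypothetical.

```python
from itertools import product


def rtl_mux(sel, a, b):
    """RTL intent: out = sel ? a : b."""
    return a if sel else b


def gate_mux(sel, a, b):
    """Hypothetical synthesized netlist built only from NAND gates."""
    def nand(x, y):
        return 1 - (x & y)
    n_sel = nand(sel, sel)                      # inverter
    return nand(nand(sel, a), nand(n_sel, b))


def equivalent(f, g, n_inputs):
    """Return (True, None), or (False, first counterexample vector)."""
    for vec in product((0, 1), repeat=n_inputs):
        if f(*vec) != g(*vec):
            return False, vec
    return True, None
```

Unlike simulation with a chosen set of test vectors, the comparison covers the whole input space, so a mismatch always comes with a concrete counterexample. Exhaustive enumeration only scales to tiny functions, of course.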
Gate Level Simulation
Another method to verify the synthesis process, which covers both the functionality and timing
Correctness is only as good as the test vectors that are used
Especially critical for non-synchronous designs and for verifying false-path and multi-cycle path constraints
Cell timing is included in the simulation models and interconnect delay is passed from the synthesis run
Worst case PVT conditions are used to analyze for setup violations, and best case PVT conditions are used to analyze for hold violations
PVT = Process, Voltage, Temperature
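The pairing of worst-case conditions with setup checks and best-case conditions with hold checks can be sketched as follows. The derating factors (1.3 for the slow corner, 0.7 for the fast corner) are assumed values for illustration, and the function names are hypothetical. `skew` is taken as capture-clock arrival minus launch-clock arrival.

```python
def setup_slack(clock_period, clk_to_q, logic_delay_max, t_setup, skew=0.0):
    """Slack for data launched on one edge and captured on the next."""
    return clock_period + skew - (clk_to_q + logic_delay_max + t_setup)


def hold_slack(clk_to_q, logic_delay_min, t_hold, skew=0.0):
    """Slack against data racing through to the same capturing edge."""
    return clk_to_q + logic_delay_min - (t_hold + skew)


def corner_slacks(period, clk_to_q, logic_delay, t_setup, t_hold,
                  slow=1.3, fast=0.7):
    """Setup is checked with slow-corner delays, hold with fast-corner."""
    setup = setup_slack(period, clk_to_q * slow, logic_delay * slow, t_setup)
    hold = hold_slack(clk_to_q * fast, logic_delay * fast, t_hold)
    return setup, hold
```

Slow silicon makes long paths longer, so it is the pessimistic case for setup; fast silicon makes short paths shorter, so it is the pessimistic case for hold. A negative slack at either corner is a violation.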
Implicitly assumes correct timing constraints (!), e.g., boundary conditions
Timing constraints are similar to those used in synthesis
Verifies setup and hold times at FF inputs; can also check timing from and to primary inputs (PIs) and primary outputs (POs); can also check point-to-point delay values (with blocking of pins, etc.)
As with gate-level simulation, both best- and worst-case analysis is performed
Typically performed on full-chip (not block) basis
May require modified constraints for inter-block issues: multiple clock domains, multi-cycle paths, etc.
For compatibility with timing-driven layout flow, helps to have simple / single set of constraints
Other issues: incremental analysis, …
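The core of static timing analysis, propagating worst-case arrival times through the netlist without applying test vectors, can be sketched on a small graph. This is a minimal illustration under simplifying assumptions (one delay per edge, a single clock, no false or multi-cycle paths); the node names and delays are hypothetical.

```python
from graphlib import TopologicalSorter


def arrival_times(edges, input_arrival=0.0):
    """Latest arrival time at every node of a combinational DAG.

    edges maps (src, dst) -> delay along that timing arc.
    """
    preds = {}
    for (src, dst), delay in edges.items():
        preds.setdefault(dst, {})[src] = delay
        preds.setdefault(src, {})
    order = TopologicalSorter({n: set(p) for n, p in preds.items()})
    arrival = {}
    for node in order.static_order():       # predecessors come first
        arrival[node] = max(
            (arrival[s] + d for s, d in preds[node].items()),
            default=input_arrival,           # nodes with no fan-in
        )
    return arrival


EDGES = {("a", "g1"): 1.0, ("b", "g1"): 1.0, ("g1", "g2"): 2.0,
         ("b", "g2"): 0.5, ("g2", "ff_d"): 0.3}
ARRIVAL = arrival_times(EDGES)
```

The setup check at the flip-flop input then reduces to comparing `ARRIVAL["ff_d"]` plus the setup time against the clock period. Replacing `max` with `min` gives the earliest arrival used for hold checks.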
Define the standard cell rows and I/O placement locations
Place rams and other macro cells
Define power bus structures such as power rings and stripes
Often performed using the standard place and route tool
Rules of thumb for cell density are used to initially calculate design size
Popular standalone tools are Cadence’s Design Planner and Avant!’s Planet
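The cell-density rule of thumb mentioned above amounts to dividing the total standard cell area by a target utilization. A minimal sketch, assuming a square core and a default utilization of 0.7 (an illustrative value, not a recommendation); the function name is hypothetical.

```python
import math


def core_side_um(total_cell_area_um2, macro_area_um2=0.0, utilization=0.7):
    """Side length of a square core for an initial size estimate.

    Standard cells are assumed to fill `utilization` of the area that
    remains after macros (RAMs, hard IP) are placed.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    core_area = total_cell_area_um2 / utilization + macro_area_um2
    return math.sqrt(core_area)
```

For example, 700,000 square microns of cells at 70% utilization gives a core of roughly 1000 microns on a side; the pad ring and power rings then add to that.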
Generate clock trees
Route clock lines
Route signal interconnects
Timing driven tools
Require timing constraints and analysis algorithms similar to those used during the static timing analysis step
Based on placement of cells
Routing segments
Extracts capacitance between metal segments
RC data is transferred to
Static timing analysis (back annotation)
Gate level simulation
Tools used:
CSE241 L3 ASICs.*
SI aware routing
Logic equivalence
Comparison of pre- and post-layout netlist
Similar to the formal verification step after synthesis, except that clock tree insertion, drive strength changes, etc. have now been made
Buffer insertion or logic optimization may have been performed
LVS – Layout Versus Schematic
Verifies that layout and netlist are equivalent at the transistor level
Antenna check: manufacturing check for long nets
Net can accumulate charge during plasma etch and damage gate oxide
Final merge of layout, routing and placement data for mask production
Example tools:
Cadence Dracula, Diva
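The manufacturing check for long nets described above (commonly called an antenna check) is usually expressed as a limit on the ratio of charge-collecting metal area to the gate area it connects to. The sketch below assumes a single-layer ratio and a hypothetical limit of 400; real rule decks are more detailed, and the function name is made up.

```python
def antenna_violations(nets, max_ratio=400.0):
    """Return the names of nets whose metal/gate area ratio is too high.

    nets maps net name -> (metal_area_um2, gate_area_um2).
    max_ratio is an assumed limit; actual values are process-specific.
    """
    return sorted(
        name for name, (metal, gate) in nets.items()
        if metal / gate > max_ratio
    )


# Hypothetical example: one short net and one long net
EXAMPLE_NETS = {"short": (50.0, 1.0), "long": (900.0, 1.5)}
VIOLATIONS = antenna_violations(EXAMPLE_NETS)
```

Here the long net has a ratio of 600 and is flagged, while the short net at 50 passes.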
Metal fill and metal stress relief rules are checked
Manufacturing information such as scribe lanes, seal rings, mask shop data, part numbers, logos and pin 1 identification information for assembly are also added
DRC and LVS are run to verify the correctness of the modified database
‘Tapeout’ documentation is prepared prior to release of the GDSII to the foundry
Pad location information is prepared, typically in a spreadsheet
Cadence’s Virtuoso is used for custom-manual edits of the mask layers
Manufacturing steps
Basic flow
Multiple design files are converged into one efficient Data Model
Disk accesses are eliminated in critical methodology loops
Verification of function, performance, testability and other design criteria all move to earlier, higher levels of abstraction, followed by equivalence checking and
Incremental modular tools for optimization and analysis
“go”: recipe for invocation and composition of SP&R results
“no go”: diagnosis of RTL code problems
Logical and physical hierarchies co-evolve
spatial: top-down coarse placement physical hierarchy
logic/timing: implementable RTL logical hierarchy
limits of human fanout, organizations always have hierarchy
Have seen a natural sequence of no-floorplanning, physical-floorplanning, RTL-floorplanning... as chip complexities increase
Details (must construct, predict, ignore, eliminate, ...)
pin optimizations, interconnect planning, hierarchy reconciliations, budgeting mechanisms, compatibility with downstream SP&R, ...
Here is my view of what the Design Closure tool must do.
The input to the tool is RTL Verilog, IP and technology library, and constraints.
The “GO” output is a guaranteed recipe for using the synthesis, place and route (SP&R) back end.
Or, the “NO GO” output is a diagnosis of why the RTL was bad.
I will now list some key technology “Do’s and Don’t’s” for making such a tool.
(schematic hierarchy also typical in structured-custom)
RTL design = logical/functional hierarchy
provides valuable clues for physical embedding: datapath structure, timing structure, etc.
can be incredibly misleading (e.g., all clock buffers in a single hierarchy block)
Main issues:
Subblocks in A connected with subblocks in B result in
600 top level nets.
Physical Partitioning
Physical partitioning reduced the number of top level nets from 600 to 0
Source: ReShape
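The effect of regrouping cells on the top-level net count can be reproduced on a toy netlist. This is only an illustration of the counting, not of the ReShape example itself; the cell names, block names, and the function are hypothetical.

```python
def top_level_nets(nets, block_of):
    """Return the nets whose cells fall in more than one block.

    nets maps net name -> list of cells; block_of maps cell -> block.
    """
    return sorted(
        net for net, cells in nets.items()
        if len({block_of[c] for c in cells}) > 1
    )


# Two nets, each connecting a subblock of A to a subblock of B
NETS = {"n1": ["a1", "b1"], "n2": ["a2", "b2"]}
# Logical hierarchy: a1, a2 in block A; b1, b2 in block B
LOGICAL = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
# Physical partition: group the cells that talk to each other
PHYSICAL = {"a1": "P0", "b1": "P0", "a2": "P1", "b2": "P1"}
```

With the logical grouping both nets cross the top level; with the physical grouping neither does, which mirrors the reduction described above on a much smaller scale.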
“Natural” Block Shapes
Are not disjoint rectangles, e.g., intersecting timing paths all want to be embedded as “straight paths”
Traditional chip floorplan = dissection into rectangles may not be optimum for wirelength and timing, but has compensating advantages (convenience)
Physical hierarchy = hierarchical, very structured organization of the core layout region
Potentially, little relation to high-quality (e.g., w.r.t. timing, routability) embedding of logic
Some obvious exceptions
hard IP blocks
And, physical hierarchy helps to define and plan global interconnects
Recent trend: try to avoid artifactual physical hierarchy created by top-down recursive bipartitioning-based placement approach
Convergence and Predictability
We seek a predictable, estimatable back end (physical implementation after some handoff level of design)
Predictability == regression models? (e.g., wireload models)
Predictability == an enforceable assumption? (“correct by construction”)
constant-delay paradigm (logical effort, DEC, IBM, Magma, ...)
Predictability == fast constructive prediction? (also “correct by construction”)
RT-level (Tera Systems), gate-level flat full-chip (Silicon Perspective Corp. FirstEncounter)
Predictability == remove the need for predictability?
GALS, LIS (global-asynchronous/local-synchronous; latency-independent synchronization)
“protocol- / communication-based system-level design”
Or, just make the loops tighter and easier (“construct by correction”)
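The constant-delay paradigm listed above is rooted in the method of logical effort, in which the minimum path delay is reached when every stage bears the same effort. A minimal sketch of that calculation follows, with delays in units of the technology's basic inverter delay; the function name and the example path are hypothetical.

```python
from math import prod


def min_path_delay(logical_efforts, parasitic_delays, c_in, c_load,
                   branching=None):
    """Return (minimum path delay, per-stage effort) by logical effort.

    Path effort F = G * B * H, where G is the product of logical
    efforts, B the product of branching efforts, and H = c_load / c_in.
    Delay is minimized when each of the N stages has effort F**(1/N).
    """
    n = len(logical_efforts)
    branching = branching or [1.0] * n
    path_effort = prod(logical_efforts) * prod(branching) * (c_load / c_in)
    stage_effort = path_effort ** (1.0 / n)
    return n * stage_effort + sum(parasitic_delays), stage_effort


# Three inverters (g = 1, p = 1) driving a load 64x the input capacitance
DELAY, EFFORT = min_path_delay([1, 1, 1], [1, 1, 1], c_in=1.0, c_load=64.0)
```

In this example each stage gets an effort of about 4 and the path delay is about 15 units. Fixing the delay of each stage first and deriving gate sizes afterwards is what lets the delay be treated as an enforceable assumption rather than a prediction.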
Global interconnect planning and optimization
symbolic route representations to support block plan ECOs
Controllable SP&R back end (including power/clock/scan)
Incremental / ECO optimizations, and optimizations that are “robust” under partial or imperfect design knowledge
Estimators (“initial wireload models”)
to account for optimizations (placement, ripup/reroute, timing)
“earliest RTL signoff with detailed P&R knowledge”
On the other hand, here are some required technologies that no one has developed yet.
RTL partitioning must create blocks that can be placed and budgeted well, and also must recognize bad RTL.
When the block areas and timings are not yet certain, global interconnect plans must support design changes smoothly.
The SP&R Back End must be very controllable and handle constraints well. For example, if some local routing resource in the block has been used up by a global bus, then place-and-route must work around this.
In general, optimization as the design changes, or if the design information is incomplete, becomes very important for convergence.
Finally, there is always a chicken-egg problem for top-down planning: always, there is some “initial wireload model” needed to drive the first synthesis and budgeting. So we do need better estimators that exploit netlist and timing structure to give better “initial wireload models”.
Driver sizing,
topology-based optimization
Sequence, DAC-2000
This is their favorite. It looks like Anakin Skywalker’s Pod Racer.
But of course we should be serious, and think clearly about the basics.
Block Area/Performance Estimation
Through buffer insertion
Interconnect Complexities
Interconnect effects play a major role in the increasing costs for large hard-block or rectilinear-outline based design styles
Probabilistic wireload models fail
Without new capabilities for soft IP design and assembly, interconnect problems will significantly impact performance and cost for emerging IC technologies
[Figure: plot with axis label “Occurrence Rate”]
Technology Scaling
Block sizes cannot grow as rapidly as chip sizes, since block design becomes increasingly difficult: each block is a chip design over multiple configurations
If the blocks are inflexible, the global wiring problems begin to dominate all aspects of performance quality and system cost
[Figure: plot with axis label “Occurrence Rate”]
Soft Blocks
With soft, flexible blocks, the system assembly can more thoroughly exploit the available technology
Interconnect problem is controlled via: soft boundaries for area re-shaping; re-synthesis and re-mapping for timing; smart wires; and top-down specified block synthesis
Cf. “Amoeba” placement, coloring analysis of “good” placements with respect to original logic hierarchy, etc.
[Figure: plot with axis label “Occurrence Rate”]
wire-planning methodology with block/cell global placement
global routing directives passed forward to chip finishing
constant-delay methodology may be used to guide sizing
Synopsys, (Magma)
placement-driven or placement-knowledgeable logic synthesis
Cadence, Avant!
placement, timing, sizing optimization tools
interface synthesis between blocks
