Digital Integrated Circuits A Design Perspective Design Methodologies Jan M. Rabaey Anantha...

Preview:

Citation preview

Digital Integrated CircuitsA Design Perspective

DesignMethodologies

Jan M. RabaeyAnantha ChandrakasanBorivoje Nikolic

A System-on-a-Chip: Example

Courtesy: Philips

Impact of Implementation ChoicesEn

ergy

Effi

cien

cy (i

n M

OPS

/mW

)

Flexibility(or application scope)

0.1-1

1-10

10-100

100-1000

None Fullyflexible

Somewhatflexible

Har

dwire

d cu

stom

Confi

gura

ble/

Para

met

eriza

ble D

omai

n-sp

ecifi

c pr

oces

sor

(e.g

. DSP

)

Embe

dded

mic

ropr

oces

sor

Design Methodology

• Design process traverses iteratively between three abstractions: behavior, structure, and geometry• More and more automation for each of these steps

Implementation Choices

Custom

Standard CellsCompiled Cells Ma cro Cells

Cell-based

Pre-diffused(Gate Arrays)

Pre-wired(FPGA's)

Array-based

Semicustom

Digital Circuit Implementation Approaches

The Custom Approach

Intel 4004

Courtesy Intel

Transition to Automation and Regular Structures

Intel 4004 (‘71)Intel 8080 Intel 8085

Intel 8286 Intel 8486Courtesy Intel

Standard Cell — Example

[Brodersen92]

Standard Cell – The New Generation

Cell-structurehidden underinterconnect layers

Standard Cell - Example

3-input NAND cell(from ST Microelectronics):C = Load capacitanceT = input rise/fall time

Automatic Cell Generation

Courtesy Acadabra

Initial transistorgeometries

Placedtransistors

Routedcell

Compactedcell

Finishedcell

A Historical Perspective: the PLA

x0 x1 x2

ANDplane

x0 x1

x2

Product terms

ORplane

f0 f1

Two-Level Logic

Inverting format (NOR-NOR) more effective

Every logic function can beexpressed in sum-of-productsformat (AND-OR)

minterm

PLA Layout – Exploiting Regularity

f0 f1x0 x0 x1 x1 x2 x2Pull-up devices Pull-up devices

V DD GNDfAnd-Plane Or-Plane

Breathing Some New Life in PLAsRiver PLAs• A cascade of multiple-output PLAs.• Adjacent PLAs are connected via river routing.

PRE-CHARGE

PRE-

CHARGE

PRE-CHARGE

PRE-CHARGE

BUFFER

BUFFER

BUFFER

BUFFER

PRE-CHARGE

PRE-CHARGE

BUFFER

BUFFER

PRE-CHARGE

PRE-

CHARGE

BUFFERBUFFER

• No placement and routing needed. • Output buffers and the input buffers of

the next stage are shared.

Courtesy B. Brayton

Experimental Results

Layout of C2670

Network of PLAs, 4 layers OTC

River PLA,2 layers no additional routing

Standard cell, 2 layers channel routing

Standard cell,3 layers OTC

0.2

0.6

1

1.4

0 2 4 6 area

delay

SC NPLA RPLA

Area: RPLAs (2 layers) 1.23 SCs (3 layers) - 1.00, NPLAs (4 layers) 1.31 DelayRPLAs 1.04SCs 1.00 NPLAs 1.09 Synthesis time: for RPLA , synthesis time equals design time; SCs and NPLAs still need P&R.

Also: RPLAs are regular and predictable

MacroModules

25632 (or 8192 bit) SRAMGenerated by hard-macro module generator

“Soft” MacroModules

Synopsys DesignCompiler

“Intellectual Property”

A Protocol Processor for Wireless

Semicustom Design Flow

HDL

Logic Synthesis

Floorplanning

Placement

Routing

Tape-out

Circuit Extraction

Pre-Layout Simulation

Post-Layout Simulation

Structural

Physical

BehavioralDesign Capture

Des

ign

Itera

tion

The “Design Closure” Problem

Courtesy Synopsys

Iterative Removal of Timing Violations (white lines)

Integrating Synthesis with Physical Design

Physical Synthesis

RTL (Timing) Constraints

Place-and-RouteOptimization

Artwork

Netlist with Place-and-Route Info

MacromodulesFixed netlists

Pre-diffused(Gate Arrays)

Pre-wired(FPGA's)

Array-based

Late-Binding Implementation

Gate Array — Sea-of-gates

rows of

cells

routing channel

uncommitted

VDD

GND

polysilicon

metal

possiblecontact

In1 In2 In3 In4

Out

UncommitedCell

CommittedCell(4-input NOR)

Sea-of-gate Primitive Cells

NMOS

PMOS

Oxide-isolation

PMOS

NMOS

NMOS

Using oxide-isolation Using gate-isolation

Sea-of-gates

Random Logic

MemorySubsystem

LSI Logic LEA300K(0.6 mm CMOS)

Courtesy LSI Logic

The return of gate arrays?

metal-5 metal-6

Via-programmable cross-point

programmable via

Via programmable gate array(VPGA)

[Pileggi02]

Exploits regularity of interconnect

Prewired ArraysClassification of prewired arrays (or field-programmable devices):• Based on Programming Technique

– Fuse-based (program-once)– Non-volatile EPROM based– RAM based

• Programmable Logic Style– Array-Based– Look-up Table

• Programmable Interconnect Style– Channel-routing– Mesh networks

Fuse-Based FPGA

antifuse polysilicon ONO dielectric

n+ antifuse diffusion

2 l

From Smith97

Open by default, closed by applying current pulse

Array-Based Programmable Logic

PLA PROM PAL

I 5 I 4

O 0

I 3 I 2 I 1 I 0

O 1O 2O 3

Programmable AND array

ProgrammableOR array I 5 I 4

O 0

I 3 I 2 I 1 I 0

O 1O 2O 3

Programmable AND array

Fixed OR array

Indicates programmable connection

Indicates fixed connection

O0

I 3 I 2 I 1 I 0

O1O2O3

Fixed AND array

ProgrammableOR array

Programming a PROM

f0

1 X 2 X 1 X 0

f1NANA

: programmed node

2-input mux as programmable logic block

FA 0

B

S

1

Configuration

A B S F=

0 0 0 00 X 1 X0 Y 1 Y0 Y X XYX 0 YY 0 XY 1 X X 1 Y1 0 X1 0 Y1 1 1 1

XYXY

XY

LUT-Based Logic Cell

Courtesy Xilinx

D 4

C 1 ....C4

xxxxxx

D 3

D 2

D 1

F4

F3

F2

F1

Logicfunction

ofxxx

Logicfunction

ofxxx

Logicfunction

ofxxx

xx

xx

4

xxxxxx

xxxxxxxx

xxx

xxxx xxxx xxxx

HP

Bitscontrol

Bitscontrol

Multiplexer Controlledby Configuration Program

x

xx

x

xx

xxx xx

xxxx

x

xxxxxx

xx

x

xx

xxx

xx

Xilinx 4000 Series

Figure must be updated

Array-Based Programmable Wiring

Input/output pinProgrammed interconnection

InterconnectPoint

Horizontaltracks

Vertical tracks

Cell

Mesh-based Interconnect Network

Switch Box

Connect Box

InterconnectPoint

Courtesy Dehon and Wawrzyniek

Transistor Implementation of Mesh

Courtesy Dehon and Wawrzyniek

Hierarchical Mesh Network

Use overlayed meshto support longer connections

Reduced fanout and reduced resistance

Courtesy Dehon and Wawrzyniek

EPLD Block Diagram

MacrocellPrimary inputs

Courtesy Altera

Altera MAX

From Smith97

Altera MAX Interconnect Architecture

LAB2

PIA

LAB1

LAB6

t PIA

t PIA

row channelcolumn channel

LAB

Courtesy Altera

Array-based(MAX 3000-7000)

Mesh-based(MAX 9000)

Field-Programmable Gate ArraysFuse-based

I/O Buffers

P rogram/Test/Diag nostics

I/O Buffers

I/O B

uffe

rs

I/O B

uffe

rs

Vertical ro utes

Rows o f logic m odule s

Routing channels

Standard-cell likefloorplan

Xilinx 4000 Interconnect Architecture

2

12

8

4

3

2

3

CLB

8 4 8 4

Quad

Single

Double

Long

DirectConnect

DirectConnect

Quad Long GlobalClock

Long Double Single GlobalClock

CarryChain

Long12 4 4

Courtesy Xilinx

RAM-based FPGA

Xilinx XC4000ex

Courtesy Xilinx

A Low-Energy FPGA (UC Berkeley)

Array Size: 8x8 (2 x 4 LUT) Power Supply: 1.5V & 0.8V Configuration: Mapped as RAM Toggle Frequency: 125MHz Area: 3mm x 3mm

Larger Granularity FPGAs

1-mm 2-metalCMOS tech

1.2 x 1.2 mm2

600k transistors

208-pin PGA

fclock = 50 MHz

Pav = 3.6 W @ 5V

Basic Module: Datapath

PADDI-2 (UC Berkeley)

Design at a crossroadSystem-on-a-Chip

RAM

500 k Gates FPGA+ 1 Gbit DRAMPreprocessing

Multi-

SpectralImager

mCsystem+2 GbitDRAMRecog-

nition

Ana

log

64 SIMD ProcessorArray + SRAM

Image Conditioning100 GOPS

Embedded applications where cost, performance, and energy are the real issues!

DSP and control intensive Mixed-mode Combines programmable and

application-specific modules Software plays crucial role

Addressing the Design Complexity IssueArchitecture Reuse

Reuse comes in generationsGeneration Reuse element Status

1st Standard cells Well established

2nd IP blocks Being introduced

3rd Architecture Emerging

4th IC Early research

Source: Theo Claasen (Philips) – DAC 00

Architecture ReUse

• Silicon System Platform– Flexible architecture for hardware and software– Specific (programmable) components– Network architecture– Software modules– Rules and guidelines for design of HW and SW

• Has been successful in PC’s– Dominance of a few players who specify and control architecture

• Application-domain specific (difference in constraints)– Speed (compute power)– Dissipation– Costs– Real / non-real time data

Platform-Based Design

• A platform is a restriction on the space of possible implementation choices, providing a well-defined abstraction of the underlying technology for the application developer

• New platforms will be defined at the architecture-micro-architecture boundary

• They will be component-based, and will provide a range of choices from structured-custom to fully programmable implementations

• Key to such approaches is the representation of communication in the platform model

“Only the consumer gets freedom of choice;designers need freedom from choice”

(Orfali, et al, 1996, p.522)

Source:R.Newton

Berkeley Pleiades Processor

• 0.25um 6-level metal CMOS • 5.2mm x 6.7mm• 1.2 Million transistors• 40 MHz at 1V• 2 extra supplies: 0.4V, 1.5V• 1.5~2 mW power dissipation

Interface

Reconfigurable

Data-path

FPGA

ARM8 Core

Heterogeneous Programmable Platforms

Xilinx Vertex-II Pro

Courtesy Xilinx

High-speed I/O

Embedded PowerPcEmbedded memories

Hardwired multipliers

FPGA Fabric

Summary

• Digital CMOS Design is kicking and healthy• Some major challenges down the road caused by

Deep Sub-micron– Super GHz design– Power consumption!!!!– Reliability – making it workSome new circuit solutions are bound to emerge

• Who can afford design in the years to come? Some major design methodology change in the making!

Recommended