ADIC Scripts SS 2011 Complete

Integrated Electronic Systems Lab

Advanced Digital Integrated Circuit

Design

Summer Term 2011

Prof. Dr.-Ing. Klaus Hofmann

M. Tech. Ashok Jaiswal

www.ies.tu-darmstadt.de



Contents

1. Introduction

2. Repetition MOS Transistors

3. Short Channel MOS

4. MOS Spice Model

5. CMOS Inverter

6. CMOS Technology

7. CMOS Logic

8. Passtransistor Logic

9. Memory Elements and Dynamic Logic

10. Performance

11. CAD and Design Flow

12. Digital Subsystem Design

13. FSM

14. ASIC Design Concepts

15. Programmable Logic Devices

16. Arithmetic Units

17. Microarchitectures

18. Semiconductor Memory

19. ASIC Design Guidelines

20. Testing

21. Future Trends

Exercises




Organisational (I)

• This lecture is intended for students of the following subjects:

– Industrial Engineering with Electrical Engineering (Wirtschaftsingenieurwesen Elektrotechnik; FB1, 5th semester or later)

– Electrical Engineering and Information Technology (Elektrotechnik und Informationstechnik; FB18, 5th semester or later)

– Computer Science (Informatik; FB20, after the Vordiplom)

– International Master Program Information & Communication Engineering

• Requirements: Electronics, Logic Design (i.e. lecture "Elektronik" or "Analog Integrated Circuit Design")

• Courses that complement this lecture:

– Integrated Electronic Systems Lab. (SS)

– HDL-course and HDL-laboratory (2 weeks, full day course, SS)

– Computer Aided Design for Integrated Circuits (SS)


Organisational (II)

Lecture:

Tuesday 8:00 - 9:40 in room S3|06/052
Friday 8:00 - 9:40 in room S3|06/052

Practice: The exercises will take place within the lecture hours (Tue. or Fri., to be announced depending on progress)

Attending Staff:

Prof. Dr.-Ing. Klaus Hofmann
M.Tech. Ashok Jaiswal
Merckstrasse 25, 3rd floor

Information: You must register for this lecture and the exam using TUCAN. We will use the TUCAN messaging system to communicate.

Consultation hours: Directly after the lecture/exercise, or upon request


Exam

Examination:

Type: written exam
Date: will be announced by FB18 examination office
Duration: 90 minutes
Allowed materials: tbd
Relevant topics: Topics of lectures and exercises

You must register for this exam using TUCAN! (Some exceptions may apply, e.g. diploma students, or students from non FB-18/20 departments).


Overview

• Introduction

• Repetition MOS Devices

• CMOS Inverter

• CMOS Technology

• Static CMOS Logic

• Synchronous Logic

• Basic Sequential Circuits

• Performance

• CAD - Design Flow

• Digital Subsystem Design

• ASIC Design Concepts

• Arithmetic Units

• Micro Architectures

• Memories

• ASIC Design Guidelines

• Design for Testability

• VLSI in Signal Processing

• VLSI in Communications

• Digital Baseband Design

• Future Nanoscale CMOS


Literature

[1] John P. Uyemura: Fundamentals of MOS Digital Integrated Circuits, Addison Wesley, 1988

[2] John P. Uyemura: Circuit Design for CMOS VLSI, Kluwer Academic Publishers, 1992

[3] Neil Weste and Kamran Eshraghian: Principles of CMOS VLSI Design, Addison Wesley

[4] W. Maly: Atlas of IC Technologies: An Introduction to VLSI Processes, The Benjamin/Cummings Publishing Company, 1987

[5] Jan M. Rabaey: Digital Integrated Circuits - A Design Perspective, Prentice Hall, http://bwrc.eecs.berkeley.edu/Classes/IcBook/index.html

[6] Richard C. Jaeger: Microelectronic Circuit Design, McGraw-Hill


1. Introduction


SoC: Silicon Components Categories

Silicon components

– Discrete devices and optoelectronics

– Integrated circuits

• Analog and mixed signal

• Logic: logic, gate arrays, cell based, FPLDs, SoC, other

• Memory: DRAMs, SRAMs, Flash, other

• Microcomponents: microprocessors, microcontrollers, microperipherals

Modern SoCs can integrate different components


WW Semiconductor Sales 2008

Rank  Company             Origin        Revenue (Mio US$)  Market Share (%)
1     Intel Corp.         U.S.A.        33767              13.1
2     Samsung             South Korea   16902              6.5
3     Toshiba             Japan         11081              4.3
4     Texas Instrum.      U.S.A.        11068              4.3
5     STMicroelectronics  France/Italy  10325              4.0
6     Renesas             Japan         7017               2.7
7     Sony                Japan         6950               2.7
8     Qualcomm            U.S.A.        6477               2.5
9     Hynix               South Korea   6023               2.3
10    Infineon            Germany       5954               2.3
11    NEC Semi            Japan         5826               2.3
12    AMD                 U.S.A.        5455               2.1
13    Freescale           U.S.A.        4933               1.9
14    Broadcom            U.S.A.        4643               1.8
15    Panasonic           Japan         4473               1.7
16    Micron Tech         U.S.A.        4435               1.7
17    NXP                 Netherlands   4055               1.6
18    Sharp               Japan         3682               1.4
19    Elpida              Japan         3599               1.4
21    NVIDIA              U.S.A.        3241               1.3
24    Fujitsu Microelec   Japan         2757               1.1
      Top 25                            174464             67.5
      TOTAL                             258304             100.0

Foundries excluded (Revenue: TSMC: 10000 Mio US$, UMC: 3500 Mio US$)


Example 1: Commodity Microprocessor
Intel Core Duo (Penryn core), 2008

Application area: Mobile Computing, Desktop PC
Technology: 45 nm, Hafnium-based high-k, metal gate
Many (> 6-9) levels of interconnect (Al, Cu)
IP block based design
800 Mio transistors
Area: about 140 mm2

Selling price: at launch time about 150 US$


Example 2: Graphics DRAM
Qimonda 512 Mbit GDDR5, 2008

Die dimensions: 11326.74 µm × 9898 µm
Application area: high end graphic cards (ATI HD4870)
Data rate: up to 6 Gbit/s per pin (HD4870: 115 GB/s)
Technology: 75 nm, 3 metal layer interconnect (Al, W)
Area: 112 mm2
750 Mio transistors
Selling price: at launch time about 8 US$


Example 3: Analog/Mixed Signal RF
Infineon E-Gold Radio, 2005

Application area: BB+RF part of entry-level mobile phone
GSM/GPRS quadband
Support of camera, keyboard, 2 displays, MP3 ...
1st chip that combines logic + RF
Technology: 130 nm


Example 4: AMB
Infineon/Qimonda: Advanced Memory Buffer, 2006

Die blocks: High Speed Lanes, DDR2 interface (DQs), Digital Core Logic, DDR2 interface (CAs), PLL

Application area: High bandwidth server memory buffer (DDR2)
Max transfer rate per digital pair: 4.8 Gb/s; overall: max 115 Gb/s
Technology: 130 nm, 6 Cu + 1 Al layer
Area: 30.5 mm2
Power: 4-6 W


Example 4: Power / Area
Infineon/Qimonda: Advanced Memory Buffer, 2006

Comparison object in the figure (circular, ∅ 180 mm): Power: 1500 W, Area: 25400 mm2, Power Density: 0.059 W/mm2

AMB: Power: 6 W, Die Area: 30.5 mm2, Power Density: 0.196 W/mm2
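The two power densities on this slide follow directly from power divided by area. A minimal Python sketch that recomputes them from the slide's numbers (the function and variable names are illustrative, not from the lecture):

```python
import math


def power_density(power_w: float, area_mm2: float) -> float:
    """Return the power density in W/mm^2."""
    return power_w / area_mm2


# Circular comparison object from the slide: 1500 W on a 180 mm diameter.
disc_area_mm2 = math.pi * (180.0 / 2.0) ** 2
disc_density = power_density(1500.0, disc_area_mm2)

# AMB die from the slide: 6 W on 30.5 mm^2.
amb_density = power_density(6.0, 30.5)

print(f"disc: {disc_area_mm2:.0f} mm^2, {disc_density:.3f} W/mm^2")
print(f"AMB:  {amb_density:.3f} W/mm^2, ratio {amb_density / disc_density:.1f}x")
```

The die dissipates far less total power, yet its power per unit area is more than three times that of the comparison object.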


Status of Microelectronics Technology

[Figure: gate oxide thickness tOX (nm), threshold voltage Vt and supply voltage Vdd versus MOSFET channel length (µm), log-log axes]

                          2008          Future VLSI chip (2011)
CMOS feature size         0.035 µm      0.022 µm
Core voltage (V)          1.1 - 1.2 V   0.6 - 0.7 V
Transistors/cm2           100 M         40 M
DRAM bits/chip            4 G           8-16 G
Number of wiring levels   9             12-15

(Source: International Technology Roadmap for Semiconductors 2008 update)


Interconnect

Technology Requirements:

• Inductive effects will become increasingly important

• Additional metal patterns or ground planes for inductive shielding

• Thinner metallization

• Lower line-to-line capacitance

• Increasing pitch and thickness at each conductor level to alleviate the impact of interconnect delay

[Figure: interconnect cross-section with local, intermediate and global wiring levels. Labels: passivation, dielectric, etch stop layer, dielectric diffusion barrier, copper conductor with metal barrier liner, pre-metal dielectric, tungsten contact plug]

Source: SIA Roadmap 1999


Productivity Gap: Technology vs. CAD

Designers' productivity must increase in order to make use of new technologies.

[Figure: ITRS Roadmap for the Design Technology Requirements (today / near term)]

Productivity Gap: Beyond 2012

[Figure: ITRS Roadmap for the Design Technology Requirements (far term)]


ITRS roadmap

[Figure: ITRS Feature Size Projections. Feature size (nanometers, log scale from 0.1 to 1,000,000) versus year of first product shipment (1955 to 2050). Curves: µP channel length, DRAM half pitch, min Tox, max Tox. Reference sizes: human hair thickness, eukaryotic cell, bacterium, virus, protein molecule, DNA molecule thickness, atom. Marker: "We are here"]

(Sources: 1994-2009 SIA/ITRS roadmaps, 1997 lecture by Gordon Moore)


ITRS roadmap

[Figure: roadmap chart with the annotations "NOW", ".08 µm already available" and "Intel has verified 20 nm transistors in the lab"]

(source: ITRS '08 roadmap)


Technology Scaling: Notation

• Historically, device feature length scales have decreased by ~12%/year.

– So: feature length l ∝ 0.88^year ≡: ⇓

– 1/l ∝ (1/0.88)^year ≈ 1.14^year ≡: ⇑

• up 14%/year

• Meanwhile, typical CPU die diameters have increased by ~2.3%/year. (Less stable trend.)

– Diameter ∝ 1.023^year ≡: ↑

– 1/Diameter ∝ 0.978^year ≡: ↓

• Quantities that are constant over time are written as ∝ 1 ≡: •
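The arrow notation is shorthand for per-year multiplicative factors. A small Python sketch, assuming only the 12%/year and 2.3%/year trends quoted above, makes the four factors explicit (all names are illustrative):

```python
SHRINK = 0.88                 # feature length factor per year (⇓)
GROW = 1.0 / SHRINK           # inverse feature length factor per year (⇑)
DIE_GROW = 1.023              # die diameter factor per year (↑)
DIE_SHRINK = 1.0 / DIE_GROW   # inverse die diameter factor per year (↓)


def percent_per_year(factor: float) -> float:
    """Convert a yearly multiplicative factor into a percentage change."""
    return (factor - 1.0) * 100.0


def after_years(factor: float, years: float) -> float:
    """Cumulative factor after the given number of years."""
    return factor ** years


for name, factor in [("⇓", SHRINK), ("⇑", GROW), ("↑", DIE_GROW), ("↓", DIE_SHRINK)]:
    print(f"{name}: x{factor:.3f} per year ({percent_per_year(factor):+.1f} %)")

print(f"feature length after 10 years: x{after_years(SHRINK, 10):.3f}")
```

Products of these factors give the combined trends used on the following slides; for instance ⇑⇑ corresponds to `GROW * GROW`.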


Resistance Scaling

• Fixed-shape wire (any shape): R ∝ l/wt ∝ ⇓/⇓⇓ = ⇑

– All dimensions scaling equally.

– E.g. a local interconnect in a small scaled logic block / functional unit

• Constant-length thin wire: R ∝ •/⇓⇓ = ⇑⇑

• Thin cross-chip wire: R ∝ ↑/⇓⇓ = ⇑⇑↑ !

– Up 33%/year!

– Long-distance wires have to be extra thick to be fast

• But, fewer thick wires can fit!

[Figure: wire of length l, width w and thickness t, with the current flowing along l]


Capacitance Scaling

• Fixed-shape structure (any): C ∝ lw/s ∝ ⇓⇓/⇓ = ⇓

– E.g. scaled devices/wires

• Per unit wire length:

– C ∝ •w/s ∝ •⇓/⇓ ∝ • (constant)

• Cross-chip thin wire: C ∝ ↑

• Per unit area: C ∝ ••/s ∝ ⇑

– E.g., total on-chip cap./cm2

[Figure: structure of width w with spacing s]


Some 1st-order Semiconductor Scaling Laws

• Voltages V ∝ ⇓ (due to e.g. punch-through)

• Long-term: temperature T ∝ ⇓ (prevents leakage)

• Resistance:

– Fixed-shape wire: R ∝ l/wt ∝ ⇓/⇓⇓ = ⇑

– Thin cross-chip wire: R ∝ ↑/⇓⇓ = ⇑⇑↑

• Capacitance:

– Fixed-shape structure: C ∝ lw/s ∝ ⇓⇓/⇓ = ⇓

– Per unit wire length: C ∝ • (constant)

– Cross-chip wire: C ∝ ↑

– Per unit area: C ∝ 1/s ∝ ⇑


Why Voltage Scaling?

• For many years, logic voltages were maintained at fairly constant levels as transistors shrank

– TTL 5 V logic was standard for many years

– later 3.3 V, now: ~1 V within leading-edge CPUs

• Further shrinkage without voltage scaling is no longer possible, due to various effects:

– Punch-through

– Device degradation from hot carriers

– Gate-insulator failure

– Carrier velocity saturation

• In general, things break down at high field strengths

– constant-field voltage scaling may be preferred


Punch-Through

[Figure: n-p-n MOS structure with gate electrode and bias voltage Vbias, showing the electron distribution at zero bias, moderate bias, strong bias and very strong bias]


Need for Voltage Scaling

[Figure: the same n-p-n structure with gate electrode at bias Vbias, drawn at the original size and at a smaller size]

Smaller size & same voltage → higher electric field strengths → easier punch-through


Long-term Temperature Scaling?

• Sub-threshold power dissipation across "off" transistors is based on the leakage current density ∝ exp(−Vt / φT)

– Vt is the threshold voltage

• Must scale down with Vdd, or else transistor can't turn on!

– φT is the thermal voltage at temperature T

• Equal to kBT/q, where q is electron charge magnitude

• Voltage spread of individual electrons from thermal noise

• As voltages decrease,

– leakage power will dominate, devices will become unable to store charge

• Unless (eventually), T ∝ V ∝ l ∝ ⇓

• Only alternative to low T: Scaling halts!

– Probably what must happen, because low temps. imply slow rate of quantum evolution.

Unfortunately, lower T → fewer charge carriers!
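To get a feeling for the exp(−Vt / φT) dependence, the following Python sketch evaluates the thermal voltage kBT/q and the resulting leakage factor. The threshold voltages and temperatures used here are illustrative assumptions, not values from the lecture, and the function returns only the exponential factor, not an absolute current.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C


def thermal_voltage(temperature_k: float) -> float:
    """Thermal voltage phi_T = k_B * T / q in volts."""
    return K_B * temperature_k / Q_E


def leakage_factor(v_t: float, temperature_k: float) -> float:
    """Relative sub-threshold leakage factor exp(-Vt / phi_T)."""
    return math.exp(-v_t / thermal_voltage(temperature_k))


print(f"phi_T at 300 K: {thermal_voltage(300.0) * 1e3:.2f} mV")

# Assumed example: halving Vt from 0.4 V to 0.2 V at room temperature.
ratio = leakage_factor(0.2, 300.0) / leakage_factor(0.4, 300.0)
print(f"leakage increase when Vt drops from 0.4 V to 0.2 V: x{ratio:.0f}")

# Assumed example: halving T together with Vt keeps the factor constant.
print(leakage_factor(0.4, 300.0), leakage_factor(0.2, 150.0))
```

The last line illustrates the slide's point: if T scales together with V, the ratio Vt/φT and hence the leakage factor stay unchanged.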


Delay Scaling

• Charging time delay t ∝ RC :

– Through fixed shape conductor: RC ∝ ⇑⇓ = •

– Thin constant-length wire: RC ∝ ⇑⇑

– Via cross-die thin wire: RC ∝ ⇑⇑↑·↑ = up 36%/yr!

– Through a transistor: RC ∝ •·⇓ = ⇓

• Implications:

– Transistors increasingly faster than long thin wires.

– Even becoming faster than fixed-shape wires!

– Local communication among chip elements is becoming increasingly favored!
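The four RC trends can be checked numerically by multiplying the yearly factors for R and C. This Python sketch uses the unrounded ⇑ = 1/0.88, so the cross-die value comes out slightly below the slide's 36%, which is obtained with ⇑ rounded to 1.14. The dictionary layout and names are illustrative.

```python
UP = 1.0 / 0.88   # ⇑: inverse feature length factor per year
DOWN = 0.88       # ⇓: feature length factor per year
DIE = 1.023       # ↑: die diameter factor per year

# (R factor, C factor) per year for each case on the slide.
cases = {
    "fixed-shape conductor": (UP, DOWN),
    "constant-length thin wire": (UP * UP, 1.0),
    "cross-die thin wire": (UP * UP * DIE, DIE),
    "transistor": (1.0, DOWN),
}

# The RC delay factor per year is the product of both factors.
rc = {name: r * c for name, (r, c) in cases.items()}

for name, factor in rc.items():
    print(f"{name:26s} RC x{factor:.3f} per year ({(factor - 1) * 100:+.1f} %)")
```

Transistor delay falls every year while cross-die wire delay rises, which is the quantitative basis for the implications listed above.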


Performance scaling

• Performance characteristics:

– Clock frequency for small, transistor-delay-dominated local structures: f ∝ 1/t ∝ ⇑ (up 14%/yr)

– Transistor density (per area): d = 1/⇓⇓ = ⇑⇑

– Perf. density RA = fd = ⇑⇑⇑; chip area: A ∝ ↑↑

– Total raw performance (local transitions / chip / time): R = fdA = ⇑⇑⇑↑↑ = 1.55^year

• Increases 55% each year!

• Nearly doubles every 18 months (like Moore's Law).

• Raw performance has (in the past) been harnessed for improvements in serial microprocessor performance.

• Future architectures will need to move to more parallel programming models to fully use further improvements.
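The 55%/year figure and the doubling time can be reproduced from the yearly factors. The sketch below uses the slide's rounded values ⇑ ≈ 1.14 and ↑ = 1.023; the variable names are illustrative.

```python
import math

F_UP = 1.14     # ⇑, rounded as on the slide
DIE_UP = 1.023  # ↑

frequency = F_UP          # f ∝ ⇑
density = F_UP ** 2       # d ∝ ⇑⇑
chip_area = DIE_UP ** 2   # A ∝ ↑↑

# Total raw performance R = f * d * A per year.
raw_performance = frequency * density * chip_area

# Time needed for the raw performance to double.
doubling_months = 12.0 * math.log(2.0) / math.log(raw_performance)

print(f"raw performance: x{raw_performance:.2f} per year")
print(f"doubling time:   {doubling_months:.1f} months")
```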


Charges & Currents

• Charges & fields:

– Charge on a structure: Q = CV ∝ ⇓⇓

– Surface charge density: Q/A ∝ •

– Electric field strengths: E = V/l ∝ •

• Currents:

– Peak current densities: J = E/ρ ∝ • (resistivity ρ: constant)

– Peak current in a wire: I = JA ∝ ⇓⇓

– Channel-crossing times: t = l/v ∝ ⇓

• Due to constant e− saturation velocity v ≈ 10^5 m/s

– Current in an on-transistor: I = Q/t ∝ ⇓⇓/⇓ = ⇓

– Effective trans. on-resistance: R = V/I ∝ ⇓/⇓ = •

• ~4-20 kΩ is typical for a min-sized transistor


Interconnect Scaling

• Since transistor delay dt scales as ⇓,

• And wire delay dw (with scaled cross-section size) for a wire of length l scales as

RC ∝ (l/wt)(lw/s) = l^2/(st) ∝ l^2/(⇓⇓) = l^2 ⇑⇑,

• Then to keep dw < dt (1-cycle access) requires:

l^2 ⇑⇑ < ⇓

l^2 < ⇓/⇑⇑ = ⇓⇓⇓

l < ⇓^(3/2)

• So wire length in units of transistor length lt is

l/lt < ⇓^(3/2)/⇓ = ⇓^(1/2) (down 6%/year)

• So number of devices accessible within a constant × dt in 2-D goes as (⇓^(1/2))^2 = ⇓, in 3-D as (⇓^(1/2))^3 = ⇓^(3/2).

– Circuits must be increasingly local.
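The bound on the reachable wire length translates into concrete yearly factors once ⇓ = 0.88 is inserted. A brief Python sketch under that assumption (names are illustrative):

```python
SHRINK = 0.88  # ⇓: feature length factor per year

# Reachable wire length, in units of transistor length, scales as ⇓^(1/2).
reach = SHRINK ** 0.5

# Devices reachable within a constant number of transistor delays.
devices_2d = reach ** 2   # planar layout: ⇓
devices_3d = reach ** 3   # volume layout: ⇓^(3/2)

print(f"reach:      x{reach:.3f} per year ({(reach - 1) * 100:+.1f} %)")
print(f"2-D devices: x{devices_2d:.3f} per year")
print(f"3-D devices: x{devices_3d:.3f} per year")
```

Both counts shrink every year, so the set of devices reachable in one cycle becomes steadily smaller.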


Energy and Power

• Energy:

– Energy on a structure: E ∝ QV ∝ CV^2 ∝ ⇓·⇓^2 = ⇓^3

– Energy per-area: EA ∝ CV^2/A ∝ ⇓^3/⇓^2 = ⇓

– Energy densities: E/l^3 ∝ ⇓^3/⇓^3 ∝ • (not a problem)

• Power levels:

– Per-area power: PA = EA·f ∝ ⇓⇑ = • (not a problem)

– Power per die: P = PA·A ∝ ↑↑ (up ~5%/year)

• Power-per-performance: PA/RA = •/⇑⇑⇑ = ⇓⇓⇓

• But, if constant-field scaling is not used (and it has not been, very much, and cannot be much further) all the above scaling rates get increased by the square of the field strength (F) scaling rate.

– Because V ∝ F·l, and E and P scale with V^2.


3-D Scalability?

• Consider stacking circuits in 3-D within a constant volume.

• # of layers n: •/thickness ∝ •/⇓ ∝ ⇑

• Total power: PT = P(flat chip) × n ∝ •⇑ = ⇑

• Enclosing surface area AE: •

• Power flux (if not recycled): PT/AE = ⇑/• = ⇑

– For this to be possible, coolant velocity &/or thermal conductivity must also increase as ⇑!

• Probably not feasible.

• Power recycling is needed to scale in 3-D!


Types of Limits

• Meindl ‘95 identifies several kinds of limits on VLSI (from most to least fundamental):

– Theoretical limits (focus on energy & delay)

• Fundamental limits (such as we already discussed)

• Material limits (dependent on materials used)

• Device limits (dependent on structure & geometry)

• Circuit limits (dependent on circuit styles used)

• System limits (dependent on architecture & packaging)

– Practical limits

• Design limits

• Manufacturing limits


Dielectric Constants

• Dielectric constants κ = ε/ε0 = C/C0. κSiO2 ≈ 4

– Want high κ in thin gate dielectrics,

• To maximize channel surface-charge density, & thus on-current, for given VG,on,

• But avoid very low thickness with high tunneling leakage.

• But, material must also be an insulator! (κSrTi = 310!)

– Want low κ for thick interconnect ("field") insulators

• To minimize parasitic C and delay of interconnects

• Lowest κ possible is that of vacuum (1). Air is close.

– High-k dielectrics under development, used in recent Intel processes


Some Device Limits

• MOSFET channel length

– Generally, the lower, the better!

• Reduces load capacitance & thus load charging time.

– But, lengths are lower-bounded by the following:

• Manufacturing limits, such as lithography wavelengths.

• Supply voltage lower-limits to keep a decent Ion/Ioff.

• Depletion region thickness due to dopant density limits.

• Yield, in the face of threshold variation due to statistical fluctuation in dopant concentrations.

• Source-to-drain tunneling.

• Distributed RC network response time

– Limited by:

• ρ of wires (e.g. the recent shift from Al to Cu)

• κ of insulators (at most, 4x less than SiO2 is possible)

• Widths, lengths of wires: limited by basic geometry


Circuit Limits

• Power supply voltage limits

• Switching energy limits

• Gate delays:

– Fundamentally limited by transistor characteristics, RC network charging times

• each of which is limited as per previous slide

– There is a fastest possible logic gate in any given device technology

• esp. considering it has to be switched by similar gates

– Static CMOS & its close relatives (precharged domino, NORA) are probably close to the fastest-possible gates using CMOS transistors in a given tech. generation.


System Limits

• Architectural limits

• Power dissipation

• Heat removal capability of packaging

• Cycle time requirements

• Physical size


Design & Design-Verification Limits

• Increasing complexity (# of devices/chip) leads to continual new challenges in:

– Design organization

• modularity vs. efficiency

– Automatic circuit synthesis & layout

• circuit optimization

– Design verification

• layout-vs-schematic

• logic-level simulation

• analog (e.g. SPICE) modeling

– Testing and design-for-testability

• test coverage


Manufacturing Limits

See the ITRS ‘10 roadmap for these.

• Lithography resolution, tools

• Dopant implantation techniques

• Process changes for new device structures

• Assembly & packaging

• Yield enhancement

• Environmental / safety / health considerations

• Metrology (measurement)

• Product cost & factory cost


Possible Endpoints for Electronics

• Merkle’s minimal “quantum FET”

• Mesoscale nanoelectronic devices based on metal or semiconductor “islands”

– E.g. Single-electron transistors, quantum dots, resonant tunneling transistors.

• Various organic molecular electronic devices

– diodes, transistors

• Inorganic atomic-scale devices

– 1-atom-wide chains of conductor/semiconductor atoms precisely positioned on/in substrates

• Superconducting devices


Energy Limits in Electronics

• Origin of CV2/2 switching energy dissipation

• Thermal reliability bounds on CV2 scaling

– Voltage limits

– Capacitance limits

• Leakage trends in MOSFETs



Challenge: System-on-a-Chip Design?

[Figure: "Chasing the design gap". Design complexity and design productivity versus year (1975 to 2000). Labels: polygons/masks, transistors, gates, place & route, synthesis, RTL, reuse/IP cores, system on a chip]


Traditional ASIC market

• ASICs are customer specific ICs

• If application-specific processor: ASIP

• The product is made only once an application is found

Non-standard IC

– ASIP (application specific)

– ASIC (customer specific)

• Semicustom: one or more customised layers

• Custom: all layers customised

• Programmable: circuit with fuse, antifuse or memory that can be programmed


Market for Systems-on-a-Chip

Area Examples: Multimedia, Mobile Communication, Automotive, ...

SoC -> Domain Specific Computing

[Figure: SoC application landscape. Labels: WWW, Java, configurable, multi-standard, info plug, LAN, broadband network, services, MPEG 4-7, 100 Gop/s, 5 Gtr/s, 10 Watt, 100 Mb/s WLAN, < 1 Watt, RF, 20 Gop/s]

Source: Hugo De Man, EIS'99, Darmstadt


2. Repetition Transistor Models


Structure of MOSFET

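[Figure: cross-section of an n-channel MOSFET on a p-type substrate with n+ source (S) and drain (D) regions, gate (G) above the channel region of length L, and body (B) terminal; terminal voltages vS, vG, vD, vB and currents iS, iG, iD, iB; circuit symbol with terminals D, G, S, B]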

MOSFET - Current through the channel region is controlled with voltage vG


Inversion

• The bulk has to have the lowest potential to ensure reverse biased pn-junctions (no current must flow between drain/source and bulk!)

• VSB = 0 → in the following we relate all voltages to the source voltage

• VGS > VT → n-channel is induced (blue area between drain and source).

• White area → depletion region

• A current can flow between drain and source, if VDS > 0

• Because the MOSFET is a symmetrical device, source and drain have to be defined: for an n-channel FET the source is always at a lower potential than the drain!


Ohmic region

• Increasing VDS to a value VDS > 0 leads to a current ID.

• Near the drain the voltage responsible for the inversion is (VGS - VT) - VDS and thus smaller than near the source.

• The channel acts like a linear resistor - that’s why this region of operation is called ohmic.

[Figure: drain-source current (A, 0 to 8e-4) versus drain-source voltage (V, 0 to 0.8) for VGS = 2 V, 3 V, 4 V and 5 V]

In this region: iDS ∼ vDS ⇒ Ron

0.5 kΩ < Ron < 10 kΩ


Pinch - off

• If VDS rises to the point where it equals VGS - VT, there is no voltage near the drain to induce an inversion layer - the channel is pinched off at the drain.


Saturation

• Further increasing VDS causes the pinch-off point to move in the direction of the source.

• The voltage at the pinch-off point is always VGS - VT.

• When the electrons coming from the source reach the pinch-off point, they are injected into the depleted region and the electric field in this region sweeps the electrons from the pinch-off point to the drain.


Output Characteristics

[Figure: drain-source current (A, 0 to 2.2e-4) versus drain-source voltage (V, 0 to 12) for VGS < 1 V and VGS = 2 V, 3 V, 4 V and 5 V, showing the linear region, the saturation region and the pinch-off locus]

• VT = 1 V


Channel Length Modulation


Transfer Characteristics and Depletion Mode MOSFET

• Transfer characteristics: plot of drain current versus gate-source voltage for a fixed drain-source voltage

• If threshold voltage of NMOS transistor negative → depletion mode MOSFET (there exists an implanted n-type channel region)

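[Figure: drain-source current (µA) versus gate-source voltage (V) for a depletion-mode device (VTN = −2 V) and an enhancement-mode device (VTN = +2 V)]

[Figure: cross-section of a depletion-mode NMOS transistor on a p-type substrate with n+ source (S) and drain (D), gate (G) of length L, body (B) and implanted n-type channel region]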


P-channel MOSFET (PMOS)

[Figure: cross-section of a p-channel MOSFET on an n-type substrate with p+ source and drain regions, gate above the channel region of length L, and body terminal; vB > 0, vG < 0, vD < 0]

[Figure: source-drain current (A) versus source-drain voltage (V) for VSG < 1 V (VGS > −1 V), VSG = 2 V (VGS = −2 V), VSG = 3 V (VGS = −3 V), VSG = 4 V (VGS = −4 V) and VSG = 5 V (VGS = −5 V)]

                   NMOS Device   PMOS Device
Enhancement-mode   VTN > 0       VTP < 0
Depletion-mode     VTN < 0       VTP > 0


IEEE Standard MOS Transistor Circuit Symbols

(a) NMOS enhancement-mode device

(b) PMOS enhancement-mode device

(c) NMOS depletion-mode device

(d) PMOS depletion-mode device

(e) Three-terminal NMOS transistor

(f) Three-terminal PMOS transistor

[Figure: circuit symbols (a) to (f) with terminals D, G, S and, for the four-terminal devices, B]


Summary of MOS Equations

From NMOS to PMOS: Signs of all voltages change

[Figure: NMOS symbol (terminals D, G, S, B) with current iDS and PMOS symbol (terminals D, G, S, B) with current iSD]


MOS Capacitances - Linear Region

[Figure: NMOS device in the linear region. Cross-section with gate, drain, source and bulk on a p-type substrate, n+ regions and n-type channel; capacitances C'OL (at source and drain), C"OX, CSB and CDB]

The channel shields the bulk electrode from the gate since the inversion layer acts as conductor between drain and source.


MOS Capacitances - Saturation

[Figure: NMOS device in saturation. Cross-section with gate, drain, source and bulk on a p-type substrate, n+ regions and n-type channel; capacitances C'OL (at source and drain), C"OX, CSB and CDB]

The channel shields the bulk electrode from the gate since the inversion layer acts as conductor between drain and source. The channel is pinched off and does not contact the drain n+ region.


MOS Capacitances - Cutoff

[Figure: NMOS device in cutoff. Cross-section with gate, drain, source and bulk on a p-type substrate, n+ regions and depletion region; capacitances C'OL (at source and drain), CGB, CSB and CDB]

The gate-bulk capacitance consists of the gate capacitance in series with the depletion capacitance of the depletion region.


Small-Signal Models for Field-Effect Transistors (I)

[Figure: the MOSFET represented as a two-port network with input voltage vgs, input current ig, output voltage vds and output current id]

- Considering the MOSFET as a three-terminal device.

- Small-signal model of the MOSFET is based on the y-parameter two-port network.


Small-Signal Models for Field-Effect Transistors

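[Figure: small-signal model for the three-terminal MOSFET. Terminals G, D, S; input voltage vgs with current ig; controlled current source gm·vgs and output resistance ro between D and S; output voltage vds with current id]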


Body Effect in the Four-Terminal MOSFET

[Figure: small-signal model for the four-terminal MOSFET. Terminals G, D, S, B; voltages vgs, vds, vbs; current sources gm·vgs and gmb·vbs and output resistance ro]

A second voltage-controlled current source has been added to model the back-gate transconductance gmb.


High-Frequency MOSFET Small Signal Model

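[Figure: high-frequency small-signal model for the four-terminal MOSFET. Current sources gm·vgs and gmb·vbs, output resistance ro, capacitances CGS, CGD, CGB, CBS and CBD, and series resistances RS and RD between the internal nodes S*, D* and the terminals S, D]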


High-Frequency MOSFET Small Signal Model

       Cutoff      Ohmic                       Saturation
CGD    COX·W·LD    COX·W·LD + (1/2)·W·L·COX    COX·W·LD
CGS    COX·W·LD    COX·W·LD + (1/2)·W·L·COX    COX·W·LD + (2/3)·W·L·COX
CBG    COX·W·L     0                           0
CBD    CBD1        CBD1 + (1/2)·CB1            CBD1
CBS    CBS1        CBS1 + (1/2)·CB1            CBS1 + (2/3)·CB1

LD: gate to drain or source overlap due to underdiffusion


3. Short Channel Effects on MOS Transistors


Overview

• Short Channel Devices

• Velocity Saturation Effect

• Threshold Voltage Variations

• Hot Carrier Effects

• Process Variations

(Source: Jan M. Rabaey, Digital Integrated Circuits)


Short Channel Devices.

• As technology scaling reaches channel lengths below one micron (L < 1 µm), second-order effects that were ignored in devices with long channel length (L > 1 µm) become very important.

• MOSFETs with such dimensions are called "short channel devices".

• The main second-order effects are: Velocity Saturation, Threshold Voltage Variations and Hot Carrier Effects.

[Figure: cross-section of a short-channel (L < 1 µm) n-channel MOSFET on a p-substrate with n+ source and drain, polysilicon gate, gate oxide, field oxide (SiO2) and p+ stopper]


Velocity Saturation Effect (I)

• Review of the Classical Derivation of the Drain Current:

VGS>VT

VDS<<VGS

• Induced channel charge at V(x):

Qi(x)=-COX[VGS-V(x)-VT] (1)

• The current is given as a product of the drift velocity of the carriers vn and the available charge:

ID=-vn(x)Qi(x)W (2)

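[Figure: MOS transistor and its bias conditions. n-channel device on a p-substrate with terminals S, G, D and B, applied voltages VGS and VDS, drain current ID, and channel potential V(x) along the position x from 0 to L]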


Velocity Saturation Effect (II)

• The electron velocity is related to the electric field through the mobility:

$$v_n = -\mu_n E(x) = \mu_n \frac{dV}{dx} \qquad (3)$$

• Combining (1) and (3) in (2):

$$I_D\,dx = \mu_n C_{OX} W \left(V_{GS} - V(x) - V_T\right) dV \qquad (4)$$

• Integrating (4) from 0 to L yields the voltage-current relation of the transistor:

$$I_D = \mu_n C_{OX} \frac{W}{L}\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right] \qquad (5)$$

• The behavior of short channel devices deviates considerably from this model.

• Eq. (3) assumes the mobility µn to be a constant, independent of the value of the electric field Ε.

• At high electric fields carriers fail to follow this linear model.

• This is due to the velocity saturation effect.

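As a quick numerical check of the long-channel relation (5), the following Python sketch evaluates the linear-region drain current. The parameter values (K' = µnCOX = 115 µA/V², VT = 0.43 V, W/L = 1.5) are borrowed for illustration from the 0.25 µm parameter table later in this chapter; the function name and signature are hypothetical.

```python
def drain_current_linear(v_gs: float, v_ds: float, v_t: float,
                         k_prime: float, w_over_l: float) -> float:
    """Long-channel drain current in the linear region, Eq. (5).

    k_prime is the product mu_n * C_OX in A/V^2.
    """
    v_gt = v_gs - v_t
    if v_gt <= 0.0:
        return 0.0  # no inversion channel
    if v_ds > v_gt:
        raise ValueError("Eq. (5) only holds for VDS <= VGS - VT")
    return k_prime * w_over_l * (v_gt * v_ds - v_ds ** 2 / 2.0)


# Illustrative operating point (assumed values, see lead-in).
i_d = drain_current_linear(v_gs=2.5, v_ds=0.2, v_t=0.43,
                           k_prime=115e-6, w_over_l=1.5)
print(f"ID = {i_d * 1e6:.2f} uA")
```

For small VDS the quadratic term is negligible and the device behaves like a resistor of value 1/(K'·(W/L)·(VGS − VT)).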


Velocity Saturation Effect (III)

• When the electric field reaches a critical value ΕC (1.5×10^6 V/m for p-type silicon), the velocity of the carriers tends to saturate (at 10^5 m/s for silicon) due to scattering effects.

[Figure: carrier velocity vn (m/s) versus electric field Ε (V/µm): constant mobility (slope = µ) below ΕC = 1.5 V/µm, constant velocity vsat = 10^5 m/s above]


Velocity Saturation Effect (IV)

• The impact of this effect on the drain current of a MOSFET operating in the linear region is obtained as follows:

• The velocity as a function of the electric field, plotted in the last figure, can be approximated by:

$$v = \frac{\mu_n E}{1 + E/E_C} \quad \text{for } E \le E_C, \qquad v = v_{sat} \quad \text{for } E \ge E_C \qquad (6)$$

Reevaluating (1) and (2) using (6):

$$I_D = \kappa(V_{DS})\,\mu_n C_{OX} \frac{W}{L}\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right] \qquad (7)$$

with:

$$\kappa(V_{DS}) = \frac{1}{1 + V_{DS}/(E_C L)}$$

• For large values of L or small values of VDS, κ approaches 1 and (7) reduces to (5).

• For short channel devices κ < 1 and the current is smaller than what would be expected.


Velocity Saturation Effect (V)

• When increasing the drain-source voltage, the electric field reaches the value ΕC, and the carriers at the drain become velocity saturated. Assuming that the drift velocity is saturated, from (4) with µn dV/dx = vsat the drain current is:

$$I_{DSAT} = v_{sat} C_{OX} W \left(V_{GS} - V_T - V_{DSAT}\right) \qquad (8)$$

• Evaluating (7) with VDS = VDSAT:

$$I_{DSAT} = \kappa(V_{DSAT})\,\mu_n C_{OX} \frac{W}{L}\left[V_{GT}V_{DSAT} - \frac{V_{DSAT}^2}{2}\right] \qquad (9)$$

• Where VGT is a short notation for VGS − VT.

• Equating (8) and (9) and solving for VDSAT:

$$V_{DSAT} = \kappa(V_{GT})\,V_{GT} \qquad (10)$$

• For a short channel device and large enough values of VGT, κ(VGT) is smaller than 1, hence the device enters saturation before VDS reaches VGS − VT.

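To see how strongly κ depends on the channel length, the sketch below evaluates κ(V) = 1/(1 + V/(ΕC·L)) and VDSAT = κ(VGT)·VGT for a short and a long device. ΕC = 1.5 V/µm is the critical field quoted on the earlier slide; the chosen lengths and the value VGT = 2.07 V are illustrative assumptions, as are the function names.

```python
E_C = 1.5e6  # critical field in V/m (1.5 V/um, from the slide)


def kappa(voltage: float, length_m: float) -> float:
    """Velocity-saturation degradation factor kappa(V) = 1 / (1 + V / (E_C * L))."""
    return 1.0 / (1.0 + voltage / (E_C * length_m))


def v_dsat(v_gt: float, length_m: float) -> float:
    """Saturation voltage from Eq. (10): VDSAT = kappa(VGT) * VGT."""
    return kappa(v_gt, length_m) * v_gt


V_GT = 2.07  # assumed gate overdrive VGS - VT in volts

for length_um in (0.25, 1.0, 10.0):
    length_m = length_um * 1e-6
    print(f"L = {length_um:5.2f} um: kappa = {kappa(V_GT, length_m):.3f}, "
          f"VDSAT = {v_dsat(V_GT, length_m):.3f} V")
```

For the longest device κ is close to 1 and VDSAT approaches VGT, while the 0.25 µm device saturates at a small fraction of VGT.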


Velocity Saturation Effect (VI)

[Figure: ID versus VDS at VGS = VDD for a long-channel and a short-channel device; the short-channel device saturates at VDSAT, before VDS reaches VGS − VT]

Short channel devices display an extended saturation region due to velocity saturation.


Simplified model for hand calculations (I)

A substantially simpler model can be obtained by making two assumptions:

• Velocity saturates abruptly at ΕC and is approximated by:

v = µnΕ for Ε ≤ ΕC

v = vsat = µnΕC for Ε ≥ ΕC

• VDSAT at which ΕC is reached is constant and has a value:

$$V_{DSAT} = L\,E_C = \frac{L\,v_{sat}}{\mu_n} \qquad (11)$$

Under these conditions the equation for the current in the linear region remains unchanged from the long channel model. The value for IDSAT is found by substituting eq. (11) in (5).


Simplified model for hand calculations (II)

$$I_{DSAT} = \mu_n C_{OX} \frac{W}{L}\left[(V_{GS} - V_T)V_{DSAT} - \frac{V_{DSAT}^2}{2}\right] = v_{sat} C_{OX} W \left[(V_{GS} - V_T) - \frac{V_{DSAT}}{2}\right] \qquad (12)$$

This model is truly first order and empirical and causes substantial deviations in the transition zone between linear and velocity saturated regions. However it shows a linear dependence of the saturation current with respect to VGS for the short channel devices.
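The linear dependence on VGS can be illustrated with a short Python sketch of the simplified saturation current. The NMOS values K' = 115 µA/V², VT = 0.43 V and VDSAT = 0.63 V are taken from the 0.25 µm parameter table later in this chapter; W/L = 1.5 and the function name are assumptions made for illustration. The sketch assumes VGS − VT ≥ VDSAT, i.e. a velocity-saturated device.

```python
def i_dsat_simplified(v_gs: float, v_t: float = 0.43, v_dsat: float = 0.63,
                      k_prime: float = 115e-6, w_over_l: float = 1.5) -> float:
    """Saturation current of the simplified velocity-saturation model.

    IDSAT = K' * (W/L) * [(VGS - VT) * VDSAT - VDSAT^2 / 2]
    """
    v_gt = v_gs - v_t
    if v_gt < v_dsat:
        raise ValueError("sketch only covers the velocity-saturated case")
    return k_prime * w_over_l * (v_gt * v_dsat - v_dsat ** 2 / 2.0)


for v_gs in (1.5, 2.0, 2.5):
    print(f"VGS = {v_gs:.1f} V -> IDSAT = {i_dsat_simplified(v_gs) * 1e6:.1f} uA")
```

Equal steps in VGS produce equal steps in IDSAT, in contrast to the quadratic dependence of the long-channel model.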


I-V characteristics of long- and short-channel MOS transistors both with W/L=1.5


ID-VGS characteristic for long- and short channel devices both with W/L=1.5


Threshold Voltage Variations (I)

• For a long channel NMOS transistor the threshold voltage is given by:

$$V_T = V_{T0} + \gamma\left(\sqrt{-2\phi_F + V_{SB}} - \sqrt{-2\phi_F}\right) \qquad (13)$$

• Eq. (13) states that the threshold voltage is only a function of the technology and the applied body bias VSB.

• For short channel devices this model becomes inaccurate and the threshold voltage becomes a function of L, W and VDS.

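The body-bias dependence of the long-channel threshold can be evaluated with a few lines of Python. VT0 = 0.43 V and γ = 0.4 V^0.5 are the NMOS values from the 0.25 µm parameter table in this chapter; the Fermi potential φF = −0.3 V is an assumed, typical-looking value that is not given in the lecture, and the function name is hypothetical.

```python
import math


def threshold_voltage(v_sb: float, v_t0: float = 0.43, gamma: float = 0.4,
                      phi_f: float = -0.3) -> float:
    """Long-channel NMOS threshold voltage with body effect.

    VT = VT0 + gamma * (sqrt(-2*phi_F + VSB) - sqrt(-2*phi_F))
    phi_f = -0.3 V is an assumption for illustration only.
    """
    two_phi = -2.0 * phi_f
    return v_t0 + gamma * (math.sqrt(two_phi + v_sb) - math.sqrt(two_phi))


for v_sb in (0.0, 0.5, 1.0, 2.0):
    print(f"VSB = {v_sb:.1f} V -> VT = {threshold_voltage(v_sb):.3f} V")
```

With zero body bias the expression reduces to VT0; a positive VSB raises the threshold.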


Threshold Voltage Variations (II)

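[Figure: threshold voltage VT as a function of the channel length L (for low VDS), approaching the long-channel threshold for large L]

[Figure: drain-induced barrier lowering (for low L): threshold voltage VT as a function of VDS, starting from the low-VDS threshold]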

Hot Carrier Effects (I)

• During the last decades transistor dimensions were scaled down, but not the power supply.

• The resulting increase in the electric field strength causes an increasing energy of the electrons.

• Some electrons are able to leave the silicon and tunnel into the gate oxide.

• Such electrons are called "hot carriers".

• Electrons trapped in the oxide change the VT of the transistors.

• This leads to a long-term reliability problem.

• For an electron to become hot an electric field of 10^4 V/cm is necessary.

• This condition is easily met with channel lengths below 1 µm.


Hot Carrier Effects (II)

Hot carrier effects cause the I-V characteristics of an NMOS transistor to degrade from extensive usage.


Process Variations.

Device parameters vary between runs and even on the same die!

• Variations in the process parameters, such as impurity concentration densities, oxide thicknesses, and diffusion depths. These are caused by non-uniform conditions during the deposition and/or the diffusion of the impurities. This introduces variations in the sheet resistances and transistor parameters such as the threshold voltage.

• Variations in the dimensions of the devices, mainly resulting from the limited resolution of the photolithographic process. This causes (W/L) variations in MOS transistors and mismatches in the emitter areas of bipolar devices.


Impact of Device Variations.

[Figure: delay (ns, about 1.50 to 2.10) of an adder circuit, plotted versus Leff (1.10 to 1.60 µm) and versus VTp (−0.90 to −0.50 V)]

Delay of adder circuit as a function of variations in L and VT


Parameter values for a 0.25µm CMOS process. (minimum length devices).

        VTO (V)   γ (V^0.5)   VDSAT (V)   K' (A/V^2)      λ (V^-1)
NMOS    0.43      0.4         0.63        115 × 10^-6     0.06
PMOS    -0.4      -0.4        -1          -30 × 10^-6     -0.1


4. SPICE LEVEL 1 MOSFET MODEL

Four mask layout and cross section of a N channel MOS Transistor.


Layout and cross section of a n-well CMOS technology.


Equations for the different operation regions

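Cutoff, $V_{GS} \leq V_{TH}$:

$$I_{DS} = 0$$

Nonsaturation, $0 \leq V_{DS} \leq V_{GS} - V_{TH}$:

$$I_{DS} = \frac{KP}{2}\,\frac{W}{L_{eff}}\,V_{DS}\left[2\,(V_{GS} - V_{TH}) - V_{DS}\right]\left(1 + LAMBDA \cdot V_{DS}\right)$$

Saturation, $0 \leq V_{GS} - V_{TH} \leq V_{DS}$:

$$I_{DS} = \frac{KP}{2}\,\frac{W}{L_{eff}}\left(V_{GS} - V_{TH}\right)^2\left(1 + LAMBDA \cdot V_{DS}\right)$$

Where the threshold voltage is given by:

$$V_{TH} = V_{T0} + GAMMA\left(\sqrt{2 \cdot PHI - V_{BS}} - \sqrt{2 \cdot PHI}\right)$$

and the channel length:

$$L_{eff} = L - 2 \cdot LD$$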


Where L is the length of the polysilicon gate and LD is the gate overlap of the source and drain.
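The three operating regions above can be collected into one small function. The following is only a sketch for an n-channel device with VDS ≥ 0; the function names and the example geometry (W = 10 µm, L = 2 µm, LD = 0) are assumptions made for illustration, and the threshold expression follows the form written in this section.

```python
import math


def threshold_voltage(vbs, vto, gamma, phi):
    """V_TH = VTO + GAMMA*(sqrt(2*PHI - V_BS) - sqrt(2*PHI)), as written in the text."""
    return vto + gamma * (math.sqrt(2.0 * phi - vbs) - math.sqrt(2.0 * phi))


def drain_current(vgs, vds, vbs, w, l, ld, kp, vto, gamma, phi, lam):
    """Level 1 drain current of an n-channel MOSFET (assumes vds >= 0)."""
    leff = l - 2.0 * ld                      # effective channel length
    vth = threshold_voltage(vbs, vto, gamma, phi)
    vov = vgs - vth                          # gate overdrive
    if vov <= 0.0:
        return 0.0                           # cutoff
    beta = kp * w / leff
    if vds <= vov:                           # nonsaturation
        return 0.5 * beta * vds * (2.0 * vov - vds) * (1.0 + lam * vds)
    return 0.5 * beta * vov ** 2 * (1.0 + lam * vds)   # saturation


# Hypothetical example device (values chosen for illustration only).
params = dict(w=10e-6, l=2e-6, ld=0.0, kp=50e-6, vto=1.0,
              gamma=0.6, phi=0.8, lam=0.0)
i_sat = drain_current(vgs=3.0, vds=5.0, vbs=0.0, **params)
i_off = drain_current(vgs=0.5, vds=5.0, vbs=0.0, **params)
i_edge_lin = drain_current(vgs=3.0, vds=2.0, vbs=0.0, **params)
print(i_sat, i_off, i_edge_lin)
```

At VDS = VGS - VTH both branch expressions give the same current, so the model is continuous across the saturation boundary.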

The elements in the large signal MOSFET model are shown in the following figure.


MOSFET SPICE PARAMETERS.

Parameter Name                                     SPICE Symbol   Analytical Symbol   Units
Channel length                                     Leff           L                   m
Poly gate length                                   L              Lgate               m
Lateral diffusion / Gate-source overlap            LD             LD                  m
Transconductance parameter                         KP             µnCOX               A/V^2
Threshold voltage / Zero-bias threshold            VTO            VTO                 V
Channel-length modulation parameter                LAMBDA         λn                  V^-1
Bulk threshold / Backgate effect parameter         GAMMA          γn                  V^1/2
Surface potential / Depletion drop in inversion    PHI            -φP                 V


Specifying MOSFET Geometry in SPICE.

Mname D G S B MODname L= W= AD= AS= PD= PS= NRD= NRS=


LEVEL 1 MOSFET MODEL PARAMETERS.

.MODEL MODname NMOS/PMOS VTO= KP= GAMMA= PHI= LAMBDA= RD= RS= RSH= CBD= CBS= CJ= MJ= CJSW= MJSW= PB= IS= CGDO= CGSO= CGBO= TOX= LD=

where:

NMOS/PMOS- MOSFET type.

VTO- Threshold voltage (V)

KP- Transconductance parameter (A/V2)

GAMMA- Bulk threshold parameter (V1/2)

PHI- Surface potential (V)

LAMBDA- Channel length modulation parameter (V-1)

RD- Drain resistance (Ω)



RS- Source resistance (Ω)

RSH- Sheet resistance of the drain/source diffusions (Ω/□)

CBD- Zero bias drain-bulk junction capacitance (F)

CBS- Zero bias source-bulk junction capacitance (F)

MJ- Bulk junction grading coefficient (dimensionless)

PB- Built-in potential for the bulk junction (V)

• With CBD, CBS, MJ and PB, SPICE computes the voltage dependences of the drain-bulk and source-bulk capacitances:

$$C_{BD}(V_{BD}) = \frac{CBD}{\left(1 - \dfrac{V_{BD}}{PB}\right)^{MJ}}$$

$$C_{BS}(V_{BS}) = \frac{CBS}{\left(1 - \dfrac{V_{BS}}{PB}\right)^{MJ}}$$


Large-signal, charge-storage capacitors of the MOS device.



CJ- Zero bias planar bulk junction capacitance (F/m2)

CJSW- Zero bias sidewall bulk junction capacitance (F/m)

MJSW- Sidewall junction grading coefficient (dimensionless)

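• If CJ, CJSW, and MJSW are given, a more accurate simulation of these capacitances is performed using the following equations:

$$C_{BD}(V_{BD}) = \frac{CJ \cdot AD}{\left(1 - \dfrac{V_{BD}}{PB}\right)^{MJ}} + \frac{CJSW \cdot PD}{\left(1 - \dfrac{V_{BD}}{PB}\right)^{MJSW}}$$

$$C_{BS}(V_{BS}) = \frac{CJ \cdot AS}{\left(1 - \dfrac{V_{BS}}{PB}\right)^{MJ}} + \frac{CJSW \cdot PS}{\left(1 - \dfrac{V_{BS}}{PB}\right)^{MJSW}}$$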


Bottom and Sidewall components of the bulk junction capacitors.

Bottom=ABCD

Sidewall=ABEF+BCFG+DCGH+ADEH
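As a rough illustration of the bottom and sidewall contributions, the sketch below evaluates the junction capacitance for one drain diffusion. The function name and the 10 µm × 10 µm drain geometry are assumptions; the model values are the N channel entries of the example parameter table in this chapter.

```python
def junction_capacitance(v, area, perimeter, cj, cjsw, mj, mjsw, pb):
    """Bulk junction capacitance: bottom (area) plus sidewall (perimeter) term.

    v is the junction voltage (negative for reverse bias); valid for v < pb.
    """
    if v >= pb:
        raise ValueError("formula is only valid for v < PB")
    bottom = cj * area / (1.0 - v / pb) ** mj
    sidewall = cjsw * perimeter / (1.0 - v / pb) ** mjsw
    return bottom + sidewall


# Hypothetical drain geometry: 10 um x 10 um diffusion.
AD = 10e-6 * 10e-6          # drain area in m^2
PD = 4 * 10e-6              # drain perimeter in m
model = dict(cj=1e-4, cjsw=5e-10, mj=0.5, mjsw=0.33, pb=0.95)

c_zero_bias = junction_capacitance(0.0, AD, PD, **model)
c_reverse_5v = junction_capacitance(-5.0, AD, PD, **model)
print(c_zero_bias, c_reverse_5v)
```

At zero bias the result reduces to CJ·AD + CJSW·PD; with increasing reverse bias both terms shrink, the bottom term faster because of its larger grading coefficient.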



IS- Saturation current of the junction diode (A)

CGDO- Overlap capacitance of the gate with drain (F)

CGSO- Overlap capacitance of the gate with source (F)

CGBO- Overlap capacitance of the gate with bulk (F)

TOX- Gate oxide thickness (m)

LD- Lateral diffusion (m)


Overlap capacitances of an MOS transistor. (a) Top view showing the overlap between the source or drain and the gate. (b) Side view.


Example of MOSFET model parameters values.

Parameter Name                                   Symbol   N Channel MOSFET   P Channel MOSFET   Units
Gate oxide thickness                             TOX      150                150                Angstroms
Transconductance parameter                       KP       50 x 10^-6         25 x 10^-6         A/V^2
Threshold voltage                                VTO      1.0                -1.0               V
Channel-length modulation parameter              LAMBDA   0.1/L (L in µm)    0.1/L (L in µm)    V^-1
Bulk threshold parameter                         GAMMA    0.6                0.6                V^1/2
Surface potential                                PHI      0.8                0.8                V
Gate-drain overlap capacitance                   CGDO     5 x 10^-10         5 x 10^-10         F/m
Gate-source overlap capacitance                  CGSO     5 x 10^-10         5 x 10^-10         F/m
Zero-bias planar bulk depletion capacitance      CJ       10^-4              3 x 10^-4          F/m^2
Zero-bias sidewall bulk depletion capacitance    CJSW     5 x 10^-10         3.5 x 10^-10       F/m
Bulk junction potential                          PB       0.95               0.95               V
Planar bulk junction grading coefficient         MJ       0.5                0.5
Sidewall bulk junction grading coefficient       MJSW     0.33               0.33


5. CMOS Inverter

Inverter as simplest logic gate

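[Figure: inverter symbol (VI, VO); inverter built from a load resistor R and an ideal switch at supply V+; NMOS inverter with switching transistor MS, load R, supply VDD and current iD; bipolar inverter with switching transistor QS, load R, supply VCC and current iC.]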


Logic Voltage Levels

VOL: Nominal voltage corresponding to a low logic state at the output of a logic gate for vI = VOH.

Generally V- ≤ VOL.

VOH: Nominal voltage corresponding to a high logic state at the output of a logic gate for vI = VOL.

Generally VOH ≤ V+.

VIL: Maximum input voltage that will be recognised as a low input logic level.

VIH: Minimum input voltage that will be recognised as a high input logic level.

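[Figure: voltage transfer characteristic vO versus vI between V- and V+, showing VOL, VOH, the two points with slope = -1 that define VIL and VIH, and the noise margins NML and NMH.]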


Noise Margins

NML: Noise margin associated with a low input level

NML = VIL - VOL

NMH: Noise margin associated with a high input level

NMH = VOH - VIH

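[Figure: voltage levels between V- and V+ at a gate output vO and at the following gate input vI, showing VOH, VIH, VIL and VOL, the noise margins NMH and NML, the "1" and "0" ranges, and the undefined logic state between VIL and VIH.]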


Dynamic Response of Logic Gates

• Rise time tr: time required for the transition from V10% to V90%.

• Fall time tf: time required for the transition from V90% to V10%.

V10% = VOL + 0.1(VOH - VOL)

V90% = VOL + 0.9(VOH - VOL)

• Propagation delay τP: difference in time between the input and output signals reaching V50%.

V50% = (VOH + VOL)/2

$$\tau_P = \frac{\tau_{PLH} + \tau_{PHL}}{2}$$

[Figure: Switching waveforms for an idealised inverter. (a) Input voltage signal vI with tr and tf measured between the 10% and 90% levels. (b) Output voltage waveform vO with τPHL and τPLH measured at the 50% level (VOH + VOL)/2; time marks t1 to t4.]


MOS Inverter with Resistive Load

• NMOS switching device MS designed to force vO to VOL

• Resistor load R to pull the output up toward the power supply VDD

• VOH = VDD (driver in cut off ⇒ iD = 0)

• VOL determined by W/L ratio of MS

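[Figure: NMOS inverter with resistive load: switching transistor MS (input vI, drain-source voltage vDS, drain current iD), load resistor R, output vO, supply VDD = 5 V.]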


Example

[Figure: resistive-load inverter with VDD = 5 V. (a) vI = VOL < VTH: MS is off, the current is 0 and vO = VOH = 5 V. (b) vI = VOH = 5 V: MS (W/L = 2.06/1) conducts, iDD = 50 µA flows through R = 95 kΩ and vO = VOL with vDS = 0.25 V.]


On - Resistance

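[Figure: equivalent circuits of the resistive-load inverter. (a) vI = VOL: switch open, vO = VOH. (b) vI = VOH: switch closed, Ron in series with R, vO = VOL.]

$$V_{OL} = V_{DD}\,\frac{R_{on}}{R_{on} + R} = V_{DD}\,\frac{1}{1 + \dfrac{R}{R_{on}}}$$

$$R_{on} = \frac{v_{DS}}{i_D} = \frac{1}{K'_n\,\dfrac{W}{L}\left(v_{GS} - V_{TN} - \dfrac{v_{DS}}{2}\right)}$$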


Transistor Alternatives to the Load Resistor

[Figure: four inverters with switching transistor MS, load transistor ML, input vI, output vO and supply VDD: (a) NMOS inverter with gate of the load device connected to its source, (b) NMOS inverter with gate of the load device grounded, (c) saturated load inverter, (d) linear load inverter (gate of the load device at VGG).]


CMOS Inverter Technology

[Figure: cross section of a CMOS inverter on a p-type substrate: NMOS transistor (n+ source and drain, p+ ohmic substrate contact B at VSS = 0 V) and PMOS transistor in an n-well (p+ source and drain, n+ ohmic well contact B at VDD = 5 V), with input vI and output vO.]

CMOS Transistor Parameters

        NMOS Device    PMOS Device
VTO     1 V            -1 V
γ       0.50 V^1/2     0.75 V^1/2
2φF     0.60 V         0.70 V
K'      25 µA/V^2      10 µA/V^2


Complementary MOS (CMOS) Logic Design

• Inverter with resistive load ⇒ power dissipation when the input is high.

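• If an NMOS and a PMOS transistor are used together ⇒ CMOS.

• One transistor is always off while the other is on ⇒ no static power consumption.

[Figure: CMOS inverter with MN and MP (gates G tied to vI, drains D tied to vO, sources S at ground and VDD = 5 V) and its switch model with the on-resistances Ronn and Ronp.]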


CMOS voltage transfer Characteristic

[Figure: CMOS voltage transfer characteristic vO versus vI (0 V to 5 V) with VIL, VIH and the lines vO = vI - VTP and vO = vI - VTN. Regions: 1 MN off; 2 MN saturated, MP linear; 3 MN and MP saturated; 4 MN linear, MP saturated; 5 MP off.]


Regions of Operation of Transistors in a Symmetrical Inverter

Region   Input Voltage vI                 Output Voltage vO   NMOS Transistor   PMOS Transistor
1        vI ≤ VTN                         VOH = VDD           Cutoff            Linear
2        VTN < vI ≤ vO + VTP              High                Saturation        Linear
3        vI ≈ VDD/2                       VDD/2               Saturation        Saturation
4        vO + VTN < vI ≤ (VDD + VTP)      Low                 Linear            Saturation
5        vI ≥ (VDD + VTP)                 VOL = 0             Linear            Cutoff


What happens, if the inverter is not symmetrical?

[Figure: voltage transfer characteristics vO versus vI, each with the line vO = vI. Left: symmetrical inverter (Kn = Kp) for VDD = 5 V, 4 V, 3 V and 2 V. Right: asymmetrical inverter (KR = Kn / Kp) for KR = 0.2, 1 and 5.]


Calculation of VIL

Equating currents for saturated nMOS and nonsaturated pMOS device (Region 2):

$$\frac{K_n}{2}\left(V_{in} - V_{Tn}\right)^2 = \frac{K_p}{2}\left[2\left(V_{DD} - V_{in} - V_{Tp}\right)\left(V_{DD} - V_{out}\right) - \left(V_{DD} - V_{out}\right)^2\right]$$

The derivative condition (dVout / dVin) = -1 has to be evaluated for IDn(Vin) = IDp(Vin , Vout):

$$\frac{dV_{out}}{dV_{in}} = \frac{dI_{Dn}/dV_{in} - \partial I_{Dp}/\partial V_{in}}{\partial I_{Dp}/\partial V_{out}} = -1$$

Evaluating the derivatives gives:

$$V_{IL}\left(1 + \frac{K_n}{K_p}\right) = 2 V_{out} + \frac{K_n}{K_p} V_{Tn} - V_{DD} - V_{Tp}$$

This equation has to be solved together with the first equation ⇒ VIL


Calculation of VIH

At the point VIH the NMOS device is nonsaturated and the PMOS transistor is saturated (region 4):

$$\frac{K_n}{2}\left[2\left(V_{IH} - V_{Tn}\right)V_{out} - V_{out}^2\right] = \frac{K_p}{2}\left(V_{DD} - V_{IH} - V_{Tp}\right)^2$$

The derivative condition (dVout / dVin) = -1 has to be evaluated for IDn(Vin, Vout) = IDp(Vin):

$$\frac{dV_{out}}{dV_{in}} = \frac{dI_{Dp}/dV_{in} - \partial I_{Dn}/\partial V_{in}}{\partial I_{Dn}/\partial V_{out}} = -1$$

which gives:

$$V_{IH}\left(1 + \frac{K_p}{K_n}\right) = 2 V_{out} + V_{Tn} + \frac{K_p}{K_n}\left(V_{DD} - V_{Tp}\right)$$

Together with the first equation, this forms a quadratic in VIH which has to be solved.
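Instead of solving the equation pairs for VIL and VIH by hand, both points can also be found numerically. The sketch below is one possible approach, not the method of the lecture: it computes the transfer curve by equating the two drain currents (λ = 0) and then searches for the inputs where the slope equals -1. All function names are hypothetical, VTp is treated as a magnitude as in the current equations above, and the device values (Kn = Kp = 50 µA/V², VTn = VTp = 1 V, VDD = 5 V) are assumed for illustration.

```python
import math


def nmos_current(vin, vout, kn, vtn):
    """NMOS drain current (lambda = 0), gate at vin, drain at vout."""
    vov = vin - vtn
    if vov <= 0.0:
        return 0.0                                        # cutoff
    if vout >= vov:
        return 0.5 * kn * vov ** 2                        # saturation
    return 0.5 * kn * (2.0 * vov * vout - vout ** 2)      # nonsaturation


def pmos_current(vin, vout, kp, vtp, vdd):
    """PMOS drain current (lambda = 0); vtp is the threshold magnitude."""
    vov = vdd - vin - vtp
    if vov <= 0.0:
        return 0.0
    vsd = vdd - vout
    if vsd >= vov:
        return 0.5 * kp * vov ** 2
    return 0.5 * kp * (2.0 * vov * vsd - vsd ** 2)


def vtc(vin, kn, kp, vtn, vtp, vdd):
    """Output voltage where both currents are equal (bisection on vout)."""
    lo, hi = 0.0, vdd
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if nmos_current(vin, mid, kn, vtn) > pmos_current(vin, mid, kp, vtp, vdd):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def gain(vin, kn, kp, vtn, vtp, vdd, h=1e-6):
    """Slope dVout/dVin by a central difference."""
    up = vtc(vin + h, kn, kp, vtn, vtp, vdd)
    down = vtc(vin - h, kn, kp, vtn, vtp, vdd)
    return (up - down) / (2.0 * h)


def unity_gain_point(lo, hi, kn, kp, vtn, vtp, vdd):
    """Input voltage in [lo, hi] where the slope equals -1 (bisection on vin)."""
    sign_lo = gain(lo, kn, kp, vtn, vtp, vdd) + 1.0 > 0.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if (gain(mid, kn, kp, vtn, vtp, vdd) + 1.0 > 0.0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed symmetrical inverter.
kn, kp, vtn, vtp, vdd = 50e-6, 50e-6, 1.0, 1.0, 5.0
r = math.sqrt(kp / kn)
vth = (vtn + r * (vdd - vtp)) / (1.0 + r)      # switching threshold
vil = unity_gain_point(vtn, vth - 1e-3, kn, kp, vtn, vtp, vdd)
vih = unity_gain_point(vth + 1e-3, vdd - vtp, kn, kp, vtn, vtp, vdd)
print("VIL =", vil, "VIH =", vih, "NML =", vil, "NMH =", vdd - vih)
```

For these symmetrical values the two hand equations are satisfied by VIL = 2.125 V (with Vout = 4.625 V) and, by symmetry, VIH = 2.875 V, which gives a quick check of the numerical search.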


Calculation of Vth

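For Vth = Vin = Vout both transistors are saturated (λ is assumed to be 0):

$$\frac{K_n}{2}\left(V_{th} - V_{Tn}\right)^2 = \frac{K_p}{2}\left(V_{DD} - V_{th} - V_{Tp}\right)^2$$

Solving for Vth yields:

$$V_{th} = \frac{V_{Tn} + \sqrt{K_p / K_n}\,\left(V_{DD} - V_{Tp}\right)}{1 + \sqrt{K_p / K_n}}$$

[Figure: CMOS voltage transfer characteristic vO versus vI (0 V to 5 V) with regions 1 to 5, VIL and VIH. Vth is the intersection of the curve with the line Vin = Vout, in region 3 where MN and MP are saturated.]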


Design of CMOS inverter (I)

• NMH = VOH - VIH = VDD - VIH

• NML = VIL - VOL = VIL - 0 = VIL

• KR = Kp / Kn

• Remember:

$$K_n = K'_n\left(\frac{W}{L}\right)_n \qquad K_p = K'_p\left(\frac{W}{L}\right)_p$$

⇒ Influence of the symmetry via W/L of transistors!

[Figure: noise margins NMH and NML (0.5 V to 4.0 V) as a function of KR (0 to 11).]


Design of CMOS inverter (II)

The ratio (W/L) in CMOS design is used to set the level of Vth:

$$\frac{K_p}{K_n} = \frac{\mu_p\,(W/L)_p}{\mu_n\,(W/L)_n}$$

The ratio required to establish a given inverter threshold voltage is:

$$\sqrt{\frac{K_n}{K_p}} = \frac{V_{DD} - V_{th} - V_{Tp}}{V_{th} - V_{Tn}}$$

To get a symmetrical voltage transfer curve, Vth is set to VDD/2:

$$\sqrt{\frac{K_n}{K_p}} = \frac{\tfrac{1}{2} V_{DD} - V_{Tp}}{\tfrac{1}{2} V_{DD} - V_{Tn}}$$

If in a process |VTp| = VTn, the device aspect ratios for a symmetrical inverter are related by:

$$\frac{(W/L)_p}{(W/L)_n} = \frac{\mu_n}{\mu_p}$$

Since µn / µp ≈ 2.5, a minimum area CMOS inverter will have (W/L)n ≈ 1 and (W/L)p ≈ 2.5. In this case the voltage transfer function is completely symmetric.
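The sizing relations of this slide can be checked with a few lines of code. This is a sketch only: the function names are made up, VTp is taken as a magnitude, and the ratio Kn/Kp is obtained by squaring the threshold relation Kn (Vth - VTn)² = Kp (VDD - Vth - VTp)².

```python
def kn_over_kp(vth, vdd, vtn, vtp):
    """Ratio Kn/Kp needed for a switching threshold vth (vtp = |VTp|)."""
    return ((vdd - vth - vtp) / (vth - vtn)) ** 2


def wl_p_over_wl_n(vth, vdd, vtn, vtp, mobility_ratio):
    """(W/L)p / (W/L)n for a given vth; mobility_ratio = mu_n / mu_p."""
    # Kp/Kn = mu_p (W/L)p / (mu_n (W/L)n)  =>  (W/L)p/(W/L)n = (mu_n/mu_p) * Kp/Kn
    return mobility_ratio / kn_over_kp(vth, vdd, vtn, vtp)


# Symmetrical case: Vth = VDD/2 with |VTp| = VTn = 1 V and mu_n/mu_p = 2.5.
symmetric_ratio = wl_p_over_wl_n(2.5, 5.0, 1.0, 1.0, 2.5)
# Shifting the threshold down to 2.0 V needs a stronger NMOS device.
k_ratio_low_vth = kn_over_kp(2.0, 5.0, 1.0, 1.0)
print(symmetric_ratio, k_ratio_low_vth)
```

For the symmetrical case the sketch reproduces (W/L)p ≈ 2.5 (W/L)n; lowering Vth to 2 V in the assumed example requires Kn = 4 Kp.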


Summary

So what did we accomplish until now?

• We know how a CMOS inverter works.

• VOL, VOH - do you still know it?

• We know how to set the W/L ratios of the transistors to get optimal noise margins.

• So we make every inverter the same, that is to say minimal, or do we?

[Figure: CMOS voltage transfer characteristic vO versus vI (0 V to 5 V) with regions 1 to 5, VIL and VIH.]


Dynamic Behavior of the CMOS Inverter: High to Low Output Transition (I)

[Figure: CMOS inverter (MN, MP, VDD = 5 V) with load capacitance C. The input vI steps from 0 V to 5 V at t = 0 with vO(0+) = 5 V. The output vO falls from VOH = 5 V towards VOL = 0 V: MN is saturated between t1 and tx and nonsaturated between tx and t2; the boundary lies at vO = Vin - VTn.]

MN goes from cutoff through saturation into the nonsaturation region for the given input. The border between saturation and nonsaturation is reached at the time tx, where the output voltage is Vout = VOH - VTn.


High to Low Output Transition (II)

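In order to simplify the final expressions, the integrations for computing tHL are carried out with the limits VDD and V0 (V1 = 0.9 VDD, V0 = 0.1 VDD).

$$i = \frac{dQ}{dt} = C_{OUT}\,\frac{dV_{OUT}}{dt} \;\Rightarrow\; \int dt = C_{OUT}\int \frac{dV_{OUT}}{i}$$

Saturation:

$$t_x - t_1 = -C_{OUT}\int_{V_{DD}}^{V_{DD} - V_{Tn}} \frac{dV_{OUT}}{\dfrac{K_n}{2}\left(V_{DD} - V_{Tn}\right)^2} = \frac{2\,C_{OUT}\,V_{Tn}}{K_n\left(V_{DD} - V_{Tn}\right)^2}$$

Nonsaturation:

$$t_2 - t_x = -C_{OUT}\int_{V_{DD} - V_{Tn}}^{V_0} \frac{dV_{OUT}}{\dfrac{K_n}{2}\left[2\left(V_{DD} - V_{Tn}\right)V_{OUT} - V_{OUT}^2\right]}$$

$$= -\frac{C_{OUT}}{K_n\left(V_{DD} - V_{Tn}\right)}\left[\ln\left(\frac{V_{OUT}}{2\left(V_{DD} - V_{Tn}\right) - V_{OUT}}\right)\right]_{V_{DD} - V_{Tn}}^{V_0}$$

$$= \frac{C_{OUT}}{K_n\left(V_{DD} - V_{Tn}\right)}\,\ln\left(\frac{2\left(V_{DD} - V_{Tn}\right)}{V_0} - 1\right)$$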


High to Low Output Transition (III)

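$$t_{HL} = (t_x - t_1) + (t_2 - t_x)$$

therefore:

$$t_{HL} = \tau\left[\frac{2 V_{Tn}}{V_{DD} - V_{Tn}} + \ln\left(\frac{2\left(V_{DD} - V_{Tn}\right)}{V_0} - 1\right)\right]$$

where

$$\tau = \frac{C_{OUT}}{K_n\left(V_{DD} - V_{Tn}\right)}$$

We have used the following integral:

$$\int \frac{dx}{x\left(a + b x^n\right)} = \frac{1}{a n}\,\ln\left(\frac{x^n}{a + b x^n}\right)$$

In our case: n = 1, b = -1

$$\int \frac{dx}{a x - x^2} = \frac{1}{a}\,\ln\left(\frac{x}{a - x}\right)$$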
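The closed-form fall time can be cross-checked against a direct numerical integration of dt = C dV / i over the same limits (VDD down to V0). The sketch below assumes Kn = 50 µA/V² (K'n = 25 µA/V² with W/L = 2/1), C = 1 pF, VDD = 5 V and VTn = 1 V; the function names are hypothetical.

```python
import math


def t_hl_closed_form(c, kn, vdd, vtn, v0):
    """t_HL = tau * [2*VTn/(VDD - VTn) + ln(2*(VDD - VTn)/V0 - 1)]."""
    tau = c / (kn * (vdd - vtn))
    return tau * (2.0 * vtn / (vdd - vtn) + math.log(2.0 * (vdd - vtn) / v0 - 1.0))


def discharge_current(vout, kn, vdd, vtn):
    """NMOS current with the gate at VDD (lambda = 0)."""
    vov = vdd - vtn
    if vout >= vov:
        return 0.5 * kn * vov ** 2                       # saturation
    return 0.5 * kn * (2.0 * vov * vout - vout ** 2)     # nonsaturation


def t_hl_numeric(c, kn, vdd, vtn, v0, steps=100000):
    """Midpoint-rule integration of C dV / i from V0 up to VDD."""
    dv = (vdd - v0) / steps
    total = 0.0
    for k in range(steps):
        v = v0 + (k + 0.5) * dv
        total += c * dv / discharge_current(v, kn, vdd, vtn)
    return total


c_load, kn, vdd, vtn = 1e-12, 50e-6, 5.0, 1.0
v0 = 0.1 * vdd
t_formula = t_hl_closed_form(c_load, kn, vdd, vtn, v0)
t_numeric = t_hl_numeric(c_load, kn, vdd, vtn, v0)
print(t_formula, t_numeric)
```

Because the integration starts at VDD rather than at V1 = 0.9 VDD, the result is slightly larger than a strict 90% to 10% fall time.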


Low to high output transition

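From symmetry (VTn → VTp; Kn → Kp), the low to high transition time follows:

$$t_{LH} = \frac{C_{OUT}}{K_p\left(V_{DD} - V_{Tp}\right)}\left[\frac{2 V_{Tp}}{V_{DD} - V_{Tp}} + \ln\left(\frac{2\left(V_{DD} - V_{Tp}\right)}{V_0} - 1\right)\right]$$

[Figure: CMOS inverter (MN, MP, VDD = 5 V) with load capacitance C. The input vI steps from 5 V to 0 V at t = 0 with vO(0+) = 0 V; the output vO rises from 0 V to 5 V.]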


Dynamic Behavior of the CMOS Inverter (cont’d)

• The choice of size of the NMOS and PMOS transistors can be dictated by the desired average propagation delay τP

• For a symmetrical inverter ($K'_n \approx 2.5\,K'_p$):

$$\tau_P = \frac{t_{PHL} + t_{PLH}}{2} = t_{PHL} = t_{PLH}$$

$$t_r = t_f = 2\,\tau_P$$

Example:

Symmetrical reference inverter: (W/L)n = 2/1, (W/L)p = 5/1, VDD = 5 V, | VTP | = VTN = 1 V, C = 1 pF, τP = 6.4 ns, tr = tf = 12.8 ns

Scaled inverters:

a) τP = 1 ns with C = 1 pF: (W/L)n = 13/1, (W/L)p = 32.5/1

b) τP = 3.2 ns with C = 2 pF: (W/L)n = 8/1, (W/L)p = 20/1


Power Dissipation

• Two kinds of power dissipation in digital electronics:

– static power dissipation (logic gate output is stable)

– dynamic power dissipation (during switching of logic gate)

• With CMOS nearly no static power dissipation!

[Figure: output voltage (0 V to 6 V) and drain current (0 A to 40 µA) of the CMOS inverter as a function of vI (0 V to 6 V).]


Dynamic Power Dissipation (I)

Power dissipation due to charge and discharge of capacitances

[Figure: source VDD charging the capacitor C through a non-linear resistor R; the switch closes at t = 0, vC(0) = 0, current i(t), capacitor voltage vC(t).]

The total energy ED delivered by the source is given by

$$E_D = \int_0^\infty P(t)\,dt$$

The power P(t) = VDD i(t), and because VDD is a constant,

$$E_D = \int_0^\infty V_{DD}\,i(t)\,dt = V_{DD}\int_0^\infty i(t)\,dt$$

The current supplied by source VDD is also equal to the current in capacitor C, and so

$$E_D = V_{DD}\int_0^\infty C\,\frac{dv_C}{dt}\,dt = C\,V_{DD}\int_{V_C(0)}^{V_C(\infty)} dv_C$$


Dynamic Power Dissipation (II)

Integrating from t = 0 to t = ∞, with VC(0) = 0 and VC (∞) = VDD results in

$$E_D = C\,V_{DD}^2$$

We know that the energy ES stored in capacitor C is given by

$$E_S = \frac{C\,V_{DD}^2}{2}$$

and thus the energy EL lost in the resistive element must be

$$E_L = E_D - E_S = \frac{C\,V_{DD}^2}{2}$$

The total energy ETD dissipated in the process of first charging and then discharging the capacitor is equal to

$$E_{TD} = \left(\frac{C\,V_{DD}^2}{2}\right)_{Charge} + \left(\frac{C\,V_{DD}^2}{2}\right)_{Discharge} = C\,V_{DD}^2$$


Dynamic Power Dissipation (III)

Thus, every time a logic gate goes through a complete switching cycle, the transistors within the gate dissipate an energy equal to ETD. Logic gates normally switch states at some relatively high frequency (switching events/second), and the dynamic power PD dissipated by the logic gate is then

$$P_D = C\,V_{DD}^2\,f$$

In effect, an average current equal to (CVDDf) is supplied from the source VDD.
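A short numerical sketch of these relations follows. It assumes a linear resistor for the charging path (the text allows a non-linear one, and the delivered energy does not depend on R) and picks C = 1 pF, VDD = 5 V, R = 10 kΩ and f = 10 MHz purely as example values; the function names are hypothetical.

```python
import math


def dynamic_power(c, vdd, f):
    """P_D = C * VDD^2 * f."""
    return c * vdd ** 2 * f


def average_supply_current(c, vdd, f):
    """Average current C * VDD * f drawn from the supply."""
    return c * vdd * f


def delivered_energy_rc(c, vdd, r, steps=20000, span=20.0):
    """Energy delivered by the source while charging C through a linear R.

    Integrates VDD * i(t) with i(t) = (VDD/R) * exp(-t/(R*C)) by the
    midpoint rule over `span` time constants.
    """
    tau = r * c
    dt = span * tau / steps
    energy = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        energy += vdd * (vdd / r) * math.exp(-t / tau) * dt
    return energy


c_load, vdd, r_load, freq = 1e-12, 5.0, 10e3, 10e6
e_delivered = delivered_energy_rc(c_load, vdd, r_load)
e_stored = 0.5 * c_load * vdd ** 2
p_dyn = dynamic_power(c_load, vdd, freq)
i_avg = average_supply_current(c_load, vdd, freq)
print(e_delivered, e_stored, p_dyn, i_avg)
```

With these assumed values the source delivers about 25 pJ per charging event, half of which ends up stored in C, and the gate dissipates 250 µW at 10 MHz.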


Dynamic Power Dissipation (IV)

• Power dissipation due to the “short circuit current” (when both transistors are on during transition)

• The short circuit current reaches a peak for Vin = Vout = VDD/2

[Figure: switch model of the CMOS inverter (Ronp, Ronn, VDD = 5 V) at Vin = Vout = VDD/2, and simulated waveforms of vI, vO (0 V to 5 V) and the supply current iDD (0 µA to 30 µA) over 0 ns to 16 ns.]


Summary

Let’s repeat:

• What is the dynamic behaviour of the inverter?

• What do we need it for?

• What kind of power dissipation is there?

• What kind of power dissipation is dominant with CMOS logic?

[Figure: output voltage (0 V to 6 V) and drain current (0 A to 40 µA) of the CMOS inverter as a function of vI (0 V to 6 V).]

$$P_D = C\,V_{DD}^2\,f$$

6. CMOS Technology


CMOS Technology

• Basic Fabrication Operations

• Steps for Fabricating a NMOS Transistor

• LOCOS Process

• n-Well CMOS Technology

• Layout Design Rules

• CMOS Inverter Layout Design

• Circuit Extraction, Electrical Process Parameters

• Layout Tool Demonstration

• Appendix: MOSIS, EUROPRACTICE


Wafer Terminology

1. Chip = Die = Microchip = Bar

2. Scribe Lines

3. Engineering Test Die

4. Edge Die

5. Crystal Planes

6. Wafer Flats


Basic Wafer Fabrication Operations

The number of steps in the IC fabrication flow depends upon the technology process and the complexity of the circuit

Example:

CMOS n-Well process - 30 major steps, and each major step may involve up to 15 substeps

Only three basic operations are performed on the wafer:

• Layering

• Patterning

• Doping


Layering

Grow or deposit thin layers of different materials on the wafer surface

Techniques: thermal oxidation, chemical vapor deposition (CVD), evaporation, sputtering

Layers:

• Insulators: silicon dioxide (SiO2), silicon nitrides (Si3N4), silicon monoxide (SiO)

• Semiconductors: epitaxial silicon, polysilicon

• Conductors: doped polysilicon, metals, Al/Si alloys, silicides, alloys


Layering - Thermal Oxidation

SiO2 functions: surface passivation, diffusion barrier, field oxide, MOS gate oxide

Dry oxidation:

Si + O2 → SiO2 (900-1200°C)

700nm oxide: 10 hours (1200°C)

Good oxide quality: gate oxide

Wet oxidation (water vapor or steam):

Si + 2H2O → SiO2 + 2H2 (900-1200°C)

700nm oxide: 0.65 hours (1200°C)

Poor oxide quality: field oxide

Natural oxide: silicon will readily grow an oxide (5-10nm) if exposed to oxygen in the air!

The range for useful oxide thickness: 25nm (MOS gates) - 1500nm (field oxide)


Layering - Chemical Vapor Deposition (CVD)

Deposited materials:

• Insulators & Dielectrics: SiO2, Si3N4, Phosphorus Silicate Glass (PSG), Doped Oxide

• Semiconductors: Si

• Conductors: Al, Cu, Ni, Au, Pt, Ti, W, Mo, Cr, Silicides (WSi2, MoSi2), doped polysilicon

Basic CVD processing:

• a gas containing an atom(s) of the material to be deposited reacts with another gas liberating the desired material

• the freed material (atom or molecular form) “deposits” on the substrate

• the unwanted products of the chemical reaction leave the reaction chamber

Example: CVD of silicon from silicon tetrachloride

SiCl4 + 2H2 → Si + 4HCl↑


Layering - Evaporation

Used to deposit conductive layers (metallization): Al, Al/Si, Al/Cu, Au, Mo, Pt

When the temperature is raised high enough, atoms of the solid material (Al) will melt and “evaporate” into the atmosphere and deposit onto the wafer

The evaporation takes place in an evacuated chamber; otherwise Al would combine with oxygen in the air to form Al2O3

The external energy needed to evaporate the metal is provided by:

1. A current flowing through a filament

2. Flash system

3. Electron beam

[Figure: evaporation systems with evaporation source (Al/Si alloy; Al in a crucible with magnet), heater, wafer and vacuum pump; high vacuum (10^-5 to 10^-7 torr).]


Layering - Sputtering

Used to deposit thin metal/alloys films and insulators: Al, Ti, Mo, Al/Si, Al/Cu, SiO2

Sputtering process:

• ionized argon atoms (+) are introduced into an evacuated chamber

• the target (Al) is maintained at negative potential

• the argon ions are accelerated towards the negative charge

• following the impact some of the target material atoms tear off

• the liberated material settles on everything in the chamber, including the wafers

The material to be sputtered does not have to be heated


Patterning

• Patterning = Lithography = Masking

• Selective removal of the top layer(s) on the wafers

• Ex.: Process steps required for patterning SiO2

1. Initial structure: Si substrate (wafer) covered with SiO2

2. Photoresist deposition

3. UV exposure through the mask (leaves soluble and insoluble photoresist)

4. Soluble photoresist etching

5. SiO2 etching (chemical/dry etch)

6. Photoresist etching


Doping

• Change conductivity type and resistivity in selected regions of the wafer

• The dopants enter the wafer through the holes patterned in the surface layer

• Two techniques are used: thermal diffusion and ion implantation

Thermal diffusion:

- heat the wafer to the vicinity of 1000°C

- expose the wafer to vapors containing the desired dopant

- the dopant atoms diffuse into the wafer surface creating a p/n region

Ion implantation:

- room temperature

- dopant atoms are accelerated to a high speed and “shot” into the wafer surface

- an annealing (heating) step is necessary to reorder the crystal structure damaged by the implant


NMOS Transistor Fabrication - process flow

1. Si substrate (p)

2. Oxidation (layering): SiO2 field oxide (thick oxide)

3. Oxide etching (patterning)

4. SiO2 gate oxide (thin oxide)

5. Polysilicon deposition (layering)

6. Polysilicon etching (patterning)

7. Ion implantation (doping): n-type dopant forms the n+ source and drain regions

8. SiO2 insulating oxide

9. Contact windows

10. Al evaporation: metal deposition (layering)

11. Metal etching (patterning): source (S), gate (G) and drain (D) terminals on the p-type Si substrate


Device Isolation Techniques

MOS transistors must be electrically isolated from each other in order to:

• prevent unwanted conduction paths between devices

• avoid creation of inversion layers outside the channel regions

• reduce the leakage currents

Each device is created in dedicated regions - active areas

Each active area is surrounded by a field oxide barrier, created with one of the following techniques:

A) Etched field-oxide isolation

1) grow a field oxide over the entire surface of the chip

2) pattern the oxide and define active areas

Drawbacks: - large oxide steps at the boundaries between active areas and field regions!

- cracking of subsequently deposited polysilicon/metal layers!

Not used!

B) Local Oxidation of Silicon (LOCOS)


Local Oxidation of Silicon (LOCOS) (1)

More planar surface topology

Selectively growing the field oxide in certain regions - process flow:

1) grow a thin pad oxide (SiO2) on the silicon surface; the thin pad oxide protects the silicon surface from stress caused by the nitride

2) define the active area: deposit and pattern a silicon nitride (Si3N4) layer

3) channel stop implant: p-type (p+) regions that surround the transistors


Local Oxidation of Silicon (LOCOS) (2)

4) Grow a thick field oxide

The field oxide is partially recessed into the surface (oxidation consumes some of the silicon)

The field oxide forms a lateral extension under the nitride layer - the bird’s beak region

The bird’s beak region limits device scaling and device density in VLSI circuits!

5) Etch the nitride layer and the thin pad oxide layer to expose the active areas


n-Well CMOS Technology - simplified process sequence

Create n-well regions (PMOS transistors) and channel stop regions

Grow field oxide and gate oxide

Deposit and pattern polysilicon layer

Implant source and drain regions, substrate contacts

Create contact windows, deposit and pattern metal layer


n-Well CMOS Technology - Inverter Example

• Process starts with a moderately doped (10^15 cm^-3) p-type substrate (wafer)

• An initial oxide layer is grown on the entire surface (barrier oxide)

[Figure: cross section of the p-type Si substrate covered with SiO2.]


1. n-Well mask - defines the n-Well regions

• Pattern the oxide

• Implant n-type impurity atoms (phosphorus) - 10^16 cm^-3

• Drive-in the impurities (vertical but also lateral redistribution - limits the density)

[Figure: cross section with the n-well in the p-type Si substrate and SiO2 on the surface.]


2. Active area mask - defines the regions in which MOS devices will be created

• LOCOS process to isolate NMOS and PMOS transistors

• lateral penetration of bird’s beak region ~ oxide thickness

• channel stop p+ implants (boron)

• Grow gate oxide (dry oxidation) - only in the open area of the active region

[Figure: cross section with n-well, SiO2, p+ channel stop and p-type Si substrate.]


3. Polysilicon mask - defines the gates of the MOS transistors

• Polysilicon is deposited over the entire wafer (CVD process) and doped (typically n-type)

• Pattern the polysilicon in a dry (plasma) etching process

• Etch the gate oxide

[Figure: cross section with polysilicon gates, n-well, SiO2, p+ channel stop and p-type Si substrate.]


4. n-Select mask - defines the n+ source/drain regions of NMOS transistors

• Define an ohmic contact to the n-well

• Implant n-type impurity atoms (arsenic)

• Polysilicon layer protects transistor channel regions from the arsenic dopant

[Figure: cross section with the n+ source (S) and drain (D) regions of the NMOS transistor and the n+ n-well ohmic contact.]


5. Complement of the n-select mask - defines the p+ source/drain regions of PMOS transistors

• Define the ohmic contacts to the substrate

• Implant p-type impurity atoms (boron)

• Polysilicon layer protects transistor channel regions from the boron dopant

[Figure: cross section with the p+ source (S) and drain (D) regions of the PMOS transistor in the n-well and the p+ substrate ohmic contact.]


• In the n-well two p+ regions and one n+ region are created

• After source/drain implantation a short thermal process is performed (annealing):

• moderate temperature

• drive the impurities deeper into the substrate

• repair some of the crystal structure damage

• lateral diffusion under the gate: overlap capacitances

• Next the SiO2 insulating layer is deposited over the entire wafer area using a CVD technique

• The surface becomes nonplanar: impact on the metal deposition step

[Figure: cross section after deposition of the SiO2 insulating layer over both transistors.]


6. Contact mask - defines the contact cuts in the insulating layer

• Contacts to polysilicon must be made outside the gate region (avoid metal spikes through the poly and the thin gate oxide)

[Figure: cross section with contact windows etched into the SiO2 insulating layer.]


7. Metallization mask - defines the interconnection pattern

• Aluminum is deposited over the entire wafer (evaporation) and selectively etched

• The step coverage in this process is most critical (nonplanarity of the wafer surface)

[Figure: cross section with the patterned metal layer contacting the source and drain regions.]


• The final step: the entire surface is passivated (overglass layer)

• Protect the surface from contaminants and scratches

• Then, openings are etched to the bond pads to allow for wire bonding


[Figure: layout and cross section of the n-well CMOS inverter: N-channel transistor in the p-type Si substrate and P-channel transistor in the n-well, with gate oxide, poly, SiO2 and metal layers and the terminals In, Out, GND and VDD.]


Design Rules

• Interface between designer and process engineer

• Guidelines for constructing process masks

• Unit dimension: minimum line width

• Scalable design rules - lambda (λ) parameter:

– define all rules as a function of a single parameter λ

– scaling of the minimum dimension: change the value of λ - linear scaling!

– linear scaling is only possible over a limited range of dimensions (1-3µm)

– are conservative: they have to represent the worst case rules for the whole set

– are a flexible and versatile design methodology for small projects

• Micron rules - absolute dimensions:

– can exploit the features of a given process to a maximum degree

– scaling and porting designs between technologies is more demanding: manually or using advanced CAD tools!

• Ex.: Scalable CMOS design rules


CMOS Process Layers

Layer                    Color Representation
Well (p,n)               Yellow
Active Area (n+,p+)      Green
Select (p+,n+)           Green
Polysilicon              Red
Metal1                   Blue
Metal2                   Magenta
Contact To Poly          Black
Contact To Diffusion     Black
Via                      Black


Intra-Layer Design Rules (λ)

[Figure: minimum dimensions and distances (in λ) within one layer for well (same potential and different potential), active, select, polysilicon, metal1, metal2 and contact/via hole.]


Inter-Layer Design Rules - Transistor Layout (λ)

[Figure: transistor formed by poly crossing active (n+), with the minimum overlaps and distances (in λ), including the distance to the well boundary.]


Inter-Layer Design Rules - Contact and Via (λ)

[Figure: minimum dimensions and overlaps (in λ) for the metal to active contact (m1 to n+), the metal to poly contact (m1 to poly) and the metal1 to metal2 contact (via between m1 and m2).]

Select Layer (λ)

[Figure: minimum dimensions and overlaps (in λ) of the select layer, shown for a contact to the well and a contact to the substrate.]


CMOS Inverter Layout

[Figure: layout and cross section of the CMOS inverter: N-channel transistor in the p-type Si substrate and P-channel transistor in the n-well, with gate oxide, poly, SiO2 and metal layers and the terminals In, Out, GND and VDD.]


CMOS Latchup

[Figure: cross section of the CMOS inverter (p-type substrate, n-well, VDD = 5 V, VSS = 0 V, output vO) with the parasitic pnp and npn transistors and the resistances Rn and Rp.]

• The parasitic bipolar transistors can destroy the CMOS circuitry

• The bipolar devices are normally inactive

• The collector of each bipolar transistor is connected to the base of the other in a positive feedback structure

• The latchup effect can occur when:

1. Both bipolar transistors conduct

2. Product of gains of the 2 transistors in the feedback loop exceeds unity (βPβN > 1)


7. Complementary MOS (CMOS) Logic Design

Basic CMOS Logic Gate Structure

• PMOS and NMOS switching networks are complementary

⇒Either the PMOS or the NMOS network is on while the other is off

⇒No static power dissipation

[Figure: basic CMOS logic gate: the logic inputs drive a PMOS switching network between VDD and the output Y and an NMOS switching network between Y and ground.]


CMOS NOR Gate

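[Figure: reference inverter (MN 2/1, MP 5/1, VDD = 5 V) and two-input CMOS NOR gate (two series PMOS devices 10/1, two parallel NMOS devices 2/1, VDD = 5 V) with inputs A, B and output Z.]

NOR Gate Truth Table

A   B   $Z = \overline{A + B}$
0   0   1
0   1   0
1   0   0
1   1   0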


Transistor Sizing for CMOS Gates: Review

Goal: To keep the delay times equal to those of the reference inverter design under the worst-case input conditions

Example: 2 input CMOS NOR gate

- Each transistor of the NMOS network is capable of discharging the load capacitance C individually ⇒ Same size as the NMOS transistor of the reference inverter

- PMOS network conducts only when AB = 00 (transistors in series) ⇒ Each PMOS must be twice as large (on-resistance proportional to (W/L)^-1)


CMOS NAND Gate

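[Figure: reference inverter (MN 2/1, MP 5/1, VDD = 5 V) and two-input CMOS NAND gate (two parallel PMOS devices 5/1, two series NMOS devices 4/1, VDD = 5 V) with inputs A, B and output Z.]

NAND Gate Truth Table

A   B   $Z = \overline{AB}$
0   0   1
0   1   1
1   0   1
1   1   0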


Multi-Input NAND Gate

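[Figure: five-input CMOS NAND gate (VDD = 5 V) with inputs A to E, output Y and load C: five parallel PMOS devices 5/1 and five series NMOS devices 10/1.]

$$Y = \overline{ABCDE}$$

Why should one prefer a NAND gate rather than a NOR gate?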


Steps in Constructing Graphs for NMOS and PMOS Networks (I)

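$$Y = \overline{A + B\,(C + D)}$$

[Figure: NMOS network for Y with transistors MA, MB, MC and MD below a PMOS switch network (inputs A, B, C, D) connected to +5 V. The NMOS network is built up from the sub-expressions C + D, B (C + D) and A + B (C + D).]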


Steps in Constructing Graphs for NMOS and PMOS Networks (II)

[Figure: (a) NMOS network (MA 2/1, MB, MC and MD 4/1) with numbered nodes below the PMOS switch network, (b) NMOS graph with arcs A, B, C and D, (c) NMOS graph with new nodes added, (d) graph with PMOS arcs added.]

Steps in Constructing Graphs for NMOS and PMOS Networks (III)

[Figure: final CMOS circuit for Y at +5 V, derived from the graph with PMOS arcs added. NMOS network: MA 2/1, MB, MC and MD 4/1. PMOS network: three devices sized 15/1 and one device sized 7.5/1.]


Summary

• AND - serially connected FET

• OR - parallel connected FET

• NMOS network implements “zeros”

• PMOS network implements “ones”

• W/L ratio has to be determined as a design parameter
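The series/parallel rules can be checked by brute force. The sketch below models the two switch networks of the example gate as Boolean functions (series = AND, parallel = OR, PMOS devices conduct for a low input) and verifies that exactly one network conducts for every input combination. The function names are hypothetical, and the gate is assumed to implement the complement of A + B (C + D).

```python
from itertools import product


def nmos_network_conducts(a, b, c, d):
    """Pull-down path: MA in parallel with (MB in series with (MC parallel MD))."""
    return a or (b and (c or d))


def pmos_network_conducts(a, b, c, d):
    """Pull-up path: dual structure, each PMOS conducts when its input is low."""
    return (not a) and ((not b) or ((not c) and (not d)))


def gate_output(a, b, c, d):
    """Output Y: high when the PMOS network connects Y to the supply."""
    return pmos_network_conducts(a, b, c, d)


complementary = all(
    nmos_network_conducts(*bits) != pmos_network_conducts(*bits)
    for bits in product([False, True], repeat=4)
)
ones = sum(gate_output(*bits) for bits in product([False, True], repeat=4))
print(complementary, ones)
```

Because the two networks never conduct together, there is no static path from the supply to ground, which is the property stated for the basic CMOS gate structure.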



CMOS Gate Design: Minimum Size Vs. Performance (I)

CMOS circuit with only minimum size transistors

Considerable savings in chip area, but increased logic delay

Example:


CMOS Gate Design: Minimum Size Vs. Performance (II)

(W/L) for PMOS network = 2/3:

$$\tau_{PLH} = \frac{(5/1)}{(2/3)}\,\tau_{PLHI} = 7.5\,\tau_{PLHI}$$

where τPLHI = τPLH of the reference inverter.

For the NMOS network:

$$\tau_{PHL} = 2\,\tau_{PHLI}$$

The average propagation delay of the minimum size logic gate is:

$$\tau_P = \frac{\tau_{PHL} + \tau_{PLH}}{2} = \frac{2\,\tau_{PHLI} + 7.5\,\tau_{PLHI}}{2} = \frac{9.5\,\tau_{PLHI}}{2} = 4.75\,\tau_{PLHI}$$

The minimum size gate will be 4.75 times slower than the reference inverter when driving the same load capacitance.
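The factors 7.5, 2 and 4.75 follow from treating series transistors like series resistors, with on-resistance proportional to (W/L)^-1. The sketch below reproduces them under the assumption that the minimum size gate has three 2/1 PMOS devices and two 2/1 NMOS devices in its worst-case paths and that the reference inverter uses 2/1 and 5/1; the helper name is hypothetical.

```python
def series_wl(*ratios):
    """Equivalent W/L of transistors in series (on-resistances add up)."""
    return 1.0 / sum(1.0 / r for r in ratios)


REF_N, REF_P = 2.0, 5.0      # reference inverter (W/L)n and (W/L)p
MIN_SIZE = 2.0               # every transistor drawn as 2/1

pmos_path = series_wl(MIN_SIZE, MIN_SIZE, MIN_SIZE)   # worst-case pull-up path
nmos_path = series_wl(MIN_SIZE, MIN_SIZE)             # worst-case pull-down path

plh_factor = REF_P / pmos_path       # tau_PLH relative to the reference inverter
phl_factor = REF_N / nmos_path       # tau_PHL relative to the reference inverter
average_factor = 0.5 * (plh_factor + phl_factor)

# Sizing the other way round: a PMOS device in series with one 15/1 device
# must be 7.5/1 for the path to match the 5/1 reference.
matched_path = series_wl(15.0, 7.5)
print(pmos_path, plh_factor, phl_factor, average_factor, matched_path)
```

The last line uses the same helper in the opposite direction, as a check of the 15/1 and 7.5/1 device sizes of the sized gate.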


Power-Delay Product (PDP)

The PDP is an important figure of merit for a logic technology

$$PDP = P_{AV}\,\tau_P$$

For CMOS: $P_{AV} = C\,V_{DD}^2\,f$ with $f = 1/T$

CMOS switching waveform


Power-Delay Product (cont’d)

• The period T must satisfy: $T \ge t_r + t_a + t_f + t_b$

• Assumptions: At high frequencies $t_a \to 0$ and $t_b \to 0$; $t_r$ and $t_f$ account for approximately 80 % of the total transition time

For symmetrical inverter:

$$T \ge \frac{2\,t_r}{0.8} = \frac{2\,(2\,\tau_P)}{0.8} = 5\,\tau_P$$

$$PDP \le \frac{C\,V_{DD}^2}{5\,\tau_P}\,\tau_P = \frac{C\,V_{DD}^2}{5}$$


8. Passtransistor and Transmission Gate Logic


Passtransistor Logic: Basic Principle

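Idea: [Figure: switch between Vin and Vout operated by a control signal]

Implementation: [Figure: MOS transistor between Vin and Vout with the control signal at its gate]

Vin  control  Vout
1    0        x
1    1        1
0    0        x
0    1        0

Control: 0 = open, 1 = closed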


Passtransistor Logic: NEXOR Realisation

[Figure: passtransistor NEXOR circuit with inputs A and B and output OUT]

A  B  |  OUT
0  0  |  1
0  1  |  0
1  0  |  0
1  1  |  1


Passtransistor: Charging Characteristics

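NMOS:

[Figure: NMOS passtransistor with input $V_{in} = V_{DD}$, gate voltage $V_{ctrl}(t)$, gate-source voltage $V_{GS}$ and output $V_{out}(t)$ across $C_{out}$]

$V_{in} = V_{DD}$, $V_{out}(t = 0) = 0$

$V_{ctrl}(t < 0) = 0$, $V_{ctrl}(t \ge 0) = V_{DD}$

[Plot: $V_{out}(t)$ versus t, rising toward $V_{DD} - V_T(V_{SB})$]

The transistor is in saturation during the charging process.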


Passtransistor Cascades

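[Figure: series cascade of NMOS passtransistors with all gates at $V_{DD}$, input $V_{in} = V_{DD}$ and load $C_{out}$; every node along the chain reaches $V_{max}$]

$$V_{max} = V_{DD} - V_T(V_{max})$$

[Figure: cascade in which the output of the first passtransistor ($V_{max,1}$) drives the gate of the second, inputs $V_{in} = V_{DD}$, output $V_{max,2}$ across $C_{out}$]

$$V_{max,1} = V_{DD} - V_T(V_{max,1})$$

$$V_{max,2} = V_{max,1} - V_T(V_{max,2}) \approx V_{DD} - 2\,V_T$$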
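The implicit equation $V_{max} = V_{DD} - V_T(V_{max})$ can be solved numerically. The following is a minimal sketch, assuming the standard square-root body-effect model for $V_T(V_{SB})$ and the NMOS parameters listed later for the transmission gate ($V_{TO} = 0.75$ V, $\gamma = 0.5\ \mathrm{V^{0.5}}$, $2\phi_F = 0.6$ V). The function names and the supply voltage of 5 V are illustrative choices, not taken from the slides.

```python
import math

def threshold(v_sb, vt0=0.75, gamma=0.5, two_phi_f=0.6):
    """Threshold voltage V_T(V_SB) in volts (square-root body-effect model, assumed)."""
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))

def max_pass_voltage(v_gate, iterations=50):
    """Solve V = v_gate - V_T(V) by fixed-point iteration."""
    v = v_gate
    for _ in range(iterations):
        v = v_gate - threshold(v)
    return v

V_DD = 5.0                              # assumed supply voltage
v_max1 = max_pass_voltage(V_DD)         # gate of first transistor at V_DD
v_max2 = max_pass_voltage(v_max1)       # gate of second transistor at V_max,1
print(f"V_max,1 = {v_max1:.2f} V, V_max,2 = {v_max2:.2f} V")
```

With these assumed parameters each stage loses more than the zero-bias threshold, because the body effect raises $V_T$ as the source voltage rises.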


Passtransistor: Discharging Characteristics

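NMOS:

[Figure: NMOS passtransistor with input $V_{in} = 0$, gate voltage $V_{ctrl}(t)$, gate-source voltage $V_{GS}$ and output $V_{out}(t)$ across $C_{out}$]

$V_{in} = 0$, $V_{out}(t = 0) = V_{DD} - V_T(V_{SB})$

$V_{ctrl}(t < 0) = 0$, $V_{ctrl}(t \ge 0) = V_{DD}$

[Plot: $V_{out}(t)$ versus t, falling from $V_{DD} - V_T(V_{SB})$]

The transistor is always in nonsaturation during the discharging process.

NMOS passtransistor: discharging is faster than charging, since the device impedance is lower in nonsaturation than in saturation.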


Passtransistor: Charging Characteristics

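PMOS Charging Process:

[Figure: PMOS passtransistor with input $V_{in} = V_{DD}$, gate voltage $V_{ctrl}(t)$, gate-source voltage $V_{GS}$ and output $V_{out}(t)$ across $C_{out}$]

$V_{in} = V_{DD}$, $V_{out}(t = 0) = 0$

$V_{ctrl}(t < 0) = V_{DD}$, $V_{ctrl}(t \ge 0) = 0$

The output is charged to $V_{DD}$ (the transistor is initially saturated and then goes into nonsaturated mode).

PMOS Discharging Process:

[Figure: PMOS passtransistor with input $V_{in} = 0$, gate voltage $V_{ctrl}(t)$, gate-source voltage $V_{GS}$ and output $V_{out}(t)$ across $C_{out}$]

$V_{in} = 0$, $V_{out}(t = 0) = V_{DD}$

$V_{ctrl}(t < 0) = V_{DD}$, $V_{ctrl}(t \ge 0) = 0$

The output is discharged to $V_T$ (the transistor is saturated and finally goes into cut-off mode).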


From Passtransistors to Transmission Gates

Logic Level   NMOS                  PMOS         CMOS
Logic 0       0                     |V_TP|       0
Logic 1       V_DD - V_TN           V_DD         V_DD

[Figure: CMOS transmission gate: NMOS and PMOS transistor in parallel between $V_{in}$ and $V_{out}$, gates driven by $V_{ctrl}$ and its complement, load $C_{out}$]

[Figure: symbol of the CMOS transmission gate]

$$I_{DN} + I_{DP} = C_{out}\,\frac{dV_{out}}{dt}$$

• Bidirectional resistive connection between the input and output terminals

• Useful in both analog design (e.g. for relay contacts) and digital design (e.g. for multiplexers)


Transmission Gate: Operation States

Operation states which the transistors pass through while the output is charged from 0 (initial voltage) to $V_{DD}$ (final voltage):

• Mn: saturated for $V_{out} < V_{DD} - V_{TN}$, cut-off above

• Mp: saturated for $V_{out} < |V_{TP}|$, nonsaturated above


CMOS Transmission Gate: On-Resistance

On-resistance of a transmission gate, including body effect

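Parameters:

$V_{TON} = 0.75\ \mathrm{V}$, $V_{TOP} = -0.75\ \mathrm{V}$

$\gamma = 0.5\ \mathrm{V^{0.5}}$, $2\phi_F = 0.6\ \mathrm{V}$

$K_p = 20\ \mathrm{\mu A/V^2}$, $K_n = 50\ \mathrm{\mu A/V^2}$

$$R_{EQ} = \frac{R_{onP}\,R_{onN}}{R_{onP} + R_{onN}}$$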


CMOS Transmission Gate (III)

• Charge sharing problem

$$V_F = \frac{C_{BIG}\,V_{BIG} + C_{SMALL}\,V_{SMALL}}{C_{BIG} + C_{SMALL}}$$

Example: CSMALL = 0.02 pF, VSMALL = 5 V, VBIG = 0 V

CBIG = 0.2 pF (about 10 standard loads in a 0.5 µm CMOS process)

VF = 0.45 V ⇒ The 'big' capacitor has forced node A to a voltage close to a '0'

Node A has to be isolated from node Z by including a buffer (e.g. an inverter) between the 2 nodes, if node A is not strong enough to overcome the 'big' capacitor


Transmission Gate Logic

S

S

B

A

S

F

Multiplexer:

SBASF +=

B

B

A

A

B

F

Equivalence (NEXOR):

$$F = AB + \bar{A}\,\bar{B} = \overline{A \oplus B}$$

F

B

B

A

Alternate equivalence logic circuit:


Function Implementation with Passtransistor Logic

dcbdabdbadbF +++=

Karnaugh Map of F:

1 0 0 1

0 0 1 0

1 0 1 1

1 1 1 1a

b

cd

F

Step 1:

Find a minimum decomposition such that each selected field depends on one variable, constant 0 or constant 1 only (in our case: decompose with combinations of the literals b and d)


Function Implementation with Passtransistor Logic

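Step 2:

Attach the decomposition variables to the selection lines

Step 3:

Determine the line input signals (implement the inverted function to compensate for the output inverter)

[Figure: passtransistor network with selection lines driven by b and d, line inputs derived from a, c and the constant 0, a sustainer transistor to $V_{DD}$ and an output inverter driving F]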


9. Memory Elements and Dynamic Logic



RS Flipflop

The RS-flipflop is a bistable element with two inputs:

• Reset (R), resets the output Q to 0

• Set (S), sets the output Q to 1



RS-Flipflops

There are two ways to implement a RS-flipflop:

• based on NOR-gates: positive logic

• based on NAND-gates: negative logic



Clocked RS-Latch

To achieve a synchronous operation, we can add a clock signal

• Clock= 0: R and S have no influence upon the state of the circuit

• Clock= 1: R and S can change the state of the circuit



D-Latch

For storing data it is more convenient to have a data input. This is realized by using the data input as set signal and the inverted data input as reset signal.

• Clock= 0: Q unchanged

• Clock= 1: Q= D



Transmission Gate D-Latch

An alternative way to build a D-latch is to use transmission gates thus reducing the complexity (transistor count) of the circuit.

• Load= 0: Latch stores data

• Load= 1: Latch is transparent (output= input)



Clocked JK-Latch

Another extension of the simple RS-flipflop is the JK-latch

• J: enables/disables the low to high transition of the latch

• K: enables/disables the high to low transition of the latch



Edge Triggered Logic

If the previously presented D-latch were used in a synchronous circuit, e.g. a counter, it would malfunction:

While the clock is low, the latches hold the state Q(n) and the feedback network applies the state Q(n+1) to the inputs of the latches. When the clock goes high, the latches change to the new state Q(n+1). The feedback logic now calculates the state Q(n+2). But the clock is still high, so the latches falsely change to the state Q(n+2).

So what we need is a latch which changes only once per clock cycle; this is edge-triggered logic.



Edge Triggered JK-Flipflop

A straight forward way to implement an edge-triggered JK-flipflop is to use a master-slave flipflop.

• Clock= 1: The master (left latch) is changeable, the slave (right latch) is locked and holds the output at the current state

• Clock= 0: The master is locked and the slave changes its state if necessary

The output value is the state of the master at the falling edge of the clock signal



Edge Triggered TG D-Flipflop

Circuitry of an edge-triggered flipflop

• Clk= 0: First stage is loaded, second stage is locked and stores data

• Clk= 1: First stage is locked, second stage is loaded

With the rising edge (low to high transition) the new value is available at the output
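The two-stage behaviour described above can be sketched as a small behavioral model. This is only an illustration at the logic level, not the transmission gate circuit itself; the class name `TGDFlipFlop` and the method `apply` are hypothetical.

```python
class TGDFlipFlop:
    """Behavioral sketch of a two-stage edge-triggered D flipflop."""

    def __init__(self):
        self.first = 0    # value held by the first stage
        self.second = 0   # value held by the second stage (output Q)

    def apply(self, clk, d):
        """Apply clock level and data input, return the output Q."""
        if clk == 0:
            # Clk = 0: first stage is loaded, second stage is locked
            self.first = d
        else:
            # Clk = 1: first stage is locked, second stage is loaded
            self.second = self.first
        return self.second


ff = TGDFlipFlop()
trace = [ff.apply(clk, d) for clk, d in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]]
print(trace)
```

Note that a change of D while the clock is high has no effect on the output, which is the property the simple D-latch lacks.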



Transmission Gate JK- Flipflop

It is also possible to build a JK-flipflop with transmission gates as an edge-triggered flipflop.

This achieves that the output state can only change at the rising edge of the clock signal



Dynamic D-Flipflop

Dynamic logic utilizes the parasitic capacitances of transistors and interconnect to store the current state. This reduces the transistor count but forbids a static operation. An application of dynamic circuits is the dynamic D-flipflop.



Dynamic Shift Register

Another application is the dynamic shift register. It also has a lower transistor count, but requires a non-overlapping two-phase clock, which is expensive to generate.



Dynamic Chain Latch



Dynamic RAM

A special kind of memory is dynamic RAM. The major advantage is the low transistor count, DRAM requires only one transistor and one (small) capacitor per bit.

The first disadvantage is the destructive read. After reading a cell, the read value must be written back to keep the data in the RAM.

The second disadvantage is the limited duration of storage. After some milliseconds the cell must be refreshed (read and written back).



Dynamic RAM



Clocking

Clock Signal:

• used to synchronize data flow through a digital network

⇒ clocked static or dynamic circuits

• problems: clock skew (delay caused by clock distribution wires)

Ideal nonoverlapping 2-phase clocks

Condition for nonoverlapping clock signals $\phi_1(t)$ and $\phi_2(t)$:

$$\phi_1(t)\,\phi_2(t) = 0 \quad \forall t$$



Basic 2-phase clocking



Single and Multiple Clock Signals

⇒ For nonoverlapping clock phases, fine-tuned and well-designed delay lines (realized as transmission gates) have to be inserted in order to avoid overlapping of $\phi$ and $\bar{\phi}$.

Single clock 2-phase timing



Generation of inverted clock phase

TG delay circuit



Pseudo 2-φ clocking



Clocked Dynamic Logic

⇒ Synchronized data transfer

Shift register

1) Upper Frequency Limitation: Charging and Discharging Times

Clocked shift register circuit



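Time constant for charging and discharging:

$$\tau_{TG} = R_{TG}\,C_L \quad \text{where} \quad C_L = C_{TG} + C_{in} + C_{line}$$

$$C_{in} = C_{ox}\left[(WL)_n + (WL)_p\right]$$

$V_A = V_{DD}$: ($V_{in}(0) = 0$)

$$V_{in}(t) \cong V_{DD}\left[1 - e^{-t/\tau_{TG}}\right]$$

The inverter is switched when $V_{in} = V_{IH}$, which occurs after

$$t_1 \cong -\tau_{TG}\,\ln\left[1 - \frac{V_{IH}}{V_{DD}}\right]$$

$V_A = 0$: ($V_{in}(0) = V_{DD}$)

$$V_{in}(t) \cong V_{DD}\,e^{-t/\tau_{TG}}$$

The time until $V_{in}$ reaches $V_{IL}$ is given by

$$t_0 \cong -\tau_{TG}\,\ln\left[\frac{V_{IL}}{V_{DD}}\right]$$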
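The two switching times can be evaluated directly from the RC charging and discharging expressions. The sketch below assumes purely illustrative values for $R_{TG}$, $C_L$, $V_{IH}$ and $V_{IL}$; none of these numbers come from the slides, and the function name is hypothetical.

```python
import math

def tg_switch_times(r_tg, c_l, v_dd, v_ih, v_il):
    """Return (tau_TG, t1, t0) for the clocked transmission gate stage."""
    tau = r_tg * c_l
    t1 = -tau * math.log(1.0 - v_ih / v_dd)   # charging until V_in = V_IH
    t0 = -tau * math.log(v_il / v_dd)         # discharging until V_in = V_IL
    return tau, t1, t0

# Assumed example values: 2 kOhm, 50 fF, 5 V supply
tau, t1, t0 = tg_switch_times(r_tg=2e3, c_l=50e-15, v_dd=5.0, v_ih=3.0, v_il=1.5)
print(f"tau = {tau * 1e12:.0f} ps, t1 = {t1 * 1e12:.1f} ps, t0 = {t0 * 1e12:.1f} ps")
```

The longer of the two times bounds how short a clock phase may be, which is the upper frequency limitation named above.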



2) Lower Frequency Limitation: Charge Leakage

Leakage path in a CMOS TG

The load capacitance seen by the transmission gate (TG) is

$$C_L = C_{TG} + C_{line} + C_{in}$$

The depletion capacitance contributions to $C_L$ are due to the reverse-biased pn junctions in the MOS transistors. As shown in the figure above, a leakage current flows across the reverse-biased pn junctions. The influence of this leakage current on the charge stored in $C_L$ depends on the values of $I_{Lp}$ and $I_{Ln}$.



Charge leakage problem in CMOS TG



With $I_L = I_{Ln} - I_{Lp}$ the leakage current influence on $V_{in}$ is given by

$$C_L\,\frac{dV_{in}}{dt} = -I_L$$

If $I_{Lp} > I_{Ln}$ the capacitance is charged by $I_L$; otherwise it is discharged, or remains constant when the ideal condition $I_{Lp} = I_{Ln}$ holds.

In terms of the stored charge:

$$\frac{dQ_{store}}{dt} = I_{Lp} - I_{Ln}, \qquad C_{store} = \frac{dQ_{store}}{dV}$$

Assuming that the leakage currents $I_{Lp}$ and $I_{Ln}$ are constant and that the node charge-voltage relation is linear, of the form

$$Q_{store} = C_{store}\,V$$

it follows (because $C_{store}$ is const.)

$$C_{store}\,\frac{dV}{dt} = I_{Lp} - I_{Ln}$$

The solution of this equation is

$$V(t) = \frac{I_{Lp} - I_{Ln}}{C_{store}}\,t + V(0)$$

If ∆V is the maximum allowed voltage change:

$$t_{max} = \frac{C_{store}\,\Delta V}{|I_L|}$$

Charge leakage circuit



With $T_{max} = 2\,t_{max}$ (the longest allowed clock period), the minimum frequency follows as

$$f_{min} \cong \frac{1}{2\,t_{max}} \cong \frac{I_L}{2\,C_{store}\,\Delta V}$$

The transmission gate capacitance is

$$C_T \cong C_G + C_{line} + C_{ols} + C_{old} + C_{SBp}(V) + C_{DBn}(V)$$

So the storage capacitance can be estimated by voltage averaging of this expression:

$$C_{store} \cong C_G + C_{line} + C_{ols} + C_{old} + K(0, V_{DD})\left[C_{SBp} + C_{DBn}\right]$$

For a realistic analysis of the charge leakage problems, the dependence of the leakage currents on the reverse bias voltage has to be taken into consideration.
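As a rough numerical illustration of the lower frequency limit, the following sketch evaluates the maximum hold time and the minimum clock frequency for a constant net leakage current. The storage capacitance, leakage current and allowed voltage change are assumed example values, not data from the lecture.

```python
def leakage_limits(c_store, i_leak, delta_v):
    """Return (t_max, f_min) for a constant net leakage current i_leak."""
    t_max = c_store * delta_v / abs(i_leak)   # time until the node drifts by delta_v
    f_min = 1.0 / (2.0 * t_max)               # from T_max = 2 * t_max
    return t_max, f_min

# Assumed example: 50 fF storage node, 1 pA net leakage, 1 V allowed change
t_max, f_min = leakage_limits(c_store=50e-15, i_leak=1e-12, delta_v=1.0)
print(f"t_max = {t_max * 1e3:.0f} ms, f_min = {f_min:.0f} Hz")
```

Because real leakage currents depend on the reverse bias, such a constant-current estimate should be read as an order-of-magnitude check only.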



Charge Sharing

Basic charge sharing circuit

t<0: (TG switched off)

$$V_1(t<0) = V_{DD}, \qquad V_2(t<0) = 0, \qquad Q_T = C_1\,V_{DD}$$

t>0: (TG switched on)

$$Q_T = (C_1 + C_2)\,V_f$$

$$V_f = V_1(t>0) = V_2(t>0) = \frac{C_1}{C_1 + C_2}\,V_{DD} = \frac{1}{1 + (C_2/C_1)}\,V_{DD}$$



If we design a circuit with $C_1 = C_2$, then $V_f = V_{DD}/2$, indicating a drop in voltage. A reliable forward transfer of a logic 1 state from $C_1$ to $C_2$ requires that $C_1 \gg C_2$ to ensure that $V_f \approx V_{DD}$.

Let us specify arbitrary initial conditions $V_1(0)$ and $V_2(0)$ on the capacitors, giving the system a total charge of

$$Q_t = C_1\,V_1(0) + C_2\,V_2(0)$$

Applying basic circuit analysis gives the time-dependent voltages, which relax exponentially toward a common final value with the time constant

$$\tau = R_{TG}\,C_{eq} \quad \text{with} \quad C_{eq} = \frac{C_1\,C_2}{C_1 + C_2}$$

In the limit $t \to \infty$, $V_1 = V_2 = V_f = Q_t / (C_1 + C_2)$.



This agrees with the result from simple charge conservation by noting that the final charge distributes according to

f21T V)CC(Q +=

Transient voltage behavior for initial conditions of V1(0)=VDD and V2(0)=0
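The transient can be reproduced numerically. The sketch below uses the closed-form solution of the two-capacitor circuit connected through a resistance $R_{TG}$: the voltage difference decays with $\tau = R_{TG} C_{eq}$ while the total charge is conserved. This form was derived here from those two conditions and may differ in notation from the expression on the original slide; the component values are assumptions.

```python
import math

def charge_sharing_transient(t, v1_0, v2_0, c1, c2, r_tg):
    """Return (V1(t), V2(t)) for two capacitors connected through r_tg at t = 0."""
    c_eq = c1 * c2 / (c1 + c2)
    tau = r_tg * c_eq
    v_f = (c1 * v1_0 + c2 * v2_0) / (c1 + c2)      # final common voltage
    decay = (v1_0 - v2_0) * math.exp(-t / tau)     # remaining voltage difference
    v1 = v_f + c2 / (c1 + c2) * decay
    v2 = v_f - c1 / (c1 + c2) * decay
    return v1, v2

# Assumed example: C1 = C2 = 50 fF, R_TG = 2 kOhm, V1(0) = V_DD = 5 V, V2(0) = 0
for t in (0.0, 50e-12, 500e-12):
    v1, v2 = charge_sharing_transient(t, 5.0, 0.0, 50e-15, 50e-15, 2e3)
    print(f"t = {t * 1e12:5.0f} ps: V1 = {v1:.3f} V, V2 = {v2:.3f} V")
```

For equal capacitors both voltages settle at $V_{DD}/2$, in line with the charge conservation result.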



Charge sharing among N TG-connected capacitors

Initial charge:

$$Q_T = \sum_{i=1}^{N} C_i\,V_i(0)$$

After connecting nodes:

$$Q_T = \left(\sum_{i=1}^{N} C_i\right) V_f$$

Final voltage:

$$V_f = \frac{\sum_{i=1}^{N} C_i\,V_i(0)}{\sum_{i=1}^{N} C_i}$$
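A short sketch of the final-voltage formula for N connected capacitors follows. It is checked here against the two-capacitor example given earlier for the transmission gate (0.02 pF at 5 V shared with 0.2 pF at 0 V); the function name is illustrative.

```python
def final_voltage(capacitances, initial_voltages):
    """Final common voltage after connecting all capacitors (charge conservation)."""
    total_charge = sum(c * v for c, v in zip(capacitances, initial_voltages))
    return total_charge / sum(capacitances)

# Example from the charge sharing slide: C_SMALL = 0.02 pF at 5 V, C_BIG = 0.2 pF at 0 V
v_f = final_voltage([0.02e-12, 0.2e-12], [5.0, 0.0])
print(f"V_F = {v_f:.2f} V")
```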



Dynamic Logic

• The pull-up (pull-down) network of static CMOS is replaced by a single precharge (discharge) transistor. The remaining network then conditionally discharges (charges up) the output in a second operation phase

• One logic level is held by dynamic charge storage

• Transistor count is reduced from 2n (static CMOS) to n+2 for dynamic precharged CMOS (but now: 2 phases of operation)

Dynamic nMOS Inverter (Single clock, 2 phases)

Basic dynamic nMOS inverter



Precharge Phase

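If $V_{in} = 0$ then

$$\tau_{ch} = \frac{C_{out}}{\beta_p\,(V_{DD} - |V_{Tp}|)} = R_p\,C_{out}$$

WORST case ($V_{in} = V_{DD}$):

$$\tau_{ch,max} = R_p\,(C_{out} + C_n)$$

$$t_{ch,max} = \tau_{ch,max}\left[\frac{2\,|V_{Tp}|}{V_{DD} - |V_{Tp}|} + \ln\left(\frac{2\,(V_{DD} - |V_{Tp}|)}{V_0} - 1\right)\right]$$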

Dynamic nMOS inverter: precharge and evaluate



Evaluation Phase

If M1 is switched on and M1 and Mn are designed with the same channel width W, the discharge time constant is given by

$$\tau_{dis} = \frac{(L_1 + L_n)\,C_{out}}{k'_n\,W\,(V_{DD} - V_{Tn})}$$

Precharge network for worst case



Evaluation discharge network

$$t_{dis} = \tau_{dis}\left[\frac{2\,V_{Tn}}{V_{DD} - V_{Tn}} + \ln\left(\frac{2\,(V_{DD} - V_{Tn})}{V_0} - 1\right)\right]$$

Maximum clock frequency

$$t_M = \max(t_{ch,max},\, t_{dis}), \qquad f_{max} \cong \frac{1}{2\,t_M}$$
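The precharge and discharge times share the same functional form, so one helper is enough to estimate the maximum clock frequency. The sketch below assumes illustrative time constants and thresholds and treats the reference voltage $V_0$ as a free parameter; a value of $0.1\,V_{DD}$ is used here as an assumption, since the slides do not state it.

```python
import math

def transition_time(tau, v_t, v_dd, v_0):
    """t = tau * [2 V_T / (V_DD - V_T) + ln(2 (V_DD - V_T) / V_0 - 1)]."""
    drive = v_dd - v_t
    return tau * (2.0 * v_t / drive + math.log(2.0 * drive / v_0 - 1.0))

def max_clock_frequency(t_ch_max, t_dis):
    """f_max ~ 1 / (2 t_M) with t_M = max(t_ch_max, t_dis)."""
    return 1.0 / (2.0 * max(t_ch_max, t_dis))

V_DD = 5.0
V_0 = 0.1 * V_DD                       # assumed reference level
t_ch_max = transition_time(tau=0.2e-9, v_t=0.75, v_dd=V_DD, v_0=V_0)  # |V_Tp| assumed
t_dis = transition_time(tau=0.1e-9, v_t=0.75, v_dd=V_DD, v_0=V_0)     # V_Tn assumed
f_max = max_clock_frequency(t_ch_max, t_dis)
print(f"t_ch,max = {t_ch_max * 1e9:.2f} ns, t_dis = {t_dis * 1e9:.2f} ns, "
      f"f_max = {f_max / 1e6:.0f} MHz")
```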



Dynamic pMOS Inverter

Basic dynamic pMOS inverter

Dynamic CMOS Properties and Conditions

• single phase clock

• input should change during precharge only

• input must be stable at the end of the precharge phase

• in the evaluation phase the output remains HIGH (LOW) or is optionally discharged (charged)

φ=1: Precharge
φ=0: Evaluate



Complex Logic

Complex dynamic logic



Dynamic Cascades

pMOS blocks and nMOS blocks have to be installed alternately in order to avoid glitches

Cascaded nMOS-nMOS glitch problem

Dynamic cascades

Wrongly coupled stages: while the first one is in precharge, the second is in evaluation. The result of the second stage will be influenced by the precharge process of the first stage



Domino CMOS Logic

Basic domino logic circuit



• Domino Logic: design method for glitch-free cascading of nMOS logic blocks

• Each stage is driven by φ

  - Precharge during φ = 0
  - Evaluation when φ = 1

• Domino logic blocks consist of a precharge/evaluation block and an output inverter

Precharge Phase: The gate output is precharged to logic 1 and the inverter output goes to logic 0. Logic transmission errors are avoided by providing a logic 0 at the inverter output (avoiding discharge of the next logic stage).

Evaluation Phase: Depending on the actual input values, the inverter output stays at logic 0 or is set to logic 1. The correct result signal is provided at the end of the domino cascade after stabilization of all stages.



Domino AND gate

Cascaded domino logic



Visualization of domino effect

Domino timing



Cascaded domino circuit with fanout = 2



Domino Logic Properties

• Domino logic consists of either n-type or p-type blocks

• small load capacitance to be driven by the logic (one inverter only) ⇒ small transistor dimensions

• only one clock signal required

• only positive logic realizations possible because of the input inverters ⇒ domino logic is noninverting

Functions as

cannot be directly realized in a domino chain



Analysis

Domino AND4 gate

CX=C0+CT. C0 represents the capacitance due to M0, while CT is the total of all other contributions.



Precharge (φ=0: Mp1 in conduction, Mn1 in cutoff)

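$$C_X = C_0 + C_T \cong (C_{GDn1} + C_{BDn1}) + (C_{GDp1} + C_{BDp1}) + C_G + C_{line}$$

Evaluate

If all inputs $A_i$ are set to logic 1, the worst case delay time can be estimated by

$$t_D \cong R_n C_n + (R_n + R_3)\,C_3 + (R_n + R_3 + R_2)\,C_2 + (R_n + R_3 + R_2 + R_1)\,C_1 + (R_n + R_3 + R_2 + R_1 + R_0)\,C_X$$

with

$$R_j = \frac{1}{k'_n\,(W/L)_j\,(V_{DD} - V_{Tn})}$$

Minimum precharge time

Mp1 conducting → $C_X$ is charged until $V_X > V_{IH}$ (logic 1)

$$t_{ch} \cong \tau_{ch}\left[\frac{2\,|V_{Tp}|}{V_{DD} - |V_{Tp}|} + \ln\left(\frac{2\,(V_{DD} - |V_{Tp}|)}{V_{DD} - V_{IH}} - 1\right)\right]$$

with $V_X(0) = 0$ and

$$\tau_{ch} = \frac{C_X}{\beta_p\,(V_{DD} - |V_{Tp}|)}$$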

Charge Leakage and Charge Sharing

Domino stage with pull-up MOSFET



Charge sharing in a domino chain

Cout,1>>Cx1+Cx2



Use of feedback to control a pull-up MOSFET for charge sharing problem



NORA Logic

(NORA = NO RAce)

NORA Properties

• NORA is very insensitive to clock delay

• one clock signal and the inverted clock signal with steep slopes (short rise times) are sufficient

• no inverter is needed between the logic stages, because of the alternate use of n-type and p-type blocks

• the last stage is a clocked inverter, a C2MOS latch

• ideal to clock pipelined logic systems



The Signal Race Problem

Signal race problem

The signal race problem can be seen: a signal race can arise, when both transmission gates conduct at the same time. If the new input from TG1 reaches the input of TG2 while TG2 is still transmitting the output, the output information will be lost. Imperfect TG synchronization occurs because of normal transmission intervals or clock skew.



Clock skew

tp>>tr,tf → no problems

Tskew=tp → race result critical



Dynamic latch operation


Accept data when φ=0, hold data when φ=1



NORA Structuring



NORA $\phi$ and $\bar{\phi}$ sections



C2MOS latch

NORA pipelined logic




NORA $\phi$ and $\bar{\phi}$ sections

          φ-section               φ̄-section
φ = 0:    P, P, locked            E, E, transparent
φ = 1:    E, E, transparent       P, P, locked

(P = precharge, E = evaluate)



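NORA $\phi$ and $\bar{\phi}$ sections

[Figure: NORA sections with skew between $\phi$ and $\bar{\phi}$; one node at 0 V, remaining node states marked as unknown (?)]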



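NORA $\phi$ and $\bar{\phi}$ sections

[Figure: NORA sections with skew between $\phi$ and $\bar{\phi}$; one node at 0 V]

The C²MOS latch is locked during the clock skew period!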



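NORA $\phi$ and $\bar{\phi}$ sections

[Figure: NORA sections with skew between $\phi$ and $\bar{\phi}$; node precharged to 0 V]

• The duration of the initial value of the evaluation phase (VDD) will be extended

• The duration for which the logical output value is provided to the next stage will possibly be extended

And the other way round: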


Summary Dynamic Logic

Advantages of dynamic logic:

• Smaller area than static logic
• Smaller parasitic capacitances, therefore higher speed
• Reliable operation if designed correctly

Concerns / Disadvantages:

• Capacitive coupling to dynamic nodes
• Charge sharing with dynamic nodes
• Subthreshold leakage in eval logic
• Minority carrier injection and latchup
• Alpha particle immunity
• Vdd/gnd noise vulnerability / IR-drop


10. Performance, interconnect and packaging


Summary

Interconnect Parameters: Capacitance, Resistance, Inductance

Electrical Wire Models

• Lumped C model

• Lumped RC model

• RC chain model

• Distributed RC line model

• Transmission line model

Technology Scaling

Power and Clock Distribution

Input Protection Circuits

Static Gate Sizing

Off-Chip Driver Circuits

Packaging Technology


Interconnect Parameters

Interconnection choices in an actual CMOS process:

• multiple layers of Aluminum (up to 7)

• polysilicon layer (at least one)

• possibility of using the heavily doped n+ and p+ layers

The wiring forms a complex geometry that introduces parasitics:

• capacitive

• resistive

• inductive

Parasitic effects reduce the performance and the reliability by:

• increasing the propagation delay

• affecting the energy dissipation and the power distribution

• introducing extra noise source


Modern Interconnect


Full Wire Model

Assume that all wires in a bus network are implemented in a single interconnect layer (Al), isolated from the silicon substrate and from each other by a layer of dielectric material (SiO2):

Schematic view

Physical view

Full wire circuit model:

• Consider parasitic capacitance, resistance and inductance

• Parasitics are distributed over the length of the wire

• Inter-wire parasitics: coupling effects


Simplified (Only Capacitance) Wire Model

A simplified capacitance-only model can be used if:

• the wires are short

• the wires cross-section is large or the wire material has a low resistivity (small resistance)

Other simplified models can be obtained

1) Neglecting the inductive effects, valid when:

• the resistance of the wire is large (long Al wires with a small cross-section)

• trise and tfall of the signals are large (slow signals)

2) Neglecting the inter-wire capacitance, valid when:

• the separation between neighboring wires is large

• the wires run together for a short distance


Wire Parallel-Plate Capacitance

Simple model - the parallel-plate capacitance:

[Figure: wire of length L, width W and height H on SiO2 of thickness tox above the substrate, showing current flow and electrical-field lines]

The capacitance of a wire is a function of:

• shape of the wire

• environment

• distance to substrate

• distance to surrounding wires

True for W >> tox ⇒ electric field lines are orthogonal to the capacitor plates

$$C_{wire} = C_{pp} = \frac{\varepsilon_{ox}}{t_{ox}}\,W L$$

Cwire is the total capacitance of the wire (pF)


Wire Fringing Capacitance

• Advanced processes have a reduced W/H ratio (<1)

• The capacitance between side-wall of the wires and the substrate (fringing capacitance) must be considered!

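$$c_{wire} = c_{pp} + c_{fringe} \approx \frac{\varepsilon_{ox}\,(W - H/2)}{t_{ox}} + \frac{2\pi\,\varepsilon_{ox}}{\log(t_{ox}/H)}$$

cwire is the wire capacitance per unit length (pF/cm)

For W/H large: cfringe < cpp, cwire ≈ cpp

For W/H < 1.5 ⇒ cfringe > cpp

[Figure: wire cross-section (W, H, tOX, SiO2, substrate) modeled as a parallel plate of width W - H/2 (cpp) plus a cylindrical conductor (cfringe); plot of cpp, cfringe and cwire]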


Interwire Capacitance

In multilevel interconnects technologies the wires are not completely isolated

Each wire is coupled to the:

• substrate (grounded capacitor)

• neighboring wires on the same layer (floating capacitor)

• neighboring wires on adjacent layers (floating capacitor)

[Figure: wires on Level 1 and Level 2 with the capacitances Cparallel and Cfringe]

Assuming that the oxide thickness (tox = 1 µm) and the metal thickness (H = 1 µm) are held constant while scaling the other dimensions ⇒ for W < 1.75 H, Cinterwire dominates!


Wiring Capacitances

Plate and fringe capacitance values for a typical 0.25 µm CMOS process (upper layer in rows, lower layer in columns):

Layer  Quantity           Field  Active  Poly  Al1  Al2  Al3  Al4
Poly   Cplate (aF/µm2)    88
       Cfringe (aF/µm)    54
Al1    Cplate (aF/µm2)    30     41      57
       Cfringe (aF/µm)    40     47      54
Al2    Cplate (aF/µm2)    13     15      17    36
       Cfringe (aF/µm)    25     27      29    45
Al3    Cplate (aF/µm2)    8.9    9.4     10    15   41
       Cfringe (aF/µm)    18     19      20    27   49
Al4    Cplate (aF/µm2)    6.5    6.8     7     8.9  15   35
       Cfringe (aF/µm)    14     15      15    18   27   45
Al5    Cplate (aF/µm2)    5.2    5.4     5.4   6.6  9.1  14   38
       Cfringe (aF/µm)    12     12      12    14   19   27   52
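The table values can be combined into a total wire capacitance. The sketch below assumes that the plate term scales with the wire area and that the fringe term applies to both long edges of the wire (hence the factor 2); that factor is an assumption about how the table is meant to be used. The example wire (Al1 over field oxide, 1 µm wide, 10 cm long) is illustrative.

```python
def wire_capacitance_pf(width_um, length_um, c_plate_af_um2, c_fringe_af_um):
    """Total wire capacitance in pF from plate (aF/um^2) and fringe (aF/um) values."""
    plate = c_plate_af_um2 * width_um * length_um   # area term in aF
    fringe = 2.0 * c_fringe_af_um * length_um       # both edges (assumption) in aF
    return (plate + fringe) * 1e-6                  # 1 aF = 1e-6 pF

# Al1 over field: Cplate = 30 aF/um^2, Cfringe = 40 aF/um
c_total = wire_capacitance_pf(width_um=1.0, length_um=1e5,
                              c_plate_af_um2=30.0, c_fringe_af_um=40.0)
print(f"C_wire = {c_total:.1f} pF")
```

For this narrow wire the fringe contribution is larger than the plate contribution, which matches the statement that fringing must be considered for small W/H.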


Wire Resistance

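[Figure: wire of length L, width W and height H; squares of different size have the same resistance, R1 ≡ R2]

$$R = \frac{\rho\,L}{H\,W} = R_\square\,\frac{L}{W}$$

$R_\square = \rho / H$ - Sheet Resistance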


Dealing With Resistance

Polycide gate MOSFET

Silicides: WSi2, TiSi2, PtSi2, TaSi

Conductivity: 8-10 times better than Poly

• Selective technology scaling

• Use better interconnect materials (silicides, bypasses)

• More interconnect layers (reduce average wire length)


Other Resistive Effects

(1) Contact resistance

• Extra resistance added by the transition between routing layers

• Can be reduced by making the contact holes larger

• Current crowding puts an upper limit on the size of the contact

(2) Skin effect

• High frequency (GHz) currents tend to flow on the surface of a conductor

• Resistance becomes frequency-dependent (increases when the frequency increases)

• Affects only wider wires

(3) Electromigration

• Limits the DC currents to 1 mA/µm


Wire inductance

At switching frequencies in the GHz range the wire inductance must be considered

A changing current passing through an inductor generates a voltage drop:

$$\Delta v = L\,\frac{di}{dt}$$

On-chip inductance effects are:

• reflection of signals due to impedance mismatch

• inductive coupling between lines

• ringing effects

• switching noise due to Ldi/dt voltage drops

It is possible to compute the wire inductance directly from its geometry and its environment

A simpler approximation is given by the following relation:

$$c\,l = \varepsilon\mu$$

where c is the capacitance per unit length, l the inductance per unit length, ε the electric permittivity and µ the magnetic permeability of the surrounding dielectric

Ex.: In a 0.25 µm technology, a 0.4 µm wide Al wire routed on top of the field oxide (SiO2) has

c = 92 aF/µm, l = 0.47 pH/µm
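The example value of l can be checked from $c\,l = \varepsilon\mu$. The sketch below assumes a relative permittivity of 3.9 for SiO2 and a relative permeability of 1; both are standard textbook values but are not stated on the slide.

```python
import math

EPS0 = 8.854e-12           # vacuum permittivity in F/m
MU0 = 4.0e-7 * math.pi     # vacuum permeability in H/m

def inductance_per_length(c_per_m, eps_r, mu_r=1.0):
    """Inductance per unit length (H/m) from c * l = eps * mu."""
    return eps_r * EPS0 * mu_r * MU0 / c_per_m

c = 92e-18 / 1e-6                           # 92 aF/um expressed in F/m
l = inductance_per_length(c, eps_r=3.9)     # eps_r of SiO2 (assumed 3.9)
l_ph_per_um = l * 1e12 / 1e6                # H/m -> pH/um
print(f"l = {l_ph_per_um:.2f} pH/um")
```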


Example: Intel 0.25 micron Process


The Lumped C Model

Conditions:

• resistive component of the wire is small

• consider only the capacitive component

• switching frequencies are in medium range

The wire still represents an equipotential region and does not introduce any delay

The distributed capacitance is lumped into a single capacitor

The only impact on performance:

• loading effect of Clumped on the driving gate


The Lumped RC Model

Metal wires of few mm length have a significant resistance and the equipotential assumption is no longer adequate!

New model:

• Lumps the total resistance of the wire into a single resistor R

• Combines the global capacitance of the wire into a single capacitor C

The estimated wire delay: τ = RC

This model is pessimistic and inaccurate for long interconnect wires!


The Elmore Delay

Consider the following RC-tree network:

• the network has a single input node (s)

• all capacitors are between a node and the ground

• the network does not contain any resistive loops

The shared path resistance Rik is the resistance shared among the paths from the source node s to the nodes k and i:

$$R_{ik} = \sum R_j \quad \text{where} \quad R_j \in \left[\mathrm{path}(s \to i) \cap \mathrm{path}(s \to k)\right]$$

Ex.: Ri4 = R1 + R3; Ri2 = R1

Assume that each node of the network is initially discharged and a step input is applied at t=0

The Elmore delay at node i, for a network with N nodes, is given by:

$$\tau_{Di} = \sum_{k=1}^{N} C_k\,R_{ik}$$

Ex.: τDi = R1C1 + R1C2 + (R1 + R3)C3 + (R1 + R3)C4 + (R1 + R3 + Ri)Ci
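The Elmore sum can be computed mechanically once the tree is described by parent pointers. The sketch below assumes a tree topology that is consistent with the example expressions above (node 1 fed from s, nodes 2 and 3 fed from node 1, nodes 4 and i fed from node 3); the figure itself is not reproduced in the text, so this topology and the unit-free element values are assumptions.

```python
def elmore_delay(parent, resistance, capacitance, target):
    """Elmore delay at `target` for an RC tree.

    parent[n]      -> upstream node of n (the source is "s")
    resistance[n]  -> resistor between parent[n] and n
    capacitance[n] -> capacitor from node n to ground
    """
    def path(node):
        # Nodes whose upstream resistor lies on the path from s to `node`.
        nodes = set()
        while node != "s":
            nodes.add(node)
            node = parent[node]
        return nodes

    target_path = path(target)
    delay = 0.0
    for k, c_k in capacitance.items():
        shared = target_path & path(k)                 # path(s->i) ∩ path(s->k)
        r_ik = sum(resistance[n] for n in shared)      # shared path resistance
        delay += c_k * r_ik
    return delay

# Assumed topology matching the example terms of tau_Di
parent = {"1": "s", "2": "1", "3": "1", "4": "3", "i": "3"}
R = {"1": 1.0, "2": 2.0, "3": 3.0, "4": 4.0, "i": 5.0}
C = {"1": 1.0, "2": 2.0, "3": 3.0, "4": 4.0, "i": 5.0}
tau_di = elmore_delay(parent, R, C, "i")
print(tau_di)
```

With these values the result equals R1C1 + R1C2 + (R1+R3)C3 + (R1+R3)C4 + (R1+R3+Ri)Ci evaluated term by term.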


The RC Chain Model

RC chain - a special case of the RC-tree network:

$$\tau_{DN} = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j = \sum_{i=1}^{N} C_i\,R_{ii}$$

Ex.: τDi = C1R1 + C2(R1 + R2) + ... + Ci(R1 + ... + Ri)

[Figure: RC chain with input Vin, series resistors R1, R2, ..., RN, node capacitors C1, C2, ..., CN to ground and output VN]

Assume that a wire of length L is modeled by N equal-length segments, each having Ri = rL/N and Ci = cL/N (r, c are resistance and capacitance per unit length)

$$\tau_{DN} = \left(\frac{L}{N}\right)^2 (rc + 2rc + \dots + Nrc) = rcL^2\,\frac{N(N+1)}{2N^2} = RC\,\frac{N+1}{2N}$$

For N large, the RC chain model approaches the distributed RC line model:

$$\tau_{DN} = \frac{RC}{2} = \frac{rcL^2}{2}$$

(1) The delay of a wire is a quadratic function of its length

(2) The delay of the RC chain model is 1/2 of the delay predicted by the lumped RC model!
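The convergence of the chain delay toward RC/2 can be checked numerically. The sketch below sums the Elmore terms of an N-segment chain directly; r, c and L are set to unit values purely for illustration.

```python
def rc_chain_delay(r, c, length, n):
    """Elmore delay at the end of a wire modeled by n equal RC segments."""
    seg_r = r * length / n
    seg_c = c * length / n
    delay = 0.0
    upstream_r = 0.0
    for _ in range(n):
        upstream_r += seg_r            # R1 + ... + Ri
        delay += seg_c * upstream_r    # Ci * (R1 + ... + Ri)
    return delay

for n in (1, 2, 10, 1000):
    print(n, rc_chain_delay(r=1.0, c=1.0, length=1.0, n=n))
```

N = 1 reproduces the lumped RC estimate, and increasing N moves the result toward half of that value.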


The Distributed RC Line Model (1)

The voltage at node i is given by the following partial differential equation:

$$c\,\Delta L\,\frac{\partial V_i}{\partial t} = \frac{(V_{i+1} - V_i) - (V_i - V_{i-1})}{r\,\Delta L}$$

For ∆L -> 0, we obtain the diffusion equation:

$$rc\,\frac{\partial V}{\partial t} = \frac{\partial^2 V}{\partial x^2}$$

The diffusion equation is difficult to use for circuit analysis

However, the distributed RC line can be approximated by a lumped RC chain network, and:

$$\tau(V_{out}) = \frac{rcL^2}{2}$$

V - the voltage at a particular point in the wire

x - the distance between this point and the signal source

L - total length of the wire

r - resistance per unit length

c - capacitance per unit length


The Distributed RC Line Model (2)

• The step input waveform diffuses from the start to the end of the wire

• The waveform rapidly degrades: delay for long wires

Step response of lumped and distributed RC networks: points of interest

Voltage range      Lumped RC network   Distributed RC network
0 → 50% (tp)       0.69 RC             0.38 RC
0 → 63% (τ)        RC                  0.5 RC
10% → 90% (tr)     2.2 RC              0.9 RC
0 → 90%            2.3 RC              RC


Transmission Lines

When the inductance of the wire dominates the delay behavior - transmission line effects!

Model: a distributed RLC wire

Signals propagate as a wave, alternately transferring energy between the electric and the magnetic field

The wave propagation equation:

$$\frac{\partial^2 v}{\partial x^2} = rc\,\frac{\partial v}{\partial t} + lc\,\frac{\partial^2 v}{\partial t^2}$$

r, c, l - resistance, capacitance and inductance per unit length

g ~ 0 - the leakage conductance

The ideal wave propagation equation (for a lossless transmission line, r=0):

$$\frac{\partial^2 v}{\partial x^2} = lc\,\frac{\partial^2 v}{\partial t^2} = \frac{1}{\nu^2}\,\frac{\partial^2 v}{\partial t^2}$$

$\nu = 1/\sqrt{lc}$ - propagation speed along the line


Lossless Transmission Lines Parameters (1)

$$\nu = \frac{1}{\sqrt{lc}} = \frac{1}{\sqrt{\varepsilon\mu}} = \frac{c_0}{\sqrt{\varepsilon_r\,\mu_r}}$$

Dielectric constant and wave-propagation speed for various materials used in IC technology

c0 - speed of light in vacuum

ε - electric permittivity of insulator

µ - magnetic permeability of insulator

εr - relative permittivity with respect to vacuum

µr - relative permeability with respect to vacuum

Propagation speed: only a function of surrounding medium

tflight = L/v - the time it takes for the wave to propagate from one to the other end of the wire


Lossless Transmission Lines Parameters (2)

Characteristic impedance: impedance presented by the wire

$$Z_0 = \sqrt{\frac{l}{c}} = l\,\nu = \frac{1}{c\,\nu}$$

100 to 500 Ω for typical wires

The behavior of the transmission line is influenced by the termination of the line

The termination determines how much of the wave is reflected upon arrival at the wire end

$$\rho = \frac{V_{refl}}{V_{inc}} = \frac{I_{refl}}{I_{inc}} = \frac{R - Z_0}{R + Z_0}$$

ρ - Reflection coefficient

R - the termination resistance

Termination   Reflection coefficient
R = Z0        ρ = 0
R = ∞         ρ = 1
R = 0         ρ = -1


Transmission Lines with Terminating Impedances Zs and ZL

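[Figure: transmission line with characteristic impedance Z0, driven by Vin through the source impedance Zs and terminated by the load impedance ZL; node voltages VSource and VDest]

Consider the case: ZL = ∞, ρ = 1

VSource = (Z0/(Z0+Zs))Vin

ρs = (Zs-Z0)/(Zs+Z0)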


Lattice Diagram

Vin = 5V, RS = 5Z0, RL = ∞

ρS = (Zs-Z0)/(Zs+Z0) = 0.66, ρD = 1

t = 0 ... tflight

$V_1^S = (Z_0/(Z_0+Z_s))\,V_{in} = 0.83\ \mathrm{V}$

$V_1^D = V_1^S + V_{r,1}^D$; $V_{r,1}^D = \rho_D\,V_1^S = 0.83\ \mathrm{V}$

$V_1^D = 0.83\ \mathrm{V} + 0.83\ \mathrm{V} = 1.66\ \mathrm{V}$

t = tflight ... 2tflight

$V_2^S = V_1^S + V_{r,1}^D + V_{r,1}^S$; $V_{r,1}^S = \rho_S\,V_{r,1}^D = 0.55\ \mathrm{V}$

$V_2^S = 2.22\ \mathrm{V}$

$V_2^D = V_1^D + V_{r,1}^S + V_{r,2}^D$; $V_{r,2}^D = \rho_D\,V_{r,1}^S = 0.55\ \mathrm{V}$

$V_2^D = 2.77\ \mathrm{V}$

....

Conclusion: in order to avoid ringing or slow propagation delay, the transmission line should be terminated both at the source (series termination) and at the destination (parallel termination) with a resistance equal to Z0
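The lattice diagram bookkeeping can be continued for further round trips with a few lines of code. The sketch below follows the same steps as the example (incident wave, reflection at the destination, reflection at the source); the function name is hypothetical, and the open load is represented directly by a reflection coefficient of 1 rather than by an infinite resistance.

```python
def lattice(v_in, z0, r_s, rho_load, bounces):
    """Return [(V_source, V_dest), ...] after each arrival at the destination."""
    rho_s = (r_s - z0) / (r_s + z0)
    wave = v_in * z0 / (z0 + r_s)      # first wave launched into the line
    v_source = wave
    v_dest = 0.0
    history = []
    for _ in range(bounces):
        reflected = rho_load * wave    # reflection at the destination
        v_dest += wave + reflected
        history.append((v_source, v_dest))
        wave = rho_s * reflected       # reflection at the source
        v_source += reflected + wave
    return history

# Values of the example: Vin = 5 V, RS = 5 Z0, open load (rho_D = 1)
steps = lattice(v_in=5.0, z0=1.0, r_s=5.0, rho_load=1.0, bounces=60)
for v_s, v_d in steps[:3]:
    print(f"V_S = {v_s:.2f} V, V_D = {v_d:.2f} V")
```

The first two entries reproduce the voltages of the example up to rounding, and the sequence slowly approaches Vin, which illustrates the slow propagation caused by the large source resistance.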


Figures of Merit for RLC Interconnect

Criteria:

• Distributed versus lumped model: a distributed model is needed if the rise (fall) time of the input signal, tr, is smaller than twice the propagation delay through the wire (otherwise, a lumped model suffices):

$$t_r < 2\,t_{flight} = 2\,l_w\sqrt{lc} \iff l_w > \frac{t_r}{2\sqrt{lc}}$$

• Consideration of inductance required: the wire resistance R (damping factor ξ) may not be too large, otherwise a distributed RC model is sufficient:

$$R = r\,l_w < 2\,Z_0 = 2\sqrt{\frac{l}{c}} \quad \text{or} \quad \xi = \frac{r\,l_w}{2}\sqrt{\frac{c}{l}} < 1 \iff l_w < \frac{2}{r}\sqrt{\frac{l}{c}}$$

• In conclusion: a distributed RLC model is required if

$$\frac{t_r}{2\sqrt{lc}} < l_w < \frac{2}{r}\sqrt{\frac{l}{c}}$$

[Plot: transition time (ns) of line driver / input signal versus length (cm), both axes logarithmic from 0.01 to 10.00; regions: inductance is important (with inductance), 1. large input rise time, 2. high attenuation, 1. & 2. (no inductance)]


Scaling (1)

VLSI integration depends on the smallest-size feature permitted by the technology

The size of the transistors has to be as small as possible!

The internal operating physics of the down-scaled MOS transistor changes

First order scaling theory:

• Estimates the improvements that can be expected as technology is scaled

• Scaled MOS device is obtained by applying a dimensionless scaling factor α to:

• all dimensions (L, W, junction depth, oxide thickness, etc.)

• device voltages

• impurities concentration densities

• The characteristics of the scaled MOS device are similar to that of the original one

• A number of parameters such as voltage drop, line propagation delay, current density, contact resistance exhibit significant degradation with scaling!


Scaling (2)

Influence of first-order scaling on MOS device (device parameters and resultant influence), α > 1

Parameter                               Scaling Factor
Length; L                               1/α
Width; W                                1/α
Gate oxide thickness; tox               1/α
Junction depth; Xj                      1/α
Substrate doping; Na or Nd              α
Supply voltage; VDD                     1/α
Electric field across gate oxide; E     1
Depletion layer thickness; d            1/α
Parasitic capacitance; WL/tox           1/α
Gate delay; VC/I                        1/α
DC power dissipation; Ps                1/α2
Dynamic power dissipation; Pd           1/α2
Power delay product                     1/α3
Gate area                               1/α2
Power density; VI/A                     1
Current density; I/A                    α
Transconductance; gm                    1


Scaling (3)

Interconnect layer scaling

Parameter                                             Scaling Factor
Conductor line width; W                               1/α
Conductor line length; L                              1/α
Conductor line thickness; t                           1/α
Line cross-section; A                                 1/α2
Line resistance; r                                    α
Line response time; rc                                1
Normalized line response time (line of same length)   α
Line voltage drop; Vd                                 1
Normalized line voltage drop (line of same length)    α
Current density; J                                    α
Normalized contact voltage drop; Vc/V                 α2

The scaled line resistance is:

$$r' = \rho\left[\frac{L/\alpha}{(W/\alpha)(t/\alpha)}\right] = \alpha\,r$$

The voltage drop along the scaled line is:

$$V_d' = (I/\alpha)(\alpha\,r) = I\,r = const$$

The scaled line response time is:

$$\tau_s' = (\alpha\,r)(C/\alpha) = r\,C = const$$

For a constant chip size many of the signal paths do not scale down! Therefore:

• Voltage drops along the lines are larger by a factor of α than the scaled line voltage drop

• The line response time is larger by a factor of α than the scaled line response (see table)

Problems: distribution and organization of clocking signals, electromigration, the increase of the wire capacitance (affects the gate delay)


Power Distribution

Process with 1 Level of metal :

• VDD and ground (VSS) are routed in interdigitated trees

• Crossunders are very difficult (low resistance interconnect)

Power distribution is much easier for technologies with 2 (or more) levels of metal

Cautions:

• Parts of the chip that are likely to switch simultaneously are routed separately!

• Separate power pins might be used for the output driver!


Clock and Timing Circles (1)

The clock

• synchronizes machine operations and data transfer

• is a global control technique that provides the “glue” for system operation

System level timing can be described using circular timing charts

Ideal pseudo 2-phase clocking chart:

• φ1(t)φ2(t) = 0, ∀t

• φ1=1 during first half-period

• φ2=1 during the last half-period

• time increases in a counter-clockwise direction

• one full rotation corresponds to a clock period T


Clock and Timing Circles (2)

Overlapping pseudo 2-phase clocking chart:

• φ1(t)φ2(t) = 0, except during the transition times

• mutually-exclusive clock periods provide timing intervals for logical operations

• overlapped segments must be avoided

• transition times can be made small by proper clock generator design

Clock skew is represented by rotating one of the clocks!

• φ1(t)φ2(t) = 1 defines the skew time, ts

• ts indicates the possibility of unwanted simultaneous bit transfer

• skew is caused by the clock driving circuit or by the distribution arrangement


Clock Generation Circuits (1)

2-phase clock generator with transmission gate delay

• Mp1, Mn1 inverter acts as the first driver for the chain

• Transmission gate (TG) is used as delay element to minimize clock skew

• TG is modeled as an equivalent resistance RTG and introduces a delay tD = RTGCin

• tP - the propagation delay through an inverter

• Choosing tD ≈ tP makes the delays of the two branches equal

• Thus clocking skew can be controlled by adjusting the size of the TG transistors (β)

The equivalent TG resistance is:

$$ R_{TG} = \frac{1}{\beta_n (V_{DD} - V_{Tn}) + \beta_p (V_{DD} - |V_{Tp}|)} $$
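As a rough illustration of how the TG sizing enters the delay, the sketch below evaluates the equivalent resistance and tD = RTG·Cin. All numeric values (β, thresholds, Cin) are assumed for the example only, and the PMOS threshold is taken by magnitude.

```python
def tg_resistance(beta_n, beta_p, vdd, vtn, vtp):
    """Equivalent transmission-gate resistance in ohms.

    beta_n, beta_p: conduction factors of the TG transistors in A/V^2.
    vtp may be given as a negative number; its magnitude is used.
    """
    return 1.0 / (beta_n * (vdd - vtn) + beta_p * (vdd - abs(vtp)))


def tg_delay(r_tg, c_in):
    """Delay t_D = R_TG * C_in introduced by the transmission gate."""
    return r_tg * c_in


# Hypothetical values: beta = 100 uA/V^2 for both devices, 50 fF input load.
r_tg = tg_resistance(100e-6, 100e-6, vdd=5.0, vtn=0.9, vtp=-0.75)
t_d = tg_delay(r_tg, 50e-15)
print(f"R_TG = {r_tg:.0f} ohm, t_D = {t_d * 1e12:.1f} ps")
```

Increasing β (wider TG transistors) lowers RTG and therefore tD, which is the tuning knob the slide refers to for matching tD to tP.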


Clock Generation Circuits (2)

2-phase clock generator with RS latch

To ensure proper operation of the circuit, two items should be checked:

• tP through the inverter must be small compared to the clock period (CLK has time to enter the latch)

• the output capacitance in both branches should be equal for equal switching delays; but capacitances are sensitive to the layout and interconnect geometry!


Clock Drivers and Distribution Techniques (1)

The clock driver must be able to handle large capacitive loads at the required clock frequency

Clock skew originates mostly from:

• unbalanced loads at the driver

• unequal distribution line delays (RC) - see figure

Distribution network approaches:

• cascaded chain of inverting buffers that matches the clock generator to the distribution line

• balanced tree network with multiple fanouts

• symmetrical geometries (like H-tree) for the clock distribution lines


Clock Drivers and Distribution Techniques (2)

Balanced tree network with multiple fanouts:

• identical drivers can be used within a given stage

• the drive requirements of the output circuits are reduced from the single inverter design since the fanout has been split into groups

H-tree network:

• each clock distribution point O is at the same distance from the driver D, giving equal delay times


Input Protection Circuits (1)

Excessive electrical charge on the gate of the MOS transistor can destroy the device!

Protection circuits drain this excessive charge and avoid static burnout!

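The gate capacitance is:

$$ C_g = C_{ox} W L $$

The electric field across the gate oxide is:

$$ E_{ox} \approx \frac{V_G}{x_{ox}} $$

The breakdown field of the oxide is:

$$ E_{BD} \sim 7.5 \cdot 10^{6}\ \text{V/cm} $$

If Eox > EBD, the oxide insulating properties break down and charge is transported through the material: destruction of the device!

The max gate voltage VGmax is a relatively small number:

$$ V_{G,max} \cong E_{BD} \cdot x_{ox} = 7.5 \cdot 10^{6}\ \text{V/cm} \cdot 35 \cdot 10^{-9}\ \text{m} = 26.25\ \text{V} $$

Static electricity during handling could easily reach a few kV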
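The arithmetic behind the maximum gate voltage mixes V/cm and metres, which is easy to get wrong. A minimal sketch of the unit conversion, using the slide's numbers (the function and constant names are illustrative assumptions):

```python
E_BD_V_PER_CM = 7.5e6   # oxide breakdown field from the slide, in V/cm
X_OX_M = 35e-9          # oxide thickness from the slide, in m (35 nm)


def max_gate_voltage(e_bd_v_per_cm, x_ox_m):
    """Return V_Gmax = E_BD * x_ox in volts, converting V/cm to V/m first."""
    e_bd_v_per_m = e_bd_v_per_cm * 100.0  # 1 V/cm = 100 V/m
    return e_bd_v_per_m * x_ox_m


v_gmax = max_gate_voltage(E_BD_V_PER_CM, X_OX_M)
print(f"V_Gmax = {v_gmax:.2f} V")  # prints "V_Gmax = 26.25 V"
```

Compared with the few kV that handling can generate, this is a margin of roughly two orders of magnitude in the wrong direction, which is why the protection circuits are needed.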

Protection circuits allow for alternate charge flow paths when the input voltage is too large

Diode structures are very useful in this application because:

• they have relatively low breakdown voltages, which can be controlled

• reverse breakdown in a pn junction is non-destructive


Input Protection Circuits (2)

Diode input protection circuit:

• D1...4 are reverse biased

• R reduces the voltage that reaches D3, D4 and increases the level of protection

• D1, D2 and D3, D4 undergo breakdown for positive or negative voltage sources

Thick oxide MOSFET protection circuit:

• the transistor has a threshold voltage VT,f > VDD and is in cutoff during normal operation

• If Vin > VT,f the transistor conducts providing a path to ground to drain off the excessive charge

Input protection circuits introduce parasitic RC time constants into the network!


Static Gate Sizing (1)

Problem: determine the values of Sj for j = 2,... that minimize the total propagation delay through the inverter chain

• Sj - sizing factor, S1 = 1; Sj >1 for j>1

• βj - conduction factor, β1=k’(W/L)1; βj=Sjβ1

• Cw - wiring contribution of gate 1

• Ci, Co - in/out capacitances of gate 1

• Co,j = SjCo - output capacitance from gate j

• Ci,j = SjCi - input capacitance to gate j

• Cw,j = SjCw - wiring capacitance of gate j

The time delay through gate j, tD,j, is:

$$ t_{D,j} = \frac{R}{S_j}\left(C_{o,j} + C_{i,j+1} + C_{w,j+1}\right) = \frac{R}{S_j}\left[S_j C_o + S_{j+1}(C_i + C_w)\right] $$



Suppose that there are N stages in the chain; the total time delay is then given by:

$$ T_D = R \sum_{j=1}^{N} \frac{1}{S_j}\left[S_j C_o + S_{j+1}(C_i + C_w)\right] $$

To minimize TD we differentiate with respect to Sj and look for zero slope points:

$$ \frac{\partial T_D}{\partial S_j} = 0 $$

This results in the recursion relation:

$$ \frac{S_{j+1}}{S_j} = \frac{S_j}{S_{j-1}} \quad \text{for } j = 2, 3, \ldots, N $$

If this is to hold for arbitrary values of j, then:

$$ \frac{S_{j+1}}{S_j} = K = \text{const} $$

The boundary conditions of the problem are: S1 = 1, SN+1 = CL/Ci

Forming the product:

$$ \frac{S_2}{S_1} \cdot \frac{S_3}{S_2} \cdot \frac{S_4}{S_3} \cdots \frac{S_{N+1}}{S_N} = K^N = \frac{C_L}{C_i} $$

We obtain the scaling ratio in the form:

$$ K = \left(\frac{C_L}{C_i}\right)^{1/N} $$



Explicitly, the scaling factors are given by:

S1 = 1, S2 = K, S3 = K², ..., SN = K^(N-1)

The minimum delay is then:

$$ T_{D,min} = \sum_{j=1}^{N} R\left[C_o + K(C_i + C_w)\right] = N R\left[C_o + K(C_i + C_w)\right] $$

The number of stages that optimize the delay is obtained by differentiating TD (replacing K with its N-dependent equation) with respect to N and setting the result to 0:

$$ R C_o + R (C_i + C_w) \left(\frac{C_L}{C_i}\right)^{1/N} \left[1 - \frac{\ln(C_L/C_i)}{N}\right] = 0 $$

If Co is small:

$$ N = \ln\left(\frac{C_L}{C_i}\right) $$

N is chosen as the nearest integer for the given values of Ci and CL

The equation K = Sj+1/Sj says that the minimum delay occurs when every stage has the same individual delay time tD

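$$ K^N = \frac{C_L}{C_i} \;\Leftrightarrow\; N \ln K = \ln\left(\frac{C_L}{C_i}\right) $$

With N = ln(CL/Ci):

$$ N \ln K = N \;\Leftrightarrow\; \ln K = 1 \;\Leftrightarrow\; K = e^1 = e $$

The optimum scaling ratio equals e!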
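The sizing procedure can be sketched in a few lines: pick N as the nearest integer to ln(CL/Ci), derive K from the boundary condition K^N = CL/Ci, and list the sizing factors. The function name and the example load ratio are assumptions made for illustration; Co is neglected, as in the small-Co case of the derivation.

```python
import math


def size_inverter_chain(c_load, c_in):
    """Return (N, K, [S_1 .. S_N]) for a tapered inverter chain.

    N is the nearest integer to ln(C_L / C_i) (at least 1),
    K = (C_L / C_i) ** (1 / N) and S_j = K ** (j - 1).
    """
    ratio = c_load / c_in
    n = max(1, round(math.log(ratio)))
    k = ratio ** (1.0 / n)
    sizes = [k ** j for j in range(n)]
    return n, k, sizes


# Hypothetical example: the load is 1000 times the input capacitance of gate 1.
n, k, sizes = size_inverter_chain(c_load=1000.0, c_in=1.0)
print(n, round(k, 3), [round(s, 1) for s in sizes])
```

For this ratio ln(1000) ≈ 6.9, so seven stages are used and the resulting K ≈ 2.68 lies close to e, as the derivation predicts once N is rounded to an integer.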


Off-Chip Driver Circuits

Off-chip driver circuits are critical to the overall chip design

Some important problems must be addressed:

• efficient buffer circuitry between internal and off-chip drivers

• minimization of transmission line effects

• fast switching

• static charge protection

• interface specific items, such as CMOS-TTL level converter, etc.

An inverter circuit can be used as a basic off-chip driver

Performance factors are :

• the transient switching times tLH and tHL

• transmission line effects


Double-Inverter Off-Chip Driver Circuit

The simplest off-chip driver circuit: an inverter chain designed to handle a large capacitive load

The sizes of Mn2 and Mp2 can be estimated using the high-to-low time constant τn and the low-to-high time constant τp:

$$ \left(\frac{W}{L}\right)_{n2} = \frac{C_{out}}{\tau_n \, k'_n \, (V_{DD} - V_{Tn})} $$

$$ \left(\frac{W}{L}\right)_{p2} = \frac{C_{out}}{\tau_p \, k'_p \, (V_{DD} - |V_{Tp}|)} $$

The actual values of the fall and rise time can be estimated from:

$$ t_{HL} = \tau_n \left[ \frac{2 V_{Tn}}{V_{DD} - V_{Tn}} + \ln\left( \frac{2 (V_{DD} - V_{Tn})}{V_0} - 1 \right) \right] $$

$$ t_{LH} = \tau_p \left[ \frac{2 |V_{Tp}|}{V_{DD} - |V_{Tp}|} + \ln\left( \frac{2 (V_{DD} - |V_{Tp}|)}{V_0} - 1 \right) \right] $$

where V0 is the 10% voltage point

Cout is large ⇒ Mn2 and Mp2 are large ⇒ they are built from parallel-connected transistors to aid in layout and parasitic control

Mn1 and Mp1 can be sized using the previously presented sizing theory


Example

Consider a process characterized by the nominal values:

k’n = 55[µA/V2] VT0n = 0.9[V]

k’p = 25[µA/V2] VT0p = -0.75[V]

and VDD = 5[V]

The requirements for off-chip driver circuits are tLH = tHL = 20[ns] with a maximum load of Cout = 50[pF]

Using the previous equations we can compute the time constants

τn = 6.45[ns]

τp = 6.58[ns]

the aspect ratios are:

$$ \left(\frac{W}{L}\right)_{n2} \cong 35, \qquad \left(\frac{W}{L}\right)_{p2} \cong 72 $$
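The last step of the example can be retraced numerically. The sketch below starts from the time constants stated above and evaluates (W/L) = Cout / (τ k' (VDD - |VT|)); the function name is an assumption, and rounding up to the next integer is an assumed convention that happens to reproduce the quoted values.

```python
import math


def aspect_ratio(c_out, tau, k_prime, vdd, vt):
    """Aspect ratio W/L of an output transistor for a given time constant.

    c_out in F, tau in s, k_prime in A/V^2; vt is used by magnitude so the
    same function serves the NMOS and the PMOS device.
    """
    return c_out / (tau * k_prime * (vdd - abs(vt)))


wl_n = aspect_ratio(50e-12, 6.45e-9, 55e-6, 5.0, 0.9)
wl_p = aspect_ratio(50e-12, 6.58e-9, 25e-6, 5.0, -0.75)
print(f"(W/L)n = {wl_n:.1f}, (W/L)p = {wl_p:.1f}")
print(math.ceil(wl_n), math.ceil(wl_p))  # prints "35 72"
```

The PMOS device comes out roughly twice as wide as the NMOS device because k'p is less than half of k'n.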


Tri-State Off-Chip Driver Circuit

The input signal is split so that it controls each output transistor individually

The high-impedance state is obtained by driving both NMOS and PMOS output devices into cutoff

Normal operation:

Z = 1 ⇒ Mp1 and Mp2 off, Mn on

High-impedance state:

Z = 0 ⇒ Mp1 and Mp2 on, Mn off

⇒ Vp = VDD, Vn = 0

⇒ the output transistors are in cutoff


Bidirectional Off-Chip Driver Circuit

The tri-state section is a non-inverting buffer with an enable control E

E = 0 gives the high-Z state


Packaging Technology (1)

Package types

1. Bare die

2. Dual-In-line Package (DIP)

3. Pin Grid Array (PGA)

4. Small-outline IC

5. Quad flat pack

6. Plastic Leaded Package (PLCC)

7. Leadless carrier




The package has important functions in IC technology:

• provides a means of bringing signal and supply wires in/out of the circuit

• removes the heat generated by the circuit

• protects the die against environmental conditions such as humidity

• provides mechanical support

At the same time, packaging technology has a tremendous impact on performance ⇒ up to 50% of the delay of a high-performance computer is due to packaging delays!

Packages generate parasitic inductance and capacitance:

Package Type          Capacitance (pF)    Inductance (nH)
68-pin plastic DIP    4                   35
68-pin ceramic DIP    7                   20
256-PGA               1-5                 2-15
Wire bond             0.5-1               1-2
Solder bump           0.1-0.5             0.01-0.1


Inductive coupling between external (VDDext) and internal (VDDint) supply voltage (bonding wires)

(Figure: driver with input Vin, output Vout and load CL; bond-wire inductances L in the supply path between VDDext and VDDint; supply current i(t))

A transient current is sourced/sunk from/into the supply rails to charge/discharge CL

A changing current passing through an inductor generates a voltage drop:

$$ \Delta v = L \frac{di}{dt} $$

∆v, the difference between VDDext and VDDint:

• affects the logic levels

• reduces the noise margin
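A quick estimate shows why this matters for output drivers. The sketch below approximates di/dt by Δi/Δt; the inductance is taken from the upper end of the wire-bond range in the table above, while the current step, the edge time and the number of simultaneously switching drivers are purely assumed values.

```python
def supply_bounce(inductance, delta_i, delta_t):
    """Approximate dv = L * di/dt for a linear current ramp.

    inductance in H, delta_i in A, delta_t in s; returns volts.
    """
    return inductance * delta_i / delta_t


# One driver: 2 nH bond wire, 50 mA current step within 1 ns (assumed values).
dv_single = supply_bounce(2e-9, 50e-3, 1e-9)

# Sixteen drivers switching at once through the same supply pin (assumed).
dv_bus = supply_bounce(2e-9, 16 * 50e-3, 1e-9)

print(f"{dv_single:.2f} V, {dv_bus:.2f} V")  # prints "0.10 V, 1.60 V"
```

Under these assumptions a single driver costs 0.1 V of supply headroom, but a 16-bit bus sharing one pin would lose 1.6 V, which motivates the multiple power pins and slower edges listed among the design techniques.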


Example: parasitic effects of the bond-wire inductance


Design techniques:

• Separate power pins for I/O pads and chip core

• Multiple power and ground pins

• Careful selection of the position of the power and ground pins on the package

• Adding decoupling capacitance on the board

• Increase the rise and fall times

• Use advanced packaging technologies

(Figure: supply connected through board wiring and bonding wire to the chip, with a decoupling capacitor Cd across the supply)




Packaging Technology Requirements:

• Electrical: low parasitics (L, C, R)

• Mechanical: reliable and robust

• Thermal: efficient heat removal

• Economical: inexpensive

Two interconnect levels:

(1) Die-to-Package-Substrate

(2) Package substrate to PCB



1-a: Wire bonding

(Figure labels: lead frame, substrate, die, pad)

• Wires must be attached serially

• Bonding wires have inferior electrical properties (L, C)

• Difficult to predict the exact value of parasitics (irregular)



1-b: Tape-automated bonding (TAB)

(Figure labels: substrate, die, solder bump, film + pattern, sprocket hole, polymer film, lead frame, test pads)

• The die is attached to a metal lead frame that is printed on a polymer film

• The connection between chip pads and polymer film wires is made using solder bumps

• Highly automated process

• Improved electrical performance (L ~ 0.5 nH, C ~ 0.3 pF)



1-c: Flip-chip mounting

(Figure labels: solder bumps, substrate, die, interconnect layers)

• Flip the die upside-down and attach it directly to the substrate using solder bumps

• Superior electrical performance

• Pads can be placed at any position on the chip (not only on the die boundary)

• A possible solution for power and clock distribution problems



2-a: Through-hole mounting

• mechanically reliable connections

• limits packaging density

2-b: Surface mounting

• increase package density:

• through holes are eliminated

• the lead pitch is reduced

• both sides of the board can be used

• the on-the-surface connection is weaker

• more expensive equipment needed

• testing on board is more complex


Packaging Technology (10)

Multi-Chip-Modules (MCM) - Die-to-Board

(avionics processor module - Rabaey96)

Mount the die directly on the substrate

• increase the packaging density

• increase the performance

• reduce power consumption

• expensive technology


Semiconductor Packaging Process

How do we get from the wafer to the final application?

Finally, the packaging processes at component and application-board level make the product work and succeed.


Semiconductor Packaging Process – Pre-Assembly

Pre-Assembly Process Flow (Source: S. Mimietz/QD)

Advanced Pre-Assembly Process (Dicing before Grinding - DBG):

• Steps: Half Cut Dicing, Tape Lamination, Back Side Grinding, Stress Relief (Plasma), Wafer Mounting, Grinding Tape Removal

• Materials/tools: Dicing Blades, Grinding Tape, Grinding Wheels, Gas/Energy, Mounting Tape, Peeling Tape

Standard Pre-Assembly Process (Grinding before Dicing - GBD):

• Steps: Tape Lamination, Back Side Grinding, Stress Relief (Plasma/Dry Polish), Wafer Mounting, Grinding Tape Removal, Full Cut Dicing

• Materials/tools: Grinding Tape, Grinding Wheels, Gas/Energy, Mounting Tape, Peeling Tape, Dicing Blades



Face-Down Assembly Process (Ball Grid Arrays w/ Bond Channel):

• Steps: Printing/Taping, Post Print Curing, Die Attach/Lamination, Adhesive Curing, Wire Bonding

• Materials/tools: Adhesive/Tape, Pick-up Tooling, Temperature/Time, Capillary & Wire

Face-Up Assembly Process (Ball Grid Arrays w/o Bond Channel):

• Steps: Adhesive Dispense, Die Attach, Adhesive Curing, Wire Bonding

• Materials/tools: Adhesive, Dispense & Pick-up Tooling, Temperature/Time, Capillary & Wire


End of Line Process (Ball Grid Arrays):

• Steps: Plasma Activation, Molding, Post Mold Curing, S/B Attach Reflow, Package Singulation

• Materials/tools: Gas/Energy, Compound, Temperature/Time, Solder Ball & Flux, Dicing Blades

End of Line Process (Leaded Packages):

• Steps: Plasma Activation, Molding, Post Mold Curing, Dedam/Dejunk, Sn-Plating Leads, Trim & Form

• Materials/tools: Gas/Energy, Compound, Temperature/Time, Plating Bath, Cutting Tool, Forming Tool



Packaging Key Enabler

Cost: Cost per function decreases 25% per year

Form Factor (Package Density): Feature size reduction by factor 0.7X linear each node (every 2...3 years); doubling devices/cm² each node (every 2...3 years)

Integration Level: Moore's law: bits per chip grow by a factor of 4x every 3 years; in future slowing down to 4x every 4...5 years

Speed: Clock frequency/data rate is increasing (5x growth every 10 years, slowing down to 3x)

Power: Laptops and cell phones require extended battery lifetimes; heat dissipation has to become more effective

Functionality:

– Logic: Digital CMOS - Analog / Mixed Signal - CMOS RF

– Memory: SRAM - DRAM - eDRAM - EEPROM/Flash - FRAM - MRAM

– Actuators / Sensors: Electro-optical - MEMS - chemical sensors - electro-biological


Typical Memory Package Types

Basic Packaging Concepts

The actual package concepts in use are:

TSOP (Thin Small Outline Package) – since about 1995

FBGA (Fine Pitch Ball Grid Array) – since about 2003

FLGA (Fine Pitch Land Grid Array) – since about 2005

F2BGA (Fine Pitch Flip Chip Ball Grid Array)

MCP (Multi Chip Package)


Packaging Key Enabler – Form Factor Dimension

(Figure: form factor (interconnect, size, cost) of package types relative to silicon size and function/performance. Labels: TSOP (wire bond & lead frame), FBGA (wire bond & substrate), LGA (wire bond & substrate, w/o balls), F2BGA (bump & substrate); standard packages: TSOP, FBGA, LGA, FCiP; customized solutions: MCP, MCP/SiP; 2D package, 3D package. Source: H. Hedler/QAG: Current and future packaging challenges)

Smaller package sizes allow increased package density on board.

Better electrical package performance supports higher speed.


Packaging Key Enabler – Form Factor Chip Density

Higher package and/or chip density support increased storage density on module level.

- packages get stacked to better utilize placement area

- substrates get thinner to enable thin packages

- chips get thinner to enable die stacking

- balls get smaller to maintain total package height

- bonding wires get replaced by RDL and vias

Stacked BGA (Folded)

Stacked BGA (PoP)

Stacked TSOP

Stacked Die FBGA

Wafer Level Package


Typical Memory Package Types - TSOP

1. Thin Small Outline Package (TSOPII)

Package type w/ “Z- leads” on 2 opposite package sides

TSOPII is typically a single die package

SMT compliant

Typical pin count : 54/66

Package height : 1.2 mm


Typical Memory Package Types - TSOP

Chip face-down assembly

Chip face-up assembly

Principle Package Constructions for TSOPII


Technical Challenges – TSOP Challenges

TSOP Challenges

One big challenge for TSOP packages is whisker growth related to the Pb-free plating applied for green packages. The whisker growth rate strongly depends on the stress level inside the plated layer on the leads. The stress conditions can be affected by the plating technology and by SMT reflow.


Typical Memory Package Types - FBGA

2. Fine Pitch Ball Grid Array (FBGA)

Package type w/ ball interconnects on bottom side only

The FBGA package concept is flexible and can carry more than one chip.

SMT compliant package

Ball count range : 54 – 144

Package height : 0.55/0.80/1.00/1.20/1.40 mm


Typical Memory Package Types - FBGA



Principle Package Constructions for FBGA


Typical Memory Package Types - FLGA

3. Fine Pitch Land Grid Array (FLGA)

Package type w/o solder spheres on the bottom side, which results in a lower total package height

Typically contains a single die in flip chip or wire bond technology, but the FLGA package concept is also flexible enough to carry more than one die.


Ball count range : 8 – 300+

Package height : 0.4/0.48/1.0/1.1/1.2/1.3/1.4/1.92/2.2 mm


Typical Memory Package Types - FLGA


Principle Package Construction for FLGA


Typical Memory Package Types - F2BGA

4. Fine Pitch Flip Chip BGA (F2BGA)

F2BGA is a low or thin profile plastic BGA that carries inside a flip chip mounted on polymer substrate but looks from the package outside like a FBGA w/o bond channel

This package contains typically one die in flip chip technology


Package height : 1.2/1.4 mm

Package ball count range : 136 - 240


Typical Memory Package Types – F2BGA


Principle Package Construction for F2BGA


Typical Memory Package Types - MCP

5. Multi Chip Package (MCP)

MCPs are low or thin profile plastic TQFP, LQFP or FBGA packages that today contain 2 - 8 stacked functional chips and up to 7 spacers in the same package.

Memory MCPs follow very different package concepts, depending on the individual chip sizes to be packaged and the required position of each individual die within the chip stack.


Ball count range : 54 – 149 (2007)

Package height : 0.8/1.0/1.2/1.3/1.4/1.6 mm



“Chinese Tower” “Chinese Reverse Tower”

“Mixed Die Stack” “Quad Die Stack”

Principle Package Constructions for MCP



… continued MCP

MCPs have gained tremendous importance as memory packages during the last 2 years, since they are the most effective way to combine different functionalities and/or increase storage density per package footprint.

Mainstream memory packages use stacked chips. The package concepts can generally be structured into:

- Chip stack of same die size (Dual Die or Quad Die Stack)

- Chip stack starting w/ largest and finishing w/ smallest die (Chinese Tower)

- Chip stack starting w/ smallest and finishing w/ largest die (Chinese Reverse Tower)

- Mixed die sizes in all stack positions (Mixed Die Stack)

To manufacture MCPs, a broad range of wafer thinning, die attach and wire bond technologies needs to be mastered. Besides the process technologies, the materials used also play a major role in success.

MCP technology is considered a key packaging technology of the near future.


Technical Challenges – MCP Challenges

MCP Challenges

The most crucial task for MCPs is to develop and establish robust processes for thin die stacking and wire bonding:

• Die pick-up capability for 75 µm, 50 µm or less thickness

• Full range of material sets to stack different chips for different stack configurations

• Advanced die attach and wire bond loop capability


Future Technical Challenges – Where we are?

Wins / Features:

• Small footprint

• Very high scale integration

• Very high storage density

• High speed and data rate

• Less energy consumption

• New DRAM architecture

Phase 1: Single Die Package

Phase 2: Multi Chip Package

Phase 3: 3D Chip Integration


Future Technical Challenges – New Concepts

Future packaging technology will focus on 3D chip integration, which requires very strong cooperation between Frontend and Backend Development.

Challenges:

• DRAM architecture is different

• Wire bonds are replaced by Si through-hole electrodes

• DRAM design has to consider space for micro vias

• Redistribution layer and micro vias become a Frontend process

• Chip thickness is extremely low

• New interconnect technology has to be developed

• Balancing of the CTE mismatch inside the package has to be managed

Multi Chip Package

3D Chip Stack Package


11. CAD & Design Flow


Motivation: Microelectronics Design Efficiency

Achieving required productivity by system-level design methodologies

(Figure: design efficiency over time, 1970 to 2010, compared with Moore's Law. Successive methodologies: Layout Editor, Schematic Entry, Logic and Architectural Synthesis, Platform-based Design, ???)


Example for Complex Systems: Embedded SoC

Properties

• Potentially consisting of a large number of components

• Specialised to an application domain

• Reactive

• Real-time capability

Design Tasks

• Definition of communication architecture which is adequate to the application‘s structure

• Mapping of the system specification on available implementation components

Constraints

• Costs

• Power consumption

• Latency

• Required flexibility

(Figure: embedded „System-on-Chip“ with microcontroller, DSP, memory, I/O module, ASIC and RF transceiver, connected to sensors and actuators)


Platform-Based System Design: Platform Life-Cycle

(Figure: platform life cycle. A generic platform (CPU core, DSP core, bus, memory, drivers, OS, API) is combined with application-specific additions (specific blocks, applications).)

Easy implementation: multiple devices with similar basic functions

Experiences and new requirements: feedback for future platform generations


Project Management: System Design: V Model

(Figure: V model of system design.

Stages: Analysis of System Requirements, Design of System Architecture, Analysis of HW/SW Component Requirements, HW/SW Co-Design, HW and SW Component Implementation, HW/SW Integration, System Integration, System Delivery.

Levels: Product Level, System Level, HW/SW Component Level, HW/SW IP Database and Implementation Level.

Intermediate results: System Properties and Constraints, Cost Analysis, Abstract Interfaces, Implemented HW/SW Modules, Prototype Generation and/or Manufacturing, Product, Customer Application.

Quality Assurance and Validation link the corresponding stages on each level.)


Hardware/Software Co-Design

(Figure: HW/SW co-design flow. Specification, HW/SW Partitioning into HW-Specification and SW-Specification, Synthesis and Compilation, Communication Synthesis, Placement/Routing and Real-Time OS, resulting in a Heterogeneous HW-/SW-System; Co-Simulation accompanies the flow.)

O.k., let's go bottom-up now


Classes of CAD Tools

• Design Entry:

– Graphical Editor (drawing schematic diagrams, physical layout, stick layout diagrams, ...)

– Language based circuit capture tools (for hardware description languages like VHDL, Verilog, EDIF)

• Design Validation:

– Physical design verification tools (design rule checker, extractor, LVS, schematic and electrical rule checker)

– Design Simulation:

• analog simulation: circuit level, behavioural level

• digital simulation: circuit level, switch level, logic level, register transfer level, architectural level, behavioural level

• thermal simulation: displaying heat dissipation on chip

– Formal Verification Methods


Classes of CAD Tools

• Design Implementation:

– Layout Compilers (stick2layout, macrocell generators, datapath compilers)

– Layout Structuring & Optimization:

• Layout Compaction

• Placement and Routing

– Logic Synthesis

– Finite State Machine (FSM) Synthesis

– Architectural Synthesis

• Management of Design Projects:

– Design Databases:

• keep different versions (current, backup 1, ..., backup n) and views of a design object (schematic, simulation netlist, stick diagram, physical layout, ...) in a database


Full Custom Design: Design Entry

Full Custom Design

With full custom design techniques, the designer can individually specify the geometrical layout of the integrated circuit (transistor size [channel length, channel width, shape, ...], transistor placement, wire width, ...). The designer has the option to optimize the layout manually.

The densest, most area-efficient layouts can be generated using the full custom design style.

(Figure: layout editor and design rule check; source: www.tanner.com)

Hand-Crafted Layout:

• The layout is drawn in the form of rectangles and polygons on different layers using a graphics editor.

• The designer has to know a large set of process-dependent design rules.

• The mask layout is generated as drawn on the screen: the designer has direct influence on component placement and on important parameters such as W and L of transistors, wire widths, ...


Full Custom Design: Design Entry

Tool internal Design Representation: Geometrical Specification Language

• The layout is specified in textual form giving either the position and layer of rectangles (similar to hand crafted layout) or lines (as in stick diagrams).

• Since programming language constructs like

– parameterized macros (to be used for layout segments such as cells, ...),

– loops (while, repeat, for, ...), and

– conditional statements (if, case, ...)

may be available, parameterized layouts (e.g. a generic transistor with W and L as parameters, cells for different bit widths, ...) can be described using geometrical specification languages.

• Used in a large number of macrocell compilers.



Example for a simplified geometrical specification language:

Command        Description
B x y dx dy    Box with length dx, width dy, and lower left-hand corner placed at (x,y)
L n            Layout level (layer) for the box definitions that follow
M n            Start of macro definition n
E              End of macro definition
C n x y m      Call for macro number n with translation x,y and orientation m
Q              End of layout file

MOS Layer definitions:

Layer    CMOS           NMOS
1        n-diffusion    n-diffusion
2        p-diffusion    ion implant
3        polysilicon    polysilicon
4        metal          metal
5        contact        contact
8        n-well         --
9        overglass      overglass



Cell Orientations:

Orientation    Description
1              no rotation
2              rotate 90° counterclockwise
3              rotate 180° counterclockwise
4              rotate 270° counterclockwise
5              mirror about y-axis
6              rotate 90° counterclockwise and mirror about y-axis
7              rotate 180° counterclockwise and mirror about y-axis
8              rotate 270° counterclockwise and mirror about y-axis
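To make the command set (B, L, M, E, C, Q) concrete, here is a minimal sketch of a reader that flattens such a file into a list of boxes. It is an illustration under stated assumptions, not the format of any particular tool: one command per line, integer coordinates, a macro must be defined before it is called, and only orientation 1 (no rotation) is handled. The function name `parse_layout` and the sample file are hypothetical.

```python
def parse_layout(text):
    """Flatten a simplified geometrical specification into boxes.

    Returns a list of (layer, x, y, dx, dy) tuples for the top level.
    """
    macros = {}      # macro number -> list of boxes defined inside it
    top = []         # boxes placed at the top level
    target = top     # list that currently receives boxes
    layer = None     # layer selected by the most recent L command
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        cmd, args = tokens[0], [int(t) for t in tokens[1:]]
        if cmd == "L":
            layer = args[0]
        elif cmd == "B":
            x, y, dx, dy = args
            target.append((layer, x, y, dx, dy))
        elif cmd == "M":
            target = macros.setdefault(args[0], [])
        elif cmd == "E":
            target = top
        elif cmd == "C":
            n, x0, y0, orientation = args
            if orientation != 1:
                raise NotImplementedError("only orientation 1 is handled")
            # Copy the macro's boxes, shifted by the translation (x0, y0).
            for lay, x, y, dx, dy in list(macros[n]):
                target.append((lay, x + x0, y + y0, dx, dy))
        elif cmd == "Q":
            break
        else:
            raise ValueError(f"unknown command: {cmd}")
    return top


SAMPLE = """\
M 1
L 1
B 0 0 4 2
L 3
B 1 -1 2 4
E
C 1 0 0 1
C 1 10 0 1
Q
"""

for box in parse_layout(SAMPLE):
    print(box)
```

The sample defines macro 1 as an n-diffusion box crossed by a polysilicon box (a transistor-like shape in the layer table above) and instantiates it twice, the second time shifted by 10 units in x.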



(Figure: full custom layout, hand-crafted or generated from a stick diagram or a layout description, with the corresponding geometrical specification file and schematic diagram)



Stick Diagram:

• The layout is drawn in the form of lines and polygons on different layers using a graphics editor.

• A stick-to-layout converter, together with a compactor and a description of the process design rules, is then used to generate the rectangle-based layout.

• The designer can draw symbolic layouts that are almost independent of the process and its design rules. Process adaption is done by the converter/compactor.

• Converter constraints (cell dimensions, channel widths / lengths of transistors, ...) can be specified.

• Stick Diagram Conventions:

– Diffusion Areas: green (b/w: dotted line)

– Polysilicon Lines: red (b/w: dashed line)

– Metal Lines: blue (b/w: solid line)

– Contacts: black

Example: Stick Diagram of a Transistor:


Full Custom Design: Stick Diagrams

Memory cell schematic and corresponding stick diagram


Full Custom Design: Design Flow

(Figure: full custom design flow. Cells are created with the Stick Diagram Editor (followed by the stick2layout Converter and Compactor) or with the Layout Editor. Further steps: Symbol Generation, Schematic Entry, Simulation Netlist Extraction and Simulation (SPICE); Block Layout with Floorplanning, Placement & Routing; Design Analysis (DRC, ERC), Circuit Extraction (LVS), Circuit Simulation (SPICE) and Timing Analysis; Test Pattern Generation. Results: Mask Layout Data and Fabrication Test Pattern for Fabrication.)


Cell Based Design

Cell based design approaches rely on layout components predefined and provided by a silicon foundry. Several implementation styles can be distinguished:

• Standard Cells:

– layout blocks predefined by silicon foundry

– full process sequence (all mask layers) for chip fabrication required

• Gate Arrays:

– Linear Gate Arrays:

• pre-fabricated diffusion and poly layers (regular structures, e.g. transistors)

• customized interconnect structures (wires in metal 1 and metal 2)

• fixed size interconnect areas (channels)

– Sea of Gates Array:

• pre-fabricated diffusion and poly layers (regular structures, e.g. transistors)

• customized interconnect structures (wires in metal 1 and metal 2)

• variable size interconnect areas (channels) over unused transistors

discussed later in this lecture


Cell based Full Custom Design: Design Flow

(Figure: cell based design flow. Inputs: Cell Library with Simulation Models, Macrocell Specification/Compilation, Symbol Generation, Schematic Entry (Graphical Data). Verification: Logic Simulation, Fault Simulation, Timing Analysis, Test Pattern Generation. Placement: Standard Cells, Macro Cells, I/O Cells. Routing: Channel Generation, Global Routing, Detailed Routing. Place & Route Optimization with Parasitic Wire Capacitances / Delay Backannotation. Layout Data: Design Analysis (DRC, ERC), Circuit Extraction (LVS), Simulation Netlist Extraction. Results: Mask Layout Data and Fabrication Test Pattern for Fabrication.)


Standard Cell Full Custom Design


Design Verification

Physical Design Rule Check:

Physical design rule checks (DRCs) are performed to guarantee that a layout design conforms to the silicon vendor's set of design rules. Design rules are defined between objects on the same layer (minimum width, minimum spacing) as well as for objects on different layers (minimum spacing, overlapping, extension).

(Figure: examples of the rule types minimum width, minimum spacing, overlapping and extension)

Design rule violations are usually reported in the physical layout using a graphics editor. Sometimes a table listing the location and type of each design rule violation can also be generated.
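The two same-layer rules can be illustrated with a toy checker on axis-aligned rectangles. This is a sketch under simplifying assumptions, not how a production DRC works: boxes are (layer, x, y, dx, dy) tuples, touching or overlapping boxes are treated as connected, spacing is measured as Euclidean distance, and the rule values in the example are invented.

```python
import math
from itertools import combinations


def check_design_rules(boxes, min_width, min_spacing):
    """Return a list of violations for same-layer width and spacing rules.

    boxes: list of (layer, x, y, dx, dy); min_width and min_spacing map a
    layer number to the rule value in the same length unit as the boxes.
    """
    violations = []
    for i, (layer, x, y, dx, dy) in enumerate(boxes):
        if min(dx, dy) < min_width[layer]:
            violations.append(("width", i))
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        if a[0] != b[0]:
            continue  # the spacing rule here only applies within one layer
        gap_x = max(b[1] - (a[1] + a[3]), a[1] - (b[1] + b[3]), 0)
        gap_y = max(b[2] - (a[2] + a[4]), a[2] - (b[2] + b[4]), 0)
        distance = math.hypot(gap_x, gap_y)
        if 0 < distance < min_spacing[a[0]]:
            violations.append(("spacing", i, j))
    return violations


# Three metal wires (layer 4); rule values are hypothetical.
wires = [(4, 0, 0, 3, 10), (4, 5, 0, 3, 10), (4, 20, 0, 2, 10)]
report = check_design_rules(wires, min_width={4: 3}, min_spacing={4: 3})
print(report)  # prints "[('width', 2), ('spacing', 0, 1)]"
```

The returned list corresponds to the tabular report form mentioned above: each entry names the rule type and the indices of the offending boxes.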


Design Verification

Extraction:

• Circuit Level Extraction can be used to create a netlist for circuit level simulations (e.g. SPICE, ...). The netlist consists of MOS transistors (including geometrical parameters as W / L, parasitic capacitances), resistors, capacitances, diodes, ...

• Switch Level Extraction: can be used to create a netlist which can be processed by a switch level simulator. The resulting netlist consists of MOS transistors and parasitic capacitances (to model storage effects in MOS circuits).

• Parasitics Extraction: is used in conjunction with cell based design techniques. Since wire delay is dependent on the parasitic capacitance of a wire, parasitic capacitances of nets and input capacitances of other gates connected to an output can be used to estimate the extrinsic delays (Note: intrinsic delays [i.e. the delay of unloaded gates] are fetched from the cell library's simulation model data).

• Schematic Extraction: is executed to generate the connectivity data out of a graphical representation (schematic diagram) of a circuit module. The connectivity data is forwarded to a netlister which provides the information required e.g. by simulation tools (the simulators cannot operate on graphical data, they require netlists in a textual format). This kind of extraction is usually required in pre-layout design specification phases.


Design Verification

LVS:

The layout-versus-schematic (LVS) comparison tool checks the equivalence of the layout and its schematic.The tool can be used to find wrong connections or parameter mismatch (as W/L of transistors, ...) between a schematic and its physical layout representation.

Schematic / Electrical Rule Check (SRC / ERC):

To verify schematics used e.g. in cell based designs, a schematic rule checker can find schematic rule violations (like the following examples):

• Warnings:
– unconnected (floating) wire segments
– open outputs
– exceeded fanout

• Errors:
– open inputs (undefined input value!)
– number of bits differs for 2 buses connected together
– number of input/output pins in a schematic differs from its symbol representation (--> pins are not accessible / not present at higher levels of schematic hierarchy)
– more than one active driver connected to a net at the same time


Simulation: Models

Circuit and Delay Modelling:

• Circuit is built up by simulator primitives

• Modelling of the timing/delay behaviour:
– ∆: basic time unit
– τ(n) = n * ∆: delay of the gate
– t1, t2, t3, ...: clock time of a synchronous circuit
– (tν+1 - tν): ∆t = m * ∆

Timing Models:

• Zero Delay: ∆ = 0

• Unit Delay: τ(n) = constant

• Nominal delay: τ(n) = user-specified


Logic simulation (1/8)

• Simulation only in the time domain

• Typical Questions:

– How do my output signals behave based on a certain input pattern?

– Is my design still functioning at a given frequency?

• Algorithms:

– Signal values are discrete

– Signal changes are discrete events (where an event characterizes the transition from one signal level to another)

– Events are held and processed using a so-called “event-queue”

• Dynamic, linked list

• Sorted based on time (appearance of event)

• Processed based on current simulation time

• Models (gate primitives) are triggered by events at input signals



• Logic Systems

– Signal values representing (logic) level and strength

– Resolving multiple drivers via so-called resolution functions (e.g. ‘0’ and ‘1’ at the same node result in an ‘X’); example later

– 2-valued logic system (e.g. VHDL: type bit)
• '0' ("low", e.g. Vout < 2.5 V) and '1' ("high", e.g. Vout > 2.5 V)

– 3-valued logic system
• To describe circuit problems (signal conflicts)
• '0' ("low"), '1' ("high")
• 'X' ("unknown", may be '0' or '1')

– 4-valued logic system
• To describe bus structures
• '0', '1', 'X' (see above)
• 'Z' ("high impedance")

[Figure: gate examples for the signal values 'X' and 'Z', including drivers with an enable input EN]



– 9-valued logic system (VHDL: type std_logic, package std_logic_1164)
• 'U' ("uninitialized")
• 'X' ("forcing unknown")
• '0' ("forcing low"), '1' ("forcing high")
• 'Z' ("high impedance")
• 'W' ("weak unknown")
• 'L' ("weak low"), 'H' ("weak high")
• '-' ("don't care")

CONSTANT resolution_table : stdlogic_table := (
--  ---------------------------------------------------------
--  |  U    X    0    1    Z    W    L    H    -        |   |
--  ---------------------------------------------------------
    ( 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U' ), -- | U |
    ( 'U', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X' ), -- | X |
    ( 'U', 'X', '0', 'X', '0', '0', '0', '0', 'X' ), -- | 0 |
    ( 'U', 'X', 'X', '1', '1', '1', '1', '1', 'X' ), -- | 1 |
    ( 'U', 'X', '0', '1', 'Z', 'W', 'L', 'H', 'X' ), -- | Z |
    ( 'U', 'X', '0', '1', 'W', 'W', 'W', 'W', 'X' ), -- | W |
    ( 'U', 'X', '0', '1', 'L', 'W', 'L', 'W', 'X' ), -- | L |
    ( 'U', 'X', '0', '1', 'H', 'W', 'W', 'H', 'X' ), -- | H |
    ( 'U', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X' )  -- | - |
);
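To make the role of the resolution table concrete, the following Python sketch folds the values of several drivers into one resolved value. The table rows are copied from the VHDL constant above; the names `VALUES`, `TABLE` and `resolve`, and the choice to start the fold from 'Z', are assumptions made for this illustration and are not taken from the lecture.

```python
# Minimal sketch of a 9-valued resolution function (illustrative only).
VALUES = "UX01ZWLH-"      # column/row order of the table

TABLE = [
    "UUUUUUUUU",  # U
    "UXXXXXXXX",  # X
    "UX0X0000X",  # 0
    "UXX11111X",  # 1
    "UX01ZWLHX",  # Z
    "UX01WWWWX",  # W
    "UX01LWLWX",  # L
    "UX01HWWHX",  # H
    "UXXXXXXXX",  # -
]

def resolve(drivers):
    """Combine the values of all drivers connected to one node."""
    drivers = list(drivers)
    if len(drivers) == 1:          # a single driver needs no resolution
        return drivers[0]
    result = "Z"                   # assumed neutral start value
    for value in drivers:
        result = TABLE[VALUES.index(result)][VALUES.index(value)]
    return result

print(resolve(["0", "1"]))   # conflict of two forcing drivers
print(resolve(["Z", "1"]))   # tri-stated driver plus active driver
print(resolve(["L", "H"]))   # two weak drivers in conflict
```

A forcing value overrides a weak one, and two conflicting values of equal strength resolve to the corresponding unknown ('X' or 'W').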



• Timing Behavior Models

– Zero delay (non-delay): all gates have the same delay, namely zero
Advantage: simple, fast
Disadvantage: no accuracy or timing behavior, not suitable for asynchronous circuits

– Unit delay: all gates have the same delay t_pd > 0
Advantage: simple, fast
Disadvantage: can cause oscillations in feedback loops that do not occur in reality

– Nominal delay: every gate has an individual, but nominal delay
Advantage: more detailed timing behavior
Disadvantage: tolerances are still not modeled

– Delay model with dependency on the load (C_load) and on the environment conditions (temperature, voltage, process); K_T = K_V = K_P = 1 in the nominal case:

t_pd,actual = (t_0 + K_L * C_load) * K_T * K_V * K_P
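The load- and environment-dependent delay formula can be evaluated directly. The sketch below is a plain transcription of the formula; the function name, the parameter names and all numeric values are hypothetical and only chosen to show the effect of the derating factors.

```python
def actual_delay(t0, k_load, c_load, k_temp=1.0, k_volt=1.0, k_proc=1.0):
    """t_pd,actual = (t0 + K_L * C_load) * K_T * K_V * K_P.

    The three derating factors default to 1, i.e. the nominal case.
    """
    return (t0 + k_load * c_load) * k_temp * k_volt * k_proc

# Hypothetical numbers: 0.1 ns intrinsic delay, 2 ns/pF load factor, 0.05 pF load.
nominal = actual_delay(0.1, 2.0, 0.05)
worst = actual_delay(0.1, 2.0, 0.05, k_temp=1.2, k_volt=1.1, k_proc=1.3)
print(nominal, worst)
```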



• Timing Behavior Models (cont.)

– Min-max delay: models delay tolerances
Advantage: timing behavior under worst-case conditions
Disadvantage: complex, higher runtime; undefined signal states are mostly very pessimistic as they propagate

[Figure: min-max delay example with signals A, B, C, D, gates annotated with (min, max) delays 10,20 / 10,20 / 15,40, and the resulting waveforms from 0 ns to 140 ns]



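• Event Queue Example

[Figure: multiplexer circuit with inputs i1, i2, sel, gates in_gate1, in_gate2, sel_inverter and out_gate, internal signals s1, s2, selbar, output result, gate delays 10 ns, 15 ns, 12 ns and 8 ns, and an event table with the columns Time, Signal, Value]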



• Event Queue Example (cont.)

– Initial signal values: s1 = U, s2 = U, result = U, selbar = U

– Event queue before initialization

Time     Signal   Value
0 ns     sel      0
0 ns     i2       1
0 ns     i1       0
10 ns    i1       1
30 ns    i2       0
70 ns    sel      1
100 ns   i1       0

– Event queue for t = 0 ns

Time     Signal   Value
10 ns    selbar   1
10 ns    i1       1
12 ns    s2       0
15 ns    s1       0
30 ns    i2       0
70 ns    sel      1
100 ns   i1       0

– Event queue for t = 10 ns

Time     Signal   Value
12 ns    s2       0
15 ns    s1       0
22 ns    s2       1
30 ns    i2       0
70 ns    sel      1
100 ns   i1       0
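The event-queue mechanism can be sketched in a few lines of Python. This is an illustrative model only: the gate functions (an AND/AND/OR multiplexer with an inverter), the assignment of the delays to the gates, and the three-valued handling of 'U' are assumptions made here, so the scheduled events need not match the slide tables entry by entry. The input stimuli are the ones listed in the initial event queue.

```python
import heapq
from itertools import count

def v_not(a):
    return {"0": "1", "1": "0"}.get(a, "U")

def v_and(a, b):
    if a == "0" or b == "0":
        return "0"
    return "1" if a == "1" and b == "1" else "U"

def v_or(a, b):
    if a == "1" or b == "1":
        return "1"
    return "0" if a == "0" and b == "0" else "U"

# output signal -> (function, input signals, delay in ns); assumed gate types
GATES = {
    "selbar": (v_not, ("sel",), 10),
    "s1": (v_and, ("i1", "sel"), 15),
    "s2": (v_and, ("i2", "selbar"), 12),
    "result": (v_or, ("s1", "s2"), 8),
}

STIMULI = [(0, "sel", "0"), (0, "i2", "1"), (0, "i1", "0"), (10, "i1", "1"),
           (30, "i2", "0"), (70, "sel", "1"), (100, "i1", "0")]

def simulate(gates, stimuli):
    """Process events in time order; return the list of real signal changes."""
    seq = count()                       # tie-breaker keeps scheduling order
    queue = [(t, next(seq), sig, val) for t, sig, val in stimuli]
    heapq.heapify(queue)
    values, trace = {}, []
    while queue:
        t, _, sig, val = heapq.heappop(queue)
        if values.get(sig, "U") == val:
            continue                    # no transition, hence no event
        values[sig] = val
        trace.append((t, sig, val))
        for out, (func, ins, delay) in gates.items():
            if sig in ins:              # the gate is triggered by this input
                new = func(*(values.get(i, "U") for i in ins))
                heapq.heappush(queue, (t + delay, next(seq), out, new))
    return trace

trace = simulate(GATES, STIMULI)
result_changes = [(t, v) for t, s, v in trace if s == "result"]
print(result_changes)
```

Only gates whose inputs change are re-evaluated, which is the main difference to a compiler-driven simulation of the entire circuit.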



• Simulation on logic level

– Netlist of gates (structural modeling)

– Gate model defined in standard cell library macro

– Strictly using signals of the selected logic system

• Simulation on register-transfer-level

– Netlist of larger components

– Modeling of the component behavior using a hardware description language (VHDL/Verilog)

– Logic signals or even more abstract data types (e.g. state machine states)

[Figure: gate-level netlist with signals a, b, c, s (logic level) and a state diagram with the states init, state_1, state_2, state_3 (register-transfer level)]


Simulation: Models

Advanced Logic Simulators:

• Introduction of signal strength in addition to logic values for driver and bus modelling
– A: active, e.g. low impedance driver
– P: passive, e.g. high impedance driver (depletion load)
– S: storing, e.g. capacitive stored state
– X: active indeterminate (e.g. active or storing)
– Y: passive indeterminate (e.g. passive or storing)
– Z: high impedance

• Instead of simple logical values, signals are used for simulation. A signal consists of a logical value and a strength.

• Logical values = 0, 1, X

• 16 states: A0 A1 AX P0 P1 PX S0 S1 SX X0 X1 XX Y0 Y1 YX ZZ

Overview of signal combinations (upper triangle of the symmetric table):

     A0 A1 AX P0 P1 PX S0 S1 SX X0 X1 XX Y0 Y1 YX ZZ
A0   A0 AX AX A0 A0 A0 A0 A0 A0 A0 AX AX A0 A0 A0 A0
A1      A1 A1 A1 A1 A1 A1 A1 A1 AX A1 AX A1 A1 A1 A1
AX         AX AX AX AX AX AX AX AX AX AX AX AX AX AX
P0            P0 PX PX P0 P0 P0 X0 XX XX P0 PX PX P0
P1               P1 PX P1 P1 P1 XX X1 XX PX P1 PX P1
PX                  PX PX PX PX XX XX XX PX PX PX PX
S0                     S0 SX SX X0 XX XX Y0 YX YX S0
S1                        S1 SX XX X1 XX YX Y1 YX S1
SX                           SX XX XX XX YX YX YX SX
X0                              X0 XX XX X0 X0 XX X0
X1                                 X1 XX X1 XX XX X1
XX                                    XX XX XX XX XX
Y0                                       Y0 YX YX Y0
Y1                                          Y1 YX Y1
YX                                             YX YX
ZZ                                                ZZ


Simulation: Models

Example: Driver Modelling:

Competing Drivers at a Bus


Simulation

www.modelsim.com


Simulation: Techniques

Simulation Techniques:

• Compiler-driven technique:
– Problems:
• Feedbacks
• Sorting of gate netlist
• Zero delay model
• Entire circuit is simulated

• Event-driven simulation ...

Switch-Level Simulation:

• well-suited to simulate digital MOS circuits

• no fixed direction of signal flow

• transistor modeled as a switch with three states: open, closed, unknown

• algebraic or RC models


Executable Specifications: VHDL

architecture structural of first_tap is

  signal x_q, red : std_logic_vector(bitwidth-1 downto 0);
  signal mult     : std_logic_vector(2*bitwidth-1 downto 0);

begin

  delay_register: process(reset, clk)
  begin
    if reset = '1' then
      x_q <= (others => '0');
    elsif (clk'event and clk = '1') then
      x_q <= x_in;
    end if;
  end process;

  mult <= signed(coef) * signed(x_q);

Different types of modeling:

• Data Flow

• Behaviour

• Structure

VHDL is used for:

• Modelling

• Simulation

• Hardware Synthesis

VHDL: Very High Speed Integrated Circuit Hardware Description Language


Design Flow: IC Design with High-Level-Entry

[Figure: design flow from the VHDL description (the first_tap architecture shown on the previous slide) via RTL synthesis (Synopsys) to a gate-level netlist, then placement & routing (Cadence/Mentor) to the layout, and finally production of the ASIC]


Future Outlook: Networks-on-Chip

[Figure: network-on-chip with routers, generic interfaces and high-speed interconnect linking µP, FPGA, MEM and ASIC subsystems]

– Regular platform integrating independent subsystems

• combine structures of today's SoC complexity

– Separation between Communication and Computation


NoC-based design flow: Hardware/Software Co-Design

[Figure: classical flow compared with the NoC-based flow. Blocks shown in the classical flow: Specification, HW/SW-Partitioning, HW-Specification, SW-Specification, Synthesis, Compilation, Co-Simulation, Heterogeneous HW-/SW-System. Blocks shown in the NoC-based flow: Specification, HW Library, SW Library, NoC Mapping, NoC Placement, Communication Synth., Placement/Routing, Real-Time OS, Implementation, Dynamic Allocation/Re-Mapping during Operation]


Application Scenario: Mobile Video Terminal

[Figure: single chip mobile terminal with RF, central CTRL, display CTRL and DISPLAY, connected to mobile service base station(s)]

Different configurations for:

• High Quality (Resolution) Downstreaming

• Low-Power Mode (Quality Reduction)

• Image Compression and Upstreaming

• Multi-Stream Modes


12. Digital Subsystem Design


Weinberger Structuring

Weinberger structuring is a structured approach that simplifies structural layout and improves layout density. The method was presented by Weinberger in 1967.

Weinberger Arrays:

• Are created by placing transistors on the chip in a geometrically regular manner. Horizontal and vertical interconnect patterns are used to wire the devices together.

• Using only one type of gate (e.g. NOR), complex NMOS circuits can be realized.

• Regularity of Weinberger Arrays is very suitable for automatic layout generation.


Weinberger Structuring (2)

Example of NOR gate reduction for Weinberger structuring:

• Empty squares = input connections

• Filled squares = output connections

$F = \overline{A + B + C}$


Example: 3-to-8 decoder

Weinberger structuring:


3-to-8 decoder (2)


3-to-8 decoder (3)


Example 2

Weinberger NOR array representation

Random logic implementation

$F = U + V + W + X + Y$


Example 2 (2)

Weinberger stick diagram


Example 2 (3)

Weinberger array structure: (a) schematic (b) layout


Gate matrix layout

Gate matrix layout is a character based layout style for custom CMOS circuitry. It is a regular design style employing a matrix of intersecting transistor diffusion rows and poly-silicon columns such that intersections are potential transistor sites.

Creating a gate matrix: start from a representational line drawing (stick figure) using the available levels of interconnection, e.g. in poly-silicon gate technology: poly-silicon, metal, diffusion.

– Immediately draw a series of parallel poly lines corresponding to the number of inputs to the circuit (there may be more if an output is chosen to be poly-silicon)

– Subsequent transistor placements are determined by two factors, i.e. the input column and the serial or parallel association among transistors.

– After row definition, further interconnections may be made with horizontal and vertical metal interconnection tracks

– Final improvements


Gate matrix layout (2)

Gate matrix layout:

(a) Schematic

(b) Layout

(c) Optimized layout of N part


Example: half adder

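$C = AB = \overline{\overline{AB}}$

$S = A\bar{B} + \bar{A}B = A(\bar{A} + \bar{B}) + B(\bar{A} + \bar{B})$

$\;\;\, = A \cdot \overline{AB} + B \cdot \overline{AB} = \overline{\overline{A \cdot \overline{AB}} \cdot \overline{B \cdot \overline{AB}}}$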
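The NAND-only form of the half adder can be checked exhaustively. The sketch below assumes the usual realization in which the intermediate term NAND(A, B) is shared between the sum and the carry; the function names are hypothetical.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def half_adder_nand(a, b):
    """Half adder built from NAND gates only."""
    n1 = nand(a, b)                       # NOT(A AND B), shared term
    s = nand(nand(a, n1), nand(b, n1))    # sum   = A XOR B
    c = nand(n1, n1)                      # carry = A AND B (double inversion)
    return s, c

for a, b in product((0, 1), repeat=2):
    s, c = half_adder_nand(a, b)
    print(a, b, "->", "S =", s, "C =", c)
```

The sum needs four NAND gates and the carry one more, which is the structure a standard cell realization would map to.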


Half adder realizations

(a) Standard cell

(b) Gate matrix


Character definitions for symbolic layout

N n-channel transistor

P p-channel transistor

+ metal-poly or metal-diffusion crossover

* contact

| poly-silicon or n-diffusion wire

! p-diffusion wire

: vertical metal

- horizontal metal


Character definitions (cont.)


Rules

The following rules summarize the gate-matrix technique:

– Poly-silicon runs only in one direction and is of constant width and pitch

– Diffusion wires (of constant width) may run vertically between poly-silicon columns.

– Metal may run horizontally and vertically. Any pitch departures from a minimum (e.g. power rails) are manually specified.

– Transistors can only exist on poly-silicon columns. Wide transistors may be specified by abutting two or more N or P symbols.


Summary of gate matrix properties

• regular design style

• technology updateable

• modularity is encouraged by the block nature of the layout style

• circuit extraction may be done at the symbolic level, or at the mask level by conventional circuit extractors

• the character-based symbolic description is not hierarchical: modules must be assembled in their entirety and "pasted" together at the mask level

• no freedom to locally optimize geometry, e.g. transistor size


Optimal CMOS complex gate layout

In MOS circuit design, complex functional cells can be used to achieve better performance. In this section, the implementation of a random logic function on an array of CMOS transistors is discussed. The method was presented by Uehara and van Cleemput in 1981. It uses a graph theoretical approach for systematic and efficient layout generation that minimizes the required chip area.


EXOR: NAND implementation

(a) Logic diagram

(b) Circuit

(c) Layout


CMOS Functional cells (Complex gates)

Advantages of the complex-gate approach:

– better performance

– smaller size


Complex gates (2)

In the following, the consideration is limited to AND/OR networks realized in complex gate CMOS by means of series/parallel connections of transistors. The topologies of the NMOS network and the PMOS network are assumed to be dual.

The delay of a complex CMOS cell mainly depends on the maximum number of series transistors between VDD or VSS and the cell output, which is called the level of the complex cell. This quantity has a direct influence on the charging or discharging resistance of the cell. Generally, cells with fewer than four levels are desirable. The number of cells with parallel/serial topology is given by the following table:

It is reasonable to use mainly cells with up to three levels, and only occasionally cells with four levels, in order to obtain sufficient performance.


Alternative EXOR implementation


Basic layout strategy


Layout strategy (2)

Layout properties:

– two rows of transistors, for the PMOS and NMOS parts of the circuit

– equal number of transistors in both rows

Optimizations: If the metal connections between adjacent transistors are replaced by diffusion (the designer should be careful in doing this for high-speed circuits), the following layout (a) is achieved.


Optimized layout

An even more sophisticated layout arrangement which reduces the required area is shown in (b).

area = width * height

with

height = const.

width = basic grid size * (#inputs + #separations + 1)

A separation is required when there is no connection between physically adjacent transistors.

An optimal layout is obtained by reducing the number of separations.


Optimal layout

The best layout is achieved by the following transistor arrangement, logically equivalent to the previous figures:


Graph theoretical algorithm

The p-side and the n-side of the circuit can be formulated as graphs, defined as:

$G_P = (V_P, E_P)$: p-side network

$G_N = (V_N, E_N)$: n-side network

Graph properties:

– the graphs are series/parallel graphs (CMOS complex gate property/assumption)

– every source/drain potential is represented by a vertex V

– every transistor is represented by an edge E, connecting the vertices representing source and drain

– edges are labeled by the corresponding transistor gate input signal

– GP and GN are dual


Graph theoretical algorithm (2)

If two edges Ei and Ej are adjacent in the graph model, then it is possible to place the corresponding gates in a physically adjacent position of an array and hence, connect them by a diffusion area. In order to minimize the number of separations a set of minimum size paths has to be found, which corresponds to chains of transistors in the array.

Definition 1: An Euler path is a single (uninterrupted) path on a graph, that covers every edge of the graph exactly once.

If there exist Euler paths for GN and GP, then all transistors can be chained by diffusion areas. Otherwise the graphs have to be partitioned into sub-graphs which have Euler paths.

It's necessary to find a pair of paths for GP and GN with the same sequence of labels, because p- and n-type transistors corresponding to the same input have to be positioned at the same horizontal position (poly line).
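Whether a single transistor network can be chained without a separation can be tested with a standard Euler path search. The sketch below uses Hierholzer's algorithm on an undirected multigraph; it handles only one graph at a time and does not search for a common label sequence of GP and GN, which the method additionally requires. The example network (inputs A and B in series, in parallel with C) and all node names are hypothetical.

```python
from collections import defaultdict

def euler_path(edges):
    """edges: list of (node_u, node_v, gate_label).

    Returns the gate labels in the order of an Euler path, or None if no
    single Euler path covers all edges.
    """
    if not edges:
        return []
    adj = defaultdict(list)
    for idx, (u, v, _) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    odd = [n for n in adj if len(adj[n]) % 2 == 1]
    if len(odd) not in (0, 2):
        return None                      # necessary degree condition violated
    start = odd[0] if odd else edges[0][0]
    used = [False] * len(edges)
    stack, path = [(start, None)], []
    while stack:
        node, via = stack[-1]
        while adj[node] and used[adj[node][-1][1]]:
            adj[node].pop()              # drop edges already traversed
        if adj[node]:
            nxt, idx = adj[node].pop()
            used[idx] = True
            stack.append((nxt, idx))
        else:
            stack.pop()
            if via is not None:
                path.append(edges[via][2])
    if len(path) != len(edges):
        return None                      # graph is not connected
    return path[::-1]

# n-side network of F = NOT(A*B + C): A and B in series, C in parallel.
nmos = [("out", "x", "A"), ("x", "gnd", "B"), ("out", "gnd", "C")]
print(euler_path(nmos))
```

Each vertex stands for a source/drain potential and each edge for a transistor, exactly as in the graph properties listed above.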


Graph theoretical algorithm (3)

General algorithm:

– enumerate all possible decompositions of the graph model to find the minimum number of Euler paths that cover the graph

– chain the gates by means of a diffusion area according to the order of the edges in each Euler path

– if more than one Euler path is necessary to cover the graph model, provide a separation area between each pair of chains

Result: The search for the minimal number of Euler paths is NP-complete.

Problem reduction: An odd number of series or parallel edges can be reduced to a single edge:


Problem reduction

Definition 2: The reduced graph is obtained by iteratively replacing an odd number of series (parallel) edges by a single edge, until no further reduction is possible.

Theorem 1: If there is an Euler path in the reduced Graph then there exists an Euler path in the original graph.

Proof: It is possible to reconstruct an Euler path in the original graph by replacing each edge of the Euler path in the reduced graph by a sequence of the original odd number of edges.

Theorem 2: If the number of inputs to every AND/OR element is odd, then:

– the corresponding graph model has a single Euler path

– there exists a graph model such that the sequence of edges on an Euler path corresponds to the vertical order of inputs on a planar representation of the logic diagram.


Problem reduction (2)

If there are gates in the logic diagram with an even number of inputs, additional “pseudo” inputs have to be introduced in order to guarantee an odd number of inputs. Theorem 2 then guarantees that an Euler path exists for this modified problem. However, the pseudo edges in the Euler path have to be removed afterwards, and they can then cause diffusion separations. An algorithm for minimizing the separations caused by pseudo edges is given in the next section (minimal interlace of normal and pseudo inputs).


Problem reduction (3)

The heuristic algorithm for generating an Euler path is given by:

1. To every gate with an even number of inputs a “pseudo” input is added

2. Add this new input to the gate such that the planar representation of the logic diagram shows a minimal interlace of “pseudo” and real inputs. It should be noted that a “pseudo” input at the top or at the bottom of the logic diagram does not contribute to the separation areas.

3. Construct the graph model such that the sequence of edges corresponds to the vertical order of inputs on the planar logic diagram.

4. Chain together the gates by means of diffusion areas, as indicated by the sequence of edges on the Euler path. “Pseudo” edges indicate separation areas.

5. The final circuit topology can be derived by deleting “pseudo” edges in parallel with other edges and by contracting “pseudo” edges in series with other edges.


Application of reduction rule

(a) Logic diagram

(b) Graph model and its reduction

(c) Reconstruction of an Euler path


Application of heuristic algorithm

This heuristic algorithm does not necessarily give the optimal layout, but if the resulting sequence has no separation areas, it is the real optimal solution.

(a) New inputs p1 and p2 are added

(b) Optimal sequence of inputs without the interlace of p1 and p2

(c) Circuit with the dual path p1,2,3,1,4,5,p2


Algorithm for calculating minimal interlace

[Flowchart: algorithm for calculating the minimal interlace, from "start" to "stop". Decisions and actions shown: "Any white triangle left?" (Yes: put it in the line); "Any black/white triangle left?" (Yes: put it in the line, and set the white part on top); "Any black triangle left?" (Yes: put it in the line); "Any black/white triangle left?" (Yes: put it in the line, and set the black part on top); "Any white triangle left?". An example of a resulting line is shown next to the flowchart.]


Application example for minimal interlace algorithm


Example: carry look-ahead

This implementation has no Euler path!


Alternative carry look-ahead topology

This topology does have an Euler path!


Comparison of space

(a) Functional cell realization

(b) Conventional NAND realization


Standard cell layout


Example: synchronous counter


Programmable Logic Arrays (1)

• Map a set of Boolean functions in canonical, two-level sum-of-product form into a geometrical structure

• Consist of an AND-plane and an OR-plane

• For every input variable in the Boolean equations, there is an input signal to the AND-plane

• The AND plane produces a set of product terms by performing an AND operation

• The OR plane generates output signals by performing an OR operation on the product terms fed by the AND plane





• PLA (Programmable Logic Array):
– AND and OR array are programmable
– every product term of the AND array can be connected to any of the OR output gates

• PAL (Programmable Array Logic):
– AND array is programmable
– OR array has fixed connection points (OR gates)

• PROM (Programmable Read Only Memory):
– AND array hardwired
– OR array programmable
– Set of all possible product terms is realized


Architectures (1)


Architectures (2)


Example (1)

• PROM implementation realizes all of the 8 product terms

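x0 x1 x2 | z0 z1
 0  0  0 |  1  1
 0  0  1 |  1  1
 0  1  0 |  0  0
 0  1  1 |  0  0
 1  0  0 |  0  0
 1  0  1 |  0  0
 1  1  0 |  1  0
 1  1  1 |  0  1

$z_0 = \bar{x}_0\bar{x}_1\bar{x}_2 + \bar{x}_0\bar{x}_1 x_2 + x_0 x_1 \bar{x}_2 = \bar{x}_0\bar{x}_1 + x_0 x_1 \bar{x}_2$

$z_1 = \bar{x}_0\bar{x}_1\bar{x}_2 + \bar{x}_0\bar{x}_1 x_2 + x_0 x_1 x_2 = \bar{x}_0\bar{x}_1 + x_0 x_1 x_2$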


Example (2)

• PLA implementation needs only 3 product terms

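x2 x1 x0 | z1 z0
 1  1  1 |  1  0
 0  1  1 |  0  1
 X  0  0 |  1  1

$z_0 = \bar{x}_0\bar{x}_1\bar{x}_2 + \bar{x}_0\bar{x}_1 x_2 + x_0 x_1 \bar{x}_2 = \bar{x}_0\bar{x}_1 + x_0 x_1 \bar{x}_2$

$z_1 = \bar{x}_0\bar{x}_1\bar{x}_2 + \bar{x}_0\bar{x}_1 x_2 + x_0 x_1 x_2 = \bar{x}_0\bar{x}_1 + x_0 x_1 x_2$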
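The claim that three product terms are sufficient can be checked by evaluating the AND plane and the OR plane against the full truth table of the example. The sketch below models each plane as a list of strings; this data layout and the names `AND_PLANE`, `OR_PLANE` and `pla` are assumptions made for illustration.

```python
# One entry per product term. AND plane columns: x2 x1 x0 ('X' = not connected),
# OR plane columns: z1 z0.
AND_PLANE = ["X00", "011", "111"]
OR_PLANE = ["11", "01", "10"]

def pla(x0, x1, x2):
    """Evaluate the two-level AND/OR structure, returning (z0, z1)."""
    inputs = (x2, x1, x0)
    z1 = z0 = 0
    for pattern, outs in zip(AND_PLANE, OR_PLANE):
        # A product term is active if every connected literal matches.
        active = all(p == "X" or int(p) == v for p, v in zip(pattern, inputs))
        if active:
            z1 |= int(outs[0])
            z0 |= int(outs[1])
    return z0, z1

# Truth table of the example: (x0, x1, x2) -> (z0, z1)
TRUTH_TABLE = {
    (0, 0, 0): (1, 1), (0, 0, 1): (1, 1), (0, 1, 0): (0, 0), (0, 1, 1): (0, 0),
    (1, 0, 0): (0, 0), (1, 0, 1): (0, 0), (1, 1, 0): (1, 0), (1, 1, 1): (0, 1),
}

print(all(pla(*x) == z for x, z in TRUTH_TABLE.items()))
```

A PROM would store all eight rows of the truth table, whereas the PLA shares the product term X00 between both outputs.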


Floor Plan for PLA

A AND plane programming cell

O OR plane programming cell

AO AND-OR communication cell

IN AND plane input cell

OUT OR plane output cell

LA left AND plane cell

RO right OR plane cell

BL bottom left cell

BM bottom middle cell

BR bottom right cell

TL top left cell

TA top AND cell

TM top middle cell

TO top OR cell

TR top right cell

PLA generic floor plan


Static nMOS and Pseudo-nMOS PLA

• nMOS PLA: Pull-up network realized by single nMOS depletion transistor

• Pseudo nMOS PLA: Pull-up by high resistance pMOS transistor with permanently grounded gate input

• But: AND-OR structure not suited to MOS circuit technology

• Therefore: AND and OR planes are implemented through NOR or NAND gate structures

• The transformation is based on deMorgan’s law
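As a worked illustration of this transformation, take a sum-of-products function such as $z_0 = \bar{x}_0\bar{x}_1 + x_0 x_1 \bar{x}_2$ (chosen here only as an example). Each product term becomes a NOR of the complemented literals, and the sum becomes an inverted NOR of the product terms:

```latex
\begin{aligned}
\bar{x}_0\bar{x}_1 &= \overline{x_0 + x_1} \\
x_0 x_1 \bar{x}_2 &= \overline{\bar{x}_0 + \bar{x}_1 + x_2} \\
z_0 &= \overline{\;\overline{\,\overline{x_0 + x_1} + \overline{\bar{x}_0 + \bar{x}_1 + x_2}\,}\;}
\end{aligned}
```

The first NOR level corresponds to the AND plane (fed by true and inverted inputs), the second NOR level followed by an inverter corresponds to the OR plane.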


INV-NOR-NOR-INV Structure (1)

Transformation according to deMorgan’s law:



Example:

General structure:



Properties:

• high static power dissipation

• small area

• useful if high speed is not required



Pseudo nMOS NOR-NOR PLA circuit



PLA implementation in pseudo nMOS logic



Stick diagram of a nMOS PLA


NAND-NAND Structure (1)

Transformation according to deMorgan’s law:

Example:


NAND-NAND Structure (2)

Properties:

• NAND-NAND approach not recommended:

• decreasing performance at increasing number of inputs (because of series connection of nMOS transistors)

• high static power dissipation


Static CMOS PLA (1)

• NOR gates with a large number of inputs should be avoided in CMOS (because the p-channel devices are in series)

• Static CMOS PLAs are usually realized in NAND-INV-INV-NAND structure in order to avoid long chains of pMOS transistors

Properties:

• no static power dissipation

• area increase becomes unacceptable for large PLAs

• working fast


Static CMOS PLA (2)

PLA NAND-INV-INV-NAND implementation


Static CMOS PLA Layout


Dynamic CMOS PLA (1)

• smaller area than static CMOS

• fast

• 2-phase clocking

• Φ1 = 1:
– no path to ground
– inputs change
– both NOR planes are precharged

• Φ1 = 0:
– the first NOR plane discharges
– dummy row: worst case discharge (prevents the second NOR plane from discharging too early)
– after the first NOR plane has evaluated, the second plane evaluates



• Φ2 is used to latch the second stage

• An intermediate clock is required to precharge the OR plane

– generated by the cells TL, TA and TM

– uses a dummy product row that discharges at the worst case rate according to the loading of the AND array



Dynamic 2-phase PLA circuit


Noise in PLA circuits (1)

• Noise Problems on switched supply lines in dynamic PLAs

• The discharge current generates transients in the power supply bus

• To reduce noise: locally grounding the PLA; use of metal lines for power supply whenever possible (reduced impedance)


Noise in PLA circuits (2)


Optimization of PLAs – Logic Minimization

• Optimization (minimization) of the Boolean equations in order to reduce the number of product terms or literals

• A decoder in front of the AND plane can generate combined input variables

• If a term is needed in both positive and negative form, a reduction can sometimes be achieved by using negative logic

Example:

z = x1 + x0x1’x2’ + x0’x1’x2 (3 product terms)

z’ = (x1 + x0x1’x2’ + x0’x1’x2)’
= x1’(x0x1’x2’)’(x0’x1’x2)’
= x1’(x0’ + x1 + x2)(x0 + x1 + x2’)
= (x0’x1’ + x1’x2)(x0 + x1 + x2’)
= x0x1’x2 + x0’x1’x2’ (2 product terms)
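The result of this negative-logic derivation can be confirmed by brute force over all eight input combinations. The function names below are hypothetical; the two expressions are the original function z and the two-term form obtained for its complement.

```python
from itertools import product

def z(x0, x1, x2):
    # z = x1 + x0 x1' x2' + x0' x1' x2
    return x1 | (x0 & (1 - x1) & (1 - x2)) | ((1 - x0) & (1 - x1) & x2)

def z_complement(x0, x1, x2):
    # z' = x0 x1' x2 + x0' x1' x2'
    return (x0 & (1 - x1) & x2) | ((1 - x0) & (1 - x1) & (1 - x2))

equivalent = all(z_complement(*x) == 1 - z(*x) for x in product((0, 1), repeat=3))
print(equivalent)
```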


Optimization of PLAs – Folding

PLA before folding

Row-folded PLA

Column-folded PLA


Optimization of PLAs – Multi Sided Access

• An advantage of multi-sided access and folding is the decreased layout area, but the layout structure has changed and the wiring is more difficult.

Multi sided input/output access


Timing & Power Dissipation of a Static PLA

• Delay is determined by
– (W/L) of the AND/OR load
– (W/L) of the AND/OR cells

• Minimum delay:
– large load current I_load
– (W/L)_ORplane = e * (W/L)_ANDplane

• Limitations:
– I_load is limited by:
• the total power of the PLA
• the internal logical ‘0’ level: (I * R_nMOS = ‘0’) < V_T !
– the stage sizing factor e for successive stages cannot always be realized due to the floorplan


Automatic PLA Layout Generation (1)

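[Flow diagram: Input: boolean equations → logical optimization → truth table = matrix → floorplanner → Output: layout with mask data. The floorplanner additionally uses the structure of the PLA and a set of cells.]

Cells:

• input/output buffer

• clock driver

• VDD/VSS cells

• Schmitt trigger

• …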


Automatic PLA Layout Generation (2)

Example: PLA generator input file

PLA adderpla;
INPUT: I1,I2,I3;
OUTPUT: O1,O2;
PRODUCT: P1,P2,P3,P4,P5,P6,P7;

AND_BEGIN
P1 := I1 * I2;
P2 := I1 * I3;
P3 := I2 * I3;
P4 := I1 * I2' * I3';
P5 := I1' * I2 * I3';
P6 := I1' * I2' * I3;
P7 := I1 * I2 * I3;
AND_END

OR_BEGIN
O1 := P1 + P2 + P3;
O2 := P4 + P5 + P6 + P7;
OR_END

Truth table matrix (optimized intermediate result):

I1 I2 I3 | O1 O2
 1  1  X |  1  0
 1  X  1 |  1  0
 X  1  1 |  1  0
 1  0  0 |  0  1
 0  1  0 |  0  1
 0  0  1 |  0  1
 1  1  1 |  0  1
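The step from the product-term equations to the rows of the truth table matrix is mechanical. The sketch below converts one product-term expression in the notation of the input file (`*` for AND, a trailing `'` for a complemented literal) into one AND-plane row; the function `product_row` is a hypothetical helper and does not describe the actual PLA generator.

```python
def product_row(expr, inputs):
    """Turn e.g. "I1 * I2' * I3'" into the AND-plane row "100".

    Inputs that do not appear in the product term are marked 'X'.
    """
    row = {name: "X" for name in inputs}
    for literal in (lit.strip() for lit in expr.split("*")):
        if literal.endswith("'"):
            row[literal[:-1]] = "0"     # complemented literal
        else:
            row[literal] = "1"          # true literal
    return "".join(row[name] for name in inputs)

INPUTS = ["I1", "I2", "I3"]
PRODUCTS = ["I1 * I2", "I1 * I3", "I2 * I3", "I1 * I2' * I3'",
            "I1' * I2 * I3'", "I1' * I2' * I3", "I1 * I2 * I3"]
rows = [product_row(p, INPUTS) for p in PRODUCTS]
print(rows)
```

The resulting rows correspond to the left-hand columns of the matrix above; O1 is the carry and O2 the sum of a full adder.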


13. Finite State Machines


Finite State Machines - Basics

• Finite State Machines (FSMs) can be divided into 2 classes:

– Moore Machines
• The outputs depend only on the current state
• The next state depends on current state and inputs

– Mealy Machines
• The outputs depend on current state and inputs
• The next state depends on current state and inputs


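Moore Machines

Characteristics of a Moore Machine:

• Outputs depend only on the current state

• Next state depends on current state and inputs

[Figure: Moore machine block diagram: logic computing the next state from inputs and state, state register clocked by Φ, and logic computing the outputs from the state]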


Moore Machines

Alternative implementation of a Moore Machine with registered outputs:

• Outputs still depend only on the current state!
– (but are calculated from the next state signal now)
– At the rising clock edge, the next state and its corresponding outputs are loaded into the registers

[Figure: Moore machine with registered outputs: logic computing the next state and the outputs, state register and output register both clocked by Φ]


Mealy Machines

Characteristics of a Mealy Machine:

• Outputs and next state both depend on current state and inputs

[Figure: Mealy machine block diagram: logic computing next state and outputs from inputs and state, state register clocked by Φ]


Mealy Machines

Implementation of a Mealy Machine with registered outputs:

• Note that the required logic would be different from that of a Mealy Machine with unregistered outputs (like the one shown on the previous slide)

[Figure: Mealy machine with registered outputs: logic computing next state and outputs, state register and output register both clocked by Φ]


Table Notation

• FSMs can be represented as a State Transition Table

– The table exactly defines the values for the next state and all outputs (right side of the table) depending on the current state and the inputs (left side)

– Logic functions can be easily derived from the table, e.g. $S_0' = \bar{S}_2\bar{S}_1\bar{S}_0\,\bar{a}\,b + \bar{S}_2\bar{S}_1\bar{S}_0\,a + \dots$

– Current state and next state are binary encoded (in the example: 3 bits)

– “Don't cares” in the input conditions are indicated by an ‘x’

– In each state, every possible combination of input values should be covered by exactly one line in the table (not more, not less)

current state | inputs | next state  | outputs
S2 S1 S0      | a  b   | S2' S1' S0' | x  y
 0  0  0      | 0  0   |  0   0   0  | 0  0
 0  0  0      | 0  1   |  0   0   1  | 0  0
 0  0  0      | 1  x   |  1   0   1  | 0  0
 0  0  1      | 1  0   |  0   1   0  | 0  1
 ...          | ...    |  ...        | ...


Graph Notation

• FSMs can also be represented as a graph

– Every state is a node in the graph

– Every state transition is an edge (arrow)
• The arrows indicate which state is taken in the next cycle, depending on the inputs and the current state

– State encoding is displayed inside the nodes

[Figure: graph notation legend: initial state (001) and some other state (010), each node containing the binary state encoding; the state transition arrow is annotated with the input condition (boolean expression)]


Example for a Moore Machine

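[Figure: state diagram with three states. Notation inside each node: S1S0 / x (current state / assigned output value), here 00 / 0, 01 / 0 and 11 / 1. The transitions are annotated with a = 0, a = 1 or "always".]

current state | input | next state | output
S1 S0         | a     | S1' S0'    | x
 0  0         | 0     |  0   0     | 0
 0  0         | 1     |  0   1     | 0
 0  1         | 0     |  0   0     | 0
 0  1         | 1     |  1   1     | 0
 1  1         | x     |  0   0     | 1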


Example for a Mealy Machine

• Because the outputs of a Mealy Machine also depend on the inputs, the values assigned to them are annotated at the transitions

• The notation is:

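input condition / output assignment

[Figure: state diagram with the states 00, 01 and 11. The transitions are annotated with a = 0 / x = 0, a = 1 / x = 1 and always / x = 1.]

current state | input | next state | output
S1 S0         | a     | S1' S0'    | x
 0  0         | 0     |  0   0     | 0
 0  0         | 1     |  0   1     | 1
 0  1         | 0     |  0   0     | 0
 0  1         | 1     |  1   1     | 1
 1  1         | x     |  0   0     | 1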
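The difference between the two example machines becomes visible when both transition tables are driven with the same input sequence. The sketch below encodes the two tables as Python dictionaries; the initial state 00, the input sequence and the helper name `run` are assumptions made for this illustration.

```python
# (current state, a) -> (next state, x); the row "11, a = x" is expanded for both values of a.
MOORE = {
    ("00", 0): ("00", 0), ("00", 1): ("01", 0),
    ("01", 0): ("00", 0), ("01", 1): ("11", 0),
    ("11", 0): ("00", 1), ("11", 1): ("00", 1),
}
MEALY = {
    ("00", 0): ("00", 0), ("00", 1): ("01", 1),
    ("01", 0): ("00", 0), ("01", 1): ("11", 1),
    ("11", 0): ("00", 1), ("11", 1): ("00", 1),
}

def run(table, inputs, state="00"):
    """Return the output value x observed in every clock cycle."""
    outputs = []
    for a in inputs:
        state, x = table[(state, a)]   # look up the row, then advance the state
        outputs.append(x)
    return outputs

sequence = [1, 1, 0, 1, 0]
print("Moore:", run(MOORE, sequence))
print("Mealy:", run(MEALY, sequence))
```

In the Moore table the output is the same for every row of a given state, while the Mealy output reacts to the input within the same cycle.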


State Encoding

• The encoding of the states plays a key role for the implementation of a FSM

– It influences the complexity of the logic functions, the hardware costs of the circuits, timing issues, power, etc.

• Therefore, several common coding styles with different features exist

– regular encoding

– „one hot“ encoding

– ...

• The optimum choice depends on the used technology (ASIC, PLA, FPGA, etc.) as well as on the given design goals


State Encoding

• Regular Encoding

– The minimum number of bits is used to encode the states
• At least N bits are required to encode up to 2^N states

– Codes can be assigned to states arbitrarily or according to certain rules (e.g., in order to minimize complexity of the logic)

– Advantages:
• Minimum number of flipflops required

– Disadvantages:
• Due to the compactness of the state encoding, the logic functions for calculating the next state and the outputs can become more complex
• On average, many bits switch when the state changes, resulting in higher power consumption and possible glitches


State Encoding

• One Hot Encoding

– N bits are used to encode N states
• In each state, exactly one bit is ‘1’, all others are ‘0’
• hence the name “one hot” encoding

– Advantages:
• In many cases, less logic is required
– many small logic functions are used instead of few complex functions
– particularly advantageous for FPGA implementations
• Low switching activity, resulting in lower power consumption and fewer glitches

– Disadvantages:
• The number of required flipflops grows linearly with the number of states, resulting in high hardware costs for large FSMs
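The flipflop cost of the two encoding styles can be compared with a one-line calculation each. The helper below is a hypothetical illustration: it takes the smallest N with 2^N >= number of states for regular encoding and one flipflop per state for one hot encoding.

```python
def flipflops(num_states):
    """Return (regular, one_hot) flipflop counts for a given number of states."""
    regular = max(1, (num_states - 1).bit_length())   # smallest N with 2**N >= num_states
    one_hot = num_states                              # one flipflop per state
    return regular, one_hot

for n in (4, 5, 16, 100):
    print(n, "states:", flipflops(n))
```

The gap grows quickly: the regular count increases logarithmically, the one hot count linearly with the number of states.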


State Encoding

• One Hot Encoding – Implementation Aspects

– Best suited for distributed implementation
• One flipflop for each state
• One small transition logic for each flipflop

– Each flipflop can be used to directly activate some other hardware block or logic function that is only needed in this state

– From an abstract point of view, all N flipflops together can also be seen as one single state register of size N

[Figure: four Logic + FF pairs forming the current state; one flipflop output drives the enable input of some specific functional block]


Examples for State Encoding

[Figure, animated over several slides: a four-state FSM shown once with one hot encoding (states 0001, 0010, 0100, 1000) and once with regular encoding (states 00, 01, 10, 11); the successive slides show the flipflop contents while the FSM steps through its states]

14. ASIC Design Concepts: Gate Arrays & Standard Cells


Cost Issues

• Design Costs

• Non-recurring Engineering Costs (NRE)

• Manufacturing Costs

[Figure: total costs and costs per chip plotted against the number of manufactured chips; design costs + NRE costs = fixed costs]


Cost Issues: Design Costs

Design Costs reduced by

• raising level of abstraction

• re-use

• powerful synthesis methods

Cost-affecting Decisions:

• System Level:

– System architecture

– Communication architecture

• Block-Level:

– appropriate modeling of control-dominated and data path oriented components

Synthesis:

• High-level Synthesis (allocation, scheduling, binding)

• Logic Synthesis (RTL to logic translation, FSM synthesis, logic optimisation, retiming)

• Layout Synthesis (module generators, PLA generators, Place & Route)


Cost Issues: Manufacturing Costs

...depending on Design Style:

ASIC (synthesized)

– Full Custom

– Semi Custom

• Cell-based: Standard Cells, Macro Cells

• Array-based: Gate Arrays, FPGAs/PLDs


Gate Arrays – Introduction (1)

Gate Arrays (Masterslices):

• Prefabricated active elements (master)

• Construction of logic functions by personalization (wiring macros from a cell library, intra-cell routing)

• Connection of functional blocks by inter-cell routing in 1...3 layers plus contact/via layers

• Arrangement of gate arrays:

– row structure

– island structure

– matrix of structures (= sea of gates)

• Mixed analog/digital gate arrays



Gate array floor plan with row structure



Floor plan for a sea of gates array


Gate Array Design Flow


Qualification of Gate Array Design Style

• Advantages:

– Lower number of individual masks needed

– Higher number of pieces for uncustomized master (cost reduction)

– Many others for masters, second source fabrication, libraries and design systems

• Disadvantages:

– Area overhead (by unused transistor cells)

– Overdimensioned routing channels

– Larger cell size

Advantages dominate for smaller production volumes


Costs: Full Custom vs. Gate Array

• Gate Arrays: Reduction of fixed costs (reduced mask costs)

• Increased per piece costs, since utilisation of transistors is not optimal, therefore larger chip area and less yield, implying larger cost

[Figure: total costs and costs per chip plotted against the number of manufactured chips for Full Custom and Gate Array; design costs + NRE costs = fixed costs]
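
The trade-off between lower fixed costs and higher per piece costs can be sketched with a simple linear cost model. This is an assumption made for illustration (the slides only show the curves), and all numbers below are hypothetical.

```python
def total_cost(fixed, per_chip, volume):
    """Linear model: fixed costs (design + NRE) plus manufacturing cost per chip."""
    return fixed + per_chip * volume

def cost_per_chip(fixed, per_chip, volume):
    """Fixed costs are spread over all manufactured chips."""
    return fixed / volume + per_chip

def break_even_volume(fixed_a, per_chip_a, fixed_b, per_chip_b):
    """Volume at which styles A and B have the same total cost."""
    return (fixed_a - fixed_b) / (per_chip_b - per_chip_a)

# Hypothetical numbers, for illustration only.
FULL_CUSTOM = {"fixed": 1_000_000, "per_chip": 2.0}
GATE_ARRAY = {"fixed": 200_000, "per_chip": 6.0}

n_star = break_even_volume(FULL_CUSTOM["fixed"], FULL_CUSTOM["per_chip"],
                           GATE_ARRAY["fixed"], GATE_ARRAY["per_chip"])
for volume in (10_000, int(n_star), 1_000_000):
    fc = cost_per_chip(volume=volume, **FULL_CUSTOM)
    ga = cost_per_chip(volume=volume, **GATE_ARRAY)
    print(f"{volume:>9} chips: full custom {fc:8.2f}, gate array {ga:8.2f} per chip")
```

Below the break-even volume the gate array is cheaper, above it the full custom design wins, which matches the statement that the gate array advantages dominate for smaller production volumes.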


Standard Cells

• Standard cell libraries are required by almost all CAD tools for chip design

• Standard cell libraries contain primitive cells required for digital design

• However, more complex cells that have been specially optimized can also be included

• The main purpose of the CAD tools is to implement the so called RTL-to-GDS flow

• The input to the design process, in most cases, is the circuit description at the register-transfer level (RTL)

• The final output from the design process is the full chip layout, mostly in the GDSII (gds2) format

• To produce a functionally correct design that meets all the specifications and constraints, requires a combination of different tools in the design flows

• These tools require specific information in different formats


Standard Cell Library Formats

• The formats explained here are for Cadence tools; however, similar information is required for other tool suites.

• Physical Layout (GDSII, Virtuoso Layout Editor)

– Should follow specific design standards, e.g. constant height, offsets etc.

• Logical View (Verilog description or TLF or LIB)

– Verilog is required for dynamic simulation. Place and route tools usually can use TLF.

– Verilog description should preferably support back annotation of timing information.

• Abstract View (Cadence Abstract Generator, LEF)

– LEF: Contains information about each cell as well as technology information

• Timing, power and parasitics (TLF or LIB)

– Transistor and interconnect parasitics are extracted using Cadence or other extraction tools.

– Spice or Spectre netlist is generated and detailed timing simulations are performed.

– Power information can also be generated during these simulations.

– Data is formatted into a TLF or LIB file including process, temperature and supply voltage variations.

– Logical information for each cell is also contained in this file.


Standard Cell Design Flow


Standard Cell Layout

• Routing Grids

• Both vertical and horizontal routing grids need to be defined

• HVH or VHV routing is defined for alternating metal layers

• All standard cell pins should ideally be placed on intersection of horizontal and vertical routing grids

• Exceptions are abutment type pins (VDD and GND)

• Grids are defined with respect to the cell origin

• Grids can be offset from the origin, but only by exactly half the grid spacing

• The cell height must be a multiple of the horizontal grid spacing

• All cells must have the same height, but some complex cells can be designed with double height

• The cell width must be a multiple of the vertical grid spacing

• However, limited routing tracks are the bottleneck even with wider cells




Standard Cells




Standard Cell Example: Layout of Inverter


Standard Cell Example: Layout of NAND2


Standard Cell Library

• Cell libraries determine the overall performance of the synthesized logic

• Synthesis engines rely on a number of factors for optimization

• The cell library should be tailored to the synthesis approach

• Here are some guidelines:– A variety of drive strengths for all cells

– Larger varieties of drive strengths for inverters and buffers

– Cells with balanced rise and fall delays (for clock tree buffers/gated clocks)

– Same logical function and its inversion as separate outputs, within same cell

– Complex cells

– High fanin cells


Standard Cell Library

– Variety of flip-flops, both positive and negative edge triggered, preferably with multiple drive strengths

– Single or Multiple outputs available for each flip-flop (e.g. Q only, or Qbar only or both), preferably with multiple drive strengths

– Flops to contain different inputs for Set and Reset (e.g. Set only, Reset only, both)

– Variety of latches, both positive and negative level sensitive

– Several delay cells. Useful for fixing hold time violations

– To enable scan testing of the designs, each flip-flop should have an equivalent scan flop

• Using high fan-in cells reduces the overall cell area, but may cause routing congestion, which in turn degrades timing. They should therefore be used with caution


15. Programmable Logic Devices


Overview

• Introduction

• Programming Technologies

• Basic Programmable Logic Device (PLD) Concepts

• Complex PLD

• Field Programmable Gate Array (FPGA)

• CAD (Computer Aided Design) for FPGAs

• Design flow for Xilinx FPGAs

• Economical Considerations

• Logic design Alternatives


• A Programmable Logic Device is an integrated circuit with internal logic gates and interconnects. These gates can be connected to obtain the required logic configuration.

• The term “programmable” means that the configuration of the internal logic and interconnects can be changed, either in hardware or by software.

• The configuration of the internal logic is done by the user.

• PROM, EPROM, PAL, GAL etc. are examples of Programmable Logic Devices.

Introduction


Programmable Logic Device can be programmed in two ways:

1. Mask programming (in some few cases)

2. Field programming (typical)

1.) Mask programming: programming of device is done in the mask level.

+ good timing performance due to internal connections hardwired during manufacture

+ cheap at high volume production

- programmed by manufacturer

- development cycle = weeks or months

- not re-programmable

Programming Technologies


2.) Field programming: Programming of device is done by the user. The programming technologies are of two types

Permanent type (Non-volatile):

• Fuse (normally on): ‘CLOSED (intact)’ becomes ‘OPEN (blown)’

• Anti-fuse (normally off): just the opposite of a fuse

• EPROM

• EEPROM

Nonpermanent type (Volatile):

• driving an n-MOS pass transistor by an SRAM cell

• NOTE: When the power of the device is switched off, the content of the SRAM is lost.

Programming Technologies (II)


1.) PLA (Programmable Logic Array):

• both the AND array and the OR array are programmable

• product term sharing: every product term of the AND array can be connected to the input of any OR gate

• unidirectional input/output pins

Basic PLD Concepts

Figure 1: PLA device
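
The two programmable planes of a PLA can be modelled in a few lines of Python. This is a behavioural sketch only; the data layout (`AND_PLANE`, `OR_PLANE`) and the example functions are assumptions chosen to show product term sharing, not the contents of Figure 1.

```python
from itertools import product

def evaluate_pla(and_plane, or_plane, inputs):
    # Each product term maps an input name to the literal it needs
    # (1 = true input, 0 = complemented input); unused inputs are left out.
    products = [all(inputs[name] == lit for name, lit in term.items())
                for term in and_plane]
    # Each output ORs together the product terms it is connected to.
    return [int(any(products[i] for i in terms)) for terms in or_plane]

AND_PLANE = [
    {"a": 1, "b": 1},   # P0 = a AND b
    {"a": 0, "c": 1},   # P1 = (NOT a) AND c
    {"b": 1, "c": 1},   # P2 = b AND c
]
OR_PLANE = [
    [0, 1],             # f0 = P0 OR P1
    [0, 2],             # f1 = P0 OR P2 (P0 is shared by both outputs)
]

for a, b, c in product((0, 1), repeat=3):
    f0, f1 = evaluate_pla(AND_PLANE, OR_PLANE, {"a": a, "b": b, "c": c})
    print(a, b, c, "->", f0, f1)
```

Programming the device corresponds to filling in the two tables; the evaluation logic itself stays fixed.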


Basic PLD Concepts (II)

2.) Memory based: Device with fixed AND array and programmable OR array

• the AND array is fixed: it acts as an address decoder that generates every minterm of the inputs, and only the connections into the OR gates are programmed

• PROM, EPROM and EEPROM are memory based PLD devices

3.) PAL/GAL (Programmable Array Logic / Generic Array Logic):

The AND array is programmable and the OR array has fixed connections to the outputs of the AND gates. PAL/GAL devices may have bi-directional I/O pins.

There are three different types of PAL/GAL devices:

• combinational PAL devices are used for the implementation of combinational logic functions

• sequential PAL devices are used for the implementation of sequential logic (finite state machines)

• arithmetic PAL devices: sums of product terms may be combined by XOR gates at the input of the macrocell D flip-flop


Basic PLD Concepts (IV)

Additional features of PAL/GAL devices

• PAL:

- EPROM-based programming technology

• GAL:

- has an array of programmable AND gates and OLMCs (Output Logic Macro Cells)

- EEPROM-based programming technology

- programmable output polarity

- device can be configured in dedicated input and output mode


Figure 2:

Combinational PAL device, AMD PAL16L8


Figure 3:

Sequential PAL devices, AMD PAL16R8


Figure 4:

Arithmetic PAL device, AMD PAL16A4


Figure 5: GAL device, GAL 16V8

• GAL16V8 has 8 configurable OLMC (Output Logic Macro Cell)

• each OLMC has a programmable XOR to get an active low or active high output signal

• there is a feedback from output to input


• A CPLD is a combination of multiple PAL or GAL type devices on a single chip

• CPLD architectures consist of

- Macrocells

- configurable flip-flop (D, T, JK or SR)

- Output enable/clock select

- Feedback select

• a CPLD has a predictable time delay because of its hierarchical interconnection

• easy to route, very fast turnaround

• performance independent of netlist

• devices are erasable and programmable with non-volatile EPROM or EEPROM configuration

• wide designer acceptance

• has a higher logic density than any classical PLD

• relatively mature technology, but some innovation still ongoing

Complex PLD (CPLD)


Figure 6:

Complex PLD device Altera EP1800

Complex PLD (II)


• EP1800 is erasable PLD device and has 48 macrocells, 16 dedicated input pins and 48 I/O pins.

• device is divided into four quadrants, each contains 12 macrocells and has local bus with 24 lines and a local clock

• out of the 12 macrocells, 8 are “local” macrocells and 4 are “global” macrocells

Figure 7: Local macrocell Figure 8: Global macrocell

Erasable CPLD


• the global bus has 64 lines and runs through all four quadrants: true and complement signals of 12 inputs (= 24 lines) + true and complement of 4 clocks (= 8 lines) + true and complement of the I/O pins of the 4 global macrocells in each quadrant (= 32 lines)

• macrocells: combinational or registered data output; the flip-flop is configurable as D, T, JK or SR type.

Erasable CPLD (II)

Figure 9: Synchronous clock, output enable by product term

Figure 10: Asynchronous clock, output permanently enabled


Figure 11: Block diagram of Altera MAX 7000 family

Electrically Erasable PLD

• MAX 7000 is an EEPROM-based programmable logic device

• its architecture includes the following elements:

- Logic Array Blocks (LABs)

- Macrocells

- Programmable Interconnect Array (PIA)

- I/O control blocks

• Pin to pin delay is about 5 ns

• predictable delay because of the hierarchical routing structure of the PIA


Figure 12: MAX 7000 device, macrocell

Electrically Erasable PLD (II)

• each Logic Array Block (LAB) has 16 macrocells

• each macrocell consists of logic array, product term select matrix and programmable register

• the product term select matrix allocates product terms from logic array to use them as either primary logic inputs to OR and XOR gate or secondary inputs to clear, preset, clock and clock enable control function for the register of macrocell


Figure 13:

MAX 7000 device, programmable Interconnect Array (PIA)

Electrically Erasable PLD (III)

• logic is routed among LABs via the PIA.

• dedicated inputs, I/O pins, and macrocell outputs feed the PIA, which makes the signals available throughout the entire device

• only the signals required by each LAB are actually routed from the PIA into the LAB

• selecting of signal from PIA to LAB is done by an EEPROM cell


Field Programmable Gate Array

• FPGA is a general purpose, multi-level programmable logic device

• An FPGA is composed of

- logic blocks to implement combinational and sequential logic circuits

- programmable interconnect wires to connect the inputs and outputs of the logic blocks

- I/O blocks: logic blocks at the periphery of the device for external connections

• “The routing resources are both the greatest strength and weakness of the FPGA’s”


Field Programmable Gate Array (II)

Figure 14: Symmetrical array architecture of FPGAs


• There are four main categories of FPGAs available commercially,

- symmetrical array

- row - based

- hierarchical PLD

- sea of gates

• They differ from each other in their interconnection and in how they are programmed

Field Programmable Gate Array (III)

Figure 15: Category of different FPGA


Programming Technologies

• Currently, there are four programming technologies for FPGAs,

- static RAM cells

- anti fuse

- EPROM transistor

- EEPROM transistor

Static RAM programming technology:

a) pass transistor b) transmission gate c) multiplexer

Figure 16: SRAM based programming technology


• completely reusable - no limit concerning re-programmability

• pass gate closes when a “1” is stored in the SRAM cell

• allows iterative prototyping

• volatile memory - power must be maintained

• large area - five transistor SRAM cell plus pass gate

• memory cells distributed throughout the chip

• fast re-programmability (tens of milliseconds)

• only standard CMOS process required

SRAM Programming technology


• An anti-fuse is the opposite of a normal fuse.

• Anti-fuses are made with a modified CMOS process having an extra step

• This step creates a very thin insulating layer which separates two conducting layers

• That thin insulating layer is fused by applying a high voltage across the conducting layers

• Such a high voltage can be destructive for CMOS logic circuits

• Non-volatile (permanent)

• Requires extra programming circuitry, including a programming transistor

Anti-fuse Programming


Actel PLICE Anti-fuse programming technology

Figure 17: Actel PLICE anti-fuse structure

• The Actel PLICE anti-fuse consists of a layer of heavily n-doped silicon (n+ diffusion), a layer of dielectric (ONO: oxide-nitride-oxide) and a layer of polysilicon

• it is programmed by placing a relatively high voltage (18 V) across the anti-fuse terminals, which results in a current of about 5 mA through it

• typical resistance of a fused contact is 300 to 500 Ω

• manufactured by 3 additional masks to a normal CMOS process


Quicklogic ViaLink Anti fuse programming technology

Figure 18: Four layer metal ViaLink structure

Figure 19: ViaLink element

• amorphous silicon is used as an insulating layer

• direct metal to metal contact results in a path resistance below 50 Ω

• 10 V terminal voltage is required to fuse the amorphous silicon


EEPROM programming technology

• static charge on the floating gate turns the transistor permanently off

• re-programmable

• non-volatile

• external permanent memory is not required

• slow re-configuration time

• floating-gate FET has relatively high on resistance

• higher static power consumption due to pull up resistor

Figure 20:

EEPROM programming technology


Commercially available FPGAs


Xilinx FPGA

Figure 21: General architecture of Xilinx FPGA

• The Xilinx architecture comprises a two-dimensional array of logic blocks called CLBs (Configurable Logic Blocks).

• They are interconnected via horizontal and vertical routing channels

• I/O blocks are user configurable to provide an interface between the external package pins and the internal logic

• I/Os can be configured as input, output or bi-directional signals


Figure 22: Xilinx XC4000 CLB

Xilinx FPGA (II)

• Xilinx XC4000 is an SRAM based FPGA

• each CLB has three LUTs (Look Up Tables) and two flip-flops

• the results of the combinatorial logic are stored in 16x1 SRAM LUTs

• the LUTs can also be used as RAM

• the combinatorial results of a CLB are passed to the interconnect network directly, or are stored in flip-flops first and then passed to the interconnect network

• with two stages of LUTs, two functions of 4 variables or one function of 5 variables can be implemented
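
How a look up table implements logic can be sketched in Python: the truth table is stored in a small memory and the inputs form the read address. The helper names are hypothetical, and the role given to the third table (selecting between the two 16x1 tables with the fifth input) is an assumption made for this illustration.

```python
from itertools import product

def program_lut(func, n_inputs):
    """Store the truth table of func; entry k holds the output for address k."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]

def read_lut(table, *bits):
    address = 0
    for bit in bits:              # first argument is the most significant address bit
        address = (address << 1) | bit
    return table[address]

def parity5(a, b, c, d, e):
    """Example function of 5 variables."""
    return a ^ b ^ c ^ d ^ e

# F and G each hold a 16x1 table: the 5-variable function with e fixed to 0 or 1.
lut_f = program_lut(lambda a, b, c, d: parity5(a, b, c, d, 0), 4)
lut_g = program_lut(lambda a, b, c, d: parity5(a, b, c, d, 1), 4)
# Assumed role of the third table: select F or G depending on e.
lut_h = program_lut(lambda f, g, e: g if e else f, 3)

def clb_function(a, b, c, d, e):
    f = read_lut(lut_f, a, b, c, d)
    g = read_lut(lut_g, a, b, c, d)
    return read_lut(lut_h, f, g, e)

print(lut_f)
```

Any other function of up to five variables is obtained by storing different table contents; the read path does not change.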


Figure 24: Switch matrix

Figure 23: Programmable interconnect associated with XC4000 series CLB

Xilinx FPGA (III)

Horizontal longlines

Single length lines

Double length lines


Xilinx FPGA (IV)

• interconnects of the XC4000 device are arranged in horizontal and vertical channels

• each channel contains some number of wire segments. They are:

Single length lines:

• they span a single CLB

• provide highest interconnect flexibility and offer fast routing

• acquire delay whenever the line passes through a switch matrix

• they are not suitable for routing signals over long distances

Double length lines:

• they span two CLBs so that each line is twice as long as a single length line

• provide faster signal routing over intermediate distances

Longlines:

• Longlines form a grid of metal interconnect segments that run the entire length or width of the array

• they are intended for high fan-out nets and nets with critical delay


Xilinx, Virtex-II Pro™ FPGA family

• The Virtex-II Pro Platform FPGA is the most technically sophisticated silicon and software product development in the history of the programmable logic industry.

• The Virtex-II Pro FPGAs are manufactured in a 0.13-micron process.

• It is capable of implementing high performance System-On-a-Chip designs with low development cost

• It can be used in applications such as networking system architectures, deeply embedded systems and digital signal processing systems.

• Virtex-II Pro devices incorporate one to four PowerPC 405 processor cores. The PowerPC 405 cores are fully embedded within the FPGA, where all processor nodes are controlled by the FPGA routing resources.

• Each PowerPC 405 core is capable of more than 300 MHz clock frequency.


Xilinx, Virtex-II Pro™ FPGA family (II)

Figure 25: Virtex-II Pro Generic Architecture Overview

• The Virtex-II Pro FPGA consists of the following components:

- Embedded Rocket I/O™ Multi-Gigabit Transceivers (MGTs)

- Processor Blocks containing embedded IBM ® PowerPC ® 405 RISC CPU (PPC405) cores and integration circuitry

- FPGA fabric based on Virtex- II architecture.


Xilinx, Virtex-II Pro™ FPGA family (III)

• CLB (Configurable Logic Block) include four slices and two 3-state buffers

• Each slice is equivalent and contains:

• Two function generators (F & G)

• Two storage elements

• Arithmetic logic gates

• Large multiplexers

• Wide function capability

• Fast carry look-ahead chain

• Horizontal cascade chain (OR gate)

Figure 26: CLB (Configurable Logic Block) of Virtex-II Pro FPGA


Xilinx, Virtex-II Pro™ FPGA family (IV)

• IOB blocks include six storage elements, as shown in Figure 27.

• Each storage element can be configured either as an edge-triggered D-type flip-flop or as a level-sensitive latch.

• On the input, output, and 3-state path, one or two DDR (Double Data Rate) registers can be used.

• Double data rate is directly accomplished by the two registers on each path, clocked by the rising edges (or falling edges) from two different clock nets.

Figure 27: IOB block of Virtex-II Pro FPGA


Actel/TI FPGA architecture

Figure 28: General architecture of Actel FPGA

• Actel offers three main families:

- Act 1, Act 2, Act 3

• programmable logic blocks are arranged in rows

• horizontal routing channels are arranged between adjacent rows

• Actel FPGAs are based on anti-fuse technology

• instead of LUTs, the logic blocks use multiplexers


Actel/TI FPGA architecture (II)

Act-1 Logic Module:

• The Act-1 logic module is a logic circuit with 8 inputs and 1 output

• it contains only a combinatorial logic circuit

• The logic module can implement the four basic functions NAND, AND, NOR and OR

Figure 29: Act-1 logic module
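
A multiplexer-based logic module can be sketched as follows. The structure assumed here (three 2:1 multiplexers, with the final select driven by an OR of two inputs) is the commonly described Act-1 arrangement; the pin names and the tie-off values are assumptions for illustration and are not read from Figure 29.

```python
from itertools import product

def mux2(d0, d1, sel):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def act1_module(a0, a1, sa, b0, b1, sb, s0, s1):
    """8 inputs, 1 output: two first-level muxes feed a third mux selected by s0 OR s1."""
    return mux2(mux2(a0, a1, sa), mux2(b0, b1, sb), s0 | s1)

# The basic functions are obtained by tying inputs to constants or signals.
def and2(a, b):
    return act1_module(0, 0, 0, 0, 1, b, a, 0)

def or2(a, b):
    return act1_module(0, 0, 0, 1, 1, 0, a, b)

def nand2(a, b):
    return act1_module(1, 1, 0, 1, 0, b, a, 0)

def nor2(a, b):
    return act1_module(1, 1, 0, 0, 0, 0, a, b)

for a, b in product((0, 1), repeat=2):
    print(a, b, and2(a, b), or2(a, b), nand2(a, b), nor2(a, b))
```

In an anti-fuse device the tie-offs are made by programming the interconnect, so the module itself needs no configuration memory.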


Actel/TI FPGA architecture (III)

Figure 30: Act-2 logic module

C module

S module

Act-2 Logic Module:

• The Act-2 family has a two module architecture, consisting of the C module (Combinatorial) and the S module (Sequential)

• the logic module is optimized for both combinatorial and sequential designs


Act-3 Logic Module:

• it comprises an AND and OR gate that are connected to a multiplexer-based circuit block.

• The multiplexer circuit is arranged such that, in combination with the two logic gates, a very wide range of functions can be realized in a single logic block

• about half of the logic blocks in an Act-3 device also contains a flip-flop

Figure 31: Act-3 Logic module

Actel/TI FPGA architecture (IV)


Figure 32: Act-1 programmable interconnection architecture

Actel/TI FPGA architecture (V)


CAD for FPGAs

Initial Design Entry → Logic Optimization → Technology Mapping → Placement → Routing → Programming Unit → Configured FPGA

Figure 33: Design flow for FPGA


DESIGN IMPLEMENTATION

Design Entry

Design validation

Device Selection

Design Synthesis Optimization

Mapping

Placement

Routing

Design validation/ Back Annotation

Bit Stream generation

Download to Xilinx FPGA

Design validation

Design flow for Xilinx FPGA


Economical Considerations

Figure 34: Cost per Chip


FPGA:

1. Cost per chip is less for low volumes (low fixed cost)

2. Short turnaround time

3. Design flexibility is high and cost for re-designing is low

4. Speed is relatively slow because of the resistance and capacitance of the programmable switches

5. Programmable switches and the configuration network require chip area, which results in decreased logic density

MPGA (mask-programmable gate array):

1. Less cost per chip for high volumes

2. Fabrication is done with a hardwired metal connection layer, which results in fast operation

3. High logic density

4. Very high costs for low volumes (high fixed cost)

5. No redesign flexibility

Economical Considerations (I)


Logic design Alternatives

 | SSI and MSI ICs | PLDs | Programmable gate arrays | Gate arrays | Custom ICs

Chip complexity | small | medium | medium | large | ultra large

Speed | Fast | Slow to medium | Slow to medium | Slow to fast | Fast

Function defined by user | No | Yes | Yes | Yes | Yes

Time to customize | - | Seconds | Seconds | Months | Year

User programmable | No | Yes | Yes | No | No


Logic design Alternatives (I)

Figure 35: Relative merits of various ASIC implementation styles


CPLDs and FPGAs

 | Complex Programmable Logic Device (CPLD) | Field-Programmable Gate Array (FPGA)

Architecture | More combinational | Gate array-like, more registers + RAM

Density | Low-to-medium | Medium-to-high

Performance | Predictable timing | Application dependent

Interconnect | “Crossbar Switch” | Incremental


16. Arithmetic Units


Basic Adder Cells

• Half Adder:

• Can be used to calculate the sum of two bits A1 and A2:

$C = A_1 A_2$

$S = A_1 \oplus A_2$

• Full Adder:

• For adding binary numbers having a bitwidth of more than one single bit:

$C_{out} = C_{in}(A_1 + A_2) + A_1 A_2$

$S = A_1 \oplus A_2 \oplus C_{in}$

• These equations can be realized either by logic gates (AND, OR, XOR) or by two half-adders and an OR gate.

Adders / Subtracters


Adders / Subtracters for Binary Coded Integers

Serial Adders

• The n-bit sum and the carry output are available after (n+1) clock cycles (1 operand load, n calculations).

• The serial adder has the smallest hardware complexity (wordlength independent if the shift registers are not considered) but requires the highest computation time of all adder implementations.


Adders / Subtracters for Binary Coded Integers

Parallel Adders

• Ripple Carry Adder:

• Chained full-adders where the carry „ripples“ through the whole chain from the LSB to the MSB.

• The addition time depends on the wordlength of the operands.
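
The ripple carry structure can be written down directly from the full adder equations. The Python sketch below is a bit-level model for illustration (function names are chosen here); the loop makes the dependence of the addition time on the wordlength visible, since stage i cannot finish before stage i-1 has produced its carry.

```python
def full_adder(a1, a2, c_in):
    """One-bit full adder: S = A1 xor A2 xor Cin, Cout = Cin(A1 + A2) + A1 A2."""
    s = a1 ^ a2 ^ c_in
    c_out = (c_in & (a1 | a2)) | (a1 & a2)
    return s, c_out

def ripple_carry_add(a, b, width, c_in=0):
    """Chain of full adders; the carry ripples from the LSB to the MSB."""
    carry = c_in
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

print(ripple_carry_add(0b1011, 0b0110, width=4))
```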


Parallel Adders

• Carry Lookahead Adder:

• The carry input of a stage i is calculated directly from the inputs of the preceding stages i-1, i-2, ... i-k.

• The Cout of ordinary full adders is substituted by the generate and propagate signals:

$g_i = a_i b_i, \qquad p_i = a_i + b_i$

• The carry input of stage i+1 is defined by:

$c_{in,i+1} = c_i = g_i + p_i c_{i-1}$

• Example (4 bit adder):

$c_{in,1} = c_0 = g_0 + p_0 c_{in}$

$c_{in,2} = c_1 = g_1 + p_1 g_0 + p_1 p_0 c_{in}$

$c_{in,3} = c_2 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_{in}$

$c_{out} = c_3 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_{in}$
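
The expanded carry equations can be evaluated without any carry-to-carry dependency. The following Python sketch (an illustration with names chosen here) builds each carry as a flat sum of products of the generate and propagate signals, in the same form as the 4 bit example.

```python
def lookahead_carries(a, b, c_in, width=4):
    """Return [c_0, ..., c_(width-1)], each computed directly from g, p and c_in."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]      # generate: g_i = a_i b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]    # propagate: p_i = a_i + b_i
    carries = []
    for i in range(width):
        terms = []
        for k in range(i + 1):
            # Product term p_i ... p_(k+1) g_k
            term = g[k]
            for m in range(k + 1, i + 1):
                term &= p[m]
            terms.append(term)
        # Product term p_i ... p_0 c_in
        tail = c_in
        for m in range(i + 1):
            tail &= p[m]
        terms.append(tail)
        carries.append(int(any(terms)))
    return carries

print(lookahead_carries(0b1011, 0b0110, c_in=0))
```

No carry depends on another carry, which is what allows the two level logic implementation; the price is that the product terms get wider with every bit position.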


• The carry lookahead circuits can be realized by a two level logic implementation: the addition is performed in constant time.

• Carry lookahead adder for 4 bits:

• The number of gate inputs (the wordlength) is restricted due to technological constraints.


• Clustered Carry Lookahead Adder:

• Big wordlengths are split into smaller groups processed by carry lookahead adders with reasonable length.

• The carry ripples through different blocks as in the carry ripple adder.

• Alternative: group-generate and group-propagate signals can be generated and then evaluated by a second-level carry lookahead circuit.


• Carry Select Adder:


• Carry Select Adder:

– The additions are performed in each cluster in parallel for the following cases:

• Carry in is „0“

• Carry in is „1“

– Cluster carry out and partial sum C/Sum[i:j] are forwarded to multiplexors.

– The multiplexors select the appropriate value depending on the carry output of the preceding stages.

– The overall addition time is almost independent of the wordlength.

– The hardware amount is almost twice that of a ripple carry adder.

– It is slower than a carry lookahead adder.

– Has a higher regularity, thus better suited for VLSI implementation.
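
The selection principle can be sketched in Python. This is a behavioural model under simplifying assumptions: equal block sizes, a wordlength that is a multiple of the block size, and ordinary integer addition standing in for the adder inside each cluster.

```python
def add_block(a, b, c_in, width):
    """Adder for one cluster; returns (sum bits, carry out)."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, width=16, block=4, c_in=0):
    """width must be a multiple of block."""
    mask = (1 << block) - 1
    result, carry = 0, c_in
    for lo in range(0, width, block):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        # Both cases are computed independently of the incoming carry
        # (in hardware: in parallel, for all clusters at once).
        sum0, carry0 = add_block(a_blk, b_blk, 0, block)
        sum1, carry1 = add_block(a_blk, b_blk, 1, block)
        # Multiplexer: the carry of the preceding stage selects the valid result.
        s, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= s << lo
    return result, carry

print(carry_select_add(40000, 30000))
```

Only the multiplexer chain depends on the incoming carry, which is why the delay grows much more slowly with the wordlength than in a ripple carry adder, while each cluster needs two adders.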


• Carry Save Adder:

Ex. for 4 operands

(V, W, X, Y):


• Carry Save Adder:

– Achieves constant addition time complexity.

– The propagation of computed carry results is avoided.

– S and Cout are connected to the correct adder in the succeeding stage.

– Requires a final addition to merge the sum and the carry vector of the final stage (e.g. with a carry ripple adder).

– The adder delay is increased by one full-adder delay if it is extended by an additional operand.
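
A carry save addition of four operands can be modelled with bitwise operations on whole words. The sketch below is illustrative (names chosen here): each stage is one row of independent full adders, and Python's built-in addition stands in for the final carry-propagate adder.

```python
def carry_save_stage(x, y, z):
    """One row of full adders: three operands in, sum vector and carry vector out."""
    sum_vec = x ^ y ^ z
    # Carries are not propagated within the row; they get the weight of the next bit.
    carry_vec = ((x & y) | (x & z) | (y & z)) << 1
    return sum_vec, carry_vec

def carry_save_add4(v, w, x, y):
    s1, c1 = carry_save_stage(v, w, x)     # first stage: V, W, X
    s2, c2 = carry_save_stage(s1, c1, y)   # each further operand costs one more stage
    return s2 + c2                         # final addition merges sum and carry vector

print(carry_save_add4(9, 5, 12, 7))
```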


Shift and Add (SAA) Multiplier

• The most common multiplier

• Multiplies two unsigned integer words X and Y of bit-size Nx and Ny:

$X = \sum_{i=0}^{N_x-1} x_i 2^i, \qquad Y = \sum_{j=0}^{N_y-1} y_j 2^j$

$Z = X \cdot Y = \sum_{i=0}^{N_x-1} x_i Y 2^i = x_0 Y + 2\left(\dots + 2\left(x_{N_x-2} Y + 2\left(x_{N_x-1} Y\right)\right)\dots\right)$

• The following recurrence can be derived:

$D_0 = 0, \qquad D_{i+1} = D_i 2^{-1} + x_i Y, \qquad Z = D_{N_x} 2^{N_x-1}$

• At each step, one bit of X is AND-ed with Y and added to Di which is shifted one bit.

Multipliers


• It takes N clock cycles to complete the multiplication (one bit of X is processed each step).

• The delay is approximately NyδFA (where δFA is the delay of a full adder).

• The cost of a SAA multiplier is (3N + 2N)γFA (the cost of a full adder γFA is assumed to be equal to the cost of a register).
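
The shift and add recurrence D_0 = 0, D_(i+1) = D_i / 2 + x_i Y, Z = D_Nx 2^(Nx-1) can be traced step by step. The Python sketch below uses exact fractions so that the right shift loses no bits; in hardware the shifted-out bits would be kept in the result register. Function and variable names are chosen for this illustration.

```python
from fractions import Fraction

def shift_and_add_multiply(x, y, n_x):
    """Multiply unsigned x (n_x bits) by y, one bit of x per step."""
    d = Fraction(0)                      # D_0 = 0
    for i in range(n_x):
        x_i = (x >> i) & 1               # current bit of X, LSB first
        d = d / 2 + x_i * y              # D_(i+1) = D_i * 2^-1 + x_i * Y
    return int(d * 2 ** (n_x - 1))       # Z = D_Nx * 2^(Nx - 1)

print(shift_and_add_multiply(5, 3, n_x=3))
```

For x = 5 (binary 101) and Y = 3 the intermediate values are D_1 = 3, D_2 = 3/2 and D_3 = 15/4, and the final scaling by 2^2 gives Z = 15.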


Carry Save Multiplier (CSM)

• Calculates the result in one step.

• Every bit of the first argument is multiplied with every bit of the second argument concurrently.

• The CSM consists of combinatorial logic only.

• Example for two 4-bit binary numbers:

                X3  X2  X1  X0
                Y3  Y2  Y1  Y0
                P30 P20 P10 P00
            P31 P21 P11 P01
        P32 P22 P12 P02
    P33 P23 P13 P03
Z7  Z6  Z5  Z4  Z3  Z2  Z1  Z0

where Pij = Xi ∧ Yj
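
The partial product scheme can be checked with a short Python sketch. It models only the arithmetic (each P_ij has the weight 2^(i+j)), not the array of carry save adders that sums the rows in hardware; the function names are chosen for this illustration.

```python
def partial_products(x, y, n_x=4, n_y=4):
    """P[i][j] = X_i AND Y_j, all computed concurrently in hardware."""
    return [[((x >> i) & 1) & ((y >> j) & 1) for j in range(n_y)]
            for i in range(n_x)]

def sum_partial_products(p):
    """Add every partial product bit with its weight 2^(i+j)."""
    return sum(bit << (i + j)
               for i, row in enumerate(p)
               for j, bit in enumerate(row))

p = partial_products(0b1011, 0b0110)
print(sum_partial_products(p))
```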


Part III Part II Part I

• It is assumed that Nx ≥ Ny (if Nx = Ny, then Part II is omitted).

• The multiplier delay is (Nx + Ny - 2)δFA

• The cost is (Nx - 1)NyγFA plus (2Ny + 2Nx) γFA, if X, Y, and the Z-register are accounted.


Block Multiplier

• Can be configured from working fully serial to working fully parallel.

• Arguments divided into blocks of same size.

• Individual blocks are multiplied in a fast Carry Save Multiplier.

• The arguments and the intermediate result have to be shifted in an appropriate way.


• The intermediate result has to be shifted in both directions (requires a bidirectional shift register).

• The controller can be realized using a simple counter.

• The multiplier needs kx·ky clock cycles to perform a multiplication (where kx and ky are the number of separated blocks of the first and of the second argument, respectively).


17. Microarchitectures


Microarchitecture

• Components:

– Data Path

– Control Path (can be interpreted like a FSM)

• hardwired

• programmable

– I/O Unit


Datapath Design

• Example:

• Implementation:

– Standard cells (gates, muxes, registers, ...).

Or:

– Datapath compiler: several layout tiles.


• Layout scheme:

• Datapath compiler: creates a regular layout by stacking the appropriate number of tiles (depending on the wordlengths of the operands).

• Bit slice: a horizontal slice of tiles performing all functions for a single bit.

• Functional slice: vertical layout block implementing a single function.


Bit-slice ALU AMD 2901

• 16-word register set

• Q register (used in add-shift multiplications and divisions)

• ALU

• Shifter

• Instruction decoder

• All operations and registers are designed for 4-bit operands.


Bit-slice ALU AMD 2901

• The instructions are encoded in a 9-bit I vector, provided by an external microcode controller.

• First table: selection of the sources for both ALU inputs (R and S).

• Second table: ALU functions.

• Third table: ALU results.


16-bit bit-sliced ALU:

• Cascaded 2901 ICs for wordlengths with multiples of 4 bits.

• Simple carry propagation scheme (alternatively, carry-lookahead circuits AMD 2902 can be used).


Controller Implementations

Combinational logic block implementation:

• Early microprocessors (≤ 8 bit) and RISC: random logic

– separate gates

– modifications require redesigning of a whole combinational gate network

• CISC processors: microprogramming

– regular layout structures (ROMs or PLAs)

– modifications in the control sequence require only to redefine the contents of a PLA or ROM


Microprogrammed Controllers

ROM based controller PLA based controller

• Microinstruction = the concatenation of the control signals (for the data path) and the next address (NA).


Horizontal Microinstructions

• Control word directly applied to the controlled circuit.

• Each control point has a corresponding entry in the control word.

• Very long control words

• Big control memories

• Very specific encoding is possible

• High degree of parallelism in the operations


Vertical Microinstructions

• n-bit control word: 2^n configurations possible (hardly used).

• M control vectors are encoded into a vector of ⌈log2 M⌉ bits.

• The n-bit control word is fetched from a secondary memory: control vector decoder (ROM or PLA).

• Alternative: encoding the control vector in groups for different units (ALU, shifter,...).

• Group by group decoding instead of using a single and large control vector decoder.
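
The saving achieved by vertical microinstructions can be illustrated with a small Python sketch. The 12-bit control words and all names below are hypothetical; the decoder is modelled as a list that plays the role of the control vector decoder ROM.

```python
def build_vertical_encoding(control_words):
    """Assign a short code to every distinct control word actually used."""
    distinct = sorted(set(control_words))
    # Number of code bits: ceil(log2(M)) for M >= 2 distinct control vectors.
    code_width = max(1, (len(distinct) - 1).bit_length())
    encode = {word: code for code, word in enumerate(distinct)}
    decoder_rom = distinct               # address = code, content = n-bit control word
    return code_width, encode, decoder_rom

# Hypothetical 12-bit control words used by some microprogram.
MICROPROGRAM = [0b100000000001, 0b010000000010, 0b100000000001,
                0b001100000100, 0b000011000000, 0b000000110000]

code_width, encode, decoder_rom = build_vertical_encoding(MICROPROGRAM)
encoded = [encode[word] for word in MICROPROGRAM]       # stored in the control memory
decoded = [decoder_rom[code] for code in encoded]       # fetched from the decoder

print(code_width, encoded)
```

Here five distinct control vectors fit into 3 bits instead of 12 per microinstruction, at the cost of the extra decoder lookup.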


Microcode / Nanocode Controller

• Microinstruction = a sequence of nanoinstructions.

• MNA (microcode next address) register is halted while the nanocode sequence runs.

• Feedback via the NNA (nanocode next address): control sequences can be generated by the nanocode PLA.

• If the same nanocode sequences are used in many microinstructions, savings in implementation area are achieved.


18. Semiconductor Memories

18: Semiconductor MemoriesIntegrated Electronic Systems Lab 577

Overview

• Introduction

• Read Only Memory (ROM)

• Nonvolatile Read/Write Memory, esp. Flash (RWM)

• Static Random Access Memory (SRAM)

• Dynamic Random Access Memory (DRAM)

• Summary


Market

Total DRAM market 2008: 31 B$ (Source: Gartner 2009)

Total Flash market 2008: 28 B$ (Source: Gartner 2009)

Total SRAM market 2008: 2 B$ (Source: Gartner 2009)

2 main driving forces for emerging technologies:

• Find lower-cost solutions (shrink capabilities are limited by costs rather than physics)

• Find a “unified memory” combining the strengths of all known technologies (e.g. low power & speed)


Memory Requirement


Physical Principles of Semiconductor Memories

Memory type   Physical effect
DRAM          Charge (capacitor)
SRAM          Cross-coupled transistors
Flash         Charge (gate of FET)
CBRAM         Ion relocation (resistance)
FeRAM         Polarization
MRAM          Magnetization (resistance)
ORAM          Phase change (resistance)
PCRAM         Material phase (resistance)


Semiconductor Memory Classification

Volatile Memory

• Read/Write Memory

– Random Access: SRAM, DRAM

– Non-Random Access: FIFO, LIFO, Shift Register

Non-Volatile Memory

• Read/Write Memory (RWM): EPROM, E2PROM, FLASH

• Read Only Memory (ROM): Mask-Programmable ROM, Programmable ROM

Abbreviations:

SRAM - Static Random Access Memory

DRAM - Dynamic Random Access Memory

FIFO - First-In First-Out

LIFO - Last-In First-Out

EPROM - Erasable Programmable ROM

E2PROM - Electrically Erasable Programmable ROM


Random Access Memory Array Organization

Each memory cell

• stores one bit of binary information (”0“ or ”1“ logic)

• shares common connections with other cells: rows, columns

Memory array

• Memory storage cells

• Address decoders


Read Only Memory - ROM

• Simple combinatorial Boolean network which produces a specific output for each input combination (address)

• “1” bit stored - absence of an active transistor

• “0” bit stored - presence of an active transistor

• Organized in arrays of 2^N words

• Typical applications:

– store the microcoded instruction set of a microprocessor

– store a portion of the operating system for PCs

– store the fixed programs for microcontrollers (firmware)


Mask Programmable NOR ROM (1)

NOR ROM with 4-bit words

• Each column Ci (NOR gate) corresponds to one bit of the stored word

• A word is selected by raising the corresponding wordline to “1”

• All the wordlines are “0“ except the selected wordline which is “1“

• ”1“ bit stored - absence of an active transistor

• ”0“ bit stored - presence of an active transistor
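The read behaviour of the NOR ROM can be illustrated with a small behavioural model. This is a sketch under stated assumptions: the transistor map below is a made-up example (not the array from the figure), and electrical effects such as precharge and timing are ignored. Each column acts as a NOR gate over the wordlines that have a transistor attached.

```python
def nor_rom_read(transistors, row):
    """Read one word from a behavioural NOR ROM model.

    transistors[r][c] == 1 means an active transistor connects column c
    to ground when wordline r is high. The selected wordline is "1",
    all others are "0".
    """
    wordlines = [1 if r == row else 0 for r in range(len(transistors))]
    word = []
    for c in range(len(transistors[0])):
        pulled_down = any(wl and transistors[r][c]
                          for r, wl in enumerate(wordlines))
        # active transistor present -> column pulled low -> "0" stored
        word.append(0 if pulled_down else 1)
    return word


# Hypothetical 4x4 transistor map (assumption for illustration)
ROM = [
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
for r in range(4):
    print(f"R{r + 1}:", nor_rom_read(ROM, r))
```

The stored word is simply the bitwise complement of the transistor map of the selected row.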


Mask Programmable NOR ROM (2)

• “1” bit stored - the drain/source connection (or the gate electrode) is omitted in the final metallization step

• “0” bit stored - the drain of the corresponding transistor is connected to the metal bit line

[Figure: layout with drain (D), source (S) and gate (G) regions and a common ground line]

Cost-efficient, since only a few masks have to be manufactured


Implant Mask Programmable NOR ROM

Idea: deactivation of the NMOS transistors by raising their threshold voltage above the VOH level through channel implants

• “1” bit stored - the corresponding transistor is turned off through channel implant

• “0” bit stored - non-implanted (normal) transistors

Advantage: higher density (smaller area)!



Implant Mask Programmable NAND ROM (1)

• Each column Ci (NAND gate) corresponds to one bit of the stored word

• A word is selected by putting to “0“ the corresponding wordline Ri

• All the wordlines Ri are “1“ except the selected wordline which is “0“

Normally on transistors: have a lower threshold voltage (channel implant)

NAND ROM with 4-bit words

• “1” bit stored - presence of a transistor that can be switched off

• “0” bit stored - shorted/normally-on transistor


Implant-Mask-Programmable NAND ROM (2)

4x4 bit NAND ROM array layout

• The structure is more compact than NOR array (no contacts)

• The access time is larger than NOR array access time (chain of nMOS)



NOR Row Address Decoder for a NOR ROM Array

NOR ROMArray

• The decoder must select one row by raising its voltage to logic “1”

• Different combinations for the address bits A1A2 select the desired row

• The NOR decoder array and the NOR ROM array are fabricated as two adjacent arrays, using the same layout strategy

A1 A2 R1 R2 R3 R4

0 0 1 0 0 0

0 1 0 1 0 0

1 0 0 0 1 0

1 1 0 0 0 1
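The truth table above can be reproduced with one NOR gate per row, each fed by the true or complemented address bits. The following sketch assumes this gate assignment (it is inferred from the table, not taken from the schematic); the function names are illustrative.

```python
def nor(*inputs):
    """Logical NOR of any number of 0/1 inputs."""
    return 0 if any(inputs) else 1


def nor_row_decoder(a1, a2):
    """2-to-4 NOR row decoder: exactly one of R1..R4 is raised to 1."""
    na1, na2 = 1 - a1, 1 - a2
    return (
        nor(a1, a2),    # R1 selected for A1 A2 = 0 0
        nor(a1, na2),   # R2 selected for A1 A2 = 0 1
        nor(na1, a2),   # R3 selected for A1 A2 = 1 0
        nor(na1, na2),  # R4 selected for A1 A2 = 1 1
    )


for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2, *nor_row_decoder(a1, a2))
```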


NAND Row Address Decoder for a NAND ROM Array

• The decoder has to lower the voltage level of the selected row to logic “0” while keeping all the other rows at logic “1”

• The NAND row decoder of the NAND ROM array is implemented using the same layout strategy as the memory itself


NOR Column Address Decoder for a NOR ROM Array

NOR address decoder + 2^M pass transistors

• Large area!

Binary selection tree decoder

• No need for a NOR address decoder, but additional inverters are necessary!

• Smaller area

• Drawback - long data access time


Nonvolatile Read-Write Memories

• The architecture is similar to the ROM structure

• Array of transistors placed on a word-line/bit-line grid

• Special transistor that permits its threshold to be altered electrically

• Programming: selectively disabling or enabling some of these transistors

• Reprogramming: erasing the old threshold values and start a new programming cycle

Method of erasing:

• ultraviolet light - EPROMs

• electrically - EEPROMs


EPROM (1)

The floating gate avalanche-injection MOS (FAMOS) transistor:

• extra polysilicon strip is inserted between the gate and the channel - floating gate

• impact: double the gate oxide thickness, reduce the transconductance, increase the threshold voltage

• threshold voltage is programmable by trapping electrons on the floating gate through avalanche injection

Schematic symbol


[Figure panels: avalanche injection; removing the programming voltage leaves charge trapped; programming results in higher VT]

EPROM (2)

• Electrons acquire sufficient energy to become “hot” and traverse the first oxide insulator (100 nm) so that they get trapped on the floating gate

• Electron accumulation on the floating gate is a self-limiting process that increases the threshold voltage (~7V)

• The trapped charge can be stored for many years

• The erasure is performed by shining strong ultraviolet light on the cells through a transparent window in the package

• The UV radiation renders the oxide conductive by direct generation of electron-hole pairs


EPROM (3)

• The erasure process is slow (~min.)

• The erasure procedure is off-system!

• Programming takes several µs per word

• Limited endurance - max 1000 erase/program cycles

• The cell is very simple and dense: large memories at low cost!

• Applications that do not require regular reprogramming


EEPROM

• Reversible programming by reversing the applied voltage (raising and lowering the threshold voltage)

– difficult to control the threshold voltage

– extra transistor required as access device

• Larger area than EPROM

• More expensive technology than EPROM

• Offers a higher versatility than EPROM

• Can support 10^5 erase/write cycles

• Provide an electrical-erasure procedure

• Modified floating-gate device, floating-gate tunneling oxide (FLOTOX):

• reduce the distance between floating gate and channel near the drain

• Fowler-Nordheim tunneling mechanism (when 10 V is applied across the thin insulator)


Flash Memories

Combines the density of the EPROM with the versatility of EEPROM structures

• Programming: avalanche hot-electron-injection

• Erasure: Fowler-Nordheim tunneling (as for EEPROM cells)

• Difference: erasure is performed in bulk for the complete memory chip (or a subsection of it): reduction in flexibility!

• Extra access transistor of the EEPROM is eliminated because the global erasure process allows a careful monitoring of the device characteristics and control of the threshold voltage!

• High integration density

ETOX Flash cell - introduced by INTEL


Static Random Access Memory - SRAM (1)

• Permit the modification (writing) of stored data bits

• The stored data can be retained infinitely, without need of any refresh operation

• Data storage cell - simple latch circuit with 2 stable states

• With a sufficiently large voltage disturbance, the latch switches from one stable point to the other stable point

• Two switches are required to access (r/w) the data

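[Figure: (a) voltage transfer characteristic vo versus vI of the two-inverter latch (axes 0 to 6, levels VOL and VOH) with the line vo = vI, two stable Q-points and one unstable Q-point; (b) the two stable states of the latch]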


Static Random Access Memory - SRAM (2)

a) general structure of a SRAM cell based on two inverter latch circuit

b) implementation of the SRAM cell

c) resistive load (undoped polysilicon resistors) SRAM cell

d) depletion load NMOS SRAM cell

e) full CMOS SRAM cell


Resistive Load SRAM Cell - Operation Principle (1)

• MP1, MP2 pull-up transistors - charge up the large column parasitic capacitances CC and CCbar

• The steady-state column voltage: VC = VCbar = VDD - VT ≈ 3.5 V

The basic operations on SRAM cells

RS = 1 (M3, M4 on)

• Read/Write “1”

• Read/Write “0”

RS = 0 (M3, M4 off)

• data is being held

[Figure: cell schematic with internal nodes V1 and V2; an annotation marks the node at which the memory content is defined to be located]


Resistive Load SRAM Cell - Operation Principle (2)

• Write “1” operation (RS = 1 - M3, M4 on)

VCbar - forced to 0 by data write circuitry; V2 decreases to 0, M1 off; V1 increases

Final state: V1 = 1, V2 = 0

• Read “1” operation (RS = 1 - M3, M4 on)

M1 off; M2, M4 on; VCbar - pulled down; VC > VCbar is read as logic “1”

• Write “0” operation (RS = 1 - M3, M4 on)

VC - forced to 0 by data write circuitry; V1 goes to 0, M2 off; V2 increases to 1

Final state: V1 = 0, V2 = 1

• Read “0” operation (RS = 1 - M3, M4 on)

M2 off; M1, M3 on; VC - pulled down; VC < VCbar is read as logic “0”


Full CMOS SRAM Cell

• Low-power SRAM cell: the static power dissipation is limited to the leakage current; significant current is drawn only during a switching event

• The pMOS pull-up transistors allow the column voltage to reach full VDD level

• High noise immunity due to larger noise margins

• Lower power supply voltages than resistive-load SRAM cell

• Drawback: large area!


CMOS SRAM Cell Design Strategy (1)

Layout of the resistive-load SRAM cell Layout of the CMOS SRAM cell



(1) The data read operation should not destroy the stored information

Assume that a logic “0” is stored in the cell (V1 = 0, V2 = 1: M1, M6-linear; M2, M5-off)

• RS = 0: M3, M4-off;

• RS = 1: M3-saturation; M4, M1-linear

VC decreases , V1 increases slowly

Condition - M2 must remain turned off during the data reading operation:

V1,max ≤ VT,2 ; IM3 = IM1 ⇒

Design rule:

\[
\frac{(W/L)_3}{(W/L)_1} < \frac{2\,(V_{DD} - 1.5\,V_{T,n})\,V_{T,n}}{(V_{DD} - 2\,V_{T,n})^2}
\]

A symmetrical rule is valid also for M2 and M4


(2) The cell should allow modification of the stored information during the data write phase


Consider the write “0“ operation, assuming that “1“ is stored in the cell (V1 = 1, V2 = 0: M1, M6-off; M2, M5-linear)

• RS = 0: M3, M4-off;

• RS = 1: M3, M4 saturation, M5-linear

In order to change the stored information: V1 = 0, V2 = 1 ⇒ M1 on and M2 off!

But V2 < VT1 (previous design condition) ⇒ M1 cannot be switched on! ⇒ M2 must be switched off ⇒ V1 must be reduced below VT2

V1 ≤ VT,2 ; IM3 = IM5 ⇒

Design rule:

\[
\frac{(W/L)_5}{(W/L)_3} < \frac{\mu_n}{\mu_p} \cdot \frac{2\,(V_{DD} - 1.5\,V_{T,n})\,V_{T,n}}{(V_{DD} + V_{T,p})^2}
\]

A symmetrical rule is valid also for M6 and M4
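The two sizing conditions (read stability for M3/M1, writability for M5/M3) can be evaluated numerically. The sketch below assumes the usual form of these bounds, 2(VDD - 1.5 VT,n) VT,n divided by (VDD - 2 VT,n)^2 for the read case and by (VDD + VT,p)^2, scaled with µn/µp, for the write case. The supply voltage, threshold voltages and mobility ratio are example values chosen for illustration, not values from the lecture.

```python
def max_read_ratio(vdd: float, vtn: float) -> float:
    """Upper bound on (W/L)_3 / (W/L)_1 so that V1 stays below V_T,2 during a read."""
    return 2.0 * (vdd - 1.5 * vtn) * vtn / (vdd - 2.0 * vtn) ** 2


def max_write_ratio(vdd: float, vtn: float, vtp: float, mu_ratio: float) -> float:
    """Upper bound on (W/L)_5 / (W/L)_3 so that V1 can be pulled below V_T,2.

    vtp is the (negative) pMOS threshold voltage, mu_ratio = mu_n / mu_p.
    """
    return mu_ratio * 2.0 * (vdd - 1.5 * vtn) * vtn / (vdd + vtp) ** 2


# Assumed example values: VDD = 5 V, V_T,n = 1 V, V_T,p = -1 V, mu_n/mu_p = 2.5
print("read :", round(max_read_ratio(5.0, 1.0), 3))
print("write:", round(max_write_ratio(5.0, 1.0, -1.0, 2.5), 3))
```

With these assumed values the access transistor M3 must be weaker than the driver M1, while the pull-up M5 may be about as wide as M3.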


SRAM Write Circuitry

Write operation is performed by forcing the voltage level of either column (bit line) to “0”

W   DATA   WB   WBbar   Operation
0   1      1    0       M1 off, M2 on; VC high, VCbar low
0   0      0    1       M1 on, M2 off; VC low, VCbar high
1   X      0    0       M1, M2 off; VC, VCbar high


SRAM Read Circuitry

The read circuitry must detect a very small difference between the two complementary columns (sense amplifier)

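\[
\frac{\partial (V_{o1} - V_{o2})}{\partial (V_C - V_{\overline{C}})} = -g_m \cdot R,
\qquad \text{where } g_m = \frac{\partial I_D}{\partial V_{GS}} = \sqrt{2\,k_n\,I_D}
\]

The gain can be increased by using

• active loads

• cascode configuration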

Precharging of bit lines plays a significant role in the access time!

• The equalization of bit lines prior to each new access (between two access cycles)


Dual Port SRAM Arrays

Allows simultaneous access to the same location in the memory array (systems with multiple high speed processors).

• Eliminates wait states for the processors during data read operation

• Problems can occur if:

– two processors attempt to write data simultaneously onto the same cell

– one processor attempts to read while the other writes data onto the same cell

• Solution: contention arbitration logic


Summary of the SRAM properties

– 6 transistors required (layout area about 100 F^2)

– Circuit is always in a stable state

– Current/Power consumption only by change of state

– Area required: approx 3 * area of an inverter

– Very fast read and write cycles


Introduction to the DRAM cell

• The typical DRAM cell consists of 1 Transistor / 1 Capacitor

[Figure: 1T/1C cell with wordline (WL), bitline (BL) and storage capacitor CS; timing diagrams of VWL, VBL and VCS over t with levels VDD, VDD/2 and VDD-Vth]

WRITE:

– WL activation: transistor on

– Writing a 1 (or 0) to BL and to CS

– WL deactivation: CS isolated, transistor is off

Introduction to the DRAM cell

[Figure: 1T/1C cell with wordline (WL), bitline (BL), storage capacitor CS and bitline capacitance CBL; timing diagrams of VWL, VBL and VCS over t with levels VDD, VDD/2 and VDD-Vth]

READ:

– Loading BL to VDD/2; BL not driven

– WL activation: transistor on

– Transferring CS charge to BL towards a sense amplifier

– CBL >> CS !
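Because CBL >> CS, the read operation produces only a small voltage change on the bitline. A minimal charge-sharing sketch follows; it assumes ideal capacitors and an ideal switch, and the capacitance and voltage values are illustrative assumptions rather than figures from the lecture.

```python
def bitline_swing(cs: float, cbl: float, v_cell: float, v_precharge: float) -> float:
    """Bitline voltage change after the cell capacitor CS is connected.

    Charge conservation: (CS + CBL) * V_final = CS * v_cell + CBL * v_precharge,
    so the swing is (v_cell - v_precharge) * CS / (CS + CBL).
    """
    return (v_cell - v_precharge) * cs / (cs + cbl)


# Assumed example: CS = 30 fF, CBL = 300 fF, VDD = 1.8 V, BL precharged to VDD/2
VDD = 1.8
swing_one = bitline_swing(30e-15, 300e-15, v_cell=VDD, v_precharge=VDD / 2)
swing_zero = bitline_swing(30e-15, 300e-15, v_cell=0.0, v_precharge=VDD / 2)
print(f"stored 1: {swing_one * 1e3:+.1f} mV, stored 0: {swing_zero * 1e3:+.1f} mV")
```

The sense amplifier has to resolve this small signal, and the read is destructive, so the cell must be rewritten afterwards.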


DRAM realization (Trench)

Figure labels:

• Single-sided buried strap (= cell contact)

• Deep trench isolation: strap cut, isolation collar

• Wordline (= gate)

• CB (contact to bitline)

• Bitline

• Deep trench: common electrode, storage electrode

• Current path


DRAM Stack realization (buried wordline)




Summary of the DRAM properties

– Only one transistor / one capacitor needed: very efficient and cheap! (area currently 8 F^2 to 6 F^2, path to 4 F^2 demonstrated)

– Capacitor is leaking, therefore refresh cycles required

– Very low area for realization required!

– Somewhat slower read & write cycles compared to SRAM


High End: Graphics DRAM

Qimonda 512 Mbit GDDR5, 2008

• Die size: 11326.74 µm x 9898 µm (area: 112 mm^2)

• Application area: high-end graphics cards (ATI HD4870), up to 6 Gbit/s per pin (HD4870: 115 GB/s)

• Technology: 75 nm, 3 metal layer interconnect (Al, W)

• 750 Mio transistors

• Selling price: at launch time about 8 US$


Flash

• Flash := non-volatile memory (10 years), electrically programmable & erasable

– EEPROM: single bytes erasable

– Flash: large blocks erasable

• applications: camera, mobile, chip card, solid state disk / storage, ...

• storage element := MOS transistor with adjustable threshold voltage: transistor on <-> off


Flash Introduction

[Figure: cell cross-section with source, drain, substrate, floating gate, control gate, TOX and thick dielectric (gate coupling); cell image: Samsung; characteristic Idrain versus Vgate]

• Floating gate: charge storage, completely encapsulated, keeps charge for 10 years

• TOX thickness ca. 8 nm


Flash Introduction

The 2 storage states (read with Vcontrol gate = 2.5 V, Vdrain = 1 V):

• Negative charge in floating gate => no current Is-d

• Neutral (or positive) charge in floating gate => current along channel, Is-d = 30 µA

[Figure: cell cross-sections and Idrain versus Vgate characteristics for both states, read point at Vgate = 2.5 V]


Flash Introduction

Electrical programming & erase

[Figure: two cell cross-sections with source and drain at 0 V; -20 V and +20 V applied; electrons (e-) move in opposite directions]

• ∆Vt ≈ 6 V, ∆Q ≈ 500 electrons

• Leakage rate < 1 electron per week !
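The numbers on this slide can be cross-checked with two lines of arithmetic. The sketch below is illustrative only: it treats the threshold shift as ∆Vt = ∆Q / C with a single effective capacitance (a simplifying assumption) and compares the stored charge with the charge lost at the quoted leakage rate over a 10-year retention time.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def effective_capacitance(n_electrons: float, delta_vt: float) -> float:
    """Effective capacitance implied by delta_Vt = delta_Q / C (simplified model)."""
    return n_electrons * ELEMENTARY_CHARGE / delta_vt


def electrons_lost(rate_per_week: float, years: float) -> float:
    """Electrons lost at a constant leakage rate over the given time."""
    return rate_per_week * years * 365.25 / 7.0


c_eff = effective_capacitance(500, 6.0)
lost = electrons_lost(1.0, 10.0)
print(f"effective capacitance: {c_eff * 1e18:.1f} aF")
print(f"electrons lost in 10 years at 1 e-/week: {lost:.0f}")
```

At one electron per week the cell would lose roughly as many electrons in 10 years as are stored, which is why the leakage rate has to stay below this value.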


Flash Introduction

Electric Field

[Figure: electrical programming & erase, as on the previous slide]

• Thunderstorm flash lightning: 0.03 MV/cm

• Flash transistor programming field: 10 MV/cm


NAND Flash Memory Architecture

[Figure: cell array with memory cells at the crossings of wordlines and bitlines; decoder, amplifier, control signal, input/output, decoding of bits; NAND string: GSL, WL1, WL2, ..., WL32, BSL]

• Bit accessed at the intersection of wordline and bitline (amplifier / decoder)

• NAND has a string of 32 cells, with select transistors at the ends

• 4 F^2 cell size with 2/4 bits per cell

• Smallest cell sizes of all memories

• But: slow random single-bit access

• Usage for large data storage (fast serial data access)

• System level solves limitations of serial access


NOR Flash Memory Architecture

NOR cells are fully in parallel

Random access / erase, but low density

Usage for execute in Place (XiP) (no need to copy into RAM before executing)


Model: Trap assisted Tunneling

Mechanism of Slow Charge Loss


Floating Gate vs Trapping

[Figure: band diagrams (EC, EV) of a floating-gate stack (Si / oxide / poly / ONO / poly) and a trapping stack (Si / ONO / poly)]

Cell optimisation: intense work on dielectrics / energy barriers

2 options to bring the charge into the poly or nitride layer: Fowler-Nordheim (FN) tunneling or channel hot electron (CHE) programming


Floating gate – today‘s winner for Data Flash!

[Figure: IDrain versus Vgate; programming: electrons in floating gate, Vt high; erase: no electrons in floating gate, Vt low; Vt shift corresponds to 500-1000 electrons]

• Different levels of Vt allow storage of 2 bits/cell

• Program / erase by Fowler-Nordheim tunneling

• Energy barrier required to secure 10 years of retention

• High voltage (10-20 V)

• Slow prog./erase (µs-ms)


Summary of Established Memory Technologies

Every established memory technology has shortcomings.

Quest for universal memory that combines non-volatility with high speed, high write endurance and a small size

Established memory technologies:

                           SRAM            DRAM           NAND Flash                 NOR Flash
Cell size per bit in F^2   100             8 .. 6 .. (4)  2                          5
Retention time             ∞ (with power)  64 ms          10 yrs                     10 yrs
Random read access         2-100 ns        30 ns          10 µs                      90 ns
Random write access        2-100 ns        30 ns          100 µs (erase 100 ms)      10 µs (erase 100 ms)
Endurance                  >10^15          >10^15         >10^15 read, 10^5 write    >10^15 read, 10^5 write


19. ASIC Design Guidelines


Introduction

• The following design guidelines have been adapted from [2]: European Silicon Structures (ES2), Zone Industrielle, 13106 France. Solo 2030 User Guide, e02a02 edition, June 1992

• These recommendations are useful in order to avoid functional faults and get the desired functionality


Synchronous Circuits (1)

• All data storage elements are clocked

• The same active edge of a single clock is applied at precisely the same time to all storage elements



• NON-RECOMMENDED CIRCUITS:

– Flip-flop driving clock input of another flip-flop:

– The clock input of the second FF is skewed by the clock-to-q delay of the first FF and is not activated at every active clock edge (e.g. ripple counter)



• NON-RECOMMENDED CIRCUITS:

– Gated clock line:

– Clock skew caused by gating the clock line (e.g. multiplexer in clock line)



• NON-RECOMMENDED CIRCUITS:

– Double-edged clocking:

– FFs are clocked on the opposite edges of the clock signal

– Insertion of scan-path impossible

– Difficulties in determining critical path lengths



• NON-RECOMMENDED CIRCUITS:

– Flip-flop driving asynchronous reset of another flip-flop:

– The synchronous design principle that all FFs change state at exactly the same time is not fulfilled

• Recommended Circuits will be described during the following sections


Clock Buffering (1)

• NON-RECOMMENDED CIRCUITS:

– Unequal depth of clock buffering:

– causes clock skew


Clock Buffering (2)

• NON-RECOMMENDED CIRCUITS:

– Unbalanced fanout of clock buffers:

– Clock skew by different load-dependent delays

– Excessive clock fanout should be avoided (slow edges)


Clock Buffering (3)

• Recommended circuits:

– Balanced clock tree buffering

– Same depth of buffering

– Same fanout

– Limited fanout in order to achieve sharp clock edges


Clock Buffering (4)

• Recommended circuits:

– Combined geometric/tree buffering

– Using intermediate buffer of suitable strength at each fanout point


Gated Clocks (1)

• NON-RECOMMENDED CIRCUITS:

– Multiplexer on clock line:

– Signal change at multiplexer input can cause a glitch at the clk input (FF captures invalid data)

– Gating the clock line introduces clock skew


Gated Clocks (2)

• Recommended circuits:

1) Enabled (E-type) flip-flop

2) Toggle (T-type) flip-flop


Double-edged Clocking (1)

• NON-RECOMMENDED CIRCUITS:

– Pipelined logic with double-edged clocking:

– Not recommended in context with scan-path methods


Double-edged Clocking (2)

• Recommended circuits:

– Pipelined logic with single-edged clocking:


Asynchronous Resets (1)

• NON-RECOMMENDED CIRCUITS:

– Flip-flop driving the asynchronous reset of another flip-flop:



• Recommended circuits:

– Global asynchronous reset by external signal:



• Recommended circuits:

– Flip-flop driving the synchronous reset of another flip-flop:


Shift Registers (1)

• NON-RECOMMENDED CIRCUITS:

– Shift register with forward or reverse chain of clock buffers:

– Internal clock skew can cause data fallthrough


Shift Registers (2)

• Recommended circuits:

– Shift register with balanced tree of clock buffers:


Asynchronous Inputs (1)

• NON-RECOMMENDED CIRCUITS:

– Circuits with complicated feedback loops to capture asynchronous inputs (very sensitive to noise, and functionality can be influenced by placement and routing delays)



• Recommended circuits:

– Chain of two or more D-type flip-flops for capturing an asynchronous input:

– The probability of propagating a metastable state is decreased with increasing number of register stages



• Recommended circuits:

– Use of 4-bit register as shift register for capturing an asynchronous input:

– The probability of propagating a metastable state is decreased with increasing number of register stages



• Recommended circuits:

– Asynchronous handshake circuit:



• The asynchronous handshake circuit works as follows:

a) The first flip-flop is reset asynchronously when the r input is zero or when the qb outputs of the second and the third FF both have the value 0

b) The q-output of the first FF is asynchronously set to high when a positive edge arises at its ck-input

c) The high output of the first FF is propagated through the second and the third FF in the two following cycles. The q-outputs of these FFs are set to zero and the reset logic for the first FF is activated. Now the first FF is ready to receive another edge at its input.



d) Three cases of metastability caused by simultaneously rising edges of the asynchronous input and the system clock:

1) the second FF stabilizes to q=1 before the next rising clock edge (circuit works as desired)

2) the second FF settles to q=0 and the third FF remains in its state. Since the output q of the first FF is high, the propagation of this output works correctly, but it needs one cycle more than in the first case.

3) The metastable state of the second FF is still there at the next rising edge of the clock signal. Then the third FF also becomes metastable. The probability of receiving a metastable d (internal) signal can be reduced by increasing the length of the register chain.



• Operation of asynchronous handshake circuit:


Delay Lines and Monostables (1)

• NON-RECOMMENDED CIRCUITS:

– In general, it cannot be recommended to build circuits with a functionality that relies on delays.

– E.g. monostable pulse generator:



• NON-RECOMMENDED CIRCUITS:

– Pulse generator using flip-flop:

– Multivibrator:



• Recommended circuits:

– Synchronous pulse generator:

– Usage of higher clock speed

– Minimum time resolution is given by clock cycle


Bistable Elements (1)

• NON-RECOMMENDED CIRCUITS:

– Cross-coupled flip-flops and RS-flip-flops

– Bistable storing elements formed by cross-coupled NAND or NOR gates:



• NON-RECOMMENDED CIRCUITS:

– Asynchronous RS-flip-flop:



• Recommended circuits:

– Use D-types with set/reset

– Use latch configured as RS flip-flop:


RAMs and ROMs in Synchronous Circuits (1)

• Problem: RAMs are double-edge triggered. The address is latched on the opposite edge to the data

• Timing scheme:



• Recommended circuits:

– Interfacing RAM into synchronous circuit: ME and WEbar generation



• Recommended circuits:

– Using flip-flop for WEbar generation: timing scheme



• Recommended circuits:

– Avoiding floating RAM/DPRAM output propagation


Tristates (1)

• NON-RECOMMENDED CIRCUITS:

– Tristate bus with non-central enable control:


Tristates (2)

• Recommended circuits:

– Tristate bus with central control of all tristate enable signals and one additional driver that is activated on non-controlled states


Tristates vs. Multiplexer

Tristates:

– large area

– limited buffering

– large routing load, therefore slow

Multiplexer:

– small area

– efficient routing

• Control decoding expense is the same for tristates and multiplexers.

• Multiplexers are more favourable


Parallel Signals

• NON-RECOMMENDED CIRCUITS:

– Wired-OR part used to create higher fanout:

• Recommended circuits:

– High-fanout buffer replacing wired-OR part


Fanout (1)

• NON-RECOMMENDED CIRCUITS:

– Excessive fanout on control signals:


Fanout (2)

• Recommended circuits:

– Geometric buffering on control signal:


Fanout (3)

• Recommended circuits:

– Tree buffering on control signal:


Design for Speed (1)

• Use a maximum of 2 inputs on all combinational logic gates:

• Use AOI logic (complex cells from standard cell library) where possible. The figure below shows a multiplexer using AOI logic:



• Feed late changing inputs late into combinational logic:

• Use shift (Johnson) counters instead of binary counters:

q0 q1 q2 q3

0 0 0 0

1 0 0 0

1 1 0 0

1 1 1 0

1 1 1 1

0 1 1 1

0 0 1 1

0 0 0 1

0 0 0 0
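The state sequence in the table can be generated with a short behavioural model of a Johnson (twisted-ring) counter. This is an illustrative sketch, not the lecture's circuit: the inverted output of the last stage is fed back to the first stage, so exactly one bit changes per clock cycle, which keeps the decoding logic shallow and glitch-free.

```python
def johnson_sequence(n_bits: int, steps: int):
    """Return the list of states (q0..q{n-1}) visited over `steps` clock cycles."""
    state = [0] * n_bits
    states = [tuple(state)]
    for _ in range(steps):
        # shift right by one stage; first stage receives the inverted last output
        state = [1 - state[-1]] + state[:-1]
        states.append(tuple(state))
    return states


seq = johnson_sequence(4, 8)
for s in seq:
    print(*s)
```

An n-bit Johnson counter has 2n states (8 for 4 bits), compared with 2^n for a binary counter of the same width.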



• Use duplicate logic to reduce fanout:

• Use fast library cells where available

• Reduce length of critical signal paths

• Use Schmitt trigger inputs in noisy environments


Design for Testability (1)

• Testability = Controllability + Observability

• NON-RECOMMENDED CIRCUITS:

– Circuit with inaccessible internal logic: only the first block is controllable, and only the last block is directly observable



• Recommended circuit:

– Insert test inputs and outputs



• NON-RECOMMENDED CIRCUITS:

– Chain of counters: first counter is not directly observable and second counter is not directly controllable



• Recommended circuit:

– Break long counter / shift register chains

– Chain of counters broken by test input tc and output signals:



• NON-RECOMMENDED CIRCUITS:

– Counter with closed feedback loop: initial state is not known



• Recommended circuit:

– Open feedback loops

– Counter with feedback loop opened by test control tr and output signals:



• Recommended circuits:

– Use BIST (Built-In-Self-Test) with compiled megacells

– Compiled megacell with compiled inputs/outputs:



• Recommended circuits:

– Scan path testing

– E-type scan path flip-flop (right):

– Circuit with scan path (below):



• Recommended circuits:

– Use of JTAG boundary scan path

– JTAG test circuitry:


20. Testing and Design for Testability

Motivation

• Stable chip manufacturing costs

• Increasing testing costs:

– Increasing number of gates/device

– Limited number of pins

– Increasing number of internal states

– Increasing logical and sequential depth

• Testability has to be considered in all phases of design

• Example: testing of a combinational circuit with n inputs (10 MHz, one test per cycle)

n     time for test
25    3 s
30    107 s
40    1 day
50    3.5 years
60    3656 years
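The table entries follow from applying all 2^n input patterns at one pattern per clock cycle. A minimal sketch (the function name is illustrative, and a year is taken as 365 days):

```python
def exhaustive_test_time_s(n_inputs: int, test_rate_hz: float = 10e6) -> float:
    """Seconds needed to apply all 2^n input patterns at one pattern per cycle."""
    return 2 ** n_inputs / test_rate_hz


SECONDS_PER_DAY = 86400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

for n in (25, 30, 40, 50, 60):
    t = exhaustive_test_time_s(n)
    print(f"n = {n}: {t:.3g} s = {t / SECONDS_PER_DAY:.3g} days "
          f"= {t / SECONDS_PER_YEAR:.3g} years")
```

Every additional input doubles the test time, which is why exhaustive testing is only feasible for small combinational blocks.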


Economical Considerations (1)

• Average Quality Level (AQL):

\[
\mathrm{aql} = \frac{\#\,\text{Defective Parts}}{\#\,\text{Accepted Parts}}
\]



• Correlation: Fault Coverage and Defective Parts

– DL (= AQL): defect level; number of defective circuits which have been classified as working correctly (testing with fault coverage T)

– Y: yield

– T: fault coverage

\[
DL = 1 - Y^{\,1 - T}
\]



Defect level as function of yield and fault coverage
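The relation DL = 1 - Y^(1-T) between defect level, yield and fault coverage can be explored numerically. The sketch below also inverts the formula to obtain the fault coverage needed for a target defect level; the yield and coverage values are arbitrary examples, not data from the lecture.

```python
import math


def defect_level(yield_: float, coverage: float) -> float:
    """Defect level DL = 1 - Y^(1 - T) for yield Y and fault coverage T (both 0..1)."""
    return 1.0 - yield_ ** (1.0 - coverage)


def required_coverage(yield_: float, target_dl: float) -> float:
    """Fault coverage T needed to reach a target defect level (inverse of the above)."""
    return 1.0 - math.log(1.0 - target_dl) / math.log(yield_)


# Assumed example: 50 % yield
for t in (0.0, 0.9, 0.99, 1.0):
    print(f"T = {t:.2f}: DL = {defect_level(0.5, t):.4f}")
print(f"T needed for DL = 1 % at Y = 50 %: {required_coverage(0.5, 0.01):.4f}")
```

With zero coverage the defect level equals 1 - Y, and only full coverage brings it to zero.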


Design Flow: Testing (1)


Design Flow: Testing (2)

• Chip Test after Manufacturing:

Manufacturing Process

Parametric Test (current/power dissipation)

(erroneous chips are marked with color points and removed after sawing)

Chip Test on Tester


Fundamental Definitions

• Relationship between faults, errors and failures: fault → error → failure

• Fault: physical defect, imperfection or flaw which occurs in a hardware or software component

• Error: manifestation of a fault (erroneous information on a hardware line or in a program, caused by a fault)

• Failure: malfunction of a system

• Three-universe model of a system:

– Physical universe: faults

– Informational universe: errors

– External universe: failures


Fault Models (1)

• Basis: physical phenomena

– Oxide defects

– Missing implants

– Lithographic defects

– Junction defects

– Metal shorts & opens

– Moisture accumulation

– Impurities / Contaminations

– Static discharge

• Examples for physical faults:


Fault Models (2)


Fault Models for Gates (1)

• The GATE model: Stuck-at

– stuck @0

– stuck @1

– 1 fault at a time (single-stuck)

PHYSICAL (analog)

LOGICAL (digital)



• Issue: complexity

– as 1 model: 12 faults

– as 12 gates: 30 (collapsed) faults, 12x larger netlist, 30x computation

– as 60 transistors: 90 (collapsed) faults, 60 transistors, 400x computation



• The controversy:

– IBM: with comprehensive stuck-at testing, no empirical need for MOS fault models

– UNISYS: MOS model required for < 1% AQL



• The MOS problem: gates → memory

• Example: the output floats

– Fault-free: C always driven

– Fault: C un-driven; assumes last value; sequential!

• Need 2-pattern test

– set C to opposite

– test

branch   Set (A B)                               Test (A B)
a        0 0, 0 1 or 1 0 (anything works!)       1 1
b        1 1                                     0 1
c        1 1                                     1 0


Fault Tolerant Design (1)

• Fault tolerance achieved by redundancy techniques:

– Duplication with Complementary Logic

– Self-Checking Logic

– Reconfigurable Array Structures

Fault detection by duplication with complementary logic



4-by-4 array with one spare column



Reconfigured array


Test Pattern Generation (1)

• manually

• pseudo random (leads up to 60% fault coverage)

• algorithmic

• special test patterns for RAMs

• fault coverage sufficient? → fault simulation


The D-Algorithm (1)

• Every test generation procedure has to solve the following problems:

– Creation of a change at the faulty line

– Propagation of the change to the primary output line

• In the D-Algorithm the symbols D and Dbar are used to refer to the changes. D and Dbar are used as follows:

– D: used if a line has the value 1 in absence of a fault and the value 0 in case of a fault occurrence

– Dbar: used if a line has the value 0 if no fault occurs and otherwise the value 1

• The D-algorithm method for path sensitization consists of two principal phases:

– forward drive (propagation) of a D-value to a primary output

– backward trace (consistency operation)

• These two steps are iterated for different propagation paths for the D-value from one dedicated internal point i to one dedicated primary output point o until the backward trace phase is finished without any contradiction (a test vector for a fault at i has been found) or until all possible paths from i to o have been examined.


The D-Algorithm (2)

Basic concept of D-algorithm


The D-Algorithm (3)

• A primitive D-cube of a failure is a D-cube associated with a fault l/α on the output line l of a gate G. This produces the value D or Dbar on l, and the input lines have values which would produce the complement of α on l in the fault-free case.

Primitive D-cube of fault (pdcf) for two-input NAND gate


The D-Algorithm (4)

• A propagation D-cube of a failure specifies the propagation of changes at one (or more) inputs of a gate G to its output l.

Propagation D-cube (pdc) for two-input NAND gate
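The propagation D-cubes of a gate can be derived mechanically by evaluating the gate twice, once for the fault-free circuit and once for the faulty circuit. The sketch below uses this pair representation (an illustrative encoding, with "Dbar" written out as a string); it enumerates the fully specified input combinations of a two-input NAND gate that pass a D or Dbar to the output, without the don't-care compaction a printed cube table would use.

```python
# Each signal value is a pair (fault-free value, faulty value)
VALUES = {"0": (0, 0), "1": (1, 1), "D": (1, 0), "Dbar": (0, 1)}
NAMES = {pair: name for name, pair in VALUES.items()}


def nand_d(a: str, b: str) -> str:
    """NAND in the D-calculus: apply NAND separately to both circuit copies."""
    good_a, bad_a = VALUES[a]
    good_b, bad_b = VALUES[b]
    return NAMES[(1 - (good_a & good_b), 1 - (bad_a & bad_b))]


# Input combinations whose output still carries a fault effect (D or Dbar)
propagation_cubes = [
    (a, b, nand_d(a, b))
    for a in VALUES
    for b in VALUES
    if nand_d(a, b) in ("D", "Dbar")
]
for a, b, out in propagation_cubes:
    print(f"{a:>4} {b:>4} -> {out}")
```

A D on one input is propagated (inverted) only if the other input is at the non-controlling value 1; a 0 on the other input blocks it.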


The D-Algorithm (5)

• A singular cover of a gate G is a 0, 1, X truth table representation of G.

Singular cover for two-input NAND gate


The D-Algorithm (6)

Singular covers for several basic logic gates


The D-Algorithm (7)

Construction of the singular cover of a logic module


D-Algorithm Example (1)

• In the following the D-Algorithm is illustrated for the example circuit given below:



Propagation D-cube table



Singular cover table



D-cube intersection table



• Running the D-Algorithm for generating a test for line 5/0:

1) Start with the D-cube for the fault 5/0:

2) The D of line 5 is automatically propagated to lines 6 and 7 by cube j

3) Now the propagation along path 6 → 9 → 11 is considered: D on line 6 is propagated to line 9 by cube d. Combining d and k yields cube l:



• Running the D-Algorithm (continued):

4) If cube i is used with D̄ instead of D, the propagation to the output can be done:

5) Now the consistency phase is started and a value for line 4 has to be found. From the singular cover table it can be seen that a 0 on line 10 implies that both line 7 and line 8 are 1. In cube m line 7 is a D (and also line 5, which is connected to 7 by j), and this D would now have to be set to 1. This is a contradiction that disables the path sensitization 5 → 6/7 → 9 → 11.



• Running the D-Algorithm (continued):

6) Starting the propagation along 5 → 7 → 10 → 11 leads to the following cube:

7) From the singular cover table we get the information that a 1 on line 8 is the same as a 0 on line 4. Additionally, it can be seen that the 0 on line 9 can be obtained by a 1 on line 1.

8) This yields the final cube:

1 1 1 0 D D D 1 0

9) A test vector for line 5/0 is given by:

1 1 1 0

D D


Fault Simulation

• Algorithms: Serial Fault Simulation

• Improved Algorithms:– Parallel Fault Simulation

– Concurrent Fault Simulation

discussed in CAD lecture



• Circuit level: restriction of physically possible faults

• Logic level: restrict possibilities of realizations

• System level: restrict size of component and number of states

Testability:

• controllability

• observability

• additional chip area required

• shorter design cycle

Methods to improve controllability and observability:

• ad-hoc techniques

• structured approaches



Design for testability: complex gate (a) not testable with stuck-at model; (b) fully testable with stuck-at model



• Ad-Hoc Techniques:

– developed for special design

– less silicon area

– design automation almost impossible

– partitioning (test of circuit components by use of dedicated multiplexers)



Ad-hoc techniques: partitioning for testability



Ad-hoc techniques: insertion of register in order to limit logic depth to a given maximum value



Ad-hoc techniques: test shift registers for PLA test (increasing PLA area)


Scan-Path Methods (1)

• Main idea: test of sequential network is reduced to test of combinational network

• for circuits consisting of logic with some feedbacks

• can be realized by reconfiguration of latches as shift registers (two modes of use)

Feedback logic with scan-path



• Test scan-path / register function first:

– Flush test ( 0...010...0 ) or

– Shift test ( 00110011... ) (each register transfer is tested by this combination: 0 → 0, 0 → 1, 1 → 1, 1 → 0).

• Cycle for testing combinational logic function:

1) Scan mode: Preload Y and set PI

2) System operation mode: Wait until inputs of Y are steady. Clock new state into Y.

3) Shift state out. Compare PO and state values with expected responses.



• Advantages:

– Testability of clocked circuits is improved and guaranteed at design stage

– Consistent with good VLSI design practice (rules, abstraction, modularity, ...)

– Does not require special CAD

• Disadvantages:

– Wastes silicon

– Constrains designer to design according to given conditions

– Additional complexity

• Overhead:

– ≈ 2% for a fundamentally ‘structured’ design

– ≈ 30% for ‘wild’ logic


Built-In Tests (1)

• System generates test vectors by its own

• Analysis and evaluation of test vectors is also automatically done

• Compromise: silicon area versus testability

Test Pattern Generators:

• Test patterns are generated inside the circuit to be tested

• Short design time, simple test programs, self-test

• Example: Test pattern memories, deterministic generators, counter


Built-In Tests (2)

Two examples for built-in test pattern generators


Built-In Tests (3)

• Pseudo Random Number Generators:

– used as pseudo random pattern generator

$$x_i(t) = x_{i-1}(t-1) \quad \text{for } 2 \le i \le n$$

$$x_1(t) = \sum_{i=1}^{n} k_i \cdot x_i(t-1) \pmod 2$$

$$K(x) = k_n x^n + k_{n-1} x^{n-1} + \dots + k_1 x + k_0$$


Built-In Tests (4)

• Pseudo Random Number Generators:

– Example for pseudo random pattern generator:

$$K(x) = x^4 + x + 1$$
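A shift register of this kind is easy to try out in software. The following Python sketch is an illustration only and not part of the lecture material. It assumes that the state is stored as the list [x_1, ..., x_n], that the coefficients k_1 ... k_n of K(x) select the stages fed back into x_1 (modulo 2), and that every other stage takes over the value of its predecessor. The helper name `lfsr_states` is hypothetical.

```python
def lfsr_states(k, seed):
    """Return the states of an n-stage LFSR, starting at seed, until seed recurs.

    k    -- feedback coefficients [k_1, ..., k_n] taken from K(x)
    seed -- initial state [x_1, ..., x_n]; should not be all zero
    """
    n = len(k)
    state = list(seed)
    states = []
    # A register with n stages has at most 2**n distinct states,
    # so this bound also guards against sequences that never return to seed.
    for _ in range(2 ** n):
        states.append(tuple(state))
        # new x_1 is the modulo-2 sum of the selected stages
        feedback = sum(ki * xi for ki, xi in zip(k, state)) % 2
        # every other stage takes the old value of its predecessor
        state = [feedback] + state[:n - 1]
        if state == list(seed):
            break
    return states


# K(x) = x^4 + x + 1  ->  k_1 = 1, k_2 = 0, k_3 = 0, k_4 = 1
states = lfsr_states([1, 0, 0, 1], [1, 0, 0, 0])
for s in states:
    print(s)
```

Under these assumptions the 4-stage register passes through all 15 non-zero states before the seed repeats, which is the property that makes it usable as a pseudo random pattern generator.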


Evaluation of Testing Data (1)

• Evaluation of testing results inside the circuit

• Counting techniques, signature analysis

Example: Counting techniques for test data evaluation

$$F \approx 1 - \frac{1}{\sqrt{\pi \cdot m}}$$



• Signature analysis

– Communication technique: coding theory

– Code words: data stream D, polynomial P(x), division modulo 2

– Evaluation of testing data

$$\frac{D}{P} = Q + \frac{R}{P}$$
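The division modulo 2 of a data stream D by a polynomial P(x), with quotient Q and remainder R (the signature), can be sketched as follows. This is an illustration under assumptions, not the lecture's implementation: polynomials are encoded as Python integers whose bits are the coefficients (most significant bit = highest power), and the function names `gf2_divmod` and `gf2_mul` as well as the example bit patterns are hypothetical.

```python
def gf2_divmod(dividend, divisor):
    """Polynomial division modulo 2; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("polynomial divisor must be non-zero")
    quotient = 0
    remainder = dividend
    divisor_len = divisor.bit_length()
    # Cancel the leading term of the remainder until its degree is below deg(P).
    while remainder.bit_length() >= divisor_len:
        shift = remainder.bit_length() - divisor_len
        quotient ^= 1 << shift
        remainder ^= divisor << shift  # subtraction modulo 2 is XOR
    return quotient, remainder


def gf2_mul(a, b):
    """Carry-less (modulo 2) polynomial multiplication, used for checking."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


P = 0b10011        # x^4 + x + 1
D = 0b110101101    # example data stream (arbitrary)
Q, R = gf2_divmod(D, P)
print(bin(Q), bin(R))
```

The remainder R always has fewer bits than P, so an n-th degree polynomial compresses a data stream of any length into an n-bit signature.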



Example: Test data evaluation by signature analysis



• Signature analysis: Degree of Fault Recognition

1) Length of sequence: m bit → $2^m$ sequences possible

2) One sequence contains no faults → number of erroneous sequences is $2^m - 1$

3) Length of signature register: n bit → $2^n$ signatures

4) $2^m$ sequences are mapped on $2^n$ signatures → number of non-detectable faults is:

$$\frac{2^m}{2^n} - 1 = 2^{m-n} - 1$$

5) Probability of non-detection of an erroneous sequence: number of non-detectable faults divided by number of possible faults:

$$F_N = \frac{2^{m-n} - 1}{2^m - 1}$$

6) Fault detection rate:

$$F = 1 - \frac{2^{m-n} - 1}{2^m - 1} \approx 1 - 2^{-n}$$



• Interpretation:

– all faults recognized if m < n (trivial)

– long sequences: only n is important

– n = 16 bit → F = 99.9985 %

• Parallel signature register with k inputs:

$$F = 1 - \frac{2^{mk-n} - 1}{2^{mk} - 1}$$
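The fault detection rate of a signature register can be evaluated numerically to see how little it depends on the sequence length. The sketch below is illustrative only. It assumes the rate F = 1 - (2^(mk-n) - 1) / (2^(mk) - 1) for m-bit sequences, an n-bit signature register and k parallel inputs, and it treats the case mk ≤ n as full detection. The function name `detection_rate` is hypothetical.

```python
from fractions import Fraction


def detection_rate(m, n, k=1):
    """Fault detection rate of an n-bit signature register (exact fraction)."""
    total_bits = m * k
    if total_bits <= n:
        # Every sequence can get its own signature: nothing is masked.
        return Fraction(1)
    undetectable = 2 ** (total_bits - n) - 1
    erroneous = 2 ** total_bits - 1
    return 1 - Fraction(undetectable, erroneous)


for m in (32, 1000):
    print(m, float(detection_rate(m, 16)) * 100)
```

For long sequences the result approaches 1 - 2^(-n), so with n = 16 the detection rate is about 99.9985 % regardless of m.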


Built-in Logic Block Observation (1)

• A BILBO register is a universal element for use in either a scan-path environment or a self-test (signature analysis) environment.

BILBO register: 1. full circuit, 2. normal use, 3. scan-path, 4. signature analysis



• Advantages:

– Versatility

• Normal operation

• Scan-path test: enhances testability

• Test vector generation via LFSR

• Data compression via LFSR

• Combined scan-path/self-test using LFSRs

• Disadvantages:

– silicon area

• BILBO latch can be ≈ 50% larger than an ordinary latch



Example: Self-testing circuit (figure labels: feedback disconnect, open in test mode; test clock; binary up-counter; decoder; pass gate; red LED, green LED; go / no go output)

For clarity, mode control lines, normal system clocks, and preset/clear facilities have been omitted


21. Future Trends:

Design of robust Circuits and Systems under Consideration of Reliability Constraints

Overview

• Introduction and Definitions

• Reliability Challenges for nano-scaled CMOS Technologies

• Reliability Challenges for Technologies based on new Material Classes: Printed Electronics

• Conclusions and Outlook


Basic Definitions

• Reliability:

... is the ability of a system or a component to perform its required functions under stated conditions for a specified period of time (IEEE)

• Robustness:

Robustness is the quality of being able to withstand stresses, pressures, or changes in procedure or circumstance. A system, organism or design may be said to be "robust" if it is capable of coping well with variations (sometimes unpredictable variations) in its operating environment with minimal damage, alteration or loss of functionality. (Wikipedia)

• Zuverlässigkeit (reliability):

... of a technical product is a property (behavioural characteristic) that indicates how dependably a function assigned to the product is fulfilled within a time interval. It is subject to a stochastic process and can be described qualitatively or quantitatively (by the survival probability); it is not directly measurable. (Wikipedia)

• Robustheit (robustness):

... is the property of a system or process to still function reliably even under unfavourable conditions (Wikipedia)


Reliability: Devices, Components, Systems

• Technology Issue: solve reliability problems in new technologies; adequate technology modeling

• Device Issues: appropriate device models; device and circuit simulation; robust circuit design

• Circuit Design Issue: cope with limited device reliability >> device tolerant design techniques

Device + Device + Device + ... = Component

Component + Component + Component + ... = System

$$R_{C_j} = \prod_{i=1}^{N} R_{D_i}$$

$$R_{S_k} = \prod_{j=1}^{M} R_{C_j}$$

Example: $0.99^{10} = 90.4\,\%$, $0.99^{100} = 36.6\,\%$
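The effect of multiplying individual reliabilities can be checked with a few lines of Python. This is a minimal sketch that assumes every element must work for the next level to work (a pure series structure) and that failures are independent. The function name `series_reliability` is hypothetical.

```python
import math


def series_reliability(reliabilities):
    """Reliability of a structure that needs all of its elements to work."""
    return math.prod(reliabilities)


component = series_reliability([0.99] * 10)    # 10 devices per component
system = series_reliability([0.99] * 100)      # 100 devices in total
print(f"{component:.1%}  {system:.1%}")
```

Even with 99 % reliable devices, a structure of 100 devices in series works in only about a third of all cases, which motivates the device tolerant design techniques mentioned above.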


Reliability: Devices, Components, Systems

• System Design Issue: flexible adaptive systems with masking capability for lower level deviations/defects

• Application Design Issue: select adequate manufacturing technologies, design techniques and system architectures

• Test / Quality Control Issue: test whether the guaranteed system functionality is available

Source: NXP / Spoerle

Source: sees-project.net

Physics / Technology Models Test






Overview


Major Challenges in CMOS IC Design

• Power consumption and design robustness are contradictory in nature: designs for minimizing power consumption lead to reduced reliability

• Solution: joint optimization of power and reliability


Power Consumption

• Traditionally: the driving force behind technology changes: Bipolar → NMOS → CMOS

• Currently: rapidly-growing power densities (90 nm and beyond)

– Causes: exponential growth in subthreshold and gate leakage currents [Sakurai 2004 ISSCC]

• Research for a mature low-power technology alternative to CMOS (10 ... 20 years):

– Single electron transistors

– Spin transistors

– Carbon nanotube FETs

– Ferromagnetic logic devices


Major Challenges

• Inherent Tradeoff: Critical Delay

– Initially: many noncritical delay paths (initial path delay distribution)

– Power optimization: distribution pushed towards the initial critical path delay (timing wall)

• (Near-)Critical Paths:

– affect the yield due to process variability

– require the most additional EDA investment

• Intersection of variability and power (particularly leakage): the most efforts from the CAD community

[Sylvester 2007 ProcIEEE]


Dynamic Power reduction:

$$P_{dyn,tot} = \sum_{i=1}^{N} \alpha_i C_i V_{dd}^2 f$$

(plus short circuit current)

– $\alpha_i$: switching probability

– for each node i, it is not straightforward to determine $\alpha_i$ and $C_i$

• Gate Sizing: linear power reduction; convex problem to be solved (polynomial time); enhanced standard cell libraries

• Clustered Voltage Scaling (CVS): quadratic power reduction; but: delay penalty

– Dual VDD approaches in most cases (power supply overhead!) [Usami 98 JSSC]
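The node-wise sum of the dynamic power and the quadratic effect of supply voltage scaling can be illustrated with a short Python sketch. It assumes the usual form P = Σ α_i · C_i · V_dd² · f and ignores the short circuit current. The node list, its values and the function name `dynamic_power` are made up for illustration.

```python
import math


def dynamic_power(nodes, vdd, f):
    """Total dynamic power for nodes given as (alpha_i, C_i) pairs.

    alpha_i -- switching probability of node i
    C_i     -- switched capacitance of node i in farad
    """
    return sum(alpha * cap for alpha, cap in nodes) * vdd ** 2 * f


# Hypothetical circuit: three nodes with different activity and capacitance.
nodes = [(0.5, 10e-15), (0.1, 25e-15), (0.25, 4e-15)]
p_high = dynamic_power(nodes, vdd=1.2, f=500e6)
p_low = dynamic_power(nodes, vdd=0.6, f=500e6)
print(p_high, p_low, p_high / p_low)
```

Halving the supply voltage reduces the dynamic power by a factor of four, which is the quadratic reduction that clustered voltage scaling exploits at the cost of a delay penalty.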


Static Power Minimization

• Static Power: to be considered in active mode and standby

– Has become a significant contributor to the total power budget

– Particularly a problem for mobile applications

• Leakage Current:

– Affected by the input-vector probability: Stack Effect

– Subthreshold leakage: the dominant contribution

– Gate leakage: relevant in 65 nm and beyond (use high-K)

[Actel]


Static Power: Active Mode Leakage Reduction

• Multi-Vth Assignment:

– Analogous to dual-Vdd assignment, but for leakage power

– Optimal Choice of Vth Values (Opt. Problem): $V_{th,High} - V_{th,Low} \approx 10\,\% \cdot V_{DD}$

– Exponential dependence of leakage current on Vth

– Implementation: post-layout

– Sensitive to Vth variations

• Effective Gate Length (Leff) Biasing:

– introduce longer-than-minimum channel lengths (max 10%)

– very small delay and power penalties

– substantial reduction in leakage

– 54% less worst-case variability! [Gupta 2004 DAC]


Standby Mode Leakage Reduction

[Macii 2007 CLEAN Ws]

• Input Vector Control (IVC)

– Uses the stack effect to reduce leakage

– Force gates to a low leakage state

– Only a few nodes in a design can be assigned to a given state

– Hard Problem: Determine the state that should be forced: heuristics, random sampling

– leakage reduction up to 20% [1999 Johnson TransCAD]

• High-Vth Sleep Transistors

– very large

– area and delay penalties

• Body Biasing:

– Reverse body biasing: worse short-channel effects [Keshavarzi 2002 ISVLSI]

– Current implementations: Forward BB to lower Vth during active mode operation

• Combination of IVC and Dual-Vth:

– Up to 5x higher leakage savings than IVC alone! [Lee 2005 TransCAD]

Figure: Vth (V) versus Vbs (V)


Quantifying the Tradeoff

• Parametric Yield given Timing and Power Constraints: two-sided yield constraint

• Major Concern: yield loss due to power constraints violation

– Leff variations affect:

• Delay and leakage: inversely correlated (opposite sensitivities to Leff)

• Dynamic power: sublinearly

• Leakage power: exponentially!

[Sylvester 2007 ProcIEEE]


Total Power Optimization under Variability

• Dynamic Power:

– Linear dependence on process parameters

– Variations in dynamic power ~ same range as process parameters

• Leakage Power:

– Exponential dependence on process parameters

– Significantly higher leakage current variations

• Efficient methods are required for: statistical analysis and optimization of leakage power

• Combined approaches: Dual-Vdd/Vth: improvements of 15%-45% in total power

• Interconnect design is another important issue!






Overview


TU Darmstadt: Research Center for Printed Electronics

Disciplines: Materials Sciences, Electronics, Chemistry, Printing Technology

Research Topics

• Advanced Materials Synthesis

• Materials Optimization

• Materials Characterization

• Circuit Design

• Antenna Design

• Device Modeling

• Device Testing

• Printing, Processing

• Quality Management

Application Scenario: Printed RFID [Source: PolyIC]


Research Center for Printable Electronics

TUD MerckLab: Joint University / Industry Research Lab

Technology & Design

– Materials

– Printing

– Modeling & Design

Manufacturing Technology >> Printing Technology

Figure labels: Materials Research, Device & Process Models, Circuit Design, Applications


Mixed Level/Domain Models based on Verilog-A: UHF RFID system

• Modeled Components:

– Reader

– Wireless Channel

– Transponder

• Mixed Wave Domain (s-Parameter) and Circuit Modelling


Circuit-level Simulation and Design of a RFID transponder: Rectifier

• Rectifier

– Three-stage modified Villard rectifier

– LC matching network

• Rectifier impedance evaluation:

– Simulation result: 2 kΩ (V̂in = 0.5 V, V+ = 1.5 V)


RFID Reader Technology: 13.56 MHz Interrogator

Figure labels: Analog Front End, Lantronix XPort, Xilinx Spartan3 FPGA Board, Antenna






Overview


Future Directions in IC Design

• Multiple Cores

– Particularly interesting: nonuniform cores (different supply voltages and different power/performance ratios)

– Dedicated hardware accelerators for very low voltages

[IBM Cell Processor]

• Interconnect Design Trends

– Problem: shrinking wires >> larger delays

– Solution requirements:

• Meet stringent timing and signal integrity requirements

• Reduce both static and dynamic power

– Currently: aggressive shielding to avoid highly inductive lines

– Future: improved signaling techniques: low-swing, pulsed signaling, ultra high-speed serial lines, bus encoding

– Global wiring optimization for low power rather than performance

– Adaptive SoC top-level NoC-based interconnection architectures


Future Directions in IC Design

Robust design strategy

Generality and applicability to many optimization tools

Closely related CAD and technology improvements!

• Advanced circuit modeling and characterisation approaches required (simulation)

• New standard cell design approaches based on reliability criteria

• Usage of assertion based verification techniques on component level

• CAD/Design: Multiobjective Optimization (static and dynamic power, performance, yield)

– Parametric yield should be the objective of CAD flows

• Not simply: timing, power, area, ...

– Possible approaches:

• Use SSTA (statistical static timing analysis) with current optimization engines

• Use fast deterministic analysis with variation space sampling [Sylvester 2007]


Conclusions

• NanoScale CMOS:

– Power is the key limiter of Moore's law [Sylvester ProcIEEE 2007]

– Design Goals: low-power and robustness (parametric yield)

– Power and robustness have to be considered on all levels of the design flow

– New CAD techniques for multi-objective optimization needed

– Design of adaptive circuits required (adaptive body biasing has been successful)

– Signal transmission is one of the central future challenges (smart repeaters, pulses)

• New Technologies (e.g. Printed organic/inorganic Electronics):

– Reliability challenge for new manufacturing technologies

– Multi-level and multi-domain modeling required for optimized circuit design

– Realistic physical and design oriented modeling and characterisation of devices

– Technology modeling


Thank you!

Vielen Dank!

谢谢您!



Exercises



1. Exercise: Short Channel MOSFETs



1. Problem: Short Channel MOSFETs

• Complete the table on the next slide (calculate K‘)

• What is the value of κ for a long channel MOSFET?

• Estimate the drain current IDS for both MOSFETs in the ohmic region using the classical expression and using the velocity saturation effect. Compare both results by calculating the percentage of error between the results.

• Calculate the value of VDSAT and compare it with the classical assumption that the device enters saturation when VDS ≥ VGS − VT0

• Find an expression for the on-resistance of short channel devices and estimate the on-resistance for both devices.




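Given the following parameters:

        VT [V]    K‘ [A/V²]    µ [cm²/Vs]
NMOS     0.4                   µn = 1.15·10⁴
PMOS    -0.4                   µp = 3.00·10³

COX = 10⁻⁸ F/cm²

|VGS| = 0.6 V

|VDS| = 0.1 V

L = 0.25 µm

W = 0.75 µm

EC = 1.5·10⁶ V/m

Formulas:

$$\kappa(V_{DS}) = \frac{1}{1 + V_{DS}/(E_C L)}$$

$$I_{DS} = \kappa(V_{DS}) \, \mu C_{OX} \frac{W}{L} \left[ (V_{GS} - V_T) V_{DS} - \frac{V_{DS}^2}{2} \right]$$

$$R_{on} = \left. \frac{1}{g_{DS}} \right|_{V_{DS} \to 0}$$

$$g_{DS} = \frac{\partial I_{DS}}{\partial V_{DS}}$$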
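To get a feeling for the size of the velocity saturation correction, the drain current in the ohmic region can be evaluated with and without the factor κ. The following Python sketch is only an illustration of how such a check could be set up, shown for the NMOS values of this exercise. It assumes K‘ = µ·COX, κ(VDS) = 1 / (1 + VDS / (EC·L)) and the classical ohmic-region current K‘·(W/L)·[(VGS − VT)·VDS − VDS²/2]. All function and variable names are hypothetical.

```python
import math


def k_prime(mu_cm2_per_vs, cox_f_per_cm2):
    """Process transconductance K' = mu * Cox in A/V^2 (cm-based units cancel)."""
    return mu_cm2_per_vs * cox_f_per_cm2


def kappa(vds, e_c, length):
    """Velocity saturation factor; e_c in V/m, length in m."""
    return 1.0 / (1.0 + vds / (e_c * length))


def ids_ohmic(kp, w, length, vgs, vt, vds, e_c=None):
    """Ohmic-region drain current; classical if e_c is None."""
    classical = kp * (w / length) * ((vgs - vt) * vds - vds ** 2 / 2.0)
    if e_c is None:
        return classical
    return kappa(vds, e_c, length) * classical


kp_n = k_prime(1.15e4, 1e-8)
i_classical = ids_ohmic(kp_n, 0.75e-6, 0.25e-6, 0.6, 0.4, 0.1)
i_velsat = ids_ohmic(kp_n, 0.75e-6, 0.25e-6, 0.6, 0.4, 0.1, e_c=1.5e6)
error_percent = (i_classical - i_velsat) / i_velsat * 100.0
print(kp_n, i_classical, i_velsat, error_percent)
```

The same functions can be reused for the PMOS device by inserting the magnitudes of its voltages and its mobility.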



2. Exercise: NMOS and CMOS Inverters

1. NMOS Inverter

Assume three types of NMOS inverters:

a) with resistive load

b) with enhancement MOSFET load

c) with depletion MOSFET load

a) resistive load b) enhancement load

c) depletion load

Iout Iout Iout



Draw the simplified pull-up characteristic of the three types of NMOS inverters shown before.

Use the appended diagram “Pull-Up-Characteristics” for this purpose

Assume VDD = 5 V, RL = 20 kΩ, VT,enh = 1 V, VT,dep = -1 V, λ = 0

The short-circuit current of both inverters with active load is IQ = 0.2 mA

Neglect short channel and body effects of the transistors.



1. NMOS Inverter

The next appended diagram shows the output characteristics of the driver transistor QS.

The low-state output voltage should not exceed 0.8V. Determine graphically, for an input voltage of 2.5V and 3V, how much current the NMOS inverter can sink if:

• a load resistor RL = 20 kΩ is used,

• a depletion transistor with IQ = 0.2 mA is used, neglecting body and short channel effects



1. NMOS Inverter

For the NMOS inverter with saturated enhancement load, the voltage transfer characteristics should be estimated.

Use the appended diagram “Determination of VTC” to determine the Voltage Transfer Characteristic (VTC) of the NMOS inverter with saturated enhancement load graphically. Draw the VTC in the empty diagram “VTC of NMOS-Inverter”.



1. NMOS Inverter

This inverter is characterized by the following parameters:

VDD = 5 V, 2φF = 0.6 V, γ = 0.37 √V, VT0 = 1.0 V, KR = βR = β1/β2 = 8

• Calculate VOH

• Calculate VOL

• Calculate VIH



1. NMOS Inverter

Hints:

• The body effect (influence of the bulk-source voltage) of the load transistor must be taken into account when determining its threshold voltage. Therefore the following equation for the threshold voltage can be used:

$$V_{TH} = V_{T0} + \gamma \left( \sqrt{|2\phi_F| + V_{SB}} - \sqrt{|2\phi_F|} \right)$$

• An equation of type x = f(x) can be solved numerically by starting at any value for x and iteratively calculating f(x) until the result reaches the desired precision.
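The iteration described in the second hint can be sketched in Python. The example below is an illustration under assumptions and not the official solution: it assumes that the high output level of the saturated enhancement load inverter satisfies VOH = VDD − VTH(VSB = VOH), which has the form x = f(x), and it uses the parameter values VDD = 5 V, VT0 = 1.0 V, γ = 0.37 √V and 2φF = 0.6 V. The function names are hypothetical.

```python
import math


def v_th(v_sb, v_t0=1.0, gamma=0.37, two_phi_f=0.6):
    """Threshold voltage of the load transistor including the body effect."""
    return v_t0 + gamma * (math.sqrt(abs(two_phi_f) + v_sb)
                           - math.sqrt(abs(two_phi_f)))


def solve_fixed_point(f, x0, tol=1e-9, max_iter=200):
    """Iterate x <- f(x) until two successive values differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")


v_dd = 5.0
# Assumption: the source of the load sits at the output, so V_SB = V_OH.
v_oh = solve_fixed_point(lambda v: v_dd - v_th(v), x0=v_dd)
print(v_oh)
```

The iteration converges quickly here because the threshold voltage changes only weakly with VSB, so each step shrinks the remaining error considerably.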



2. VIL and VIH for a CMOS Inverter

A CMOS process is characterized by the following parameters:

VT0n = +0.8 V, βn = 40 µA/V²

VT0p = -0.8 V, βp = 40 µA/V²

• Calculate the values of VIL and VIH for a supply voltage VDD = 5 V, 10 V and 15 V

• At which operating point does the current consumption of the inverter reach its maximum?

• Calculate the current consumption of the inverter at these supply voltages.



3. Exercise: CMOS Inverter Technology



Problem 1

The figure below shows the layout of a CMOS inverter, whose dimensions are given in micrometers. The inverter is realized in an n-well CMOS process. The oxide capacitance is Cox = 69.1 nF/cm² for both n- and p-channel transistors. The drain-bulk and source-bulk depletion capacitances of the transistors are given by the following parameters:

                    NMOS      PMOS
Cj0   [fF/µm²]      0.0975    0.0298
Cjsw0 [fF/µm]       0.107     0.362
φ0    [V]           0.879     0.939
φ0sw  [V]           0.921     0.985

Although not explicitly shown in the figure, an overlap LD = 0.3 µm is assumed and must be included in calculations. The supply voltage is VDD = 5 V.

a) Compute the maximum value of CGDn and CGDp.

b) Determine the zero bias value of Cdbn and Cdbp. Take the sidewall and the bottom regions into account separately.

$$C_{bottom} = \frac{C_{j0} \cdot \text{area}}{\sqrt{1 + V_r/\phi_0}} ; \qquad C_{sidewall} = \frac{C_{jsw0} \cdot \text{perimeter}}{\sqrt{1 + V_r/\phi_{0sw}}}$$

c) Compute K(VL, VH) for the inverter and herewith determine Cout, i.e. ignore the interconnecting wires and CG.

$$C_{db,average} = K(V_{OL}, V_{OH}) \cdot C_{db,max}$$

(average for V between VOL and VOH)

d) Compute tHL and tLH for the inverter, by using the value of Cout determined above. Use the following parameters for the transistors:

NMOS: VT0n = +0.8 V, K‘n = 40 µA/V²; PMOS: VT0p = -0.8 V, K‘p = 16 µA/V²



Hints:

MOS Overlap Capacitors

MOS Gate Capacitances




1. Cutoff: no inversion layer channel

2. Nonsaturation: the channel shields the bulk electrode from the gate

3. Saturation: the channel is pinched off and does not contact the drain n+ region




Combination of the gate capacitances with the overlap contributions:

The Bulk Junction Capacitances

The total depletion capacitance of a pn junction

is given (considering the bottom and sidewall

regions) by:

where Vr is the magnitude of the reverse-bias voltage applied to the junction:

• For drain regions: Vr = VDB

• For source regions: Vr = VSB



An average depletion capacitance can be defined by:

where

Defining a dimensionless voltage factor

yields



Problem 2

The figure below shows the layout of two cascaded CMOS inverters, each stage being identical to the one analysed in problem 1. Capacitances and the connecting wires are now taken into account. Let Cp-f = 0.0576 fF/µm² and Cm-f = 0.0345 fF/µm².

a) Compute the metal-field capacitance from the output of the first stage to the metal-poly contact of the second stage. Consider only the metal-field regions, ignoring the regions in which metal overlaps n+, p+ or poly.

b) Determine the input capacitance of the second stage, as seen from the beginning of the poly line. Determine the sum Cline + Cg, using the value of Cout calculated in problem 1. Is one of the two capacitances dominating?



Problem 3

Let’s consider a CMOS inverter with βn = βp = 35 µA/V2 and VT0n = 0.9V,

VT0p = -0.8V. The output capacitance is Cout = 125 fF and the supply voltage

is VDD = 5V.

a) Compute tHL and tLH for the inverter.

b) Determine the propagation delay time tp. You may assume an input voltage that has a rise or fall time of 0ns, i.e. the input signal goes immediately from 0V to 5V and vice versa.



4. Exercise: CMOS and Pass Transistor Logic



1. Problem: Logic Function Analysis

Determine the logic function of the following NMOS circuits:

a)

b)



2. Problem: CMOS Logic

Synthesize the CMOS circuit for a parity generator with four inputs: Z = A ⊕ B ⊕ C ⊕ D

3. Problem: Full Adder

Synthesize the CMOS circuit for a full-adder, which has the following truth table:

4. Problem: CMOS Logic

Implement the following function using static CMOS logic:

( ) ( )EDCABf ++=



The figure below shows an implementation with CMOS transmission gates of the function:

a) Build the equivalent multistage circuit with elementary gates (AND, OR, INV)

b) Implement the circuit as a Complex-Gate

c) Compare the transistor count. Point out the advantages and disadvantages of all three solutions

BSSAF +=

5. Problem: Transistor Count



6. Problem: Pass Transistor Logic

Implement the following function:

You may use 8 PMOS and 8 NMOS transistors respectively. The literals are available in both inverted and non-inverted form.

bcabcaacddcaF ′′+′′++′′=



7.Problem: Pass Transistor Logic

• Given are the following five logic functions, which are implemented in Pass Transistor Logic.

• Are these implementations correct?

– If not, under which condition of the input signals does the output not show the correct result?

– Hint: Take a look at the Karnaugh charts

– Try to draw the correct circuits



7. Problem: Pass Transistor Logic (cont)

cbacbaf ++=1

1f

b b c c

a

1a

cdcbdacf ++=2

2f

a a c c

b

1

d



cbacbdabcf3 ++=

3f

b b c c

a

d

bdadcbdbaf4 +++=

4f

b b d d

1c

a

a

a




dcbacabf ++=5

5f

b b c c

a

a

d



5. Exercise: Dynamic Logic




Problem 1: Dynamic Logic Full Adder

Draw the transistor level circuit of a dynamic ripple carry full adder, whose logic functions are the following.

$$C_{n+1} = A_n \cdot B_n + C_n \cdot (A_n + B_n)$$

$$S_n = \overline{C_{n+1}} \cdot (A_n + B_n + C_n) + A_n \cdot B_n \cdot C_n$$
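Before drawing the transistor level circuit it can be useful to confirm that carry and sum expressions of this form really describe a full adder. The Python sketch below is a quick exhaustive check, not part of the exercise. It assumes the carry C(n+1) = A·B + C·(A + B) and a sum that reuses the inverted carry, S = NOT(C(n+1))·(A + B + C) + A·B·C. The function names are hypothetical.

```python
from itertools import product


def carry_out(a, b, c):
    """Carry of a full adder: A*B + C*(A + B)."""
    return (a & b) | (c & (a | b))


def sum_from_carry(a, b, c):
    """Sum expressed with the inverted carry: NOT(Cout)*(A+B+C) + A*B*C."""
    not_cout = 1 - carry_out(a, b, c)
    return (not_cout & (a | b | c)) | (a & b & c)


rows = []
for a, b, c in product((0, 1), repeat=3):
    rows.append((a, b, c, carry_out(a, b, c), sum_from_carry(a, b, c)))
    print(a, b, c, "->", rows[-1][3], rows[-1][4])
```

Reusing the inverted carry in the sum is what allows the sum stage to share logic with the carry stage instead of building two independent three-input XOR networks.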



Problem 2: Charge Sharing

The function:

must be implemented using domino logic. Could charge sharing effects occur? If yes, how can they be avoided?

( )FEDCBAZ ++++=



Problem 3: Charge Sharing

All input variables in the above circuit come from domino logic blocks, so that immediately after the precharge we have: A = B = C = D = F = 0 V.

For which possible 0 → 1 transitions has the charge sharing effect the greatest influence? The capacitances are: CX1 = CX2 = 10 fF, Cout,1 = 185 fF.

Calculate the voltage Vout,1. Make the calculations for CX1 = CX2 = 40 fF.



6. Exercise: Line Propagation Delay, Buffer Stages

Problem 1: Line Propagation Delay

Assume a poly line with a length of l = 3 mm, a line resistance of r = 12 Ω/µm and a line capacitance of c = 4·10⁻⁴ pF/µm.

a) Calculate the delay of the line

b) Insert a buffer with a delay = 3 ns. At which position must the buffer be inserted to achieve a minimum delay (line delay and buffer delay)? Calculate this delay.



Problem 2: Inverter Chain

Consider an inverter chain with M stages like the one depicted below:



Problem 2: Inverter Chain

• Assume the inverters in the chain as symmetrical, this means that the rise and fall times at the output of the inverter are equal. Furthermore, the gate capacitance is for the NMOS of the first stage C1 = 6fF. The line capacitances are negligible. The load capacitance is CL = 150pF.

• Determine M and S, so that the delay of the inverter chain is minimal. The output must not be inverted.


7. Exercise: Gate-Matrix, Stick-Diagrams, Euler Graphs

Problem 1: Full adder - Stick Diagram

Let’s consider a full adder, whose input signals are A, B and Cin. The outputs are S and Cout.

A) Draw the logic table for the full adder and determine the equations for S and Cout.

B) Show the stick-diagram of the full adder



Problem 2: Barrel Shifter

Draw the stick-diagram of a barrel shifter for a 4-bit word, n∈0…3. Each input has its own shift-enable. Assume that these inputs are properly driven by a decoder, i.e. only one input can be enabled at a time.



Problem 3: Gate-Matrix Method

The figure below shows an implementation with CMOS transmission gates of the function: BSSAF +=

a) Build the equivalent multi-stage circuit with elementary gates (AND,OR,INV)

b) Compare the transistor count. Show the advantages and disadvantages of both solutions

c) Implement the circuit from a) using the gate-matrix technique. Draw the corresponding stick-diagram



Problem 4: Euler Graphs

Given the following function:

a) Show the transistor level circuit implemented using static CMOS logic.

b) Build the optimal layout using the Euler graph method.

1) Show the complex-gate implementation

2) Modify the circuit so that applying the Euler graph method yields the optimal result

3) Determine the Euler path for the graph reduction and the subsequent graph expansion

c) Draw the layout as stick-diagram.

( )( ) 87654321 iiiiiiiiF ⋅+⋅+++=


8. Exercise: PLA Structures



Problem 1: PLA - Stick diagram

Draw the stick diagram of an NMOS PLA that implements a full adder stage. The input and the output registers are clocked using φ1 and φ2 respectively.


Problem 2: FSM implementation with PLA

Design and implement with PLA a traffic light controller for the crossroad below. The farm road has sensors for detecting waiting cars.

There is also a timer available, which is triggered by the rising edge of a ‘Start’ signal and provides two output signals:

TShort - during the yellow phase

TLong - for timing the green phase

Figure: timer block with input Start and outputs TLong and TShort


S - Signal when a car is on the farmroad

TL - Timer signal for green (active low)

TS - Timer signal for yellow (active low)

HG - Highway green state

HY - Highway yellow state

FG - Farm road green state

FY - Farm road yellow state

First, draw the schematics of the controller, showing the PLA, the timer and the traffic lights.
