ECE 260B – CSE 241A Design Styles 1 http://vlsicad.ucsd.edu
ECE260B – CSE241A Winter 2005
Design Styles
Multi-Vdd/Vth Designs
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
The Design Problem
Source: sematech97
A growing gap between design complexity and design productivity
Design Methodology
• Design process traverses iteratively between three abstractions: behavior, structure, and geometry
• More and more automation for each of these steps
Behavioral Description of Accumulator
entity accumulator is
  port (
    DI  : in integer;
    DO  : inout integer := 0;
    CLK : in bit
  );
end accumulator;

architecture behavior of accumulator is
begin
  process(CLK)
    variable X : integer := 0; -- intermediate variable
  begin
    if CLK = '1' then
      X := DO + DI; -- variable assignment uses :=
      DO <= X;
    end if;
  end process;
end behavior;
Design described as a set of input-output relations, regardless of the chosen implementation
Data described at a higher abstraction level (“integer”)
Structural Description of Accumulator
entity accumulator is
  port ( -- definition of input and output terminals
    DI  : in bit_vector(15 downto 0); -- a 16-bit-wide vector
    DO  : inout bit_vector(15 downto 0);
    CLK : in bit
  );
end accumulator;

architecture structure of accumulator is
  component reg -- definition of register ports
    port (
      DI  : in bit_vector(15 downto 0);
      DO  : out bit_vector(15 downto 0);
      CLK : in bit
    );
  end component;
  component add -- definition of adder ports
    port (
      IN0  : in bit_vector(15 downto 0);
      IN1  : in bit_vector(15 downto 0);
      OUT0 : out bit_vector(15 downto 0)
    );
  end component;
  -- definition of accumulator structure
  signal X : bit_vector(15 downto 0);
begin
  add1 : add
    port map (DI, DO, X); -- defines port connectivity
  reg1 : reg
    port map (X, DO, CLK);
end structure;
Design defined as a composition of register and full-adder cells (“netlist”)
Data represented as {0,1,Z}
Time discretized and progresses with unit steps
Description language: VHDL
Other options: schematics, Verilog
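The structural netlist can be mirrored in software to see how the adder and register cooperate. What follows is a minimal Python sketch, not part of the original slides: the names `add16`, `Reg16`, and `Accumulator` are hypothetical, and the model assumes unsigned 16-bit arithmetic that wraps on overflow and a register that captures its input once per call to `clock`.

```python
MASK16 = 0xFFFF  # 16-bit width, matching bit_vector(15 downto 0)


def add16(in0: int, in1: int) -> int:
    """Combinational adder: OUT0 = IN0 + IN1, truncated to 16 bits."""
    return (in0 + in1) & MASK16


class Reg16:
    """Register cell: DO takes the value of DI on each clock event."""

    def __init__(self) -> None:
        self.do = 0

    def clock(self, di: int) -> None:
        self.do = di & MASK16


class Accumulator:
    """Netlist: add1 feeds reg1, and reg1's output loops back into add1."""

    def __init__(self) -> None:
        self.reg1 = Reg16()

    def clock(self, di: int) -> int:
        x = add16(di, self.reg1.do)  # plays the role of signal X
        self.reg1.clock(x)
        return self.reg1.do


acc = Accumulator()
outputs = [acc.clock(value) for value in (1, 2, 3)]
print(outputs)  # running sums of the inputs
```

The integer-based behavioral description has no notion of word width; the wrap-around in `add16` is one consequence of committing to 16-bit vectors in the structural view.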
Implementation Methodologies
Digital Circuit Implementation Approaches
  Custom
  Semi-custom
    Cell-Based
      Standard Cells, Compiled Cells
      Macro Cells
    Array-Based
      Pre-diffused (Gate Arrays)
      Pre-wired (FPGA)
Full Custom
Hand drawn geometry
All layers customized
Digital and analog
Simulation at transistor level
High density
High performance
Long design time
Magic Layout Editor (UC Berkeley)
Symbolic Layout
[Figure: stick diagram of an inverter, with In, Out, VDD, and GND labeled]
• Dimensionless layout entities
• Only topology is important
• Final layout generated by “compaction” program
Standard Cells
[Figure: standard-cell floorplan showing rows of cells, logic cells, feedthrough cells, routing channels, and a functional module (e.g., RAM, multiplier)]
Routing channel requirements are reduced by the presence of more interconnect layers
Organized in rows
Cells made as full custom by vendor (not user)
All layers customized
Digital with possible special analog cells
Simulation at gate level (digital)
Medium-high density
Medium-high performance
Reasonable design time
Standard Cell — Example
[Brodersen92]
Standard Cell - Example
3-input NAND cell (from Mississippi State Library), characterized for a fanout of 4 and for three different technologies
Automatic Cell Generation
Random-logic layout generated by CLEO cell compiler (Digital)
Module Generators — Compiled Datapath
[Figure: compiled datapath built from bit-slices containing adder, buffer, reg0, reg1, and mux blocks, connected by bus0, bus1, and bus2, with routing area and feed-throughs]
Advantages: One-dimensional placement/routing problem
Macrocell-Based Design
[Figure: macrocells connected by an interconnect bus and routing channels]
Predefined macro blocks (uP, RAM, etc.)
Macro blocks made as full custom by vendor (IP blocks)
All layers customized
Digital and some analog
Simulation at behavioral or gate level
High density
High performance
Short design time
Use standard on-chip busses
“System on a chip” (SOC)
Macrocell Design Methodology
[Figure: video-encoder chip [Brodersen92] with two SRAM blocks, data paths, standard cells, and routing channels]
Floorplan: defines the overall topology of the design, the relative placement of modules, and the global routes of busses, supplies, and clocks
Gate Array
[Figure: gate array with rows of uncommitted cells separated by routing channels]
Predefined transistors connected via metal
Two types: channel based, sea of gates
Only metal layers customized
Fixed array sizes
Digital cells in library
Simulation at gate level (digital)
Medium density
Medium performance
Reasonable design time
Gate Array — Primitive Cells
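[Figure: gate-array primitive cells between VDD and GND rails, showing polysilicon, metal, and possible contact locations: an uncommitted cell, and a committed cell wired as a 4-input NOR with inputs In1 to In4 and output Out]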
Sea-of-gates
[Figure: LSI Logic LEA300K sea-of-gates chip (0.6 µm CMOS) with random-logic and memory-subsystem regions]
Prewired Arrays
Programmable logic blocks
Programmable connections between logic blocks
No layers customized (standard devices)
Digital only
Low-medium performance
Low-medium density
Programmable: SRAM, EPROM, Flash, anti-fuse, etc.
Easy and quick design changes
Cheap design tools
Low development cost
High device cost
NOT a real ASIC
Courtesy Altera Corp.
Programmable Logic Devices
PLA PROM PAL
Field-Programmable Gate Arrays - Fuse-based
[Figure: fuse-based FPGA with a standard-cell-like floorplan: rows of logic modules separated by routing channels, vertical routes, I/O buffers around the periphery, and program/test/diagnostics circuitry]
Interconnect
[Figure: programming interconnect using anti-fuses: a cell's input/output pins meet horizontal and vertical tracks, with antifuses and a programmed interconnection marked]
Field-Programmable Gate Arrays - RAM-based
[Figure: array of CLBs separated by horizontal and vertical routing channels, with switching matrices and interconnect points]
RAM-based FPGA - Basic Cell (CLB)
[Figure: CLB with a combinational-logic section (function generators F and G, each implementing any function of up to 4 variables, fed by inputs A, B/Q1/Q2, C/Q1/Q2, D, and E) and storage elements (two flip-flops with outputs Q1 and Q2 and with D in, Clock, CE, and R inputs)]
Courtesy of Xilinx
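One common way to realize "any function of up to 4 variables" in a RAM-based device is a lookup table holding one configuration bit per input combination. The Python sketch below illustrates that idea under this assumption; the class `Lut4` and its methods are hypothetical names, not a description of the actual Xilinx circuit.

```python
class Lut4:
    """4-input lookup table: 16 configuration bits, one per input combination."""

    def __init__(self, truth_table: int) -> None:
        if not 0 <= truth_table < (1 << 16):
            raise ValueError("truth table must fit in 16 bits")
        self.truth_table = truth_table

    @classmethod
    def from_function(cls, fn) -> "Lut4":
        """Program the table by evaluating fn on all 16 input combinations."""
        bits = 0
        for index in range(16):
            a, b, c, d = ((index >> shift) & 1 for shift in (3, 2, 1, 0))
            if fn(a, b, c, d):
                bits |= 1 << index
        return cls(bits)

    def evaluate(self, a: int, b: int, c: int, d: int) -> int:
        """Look up the stored bit addressed by the four inputs."""
        index = (a << 3) | (b << 2) | (c << 1) | d
        return (self.truth_table >> index) & 1


# The same hardware implements different functions purely by reprogramming.
xor4 = Lut4.from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
and_or = Lut4.from_function(lambda a, b, c, d: (a & b) | (c & d))
print(hex(xor4.truth_table), hex(and_or.truth_table))
```

Changing the function never changes the structure, only the 16 stored bits, which is why design changes on such devices are quick.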
RAM-based FPGA
Xilinx XC4025
High Performance Devices
Mixture of full custom, standard cells, and macros
Full custom for special blocks: Adder (data path), etc.
Macros for standard blocks: RAM, ROM, etc.
Standard cells for non critical digital blocks
Global Signaling and Layout
Global signaling and layout optimization
Multi-Vdd
Static power analysis
Multi-Vth + Vdd + sizing
D. Sylvester, DAC-2001
Global Signaling
Current global signaling paradigm: insert large static CMOS repeaters to reduce wire RC delay
Impending problems:
Too many repeaters
- 180nm processors: 22K repeaters (Itanium), 70K (Power4)
- Projected: 1-1.5M repeaters at 45-65nm technologies
Too much power
- Many large repeaters = significant static and dynamic power
Too much noise
- Repeater clustering complicates power distribution
- Inductive coupling across wide bus structures
D. Sylvester, DAC-2001
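Why repeaters help, and why long wires need so many of them, can be seen from a first-order delay model. The Python sketch below is an illustration only: it assumes the textbook distributed-RC estimate (delay of about 0.38·R·C, so quadratic in length), ignores driver resistance and repeater input capacitance, and uses hypothetical wire and repeater values.

```python
def unrepeated_delay_ps(r_ohm_per_mm: float, c_pf_per_mm: float,
                        length_mm: float) -> float:
    """Distributed-RC wire delay, about 0.38 * R * C; grows with length squared.

    Ohms times picofarads gives picoseconds.
    """
    return 0.38 * r_ohm_per_mm * c_pf_per_mm * length_mm ** 2


def repeated_delay_ps(r_ohm_per_mm: float, c_pf_per_mm: float, length_mm: float,
                      segments: int, repeater_delay_ps: float) -> float:
    """Delay of a wire cut into equal segments, each driven by one repeater."""
    per_segment = unrepeated_delay_ps(r_ohm_per_mm, c_pf_per_mm,
                                      length_mm / segments)
    return segments * (per_segment + repeater_delay_ps)


def best_segment_count(r_ohm_per_mm: float, c_pf_per_mm: float, length_mm: float,
                       repeater_delay_ps: float, max_segments: int = 50) -> int:
    """Brute-force search for the segment count with the lowest total delay."""
    return min(
        range(1, max_segments + 1),
        key=lambda k: repeated_delay_ps(r_ohm_per_mm, c_pf_per_mm, length_mm,
                                        k, repeater_delay_ps),
    )


# Hypothetical values, chosen only to make the trend visible.
R_OHM_PER_MM = 100.0
C_PF_PER_MM = 0.2
LENGTH_MM = 10.0
REPEATER_DELAY_PS = 50.0

plain = unrepeated_delay_ps(R_OHM_PER_MM, C_PF_PER_MM, LENGTH_MM)
k_best = best_segment_count(R_OHM_PER_MM, C_PF_PER_MM, LENGTH_MM, REPEATER_DELAY_PS)
repeated = repeated_delay_ps(R_OHM_PER_MM, C_PF_PER_MM, LENGTH_MM,
                             k_best, REPEATER_DELAY_PS)
print(plain, k_best, repeated)
```

In this model the best segment count grows with wire length and with the wire's RC product, so every long global wire on a chip needs its own chain of repeaters, which is the source of the repeater counts quoted above.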
Cell Layout Optimization
Advanced layout techniques must allow
- Continuous individual device sizing
- Variable p/n ratios
- Tapered FET stacking sizes
- Arbitrary Vth assignments within gates
First cut: Cadabra
- 15-22% power reduction using the first two approaches under a fixed-footprint constraint
[Figure: GDSII import followed by compaction at fixed width. Ref: Hurat, Cadabra]
Optimize specific instances of standard gates
D. Sylvester, DAC-2001
Multi-Vdd
Global signaling and layout optimization
Multi-Vdd
Static power analysis
Multi-Vth + Vdd + sizing
D. Sylvester, DAC-2001
Multi-Vdd Status
Idea: Incorporate two Vdd’s to reduce dynamic power
Limited to a few recent Japanese multimedia processors
- Example: 0.3 µm, 75MHz, 3.3V media processor (Toshiba)
- Total power savings of 47% in logic, 69% in clock
Dynamic voltage scaling of mobile processors
- Transmeta Crusoe, Intel Speedstep, etc.
- Not considered in this talk
Very powerful technique currently applied only in low-performance designs
Mentality: today’s high performance parts aren’t “limited” by power
D. Sylvester, DAC-2001
Lower Power Via Rich Replacement
Media processors and other low speed designs have many non-critical paths
60-70% of paths have delay less than half the clock period
After replacement, most paths become near critical
What about high-speed microprocessors?
[Figure: histogram of % of total paths vs. path delay (normalized to clock period)]
D. Sylvester, DAC-2001
Similar Story For High-Performance
IBM 480 MHz PowerPC shows over 50% of paths have delay less than half the clock period
Implies that high-performance designs can benefit from multi-Vdd
Ref: Akrout, JSSC98
D. Sylvester, DAC-2001
Resizing Is Not The Right Answer
Post-synthesis optimizations resize gates to recover power on non-critical paths
Looks similar to pre- and post-replacement figures in media processor…
Before post-synthesis resizing
After post-synthesis resizing
Ref: Sirichotiyakul, DAC99
This is the wrong approach for nanometer design!
D. Sylvester, DAC-2001
Multi-Vdd Instead of Sizing
Power ~ $C V_{dd}^{2} f$, where $f$ is fixed
Key: Reducing gate width impacts power sub-linearly
- Interconnect capacitance is not affected
Reducing supply voltage cuts power quadratically
- All capacitive loads have lower voltage swing
How can we minimize delay penalty at low Vdd?
D. Sylvester, DAC-2001
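The contrast between sizing and supply scaling follows directly from the dynamic power relation (power proportional to C times Vdd squared times f). The Python sketch below works one example with hypothetical, normalized numbers: it assumes the switched capacitance splits evenly between gate and interconnect, and that only the gate portion scales with device width.

```python
def dynamic_power(c_gate: float, c_wire: float, vdd: float, freq: float) -> float:
    """Dynamic power, proportional to total switched capacitance * Vdd^2 * f."""
    return (c_gate + c_wire) * vdd ** 2 * freq


# Normalized baseline: equal gate and wire capacitance (an assumption).
baseline = dynamic_power(c_gate=1.0, c_wire=1.0, vdd=1.0, freq=1.0)

# Option 1: shrink gate width by 30%. Wire capacitance does not change.
downsized = dynamic_power(c_gate=0.7, c_wire=1.0, vdd=1.0, freq=1.0)

# Option 2: lower the supply by 30%. Every capacitor swings less.
low_vdd = dynamic_power(c_gate=1.0, c_wire=1.0, vdd=0.7, freq=1.0)

sizing_ratio = downsized / baseline
vdd_ratio = low_vdd / baseline
print(sizing_ratio, vdd_ratio)
```

Under these assumptions a 30% width reduction leaves about 85% of the power, while a 30% supply reduction leaves about 49%. The larger the interconnect share of the load, the weaker sizing becomes.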
Challenges For Multi-Vdd
Area overhead
- Toshiba reported a 7% rise in area due to placement restrictions, level converters, and additional power grid routing
EDA tool support for the above issues (placement, dual power routing)
Noise analysis
- Additional shielding required between Vdd,low and Vdd,high signals?
- Including clock network
D. Sylvester, DAC-2001
Static Power
Global signaling and layout optimization
Multi-Vdd
Static power
Multi-Vth + Vdd + sizing
D. Sylvester, DAC-2001
Static Power
Why do we care about static power in non-portable devices?
- Standby power is “wasted”: it leaves fewer Watts for computation
- Worsens reliability by raising die temperatures
Leakage current is a function of Vth and subthreshold swing (Ss) (x10 at operating vs. room temp!)
- Ss expected to remain at 80-85 mV/dec (room temp)
- Device technology may cut this by ~20%
Vth reductions are mandated by scaling Vdd
- Vth has been around Vdd/5
$I_{off} = 10\,\mu\mathrm{A} \times 10^{-V_{th}/S_s}$
D. Sylvester, DAC-2001
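The sensitivity of leakage to Vth and Ss can be made concrete with a short calculation. The Python sketch below assumes the usual exponential subthreshold model, in which off-current falls by one decade for every Ss of threshold voltage; the 10 µA prefactor and the example voltages are illustrative assumptions, and the function names are hypothetical.

```python
def off_current_ua(vth_mv: float, ss_mv_per_dec: float = 85.0,
                   i0_ua: float = 10.0) -> float:
    """Subthreshold off-current: i0 * 10^(-Vth / Ss)."""
    return i0_ua * 10.0 ** (-vth_mv / ss_mv_per_dec)


def leakage_ratio(vth_a_mv: float, vth_b_mv: float,
                  ss_mv_per_dec: float = 85.0) -> float:
    """How many times more a device at vth_a leaks than one at vth_b."""
    return (off_current_ua(vth_a_mv, ss_mv_per_dec)
            / off_current_ua(vth_b_mv, ss_mv_per_dec))


# Lowering Vth by one subthreshold swing (85 mV) costs a decade of leakage.
decade = leakage_ratio(215.0, 300.0)

# A swing about 20% steeper (68 mV/dec) lowers leakage at the same Vth.
steeper_gain = off_current_ua(300.0, 85.0) / off_current_ua(300.0, 68.0)
print(decade, steeper_gain)
```

Because the dependence is exponential, each step of Vth scaling that accompanies a lower Vdd multiplies standby current, which is why static power matters even in non-portable parts.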
Leakage Suppression Approaches
Dual-Vth (most common)
- Low-Vth on critical paths, high-Vth off critical paths
- Only cost is additional masks
MTCMOS
- Series-inserted high-Vth device cuts leakage current when off (sleep mode)
- Delay and area penalties; control device sizing is critical
Other techniques
- Substrate biasing to control Vth
Dual-Vth domino
- Use low-Vth devices only in evaluate paths
[Figure: MTCMOS gate with pull-up and pull-down networks between Vdd and a parasitic node, a high-Vth device driven by Vcontrol, and output Vout]
D. Sylvester, DAC-2001
Can Gate-length biasing help leakage reduction?
Reduce leakage?
[Figure: variation of leakage and delay (each normalized to 1) with gate-length (nm) for an NMOS device in an industrial 130nm technology]
Reduce leakage variability?
[Figure: two plots of leakage vs. gate-length illustrating leakage variability before and after biasing]
Gate-length Biasing
First proposed by Sirisantana et al.
- Comparative study of the effect of doping, tox, and gate-length
- Large bias used, significant slowdown
Small bias
- Little reduction in leakage beyond 10% bias, while delay degrades linearly
- Preserves pin compatibility
Technique applicable as post-RET step
Salient features
- Design cycle is not disturbed
- Zero cost (no additional masks)
Granularity
Technology-level
All devices in all cells have one biased gate-length
Cell-level
All devices in a cell have one biased gate-length
Device-level
All devices have independent biased gate-length
Simplification: In each cell, NMOS devices have one gate-length and PMOS devices have another
Device-Level Leakage Reduction
[Figure: bar chart of leakage saving (vertical axis 0 to 40) for cells INVX4, NANDX4, BUFX4, and ANDX6, with separate bars for low-Vt, nominal-Vt, and high-Vt devices]
Leakage saving with a delay penalty of up to 10% (simplified device-level biasing)
Circuit level
Bias gate-length for non-critical cells
Library extended with each cell having a biased version
Benefits analyzed in conjunction with Multi-VT assignment and in isolation
SVT-SGL DVT-SGL SVT-DGL DVT-DGL
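The idea of swapping non-critical cells for biased library variants can be sketched as a simple selection pass. The Python example below is a deliberately simplified illustration and not the flow used for the results in these slides: it treats each cell's slack as independent (a real flow must re-run timing, because cells on a shared path consume the same slack), and every name and number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    slack_ps: float           # timing slack available to this cell
    bias_penalty_ps: float    # extra delay of the biased (longer-gate) variant
    leakage_nw: float         # leakage of the nominal variant
    biased_leakage_nw: float  # leakage of the biased variant


def assign_biased_cells(cells):
    """Return names of cells whose own slack covers the biasing delay penalty."""
    return [c.name for c in cells if c.slack_ps >= c.bias_penalty_ps]


def total_leakage(cells, biased_names):
    """Sum leakage, using the biased figure for every swapped cell."""
    biased = set(biased_names)
    return sum(c.biased_leakage_nw if c.name in biased else c.leakage_nw
               for c in cells)


# Hypothetical netlist: u1 is on a critical path, u2 and u3 are not.
netlist = [
    Cell("u1", slack_ps=0.0, bias_penalty_ps=5.0,
         leakage_nw=10.0, biased_leakage_nw=7.0),
    Cell("u2", slack_ps=20.0, bias_penalty_ps=5.0,
         leakage_nw=10.0, biased_leakage_nw=7.0),
    Cell("u3", slack_ps=8.0, bias_penalty_ps=6.0,
         leakage_nw=20.0, biased_leakage_nw=14.0),
]

swapped = assign_biased_cells(netlist)
before = total_leakage(netlist, [])
after = total_leakage(netlist, swapped)
print(swapped, before, after)
```

The critical cell keeps its nominal gate-length, so circuit delay is unchanged in this model while total leakage drops.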
Results: Leakage Reduction
[Figure: bar chart of normalized leakage (0 to 1) for circuits c5315, c6288, c7552, and alu128 under SVT-SGL, SVT-DGL, DVT-SGL, and DVT-DGL]
With less than 2.5% delay penalty
• Design Compiler used for VT assignment and gate-length biasing
• Better results expected with Duet (academic sizer from Michigan)
Multi-Vth + Vdd + Sizing
Global signaling and layout optimization
Multi-Vdd
Static power analysis
Multi-Vth + Vdd + sizing
D. Sylvester, DAC-2001
Multi-Everything
Need an approach that selects between speed, static power, and dynamic power
Should be scalable to nanometer design
- Rules out dual-Vth domino or other dynamic logic families (low supplies kill performance advantages)
Techniques mentioned so far
- Flexible, optimized cell layouts
- Multi-Vdd
- Dual-Vth
Put them all together
D. Sylvester, DAC-2001
Multi-Vdd Can Leverage Vth’s
Existing designs using multi-Vdd do not alter Vth in low-Vdd cells
- Highly sub-optimal; delay is fully penalized
- Limits cell replacement, which limits power savings
Much better solution: reduce Vth in low-Vdd cells to carefully balance delay, static power, and dynamic power
Enforce technology scaling within a chip – whenever we reduce Vdd, we also reduce Vth to maintain speed
D. Sylvester, DAC-2001
Multi-Vdd + Vth Negates Delay Penalty
Delay ~ $C V_{dd} / I_{on}$
Scenarios
- Constant Vth (current paradigm)
- Scale Vth to maintain constant static power
- Scale Vth to reduce static power linearly with Vdd
Delay penalty is substantially offset
- Ion is very sensitive to Vth at Vdd < 1V
Pstatic reduces with Vdd due to linear term and smaller Ioff (Ion and DIBL)
D. Sylvester, DAC-2001
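How much a lower Vth offsets the delay penalty can be estimated from the relation that delay is proportional to C times Vdd divided by Ion. The Python sketch below assumes an alpha-power-law drive current, Ion proportional to (Vdd - Vth)^alpha, which is a modeling choice made here and not taken from the slides; the exponent and all voltages are hypothetical.

```python
ALPHA = 1.3  # assumed velocity-saturation exponent in the alpha-power law


def on_current(vdd: float, vth: float, alpha: float = ALPHA) -> float:
    """Alpha-power-law drive current, up to a constant factor."""
    if vdd <= vth:
        raise ValueError("model is only valid for Vdd > Vth")
    return (vdd - vth) ** alpha


def gate_delay(vdd: float, vth: float, cap: float = 1.0) -> float:
    """Delay proportional to C * Vdd / Ion."""
    return cap * vdd / on_current(vdd, vth)


def delay_penalty(vdd_low: float, vth_low: float,
                  vdd_high: float = 1.2, vth_high: float = 0.3) -> float:
    """Delay at the low supply relative to the high-supply baseline."""
    return gate_delay(vdd_low, vth_low) / gate_delay(vdd_high, vth_high)


# Scenario 1: drop Vdd from 1.2 V to 0.8 V and keep Vth at 0.3 V.
constant_vth = delay_penalty(vdd_low=0.8, vth_low=0.3)

# Scenario 2: same supply drop, but Vth is also lowered to 0.2 V.
scaled_vth = delay_penalty(vdd_low=0.8, vth_low=0.2)
print(constant_vth, scaled_vth)
```

With these numbers the constant-Vth case slows down noticeably more than the scaled-Vth case, which is the offset the slide describes. The price is higher leakage in the low-Vth cells, so the choice has to be balanced against static power.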
Now Add Sizing
Multi-Vdd + multi-Vth + sizing/cell layout optimization attacks power from many angles (multi-dimensional)
Depending on criticality and switching activities, non-critical gates can be:
- Assigned Vdd,low
- Assigned Vdd,low + lower Vth
- Assigned Vth,high
- Downsized (at the individual transistor level if advantageous)
- Assigned Vdd,low and upsized
  - For gates that cannot tolerate Vdd,low delay, this can be power efficient
- And others
D. Sylvester, DAC-2001
Summary
Power density must saturate to maintain affordable packaging options
- 50 W/cm² means 200-250W for future large MPUs
- Dynamic thermal management saves 25% on packaging power budget
Multi-Vdd will leverage multiple Vth’s to offset delay penalty at low Vdd
- More widespread re-assignment to Vdd,low
- Use Vdd first instead of re-sizing to take advantage of large path slacks
- Anticipated power savings of 50-80%
Static power also addressed through multi-Vth + Vdd + sizing
- Vth difficult to control in ultra-short channels
- Intra-cell Vth assignment + MTCMOS/variants + sleep modes
D. Sylvester, DAC-2001
Next Week: Project Meetings
D. Sylvester, DAC-2001