View
3
Download
0
Category
Preview:
Citation preview
Parallel Processing and Circuit Design
with Nano-Electro-Mechanical Relays
Elad Alon1, Tsu-Jae King Liu1, Vladimir Stojanovic2, Dejan Markovic3
1University of California, Berkeley2Massachusetts Institute of Technology
3University of California, Los Angeles
100
101
102
103
104
0
20
40
60
80
100
No
rma
lize
d E
ne
rgy
/op
1/throughput (ps/op)
CMOS is Scaling, Power Density is Not
Vdd and Vt not scaling well power/area not scaling
2
400480088080
8085
8086
286386
486Pentium® proc
P6
1
10
100
1000
10000
1970 1980 1990 2000 2010Year
Po
we
r D
en
sit
y (
W/cm
2)
Hot Plate
Nuclear Reactor
Rocket Nozzle
S. Borkar
Sun’s Surface
Power Density Prediction circa 2000
CMOS is Scaling, Power Density is Not
Vdd and Vt not scaling well power/area not scaling
Parallelism to improve throughput within power budget
3
100
101
102
103
104
0
20
40
60
80
100
No
rmalized
En
erg
y/o
p
1/throughput (ps/op)
Operate at a
lower energy
point
Run in parallel to
recoup performance
400480088080
8085
8086
286386
486Pentium® proc
P6
1
10
100
1000
10000
1970 1980 1990 2000 2010Year
Po
we
r D
en
sit
y (
W/cm
2)
Hot Plate
Nuclear Reactor
Rocket Nozzle
S. Borkar
Sun’s Surface
Power Density Prediction circa 2000
Core 2
101
102
103
104
105
0
20
40
60
80
100
No
rma
lize
d E
ne
rgy
/op
1/throughput (ps/op)
Where Parallelism Doesn’t Help
4
CMOS circuits have well-defined minimum energy
Caused by leakage and finite sub-threshold swing
Need to balance leakage and active energy
Limits energy-efficiency, no matter how slow the
circuit runs
Energy/op vs. Vdd
0.1 0.2 0.3 0.4 0.5
5
10
15
20
25
No
rmalized
En
erg
y/c
ycle
Vdd (V)
Etotal
Edynamic
Eleak
Energy/op vs. 1/throughput
Scale Vdd & VT:
Etot
Edyn
Eleak
What if There Was No Leakage?
5
No longer need to balance increasing component
of energy as Vdd decreases
Relay (mechanical switch) offers near infinite sub-
threshold slope and no leakage current
0.0
0.2
0.4
0.6
0.8
1.0
1.2
34 35 36 37
Measured relay I-V
VGB (V)
I DS
(mA
) compliance limit
VDS = 1V
W x L = 2mm x 20mm
H = g = 0.6mm
Energy/op vs. Vdd
0.1 0.2 0.3 0.4 0.5
5
10
15
20
25
No
rmalized
En
erg
y/c
ycle
Vdd (V)
Etotal
Edynamic
Etot = Edyn
Outline
If we could reliably build relays, would
relay circuits offer superior energy-
efficiency?
NEM Relay Structure and Model
Digital Circuit Design with NEM Relays
Comparisons to CMOS
6
7
Top View
Cross-Section
L
W
Body Electrode(under body
dielectric)
DrainSource
Channel
Contact Dimples
Electrical
Gate
Anchor
NEM Relay Structure & Operation
Metal Channel Gate DielectricElectrical Gate
Source Drain
Bodyody
Body Dielectric
H
HSource Drain
Body
Body Dielectric
t gap
Htdimple
W
Air gap
Infinite off-resistance zero leakage
Open Relay (“OFF”):
|Vgb| < Vpo (pull-out voltage)
Closed Relay (“ON”):
|Vgb| > Vpi (pull-in voltage)
On-resistance depends on materials, pressure (|Vgb|)
8
Preliminary Fabricated Relay
SOURCE
DRAIN
GATE
BODY
Channel
1mm
Measured I-V
Cantilever W,L,H = 2μm,20μm,200nm
Actuation gap thickness = 400nm
NEM Relay Model
Compact Verilog-A model for circuit simulation:
Mechanical dynamics: spring (k), damper (b), mass (m)
Electrical parasitics: non-linear gate-body (Cgb), gate-channel
(Cgc), and source/drain-body cap (Csd_b), contact (Rcs,d)
9
0.5Rc
RsdRsdCgb
0.5Rc
RcdRcs
DS
Cgc
So DoCsd_bCsd_b
B
Go
C
Cs Cd
Anchor
Spring k Damper b
Mechanical Model
Mass m
NEM Relay Scaling
10
Scaling (e.g., constant-E) benefits similar to CMOS
Pull-in Voltage with Beam Scaling
Measured pull-in voltages scale linearly
If overcome reliability issues & surface forces scale:
{W,L,H,tgap} = {90nm,2.3um,90nm,10nm} Vpi = 200mV
Mechanical delay also scales linearly (~10ns @90nm)
NEM Relay as a Logic Element
11
4-terminal design mimics MOSFET operation Electrostatic actuation is ambipolar
Non-inverting logic possible: actuation independent of drain/source voltages
Mechanical delay (~10ns) >> electrical t (~1ps) Can stack 200 devices (with 1kW on-resistance) before
electrical delay is comparable to mechanical delay
Digital Circuit Design with Relays
CMOS: delay set by electrical time constant
Quadratic delay penalty for stacking devices
Buffer & distribute logical/electrical effort over many stages
Relays: delay dominated by mechanical movement
Want all relays to switch simultaneously
Implement logic as a single complex gate
12
Digital Circuit Design with Relays
Delay Comparison vs. CMOS
Single mechanical delay vs. several electrical gate delays
For reasonable load, relay delay unaffected by fan-out/fan-in
Area Comparison vs. CMOS
Larger individual devices
Fewer devices needed to implement the same logic function
13
4 gate delays 1 mechanical delay
101
102
103
104
105
0
20
40
60
80
100
No
rma
lize
d E
ne
rgy
/op
1/throughput (ps/op)
CMOS vs. Relays: Digital Logic
14
Most processor components exhibit the energy vs.
performance tradeoff of static CMOS
Control, Datapath, Clock
Adder energy performance tradeoff is representative
Relay-based Adder
Full adder cell:
12 relays vs. 24
transistors
XOR “free”
Complementary
signals avoid extra
mechanical delay (to
invert)
Relays all sized
minimally
15
N-bit Relay-Based Adder
Ripple carry
configuration
Cascade full adder
cells to create
larger complex
gate
Stack of N relays,
but still single
mechanical delay
16
Snapshot: NEM Relays vs. CMOS
32-bit adder CMOS Relay
Supply
Voltage
0.5 V 0.32 - 0.9 V
Load Cap
per Output
25 fF 25 fF
Total Gate
Cap
4.0 pF 125 fF
Area 600 mm2 480 mm2
17
Compare vs. Sklansky CMOS adder1
For similar area: >9x lower E/op, >10x greater delay
1D. Patil et. al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp.
on Computer Arithmetic (ARITH'07).
9x
10x
Energy/op vs. Delay/op across Vdd
Energy is for adder only
Energy-Delay Results: Contact R
18
Low contact R not
critical
Enables low force,
hard contact
Good news for
reliability…
More later
Energy-Delay Comparison
19
Energy/op vs. Delay/op across Vdd & CL
30x cap. gain
Lower device Cg, Cd
Fewer devices
2.4x Vdd gain
No leakage energy
Limited by Esurf
Relays always more energy efficient
>10x better, even in the GOPS range
Area vs. Throughput
For high throughputs, relays require area overhead
to achieve similar throughputs as CMOS
Area overhead is bounded (max. 6x-25x) :
CMOS needs parallelism to maintain Emin for throughputs
over 1GOPS
20
0.5 1 1.5 2 2.5 3
5
10
15
20
25
30
Throughput (GOPS)
AR
ela
y/A
CM
OS
W Relay (buffered, 25fF)
W Relay (buffered, 100fF)
Au Relay (25fF)
Au Relay (100fF)
Erelay = 20fJ/op
Revised Power Breakdown
21
I/O energy dominant if core is ~30x more efficient
Need to explore processing of analog signals too…
No or limited “linear gain” in relays – rely on switching1
*F. Chen et. al., “Integrated Circuit Design with NEM Relays,” IEEE ICCAD, Nov. 2008.
Conclusions
NEM relays offer unique characteristics
Nearly ideal Ion/Ioff
Significantly lower Cg than CMOS
Switching delay largely independent of electrical t
Relay circuits show potential for order of
magnitude better energy efficiency
Examined 32-bit adders
Exploring memories, multipliers, complete
microcontroller
Key challenges are reliability and scaling
Circuit level insights critical: engineer contact R
22
Acknowledgements
Hei Kam, Fred Chen, Hossein Fariborzi,
Abhinav Gupta, Jaeseok Jeon, Rhesa
Nathanael, Vincent Pott, Matthew Spencer,
Cheng Wang
Berkeley Wireless Research Center
NSF
DARPA
FCRP (C2S2, MSD)
Intel Foundation Graduate Fellowship
23
Recommended