A Self-Tuning Dynamic Voltage Scaled Processor Using Delay-Error
Detection and Correction
Shidhartha Das, David Roberts, Seokwoo Lee, Sanjay Pant, David
Blaauw, Todd Austin, Trevor Mudge, Krisztian Flautner*
Department of EECS, University of Michigan, Ann Arbor, MI
*ARM Inc., Cambridge, UK
Abstract
In this paper, we present the implementation and silicon measurement
results of a 64-bit processor fabricated in 0.18µm technology. The
processor employs a delay-error detection and correction scheme called
Razor to eliminate voltage safety margins and scale voltage 120mV
below the first failure point. It achieves 44% energy savings over the
worst case operating conditions for a 0.1% targeted error rate at a
fixed frequency of 120MHz.

1. Introduction
Recently, we proposed a new voltage management concept for Dynamic
Voltage Scaled (DVS) processors, called Razor [1], along with initial
simulation based results. In this paper, we present the first silicon
implementation of a Razor design. We discuss the circuit structures
used in this new implementation and present silicon measurements for
33 tested dies. The chip implements a subset of the Alpha instruction
set and was fabricated with MOSIS [7] in TSMC 0.18µm technology.

Traditional DVS techniques [2-6] use a delay chain or a lookup table
to determine the minimum voltage necessary for error-free operation at
a particular frequency. Hence they require voltage margins to ensure
correct operation over process variation, temperature fluctuations and
voltage drop. In contrast, the Razor processor uses a delay-error
tolerant flip-flop on critical paths to detect when voltage is scaled
to the point of first failure for a given frequency. Voltage control
is based on the observed error rate and power savings are achieved by
1) eliminating the above margins under nominal operating and silicon
conditions and 2) scaling voltage 120mV below the first failure point
to achieve a 0.1% targeted error rate. The total measured energy
savings over the worst case was 44% at 120MHz under nominal
conditions.

Figure 1a shows the delay-error tolerant Razor flip-flop (RFF) in
concept. The standard positive edge triggered DFF is augmented with a
shadow latch which samples at the negative clock edge. Timing errors
are detected by comparing the main flip-flop data with that of the
shadow latch. An additional detector flags the occurrence of
metastability at the main flip-flop output. Error signals of
individual RFFs are OR-ed together to generate the pipeline restore
signal, which overwrites the shadow latch data into the errant
flip-flop. A distributed pipeline recovery mechanism [1] is
implemented to recover correct pipeline state (figure 1b). The minimum
allowed supply voltage is set to ensure that the shadow latch data is
guaranteed correct and can be used for error recovery. The duration of
the positive clock phase, when the shadow latch is transparent,
determines the sampling delay of the shadow latch. The hold time
constraint imposed by the shadow latch was met by inserting delay
buffers.

The rest of the paper is organized as follows. Section 2 describes the
transistor level design of the Razor flip-flop and the metastability
detector. Section 3 describes Razor processor implementation details
and measurement results are presented in Section 4. The Razor energy
savings are quantified in Section 5. We give the details on Razor
voltage control in Section 6 and draw conclusions in Section 7.

2. Razor Flip-Flop Circuit Level Implementation
Figure 2 shows the transistor level schematic of the RFF. The error
comparator evaluates in the negative phase when the data latched by
the slave differs from the shadow. The metastability detector, which
shares the dynamic node Err_dyn with the comparator, evaluates in the
positive phase of the clock when the slave output could become
metastable. Thus, the RFF error signal is flagged when either of them
evaluates. This, in turn, evaluates the dynamic gate to generate the
restore signal by OR-ing error signals of individual RFFs. The restore
overwrites the master with the shadow latch data such that the slave
gets the correct data at the next positive edge. In this positive
phase, it also disables the shadow to protect state. The Rbar_latched
signal precharges the Err_dyn node for the next errant cycle.

Compared to a regular DFF of the same drive strength and delay, the
RFF consumes 22% extra (60fJ/49fJ) energy when sampled data is static
and 65% extra (205fJ/124fJ) energy when sampled data switches.
However, in the processor only 207 flip-flops out of 2388 flip-flops,
or 9%, could become critical and needed to be RFFs. Hence, the net
Razor power overhead, including the delay buffer power for short
paths, was computed to be 3% of nominal chip power.
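The per-flip-flop overhead percentages quoted in Section 2 follow directly from the energy pairs given there. The short sketch below only redoes that arithmetic; the constant and function names are illustrative, not taken from the paper, and it does not attempt to reproduce the chip-level 3% figure, which also depends on delay buffer power.

```python
# Per-flip-flop energies in femtojoules, as quoted in Section 2.
DFF_STATIC_FJ, DFF_SWITCHING_FJ = 49.0, 124.0
RFF_STATIC_FJ, RFF_SWITCHING_FJ = 60.0, 205.0

# Flip-flop counts as quoted in Section 2.
RAZOR_FLOPS, TOTAL_FLOPS = 207, 2388


def extra_percent(rff_fj: float, dff_fj: float) -> float:
    """Extra energy of a Razor flip-flop relative to a regular DFF, in percent."""
    return (rff_fj / dff_fj - 1.0) * 100.0


static_extra = extra_percent(RFF_STATIC_FJ, DFF_STATIC_FJ)
switching_extra = extra_percent(RFF_SWITCHING_FJ, DFF_SWITCHING_FJ)
razor_share = 100.0 * RAZOR_FLOPS / TOTAL_FLOPS

print(f"extra energy, static data:    {static_extra:.0f}%")
print(f"extra energy, switching data: {switching_extra:.0f}%")
print(f"share of flip-flops that are RFFs: {razor_share:.0f}%")
```

Because fewer than one flip-flop in ten carries the RFF penalty, the per-cell overhead is diluted at chip level.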
Figure 1. (image and caption not recovered; legible labels: Latch,
Error Comparator, Meta-Detector, clk, Restore)
1-4244-0098-8/06/$20.00 ©2006 IEEE. ICICDT06
IEEE International Conference on Integrated Circuit Design and
Technology. May, 2006. Padova, Italy.
Figure 2a. Razor Flip-Flop Circuit Schematic
Figure 2b. Restore Generation Circuitry
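The detection principle behind the Razor flip-flop can be illustrated with a small behavioral sketch: the main flip-flop samples at the positive clock edge, the shadow latch samples later by the sampling delay, and a mismatch raises the error and restores the shadow value. This is a simplified model under stated assumptions, not the transistor-level circuit: setup time and metastability are ignored, and all names (RazorSample, razor_sample, the *_ns parameters) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RazorSample:
    main_value: int      # value captured by the main flip-flop
    shadow_value: int    # value captured by the shadow latch
    error: bool          # True when main and shadow disagree
    restored_value: int  # value in the flip-flop after restore


def razor_sample(old_value: int, new_value: int, arrival_ns: float,
                 clock_edge_ns: float, sampling_delay_ns: float) -> RazorSample:
    """Model one cycle of a Razor flip-flop.

    The input changes from old_value to new_value at arrival_ns.
    Assumption: data always settles before the shadow latch closes,
    which is what the minimum allowed supply voltage is meant to ensure.
    """
    shadow_close_ns = clock_edge_ns + sampling_delay_ns
    if arrival_ns > shadow_close_ns:
        raise ValueError("data arrives after the shadow latch closes")

    # The main flip-flop sees the new data only if it arrived by the edge.
    main = new_value if arrival_ns <= clock_edge_ns else old_value
    shadow = new_value
    error = main != shadow
    # On an error, restore overwrites the flip-flop with the shadow data.
    return RazorSample(main, shadow, error, shadow)


on_time = razor_sample(0, 1, arrival_ns=7.0, clock_edge_ns=8.0, sampling_delay_ns=3.5)
late = razor_sample(0, 1, arrival_ns=9.0, clock_edge_ns=8.0, sampling_delay_ns=3.5)
print(on_time.error, late.error, late.restored_value)
```

A late transition is flagged only when it changes the stored value; late data that equals the old value produces no error in this model.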
The metastability detector consists of p- and n-skewed inverters
which switch to opposite power rails under a metastable input
voltage. The detector evaluates when input node QS can be ambiguously
interpreted by its fan-out, inverter G1 and the error comparator. The
DC transfer curves (figure 3) of inverter G1, the error comparator and
the metastability detector show that the ambiguously interpreted
voltage band is contained well within the "detection" band. Table 1
gives the error detection and ambiguous interpretation bands for
different corners.

The probability that metastability propagates through the error
detection logic and causes metastability of the restore signal itself
was computed to be below 2e-30. Such an event is flagged by the fail
signal generated using double skewed flip-flops. In the rare event of
a fail, the pipeline is flushed and the supply voltage is immediately
increased. During 4 months of chip testing, this event was never
detected.
Figure 3. DC Transfer Characteristics (x-axis: Voltage of Node QS;
curves: Driver G1, Meta Comparator, Error Comparator; marked regions:
Detection Band, Ambiguous Band)

3. Processor Implementation Details
The die photograph of the Razor processor and its details are shown
in figure 4. To verify correct operation, the dcache/register file
contents were scanned and compared with a PC emulating the same code.
A 64b register records the number of errant cycles and was sampled to
compute the error rate. An internal clock unit generates an
asymmetric clock with a range from 60MHz to 400MHz in steps of 20MHz.
The shadow latch sampling delay, defined by the positive clock phase,
is configurable from 0ps to 3.5ns in steps of 500ps. The clock unit
has a separate voltage domain that is not voltage scaled. Energy
savings from Razor DVS were measured at 140 and 120MHz for 33 chips
from 2 fabrication runs.
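The clock unit settings and the error counter described in Section 3 can be made concrete with a small sketch. It enumerates the stated frequency and sampling-delay steps, computes what fraction of the clock period the positive phase occupies, and derives an error rate from a cycle counter. The function names are hypothetical, and the rule that the sampling delay must be shorter than the clock period is an assumption of this sketch, not a statement from the paper.

```python
def clock_frequencies_mhz() -> list[int]:
    """Clock settings: 60MHz to 400MHz in steps of 20MHz."""
    return list(range(60, 401, 20))


def sampling_delays_ps() -> list[int]:
    """Shadow latch sampling delays: 0ps to 3.5ns in steps of 500ps."""
    return list(range(0, 3501, 500))


def positive_phase_fraction(freq_mhz: float, delay_ps: float) -> float:
    """Fraction of the clock period taken by the positive phase."""
    period_ps = 1e6 / freq_mhz
    if delay_ps >= period_ps:
        # Assumed constraint: the positive phase must fit inside one period.
        raise ValueError("sampling delay does not fit in the clock period")
    return delay_ps / period_ps


def error_rate(errant_cycles: int, total_cycles: int) -> float:
    """Error rate from the errant-cycle counter sampled over total_cycles."""
    return errant_cycles / total_cycles


print(len(clock_frequencies_mhz()), "frequency settings")
print(len(sampling_delays_ps()), "sampling delay settings")
print(f"positive phase at 120MHz, 3.5ns: {positive_phase_fraction(120, 3500):.0%}")
print(f"error rate for 1200 errant cycles in 1.2M: {error_rate(1200, 1_200_000):.2%}")
```

At 120MHz the period is about 8.3ns, so the longest 3.5ns sampling delay corresponds to a clearly asymmetric clock rather than a 50% duty cycle.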
Corner            Ambiguous Band (V)   Detection Band (V)
Slow  1.2V  85C   0.57-0.60            0.53-0.64
Typ.  1.2V  40C   0.52-0.58            0.48-0.61
Fast  1.2V  27C   0.48-0.56            0.40-0.61
Slow  1.8V  85C   0.77-0.87            0.67-0.93
Fast  1.8V  27C   0.64-0.81            0.58-0.89
Table 1. Metastability Detector Corner Analysis

Figure 4. Die Photograph of the Chip
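For the metastability detector to be safe, every QS voltage that the fan-out could interpret ambiguously must also trigger the detector. The sketch below checks that property against the Table 1 corner values and reports the guard margin on each side. The dictionary layout and function names are illustrative only.

```python
# Table 1 values in volts: corner -> (ambiguous band, detection band).
TABLE_1 = {
    "Slow 1.2V 85C": ((0.57, 0.60), (0.53, 0.64)),
    "Typ. 1.2V 40C": ((0.52, 0.58), (0.48, 0.61)),
    "Fast 1.2V 27C": ((0.48, 0.56), (0.40, 0.61)),
    "Slow 1.8V 85C": ((0.77, 0.87), (0.67, 0.93)),
    "Fast 1.8V 27C": ((0.64, 0.81), (0.58, 0.89)),
}


def guard_margins(ambiguous, detection):
    """Return (low-side, high-side) margin of detection over ambiguous band."""
    (a_lo, a_hi), (d_lo, d_hi) = ambiguous, detection
    return a_lo - d_lo, d_hi - a_hi


def is_covered(ambiguous, detection) -> bool:
    """True when the ambiguous band lies inside the detection band."""
    low, high = guard_margins(ambiguous, detection)
    return low >= 0 and high >= 0


for corner, (ambiguous, detection) in TABLE_1.items():
    low, high = guard_margins(ambiguous, detection)
    print(f"{corner}: covered={is_covered(ambiguous, detection)}, "
          f"margins={low * 1000:.0f}mV / {high * 1000:.0f}mV")

smallest_margin = min(min(guard_margins(a, d)) for a, d in TABLE_1.values())
```

Every listed corner keeps a positive margin on both sides of the ambiguous band.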
Technology Node                          0.18µm
Max. Clock Frequency                     140MHz
DVS Supply Voltage Range                 1.2-1.8V
Total Number of Transistors              1.58 million
Die Size                                 3.3mm x 3.6mm
Measured Chip Power at 1.8V              130mW
Icache Size                              8KB
Dcache Size                              8KB
Total Number of Flip-Flops               2388
Total Number of Razor Flip-Flops         207
Number of Delay Buffers Added            2801
Error Free Operation (Simulation Results)
Standard FF Energy (Static/Switching)    49fJ/124fJ
RFF Energy (Static/Switching)            60fJ/205fJ
% Total Chip Power Overhead              2.9%
Error Correction and Recovery Overhead
Energy of a RFF per error event          260fJ
Table 2. Processor Implementation Details

4. Measurement Results
Figure 5 shows the error rates and energy gains versus supply voltage
at 120 and 140MHz for two chips. Energy at a particular voltage is
normalized with respect to the energy at the point of first failure.
For all plotted points, correct program execution with Razor error
correction was verified.

From the figure, we note that the error rate at the point of first
failure is very low because only a few of the critical paths fail to
meet the setup requirements. As voltage is scaled further into the
sub-critical regime the error rate increases exponentially. The IPC
penalty due to the error recovery cycles is negligible for error rates
below 0.1%. Under such low error rates, the recovery overhead energy
is also negligible and the total processor energy shows a quadratic
reduction with the supply voltage. At error rates exceeding 0.1%, the
recovery energy rapidly starts to dominate, offsetting the quadratic
savings due to voltage scaling. For the measured chips, the energy
optimal error rate fell at approximately 0.1%. Table 3 shows the
measured power and the energy per instruction for both chips at the
point of first failure and at the point of 0.1% error rate. At
120MHz, chip 1 consumes 104.5mW at the first failure point and 89.7mW
at an optimal 0.1% error rate, leading to 15% energy savings with
negligible IPC hit. The energy gains for chip 2 are 18%. These gains
are in addition to the energy saved by eliminating voltage margins.

120MHz, 27C   Point of First Failure        Point of 0.1% Error Rate
              Power     Energy per          Power     Energy per
                        Instruction                   Instruction
                        (Power/IPC/Freq)              (Power/IPC/Freq)
Chip 1        104.5mW   870pJ               89.7mW    740pJ
Chip 2        119.4mW   990pJ               99.6mW    830pJ
Table 3. Error Rate and Energy/Instruction at Point of First Failure
and Point of 0.1% Error Rate

Figure 5. Error Rate and Normalized Energy Measurements for Chips 1
and 2 (panels: Chip 1, Chip 2; curves at 120MHz and 140MHz; x-axis:
Voltage (in Volts))

Figure 6 shows the distribution of the first failure voltage for the
33 measured chips. The first failure voltages for chips 1 and 2 in
figure 5 are 1.63V and 1.72V, respectively, and hence represent
typical and worst case process conditions. The scatter plot shows the
correlation between the first failure voltage and the 0.1% error rate
voltage. The relative "flatness" of the linear fit indicates less
sensitivity to process variation when running at a 0.1% error rate
than at the point of first failure. The distribution of energy savings
from running at 0.1% error rate at 120MHz and 27C is shown for all
chips and ranges from 5% to 23%. The measured error rates at
different operating temperatures show an 80mV shift in the point of
first failure, from 1.46V to 1.54V, for a temperature increase from 45
to 95C, as shown in figure 7.

Figure 6. Distribution of Point of First Failure, Point of 0.1% Error
Rate and Normalized Energy across 33 measured chips (series: Lot 0,
Lot 1, Linear Fit; 120MHz, 27C)
Figure 7. Temperature Dependence of Error Rate (curves at 45C, 65C
and 95C; x-axis: Voltage)

5. Razor Energy Savings
The bar graph in figure 8 shows the energy for the two chips in
figure 5 when operating at 120MHz and 45C. The first set of bars
shows the energy when Razor is turned off and worst-case margins are
added to ensure correct operation. For chip 1, 80mV temperature
margin, 130mV process margin (compared to the worst-case chip out of
the 33 chips) and an estimated 180mV power supply margin (10% of
nominal Vdd) were added. The second and third sets of bars show the
energy when operating with Razor at the point of first failure and at
0.1% error rate.

Total energy gains for chip 1 (71mW, 44%) and chip 2 (63mW, 39%) are
comparable because the greater process margin in chip 1 (100mV
greater) is compensated by increased savings for chip 2 when scaling
below the first failure point. The distribution of the total energy
savings over the worst case at 120 and 140MHz for all 33 chips is
shown in figure 9 and shows an average savings of ...

Figure 9. Distribution of Net Energy Savings (panels: 120MHz, 140MHz;
series: Lot 0, Lot 1; x-axis: Percentage Savings)

6. Razor Voltage Control
Figure 10 shows the voltage controller, implemented in software, that
regulates the supply voltage by reacting to error rates. The
controller samples the error register and adjusts the supply voltage
to achieve a targeted error rate. The response of the voltage
controller for a 0.5% targeted error rate for a test code with
alternating high and low error rate phases is shown. The controller
settles at 1.52V at high error rate phases and at 1.45V at low error
rate phases.
Figure 10. Razor Voltage Controller Response (x-axis: Runtime Samples)

7. Conclusion
In this paper, we present a self-tuning DVS processor using
delay-error tolerant flip-flops. We obtained 44% energy savings by
eliminating voltage margins and operating at a 0.1% error rate.

References
[1] D. Ernst, et al., pp. 7-18, MICRO 36.
[2] K. Suzuki, et al., pp. 587-590, CICC 97.
[3] S. Sakiyama, et al., pp. 99-100, Symposium on VLSI Circuits, 1997.
[4] T. Burd, et al., pp. 294-295, ISSCC 2000.
[5] S. Akui, et al., pp. 64-65, ISSCC 2004.
[6] Nowka, et al., pp. 340-341, ISSCC 2002.
[7] www.mosis.org

Figure 8. Razor Energy Savings @120MHz (bar groups for chip 1 and
chip 2: measured power with supply, temperature and process margins,
160.5mW and 162.8mW; power with Razor DVS when operating at point of
first failure, 104.5mW and 119.4mW; power with Razor DVS when
operating at point of 0.1% error rate, 89.7mW and 99.6mW)
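The total savings quoted in Section 5 can be recomputed from the Figure 8 bar values, and the chip 1 margins can be compared against the quadratic energy-voltage relation mentioned in Section 4. The sketch rests on several assumptions: the with-margin powers (160.5mW and 162.8mW) are read from the figure's bar labels, the 1.63V first failure voltage of chip 1 is taken to apply under the Figure 8 conditions, and energy is assumed to scale purely with the square of the supply voltage. The names are illustrative.

```python
# Power in mW at 120MHz; "with_margins" values are read from Figure 8 labels.
POWER_MW = {
    "chip1": {"with_margins": 160.5, "first_failure": 104.5, "error_0p1": 89.7},
    "chip2": {"with_margins": 162.8, "first_failure": 119.4, "error_0p1": 99.6},
}

# Margins added for chip 1 when Razor is turned off (Section 5), in mV.
CHIP1_MARGINS_MV = {"temperature": 80, "process": 130, "power_supply": 180}


def total_savings(chip: str) -> tuple[float, float]:
    """Savings of Razor at 0.1% error rate over the margined baseline.

    Returns (saved power in mW, saved fraction in percent).
    """
    baseline = POWER_MW[chip]["with_margins"]
    razor = POWER_MW[chip]["error_0p1"]
    saved = baseline - razor
    return saved, 100.0 * saved / baseline


def quadratic_energy_ratio(v_low: float, v_high: float) -> float:
    """Energy ratio predicted by a pure V-squared dependence (assumption)."""
    return (v_low / v_high) ** 2


for chip in POWER_MW:
    saved_mw, saved_pct = total_savings(chip)
    print(f"{chip}: {saved_mw:.0f}mW saved ({saved_pct:.0f}%)")

# Worst-case voltage for chip 1: first failure voltage plus all margins.
chip1_margin_v = sum(CHIP1_MARGINS_MV.values()) / 1000.0
chip1_worst_case_v = 1.63 + chip1_margin_v
predicted = quadratic_energy_ratio(1.63, chip1_worst_case_v)
measured = POWER_MW["chip1"]["first_failure"] / POWER_MW["chip1"]["with_margins"]
print(f"chip1 first-failure/margined power: predicted {predicted:.3f}, "
      f"measured {measured:.3f}")
```

Under these assumptions the recomputed totals match the 71mW (44%) and 63mW (39%) quoted in Section 5, and the quadratic estimate for removing chip 1's 390mV of margin lands close to the measured power ratio.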