Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Christopher LaFrieda and Rajit Manohar
Computer Systems Laboratory, Cornell University


Page 1: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Christopher LaFrieda and Rajit Manohar
Computer Systems Laboratory
Cornell University

Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Page 2: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Outline

• Motivation / Background
• Contributions
    • Relaxed Quasi Delay-Insensitive (RQDI)
    • RQDI Voltage Scaling
    • RQDI Two Phase Circuits
• Results
• Summary

Page 3: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Motivation: How Does Dynamic Power Scale?

$$P_D = \alpha \, N \, C_L \, V_{dd}^2 \, f$$

• α: activity factor (1x)
• N: total number of transistors (2x)
• C_L: average load capacitance per transistor (0.7x)
• V_dd: doesn’t scale well anymore. Scaled by 17-20% from 130nm to 65nm. Scaled by 10% at 45nm and 5.5% at 32nm.

$$\frac{P_{D1}}{P_{D0}} = 1.4 \, \frac{V_{dd1}^2}{V_{dd0}^2} \, \frac{f_1}{f_0}$$
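The scaling factors on this slide (α at 1x, N at 2x, C_L at 0.7x) multiply to 1.4, so the power ratio between generations depends only on the squared V_dd ratio and the frequency ratio. The sketch below evaluates that ratio at fixed frequency. It assumes that "scaled by 10%" means a V_dd ratio of 0.90 and "scaled by 5.5%" means 0.945; the function names are illustrative and not from the slides.

```python
import math


def dynamic_power_ratio(vdd_ratio, freq_ratio=1.0,
                        activity_ratio=1.0, transistor_ratio=2.0,
                        cap_ratio=0.7):
    """Return P_D1 / P_D0 for one process generation.

    Follows P_D = alpha * N * C_L * Vdd^2 * f, with each argument
    given as the new-to-old ratio of the corresponding term.
    """
    return (activity_ratio * transistor_ratio * cap_ratio
            * vdd_ratio ** 2 * freq_ratio)


def vdd_ratio_for_constant_power(freq_ratio=1.0, structural_ratio=1.4):
    """Vdd ratio that would keep dynamic power flat across a generation."""
    return math.sqrt(1.0 / (structural_ratio * freq_ratio))


# Assumption: "scaled by 10%" is read as a Vdd ratio of 0.90 (45nm).
ratio_45nm = dynamic_power_ratio(vdd_ratio=0.90)
# Assumption: "scaled by 5.5%" is read as a Vdd ratio of 0.945 (32nm).
ratio_32nm = dynamic_power_ratio(vdd_ratio=0.945)
# Vdd ratio needed to hold power constant at fixed frequency.
flat_power_vdd = vdd_ratio_for_constant_power()

print(f"45nm: {ratio_45nm:.3f}x, 32nm: {ratio_32nm:.3f}x, "
      f"flat-power Vdd ratio: {flat_power_vdd:.3f}")
```

Under these assumptions dynamic power grows at fixed frequency in both generations, because holding it flat would need V_dd to drop by roughly 15% per generation.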

Page 4: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Motivation: Power Scaling With Fixed Frequency

Page 5: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Motivation: Process Variations Getting Worse

Process variation in 65nm, FO4 delays across corners:

| FF Corner | TT Corner | SS Corner |
|-----------|-----------|-----------|
| 13.6 ps   | 18.2 ps   | 22.6 ps   |

• FF is 70% faster than SS.
• Circuits need to be robust w.r.t. process variations.
• QDI is a logical place to start.

Page 6: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Background: QDI – WCHB Buffer
• Simple buffer.
• Neutrality is checked in the pull-up stack of the C-element.
• Timing assumption?

Page 7: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI: Staticizer Timing Assumption I
• Data is neutral and enable is high.

Page 8: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI: Staticizer Timing Assumption II
• Data is neutral and enable is high.
• Data becomes valid, which sets _R0 low. If the R0 inverter is slow, R0 will remain low.

Page 9: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI: Staticizer Timing Assumption III
• Data is neutral and enable is high.
• Data becomes valid, which sets _R0 low. If the R0 inverter is slow, R0 will remain low.
• Nothing is fighting the weak feedback, so _R0 can go high.

Page 10: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI: Half Cycle Timing Assumption
The half cycle timing assumption (HCTA): a small amount of combinational logic (1-2 transitions) will always switch within one half cycle of a process.

• There is a 4.5x timing margin (at 18 transitions per cycle).
• With worst case corners, the margin is 2.7x in 65nm.
• Wire delays make the assumption even more conservative.
• QDI has an HCTA in staticizers. RQDI allows them everywhere.
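The margins quoted for the HCTA can be reproduced with simple arithmetic. The sketch below is one reading that happens to match the slide's numbers, not a derivation taken from the slides. It assumes the margin is the half-cycle length in transitions divided by the length of the assumed logic path (2 transitions). For the worst case it assumes the logic path sits at the slowest corner while the rest of the cycle sits at the fastest, using the two extreme 65nm FO4 delays quoted earlier (13.6 ps and 22.6 ps).

```python
def hcta_margin(transitions_per_cycle, logic_transitions, slowdown=1.0):
    """Timing margin of a half-cycle timing assumption.

    transitions_per_cycle: total gate transitions in one full cycle.
    logic_transitions: transitions in the logic path assumed to finish.
    slowdown: how much slower the logic path is than the rest of the
              cycle (1.0 means both run at the same process corner).
    """
    half_cycle = transitions_per_cycle / 2
    return half_cycle / (logic_transitions * slowdown)


# Nominal case: 18 transitions per cycle, 2-transition logic path.
nominal_margin = hcta_margin(18, 2)

# Assumed worst case: logic at the slowest corner, cycle at the fastest.
corner_slowdown = 22.6 / 13.6
worst_case_margin = hcta_margin(18, 2, slowdown=corner_slowdown)

print(f"nominal: {nominal_margin:.1f}x, worst case: {worst_case_margin:.1f}x")
```

Under these assumptions the nominal margin is 9 / 2 = 4.5x and the cross-corner margin rounds to 2.7x.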

Page 11: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI: HCHB Template
• N tracks neutrality.
• Check N+, but assume N- happens in the first half cycle.
• Two transition latency.
• 14 transition cycle time.
• Validity must be checked by the pull-down.

Page 12: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI Voltage Scaling: Scaling Scenarios
• Two possible scenarios for voltage scaling.
• Top: mismatched slack. Lower pipeline can run slower.
• Bottom: token limited loop. Latency through loop should be minimal, but cycle time can scale.
• In some applications these can’t be avoided.

[Figure: mismatched slack (top) and token limited loop (bottom)]

Page 13: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI Voltage Scaling: Slack Mismatch In An FPGA
• Logic blocks (LB) for logic.
• Switch boxes (SB) for routing.
• Limited routing resources.
• Imperfect slack matching.
• Can scale voltage on blue path.

Page 14: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI Voltage Scaling: DVHB: Dual Voltage Template
• Data rails are full swing.
• Acknowledges are low swing.
• Latency remains constant through voltage scaling.
• Cycle time can be adjusted through voltage scaling.

Page 15: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI Two Phase Circuits: Two Phase Buffer (HCFB2P)
• An HCTA exists on the right pair of XORs.
• Two transition latency.
• Seven transition cycle time.
• Twice the area of a WCHB. However, it can replace two stages.

Page 16: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI Two Phase Circuits: Two Phase In An FPGA
• Replace routing (SB) with two phase logic.
• Logic (LB) remains four phase.
• Phase converters are placed around logic blocks.
• Routing makes up over half the area in an asynchronous FPGA, so power savings can be large.

[Figure: width N switch]
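The power argument for two phase routing rests on handshake wires toggling less often per token. The sketch below counts wire transitions for a generic request/acknowledge pair under each protocol. It is a simplified model, assuming a single request wire and a single acknowledge wire rather than the dual-rail data encoding of the actual circuits; `count_handshake_transitions` is a hypothetical helper.

```python
def count_handshake_transitions(tokens, phases):
    """Count wire transitions needed to transfer `tokens` tokens.

    phases=4: return-to-zero handshake (req up, ack up, req down, ack down).
    phases=2: transition signaling (req toggles, ack toggles).
    Returns (total transitions, final (req, ack) wire levels).
    """
    if phases not in (2, 4):
        raise ValueError("phases must be 2 or 4")
    req = ack = 0
    transitions = 0
    for _ in range(tokens):
        for _ in range(phases // 2):
            req ^= 1          # sender toggles the request wire
            transitions += 1
            ack ^= 1          # receiver toggles the acknowledge wire
            transitions += 1
    return transitions, (req, ack)


four_phase_count, four_phase_state = count_handshake_transitions(5, phases=4)
two_phase_count, two_phase_state = count_handshake_transitions(5, phases=2)

print(four_phase_count, four_phase_state)
print(two_phase_count, two_phase_state)
```

In this model the four phase protocol returns both wires to zero after every token, while the two phase protocol leaves them at whatever level the last toggle produced and uses half as many transitions per token.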

Page 17: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI Two Phase Circuits: Converters
• Need to convert between two phase (for routing) and four phase (for logic).
• The 4:2 converter is 3x larger than a WCHB.
• The 2:4 converter is 3.25x larger than a WCHB.

Page 18: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Experimental Setup
• Simulated in HSpice with a 65nm bulk technology.
• Circuits are sized to the drive strength of a 20/10 lambda inverter.

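| Name | Description   | Inputs | Outputs | Implies Validity? |
|------|---------------|--------|---------|-------------------|
| and2 | And           | 2      | 1       | No                |
| or2  | Or            | 2      | 1       | No                |
| xor2 | Exclusive Or  | 2      | 1       | Yes               |
| fa   | Full Adder    | 3      | 2       | Yes               |
| benc | Booth Encoder | 3      | 2       | No                |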
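The "Implies Validity?" column can be read as asking whether all outputs being valid guarantees that all inputs are valid. The sketch below checks that property by brute force for four of the benchmarks. It assumes early-evaluating dual-rail style gates, in which an output becomes valid as soon as the valid inputs determine it. The slides do not give the gate-level logic, so these functions are illustrative models, and the Booth encoder is left out because its logic is not specified.

```python
from itertools import product

NEUTRAL = None  # a dual-rail input or output that is not yet valid


def and2(a, b):
    if a == 0 or b == 0:
        return (0,)            # one 0 decides the output early
    if a == 1 and b == 1:
        return (1,)
    return (NEUTRAL,)


def or2(a, b):
    if a == 1 or b == 1:
        return (1,)            # one 1 decides the output early
    if a == 0 and b == 0:
        return (0,)
    return (NEUTRAL,)


def xor2(a, b):
    if a is NEUTRAL or b is NEUTRAL:
        return (NEUTRAL,)      # both inputs are always needed
    return (a ^ b,)


def fa(a, b, c):
    ins = (a, b, c)
    total = NEUTRAL if NEUTRAL in ins else a ^ b ^ c
    # Carry is a majority function, so two agreeing inputs decide it.
    carry = 1 if ins.count(1) >= 2 else 0 if ins.count(0) >= 2 else NEUTRAL
    return (total, carry)


def implies_validity(gate, n_inputs):
    """True if valid outputs never occur while some input is neutral."""
    for ins in product((NEUTRAL, 0, 1), repeat=n_inputs):
        outputs_valid = all(o is not NEUTRAL for o in gate(*ins))
        inputs_valid = all(i is not NEUTRAL for i in ins)
        if outputs_valid and not inputs_valid:
            return False
    return True


results = {name: implies_validity(gate, n)
           for name, gate, n in [("and2", and2, 2), ("or2", or2, 2),
                                 ("xor2", xor2, 2), ("fa", fa, 3)]}
print(results)
```

With these models, and2 and or2 fail the check because one controlling input is enough to produce a valid output, while xor2 and fa pass, which agrees with the table.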

Page 19: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Results: HCHB – Energy Per Cycle
• HCHB consumes 32% less energy than PCHB.
• HCHB consumes 36% less energy than PCEHB.
• Slight frequency improvement.
• Negligible latency penalty.

Page 20: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Results: HCHB – Total Transistor Area
• Despite the additional transistors to check validity, HCHB is smaller.
• HCHB is about 20% smaller than PCHB.
• HCHB is about 15% smaller than PCEHB.

Page 21: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Results: DVHB – Low Voltage vs. Dual Voltage

Page 22: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Results: HCFB2P Switch – Energy Reduction vs. WCHB

• Wider switches mean larger MUXes and larger phase converters (PCs).
• The associated caps switch half as much.
• Over 50% reduction in power, due to replacing two stages.

Page 23: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

RQDI Two Phase Circuits: Results – Area Overhead
• Typically, there are about 8 stages of 4-wide switches between logic blocks.
• Area overhead is 15%.
• With direct connections, there are about 10 stages with an overhead of 10%.

Page 24: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Summary
• RQDI allows half cycle timing assumptions outside of staticizers.
• With RQDI, we can simplify the PCHB logic template. The resulting template, HCHB, consumes 32% less energy.
• The dual voltage logic template can be used to adjust the dynamic slack of a stage. This allows us to save energy with a minimal throughput penalty in token limited loops.
• Replacing the routing in an FPGA with two phase logic can reduce energy consumption by 50%. Using the RQDI two phase buffer and converters will achieve this with a 10-15% area overhead.

Page 25: Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Questions?