-1-

Accuracy-Configurable Adder for Approximate Arithmetic Designs

Andrew B. Kahng, Seokhyeong Kang
VLSI CAD Laboratory, UC San Diego

49th Design Automation Conference, June 6th, 2012

-2-

Outline

Background and Motivation
Accuracy Configurable Adder Design
Experimental Setup and Results
Conclusions and Ongoing Works

-3-

Why Approximate Designs?

Threats to the traditional IC design approach:
  Extreme variations: PVT variation and uncertainty lead to design overhead
  Reliability issues: hard errors (NBTI, latchup) and soft errors (α-particle)
  Cost: the cost (power/performance) of perfect accuracy is too high

Approximate designs: relaxing the requirement of correctness can dramatically reduce the cost of the design

What is the square root of 10?
  “a little more than three”
  “3.162278...”

Approximation could be faster and more powerful


-4-

Previous Approximate Adders

Lu et al., IEEE Computer 2004
  Faster adder with a shorter carry chain
  High performance with a small error rate
  Large area overhead: not applicable to low-energy design

Zhu et al., TVLSI 2010
  ETAI: accurate part + inaccurate part
  Reduces error size
  Error rate is high

Output accuracy is fixed, so benefits can be limited by the required accuracy

-5-

Our Work: Accuracy-Configurable Approximate Adder

How power benefits can be achieved:

[Figure: normalized power (accurate design = 1.0) versus time, with the required accuracy changing over time (80%, 100%, 90%, 80%). Curves: accurate design, accuracy-configurable design. Annotations: "event occurred", "accurate mode", "approximate mode".]

Accuracy-configurable design adapts to changing requirements by using a different mode in each situation

-6-

Our Work: Accuracy-Configurable Approximate Adder

How power benefits can be achieved:

[Figure: normalized power (accurate design = 1.0) versus time, annotated with "event occurred", "accurate mode" and "approximate mode".]

Accuracy-configurable approximate adder: approximate adder, followed by error correction (ECC-1) and error correction (ECC-2)

Mode 1: turn off ECC-1 and ECC-2 (accuracy: 90%)
Mode 2: turn off ECC-2 (accuracy: 95%)
Mode 3: turn on all ECC (accuracy: 100%)


-8-

Approximate Adder Implementation

16-bit adder case

[Figure: three overlapping 8-bit sub-adders operate on A[15:0] and B[15:0], with AH = A[15:8], AM = A[11:4], AL = A[7:0]. AL+BL produces SUML (SUM[7:4], SUM[3:0]), AM+BM produces SUMM (SUM[11:8]), and AH+BH produces SUMH (SUM[15:12]) with the carry SUM[16].]

Carry chain is cut to reduce critical path delay
Sub-adders generate results of partial summation
Middle sub-adder improves accuracy (error: 50% → 5.5%)
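The 16-bit case can be mimicked in software. The following is a minimal Python sketch, not the authors' implementation. The function name `aca16` is hypothetical, and the sketch assumes that the low sub-adder supplies SUM[7:0], the middle sub-adder supplies SUM[11:8], and the high sub-adder supplies SUM[15:12] plus the carry SUM[16], as the figure labels suggest.

```python
def aca16(a: int, b: int) -> int:
    """Approximate 16-bit addition with three overlapping 8-bit sub-adders.

    Each sub-adder starts with carry-in 0, so a carry that would have to
    travel further than one sub-adder is lost (assumed behaviour).
    """
    lo = (a & 0xFF) + (b & 0xFF)                 # AL + BL, operand bits [7:0]
    mid = ((a >> 4) & 0xFF) + ((b >> 4) & 0xFF)  # AM + BM, operand bits [11:4]
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF)   # AH + BH, operand bits [15:8]

    sum_l = lo & 0xFF          # SUM[7:0]: all 8 bits of the low sub-adder
    sum_m = (mid >> 4) & 0xF   # SUM[11:8]: upper 4 bits of the middle sub-adder
    sum_h = (hi >> 4) & 0x1F   # SUM[16:12]: upper 4 bits of the high sub-adder + carry
    return (sum_h << 12) | (sum_m << 8) | sum_l


print(hex(aca16(0x1234, 0x1111)))  # 0x2345, equal to the exact sum
print(hex(aca16(0x0FFF, 0x0001)))  # 0xf00, whereas the exact sum is 0x1000
```

The second call shows the failure case: the carry generated in the lowest bits would have to ripple through more positions than a single sub-adder covers, so it never reaches SUM[12].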

-9-

Approximate Adder Implementation

N-bit adder case

Probability of correct result:

$$P(N,k) = \left(1 - \frac{1}{2^{k}} \cdot \frac{2^{k}-1}{2^{k+1}}\right)^{N/k-2}$$

Approximate adder can be configured with “k”

[Figure: N-bit approximate adder built from overlapping sub-adders. Operand slices A[N-1:N-k], A[N-k-1:N-2k], A[N-2k-1:N-3k], ... (and the same slices of B) feed the sub-adders, which produce SUM[N-1:N-k], SUM[N-k-1:N-2k], SUM[N-2k-1:N-3k], ... and the carry.]

N: bit width, k: ½ carry-chain depth

Estimation over CLA (N=16)

k                  2      3      4      5      6
Min. clock cycle   0.5    0.65   0.75   0.83   0.89
Area               0.87   1.05   1.12   1.15   1.12
Power              0.44   0.68   0.84   0.95   1.00
Pass rate          0.554  0.829  0.942  0.982  0.995
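As a quick numerical check, the pass-rate row of the table can be reproduced from the probability of a correct result. The Python sketch below uses a hypothetical function name, `pass_probability`, and assumes a per-sub-adder error probability of (1/2^k)·((2^k − 1)/2^(k+1)) raised over N/k − 2 sub-adders. That is a reading of the slide's formula chosen because it matches the table, not a confirmed transcription.

```python
def pass_probability(n: int, k: int) -> float:
    """Probability that an N-bit approximate adder with parameter k is exact.

    Assumed model: a sub-adder is wrong when its lower k bits all propagate
    (probability 1/2^k) and a carry arrives from the k bits below them
    (probability (2^k - 1)/2^(k+1)).
    """
    per_sub_adder_error = (1 / 2**k) * ((2**k - 1) / 2**(k + 1))
    return (1 - per_sub_adder_error) ** (n / k - 2)


for k in range(2, 7):
    print(k, round(pass_probability(16, k), 3))
# 2 0.554
# 3 0.829
# 4 0.942
# 5 0.982
# 6 0.995
```

Under this reading, a larger k lengthens the carry chain in each sub-adder and raises the pass rate, in line with the trend of the clock-cycle and power rows.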

-10-

Error Detection and Correction

[Figure: approximate adder (sub-adder i, sub-adder i+1) between IN and OUT, producing SUMapprox, and an EDC circuit with an incrementor producing SUMcorrect. Signals shown: sum_i, carry_i+1, error_i, error, data stall.]

Errors can be detected and corrected with small overhead
  Error detection: AND gates
  Error correction: incrementor circuit

Error detection and correction can take more time than the critical path delay of a sub-adder, so the throughput can be reduced

Variable-latency operation
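The detection and correction idea can be sketched for the 16-bit, three-sub-adder case. This Python sketch is an illustration under stated assumptions, not the circuit from the slides. The function name `aca16_with_edc` is hypothetical. The assumed detection rule is that a sub-adder result is wrong when a carry arrives at its lowest bit and its lower four bits all propagate (an AND of the propagate signals and the incoming carry). The assumed correction is a +1 on the affected sum segment (the incrementor).

```python
def aca16_with_edc(a: int, b: int):
    """Return (approximate sum, corrected sum, (err_mid, err_high))."""
    lo = (a & 0xFF) + (b & 0xFF)                 # AL + BL
    mid = ((a >> 4) & 0xFF) + ((b >> 4) & 0xFF)  # AM + BM
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF)   # AH + BH

    sum_l = lo & 0xFF          # SUM[7:0]
    sum_m = (mid >> 4) & 0xF   # SUM[11:8]
    sum_h = (hi >> 4) & 0x1F   # SUM[16:12]
    approx = (sum_h << 12) | (sum_m << 8) | sum_l

    prop = a ^ b                                 # per-bit propagate signals
    carry_into_4 = ((a & 0xF) + (b & 0xF)) >> 4  # carry ignored by the middle sub-adder
    carry_into_8 = lo >> 8                       # carry ignored by the high sub-adder

    # Error detection (assumed rule): ignored carry AND four propagating bits.
    err_mid = carry_into_4 == 1 and ((prop >> 4) & 0xF) == 0xF
    err_high = carry_into_8 == 1 and ((prop >> 8) & 0xF) == 0xF

    # Error correction: increment the affected segment.
    if err_mid:
        sum_m = (sum_m + 1) & 0xF
    if err_high:
        sum_h = (sum_h + 1) & 0x1F
    corrected = (sum_h << 12) | (sum_m << 8) | sum_l
    return approx, corrected, (err_mid, err_high)


approx, corrected, flags = aca16_with_edc(0x0FFF, 0x0001)
print(hex(approx), hex(corrected), flags)  # 0xf00 0x1000 (True, True)
```

In this sketch the corrected value always equals the exact sum, and an error flag is raised exactly when the approximate sum is wrong.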

-11-

Accuracy Configuration with Pipeline

[Figure: four-stage pipeline. Stage 1: the approximate adder takes A and B and produces SUM = S3 S2 S1 S0. Stage 2: errors on S1, correction on S1. Stage 3: errors on S2, correction on S2. Stage 4: errors on S3, correction on S3, giving SUMcorrect. After each stage, more of the low-order segments are correct and the remaining segments are still approximate.]

Config.   Power-gating     Accuracy   Power reduction
Mode-1    None             1.000      -11.5%
Mode-2    Stage 4          0.960      12.4%
Mode-3    Stage 3, 4       0.925      31.0%
Mode-4    Stage 2, 3, 4    0.900      51.6%

Each stage generates a result with different accuracy

Can turn off later stages with power gating according to accuracy requirement

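Choosing a mode for a given accuracy requirement can be illustrated with the values from the mode table. The Python sketch below is hypothetical. The names `Mode`, `MODES` and `select_mode` are not from the slides, and the selection policy (largest listed power reduction among the modes that still meet the requirement) is an assumption about how the table would be used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    gated_stages: tuple     # pipeline stages that are power-gated
    accuracy: float         # accuracy listed in the table
    power_reduction: float  # power reduction listed in the table, as a fraction


# Values copied from the mode table.
MODES = (
    Mode("Mode-1", (), 1.000, -0.115),
    Mode("Mode-2", (4,), 0.960, 0.124),
    Mode("Mode-3", (3, 4), 0.925, 0.310),
    Mode("Mode-4", (2, 3, 4), 0.900, 0.516),
)


def select_mode(required_accuracy: float) -> Mode:
    """Pick the feasible mode with the largest power reduction (assumed policy)."""
    feasible = [m for m in MODES if m.accuracy >= required_accuracy]
    if not feasible:
        raise ValueError("no mode meets the required accuracy")
    return max(feasible, key=lambda m: m.power_reduction)


print(select_mode(0.95).name)  # Mode-2
print(select_mode(0.90).name)  # Mode-4
```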



-13-

Experimental Setup and Metrics

Experimental Setup
  Library: TSMC 65GP
  Implementation: Synopsys Design Compiler
  Simulation: Cadence NC-SIM
  Input patterns: random data and actual data
  Library preparation: Cadence Library Characterizer

Accuracy Metrics

Metric    Definition        Data type
ACCamp    1 - |Rc-Re|/Rc    Amplitude data
ACCinf    1 - Be/Bw         Information data

Rc and Re: correct and obtained results
Be: number of error bits, Bw: bit-width of data
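The two metrics are simple to compute. The Python sketch below is an illustration with hypothetical function names, `acc_amp` and `acc_inf`. It assumes that Be counts the bit positions in which the obtained result differs from the correct one; the slides do not spell this out.

```python
def acc_amp(rc: int, re: int) -> float:
    """ACCamp = 1 - |Rc - Re| / Rc, for amplitude data (Rc must be non-zero)."""
    if rc == 0:
        raise ValueError("ACCamp is undefined for Rc = 0")
    return 1 - abs(rc - re) / rc


def acc_inf(rc: int, re: int, bw: int) -> float:
    """ACCinf = 1 - Be / Bw, for information data.

    Be is taken as the number of differing bits within the Bw-bit word
    (an assumption about what "error bits" means).
    """
    mask = (1 << bw) - 1
    error_bits = bin((rc ^ re) & mask).count("1")
    return 1 - error_bits / bw


# Correct result 0x1000, obtained result 0x0F00, 16-bit data.
print(acc_amp(0x1000, 0x0F00))      # 0.9375
print(acc_inf(0x1000, 0x0F00, 16))  # 0.6875
```

The example shows why two metrics are used: the same error is small in amplitude terms (256 out of 4096) but flips five of the sixteen bits.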

-14-

Approximate Adder Comparison

Accuracy vs. power consumption

Image smoothing (Gaussian filter)

[Figure: six image panels]
(a) Original image
(b) Accurate adder
(c) ACA (PSNR 24.5 dB)
(d) ETAI (25.3 dB)
(e) ETAII (16.2 dB)
(f) LU (11.1 dB)

(c)~(f) have 50% of the power of the accurate adder (b)

* ETAI cannot detect and correct errors

-15-

Approximate Adder Comparison

Accuracy vs. power consumption w/ voltage scaling

[Figure: two plots of accuracy (0.400 to 1.000) versus total power (W, 2.00E-04 to 8.00E-04) under voltage scaling (1.0V~0.6V). First plot: ACCamp for ACA adder, CLA, Lu's adder and ETAI. Second plot: ACCinf for ACA adder, CLA, Lu's adder, ETAI and ETAIIM.]

ACA adder shows fine results (accuracy vs. power) on both ACCamp and ACCinf metrics

-16-

Accuracy Configuration and Power Saving

Power saving from voltage scaling + mode change

[Figure: total power consumption (W, 0 to 4.00E-03) versus ACCinf (0.80 to 1.00) for the 4-stage 32-bit adder case. Series: conventional pipelined adder and ACA adder (mode 1 to mode 4). Annotations: "accurate result", "voltage scaling", "mode change", "4X reduction", "Accuracy: 1.0 → 0.9".]

Accuracy configuration w/ mode change is more effective than w/ voltage scaling

-17-

Accuracy Configuration and Power Saving

Power consumption when accuracy requirement is varying (w/ SPEC 2006 benchmarks)

[Figure: normalized power consumption (0 to 1) per benchmark (astar, bzip2, calculix, gcc, h264ref, mcf, sjeng, soplex), broken down by mode-1 to mode-4. The accuracy range is annotated as 0.95 to 1.00 (high accuracy).]

$$\text{Accuracy} = 1 - \text{Avg}\left(\frac{|\text{result} - \text{reference}|}{\text{reference}}\right)$$

Average 30% power savings over no accuracy configuration



-19-

Conclusions and Ongoing Works

Conclusions
  We proposed an accuracy-configurable approximate (ACA) adder, which can adapt to changing accuracy requirements
  ACA can provide 30% power reduction with accuracy configuration during runtime

Ongoing Works
  Accuracy-configurable design for other arithmetic units (multiplier, divider)
  Automated synthesis flow (minimize power under the required accuracy)

[Figure: synthesis flow with RTL and required accuracy as inputs, exact adder and approximate adder, synthesis, and accuracy estimation.]

-20-

Thank You!

-21-

Accuracy-Configurable Approximate Design

Required accuracy can change during runtime
  Idea of High-Efficiency Math highlighted by Intel Labs at ISSCC-2012
  Variable-precision floating point unit w/ accuracy tracking: 24-bit → 12-bit → 6-bit as needed

[Figure: normalized power (accurate design = 1.0) versus time, annotated with "event occurred", "accurate mode" and "approximate mode"; label: variable-precision mantissa.]

Accuracy-configurable design adapts to changing requirements, maximizing benefits of approximate design paradigm
