Approximating to the Last Bit

Thierry Moreau, Adrian Sampson, Luis Ceze {moreau, luisceze}@cs.washington.edu, asampson@cornell.edu

WAX 2016 co-located with ASPLOS 2016

April 3rd 2016

What this Talk is About

How many bits in a program are really that important?

1 - AXE: Quality Tuning Framework

2 - PERFECT Benchmark Study

Precision Tuning

More precision means larger memory footprint, more data movement, more energy used in computation

Precision Tuning

doublefloat

Precision Tuning

doublefloat

AXE Precision Tuning Framework

Goal: Maximize Bit-Savings given a Quality Target

kernel.c

quality target

AXE framework instruction-level

precision requirements

quality &bit-savings

Built on top of ACCEPT, the approximate C/C++ compiler http://accept.rocks

quality

bit-savings

bad OK

instruction 0instruction 1instruction 2…instruction n-1instruction n

Default (no bit-savings)

Coarse-Grained Precision Reduction

quality

bit-savings

bad OK

Fine-Grained Precision Reduction

quality

bit-savings

bad OK

PERFECT Benchmark SuiteApplication Domain Kernels Metric

PERFECT Application 1Discrete Wavelet

Transform

Signal to Noise Ratio(SNR)

[120dB to 10dB] (0.0001% to 31.6% MSE)

2D ConvolutionHistogram Equalization

Space Time Adaptive Processing

Outer ProductSystem SolveInner Product

Synthetic Aperture RadarInterpolation 1Interpolation 2Back Projection

Wide Area Motion Imaging

DebayerImage RegistrationChange Detection

Required Kernels FFT 1DFFT 2D

1 - PERFECT Dynamic Instruction Mix

load/store 27%

int arith 4%

fp arith 31%

math 1%

int arith 25%

control 11%

Safe to approximatePrecise

Long latency ops are all safe to approximate

fp arith 31%

math 1%

load/store 27%

Memory ops are mostly safe to approximate

(mostly data vs. pointers)

int arith 25%

control 11%

Control and address computation must

remain precise

2 - Bit-Savings over Approximate Instructions

Average SNR (dB)10 20 40 60 80 100 120

26%32%

40%48%

74%83%

High QualityApproximate

Average SNR (dB)10 20 40 60 80 100 120

26%32%

40%48%

74%83%

PERFECT Manual 0.001% MSE

Average SNR (dB)10 20 40 60 80 100 120

26%32%

40%48%

74%83%

PERFECT Manual 0.001% MSE

Approximate Computing 10% MSE

Future Architectural Challenges

Mechanisms to translate bit-savings into energy savings?

New data types/representations?

ISA extensions?

Thank You!

Thierry Moreau, Luis Ceze, Adrian Sampson {moreau, luisceze}@cs.washington.edu, asampson@cornell.edu

WAX 2016 co-located with ASPLOS 2016

April 3rd 2016

Approximating to the Last Bit

Backup Slides

Bit Savings

Explore the opportunity for precision reduction in a hardware-agnostic way

BitSavings =X

insnstatic

(precisionref

� precision

approx

precision

⇥ execs

Framework Overview

Built on top of ACCEPT, the approximate C/C++ compiler http://accept.rocks

AnnotatedProgram

Program Inputs &Quality Metrics

ACCEPTstatic analysis ILPC*

ACCEPTerror injection & instrumentation

ApproximateBinary

Execution & Quality

Assessment

Quality AutotunerOutput Configuration

Quality Results& Bit Savings

* Instruction-level Precision Configuration

Program Annotation

void conv2d (pix *in, pix *out, flt *filter){ for (row) { for (col) { flt sum = 0 int dstPos = … for (row_offset) { for (col_offset) { int srcPos = … int fltPos = … sum += in[srcPos] * filter[fltPos] } } out[dstPos] = sum / normFactor } }}

AnnotatedProgram

ApproximateBinary

Execution & Quality

Assessment

Program Annotation

void conv2d (APPROX pix *in, APPROX pix *out, APPROX flt *filter){ for (row) { for (col) { APPROX flt sum = 0 int dstPos = … for (row_offset) { for (col_offset) { int srcPos = … int fltPos = … sum += in[srcPos] * filter[fltPos] } } out[dstPos] = sum / normFactor } }}

AnnotatedProgram

ApproximateBinary

Execution & Quality

Assessment

Key: use the APPROXtype qualifier

Program Annotation

Takeways: Annotating data is intuitive (~10 mins to annotate a kernel) Variables used to index arrays cannot be safely approximated

typedef float flttypedef int pix

typedef APPROX float flttypedef APPROX int pix

tips on annotating programs faster

AnnotatedProgram

ApproximateBinary

Execution & Quality

Assessment

Static Analysis

AnnotatedProgram

ApproximateBinary

Execution & Quality

Assessment

Instruction-Level Precision Configuration (ILPC)

conv2d:13:7:load:Int32 conv2d:13:10:load:Float conv2d:13:11:fmul:Float conv2d:13:12:fadd:Float conv2d:15:1:fdiv:Float conv2d:15:7:store:Int32

ACCEPT identified safe-to-approximate instructions from data annotations using flow analysis

void conv2d (APPROX pix *in, APPROX pix *out, APPROX flt *filter){ for (row) { for (col) { APPROX flt sum = 0 int dstPos = … for (row_offset) { for (col_offset) { int srcPos = … int fltPos = … sum += in[srcPos] * filter[fltPos] } } out[dstPos] = sum / normFactor } }}

ACCEPT

Approximate Binary

AnnotatedProgram

ApproximateBinary

Execution & Quality

Assessment

Error Injection

Instruction-Level Precision Configuration (ILPC)

conv2d:13:7:load:Int32 conv2d:13:10:load:Float conv2d:13:11:fmul:Float conv2d:13:12:fadd:Float conv2d:15:1:fdiv:Float conv2d:15:7:store:Int32

Each instruction in the ILCP acts as a quality knob that the autotuner can use to maximize bit-savings

ACCEPT

AnnotatedProgram

ApproximateBinary

Execution & Quality

Assessment

Quality Assessment

The programmer provides a quality assessment script to evaluate quality on the program output

Reference Binary

Approximate Binary

eval.py

10dB SNR

AnnotatedProgram

ApproximateBinary

Execution & Quality

Assessment

Autotuner

config k: error = 0.10%

config [k+1, i-1]: error = 5.91%

config [k+1, i]: error = 0.30%

config [k+1, i+1]: error = 0.12%

config [k+2, i-1]: error = 5.91%

config [k+2, i]: error = 0.33%

config [k+2, i+1]: error = 1.6%

Greedy iterative algorithm: reduces precision requirement of the instruction that impacts quality the least

Finds solution in O(m2n) worst case where m is the number of static safe-to-approximate instructions and n are the levels of precision for all instructions

AnnotatedProgram

ApproximateBinary

Execution & Quality

Assessment

Autotuner

precise60dB40dB

10dB The autotuner greedily maximizes bit-savings as the quality target is lowered

Precision “Guarantees”

Currently empirically derived and input dependent

Future work would extend on the current infrastructure to assimilate data dependence

information in order to derive formal error guarantees

2d-convdwthist-eqoutersysteminnerinterp1interp2bp debayerlucas-kanadechange-detfft-1dfft-2dAVERAGE

prec_controlprec_int_arithprec_memappr_mathappr_fp_arithappr_int_arithappr_mem

PERFECT Benchmark SuiteApplication Domain Kernels Metric

PERFECT Application 1Discrete Wavelet

Transform

SNR [120dB to 10dB]

2D ConvolutionHistogram Equalization

Space Time Adaptive Processing

Outer ProductSystem SolveInner Product

Synthetic Aperture RadarInterpolation 1Interpolation 2Back Projection

Wide Area Motion Imaging

DebayerImage RegistrationChange Detection

Required Kernels FFT 1DFFT 2D

10 log10

PNk=1 |rk|2PN

k=1 |rk � ak|2

N : number of output elements

rk: reference value of element ktk: approximate value of element k

nv dwt

histeq ou

system

solve inn

interp

erp2 bp

AVERAGE

10 20 40 60 80 100 120

You don’t need a lot of bits to obtain an acceptable output!36

Average SNR (dB)10 20 40 60 80 100 120

int arith fp arith mem ops math AGGREGATE

26%32%

83%83%

40%32%

Architectural Target

Core Energy Breakdownoverheads

compute

General Purpose CPU

compute

Vector Processor*

Accelerators

specialization

the smaller the overheads, the larger the potential gains

* [Quora, Venkataramani et al., MICRO2013]

Precision ScalingMechanisms for precision scalability:

• Fine-grained ALU power gating*

• Bit-sliced ALU units

• Lossy Compression

Energy Savings

Bit-Savings

* [Quora, Venkataramani et al., MICRO2013]

Approximating to the Last Bit -...

Documents

Approximating the Normal Tail

approximating curves

Kernels Review

Approximating Power Indices

Approximating Parameterized Convex Optimization Problemstheinf2.informatik.uni-jena.de/theinf2_multimedia/Publications/... Approximating Parameterized Convex Optimization Problems

Interpolation - 2D mapping Tutorial 1: triangulationstockage.univ-brest.fr/~herbette/...interpolation... · Interpolation - 2D mapping 1 dimension Piecewise Cubic Hermite Interpolation

Approximating areas under curves - global2methods.global2.vic.edu.au/.../08/Approximating-areas-under-curves... · Approximating areas under curves • Area under curves ... gradient,

Exploiting Structure in Data - University of Isfahanengold.ui.ac.ir/~sabahi/Advanced digital signal... · 2016-05-08 · DSP Reconstruction filter Analog Sampling Kernels DSP Interpolation

Extensible Kernels

Approximating Bayes Factors

Approximating the extinction threshold of spatial dynamics ...xiangshengw/talks/slide201107-Alberta.pdf · Approximating the extinction threshold of ... Approximating the extinction

mugberiaopac.aadijatechnologies.commugberiaopac.aadijatechnologies.com/opac-admin/images/collegemid... · Stirling's interpolation formula Interpolation by Iteration (Aitken's Interpolation)

rational approximations - arXiv · We review the issues with RBF interpolation using at kernels in Section 2. We then discuss the new vector-valued rational approximation method that

Approximating Roots

Characterising Graphs using the Heat Kernelphst/BMVC2005/papers/58/bmvc05.pdf · 2 Heat Kernels on Graphs In this section, we develop a method for approximating the geodesic distance

uniroma1.it · 2010-02-09 · ization. The process is an interpolation of a Markov chain, and this allows us to write down a recursive equation for the approximating unnormalized

Interpolation and polynomial approximation Interpolation - … · 2019-07-15 · 1 Computer Aided Design Outline Interpolation and polynomial approximation Interpolation Lagrange

Chapter 7: Interpolation Topics 7: Interpolation Topics: ... Michael T. Heath Scientiﬁc Computing 4 / 56 Interpolation Polynomial Interpolation Piecewise Polynomial Interpolation

Application of fastNLO to NNLO calculationsrabbertz/Office/Physics_Talks/PD… · -> interpolation kernels Single PDF is replaced by a linear combination of interpolation kernels

CONSTRAINED INTERPOLATION AND SHAPE PRESERVING ... · 6.2 The Shape of the Data 65 CHAPTER 7 SHAPE PRESERVATION CRITERIA AND THE APPROXIMATING CURVE 67 7.1 Shape Preserving Criteria