Quality Programmable Vector Processors for Approximate Computing · 2013-12-22 · Quality...

Preview:

Citation preview

++

P.Err

1<<

MAX

Xpsc[i]

PSc

Err.[i+1]

MUX

Err Xt

N.Err

MAX/+

MUX >

Quality Programmable Vector Processors for Approximate Computing Swagath Venkataramani1, Vinay Chippa1, Srimat Chakradhar2,

Kaushik Roy1, Anand Raghunathan1

1Integrated Systems Laboratory, School of ECE, Purdue University 2Systems Architecture Department, NEC Laboratories America

Approximate Computing - Motivation

Recognition Synthesis Mining Video

Intrinsic Application Resilience Ability of applications to produce outputs of acceptable quality despite underlying computations executed imprecisely

Search

Intrinsic Application Resilience

‘Noisy’  Real  World Inputs

Self- Healing/ Iterative

Algorithms

Redundant Input Data

No Golden Output

Perceptual Limitations

Statistical Probabilistic

Computations

Vision

Sources of Resilience

Quality Programmable Processors

Experimental Methodology and Results

QUORA: Quality Programmable 1D/2D Vector Processor

National Science Foundation

Summary

Intrinsic application resilience: A new dimension to optimize HW and SW

Objective: Energy-efficient & programmable processor for approximate computing

Quality programmable processors: Quality codified as part of the instruction set

QUORA: Quality programmable 1D/2D vector processor • Quality programmable ISA and microarchitecture

Acknowledgement

NEC laboratories America

Notion of correctness is relaxed Good enough answers !!!

Need programmable platforms for

approximate computing!

APEAPE

APEAPE

APE

APEAPE APE

APE

APE

APE

APE

APE

APE

ACC

ALU

MU

X

MUX

Reg

Reg

MAPE

MAPE

MU

X

Data. O

UT

APEAPE APE

APEAPE APE

APE

APE

APE

APE

1-to

-man

y-D

EMU

X

MAPE

MAPE

SM

SM

SM

SM

MAP

E

MAP

E

MAP

E

MAP

E

SM SM SM SMSM

MUX

Data. OUT

Data. IN

1-to-many-DEMUX

SM

INST.MEMORY

ScalarReg. File ALU

Prog. Counter

INST. DECODE & CONTROL UNIT

CAPE

Halt

Dat

a. IN

DATA MEMORY

ALU

ACC

Scratch Registers

MAPE

APE

Scra

tch

Regi

ster

s

ALU AC

C

MAPE

InstructionInst. AddInst. Read

APE ARRAY

CLK

RESET

SM_row_sel

MAPE_row_sel

SM_col_sel

MAPE_col_sel

Data. INData. OUT

Data. ReadData. WriteData. Add

INTE

RFAC

E

Quality Control Unit & Quality Monitors

Qu

alit

y Co

ntro

l Un

it &

Qu

alit

y M

on

ito

rs

PE c

ount

Com

plex

ity

Ener

gy

App

roxi

mat

ion

Scop

e

3-tiered PE hierarchy enables larger energy benefits from approximate computing

APE

CAPE

MAPE

Processing Element Hierarchy

> 90%

> 70%

Quality Programmable Instruction Set

Quality Configurable Execution

47 Instructions – 9 APE, 22 MAPE, 13 CAPE, 3 SM Instructions extended with 2 quality fields

e.g., qpMAC R_length, R_row_enb, R_col_enb, R_q_type, R_q_amt

Type of error – 3 quality metrics e.g.,  𝑀𝑎𝑥𝐸𝑟𝑟 = |  𝑂𝑜𝑟𝑖𝑔 − 𝑂𝑎𝑝𝑝𝑟𝑜𝑥|

Amount of error

Block Diagram of QUORA Micro-architecture

Track positive & negative errors

Modulate the threshold for

round-off

Quality specified @ vector inst. outputs Scale the precision of input operands Key idea: Compensate errors across

many scalar operations

C.PSc.

Op-code

Error target

Actual Error+/>

Error

ErrorPSc

PSc.Unit

PSc.Unit

PSc.Unit

PSc.Unit

PSc.Unit

PSc.Unit

PSc.Unit

PSc.Unit

MAP

E

MAP

E

MAP

E

MAP

E

MAPE

MAPE

MAPE

MAPE

APEAPE APE

APE APEAPE APE

APE APEAPE APE

APE APEAPE APE

ACC

CLK

Gated CLK

C.PS

c

R.PS

c

Op-

code

Quality Control unit

Precision Scaling Unit

Array Level View Quality Control Unit

• Set precision values

PSc. units shared across

row/col <1% energy

Micro-architectural Parameters Benchmarks

Micro-architectural Parameters Value

Array Dimensions 16 X 16

No. of PEs (APEs + MAPEs+ CAPE) 289 (256 + 32 + 1)

No. of SM elements 32

Depth of SM elements 64

Operating Frequency 250 MHz

Metric Value

Feature Size 45nm

Area 2.6 mm2

Power 367.8 mW

Gate Count 502042

51% 28%

1% 19%

0% 1% APEs (%)MAPEs (%)CAPE(%)SMs (%)PScE (%)Misc. (%)

Applications Dataset Quality Metric Handwritten Digit Recognition

(SVM-MNIST) MNIST

Percentage classification accuracy

Object Recognition (SVM-NORB) NORB

Digit Classification (CNN) MNIST

Eye Detection (GLVQ) NEC labs.

Optical Character Recognition (k-NN) OCR digits

Census Data Analysis (ANN) Adult

Document Search (SSI) Subset of Wikipedia

No. correct in top 25 results

Image Segmentation (K-Means-Seg) Berkeley dataset Mean distance of

clustered points from respective centroids Optical Character Clustering

(K-Means-OCR) OCR digits

Energy Benefits QP-Instructions

0

0.5

1

1.5

Nor

mal

ized

Ene

rgy

--> No Approx. < 0.5%

~ 2.5 % ~ 7.5%

1.05-1.7X savings for NO quality loss 1.18-2.1X savings for < 2.5% quality loss > 2.5X savings for < 7.5% quality loss

88%

2% 10% 1% 3%

96%

QP-APE QP-MAPEAccurate

Dynamic inst. count Energy 90% of energy in QP instructions

Precision Scaling Mechanisms

0

0.2

0.4

0.6

0.8

1

1.2

0 0.5 1

APE

Ene

rgy

-->

Average Error (%) -->

TruncUp/DownErr. Comp

Error compensation provides superior Q-E trade-off

QP-MAC

An abstract model for programmable approximate processors

Application Program

Application Quality

Requirement

Program with QP-ISA

HW/S

W IN

TERF

ACE

Quality Programmability: Ability to specify the desired accuracy requirement to HW

Notion of quality explicitly built into the instruction set

0

5

10

15

20

25

30

0 5 10

% o

f App

roxi

mat

e in

stru

ctio

ns

% Loss in Output Quality -->

Arbitrary< 50%< 25%< 7.5%

Constraining errors enables 25-100X more approximate instructions compared to allowing arbitrary errors

Quality Configurable

Execution Unit

QUALITY PROGRAMMABLE MICROARCHITECTURE

Decode & Control

Inst. Fetch

Register File

Quality Control Logic

Instruction accuracy monitor

Software visible Error Registers Quality Programmable ISA

• Quality fields in insts. qpADD dest, op1, op2, MAG, 1%

• Purely based on instruction semantics

Quality-programmable add

Error magnitude < 1%

Capable of executing instructions with different quality levels

Feedback about actual error - used by software to determine quality levels of future instructions

Micro-architecture guarantees instruction level quality bounds

SVM

Recommended