Neural Acceleration for General-Purpose Approximate Programs

Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, Doug Burger

University of Washington, Microsoft Research

MICRO 2012



Page 2

[Figure labels: Program, CPU]

Page 3

Image credits: JPL & Rob Hogg, NASA, thefrugalgirl.com

computer vision

machine learning

sensory data

physical simulation

information retrieval

augmented reality

image rendering

Page 4

Approximate computing

EnerJ programming language [PLDI 2011]

Truffle dual-voltage architecture [ASPLOS 2012]

Relax software fault recovery [de Kruijf et al., ISCA 2010]

Code perforation transformations [MIT]

Green runtime system [Baek and Chilimbi, PLDI 2010]

Flikker approximate DRAM [Liu et al., ASPLOS 2011]

Stochastic processors [Illinois]

Probabilistic CMOS designs [Rice, NTU, Georgia Tech…]

Page 5

Accelerators

CPU

GPU

FPGA

Vector Unit

DySER (Wisconsin)

BERET (Michigan)

Conservation Cores (UCSD)

Page 6

Accelerators

Approximate computing

[Figure: the accelerator list of Page 5 set beside the application domains of Page 3]

Page 7

An accelerator for approximate computations

Approximate Accelerator 1.0 (NEW!)

Mimics functions written in traditional languages!

Runs more efficiently than a CPU or a precise accelerator!

May introduce small errors!

Page 8

Neural networks are function approximators

Trainable: implements many functions

Very efficient hardware implementations

Highly parallel

Fault tolerant [Temam, ISCA 2012]

[Figure label: CPU]


Page 13

Neural acceleration

1. Annotate an approximate program component

2. Compile the program and train a neural network

3. Execute on a fast Neural Processing Unit (NPU)

4. Improve performance 2.3x and energy 3.0x on average

Page 14

Programming model

[[transform]]
float grad(float[3][3] p) { … }

void edgeDetection(Image &src, Image &dst) {
  for (int y = …) {
    for (int x = …) {
      dst[x][y] = grad(window(src, x, y));
    }
  }
}
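To make the annotated function concrete, here is a minimal C++ sketch of what a body for grad might look like. The slide elides the body, so the Sobel operator used below is an assumption (suggested only by the "sobel" benchmark name), as are the Window alias and the use of std::array in place of float[3][3].

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 3x3 pixel window type (the slide passes float[3][3]).
using Window = std::array<std::array<float, 3>, 3>;

// A pure function of nine pixel values returning one scalar.
// Assumed body: Sobel gradient magnitude (not taken from the slides).
float grad(const Window& p) {
    // Horizontal response: right column minus left column.
    float gx = (p[0][2] + 2.0f * p[1][2] + p[2][2])
             - (p[0][0] + 2.0f * p[1][0] + p[2][0]);
    // Vertical response: bottom row minus top row.
    float gy = (p[2][0] + 2.0f * p[2][1] + p[2][2])
             - (p[0][0] + 2.0f * p[0][1] + p[0][2]);
    float g = std::sqrt(gx * gx + gy * gy);
    assert(g >= 0.0f);  // NaN inputs would trip this sanity check
    return g;
}
```

The property that matters for the programming model is that the function has no side effects and a fixed-size input and output, so it can be replaced by anything that maps nine floats to one float.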

Page 15

Code region criteria, applied to grad()

Hot code: run on every 3x3 pixel window

Approximable: small errors do not corrupt output

Well-defined inputs and outputs: takes 9 pixel values; returns a scalar

Page 16

Empirically selecting target functions

[Figure: Program compared with Accelerated Program; outcomes marked √ or ✗]

Page 17

Compiling and transforming

Inputs: Annotated Source Code, Training Inputs

1. Code Observation

2. Training

3. Code Generation

Outputs: Trained Neural Network, Augmented Binary

Page 18

Code observation

[[NPU]]
float grad(float[3][3] p) { … }

void edgeDetection(Image &src, Image &dst) {
  for (int y = …) {
    for (int x = …) {
      dst[x][y] = grad(window(src, x, y));
    }
  }
}

test cases + instrumented program = sample arguments & outputs

Instrumentation: record(p); record(result);

p                            ➝  grad(p)
323, 231, 122, 93, 321, 49   ➝  53.2
49, 423, 293, 293, 23, 2     ➝  94.2
34, 129, 493, 49, 31, 11     ➝  1.2
21, 85, 47, 62, 21, 577      ➝  64.2
7, 55, 28, 96, 552, 921      ➝  18.1
5, 129, 493, 49, 31, 11      ➝  92.2
49, 423, 293, 293, 23, 2     ➝  6.5
34, 129, 72, 49, 5, 2        ➝  120
323, 231, 122, 93, 321, 49   ➝  53.2
6, 423, 293, 293, 23, 2      ➝  49.7
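The observation step can be pictured as a wrapper that runs the original function and logs each argument/result pair. The following C++ sketch is one way to do that; the Observer class, the Sample struct, and the flattened nine-element Window are hypothetical names introduced for illustration, not the instrumentation the authors' compiler emits.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

using Window = std::array<float, 9>;            // flattened 3x3 window (assumed layout)
struct Sample { Window input; float output; };  // one observed (p, grad(p)) pair

// Hypothetical observer: wraps a target function and logs every call,
// mirroring the slide's record(p); record(result);
class Observer {
public:
    explicit Observer(std::function<float(const Window&)> target)
        : target_(std::move(target)) {
        assert(target_ != nullptr);
    }

    float operator()(const Window& p) {
        float result = target_(p);    // run the original, precise code
        log_.push_back({p, result});  // record(p); record(result);
        return result;
    }

    // The accumulated pairs form the training set for the next step.
    const std::vector<Sample>& samples() const { return log_; }

private:
    std::function<float(const Window&)> target_;
    std::vector<Sample> log_;
};
```

Running the program's test cases through such a wrapper leaves the program's output unchanged while collecting the table of sample arguments and outputs.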

Page 19

Training

[Figure: Training Inputs feed Backpropagation Training]

Page 20

Training

[Figure: three candidate networks, each trained on the Training Inputs, ranging from "faster, less robust" to "slower, more accurate"; labeled 70%, 98%, 99%]
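The trade-off between network size and accuracy suggests a selection step over candidate topologies. The slides do not state the selection rule, so the C++ sketch below assumes a simple one: take the cheapest candidate that meets an accuracy threshold. The Candidate struct, the pick_topology function, and the neuron counts in the usage note are hypothetical; only the 70%, 98%, and 99% figures come from the slide.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one trained candidate network.
struct Candidate {
    std::string name;
    double accuracy;  // fraction in [0, 1]
    int neurons;      // used here as a proxy for evaluation cost
};

// Return the index of the cheapest candidate whose accuracy is at least
// min_accuracy, or -1 if none qualifies. Ties keep the earlier entry.
int pick_topology(const std::vector<Candidate>& cs, double min_accuracy) {
    assert(min_accuracy >= 0.0 && min_accuracy <= 1.0);
    int best = -1;
    for (int i = 0; i < static_cast<int>(cs.size()); ++i) {
        if (cs[i].accuracy < min_accuracy) continue;  // too inaccurate
        if (best < 0 || cs[i].neurons < cs[best].neurons) best = i;
    }
    return best;
}
```

With candidates at 70%, 98%, and 99% accuracy and illustrative sizes of 4, 8, and 16 neurons, a 95% threshold selects the middle network: the largest one buys one more point of accuracy at twice the assumed cost.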

Page 21

Code generation

void edgeDetection(Image &src, Image &dst) {
  for (int y = …) {
    for (int x = …) {
      p = window(src, x, y);
      NPU_SEND(p[0][0]);
      NPU_SEND(p[0][1]);
      NPU_SEND(p[0][2]);
      …
      dst[x][y] = NPU_RECEIVE();
    }
  }
}
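To see how the generated send/receive calls behave from the program's point of view, the NPU can be emulated in software as a pair of FIFOs around a function. This C++ sketch is only a stand-in under stated assumptions: the FakeNpu class and its send and receive methods are hypothetical, and treating an early receive as a bug is a simplification chosen for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical software stand-in for the NPU: inputs are collected until a
// full input vector has arrived, then a function playing the role of the
// trained network produces one value for the output FIFO.
class FakeNpu {
public:
    FakeNpu(std::size_t n_inputs,
            std::function<float(const std::vector<float>&)> net)
        : n_inputs_(n_inputs), net_(std::move(net)) {
        assert(n_inputs_ > 0 && net_ != nullptr);
    }

    // Plays the role of NPU_SEND: enqueue one input value.
    void send(float v) {
        pending_.push_back(v);
        if (pending_.size() == n_inputs_) {  // full input vector received
            out_.push(net_(pending_));       // "evaluate" the network
            pending_.clear();
        }
    }

    // Plays the role of NPU_RECEIVE: dequeue one output value.
    float receive() {
        assert(!out_.empty());  // this sketch treats an early receive as a bug
        float v = out_.front();
        out_.pop();
        return v;
    }

private:
    std::size_t n_inputs_;
    std::function<float(const std::vector<float>&)> net_;
    std::vector<float> pending_;
    std::queue<float> out_;
};
```

For the edge-detection loop, n_inputs would be 9: nine sends per window, followed by one receive that yields the approximate gradient.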

Page 22

Neural Processing Unit (NPU)

[Figure labels: Core, NPU]

Page 23

Software interface: ISA extensions

[Figure: Core and NPU linked by three queues]

configuration: enq.c, deq.c

input: enq.d

output: deq.d

Page 24

Microarchitectural interface

[Figure: the NPU and its configuration, input, and output queues (enq.c, deq.c, enq.d, deq.d) drawn against the core pipeline stages Fetch, Decode, Issue, Execute, Memory, Commit; the figure also carries S and NS labels]


Page 26

A digital NPU

[Figure: Bus Scheduler and Processing Engines, with input, output, and scheduling connections]

Processing engine components: neuron weights, input, multiply-add unit, accumulator, sigmoid LUT, output
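The processing-engine components listed on this slide map naturally onto the evaluation of a single neuron. The C++ sketch below illustrates that mapping in floating point; the SigmoidLut class, the eval_neuron function, the table size, the [-8, 8) input range, and the bias term are all assumptions made for illustration and say nothing about the actual hardware parameters.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr float kLutMin = -8.0f;  // assumed LUT input range
constexpr float kLutMax = 8.0f;

// Hypothetical sigmoid lookup table; inputs outside the range saturate.
class SigmoidLut {
public:
    explicit SigmoidLut(std::size_t entries = 2048) : table_(entries) {
        assert(entries > 1);
        for (std::size_t i = 0; i < entries; ++i) {
            float x = kLutMin + (kLutMax - kLutMin) * static_cast<float>(i) /
                                    static_cast<float>(entries);
            table_[i] = 1.0f / (1.0f + std::exp(-x));
        }
    }

    float operator()(float x) const {
        if (x <= kLutMin) return table_.front();
        if (x >= kLutMax) return table_.back();
        auto i = static_cast<std::size_t>((x - kLutMin) / (kLutMax - kLutMin) *
                                          static_cast<float>(table_.size()));
        return table_[i < table_.size() ? i : table_.size() - 1];
    }

private:
    std::vector<float> table_;
};

// One neuron: multiply-add into an accumulator, then sigmoid via the LUT.
float eval_neuron(const std::vector<float>& weights, float bias,
                  const std::vector<float>& inputs, const SigmoidLut& lut) {
    assert(weights.size() == inputs.size());
    float acc = bias;                   // accumulator
    for (std::size_t i = 0; i < inputs.size(); ++i)
        acc += weights[i] * inputs[i];  // multiply-add unit
    return lut(acc);                    // sigmoid LUT
}
```

A table lookup replaces the exponential, which is the expensive part of the sigmoid; the price is a small quantization error that an approximate computation can tolerate.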

Page 27

Experiments

Several benchmarks; annotated one hot function each

FFT, inverse kinematics, triangle intersection, JPEG, K-means, Sobel

Simulated full programs on MARSSx86

Energy modeled with McPAT and CACTI

Microarchitecture like Intel Penryn: 4-wide, 6-issue

45 nm, 2080 MHz, 0.9 V

Page 28

Two benchmarks

triangle intersection: 1,079 static x86-64 instructions; 60 neurons, 2 hidden layers

edge detection: 88 static instructions; 18 neurons

56% of dynamic instructions

97% of dynamic instructions

Page 29

Speedup with NPU acceleration

2.3x average speedup

Ranges from 0.8x to 11.1x

[Figure: speedup over all-CPU execution (axis 0x to 12x) for fft, inversek2j, jmeint, jpeg, kmeans, sobel, and the geometric mean]

Page 30

Energy savings with NPU acceleration

3.0x average energy reduction

All benchmarks benefit

[Figure: energy reduction over all-CPU execution (axis 0x to 12x, with one off-scale bar labeled 21.1x) for fft, inversek2j, jmeint, jpeg, kmeans, sobel, and the geometric mean]

Page 31

Application quality loss

Quality loss below 10% in all cases

Based on application-specific quality metrics

[Figure: quality degradation (axis 0% to 100%) for fft, inversek2j, jmeint, jpeg, kmeans, sobel, and the geometric mean]
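An application-specific quality metric compares the accelerated program's output with the precise one. The slides do not define the individual metrics, so the C++ sketch below assumes one plausible choice for an image benchmark: root-mean-square pixel difference on images normalized to [0, 1]. The function names and the budget check are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed metric: root-mean-square difference between two images whose
// pixel values lie in [0, 1], so the result is also in [0, 1].
double rms_image_diff(const std::vector<float>& precise,
                      const std::vector<float>& approx) {
    assert(precise.size() == approx.size() && !precise.empty());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < precise.size(); ++i) {
        double d = static_cast<double>(precise[i]) -
                   static_cast<double>(approx[i]);
        sum_sq += d * d;
    }
    return std::sqrt(sum_sq / static_cast<double>(precise.size()));
}

// Hypothetical acceptance check against a quality-loss budget such as 0.10.
bool within_budget(double loss, double budget) { return loss <= budget; }
```

Under this assumed metric, a result of 0.05 would correspond to a 5% quality loss and would pass a 10% budget.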

Page 32

Edge detection with gradient calculation on NPU

Page 33

Also in the paper

Sensitivity to communication latency

Sensitivity to NN evaluation efficiency

Sensitivity to PE count

Benchmark statistics

All-software NN slowdown


Page 36

Program

Neural networks can efficiently approximate functions from programs written in conventional languages.

Page 37

[Figure labels: CPU; low power, parallel, regular, fault-tolerant, analog, flexible]

Page 39

Normalized dynamic instructions

[Figure: dynamic instruction count normalized to original (axis 0% to 100%) for fft, inversek2j, jmeint, jpeg, kmeans, sobel, and the geometric mean; bars split into "other instructions" and "NPU queue instructions"]

Page 40

Slowdown with software NN

20x average slowdown

Using off-the-shelf FANN library

[Figure: slowdown over original program (axis 0x to 75x) for fft, inversek2j, jmeint, jpeg, kmeans, sobel, and the geometric mean]