Neural Network Acceleration on Mobile Devices Dongyoung Kim Machine learning team Hyperconnect Seoul, South Korea [email protected]

Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimization for Mobile Devices: IT industries are interested in IoT, AR/VR, robotics and self driving


Neural Network Acceleration on Mobile Devices

Dongyoung Kim

Machine learning team, Hyperconnect

Seoul, South Korea

[email protected]

Agenda

● Introduction

● Neural Processing Unit (NPU)

○ NPU utilizing sparsity: Zero-aware neural network accelerator (ZeNA)

○ NPU utilizing reduced precision: Outlier quantization and Precision highway

● Working as an AI System Architect in the Industry

● On-device Machine Learning

○ Neural network acceleration on mobile CPU

○ Optimizing neural network for mobile devices: Temporal convolution for real-time keyword spotting on mobile devices


Introduction

Deep Neural Network (DNN)

● Deep neural networks are ubiquitous across applications such as self-driving cars, voice recognition, and feeds and ads.

DNN on Real-time Mobile Devices

● In particular, DNNs show reliable results on real-time mobile devices, for example in self-driving cars, virtual reality (VR), and augmented reality (AR).

Challenges of DNN Applications on Mobile Devices

● DNN-based applications perform well on large devices with abundant resources (e.g., GPU servers), but they are ill-suited to mobile devices (embedded systems).

Neural Processing Unit (NPU)

● Booming of NPUs: A11 / A12 Bionic (Apple), Snapdragon 855 (Qualcomm), Kirin 970 (Huawei), Exynos 9820 (Samsung).


NPU Utilizing Sparsity

Sparsity

● A significant portion of the input values in a convolutional layer is zero.

● A large number of ineffectual computations can be skipped.

● However, it is difficult to utilize sparsity on a CPU.

AlexNet (zero weights result from pruning, zero activations from ReLU)

Layer    Zero Weight [%]    Zero Activation [%]
conv1    15.7               0
conv2    62.1               50.9
conv3    65.4               76.3
conv4    62.8               61.8
conv5    63.1               59.0
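
To get a feel for how much work the zeros in the AlexNet table could save, the sketch below estimates the fraction of multiply-accumulate operations (MACs) in which both operands are non-zero. It assumes, and this is not stated in the slides, that zero weights and zero activations are statistically independent; the function and variable names are illustrative only.

```python
# Zero percentages copied from the AlexNet table: (zero weight %, zero activation %).
ALEXNET_ZEROS = {
    "conv1": (15.7, 0.0),
    "conv2": (62.1, 50.9),
    "conv3": (65.4, 76.3),
    "conv4": (62.8, 61.8),
    "conv5": (63.1, 59.0),
}


def effectual_fraction(zero_weight_pct, zero_act_pct):
    """Fraction of MACs whose weight and activation are both non-zero.

    Assumption (not from the slides): zeros in weights and activations
    are independent, so the non-zero probabilities simply multiply.
    """
    return (1 - zero_weight_pct / 100) * (1 - zero_act_pct / 100)


def report():
    """Per-layer effectual fraction, rounded to three decimals."""
    return {layer: round(effectual_fraction(w, a), 3)
            for layer, (w, a) in ALEXNET_ZEROS.items()}


if __name__ == "__main__":
    for layer, frac in report().items():
        print(f"{layer}: {frac:.1%} of MACs are effectual")
```

Under this independence assumption, only a small fraction of the MACs in the later layers would need to be executed, which is the opportunity a sparsity-aware accelerator targets.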

Previous Works

● DaDianNao: wide SIMD-like architecture.

● Eyeriss: clock-gating for multiplications with zero activations.

● Cnvlutin: utilizes zero activations for performance and energy.

● Cambricon-X: utilizes zero weights for performance and energy.

● Goal: exploit zero values in both kernel weights and input activations.

Basic Idea of Zero-aware Neural Network Accelerator (ZeNA)

[Figure: Cambricon-X, Cnvlutin, and ZeNA compared on two PEs (PE 0, PE 1) using zero bit-vectors. K: kernel weights, A: input activations, P: partial sums. Annotations: "Broadcast", "Proportional to intersection of non-zero inputs", "Zero-induced load imbalance".]

ZeNA Architecture Overview

[Figure: block diagram with Act SRAM, Weight SRAM, ReLU module, and a 4×4 PE array; the connections carry activations, weights, conv results, partial sums (Psum), bit-vectors (Bitvec), and output feature maps (Output FM).]

● Act SRAM stores activations, output partial sums, output feature maps, and non-zero bit-vectors.

● Weight SRAM stores kernel weights and non-zero bit-vectors.

● ReLU module generates output feature maps and non-zero bit-vectors.

● Bus connects the on-chip SRAM and the PE array.

Work Group (WG)

[Figure: activation and kernel tensors partitioned into WG 0 and WG 1, then into Sub-WG 0 and Sub-WG 1 of each WG, then into activation tiles and kernel tiles.]

● Work group (WG): the spatial dimension of the input activations is divided into work groups.

● Sub-WG: each WG is divided into multiple subsets along the channel dimension of the output feature map.

● Each sub-WG consists of activation tiles and kernel tiles.

Computation Procedure

[Figure: activation * kernel = output feature map, partitioned into WG 0 / WG 1 and Sub-WG 0 / Sub-WG 1.]

● Our accelerator performs convolution with activation and kernel tiles iteratively to compute output feature maps.
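
The partitioning into work groups and sub-work-groups can be pictured as a nested loop over work items. The sketch below is only an illustration: it assumes WGs split the spatial dimension by rows and sub-WGs split the output channels into near-equal contiguous chunks. The real tile sizes and traversal order of ZeNA are not given in the slides, and `split` and `schedule` are hypothetical names.

```python
def split(n, parts):
    """Split range(n) into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def schedule(height, out_channels, num_wgs, num_sub_wgs):
    """Enumerate (wg, sub_wg, rows, channels) work items.

    Assumption: WGs partition the spatial dimension (rows here) and
    sub-WGs partition the output channels of the feature map.
    """
    items = []
    for wg, rows in enumerate(split(height, num_wgs)):
        for sub, chans in enumerate(split(out_channels, num_sub_wgs)):
            items.append((wg, sub, rows, chans))
    return items


if __name__ == "__main__":
    for wg, sub, rows, chans in schedule(8, 6, 2, 2):
        print(f"WG {wg}, Sub-WG {sub}: rows {list(rows)}, channels {list(chans)}")
```

Each work item would then be processed tile by tile, accumulating partial sums until the corresponding part of the output feature map is complete.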

Data Flow and Computation: Kernel Broadcast

[Figure: kernel weights are read from the Weight SRAM and broadcast to the PE array, which is organized as PE group 0 and PE group 1 with four PEs (PE 0 to PE 3) each. Activation * kernel = output feature map, partitioned into WG 0 / WG 1 and Sub-WG 0 / Sub-WG 1.]

Data Flow and Computation: Activation Broadcast

[Figure: activations are read from the Act SRAM and broadcast to PE group 0 and PE group 1 (PE 0 to PE 3 each). Activation * kernel = output feature map, partitioned into WG 0 / WG 1 and Sub-WG 0 / Sub-WG 1.]

● Broadcast only the difference: previous activation tiles are stored in the local buffer.

● Activation tiles are traversed in a zig-zag scan.

Zero-aware PE

[Figure: a PE with an activation buffer, a weight buffer, a partial-sum (Psum) buffer, and a fetch controller. The activation zero bit-vector (1 1 0 0 1) and the weight zero bit-vector (0 1 0 1 1) are ANDed; the resulting non-zero indices (3 and 0, counting from the right) select the operand pairs that are fetched, multiplied, and accumulated.]
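
The zero-aware PE can be mimicked in software as follows. This is a minimal sketch, not the hardware design: it assumes a bit value of 1 marks a non-zero element and that list index 0 corresponds to the rightmost bit of the bit-vectors in the figure. The operand values are made up for illustration.

```python
def nonzero_bitvector(values):
    """1 where the value is non-zero, 0 where it is zero."""
    return [1 if v != 0 else 0 for v in values]


def zero_aware_mac(acts, weights):
    """Return (partial_sum, multiplications_performed).

    Only positions where BOTH the activation and the weight are
    non-zero are multiplied; all other products would be zero anyway.
    """
    act_bits = nonzero_bitvector(acts)
    w_bits = nonzero_bitvector(weights)
    # AND of the two bit-vectors marks the effectual positions.
    both = [a & w for a, w in zip(act_bits, w_bits)]
    indices = [i for i, b in enumerate(both) if b]
    psum = 0
    for i in indices:
        psum += acts[i] * weights[i]
    return psum, len(indices)


if __name__ == "__main__":
    # Bit patterns match the figure when read right to left:
    # activations 1 1 0 0 1, weights 0 1 0 1 1.
    acts = [5, 0, 0, 2, 7]
    weights = [3, 4, 0, 6, 0]
    print(zero_aware_mac(acts, weights))
```

With these operands only indices 0 and 3 are effectual, so two multiplications replace the five a dense PE would perform.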

Zero-induced Load Imbalance

[Figure: PE 0 and PE 1 processing Sub-WG 0 and Sub-WG 1 before kernel allocation is applied, with the total runtime marked.]

● Typically, kernel weights are allocated to PEs based on the kernel index.

Zero-aware Kernel Allocation

[Figure: PE 0 and PE 1 processing Sub-WG 0 and Sub-WG 1; total runtime before and after kernel allocation is applied.]

● First, we sort the sets of kernel weights based on the number of non-zero weights in the sets.

● Then, we allocate the sets of kernel weights to PEs in the sorted order.
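
The effect of sorting before allocation can be illustrated with a toy cost model. The model is an assumption made for this sketch, not taken from the slides: PEs work in lockstep rounds with one kernel set per PE, a kernel set costs as many cycles as it has non-zero weights, and a round ends when its slowest PE finishes. Activation zeros are ignored here.

```python
def total_runtime(nnz_per_kernel, num_pes):
    """Total cycles when kernels are assigned to PEs in the given order.

    Toy cost model (assumption): each round takes as long as the kernel
    with the most non-zero weights in that round.
    """
    rounds = [nnz_per_kernel[i:i + num_pes]
              for i in range(0, len(nnz_per_kernel), num_pes)]
    return sum(max(r) for r in rounds)


def zero_aware_allocation(nnz_per_kernel):
    """Sort kernel sets by non-zero count so each round gets similar kernels."""
    return sorted(nnz_per_kernel, reverse=True)


if __name__ == "__main__":
    nnz = [9, 2, 8, 1]  # hypothetical non-zero counts, in kernel-index order
    print("index order :", total_runtime(nnz, num_pes=2))
    print("sorted order:", total_runtime(zero_aware_allocation(nnz), num_pes=2))
```

In this example the index-based order pairs a heavy kernel with a light one in every round, so each round is as slow as the heavy kernel; after sorting, the two heavy kernels share one round and the two light ones share the other.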

Performance

[Figure: speedup on AlexNet and VGG-16 for WZ, AZ, WAZ, and WAZ+KA.]

● 4x (AlexNet) and 5.2x (VGG-16) speedup w.r.t. Eyeriss.

● 1.8x (AlexNet) and 2.1x (VGG-16) speedup w.r.t. AZ (Cnvlutin).

● 19.6% (AlexNet) and 25.8% (VGG-16) speedup w.r.t. WAZ.

Energy

[Figure: normalized energy per image on AlexNet and VGG-16 for Eyeriss, WAZ, and WAZ+KA, broken down into SRAM leakage, SRAM, local buffer, logic, and bus. Annotation: "Exploiting zero weights".]

● 11.3% (AlexNet) and 18% (VGG-16) energy reduction w.r.t. Eyeriss.


NPU Utilizing Reduced Precision

Reduced Precision

● Neural networks show comparable accuracy after quantization is applied.

● Quantization reduces computational complexity and memory footprint.

Outlier Quantization

● Outlier: a weight or activation whose value is larger than a threshold.

● Outliers incur quantization error.

● Outlier-aware quantization

○ Keep outliers in high precision.

○ Apply linear quantization to the data other than outliers.

[Figure: value distribution split into high-precision outliers (1~3 %, no quantization error) and low-precision linear quantization for the remaining data (small quantization error).]
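
The two rules of outlier-aware quantization can be sketched in a few lines. This is a simplified illustration rather than the published method: it assumes outliers are chosen as a fixed fraction of the largest-magnitude values and that the remaining values use a symmetric signed linear grid. The function names and the sample data are hypothetical.

```python
def outlier_aware_quantize(values, bits=4, outlier_ratio=0.03):
    """Return dequantized values: outliers stay exact, the rest are
    linearly quantized to `bits` bits.

    Assumption: the threshold is implied by `outlier_ratio`, i.e. the
    largest-magnitude fraction of the values is treated as outliers.
    """
    n_out = int(len(values) * outlier_ratio)
    by_mag = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    outliers = set(by_mag[:n_out])
    inlier_mags = [abs(v) for i, v in enumerate(values) if i not in outliers]
    threshold = max(inlier_mags) if inlier_mags else 0.0
    levels = 2 ** (bits - 1) - 1            # e.g. integers -7..7 for 4 bits
    scale = threshold / levels if threshold > 0 else 1.0
    out = []
    for i, v in enumerate(values):
        if i in outliers:
            out.append(v)                   # high precision, no error
        else:
            q = max(-levels, min(levels, round(v / scale)))
            out.append(q * scale)           # low precision, small error
    return out


def max_error(original, quantized):
    """Largest absolute difference between original and quantized values."""
    return max(abs(a - b) for a, b in zip(original, quantized))


if __name__ == "__main__":
    data = [0.1, -0.2, 0.3, -0.4, 0.5, 0.6, -0.7, 100.0]
    plain = outlier_aware_quantize(data, outlier_ratio=0.0)
    aware = outlier_aware_quantize(data, outlier_ratio=0.125)
    print("max error without outlier handling:", max_error(data, plain))
    print("max error with outlier handling   :", max_error(data, aware))
```

Without outlier handling, the single large value stretches the quantization grid so far that all small values collapse to zero; keeping it in high precision lets the 4-bit grid cover the small values finely.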

Accuracy

● 4-bit quantization with 0.5% outliers already gives good accuracy without fine-tuning.

● With 3.5% outliers, the accuracy loss is less than 1%.

Precision Highway

[Figure: conventional approach vs. precision highway.]

● Precision highway

○ Keep the residual path in high precision (8-bit).

○ Apply low-bit linear quantization (2-bit) to the data other than the residual path.

Precision Highway vs. Zhuang's (2-bit)

● Precision highway shows 66.71% and 73.55% top-1 accuracy on ResNet-18 and ResNet-50, respectively.

Zhuang's: [Bohan Zhuang et al. Towards effective low-bitwidth convolutional neural networks. Computer Vision and Pattern Recognition (CVPR), 2018.]

NPU Supporting Mixed-precision Computation (OLAccel)

● Mixed-precision computation

○ In case of outlier quantization: 4-bit data (dense) + 16-bit activation / 8-bit weight outliers (sparse)

○ In case of precision highway: 2-bit data (dense) + 8-bit residual path (sparse)

Area and Energy Reduction

● 82.3% reduction in chip area (16-bit vs. 3-bit).

● 73.1% reduction in energy consumption (16-bit vs. 3-bit).


Working as an AI System Architect in the Industry


Neural Network Acceleration on Mobile CPU

Running AI Applications Using Mobile CPU

● AI applications are widely used even on low-end mobile devices where an NPU is absent.

● The CPU is required for executing specific operations used in state-of-the-art neural networks.

Example of 1X1 Convolution

[Figure: input activation * weight = output activation (dimension labels as extracted: 8, 1, 16, 1, 18, 1, 16; elements indexed 0, …, 7).]

1X1 Convolution on ARM CPU with 8-bit Quantization (w/o quality loss)

[Figure: the 1X1 convolution example next to its pseudo code (float32 and int8), annotated with per-stage cycle counts.]

● Pseudo code (float32): 4 + 64 + 4 = 72 cycles.

● Pseudo code (int8): 4 + 40 + 4 = 48 cycles.

● 1.5x speedup.
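
A 1X1 convolution is a per-pixel dot product over the input channels, so its int8 variant can be sketched in plain Python. The sketch below is not the pseudo code from the slides and does not model ARM cycle counts. It assumes symmetric per-tensor quantization with one scale for activations and one for weights, and Python integers stand in for a wide accumulator. All names are illustrative.

```python
def quantize(xs, bits=8):
    """Symmetric linear quantization; returns (integers, scale)."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8
    peak = max(abs(x) for x in xs) or 1.0           # avoid a zero scale
    scale = peak / qmax
    return [max(-qmax, min(qmax, round(x / scale))) for x in xs], scale


def conv1x1_float(pixels, weights):
    """pixels: [P][C_in], weights: [C_out][C_in] -> output [P][C_out]."""
    return [[sum(a * w for a, w in zip(px, row)) for row in weights]
            for px in pixels]


def conv1x1_int8(pixels, weights):
    """Same computation with int8 operands and integer accumulation."""
    c_in = len(weights[0])
    qa, sa = quantize([a for px in pixels for a in px])
    qw, sw = quantize([w for row in weights for w in row])
    qa = [qa[i:i + c_in] for i in range(0, len(qa), c_in)]
    qw = [qw[i:i + c_in] for i in range(0, len(qw), c_in)]
    # Integer multiply-accumulate, then one rescale back to real values.
    acc = [[sum(a * w for a, w in zip(px, row)) for row in qw] for px in qa]
    return [[v * sa * sw for v in row] for row in acc]


if __name__ == "__main__":
    pixels = [[0.5, -1.0, 0.25, 0.75]]
    weights = [[1.0, 0.5, -0.5, 0.25], [-1.0, 0.0, 1.0, 0.5]]
    print("float32:", conv1x1_float(pixels, weights))
    print("int8   :", conv1x1_int8(pixels, weights))
```

The inner loop works only on small integers, which is what allows a SIMD unit to process more elements per instruction than with float32.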

1X1 Convolution on ARM CPU with 8-bit Quantization (w/ quality loss)

[Figure: pseudo code (int8) next to pseudo code (int8 w/ quality loss), followed by pseudo code (float32) next to pseudo code (int8 w/ quality loss) with per-stage cycle counts.]

● Pseudo code (float32): 4 + 64 + 4 = 72 cycles.

● Pseudo code (int8 w/ quality loss): 2 + 24 + 2 = 28 cycles.

● 2.6x speedup.


Optimizing Neural Network for Mobile Devices

Neural Network Optimization for Mobile Devices

● IT industries are interested in IoT, AR/VR, robotics, and self-driving cars.

● These applications require tremendous computation.

● Neural network optimization for mobile devices is therefore required.

Examples: AR glasses (Google, Microsoft), robots (Amazon), IoT devices (Naver, Google, Amazon, Apple, …), self-driving cars (Tesla, Waymo, Uber).

Keyword Spotting (KWS)

● Keyword spotting (KWS) deals with the identification of predefined keywords in utterances.

● Typical tasks are recognizing wake-up words ("Hey Siri", "OK Google") and distinguishing common commands ("yes" or "no").

● KWS is widely used for hands-free control of mobile applications, e.g. Siri: "Hey Siri", Alexa: "Alexa", Google Assistant: "OK, Google".

CNN-based KWS

● CNN-based KWS studies show remarkable accuracy.

● Most CNN-based KWS approaches receive features as a 2D input to a convolutional network.

[Figure: utterance → 2D input (e.g., MFCC) → CNN.]

Challenges of KWS on Real-time Mobile Devices

● Since KWS is used mostly on mobile devices, its response should be both fast and accurate.

● It is challenging to implement fast and accurate KWS on mobile devices with restricted hardware resources.

Preliminary: Spectrogram

● The time-frequency representation of a speech signal is referred to as a spectrogram.

[Figure: speech signal (amplitude vs. time) → sampling → FFT (amplitude vs. frequency per frame) → spectrogram.]
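
The sampling-then-FFT step can be sketched with the standard library only. For brevity the sketch uses a naive discrete Fourier transform, which yields the same magnitudes as an FFT but more slowly, and it applies no window function. The frame length, hop size, and sample rate are illustrative choices, not values from the slides.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames of `frame_len` samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for frequency bins 0..n/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_len=64, hop=32):
    """One magnitude spectrum per frame: a time-frequency representation."""
    return [magnitude_spectrum(f) for f in frames(signal, frame_len, hop)]


if __name__ == "__main__":
    sample_rate = 8000                      # Hz, illustrative
    tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
    spec = spectrogram(tone)
    peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
    print("frames:", len(spec), "bins per frame:", len(spec[0]))
    print("peak frequency of first frame:", peak_bin * sample_rate / 64, "Hz")
```

Each row of the result corresponds to one time step and each column to one frequency bin, which is the 2D layout the following preliminaries build on.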

Preliminary: Mel-frequency Filter

● It is observed that human ears act as filters.

● Mel-frequency filters are non-uniformly spaced on the frequency axis.

[Figure: speech signal (amplitude vs. time) → spectrogram (amplitude vs. frequency) → Mel-frequency filters → Mel-spectrogram.]
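
The non-uniform spacing of the Mel-frequency filters can be made concrete with one widely used definition of the mel scale, m = 2595 · log10(1 + f / 700). The slides do not say which variant was used, so treat the formula, the filter count, and the frequency range below as assumptions for illustration.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to mels (one common definition)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, f_min, f_max):
    """Filter centers equally spaced in mels, hence non-uniform in Hz."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_filters)]


if __name__ == "__main__":
    centers = mel_center_frequencies(10, 0.0, 4000.0)
    print([round(c) for c in centers])
```

The printed centers are dense at low frequencies and spread out at high frequencies, mirroring the finer frequency resolution of human hearing at the low end.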

Preliminary: Mel-frequency Cepstral Coefficients (MFCC)

● Cepstral coefficients obtained from the Mel-spectrogram are referred to as Mel-Frequency Cepstral Coefficients (MFCC).

● MFCC: the speech signal is represented as a sequence of cepstral vectors.

[Figure: speech signal (amplitude vs. time) → Mel-spectrogram → cepstral analysis → MFCC. The MFCC is drawn as a 2D map with time on one axis and MFC coefficients (representing frequency) on the other; a single time step is one vector of MFC coefficients.]

Conventional 2D Convolution for KWS

[Figure: input feature map (MFCC, time t × frequency f, one channel) * 3 × 3 weights = output feature map; the window slides in two directions, (1) and (2).]

● Conventional 2D convolution for KWS utilizes an input tensor where w = t, h = f (or vice versa), and c = 1.

● Conventional 2D convolution slides the window along the large spatial dimensions (2D) of the input feature map.

Proposed Temporal Convolution for KWS

[Figure: conventional approach (MFCC as a 2D input feature map over time t and frequency f) vs. proposed temporal convolution (MFCC as an input feature map over time t only); input feature map * 3 × 1 weights = output feature map.]

● The proposed temporal convolution slides the window along the small spatial dimension (1D) of the input feature map.
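
A temporal convolution of this kind can be sketched in plain Python by treating the MFCC as a sequence of t time steps whose f coefficients act as input channels, so that a 3 × 1 kernel slides along time only. The sketch assumes stride 1 and no padding, and the function name and tensor layout are illustrative rather than taken from the TC-ResNet code.

```python
def temporal_conv(mfcc, weights):
    """1D convolution along time with the frequency axis as channels.

    mfcc:    [t][f]        t time steps, f MFC coefficients per step
    weights: [c_out][k][f] c_out kernels, each spanning k time steps
    returns: [t - k + 1][c_out] (stride 1, no padding)
    """
    k = len(weights[0])
    out = []
    for start in range(len(mfcc) - k + 1):
        window = mfcc[start:start + k]
        out.append([
            sum(window[i][c] * kernel[i][c]
                for i in range(k)
                for c in range(len(kernel[i])))
            for kernel in weights
        ])
    return out


if __name__ == "__main__":
    mfcc = [[1, 2], [3, 4], [5, 6], [7, 8]]          # t = 4, f = 2
    weights = [[[1, 1], [1, 1], [1, 1]]]             # c_out = 1, k = 3
    print(temporal_conv(mfcc, weights))
```

Every output value depends on all f coefficients of the time steps in its window, while the window itself moves in one direction only.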

Problem of Conventional 2D Convolution for KWS

[Figure: input MFCC with time (t) on one axis and MFC coefficients (f, max = 80) on the other; the first 3 × 3 convolution of a conventional 2D convolution has a 3 × 3 receptive field.]

● Both low- and high-frequency data at the same time step include informative features.

● Since modern CNNs commonly utilize small kernels, it is difficult to capture informative features from both low and high frequencies.

Small Receptive Field of Conventional 2D Convolution

[Figure: input MFCC (time t × MFC coefficients f) processed by the 1st, 2nd, …, N-th 3 × 3 convolutions; receptive field (2N+1) × (2N+1).]

● Assuming N convolutional layers of 3 × 3 weights with a stride of one, the receptive field of the network grows only up to (2N + 1) × (2N + 1).

● Conventional 2D convolution requires a large number of operations to increase the receptive field.

● It is therefore not suitable for running on real-time mobile devices.
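
The 2N + 1 growth follows from each stride-1 layer with a kernel of size k widening the receptive field by k − 1. The small sketch below computes this and asks how many 3 × 3 layers would be needed before one output position sees the whole frequency axis. The value f = 40 is borrowed from the example used elsewhere in this deck, and the function names are illustrative.

```python
def receptive_field(num_layers, kernel=3):
    """Side length of the receptive field after `num_layers` stacked
    stride-1 convolutions with a `kernel` x `kernel` window."""
    return 1 + num_layers * (kernel - 1)


def layers_to_cover(size, kernel=3):
    """Smallest number of layers whose receptive field spans `size`."""
    n = 0
    while receptive_field(n, kernel) < size:
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(f"N = {n}: receptive field {receptive_field(n)} per side")
    print("3x3 layers needed to span f = 40:", layers_to_cover(40))
```

By contrast, a convolution that takes all frequency coefficients as channels covers the full frequency axis in its first layer.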

Large Receptive Field of Proposed Temporal Convolution

[Figure: conventional 2D convolution (1st, 2nd, …, N-th 3 × 3 convolutions over the input MFCC) vs. proposed temporal convolution (1st convolution with 3 × 1 weights over the input MFCC), in which every frequency is covered.]

● In the proposed method, all lower-level features always participate in forming the higher-level features in the next layer.

Small Footprint and Low Computational Complexity

Conventional 2D convolution (input feature map * 3 × 3 weights = output feature map):

$\text{MACs} = 3 \times 3 \times 1 \times f \times t \times c = 5{,}644{,}800$

Proposed temporal convolution (input feature map * 3 × 1 weights = output feature map):

$\text{MACs} = 3 \times 1 \times f \times t \times 1 \times c' = 141{,}120$

Note that the parameters of the conventional 2D convolution and those of the temporal convolution have the same size in this example, which sets t = 98, f = 40, c = 160, and c′ = 12.
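
The two MAC counts can be checked numerically. The sketch reads the formulas as 3 × 3 × 1 × f × t × c for the conventional 2D convolution and 3 × 1 × f × t × 1 × c′ for the temporal convolution, an interpretation that reproduces the totals stated on the slide, and it assumes the output keeps the input's spatial size. The function names are illustrative.

```python
def conv2d_macs(t, f, c_out, kernel=(3, 3), c_in=1):
    """MACs of a 2D convolution over a t x f map with c_in input channels."""
    return kernel[0] * kernel[1] * c_in * f * t * c_out


def conv2d_params(c_out, kernel=(3, 3), c_in=1):
    """Weight count of the 2D convolution."""
    return kernel[0] * kernel[1] * c_in * c_out


def temporal_conv_macs(t, f, c_out, kernel=3):
    """MACs of a temporal convolution over a t x 1 map with f input channels."""
    return kernel * 1 * f * t * 1 * c_out


def temporal_conv_params(f, c_out, kernel=3):
    """Weight count of the temporal convolution."""
    return kernel * 1 * f * c_out


if __name__ == "__main__":
    t, f, c, c_prime = 98, 40, 160, 12
    macs_2d = conv2d_macs(t, f, c)
    macs_tc = temporal_conv_macs(t, f, c_prime)
    print("2D conv      :", macs_2d, "MACs,", conv2d_params(c), "parameters")
    print("temporal conv:", macs_tc, "MACs,", temporal_conv_params(f, c_prime), "parameters")
    print("MAC reduction:", macs_2d // macs_tc, "x")
```

With equal parameter counts, the temporal convolution needs far fewer MACs because its kernel slides over t positions instead of t × f positions.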

End-to-end Pipeline for Mobile Devices

● We release an end-to-end pipeline codebase for training, evaluating, and benchmarking the baseline models together with the proposed models.

● The codebase includes the following components:

○ TensorFlow models

○ Scripts to convert the models into TensorFlow Lite models (mobile devices)

○ Pre-built TensorFlow Lite Android benchmark tool (mobile performance measurement)

● Git repo

○ https://github.com/hyperconnect/TC-ResNet

Accuracy and Inference Time

● TC-ResNet8 improves accuracy by 11.5%p at comparable latency, compared to the state of the art in latency (CNN-2).

● TC-ResNet8 achieves a 385x speedup while improving accuracy by 0.3%p, compared to the state of the art in accuracy (Res15).


Conclusion

Neural Network Optimization in the Future

● Edge devices will run more complex tasks in the future.

● Neural network acceleration will become more important.

Examples: AR glasses (Google, Microsoft), robots (Amazon), IoT devices (Naver, Google, Amazon, Apple, …), self-driving cars (Tesla, Waymo, Uber).


Thanks

References

● ZeNA

○ Dongyoung Kim, Soobeom Kim, and Sungjoo Yoo, "FPGA Prototyping of Low-Precision Zero-Skipping Accelerator for Neural Networks," in Rapid System Prototyping (RSP), Nov. 2018.

○ Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo, "ZeNA: Zero-Aware Neural Network Accelerator," in IEEE Design & Test, Aug. 2017.

○ Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo, "A Novel Zero Weight/Activation-Aware Hardware Architecture of Convolutional Neural Network," in Proc. Design Automation and Test in Europe (DATE), Mar. 2017.

● Outlier quantization and Precision highway

○ Eunhyeok Park, Dongyoung Kim, and Sungjoo Yoo, "Energy-Efficient Neural Network Accelerator based on Outlier-Aware Low Precision Computation," in International Symposium on Computer Architecture (ISCA), June 2018.

○ Eunhyeok Park, Dongyoung Kim, Sungjoo Yoo, and Peter Vajda, "Precision Highway for Ultra Low-Precision Quantization," arXiv preprint arXiv:1812.09818, Dec. 2018.

● Temporal convolution for real-time KWS

○ Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, and Sungjoo Ha, "Temporal Convolution for Real-time Keyword Spotting on Mobile Devices," in Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep. 2019.