XILINX TECHNOLOGY DAY (赛灵思技术日)
Zhang Fan (张帆), Senior Global AI Solutions Technical Expert | 2019.03.19
Accelerating Machine Learning Inference with Xilinx FPGAs
© Copyright 2019 Xilinx
Who is Xilinx? Why should I choose an FPGA?
1. The only HW/SW-configurable device for fast-changing networks
2. High performance / low power with a custom internal memory hierarchy
3. Future-proof for lower precisions
4. Low latency end-to-end
5. A scalable device family for different applications
Xilinx AI Solutions

Supported devices span the family, from Zynq-7020 through ZU2 to ZU9. The solution stack:
• Algorithm Model Zoo: dense neural network → pruned neural network → compressed sparse neural network, via pruning and quantization
• Xilinx AI SDK
• DNNDK: Core API, Driver, Runtime, Loader, Tracer; compilation via compiler, assembler, and deep-learning libraries
• DPU and accelerating IPs
• BSPs for boards, cross-compiling tools, and reference solutions
• Target applications: data center, surveillance, ADAS/AD
Algorithm Model Zoo

| Network | Backbone | DPU Deployment | Input Size | OPs | Params | Training Set | Val Set | Eval Norm (Float) | Eval Norm (Fixed) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Resnet50_v1 | Resnet | Yes | 224*224 | 7.7G | 25.6M | ImageNet | ImageNet | 0.7483 | 0.7338 |
| Inception_v3 | Inception | No | 299*299 | 11.43G | 23.8M | ImageNet | ImageNet | 0.7401 | 0.7347 |
| SSD | VGG16 | No | 300*300 | 62.77G | 26.3M | Voc07+12 | Voc07 | 77.19% | |
| RefineDet | VGG-16 | Support | 480*360 | 123.9G | 29.6M | Coco2014 | Coco2014 | 70.14% | |
| Densebox | | Yes | 320*320 | 492M | | Private | Private | 97.92% | 97.50% |
| Yolo_v3 | Yolo | Yes | 512*288 | 53.7G | 61.8M | Cityscape | Cityscape | 53.7% | 53.1% |

DPU deployment status:
• Yes: the model is successfully deployed on the DPU.
• Support: the model is supported but not yet deployed; a similar model structure has been deployed and tested successfully.
• No: the model is not currently supported by the DPU, mainly due to special operations or layers.

69 models in the zoo, of which 34 are already deployed.
DNNDK – Deep Neural Network Development Kit

˃ DECENT: float32-to-INT8 quantization with a one-line command
˃ DNNC: automatic layer fusion to avoid frequent data reads and writes
˃ Runtime N2Cube: various APIs to facilitate specific applications
˃ Profiler DSight: a powerful tool for failure analysis and optimization

The DNNDK toolchain (DECENT, DNNC, N2Cube, DSight) runs on top of the customer platform (board, OS).
DNNDK Development Flow
Five Steps with DNNDK
01 Model Compression
02 Model Compilation
03 Programming
04 Hybrid Compilation
05 Execution
Step 1: Model Compression with DECENT

• Consists of two separate tools: a Quantization Tool and a Pruning Tool
• Platforms: Caffe, Darknet, TensorFlow
• Status: the Quantization Tool is in beta; the Pruning Tool is an internal version
• Effects: compresses model size 5x – 100x and running time 1.5x – 10x

decent – Deep Compression Tool
decent_q – Quantization Tool
decent_p – Pruning Tool
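The core idea behind the quantization step can be sketched in a few lines: map float32 weights to INT8 using a power-of-two scale, which alone gives roughly a 4x reduction in weight storage before any pruning. This is a minimal illustrative sketch of symmetric fixed-point quantization, not the actual DECENT implementation (which also calibrates activations over a sample dataset).

```python
import math

def quantize_int8(weights, bitwidth=8):
    """Symmetric fixed-point quantization with a power-of-two scale
    (illustrative sketch only; not DECENT's real algorithm)."""
    qmax = 2 ** (bitwidth - 1) - 1   # 127 for INT8
    qmin = -(2 ** (bitwidth - 1))    # -128 for INT8
    max_abs = max(abs(w) for w in weights)
    # Largest power-of-two scale that keeps max|w| inside the INT8 range.
    frac_bits = math.floor(math.log2(qmax / max_abs))
    scale = 2.0 ** frac_bits
    quantized = [min(qmax, max(qmin, round(w * scale))) for w in weights]
    return quantized, frac_bits

weights = [0.5, -0.24, 0.125, 0.03]
q, frac_bits = quantize_int8(weights)          # q = [64, -31, 16, 4], frac_bits = 7
dequant = [v / 2.0 ** frac_bits for v in q]    # recover approximate floats
```

Each dequantized value differs from the original by at most one quantization step (2^-7 here), which is why the fixed-point accuracy in the Model Zoo table stays so close to the float baseline.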
Step 2: Model Compilation with DNNC

DNNC compiles a neural network model into DPU instruction code through three stages: Parser → Optimizer → Code Generator.

Programmable tensor-level DPU instruction set:
• Compatible with the Caffe and TensorFlow frameworks
• Flexible and scalable for various CNN layers
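A concrete example of the layer fusion such a compiler can perform is folding batch normalization into the preceding convolution, so the fused pair executes as one operation with no intermediate tensor written back to memory. The sketch below shows the per-channel arithmetic; the function name and data layout are illustrative, not DNNC internals.

```python
import math

def fold_batchnorm(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the
    convolution's own weights and bias. conv_w[i] holds the flattened
    weights of output channel i; the rest are per-output-channel values."""
    folded_w, folded_b = [], []
    for i in range(len(conv_w)):
        s = gamma[i] / math.sqrt(var[i] + eps)          # per-channel scale
        folded_w.append([w * s for w in conv_w[i]])     # scale the kernel
        folded_b.append((conv_b[i] - mean[i]) * s + beta[i])
    return folded_w, folded_b

w, b = fold_batchnorm(conv_w=[[1.0, 2.0]], conv_b=[0.5],
                      gamma=[2.0], beta=[0.1], mean=[0.5], var=[4.0])
```

After folding, the batch-norm layer disappears entirely: the conv produces the final result directly, which is exactly the kind of read/write traffic the DNNC bullet above refers to avoiding.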
Hardware Acceleration – SSD

[Chart: Pruning speedup on hardware (2x DPU-4096 @ ZU9), VGG_SSD 4-class detection on DeePhi surveillance data. Across 11 pruning iterations, operations fall from 117G to 11.6G while mAP holds roughly steady (61.5% at baseline, 60.4% at iteration 11); fps rises from 18 to 71 (4x) and then to 103 (5.7x). Series plotted: Operations(G), mAP(%), fps vs. pruning iteration.]
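The pruning behind these numbers is coarse-grained: whole channels are removed so the network stays dense and maps efficiently to the DPU, then the model is fine-tuned and pruned again, iteration after iteration. A toy sketch of one common selection criterion (smallest-L1-norm channels go first); DeePhi's actual sensitivity-driven method is more involved:

```python
def select_channels_to_keep(channel_weights, keep_ratio):
    """channel_weights: list where entry i is the flat weight list of
    output channel i. Returns sorted indices of the channels to keep,
    ranked by L1 norm (a simple stand-in for a real pruning criterion)."""
    ranked = sorted(range(len(channel_weights)),
                    key=lambda i: sum(abs(w) for w in channel_weights[i]),
                    reverse=True)
    n_keep = max(1, int(len(channel_weights) * keep_ratio))
    return sorted(ranked[:n_keep])

channels = [[0.1, -0.1], [1.0, 1.0], [0.5, -0.5], [0.01, 0.02]]
kept = select_channels_to_keep(channels, keep_ratio=0.5)  # keeps [1, 2]
```

Because entire channels disappear, both the weight tensors and every downstream feature map shrink, which is why operations drop roughly 10x while mAP barely moves.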
Xilinx AI SDK
˃ The Xilinx AI SDK enables low-touch engagements for customers
Top customers are encouraged to use the SDK to build applications and solutions; lower-priority customers are supported only through the generic SDK release.

The Xilinx AI Suite bundles the Xilinx AI SDK with DNNDK, the Algorithm Model Zoo, libraries and reference solutions, BSPs for boards, cross-compiling tools, and the DPU and accelerating IPs.
DPU Scalability

Peak INT8 performance (OPS) by device:

| Device | Peak INT8 Perf |
| --- | --- |
| Z7010 | 56G |
| Z7012S | 102G |
| Z7014S/Z7015 | 115G |
| Z7020 | 230G |
| Z7030 | 700G |
| Z7035 | 1.7T |
| Z7045/Z7100 | 2.8T |
| ZU2 | 576G |
| ZU3 | 1.2T |
| ZU4 | 1.6T |
| ZU5 | 2.4T |
| ZU6 | 2.9T |
| ZU7 | 3.5T |
| ZU9 | 4.1T |
| ZU11 | 5.5T |
| ZU15 | 6.8T |

* B256/288/512/3136 DPU configurations are work in progress.
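A back-of-the-envelope way to read this table: divide a device's peak INT8 OPS by a model's per-frame workload from the Model Zoo to get a theoretical throughput ceiling. This assumes 100% DPU utilization, so real throughput is always lower:

```python
def theoretical_fps(peak_ops_per_s, model_ops_per_frame):
    """Upper-bound frames/s assuming 100% DPU utilization (never reached
    in practice due to memory traffic and scheduling overhead)."""
    return peak_ops_per_s / model_ops_per_frame

# Example: ZU9 (4.1 TOPS peak INT8) running Resnet50_v1 (7.7 GOPs/frame)
fps = theoretical_fps(4.1e12, 7.7e9)  # about 532 fps ceiling
```

The gap between this ceiling and measured fps is where pruning and layer fusion pay off: fewer operations per frame and less memory traffic push real throughput closer to the peak.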
DPU IP Integration

The DPU TRD (Targeted Reference Design) has been published on Xilinx.com:
https://www.xilinx.com/products/design-tools/ai-inference/ai-developer-hub.html#edge
Key Takeaways

1. Xilinx provides advanced edge ML solutions.
2. DNNDK makes deploying neural networks on FPGAs easier.
3. The DPU IP is available for free download on the Xilinx website.
Adaptable. Intelligent.