XILINX TECHNOLOGY DAY (赛灵思技术日)
Zhang Fan (张帆), Senior Global AI Solutions Technical Expert | 2019.03.19
Accelerating Machine Learning Inference with Xilinx FPGAs
© Copyright 2019 Xilinx
Who is Xilinx? Why should I choose an FPGA?
1. The only HW/SW-configurable device for fast-changing networks
2. High performance / low power with a custom internal memory hierarchy
3. Future-proof for lower precisions
4. Low latency end-to-end
5. A scalable device family for different applications
Xilinx AI Solutions

Supported devices span the family, from Zynq-7020 through ZU2 to ZU9. The solution stack:
• Algorithm Model Zoo: dense neural network → pruned neural network → compressed sparse neural network, via pruning and quantization
• Xilinx AI SDK
• DNNDK: Core API, Driver, Runtime, Loader, Tracer; compilation via compiler, assembler, and deep-learning libraries
• DPU and accelerating IPs
• BSPs for boards, cross-compiling tools, and reference solutions
• Target applications: data center, surveillance, ADAS/AD
Algorithm Model Zoo

| Network | Backbone | DPU Deployment | Input Size | OPs | Params | Training Set | Val Set | Eval Norm (Float) | Eval Norm (Fixed) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Resnet50_v1 | Resnet | Yes | 224*224 | 7.7G | 25.6M | ImageNet | ImageNet | 0.7483 | 0.7338 |
| Inception_v3 | Inception | No | 299*299 | 11.43G | 23.8M | ImageNet | ImageNet | 0.7401 | 0.7347 |
| SSD | VGG16 | No | 300*300 | 62.77G | 26.3M | Voc07+12 | Voc07 | 77.19% | |
| RefineDet | VGG-16 | Support | 480*360 | 123.9G | 29.6M | Coco2014 | Coco2014 | 70.14% | |
| Densebox | | Yes | 320*320 | 492M | | Private | Private | 97.92% | 97.50% |
| Yolo_v3 | Yolo | Yes | 512*288 | 53.7G | 61.8M | Cityscape | Cityscape | 53.7% | 53.1% |

DPU deployment status:
• Yes: the model is successfully deployed on the DPU.
• Support: the model is supported but not yet deployed; a similar model structure has been deployed and tested successfully.
• No: the model is not currently supported by the DPU, mainly due to special operations or layers.

69 models in the zoo, of which 34 are already deployed.
DNNDK – Deep Neural Network Development Kit

˃ DECENT: float32-to-INT8 quantization with a one-line command
˃ DNNC: automatic layer fusion to avoid frequent data reads and writes
˃ Runtime N2Cube: various APIs to facilitate specific applications
˃ Profiler DSight: a powerful tool for failure analysis and optimization

The DNNDK toolchain (DECENT, DNNC, N2Cube, DSight) runs on top of the customer platform (board, OS).
DNNDK Development Flow
Five Steps with DNNDK
01 Model Compression
02 Model Compilation
03 Programming
04 Hybrid Compilation
05 Execution
Step 1: Model Compression with DECENT

• Consists of two separate tools: a Quantization Tool and a Pruning Tool
• Platforms: Caffe, Darknet, TensorFlow
• Status: the Quantization Tool is in beta; the Pruning Tool is an internal version
• Effects: compresses model size 5x – 100x and running time 1.5x – 10x

decent – Deep Compression Tool
decent_q – Quantization Tool
decent_p – Pruning Tool
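The core idea behind the quantization step can be sketched in a few lines: map float32 weights to INT8 using a power-of-two scale, which alone gives roughly a 4x reduction in weight storage before any pruning. This is a minimal illustrative sketch of symmetric fixed-point quantization, not the actual DECENT implementation (which also calibrates activations over a sample dataset).

```python
import math

def quantize_int8(weights, bitwidth=8):
    """Symmetric fixed-point quantization with a power-of-two scale
    (illustrative sketch only; not DECENT's real algorithm)."""
    qmax = 2 ** (bitwidth - 1) - 1   # 127 for INT8
    qmin = -(2 ** (bitwidth - 1))    # -128 for INT8
    max_abs = max(abs(w) for w in weights)
    # Largest power-of-two scale that keeps max|w| inside the INT8 range.
    frac_bits = math.floor(math.log2(qmax / max_abs))
    scale = 2.0 ** frac_bits
    quantized = [min(qmax, max(qmin, round(w * scale))) for w in weights]
    return quantized, frac_bits

weights = [0.5, -0.24, 0.125, 0.03]
q, frac_bits = quantize_int8(weights)          # q = [64, -31, 16, 4], frac_bits = 7
dequant = [v / 2.0 ** frac_bits for v in q]    # recover approximate floats
```

Each dequantized value differs from the original by at most one quantization step (2^-7 here), which is why the fixed-point accuracy in the Model Zoo table stays so close to the float baseline.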
Step 2: Model Compilation with DNNC

DNNC compiles a neural network model into DPU instruction code through three stages: Parser → Optimizer → Code Generator.

Programmable tensor-level DPU instruction set:
• Compatible with the Caffe and TensorFlow frameworks
• Flexible and scalable for various CNN layers
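A concrete example of the layer fusion such a compiler can perform is folding batch normalization into the preceding convolution, so the fused pair executes as one operation with no intermediate tensor written back to memory. The sketch below shows the per-channel arithmetic; the function name and data layout are illustrative, not DNNC internals.

```python
import math

def fold_batchnorm(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the
    convolution's own weights and bias. conv_w[i] holds the flattened
    weights of output channel i; the rest are per-output-channel values."""
    folded_w, folded_b = [], []
    for i in range(len(conv_w)):
        s = gamma[i] / math.sqrt(var[i] + eps)          # per-channel scale
        folded_w.append([w * s for w in conv_w[i]])     # scale the kernel
        folded_b.append((conv_b[i] - mean[i]) * s + beta[i])
    return folded_w, folded_b

w, b = fold_batchnorm(conv_w=[[1.0, 2.0]], conv_b=[0.5],
                      gamma=[2.0], beta=[0.1], mean=[0.5], var=[4.0])
```

After folding, the batch-norm layer disappears entirely: the conv produces the final result directly, which is exactly the kind of read/write traffic the DNNC bullet above refers to avoiding.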
Hardware Acceleration – SSD

[Chart: Pruning speedup on hardware (2x DPU-4096 @ ZU9), VGG_SSD 4-class detection on DeePhi surveillance data. Across 11 pruning iterations, operations fall from 117G to 11.6G while mAP holds roughly steady (61.5% at baseline, 60.4% at iteration 11); fps rises from 18 to 71 (4x) and then to 103 (5.7x). Series plotted: Operations(G), mAP(%), fps vs. pruning iteration.]
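The pruning behind these numbers is coarse-grained: whole channels are removed so the network stays dense and maps efficiently to the DPU, then the model is fine-tuned and pruned again, iteration after iteration. A toy sketch of one common selection criterion (smallest-L1-norm channels go first); DeePhi's actual sensitivity-driven method is more involved:

```python
def select_channels_to_keep(channel_weights, keep_ratio):
    """channel_weights: list where entry i is the flat weight list of
    output channel i. Returns sorted indices of the channels to keep,
    ranked by L1 norm (a simple stand-in for a real pruning criterion)."""
    ranked = sorted(range(len(channel_weights)),
                    key=lambda i: sum(abs(w) for w in channel_weights[i]),
                    reverse=True)
    n_keep = max(1, int(len(channel_weights) * keep_ratio))
    return sorted(ranked[:n_keep])

channels = [[0.1, -0.1], [1.0, 1.0], [0.5, -0.5], [0.01, 0.02]]
kept = select_channels_to_keep(channels, keep_ratio=0.5)  # keeps [1, 2]
```

Because entire channels disappear, both the weight tensors and every downstream feature map shrink, which is why operations drop roughly 10x while mAP barely moves.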
Xilinx AI SDK
˃ The Xilinx AI SDK enables low-touch engagements for customers
Top customers are encouraged to use the SDK to build applications and solutions; lower-priority customers are supported only through the generic SDK release.

The Xilinx AI Suite bundles the Xilinx AI SDK with DNNDK, the Algorithm Model Zoo, libraries and reference solutions, BSPs for boards, cross-compiling tools, and the DPU and accelerating IPs.
DPU Scalability

Peak INT8 performance (OPS) by device:

| Device | Peak INT8 Perf |
| --- | --- |
| Z7010 | 56G |
| Z7012S | 102G |
| Z7014S/Z7015 | 115G |
| Z7020 | 230G |
| Z7030 | 700G |
| Z7035 | 1.7T |
| Z7045/Z7100 | 2.8T |
| ZU2 | 576G |
| ZU3 | 1.2T |
| ZU4 | 1.6T |
| ZU5 | 2.4T |
| ZU6 | 2.9T |
| ZU7 | 3.5T |
| ZU9 | 4.1T |
| ZU11 | 5.5T |
| ZU15 | 6.8T |

* B256/288/512/3136 DPU configurations are work in progress.
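A back-of-the-envelope way to read this table: divide a device's peak INT8 OPS by a model's per-frame workload from the Model Zoo to get a theoretical throughput ceiling. This assumes 100% DPU utilization, so real throughput is always lower:

```python
def theoretical_fps(peak_ops_per_s, model_ops_per_frame):
    """Upper-bound frames/s assuming 100% DPU utilization (never reached
    in practice due to memory traffic and scheduling overhead)."""
    return peak_ops_per_s / model_ops_per_frame

# Example: ZU9 (4.1 TOPS peak INT8) running Resnet50_v1 (7.7 GOPs/frame)
fps = theoretical_fps(4.1e12, 7.7e9)  # about 532 fps ceiling
```

The gap between this ceiling and measured fps is where pruning and layer fusion pay off: fewer operations per frame and less memory traffic push real throughput closer to the peak.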
DPU IP Integration

The DPU TRD (Targeted Reference Design) has been published on Xilinx.com:
https://www.xilinx.com/products/design-tools/ai-inference/ai-developer-hub.html#edge
Key Takeaways

1. Xilinx provides advanced edge ML solutions.
2. DNNDK makes deploying neural networks on FPGAs easier.
3. The DPU IP is available for free download on the Xilinx website.
Adaptable. Intelligent.