Upload
embedded-vision-alliance
View
84
Download
0
Embed Size (px)
Citation preview
© Copyright Khronos Group 2016 - Page 1
Vision and Neural Net Update November 2016
Neil Trevett Vice President Developer Ecosystem, NVIDIA | President, Khronos
[email protected] | @neilt3d
© Copyright Khronos Group 2016 - Page 2
Khronos and Open Standards
Software
Silicon
Khronos is an Industry Consortium of over 100 companies
We create royalty-free, open standard APIs for hardware acceleration of
Graphics, Parallel Compute, Neural Networks and Vision
© Copyright Khronos Group 2016 - Page 3
Accelerated API Landscape
Vision Frameworks
High-level Language-based
Acceleration Frameworks
Explicit
Kernels
GPU FPGA
DSP Dedicated
Hardware
Neural Net Libraries OpenVX Neural
Net Extension
3D
Graphics
© Copyright Khronos Group 2016 - Page 4
OpenCL – Low-level Parallel Programing • Low level programming of heterogeneous parallel compute resources
- One code tree can be executed on CPUs, GPUs, DSPs and FPGA
• OpenCL C language to write kernel programs to execute on any compute device
- Platform Layer API - to query, select and initialize compute devices
- Runtime API - to build and execute kernels programs on multiple devices
• New in OpenCL 2.2 - OpenCL C++ kernel language - a static subset of C++14
- Adaptable and elegant sharable code – great for building libraries
- Templates enable meta-programming for highly adaptive software
- Lambdas used to implement nested/dynamic parallelism
OpenCL
Kernel
Code
OpenCL
Kernel
Code
OpenCL
Kernel
Code
OpenCL
Kernel
Code
GPU
DSP CPU
CPU FPGA
Kernel code
compiled for
devices Devices
CPU
Host
Runtime API
loads and executes
kernels across devices
© Copyright Khronos Group 2016 - Page 5
OpenCL Conformant Implementations
OpenCL 1.0 Specification
Dec08 Jun10 OpenCL 1.1
Specification
Nov11 OpenCL 1.2
Specification OpenCL 2.0
Specification
Nov13
1.0 | Jul13
1.0 | Aug09
1.0 | May09
1.0 | May10
1.0 | Feb11
1.0 | May09
1.0 | Jan10
1.1 | Aug10
1.1 | Jul11
1.2 | May12
1.2 | Jun12
1.1 | Feb11
1.1 |Mar11
1.1 | Jun10
1.1 | Aug12
1.1 | Nov12
1.1 | May13
1.1 | Apr12
1.2 | Apr14
1.2 | Sep13
1.2 | Dec12
Desktop
Mobile
FPGA
2.0 | Jul14
OpenCL 2.1 Specification
Nov15
1.2 | May15
2.0 | Dec14
1.0 | Dec14
1.2 | Dec14
1.2 | Sep14
Vendor timelines are first implementation of
each spec generation
1.2 | May15
Embedded
1.2 | Aug15
1.2 | Mar16
2.0 | Nov15
2.1 | Jun15
© Copyright Khronos Group 2016 - Page 6
SYCL for OpenCL • Single-source heterogeneous programming using STANDARD C++
- Use C++ templates and lambda functions for host & device code
• Aligns the hardware acceleration of OpenCL with direction of the C++ standard
- C++14 with open source C++17 Parallel STL hosted by Khronos
C++ Kernel Language Low Level Control
‘GPGPU’-style separation of
device-side kernel source
code and host code
Single-source C++ Programmer Familiarity Approach also taken by
C++ AMP and OpenMP
Developer Choice The development of the two specifications are aligned so
code can be easily shared between the two approaches
© Copyright Khronos Group 2016 - Page 7
OpenVX – Low Power Vision Acceleration • Targeted at vision acceleration in real-time, mobile and embedded platforms
- Precisely defined API for production deployment
• Higher abstraction than OpenCL for performance portability across diverse architectures
- Multi-core CPUs, GPUs, DSPs and DSP arrays, ISPs, Dedicated hardware…
• Extends portable vision acceleration to very low power domains
- Doesn’t require high-power CPU/GPU Complex or OpenCL precision
GPU
Vision Engine
Middleware
Application
DSP
Hardware
Pow
er
Eff
icie
ncy
Computation Flexibility
Dedicated Hardware
GPU Compute
Multi-core CPU X1
X10
X100 Vision Processing
Efficiency
Vision DSPs
© Copyright Khronos Group 2016 - Page 8
OpenVX Graphs • OpenVX developers express a graph of image operations (‘Nodes’)
- Nodes can be on any hardware or processor coded in any language
• Graphs can execute almost autonomously
- Possible to Minimize host interaction during frame-rate graph execution
• Graphs are the key to run-time optimization opportunities…
Array of
Keypoints
YUV
Frame
Gray
Frame
Camera
Input
Rendering
Output
Pyrt
Color Conversion
Channel Extract
Optical Flow
Harris Track
Image Pyramid
RGB
Frame
Array of
Features Ftrt-1 OpenVX Graph
OpenVX Nodes
Feature Extraction Example Graph
© Copyright Khronos Group 2016 - Page 9
OpenVX Efficiency through Graphs..
Reuse pre-allocated memory for
multiple intermediate data
Memory Management
Less allocation overhead, more memory for other applications
Replace a sub-graph with a
single faster node
Kernel Merge
Better memory locality, less kernel launch overhead
Split the graph execution across
the whole system: CPU / GPU /
dedicated HW
Graph Scheduling
Faster execution or lower power consumption
Execute a sub-graph at tile
granularity instead of image
granularity
Data Tiling
Better use of data cache and local memory
© Copyright Khronos Group 2016 - Page 10
Example Relative Performance
1.1
2.9
8.7
1.5
2.5
0
1
2
3
4
5
6
7
8
9
10
Arithmetic Analysis Filter Geometric Overall
OpenCV (GPU accelerated)
OpenVX (GPU accelerated)
Relative Performance
NVIDIA
implementation
experience. Geometric mean of
>2200 primitives,
grouped into each
categories,
running at different
image sizes and
parameter settings
© Copyright Khronos Group 2016 - Page 11
Dedicated Vision
Hardware
Layered Vision Processing Ecosystem
Programmable Vision
Processors
Application
C/C++
Implementers may use OpenCL or Compute Shaders to
implement OpenVX nodes on programmable processors
And then developers can use OpenVX to enable a
developer to easily connect those nodes into a graph
The OpenVX graph enables implementers to optimize execution across
diverse hardware architectures an drive to lower power implementations
OpenVX enables the graph to be extended to include hardware
architectures that don’t support programmable APIs
AMD OpenVX - Open source, highly optimized
for x86 CPU and OpenCL for GPU
- “Graph Optimizer” looks at
entire processing pipeline and
removes/replaces/merges
functions to improve
performance and bandwidth
- Scripting for rapid prototyping,
without re-compiling, at
production performance levels http://gpuopen.com/compute-product/amd-openvx/
© Copyright Khronos Group 2016 - Page 12
OpenVX 1.0 Shipping, OpenVX 1.1 Released! • Multiple OpenVX 1.0 Implementations shipping – spec in October 2014
- Open source sample implementation and conformance tests available
• OpenVX 1.1 Specification released 2nd May 2016
- Expands node functionality AND enhances graph framework
• OpenVX is EXTENSIBLE
- Implementers can add their own nodes at any time to meet customer and market needs
= shipping implementations
© Copyright Khronos Group 2016 - Page 13
OpenVX Neural Net Extension • Convolution Neural Network topologies can be represented as OpenVX graphs
- Layers are represented as OpenVX nodes
- Layers connected by multi-dimensional tensors objects
- Layer types include convolution, activation, pooling, fully-connected, soft-max
- CNN nodes can be mixed with traditional vision nodes
• Import/Export Extension
- Efficient handling of network Weights/Biases or complete networks
• The specification is provisional
- Welcome feedback from the deep learning community
Vision Node
Vision Node
Vision Node
Downstream
Application
Processing
Native
Camera
Control CNN Nodes
An OpenVX graph mixing CNN nodes
with traditional vision nodes
© Copyright Khronos Group 2016 - Page 14
NNEF - Neural Network Exchange Format
NN Authoring Framework 1
NN Authoring Framework 2
NN Authoring Framework 3
Inference Engine 1
Inference Engine 2
Inference Engine 3
NNEF encapsulates neural network structure, data formats, commonly used operations
(such as convolution, pooling, normalization, etc.) and formal network semantics
NN Authoring Framework 1
NN Authoring Framework 2
NN Authoring Framework 3
Inference Engine 1
Inference Engine 2
Inference Engine 3
Neural Net
Exchange
Format
NNEF 1.0 is currently being defined
OpenVX will import NNEF files
© Copyright Khronos Group 2016 - Page 15
Please Consider Joining Khronos!
Influence how
standards evolve!
Access draft specs to
build products faster!
Understand early
industry requirements!
Ship products that conform to
international standards!
Khronos is proven to RAPIDLY generate hardware API standards
that create significant market opportunities
Any company or organization is welcome to join Khronos
for a voice and a vote in any of its standards
www.khronos.org