15
© Copyright Khronos Group 2016 - Page 1 Vision and Neural Net Update November 2016 Neil Trevett Vice President Developer Ecosystem, NVIDIA | President, Khronos [email protected] | @neilt3d

"New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

Embed Size (px)

Citation preview

Page 1: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 1

Vision and Neural Net Update November 2016

Neil Trevett Vice President Developer Ecosystem, NVIDIA | President, Khronos

[email protected] | @neilt3d

Page 2: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 2

Khronos and Open Standards

Software

Silicon

Khronos is an Industry Consortium of over 100 companies

We create royalty-free, open standard APIs for hardware acceleration of

Graphics, Parallel Compute, Neural Networks and Vision

Page 3: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 3

Accelerated API Landscape

Vision Frameworks

High-level Language-based

Acceleration Frameworks

Explicit

Kernels

GPU FPGA

DSP Dedicated

Hardware

Neural Net Libraries OpenVX Neural

Net Extension

3D

Graphics

Page 4: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 4

OpenCL – Low-level Parallel Programing • Low level programming of heterogeneous parallel compute resources

- One code tree can be executed on CPUs, GPUs, DSPs and FPGA

• OpenCL C language to write kernel programs to execute on any compute device

- Platform Layer API - to query, select and initialize compute devices

- Runtime API - to build and execute kernels programs on multiple devices

• New in OpenCL 2.2 - OpenCL C++ kernel language - a static subset of C++14

- Adaptable and elegant sharable code – great for building libraries

- Templates enable meta-programming for highly adaptive software

- Lambdas used to implement nested/dynamic parallelism

OpenCL

Kernel

Code

OpenCL

Kernel

Code

OpenCL

Kernel

Code

OpenCL

Kernel

Code

GPU

DSP CPU

CPU FPGA

Kernel code

compiled for

devices Devices

CPU

Host

Runtime API

loads and executes

kernels across devices

Page 5: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 5

OpenCL Conformant Implementations

OpenCL 1.0 Specification

Dec08 Jun10 OpenCL 1.1

Specification

Nov11 OpenCL 1.2

Specification OpenCL 2.0

Specification

Nov13

1.0 | Jul13

1.0 | Aug09

1.0 | May09

1.0 | May10

1.0 | Feb11

1.0 | May09

1.0 | Jan10

1.1 | Aug10

1.1 | Jul11

1.2 | May12

1.2 | Jun12

1.1 | Feb11

1.1 |Mar11

1.1 | Jun10

1.1 | Aug12

1.1 | Nov12

1.1 | May13

1.1 | Apr12

1.2 | Apr14

1.2 | Sep13

1.2 | Dec12

Desktop

Mobile

FPGA

2.0 | Jul14

OpenCL 2.1 Specification

Nov15

1.2 | May15

2.0 | Dec14

1.0 | Dec14

1.2 | Dec14

1.2 | Sep14

Vendor timelines are first implementation of

each spec generation

1.2 | May15

Embedded

1.2 | Aug15

1.2 | Mar16

2.0 | Nov15

2.1 | Jun15

Page 6: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 6

SYCL for OpenCL • Single-source heterogeneous programming using STANDARD C++

- Use C++ templates and lambda functions for host & device code

• Aligns the hardware acceleration of OpenCL with direction of the C++ standard

- C++14 with open source C++17 Parallel STL hosted by Khronos

C++ Kernel Language Low Level Control

‘GPGPU’-style separation of

device-side kernel source

code and host code

Single-source C++ Programmer Familiarity Approach also taken by

C++ AMP and OpenMP

Developer Choice The development of the two specifications are aligned so

code can be easily shared between the two approaches

Page 7: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 7

OpenVX – Low Power Vision Acceleration • Targeted at vision acceleration in real-time, mobile and embedded platforms

- Precisely defined API for production deployment

• Higher abstraction than OpenCL for performance portability across diverse architectures

- Multi-core CPUs, GPUs, DSPs and DSP arrays, ISPs, Dedicated hardware…

• Extends portable vision acceleration to very low power domains

- Doesn’t require high-power CPU/GPU Complex or OpenCL precision

GPU

Vision Engine

Middleware

Application

DSP

Hardware

Pow

er

Eff

icie

ncy

Computation Flexibility

Dedicated Hardware

GPU Compute

Multi-core CPU X1

X10

X100 Vision Processing

Efficiency

Vision DSPs

Page 8: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 8

OpenVX Graphs • OpenVX developers express a graph of image operations (‘Nodes’)

- Nodes can be on any hardware or processor coded in any language

• Graphs can execute almost autonomously

- Possible to Minimize host interaction during frame-rate graph execution

• Graphs are the key to run-time optimization opportunities…

Array of

Keypoints

YUV

Frame

Gray

Frame

Camera

Input

Rendering

Output

Pyrt

Color Conversion

Channel Extract

Optical Flow

Harris Track

Image Pyramid

RGB

Frame

Array of

Features Ftrt-1 OpenVX Graph

OpenVX Nodes

Feature Extraction Example Graph

Page 9: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 9

OpenVX Efficiency through Graphs..

Reuse pre-allocated memory for

multiple intermediate data

Memory Management

Less allocation overhead, more memory for other applications

Replace a sub-graph with a

single faster node

Kernel Merge

Better memory locality, less kernel launch overhead

Split the graph execution across

the whole system: CPU / GPU /

dedicated HW

Graph Scheduling

Faster execution or lower power consumption

Execute a sub-graph at tile

granularity instead of image

granularity

Data Tiling

Better use of data cache and local memory

Page 10: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 10

Example Relative Performance

1.1

2.9

8.7

1.5

2.5

0

1

2

3

4

5

6

7

8

9

10

Arithmetic Analysis Filter Geometric Overall

OpenCV (GPU accelerated)

OpenVX (GPU accelerated)

Relative Performance

NVIDIA

implementation

experience. Geometric mean of

>2200 primitives,

grouped into each

categories,

running at different

image sizes and

parameter settings

Page 11: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 11

Dedicated Vision

Hardware

Layered Vision Processing Ecosystem

Programmable Vision

Processors

Application

C/C++

Implementers may use OpenCL or Compute Shaders to

implement OpenVX nodes on programmable processors

And then developers can use OpenVX to enable a

developer to easily connect those nodes into a graph

The OpenVX graph enables implementers to optimize execution across

diverse hardware architectures an drive to lower power implementations

OpenVX enables the graph to be extended to include hardware

architectures that don’t support programmable APIs

AMD OpenVX - Open source, highly optimized

for x86 CPU and OpenCL for GPU

- “Graph Optimizer” looks at

entire processing pipeline and

removes/replaces/merges

functions to improve

performance and bandwidth

- Scripting for rapid prototyping,

without re-compiling, at

production performance levels http://gpuopen.com/compute-product/amd-openvx/

Page 12: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 12

OpenVX 1.0 Shipping, OpenVX 1.1 Released! • Multiple OpenVX 1.0 Implementations shipping – spec in October 2014

- Open source sample implementation and conformance tests available

• OpenVX 1.1 Specification released 2nd May 2016

- Expands node functionality AND enhances graph framework

• OpenVX is EXTENSIBLE

- Implementers can add their own nodes at any time to meet customer and market needs

= shipping implementations

Page 13: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 13

OpenVX Neural Net Extension • Convolution Neural Network topologies can be represented as OpenVX graphs

- Layers are represented as OpenVX nodes

- Layers connected by multi-dimensional tensors objects

- Layer types include convolution, activation, pooling, fully-connected, soft-max

- CNN nodes can be mixed with traditional vision nodes

• Import/Export Extension

- Efficient handling of network Weights/Biases or complete networks

• The specification is provisional

- Welcome feedback from the deep learning community

Vision Node

Vision Node

Vision Node

Downstream

Application

Processing

Native

Camera

Control CNN Nodes

An OpenVX graph mixing CNN nodes

with traditional vision nodes

Page 14: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 14

NNEF - Neural Network Exchange Format

NN Authoring Framework 1

NN Authoring Framework 2

NN Authoring Framework 3

Inference Engine 1

Inference Engine 2

Inference Engine 3

NNEF encapsulates neural network structure, data formats, commonly used operations

(such as convolution, pooling, normalization, etc.) and formal network semantics

NN Authoring Framework 1

NN Authoring Framework 2

NN Authoring Framework 3

Inference Engine 1

Inference Engine 2

Inference Engine 3

Neural Net

Exchange

Format

NNEF 1.0 is currently being defined

OpenVX will import NNEF files

Page 15: "New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016 - Page 15

Please Consider Joining Khronos!

Influence how

standards evolve!

Access draft specs to

build products faster!

Understand early

industry requirements!

Ship products that conform to

international standards!

Khronos is proven to RAPIDLY generate hardware API standards

that create significant market opportunities

Any company or organization is welcome to join Khronos

for a voice and a vote in any of its standards

www.khronos.org