April 4-7, 2016 | Silicon Valley

S6783

Elif Albuz, April 4, 2016

VISIONWORKS™ A CUDA ACCELERATED COMPUTER VISION LIBRARY

AGENDA

Motivation

Introduction to VisionWorks™

VisionWorks™ Software Stack

VisionWorks™ Programming Model

Conclusion

Demo

COMPUTER VISION

Intelligent Video Analytics

Drones

Autonomous Driving

Robotics

Augmented Reality

COMPUTER VISION APP DEVELOPMENT

Concept -> Reference Implementation -> Port to target & optimize -> Product

VISIONWORKS™ MOTIVATION

Deliver high performance, robust computer vision primitives

Ease development of computer vision applications on Tegra platforms

Accelerate prototype to product cycle

Depth Map

Optical Flow

Corner detection

VISIONWORKS™ AT A GLANCE

CUDA accelerated library (OpenVX primitives + NVIDIA extensions + Plus Algorithms)

Flexible framework for seamlessly adding user-defined primitives

Interoperability with OpenCV

Thread-safe API

Documentation, tutorials, sample software pipelines that teach use of primitives and framework

VISIONWORKS™ SUPPORTED PLATFORMS

Automotive: JETSON TK1 Pro, Drive PX, Drive PX2

Embedded: JETSON TK1, JETSON TX1

Desktop: Ubuntu Linux 14.04, Windows 8

VISIONWORKS™ TOOLKIT SOFTWARE STACK

[Diagram: software stack components]

• VisionWorks-Plus: VisionWorks SfM, VisionWorks Object Tracker, ...
• VisionWorks Source Samples: Feature Tracking, Hough Transform, Stereo Depth Extraction, Camera Hist Equalization, ...
• NVXIO Multimedia Abstraction
• VisionWorks Core Library
• OpenVX™ Framework & Primitives (Khronos)
• NVIDIA VisionWorks Framework & Primitive Extensions (NVIDIA)
• VisionWorks CUDA API
• CUDA Acceleration Framework

VISIONWORKS™ PRIMITIVES

IMAGE ARITHMETIC: Absolute Difference, Accumulate Image, Accumulate Squared, Accumulate Weighted, Add / Subtract / Multiply +, Channel Combine, Channel Extract, Color Convert +, CopyImage, Convert Depth, Magnitude, MultiplyByScalar, Not / Or / And / Xor, Phase, Table Lookup, Threshold

FLOW & DEPTH: Median Flow, Optical Flow (LK) +, Semi-Global Matching, Stereo Block Matching, IME Create Motion Field, IME Refine Motion Field, IME Partition Motion Field

GEOMETRIC TRANSFORMS: Affine Warp +, Warp Perspective +, Flip Image, Remap, Scale Image +

FILTERS: BoxFilter, Convolution, Dilation Filter, Erosion Filter, Gaussian Filter, Gaussian Pyramid, Laplacian3x3, Median Filter, Scharr3x3, Sobel 3x3

FEATURES: Canny Edge Detector, FAST Corners +, FAST Track, Harris Corners +, Harris Track, Hough Circles, Hough Lines

ANALYSIS: Histogram, Histogram Equalization, Integral Image, Mean Std Deviation, Min Max Locations

Legend (color coding on the slide): All OpenVX Primitives; NVIDIA extension primitives; + type/mode extension by NVIDIA

VISIONWORKS™ PRIMITIVES

• VisionWorks primitives are CUDA optimized (except the MedianFlow & FindHomography extensions).

• 85% of the VisionWorks OpenVX API is also accelerated with NEON. The NEON-optimized primitives are listed in a table in the VisionWorks Toolkit Reference (go to "VisionWorks API" -> "NVIDIA Extensions API" -> "Vision Primitives API").

• Primitive acceleration with VisionWorks:

• Up to 92x speedup compared to OpenCV CPU kernels on Drive PX (average 8x)

• Up to 13x speedup compared to OpenCV CUDA kernels on Drive PX (average 2x)

(Measured on Drive PX, OS='V4L', Linux Kernel='3.18.21-tegra-g06aec38', CPU Rate='1632 MHz', GPU Rate='844 MHz', EMC Rate='1600 MHz')

VISIONWORKS™ SAMPLE APPLICATIONS

Feature Tracker

Stereo Depth Extraction

OpenCV-NPP-OpenVX Interop

Hough Lines & Circles

+ Video stabilization

+ Iterative Motion Estimation/Flow

and other platform-specific samples (available only on certain platforms): Camera Capture, OpenGL interop, Video playback

VISIONWORKS SAMPLE APPLICATIONS: NVXIO MULTIMEDIA ABSTRACTION

[Diagram: NVXIO multimedia pipeline]

• Inputs: camera input (CSI, ISP & camera processing), video/image file input, streamed video/image input
• Image/Video Decode
• Interop/EGLStreams
• Vision processing (CUDA)
• Interop/EGLStreams
• GFX Render, Image/Video Encode, ...

[Diagram: SoC blocks] CPU complex (multi-core ARM v8), GPU, security engine, 2D engine (VIC), video encoder, video decoder, audio engine (APE), safety engine (SCE), image proc (ISP), safety manager (HSM), boot proc (BPMP), CAN proc (SPE), I/O

VISIONWORKS™ PLUS ALGORITHMS

Structure From Motion

Object Tracker

Programming with VisionWorks Library

VISIONWORKS™ PROGRAMMING MODEL

VisionWorks OpenVX™ Immediate Mode: Standard-specified heterogeneous compute API with individual function calls

VisionWorks OpenVX™ Graph Mode: Heterogeneous compute API with graph optimizations. Extensible with user-defined nodes

VisionWorks CUDA API: Direct CUDA API for advanced CUDA developers

VISIONWORKS OPENVX™ IMMEDIATE MODE VIDEO STABILIZATION SAMPLE

OpenVX Immediate mode API enables developers to easily port their applications.

OpenVX API Immediate mode calls are prefixed with “vxu”

The sample ports an OpenCV video stabilization algorithm to VisionWorks immediate mode.

[Diagram: pipeline blocks] OpenCV image source; cv::Mat to vx_image; Color Conversion; Image Pyramid; Feature detection; Optical Flow; Process pts & Find Homography; Warp Perspective; Stabilized frames

Performance boost: the video stabilization application runs 2.6x faster, including the overhead of the cv::Mat to vx_image conversions.

[Diagram: pipeline blocks (OpenCV image source; cv::Mat to vx_image; Color Conversion; Image Pyramid; Feature detection; Optical Flow; Process pts & Find Homography; Warp Perspective; Stabilized frames) annotated with per-block speedups of 0.6x, 1.4x, 1.7x, 4.9x, 2.3x, and 4.6x]
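
The 2.6x figure is an end-to-end number, while the per-block annotations range from 0.6x to 4.9x. A minimal C++ sketch of how such numbers combine follows. The `Stage` struct, the `endToEndSpeedup` helper, and every timing value are hypothetical and are not taken from the slides. The point is only the arithmetic: total baseline time divided by total accelerated time, with conversion overhead added to the accelerated side.

```cpp
#include <cassert>
#include <vector>

// One pipeline stage: its baseline run time and the speedup measured for it.
// A speedup below 1.0 means the stage got slower after porting.
struct Stage {
    double baseline_ms;  // hypothetical baseline time, not from the slides
    double speedup;      // baseline time divided by accelerated time
};

// End-to-end speedup = total baseline time / total accelerated time.
// overhead_ms models work that exists only in the ported pipeline
// (for example, image format conversions).
double endToEndSpeedup(const std::vector<Stage>& stages, double overhead_ms) {
    double before = 0.0;
    double after = overhead_ms;
    for (const Stage& s : stages) {
        assert(s.baseline_ms >= 0.0 && s.speedup > 0.0);
        before += s.baseline_ms;
        after += s.baseline_ms / s.speedup;
    }
    assert(after > 0.0);
    return before / after;
}
```

For example, three hypothetical stages of 2 ms, 4 ms, and 4 ms with speedups of 0.5x, 2x, and 4x, plus 1 ms of conversion overhead, give 10 ms / 8 ms = 1.25x overall, so one block getting slower does not rule out an end-to-end gain.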

VISIONWORKS OPENVX™ GRAPH MODE VIDEO STABILIZATION SAMPLE

OpenVX API graph mode calls are prefixed with “vx”

OpenVX Graph enables advanced optimizations

• Buffer reuse, kernel fusion

• Efficient use of streaming and CUDA textures

• Automatic scheduling across processing units based on various factors (safety, performance, ...)

• Tiling and pipelining vision functions at sub-frame level
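
The difference between the two OpenVX modes can be sketched with a toy model in C++. Everything here (`Image`, `Kernel`, `runImmediate`, `ToyGraph`) is hypothetical and is not the OpenVX or VisionWorks API. In OpenVX itself the graph steps correspond to calls such as `vxVerifyGraph` and `vxProcessGraph`, which this sketch does not use. It only shows the structural idea: immediate mode runs each call at once, while graph mode records nodes, validates them once, and then runs the whole chain per frame. That up-front knowledge of the whole pipeline is what gives a framework room for optimizations like the ones listed above, none of which this toy performs.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these types are hypothetical and are not the OpenVX API.
using Image = std::vector<int>;
using Kernel = std::function<Image(const Image&)>;

// Immediate mode: every call runs its kernel right away.
Image runImmediate(const Kernel& k, const Image& in) { return k(in); }

// Graph mode: nodes are recorded first, checked once, then executed together.
class ToyGraph {
public:
    // Record a node; nothing executes yet.
    void addNode(std::string name, Kernel k) {
        nodes_.push_back({std::move(name), std::move(k)});
        verified_ = false;
    }
    // One-time validation step; a real framework could also optimize here.
    bool verify() {
        verified_ = !nodes_.empty();
        for (const Node& n : nodes_) {
            verified_ = verified_ && static_cast<bool>(n.kernel);
        }
        return verified_;
    }
    // Execute the whole chain; intended to be called per frame after verify().
    Image process(Image frame) const {
        assert(verified_);
        for (const Node& n : nodes_) frame = n.kernel(frame);
        return frame;
    }
private:
    struct Node { std::string name; Kernel kernel; };
    std::vector<Node> nodes_;
    bool verified_ = false;
};
```

A caller would add nodes in pipeline order, call `verify()` once, and then call `process()` for every incoming frame. The result matches chaining the same kernels through `runImmediate`.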

[Diagram: pipeline blocks] Image Source; Color Conversion; Image Pyramid; Feature detection; Optical Flow; Process pts & Find Homography; Warp Perspective; Stabilized frames

Performance boost: Video stabilization application is further accelerated compared to immediate mode.

VISIONWORKS CUDA API FEATURE TRACKING SAMPLE

VisionWorks CUDA API gives developers low-level access. The developer manages:

• Data allocations and transfer

• Scheduling and pipelining

[Diagram: feature tracking pipeline] Camera/image/video input data; RGB frame (CUDA buffer); nvxcuColorConvert; YUV frame; nvxcuChannelExtract; Gray frame; nvxcuGaussianPyramid; nvxcuOpticalFlowPyrLK; nvxcuHarrisTrack; Array of keypoints; Rendering/Output

VISIONWORKS™ API SELECTION

VisionWorks OpenVX™ Immediate Mode:
• Quick port from other libraries
• To be able to reassign CPU and GPU tasks based on performance

VisionWorks OpenVX™ Graph Mode:
• Let the graph manager hide overheads, optimize, and manage data
• To be able to reassign CPU and GPU tasks based on performance

VisionWorks CUDA API:
• Low-level CUDA API access for advanced CUDA developers

DEBUGGING WITH VISIONWORKS™

Enable VisionWorks debug markers with “export NVX_PROF=nvtx”

VISIONWORKS™ DOCUMENTATION

Installed location: /usr/share/visionWorks/docs

VISIONWORKS™ FACTS

First Khronos OpenVX™ 1.0 compliant library (Jan 2015)

VisionWorks enables key demos (CES’16 and more at GTC)

27K downloads (embedded) since release in Nov 2015, and installed by default on all automotive platforms

[Chart: weekly VisionWorks downloads for various platforms]

CONCLUSION

• VisionWorks Toolkit delivers multiple levels of API

– OpenVX Immediate Mode, OpenVX Graph Mode, VisionWorks CUDA API

• Heterogeneous API enables switching from GPU to CPU

– this is very powerful, reducing productization time

• Delivers high performance

– Offers significant speedup over CUDA optimized OpenCV functions

• Adopts native media APIs on Tegra platforms and delivers ready-to-use code samples

Related sessions:

S6739 - VisionWorks™ Toolkit Programming Tutorial, Room LL20A

L6129 - VisionWorks™ Toolkit LAB Session, Room 210C

H6115 - Designing Computer Vision Applications with VisionWorks™, Pod B

RESOURCES & USEFUL LINKS

http://www.embedded-vision.com/

https://www.khronos.org/openvx/

https://developer.nvidia.com/embedded/visionworks

VisionWorks Webinars - https://developer.nvidia.com/embedded/learn/tutorials

VISIONWORKS WITH DEEP LEARNING DEMO

FULLY CONVOLUTIONAL NETWORK

[1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

[2] Efficient Convolutional Patch Networks for Scene Understanding. CVPR Workshop on Scene Understanding (CVPR-WS). 2015.

[3] M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset," in CVPR Workshop on The Future of Datasets in Vision, 2015.

DEEP LEARNING & VISION DEMO

FULLY CONVOLUTIONAL NETWORK

Introduction VisionWorks API OpenVX Sample Overview

VISIONWORKS™ Sample Applications

[Diagram: sample application stack]

• Source Samples with multimedia I/O: Histogram Eq w/Camera input, Feature tracking with compressed images, Hough Lines with decoded video, ...
• NVXIO (Multimedia Abstraction)
• Platform Software Stack (Multimedia, Interop, GL, UI, System)

PLATFORMS & MULTIMEDIA API

Platform | Camera | Decode | Interop | Render | Encode
Android | Android Camera HAL v3.0 | Android API | CUDA-OpenGL interop? | OpenGLES 3.0 | (?)
Vibrante | NvMedia capture | NvMedia + Gst, NvMedia h264 ES | EGLStreams | OpenGLES (GLFW) | Gst
Linux4Tegra | Gst-capture | Gst+OpenMAX | EGLStreams | OpenGLES | Gst+OpenMAX
Ubuntu Linux 14.04 | V4L through OpenCV4Tegra | Gst+VDPAU | CUDA-OpenGL Interop | OpenGL | Gst
Windows x64 | V4W/OpenCV | NVCUVID (Gst?) | CUDA-OpenGL Interop | OpenGL | FFmpeg/OpenCV

Gst = GStreamer
