Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
© Copyright Khronos Group 2014 - Page 1
Open API Standards for Mobile Graphics, Compute and Vision Processing
GTC, March 2014 Neil Trevett
Vice President Mobile Ecosystem, NVIDIA President Khronos
© Copyright Khronos Group 2014 - Page 2
Khronos Connects Software to Silicon
Open Consortium creating ROYALTY-FREE, OPEN STANDARD APIs for hardware acceleration
Defining the roadmap for low-level silicon interfaces needed on every platform
Graphics, compute, rich media, vision, sensor and camera
processing
Rigorous specifications AND conformance tests for cross-
vendor portability
Acceleration APIs BY the Industry
FOR the Industry
Well over a BILLION people use Khronos APIs Every Day…
© Copyright Khronos Group 2014 - Page 3
Khronos Standards
Visual Computing - 3D Graphics - Heterogeneous Parallel Computing
3D Asset Handling - 3D authoring asset interchange
- 3D asset transmission format with compression
Acceleration in HTML5 - 3D in browser – no Plug-in
- Heterogeneous computing for JavaScript
Camera Control API
Over 100 companies defining royalty-free APIs to connect software to silicon
Sensor Processing - Vision Acceleration - Camera Control - Sensor Fusion
© Copyright Khronos Group 2014 - Page 4
The OpenGL Family OpenGL 4.4 is the industry’s most advanced 3D API Cross platform – Windows, Linux, Mac, Android Foundation for productivity apps Target for AAA engines and games
The most pervasively available 3D API – 1.6 Billion devices and counting Almost every mobile and embedded device – inc. Android, iOS Bringing proven desktop functionality to mobile
JavaScript binding to OpenGL ES Enabling the Web with GPU access Almost pervasive availability on mobile and desktop browsers Truly portable 3D apps with HTML5
© Copyright Khronos Group 2014 - Page 5
OpenGL ES 3.1 Launched at GDC! • Headline features - Compute Shaders and Draw-Indirect
- Compute shaders can create geometry or other rendering data
• Expecting rapid adoption - driver upgrade for many SOCs - Backward compatible with 2.0/3.0 so apps can incrementally adopt features
• Enabling desktop OpenGL to be used for mobile development - ARB_ES_3_1_compatibility specification to support “OpenGL ES 3.1 context”
2002 Working Group
Formed
2003 1.0
2004 1.1
2007 2.0
2012 3.0
2014 3.1
Driver Update
Silicon Update
Silicon Update
Driver Update
© Copyright Khronos Group 2014 - Page 6
OpenGL Fallacy: Old and Inefficient
Immediate Mode Fixed
Function
Ancient crufty stuff
Feedback
Selection
Evaluators
Display Lists
Selectors
© Copyright Khronos Group 2014 - Page 7
OpenGL Reality: Modern & Efficient
Bindless ARB
SSBO GL4.3
Multi-Draw Indirect GL4.3
UBO GL3.1
Texture Arrays GL3.0
Buffer Storage GL4.4
© Copyright Khronos Group 2014 - Page 8
indirect draw
buffer object
buffer object
texture object
buffer object
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
Classic OpenGL Model
CPU
GPU
…
…
Memory
cmd
cmd
cmd
cmd
Direct Drawing Commands (via the command fifo)
© Copyright Khronos Group 2014 - Page 9
indirect draw
buffer object
indirect draw
buffer object
texture object
buffer object
indirect draw
buffer object
texture object
buffer object
buffer object
buffer object
render target
buffer object
Efficient OpenGL Model
CPU
CPU
CPU
CPU
GPU …
…
Memory
CPU Writes Memory – multi-threaded (no API)!
GPU Writes Commands to Memory Reads Commands from Memory
No API – Minimal CPU Involvement
Memory access mediated through
OpenGL fences
© Copyright Khronos Group 2014 - Page 10
Results • OpenGL enables scalable multi-threading with no new API
- CPU and GPU Cores just write to memory - GPU work creation - builds buffers, constructs MDI commands
• Integer multiple speedups ~5x – ~15x (not a typo) - On driver limited cases, obviously
• Works TODAY on existing drivers! - Mostly OpenGL 4.2+ - Extensions are at least EXT
• Does not require a new object model - Does not require breaking existing applications
• http://blogs.nvidia.com/blog/2014/03/20/opengl-gdc2014/
© Copyright Khronos Group 2014 - Page 11
EGL 1.5 Released • EGL 1.5 brings functionality from multiple
extensions into core - Increased reliability and portability
• EGLImages - Sharing textures and renderbuffers
• Context Robustness - Defending against malicious code
• EGLSync objects - Improved OpenGL /OpenCL interop
• Platform extensions - Standardized interactions for multiple OS
e.g. Android and 64-bit platforms
• sRGB colorspace rendering
• NEXT – EGLStreams into core for vision and camera interop
Applications
OS and Display Platforms
Application Portability EGL abstracts graphics context
management, surface and buffer binding and rendering
synchronization
API Interop EGL provides efficient
transfer of data and events between Khronos APIs
© Copyright Khronos Group 2014 - Page 12
OpenCL as Parallel Compute Foundation • 100+ tool chains and languages leveraging OpenCL
- Heterogeneous solutions emerging for the most popular programming languages
JavaScript binding to OpenCL
WebCL River Trail Language
extensions to JavaScript
C++ AMP Shevlin Park Uses Clang and LLVM
OpenCL provides vendor optimized, cross-platform, cross-vendor access to
heterogeneous compute resources
Harlan High level
language for GPU programming
Compiler directives for
Fortran C and C++
Aparapi Java language extensions for
parallelism
PyOpenCL Python wrapper
around OpenCL
Image Processing Language
Halide
Device X Device Y Device Z
© Copyright Khronos Group 2014 - Page 13
Widening OpenCL Ecosystem
Device X Device Y Device Z
OpenCL C Kernel Source
SPIR Generator (e.g. patched Clang)
Alternative Language for
Kernels
Alternative Language for
Kernels
Alternative Language for
Kernels
High-level Frameworks
High-level Frameworks Apps and
Frameworks
OpenCL C Runtime
OpenCL run-time can consume SPIR
SPIR Standard Portable
Intermediate Representation SPIR 1.2 Released
January 2014
SYCL Programming abstraction that combines
portability and efficiency of OpenCL with ease of use and flexibility of C++
SPIR 1.2 Released here at GDC!
© Copyright Khronos Group 2014 - Page 14
WebCL: Heterogeneous Computing for the Web • WebCL 1.0 specification officially finalized today at GDC!
- https://www.khronos.org/webcl
• WebCL defines JavaScript binding to the OpenCL APIs - Enables initiation of Kernels written in OpenCL C within the browser
• Typical Use Cases - 3D asset codecs, video codecs and processing, imaging and vision processing - Physics for WebGL games, Online data visualization, Augmented Reality
OpenCL Kernel Code
OpenCL Kernel Code
OpenCL Kernel Code
OpenCL C Kernel Code
GPU
DSP CPU
CPU HW
JavaScript Platform API To query, select and initialize
compute devices
JavaScript Runtime API To build and execute kernels
across multiple devices
© Copyright Khronos Group 2014 - Page 15
Content JavaScript, HTML, CSS, ...
WebGL/WebCL Ecosystem
JavaScript Middleware
JavaScript HTML5 / CSS
Browser provides WebGL and WebCL Alongside other HTML5 technologies
No plug-in required
OS Provided Drivers WebGL uses OpenGL ES 2.0 or
Angle for OpenGL ES 2.0 over DX9 WebCL uses OpenCL 1.X
Content downloaded from the Web
Middleware can make WebGL and WebCL accessible to non-expert programmers
E.g. three.js library: http://threejs.org/ used by majority of WebGL content
Low-level APIs provide a powerful foundation for a rich JavaScript
middleware ecosystem
© Copyright Khronos Group 2014 - Page 16
OpenVX – Power Efficient Vision Acceleration • Acceleration API for real-time vision
- Focus on mobile and embedded systems
• Enable diverse efficient implementations - From CPUs, through GPUs and DSPs
to dedicated hardware
• Foundational API for vision acceleration - Can be used by middleware libraries or
by applications directly
• Complementary to OpenCV - Which is great for prototyping
• Khronos open source sample implementation - To be released with final specification - Sample - not reference - spec remains the
definitive definition of OpenVX operation Open source sample
implementation Hardware vendor implementations
OpenCV open source library
Other higher-level CV libraries
Application
© Copyright Khronos Group 2014 - Page 17
OpenVX Graphs – The Key to Efficiency • Vision processing directed graphs for power and performance efficiency
- Each Node can be implemented in software or accelerated hardware - Nodes may be fused by the implementation to eliminate memory transfers - Processing can be tiled to keep data entirely in local memory/cache
• EGLStreams can provide data and event interop with other Khronos APIs - BUT use of other Khronos APIs are not mandated
OpenVX Node
OpenVX Node
OpenVX Node
OpenVX Node
Heterogeneous Processing
Native Camera Control
Example OpenVX Graph
© Copyright Khronos Group 2014 - Page 18
OpenVX 1.0 Function Overview • Core data structures
- Images and Image Pyramids - Processing Graphs, Kernels, Parameters
• Image Processing - Arithmetic, Logical, and statistical operations - Multichannel Color and BitDepth Extraction and Conversion - 2D Filtering and Morphological operations - Image Resizing and Warping
• Core Computer Vision - Pyramid computation - Integral Image computation
• Feature Extraction and Tracking - Histogram Computation and Equalization - Canny Edge Detection - Harris and FAST Corner detection - Sparse Optical Flow
OpenVX 1.0 defines framework for
creating, managing and executing graphs
Focused set of widely used functions that are
readily accelerated
Implementers can add functions as extensions
Widely used extensions adopted into future versions of the core
OpenVX Specification Evolution
© Copyright Khronos Group 2014 - Page 19
OpenVX and OpenCV are Complementary
Governance Community driven open source with no formal specification
Formal specification defined and implemented by hardware vendors
Conformance No conformance tests for consistency and every vendor implements different subset
Full conformance test suite / process creates a reliable acceleration platform
Portability APIs can vary depending on processor Hardware abstracted for portability
Scope Very wide
1000s of imaging and vision functions Multiple camera APIs/interfaces
Tight focus on hardware accelerated functions for mobile vision Use external camera API
Efficiency Memory-based architecture Each operation reads and writes memory
Graph-based execution Optimizable computation, data transfer
Use Case Rapid experimentation Production development & deployment
© Copyright Khronos Group 2014 - Page 20
Vision Developers – API Choice
Out of the Box Vision Framework Vision operators and graph framework library Can run on dedicated hardware – no compiler Suited for low-power, always-on acceleration
Easier performance portability to diverse hardware
GPU Compute Shaders Flexible GLSL compute and imaging Easy to integrate into existing graphics apps Can mean less APIs and contexts for app to manage Limited to single GPU
General Purpose Heterogeneous Programming Framework Flexible, low-level access to any devices with OpenCL compiler Single run-time framework for CPUs, GPUs, DSPs, hardware Needs full compiler stack and IEEE precision Can be used to code new OpenVX nodes
© Copyright Khronos Group 2014 - Page 21
Need for Camera Control API • Advanced control of ISP and camera subsystem
- Generate sophisticated image stream for advanced imaging & vision apps
• No platform API currently fulfills all developer requirements - Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays - Cross sensor synch: e.g. synch of camera and MEMS sensors - Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI - Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing
Pre-processing Image Signal Processor (ISP) Post-processing
Sensor, Color Filter Array Lens, Flash, Focus, Aperture
Bayer RGB/YUV Image/Vision Applications
Lens, sensor, aperture control 3A - Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
Scope of Camera Control API
© Copyright Khronos Group 2014 - Page 22
Camera API Architecture will be FCAM-based • No global state
- State travels with image requests - Every stage in the pipeline may have different state - Enables fast, deterministic state changes
• Synchronize devices - Lens, flash, sound capture, gyro… - Devices can schedule Actions - E.g. to be triggered on exposure change
© Copyright Khronos Group 2014 - Page 23
Low-level Sensor Abstraction API
Apps Need Sophisticated Access to Sensor Data Without coding to specific
sensor hardware
Apps request semantic sensor information StreamInput defines possible requests, e.g.
Read Physical or Virtual Sensors e.g. “Game Quaternion” Context detection e.g. “Am I in an elevator?”
StreamInput processing graph provides optimized sensor data stream
High-value, smart sensor fusion middleware can connect to apps in a portable way
Apps can gain ‘magical’ situational awareness
Advanced Sensors Everywhere Multi-axis motion/position, quaternions,
context-awareness, gestures, activity monitoring, health and environmental sensors Sensor Discoverability
Sensor Code Portability
© Copyright Khronos Group 2014 - Page 24
Khronos APIs for Augmented Reality
Advanced Camera Control and stream
generation
3D Rendering and Video Composition
On GPU
Audio Rendering
Application on CPUs, GPUs
and DSPs
Sensor Fusion
Vision Processing
MEMS Sensors
Camera Control API
EGLStream - stream data
between APIs
Precision timestamps on all sensor samples
AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for
all these subsystems to work efficiently together
© Copyright Khronos Group 2014 - Page 25
NVIDIA Use of Khronos Standards • Shipping
- OpenGL 4.4 - OpenGL ES 3.X - WebGL 1.0 - OpenCL 1.X - EGL 1.4
• Implementing - OpenVX 1.0 Provisional
(foundational framework for VisionWorks)
• Participating - Camera Working Group - StreamInput
Camera Control API