Next Generation Visual Computing (Making GPU Computing a Reality with Mali™) Taipei, 18 June 2013 Roberto Mijat ARM


Addressing Computational Challenges

Trends

Growing display sizes and resolutions

Increasing computational power and novel applications

Persistent user expectation of an improved experience

Limitations

Limited energy and thermal budgets

In mobile, processing power is growing much faster than battery capacity

Traditional scaling solutions are not sustainable

Necessities

Increase computational efficiency of processing platforms

Make use of heterogeneous and parallel computing

Leverage new technologies such as GPU Compute


Complementary Compute Architectures

[Figure: comparison of CPU and GPU compute architectures. Note: characteristics shown are for generic CPUs and GPUs]


Heterogeneous Computing

GPU: cost effective, efficient, great floating point performance

• 2D/3D graphics
• Advanced image processing
• Accelerate/complement ISP functionality
• Offload video codec blocks
• Accelerate physics computation

Programmable through C-like languages and APIs

GPU used as a computational accelerator or companion processor

CPU:

• Operating system
• Most application processing

[Figure: CPU and GPU block diagrams showing control logic, caches, ALUs and RAM]


Benefits of GPU Computing

Performance

Faster computation

Offload and acceleration of non-graphical applications

Energy Efficiency

Free up CPU resources by offloading work to the GPU

Better load-balance across system resources

Increased system efficiency using the best processor for the job

Cost Reduction

Reduced cost through hardware consolidation and software flexibility

Simpler interface to parallel programming through modern APIs

Improved user experience

Remove computational barriers

Enable new use cases and applications


Adoption of Mobile GPU Compute

(Timeline: 2012, 2013, 2014, 2015+)

Khronos-conformant OpenCL™ Full Profile GPUs in mobile SoCs

GPU Compute capable devices start shipping

OEMs and SiPs evaluating leading GPU Compute solutions

Gradual roll-out of GPU Compute APIs in mobile/embedded platforms

Android™ RenderScript computation first enabled on GPU


Adoption of Mobile GPU Compute

First public demonstrations of mobile GPU Compute benchmarks

ISVs and OEMs start porting/optimizing libraries and key use-case functionality using GPU Compute

Computational Photography and Advanced Imaging GPU acceleration

Codec vendors develop GPU Compute enabled HEVC decoders

Exploration by mainstream developers


Adoption of Mobile GPU Compute

Mainstream support for GPU Computing in Mobile and Embedded

GPU Compute widely available and utilized by developers/libraries

Introduction of GPUs implementing HSA™ features and full system coherency

Hardware consolidation and software cost reduction through migration of selected ISP/DSP functionality to the GPU

New use cases, innovation


OPENCL


OpenCL Overview

OpenCL enables easier, better programming of heterogeneous parallel compute systems, and unleashes the general purpose computational power of GPUs needed by emerging workloads

OpenCL is

A framework to enable general purpose parallel computing

A computing language portable across heterogeneous processing platforms

An API to define and control the platforms

A royalty-free open standard, interoperable with existing APIs

OpenCL and the OpenCL logo are trademarks of Apple Inc.


OpenCL Programming Model

Application program → runtime compiler → kernel object → execute command over an index space (NDRange)

Runtime compiler: can use static compilation; binaries are cached; can be built to target any supported device

Kernel: an OpenCL kernel or a native kernel, used to optimize performance-critical code

Index space (NDRange): the kernel is executed over each element of the N-dimensional index space

Work-item: an instance of a kernel executing at a point in the index space

Work-group: a collection of work-items


The ARM OpenCL Implementation

Implements the latest version of the standard

Implements Full Profile, supports 64-bit

Optimized for interoperability with existing Mali software stack

Optimized for interoperability between CPU and GPU

Architected for Cache Coherent Interconnect support

Extensible design


With Full Profile you know what you get

Full Profile defines the baseline set of features for OpenCL

Embedded Profile defines a subset of the specification

Designed to enable OpenCL on less capable devices

Makes a large set of features optional, restricting what developers can rely on

Reduces the required precision of floating-point maths

Key feature | Embedded | Full
FP32 precision | Relaxed | IEEE-754
Built-in atomic operations | Optional | Supported
64-bit integer | Optional | Supported
Online compiler | Optional | Supported
3D image writes | Optional | Supported
Linear interpolation for floating point images | Optional | Supported
Size of buffers and memory | Limited | Supported
Image data type requirements | Reduced | Supported


RENDERSCRIPT


Introduction to RenderScript

Compute framework and API for Android

Officially introduced in Honeycomb

Cross-platform control-slave architecture, with runtime compilation

Its graphics engine component has been deprecated since Jelly Bean

Complements existing APIs by adding:

A compute API for parallel processing similar to OpenCL

A scripting language based on C99 supporting vector data types

Designed for portability, performance, usability

On-device JIT compilation and dynamic thread launch

Native code optimization to maximize the performance of critical algorithms

Mali-T604™ is the first GPU to support RenderScript


How RenderScript works

Java app → Dalvik JIT → executable

RenderScript script → llvm-rs-cc → reflected layer (Java classes used by the app) and portable bitcode

Portable bitcode → libbcc (online compilation on the device) → machine code, executed through libRS

Target: ARM compute system (Cortex™ CPU + Mali GPU + AMBA™ 4)


DESIGNED FOR GPU COMPUTE


Mali-T600™: Designed for GPU Compute

Comprehensive support for general purpose data types

8/16/32/64-bit signed/unsigned integer

FP16, FP32, FP64

2-, 3-, 4-, 8- and 16-wide vectors

2D/3D images

Floating Point precision & performance

Full IEEE 754-2008 compliance

Hundreds of GFLOPS for non-graphical workloads

Sustainable and proven performance for real life workloads


Mali-T600: Designed for GPU Compute

Hardware acceleration

Most common mathematical functions implemented in hardware

>70% coverage of the functions in the newest industry APIs

Most operations complete in one cycle

Optimal memory throughput and latency

Optimized for stream and generic load/store operations

Tight integration with the system using the latest AMBA interfaces

Leverages new Cache Coherent Interconnect technologies

Task management implemented in hardware

Optimal automatic distribution of compute workloads

Optimal dynamic power management

Efficient use of processing resources


GPU Compute on Mali: here today!

Passed Khronos™ Conformance

The only conformant OpenCL 1.1 Full Profile implementation on Linux and Android outside the console and desktop space

Proven in Silicon

Samsung® Exynos™ 5 Dual, implements Full Profile

OpenCL and RenderScript DDKs available now

Mali-T600 shipping in real products

Google® Chromebook™

Google Nexus™ 10

InSignal® Arndale™ Community Board

API exposed for developers

RenderScript on Android for Nexus 10


USE CASES

Examples of the benefits of GPU Compute from the real world


Example use cases for GPU Computing

Mobile

• Computational Photography

• Physics in games

• Moving and still image real-time stabilization

• Information extraction: object detection, classification and tracking

• Imaging: correction, improvement, consolidation

• Content and context understanding

• HDR

• Augmented Reality

DTV/STB

• 2D to 3D conversion

• Super resolution

• Pre and post processing

• Camera based UI

• Trans-coding

• Information extraction and superimposition

Automotive

• Lane Detection

• Smart Head-Light

• Road Sign Recognition

• Night Vision

• Object Classification

• Pedestrian, Vehicle and Collision Detection

• Vehicle Detection

• Dynamic cruise control

Hundreds of GFLOPS of efficient processing power: improve existing use cases and enable next-generation use cases


Advanced Image Processing

RenderScript is the official Heterogeneous Compute Android API

Since Android 4.2 (Jelly Bean), it has been enabled to target the GPU

Complex image filters can be greatly accelerated by GPU Compute

Filter | Speed-up [1]
MotionBlur | 3.5x
Cloud | 4.2x
Labyrinth | 3.8x
TitleReflection | 7.3x
WhirlPinch | 3.6x
Wave | 7.0x
Bicubic | 15.4x

[1] Speed-up of RenderScript compiled on the device (LLVM) running on the Mali™-T604 GPU, compared with the dual-core Cortex™-A15 CPU, on a stock Google Nexus™ 10

Image size: 2560x1920


Video Processing APK

Proprietary Transcoding/Processing Pipeline

Image filters implemented using RenderScript

Optimized for ARM + Mali-T600 GPU Compute

Filter | FPS (GPU+CPU / CPU only) | Speed-up
Deshake (720p) | 28 / 8 | 3.5x
Upscaling (720p to 1080p) | 20 / 3 | 6.7x
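The speed-up column is the ratio of the two frame rates, which can be checked directly against the reported figures:

```latex
\text{speed-up} = \frac{\text{FPS}_{\text{GPU+CPU}}}{\text{FPS}_{\text{CPU only}}},
\qquad
\text{Deshake: } \frac{28}{8} = 3.5,
\qquad
\text{Upscaling: } \frac{20}{3} \approx 6.67 \approx 6.7
```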


GPU Compute accelerated superscaling

Accelerated using RenderScript

On Google Nexus 10 (Mali-T604)


Next Generation Multimedia Codecs

High Efficiency Video Coding (HEVC)

Latest video compression standard, ratified by the ITU in January 2013

Improved video quality and about double the data compression of H.264

Supports resolutions up to 8K UHD

ARM is collaborating with multiple codec vendors

Ensuring widest availability of HEVC across multiple ARM platforms

Enabling HEVC early, in software, through NEON and GPU Compute

Flexibility of software solutions is critical as HEVC rolls out


Why GPU Compute for HEVC

High resolution HEVC decoding maximises CPU load

GPUs are traditionally idle during video playback

GPU architecture suits acceleration of parallel codec blocks

Offloading computation to the GPU frees up the CPU to perform other (system) tasks

Combining CPU (NEON) and GPU Compute enables the most efficient HEVC decode

“Mali GPUs are well suited for Video Acceleration with significant power/performance benefits” – Ittiam Systems


Physics (Cloth Simulation)


ISP Pipeline Offload to GPU (OpenCL)

Entire ISP pipeline offloaded to the GPU using OpenCL

More flexibility

Sensor and camera module vendors can invest in optimized, portable software libraries instead of a hardware ISP

SoC implementers can reduce BoM by offloading ISP blocks to the GPU

Mali-T604 demo was previewed at MWC13

[Figure: ISP pipeline running on the GPU, with OpenCL and OpenGL ES stages. Stages shown: raw data from the HDR sensor, noise reduction, HDR reconstruction, tone mapping, colour conversion, gamma correction, de-noising, rendering]


Gesture User Interfaces

eyeSight™’s gesture recognition technology using GPU Compute on ARM’s Mali-T600 offers unique capabilities

Reduction of overall power consumption

Reduction of load from the CPU

Robust recognition in challenging lighting conditions

Enhanced user experience

Higher FPS for more gesture capabilities and features


Computer Vision Based Applications

Computer Vision entails the acquisition, processing, analysis and understanding of sensor data (images), in order to derive information to enable decisions to be made

[Chart: energy used per unit of work (lower is better)]

Face detection study on Mali-T604 based silicon

In this example:

Consistent 6x speed-up

~5x more energy efficiency
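The two figures are linked by the relation energy = average power × time. Assuming, as the chart label suggests, that energy efficiency here means energy per unit of work, and that the 6x and ~5x figures come from the same GPU-accelerated and CPU-only runs (the slide does not state this), they imply the GPU-accelerated run drew roughly 1.2x the average power but for one sixth of the time:

```latex
E = \bar{P}\,t
\quad\Longrightarrow\quad
\frac{\bar{P}_{\mathrm{GPU}}}{\bar{P}_{\mathrm{CPU}}}
= \frac{E_{\mathrm{GPU}}/E_{\mathrm{CPU}}}{t_{\mathrm{GPU}}/t_{\mathrm{CPU}}}
\approx \frac{1/5}{1/6} = 1.2
```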


Conclusions

Improve energy efficiency through heterogeneous computing

Use the best processor for the task

Balance workload across system resources

Offload heavy parallel computation to the GPU

Bring the benefits of GPU Compute to key use cases

Computational Photography and Advanced Imaging

Next generation of multimedia codecs

Computer Vision applications

The Mali Ecosystem is making GPU Compute a reality