Parallelize Computer Vision by GPGPU Computing
Wang, Yuan-Kai (王元凱)
Electrical Engineering Department, Fu Jen Catholic University (輔仁大學電機工程系)
[email protected]
http://www.ykwang.tw
2014/07/17

About this Course
❖ Multicore Era for Computer Vision
❖ GPGPU
❖ Parallel Programming (CUDA, OpenCL, Renderscript)
❖ OpenCV Acceleration with GPGPU
❖ Computer Vision Acceleration

1. Multicore Era for Computer Vision
Paradigm shift from Clock Speed Race to Multicore Race

Multicore Computing
❖ What is multicore?
  • Combining multiple processors (CPU, DSP, GPGPU, FPGA) into a single chip
❖ Multicore computing is inevitable

Moore's Law
❖ In 1965, Gordon Moore (Intel co-founder) predicted that
  • The number of transistors on an IC would double at a regular interval (yearly in the 1965 paper, revised to every two years in 1975; 18 months is the popularly quoted figure)
❖ The well-known form of the law
  • Computer performance doubles every 18 months
  • More transistors → more performance
❖ Intel's CPUs kept pace with the prediction for about 40 years

Review of Moore's Law
❖ The number of transistors on a chip did increase
Software enjoys the fruits of hardware's labour.

Problems
❖ Performance gains from more transistors relied on higher clock frequency
  • This led to the Clock Speed Race
❖ But high frequency needs high power consumption
  • High power consumption → heat problem
  • About 4 GHz has become the practical limit of the Clock Speed Race

Paradigm Shift from 2000 AD
❖ General-purpose multicore comes of age
❖ Chip companies race to create multicore processors
  • CPU: Intel Core Duo, Quad-core, ARM v7, ...
  • DSP: TI OMAP, ARM NEON, …
  • GPU/GPGPU:
    • nVidia: GeForce/Tesla, Tegra
    • ARM: Mali-T6x
    • …

The Multicore Evolution
[Figure: Pentium processor (optimized for a single thread) → Core Duo → in 5~10 years, 10~100 energy-efficient cores optimized for parallel execution]
From large mono-core to multiple lightweight cores

Moore's Law Needs Multicore
❖ A single core cannot keep up with Moore's law
❖ Multicore can keep up with Moore's law if a parallel programming model exists
[Figure: performance vs. time for single core and multi-core]

Two Architectures for Multicore
❖ Symmetric multiprocessing (SMP)
  • Multicore CPU, GPGPU, DSP multicore
  • Homogeneous computing
❖ Asymmetric multiprocessing (AMP)
  • CPU+GPGPU, CPU+FPGA, CPU+DSP
  • Heterogeneous computing

Multicore CPU (1/2)
❖ Two or more CPU cores in a chip
❖ Ex.: Intel Core i7
[Figure: multiple execution cores]

Multicore CPU (2/2)
❖ Windows Task Manager
[Figure: Task Manager CPU graphs for two cores and eight cores]

GPGPU (1/2)
❖ GPU (Graphics Processing Unit)
  • The processor on a graphics card that speeds up 3D graphics
  • Game playing is a major application
❖ GPGPU: General-Purpose GPU
  • General-purpose computation using a GPU in applications other than 3D graphics

GPGPU (2/2)
❖ GPGPU has more cores than CPU
  • 120 ~ 3072 cores vs. 2 ~ 8 cores (many-core vs. multi-core)
❖ GPGPU offers higher peak parallel throughput than a multicore CPU
❖ Vendors:
  • nVidia
  • Qualcomm (AMD, ATI)
  • ARM
  • Intel

It is the Software, Stupid
❖ Gary Smith and Daya Nadamuni, Gartner Dataquest, Design Automation Conf., 2006
❖ "The biggest problem with SoC design is embedded software development."
❖ "The next big hurdle is programmability. It's the ability to program these multicore platforms."
❖ "You can have elegant algorithms, first-pass silicon, and fancy intellectual property. But without software, the product goes nowhere."

Multicore Demands Threading

What Is Computer Vision

A Complete Vision System: Video Surveillance as an Example
[Figure: processing pipeline: Video Capture → Image Enhance → Object/Event Detection → Object Tracking → Object/Event Recognition → Behavior Analysis → Retrieval. Example functions: imaging, image/video enhancement, tripwire, event detection, abnormal detection, face recognition, retrieval]

Computer Vision Needs High Performance Computing
❖ A CV example: video processing
  • Intelligent video surveillance
❖ Its complexity is high
  • Video (1080p RGB): about 2 megapixels × 3 channels ≈ 6 million samples per frame, 30 fps
  • 100 ~ 1K floating-point operations per sample
  • ⇒ 18 ~ 180 GFLOPS
❖ Massive data processing
  • Intensive computation
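The throughput estimate on this slide can be reproduced with a few lines of C. This is only a sketch: the function name `required_gflops` is illustrative, and it assumes a 1920×1080 frame with three colour samples per pixel, with the per-sample operation count taken from the slide.

```c
#include <assert.h>
#include <stdio.h>

/* Required throughput in GFLOPS for a video stream.
   All parameters are illustrative assumptions matching the slide's numbers. */
double required_gflops(long width, long height, int channels,
                       double fps, double flops_per_sample)
{
    assert(width > 0 && height > 0 && channels > 0);
    double samples_per_frame = (double)width * (double)height * channels;
    return samples_per_frame * fps * flops_per_sample / 1e9;
}

/* Print the low and high estimates for 1080p RGB at 30 fps. */
void print_1080p_estimate(void)
{
    printf("low:  %.1f GFLOPS\n", required_gflops(1920, 1080, 3, 30.0, 100.0));
    printf("high: %.1f GFLOPS\n", required_gflops(1920, 1080, 3, 30.0, 1000.0));
}
```

Under these assumptions the two estimates come out at roughly 18.7 and 186.6 GFLOPS, in line with the rounded range quoted on the slide.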

HPC Approaches
❖ Cluster/distributed computing
  • Hadoop/MapReduce (Google, cloud computing)
  • MPI
❖ Multi-processing computing
  • Multicore (GPGPU, CPU, FPGA/DSP)
  • Programming: multi-thread
    • Windows thread, Pthread, OpenMP
    • CUDA, Renderscript, C++ AMP, …
[Figure: supercomputer]

However
❖ Can CV algorithms speed up every 18 months with multicore?
❖ Multicore is not a simple solution for upgrading CV algorithm performance
  • The transition from single core to multicore will be blocked by software
  • We are not ready to face the software programming challenges
  • It is the software, stupid.

Software, Threading, and Parallel Computing
❖ Identify parallelism: analyze the algorithm
❖ Express parallelism: write parallel code
❖ Validate parallelism: debug & verify parallel code
❖ Optimize parallelism: enhance parallel performance

Multi-threading Demands New Programming Skills
❖ Previous multi-threading techniques
  • Windows thread, pthread, OpenMP, MPI, …
❖ New techniques
  • CUDA, C++ AMP, OpenCL, Renderscript, OpenACC, MapReduce, …
❖ Concepts
  • Race condition, deadlock, …
  • Domain partition, function partition, …

Multicore Programming Practice (MPP)
❖ Goal: write portable C/C++ programs to be "multicore ready" and platform compatible
  • Proposed by the MPP working group in the Multicore Association
http://www.multicore-association.org/workgroup/mpp.php

OpenACC
❖ An organization that develops an API which
  • Describes a collection of compiler directives
  • Specifies loops and regions of code in standard C, C++ and Fortran
  • To be offloaded from a host CPU to an attached accelerator, including
    • APUs, GPUs, and many-core coprocessors
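As a rough illustration of what such a directive looks like, the sketch below marks a simple loop for offloading. The function `saxpy` and its arguments are hypothetical and not taken from the slides; a compiler without OpenACC support ignores the pragma and runs the loop sequentially on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i].
   The directive asks an OpenACC compiler to offload the loop:
   copyin sends x to the accelerator, copy sends y there and back. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body stays ordinary C; only the directive expresses the parallelism, which is the design point of the directive-based approach.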

HSA Foundation
❖ Heterogeneous System Architecture
  • Key members: AMD, QUALCOMM, ARM, SAMSUNG, TI
❖ System architecture easing efficient use of accelerators, SoCs
  • Intended to support high-level parallel programming frameworks
    • OpenCL, C++, C#, OpenMP, Java
  • Accelerator requirements
    • Full-system SVM (shared virtual memory), memory coherency, preemption, user-mode dispatch

The ParLab in Berkeley
❖ The Parallel Computing Lab at UC Berkeley
  http://parlab.eecs.berkeley.edu
  • The ParLab offers programmers a practical introduction to parallel programming techniques and tools on current parallel computers, emphasizing multicore and manycore computers.

HPEC
❖ High Performance Embedded Computing
  • MIT Lincoln Lab, 1997 ~

OpenCL
❖ Royalty-free, cross-platform, cross-vendor standard
  • Targeting: supercomputers → embedded systems → mobile devices
❖ Enables programming of diverse compute resources
  • CPU, GPU, DSP, FPGA, …

OpenCL Working Group Members
❖ Diverse industry participation: many industry experts
❖ NVIDIA is chair, Apple is specification editor

Today We Talk About
❖ Why GPGPU's multicore is better (Sec. 2)
  • Vendor, hardware
❖ How to do parallel programming (Sec. 3)
❖ OpenCV acceleration (Sec. 4)
❖ Computer vision acceleration on PC (Sec. 5)
❖ Computer vision acceleration on Android (Sec. 6)

2. GPGPU
PC platform
Mobile platform

Why GPGPU
❖ GPGPU is many-core (vs. multi-core)
  • Suitable for massively parallel computing

GPGPU as a Coprocessor
Heterogeneous Computing

PC Platform
• Discrete GPUs
• GPGPU card as a coprocessor
From PC to PSC (Personal Super-Computer)
[Figure: CPU and GPGPU card linked by PCIe]

Mobile Platform
• Integrated GPUs
• GPGPU sub-chip as a coprocessor
From mobile phone to mobile personal computer
[Figure: CPU and GPGPU on one chip, no PCIe]

GPGPU Solutions - nVidia
• Compute architecture: Tesla, Fermi, Kepler, …
• PC
  • GeForce, Quadro
  • Tesla: 870, 1060, 2070, K40
• Mobile
  • Tegra: …, 4, K1 (192 cores)
It's Tegra K1 Everywhere at Google I/O, Embedded Vision Alliance, 2014/7/7.

GPGPU Solutions - Qualcomm/AMD
❖ Qualcomm, AMD, ATI
❖ APU: integrated CPU+GPU
❖ Low energy consumption
❖ PC (AMD): FirePro
❖ Mobile (Snapdragon):
  • Adreno: 330 (32 cores)

GPGPU Solutions - ARM
❖ Mali
❖ Samsung Exynos, MediaTek
❖ Compute engine after T-600
❖ Exynos 5
❖ At most 8 cores (Mali-T678)

Intel - Multicore CPU
• PC (Xeon Phi)
  • Iris Pro GPU
  • Knights Landing: 60 cores
  • Knights Corner: 48 CPU cores, PCIe
• Mobile
  • Haswell
  • Atom

Applications of GPGPU
http://developer.nvidia.com/category/zone/cuda-zone

Heterogeneous Architecture
❖ Host: CPU
❖ Device: GPGPU
❖ Notice: memory hierarchy in device

GPGPU Architecture - nVidia
❖ GT200
  • GTX 260/280, Quadro 5800, Tesla 1060
❖ Fermi
  • Tesla 2060
[Figure: CPU (host, multicore: control logic, cache, a few ALUs, DRAM) vs. GPU (device, many-core: many ALUs, DRAM)]

nVidia GPGPU Architecture
❖ SM/SP (streaming multiprocessor/streaming processor) + shared memory + DRAM

Memory Hierarchy
❖ On-chip memory
  • Registers
  • Shared memory
  • Constant cache and texture cache (the constant and texture data themselves reside in off-chip device memory)
❖ Off-chip memory
  • Local memory
  • Global memory

GPGPU vs. FPGA
❖ GPU: nVidia GeForce GTX 280, GTX 580
❖ FPGA: Xilinx Virtex-4, Virtex-5
A Comparison of FPGA and GPU for Real-Time Phase-Based Optical Flow, Stereo, and Local Image Features, IEEE Transactions on Computers, 2012.

GPGPU vs. FPGA
❖ GPU: nVidia GeForce 7900 GTX
❖ FPGA: Xilinx Virtex-4
Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study, IEEE Transactions on Computers, 2010.

GPGPU vs. FPGA vs. Multicore
❖ Application: 2-D image convolution
  • GPU: nVidia GeForce GTX 295
  • FPGA: Altera Stratix III E260
A Performance and Energy Comparison of FPGAs, GPUs, and Multicores for Sliding-Window Applications, ACM/SIGDA International Symposium on FPGA, 2012.

However, GPGPU May Not Always Improve Speed & Energy

Hardware vs. Software
❖ GPGPU hardware: nVidia, Qualcomm, ARM, Intel
❖ Parallel programming: CUDA, OpenCL, RenderScript, C++ AMP

Today We Talk About
❖ Why GPGPU's multicore is better (Sec. 2)
❖ How to do parallel programming (Sec. 3)
  • CUDA, Renderscript, OpenCL, …
❖ OpenCV acceleration (Sec. 4)
❖ Computer vision acceleration on PC (Sec. 5)
❖ Computer vision acceleration on Android (Sec. 6)

3. Parallel Programming
Multi-threading
Programming Languages for Parallel Computing

Parallel Computing
❖ Serial computing
❖ Parallel computing
[Figure: serial execution on one core vs. parallel execution on multiple CPU/GPU cores]

Parallel Programming
❖ Much code is written in C/C++/Java
  • Especially algorithmic programs
❖ Can we write GPGPU parallel programs in C/C++/Java?
❖ However, C/C++ is sequential
  • Three control structures of C/C++/Java: sequence, selection, repetition

Multi-threading
❖ Multi-threading is the fundamental concept for parallel programming
  • Some techniques are ready
    • Pthread, Win32 thread, OpenMP, MPI, Intel TBB (Threading Building Blocks), ...
  • New techniques
    • CUDA, OpenCL, Renderscript, OpenACC, C++ AMP, ...

Parallel Programming Models

Parallel Programming in Sequential Language
❖ Do we need to learn new languages for multi-threading?
  • No
❖ Write multi-threading code in C/C++
  • Add functions/directives to C/C++ for multi-threading
  • That is what current solutions do
    • pthread, Win32 thread, OpenMP, MPI, CUDA, OpenCL, ...

Decompose the Problem
❖ Two basic approaches to partition computational work
  • Domain decomposition
    • Partition the data used in solving the problem
  • Function decomposition
    • Partition the jobs (functions) from the overall work (problem)
[Figure: CPU and GPGPU cooperate]

Multi-Threading
❖ A program running in serial vs. in parallel
http://en.wikipedia.org/wiki/Thread_(computer_science)

Domain Decomposition (1/3)
❖ An image example
  • It is 2D data
  • Three popular ways to partition

Domain Decomposition (2/3)
❖ Domain data are usually processed by loop
    for (i=0; i<height; i++)
        for (j=0; j<width; j++)
            img2[i][j] = RemoveNoise(img1[i][j]);
❖ Parallel models: SIMD, SPMD, SIMT
[Figure: original image (img1) and enhanced image (img2), the X-ray image of a circuit board, with row index i and column index j]

Domain Decomposition (3/3)
❖ A three-block partition example
    // Thread 1
    for (i=0; i<height/3; i++)
        for (j=0; j<width; j++)
            img2[i][j] = RemoveNoise(img1[i][j]);
    // Thread 2
    for (i=height/3; i<height*2/3; i++)
        for (j=0; j<width; j++)
            img2[i][j] = RemoveNoise(img1[i][j]);
    // Thread 3
    for (i=height*2/3; i<height; i++)
        for (j=0; j<width; j++)
            img2[i][j] = RemoveNoise(img1[i][j]);
❖ OpenMP, CUDA (SPMD)
[Figure: fork (threads) and join (barrier); rows i=0..3 form subdomain1, i=4..7 subdomain2, i=8..11 subdomain3]
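The three hand-written thread loops above can be expressed with a single OpenMP directive. The following is a minimal sketch, not the course's own code: `remove_noise` is a hypothetical stand-in for the slide's `RemoveNoise`, and the 12×8 image size is chosen only so that three threads each receive four rows, as in the figure. Without OpenMP enabled (for example `-fopenmp` with GCC or Clang), the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>

#define HEIGHT 12
#define WIDTH  8

/* Hypothetical stand-in for the slide's RemoveNoise(): clamps a pixel to 200. */
static unsigned char remove_noise(unsigned char p)
{
    return p > 200 ? 200 : p;
}

/* Row-wise domain decomposition: threads fork at the pragma and join
   (implicit barrier) at the end of the loop. With a static schedule and
   three threads, each thread gets one contiguous block of four rows. */
void enhance(unsigned char img1[HEIGHT][WIDTH], unsigned char img2[HEIGHT][WIDTH])
{
    assert(img1 != 0 && img2 != 0);
    #pragma omp parallel for num_threads(3) schedule(static)
    for (int i = 0; i < HEIGHT; i++)
        for (int j = 0; j < WIDTH; j++)
            img2[i][j] = remove_noise(img1[i][j]);
}
```

Each output pixel depends only on the matching input pixel, so the rows can be processed in any order without a race condition.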

GPGPU Programming: SIMT model
❖ CPU ("host") program often written in C or C++
❖ GPU code is written as a sequential kernel in (usually) a C or C++ dialect
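The idea of a "sequential kernel" can be illustrated in plain C by emulating a launch on the host. This is only a sketch under stated assumptions: `invert_kernel` and `launch_invert` are hypothetical names, the pixel inversion is an arbitrary example operation, and the nested loops stand in for threads that a real GPU would run concurrently.

```c
#include <assert.h>

/* Kernel body: sequential code for ONE element, identified by its index.
   Hypothetical example operation: invert an 8-bit pixel. */
static void invert_kernel(int idx, int n,
                          const unsigned char *in, unsigned char *out)
{
    if (idx < n)   /* guard: the grid may be larger than the data */
        out[idx] = (unsigned char)(255 - in[idx]);
}

/* Host-side emulation of a launch with `blocks` blocks of `threads` threads.
   On a GPU these iterations would execute as concurrent threads. */
void launch_invert(int blocks, int threads, int n,
                   const unsigned char *in, unsigned char *out)
{
    assert(blocks > 0 && threads > 0);
    for (int b = 0; b < blocks; b++)
        for (int t = 0; t < threads; t++)
            invert_kernel(b * threads + t, n, in, out);
}
```

The kernel never loops over the data itself; the global index `b * threads + t` selects the element, which is the essence of the one-thread-per-element style.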

GPGPU Programming Techniques
❖ CUDA
❖ OpenCL
❖ C++ AMP
❖ Renderscript

CUDA
❖ CUDA: Compute Unified Device Architecture
❖ Parallel programming for nVidia's GPGPU
❖ Uses the C/C++ language
  • Java, Fortran, Matlab can also be used
❖ When executing CUDA programs, the GPU operates as a coprocessor to the main CPU

CUDA Hardware Environment: CPU+GPU
❖ CPU
  • Organizes, interprets, and communicates information
❖ GPU
  • Handles the core processing on large quantities of parallel information
  • Compute-intensive portions of applications that are executed many times, but on different data, are extracted from the main application and compiled to execute in parallel on the GPU
[Figure: CPU and GPU connected by PCI-E]

CUDA Software Stack

Processing Flow on CUDA
1. Allocate device memory
2. Copy processing data (main memory → memory for GPU)
3. Instruct the processing (CPU → GPU)
4. Execute in parallel in each core
5. Copy the result (memory for GPU → main memory)
6. Release device memory

Programming with Memory Hierarchy
❖ Locality principle
  • Temporal locality
  • Spatial locality
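A small C sketch can make the two kinds of locality concrete. The function names and the 256×256 size are illustrative assumptions. Both functions compute the same sum, but the first walks memory in the order C stores it, while the second jumps a whole row between consecutive accesses.

```c
#include <assert.h>

#define ROWS 256
#define COLS 256

/* Row-major traversal: consecutive accesses touch adjacent addresses,
   which is good spatial locality for C arrays. */
long sum_row_major(int img[ROWS][COLS])
{
    assert(img != 0);
    long sum = 0;   /* reused on every iteration: temporal locality */
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += img[i][j];
    return sum;
}

/* Column-major traversal: same result, but each access is COLS elements
   away from the previous one, so spatial locality is poor. */
long sum_col_major(int img[ROWS][COLS])
{
    assert(img != 0);
    long sum = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += img[i][j];
    return sum;
}
```

On cached hardware the row-major version is usually the faster of the two for large images, although the actual gap depends on the machine.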

Example - Hello World (1/3)
#include <cstdio>
#include <cstdlib>

// Kernel defined in part 2/3
__global__ void hello(char* hello1, char* hello2);

int main(){
    char src[12] = "Hello World";   // host input
    char h_hello[12];               // host output
    char* d_hello1;                 // device input
    char* d_hello2;                 // device output

    // Allocate device memory
    cudaMalloc((void**) &d_hello1, sizeof(char)*12);
    cudaMalloc((void**) &d_hello2, sizeof(char)*12);

    // Copy the input string from host to device
    cudaMemcpy(d_hello1, src, sizeof(char)*12, cudaMemcpyHostToDevice);

    // Call the kernel function with one block of one thread
    hello<<<1,1>>>(d_hello1, d_hello2);
[Figure: host buffers src and h_hello; device buffers d_hello1 and d_hello2]

Example - Hello World (2/3)
❖ Kernel function
__global__ void hello(char* hello1, char* hello2){
    int k;
    // Copy characters one by one in a single thread
    for(k = 0; hello1[k] != '\0'; k++){
        hello2[k] = hello1[k];
    }
    hello2[k] = '\0';   // terminate the copied string
}
No parallel processing in this example

Example - Hello World (3/3)
    // Copy the result from device to host
    cudaMemcpy(h_hello, d_hello2, sizeof(char)*12, cudaMemcpyDeviceToHost);
    printf("%s\n", h_hello);

    // Release device memory
    cudaFree(d_hello1);
    cudaFree(d_hello2);

    system("pause");   // Windows only: keep the console window open
    return 0;
}
Result: [Figure: program output]

OpenCL
Standard

The Inspiration for OpenCL

What's OpenCL
❖ One code tree can be executed on CPUs, GPUs, DSPs and hardware
  • Dynamically interrogate system load and balance across available processors
❖ Powerful, low-level flexibility
  • Foundational access to compute resources for higher-level engines, frameworks and languages

Broad OpenCL Implementer Adoption
❖ Multiple conformant implementations shipping on desktop and mobile
❖ Android ICD extension released in latest extension specification
❖ Multiple implementations shipping in Android NDK

OpenCL Enables Portability
❖ C-to-gates programs are proprietary

Altera OpenCL SDK for FPGAs

NVIDIA OpenCL SDK for GPU

AMD OpenCL Optimization Case Study
❖ Platform
  • AMD Phenom II X4 965 CPU (quad core)
  • ATI Radeon HD 5870 GPU
❖ Unoptimized CPU performance: 1 GFLOP/s
❖ Optimized CPU performance reaches 4 GFLOP/s
❖ Optimized GPU performance reaches 50 GFLOP/s

Example - Hello World (1/3)
[Code figure, annotated steps: Including, Declaring]

Example - Hello World (2/3)
[Code figure, annotated step: Creating]

Example - Hello World (2/3, continued)
[Code figure, annotated steps: Creating, Do, Copy to host & display]

Example - Hello World (3/3)
[Code figure: Kernel Function]

C++ AMP
Microsoft

What's C++ AMP (1/2)
❖ Microsoft's C++ AMP (Accelerated Massive Parallelism)
  • Part of Visual C++, integrated with Visual Studio, built on Direct3D
  • "Performance for the mainstream"
❖ STL-like library for multidimensional array data
  • Special convenience support for 1, 2, and 3 dimensional arrays on CPU or GPU
  • C++ AMP runtime handles CPU<->GPU data copying
  • Tiles enable efficient processing of sub-arrays

What's C++ AMP (2/2)
❖ parallel_for_each
  • Executes a kernel (C++ lambda) at each point in the extent
  • restrict() clause specifies where to run the kernel: cpu (default) or amp (GPU; named direct3d in pre-release versions)

Example - Hello World (1/2)
[Code figure, annotated steps: Declaring & Copying to device]

Example - Hello World (2/2)
[Code figure, annotated steps: Do, Display]

RenderScript
Google Android

What's Renderscript (1/2)
❖ Higher-level than CUDA or OpenCL: simpler & less performance control
  • Emphasis on mobile devices & cross-SoC performance portability
❖ Programming model
  • C99-based kernel language, JIT-compiled, single input-single output
  • Automatic Java class reflection
  • Intrinsics: built-in, highly-tuned operations, e.g. ScriptIntrinsicConvolve3x3
  • Script groups combine kernels to amortize launch cost & enable kernel fusion

What's Renderscript (2/2)
❖ Data type:
  • 1D/2D collections of elements, C types like int and short2, types include size
  • Runtime type checking
❖ Parallelism
  • Implicit: one thread per data element, atomics for thread-safe access
  • Thread scheduling not exposed, VM-decided

RenderScript Architecture

Low Level Virtual Machine
❖ Low Level Virtual Machine (LLVM) is a compiler infrastructure

Offline Compiler Flow

Renderscript Compiler: libbcc

Renderscript Project Framework

Page 102: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Example - Hello World(1/8)102

Page 103: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Example - Hello World(2/8)HelloWorld.java

103

Page 104: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Example - Hello World(3/8)HelloWorld.java

104

Page 105: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Example - Hello World(4/8)HelloWorldView.java

105

Page 106: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Example - Hello World(5/8)HelloWorldView.java

106

Page 107: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Example - Hello World(6/8)HelloWorldRS.java

107

Page 108: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Example - Hello World(7/8)HelloWorldRS.java

108

Page 109: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Example - Hello World (7/8): ScriptC_helloworld.java

109

Page 110: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Example - Hello World (7/8): ScriptC_helloworld.java

110

Page 111: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Example - Hello World (8/8): HelloWorld.rs

111

Page 112: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Comparison (1/2)

❖ Renderscript vs. Native (NDK) vs. Java (SDK)
• OS: Honeycomb v3.2 (CPU only)

Qian, Xi, Guangyu Zhu, and Xiao-Feng Li. "Comparison and analysis of the three programming models in Google Android." In Proc. First Asia-Pacific Programming Languages and Compilers Workshop (APPLC), 2012.

112

Page 113: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Comparison (2/2)

❖ OpenCL & CUDA
• Sobel filter with (CMw) and without (CMw/o) constant memory
• OpenCL's portability does not fundamentally affect its performance

Fang, Jianbin, Ana Lucia Varbanescu, and Henk Sips. "A comprehensive performance comparison of CUDA and OpenCL." In Proc. International Conference on Parallel Processing (ICPP), 2011.

113

Page 114: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

GPGPU Programming

Performance: more control, better performance

Productivity: ease of use, quick programming, portability

Page 115: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Parallelization

❖ Multicore/Multi-threading
❖ Data Parallelization
• Data distribution
• Parallel convolution
• Reduction algorithm
• Amdahl's law
❖ Memory Hierarchy Management
• Locality principle
  • Program accesses a relatively small portion of the address space at any instant of time
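Amdahl's law, listed above, bounds the speedup obtainable when only part of a program can be parallelized. The following is a minimal sketch in plain Java; the class name `Amdahl`, the method `speedup`, and the 95% parallel fraction used in `main` are illustrative assumptions, not taken from the slides.

```java
// Illustrative helper; the class and method names are hypothetical, not from the slides.
final class Amdahl {
    private Amdahl() {}

    /**
     * Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n),
     * where p is the parallelizable fraction and n the number of processors.
     */
    static double speedup(double parallelFraction, int processors) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0) {
            throw new IllegalArgumentException("parallelFraction must be in [0, 1]");
        }
        if (processors < 1) {
            throw new IllegalArgumentException("processors must be >= 1");
        }
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / processors);
    }

    public static void main(String[] args) {
        // Assumed workload: 95% of the run time can be parallelized.
        for (int n : new int[] {1, 4, 16, 240}) {
            System.out.printf("n = %3d -> %.2fx%n", n, speedup(0.95, n));
        }
    }
}
```

With p = 0.95 the speedup can never exceed 1 / (1 - p) = 20, however many processors are added, which is why the serial portion of a vision pipeline deserves attention before the parallel part is tuned.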

Page 116: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Multi-thread Programming with the Discipline of Parallelization

❖ Identify parallelism: Analyze algorithm
❖ Express parallelism: Write parallel code
❖ Validate parallelism: Debug & verify parallel code
❖ Optimize parallelism: Enhance parallel performance

116

Page 117: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Today We Talk About

❖ Why GPGPU's multicore is better (Sec. 2)
❖ How to do parallel programming (Sec. 3)
❖ OpenCV Acceleration (Sec. 4)
❖ Computer Vision Acceleration on PC (Sec. 5)
❖ Computer Vision Acceleration on Android (Sec. 6)

117

Page 118: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

4. OpenCV Acceleration

118

Page 119: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

What Is OpenCV

❖ A very popular computer vision library
• 6M downloads
• BSD license
• 2000 ~ CV functions
• Modularized and efficient
• Optimization
  • Intel SSE, IPP, TBB
  • ARM NEON & GLSL (Tegra)
  • CUDA, OpenCL

119

Page 120: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV Modules

❖ Image/video I/O, processing, display (core, imgproc, highgui)
❖ Object/feature detection (objdetect, features2d, nonfree)
❖ Geometry-based monocular or stereo computer vision (calib3d, stitching, videostab)
❖ Computational photography (photo, video, superres)
❖ Machine learning & clustering (ml, flann)
❖ CUDA and OpenCL GPU acceleration (gpu, ocl)

Normal CV modules: 14
Acceleration modules: 2

120

Page 121: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV GPU Module

❖ Implemented using NVIDIA CUDA Runtime API
❖ Latest version: 2.4.9
• Utilizing multiple GPUs
❖ Implemented modules: 11
❖ Implemented functions: 270

Focuses on the PC platform
Not fully compatible with mobile GPGPUs on Android

121

Page 122: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Matrix Operations

❖ Point-wise matrix math
• gpu::add(), ::sum(), ::div(), ::sqrt(), ::sqrSum(), ::meanStdDev, ::min(), ::max(), ::minMaxLoc(), ::magnitude(), ::norm(), ::countNonZero(), ::cartToPolar(), etc.
❖ Matrix multiplication
• gpu::gemm()
❖ Channel manipulation
• gpu::merge(), ::split()

122

Page 123: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Geometric Operations

❖ Image resize with sub-pixel interpolation
• gpu::resize()
❖ Image rotate with sub-pixel interpolation
• gpu::rotate()
❖ Image warp (e.g., panoramic stitching)
• gpu::warpPerspective(), ::warpAffine()

123

Page 124: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Other Math and Geometric Operations

❖ Integral images
• gpu::integral(), ::sqrIntegral()
❖ Custom geometric transformation (e.g., lens distortion correction)
• gpu::remap(), ::buildWarpCylindricalMaps(), ::buildWarpSphericalMaps()

124

Page 125: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Image Processing (1/2)

❖ Smoothing
• gpu::blur(), ::boxFilter(), ::GaussianBlur()
❖ Morphological
• gpu::dilate(), ::erode(), ::morphologyEx()
❖ Edge detection
• gpu::Sobel(), ::Scharr(), ::Laplacian(), gpu::Canny()
❖ Custom 2D filters
• gpu::filter2D(), ::createFilter2D_GPU(), ::createSeparableFilter_GPU()
❖ Color space conversion
• gpu::cvtColor()
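Most of the filters on this slide are small-window operations in which every output pixel depends only on a neighborhood of the input, which is what makes them data-parallel. The sketch below illustrates that idea on the CPU in plain Java, handing each image row to a worker through a parallel stream. It is not the OpenCV gpu API: the class `RowParallelFilter`, the method `filter3x3`, the clamped border handling, and the test image are all assumptions made for illustration. The kernel is applied without flipping (correlation form).

```java
import java.util.stream.IntStream;

// Hypothetical CPU illustration of a data-parallel 3x3 filter; not OpenCV code.
final class RowParallelFilter {
    private RowParallelFilter() {}

    /** Applies a 3x3 kernel to a row-major grayscale image; borders are clamped. */
    static float[] filter3x3(float[] src, int width, int height, float[] kernel) {
        if (src.length != width * height || kernel.length != 9) {
            throw new IllegalArgumentException("expected width*height pixels and 9 kernel taps");
        }
        float[] dst = new float[src.length];
        // Rows write to disjoint parts of dst, so they can run concurrently.
        IntStream.range(0, height).parallel().forEach(y -> {
            for (int x = 0; x < width; x++) {
                float acc = 0f;
                for (int dy = -1; dy <= 1; dy++) {
                    int yy = Math.min(height - 1, Math.max(0, y + dy));
                    for (int dx = -1; dx <= 1; dx++) {
                        int xx = Math.min(width - 1, Math.max(0, x + dx));
                        acc += kernel[(dy + 1) * 3 + (dx + 1)] * src[yy * width + xx];
                    }
                }
                dst[y * width + x] = acc;
            }
        });
        return dst;
    }

    public static void main(String[] args) {
        // 4x3 horizontal ramp: each pixel value equals its x coordinate.
        int w = 4, h = 3;
        float[] ramp = new float[w * h];
        for (int i = 0; i < ramp.length; i++) {
            ramp[i] = i % w;
        }
        float[] sobelX = {-1, 0, 1, -2, 0, 2, -1, 0, 1};
        float[] out = filter3x3(ramp, w, h, sobelX);
        System.out.println(out[1 * w + 1]); // horizontal gradient at pixel (1, 1)
    }
}
```

On a GPGPU the same decomposition is usually pushed further, with one thread per output pixel rather than one worker per row.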

125

Page 126: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Image Processing (2/2)

❖ Image blending
• gpu::blendLinear()
❖ Template matching (automated inspection)
• gpu::matchTemplate()
❖ Gaussian pyramid (scale invariant feature/object detection)
• gpu::pyrUp(), ::pyrDown()
❖ Image histogram
• gpu::calcHist(), gpu::histEven, gpu::histRange()
❖ Contrast enhancement
• gpu::equalizeHist()

126

Page 127: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA De-noising

❖ Gaussian noise removal
• gpu::FastNonLocalMeansDenoising()
❖ Edge preserving smoothing
• gpu::bilateralFilter()

127

Page 128: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Fourier and MeanShift

❖ Fourier analysis
• gpu::dft(), ::convolve(), ::mulAndScaleSpectrums(), etc.
❖ MeanShift
• gpu::meanShiftFiltering(), ::meanShiftSegmentation()

128

Page 129: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Shape Detection

❖ Line detection (e.g., lane detection, building detection, perspective correction)
• gpu::HoughLines(), ::HoughLinesDownload()
❖ Circle detection (e.g., cells, coins, balls)
• gpu::HoughCircles(), ::HoughCirclesDownload()

129

Page 130: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Object Detection

❖ HAAR and LBP cascaded adaptive boosting (e.g., face, nose, eyes, mouth)
• gpu::CascadeClassifier_GPU::detectMultiScale()
❖ HOG detector (e.g., person, car, fruit, hand)
• gpu::HOGDescriptor::detectMultiScale()

130

Page 131: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Object Recognition

❖ Interest point detectors
• gpu::cornerHarris(), ::cornerMinEigenVal(), ::SURF_GPU, ::FAST_GPU, ::ORB_GPU(), ::GoodFeaturesToTrackDetector_GPU()
❖ Feature matching
• gpu::BruteForceMatcher_GPU(), ::BFMatcher_GPU()

131

Page 132: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Stereo and 3D

❖ RANSAC
• gpu::solvePnPRansac()
❖ Stereo correspondence (disparity map)
• gpu::StereoBM_GPU(), ::StereoBeliefPropagation(), ::StereoConstantSpaceBP(), ::DisparityBilateralFilter()
❖ Represent stereo disparity as 3D or 2D
• gpu::reprojectImageTo3D(), ::drawColorDisp()

132

Page 133: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Optical Flow

❖ Dense/sparse optical flow
• gpu::FastOpticalFlowBM(), ::PyrLKOpticalFlow, ::BroxOpticalFlow(), ::FarnebackOpticalFlow(), ::OpticalFlowDual_TVL1_GPU(), ::interpolateFrames()

133

Page 134: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA Background Segmentation

❖ Foreground/background segmentation (e.g., object detection/removal, motion tracking, background removal)

• gpu::FGDStatModel, ::GMG_GPU, ::MOG_GPU, ::MOG2_GPU

134

Page 135: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Performance of OpenCV GPU Accelerators on PC

135

Page 136: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Today We Talk About

❖ Why GPGPU's multicore is better (Sec. 2)
❖ How to do parallel programming (Sec. 3)
❖ OpenCV Acceleration (Sec. 4)
❖ Computer Vision Acceleration on PC (Sec. 5)
❖ Computer Vision Acceleration on Android (Sec. 6)

136

Page 137: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

5. Computer Vision Acceleration on PC

Image enhancement (HDR)
Feature extraction
Video surveillance cloud

137

Page 138: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

HDR and Image Enhancement

138

Page 139: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

HDR Image Enhancement

❖ Restore and enhance an image
❖ Its complexity is high for large images
• Complexity: O(N^2) ~ O(N^3)

[Figure: original image vs. restored image]

139

Page 140: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Algorithms for Image Restoration

❖ Wiener Filter
❖ Histogram Based Approach
• Histogram Equalization, Histogram Modification, …
❖ Retinex
• Path-based Retinex
• Recursive Retinex
• Center/surround Retinex
  • No iterative process and is suitable for parallelization
  • Multi-Scale Retinex with Color Restoration (MSRCR) [Rahman et al. 1997]

140

Page 141: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

MSRCR Algorithm

• R_MSRCR_i(x, y): the MSRCR output
• I_i(x, y): the original image distribution in the ith spectral band
• F_k(x, y): the kth Gaussian surround function
• *: the convolution operation
• w_k: the weight
• C_i(x, y): the color restoration factor in the ith spectral band
• N: the number of spectral bands
• β: the gain constant
• α: controls the strength of the nonlinearity
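The definitions above match the MSRCR as it is commonly written in the literature (Rahman et al. 1997). A sketch in that standard notation follows. The symbol names are assumptions: R for the output, I_i for the ith spectral band, F_k for the kth Gaussian surround, w_k for its weight, C_i for the color restoration factor, K for the number of scales, and κ_k and σ_k for the normalization constant and scale of each surround.

```latex
\begin{align}
R_{\mathrm{MSRCR}_i}(x,y) &= C_i(x,y) \sum_{k=1}^{K} w_k
    \Big[ \log I_i(x,y) - \log\big( F_k(x,y) * I_i(x,y) \big) \Big] \\
C_i(x,y) &= \beta \Big[ \log\big( \alpha\, I_i(x,y) \big)
    - \log \sum_{j=1}^{N} I_j(x,y) \Big] \\
F_k(x,y) &= \kappa_k \exp\!\left( -\frac{x^2 + y^2}{\sigma_k^2} \right),
    \qquad \iint F_k(x,y)\, dx\, dy = 1
\end{align}
```

Each term of the sum needs only a Gaussian convolution and a per-pixel logarithm, so no pixel depends on the output of another pixel, which is the property that makes the center/surround form attractive for GPGPU execution.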

141

Page 142: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

The Method

[Flow diagram: CPU → GPGPU → CPU]

Copy Data from CPU to GPGPU → Gaussian Blur → Log-domain Processing → Normalization → Histogram Stretching → Copy Data from GPGPU to CPU

• Wang, Yuan-Kai, and Wen-Bin Huang. "Acceleration of an improved Retinex algorithm." Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on. IEEE, 2011.

• Wang, Yuan-Kai, and Wen-Bin Huang. "A CUDA-enabled parallel algorithm for accelerating retinex." Journal of Real-Time Image Processing (2012): 1-19.
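To make the stages of the method concrete, here is a minimal CPU sketch in plain Java. It is not the authors' CUDA implementation: it assumes a single scale, a 1-D scanline with non-negative values instead of a 2-D color image, and it folds normalization and histogram stretching into one linear min-max stretch. The class `RetinexSketch` and all method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical single-scale, 1-D illustration of the blur / log / stretch stages.
final class RetinexSketch {
    private RetinexSketch() {}

    /** Blurs a 1-D signal with a normalized Gaussian kernel; borders are clamped. */
    static double[] gaussianBlur(double[] in, double sigma) {
        if (sigma <= 0.0) {
            throw new IllegalArgumentException("sigma must be positive");
        }
        int radius = (int) Math.ceil(3.0 * sigma);
        double[] kernel = new double[2 * radius + 1];
        double sum = 0.0;
        for (int i = -radius; i <= radius; i++) {
            kernel[i + radius] = Math.exp(-(i * i) / (2.0 * sigma * sigma));
            sum += kernel[i + radius];
        }
        double[] out = new double[in.length];
        for (int x = 0; x < in.length; x++) {
            double acc = 0.0;
            for (int i = -radius; i <= radius; i++) {
                int xi = Math.min(in.length - 1, Math.max(0, x + i));
                acc += kernel[i + radius] * in[xi];
            }
            out[x] = acc / sum;
        }
        return out;
    }

    /** Log-domain step: log(1 + I) - log(1 + blurred I); the +1 avoids log(0). */
    static double[] logDifference(double[] in, double[] blurred) {
        double[] out = new double[in.length];
        for (int x = 0; x < in.length; x++) {
            out[x] = Math.log1p(in[x]) - Math.log1p(blurred[x]);
        }
        return out;
    }

    /** Linearly stretches the values to the display range [0, 255]. */
    static double[] stretch(double[] v) {
        double min = Arrays.stream(v).min().orElse(0.0);
        double max = Arrays.stream(v).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (range == 0.0) ? 0.0 : 255.0 * ((v[i] - min) / range);
        }
        return out;
    }

    /** Runs the three stages in sequence. */
    static double[] enhance(double[] in, double sigma) {
        return stretch(logDifference(in, gaussianBlur(in, sigma)));
    }

    public static void main(String[] args) {
        double[] scanline = {10, 12, 11, 200, 210, 205, 12, 10};
        System.out.println(Arrays.toString(enhance(scanline, 1.5)));
    }
}
```

The blur and the log step are per-pixel independent, while the stretch first needs a global minimum and maximum, which on a GPGPU would typically be computed with a parallel reduction.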

142

Page 143: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Parallelization by GPGPU

❖ Multicore/Multi-threading
• Tesla C1060: 240 SPs (Stream Processors)
• CUDA: Thread, Block, Grid
❖ Data Parallelization
• Parallel convolution

[Figure: image pixels distributed among processing elements PE_i]

[Figure: tree reduction of A(0)..A(7) across processing elements over time steps: pairwise sums A(0)+A(1), A(2)+A(3), A(4)+A(5), A(6)+A(7); then A(0)+A(1)+A(2)+A(3) and A(4)+A(5)+A(6)+A(7); then the total sum]
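The pairwise summation of A(0)..A(7) shown on this slide is a tree reduction: each pass combines independent pairs, so n values are summed in about log2(n) passes. The following is a small Java sketch of that pattern; the class `TreeReduction` is hypothetical, and the parallel stream stands in for the processing elements (on a GPGPU each pair would be handled by one thread).

```java
import java.util.stream.IntStream;

// Hypothetical illustration of a pairwise (tree) reduction.
final class TreeReduction {
    private TreeReduction() {}

    /** Sums the input by repeatedly adding elements that are `stride` apart. */
    static double sum(double[] input) {
        if (input.length == 0) {
            return 0.0;
        }
        double[] a = input.clone(); // work on a copy so the caller's data is untouched
        // Pass 1 uses stride 1, pass 2 stride 2, pass 3 stride 4, and so on.
        for (int stride = 1; stride < a.length; stride *= 2) {
            final int s = stride;
            int pairs = (a.length + 2 * s - 1) / (2 * s);
            // Within one pass every pair touches different elements, so pairs are independent.
            IntStream.range(0, pairs).parallel().forEach(p -> {
                int left = p * 2 * s;
                int right = left + s;
                if (right < a.length) {
                    a[left] += a[right];
                }
            });
        }
        return a[0];
    }

    public static void main(String[] args) {
        double[] values = {0, 1, 2, 3, 4, 5, 6, 7};
        System.out.println(sum(values)); // prints 28.0 after three passes
    }
}
```

For eight values the loop runs three passes (strides 1, 2, 4), matching the three levels of the diagram.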

143

Page 144: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Our Memory Hierarchy

Parallel Gaussian Blur

Parallel Log-domain Processing

Parallel Normalization

Texture Memory

Parallel Histogram Stretching

Constant Memory

Global Memory

Shared Memory

144

Page 145: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Experimental Results (1/2)

[Figure columns: Original images, CPU results, GPGPU results]

Page 146: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Experimental Results (2/2)

[Figure columns: Original images, CPU results, GPGPU results]

Page 147: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

GPGPU Speedup over CPU

[Chart labels: 74x, 2x]

• Ideal speedup: 240 × (1.296 GHz / 3 GHz) ≈ 103.7
• NPP: NVIDIA Performance Primitives
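The ideal-speedup figure can be worked out as below. The sketch assumes that one stream processor does the same work per clock as the CPU core, that the baseline is a single 3 GHz core, and that memory transfers cost nothing; these simplifications are assumptions, so the result is a rough upper bound rather than a prediction. The symbols N_SP, f_GPU and f_CPU are introduced here for illustration.

```latex
\begin{align}
S_{\text{ideal}} &= N_{\mathrm{SP}} \times \frac{f_{\mathrm{GPU}}}{f_{\mathrm{CPU}}}
  = 240 \times \frac{1.296\ \text{GHz}}{3\ \text{GHz}}
  = 240 \times 0.432 = 103.68 \\
\frac{S_{\text{measured}}}{S_{\text{ideal}}} &= \frac{74}{103.68} \approx 0.71
\end{align}
```

If the 74x label is read as the best measured speedup, the implementation reaches roughly 71% of this bound.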

147

Page 148: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Feature Extraction (SIFT)

148

Page 149: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

What Is SIFT

❖ SIFT
• Scale Invariant Feature Transform
❖ Invariance of feature points
• Translation
• Rotation
• Scale

Page 150: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Applications of SIFT

❖ Object recognition/tracking
❖ Image retrieval
❖ Autostitch

Page 151: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Parallelize SIFT by GPGPU

CPU: Intel Q9400, quad-core (2.66 GHz)

GPU: GeForce GTS 250, 128 SPs (1.836 GHz)

151

Page 152: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Experimental Results

[Figure: CPU results vs. GPU results]

152

Page 153: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Execution Time (ms)

CPU: 10 seconds on average

GPGPU: 0.8 seconds on average

153

Page 154: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Speedup

13x speedup on average

154

Page 155: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Video Surveillance Cloud

155

Page 156: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

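GPGPU Cloud Video Surveillance System

• Restricted-area intrusion detection
• PTZ camera tracking
• Camera anomaly detection
• High-efficiency video event browsing system
• Central video and message management system
• Multi-resolution wide-area surveillance system
• Outdoor parking lot
  • Vacant space detection
  • Illegal parking detection
• Face detection in dynamic scenes

[System diagram labels: Storage Area Network, PC, Mobile device, Multi-core, Hypervisor, GPGPU]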

156

Page 157: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Private cloud server room

157

Page 158: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Today We Talk About

❖ Why GPGPU's multicore is better (Sec. 2)
❖ How to do parallel programming (Sec. 3)
❖ OpenCV Acceleration (Sec. 4)
❖ Computer Vision Acceleration on PC (Sec. 5)
❖ Computer Vision Acceleration on Android (Sec. 6)

158

Page 159: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

6. Computer Vision Acceleration on Android

OpenCV
RenderScript

159

Page 160: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV on Android

160

Page 161: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV4Android SDK

❖ Enables development of Android applications that use the OpenCV library.
❖ Uses the Java Native Interface (JNI) to access C code directly.
❖ Supports NVIDIA's Tegra Android Development Pack (TADP).

Not fully compatible with the GPU module

161

Page 162: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

System Framework

Page 163: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Two Methods to Call OpenCV

❖ Using Java API

❖ Using native C++

163

Page 164: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV for Android SDK by GPU(1/5)

164

Page 165: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV for Android SDK by GPU(2/5)

165

Page 166: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV for Android SDK by GPU(3/5)

166

Page 167: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV for Android SDK by GPU(4/5)

167

Page 168: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV for Android SDK by GPU(5/5)

168

Page 169: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript on Android with GPU Acceleration

169

Page 170: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript on android with GPU(1/5)

170

Page 171: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript on android with GPU(2/5)

171

Page 172: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript on android with GPU(3/5)

172

Page 173: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript on android with GPU(4/5)

173

Page 174: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript on android with GPU(5/5)

174

Page 175: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript Image Processing Intrinsics

Name: Operation
• ScriptIntrinsicConvolve3x3, ScriptIntrinsicConvolve5x5: Performs a 3x3 or 5x5 convolution.
• ScriptIntrinsicBlur: Performs a Gaussian blur. Supports grayscale and RGBA buffers and is used by the system framework for drop shadows.
• ScriptIntrinsicYuvToRGB: Converts a YUV buffer to RGB. Often used to process camera data.
• ScriptIntrinsicColorMatrix: Applies a 4x4 color matrix to a buffer.
• ScriptIntrinsicBlend: Blends two allocations in a variety of ways.
• ScriptIntrinsicLUT: Applies a per-channel lookup table to a buffer.
• ScriptIntrinsic3DLUT: Applies a color cube with interpolation to a buffer.
• ScriptIntrinsicHistogram: Intrinsic histogram filter.

175

Page 176: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Gaussian Blur Example by RenderScript Intrinsic

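import android.renderscript.Allocation;
import android.renderscript.Element;
import android.renderscript.RenderScript;
import android.renderscript.ScriptIntrinsicBlur;

// theActivity, inputBitmap and outputBitmap are supplied by the surrounding app code.
RenderScript rs = RenderScript.create(theActivity);
ScriptIntrinsicBlur theIntrinsic = ScriptIntrinsicBlur.create(rs, Element.U8_4(rs));
Allocation tmpIn = Allocation.createFromBitmap(rs, inputBitmap);
Allocation tmpOut = Allocation.createFromBitmap(rs, outputBitmap);
theIntrinsic.setRadius(25.f);   // blur radius; the supported range is (0, 25]
theIntrinsic.setInput(tmpIn);   // source allocation
theIntrinsic.forEach(tmpOut);   // run the blur into the output allocation
tmpOut.copyTo(outputBitmap);    // copy the result back into the bitmap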

176

Page 177: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript Intrinsic Example(1/2)

177

Page 178: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript Intrinsic Example(2/2)

178

Page 179: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Blur Intrinsic Performance Analysis

179

Page 180: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Performance of RenderScript Intrinsics

❖ On new Nexus 7
❖ Relative to equivalent multithreaded C implementations.

180

Page 181: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript Image Processing Benchmarks (1/2)

❖ CPU only on a Galaxy Nexus device.

181

Page 182: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript Image Processing Benchmarks(2/2)

182

Page 183: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Acceleration of Retinex Using RenderScript

❖ Le et al. present rsRetinex, a Retinex implementation optimized with RenderScript.

❖ Their experimental results show that rsRetinex gains up to a fivefold speedup across different image resolutions.

Le, Duc Phuoc Phat Dat, et al. "Acceleration of Retinex Algorithm for Image Processing on Android Device Using Renderscript." In Proc. 8th International Conference on Robotic, Vision, Signal Processing & Power Applications, 2014.

183

Page 184: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Mobile GPGPU List

❖ Qualcomm Adreno
• Adoption: Google Nexus 10, Google new Nexus 7, SONY Xperia Tablet Z2
• OpenCL/CUDA: OpenCL 1.2 (302~420)
• OpenCV: OCL module
• Renderscript: Android 4.0 and later
❖ ARM Mali
• Adoption: Nexus 10, Samsung Note 3, Samsung Note PRO 12.2, Meizu MX3
• OpenCL/CUDA: OpenCL 1.1 (T604~T678)
• OpenCV: OCL module
• Renderscript: Android 4.0 and later
❖ nVIDIA Tegra
• Adoption: Google Project Tango, HTC Nexus 9, Microsoft Surface 2, Nvidia Shield Note 7
• OpenCL/CUDA: CUDA, OpenCL 1.2 (K1 only)
• OpenCV: GPU module
• Renderscript: Android 4.0 and later (K1 only)
❖ AnandTech PowerVR
• Adoption: iPad Air, iPad mini
• OpenCL/CUDA: OpenCL 1.2
• OpenCV: OCL module
• Renderscript: none
❖ Intel HD Graphics
• Adoption: Microsoft Surface Pro 3, Sony VAIO Tap 11
• OpenCL/CUDA: OpenCL 1.1
• OpenCV: OCL module
• Renderscript: none

Nvidia CEO sees future in cars and gaming, 2014/5/19, CNet.

184

Page 185: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

7. Summary

185

Page 186: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

GPGPU

❖ Single-core → Multi-core → Many-core
❖ PC
• nVidia Tesla + CUDA/OpenCV
❖ Android
• Qualcomm Adreno + OpenCV ocl
• nVidia Tegra + OpenCV gpu

186

Page 187: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Parallel Programming

❖ C/C++/OpenCV
• OpenMP, OpenACC, CUDA, C++ AMP
• OpenCL
❖ Java
• OpenCL, RenderScript
❖ Notice that OpenCL and RenderScript are
• Not efficient in parallelization.
• Efficient in CV algorithmic design.

187

Page 188: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV Acceleration (1/2)

❖ Ver. 2.4.x
• gpu module: CUDA, PC
• ocl module: OpenCL, mobile
❖ Ver. 3.0 (2014/6)
• Transparent API for GPGPU acceleration

188

Page 189: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV Acceleration (2/2)

Page 190: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCL 2.0

❖ Released in 2013
❖ SVM: Shared Virtual Memory
• OpenCL 1.2: explicit memory management
❖ Dynamic (nested) parallelism
• Allows a device to enqueue kernels onto itself; no round trip to the host is required
❖ Disadvantages
• Requires strong hardware support
• Not well supported in current GPGPUs

190

Page 191: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA still Dominant in the Near Future

❖ However, we have to manually parallelize the algorithm: more design overhead
❖ We need expertise in
• Algorithms of image and signal processing
  • Filtering, frequency analysis, compression, feature extraction, recognition, ...
• Theory, tools and methodology of parallel computing
  • Communication, synchronization, resource management, load balancing, debugging, ...

191

Page 192: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

GPUs for Multimedia

[Figure: showcase of CUDA-accelerated multimedia applications, with speedup labels 10x, 10x, 10x, 26x and 3.5x]

• Motion Estimation for H.264/AVC on Multiple GPUs Using NVIDIA CUDA
• CUDA JPEG Decoder
• DivideFrame GPU Decoder
• Hyperspectral Image Compression on NVIDIA GPUs
• GPU Decoder (Vegas/Premiere) - Using the Power of NVIDIA Graphic Card to Decode H.264 Video Files
• PowerDirector7 Ultra

192

Page 193: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

GPUs for Computer Vision(1/2)

• 87x: CUDA SURF - A Real-time Implementation for SURF (TU Darmstadt)
• 26x: Leukocyte Tracking: ImageJ Plugin (University of Virginia)
• 200x: Real-time Spatiotemporal Stereo Matching Using the Dual-Cross-Bilateral Grid
• 100x: Image Denoising with Bilateral Filter (Wroclaw University of Technology)
• 85x: Digital Breast Tomosynthesis Reconstruction (Massachusetts General Hospital)
• 100x: Fast Optical Flow on GPU at Video Rate for Full HD Resolution (Onera)
• 8x: A Framework for Efficient and Scalable Execution of Domain-specific Templates on GPU (NEC Labs, Berkeley, Purdue)
• 13x: Accelerating Advanced MRI Reconstructions (University of Illinois)

193

Page 194: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

GPUs for Computer Vision(2/2)

• 20x: GPU for Surveillance
• 13x: Fast Human Detection with Cascaded Ensembles
• 109x: Fast Sliding-Window Object Detection
• 263x: GPU Acceleration of Object Classification Algorithm Using NVIDIA CUDA
• 10x: Real-time Visual Tracker by Stream Processing
• 45x: A GPU Accelerated Evolutionary Computer Vision System
• 3x: Canny Edge Detection
• 300x: Audience Measurement - Real-time Video Analysis for Counting People, Face Detection and Tracking

194

Page 195: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

The Embedded Vision Alliance

195

Page 196: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Readings (1/2)

• Wang, Yuan-Kai, and Wen-Bin Huang. "Acceleration of an improved Retinex algorithm." IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2011.

• Wang, Yuan-Kai, and Wen-Bin Huang. "A CUDA-enabled parallel algorithm for accelerating retinex." Journal of Real-Time Image Processing (2012): 1-19.

• Pauwels, Karl, et al. "A comparison of FPGA and GPU for real-time phase-based optical flow, stereo, and local image features." Computers, IEEE Transactions on 61.7 (2012): 999-1012.

• Pratx, Guillem, and Lei Xing. "GPU computing in medical physics: A review." Medical physics 38.5 (2011): 2685-2697.

• Cope, Ben, et al. "Performance comparison of graphics processors to reconfigurable logic: a case study." Computers, IEEE Transactions on 59.4 (2010): 433-448.

196

Page 197: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Readings (2/2)

❖ "Designing Visionary Mobile Apps Using the Tegra Android Development Pack," http://bit.ly/1jvwbgV
❖ "Getting Started With GPU-Accelerated Computer Vision Using OpenCV and CUDA," http://bit.ly/1oMwJEG
❖ "The open standard for parallel programming of heterogeneous systems," https://www.khronos.org/opencl/

197

Page 198: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCV Acceleration

❖ GPU Module Introduction, OpenCV 2.4.9.0 documentation
❖ OpenCL Module Introduction, OpenCV documentation
❖ OpenCV-CL: Computer vision with OpenCL acceleration, AMD Developer Central, 2013.
❖ Pulli, Kari, et al. "Real-time computer vision with OpenCV." Communications of the ACM 55.6 (2012): 61-69.
❖ Allusse, Yannick, et al. "GpuCV: A GPU-accelerated framework for image processing and computer vision." Advances in Visual Computing. Springer Berlin Heidelberg, 2008. 430-439.

198

Page 199: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

CUDA

❖ CUDA Programming Guide. nVidia.
❖ CUDA Best Practices Guide. nVidia.
❖ CUDA Reference Manual. nVidia.
❖ CUDA Zone - NVIDIA Developer, https://developer.nvidia.com/cuda-zone
❖ Parallel Programming and Computing Platform | CUDA Home, www.nvidia.com/object/cuda_home_new.html
❖ Applications of CUDA for Imaging and Computer Vision, http://www.nvidia.com/object/imaging_comp_vision.html
❖ nVidia Performance Primitives (NPP), http://developer.nvidia.com/object/npp_home.html

199

Page 200: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

OpenCL

❖ Khronos OpenCL specification, reference card, tutorials, etc.: http://www.khronos.org/opencl
❖ AMD OpenCL Resources: http://developer.amd.com/opencl
❖ NVIDIA OpenCL Resources: http://developer.nvidia.com/opencl
❖ Books
• Using OpenCL: Programming Massively Parallel Computers. IOS Press, 2012.
• OpenCL Programming Guide. Pearson Education, 2011.
• Heterogeneous Computing with OpenCL: Revised OpenCL 1. Newnes, 2012.
• OpenCL in Action: How to Accelerate Graphics and Computation. Manning, 2012.

200

Page 201: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

RenderScript

❖ RenderScript for Android Developer, official web site: http://developer.android.com/guide/topics/renderscript/compute.html

❖ Qian, Xi, Guangyu Zhu, and Xiao-Feng Li. "Comparison and analysis of the three programming models in google android." First Asia-Pacific Programming Languages and Compilers Workshop. 2012.

❖ "High Performance Apps Development with RenderScript," 12th Kandroid Conference, 2013.

201

Page 202: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Web Sites and Resources

❖ Embedded Vision Alliance, http://www.embedded-vision.com
❖ GPUComputing.Net, http://www.gpucomputing.net
❖ HSA Foundation, www.hsafoundation.com

202

Page 203: 2014/07/17 Parallelize computer vision by GPGPU computing

Wang,Yuan-Kai(王元凱) Computer Vision Parallelization by GPGPU p.

Parallel Computing with GPGPU

❖ Programming Massively Parallel Processors - A Hands-on Approach
• D. B. Kirk, W. M. Hwu
• Morgan Kaufmann, 2010
• http://www.nvidia.com/object/promotion_david_kirk_book.html

203