© 2012 NVIDIA - Page 1
Tegra – at the Convergence of Mobile and GPU Supercomputing
Neil Trevett, VP Mobile Content, NVIDIA
© 2012 NVIDIA - Page 2
Welcome to the Inaugural GTC Mobile Summit!
Tuesday Afternoon - Room 210C
- Ecosystem Broad View – including Ouya
- Development Tools – including Tegra 4 and Shield
Wednesday Morning - Marriott Ballroom 3
- Visualization – including using H.264 for still imagery
- Augmented device interaction – including depth camera on Tegra
Wednesday Afternoon - Room 210C
- Vision and Computational Photography – including Chimera
- Web – the fastest mobile browser
- Mobile Panel – your chance to ask gnarly questions!
Select Mobile Summit Tag in your GTC Mobile App!
© 2012 NVIDIA - Page 3
Why Mobile GPU Compute?
Courtesy Metaio http://www.youtube.com/watch?v=xw3M-TNOo44&feature=related
State-of-the-art Augmented Reality without GPU Compute
© 2012 NVIDIA - Page 4
Augmented Reality with GPU Compute
"High-Quality Reflections, Refractions, and Caustics in Augmented Reality and their Contribution to Visual Coherence"
P. Kán, H. Kaufmann, Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria
Research today on CUDA equipped laptop PCs
How will this GPU Compute Capability migrate from high-end PCs to mobile?
© 2012 NVIDIA - Page 5
Mobile SOC Performance Increases
[Chart: CPU/GPU aggregate performance (log scale, 1 to 100) against device shipping dates, 2011 to 2015]
Tegra 2: 1st Dual A9
Tegra 3: 1st Quad A9, 1st power saver 5th core
Tegra 4: 1st Quad A15, Chimera Computational Photography
Logan: Full Kepler GPU, CUDA 5.0, OpenGL 4.3
Parker: Denver CPU, Maxwell GPU, FinFET
Reference points: Core 2 Duo, Core i5
Devices shown: HTC One X+, Google Nexus 7
100x perf increase in four years
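To put the "100x in four years" claim in perspective, the short Python sketch below converts it into an equivalent yearly growth factor and doubling time. It assumes smooth compound growth over the four years, which the chart itself does not claim; the function names are illustrative.

```python
import math


def annual_growth(total_factor: float, years: float) -> float:
    """Yearly multiplier that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years)


def doubling_time_months(total_factor: float, years: float) -> float:
    """Months needed to double, assuming constant compound growth."""
    return 12.0 * years * math.log(2.0) / math.log(total_factor)


rate = annual_growth(100.0, 4.0)
months = doubling_time_months(100.0, 4.0)
print(f"{rate:.2f}x per year, doubling every {months:.1f} months")
```

Under that assumption the roadmap implies roughly a tripling of performance every year.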
© 2012 NVIDIA - Page 6
Power is the New Design Limit
The Process Fairy keeps bringing more transistors, but the ‘End of Voltage Scaling’ means power is much more of an issue than in the past
(L: feature size, D: transistor density, f: clock frequency, V: supply voltage, C: switched capacitance, E: switching energy, P: power)

In the Good Old Days
Leakage was not important, and voltage scaled with feature size
$L' = L/2$, $D' = 1/L'^2 = 4D$, $f' = 2f$, $V' = V/2$, $E' = C'V'^2 = E/8$, $P' = P$
Halve L and get 4x the transistors and 8x the capability for the same power

The New Reality
Leakage has limited threshold voltage, largely ending voltage scaling
$L' = L/2$, $D' = 1/L'^2 = 4D$, $f' \approx 2f$, $V' \approx V$, $E' = C'V'^2 = E/2$, $P' = 4P$
Halve L and get 4x the transistors and 8x the capability for 4x the power!
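The two scaling regimes above can be checked numerically. The Python sketch below is a minimal model, not something from the slides: it assumes switched capacitance scales with feature size (so it halves when L halves) and that power for a fixed die area is proportional to density × frequency × switching energy. The `Node` class and `shrink` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Normalized process node: every quantity is relative to the starting node."""
    density: float = 1.0   # D, transistors per unit area
    freq: float = 1.0      # f, clock frequency
    cap: float = 1.0       # C, switched capacitance (assumed proportional to L)
    voltage: float = 1.0   # V, supply voltage

    @property
    def energy(self) -> float:
        # E = C * V^2, energy per switching event
        return self.cap * self.voltage ** 2

    @property
    def power(self) -> float:
        # Assumed model: power for a fixed area is D * f * E
        return self.density * self.freq * self.energy

    @property
    def capability(self) -> float:
        # Transistor count times clock rate
        return self.density * self.freq


def shrink(node: Node, voltage_scales: bool) -> Node:
    """Halve the feature size L, with or without voltage scaling."""
    return Node(
        density=node.density * 4,
        freq=node.freq * 2,
        cap=node.cap / 2,
        voltage=node.voltage / 2 if voltage_scales else node.voltage,
    )


base = Node()
good_old_days = shrink(base, voltage_scales=True)
new_reality = shrink(base, voltage_scales=False)

for name, n in [("good old days", good_old_days), ("new reality", new_reality)]:
    print(f"{name}: capability x{n.capability:g}, energy x{n.energy:g}, power x{n.power:g}")
```

Both regimes deliver 8x the capability; only the power column differs, which is the point of the comparison.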
© 2012 NVIDIA - Page 7
Mobile Thermal Design Point
Typical max system power levels before thermal failure: 2-4W, 4-7W, 6-10W, 30-90W
Even as battery technology improves, these thermal limits remain
4-5” screen takes 250-500mW
7” screen takes 1W
10” screen takes 1-2W
Resolution makes a difference: the iPad3 screen takes up to 8W!
© 2012 NVIDIA - Page 8
How to Save Power?
Much more expensive to MOVE data than COMPUTE data
Energy efficiency must now be a key metric during silicon AND software design
- Awareness of where data lives, where computation happens, how it is scheduled
Need to use hardware acceleration
- Reduce data movement
- Lots of local processing in parallel
- Efficient caching and memory usage
Energy per operation (for 40nm, 1V process):
32-bit Integer Add: 1pJ
32-bit Float Operation: 7pJ
32-bit Register Write: 0.5pJ
Send 32-bits 2mm: 24pJ
Send 32-bits Off-chip: 50pJ
Write 32-bits to Memory: 600pJ
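The energy figures listed above make the "move vs. compute" point easy to quantify. The Python sketch below uses a hypothetical workload that is not from the slides (a 1-megapixel image, 10 float operations per pixel, 3 intermediate results stored per pixel) and compares keeping intermediates in registers against writing them to memory. The function and key names are illustrative.

```python
# Energy per operation in picojoules, as listed above (40nm, 1V process).
ENERGY_PJ = {
    "int_add": 1.0,
    "float_op": 7.0,
    "register_write": 0.5,
    "send_2mm": 24.0,
    "send_off_chip": 50.0,
    "memory_write": 600.0,
}


def pass_energy_uj(n_values: int, float_ops_per_value: int,
                   stores_per_value: int, store_kind: str) -> float:
    """Total energy in microjoules for one hypothetical processing pass."""
    compute_pj = n_values * float_ops_per_value * ENERGY_PJ["float_op"]
    store_pj = n_values * stores_per_value * ENERGY_PJ[store_kind]
    return (compute_pj + store_pj) / 1e6  # 1 microjoule = 1e6 picojoules


PIXELS = 1_000_000  # hypothetical 1-megapixel frame

local = pass_energy_uj(PIXELS, 10, 3, "register_write")
spilled = pass_energy_uj(PIXELS, 10, 3, "memory_write")

print(f"intermediates in registers: {local:.1f} uJ")
print(f"intermediates in memory:    {spilled:.1f} uJ")
print(f"ratio: {spilled / local:.1f}x")
```

In this made-up workload the arithmetic costs the same in both cases; nearly all of the difference comes from where the intermediate values are stored.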
© 2012 NVIDIA - Page 9
Dark Silicon, Mobile SOCs and Power Efficiency
Lots of space for transistors, but can’t turn them all on at the same time: that would exceed the Thermal Design Point
Dark Silicon: specialized hardware turned on when needed
- Dedicated units can increase locality and parallelism of computation
GPUs are also much more power efficient than CPUs when exploiting data parallelism
Enabling new mobile experiences requires pushing computation onto GPUs and dedicated hardware
[Chart: power consumption against computation flexibility. Dedicated Hardware X1, GPU Compute X10, Multi-core CPU X100]
© 2012 NVIDIA - Page 10
Mobile GPU Compute Adoption
NVIDIA invented GPU Computing
What we learned: it’s not technology alone, it’s USE CASES
Mobile GPU Compute Use Case Pipeline
- Augmented Reality
- Face, Body and Gesture Tracking
- Computational Photography
- 3D Scene/Object Reconstruction
© 2012 NVIDIA - Page 11
ISP – Dedicated Hardware for Sensor Processing
Camera ISP (Image Signal Processor) typically has little or no programmability
- Scan-line-based, data flows through compact hardware pipe
- No global memory used to minimize power
BUT… computational photography apps now want to mix non-programmable ISP processing with more flexible GPU processing
-> Chimera – new NVIDIA Computational Photography Architecture
Camera ISP: ~760 math ops, ~42K vals = 670Kb, ~250Gops @ 300MHz
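The ISP figures quoted above appear to be mutually consistent under two assumptions that the slide does not state: that the ~760 math operations are all performed on every clock cycle (for example, one pixel entering the pipe per clock), and that each stored value is 16 bits wide. The Python sketch below checks the arithmetic under those assumptions only; the constant names are illustrative.

```python
OPS_PER_CLOCK = 760        # from the slide: ~760 math ops in the pipe
CLOCK_HZ = 300e6           # from the slide: 300MHz
STORED_VALUES = 42_000     # from the slide: ~42K values held in the pipe
BITS_PER_VALUE = 16        # assumption, not stated on the slide

# Throughput if every op in the pipe fires once per clock (assumption).
throughput_gops = OPS_PER_CLOCK * CLOCK_HZ / 1e9

# Local storage held inside the pipe, in kilobits.
storage_kbits = STORED_VALUES * BITS_PER_VALUE / 1000

print(f"throughput: {throughput_gops:.0f} Gops (slide quotes ~250 Gops)")
print(f"storage:    {storage_kbits:.0f} Kb (slide quotes 670Kb)")
```

Both results land close to the slide's rounded numbers, which suggests the assumptions are reasonable but does not confirm them.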
© 2012 NVIDIA - Page 12
Flexible Use of ISP, GPU and CPU
Flexible routing of image frames between computation engines
Potential to integrate more hardware blocks over time
- ISPs for different types of sensors – e.g. IR and depth cameras
- ‘Scanners’ - very low power, always on, to detect things in the environment to process
© 2012 NVIDIA - Page 13
Tegra 4 Family
Feature | Tegra 4 (“Wayne”) | Tegra 4i (“Grey”)
Positioning | World’s Fastest Mobile Processor | 1st Integrated Tegra 4 LTE Processor
Target devices | Superphone / Tablet | Smartphone
Quad CPU | Cortex A15, 4+1 | Cortex A9 r4, 4+1
NVIDIA GPU | 72 Core | 60 Core
LTE | Optional with i500 | Integrated i500
Chimera
© 2012 NVIDIA - Page 14
Android Three Layer Ecosystem
- API Drivers: Java (SDK) and Native (NDK)
- Apps and Games: most use Java, cutting-edge apps/games use native APIs
- Middleware and Apps Engines: use native APIs for power and performance
Partners
VisX: turn-key vision middleware developed by NVIDIA, e.g. Tap-to-track, Panorama Paint
© 2012 NVIDIA - Page 15
?
APIs for Mobile Imaging and Vision
Columns: Graphics, Camera and Images
Rows: Java, Native
- MediaCodec, SurfaceTexture
- FilterScript (RenderScript subset)
- Java binding to OpenGL ES (similar to JSR239)
- OpenCV
- Use GLSL shaders for imaging
- OpenGL 4.3 Compute Shaders provide general purpose computation on uniforms, images and textures for image and vision processing
- Open source research project for advanced camera control
- OpenCV4Tegra: open source OpenCV vision library with OpenGL ES GLSL, ARM multithreading and NEON optimizations
- Open standard under development at Khronos for optimized, power efficient vision acceleration
© 2012 NVIDIA - Page 16
APIs for Mobile GPU Compute
Columns: Graphics, GPU Compute
Rows: Java, Native
- RenderScript: run performance critical sections as native C. Automatically offload C code segments to the GPU if possible
- Java binding to OpenGL ES (similar to JSR239)
- Use GLSL shaders for GPGPU compute
- OpenGL 4.3 Compute Shaders provide sufficient flexibility for physics, AI, global illumination and ray-tracing acceleration
- ? Program GPUs in C: over 375 million CUDA-enabled GPUs in notebooks, workstations and supercomputers
© 2012 NVIDIA - Page 17
CUDA 5.0 and OpenGL 4.3 on Tegra Today
Kayla: Tegra + discrete GPU development platforms
- Available to select developers
- OpenGL 4.3 and CUDA 5.0, full Kepler support on Linux
- PhysX, VisX …
Enables early development of ARM-based applications with desktop-class graphics and compute
Talk to us if you are interested
Or email [email protected]
© 2012 NVIDIA - Page 18
Thank You!
Powerful GPU Compute is coming to a mobile device near you!
- New use cases need GPUs for acceptable battery consumption
- Logan will bring full Kepler-class GPU to Mobile!
- Desktop APIs for full GPU Compute: OpenGL 4.3 and CUDA 5.0
If you have apps that need Mobile GPU Compute, now is the time to be talking to us…
Questions? [email protected]