HIGH PERFORMANCE VIDEO ENCODING USING NVIDIA GPUS
Abhijit Patait
Sr. Manager, GPU Multimedia SW
AGENDA
• Overview of GPU Video Encoding
• NVIDIA Video Encoding Capabilities
  — Kepler vs Maxwell GPU capabilities
  — Roadmap
• Software API
• Performance & Quality
WHY GPU VIDEO ENCODING?
BENEFITS OF ENCODING ON GPU
• Low power
  — Fixed function hardware
  — Reduced memory transfers
• Low latency
• High performance
• Higher density
• Scalability
• Ease of programming
  — Linux, Windows, C/C++, application portability
NVIDIA GPU VIDEO ENCODING CAPABILITIES
NVIDIA GPU ENCODING CAPABILITIES
Feature | Benefits
H.264 base, main, high profiles | Wide range of use-cases
High performance (Up to 16x HD) | “Blazing-speed” encoding
YUV 4:2:0 and 4:4:4 support | High quality encoding without chroma subsampling
QP maps | Customizable quality, region of interest encoding
MVC | Full resolution stereo encode
Up to 4096 × 4096 in HW | High resolution encode
API - NV Encode SDK & GRID SDK | Flexible, Win/Linux, DirectX/CUDA
Independent of CUDA | Use CUDA and encode simultaneously
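To make the "QP maps" row concrete, here is a minimal sketch of a region-of-interest map: one QP offset per macroblock, with a negative offset (finer quantization) inside the region of interest. The function name, the 16 × 16 macroblock size, the offset values, and the row-major layout are all assumptions for illustration; the format the NVENC API actually expects is defined in the SDK documentation.

```python
def build_qp_delta_map(width, height, roi, roi_delta=-6, background_delta=0, mb_size=16):
    """Return a row-major grid of QP offsets, one entry per macroblock.

    roi is (x0, y0, x1, y1) in pixels, exclusive on the right and bottom.
    All names and default values here are hypothetical.
    """
    mb_cols = (width + mb_size - 1) // mb_size   # round up to whole macroblocks
    mb_rows = (height + mb_size - 1) // mb_size
    x0, y0, x1, y1 = roi
    qp_map = []
    for row in range(mb_rows):
        top, bottom = row * mb_size, (row + 1) * mb_size
        line = []
        for col in range(mb_cols):
            left, right = col * mb_size, (col + 1) * mb_size
            # A macroblock belongs to the ROI if it overlaps the rectangle at all.
            overlaps = left < x1 and right > x0 and top < y1 and bottom > y0
            line.append(roi_delta if overlaps else background_delta)
        qp_map.append(line)
    return qp_map


# Example: favor a 320 x 320 pixel region of a 720p frame.
qp_map = build_qp_delta_map(1280, 720, roi=(480, 200, 800, 520))
```

A lower QP inside the region spends more bits there, which is the usual way to trade background quality for detail in the area viewers look at.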
VIDEO ENCODING — KEPLER VS. MAXWELL
Kepler (GK104, GK107, GK106, GK110, GK208) | Maxwell (GM107)
Planar 4:4:4 | Standard 4:4:4 and H.264 lossless encoding
~240 fps 2-pass encoding @ 720p | ~500 fps 2-pass encoding @ 720p
GRID K340/K520, K1/K2, Quadro, Tesla K10/K20 | Current and future Maxwell GPU-boards
GeForce – 2 full-speed encode sessions/GPU | GeForce – 2 full-speed encode sessions/GPU
NV Encode SDK 1.0, 2.0, 3.0 (Now) | NV Encode SDK 4.0+ (May 2014)
GRID SDK 1.x, 2.2, 2.3 (Now) | GRID SDK 3.0+ (June 2014)
NVIDIA VIDEO ENCODING ROADMAP
• Performance improvements
• Quality improvements
  — 4:4:4 & lossless encoding
  — Rate control enhancements
  — Adaptive quantization
  — ROI, ME-only mode
• New video standards
NVENC SOFTWARE APIS
USING NVENC
NVENC SDK (Direct Encode)
• No capture
• Transcoding
• Archiving
• Video editing
• CUDA pre-process + encoding
• Granular encoder settings
• D3D, CUDA interop
GRID SDK (Capture + Encode)
• Capture + encode
• Optimized for low-latency apps
• Capture + CUDA pre-process + encoding
• Encoder settings optimized for streaming
• D3D, CUDA interop
DIRECT ENCODE (NVENC SDK)
[Diagram. Blocks: Client application; NVENC API; NVENC driver, DirectX driver, CUDA driver; NVENC firmware + hardware. Flow labels: Initialize, Configure HW; Configure, Encode; HW Encode; Encoded bitstream.]
CAPTURE AND ENCODE (GRID SDK)
[Diagram. Blocks: Client application; NvFBC/NvIFR; NVENC driver, DirectX/OGL driver; GPU 3D Engine; NVENC hardware. Flow labels: DX/OGL Present; Capture; YUV; Encode; Encoded bitstream.]
NVENC SDK
• Available on NVIDIA developer zone
  — https://developer.nvidia.com/nvidia-video-codec-sdk
  — Current release 3.0
  — Release 4.0 in May 2014 with Maxwell support
• Interface header, documentation, sample application
  — .dll/.so included in the driver
• Unified API for Windows and Linux
• Works on x86/x64
• Various APIs, presets, rate control modes for
  — Transcoding
  — Video conferencing
  — GTC Session S4654
NVENC SDK (CONTD.)
• Advantages
  — Flexibility
    • Dynamic resolution/bitrate change
    • CABAC vs CAVLC; low-level encoder settings, B-frames, sync vs async, custom QP
    • Linux, Windows, DirectX, CUDA, OGL (via CUDA)
    • Also works on GeForce hardware (2 sessions/GPU)
  — Error concealment
    • Reference picture invalidation
    • Intra-refresh
  — Quality
    • Two-pass modes for higher quality
    • Various presets with quality/performance trade-off
    • 4:4:4 & lossless encoding (Maxwell only)
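Intra-refresh, listed above under error concealment, is commonly done by intra-coding a moving band of macroblocks in each frame, so the whole picture is refreshed over a fixed period without sending a large I-frame. The sketch below shows one possible column-wise schedule. The function name, the column-wise pattern, and the parameter values are assumptions for illustration and do not describe how NVENC schedules its refresh.

```python
def intra_refresh_columns(frame_index, mb_cols, refresh_period):
    """Macroblock columns to intra-code in a given frame (hypothetical schedule).

    The band sweeps left to right and wraps around every refresh_period frames.
    """
    if refresh_period <= 0 or mb_cols <= 0:
        raise ValueError("mb_cols and refresh_period must be positive")
    step = frame_index % refresh_period
    start = step * mb_cols // refresh_period
    end = (step + 1) * mb_cols // refresh_period
    return range(start, end)


# Example: a 1280-pixel-wide frame has 80 macroblock columns of 16 pixels.
# With a 10-frame period, each frame refreshes a band of 8 columns.
covered = set()
for frame in range(10):
    covered.update(intra_refresh_columns(frame, mb_cols=80, refresh_period=10))
```

After one full period every column has been intra-coded once, which bounds how long a transmission error can persist on screen.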
GRID SDK ENCODE
• Available on NVIDIA developer zone
  — https://developer.nvidia.com/grid-app-game-streaming
  — Current release: 2.2
• Interface header, documentation, sample apps
  — .dll/.so included in the driver
• Windows and Linux
• Works on x86/x64
• Various presets and APIs for
  — Remote graphics (Cloud gaming, remote desktop, capture & stream)
• Optimized for low latency
GRID SDK (CONTD.)
• Advantages
  — Simplicity
    • Very simple API; single function call for capture + H.264 encode
  — Low-latency, high performance
    • Optimized API
  — Error concealment
    • Reference picture invalidation
    • Intra-refresh
  — Quality
    • Two-pass modes for higher quality
    • 4:4:4 & lossless encoding (Maxwell only)
PERFORMANCE AND QUALITY
PERFORMANCE – 720P
NVENC Performance at 720p, Low-Latency HP preset
[Bar chart: 720p Performance (fps), axis 100 to 600, by rate control mode, for Kepler (GRID) and Maxwell.]
Rate control modes: 2_PASS_QUALITY, 2_PASS_FRAMESIZE_CAP, CBR_IFRAME_2PASS
Maxwell: 505 fps, 503 fps, 504 fps
Kepler (GRID): 232 fps, 232 fps, 231 fps
Performance measured on GRID K520 with GRID SDK NVENC performance benchmarking application
PERFORMANCE – 1080P
NVENC Performance at 1080p, Low-Latency HP preset
[Bar chart: 1080p Performance (fps), axis 50 to 250, by rate control mode, for Kepler (GRID) and Maxwell.]
Rate control modes: 2_PASS_QUALITY, 2_PASS_FRAMESIZE_CAP, CBR_IFRAME_2PASS
Maxwell: 238 fps, 240 fps, 239 fps
Kepler (GRID): 119 fps, 118 fps, 118 fps
Performance measured on GRID K520 with GRID SDK NVENC performance benchmarking application
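A rough way to read these throughput figures is as a number of simultaneous real-time streams. The sketch below divides the lowest measured fps per configuration by an assumed 30 fps per stream. It assumes the higher figures belong to Maxwell (in line with the ~240 vs ~500 fps figures on the Kepler vs. Maxwell slide), and it ignores session limits and any other overhead. The function name and the 30 fps target are assumptions, not from the slides.

```python
def realtime_streams(encoder_fps, stream_fps=30):
    """Whole number of streams that fit if each stream needs stream_fps frames/s."""
    if stream_fps <= 0:
        raise ValueError("stream_fps must be positive")
    return int(encoder_fps // stream_fps)


# Lowest value reported per GPU generation and resolution in the charts above.
measured_fps = {
    ("Kepler", "720p"): 231,
    ("Maxwell", "720p"): 503,
    ("Kepler", "1080p"): 118,
    ("Maxwell", "1080p"): 238,
}

streams = {key: realtime_streams(fps) for key, fps in measured_fps.items()}
```

Under the 30 fps assumption, the Maxwell 720p figure works out to 16 streams, which lines up with the "Up to 16x HD" entry in the capabilities table.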
ENCODING QUALITY VS X264 – ASSUMPTIONS
• Infinite GOP IPPP…
• VBV buffer = bitrate/framerate
• x264
  — Zero latency
  — CRF = 24
  — Preset = faster
• NVENC
  — Preset = LOW_LATENCY_HQ
  — RC = 2-pass-quality
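The VBV assumption above amounts to a buffer that holds exactly one frame's worth of bits at the target bitrate. A small worked example, using the 5 Mbps and 12 Mbps bitrates of the test clips and an assumed 30 fps frame rate (the slides do not state the frame rate):

```python
def vbv_buffer_bits(bitrate_bps, framerate_fps):
    """VBV buffer size in bits when the buffer equals bitrate / framerate."""
    if framerate_fps <= 0:
        raise ValueError("framerate_fps must be positive")
    return bitrate_bps / framerate_fps


ASSUMED_FPS = 30  # assumption: the frame rate is not given in the slides

for bitrate_bps in (5_000_000, 12_000_000):
    bits = vbv_buffer_bits(bitrate_bps, ASSUMED_FPS)
    print(f"{bitrate_bps / 1e6:.0f} Mbps -> {bits:.0f} bits ({bits / 8 / 1000:.1f} kB) per frame")
```

With a one-frame buffer, no frame can be much larger than the average, which is what makes this a low-latency configuration.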
NVENC/X264 QUALITY COMPARISON
[Chart: Titan Fall 720p, 5 Mbps, Low-latency HQ. Series: PSNR NVENC, PSNR x264, SSIM NVENC, SSIM x264. Axes: PSNR Y (dB), 0 to 45; SSIM Y, 0.5 to 1.2; horizontal axis ticks 1 to 901.]
NVENC/X264 QUALITY COMPARISON
[Chart: Bunny 1080p, 12 Mbps, Low-latency HQ. Series: PSNR NVENC, PSNR x264, SSIM NVENC, SSIM x264. Axes: PSNR Y (dB), 0 to 60; SSIM Y, 0.7 to 1.5; horizontal axis ticks 1 to 501.]
QUALITY COMPARISON – PSNR
PSNR Comparison - x264 vs NVENC (PSNR Y, dB)
Clip | PSNR NVENC | PSNR x264 | PSNR Difference
Bunny 1080p | 47.24 dB | 43.71 dB | 3.52 dB
NFS Rivals 720p | 34.05 dB | 33.18 dB | 0.87 dB
NFS Rivals 1080p | 35.51 dB | 34.39 dB | 1.12 dB
Titan Fall 720p | 30.58 dB | 29.78 dB | 0.80 dB
Titan Fall 1080p | 28.13 dB | 30.63 dB | -2.50 dB
WoT - 3 1280 × 768 | 34.15 dB | 33.41 dB | 0.74 dB
WoT - 12 1280 × 768 | 35.60 dB | 34.72 dB | 0.87 dB
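For reference, PSNR on the luma (Y) plane is 10·log10(peak² / MSE) between the source and the decoded frame. The sketch below applies that standard definition to flat lists of 8-bit samples. The function name and the sample data are illustrative, and this is not the tool used to produce the numbers above.

```python
import math


def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equally sized sequences of samples."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(peak * peak / mse)


# Tiny made-up example: every sample is off by one level, so MSE = 1.
source_y = [100, 100, 100, 100]
decoded_y = [101, 99, 101, 99]
value_db = psnr(source_y, decoded_y)
```

Because the scale is logarithmic, a difference of about 3 dB corresponds to roughly a factor of two in mean squared error.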
QUALITY COMPARISON – SSIM
SSIM Comparison - x264 vs NVENC (SSIM Y)
Clip | SSIM NVENC | SSIM x264 | SSIM Difference
Bunny 1080p | 0.9874 | 0.9808 | 0.01
NFS Rivals 720p | 0.9217 | 0.9103 | 0.01
NFS Rivals 1080p | 0.9388 | 0.9269 | 0.01
Titan Fall 720p | 0.8350 | 0.8073 | 0.03
Titan Fall 1080p | 0.8309 | 0.8567 | -0.03
WoT - 3 1280 × 768 | 0.9101 | 0.8930 | 0.02
WoT - 12 1280 × 768 | 0.9169 | 0.9027 | 0.01
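SSIM compares means, variances, and covariance of two images rather than raw pixel error. The sketch below is a simplified, single-window variant computed over all samples at once. Standard SSIM is computed over local windows and then averaged, and the slides do not say which implementation was used, so treat the function and the sample data as illustrative assumptions only.

```python
def global_ssim(x, y, peak=255):
    """Single-window SSIM over two equally sized sequences of samples."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Stabilizing constants from the usual SSIM definition.
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Made-up luma samples: a source block and a slightly distorted copy.
source_y = [10, 50, 90, 130]
decoded_y = [12, 48, 95, 120]
score = global_ssim(source_y, decoded_y)
```

Identical inputs score 1, and the score drops as structure, brightness, or contrast diverge.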
QUESTIONS?