FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No....

FPGA-based Real-Time Super-Resolution System

for Ultra High Definition Videos

Zhuolun He, Hanxian Huang, Ming Jiang, Yuanchao Bai, and Guojie Luo

Peking University

FCCM 2018

Ultra High Definition (UHD) Technology

UHD Television UHD Projector

UHD Phone UHD Camera

Content? • Limited Creators

• High network

bandwidth cost

• Huge storage cost

High-Resolution <---> Low-Resolution

Desired HR Image 𝑿

Down-Sampling

Observed LR Image 𝒀

Noise 𝑛

Spectrum of Super Resolution Methods

Interpolation

• Fast

• Easy to implement

• Blurry results

Model-based

• Interpretable

• High complexity

• Assumed known

blur kernel/noise

Example-based

• State-of-the-art quality

• High complexity

• Training data needed

Complicated Simple

Model-based Method is also Compute-Intensive

Desired HR Image 𝑿

Down-Sampling

Observed LR Image 𝒀

Noise 𝑛

Iteration 1 Iteration 2 X

Model-based methods may not be needed

• The computation also has a layered structure

• We can use a neural network to approximate

Total Variation Distribution

Fact: Blocks contain DIFFERENT

amount of information

(NOT equally important)

Insight: Use DIFFERENT upscaling

methods for different blocks

A Hybrid Algorithm

INPUT: LR Image 𝒀

1. Crop 𝒀 into sub-images {𝒚}

2.1. 𝒙 <- Upscale(𝒚) IF 𝑴 𝒙 > 𝑻

2.2. ELSE 𝒙 <- CheapUpscale(𝒚)

3. Mosaic 𝑿 with {𝒙} OUTPUT: HR Image 𝑿

M: Total Variation (TV)

Upscale: FSRCNN-s

CheapUpscale: Intepolation

Overall System

Low-Res

High-Res

Pipelined Neural Network

Conv(32, 1, 5) Conv(5, 3, 5) Conv(5, 1, 32)

Interpolator

Accelerator

Deconv(32, 9, 1) Conv(1, 5, 32)

Feature Extration Shrinking Mapping Expanding Deconvolution

Dispatcher

Stencil Access of TV Computation

x[offset]

x[right]

f2 ……

x[down]

height

width 𝑁

(𝛻𝑥)offset = 𝑎𝑏𝑠(𝑥 right − 𝑥 offset ) + 𝑎𝑏𝑠(𝑥 down − 𝑥 offset )

x[offset]

x[right]

f2 ……

x[down]

Micro-architecture for Stencil Computation

s1 buffer1(𝑁-1) s2 s3

f1 f2 f3

buffer2(1)

x[i][j]…x[i-1][j+2] x[i-1][j+1]

x[i][j]

(x[down])

x[i-1][j+1]

(x[right])

x[i-1][j]

(x[offset])

Buffering System for array x

Computation Kernel (𝛻𝑥)𝑖,𝑗

Convolutional Neural Network

Pipelined Neural Network

Conv(32, 1, 5) Conv(5, 3, 5) Conv(5, 1, 32) Deconv(32, 9, 1) Conv(1, 5, 32)

Feature Extraction Shrinking Mapping Expanding Deconvolution

Convolution 𝑁i 𝑁i+1

ni ci fi

Input Compute Output

sliding window(s)

Conv(ci, fi, ni)

Deconvolution

Input Compute Output

sliding window(s)

𝑁i 𝑁i+1

Deconv(ci, fi, ni) ci fi ni

Pipeline Balancing

Layer 𝒄𝒊 𝒇𝒊 𝒏𝒊 𝑵𝒊 #Mult. Ideal

#DSP Ideal II

Alloc.

#DSP Alloc. II

Extraction 1 5 32 36 819200 201 4076 200 4096

Shrinking 32 1 5 32 163840 40 4096 32 4096

Mapping 5 3 5 32 202500 50 4050 45 4500

Expanding 5 1 32 30 144000 35 4115 32 4500

Deconvolution 32 9 1 30 2332800 573 4072 519 4500

Overall - - - - 3662340 899 4115 828 4500

Available (ZC706) - - - - - 900 - 900 -

Sub-image Size

• Padding

• 𝑁𝑖 ≡ 𝑘 + 𝑓𝑖 − 1#𝐶𝑜𝑛𝑣𝑖

• If sub-image size too small

• Large border-to-block ratio

• Limited by memory bandwidth

• If sub-image size too large

• Large feature maps

• Limited by on-chip BRAM capacity

Sub-image Size vs. Performance vs. #mult.

10 20 30 40 50

Block Size

PSNR SSIM

8,00E+09

8,50E+09

9,00E+09

9,50E+09

10 20 30 40 50

Block Size

Multiplications

Overall Comparisons

• Compared six configurations

No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM

1 None Interpolation 6.6*10^7 35.51 0.9138

2 None Neural Network 8.2*10^9 38.55 0.9421

3 Blocking Interpolation 6.6*10^7 35.51 0.9138

4 Blocking Neural Network 8.4*10^9 38.55 0.9420

5 Blocking Mixed-Random 2.2*10^9 36.10 0.9211

6 Blocking Mixed-TV 2.2*10^9 37.36 0.9287

+3.04dB

No Performance Loss

+1.26dB

-1.19dB -75%

Example Outputs

Configuration 1

None/Interpolation

Configuration 2

None/Neural Network

Configuration 3

Blocking/Interpolation

Configuration 4

Blocking/Neural Network

Configuration 5

Blocking/Mixed-Random

Configuration 6

Blocking/Mixed-TV

Summary Flow

• Crop each frame into blocks • Suitable for low-level (pixel-level) tasks

• GOOD: on-chip buffer friendly

• BAD: Computation overheads

• Dispatch blocks according to TV value • Micro-architecture for buffering system

• Fully-pipelined CNN for upscaling • Sliding window for convolution/deconvolultion

• Pipeline balancing

• Performance • Full-HD (1920x1080) -> Ultra-HD (3940x2160): 31.7fps

Thank you!

TV Threshold vs. Performance vs. #mult.

30 40 50 60 70

TV Threshold

PSNR SSIM

0,0E+00

5,0E+09

1,0E+10

1,5E+10

2,0E+10

2,5E+10

30 40 50 60 70

TV Threshold

Multiplications

Resource Utilizations

Component BRAM DSP FF LUT

Dispatcher 1 2 618 1138

Neural Network 178 844 63149 98439

Interpolator 0 10 1414 3076

Total 327 858 66261 103714

Available 1090 900 437200 218600

Utilization (%) 30 95 15 47

FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No....

Documents

SSIM MEU ALUNO COM AUTISMO APRENDEU MATEMÁTICA

Data Strategy Status Update SSIM RID Technology Strategies

Ssim Maintenance Inst Mus 2

Pruebas completas de interfaces de audio y vídeocdn.rohde-schwarz.com/pws/dl_downloads/dl_common...ción matemática. Basada en dos métricas de uso indus-trial, PSNR y SSIM, la evaluación

Multiple Watermarking Techniques using Visual Cryptography ... · psnr =109.5163 db psnr =109.5165 db PSNR =109.5099 dB PSNR =109.5139 dB International Journal of Scientific & Engineering

Ssim E-prospectus.pdf

SSIM-inspired image restoration using sparse representationz70wang/publications/EURASIP12.pdf · 2012. 7. 26. · RESEARCH Open Access SSIM-inspired image restoration using sparse

Eﬃcient Motion Weighted Spatio-Temporal Video SSIM Index

Katalóg SSiM

ORGANIZADORA: EDITORA DO BRASIL SSim u prendo Caligraﬁa

Image Quality Assessment: SSIM

ANALYTICAL RELATION & COMPARISON OF PSNR AND SSIM … · 5. Context-based measures, that is penalties based on various functional of the multidimensional context probability. 6. Human

Learning for Video Compression with Hierarchical Quality ... · from Figure 13 that both our PSNR and MS-SSIM models have less compression artifacts than H.265, in case that our models

Generative Adversarial Networks arXiv:1809.00219v2 [cs.CV ... · Noise Ratio (PSNR) value [5,6,7,1,8,9,10,11,12]. However, these PSNR-oriented approaches tend to output over-smoothed

Employee Engagement Survey Scripted - PSNR

EMPLOYEE HAND BOOK - SSIM

Training Requirements for: §35.50 Radiation Safety Officers §35.51 Authorized Medical Physicists §35.55 Authorized Nuclear Pharmacists

esenvolvimento sustentável ssim contribuímos para o d a

Battery lifetime and PSNR quality-scalable video

Nature and Scope of Managerial Economics-SSIM