Deep Learning on Computer Vision Applications

Source: media.ee.ntu.edu.tw/crash_course/2018/dl/dlcv_app.pdf

Yen-Cheng Liu

Research Assistant, Dept. of EE, National Taiwan University

Outline

• Convolutional Neural Network (CNN)
– Standard Model
– Image-to-Image Model

• Applications
– Super-resolution
– Compressed Artifact Reduction
– Reconstruct Compressed Image
– Context Inpainting
– Colorization
– Sketch Simplification
– Style Transfer
– Depth Estimation
– Semantic Segmentation


No Math in this tutorial


How I feel when no math appears in a paper

Convolutional Neural Network

• CNN History
– In the 1990s, CNNs were the dominant tool, but they then fell out of fashion, particularly in computer vision, with the rise of support vector machines (SVMs).
– In 2012, CNNs became popular again due to their significant success on the ILSVRC.

• Standard structure
– Convolution layers and pooling layers, followed by fully connected layers

Convolutional Neural Network

• Component
– Convolution layers, pooling layers, and fully connected layers
– Purpose: originally for classification (e.g. LeNet)

[Figure: Convolution Layer (Input Feature Map, Filters, Output Feature Map), Pooling Layer, Fully-Connected Layer. Image Credit: Stanford CS231n]
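The three components named above can be sketched as a tiny forward pass. This is a minimal NumPy illustration, not the implementation of any particular network: the input size (1×28×28), the filter count, the 5×5 kernels, the ReLU, and the 10 output scores are all assumptions chosen only to make the shapes concrete, and the weights are random rather than trained.

```python
import numpy as np

def conv2d(x, w, b):
    """Valid cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]                 # (C_in, k, k)
            out[:, i, j] = np.tensordot(w, patch, axes=3) + b
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling. H and W must be divisible by size."""
    c, h, w = x.shape
    return x.reshape(c, h // size, size, w // size, size).max(axis=(2, 4))

def fully_connected(x, w, b):
    """Flatten the feature maps and apply one affine layer."""
    return w @ x.ravel() + b

rng = np.random.default_rng(0)
image = rng.random((1, 28, 28))                            # assumed input size
conv_w, conv_b = rng.normal(size=(4, 1, 5, 5)), np.zeros(4)
features = np.maximum(conv2d(image, conv_w, conv_b), 0.0)  # ReLU, shape (4, 24, 24)
pooled = max_pool2d(features)                              # shape (4, 12, 12)
fc_w, fc_b = rng.normal(size=(10, 4 * 12 * 12)), np.zeros(10)
scores = fully_connected(pooled, fc_w, fc_b)               # shape (10,)
```

The convolution keeps the spatial layout of the image, the pooling shrinks it, and only the fully connected layer collapses everything into a fixed-length score vector.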

Standard CNN Model

[Figure: Input → convolution layers and pooling layers → fully connected layers → Output]

What you did yesterday……

[Figure: Ground Truth Label + Human, Machine]

Standard CNN Model - Example

[Figure: Input → Digit Recognition → Output: “9” “5” “2” “7”]
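For the digit recognition example, the last step of a standard classifier is usually a softmax over ten class scores, compared against the ground truth label with a cross-entropy loss. The sketch below assumes that setup; the score values are made up for illustration and do not come from a trained model.

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into probabilities (shifted for numerical stability)."""
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

def cross_entropy(probs, label):
    """Negative log-probability assigned to the ground truth label."""
    return float(-np.log(probs[label]))

# Hypothetical scores for the ten digit classes 0..9.
scores = np.array([0.1, 0.3, 2.0, 0.2, 0.1, 0.5, 0.1, 0.4, 0.2, 3.5])
probs = softmax(scores)
predicted_digit = int(np.argmax(probs))   # 9, the class with the largest score
loss = cross_entropy(probs, label=9)      # small when the prediction is right
```

Training lowers this loss over the labeled dataset, which is why the recognition examples that follow all pair a dataset with labels.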

[Figure: Object Datasets → CNN; Input → Object Recognition → Output: “Dog”]

[Figure: Face Datasets → CNN; Input → Face Recognition → Output: “柯P”]

[Figure: X Datasets + Label → CNN → X Recognition, where X stands for any recognition task]

Image-to-Image Model

Jonathan Long, Evan Shelhamer, Trevor Darrell (from UC Berkeley)

[Figure: standard model (Input → convolution layers → fully connected layers → Output) compared with the image-to-image model (Input Image → convolution layers → up-sampling → Output Image)]
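The key property of the image-to-image model is that the output has the same spatial layout as the input: the resolution is reduced and then restored by up-sampling. A rough sketch of that shape bookkeeping follows. Average pooling stands in for the learned strided convolutions and nearest-neighbour repetition stands in for learned up-sampling; both are simplifying assumptions, not what any of the cited networks actually use.

```python
import numpy as np

def downsample(x, factor=2):
    """Average pooling as a stand-in for a strided convolution. x: (C, H, W)."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def upsample_nearest(x, factor=2):
    """Nearest-neighbour up-sampling: repeat every pixel factor times per axis."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

image = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
encoded = downsample(downsample(image))                    # shape (2, 2, 2)
decoded = upsample_nearest(upsample_nearest(encoded))      # shape (2, 8, 8)
```

Because no fully connected layer flattens the feature maps, every output pixel still corresponds to a location in the input image.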

Applications


Super-resolution

- Input: Low-resolution image Y
- Output: High-resolution image F(Y)

Image Credit: Wei-Sheng Lai @ UC Merced
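To make the mapping from Y to F(Y) concrete, here is a small sketch of how a super-resolution result is typically scored. It assumes the common protocol of creating Y by down-sampling a known high-resolution image and measuring PSNR against the original. The function F below is plain nearest-neighbour up-scaling, used only as a placeholder for a learned network, and the random image is a stand-in for real data.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((reference - estimate) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def make_low_res(x, factor=2):
    """Create the low-resolution input Y by block averaging. x: (H, W)."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upscale_nearest(y, factor=2):
    """Placeholder for F: nearest-neighbour up-scaling instead of a trained CNN."""
    return y.repeat(factor, axis=0).repeat(factor, axis=1)

rng = np.random.default_rng(0)
high_res = rng.random((16, 16))     # stand-in for a ground truth image
y = make_low_res(high_res)          # low-resolution image Y, shape (8, 8)
f_y = upscale_nearest(y)            # high-resolution estimate F(Y), shape (16, 16)
score = psnr(high_res, f_y)
```

A learned F is expected to score higher than such an interpolation baseline on natural images.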

[1] “Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution”, Lai et al., CVPR ‘17
[2] “Image super-resolution using deep convolutional networks”, Dong et al., TPAMI ‘16
[3] “Accelerating the super-resolution convolutional neural network”, Dong et al., ECCV ‘16
[4] “Accurate image super-resolution using very deep convolutional networks”, Kim et al., CVPR ‘16
[5] “Deeply-recursive convolutional network for image super-resolution”, Kim et al., CVPR ‘16
[6] “Learning a Deep Convolutional Network for Image Super-Resolution”, Dong et al., ECCV ‘14

Reconstructing Compressed Image

• Kulkarni et al. [7] present a non-iterative and extremely fast algorithm to reconstruct images from compressively sensed (CS) random measurements

[7] “ReconNet: Non-Iterative Reconstruction of Images from Compressively Sensed Random Measurements”, Kulkarni et al., CVPR ‘16
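The idea of compressively sensed measurements and non-iterative reconstruction can be illustrated in a few lines. This sketch is not ReconNet itself: the 8×8 block size and the measurement rate of 0.25 are assumptions for the demo, and the pseudo-inverse is swapped in for the learned network so that reconstruction is a single matrix multiplication.

```python
import numpy as np

rng = np.random.default_rng(0)
block_size = 8                        # assumed block size for the demo
n = block_size * block_size           # 64 pixels per block
measurement_rate = 0.25               # assumed ratio of measurements to pixels
m = int(n * measurement_rate)         # 16 measurements

# Random Gaussian measurement matrix: each row is one random measurement.
phi = rng.normal(size=(m, n)) / np.sqrt(m)

block = rng.random((block_size, block_size))
y = phi @ block.ravel()               # compressively sensed measurements, shape (16,)

# Non-iterative reconstruction: one fixed linear map applied to y.
# The pseudo-inverse is a stand-in for the learned layers, not the paper's method.
decoder = np.linalg.pinv(phi)         # shape (64, 16)
reconstruction = (decoder @ y).reshape(block_size, block_size)
```

Classical CS solvers iterate an optimization per image, whereas a feed-forward mapping like this costs one pass, which is where the speed advantage comes from.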

Image Artifacts Removal

• Compression artifacts can be removed by using AR-CNN [8]

[8] “Compression Artifacts Reduction by a Deep Convolutional Network”, Dong et al., ICCV ‘15

Context Inpainting

• Pathak et al. [9] generate the contents of an arbitrary image region conditioned on its surroundings using a CNN, trained together with a Generative Adversarial Net

[9] “Context Encoders: Feature Learning by Inpainting”, Pathak et al., CVPR ‘16.
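The inpainting setup can be sketched as masking a region, predicting it from the surroundings, and scoring only the missing pixels. The code below is an illustration under assumptions: the centered square hole, the zero fill, and the plain L2 reconstruction loss are choices made for the demo, the adversarial part is omitted, and a constant fill stands in for the network output.

```python
import numpy as np

def center_mask(height, width, hole):
    """Boolean mask that is True inside a centered hole x hole square."""
    mask = np.zeros((height, width), dtype=bool)
    top, left = (height - hole) // 2, (width - hole) // 2
    mask[top:top + hole, left:left + hole] = True
    return mask

def masked_l2_loss(prediction, target, mask):
    """Mean squared error measured only on the missing region."""
    diff = (prediction - target)[mask]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
image = rng.random((16, 16))
mask = center_mask(16, 16, hole=8)

network_input = np.where(mask, 0.0, image)                 # region removed
# Placeholder "prediction": the mean of the visible surroundings.
prediction = np.full_like(image, image[~mask].mean())
completed = np.where(mask, prediction, image)              # paste into the hole
loss = masked_l2_loss(prediction, image, mask)
```

Only the hole is taken from the prediction, so the visible pixels of the completed image are always the original ones.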


Face Inpainting

• Input: Corrupted Facial Image
• Output: Complete Facial Image

[10] “Generative Face Completion”, Li et al., CVPR ‘17.
[11] “DeMeshNet: Blind Face Inpainting for Deep MeshFace Verification”, Zhang et al., CVPR ‘17

Face Rotation

• Input: Facial Image
• Output: Facial Image at a given angle

[12] “Rotating Your Face Using Multi-task Deep Neural Network”, Yim et al., CVPR ‘15.
[13] “Disentangled Representation Learning GAN for Pose-Invariant Face Recognition”, Tran et al., CVPR ‘17

Attribute Manipulation

[14] “StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation”, Choi et al., CVPR ‘18

Colorization

• Cheng et al. [15] investigate the colorization problem, which converts a grayscale image to a colorful version

[15] “Deep Colorization”, Cheng et al., ICCV ‘15.

Colorization

• Iizuka et al. [16] propose a technique to automatically colorize grayscale images that combines both global priors and local image features.
– Global image priors are extracted from the entire image
– Local image features are computed from small image patches

[16] “Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification”, Iizuka et al., SIGGRAPH ‘16
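One simple way to combine a global prior with local features is to copy the global feature vector to every spatial position, stack it with the local feature map, and mix the two with a per-pixel linear layer. The sketch below shows that idea under assumptions: the channel counts, the ReLU, and the random weights are illustrative and are not taken from [16].

```python
import numpy as np

def fuse_global_local(local_features, global_vector, weight, bias):
    """Fuse per-pixel local features with one global vector.

    local_features: (C_l, H, W), global_vector: (C_g,),
    weight: (C_out, C_l + C_g), bias: (C_out,). Returns (C_out, H, W).
    """
    _, h, w = local_features.shape
    # Copy the global vector to every spatial position.
    tiled = np.broadcast_to(global_vector[:, None, None], (global_vector.size, h, w))
    stacked = np.concatenate([local_features, tiled], axis=0)
    # Per-pixel linear layer over the stacked channels (a 1x1 convolution).
    fused = np.tensordot(weight, stacked, axes=([1], [0])) + bias[:, None, None]
    return np.maximum(fused, 0.0)

rng = np.random.default_rng(0)
local_features = rng.random((4, 6, 5))    # assumed: 4 local channels on a 6x5 grid
global_vector = rng.random(3)             # assumed: 3 global channels
weight, bias = rng.normal(size=(2, 7)), np.zeros(2)
fused = fuse_global_local(local_features, global_vector, weight, bias)  # (2, 6, 5)
```

Since the global vector has no spatial size of its own, the same fusion works for feature maps of any height and width.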

Sketch Simplification

• Simo-Serra et al. [17] propose a CNN structure to simplify sketch drawings
• This architecture can process images of any resolution because it is a fully convolutional network
• The input is a more challenging rough raster image, instead of a vector image

[Figure: Input Image, Output Image]

[17] “Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup”, Simo-Serra et al., SIGGRAPH ‘16
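The claim that a fully convolutional network handles any resolution follows from how a convolution is defined: the same small kernel slides over whatever grid it is given, while a fully connected layer has a weight matrix tied to one input size. A minimal demonstration, with an averaging kernel and image sizes chosen arbitrarily for the example:

```python
import numpy as np

def conv2d_same(x, kernel):
    """Single-channel convolution with zero padding. x: (H, W), kernel: (k, k), k odd."""
    k = kernel.shape[0]
    padded = np.pad(x, k // 2)
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

kernel = np.ones((3, 3)) / 9.0                    # arbitrary 3x3 averaging filter

# The same kernel is applied to two different resolutions without any change.
small = conv2d_same(np.ones((5, 7)), kernel)      # shape (5, 7)
large = conv2d_same(np.ones((12, 9)), kernel)     # shape (12, 9)

# A fully connected layer sized for 5x7 inputs cannot accept a 12x9 image:
fc_weight = np.ones((4, 5 * 7))
# fc_weight @ np.ones((12, 9)).ravel()  would raise ValueError (35 vs 108 inputs)
```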

Sketch-to-Photo Inversion

[18] “Scribbler: Controlling Deep Image Synthesis with Sketch and Color”, Sangkloy et al. , CVPR 2017


Attribute Manipulation + Style Transfer

• No Label Supervision

[19] “Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation”, Liu et al., CVPR ‘18


Artistic Image Style Transfer

• Gatys et al. [20] propose a system which uses neural representations to separate and recombine the content and style of arbitrary images

[20] “A Neural Algorithm of Artistic Style”, Gatys et al., CVPR ‘16


• Based on the VGG-19 framework
• Extract the feature maps of a single photo and an artwork to generate an image which mixes content and style

[Figure: VGG-19 Content / Style Representation]
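The separation of content and style can be sketched with two losses on feature maps: content compares the feature maps directly, while style compares their Gram matrices, which keep channel correlations but discard spatial layout. In the sketch below the feature maps are random arrays standing in for VGG-19 activations, and the normalization and loss weights are assumptions rather than the values used in [20].

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of a feature map. features: (C, H, W)."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)        # (C, C); the normalization is an assumption

def content_loss(generated, content):
    """Content: feature maps should match position by position."""
    return float(np.mean((generated - content) ** 2))

def style_loss(generated, style):
    """Style: only the Gram matrices (texture statistics) should match."""
    return float(np.mean((gram_matrix(generated) - gram_matrix(style)) ** 2))

def total_loss(generated, content, style, content_weight=1.0, style_weight=1e3):
    """Weighted sum; the weights are illustrative."""
    return content_weight * content_loss(generated, content) + style_weight * style_loss(generated, style)

rng = np.random.default_rng(0)
photo_features = rng.random((3, 4, 4))      # stand-in for features of the photo
artwork_features = rng.random((3, 4, 4))    # stand-in for features of the artwork
generated_features = photo_features.copy()  # start the generated image from the photo
loss = total_loss(generated_features, photo_features, artwork_features)
```

In the optimization-based approach, the pixels of the generated image are updated by gradient descent on this total loss.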


• Feed-forward CNN model (1000x faster than Gatys et al.)
• The loss network is based on a pre-trained VGG-16 framework

[21] “Perceptual losses for real-time style transfer and super-resolution”, Johnson et al., ECCV ‘16.

Artistic Video Style Transfer

https://www.youtube.com/watch?v=Khuj4ASldmU

[22] “Artistic style transfer for videos”, Ruder et al., arXiv ‘16.

Artistic 360 Video Style Transfer

https://www.youtube.com/watch?v=pkgMUfNeUCQ

[23] “Artistic style transfer for videos and spherical images”, Ruder et al., arXiv ‘17.

Deep Photo Enhancer

[24] “Deep Photo Enhancer: Unpaired Learning for Image Enhancement from Photographs with GANs”, Chen et al., CVPR ‘18

Single Image Depth Estimation

• Depth Estimation CNN
– Input: RGB image
– Output: Depth/Disparity Estimation

[25] “Unsupervised Monocular Depth Estimation with Left-Right Consistency”, Godard et al., CVPR ‘17
[26] “Unsupervised Learning of Depth and Ego-motion from Video”, Zhou et al., CVPR ‘17
[27] “Semi-Supervised Deep Learning for Monocular Depth Map Prediction”, Kuznietsov et al., CVPR ‘17
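Depth and disparity are interchangeable outputs because, for a rectified stereo pair, depth equals focal length times baseline divided by disparity. The short sketch below applies that relation; the focal length, baseline, and disparity values are hypothetical numbers chosen for the example.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    """Depth (metres) from disparity (pixels) for a rectified stereo pair."""
    return focal_length_px * baseline_m / np.maximum(disparity, eps)

# Hypothetical predicted disparity map and camera parameters.
disparity = np.array([[64.0, 32.0],
                      [16.0, 8.0]])
depth = disparity_to_depth(disparity, focal_length_px=720.0, baseline_m=0.5)
# depth is [[5.625, 11.25], [22.5, 45.0]]: halving the disparity doubles the depth
```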

Semantic Segmentation

• Semantic Segmentation CNN (Pixel-wise classification)
– Input: RGB image
– Output: Pixel-wise class prediction

[28] “Fully Convolutional Networks for Semantic Segmentation”, Long et al., CVPR ‘15
[29] “Pyramid Scene Parsing Network”, Zhao et al., CVPR ‘17
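Pixel-wise classification means the network emits one score per class at every pixel, and the predicted label map is the per-pixel argmax. The sketch below shows that step together with mean intersection-over-union, a common way to score the result. The tiny 2×2 score maps and the target labels are made up for illustration.

```python
import numpy as np

def predict_labels(class_scores):
    """Per-pixel argmax. class_scores: (num_classes, H, W) -> labels: (H, W)."""
    return np.argmax(class_scores, axis=0)

def mean_iou(prediction, target, num_classes):
    """Mean intersection-over-union over the classes that appear at all."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = prediction == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue                      # class absent from both maps
        ious.append(np.logical_and(pred_c, target_c).sum() / union)
    return float(np.mean(ious))

# Hypothetical scores for 3 classes on a 2x2 image.
class_scores = np.array([
    [[2.0, 0.1], [0.3, 0.2]],             # class 0
    [[0.5, 1.5], [0.1, 0.9]],             # class 1
    [[0.1, 0.2], [1.2, 0.3]],             # class 2
])
labels = predict_labels(class_scores)     # [[0, 1], [2, 1]]
target = np.array([[0, 1], [2, 2]])
score = mean_iou(labels, target, num_classes=3)   # (1 + 1/2 + 1/2) / 3 = 2/3
```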





https://www.youtube.com/watch?v=qWl9idsCuLQ

Today’s Practice!

Style Transfer + Semantic Segmentation

• Champandard [30] introduces a novel concept to augment artistic style algorithms with semantic annotation

[Figure: Doodle by Human, Result; Painting by CNN or Human]

[30] “Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks”, Champandard, arXiv, Mar. 2016


Summary

Today’s Presentation

• Style Transfer
• Semantic Segmentation
• Depth Estimation
• Photo Enhancer
• Attribute Manipulation
• Super-Resolution
• Colorization
• Image Artifacts Removal
• Context Inpainting

[Figure: Computer Vision, Today’s Presentation]

Conclusion

• Research areas including computer vision, image processing, and computer graphics have achieved great success based on deep learning
• Convolutional neural networks are still evolving and continue to achieve remarkable performance
– Unsupervised learning
– Meta learning (learning to learn)
– Explanation of neural networks
