[DLHacks Implementation] Perceptual Adversarial Networks for Image-to-Image Transformation


Image-to-Image Translation with Conditional Adversarial Nets (Pix2Pix)

& Perceptual Adversarial Networks for Image-to-Image Transformation (PAN)

2017/10/2 DLHacks Otsubo

Topic: image-to-image “translation”

Info

Pix2Pix [CVPR 2017]
•  Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
•  University of California
•  178 citations
•  Related work:
   -  iGAN [ECCV 2016]
   -  interactive-deep-colorization [SIGGRAPH 2017]
   -  Context-Encoder [CVPR 2016]
   -  Image Quilting [SIGGRAPH 2001]
   -  Texture Synthesis by Non-parametric Sampling [ICCV 1999]

PAN [arXiv 2017]
•  Chaoyue Wang, Chang Xu, Chaohui Wang, Dacheng Tao
•  University of Technology Sydney, The University of Sydney, Université Paris-Est

Background

•  Many tasks can be regarded as “translation” from an input image to an output image
   -  Diverse methods exist for them

Is there a single framework to achieve them all?

Overview

Pix2Pix
•  General-purpose solution to image-to-image translation using a single framework
   -  Single framework: conditional GAN (cGAN)

PAN
•  Pix2Pix − (per-pixel loss) + (perceptual adversarial loss)

Naive Implementation: U-Net (①)

① per-pixel loss (L1/L2)
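Written out (a reconstruction of the loss that appears only as a figure on the slide; G is the U-Net generator):

```latex
\mathcal{L}_{\text{pixel}}(G) = \mathbb{E}_{x,y}\big[\lVert y - G(x) \rVert_1\big]
\quad \text{(or } \lVert \cdot \rVert_2^2 \text{ for the L2 variant)}
```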

Pix2Pix (①+②)

② adversarial loss

Pix2Pix’s loss (①+②)

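The slide shows the objective only as an image; reconstructed from the Pix2Pix paper it reads:

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big]
```
```latex
G^{*} = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G),
\qquad \mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z)\rVert_1\big]
```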

PAN (②+③)

③ perceptual adversarial loss

PAN’s loss (②+③)

[Equation figure: the perceptual terms use the L1 norm; m is a constant (positive margin)]
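A reconstruction of PAN’s losses from the slide’s annotations (L1 norm, constant margin m). Here T is the transformation network, D the discriminator, and \phi_i denotes D’s i-th hidden layer with weights \lambda_i (this notation is mine, following the PAN paper’s idea):

```latex
\mathcal{P}(y, T(x)) = \sum_i \lambda_i \,\lVert \phi_i(y) - \phi_i(T(x)) \rVert_1
```
```latex
\mathcal{L}_T = \log\big(1 - D(T(x))\big) + \mathcal{P}(y, T(x)), \qquad
\mathcal{L}_D = -\log D(y) - \log\big(1 - D(T(x))\big) + \big[\,m - \mathcal{P}(y, T(x))\,\big]^{+}
```

where [\,\cdot\,]^{+} = \max(0, \cdot): D keeps seeking feature spaces in which real and generated images differ, but only up to the margin m.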

Example1 : Image De-Raining

•  Removing rain from single images via a deep detail network [Fu, CVPR2017]
•  ID-GAN (cGAN) [Zhang, arXiv2017]
   -  per-pixel loss
   -  adversarial loss
   -  pre-trained VGG’s perceptual loss
      (cf. PAN uses the discriminator’s perceptual loss)

[Figure: Input / Output / Ground Truth]
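For contrast with PAN, ID-GAN’s pre-trained VGG perceptual loss might look like this minimal PyTorch sketch (the relu3_3 cut-off and the MSE distance are illustrative assumptions):

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Fixed, pre-trained VGG16 feature extractor (here: up to relu3_3)
vgg = models.vgg16(pretrained=True).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # VGG stays frozen; only the generator is trained

def vgg_perceptual_loss(fake, real):
    # distance between fixed VGG features of the output and the ground truth
    return F.mse_loss(vgg(fake), vgg(real))
```

PAN instead measures this distance in the hidden layers of its own, jointly trained discriminator.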

Example2 : Image Inpainting

•  Globally and Locally Consistent Image Completion [Iizuka, SIGGRAPH2017]
•  Context Encoders (cGAN) [Pathak, CVPR2016]
   -  per-pixel loss
   -  adversarial loss

[Figure: Input / Output / Ground Truth]

Example3 : Semantic Segmentation

Cityscapes / Pascal VOC
•  DeepLabv3 [Chen, arXiv2017]
•  PSPNet [Zhao, CVPR2017]
http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=6

Cell Tracking / CREMI
•  Learned Watershed [Wolf, ICCV2017]
•  U-Net [Ronneberger, MICCAI2015]
http://www.codesolorzano.com/Challenges/CTC/Welcome.html

[Figure: Input / Output / Ground Truth]

Result1 : Image De-Raining

[Result figure; settings marked “≒ pix2pix” are loss configurations roughly equivalent to pix2pix]

Result2 : Image Inpainting


Result3 : Semantic Segmentation


Discussion

Why is the perceptual adversarial loss so effective?

vs. no perceptual loss (Pix2Pix)
-  The perceptual loss enables D to detect more discrepancies between true/false images

vs. pre-trained VGG perceptual loss (ID-GAN)
-  VGG features tend to focus on content
-  PAN features tend to focus on discrepancy
-  PAN’s loss may help avoid adversarial examples [Goodfellow, ICLR2015] (?)

Minor Difference

•  Pix2Pix uses PatchGAN
   -  Small (70×70) patch discriminator
   -  Final output of D is the average of the patch discriminator’s responses (applied convolutionally); see the sketch below
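A minimal PyTorch sketch of a 70×70 PatchGAN discriminator (the layer configuration follows the common pix2pix setup; illustrative, not the exact reference code):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """70x70 PatchGAN: each output unit scores one input patch."""
    def __init__(self, in_ch=6, ndf=64):
        super().__init__()
        def block(cin, cout, stride):
            return [nn.Conv2d(cin, cout, 4, stride, 1),
                    nn.BatchNorm2d(cout),
                    nn.LeakyReLU(0.2, inplace=True)]
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ndf, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            *block(ndf, ndf * 2, 2),
            *block(ndf * 2, ndf * 4, 2),
            *block(ndf * 4, ndf * 8, 1),
            nn.Conv2d(ndf * 8, 1, 4, 1, 1),  # 1-channel map of patch responses
        )

    def forward(self, x, y):
        # condition D on the input by concatenating input and (real or fake) output
        return self.net(torch.cat([x, y], dim=1))

# D's final score is the average over the patch responses:
# score = PatchDiscriminator()(x, y).mean()
```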

To Do

•  Implement
   1.  Pix2Pix (patch discriminator)
   2.  PAN (patch discriminator)
   3.  PAN (normal discriminator)
•  Wang et al. might have compared 1 with 3.


Implementation

2017/10/17 DLHacks Otsubo

My Implementation

•  https://github.com/DLHacks/pix2pix_PAN

•  pix2pix
   -  based on https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

•  PAN
   -  per-pixel loss → perceptual adversarial loss (see the sketch below)
   -  not the same architecture as the paper’s original
   -  number of parameters is the same as pix2pix
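A minimal sketch of the loss swap, assuming the discriminator exposes its hidden-layer activations via a hypothetical `d.features(...)`; the weights `lambdas` and margin `m` are illustrative placeholders, not the repo’s actual API:

```python
import torch
import torch.nn.functional as F

def perceptual_discrepancy(feats_real, feats_fake, lambdas):
    # weighted L1 distance between D's hidden activations on real vs. generated images
    return sum(lam * F.l1_loss(fr, ff)
               for lam, fr, ff in zip(lambdas, feats_real, feats_fake))

# Generator step: minimize the discrepancy (plus the usual adversarial term):
#   loss_G = adv_loss_G + perceptual_discrepancy(d.features(real), d.features(fake), lambdas)
# Discriminator step: increase the discrepancy, but only up to the margin m:
#   loss_D = adv_loss_D + torch.relu(m - perceptual_discrepancy(d.features(real), d.features(fake), lambdas))
```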

My Experiments

•  Facade (label → picture)
•  Map (picture → Google map)
•  Cityscapes (picture → label)

Result (Facade pix2pix)


Result (Facade PAN)


Result (Map pix2pix)


Result (Map PAN)


Result (Cityscape pix2pix)


Result (Cityscape PAN)


Result (PSNR[dB])


Discussion – Why pix2pix > PAN?

•  Is the per-pixel loss needed after all?
•  Is the patch discriminator unsuited to PAN?
•  The choice of the positive margin m?
•  (A bad pix2pix implementation in PAN’s paper…?)
