Image-to-Image Translation with Conditional Adversarial Nets (Pix2Pix)
& Perceptual Adversarial Networks for Image-to-Image Transformation (PAN)
2017/10/2 DLHacks Otsubo
Topic: image-to-image “translation”
Info
Pix2Pix [CVPR 2017]
• Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
 - iGAN [ECCV 2016]
 - interactive-deep-colorization [SIGGRAPH 2017]
 - Context-Encoder [CVPR 2016]
 - Image Quilting [SIGGRAPH 2001]
 - Texture Synthesis by Non-parametric Sampling [ICCV 1999]
• University of California
• 178 citations

PAN [arXiv 2017]
• Chaoyue Wang, Chang Xu, Chaohui Wang, Dacheng Tao
• University of Technology Sydney, The University of Sydney, Université Paris-Est
Background
• Many tasks can be regarded as “translation” from an input image to an output image
 - Diverse, task-specific methods exist for them

Is there a single framework to achieve them?
Overview
Pix2Pix
• A general-purpose solution to image-to-image translation using a single framework
 - Single framework: conditional GAN (cGAN)

PAN
• PAN = Pix2Pix − (per-pixel loss) + (perceptual adversarial loss)
Naive Implementation: U-Net (①)

[Figure: U-Net generator trained with ① per-pixel loss (L1/L2)]
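As a reference point, ① is just an elementwise distance between the generator output G(x) and the target y. A minimal PyTorch sketch (the helper name and signature are mine, not from either paper):

import torch.nn.functional as F

def per_pixel_loss(fake, real, kind="l1"):
    # ①: elementwise distance between G(x) and the target y
    if kind == "l1":
        return F.l1_loss(fake, real)  # L1 variant (used by pix2pix)
    return F.mse_loss(fake, real)     # L2 variant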
Pix2Pix (①+②)

[Figure: U-Net generator plus conditional discriminator, adding ② adversarial loss]
Pix2Pix’s loss (①+②)

L_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x}[\log(1 - D(x, G(x)))]   … ②
L_{L1}(G) = \mathbb{E}_{x,y}[\| y - G(x) \|_1]   … ①
G^* = \arg\min_G \max_D L_{cGAN}(G, D) + \lambda \, L_{L1}(G)
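A minimal PyTorch sketch of the combined generator objective, assuming D outputs logits and, being conditional, sees the input x concatenated with the (real or generated) output; the function name is mine, and λ = 100 is the paper’s default:

import torch
import torch.nn.functional as F

def pix2pix_g_loss(G, D, x, y, lam=100.0):
    """Generator objective: ② adversarial loss + lam * ① L1 loss."""
    fake = G(x)
    pred_fake = D(torch.cat([x, fake], dim=1))  # conditional D sees (input, output)
    # non-saturating GAN loss: push D's prediction on fakes toward "real"
    adv = F.binary_cross_entropy_with_logits(
        pred_fake, torch.ones_like(pred_fake))
    return adv + lam * F.l1_loss(fake, y)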
PAN (②+③)

[Figure: transformation network T plus discriminator D; ③ perceptual adversarial loss measured on D’s hidden-layer features]
PAN’s loss (②+③)

L_P(x, y) = \sum_i \lambda_i \| P_i(y) - P_i(T(x)) \|_1   … ③ (L1 norm; P_i = D’s i-th hidden layer)
L_T = \log(1 - D(T(x))) + L_P(x, y)   … ② + ③ (transformation network T)
L_D = -\log D(y) - \log(1 - D(T(x))) + [m - L_P(x, y)]^+   … ② + ③ (m: positive-margin constant)
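A minimal PyTorch sketch of ③ and the hinged discriminator term, assuming D exposes its hidden-layer feature maps; the function names and the default margin value are my assumptions:

import torch
import torch.nn.functional as F

def perceptual_adversarial_loss(feats_real, feats_fake, lambdas):
    """③: weighted L1 distance between D's hidden features of y and T(x)."""
    return sum(lam * F.l1_loss(ff, fr)
               for lam, fr, ff in zip(lambdas, feats_real, feats_fake))

def pan_d_loss(pred_real, pred_fake, perc, m=1.0):
    """② adversarial loss + hinge [m - L_P]^+ on the perceptual term.

    While L_P stays below the margin m, minimizing this pushes D to
    discover new perceptual discrepancies; above m, the hinge vanishes.
    """
    adv = (F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real))
           + F.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake)))
    return adv + torch.clamp(m - perc, min=0.0)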
Example 1: Image De-Raining
• Removing rain from single images via a deep detail network [Fu, CVPR 2017]
• ID-GAN (cGAN) [Zhang, arXiv 2017]
 - per-pixel loss
 - adversarial loss
 - pre-trained VGG’s perceptual loss

[Figure: Input / Output (Ground Truth) examples]
(cf. PAN uses discriminator’s perceptual loss)
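For contrast, a pre-trained-VGG perceptual loss in the style of ID-GAN could look like the sketch below, using torchvision’s VGG16 features; the layer cutoff (relu3_3) is my choice, not necessarily ID-GAN’s:

import torch.nn.functional as F
from torchvision.models import vgg16

_vgg = vgg16(pretrained=True).features[:16].eval()  # up to relu3_3
for p in _vgg.parameters():
    p.requires_grad_(False)  # fixed feature extractor, never trained

def vgg_perceptual_loss(fake, real):
    # distance in a fixed pre-trained feature space
    # (cf. PAN measures it in the discriminator's own, evolving features)
    return F.mse_loss(_vgg(fake), _vgg(real))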
Example 2: Image Inpainting
• Globally and Locally Consistent Image Completion [Iizuka, SIGGRAPH 2017]
• Context Encoders (cGAN) [Pathak, CVPR 2016]
 - per-pixel loss
 - adversarial loss

[Figure: Input / Output (Ground Truth) examples]
Example 3: Semantic Segmentation
Cityscapes / Pascal VOC
• DeepLabv3 [Chen, arXiv 2017]
• PSPNet [Zhao, CVPR 2017]
http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=6

Cell Tracking / CREMI
• Learned Watershed [Wolf, ICCV 2017]
• U-Net [Ronneberger, MICCAI 2015]
http://www.codesolorzano.com/Challenges/CTC/Welcome.html

[Figure: Input / Output (Ground Truth) examples]
Result 1: Image De-Raining

[Figure: de-raining result comparison; entries annotated (≒ pix2pix)]
Result 2: Image Inpainting
Result 3: Semantic Segmentation
Discussion
Why is perceptual adversarial loss so effective?
• vs. no perceptual loss (Pix2Pix)
 - the perceptual loss enables D to detect more discrepancies between real and generated images
• vs. pre-trained VGG perceptual loss (ID-GAN)
 - VGG features tend to focus on content
 - PAN features tend to focus on discrepancies
 - PAN’s loss may also help avoid adversarial examples [Goodfellow, ICLR 2015] (?)
Minor Difference
• Pix2Pix uses a PatchGAN discriminator
 - a small (70×70) patch discriminator
 - the final output of D is the average of the patch discriminator’s responses (applied convolutionally); see the sketch below
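A sketch of the 70×70 PatchGAN along the lines of the pix2pix reference implementation (layer widths C64-C128-C256-C512; in_ch=6 because the conditional D sees input and output concatenated); details may differ from the authors’ code:

import torch.nn as nn

def patch_discriminator(in_ch=6, ndf=64):
    """70x70 PatchGAN: outputs a grid of per-patch real/fake logits."""
    def block(cin, cout, stride, norm=True):
        layers = [nn.Conv2d(cin, cout, kernel_size=4, stride=stride, padding=1)]
        if norm:
            layers.append(nn.BatchNorm2d(cout))
        layers.append(nn.LeakyReLU(0.2, inplace=True))
        return layers

    return nn.Sequential(
        *block(in_ch, ndf, 2, norm=False),  # C64
        *block(ndf, ndf * 2, 2),            # C128
        *block(ndf * 2, ndf * 4, 2),        # C256
        *block(ndf * 4, ndf * 8, 1),        # C512
        nn.Conv2d(ndf * 8, 1, 4, 1, 1),     # 1-channel logit map
    )

Averaging the output map (e.g. D(x).mean()) gives the image-level score: each output “pixel” classifies one 70×70 receptive-field patch.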
To Do
• Implement:
 1. Pix2Pix (patch discriminator)
 2. PAN (patch discriminator)
 3. PAN (normal discriminator)
• Wang et al. might compare 1 with 3.
Implementation
2017/10/17 DLHacks Otsubo
My Implementation
• https://github.com/DLHacks/pix2pix_PAN
• pix2pix
 - https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
• PAN
 - per-pixel loss → perceptual adversarial loss
 - not the same as the paper’s original architecture
 - the number of parameters is the same as in pix2pix
My Experiments
• Facade (label → picture)
• Map (picture → Google map)
• Cityscapes (picture → label)
Result (Facade pix2pix)
Result (Facade PAN)
Result (Map pix2pix)
Result (Map PAN)
Result (Cityscape pix2pix)
Result (Cityscape PAN)
Result (PSNR [dB])
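For reference, the PSNR numbers could be computed as in this minimal sketch (the actual evaluation code in the repo may differ):

import numpy as np

def psnr(a, b, peak=255.0):
    """PSNR in dB between two same-sized uint8 images (higher is better)."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)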
Discussion – Why pix2pix > PAN?
• Is the per-pixel loss needed after all?
• Is the patch discriminator not suited for PAN?
• Is the positive margin m set appropriately?
• (a bad pix2pix implementation in PAN’s paper…?)