44
GauGAN/SPADE Semantic Image Synthesis with Spatially Adaptive Normalization Ming-Yu Liu NVIDIA

GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

GauGAN/SPADESemantic Image Synthesis with Spatially

Adaptive Normalization

Ming-Yu Liu

NVIDIA

Page 2: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Image credit: https://www.smithsonianmag.com/history/journey-oldest-cave-paintings-world-180957685/

Page 3: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Rock Brush

Cave painting By Gauguin By fabulouswalrususing MS Paint

By Pablo Munoz Gomezusing NVIDIA GauGAN

Digital revolution AI revolution

Page 4: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Caveman cartoon image credit: https://canchamthailand.org/caveman-classic-tees-off-april-6th-reserve-spot-now/

Page 5: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Supervised/Paired/Aligned/Registered Unsupervised/Unpaired/Unaligned/Unregistered

Supervised vs UnsupervisedPaired vs UnpairedAligned vs UnalignedSupervised vs Unsupervised

Page 6: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

• Supervised/Paired/Aligned/Registered• Image Analogy (Hertzmann et. al. 2001)

• pix2pix (Isola et. al. 2017)

• CRN (Chen et. al. 2017)

• BicycleGAN (Zhu et. al. 2017)

• pix2pixHD (Wang et. al. 2018)

• SIMS (Qi et. al. 2018)

• SPADE (Park et. al. 2019)

• …

• Unsupervised/Unpaired/Unaligned/Unregistered• CoupledGAN (Liu et. al. 2016)

• DTN (Taigman et. al. 2017)

• DiscoGAN (Kim et. al. 2017)

• CycleGAN (Zhu et. al. 2017)

• SimGAN (Shrivastava et. al. 2017)

• DualGAN (Yi et. al. 2017)

• UNIT (Liu et. al. 2017)

• MUNIT, 2018 (Huang et. al. 2018)

• DRIT (Lee et. al. 2018)

• XGAN (Royer et. al. 2018)

• GANimorph (Gokaslan et. al. 2018)

• OST (Benaim et. al. 2018)

• FUNIT (Liu et. al. 2019)

• …

D1 D2 D1 D2

Page 7: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern
Page 8: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern
Page 9: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

@seahcb@frasSmith @LipComeralla Tyler Schatz

@jonathanfly @torans_photo123 @inning0@Soerenpepp

Page 10: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

@darekzabrocki@coliewertz

Page 11: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern
Page 12: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

By Neil BickfordBy Jay Axe

Page 13: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

How we achieve it?

Page 14: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Deep Generative Modeling

Generate new 2D Gaussian samples

𝑓 𝑧 =1

2𝜋𝑑2

𝑒−𝑧2

2

Generate new images

Page 15: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Generative Adversarial Networks

Generator

False

Discriminator

True

Image credit: Celebrity dataset, Jensen Huang, Founder and CEO of NVIDIA, Ian Goodfellow, Father of GANs.

~Discriminator

Page 16: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

After training the

model for a while

Image credit: NVIDIA StyleGAN, CVPR 2019

Generator

Page 17: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Conditional Generative Adversarial Networks

modeling

sampling

Page 18: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Segmentation Mask–Conditional GANs

Generator

Page 19: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Illustration of pix2pixHD Generator Design

pix2pixHD: High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro

Conference on Computer Vision and Pattern Recognition (CVPR) Oral 2018, Salt Lake City, Utah

• Previous SOTA method for GAN-based semantic image synthesis• ResNet-based encoder—decoder architecture• Work nicely only on highly constrained scenes• Utilize BatchNorm (BN) or InstanceNorm (IN) in the generator

CO

NV

BN

ReL

U

CO

NV

BN

ReL

U

CO

NV

Tan

H

CO

NV

BN

ReL

U

CO

NV

BN

ReL

UC

ON

VB

N

CO

NV

BN

ReL

UC

ON

VB

N

Page 20: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

pix2pixHD results

Page 21: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern
Page 22: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

BN: Batch Normalization

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

Internaltional Conference on Machine Learning (ICML) 2015, Lille, France,

CO

NV

BN

ReL

U

CO

NV

BN

ReL

U

CO

NV

Tan

H

CO

NV

BN

ReL

U

CO

NV

BN

ReL

UC

ON

VB

N

CO

NV

BN

ReL

UC

ON

VB

N

Page 23: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Why Batch Normalization?

• Initial hypothesis: reducing covariance shift in internal activations

Page 24: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Why Batch Normalization?

• Initial hypothesis: reducing covariance shift in internal activations

• New hypothesis #1: leading to smoother optimization landscape

• New hypothesis #2: leading to length-direction decoupling of the weight space -> faster convergence rate

Without BN With BN

Loss landscape illustration

length direction

Page 25: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern
Page 26: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Issue with using Batch Normalization for Semantic Image Synthesis

• It tends to wash away semantic input information.

CO

NV

BN

grass

sky

1 1 1

1 1 1

1 1 1

5 5 5

5 5 5

5 5 5

CO

NV

0.7 0.7 0.7

0.7 0.7 0.7

0.7 0.7 0.7

-0.9 -0.9 -0.9

-0.9 -0.9 -0.9

-0.9 -0.9 -0.9

BN

β β β

β β β

β β β

β β β

β β β

β β β

Become identical,all the semantic information is gone.

Page 27: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern
Page 28: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Segmentation masks often contains large uniform regions

Page 29: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

SPADE: SPatially Adaptive DEnormalization

BN

SPADE

Spatially varying quantity

Depending on the input segmentation mask s

Information removed by normalization can be added back by gamma and beta

Page 30: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

SPADE

element-wise

conv

𝛾

normalization

conv

𝛽

SPADE Module

Page 31: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

SPADE-based Generator

Generator

SPA

DE

Res

Blk

SPA

DE

Res

Blk

SPA

DE

Res

Blk

CO

NV

ReL

U SPA

DE

ReL

UC

ON

V

ReL

UC

ON

V

SPA

DE

SPADE ResBlk

Page 32: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

GauGAN Framework

Generator

CO

NV

ReL

U

CO

NV

ReL

U

CO

NV

False

Discriminator

CO

NV

ReL

U

CO

NV

ReL

U

CO

NV

True

Feature matching loss

SPA

DE

Res

Blk

SPA

DE

Res

Blk

SPA

DE

Res

Blk

CO

NV

ReL

U

Page 33: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Results

Page 34: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern
Page 35: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern
Page 36: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern
Page 37: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Style Control Learning via a Variational Learning Framework

Generator

SPA

DE

Res

Blk

SPA

DE

Res

Blk

SPA

DE

Res

Blk

CO

NV

ReL

U

CO

NV

ReL

U

CO

NV

ReL

U

CO

NV

ReL

U

Encoder

Page 38: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Style Control Learning via a Variational Learning Framework

Generator

SPA

DE

Res

Blk

SPA

DE

Res

Blk

SPA

DE

Res

Blk

CO

NV

ReL

U

Page 39: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Style Control Learning via a Variational Learning Framework

Generator

SPA

DE

Res

Blk

SPA

DE

Res

Blk

SPA

DE

Res

Blk

CO

NV

ReL

U

Page 40: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Style Control Learning via a Variational Learning Framework

Generator

SPA

DE

Res

Blk

SPA

DE

Res

Blk

SPA

DE

Res

Blk

CO

NV

ReL

U

Page 41: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern
Page 42: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

In the test time, we can then use different style images to control the global color tone of the output image.

Generator

SPA

DE

Res

Blk

SPA

DE

Res

Blk

SPA

DE

Res

Blk

CO

NV

ReL

U

CO

NV

ReL

U

CO

NV

ReL

U

CO

NV

ReL

U

Encoder

Page 43: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern
Page 44: GauGAN: Semantic Image Synthesis with Spatially Adaptive ......Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro Conference on Computer Vision and Pattern

Conclusion• Segmentation to Image Synthesis Task

• SPADE: Spatially Adaptive Denormalization

• Joint Style and Layout Control

• CVPR2019

• Online demo link: http://nvidia-research-mingyuliu.com/gaugan

• SPADE code: https://github.com/nvlabs/spade/

• Paper: https://arxiv.org/abs/1903.07291

Taesung ParkNVIDIA, UC Berkeley

Ting-Chun WangNVIDIA

Jun-Yan ZhuNVIDIA, MIT

Ming-Yu LiuNVIDIA