Deep Learning for Computer Vision, Spring 2018
http://vllab.ee.ntu.edu.tw/dlcv.html (primary)
https://ceiba.ntu.edu.tw/1062DLCV (grades, etc.)
FB: DLCV Spring 2018
Yu-Chiang Frank Wang 王鈺強, Associate Professor
Dept. of Electrical Engineering, National Taiwan University
2018/05/09
What’s to Be Covered Today…
• Visualization of NNs
  • t-SNE
  • Visualization of Feature Maps
  • Neural Style Transfer
• Understanding NNs
  • Image Translation
  • Feature Disentanglement
• Recurrent Neural Networks
  • LSTM & GRU
Many slides from Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, J.-B. Huang, Yuying Yeh, and Hsuan-I Ho
Representation Disentanglement
• Goal
  • Interpretable deep feature representation
  • Disentangle the attribute of interest c from the derived latent representation z
• Unsupervised: InfoGAN (Chen et al., NIPS ’16)
• Supervised: AC-GAN (Odena et al., ICML ’17)
Chen et al., "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets." NIPS 2016.
Odena et al., "Conditional Image Synthesis with Auxiliary Classifier GANs." ICML 2017.
AC-GAN
Odena et al., "Conditional Image Synthesis with Auxiliary Classifier GANs." ICML 2017.
• Supervised Disentanglement
• Learning
  • Overall objective function:
    G* = arg min_G max_D ℒ_GAN(G, D) + ℒ_cls(G, D)
  • Adversarial loss:
    ℒ_GAN(G, D) = 𝔼[log(1 − D(G(z, c)))] + 𝔼[log D(y)]
  • Disentanglement loss:
    ℒ_cls(G, D) = 𝔼[−log D_cls(c′ | y)] + 𝔼[−log D_cls(c | G(z, c))]
    (real data y w.r.t. its domain label c′; generated data G(z, c) w.r.t. the assigned label c)
[Figure: G takes noise z and label c and outputs G(z, c); D sees the generated G(z, c) and real images y]
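To make the two objective terms concrete, here is a minimal NumPy sketch of the AC-GAN losses. The arrays are toy stand-ins for network outputs (D(.) as a real/fake probability, D_cls(.) as a softmax over 3 labels); they are illustrative values, not outputs of any trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for network outputs on a batch of 4 samples:
d_real = rng.uniform(0.6, 0.9, size=4)   # D(y) on real images
d_fake = rng.uniform(0.1, 0.4, size=4)   # D(G(z, c)) on generated images
cls_real = np.full((4, 3), 1.0 / 3)      # D_cls(.|y), uniform for illustration
cls_fake = np.full((4, 3), 1.0 / 3)      # D_cls(.|G(z, c))
c_real = np.array([0, 1, 2, 0])          # true labels c' of the real data
c_fake = np.array([1, 1, 0, 2])          # assigned labels c of the generated data

# Adversarial loss: L_GAN = E[log(1 - D(G(z, c)))] + E[log D(y)]
l_gan = np.mean(np.log(1 - d_fake)) + np.mean(np.log(d_real))

# Disentanglement loss:
# L_cls = E[-log D_cls(c'|y)] + E[-log D_cls(c|G(z, c))]
l_cls = (-np.mean(np.log(cls_real[np.arange(4), c_real]))
         - np.mean(np.log(cls_fake[np.arange(4), c_fake])))
```

With the uniform 1/3 classifier above, each cross-entropy term equals log 3, so l_cls = 2 log 3; a classifier that puts more mass on the correct labels would drive l_cls down.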
AC-GAN
Odena et al., "Conditional Image Synthesis with Auxiliary Classifier GANs." ICML 2017.
• Supervised Disentanglement
[Figure: varying the label c fed to G(z, c) changes the corresponding attribute in the generated images ("different c values")]
InfoGAN
• Unsupervised Disentanglement
• Learning
  • Overall objective function:
    G* = arg min_G max_D ℒ_GAN(G, D) + ℒ_cls(G, D)
  • Adversarial loss:
    ℒ_GAN(G, D) = 𝔼[log(1 − D(G(z, c)))] + 𝔼[log D(y)]
  • Disentanglement loss (generated data w.r.t. the assigned latent code only; no label supervision on real data):
    ℒ_cls(G, D) = 𝔼[−log D_cls(c | G(z, c))]
Chen et al., "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets." NIPS 2016.
InfoGAN
• Unsupervised Disentanglement
• No guarantee of disentangling particular semantics
[Figure: over the training process, different values of c gradually come to control the rotation angle and the width of the generated digits; loss curve over time]
Chen et al., "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets." NIPS 2016.
• Representation Disentanglement
  • InfoGAN: Unsupervised representation disentanglement
  • AC-GAN: Supervised representation disentanglement
• Image Translation
  • Pix2pix (CVPR’17): Pairwise cross-domain training data
  • CycleGAN/DualGAN/DiscoGAN: Unpaired cross-domain training data
  • UNIT (NIPS’17): Learning cross-domain image representation (with unpaired training data)
  • DTN (ICLR’17): Learning cross-domain image representation (with unpaired training data)
• Joint Image Translation & Disentanglement
  • StarGAN (CVPR’18): Image translation via representation disentanglement
  • CDRD (CVPR’18): Cross-domain representation disentanglement and translation
From Image Understanding to Image Manipulation
Pix2pix
• Image-to-image translation with conditional adversarial networks (CVPR’17)
  • Can be viewed as image style transfer (e.g., sketch → photo)
Isola et al., "Image-to-Image Translation with Conditional Adversarial Networks." CVPR 2017.
Pix2pix
• Goal / Problem Setting
  • Image translation across two distinct domains (e.g., sketch vs. photo)
  • Pairwise training data
• Method: Conditional GAN (example: sketch to photo)
  • Generator: input a sketch, output a photo
  • Discriminator: input the concatenation of the input (sketch) and the synthesized/real (photo) image; output real or fake
[Figure: in the training phase, the input is concatenated with the generated or real photo before the discriminator; in the testing phase, the input sketch is fed to the generator alone]
Isola et al., "Image-to-Image Translation with Conditional Adversarial Networks." CVPR 2017.
Pix2pix
• Learning the model
  • Conditional GAN loss:
    ℒ_cGAN(G, D) = 𝔼_x[log(1 − D(x, G(x)))] + 𝔼_{x,y}[log D(x, y)]
    (fake/generated pair (x, G(x)) vs. real pair (x, y); the input and output images are concatenated)
  • Reconstruction loss:
    ℒ_L1(G) = 𝔼_{x,y}[‖y − G(x)‖₁]
  • Overall objective function:
    G* = arg min_G max_D ℒ_cGAN(G, D) + ℒ_L1(G)
Isola et al., "Image-to-Image Translation with Conditional Adversarial Networks." CVPR 2017.
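The two pix2pix terms can be sketched numerically as follows; scalar discriminator outputs and random 8×8 "images" stand in for the real networks (all values are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins (no real networks): y is the target image, g_x a "generated"
# image; d_* are discriminator probabilities on the concatenated pairs.
y = rng.uniform(0, 1, size=(8, 8))
g_x = rng.uniform(0, 1, size=(8, 8))
d_fake_pair = 0.3   # D(x, G(x))
d_real_pair = 0.8   # D(x, y)

# Conditional GAN loss: E[log(1 - D(x, G(x)))] + E[log D(x, y)]
l_cgan = np.log(1 - d_fake_pair) + np.log(d_real_pair)

# L1 reconstruction loss: E[||y - G(x)||_1], here the mean absolute error
l_l1 = np.mean(np.abs(y - g_x))

# The overall objective combines both terms
total = l_cgan + l_l1
```

The L1 term is what keeps pix2pix outputs close to the paired ground truth; the adversarial term pushes them toward the realistic image manifold.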
Pix2pix
• Experiment results
Demo page: https://affinelayer.com/pixsrv/
Isola et al., "Image-to-Image Translation with Conditional Adversarial Networks." CVPR 2017.
CycleGAN/DiscoGAN/DualGAN
• CycleGAN (CVPR’17)
  • Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks
Zhu et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks." CVPR 2017.
• Unpaired training data is easier to collect, and thus more practical
[Figure: paired data has a 1-to-1 correspondence across domains; unpaired data has no correspondence]
CycleGAN
Zhu et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks." CVPR 2017.
• Goal / Problem Setting
  • Image translation across two distinct domains (e.g., photo vs. painting)
  • Unpaired training data
• Idea
  • Autoencoding-like image translation
  • Cycle consistency between the two domains: photo → painting → photo, and painting → photo → painting
CycleGAN
Zhu et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks." CVPR 2017.
• Method (example: photo & painting)
  • Based on 2 GANs
    • First GAN (G1, D1): photo to painting
    • Second GAN (G2, D2): painting to photo
  • Cycle consistency: photo consistency and painting consistency
[Figure: G1 maps an input photo to a generated painting, judged real/fake by D1 against real paintings; G2 maps an input painting to a generated photo, judged by D2 against real photos]
CycleGAN
• Cycle Consistency
[Figure: photo consistency: photo →G1→ painting →G2→ photo ≈ original; painting consistency: painting →G2→ photo →G1→ painting ≈ original]
Zhu et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks." CVPR 2017.
CycleGAN
Zhu et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks." CVPR 2017.
• Learning (x: photo, y: painting)
  • Overall objective function:
    G1*, G2* = arg min_{G1,G2} max_{D1,D2} ℒ_GAN(G1, D1) + ℒ_GAN(G2, D2) + ℒ_cyc(G1, G2)
  • Adversarial losses
    • First GAN (G1, D1): ℒ_GAN(G1, D1) = 𝔼[log(1 − D1(G1(x)))] + 𝔼[log D1(y)]
    • Second GAN (G2, D2): ℒ_GAN(G2, D2) = 𝔼[log(1 − D2(G2(y)))] + 𝔼[log D2(x)]
CycleGAN
Zhu et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks." CVPR 2017.
• Learning
  • Cycle consistency loss (photo and painting consistency):
    ℒ_cyc(G1, G2) = 𝔼[‖G2(G1(x)) − x‖₁] + 𝔼[‖G1(G2(y)) − y‖₁]
    (photo x →G1→ G1(x) →G2→ G2(G1(x)) ≈ x; painting y →G2→ G2(y) →G1→ G1(G2(y)) ≈ y)
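The cycle consistency term can be sketched with two toy "translators" that happen to be inverses of each other (hypothetical functions, not the paper's networks); imperfect translators would incur a nonzero penalty.

```python
import numpy as np

# Toy stand-ins for G1 (photo -> painting) and G2 (painting -> photo).
def g1(x):   # hypothetical photo -> painting
    return x + 0.5

def g2(y):   # hypothetical painting -> photo
    return y - 0.5

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=(4, 4))   # unpaired photos
y = rng.uniform(0, 1, size=(4, 4))   # unpaired paintings

# L_cyc = E[||G2(G1(x)) - x||_1] + E[||G1(G2(y)) - y||_1]
l_cyc = np.mean(np.abs(g2(g1(x)) - x)) + np.mean(np.abs(g1(g2(y)) - y))
# Since g2 undoes g1 here, l_cyc is numerically ~0.
```

This is the only supervision signal tying the two unpaired domains together: each sample must survive a round trip.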
CycleGAN
Zhu et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks." CVPR 2017.
• Example results
Project page: https://junyanz.github.io/CycleGAN/
Image Translation Using Unpaired Training Data
Zhu et al., "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks." CVPR 2017.
Kim et al., "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks." ICML 2017.
Yi et al., "DualGAN: Unsupervised Dual Learning for Image-to-Image Translation." ICCV 2017.
• CycleGAN (CVPR’17), DiscoGAN (ICML’17), and DualGAN (ICCV’17) share essentially the same idea
UNIT
• Unsupervised Image-to-Image Translation Networks (NIPS’17)
  • Image translation via learning a cross-domain joint representation
Liu et al., "Unsupervised Image-to-Image Translation Networks." NIPS 2017.
[Figure: stage 1 encodes images x1 ∈ 𝒳1 and x2 ∈ 𝒳2 into a joint latent space 𝒵; stage 2 generates cross-domain images (e.g., day ↔ night) from the joint latent code z]
UNIT
• Goal / Problem Setting
  • Image translation across two distinct domains
  • Unpaired training image data
• Idea
  • Based on two parallel VAE-GAN models
  • Learning of a joint representation across image domains
  • Generate cross-domain images from the joint representation
Liu et al., "Unsupervised Image-to-Image Translation Networks." NIPS 2017.
UNIT
• Learning
  • Overall objective function:
    G* = arg min_G max_D ℒ_VAE(E1, G1, E2, G2) + ℒ_GAN(G1, D1, G2, D2)
  • Variational autoencoder loss (reconstruction + prior):
    ℒ_VAE(E1, G1, E2, G2) = 𝔼[‖G1(E1(x1)) − x1‖²] + 𝔼[KL(q1(z) ‖ p(z))]
                          + 𝔼[‖G2(E2(x2)) − x2‖²] + 𝔼[KL(q2(z) ‖ p(z))]
  • Adversarial loss (generated G1(z), G2(z) vs. real y1, y2):
    ℒ_GAN(G1, D1, G2, D2) = 𝔼[log(1 − D1(G1(z)))] + 𝔼[log D1(y1)]
                          + 𝔼[log(1 − D2(G2(z)))] + 𝔼[log D2(y2)]
[Figure: two parallel VAE-GAN branches (E1, G1, D1) and (E2, G2, D2) share the joint latent code z]
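One domain's share of the VAE term above can be computed in closed form when q(z) is a diagonal Gaussian and p(z) the standard normal. A minimal sketch with toy encoder outputs (the values are illustrative, not from a trained UNIT model):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy encoder outputs for one domain: q(z) = N(mu, sigma^2), prior p(z) = N(0, I)
mu = rng.normal(0, 0.5, size=8)
log_var = rng.normal(0, 0.1, size=8)

# Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Reconstruction term ||G(E(x)) - x||^2, with a toy near-perfect reconstruction
x = rng.uniform(0, 1, size=(4, 4))
x_rec = x + rng.normal(0, 0.01, size=(4, 4))
recon = np.sum((x_rec - x) ** 2)

l_vae = recon + kl   # one domain's contribution to L_VAE
```

The KL term pulls both encoders' posteriors toward the same prior, which is what lets the two domains share one latent space.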
UNIT
• Example results: sunny → rainy and rainy → sunny; real street view → synthetic street view and synthetic street view → real street view
Liu et al., "Unsupervised Image-to-Image Translation Networks." NIPS 2017.
Github page: https://github.com/mingyuliutw/UNIT
Domain Transfer Networks
• Unsupervised Cross-Domain Image Generation (ICLR’17)
• Goal / Problem Setting
  • Image translation across two domains
  • One-way translation only
  • Unpaired training data
• Idea
  • Apply a unified model to learn a joint representation across domains
  • Consistency enforced in both the image and feature spaces
Taigman et al., "Unsupervised Cross-Domain Image Generation." ICLR 2017.
Domain Transfer Networks
• Learning (G = {f, g}: f maps an image to the shared feature space, g generates an image from features)
  • Overall objective function:
    G* = arg min_G max_D ℒ_img(G) + ℒ_feat(G) + ℒ_GAN(G, D)
  • Image consistency (target domain y): ℒ_img(G) = 𝔼[‖g(f(y)) − y‖²]
  • Feature consistency (source domain x): ℒ_feat(G) = 𝔼[‖f(g(f(x))) − f(x)‖²]
  • Adversarial loss:
    ℒ_GAN(G, D) = 𝔼[log(1 − D(G(x)))] + 𝔼[log(1 − D(G(y)))] + 𝔼[log D(y)]
[Figure: both source samples x and target samples y pass through G = g ∘ f; D judges G(x) and G(y) against real target images y]
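The two DTN consistency terms can be sketched with hypothetical linear maps standing in for the encoder f and the generator g (toy 2-D vectors instead of images; not the paper's networks):

```python
import numpy as np

# Hypothetical linear "encoder" f and "generator" g; here g inverts f,
# so both consistency losses vanish.
W = np.array([[0.5, 0.0], [0.0, 2.0]])

def f(img):   # image -> feature
    return W @ img

def g(feat):  # feature -> image
    return np.linalg.inv(W) @ feat

x = np.array([1.0, 2.0])   # source-domain sample
y = np.array([0.5, 1.5])   # target-domain sample

# Image consistency: ||g(f(y)) - y||^2 (G should be the identity on the target domain)
l_img = np.sum((g(f(y)) - y) ** 2)

# Feature consistency: ||f(g(f(x))) - f(x)||^2 (translation should preserve features)
l_feat = np.sum((f(g(f(x))) - f(x)) ** 2)
```

Together the two terms pin down what a one-way translator must preserve: target images map to themselves, and source images keep their f-features after translation.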
DTN
• Example results: SVHN → MNIST, photo → emoji
Taigman et al., "Unsupervised Cross-Domain Image Generation." ICLR 2017.
StarGAN
• Goal
  • A unified GAN for multi-domain image-to-image translation
Choi et al., "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." CVPR 2018.
[Figure: traditional cross-domain models require a separate generator G_ij for every domain pair (G12, G13, G14, G23, G24, G34, …); StarGAN uses a single unified G for all domains]
StarGAN
• Goal / Problem Setting
  • A single image translation model across multiple domains
  • Unpaired training data
• Idea
  • Concatenate the image and the target domain label as input to the generator
  • Auxiliary domain classifier on the discriminator
  • Cycle consistency across domains
StarGAN
• Learning (x: input image, c: target domain label, y: real image)
  • Overall objective function:
    G* = arg min_G max_D ℒ_GAN(G, D) + ℒ_cls(G, D) + ℒ_cyc(G)
  • Adversarial loss:
    ℒ_GAN(G, D) = 𝔼[log(1 − D(G(x, c)))] + 𝔼[log D(y)]
  • Domain classification loss (disentanglement; real data w.r.t. its own domain label c′, generated data w.r.t. the assigned label c):
    ℒ_cls(G, D) = 𝔼[−log D_cls(c′ | y)] + 𝔼[−log D_cls(c | G(x, c))]
  • Cycle consistency loss (translate x to domain c, then back to its original domain c_x):
    ℒ_cyc(G) = 𝔼[‖G(G(x, c), c_x) − x‖₁]
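StarGAN's distinctive ingredient, feeding the target domain label to a single generator, is usually done by tiling a one-hot label over the spatial grid and concatenating it to the image channels. A minimal sketch with toy sizes (single-channel 4×4 "image", 3 domains; illustrative, not the paper's exact architecture):

```python
import numpy as np

h, w, n_domains = 4, 4, 3
image = np.random.default_rng(4).uniform(0, 1, size=(h, w, 1))

target_domain = 2
label_map = np.zeros((h, w, n_domains))
label_map[:, :, target_domain] = 1.0   # one-hot label tiled over all pixels

# Generator input: image channels + label channels
g_input = np.concatenate([image, label_map], axis=-1)
# g_input now has 1 + n_domains channels; one generator serves every
# domain simply by changing the label channels.
```

This is why a single G suffices where pairwise models need G_ij per domain pair: the translation target is an input, not part of the architecture.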
StarGAN
• Example results across multiple domains
• StarGAN can, to some extent, be viewed as a representation disentanglement model rather than purely an image translation one.
Choi et al., "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." CVPR 2018.
Github page: https://github.com/yunjey/StarGAN
Cross-Domain Representation Disentanglement (CDRD)
Wang et al., "Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation." CVPR 2018.
• Goal / Problem Setting
  • Learning a cross-domain joint disentangled representation
  • Single-domain supervision only (e.g., source faces labeled w/ or w/o eyeglasses; target domain unsupervised)
  • Cross-domain image translation with the attribute of interest
• Idea
  • Based on GAN; bridge the domain gap across domains with a division of high- and low-level layers
  • Auxiliary attribute classifier on the discriminator (classification w.r.t. the source label l_S or the assigned label l)
  • Semantics of the disentangled factor are learned from label information in the source domain
[Figure: generator G maps (z, l) into generated images X̃_S, X̃_T in a joint space; discriminator D outputs real/fake and attribute predictions]
CDRD
• Learning
  • Overall objective function:
    G* = arg min_G max_D ℒ_GAN(G, D) + ℒ_cls(G, D)
  • Adversarial loss (real X_S, X_T vs. generated X̃_S, X̃_T; D_C is the shared real/fake output on top of the domain-specific branches D_S, D_T):
    ℒ_GAN^S(G, D) = 𝔼[log D_C(D_S(X_S))] + 𝔼[log(1 − D_C(D_S(X̃_S)))]
    ℒ_GAN^T(G, D) = 𝔼[log D_C(D_T(X_T))] + 𝔼[log(1 − D_C(D_T(X̃_T)))]
    ℒ_GAN(G, D) = ℒ_GAN^S(G, D) + ℒ_GAN^T(G, D)
  • Disentanglement loss (AC-GAN-like in the labeled source domain, InfoGAN-like in the unlabeled target domain):
    ℒ_cls^S(G, D) = 𝔼[log P(l | X̃_S)] + 𝔼[log P(l_S | X_S)]
    ℒ_cls^T(G, D) = 𝔼[log P(l | X̃_T)]
    ℒ_cls(G, D) = ℒ_cls^S(G, D) + ℒ_cls^T(G, D)
CDRD
• Add an additional encoder
  • Without the encoder, the input is Gaussian noise (image generation); with the encoder, the input can be an image
  • Enables image translation with the attribute of interest
CDRD
• Experiment results
  • Image translation with no label supervision in the target domain (input → output with the attribute of interest)
  • Digits, faces, and scenes; cross-domain classification
[Figure: t-SNE visualization for digits; comparison table against prior methods]
Wang et al., "Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation." CVPR 2018.
Comparisons
(first four columns: cross-domain image translation; last two: representation disentanglement)

Method       | Unpaired training data | Multi-domains | Bi-direction | Joint representation | Unsupervised | Interpretability of disentangled factor
Pix2pix      | X | X | X | X | cannot disentangle representation
CycleGAN     | O | X | O | X | cannot disentangle representation
StarGAN      | O | O | O | X | cannot disentangle representation
UNIT         | O | X | O | O | cannot disentangle representation
DTN          | O | X | X | O | cannot disentangle representation
InfoGAN      | cannot translate images across domains | O | X
AC-GAN       | cannot translate images across domains | X | O
CDRD (Ours)  | O | O | O | O | Partially | O
What’s to Be Covered Today…
• Visualization of NNs
  • t-SNE
  • Visualization of Feature Maps
  • Neural Style Transfer
• Understanding NNs
  • Image Translation
  • Feature Disentanglement
• Recurrent Neural Networks
  • LSTM & GRU
Many slides from Fei-Fei Li, Bhiksha Raj, Yuying Yeh, and Hsuan-I Ho
What We Have Learned…
• CNN for Image Classification
[Figure: a CNN classifying input images into labels such as dog, cat, monkey]
What We Have Learned…
• Generative Models (e.g., GAN) for Image Synthesis
But…
• Is it sufficiently effective and robust to perform visual analysis from a single image?
• E.g., Are the hands of this man tied?
So What Are The Limitations of CNN?
• Can’t easily model sequential data
  • Both input and output might be sequential data (scalar or vector)
• Simple feed-forward processing only
  • Cannot memorize; no long-term feedback
Example of (Visual) Sequential Data
https://quickdraw.withgoogle.com/#
More Applications in Vision
Image Captioning
Figure from Vinyals et al., “Show and Tell: A Neural Image Caption Generator”, CVPR 2015
More Applications in Vision
Visual Question Answering (VQA)
[Figure: the input image and question are encoded, and the model outputs an answer]
Figure from Zhu et al., “Visual7W: Grounded Question Answering in Images”, CVPR 2016
How to Model Sequential Data?
• Deep learning for sequential data
  • Recurrent neural networks (RNN)
  • 3-dimensional convolutional neural networks
[Figure: RNN vs. 3D convolution]
Recurrent Neural Networks
• Parameter sharing + unrolling
  • Keeps the number of parameters fixed
  • Allows sequential data with varying lengths
• Memory ability
  • Captures and preserves information which has been extracted
Recurrence Formula
• The same function and parameters are used at every time step:
  h_t, y_t = f_W(h_{t−1}, x_t)
  (h_{t−1}: state at time t−1; x_t: input vector at time t; h_t: new state; y_t: output vector; f_W: function with parameters W)
• For a vanilla RNN:
  h_t = tanh(W_hh h_{t−1} + W_xh x_t)
  y_t = W_hy h_t
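The recurrence above unrolls into a simple loop; a NumPy sketch with toy dimensions (input 3, hidden 4, output 2; random weights, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy weights: input dim 3, hidden dim 4, output dim 2
W_xh = rng.normal(0, 0.1, size=(4, 3))
W_hh = rng.normal(0, 0.1, size=(4, 4))
W_hy = rng.normal(0, 0.1, size=(2, 4))

def rnn_step(h_prev, x_t):
    """One vanilla-RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

# Unroll over a sequence of length 5 with the SAME f_W at every step
h = np.zeros(4)
xs = rng.normal(0, 1, size=(5, 3))
ys = []
for x_t in xs:
    h, y = rnn_step(h, x_t)
    ys.append(y)
```

Note that the loop runs for however many steps the sequence has; this is the "parameter sharing + unrolling" point above, since only W_xh, W_hh, W_hy exist regardless of sequence length.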
Multiple Recurrent Layers
[Figure: multiple recurrent layers can be stacked; unrolled over time, the same f_W is applied at every step: h_0 →(x_1)→ h_1 →(x_2)→ h_2 → … → h_T, with shared weights W_xh and W_hh, and outputs y_1 … y_T computed through W_hy]
[Figure: RNN input–output configurations, e.g., image captioning (one-to-many), action recognition (many-to-one), video prediction and video indexing (many-to-many)]

Example: Image Captioning
Figure from Karpathy et al., “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015
[Figure: a CNN encodes the image; the unrolled RNN then generates the caption word by word]
Example: Action Recognition
Figure from “Action Recognition in Video Sequences using Deep Bi-Directional LSTM with CNN Features”
[Figure: frame features x_1 … x_T feed the unrolled RNN (h_0 → h_1 → … → h_T); a softmax over the outputs y_1 … y_T predicts the action label (e.g., golf, standing, exercise); training minimizes the cross-entropy loss via back propagation through time]
Back Propagation Through Time (BPTT)
Pascanu et al., “On the difficulty of training recurrent neural networks”, ICML 2013
Training RNNs via BPTT
• Forward pass each training instance:
  • Pass the entire data sequence through the net
  • Generate outputs
• After forward passing each training instance:
  • Backward pass: compute gradients via backpropagation (more specifically, BPTT)
• Focusing on one training instance: the divergence computed is between the sequence of outputs by the network and the desired sequence of outputs.
• Generally, this is not just the sum of the divergences at individual times.
Gradient Vanishing & Exploding
● Computing the gradient involves many repeated factors of W
  ○ Exploding gradients: largest singular value > 1
  ○ Vanishing gradients: largest singular value < 1

Solutions…
● Gradient clipping: rescale gradients if their norm is too large
● How about vanishing gradients?
[Figure: standard gradient descent trajectories vs. gradient clipping fixing the overshoot]
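Gradient clipping by global norm is a one-liner; a minimal sketch (the threshold 5.0 is an arbitrary illustrative choice):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale the gradient if its global L2 norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g_small = np.array([1.0, 2.0])     # norm ~2.24, left untouched
g_large = np.array([30.0, 40.0])   # norm 50, rescaled down to norm 5
g_small_c = clip_gradient(g_small)
g_large_c = clip_gradient(g_large)
```

Rescaling preserves the gradient's direction while bounding the step size, which tames exploding gradients but, as the slide notes, does nothing for vanishing ones.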
Variants of RNN
• Long Short-Term Memory (LSTM) [Hochreiter & Schmidhuber, 1997]
  • Additional memory cell
  • Input / Forget / Output gates
  • Handles gradient vanishing
  • Learns long-term dependencies
• Gated Recurrent Unit (GRU) [Cho et al., EMNLP 2014]
  • Similar to LSTM, but no additional memory cell
  • Reset / Update gates
  • Fewer parameters than LSTM
  • Comparable performance to LSTM [Chung et al., NIPS Workshop 2014]
Vanilla RNN vs. LSTM
• The LSTM augments the RNN with a memory cell: at each time step t it takes the input x_t and produces the output h_t, while also carrying a cell state c_t across time.

Long Short-Term Memory (LSTM)
(Image Credit: Hung-Yi Lee)
• At each step, four quantities are computed from h_{t−1} and x_t: the forget gate f_t, the input gate i_t, the input activation g_t (tanh), and the output gate o_t.
• Cell state update: c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_t
• Output (hidden state): h_t = o_t ⊙ tanh(c_t)
• Remarks:
  • The forget gate f provides a shortcut connection across time steps; gradient vanishing is prevented when the forget gate is open (> 0).
  • There is a linear relationship between c_t and c_{t−1}, instead of the multiplicative relationship between h_t and h_{t−1} in a vanilla RNN.
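The step-by-step computation can be sketched directly in NumPy; one weight matrix per gate acts on the concatenation [h_{t−1}, x_t], with biases omitted for brevity (toy sizes, random weights, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(6)
n_in, n_hid = 3, 4

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# One weight matrix per gate, acting on the concatenation [h_{t-1}, x_t]
W_f, W_i, W_g, W_o = (rng.normal(0, 0.1, size=(n_hid, n_hid + n_in))
                      for _ in range(4))

def lstm_step(h_prev, c_prev, x_t):
    """One LSTM step (biases omitted for brevity)."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z)      # forget gate
    i = sigmoid(W_i @ z)      # input gate
    g = np.tanh(W_g @ z)      # input activation
    o = sigmoid(W_o @ z)      # output gate
    c = f * c_prev + i * g    # linear cell-state update (the shortcut)
    h = o * np.tanh(c)        # hidden state
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(0, 1, size=(5, n_in)):
    h, c = lstm_step(h, c, x_t)
```

The key line is `c = f * c_prev + i * g`: the cell state is updated additively, so gradients can flow back through many steps as long as f stays open.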
LSTM vs. GRU
• LSTM: forget gate, input gate, input activation, output gate; cell state and hidden state
• GRU: reset gate, update gate, input activation; hidden state only
Gated Recurrent Unit (GRU)
• At each step, compute from h_{t−1} and x_t: the reset gate r_t and the update gate z_t.
• Input activation: h̃_t = tanh(W [r_t ⊙ h_{t−1}, x_t])
• Output: h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t
• Remarks:
  • The update gate z provides a shortcut connection across time steps; gradient vanishing is prevented when the update gate is open.
  • There is a linear relationship between h_t and h_{t−1}, instead of the multiplicative relationship of h_t and h_{t−1} in a vanilla RNN.
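The GRU step mirrors the LSTM sketch but with two gates and no separate cell state (toy sizes, random weights, biases omitted; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(7)
n_in, n_hid = 3, 4

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

W_r, W_z, W_h = (rng.normal(0, 0.1, size=(n_hid, n_hid + n_in))
                 for _ in range(3))

def gru_step(h_prev, x_t):
    """One GRU step (biases omitted for brevity)."""
    z_in = np.concatenate([h_prev, x_t])
    r = sigmoid(W_r @ z_in)   # reset gate
    z = sigmoid(W_z @ z_in)   # update gate
    # Input activation uses the reset-gated previous state
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))
    h = (1 - z) * h_prev + z * h_tilde   # linear blend: the shortcut
    return h

h = np.zeros(n_hid)
for x_t in rng.normal(0, 1, size=(5, n_in)):
    h = gru_step(h, x_t)
```

Here the hidden state itself is the convex blend `(1 - z) * h_prev + z * h_tilde`, which plays the same gradient-shortcut role the cell state plays in the LSTM, with fewer parameters.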
Vanilla RNN, LSTM, & GRU
Vanilla RNN | LSTM | GRU
Cell state: No | Yes | No
Number of gates: 0 | 3 | 2
Parameters: fewest | most | fewer than LSTM
Gradient vanishing / exploding: prone | mitigated by the cell-state shortcut | mitigated by the update-gate shortcut
References
Book: http://www.deeplearningbook.org/contents/rnn.html
Virginia Tech:
https://computing.ece.vt.edu/~f15ece6504/slides/L14_RNNs.pptx.pdf
https://computing.ece.vt.edu/~f15ece6504/slides/L15_LSTMs.pptx.pdf
CS231n: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
Jia-Bin Huang: https://filebox.ece.vt.edu/~jbhuang/teaching/ece5554-4554/fa17/lectures/Lecture_23_ActionRec.pdf
MLDS: https://www.csie.ntu.edu.tw/~yvchen/f106-adl/doc/171030+171102_Attention.pdf
Web Tutorial
● http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
● https://r2rt.com/written-memories-understanding-deriving-and-extending-the-lstm.html
What We’ve Learned Today…
• Understanding NNs
  • Image Translation
  • Feature Disentanglement
• Recurrent Neural Networks
  • LSTM & GRU
• HW4 is due 5/18 (Fri.) 23:59!