Network as Generator

Simple distribution → Generator → complex distribution

The network (generator) takes an input x together with a random vector z sampled from a simple distribution, and produces an output y. We know the simple distribution's formulation, so we can sample from it; the generator's outputs y then follow a complex distribution.
Why distribution?

• Video prediction: a network takes the previous frames of a video and predicts the next frame.

Source: https://github.com/dyelax/Adversarial_Video_Generation
Why distribution?

• The prediction can be ambiguous: given the same previous frames, the character may turn left or turn right. A deterministic network trained to output a single answer tends to blend both possibilities.
Why distribution?

• Solution: also feed the video-prediction network a sample z from a simple distribution, so the same previous frames can lead to different predicted next frames.
Why distribution?

• Especially for tasks that need "creativity" — the same input can have different correct outputs.
  • Drawing: "character with red eyes" → many different valid characters.
  • Chatbot (originally in Chinese): "Do you know who Kaguya is?" → "She is in the Shuchiin Academy student council…" or "She started the era of ninja…"
All Kinds of GAN … https://github.com/hindupuravinash/the-gan-zoo

GAN, ACGAN, BGAN, CGAN, DCGAN, EBGAN, fGAN, GoGAN, ……

Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed, "Variational Approaches for Auto-Encoding Generative Adversarial Networks", arXiv, 2017
Anime Face Generation

• Unconditional generation (no input x): the generator maps a low-dimensional vector z, sampled from a normal distribution (e.g., z = (0.1, −0.1, …, 0.7)), to a high-dimensional vector y — an anime face image. Different sampled vectors give different faces, which together form a complex distribution.
Discriminator

The discriminator is also a neural network (that is, a function). It takes an image and outputs a scalar: a larger value means the image looks real, a smaller value means it looks fake (e.g., 1.0 for real anime faces, 0.1 for badly generated ones).
Basic Idea of GAN

Generator and discriminator evolve together, like prey and predator: the discriminator finds flaws ("butterflies are not brown", "butterflies do not have veins", ……), and the generator adapts to remove them.
Basic Idea of GAN

NN generator v1 → discriminator v1 (trained against real images) → NN generator v2 → discriminator v2 → NN generator v3 → discriminator v3 → ……

Each side improves in response to the other; this is where the term "adversarial" comes from.
Algorithm

• Initialize generator and discriminator
• In each training iteration:

Step 1: Fix generator G, and update discriminator D.
Feed randomly sampled vectors to G to obtain generated objects, and sample real objects from the database. The discriminator learns to assign high scores (1) to real objects and low scores (0) to generated objects.
Step 2: Fix discriminator D, and update generator G.
Feed a vector to G, pass the generated image to D, and treat G followed by D as one large network (D's part fixed, G's hidden layers updated). Update G to make D's output score large: the generator learns to "fool" the discriminator.
Algorithm

• Initialize generator and discriminator
• In each training iteration:
  • Learning D: sample some real objects from the database, and sample vectors to generate some fake objects with G; fix G and update D so that real objects get label 1 and generated objects get label 0.
  • Learning G: sample vectors and feed them through G and then D; fix D and update G so that D's score on the generated images increases.
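The two alternating steps above can be sketched end to end. Below is a minimal, hypothetical NumPy illustration on 1-D data (real data from N(3, 1); the generator is an affine map a·z + b and the discriminator a single logistic unit — a real GAN would use deep networks and a framework with automatic differentiation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: real data ~ N(3, 1); generator G(z) = a*z + b with z ~ N(0, 1);
# discriminator D(y) = sigmoid(w*y + c) scores how "real" y looks.
a, b = 1.0, 0.0
w, c = 0.0, 0.0
lr = 0.05

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

init_mean = np.mean(a * rng.normal(size=1000) + b)   # generated mean before training

for _ in range(2000):
    # Step 1: fix G, update D to increase log D(real) + log(1 - D(fake))
    real = rng.normal(3.0, 1.0, size=64)
    fake = a * rng.normal(size=64) + b
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Step 2: fix D, update G to increase log D(G(z)) (non-saturating loss)
    z = rng.normal(size=64)
    d_fake = sigmoid(w * (a * z + b) + c)
    g = (1 - d_fake) * w          # d log D / d y, by the chain rule
    a += lr * np.mean(g * z)
    b += lr * np.mean(g)

final_mean = np.mean(a * rng.normal(size=1000) + b)  # should move toward 3
```

With the seed above, the generated mean should move from roughly 0 toward the real mean 3, illustrating how the alternating updates pull P_G toward P_data.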
Our Objective

Normal distribution → G → P_G; real data → P_data

G* = arg min_G Div(P_G, P_data)

where Div(P_G, P_data) is the divergence between distributions P_G and P_data; we want them to be as close as possible. How to compute the divergence?
Sampling is good enough ……

G* = arg min_G Div(P_G, P_data)    (c.f. w*, b* = arg min_{w,b} L)

Although we do not know the formulas of P_G and P_data, we can sample from them: sample from P_data by drawing real examples from the database, and sample from P_G by drawing vectors from the normal distribution and feeding them to G.
Discriminator

G* = arg min_G Div(P_G, P_data)

Given data sampled from P_data (class 1) and data sampled from P_G (class 2), train a binary classifier — the discriminator:

D* = arg max_D V(D, G)

Objective function for D:

V(G, D) = E_{y~P_data}[log D(y)] + E_{y~P_G}[log(1 − D(y))]

This objective is the negative cross entropy, so maximizing V is exactly training the classifier by minimizing cross entropy. The resulting maximum value is related to JS divergence. (https://arxiv.org/abs/1406.2661)
Discriminator

If the samples from P_data and P_G are hard to discriminate, the divergence is small and max_D V(D, G) is small; if they are easy to discriminate, the divergence is large and max_D V(D, G) is large. The inner maximum can therefore stand in for the divergence:

G* = arg min_G max_D V(G, D)

The maximum objective value is related to JS divergence.
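The stated link to JS divergence can be checked numerically for discrete distributions: plugging in the optimal discriminator D*(y) = P_data(y) / (P_data(y) + P_G(y)) gives max_D V(G, D) = −2 log 2 + 2·JSD(P_data ‖ P_G) exactly. A small sketch (the two distributions are made-up toy examples):

```python
import numpy as np

# Two discrete distributions over the same three outcomes
p_data = np.array([0.5, 0.3, 0.2])
p_g    = np.array([0.2, 0.2, 0.6])

# Optimal discriminator for a fixed generator: D*(y) = P_data(y) / (P_data(y) + P_G(y))
d_star = p_data / (p_data + p_g)

# V(G, D*) = E_{y~P_data}[log D*(y)] + E_{y~P_G}[log(1 - D*(y))]
v_max = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star))

# JSD(p || q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2
m = 0.5 * (p_data + p_g)
jsd = 0.5 * np.sum(p_data * np.log(p_data / m)) + 0.5 * np.sum(p_g * np.log(p_g / m))

print(v_max, -2 * np.log(2) + 2 * jsd)   # the two values coincide
```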
• Initialize generator and discriminator
• In each training iteration:
  Step 1: Fix generator G, and update discriminator D — this solves D* = arg max_D V(D, G)
  Step 2: Fix discriminator D, and update generator G
GAN is difficult to train ……

• There is a saying …… (I found this joke on Facebook.)
JS divergence is not suitable

• In most cases, P_G and P_data do not overlap.
• 1. The nature of data: both P_data and P_G are low-dimensional manifolds in a high-dimensional space, so their overlap can be ignored.
• 2. Sampling: even if P_data and P_G do overlap, without enough samples the discriminator can still separate them.
What is the problem of JS divergence?

JS(P_G0, P_data) = log 2
JS(P_G1, P_data) = log 2
……
JS(P_G100, P_data) = 0

JS divergence is always log 2 if two distributions do not overlap, so P_G0 and P_G1 look equally bad even though P_G1 is closer to P_data. Intuition: if two distributions do not overlap, a binary classifier achieves 100% accuracy — so the accuracy (or loss) means nothing during GAN training.
Wasserstein distance

• Consider one distribution P as a pile of earth and another distribution Q as the target.
• The Wasserstein distance is the average distance the earth mover has to move the earth; moving all of P a distance d onto Q gives W(P, Q) = d.
Wasserstein distance

There are many possible "moving plans", some with smaller and some with larger average distance. The Wasserstein distance is defined using the moving plan with the smallest average distance.

Source of image: https://vincentherrmann.github.io/blog/wasserstein/
What is the problem of JS divergence?

JS(P_G0, P_data) = log 2,  JS(P_G1, P_data) = log 2,  ……,  JS(P_G100, P_data) = 0
W(P_G0, P_data) = d0,  W(P_G1, P_data) = d1,  ……,  W(P_G100, P_data) = 0

Since d0 > d1 > ……, Wasserstein distance can tell that P_G1 is better than P_G0 — better!
As P_G moves toward P_data step by step (P_G0 → P_G1 → …… → P_G100), the Wasserstein distance decreases at every step, so the generator always gets a useful training signal.

https://www.pnas.org/content/104/suppl_1/8567.figures-only
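The contrast above is easy to verify numerically: for two equal-size 1-D samples, the Wasserstein-1 distance is the mean absolute difference of the sorted samples, so it shrinks smoothly as P_G slides toward P_data, while the JS divergence between disjoint distributions is stuck at log 2. A sketch with toy distributions (not real image data):

```python
import numpy as np

def jsd(p, q):
    # Jensen-Shannon divergence between discrete distributions (0*log0 = 0)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def w1(xs, ys):
    # Wasserstein-1 distance between two equal-size 1-D samples
    return np.mean(np.abs(np.sort(xs) - np.sort(ys)))

data = np.linspace(0.0, 1.0, 100)        # stand-in for samples of P_data
disjoint = jsd(np.array([1.0, 0.0]), np.array([0.0, 1.0]))  # always log 2
for d in [50.0, 25.0, 5.0]:
    gen = data + d                        # P_G: same shape, shifted by d
    print(d, w1(data, gen), disjoint, np.log(2))
```

The printed Wasserstein distance equals the shift d and falls as d falls, while the JS value never moves.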
WGAN

W(P_data, P_G) = max_{D ∈ 1-Lipschitz} { E_{y~P_data}[D(y)] − E_{y~P_G}[D(y)] }

This maximum evaluates the Wasserstein distance between P_data and P_G. The 1-Lipschitz constraint means D has to be smooth enough: without it, training pushes D(y) to +∞ on real examples and −∞ on generated ones, and the training of D will not converge. How to fulfill this constraint?

https://arxiv.org/abs/1701.07875
• Original WGAN → weight clipping: force the parameters w between c and −c (after each parameter update, if w > c, set w = c; if w < −c, set w = −c).
• Improved WGAN → gradient penalty: keep the gradient norm close to 1 at samples between the real and generated data. https://arxiv.org/abs/1704.00028
• Spectral normalization → keep the gradient norm smaller than 1 everywhere. https://arxiv.org/abs/1802.05957
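The first two tricks can be sketched in a few lines. A hypothetical NumPy illustration with a linear critic D(y) = w·y, whose gradient with respect to y is just w: weight clipping after an update, and the WGAN-GP penalty (||grad_y D(y_hat)|| − 1)^2 evaluated at points interpolated between real and generated samples:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.8, -0.6])                 # critic weights; D(y) = w @ y

# Original WGAN: clip every parameter into [-c, c] after each update
c = 0.01
w_clipped = np.clip(w, -c, c)

# WGAN-GP: penalize (||grad_y D(y_hat)|| - 1)^2 at interpolated points
real = rng.normal(3.0, 1.0, size=(8, 2))
fake = rng.normal(0.0, 1.0, size=(8, 2))
eps = rng.uniform(size=(8, 1))
y_hat = eps * real + (1 - eps) * fake     # points between real and generated
grad = np.tile(w, (8, 1))                 # gradient of this linear critic is w
# (for a nonlinear critic, the gradient would be evaluated at each y_hat)
penalty = np.mean((np.linalg.norm(grad, axis=1) - 1.0) ** 2)
```

The penalty is added (with a weighting coefficient) to the critic's loss, softly steering the critic toward the 1-Lipschitz constraint instead of hard-clipping its weights.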
GAN is still challenging …

• Generator and discriminator need to match each other, like well-matched opponents. The generator generates fake images to fool the discriminator; the discriminator tells the difference between real and fake. If the discriminator fails to improve ("I cannot tell the difference ……"), or the generator fails to improve ("cannot fool the discriminator ……"), the other side stops improving too.
More Tips

• Tips from Soumith: https://github.com/soumith/ganhacks
• Tips in DCGAN — guidelines for network architecture design for image generation: https://arxiv.org/abs/1511.06434
• Improved techniques for training GANs: https://arxiv.org/abs/1606.03498
• Tips from BigGAN: https://arxiv.org/abs/1809.11096
GAN for Sequence Generation

The generator is a seq2seq decoder that produces a token sequence (e.g., "machine learning"), and the discriminator scores the sequence. The problem: the max (or sampling) step that picks each token is non-differentiable. A small change in the generator's parameters leaves the chosen tokens unchanged, so the discriminator's score is unchanged and no gradient flows back to update the generator.

Sequence generation GAN therefore involves reinforcement learning (RL + GAN) — and RL is difficult to train, GAN is difficult to train, so the combination is especially hard.
GAN for Sequence Generation

• Usually, the generator is fine-tuned from a model learned by other approaches.
• However, with enough hyperparameter tuning and tips, ScratchGAN can train from scratch.

"Training Language GANs from Scratch", https://arxiv.org/abs/1905.09922
Generative Models

• This lecture: Generative Adversarial Network (GAN)
• Full version: https://www.youtube.com/playlist?list=PLJV_el3uVTsMq6JEFPW35BCiOQTsoqwNw

More Generative Models

• Variational Autoencoder (VAE)
• Flow-based model
https://youtu.be/uXY18nzdSsM
https://youtu.be/8zomhgKrsmQ
Possible Solution?

Could typical supervised learning replace adversarial training? Assign each training image a vector (e.g., (0.1, −0.1, …, 0.7)) and train the network to map the vector to the image.

Generative Latent Optimization (GLO), https://arxiv.org/abs/1707.05776
Gradient Origin Networks, https://arxiv.org/abs/2007.02798
Conditional Generation

The generator takes a condition x and a sample z, and produces y.

Text-to-image examples: "red eyes", "black hair", "yellow hair", "dark circles", "red hair, green eyes", "blue hair, red eyes".
Conditional GAN

x: "red eyes"; z from a normal distribution; y = G(x, z).

With the original discriminator — input y only, outputting a scalar for whether y is a real image or not, trained with real images → 1 and generated images → 0 — the generator will learn to generate realistic images but completely ignore the input condition x.
Conditional GAN

A better discriminator takes both x and y and outputs a scalar meaning "y is realistic AND x and y are matched". Training pairs:
• true text-image pairs, e.g. ("red eyes", matching real image) → 1
• ("red eyes", generated image) → 0
• ("red eyes", mismatched real image) → 0

https://arxiv.org/abs/1605.05396
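The key point above is that the conditional discriminator needs a third kind of training pair — real images paired with the wrong condition. A minimal sketch of the label assignment (all names and file names are illustrative):

```python
# Label assignment for a conditional discriminator D(x, y) -> scalar.
# Output 1 only when y is realistic AND y matches the condition x.
def make_pairs(cond, matched_real, generated, mismatched_real):
    return [
        ((cond, matched_real), 1),     # true text-image pair
        ((cond, generated), 0),        # generated image
        ((cond, mismatched_real), 0),  # real image, but wrong condition
    ]

pairs = make_pairs("red eyes", "real_red_eyes.png",
                   "generated.png", "real_black_hair.png")
```

Without the third kind of pair, the discriminator never penalizes realistic-but-mismatched outputs, and the generator can ignore x.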
Conditional GAN

• Image translation, or pix2pix: the condition x is an image, and y = G(x, z). https://arxiv.org/abs/1611.07004
• Testing: supervised training alone gives blurry outputs; GAN alone gives sharp images but may invent details absent from the input; GAN + supervised works best.
Conditional GAN

• Sound-to-image: the condition x is a sound, e.g., "a dog barking sound" → an image. Training data can be collected from videos (frames paired with their audio). As the sound gets louder, the generated scene changes accordingly.

https://arxiv.org/abs/1808.04108
https://wjohn1483.github.io/audio_to_scene/index.html
The images are generated by Chia-Hung Wan and Shun-Po Chuang.
Conditional GAN

• A multi-label image classifier can be seen as a conditional generator: the image is the input condition, and the set of labels is the generated output. https://arxiv.org/abs/1811.04689
Learning from Unpaired Data

Supervised learning trains a deep network on paired data (x, y). Techniques such as pseudo labeling (HW3) and back translation (HW5) reduce the labeling burden, but they still need some paired data.
Learning from Unpaired Data

• Image style transfer: learn a network that maps domain X (e.g., real photos) to domain Y (e.g., anime faces). Can we learn the mapping without any paired data? This is unsupervised conditional generation.
Cycle GAN

Train a generator G_{X→Y} with a discriminator D_Y that outputs a scalar: whether the input image belongs to domain Y or not. This makes the generator's output similar to domain Y — but the generator could ignore its input from domain X entirely and still satisfy D_Y.
Cycle GAN

Cycle consistency: add a second generator G_{Y→X} and require G_{Y→X}(G_{X→Y}(x)) to be as close as possible to the original x, while D_Y still checks that G_{X→Y}(x) belongs to domain Y. If G_{X→Y} threw its input away, there would be a lack of information for reconstruction; to make reconstruction possible, the intermediate output must stay "related" to the input.
Cycle GAN

The same idea works in both directions at once: train G_{X→Y} and G_{Y→X} with cycle consistency in both directions, plus D_Y (belongs to domain Y or not) and D_X (belongs to domain X or not).

Cycle GAN: https://arxiv.org/abs/1703.10593
Dual GAN: https://arxiv.org/abs/1704.02510
Disco GAN: https://arxiv.org/abs/1703.05192
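The cycle-consistency objective itself is just a reconstruction error. A sketch with toy invertible linear maps standing in for G_{X→Y} and G_{Y→X} (a real Cycle GAN uses convolutional generators):

```python
import numpy as np

def cycle_loss(x, g_xy, g_yx):
    # L1 cycle consistency: || G_{Y->X}(G_{X->Y}(x)) - x ||_1 (averaged)
    return np.mean(np.abs(g_yx(g_xy(x)) - x))

x = np.array([1.0, -2.0, 0.5])            # stand-in for a domain-X image
g_xy = lambda v: 2.0 * v + 1.0            # toy generator X -> Y
g_yx = lambda v: (v - 1.0) / 2.0          # its exact inverse, Y -> X

perfect = cycle_loss(x, g_xy, g_yx)       # 0: the input is fully reconstructed
lossy = cycle_loss(x, g_xy, lambda v: v)  # > 0: reconstruction fails
```

A pair of generators that preserve the input's information drives this loss to zero; a G_{X→Y} that discards its input cannot.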
Text Style Transfer

• From a negative sentence to a positive one, e.g. "You are so stupid" (negative) → "You are so smart" (positive).
• The Cycle GAN recipe applies directly: a seq2seq generator G maps negative to positive, a discriminator D judges "positive or not?", and a second seq2seq R reconstructs the original negative sentence, minimizing the reconstruction error (cycle consistency).
• (Several student-contributed negative → positive example sentences, originally in Chinese, omitted.)
The same unpaired recipe applies elsewhere:

• Unsupervised abstractive summarization (document → summary): https://arxiv.org/abs/1810.02851
• Unsupervised translation (language 1 ↔ language 2): https://arxiv.org/abs/1710.04087, https://arxiv.org/abs/1710.11041
• Unsupervised ASR (audio → text): https://arxiv.org/abs/1804.00316, https://arxiv.org/abs/1812.09323, https://arxiv.org/abs/1904.04100
Quality of Image

• Human evaluation is expensive (and sometimes unfair/unstable).
• How to evaluate the quality of the generated images automatically?
Off-the-shelf Image Classifier

Feed a generated image y into an off-the-shelf image classifier (e.g., Inception net, VGG) to get a class distribution P(c|y). A concentrated distribution means higher visual quality.
Diversity — Mode Dropping

(BEGAN on CelebA) The real data distribution has many modes, but the generated data covers only part of it — and the generator at iteration t+1 may simply hop to a different mode than at iteration t.
Diversity

Feed each generated image y^1, y^2, y^3, …… into the CNN classifier to get P(c|y^1), P(c|y^2), P(c|y^3), …… If every per-image distribution concentrates on the same class, diversity is low. Average over all N images:

P(c) = (1/N) Σ_n P(c|y^n)
Diversity

If the averaged distribution P(c) = (1/N) Σ_n P(c|y^n) is uniform, the variety is high.

Inception Score (IS): good quality (each P(c|y) concentrated) and large diversity (P(c) uniform) → large IS.

What is the problem here? ☺
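Putting quality and diversity together, the Inception Score is exp(E_y[KL(P(c|y) ‖ P(c))]). A sketch assuming the classifier's per-image class probabilities are already given as a matrix (the numbers are made up):

```python
import numpy as np

def inception_score(p_cy):
    # p_cy: (num_images, num_classes); row i is the classifier's P(c | y^i)
    p_c = p_cy.mean(axis=0)                                   # marginal P(c)
    kl = np.sum(p_cy * (np.log(p_cy) - np.log(p_c)), axis=1)  # KL(P(c|y) || P(c))
    return np.exp(kl.mean())

sharp_diverse = np.array([[0.98, 0.01, 0.01],     # quality: each row concentrated
                          [0.01, 0.98, 0.01],     # diversity: marginal is uniform
                          [0.01, 0.01, 0.98]])
blurry = np.full((3, 3), 1.0 / 3.0)               # every row uniform -> IS = 1
print(inception_score(sharp_diverse), inception_score(blurry))
```

Sharp, diverse outputs score well above 1; uniform (low-quality) outputs score exactly 1, the minimum.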
Fréchet Inception Distance (FID)

• Feed images into the CNN and take the vector just before the softmax, instead of the class label.
• red points: real images; blue points: generated images.
• FID = Fréchet distance between the two Gaussians fitted to these feature vectors. Smaller is better.
https://arxiv.org/pdf/1706.08500.pdf

"Are GANs Created Equal? A Large-Scale Study": https://arxiv.org/abs/1711.10337 — FID: smaller is better.
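FID compares the two fitted Gaussians: d^2 = ||mu1 − mu2||^2 + Tr(Sigma1 + Sigma2 − 2(Sigma1·Sigma2)^(1/2)). A sketch for the simplified diagonal-covariance case (real FID uses full covariance matrices of Inception features and a matrix square root):

```python
import numpy as np

def fid_diagonal(mu1, var1, mu2, var2):
    # Frechet distance between Gaussians with diagonal covariances:
    # ||mu1 - mu2||^2 + sum(var1 + var2 - 2*sqrt(var1*var2))
    return np.sum((mu1 - mu2) ** 2) + np.sum(var1 + var2 - 2 * np.sqrt(var1 * var2))

mu = np.array([0.0, 1.0])
var = np.array([1.0, 4.0])
same = fid_diagonal(mu, var, mu, var)           # identical Gaussians -> 0.0
shifted = fid_diagonal(mu, var, mu + 3.0, var)  # mean shift of 3 per dim -> 18.0
```

Identical feature distributions give FID 0; the score grows with both mean and covariance mismatch, which is why smaller is better.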
We don't want memory GAN.

A generator whose outputs are the same as the real training data scores perfectly yet has merely memorized the training set; so does one that simply flips the real data. Evaluation should detect this. https://arxiv.org/pdf/1511.01844.pdf
To learn more about evaluation ……
"Pros and cons of GAN evaluation measures": https://arxiv.org/abs/1802.03446
Concluding Remarks

• Introduction of generative models
• Generative Adversarial Network (GAN): theory behind GAN, tips for GAN
• Conditional generation
• Learning from unpaired data
• Evaluation of generative models