Generation - NTU Speech Processing Laboratory

Hung-yi Lee

Network as Generator

A network that takes an input x together with a random vector z, sampled from a simple distribution, and outputs y acts as a generator: it transforms the simple distribution into a complex distribution over y. We know the simple distribution's formulation, so we can sample from it.

Why distribution?

Video Prediction: the network takes the previous frames as input and predicts the next frame y, trained against the real video.

Source: https://github.com/dyelax/Adversarial_Video_Generation

Why distribution?

Video Prediction: given the same previous frames, the training videos sometimes turn left and sometimes turn right, so a deterministic network cannot decide which single next frame to predict.

Source: https://github.com/dyelax/Adversarial_Video_Generation

Why distribution?

Video Prediction: adding a random input z from a simple distribution lets the network output a distribution over possible next frames rather than a single prediction.

Source: https://github.com/dyelax/Adversarial_Video_Generation

Why distribution?

• Especially for tasks that need "creativity": the same input has different acceptable outputs.

Drawing: ask the network to draw a "character with red eyes"; many different drawings are correct.

Chatbot: ask "Do you know who Kaguya is?"; valid replies include "She is the Shuchiin Academy student council…" and "She founded the era of ninja…".

Generative Adversarial Network (GAN)


GAN

โ€ข How to pronounce โ€œGANโ€?

Ask Google 小姐 (the Google text-to-speech voice).

All Kinds of GAN โ€ฆ https://github.com/hindupuravinash/the-gan-zoo

GAN

ACGAN

BGAN

DCGAN

EBGAN

fGAN

GoGAN

CGAN

โ€ฆโ€ฆ

Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed, โ€œVariational Approaches for Auto-Encoding Generative Adversarial Networksโ€, arXiv, 2017


Anime Face Generation

โ€ข Unconditional generation

The generator maps a low-dimensional vector z, sampled from a normal distribution (e.g., (0.1, −0.1, …, 0.7)), to a high-dimensional vector y, an image; the outputs y follow a complex distribution.

Discriminator: it is also a neural network (that is, a function). It takes an image as input and outputs a scalar: a larger value means the image looks real (e.g., 1.0), a smaller value means it looks fake (e.g., 0.1).

Basic Idea of GAN

Analogy: the generator and discriminator co-evolve like prey and predator. The discriminator learns cues such as "brown veins", "butterflies are not brown", "butterflies do not have veins", …, and the generator adapts to defeat each cue in turn.

Basic Idea of GAN

NN Generator v1 produces images, and Discriminator v1 learns to separate them from real images; Generator v2 then evolves to fool Discriminator v1, Discriminator v2 adapts, and so on with v3, …. This escalating competition is where the term "adversarial" comes from.

Basic Idea of GAN

• Written as "enemies", read as "friends" (寫作敵人，唸做朋友)

โ€ข Initialize generator and discriminator

โ€ข In each training iteration:

Step 1: Fix generator G, and update discriminator D.

Sample vectors and pass them through the (fixed) generator to obtain generated objects, and randomly sample real objects from the database. The discriminator learns to assign high scores (toward 1) to real objects and low scores (toward 0) to generated objects.

โ€ข Initialize generator and discriminator

โ€ข In each training iteration:

Step 2: Fix discriminator D, and update generator G.

A vector goes through the NN generator and then through the fixed discriminator, which outputs a score (e.g., 0.13). Generator plus discriminator can be seen as one large network with a hidden layer in between; only the generator's parameters are updated, so the generator learns to "fool" the discriminator by raising the score.

โ€ข Initialize generator and discriminator

โ€ข In each training iteration:

Learning D: fix G, sample some real objects from the database (label 1) and generate some fake objects from sampled vectors (label 0), then update D.

Learning G: feed sampled vectors through G into the fixed D, and update G so that D assigns its generated images scores close to 1.
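To make the two alternating steps concrete, here is a hypothetical toy 1-D GAN in plain Python (an illustrative sketch, not code from the lecture): the "generator" G(z) = a·z + b tries to match data drawn from a Gaussian around 3, the "discriminator" is a single logistic unit, and both are updated with hand-derived gradients following the algorithm above.

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy 1-D GAN: real data ~ N(3, 0.5); generator G(z) = a*z + b with
# z ~ N(0, 1); discriminator D(y) = sigmoid(w*y + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters
lr, batch = 0.05, 16

for it in range(1000):
    z = [random.gauss(0, 1) for _ in range(batch)]
    real = [random.gauss(3, 0.5) for _ in range(batch)]
    fake = [a * zi + b for zi in z]

    # Step 1: fix G, update D (real objects -> label 1, fake -> label 0).
    gw = gc = 0.0
    for y in real:
        d = sigmoid(w * y + c)
        gw += -(1 - d) * y; gc += -(1 - d)      # grad of -log D(y)
    for y in fake:
        d = sigmoid(w * y + c)
        gw += d * y; gc += d                    # grad of -log(1 - D(y))
    w -= lr * gw / batch; c -= lr * gc / batch

    # Step 2: fix D, update G to "fool" D (non-saturating loss -log D(G(z))).
    ga = gb = 0.0
    for zi, y in zip(z, fake):
        d = sigmoid(w * y + c)
        ga += -(1 - d) * w * zi; gb += -(1 - d) * w
    a -= lr * ga / batch; b -= lr * gb / batch
```

Running this, the generator's offset b drifts from 0 toward the data mean, exactly because each G step moves the fake samples toward the region the discriminator currently scores as real.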

Anime Face Generation

Generated samples after 100, 1,000, 2,000, 5,000, 10,000, 20,000, and 50,000 updates show the faces gradually taking shape.

Source of training data: https://zhuanlan.zhihu.com/p/24767059

The faces above were generated by the machine.

Images generated by: ๅณๅฎ—็ฟฐ, ่ฌๆฟฌไธž, ้™ณๅปถๆ˜Š, ้ŒขๆŸๅ‡

In 2019, with StyleGAN โ€ฆโ€ฆ

Source of video: https://www.gwern.net/Faces

Progressive GAN: https://arxiv.org/abs/1710.10196

Interpolating the input: feeding G a sequence of z values that move in even steps from 0.0 to 0.9 produces faces that morph smoothly between the two endpoints.

The first GAN: https://arxiv.org/abs/1406.2661 (Ian J. Goodfellow)

Today …… BigGAN: https://arxiv.org/abs/1809.11096

Theory behind GAN


Our Objective

G maps a normal distribution to the generated distribution P_G, which should be as close as possible to the data distribution P_data:

G* = arg min_G Div(P_G, P_data)

where Div(P_G, P_data) is the divergence between the distributions P_G and P_data. How can we compute this divergence?

c.f. ordinary training: w*, b* = arg min_{w,b} L

G* = arg min_G Div(P_G, P_data)

Sampling is good enough: although we do not know the formulations of P_G and P_data, we can sample from them. To sample from P_G, sample a vector from the normal distribution and pass it through G; to sample from P_data, draw images from the database.

Discriminator ๐บโˆ— = ๐‘Ž๐‘Ÿ๐‘”min๐บ

๐ท๐‘–๐‘ฃ ๐‘ƒ๐บ , ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž

Discriminator

: data sampled from ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž : data sampled from ๐‘ƒ๐บ

train

๐‘‰ ๐บ,๐ท = ๐ธ๐‘ฆโˆผ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž ๐‘™๐‘œ๐‘”๐ท ๐‘ฆ + ๐ธ๐‘ฆโˆผ๐‘ƒ๐บ ๐‘™๐‘œ๐‘” 1 โˆ’ ๐ท ๐‘ฆ

Objective Function for D

๐ทโˆ— = ๐‘Ž๐‘Ÿ๐‘”max๐ท

๐‘‰ ๐ท, ๐บTraining:

https://arxiv.org/abs/1406.2661

The value is related to JS divergence.

๐ทโˆ— = ๐‘Ž๐‘Ÿ๐‘”max๐ท

๐‘‰ ๐ท, ๐บ

negative cross entropy

Training classifier: minimize cross entropy

=

class 1

class 2 Train a binary classifier
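A quick numeric sketch of V(G, D), estimated from sampled discriminator scores (the score values here are hypothetical, just to show how V behaves):

```python
import math

def V(d_real_scores, d_fake_scores):
    # V(G, D) = E_{y~Pdata}[log D(y)] + E_{y~PG}[log(1 - D(y))],
    # estimated from samples. Maximizing V over D is the same as
    # minimizing binary cross-entropy for a real-vs-fake classifier.
    e_real = sum(math.log(d) for d in d_real_scores) / len(d_real_scores)
    e_fake = sum(math.log(1 - d) for d in d_fake_scores) / len(d_fake_scores)
    return e_real + e_fake

# A discriminator stuck at 0.5 (cannot tell real from fake) gives
# V = 2*log(0.5) = -2 log 2: the distributions look close.
print(V([0.5, 0.5], [0.5, 0.5]))      # -1.386...
# Confident correct scores push V toward 0: large divergence.
print(V([0.99, 0.99], [0.01, 0.01]))  # -0.020...
```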

Discriminator ๐บโˆ— = ๐‘Ž๐‘Ÿ๐‘”min๐บ

๐ท๐‘–๐‘ฃ ๐‘ƒ๐บ , ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž

Discriminator

: data sampled from ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž: data sampled from ๐‘ƒ๐บ

train

hard to discriminatesmall divergence

Discriminatortrain

easy to discriminatelarge divergence

๐ทโˆ— = ๐‘Ž๐‘Ÿ๐‘”max๐ท

๐‘‰ ๐ท, ๐บ

Training:

Small max๐ท

๐‘‰ ๐ท, ๐บ

35

Substituting max_D V(G, D) for the divergence gives the GAN objective:

G* = arg min_G max_D V(G, D)

The maximum objective value is related to JS divergence. This is exactly the algorithm above:

• Initialize generator and discriminator
• In each training iteration:
  Step 1: Fix generator G, and update discriminator D: D* = arg max_D V(D, G)
  Step 2: Fix discriminator D, and update generator G

Can we use another divergence? Yes: you can use the divergence you like ☺ (f-GAN, https://arxiv.org/abs/1606.00709)

GAN is difficult to train โ€ฆโ€ฆ

โ€ข There is a saying โ€ฆโ€ฆ

(I found this joke on ้™ณๆŸๆ–‡'s Facebook.)

Tips for GAN


JS divergence is not suitable

• In most cases, P_G and P_data do not overlap.

• 1. The nature of data: both P_data and P_G are low-dimensional manifolds in a high-dimensional space, so their overlap can be ignored.

• 2. Sampling: even if P_data and P_G do overlap, without enough samples the two sampled sets can still be separated cleanly.

๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž๐‘ƒ๐บ0 ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž๐‘ƒ๐บ1

๐ฝ๐‘† ๐‘ƒ๐บ0 , ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž= ๐‘™๐‘œ๐‘”2

๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž๐‘ƒ๐บ100

โ€ฆโ€ฆ

๐ฝ๐‘† ๐‘ƒ๐บ1 , ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž= ๐‘™๐‘œ๐‘”2

๐ฝ๐‘† ๐‘ƒ๐บ100 , ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž= 0

What is the problem of JS divergence?

โ€ฆโ€ฆ

JS divergence is always log2 if two distributions do not overlap.

Intuition: If two distributions do not overlap, binary classifier achieves 100% accuracy.

Equally bad

41.

The accuracy (or loss) means nothing during GAN training.
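The log 2 plateau can be checked numerically. A small sketch, assuming discrete distributions on a shared finite support (enough to show the effect):

```python
import math

def kl(p, q):
    # KL(p || q) over a shared finite support; convention 0*log(0/q) = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    # JS(p, q) = 0.5*KL(p || m) + 0.5*KL(q || m) with m = (p + q)/2.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Disjoint supports: JS is exactly log 2, no matter how far apart.
p  = [0.5, 0.5, 0.0, 0.0]
q0 = [0.0, 0.0, 0.5, 0.5]
print(js(p, q0))                 # 0.693... = log 2

# Once the supports overlap, JS starts to reflect similarity again.
q1 = [0.25, 0.25, 0.25, 0.25]
print(js(p, q1) < math.log(2))   # True
```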

Wasserstein distance

• Consider one distribution P as a pile of earth and another distribution Q as the target.
• The Wasserstein distance is the average distance the earth mover has to move the earth; for two point masses separated by distance d, W(P, Q) = d.

Wasserstein distance

Source of image: https://vincentherrmann.github.io/blog/wasserstein/

For more complex P and Q there are many possible "moving plans", some with smaller and some with larger average moving distance; the Wasserstein distance is defined using the moving plan with the smallest average distance.
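For one-dimensional samples the optimal moving plan simply pairs sorted points, so the Wasserstein distance can be sketched in a few lines (a toy illustration, not the estimator used in WGAN):

```python
def wasserstein_1d(xs, ys):
    # In 1-D the optimal "moving plan" pairs the sorted samples, so
    # W(P, Q) is the average distance between sorted sample points.
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# A point mass at 0 vs a point mass at d: W = d, and it shrinks
# smoothly as the distributions approach each other (unlike JS).
print(wasserstein_1d([0.0, 0.0], [5.0, 5.0]))   # 5.0
print(wasserstein_1d([0.0, 0.0], [1.0, 1.0]))   # 1.0
```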

๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž๐‘ƒ๐บ0 ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž๐‘ƒ๐บ1

๐ฝ๐‘† ๐‘ƒ๐บ0 , ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž= ๐‘™๐‘œ๐‘”2

๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž๐‘ƒ๐บ100

โ€ฆโ€ฆ

๐ฝ๐‘† ๐‘ƒ๐บ1 , ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž= ๐‘™๐‘œ๐‘”2

๐ฝ๐‘† ๐‘ƒ๐บ100 , ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž= 0

What is the problem of JS divergence?

๐‘Š ๐‘ƒ๐บ0 , ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž= ๐‘‘0

๐‘Š ๐‘ƒ๐บ1 , ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž= ๐‘‘1

๐‘Š ๐‘ƒ๐บ100 , ๐‘ƒ๐‘‘๐‘Ž๐‘ก๐‘Ž= 0

๐‘‘0 ๐‘‘1

โ€ฆโ€ฆ

โ€ฆโ€ฆ

Better!

44

Like evolution, the generator can only improve through many small steps; Wasserstein distance rewards each small step from P_G0 toward P_G100, whereas JS divergence does not. (c.f. https://www.pnas.org/content/104/suppl_1/8567.figures-only)

WGAN

max_{D ∈ 1-Lipschitz} { E_{y~P_data}[D(y)] − E_{y~P_G}[D(y)] }

This maximum evaluates the Wasserstein distance between P_data and P_G. How do we fulfill the constraint? D has to be smooth enough (1-Lipschitz): without the constraint, D would push its outputs to +∞ on real data and −∞ on generated data, and the training of D would not converge; keeping D smooth prevents D(y) from diverging to ±∞.

https://arxiv.org/abs/1701.07875

• Original WGAN → Weight Clipping: force each parameter w to lie between −c and c; after a parameter update, if w > c set w = c, and if w < −c set w = −c. (https://arxiv.org/abs/1701.07875)

• Improved WGAN → Gradient Penalty: keep the gradient norm close to 1 at samples between the real and generated data. (https://arxiv.org/abs/1704.00028)

• Spectral Normalization → keep the gradient norm smaller than 1 everywhere. (https://arxiv.org/abs/1802.05957)
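The weight-clipping step of the original WGAN is easy to sketch (a toy version operating on a flat list of parameters):

```python
def clip_weights(weights, c):
    # Original WGAN: after each parameter update, force every weight
    # into [-c, c], a crude way of keeping D roughly Lipschitz.
    return [max(-c, min(c, w)) for w in weights]

print(clip_weights([0.3, -1.2, 0.01, 2.0], 0.5))  # [0.3, -0.5, 0.01, 0.5]
```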

GAN is still challenging โ€ฆ

• Generator and discriminator need to match each other, like well-matched opponents (棋逢敵手): the generator generates fake images to fool the discriminator, and the discriminator tells the difference between real and fake. If the discriminator fails to improve ("I cannot tell the difference"), the generator receives no useful signal; if the generator fails to improve, it cannot fool the discriminator. Either way, training stalls.

More Tips

โ€ข Tips from Soumith

โ€ข https://github.com/soumith/ganhacks

• Tips in DCGAN: guidelines for network architecture design for image generation

โ€ข https://arxiv.org/abs/1511.06434

โ€ข Improved techniques for training GANs

โ€ข https://arxiv.org/abs/1606.03498

โ€ข Tips from BigGAN

โ€ข https://arxiv.org/abs/1809.11096


GAN for Sequence Generation

A seq2seq decoder generates a token sequence (e.g., "機 器 學 習"), a discriminator scores the generated sequence, and the score should be used to update the generator. But the max (or sampling) step that picks each token is non-differentiable: a small change Δ in the decoder's parameters changes the token distribution, yet the chosen tokens remain unchanged, so the score's gradient cannot flow back. Reinforcement learning (RL) can be applied instead (sequence generation GAN = RL + GAN), but RL is difficult to train and GAN is difficult to train, so the combination is especially hard.

GAN for Sequence Generation

• Usually, the generator is fine-tuned from a model learned by other approaches.

• However, with enough hyperparameter tuning and tips, ScratchGAN can train from scratch: "Training Language GANs from Scratch", https://arxiv.org/abs/1905.09922

Generative Models

โ€ข This lecture: Generative Adversarial Network (GAN)

Full version: https://www.youtube.com/playlist?list=PLJV_el3uVTsMq6JEFPW35BCiOQTsoqwNw

More Generative Models

Variational Autoencoder (VAE)

FLOW-based Model

https://youtu.be/uXY18nzdSsM
https://youtu.be/8zomhgKrsmQ

Possible Solution?

Using typical learning approaches: assign each training image its own latent vector (e.g., (0.1, −0.1, …, 0.7)) and train the network to map that vector back to its image with ordinary supervised learning.

Generative Latent Optimization (GLO): https://arxiv.org/abs/1707.05776
Gradient Origin Networks: https://arxiv.org/abs/2007.02798

Conditional Generation

A conditional generator takes a condition x together with a random z and produces y.

Given text conditions such as "red eyes", "black hair", "yellow hair", or "dark circles", the generator should draw matching characters. This is text-to-image generation, e.g., "red hair, green eyes" or "blue hair, red eyes".

Conditional GAN

G takes the condition x ("red eyes") and z sampled from a normal distribution, and outputs an image y = G(x, z). If D only takes y and outputs a scalar judging whether y is a real image (real images: 1, generated images: 0), the generator will learn to generate realistic images but completely ignore the input condition x.

Conditional GAN

A better D takes both x and y and outputs a scalar judging two things at once: y is realistic AND x and y are matched. Training examples: a true text-image pair (red eyes, matching real image) gets 1; (red eyes, generated image) gets 0; and (red eyes, mismatched real image) also gets 0.

https://arxiv.org/abs/1605.05396
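The pairing of conditions and labels for the better discriminator can be sketched as follows (hypothetical image identifiers, just to show the three kinds of training examples):

```python
def conditional_d_examples(condition, real_match, real_mismatch, generated):
    # A conditional discriminator scores (condition, image) pairs:
    # only a realistic image that also matches the condition gets 1.
    return [
        ((condition, real_match), 1),    # true text-image pair
        ((condition, generated), 0),     # generated image
        ((condition, real_mismatch), 0), # real image, wrong condition
    ]

examples = conditional_d_examples("red eyes", "img_red_eyes",
                                  "img_black_hair", "img_generated")
print([label for _, label in examples])  # [1, 0, 0]
```

The third kind of example is what forces the generator to respect the condition: without it, realistic but mismatched outputs would still be rewarded.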

Conditional GAN

Image translation, or pix2pix: the condition x is itself an image, and y = G(x, z). https://arxiv.org/abs/1611.07004

Conditional GAN

Testing compares the input, a purely supervised model, a pure GAN, and GAN + supervised: the supervised model is blurry, the pure GAN invents details, and combining the GAN objective with supervision works best. https://arxiv.org/abs/1611.07004

Conditional GAN

G๐‘ฅ: sound Image

"a dog barking sound"

Training Data Collection

video

https://arxiv.org/abs/1808.04108

Conditional GAN

• Sound-to-image: as the input sound becomes louder, the generated scene changes accordingly.

Demo: https://wjohn1483.github.io/audio_to_scene/index.html
The images are generated by Chia-Hung Wan and Shun-Po Chuang.

Conditional GAN

Talking Head Generation: https://arxiv.org/abs/1905.08233

A multi-label image classifier can itself be viewed as a conditional generator: the image is the input condition and the labels are the generated output. https://arxiv.org/abs/1811.04689

Learning from Unpaired Data


Learning from Unpaired Data

We want a deep network mapping x to y, but the training data is unpaired: inputs x1, x3, x5, x7, x9 and outputs y2, y4, y6, y8, y10, with no correspondence between them. HW3's pseudo labeling and HW5's back translation address similar situations, but they still need some paired data.

Learning from Unpaired Data

Image Style Transfer: map Domain X (e.g., photos of faces) to Domain Y (e.g., anime faces), where the two domains are completely unpaired. Can we learn the mapping without any paired data? This is unsupervised conditional generation.

Cycle GAN

A first attempt: train G_X→Y with a discriminator D_Y that outputs a scalar judging whether the input image belongs to domain Y. G's outputs then become similar to domain Y, but nothing stops G from ignoring its input entirely and producing any domain-Y image.

Cycle GAN

Cycle consistency: train G_X→Y together with G_Y→X so that G_Y→X(G_X→Y(x)) is as close as possible to x. If G_X→Y discarded its input, there would be a lack of information for reconstruction; the cycle constraint forces the output to stay "related" to the input, which makes reconstruction possible. The full Cycle GAN trains both directions symmetrically, with discriminators D_Y (belongs to domain Y or not) and D_X (belongs to domain X or not), and cycle consistency in both X→Y→X and Y→X→Y.
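The cycle-consistency idea can be sketched with toy one-dimensional "generators" (hypothetical functions standing in for the image-to-image networks):

```python
def cycle_consistency_loss(xs, g_x2y, g_y2x):
    # L_cyc = mean |G_Y->X(G_X->Y(x)) - x|: the round trip must come
    # back close to the input, so G_X->Y cannot ignore its input.
    total = 0.0
    for x in xs:
        total += abs(g_y2x(g_x2y(x)) - x)
    return total / len(xs)

# An invertible pair of toy "generators" reconstructs perfectly.
g_x2y = lambda x: 2 * x + 1
g_y2x = lambda y: (y - 1) / 2
print(cycle_consistency_loss([0.0, 1.0, -3.0], g_x2y, g_y2x))  # 0.0

# A generator that ignores its input cannot be undone: large cycle loss.
ignore = lambda x: 7.0
print(cycle_consistency_loss([0.0, 1.0, -3.0], ignore, g_y2x))  # 3.67...
```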

Cycle GAN: https://arxiv.org/abs/1703.10593
Dual GAN: https://arxiv.org/abs/1704.02510
Disco GAN: https://arxiv.org/abs/1703.05192
StarGAN: https://arxiv.org/abs/1711.09020
Selfie-to-anime: https://selfie2anime.com/ (https://arxiv.org/abs/1907.10830)

Text Style Transfer

A seq2seq generator G maps a negative sentence, e.g., "你真笨" ("You are so stupid", negative), to a positive one, "你真聰明" ("You are so smart", positive). A discriminator D judges whether G's output is positive or not.

Cycle GAN for text: a second seq2seq R reconstructs the original negative sentence from G's positive output, minimizing the reconstruction error; this cycle consistency keeps the transferred sentence related to the input.

• From negative sentences to positive ones (thanks to the student who provided these experimental results; originally in Chinese):

"Stomach ache, didn't sleep well, uncomfortable all over" → "Happy birthday, wide awake, super comfortable"
"I even feel like going to work, how mean!" → "I even feel like going to sleep, how handsome!"
"So annoying, went for barbecue and ran into a pervert" → "Haha nice ~, went for barbecue ~ and ran into a handsome guy"
"My stomach hurts badly" → "My happy birthday badly"

The same unpaired idea extends to other modalities:

• Unsupervised Abstractive Summarization (document → summary): https://arxiv.org/abs/1810.02851
• Unsupervised Translation (Language 1 ↔ Language 2): https://arxiv.org/abs/1710.04087, https://arxiv.org/abs/1710.11041
• Unsupervised ASR (audio → text): https://arxiv.org/abs/1804.00316, https://arxiv.org/abs/1812.09323, https://arxiv.org/abs/1904.04100

Evaluation of Generation


Quality of Image

โ€ข Human evaluation is expensive (and sometimes unfair/unstable).

โ€ข How to evaluate the quality of the generated images automatically?


Off-the-shelf Image Classifier

Feed a generated image y into an off-the-shelf image classifier (e.g., Inception net, VGG) to obtain the class distribution P(c|y). A concentrated distribution means higher visual quality.

Diversity - Mode Collapse: the generated data concentrates around a few points of the real data distribution.

Diversity - Mode Dropping: the generated distribution covers only part of the real data's modes; between iteration t and iteration t+1 the generator may simply switch which modes it covers (observed with BEGAN on CelebA).

CNN๐‘ฆ1

low diversity

CNN๐‘ฆ2

CNN๐‘ฆ3

โ€ฆโ€ฆ

class 1

class 2

class 3

class 1

class 2

class 3

class 1

class 2

class 3

๐‘ƒ ๐‘

=1

๐‘

๐‘›

๐‘ƒ ๐‘|๐‘ฆ๐‘›

class 1

class 2

class 3

๐‘ƒ ๐‘|๐‘ฆ1

๐‘ƒ ๐‘|๐‘ฆ2

๐‘ƒ ๐‘|๐‘ฆ3

Diversity

89

CNN๐‘ฆ1๐‘ƒ ๐‘|๐‘ฆ1

CNN๐‘ฆ2๐‘ƒ ๐‘|๐‘ฆ2

CNN๐‘ฆ3 ๐‘ƒ ๐‘|๐‘ฆ3

โ€ฆโ€ฆ

class 1

class 2

class 3

class 1

class 2class 3

class 1class 2

class 3

๐‘ƒ ๐‘

=1

๐‘

๐‘›

๐‘ƒ ๐‘|๐‘ฆ๐‘›

Uniform means higher variety

Inception Score (IS): Good quality, large diversity โ†’ Large IS

What is the problem here? โ˜บ
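Putting quality and diversity together, the Inception Score can be sketched from per-image class distributions (toy 3-class distributions here; the real IS uses the Inception network's outputs over 1000 classes):

```python
import math

def inception_score(class_probs):
    # class_probs[n] = P(c | y_n) from an off-the-shelf classifier.
    # IS = exp( mean_n KL( P(c|y_n) || P(c) ) ), where P(c) is the
    # marginal over all generated images. Sharp per-image distributions
    # (quality) plus a flat marginal (diversity) give a large score.
    n = len(class_probs)
    k = len(class_probs[0])
    marginal = [sum(p[c] for p in class_probs) / n for c in range(k)]
    kl_sum = 0.0
    for p in class_probs:
        kl_sum += sum(pc * math.log(pc / mc)
                      for pc, mc in zip(p, marginal) if pc > 0)
    return math.exp(kl_sum / n)

sharp_diverse   = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # quality + variety
sharp_collapsed = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]  # mode collapse
print(inception_score(sharp_diverse))    # 3.0 (maximum for 3 classes)
print(inception_score(sharp_collapsed))  # 1.0 (minimum)
```

One visible problem: a generator that produces many different images of the same class (e.g., all anime faces) gets a low IS even if the faces are diverse, because the classifier's classes do not match the kind of diversity we care about.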

Fréchet Inception Distance (FID): pass real images (red points) and generated images (blue points) through a CNN and take the vector just before the softmax as the representation; model each set as a Gaussian; FID is the Fréchet distance between the two Gaussians. Smaller is better, but treating the features as Gaussians is questionable and many samples are needed. https://arxiv.org/pdf/1706.08500.pdf

"Are GANs Created Equal? A Large-Scale Study": https://arxiv.org/abs/1711.10337 (FID: smaller is better)
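In one dimension the Fréchet distance between Gaussians has a closed form, which is enough to sketch what FID computes (the real FID applies the multivariate version to CNN features; this is only an illustration):

```python
import math

def frechet_distance_1d(mu1, var1, mu2, var2):
    # Fréchet distance between two 1-D Gaussians:
    # d^2 = (mu1 - mu2)^2 + var1 + var2 - 2*sqrt(var1*var2).
    # FID uses the multivariate analogue on Gaussians fitted to the
    # CNN features of real and generated images.
    return (mu1 - mu2) ** 2 + var1 + var2 - 2 * math.sqrt(var1 * var2)

print(frechet_distance_1d(0.0, 1.0, 0.0, 1.0))  # 0.0 (identical Gaussians)
print(frechet_distance_1d(0.0, 1.0, 3.0, 1.0))  # 9.0 (means 3 apart)
```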

We don't want a "memory GAN". A generator whose outputs are the same as the real data, or are simply flipped copies of the real data, would score well on these measures while generating nothing new. https://arxiv.org/pdf/1511.01844.pdf

To learn more about evaluation: "Pros and cons of GAN evaluation measures", https://arxiv.org/abs/1802.03446

Concluding Remarks

Introduction of Generative Models

Generative Adversarial Network (GAN)

Theory behind GAN

Tips for GAN

Conditional Generation

Learning from unpaired data

Evaluation of Generative Models
