
Multi-objective training of Generative Adversarial Networks with multiple discriminators

Isabela Albuquerque∗, João Monteiro∗, Thang Doan, Breandan Considine, Tiago Falk, and Ioannis Mitliagkas

∗Equal contribution


The multiple discriminators GAN setting

▶ Recent literature proposed tackling GAN training instability* issues with multiple discriminators (Ds):

  1. Generative multi-adversarial networks, Durugkar et al. (2016)
  2. Stabilizing GANs training with multiple random projections, Neyshabur et al. (2017)
  3. Online Adaptative Curriculum Learning for GANs, Doan et al. (2018)
  4. Domain Partitioning Network, Csaba et al. (2019)

*Mode collapse or vanishing gradients


Our work

min L_G(z) = [l_1(z), l_2(z), ..., l_K(z)]ᵀ

▶ Each l_k = −E_{z∼p_z} log D_k(G(z)) is the loss provided by the k-th discriminator (a minimal sketch follows this list)


▶ Multiple gradient descent (MGD) is a natural choice to solve this problem
▶ But it might be too costly
▶ Alternative: maximize the hypervolume (HV) of a single solution
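
To make the loss vector concrete, here is a minimal PyTorch sketch (an illustration, not the authors' implementation; `G`, `discriminators`, and the ε-stabilizer are assumptions):

```python
import torch

def generator_losses(G, discriminators, z, eps=1e-8):
    """Loss vector [l_1, ..., l_K]: each l_k = -E_{z~p_z} log D_k(G(z))."""
    fake = G(z)  # one generator forward pass shared by all K discriminators
    return [-(torch.log(D(fake) + eps)).mean() for D in discriminators]
```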


Multiple gradient descent

▶ Seeks a Pareto-stationary solution
▶ Two steps:

  1. Find a common descent direction for all l_k
     1.1 The minimum-norm element within the convex hull of all ∇l_k(x)
  2. Update the parameters with x_{t+1} = x_t − λ w*_t / ‖w*_t‖, where

     w*_t = argmin ‖w‖²,  w = ∑_{k=1}^{K} α_k ∇l_k(x_t),  s.t. ∑_{k=1}^{K} α_k = 1, α_k ≥ 0 ∀k
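
Step 1 is a small quadratic program over the simplex. A minimal NumPy/SciPy sketch (my illustration of the formulation above, not the authors' code; `min_norm_direction` is a hypothetical helper name):

```python
import numpy as np
from scipy.optimize import minimize

def min_norm_direction(grads):
    """Minimum-norm element of the convex hull of the per-loss gradients.

    grads: array of shape (K, D), one flattened gradient per loss l_k.
    Returns (alpha*, w*), with w* = sum_k alpha*_k grads[k].
    """
    K = grads.shape[0]
    G = grads @ grads.T  # Gram matrix, so that ||w||^2 = alpha^T G alpha

    res = minimize(
        lambda a: a @ G @ a,                 # minimize ||w||^2
        np.full(K, 1.0 / K),                 # start from the uniform mixture
        method="SLSQP",
        bounds=[(0.0, 1.0)] * K,             # alpha_k >= 0
        constraints=({"type": "eq", "fun": lambda a: a.sum() - 1.0},),
    )
    return res.x, res.x @ grads

# Two conflicting toy gradients; the common descent direction splits them:
alpha, w = min_norm_direction(np.array([[1.0, 0.0], [0.0, 1.0]]))
# alpha ≈ [0.5, 0.5], w ≈ [0.5, 0.5]
```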


Hypervolume maximization for training GANs

[Figure: loss space with axes L_{D1} and L_{D2}; a single solution with losses l_1 and l_2, nadir points η and η*, and the hypervolume L_G between the solution and η.]

L_G = −log( ∏_{k=1}^{K} (η − l_k) ) = −∑_{k=1}^{K} log(η − l_k)

∂L_G/∂θ = ∑_{k=1}^{K} 1/(η − l_k) · ∂l_k/∂θ

Nadir point update: η_t = δ max_k {l_k^t},  δ > 1
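
A minimal PyTorch sketch of this generator loss (an illustration of the slide's formulas, not the authors' released code; the per-discriminator non-saturating loss and the ε-stabilizers are assumptions):

```python
import torch

def hv_generator_loss(d_outputs, delta=1.1, eps=1e-8):
    """Single-solution hypervolume maximization loss for the generator.

    d_outputs: list of K tensors D_k(G(z)) in (0, 1), one per discriminator.
    Each term is weighted by 1 / (eta - l_k) in the gradient, so losses
    close to the nadir point eta dominate the update.
    """
    losses = torch.stack([-(torch.log(d + eps)).mean() for d in d_outputs])
    # Adaptive nadir point eta_t = delta * max_k l_k^t, treated as a constant.
    eta = delta * losses.max().detach()
    return -torch.log(eta - losses + eps).sum()
```

Taking δ close to 1 concentrates the update on the currently worst loss, while larger δ makes the weighting more uniform across discriminators.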


MGD vs. HV maximization vs. Average loss minimization

▶ MGD seeks a Pareto-stationary solution
  ▶ x_{t+1} ≺ x_t
▶ HV maximization seeks Pareto-optimal solutions
  ▶ HV(x_{t+1}) > HV(x_t)
  ▶ For the single-solution case, central regions of the Pareto front are preferred
▶ Average loss minimization does not enforce equally good individual losses
  ▶ Might be problematic when there is a trade-off between discriminators


MNIST

▶ Same architecture, hyperparameters, and initialization for all methods
▶ 8 Ds, 100 epochs
▶ FID was calculated using a LeNet trained on MNIST to 98% test accuracy (a minimal FID sketch follows this list)
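
For reference, FID between two sets of feature vectors follows the standard Fréchet formula; a minimal NumPy sketch (assuming LeNet features are already extracted; not the authors' evaluation script):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    """FID between two sets of feature vectors (rows are examples)."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop small imaginary parts from sqrtm
    return float(np.sum((mu_r - mu_f) ** 2)
                 + np.trace(cov_r + cov_f - 2.0 * covmean))
```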

[Figure: FID on MNIST for AVG, GMAN, HV, and MGD (left); best FID achieved during training vs. wall-clock time until best FID, in minutes, for HV, GMAN, MGD, and AVG (right).]


Upscaled CIFAR-10 - Computational cost

▶ Different GANs with both 1 and 24 Ds + HV
▶ Same architecture and initialization for all methods
▶ Comparison of minimum FID obtained during training, along with computation cost in terms of time and space

Model      # Disc.   FID-ResNet   FLOPS*   Memory
DCGAN          1        4.22       8e10      1292
              24        1.89       5e11      5671
LSGAN          1        4.55       8e10      1303
              24        1.91       5e11      5682
HingeGAN       1        6.17       8e10      1303
              24        2.25       5e11      5682

*Floating point operations per second

▶ Additional cost → performance improvement


Cats 256 × 256

[Figure: generated 256 × 256 cat samples.]


Thank you!

Questions? Come to our poster! #4

Code: https://github.com/joaomonteirof/hGAN
