
COVENTRY UNIVERSITY
BSc Games Technology

PIXEL ART GENERATION WITH GENERATIVE ADVERSARIAL NETWORKS

Author: Georgi Kabadzhov
Student ID: 6359353
Course: BSc Games Technology
Supervisor: Will Blewit
Institution: Department of Computing, Coventry University
Date of Submission: 29.04.2019


Contents

Abstract
1 Introduction
2 Related Work
2.1 Asset creation
2.2 The cost of Art in Games
2.3 Procedurally Generated Content
2.4 Neural Networks
2.5 Unsupervised generative models
2.6 Extensions and variations of GAN
2.7 Deep Learning Frameworks
2.8 Pixel art
2.9 Evaluation Metrics
2.10 Analyzing the results and survey
2.11 Contributions
3 Methodology
3.1 Required knowledge – GAN
3.2 Deep Convolutional GAN (DCGAN)
3.2.1 Weight Initialization
3.2.2 Generator
3.2.3 Discriminator
3.2.4 Loss Function and Optimizers
3.2.5 Training
3.3 Auxiliary classifier GAN
3.3.1 Weight initialization
3.3.2 Generator
3.3.3 Discriminator
3.3.4 Loss functions and Optimization
3.3.5 Training
3.4 Information Maximizing GAN
3.4.1 Mutual Information for Inducing Latent Codes
3.4.2 Variational Mutual Information Maximization
3.4.3 Implementation
3.5 Variational Auto Encoder
3.5.1 Variational lower bound
3.5.2 Model structure
4 Evaluation
4.1 DCGAN
4.1.1 Generator and Discriminator Loss during training
4.1.2 Celeb-A
4.1.3 Pixel Art Dataset
4.2 AC-GAN
4.2.1 Pixel art Dataset
4.3 InfoGAN
4.3.1 Manipulating latent codes on 3D Faces
4.3.2 Pixel Art Dataset
4.4 VAE
4.4.1 Pixel Art
4.5 Survey Results
4.6 Results
5 Discussion
5.1 Achievements
5.2 What went wrong
6 Reflection on social, legal and ethical issues
6.1 Project Management
6.2 Supervisor feedback
6.3 Future Improvements
6.4 Ethical issues
7 Conclusion
8 Bibliography
9 Appendices
9.1 Detailed project proposal
9.1.1 Research Question
9.1.2 Literature Review
9.1.3 Client
9.1.4 Primary Research Plan
9.1.5 Intended Project Outcome
9.1.6 Bibliography
9.2 Full Source Code
9.2.1 DCGAN
9.2.2 AC-GAN
9.2.3 InfoGAN
9.3 Survey
9.4 Supervisor feedback


Abstract

A Generative Adversarial Network (GAN) is a class of machine learning system. This technique can generate photographs that look at least superficially authentic to human observers, having many realistic characteristics. It is a form of unsupervised learning. In this thesis, the efficiency of different GAN models is compared. The metrics used are based on real-world data, with the goal of producing pixel art sprites from already existing game assets. The compared models are trained on the same dataset, consisting of only 700 images, which in machine learning terms is considered a very small dataset. The tested models are compared on accuracy, generator and discriminator loss, and training time. The goal of this research paper is to determine whether GANs are suitable as a procedural generation tool for games.

1 Introduction

The Generative Adversarial Network (GAN) model (Goodfellow, 2016) is a powerful subclass of generative models and has been successfully applied to image generation and editing, semi-supervised learning, and domain adaptation. In the GAN framework, the model learns a deterministic transformation G of a sample distribution Pz, with the goal of matching the data distribution Pd. This learning problem may be viewed as a two-player game between the generator, which learns how to generate samples that resemble real data, and a discriminator, which learns how to discriminate between real and fake data. Both players aim to minimize their own cost, and the solution to the game is a Nash equilibrium, where neither player can improve their cost unilaterally.

Various flavours of GANs have recently been proposed (Lucic et al., 2018), both purely unsupervised and conditional. While these models achieve compelling results in specific domains, there is still no clear consensus on which GAN algorithms perform objectively better than others.

The goal of this research is to determine whether Generative Adversarial Networks are suitable for procedurally generating pixel art. To achieve this, several different GAN models are compared, using a small dataset and taking into consideration the time needed to produce art.

2 Related Work

2.1 Asset creation

Game assets is a term used to refer to every piece of work that goes into making a game, from 2D sprites to sound effects and programming scripts. This paper is concerned with the current state of art production. In his GDC talk, Naughty Dog's Andrew Maximov discusses what video game art production will look like in the near future and, most importantly, how it will affect game company art departments in that timespan: "Technology forces upon us a conversation about what is valuable about Art". In his talk he addresses the limitations of the early days of game development, giving the example of colour being a technical resource. His point is that technology from the past is obviously imperfect, but he also questions what the current limitations of technology are. At the end of his talk he suggests that the future of game art production might, and probably should, engage machine learning and neural networks for procedural content creation. An example of a company that uses neural networks is the game studio Remedy. In their game Quantum Break, famous for its realistic facial animations, Remedy used a neural-network-based animation solver.


2.2 The cost of Art in Games

According to an article in the Games Industry Career Guide, game artist salaries start around 35,000 USD per year for entry-level positions and can grow to as much as 90,000 USD per year, or even higher for senior or lead positions. The main way to break down the numbers is to look at experience. The figures below show the average salaries for game artists with various years of experience (Figure 2.2.1) and the average salaries by job title (Figure 2.2.2).

Figure 2.2.1 Artist salary based on experience (Bay, 2019)

Figure 2.2.2 Artist salaries by job title (Bay, 2019)


As these figures show, the cost of creating game art is not low at all, and for this reason companies resort to using procedurally generated content.

2.3 Procedurally Generated Content

In the right game – Diablo, Rogue, Spelunky, Daggerfall, Elite, Spore and even the likes of Football Manager – procedurally generated content is magical. It elevates the design and highlights the elegance of the core system loops. As shown above, the price of hiring an artist can reach high numbers, so well-designed procedurally generated content can save time and earn money. Procedural content creation is a key part of why games like Skyrim and Minecraft are able to attract and retain a huge player base that keeps playing years after release. The most common use of procedurally generated content is creating environments: games such as Spelunky, Diablo, No Man's Sky and Rogue rely heavily on their procedurally generated content to provide an immersive experience with great variety. (Richard Moss, 2016)

2.4 Neural Networks

An artificial neural network is a system of hardware or software patterned after the working of neurons in the human brain and nervous system. Artificial neural networks are a variety of deep learning technology, which comes under the broad domain of Artificial Intelligence.

Deep learning is a branch of machine learning which uses different types of neural networks. These algorithms are inspired by the way our brain functions, and therefore many experts believe they are our best shot at moving towards real AI. Non-technical people often wonder whether any games use neural networks to provide high-level AI, but this is rarely the case in the industry. Most AAA games still use finite state machines for their AI, as this provides immersive content while using a technique that is easier to debug and has more predictable behaviour in general. However, neural networks definitely have their role in games, often as animation solvers or for content generation. (Mehta, 2019)


Figure 2.4.3 Different types of Neural Networks (Mehta, 2019)

2.5 Unsupervised generative models

The original Generative Adversarial Network (GAN) is a generative model. There are roughly two methods in generative modelling. One is parametric learning, which brings a hypothetical distribution close to the true distribution; this is usually achieved with a Variational Auto Encoder (VAE). The other common approach is non-parametric learning, achieved with GANs. Both have their advantages and disadvantages: while the VAE approach is considered an elegant solution, it often tends to produce blurry images; GANs, on the other hand, produce very high-quality generated images, but have difficulty translating a random vector into a desired high-dimensional sample. (Jiayu Wang, 2019)

2.6 Extensions and variations of GAN

Over the last couple of years, developers have been coming up with different extensions and variations of GANs. Some of the more popular ones are:

cGAN (Conditional GAN) – the generator and discriminator are conditioned on some external information such as class labels (Mirza, 2014) https://arxiv.org/pdf/1411.1784.pdf


DCGAN (Deep Convolutional GAN) – builds the GAN network using convolutional network components (Radford et al., 2016) https://arxiv.org/pdf/1511.06434.pdf

BiGAN (Bidirectional GAN) – projects data back into the latent space, which is useful for auxiliary supervised discrimination tasks (Bao et al., 2018) https://arxiv.org/pdf/1805.07862.pdf

InfoGAN – an information-theoretic extension of GAN. It allows learning of meaningful representations, which are competitive with representations learned by some supervised methods (Chen et al., 2016) https://arxiv.org/pdf/1606.03657.pdf

SRGAN – a GAN that recovers photo-realistic textures from down-sampled images (Ledig et al., 2017) https://arxiv.org/pdf/1609.04802.pdf

SeqGAN – a sequence generation framework that directly generates sequential synthetic data; it has achieved significant performance in speech and music generation (Yu et al., 2017) https://arxiv.org/pdf/1609.05473.pdf

Introspective Adversarial Network – achieves accurate reconstructions without loss of feature quality, and improved generalization performance with Orthogonal Regularization (Brock et al., 2017) https://arxiv.org/pdf/1609.07093.pdf

Some engineers have attempted combining autoencoders with generative models and obtained impressive results. Examples are the Auto-Encoder Generative Adversarial Network (AEGAN; Jiayu Wang et al., 2018), the Deep Recurrent Attentive Writer (DRAW; Karol Gregor et al., 2014) and the Variational Auto-Encoded Generative Adversarial Network (VAEGAN; Anders Larsen et al., 2015), alongside multiple implementations of VAE or GAN alone.

2.7 Deep Learning Frameworks

When it comes down to machine learning, there are two main frameworks running the market: Facebook's PyTorch and Google's TensorFlow. At its core, the competition between the two is fueled by their similarity. Both frameworks:

- are open-source libraries for high-performance numerical computation
- are supported by a large tech company (Facebook and Google respectively)
- have a strong and active supporting community
- are Python-based
- use graphs to represent the flow of data and operations
- are well documented

Taking all of this into account, almost anything created in one of the frameworks can be replicated in the other at a similar cost. In 2019, a survey on these two frameworks was run by the company Slashdata.co. They asked developers who said they are involved in data science (DS) or machine learning (ML) which of the two frameworks they are using, how they are using them, and what else they do in their professional life.

Of 3,000 developers involved in ML or DS, 43% use PyTorch or TensorFlow. Those 43% are not equally distributed between the two frameworks: TensorFlow's user base is 3.4 times larger than PyTorch's. A total of 86% of ML developers and data scientists said they are currently using TensorFlow, while only 11% were using PyTorch. Moreover, more than 50% of the PyTorch community also uses TensorFlow, while only 15% of the TensorFlow community also uses PyTorch. It would seem that TensorFlow is a must, while PyTorch is a nice-to-have.

Figure 2.7.4 Survey based on a sample of 1,616 ML developers and data scientists

2.8 Pixel art

As mentioned in chapter 2.1, the early days of game development had heavy technical limitations. This led to the birth of pixel art as a style, originating from classic arcade games such as Space Invaders (1978) and Pac-Man (1980). The term pixel art was first published by Adele Goldberg and Robert Flegal of Xerox Palo Alto Research Center in 1982. The concept, however, goes back about 11 years before that, for example in Richard Shoup's SuperPaint system in 1972, also at Xerox PARC.

With the increasing use of 3D graphics in games, pixel art lost some of its use. Despite that, it remains a very active professional and amateur area, since mobile phones and other portable devices still have low resolutions and therefore require skillful use of space and memory. Improvements in current technology have evolved pixel art, allowing better detail and animation than previously attainable.

Modern pixel art has been seen as a reaction to the 3D graphics industry by amateur game/graphic hobbyists. Many retro enthusiasts choose to mimic the style of the past. Some view the pixel art revival as restoring the golden age of second- and third-generation consoles, where it is argued graphics were more aesthetically pleasing. Pixel art remains popular and has been used in social networking virtual worlds such as Citypixel and Habbo, on hand-held devices such as the Nintendo DS, Nintendo 3DS, PSP, PS Vita and mobile phones, and in modern indie games such as Hotline Miami and FTL: Faster Than Light.


Pixel art has evolved a lot through the ages. As Devonte Griffins says in his article "History of Pixel Arts", "the highs and lows pixel art faced eventually resulted in developers using pixel art as an artistic choice instead of a shortcut for primitive tech". Since pixel art is a simple and straightforward style, the expectation is that the neural network should be fully capable of producing satisfying results.


2.9 Evaluation Metrics

Despite the widespread interest in GANs, only a few works have studied metrics that quantitatively evaluate GAN performance. The paper "Are GANs Created Equal? A Large-Scale Study" (Lucic et al., 2018) proposes the following approach: a few necessary conditions for producing meaningful scores are set out at the start, such as distinguishing real from generated samples, identifying mode dropping and mode collapse, and detecting overfitting. The authors found that the kernel Maximum Mean Discrepancy (MMD) and 1-Nearest-Neighbour (1-NN) two-sample tests satisfy most of the desirable properties, provided that the distances between samples are computed in a suitable feature space. Another popular metric is the loss of the discriminator and generator, as well as their accuracy.

2.10 Analyzing the results and survey

When a reasonable amount of data has been produced, a survey will be taken. There are multiple techniques for analyzing survey results, some of which are discussed in "Approaches to the Analysis of Survey Data". The survey will make use of a technique known as indicators. Indicators are used as summary measures: a good indicator should synthesize information and serve as a reasonable measure of quality. As of now, the indicators to be used in this research are still unclear; they will be defined by the time the network is ready. Those indicators will be discussed with people from different fields of the industry to ensure they produce valid feedback. (Statistical Survey Centre, 2001)

2.11 Contributions

GANs and neural networks can be applied in multiple fields. In the game industry, mainly behavioural neural networks have been researched, such as the chess AI DeepChess (David et al., 2016) and the League of Legends AI DeepLeague (Farza, 2018). These are capable of playing extremely competitively against humans. However, using generative models to create procedurally generated game content is not a very popular concept, and the objective of this thesis is to understand how suitable they are for such a task. Chapter 3 summarises the theory of GANs. Chapter 4 compares different GAN extensions; a simulation of the models on the Celeb-A faces dataset is included to verify successful results, as well as a simulation on the pixel art character dataset.

3 Methodology

In this chapter, four different models are considered for generating data. It is important to note that none of these models is my own work: each subchapter of the methodology refers to its own paper, and I provide an explanation of each model's background and implementation. Note that much of the information below is given in the form of citations, since there is no good way to paraphrase mathematical theory.


3.1 Required knowledge – GAN

Goodfellow et al. introduced the Generative Adversarial Network (GAN), a framework for training generative models using a minimax algorithm. The goal is to learn a generator distribution PG(x) that matches the real data distribution Pdata(x). Instead of trying to explicitly assign a probability to every x in the data distribution, a GAN trains a generator network G that generates samples from the generator distribution PG by transforming a noise variable z ~ Pnoise(z) into a sample G(z). This generator is trained by opposing an adversarial discriminator network D, whose aim is to distinguish between samples from the true data distribution Pdata and the generator's distribution PG. For a given generator, the optimal discriminator is D(x) = Pdata(x) / (Pdata(x) + PG(x)). (Goodfellow et al., 2014) More formally, the minimax game can be expressed as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{noise}(z)}[\log(1 - D(G(z)))]$$

Figure 3.1.5 The GAN minimax objective (Goodfellow et al., 2014)

3.2 Deep Convolutional GAN (DCGAN)

A DCGAN is a direct extension of the GAN described above, except that it explicitly uses convolutional and convolutional-transpose layers in the discriminator and the generator, respectively (Radford et al., 2016). The discriminator is made up of strided convolutional layers (Figure 3.2.6), batch norm layers (Figure 3.2.7) and leaky rectified linear unit (LeakyReLU) activations (cf. the ReLU function in Figure 3.2.8). The input is a 3x64x64 image, and the output is a scalar probability that the input came from the real data distribution. (Inkawhich, 2017)

Figure 3.2.6 2D convolution over an input signal composed of several input planes. ⋆ is the valid 2D cross-correlation operator, N is the batch size, C denotes the number of channels, H is the height of the input planes in pixels and W is the width in pixels.

Figure 3.2.7 Batch normalization over a 4D input (a mini-batch of 2D inputs with an additional channel dimension) (Ioffe et al., 2015)

Figure 3.2.8 Rectified Linear Unit (ReLU) function

The generator consists of convolutional-transpose layers, batch norm layers and ReLU activations. The input is a latent vector Z, drawn from a standard normal distribution, and the output is a 3x64x64 RGB image. The strided convolutional-transpose layers allow the latent vector to be transformed into a volume with the same shape as an image.


3.2.1 Weight Initialization

As specified by the authors of the DCGAN paper, all model weights are randomly initialized from a Normal distribution with mean = 0 and stdev = 0.02. A weight initialization function takes an initialized model as input and reinitializes all convolutional, convolutional-transpose and batch normalization layers to meet this criterion. This function is applied to the models immediately after initialization.
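A minimal sketch of such a function in PyTorch, following the convention of the cited tutorial (the name weights_init is illustrative):

```python
import torch.nn as nn

def weights_init(m):
    # reinitialize conv, conv-transpose and batch-norm layers per the DCGAN paper
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:          # matches Conv2d and ConvTranspose2d
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:   # scale ~ N(1, 0.02), bias = 0
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
```

It would be applied immediately after construction with, for example, netG.apply(weights_init).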

3.2.2 Generator

The generator, G, is designed to map the latent space vector Z to data-space. Since the data consists of images, converting Z to data-space ultimately means creating an RGB image with the same size as the training images (i.e. 3x64x64). In practice, this is accomplished through a series of strided two-dimensional convolutional-transpose layers, each paired with a 2D batch norm layer and a ReLU activation. The output of the generator is fed through a Tanh function to return it to the input data range [-1, 1].

Figure 3.2.9 Tanh (hyperbolic tangent) activation function – the Tanh function is mainly used for classification between two classes

It is worth noting the batch norm functions placed after the conv-transpose layers, as this is a critical contribution of the DCGAN paper. These layers help with the flow of gradients during training.

Figure 3.2.10 An image of DCGAN generator.
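A minimal sketch of such a generator in PyTorch (the layer widths nz=100, ngf=64, nc=3 are the common convention from the cited tutorial, not necessarily the exact values used in the appendix code):

```python
import torch.nn as nn

nz, ngf, nc = 100, 64, 3  # latent size, generator feature maps, colour channels

netG = nn.Sequential(
    # input: latent vector Z of shape (nz, 1, 1)
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
    nn.BatchNorm2d(ngf * 8), nn.ReLU(True),                     # -> (ngf*8, 4, 4)
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 4), nn.ReLU(True),                     # -> (ngf*4, 8, 8)
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 2), nn.ReLU(True),                     # -> (ngf*2, 16, 16)
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf), nn.ReLU(True),                         # -> (ngf, 32, 32)
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
    nn.Tanh()                                                   # -> (3, 64, 64) in [-1, 1]
)
```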

3.2.3 Discriminator

The discriminator, D, is a binary classification network that takes an image as input and outputs a scalar probability that the input image is real (as opposed to fake). Here, the discriminator takes a 3x64x64 input image, processes it through a series of convolutional, batch-norm and LeakyReLU layers, and outputs the final probability through a Sigmoid activation function (Figure 3.2.11).

Figure 3.2.11 Sigmoid function – well suited to probability prediction, since its output lies between 0 and 1
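A matching sketch of the discriminator (ndf=64 is an assumed convention; the DCGAN paper recommends LeakyReLU with slope 0.2 in the discriminator):

```python
import torch.nn as nn

ndf, nc = 64, 3  # discriminator feature maps, colour channels

netD = nn.Sequential(
    # input: image of shape (3, 64, 64)
    nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
    nn.LeakyReLU(0.2, inplace=True),                            # -> (ndf, 32, 32)
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, inplace=True),   # -> (ndf*2, 16, 16)
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, inplace=True),   # -> (ndf*4, 8, 8)
    nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, inplace=True),   # -> (ndf*8, 4, 4)
    nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
    nn.Sigmoid()                                                # -> scalar probability
)
```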

3.2.4 Loss Function and Optimizers

With the discriminator and the generator set up, a loss function and optimizers must be specified to determine how D and G will learn. For the loss function, this model uses the Binary Cross Entropy (BCE) function (Figure 3.2.12 and Figure 3.2.13).

$$\ell(y, \hat{y}) = -\big[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\big]$$

Figure 3.2.12 Binary Cross Entropy function

Figure 3.2.13 Binary Cross Entropy graph – it measures the loss of a classification model whose output is a probability value between 0 and 1.

Since both log components of the objective function (i.e. log(D(x)) and log(1−D(G(z)))) appear in the BCE formula, it is possible to select which part of the equation to use through the y input. This is accomplished in the training loop.

The real label is defined as 1 and the fake label as 0. These labels are used when calculating the losses of D and G, and this is also the convention used in the original GAN paper. Finally, two separate optimizers are set up – one for D and one for G. As specified in the DCGAN paper, both are Adam optimizers with learning rate 0.0002 and Beta1 = 0.5. To keep track of the generator's learning progression, a fixed batch of latent vectors drawn from a Gaussian distribution (i.e. fixed noise) is generated. In the training loop, this fixed noise is periodically fed into G, and over the iterations images form out of the noise.
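A sketch of this setup, reusing the netG and netD sketches above (hyperparameters as stated; variable names are illustrative):

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

criterion = nn.BCELoss()              # binary cross entropy for real vs. fake
real_label, fake_label = 1.0, 0.0

# fixed noise used to visualize G's progress over the course of training
fixed_noise = torch.randn(64, 100, 1, 1, device=device)

# Adam optimizers with lr = 0.0002 and beta1 = 0.5, as in the DCGAN paper
optimizerD = optim.Adam(netD.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=0.0002, betas=(0.5, 0.999))
```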

3.2.5 Training

For the training of this model, Algorithm 1 from Goodfellow's paper is used. Separate mini-batches for real and fake images are constructed, and G's objective function is adjusted to maximize log(D(G(z))). Training is split into two main phases: Phase 1 updates the discriminator and Phase 2 updates the generator.

During Phase 1, the goal of training the discriminator is to maximize the probability of correctly classifying a given input as real or fake. In Goodfellow's terms, the goal is to "update the discriminator by ascending its stochastic gradient". Practically, this means maximizing log(D(x)) + log(1−D(G(z))). Following the separate mini-batch suggestion from the GanHacks repository (Chintala, 2016), this is calculated in two steps. First, a batch of real samples is constructed from the training set, passed through D, the loss log(D(x)) is calculated, and the gradients are computed in a backward pass. Then a batch of fake samples is constructed with the current generator, passed through D, the loss log(1−D(G(z))) is calculated, and the gradients are accumulated with another backward pass. With the gradients accumulated from both the all-real and all-fake batches, a step of the discriminator's optimizer is called.

In Phase 2 the goal is to train the generator. The original paper says that the generator should be trained by minimizing log(1−D(G(z))) in an effort to generate better fakes. However, as shown by Goodfellow, this does not provide sufficient gradients, especially early in the learning process.

Instead, the goal is to maximize log(D(G(z))). This is accomplished by classifying the generator output from Phase 1 with the discriminator, computing G's loss using the real labels, computing G's gradients in a backward pass, and finally updating G's parameters with an optimizer step.

At the end of each epoch, the fixed noise batch is pushed through the generator to visually track the progress of G's training. The training statistics reported are: the discriminator's loss, calculated as the sum of the losses for the all-real and all-fake batches (log(D(x)) + log(1−D(G(z)))); the generator's loss; the average discriminator output (across the batch) for the all-real batch; and the average discriminator output for the all-fake batch. The average D output for the all-real batch should start near 1 and theoretically converge to 0.5 as the generator gets better; the average D output for the all-fake batch starts near 0 and likewise converges to 0.5 as G improves.
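A condensed sketch of the two-phase loop described above, assuming a dataloader that yields (images, labels) batches and the objects defined in the previous sketches; it mirrors the structure of the cited tutorial rather than the exact appendix code:

```python
num_epochs = 100  # as in the experiments in chapter 4

for epoch in range(num_epochs):
    for images, _ in dataloader:
        real = images.to(device)
        b = real.size(0)

        # Phase 1: update D -- maximize log(D(x)) + log(1 - D(G(z)))
        netD.zero_grad()
        out_real = netD(real).view(-1)
        lossD_real = criterion(out_real, torch.full((b,), real_label, device=device))
        lossD_real.backward()

        noise = torch.randn(b, 100, 1, 1, device=device)
        fake = netG(noise)
        out_fake = netD(fake.detach()).view(-1)   # detach: G is not updated here
        lossD_fake = criterion(out_fake, torch.full((b,), fake_label, device=device))
        lossD_fake.backward()                     # gradients accumulate with the real pass
        optimizerD.step()

        # Phase 2: update G -- maximize log(D(G(z))) by using the real labels
        netG.zero_grad()
        out = netD(fake).view(-1)
        lossG = criterion(out, torch.full((b,), real_label, device=device))
        lossG.backward()
        optimizerG.step()
```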

3.3 Auxiliary classifier GAN

“The auxiliary classifier GAN (AC-GAN) is a variant of the GAN architecture. In the AC-GAN, every generated sample has a corresponding class label, C ~ Pc, in addition to the noise Z. The generator uses both to generate images Xfake = G(c, z). The discriminator gives both a probability distribution over sources and a probability distribution over the class labels, P(S | X), P(C | X) = D(X). The objective function has two parts: the log-likelihood of the correct source, LS, and the log-likelihood of the correct class, LC.” (Odena et al., 2017)


$$L_S = \mathbb{E}[\log P(S = real \mid X_{real})] + \mathbb{E}[\log P(S = fake \mid X_{fake})]$$
$$L_C = \mathbb{E}[\log P(C = c \mid X_{real})] + \mathbb{E}[\log P(C = c \mid X_{fake})]$$

Figure 3.3.14 AC-GAN objective function (Odena et al., 2017)

The discriminator is trained to maximize LS + LC, while the generator is trained to maximize LC − LS. AC-GAN learns a representation for Z that is independent of the class label (Odena et al., 2017).

Structurally, this model is not tremendously different from other existing models. However, this modification to the standard GAN formulation is found to stabilize training and produce excellent results, as shown in the original AC-GAN paper.

3.3.1 Weight initialization

This model makes no change to the weight initialization described above: all model weights are randomly initialized from a Normal distribution with mean = 0 and stdev = 0.02.

3.3.2 Generator

The AC-GAN generator works similarly to the DCGAN one. It consists of a series of strided two-dimensional convolutional-transpose layers, paired with 2D batch norm layers and ReLU activations. The main difference between AC-GAN and DCGAN is that the AC-GAN model adds an upsampling layer. Upsampling is the process of inserting zero-valued samples between original samples to increase the sampling rate. The output of the generator is fed through a Tanh function to return it to the input data range [-1, 1].

3.3.3 Discriminator

The discriminator, D, consists of blocks, each with its own layers. The input is taken through a series of convolutional, batch-norm and ReLU layers. The AC-GAN makes use of a dropout function, which prevents overfitting. The final probabilities are output through Sigmoid and Softmax activation functions. The Softmax activation turns numbers (logits) into probabilities that sum to one: it outputs a vector that represents the probability distribution over a list of potential outcomes.

3.3.4 Loss functions and Optimization

Similar to DCGAN, the adversarial loss in AC-GAN is calculated using the Binary Cross Entropy function. The auxiliary loss is calculated using the standard Cross Entropy loss, since it deals with multiple classes. The optimizer parameters are the same as in the DCGAN model.
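A sketch of the two loss terms and the half-sum convention used in training below (it assumes a discriminator that returns a (validity, class_logits) pair; the names are illustrative):

```python
import torch.nn as nn

adversarial_loss = nn.BCELoss()         # source loss Ls: real vs. fake
auxiliary_loss = nn.CrossEntropyLoss()  # class loss Lc: multi-class labels

def discriminator_loss(netD, real_imgs, real_labels, fake_imgs, fake_labels, valid, fake):
    # each term mixes the adversarial and auxiliary losses in equal parts
    real_validity, real_logits = netD(real_imgs)
    d_real = (adversarial_loss(real_validity, valid) +
              auxiliary_loss(real_logits, real_labels)) / 2
    fake_validity, fake_logits = netD(fake_imgs.detach())
    d_fake = (adversarial_loss(fake_validity, fake) +
              auxiliary_loss(fake_logits, fake_labels)) / 2
    # total loss: sum of the real and fake losses divided by two
    return (d_real + d_fake) / 2
```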

3.3.5 Training

The training of AC-GAN begins with configuring the input and the adversarial ground truths (real and fake). Sampled noise and labels are passed as generator input. From these, a batch of images is generated, and the generator's loss measures its ability to fool the discriminator. The discriminator is then fed the generated batch, and the losses for fake and real images are calculated. The total discriminator loss is calculated as the sum of the real loss and the fake loss, divided by two. Finally, for every batch, the discriminator accuracy is calculated to be used as a performance metric.

3.4 Information Maximizing GAN

“The Information Maximizing GAN (InfoGAN) is an information-theoretic extension of GANs that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. A lower bound of the mutual information objective is derived to make it efficient to optimize. The original paper shows that InfoGAN achieves outstanding results on digit shapes in the MNIST dataset and on pose and lighting in 3D rendered face images. Previous experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by supervised methods.” (Chen et al., 2016)

3.4.1 Mutual Information for Inducing Latent Codes

“The standard GAN formulation uses a simple factored continuous input noise Z, while imposing no restrictions on the manner in which the generator may use this noise. As a result, it is possible that the noise will be used by the generator in a highly entangled way, causing the individual dimensions of Z to not correspond to semantic features of the data. However, many domains naturally decompose into a set of semantically meaningful factors of variation. The method proposed in "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets" (Chen et al., 2016) is, rather than using a single unstructured noise vector, to decompose the input noise vector into two parts: Z, which is treated as a source of incompressible noise, and C, which is called the latent code and targets the salient structured semantic features of the data distribution. Mathematically, the set of structured latent variables is denoted C1, C2, …, CL. In its simplest form, a factored distribution may be assumed, given by $P(c_1, c_2, \ldots, c_L) = \prod_{i=1}^{L} P(c_i)$. For ease of notation, the latent code C denotes the concatenation of all latent variables Ci.” (Chen et al., 2016)

“Providing the generator network with both the incompressible noise Z and the latent code C allows the model to discover latent factors in an unsupervised way, so the form of the generator becomes G(Z,C). However, in a standard GAN, the generator is free to ignore the additional latent code C by finding a solution satisfying PG(X|C) = PG(X). An information-theoretic regularization is applied to avoid such trivial codes: there should be high mutual information between the latent codes C and the generator distribution G(Z,C), so I(C;G(Z,C)) should be high. In information theory, the mutual information between X and Y, I(X;Y), measures the "amount of information" learned about the random variable X from knowledge of the other random variable Y. The mutual information can be expressed as the difference of two entropy terms (Figure 3.4.15).” (Chen et al., 2016)

$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$$

Figure 3.4.15 Mutual information expression (Chen et al., 2016)

3.4.2 Variational Mutual Information Maximization

“In practice, the mutual information term I(c;G(z,c)) is hard to maximize directly, as it requires access to the posterior P(c|x). Fortunately, a lower bound of it can be obtained by defining an auxiliary distribution Q(c|x):


$$I(c; G(z,c)) \ge \mathbb{E}_{x \sim G(z,c)}\big[\mathbb{E}_{c' \sim P(c \mid x)}[\log Q(c' \mid x)]\big] + H(c)$$

Figure 3.4.16 Variational Information Maximization (Chen et al., 2016)

The technique of lower-bounding mutual information this way is called Variational Information Maximization. Using this lower bound bypasses the problem of having to compute the posterior P(c|x), but the network still must be capable of sampling from the posterior in the inner expectation. A simple lemma is proposed by the authors of the original paper, which removes the need to sample from the posterior.” (Chen et al., 2016)

Lemma 5.1 For random variables X, Y and function f(x,y) under suitable regularity conditions:

$$\mathbb{E}_{x \sim X,\, y \sim Y|x}[f(x, y)] = \mathbb{E}_{x \sim X,\, y \sim Y|x,\, x' \sim X|y}[f(x', y)]$$

By using Lemma 5.1, the variational lower bound LI(G,Q) of the mutual information I(c;G(z,c)) can be defined:

$$L_I(G, Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z,c)}[\log Q(c \mid x)] + H(c) \le I(c; G(z,c))$$

“Note that LI(G,Q) is easy to approximate with Monte Carlo simulation. In particular, LI can be maximized w.r.t. Q directly and w.r.t. G via the reparameterization trick. Hence, LI(G,Q) can be added to the GAN objective with no change to the GAN training procedure, and the final algorithm is called the Information Maximizing Generative Adversarial Network (InfoGAN).

Another important factor to note is that the lower bound becomes tight as the auxiliary distribution Q approaches the true posterior distribution, i.e. as $\mathbb{E}_x\big[D_{KL}\big(P(\cdot \mid x) \,\|\, Q(\cdot \mid x)\big)\big] \to 0$.

In addition, it is known that for discrete latent codes, when the variational lower bound attains its maximum LI(G,Q) = H(c), the bound becomes tight and the maximal mutual information is achieved. Hence, InfoGAN is defined as the following minimax game with a variational regularization of mutual information and hyperparameter λ:

$$\min_{G, Q} \max_D V_{InfoGAN}(D, G, Q) = V(D, G) - \lambda L_I(G, Q)$$

” (Chen et. al., 2016)

3.4.3 Implementation

“In practice, the auxiliary distribution Q is parametrized as a neural network. In most experiments, Q and D share all convolutional layers, with one final fully connected layer outputting the parameters of the conditional distribution Q(c|x), which means InfoGAN only adds a negligible computation cost to GAN.


Another observation made in the original paper is that LI(G,Q) always converges faster than the normal GAN objectives, and hence InfoGAN essentially comes for free with GAN.

For a categorical latent code ci, a softmax nonlinearity is used to represent Q(ci|x). For a continuous latent code cj, there are more options, depending on the true posterior P(cj|x). Existing experiments have shown that simply treating Q(cj|x) as a factored Gaussian is sufficient.

Even though InfoGAN introduces an extra hyperparameter λ, it is easy to tune, and simply setting it to 1 is sufficient for discrete latent codes. When the latent code contains continuous variables, a smaller λ is typically used to ensure that λLI(G,Q), which now involves differential entropy, is on the same scale as the GAN objectives.

Since GANs are known to be difficult to train, the experiments are designed based on existing techniques introduced by DCGAN, which are enough to stabilize InfoGAN training, so no new tricks need to be introduced.” (Chen et al., 2016)
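A minimal sketch of how Q can share the discriminator trunk and output a categorical code distribution (the shapes and layer widths here are illustrative, not the paper's exact architecture):

```python
import torch.nn as nn

n_classes = 10  # size of the categorical latent code, c ~ Cat(K=10)

# shared convolutional trunk used by both D and Q
trunk = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
    nn.Flatten(),
)
feat_dim = 128 * 16 * 16  # for 64x64 inputs after two stride-2 convolutions

d_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())  # real/fake probability
q_head = nn.Linear(feat_dim, n_classes)                       # logits for Q(c|x)
# a softmax over q_head's output represents the categorical distribution Q(c|x)
```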

3.5 Variational Auto Encoder

The Variational Auto Encoder will be used as a performance benchmark, since a project using our dataset already exists (Almousli et al., 2017). This model is a joint distribution over two types of random variables: observed random variables X, related to the data used, and latent random variables Z. The idea behind this approach is three-fold. First, the distribution of X is modelled as the marginal of the joint distribution, so that complex distributions of X can emerge from simpler joint distributions. Second, representations of the possibly high-dimensional X are obtained using Z. Finally, Z is used as the "knobs" for generating samples of X.

Consequently, the three possible use cases of such a model are: to compute or obtain a sample from P(X); for a certain "knobs" configuration Z, to compute or obtain a sample from P(X|Z); or, for a given data point X, to compute or obtain a sample from P(Z|X). These use cases reveal the computational structure associated with the joint distribution. It is usually the case that one marginal or conditional distribution is computationally feasible while another is not. Thus, when using P(Z|X), for example, the question becomes not only one of mathematical significance but also of computational feasibility.

Generally speaking, the modeling is done by writing the joint distribution as a product of computationally feasible parameterized factors, for example:

$$P_\theta(X, Z) = P_\theta(Z)\, P_\theta(X \mid Z)$$

where Pθ(Z) and Pθ(X|Z) can each be computed easily.

However, the training of this model implies finding the θ which maximizes the log-likelihood of the marginal

$$\log P_\theta(X) = \log \int P_\theta(Z)\, P_\theta(X \mid Z)\, dZ$$

This is usually infeasible.

To understand why this term might be computationally infeasible, assume Pθ(Z) is the standard Gaussian distribution N(0, I); obviously, this term is very feasible. Assume also that the conditional Pθ(X|Z) is the Gaussian N(fθ(Z), σ²I), where fθ(Z) is a deep neural network whose weights and biases are represented by θ. Again, computing fθ(Z) needs only a forward pass of the network, which is feasible; it is the integral over Z above that makes the marginal infeasible.


3.5.1 Variational lower bound

“To train this model, an approximation of Pθ(X) is used. One way to do that is by finding a feasible lower bound L(X) for log Pθ(X). Thus, by maximizing the lower bound, the likelihood is also being maximized.

One way to derive this lower bound is to notice that the joint distribution Pθ(X,Z) can be written in the following two ways:

$$P_\theta(X, Z) = P_\theta(X)\, P_\theta(Z \mid X) = P_\theta(Z)\, P_\theta(X \mid Z)$$

If Pθ(X) is computationally infeasible, this implies that Pθ(Z|X) is also computationally infeasible, since the obvious way of computing Pθ(Z|X) requires the computation of Pθ(X).

The second key idea is to approximate Pθ(Z|X) by some feasible distribution qφ(Z|X).

The term KL(qφ(Z|X) ‖ Pθ(Z|X)) represents the approximation error.

By taking the log of the last formula and rearranging the terms, the formula becomes

$$\log P_\theta(X) = \log \frac{P_\theta(X, Z)}{q_\phi(Z \mid X)} + \log \frac{q_\phi(Z \mid X)}{P_\theta(Z \mid X)}$$

This formula holds for any X, Z such that P(X,Z) > 0. Thus, the expectation of both sides can be taken with respect to an appropriate distribution of Z. One natural choice is qφ(Z|X), because, given the formulation, it is the closest known distribution to the infeasible Pθ(Z|X) and therefore results in the tightest bound:

$$\log P_\theta(X) = \mathbb{E}_{q_\phi(Z \mid X)}\!\left[\log \frac{P_\theta(X, Z)}{q_\phi(Z \mid X)}\right] + D_{KL}\big(q_\phi(Z \mid X) \,\|\, P_\theta(Z \mid X)\big)$$

Since the KL term is non-negative, the first term on the right-hand side is a lower bound for the marginal log Pθ(X), and is denoted

$$\mathcal{L}_{\theta,\phi}(X) = \mathbb{E}_{q_\phi(Z \mid X)}\!\left[\log \frac{P_\theta(X, Z)}{q_\phi(Z \mid X)}\right]$$

“ (Almousli, 2017)
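A minimal PyTorch sketch of this bound as it is usually computed in practice, using a single reparameterized sample and a fixed-variance Gaussian decoder (the benchmark project instead predicts a mean and sigma per pixel, so this is a simplification):

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # sample z = mu + sigma * eps with eps ~ N(0, I), keeping gradients w.r.t. mu, sigma
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def elbo(x, x_recon, mu, logvar):
    # L = E_q[log P(X|Z)] - KL(q(Z|X) || P(Z)), with prior P(Z) = N(0, I)
    recon_term = -F.mse_loss(x_recon, x, reduction='sum')
    # closed-form KL between N(mu, sigma^2) and N(0, I), summed over dimensions
    kl_term = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term - kl_term  # maximize this (or minimize its negative)
```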

3.5.2 Model structure

The model's goal is to capture the joint probability distribution of the pixels and to be able to generate new images that are not in the training set. This is achieved by maximizing the lower bound, as mentioned above. Batch normalization is applied after every layer to accelerate training, and a rectifier is used for non-linearity. The model consists of:

- Input layer of 128x192x3 RGB images; images are transformed to HSV before they are fed to the model.
- Convolutional layer with a (50, 3, 3) kernel, followed by (3, 3) max pooling.
- Convolutional layer with a (60, 3, 3) kernel, followed by (2, 2) max pooling.
- Latent layer with 36 normally distributed latent variables, each with its own mean mu and standard deviation sigma, reshaped to a 6x6 grid.
- Transposed convolutional layer with a (20, 5, 5) kernel.
- Fully connected layer of 200 neurons.
- Output layer with mu and sigma for each pixel.

The figure below shows the model structure:

Figure 3.5.17 VAE model
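A rough sketch of the encoder half of this architecture as described (the padding choices and flattened size are assumptions, so lazily-initialized linear layers are used for the two latent heads):

```python
import torch.nn as nn

encoder = nn.Sequential(
    # input: 3 x 128 x 192 HSV image
    nn.Conv2d(3, 50, 3, padding=1), nn.BatchNorm2d(50), nn.ReLU(True),
    nn.MaxPool2d(3),
    nn.Conv2d(50, 60, 3, padding=1), nn.BatchNorm2d(60), nn.ReLU(True),
    nn.MaxPool2d(2),
    nn.Flatten(),
)

# two heads produce the mean and log-variance of the 36 latent variables
fc_mu = nn.LazyLinear(36)      # infers the flattened input size on first use
fc_logvar = nn.LazyLinear(36)
```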

4 Evaluation

The goal of the experiments is to investigate which model produces the best-quality images given the dataset size, how much time it takes, and what the failure rate is. For the evaluation of the results, all models were trained on a dataset consisting of 737 images of pixel art characters in sixteen different positions. Each image is a 128x192x3 RGB image. Since the original image size took too long to compute, for the GAN models the images are resized in code to 64x64x3 RGB images. Results from other datasets, provided in each model's research paper, are also compared. An online survey was given to a small test group consisting of artists currently studying on a Masters course in game development. The questions in the survey were designed to evaluate the artists' need for a tool that generates pixel art sprites: the number of assets usually produced for a game, the time given by the studio for producing those assets, and how often they need to change the size of the images. The whole survey is provided in the appendix.
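One plausible way of doing the in-code resize with torchvision (the dataset path is a placeholder, and ImageFolder assumes the images sit in class subfolders; normalization to [-1, 1] matches the Tanh output range of the generators):

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

transform = T.Compose([
    T.Resize(64),          # downscale the 128x192 sprites
    T.CenterCrop(64),      # to 64x64 as used by the GAN models
    T.ToTensor(),
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # map pixel values to [-1, 1]
])

dataset = ImageFolder("data/pixel_art", transform=transform)  # hypothetical path
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
```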

4.1 DCGAN

The experiments with the DCGAN are run over the Celeb-A faces dataset and the pixel art dataset. Two graphs follow the generator and discriminator loss during training, and they show one of the biggest issues with DCGAN.


4.1.1 Generator and Discriminator Loss during training

Figure 4.1.18 Generator and Discriminator loss graph

Figure 4.1.19 Generator and Discriminator loss graph (Inkawhich, 2017)

The implementations of Figure 4.1.18 and Figure 4.1.19 are identical, using the same hyperparameters. This means that DCGAN training is inconsistent: it does not produce satisfying results every time. There is also a risk of overfitting the images if the network trains for too long.

4.1.2 Celeb-A

The first experiment was run over the Celeb-A dataset, for 100 epochs with a batch size of 128 images. Training took 26 hours, using 50,000 images.


Figure 4.1.20 Original data on the left and generated images on the right.

4.1.3 Pixel Art Dataset

The graph below shows the discriminator and generator loss on the pixel art set. Notice that halfway through, the model failed, but around the 350th iteration it managed to recover.

Below you can see the comparison between generated pixel art and real images.


As you can see, this model's training shows symptoms of overfitting. All of the generated images look very similar, which suggests the generator may have found a single solution that cheats the discriminator, producing only copies of it. This training took 100 epochs, with a batch size of 32, and around 8 hours to complete.

4.2 AC-GAN

In the experiments done in the original AC-GAN paper (Odena et al., 2017), the authors concluded that synthesizing higher resolution images leads to increased discriminability: "The 128x128 model achieves an accuracy of 10.1% ± 2.0% versus 7.0% ± 2.0% with samples resized to 64x64 and 5.0% ± 2.0% with samples resized to 32x32. In other words, downsizing the outputs of the AC-GAN to 32x32 and 64x64 decreases visual discriminability by 50% and 38% respectively. Furthermore, 84.4% of the ImageNet classes, which is the dataset used for the experiments, have higher accuracy at 128x128 than at 32x32." (Odena et al., 2017)


Figure 4.2.21 AC-GAN experiments provided by the original AC-GAN paper (Odena et al., 2017)

4.2.1 Pixel Art Dataset

Below you can see generated pixel art after 100 epochs with a batch size of 32. Even though the quality is poor, you can clearly see there is no case of overfitting, as this model provides unique characters in every batch. You can notice some failures, but in general the results are satisfying. This model is also considerably faster than DCGAN, as this training took around 6 hours to complete.

4.3 InfoGAN

The goal of the experiment proposed by the original InfoGAN paper is to investigate whether mutual information can be maximized efficiently. To evaluate whether the mutual information between the latent codes c and the generated images G(z, c) can be maximized efficiently with the proposed method, the InfoGAN was trained on the MNIST dataset with a uniform categorical distribution on latent codes c ~ Cat(K = 10, p = 0.1). In the graph below, you can see the training of a regular GAN with an auxiliary distribution Q as a baseline, where the generator is not explicitly encouraged to maximize the mutual information with the latent codes.
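For intuition, the following is a minimal sketch of how such a categorical latent code can be sampled and concatenated with the noise vector, matching the c ~ Cat(K = 10, p = 0.1) setup described above; the batch size and dimensions are illustrative.

import numpy as np
import torch

K = 10           # number of categories in the categorical code
latent_dim = 62  # incompressible noise dimension, as in the appendix code

# Sample a batch of categorical codes c ~ Cat(K=10, p=0.1) as one-hot vectors.
batch_size = 4
labels = np.random.randint(0, K, batch_size)
c = np.zeros((batch_size, K), dtype=np.float32)
c[np.arange(batch_size), labels] = 1.0

# Incompressible noise z ~ N(0, I); the generator input is the concatenation [z, c].
z = torch.randn(batch_size, latent_dim)
gen_input = torch.cat((z, torch.from_numpy(c)), dim=1)  # shape: (4, 72)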

Figure 4.3.22 Comparing InfoGAN with a standard GAN (Chen et al., 2016)

4.3.1 Manipulating latent codes on 3D Faces

In the original research, the effect of the continuous latent factors on the outputs is shown as their values vary from -1 to 1. In (a), one of the continuous latent codes consistently captures the azimuth of the face across different shapes; in (b), the continuous code captures elevation; in (c), the continuous code captures the orientation of lighting; and finally in (d), the continuous code learns to interpolate between wide and narrow faces while preserving other visual features.
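The sweep from -1 to 1 can be reproduced in a few lines. The sketch below, modelled on the sample_image helper in the appendix InfoGAN code, varies one continuous code while holding the noise and the categorical code fixed; the commented generator call stands in for a trained InfoGAN generator.

import numpy as np
import torch

n_steps = 10
latent_dim, n_classes, code_dim = 62, 10, 2

# Fixed noise and a fixed one-hot categorical code for every step of the sweep.
z = torch.randn(1, latent_dim).repeat(n_steps, 1)
label = torch.zeros(n_steps, n_classes)
label[:, 0] = 1.0

# Vary the first continuous code from -1 to 1 while keeping the second at zero.
sweep = np.linspace(-1, 1, n_steps)[:, np.newaxis]
code = torch.from_numpy(
    np.concatenate((sweep, np.zeros((n_steps, 1))), axis=1).astype(np.float32)
)

# images = generator(z, label, code)  # one image per value of the varied code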

Figure 4.3.23 InfoGAN faces (Chen et al., 2016)


4.3.2 Pixel Art Dataset

Below you can see the results on the pixel art dataset. As you can see, there might be a case of overfitting, which would become more obvious over a higher number of epochs. The generated characters are very close to each other, and because of the image quality, we cannot be sure whether this is the case.

4.4 VAE

4.4.1 Pixel Art

The Variational Auto Encoder provides outstanding results and as such is used as the benchmark of success. Below you can see example images provided by the authors of the VAE project.

Figure 4.4.24 Samples of the generated characters with VAE (Almousli et al., 2017)


These samples were achieved by training the VAE using an MLP with 20 latent variables.
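Since no VAE code is included in the appendix, the snippet below is an assumption-based sketch of such a model in PyTorch, combining the details given in this report: fully connected layers of 200 neurons, a 20-dimensional latent space, and an output layer producing a mu and sigma for each pixel. Any layer sizes and activations beyond those details are guesses.

import torch
import torch.nn as nn

class PixelArtVAE(nn.Module):
    def __init__(self, n_pixels=64 * 64 * 3, hidden=200, latent=20):
        super().__init__()
        # Encoder: a 200-unit MLP producing the mu and log-variance of the
        # 20-dimensional latent distribution.
        self.enc = nn.Sequential(nn.Linear(n_pixels, hidden), nn.ReLU())
        self.enc_mu = nn.Linear(hidden, latent)
        self.enc_logvar = nn.Linear(hidden, latent)
        # Decoder: mirrors the encoder and outputs a mu and log-variance
        # for every pixel, as in the model description.
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU())
        self.dec_mu = nn.Linear(hidden, n_pixels)
        self.dec_logvar = nn.Linear(hidden, n_pixels)

    def forward(self, x):
        h = self.enc(x.flatten(start_dim=1))
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        # Reparameterisation trick: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        d = self.dec(z)
        return self.dec_mu(d), self.dec_logvar(d), mu, logvar

# Usage on a batch of 64x64x3 images:
x = torch.rand(8, 3, 64, 64)
recon_mu, recon_logvar, mu, logvar = PixelArtVAE()(x)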

4.5 Survey Results

The survey took place with a very limited number of participants. Sixteen artists, currently doing their master's degrees in "Game Art", took part in the survey. Each of them answered four questions; the full survey can be found in the appendix. Below are the results of the survey.

The results of the survey do not show a lot of positive responses. It turns out that artists are not eager to adopt such software to help with their work.


4.6 Results

From the recorded data, GANs are suited for a procedural generation tool. However, they tend to work slowly, and training is often unpredictable and inconsistent. But the biggest problem with GAN models is the required data. Game studios can rarely rely on existing datasets in specific art styles, and producing the amount of assets needed for guaranteed results may be more expensive than hand drawing them. The survey gives some interesting perspective too. Artists are often given very tight deadlines, which means training a neural network on a new dataset is quite inefficient. On top of that, their size guides are often changed, and changing a GAN's output size is not exactly a trivial task.

From the analyzed data, AC-GAN has shown the best results; however, it is nowhere near the quality produced by the Variational Auto Encoder. A good direction for future research is to compare the performance of combinations of the two, such as VAEGAN, AEGAN and others.

5 Discussion

5.1 Achievements

I have been interested in neural networks since I started programming, and this research was my first academic opportunity to delve into the topic. I would say that reading through that many papers definitely gave me a good understanding of Deep Learning, and this is a field I will keep exploring. Something I did not expect when starting this project is how much math would be involved. However, instead of being scared off by it, I decided to work hard. In retrospect, I would say that after submitting this research, I have a better understanding of programming, math and deep learning.

5.2 What went wrong

This research started off slowly, since I had to change my supervisor in the middle of the semester. Even though I was following the schedule shown in my project proposal, I was not absolutely clear on what kind of research I was doing until very far into the semester. This resulted in time wasted researching papers that were not directly related to my research.

Another big problem, which I did not consider when starting this project, was the framework and language used. I am not an experienced Python developer, which means I had to spend a decent amount of time revisiting coding in Python. I also did not do proper research into the framework I should use. As pointed out in Chapter 2.7 Deep Learning Frameworks, Google's Tensorflow seems to be more popular by a big margin. Instead, I decided to work in PyTorch, which resulted in hard-to-solve issues. At the end of this research, I have understood that the Tensorflow community is far more active, and if that had been my choice of framework, I would have had an easier time solving problems. One of the issues I had code-wise was resizing output images to a size which is not really standard (128x192x3).

The third problem was the fact that I did not manage to apply the evaluation metrics shown in Chapter 2.9 Evaluation Metrics. I feel like I understand the theory behind those metrics, but in practice I was not able to apply Maximum Mean Discrepancy (MMD) and 1-Nearest-Neighbour (1-NN) as performance tests.
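For reference, the snippet below is a generic sketch of how MMD could be computed between batches of real and generated images with an RBF kernel; it is not the exact test described in Chapter 2.9, and the bandwidth is an arbitrary choice.

import torch

def mmd_rbf(x, y, bandwidth=1.0):
    """Biased MMD^2 estimate between two batches of flattened images."""
    def kernel(a, b):
        # Pairwise squared distances, then RBF kernel k(a, b) = exp(-d^2 / 2h^2).
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Usage: flatten batches of real and generated 64x64x3 images first.
real = torch.rand(32, 64 * 64 * 3)
fake = torch.rand(32, 64 * 64 * 3)
print(mmd_rbf(real, fake).item())  # closer to 0 means more similar distributions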

The last and main issue is the fact that I did not take into consideration how much time is spent on training the networks. As seen in Chapter 4, all of the images generated from the pixel art dataset show very poor quality. The reason for this is that during half of the semester I was training models without actually saving the progress of each model. This means that every time I tried new hyperparameters, the network was learning from zero instead of continuing the progress of the previous training runs.
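Saving that progress would only have taken a few lines. The sketch below shows the standard PyTorch checkpointing pattern; the stand-in model, file name and variable names are illustrative rather than taken from the project code.

import torch
import torch.nn as nn
import torch.optim as optim

netG = nn.Linear(100, 3 * 64 * 64)  # stand-in for the DCGAN generator
optimizerG = optim.Adam(netG.parameters(), lr=0.0001)
epoch = 0

# After each epoch, persist the model and optimizer state to disk.
torch.save({
    "epoch": epoch,
    "netG": netG.state_dict(),
    "optimizerG": optimizerG.state_dict(),
}, "checkpoint_G.pth")

# On a later run, restore the state instead of training from zero.
checkpoint = torch.load("checkpoint_G.pth")
netG.load_state_dict(checkpoint["netG"])
optimizerG.load_state_dict(checkpoint["optimizerG"])
start_epoch = checkpoint["epoch"] + 1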

6 Reflection on social, legal and ethical issues

6.1 Project Management

Since the beginning, the project has been going off schedule. The first model implementation was done in the middle of March, while according to the time schedule I proposed in my project proposal, the first model should have been up and running by the end of February. The initial estimates of how long each task would take proved to be incorrect, as the time for training the models was not taken into consideration. There were also some technical limitations that had an effect on the schedule, such as being able to train only a single model on my laptop. If all of the models had started their training earlier, I would have been able to provide much higher quality images. The lack of high quality images also had its effect on the survey, as the initial plan was to ask the participants to pick a sprite from a combination of human-drawn and computer-generated options.

6.2 Supervisor feedback

The feedback received from my supervisor Will Blewit was quite helpful. It gave the research paper a clear direction, which it did not have initially. Thanks to the received feedback, the project goal shifted from developing a single neural network to comparing the performance of different models.

The feedback received during the project presentation was positive. It was pointed out that there are results to be shown and that at this point, I’ve started catching up with the initial schedule.

6.3 Future Improvements

First and foremost, the evaluation metrics proposed in Chapter 2.9 should be applied to all of the networks. The output size of every network should be fixed to match the dataset's size and aspect ratio. The survey could also be improved, provided the GAN models have produced high quality sprites. In that case, the participants could be given a multiple choice question asking them to pick the sprites they believe are human-made: each question would show a few human-made sprites and a few computer-generated ones, and the participant would be asked to pick at least as many images as there are human-drawn sprites (e.g. given two fakes and two reals, the participant would be asked to pick 2 or more images).

Another important observation is that the survey should have had more control groups. A few people from a production team would have given an interesting perspective.

6.4 Ethical issues

The goal of using GAN models is to help artists, lower their workload and lower the price of the project. However, if those networks are capable of producing high quality assets, such a tool might replace the artist. The existence of such a tool could result in studios hiring artists for a single contract job of producing the original dataset and then getting rid of said artists. As pointed out in the article Video Game Artist Salary for 2019 (Bay, 2019), the openings for artists in game studios are very limited, with very high demand. Such a tool might increase the gap between the two, which might have a negative impact on the industry.


7 Conclusion

This project's goal was to understand whether Generative Adversarial Networks and their variations are suitable for generating pixel art game assets. The conclusion drawn both from the performance graphs and from the survey is that they are not suitable in the general case. There are some corner cases where such an application would be suitable (existing datasets, big franchises, etc.), but for the average studio, such a tool would be too inconsistent and too slow. As of the current results, it seems the Variational Auto Encoder is still far superior to GANs when it comes to small datasets.

8 Bibliography

[1] Almousli, H. and Al-lahham, A. (2017). Pixel Art Generation Using VAE. [online] Available at: https://mlexplained.wordpress.com/2017/05/06/pixel-art-generation-using-vae/

[2] Larsen, A., Sonderby, S., Larochelle, H. and Winther, O. (2016). Autoencoding Beyond Pixels Using a Learned Similarity Metric. [online] arXiv.org. Available at: https://arxiv.org/pdf/1512.09300.pdf [Accessed 4 Feb. 2019].

[3] Bao, R., Liang, S. and Wang, Q. (2019). Featurized Bidirectional GAN: Adversarial Defense via Adversarially Learned Semantic Inference. [online] arXiv.org. Available at: https://arxiv.org/abs/1805.07862 [Accessed 16 Apr. 2019].

[4] Bay, J. (2019). Video Game Artist Salary for 2019. [online] Gameindustrycareerguide.com. Available at: https://www.gameindustrycareerguide.com/video-game-artist-salary/ [Accessed 26 Apr. 2019].

[5] Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I. and Abbeel, P. (2016). InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. [online] arXiv.org. Available at: https://arxiv.org/pdf/1606.03657.pdf [Accessed 13 Apr. 2019].

[6] David, E., Netanyahu, N. and Wolf, L. (2017). DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess. [online] arXiv.org. Available at: https://arxiv.org/abs/1711.09667

[7] Griffiths, D. (2018). The History of Pixel Art. [online] Thefactorytimes.com. Available at: http://www.thefactorytimes.com/factory-times/2018/9/27/the-history-of-pixel-art [Accessed 7 Feb. 2019].

[8] Farza (2018). DeepLeague: Leveraging Computer Vision and Deep Learning on the League of Legends Mini Map. [online] Available at: https://arxiv.org/abs/1711.09667

[9] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. (2014). Generative Adversarial Networks. [online] arXiv.org. Available at: https://arxiv.org/abs/1406.2661 [Accessed 11 Mar. 2019].

[10] Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z. and Shi, W. (2019). Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. [online] arXiv.org. Available at: https://arxiv.org/abs/1609.04802 [Accessed 29 Apr. 2019].

[11] Wang, J., Zhou, W., Tang, J., Fu, Z., Tian, Q. and Li, H. (2018). Unregularized Auto-Encoder with Generative Adversarial Networks for Image Generation. [online] Available at: https://dl.acm.org/citation.cfm?id=3240569 [Accessed 29 Jan. 2019].


[12] Gregor, K., Danihelka, I., Graves, A., Rezende, D. and Wierstra, D. (2015). DRAW: A Recurrent Neural Network for Image Generation. [online] arXiv.org. Available at: https://arxiv.org/abs/1502.04623 [Accessed 4 Feb. 2019].

[13] Lucic, M., Kurach, K., Michalski, M., Gelly, S. and Bousquet, O. (2019). Are GANs Created Equal? A Large-Scale Study. [online] arXiv.org. Available at: https://arxiv.org/abs/1711.10337 [Accessed 5 Apr. 2019].

[14] Maximov, A. (2019). The Future of Art Production in Games. [online] YouTube. Available at: https://www.youtube.com/watch?v=7Rt0wOyCCAI [Accessed 18 Apr. 2019].

[15] Mehta, A. (2019). A Complete Guide to Types of Neural Networks. [online] Digital Vidya. Available at: https://www.digitalvidya.com/blog/types-of-neural-networks [Accessed 12 Mar. 2019].

[16] Mirza, M. and Osindero, S. (2019). Conditional Generative Adversarial Nets. [online] arXiv.org. Available at: https://arxiv.org/abs/1411.1784 [Accessed 29 Apr. 2019].

[17] Moss, R. (2019). 7 uses of procedural generation that all developers should study. [online] Gamasutra.com. Available at: http://www.gamasutra.com/view/news/262869/7_uses_of_procedural_generation_that_all_developers_should_study.php [Accessed 25 Apr. 2019].

[18] Odena, A., Olah, C. and Shlens, J. (2017). Conditional Image Synthesis with Auxiliary Classifier GANs. [online] arXiv.org. Available at: https://arxiv.org/abs/1610.09585

[19] Inkawhich, N. (2017). Official PyTorch DCGAN Tutorial. [online] Pytorch.org. Available at: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html

[20] Radford, A., Metz, L. and Chintala, S. (2019). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. [online] arXiv.org. Available at: https://arxiv.org/abs/1511.06434 [Accessed 4 Apr. 2019].

[21] Statistical Services Centre, The University of Reading (2001). Approaches to the Analysis of Survey Data. [online] Available at: https://www.ilri.org/biometrics/TrainingResources/Documents [Accessed 8 Feb. 2019].

[22] Yu, L., Zhang, W., Wang, J. and Yu, Y. (2019). SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. [online] arXiv.org. Available at: https://arxiv.org/abs/1609.05473 [Accessed 29 Apr. 2019].

9 Appendices

9.1 Detailed project proposal

9.1.1 Research Question

We live in an era where making games is extremely accessible. There are multiple gaming platforms and services which allow small scale studios to publish their dream game. However, with small studios comes the issue of scalability. Either after a successful launch or during an overly ambitious production, it is very possible that the art department might not be able to handle the workload.


For this very reason, in this paper I propose research on the topic "Can generative adversarial networks produce industry quality 2D assets?". For this research, I will create a neural network capable of generating 2D pixel art assets. Conclusions will be drawn based on a comparison between an artist's work and images produced by the generative network. The data needed for those conclusions will be collected via a survey with three focus groups participating: artists, developers and users.

9.1.2 Literature Review

With the development of deep neural networks, recent years have witnessed increasing research interest in generative models. As an approach to generating images, there has been a rise in the popularity of Generative Adversarial Networks (referred to as "GAN") and Variational Auto-Encoders (referred to as "VAE"). Many developers and researchers have explored the topic, and some of them, such as Jiayu Wang and Wengang Zhou in their paper "Unregularized Auto-Encoder with Generative Adversarial Networks", have pointed out the biggest issues with these two methods. The VAE approach, while considered an elegant solution, tends to produce blurry images. Meanwhile, GANs produce generated images of very high quality, but suffer difficulty in translating a random vector into a desired high-dimensional sample.

For the last couple of years, developers have been coming up with different combinations of the two: Auto-Encoder Generative Adversarial Networks (AEGAN, Jiayu Wang et al., 2018), Deep Recurrent Attentive Writer (DRAW, Karol Gregor et al., 2015), Variational Auto-Encoded Generative Adversarial Network (VAEGAN, Anders Larsen et al., 2016), and multiple implementations of only VAE or only GAN as well. This research will be focused on finding the best approach to produce a specific art style, and more specifically pixel art.

Pixel art has evolved a lot through the ages. As Devonte Griffiths says in his article "The History of Pixel Art": "The highs and lows pixel art faced, eventually resulted in developers using pixel art as an artistic choice instead of shortcut for primitive tech". Since pixel art is a simple and straightforward style, the expectation is that the neural network should be fully capable of producing satisfying results.

When a reasonable amount of data has been produced, a survey will be conducted. There are multiple different techniques for analyzing survey results, some of which are discussed in "Approaches to the Analysis of Survey Data". The survey will make use of a technique known as "indicators". Indicators are used as summary measures: a good indicator should synthesize information and serve as a reasonable measure of quality. As of now, the indicators which will be used for this research are still unclear; they will be defined by the time the network is ready. Those indicators will be discussed with people from different fields of the industry to ensure they produce valid feedback.

9.1.3 Client

This research is targeted towards small scale game developer studios. Many small studios usually have only a small number of artists on the team. Such a tool would enable developers to produce a great quantity of content, while staying true to the initial art style.


9.1.4 Primary Research Plan

A neural network model producing 2D images will be developed. The first few weeks will be spent researching different models until a suitable one is found. As pointed out in the literature review, one of the more common methods is using a VAE. However, this is a technique that is known to blur the produced image, resulting in lower quality. Reducing the blur from the VAE will be a priority in this research, as blurry pixels seriously impact how the viewer perceives the image, especially in the pixel art style. During the production period, research will be conducted on how the viewer is impacted by the art style and their quality expectations. Technology-wise, Python will be used as the main programming language, together with an appropriate library. An image dataset with 2D sprites will also be required; the current dataset consists of 730 character sprites. The expectation for the model is that the network will support 2 or 3 hidden layers and probably auto-encoders, in case a solution is found to the blur problem. However, the VAE seems important, as consistency in the characters' key features is required. Once the network can produce 2D images containing all key features and accurate to the style used for training, the research will continue with conducting surveys and analyzing their results.

The semester consists of 11 weeks. The schedule is as follows:

1. Generative Adversarial Networks and Variational Auto Encoder research and practice (weeks 1-4)
2. Finding data (weeks 1-3)
3. Python refresher and choosing a framework (weeks 2-4)
4. Developing the application (weeks 4-9)
5. Survey (weeks 9-11)
6. Report (weeks 5-11)

9.1.5 Intended Project Outcome

Conclusions will be drawn from the results of a survey given to three control groups: developers, artists and general users. The study will collect the results from each control group independently for a couple of different reasons. First, automating art generation has some ethical implications, and I want to ensure the artists are happy with how the software represents their skills and technique. The developer group and the client group will provide different perspectives on the same issue: they will provide information on how well they think the art fits in the game industry and how well it fits with the original art. For the survey, the participants will be given multiple options per question (e.g. 5 images, a random percentage of which are generated). If in every control group the generated images have a 50% or higher pick rate, it means that the majority of the participants agree the tool is doing its job correctly and neural networks can, in fact, produce industry quality pixel art assets.

9.1.6 Bibliography

Wang, J., Zhou, W., Tang, J., Fu, Z., Tian, Q. and Li, H. (2018). Unregularized Auto-Encoder with Generative Adversarial Networks for Image Generation. [online] Available at: https://dl.acm.org/citation.cfm?id=3240569 [Accessed 29 Jan. 2019].

Gregor, K., Danihelka, I., Graves, A., Rezende, D. and Wierstra, D. (2015). DRAW: A Recurrent Neural Network for Image Generation. [online] arXiv.org. Available at: https://arxiv.org/abs/1502.04623 [Accessed 4 Feb. 2019].

Larsen, A., Sonderby, S., Larochelle, H. and Winther, O. (2016). Autoencoding Beyond Pixels Using a Learned Similarity Metric. [online] arXiv.org. Available at: https://arxiv.org/pdf/1512.09300.pdf [Accessed 4 Feb. 2019].

Griffiths, D. (2018). The History of Pixel Art. [online] Thefactorytimes.com. Available at: http://www.thefactorytimes.com/factory-times/2018/9/27/the-history-of-pixel-art [Accessed 7 Feb. 2019].

Statistical Services Centre, The University of Reading (2001). Approaches to the Analysis of Survey Data. [online] Available at: https://www.ilri.org/biometrics/TrainingResources/Documents [Accessed 8 Feb. 2019].

9.2 Full Source Code

9.2.1 DCGAN (Inkawhich, 2017)

from __future__ import print_function
#%matplotlib inline
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

# Set random seed for reproducibility
manualSeed = 999
#manualSeed = random.randint(1, 10000) # use if you want new results
print("Random Seed: ", manualSeed)
random.seed(manualSeed)
torch.manual_seed(manualSeed)

# Root directory for dataset
dataroot = "C:/Users/kabad/Desktop/dataset/characters"
# Number of workers for dataloader
workers = 2
# Batch size during training
batch_size = 32
# Spatial size of training images. All images will be resized to this
# size using a transformer.
image_size = 64
# Number of channels in the training images. For color images this is 3
nc = 3
# Size of z latent vector (i.e. size of generator input)
nz = 100
# Size of feature maps in generator
ngf = 64
# Size of feature maps in discriminator
ndf = 64
# Number of training epochs
num_epochs = 30
# Learning rate for optimizers
lr = 0.0001
# Beta1 hyperparam for Adam optimizers
beta1 = 0.5
# Number of GPUs available. Use 0 for CPU mode.
ngpu = 1

# We can use an image folder dataset the way we have it setup.
# Create the dataset
dataset = dset.ImageFolder(root=dataroot,
                           transform=transforms.Compose([
                               transforms.Resize(image_size),
                               transforms.CenterCrop(image_size),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                           ]))
# Create the dataloader
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                         shuffle=True, num_workers=workers)

# Decide which device we want to run on
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

# Plot some training images
real_batch = next(iter(dataloader))
plt.figure(figsize=(8,8))
plt.axis("off")
plt.title("Training Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=2, normalize=True).cpu(),(1,2,0)))

# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)

# Create the generator
netG = Generator(ngpu).to(device)

# Handle multi-gpu if desired
if (device.type == 'cuda') and (ngpu > 1):
    netG = nn.DataParallel(netG, list(range(ngpu)))

# Apply the weights_init function to randomly initialize all weights
# to mean=0, stdev=0.02.
netG.apply(weights_init)

# Print the model
print(netG)

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)

# Create the Discriminator
netD = Discriminator(ngpu).to(device)

# Handle multi-gpu if desired
if (device.type == 'cuda') and (ngpu > 1):
    netD = nn.DataParallel(netD, list(range(ngpu)))

# Apply the weights_init function to randomly initialize all weights
# to mean=0, stdev=0.02.
netD.apply(weights_init)

# Print the model
print(netD)

# Initialize BCELoss function
criterion = nn.BCELoss()

# Create batch of latent vectors that we will use to visualize
# the progression of the generator
fixed_noise = torch.randn(64, nz, 1, 1, device=device)

# Establish convention for real and fake labels during training
real_label = 1
fake_label = 0

# Setup Adam optimizers for both G and D
optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))

# Training Loop

# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0

print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):

        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, device=device)
        # Forward pass real batch through D
        output = netD(real_cpu).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        # Generate fake image batch with G
        fake = netG(noise)
        label.fill_(fake_label)
        # Classify all fake batch with D
        output = netD(fake.detach()).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Add the gradients from the all-real and all-fake batches
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()

        # Output training stats
        if i % 10 == 0 or i == 491:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Check how the generator is doing by saving G's output on fixed_noise
        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

        iters += 1

plt.figure(figsize=(10,5))
plt.title("Generator and Discriminator Loss During Training")
plt.plot(G_losses,label="G")
plt.plot(D_losses,label="D")
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.legend()
plt.show()

# Grab a batch of real images from the dataloader
real_batch = next(iter(dataloader))

# Plot the real images
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.axis("off")
plt.title("Real Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(),(1,2,0)))

# Plot the fake images from the last epoch
plt.subplot(1,2,2)
plt.axis("off")
plt.title("Fake Images")
plt.imshow(np.transpose(img_list[-1],(1,2,0)))
plt.show()

9.2.2 AC-GAN (Odena et al., 2017)

import argparse
import os
import numpy as np
import math

import torchvision.transforms as transforms
from torchvision.utils import save_image

from torch.utils.data import DataLoader
from torchvision import datasets
from torch.autograd import Variable

import torch.nn as nn
import torch.nn.functional as F
import torch

os.makedirs("images", exist_ok=True)

parser = argparse.ArgumentParser()
parser.add_argument("--n_epochs", type=int, default=1, help="number of epochs of training")
parser.add_argument("--batch_size", type=int, default=32, help="size of the batches")
parser.add_argument("--lr", type=float, default=0.0002, help="adam: learning rate")
parser.add_argument("--b1", type=float, default=0.5, help="adam: decay of first order momentum of gradient")
parser.add_argument("--b2", type=float, default=0.999, help="adam: decay of first order momentum of gradient")
parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation")
parser.add_argument("--latent_dim", type=int, default=100, help="dimensionality of the latent space")
parser.add_argument("--n_classes", type=int, default=10, help="number of classes for dataset")
parser.add_argument("--img_size", type=int, default=64, help="size of each image dimension")
parser.add_argument("--channels", type=int, default=3, help="number of image channels")
parser.add_argument("--sample_interval", type=int, default=400, help="interval between image sampling")
opt = parser.parse_args()
print(opt)

cuda = True if torch.cuda.is_available() else False

def weights_init_normal(m):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find("BatchNorm2d") != -1:
        torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
        torch.nn.init.constant_(m.bias.data, 0.0)

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()

        self.label_emb = nn.Embedding(opt.n_classes, opt.latent_dim)

        self.init_size = opt.img_size // 4  # Initial size before upsampling
        self.l1 = nn.Sequential(nn.Linear(opt.latent_dim, 128 * self.init_size ** 2))

        self.conv_blocks = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.BatchNorm2d(128, 0.8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, 3, stride=1, padding=1),
            nn.BatchNorm2d(64, 0.8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, opt.channels, 3, stride=1, padding=1),
            nn.Tanh(),
        )

    def forward(self, noise, labels):
        gen_input = torch.mul(self.label_emb(labels), noise)
        out = self.l1(gen_input)
        out = out.view(out.shape[0], 128, self.init_size, self.init_size)
        img = self.conv_blocks(out)
        return img

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        def discriminator_block(in_filters, out_filters, bn=True):
            """Returns layers of each discriminator block"""
            block = [nn.Conv2d(in_filters, out_filters, 3, 2, 1), nn.LeakyReLU(0.2, inplace=True), nn.Dropout2d(0.25)]
            if bn:
                block.append(nn.BatchNorm2d(out_filters, 0.8))
            return block

        self.conv_blocks = nn.Sequential(
            *discriminator_block(opt.channels, 16, bn=False),
            *discriminator_block(16, 32),
            *discriminator_block(32, 64),
            *discriminator_block(64, 128),
        )

        # The height and width of downsampled image
        ds_size = opt.img_size // 2 ** 4

        # Output layers
        self.adv_layer = nn.Sequential(nn.Linear(128 * ds_size ** 2, 1), nn.Sigmoid())
        self.aux_layer = nn.Sequential(nn.Linear(128 * ds_size ** 2, opt.n_classes), nn.Softmax())

    def forward(self, img):
        out = self.conv_blocks(img)
        out = out.view(out.shape[0], -1)
        validity = self.adv_layer(out)
        label = self.aux_layer(out)

        return validity, label

# Loss functions
adversarial_loss = torch.nn.BCELoss()
auxiliary_loss = torch.nn.CrossEntropyLoss()

# Initialize generator and discriminator
generator = Generator()
discriminator = Discriminator()

if cuda:
    generator.cuda()
    discriminator.cuda()
    adversarial_loss.cuda()
    auxiliary_loss.cuda()

# Initialize weights
generator.apply(weights_init_normal)
discriminator.apply(weights_init_normal)

dataroot = "C:/Users/kabad/Desktop/dataset/characters"

dataset = datasets.ImageFolder(root=dataroot,
                               transform=transforms.Compose([
                                   transforms.Resize(opt.img_size),
                                   transforms.CenterCrop(opt.img_size),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                               ]))

dataloader = torch.utils.data.DataLoader(dataset, batch_size=opt.batch_size, shuffle=True)

# Configure data loader
#os.makedirs("../../data/mnist", exist_ok=True)
#dataloader = torch.utils.data.DataLoader(
#    datasets.MNIST(
#        "../../data/mnist",
#        train=True,
#        download=True,
#        transform=transforms.Compose(
#            [transforms.Resize(opt.img_size), transforms.ToTensor(), transforms.Normalize([0.5], [0.5])]
#        ),
#    ),
#    batch_size=opt.batch_size,
#    shuffle=True,
#)

# Optimizers
optimizer_G = torch.optim.Adam(generator.parameters(), lr=opt.lr, betas=(opt.b1, opt.b2))
optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=opt.lr, betas=(opt.b1, opt.b2))

FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
LongTensor = torch.cuda.LongTensor if cuda else torch.LongTensor

def sample_image(n_row, batches_done):
    """Saves a grid of generated digits ranging from 0 to n_classes"""
    # Sample noise
    z = Variable(FloatTensor(np.random.normal(0, 1, (n_row ** 2, opt.latent_dim))))
    # Get labels ranging from 0 to n_classes for n rows
    labels = np.array([num for _ in range(n_row) for num in range(n_row)])
    labels = Variable(LongTensor(labels))
    gen_imgs = generator(z, labels)
    save_image(gen_imgs.data, "images/%d.png" % batches_done, nrow=n_row, normalize=True)

# ----------
#  Training
# ----------

for epoch in range(opt.n_epochs):
    for i, (imgs, labels) in enumerate(dataloader):

        batch_size = imgs.shape[0]

        # Adversarial ground truths
        valid = Variable(FloatTensor(batch_size, 1).fill_(1.0), requires_grad=False)
        fake = Variable(FloatTensor(batch_size, 1).fill_(0.0), requires_grad=False)

        # Configure input
        real_imgs = Variable(imgs.type(FloatTensor))
        labels = Variable(labels.type(LongTensor))

        # -----------------
        #  Train Generator
        # -----------------

        optimizer_G.zero_grad()

        # Sample noise and labels as generator input
        z = Variable(FloatTensor(np.random.normal(0, 1, (batch_size, opt.latent_dim))))
        gen_labels = Variable(LongTensor(np.random.randint(0, opt.n_classes, batch_size)))

        # Generate a batch of images
        gen_imgs = generator(z, gen_labels)

        # Loss measures generator's ability to fool the discriminator
        validity, pred_label = discriminator(gen_imgs)
        g_loss = 0.5 * (adversarial_loss(validity, valid) + auxiliary_loss(pred_label, gen_labels))

        g_loss.backward()
        optimizer_G.step()

        # ---------------------
        #  Train Discriminator
        # ---------------------

        optimizer_D.zero_grad()

        # Loss for real images
        real_pred, real_aux = discriminator(real_imgs)
        d_real_loss = (adversarial_loss(real_pred, valid) + auxiliary_loss(real_aux, labels)) / 2

        # Loss for fake images
        fake_pred, fake_aux = discriminator(gen_imgs.detach())
        d_fake_loss = (adversarial_loss(fake_pred, fake) + auxiliary_loss(fake_aux, gen_labels)) / 2

        # Total discriminator loss
        d_loss = (d_real_loss + d_fake_loss) / 2

        # Calculate discriminator accuracy
        pred = np.concatenate([real_aux.data.cpu().numpy(), fake_aux.data.cpu().numpy()], axis=0)
        gt = np.concatenate([labels.data.cpu().numpy(), gen_labels.data.cpu().numpy()], axis=0)
        d_acc = np.mean(np.argmax(pred, axis=1) == gt)

        d_loss.backward()
        optimizer_D.step()

        print(
            "[Epoch %d/%d] [Batch %d/%d] [D loss: %f, acc: %d%%] [G loss: %f]"
            % (epoch, opt.n_epochs, i, len(dataloader), d_loss.item(), 100 * d_acc, g_loss.item())
        )
        batches_done = epoch * len(dataloader) + i
        if batches_done % opt.sample_interval == 0:
            sample_image(n_row=10, batches_done=batches_done)

9.2.3 InfoGAN (Chen et al., 2016)

import argparse
import os
import numpy as np
import math
import itertools

import torchvision.transforms as transforms
from torchvision.utils import save_image

from torch.utils.data import DataLoader
from torchvision import datasets
from torch.autograd import Variable

import torch.nn as nn
import torch.nn.functional as F
import torch

os.makedirs("images/static/", exist_ok=True)
os.makedirs("images/varying_c1/", exist_ok=True)
os.makedirs("images/varying_c2/", exist_ok=True)

parser = argparse.ArgumentParser()
parser.add_argument("--n_epochs", type=int, default=200, help="number of epochs of training")
parser.add_argument("--batch_size", type=int, default=64, help="size of the batches")
parser.add_argument("--lr", type=float, default=0.0001, help="adam: learning rate")
parser.add_argument("--b1", type=float, default=0.5, help="adam: decay of first order momentum of gradient")
parser.add_argument("--b2", type=float, default=0.999, help="adam: decay of first order momentum of gradient")
parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation")
parser.add_argument("--latent_dim", type=int, default=62, help="dimensionality of the latent space")
parser.add_argument("--code_dim", type=int, default=2, help="latent code")
parser.add_argument("--n_classes", type=int, default=10, help="number of classes for dataset")
parser.add_argument("--img_size", type=int, default=32, help="size of each image dimension")
parser.add_argument("--channels", type=int, default=3, help="number of image channels")
parser.add_argument("--sample_interval", type=int, default=400, help="interval between image sampling")
opt = parser.parse_args()
print(opt)

cuda = True if torch.cuda.is_available() else False

def weights_init_normal(m):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find("BatchNorm") != -1:
        torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
        torch.nn.init.constant_(m.bias.data, 0.0)

def to_categorical(y, num_columns):
    """Returns one-hot encoded Variable"""
    y_cat = np.zeros((y.shape[0], num_columns))
    y_cat[range(y.shape[0]), y] = 1.0
    return Variable(FloatTensor(y_cat))

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        input_dim = opt.latent_dim + opt.n_classes + opt.code_dim

        self.init_size = opt.img_size // 4  # Initial size before upsampling
        self.l1 = nn.Sequential(nn.Linear(input_dim, 128 * self.init_size ** 2))

        self.conv_blocks = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.BatchNorm2d(128, 0.8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, 3, stride=1, padding=1),
            nn.BatchNorm2d(64, 0.8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, opt.channels, 3, stride=1, padding=1),
            nn.Tanh(),
        )

    def forward(self, noise, labels, code):
        gen_input = torch.cat((noise, labels, code), -1)
        out = self.l1(gen_input)
        out = out.view(out.shape[0], 128, self.init_size, self.init_size)
        img = self.conv_blocks(out)
        return img

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        def discriminator_block(in_filters, out_filters, bn=True):
            """Returns layers of each discriminator block"""
            block = [nn.Conv2d(in_filters, out_filters, 3, 2, 1), nn.LeakyReLU(0.2, inplace=True), nn.Dropout2d(0.25)]
            if bn:
                block.append(nn.BatchNorm2d(out_filters, 0.8))
            return block

        self.conv_blocks = nn.Sequential(
            *discriminator_block(opt.channels, 16, bn=False),
            *discriminator_block(16, 32),
            *discriminator_block(32, 64),
            *discriminator_block(64, 128),
        )

        # The height and width of downsampled image
        ds_size = opt.img_size // 2 ** 4

        # Output layers
        self.adv_layer = nn.Sequential(nn.Linear(128 * ds_size ** 2, 1))
        self.aux_layer = nn.Sequential(nn.Linear(128 * ds_size ** 2, opt.n_classes), nn.Softmax())
        self.latent_layer = nn.Sequential(nn.Linear(128 * ds_size ** 2, opt.code_dim))

    def forward(self, img):
        out = self.conv_blocks(img)
        out = out.view(out.shape[0], -1)
        validity = self.adv_layer(out)
        label = self.aux_layer(out)
        latent_code = self.latent_layer(out)
        return validity, label, latent_code

# Loss functions
adversarial_loss = torch.nn.MSELoss()
categorical_loss = torch.nn.CrossEntropyLoss()
continuous_loss = torch.nn.MSELoss()

# Loss weights
lambda_cat = 1
lambda_con = 0.1

# Initialize generator and discriminator
generator = Generator()
discriminator = Discriminator()

if cuda:
    generator.cuda()
    discriminator.cuda()
    adversarial_loss.cuda()
    categorical_loss.cuda()
    continuous_loss.cuda()

# Initialize weights
generator.apply(weights_init_normal)
discriminator.apply(weights_init_normal)

dataroot = "C:/Users/kabad/Desktop/dataset/characters"

dataset = datasets.ImageFolder(root=dataroot,
                               transform=transforms.Compose([
                                   transforms.Resize(opt.img_size),
                                   transforms.CenterCrop(opt.img_size),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                               ]))

dataloader = torch.utils.data.DataLoader(dataset, batch_size=opt.batch_size, shuffle=True)

# Configure data loader
#os.makedirs("../../data/mnist", exist_ok=True)
#dataloader = torch.utils.data.DataLoader(
#    datasets.MNIST(
#        "../../data/mnist",
#        train=True,
#        download=True,
#        transform=transforms.Compose(
#            [transforms.Resize(opt.img_size), transforms.ToTensor(), transforms.Normalize([0.5], [0.5])]
#        ),
#    ),
#    batch_size=opt.batch_size,
#    shuffle=True,
#)

# Optimizers
optimizer_G = torch.optim.Adam(generator.parameters(), lr=opt.lr, betas=(opt.b1, opt.b2))
optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=opt.lr, betas=(opt.b1, opt.b2))
optimizer_info = torch.optim.Adam(
    itertools.chain(generator.parameters(), discriminator.parameters()), lr=opt.lr, betas=(opt.b1, opt.b2)
)

FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
LongTensor = torch.cuda.LongTensor if cuda else torch.LongTensor

# Static generator inputs for sampling
static_z = Variable(FloatTensor(np.zeros((opt.n_classes ** 2, opt.latent_dim))))
static_label = to_categorical(
    np.array([num for _ in range(opt.n_classes) for num in range(opt.n_classes)]), num_columns=opt.n_classes
)
static_code = Variable(FloatTensor(np.zeros((opt.n_classes ** 2, opt.code_dim))))

def sample_image(n_row, batches_done):
    """Saves a grid of generated digits ranging from 0 to n_classes"""
    # Static sample
    z = Variable(FloatTensor(np.random.normal(0, 1, (n_row ** 2, opt.latent_dim))))
    static_sample = generator(z, static_label, static_code)
    save_image(static_sample.data, "images/static/%d.png" % batches_done, nrow=n_row, normalize=True)

    # Get varied c1 and c2
    zeros = np.zeros((n_row ** 2, 1))
    c_varied = np.repeat(np.linspace(-1, 1, n_row)[:, np.newaxis], n_row, 0)
    c1 = Variable(FloatTensor(np.concatenate((c_varied, zeros), -1)))
    c2 = Variable(FloatTensor(np.concatenate((zeros, c_varied), -1)))
    sample1 = generator(static_z, static_label, c1)
    sample2 = generator(static_z, static_label, c2)
    save_image(sample1.data, "images/varying_c1/%d.png" % batches_done, nrow=n_row, normalize=True)
    save_image(sample2.data, "images/varying_c2/%d.png" % batches_done, nrow=n_row, normalize=True)

# ----------
#  Training
# ----------

for epoch in range(opt.n_epochs):
    for i, (imgs, labels) in enumerate(dataloader):

        batch_size = imgs.shape[0]

        # Adversarial ground truths
        valid = Variable(FloatTensor(batch_size, 1).fill_(1.0), requires_grad=False)
        fake = Variable(FloatTensor(batch_size, 1).fill_(0.0), requires_grad=False)

        # Configure input
        real_imgs = Variable(imgs.type(FloatTensor))
        labels = to_categorical(labels.numpy(), num_columns=opt.n_classes)

        # -----------------
        #  Train Generator
        # -----------------

        optimizer_G.zero_grad()

        # Sample noise and labels as generator input
        z = Variable(FloatTensor(np.random.normal(0, 1, (batch_size, opt.latent_dim))))
        label_input = to_categorical(np.random.randint(0, opt.n_classes, batch_size), num_columns=opt.n_classes)
        code_input = Variable(FloatTensor(np.random.uniform(-1, 1, (batch_size, opt.code_dim))))

        # Generate a batch of images
        gen_imgs = generator(z, label_input, code_input)

        # Loss measures generator's ability to fool the discriminator
        validity, _, _ = discriminator(gen_imgs)
        g_loss = adversarial_loss(validity, valid)

        g_loss.backward()
        optimizer_G.step()

        # ---------------------
        #  Train Discriminator
        # ---------------------

        optimizer_D.zero_grad()

        # Loss for real images
        real_pred, _, _ = discriminator(real_imgs)
        d_real_loss = adversarial_loss(real_pred, valid)

        # Loss for fake images
        fake_pred, _, _ = discriminator(gen_imgs.detach())
        d_fake_loss = adversarial_loss(fake_pred, fake)

        # Total discriminator loss
        d_loss = (d_real_loss + d_fake_loss) / 2

        d_loss.backward()
        optimizer_D.step()

        # ------------------
        #  Information Loss
        # ------------------

        optimizer_info.zero_grad()

        # Sample labels
        sampled_labels = np.random.randint(0, opt.n_classes, batch_size)

        # Ground truth labels
        gt_labels = Variable(LongTensor(sampled_labels), requires_grad=False)

        # Sample noise, labels and code as generator input
        z = Variable(FloatTensor(np.random.normal(0, 1, (batch_size, opt.latent_dim))))
        label_input = to_categorical(sampled_labels, num_columns=opt.n_classes)
        code_input = Variable(FloatTensor(np.random.uniform(-1, 1, (batch_size, opt.code_dim))))

        gen_imgs = generator(z, label_input, code_input)
        _, pred_label, pred_code = discriminator(gen_imgs)

        info_loss = lambda_cat * categorical_loss(pred_label, gt_labels) + lambda_con * continuous_loss(
            pred_code, code_input
        )

        info_loss.backward()
        optimizer_info.step()

        # --------------
        #  Log Progress
        # --------------

        print(
            "[Epoch %d/%d] [Batch %d/%d] [D loss: %f] [G loss: %f] [info loss: %f]"
            % (epoch, opt.n_epochs, i, len(dataloader), d_loss.item(), g_loss.item(), info_loss.item())
        )
        batches_done = epoch * len(dataloader) + i
        if batches_done % opt.sample_interval == 0:
            sample_image(n_row=10, batches_done=batches_done)

9.3 Survey

The full list of survey questions and visualized answer data is available below:

1. When you are given a task to produce a set of 2D sprites, how much time are you usually given to complete the task?
A) Less than 2 weeks
B) Between 2 weeks and 2 months
C) Between 2 months and 6 months
D) Over 6 months

2. Do you produce sprites with multiple resolutions for a single project or do you stick to one resolution?
A) Multiple resolutions
B) Single resolution


3. How many sprites do you usually produce per project?
A) Less than 250
B) 250 – 500
C) 500 – 1000
D) Over 1000

4. Would you use a tool that produces 2D sprites in a given art style, if it requires at least 1000 sprites to start?
A) Yes
B) No


9.4 Supervisor feedback
