Making faces with GANGogh

I wanted to see if I could pre-train a network on various kinds of art before adding my Renaissance faces dataset. Maybe, as with generalized Style Transfer, the network would pick up some general insights about images from the larger dataset and apply them to the images generated from the much smaller Renaissance faces data.

Some of the most impressive results from GANs come from training with enormous computing power on very large datasets. The computing power needed to produce new pre-trained networks such as BigGAN or Progressive Growing of GANs is largely available only to researchers at companies like Google and Nvidia. The same dozen large datasets are used to train most of these networks, and no wonder: ImageNet’s 9 million labels were added by hand by workers on Amazon’s Mechanical Turk. Even so, the resulting images are almost always readily distinguishable from the training data, and resolution rarely reaches or exceeds 512×512 pixels.

As AI artist Helena Sarin points out (https://www.artnome.com/news/2018/11/14/helena-sarin-why-bigger-isnt-always-better-with-gans-and-ai-art), artists and researchers all training with the same data leads to a repetition of the same aesthetic. She advocates for a “smallGAN” movement that embraces the limitations and bad behaviour of the sort of small (64×64 or 128×128 pixel) networks that can be trained on more affordable equipment, on datasets that can conceivably be curated by a single resourceful person. I am working to take this one step further, recruiting object recognition algorithms for face detection to produce a dataset specific to my project.

I used computer vision techniques to crop 2,800 faces from several thousand early Renaissance paintings. I was then able to train a GAN to produce new faces, which I plan to composite onto the characters in the mythological images I am creating. The generated faces are of reasonable quality, but I wondered if it would be possible to add variety and depth by incorporating more general information about old paintings into the network, while still outputting faces. I reasoned that this approach was part of what enabled Google researchers to perform Style Transfer using a single example of the desired style, with a network that had previously been trained on many different styles. Could the same technique help my network learn from relatively few Renaissance faces?
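The cropping pass itself is simple in outline. Below is a minimal sketch of that kind of script, assuming OpenCV’s stock Haar cascade face detector and hypothetical folder names; the detector and thresholds I actually used may differ, and detections on paintings are noisy enough that the resulting crops still need manual curation.

```python
import os
import cv2

SRC_DIR = "paintings"  # hypothetical input/output directories
OUT_DIR = "faces"
os.makedirs(OUT_DIR, exist_ok=True)

# OpenCV ships with pre-trained Haar cascades for frontal faces
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

count = 0
for name in os.listdir(SRC_DIR):
    img = cv2.imread(os.path.join(SRC_DIR, name))
    if img is None:  # skip unreadable files
        continue
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # scaleFactor/minNeighbors trade missed faces against false positives
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                     minNeighbors=5, minSize=(48, 48))
    for (x, y, w, h) in faces:
        crop = img[y:y + h, x:x + w]
        cv2.imwrite(os.path.join(OUT_DIR, f"face_{count:05d}.png"), crop)
        count += 1

print(f"Cropped {count} candidate faces")
```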

I found a blog post and some code (https://github.com/rkjones4/GANGogh/tree/master/misc) that used this same approach with GANs. Jones used the entire Wikiart dataset of 80,000 paintings, sketches and other art objects, sorted into 14 categories such as “landscape”, “abstract”, “portrait” and so on. The code was not intended for widespread use, and I struggled to get it to run. Through Codementor I recruited Nilav Ghosh, who helped me figure out how to configure the version of TensorFlow required to run this two-year-old code. I couldn’t figure out how to run the data pre-processor – it seemed to be Windows-specific – so ultimately I wrote my own in Python, and pretty soon I was able to replicate his results.
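My replacement pre-processor amounted to little more than resizing each image and filing it under its genre. Here is a sketch of that idea using Pillow, with hypothetical directory names; GANGogh’s own loader expects a particular layout and naming scheme that I am glossing over here.

```python
import os
from PIL import Image

SRC_ROOT = "wikiart"       # hypothetical: one sub-folder per genre
OUT_ROOT = "gangogh_data"
SIZE = (64, 64)            # GANGogh trains at 64x64

for genre in os.listdir(SRC_ROOT):
    src_dir = os.path.join(SRC_ROOT, genre)
    if not os.path.isdir(src_dir):
        continue
    out_dir = os.path.join(OUT_ROOT, genre)
    os.makedirs(out_dir, exist_ok=True)
    for i, name in enumerate(os.listdir(src_dir)):
        try:
            img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        except OSError:  # skip corrupt or truncated downloads
            continue
        img = img.resize(SIZE, Image.LANCZOS)
        img.save(os.path.join(out_dir, f"{genre}_{i:06d}.png"))
```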

It was a triumph for me to even get this code running. I have deliberately skipped learning about frameworks for programming neural networks from the ground up in the traditional way, because I read and heard this would take several months – more time than I had. So I have been picking things up as I go, and digging into details only when things break. This kind of challenge is everywhere when running someone else’s code that was never intended as anything other than an experiment or prototype. That is not a criticism of the developer, who very often has gone to a lot of trouble to document what they’ve done and how to replicate their results. But altering anything – changing datasets, or using more recent versions of libraries and languages – leads to problems. It is not uncommon to find “magic numbers” in various places in the code, leading to questions like “why is this variable multiplied by 844 on line 127?” Chances are that no one knows. If the author hasn’t looked at it in a year or more, even they can’t remember. I have had to abandon my attempts to get several promising and widely cited GANs to work after days or weeks of unsuccessful troubleshooting. Perhaps someone with greater programming experience in deep learning could overcome these problems; I don’t know. I have often encountered obstacles and been unable to find solutions from anyone, in person or online.

64×64 pixel samples of images generated by GANGogh in the category “Symbolic Painting”

In order to produce faces informed by the much larger Wikiart dataset, I added my Renaissance face images as a fifteenth category alongside the 14 original ones, resized the input and output matrices, and let it train for most of the night.
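GANGogh conditions its generator on the category label, so adding a class mostly means enlarging the label dimension and feeding the new one-hot vector through. The sketch below shows the general conditioning scheme; the constant names and the latent size of 128 are my assumptions, not the actual identifiers in Jones’s code.

```python
import numpy as np

NUM_CATEGORIES = 15  # 14 original Wikiart genres + my Renaissance faces
Z_DIM = 128          # latent noise size; the real value may differ

def make_generator_input(batch_size, category_idx, rng=np.random):
    """Noise concatenated with a one-hot category label, the usual
    input for a class-conditional generator."""
    z = rng.normal(size=(batch_size, Z_DIM)).astype("float32")
    labels = np.zeros((batch_size, NUM_CATEGORIES), dtype="float32")
    labels[:, category_idx] = 1.0
    return np.concatenate([z, labels], axis=1)
```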

The resulting images are small – attempts to scale this GAN beyond 64×64 pixels were not successful. At each training epoch (one complete pass through the training dataset), the code produces a grid of 90 sample images for each of the 15 categories. Because only the requested category changes between grids, we can look at images that differ only by the category we are asking for, and watch an image shift slightly to become a landscape, a portrait, a religious painting or a Renaissance face.
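Assuming the conditioning scheme sketched above, these per-category grids fall out of a loop like the following. The generator here is a stand-in stub so the snippet runs on its own; in practice it would be the trained GANGogh generator network.

```python
import numpy as np
from PIL import Image

NUM_CATEGORIES, Z_DIM, GRID = 15, 128, 90  # as in the earlier sketch

def generator(batch):
    """Stand-in for the trained generator: returns random 64x64 RGB
    images so the loop below is runnable end to end."""
    return np.random.rand(batch.shape[0], 64, 64, 3)

# One fixed noise batch reused for every category, so the grids
# differ only in the label we ask for.
fixed_z = np.random.normal(size=(GRID, Z_DIM)).astype("float32")

for idx in range(NUM_CATEGORIES):
    labels = np.zeros((GRID, NUM_CATEGORIES), dtype="float32")
    labels[:, idx] = 1.0
    samples = generator(np.concatenate([fixed_z, labels], axis=1))
    # Tile the 90 samples into a 9x10 sheet and save it as one image
    tiles = samples.reshape(9, 10, 64, 64, 3).transpose(0, 2, 1, 3, 4)
    sheet = (tiles.reshape(9 * 64, 10 * 64, 3) * 255).astype("uint8")
    Image.fromarray(sheet).save(f"samples_category_{idx:02d}.png")
```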

64×64 pixel images created by GANGogh in the category “faces”, trained on Wikiart and my Renaissance faces dataset

The first few times I ran this (with slightly different training parameters) it stopped prematurely due to an error. It is interesting to see the colourful and evocative face images emerge from noise as the training proceeds. With the present approach, however, the faces are probably not an improvement over those produced earlier with a DCGAN and without the Wikiart dataset – the colour and shape variation is probably not conducive to compositing the faces into the scenes I am trying to create. I am continuing to experiment with this approach in the hope that further training will produce faces more suitable for my needs.