The first and simplest GANs, such as DCGANs, produce images that mimic the images they were trained on (see the black-and-white digits produced by a DCGAN in the blog post Getting Started with GANs). Others have since developed variations that address some of the common failure modes, such as GANs failing to converge (i.e. failing to train at all) or mode collapse, where a GAN produces the same output over and over. Techniques such as Wasserstein loss, SELU, and subdivision can sometimes stabilize training and reduce the amount of trial and error needed to tune hyperparameters. For a sense of some of these approaches and their tradeoffs, see Alexia Jolicoeur-Martineau’s blog post on generating cat faces with GANs: https://ajolicoeur.wordpress.com/cats/ .
CycleGANs are significantly different, using the same building blocks in quite a different way. This week I got some promising results after training a CycleGAN on Early Renaissance paintings from WikiArt and astronomy pictures from NASA’s Astronomy Picture of the Day. I used the CycleGAN in an unconventional way, so I will start at the beginning with CycleGAN basics.
Unlike style transfer, which can render a photo in the style of a particular painter, CycleGANs are normally used to convert an image of one subject into an image of another. Style transfer is the subject of my next blog post.
A CycleGAN is a pair of complementary GANs: one converts pictures of A into pictures of B, and the other converts pictures of B into pictures of A. The authors of the original CycleGAN paper demonstrate training a CycleGAN on a thousand pictures of zebras and a thousand pictures of horses.
Converting A->B and B->A with CycleGAN. (From the original paper: https://arxiv.org/abs/1703.10593v6 )
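What ties the two generators together is the cycle-consistency loss: translating A→B→A should recover the original image. Here is a minimal sketch of that loss in PyTorch; the generator names G_AB and G_BA are placeholders of my own, not identifiers from the released code:

```python
import torch

def cycle_consistency_loss(G_AB, G_BA, real_A, real_B, lam=10.0):
    """L1 penalty for failing to reconstruct an image after a round trip.

    G_AB, G_BA: generator networks mapping A->B and B->A (placeholder names).
    lam: weight on the cycle term (the original paper uses lambda = 10).
    """
    recon_A = G_BA(G_AB(real_A))  # A -> B -> A
    recon_B = G_AB(G_BA(real_B))  # B -> A -> B
    loss = torch.mean(torch.abs(recon_A - real_A)) + \
           torch.mean(torch.abs(recon_B - real_B))
    return lam * loss

# Toy check with identity "generators": the round trip is exact, so the loss is 0.
identity = lambda x: x
x = torch.randn(2, 3, 64, 64)
y = torch.randn(2, 3, 64, 64)
print(cycle_consistency_loss(identity, identity, x, y).item())  # 0.0
```

In the full objective this term is added to the usual adversarial losses for the two discriminators; it is what keeps the generators from mapping every input to the same plausible-looking output.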
Like other GANs, this approach requires a training dataset with more than a thousand images for each type of scene, and the networks can be hard to train. Even once trained, there are other ways they can fail to deliver the desired results.
How to blend into the herd: A shirtless man (Vladimir Putin?) becomes an extension of his zebra. (From the paper )
The training and image-generation code has been released on GitHub (here). It uses CUDA processing on the GPU to speed up training, so I had to run it on a remote machine with multiple GPUs.
As an aside for anyone looking to replicate this: it took some wrestling to get it working, mostly due to inexperience. The code wants to serve a graphical display of training progress on a local port, which doesn’t help when the machine is remote; if that graphics server isn’t running, the code will crash. Tushar Gupta helped me figure out how to get this all working on AWS. You need Python 3 with PyTorch and CUDA, and should check the other module requirements in requirements.txt. Before starting, make sure to run:
python -m visdom.server
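With the visdom server running, training can then be launched along these lines. The dataset path and experiment name below are illustrative placeholders for my setup, not values from the repository; check the repo’s README for the exact options it supports:

```shell
# Start the visdom progress display in the background so training can log to it
python -m visdom.server &

# Launch CycleGAN training; --dataroot and --name here are placeholders
python train.py --dataroot ./datasets/renaissance2apod \
                --name renaissance2apod \
                --model cycle_gan
```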
I decided to deliberately try to map normally unrelated subject matter together. Zebraman, above, is a fine example of machines making connections that humans would never make (…okay, very few humans would ever make). Part of my process of writing plausible machine creation myths is looking for connections that might seem natural to machines, but unexpected to us. One of the myths that I suggested could emerge from astronomy AIs on Mauna Kea was the veneration of distant nebulas – the birthplaces of stars. Since the future renaissance machines are taking their image-making cues from traditional human images, they mash up astronomical images and gilding.
A couple of weeks ago, I had managed to scrape about 2,500 pictures from NASA’s Astronomy Picture of the Day website. For the first pass, I simply cropped all the images square and resized them to 64×64 pixels. I did the same for every picture in the Early Renaissance category of wikiart.org, also about 2,500 images. That collection therefore included paintings, drawings, and sculptures of many different subjects, many with heads cut off by the cropping. The astronomy pictures were an equally varied lot. This is not ideal for training a GAN, where more consistency would be helpful, but I wanted a baseline to see how much improvement I would get by sorting the images manually, by scraping with keywords, or by filtering with computer vision techniques before feeding them to the GAN.
Early Renaissance pictures from Wikiart.org, cropped square and resized to 64×64 pixels.
Astronomy Pictures of the Day from NASA, cropped square and resized to 64×64 pixels.
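The crop-and-resize step can be sketched with Pillow. This is not the exact script I used, and the directory paths are placeholders:

```python
from pathlib import Path
from PIL import Image

def center_crop_square(img):
    """Crop the largest centered square out of an image."""
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    return img.crop((left, top, left + side, top + side))

def preprocess(src_dir, dst_dir, size=64):
    """Center-crop every image square and resize it to size x size."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).iterdir():
        try:
            img = Image.open(path).convert("RGB")
        except OSError:
            continue  # skip files Pillow can't read
        img = center_crop_square(img).resize((size, size), Image.LANCZOS)
        img.save(dst / (path.stem + ".jpg"))

# e.g. preprocess("scraped/apod", "datasets/renaissance2apod/trainB")
```

The center crop is what decapitates so many of the Renaissance figures: tall portrait-format paintings lose their top and bottom, and heads tend to sit near the top edge.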
I let it run for several hours, about 22 training epochs; an epoch means one pass through all the training data, and multiple epochs, often hundreds, are usually needed. The authors mention using 200 epochs (https://arxiv.org/pdf/1703.10593v6.pdf, Section 7, Appendix). I also had to write a short script to combine the output images in some sort of order that I could make sense of (code on my GitHub).
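A grid-combining script of that sort can be sketched with Pillow. The real version is on my GitHub; the tile size and layout here are assumptions:

```python
from pathlib import Path
from PIL import Image

def contact_sheet(image_dir, out_path, cols=8, tile=64):
    """Paste same-sized tiles into a grid so an epoch's output can be
    scanned at a glance. Assumes roughly tile x tile input images."""
    paths = sorted(Path(image_dir).glob("*.jpg"))
    rows = (len(paths) + cols - 1) // cols
    sheet = Image.new("RGB", (cols * tile, rows * tile))
    for i, p in enumerate(paths):
        img = Image.open(p).resize((tile, tile))
        sheet.paste(img, ((i % cols) * tile, (i // cols) * tile))
    sheet.save(out_path)

# e.g. contact_sheet("results/renaissance2apod/epoch_21", "epoch_21_sheet.jpg")
```

Sorting the filenames keeps each original next to its translation in the grid, which makes it much easier to spot when the generators start producing something interesting.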
In the 21st epoch, promising results started to appear:
A Madonna becomes a globular cluster…
A globular cluster becomes a Madonna.
The original Madonna.
The original globular cluster.
Normally CycleGANs are used to create a complete transformation (e.g. all the way to zebra). In this case, interesting intermediate results can be achieved by pausing partway through the training. Although the method is quite different, the results resemble the interpolations between image categories possible with BigGAN, discussed earlier.
This result was based on the smaller datasets I had before developing a new web scraper for APOD (discussed last week) and before doing some filtering and segmenting of the Renaissance images (to be discussed later this week). I’m looking forward to more experiments with CycleGAN.