Getting Started with GANs

For the last two weeks I have focused on concepts and techniques for creating mythological images. Generally, I come up with a mythological scenario and then create the image around it – the civilization of machines being unearthed from the ground, emerging from the ocean, resulting from a crashed flying machine and various others.

I start with a sketch (example below)…

…and build a photo composite based on the general composition of the sketch. This one below is made by compositing low-res images from ImageNet, which is a dataset used to train machine vision systems.

ImageNet looks something like this: a hierarchy of English nouns with associated photos – almost 15 million photos in total.

I also tried using the outputs of BigGAN – a new neural network trained on ImageNet that generates an endless variety of eerily photorealistic images in various categories. Some of its outputs can almost pass for photographs but contain strange duplications, omissions, distortions and dreamlike details that don’t make sense. Some of the mysterious objects looked like good candidates for the myth cycle wherein machine life emerged from the sea, so I composited them together with a seascape. This test image suggests the approach may have some promise, but perhaps the ambiguity of the machine-dream objects makes these images unreadable.

These are rough images intended to explore concepts, not polished pieces; I don’t want to spend too much time on concepts that don’t work out. Here’s one I did spend quite a bit of time on, combining images from ImageNet and from BigGAN, but which really didn’t work. I was experimenting with showing a crash-landed drone as the figurative source of the spark of wisdom, reminiscent of da Vinci’s Madonna of the Yarnwinder and the monolith in 2001: A Space Odyssey.

I think it is fitting that the machines would use pictures from ImageNet to depict their earliest cultural memories. But for this image I had to trawl through hundreds of pictures just to find a drone and a cave that would sort of work, and even those weren’t great. Part of the problem is that you don’t get your pick of subjects or lighting conditions from ImageNet, and many of the promising images carry restrictive copyright that prohibits their use in derivative works like this. They are not selected for inclusion in ImageNet because they are beautiful. The low resolution of the images is another major issue. Almost all of the pictures seem to be from before 2007 – perhaps when people started uploading their photos to Facebook rather than Flickr and personal webpages – and the camera sensors of 12–15 years ago don’t produce beautiful results.

My original concept was to develop a GAN or other generative technique that could create original patterns for scribing on gold leaf. Unfortunately, GANs are at the complicated and recent end of the Machine Learning and AI spectrum. There are few books, the techniques are complex, and you need large datasets of appropriate imagery. GANs are famous for being difficult to train: there is a lot of time-consuming trial and error while tuning hyperparameters, and mode collapse – where a network decides to produce the same image over and over – is a common problem. Sophisticated approaches like BigGAN are out of reach, requiring enormous processing power and no small amount of electricity. I began experimenting by replicating others’ results.

One classic experiment looked promising. Ian Goodfellow’s 2014 paper, which first presented GANs, attempted to create handwritten digits that would be indistinguishable from those written by people. Following in his footsteps, I was able to create these using one of the simplest convolutional variants, a deep convolutional GAN or DCGAN. It is shown here progressing through stages of training, from random noise to passable results.
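The adversarial setup in Goodfellow’s paper can be written as a two-player minimax game between a generator G and a discriminator D:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] +
  \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator D tries to maximise its accuracy at telling real images x from generated images G(z), while the generator G tries to minimise it by fooling D. Training alternates between the two networks, which is part of why convergence is so fragile and mode collapse so common.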

I then replicated another well-known experiment, creating new celebrity faces from the CelebA dataset. Here are some of the resulting people who never were, dreamt up by a machine:

By the end of the second training epoch, they are starting to look pretty human. More recent techniques have improved on these – see these folks created by a Progressive GAN (link).

I was fortunate to be able to make use of pre-existing datasets that are all carefully cropped with the eyes in the centre of the frame. Training my own GAN on my own data was trickier, so I wrote a little utility to resize and crop images. I am pretty new to Python. Last week I went through about half of Learn Python the Hard Way, which definitely helped me get my bearings inside other people’s code. The fact that you can run it line by line or in a Jupyter Notebook makes it easy to take other people’s code apart and see what’s happening. Python is pretty convenient. Here’s how you open a file in Python and print it out:

filename = "hello_world.txt"
with open(filename, "r") as file:
    for line in file:
        print(line, end="")

This is refreshingly useful and direct, especially when compared with C or Java. Here’s how you do it in Java (not including the main class you need in order to call this):

List<String> list = new ArrayList<String>();
File file = new File("file.txt");
BufferedReader reader = null;
try {
    reader = new BufferedReader(new FileReader(file));
    String text = null;
    while ((text = reader.readLine()) != null) {
        list.add(text);
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    try {
        if (reader != null) {
            reader.close();
        }
    } catch (IOException e) {
    }
}
// print out the list
System.out.println(list);

(Courtesy of dogbane at Stack Overflow – link. Because I have to look it up every time.)

I have been asking myself how Practice-based Research works when part of the creative process involves coding or building. It’s too early to say, but I can say that the feeling of writing a ten-line program that crops and resizes 1400 photos in a few seconds was absolutely thrilling – like having a studio assistant who takes the grunt work out of preparing to make a picture. I could easily see using Python to filter sketches from paintings (based on colour composition) or to centre faces on the page using pre-existing libraries like OpenCV. Python feels like it will actually speed up the process of creating, as opposed to being a necessary but very slow grind like developing in lower-level languages.
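As a rough illustration, a crop-and-resize batch utility along these lines can be sketched in a few lines of Python with the Pillow library (the 64×64 output size and folder layout here are my assumptions, not the exact utility described above):

```python
import os
from PIL import Image  # Pillow: pip install Pillow

def center_crop_resize(img, size=64):
    """Crop the largest centred square from img, then resize it to size x size."""
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    img = img.crop((left, top, left + side, top + side))
    return img.resize((size, size), Image.LANCZOS)

def process_folder(src_dir, dst_dir, size=64):
    """Apply center_crop_resize to every image in src_dir, saving into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        if name.lower().endswith((".jpg", ".jpeg", ".png")):
            img = Image.open(os.path.join(src_dir, name)).convert("RGB")
            center_crop_resize(img, size).save(os.path.join(dst_dir, name))
```

Pointing `process_folder` at a directory of training images produces uniformly sized squares, which is the shape most DCGAN implementations expect.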

I was also assisted immeasurably by Tushar Gupta, a programmer I found through Codementor, who helped me get unstuck and suggested some approaches to try.

I scraped about 1400 early Renaissance paintings from WikiArt and ran them through the DCGAN. These are quite a mismatched group of images, so I wasn’t expecting miracles. Here’s a sample of a few images from the dataset, resized to a tiny 64×64 pixels:

Predictably, the DCGAN was not able to get a handle on this diversity of images in just two training epochs, which took about 20 minutes to run on my laptop.

Here’s how far it got, going from random noise to coloured blotches.

It did pick up on the colour palette – avoiding chrome yellow, durable greens and other pigments that were not available until much later.

Experiments with CycleGAN and pix2pix (for image style transfer), and other GANs, will be next. I will be running those on powerful cloud-based servers with GPUs that support CUDA, which my laptop cannot do.

I’ve noticed that the sheer amount of time spent trying to execute on some of these ideas provides space for creative ideas to arise. As I mentioned in my blog post about BigGAN, spending hours going through images searching for useful material allows part of my mind to focus on the details of the task while some other part is chewing on the themes and general questions of the project in a more abstract, non-verbal and sometimes non-conceptual way. It would be hard to sit down and work on developing ideas or solving problems related to this research for hours on end, but when there is a making task to guide and focus my attention, it becomes possible to think about the research most of the day.

I had an idea spurred by working on the celebrity faces and cropping the Renaissance paintings: would it be possible to run a face-detection algorithm on Renaissance paintings and then train a GAN on just those faces? This could provide appropriately painterly figures for an image, avoiding the problem of too much photorealism when trying to create a mythological or allegorical space.

The image I want to use this technique in is partly based on an image I found while looking for gold backgrounds that could be used to train GANs. That will be the subject for another blog post.