GANs and Icons

A quick recap of what this project is about:

Future Renaissance imagines a distant future society of intelligent machines that only dimly remember humanity. With digital records lost or corrupted, the machines study humanity’s surviving physical artworks, using this imagery to illustrate their own creation myths. Traditional artmaking techniques combine with robotic tools to physically record these digital dreams.

At the outset of this project, I imagined I would composite photos together to tell a mythic story, and possibly train a robot using machine learning to inscribe ornamentation and detail on the gold leaf background. In fact, I found it was really helpful to collaborate with the machines to make the images without using photographs. I’m only now looking at ways to generate vector art to inscribe on the gold.

I started looking at The Noun Project, a collection of more than a million icons organized by category, which can be licensed for a fee or used free of charge if the designer is credited.

Icons from the Noun Project’s home page. Apologies for not crediting each designer individually here…

I wrote some Python code to download icons. The Noun Project generously allows 5000 API calls per month on free accounts, and each call returns URLs and attribution data for up to 50 icons in a category I specify. Using an API and JSON in this way was new to me, so it took a lot longer than I wanted it to. I spent a little more time refining the code than I strictly had to, so that it would be a bit more user friendly…so that if I want to use it again in three months or three years, it won’t be like reading ancient Greek. (Code is here).
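For anyone curious, here is a minimal sketch of that kind of downloader, using requests and requests_oauthlib. The endpoint path and response fields (icons, preview_url, attribution) are my assumptions about the Noun Project’s v1 API, and the credentials and paths are placeholders – an illustration of the approach rather than a copy of my script.

import os
import time
import requests
from requests_oauthlib import OAuth1

AUTH = OAuth1("YOUR_API_KEY", "YOUR_API_SECRET")   # hypothetical credentials
TERM = "sun"
OUT_DIR = os.path.join("icons", TERM)
os.makedirs(OUT_DIR, exist_ok=True)

for page in range(10):                             # 10 calls x 50 icons = 500 icons
    url = f"http://api.thenounproject.com/icons/{TERM}"   # assumed endpoint
    resp = requests.get(url, auth=AUTH, params={"limit": 50, "offset": page * 50})
    resp.raise_for_status()
    for icon in resp.json().get("icons", []):
        img = requests.get(icon["preview_url"])
        with open(os.path.join(OUT_DIR, f"{icon['id']}.png"), "wb") as f:
            f.write(img.content)
        # keep attribution so designers can be credited later
        with open(os.path.join(OUT_DIR, "credits.txt"), "a") as f:
            f.write(f"{icon['id']}: {icon.get('attribution', '')}\n")
    time.sleep(2)                                  # be polite to their server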

Here are a few of the Noun Project’s “sun” icons to give you an idea:


There are over 14,000 icons in this category alone, by thousands of different designers. I gathered about 5000 of them, partly because of the time it takes, and partly because I’m not sure what the etiquette is when downloading so many files from someone’s server.

After a few false starts (i.e. most of a day) I got carpedm20’s TensorFlow DCGAN (link) to read these – about 5000 at a time – and make new examples. I am also pleased to now be working at 128×128 pixel resolution rather than 64×64. Here’s what it made with “sun” once I got everything set up:

64 GAN-generated images based on “sun” icons from The Noun Project

I find the results pretty interesting – a machine’s abstraction of various humans’ abstractions of a heavenly body. I am working on a more symbolic composition with several sets of icons made using this approach. Updates to follow!

Figurative vs Symbolic

I’m continuing my adventure of making images using GANs, a machine learning technique that uses two neural networks trained on a large number of images to produce new examples. This week I made some new portraits.

I was reminded this week of the story of the happy face…recalling that it predates the whole concept of emoticons. I remember being told that this is the simplest drawing a newborn will respond to.

(image: The World Smiley Foundation)

Whether or not that’s true, we’re so obviously hard-wired to notice faces. They jump out at you, even when they’re not really faces, and I guess you don’t have to think too hard about our evolution to imagine why it would have been important to all of our ancestors to notice faces and pay attention to their nuances.

Faces in Things
Faces are everywhere. See, for example, the Faces in Things Twitter account: https://twitter.com/FacesPics.

Years ago I tried photographing inanimate objects in studio lighting conditions – photographing, say, a fire hydrant with a softbox as key, a fill and a rim light. I thought it would look interesting. It mostly just looked like a fire hydrant. What always does look interesting, though, is a picture of a person. Maybe that’s why the first images I made using components from various GANs that really kind of worked were pictures of people:

Composite from several different GANs

I used a DCGAN to make faces based on several thousand faces from Renaissance paintings. The body came from Robbie Barrat’s GAN pre-trained on portrait paintings, with some transfer learning from my Renaissance dataset. I used SRGAN to increase the resolution (by inventing detail), and a style transfer network to help blend the face with the style of the body.

It’s hard to credit the full lineage of researchers and artists who had a hand in building these tools, much less acknowledge the nearly 20,000 paintings that went into training the GANs that made the images.

This is not a finished image – I am still doing some gilding and incorporating background details – but it’s interesting to note that it stands on its own better than the more symbolic and abstract pieces I’m working on, like the one in the previous blog post. I still don’t have that one at a point I’m happy with.

Making pictures

My process is to create raw materials using neural networks (GANs), composite them together by hand, and transfer the results to a gilded panel. For now, I am creating the designs inscribed on gold by hand in Illustrator.

The process starts with sketches and lots of experimentation with the outputs from the GANs. As I mentioned in a previous post, I recently built a larger dataset to train the GANs to produce better options. I was feeling limited by the GAN output I had.

I have been producing faces and bodies with separate neural networks. Our human minds are so finely attuned to the nuances of faces that the faces benefit from the extra detail and exactness produced by a GAN trained on faces alone.

Here’s an example of an image that didn’t seem worth printing. I didn’t do much fine-tuning of this image (e.g. to match the skin tone of the face to the body!). I liked the upward gaze of the face and planned to have her contemplating a scene, but when assembled, her look seemed too odd and her pose too formal and not very expressive. Nonetheless, it was one of the clearer figures and better faces from my GANs, so I didn’t have a lot of choice.

Unsuccessful composite of face and body

I decided to build a better dataset and train the GANs some more, detailed in this blog post. The faces look promising, and I’ve been working on some new composites.

At the same time, I’ve been practicing my gilding. This is a pretty time-consuming step – building up the base and adding multiple layers, sanding them smooth and then very often encountering some other problem – cracks, scratches, or a base that’s too hard or too soft or too gluey or not gluey enough. If I rub through the gold while burnishing, it can be hard to patch without leaving obvious marks. As Zach Arias once said about photography, “We want to make big leaps, but what actually happens is gradual progress”. This seems to be true here as well. Each panel I make is better than the last, as I figure out what temperatures to use, and what brush, etc.

In the mornings I’ve been sketching compositions. I find I’m pretty foggy first thing in the morning, so the sketches can take unexpected turns. Through this process, I got the idea to portray the neural network’s architecture by inscribing it in the gold background. Over a few days this evolved:

Freehand drawing is not my forte, but it’s still a helpful way of thinking
Experimenting with giving the networks scale. In a digital drawing, I could include many more layers.

I also thought about working the nodes in a network into a lace, scrollwork or vine motif in the background that would blend organic and digital components, alluding to how life emerges from networks, and neural networks emerged from biological forms.

For the figure, I used a design created using transfer learning – a network trained on portrait paintings and then briefly on Renaissance faces. It produced an example that evoked a human figure, but with an interesting internal patterning suggesting a less literal, more symbolic portrayal.

I researched structures of neural networks and found some great resources: the Asimov Institute’s Neural Network Zoo (link), and Piotr Migdał’s very insightful Medium article on the value of visualization in talking about network architecture (link).

The elements I wanted to combine (from left to right): a figure created by transfer learning (GAN) + neural network diagrams (source) + vine design

The network will emerge from the figure into the gold background, with the nodes of the network represented as leaves and the complex interconnections between them represented by vines.

I am still working on the drawing while my gilded panels dry. Hoping to try punching (stippling) as well as engraving to get some shading on the leaves (nodes). Here is the panel so far:

The printed figure is mostly hidden under a layer of gold and frisket, a protective latex film; both will be removed.

This is certainly my cleanest gilded panel so far, lacking the usual rips and tears, but there may be a problem with the surface – it’s not burnishing properly and I’m not sure why. No big leaps, then, but slow steady progress.

Finding Faces, colours, and materials: more machine vision tools for building training datasets

In a recent post, I worked on improving the raw material my GANs pass along for me to use in making composite portraits. I noticed that the Renaissance faces I’m using for training do a lot more head tilting than the more usual GAN training data consisting of celebrity headshots. I wondered if machine vision could be used to rotate the training faces upright, making it easier for the neural networks in the GAN to learn their patterns.

In object detection with neural networks, the training data can be augmented by including copies of the same images with various crops and rotations to bulk up the dataset, giving the network the opportunity to learn about objects appearing in various parts of the frame in different orientations. I haven’t heard of this technique being used to train GANs, but I thought it would be worth trying a couple of ways – first removing rotation so all the faces are straight, then trying them at various rotations to see if the larger dataset helps in training.
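Both ideas come down to a few lines of image manipulation. Here is a minimal sketch using Pillow, assuming the roll angle of each face has already been estimated (for example by the face detection discussed next); the paths, angle source and sign convention are assumptions.

from pathlib import Path
from PIL import Image

def straighten(path, roll_deg, out_dir):
    """Counter-rotate a face crop so its estimated roll angle becomes ~0."""
    img = Image.open(path)
    # sign convention may need flipping depending on how roll was measured
    img.rotate(roll_deg, resample=Image.BICUBIC).save(Path(out_dir) / Path(path).name)

def augment(path, out_dir, angles=(-15, -7, 7, 15)):
    """Write extra rotated copies of a face to bulk up the training set."""
    img = Image.open(path)
    for a in angles:
        img.rotate(a, resample=Image.BICUBIC).save(
            Path(out_dir) / f"{Path(path).stem}_rot{a}.png")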

Rather than cooking up my own, I turned to Google’s Cloud Vision API, which has been used to detect various image properties on the Metropolitan Museum of Art’s online collection. (I wrote about this here). Not only are the images categorized by period, medium, genre and style, but the database hosted by Google’s BigQuery also contains a little bit of data on what Cloud Vision thinks is in the image. If you want images where the dominant colour is red, you’re in luck. One of the more experimental features is facial expression recognition. This is definitely something computer vision struggles with at the moment.

It was possible to build a small dataset of images for “sad” facial expression, but it contained many questionable entries, some of which are not faces, such as the profile of a piece of decorative moulding included in the collection below:

Images containing a high probability of “sorrow” facial expression according to Google Cloud Vision

The AI struggled even more with “surprise”, as it lacks the ability to read semantic cues about when people are surprised, as opposed to, say, singing:

Not sure any of these people are really surprised…

As interesting as this was, my first pass suggested this feature is not ready to build large datasets for me. I hope to return to it in the future. In the meantime, I turned my attention to its face detection. The implementation of Facenet I’ve been using is trained to detect faces in photographs, so it misses quite a lot in paintings (see blog post). I wondered if Google’s face detection might do better than mine, and so far it looks encouraging. It also provides pan and tilt estimates for each face. I tried overlaying a few of its measurements on one of the faces, with promising results:

Crop showing accurate face detection with roll angle and even a decent guess at emotion

This was a randomly selected face, and only a single data point, but I noticed that it rated joy as UNLIKELY rather than VERY UNLIKELY. This is promising enough that I will write a scraper and other scripts to gather these faces.
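Here is a rough sketch of how those face measurements could be pulled out of the Met’s BigQuery tables with the Python client. The column and field names (faceAnnotations, rollAngle, joyLikelihood) follow the Cloud Vision JSON format, but I am assuming the table’s schema mirrors it, so treat this as illustrative.

from google.cloud import bigquery

client = bigquery.Client()
query = """
SELECT v.object_id, i.original_image_url, face
FROM `bigquery-public-data.the_met.vision_api_data` v,
     UNNEST(v.faceAnnotations) AS face
JOIN `bigquery-public-data.the_met.images` i ON i.object_id = v.object_id
"""

for row in client.query(query).result():
    face = row["face"]
    # roll angle could drive the straighten() sketch above;
    # joyLikelihood values look like "VERY_UNLIKELY", "UNLIKELY", ...
    print(row["original_image_url"], face["rollAngle"], face["joyLikelihood"])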

It was also fun to look at all the images of objects in the collection that were made of rock. I’m going to see if I can make some interesting rock objects with a GAN based on these, which will help my images grow from being simple portraits to more complex scenes.

Inkjet and water gilding

My project imagines stories told through pictures by distant future artificial intelligent machines, who rummage through humanity’s physical traces, finding mostly e-waste and some preserved works of art. I am training neural networks (GANs) to create new images from large collections of images of artworks.

Another part of this project entails taking the images made with neural networks out of the digital realm into the physical. It is an interesting process to gather digital images by the thousands and then discard them, keeping only the information encoded in the neural network (GAN), ultimately returning the images to the physical world on panels that resemble the Medieval and Renaissance panel paintings on which they are based.

Late medieval and early Renaissance paintings were done on wooden panels with gold leaf backgrounds. I am following this traditional technique, but with circuit boards rather than wood, inkjet printing rather than tempera paint, and a computer-controlled machine to punch and inscribe the gold background rather than doing it by hand.

For a quick review of the process of making a panel painting on gold, it’s hard to beat this video from the National Gallery. My process is the same, except I’m using an inkjet printer instead of paint.

In a previous blog post, I posted a photo of one of these. In it, I had printed a composite portrait of a figure onto a metallized film, cut it out and adhered it to the gilded panel. Last week I ran a bunch of experiments printing on different papers, films and coatings, and this week I managed to print directly onto the fibreglass panel.

First I tried various options for transferring the images onto the gesso using a transfer medium. Bonny Lhotka’s books and website have quite a bit of information about printing and transferring onto unconventional materials. I used a popular method based on printing on the non-absorbent side of freezer paper, and then scraping the still-wet ink onto the final surface. The results I got were interesting, but not quite what I was looking for. The ink tended to bead up on the surface of the transfer paper, leaving splotches and gaps in the resulting transfer. Colours were not well separated either, as everything tended to run together a bit.

Image printed on freezer paper and transferred to plain note paper. Despite trying various settings, this was one of the cleanest results.

Inkjet printers are a marvel of technology, precisely placing picolitre droplets onto a medium. Photo paper and other substrates are carefully made to optimize this process. Try printing a colour picture on plain paper, and you get a blurry, wrinkly mess. Try it on a shiny surface, and you get a greasy mess.

The same test print on glossy photo paper (L) and on gold foil (R). The baby faces are as disturbing as anything a GAN can produce, but alas not what I was looking for.

I discovered the murky and mysterious world of printer ICC profiles and various settings that affect how much ink is deposited on the print, as well as a coating intended to take non-traditional surfaces (fabric, regular paper, etc) and make them suitable for receiving ink.

Two successful prints on gold foil painted with Golden Digital Ground and one curled up mess on vellum supposedly suitable for inkjet printers.

Luckily, my photo printer (Epson R3000) accepts media like posterboard up to 1.3mm thick and offers a “straight through” printing path. I got some 0.75mm thick circuit boards (bare copper on fibreglass) to see if I could run them through directly, using the Golden Digital Ground that had worked on the thinner media.

This was tricky, because painting the various layers of rabbit skin glue, gesso and bole onto these much thicker panels causes them to warp when the glue sets. This has been a problem for the embossing phase as well. I solved it by painting both sides of the panel for the first few coats, then sanding the unnecessary material off the back once the good surface was stabilized by the usual 6-8 coats of gesso. The usual scraping and sanding needed to ensure the panel is flat and smooth was enough to bring the gessoed circuit board under the 1.3mm maximum the printer will accept. I already knew from my earlier printing tests that a “head strike”, where the print head collides with the material, can be bad enough with paper, so I really didn’t want it to happen with something rigid like the panel.

The panels I have are 6″x9″ (around 15cm x 23cm), so rather small, and the printer wouldn’t accept them without crashing the software. I eventually got around this by making a holder for the panel out of a 1.3mm thick mat board used for picture framing, with a 6″x9″ hole cut to fit the panel exactly (shown below with a bare copper/fibreglass panel).

Using this approach I was able to print on fibreglass that I had treated with the Golden Digital Ground.

Copper/fibreglass circuit board panel partly coated in Golden Digital Ground with some of my latest portraits generated with transfer learning. The rightmost edge of the image at bottom left overlaps an area of the copper not treated with the Golden Digital Ground, and you can see the ink beading up on the surface there.

I took a few deep breaths before sending a gessoed panel through, but the results were quite good on both untreated gesso and gesso treated with Golden Digital Ground.

Gessoed circuit board panel with inkjet printed figure. Washers were used later as spacers to keep a dust cover from sticking while everything dried.

I then made a stencil that matched the outline of the figure and sprayed the inkjet-printed part with lacquer to protect it from the water used in the next step, applying the bole.

New figures, new faces

I’ve been using faces extracted from Renaissance paintings to train the neural networks (GANs) and create “new” faces and portraits. I am collaborating with these neural networks to make portraits of what distant future artificially intelligent machines might remember as their ancestors, gods or creators. Painted portraits were not originally digital images, of course, so they seem like a good place to start when looking for progenitor figures.

Renaissance paintings, particularly from the early Renaissance, often employ repetitive and stereotyped facial expressions. Even the finest examples seem to aim at some heavenly ideal of beauty in which individuality is not the main point – see, for example, Duccio’s Maestà, a masterpiece from 1311 CE.

Duccio Maestà (link)

Less than ten years later, Giotto was showing much greater individuality and a wider range of facial expressions. For example, see St. Francis Renounces his Worldly Goods (c. 1320) (link)

St. Francis Renounces his Worldly Goods (c. 1320) (from wikiart.org)

For the next 200 years, faces only got more expressive and individual. There are fewer of them, but they are much more interesting than the 50,000 headshots of smiling celebrities often used to train neural networks.

The last faces I made (see blog post) produced some interesting examples, but most of the output from the GAN was unusable, probably because of the small number of images. Revisiting the dataset, I also noticed that some of the images were quite small, and my data pre-processor had scaled them up.

This time, I took not just early Renaissance faces (which would be contemporary with the gold backgrounds I use in the physical process), but also included some High Renaissance and Mannerist images. After removing all the faces smaller than about 64×64 pixels, I was left with a higher-quality dataset almost twice the size, at around 4500 images.
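The size filter itself is simple. Here is a rough sketch with Pillow, assuming the face crops sit in a single folder; the 64-pixel threshold comes from the post, while the paths and output size are illustrative.

from pathlib import Path
from PIL import Image

SRC, DST = Path("faces_raw"), Path("faces_clean")
MIN_SIDE, OUT_SIZE = 64, 128
DST.mkdir(exist_ok=True)

kept = 0
for f in sorted(SRC.glob("*.*")):
    img = Image.open(f)
    if min(img.size) < MIN_SIDE:
        continue                    # drop tiny faces instead of upscaling them
    img.convert("RGB").resize((OUT_SIZE, OUT_SIZE), Image.LANCZOS).save(
        DST / f"{kept:05d}.png")
    kept += 1
print(f"kept {kept} faces")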

Training overnight on a rented AWS instance, I got some interesting candidates around epoch 70:

Grid of faces produced by Deep Convolutional Generative Adversarial Network (DC GAN) trained on faces from Renaissance paintings

The results are more interesting, more varied, and higher quality than what the GAN produced previously. This process was about 100x faster than the first time I did it. I knew which GAN implementations to use, and I had already written Python programs to get the images, extract the faces, and rename/resize the training data.

Transfer learning

I hoped that transfer learning would allow me to do more with less. Less training data, less computing power, and less time. In the context of image generation with GANs, transfer learning means taking a neural network (GAN) trained on one set of data, then switching to a different set of data and continuing to train briefly.

Robbie Barrat demonstrated this technique by training a GAN on landscape images, then briefly switching the training data to abstract paintings:

GAN trained on landscape paintings is then briefly trained on abstract paintings to produce…abstract landscapes?

In previous blog posts, I have attempted to compensate for having a relatively small training dataset by using an undertrained StyleGAN, creating a category in GANGogh, and interpolating between different parts of a trained GAN’s latent space (see Evolition here).

I used several implementations of DCGANs, such as Soumith Chintala’s DCGAN (here), Robbie Barrat’s implementation of Soumith’s GAN (here), and Taehoon Kim’s TensorFlow version (here). All three implement the option of resuming training from a checkpoint, which makes it possible to pause training and swap the data. From what I can see, this does not automatically produce usable results. After all, the Generator network is now trained around the features common to one dataset, and swapping in a completely different dataset is just as likely to confuse the network as enhance it. The other half of the GAN is the Discriminator network, and its job changes completely when the data is switched. Letting the training run for more than a couple of epochs (an epoch being one complete pass through all the training data) generally results in a garbled mess.
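To make the pattern concrete, here is an illustrative PyTorch sketch of “resume from a checkpoint, then train briefly on new data”. This is not the Torch/TensorFlow code from the repositories above: the architecture is the standard 64×64 DCGAN, and the checkpoint names and dataset folder are hypothetical.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
nz = 100  # size of the latent vector

netG = nn.Sequential(  # standard 64x64 DCGAN generator
    nn.ConvTranspose2d(nz, 512, 4, 1, 0, bias=False), nn.BatchNorm2d(512), nn.ReLU(True),
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False), nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False), nn.Tanh()).to(device)

netD = nn.Sequential(  # standard 64x64 DCGAN discriminator
    nn.Conv2d(3, 64, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, True),
    nn.Conv2d(64, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),
    nn.Conv2d(128, 256, 4, 2, 1, bias=False), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True),
    nn.Conv2d(256, 512, 4, 2, 1, bias=False), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, True),
    nn.Conv2d(512, 1, 4, 1, 0, bias=False), nn.Sigmoid(), nn.Flatten()).to(device)

# Resume from weights trained on the first dataset (e.g. portrait paintings).
netG.load_state_dict(torch.load("portraits_G.pt", map_location=device))
netD.load_state_dict(torch.load("portraits_D.pt", map_location=device))

# Swap in the second dataset (e.g. Renaissance faces) and train only briefly.
tfm = transforms.Compose([transforms.Resize(64), transforms.CenterCrop(64),
                          transforms.ToTensor(),
                          transforms.Normalize([0.5] * 3, [0.5] * 3)])
loader = DataLoader(datasets.ImageFolder("renaissance_faces", tfm),
                    batch_size=64, shuffle=True)

criterion = nn.BCELoss()
optG = torch.optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))
optD = torch.optim.Adam(netD.parameters(), lr=2e-4, betas=(0.5, 0.999))

for epoch in range(2):          # more than a couple of epochs tends to garble
    for real, _ in loader:
        real = real.to(device)
        b = real.size(0)
        noise = torch.randn(b, nz, 1, 1, device=device)
        fake = netG(noise)

        # discriminator step: real -> 1, fake -> 0
        optD.zero_grad()
        lossD = criterion(netD(real), torch.ones(b, 1, device=device)) + \
                criterion(netD(fake.detach()), torch.zeros(b, 1, device=device))
        lossD.backward()
        optD.step()

        # generator step: try to make the discriminator say 1
        optG.zero_grad()
        lossG = criterion(netD(fake), torch.ones(b, 1, device=device))
        lossG.backward()
        optG.step()

torch.save(netG.state_dict(), "faces_transfer_G.pt")

Even with only two epochs, how well this works depends heavily on how closely the two datasets resemble each other.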

I did manage to produce some interesting images, based on Robbie Barrat’s GAN trained first on portrait paintings, then switching the data for my Renaissance faces. One of the first things I noticed was the subdued palette coming in. My Renaissance faces dataset includes drawings as well as paintings – some of it consists of two-colour line drawings – and these sorts of features began appearing in the output.

This portrait takes on the quality of a silverpoint drawing after two epochs of training on the Renaissance faces
It’s sometimes hard to say whether it’s the additional training that caused an image to resemble the Renaissance faces
In this image (which has been enlarged), a large second face seems to be appearing just below centre.

This technique definitely shows promise; however, training on two quite different datasets doesn’t seem to be the way to get the most predictable results, as the image above indicates. With the two datasets I’ve used, the output lacks something in predictability, but it is helpful for generating unexpected results – e.g. a painted torso with a hand-drawn face, or a second face appearing out of the sitter’s clothing. For now, experimentation seems to be the way forward. I am making more and hope to get some insight into which datasets work well together.

A few finished pieces

This week I showed some of the pieces I’ve made so far to fellow students and faculty in the Digital Futures program at OCAD U.

A quick recap: my project asks how intelligent machines would portray their own creation if they didn’t have detailed records of it. What might they imagine if their only knowledge came from the faint impressions left in their neural networks of their early training?

To address this question, I used today’s state-of-the-art machine vision techniques, gathered large collections of categorized images, and trained neural networks (Generative Adversarial Networks, or GANs) to produce original images. I presented the images in a variety of media that could be appropriate to machines making images of their own distant past. Images of the pieces are shown below.

Gallery installation at OCAD U
Heaven Underground – Aluminized polymer, inkjet, acrylic.
This is an illuminated punch card, reminiscent of the illuminated manuscripts in A Canticle for Leibowitz. The alignment of hills indicates the possible location of hidden data stores, such as those curated by the Memory of Mankind project.
Evolition – digital animation
This short animation is a walk through the latent space of BigGAN, proposing a possible evolution of machine intelligence starting from some of earth’s earliest life forms.
Corporation (photo c/o Kristy Boyce). Aluminized polymer on gilded panel
This portrait is composited from a face and bust generated by separate GANs, on a gilded panel made using traditional techniques, with patterns inscribed in the gold by a computer-controlled machine.

Two of the pieces I showed (Evolition and Corporation) incorporate imagery produced by neural networks, and the third (Heaven Underground) is based on an image dataset used to train neural networks for scene recognition.

I’ve been describing this work with neural networks as a collaboration for two reasons. First, unlike other artists’ tools, digital or otherwise, the output of the neural network is surprising – it is not predictable using conventional computation techniques or algorithms. Second, there is a back-and-forth interaction between me and the neural network. I decide what images to feed the neural network; it “decides” what to make of that input. I decide how to combine the output of one network with the output of another, and feed it to a third network for the final results. This differs from a more basic workflow where the neural network makes images based on generic training data, and the human experimenter selects the “best” ones to show.

This collaboration allows me to better understand the capabilities and limitations of neural networks. They are great at putting something down on a blank canvas, where I am terrible at that. I know how to shape a story with levels of semantic depth, whereas machines generally can’t. Telling a story about machines in the distant future is enriched by my hands-on experience of how they excel and how they stumble, and it leads me to imagery I could not have developed on my own.

Today’s neural networks struggle to make compelling images on their own, and are not capable of telling a coherent story (unless it’s very short) without human help. Experimenters and artists often compensate for this limitation of their machine by selecting the most compelling finished image or text snippet from a thousand or more samples created by the network. I experimented with more complicated collaborations with image-making neural networks by combining and re-combining the outputs of several specialized neural networks to build up a story.

I don’t think “behind the scenes” or “how it’s made” material is necessary in an art show, but this was mostly a look at process. Some people had earlier expressed confusion about the various steps involved, along the lines of “which part did you do and which part did the machine do?” so I included a how it’s made chart for Corporation, below.

The bust was generated by Robbie Barrat’s Art DCGAN (link) and the faces were generated using Taehoon Kim’s TensorFlow DCGAN (link). The faces used to build the training dataset were gathered using David Sandberg’s implementation of Facenet (link), and I used letsenhance.io’s SRGAN to increase resolution (link).

Using machine learning to build datasets for machine learning

Using the same datasets over and over in training leads to neural networks that generate repetitive results. I am trying different approaches to control the kind of images my networks generate. Previously, I have tried under-training a CycleGAN to hybridize two images, and using face detection to build a dataset of faces from early Renaissance artworks.

This week I am using Google’s BigQuery service and its Cloud Vision API, along with the Metropolitan Museum of Art’s recently released online collection, to gather large numbers of high-quality images of particular kinds of artwork without manually sorting hundreds of thousands of images.

The online collection includes images of more than 200,000 items. The Met’s portal allows visitors to download any of these images if it is in the public domain. They have also made a catalogue of sorts available, but the half-million rows in the CSV are too much for Numbers or Google Sheets, and make Excel grind almost to a halt and crash pretty frequently. This amount of data calls for a database approach. Google makes the information available on its cloud-based BigQuery service, which is queried with SQL. It is clearly intended for businesses rather than individual users, but free accounts are available for a limited number of queries. As with many Google products, the technology is impressive, but the user experience can be baffling.

As an example of the baffling, the Met’s artwork dataset is divided among three tables – objects, images and vision_api_data. It is not initially clear how these tables relate. Images appears to include at least some paintings, but so does objects. Objects has detailed info on its items, while images has just six fields. It turns out that all three tables refer to the same items – the 200,000 artworks mentioned earlier – and they can be cross-referenced with SQL’s JOIN using their object_id. This is not documented anywhere, but left as an exercise for the curious. There is no obvious reason why this couldn’t be a single table; there are not that many fields in each of the three.

Another red herring lurks in the helpful comment #Standard SQL, found at the top of the sample query provided, which suggests that the syntax of the SQL being used is standard. It is not. In standard SQL, you can for example exclude items with no entry under the heading “period” by saying: 

where period != null

In BigQuery, this generates a syntax error. You need quotes around “null”. But how would anyone know that? I was capably assisted by Zach Peyton, who figured all this out. I connected with Zach through Codementor.io, which pairs students with experienced programmers for teaching and troubleshooting help at rates averaging $20/15min.

There is no substitute for learning through experimentation, but sometimes you just get stuck, and forums aren’t any help. It’s good to have an option to ask an expert at times like that. Based on the leg up that Zach gave me, I was able to assemble queries that would return, for example, URLs of images of objects made of rock, which I could then use to train a GAN:

SELECT i.object_id, o.period, i.original_image_url, v.description
FROM `bigquery-public-data.the_met.objects` o
JOIN `bigquery-public-data.the_met.images` i ON i.object_id = o.object_id
JOIN (
  SELECT label.description AS description, object_id
  FROM `bigquery-public-data.the_met.vision_api_data`, UNNEST(labelAnnotations) label
) v ON v.object_id = o.object_id
WHERE o.is_public_domain = TRUE
  AND v.description = 'rock'
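To turn a query like this into a training folder, the BigQuery Python client can run it and the returned image URLs can be downloaded directly. A rough sketch, assuming the google-cloud-bigquery and requests libraries and an illustrative output folder:

import pathlib
import requests
from google.cloud import bigquery

QUERY = """
SELECT i.object_id, i.original_image_url
FROM `bigquery-public-data.the_met.objects` o
JOIN `bigquery-public-data.the_met.images` i ON i.object_id = o.object_id
JOIN (SELECT object_id, label.description AS description
      FROM `bigquery-public-data.the_met.vision_api_data`,
           UNNEST(labelAnnotations) label) v ON v.object_id = o.object_id
WHERE o.is_public_domain = TRUE AND v.description = 'rock'
"""

client = bigquery.Client()
out = pathlib.Path("rock_images")
out.mkdir(exist_ok=True)

for row in client.query(QUERY).result():
    url = row["original_image_url"]
    if not url:
        continue
    img = requests.get(url, timeout=30)
    if img.ok:  # save each image under its object_id for later reference
        (out / f"{row['object_id']}.jpg").write_bytes(img.content)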

Making faces with GANGogh

I wanted to see if I could pre-train a network on various kinds of art before adding my Renaissance faces dataset. Maybe, like with generalized Style Transfer, the network would pick up some general insights about images from the larger dataset, and apply them to the images generated based on the much smaller Renaissance faces data.

Some of the most impressive results from GANs are those produced with enormous computing power on very large datasets. The computing power needed to produce new pre-trained networks such as BigGAN or Progressive Growing of GANs is mostly restricted to researchers at Google and Nvidia. The same dozen large datasets are used to train most of these networks, and no wonder: ImageNet’s millions of labels were added by hand by workers on Amazon’s Mechanical Turk. Even so, the resulting images are almost always readily distinguishable from the training data, and resolution rarely reaches or exceeds 512×512 pixels.

As AI artist Helena Sarin points out (https://www.artnome.com/news/2018/11/14/helena-sarin-why-bigger-isnt-always-better-with-gans-and-ai-art), when artists and researchers all train on the same data, the result is a repetition of the same aesthetic. She advocates for a “smallGAN” movement that embraces the limitations and bad behaviour of the sort of small (64×64 or 128×128 pixel) networks that can be trained on more affordable equipment, on datasets that can conceivably be curated by a single resourceful person. I am working to take this one step further and recruit object recognition algorithms – face detection in particular – to produce a dataset specific to my project.

I used computer vision techniques to crop 2800 faces from several thousand early Renaissance paintings. I was then able to train a GAN to produce new faces, which I plan to composite onto the characters in the mythological images I am creating. The generated faces are of reasonable quality, but I wondered if it would be possible to add variety and depth by incorporating more general information about old paintings into the network, while still outputting faces. I reasoned that this approach was part of what enabled Google researchers to perform Style Transfer using a single example of the desired style, using a network previously trained on many different styles. Could the same technique help my network learn from relatively few Renaissance faces?

I found a blog post and some code (https://github.com/rkjones4/GANGogh/tree/master/misc) that used this same approach with GANs. Jones used the entire wikiart dataset of 80,000 paintings, sketches and other art objects, sorted into 14 categories such as “landscape”, “abstract”, “portrait” and so on. The code was not intended for widespread use, and I struggled to get it to run. I recruited help from Nilav Ghosh through Codementor, who helped me figure out how to configure the version of TensorFlow required to run this two-year-old code. I couldn’t figure out how to run the data pre-processor – it seemed to be Windows-specific – so ultimately I wrote my own in Python, and pretty soon I was able to replicate his results.
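My replacement pre-processor amounted to resizing each category’s images and writing them into per-category folders. A rough sketch of that idea, assuming GANGogh just needs small, uniformly sized images grouped by category – the exact layout it expects, and these paths, are assumptions:

from pathlib import Path
from PIL import Image

SRC = Path("wikiart")           # one sub-folder per category, plus my faces
DST = Path("gangogh_data")
SIZE = 64

for category_dir in sorted(p for p in SRC.iterdir() if p.is_dir()):
    out_dir = DST / category_dir.name
    out_dir.mkdir(parents=True, exist_ok=True)
    for n, f in enumerate(sorted(category_dir.glob("*.jpg"))):
        try:
            img = Image.open(f).convert("RGB")
        except OSError:
            continue                     # skip unreadable files
        img.resize((SIZE, SIZE), Image.LANCZOS).save(out_dir / f"{n:06d}.png")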

It was a triumph for me to even get this code running. I have deliberately skipped learning the frameworks for programming neural networks from the ground up in the traditional way, because I read and heard that this would take several months – more time than I had. So I have been picking things up as I go, and digging into details only when things break. This kind of challenge is everywhere when running someone else’s code that was never intended as anything other than an experiment or prototype. That is not a criticism of the developers, who very often have gone to a lot of trouble to document what they’ve done and how to replicate their results. But altering anything – changing datasets or using more recent versions of libraries and languages – leads to problems. It is not uncommon to find “magic numbers” in various places in the code, leading to questions like “why is this variable multiplied by 844 on line 127?” Chances are that no one knows; if the author hasn’t looked at it in a year or more, even they can’t remember. I have had to abandon my attempts to get several promising and widely cited GANs to work after days or weeks of unsuccessful troubleshooting. I have the feeling that someone with greater programming experience in deep learning might be able to overcome these problems, but I don’t know. I have often encountered obstacles and been unable to find solutions from anyone, in person or online.

64 x 64 pixel samples of images generated by GANGogh in the category “Symbolic Painting”

In order to produce faces informed by the much larger Wikiart dataset, I added my Renaissance face images as an additional category alongside the 14 original ones, resized the input and output matrices and let it train most of the night. 

The resulting images are small – attempts to scale this GAN beyond 64×64 were not successful. At each training epoch (one complete pass through all the training data), the GAN produces a grid of 90 sample images for each of the 15 categories. This lets us compare images that differ only by the category we ask for: we can see essentially the same image rendered as a landscape, a portrait, a religious painting or a Renaissance face.

64 x 64 pixel images created by GANGogh in the category “faces”, trained on Wikiart and my Renaissance faces dataset

The first few times I ran this (with slightly different training parameters), it stopped prematurely due to an error. It is interesting to see the colourful and evocative face images emerge from noise as the training proceeds. With the present approach, however, the faces are probably not an improvement over those produced earlier using a DCGAN without the entire wikiart dataset – the colour and shape variation is probably not conducive to compositing the faces into the scenes I am trying to create. I am continuing to experiment with this approach in the hope that further training will produce faces more suitable for my needs.