Computer Vision: finding old faces and making new ones

What would future artificially intelligent machines think humans looked like if they’d never seen one? With only scattered fragments of data about our time, they might try to reconstruct our history from scraps – after all, history and archaeology are always reconstructions from fragmentary evidence.

In this hypothetical future, let’s say that digital and print images of humans do not survive, but paintings – whose pigments last for centuries – have been safeguarded. Would traces of the neural networks that allow present day computers to recognize faces survive through centuries of machine evolution, the way ancient words occasionally appear in modern language?

Over the weekend, I experimented with a more sophisticated face detection algorithm to increase the size of my dataset of faces culled from early Renaissance artworks. After discussing the results of my face detection effort using Haar Cascades, Tushar Gupta suggested I try Facenet (link). Although both of these approaches work well with photos and webcams, I am not expecting perfection from either of them when confronted with pictures of artworks, since the neural networks underlying the detection were trained on photographs.

Facenet is more complex and harder to use, and it requires a CUDA-enabled GPU, but online posts suggested its performance was much better than that of Haar Cascades. I developed some Python code based on cjekel’s code, which in turn builds on David Sandberg’s implementation of the Facenet paper.
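For context, Sandberg’s pipeline detects faces with an MTCNN, which returns a bounding box and a confidence score for each candidate face. A minimal sketch of the kind of post-processing step involved – the threshold and the detections below are illustrative, not values from my script:

```python
def filter_detections(detections, threshold=0.95):
    """Keep (box, score) pairs whose confidence meets the threshold.

    detections: list of ((x1, y1, x2, y2), score) tuples, as a
    detector like MTCNN might return them.
    """
    return [(box, score) for box, score in detections if score >= threshold]

# Two made-up detections: one confident face, one dubious blob.
detections = [((10, 10, 50, 60), 0.99), ((5, 5, 20, 20), 0.40)]
print(filter_detections(detections))  # keeps only the 0.99 detection
```

Raising or lowering the threshold trades missed faces against false positives, which is exactly the trade-off that shows up in the results below.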

This had to be run on a remote machine because of the computationally intensive neural network it uses and the need for a CUDA-capable GPU. That means a lot of tricky file manipulation over ssh with a rented cloud computer from AWS. Although I’m now getting used to this kind of remote work, it’s definitely inefficient. To perform even a simple operation like copying an image from this machine, I need to compose a command-line statement something like:

scp -i /Users/chrisluginbuhl/Dropbox/Digital\ Futures/Thesis/Python/AWS-CL3.pem /Users/chrisluginbuhl/machine_learning_local/wikiart/Early_Renaissance/detected 
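One way to shorten these commands is an entry in ~/.ssh/config; the host alias, address, and key path below are placeholders rather than my actual instance details:

```
Host aws-gpu
    HostName ec2-00-000-000-00.compute-1.amazonaws.com
    User ubuntu
    IdentityFile ~/.ssh/AWS-CL3.pem
```

With that in place, both ssh and scp need only the alias – the copy above becomes scp aws-gpu:<remote path> <local path>, with no key flag or long hostname to retype.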

On the other hand, it can run code that my machine can’t, and it’s pretty fast at almost anything. I am paying about $3/hr for this machine (a p3.2xlarge), whose Nvidia V100 graphics card alone retails for over US$11,000!
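A quick rent-versus-buy sanity check on those numbers:

```python
# Hours of rental that equal the GPU's retail price, using the
# figures above ($3/hr for the instance, ~US$11,000 for the V100).
HOURLY_RATE = 3.00
CARD_PRICE = 11_000.00

break_even_hours = CARD_PRICE / HOURLY_RATE
print(f"{break_even_hours:,.0f} hours, or about {break_even_hours / 24:,.0f} days of continuous use")
```

For occasional overnight experiments, renting wins by a wide margin.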

The previous face-detection program I ran on this dataset found 1059 faces and 1250 false positives among an unknown number of actual faces in 2790 pictures. This algorithm did much better: 2758 faces and 111 false positives in the same 2790 pictures, and it took only slightly longer to run. The cropping results were significantly more consistent as well:
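Since the true number of faces in the pictures is unknown, recall can’t be computed, but precision – the fraction of detections that are real faces – can be, from the counts above:

```python
def precision(true_positives, false_positives):
    """Fraction of detections that were actually faces."""
    return true_positives / (true_positives + false_positives)

print(f"Haar Cascades: {precision(1059, 1250):.0%}")  # 46%
print(f"Facenet:       {precision(2758, 111):.0%}")   # 96%
```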

Despite being trained on photos of people, Facenet finds faces in these photos of paintings and sculptures very well.

The false positives are less entertaining this time, but I removed a few by hand, just to help the network be clear on what’s a human and what’s a lion:

Performance is a little too good if you want just human faces that will help the GAN rather than confuse it.

It definitely missed quite a few faces as well. I modified the script to show the detected faces in their original context, and ran it on a handful of images:

Red boxes indicate detected faces. These four images show some of the failures and successes.
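To score red boxes like these against hand-labelled ground truth (something I haven’t done systematically), the standard measure is intersection-over-union; a detection is commonly counted as correct when IoU ≥ 0.5. A minimal sketch:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.14: would not count as a match
```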

Generating new Renaissance people

My computer is capable of running a basic DCGAN, and I trained one with this collection of faces. I needed to do some fine-tuning of hyperparameters to get it to train, and it’s discouraging to run an experiment all night only to discover it failed to train. I quickly discovered just how much faster my $3/hr AWS machine with a fast GPU is for training a GAN.
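The sweep itself is simple bookkeeping; a sketch of how an overnight grid of runs can be enumerated (the values are illustrative, not the settings I ended up with):

```python
from itertools import product

# Hypothetical DCGAN hyperparameter grid for an overnight sweep.
learning_rates = [1e-4, 2e-4, 5e-4]
batch_sizes = [64, 128]

runs = list(product(learning_rates, batch_sizes))
for lr, batch in runs:
    print(f"train(lr={lr}, batch_size={batch})")  # launch one experiment per combination
print(f"{len(runs)} experiments queued")  # 6
```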

One pass over about 4,000 images every 10 seconds means a single experiment runs all night on my fast video editing laptop… (sped up 2.5x above)

…what a difference 5,120 CUDA cores make

Towards the end of the experiment, some interesting-looking Renaissance people began gazing out of the data at me:

After many days spent working towards this, I was very happy to make the acquaintance of these strangers (some stranger than others). At the moment I am still playing with hyperparameters and troubleshooting an error that is preventing me from making faces larger than 64×64 pixels.
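For what it’s worth, the 64×64 ceiling is partly architectural: a DCGAN generator typically starts from a small feature map (4×4 in the original paper) and doubles the resolution at each transposed-convolution layer, so larger faces need a deeper generator, not just a settings change. The layer count works out as:

```python
import math

def upsampling_layers(start=4, target=64):
    """Number of 2x upsampling layers to grow a start x start feature map to target x target."""
    return int(math.log2(target // start))

print(upsampling_layers(target=64))   # 4
print(upsampling_layers(target=128))  # 5
```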

One of the more haunting faces captured my imagination, and I wanted to see if Huxley the robot can make a decent mosaic of it from coloured tiles. There may be something appropriate about a young robot painstakingly envisioning what a human from the Renaissance might have looked like, while I continue toiling to create a better machine for the future out of lengthy command line arguments.