What do machines see?

Earlier this week I adapted a computer vision technique intended for photos to isolate the faces in 2500 early Renaissance paintings from wikiart.org. I am hoping to create new faces from these using a GAN, in order to represent how machines “see” their human creators. For purposes of the image I’m making, the machines have a penchant for traditional European art.

My code is forked from Jeevesh Narang's code here. It uses OpenCV's Haar-cascade face detector, which is discussed in detail here (section 1.10.1) and is based on Paul Viola and Michael Jones's 2001 paper, "Rapid Object Detection using a Boosted Cascade of Simple Features". It's a machine learning technique that does not depend on neural networks.
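The fork itself isn't reproduced here, but a minimal sketch of the OpenCV detection step looks something like this (file names and the detection parameters are placeholders, not the exact values from the forked code):

```python
# Minimal sketch of OpenCV Haar-cascade face detection.
# Paths and parameters are illustrative, not the forked code's values.
import cv2

# Load the pre-trained frontal-face cascade that ships with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("painting.jpg")               # hypothetical input file
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # cascades run on grayscale

# Returns a list of (x, y, w, h) bounding boxes for candidate faces.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for i, (x, y, w, h) in enumerate(faces):
    cv2.imwrite(f"face_{i}.jpg", img[y:y + h, x:x + w])  # crop each hit
```

Loosening `minNeighbors` finds more faces at the cost of more false positives, which is exactly the trade-off that shows up below.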

Running it on roughly 2800 early Renaissance paintings produced results that were…interesting. It found a lot of faces, but also produced about 1250 false positives in 2790 paintings (based on hand-filtering the results). Here's what a selection of its "faces" looked like:

Only about half of the hits contain actual faces, and some are cropped strangely.

Some of these are clearly faces, some are clearly not, and some are understandable mistakes. I particularly liked these "faces":

Never noticed how much a horse's rump looks like a face. Also I think it spotted the Shroud of Turin.

It didn't take too long to filter out the false positives by hand in a small dataset like this one, but spot checking showed that the algorithm also missed a lot of faces. It makes sense that an algorithm intended for photographs would suffer when used on paintings. Rather than tweak parameters, I went looking for a better algorithm.

More research turned up a more modern approach (github) for face detection based on – you guessed it – machine learning. It boasts greater than 99% accuracy and is trained on hundreds of thousands to millions of images, depending on settings.
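The post doesn't reproduce the library's name, so the following is an assumption: if it resembles the popular `face_recognition` package (built on dlib, and advertising a similar accuracy figure), the detection step is only a few lines:

```python
# Hedged sketch, assuming a library like the `face_recognition` package.
# The file path is a placeholder.
import face_recognition
from PIL import Image

image = face_recognition.load_image_file("painting.jpg")

# model="cnn" is the slower, more accurate deep-learning detector;
# the default, "hog", is a faster classical method.
boxes = face_recognition.face_locations(image, model="cnn")

# Boxes come back as (top, right, bottom, left) tuples.
for i, (top, right, bottom, left) in enumerate(boxes):
    Image.fromarray(image[top:bottom, left:right]).save(f"face_{i}.jpg")
```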

This question of what machines see is particularly poignant this week, as the social network Tumblr has announced that it will soon ban adult content from its site. Bloggers like Janelle Shane have been posting (on Twitter) some of the images that were flagged as inappropriate:

From Twitter – Janelle Shane's Tumblr post on the inappropriate dual nature of light

Clearly some algorithms are better than others at distinguishing and categorizing images. Social networks all seem to employ growing teams of people to moderate content. It's interesting to note that the Discriminator half of a GAN is quite a bit better at its job than the Generator half is. This makes sense: it's easier to detect content than to create convincing content. But the Discriminator is the half that we throw away once the network is trained.
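As a toy illustration of that asymmetry, here is roughly what the two halves look like in PyTorch. The architectures are invented for the sketch; the point is the last two lines:

```python
# Toy sketch of a GAN's two halves (invented architectures, no training
# loop). The point: after training, only the generator is usually kept.
import torch
import torch.nn as nn

latent_dim = 100

generator = nn.Sequential(            # noise -> fake image (flattened 64x64)
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 64 * 64), nn.Tanh())

discriminator = nn.Sequential(        # image -> probability it is "real"
    nn.Linear(64 * 64, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid())

# ... adversarial training loop would go here ...

torch.save(generator.state_dict(), "generator.pt")
# The discriminator's weights are typically discarded, even though it has
# learned a useful "does this belong to the training category?" judgment.
```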

This got me thinking about the challenges I'm facing in gathering a large enough dataset to create new images. In a 2017 blog post about their artmaking GAN, Kenny Jones and Derrick Bonafilia mentioned that its Discriminator network was able to correctly categorize artworks 90% of the time, based on the categories in wikiart.org.

Last week I got 8500 images from NASA and am considering filtering them by hand to train a GAN. This approach is taken by others who curate their training data to achieve a particular kind of output from a GAN, or even create all the images by hand. Yikes!

Looking at my NASA dataset, I want to filter out all the images that are not starry scenes with a nebula, galaxy, or other heavenly body in them. I considered writing a script to look at the histogram and reject anything that isn't black around the edges. That would filter out a lot of the terrestrial landscapes, diagrams, and other scenes that would only confuse my nebula GAN. But technically the perfect tool for filtering these images is…a neural network. Wouldn't that be the perfect tool to facilitate generating more and better datasets?
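A minimal sketch of that black-around-the-edges heuristic, with threshold values that are pure guesses to be tuned by eye:

```python
# Sketch of the proposed heuristic: keep only images whose border pixels
# are mostly near-black (likely starfields, nebulae, galaxies).
# All threshold values below are guesses to be tuned against real images.
import numpy as np
from PIL import Image

def looks_like_space(path, border=10, dark_cutoff=30, dark_fraction=0.9):
    gray = np.asarray(Image.open(path).convert("L"))
    # Gather the pixels in a `border`-wide frame around the image
    # (corners get counted twice, which is fine for a rough filter).
    edges = np.concatenate([
        gray[:border].ravel(), gray[-border:].ravel(),
        gray[:, :border].ravel(), gray[:, -border:].ravel()])
    # Accept if, say, 90% of edge pixels are darker than the cutoff.
    return (edges < dark_cutoff).mean() >= dark_fraction
```

Rejected images could land in a holding folder for a quick eyeball pass rather than being deleted outright.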

For training neural networks on specific data that isn't already available in a large dataset, it seems we need an easy-to-use network that can help gather images in a particular category. As the power and variety of GANs increase, we may find ourselves increasingly limited by our training datasets, the same handful of which get used again and again. But with a relatively small number of sample images, we may be able to go out onto the web and find "more images like these" using essentially the Discriminator networks that are a discarded by-product of the GANs we're training all the time.
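It wouldn't have to be a literal GAN Discriminator. One cheap approximation of the idea, sketched below under assumed names and model choices, is to use a pretrained image network as a feature extractor and score candidate images by their similarity to a few hand-picked examples:

```python
# Hedged sketch: a pretrained network as a stand-in "discriminator" for
# dataset curation. Model choice and file names are assumptions.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Identity()   # strip the classifier; keep the embedding
model.eval()

prep = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

def embed(path):
    with torch.no_grad():
        return model(prep(Image.open(path).convert("RGB")).unsqueeze(0))

# Average embedding of a handful of images I know I want more of.
target = torch.stack([embed(p) for p in ["nebula1.jpg", "nebula2.jpg"]]).mean(0)

def score(path):
    # Cosine similarity to the target: higher means "more like these".
    return F.cosine_similarity(embed(path), target).item()
```

Ranking a scraped candidate pool by `score` and keeping the top slice would turn a few seed images into a much larger, roughly on-topic training set, with only the borderline cases left to check by hand.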