Finding faces, colours, and materials: more machine vision tools for building training datasets

In a recent post, I worked on improving the raw material my GANs pass along for me to use in making composite portraits. I noticed that the Renaissance faces I’m using for training do a lot more head tilting than the celebrity headshots that make up more typical GAN training data. I wondered if machine vision could rotate the training faces upright, making it easier for the neural networks in the GAN to learn their patterns.

In object detection with neural networks, the training data can be augmented by including copies of the same images with various crops and rotations, bulking up the dataset and giving the network the opportunity to learn about objects appearing in different parts of the frame in different orientations. I haven’t heard of this technique being used to train GANs, but I thought it would be worth trying a couple of ways – first removing rotation so all the faces are upright, then trying them at various rotations to see if the larger dataset helps in training.
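To make the straightening idea concrete, here is a rough sketch of how I might do it with Pillow once I have a roll-angle estimate per face. The file paths, the angle source, and the sign convention are assumptions for illustration, not code from my actual pipeline:

```python
from PIL import Image

def straighten_face(path, roll_angle_degrees, out_path):
    """Rotate a face crop back towards vertical using an estimated
    roll angle (e.g. from a face detector). Depending on the detector's
    sign convention, the sign of the angle may need flipping."""
    face = Image.open(path)
    # PIL rotates counter-clockwise for positive angles; expand=True keeps
    # the rotated corners instead of cropping them off.
    upright = face.rotate(-roll_angle_degrees, resample=Image.BICUBIC, expand=True)
    upright.save(out_path)

def augment_with_rotations(path, angles=(-15, -5, 5, 15)):
    """The opposite experiment: emit several rotated copies of one face
    to bulk up the training set."""
    face = Image.open(path)
    return [face.rotate(a, resample=Image.BICUBIC, expand=True) for a in angles]
```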

Rather than cooking up my own, I turned to Google’s Cloud Vision API, which has been used to detect various image properties in the Metropolitan Museum of Art’s online collection (I wrote about this here). Not only are the images categorized by period, medium, genre and style, but the database hosted on Google’s BigQuery also contains a little bit of data on what Cloud Vision thinks is in each image. If you want images where the dominant colour is red, you’re in luck. One of the more experimental features is facial expression recognition, which is definitely something computer vision struggles with at the moment.
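As a taste of what that looks like, here is roughly the kind of query I mean, run through the BigQuery Python client. The table and column names are my best guess at the schema of the public Met vision dataset, so treat them as illustrative rather than definitive:

```python
from google.cloud import bigquery

client = bigquery.Client()

# NOTE: the table and column names below are my best guess at the public
# Met vision dataset on BigQuery -- check the actual schema before relying
# on this. The query pulls objects whose dominant colour leans strongly red.
query = """
SELECT object_id, color.color.red, color.color.green, color.color.blue
FROM `bigquery-public-data.the_met.vision_api_data`,
     UNNEST(imagePropertiesAnnotation.dominantColors.colors) AS color
WHERE color.color.red > 200
  AND color.color.green < 100
  AND color.color.blue < 100
ORDER BY color.score DESC
LIMIT 100
"""

for row in client.query(query).result():
    print(row.object_id)
```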

It was possible to build a small dataset of images with a “sad” facial expression, but it contained many questionable entries, some of which were not faces at all, such as the profile of a piece of decorative moulding included in the collection below:

Images containing a high probability of “sorrow” facial expression according to Google Cloud Vision

The AI struggled even more with “surprise”, as it lacks the semantic understanding to tell when people are surprised rather than, say, singing:

Not sure any of these people are really surprised…

As interesting as this was, my first pass suggested this feature is not yet ready to build large datasets for me; I hope to return to it in the future. In the meantime, I turned my attention to the API’s face detection. My implementation of Facenet is trained to detect faces in photographs, so it misses quite a lot in paintings (see blog post). I wondered if Google’s implementation might be better than mine, and so far it looks promising. It also returns pan, tilt, and roll estimates for faces. I tried overlaying a few of its measurements on one of the faces, and the results were encouraging:

Crop showing accurate face detection with roll angle and even a decent guess at emotion

This was a randomly selected face, and only a single data point, but I noticed that the API rated joy as UNLIKELY rather than VERY UNLIKELY. This is promising enough that I will write a scraper and other scripts to gather these faces.
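Here is a minimal sketch of the kind of script I have in mind, using the Cloud Vision Python client to detect faces in a painting, read out their roll angles and joy likelihoods, and crop them for later straightening. The file path is a placeholder, and the calls assume version 2+ of the google-cloud-vision package:

```python
import io
from google.cloud import vision
from PIL import Image

client = vision.ImageAnnotatorClient()

def detect_and_crop_faces(path):
    """Run Cloud Vision face detection on one image and yield
    (face_crop, roll_angle, joy_likelihood) tuples."""
    with io.open(path, "rb") as f:
        content = f.read()
    response = client.face_detection(image=vision.Image(content=content))

    painting = Image.open(path)
    for face in response.face_annotations:
        # bounding_poly is the outer box around the detected face
        xs = [v.x for v in face.bounding_poly.vertices]
        ys = [v.y for v in face.bounding_poly.vertices]
        crop = painting.crop((min(xs), min(ys), max(xs), max(ys)))
        yield crop, face.roll_angle, face.joy_likelihood
```

The roll angle from each face could then feed straight into the straightening function sketched earlier.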

It was also fun to look at all the images of objects in the collection that were made of rock. I’m going to see if I can make some interesting rock objects with a GAN based on these, which will help my images grow from simple portraits into more complex scenes.