A Robot and A Software Toolkit for Making Images with GANs

Introducing Huxley

This week I had the pleasure of working with Huxley, an ST Robotics ST-12 arm in Phase Lab. Huxley has a 1m reach and can write his name with a pen. We did not shake hands, but neither did he punch me in the nose. So far so good.

I am planning to use a robot to take some of the designs produced by neural networks and trace them onto gold leaf, as an important step in taking the digital art back into the physical world, and the realm of traditional art materials.

Huxley, the ST-12 6-axis robot

The robot’s software runs only on Windows 7 or earlier, which presented some issues: while Windows updates were still being released, they would sometimes break the drivers, and Huxley hasn’t been able to connect to a PC for a while.

The driver box connects to a PC via an RS-232 serial port, so I spent a couple of days thrashing around with cables and drivers to make the connection. Success was achieved around 4:58pm on Friday! Michael Page, who runs Phase Lab, has done some interesting work with this robot in the past. He supplied three USB-to-serial adaptors of unknown quality and two Windows 7 machines. I supplied one more, plus a virtual Windows 10 machine, and with a voltmeter found a combination that worked.
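For anyone trying something similar, a quick way to sanity-check the adaptors before involving the robot at all is a few lines of Python with the pyserial library. This is only a sketch of my own troubleshooting, not part of the robot’s software; the port name and baud rate below are placeholders that would need to match the controller’s manual.

import serial                      # pip install pyserial
from serial.tools import list_ports

# List every serial port the operating system can see, so a dead
# adaptor or missing driver is obvious straight away.
for port in list_ports.comports():
    print(port.device, "-", port.description)

# Try to open one candidate port. "COM3" and 9600 baud are placeholders;
# the real values come from Device Manager and the controller's manual.
try:
    with serial.Serial("COM3", baudrate=9600, timeout=2) as conn:
        print("Opened", conn.name)
except serial.SerialException as err:
    print("Could not open port:", err)

Getting this far at least proves the cable, adaptor and driver are working, even before the robot answers.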

This week Michael is going to get me started and set me up with the code that runs the robot.


Off-the-Shelf vs. Roll-Your-Own GANs

GANs (Generative Adversarial Networks) are neural networks that can be used to make pictures: a “generator” network invents images while a “discriminator” network judges them against real examples. By using thousands of sample photos to “train” the network, the GAN becomes “tuned” to a particular style of image. GANs emerged in 2014, so they are a recent invention. They are not user friendly or easy to train, and they are limited to low resolutions. But the results can be quite astonishing, whether in their ability to nail an image in a particular style or, just as often, in the ways they get images wrong.

These computationally intensive systems are a great complement to human artists – they are quick where we are slow. But for all their inventiveness, they display no common sense, whereas people carry around in their heads a lifetime’s worth of knowledge about people, objects, vision and how these interact in the world.

Every week, new papers are released to the machine learning community featuring a new variation on GANs. Often these are simply adjustments to hyperparameters, but they are worthy of publication because they impart some special ability – for instance, the network I mentioned last week that can create original imagery such as celebrities who do not exist.

Another example is the CycleGAN paper, which uses two GANs for Style Transfer. Style Transfer is sometimes done with another Deep Learning technique, described here (https://arxiv.org/abs/1508.06576) and made famous by websites like deepart.io and apps like Prisma.

Prisma uses pre-trained neural networks to transfer styles to photos based on various presets (image: mspoweruser)

At the moment, the tools are mostly pre-packaged with a few previously defined styles, or open-ended but extremely user-unfriendly. One exception is ml5.js, a new JavaScript library that packages the power of Deep Learning for convenient use in a web browser. As with the techniques above, however, only pre-trained neural networks are available at the moment.

Part of the reason for this is that training neural networks is computationally intensive, and it involves significant trial and error to arrive at a network that avoids mode collapse (where the generator keeps producing a handful of near-identical images) and strikes the right balance between too abstract and too derivative when making images.
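To make that trial and error a little more concrete, here is a bare-bones sketch of the adversarial training step in PyTorch (the library I say more about below). The network sizes, learning rates and data handling are stand-ins, not the model I’m actually training; a real DCGAN uses convolutional layers and much more careful tuning.

import torch
import torch.nn as nn

latent_dim, image_dim = 100, 64 * 64 * 3   # placeholder sizes

# Toy fully-connected generator and discriminator; a real DCGAN
# replaces these with convolutional networks.
generator = nn.Sequential(
    nn.Linear(latent_dim, 512), nn.ReLU(),
    nn.Linear(512, image_dim), nn.Tanh())
discriminator = nn.Sequential(
    nn.Linear(image_dim, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1), nn.Sigmoid())

criterion = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images):
    # real_images: a batch of training photos flattened to (batch, image_dim)
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator step: learn to tell real images from generated ones.
    fakes = generator(torch.randn(batch, latent_dim))
    d_loss = (criterion(discriminator(real_images), real_labels) +
              criterion(discriminator(fakes.detach()), fake_labels))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: learn to fool the discriminator.
    g_loss = criterion(discriminator(fakes), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

The “balance” I mentioned is visible right here: if the discriminator wins too easily the generator stops learning, and if the generator finds one image the discriminator likes, it may produce little else.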

It’s also challenging to find enough images to train a GAN – typically thousands are required. Datasets like CelebA and ImageNet are often used because they provide images of well-defined subject matter in clear categories.

I’ve been taking on the challenge of making GANs from scratch. Last week I ran a few experiments on a local machine with pre-packaged training images, but this week I ran more complex models on a remote machine. I also wrote a script for getting more training images from the web.

Getting good training data is clearly half the challenge of producing good output. The Python script I wrote (here) scans through the last 23 years of NASA’s Astronomy Picture of the Day and downloads the images to a local machine. As I learn more about Python, I continue to be impressed with what can be done in fewer than 45 lines of code. (I wish I could say that I’m falling in love with HTML at the same time, but alas not yet.) This approach overcame the API’s usage limits and some of the limitations of other web-scraping programs I previously used to gather the images for training last week’s DCGAN. I got 8500 images this way in a few hours.

I am hoping to make nebula-inspired images for the series of machine-made mythological images. I may refine this approach to filter for keywords in the description, such as “nebula”. Unlike other image datasets I’ve seen, most of the images in this collection are gorgeous. I can’t wait to see what a nebula-trained GAN will come up with.
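I won’t paste the whole script here, but a stripped-down sketch of the idea, including the keyword filter I just described, looks something like this. The URL pattern, output folder and HTML handling are my assumptions about the APOD pages and would need checking against the real site; it needs the requests and beautifulsoup4 packages.

import datetime
import os
import requests
from bs4 import BeautifulSoup

BASE = "https://apod.nasa.gov/apod/"
OUT = "apod_images"          # placeholder output folder
os.makedirs(OUT, exist_ok=True)

def fetch_day(day, keyword=None):
    # Daily pages follow the pattern apYYMMDD.html (e.g. ap180416.html).
    page = BASE + day.strftime("ap%y%m%d.html")
    resp = requests.get(page, timeout=30)
    if resp.status_code != 200:
        return
    soup = BeautifulSoup(resp.text, "html.parser")
    # Optional keyword filter on the page text (e.g. "nebula").
    if keyword and keyword.lower() not in soup.get_text().lower():
        return
    img = soup.find("img")
    if img and img.get("src"):
        url = BASE + img["src"]
        data = requests.get(url, timeout=60).content
        with open(os.path.join(OUT, os.path.basename(url)), "wb") as f:
            f.write(data)

day = datetime.date(1995, 6, 16)   # APOD's first posting
while day <= datetime.date.today():
    fetch_day(day, keyword="nebula")
    day += datetime.timedelta(days=1)

Walking the archive page by page like this is slow but steady, which is how the script sidesteps the API’s usage limits.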

Above: Nebula image from the Hubble Space Telescope courtesy of NASA ( source )

Most of the code used to produce images in academia is made publicly available. It is rarely packaged to be user-friendly, though, and it may be written in any of the variety of languages and libraries used for Machine Learning. Fortunately, the ML community often adapts and improves the algorithms in popular papers, making them available in various computer languages. I am focusing on Python and PyTorch, because they are popular, powerful and concise. Torch (a Lua framework backed by Facebook) and TensorFlow (by Google) are lower-level approaches to neural networks and result in longer code that can be difficult to debug.

While small neural networks can be run on a CPU alone, larger networks and larger datasets only work with Nvidia GPUs, which run CUDA to harness the parallel processing power of graphics cards. A network that trains in a day on a GPU could take a year to train on a CPU. The system I have been renting from Amazon Web Services (AWS), for about $3/hr, has 8 Tesla V100 GPUs, each of which is about 20x faster than the $1200 graphics cards found in high-end gaming desktops (comparison here). I find it hard to imagine the kind of computation I’ve been doing that takes all day on a GPU that performs 8 TFLOPS – that’s 8 million million double-precision floating point operations per second.
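In PyTorch, at least, pointing the computation at the GPU is nearly a one-line change. A minimal illustration, with a toy model standing in for the real network:

import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(100, 64 * 64 * 3).to(device)   # toy stand-in for a generator
noise = torch.randn(16, 100, device=device)            # created directly on the device
output = model(noise)                                  # runs on the GPU if one is present
print(device, output.shape)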

Running a machine on AWS is challenging for the non-expert, since everything has to be done through an SSH tunnel. It’s like assembling a clock in a locked room by reaching a screwdriver in through the keyhole. I was surprised to notice how much I’ve come to depend on hearing my computer’s fan to know when it’s processing something heavy. Without this physical symptom, I struggled to know whether anything was happening at all. Today, after several hours of finding no results in the remote machine’s results folder, I pulled the plug. I will try again on Thursday with a web-based monitoring utility that will give me results from intermediate stages of the computation.
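One low-tech trick I plan to try alongside that utility is having the training script itself write intermediate samples and losses to the results folder every few hundred steps, so that a simple ls over SSH shows whether anything is happening. A sketch, assuming the toy generator from the earlier snippet and the torchvision package:

import os
import torch
from torchvision.utils import save_image

def log_progress(step, generator, latent_dim, d_loss, g_loss, out_dir="results"):
    # Every 200 steps, write a grid of samples and append the losses,
    # so the remote results folder shows signs of life during training.
    if step % 200 != 0:
        return
    os.makedirs(out_dir, exist_ok=True)
    with torch.no_grad():
        samples = generator(torch.randn(16, latent_dim))
    # The reshape matches the toy fully-connected generator sketched earlier;
    # a convolutional generator would already return image-shaped tensors.
    save_image(samples.view(-1, 3, 64, 64),
               os.path.join(out_dir, f"step_{step:06d}.png"), normalize=True)
    with open(os.path.join(out_dir, "losses.txt"), "a") as f:
        f.write(f"{step}\t{d_loss:.4f}\t{g_loss:.4f}\n")

It’s not as satisfying as hearing a fan spin up, but it should at least tell me whether to keep the machine running or pull the plug again.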