A map of my images so far

I have several aims in making the images that are a major part of this project: to explore creative collaborations with AI, to create a collection of images depicting distant-future AI creation myths, and to think through some of the big questions about AI & robotics by making things about AI & robotics, with AI & robotics.

The robots and I started by making a few different myth images using a few different techniques. The results of each phase of image-making influence the next phase, and experience with one kind of experimentation leads to ideas for a different kind of experimentation. I have tried to capture this graphically below. Each postage-stamp-sized image is accompanied by the medium (blue text) and technique (red text) it was produced with. Some of the images have titles, and a few are just test images. I numbered them so that I can refer to them easily. The arrows show how one image inspired or influenced another; often they also indicate that some kind of technique or technological development was required to proceed from one to the other, as with printing on reflective surfaces, or using facial recognition to build a large dataset to train a GAN.

The topmost group of these images are composites, the next group down deal with gilding and style transfer, and the bottom-most images are produced directly by GANs. I haven’t added much detail about the robotic experiments yet.

 

Computer Vision finding old faces and making new ones

What would future artificially intelligent machines think humans looked like if they’d never seen one? With only scattered fragments of data about our time, they might try to reconstruct our history from scraps – after all, history and archaeology are always reconstructions from fragmentary evidence.

In this hypothetical future, let’s say that digital and print images of humans do not survive, but paintings – whose pigments last for centuries – have been safeguarded. Would traces of the neural networks that allow present day computers to recognize faces survive through centuries of machine evolution, the way ancient words occasionally appear in modern language?

Over the weekend, I experimented with a more sophisticated face detection algorithm to increase the size of my dataset of faces culled from early Renaissance artworks. After discussing the results of my face detection effort using Haar Cascades, Tushar Gupta suggested I try Facenet (link). Although both of these approaches work well with photos and webcams, I am not expecting perfection from either of them when confronted with pictures of artworks, considering that the neural networks underlying the detection were trained on photographs.

Facenet is more complex and harder to use, and requires a Cuda-enabled GPU, but online posts suggested its performance was much better than Haar Cascades. I developed some Python code based on cjekel’s implementation of David Sandberg’s implementation of the Facenet paper.
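For anyone curious what this looks like in practice, here is a minimal sketch of the same idea using the facenet-pytorch package’s MTCNN detector, rather than the cjekel/Sandberg code I actually ran; the folder names are placeholders.

from pathlib import Path
from PIL import Image
from facenet_pytorch import MTCNN  # PyTorch port of the MTCNN detector used in the Facenet pipeline

# Detect every face in each painting and save the crops (folder names are placeholders)
mtcnn = MTCNN(keep_all=True, device="cuda")  # keep_all returns every face, not just the most confident

out_dir = Path("detected")
out_dir.mkdir(exist_ok=True)

for path in Path("Early_Renaissance").glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    boxes, probs = mtcnn.detect(img)          # bounding boxes and confidence scores, or None
    if boxes is None:
        continue
    for i, box in enumerate(boxes):
        face = img.crop(tuple(int(b) for b in box))
        face.save(out_dir / f"{path.stem}_{i}.jpg")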

This had to be run on a remote machine due to the computationally intensive neural network it uses, and the need for a GPU running Cuda. This means a lot of tricky file manipulations over ssh with a rented cloud computer from AWS. Although I’m now getting used to this kind of remote work, it’s definitely inefficient. To perform even a simple operation like copying an image from this machine, I need to compose a command-line statement something like:

scp -i /Users/chrisluginbuhl/Dropbox/Digital\ Futures/Thesis/Python/AWS-CL3.pem ubuntu@ec2-35-183-135-227.ca-central-1.compute.amazonaws.com:~/facenet/facedetect/faceBx.zip /Users/chrisluginbuhl/machine_learning_local/wikiart/Early_Renaissance/detected 

On the other hand, it can run code that my machine can’t, and it’s pretty fast at almost anything. I am paying about $3/hr for this machine (a p3.2xlarge), whose Nvidia V100 graphics card alone retails for over US$11,000!

The previous face-detection program I ran on this dataset found 1059 faces and 1250 false positives among an unknown number of actual faces in 2790 pictures. This algorithm did much better: 2758 faces and 111 false positives in the same 2790 pictures, and it took only slightly longer to run. The cropping results were significantly more consistent as well:

 Despite being trained on photos of people, Facenet finds faces in these photos of paintings and sculptures very well.

There’s less entertainment with false positives, but I removed a few by hand just to help the network be clear on what’s a human and what’s a lion:

Performance is a little too good if you want just human faces that will help the GAN rather than confuse it.

It definitely missed quite a few faces as well. I modified the script to show the detected faces in their original context, and ran it through a handful of images:

Red boxes indicate detected faces. These four images show some of the failures and successes.
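The modification itself is small; here is a rough sketch of the idea, assuming the detector hands back corner-style bounding boxes (the function and file names are illustrative):

import cv2

def draw_detections(image_path, boxes, out_path):
    """Draw a red rectangle around each detected face and save the annotated image."""
    img = cv2.imread(image_path)
    for (x1, y1, x2, y2) in boxes:       # corner-style boxes, as the detector returns them here
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)  # red in BGR
    cv2.imwrite(out_path, img)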


Generating new Renaissance people

My computer is capable of running a basic DCGAN, and I trained one with this collection of faces. I needed to do some fine-tuning of hyperparameters to get it to train, and it’s discouraging to run an experiment all night only to discover it failed to train. I quickly discovered just how much faster my $3/hr AWS machine with a fast GPU is for training a GAN.

One pass through about 4,000 images every 10 seconds means a single experiment runs all night on my fast video-editing laptop… (sped up 2.5x above)

….what a difference 5120 CUDA cores make

Towards the end of the experiment, some interesting looking renaissance people began gazing out of the data at me:

After many days spent working towards this, I was very happy to make the acquaintance of these strangers (some stranger than others). At the moment I am still playing with hyperparameters and working on troubleshooting an error that is preventing me from making faces larger than 64×64 pixels.
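The 64×64 limit comes from the architecture itself rather than from the data. As a sketch (this is the standard PyTorch DCGAN generator shape, not my exact script), each transposed-convolution block doubles the spatial size, so four doublings from a 4×4 seed give exactly 64×64; going bigger means adding a block and retuning everything:

import torch.nn as nn

def dcgan_generator(nz=100, ngf=64, nc=3):
    """Standard DCGAN generator: latent vector -> 64x64 RGB image."""
    return nn.Sequential(
        nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),      # 1x1 -> 4x4
        nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False), # 4x4 -> 8x8
        nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False), # 8x8 -> 16x16
        nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),     # 16x16 -> 32x32
        nn.BatchNorm2d(ngf), nn.ReLU(True),
        nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),          # 32x32 -> 64x64
        nn.Tanh(),                                                 # pixel values in [-1, 1]
    )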

One of the more haunting faces captured my imagination, and I wanted to see if Huxley the robot can make a decent mosaic of it from coloured tiles. There may be something appropriate about a young robot painstakingly envisioning what a human from the renaissance might have looked like, while I continue toiling to create a better machine for the future out of lengthy command line arguments.

What do machines see?

Earlier this week I adapted a computer vision technique intended for photos to isolate the faces in 2500 early Renaissance paintings from wikiart.org. I am hoping to create new faces from these using a GAN, in order to represent how machines “see” their human creators. For purposes of the image I’m making, the machines have a penchant for traditional European art.

My code was forked from code by Jeevesh Narang (here). It uses OpenCV’s Haar-cascade face detection, discussed in detail here (section 1.10.1), which in turn is based on Paul Viola and Michael Jones’s 2001 paper, “Rapid Object Detection using a Boosted Cascade of Simple Features”. It’s a machine learning technique that does not depend on neural networks.
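The core of that script boils down to a few lines of OpenCV. This is a simplified sketch rather than the forked code, and the folder names are placeholders:

import glob
import os
import cv2

# Load the pretrained frontal-face Haar cascade that ships with OpenCV
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
os.makedirs("detected", exist_ok=True)

for path in glob.glob("Early_Renaissance/*.jpg"):
    img = cv2.imread(path)
    if img is None:
        continue
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40))
    for i, (x, y, w, h) in enumerate(faces):
        name = os.path.splitext(os.path.basename(path))[0]
        cv2.imwrite(f"detected/{name}_{i}.jpg", img[y:y + h, x:x + w])  # save each face crop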

Running it on about 2800 early Renaissance paintings produced results that were… interesting. It found a lot of faces, but also produced about 1250 false positives in 2790 paintings (based on hand-filtering the results). Here’s what a selection of its “faces” looked like:

Only about half of the hits contain actual faces, and some are cropped strangely.

Some of these are clearly faces, some are clearly not faces, and some are understandable mistakes. I particularly liked these “faces”:

Never noticed how much a horse’s rump looks like a face. Also I think it spotted the Shroud of Turin.

It didn’t take too long to filter the false positives by hand in a small dataset like this one, but by spot-checking I could see that the algorithm missed a lot of faces as well. It makes sense that an algorithm intended for photographs would suffer when used on paintings. Rather than tweak parameters, I went looking for a better detector.

More research turned up a more modern approach (github) to face detection based on – you guessed it – deep neural networks. It boasts greater than 99% accuracy, and is trained on hundreds of thousands or millions of images, depending on settings.

This question of what machines see is particularly poignant this week as social network Tumblr has announced it will be banning adult content from its site soon. Bloggers like Janelle Shane have been posting (on Twitter) some of the images that have been flagged as inappropriate:

From Twitter – Janelle Shane’s Tumblr post on the inappropriate dual nature of light

Clearly some algorithms are better than others at distinguishing and categorizing images. Social networks all seem to employ growing teams of people to moderate content. It’s interesting to note that the Discriminator half of a GAN is quite a bit better at its job than the Generator half is. This makes sense – it’s easier to detect a kind of content than to create convincing examples of it. But the Discriminator is the half that we throw away once the network is trained.

This got me thinking about the challenges I’m facing with gathering a large enough dataset to create new images. In a 2017 blog post about their artmaking GAN, Kenny Jones and Derrick Bonafilia mentioned that the Discriminator network of a GAN they were making was able to successfully categorize artworks 90% of the time, based on the categories in wikiart.org.

Last week I got 8500 images from NASA and am considering filtering them by hand to train a GAN. This approach is taken by others who curate their training data to achieve a particular kind of output from a GAN, or even create all the images by hand. Yikes!

Looking at my NASA dataset, I want to filter out all the images that are not of starry scenes with a nebula, galaxy, or other heavenly body in them. I considered writing a script to look at the histogram, and reject anything that wasn’t black around the edges. That would filter out a lot of terrestrial landscapes, diagrams and other scenes that will only confuse my nebula GAN. But technically the perfect tool for filtering these images is…a neural network. Wouldn’t that be the perfect tool to facilitate generating more and better datasets?
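For reference, even the crude edge-darkness check I’m considering would only be a few lines; here is a sketch, where the border width and brightness threshold are guesses that would need tuning against real examples:

import numpy as np
from PIL import Image

def looks_like_deep_sky(path, border=10, max_edge_brightness=30):
    """Keep an image only if a strip around its edges is nearly black (i.e. probably a starfield)."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float32)  # greyscale pixel values
    edges = np.concatenate([
        img[:border, :].ravel(), img[-border:, :].ravel(),   # top and bottom strips
        img[:, :border].ravel(), img[:, -border:].ravel(),   # left and right strips
    ])
    return edges.mean() < max_edge_brightness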

For training Neural Networks on specific data that isn’t already available in a large dataset, it seems that we need an easy-to-use neural network that could help gather images in a particular category. As the power and variety of GANs increases, we may find ourselves increasingly limited by our training datasets, the same handful of which get used again and again. But with a relatively small number of sample images, we may be able to go out into the web and find “more images like these” using essentially the Discriminator networks which are a discarded by-product of the GANs we’re training all the time.

Creation myths with CycleGAN

The first and simplest GANs, such as DCGANs, produce images that mimic the images they were trained on. (See the black and white digits produced by a DCGAN in the blog post Getting Started with GANs.) Others have since developed variations that address some of the challenges, such as GANs failing to converge (i.e. failing to train) or mode collapse, which causes a GAN to create the same output over and over. Various techniques such as Wasserstein loss, SELU and subdivision can sometimes help the training and reduce the amount of trial and error needed to fine-tune hyperparameters. For a sense of some of these approaches and their tradeoffs, see Alexia Jolicoeur-Martineau’s blog post on generating cat faces with GANs: https://ajolicoeur.wordpress.com/cats/ .

CycleGANs are significantly different, using the same building blocks in quite a different way. This week I got some promising results after training a CycleGAN on Early Renaissance paintings from WikiArt and Astronomy pictures from NASA’s Astronomy Picture of the Day. I used the cycleGAN in an unconventional way, so I will start at the beginning with CycleGAN basics.

Unlike style transfer, which can render a photo in the style of a particular painter, CycleGANs are normally used to convert an image from one subject to another. Style transfer is the subject of my next blog post.

CycleGANs are two complementary GANs – one that converts pictures of A into pictures of B, and the other turns pictures of B into pictures of A. The authors of the original CycleGAN paper demonstrate training a CycleGAN on a thousand pictures of zebras and a thousand pictures of horses.
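The glue that holds the two generators together is a cycle-consistency loss: translating A to B and back again should return the original image. Here is a minimal sketch of that term (not the authors’ code; G_AB and G_BA stand for any pair of image-to-image generators):

import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G_AB, G_BA, real_A, real_B, lambda_cyc=10.0):
    """Penalize the generators when A -> B -> A (or B -> A -> B) fails to reproduce the input."""
    rec_A = G_BA(G_AB(real_A))   # e.g. painting -> nebula -> painting
    rec_B = G_AB(G_BA(real_B))   # e.g. nebula -> painting -> nebula
    return lambda_cyc * (l1(rec_A, real_A) + l1(rec_B, real_B))

During training this term is added to the usual adversarial losses from the two discriminators, which are what keep each translation looking like its target domain.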

Converting A->B and B->A with CycleGAN. (From the original paper: https://arxiv.org/abs/1703.10593v6 )

Like other GANs, this approach requires a training dataset with more than a thousand images for each type of scene, and the networks can be hard to train. Even once trained, there are other ways they can fail to deliver the desired results.

How to blend into the herd: A shirtless man (Vladimir Putin?) becomes an extension of his zebra. (From the paper)

The training and image generation code has been released on GitHub (here). It makes use of Cuda processing on the GPU to speed up training, so it was necessary to run it on a remote machine with multiple GPUs.

As an aside for anyone who is looking to replicate this, it took some wrestling to get it to work, mostly due to inexperience. The code wants to run a graphical display of training progress on a local port, which doesn’t help if the machine is remote, and if that display server (Visdom) isn’t running, the code will crash. Tushar Gupta helped me figure out how to get this all working on AWS. You need to run Python 3 with PyTorch and Cuda, and check the other module requirements in requirements.txt. Before starting, make sure to run:

python -m visdom.server

I decided to deliberately try to map normally unrelated subject matter together. Zebraman, above, is a fine example of machines making connections that humans would never make (…okay, very few humans would ever make). Part of my process of writing plausible machine creation myths is looking for connections that might seem natural to machines, but unexpected to us. One of the myths that I suggested could emerge from astronomy AIs on Mauna Kea was the veneration of distant nebulas – the birthplaces of stars. Since the future renaissance machines are taking their image-making cues from traditional human images, they mash up astronomical images and gilding.

A couple of weeks ago, I had managed to scrape about 2500 pictures from NASA’s Astronomy Picture of the Day website. For the first pass, I simply cropped all the images square and resized them to 64×64 pixels, rather than distorting them to fit. I did the same for every picture in the Early Renaissance category of wikiart.org, also about 2500 pictures. This collection therefore included paintings, drawings and sculptures of many different subjects, many with heads cut off by the cropping. The astronomy pictures were an equally varied lot. This is not ideal for training a GAN, where more consistency would be helpful, but I wanted a baseline, to see how much improvement I would get by sorting the images manually, by scraping with keywords, or by applying computer vision techniques before feeding them to the GAN.
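That first preprocessing pass was only a few lines. Here is a rough sketch (my actual script isn’t reproduced here, and the folder names are placeholders):

from pathlib import Path
from PIL import Image

def crop_and_resize(src_dir, dst_dir, size=64):
    """Centre-crop each picture square, then shrink it to size x size."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        img = Image.open(path).convert("RGB")
        w, h = img.size
        s = min(w, h)
        box = ((w - s) // 2, (h - s) // 2, (w + s) // 2, (h + s) // 2)  # centred square
        img.crop(box).resize((size, size), Image.LANCZOS).save(Path(dst_dir) / path.name)

crop_and_resize("wikiart/Early_Renaissance", "datasets/ren2neb/trainA")
crop_and_resize("nasa/apod", "datasets/ren2neb/trainB")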

Early Renaissance pictures from Wikiart.org, resized to 64x64pix and cropped square. Astronomy Pictures of the Day from NASA, resized to 64x64pix and cropped square.

I let it run for several hours, about 22 training epochs. An epoch means processing through all of the training data once; multiple epochs are usually needed, often hundreds. The authors mention using 200 epochs (https://arxiv.org/pdf/1703.10593v6.pdf, Section 7, Appendix). I had to write another short script to combine the output images in some sort of order that I could make sense of (code on my Github; a sketch is below).
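The combining script is essentially a contact-sheet maker. Here is a simplified sketch (the real one is on my GitHub, and the result-file pattern below is illustrative):

from pathlib import Path
from PIL import Image

def make_contact_sheet(paths, columns=8, tile=64, out="epoch_grid.jpg"):
    """Paste a list of small images into one grid so an epoch's outputs can be scanned side by side."""
    if not paths:
        return
    rows = (len(paths) + columns - 1) // columns
    sheet = Image.new("RGB", (columns * tile, rows * tile), "black")
    for i, p in enumerate(paths):
        img = Image.open(p).resize((tile, tile))
        sheet.paste(img, ((i % columns) * tile, (i // columns) * tile))
    sheet.save(out)

make_contact_sheet(sorted(Path("results").glob("*fake_B*.png")))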

In the 21st epoch, promising results started to appear:

A Madonna becomes a globular cluster… A globular cluster becomes a Madonna. The original Madonna. The original globular cluster.

Normally CycleGANs are used to create a complete transformation (e.g. all the way to zebra). In this case, interesting intermediate results are achieved by pausing part way through the training. Although the method is quite different, the results are similar to the interpolations between image categories that are possible with BigGAN, discussed in an earlier post.

This result was based on smaller datasets I had before developing a new web scraper for APOD (discussed last week) and before doing some filtering and segmenting of the renaissance images (to be discussed later this week). I’m looking forward to more experiments with CycleGAN.

A Robot and A Software Toolkit for Making Images with GANs

Introducing Huxley

This week I had the pleasure of working with Huxley, an ST Robotics ST-12 arm in Phase Lab. Huxley has a 1m reach and can write his name with a pen. We did not shake hands, but neither did he punch me in the nose. So far so good.

I am planning to use a robot to take some of the designs produced by neural networks and trace them onto gold leaf, as an important step in taking the digital art back into the physical world, and the realm of traditional art materials.

Huxley, the ST-12 6-axis robot

The robot’s software runs on Windows 7 or earlier, which presented some issues, since software updates, when they were still being released, would sometimes break drivers. Huxley hasn’t been able to connect for a while.

The driver box is connected to a PC via an RS-232 serial port, so it was necessary to spend a couple of days thrashing around with cables and drivers that could make the connection. Success was achieved around 4:58pm on Friday! Michael Page, who runs Phase Lab, has done some interesting work with this robot in the past. He supplied three USB-to-serial adaptors of unknown quality and two Windows 7 machines. I supplied one more adaptor, plus a virtual Windows 10 machine, and with a voltmeter found a combination that worked.

This week Michael is going to get me started and set me up with the code that runs the robot.


Off-the-Shelf vs. Roll-Your-Own GANs

GANs are neural networks that can be used to make pictures. By using thousands of sample photos to “train” the network, the GAN becomes “tuned” to a particular style of image. Emerging in 2014, GANs are a recent invention. They are not user friendly or easy to train, and are limited to low resolutions. But the results can be quite astonishing, whether in the ability to nail an image in a particular style or, just as often, by the ways they get images wrong.

These computationally intensive systems are a great complement to human artists – they are quick where we are slow. But for all their inventiveness, they display no common sense, whereas people carry around in their heads a lifetime’s worth of knowledge about people, objects, vision and how these interact in the world.

Every week, new papers are released to the machine learning community featuring a new variation on GANs. Often these are simply adjustments to hyperparameters, but they are worthy of publication because they impart some special ability – such as generating convincing celebrities who do not exist.

Another example is the CycleGAN paper, which I mentioned last week; it uses two GANs together to translate images from one category into another. Style transfer, a related effect, is sometimes done with a different deep learning technique described here (https://arxiv.org/abs/1508.06576) and made famous by websites like deepart.io and apps like Prisma.

Prisma uses pre-trained neural networks to transfer styles to photos based on various presets (image: mspoweruser)

At the moment, the tools are mostly pre-packaged with a few previously defined styles, or open-ended but extremely user-unfriendly. One exception to this is ml5.js, a new JavaScript library that packages the power of deep learning for convenient use in a web browser. As with the above techniques, however, only pre-trained neural networks are available at the moment.

Part of the reason for this is that training neural networks is computationally intensive, and involves significant trial and error to arrive at a network that avoids mode collapse and strikes the right balance between too abstract and too derivative when making images.

It’s also challenging to find enough images to train a GAN – typically thousands are required. Datasets like the CelebA dataset and ImageNet have been used because they provide images of well-defined subject matter in clear categories.

I’ve been taking on the challenge of making GANs from scratch. Last week I ran a few experiments on a local machine with pre-packaged training images, but this week I ran more complex models on a remote machine. I also wrote a script for getting more training images from the web.

Clearly getting good training data is half of the challenge of producing good output. The Python script I wrote (here) scans through the last 23 years of NASA’s Astronomy Picture of the Day and downloads the images to a local machine. As I learn more about Python, I continue to be impressed with what can be done in fewer than 45 lines of code. (I wish I could say that I’m falling in love with HTML at the same time, but alas not yet.) This approach overcame the API’s usage limits and some of the limitations of other web-scraping programs I previously used to gather the images for training last week’s DCGAN. I got 8500 images this way in a few hours.
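For the curious, the scraping idea is roughly this (a sketch, not the exact script linked above; the archive and page structure are as I understand them and may need adjusting):

import os
import re
import time
import requests
from bs4 import BeautifulSoup

BASE = "https://apod.nasa.gov/apod/"

def scrape_apod(out_dir="apod_images", limit=100):
    """Walk the APOD archive index, open each day's page, and save the inline image it shows."""
    os.makedirs(out_dir, exist_ok=True)
    archive = requests.get(BASE + "archivepix.html", timeout=30).text
    day_pages = re.findall(r'href="(ap\d{6}\.html)"', archive)[:limit]
    for page in day_pages:
        soup = BeautifulSoup(requests.get(BASE + page, timeout=30).text, "html.parser")
        img = soup.find("img")
        if img is None:                     # some days feature a video rather than an image
            continue
        url = BASE + img["src"]
        data = requests.get(url, timeout=60).content
        with open(os.path.join(out_dir, os.path.basename(url)), "wb") as f:
            f.write(data)
        time.sleep(0.5)                     # be polite to the server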

I am hoping to make nebula-inspired images for the series of machine-made mythological images. I may refine this approach to filter for keywords in the description, such as “nebula”. Unlike other image datasets I’ve seen, most of the images in this collection are gorgeous. I can’t wait to see what a nebula-trained GAN will come up with.

Above: Nebula image from the Hubble Space Telescope courtesy of NASA (source)

Most of the code used to produce images in academia is made publicly available. It is not often packaged to be user-friendly, and may be written in any of a variety of languages and libraries used in machine learning. Fortunately, the ML community often adapts and improves the algorithms in popular papers, making them available in various languages. I am focusing on Python with PyTorch, because the combination is popular, powerful and concise. Torch (a Lua framework from Facebook) and TensorFlow (from Google) are lower-level approaches to neural networks and tend to result in longer code that can be difficult to debug.

While small neural networks can be run on the CPU alone, larger networks and larger datasets only become practical with Nvidia GPUs running Cuda, which harnesses the parallel processing power of the graphics card. A network that trains in a day on a GPU could take a year to train on a CPU. The system I have been renting from Amazon Web Services (AWS), for about $3/hr, has an Nvidia Tesla V100 GPU, which is about 20x faster than the $1200 graphics cards found in high-end gaming desktops (comparison here). I find it hard to imagine the kind of computation I’ve been doing that takes all day on a GPU that performs 8 TFLOPS – that’s 8 million million double-precision floating point operations per second.
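The practical consequence in PyTorch is small but easy to get wrong; here is a minimal sketch of the sanity check I run before committing to a long training job:

import torch

# Is a Cuda-capable GPU visible to PyTorch, and which one?
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print("Training on", torch.cuda.get_device_name(0))
else:
    print("No GPU found - training will run on the CPU (slowly)")

# Both the model and every batch of data then need to be moved to that device:
# model = model.to(device)
# images = images.to(device)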

Running a machine on AWS is challenging to the non-expert, since everything has to be done through a tunnel provided by SSH. It’s like assembling a clock in a locked room by reaching a screwdriver in through the keyhole. I was surprised to note how much I’ve come to depend on hearing my computer’s fan to know when it’s processing something heavy. Without this physical symptom, I struggled to know whether anything was happening at all. Today after several hours of finding no results in the results folder of the remote machine, I pulled the plug. I will try again on Thursday with a web-based monitoring utility that will give me results from intermediate stages of the computation.

Getting Started with GANs

For the last two weeks I have focused on concepts and techniques for creating mythological images. Generally, I come up with a mythological scenario and then create the image around it – the civilization of machines being unearthed from the ground, emerging from the ocean, resulting from a crashed flying machine and various others.

I start with a sketch (example below)….

…and build a photo composite based on the general composition of the sketch. This one below is made by compositing low-res images from ImageNet, which is a dataset used to train machine vision systems.

ImageNet looks something like this: a hierarchy of English nouns with associated photos, almost 15 million photos in total.

I also tried using the outputs of BigGAN – a new Neural Network trained on ImageNet that generates an endless variety of eerily photorealistic images in various categories. Some of them contain objects that can almost pass for photographs but contain strange duplications, omissions, distortions and dreamlike details that don’t make sense. Some of the mysterious objects looked like good candidates for the myth cycle wherein machine life emerged from the sea, so I composited them together with a seascape. This test image suggests this approach may have some promise, but perhaps the ambiguity of the machine dream objects makes these unreadable.

These are rough images intended to explore concepts, and they lack polish. I don’t want to spend too much time on concepts that don’t work out. Here’s one I did spend quite a bit of time on, combining images from ImageNet and from BigGAN, but which really didn’t work. I was experimenting with showing a crash-landed drone as the figurative source of the spark of wisdom, reminiscent of da Vinci’s Madonna of the Yarnwinder and the monolith in 2001: A Space Odyssey.

I think it is fitting that the machines would use pictures from ImageNet to depict their earliest cultural memories. But for this image I had to trawl through hundreds of images just to find a drone and a cave that would sort of work, and even those weren’t great. Part of the problem is that you don’t get your pick of images or lighting conditions from ImageNet, and many of the promising images have restrictive copyright that prohibits their use in derivative works like this. They are not selected for inclusion in ImageNet because they are beautiful. The low resolution of the images is another major issue. Almost all of the pictures seem to be from before 2007 (perhaps when people started uploading their pictures to Facebook rather than Flickr and personal webpages), and the camera sensors of 12-15 years ago don’t produce beautiful results.

My original concept was to develop a GAN or other generative technique that could create original patterns for scribing on gold leaf. Unfortunately, GANs are at the complicated and recent end of the spectrum when it comes to machine learning and AI. There are few books, the techniques are complex, and you need large datasets of appropriate imagery. GANs are famous for being difficult to train: there is a lot of time-consuming trial and error while tuning hyperparameters, and mode collapse, where a network decides to produce the same image over and over, is a common problem. Sophisticated approaches like BigGAN are out of reach, requiring enormous processing power and no small amount of electricity. I began experimenting by replicating others’ results.

One well-established experiment looked promising. Ian Goodfellow’s 2014 paper, which was the first to present GANs, attempted to create handwritten digits that would be indistinguishable from those written by people. Following in his footsteps, I was able to create these using one of the most basic types of GAN, a deep convolutional GAN or DCGAN. It is shown here progressing through the stages of training, from random noise to passable results.

I then replicated another well-known experiment creating new celebrity faces from the celebA dataset. Here are some of the resulting people who never were, dreamt up by a machine:

By the end of the second training epoch, they are starting to look pretty human. More recent techniques have improved on these – see these folks created by a Progressive GAN (link)

I was fortunate to be able to make use of pre-existing datasets that are all carefully cropped with the eyes in the centre of the frame. Training my own GAN on my own data was trickier. I wrote a little utility to resize and crop images from wikiart.org. I am still pretty new to Python; last week I went through about half of Learn Python the Hard Way, and it definitely helped me get my bearings inside other people’s code. The fact that you can run it line by line or in a Jupyter Notebook makes it easy to take apart other people’s code and see what’s happening. Python is pretty convenient. Here’s how you open a file in Python and print it out:

filename = "hello_world.txt"
with open(filename, "r") as file:   # the with-block closes the file automatically
    for line in file:
        print(line, end="")         # each line already ends in a newline

This is refreshingly useful and direct, especially when compared with C or Java. Here’s how you do it in Java (not including the main class you need in order to call this):

List<Integer> list = new ArrayList<Integer>();
File file = new File("file.txt");
BufferedReader reader = null;
try {
    reader = new BufferedReader(new FileReader(file));
    String text = null;
    while ((text = reader.readLine()) != null) {
        list.add(Integer.parseInt(text));
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    try {
        if (reader != null) {
            reader.close();
        }
    } catch (IOException e) {
    }
}
//print out the list
System.out.println(list);

(Courtesy of dogbane at StackOverflow – link. Because I have to look it up every time)

I have been asking myself how Practice-based Research works when part of the creative process involves coding or building. It’s too early to say, but I can say that the feeling of writing a ten line program that crops and resizes 1400 photos in a few seconds was absolutely thrilling – like having a studio assistant that takes the grunt work out of preparing to make a picture. I could easily see using Python to filter out sketches from paintings (based on colour composition) or centring faces on the page using pre-existing libraries like OpenCV. Python feels like it will actually speed up the process of creating, as opposed to being a necessary but very slow grind like developing in lower level languages.

I was also assisted immeasurably by Tushar Gupta, a programmer I found through Codementor, who helped me get unstuck and suggested some approaches to try.

I scraped about 1400 early Renaissance paintings from WikiArt and ran them through DCGAN. These are quite a mismatched group of images, so I wasn’t expecting miracles. Here’s a sample of a few images from the dataset, resized to a tiny 64x64pix:

Predictably, the DC GAN was not able to get a handle on this diversity of images in just two training epochs, which took about 20 minutes to run on my laptop.

Here’s how far it got, going from random noise, to coloured blotches.

It did pick up on the colour palette – avoiding chrome yellow, durable greens and other pigments that were not available until much later.

Experiments with CycleGAN (for image style transfer and pix2pix), and other GANs will be next. I will be running those on powerful cloud-based servers with GPUs that can run Cuda, which my laptop is unable to do.

I’ve noticed that the sheer amount of time spent trying to execute on some of these ideas provides space for creative ideas to arise. As I mentioned in my blog post about BigGAN, spending hours going through images searching for useful material allows part of my mind to focus on the details of the task while some other part is chewing on the themes and general questions of the project in a more abstract, non-verbal and sometimes non-conceptual way. It would be hard to sit down and work on developing ideas or solving problems related to this research for hours on end, but when there is a making task to guide and focus my attention, it becomes possible to think about the research most of the day.

I had a couple of ideas that were spurred by working on the celebrity faces and cropping the Renaissance Paintings – would it be possible to run a face detection algorithm on Renaissance paintings, and then train a GAN based on just those faces? This could provide appropriately painterly figures for an image that avoids the problem of having too much photorealism when trying to create a mythological or allegorical space.

The image I want to use this technique in is partly based on an image I found while looking for gold backgrounds that could be used to train GANs. That will be the subject for another blog post.

Scattered Plots – Organizing Thoughts Graphically

Part of my process of defining and refining my project has been to look for moments when my various interests can be combined in one project. In the past that has meant doing something with a social conscience that involves machinery or photography, or writing a bit of code for an arts organization to help them reduce duplicated effort. If it involves two or more of my interests, and I get to learn something along the way, I’m in!

For my thesis, I had a number of ideas swirling. I was very fortunate to be able to participate in Martha Ladly’s Florence Contemporary course in Italy last spring. I spent a lot of time reading biographies, especially of Leonardo da Vinci, Giotto and other key artists who instigated major innovations in painting. I was working on a couple of photography and design projects that presented images and artifacts that were ambiguous as to whether they were from the past or the future, as a way of exploring the possibility of history being cyclical or spiral-shaped, with dark ages followed by renaissances. What would the next renaissance be, and who would be there to participate in it?

I tried to map out the relationships between the various ideas I was pursuing like this:

 Fig 1. My various research topics and creation projects conceived in Florence

Over the next week or so, I thought more about this and the Future Renaissance seemed to be a perfect way to bring together several different ideas and interests I wanted to pursue. I wrote it in point form notes that began like this:

“Museum of the Neo-renaissance.

After a fall, the museum shows artifacts of the dark ages – restoring images created then, artifacts made from defunct high-tech objects, archaeology and images of crumbled buildings with huddled survivors.
-photos (which are actually composites) could be contrasted with romantic paintings of that age…. ”

Through discussions with Kristy, I settled into the idea of distant future machines who roam the earth making images by combining tech and traditional image making techniques, using reclaimed gold for gilding.

In September, I had to explain this to other people, and it seemed to make a lot less sense in words than it did in my mind. But maybe this is a case of E.M. Forster’s “How will I know what I think until I see what I say?” That is, the ideas are not actually clearly formed in the mind until we need to put them into words. The words may take some polishing, but if in the end they don’t make sense, maybe the idea isn’t clearly formed.

I wanted to group or organize the key ideas and topics to see what structure might be revealed. After trying various physical groupings with a process akin to moving paper dolls around the page, I arrived at a kind of scatter plot with three poles. The diagram below (Figure 2) is composed of captions summarizing the topics of my project. The topics listed have specific meanings – e.g. generative art refers to generative art as it relates to my project, not generative art in general.

The three labels overlapping the edge of the diagram indicate the three major categories of these topics: “Speculative Fiction”, “Practice-based Research”, and “AI, Machine Learning, Robotics”. Each topic is plotted inside the circle in relation to all three categories, with proximity denoting how closely it relates to each. For example, “Generative art” appears close to Practice-based Research and to AI, Machine Learning, Robotics, but not close to Speculative Fiction. Another, “Fusing tech with traditional techniques like gilding”, is equidistant from all three categories, which helps explain why it has remained an important part of this project since its conception, even though most other topics have changed quite a bit.

Figure 2. Mapping topics in this thesis by proximity to Speculative Fiction, AI/Machine Learning/Robotics, and Practice-based Research.

The biggest outcome of plotting these topics this way was noting that most of the topics close to AI/Machine Learning/Robotics were equally close to Practice-based Research. There was also a strong cluster close to Speculative Fiction. So a good description of my project might be: “Using practice-based research to explore AI, machine learning and robotics, with speculative fiction providing the overall guidance and framing for the practice-based research.”

The diagram is a bit of a mess, however. Colours were not helpful, and probably only add to the confusion. Bad enough to have a polar plot with a triangular structure and tags scattered everywhere without confusing matters with random colours.

I tried again the following week. As with the previous figure, there were several phases of making this and the structure and logic of the diagram changed. I felt I was getting closer to the heart of the matter when I arrived at the three labels for the yellow, orange and red regions. It felt as though the whole thing was about to click and make sense. I would take a break, come back to it, realize it didn’t work, and change it all around. Here’s the result.

Figure 3. Research Outputs are shown at the centre, as the result of the human/machine collaboration. Farther out (orange ring) are the topics, methods and practice-based research activities that are actively involved in this project. The yellow outermost ring identifies topics that provide important context and support for the work.

I originally decided to make this third diagram in order to give the reader a key to the relationships between the various topics in my thesis. After writing the caption for the image, I realized that the problem wasn’t that the diagram was failing to get the ideas across clearly; rather, the ideas themselves were jumbled and trying to go in too many directions at once. This kind of rich compost heap of thoughts may make for a fun conversation, but it may not hold together as a single piece of writing. In the diagram I have topics mixing freely with actions. At the centre is an activity, and its result, but it is surrounded by items which are also activities.

Back to the drawing board. I think the “right” next version of this diagram will focus on the various activities that contribute to the human/machine collaboration, which is the practice-based research. Rather than drawing the diagram and then doing the practice-based research according to that plan, I’m going to do some practice-based research next and then diagram how the various pieces relate.

Physical Processes, Collaborations and Surprise

Part of this project is to take the output of AI and make it physical. These objects and creations work better and are less confined to our present day when they are not experienced on a computer screen. I am also exploring the performative aspects of a machine making images. For now, the intent comes from me, but when watching a robot arm work for example, it’s hard not to anthropomorphize and perceive that the machine has intention and is a real being. Jordan Wolfson’s Colored Sculpture is a great example of this. The title emphasizes that the humanoid figure which seems to undergo violent treatment is really just a coloured sculpture. But viewers tend to have visceral reactions when confronted with a spectacle that engages their sense that a sentient being is involved.

Robotlab’s bios (http://www.robotlab.de) features a large industrial robot carefully and precisely copying the Torah onto a long scroll of paper. The viewer has the impression that this diligent machine will take all the time in the world copying this manuscript, and nothing could possibly deflect it from its devoted copying. The hushed sounds of the brush on paper and the machine’s movement recall a monastic scriptorium, and the robot’s bright colour is a reminder of the monochromatic monastic garb belonging to various different traditions.

But there is no surprise in what bios writes. Although the arm is capable of almost limitless motions, it is confined along a one-dimensional trajectory, without deviating by a single character – a feat probably unmatched by even the most diligent human copyists. 

I think the capacity to surprise us is the most interesting and possibly most useful characteristic of AI, and the key characteristic that differentiates these machines from every other kind of software or hardware machine ever devised. I think this is valuable because they can complement humanity’s strengths – excelling where we are weak, devising solutions we might never have conceived, showing us our blind spots and teaching us new tricks along the way. There are plenty of areas where they are very far behind us. 

In a wonderful summary of the recent matches between Google’s AlphaGo and human champions, The Atlantic wrote:

 “They’re how I imagine games from far in the future,” Shi Yue, a top Go player from China, has told the press. A Go enthusiast named Jonathan Hop who’s been reviewing the games on YouTube calls the AlphaGo-versus-AlphaGo face-offs “Go from an alternate dimension.” From all accounts, one gets the sense that an alien civilization has dropped a cryptic guidebook in our midst: a manual that’s brilliant—or at least, the parts of it we can understand.

This new way of learning seems to be a two-way street. Not only are humans needed to create the AIs in the first place, we can also use our unique human capabilities to steer their general-purpose networks in the right direction, as with supervised learning. Artist Mario Klingemann has done a lot of work with GANs on faces and human figures. In a recent project, he cued the network by identifying the facial features as particularly important, and we can see it developing better acuity in the face than in the surrounding details. We can point out a path to these networks and they will learn it (https://twitter.com/quasimondo/status/1058421977550069760).

The quality of surprise is both an outcome of neural networks, and a capability that can be actively cultivated to help the network play and make discoveries. OpenAI researchers programmed a bot to play the video games Montezuma’s Revenge and Super Mario Bros by configuring it to avoid boredom, i.e. states where it could predict what would happen next. By seeking unfamiliar states, it learned to navigate the map, discover hidden levels, and defeat the bosses at the end of each level. https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/. No one watching the player would confuse it for a human playing the game. No human thumbs, no matter how young, would see a reason to have the player hop continuously as they traverse the map.

Is there a fundamental difference between traditional generative art methods (e.g. procedural bots like The Painting Fool) and GANs? Do we learn more from collaborating with a neural-network-based system than with a rule-based system? I hope to learn more about this as the project proceeds.

Compositing the Myths

How to make the images?

In my last blog post, “Is It Art?” I questioned whether making composite images from the output of other people’s GANs was an appropriate approach. (A GAN is a type of neural network for image generation). It seems like my human-machine collaboration experiment will be more personal and immediate if I am the one training the neural networks, rather than just selecting from the pre-existing output of other GANs.

I had made one of the four images I planned as “prototypes” as part of my practice-based research. I decided to dig a little more into making my own neural networks before proceeding with the next three images.

First try setting up a GAN

Trying to program GANs turned out to be extremely tricky, however. There is a lot of math, computer science and theoretical knowledge required. The scientific papers on the topic are quite daunting. Although several good implementations are available on Github, I quickly hit obstacles I didn’t know how to traverse.

One of the first obstacles was not having a suitable computer. For most machine learning work, a powerful computer with an Nvidia GPU is needed. Many of these models will take months to run on the CPU alone. Having heard that it’s possible to rent time on a virtual machine with a good GPU from Amazon, I set out to follow some tutorials to set that up. There is a delay of a couple of days  while Amazon reviews the request.

The next issue I had was getting anything to run on a remote server. The information I found on how to do this seems to be geared to people who are pretty experienced programmers. I think I got things running in the end, but had some troublesome errors and found that some of the python scripts in the repository had known issues that have not yet been fixed.

I realized that I am a little too new to machine learning to jump into the deep end right away. If it were just the theory, or the math, or just Python or the libraries that were new to me, I could work my way through it, but since all of these things are new to me, taking them on all at once is not going to go well. I decided to start looking for help, and at the same time started looking at other machine learning techniques for image generation.

In the meantime, I proceeded making the next two images based on the general outlines I devised and sketched (see the “Four Prototype Images” blog post).

“Earth” Image

This image depicts the machines’ notion that they may have originated in the earth. I like that this is an inversion of the myths in which a heavenly influence originates from the sky. If the machines have a garbled understanding of how the earliest computers were made, they might gather that mining obscure minerals was part of it. There is also this idea (for example in Stephen King’s Maximum Overdrive) that an animating force could emerge from underground due to careless excavation.

I decided to make this image out of pictures that were used in the famous ImageNet dataset, commonly used to train neural networks like BigGAN. ImageNet contains over 14 million images categorized by subject. They seem to be mostly from the pre-Facebook and Instagram era, when resolutions were low and sensors were poor: I soon found that almost all the images come from Flickr or from personal or academic websites from before 2007, and are usually not much more than 500 pixels on the long edge. Normally, to make a composite look real, the photographer starts with a good-looking, high-res photo of a background and tries to capture as much of the subject and props “in camera” as possible, to reduce the amount of time needed in Photoshop trying to make things look real. Since I may not be aiming for photorealism anyway, I decided to embrace this limitation and see where it leads.

As I began work on the image, I found that ImageNet is a Stanford-based initiative. A few weeks ago, my co-supervisor Adam Tindale mentioned that while he was there, he witnessed some IT department employees wheeling a safe through the lab. In it was a digital copy of all of Stanford’s data from that year. They were taking it out to a secret location in the desert to bury it, so that in the event of a global calamity, a future civilization could one day find and revive their lost knowledge. This seems like a particularly enlightened form of reverse mining. We’ve taken all the readily accessible metals and minerals from the surface of the earth and scattered them. If our modern technology were to cease, it would be hard for a subsequent civilization to have a Bronze or Iron Age. Burying our data might not fix that problem, but could be a boost in the event there is a civilization that comes after a dark age.

I thought it would be interesting that in searching for their origins underground, a future machine civilization could uncover ImageNet, a well-organized catalogue of our world and civilization, and try to make sense of us using those images.

I did a few more sketches to try to flesh out my ideas for this composition. I considered having a few workers unearthing a tablet with strange markings at the bottom of a pit, something like a strip mine. Unfortunately, it isn’t usually possible to find such specific items in ImageNet. It is organized in a hierarchy using only nouns, so verbs like working, digging etc are absent. It is set up to be used as a classification hierarchy, rather than a set of keywords.

I thought about making a mountain in the reverse image of a strip mine – like a stepped or terraced pyramid. This is reflected in my initial sketch, in which I composited a scene from a strip mine.

Experimental sketch: how would I turn a strip mine into a mountain through compositing?

Through ImageNet I found the original Flickr album from a mining company and browsed through it looking for images I could use in planning my composition. It is interesting that all of these images have copyright rules that restrict them from being modified or used in derivative works like mine. So images of what goes on under the earth aren’t public. These spaces are mediated by corporations who, judging by their Flickr album, have a very keen interest in controlling the perception of what goes on in the ground, with workers, and in nearby communities.

I spent quite a bit of time looking for mining images while planning this composition. It ended up being an important part of my research and making process. I found myself reflecting that all of my grandparents came to Canada from Europe for work in the gold mines in remote locations of northern Ontario. This mining was part of the colonial history of Canada.

I searched for images of veins in the rock that I could use to show the gold like circuit traces coming out of the ground. A lot of this was beyond what I am capable of drawing from scratch, or compositing from ImageNet. Nonetheless, it gave me ideas for the next set of images.

I settled on a composition and produced a more detailed sketch:

Final sketch before photo compositing

In this sketch, two figures (generated by a GAN) have unearthed a golden computer punch card (a dominant data storage medium in the early decades of computing). The strip-mine-like spiral ramp in the foreground goes down into the earth, which has a coppery glow. The mountain might be a pile of iron pellets. The golden sky is decorated in the style of religious panel paintings with a pattern depicting drones. This represents the machines’ ascent from the ground into the sky. If I can, I will have a neural network evolve the outline of the drones in each row going upwards.

Since this is a prototype, I did not incorporate all the elements I dreamed up in the image below. It is composed of six photographs, with some hand-illustrated elements such as the shovel and the lighting effects on the foreground figures.

Composite of six photographs – a punch card being unearthed by the long-vanished humans during machine pre-history.

This is intended to be printed over burnished gold so that the computer card and the sun will reflect light brilliantly.

My first experiments with printing on a shiny surface were pretty disastrous.

Gerhard Richter would have approved of this print. I was disappointed.

There are several artists who have published books and videos about printing on unusual surfaces. With some experimentation, I was able to print with an inkjet printer on a shiny aluminum film, letting the white areas of the image allow the base material to shine through:

The sun and punch card reveal the shiny surface beneath.

As I mentioned in the post called “Physical Processes, Collaborations and Surprise”, making these digital images physical is an important part of this project. Having a reflective surface as the background of the print allows for a much higher contrast ratio than you can get on a screen.

This image is not really something I want to show widely, but was a helpful part of my process in thinking through the questions of machine origin myths.

Is It Art?

As I mentioned in a previous post (“A Big Change in the World of GAN Image-Making”), Kate and I had been discussing compositing creation myth images out of the images made by neural networks, and the images they’re trained on. 

I made a couple of images using the output of BigGAN. No one is claiming that the output of that network is art, though the images are certainly quite striking. But since they are produced without any particular intention in mind, we can talk about their aesthetic qualities but probably wouldn’t call them art.

This raises an important point about how we see AI in society. Is creativity yet another area where our jobs are being taken over by machines? It’s hard to feel very threatened by today’s rudimentary image-making nets. But we are starting to see that it’s possible to write compelling click-bait headlines using neural nets. 

NYU graduate student Ross Goodwin mentioned a conversation he had with his supervisor Allison Parrish, who is an expert in text generation:

…as I once heard Allison Parrish say, so much commentary about computational creative writing focuses on computers replacing humans—but as anyone who has worked with computers and language knows, that perspective (which Allison summarized as “Now they’re even taking the poet’s job!”) is highly uninformed.

When we teach computers to write, the computers don’t replace us any more than pianos replace pianists—in a certain way, they become our pens, and we become more than writers. We become writers of writers.

Ross uses position data and photos to inspire his machines to write interesting and whimsical prose, which people can then contemplate in the locations that inspired the writing. He speculates that machines can be a tool to extend our writing capabilities and tap into thoughts and impulses we might otherwise struggle to express. 

In a similar vein, a teacher of machine learning on popular computer science learning site Udacity.com predicts that machine learning will augment the capabilities of our minds in the same way that physical machinery has augmented the physical strength of our bodies a thousandfold.

Returning to the question of neural networks like GANs making art, researcher and neural network blogger Janelle Shane says that the images BigGAN makes may not be art, but selecting them for a particular purpose is an artistic act. I agree. Nonetheless, my first images created by combining bits of the output from BigGAN did not impress the few people I discussed them with. A typical exchange went something like: 

“Did you make these images then?” 

“No. It was produced by a neural network” 

“Did you program it?”

“No, someone else did.”

“Oh.”

“I’m going through the thousands of images and trying to make a composition from them that tells a story”

“OK so you’re copying, basically.”

I sigh deeply and ponder the plight of the misunderstood artist. But then I reflect that they have a point. It seems that people are willing to consider machine-made art, but it helps if the artist programmed the machine themselves. 

This is a familiar criticism that comes up regarding photo composites and even studio photography – that it is somehow “fake”. If the person wasn’t really in front of that backdrop, is it honest to cut and paste them in front of it? Journalistic and documentary photography has to adhere to a pretty strict set of professional ethics when representing a scene or events as they happened. But most people would accept that a painter can modify or entirely construct a scene to satisfy their aims. I think we struggle more with this when it’s done with photography, because we may be used to thinking of photorealistic images as portraying reality.

This may be a challenge for this project because I am using a photorealistic image-making technique to depict imaginary events. This kind of photo compositing seems to work best when either it is done extremely well (as on many movie posters), or is clearly depicting an imaginary scene (see for example the work of Von Wong https://www.vonwong.com or Miss Aniela https://www.missaniela.com). 

I am still working through the visual style and visual vocabulary of these images, by making images and soliciting feedback. I will be satisfied even if the images I produce don’t satisfy everyone, but I do want to try to make them reasonably interesting and accessible for most people. If I find that most people just think I am trying to fool them with a fake, I will want to adjust my approach.

As I work on the style and composition of my mythological images, I am simultaneously getting started training up my own GANs. Unfortunately, this particular technique, while perfect for my project, is very much the deep end of the machine learning pool. I will be looking for ways to get acquainted with some of the technical details without getting completely tangled in the weeds.