
Computer Generated Art: Neural Style Transfer

Computer Creativity

Ronit Taleti
5 min read · Sep 8, 2020
Mona Lisa in the style of several famous paintings (Image from ML4A)

What is Neural Style Transfer?

Neural Style Transfer is a method of combining two images with a computer: one as a base image, and the second as the style to apply to it. It’s like taking a picture of your best friend and making it look as if they’d been painted by Van Gogh himself!

An example of Neural Style Transfer. (Image by Ritul in Style Transfer using Deep Neural Network and PyTorch)

Before we start, you should be familiar with neural networks. I won’t explain neural network basics here (I’ll be explaining Convolutional Neural Networks), but here’s an article which will teach you everything you need to know to understand this:

How exactly does it work?

Well, Neural Style Transfer is generally done using a Convolutional Neural Network (CNN), which is a type of neural network, but what exactly does a CNN do?

Convolutional Layers

Well, first off, CNN’s are different from a regular neural network as they don’t sequentially go through data (pixels in this case) like most neural networks do. Instead, they look at multiple pixels at once, almost like sliding a filter over the image.

The layers of a CNN are mostly convolutional layers (hence the name), which help create “features”, and each successive layer picks up progressively more complex ones. Here is a visualization:

Image from Stanford

The filter here is the grid of red numbers we multiply against the yellow numbers. Basically, we look at one part of the image, apply the filter, and sum the result to get a convolved feature. If we think of the image as grayscale, each of the original numbers is how bright that pixel is.
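
To make that multiply-and-sum operation concrete, here’s a minimal sketch in plain Python with NumPy. The image and filter values here are made up purely for illustration:

```python
import numpy as np

# A tiny 5x5 grayscale "image": each number is how bright that pixel is.
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])

# A 3x3 filter, like the red numbers in the visualization above.
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

# Slide the filter over the image: at each position, multiply the
# overlapping numbers and sum them to get one convolved feature value.
out_size = image.shape[0] - kernel.shape[0] + 1
convolved = np.zeros((out_size, out_size))
for i in range(out_size):
    for j in range(out_size):
        patch = image[i:i + 3, j:j + 3]
        convolved[i, j] = (patch * kernel).sum()

print(convolved)  # a 3x3 map of convolved features
```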

The early layers may learn to find edges, corners, and simple shapes, while later layers may detect more complex features like limbs, faces, and entire objects.

The result of an edge detector (Photo from Denny Britz at KDNuggets)

Pooling Layers

Pooling is much simpler, and also similar to convolution. Instead of detecting features, pooling is meant to lighten the load on the network by reducing the spatial size of the data flowing through it.

Image from Xinhui Wang

Basically, it reduces the size of the inputs to the later convolutional layers, generally with one of two methods (there’s a quick code sketch of both just after them):

Max Pooling
Max pooling is the first type, wherein you take the biggest number in each window of the image.

Average Pooling
Average pooling is the second type, wherein you average all the numbers in each window of the image.
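
Here’s a rough sketch of both kinds of pooling on a made-up 4×4 feature map, using 2×2 windows:

```python
import numpy as np

# A made-up 4x4 feature map coming out of a convolutional layer.
features = np.array([
    [1., 3., 2., 1.],
    [4., 6., 5., 0.],
    [2., 1., 8., 7.],
    [0., 2., 3., 4.],
])

# Split the map into non-overlapping 2x2 windows.
windows = features.reshape(2, 2, 2, 2).swapaxes(1, 2)

max_pooled = windows.max(axis=(2, 3))   # biggest number in each window
avg_pooled = windows.mean(axis=(2, 3))  # average of each window

print(max_pooled)  # [[6. 5.] [2. 8.]] -- a quarter of the original size
print(avg_pooled)  # [[3.5 2.] [1.25 5.5]]
```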

Activation Layers

Activation layers are usually key for a neural network: they introduce non-linearity, and at the final layer they allow us to make the prediction.

Image from GumGum

However, here we won’t be making a prediction; we will only be using the convolutional (and pooling) layers. What we will do instead is create a loss function, which is what blends the images and also lets us control how they are blended.

Loss Functions

The loss function essentially measures how far off the network’s output is from the original images, and it is actually made up of two different loss functions that combine into a “total” loss function. We want to minimize this loss, meaning we want the output to stay close to the original images, but we also don’t want to drive it to 0, because we want our new style transfer image to be different.

The two loss functions I mentioned before are for the two images we input: the content image, and the style image we want to apply to it. We then weight each loss (the weights are numbers we can change to tweak how much style or content comes through) and add them together to get our total loss function.
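
As a rough PyTorch sketch of how this might look: the mean-squared-error content loss and Gram-matrix style loss below follow the classic Gatys et al. formulation, and the weights alpha and beta are made-up defaults you would tune, not values from this article.

```python
import torch
import torch.nn.functional as F

def content_loss(gen_features, content_features):
    # How far the generated image's features are from the content image's.
    return F.mse_loss(gen_features, content_features)

def gram_matrix(features):
    # Correlations between feature channels, a common way to capture "style".
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def style_loss(gen_features, style_features):
    # How far the generated image's "style" is from the style image's.
    return F.mse_loss(gram_matrix(gen_features), gram_matrix(style_features))

def total_loss(gen_f, content_f, style_f, alpha=1.0, beta=1e3):
    # alpha and beta are the weights: raise beta for a more stylized result,
    # raise alpha to keep more of the original content.
    return alpha * content_loss(gen_f, content_f) + beta * style_loss(gen_f, style_f)
```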

I did kind of lie before, however, because I implied that WE do all the tweaking. We do pick the style and content weights ourselves, but the blending itself is done by the computer, with an optimizer. This optimizer will essentially try its best to lower the total loss function explained earlier by repeatedly adjusting the pixels of the generated image, and once the loss is low enough, we are done!
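
Here’s a minimal sketch of that optimization loop, reusing the total_loss from above. A single random conv layer stands in for a real pretrained network like VGG, and random tensors stand in for the loaded images, just so the example runs on its own:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained CNN (a real implementation would use e.g. VGG).
features = nn.Conv2d(3, 8, kernel_size=3, padding=1)
for p in features.parameters():
    p.requires_grad_(False)  # the network itself stays frozen

content_image = torch.rand(1, 3, 64, 64)  # stand-ins for the real images
style_image = torch.rand(1, 3, 64, 64)

# Start from a copy of the content image; the image's own pixels are
# the "parameters" the optimizer is allowed to tweak.
generated = content_image.clone().requires_grad_(True)
optimizer = torch.optim.Adam([generated], lr=0.02)

for step in range(200):
    optimizer.zero_grad()
    loss = total_loss(features(generated),
                      features(content_image),
                      features(style_image))
    loss.backward()   # work out how each pixel should change
    optimizer.step()  # nudge the pixels to lower the total loss
```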

Of course, we don’t actually feed an image into the neural network, we simply turn that image into data in an array, and then feed that into the neural network. Afterward, we have to turn the data the network spits out into our new styled image!

A visualization for how a network takes in an image of 784 pixels. (Credit to 3Blue1Brown)

Main Takeaways

  • Neural Style Transfer is a technique that applies the style of one image to another.
  • Neural Style Transfer uses a type of Neural Network called a CNN.
  • CNNs have convolutional layers that look at multiple pixels at the same time.
  • They also have pooling layers to lessen the load on the network.
  • In Neural Style Transfer, the network has a loss function which allows us to retain elements of both the style and content images.
  • After getting the loss function to closely represent both images, we successfully have applied the style of one image to another!

If you enjoyed reading this article or have any suggestions or questions, let me know by commenting or clapping! You can find me on LinkedIn or on my website for my latest work and updates, or reach out directly by email!


Ronit Taleti

I’m an avid 17-year-old blogger interested in new and emerging technologies like Artificial Intelligence, Blockchain, and Virtual/Augmented Reality.