Deep Learning & Art: Neural Style Transfer

In this assignment, you will learn about Neural Style Transfer. This algorithm was created by Gatys et al. (2015).

In this assignment, you will:

  • Implement the neural style transfer algorithm
  • Generate novel artistic images using your algorithm

The content image (C) shows the Louvre museum’s pyramid surrounded by old Paris buildings, against a sunny sky with a few clouds.

3.1.1 – Make generated image G match the content of image C

Shallower versus deeper layers

  • The shallower layers of a ConvNet tend to detect lower-level features such as edges and simple textures.
  • The deeper layers tend to detect higher-level features such as more complex textures as well as object classes.

Choose a “middle” activation layer \(a^{[l]}\)

We would like the “generated” image G to have content similar to that of the input image C. Suppose you have chosen some layer’s activations to represent the content of an image.

  • In practice, you’ll get the most visually pleasing results if you choose a layer in the middle of the network, neither too shallow nor too deep.
  • (After you have finished this exercise, feel free to come back and experiment with using different layers, to see how the results vary.)

Forward propagate image “C”

  • Set the image C as the input to the pretrained VGG network, and run forward propagation.
  • Let \(a^{(C)}\) be the hidden layer activations in the layer you have chosen. (In lecture, we wrote this as \(a^{[l](C)}\), but here we’ll drop the superscript \([l]\) to simplify the notation.) This will be an \(n_H \times n_W \times n_C\) tensor.

Forward propagate image “G”

  • Repeat this process with the image G: set G as the input, and run forward propagation.
  • Let \(a^{(G)}\) be the corresponding hidden layer activation.

Content Cost Function \(J_{content}(C,G)\)

We will define the content cost function as:

$$J_{content}(C,G) = \frac{1}{4 \times n_H \times n_W \times n_C}\sum _{ \text{all entries}} (a^{(C)} - a^{(G)})^2\tag{1} $$

  • Here, \(n_H, n_W\) and \(n_C\) are the height, width and number of channels of the hidden layer you have chosen, and appear in a normalization term in the cost.
  • For clarity, note that \(a^{(C)}\) and \(a^{(G)}\) are the 3D volumes corresponding to a hidden layer’s activations.
  • To compute the cost \(J_{content}(C,G)\), it may also be convenient to unroll these 3D volumes into a 2D matrix.
  • Technically this unrolling step isn’t needed to compute \(J_{content}\), but it will be good practice for when you do need to carry out a similar operation later for computing the style cost \(J_{style}\).
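To check equation (1) without loading VGG, here is a minimal NumPy sketch. NumPy stands in for TensorFlow. Random arrays stand in for the real activations \(a^{(C)}\) and \(a^{(G)}\). The names content_cost and content_cost_unrolled are illustrative and are not part of the assignment. The sketch also shows why unrolling is optional: reshaping only regroups the entries, so the sum of squared differences stays the same.

```python
import numpy as np  # NumPy stand-in for the TensorFlow ops used in the assignment


def content_cost(a_C, a_G):
    """Equation (1) on the raw (1, n_H, n_W, n_C) volumes."""
    m, n_H, n_W, n_C = a_G.shape
    return np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C)


def content_cost_unrolled(a_C, a_G):
    """The same cost after unrolling each volume to (m, n_H * n_W, n_C)."""
    m, n_H, n_W, n_C = a_G.shape
    a_C_unrolled = a_C.reshape(m, -1, n_C)
    a_G_unrolled = a_G.reshape(m, -1, n_C)
    return np.sum((a_C_unrolled - a_G_unrolled) ** 2) / (4 * n_H * n_W * n_C)


# Random stand-ins for real VGG activations (an assumption for illustration).
rng = np.random.default_rng(0)
a_C = rng.normal(size=(1, 4, 4, 3))
a_G = rng.normal(size=(1, 4, 4, 3))

print(np.isclose(content_cost(a_C, a_G), content_cost_unrolled(a_C, a_G)))  # True
print(content_cost(a_C, a_C))  # 0.0: identical activations cost nothing

# Hand-checkable case: 4 entries each differing by 1, divided by 4 * 2 * 2 * 1 = 16.
print(content_cost(np.ones((1, 2, 2, 1)), np.zeros((1, 2, 2, 1))))  # 0.25
```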

Exercise: Compute the “content cost” using TensorFlow.

Instructions: The 3 steps to implement this function are:

  1. Retrieve dimensions from a_G:
    • To retrieve dimensions from a tensor X, use: X.get_shape().as_list()
  2. Unroll a_C and a_G into 2D matrices (see the hints on unrolling below).
  3. Compute the content cost using equation (1).

Additional Hints for “Unrolling”

  • To unroll the tensor, we want the shape to change from \((m, n_H, n_W, n_C)\) to \((m, n_H \times n_W, n_C)\).
  • tf.reshape(tensor, shape) takes a list of integers that represent the desired output shape.
  • For the shape parameter, a -1 tells the function to choose the correct dimension size so that the output tensor still contains all the values of the original tensor.
  • So tf.reshape(a_C, shape=[m, n_H * n_W, n_C]) gives the same result as tf.reshape(a_C, shape=[m, -1, n_C]).
  • If you prefer to re-order the dimensions, you can use tf.transpose(tensor, perm), where perm is a list of integers containing the original index of the dimensions.
  • For example, tf.transpose(a_C, perm=[0,3,1,2]) changes the dimensions from \((m, n_H, n_W, n_C)\) to \((m, n_C, n_H, n_W)\).
  • There is more than one way to unroll the tensors.
  • Notice that you don’t need tf.transpose to ‘unroll’ the tensors in this case. It is still a useful function to practice and understand, because you will need it in other situations.
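NumPy’s reshape and transpose follow the same conventions as the tf.reshape and tf.transpose calls described above:

  • values are laid out in row-major order;
  • a -1 entry is inferred from the remaining size;
  • the axes argument plays the role of perm.

This means you can sketch the shape changes without TensorFlow. The sizes below are arbitrary.

```python
import numpy as np  # NumPy stand-in for tf.reshape / tf.transpose

m, n_H, n_W, n_C = 1, 4, 5, 3
a_C = np.arange(m * n_H * n_W * n_C, dtype=float).reshape(m, n_H, n_W, n_C)

# A -1 lets reshape infer the size that keeps every value: here n_H * n_W = 20.
explicit = a_C.reshape(m, n_H * n_W, n_C)
inferred = a_C.reshape(m, -1, n_C)
print(explicit.shape, np.array_equal(explicit, inferred))  # (1, 20, 3) True

# axes (perm in TensorFlow): output dimension i is input dimension axes[i].
channels_first = np.transpose(a_C, axes=[0, 3, 1, 2])
print(channels_first.shape)  # (1, 3, 4, 5)
```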

The code is shown below:

import tensorflow as tf

# GRADED FUNCTION: compute_content_cost

def compute_content_cost(a_C, a_G):
    """
    Computes the content cost

    Arguments:
    a_C -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image C
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image G

    Returns:
    J_content -- scalar that you compute using equation 1 above.
    """
    ### START CODE HERE ###
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
    # Reshape a_C and a_G to (m, n_H*n_W, n_C); -1 lets TensorFlow infer n_H*n_W (≈2 lines)
    a_C_unrolled = tf.reshape(a_C, shape=[m, -1, n_C])
    a_G_unrolled = tf.reshape(a_G, shape=[m, -1, n_C])
    # compute the cost with tensorflow, following equation (1) (≈1 line)
    J_content = (1 / (4 * n_H * n_W * n_C)) * tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled, a_G_unrolled)))
    ### END CODE HERE ###
    return J_content

3.2.1 – Style matrix

Gram matrix

  • The style matrix is also called a “Gram matrix.”
  • In linear algebra, the Gram matrix G of a set of vectors \((v_{1},\dots ,v_{n})\) is the matrix of dot products, whose entries are \(G_{ij} = v_{i}^T v_{j}\), i.e. np.dot(v_i, v_j).
  • In other words, \(G_{ij}\) compares how similar \(v_i\) is to \(v_j\): If they are highly similar, you would expect them to have a large dot product, and thus for \(G_{ij}\) to be large.
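Here is a small worked example with made-up vectors. Stack \(v_1, v_2, v_3\) as the rows of a matrix V. The product \(VV^T\) then holds every pairwise dot product:

  • parallel vectors give a large entry;
  • orthogonal vectors give zero.

Dot products also grow with vector length, which is why \(v_2\) has a larger diagonal entry than \(v_1\).

```python
import numpy as np

# Made-up vectors v_1, v_2, v_3 stacked as the rows of V.
V = np.array([
    [1.0, 0.0, 2.0],  # v_1
    [2.0, 0.0, 4.0],  # v_2 = 2 * v_1 (same direction)
    [0.0, 3.0, 0.0],  # v_3, orthogonal to v_1 and v_2
])

G = V @ V.T  # G[i, j] = np.dot(V[i], V[j])
# G == [[ 5, 10, 0],
#       [10, 20, 0],
#       [ 0,  0, 9]]
print(G)
```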

Two meanings of the variable G

  • Note that there is an unfortunate collision in the variable names used here. We are following common terminology used in the literature.
  • \(G\) is used to denote the Style matrix (or Gram matrix)
  • \(G\) also denotes the generated image.
  • For this assignment, we will use \(G_{gram}\) to refer to the Gram matrix, and \(G\) to denote the generated image.

Compute \(G_{gram}\)

In Neural Style Transfer (NST), you can compute the Style matrix by multiplying the “unrolled” filter matrix with its transpose:

\[G_{gram} = A_{unrolled} A_{unrolled}^T\]

\(G_{(gram)i,j}\): correlation

The result is a matrix of dimension \((n_C,n_C)\) where \(n_C\) is the number of filters (channels). The value \(G_{(gram)i,j}\) measures how similar the activations of filter i are to the activations of filter j.

\(G_{(gram)i,i}\): prevalence of patterns or textures

  • The diagonal elements \(G_{(gram)ii}\) measure how “active” a filter i is.
  • For example, suppose filter i is detecting vertical textures in the image. Then \(G_{(gram)ii}\) measures how common vertical textures are in the image as a whole.
  • If \(G_{(gram)ii}\) is large, this means that the image has a lot of vertical texture.

By capturing the prevalence of different types of features (\(G_{(gram)ii}\)), as well as how much different features occur together (\(G_{(gram)ij}\)), the Style matrix \(G_{gram}\) measures the style of an image.
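The following NumPy sketch uses random non-negative values as stand-ins for post-ReLU activations; these are not real VGG outputs. It does three things:

  • unrolls a volume to \((n_C, n_H \times n_W)\);
  • forms \(G_{gram} = A_{unrolled} A_{unrolled}^T\);
  • checks two of the properties described above.

The first property is that the result has shape \((n_C, n_C)\). The second is that each diagonal entry is the sum of squared activations of one filter. So a filter that fires strongly everywhere dominates the diagonal. The helper name unroll is hypothetical.

```python
import numpy as np  # NumPy stand-in for the TensorFlow ops used in the assignment


def unroll(a):
    """Hypothetical helper: (1, n_H, n_W, n_C) -> (n_C, n_H * n_W), one row per filter."""
    m, n_H, n_W, n_C = a.shape
    return a.reshape(n_H * n_W, n_C).T


rng = np.random.default_rng(1)
a = rng.random((1, 4, 4, 3))  # values in [0, 1), like non-negative post-ReLU activations
a[..., 0] += 2.0              # filter 0 is strongly active at every position

A = unroll(a)
G_gram = A @ A.T
print(G_gram.shape)  # (3, 3)
print(np.allclose(np.diag(G_gram), (A ** 2).sum(axis=1)))  # True: diagonal = per-filter sum of squares
print(np.argmax(np.diag(G_gram)))  # 0: the most prevalent filter has the largest diagonal entry
```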


Exercise: Compute the Gram matrix.

  • Using TensorFlow, implement a function that computes the Gram matrix of a matrix A.
  • The formula is: the Gram matrix of A is \(G_A = AA^T\).
  • You may use these functions: matmul and transpose.


# GRADED FUNCTION: gram_matrix

def gram_matrix(A):
    """
    Argument:
    A -- matrix of shape (n_C, n_H*n_W)

    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """
    ### START CODE HERE ### (≈1 line)
    GA = tf.matmul(A, tf.transpose(A))
    ### END CODE HERE ###
    return GA

3.2.2 – Style cost

Your goal will be to minimize the distance between the Gram matrix of the “style” image S and the Gram matrix of the “generated” image G.

  • For now, we are using only a single hidden layer \(a^{[l]}\).
  • The corresponding style cost for this layer is defined as:

$$J_{style}^{[l]}(S,G) = \frac{1}{4 \times {n_C}^2 \times (n_H \times n_W)^2} \sum_{i=1}^{n_C}\sum_{j=1}^{n_C}\left(G^{(S)}_{(gram)i,j} - G^{(G)}_{(gram)i,j}\right)^2\tag{2} $$

  • \(G_{gram}^{(S)}\): Gram matrix of the “style” image.
  • \(G_{gram}^{(G)}\): Gram matrix of the “generated” image.
  • Remember, this cost is computed using the hidden layer activations for a particular hidden layer in the network, \(a^{[l]}\).
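Here is a NumPy sketch of equation (2) for one layer, assuming \(m = 1\) as in the assignment. The name layer_style_cost is illustrative, not the graded function. The second check shows why the Gram matrix captures style rather than content. Rearranging the spatial positions of the activations leaves every Gram entry unchanged. The style cost is therefore zero, even though the content cost from equation (1) would generally not be.

```python
import numpy as np  # NumPy stand-in for the TensorFlow ops used in the assignment


def layer_style_cost(a_S, a_G):
    """Equation (2) for one layer; a_S and a_G have shape (1, n_H, n_W, n_C)."""
    m, n_H, n_W, n_C = a_G.shape
    # Unroll to (n_C, n_H * n_W): reshape to (positions, channels), then transpose.
    A_S = a_S.reshape(n_H * n_W, n_C).T
    A_G = a_G.reshape(n_H * n_W, n_C).T
    GS = A_S @ A_S.T  # Gram matrix of the style image, shape (n_C, n_C)
    GG = A_G @ A_G.T  # Gram matrix of the generated image, shape (n_C, n_C)
    return np.sum((GS - GG) ** 2) / (4 * n_C ** 2 * (n_H * n_W) ** 2)


# Hand-checkable case: n_H * n_W = 4, n_C = 1, so GS = [[4]], GG = [[0]] and J = 16 / 64.
print(layer_style_cost(np.ones((1, 2, 2, 1)), np.zeros((1, 2, 2, 1))))  # 0.25

# Same activations with the spatial positions flipped: every Gram entry is unchanged.
rng = np.random.default_rng(2)
a_S = rng.random((1, 4, 4, 3))  # random stand-in for real activations
a_G_flipped = a_S[:, ::-1, ::-1, :]
print(np.isclose(layer_style_cost(a_S, a_G_flipped), 0.0))  # True
```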

Exercise: Compute the style cost for a single layer.

Instructions: The 4 steps to implement this function are:

  1. Retrieve dimensions from the hidden layer activations a_G:
    • To retrieve dimensions from a tensor X, use: X.get_shape().as_list()
  2. Unroll the hidden layer activations a_S and a_G into 2D matrices of shape \((n_C, n_H \times n_W)\), as explained in the sections “computing the content cost” and “style matrix”.
  3. Compute the Style matrix of the images S and G. (Use the function you previously wrote.)
  4. Compute the Style cost using equation (2).

Additional Hints

  • The activation dimensions are \((m, n_H, n_W, n_C)\), whereas the desired unrolled matrix shape is \((n_C, n_H \times n_W)\). The filter dimension \(n_C\) therefore moves from last to first, so tf.transpose can be used to reorder it.
  • For the product \(\mathbf{G}_{gram} = \mathbf{A}_{unrolled} \mathbf{A}_{unrolled}^T\), you will also need to specify the perm parameter of tf.transpose if you keep the batch dimension m. In that case the unrolled tensor is 3D, and without perm tf.transpose reverses all of its dimensions.

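The unrolling step has a common pitfall, sketched here in NumPy. NumPy’s reshape and transpose use the same row-major order and default axis reversal as tf.reshape and tf.transpose. Reshaping straight to \((n_C, n_H \times n_W)\) produces the right shape, but each row mixes channels. Reshaping to \((n_H \times n_W, n_C)\) and then transposing keeps one filter per row. The last lines show why perm matters once the batch dimension is kept.

```python
import numpy as np  # NumPy stand-in for tf.reshape / tf.transpose

n_H, n_W, n_C = 2, 2, 3
# a[0, h, w, c] = n_C * (h * n_W + w) + c, so each value encodes its position and channel.
a = np.arange(n_H * n_W * n_C).reshape(1, n_H, n_W, n_C)

correct = a.reshape(n_H * n_W, n_C).T  # row c holds channel c at every position
wrong = a.reshape(n_C, n_H * n_W)      # same shape, but each row mixes channels

print(correct[0])  # [0 3 6 9]: channel 0 at positions 0..3
print(wrong[0])    # [0 1 2 3]: channels 0, 1, 2 of position 0, then channel 0 of position 1

# Keeping the batch axis: (m, n_H*n_W, n_C) -> (m, n_C, n_H*n_W) needs explicit axes (perm in TF),
# because a transpose without them reverses all axes.
a3 = a.reshape(1, -1, n_C)
print(np.transpose(a3).shape)                  # (3, 4, 1)
print(np.transpose(a3, axes=[0, 2, 1]).shape)  # (1, 3, 4)
```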
# GRADED FUNCTION: compute_layer_style_cost

def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G

    Returns:
    J_style_layer -- tensor representing a scalar value, style cost defined above by equation (2)
    """
    ### START CODE HERE ###
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
    # Reshape the images to have them of shape (n_C, n_H*n_W) (≈2 lines)
    # Reshape to (n_H*n_W, n_C) first, then transpose; this assumes m = 1.
    a_S = tf.transpose(tf.reshape(a_S, [n_H * n_W, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, [n_H * n_W, n_C]))

    # Computing gram_matrices for both images S and G (≈2 lines)
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)

    # Computing the loss (≈1 line)
    J_style_layer = 1./(4*n_C**2 *(n_H*n_W)**2)*tf.reduce_sum(tf.pow((GS - GG), 2))
    ### END CODE HERE ###
    return J_style_layer

You can combine the style costs for different layers as follows:

\[J_{style}(S,G) = \sum_{l} \lambda^{[l]} J^{[l]}_{style}(S,G)\]

where the values for \(\lambda^{[l]}\) are given in STYLE_LAYERS.
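Once the per-layer costs are known, the weighted sum above is plain arithmetic. The sketch below assumes STYLE_LAYERS is a list of (layer_name, coefficient) pairs, which is the form the loop in compute_style_cost unpacks. Everything else is illustrative:

  • the VGG layer names and the equal weights of 0.2;
  • the per-layer cost values, which are made up;
  • combine_style_costs, a hypothetical helper that mirrors the accumulation loop.

```python
# Illustrative STYLE_LAYERS: (layer_name, coefficient lambda^[l]) pairs.
# The names and equal 0.2 weights are assumptions for this sketch.
STYLE_LAYERS = [
    ("conv1_1", 0.2),
    ("conv2_1", 0.2),
    ("conv3_1", 0.2),
    ("conv4_1", 0.2),
    ("conv5_1", 0.2),
]

# Made-up per-layer style costs J_style^[l](S, G).
layer_costs = {"conv1_1": 1.0, "conv2_1": 2.0, "conv3_1": 3.0, "conv4_1": 4.0, "conv5_1": 5.0}


def combine_style_costs(style_layers, layer_costs):
    """Hypothetical helper: J_style = sum over l of lambda^[l] * J_style^[l]."""
    J_style = 0.0
    for layer_name, coeff in style_layers:
        J_style += coeff * layer_costs[layer_name]
    return J_style


print(combine_style_costs(STYLE_LAYERS, layer_costs))  # ≈ 3.0, i.e. 0.2 * (1 + 2 + 3 + 4 + 5)
```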

Exercise: compute style cost

  • We’ve implemented a compute_style_cost(…) function.
  • It calls your compute_layer_style_cost(...) several times, and weights their results using the values in STYLE_LAYERS.
  • Please read over it to make sure you understand what it’s doing.

Description of compute_style_cost

For each layer:

  • Select the activation (the output tensor) of the current layer.
  • Get the style of the style image “S” from the current layer.
  • Get the style of the generated image “G” from the current layer.
  • Compute the “style cost” for the current layer.
  • Add the weighted style cost to the overall style cost (J_style).

Once you’re done with the loop:

  • Return the overall style cost.

def compute_style_cost(model, STYLE_LAYERS):
    """
    Computes the overall style cost from several chosen layers

    Arguments:
    model -- our tensorflow model
    STYLE_LAYERS -- A python list containing:
                        - the names of the layers we would like to extract style from
                        - a coefficient for each of them

    Returns:
    J_style -- tensor representing a scalar value, the weighted sum of the per-layer style costs (each from equation (2))
    """
    # initialize the overall style cost
    J_style = 0

    for layer_name, coeff in STYLE_LAYERS:

        # Select the output tensor of the currently selected layer
        out = model[layer_name]

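        # Set a_S to be the hidden layer activation from the layer we have selected, by running the session on out.
        # This relies on a global TensorFlow session `sess` in which the style image S has already
        # been assigned as the model input.
        a_S = sess.run(out)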

        # Set a_G to be the hidden layer activation from same layer. Here, a_G references model[layer_name] 
        # and isn't evaluated yet. Later in the code, we'll assign the image G as the model input, so that
        # when we run the session, this will be the activations drawn from the appropriate layer, with G as input.
        a_G = out
        # Compute style_cost for the current layer
        J_style_layer = compute_layer_style_cost(a_S, a_G)

        # Add coeff * J_style_layer of this layer to overall style cost
        J_style += coeff * J_style_layer

    return J_style

What you should remember

  • The style of an image can be represented using the Gram matrix of a hidden layer’s activations.
  • We get even better results by combining this representation from multiple different layers.
  • This is in contrast to the content representation, where usually using just a single hidden layer is sufficient.
  • Minimizing the style cost will cause the image \(G\) to follow the style of the image S.

3.3 – Defining the total cost to optimize

Finally, let’s create a cost function that minimizes both the style and the content cost. The formula is:

$$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$$
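This small arithmetic sketch shows how \(\alpha\) and \(\beta\) set the trade-off. The two candidate images and their cost values are made up, and J_total is an illustrative name. The defaults \(\alpha = 10\) and \(\beta = 40\) match the total_cost function in this assignment. With the default weights, the candidate that matches the style better has the lower total cost. Raising \(\alpha\) to 100 flips the ranking toward the candidate that preserves the content better.

```python
def J_total(J_content, J_style, alpha=10, beta=40):
    """J(G) = alpha * J_content(C, G) + beta * J_style(S, G)."""
    return alpha * J_content + beta * J_style


# Made-up costs for two candidate generated images: (J_content, J_style).
candidates = {
    "closer in content": (1.0, 3.0),
    "closer in style": (4.0, 1.0),
}

for alpha, beta in [(10, 40), (100, 40)]:
    totals = {name: J_total(jc, js, alpha, beta) for name, (jc, js) in candidates.items()}
    best = min(totals, key=totals.get)  # the candidate the optimizer would prefer
    print(f"alpha={alpha}, beta={beta}: {totals} -> lowest: {best}")
```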

Exercise: Implement the total cost function which includes both the content cost and the style cost.

# GRADED FUNCTION: total_cost

def total_cost(J_content, J_style, alpha = 10, beta = 40):
    """
    Computes the total cost function

    Arguments:
    J_content -- content cost coded above
    J_style -- style cost coded above
    alpha -- hyperparameter weighting the importance of the content cost
    beta -- hyperparameter weighting the importance of the style cost

    Returns:
    J -- total cost as defined by the formula above.
    """
    ### START CODE HERE ### (≈1 line)
    J = alpha * J_content + beta*J_style
    ### END CODE HERE ###
    return J

