In this assignment, you will learn about Neural Style Transfer. This algorithm was created by Gatys et al. (2015).
In this assignment, you will:
- Implement the neural style transfer algorithm
- Generate novel artistic images using your algorithm
The content image (C) shows the Louvre museum’s pyramid surrounded by old Paris buildings, against a sunny sky with a few clouds.
**3.1.1 – Make generated image G match the content of image C**
Shallower versus deeper layers
- The shallower layers of a ConvNet tend to detect lower-level features such as edges and simple textures.
- The deeper layers tend to detect higher-level features such as more complex textures as well as object classes.
Choose a “middle” activation layer \(a^{[l]}\)
We would like the “generated” image G to have similar content as the input image C. Suppose you have chosen some layer’s activations to represent the content of an image.
- In practice, you’ll get the most visually pleasing results if you choose a layer in the middle of the network – neither too shallow nor too deep.
- (After you have finished this exercise, feel free to come back and experiment with using different layers, to see how the results vary.)
Forward propagate image “C”
- Set the image C as the input to the pretrained VGG network, and run forward propagation.
- Let \(a^{(C)}\) be the hidden layer activations in the layer you had chosen. (In lecture, we had written this as \(a^{[l](C)}\), but here we’ll drop the superscript \([l]\) to simplify the notation.) This will be an \(n_H \times n_W \times n_C\) tensor.
Forward propagate image “G”
- Repeat this process with the image G: Set G as the input, and run forward propagation.
- Let \(a^{(G)}\) be the corresponding hidden layer activation.
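In code, this pattern looks roughly like the sketch below. This is a sketch, not the assignment’s exact cell: it assumes an open TF1 session `sess`, a dict-style pretrained VGG model whose `'input'` entry is an assignable variable (as in this assignment’s VGG loader), and `'conv4_2'` as the chosen middle layer.

```python
# Sketch: forward propagate C through the chosen layer, then reuse the same
# tensor for G. Assumes `model` maps layer names to tensors and model['input']
# is a tf.Variable holding the network input (an assumption about the loader).
sess.run(model['input'].assign(content_image))  # make C the network input
out = model['conv4_2']                          # a "middle" layer (our choice here)
a_C = sess.run(out)                             # concrete activations with C as input
a_G = out                                       # symbolic; evaluated later with G assigned
```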
Content Cost Function \(J_{content}(C,G)\)
We will define the content cost function as:
$$J_{content}(C,G) = \frac{1}{4 \times n_H \times n_W \times n_C}\sum _{ \text{all entries}} (a^{(C)} - a^{(G)})^2\tag{1} $$
- Here, \(n_H, n_W\) and \(n_C\) are the height, width and number of channels of the hidden layer you have chosen, and appear in a normalization term in the cost.
- For clarity, note that \(a^{(C)}\) and \(a^{(G)}\) are the 3D volumes corresponding to a hidden layer’s activations.
- In order to compute the cost \(J_{content}(C,G)\), it might also be convenient to unroll these 3D volumes into a 2D matrix, as shown below.
- Technically this unrolling step isn’t needed to compute \(J_{content}\), but it will be good practice for when you do need to carry out a similar operation later for computing the style cost \(J_{style}\).

Exercise: Compute the “content cost” using TensorFlow.
Instructions: The 3 steps to implement this function are:
1. Retrieve dimensions from `a_G`:
   - To retrieve dimensions from a tensor `X`, use: `X.get_shape().as_list()`
2. Unroll `a_C` and `a_G` as explained in the picture above.
   - You’ll likely want to use these functions: `tf.transpose` and `tf.reshape`.
3. Compute the content cost:
   - You’ll likely want to use these functions: `tf.reduce_sum`, `tf.square` and `tf.subtract`.
Additional Hints for “Unrolling”
- To unroll the tensor, we want the shape to change from \((m, n_H, n_W, n_C)\) to \((m, n_H \times n_W, n_C)\).
- `tf.reshape(tensor, shape)` takes a list of integers that represent the desired output shape.
- For the `shape` parameter, a `-1` tells the function to choose the correct dimension size so that the output tensor still contains all the values of the original tensor.
  - So `tf.reshape(a_C, shape=[m, n_H * n_W, n_C])` gives the same result as `tf.reshape(a_C, shape=[m, -1, n_C])`.
- If you prefer to re-order the dimensions, you can use `tf.transpose(tensor, perm)`, where `perm` is a list of integers containing the original indices of the dimensions.
  - For example, `tf.transpose(a_C, perm=[0,3,1,2])` changes the dimensions from \((m, n_H, n_W, n_C)\) to \((m, n_C, n_H, n_W)\).
- There is more than one way to unroll the tensors.
- Notice that it’s not necessary to use `tf.transpose` to “unroll” the tensors in this case, but it’s a useful function to practice and understand for other situations you’ll encounter.
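To make the reshape-then-transpose ordering concrete, here is a minimal NumPy sketch (the tiny 1×2×2×3 volume and its values are made up for illustration):

```python
import numpy as np

# A tiny activation volume: m=1, n_H=2, n_W=2, n_C=3 (values are arbitrary).
a = np.arange(12).reshape(1, 2, 2, 3)

# Reshape first, then transpose: each row of `unrolled` is one channel's
# activations across all n_H*n_W positions.
unrolled = np.reshape(a, (2 * 2, 3)).T   # shape (3, 4)
print(unrolled)
# [[ 0  3  6  9]
#  [ 1  4  7 10]
#  [ 2  5  8 11]]

# Reshaping directly to (n_C, n_H*n_W) mixes channels and positions:
print(np.reshape(a, (3, 2 * 2)))         # NOT the same matrix!
```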
The code is shown below:
# GRADED FUNCTION: compute_content_cost

def compute_content_cost(a_C, a_G):
    """
    Computes the content cost

    Arguments:
    a_C -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image C
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image G

    Returns:
    J_content -- scalar that you compute using equation 1 above.
    """
    ### START CODE HERE ###
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
    # Note: tensor.get_shape() returns a static, tuple-like shape; it can't be passed to
    # sess.run(), which only accepts operations and tensors.
    # tf.shape() returns a tensor; to get its value, you must evaluate it with sess.run().

    # Reshape a_C and a_G (≈2 lines)
    a_C_unrolled = tf.transpose(tf.reshape(a_C, [n_H * n_W, n_C]))
    a_G_unrolled = tf.transpose(tf.reshape(a_G, [n_H * n_W, n_C]))

    # Compute the cost with tensorflow (≈1 line)
    J_content = (1 / (4 * n_H * n_W * n_C)) * tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled, a_G_unrolled)))
    ### END CODE HERE ###

    return J_content
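A quick way to sanity-check the function is to feed it random tensors, in the style of the notebook’s test cells. This is a sketch assuming a TensorFlow 1.x environment; the seed and shapes are arbitrary.

```python
import tensorflow as tf

tf.reset_default_graph()
with tf.Session() as test:
    tf.set_random_seed(1)
    # Random stand-ins for the chosen layer's activations on C and G.
    a_C = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    a_G = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    J_content = compute_content_cost(a_C, a_G)
    print("J_content = " + str(test.run(J_content)))
```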
3.2.1 – Style matrix
Gram matrix
- The style matrix is also called a “Gram matrix.”
- In linear algebra, the Gram matrix G of a set of vectors \((v_{1},\dots ,v_{n})\) is the matrix of dot products, whose entries are \({\displaystyle G_{ij} = v_{i}^T v_{j} = np.dot(v_{i}, v_{j}) }\).
- In other words, \(G_{ij}\) compares how similar \(v_i\) is to \(v_j\): If they are highly similar, you would expect them to have a large dot product, and thus for \(G_{ij}\) to be large.
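For a concrete feel, here is a tiny NumPy sketch with made-up vectors, showing both the off-diagonal (similarity) and diagonal (prevalence) readings of the Gram matrix:

```python
import numpy as np

# Three made-up "filter activation" vectors, stacked as rows of A.
A = np.array([[1.0, 0.0, 2.0],
              [1.0, 0.0, 2.0],   # identical to row 0, so G[0,1] is large
              [0.0, 3.0, 0.0]])  # orthogonal to the others, so G[0,2] = 0

G = A.dot(A.T)   # G[i, j] = np.dot(A[i], A[j])
print(G)
# [[5. 5. 0.]
#  [5. 5. 0.]
#  [0. 0. 9.]]
# Diagonal entries G[i, i] = ||v_i||^2 measure how "active" each vector is.
```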
Two meanings of the variable G
- Note that there is an unfortunate collision in the variable names used here. We are following common terminology used in the literature.
- \(G\) is used to denote the Style matrix (or Gram matrix)
- \(G\) also denotes the generated image.
- For this assignment, we will use \(G_{gram}\) to refer to the Gram matrix, and \(G\) to denote the generated image.
Compute \(G_{gram}\)
In Neural Style Transfer (NST), you can compute the Style matrix by multiplying the “unrolled” filter matrix with its transpose:

\[G_{gram} = A_{unrolled} A_{unrolled}^T\]
\(G_{(gram)i,j}\): correlation
The result is a matrix of dimension \((n_C,n_C)\) where \(n_C\) is the number of filters (channels). The value \(G_{(gram)i,j}\) measures how similar the activations of filter i are to the activations of filter j.
\(G_{(gram)i,i}\): prevalence of patterns or textures
- The diagonal elements \(G_{(gram)ii}\) measure how “active” a filter i is.
- For example, suppose filter i is detecting vertical textures in the image. Then \(G_{(gram)ii}\) measures how common vertical textures are in the image as a whole.
- If \(G_{(gram)ii}\) is large, this means that the image has a lot of vertical texture.
By capturing the prevalence of different types of features (\(G_{(gram)ii}\)), as well as how much different features occur together (\(G_{(gram)ij}\)), the Style matrix \(G_{gram}\) measures the style of an image.
Exercise:
- Using TensorFlow, implement a function that computes the Gram matrix of a matrix A.
- The formula is: The gram matrix of A is \(G_A = AA^T\).
- You may use these functions: `tf.matmul` and `tf.transpose`.
Code:
# GRADED FUNCTION: gram_matrix

def gram_matrix(A):
    """
    Argument:
    A -- matrix of shape (n_C, n_H*n_W)

    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """
    ### START CODE HERE ### (≈1 line)
    GA = tf.matmul(A, tf.transpose(A))
    ### END CODE HERE ###

    return GA
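A sanity check in the same style as before (again a TF1-style sketch with arbitrary shapes):

```python
tf.reset_default_graph()
with tf.Session() as test:
    tf.set_random_seed(1)
    A = tf.random_normal([3, 2 * 1], mean=1, stddev=4)  # (n_C, n_H*n_W) = (3, 2)
    GA = gram_matrix(A)
    print("GA = " + str(test.run(GA)))  # a symmetric (3, 3) matrix
```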
3.2.2 – Style cost
Your goal will be to minimize the distance between the Gram matrix of the “style” image S and the Gram matrix of the “generated” image G.
- For now, we are using only a single hidden layer \(a^{[l]}\).
- The corresponding style cost for this layer is defined as:
$$J_{style}^{[l]}(S,G) = \frac{1}{4 \times {n_C}^2 \times (n_H \times n_W)^2} \sum_{i=1}^{n_C}\sum_{j=1}^{n_C}(G^{(S)}_{(gram)i,j} - G^{(G)}_{(gram)i,j})^2\tag{2} $$
- \(G_{gram}^{(S)}\): Gram matrix of the “style” image.
- \(G_{gram}^{(G)}\): Gram matrix of the “generated” image.
- Remember, this cost is computed using the hidden layer activations for a particular hidden layer in the network \(a^{[l]}\)
Exercise: Compute the style cost for a single layer.
Instructions: The 4 steps to implement this function are:
1. Retrieve dimensions from the hidden layer activations `a_G`:
   - To retrieve dimensions from a tensor `X`, use: `X.get_shape().as_list()`
2. Unroll the hidden layer activations `a_S` and `a_G` into 2D matrices, as explained in the picture above (see the images in the sections “computing the content cost” and “style matrix”).
   - You may use `tf.transpose` and `tf.reshape`.
3. Compute the Style matrix of the images S and G. (Use the function you had previously written.)
4. Compute the Style cost:
   - You may find `tf.reduce_sum`, `tf.square` and `tf.subtract` useful.
Additional Hints
- Since the activation dimensions are \((m, n_H, n_W, n_C)\) whereas the desired unrolled matrix shape is \((n_C, n_H \times n_W)\), the order of the filter dimension \(n_C\) is changed. So `tf.transpose` can be used to change the order of the dimensions.
- For the product \(\mathbf{G}_{gram} = \mathbf{A} \mathbf{A}^T\), you will also need to specify the `perm` parameter for the `tf.transpose` function.
# GRADED FUNCTION: compute_layer_style_cost

def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G

    Returns:
    J_style_layer -- tensor representing a scalar value, style cost defined above by equation (2)
    """
    ### START CODE HERE ###
    # Retrieve dimensions from a_G (≈1 line)
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Reshape the images to have them of shape (n_C, n_H*n_W) (≈2 lines)
    a_S = tf.transpose(tf.reshape(a_S, [n_H * n_W, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, [n_H * n_W, n_C]))

    # Computing gram_matrices for both images S and G (≈2 lines)
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)

    # Computing the loss (≈1 line)
    J_style_layer = 1. / (4 * n_C**2 * (n_H * n_W)**2) * tf.reduce_sum(tf.pow(GS - GG, 2))
    ### END CODE HERE ###

    return J_style_layer
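And a quick check, in the same TF1-style sketch as for the content cost:

```python
tf.reset_default_graph()
with tf.Session() as test:
    tf.set_random_seed(1)
    a_S = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    a_G = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    J_style_layer = compute_layer_style_cost(a_S, a_G)
    print("J_style_layer = " + str(test.run(J_style_layer)))
```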
You can combine the style costs for different layers as follows:
\[J_{style}(S,G) = \sum_{l} \lambda^{[l]} J^{[l]}_{style}(S,G)\]
where the values for \(\lambda^{[l]}\) are given in `STYLE_LAYERS`.
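For reference, `STYLE_LAYERS` pairs layer names with weights. The choice below is one common setup (equal weights over five VGG layers); treat the exact names and values as an assumption you can tune:

```python
# One possible weighting (an assumption; experiment with other layers/weights):
STYLE_LAYERS = [
    ('conv1_1', 0.2),
    ('conv2_1', 0.2),
    ('conv3_1', 0.2),
    ('conv4_1', 0.2),
    ('conv5_1', 0.2)]
```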
Exercise: compute style cost
- We’ve implemented a `compute_style_cost(...)` function.
- It calls your `compute_layer_style_cost(...)` several times, and weights their results using the values in `STYLE_LAYERS`.
- Please read over it to make sure you understand what it’s doing.
Description of compute_style_cost
For each layer:
- Select the activation (the output tensor) of the current layer.
- Get the style of the style image “S” from the current layer.
- Get the style of the generated image “G” from the current layer.
- Compute the “style cost” for the current layer.
- Add the weighted style cost to the overall style cost (`J_style`).
Once you’re done with the loop:
- Return the overall style cost.
def compute_style_cost(model, STYLE_LAYERS):
    """
    Computes the overall style cost from several chosen layers

    Arguments:
    model -- our tensorflow model
    STYLE_LAYERS -- A python list containing:
                        - the names of the layers we would like to extract style from
                        - a coefficient for each of them

    Returns:
    J_style -- tensor representing a scalar value, style cost defined above by equation (2)
    """
    # initialize the overall style cost
    J_style = 0

    for layer_name, coeff in STYLE_LAYERS:
        # Select the output tensor of the currently selected layer
        out = model[layer_name]

        # Set a_S to be the hidden layer activation from the layer we have selected, by running the session on out
        a_S = sess.run(out)

        # Set a_G to be the hidden layer activation from same layer. Here, a_G references model[layer_name]
        # and isn't evaluated yet. Later in the code, we'll assign the image G as the model input, so that
        # when we run the session, this will be the activations drawn from the appropriate layer, with G as input.
        a_G = out

        # Compute style_cost for the current layer
        J_style_layer = compute_layer_style_cost(a_S, a_G)

        # Add coeff * J_style_layer of this layer to overall style cost
        J_style += coeff * J_style_layer

    return J_style
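In the notebook this is used roughly as follows (a sketch; it assumes `sess` is an open TF1 session and a dict-style `model` as in the earlier sketches):

```python
# Assign the style image S as the network input, so that sess.run(out)
# inside compute_style_cost sees S's activations at each chosen layer.
sess.run(model['input'].assign(style_image))
J_style = compute_style_cost(model, STYLE_LAYERS)
```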
What you should remember
- The style of an image can be represented using the Gram matrix of a hidden layer’s activations.
- We get even better results by combining this representation from multiple different layers.
- This is in contrast to the content representation, where usually using just a single hidden layer is sufficient.
- Minimizing the style cost will cause the image \(G\) to follow the style of the image \(S\).
3.3 – Defining the total cost to optimize
Finally, let’s create a cost function that minimizes both the style and the content cost. The formula is:
$$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$$
Exercise: Implement the total cost function which includes both the content cost and the style cost.
# GRADED FUNCTION: total_cost

def total_cost(J_content, J_style, alpha = 10, beta = 40):
    """
    Computes the total cost function

    Arguments:
    J_content -- content cost coded above
    J_style -- style cost coded above
    alpha -- hyperparameter weighting the importance of the content cost
    beta -- hyperparameter weighting the importance of the style cost

    Returns:
    J -- total cost as defined by the formula above.
    """
    ### START CODE HERE ### (≈1 line)
    J = alpha * J_content + beta * J_style
    ### END CODE HERE ###

    return J
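Since `total_cost` is plain arithmetic, you can check it with ordinary numbers (made-up values):

```python
# With the default weights alpha=10, beta=40:
J = total_cost(J_content=0.2, J_style=0.1)
print(J)  # 10*0.2 + 40*0.1 = 6.0
```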