
Face Recognition

1.2 – The Triplet Loss

For an image x, we denote its encoding f(x), where f is the function computed by the neural network.

Supplement: Triplet loss is commonly used for individual-level classification (e.g., face identification, vehicle re-identification), where we want to tell apart instances within a class rather than just the classes themselves. The trade-off is that training usually takes longer to converge.

Training will use triplets of images (A, P, N):

  • A is an “Anchor” image: a picture of a person.
  • P is a “Positive” image: a picture of the same person as the Anchor image.
  • N is a “Negative” image: a picture of a different person than the Anchor image.

These triplets are picked from our training dataset. We will write \((A^{(i)}, P^{(i)}, N^{(i)})\) to denote the i-th training example.

You’d like to make sure that an image \(A^{(i)}\) of an individual is closer to the Positive image \(P^{(i)}\) than to the Negative image \(N^{(i)}\) by at least a margin \(\alpha\):

$$\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 + \alpha < \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2$$
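
To make the constraint concrete, here is a minimal plain-Python sketch that checks it for a single triplet. The helper names `squared_l2` and `satisfies_margin` and the 2-D toy encodings are made up for illustration; they are not part of the assignment.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def satisfies_margin(f_a, f_p, f_n, alpha=0.2):
    """True when ||f(A) - f(P)||^2 + alpha < ||f(A) - f(N)||^2."""
    return squared_l2(f_a, f_p) + alpha < squared_l2(f_a, f_n)


# Toy 2-D encodings (hypothetical; the real encodings have 128 values).
f_a = [0.0, 0.0]
f_p = [0.1, 0.0]  # squared distance to the anchor is about 0.01

# Negative far from the anchor: about 0.21 < 1.0, constraint holds.
easy = satisfies_margin(f_a, f_p, [1.0, 0.0])

# Negative close to the anchor: about 0.21 < 0.09 is false, constraint violated.
hard = satisfies_margin(f_a, f_p, [0.3, 0.0])
```

Triplets like the second one, where the constraint is violated, are the ones that contribute to the cost.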

You would thus like to minimize the following “triplet cost”:

$$\mathcal{J} = \sum^{m}_{i=1} \Big[ \underbrace{\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2}_{\text{(1)}} - \underbrace{\lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2}_{\text{(2)}} + \alpha \Big]_+ \tag{3}$$

Here, we are using the notation “\([z]_+\)” to denote max(z,0).


  • The term (1) is the squared distance between the anchor “A” and the positive “P” for a given triplet; you want this to be small.
  • The term (2) is the squared distance between the anchor “A” and the negative “N” for a given triplet; you want this to be relatively large. It has a minus sign preceding it because minimizing the negative of the term is the same as maximizing that term.
  • \(\alpha\) is called the margin. It is a hyperparameter that you pick manually. We will use \(\alpha = 0.2\).
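
As a small worked example (the distances are made up for illustration), take \(\alpha = 0.2\) and two triplets whose squared distances (term (1), term (2)) are (0.5, 0.9) and (0.6, 0.7):

$$[0.5 - 0.9 + 0.2]_+ = [-0.2]_+ = 0, \qquad [0.6 - 0.7 + 0.2]_+ = [0.1]_+ = 0.1$$

$$\mathcal{J} = 0 + 0.1 = 0.1$$

The first triplet already satisfies the margin and contributes nothing; only the second one, where the negative is not yet far enough away, adds to the cost.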

Most implementations also rescale the encoding vectors to have an L2 norm equal to one (i.e., \(\lVert f(img) \rVert_2 = 1\)); you won’t have to worry about that in this assignment.
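
For reference, the rescaling itself is simple. The following is a plain-Python sketch, not the assignment’s code; the function name `l2_normalize` and the `eps` guard are assumptions made for this illustration (TensorFlow offers `tf.math.l2_normalize` for the same purpose).

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale v to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


# The norm of [3, 4] is 5, so the rescaled vector is [0.6, 0.8].
unit = l2_normalize([3.0, 4.0])
unit_norm = math.sqrt(sum(x * x for x in unit))
```

With unit-norm encodings, every squared distance lies between 0 and 4, which gives a fixed scale against which a margin such as \(\alpha = 0.2\) can be judged.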

Exercise: Implement the triplet loss as defined by formula (3). Here are the 4 steps:

  1. Compute the distance between the encodings of “anchor” and “positive”: \(\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2\)
  2. Compute the distance between the encodings of “anchor” and “negative”: \(\lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2\)
  3. Compute the formula per training example: \(\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 - \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2 + \alpha\)
  4. Compute the full formula by taking the max with zero and summing over the training examples:
    $$\mathcal{J} = \sum^{m}_{i=1} \Big[ \lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 - \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2 + \alpha \Big]_+ \tag{3}$$


  • Useful functions: tf.reduce_sum(), tf.square(), tf.subtract(), tf.add(), tf.maximum().
  • For steps 1 and 2, you will sum over the entries of \(\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2\) and \(\lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2\).
  • For step 4 you will sum over the training examples.

Additional Hints

  • Recall that the square of the L2 norm is the sum of the squared differences: \(\lVert x - y \rVert_{2}^{2} = \sum_{i=1}^{N}(x_{i} - y_{i})^{2}\)
  • Note that the anchor, positive and negative encodings are of shape (m,128), where m is the number of training examples and 128 is the number of elements used to encode a single example.
  • For steps 1 and 2, you will keep the m training examples separate and sum along the 128 values of each encoding.
  • tf.reduce_sum has an axis parameter. This chooses along which axis the sums are applied.
  • Note that one way to choose the last axis in a tensor is to use negative indexing (axis=-1).
  • In step 4, when summing over training examples, the result will be a single scalar value.
  • For tf.reduce_sum to sum across all axes, keep the default value axis=None.
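
The two kinds of sum in these hints can be mimicked with nested Python lists. This is only an illustration of the shapes involved, not TensorFlow code; the toy values and the variable names are made up.

```python
# A toy batch of m = 2 rows with 3 values each (the real shape is (m, 128)).
sq_diff = [
    [1.0, 4.0, 0.0],
    [0.0, 1.0, 1.0],
]

# Like tf.reduce_sum(sq_diff, axis=-1): one sum per row, so the result has length m.
per_example = [sum(row) for row in sq_diff]

# Like tf.reduce_sum(sq_diff) with the default axis=None: a single scalar.
total = sum(per_example)
```

Steps 1 and 2 need the per-row sums (one distance per training example); step 4 needs the single scalar.
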
# GRADED FUNCTION: triplet_loss

import tensorflow as tf

def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implementation of the triplet loss as defined by formula (3)

    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)

    Returns:
    loss -- real number, value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    ### START CODE HERE ### (≈ 4 lines)
    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis = -1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis = -1)
    # Step 3: Subtract the two previous distances and add alpha.
    basic_loss = pos_dist - neg_dist + alpha
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    ### END CODE HERE ###
    return loss
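
To sanity-check the TensorFlow function on small inputs, a plain-Python reference can be handy. The sketch below is not part of the assignment: `triplet_loss_reference` is a hypothetical name, and it takes the three batches as separate lists of lists rather than as `y_pred`.

```python
def triplet_loss_reference(anchor, positive, negative, alpha=0.2):
    """Plain-Python triplet loss over a batch of encodings (lists of lists)."""
    loss = 0.0
    for a, p, n in zip(anchor, positive, negative):
        # Steps 1 and 2: squared L2 distances for this training example.
        pos_dist = sum((x - y) ** 2 for x, y in zip(a, p))
        neg_dist = sum((x - y) ** 2 for x, y in zip(a, n))
        # Steps 3 and 4: add the margin, clip at zero, accumulate over examples.
        loss += max(pos_dist - neg_dist + alpha, 0.0)
    return loss


# Two toy 2-D triplets (made-up values; real encodings have 128 entries).
anchor = [[0.0, 0.0], [0.0, 0.0]]
positive = [[1.0, 0.0], [0.0, 1.0]]  # squared distances to anchor: 1.0 and 1.0
negative = [[2.0, 0.0], [1.0, 0.0]]  # squared distances to anchor: 4.0 and 1.0

# Example 1: max(1 - 4 + 0.2, 0) = 0; example 2: max(1 - 1 + 0.2, 0) = 0.2
loss = triplet_loss_reference(anchor, positive, negative)
```

Feeding the same values to `triplet_loss` as tensors should give the same total, up to floating-point precision.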

I generated some images to test the algorithm. The figure below shows a failed case.

Here is a successful case.

Generally speaking, the robustness of the algorithm still needs to be improved. I will read the paper to better understand why it does not work very well.


