### 1.2 – The Triplet Loss

For an image x, we denote its encoding f(x), where f is the function computed by the neural network.

**Supplement**: Triplet loss is commonly used for instance-level recognition (e.g., face identification, vehicle re-identification), where we want precision finer than ordinary class labels. The trade-off is that convergence typically takes longer.

Training will use triplets of images (A, P, N):

- A is an “Anchor” image–a picture of a person.
- P is a “Positive” image–a picture of the same person as the Anchor image.
- N is a “Negative” image–a picture of a different person than the Anchor image.

These triplets are picked from our training dataset. We will write \((A^{(i)}, P^{(i)}, N^{(i)})\) to denote the i-th training example.

You’d like to make sure that an image \(A^{(i)}\) of an individual is closer to the Positive \(P^{(i)}\) than to the Negative image \(N^{(i)}\) by at least a margin \(\alpha\):

$$\| f(A^{(i)}) - f(P^{(i)}) \|_2^2 + \alpha < \| f(A^{(i)}) - f(N^{(i)}) \|_2^2$$
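As a quick sanity check of this inequality, here is a sketch with made-up 2-D encodings (real encodings are 128-D); the values `f_A`, `f_P`, `f_N` are hypothetical, chosen only to illustrate the margin condition:

```python
import numpy as np

# Hypothetical 2-D encodings, for illustration only
f_A = np.array([0.0, 0.0])   # anchor
f_P = np.array([0.1, 0.0])   # positive: close to the anchor
f_N = np.array([1.0, 0.0])   # negative: far from the anchor
alpha = 0.2

pos_dist = np.sum((f_A - f_P) ** 2)   # squared L2 distance: 0.01
neg_dist = np.sum((f_A - f_N) ** 2)   # squared L2 distance: 1.0

# The triplet satisfies the margin: 0.01 + 0.2 < 1.0
print(pos_dist + alpha < neg_dist)    # True
```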

You would thus like to minimize the following “triplet cost”:

$$\mathcal{J} = \sum_{i=1}^{m} \left[ \underbrace{\| f(A^{(i)}) - f(P^{(i)}) \|_2^2}_{\text{(1)}} - \underbrace{\| f(A^{(i)}) - f(N^{(i)}) \|_2^2}_{\text{(2)}} + \alpha \right]_+ \tag{3}$$

Here, we are using the notation “\([z]_+\)” to denote max(z,0).

Notes:

- The term (1) is the squared distance between the anchor “A” and the positive “P” for a given triplet; you want this to be small.
- The term (2) is the squared distance between the anchor “A” and the negative “N” for a given triplet; you want this to be relatively large. It has a minus sign preceding it because minimizing the negative of a term is the same as maximizing that term.
- \(\alpha\) is called the margin. It is a hyperparameter that you pick manually. We will use \(\alpha = 0.2\).

Most implementations also rescale the encoding vectors to have L2 norm equal to one (i.e., \(\| f(\text{img}) \|_2 = 1\)); you won’t have to worry about that in this assignment.
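Although normalization is not required here, the rescaling is simple; below is a NumPy sketch with a made-up batch of short encodings (TensorFlow provides `tf.math.l2_normalize` for the same operation):

```python
import numpy as np

# A hypothetical batch of 3 raw encodings of dimension 4 (128 in practice)
enc = np.array([[3.0, 4.0, 0.0, 0.0],
                [1.0, 1.0, 1.0, 1.0],
                [0.5, 0.0, 0.0, 0.0]])

# Divide each row by its L2 norm so every encoding lies on the unit sphere
unit = enc / np.linalg.norm(enc, axis=-1, keepdims=True)

print(np.linalg.norm(unit, axis=-1))  # [1. 1. 1.]
```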

**Exercise**: Implement the triplet loss as defined by formula (3). Here are the 4 steps:

- Compute the distance between the encodings of “anchor” and “positive”: \(\| f(A^{(i)}) - f(P^{(i)}) \|_2^2\)
- Compute the distance between the encodings of “anchor” and “negative”: \(\| f(A^{(i)}) - f(N^{(i)}) \|_2^2\)
- Compute the formula per training example: \(\| f(A^{(i)}) - f(P^{(i)}) \|_2^2 - \| f(A^{(i)}) - f(N^{(i)}) \|_2^2 + \alpha\)
- Compute the full formula by taking the max with zero and summing over the training examples:

$$\mathcal{J} = \sum_{i=1}^{m} \left[ \| f(A^{(i)}) - f(P^{(i)}) \|_2^2 - \| f(A^{(i)}) - f(N^{(i)}) \|_2^2 + \alpha \right]_+ \tag{3}$$

#### Hints

- Useful functions: `tf.reduce_sum()`, `tf.square()`, `tf.subtract()`, `tf.add()`, `tf.maximum()`.
- For steps 1 and 2, you will sum over the entries of \(\| f(A^{(i)}) - f(P^{(i)}) \|_2^2\) and \(\| f(A^{(i)}) - f(N^{(i)}) \|_2^2\).
- For step 4 you will sum over the training examples.

#### Additional Hints

- Recall that the square of the L2 norm is the sum of the squared differences: \(\| x - y \|_2^2 = \sum_{i=1}^{N} (x_i - y_i)^2\)
- Note that the `anchor`, `positive` and `negative` encodings are of shape `(m,128)`, where m is the number of training examples and 128 is the number of elements used to encode a single example.
- For steps 1 and 2, you will maintain the number of `m` training examples and sum along the 128 values of each encoding. `tf.reduce_sum` has an `axis` parameter that chooses along which axis the sums are applied.
- Note that one way to choose the last axis in a tensor is to use negative indexing (`axis=-1`).
- In step 4, when summing over training examples, the result will be a single scalar value.
- For `tf.reduce_sum` to sum across all axes, keep the default value `axis=None`.
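The effect of the `axis` argument can be seen in a small NumPy sketch (NumPy's `np.sum` behaves like `tf.reduce_sum` here); the array `diff` is a made-up stand-in for \(f(A) - f(P)\):

```python
import numpy as np

# Hypothetical stand-in for f(A) - f(P): a batch of m = 5 encodings of size 128
rng = np.random.default_rng(0)
diff = rng.standard_normal((5, 128))

# axis=-1 sums along the 128 encoding values, leaving one distance per example
per_example = np.sum(diff ** 2, axis=-1)

# axis=None (the default) sums across all axes, producing a single scalar
total = np.sum(diff ** 2)

print(per_example.shape, total.shape)  # (5,) ()
```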

```python
# GRADED FUNCTION: triplet_loss

import tensorflow as tf

def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implementation of the triplet loss as defined by formula (3)

    Arguments:
    y_true -- true labels, required when you define a loss in Keras; you don't need it in this function.
    y_pred -- python list containing three objects:
        anchor -- the encodings for the anchor images, of shape (None, 128)
        positive -- the encodings for the positive images, of shape (None, 128)
        negative -- the encodings for the negative images, of shape (None, 128)

    Returns:
    loss -- real number, value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]

    ### START CODE HERE ### (≈ 4 lines)
    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis = -1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis = -1)
    # Step 3: Subtract the two previous distances and add alpha
    basic_loss = pos_dist - neg_dist + alpha
    # Step 4: Take the maximum of basic_loss and 0.0; sum over the training examples
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    ### END CODE HERE ###

    return loss
```
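To check the logic outside TensorFlow, the same four steps can be rewritten with NumPy; `triplet_loss_np` below is a hypothetical helper for verification, not part of the assignment, and the two triplets are constructed so the expected losses can be worked out by hand:

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, alpha=0.2):
    # Formula (3), steps 1-4, written with NumPy for verification
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    return np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0))

A = np.zeros((1, 128))
far = np.ones((1, 128))        # squared distance to A is 128

# Easy triplet: positive equals the anchor, negative is far away -> hinge is inactive
print(triplet_loss_np(A, A, far))   # 0.0

# Degenerate triplet: negative also equals the anchor -> only the margin survives
print(triplet_loss_np(A, A, A))     # 0.2
```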

I generated some images to test the algorithm. The figure below is a failed case.

Here is a successful case.

Generally speaking, the robustness of the algorithm still needs improvement. I will read the paper to better understand why it does not work very well.
