
# Face Recognition

### 1.2 – The Triplet Loss

For an image x, we denote its encoding f(x), where f is the function computed by the neural network.

Supplement: Triplet loss is commonly used for instance-level classification (e.g., face identification, vehicle re-identification), where we want finer-than-class-level precision. However, convergence takes longer.

Training will use triplets of images (A, P, N):

• A is an “Anchor” image: a picture of a person.
• P is a “Positive” image: a picture of the same person as the Anchor image.
• N is a “Negative” image: a picture of a different person than the Anchor image.

These triplets are picked from our training dataset. We will write $$(A^{(i)}, P^{(i)}, N^{(i)})$$ to denote the i-th training example.

You’d like to make sure that an image $$A^{(i)}$$ of an individual is closer to the Positive $$P^{(i)}$$ than to the Negative image $$N^{(i)}$$ by at least a margin $$\alpha$$:

$$\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2 + \alpha < \mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2$$

You would thus like to minimize the following “triplet cost”:

$$\mathcal{J} = \sum_{i=1}^{m} \large[ \small \underbrace{\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2}_{\text{(1)}} - \underbrace{\mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2}_{\text{(2)}} + \alpha \large ]_+ \small \tag{3}$$

Here, we are using the notation “$$[z]_+$$” to denote max(z,0).

Notes:

• The term (1) is the squared distance between the anchor “A” and the positive “P” for a given triplet; you want this to be small.
• The term (2) is the squared distance between the anchor “A” and the negative “N” for a given triplet; you want this to be relatively large. It has a minus sign preceding it because minimizing the negative of the term is the same as maximizing that term.
• $$\alpha$$ is called the margin. It is a hyperparameter that you pick manually. We will use $$\alpha = 0.2$$.
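To make the margin concrete, here is a small NumPy sketch with toy 3-D encodings (hypothetical values, not from the assignment) showing how the per-triplet term $$[z]_+$$ behaves:

```python
import numpy as np

alpha = 0.2  # the margin hyperparameter

# Toy 3-D encodings (hypothetical values, just for illustration)
f_A = np.array([0.0, 1.0, 0.0])
f_P = np.array([0.1, 0.9, 0.0])   # close to the anchor
f_N = np.array([1.0, 0.0, 0.0])   # far from the anchor

pos_dist = np.sum((f_A - f_P) ** 2)   # term (1): small (0.02)
neg_dist = np.sum((f_A - f_N) ** 2)   # term (2): large (2.0)

# Per-triplet contribution to J: max(pos_dist - neg_dist + alpha, 0)
loss = max(pos_dist - neg_dist + alpha, 0.0)
```

Here the anchor-negative distance exceeds the anchor-positive distance by more than the margin, so the hinge clips the contribution to zero: that triplet adds nothing to the cost.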

Most implementations also rescale the encoding vectors to have L2 norm equal to one (i.e., $$\mid \mid f(img)\mid \mid_2 = 1$$); you won’t have to worry about that in this assignment.
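If you did want that rescaling, TensorFlow provides `tf.math.l2_normalize`; a minimal sketch (the batch shape here is an arbitrary example):

```python
import tensorflow as tf

# Hypothetical batch of unnormalized encodings, shape (m, 128)
enc = tf.random.normal((4, 128))

# Rescale each row (each encoding) to unit L2 norm
enc_unit = tf.math.l2_normalize(enc, axis=-1)

# Every row now has norm ~1.0
norms = tf.norm(enc_unit, axis=-1)
```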

Exercise: Implement the triplet loss as defined by formula (3). Here are the 4 steps:

1. Compute the distance between the encodings of “anchor” and “positive”: $$\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2$$
2. Compute the distance between the encodings of “anchor” and “negative”: $$\mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2$$
3. Compute the formula per training example: $$\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2 - \mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2 + \alpha$$
4. Compute the full formula by taking the max with zero and summing over the training examples:
$$\mathcal{J} = \sum_{i=1}^{m} \large[ \small \mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2 - \mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2 + \alpha \large ]_+ \small \tag{3}$$

#### Hints

• Useful functions: tf.reduce_sum(), tf.square(), tf.subtract(), tf.add(), tf.maximum().
• For steps 1 and 2, you will sum over the entries of $$\mid \mid f(A^{(i)}) – f(P^{(i)}) \mid \mid_2^2$$ and $$\mid \mid f(A^{(i)}) – f(N^{(i)}) \mid \mid_2^2$$.
• For step 4 you will sum over the training examples.

• Recall that the square of the L2 norm is the sum of the squared differences: $$||x - y||_{2}^{2} = \sum_{i=1}^{N}(x_{i} - y_{i})^{2}$$
• Note that the anchor, positive and negative encodings are of shape (m,128), where m is the number of training examples and 128 is the number of elements used to encode a single example.
• For steps 1 and 2, you will maintain the number of m training examples and sum along the 128 values of each encoding. tf.reduce_sum has an axis parameter that chooses along which axis the sums are applied.
• Note that one way to choose the last axis in a tensor is to use negative indexing (axis=-1).
• In step 4, when summing over training examples, the result will be a single scalar value.
• For tf.reduce_sum to sum across all axes, keep the default value axis=None.
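The axis behavior described in the hints can be seen on a tiny tensor (toy values, standing in for the squared differences of two training examples):

```python
import tensorflow as tf

# Hypothetical tensor of squared differences, shape (m=2, 3)
sq = tf.constant([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

# axis=-1 sums along the last axis: one distance per training example
per_example = tf.reduce_sum(sq, axis=-1)   # shape (2,)

# axis=None (the default) sums across all axes: a single scalar
total = tf.reduce_sum(sq)
```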
```python
import tensorflow as tf

# GRADED FUNCTION: triplet_loss

def triplet_loss(y_true, y_pred, alpha=0.2):
    """
    Implementation of the triplet loss as defined by formula (3)

    Arguments:
    y_true -- true labels, required when you define a loss in Keras; you don't need it in this function.
    y_pred -- python list containing three objects:
        anchor -- the encodings for the anchor images, of shape (None, 128)
        positive -- the encodings for the positive images, of shape (None, 128)
        negative -- the encodings for the negative images, of shape (None, 128)

    Returns:
    loss -- real number, value of the loss
    """

    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]

    ### START CODE HERE ### (≈ 4 lines)
    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    # Step 3: Subtract the two previous distances and add alpha
    basic_loss = pos_dist - neg_dist + alpha
    # Step 4: Take the maximum of basic_loss and 0.0, then sum over the training examples
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    ### END CODE HERE ###

    return loss
```
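A quick sanity check of the function on two toy triplets with 4-D encodings (hypothetical values; the function body is repeated here so the snippet runs standalone):

```python
import tensorflow as tf

def triplet_loss(y_true, y_pred, alpha=0.2):
    # Same implementation as above, repeated so this snippet is self-contained
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    basic_loss = pos_dist - neg_dist + alpha
    return tf.reduce_sum(tf.maximum(basic_loss, 0.0))

# Two toy triplets: each positive is near its anchor, each negative is far
anchor   = tf.constant([[1.0, 0.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0, 0.0]])
positive = tf.constant([[0.9, 0.1, 0.0, 0.0],
                        [0.0, 0.9, 0.1, 0.0]])
negative = tf.constant([[0.0, 0.0, 1.0, 0.0],
                        [0.0, 0.0, 0.0, 1.0]])

loss = triplet_loss(None, [anchor, positive, negative])
```

Both triplets satisfy the margin (anchor-negative distance 2.0 vs. anchor-positive distance 0.02, with alpha = 0.2), so the hinge clips both terms and the total loss is 0.0.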

I generated some images to test the algorithm. The figure below shows a failed case.

Here is a successful case.

Generally speaking, the robustness of the algorithm still needs improvement. I will read the paper to better understand why it does not work well in some cases.
