
Various loss functions

MSE vs. MAE: MAE is more robust to outliers, because it penalizes errors linearly rather than quadratically.
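
A small sketch of the outlier effect, using made-up residuals. The numbers and the helper names `mse` and `mae` are illustrative assumptions, not part of any library:

```python
def mse(errors):
    """Mean squared error of a list of residuals."""
    return sum(e * e for e in errors) / len(errors)

def mae(errors):
    """Mean absolute error of a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical residuals (prediction - target); the second list adds one outlier.
clean = [0.5, -0.5, 1.0, -1.0]
with_outlier = clean + [10.0]

mse_clean, mse_out = mse(clean), mse(with_outlier)
mae_clean, mae_out = mae(clean), mae(with_outlier)

print("MSE:", mse_clean, "->", mse_out)
print("MAE:", mae_clean, "->", mae_out)
```

A single outlier inflates the MSE far more than the MAE, which is the sense in which MAE is the more robust of the two.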

Huber loss:

As the hyperparameter \(\delta \to \infty\), the Huber loss approaches MSE; as \(\delta \to 0\), it behaves like a scaled MAE. It therefore combines the advantages of the two losses: quadratic for small errors and linear for large ones.
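
A minimal sketch of the two limits, assuming the standard piecewise definition of the Huber loss (quadratic for \(|r| \le \delta\), linear beyond). The function name `huber` and the sample residuals are illustrative:

```python
def huber(r, delta):
    """Huber loss of a single residual r with threshold delta > 0."""
    if abs(r) <= delta:
        return 0.5 * r * r                   # quadratic region, equals 0.5 * r^2
    return delta * (abs(r) - 0.5 * delta)    # linear region, grows like delta * |r|

residuals = [0.2, -1.0, 5.0]

# Very large delta: every residual falls in the quadratic region (MSE-like).
large = [huber(r, 1e6) for r in residuals]

# Very small delta: every residual falls in the linear region;
# dividing by delta shows the MAE-like behaviour.
small_delta = 1e-3
small = [huber(r, small_delta) / small_delta for r in residuals]

print(large)
print(small)
```

With a large \(\delta\) the values match \(\tfrac{1}{2} r^2\); with a small \(\delta\) the rescaled values are close to \(|r|\).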

Contrastive Loss

Proposed for Siamese networks; it derives from LeCun's ‘Dimensionality Reduction by Learning an Invariant Mapping’. After dimensionality reduction, similar samples should lie close together in the low-dimensional feature space.


\(d=\vert\vert x_1-x_2\vert\vert_2\)
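
The loss expression itself does not appear in the text here. It is commonly written as below; this is a sketch of the usual form, where the margin \(m\) and the number of pairs \(N\) are symbols assumed for illustration, and \(y_n=1\) marks a similar pair:

```latex
L = \frac{1}{2N} \sum_{n=1}^{N} \left[ y_n d_n^2 + (1 - y_n) \max(m - d_n, 0)^2 \right]
```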

When y=1, only the first term of the loss remains, which minimizes the distance between similar samples. When y=0, only the second term remains, which maximizes the distance between dissimilar samples.
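
The two cases can be checked with a small sketch, assuming the commonly used per-pair form \(\tfrac{1}{2}\left[y d^2 + (1-y)\max(m-d,0)^2\right]\). The function name `contrastive_loss` and the default `margin=1.0` are assumptions made for illustration:

```python
import math

def contrastive_loss(x1, x2, y, margin=1.0):
    """Pairwise contrastive loss for one pair; y=1 similar, y=0 dissimilar."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))  # Euclidean distance
    similar_term = y * d ** 2                                 # active when y=1
    dissimilar_term = (1 - y) * max(margin - d, 0.0) ** 2     # active when y=0
    return 0.5 * (similar_term + dissimilar_term)

a = [0.0, 0.0]
b = [0.3, 0.4]  # at distance 0.5 from a

loss_similar = contrastive_loss(a, b, y=1)          # penalizes the distance itself
loss_dissimilar = contrastive_loss(a, b, y=0)       # penalizes being inside the margin
loss_far = contrastive_loss(a, [3.0, 4.0], y=0)     # beyond the margin, no penalty

print(loss_similar, loss_dissimilar, loss_far)
```

Under this form, a dissimilar pair stops contributing once its distance exceeds the margin, so the second term only pushes apart pairs that are still too close.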



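import torch
import torch.nn as nn

# `input` and `target` were not defined in the original snippet; `input` reuses the logits printed below, and `target` is an assumed set of 0/1 labels.
input = torch.tensor([[ 0.6426, -0.5485, -0.2136],
                      [ 0.6424,  0.7613,  0.1525],
                      [-0.3920, -1.2589, -0.2521]])
target = torch.tensor([[1., 0., 1.], [0., 1., 0.], [1., 0., 0.]])
print(input)                  # raw logits
print(torch.sigmoid(input))   # probabilities after the sigmoid
print(nn.BCEWithLogitsLoss()(input, target))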
tensor([[ 0.6426, -0.5485, -0.2136],
        [ 0.6424,  0.7613,  0.1525],
        [-0.3920, -1.2589, -0.2521]])
tensor([[0.6553, 0.3662, 0.4468],
        [0.6553, 0.6816, 0.5381],
        [0.4032, 0.2212, 0.4373]])

BCEWithLogitsLoss simply applies a sigmoid function before BCELoss. I first saw this loss in FCN, where it is used as a per-pixel segmentation loss.
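
The equivalence can be checked without PyTorch. The sketch below compares "sigmoid, then BCE" against a single expression computed directly from the logit; the helper names and the `targets` values are assumptions for illustration, and the fused expression is one common numerically stable way to write it, not necessarily what PyTorch does internally:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, y):
    """Binary cross-entropy on a probability p in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_with_logits(x, y):
    """The same quantity computed from the logit x in a stable form."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))

logits = [0.6426, -0.5485, -0.2136]  # first row of the logits printed above
targets = [1.0, 0.0, 1.0]            # hypothetical labels

two_step = [bce(sigmoid(x), y) for x, y in zip(logits, targets)]
fused = [bce_with_logits(x, y) for x, y in zip(logits, targets)]

print(two_step)
print(fused)
```

Both lists agree up to floating-point error. The fused form avoids taking the log of a probability that has saturated to 0 or 1, which is the usual argument for preferring the with-logits variant.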

Contrast loss

The goal is to maximize the share of the positive sample among all samples, and thereby the contrast between positive and negative samples.
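
Written out, this share is a softmax ratio. A sketch of the usual form, where \(s^{+}\) is the similarity to the positive sample, \(s_i^{-}\) are the similarities to the \(K\) negatives, and \(\tau\) is a temperature (notation chosen here for illustration):

```latex
L = -\log \frac{\exp(s^{+}/\tau)}{\exp(s^{+}/\tau) + \sum_{i=1}^{K} \exp(s_i^{-}/\tau)}
```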

# -*- coding: utf-8 -*-
"""
Created on Thu Aug  6 22:25:21 2020

@author: 99488
"""

import numpy as np
p1 = np.array([-0.83483301, -0.16904167, 0.52390721])
p2 = np.array([-0.83455951, -0.16862266, 0.52447767])
neg = np.array([
 [ 0.70374682, -0.18682394, -0.68544673],
 [ 0.15465702,  0.32303224,  0.93366556],
 [ 0.53043332, -0.83523217, -0.14500935],
 [ 0.68285685, -0.73054075,  0.00409143],
 [ 0.76652431,  0.61500886,  0.18494479]])

# p1 and p2 are nearly identical, so their dot product is close to 1.0
pos_dot = p1.dot(p2)
# Most of the negatives are far from p1, so their dot products are small or negative
neg_dot = neg.dot(p1)  # one dot product per row of neg

# make a vector from the positive and negative vectors comparisons
v = np.concatenate(([pos_dot], neg_dot))
# take e to the power of each value in the vector
exp = np.exp(v)
# divide each value by the sum of the exponentiated values
softmax_out = exp/np.sum(exp)

# Contrastive loss of the example values
# temp parameter
t = 0.07
# concatenated vector divided by the temp parameter
logits = np.concatenate(([pos_dot], neg_dot))/t
# e^x of the scaled values
exp = np.exp(logits)
# the loss is the negative log of the positive's share of the summed exponentials
loss = - np.log(exp[0]/np.sum(exp))
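
To see what the temperature does, here is a small stdlib-only sketch that evaluates the same loss at several values of `t`. The similarity scores are hypothetical stand-ins, and the function name `softmax_contrastive_loss` is chosen for illustration:

```python
import math

def softmax_contrastive_loss(pos_sim, neg_sims, t):
    """-log of the positive's softmax share among all similarities, at temperature t."""
    logits = [pos_sim / t] + [s / t for s in neg_sims]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]

# Hypothetical similarities: one close positive, five mostly distant negatives.
pos_sim = 0.99
neg_sims = [-0.9, 0.3, -0.4, -0.45, -0.65]

losses = {t: softmax_contrastive_loss(pos_sim, neg_sims, t) for t in (1.0, 0.5, 0.07)}
for t, value in losses.items():
    print(f"t={t}: loss={value:.6f}")
```

When the positive already has the highest similarity, a smaller temperature sharpens the softmax and drives the loss toward zero.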

Reference: https://towardsdatascience.com/contrastive-loss-explaned-159f2d4a87ec



