MSE vs MAE. MAE is more robust to outliers: MSE squares the error, so a single large residual can dominate the loss, while MAE grows only linearly with the residual.
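To make the robustness claim concrete, here is a small NumPy comparison (the data values are made up for illustration):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # last value is an outlier
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 4.0])    # good fit everywhere except the outlier

mse = np.mean((y_true - y_pred) ** 2)   # dominated by the squared outlier residual
mae = np.mean(np.abs(y_true - y_pred))  # grows only linearly with the outlier
print(mse, mae)  # MSE is orders of magnitude larger than MAE
```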
Huber loss. As the hyperparameter \(\delta \to \infty\), Huber loss approaches MSE; as \(\delta \to 0\), it approaches MAE. This loss combines the advantages of the two: quadratic near zero (smooth gradients), linear in the tails (robust to outliers).
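A minimal NumPy sketch of the Huber loss showing the quadratic/linear switch at \(\delta\) (the function name and default are my own):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    # quadratic (MSE-like) for |error| <= delta, linear (MAE-like) beyond it
    error = y_true - y_pred
    is_small = np.abs(error) <= delta
    squared = 0.5 * error ** 2
    linear = delta * (np.abs(error) - 0.5 * delta)
    return np.mean(np.where(is_small, squared, linear))
```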
Contrastive loss. Proposed for siamese networks, derived from LeCun's 'Dimensionality Reduction by Learning an Invariant Mapping'. After dimensionality reduction, similar samples should stay close in the low-dimensional feature space.
A common form is \(L = y \, d^2 + (1-y)\max(0, m-d)^2\), where \(d\) is the distance between the two embeddings and \(m\) is a margin. When y=1 (similar pair), only the first term remains, which minimizes the distance between similar samples; when y=0 (dissimilar pair), only the second term remains, which pushes dissimilar samples apart until their distance exceeds the margin.
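A minimal PyTorch sketch of this pairwise contrastive loss under the convention above (the function name and margin default are assumptions):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(x1, x2, y, margin=1.0):
    # y = 1 for similar pairs, y = 0 for dissimilar pairs
    d = F.pairwise_distance(x1, x2)
    # first term pulls similar pairs together;
    # second term pushes dissimilar pairs apart until d >= margin
    loss = y * d.pow(2) + (1 - y) * torch.clamp(margin - d, min=0).pow(2)
    return loss.mean()

x1, x2 = torch.randn(4, 8), torch.randn(4, 8)
y = torch.tensor([1.0, 0.0, 1.0, 0.0])
print(contrastive_loss(x1, x2, y))
```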
```python
import torch
import torch.nn as nn

input = torch.randn(3, 3)
print(input)

# apply the sigmoid manually, then plain BCELoss
m = nn.Sigmoid()
data = m(input)
print(data)

target = torch.FloatTensor([[0, 1, 1], [1, 1, 1], [1, 0, 1]])
loss = nn.BCELoss()
print(loss(data, target))

# BCEWithLogitsLoss takes the raw logits and applies the sigmoid internally
print(nn.BCEWithLogitsLoss()(input, target))
```
```
tensor([[ 0.6426, -0.5485, -0.2136],
        [ 0.6424,  0.7613,  0.1525],
        [-0.3920, -1.2589, -0.2521]])
tensor([[0.6553, 0.3662, 0.4468],
        [0.6553, 0.6816, 0.5381],
        [0.4032, 0.2212, 0.4373]])
tensor(0.6985)
tensor(0.6985)
```
BCEWithLogitsLoss simply applies a sigmoid before BCELoss, fused into one numerically more stable operation, which is why the two results above match. The first time I saw this loss was in FCN, as a per-pixel segmentation loss.
InfoNCE. The goal is to maximize the softmax probability of the positive sample among all samples, so as to maximize the contrast between positive and negative samples.
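Written out (this is the quantity the NumPy walkthrough below computes step by step), with query \(q\), positive key \(k_{+}\), keys \(k_i\), and temperature \(\tau\):

\[
\mathcal{L} = -\log \frac{\exp(q \cdot k_{+}/\tau)}{\sum_{i=0}^{K} \exp(q \cdot k_i/\tau)}
\]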
```python
import numpy as np

# anchor and its positive: two nearly identical unit vectors
p1 = np.array([-0.83483301, -0.16904167, 0.52390721])
p2 = np.array([-0.83455951, -0.16862266, 0.52447767])

neg = np.array([[ 0.70374682, -0.18682394, -0.68544673],
                [ 0.15465702,  0.32303224,  0.93366556],
                [ 0.53043332, -0.83523217, -0.14500935],
                [ 0.68285685, -0.73054075,  0.00409143],
                [ 0.76652431,  0.61500886,  0.18494479]])

# p1 and p2 are nearly identical, so their dot product is close to 1.0
pos_dot = p1.dot(p2)

# most of the negatives are pretty far away, so their dot products are small or negative
num_neg = len(neg)
neg_dot = np.zeros(num_neg)
for i in range(num_neg):
    neg_dot[i] = p1.dot(neg[i])

# make one vector from the positive and negative comparisons
v = np.concatenate(([pos_dot], neg_dot))

# softmax: e to the power of each value, divided by the sum of the exponentiated values
exp = np.exp(v)
softmax_out = exp / np.sum(exp)

# contrastive (InfoNCE) loss of the example values
t = 0.07  # temperature parameter
logits = np.concatenate(([pos_dot], neg_dot)) / t
exp = np.exp(logits)
# only the positive entry (index 0) goes into the log
loss = -np.log(exp[0] / np.sum(exp))
print(loss)
```
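Note the effect of the temperature here: dividing the logits by t = 0.07 sharpens them, so the softmax concentrates almost all of its mass on the positive pair and the loss comes out close to zero; a larger t softens the distribution and increases the loss.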