Losses
Created: 2022-04-25 11:50
#note
We often use "cost function" and "loss function" as synonymous, but they are different:
- loss function: it is for a single training example/input;
- cost function: it is the average loss over the entire training dataset.
Regression
Mean Squared Error
Also called squared loss or L2 loss.
It is the simplest and most common loss function: $MSE = \frac{1}{N}\sum_i^N(y_i-\hat{y_i})^2$.
Advantages:
- Easy to interpret;
- Always differential because of the square;
- Only one local minima
Disadvantages: - Error unit in the square;
- Not robust to outlier
Mean Absolute Error
Also called L1 loss. It is also very simple: $MAE = \frac{1}{N}\sum_{i=1}^N|y_i-\hat{y_i}|$.
Advantage:
- intuitive and easy;
- robust to outlier
Disadvantages: - Not differential
Huber Loss
It is used because it is less sensitive to outliers than squared error loss.
$\begin{cases}Huber=\frac{1}{n}\sum_{i=1}^n\frac{1}{2}(y_i-\hat{y_i})^2 \quad |y_i-\hat{y_i}|\leq \delta\ Huber= \frac{1}{n}\sum_{i=1}^n\delta(|y_i-\hat{y_i}|-\frac{1}{2}\delta)\quad |y_i-\hat{y_i}|>\delta\end{cases}$
Where $n$ is the number of data points, $y$ is the ground truth, $\hat{y}$ is the predicted value, and $\delta$ defines the point where the Huber loss function transitions from a quadratic to linear.
Advantages:
- Robust to outliers;
- It is considered as in the middle between MAE and MSE
Disadvantages: - It is complex and also $\delta$ has to be optimized (in addiction to the other parameters).
Classification Loss
Other losses
Pointwise loss: minimization of the squared loss between predicted score $y_{u,i}$ and target value $y(u,i)$. To use negative feedback, all unobserved entries are considered as negative examples or some unobserved entries are sampled to be negative instances.
Pairwise loss: aims to rank observed entries higher than the unobserved ones by maximizing the margin between observed entries and unobserved entries.
References
Tags
#loss