七三笔记

范数 norm L1 Loss L1 Loss 示例 SmoothL1Loss L2 Loss/MSELoss 参考

范数 norm

L1范数：1维上的距离

 
各个元素绝对值的和

|a-b|，两个数相减，再求绝对值，再相加，是常见的距离的计算方式
也叫L1范数

L2范数：多维上的距离

 
各个元素的平方和再开平方

两个向量
A=(a1,a2,...,an)
B=(b1,b2,...,bn)
设xi=ai - bi,那么向量AB的距离可表示为

 
这就是L2范数，也是欧氏距离，表示多维上的距离/差异 

L2范数将距离的概念 扩展/加强 了

向量有多维，就像花有绿肥红瘦一样，不同事物之间的差异是多方向/方面/维度的，

各个维度的差异积累起来的总差异，也是两个事物之间的差异，

这种差异在数学上叫 L2 norm

L1 Loss

L1 Loss:绝对值损失,再对绝对值求均值

 
nn.L1Loss(size_average=None, reduce=None, reduction: str = 'mean') -> None

Creates a criterion that measures the meanL1 Loss 示例 absolute error (MAE) between each element in

the input :math:`x` and target :math:`y`.

size_average, reduce 这两个参数已废弃，看 reduction: str = 'mean'

L1 Loss 示例

  
where :math:`N` is the batch size

import torch
from torch import nn 

loss = nn.L1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()

output
tensor(1.1354, grad_fn=MeanBackward0)

SmoothL1Loss

Smooth L1 Loss为L1 Loss的平滑处理

 
Init signature:
nn.SmoothL1Loss(
    size_average=None,
    reduce=None,
    reduction: str = 'mean',
    beta: float = 1.0,
) -> None
Docstring:     
Creates a criterion that uses a squared term if the absolute
element-wise error falls below beta and an L1 term otherwise.
It is less sensitive to outliers than :class:`torch.nn.MSELoss` and in some cases
prevents exploding gradients (e.g. see the paper `Fast R-CNN`_ by Ross Girshick).


For a batch of size :math:`N`, the unreduced loss can be described as:

 
.. note::
Smooth L1 loss can be seen as exactly :class:`L1Loss`, but with the :math:`|x - y| < beta`
portion replaced with a quadratic function such that its slope is 1 at :math:`|x - y| = beta`.
The quadratic segment smooths the L1 loss near :math:`|x - y| = 0`.

当x与y相异不大时， 使用的是平方，只不过加了个参数，是扩大还是缩小分布的差异，看参数设置， 
相比MESLoss，可以防止爆炸，数据偏差大时，不再使用平方了，而是回到绝对值（即L1Loss），
只不过又减了一个数,并且这个数是正数，意思就是让偏差小一点
不管怎样，这两者都会随着偏差的增大而增大，能够反应分布之间的差异

L1 Loss易受异常点影响，且绝对值的梯度计算在0点容易丢失梯度。
Smooth L1 Loss 在0点附近是强凸，结合了平方损失和绝对值损失的优点。

L2 Loss/MSELoss

 
    $\displaystyle \frac{1}{n}  \sum_{i=1}^{i=n}{ \big(a_i-b_i\big)^2}$

官方示例

 
import torch
from torch import nn 

loss = nn.MSELoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()

output
tensor(1.0446, grad_fn=MseLossBackward0)

 
nn.MSELoss(size_average=None, reduce=None, reduction: str = 'mean') -> None
    
Creates a criterion that measures the mean squared error (squared L2 norm) 
between each element in the input :math:`x` and target :math:`y`.

 
where :math:`N` is the batch size. 
If :attr:`reduction` is not ``'none'`` (default ``'mean'``), then:

Shape

 
- Input: :math:`(*)`, where :math:`*` means any number of dimensions.
- Target: :math:`(*)`, same shape as the input.

MSELoss要求模型输出与标签的shape一致，
并不像交叉熵那样，标签支持索引的格式
也没什么，就是遇到MSE加一个标签与模型输出的shape是否一致的判断就可以了

 
loss_fn = nn.MSELoss()

for X,y in trainDataLoader:
    y_out  = transformer(X)
    print(y_out.shape,y.shape)            # torch.Size([64, 100, 1]) torch.Size([64, 100, 1]) 
    loss = loss_fn(y_out,y)
    loss.backward()
    print(round(loss.item(),5))  #5.5041
    break

 
与前面官方的例子不同的是，这里输入的数据没有设置 requires_grad=True 

输入的数据X是常量，模型中的参数由模型内部自动设置 requires_grad=True 
- requires_grad隐藏在台后了，时间长了，会让人认为不需要requires_grad了
- 实际上，真实改变的就是参数，参数的改变有两步：
- 1-求导，2- w -= learning_rate*w.grad作差  
  
另外，这只是个示例，梯度没有下降，即没有加优化器

 
在PyTorch中，并没有直接提供名为RMSELoss的API来计算均方根误差（Root Mean Square Error, RMSE）。
不过，RMSE可以通过均方误差（Mean Squared Error, MSE）损失函数计算后开平方得到。
以下是实现RMSE的步骤和代码示例：

 
import torch
import torch.nn as nn

# 假设我们有一些预测值和目标值
predictions = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
targets = torch.tensor([1.5, 2.5, 3.5])

# 定义MSE损失函数
mse_loss = nn.MSELoss()

# 计算MSE
mse = mse_loss(predictions, targets)

# 计算RMSE
rmse = torch.sqrt(mse)

print("MSE:", mse.item())    #MSE: 0.25
print("RMSE:", rmse.item())  #RMSE: 0.5

 
当损失函数小于1时，RMSE是大于MSE的

自定义RMSE损失函数

 
import torch
import torch.nn as nn
import torch.nn.functional as F

# 假设我们有一些预测值和目标值
predictions = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
targets = torch.tensor([1.5, 2.5, 3.5])

class RMSELoss(nn.Module):
    def __init__(self):
        super(RMSELoss, self).__init__()
        self.mse = nn.MSELoss()
        
    def forward(self, predictions, targets):
        mse = self.mse(predictions, targets)
        rmse = torch.sqrt(mse)
        return rmse

# 使用自定义的RMSE损失函数
rmse_loss = RMSELoss()
loss = rmse_loss(predictions, targets)
print("Custom RMSE Loss:", loss.item())  #Custom RMSE Loss: 0.5

参考

    L2范数的理解
    什么是范数（norm）？以及L1,L2范数的简单介绍
    拉普拉斯分布

七三笔记路线：学习，记录，分享